Descriptive Statistics

Purpose

The first stage in understanding the data in a sensory or consumer study is to summarise it using descriptive statistics. For each product and attribute (or consumer liking) there will be many scores arising from multiple assessors and potentially from multiple sessions and replicates. This module enables an initial assessment of the data and can be useful for understanding whether missing values are a problem and how the range of scores has been used.

Data Format

  1. Profiling.xlsx
  2. Consumer.xlsx

For EyeOpenR to read your data, the first five columns of the ‘Data’ sheet must be in the following order: assessor (consumer), product, session, replicate and order (sequence). For sensory analysis the data for attributes should be in the sixth column (column F) onwards. There should be one column for each attribute. The attributes data should be numeric. For consumer analysis the data for consumer liking and other consumer assessments (ratings) should be in the sixth column (column F) onwards. There should be one column for each rating. These are described as attributes in the options.

If there is no session, replicate or order information then these columns should contain the value ‘1’ in each cell.
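As a rough illustration of the layout described above, the following Python sketch arranges raw scores into rows with the five design columns first and the attribute columns after them, filling in ‘1’ where session, replicate or order information is absent. The helper `build_data_rows`, the column labels and the sample records are hypothetical and only show the expected column order; they are not part of EyeOpenR.

```python
# Hypothetical labels for the five design columns, in the required order.
DESIGN_COLUMNS = ["assessor", "product", "session", "replicate", "order"]

def build_data_rows(records, attributes):
    """Return a header row plus data rows: five design columns, then attributes.

    Missing session, replicate or order information is filled with 1.
    """
    header = DESIGN_COLUMNS + list(attributes)
    rows = [header]
    for rec in records:
        design = [
            rec["assessor"],
            rec["product"],
            rec.get("session", 1),
            rec.get("replicate", 1),
            rec.get("order", 1),
        ]
        # Attribute data should be numeric, one column per attribute.
        scores = [float(rec[name]) for name in attributes]
        rows.append(design + scores)
    return rows

# Made-up records with no session, replicate or order information.
records = [
    {"assessor": "J1", "product": "P1", "sweet": 6, "bitter": 2},
    {"assessor": "J1", "product": "P2", "sweet": 4, "bitter": 5},
]
rows = build_data_rows(records, ["sweet", "bitter"])
for row in rows:
    print(row)
```

In this sketch the attribute scores start in the sixth position of every row, which corresponds to column F of the ‘Data’ sheet.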

Additional information about the data in the ‘Data’ sheet can be included in additional sheets. The ‘Attributes’ sheet can be used to specify the names of the attributes, data types and minimum and maximum values that are used to check data quality. The ‘Assessors’ sheet can be used to specify assessor names if codes are used in the ‘Data’ sheet. Similarly, the ‘Products’ sheet can be used to specify product names if codes are used in the ‘Data’ sheet. See the example spreadsheet for an illustration of the data format. 

Background

One of the initial phases of data analysis is to summarise the data in such a way that the data can be checked and a first understanding of how products differ for the attributes measured can be formed.

The descriptive statistics module creates the following summary statistics: number of observations, number of missing values, minimum, maximum, mean, standard deviation, median, variance and standard error of the mean for each product and attribute.

The median is the central value when all assessments are ranked from lowest to highest (with an even number of assessments, the midpoint of the two central values). It is less sensitive than the mean to unusually high or low values.

The standard deviation is a measure of the variation between assessments. The variance is the standard deviation squared, another way of expressing the variation in the scores.

The standard error of the mean describes the uncertainty in the mean assessment.
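The statistics described above can be sketched for a single product and attribute using only the Python standard library. This is an illustration rather than the module's implementation: the function `describe`, the use of `None` to mark a missing value and the scores themselves are assumptions made for the example. The standard deviation and variance use the sample (n - 1) form, which is also what R's `sd` and `var` compute.

```python
import math
import statistics

def describe(scores):
    """Summarise the scores for one product and attribute.

    None marks a missing value (an assumption for this sketch).
    """
    observed = [s for s in scores if s is not None]
    n = len(observed)
    sd = statistics.stdev(observed)  # sample standard deviation (n - 1)
    return {
        "n": n,
        "missing": len(scores) - n,
        "min": min(observed),
        "max": max(observed),
        "mean": statistics.fmean(observed),
        "median": statistics.median(observed),
        "sd": sd,
        "variance": statistics.variance(observed),  # equals sd squared
        "sem": sd / math.sqrt(n),  # standard error of the mean
    }

# Made-up scores with one missing assessment.
summary = describe([2, 4, 4, 5, None, 7, 8])

# Replacing the highest score with an extreme value moves the mean
# but leaves the median unchanged.
with_outlier = describe([2, 4, 4, 5, None, 7, 80])

for name, value in summary.items():
    print(name, value)
print("mean with outlier:", with_outlier["mean"])
print("median with outlier:", with_outlier["median"])
```

The second call shows why the median is a useful complement to the mean: a single extreme score changes the mean considerably while the median stays where it was.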

Confidence intervals express the uncertainty in the mean as a range of values. If a study were repeated many times, about 95% of the 95% confidence intervals calculated in this way would contain the true mean.

Options

  1. Split results on: There are five options here: None, Judge, Attribute, Product and Session. If a choice other than None is made, the results are presented separately for each level of the selected split. For consumer studies ‘Attribute’ refers to the consumer ratings. Note that if Judge or Product is selected then the Type of Mean must be ‘Arithmetic’.
  2. Type of Mean: There are two options: ‘Adjusted’ presents adjusted means derived from a two-way ANOVA of products and assessors, and ‘Arithmetic’ presents unadjusted means. If the design of the study is balanced and there are no missing values, then adjusted and arithmetic means are identical.
  3. Min/Max: Include minimum and maximum in the table if ‘Yes’ is selected.
  4. Median: Include the median in the table if ‘Yes’ is selected. 
  5. Standard Deviation: Include the standard deviation in the table if ‘Yes’ is selected. 
  6. Standard Error of the Mean: Include the standard error of the mean in the table if ‘Yes’ is selected.
  7. Variance: Include the variance in the table if ‘Yes’ is selected. 
  8. Confidence intervals: Include the confidence intervals in the table if ‘Yes’ is selected. 
  9. Significance Level Confidence Intervals: This option is greyed out unless the confidence intervals are requested. There are three options: 1%, 5% and 10%. If 1% is selected then the 99% confidence intervals are shown in the table, if 5% then the 95% confidence intervals and if 10% the 90% confidence intervals are shown. Choosing 1% will give a wider confidence interval and choosing 10% will give a narrower confidence interval.
  10. Number of Decimals for Values: Specify the number of decimal places shown for statistics in the table.
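To illustrate how the significance level chosen for the confidence intervals affects their width, the sketch below calculates an interval around the mean at the 1%, 5% and 10% levels. It uses a normal quantile from the Python standard library, which is an assumption: the documentation does not state which quantile the module uses, and an interval based on the t distribution would be somewhat wider when there are few assessments. The function `confidence_interval` and the scores are hypothetical.

```python
import math
import statistics

def confidence_interval(scores, significance_level=0.05):
    """Approximate confidence interval for the mean.

    Uses a normal quantile (an assumption for this sketch).
    """
    n = len(scores)
    mean = statistics.fmean(scores)
    sem = statistics.stdev(scores) / math.sqrt(n)  # standard error of the mean
    z = statistics.NormalDist().inv_cdf(1 - significance_level / 2)
    return mean - z * sem, mean + z * sem

scores = [2, 4, 4, 5, 7, 8]  # made-up scores for one product and attribute
intervals = {
    level: confidence_interval(scores, level) for level in (0.01, 0.05, 0.10)
}
for level, (low, high) in intervals.items():
    print(f"{level:.0%} level -> {1 - level:.0%} CI: {low:.2f} to {high:.2f}")
```

The 1% level gives the widest interval and the 10% level the narrowest, as described for the Significance Level Confidence Intervals option.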

Results and Interpretation

There is one table for each product in the data, and one table for the assessments of all products combined (‘Overall’). These are selected by clicking on the box that describes the product (or ‘Overall’ for the combined results). Each table has a row for each attribute in the data and a column for each descriptive statistic.

Some of the columns in the table are always reported and others are optional and selected through the options. The first column reports the number of non-missing assessments included in that row of the table. This is important for understanding how much data is being summarised. For the ‘Overall’ table in a balanced design with no missing values it will be the number of products multiplied by the number of assessors, multiplied by the number of sessions and the number of replicates. The other columns that are always reported are the number of missing values and the mean.
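The expected number of observations in the ‘Overall’ table can be checked with a small sketch. The product codes, assessor codes and scores below are made up, and `None` is used to mark a missing assessment; none of these choices come from EyeOpenR.

```python
from itertools import product as cartesian

products = ["P1", "P2", "P3"]
assessors = ["J1", "J2", "J3", "J4"]
sessions = [1, 2]
replicates = [1, 2]

# One score per design cell in a balanced design (placeholder scores).
design = list(cartesian(products, assessors, sessions, replicates))
scores = {cell: 5.0 for cell in design}

# Products x assessors x sessions x replicates.
expected_n = len(products) * len(assessors) * len(sessions) * len(replicates)

# Mark two assessments as missing to see how the counts change.
scores[("P1", "J1", 1, 1)] = None
scores[("P2", "J3", 2, 1)] = None

n_observed = sum(value is not None for value in scores.values())
n_missing = len(scores) - n_observed
print(expected_n, n_observed, n_missing)
```

With 3 products, 4 assessors, 2 sessions and 2 replicates the complete design has 48 assessments; after two are marked as missing, 46 observations and 2 missing values remain.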

If the option to ‘Split results on’ has been selected there will be an additional set of boxes at the top of the display representing the different levels of the split variable. Selecting different values of the split variable will change the table to reflect the choice.

Technical Information

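R functions used:

  1. min, max, mean, sd, median and var (installed with the standard R installation) for calculating the minimum, maximum, mean, standard deviation, median and variance. Base R has no dedicated function for the standard error of the mean (stderr in base R refers to the standard error output stream); it is conventionally calculated as the standard deviation divided by the square root of the number of observations.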

References

  1. Martin Bland (2015) “An Introduction to Medical Statistics – 4th Edition”, Oxford University Press.  See chapter 4 “Summarizing data”.

