To provide an analysis of variance (ANOVA) test per selected attribute.
ANOVA examines sources of variation in the data: it is often used in sensory science to investigate whether variation in attributes is due to products, samples and replications (amongst other variables), and in consumer science to see if products differ in (say) liking.
Where there is an effect of product or assessor, the analyst needs to know between which products or assessors the differences lie: a multiple comparison test of mean scores is performed to achieve this. Several multiple comparison tests are available in the EyeOpenR ANOVA module.
Note: for EyeOpenR to read your data, the first five columns must include the following in the specified order: Assessor, Product, Session, Replica and Sequence. Sensory attributes or consumer variables start from column six (Column F). If there is no session, replica or sequence information available, the user should input a value of “1” in each cell in the column that contains no collected information. See the example dataset for more information.
ANOVA is one of the most commonly used statistical tests in consumer and sensory science. It allows variability within the data to be partitioned into several specified effects, for example, a Product effect, an Assessor effect, a Replicate effect, etc., as well as random error. The effects (also known as terms) to include in the ANOVA model are chosen by the user from a shortlist in EyeOpenR, constrained by whether there are replicates, sessions or sequences in the data.
Common ANOVA models – sensory science: One of the most common models in sensory science describes each sensory variable (y) by a Product effect, an Assessor effect and the interaction between the two (Product by Assessor), as well as random error. This allows the user to understand whether there are significant differences between products, between assessors, and whether there is a significant interaction term, which in this case would indicate that assessors rate the products differently on the chosen attribute. In other words, the interaction provides information on whether assessors disagree about product intensities on that attribute. Ideally, we would have agreement in the panel and thereby a non-significant interaction effect. In EyeOpenR, the Product effect is determined by comparing the variation between products to the level of disagreement amongst the panel/assessors. If there is variation between products and low disagreement amongst assessors, then we can be confident that there are differences between the products on that attribute. ANOVA can therefore be thought of as a signal-to-noise ratio, whereby the product differences are the signal and the level of disagreement is the noise.
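As an illustration only (this is not EyeOpenR's internal code), the sketch below fits this Product + Assessor + Product:Assessor model in Python with statsmodels on made-up panel data; the column names, scores and the hand-computed mixed-model F-test are all assumptions for the example.

```python
# A minimal sketch of the common sensory ANOVA model, assuming
# hypothetical long-format data: 3 assessors x 2 products x 2 replicates.
import pandas as pd
import statsmodels.api as sm
from scipy import stats
from statsmodels.formula.api import ols

panel = pd.DataFrame({
    "assessor":  ["A1", "A1", "A2", "A2", "A3", "A3"] * 2,
    "product":   ["P1", "P2"] * 6,
    "sweetness": [6.1, 3.9, 5.8, 4.2, 6.4, 3.5,
                  5.9, 4.1, 6.0, 3.8, 6.2, 4.0],
})

# y = Product + Assessor + Product:Assessor (+ random error)
model = ols("sweetness ~ C(product) * C(assessor)", data=panel).fit()
aov = sm.anova_lm(model, typ=2)
print(aov)

# Signal-to-noise view described above: test the Product effect against
# the Product:Assessor interaction (panel disagreement) rather than the
# pure residual, as in a mixed-model (random assessor) analysis.
ms = aov["sum_sq"] / aov["df"]
f_product = ms["C(product)"] / ms["C(product):C(assessor)"]
p_product = stats.f.sf(f_product,
                       aov.loc["C(product)", "df"],
                       aov.loc["C(product):C(assessor)", "df"])
print(f_product, p_product)
```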
In sensory science it is also quite common to add session and/or replicate effects to the ANOVA model, depending on the data and research question.
Common ANOVA models – consumer science: As in sensory science, ANOVA is very popular in consumer science, where again, variability in a quantitative variable (e.g., liking, acceptance, willingness to pay) is partitioned into (say) a Product effect, a Consumer effect and random error. This allows the analyst to examine whether there are differences between products in liking/acceptance (as measured by the Product effect) and whether there are differences between consumers (as measured by the Consumer effect). Consumer tests typically have a much larger number of participants than a trained sensory panel, so a significant consumer effect is not surprising. However, including a consumer effect term in the ANOVA model helps to reduce the variability partitioned into random error. This is important because the denominator in the calculation of the product effect in consumer science ANOVA models is typically the random error (also known as the residual error). In other words, including a consumer effect helps to explain variability in the data that is not due to random noise and, as a result, we are more likely to detect product differences due to the reduced noise.
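A minimal sketch of this consumer model (liking described by Product + Consumer + random error), again using Python's statsmodels with purely illustrative data and column names, might look as follows:

```python
# Hypothetical data: each of 4 consumers rates liking of 3 products once.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

liking = pd.DataFrame({
    "consumer": [f"C{i}" for i in range(1, 5) for _ in range(3)],
    "product":  ["P1", "P2", "P3"] * 4,
    "liking":   [7, 5, 6, 8, 4, 6, 7, 5, 5, 6, 4, 7],
})

# No Product:Consumer interaction term: with one rating per consumer x
# product cell there are no degrees of freedom left to estimate it, and
# disagreement between consumers is expected in any case.
model = ols("liking ~ C(product) + C(consumer)", data=liking).fit()

# The Product F-test uses the residual (random error) as its denominator;
# including the Consumer effect shrinks that residual, increasing power.
print(sm.anova_lm(model, typ=2))
```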
Unlike sensory science, a product-by-assessor interaction effect is rarely used in consumer tests, as it is acknowledged that consumers can disagree on which products they like or dislike. A consumer panel is typically not trained to perceive attribute intensities but to provide hedonic information, which naturally varies from individual to individual. The product-by-assessor interaction effect is therefore largely redundant in this context, as disagreement is expected and accepted (for example, differences in consumer liking can be the basis for a subsequent cluster analysis). Rather, the noise measure in a consumer test is the variation left unaccounted for by the product and consumer effects in the model (i.e., the residual error of the model).
Importantly, if the user has requested an ANOVA model that does not include a particular variation source, then the user must not conclude this variation does not exist (Kemp, Hollowood & Hort, 2009). Rather the effect will be part of the residual error. For example, if there are replicates in the data and a replicate effect has not been chosen as a model term, the effect of replicate will be placed in the residual error term.
The user is required to input their level of significance prior to analysis: either 1% (0.01), 5% (0.05) or 10% (0.1). These numbers refer to the value of alpha: the risk of rejecting the null hypothesis when it is in fact true (also known as Type I error). The level of confidence is 1 – alpha, so choosing a 5% significance level will provide results with a 95% level of confidence. A ‘significant’ effect is indicated by the p-value in the EyeOpenR output being lower than the alpha value set by the user prior to analysis (e.g., p = 0.03 when alpha is 0.05 would be termed “significant”).
If there is a significant effect of, say, Product, then the user needs to understand which products differ (e.g., are Products A and B different? Products A and C? Products A and D? Etc.). This is the role of a multiple comparison test, which statistically compares the means of each product and outputs the results in either a group or pairwise format. Several of the most commonly used multiple comparison tests are available in EyeOpenR.
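By way of illustration only (EyeOpenR offers its own implementations), the sketch below runs one widely used multiple comparison test, Tukey's HSD, on made-up liking scores using Python's statsmodels:

```python
# Hypothetical scores: 4 liking ratings for each of 3 products.
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

scores = pd.DataFrame({
    "product": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "liking":  [7, 6, 7, 8, 4, 5, 4, 3, 6, 7, 5, 6],
})

# Compares every product pair; the 'reject' column flags pairs whose
# mean difference is significant at the chosen alpha.
result = pairwise_tukeyhsd(scores["liking"], scores["product"], alpha=0.05)
print(result.summary())
```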
Multiple comparison tests vary in their computation and conservatism: a ‘conservative’ test is less likely to report a significant difference between two products, whereas a more ‘liberal’ test is more inclined to do so. Conservative tests lower the risk of Type I error: that is, they lower the risk of wrongly rejecting the null hypothesis of no difference, and so report fewer differences between products. Lowering the risk of Type I error, however, comes at the expense of increasing the risk of Type II error: more conservative tests are less likely to report a “significant difference” when a difference actually exists. This is also known as being less powerful. Conversely, more liberal tests, being more inclined to report a significant difference between products, lower the risk of Type II error but increase the risk of Type I.
Due to differences in the results provided by different multiple comparison tests, the analyst may well ask ‘which multiple comparison test should I use?’. The answer should be decided prior to the experiment, depending on company policy, the number of products in the test, whether there is a reference product, and whether particular pairwise comparisons are of interest versus all pairwise comparisons.
Many general statistics textbooks (e.g., Ott & Longnecker, 2015) and sensory-specific textbooks (e.g., O’Mahony, 1986) elaborate on the construction of multiple comparison tests and the differences between them. Fisher’s LSD is widely regarded as the most liberal (the most ready to report differences between products), whilst the others are conservative to varying degrees in how they control Type I error. Being more conservative is often important when many products are compared at the pairwise level, as the risk of a Type I error increases quickly with each comparison: the familywise error rate = 1 – (1 – alpha)^k, where k is the number of pairwise tests. So, with six products and alpha set to 0.05, there are 15 pairwise tests, which equates to an error rate of 1 – (1 – 0.05)^15 = 0.54. This error rate is likely to be considered too high, and the analyst may therefore seek to protect against it by using a more conservative multiple comparison test.
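The familywise error rate quoted above can be checked directly; the short Python sketch below reproduces the six-product example:

```python
# With k independent pairwise tests at level alpha, the chance of at
# least one Type I error is 1 - (1 - alpha)**k.
from math import comb

alpha = 0.05
n_products = 6
k = comb(n_products, 2)          # 15 pairwise comparisons
familywise = 1 - (1 - alpha) ** k
print(k, round(familywise, 2))   # 15, 0.54
```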
The reader is recommended to consult statistical textbooks to further their understanding of the computation and conservatism of the different tests (e.g., Ott & Longnecker, 2015).
If a replicate effect is included (see above), then this option also adds a Product by Replicate interaction to the model. This can be of value when evaluating whether certain products show differences across replicates.
The ANOVA table shows the p-values associated with the specified ANOVA model and is colour-coded to aid interpretation.
A sensory researcher may well wish to understand if there are differences between products per attribute: this is reflected in the ‘Product’ column. In general, the researcher hopes to see significant differences in the Product column, which indicate that the sensory panel can discriminate between products on that attribute.
Likewise, the ‘Assessor’ and ‘Replica’ columns provide information on whether there are differences between assessors and between replicates per attribute.
P-values for the requested interaction terms are then presented: small p-values are generally not wanted, e.g., a significant p-value (say p < .05) for the Product:Assessor interaction indicates significant disagreement among the sensory panel concerning that attribute. Product:Replica and Assessor:Replica can be interpreted in similar fashion if they are included in the model.
The naming of the next set of tabs varies according to the multiple comparison test chosen and whether the display of test results is requested as either ‘Group’ or ‘Pairwise’:
In the LSD tab, products sharing the same letter are not significantly different. The same information is provided in the LSD Letters tab, now with one group per column. LSD values are given per variable in the LSD values tab: these reflect the smallest (least) difference between product means that is deemed ‘significant’. If the difference between two product means is greater than the LSD, the test reports the two products as significantly different. To see both means and group allocations, see the LSD (Group) tab.

The LSD Differences tab provides the difference in means between every two-product combination. Confidence intervals are then provided, alongside the p-value. To aid interpretation, the column ‘Sig.’ shows no, one, two or three asterisks, according to whether the difference is significant at the significance levels chosen by the user.

Finally, model information is available in the Information tab, such as whether the arithmetic or adjusted means were used. All results can be exported to Excel via the icon.

For Fisher’s LSD with the Pairwise option, the output is largely similar, except for the LSD (Pairwise) table found under the LSD (Pairwise) tab: this is a table of means with each product coded as a letter (e.g., Product1 = A). The table presents each product as a separate column and each variable in rows: a cell contains the respective product’s mean score, followed by the codes of the products that score significantly less than that product. So, if “67.03 B” appears in the cell of Product A by Attribute1, then Product A has a mean of 67.03 on Attribute1, which is significantly higher than that of the product coded B.
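For readers who wish to see how an LSD value of the kind shown in the LSD values tab arises, the sketch below computes one under the usual balanced-design assumption; the MSE, error degrees of freedom and group size are illustrative values, not EyeOpenR's output:

```python
# Hypothetical inputs: mean squared error (MSE) and error degrees of
# freedom from the fitted ANOVA, and n observations per product mean.
from scipy import stats

alpha, mse, df_error, n = 0.05, 1.2, 30, 12

# LSD = t(1 - alpha/2, df_error) * sqrt(2 * MSE / n); two product means
# further apart than this are reported as significantly different.
t_crit = stats.t.ppf(1 - alpha / 2, df_error)
lsd = t_crit * (2 * mse / n) ** 0.5
print(round(lsd, 3))  # ~0.913 with these illustrative values
```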
One difference between the SNK output and that of Tukey and Fisher is that confidence intervals for the difference between two product means cannot be calculated for the SNK test. Therefore, no “Differences” tab appears in the SNK output.