This analysis fits a linear model for each attribute:
Attribute = Product + Assessor + Scaling + Agreement + Residuals
where the Scaling and Agreement terms are obtained by partitioning the Product:Assessor interaction term in the following model:
Attribute = Product + Assessor + Product:Assessor + Residuals.
For each attribute we have an ANOVA table from the panel-level model. It can be interpreted in the same way as any other ANOVA table, except that the Assessor, Product and Scaling effects are tested against the Agreement Mean Square instead of the Mean Square Error (MSE).
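To make the decomposition concrete, below is a minimal Python sketch of this ANOVA for one attribute, partitioning the Product:Assessor interaction into Scaling and Agreement in the spirit of the Mixed Assessor Model [2]. It assumes a balanced design with replicates and at least three products; the column names "product" and "assessor" are hypothetical, and the software's actual implementation may differ.

    import pandas as pd
    from scipy import stats

    def mam_anova(df, attribute, product="product", assessor="assessor"):
        # Sketch of the MAM ANOVA table for one attribute.
        # Assumes a balanced design, replicates (r >= 2) and J >= 3 products.
        y = df[attribute].astype(float)
        grand = y.mean()
        I = df[assessor].nunique()                      # assessors
        J = df[product].nunique()                       # products
        r = len(df) // (I * J)                          # replicates per cell
        prod_means = df.groupby(product)[attribute].mean()
        x = prod_means - grand                          # centred product effects
        sxx = (x ** 2).sum()

        cell = df.groupby([assessor, product])[attribute].mean()
        ass_means = df.groupby(assessor)[attribute].mean()
        ss_total = ((y - grand) ** 2).sum()
        ss_prod = r * I * sxx
        ss_ass = r * J * ((ass_means - grand) ** 2).sum()
        ss_cells = r * ((cell - grand) ** 2).sum()
        ss_inter = ss_cells - ss_prod - ss_ass
        ss_err = ss_total - ss_cells

        # Scaling: per-assessor slope of cell means on the centred product effects;
        # the interaction splits into slope heterogeneity (Scaling) plus the rest.
        ss_scal = 0.0
        for a, sub in cell.groupby(level=0):
            ya = sub.droplevel(0).reindex(x.index)
            b = (x * (ya - ya.mean())).sum() / sxx      # assessor's scaling slope
            ss_scal += r * (b - 1.0) ** 2 * sxx
        ss_agree = ss_inter - ss_scal                   # "pure" (dis)agreement

        tab = pd.DataFrame(
            {"SS": [ss_prod, ss_ass, ss_scal, ss_agree, ss_err],
             "df": [J - 1, I - 1, I - 1, (I - 1) * (J - 2), len(y) - I * J]},
            index=["Product", "Assessor", "Scaling", "Agreement", "Error"])
        tab["MS"] = tab["SS"] / tab["df"]
        ms_agr = tab.loc["Agreement", "MS"]
        df_agr = tab.loc["Agreement", "df"]
        mse = tab.loc["Error", "MS"]
        df_err = tab.loc["Error", "df"]
        # Product, Assessor and Scaling are tested against the Agreement MS;
        # Agreement itself is tested against the MSE.
        for term in ["Product", "Assessor", "Scaling"]:
            f = tab.loc[term, "MS"] / ms_agr
            tab.loc[term, "F"] = f
            tab.loc[term, "p"] = stats.f.sf(f, tab.loc[term, "df"], df_agr)
        tab.loc["Agreement", "F"] = ms_agr / mse
        tab.loc["Agreement", "p"] = stats.f.sf(ms_agr / mse, df_agr, df_err)
        return tab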
Discrimination here refers to the assessors’ ability to discriminate between products. We quantify this through the ANOVA models: the discrimination p-values are the p-values associated with the Product effect. A lower discrimination p-value is usually more desirable because it suggests that the assessors are able to distinguish between the products.
The DISCRIMINATION table displays the panellist-level discrimination p-values for each panellist against each attribute, highlighted “Good”, “Poor” or “Bad” based on commonly used thresholds.
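As an illustration, panellist-level discrimination p-values can be computed with a one-way ANOVA of attribute on product per panellist. The sketch below uses hypothetical column names and purely illustrative banding cut-offs, which are not necessarily those the software applies.

    import pandas as pd
    from scipy import stats

    def discrimination_pvalues(df, attributes, product="product", assessor="assessor"):
        # One-way ANOVA p-value of the Product effect, per panellist and attribute.
        rows = {}
        for a, sub in df.groupby(assessor):
            rows[a] = {}
            for att in attributes:
                groups = [g[att].values for _, g in sub.groupby(product)]
                rows[a][att] = stats.f_oneway(*groups).pvalue
        return pd.DataFrame(rows).T   # rows: panellists, columns: attributes

Cells could then be banded, for example “Good” if p <= 0.05, “Poor” if 0.05 < p <= 0.10, “Bad” otherwise.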
The panel-level discrimination can be viewed either in the ANOVA tables or summarised in the PANEL PERFORMANCE table, which displays the discrimination F-values for each attribute, highlighted green if the associated p-value is less than the chosen Significant Threshold (panel) and red otherwise.
Scaling here refers to the assessors’ use of the scale when evaluating products, with an interest in whether assessors use the scale differently from each other. We quantify this through the linear models and ANOVA. A higher scaling p-value is usually more desirable because this suggests that the assessors are using the scale in a similar manner.
The SCALING COEFFICIENTS table is a table of scaling coefficients for each panellist against each attribute. These are the coefficients of the scaling term in the linear model: a coefficient greater than 1 suggests the panellist spreads their scores more than the panel, whereas a coefficient between 0 and 1 suggests the panellist spreads their scores less than the panel. A negative coefficient is also possible, suggesting, for example, that the associated assessor may be using the scale in the wrong direction. How large a difference from 1 is statistically significant is addressed by the SCALING P-VALUES.
The SCALING P-VALUES table tabulates the p-values from testing whether each scaling coefficient is different from 1, for each panellist and attribute. These are highlighted “Good”, “Poor” or “Bad” based on commonly used thresholds.
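For illustration, here is a hedged sketch of one way to estimate a panellist’s scaling coefficient and test it against 1: regress the panellist’s centred scores on the centred panel product means and apply a t-test to the slope. Column names are hypothetical and the software’s exact estimator may differ.

    import numpy as np
    import pandas as pd
    from scipy import stats

    def scaling_coefficients(df, attribute, product="product", assessor="assessor"):
        # Per-panellist scaling slope and a t-test of H0: slope = 1.
        prod_means = df.groupby(product)[attribute].mean()
        x_full = prod_means - prod_means.mean()              # centred panel means
        out = {}
        for a, sub in df.groupby(assessor):
            x = sub[product].map(x_full).values              # covariate per row
            y = sub[attribute].values - sub[attribute].mean()  # centred scores
            b = (x * y).sum() / (x * x).sum()                # scaling coefficient
            resid = y - b * x
            dof = len(y) - 2                                 # mean + slope fitted
            se = np.sqrt((resid ** 2).sum() / dof / (x * x).sum())
            t = (b - 1.0) / se                               # tested against 1, not 0
            out[a] = {"coef": b, "p": 2 * stats.t.sf(abs(t), dof)}
        return pd.DataFrame(out).T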
The panel-level scaling can be viewed either in the ANOVA tables or summarised in the PANEL PERFORMANCE table, which displays the scaling F-values for each attribute, highlighted green if the associated p-value is greater than the chosen Significant Threshold (panel) and red otherwise.
Agreement here refers to the level of consensus between assessors. Unlike in the Panellist Performance analysis, scaling effects are separated out from agreement here; this is sometimes referred to as “pure agreement” in the literature.
We quantify this through the linear models and ANOVA. A higher agreement p-value is more desirable because this suggests that there is broad consensus between assessors.
The panel-level agreement can be viewed either in the ANOVA tables or summarised in the PANEL PERFORMANCE table, which displays the agreement F-values for each attribute, highlighted green if the associated p-value is greater than the chosen Significant Threshold (panel) and red otherwise.
The AGREEMENT table displays the panellist-level agreement p-values for each panellist against each attribute, highlighted “Good”, “Poor” or “Bad” based on commonly used thresholds.
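One plausible construction of such panellist-level agreement tests under the MAM is to test each panellist’s disagreement mean square against the pooled error mean square, as in this Python sketch (balanced design with replicates and at least three products assumed; column names hypothetical; the software’s exact test may differ):

    import pandas as pd
    from scipy import stats

    def agreement_pvalues(df, attribute, product="product", assessor="assessor"):
        # Per-panellist "pure agreement" F-test against the pooled error MS.
        I = df[assessor].nunique()
        J = df[product].nunique()
        r = len(df) // (I * J)
        prod_means = df.groupby(product)[attribute].mean()
        x = prod_means - prod_means.mean()                  # centred product effects
        sxx = (x ** 2).sum()
        within = df[attribute] - df.groupby([assessor, product])[attribute].transform("mean")
        err_df = len(df) - I * J                            # requires r >= 2
        mse = (within ** 2).sum() / err_df
        cell = df.groupby([assessor, product])[attribute].mean()
        out = {}
        for a, sub in cell.groupby(level=0):
            ya = sub.droplevel(0).reindex(x.index)
            b = (x * (ya - ya.mean())).sum() / sxx          # panellist's scaling slope
            dis = ya - ya.mean() - b * x                    # disagreement residuals
            f = (r * (dis ** 2).sum() / (J - 2)) / mse
            out[a] = stats.f.sf(f, J - 2, err_df)
        return pd.Series(out, name=attribute)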
Repeatability is a measure of the consistency of the assessors in evaluating the same products.
On the panel level this is calculated from the root mean square error of the appropriate ANOVA, displayed in the ANOVA RESULTS table and summarised in the PANEL PERFORMANCE table. A smaller root mean square error is generally preferable.
On the panellist level we fit, for each panellist, the ANOVA model:
Attribute = Product + Residuals
We then use F-tests to compare the residual variation across panellists. The results are tabulated in the REPEATABILITY table, for every panellist and attribute, and are highlighted “Good”, “Poor” or “Bad” by commonly used p-value thresholds.
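A hedged sketch of one such comparison follows: each panellist’s residual mean square from the per-panellist model is tested against the pooled residual mean square of the rest of the panel. Column names are hypothetical and the software’s exact F-test construction may differ.

    import pandas as pd
    from scipy import stats

    def repeatability_pvalues(df, attribute, product="product", assessor="assessor"):
        # Residual SS and df from Attribute = Product + Residuals, per panellist.
        per = {}
        for a, sub in df.groupby(assessor):
            resid = sub[attribute] - sub.groupby(product)[attribute].transform("mean")
            per[a] = ((resid ** 2).sum(), len(sub) - sub[product].nunique())
        out = {}
        for a, (ss_a, df_a) in per.items():
            ss_rest = sum(ss for k, (ss, d) in per.items() if k != a)
            df_rest = sum(d for k, (ss, d) in per.items() if k != a)
            f = (ss_a / df_a) / (ss_rest / df_rest)    # > 1: noisier than peers
            out[a] = stats.f.sf(f, df_a, df_rest)      # one-sided: poor repeatability
        return pd.Series(out, name=attribute)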
The MAMCAP table attempts to communicate all panellist-level results in one table of panellists against attributes. Agreement is displayed through the colour coding of the table, while the other terms are communicated via symbols.
If the MAMCAP table is not selected, then the OVERALL SUMMARY table is output instead. This displays, for each panellist and term, a count of the number of attributes on the preferable side of the chosen Significant Threshold (panellist).
The PANELLIST SUMMARY (%) table summarises the panellist-level results, giving, for each measure of performance, the proportion of attributes for which a panellist’s p-value lies on the preferable side of the chosen Significant Threshold (panellist). Which side is preferable depends on the term: a significant discrimination p-value suggests the panellist can distinguish the products, whereas a significant agreement p-value suggests the panellist is in disagreement with the panel.
Similarly, the PANEL SUMMARY (%) table summarises the results in the PANEL PERFORMANCE table, giving the proportion of attributes on the preferable side of the chosen Significant Threshold (panel) for each term (except Repeatability).
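To illustrate the counting rule, here is a minimal Python sketch. The threshold default of 0.05 and the term names are examples only; the preferable direction per term follows the description above.

    import pandas as pd

    def summary_percent(pvals, term, alpha=0.05):
        # pvals: DataFrame of p-values (panellists as rows, attributes as columns).
        # For discrimination, small p-values are preferable; for scaling and
        # agreement, large p-values are preferable.
        if term == "discrimination":
            good = pvals < alpha
        else:                            # "scaling" or "agreement"
            good = pvals > alpha
        return 100 * good.mean(axis=1)   # one percentage per panellist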
Attributes with zero variance will be removed from the analysis with a warning included in this table.
[1] C. Peltier, P. B. Brockhoff, M. Visalli, P. Schlich, “The MAM-CAP table: A new tool for monitoring panel performances”, Food Quality and Preference, vol. 32, part A, pp. 24-27, 2014.
[2] S. Pødenphant, M. H. Truong, K. Kristensen, P. B. Brockhoff, “The Mixed Assessor Model and the multiplicative mixed model”, Food Quality and Preference, vol. 74, pp. 38-48, 2019.