- See the profiling dataset.
- The attributes should be of scale or interval type.
- Replications are required for this analysis.
Background
This analysis is primarily based on ANOVA models fitted for each attribute: one model for the whole panel, and then smaller individual models for each panellist.
On the panel level this analysis fits an ANOVA model for each attribute:
Attribute = Product + Assessor + Replicate + Product:Assessor + Product:Replicate + Assessor:Replicate + Residuals
If an attribute is constant then it is not possible to fit this model, and the attribute will be omitted from the analysis with a warning given in the INFORMATION tab. The same applies if an attribute is not evaluated on all products and all replicates, or is not evaluated enough times to fit the model.
For example, to fit the model Attribute = Product + Assessor + Residuals to a dataset with 10 products and 11 assessors, the dataset needs at least 20 observations, one per model parameter:
Intercept + Product + Assessor = Total
1 + (10 – 1) + (11 – 1) = 20
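A quick way to sanity-check this count, as a minimal sketch (the variable names are illustrative):

```r
# Minimum observations needed to fit Attribute = Product + Assessor + Residuals:
# one parameter for the intercept plus one per non-reference factor level.
n_products  <- 10
n_assessors <- 11
min_obs     <- 1 + (n_products - 1) + (n_assessors - 1)
min_obs
#> 20
```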
On the panellist level, this analysis fits the following ANOVA model for each panellist and attribute:
Attribute = Product + Residuals
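As an illustration only (the actual computation is done by SensoMineR), the two models could be fitted in base R along these lines, assuming a data frame profiling with factor columns Product, Assessor and Replicate, and a numeric attribute column Sweetness:

```r
# Panel-level model for one attribute, with all two-way interactions.
panel_lm <- lm(
  Sweetness ~ Product + Assessor + Replicate +
    Product:Assessor + Product:Replicate + Assessor:Replicate,
  data = profiling
)

# Panellist-level models: Attribute = Product + Residuals, one per assessor.
panellist_lms <- lapply(
  split(profiling, profiling$Assessor),
  function(d) lm(Sweetness ~ Product, data = d)
)
```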
Options
- Type of Panel Performance: Choose whether to calculate the Reproducibility or the Repeatability on the panel and panellist level.
- Include borderline: Yes or No to include the “borderline” attributes when counting the number of “good” attributes.
- Number of Decimals for Values: The number of decimal places to round values to.
- Number of Decimals for P-Values: The number of decimal places to round p-values to.
- Anonymise Assessors? Choose whether to replace the assessor names. There are options for randomly generated names or names from the assessor metadata.
- Anonymise Products? Choose whether to replace the product names. There are options for randomly generated names or names from the product metadata.
- Anonymise Attributes? Choose whether to replace the attribute names. There are options for randomly generated names or names from the attribute metadata.
Results and Interpretation
Discrimination
Discrimination here refers to the assessors’ ability to discriminate between products. We quantify this through the ANOVA models: the discrimination p-values are the p-values associated with the product effect in these models. A lower discrimination p-value is usually more desirable because it suggests that the assessors are able to distinguish between the products.
The PANELLIST DISCRIMINATION table displays the discrimination p-values for each panellist against each attribute, highlighted “Good”, “Borderline”, “Poor” or “Bad” based on commonly used thresholds.
Similarly, the panel level discrimination for each attribute is displayed in the PANEL SUMMARY table, highlighted “Good”, “Borderline”, “Poor” or “Bad” based on commonly used thresholds.
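As a hedged sketch of the idea, reusing the hypothetical panel_lm fitted in the Background section; the 0.05 / 0.10 / 0.20 cut-offs below are stand-ins for the commonly used thresholds, not necessarily the exact values used by the tool:

```r
# Discrimination p-value: the Product row of the panel-level ANOVA table.
p_discrim <- anova(panel_lm)["Product", "Pr(>F)"]

# Band the p-value; the breaks here are placeholder thresholds.
cut(p_discrim,
    breaks = c(-Inf, 0.05, 0.10, 0.20, Inf),
    labels = c("Good", "Borderline", "Poor", "Bad"))
```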
Agreement
Agreement here refers to the level of consensus between assessors.
On the panel level we quantify this through the ANOVA models: the agreement p-values are the p-values associated with the Product:Assessor interaction. A higher agreement p-value is more desirable because it suggests that there is broad consensus between assessors. This information is displayed in the PANEL SUMMARY table, highlighted “Good”, “Borderline”, “Poor” or “Bad” based on commonly used thresholds.
On a panellist level we quantify this by the correlation between the adjusted product means on the panel level and on the panellist level for each assessor; a correlation closer to 1 is usually more desirable. The PANELLIST AGREEMENT table displays this correlation for each panellist on each attribute, together with the median across the attributes. The correlation coefficients are highlighted “Good”, “Borderline”, “Poor” or “Bad” based on commonly used thresholds. The panellist agreement should be interpreted with caution if your dataset is not balanced.
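To make the two levels concrete, here is an illustrative sketch reusing the hypothetical panel_lm and profiling objects from the Background section. Note that the tool uses adjusted product means; plain means are used below for brevity:

```r
# Panel-level agreement: the Product:Assessor interaction p-value.
# Larger values are more desirable.
p_agree <- anova(panel_lm)["Product:Assessor", "Pr(>F)"]

# Panellist-level agreement: correlate each panellist's product means with
# the panel's product means (plain means here; the tool uses adjusted means).
panel_means <- tapply(profiling$Sweetness, profiling$Product, mean)
agreement <- sapply(split(profiling, profiling$Assessor), function(d) {
  panellist_means <- tapply(d$Sweetness, d$Product, mean)
  cor(panel_means, panellist_means[names(panel_means)],
      use = "pairwise.complete.obs")
})
```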
Unsatisfactory agreement can occur for many different reasons, four common causes are [1]:
- An assessor using a different range of the scale, often called magnitude error.
- An assessor not being able to discriminate between a set of products in contrast to other assessors, often called a non-discriminator error.
- Crossover error, where an assessor rates a product in a different direction to most other assessors.
- Non-perceiver error, where an assessor does not perceive a particular attribute.
Reproducibility
Reproducibility is a measure of the consistency of the panel across replicates. This is calculated from the Product:Replicate interaction in the panel level ANOVA models. A statistically significant reproducibility indicates that the products are perceived differently across replicates; this can have a variety of causes, for example the panellists learning or instabilities in the products.
The reproducibility for each attribute is displayed in the PANEL SUMMARY table and highlighted “Good”, “Borderline”, “Poor” or “Bad” based on commonly used thresholds.
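In the same hypothetical setup as the sketches above, this value would be read from the Product:Replicate row of the panel-level ANOVA table:

```r
# Reproducibility: the Product:Replicate interaction p-value from the
# panel-level model; a significant value suggests products are perceived
# differently across replicates.
p_reprod <- anova(panel_lm)["Product:Replicate", "Pr(>F)"]
```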
Repeatability
Repeatability is a measure of the consistency of the assessors in evaluating the same products. This is calculated from the model Attribute = Product + Assessor + Product:Assessor + Residuals, partitioning the standardised residuals of this model by assessor and comparing them to the appropriate chi-squared distribution. A larger repeatability p-value is more desirable.
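A loose sketch of this procedure, under the same hypothetical column names as above and assuming no missing values; the degrees of freedom used by the actual implementation are not documented here, so the per-assessor residual count is used as a stand-in:

```r
# Fit the repeatability model and standardise its residuals.
rep_lm <- lm(Sweetness ~ Product + Assessor + Product:Assessor,
             data = profiling)
z <- residuals(rep_lm) / summary(rep_lm)$sigma

# Partition the squared standardised residuals by assessor and compare each
# sum to a chi-squared distribution (the df here is an assumption, see above).
chisq_by_assessor <- tapply(z^2, profiling$Assessor, sum)
df_by_assessor    <- tapply(z,   profiling$Assessor, length)
p_repeat <- pchisq(chisq_by_assessor, df = df_by_assessor, lower.tail = FALSE)
```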
The PANELLIST REPEATABILITY P-VALUES table lists the panellist level repeatability p-values; larger values are more desirable. These are highlighted “Good”, “Borderline”, “Poor” or “Bad” based on commonly used thresholds.
The PANELLIST REPEATABILITY table lists the repeatability standard deviations; lower values are more desirable. These are harder to interpret than the associated p-values and are mostly included for backwards compatibility.
The panel level repeatability is measured by the residual mean square of the panel level ANOVA model for each attribute. This is displayed in the PANEL SUMMARY table and highlighted based on the quantiles of the panel level repeatability across all attributes: values less than the 75% quantile are highlighted “Good”, values between the 75% and 90% quantiles “Borderline”, values between the 90% and 95% quantiles “Poor”, and values greater than the 95% quantile “Bad”.
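As a small sketch of that highlighting rule, with made-up residual mean squares:

```r
# Residual mean squares of the panel-level models, one per attribute
# (illustrative values only).
rms <- c(Sweetness = 0.8, Bitterness = 1.3, Sourness = 0.6, Crunchiness = 2.1)

# Quantile-based bands: below the 75% quantile "Good", 75-90% "Borderline",
# 90-95% "Poor", above the 95% quantile "Bad".
q <- quantile(rms, probs = c(0.75, 0.90, 0.95))
cut(rms, breaks = c(-Inf, q, Inf),
    labels = c("Good", "Borderline", "Poor", "Bad"))
```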
Summary tables
The PANELLIST SUMMARY (%) table summarises the panellist level results, giving, for each measure of performance, the proportion of attributes on which a panellist performs “Good”, additionally counting “Borderline” if the “Include borderline” option is chosen.
Similarly, the PANEL SUMMARY (%) table summarises the results in the PANEL SUMMARY table, giving for each term the proportion of “Good” attributes, additionally counting “Borderline” if the “Include borderline” option is chosen.
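The proportion calculation itself is simple; a minimal sketch with made-up ratings:

```r
# Per-attribute ratings for one panellist and one performance measure
# (illustrative values only).
ratings <- c("Good", "Borderline", "Good", "Poor", "Bad")

include_borderline <- TRUE
counted <- if (include_borderline) c("Good", "Borderline") else "Good"
100 * mean(ratings %in% counted)
#> 60
```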
Technical notes
- R packages: SensoMineR
- R function settings that are not otherwise visible to the user:
- All attributes are coerced to numeric and any attribute that is not coercible to numeric or has zero variance will be discarded.
- Replicates are required.
- If there is a perfect fit then the panellist performance steps may fail in a way that currently has no good fix. If this happens, the available output will be returned along with a warning table.
- The panellist agreement table is unreliable if the dataset is too imbalanced, and it will be removed in that case.
References
[1] M. Kermit and V. Lengard, “Assessing the performance of a sensory panel–panellist monitoring and tracking,” Journal of Chemometrics, vol. 19, no. 3, pp. 154-161, 2005.