Panelist Performance Analysis

Mar 2022
Author: JM

Purpose

This analysis looks at the overall panel performance in terms of Discrimination, Agreement and Repeatability or Reproducibility, and then the performance of each individual in the panel in these terms.

Data Format

  1. See the profiling dataset.
  2. The attributes should be of scale or interval type.
  3. Replications are required for this analysis.

Background

This analysis is primarily based on ANOVA models for each attribute, one model for the whole panel and then smaller individual models for each panellist. 

On the Panel level this analysis fits an ANOVA model for each attribute:
Attribute = Product + Assessor + Replicate + Product:Assessor + Product:Replicate + Assessor:Replicate + Residuals

If an attribute is constant, this model cannot be fitted, so the attribute is omitted from the analysis and a warning is given in the INFORMATION tab. The same applies if an attribute is not evaluated on all products and all replicates, or is not evaluated enough times to fit the model.

For example, to fit the model Attribute = Product + Assessor + Residuals to a dataset with 10 products and 11 assessors, the dataset needs at least 20 observations, one for each model parameter:
Intercept + Product  + Assessor = Total
    1     + (10 - 1) + (11 - 1) = 20
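
The count above can be generalised to models with interactions. The following is a minimal sketch, not the tool's own code: it assumes each term contributes the product of (levels - 1) over its factors, plus one parameter for the intercept. The function name `min_observations` and the choice of 2 replicates are illustrative assumptions.

```python
from math import prod

def min_observations(levels, terms):
    """Count ANOVA model parameters: the intercept plus, for each term,
    the product of (levels - 1) over the factors in that term."""
    return 1 + sum(prod(levels[f] - 1 for f in term) for term in terms)

# Hypothetical design: 10 products, 11 assessors, 2 replicates.
levels = {"Product": 10, "Assessor": 11, "Replicate": 2}

# Attribute = Product + Assessor + Residuals
main_effects = [("Product",), ("Assessor",)]
print(min_observations(levels, main_effects))  # 20

# The panel level model, with all two-way interactions.
panel_model = [
    ("Product",), ("Assessor",), ("Replicate",),
    ("Product", "Assessor"), ("Product", "Replicate"), ("Assessor", "Replicate"),
]
print(min_observations(levels, panel_model))  # 130
```

This is a lower bound only: with exactly as many observations as parameters there are no residual degrees of freedom left, so more observations are needed in practice to obtain p-values.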

On the Panellist level for each panellist and attribute this analysis fits the following ANOVA model:
Attribute = Product + Residuals 
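
As an illustration of the panellist level model, the sketch below computes the F statistic for the Product effect from one panellist's scores on one attribute. It is a textbook one-way ANOVA written in plain Python, not the SensoMineR implementation, and the function name and data are hypothetical.

```python
def one_way_anova_f(scores_by_product):
    """Return (F, df_product, df_residual) for Attribute = Product + Residuals."""
    groups = list(scores_by_product.values())
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-product sum of squares: spread of the product means.
    ss_product = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Residual sum of squares: spread of replicates around their product mean.
    ss_residual = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_product = len(groups) - 1
    df_residual = n_total - len(groups)
    f_stat = (ss_product / df_product) / (ss_residual / df_residual)
    return f_stat, df_product, df_residual

# Hypothetical scores from one panellist: 3 products, 2 replicates each.
scores = {"P1": [2.0, 4.0], "P2": [5.0, 7.0], "P3": [8.0, 10.0]}
print(one_way_anova_f(scores))  # (9.0, 2, 3)
```

The discrimination p-value is the upper tail probability of the F distribution with these degrees of freedom, which could be obtained with, for example, `scipy.stats.f.sf(f_stat, df_product, df_residual)`. Note that the sketch divides by zero if the residual sum of squares is zero (a perfect fit).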

Options

  1. Type of Panel Performance: Choose to calculate the Reproducibility or Repeatability on the panel and panellist level.
  2. Include borderline: Whether to include the “Borderline” attributes when counting the number of “Good” attributes (Yes or No).
  3. Number of Decimals for Values: The number of decimal places to round values to.
  4. Number of Decimals for P-Values: The number of decimal places to round p-values to.
  5. Anonymise Assessors? Choose to replace the assessor names or not. There are options for randomly generated names or names from the assessor metadata.
  6. Anonymise Products? Choose to replace the product names or not. There are options for randomly generated names or names from the product metadata. 
  7. Anonymise Attributes? Choose to replace the attribute names or not. There are options for randomly generated names or names from the attribute metadata. 

Results and Interpretation

Discrimination

Discrimination here refers to the assessors’ ability to discriminate between products. We quantify this through the ANOVA models: the discrimination p-values are the p-values associated with the product effect in these models. A lower discrimination p-value is usually more desirable because it suggests that the assessors are able to distinguish between the products.

The PANELLIST DISCRIMINATION table displays the discrimination p-values for each panellist against each attribute, highlighted “Good”, “Borderline”, “Poor” or “Bad” based on commonly used thresholds.

Similarly, the panel level discrimination for each attribute is displayed in the PANEL SUMMARY table, highlighted “Good”, “Borderline”, “Poor” or “Bad” based on commonly used thresholds. 

Agreement

Agreement here refers to the level of consensus between assessors. 

On the panel level we quantify this through the ANOVA models: the agreement p-values are the p-values associated with the Product:Assessor interaction. A higher agreement p-value is more desirable because it suggests that there is broad consensus between assessors. This information is displayed in the PANEL SUMMARY table, highlighted “Good”, “Borderline”, “Poor” or “Bad” based on commonly used thresholds.

On the panellist level we quantify this, for each assessor, by the correlation between the adjusted product means on the panel level and on the panellist level. A correlation closer to 1 is usually more desirable. The PANELLIST AGREEMENT table displays this correlation for each panellist on each attribute, together with the median across the attributes. The correlation coefficients are highlighted “Good”, “Borderline”, “Poor” or “Bad” based on commonly used thresholds. The panellist agreement should be interpreted with caution if your dataset is not balanced.
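
The panellist level agreement measure can be sketched as a Pearson correlation between two vectors of product means. The example below is illustrative only: it assumes the product means have already been computed (for balanced data the adjusted means coincide with the arithmetic means), and all names and numbers are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical product means for one attribute (4 products).
panel_means = [3.0, 5.0, 7.0, 9.0]
panellist_means = {
    "A1": [2.0, 4.0, 6.0, 8.0],  # same ranking, shifted down the scale
    "A2": [9.0, 7.0, 5.0, 3.0],  # reversed ranking (crossover)
    "A3": [4.0, 4.0, 6.0, 6.0],  # separates only two groups of products
}

for assessor, means in panellist_means.items():
    print(assessor, round(pearson(means, panel_means), 3))
```

Here A1 correlates perfectly with the panel even though the scores are shifted, because correlation ignores a constant offset, while A2 has a correlation of -1.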

Unsatisfactory agreement can occur for many different reasons. Four common causes are [1]:
  1. Magnitude error, where an assessor uses a different range of the scale.
  2. Non-discriminator error, where an assessor is not able to discriminate between a set of products, in contrast to other assessors.
  3. Crossover error, where an assessor rates a product in a different direction to most other assessors.
  4. Non-perceiver error, where an assessor does not perceive a particular attribute.

Reproducibility

Reproducibility is a measure of the consistency of the panel across replicates. This is calculated from the Product:Replicate interaction in the panel level ANOVA models. A statistically significant interaction indicates that the products are perceived differently across replicates. This can have a variety of causes, for example the panellists learning or instabilities in the products.

The reproducibility for each attribute is displayed in the PANEL SUMMARY table and highlighted “Good”, “Borderline”, “Poor” or “Bad” based on commonly used thresholds.

Repeatability

Repeatability is a measure of the consistency of the assessors in evaluating the same products. This is calculated from the model Attribute = Product + Assessor + Product:Assessor + Residuals: the standardised residuals of this model are partitioned by assessor and compared with the appropriate chi-squared distribution. A larger repeatability p-value is more desirable.

The PANELLIST REPEATABILITY P-VALUES table contains the panellist level repeatability p-values, where larger is more desirable. These are highlighted “Good”, “Borderline”, “Poor” or “Bad” based on commonly used thresholds.

The PANELLIST REPEATABILITY table contains the repeatability standard deviations, where lower is more desirable. These are harder to interpret than the associated p-values and are included mostly for backwards compatibility.
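
The exact formula behind the repeatability standard deviations is not given here. As a rough sketch under one common definition, the residuals of the Product + Assessor + Product:Assessor model are the deviations of each score from its product-by-assessor cell mean, and a per-assessor standard deviation can be pooled from those residuals. The function name and the data below are hypothetical.

```python
from collections import defaultdict
from math import sqrt

def repeatability_sd(records):
    """Pooled within-cell standard deviation per assessor.

    records: iterable of (assessor, product, score), one row per replicate.
    """
    cells = defaultdict(list)
    for assessor, product, score in records:
        cells[(assessor, product)].append(score)

    ss = defaultdict(float)  # residual sum of squares per assessor
    df = defaultdict(int)    # residual degrees of freedom per assessor
    for (assessor, _product), scores in cells.items():
        mean = sum(scores) / len(scores)
        ss[assessor] += sum((s - mean) ** 2 for s in scores)
        df[assessor] += len(scores) - 1

    # Assessors without replicated cells have no residual df and are skipped.
    return {a: sqrt(ss[a] / df[a]) for a in ss if df[a] > 0}

# Hypothetical data: 2 assessors, 2 products, 2 replicates.
records = [
    ("A1", "P1", 4.0), ("A1", "P1", 6.0), ("A1", "P2", 7.0), ("A1", "P2", 7.0),
    ("A2", "P1", 2.0), ("A2", "P1", 8.0), ("A2", "P2", 3.0), ("A2", "P2", 9.0),
]
print(repeatability_sd(records))
```

In this example A1 scores replicates of the same product much more consistently than A2, so A1 has the lower (more desirable) standard deviation.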

The panel level repeatability is measured by the residual mean square of the panel level ANOVA model for each attribute. This is displayed in the PANEL SUMMARY table and highlighted based on the quantiles of panel level repeatability across all attributes: values below the 75% quantile are highlighted “Good”, values between the 75% and 90% quantiles “Borderline”, values between the 90% and 95% quantiles “Poor”, and values above the 95% quantile “Bad”.
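
The quantile-based highlighting can be sketched as follows. This is an illustration rather than the tool's code: the quantile rule (linear interpolation, as in the default of R's `quantile()`) and the handling of values that fall exactly on a boundary are assumptions, and the residual mean squares are made up.

```python
from math import floor

def quantile(values, p):
    """Quantile by linear interpolation between order statistics."""
    s = sorted(values)
    h = (len(s) - 1) * p
    lo = floor(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])

def highlight(value, all_values):
    """Label a residual mean square relative to those of all attributes."""
    q75, q90, q95 = (quantile(all_values, p) for p in (0.75, 0.90, 0.95))
    if value < q75:
        return "Good"
    if value < q90:       # boundary handling here is an assumption
        return "Borderline"
    if value <= q95:
        return "Poor"
    return "Bad"

# Hypothetical residual mean squares for 9 attributes.
residual_ms = [0.8, 1.0, 1.1, 1.3, 1.5, 1.6, 2.0, 3.0, 5.0]

print(highlight(1.0, residual_ms))  # Good
print(highlight(3.0, residual_ms))  # Borderline
print(highlight(5.0, residual_ms))  # Bad
```

Because the thresholds are relative to the other attributes in the same dataset, a label describes how an attribute compares with the rest, not an absolute level of repeatability.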

Summary tables

The PANELLIST SUMMARY (%) table summarises the panellist level results. For each measure of performance, it gives the proportion of attributes on which a panellist is rated “Good” (or either “Good” or “Borderline” if the “Include borderline” option is chosen).

Similarly, the PANEL SUMMARY (%) table summarises the results in the PANEL SUMMARY table, giving for each term the proportion of attributes rated “Good” (or either “Good” or “Borderline” if the “Include borderline” option is chosen).
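
The summary percentages amount to counting labels. A minimal sketch, with a hypothetical function name and made-up labels for five attributes:

```python
def percent_good(labels, include_borderline=False):
    """Percentage of attributes labelled "Good" (optionally also "Borderline")."""
    accepted = {"Good", "Borderline"} if include_borderline else {"Good"}
    return 100.0 * sum(label in accepted for label in labels) / len(labels)

# Hypothetical discrimination labels for one panellist across 5 attributes.
labels = ["Good", "Borderline", "Poor", "Good", "Bad"]

print(percent_good(labels))                           # 40.0
print(percent_good(labels, include_borderline=True))  # 60.0
```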

Technical Information

  1. R packages: SensoMineR
  2. R function settings that are not otherwise visible to the user. 
    1. All attributes are coerced to numeric and any attribute that is not coercible to numeric or has zero variance will be discarded. 
    2. Replicates are required.
    3. If there is a perfect fit then the panellist performance steps may fail in a way which currently has no good fix. If this happens then the available output will be returned along with the warning table.
    4. The panellist agreement table is unreliable if the dataset is too imbalanced, and it will be removed if so.

References

[1] M. Kermit and V. Lengard, “Assessing the performance of a sensory panel–panellist monitoring and tracking,” Journal of Chemometrics, vol. 19, no. 3, pp. 154-161, 2005. 


