Penalty Analysis

Purpose

To provide a penalty analysis of a consumer data set; that is, to investigate how liking or acceptability of a product decreases when product attributes are not at their optimal intensity.

Data Format

  1. Consumer.xlsx
Note: for EyeOpenR to read your data, the first five columns must include the following in the specified order: Assessor, Product, Session, Replica and Sequence. Sensory attributes or consumer variables start from column six (Column F). If there is no session, replica or sequence information available, the user should input a value of “1” in each cell in the column that contains no collected information. See the example ‘Consumer.xlsx’ dataset for more information. 
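
The column layout described above can be checked before import with a short Python sketch. It is illustrative only and is not part of EyeOpenR: the function names and the ‘Liking’ and ‘Sweetness’ columns are hypothetical, and the data row is made up.

```python
# Required order of the first five columns, as stated in the note above.
REQUIRED = ["Assessor", "Product", "Session", "Replica", "Sequence"]

def check_header(header):
    """Return True when the first five column names match the required order."""
    return [h.strip() for h in header[:5]] == REQUIRED

def fill_missing_design_columns(row):
    """Replace empty Session/Replica/Sequence cells (columns 3 to 5) with 1."""
    filled = list(row)
    for i in (2, 3, 4):
        if filled[i] in ("", None):
            filled[i] = 1
    return filled

# Hypothetical header and data row; attributes start at column six (Column F).
header = ["Assessor", "Product", "Session", "Replica", "Sequence", "Liking", "Sweetness"]
row = ["C001", "P1", "", "", "", 7, 3]

print(check_header(header))
print(fill_missing_design_columns(row))
```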

Background

Penalty Analysis (JAR and CATA)

Penalty Analysis (PA) is a popular consumer science analysis method. It examines how liking or acceptability of a product decreases when product attributes are not at the optimal intensity. Liking is typically measured on a 7- or 9-point scale, while attribute intensities are measured using either just about right (JAR) or check all that apply (CATA; binary) scales. Note that both liking and product attribute data are provided by the same consumer.

JAR questions ask the consumer to rate the intensity of a specified attribute in a product on an agreement scale, the most common being a five-point scale: much too little, too little, just about right, too much and much too much. Although other variants exist (e.g., a 7-point JAR scale), all pivot around a central about right response. For CATA data the scale is binary: the attribute is either perceived (1) or absent (0).

PA first assesses the distribution of JAR/CATA responses across the levels of the respective scale: the absolute number and percentage of consumers who responded at each level are calculated. For JAR data, some pre-processing follows: it is most common to merge categories to form three final levels for subsequent analysis: too little, about right and too much. Thus, the levels much too little and too little are collapsed into one level of response, as are too much and much too much. The same applies to variants of the JAR scale: their levels are also collapsed to three. EyeOpenR accepts a variety of JAR scales and will pre-process the data automatically. No pre-processing is required for CATA data.
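
As an illustration of this collapsing step, the following Python sketch maps raw JAR scores to three levels and tabulates counts and percentages. It assumes a numeric coding in which the scale midpoint means about right (3 on a five-point scale); the function names and the sweetness scores are hypothetical, and this is not EyeOpenR's own pre-processing code.

```python
from collections import Counter

def collapse_jar(score, midpoint=3):
    """Collapse a raw JAR score to one of three levels around the midpoint."""
    if score < midpoint:
        return "too little"
    if score > midpoint:
        return "too much"
    return "about right"

def jar_distribution(scores, midpoint=3):
    """Return {level: (count, percentage)} for the three collapsed levels."""
    counts = Counter(collapse_jar(s, midpoint) for s in scores)
    n = len(scores)
    return {
        level: (counts[level], 100.0 * counts[level] / n)
        for level in ("too little", "about right", "too much")
    }

# Made-up five-point JAR scores from ten consumers for one product.
sweetness = [1, 2, 3, 3, 3, 3, 4, 5, 3, 3]
distribution = jar_distribution(sweetness)
print(distribution)
```

For a 7-point JAR scale coded 1 to 7, the same functions can be called with midpoint=4.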

After collapsing the JAR data to three levels, the mean liking of a product is calculated for each of the three levels. For example, the mean liking is obtained separately for the consumers who report Product X to have too little of attribute Y, about right intensity of Y, and too much of attribute Y. There are now three mean scores per attribute, per product. The same protocol applies to CATA data: mean liking is compared between those who report an attribute as present vs. absent. Thus, for CATA data there are two mean scores per attribute, per product.
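
The grouping described above can be sketched in Python as follows. The function name and the liking scores are hypothetical; the same function works for CATA data if the labels are ‘present’ and ‘absent’.

```python
from statistics import mean

def mean_liking_by_level(records):
    """records: (level, liking) pairs for one product and one attribute."""
    groups = {}
    for level, liking in records:
        groups.setdefault(level, []).append(liking)
    return {level: mean(values) for level, values in groups.items()}

# Made-up records: collapsed JAR level and 9-point liking per consumer.
records = [
    ("too little", 5), ("too little", 6),
    ("about right", 8), ("about right", 7), ("about right", 9),
    ("too much", 4),
]
means = mean_liking_by_level(records)
print(means)
```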

PA for JAR data continues by calculating the difference in mean liking between about right and each of the two non-optimal levels. These differences are known as penalties or mean drops in the literature. A weighted penalty is also computed by multiplying the proportion of consumers at a non-optimal level by the corresponding mean drop. For CATA data, the difference in means is known as mean impact: it represents the difference in liking due to an attribute being present vs. absent in a product.
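
A minimal Python sketch of the mean drop and weighted penalty calculations follows. It assumes proportions are expressed as fractions between 0 and 1 (whether EyeOpenR uses fractions or percentages is not stated here); the function name and all input values are hypothetical.

```python
def penalties(means, proportions):
    """Mean drop and weighted penalty for each non-optimal level.

    means: mean liking per collapsed JAR level.
    proportions: fraction of consumers (0 to 1) per collapsed JAR level.
    """
    result = {}
    for level in ("too little", "too much"):
        drop = means["about right"] - means[level]
        result[level] = {
            "mean_drop": drop,
            "weighted_penalty": proportions[level] * drop,
        }
    return result

# Made-up values for one product and one attribute.
means = {"too little": 5.5, "about right": 7.5, "too much": 6.0}
proportions = {"too little": 0.25, "about right": 0.5, "too much": 0.25}
out = penalties(means, proportions)
print(out)
```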

In summary, there are two important steps in PA: firstly, to calculate the distribution of responses across the respective scale used; secondly, to calculate the mean drop/impact (the difference in liking) for consumers responding too little/much vs. about right (JAR) or present vs. absent (CATA). Regarding proportions, in the case of JAR it is unreasonable to expect all consumers to rate an attribute (e.g., sweetness) as about right. A general rule of thumb is that no more than 20% of consumers should report either too little or too much, and that these two proportions should be roughly the same size (i.e., both less than 20% and approximately equal). If fewer consumers than this threshold report a non-optimal intensity then, in general, the attribute is deemed to be at a sufficiently about right level for the majority of consumers.

If more than the chosen threshold proportion report a non-optimal intensity, then it is important to understand whether liking differs between this group and those reporting about right. In general, a mean drop of 1 or more (using a 9-point hedonic scale) is interpreted as business-relevant, although this varies. Nonetheless, a relatively high mean drop combined with a high proportion of consumers reporting non-optimal intensity is cause for concern.

For CATA data, it is unreasonable to expect that 0% or 100% of consumers report that a product has a particular attribute, as the meaning of an attribute differs between consumers. Nevertheless, the same principles apply as with JAR data: firstly, one inspects the proportions of absent/present responses. A 20% threshold is often used, meaning that at least 20% of consumers must detect an attribute as present for the mean impact to be interpreted. A large mean impact indicates that the attribute being present/absent is important for consumer liking: it is crucial to correctly interpret the means of the absent and present categories to determine the direction.
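
The CATA calculation can be sketched in Python as below. The mean impact is taken here as present minus absent, following the description of the Mean Impact tab; the function name, the handling of below-threshold attributes (returning None) and the example data are assumptions made for illustration.

```python
from statistics import mean

def cata_mean_impact(ticks, liking, threshold=0.20):
    """Return (proportion_present, mean_impact) for one product and attribute.

    ticks: 0/1 per consumer; liking: liking score per consumer, same order.
    mean_impact is None when the proportion of ticks is below the threshold
    or when one of the two groups is empty.
    """
    present = [score for t, score in zip(ticks, liking) if t == 1]
    absent = [score for t, score in zip(ticks, liking) if t == 0]
    proportion = len(present) / len(ticks)
    if proportion < threshold or not present or not absent:
        return proportion, None
    return proportion, mean(present) - mean(absent)

# Made-up data: ten consumers, one CATA attribute, 9-point liking.
ticks = [1, 1, 0, 0, 0, 1, 0, 0, 1, 0]
liking = [8, 7, 5, 6, 4, 9, 5, 6, 8, 4]
proportion, impact = cata_mean_impact(ticks, liking)
print(proportion, impact)
```

A positive value means consumers who ticked the attribute liked the product more than those who did not; a negative value means the opposite.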

A popular way to present PA results is to plot the proportion of consumers against the mean drop on the X and Y axes respectively, per product. This is said to provide diagnostic information on how a product’s characteristics could be improved. Attributes in the upper right quadrant are of concern: a high proportion of consumers (X-axis) report a non-optimal intensity level and the respective mean drop/impact is high (Y-axis). Different companies have their own action standards regarding what defines this area, which is sometimes referred to as a ‘danger zone’ or ‘critical corner’, but in general a proportion of more than 20% (X-axis) combined with a mean drop of more than 1 point (Y-axis) is interpreted as unsatisfactory.
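
The ‘critical corner’ rule can be expressed as a simple check, sketched here in Python. The defaults follow the general 20% and 1-point guideline mentioned above; the function name, attribute names and values are hypothetical, and individual action standards may differ.

```python
def in_critical_corner(proportion_pct, mean_drop,
                       proportion_threshold=20.0, penalty_threshold=1.0):
    """True when both the proportion (%) and the mean drop exceed their thresholds."""
    return proportion_pct > proportion_threshold and mean_drop > penalty_threshold

# Made-up (proportion %, mean drop) pairs for one product.
points = {
    ("Sweetness", "too little"): (35.0, 1.6),
    ("Sweetness", "too much"): (10.0, 2.1),
    ("Crunchiness", "too little"): (28.0, 0.4),
}
flagged = [key for key, (p, d) in points.items() if in_critical_corner(p, d)]
print(flagged)
```

Only the first point is flagged: the second has a high mean drop but too few consumers, and the third has enough consumers but a small mean drop.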

As a result of PA it may be tempting to conclude that to improve liking, the intensity needs to change in accordance with consumer feedback. Note however, from a developer’s perspective, things are never this simple: firstly, if a high proportion rate intensity as, say, “too little”, then increasing the intensity will likely impact the proportion who reported “just right”; secondly, in the domain of food products, attributes show multi-collinearity (i.e., they are correlated and unlikely to be independent); lastly, in a consumer test where liking and descriptors are rated in quick succession there is likely to be a halo-effect, whereby the degree of liking colours the perceived intensity of several attributes. Nevertheless, with these cautions in mind, PA continues to be widely adopted in the industry. 

Penalty Analysis with Ideal (CATA)

A recent addition to PA is to collect scores of an ideal product, that is, where consumers additionally rate whether an attribute is present or absent for a hypothetical product. When working with CATA data, there are four possible situations:
  1. A test product and an ideal product both have the attribute present: this can be thought of as about right in attribute intensity
  2. A test product and an ideal product both do not have the attribute present: this can be thought of as about right in attribute intensity
  3. A test product has the attribute present but an ideal product does not: this can be thought of as the test product being too much in attribute intensity
  4. A test product does not have the attribute present but an ideal product does: this can be thought of as the test product being too little in attribute intensity 
The analysis then proceeds as described in the preceding section: the proportions of each response level are calculated, followed by the mean drops.
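
The four situations listed above amount to a small lookup, sketched here in Python. The function name is hypothetical and the sketch is not EyeOpenR's own code.

```python
def ideal_level(test_present, ideal_present):
    """Map a test/ideal pair of CATA responses to a JAR-like level."""
    if test_present == ideal_present:
        return "about right"  # situations 1 and 2
    return "too much" if test_present else "too little"  # situations 3 and 4

for test, ideal in [(1, 1), (0, 0), (1, 0), (0, 1)]:
    print(test, ideal, ideal_level(test, ideal))
```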

Options

Penalty Analysis (JAR)

Analysis options tab:
  1. Liking Variable: Select the variable to be considered as liking.
  2. JAR Scale: Select the JAR scale used for data collection or alternatively select ‘automatic’.
  3. Results by: Choose whether results are returned per product or per attribute.
  4. Number of Decimals for Values: Choose preferred number of decimal places in subsequent output.
  5. Number of Decimals for P-Values: Choose preferred number of decimal places in subsequent output (default=3).
Chart options tab:
  1. Show Threshold Block: If ‘Yes’ is selected, a red box is drawn around the upper right quadrant of the penalty plot, which shows the proportion of consumers (X-axis) against the mean drop (Y-axis). An attribute/product placed in this region indicates that an above-threshold proportion of consumers report a non-optimal intensity and that the mean drop is greater than the threshold set. In other words, there is consumer feedback that the attribute intensity is non-optimal and impacts liking.
  2. Penalty threshold: Penalty threshold is the mean drop value considered to be the threshold for calculating a penalty and for subsequent plotting.
  3. Panellist proportion threshold: Select from 10 to 30% (default 20%). Refers to the threshold (minimum) percentage of consumers who report too little or too much in order for a penalty (in the form of mean drop) to be calculated.
  4. Add lines between the attributes?: If yes, this draws a straight line between ‘too little’ and ‘too much’ per attribute.
  5. X-Axis start value: Enter minimum value of consumer proportion (default = 0).
  6. X-Axis end value: Enter maximum value of consumer proportion (default = 100).
  7. Y-Axis min value: Enter the minimum value of the mean drop to be plotted (default is -2).
  8. Y-Axis end value: Enter the maximum value of the mean drop to be plotted (default 10).

Penalty Analysis (CATA)

  1. Attribute(s) of type Liking: Select the variable to be considered as liking.
  2. Results by: Choose whether results are returned per product or per attribute.
  3. Threshold for population size: The threshold proportion of consumers who must perceive the attribute in a product in order for mean impact analysis to be performed (present minus absent, as reported in the Mean Impact tab). Default is 20%.
  4. Number of Decimals for Values: Choose preferred number of decimal places in subsequent output.
  5. Number of Decimals for P-Values: Choose preferred number of decimal places in subsequent output (default=3).

Penalty Analysis (CATA with ideal)

Analysis options tab:
  1. Ideal Product: Select the product to be considered as the ideal product.
  2. Attribute(s) of type Liking: Select the variable to be considered as liking.
  3. Results by: Choose whether results are returned per product or per attribute.
  4. Number of Decimals for Values: Choose preferred number of decimal places in subsequent output.
  5. Number of Decimals for P-Values: Choose preferred number of decimal places in subsequent output (default=3).
Display options tab:
  1. Penalty threshold: The mean drop value considered to be the threshold for calculating a penalty and for subsequent plotting.
  2. Panellist proportion threshold: Select from 10 to 30% (default 20%). Refers to the threshold (minimum) percentage of consumers who report too little or too much in order for a penalty (in the form of mean drop) to be calculated.

Results and Interpretation

Penalty Analysis (JAR)

  1. Frequency tab: Provides a stacked bar chart and table of frequencies of the three levels of categorical response (not enough; JAR; too much), expressed as percentages. This is performed per product or per attribute, depending on the option chosen.
  2. Mean Drop tab: The difference in liking between consumers who rate an attribute as about right and those who rate it as not enough or too much. In other words, this is the difference in liking between consumers who differ in their perception of attribute intensity. If a cell is shaded dark red, then the mean drop is significant (p < .05).
  3. Penalty Plot tab: The graph visualizes the proportion of consumers as a percentage on the X-axis, the same value as reported in the Frequency tab. On the Y-axis is the penalty, otherwise known as the mean drop (the same value as in the Mean Drop tab).

    If the analysis is performed by product, then there is a plot for each product. In general, a product developer wishes for a product to attain as high a JAR proportion as possible. If more than the set threshold proportion perceive an attribute intensity to be too low or too high, then it is important to check the mean drop to see whether this attribute penalizes liking of the product (see Background for more information).

    If the option of threshold block is selected (see Options), then a shaded red area is shown in the upper right quadrant of the graph: the area indicates the region where both the minimum consumer proportion and the minimum mean drop are met or exceeded. This area represents a high proportion of consumers and a high penalty: for that reason it is often called the ‘danger zone’ or ‘critical corner’. From a business perspective, product improvements are often required to address these penalties. In general, business risk decreases as the proportion and mean drop move nearer to the origin.

    If the option to ‘add lines between the attributes’ is selected, then for each attribute, a straight line connects Not Enough to Too Much. This can help interpretation by relating the not enough and too much responses, per attribute.

    Below the penalty plot is a summary table. Per product, the proportion and penalty (mean drop) are provided, alongside a p-value. The p-value is the result of a t-test comparing liking of the Not Enough/Too Much consumer subset to the JAR subset. The t-test examines whether the mean drop is significantly different from 0. In other words, it helps to answer the question ‘is the penalty significant?’.

  4. Weighted Penalty tab: This is the proportion of consumers multiplied by the penalty. The table is sorted in descending order.
  5. Information tab: Provides information on the original JAR scale (prior to pre-processing to three levels), amongst other details.
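
The t-test behind the p-values in the Penalty Plot summary table can be illustrated with a short Python sketch. The exact variant used by EyeOpenR is not stated here, so the sketch assumes Welch's unequal-variance t-test and returns only the test statistic and degrees of freedom (the p-value would then be read from a t distribution). The function name and liking scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(jar_liking, non_optimal_liking):
    """Welch t statistic and degrees of freedom for the mean drop.

    Assumption: unequal-variance (Welch) test; each group needs at least
    two consumers and non-zero variance.
    """
    n1, n2 = len(jar_liking), len(non_optimal_liking)
    a = variance(jar_liking) / n1          # squared standard error, JAR group
    b = variance(non_optimal_liking) / n2  # squared standard error, non-optimal group
    drop = mean(jar_liking) - mean(non_optimal_liking)
    t = drop / sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df

# Made-up liking scores: about right subset vs. too much subset.
t, df = welch_t([8, 7, 9, 8], [5, 6, 4, 5])
print(round(t, 3), round(df, 1))
```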

Penalty Analysis (CATA)

  1. Frequency tab: Provides a table of frequencies of the two levels of categorical response (absent; present), expressed as percentages. This is performed per product or per attribute, depending on the option chosen.
  2. Mean Impact tab: Provides the liking mean of those consumers who report an attribute as absent and the corresponding mean impact score (present minus absent). Note that only values that meet the threshold for population size are reported (see Options). A t-test is also performed with the respective p-value reported in the table.
    A p-value below 0.05 would indicate, for example, that there is a significant difference in liking of the product between those who perceived the attribute (present) and those who did not (absent), accepting a 5% probability of reporting a difference that is not truly there.
  3. Mean Impact Graph tab: Graph and corresponding table of mean impact values plotted per CATA question. Higher mean impact values indicate a greater discrepancy in liking between the attribute being present vs. absent, thus providing the analyst with information about which attributes relate most strongly to liking.
  4. Information tab: When the proportion of ticks is less than the consumer threshold a message will appear in the Information tab. So, if the analyst finds missing values in preceding tables, it may well be due to a below threshold number of ticks.

Penalty Analysis (CATA with Ideal)

As a recap, when an attribute is present in a test product but not in an ideal product, the test product has too much of said attribute; likewise, when an attribute is not present in a test product but is present in an ideal product, the test product has too low an intensity of said attribute; when the test and ideal products match, the attribute can be thought of as about right (JAR).

  1. Frequency tab: This tab presents a stacked bar chart and frequency table of too low, JAR and too much scores per product. The JAR levels are calculated as described above and in reference to a selected ideal product (as chosen at the Options stage of analysis).
  2. Penalty tab: The difference in liking of consumers who rate an attribute as about right vs. not enough or too much (as defined above). In other words, this is the difference in liking between those consumers that report intensity as optimal vs. non-optimal. If a cell is shaded dark red, then the mean drop is significant (p < .05).
  3. Penalty Plot tab: The graph visualizes the proportion of consumers as a percentage on the X-axis and the Penalty (mean drop) on the Y-axis. A red shaded area is drawn based on the user input for penalty threshold and panellist proportion threshold (consumer proportion) at the Options stage. Interpretation is the same as described for the Penalty Plot tab in the above section ‘Penalty Analysis (JAR)’.

Technical Information

  1. Use of R packages: car, SensoMineR

