To provide a penalty analysis of a consumer data set, that is, to investigate how liking or acceptability of a product decreases when product attributes are not at their optimal intensity.
Note: for EyeOpenR to read your data, the first five columns must include the following in the specified order: Assessor, Product, Session, Replica and Sequence. Sensory attributes or consumer variables start from column six (Column F). If there is no session, replica or sequence information available, the user should input a value of “1” in each cell in the column that contains no collected information. See the example ‘Consumer.xlsx’ dataset for more information.
Penalty Analysis (JAR and CATA)
Penalty Analysis (PA) is a popular consumer science analysis method. It examines how the liking or acceptability of a product decreases when product attributes are not at their optimal intensity. Liking is typically measured on a 7- or 9-point hedonic scale, while attribute intensities are measured using either just-about-right (JAR) or check-all-that-apply (CATA; binary) scales. Note that both liking and product attribute data are provided by the same consumer.
JAR questions ask the consumer to rate the intensity of a specified attribute in a product on an agreement scale, the most common being a five-point scale: much too little, too little, just about right, too much and much too much. Although other variants exist (e.g., 7-point JAR), all pivot around a central about right response. For CATA data the scale is binary: the attribute is either perceived (1) or absent (0).
PA works by first assessing the distribution of JAR/CATA responses for each level of the respective scale: the absolute number and percentage of consumers responding at each level are calculated. For JAR data, some pre-processing follows: it is most common to merge categories to form three final levels for subsequent analysis: too little, about right and too much. Thus, the levels much too little and too little are collapsed into one level of response, as are too much and much too much into another. The same applies to variants of the JAR scale: levels are collapsed to form three levels. EyeOpenR accepts a variety of JAR scales and will pre-process the data automatically. No pre-processing is required for CATA data.
After collapsing the JAR data to three levels, the mean liking of a product is calculated for each of the three levels. For example, the mean liking is obtained for all consumers who report Product X to have too little of attribute Y, about right intensity of Y, and too much of attribute Y, giving three mean scores per attribute, per product. The same protocol applies to CATA data: mean liking is compared between those who report an attribute present vs. absent, giving two mean scores per attribute, per product.
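The collapsing and per-level summary steps can be sketched as follows. This is an illustrative sketch, not EyeOpenR's implementation; the response values and the 5-point-to-3-level mapping are assumptions for the example.

```python
# Sketch: collapse a 5-point JAR scale to three levels and compute counts,
# proportions and mean liking per level for one product/attribute.

# Hypothetical responses: (JAR rating 1-5, liking on a 9-point hedonic scale)
responses = [
    (1, 4), (2, 5), (3, 8), (3, 7), (3, 9),
    (3, 8), (4, 6), (4, 5), (5, 3), (3, 7),
]

# Merge the extreme categories: 1-2 -> too little, 3 -> about right,
# 4-5 -> too much (EyeOpenR does this pre-processing automatically)
COLLAPSE = {1: "too little", 2: "too little",
            3: "about right",
            4: "too much", 5: "too much"}

def summarise(responses):
    """Return per-level count, proportion of consumers and mean liking."""
    levels = {}
    for jar, liking in responses:
        levels.setdefault(COLLAPSE[jar], []).append(liking)
    n = len(responses)
    return {lvl: {"count": len(v),
                  "proportion": len(v) / n,
                  "mean_liking": sum(v) / len(v)}
            for lvl, v in levels.items()}

summary = summarise(responses)
```

For CATA data the same summary is computed over just two groups (attribute present vs. absent), with no collapsing step.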
PA for JAR data continues by calculating the difference in mean liking between the about right group and each of the two non-optimal groups. These differences are known as penalties or mean drops in the literature. A weighted penalty is also computed, which multiplies the mean drop by the proportion of consumers in the non-optimal group. For CATA data, the difference in means is known as the mean impact: it represents the difference in liking due to an attribute being present vs. absent in a product.
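A minimal sketch of these calculations, assuming per-level summaries like those produced in the previous step (the numbers are illustrative, not from a real data set):

```python
# Sketch: mean drops (penalties), weighted penalties and CATA mean impact.

def penalties(summary):
    """Mean drop = liking(about right) - liking(non-optimal level);
    the weighted penalty multiplies the drop by the proportion responding."""
    jar_ok = summary["about right"]["mean_liking"]
    out = {}
    for level in ("too little", "too much"):
        drop = jar_ok - summary[level]["mean_liking"]
        out[level] = {"mean_drop": drop,
                      "weighted": drop * summary[level]["proportion"]}
    return out

# Hypothetical per-level summary for one attribute of one product
summary = {
    "too little":  {"proportion": 0.2, "mean_liking": 4.5},
    "about right": {"proportion": 0.5, "mean_liking": 7.8},
    "too much":    {"proportion": 0.3, "mean_liking": 4.7},
}
pen = penalties(summary)

# CATA mean impact: mean liking when the attribute is present minus
# mean liking when it is absent
def mean_impact(liking_present, liking_absent):
    return (sum(liking_present) / len(liking_present)
            - sum(liking_absent) / len(liking_absent))
```

A positive mean impact means the attribute's presence is associated with higher liking; a negative one means its presence penalises liking.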
In summary, there are two important steps in PA: first, calculate the distribution of responses across the respective scale used; second, calculate the mean drop/impact (the difference in liking) for consumers responding too little/much vs. about right (JAR) or present vs. absent (CATA). Regarding proportions, in the case of JAR it is unreasonable to expect all consumers to rate an attribute (e.g., sweetness) as about right. A general rule of thumb is that no more than 20% of consumers should report either too little or too much, and that these two proportions should be roughly the same size (i.e., both less than 20% and both approximately equal). If fewer consumers than this threshold report non-optimal intensity then, in general, the attribute is deemed to be at about the right level for the majority of consumers.
If more than a chosen threshold report non-optimal intensity, then it is important to understand whether there is a difference in liking between this group and those reporting about right. In general, a mean drop of 1 point or more (on a 9-point hedonic scale) is interpreted as business-relevant, although this varies. Nonetheless, a relatively high mean drop combined with a high proportion of consumers reporting non-optimal intensity is cause for concern.
For CATA data, it is unlikely that 0% or 100% of consumers report that a product has a particular attribute, as the meaning of an attribute differs between consumers. Nevertheless, the same principles apply as with JAR data: first, one inspects the distribution of absent/present responses. A 20% threshold is often used, meaning that at least 20% of consumers must report an attribute as present for penalties to be interpreted. A large mean impact indicates that the attribute's presence or absence is important for consumer liking; it is crucial to compare the means of the absent and present categories correctly to determine the direction of the effect.
One popular way to present PA results is to plot the proportion and the mean drop on the X and Y axes respectively, per product. This provides diagnostic information on how one could improve a product's characteristics. Attributes in the upper-right quadrant are of concern: a high proportion of consumers (X-axis) report a non-optimal intensity level and the respective mean drop/impact is high (Y-axis). Different companies have their own action standards for what defines this area, sometimes referred to as a 'danger zone' or 'critical corner', but in general a proportion above 20% (X-axis) combined with a mean drop of more than 1 point (Y-axis) is interpreted as unsatisfactory.
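The quadrant logic behind such a plot can be sketched as a simple rule. The thresholds below mirror the common 20%/1-point convention mentioned above, but are parameters an analyst would set according to their own action standards:

```python
# Sketch: flag attributes falling in the 'critical corner' of a penalty plot.

def in_critical_corner(proportion, mean_drop, prop_cut=0.20, drop_cut=1.0):
    """True when both the share of consumers reporting non-optimal intensity
    (X-axis) and the mean drop in liking (Y-axis) exceed their thresholds."""
    return proportion > prop_cut and mean_drop > drop_cut

# e.g. 30% of consumers say 'too much' with a 1.5-point drop -> flagged
flagged = in_critical_corner(0.30, 1.5)
# a large drop reported by only 10% of consumers is not flagged
not_flagged = in_critical_corner(0.10, 2.0)
```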
As a result of PA it may be tempting to conclude that, to improve liking, the intensity needs to change in accordance with consumer feedback. Note, however, that from a developer's perspective things are never this simple: firstly, if a high proportion rate the intensity as, say, "too little", then increasing the intensity will likely affect the proportion who reported "just right"; secondly, in the domain of food products, attributes show multi-collinearity (i.e., they are correlated and unlikely to be independent); lastly, in a consumer test where liking and descriptors are rated in quick succession there is likely to be a halo effect, whereby the degree of liking colours the perceived intensity of several attributes. Nevertheless, with these cautions in mind, PA continues to be widely adopted in the industry.
Penalty Analysis with Ideal (CATA)
A recent addition to PA is to collect ratings of an ideal product, that is, consumers additionally rate whether each attribute would be present or absent in a hypothetical ideal product. When working with CATA data, there are four possible situations:
- A test product and an ideal product both have the attribute present: this can be thought of as about right in attribute intensity
- A test product and an ideal product both do not have the attribute present: this can be thought of as about right in attribute intensity
- A test product has the attribute present but an ideal product does not: this can be thought of as the test product being too much in attribute intensity
- A test product does not have the attribute present but an ideal product does: this can be thought of as the test product being too little in attribute intensity
The analysis then proceeds as described in the preceding section: the proportions of each response level are calculated, followed by the calculation of mean drops.
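The four test/ideal combinations above can be sketched as a small mapping function, coding a checked attribute as 1 and an unchecked one as 0 (an illustrative sketch, not EyeOpenR's internal implementation):

```python
# Sketch: derive a JAR-like level from paired test/ideal CATA responses
# (1 = attribute checked, 0 = not checked), following the four cases above.

def jar_level(test, ideal):
    if test == ideal:             # both present, or both absent
        return "about right"
    if test == 1 and ideal == 0:  # present in test product, absent in ideal
        return "too much"
    return "too little"           # absent in test product, present in ideal

levels = [jar_level(t, i) for t, i in [(1, 1), (0, 0), (1, 0), (0, 1)]]
```

Once each consumer's response is recoded this way, proportions and mean drops are computed exactly as for three-level JAR data.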