Canonical Variates Analysis (CVA)

Purpose 

To analyse sensory data using the multidimensional visualisation technique, Canonical Variates Analysis.

Data Format

  1. Assessor by Product matrix measured on multiple attributes – there should be at least 3 attributes, and at least 3 products. The data must be interval type data.
  2. Example dataset: Profiling.xlsx

Background 

CVA is a multivariate visualisation technique that is closely related to discriminant analysis, and it is sometimes known by the alternative name of Canonical Discriminant Analysis. CVA can be thought of as a PCA of a data table in which there is some pre-existing structure to the observations that must be accounted for. In the sensory setting the data table to be analysed is the matrix of assessor-by-sample means for each attribute or variable. The structure, or groups, are the samples, and within each sample there is a mean score from each assessor. The purpose of CVA is to highlight differences between groups. Like PCA, it looks for a small set of new dimensions, each of which is a linear combination of the observed variables. However, instead of maximising the variance of each new dimension as PCA does, CVA maximises the ratio of the between-group variance to the within-group variance. Visually, this can be thought of as producing a map where the groups, or samples, are as widely discriminated as possible. Mathematically, this amounts to choosing dimensions that give the most significant group effect when the scores on that dimension are analysed by a one-way ANOVA. Because of this relation to ANOVA, CVA is often discussed in textbooks alongside multivariate ANOVA, or MANOVA, since a CVA using all possible dimensions is equivalent to a MANOVA.

Once the first canonical variate (CV) has been extracted, further CVs that are orthogonal, or uncorrelated, to the first can also be computed, each successive dimension being the most discriminating possible after the previous dimensions have been accounted for. The analyst therefore hopes that the first 2 dimensions are sufficient to explain most of the differences between the groups or samples, as the results can then be presented on a single map.
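The construction described above can be sketched in Python with NumPy. This is an illustration only, not the code used by the analysis: the function name cva, the variable names and the synthetic data are all assumptions made for the example. B and W are taken as the between-group and within-group mean squares and products (divided by g − 1 and n − g respectively), so that each eigenvalue equals the one-way ANOVA F ratio of the scores on that dimension. Other software may scale these matrices differently.

```python
import numpy as np

def cva(X, groups):
    """Sketch of CVA: find directions maximising between/within-group variance."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n, p = X.shape
    g = len(labels)
    grand_mean = X.mean(axis=0)

    B = np.zeros((p, p))  # between-group mean squares and products
    W = np.zeros((p, p))  # within-group mean squares and products
    for label in labels:
        Xg = X[groups == label]
        centre = Xg.mean(axis=0)
        d = centre - grand_mean
        B += len(Xg) * np.outer(d, d)
        R = Xg - centre
        W += R.T @ R
    B /= g - 1
    W /= n - g

    # Whiten with the Cholesky factor of W so a symmetric eigen-solver applies;
    # the resulting eigenvalues are those of W^-1 B.
    L_inv = np.linalg.inv(np.linalg.cholesky(W))
    eigvals, V = np.linalg.eigh(L_inv @ B @ L_inv.T)
    keep = np.argsort(eigvals)[::-1][: min(p, g - 1)]
    weights = L_inv.T @ V[:, keep]        # one column of coefficients per CV
    scores = (X - grand_mean) @ weights   # canonical variate scores
    return eigvals[keep], weights, scores

# Synthetic (made-up) data: 3 products x 4 assessors, 3 attributes.
rng = np.random.default_rng(0)
products = np.repeat(["A", "B", "C"], 4)
centres = {"A": [5, 3, 6], "B": [6, 5, 4], "C": [4, 6, 5]}
X = np.array([centres[p] for p in products], dtype=float)
X += rng.normal(scale=0.5, size=X.shape)

eigvals, weights, scores = cva(X, products)
group_means = {p: scores[products == p].mean(axis=0) for p in "ABC"}
print(eigvals)
print(group_means)
```

With 3 groups there are at most 2 canonical variates, so a single map shows all of the between-group structure in this small example. The coefficients here are scaled so that each canonical variate has unit within-group variance.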

Applications:
  1. Can be used for any data structured as groups with replication within groups.
  2. Consumer cluster visualisation

Options

  1. Dimensions: Choose which canonical variates to display against each other in the product and attribute graphs: dimensions 1 vs 2, 1 vs 3 or 2 vs 3.
  2. Confidence Ellipses: Choose whether to display confidence circles on the product graph, Yes or No.
  3. Confidence Level: 90% or 95%. The coverage probability of the confidence circles. A 95% confidence level means that, if the experiment were to be repeated many times, about 95% of the circles constructed in this way would contain the true group mean.
  4. Number of Decimals for Values: Required number of decimals for eigenvalues, canonical variate scores and canonical variate loadings.

Results and Interpretation

  1. Eigenvalues: If B is the between group covariance matrix and W is the within group covariance matrix, then the eigenvalues shown are those of the matrix W⁻¹B, that is, the matrix that measures the ratio of the between to within group variance (see [1] chapter 13). The eigenvalues are also expressed as a percentage of the total variance and as a cumulative percentage. The interpretation is in terms of the amount of discrimination between groups or products. For example, if the percentage variance on the first 2 dimensions is 60% and 15% respectively, it should be concluded that the first dimension is 4 times better than the second at discriminating the groups. Dimensions with an eigenvalue less than 1 are not interesting, since this indicates that the within group variance is greater than the between group variance.
  2. Products: A table of the group mean coordinates is shown. In the sensory setting these are the average canonical variate scores calculated across assessors for each product. For the selected pair of dimensions, the group mean coordinates are plotted on a graph with optional confidence circles showing the variation in the group mean coordinates. The coordinates and graph can be interpreted as a mapping of the group or product space, and the confidence circles can be used to visualise whether products are discriminated or not by assessing the degree of overlap of the circles.
  3. Attributes: A table of the variable loadings is shown. In the sensory setting these show the importance of the contribution of each attribute to each canonical variate. The loadings can be put into rank order to show the positive and negative drivers for each canonical variate. For the selected pair of dimensions, the variable loadings are plotted as vectors pointing away from the origin of the plot. The plot should be interpreted in conjunction with the group coordinates graph. For example, a vector on the loadings plot that points toward the top right corner would be best at discriminating a group that appears closest to the top right corner on the groups plot from another group at the bottom left of the plot.
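The arithmetic behind the eigenvalue table can be illustrated with a short Python sketch. The eigenvalues below are hypothetical, chosen only so that the first two dimensions carry 60% and 15% as in the example above. They are not output from the analysis.

```python
# Hypothetical eigenvalues, chosen so dimensions 1 and 2 carry 60% and 15%.
eigenvalues = [6.0, 1.5, 1.25, 0.75, 0.5]

total = sum(eigenvalues)
percent = [100 * ev / total for ev in eigenvalues]
cumulative = [sum(percent[: i + 1]) for i in range(len(percent))]

# Relative discriminating power of dimension 1 versus dimension 2.
ratio_1_vs_2 = percent[0] / percent[1]

# Dimensions whose eigenvalue exceeds 1, i.e. between-group variance
# is greater than within-group variance.
worth_inspecting = [i + 1 for i, ev in enumerate(eigenvalues) if ev > 1]

for dim, (ev, pct, cum) in enumerate(zip(eigenvalues, percent, cumulative), start=1):
    print(f"CV{dim}: eigenvalue={ev:.2f}  percent={pct:.1f}  cumulative={cum:.1f}")
print("dimension 1 vs 2 ratio:", ratio_1_vs_2)
print("dimensions with eigenvalue > 1:", worth_inspecting)
```

In this made-up case the first two dimensions together account for 75% of the total, and only the first three dimensions have an eigenvalue above 1.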

Technical Information

The R package ‘SensoMineR’ is used to pre-process the data by removing the assessor effect (scalebypanelist function), and the package ‘candisc’ is used for the CVA model itself (candisc function). The confidence circles for the product graphs assume a multivariate normal distribution. Their radius is taken as sqrt(χ²/n) (see [1] page 375), where χ² is the critical value from a chi-squared distribution with 2 degrees of freedom at the probability corresponding to the chosen confidence level, and n is the number of observations within each group (for a sensory data set, n is the number of assessors).
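As a minimal sketch of the radius formula, the following Python function evaluates sqrt(χ²/n). The function name confidence_circle_radius and the panel size of 10 assessors are assumptions for illustration. For 2 degrees of freedom the chi-squared critical value has the closed form −2·ln(1 − p), so no statistics library is needed.

```python
import math

def confidence_circle_radius(confidence, n):
    """Radius sqrt(chi2 / n) of a confidence circle around a group mean.

    confidence: coverage probability, e.g. 0.90 or 0.95
    n: number of observations in the group (assessors, in a sensory data set)
    """
    # Chi-squared quantile with 2 degrees of freedom: the CDF is 1 - exp(-x / 2).
    chi2_crit = -2.0 * math.log(1.0 - confidence)
    return math.sqrt(chi2_crit / n)

# Hypothetical panel of 10 assessors.
r90 = confidence_circle_radius(0.90, 10)
r95 = confidence_circle_radius(0.95, 10)
print(f"90% radius: {r90:.3f}")  # about 0.679
print(f"95% radius: {r95:.3f}")  # about 0.774
```

The radius shrinks with the square root of the panel size, so quadrupling the number of assessors halves the radius of each circle.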

References 

  1. W. J. Krzanowski (1988) “Principles of Multivariate Analysis”, Clarendon Press, Oxford.