Canonical Variate Analysis (CVA) is a multivariate visualisation technique closely related to discriminant analysis, and it is sometimes known as Canonical Discriminant Analysis. CVA can be thought of as a PCA of a data table in which the observations have a pre-existing group structure that must be accounted for. In the sensory setting, the data table to be analysed is the matrix of assessor-by-sample means for each attribute or variable. The groups are the samples, and within each sample there is one mean score from each assessor. The purpose of CVA is to highlight differences between groups. Like PCA, it looks for a small set of new dimensions, each of which is a linear combination of the observed variables. However, instead of maximising the variance of each new dimension as PCA does, CVA maximises the ratio of the between-group variance to the within-group variance. Visually, this can be thought of as producing a map on which the groups, or samples, are as widely separated as possible; mathematically, it amounts to choosing the dimensions that give the most significant group effect when the scores on each dimension are analysed by a one-way ANOVA. Because of this relation to ANOVA, textbooks often discuss CVA alongside multivariate ANOVA (MANOVA), since a CVA using all possible dimensions is equivalent to a MANOVA.
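The criterion described above can be written compactly. The notation here is assumed for illustration and does not come from the text: B and W denote the between-group and within-group sums-of-squares-and-products matrices, a is the vector of weights defining a new dimension, g is the number of groups and N the total number of observations.

```latex
\lambda(\mathbf{a})
  = \frac{\mathbf{a}^{\top}\mathbf{B}\,\mathbf{a}}
         {\mathbf{a}^{\top}\mathbf{W}\,\mathbf{a}},
\qquad
\mathbf{W}^{-1}\mathbf{B}\,\mathbf{a}_k = \lambda_k\,\mathbf{a}_k,
\qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge 0,
\qquad
F_k = \lambda_k \,\frac{N - g}{g - 1}
```

Under these assumptions, the weights that maximise the ratio are the leading eigenvector of W⁻¹B, and the ratio attained is the corresponding eigenvalue. The last expression shows why the most discriminating dimension is also the one with the largest one-way ANOVA F statistic: F is the same ratio rescaled by the degrees of freedom.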
Once the first canonical variate (CV) has been extracted, further CVs that are orthogonal to, or uncorrelated with, the first can also be computed, each successive dimension being the most discriminating possible after the previous dimensions have been accounted for. The analyst therefore hopes that the first two dimensions are sufficient to explain most of the differences between the groups or samples, as the results can then be presented on a single map.
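The extraction of successive canonical variates can be sketched in a few lines. This is a minimal illustration in Python with NumPy, not the implementation used by the R packages mentioned in this document. The function name `canonical_variates`, the simulated data and the use of an eigendecomposition of W⁻¹B (W and B being the within-group and between-group sums-of-squares-and-products matrices) are all assumptions made for the example.

```python
import numpy as np


def canonical_variates(X, groups):
    """Basic CVA sketch: return (eigenvalues, weight vectors, scores)."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]

    W = np.zeros((p, p))  # within-group sums of squares and products
    B = np.zeros((p, p))  # between-group sums of squares and products
    for g in labels:
        Xg = X[groups == g]
        mean_g = Xg.mean(axis=0)
        centred = Xg - mean_g
        W += centred.T @ centred
        d = (mean_g - grand_mean)[:, None]
        B += len(Xg) * (d @ d.T)

    # Eigenvectors of W^{-1} B maximise the between/within ratio.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(W, B))
    eigvals, eigvecs = eigvals.real, eigvecs.real

    # Sort from most to least discriminating and keep the
    # min(p, number of groups - 1) dimensions that can be non-trivial.
    order = np.argsort(eigvals)[::-1]
    k = min(p, len(labels) - 1)
    eigvals = eigvals[order][:k]
    eigvecs = eigvecs[:, order][:, :k]

    scores = (X - grand_mean) @ eigvecs
    return eigvals, eigvecs, scores


# Simulated data (hypothetical): 3 groups, 10 observations each, 3 variables.
rng = np.random.default_rng(0)
group_means = np.array([[0.0, 0.0, 0.0], [2.0, 1.0, 0.0], [0.0, 3.0, 1.0]])
X = np.vstack([m + rng.normal(size=(10, 3)) for m in group_means])
groups = np.repeat(["A", "B", "C"], 10)

eigvals, weights, scores = canonical_variates(X, groups)

# Share of the total discrimination carried by each canonical variate.
proportion = eigvals / eigvals.sum()
print(proportion)
```

With three groups there are at most two non-trivial canonical variates, so in this example a single two-dimensional map shows all of the between-group separation. With more groups, `proportion` indicates how much is lost by plotting only the first two dimensions.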
Applications:
- Any data structured as groups, with replication within groups
- Consumer cluster visualisation
The R package ‘SensoMineR’ is used to pre-process the data by removing the assessor effect (scalebypanelist function), and the package ‘candisc’ is used for the CVA model itself (candisc function). The confidence circles on the product graphs assume a multivariate normal distribution. Their radius is taken as sqrt(χ²/n) (see [1], page 375), where χ² is the critical value of a chi-squared distribution with 2 degrees of freedom at the probability corresponding to the chosen confidence level, and n is the number of observations within each group (for a sensory data set, n is the number of assessors).
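The radius formula above is easy to check by hand. The sketch below is in Python and is only an illustration, not the code used by the software described here; the function name `confidence_circle_radius` is hypothetical. It relies on the fact that a chi-squared distribution with 2 degrees of freedom has the cumulative distribution function 1 - exp(-x/2), so its critical value has the closed form -2·ln(1 - level). In R the same critical value would normally be obtained with `qchisq(level, df = 2)`.

```python
import math


def confidence_circle_radius(n, level=0.95):
    """Radius sqrt(chi2 / n) of a confidence circle for a group of n observations."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    # Critical value of a chi-squared distribution with 2 degrees of freedom:
    # solving 1 - exp(-x / 2) = level for x gives x = -2 * ln(1 - level).
    chi2_crit = -2.0 * math.log(1.0 - level)
    return math.sqrt(chi2_crit / n)


# Example: 95% confidence circle for a panel of 10 assessors.
radius = confidence_circle_radius(n=10, level=0.95)
print(round(radius, 3))
```

Because n appears in the denominator, the circles shrink as the number of assessors grows: under this formula, quadrupling the panel size halves the radius.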
References