Correspondence Analysis (CATA and categorical data)

Purpose 

To visualise and summarise tabular data and to highlight the patterns of association in two-way tables. It is widely used for mapping purely qualitative variables, e.g. cluster by demographic use.

This is an example of typical data that can be mapped – this could come from a CATA study, counting up the number of checks, or it could be any other categorical table of counts:


Data Format

  1. The analysis is performed on count data from a cross-tabulation of two categorical variables, but the data must be input as one column per categorical variable of interest, and each column should contain interval or text data.
  2. In EyeOpenR this can be done on standard categorical data or CATA data, and it will still work if there is a small amount of missing data.
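The relationship between the column-per-variable input and the table of counts can be sketched as follows. This is a minimal illustration in Python, not the EyeOpenR implementation: the records, the function name cross_tabulate and the choice to skip incomplete records are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical long-format input: one (product, attribute) pair per record.
records = [
    ("P1", "sweet"), ("P1", "sweet"), ("P1", "crunchy"),
    ("P2", "sweet"), ("P2", "bitter"), ("P2", "bitter"),
    ("P3", "crunchy"), ("P3", None),  # None marks a missing response
]

def cross_tabulate(pairs):
    """Count co-occurrences of two categorical variables.

    Assumption for this sketch: records with a missing value are skipped.
    """
    complete = [(r, c) for r, c in pairs if r is not None and c is not None]
    counts = Counter(complete)
    rows = sorted({r for r, _ in complete})
    cols = sorted({c for _, c in complete})
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

rows, cols, table = cross_tabulate(records)
print(cols)
for name, row in zip(rows, table):
    print(name, row)
```

Each row of the resulting table is one category of the first variable and each column is one category of the second; this table of counts is what the correspondence analysis works on.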

Background 

Correspondence analysis tells you about the following:
  1. Similarities between row categories
  2. Similarities between column categories
  3. Associations between row and column categories

The analysis gives a visualisation of the row and column categories, highlighting the chi-squared associations in the two-way table.

The degree of association is quantified by the chi-squared statistic. The eigenvalues measure the inertia explained by each dimension, and are usually read as a proportion of the total inertia (inertia = chi-squared / N, where N is the grand total of the table). The dimensions are then plotted on x and y axes in order to visualise them, and are interpreted in a similar way to PCA maps.

Correspondence analysis finds the coordinates of successive axes, each of which recovers as much of the remaining inertia as possible.
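The link between the chi-squared statistic and the inertia can be checked by hand on a small table. The sketch below is illustrative only: the counts are made up, the function name is hypothetical, and it assumes that no row or column of the table is empty (an empty row or column would give an expected count of zero).

```python
def chi_squared_and_inertia(table):
    """Return the Pearson chi-squared statistic and the total inertia (chi2 / N)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, chi2 / n

# Made-up 2 x 2 table of counts (N = 60, every expected count is 15).
example = [[20, 10], [10, 20]]
chi2, inertia = chi_squared_and_inertia(example)
print(round(chi2, 4), round(inertia, 4))
```

The total inertia computed this way is the quantity that the eigenvalues of the successive dimensions add up to.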

Options

  1. Dimensions to visualise: here you choose which set (1v2, 1v3 or 2v3) you wish to see shown as plots.
  2. Clustering: You can choose to perform cluster analysis on the products.
  3. Determine clusters: The number of clusters can be calculated automatically or specified by the user.
  4. Number of Decimals for Values: Required number of decimals for values given in the results.
  5. Number of Decimals for P-Values: Required number of decimals for any p-values given in the results.

Results and Interpretation

The output gives maps. We interpret the maps in a similar way to PCA maps. The rows and columns of the chi-squared components (associations) are plotted on the same plot. We interpret this as follows:

•       Row points close together have similar profiles across the columns.
•       Column points close together have similar profiles across the rows.
•       Row and column points in the same direction from the centre show a relatively high positive association.
•       Row and column points in opposite directions show a negative association.
•       Unlike PCA there is no direct interpretation of distances between row and column points.

Be careful, particularly with the last point above. We have seen many misinterpretations of these plots: they are not point/vector plots as in PCA. In addition, any attributes that are sparse in CATA data may be overweighted in the analysis, so they should be removed before mapping.
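When a map is hard to read, the direction of an association between a row and a column can be checked against the table itself by comparing each observed count with the count expected under independence. The sketch below is a hedged illustration with made-up counts and a hypothetical function name; a positive value means the combination occurs more often than expected, a negative value less often. The map is a low-dimensional approximation of these values, so small associations may not be visible on it.

```python
def association_ratios(table):
    """Return observed / expected - 1 for every cell of a table of counts.

    Assumes no row or column of the table is empty.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [
        [observed / (row_totals[i] * col_totals[j] / n) - 1
         for j, observed in enumerate(row)]
        for i, row in enumerate(table)
    ]

# Made-up counts: rows are products, columns are attributes.
counts = [[20, 10], [10, 20]]
ratios = association_ratios(counts)
for row in ratios:
    print([round(value, 3) for value in row])
```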

The outputs given are as follows (for both CA on regular categorical data and for CATA data):

1.       The Frequency tab shows the counts associated with each Product and Attribute, as a Product by Attribute table. The number in each cell is the count for that combination.
2.       The Eigenvalues tab provides the percentage of variance associated with each of the calculated dimensions, both individually per dimension and as a cumulative total.
3.       The Products tab gives the product coordinates (as they would appear on the CA map) together with the associated Contribution (Contrib) and Squared Cosine (cos2) values. Within this tab there is also the factor map of the products, which is also available to download.
4.       The Variables tab gives the coordinates of the attributes (variables) together with the associated Contribution (Contrib) and Squared Cosine (cos2) values. Within this tab there is also the factor map of the variables/attributes, which is also available to download.
5.       The CA Graph tab is the classic correspondence analysis map showing the association between the Products and Variables (interpretation notes above).
6.       The Cluster tab gives information on the clustering if it has been performed (as it is an option). Within this, the Cluster Info tab shows the cluster number that each product has been grouped in. The Cluster Description tab highlights which Variables/Attributes are associated with each cluster of products, where they are statistically significant at the 5% level and can be characterized. Note: When there are fewer than 5 products, a 2-cluster solution is forced.
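For readers who want to see where the eigenvalues, coordinates, Contrib and cos2 values in these tabs come from, the following is a minimal sketch of a simple correspondence analysis computed with a singular value decomposition. It assumes NumPy is available, uses made-up counts, and is not the FactoMineR code that EyeOpenR runs; in particular, the sign of each axis is arbitrary and may be flipped relative to other software.

```python
import numpy as np

def correspondence_analysis(table):
    """Simple CA of a two-way table of counts via SVD (illustrative only)."""
    counts = np.asarray(table, dtype=float)
    P = counts / counts.sum()                # correspondence matrix
    r = P.sum(axis=1)                        # row masses
    c = P.sum(axis=0)                        # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardised residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    k = min(counts.shape) - 1                # number of non-trivial dimensions
    U, sv, Vt = U[:, :k], sv[:k], Vt[:k, :]
    eigenvalues = sv ** 2                    # inertia carried by each dimension
    percent = 100 * eigenvalues / eigenvalues.sum()
    row_coords = U * sv / np.sqrt(r)[:, None]     # principal coordinates of rows
    col_coords = Vt.T * sv / np.sqrt(c)[:, None]  # principal coordinates of columns
    row_sq = row_coords ** 2
    return {
        "eigenvalues": eigenvalues,
        "percent": percent,
        "cumulative_percent": np.cumsum(percent),
        "row_coords": row_coords,
        "col_coords": col_coords,
        # Share of each dimension's inertia due to each row, in percent.
        "row_contrib": 100 * r[:, None] * row_sq / eigenvalues,
        # Quality of representation of each row on each dimension.
        "row_cos2": row_sq / row_sq.sum(axis=1, keepdims=True),
    }

# Made-up counts: 4 products (rows) by 3 attributes (columns).
table = [[30, 10, 5], [10, 25, 10], [5, 10, 30], [12, 8, 10]]
result = correspondence_analysis(table)
print(np.round(result["percent"], 1))
print(np.round(result["row_coords"], 3))
```

In this sketch the contributions of the rows to a dimension add up to 100, and the cos2 values of a row add up to 1 across all retained dimensions, which is how these two columns are usually read.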

Technical Information

  1. R packages: FactoMineR
Further info on these packages can be found in the R documentation in the following locations: 
  1. CA {FactoMineR}
  2. HCPC {FactoMineR}
The analysis is based on the CA of the contingency table, using the CA {FactoMineR} function (chi-squared distance). The clustering that is performed on top of it (if requested) is based on the HCPC {FactoMineR} function and is performed on the rows (AHC + K-means). Since CA is sensitive to sparse attributes, a filter can be applied based on the number of times a word has been selected.
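The idea of filtering sparse attributes can be sketched as dropping every attribute whose total count falls below a threshold. This is an assumption-laden illustration: the function name, the data and the threshold of 5 are hypothetical, and the exact rule used by EyeOpenR is not described here.

```python
def drop_sparse_attributes(attributes, table, min_count):
    """Keep only the attributes (columns) selected at least min_count times in total."""
    totals = [sum(col) for col in zip(*table)]
    keep = [j for j, total in enumerate(totals) if total >= min_count]
    kept_names = [attributes[j] for j in keep]
    kept_table = [[row[j] for j in keep] for row in table]
    return kept_names, kept_table

# Made-up CATA counts: rows are products, columns are attributes.
attributes = ["sweet", "bitter", "metallic"]
cata_counts = [[12, 5, 0], [9, 7, 1], [11, 6, 1]]
kept_names, kept_table = drop_sparse_attributes(attributes, cata_counts, min_count=5)
print(kept_names)
print(kept_table)
```

Here "metallic" was selected only twice in total, so it is removed before the table is passed to the analysis.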

References 

  1. McEwan, J. A., Schlich, P. (1991), Correspondence Analysis in Sensory Evaluation, Food Quality & Preference 3, 23-36

