Correspondence Analysis (CATA and categorical data)

Purpose 

To visualise and summarise tabular data and to highlight the patterns of association in two-way tables. It is widely used for mapping purely qualitative variables, e.g. cluster by demographic use.

Typical data that can be mapped is a table of counts. This could come from a CATA study, counting up the number of checks, or it could be any other categorical table of counts.


Data Format

  1. The analysis is performed on count data from a cross-tabulation of two categorical variables. However, the data must be input with one column per categorical variable of interest, and each column should be interval or text data.
  2. In EyeOpenR this can be done on standard categorical data or CATA data and will work if there is a small amount of missing data.
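To illustrate the relationship between the column-per-variable input and the table of counts that is analysed, here is a minimal Python sketch. The records, the labels and the function name `cross_tabulate` are hypothetical, and skipping records with a missing value is an assumption made for the example, not a description of how EyeOpenR treats missing data.

```python
from collections import Counter

# Hypothetical raw data: one column per categorical variable
# (product, attribute). None marks a missing value.
records = [
    ("P1", "sweet"), ("P1", "sweet"), ("P1", "crunchy"),
    ("P2", "sweet"), ("P2", "bitter"), ("P2", "bitter"),
    ("P3", "crunchy"), ("P3", "crunchy"), ("P3", None),
]

def cross_tabulate(rows):
    """Build a two-way table of counts, skipping incomplete records (assumption)."""
    complete = [(r, c) for r, c in rows if r is not None and c is not None]
    counts = Counter(complete)
    row_levels = sorted({r for r, _ in complete})
    col_levels = sorted({c for _, c in complete})
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table

row_levels, col_levels, table = cross_tabulate(records)
for label, row in zip(row_levels, table):
    print(label, dict(zip(col_levels, row)))
```

Each row of `table` holds the counts for one level of the first variable, and each column holds the counts for one level of the second.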

Background 

Correspondence analysis tells you about the following:
  1. Similarities between row categories
  2. Similarities between column categories
  3. Associations between row and column categories

The analysis gives a visualisation of the row and column categories highlighting the chi-squared associations in the two way table.

The degree of association is quantified by the chi-squared statistic. The eigenvalues give the inertia explained by each dimension, usually reported as a proportion of the total inertia (inertia = chi-squared / N, where N is the total count in the table). The dimensions are then plotted on x and y axes in order to visualise them, and the maps are interpreted in a similar way to PCA maps.
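The relationship between the chi-squared statistic and the inertia can be sketched in a few lines of Python. The table of counts and the function name `chi_squared_and_inertia` are hypothetical, and the sketch assumes that no row or column of the table is entirely zero.

```python
def chi_squared_and_inertia(table):
    """Return the chi-squared statistic and the total inertia (chi-squared / N)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, chi2 / n

# Hypothetical 2 x 2 table of counts
counts = [[20, 10], [10, 20]]
chi2, inertia = chi_squared_and_inertia(counts)
print(round(chi2, 4), round(inertia, 4))
```

For this table every expected count is 15, so the chi-squared statistic is 4 x 25 / 15 and the inertia is that value divided by the total count of 60.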

Correspondence analysis finds the coordinates of successive axes, each recovering as much of the remaining inertia as possible.
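As a rough illustration of how the first axis recovers the largest share of the inertia, the sketch below runs power iteration on the matrix of standardised residuals. This is not the method used by the software (CA is normally computed with a singular value decomposition). The function name `leading_ca_axis`, the starting vector and the example table are assumptions made for the illustration.

```python
import math

def leading_ca_axis(table, iterations=200):
    """Return (principal inertia, row coordinates) for the first CA axis."""
    n = sum(sum(row) for row in table)
    r = [sum(row) / n for row in table]        # row masses
    c = [sum(col) / n for col in zip(*table)]  # column masses
    rows, cols = range(len(r)), range(len(c))
    # Standardised residuals: (p_ij - r_i * c_j) / sqrt(r_i * c_j)
    S = [[(table[i][j] / n - r[i] * c[j]) / math.sqrt(r[i] * c[j]) for j in cols]
         for i in rows]

    def s_transpose_times(u):
        return [sum(S[i][j] * u[i] for i in rows) for j in cols]

    u = [float(i + 1) for i in rows]           # arbitrary starting vector
    for _ in range(iterations):
        v = s_transpose_times(u)
        w = [sum(S[i][j] * v[j] for j in cols) for i in rows]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:                        # no association to explain
            return 0.0, [0.0 for _ in rows]
        u = [x / norm for x in w]

    # Inertia on this axis is the squared length of S^T u
    axis_inertia = sum(x * x for x in s_transpose_times(u))
    # Principal row coordinates: u_i * sqrt(inertia) / sqrt(row mass)
    coords = [math.sqrt(axis_inertia) * u[i] / math.sqrt(r[i]) for i in rows]
    return axis_inertia, coords

counts = [[20, 10], [10, 20]]                  # hypothetical table of counts
axis_inertia, row_coords = leading_ca_axis(counts)
print(round(axis_inertia, 4), [round(x, 4) for x in row_coords])
```

A 2 x 2 table has only one non-trivial dimension, so the inertia on this axis equals the total inertia of the table. The sign of the coordinates is arbitrary; only their relative positions matter.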

Options

  1. Dimensions to visualise: here you choose which set (1v2, 1v3 or 2v3) you wish to see shown as plots.
  2. Clustering: You can choose to perform cluster analysis on the products.
  3. Determine clusters: The number of clusters can be calculated automatically or specified by the user.
  4. Number of Decimals for Values: Required number of decimals for values given in the results.
  5. Number of Decimals for P-Values: Required number of decimals for any p-values given in the results.

Results and Interpretation

The output gives maps. We interpret the maps in a similar way to PCA maps. The rows and columns of the chi-squared components (associations) are plotted on the same plot. We interpret this as follows:

•       Row points close together have similar profiles across the columns.
•       Column points close together have similar profiles across the rows.
•       Row and column points in the same direction from the centre show a relatively high positive association.
•       Row and column points in opposite directions show a negative association.
•       Unlike PCA there is no direct interpretation of distances between row and column points.
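The quantities behind these rules can be checked directly on the table of counts. The Python sketch below computes row profiles, the chi-squared distance between two rows, and the standardised residuals whose sign indicates a positive or negative association. The table and the function names are hypothetical and used for illustration only.

```python
import math

def row_profiles(table):
    """Each row divided by its own total."""
    return [[x / sum(row) for x in row] for row in table]

def chi2_distance(table, i, k):
    """Chi-squared distance between the profiles of rows i and k."""
    n = sum(sum(row) for row in table)
    col_mass = [sum(col) / n for col in zip(*table)]
    prof = row_profiles(table)
    return math.sqrt(sum((a - b) ** 2 / m
                         for a, b, m in zip(prof[i], prof[k], col_mass)))

def standardised_residuals(table):
    """(observed - expected) / sqrt(expected); the sign gives the direction of association."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [[(obs - row_totals[i] * col_totals[j] / n)
             / math.sqrt(row_totals[i] * col_totals[j] / n)
             for j, obs in enumerate(row)]
            for i, row in enumerate(table)]

# Hypothetical counts: rows are products, columns are attributes
counts = [[30, 10, 10], [28, 12, 10], [5, 20, 25]]
d01 = chi2_distance(counts, 0, 1)
d02 = chi2_distance(counts, 0, 2)
residuals = standardised_residuals(counts)
print(round(d01, 3), round(d02, 3))
```

Here the first two rows have similar profiles, so their distance is small, while the third row is far from both. The residual for the first row and first column is positive, and the residual for the third row and first column is negative.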

Be careful, particularly with the last point above. We have seen many misinterpretations of the plots: they are not point/vector plots as in PCA. In addition, any attributes that are sparse in CATA data may be overweighted in the analysis, so they should be removed before mapping.

The outputs given are as follows (for both CA on regular categorical data and for CATA data):

1.       The Frequency tab shows the counts associated with each Product and Attribute, as a Product by Attribute table. The numbers in the cells are the counts for each combination.
2.       The Eigenvalues tab provides the percentage of variance associated with each of the calculated dimensions, both individually per dimension and as a cumulative total.
3.       The Products tab gives the product coordinates (as they would appear on the CA map) together with the associated Contribution (Contrib) and Squared Cosine (cos2) values. Within this tab there is also the factor map of the products, which is available to download.
4.       The Variables tab gives the coordinates of the attributes (variables) together with the associated Contribution (Contrib) and Squared Cosine (cos2) values. Within this tab there is also the factor map of the variables/attributes, which is available to download.
5.       The CA Graph tab is the classic correspondence analysis map showing the association between the Products and Variables (see the interpretation notes above).
6.       The Cluster tab gives information on the clustering, if it has been performed (it is optional). Within this, the Cluster Info tab shows the cluster number that each product has been assigned to. The Cluster Description tab highlights which Variables/Attributes are associated with each cluster of products, where they are statistically significant at the 5% level and so can be used to characterise the cluster. Note: When there are fewer than 5 products, a 2-cluster solution is forced.
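The percentages in the Eigenvalues tab and the Contrib and cos2 values in the Products and Variables tabs follow from the standard CA definitions, sketched below in Python. The masses, coordinates and function names are hypothetical. The sketch assumes that the coordinates cover all dimensions of the solution and that no point sits exactly at the origin. It is not a description of the exact output formatting of the software.

```python
def inertia_percentages(eigenvalues):
    """Percentage of inertia per dimension and the cumulative total."""
    total = sum(eigenvalues)
    percents = [100 * ev / total for ev in eigenvalues]
    cumulative, running = [], 0.0
    for p in percents:
        running += p
        cumulative.append(running)
    return percents, cumulative

def contributions_and_cos2(masses, coords):
    """Contribution (%) of each point to each axis, and its squared cosine."""
    n_axes = len(coords[0])
    # Inertia of axis k = sum over points of mass * coordinate squared
    eigenvalues = [sum(m * f[k] ** 2 for m, f in zip(masses, coords))
                   for k in range(n_axes)]
    contrib = [[100 * m * f[k] ** 2 / eigenvalues[k] for k in range(n_axes)]
               for m, f in zip(masses, coords)]
    # cos2 = share of a point's squared distance from the centre lying on axis k
    cos2 = [[f[k] ** 2 / sum(x * x for x in f) for k in range(n_axes)]
            for f in coords]
    return eigenvalues, contrib, cos2

# Hypothetical masses and principal coordinates for three products on two axes
masses = [1 / 3, 1 / 3, 1 / 3]
coords = [[0.3, 0.1], [-0.3, 0.1], [0.0, -0.2]]
eigenvalues, contrib, cos2 = contributions_and_cos2(masses, coords)
percents, cumulative = inertia_percentages(eigenvalues)
print([round(p, 1) for p in percents], [round(c, 1) for c in cumulative])
```

A high contribution means the point helps to define the axis, while a high cos2 means the point is well represented on that axis. The two can differ, because contribution depends on the mass of the point and cos2 does not.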

Technical Information

  1. R packages: FactoMineR
Further info on these packages can be found in the R documentation in the following locations: 
  1. CA {FactoMineR}
  2. HCPC {FactoMineR}
The analysis is based on CA of the contingency table, using the CA {FactoMineR} function (chi-squared distance). The clustering performed on top of it (if requested) uses the HCPC {FactoMineR} function and is performed on the rows (agglomerative hierarchical clustering, AHC, followed by K-means). Since CA is sensitive to sparse attributes, a filter can be applied based on the number of times a word has been selected.
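A filter of this kind can be sketched as follows. The function name `drop_sparse_columns`, the threshold `min_count` and the example counts are hypothetical. The actual filter and its threshold in the software may differ.

```python
def drop_sparse_columns(col_labels, table, min_count):
    """Keep only attributes selected at least min_count times in total."""
    totals = [sum(col) for col in zip(*table)]
    keep = [j for j, total in enumerate(totals) if total >= min_count]
    kept_labels = [col_labels[j] for j in keep]
    kept_table = [[row[j] for j in keep] for row in table]
    return kept_labels, kept_table

# Hypothetical CATA counts: rows are products, columns are attributes
labels = ["sweet", "bitter", "metallic"]
counts = [[12, 5, 0], [8, 9, 1], [10, 7, 0]]
kept_labels, kept_counts = drop_sparse_columns(labels, counts, min_count=5)
print(kept_labels, kept_counts)
```

In this example "metallic" was selected only once across all products, so it is removed before the table is passed to the analysis.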

References 

  1. McEwan, J. A., Schlich, P. (1991), Correspondence Analysis in Sensory Evaluation, Food Quality & Preference 3, 23-36

