Chi Squared Test

Purpose

Performs a chi-squared test on a contingency table derived from a cross-tabulation of counts on two categorical variables in your data set.  A chi-squared test is used to determine whether there is a significant association between the rows and columns of a table, e.g. whether specific combinations of products (rows) and appropriate eating occasions (columns) have a particularly low or high count.

Data Format

The software will attempt to form a contingency table between the product variable and every other attribute that looks like a categorical variable.  Tables are also formed between all pairs of categorical variables (see the attached example file Chi-Squared-Example-Data.xlsx).
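The cross-tabulation step can be illustrated with a minimal Python sketch. This is not the software's own code: the function names and the column names used in the usage note (Product, Occasion, Gender) are hypothetical, and the contents of the example spreadsheet are not reproduced here.

```python
from collections import Counter
from itertools import combinations


def cross_tabulate(records, row_var, col_var):
    """Count how often each (row level, column level) pair occurs."""
    counts = Counter((rec[row_var], rec[col_var]) for rec in records)
    row_levels = sorted({r for r, _ in counts})
    col_levels = sorted({c for _, c in counts})
    # Counter returns 0 for combinations that never occur in the data.
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table


def all_pairwise_tables(records, categorical_vars):
    """Build one contingency table for every pair of categorical variables."""
    return {
        (a, b): cross_tabulate(records, a, b)
        for a, b in combinations(categorical_vars, 2)
    }
```

For example, calling all_pairwise_tables(records, ["Product", "Occasion", "Gender"]) on a list of dictionaries with those (assumed) keys would return three tables, one per pair of variables.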

Background

A two-way table of counts is formed for each test to be performed, and a chi-squared statistic is calculated from (a) the observed values in each cell (Obs), and (b) the expected values of each cell under the null hypothesis of no association between rows and columns.  Expected values (Exp) are derived from the row and column totals: for each cell, Exp = (row total * column total) / grand total.  Each cell contributes an amount of (Obs-Exp)^2/Exp to the final chi-squared statistic, which is then compared to the standard chi-squared distribution with the appropriate degrees of freedom.  For a table with r rows and c columns the degrees of freedom are (r-1) * (c-1).
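The calculation described above can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not the software's implementation; the function name chi_squared_statistic is made up for this example, and no continuity correction is applied.

```python
def chi_squared_statistic(table):
    """Return (statistic, degrees of freedom, expected values) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Expected count under no association: row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Each cell contributes (Obs - Exp)^2 / Exp to the statistic.
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected
```

For a made-up 2x3 table such as [[10, 20, 30], [20, 20, 20]], the expected values in each row are 15, 20 and 25, and the table has (2-1) * (3-1) = 2 degrees of freedom.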

Options

  1. Test to Perform: choose between the standard chi-squared test (Chi2 option, the default) and Fisher’s exact test (Fisher option).  Fisher’s test is more computationally intensive, especially for contingency tables larger than 2x2, but it is theoretically preferred, particularly when cells in your contingency table have low counts (expected values less than 5).
  2. Apply Yates’ Correction: Yes or No (default).  Yates’ correction changes the calculation of a 2x2 table’s chi-squared statistic to adjust for the fact that the chi-squared distribution, which is a continuous distribution, is being used to approximate the discrete distribution of table counts. The option is also known as the ‘continuity correction’ and only applies to 2x2 tables.
  3. Number of decimals for values: controls the number of decimals printed in the output for the chi-squared statistic, the odds ratio, and its confidence limits.
  4. Number of decimals for p-values: controls the number of decimals printed for p-values.
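The effect of the Yates’ correction option can be sketched as follows. This is a hedged illustration rather than the software's code: the function name and the yates flag are hypothetical, and the adjustment shown (reduce each |Obs-Exp| by 0.5, but never below zero) is the usual textbook form of the continuity correction. For a 2x2 table there is 1 degree of freedom, so the p-value can be obtained from the complementary error function.

```python
import math


def chi_squared_2x2(table, yates=False):
    """Chi-squared statistic and p-value for a 2x2 table, optionally Yates-corrected."""
    (a, b), (c, d) = table
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    n = a + b + c + d

    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            diff = abs(obs - exp)
            if yates:
                # Continuity correction: shrink the difference by 0.5, not below 0.
                diff = max(diff - 0.5, 0.0)
            statistic += diff ** 2 / exp

    # Upper-tail probability of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value
```

With the correction switched on, the statistic is never larger than the uncorrected one, so the corrected p-value is never smaller.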

Results and Interpretation

  1. For each test performed, a separate table of counts is generated under the ‘Frequency’ tab.
  2. The ‘P-Values’ tab presents the significance results of all the tests in a single table.
  3. When the standard chi-squared test is selected, the ‘P-Values’ tab shows a table with one row per test.  The first column names the two categorical variables that form the frequency table, the second column gives the table’s chi-squared statistic, the third column gives the degrees of freedom, and the final column gives the associated p-value.  The choice of threshold is up to the user, but it is customary to treat any p-value below 0.05 as indicating a significant association between the rows and columns of the corresponding contingency table.
  4. When Fisher’s test is selected, the ‘P-Values’ tab shows a table with one row per test.  The first column names the two categorical variables that form the frequency table, and the final column gives the p-value of the association test.  The p-value is the total probability, under the null hypothesis, of all possible tables with the same row and column sums that are at least as extreme, in terms of the level of association, as the observed table.  It should be interpreted in the usual way: small values (typically those below 0.05) are taken to indicate a significant association between rows and columns.  For 2x2 contingency tables there is another way of looking at the degree of association, based on the odds ratio: the product of the two counts on the main diagonal divided by the product of the two counts on the back diagonal.  So, for 2x2 tables only, the ‘P-Values’ output table also contains the odds ratio and its 95% confidence limits.  Under the null hypothesis the odds ratio is 1, so significant odds ratios are those whose confidence interval does not contain 1.
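For a 2x2 table, the Fisher p-value and the odds ratio described in item 4 can be sketched in Python as below. This is an illustration under stated assumptions, not the software's implementation: the function names are hypothetical, "at least as extreme" is taken to mean "no more probable than the observed table", and the odds ratio shown is the simple cross-product ratio. R's fisher.test reports a conditional maximum likelihood estimate of the odds ratio, which can differ slightly from the cross-product value, so the numbers in the output may not match this sketch exactly.

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact test p-value for a 2x2 table of counts."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)

    # Sum the probabilities of all tables no more likely than the observed one.
    # The small relative tolerance guards against floating-point ties.
    total = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, total)


def sample_odds_ratio(table):
    """Cross-product odds ratio: main diagonal product over back diagonal product."""
    (a, b), (c, d) = table
    if b * c == 0:
        return float("inf")  # simplification for a zero back-diagonal count
    return (a * d) / (b * c)
```

For the made-up table [[3, 1], [1, 3]], five tables share the same margins, and the four that are no more probable than the observed one have a combined probability of 34/70, so the two-sided p-value is about 0.486; the cross-product odds ratio is (3*3)/(1*1) = 9.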

Technical Information

  1. The R function chisq.test from the package ‘stats’ is used for the chi-squared test.  The function fisher.test, also from ‘stats’, is used for Fisher’s test.

References

Martin Bland (2015) “An Introduction to Medical Statistics – 4th Edition”, Oxford University Press.  See chapter 13 “The analysis of cross-tabulations”.
