Same/Different Test Analysis


Available from version:


The Same/Different Test is a discrimination test that is a variation of the paired comparison test. The assessor is presented with two samples and is asked to decide whether these samples are the same or different. There are four possible presentation orders, namely: AA, AB, BA, BB.

The analysis can be performed on data in the ‘long’ and ‘short’ formats. For the ‘long’ format it is assumed that the same assessor has evaluated at least one same (AA or BB) and at least one different (AB or BA) pair. For the ‘short’ format it is assumed that each assessor has only one pair to evaluate and therefore makes only one response. The analysis of the ‘short’ format returns the results of a Chi-Squared test for independence and includes the Thurstonian ‘Yardstick’ model (d-prime and tau). For the ‘long’ format, in addition to the ‘short’ format results, the McNemar Chi-Squared test is performed.

Alongside formal statistical tests and models, this analysis includes contingency tables and a summary of the presentation order for inspection. This also includes a paired contingency table for the McNemar test.

Data format

Note: when using the Simple Difference project template from EyeQuestion, the data format will automatically be suitable for the Same Different analysis.

The attributes must be of datatype Binary (where 1 indicates a correct answer and 0 indicates an incorrect answer). In the dataset the products must be separated by a “-” and only one of three orderings may appear: for example P01-P02, P01-P01 and P02-P02 are valid, but P02-P01 is not.

If you want to include presentation order information, then another attribute is required, for example an attribute labelled “Q1__info” to correspond to Q1. This should be formatted as text “[A]-[B]” where A and B are either 1 or 2. For example, “1-2” means product 1 was presented first and then product 2; “2-1” means product 2 was presented first and then product 1; “1-1” means product 1 was presented both times, and similarly for “2-2”.

A richer format can also be used. EyeQuestion creates this richer format, where the separator is either “-” if the panellist answered “same” or “~” if the panellist answered “different”. For example, “2~1” means product 2 was presented first, then product 1, and the panellist answered “different”; “1~1” can be read as product 1 being presented both times and the panellist answering “different”. The analysis understands this format; if there is a disagreement between the information attribute and the response attribute, the panellist’s answer is taken from the response attribute.
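As an illustration, both info formats above can be parsed in a few lines of Python (a minimal sketch with a hypothetical helper name, not part of EyeQuestion):

```python
def parse_info(value: str):
    """Split an info value such as '1-2' or '2~1' into its parts.

    Returns (first_product, second_product, answer). In the richer
    EyeQuestion format the separator encodes the panellist's answer:
    '-' means 'same' and '~' means 'different'.
    """
    sep = "~" if "~" in value else "-"
    first, second = value.split(sep)
    answer = "different" if sep == "~" else "same"
    return first, second, answer
```

For example, `parse_info("2~1")` yields `("2", "1", "different")`.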


Options

  1. Use Sessions/Replicates: Change the dataset so that each Judge by Replicate (or Session, if selected) combination is treated as a separate assessor.
  2. Design: Options are ‘long’ or ‘short’; choose whichever matches your dataset’s design. McNemar’s test is performed when ‘long’ is chosen and the Chi-Squared test of independence when ‘short’ is selected.
  3. McNemar calculation: This decides the method used for McNemar’s test: “Exact” is based on the binomial distribution and “Asymptotic” is based on the Chi-Squared distribution. Asymptotic is the default.
  4. Confidence level: This is used for the yardstick model; it sets the level of the confidence intervals for d-prime and tau.

Results and Interpretation

Presentation Order

This is a contingency table with the responses split for all four combinations of first and second product presented. For each attribute this is a table with columns First Product, Second Product, Same and Different. The First/Second Product columns include the names of the products, whereas the Same column contains the total number of same responses for that presentation order and the Different column contains the total number of different responses for that presentation order.

These tables can be useful for spotting potential presentation order bias. For example, if assessors responded ‘Same’ more often to an AB pair than to a BA pair and the design is balanced, this could indicate presentation order bias (for an unbalanced design you should instead consider the proportion of ‘Same’ responses for each pair).
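A tally of this kind can be sketched in plain Python; the orders and responses below are made-up illustration data:

```python
# Tally 'Same'/'Different' responses per presentation order, mirroring the
# table columns First Product, Second Product, Same, Different.
orders = [("P01", "P02"), ("P02", "P01"), ("P01", "P01"), ("P01", "P02")]
responses = ["Different", "Same", "Same", "Different"]

table = {}
for (first, second), resp in zip(orders, responses):
    # One row per presentation order, with counts of each response.
    row = table.setdefault((first, second), {"Same": 0, "Different": 0})
    row[resp] += 1
```

Here `table[("P01", "P02")]` ends up as `{"Same": 0, "Different": 2}`.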

The presentation order table will only be made for attributes that have a corresponding ‘info’ attribute, e.g., Q1 and Q1__info, because the info attribute includes the presentation order information.

If there are no information attributes then there won’t be any presentation order tables but there will be a relevant message in the warning table. Any info attribute that doesn’t have a corresponding attribute, for example Q2__info exists but Q2 does not, will be removed with a warning included.


Contingency Tables

This is a contingency table of the same/different responses in columns against the correct answers in rows. If there are multiple attributes then there will be a table for each.

Missing data or NAs are not included in these tables but a message will be included in the warning table if they are detected.
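The response-versus-correct-answer tabulation can be sketched as follows (illustrative data; the pair labels follow the P01-P02 format described under Data format):

```python
from collections import Counter

def correct_answer(pair: str) -> str:
    # A pair such as 'P01-P01' is a same pair; 'P01-P02' is a different pair.
    a, b = pair.split("-")
    return "same" if a == b else "different"

pairs = ["P01-P01", "P01-P02", "P02-P02", "P01-P02"]
responses = ["same", "same", "different", "different"]

# Counter keyed by (correct answer, response): rows vs columns of the table.
cross = Counter(zip(map(correct_answer, pairs), responses))
```

For instance `cross[("different", "same")]` counts the different-pairs that were called ‘same’.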

Yardstick model

This is a Thurstonian model; we’ll call it the ‘yardstick’ model, but its name varies in the literature (for example, tau skimming or differencing). Essentially it models the assessors as using the following decision strategy: take the absolute difference between the two products observed, then answer ‘different’ if this is greater than a threshold, otherwise answer ‘same’. This threshold is called tau or 𝜏.
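The decision strategy can be written down directly (a sketch of the decision rule itself, not of the fitted model):

```python
def yardstick_response(x1: float, x2: float, tau: float) -> str:
    """Tau ('yardstick') decision rule: answer 'different' only when the
    absolute perceptual difference between the two samples exceeds tau."""
    return "different" if abs(x1 - x2) > tau else "same"
```

For example, with tau = 1.0 a perceived pair (1.5, 0.2) is called ‘different’ while (0.8, 0.2) is called ‘same’.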

The yardstick model is fitted via a general linear modelling approach. If the fit does not converge then a warning message is included. The model also cannot be fitted if there is insufficient data.

If the model can be fitted then the information is displayed in a table of parameters, estimates, standard errors, confidence intervals at the chosen level, and p-values. These p-values are for the null hypothesis that d-prime equals 0, with the alternative hypothesis that d-prime does not equal 0.

The confidence intervals and p-values are calculated using the profile likelihood (not the observed Fisher’s information).

If there are multiple attributes then a model is fitted for each attribute.

The d-prime estimate is the estimated sensory distance between the two samples, and the associated p-value indicates whether the samples are significantly different, and at what level. Whether you conclude that the samples are different depends on the risk you are willing to take.

The tau parameter conveys information about the strategy used by the assessors. A large estimate of tau with a significant p-value suggests a bias towards responding ‘Same’. However, care must be taken in interpreting this, because if the two products present in the experiment are similar, then a bias towards responding ‘Same’ is an effective strategy.


Chi-Squared Test of Independence

A chi-squared test of independence is performed for every contingency table with no cell count less than 4. This is done without continuity correction. The chi-squared test assesses whether the rows and columns of the contingency table are associated.

If the chi-squared test is (statistically) significant then the contingency table can suggest which conclusion should be drawn about the relationship between the two products.

The results of all these tests are collected into a single table of the Chi-Squared statistics, the degrees of freedom and p-values.

This test will only be performed when the Design option is set to ‘short’. 
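For a 2x2 table the test without continuity correction can be computed with the standard library alone, since with 1 degree of freedom the chi-squared survival function reduces to erfc(sqrt(x/2)). The counts below are illustrative, not real data:

```python
import math

def chi2_independence_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 table,
    without continuity correction (1 degree of freedom)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = [a + b, c + d], [a + c, b + d]
    chi2 = 0.0
    for i, obs_row in enumerate(table):
        for j, obs in enumerate(obs_row):
            expected = rows[i] * cols[j] / n
            chi2 += (obs - expected) ** 2 / expected
    # With 1 df the chi-squared survival function is erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Rows = correct answer (same, different); columns = response (same, different).
chi2, p = chi2_independence_2x2([[30, 10], [15, 25]])
```

With these counts the statistic is about 11.43 on 1 degree of freedom, a highly significant association.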

McNemar Contingency

This tabulates the assessor responses based on their response to a same-pair and to a different-pair. The response to the same-pair corresponds to the rows; the response to the different-pair to the columns.

If an assessor did not see both a same-pair and a different-pair then this assessor does not contribute to this table. If an assessor evaluated more than one same-pair or more than one different-pair then the responses are tabulated via proportions.

This table will only be shown when the Design option is set to ‘long’.
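The proportional tabulation can be sketched as follows (the assessor data and the tuple layout are assumptions for illustration only):

```python
from collections import defaultdict

# Each assessor: (responses to same-pairs, responses to different-pairs).
assessors = {
    "J1": (["same"], ["different"]),
    "J2": (["same", "different"], ["different"]),
    "J3": (["same"], []),  # saw no different-pair: excluded from the table
}

# Keyed by (same-pair response, different-pair response).
table = defaultdict(float)
for same_resps, diff_resps in assessors.values():
    if not same_resps or not diff_resps:
        continue  # the assessor must have evaluated both pair types
    # Multiple evaluations contribute proportions, not full counts.
    weight = 1 / (len(same_resps) * len(diff_resps))
    for r_same in same_resps:
        for r_diff in diff_resps:
            table[(r_same, r_diff)] += weight
```

Here J2 evaluated two same-pairs, so each of their combinations contributes 0.5 rather than 1.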


McNemar’s Test

McNemar’s Chi-Squared test is performed on the McNemar Contingency table, as recommended by Lawless & Heymann (2010).

It tests the hypothesis of marginal homogeneity: whether the marginal proportions in the associated table are equal. In this case a significant p-value indicates a difference between the two products present in the experiment.

The method used depends on the option chosen.

For “Asymptotic” (default) the chi-squared statistic is calculated with continuity correction and the p-value is from the appropriate chi-squared distribution. Asymptotic statistics are recommended in recent same different test literature (Fagerland, Lydersen & Laake, 2013; Pembury Smith & Ruxton, 2020).

For “Exact”, the chi-squared statistic is calculated without continuity correction and the p-value is calculated from the binomial distribution with N = number of incorrect answers and p = 0.5, with the observed value being the number of one type of incorrect answer (the test is two-sided, so which type it is does not matter). If the data make it impossible to use the “Exact” method then the “Asymptotic” method is used instead, and this is noted in the warnings table.

This test will only be performed when the Design option is set to ‘long’.
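Both calculation methods can be sketched from the two discordant counts, here denoted b and c for the two types of incorrect answers (a standard-library sketch, not the production implementation):

```python
import math

def mcnemar(b: int, c: int, method: str = "asymptotic"):
    """Sketch of McNemar's test on the discordant counts b and c.

    'asymptotic': chi-squared statistic with continuity correction,
    p-value from the chi-squared distribution (1 df).
    'exact': statistic without continuity correction, p-value from the
    two-sided binomial test with N = b + c and p = 0.5.
    """
    n = b + c
    if method == "asymptotic":
        chi2 = (abs(b - c) - 1) ** 2 / n
        # With 1 df the chi-squared survival function is erfc(sqrt(x / 2)).
        return chi2, math.erfc(math.sqrt(chi2 / 2))
    chi2 = (b - c) ** 2 / n
    k = min(b, c)  # two-sided, so the smaller tail is doubled
    p = 2 * sum(math.comb(n, i) for i in range(k + 1)) * 0.5 ** n
    return chi2, min(p, 1.0)
```

For example, with b = 2 and c = 10 the asymptotic statistic is (|2 - 10| - 1)^2 / 12 = 49/12 and the exact p-value is 2 x (C(12,0) + C(12,1) + C(12,2)) / 2^12 = 158/4096.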

Technical Information

  1. The yardstick model uses the R package ‘sensR’.


References

  1. Ennis, J. M., & Jesionka, V. (2011). The Power of Sensory Discrimination Methods Revisited. Journal of Sensory Studies, 26(5), 371–382.
  2. Fagerland, M. W., Lydersen, S., & Laake, P. (2013). The McNemar test for binary matched-pairs data: Mid-p and asymptotic are better than exact conditional. BMC Medical Research Methodology, 13, 91.
  3. Lawless, H. T., & Heymann, H. (2010). Sensory Evaluation of Food: Principles and Practices (2nd ed.). Springer-Verlag.
  4. Pembury Smith, M. Q. R., & Ruxton, G. D. (2020). Effective use of the McNemar test. Behavioral Ecology and Sociobiology, 74(11), 133.
  5. Christensen, R. H. B., & Brockhoff, P. B. (2009). Estimation and inference in the same–different test. Food Quality and Preference, 20(7), 514–524.
