The Same Different Test is a discrimination test that is a variation of the paired comparison test. The assessor is presented with two samples and is asked to decide whether these samples are the same or different. There are four possible presentation orders, namely: AA, AB, BA, BB.

The analysis can be performed on data in the ‘long’ or ‘short’ format. For the ‘long’ format, it is assumed that the same assessor has evaluated at least one same pair (AA or BB) and at least one different pair (AB or BA). For the ‘short’ format, it is assumed that each assessor has only one pair to evaluate and therefore makes only one response. The analysis of the ‘short’ format returns the results of a Chi-Squared test for independence and includes the Thurstonian ‘Yardstick’ model (d-prime and tau). For the ‘long’ format, the McNemar Chi-Squared test is performed in addition to the short-format analysis.

Alongside formal statistical tests and models, this analysis includes contingency tables and a summary of the presentation order for inspection. This also includes a paired contingency table for the McNemar test.

*Note: when using the Simple Difference project template from EyeQuestion, the data format will automatically be suitable for the Same Different analysis.*

The attributes must be of datatype Binary (where 1 indicates a correct answer and 0 indicates an incorrect answer). In the dataset, the two products in a pair must be separated by a “-”, and only three pair labels may appear: the two same pairs and one ordering of the different pair. For example, P01-P02, P01-P01 and P02-P02 are allowed, but P02-P01 is not.

If you want to include presentation order information, then another attribute is required, for example an attribute labelled “Q1__info” to correspond to Q1. This should be formatted as text “[A]-[B]” where A and B are either 1 or 2. For example, “1-2” means *product 1* was presented first, then *product 2*; “2-1” means *product 2* was presented first, then *product 1*; “1-1” means *product 1* was presented both times, and similarly for “2-2”.

A richer format can also be used. EyeQuestion creates this richer format, in which the separator is “-” if the panellist answered “same” and “~” if the panellist answered “different”. For example, “2~1” means *product 2* was presented first, then *product 1*, and the panellist answered “different”; “1~1” means *product 1* was presented both times and the panellist answered “different”. The analysis understands this format. If the info attribute disagrees with the attribute that records what the panellist answered, the panellist's answer is taken from the answer attribute rather than from the info attribute.
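As an illustration of the two info formats described above, here is a minimal Python sketch of how such a string could be read. It is not the parser used by the analysis; the function name `parse_info`, the `rich` flag and the `PresentationInfo` type are hypothetical names introduced only for this example.

```python
import re
from typing import NamedTuple, Optional


class PresentationInfo(NamedTuple):
    first: int             # product presented first (1 or 2)
    second: int            # product presented second (1 or 2)
    answer: Optional[str]  # "same", "different", or None if not encoded


# One digit (1 or 2), a separator ("-" or "~"), then another digit (1 or 2).
_INFO_PATTERN = re.compile(r"^([12])([-~])([12])$")


def parse_info(value: str, rich: bool = False) -> PresentationInfo:
    """Parse an info string such as "1-2" or "2~1" (hypothetical helper).

    With rich=False a "-" only separates the products and carries no answer.
    With rich=True a "-" is read as a "same" answer.
    A "~" is always read as a "different" answer.
    """
    match = _INFO_PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognised presentation info: {value!r}")
    first, separator, second = match.groups()
    if separator == "~":
        answer = "different"
    elif rich:
        answer = "same"
    else:
        answer = None
    return PresentationInfo(int(first), int(second), answer)


print(parse_info("1-2"))             # plain format, order only
print(parse_info("2~1", rich=True))  # richer format, order and answer
```

The `rich` flag is needed in this sketch because a “-” on its own does not reveal whether the string came from the plain or the richer format.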

- **Use Sessions/Replicates**: Restructures the dataset so that each Judge by Replicate (or Session, if selected) combination is treated as a separate assessor.
- **Design**: Options are ‘long’ or ‘short’; choose the one that matches the design of your dataset. McNemar’s test is performed when ‘long’ is chosen and the Chi-Squared test of independence is performed when ‘short’ is selected.
- **McNemar calculation**: The method used for McNemar’s test. “Exact” is based on the binomial distribution and “Asymptotic” is based on the Chi-Squared distribution. Asymptotic is the default.
- **Confidence level**: The level used for the yardstick model’s confidence intervals for d-prime and tau.

This is a contingency table with the responses split for all four combinations of first and second product presented. For each attribute this is a table with columns First Product, Second Product, Same and Different. The First/Second Product columns include the names of the products, whereas the Same column contains the total number of same responses for that presentation order and the Different column contains the total number of different responses for that presentation order.

These tables can be useful for spotting potential presentation order bias. For example, if assessors responded ‘Same’ more often to an AB pair than to a BA pair and the design is balanced, then this could indicate presentation order bias (for an unbalanced design you should instead compare the proportion of ‘Same’ responses for each pair).
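The proportion comparison mentioned above can be sketched in a few lines of Python. This is only an illustration: the function name `same_proportion_by_order` and the example rows are hypothetical, not output of the analysis.

```python
from collections import defaultdict


def same_proportion_by_order(rows):
    """Return the proportion of 'same' responses per presentation order.

    rows: iterable of (first_product, second_product, response) tuples,
    where response is either "same" or "different".
    """
    counts = defaultdict(lambda: {"same": 0, "different": 0})
    for first, second, response in rows:
        counts[(first, second)][response] += 1
    return {
        order: tally["same"] / (tally["same"] + tally["different"])
        for order, tally in counts.items()
    }


# Hypothetical responses: AB pairs are called "same" more often than BA pairs.
example_rows = [
    ("A", "B", "same"), ("A", "B", "same"),
    ("A", "B", "same"), ("A", "B", "different"),
    ("B", "A", "same"), ("B", "A", "different"),
    ("B", "A", "different"), ("B", "A", "different"),
]
proportions = same_proportion_by_order(example_rows)
print(proportions[("A", "B")], proportions[("B", "A")])
```

A large gap between the AB and BA proportions is a prompt for a closer look rather than a formal test of order bias.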

The presentation order table will only be made for attributes that have a corresponding ‘info’ attribute, e.g., Q1 and Q1__info, because the info attribute includes the presentation order information.

If there are no information attributes then there won’t be any presentation order tables but there will be a relevant message in the warning table. Any info attribute that doesn’t have a corresponding attribute, for example Q2__info exists but Q2 does not, will be removed with a warning included.

This is a contingency table of the same/different responses
in columns against the correct answers in rows. If there are multiple
attributes then there will be a table for each.

Missing data or NAs are not included in these tables but a
message will be included in the warning table if they are detected.
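To make the link between the Binary attribute and this table concrete, the sketch below rebuilds the same/different responses from a pair label and a correct/incorrect score. It is a simplified illustration, assuming product names contain no “-” of their own; `response_table` and the example records are hypothetical.

```python
from collections import Counter


def response_table(records):
    """Cross-tabulate the correct answer against the assessor's response.

    records: iterable of (pair_label, correct), where pair_label looks like
    "P01-P02" and correct is 1 (correct), 0 (incorrect) or None (missing).
    Returns (table, n_missing); table keys are (correct_answer, response).
    """
    table = Counter()
    n_missing = 0
    for pair_label, correct in records:
        if correct is None:
            n_missing += 1  # missing values are counted, not tabulated
            continue
        left, _, right = pair_label.partition("-")
        truth = "same" if left == right else "different"
        if correct == 1:
            response = truth
        else:
            # An incorrect answer is the opposite of the correct answer.
            response = "different" if truth == "same" else "same"
        table[(truth, response)] += 1
    return table, n_missing


example_records = [
    ("P01-P02", 1), ("P01-P02", 0),
    ("P01-P01", 1), ("P02-P02", 0),
    ("P01-P01", None),
]
table, n_missing = response_table(example_records)
print(dict(table), n_missing)
```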

This is a Thurstonian model. We’ll call it the ‘yardstick’ model, but its name varies in the literature, for example tau skimming or differencing. Essentially, it models the assessors as using the following decision strategy: take the absolute difference between the two products as perceived, answer ‘different’ if this difference is greater than a threshold, and answer ‘same’ otherwise. This threshold is called tau or 𝜏.
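The decision rule can be turned into response probabilities with a short sketch. It assumes the usual Thurstonian setup, which is not spelled out above: the two perceptions are independent and normally distributed with unit variance, with means that differ by delta (d-prime) for a different pair and are equal for a same pair. Their difference then has variance 2, hence the division by the square root of 2. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Standard normal cumulative distribution function.
_phi = NormalDist().cdf


def p_same_given_same_pair(tau: float) -> float:
    """P(answer 'same') when both samples are the same product."""
    return 2 * _phi(tau / sqrt(2)) - 1


def p_same_given_different_pair(tau: float, delta: float) -> float:
    """P(answer 'same') when the samples differ by delta (d-prime)."""
    return _phi((tau - delta) / sqrt(2)) - _phi((-tau - delta) / sqrt(2))


# With delta = 0 the two kinds of pair are indistinguishable, so both
# probabilities coincide; a larger delta makes 'same' answers rarer.
print(p_same_given_same_pair(1.0))
print(p_same_given_different_pair(1.0, 0.0))
print(p_same_given_different_pair(1.0, 2.0))
```

Under these assumptions, a larger tau raises the probability of a ‘same’ answer for every pair, while a larger delta lowers it for different pairs only.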

The yardstick model is fitted via a general linear modelling approach. If the fit does not converge then a warning message is included. The model also cannot be fitted if there is insufficient data.

If the model can be fitted, then the information is displayed in a table of parameters, estimates, standard errors, confidence intervals at the chosen level, and p-values. These p-values are for the null hypothesis that delta equals 0, against the alternative hypothesis that delta does not equal 0.

The confidence intervals and p-values are calculated using the profile likelihood (not the observed Fisher’s information).

If there are multiple attributes then a model is fitted for each attribute.

The d-prime estimate is the estimated sensory distance between the two samples, and the associated p-value indicates whether the samples are significantly different, and at what level. Whether you conclude that the samples are different depends on the level of risk you are willing to accept.

The tau parameter conveys information about the strategy used by the assessors. A large estimate of tau with a significant p-value suggests a bias towards responding ‘Same’. However, care must be taken in interpreting this because if the two products (present in the experiment) are similar then a bias towards responding ‘Same’ is an effective strategy.

A chi-squared test of independence is done for every contingency table with no cells less than 4. This is done without continuity correction. The chi-squared test itself tests whether the rows and columns of the contingency table are associated.

If the chi-squared test is (statistically) significant then the contingency table can suggest which conclusion should be drawn about the relationship between the two products.

The results of all these tests are collected into a single table of the Chi-Squared statistics, the degrees of freedom and p-values.

This test will only be performed when the Design option is set to ‘short’.
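For readers who want to check a result by hand, the following is a minimal sketch of a chi-squared test of independence on a 2x2 table without continuity correction. It is not the code used by the analysis; `chi_squared_2x2` is a hypothetical name, and the p-value uses the fact that a chi-squared variable with 1 degree of freedom is the square of a standard normal variable.

```python
from math import erfc, sqrt


def chi_squared_2x2(table, min_cell=4):
    """Chi-squared test of independence for a 2x2 table of counts.

    table: ((a, b), (c, d)). Returns (statistic, df, p_value), or None when
    any cell is below min_cell (mirroring the rule that tables with cells
    less than 4 are not tested). No continuity correction is applied.
    """
    cells = [count for row in table for count in row]
    if any(count < min_cell for count in cells):
        return None
    n = sum(cells)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-squared distribution with 1 df.
    p_value = erfc(sqrt(statistic / 2))
    return statistic, 1, p_value


# Hypothetical counts: rows are the correct answers, columns the responses.
result = chi_squared_2x2(((20, 10), (10, 20)))
print(result)
```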

This tabulates the assessor responses based on their response to a same-pair and to a different-pair. The assessor responses to the same-pair correspond to the rows; the responses to the different-pair correspond to the columns.

If an assessor did not see both a same-pair and a different-pair then this assessor does not contribute to this table. If an assessor evaluated more than one same-pair or more than one different-pair then the responses are tabulated via proportions.

This table will only be shown when the Design option is set to ‘long’.

McNemar’s Chi Squared test is performed on the McNemar Contingency Table, as recommended by Lawless & Heymann (2010).

It tests the hypothesis of marginal homogeneity, that is, whether the marginal proportions in the associated table are equal. In this case, a significant p-value indicates a difference between the two products (present in the experiment).

The method used depends on the option chosen.

For “Asymptotic” (default), the chi-squared statistic is calculated with continuity correction and the p-value is taken from the appropriate chi-squared distribution. Asymptotic statistics are recommended in recent literature on the McNemar test (Fagerland, Lydersen & Laake, 2013; Pembury Smith & Ruxton, 2020).

For “Exact”, the chi-squared statistic is calculated without continuity correction and the p-value is calculated from a binomial distribution with N = number of incorrect answers, p = 0.5, and the observed value being the number of one type of incorrect answer (the test is two-sided, so it does not matter which type is used). If the data do not allow the “Exact” method to be used, then the “Asymptotic” method is used instead and this is noted in the warnings table.

This test will only be performed when the Design option is set to ‘long’.
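The two calculation options can be illustrated with a small sketch based on the textbook form of McNemar’s test, which uses only the two off-diagonal counts of the paired table (called `b` and `c` here). The function names are hypothetical, and how the analysis maps its table cells onto `b` and `c` is an assumption of this example.

```python
from math import comb, erfc, sqrt


def mcnemar_asymptotic(b: int, c: int):
    """Continuity-corrected McNemar statistic and chi-squared p-value (1 df).

    Requires b + c > 0.
    """
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    p_value = erfc(sqrt(statistic / 2))  # chi-squared survival function, 1 df
    return statistic, p_value


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact p-value from a binomial distribution with p = 0.5."""
    n = b + c
    k = min(b, c)
    # Probability of k or fewer successes out of n, doubled for two sides.
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)


# Hypothetical off-diagonal counts.
print(mcnemar_asymptotic(10, 2))
print(mcnemar_exact(10, 2))
```

Swapping `b` and `c` gives the same p-value in both functions, which is the two-sided property noted above.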

- The yardstick model uses the R package ‘sensR’.

- Ennis, J. M., & Jesionka, V. (2011). The Power of Sensory Discrimination Methods Revisited. *Journal of Sensory Studies*, *26*(5), 371–382. https://doi.org/10.1111/j.1745-459X.2011.00353.x
- Fagerland, M. W., Lydersen, S., & Laake, P. (2013). The McNemar test for binary matched-pairs data: Mid-p and asymptotic are better than exact conditional. *BMC Medical Research Methodology*, *13*, 91. https://doi.org/10.1186/1471-2288-13-91
- Lawless, H. T., & Heymann, H. (2010). *Sensory Evaluation of Food: Principles and Practices* (2nd ed.). Springer-Verlag. https://doi.org/10.1007/978-1-4419-6488-5
- Pembury Smith, M. Q. R., & Ruxton, G. D. (2020). Effective use of the McNemar test. *Behavioral Ecology and Sociobiology*, *74*(11), 133. https://doi.org/10.1007/s00265-020-02916-y
- Christensen, R. H. B., & Brockhoff, P. B. (2009). Estimation and inference in the same–different test. *Food Quality and Preference*, *20*(7), 514–524. https://doi.org/10.1016/j.foodqual.2009.05.005
