The Same Different Test is a discrimination test that is a variation of the paired comparison test. The assessor is presented with two samples and is asked to decide whether these samples are the same or different. There are four possible presentation orders, namely: AA, AB, BA, BB.
The analysis can be performed on data in the ‘long’ and ‘short’ format: for the ‘long’ format it is assumed that the same assessor has evaluated at least one same (AA or BB) and at least one different (AB or BA) pair. For the ‘short’ format, it is assumed that each assessor has only one pair to evaluate and therefore makes only one response. The analysis of the ‘short’ format returns the results of a Chi-Squared test for independence and includes the Thurstonian ‘Yardstick’ model (d-prime and tau). For the ‘long’ format, the McNemar Chi-squared test is performed in addition to the ‘short’ format analyses.
Alongside formal statistical tests and models, this analysis includes contingency tables and a summary of the presentation order for inspection. This also includes a paired contingency table for the McNemar test.
Note: when using the Simple Difference project template from EyeQuestion, the data format will automatically be suitable for the Same Different Test.
The attributes must be of datatype Binary (where 1 indicates a correct answer and 0 indicates an incorrect answer). In the dataset the products must be separated by a “-” and only include one of three cases. For example P01-P02, P01-P01, P02-P02, but not P02-P01.
If you want to include presentation order information, then another attribute is required, for example an attribute labelled “Q1__info” to correspond to Q1. This should be formatted as text “[A]-[B]” where A and B are each either 1 or 2. For example “1-2” means product 1 was presented first, then product 2; “2-1” means product 2 was presented first, then product 1; “1-1” means product 1 was presented both times, and similarly for “2-2”.
A richer format can also be used. EyeQuestion creates this richer format, where the separator is “-” if the panellist answered “same” or “~” if the panellist answered “different”. For example “2~1” means product 2 was presented first, then product 1, and the panellist answered “different”; “1~1” means product 1 was presented both times and the panellist answered “different”. The analysis understands this format. If the information attribute and the attribute itself disagree about what the panellist answered, the panellist’s answer is taken from the attribute, not from the information attribute.
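The two info formats described above can be read with a small parser. The following is a minimal sketch, not the parser used by the analysis: the names `PairInfo` and `parse_info` and the `rich` flag are hypothetical, introduced only for illustration. The flag is needed because “-” carries no answer in the plain format but means “same” in the richer format.

```python
import re
from typing import NamedTuple, Optional


class PairInfo(NamedTuple):
    first: int              # product presented first (1 or 2)
    second: int             # product presented second (1 or 2)
    answer: Optional[str]   # "same", "different", or None if not encoded


# Two product codes (1 or 2) separated by "-" or "~".
_INFO_PATTERN = re.compile(r"^\s*([12])\s*([-~])\s*([12])\s*$")


def parse_info(value: str, rich: bool = False) -> PairInfo:
    """Parse an info string such as "1-2" or "2~1".

    rich=False: "-" is only a separator, so no answer is inferred from it.
    rich=True:  "-" is read as the panellist answering "same".
    "~" only occurs in the richer format and is always read as "different".
    """
    match = _INFO_PATTERN.match(value)
    if match is None:
        raise ValueError(f"Unrecognised info value: {value!r}")
    first, separator, second = match.groups()
    if separator == "~":
        answer = "different"
    elif rich:
        answer = "same"
    else:
        answer = None
    return PairInfo(int(first), int(second), answer)


def is_same_pair(info: PairInfo) -> bool:
    """True when the same product was presented twice (AA or BB)."""
    return info.first == info.second
```

For instance, `parse_info("2~1")` yields a different-pair with the answer “different”, while `parse_info("1-1")` yields a same-pair with no answer encoded unless `rich=True` is passed.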
This is a contingency table with the responses split for all four combinations of first and second product presented. For each attribute this is a table with columns First Product, Second Product, Same and Different. The First/Second Product columns include the names of the products, whereas the Same column contains the total number of same responses for that presentation order and the Different column contains the total number of different responses for that presentation order.
These tables can be useful for spotting potential presentation order bias. For example, if assessors responded ‘Same’ more often to an AB pair than to a BA pair and the design is balanced, then this could indicate presentation order bias (for an unbalanced design you should instead compare the proportion of ‘Same’ responses for each pair).
The presentation order table will only be made for attributes that have a corresponding ‘info’ attribute, e.g., Q1 and Q1__info, because the info attribute includes the presentation order information.
If there are no information attributes then there won’t be any presentation order tables but there will be a relevant message in the warning table. Any info attribute that doesn’t have a corresponding attribute, for example Q2__info exists but Q2 does not, will be removed with a warning included.
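As an illustration of the presentation order table, the sketch below tabulates ‘Same’ and ‘Different’ counts per presentation order and adds the proportion of ‘Same’ responses, which is the quantity to compare when the design is unbalanced. The function name, the input layout (tuples of first product, second product, response) and the extra “Proportion Same” column are assumptions made for this example and are not part of the analysis output.

```python
from collections import Counter


def presentation_order_table(records):
    """Tabulate responses per presentation order.

    records: iterable of (first_product, second_product, response),
             where response is "same" or "different".
    Returns one row (dict) per presentation order that occurs in the data.
    """
    counts = Counter()
    for first, second, response in records:
        if response not in ("same", "different"):
            raise ValueError(f"Unexpected response: {response!r}")
        counts[(first, second, response)] += 1

    # Every order listed here has at least one response, so total > 0.
    orders = sorted({(first, second) for first, second, _ in counts})
    table = []
    for first, second in orders:
        same = counts[(first, second, "same")]
        different = counts[(first, second, "different")]
        total = same + different
        table.append({
            "First Product": first,
            "Second Product": second,
            "Same": same,
            "Different": different,
            "Proportion Same": same / total,
        })
    return table
```

A noticeably higher “Proportion Same” for the AB row than for the BA row would be the pattern to look at more closely.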
This is a contingency table of the same/different responses in columns against the correct answers in rows. If there are multiple attributes then there will be a table for each.
This is a Thurstonian model. We call it the ‘yardstick’ model, but its name varies in the literature, for example tau skimming or differencing. Essentially, it models the assessors as using the following decision strategy: take the absolute difference between the two products observed, then answer ‘different’ if this is greater than a threshold and ‘same’ otherwise. This threshold is called tau or 𝜏.
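The decision rule can be turned into response probabilities. The sketch below assumes the usual equal-variance Thurstonian setup, which is not spelled out in this text: each percept is normally distributed with unit variance, the two percepts are independent, and their means differ by d-prime, so the perceived difference has variance 2. The function name `prob_different` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def prob_different(d_prime: float, tau: float) -> float:
    """P(answer 'different') = P(|X1 - X2| > tau).

    Assumes X1 - X2 ~ Normal(mean=d_prime, variance=2).
    Use d_prime = 0 for a same-pair (AA or BB).
    """
    scale = sqrt(2.0)  # standard deviation of the difference
    upper = 1.0 - _STD_NORMAL.cdf((tau - d_prime) / scale)  # difference > tau
    lower = _STD_NORMAL.cdf((-tau - d_prime) / scale)       # difference < -tau
    return upper + lower


# Probability of a false 'different' on a same-pair, and of a correct
# 'different' on a different-pair, for illustrative parameter values.
p_false_alarm = prob_different(0.0, tau=1.0)
p_hit = prob_different(1.5, tau=1.0)
```

Under these assumptions, a larger tau lowers both probabilities (more ‘same’ answers overall), while a larger d-prime raises only the probability for different-pairs.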
The yardstick model is fitted via a general linear modelling approach. If the fit does not converge then a warning message is included. The model also cannot be fitted if there is insufficient data.
If the model can be fitted then the information is displayed in a table of parameters, estimates, standard errors, confidence intervals of the chosen level and p-values. These p-values are for the null hypothesis that delta is equal to 0 against the alternative hypothesis that delta is not equal to 0.
The confidence intervals and p-values are calculated using the profile likelihood (not the observed Fisher’s information).
If there are multiple attributes then a model is fitted for each attribute.
The d-prime estimate is the estimated sensory distance between the two samples, and the associated p-value indicates whether the samples are significantly different, and at what level. Whether you conclude that the samples are different depends on the level of risk you are willing to accept.
The tau parameter conveys information about the strategy used by the assessors. A large estimate of tau with a significant p-value suggests a bias towards responding ‘Same’. However, care must be taken in interpreting this because if the two products (present in the experiment) are similar then a bias towards responding ‘Same’ is an effective strategy.
A chi-squared test of independence is done for every contingency table in which no cell count is less than 4. This is done without continuity correction. The chi-squared test itself tests whether the rows and columns of the contingency table are associated.
If the chi-squared test is (statistically) significant then the contingency table can suggest which conclusion should be drawn about the relationship between the two products.
The results of all these tests are collected into a single table of the Chi-Squared statistics, the degrees of freedom and p-values.
This test will only be performed when the Design option is set to ‘short’.
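For a 2x2 table of correct answers (rows) against responses (columns), the chi-squared test without continuity correction can be computed directly. This is a minimal sketch under that 2x2 assumption, not the code used by the analysis; the function name and the returned dictionary are hypothetical. With 1 degree of freedom the p-value can be obtained from the complementary error function, so only the standard library is needed.

```python
from math import erfc, sqrt


def chi_squared_2x2(table, min_count=4):
    """Chi-squared test of independence for a 2x2 table, no continuity correction.

    table: [[a, b], [c, d]] with rows = correct answer (same, different)
           and columns = response (same, different).
    Returns None when any cell count is below min_count (test skipped).
    """
    (a, b), (c, d) = table
    if min(a, b, c, d) < min_count:
        return None
    n = a + b + c + d
    row_1, row_2 = a + b, c + d
    col_1, col_2 = a + c, b + d
    statistic = n * (a * d - b * c) ** 2 / (row_1 * row_2 * col_1 * col_2)
    # Survival function of a chi-squared variable with 1 degree of freedom.
    p_value = erfc(sqrt(statistic / 2.0))
    return {"statistic": statistic, "df": 1, "p_value": p_value}


result = chi_squared_2x2([[20, 10], [10, 20]])
```

A table such as `[[3, 10], [10, 20]]` returns `None` here, mirroring the rule that tables with a cell count below 4 are not tested.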
If an assessor did not see both a same-pair and a different-pair then this assessor does not contribute to this table. If an assessor evaluated more than one same-pair or more than one different-pair then the responses are tabulated via proportions.
This table will only be shown when the Design option is set to ‘long’.
McNemar’s Chi Squared test is performed on the McNemar Contingency Table, as recommended by Lawless & Heymann (2010).
It tests the hypothesis of marginal homogeneity, that is, whether the marginal proportions in the associated table are equal. In this case a significant p-value indicates a difference between the two products (present in the experiment).
The method used depends on the option chosen.
For “Asymptotic” (default) the chi-squared statistic is calculated with continuity correction and the p-value is from the appropriate chi-squared distribution. Asymptotic statistics are recommended in recent same different test literature (Fagerland, Lydersen & Laake, 2013; Pembury Smith & Ruxton, 2020).
For “Exact” the chi-squared statistic is calculated without continuity correction and the p-value is calculated from a binomial distribution with N = number of incorrect answers and p = 0.5, with the observed value being the number of one type of incorrect answer (the test is two-sided, so it does not matter which type is used). If the data make it impossible to use the “Exact” method then the “Asymptotic” method is used and this is noted in the warnings table.
This test will only be performed when the Design option is set to ‘long’.
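The two McNemar options can be sketched as follows. This is an illustrative implementation using the textbook formulas, not the code used by the analysis. It assumes `b` and `c` are the two off-diagonal (discordant) counts of the paired table, given as whole numbers; the function name `mcnemar` and the returned dictionary are hypothetical.

```python
from math import comb, erfc, sqrt


def mcnemar(b: int, c: int, method: str = "asymptotic"):
    """McNemar test on the two discordant counts b and c.

    "asymptotic": continuity-corrected statistic, chi-squared (1 df) p-value.
    "exact":      uncorrected statistic, two-sided binomial p-value
                  with N = b + c and p = 0.5.
    """
    n = b + c
    if n == 0:
        raise ValueError("No discordant pairs: the test cannot be computed.")

    if method == "asymptotic":
        statistic = (abs(b - c) - 1) ** 2 / n
        # Survival function of a chi-squared variable with 1 degree of freedom.
        p_value = erfc(sqrt(statistic / 2.0))
    elif method == "exact":
        statistic = (b - c) ** 2 / n
        smaller = min(b, c)
        # P(X <= smaller) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
        tail = sum(comb(n, k) for k in range(smaller + 1)) / 2 ** n
        p_value = min(1.0, 2.0 * tail)
    else:
        raise ValueError(f"Unknown method: {method!r}")

    return {"statistic": statistic, "p_value": p_value, "method": method}


asymptotic = mcnemar(5, 15)
exact = mcnemar(5, 15, method="exact")
```

Because the binomial distribution with p = 0.5 is symmetric, doubling the lower tail of the smaller count gives the same two-sided p-value whichever discordant count is treated as the observed value.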