Purpose
Performs a Pearson Correlation analysis on
all the numeric variables in a data set.
A square symmetric table of correlation coefficients is produced. Optionally, statistical significance tests
may be performed on each coefficient.
Because the software attempts to compute
correlation coefficients between all possible pairs of numeric variables in a
data set, the only requirement is a data set with at least two numeric
variables.
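As a rough illustration of what "all possible pairs" means, the sketch below builds a square symmetric table from the numeric columns of a small data set. It is written in plain Python for readability and is not the software's implementation; the column names, the data values and the helper names `pearson` and `correlation_table` are all hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def is_number(v):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, (int, float)) and not isinstance(v, bool)

def correlation_table(data):
    """Return (names, matrix) covering every pair of numeric columns in `data`."""
    names = [name for name, col in data.items() if all(is_number(v) for v in col)]
    if len(names) < 2:
        raise ValueError("need at least two numeric variables")
    k = len(names)
    matrix = [[1.0] * k for _ in range(k)]  # diagonal entries remain 1.0
    for i, j in combinations(range(k), 2):
        r = pearson(data[names[i]], data[names[j]])
        matrix[i][j] = matrix[j][i] = r     # fill both triangles symmetrically
    return names, matrix

# Hypothetical data set: one text column and three numeric columns.
data = {
    "product": ["A", "B", "C", "D"],
    "sweetness": [2.0, 4.0, 6.0, 8.0],
    "bitterness": [9.0, 7.0, 4.0, 1.0],
    "liking": [3.0, 5.0, 6.0, 9.0],
}
names, table = correlation_table(data)
```

The text column is skipped, so `names` holds only the three numeric variables and `table` is a 3 x 3 matrix. The sketch assumes complete data; how the software treats missing values is not covered here.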
Background
The Pearson correlation coefficient for
n pairs of numbers measures how close the points lie to a
straight line when the data pairs are drawn on an XY scatter plot. If there is a perfect linear relationship
between the two variables, then the correlation coefficient will either be +1
or -1. If there is no linear relationship at all, then the coefficient will be
zero. In all other circumstances the coefficient is somewhere in between -1 and
1 and the absolute size of the coefficient indicates the strength of the linear
relationship. The sign of the coefficient
indicates the direction of the relationship:
- Positive coefficients indicate
that both variables increase together and decrease together.
- Negative coefficients indicate
that one variable increases as the other decreases.
Note that the correlation coefficient only assesses
the strength of linear relationships.
Two variables with a perfect non-linear (curved) relationship will have
a Pearson correlation coefficient strictly between -1 and +1, and it can even be zero.
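The behaviour described above can be checked with a small worked example. The sketch below uses the standard product-moment formula in plain Python; the function name `pearson` and the data values are chosen purely for illustration.

```python
import math

def pearson(x, y):
    """Pearson r for paired data:
    r = sum((x_i - mean_x) * (y_i - mean_y))
        / sqrt(sum((x_i - mean_x)^2) * sum((y_i - mean_y)^2))
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-2.0, -1.0, 0.0, 1.0, 2.0]

r_up = pearson(x, [2 * v + 1 for v in x])      # perfect increasing line
r_down = pearson(x, [-3 * v + 5 for v in x])   # perfect decreasing line
r_curve = pearson(x, [v ** 2 for v in x])      # perfect parabola, symmetric about x = 0
```

Here `r_up` is +1 and `r_down` is -1, while `r_curve` is 0 even though y is determined exactly by x. The zero is a consequence of the symmetric choice of x values; other curved relationships give other values strictly between -1 and +1.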
Options
- Compute Table of Means – use this option
(the default) if you would like to compute mean values per product, and then
calculate correlation coefficients on the table of means. In this instance n,
the number of data pairs, will be equal to the number of products in your data
set. Note that it is recommended to have
at least 6 products.
- Run on imported Data – use this option
if you do not want to compute means before computing correlation
coefficients. In this instance n,
the number of data pairs, will be equal to the number of observations (rows) in
your data set.
- Significance – Yes/No. Use this option
if you want to perform a significance test on each Pearson correlation
coefficient. Default=Yes.
- Number of decimals for values – controls
the number of decimals printed in the output for the correlation coefficients.
- Number of decimals for p-values –
controls the number of decimals printed for p-values.
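The difference between the first two options is mainly a difference in n. The sketch below contrasts them on a tiny, made-up data set in plain Python. It assumes a simple arithmetic mean per product, which may differ from the method the software reports on its 'Information' tab, and it uses only three products for brevity, fewer than the recommended minimum of 6. The column names and the helper names `pearson` and `product_means` are hypothetical.

```python
import math
from collections import defaultdict

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def product_means(rows, variables):
    """Average each variable over all rows that share a product (assumed method)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["product"]].append(row)
    return {
        product: {v: sum(r[v] for r in members) / len(members) for v in variables}
        for product, members in groups.items()
    }

# Hypothetical data: 3 products, each scored twice on 2 attributes.
rows = [
    {"product": "A", "sweet": 2.0, "liking": 3.0},
    {"product": "A", "sweet": 4.0, "liking": 5.0},
    {"product": "B", "sweet": 5.0, "liking": 4.0},
    {"product": "B", "sweet": 7.0, "liking": 8.0},
    {"product": "C", "sweet": 8.0, "liking": 9.0},
    {"product": "C", "sweet": 6.0, "liking": 7.0},
]
variables = ["sweet", "liking"]

# "Compute Table of Means": one data pair per product.
means = product_means(rows, variables)
n_means = len(means)
r_means = pearson([m["sweet"] for m in means.values()],
                  [m["liking"] for m in means.values()])

# "Run on imported Data": one data pair per row.
n_raw = len(rows)
r_raw = pearson([r["sweet"] for r in rows], [r["liking"] for r in rows])
```

With this data `n_means` is 3 and `n_raw` is 6. The two coefficients generally differ, because averaging removes the within-product variation before the correlation is computed.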
Results and Interpretation
- The ‘Correlation Coefficient’ tab is always generated and contains a
square symmetric table where both rows and columns are indexed by the numeric
variables in the data set. Each cell
contains a correlation coefficient for a pair of variables. The entries in the
leading diagonal of the table will always be 1.00 because the correlation of a
variable with itself is 1.00 by definition.
- The ‘Correlation Significance’ tab is generated when the
significance=yes option is used and contains the same table of correlation
coefficients described above, with added background colour that indicates
significance at specific levels:
  - Light green – positive correlation with p-value between 0.05 and 0.1.
  - Dark green – positive correlation with p-value <=0.05.
  - Light red – negative correlation with p-value between 0.05 and 0.1.
  - Dark red – negative correlation with p-value <=0.05.
- The ‘P-Values’ tab is generated when the significance=yes option is
used and gives the actual p-value for the significance of each correlation
coefficient.
- If the ‘Compute Table of Means’ option is used, the ‘Information’
tab will contain a message on the method used to compute the means.
- The R function cor from the
package ‘stats’ is used to return the matrix of correlation coefficients, and
the function cor.test from the same package is used to return the pairwise
p-values.
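The colour rules for the ‘Correlation Significance’ tab can be summarised in a few lines of code. The sketch below is plain Python for illustration, not the software's own logic. It assumes that a p-value of exactly 0.1 still receives the light shade and that a coefficient of exactly zero receives no colour; neither boundary is stated above. The function names and the default of two decimals are hypothetical.

```python
def significance_colour(r, p):
    """Background colour for one cell, or None when no highlight applies."""
    if p > 0.1 or r == 0:
        return None                      # not significant at the 0.1 level
    shade = "dark" if p <= 0.05 else "light"
    hue = "green" if r > 0 else "red"    # sign of the coefficient picks the hue
    return f"{shade} {hue}"

def format_cell(value, decimals=2):
    """Render a coefficient or p-value with a fixed number of decimals."""
    return f"{value:.{decimals}f}"

examples = [(0.9, 0.01), (0.6, 0.08), (-0.7, 0.08), (-0.95, 0.05), (0.3, 0.4)]
colours = [significance_colour(r, p) for r, p in examples]
```

For the five example pairs, `colours` is dark green, light green, light red, dark red and `None`, in that order.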
References
- Martin Bland (2015) “An Introduction to
Medical Statistics – 4th Edition”, Oxford University Press. See chapter 11 “Regression and Correlation”.