Results

Let’s look at these variables. In the UK, GCSEs are school exams taken at age 16 that are graded A, B, C, D, E or F. These grades are categories that have an order of importance (an A grade is better than all of the lower grades). In the UK, a university student can get a first-class mark, an upper second, a lower second, a third, a pass or a fail. These grades are categories, but they have an order to them (an upper second is better than a lower second). When you have categories like these that can be ordered in a meaningful way, the data are said to be ordinal. The data are not interval, because a first-class degree encompasses a 30% range (70–100%), whereas an upper second only covers a 10% range (60–70%). When data have been measured at only the ordinal level they are said to be non-parametric and Pearson’s correlation is not appropriate. Therefore, the Spearman correlation coefficient is used.

In the file, the scores are in two columns: one labelled stats and one labelled gcse. Each of the categories described above has been coded with a numeric value. In both cases, the highest grade (first class or A grade) has been coded with the value 1, with subsequent categories being labelled 2, 3 and so on. Note that for each numeric code I have provided a value label (just like we did for coding variables).
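The coding scheme described above can be sketched as a simple lookup. This is only an illustration: the dictionary names, the helper functions and the exact label text are assumptions, and the example grades are made up rather than taken from the data file.

```python
# Hypothetical coding scheme: 1 = best grade, larger codes = lower grades.
# The label spellings are assumptions; the file's value labels may differ.
GCSE_CODES = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6}
DEGREE_CODES = {
    "First class": 1,
    "Upper second": 2,
    "Lower second": 3,
    "Third": 4,
    "Pass": 5,
    "Fail": 6,
}


def encode(grades, codes):
    """Map each grade label to its numeric code, keeping the original order."""
    return [codes[grade] for grade in grades]


def decode(values, codes):
    """Map numeric codes back to their value labels."""
    labels = {code: label for label, code in codes.items()}
    return [labels[value] for value in values]


# Made-up grades for illustration only (not the data from the file).
example_gcse = ["A", "C", "B", "F"]
print(encode(example_gcse, GCSE_CODES))
print(decode([1, 2], DEGREE_CODES))
```

Because both variables use 1 for the best grade, a positive correlation between the codes means that better GCSE grades go with better degree grades.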

In the question I predicted that better grades in GCSE maths would correlate with better degree grades for my statistics course. This hypothesis is directional and so a one-tailed test could be selected; however, in the chapter I advised against one-tailed tests, so I have used a two-tailed test.

The output below shows the Spearman correlation between the variables stats and gcse. It is a matrix giving the correlation coefficient between the two variables (0.455), the significance value of this coefficient (0.022) and the sample size (25). [Note: it is good to check that the value of N corresponds to the number of observations that were made. If it doesn’t, then data may have been excluded for some reason.]

I also requested the bootstrapped 95% confidence interval (–0.017, 0.716). The significance value for this correlation coefficient is less than 0.05; therefore, it can be concluded that there is a significant relationship between a student’s grade in GCSE maths and their degree grade for their statistics course. However, the bootstrapped confidence interval crosses zero, suggesting (under the usual assumptions) that the effect in the population could be zero. It is worth remembering that bootstrapping relies on random resampling, so if we were to rerun the analysis we would get slightly different limits for the bootstrap confidence interval. The p-value is only just significant (0.022), although the correlation coefficient is fairly large (0.455). This situation demonstrates that it is important to replicate studies.

Smart Alex 7.4 Spearman Rho

Spearman's Correlations
Variable                      stats     gcse
1. stats    n
            Spearman's rho
            p-value
            Lower 95% CI
            Upper 95% CI
2. gcse     n                 25
            Spearman's rho    0.455
            p-value           0.022
            Lower 95% CI      –0.017
            Upper 95% CI      0.716
Note. Confidence intervals based on 1000 bootstrap replicates.
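To make the calculation concrete, here is a minimal sketch of Spearman's rho (Pearson's correlation applied to ranks, with tied values sharing their average rank) together with a simple percentile bootstrap confidence interval. The function names and the small data set are hypothetical, and the percentile method shown here is an assumption: the software that produced the output may use a different bootstrap interval, so the limits will not reproduce the table.

```python
import random
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of rank positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either variable has no variation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson's correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def bootstrap_ci(x, y, stat, n_boot=1000, level=0.95, seed=None):
    """Percentile bootstrap CI: resample cases (pairs) with replacement."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        value = stat([x[i] for i in idx], [y[i] for i in idx])
        if value == value:  # drop NaN from resamples with no variation
            estimates.append(value)
    estimates.sort()
    alpha = (1 - level) / 2
    lower = estimates[int(alpha * len(estimates))]
    upper = estimates[min(int((1 - alpha) * len(estimates)), len(estimates) - 1)]
    return lower, upper


# Made-up codes (1 = best grade); NOT the data from the file.
stats = [1, 2, 2, 3, 3, 4, 2, 1, 3, 5]
gcse = [1, 1, 2, 3, 2, 4, 3, 2, 3, 4]
print(spearman(stats, gcse))
print(bootstrap_ci(stats, gcse, spearman, seed=1))
print(bootstrap_ci(stats, gcse, spearman, seed=2))  # limits change with the resamples
```

Running the bootstrap with different seeds gives different limits, which is why an interval that only just crosses zero in one run might not do so in another.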

Smart Alex 7.4 Kendall's Tau

We could also look at Kendall’s correlation. The output below is much the same as for Spearman’s correlation. The value of Kendall’s coefficient is smaller than Spearman’s (0.354 compared with 0.455), but it is still statistically significant (because the p-value of 0.029 is less than 0.05). The bootstrapped confidence interval does not cross zero (0.052, 0.619), suggesting that there is likely to be a positive relationship in the population. Even so, a correlation does not establish causality: we cannot assume that the GCSE grades caused the degree students to do better in their statistics course.

Kendall's Tau Correlations
Variable                       stats     gcse
1. stats    n
            Kendall's Tau B
            p-value
            Lower 95% CI
            Upper 95% CI
2. gcse     n                  25
            Kendall's Tau B    0.354
            p-value            0.029
            Lower 95% CI       0.052
            Upper 95% CI       0.619
Note. Confidence intervals based on 1000 bootstrap replicates.
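For readers who want to see what Kendall's tau-b measures, the sketch below counts concordant and discordant pairs directly and adjusts the denominator for ties. The function name and the example values are hypothetical, and the quadratic pair loop is chosen for readability rather than speed.

```python
def kendall_tau_b(x, y):
    """Kendall's tau-b from pair counts, with the standard tie adjustment."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: excluded from every count
            elif dx == 0:
                ties_x += 1  # tied on x only
            elif dy == 0:
                ties_y += 1  # tied on y only
            elif dx * dy > 0:
                concordant += 1  # pair ordered the same way on x and y
            else:
                discordant += 1  # pair ordered in opposite ways
    untied = concordant + discordant
    denom = ((untied + ties_x) * (untied + ties_y)) ** 0.5
    return (concordant - discordant) / denom if denom else float("nan")


# Made-up codes (1 = best grade); NOT the data from the file.
stats = [1, 2, 2, 3]
gcse = [1, 2, 3, 3]
print(kendall_tau_b(stats, gcse))
```

Tau-b is based on the proportion of pairs that agree in order rather than on the sizes of rank differences, which is one reason its value is typically smaller in magnitude than Spearman's rho for the same data.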