Smart Alex Task 8.1

Descriptive Statistics 

Descriptive Statistics

hours (by degree classification)
Degree class          Valid   Mean     Std. Deviation
First class           10      8.827    3.932
Upper second class    23      8.669    2.539
Lower second class    10      7.680    1.537
Third class           2       5.628    0.814

essay (by degree classification)
Degree class          Valid   Mean     Std. Deviation
First class           10      72.332   3.488
Upper second class    23      63.631   2.797
Lower second class    10      57.050   3.451
Third class           2       48.960   1.034

Scatter Plots

We’re interested in the relationship between the hours spent on an essay and the grade obtained, so a good first step is a scatterplot of hours spent on the essay (x-axis) against essay mark (%) (y-axis). I’ve chosen to highlight the degree classification grades using different colours. The resulting scatterplot is below.

[Figure: scatterplot of hours (x-axis) against essay (y-axis)]

Q-Q Plots

Descriptive Statistics
                  hours    essay
Valid             45       45
Mean              8.349    63.450
Std. Deviation    2.725    6.757
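As a consistency check between the two descriptive tables, the overall n, mean and SD can be rebuilt from the per-class summaries: the overall mean is the n-weighted average of the class means, and the overall sum of squares is the within-class part plus the between-class part. The sketch below does this with the standard library only; the function name `pooled_mean_sd` is hypothetical.

```python
import math

# Group summaries from the Descriptive Statistics table split by degree
# classification, in the order first, upper second, lower second, third.
N_PER_GROUP = [10, 23, 10, 2]
HOURS_MEAN = [8.827, 8.669, 7.680, 5.628]
HOURS_SD = [3.932, 2.539, 1.537, 0.814]
ESSAY_MEAN = [72.332, 63.631, 57.050, 48.960]
ESSAY_SD = [3.488, 2.797, 3.451, 1.034]


def pooled_mean_sd(ns, means, sds):
    """Combine per-group sizes, means and SDs into the overall n, mean and SD.

    The overall sum of squares is the within-group part, (n_i - 1) * s_i**2,
    plus the between-group part, n_i * (mean_i - grand_mean)**2.
    """
    total_n = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / total_n
    within = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    overall_sd = math.sqrt((within + between) / (total_n - 1))
    return total_n, grand_mean, overall_sd


if __name__ == "__main__":
    for label, means, sds in (("hours", HOURS_MEAN, HOURS_SD),
                              ("essay", ESSAY_MEAN, ESSAY_SD)):
        n, m, s = pooled_mean_sd(N_PER_GROUP, means, sds)
        print(f"{label}: n = {n}, mean = {m:.3f}, SD = {s:.3f}")
```

With these inputs the results agree with the overall table to three decimals (n = 45; means 8.349 and 63.450; SDs 2.725 and 6.757), although because the inputs are themselves rounded, small last-digit differences would not indicate an error.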

Q-Q Plots

The Q-Q plots for both variables (below) look fairly normal: the points lie close to the diagonal line.
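A normal Q-Q plot pairs each sorted (standardized) observation with the value a normal distribution would predict at that position, so normal data fall along a straight line. The sketch below computes those points with Python's `statistics.NormalDist`. The plotting positions (i - 0.5)/n are one common convention and an assumption here (JASP may use another), and the demo values are hypothetical, not the essay data; `qq_points` is a hypothetical helper name.

```python
from statistics import NormalDist, mean, stdev


def qq_points(sample):
    """Return (theoretical, observed) pairs for a normal Q-Q plot.

    observed: the sorted sample, standardized with its own mean and SD.
    theoretical: standard-normal quantiles at plotting positions (i - 0.5) / n
    (one common convention; assumed here, not necessarily JASP's).
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    m, s = mean(sample), stdev(sample)
    if s == 0:
        raise ValueError("sample has no variation")
    observed = sorted((x - m) / s for x in sample)
    std_normal = NormalDist()  # mean 0, SD 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))


if __name__ == "__main__":
    # Hypothetical values for illustration only (not the essay data).
    demo = [5.1, 6.8, 7.4, 8.0, 8.3, 9.1, 9.9, 11.6]
    for q, z in qq_points(demo):
        print(f"expected {q:6.3f}   observed {z:6.3f}")
```

Plotting the pairs and adding the line y = x gives the diagonal reference line; systematic curvature away from it suggests skew, and points peeling off at the ends suggest heavy or light tails.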

[Figure: Q-Q plot of hours]

[Figure: Q-Q plot of essay]

Pearson's Correlation

Pearson's Correlations
Variable pair    n    Pearson's r    p-value    Lower 95% CI    Upper 95% CI
hours - essay    45   0.267          0.038      -0.074          0.532
Note. All tests one-tailed, for positive correlation.
Note. Confidence intervals based on 1000 bootstrap replicates.
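The table note says the confidence interval comes from 1000 bootstrap replicates. The sketch below illustrates the idea: resample the (hours, essay) pairs with replacement, recompute r for each replicate, and read the interval off the sorted replicate values. It uses the plain percentile method, not the bias-corrected and accelerated (BCa) method, so its limits will not match JASP's exactly; and because resampling is random, rerunning a bootstrap (with a different seed) gives slightly different limits. The function names and demo data are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson's correlation coefficient for paired samples x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci_r(x, y, replicates=1000, level=0.95, seed=None):
    """Percentile bootstrap confidence interval for Pearson's r.

    Cases (x, y pairs) are resampled with replacement and r is recomputed for
    each replicate. This is the plain percentile method, not BCa.
    """
    if len(set(x)) < 2 or len(set(y)) < 2:
        raise ValueError("both variables need at least two distinct values")
    rng = random.Random(seed)
    n = len(x)
    rs = []
    while len(rs) < replicates:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [x[i] for i in idx]
        by = [y[i] for i in idx]
        # Skip resamples in which either variable is constant (r undefined).
        if len(set(bx)) < 2 or len(set(by)) < 2:
            continue
        rs.append(pearson_r(bx, by))
    rs.sort()
    tail = (1 - level) / 2
    lower = rs[int(round(tail * replicates))]
    upper = rs[int(round((1 - tail) * replicates)) - 1]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical paired data for illustration only.
    hours = [4.0, 5.5, 6.0, 7.5, 8.0, 9.0, 10.5, 12.0]
    marks = [55.0, 61.0, 58.0, 63.0, 60.0, 66.0, 64.0, 70.0]
    print("r =", round(pearson_r(hours, marks), 3))
    print("95% CI:", bootstrap_ci_r(hours, marks, seed=1))
```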

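The results in the Pearson's Correlations table above show a positive correlation between time spent writing an essay (hours) and grade awarded (essay %). Note that the table was produced with one-tailed tests for a positive correlation, so its p-value (0.038) is half the two-tailed value (up to rounding). Using the two-tailed test, the relationship was not significant, Pearson’s r = 0.267, 95% BCa CI [-0.074, 0.532], p = 0.077; consistent with this, the confidence interval includes zero.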
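The p-value for Pearson's r comes from the standard t-test of H0: ρ = 0, with t = r√(n - 2)/√(1 - r²) on n - 2 degrees of freedom. The sketch below computes t from the table's rounded values (r = 0.267, n = 45) and a two-tailed p-value by integrating the t density numerically with Simpson's rule, using only the standard library. The helper names are hypothetical, and because r is rounded, the p-values will match the reported ones only approximately. Halving the two-tailed value gives the one-tailed p when r has the predicted sign.

```python
import math


def t_from_r(r, n):
    """t statistic on n - 2 df for testing H0: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 - 2 * (area under the t density from 0 to |t|).

    The area is approximated with Simpson's rule; steps must be even.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * total * h / 3)


if __name__ == "__main__":
    r, n = 0.267, 45  # rounded values from the Pearson's Correlations table
    t = t_from_r(r, n)
    p_two = two_tailed_p(t, n - 2)
    p_one = p_two / 2  # one-tailed p; valid only because r is positive, as predicted
    print(f"t({n - 2}) = {t:.3f}, one-tailed p = {p_one:.3f}, two-tailed p = {p_two:.3f}")
```

In practice a statistics library would supply the t distribution; the numerical integration is here only to keep the sketch dependency-free.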

Additional Correlations

The second part of the question asks us to repeat the analysis with the percentages recoded into degree classifications. Degree classifications are ordinal rather than interval data: they are ordered categories, and the distances between them are not necessarily equal. So instead of Pearson’s r we should use Spearman’s rho and Kendall’s tau, which depend only on the rank order of the scores.
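Spearman's rho is Pearson's r computed on ranks (tied values share the average of their ranks), and Kendall's tau-b compares the numbers of concordant and discordant pairs, with a correction for ties in the denominator. A minimal standard-library sketch follows; it returns the coefficients only (no p-values), and the function names are hypothetical.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson's correlation coefficient for paired samples x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the tie-averaged ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant) / sqrt(untied_x * untied_y).

    untied_x counts pairs not tied on x (likewise for y), which equals the
    usual n0 - n1 and n0 - n2 terms of the tau-b denominator.
    """
    concordant = discordant = untied_x = untied_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx != 0:
            untied_x += 1
        if dy != 0:
            untied_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    return (concordant - discordant) / math.sqrt(untied_x * untied_y)
```

The pairwise loop is O(n²), which is fine for a sample of 45 but not for large datasets.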

Correlation Table
Variable pair    n    Spearman's rho    p-value    Kendall's Tau B    p-value
hours - grade    45   -0.193            0.204      -0.158             0.178

In both cases the correlation is non-significant: there was no significant relationship between the degree classification of an essay and the time spent writing it, ρ = -0.193, p = 0.204, and τ = -0.158, p = 0.178. Note that the sign of the correlation has reversed compared with the Pearson analysis. This happened because the essay marks were recoded as 1 (first), 2 (upper second), 3 (lower second) and 4 (third), so high grades are represented by low numbers. This example illustrates why it is a bad idea to take continuous data (like percentages) and transform them into categorical data: when you do, you lose information and often statistical power!
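Both effects of recoding, the sign flip and the loss of information, show up in a small sketch that recodes marks into the 1 to 4 coding used above and compares Kendall's tau-b before and after. The cut-offs (70, 60, 50 and 40) are the usual UK degree boundaries and are an assumption here, as are the hypothetical hours and marks; the helper names are hypothetical too.

```python
import math
from itertools import combinations


def degree_class(mark):
    """Recode a percentage into 1 = first, 2 = upper second, 3 = lower second,
    4 = third. Cut-offs 70/60/50/40 are the usual UK boundaries (assumed)."""
    if mark >= 70:
        return 1
    if mark >= 60:
        return 2
    if mark >= 50:
        return 3
    if mark >= 40:
        return 4
    raise ValueError(f"mark {mark} is below the assumed third-class boundary")


def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant) / sqrt(untied_x * untied_y)."""
    concordant = discordant = untied_x = untied_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx != 0:
            untied_x += 1
        if dy != 0:
            untied_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    return (concordant - discordant) / math.sqrt(untied_x * untied_y)


if __name__ == "__main__":
    # Hypothetical data for illustration only (not the essay data).
    hours = [4, 5, 6, 7, 8, 9, 10, 11]
    marks = [48, 52, 58, 61, 64, 69, 71, 75]
    codes = [degree_class(m) for m in marks]  # [4, 3, 3, 2, 2, 2, 1, 1]
    print("tau-b(hours, marks) =", round(kendall_tau_b(hours, marks), 3))
    print("tau-b(hours, codes) =", round(kendall_tau_b(hours, codes), 3))
```

For these toy values tau-b is 1 with the raw marks and about -0.91 with the codes: the sign flips because high marks get low codes, and the magnitude shrinks because eight distinct marks collapse into four categories with tied ranks.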