Both Ozzy Osbourne and Judas Priest have been accused of putting backward masked messages on their albums that subliminally influence poor unsuspecting teenagers into doing Very Bad things. A psychologist was interested in whether backward masked messages could have an effect. He created a version of Taylor Swift’s ‘Shake it off’ that contained the masked message ‘deliver your soul to the dark lord’ repeated in the chorus. He took this version, and the original, and played one version (randomly) to a group of 32 veterinary students. Six months later he played them whatever version they hadn’t heard the time before. So each student heard both the original and the version with the masked message, but at different points in time. The psychologist measured the number of goats that the students sacrificed in the week after listening to each version. Test the hypothesis that the backward message would lead to more goats being sacrificed using a Wilcoxon signed-rank test.
We are comparing scores from the same individuals after exposure to two songs, so we need to use the Wilcoxon signed-rank test.
We define the difference as nomessage - message, so a positive rank is one where more goats were sacrificed after no message than after a message (i.e. no message > message).
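The output below tells us that the test statistic is 294.5, and this is the sum of positive ranks, so T+ = 294.5. There are 32 participants, with 4 tied observations (pairs with a difference of zero, which are excluded from the ranking), so there were 28 ranks in total and so the sum of all ranks (let’s label this Tall) is 28 × 29/2 = 406. The sum of negative ranks is therefore T− = 406 − 294.5 = 111.5, which is the test statistic we would get if we defined the difference as message - nomessage (flipping the variable pair).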
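The rank-sum arithmetic in this step can be checked with a few lines of Python. This is only a sketch built from the numbers quoted in the text and output table. The variable names are made up for illustration, and the z value assumes the plain large-sample normal approximation with no tie or continuity correction, which may not be exactly what the software does.

```python
import math

# Values quoted in the text and the output table
n_participants = 32
n_zero_differences = 4   # tied pairs (difference of zero) are not ranked
t_plus = 294.5           # sum of positive ranks for nomessage - message

n_ranked = n_participants - n_zero_differences
t_all = n_ranked * (n_ranked + 1) / 2    # sum of the ranks 1..n_ranked
t_minus = t_all - t_plus                 # sum of negative ranks

# Matched-pairs rank-biserial correlation from the two rank sums
r_rb = (t_plus - t_minus) / t_all

# Assumption: plain normal approximation, no tie or continuity correction
mean_t = n_ranked * (n_ranked + 1) / 4
sd_t = math.sqrt(n_ranked * (n_ranked + 1) * (2 * n_ranked + 1) / 24)
z = (t_plus - mean_t) / sd_t

print(n_ranked, t_all, t_minus)       # 28 406.0 111.5
print(round(r_rb, 3), round(z, 3))    # 0.451 2.084
```

The same two rank sums give both test statistics (294.5 and 111.5), and their difference relative to the total gives the rank-biserial correlation shown in the output.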
The effect size is r_rb = 0.45. This effect size tells us how much larger the sum of positive ranks is than the sum of negative ranks, as a proportion of the sum of all ranks (0.45 or 45%). Since we want to test the hypothesis that more goats are slaughtered after hearing a message, we specify such a one-sided alternative hypothesis, and observe a p-value of .982. The p-value is this large because our observed effect actually goes in the opposite direction! We could report something like (note that you can obtain the second batch of values by changing the order of the variable pairs):
The number of goats sacrificed after hearing the message was not significantly higher than after hearing the normal version of the song, T = 111.5, p = .982, r_rb = −0.45. In fact the number of goats sacrificed seemed to be significantly higher after hearing the normal version, T = 294.5, p = .019, r_rb = 0.45.
This illustrates why it’s safer to do a two-sided test if you just want to test for a difference: if we had done a two-sided test, the p-value would be .037 for both definitions of the difference scores. However, be sure not to attribute a one-sided interpretation to this two-sided p-value.
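To see why flipping the variable pair only swaps the two rank sums and changes the sign of the effect size, here is a minimal pure-Python sketch of the signed-rank bookkeeping. The function names and the small data set are hypothetical, invented purely for illustration; they are not the goat data from this task, and the sketch does not compute p-values.

```python
def signed_rank_sums(x, y):
    """Return (T+, T-) for the paired differences x - y."""
    # Zero differences are dropped before ranking
    diffs = [a - b for a, b in zip(x, y) if a != b]
    # Indices of the differences, ordered by absolute size
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Find the run of equal absolute differences starting at position i
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1   # mean of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    t_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    t_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return t_plus, t_minus


def rank_biserial(t_plus, t_minus):
    """Matched-pairs rank-biserial correlation from the two rank sums."""
    return (t_plus - t_minus) / (t_plus + t_minus)


# Hypothetical scores for eight participants (not the real data)
nomessage = [9, 8, 6, 7, 5, 10, 4, 6]
message = [6, 8, 7, 3, 4, 6, 5, 1]

forward = signed_rank_sums(nomessage, message)   # nomessage - message
flipped = signed_rank_sums(message, nomessage)   # message - nomessage

print(forward, round(rank_biserial(*forward), 3))   # (24.0, 4.0) 0.714
print(flipped, round(rank_biserial(*flipped), 3))   # (4.0, 24.0) -0.714
```

Both orderings rank the same absolute differences, so the evidence is identical; only the label of the "positive" direction changes, which is why the two reports differ in the sign of r_rb and in which tail the one-sided p-value refers to.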
| Measure 1 | | Measure 2 | W | z | df | p | Rank-Biserial Correlation | SE Rank-Biserial Correlation | 95% CI for Rank-Biserial Correlation: Lower | 95% CI for Rank-Biserial Correlation: Upper |
|---|---|---|---|---|---|---|---|---|---|---|
| nomessage | - | message | 294.500 | 2.084 | | 0.982 | 0.451 | 0.213 | -∞ | 0.687 |

Note. For all tests, the alternative hypothesis specifies that nomessage is less than message.
Note. Wilcoxon signed-rank test.
The raincloud plots below give a neat visualization of the observed differences. The lines connecting the matched observations show that most people sacrificed more goats in the "no message" condition. The second raincloud plot shows the observed differences: the y-axis shows that the difference is defined as nomessage - message, and most differences are positive, indicating that observations in the "nomessage" condition were generally higher.