In a second study, Piff et al. (2012) observed the behaviour of drivers and classified social class by the type of car (vehicle), but the outcome was whether the drivers cut off a pedestrian at a crossing (pedestrian_cut). Do a logistic regression to see whether social class predicts whether or not a driver prevents a pedestrian from crossing (piff_2012_pedestrian.jasp).
Model | Deviance | AIC | BIC | df | ΔΧ² | p | McFadden R² | Nagelkerke R² | Tjur R² | Cox & Snell R²
---|---|---|---|---|---|---|---|---|---|---
M₀ | 197.796 | 199.796 | 202.820 | 151 | 0.000 | 0.000
M₁ | 192.939 | 196.939 | 202.987 | 150 | 4.856 | 0.028 | 0.025 | 0.043 | 0.031 | 0.031
Note. M₁ includes vehicle
Model | Parameter | Estimate | Standard Error | Odds Ratio | z | Wald Statistic | Wald df | Wald p | 95% CI Lower bound (odds ratio scale) | 95% CI Upper bound (odds ratio scale)
---|---|---|---|---|---|---|---|---|---|---
M₀ | (Intercept) | -0.596 | 0.169 | 0.551 | -3.517 | 12.366 | 1 | < .001 | 0.395 | 0.768
M₁ | (Intercept) | -1.910 | 0.643 | 0.148 | -2.971 | 8.828 | 1 | 0.003 | 0.042 | 0.522
M₁ | vehicle | 0.402 | 0.187 | 1.495 | 2.154 | 4.641 | 1 | 0.031 | 1.037 | 2.155
Note. pedestrian_cut level 'Yes' coded as class 1.
Observed | Predicted: No | Predicted: Yes | % Correct
---|---|---|---
No | 91 | 7 | 92.857
Yes | 48 | 6 | 11.111
Overall % Correct | | | 63.816
Note. The cut-off value is set to 0.5
The first row in the Model Summary table tells us about the model when only the constant is included. In this example there were 54 participants who did cut off pedestrians at intersections and 98 who did not. Therefore, of the two available options it is better to predict that all participants did not cut off pedestrians, because this results in a greater number of correct predictions. The model in this basic state, predicting that all participants did not cut off pedestrians, results in 0% accuracy for those who did cut off pedestrians and 100% accuracy for those who did not. Overall, the model correctly classifies 64.5% of participants, because 98/152 = 64.5% of the participants did not cut off pedestrians. The table labelled Coefficients at this stage contains only the constant, which has a value of b0 = −0.596. This is the log odds of cutting off a pedestrian, ln(54/98), which corresponds to the odds of 0.551 shown in the Odds Ratio column.
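The baseline figures above can be checked by hand. The following is a minimal sketch that uses only the two counts given in the text (54 and 98); the variable names are illustrative and are not part of the JASP output.

```python
import math

# Counts reported in the text: 54 drivers cut off a pedestrian, 98 did not.
n_yes, n_no = 54, 98
n_total = n_yes + n_no

# With no predictors, the fitted intercept is the log of the observed odds.
baseline_odds = n_yes / n_no
b0 = math.log(baseline_odds)

# Predicting the more common outcome ("No") for everyone gives the
# accuracy of the constant-only model.
baseline_accuracy = max(n_yes, n_no) / n_total

print(f"odds of cutting off a pedestrian: {baseline_odds:.3f}")
print(f"intercept b0 (log odds):          {b0:.3f}")
print(f"baseline accuracy:                {baseline_accuracy:.1%}")
```

The printed values should agree with the M₀ row of the Coefficients table (estimate and odds ratio) and with the 64.5% quoted above.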
The second row in the Model Summary table tells us what happened after the predictor variable (vehicle) was added to the model. As such, a person is now classified as either cutting off pedestrians at an intersection or not, based on the type of vehicle they were driving (as a measure of social status). The output shows summary statistics about the new model. Adding vehicle reduces the deviance from 197.796 to 192.939, and this improvement in fit is significant, ΔΧ²(1) = 4.856, p = .028. Therefore, the model that includes the variable vehicle predicted whether or not participants cut off pedestrians at intersections better than the model that includes only the constant.
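The quantities in the Model Summary table can be approximately reproduced from the two deviances alone. The sketch below assumes that the deviance equals −2 × log-likelihood (which holds for a binary outcome with one row per driver) and uses the standard textbook formulas for AIC, BIC, and the pseudo-R² measures; whether JASP computes them in exactly this way is an assumption, and the variable and function names are illustrative.

```python
import math

n = 152                        # number of drivers (54 + 98)
dev_m0, k_m0 = 197.796, 1      # deviance and number of parameters for M0
dev_m1, k_m1 = 192.939, 2      # deviance and number of parameters for M1

# Likelihood-ratio test: the drop in deviance is chi-square distributed
# with df equal to the number of added parameters (here 1).
chi_sq = dev_m0 - dev_m1
df = k_m1 - k_m0
# For df = 1 the upper-tail chi-square probability is erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_sq / 2))

def aic(deviance, k):
    """Akaike information criterion, assuming deviance = -2 log-likelihood."""
    return deviance + 2 * k

def bic(deviance, k, n_obs):
    """Bayesian information criterion, assuming deviance = -2 log-likelihood."""
    return deviance + k * math.log(n_obs)

# Pseudo-R-squared measures that depend only on the two deviances.
mcfadden = 1 - dev_m1 / dev_m0
cox_snell = 1 - math.exp(-chi_sq / n)
nagelkerke = cox_snell / (1 - math.exp(-dev_m0 / n))

print(f"chi-square({df}) = {chi_sq:.3f}, p = {p_value:.3f}")
print(f"AIC: M0 = {aic(dev_m0, k_m0):.3f}, M1 = {aic(dev_m1, k_m1):.3f}")
print(f"BIC: M0 = {bic(dev_m0, k_m0, n):.3f}, M1 = {bic(dev_m1, k_m1, n):.3f}")
print(f"McFadden = {mcfadden:.3f}, Cox & Snell = {cox_snell:.3f}, "
      f"Nagelkerke = {nagelkerke:.3f}")
```

Because the inputs are rounded to three decimals, small discrepancies in the last digit are to be expected. Note that BIC penalises the extra parameter by ln(152) ≈ 5.02, which is slightly more than the drop in deviance, so BIC ends up marginally higher for M₁ even though the likelihood-ratio test is significant.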
The Confusion matrix indicates how well the model predicts group membership. In M₁, the model correctly classifies 91 participants who did not cut off pedestrians and misclassifies 7 (i.e. it correctly classifies 92.9% of these cases). For participants who did cut off pedestrians, the model correctly classifies 6 and misclassifies 48 cases (i.e. it correctly classifies 11.1% of these cases). The overall accuracy of classification is the weighted average of these two values, (91 + 6)/152 = 63.8%. Therefore, the overall accuracy has decreased slightly (from 64.5% to 63.8%).
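The percentages in the Confusion matrix follow directly from the four cell counts. Here is a small sketch of that arithmetic; the dictionary layout and function name are illustrative choices, not something produced by JASP.

```python
# Cell counts from the Confusion matrix:
# outer key = observed outcome, inner key = predicted outcome.
confusion = {
    "No": {"No": 91, "Yes": 7},
    "Yes": {"No": 48, "Yes": 6},
}

def class_accuracy(observed):
    """Proportion of cases with this observed outcome that were predicted correctly."""
    row = confusion[observed]
    return row[observed] / sum(row.values())

n_total = sum(sum(row.values()) for row in confusion.values())
n_correct = sum(confusion[label][label] for label in confusion)
overall_accuracy = n_correct / n_total

# The overall accuracy is the class accuracies weighted by class size.
weighted_average = sum(
    class_accuracy(label) * sum(confusion[label].values()) for label in confusion
) / n_total

# Accuracy of the constant-only model, which predicts "No" for everyone.
baseline_accuracy = sum(confusion["No"].values()) / n_total

print(f"correct for 'No':  {class_accuracy('No'):.1%}")
print(f"correct for 'Yes': {class_accuracy('Yes'):.1%}")
print(f"overall:           {overall_accuracy:.1%}")
print(f"baseline (M0):     {baseline_accuracy:.1%}")
```

Comparing the last two printed values shows the slight drop in overall accuracy when vehicle is added to the model.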
The Coefficients table shows that the significance of the Wald statistic is .031, which is less than .05. Therefore, we can conclude that the status of the vehicle the participant was driving significantly predicted whether or not they cut off pedestrians at an intersection. The odds ratio is the change in odds of the outcome resulting from a unit change in the predictor. In this example, the odds ratio for vehicle in M₁ is 1.495, which is greater than 1, indicating that as the predictor (vehicle) increases, the odds of the outcome also increase; that is, the outcome becomes more likely to move from 0 (did not cut off pedestrian) to 1 (cut off pedestrian). The 95% confidence interval (1.037 to 2.155) does not include 1, indicating that there likely is an effect. However, just as in the previous exercise, the BIC (which is slightly higher for M₁) and the confusion matrix (which shows poorer predictions for M₁) do not seem to agree with this significance. Perhaps it's time to stop using 0.05 as the significance threshold and opt for something more strict, like 0.01 or 0.005?
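To see where the odds ratio, its confidence interval, and the classifications come from, the following sketch recomputes them from the rounded M₁ estimates in the Coefficients table. It assumes a standard Wald interval (estimate ± 1.96 standard errors, exponentiated), which may differ slightly from what JASP does internally; the function and variable names are hypothetical.

```python
import math

# Rounded M1 estimates from the Coefficients table.
b0 = -1.910        # intercept
b1 = 0.402         # slope for vehicle
se_b1 = 0.187      # standard error of the slope

# Odds ratio: multiplicative change in odds per one-unit increase in vehicle.
odds_ratio = math.exp(b1)

# Wald test for the slope.
z = b1 / se_b1
wald = z ** 2

# Assumed Wald 95% confidence interval, computed on the log-odds scale
# and then exponentiated to the odds ratio scale.
z_crit = 1.959964  # 97.5th percentile of the standard normal distribution
ci_lower = math.exp(b1 - z_crit * se_b1)
ci_upper = math.exp(b1 + z_crit * se_b1)

def predicted_probability(vehicle):
    """Model-implied probability of cutting off a pedestrian for a vehicle score."""
    return 1 / (1 + math.exp(-(b0 + b1 * vehicle)))

# Vehicle score at which the predicted probability reaches the 0.5 cut-off.
threshold = -b0 / b1

print(f"odds ratio = {odds_ratio:.3f}, 95% CI [{ci_lower:.3f}, {ci_upper:.3f}]")
print(f"z = {z:.3f}, Wald = {wald:.3f}")
print(f"predicted 'Yes' only when vehicle > {threshold:.2f}")
```

Because the estimate and standard error are rounded to three decimals, the results match the table only to about two decimal places. The threshold helps explain the confusion matrix: only drivers with a vehicle score above that value are predicted to cut off a pedestrian, which is why so few "Yes" predictions are made.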