Results

Logistic Regression

Task 17.1: A ‘display rule’ refers to displaying an appropriate emotion in a situation. For example, if you receive a present that you don’t like, you should smile politely and say ‘Thank you, Auntie Kate, I’ve always wanted a rotting cabbage’; you do not start crying and scream ‘Why did you buy me a rotting cabbage?!’. A psychologist measured children’s understanding of display rules (with a task that they could pass or fail), their age (months), and their ability to understand others’ mental states (‘theory of mind’, measured with a false belief task that they could pass or fail). Can display rule understanding (did the child pass the test: yes/no?) be predicted from theory of mind (did the child pass the false belief, fb, task: yes/no?), age and their interaction? (display.jasp)


Notice that both of the categorical variables have been entered as coding variables: the outcome variable is coded such that 1 represents having display rule understanding and 0 represents an absence of display rule understanding. For the false-belief task a similar coding is used (1 = passed the false-belief task, 0 = failed the false-belief task).


To compare the impact of each predictor separately, go to the Model tab and add two models, so that there are four in total: Model 0 should not include any predictors, Model 1 should include only fb, Model 2 should include fb and age, and Model 3 should include fb, age, and their interaction. To input the interaction, hold Ctrl (⌘ on a Mac) while selecting both predictors, then add them to Model 3.
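If you want to check these models outside JASP, the same hierarchy can be fit in Python with statsmodels. This is a minimal sketch; it assumes the data have been exported to a hypothetical file display.csv with columns display and fb (both coded 0/1, as described above) and age (in months):

```python
# Sketch of the same model hierarchy, assuming a hypothetical export
# 'display.csv' with columns 'display', 'fb' (both 0/1) and 'age' (months).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("display.csv")

m0 = smf.logit("display ~ 1", data=df).fit(disp=0)         # constant only
m1 = smf.logit("display ~ fb", data=df).fit(disp=0)        # add false-belief understanding
m2 = smf.logit("display ~ fb + age", data=df).fit(disp=0)  # add age
m3 = smf.logit("display ~ fb * age", data=df).fit(disp=0)  # add the fb-by-age interaction

# Deviance = -2 * log-likelihood; these should match the Model Summary table.
for name, m in [("M0", m0), ("M1", m1), ("M2", m2), ("M3", m3)]:
    print(name, round(-2 * m.llf, 3), round(m.aic, 3), round(m.bic, 3))
```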


Model Summary - display
Model Deviance AIC BIC df Δχ² p McFadden R² Nagelkerke R² Tjur R² Cox & Snell R²
M₀ 96.124 98.124 100.373 69     0.000 0.000
M₁ 70.042 74.042 78.539 68 26.083 < .001 0.271 0.417 0.352 0.311
M₂ 67.757 73.757 80.502 67 2.285 0.131 0.295 0.446 0.373 0.333
M₃ 67.634 75.634 84.628 66 0.123 0.726 0.296 0.448 0.374 0.334
Note.  M₁ includes fb
Note.  M₂ includes fb, age
Note.  M₃ includes age, fb, age:fb

The deviance of the baseline model (M₀) is 96.124 (the deviance is −2 times the log-likelihood). This represents the fit of the model when only the constant is included. Initially every child is predicted to belong to the category in which most observed cases fell. In this example 39 children had display rule understanding and only 31 did not. Therefore, of the two available options it is better to predict that all children had display rule understanding, because this results in a greater number of correct predictions. Overall, this baseline model correctly classifies 55.7% of children (since 55.7% of the children actually had display rule understanding).
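As a quick sketch, continuing with the df data frame from above, the baseline accuracy is just the proportion of children in the most common category:

```python
# Sketch: the baseline rule predicts the most common category ('Yes') for
# every child, so its accuracy is the proportion of children who passed.
n_pass = int((df["display"] == 1).sum())   # 39 children passed
n_total = len(df)                          # 70 children in total
print(round(100 * n_pass / n_total, 1))    # 55.7% correctly classified
```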

Coefficients
Model   Estimate Standard Error Odds Ratio z Wald Statistic df p 95% CI Lower (odds ratio scale) 95% CI Upper (odds ratio scale)
M₀ (Intercept) 0.230 0.241 1.258 0.954 0.910 1 0.340 0.785 2.016
M₁ (Intercept) -1.344 0.458 0.261 -2.931 8.592 1 0.003 0.106 0.641
  fb (Yes) 2.761 0.605 15.812 4.567 20.857 1 < .001 4.835 51.710
M₂ (Intercept) -2.496 0.918 0.082 -2.718 7.389 1 0.007 0.014 0.498
  fb (Yes) 2.165 0.696 8.715 3.112 9.683 1 0.002 2.229 34.081
  age 0.032 0.021 1.032 1.490 2.221 1 0.136 0.990 1.077
M₃ (Intercept) -2.948 1.596 0.052 -1.847 3.410 1 0.065 0.002 1.198
  age 0.044 0.041 1.045 1.085 1.176 1 0.278 0.965 1.132
  fb (Yes) 2.858 2.105 17.422 1.357 1.843 1 0.175 0.281 1079.156
  age * fb (Yes) -0.017 0.048 0.983 -0.351 0.123 1 0.726 0.896 1.080
Note.  display level 'Yes' coded as class 1.

In M1, false-belief understanding (fb) is added to the model as a predictor. As such, a child is now classified as having display rule understanding based on whether they passed or failed the false-belief task. The output above shows summary statistics about the new model. The overall fit of the new model is assessed using the Deviance. Remember that large values of the deviance statistic indicate poorly fitting statistical models.


If fb has improved the fit of the model then the deviance of M₁ should be lower than that of the constant-only model (because lower values of deviance indicate better fit). When only the constant was included, deviance = 96.124; with fb included this value drops to 70.042. This reduction tells us that the model is better at predicting display rule understanding than it was before fb was added. We can assess the significance of the change by subtracting the deviance of the new model from the deviance of the baseline model. The model chi-square statistic works on this principle and is, therefore, the deviance with only the constant in the model minus the deviance with fb included (Δχ² = 96.124 − 70.042 = 26.083). This value has a chi-square distribution with degrees of freedom equal to the difference in the two models' residual degrees of freedom (69 − 68 = 1). In this example the change is significant at the p < .001 level, so we can say that the model with fb included predicts display rule understanding significantly better than the model with only the constant.
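To make the test concrete, here is a sketch of the same deviance (likelihood-ratio) test computed directly; the deviances are the rounded values from the Model Summary table, so the difference comes out as 26.082 rather than the table's 26.083:

```python
# Sketch of the deviance (likelihood-ratio) test comparing M0 and M1,
# using the rounded deviances from the Model Summary table.
from scipy.stats import chi2

delta_chi2 = 96.124 - 70.042           # reduction in deviance, ~26.082
df_diff = 69 - 68                      # difference in residual df
print(delta_chi2, chi2.sf(delta_chi2, df=df_diff))   # p < .001
```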


The Coefficients table tells us the estimates for the coefficients for the predictors included in the M1 model (namely, fb and the constant). The coefficient represents the change in the logit of the outcome variable associated with a one-unit change in the predictor variable. The logit of the outcome is the natural logarithm of the odds of Y occurring.
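Because the coefficients live on the log-odds scale, exponentiating them recovers the odds ratios shown in the table. A sketch for the fb coefficient in M₁, using the rounded table value (so expect small rounding differences):

```python
# Sketch: exponentiating a log-odds coefficient gives its odds ratio.
import math

b_fb = 2.761                      # rounded fb coefficient from the table
print(round(math.exp(b_fb), 3))   # ~15.815; the table reports 15.812 (rounding)
```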


The Wald statistic has a chi-square distribution and tells us whether the b coefficient for that predictor is significantly different from zero. If the coefficient is significantly different from zero then we can assume that the predictor is making a significant contribution to the prediction of the outcome (Y). For these data, false-belief understanding is a significant predictor of display rule understanding (the p-value for the Wald statistic is less than .001).
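A sketch of where these numbers come from: the z-value is the coefficient divided by its standard error, and the Wald statistic is that z squared, referred to a chi-square distribution with 1 degree of freedom (again using the rounded table values):

```python
# Sketch of the Wald test for the fb coefficient in M1:
# z = b / SE, and z**2 is compared against a chi-square(1).
from scipy.stats import chi2

b, se = 2.761, 0.605
z = b / se                   # ~4.564 (table: 4.567, from unrounded values)
wald = z ** 2                # ~20.83 (table: 20.857)
print(chi2.sf(wald, df=1))   # p < .001
```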


We can also look at the Model Summary table to see whether the other predictors (age and the interaction effect) add predictive value to the model. First, we have M₂, which reduced the deviance by a further 2.285 compared with M₁. Combining this result (Δχ² = 2.285) with its degrees of freedom (68 − 67 = 1) gives the p-value of 0.131 listed in the table. The decrease in deviance is therefore not significant, which indicates that age does not improve our model predictions to the extent that it's worth including in the model. Here we also see the BIC shining. The BIC is similar to the deviance, in that a lower number indicates better model fit, but the BIC adds a heavy penalty for model complexity (i.e., the number of predictors). That is a nice feature because in model selection we generally want a built-in Occam's razor: we want to predict as well as possible, while using as few predictors as possible. Since M₂ has an additional predictor, while not improving the predictions much, the BIC for M₂ (80.502) is actually higher than that of M₁ (78.539) - bad news for M₂! The same goes for M₃, where adding the interaction does not improve model fit significantly (Δχ²(1) = 0.123, p = .726), and where the BIC (84.628) is even higher than that of M₂. (See the sketch below for how these incremental tests are computed.)
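The incremental tests in this paragraph can be reproduced from the deviances reported in the Model Summary table, for instance:

```python
# Sketch: the incremental chi-square tests from the Model Summary table,
# computed from the reported (rounded) deviances; each comparison has 1 df.
from scipy.stats import chi2

dev = {"M1": 70.042, "M2": 67.757, "M3": 67.634}
print(chi2.sf(dev["M1"] - dev["M2"], df=1))   # adding age:    p ~ .131
print(chi2.sf(dev["M2"] - dev["M3"], df=1))   # adding age:fb: p ~ .726
```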


Assuming we are content that the model is accurate and that false-belief understanding has some substantive significance, then we could conclude that false-belief understanding is the single best predictor of display rule understanding. Furthermore, age and the interaction of age and false-belief understanding did not significantly predict display rule understanding. This conclusion is fine in itself, but to be sure that the model is a good one, it is important to examine the residuals, which brings us nicely onto the next task.



Performance Diagnostics

Confusion matrix
Observed    Predicted No    Predicted Yes    % Correct
No          23              8                74.194
Yes         6               33               84.615
Overall % Correct                            80.000
Note.  The cut-off value is set to 0.5

The confusion matrix shows the predictions of whether a child passes the display rule task, based on M₃ (the final model). If the model fitted the data perfectly, all cases would fall along the diagonal (i.e., we predict that a child passes the display rule task and they actually did, or we predict that they fail and they actually did). In this example, our model correctly classified 80% of the observations.
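A sketch of how such a confusion matrix is built, assuming the m3 fit and df data frame from the earlier sketch; cases with a fitted probability at or above the 0.5 cut-off are classified as 'Yes':

```python
# Sketch: rebuilding the confusion matrix from M3's fitted probabilities.
import numpy as np

pred = (m3.predict() >= 0.5).astype(int)   # apply the 0.5 cut-off
obs = df["display"].to_numpy()

matrix = np.zeros((2, 2), dtype=int)
for o, p in zip(obs, pred):
    matrix[o, p] += 1   # rows: observed No/Yes; columns: predicted No/Yes

print(matrix)                                         # should reproduce the table
print(round(100 * float((obs == pred).mean()), 1))    # overall % correct, ~80.0
```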

Logistic Regression: Influential Cases

Task 17.2: Are there any influential cases or outliers in the model for Task 1?

First, we need to make sure we obtain output for the best model, based on the adventure we had in the previous exercise. This was the model with only fb, so make sure that you now have a logistic regression with the baseline model (M₀) and a single alternative model (M₁) that has fb as its only predictor. To obtain information about influential cases, tick the Casewise diagnostics option, which can be found under the Statistics tab. To detect outliers, you can use a threshold for the standardized residual (anything with an absolute value greater than 3 is worth investigating) or Cook's distance (values close to, or greater than, 1 are usually problematic). Luckily, no observations cross these thresholds, so we have no reason to be concerned about outliers.
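For readers who want to check these diagnostics outside JASP, here is a sketch using statsmodels, refitting the fb-only model as a binomial GLM so that influence measures are available; note that statsmodels' standardized residuals are computed somewhat differently from JASP's, so expect small numerical differences:

```python
# Sketch of casewise diagnostics for the fb-only model, refit as a binomial
# GLM so that statsmodels' influence measures are available.
import statsmodels.api as sm
import statsmodels.formula.api as smf

glm_fb = smf.glm("display ~ fb", data=df, family=sm.families.Binomial()).fit()
influence = glm_fb.get_influence()

std_resid = influence.resid_studentized    # standardized residuals
cooks_d = influence.cooks_distance[0]      # Cook's distance per case

flagged = (abs(std_resid) > 3) | (cooks_d >= 1)   # thresholds from the text
print(df[flagged])                         # expected to be empty: no influential cases
```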

Model Summary - display
Model Deviance AIC BIC df Δχ² p McFadden R² Nagelkerke R² Tjur R² Cox & Snell R²
M₀ 96.124 98.124 100.373 69     0.000 0.000
M₁ 70.042 74.042 78.539 68 26.083 < .001 0.271 0.417 0.352 0.311
Note.  M₁ includes fb
Coefficients
Model   Estimate Standard Error Odds Ratio z Wald Statistic df p 95% CI Lower (odds ratio scale) 95% CI Upper (odds ratio scale)
M₀ (Intercept) 0.230 0.241 1.258 0.954 0.910 1 0.340 0.785 2.016
M₁ (Intercept) -1.344 0.458 0.261 -2.931 8.592 1 0.003 0.106 0.641
  fb (Yes) 2.761 0.605 15.812 4.567 20.857 1 < .001 4.835 51.710
Note.  display level 'Yes' coded as class 1.
Influential Cases
Case Number Std. Residual display Predicted Value Residual Cook's Distance Leverage
Note.  No influential cases found.