The Research Process

Data Analysis for Psychology in R 3

Josiah King

Psychology, PPLS

University of Edinburgh

QR CODE!!!

Course Overview

multilevel modelling
working with group structured data
regression refresher
introducing multilevel models
more complex groupings
centering, assumptions, and diagnostics
recap
factor analysis
working with multi-item measures
what is a psychometric test?
using composite scores to simplify data (PCA)
uncovering underlying constructs (EFA)
more EFA
recap

VISUAL RECAP

example

How do reaction times change with increasing sleep deprivation? Participants measured over 10 days of sleep deprivation.

lm: a line

lm: a line (2)

lm: a line (3)

lm: lines and differences between them

lmer: lines with a distribution of intercepts

lmer: lines with a distribution of intercepts (2)

lmer: lines with a distribution of intercepts (3)

lmer: lines with a distribution of intercepts (4)

lmer: and with a distribution of slopes

lmer: and with a distribution of slopes (2)

fixed and random

nested: distributions of distributions

crossed: distributions and distributions

The research process: model specification

Study

In 2010 A US state’s commissioner for education was faced with growing community concern about rising levels of adolescent antisocial behaviours.

After a series of focus groups, the commissioner approved the trialing of an intervention in which yearly Parent Management Training (PMT) group sessions were offered to the parents of a cohort of students entering 10 different high schools. Every year, the parents were asked to fill out an informant-based version of the Aggressive Behaviour Scale (ABS), measuring verbal and physical abuse, socially inappropriate behavior, and resisting care. Where possible, the same parents were followed up throughout the child’s progression through high school. Alongside this, parents from a cohort of students entering 10 further high schools in the state were recruited to also complete the same informant-based ABS, but were not offered the PMT group sessions.
The commissioner has two main questions: Does the presentation of aggressive behaviours increase as children enter the secondary school system? If so, is there any evidence for the effectiveness of Parent Management Training (PMT) group sessions in curbing the rise of aggressive behaviors during a child’s transition into the secondary school system?

The data is available at https://uoepsy.github.io/data/abs_intervention.csv

Data

variable description
schoolid Name of school
ppt Participant number
age Age of participant (in years) at observation
interv Informant-based Aggressive Behaviour Scale (ABS) score (range 0 to 100)
ABS Whether or not the school was part of the intervention group in which Parent Management Training (PMT) group sessions were offered to the parents
# A tibble: 6 × 5
  schoolid                 ppt   age interv   ABS
  <chr>                  <dbl> <dbl> <fct>  <dbl>
1 Blue River High School     1    12 1       47.5
2 Blue River High School     1    13 1       47.4
3 Blue River High School     1    14 1       53.0
4 Blue River High School     1    15 1       70.7
5 Blue River High School     1    16 1       66.3
6 Blue River High School     1    17 1       NA  

Question to Model

Does the presentation of aggressive behaviours increase as children enter the secondary school system? If so, is there any evidence for the effectiveness of Parent Management Training (PMT) group sessions in curbing the rise of aggressive behaviors during a child’s transition into the secondary school system?

(g)lmer(outcome ~ fixed effects + (random effects | grouping structure), family = …)

Outcome

Does the presentation of aggressive behaviours increase as children enter the secondary school system? If so, is there any evidence for the effectiveness of Parent Management Training (PMT) group sessions in curbing the rise of aggressive behaviors during a child’s transition into the secondary school system?

(g)lmer(outcome ~ fixed effects + (random effects | grouping structure), family = …)

  • What are we interested in explaining/predicting?

    • How is this measured?

Fixed Effects

Does the presentation of aggressive behaviours increase as children enter the secondary school system? If so, is there any evidence for the effectiveness of Parent Management Training (PMT) group sessions in curbing the rise of aggressive behaviors during a child’s transition into the secondary school system?

(g)lmer(outcome ~ fixed effects + (random effects | grouping structure), family = …)

  • What are we interested in explaining/predicting?

    • How is this measured?
  • What variables are we interested in explaining this by?

               schoolid ppt age interv  ABS
 Blue River High School   1  12      1 47.5
 Blue River High School   1  13      1 47.4
 Blue River High School   1  14      1   53
 Blue River High School   1  15      1 70.7
                    ... ... ...    ...  ...

Within & Between Effects

Does the presentation of aggressive behaviours increase as children enter the secondary school system? If so, is there any evidence for the effectiveness of Parent Management Training (PMT) group sessions in curbing the rise of aggressive behaviors during a child’s transition into the secondary school system?

(g)lmer(outcome ~ fixed effects + (random effects | grouping structure), family = …)

  • Are our questions about the effects of our predictors specifically in reference to group means of predictors?

    • “the effect of being higher on \(x\) for a group”

    • “the effect of a group being on average higher on \(x\)

The Grouping Structure

Does the presentation of aggressive behaviours increase as children enter the secondary school system? If so, is there any evidence for the effectiveness of Parent Management Training (PMT) group sessions in curbing the rise of aggressive behaviors during a child’s transition into the secondary school system?

(g)lmer(outcome ~ fixed effects + (random effects | grouping structure), family = …)

  • In what different ways can we group our data?
               schoolid ppt age interv  ABS
 Blue River High School   1  12      1 47.5
 Blue River High School   1  13      1 47.4
 Blue River High School   1  14      1   53
 Blue River High School   1  15      1 70.7
                    ... ... ...    ...  ...

The Grouping Structure

Does the presentation of aggressive behaviours increase as children enter the secondary school system? If so, is there any evidence for the effectiveness of Parent Management Training (PMT) group sessions in curbing the rise of aggressive behaviors during a child’s transition into the secondary school system?

(g)lmer(outcome ~ fixed effects + (random effects | grouping structure), family = …)

  • In what different ways can we group our data?
    • Of the different ways we can group our data, which groupings are of specific inferential interest?
               schoolid ppt age interv  ABS
 Blue River High School   1  12      1 47.5
 Blue River High School   1  13      1 47.4
 Blue River High School   1  14      1   53
 Blue River High School   1  15      1 70.7
                    ... ... ...    ...  ...

The Grouping Structure

Does the presentation of aggressive behaviours increase as children enter the secondary school system? If so, is there any evidence for the effectiveness of Parent Management Training (PMT) group sessions in curbing the rise of aggressive behaviors during a child’s transition into the secondary school system?

(g)lmer(outcome ~ fixed effects + (random effects | grouping structure), family = …)

  • In what different ways can we group our data?
    • Of the different ways we can group our data, which groupings do we think of as a random sample from a general population of groups?
               schoolid ppt age interv  ABS
 Blue River High School   1  12      1 47.5
 Blue River High School   1  13      1 47.4
 Blue River High School   1  14      1   53
 Blue River High School   1  15      1 70.7
                    ... ... ...    ...  ...

The Grouping Structure

Does the presentation of aggressive behaviours increase as children enter the secondary school system? If so, is there any evidence for the effectiveness of Parent Management Training (PMT) group sessions in curbing the rise of aggressive behaviors during a child’s transition into the secondary school system?

(g)lmer(outcome ~ fixed effects + (random effects | grouping structure), family = …)

  • In what different ways can we group our data?
    • Of the different ways we can group our data, which groupings do we think of as a random sample from a general population of groups?

      • Is there more than one grouping of this sort, and if so, are these groupings nested? Are the labels unique?
      • For each level, how many groups have we sampled?
table(absint$schoolid, absint$ppt)
                                 1 2 3 4 5 6 7 8 ...
  Blue River High School         8 8 8 8 8 8 8 8 ...
  Central Grammar School         8 8 8 8 8 8 8 8 ...
  Central High                   8 8 8 8 8 8 8 8 ...
  Clearwater Charter School      8 8 8 8 8 8 8 8 ...
  Clearwater School of Fine Arts 8 8 8 8 8 8 8 8 ...
  ...                            . . . . . . . . ...

The Grouping Structure

Does the presentation of aggressive behaviours increase as children enter the secondary school system? If so, is there any evidence for the effectiveness of Parent Management Training (PMT) group sessions in curbing the rise of aggressive behaviors during a child’s transition into the secondary school system?

(g)lmer(outcome ~ fixed effects + (random effects | grouping structure), family = …)

  • In what different ways can we group our data?
    • Of the different ways we can group our data, which groupings do we think of as a random sample from a general population of groups?

      • Is there more than one grouping of this sort, and if so, are these groupings nested? Are the labels unique?
      • For each level, how many groups have we sampled?
nrow(absint)
[1] 1568
absint |> 
  mutate(schoolppt = interaction(schoolid,ppt)) |>
  summarise(across(everything(), n_distinct))
# A tibble: 1 × 6
  schoolid   ppt   age interv   ABS schoolppt
     <int> <int> <int>  <int> <int>     <int>
1       20    16     8      2   993       196

Random Intercepts and Slopes

Does the presentation of aggressive behaviours increase as children enter the secondary school system? If so, is there any evidence for the effectiveness of Parent Management Training (PMT) group sessions in curbing the rise of aggressive behaviors during a child’s transition into the secondary school system?

(g)lmer(outcome ~ fixed effects + (random effects | grouping structure), family = …)

  • Which of our fixed effects can vary across our groupings?

    • “does a single group have multiple distinct values for \(x\)?”

    • “for the data from only one of our groups, can we estimate \(y \sim x\)?”

All the data:

Data from School == “Central High”:

Random Intercepts and Slopes

Does the presentation of aggressive behaviours increase as children enter the secondary school system? If so, is there any evidence for the effectiveness of Parent Management Training (PMT) group sessions in curbing the rise of aggressive behaviors during a child’s transition into the secondary school system?

(g)lmer(outcome ~ fixed effects + (random effects | grouping structure), family = …)

  • Which of our fixed effects can vary across our groupings?

    • “does a single group have multiple distinct values for \(x\)?”

    • “for the data from only one of our groups, can we estimate \(y \sim x\)?”

All the data:

Data from School == “Central High”, Participant == “1”:

Our model

Does the presentation of aggressive behaviours increase as children enter the secondary school system? If so, is there any evidence for the effectiveness of Parent Management Training (PMT) group sessions in curbing the rise of aggressive behaviors during a child’s transition into the secondary school system?

(g)lmer(outcome ~ fixed effects + (random effects | grouping structure), family = …)

      lmer(ABS         ~ age * interv +   (1 + age               |     schoolid        ) +

                                                              (1 + age               |     schoolid:ppt        )

Our model

Does the presentation of aggressive behaviours increase as children enter the secondary school system? If so, is there any evidence for the effectiveness of Parent Management Training (PMT) group sessions in curbing the rise of aggressive behaviors during a child’s transition into the secondary school system?

(g)lmer(outcome ~ fixed effects + (random effects | grouping structure), family = …)

    lmer(ABS           ~ ageC * interv + (1 + ageC               |     schoolid        ) +

                                                              (1 + ageC               |     schoolid:ppt        )

schoolid ppt age interv ABS ageC
Blue River High School 1 12 1 47.5 0
Blue River High School 1 13 1 47.4 1
Blue River High School 1 14 1 53 2
Blue River High School 1 15 1 70.7 3

The research process: model fitting

Model issues

  • Check for convergence issues & singular fits.
absmod <- lmer(ABS ~ ageC * interv + 
                 (1 + ageC | schoolid)+
                 (1 + ageC | schoolid:ppt),
               data = absint)
absmod@optinfo$message
[1] "NLOPT_XTOL_REACHED: Optimization stopped because xtol_rel or xtol_abs (above) was reached."

Assumptions

  • Check that we’re happy to assume the things we need to assume
plot(absmod,
     type = c("p"), main = "fitted vs residuals")

plot(absmod, 
     form = sqrt(abs(resid(.))) ~ fitted(.),
     type = c("p"), main="scale-location")

Assumptions

qqnorm(resid(absmod), main = "level one residuals")
qqline(resid(absmod))

qqnorm(ranef(absmod)$schoolid$`(Intercept)`, main="School level random intercepts")
qqline(ranef(absmod)$schoolid$`(Intercept)`)
qqnorm(ranef(absmod)$schoolid$ageC, main="School level random slopes of age")
qqline(ranef(absmod)$schoolid$ageC)

qqnorm(ranef(absmod)$`schoolid:ppt`$`(Intercept)`, main="Child level random intercepts")
qqline(ranef(absmod)$`schoolid:ppt`$`(Intercept)`)
qqnorm(ranef(absmod)$`schoolid:ppt`$ageC, main="Child level random slopes of age")
qqline(ranef(absmod)$`schoolid:ppt`$ageC)

Assumptions

Do tests if you want, but beware. Multilevel data tends to have bigger \(n\), and these tests are overly sensitive.

  • Personally, I prefer plots.
[1] "Shapiro-Wilk normality test resid(absmod) W=1, p=0.98"                             
[2] "Shapiro-Wilk normality test ranef(absmod)$schoolid$`(Intercept)` W=0.97, p=0.69"   
[3] "Shapiro-Wilk normality test ranef(absmod)$schoolid$ageC W=0.95, p=0.34"            
[4] "Shapiro-Wilk normality test ranef(absmod)$`schoolid:ppt`$`(Intercept)` W=1, p=0.95"
[5] "Shapiro-Wilk normality test ranef(absmod)$`schoolid:ppt`$ageC W=0.99, p=0.44"      

Diagnostics

The research process: inference

Inference

Does the presentation of aggressive behaviours increase as children enter the secondary school system? If so, is there any evidence for the effectiveness of Parent Management Training (PMT) group sessions in curbing the rise of aggressive behaviors during a child’s transition into the secondary school system?

Options

  • Model comparison

  • Parameters

Methods

  • satterthwaite df:
    • lmerTest::lmer() and pbkrtest::SATmodcomp(model2,model1)
  • likelihood ratio tests:
    • anova(model1, model2, ...)
  • kenward-rogers df
  • likelihood profile confidence intervals
  • parametric bootstrap
  • residual bootstrap
  • case bootstrap

Inference

Does the presentation of aggressive behaviours increase as children enter the secondary school system? If so, is there any evidence for the effectiveness of Parent Management Training (PMT) group sessions in curbing the rise of aggressive behaviors during a child’s transition into the secondary school system?

Options

  • Model comparison

  • Parameters

fixef(absmod)
 (Intercept)         ageC      interv1 ageC:interv1 
       46.46         1.67         1.10        -1.35 

Inference

absmodp <- lmerTest::lmer(ABS ~ ageC * interv + 
                 (1 + ageC | schoolid)+
                 (1 + ageC | schoolid:ppt),
               data = absint)
summary(absmodp)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: 
ABS ~ ageC * interv + (1 + ageC | schoolid) + (1 + ageC | schoolid:ppt)
   Data: absint

REML criterion at convergence: 7132

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.0748 -0.5767  0.0061  0.5712  3.0350 

Random effects:
 Groups       Name        Variance Std.Dev. Corr
 schoolid:ppt (Intercept) 17.379   4.169        
              ageC         4.913   2.217    0.69
 schoolid     (Intercept)  2.505   1.583        
              ageC         0.448   0.669    0.26
 Residual                 49.465   7.033        
Number of obs: 992, groups:  schoolid:ppt, 196; schoolid, 20

Fixed effects:
             Estimate Std. Error     df t value Pr(>|t|)    
(Intercept)    46.461      0.862 17.914   53.88   <2e-16 ***
ageC            1.673      0.385 20.361    4.34   0.0003 ***
interv1         1.096      1.207 16.818    0.91   0.3766    
ageC:interv1   -1.352      0.543 19.560   -2.49   0.0218 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) ageC   intrv1
ageC         0.003              
interv1     -0.714 -0.002       
ageC:intrv1 -0.002 -0.709 -0.003

Interpretation: Fixed effects

fixef(absmod)
 (Intercept)         ageC      interv1 ageC:interv1 
       46.46         1.67         1.10        -1.35 

MAP IT TO THE PLOT!!!!

Interpretation: Random effects

Don’t just ignore. They add context to results

  • e.g. they can give a descriptive answer to “should we expect all children get more aggressive in secondary school?”

  • often it’s about the amount of variability relative to the corresponding fixed effect.

    • a fixed slope of 0.5 with groups varying in slopes with an SD of 1
    • a fixed slope of 10 with groups varying in slopes with an SD of 1

Interpretation: Random effects

Don’t just ignore. They add context to results

  • e.g. they can give a descriptive answer to “should we expect all children get more aggressive in secondary school?”

The research process: reporting

Reporting the analysis strategy

  • Data cleaning outlier/data removal, transformations prior to analysis.

  • Unplanned transformations and data removal which are carried out in order to meet assumptions.

  • Specify all fixed effects (explanatory variables & covariates).
    Link them to explicitly stated research questions/hypotheses.

  • Explicitly state the hierarchical structure of the data and of the model.
    Specify random effects according to the sampling units (schools/children etc) with which they interact.

  • Planned structure of random effects to be fitted

  • Procedure to be used to decide on final random effect structure.

  • State clearly the relevant test/comparison/parameter estimate of interest.
    Link to explicitly stated research questions/hypotheses.

    • Method you plan to use to conduct inference (e.g. LRT, kr, bootstrap)
    • Any model comparisons should be clearly stated so that the reader understands the structure of both models being compared.

Reporting results

Model fitting (strategy? results? ??)

  • Software packages and versions used to fit the model(s), along with the estimation method (ML/REML) and optimiser used.

  • If proposed model failed to converge:

    • steps leading to final converging model.
    • final model structure

Results

For final model:

  • all parameter estimates for fixed effects.

    • coefficients
    • associated standard errors, test statistics, df and p-values (if used, or CIs if preferred)
  • random effects

    • standard deviation and/or variance for each random effect
    • correlations/covariances if modelled
    • residual variance/standard deviation

Reporting: text and tables

Tables help a lot!

library(sjPlot)
tab_model(absmod, 
          show.ci = 0.95,
          show.stat = TRUE/FALSE,
          show.se = TRUE/FALSE,
          show.df = TRUE/FALSE,
          show.re.var = TRUE/FALSE,
          df.method="satterthwaite")

But they are not a substitute for interpretation

  • Key parameters of interest should also be included in-text, with interpretation.
  ABS
Predictors Estimates Statistic p df
(Intercept) 46.46 53.88 <0.001 17.91
ageC 1.67 4.34 <0.001 20.36
interv [1] 1.10 0.91 0.377 16.82
ageC × interv [1] -1.35 -2.49 0.022 19.56
Random Effects
σ2 49.47
τ00 schoolid:ppt 17.38
τ00 schoolid 2.50
τ11 schoolid:ppt.ageC 4.91
τ11 schoolid.ageC 0.45
ρ01 schoolid:ppt 0.69
ρ01 schoolid 0.26
ICC 0.64
N schoolid 20
N ppt 16
Observations 992
Marginal R2 / Conditional R2 0.034 / 0.650

Visualisations

fixef(absmod)
 (Intercept)         ageC      interv1 ageC:interv1 
       46.46         1.67         1.10        -1.35 
library(effects)
efplot <- effect("ageC*interv", absmod) |> 
  as.data.frame()

ggplot(efplot, aes(x = ageC, y = fit, 
                   col = interv, fill = interv)) +
  geom_line()+
  geom_ribbon(aes(ymin = lower, ymax = upper),
              alpha=.3) +
  scale_x_continuous("Age",breaks=0:7,labels=12:19)

Visualisations - Results in context

fixef(absmod)
 (Intercept)         ageC      interv1 ageC:interv1 
       46.46         1.67         1.10        -1.35 
VarCorr(absmod)
 Groups       Name        Std.Dev. Corr
 schoolid:ppt (Intercept) 4.169        
              ageC        2.217    0.69
 schoolid     (Intercept) 1.583        
              ageC        0.669    0.26
 Residual                 7.033        
library(effects)
library(broom.mixed)
efplot <- effect("ageC*interv", absmod) |> 
  as.data.frame()

augment(absmod) |>
  mutate(uppt = paste0(schoolid, ppt)) |>
  ggplot(aes(x=ageC, col = interv)) +
  geom_line(aes(y = .fitted, group = uppt), alpha=.1) + 
  geom_line(data = efplot, aes(y = fit, col= interv)) +
  geom_ribbon(data = efplot, aes(y = fit, 
                                 ymin = lower, ymax = upper,
                                 fill = interv), alpha = .3) +
  scale_x_continuous("Age",breaks=0:7,labels=12:19)

Visualising - Results in context

But plotting manually gives you more control:

QR CODE!!!

End

This week

Tasks

Complete readings


Attend your lab and work together on the exercises


Complete the weekly quiz

Support

Piazza forum!


Office hours (see Learn page for details)