lm to lmer
In a simple linear regression, we consider only one source of random variability: any variability left unexplained by a set of predictors (which are modelled as fixed estimates) is captured in the model residuals.
Multi-level (or ‘mixed-effects’) approaches involve modelling more than one source of random variability: as well as variance resulting from taking a random sample of observations, we can identify random variability across different groups of observations. For example, if we are studying a patient population in a hospital, we would expect there to be variability across our sample of patients, but also across the doctors who treat them.
We can account for this variability by allowing the outcome to be lower/higher for each group (a random intercept) and by allowing the estimated effect of a predictor to vary across groups (random slopes).
Before you expand each of the boxes below, think about how comfortable you feel with each concept.
This content is very cumulative, so it is often worth going back to isolate exactly where we need to focus our learning efforts.
Formula: \(y_i = \beta_0 + \beta_1 x_i + \epsilon_i\)
R command:
lm(outcome ~ predictor, data = dataframe)
Note: this is the same as lm(outcome ~ 1 + predictor, data = dataframe). The 1 + is always there unless we specify otherwise (e.g., by using 0 +).
When our data is clustered (or ‘grouped’) such that datapoints are no longer independent, but belong to some grouping such as that of multiple observations from the same subject, we have multiple sources of random variability. A simple regression does not capture this.
If we separate out our data to show an individual plot for each subject, we can see how the fitted regression line from lm()
is assumed to be the same for each subject.
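To see this for yourself, you could overlay the single lm() fit on a separate panel for each subject; a minimal sketch, assuming a data frame called dataframe with columns outcome, predictor and subject (these names are placeholders):
library(tidyverse)
simplemod <- lm(outcome ~ predictor, data = dataframe)
ggplot(dataframe, aes(x = predictor, y = outcome)) +
  geom_point() +
  # the same single fitted line appears in every subject's panel
  geom_abline(intercept = coef(simplemod)[1], slope = coef(simplemod)[2]) +
  facet_wrap(~subject)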
By including a random-intercept term, we are letting our model estimate random variability around an average parameter (represented by the fixed effects) for the clusters.
Formula:
Level 1: \(y_{ij} = \beta_{0i} + \beta_{1} x_{ij} + \epsilon_{ij}\) (for observation \(j\) in group \(i\))
Level 2: \(\beta_{0i} = \gamma_{00} + \zeta_{0i}\)
Where the expected values of \(\zeta_{0}\), and \(\epsilon\) are 0, and their variances are \(\sigma_{\zeta_{0}}^2\) and \(\sigma_\epsilon^2\) respectively. We will further assume that these are normally distributed.
We can now see that the intercept estimate \(\beta_{0i}\) for a particular group \(i\) is represented by the combination of a mean estimate for the parameter (\(\gamma_{00}\)) and a random effect for that group (\(\zeta_{0i}\)).
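Substituting the level 2 equation into level 1 gives the combined (single-equation) form, which can help to see how the pieces fit together:
\[ y_{ij} = (\gamma_{00} + \zeta_{0i}) + \beta_{1} x_{ij} + \epsilon_{ij} \]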
R command:
lmer(outcome ~ predictor + (1 | grouping), data = dataframe)
Notice how the fitted line of the random intercept model has an adjustment for each subject.
Each subject’s line has been moved up or down accordingly.
If you think about it, we might have done a similar thing with the tools we already had at our disposal, by using lm(y~x+subject). This would give us a coefficient for the difference between each subject and the reference-level intercept, or we could extend this to lm(y~x*subject) to give us an adjustment to the slope for each subject.
However, the estimates from these models will be slightly different. Why? One of the benefits of multi-level models is that our cluster-level estimates are shrunk towards the average, with the amount of shrinkage depending on how much information each cluster provides (e.g., the number of observations in the cluster and the relative sizes of the within-cluster and between-cluster variability).
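To see the shrinkage for yourself, you could compare the per-subject intercepts from the two approaches; a minimal sketch, assuming a data frame called dat with columns y, x1 and subject (these names are placeholders):
library(lme4)
# fixed-effects approach: a separate intercept term for each subject
fixedmod <- lm(y ~ x1 + subject, data = dat)
# multilevel approach: subject intercepts modelled as random deviations around an average
mlmod <- lmer(y ~ x1 + (1 | subject), data = dat)
# per-subject intercepts: the multilevel estimates are pulled ('shrunk') towards the average
coef(mlmod)$subject[, "(Intercept)"]
coef(fixedmod)  # reference-level intercept plus each subject's difference from it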
Formula:
Level 1: \(y_{ij} = \beta_{0i} + \beta_{1i} x_{ij} + \epsilon_{ij}\)
Level 2: \(\beta_{0i} = \gamma_{00} + \zeta_{0i}\) and \(\beta_{1i} = \gamma_{10} + \zeta_{1i}\)
Where the expected values of \(\zeta_0\), \(\zeta_1\), and \(\epsilon\) are 0, and their variances are \(\sigma_{\zeta_{0}}^2\), \(\sigma_{\zeta_{1}}^2\), \(\sigma_\epsilon^2\) respectively. We will further assume that these are normally distributed.
As with the intercept \(\beta_{0i}\), the slope of the predictor \(\beta_{1i}\) is now modelled by a mean \(\gamma_{10}\) and a random effect for each group (\(\zeta_{1i}\)).
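Again, substituting the level 2 equations into level 1 gives the combined form:
\[ y_{ij} = (\gamma_{00} + \zeta_{0i}) + (\gamma_{10} + \zeta_{1i}) x_{ij} + \epsilon_{ij} \]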
R command:
lmer(outcome ~ predictor + (1 + predictor | grouping), data = dataframe)
Note: this is the same as lmer(outcome ~ predictor + (predictor | grouping), data = dataframe). As in the fixed-effects part, the 1 + is assumed in the random-effects part.
We can extract the fixed effects using the fixef()
function:
These are the overall intercept and slope, \(\gamma_{00}\) and \(\gamma_{10}\).
fixef(random_slopes_model)
## (Intercept) x1
## 405.790 -0.672
The plots below show the fitted values for each subject from each model that we have gone through in these expandable boxes (simple linear regression, random intercept, and random intercept & slope):
In the random-intercept model (center panel), the differences from each of the subjects’ intercepts to the fixed intercept (thick green line) have mean 0 and standard deviation \(\sigma_{\zeta_0}\). The standard deviation (and variance, which is \(\sigma_{\zeta_0}^2\)) is what we see in the random effects part of our model summary (or using the VarCorr()
function).
In the random-slope model (right panel), the same is true for the differences from each subject’s slope to the fixed slope.
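For example, we can inspect these variance components directly for the models fitted above:
VarCorr(random_intercept_model)
VarCorr(random_slopes_model)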
We can extract the deviations for each group from the fixed effect estimates using the ranef()
function.
These are the deviations from the overall intercept (\(\widehat \gamma_{00} = 405.79\)) and slope (\(\widehat \gamma_{10} = -0.672\)) for each subject \(i\).
ranef(random_slopes_model)
## $subject
## (Intercept) x1
## sub_308 31.33 -1.4400
## sub_309 -28.83 0.4184
## sub_310 2.71 0.0599
## sub_330 59.40 0.3853
## sub_331 74.96 0.1739
## sub_332 91.09 -0.2346
## sub_333 97.85 -0.1906
## sub_334 -54.19 -0.5585
## sub_335 -16.90 0.9207
## sub_337 52.22 -1.1660
## sub_349 -67.76 -0.6844
## sub_350 -5.82 -1.2379
## sub_351 61.20 0.0550
## sub_352 -7.91 -0.6650
## sub_369 -47.64 -0.4681
## sub_370 -33.12 -1.1100
## sub_371 77.58 -0.2040
## sub_372 -36.39 -0.4583
## sub_373 -197.58 1.7990
## sub_374 -52.20 4.6051
##
## with conditional variances for "subject"
We can also see the actual intercept and slope for each subject \(i\) directly, using the coef()
function.
coef(random_slopes_model)
## $subject
## (Intercept) x1
## sub_308 437 -2.112
## sub_309 377 -0.254
## sub_310 409 -0.612
## sub_330 465 -0.287
## sub_331 481 -0.498
## sub_332 497 -0.907
## sub_333 504 -0.863
## sub_334 352 -1.231
## sub_335 389 0.248
## sub_337 458 -1.838
## sub_349 338 -1.357
## sub_350 400 -1.910
## sub_351 467 -0.617
## sub_352 398 -1.337
## sub_369 358 -1.140
## sub_370 373 -1.782
## sub_371 483 -0.876
## sub_372 369 -1.131
## sub_373 208 1.127
## sub_374 354 3.933
##
## attr(,"class")
## [1] "coef.mer"
Notice that the above are the fixed effects + random effects estimates, i.e. the overall intercept and slope + deviations for each subject.
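We can check this correspondence directly; a quick sketch using the objects above:
# fixed intercept/slope plus each subject's deviation...
fixef(random_slopes_model)["(Intercept)"] + ranef(random_slopes_model)$subject[, "(Intercept)"]
fixef(random_slopes_model)["x1"] + ranef(random_slopes_model)$subject[, "x1"]
# ...should match the per-subject values in coef(random_slopes_model)$subject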
coef(random_intercept_model)
## $subject
## (Intercept) x1
## sub_308 384 -0.914
## sub_309 407 -0.914
## sub_310 422 -0.914
## sub_330 492 -0.914
## sub_331 498 -0.914
## sub_332 496 -0.914
## sub_333 505 -0.914
## sub_334 339 -0.914
## sub_335 440 -0.914
## sub_337 417 -0.914
## sub_349 320 -0.914
## sub_350 356 -0.914
## sub_351 479 -0.914
## sub_352 380 -0.914
## sub_369 349 -0.914
## sub_370 335 -0.914
## sub_371 484 -0.914
## sub_372 361 -0.914
## sub_373 294 -0.914
## sub_374 511 -0.914
##
## attr(,"class")
## [1] "coef.mer"
The model fitted (or “model predicted”) values can be obtained using predict()
(returning just the values) or broom.mixed::augment()
(returning the values attached to the data that is inputted to the model).
To plot them, we would typically like to plot the fitted values for each group (e.g. subject):
library(broom.mixed)
augment(random_slopes_model) %>%
ggplot(.,aes(x=x1, y=.fitted, group=subject))+
geom_line()
If we want to plot the fixed effects from our model, we have to do something else. Packages like sjPlot make it incredibly easy (but sometimes too easy), so a nice option is to use the effects package to construct a dataframe of the linear prediction across the values of a predictor, plus standard errors and confidence intervals. We can then pass this to ggplot(), giving us full control over the aesthetics.
library(effects)
ef <- as.data.frame(effect(term="x1",mod=random_slopes_model))
ggplot(ef, aes(x=x1,y=fit, ymin=lower,ymax=upper))+
geom_line()+
geom_ribbon(alpha=.3)
The quick and easy way to plot your random effects is to use the dotplot.ranef.mer()
function in lme4
.
randoms <- ranef(random_slopes_model, condVar=TRUE)
dotplot.ranef.mer(randoms)
## $subject
Sometimes, however, we might want a bit more control over our plotting. We can extract the estimates and their standard errors for each subject:
#we can get the random effects:
#(note that we use $subject because there might be other groupings, and the ranef() function will give us a list, with one element for each grouping variable)
randoms <-
ranef(random_slopes_model)$subject %>%
mutate(subject = row.names(.)) %>% # the subject IDs are stored in the rownames, so lets add them as a variable
pivot_longer(cols=1:2, names_to="term",values_to="estimate") # finally, let's reshape it for plotting
#and the same for the standard errors (from the arm package):
randoms_se <-
arm::se.ranef(random_slopes_model)$subject %>%
as.data.frame() %>%
mutate(subject = row.names(.)) %>%
pivot_longer(cols=1:2, names_to="term",values_to="se")
# join them together:
ranefs_plotting <- left_join(randoms, randoms_se)
# it's easier to plot now that everything is together in one long-format dataframe:
ggplot(ranefs_plotting, aes(y=subject, x=estimate))+
geom_errorbarh(aes(xmin=estimate-2*se, xmax=estimate+2*se))+
facet_wrap(~term, scales="free_x")
Sometimes, data have a clear non-linear pattern, such as a curvilinear trend. In such cases, it is reasonable to try modelling the outcome not as a linear function of the variable, but as a curvilinear function of it.
The following plots show data (as black dots) where the outcome \(y\) has a nonlinear and decreasing dependence on \(x\). That is, as \(x\) varies from 1 to 10, the outcome \(y\) decreases in a non-linear fashion. Superimposed on the same data, you can see a linear fit (red line) and a cubic fit (blue line).
The residuals corresponding to each fit are:
Clearly, a linear fit doesn’t capture the real trend in the data, and any leftover systematic pattern that the model doesn’t explicitly account for ends up in the residuals, as the red points show.
On the other hand, once we account for the nonlinear trend, that systematic pattern in the residuals disappears.
The secret is to use as predictors, instead of \(x\) alone, the corresponding polynomial terms of \(x\) up to a specific order:
\[ y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \epsilon \]
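In R, one straightforward way to fit such a cubic model is with the poly() function; a minimal sketch, where df, x and y are placeholder names:
# raw = TRUE gives the x, x^2, x^3 terms exactly as written in the equation above;
# orthogonal polynomials (the default, raw = FALSE) are often preferred to reduce collinearity
cubic_fit <- lm(y ~ poly(x, degree = 3, raw = TRUE), data = df)
summary(cubic_fit)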
Consider the following example data. You can add polynomial terms of a predictor “time”, up to order 3 for example, by saying:
## # A tibble: 5 × 3
## subject reaction time
## <dbl> <dbl> <int>
## 1 1 0.428 1
## 2 1 0.427 2
## 3 1 0.211 3
## 4 1 0.585 4
## 5 1 0.127 5
source("https://uoepsy.github.io/msmr/functions/code_poly.R")
code_poly(df, predictor = 'time', poly.order = 3, draw.poly = FALSE)
## # A tibble: 5 × 7
## subject reaction time time.Index poly1 poly2 poly3
## <dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl>
## 1 1 0.428 1 1 -0.632 0.535 -3.16e- 1
## 2 1 0.427 2 2 -0.316 -0.267 6.32e- 1
## 3 1 0.211 3 3 0 -0.535 -4.10e-16
## 4 1 0.585 4 4 0.316 -0.267 -6.32e- 1
## 5 1 0.127 5 5 0.632 0.535 3.16e- 1
and use those terms when specifying your linear model. We can, depending on how time is measured (e.g. if we have subjects over time) also include some (or all) of these terms as random effects, for example:
lmer(reaction ~ poly1 + poly2 + poly3 + (1 + poly1 | subject))
The same principle we have seen for one level of clustering can be extended to clustering at different levels (for instance, observations are clustered within subjects, which are in turn clustered within groups).
Consider the example where we have observations for each student in every class within a number of schools:
Question: Is “Class 1” in “School 1” the same as “Class 1” in “School 2”?
No.
The classes in one school are distinct from the classes in another even though they are named the same.
The classes-within-schools example is a good case of nested random effects - one factor level (one group in a grouping variable) appears only within a particular level of another grouping variable.
In R, we can specify this using:
(1 | school) + (1 | class:school)
or, more succinctly:
(1 | school/class)
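For example, a random-intercept model for students' scores in this nested structure might look like the following sketch (score and the data frame name schooldata are placeholder names):
library(lme4)
mod_nested <- lmer(score ~ 1 + (1 | school/class), data = schooldata)
# equivalent specification with the nesting written out explicitly:
mod_nested <- lmer(score ~ 1 + (1 | school) + (1 | class:school), data = schooldata)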
Consider another example, where we administer the same set of tasks at multiple time-points for every participant.
Question: Are tasks nested within participants?
No.
Tasks are seen by multiple participants (and participants see multiple tasks).
We could visualise this as below:
Because these are not nested, they are crossed random effects.
In R, we can specify this using:
(1 | subject) + (1 | task)
Nested vs Crossed
Nested: Each group belongs uniquely to a higher-level group.
Crossed: Not-nested.
Note that in the schools and classes example, had the data been coded such that each class had a unique ID (e.g., see below), then the structures (1 | school) + (1 | class) and (1 | school/class) would give the same results.
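If your data don't come with unique class IDs, you can construct them yourself; a minimal sketch, where schooldata, school and class are placeholder names:
library(tidyverse)
schooldata <- schooldata %>%
  mutate(classid = interaction(school, class))
# (1 | school) + (1 | classid) now identifies each class uniquely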
This is a copy from what we saw in Week 2, but it is important, so here it is again!
The majority of the examples you have seen have used likelihood ratio tests (via anova(model1, model2)) or an approximation for the denominator degrees of freedom (the lmerTest package). But there are many other options, as detailed below (and this list is not remotely exhaustive).
For these examples…
For the following examples, we’re going to return to our dataset of various toys, and we are going to be concerned with whether practice (the hrs_week variable) is associated with changes in reading ages (the R_AGE variable).
To accommodate the clustered nature of the data, we are going to fit a model with both intercepts and slopes varying by toy type.
toys_read <- read_csv("https://uoepsy.github.io/data/toyexample.csv")
full_model <- lmer(R_AGE ~ hrs_week + (1 + hrs_week | toy_type), data = toys_read)
Remember that the \(t\) distribution starts to look more and more like the \(z\) (“normal”) distribution as the degrees of freedom increase? We could just assume we have infinite degrees of freedom in our test statistics, and pretend that the \(t\)-values we get are actually \(z\)-values. This is “anti-conservative” inasmuch as it is not a very cautious approach, and we are likely to have a higher false positive rate (e.g. more chance of saying “there is an effect!” when there actually isn’t).
coefs <- as.data.frame(summary(full_model)$coefficients)
coefs$p.z <- 2 * (1 - pnorm(abs(coefs[,3])))
coefs
## Estimate Std. Error t value p.z
## (Intercept) 1.76 0.961 1.83 0.06768
## hrs_week 1.14 0.296 3.87 0.00011
There have been a couple of methods proposed to estimate the degrees of freedom in order to provide a better approximation to the null distribution of our tests. The way the Satterthwaite method has been implemented in R will simply add a column of p-values to your summary(model) output.
Load the lmerTest package, refit the model, and voila!
library(lmerTest)
full_model <- lmer(R_AGE ~ hrs_week + (1 + hrs_week | toy_type), data = toys_read)
summary(full_model)$coefficients
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 1.76 0.961 16.2 1.83 0.08617
## hrs_week 1.14 0.296 17.5 3.87 0.00118
Reporting
To account for the extra uncertainty brought by the inclusion of random effects in the model, the degrees of freedom in the coefficients tests have been corrected via Satterthwaite’s method.
…
…
Weekly hours of reading practice was associated with increased reading age (\(\beta = 1.14,\ SE = 0.30,\ t(17.52^*) = 3.87,\ p = .001\)).
Note: if you have the lmerTest package loaded, then all the models you fit with lmer()
will show p-values! If you want to stop this, then you will have to detach/unload the package, and refit the model.
detach("package:lmerTest", unload=TRUE)
The Kenward-Roger approach is slightly more conservative than the Satterthwaite method, and has been implemented for model comparison between a full model and a restricted model (a model without the parameter of interest), using the KR adjustment for the denominator degrees of freedom in the \(F\)-test.
For this, models must be fitted with REML, not ML. The function KRmodcomp()
will take care of this and re-fit them for you.
library(pbkrtest)
restricted_model <- lmer(R_AGE ~ 1 + (1 + hrs_week | toy_type), data = toys_read)
full_model <- lmer(R_AGE ~ hrs_week + (1 + hrs_week | toy_type), data = toys_read)
KRmodcomp(full_model, restricted_model)
## large : R_AGE ~ hrs_week + (1 + hrs_week | toy_type)
## small : R_AGE ~ 1 + (1 + hrs_week | toy_type)
## stat ndf ddf F.scaling p.value
## Ftest 14.6 1.0 17.7 1 0.0013 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Reporting
To account for the extra uncertainty brought by the inclusion of random effects in the model, the denominator degrees of freedom have been corrected via the Kenward-Roger method.
…
…
Inclusion of weekly hours of reading practice as a predictor was associated with an improvement in model fit (\(F(1,17.74^*) = 14.64,\ p = .001\)).
Conduct a model comparison between your model and a restricted model (a model without the parameter of interest), evaluating the change in log-likelihood.
Likelihood
“Likelihood” is a function that associates to a parameter the probability (or probability density) of observing the given sample data.
In simpler terms, the likelihood tells us how probable the observed data are under a given model (i.e., a given set of parameter values).
The intuition behind likelihood:
For a (slightly) more detailed explanation, see here.
This method assumes that minus two times the log of the ratio of the two likelihoods will (as sample size increases) become \(\chi^2\)-distributed, and so it may be unreliable for small samples.
Models must be fitted with ML, not REML. The function anova()
will re-fit them for you.
restricted_model <- lmer(R_AGE ~ 1 + (1 + hrs_week | toy_type), data = toys_read)
full_model <- lmer(R_AGE ~ hrs_week + (1 + hrs_week | toy_type), data = toys_read)
anova(restricted_model, full_model, test = "Chisq")
## refitting model(s) with ML (instead of REML)
## Data: toys_read
## Models:
## restricted_model: R_AGE ~ 1 + (1 + hrs_week | toy_type)
## full_model: R_AGE ~ hrs_week + (1 + hrs_week | toy_type)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## restricted_model 5 660 675 -325 650
## full_model 6 651 668 -319 639 11.8 1 0.00059 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
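For intuition, the \(\chi^2\) statistic that anova() reports can be reproduced by hand from the two log-likelihoods; a sketch (here we refit both models with REML = FALSE ourselves, which anova() otherwise does for us):
restricted_ml <- lmer(R_AGE ~ 1 + (1 + hrs_week | toy_type), data = toys_read, REML = FALSE)
full_ml <- lmer(R_AGE ~ hrs_week + (1 + hrs_week | toy_type), data = toys_read, REML = FALSE)
lrt_stat <- 2 * (as.numeric(logLik(full_ml)) - as.numeric(logLik(restricted_ml)))
lrt_stat
pchisq(lrt_stat, df = 1, lower.tail = FALSE)  # df = difference in number of parameters (1 here)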
Reporting
A likelihood ratio test indicated that the inclusion of weekly hours of reading practice as a predictor was associated with an improvement in model fit (\(\chi^2(1) = 11.81, p < .001\)).
There are also various “bootstrapping” methods which are worth looking into. Think back to USMR when we first learned about hypothesis testing. Remember that we did lots of simulating data, so that we could compare what we actually observed with what we would expect if the null hypothesis were true? By doing this, we were essentially creating a null distribution, so that calculating a p-value becomes a matter of summarising data (e.g. calculating the proportion of our simulated null distribution that is more extreme than our observed statistic).
Instead of assuming that the likelihood ratio test statistic is \(\chi^2\)-distributed, we can bootstrap this test instead. This approach simulates data from the simpler model, fits both the simple model and the complex model, and evaluates the change in log-likelihood. By doing this over and over again, we build a distribution of the changes in log-likelihood we would be likely to see if the more complex model were not any better. In this way it constructs a distribution reflecting our null hypothesis, against which we can then compare our actual observed effect.
library(pbkrtest)
PBmodcomp(full_model, restricted_model, nsim=1000)
Reporting
A parametric bootstrap likelihood ratio test (R = 1000) indicated that the inclusion of weekly hours of reading practice as a predictor was associated with an improvement in model fit (\(LRT = 11.79, p = .004\)).
Much the same as above, but with just one model: we simulate data many times and refit the model, so that we get an empirical distribution that we can use to construct confidence intervals for our effects.
confint(full_model, method="boot", nsim=1000)
Reporting
95% Confidence Intervals were obtained via parametric bootstrapping with 1000 iterations.
…
…
Weekly hours of reading practice was associated with increased reading age (\(\beta = 1.14\), 95% CI [0.58 – 1.73]).
It’s worth noting that there are many different types of bootstrapping that we can conduct. Different methods of bootstrapping vary with respect to the assumptions we will have to make when using them for drawing inferences. For instance, the parametric bootstrap discussed above assumes that explanatory variables are fixed and that model specification and the distributions such as \(\zeta_i \sim N(0,\sigma_{\zeta})\) and \(\varepsilon_i \sim N(0,\sigma_{\varepsilon})\) are correct.
An alternative is to generate a distribution by resampling with replacement from our data, fitting our model to each resample, and repeating this over and over. This doesn’t rely on assumptions about the shape of the distributions of \(\zeta_i\) and \(\varepsilon_i\) - we just need to ensure that we correctly specify the hierarchical dependency of the data. It does, however, require us to decide at which level(s) to resample.
full_model <- lmer(R_AGE ~ 1 + hrs_week + (1 + hrs_week | toy_type), data = toys_read)
library(lmeresampler)
fullmod_bs <-
bootstrap(full_model, # this is the model
.f = fixef, # we want the fixef from each bootstrap
type = "case", # case based bootstrap
B=2000, # do 2000 bootstraps
resample = c(TRUE, TRUE) # resample both toy_types and individual toys
)
confint(fullmod_bs, type = "perc")
## # A tibble: 2 × 6
## term estimate lower upper type level
## <chr> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 (Intercept) 1.76 -1.35 3.65 perc 0.95
## 2 hrs_week 1.14 0.633 2.03 perc 0.95
Reporting
95% Confidence Intervals were obtained via case-based bootstrapping (resampling both toy types and individual toys) with 2000 iterations.
…
…
Weekly hours of reading practice was associated with increased reading age (\(\beta\) = 1.144, 95% CI [0.633 – 2.032]).
If you want more information (not required reading for this course), then Julian Faraway has a page here with links to some worked examples, and Ben Bolker has a wealth of information on his GLMM FAQ pages.
Data: 11 Domain Tests
44 participants across 4 groups A, B, C, & W (between-subjects) were tested 5 times (waves) in 11 domains. In each wave, participants received a score (on a 20-point scale) for each domain and a set of questions which they answered either correctly or incorrectly.
The data can be accessed using the following code, and a description of the variables in the data can be found in the table below:
load(url("https://uoepsy.github.io/data/msmr_lab5.RData"))
variable | description |
---|---|
Anonymous_Subject_ID | Participant Identifier |
IndivDiff | ?? (not sure but it’s not relevant for these questions!) |
Wave | Study wave (timepoint), ranging from 1 to 5 |
Domain | Domain tested (one of 11 domains studied, including things such as animals (ANI), objects (OBJ), toys (TOY), and vehicles (VEH)) |
Correct | Number of questions answered correctly |
Error | Number of questions answered incorrectly |
Group | Group (between-participants), A, B, C, or W |
Score | Score on the 20-point scale |
Did the groups differ in overall performance?
There are different ways to test this: use the 20-point score or the accuracy? Keep the domains separate or calculate an aggregate across all domains? Which way makes the most sense to you?
Make a plot that corresponds to the research question. Does it look like there’s a difference?
Lots of options for this one, here is one that shows Group differences:
ggplot(dat5, aes(Group, Score, color=Group)) +
stat_summary(fun.data=mean_se, geom="pointrange") +
guides(color="none")+
coord_flip()
And here is one that shows Group and Domain differences:
ggplot(dat5, aes(Domain, Score, color=Group)) +
stat_summary(fun.data=mean_se, geom="pointrange") +
coord_flip()
And another:
library(ggridges)
ggplot(dat5, aes(x=Score,y=Domain,fill=Domain)) +
geom_density_ridges()+
facet_wrap(~Group)+
viridis::scale_fill_viridis(discrete=T,option = "plasma")
Looks like there are group differences and domain differences, but not much in the way of group-by-domain differences.
Did the groups differ in overall performance?
Use a mixed-effects model to test the difference.
Tip: For now, we can forget about the longitudinal aspect to the data, because the research question is only concerned with overall performance.
We’re interested in the extent to which Groups vary in their overall performance, so we want a fixed effect of Group. Subjects and Domains are not nested - each subject is tested across multiple domains, and each domain is seen by multiple subjects.
The maximal model, shown below, will not converge.
mod_grp <- lmer(Score ~ Group +
(1 | Anonymous_Subject_ID) +
(1 + Group | Domain),
data=dat5, REML=FALSE)
VarCorr(mod_grp)
## Groups Name Std.Dev. Corr
## Anonymous_Subject_ID (Intercept) 4.415
## Domain (Intercept) 1.002
## GroupB 0.704 0.20
## GroupC 0.150 0.57 -0.69
## GroupW 0.426 -0.12 0.51 -0.52
## Residual 3.007
As the maximal model doesn’t converge, we remove the by-Domain random slopes for Group, which are the most complex part of the random-effects structure.
mod_grp <- lmer(Score ~ Group +
(1 | Anonymous_Subject_ID) +
(1 | Domain),
data=dat5, REML=FALSE)
summary(mod_grp)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: Score ~ Group + (1 | Anonymous_Subject_ID) + (1 | Domain)
## Data: dat5
##
## AIC BIC logLik deviance df.resid
## 10398 10438 -5192 10384 2004
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -5.025 -0.498 0.064 0.634 3.278
##
## Random effects:
## Groups Name Variance Std.Dev.
## Anonymous_Subject_ID (Intercept) 19.49 4.41
## Domain (Intercept) 1.06 1.03
## Residual 9.12 3.02
## Number of obs: 2011, groups: Anonymous_Subject_ID, 44; Domain, 11
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 15.83 1.12 14.12
## GroupB -4.16 2.26 -1.84
## GroupC -3.62 1.77 -2.05
## GroupW -7.27 1.67 -4.34
##
## Correlation of Fixed Effects:
## (Intr) GroupB GroupC
## GroupB -0.457
## GroupC -0.585 0.290
## GroupW -0.618 0.306 0.392
Yes, substantial Group differences: overall, group A does the best, groups C and B next, and group W does the worst.
Did performance change over time (across waves)? Did the groups differ in pattern of change?
Make a plot that corresponds to the research question. Does it look like there was a change? A group difference?
ggplot(dat5, aes(Wave, Score, color=Group, fill=Group)) +
stat_summary(fun.data=mean_se, geom="ribbon", alpha=0.3, color=NA) +
stat_summary(fun=mean, geom="line")
Yes, looks like groups A, C, and W are improving, but it looks like group B is getting worse.
Did performance change over time (across waves)? Did the groups differ in pattern of change?
Use mixed-effects model(s) to test this.
Hint: Fit a baseline model in which scores change over time (wave), then assess improvement in model fit due to inclusion of overall group effect and finally the interaction of group with time.
Remember: If we’re using a likelihood ratio test to compare models (like we are here), then the models should be fitted with ML, not REML. anova(model1, model2) will re-fit them automatically for us, but we specify REML=FALSE explicitly for each model here anyway:
mod_wv <- lmer(Score ~ Wave +
(1 + Wave | Anonymous_Subject_ID) +
(1 + Wave | Domain),
data=dat5, REML=FALSE,
lmerControl(optimizer = "bobyqa"))
mod_wv_grp <- lmer(Score ~ Wave+Group +
(1 + Wave | Anonymous_Subject_ID) +
(1 + Wave | Domain),
data=dat5, REML=FALSE,
lmerControl(optimizer = "bobyqa"))
mod_wv_x_grp <- lmer(Score ~ Wave*Group +
(1 + Wave | Anonymous_Subject_ID) +
(1 + Wave | Domain),
data=dat5, REML=FALSE,
lmerControl(optimizer = "bobyqa"))
anova(mod_wv, mod_wv_grp, mod_wv_x_grp)
## Data: dat5
## Models:
## mod_wv: Score ~ Wave + (1 + Wave | Anonymous_Subject_ID) + (1 + Wave | Domain)
## mod_wv_grp: Score ~ Wave + Group + (1 + Wave | Anonymous_Subject_ID) + (1 + Wave | Domain)
## mod_wv_x_grp: Score ~ Wave * Group + (1 + Wave | Anonymous_Subject_ID) + (1 + Wave | Domain)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## mod_wv 9 9720 9770 -4851 9702
## mod_wv_grp 12 9710 9778 -4843 9686 15.35 3 0.0015 **
## mod_wv_x_grp 15 9710 9795 -4840 9680 5.81 3 0.1212
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(mod_wv_x_grp)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: Score ~ Wave * Group + (1 + Wave | Anonymous_Subject_ID) + (1 +
## Wave | Domain)
## Data: dat5
## Control: lmerControl(optimizer = "bobyqa")
##
## AIC BIC logLik deviance df.resid
## 9710 9795 -4840 9680 1996
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -5.005 -0.576 0.005 0.613 3.752
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Anonymous_Subject_ID (Intercept) 22.3660 4.729
## Wave 0.7479 0.865 -0.34
## Domain (Intercept) 2.0533 1.433
## Wave 0.0219 0.148 -0.99
## Residual 6.0686 2.463
## Number of obs: 2011, groups: Anonymous_Subject_ID, 44; Domain, 11
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 12.7728 1.2493 10.22
## Wave 1.2547 0.2389 5.25
## GroupB -1.3648 2.4745 -0.55
## GroupC -4.1467 1.9166 -2.16
## GroupW -6.3185 1.8166 -3.48
## Wave:GroupB -1.1423 0.5292 -2.16
## Wave:GroupC -0.0369 0.3686 -0.10
## Wave:GroupW -0.5089 0.3547 -1.43
##
## Correlation of Fixed Effects:
## (Intr) Wave GroupB GroupC GroupW Wv:GrB Wv:GrC
## Wave -0.412
## GroupB -0.444 0.175
## GroupC -0.574 0.226 0.290
## GroupW -0.605 0.239 0.306 0.395
## Wave:GroupB 0.157 -0.436 -0.389 -0.102 -0.108
## Wave:GroupC 0.225 -0.625 -0.114 -0.366 -0.155 0.282
## Wave:GroupW 0.234 -0.650 -0.118 -0.153 -0.370 0.293 0.421
Using broom.mixed::augment()
for the model with a Wave*Group interaction, plot the average (stat_summary()
perhaps?) model fitted values for each group across Waves. Add in the observed data too.
broom.mixed::augment(mod_wv_x_grp) %>%
ggplot(., aes(Wave, Score, color=Group)) +
stat_summary(fun.data=mean_se, geom="pointrange") +
stat_summary(aes(y=.fitted), fun=mean, geom="line")
We fit a linear model, but the model fit lines are not straight lines. Why is that?
Create individual subject plots for the data and the model’s fitted values. Will these show straight lines?
Hint: make use of facet_wrap()
to create a different panel for each level of a grouping variable.
broom.mixed::augment(mod_wv_x_grp) %>%
ggplot(., aes(Wave, Score, color=Group)) +
facet_wrap(~ Anonymous_Subject_ID) +
stat_summary(fun.data=mean_se, geom="pointrange") +
stat_summary(aes(y=.fitted), fun=mean, geom="line")
The individual subject plots show linear fits, which is a better match to the model. But now we see the missing data – some participants only completed the first few waves.
Make a plot of the actual (linear) model prediction.
Hint: Use the effect()
function from the effects
package.
library(effects)
ef <- as.data.frame(effect("Wave:Group", mod_wv_x_grp))
ggplot(ef, aes(Wave, fit, color=Group, fill=Group)) +
geom_ribbon(aes(ymax=fit+se, ymin=fit-se), color=NA, alpha=0.1) +
geom_line()
What important things are different between the plot from question A7 and that from question A5? (You can see the plots we created for these questions below).
Why do you think these two plots differ?
The reason is visible here:
Group B was not actually getting worse. The appearance that it was getting worse is an artifact of selective drop-out: there’s only a few people in this group and the better-performing ones only did the first few waves so they are not represented in the later waves, but the worse-performing ones are contributing to the later waves. The model estimates how the better-performing ones would have done in later waves based on their early-wave performance and the pattern of performance of other participants in the study.
summary(mod_wv_x_grp)$coefficients
## Estimate Std. Error t value
## (Intercept) 12.7728 1.249 10.224
## Wave 1.2547 0.239 5.253
## GroupB -1.3648 2.474 -0.552
## GroupC -4.1467 1.917 -2.164
## GroupW -6.3185 1.817 -3.478
## Wave:GroupB -1.1423 0.529 -2.159
## Wave:GroupC -0.0369 0.369 -0.100
## Wave:GroupW -0.5089 0.355 -1.435
Note that the Group A slope (the coefficient for Wave) is 1.255 and that, relative to that slope, the Group B slope differs by -1.142 (the coefficient for Wave:GroupB). This means that the model-estimated slope for Group B is 0.112, which is very slightly positive, not strongly negative as it appeared in the initial plots.
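We can compute this directly from the fixed effects:
# Group A slope plus the Group B adjustment: 1.2547 + (-1.1423) = 0.112
fixef(mod_wv_x_grp)["Wave"] + fixef(mod_wv_x_grp)["Wave:GroupB"]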
One of the valuable things about mixed-effects (aka multilevel) modeling is that individual-level and group-level trajectories are estimated. This helps the model overcome missing data in a sensible way. In fact, MLM/MLR models are sometimes used for imputing missing data. However, one has to think carefully about why data are missing. Group B is small and it might just be a coincidence that the better-performing participants dropped out after the first few waves, which would make it easier to generalize the patterns to them. On the other hand, it might be the case that there is something about the study that makes better-performing members of Group B drop out, which should make us suspicious of generalizing to them.
Create a plot of the subject and domain random effects. Notice the pattern between the random intercept and random slope estimates for the 11 domains - what in our model is this pattern representing?
randoms <- ranef(mod_wv_x_grp, condVar=TRUE)
dotplot.ranef.mer(randoms)
## $Anonymous_Subject_ID
##
## $Domain
Notice that the domains with the lower relative intercept tend to have a higher relative slope (and vice versa). This is the negative correlation between random intercepts and slopes for domain in our model:
VarCorr(mod_wv_x_grp)
## Groups Name Std.Dev. Corr
## Anonymous_Subject_ID (Intercept) 4.729
## Wave 0.865 -0.34
## Domain (Intercept) 1.433
## Wave 0.148 -0.99
## Residual 2.463
Try removing the correlation (hint: use the ||) to see what happens. Does it make sense that these would be correlated? (Answer: we don’t really know enough about the study, but it’s something to think about!)
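A sketch of how that might look (the model name here is just our choice):
mod_wv_x_grp_nocorr <- lmer(Score ~ Wave*Group +
                            (1 + Wave || Anonymous_Subject_ID) +
                            (1 + Wave || Domain),
                            data=dat5, REML=FALSE,
                            lmerControl(optimizer = "bobyqa"))
VarCorr(mod_wv_x_grp_nocorr)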
Instead of step-by-step questions, this exercise is designed to get you thinking more, giving you practice for the report and for your future research. If it helps, you can find a (sort of) checklist for multilevel models here (but please be aware that there is no ‘one-size-fits-all’ approach - this checklist may not always be appropriate for every research question with multi-level data)
How does aggressive behaviour change over adolescence? How is this change dependent upon whether or not a child has siblings?
Data: Aggressive Behaviour in Adolescence
Data was collected from 30 secondary schools across Scotland. A cohort of students were followed up every year from the age of 12 to 19. Each year, they completed the Aggressive Behaviour Scale (ABS). Data was also captured on whether each child had siblings or not.
The data can be accessed from https://uoepsy.github.io/data/schoolsabs.csv . A description of the variables can be found in the table below.
variable | description |
---|---|
schoolid | School Identifier |
ABS | Score on the Aggressive Behaviour Scale (Z-scored) |
year | Age (in years) of child at observation |
childid | Within-School Child Identifier |
siblings | Sibling status (No/Yes) |
absdat <- read_csv("https://uoepsy.github.io/data/schoolsabs.csv")
Let’s start by just exploring the data a bit more. This computes the number of distinct schools, and the number of distinct school-child combinations:
absdat %>% select(schoolid) %>% n_distinct()
## [1] 30
absdat %>% select(schoolid,childid) %>% n_distinct()
## [1] 500
We have 500 students from 30 different schools. Schools tend to have data from about 17 children. Some schools have data from as few as 13 children, some have data from as many as 20.
absdat %>%
group_by(schoolid) %>%
summarise(
nchildren = n_distinct(childid)
) %>% summary
Note that the childid
variable does not uniquely identify each child. We need to also know which school child “1” comes from.
absdat %>% count(schoolid,childid) %>% select(n) %>% table
We can see that for some children we have fewer than the 7 observations: some have 6, 5, 4, or 3.
There are lots of different ways we can cut up the data to plot it
ggplot(absdat, aes(x = year, y = ABS,col=siblings)) +
stat_summary(geom="pointrange",aes(group=siblings),alpha=.5)+
stat_summary(geom="line",aes(group=siblings),alpha=.5)+
labs(title="Average ABS scores over age by number of siblings", x = "Age", y = "Aggressive Behaviour Score")
ggplot(absdat, aes(x = year, y = ABS,col=siblings)) +
stat_summary(geom="pointrange",aes(group=schoolid),alpha=.2)+
stat_summary(geom="line",aes(group=schoolid),alpha=.2)+
labs(title="School level mean ABS scores", x = "Age", y = "Aggressive Behaviour Score")
ggplot(absdat, aes(x = year, y = ABS,col=siblings)) +
stat_summary(geom="line",aes(group=interaction(schoolid,childid)),alpha=.2)+
facet_wrap(~siblings, labeller = label_both) +
labs(title="Children's trajectories of ABS scores",x = "Age", y = "Aggressive Behaviour Score")
Now let’s think a bit about the various considerations that go into fitting a multi-level model.
We’re interested in the effect of the siblings variable, so that is going to be in our fixed effects.
Observations are grouped by child, and children are grouped by school, so our random effects could be specified as (1 | School / Child).
The childid variable does not uniquely identify a single child. It just says “1”, “2”, etc. But child 1 from school 1 is a different child from child 1 in school 2. We need to remember this, and if we want to separate out our random effects of school and child, then we’ll need (1 | School) + (1 | Child:School).
source("https://uoepsy.github.io/msmr/functions/code_poly.R")
absdat <- code_poly(absdat, "year", poly.order = 2, orthogonal = T, draw.poly = F)
basemod <-
lmer(ABS ~ poly1 + poly2 + (1 + poly1 + poly2 | schoolid/childid),
data = absdat, REML=FALSE,
control = lmerControl(optimizer = "bobyqa"))
isSingular(basemod)
## [1] TRUE
As the fit is singular, we simplify the random-effects structure by removing the quadratic term from the school-level random effects:
basemod <-
lmer(ABS ~ poly1 + poly2 + (1 + poly1 | schoolid) +
(1 + poly1 + poly2 | childid:schoolid),
data = absdat, REML=FALSE,
control = lmerControl(optimizer = "bobyqa"))
Let’s use effects coding for siblings, so that our estimated coefficients for poly1 and poly2 represent the average across the two sibling groups.
# we'll put yes first, because when we use contr.sum, it will set the last level to -1
absdat$siblings <- factor(absdat$siblings, levels=c("yes","no"))
contrasts(absdat$siblings) <- contr.sum(2)
contrasts(absdat$siblings)
## [,1]
## yes 1
## no -1
linmod <-
lmer(ABS ~ siblings*poly1 + poly2 + (1 + poly1 | schoolid) +
(1 + poly1 + poly2 | childid:schoolid),
data = absdat, REML=FALSE,
control = lmerControl(optimizer = "bobyqa"))
quadmod <-
lmer(ABS ~ siblings*(poly1 + poly2) + (1 + poly1 | schoolid) +
(1 + poly1 + poly2 | childid:schoolid),
data = absdat, REML=FALSE,
control = lmerControl(optimizer = "bobyqa"))
anova(basemod,linmod,quadmod)
## Data: absdat
## Models:
## basemod: ABS ~ poly1 + poly2 + (1 + poly1 | schoolid) + (1 + poly1 + poly2 | childid:schoolid)
## linmod: ABS ~ siblings * poly1 + poly2 + (1 + poly1 | schoolid) + (1 + poly1 + poly2 | childid:schoolid)
## quadmod: ABS ~ siblings * (poly1 + poly2) + (1 + poly1 | schoolid) + (1 + poly1 + poly2 | childid:schoolid)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## basemod 13 8034 8113 -4004 8008
## linmod 15 8015 8106 -3992 7985 22.97 2 1e-05 ***
## quadmod 16 8016 8114 -3992 7984 0.65 1 0.42
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Let’s get some bootstrapped confidence intervals around our parameter estimates. This may take quite a while to run.
confint(quadmod, method="boot")
term | est | 2.5 % | 97.5 % |
---|---|---|---|
(Intercept) | 0.011 | -0.145 | 0.158 |
SiblingsY (Y=1,N=-1) | -0.060 | -0.103 | -0.025 |
Age (linear) | 0.526 | 0.259 | 0.800 |
Age (quadratic) | -0.197 | -0.260 | -0.135 |
SiblingsY:Age(lin) | -0.202 | -0.315 | -0.092 |
SiblingsY:Age(quad) | -0.025 | -0.085 | 0.038 |
Aggressive behaviour scores show an increase over time (linear increase 0.526, 95% CI [0.259 – 0.8]), which gets more gradual as adolescence progresses (quadratic term -0.197 [-0.26 – -0.135]). Overall, having siblings was associated with lower scores on the Aggressive Behaviour Scale ( \(\beta\) = -0.06 [-0.103 – -0.025]), and showed less increase with age ( \(\beta\) = -0.202 [-0.315 – -0.092]). There was no evidence for the quadratic change over time being dependent upon sibling-status. The model fitted values and data are presented in Figure 1 below.
broom.mixed::augment(quadmod) %>%
ggplot(.,aes(x=poly1, col=siblings))+
stat_summary(aes(y=ABS),geom="pointrange", alpha=.5)+
stat_summary(aes(y=.fitted),geom="line")+
scale_x_continuous("age", breaks=poly(12:18,2)[,1], labels=c(12:18))