Preliminaries

Create a new R Script or RMarkdown document (whichever you prefer working with) and give it a title for this week.

Some extra background reading

Random effects

What are “Random Effects”?

A frequent cause of confusion when learning about multilevel models is the use of the term “random effect”. Does it refer to the the grouping variable, or to the effects that we allow to vary by-groups? The answer is really both. For example, in the model lmer(R_AGE ~ 1 + hrs_week + (1 + hrs_week | toy_type), data = toys_read), the “random effects” refers to the (1 + hrs_week | toy_type). We are specifying by-toy type random intercepts and by-toy type effects of hrs_week.

Should I fit a fixed effect: y ~ ... + group or random effect: y ~ ... + (1 | group)?

When specifying a random effects model, think about the data you have and how they fit in the following table:

Criterion:	Repetition: If the experiment were repeated:	Desired inference: The conclusions refer to:
Fixed effects	Same levels would be used	The levels used
Random effects	Different levels would be used	A population from which the levels used are just a (random) sample

Examples

Sometimes, after simplifying the model, you find that there isn’t much variability in a specific random effect and, if it still leads to singular fits or convergence warnings, it is common to just model that variable as a fixed effect.

Other times, you don’t have sufficient data or levels to estimate the random effect variance, and you are forced to model it as a fixed effect. This is similar to trying to find the “best-fit” line passing through a single point… You can’t because you need two points!

Nested & Crossed Structures

Most of the examples we have seen up to now have had only one level of clustering in the data (e.g. participants). But what happens if we have multiple different clusters? The same principle we have seen for one level of clustering can be extended to clustering at different levels, but we have to be considerate of how those levels of clustering are related.

Nested Structures

Take an example where we have observations for each student in every class within a number of schools:

Question: Is “Class 1” in “School 1” the same as “Class 1” in “School 2”?

No.
The classes in one school are distinct from the classes in another even though they are named the same.

The classes-within-schools example is a good case of nested random effects - one factor level (one group in a grouping varible) appears only within a particular level of another grouping variable.

In R, we can specify this using:

(1 | school) + (1 | class:school)

or, more succinctly:

(1 | school/class)

The labels matter!

Are the lower cluster labels unique?	equivalent structures
Yes	`(1 \| school/class)` `(1 \| school) + (1 \| class:school)` `(1 \| school) + (1 \| class)`
No	`(1 \| school/class)` `(1 \| school) + (1 \| class:school)`

Crossed Structures

Consider another example, where we administer the same set of tasks at multiple time-points for every participant.

Question: Are tasks nested within participants?

No.
Tasks are seen by multiple participants (and participants see multiple tasks).

We could visualise this as the below:

In the sense that these are not nested, they are crossed random effects.

In R, we can specify this using:

(1 | subject) + (1 | task)

Nested vs Crossed

Nested: Each group belongs uniquely to a higher-level group.

Crossed: Not-nested.

Random Effects in lme4

Fitting Random effects in lme4

Below are a selection of different formulas for specifying different random effect structures, taken from the lme4 vignette. This might look like a lot, but over time and repeated use of multilevel models you will get used to reading these in a similar way to getting used to reading the formula structure of y ~ x1 + x2 in all our linear models.

Formula	Alternative	Meaning
\(\text{y ~ (1 \| g)}\)	\(\text{y ~ 1 + (1 \| g)}\)	Random intercept with fixed mean
\(\text{y ~ 0 + offset}(o)\text{ + (1 \| g)}\)	\(\text{y ~ -1 + offset}(o)\text{ + (1 \| g)}\)	Random intercept with a priori means
\(\text{y ~ (1 \| g1/g2)}\)	\(\text{y ~ (1 \| g1) + (1 \| g1:g2)}\)	Intercept varying among \(g1\) and \(g2\) within \(g1\)
\(\text{y ~ (1 \| g1) + (1 \| g2)}\)	\(\text{y ~ 1 + (1 \| g1) + (1 \| g2)}\)	Intercept varying among \(g1\) and \(g2\)
\(\text{y ~ x + (x \| g)}\)	\(\text{y ~ 1 + x + (1 + x \| g)}\)	Correlated random intercept and slope
\(\text{y ~ x + (x \|\| g)}\)	\(\text{y ~ 1 + x + (x \| g) + (0 + x \| g)}\)	Uncorrelated random intercept and slope

Table 1: Examples of the right-hand-sides of mixed effects model formulas. \(g\), \(g1\), \(g2\) are grouping factors, covariates and a priori known offsets are \(x\) and \(o\).

Extracting random effects with lme4

In models fitted with lme4, there are some key functions to keep in mind for extracting different parts of the model.

fixef() gives us the fixed effects (in Figure 1 this is the intercept and slope of the blue line: \(\color{blue}{\gamma_{00}}\) and \(\color{blue}{\gamma_{10}}\), respectively).
ranef() gives us the group-level deviations from the fixed effects (in Figure 1, this is the differences from each of the green lines to the blue line, and these are denoted by \(\color{red}{\zeta_{0i}}\) and \(\color{red}{\zeta_{1i}}\)).
coef() gives us the intercepts and slopes of the group-level effects (in Figure 1, these are the intercepts and slopes of the green lines, \(\color{green}{\beta_{0i}\) and \(\color{green}{\beta_{1i}\)). We can also get to these because fixef() + ranef() = coef().
VarCorr() will give us the estimated variance and standard deviation of the random effects (what we get from ranef()).

Figure 1: multilevel model with group \(i\) highlighted

Some quick examples

toys_read <- read_csv("https://uoepsy.github.io/data/toyexample.csv")
rs_model <- lmer(R_AGE ~ 1 + hrs_week + (1 + hrs_week | toy_type), data = toys_read)

Q: Which toy type shows the least improvement in reading age as practice increases, and which shows the greatest improvement?

ranef(rs_model)

## $toy_type
##                              (Intercept)    hrs_week
## Barbie                         0.0855596  0.16463252
## Farm Animals                  -2.1303034 -1.72532126
## Furby                          0.8940478  0.79109745
## G.I.Joe                        1.8601170  1.05745304
## Lego Minifigures              -1.0963190 -0.92548279
## Minecraft                      0.3930185  0.43342754
## My Little Pony                -0.9557181 -0.97937536
## Peppa Pig                      0.1742980 -0.03882159
## Playmobil                     -0.7780752 -0.33556416
## Polly Pocket                   0.7147074  0.66199641
## Power Rangers                  0.1076182  0.07541960
## Rugrats                       -0.2959716 -0.04537848
## Scooby Doo                     2.3654608  1.61665908
## Sock Puppets                  -0.5225530 -0.33068125
## Star Wars                      1.2020393  0.84628213
## Stretch Armstrong             -0.1397913  0.20314258
## SuperZings                    -0.1976937 -0.41191503
## Teenage Mutant Ninja Turtles  -1.3163768 -0.91247162
## Toy Story                      0.9701905  0.88966357
## Transformers                  -1.3342550 -1.03476237
## 
## with conditional variances for "toy_type"

A: It looks like the Farm Animals have the least improvement, and Scooby Doo shows the most improvement

Q: What is the estimated reading age for sock puppets with zero hours of practice per week, and what is their estimated change in reading age for every hour per week increase in practice?

coef(rs_model)

## $toy_type
##                              (Intercept)   hrs_week
## Barbie                         1.8415026  1.3081408
## Farm Animals                  -0.3743604 -0.5818130
## Furby                          2.6499908  1.9346058
## G.I.Joe                        3.6160599  2.2009613
## Lego Minifigures               0.6596239  0.2180255
## Minecraft                      2.1489614  1.5769358
## My Little Pony                 0.8002248  0.1641329
## Peppa Pig                      1.9302409  1.1046867
## Playmobil                      0.9778678  0.8079441
## Polly Pocket                   2.4706504  1.8055047
## Power Rangers                  1.8635611  1.2189279
## Rugrats                        1.4599714  1.0981298
## Scooby Doo                     4.1214037  2.7601674
## Sock Puppets                   1.2333900  0.8128271
## Star Wars                      2.9579822  1.9897904
## Stretch Armstrong              1.6161517  1.3466509
## SuperZings                     1.5582493  0.7315933
## Teenage Mutant Ninja Turtles   0.4395661  0.2310367
## Toy Story                      2.7261335  2.0331719
## Transformers                   0.4216880  0.1087459
## 
## attr(,"class")
## [1] "coef.mer"

A: Sock puppets with zero practice are estimated to have a reading age of 1.23, which increases by 0.81 for every hour of practice per week.

Model Checks

A Note on Convergence warnings

When we start to move to more complex random effect structures, issues of “singular fits” and “non-convergence” become ever more relevant. We’ve already talked about singular fits (see the Week 2 exercises), but we haven’t said much about how to deal with non-convergence.
It may help to look back on Week 1’s section on estimation.

Issues of non-convergence can be caused by many things. If you’re model doesn’t converge, it does not necessarily mean the fit is incorrect, however it is cause for concern, and should be addressed, else you may end up reporting inferences which do not hold.

There are lots of different things which you could do which might help your model to converge. A select few are detailed below:

double-check the model specification and the data
adjust stopping (convergence) tolerances for the nonlinear optimizer, using the optCtrl argument to [g]lmerControl. (see ?convergence for convergence controls).
- What is “tolerance”? Remember that our optimizer is the the method by which the computer finds the best fitting model, by iteratively assessing and trying to maximise the likelihood (or minimise the loss).
  
  Figure 2: An optimizer will stop after a certain number of iterations, or when it meets a tolerance threshold
center and scale continuous predictor variables (e.g. with scale)
Change the optimization method (for example, here we change it to bobyqa): lmer(..., control = lmerControl(optimizer="bobyqa"))
glmer(..., control = glmerControl(optimizer="bobyqa"))
Increase the number of optimization steps: lmer(..., control = lmerControl(optimizer="bobyqa", optCtrl=list(maxfun=50000)) glmer(..., control = glmerControl(optimizer="bobyqa", optCtrl=list(maxfun=50000))
Use allFit() to try the fit with all available optimizers. This will of course be slow, but is considered ‘the gold standard’; “if all optimizers converge to values that are practically equivalent, then we would consider the convergence warnings to be false positives.”
Consider simplifying your model, for example by removing random effects with the smallest variance (but be careful to not simplify more than necessary, and ensure that your write up details these changes)

Assumptions

Hopefully by now you are getting comfortable with the idea that all our models are simplifications, and so there is always going to be some difference between a model and real-life. This difference - the residual - will hopefully just be randomness, and we assess this by checking for systematic patterns in the residual term.

Not much is different in the multilevel model - we simply now have “residuals” on multiple levels. We are assuming that our group-level differences represent one level of randomness, and that our observations represent another level. We can see these two levels in Figure 3, with the group-level deviations from the fixed effects (\(\color{red}{\zeta_{0i}}\) and \(\color{red}{\zeta_{1i}}\)) along with the observation-level deviations from that groups line (\(\color{red}{\varepsilon_{ij}}\)).

Figure 3: Multilevel model with group \(i\) highlighted

Examining Residuals (Level 1)

toys_read <- read_csv("https://uoepsy.github.io/data/toyexample.csv")
rs_model <- lmer(R_AGE ~ 1 + hrs_week + (1 + hrs_week | toy_type), data = toys_read)

We can get the level 1 (observation-level) residuals the same way we used to do for lm() - by just using resid() or residuals(). Additionally, there are a few useful techniques for plotting these which we have listed below:

We can plot the residuals vs fitted model, and assess the extend to which the assumption holds that the residuals are zero mean.
(we want the blue smoothed line to be fairly close to zero across the plot)
```
plot(rs_model, type=c("p","smooth"))
```
We can construct a scale-location plot. This is where the square-root of the absolute value of the standardised residuals is plotted against the fitted values, and allows you to more easily assess the assumption of constant variance.
(we want the blue smoothed line to be close to horizontal across the plot)
```
# "p" below is for points and "smooth" for the smoothed line
plot(rs_model,
     form = sqrt(abs(resid(.))) ~ fitted(.),
     type = c("p","smooth"))
```
We can also examine the normality the level 1 residuals:
(we want the datapoints to follow close to the diagonal line)
```
qqnorm(resid(rs_model)); qqline(resid(rs_model))
```

Examining Residuals (Level 2+)

toys_read <- read_csv("https://uoepsy.github.io/data/toyexample.csv")
rs_model <- lmer(R_AGE ~ 1 + hrs_week + (1 + hrs_week | toy_type), data = toys_read)

To get out the level 2 residuals (the random effects) we need to do a bit of indexing. ranef(rs_model) will give us a list with an item for each grouping. In each item we have a set of columns, one for each thing which is varying by that grouping.

qqnorm(ranef(rs_model)$toy_type[,1]); qqline(ranef(rs_model)$toy_type[,1])

qqnorm(ranef(rs_model)$toy_type[,2]); qqline(ranef(rs_model)$toy_type[,2])

What can we do if we’re worried about assumptions

Model mis-specification
- Is the model appropriate (e.g. is the assumed conditional distribution of y appropriate for your outcome (the family = ??? (link = ???)) bit).
- Might we be missing important theoretical predictors, or missing possible interactions?
Could/Should you transform your outcome variable?
- There are many different transformations we can apply to our outcome variable to enable us to fit a model in which the residuals are more close to being normally distributed (log(y), 1/y, sqrt(y), forecast::BoxCox(y, lambda="auto")). However, this comes at the expense of interpretation, because we are now getting coefficients of “change in transformed y”, and it is not always possible to turn that into a meaningful quantity.
What about Bootstrapping?
- The basic idea of bootstrapping is to fit your model structure to lots and lots of samples, in order to obtain a distribution of the parameter estimate of interest (and then compute a confidence interval for that estimate).
- There are different approaches to how we create the “lots and lots of samples”, and these allow us to relax certain assumptions on our modelling.
- Re-sampling with replacement from our original data allows us to have minimal assumptions, but needs careful consideration about which levels to re-sample at.
```
library(lmeresampler)
# the resample argument is whether we want to resample each level
# (from highest to lowest)
bootstrap(model, .f = fixef, type="case", resample=c(TRUE,FALSE))
```
- Bootstrapping is not a panacea for all models that cause you worry.

Influence

Just like having residuals are multiple levels, we can also consider the influence of data at different levels on our model. For instance, we might have a specific datapoint being highly influential, but we might be just as interested in thinking about specific clusters as exerting influence on our results.
A very useful package for assessing influence in multilevel models is HLMdiag.

Figure 4: Influence in MLM

Examining Influence (Level 1)

Which toy in the dataset has the greatest influence on our model?

Hint: as well as hlm_influence() in the HLMdiag package there is another nice function, hlm_augment()

We can often end up in confusion because the \(i\)th observation inputted to our model (and therefore the \(i\)th observation of hlm_influence() output) might not be the \(i\)th observation in our original dataset - there may be missing data! To ensure they match, you can clean the data before doing the modelling, by dropping the rows where you have NAs in the variables you will use in the model.

(Luckily, we have no missing data in the Toy dataset).

library(HLMdiag)
l1_inf <- hlm_influence(rs_model,level=1)
dotplot_diag(l1_inf$cooksd, cutoff="internal")+
  ylim(0,.15)

hlm_augment(rs_model, level=1) %>% arrange(desc(cooksd))

## # A tibble: 132 × 15
##       id  R_AGE hrs_week toy_type .resid .fitted .ls.r…¹ .ls.f…² .mar.…³ .mar.…⁴
##    <dbl>  <dbl>    <dbl> <fct>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
##  1    81  2.91      6.28 SuperZi…  -3.24    6.15 -1.60     4.51   -6.03     8.93
##  2    65 14.0       2.44 G.I.Joe    4.98    8.99  2.69    11.3     9.43     4.55
##  3   115  0.735     1.70 Stretch…  -3.17    3.91 -0.182    0.917  -2.97     3.70
##  4    46 11.6       5.70 Star Wa…  -2.68   14.3  -2.50    14.1     3.34     8.27
##  5    55  1.47      2.77 SuperZi…  -2.11    3.58 -3.36     4.84   -3.45     4.92
##  6   104  2.68      2.74 Rugrats   -1.79    4.47  0.0279   2.65   -2.21     4.89
##  7    23  0.838     1.74 Playmob…  -1.55    2.39  0.585    0.253  -2.91     3.75
##  8    30  5.33      2.99 Furby     -3.10    8.43 -2.72     8.04    0.156    5.17
##  9   130  3.79      2.73 Lego Mi…   2.54    1.25  1.94     1.85   -1.08     4.87
## 10    36  3.16      2.99 My Litt…   1.87    1.29 -0.508    3.66   -2.02     5.18
## # … with 122 more rows, 5 more variables: cooksd <dbl>, mdffits <dbl>,
## #   covtrace <dbl>, covratio <dbl>, leverage.overall <dbl>, and abbreviated
## #   variable names ¹.ls.resid, ².ls.fitted, ³.mar.resid, ⁴.mar.fitted

Greatest influence:

For which toy is the model fit the worst (i.e., who has the highest residual?)

hlm_augment(rs_model, level=1) %>% 
    arrange(desc(abs(.resid)))

## # A tibble: 132 × 15
##       id  R_AGE hrs_week toy_type .resid .fitted .ls.r…¹ .ls.f…² .mar.…³ .mar.…⁴
##    <dbl>  <dbl>    <dbl> <fct>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
##  1    93  7.78      4.28 G.I.Joe   -5.26  13.0     -4.62  12.4      1.13    6.65
##  2    65 14.0       2.44 G.I.Joe    4.98   8.99     2.69  11.3      9.43    4.55
##  3    56 -3.17      3.47 Lego Mi…  -4.59   1.42    -4.69   1.52    -8.89    5.72
##  4   102 -2.91      5.12 My Litt…  -4.56   1.64    -3.96   1.04   -10.5     7.62
##  5    43 -2.99      3.96 Transfo…  -3.84   0.853   -3.79   0.800   -9.28    6.29
##  6    16  8.41      3.89 G.I.Joe   -3.76  12.2     -3.75  12.2      2.21    6.20
##  7    22  0.763     3.97 Sock Pu…  -3.70   4.46    -3.65   4.42    -5.54    6.30
##  8    62  4.31      3.61 Minecra…  -3.54   7.85    -2.53   6.84    -1.58    5.89
##  9   129  5.00      4.84 My Litt…   3.41   1.60     3.61   1.39    -2.29    7.29
## 10    14 16.4       4.26 G.I.Joe    3.39  13.0      3.99  12.4      9.76    6.63
## # … with 122 more rows, 5 more variables: cooksd <dbl>, mdffits <dbl>,
## #   covtrace <dbl>, covratio <dbl>, leverage.overall <dbl>, and abbreviated
## #   variable names ¹.ls.resid, ².ls.fitted, ³.mar.resid, ⁴.mar.fitted

toys_read[93, ]

## # A tibble: 1 × 5
##   toy_type toy        hrs_week   age R_AGE
##   <chr>    <chr>         <dbl> <dbl> <dbl>
## 1 G.I.Joe  Cold Front     4.28  7.35  7.78

Examining Influence (Level 2+)

Which type of toy has the greatest influence on our model?

Either this way:

hlm_augment(rs_model, level="toy_type") %>% 
    arrange(desc(cooksd))

## # A tibble: 20 × 10
##    toy_type      .rane…¹ .rane…² .ls.i…³ .ls.h…⁴  cooksd mdffits covtr…⁵ covra…⁶
##    <chr>           <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
##  1 G.I.Joe        1.86    1.06     9.29   -0.889 1.60e-1 1.50e-1  0.114     1.12
##  2 Scooby Doo     2.37    1.62     8.35    0.336 1.33e-1 1.27e-1  0.110     1.11
##  3 Farm Animals  -2.13   -1.73     0.314  -2.40  1.22e-1 1.15e-1  0.131     1.14
##  4 My Little Po… -0.956  -0.979    6.82   -2.72  8.50e-2 8.08e-2  0.0947    1.10
##  5 Stretch Arms… -0.140   0.203   -4.65    1.47  8.09e-2 7.64e-2  0.112     1.12
##  6 SuperZings    -0.198  -0.412    4.57   -1.58  6.48e-2 5.99e-2  0.137     1.14
##  7 Playmobil     -0.778  -0.336   -3.72    0.487 6.24e-2 5.84e-2  0.117     1.12
##  8 Toy Story      0.970   0.890   -0.388   1.14  4.77e-2 4.50e-2  0.115     1.12
##  9 Transformers  -1.33   -1.03     1.44   -1.79  4.13e-2 3.92e-2  0.0629    1.06
## 10 Teenage Muta… -1.32   -0.912   -4.24   -0.510 3.96e-2 3.79e-2  0.0923    1.09
## 11 Lego Minifig… -1.10   -0.925    2.55   -1.94  3.81e-2 3.64e-2  0.101     1.10
## 12 Furby          0.894   0.791    0.235   0.948 3.29e-2 3.12e-2  0.122     1.13
## 13 Star Wars      1.20    0.846    3.70    0.245 3.27e-2 3.10e-2  0.128     1.13
## 14 Rugrats       -0.296  -0.0454  -3.46    0.545 2.68e-2 2.56e-2  0.102     1.10
## 15 Polly Pocket   0.715   0.662   -1.30    1.23  2.66e-2 2.55e-2  0.0979    1.10
## 16 Peppa Pig      0.174  -0.0388   4.04   -0.973 2.54e-2 2.36e-2  0.129     1.13
## 17 Minecraft      0.393   0.433  -20.2     5.84  1.97e-2 1.94e-2  0.0601    1.06
## 18 Barbie         0.0856  0.165   -0.843   0.359 8.94e-3 8.50e-3  0.105     1.11
## 19 Sock Puppets  -0.523  -0.331   -3.98    0.490 8.13e-3 7.83e-3  0.0677    1.07
## 20 Power Rangers  0.108   0.0754   1.44   -0.296 2.65e-4 2.51e-4  0.116     1.12
## # … with 1 more variable: leverage.overall <dbl>, and abbreviated variable
## #   names ¹.ranef.intercept, ².ranef.hrs_week, ³.ls.intercept, ⁴.ls.hrs_week,
## #   ⁵covtrace, ⁶covratio

Or the plot. Note, the only reason we using the cutoff = .15 is to make the labels appear.

inftoytype <- hlm_influence(rs_model,level="toy_type")
dotplot_diag(inftoytype$cooksd, index=inftoytype$toy_type, cutoff=.15) +
  ylim(0,.2)

Exercises: Three-level nesting

Data: Treatment Effects

Synthetic data from a RCT treatment study: 5 therapists randomly assigned participants to control or treatment group and monitored the participants’ performance over time. There was a baseline test, then 6 weeks of treatment, with test sessions every week (7 total sessions).

The following code will load in your R session an object already called tx with the data:

load(url("https://uoepsy.github.io/msmr/data/tx.Rdata"))

You can find a data dictionary below:

variable	description
group	Whether the participant is in the Treatment or Control group
session	Session number (1-7)
therapist	Therapist Identifier (A, B, C, D or E
Score	Score on test (Mean = 0.63, SD = 0.15)
PID	Participant Identifier. Labels take the form <Therapist><Group><Participant number>. For instance, if Therapist A’s 6th Participant is in the Treatment group, then their label is A_treatment_6

Question A1

Load and visualise the data. Does it look like the treatment had an effect on the performance score?

Solution

ggplot(tx, aes(session, Score, color=group)) +
  stat_summary(fun.data = mean_se, geom="pointrange") +
  stat_smooth() +
  theme_classic()

Just for fun, let’s add on the individual participant scores, and also make a plot for each therapist.

ggplot(tx, aes(session, Score, color=group)) +
  stat_summary(fun.data = mean_se, geom = "pointrange") +
  stat_smooth() +
  theme_classic() +
  geom_line(aes(group = PID), alpha = .2) + 
  facet_wrap(~therapist)

Question A2

Test whether the treatment had an effect using multilevel modelling.
Try to fit the maximal model.
Does it converge? Is it singular?

Consider these questions when you’re designing your model(s) and use your answers to motivate your model design and interpretation of results:

What have we randomly sampled here?
- We have randomly sampled some therapists, and within them have random sampled some participants. Each participant then has a sample of observations.
What are the levels of nesting? How should that be reflected in the random effect structure?
- Each participant is associated with just one therapist. Participants are nested within therapists.
What is the shape of change over time? Do you need polynomials to model this shape? If yes, what order polynomials?
- Looks like linear change, don’t need polynomials. And it doesn’t look like there are any baseline differences.
We are wanting to examine how time (session) varies between treatment groups (group), so we want an interaction session * group in the model. Participants have multiple sessions, but belong to only one group. Therapists have multiple sessions and participants in different groups.
Do we want to allow the same effects to vary by participants and by therapists?
- If so, we can specify (1 + .... | therapist/PID).
- If not, and we want to have some effects vary by therapist but not by participant (or vice versa), then we will need to specify these separately.
Do the participants have labels that uniquely associate them with one higher up group (i.e., one therapist?).
- If so, we can have (1..... | PID) + (1.... | therapist).
- If not, then we need to tell the model that patients are nested in therapists, and have (1..... | therapist:PID) + (1.... | therapist).

Solution

library(lme4)

# start with maximal model
m1 <- lmer(Score ~ session * group + 
             (1 + session | PID) + 
             (1 + session * group | therapist),
           data=tx,
           control = lmerControl(optimizer="bobyqa"))

isSingular(m1)

## [1] TRUE

Question A3

Try adjusting your model by removing random effects or correlations, examine the model again, and so on..

Solution

VarCorr(m1)

##  Groups    Name                 Std.Dev.  Corr                
##  PID       (Intercept)          0.1180167                     
##            session              0.0315636 -0.603              
##  therapist (Intercept)          0.0063406                     
##            session              0.0038115 -1.000              
##            groupcontrol         0.0175915 -1.000  1.000       
##            session:groupcontrol 0.0034114  1.000 -1.000 -1.000
##  Residual                       0.0732696

There’s a correlation of exactly -1 between the random intercepts and slopes for therapists, and the standard deviation estimate for session|therapist is pretty small. Let’s remove it.

m2 <- lmer(Score ~ session * group + 
             (1 + session | PID) + 
             (1 + group | therapist),
           data=tx,
           control = lmerControl(optimizer="bobyqa"))
VarCorr(m2)

##  Groups    Name         Std.Dev. Corr  
##  PID       (Intercept)  0.118271       
##            session      0.031661 -0.602
##  therapist (Intercept)  0.000000       
##            groupcontrol 0.004469   NaN 
##  Residual               0.073270

m2a <- lmer(Score ~ session * group + 
             (1 + session | PID) + 
             (1 | therapist),
           data=tx,
           control = lmerControl(optimizer="bobyqa"))
VarCorr(m2a)

##  Groups    Name        Std.Dev. Corr  
##  PID       (Intercept) 0.118298       
##            session     0.031661 -0.602
##  therapist (Intercept) 0.000000       
##  Residual              0.073270

It now looks like estimates for random intercepts for therapists is now 0. If we remove this, our model finally is non-singular:

m3 <- lmer(Score ~ session * group + 
             (1 + session | PID),
           data=tx,
           control = lmerControl(optimizer="bobyqa"))

summary(m3)

## Linear mixed model fit by REML ['lmerMod']
## Formula: Score ~ session * group + (1 + session | PID)
##    Data: tx
## Control: lmerControl(optimizer = "bobyqa")
## 
## REML criterion at convergence: -1633.6
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.63302 -0.58619  0.01599  0.55406  2.88095 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  PID      (Intercept) 0.013994 0.11830       
##           session     0.001002 0.03166  -0.60
##  Residual             0.005368 0.07327       
## Number of obs: 945, groups:  PID, 135
## 
## Fixed effects:
##                       Estimate Std. Error t value
## (Intercept)           0.526849   0.015959  33.012
## session               0.033688   0.004130   8.156
## groupcontrol          0.018136   0.023000   0.789
## session:groupcontrol -0.020138   0.005952  -3.383
## 
## Correlation of Fixed Effects:
##             (Intr) sessin grpcnt
## session     -0.655              
## groupcontrl -0.694  0.454       
## sssn:grpcnt  0.454 -0.694 -0.655

Lastly, it’s then a good idea to check that the parameter estimates and SE are not radically different across these models (they are virtually identical)

# extract and column bind the fixed effect estimates
cbind(
  summary(m1)$coefficients[,1],
  summary(m2)$coefficients[,1],
  summary(m2a)$coefficients[,1],
  summary(m3)$coefficients[,1]
)

##                             [,1]        [,2]        [,3]        [,4]
## (Intercept)           0.52684907  0.52684907  0.52684907  0.52684907
## session               0.03368821  0.03368821  0.03368821  0.03368821
## groupcontrol          0.01813605  0.01813605  0.01813605  0.01813605
## session:groupcontrol -0.02013829 -0.02013829 -0.02013829 -0.02013829

# extract and column bind the fixed effect SEs
cbind(
  summary(m1)$coefficients[,2],
  summary(m2)$coefficients[,2],
  summary(m2a)$coefficients[,2],
  summary(m3)$coefficients[,2]
)

##                             [,1]        [,2]        [,3]        [,4]
## (Intercept)          0.016179962 0.015956448 0.015959300 0.015959303
## session              0.004458345 0.004130320 0.004130320 0.004130320
## groupcontrol         0.024267514 0.023082376 0.022999799 0.022999803
## session:groupcontrol 0.006129886 0.005952425 0.005952425 0.005952425

Question A4: Optional

Try the code below to use the allFit() function to fit your final model with all the available optimizers.¹

You might need to install the dfoptim package to get one of the optimizers

sumfits <- allFit(yourmodel)
summary(sumfits)

Exercises: Crossed random effects

Data: Test-enhanced learning

An experiment was run to conceptually replicate “test-enhanced learning” (Roediger & Karpicke, 2006): two groups of 25 participants were presented with material to learn. One group studied the material twice (StudyStudy), the other group studied the material once then did a test (StudyTest). Recall was tested immediately (one minute) after the learning session and one week later. The recall tests were composed of 175 items identified by a keyword (Test_word). One of the researchers’ questions concerned how test-enhanced learning influences time-to-recall.

The critical (replication) prediction is that the StudyStudy group should perform somewhat better on the immediate recall test, but the StudyTest group will retain the material better and thus perform better on the 1-week follow-up test.

The following code loads the data into your R environment by creating a variable called tel:

load(url("https://uoepsy.github.io/data/testenhancedlearning.RData"))

variable	description
Subject_ID	Unique Participant Identifier
Group	Group denoting whether the participant studied the material twice (StudyStudy), or studied it once then did a test (StudyTest)
Delay	Time of recall test (‘min’ = Immediate, ‘week’ = One week later)
Test_word	Word being recalled (175 different test words)
Correct	Whether or not the word was correctly recalled
Rtime	Time to recall word (milliseconds)

Question B1

Load and plot the data. Does it look like the effect was replicated?

Solution

We have a choice to make - whether we focus on recall time or on the correct responses. The choice is yours to make!

You can make use of stat_summary() again!

ggplot(tel, aes(Delay, Rtime, col=Group)) + 
  stat_summary(fun.data=mean_se, geom="pointrange")+
  theme_light()

It’s more work, but some people might rather calculate the numbers and then plot them directly. It does just the same thing:

tel %>% 
  group_by(Delay, Group) %>%
  summarise(
    mean = mean(Rtime),
    se = sd(Rtime)/sqrt(n())
  ) %>%
  ggplot(., aes(x=Delay, col = Group)) +
  geom_pointrange(aes(y=mean, ymin=mean-se, ymax=mean+se))+
  theme_light() +
  labs(y = "Response Time (ms)")

That looks like test-enhanced learning to me!

Let’s also do it quickly for the proportion of correct responses:

ggplot(tel, aes(Delay, Correct, col=Group)) + 
  stat_summary(fun.data=mean_se, geom="pointrange")+
  theme_light()

Question B2

Test the critical hypothesis using a mixed-effects model. Fit the maximal random effect structure supported by the experimental design.

Some questions to consider:

There are two outcomes to consider here: recall time, and accuracy. Which will you use? (Feel free to fit models to both!)
Item accuracy is a binary variable. If you choose this as your outcome variable here, what kind of model will you use?
We can expect variability across subjects (some people are better at learning than others) and across items (some of the recall items are harder than others). How should this be represented in the random effects?
If a model takes ages to fit, you might want to cancel it by pressing the escape key. It is normal for complex models to take time, but for the purposes of this task, give up after a couple of minutes, and try simplifying your model.

Solution

We’re going to use recall time as our outcome here. For our purposes here, there is no specific reason to choose one over the other. If this were your own research project you would likely model both, because they will represent different aspects of the recall process.

Remember - as research designs get more complex, there are an ever-increasing number of different viable and defensible approaches that we might take in analysing the data. The solutions provided here are not “the answer”, they are simply “an answer”.

This one will probably take a little bit of time:

m <- lmer(Rtime ~ Delay*Group +
             (1 + Delay | Subject_ID) +
             (1 + Delay * Group | Test_word),
           data=tel, control=lmerControl(optimizer = "bobyqa"))

Question B3

The model with maximal random effects will probably not converge, or will obtain a singular fit. Simplify the model until you achieve convergence.

What we’re aiming to do here is to follow Barr et al.’s advice of defining our maximal model and then removing only the terms to allow a non-singular fit.

Note: This strategy - starting with the maximal random effects structure and removing terms until obtaining model convergence, is just one approach, and there are drawbacks (see Matuschek et al., 2017). There is no consensus on what approach is best (see ?isSingular).

Tip: you can look at the variance estimates and correlations easily by using the VarCorr() function. What jumps out?

Solution

There’s a correlation of 0.972 for the by-word interaction random effect:

VarCorr(m)

##  Groups     Name                     Std.Dev. Corr                
##  Test_word  (Intercept)               17.8642                     
##             Delayweek                 13.2793 -0.251              
##             GroupStudyTest            18.0270 -0.797 -0.385       
##             Delayweek:GroupStudyTest  13.1273  0.972 -0.016 -0.917
##  Subject_ID (Intercept)               40.5201                     
##             Delayweek                  7.4486 -0.038              
##  Residual                            240.3113

lets remove the interaction in the by-word random effects:

m1 <- lmer(Rtime ~ Delay*Group +
             (1 + Delay | Subject_ID) +
             (1 + Delay + Group | Test_word),
           data=tel, control=lmerControl(optimizer = "bobyqa"))
VarCorr(m1)

##  Groups     Name           Std.Dev. Corr         
##  Test_word  (Intercept)     14.5561              
##             Delayweek       14.6914  0.152       
##             GroupStudyTest  12.3210 -0.612 -0.875
##  Subject_ID (Intercept)     40.5190              
##             Delayweek        7.4365 -0.038       
##  Residual                  240.3436

isSingular(m1)

## [1] TRUE

We still have a singular fit here. Thinking about the study, if we are going to remove one of the by-testword random effects (Delay or Group), which one do we consider to be more theoretically justified? Is the effect of Delay likely to vary by test-words? More so than the effect of group is likely to vary by test-words? Quite possibly - there’s no obvious reason for certain words to be more memorable for people in one group vs another. But there is reason for words to vary in the effect that delay of one week has - how familiar a word is will likely influence the amount to which a week’s delay has on recall.

Let’s remove the by-testword random effect of group.

m2 <- lmer(Rtime ~ Delay*Group +
             (1 + Delay | Subject_ID) +
             (1 + Delay | Test_word),
           data=tel, control=lmerControl(optimizer = "bobyqa"))
isSingular(m2)

## [1] FALSE

VarCorr(m2)

##  Groups     Name        Std.Dev. Corr  
##  Test_word  (Intercept)  11.6972       
##             Delayweek    13.5689 -0.236
##  Subject_ID (Intercept)  40.5156       
##             Delayweek     7.4001 -0.037
##  Residual               240.4432

Hooray, the model converged!

summary(m2)

## Linear mixed model fit by REML ['lmerMod']
## Formula: Rtime ~ Delay * Group + (1 + Delay | Subject_ID) + (1 + Delay |  
##     Test_word)
##    Data: tel
## Control: lmerControl(optimizer = "bobyqa")
## 
## REML criterion at convergence: 241671.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.8200 -0.6685 -0.0096  0.6760  4.0783 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev. Corr 
##  Test_word  (Intercept)   136.82  11.70        
##             Delayweek     184.12  13.57   -0.24
##  Subject_ID (Intercept)  1641.52  40.52        
##             Delayweek      54.76   7.40   -0.04
##  Residual               57812.93 240.44        
## Number of obs: 17498, groups:  Test_word, 175; Subject_ID, 50
## 
## Fixed effects:
##                          Estimate Std. Error t value
## (Intercept)               744.047      8.925  83.363
## Delayweek                  26.766      5.448   4.913
## GroupStudyTest            -18.032     12.560  -1.436
## Delayweek:GroupStudyTest  -17.647      7.566  -2.332
## 
## Correlation of Fixed Effects:
##             (Intr) Delywk GrpStT
## Delayweek   -0.285              
## GropStdyTst -0.704  0.200       
## Dlywk:GrpST  0.202 -0.694 -0.288

Question B4

Load the effects package, and try running this code:

library(effects)
ef <- as.data.frame(effect("Delay:Group", model))

What is ef? and how can you use it to plot the model-estimated condition means and variability?

Solution

ggplot(ef, aes(Delay, fit, color=Group)) + 
  geom_pointrange(aes(ymax=upper, ymin=lower), position=position_dodge(width = 0.2))+
  theme_classic() # just for a change :)

Question B5

Can we get a similar plot using plot_model() from the sjPlot package?

Solution

library(sjPlot)
plot_model(m2, type="int")

Question B6

What should we do with this information? How can we apply test-enhanced learning to learning R and statistics?

Solution

Exercises: Boston Naming Test

Data: Naming

72 children from 10 schools were administered the full Boston Naming Test (BNT-60) on a yearly basis for 5 years to examine development of word retrieval. Five of the schools taught lessons in a bilingual setting with English as one of the languages, and the remaining five schools taught in monolingual English.

The data is available at https://uoepsy.github.io/data/bntmono.csv.

variable	description
child_id	unique child identifier
school_id	unique school identifier
BNT60	score on the Boston Naming Test-60. Scores range from 0 to 60
schoolyear	Year of school
mlhome	Mono/Bi-lingual School. 0 = Bilingual, 1 = Monolingual

Question C1

Fit a model examining the interaction between the effects of school year and mono/bilingual teaching on word retrieval, with random intercepts only for children and schools.
Tip: make sure your variables are of the right type first - e.g. numeric, factor etc

Examine the fit and consider your model assumptions, and assess what might be done to improve the model in order to make better statistical inferences.

Solution

bnt <- bnt %>% mutate(across(c(mlhome, school_id, child_id), factor))
bntm0 <- lmer(BNT60 ~ schoolyear * mlhome + (1 | school_id/child_id), data = bnt)

Residuals don’t look zero mean:

plot(bntm0, type=c("p","smooth"))

It looks a little like, compared to our model (black lines below) the children’s scores (coloured lines) are more closely clustered together when they start school, and then they are more spread out by the end of the study. The fact that we’re fitting the same slope for each child is restricting us here, so we should try fitting random effects of schoolyear.

augment(bntm0) %>%
  ggplot(aes(x=schoolyear, col=child_id)) + 
  geom_point(aes(y = BNT60))+
  geom_path(aes(y = BNT60))+
  geom_path(aes(y = .fitted), col="black", alpha=.3)+
  guides(col="none")+
  facet_wrap(~school_id)

bntm1 <- lmer(BNT60 ~ schoolyear * mlhome + (1 + schoolyear | school_id/child_id), data = bnt)
plot(bntm1, type=c("p","smooth"))

Much better!

Let’s do some quick diagnostic checks for influence:

inf1 <- hlm_influence(bntm1, level=1)
dotplot_diag(inf1$cooksd, cutoff = "internal")

If you check in the help for dotplot_diag(), it tells you that

we can add an index for the labels, and
the coordinates (x,y) are flipped. We’re telling R to change the limits of the y axis, but actually it is the x axis. This is just because we want to see the label for that point out to the right.

infchild <- hlm_influence(bntm1, level="child_id:school_id")
dotplot_diag(infchild$cooksd, cutoff = "internal", index = infchild$`child_id:school_id`) + 
  scale_y_continuous(limits=c(0,.05))

And then we can examine the effects to the fixed effects and our standard errors when we remove this child:

del94 <- case_delete(bntm1, level="child_id:school_id", delete = "ID94:SC9")
cbind(del94$fixef.original, del94$fixef.delete)

##                         [,1]       [,2]
## (Intercept)         6.265626  6.2627962
## schoolyear          6.371168  6.3639335
## mlhome1             0.138711 -0.3052998
## schoolyear:mlhome1 -2.603763 -2.3701181

Optional: Case deletion influence on standard errors

We can examine the influence that deleting a case has on the standard errors. The standard errors are the square-root of the diagonal of the model-implied variance-covariance matrix:

cbind( 
  sqrt(diag(del94$vcov.original)),
  sqrt(diag(del94$vcov.delete))
)

##                         [,1]      [,2]
## (Intercept)        1.0413285 0.9557645
## schoolyear         0.7719321 0.6717837
## mlhome1            1.4683927 1.3525492
## schoolyear:mlhome1 1.0875897 0.9498207

infschool <- hlm_influence(bntm1, level="school_id")
dotplot_diag(infschool$cooksd, cutoff = "internal", index = infschool$school_id)

Question C2

Using a method of your choosing, conduct inferences from your model and write up the results.

Solution

bnt_null <- lmer(BNT60 ~ 1 +  (1 | school_id/child_id), data = bnt)
as.data.frame(VarCorr(bnt_null)) %>%
    select(grp, vcov) %>%
    mutate(
        icc = cumsum(vcov)/sum(vcov)*100
    )

##                  grp     vcov       icc
## 1 child_id:school_id 36.52302  22.21657
## 2          school_id 28.76157  39.71194
## 3           Residual 99.11079 100.00000

We’re going to construct some case-based bootstrapped confidence intervals around our fixed effects here.
This took quite a while to run:

library(lmeresampler)
bntm1BS <- bootstrap(bntm1, .f=fixef, type = "case", B = 2000, resample = c(FALSE,TRUE,FALSE))
confint(bntm1BS, type = "perc")

## # A tibble: 4 × 6
##   term               estimate lower upper type  level
##   <chr>                 <dbl> <dbl> <dbl> <chr> <dbl>
## 1 (Intercept)           6.27   5.43  7.12 perc   0.95
## 2 schoolyear            6.37   5.72  7.02 perc   0.95
## 3 mlhome1               0.139 -1.12  1.40 perc   0.95
## 4 schoolyear:mlhome1   -2.60  -3.51 -1.59 perc   0.95

Multilevel level linear regression was used to investigate childrens’ development of word retrieval over 5 years of school, and whether development was dependent upon the school teaching classes monolingually or bilingually. Initial evaluation of the intercept-only model indicated that the clustering of multiple observations from children within schools accounted for 39.7% of the variance in scores on the Boston Naming Task (BNT60, range 0 to 60). BNT60 scores were modelled with fixed effects of school year (1-5) and monolingual teaching (monolingual vs bilingual, treatment coded with monolingual as the reference level). Random intercepts and slopes of school year were included for schools and for children nested within schools. The model was fitting with maximum likelihood estimation using the default optimiser from the lme4 package (Bates et al., 2015).
95% Confidence for fixed effect estimates were constructed by case-based bootstrapping with 2000 bootstraps in which children, (but neither observations within children nor the schools within which children were nested) were resampled. Results indicated that children’s scores on the BNT60 increased over the 5 years in which they were studied, with children from bilingual schools increasing in scores by 6.37 ([5.72 – 7.02]) every school year. There was a significant interaction between mono/bilingual schools and the changes over school year, with children from monolingual schools increasing -2.6 ([-3.51 – -1.59]) less than those from bilingual schools for every additional year of school. Full model results can be found in Table 1.

Table 1

	BNT 60
Predictors	Estimates	95% CI bootstrap
Intercept	6.27	5.43 – 7.12
School Year	6.37	4.85 – 7.89
MonolingualSchool [1]	0.14	-2.75 – 3.03
School Year:MonolingualSchool [1]	-2.60	-4.74 – -0.46
Random Effects
σ²	8.64
τ₀₀ _{child_id:school_id}	1.77
τ₀₀ _{school_id}	3.83
τ₁₁ _{child_id:school_id.schoolyear}	6.83
τ₁₁ _{school_id.schoolyear}	1.89
ρ₀₁ _{child_id:school_id}	-0.42
ρ₀₁ _{school_id}	-0.39
ICC	0.91
N _{child_id}	74
N _{school_id}	10
Observations	370
Marginal R² / Conditional R²	0.420 / 0.947

If you have an older version of lme4, then allFit() might not be directly available, and you will need to run the following: source(system.file("utils", "allFit.R", package="lme4")).↩︎

Formula	Alternative	Meaning
\(\text{y ~ (1 \| g)}\)	\(\text{y ~ 1 + (1 \| g)}\)	Random intercept with fixed mean
\(\text{y ~ 0 + offset}(o)\text{ + (1 \| g)}\)	\(\text{y ~ -1 + offset}(o)\text{ + (1 \| g)}\)	Random intercept with a priori means
\(\text{y ~ (1 \| g1/g2)}\)	\(\text{y ~ (1 \| g1) + (1 \| g1:g2)}\)	Intercept varying among \(g1\) and \(g2\) within \(g1\)
\(\text{y ~ (1 \| g1) + (1 \| g2)}\)	\(\text{y ~ 1 + (1 \| g1) + (1 \| g2)}\)	Intercept varying among \(g1\) and \(g2\)
\(\text{y ~ x + (x \| g)}\)	\(\text{y ~ 1 + x + (1 + x \| g)}\)	Correlated random intercept and slope
\(\text{y ~ x + (x \|\| g)}\)	\(\text{y ~ 1 + x + (x \| g) + (0 + x \| g)}\)	Uncorrelated random intercept and slope

Random Effect Structures

Random effects

What are “Random Effects”?

Nested & Crossed Structures

Random Effects in lme4

Model Checks

A Note on Convergence warnings

Assumptions

Influence

Exercises: Three-level nesting

Exercises: Crossed random effects

Exercises: Boston Naming Test