Information about solutions
Solutions for these exercises are available immediately below each question.
We would like to emphasise that much evidence suggests that testing enhances learning, and we strongly encourage you to make a concerted attempt at answering each question before looking at the solutions. Immediately looking at the solutions and then copying the code into your work will lead to poorer learning.
We would also like to note that there are always many different ways to achieve the same thing in R, and the solutions provided are simply one approach.
Preliminaries
Recall our toy example data, in which we fitted a multilevel model to examine how practice (in hours per week) influences the reading age of toy figurines, which are grouped by toy type (Playmobil, Power Rangers, Farm Animals, etc.).
toys_read <- read_csv("https://uoepsy.github.io/data/toyexample.csv")
Run the code below to fit the model with by-toy-type random intercepts and slopes of practice.
library(tidyverse)
library(lme4)
toys_read <- read_csv("https://uoepsy.github.io/data/toyexample.csv")
rs_model <- lmer(R_AGE ~ 1 + hrs_week + (1 + hrs_week | toy_type), data = toys_read)
Plot the residuals vs fitted values, and assess the extent to which the assumption holds that the residuals have a mean of zero.
Construct a scale-location plot. This is where the square-root of the absolute value of the standardised residuals is plotted against the fitted values, and allows you to more easily assess the assumption of constant variance.
Using the augment() function from the broom.mixed package, construct the same plot using ggplot.
(constructing plots like this manually is a really useful way to help understand what exactly is being plotted)
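For example, a minimal sketch (assuming the rs_model fit from above; here the residuals are standardised simply with scale(), which is one of several possible choices):

```r
library(broom.mixed)
# augment() returns the model frame plus columns such as .fitted and .resid
rs_aug <- augment(rs_model)
# standardise the residuals
rs_aug$.std.resid <- as.numeric(scale(rs_aug$.resid))

# scale-location plot: sqrt(|standardised residuals|) against fitted values
ggplot(rs_aug, aes(x = .fitted, y = sqrt(abs(.std.resid)))) +
  geom_point() +
  geom_smooth() +
  labs(x = "Fitted values", y = "sqrt(|standardised residuals|)")
```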
Examine the normality of both the level 1 and level 2 residuals.
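One possible approach, using base-R QQ plots (the level 2 residuals here are the by-toy-type random effects):

```r
# level 1: observation-level residuals
qqnorm(resid(rs_model), main = "Level 1 residuals")
qqline(resid(rs_model))

# level 2: random intercepts and slopes for toy_type
re <- ranef(rs_model)$toy_type
qqnorm(re$`(Intercept)`, main = "Random intercepts")
qqline(re$`(Intercept)`)
qqnorm(re$hrs_week, main = "Random slopes of hrs_week")
qqline(re$hrs_week)
```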
Which toy in the dataset has the greatest influence on our model?
Hint: as well as hlm_influence() in the HLMdiag package, there is another nice function, hlm_augment(). We can often end up in confusion because the \(i^{th}\) observation inputted to our model (and therefore the \(i^{th}\) row of the hlm_influence() output) might not be the \(i^{th}\) observation in our original dataset - there may be missing data! (Luckily, we have no missing data in the Toy dataset.)
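A sketch of how these might be used on our model (cooksd is one of several influence measures returned):

```r
library(HLMdiag)

# influence of each individual observation (level 1)
inf_obs <- hlm_influence(rs_model, level = 1)
arrange(inf_obs, desc(cooksd))

# influence of each toy type (level 2)
inf_toy <- hlm_influence(rs_model, level = "toy_type")
arrange(inf_toy, desc(cooksd))
```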
For which toy is the model fit the worst (i.e., which toy has the highest residual)?
Which type of toy has the greatest influence on our model?
Looking at the random effects, which toy type shows the least improvement in reading age as practice increases, and which shows the greatest improvement?
What is the estimated reading age for sock puppets with zero hours of practice per week, and what is their estimated change in reading age for every hour per week increase in practice?
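One way to approach these questions is via the group-specific coefficients, which combine the fixed effects with each toy type's random effects (a minimal sketch):

```r
# intercept and hrs_week slope for each toy type
coef(rs_model)$toy_type
```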
You may have noticed that a lot of our models over the last few weeks have been giving a warning: boundary (singular) fit: see ?isSingular.
Up to now, we’ve been largely ignoring these warnings. However, this week we’re going to look at how to deal with this issue.
The warning is telling us that our model has resulted in a ‘singular fit.’ Singular fits often indicate that the model is ‘overfitted’ - that is, the random effects structure which we have specified is too complex to be supported by the data.
Perhaps the most intuitive advice would be to remove the most complex part of the random effects structure (i.e., the random slopes). This leads to a simpler model that is not over-fitted. In other words, start simplifying from the top (where the most complexity is) and work down (towards the lowest complexity). Additionally, when the variance estimate for a specific random effect term is very low, this indicates that the model is not estimating this parameter to differ much between the levels of your grouping variable. In some experimental designs it might be perfectly acceptable to remove such a term, or simply to include the variable as a fixed effect.
A key point here is that when fitting a mixed model, we should think about how the data are generated. Asking yourself questions such as "do we have good reason to assume subjects might vary over time, or to assume that they will have different starting points (i.e., different intercepts)?" can help you in specifying your random effect structure.
You can read in depth about what this means in the help documentation for ?isSingular. For our purposes, a relevant section is copied below:
… intercept-only models, or 2-dimensional random effects such as intercept + slope models, singularity is relatively easy to detect because it leads to random-effect variance estimates of (nearly) zero, or estimates of correlations that are (almost) exactly -1 or 1.
Issues of non-convergence can be caused by many things. If your model doesn't converge, it does not necessarily mean the fit is incorrect; however, it is cause for concern and should be addressed, else you may end up reporting inferences which do not hold.
There are lots of different things which you could do which might help your model to converge. A select few are detailed below:
double-check the model specification and the data
adjust stopping (convergence) tolerances for the nonlinear optimizer, using the optCtrl argument to [g]lmerControl (see ?convergence for convergence controls)
center and scale continuous predictor variables (e.g. with scale() - see the sketch after this list)
Change the optimization method (for example, here we change it to bobyqa):
lmer(..., control = lmerControl(optimizer="bobyqa"))
glmer(..., control = glmerControl(optimizer="bobyqa"))
Increase the number of optimization steps:
lmer(..., control = lmerControl(optimizer="bobyqa", optCtrl=list(maxfun=50000)))
glmer(..., control = glmerControl(optimizer="bobyqa", optCtrl=list(maxfun=50000)))
Use allFit() to try the fit with all available optimizers. This will of course be slow, but is considered 'the gold standard'; "if all optimizers converge to values that are practically equivalent, then we would consider the convergence warnings to be false positives."
Consider simplifying your model, for example by removing random effects with the smallest variance (but be careful not to simplify more than necessary, and ensure that your write-up details these changes)
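As an example of the centering-and-scaling advice, a sketch using the toy data from earlier (hrs_week_z is a hypothetical name for the standardised predictor):

```r
# standardise the predictor, then refit the model with the scaled version
toys_read <- toys_read %>% mutate(hrs_week_z = as.numeric(scale(hrs_week)))
rs_model_z <- lmer(R_AGE ~ 1 + hrs_week_z + (1 + hrs_week_z | toy_type),
                   data = toys_read)
```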
Recall Questions C1-C6 in last week's exercises, in which we used the WeightMaintain3 dataset and fitted some models:
load(url("https://uoepsy.github.io/data/WeightMaintain3.rda"))
m.base <- lmer(WeightChange ~ Assessment + (1 + Assessment | ID), data=WeightMaintain3)
m.int <- lmer(WeightChange ~ Assessment + Condition + (1 + Assessment | ID), data=WeightMaintain3)
m.full <- lmer(WeightChange ~ Assessment * Condition + (1 + Assessment | ID), data=WeightMaintain3)
Many of these were singular fits, and we ignored them, e.g.:
isSingular(m.base)
## [1] TRUE
Think carefully about how you might simplify the random effect structure for these models, and pay close attention to the study design. What do we think the baseline weight change should be? Should it be the same for everyone? If so, we might want to remove the random intercept, which we can do by setting it to 0.
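For instance, one way this might look for m.full (a sketch; m.full0 is a hypothetical name):

```r
# by-participant slopes of Assessment, but no random intercept:
# everyone is assumed to share the same baseline weight change
m.full0 <- lmer(WeightChange ~ Assessment * Condition + (0 + Assessment | ID),
                data = WeightMaintain3)
isSingular(m.full0)
```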
When specifying a random effects model, think about the data you have and how they fit in the following table:
| Criterion: | Repetition: If the experiment were repeated: | Desired inference: The conclusions refer to: |
|---|---|---|
| Fixed effects | Same levels would be used | The levels used |
| Random effects | Different levels would be used | A population from which the levels used are just a (random) sample |
For example, applying the criteria to the following questions:
Do dogs learn faster with higher rewards?
FIXED: reward
RANDOM: dog
Do students read faster at higher temperatures?
FIXED: temperature
RANDOM: student
Do people speaking one language speak faster than people speaking another?
FIXED: the language
RANDOM: the people speaking that language
Sometimes, after simplifying the model, you find that there isn’t much variability in a specific random effect and, if it still leads to singular fits or convergence warnings, it is common to just model that variable as a fixed effect.
Other times, you don’t have sufficient data or levels to estimate the random effect variance, and you are forced to model it as a fixed effect. This is similar to trying to find the “best-fit” line passing through a single point… You can’t because you need two points!
Below are a selection of different formulas for specifying different random effect structures, taken from the lme4 vignette. This might look like a lot, but over time and repeated use of multilevel models you will get used to reading these, in a similar way to getting used to reading the formula structure of y ~ x1 + x2 in all our linear models.
| Formula | Alternative | Meaning |
|---|---|---|
| `(1 \| g)` | `1 + (1 \| g)` | Random intercept with fixed mean |
| `0 + offset(o) + (1 \| g)` | `-1 + offset(o) + (1 \| g)` | Random intercept with a priori means |
| `(1 \| g1/g2)` | `(1 \| g1) + (1 \| g1:g2)` | Intercept varying among \(g1\) and \(g2\) within \(g1\) |
| `(1 \| g1) + (1 \| g2)` | `1 + (1 \| g1) + (1 \| g2)` | Intercept varying among \(g1\) and \(g2\) |
| `x + (x \| g)` | `1 + x + (1 + x \| g)` | Correlated random intercept and slope |
| `x + (x \|\| g)` | `1 + x + (1 \| g) + (0 + x \| g)` | Uncorrelated random intercept and slope |

Table 1: Examples of the right-hand-sides of mixed effects model formulas. \(g\), \(g1\), \(g2\) are grouping factors; covariates and a priori known offsets are \(x\) and \(o\).
Data: Treatment Effects
Synthetic data from an RCT treatment study: 5 therapists randomly assigned participants to either a control or a treatment group and monitored the participants' performance over time. There was a baseline test, then 6 weeks of treatment, with test sessions every week (7 sessions in total).
The following code will load into your R session an object called tx containing the data:
load(url("https://uoepsy.github.io/msmr/data/tx.Rdata"))
| group | session | therapist | Score | PID |
|---|---|---|---|---|
| control | 1 | A | 0.56 | A_control_15 |
| control | 1 | B | 0.61 | B_control_15 |
| control | 1 | C | 0.54 | C_control_15 |
| control | 1 | D | 0.45 | D_control_15 |
| control | 1 | E | 0.59 | E_control_15 |
| control | 1 | A | 0.56 | A_control_21 |
Load and visualise the data. Does it look like the treatment had an effect on the performance score?
Consider these questions when you’re designing your model(s) and use your answers to motivate your model design and interpretation of results:
Test whether the treatment had an effect using mixed-effects modelling.
Try to fit the maximal model.
Does it converge? Is it singular?
Try adjusting your model by removing random effects or correlations, examine the model again, and so on.
Try the code below to use the allFit() function to fit your final model with all the available optimizers.1 (You may need to install the dfoptim package to get one of the optimizers.)

sumfits <- allFit(yourmodel)
summary(sumfits)
Data: Test-enhanced learning
An experiment was run to conceptually replicate "test-enhanced learning" (Roediger & Karpicke, 2006): two groups of 25 participants were presented with material to learn. One group studied the material twice (StudyStudy), the other group studied the material once then did a test (StudyTest). Recall was tested immediately (one minute) after the learning session and one week later. The recall tests were composed of 175 items identified by a keyword (Test_word). One of the researchers' questions concerned how test-enhanced learning influences time-to-recall.
The critical (replication) prediction is that the StudyStudy group should perform somewhat better on the immediate recall test, but the StudyTest group will retain the material better and thus perform better on the 1-week follow-up test.
The following code loads the data into your R environment by creating a variable called tel:
load(url("https://uoepsy.github.io/data/testenhancedlearning.RData"))
| Subject_ID | Group | Delay | Test_word | Correct | Rtime |
|---|---|---|---|---|---|
| StudyTest_L | StudyTest | min | van | 1 | 456.81 |
| StudyTest_L | StudyTest | week | dinosaur | 0 | 888.13 |
| StudyTest_L | StudyTest | min | typewriter | 0 | 713.43 |
| StudyTest_L | StudyTest | min | chimney | 0 | 725.52 |
| StudyTest_L | StudyTest | week | dog | 1 | 472.69 |
| StudyTest_L | StudyTest | min | turkey | 1 | 574.30 |
Load and plot the data. Does it look like the effect was replicated?
Test the critical hypothesis using a mixed-effects model. Fit the maximal random effect structure supported by the experimental design.
Some questions to consider:
Item accuracy is a binary variable. What kind of model will you use?
We can expect variability across subjects (some people are better at learning than others) and across items (some of the recall items are harder than others). How should this be represented in the random effects?
If a model takes ages to fit, you might want to cancel it by pressing the escape key. It is normal for complex models to take time, but for the purposes of this task, give up after a couple of minutes, and try simplifying your model.
The model with maximal random effects will probably not converge, or will obtain a singular fit. Simplify the model until you achieve convergence.
What we're aiming to do here is to follow Barr et al.'s advice of defining our maximal model and then removing only those terms needed to obtain a non-singular fit.
Note: This strategy - starting with the maximal random effects structure and removing terms until the model converges - is just one approach, and there are drawbacks (see Matuschek et al., 2017). There is no consensus on what approach is best (see ?isSingular).
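As a sketch of what this process might look like (the maximal structure below follows from the design: Group is between subjects, so by-subject slopes are only possible for Delay, whereas items are seen in both groups; m2 is one possible non-singular endpoint, named to match the code further below):

```r
# maximal model implied by the design - likely singular or slow to converge
m_max <- glmer(Correct ~ Delay * Group +
                 (1 + Delay | Subject_ID) +
                 (1 + Delay * Group | Test_word),
               data = tel, family = binomial)

# one possible simplification: random intercepts only
m2 <- glmer(Correct ~ Delay * Group +
              (1 | Subject_ID) + (1 | Test_word),
            data = tel, family = binomial)
```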
Tip: you can look at the variance estimates and correlations easily by using the VarCorr() function. What jumps out?
Load the effects package, and try running this code:
library(effects)
ef <- as.data.frame(effect("Delay:Group", m2))
What is ef, and how can you use it to plot the model-estimated condition means and variability?
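A sketch of one way to use it (as.data.frame(effect(...)) returns columns fit, se, lower and upper; for a logistic model these are on the probability scale by default):

```r
ggplot(ef, aes(x = Delay, y = fit, color = Group)) +
  geom_pointrange(aes(ymin = lower, ymax = upper),
                  position = position_dodge(width = 0.2)) +
  labs(y = "Model-estimated accuracy")
```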
Can we get a similar plot using plot_model() from the sjPlot package?
What should we do with this information? How can we apply test-enhanced learning to learning R and statistics?
Data: Naming
72 children from 10 schools were administered the full Boston Naming Test (BNT-60) on a yearly basis for 5 years to examine development of word retrieval. Five of the schools taught lessons in a bilingual setting with English as one of the languages, and the remaining five schools taught in monolingual English.
The data is available at https://uoepsy.github.io/data/bntmono.csv.
| variable | description |
|---|---|
| child_id | unique child identifier |
| school_id | unique school identifier |
| BNT60 | score on the Boston Naming Test-60. Scores range from 0 to 60 |
| schoolyear | Year of school |
| mlhome | Mono/Bi-lingual School. 0 = Bilingual, 1 = Monolingual |
Fit a model examining the interaction between the effects of school year and mono/bilingual teaching on word retrieval, with random intercepts only for children and schools.
Tip: make sure your variables are of the right type first (e.g. numeric, factor, etc.) - a sketch follows below.
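A minimal sketch (bnt and bnt_mod are hypothetical names; which variables you convert is for you to decide):

```r
library(tidyverse)
library(lme4)

bnt <- read_csv("https://uoepsy.github.io/data/bntmono.csv") %>%
  mutate(child_id = factor(child_id),
         school_id = factor(school_id),
         mlhome = factor(mlhome))

# interaction of school year and mono/bilingual teaching,
# with random intercepts for children and for schools
bnt_mod <- lmer(BNT60 ~ schoolyear * mlhome + (1 | child_id) + (1 | school_id),
                data = bnt)
```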
Examine the fit and consider your model assumptions, and assess what might be done to improve the model in order to make better statistical inferences.
Using a method of your choosing, conduct inferences from your model and write up the results.
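One option, assuming the bnt_mod fit sketched above, is a parametric bootstrap (this can be slow):

```r
# bootstrap 95% confidence intervals for all model parameters
confint(bnt_mod, method = "boot")
```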
If you have an older version of lme4, then allFit() might not be directly available, and you will need to run the following: source(system.file("utils", "allFit.R", package="lme4")).↩︎