Assumptions, Diagnostics, and Random Effect Structures

Preliminaries

  1. Create a new RMarkdown document or R script (whichever you like) for this week.

Exercises: Assumptions & Diagnostics

For this next set of exercises we return to our study from Week 1, in which researchers want to study the relationship between time spent outdoors and mental wellbeing across the whole of Scotland. Data were collected from 20 of the Local Authority Areas and are accessible at https://uoepsy.github.io/data/LAAwellbeing.csv.

| variable | description |
|----------|-------------|
| ppt | Participant ID |
| name | Participant name |
| laa | Local Authority Area |
| outdoor_time | Self-reported estimate of the number of hours per week spent outdoors |
| wellbeing | Warwick-Edinburgh Mental Wellbeing Scale (WEMWBS), a self-report measure of mental health and wellbeing. The scale is scored by summing responses to each item, with items answered on a 1 to 5 Likert scale. The minimum scale score is 14 and the maximum is 70. |
| density | LAA population density (people per square km) |
Question 1

The code below will read in the data and fit the model with by-LAA random intercepts and slopes of outdoor time.

library(tidyverse)
library(lme4)
scotmw <- read_csv("https://uoepsy.github.io/data/LAAwellbeing.csv")
rs_model <- lmer(wellbeing ~ 1 + outdoor_time + (1 + outdoor_time | laa), data = scotmw)
  1. Plot the residuals vs the fitted values, and assess the extent to which the assumption that the residuals have a mean of zero holds.
  2. Construct a scale-location plot. This is where the square root of the absolute value of the standardised residuals is plotted against the fitted values, and it allows you to more easily assess the assumption of constant variance.
  • Optional: can you create the same plot using ggplot, starting with the augment() function from the broom.mixed package?

Hint: plot(model) will give you this plot, but you might want to play with the type = c(......) argument to get the smoothing line
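As a minimal sketch (using the rs_model fitted above; type = c("p", "smooth") is one way of getting the smoothing line, and the scale-location formula below is one common way of building that plot by hand):

plot(rs_model, type = c("p", "smooth"))  # residuals vs fitted

# scale-location: sqrt of absolute standardised residuals against fitted
plot(rs_model, sqrt(abs(resid(., scaled = TRUE))) ~ fitted(.),
     type = c("p", "smooth"))

# or, for the optional ggplot route, starting from augment():
library(broom.mixed)
augment(rs_model) %>%
  ggplot(aes(x = .fitted, y = sqrt(abs(.resid / sd(.resid))))) +
  geom_point() +
  geom_smooth(se = FALSE)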

Solution

Question 2

Examine the normality of both the level 1 and level 2 residuals.

Hints:

  • Use hist() if you like, or qqnorm(residuals) followed by qqline(residuals)
  • Extracting the level 2 residuals (the random effects) can be a little trickier. ranef(model) will get you some of the way (see the sketch below).
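A rough sketch of one approach (ranef() returns a list containing one data frame per grouping factor, with a column for each random effect):

# level 1 (observation-level) residuals
qqnorm(resid(rs_model)); qqline(resid(rs_model))

# level 2 residuals: the by-LAA random intercepts and slopes
laa_re <- ranef(rs_model)$laa
qqnorm(laa_re$`(Intercept)`); qqline(laa_re$`(Intercept)`)
qqnorm(laa_re$outdoor_time); qqline(laa_re$outdoor_time)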

Solution

Question 3
  1. Which person in the dataset has the greatest influence on our model?
  2. For which person is the model fit worst (i.e., who has the largest residual)?
  3. Which LAA has the greatest influence on our model?

Hints:

  • As well as hlm_influence() in the HLMdiag package, there is another nice function, hlm_augment() (see the sketch below).
  • We can often end up confused because the \(i^{th}\) observation input to our model (and therefore the \(i^{th}\) row of the hlm_influence() output) might not be the \(i^{th}\) observation in our original dataset: there may be missing data! (Luckily, we have no missing data in this dataset.)
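A sketch of where to start (sorting by the cooksd column of the hlm_influence() output is one way to find the most influential cases):

library(HLMdiag)
# influence of individual observations
inf_obs <- hlm_influence(rs_model, level = 1)
arrange(inf_obs, desc(cooksd))

# influence of each LAA
inf_laa <- hlm_influence(rs_model, level = "laa")
arrange(inf_laa, desc(cooksd))

# hlm_augment() additionally returns the residuals alongside the influence measures
hlm_augment(rs_model)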

Solution

Question 4
  1. Looking at the random effects, which LAA shows the least improvement in wellbeing as outdoor time increases, and which shows the greatest improvement?
  2. What is the estimated wellbeing for people from the City of Edinburgh with zero hours of outdoor time per week, and what is their associated increase in wellbeing for every hour per week increase in outdoor time?
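A minimal sketch of the extraction (assuming the LAA appears in the data as "City of Edinburgh"):

ranef(rs_model)$laa  # each LAA's deviation from the fixed effects
coef(rs_model)$laa   # each LAA's own intercept and slope (fixed + random)
coef(rs_model)$laa["City of Edinburgh", ]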

Solution

Random Effect Structures

Random effect structures can get pretty complicated quite quickly. Very often it is not the random effects part that is of specific interest to us; rather, we wish to estimate random effects in order to more accurately partition the variance in our outcome variable and provide better estimates of the fixed effects.

It is a fine balance between fitting the most sophisticated model structure that we possibly can, and fitting a model that converges without too much simplification. Typically for many research designs, the following steps will keep you mostly on track to finding the maximal model:

lmer(outcome ~ fixed effects + (random effects | grouping structure))

  1. Specify the outcome ~ fixed effects bit first.

    • The outcome variable should be clear: it is the variable we are wishing to explain/predict.
    • The fixed effects are the things we want to use to explain/predict variation in the outcome variable. These will often be the things that are of specific inferential interest, and other covariates. Just like the simple linear model.
  2. If there is a grouping structure to your data, and those groups (preferably n>5) are perceived as a random sample of a wider population (the specific groups aren’t interesting to you), then consider fitting them in the random effects part (1 | grouping).

  3. If any of the things in the fixed effects vary within the groups, it might be possible to also include them as random effects.

    • As a general rule, don’t specify random effects that are not also specified as fixed effects (an exception could be specifically for model comparison, to isolate the contribution of the fixed effect).
    • For things that do not vary within the groups, it rarely makes sense to include them as random effects. For instance, if we had a model with lmer(score ~ genetic_status + (1 + genetic_status | participant)), then we would be trying to model a process where “the effect of genetic_status on scores is different for each participant”. But if you consider an individual participant, their genetic status never changes. For participant \(i\), what is “the effect of genetic status on score”? It’s undefined. This is because genetic status only varies between participants.

Random Effects in lme4

Below is a selection of different formulas for specifying different random effect structures, taken from the lme4 vignette. This might look like a lot, but over time and with repeated use of multilevel models you will get used to reading these, much as you got used to reading the y ~ x1 + x2 formula structure in all our linear models.

| Formula | Alternative | Meaning |
|---------|-------------|---------|
| `(1 \| g)` | `1 + (1 \| g)` | Random intercept with fixed mean |
| `(1 \| g1/g2)` | `(1 \| g1) + (1 \| g1:g2)` | Intercept varying among \(g1\), and among \(g2\) within \(g1\) |
| `(1 \| g1) + (1 \| g2)` | `1 + (1 \| g1) + (1 \| g2)` | Intercept varying among \(g1\) and among \(g2\) |
| `x + (x \| g)` | `1 + x + (1 + x \| g)` | Correlated random intercept and slope |
| `x + (x \|\| g)` | `1 + x + (1 \| g) + (0 + x \| g)` | Uncorrelated random intercept and slope |

Table 1: Examples of the right-hand sides of mixed effects model formulas. \(g\), \(g1\), \(g2\) are grouping factors, \(x\) is a predictor variable.
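To make the table concrete, here is a hypothetical pair of equivalent calls (df, y, x, g1 and g2 are placeholder names):

# nested: intercepts varying among g1, and among g2 within g1
m_nested  <- lmer(y ~ 1 + (1 | g1/g2), data = df)
m_nested2 <- lmer(y ~ 1 + (1 | g1) + (1 | g1:g2), data = df)  # the same model

# uncorrelated (zero-covariance) random intercepts and slopes
m_zcp <- lmer(y ~ x + (x || g1), data = df)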

Convergence Issues and What To Do

Singular fits

You may have noticed that some of our models over the last few weeks have been giving a warning: boundary (singular) fit: see ?isSingular.
Up to now, we’ve been largely ignoring these warnings. However, this week we’re going to look at how to deal with this issue.

boundary (singular) fit: see ?isSingular

The warning is telling us that our model has resulted in a ‘singular fit’. Singular fits often indicate that the model is ‘overfitted’ - that is, the random effects structure which we have specified is too complex to be supported by the data.

Perhaps the most intuitive advice would be to remove the most complex part of the random effects structure (i.e. the random slopes). This leads to a simpler model that is not overfitted. In other words, start simplifying from the top (where the most complexity is) and work down (to where the least complexity is). Additionally, when the variance estimate for a specific random effect term is very low, this indicates that the model is not estimating this parameter to differ much between the levels of your grouping variable. In some experimental designs it might be perfectly acceptable to remove this term, or to simply include it as a fixed effect.

A key point here is that when fitting a mixed model, we should think about how the data are generated. Asking yourself questions such as “do we have good reason to assume subjects might vary over time, or to assume that they will have different starting points (i.e., different intercepts)?” can help you in specifying your random effect structure.

You can read in depth about what this means by reading the help documentation for ?isSingular. For our purposes, a relevant section is copied below:

… intercept-only models, or 2-dimensional random effects such as intercept + slope models, singularity is relatively easy to detect because it leads to random-effect variance estimates of (nearly) zero, or estimates of correlations that are (almost) exactly -1 or 1.
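In practice, this means checking the variance components directly. A quick sketch, using the rs_model from earlier:

isSingular(rs_model)  # TRUE if the fit is singular
VarCorr(rs_model)     # look for variances near 0, or correlations of -1 or 1

# one common first simplification: remove the intercept-slope correlation
rs_zcp <- lmer(wellbeing ~ 1 + outdoor_time + (1 + outdoor_time || laa),
               data = scotmw)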

Convergence warnings

Issues of non-convergence can be caused by many things. If your model doesn’t converge, it does not necessarily mean the fit is incorrect; however, it is cause for concern and should be addressed, or else you may end up reporting inferences which do not hold.

There are lots of different things which you could do which might help your model to converge. A select few are detailed below:

  • double-check the model specification and the data

  • adjust stopping (convergence) tolerances for the nonlinear optimizer, using the optCtrl argument to [g]lmerControl. (see ?convergence for convergence controls).

    • What is “tolerance”? Remember that the optimizer is the method by which the computer finds the best fitting model, by iteratively assessing and trying to maximise the likelihood (or minimise the loss).

Figure 1: An optimizer will stop after a certain number of iterations, or when it meets a tolerance threshold

  • center and scale continuous predictor variables (e.g. with scale)

  • Change the optimization method (for example, here we change it to bobyqa):
    lmer(..., control = lmerControl(optimizer = "bobyqa"))
    glmer(..., control = glmerControl(optimizer = "bobyqa"))

  • Increase the number of optimization steps:
    lmer(..., control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 50000)))
    glmer(..., control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 50000)))

  • Use allFit() to try the fit with all available optimizers. This will of course be slow, but is considered ‘the gold standard’: “if all optimizers converge to values that are practically equivalent, then we would consider the convergence warnings to be false positives.” (See the sketch after this list.)

  • Consider simplifying your model, for example by removing random effects with the smallest variance (but be careful to not simplify more than necessary, and ensure that your write up details these changes)
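As a sketch of the allFit() approach (assuming a fitted model m):

multi_fit <- allFit(m)
summary(multi_fit)$fixef  # do the fixed effects agree across optimizers?
summary(multi_fit)$llik   # and the log-likelihoods?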

Exercises: Random Effect Structures

Crossed Ranefs

Data: Test-enhanced learning

An experiment was run to conceptually replicate “test-enhanced learning” (Roediger & Karpicke, 2006): two groups of 25 participants were presented with material to learn. One group studied the material twice (StudyStudy), the other group studied the material once then did a test (StudyTest). Recall was tested immediately (one minute) after the learning session and one week later. The recall tests were composed of 175 items identified by a keyword (Test_word). One of the researchers’ questions concerned how test-enhanced learning influences time-to-recall.

The critical (replication) prediction is that the StudyStudy group should perform somewhat better on the immediate recall test, but the StudyTest group will retain the material better and thus perform better on the 1-week follow-up test.

| variable | description |
|----------|-------------|
| Subject_ID | Unique participant identifier |
| Group | Whether the participant studied the material twice (StudyStudy), or studied it once then did a test (StudyTest) |
| Delay | Time of recall test ('min' = immediate, 'week' = one week later) |
| Test_word | Word being recalled (175 different test words) |
| Correct | Whether or not the word was correctly recalled |
| Rtime | Time to recall word (milliseconds) |

The following code loads the data into your R environment by creating a variable called tel:

load(url("https://uoepsy.github.io/data/testenhancedlearning.RData"))
Question 5

Load and plot the data.
For this week, we’ll use reaction time as our proxy for test performance, so you’ll probably want that variable on the y-axis.

Does it look like the effect was replicated?
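If you want a starting point, a minimal sketch (a boxplot is just one option here):

library(tidyverse)
load(url("https://uoepsy.github.io/data/testenhancedlearning.RData"))
ggplot(tel, aes(x = Delay, y = Rtime, fill = Group)) +
  geom_boxplot()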

Solution

Question 6

The critical (replication) prediction is that the StudyStudy group should perform somewhat better on the immediate recall test, but the StudyTest group will retain the material better and thus perform better on the 1-week follow-up test.

Test the critical hypothesis using a multi-level model.
Try to fit the maximally complex random effect structure that is supported by the experimental design.

NOTE: Your model probably won’t converge. We’ll deal with that in the next question.

Hints:

  • We can expect variability across subjects (some people are better at learning than others) and across items (some of the recall items are harder than others). How should this be represented in the random effects? (A sketch follows these hints.)
  • If a model takes ages to fit, you might want to cancel it by pressing the escape key. It is normal for complex models to take time, but for the purposes of this task, give up after a couple of minutes, and try simplifying your model.
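One plausible maximal specification, given that Group varies between subjects but both Delay and Group vary within items, is sketched below (it is not necessarily the model you will end up with):

mod_max <- lmer(Rtime ~ Delay * Group +
                  (1 + Delay | Subject_ID) +
                  (1 + Delay * Group | Test_word),
                data = tel)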

Solution

Question 7

Often, models with maximal random effect structures will not converge, or will obtain a singular fit. One suggested approach here is to simplify the model until you achieve convergence (Barr et al., 2013).

Incrementally simplify your model from the previous question until you obtain a model that converges and is not a singular fit.

Hint: you can look at the variance estimates and correlations easily by using the VarCorr() function. What jumps out?

Solution

Nested Random Effects

Data: Naming

74 children from 10 schools were administered the full Boston Naming Test (BNT-60) on a yearly basis for 5 years to examine development of word retrieval. Five of the schools taught lessons in a bilingual setting with English as one of the languages, and the remaining five schools taught in monolingual English.

The data is available at https://uoepsy.github.io/data/bntmono.csv.

| variable | description |
|----------|-------------|
| child_id | Unique child identifier |
| school_id | Unique school identifier |
| BNT60 | Score on the Boston Naming Test-60. Scores range from 0 to 60 |
| schoolyear | Year of school |
| mlhome | Mono/bilingual school: 0 = bilingual, 1 = monolingual |

Question 8

Let’s start by thinking about our clustering - we’d like to know how much of the variance in BNT60 scores is due to the clustering of data within children, who are themselves within schools. One easy way of assessing this is to fit an intercept only model, which has the appropriate random effect structure.

Using the model below, calculate the proportion of variance attributable to the clustering of data within children within schools.

bnt_null <- lmer(BNT60 ~ 1 +  (1 | school_id/child_id), data = bnt)

Hint: the random intercept variances are the building blocks here. There are no predictors in this model, so all the variance in the outcome is attributed either to schools, to children within schools, or else lumped into the residual.
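A sketch of the computation (the grp labels in the VarCorr output are assumed to print as child_id:school_id, school_id and Residual):

library(tidyverse)
library(lme4)
bnt <- read_csv("https://uoepsy.github.io/data/bntmono.csv")
bnt_null <- lmer(BNT60 ~ 1 + (1 | school_id/child_id), data = bnt)

vc <- as.data.frame(VarCorr(bnt_null))
vc$vcov / sum(vc$vcov)  # proportion of variance at each level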

Solution

Question 9

Fit a model examining the interaction between the effects of school year and mono/bilingual teaching on word retrieval, with random intercepts only for children and schools.

Hint: make sure your variables are of the right type first - e.g. numeric, factor etc

Examine the fit, check your model assumptions, and consider what might be done to improve the model in order to make better statistical inferences.
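A possible starting point (treating mlhome as a factor; whether schoolyear is better as numeric or as a factor is for you to decide):

bnt <- bnt %>%
  mutate(mlhome = factor(mlhome, levels = c(0, 1),
                         labels = c("bilingual", "monolingual")))
bnt_full <- lmer(BNT60 ~ schoolyear * mlhome + (1 | school_id/child_id),
                 data = bnt)
plot(bnt_full)  # residuals vs fitted, as before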

Solution

Question 10

Using a method of your choosing, conduct inferences (i.e. obtain p-values or confidence intervals) from your final model and write up the results.
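For example, any of the following (a sketch, assuming a final model bnt_full; choose one method and report it consistently):

confint(bnt_full, method = "profile")  # profile likelihood CIs
confint(bnt_full, method = "boot")     # parametric bootstrap CIs
# or refit with lmerTest::lmer() to get Satterthwaite-approximated p-values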

Solution

Footnotes

  1. It’s always going to be debatable what counts as ‘too high’, because in certain situations you might expect correlations close to 1. It’s best to think through whether it is a feasible value given the study itself.