Week 1 Exercises: Intro to MLM
These are the main packages we’re going to use in this block. It might make sense to install them now if you do not have them already:
- tidyverse: for organising data
- lme4: for fitting generalised linear mixed effects models
- broom.mixed: tidying methods for mixed models
- effects: for tabulating and graphing effects in linear models
- lmerTest: for quick p-values from mixed models
- parameters: various inferential methods for mixed models
Getting to grips with MLM
This first set of exercises is not “how to do analyses with multilevel models”; the exercises are designed to get you thinking and to help you understand how these models work.
Data: New Toys!
Recall the example from last semester’s USMR course, where the lectures explored linear regression with a toy dataset of how practice influences the reading age of toy characters (see USMR Week 7 Lecture). We’re now going to broaden our scope to investigate how practice affects reading age for all toys (not just Martin’s Playmobil characters).
You can find a dataset at https://uoepsy.github.io/data/toy2.csv containing information on 129 different toy characters that come from a selection of different families/types of toy. You can see the variables in the table below (see footnote 1).
variable | description |
---|---|
toy_type | Type of Toy |
year | Year Released |
toy | Character |
hrs_week | Hours of practice per week |
R_AGE | Reading Age |
Below is some code that fits a model of “reading age” (R_AGE) predicted by hours of practice (hrs_week). Line 2 then gets the ‘fitted’ values from the model and adds them as a new column to the dataset, called pred_lm. The fitted values are what the model predicts for every individual observation (every individual toy in our dataset).
Lines 4-7 then plot the data, split up by each type of toy, and add lines showing the model fitted values.
Run the code and check that you get a plot. What do you notice about the lines?
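The code chunk that this exercise describes is not shown above. A minimal sketch that matches the description might look like the following. It assumes the data are read into an object called toy2 (the name used by the later chunks); the model name lm_mod and the line colour are assumptions made here for illustration.

```r
library(tidyverse)

# read the toy data from the URL given above
toy2 <- read_csv("https://uoepsy.github.io/data/toy2.csv")

# --- the chunk described in the text starts here ---
lm_mod <- lm(R_AGE ~ hrs_week, data = toy2)
toy2$pred_lm <- predict(lm_mod)

ggplot(toy2, aes(x = hrs_week)) +
  geom_point(aes(y = R_AGE), size=1, alpha=.3) +
  facet_wrap(~toy_type) +
  geom_line(aes(y=pred_lm), col = "red")
```

Counting from the lm() call, line 2 is the predict() step and lines 4-7 are the ggplot() call, which is how the line numbers in the text should be read.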
Below are 3 more code chunks that all 1) fit a model, then 2) add the fitted values of that model to the plot.
The first model is a ‘no-pooling’ approach, where we use tools learned in USMR and simply add in toy_type as a predictor in the model to estimate all the differences between types of toys.
The second and third are multilevel models. The second fits random intercepts by toy type, and the third fits random intercepts and random slopes of hrs_week by toy type.
Copy each chunk and run through the code. Pay attention to how the lines differ.
Code
# no pooling: toy_type entered as a fixed predictor
fe_mod <- lm(R_AGE ~ toy_type + hrs_week, data = toy2)
# add the fitted values as a new column
toy2$pred_fe <- predict(fe_mod)

ggplot(toy2, aes(x = hrs_week)) +
  geom_point(aes(y = R_AGE), size=1, alpha=.3) +
  facet_wrap(~toy_type) +
  geom_line(aes(y=pred_fe), col = "blue")
Code
library(lme4)
# random intercepts by toy type
ri_mod <- lmer(R_AGE ~ hrs_week + (1 | toy_type), data = toy2)
# add the fitted values as a new column
toy2$pred_ri <- predict(ri_mod)

ggplot(toy2, aes(x = hrs_week)) +
  geom_point(aes(y = R_AGE), size=1, alpha=.3) +
  facet_wrap(~toy_type) +
  geom_line(aes(y=pred_ri), col = "green")
Code
# random intercepts and random slopes of hrs_week by toy type
rs_mod <- lmer(R_AGE ~ hrs_week + (1 + hrs_week | toy_type), data = toy2)
# add the fitted values as a new column
toy2$pred_rs <- predict(rs_mod)

ggplot(toy2, aes(x = hrs_week)) +
  geom_point(aes(y = R_AGE), size=1, alpha=.3) +
  facet_wrap(~toy_type) +
  geom_line(aes(y=pred_rs), col = "orange")
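To see how a no-pooling model and a random-intercept model differ numerically rather than only in a plot, one option is to put the group intercepts from each side by side. The sketch below uses simulated stand-in data: the data frame sim, its variables g, x and y, and all numeric values are made up for illustration, so the numbers say nothing about toy2.

```r
library(lme4)

set.seed(1)
# hypothetical stand-in data: 8 groups of 10 observations each
sim <- data.frame(g = factor(rep(1:8, each = 10)), x = runif(80, 0, 7))
g_int <- rnorm(8, mean = 0, sd = 2)   # group-level deviations from the average intercept
sim$y <- 5 + g_int[sim$g] + 1 * sim$x + rnorm(80, sd = 3)

# no pooling: a separate intercept for every group, estimated as fixed effects
np_mod <- lm(y ~ 0 + g + x, data = sim)
# partial pooling: group intercepts modelled as draws from a normal distribution
pp_mod <- lmer(y ~ x + (1 | g), data = sim)

ints <- data.frame(
  no_pooling      = coef(np_mod)[paste0("g", 1:8)],
  partial_pooling = coef(pp_mod)$g[, "(Intercept)"]
)
ints

# spread of the group intercepts under each approach
apply(ints, 2, sd)
```

The multilevel model's intercepts will typically be less spread out than the no-pooling ones, because each group's estimate is pulled towards the average of the groups.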
From the previous questions you should have a model called ri_mod.
Below is a plot of the fitted values from that model. Rather than having a separate facet for each type of toy as we did above, I have put them all on one plot. The thick black line is the average intercept and slope of the toy-type lines.
Identify the parts of the plot that correspond to A1-4 in the summary output of the model below
Choose from these options:
- where the black line cuts the y axis (at x=0)
- the slope of the black line
- the standard deviation of the distances from all the individual datapoints (toys) to their respective toy-type lines
- the standard deviation of the distances from all the toy-type lines to the black line
Below is the model equation for the ri_mod model.
Identify the part of the equation that represents each of A1-4.
\[\begin{align} \text{For Toy }j\text{ of Type }i & \\ \text{Level 1 (Toy):}& \\ \text{R\_AGE}_{ij} &= b_{0i} + b_1 \cdot \text{hrs\_week}_{ij} + \varepsilon_{ij} \\ \text{Level 2 (Type):}& \\ b_{0i} &= \gamma_{00} + \zeta_{0i} \\ \text{Where:}& \\ \zeta_{0i} &\sim N(0,\sigma_{0}) \\ \varepsilon_{ij} &\sim N(0,\sigma_{\varepsilon}) \\ \end{align}\]
Choose from:
- \(\sigma_{\varepsilon}\)
- \(b_{1}\)
- \(\sigma_{0}\)
- \(\gamma_{00}\)
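One way to get a feel for what each symbol in the equation does is to simulate data from it. The sketch below is an illustration only: every numeric value is made up rather than estimated from toy2, and the object names are hypothetical. It needs only base R.

```r
set.seed(42)
n_types <- 10    # number of toy types (index i)
n_per   <- 12    # toys per type (index j)

gamma_00 <- 2    # fixed intercept
b_1      <- 1.5  # fixed slope of hrs_week
sigma_0  <- 1.2  # SD of the type-level intercept deviations (zeta_0i)
sigma_e  <- 0.8  # SD of the toy-level residuals (epsilon_ij)

# Level 2: one intercept per toy type, b_0i = gamma_00 + zeta_0i
zeta_0 <- rnorm(n_types, mean = 0, sd = sigma_0)
b_0    <- gamma_00 + zeta_0

# Level 1: each toy's reading age, R_AGE_ij = b_0i + b_1 * hrs_week_ij + epsilon_ij
sim <- data.frame(
  toy_type = rep(seq_len(n_types), each = n_per),
  hrs_week = runif(n_types * n_per, min = 0, max = 7)
)
sim$R_AGE <- b_0[sim$toy_type] + b_1 * sim$hrs_week +
  rnorm(nrow(sim), mean = 0, sd = sigma_e)

head(sim)
```

Changing sigma_0 moves the type-specific lines further from or closer to one another, while changing sigma_e changes how tightly the individual toys sit around their own type's line.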
Audio Interference in Executive Functioning (Repeated Measures)
This next set of exercises is closer to conducting a real study. We have some data and a research question (below). The exercises will walk you through describing the data, then prompt you to think about how we might fit an appropriate model to address the research question, and finally task you with having a go at writing up what you’ve done.
Data: Audio interference in executive functioning
This data is from a simulated study that aims to investigate the following research question:
How do different types of audio interfere with executive functioning, and does this interference differ depending upon whether or not noise-cancelling headphones are used?
30 healthy volunteers each completed the Symbol Digit Modalities Test (SDMT) - a commonly used test to assess processing speed and motor speed - a total of 15 times. During the tests, participants listened to either no audio (5 tests), white noise (5 tests) or classical music (5 tests). Half the participants listened via active-noise-cancelling headphones, and the other half listened via speakers in the room. Unfortunately, lots of the tests were not administered correctly, and so not every participant has the full 15 trials’ worth of data.
The data is available at https://uoepsy.github.io/data/lmm_ef_sdmt.csv.
variable | description |
---|---|
PID | Participant ID |
audio | Audio heard during the test ('no_audio', 'white_noise','music') |
headphones | Whether the participant listened via speakers (S) in the room or via noise cancelling headphones (H) |
SDMT | Symbol Digit Modalities Test (SDMT) score |
How many participants are there in the data?
How many have complete data (15 trials)?
What is the average number of trials that participants completed? What is the minimum?
Does every participant have some data for each type of audio?
Functions like table() and count() will likely be useful here.
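As a rough illustration of how these two functions can be used for this kind of question, here is a sketch on a tiny made-up data frame. The object dat and its contents are hypothetical; only the column names PID and audio are borrowed from the table above.

```r
library(dplyr)

# hypothetical mini dataset: 3 participants with unequal numbers of trials
dat <- data.frame(
  PID   = c("p1", "p1", "p1", "p2", "p2", "p3"),
  audio = c("no_audio", "music", "white_noise", "no_audio", "music", "music")
)

# one row per participant, with the number of trials in column n
trials <- count(dat, PID)
trials

nrow(trials)     # number of participants
mean(trials$n)   # average number of trials per participant
min(trials$n)    # fewest trials completed by any participant

# cross-tabulate participant by audio type; a 0 means no data in that cell
table(dat$PID, dat$audio)
```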
How do different types of audio interfere with executive functioning, and does this interference differ depending upon whether or not noise-cancelling headphones are used?
Consider the following questions about the study:
- What is our outcome of interest?
- What variables are we seeking to investigate in terms of their impact on the outcome?
- What are the units of observation?
- Are the observations clustered/grouped? In what way?
- What varies within these clusters?
- What varies between these clusters?
Make factors and set the reference levels of the audio and headphones variables to “no audio” and “speakers” respectively.
Can’t remember about setting factors and reference levels? Check back to USMR!
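As a reminder of the mechanics, here is a minimal sketch using base R on a few made-up rows. The data frame efdat_demo is hypothetical; the variable names and level labels are taken from the data description above. This is one of several ways to do it (forcats::fct_relevel() is another).

```r
# hypothetical rows using the variable names and level labels from the table above
efdat_demo <- data.frame(
  audio      = c("music", "no_audio", "white_noise", "no_audio"),
  headphones = c("H", "S", "S", "H")
)

# factor() orders levels alphabetically by default;
# relevel() moves the chosen level to the front, making it the reference
efdat_demo$audio      <- relevel(factor(efdat_demo$audio), ref = "no_audio")
efdat_demo$headphones <- relevel(factor(efdat_demo$headphones), ref = "S")

levels(efdat_demo$audio)        # the first level listed is the reference
levels(efdat_demo$headphones)
```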
Fit a multilevel model to address the aims of the study (copied below)
How do different types of audio interfere with executive functioning, and does this interference differ depending upon whether or not noise-cancelling headphones are used?
Specifying the model may feel like a lot, but try splitting it into three parts:
\[ \text{lmer(}\overbrace{\text{outcome }\sim\text{ fixed effects}}^1\, + \, (1 + \underbrace{\text{slopes}}_3\, |\, \overbrace{\text{grouping structure}}^2 )\text{)} \]
1. Just like the lm()s we have used in the past, think about what we want to test. This should provide the outcome and the structure of our fixed effects.
2. Think about how the observations are clustered/grouped. This should tell us how to specify the grouping structure in the random effects.
3. Think about which slopes (i.e. which terms in our fixed effects) could feasibly vary between the clusters. This provides you with what to put in as random slopes.
Make sure to read about multilevel models and how to fit them in Chapter 2: MLM #multilevel-models-in-r.
We now have a model, but we don’t have any p-values or confidence intervals or anything - i.e. we have no inferential criteria on which to draw conclusions. There are a whole load of different methods available for drawing inferences from multilevel models, which means it can be a bit of a never-ending rabbit hole. For now, we’ll just use the ‘quick and easy’ approach provided by the lmerTest package seen in the lectures.
Using the lmerTest package, re-fit your model, and you should now get some p-values!
If you use library(lmerTest) to load the package, then every single model you fit will show p-values calculated with the Satterthwaite method.
Personally, I would rather this were not the case, so I often opt to fit specific models with these p-values without ever loading the package:
modp <- lmerTest::lmer(y ~ 1 + x + ....
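To see what difference the lmerTest:: prefix makes, the sketch below fits the same model twice. It uses the sleepstudy data that ships with lme4 purely as a stand-in dataset; the object names m_plain and m_pvals are made up for this illustration.

```r
library(lme4)

# sleepstudy ships with lme4; it is used here only as a stand-in dataset
m_plain <- lmer(Reaction ~ 1 + Days + (1 + Days | Subject), data = sleepstudy)
m_pvals <- lmerTest::lmer(Reaction ~ 1 + Days + (1 + Days | Subject), data = sleepstudy)

# coefficient tables: the lmerTest version adds df and p-value columns
coef(summary(m_plain))
coef(summary(m_pvals))
```

Because lmerTest is never attached with library(), other models fitted with plain lmer() in the same session keep the default lme4 output.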
If we want to go down the model comparison route, we just need to isolate the relevant part(s) of the model that we are interested in.
Remember, model comparison is sometimes a useful way of testing a set of coefficients. For instance, in this example the interaction involves estimating two terms: audiomusic:headphonesH and audiowhite_noise:headphonesH.
To test the interaction as a whole, we can create a model without the interaction, and then compare the two. The SATmodcomp() function from the pbkrtest package provides a way of conducting an F test with the same Satterthwaite method of approximating the degrees of freedom:
# full model, including the audio-by-headphones interaction
sdmt_mod <- lmer(SDMT ~ audio * headphones +
                   (1 + audio | PID), data = efdat)
# restricted model, without the interaction
sdmt_res <- lmer(SDMT ~ audio + headphones +
                   (1 + audio | PID), data = efdat)
library(pbkrtest)
SATmodcomp(largeModel = sdmt_mod, smallModel = sdmt_res)
large : SDMT ~ audio * headphones + (1 + audio | PID)
small (restriction matrix) :
0 0 0 0 0.8936904 -0.4486841
0 0 0 0 0.4486841 0.8936904
statistic ndf ddf p.value
[1,] 11.051 2.000 26.909 0.0003136 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We’ve already seen in the example with the different types of toys (above) that we can visualise the fitted values (model predictions). But these were plotting all the cluster-specific values, and what we are really interested in are the estimates of (and uncertainty around) our fixed effects (i.e. estimates for clusters on average).
Using tools like the effects package can provide us with the values of the outcome across levels of a specific fixed predictor (holding other predictors at their mean).
This should get you started:
library(effects)
effect(term = "audio*headphones", mod = sdmt_mod) |>
as.data.frame()
You can see the effects package in Chapter 2: MLM #visualising-models. The logic is just the same as it was for USMR; the only difference is that the estimated effects are from an lmer() instead of an lm()/glm().
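One possible way of turning that data frame into a plot is sketched below. It assumes the data are read from the URL given earlier into an object called efdat, with the reference levels set as requested above, and it re-fits the model shown in the model comparison section. The object name eff, the choice of geom_pointrange() and the axis labels are assumptions made here, not the only way to draw this.

```r
library(tidyverse)
library(lme4)
library(effects)

# read the data and set the reference levels
efdat <- read_csv("https://uoepsy.github.io/data/lmm_ef_sdmt.csv") |>
  mutate(
    audio      = relevel(factor(audio), ref = "no_audio"),
    headphones = relevel(factor(headphones), ref = "S")
  )

sdmt_mod <- lmer(SDMT ~ audio * headphones + (1 + audio | PID), data = efdat)

# one row per audio-by-headphones combination,
# with the estimate (fit) and its interval (lower, upper)
eff <- effect(term = "audio*headphones", mod = sdmt_mod) |>
  as.data.frame()

ggplot(eff, aes(x = audio, y = fit, col = headphones)) +
  geom_pointrange(aes(ymin = lower, ymax = upper),
                  position = position_dodge(width = 0.3)) +
  labs(x = "Audio", y = "Estimated SDMT score", col = "Headphones")
```

Dodging the two headphone groups side by side within each audio type makes it easier to compare how the audio differences change between speakers and headphones.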
Now we have some p-values and a plot, try to create a short write-up of the analysis and results.
Think about the principles that have guided you during write-ups thus far.
The aim in writing a statistical report should be that a reader is able to more or less replicate your analyses without referring to your analysis code. Furthermore, it should be possible for a reader to understand and replicate your work even if they use something other than R. This requires detailing all of the steps you took in conducting the analysis, but without simply referring to R code.
- Provide a description of the sample that is used in the analysis, and any steps that you took to get this sample (i.e. data cleaning/removal)
- Describe the model/test and how it addresses the research question. What is the structure of the model, and how did you get to this model? (You don’t need a fancy model equation, you can describe in words!).
- Present (visually and numerically) the key results of the coefficient tests or model comparisons, and explain what these mean in the context of the research question (this could be things like practical significance of the effect size, and the group-level variability in the effects).
Footnotes
Image sources:
http://tophatsasquatch.com/2012-tmnt-classics-action-figures/
https://www.dezeen.com/2016/02/01/barbie-dolls-fashionista-collection-mattel-new-body-types/
https://www.wish.com/product/5da9bc544ab36314cfa7f70c
https://www.worldwideshoppingmall.co.uk/toys/jumbo-farm-animals.asp
https://www.overstock.com/Sports-Toys/NJ-Croce-Scooby-Doo-5pc.-Bendable-Figure-Set-with-Scooby-Doo-Shaggy-Daphne-Velma-and-Fred/28534567/product.html
https://tvtropes.org/pmwiki/pmwiki.php/Toys/Furby
https://www.fun.com/toy-story-4-figure-4-pack.html
https://www.johnlewis.com/lego-minifigures-71027-series-20-pack/p5079461