Flashcards: lm to lmer

In a simple linear regression, we consider there to be only one source of random variability: any variability left unexplained by the predictors (which are modelled as fixed estimates) is captured in the model residuals.

Multi-level (or ‘mixed-effects’) approaches involve modelling more than one source of random variability. As well as the variance that results from taking a random sample of observations, we can identify random variability across different groups of observations. For example, if we are studying a patient population in a hospital, we would expect there to be variability across our sample of patients, but also across the doctors who treat them.

We can account for this variability by allowing the outcome to be lower/higher for each group (a random intercept) and by allowing the estimated effect of a predictor to vary across groups (random slopes).
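As a minimal sketch of how this looks in lme4 syntax, the code below fits a random-intercept model and a random-slopes model. It uses the sleepstudy data that ships with lme4 (not the data from this lab), and the model names m_int and m_slp are just illustrative.

```r
library(lme4)

# sleepstudy ships with lme4: reaction times for 18 subjects across 10 days

# Random intercept only: each Subject gets their own baseline Reaction
m_int <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Random intercept and random slope: the effect of Days also varies by Subject
m_slp <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

fixef(m_slp)               # fixed effects: the average intercept and slope
head(ranef(m_slp)$Subject) # random effects: each subject's deviation from the average
head(coef(m_slp)$Subject)  # group-level coefficients: fixed + random, per subject
```

The only change between the two models is what sits to the left of the | in the random effects term: 1 on its own allows the intercept to vary, and 1 + Days allows the slope of Days to vary too.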

Before you expand each of the boxes below, think about how comfortable you feel with each concept.
This content is very cumulative, so we often need to go back and work out which earlier concept we need to focus our efforts on.

Simple Linear Regression

Clustered (multi-level) data

Random intercepts

Shrinkage

Random slopes

Fixed effects

Random effects

Group-level coefficients

Visualising Model Fitted values

Visualising Fixed Effects

Plotting random effects

Polynomials!

Nested and Crossed structures

Inference for Multilevel Models

Exercises: More Crossed Ranefs

Data: 11 Domain Tests

44 participants in 4 between-subjects groups (A, B, C, and W) were tested 5 times (waves) in each of 11 domains. In each wave, participants received a score (on a 20-point scale) for each domain, along with a set of questions which they answered either correctly or incorrectly.

The data can be accessed using the following code, and a description of the variables in the data can be found in the table below:

load(url("https://uoepsy.github.io/data/msmr_lab5.RData"))

| variable | description |
| --- | --- |
| Anonymous_Subject_ID | Participant Identifier |
| IndivDiff | ?? (not sure but it’s not relevant for these questions!) |
| Wave | Study wave (timepoint), ranging from 1 to 5 |
| Domain | Domain tested (one of 11 domains studied, including things such as animals (ANI), objects (OBJ), toys (TOY), vehicles (VEH)) |
| Correct | Number of questions answered correctly |
| Error | Number of questions answered incorrectly |
| Group | Group (between-participants), A, B, C, or W |
| Score | Score (on a 20-point scale) |

Question A1

Did the groups differ in overall performance?

There are different ways to test this: use the 20-point score or the accuracy? Keep the domains separate or calculate an aggregate across all domains? Which way makes the most sense to you?

Make a plot that corresponds to the research question. Does it look like there’s a difference?

Solution

Question A2

Did the groups differ in overall performance?

Use a mixed-effects model to test the difference.

  • Will you use a linear or logistic model?
  • What should the fixed effect(s) be?
  • What should the random effect(s) be? We have observations clustered by subjects and by domains - are they nested?

Tip: For now, we can forget about the longitudinal aspect to the data, because the research question is only concerned with overall performance.
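One possible way to set this up is sketched below, using a linear model on the 20-point score. To keep the example self-contained it runs on simulated stand-in data (not the real lab data) whose column names follow the variable table above; the object names sim, m_null and m_grp are made up for illustration.

```r
library(lme4)

set.seed(1)
# Simulated stand-in for the lab data (NOT the real dataset);
# column names follow the variable table above
sim <- expand.grid(Anonymous_Subject_ID = factor(1:44),
                   Domain = factor(paste0("D", 1:11)),
                   Wave = 1:5)
subj <- as.integer(sim$Anonymous_Subject_ID)
dom  <- as.integer(sim$Domain)
sim$Group <- factor(c("A", "B", "C", "W"))[(subj - 1) %% 4 + 1]
sim$Score <- 10 + rnorm(44, sd = 2)[subj] + rnorm(11, sd = 1.5)[dom] +
  rnorm(nrow(sim), sd = 2)

# Every subject is tested in every domain, so the two groupings are crossed:
# each gets its own random intercept term
m_null <- lmer(Score ~ 1 + (1 | Anonymous_Subject_ID) + (1 | Domain),
               data = sim, REML = FALSE)
m_grp  <- lmer(Score ~ Group + (1 | Anonymous_Subject_ID) + (1 | Domain),
               data = sim, REML = FALSE)

# Likelihood ratio test for the overall effect of Group
anova(m_null, m_grp)
```

If you would rather model accuracy, a logistic version could use glmer() with cbind(Correct, Error) as the outcome and family = binomial, keeping the same random effects.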

Solution

Question A3

Did performance change over time (across waves)? Did the groups differ in pattern of change?

Make a plot that corresponds to the research question. Does it look like there was a change? A group difference?

Solution

Question A4

Did performance change over time (across waves)? Did the groups differ in pattern of change?

Use mixed-effects model(s) to test this.

Hint: Fit a baseline model in which scores change over time (wave), then assess improvement in model fit due to inclusion of overall group effect and finally the interaction of group with time.
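The sequence of models described in the hint could be sketched as follows. This again uses simulated stand-in data (not the real lab data), and the random effect structure shown is one reasonable choice rather than the only one; the model names are illustrative.

```r
library(lme4)

set.seed(2)
# Simulated stand-in for the lab data (NOT the real dataset)
sim <- expand.grid(Anonymous_Subject_ID = factor(1:44),
                   Domain = factor(paste0("D", 1:11)),
                   Wave = 1:5)
subj <- as.integer(sim$Anonymous_Subject_ID)
dom  <- as.integer(sim$Domain)
sim$Group <- factor(c("A", "B", "C", "W"))[(subj - 1) %% 4 + 1]
# Subjects and domains each have their own intercept and their own slope over Wave
sim$Score <- 10 + rnorm(44, sd = 2)[subj] + rnorm(11, sd = 1.5)[dom] +
  (0.5 + rnorm(44, sd = 0.4)[subj] + rnorm(11, sd = 0.3)[dom]) * sim$Wave +
  rnorm(nrow(sim), sd = 2)

# 1. Baseline: scores change over time
m_base <- lmer(Score ~ Wave +
                 (1 + Wave | Anonymous_Subject_ID) + (1 + Wave | Domain),
               data = sim, REML = FALSE)
# 2. Add the overall group effect
m_grp <- update(m_base, . ~ . + Group)
# 3. Add the interaction of group with time
m_int <- update(m_grp, . ~ . + Wave:Group)

# Does each addition improve model fit?
anova(m_base, m_grp, m_int)
```

The models are fitted with REML = FALSE because they differ in their fixed effects, which is what the likelihood ratio tests in anova() compare.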

Solution

Question A5

Using broom.mixed::augment() for the model with a Wave*Group interaction, plot the average (stat_summary() perhaps?) model fitted values for each group across Waves. Add in the observed data too.
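A rough sketch of this kind of plot is below. It is run on simulated stand-in data (not the real lab data), and the model shown is an assumption about what the Wave*Group model might look like.

```r
library(lme4)
library(broom.mixed)
library(ggplot2)

set.seed(3)
# Simulated stand-in for the lab data (NOT the real dataset)
sim <- expand.grid(Anonymous_Subject_ID = factor(1:44),
                   Domain = factor(paste0("D", 1:11)),
                   Wave = 1:5)
subj <- as.integer(sim$Anonymous_Subject_ID)
dom  <- as.integer(sim$Domain)
sim$Group <- factor(c("A", "B", "C", "W"))[(subj - 1) %% 4 + 1]
sim$Score <- 10 + rnorm(44, sd = 2)[subj] + rnorm(11, sd = 1.5)[dom] +
  (0.5 + rnorm(44, sd = 0.4)[subj] + rnorm(11, sd = 0.3)[dom]) * sim$Wave +
  rnorm(nrow(sim), sd = 2)

m_int <- lmer(Score ~ Wave * Group +
                (1 + Wave | Anonymous_Subject_ID) + (1 + Wave | Domain),
              data = sim, REML = FALSE)

# augment() returns the model data plus a .fitted column (among others)
fits <- augment(m_int)

p <- ggplot(fits, aes(x = Wave, y = Score, colour = Group)) +
  # observed data: group means with standard errors at each wave
  stat_summary(fun.data = mean_se, geom = "pointrange") +
  # model: average fitted value for each group at each wave
  stat_summary(aes(y = .fitted), fun = mean, geom = "line")
p
```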

Solution

Question A6

Create individual subject plots for the data and the model’s fitted values. Will these show straight lines?

Hint: make use of facet_wrap() to create a different panel for each level of a grouping variable.
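To illustrate the facet_wrap() idea on its own, here is a small sketch using the sleepstudy data that ships with lme4 (not the lab data), with one panel per subject.

```r
library(lme4)
library(ggplot2)

m <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

# Attach each observation's fitted value
# (fixed effects plus that subject's random effects)
plotdat <- cbind(sleepstudy, .fitted = fitted(m))

p <- ggplot(plotdat, aes(x = Days, y = Reaction)) +
  geom_point() +                 # observed data
  geom_line(aes(y = .fitted)) +  # model fitted values
  facet_wrap(~Subject)           # one panel per subject
p
```

In this simpler example there is only one grouping factor, so each subject has one fitted value per day. With the lab data each subject has fitted values for several domains at each wave, so you would need to decide whether to average over domains or to draw a line for each one.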

Solution

Question A7

Make a plot of the actual (linear) model prediction.

Hint: Use the effect() function from the effects package.
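One way this might look is sketched below, again on simulated stand-in data (not the real lab data) and with an assumed model. The effect() function works from the fixed effects, so the result is a population-level prediction for each group at each wave.

```r
library(lme4)
library(effects)
library(ggplot2)

set.seed(4)
# Simulated stand-in for the lab data (NOT the real dataset)
sim <- expand.grid(Anonymous_Subject_ID = factor(1:44),
                   Domain = factor(paste0("D", 1:11)),
                   Wave = 1:5)
subj <- as.integer(sim$Anonymous_Subject_ID)
dom  <- as.integer(sim$Domain)
sim$Group <- factor(c("A", "B", "C", "W"))[(subj - 1) %% 4 + 1]
sim$Score <- 10 + rnorm(44, sd = 2)[subj] + rnorm(11, sd = 1.5)[dom] +
  (0.5 + rnorm(44, sd = 0.4)[subj] + rnorm(11, sd = 0.3)[dom]) * sim$Wave +
  rnorm(nrow(sim), sd = 2)

m_int <- lmer(Score ~ Wave * Group +
                (1 + Wave | Anonymous_Subject_ID) + (1 + Wave | Domain),
              data = sim, REML = FALSE)

# Predictions for the Wave:Group term at waves 1 to 5, as a data frame
# with columns fit, se, lower and upper
eff <- as.data.frame(effect("Wave:Group", m_int, xlevels = list(Wave = 1:5)))

p <- ggplot(eff, aes(x = Wave, y = fit, colour = Group, fill = Group)) +
  geom_line() +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2, colour = NA)
p
```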

Solution

Question A8

What important things are different between the plot from question A7 and that from question A5? (You can see the plots we created for these questions below).

Why do you think these two plots differ?

Hint

Solution

Question A9

Create a plot of the subject and domain random effects. Notice the pattern between the random intercept and random slope estimates for the 11 domains - what in our model is this pattern representing?
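A minimal sketch of plotting random effects, and of where to look for the relationship between intercepts and slopes, is below. It uses the sleepstudy data from lme4 (not the lab data), so there is only one grouping factor; with crossed random effects ranef() would return one set of estimates per grouping.

```r
library(lme4)
library(ggplot2)

m <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

# One row per group per term: columns grpvar, term, grp, condval, condsd
re_df <- as.data.frame(ranef(m))

p <- ggplot(re_df, aes(x = condval, y = grp)) +
  geom_point() +
  geom_vline(xintercept = 0, linetype = "dashed") +
  facet_wrap(~term, scales = "free_x")
p

# The model also estimates a correlation between the random intercepts
# and the random slopes, which is one place to look when the two sets of
# estimates show a pattern
VarCorr(m)
re_cor <- attr(VarCorr(m)$Subject, "correlation")
re_cor
```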

Solution

Less Guided Exercise

Instead of step-by-step questions, this exercise is designed to get you thinking more, giving you practice for the report and for your future research. If it helps, you can find a (sort of) checklist for multilevel models here (but please be aware that there is no ‘one-size-fits-all’ approach - this checklist may not always be appropriate for every research question with multi-level data).

Question C1

How does aggressive behaviour change over adolescence? How is this change dependent upon whether or not a child has siblings?

Data: Aggressive Behaviour in Adolescence

Data was collected from 30 secondary schools across Scotland. A cohort of students were followed up every year from the age of 12 to 19. Each year, they completed the Aggressive Behaviour Scale (ABS). Data was also captured on the number of siblings each child had.
The data can be accessed from https://uoepsy.github.io/data/schoolsabs.csv and a description of the variables can be found in the table below.

| variable | description |
| --- | --- |
| schoolid | School Identifier |
| ABS | Score on the Aggressive Behaviour Scale (Z-scored) |
| year | Age (in years) of child at observation |
| childid | Within-School Child Identifier |
| siblings | Sibling status (No/Yes) |
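One thing to watch for in the table above is that childid is a within-school identifier, so the same childid values will appear in different schools. A hedged sketch of how nesting can be specified in lme4 is below. It uses simulated stand-in data (not the real schoolsabs.csv), the helper column age12 is made up for illustration, and the model is only a starting point, not a full answer to the question.

```r
library(lme4)

set.seed(5)
# Simulated stand-in (NOT the real data): 30 schools, 6 children per school,
# each observed yearly from age 12 to 19
sim <- expand.grid(year = 12:19, childid = 1:6,
                   schoolid = factor(paste0("S", 1:30)))
child  <- as.integer(interaction(sim$childid, sim$schoolid)) # unique child index
school <- as.integer(sim$schoolid)
sim$siblings <- factor(c("No", "Yes"))[rbinom(180, 1, 0.5)[child] + 1]
sim$age12 <- sim$year - 12   # hypothetical helper: years since age 12
sim$ABS <- rnorm(30, sd = 0.3)[school] + rnorm(180, sd = 0.5)[child] +
  (-0.1 + 0.05 * (sim$siblings == "Yes")) * sim$age12 +
  rnorm(nrow(sim), sd = 0.5)

# schoolid/childid expands to schoolid + childid:schoolid, so child 1 in
# school S1 is treated as a different child from child 1 in school S2
m <- lmer(ABS ~ age12 * siblings + (1 | schoolid/childid), data = sim)

ngrps(m)  # number of groups at each level
```

Here ngrps() is a quick check that the model sees 180 children rather than 6. Random slopes of age could be added to either level in the same way as in the earlier exercises.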

Solution