2. Intro to Multilevel Models

The methods we’re going to learn about in the first five weeks of this course are known by lots of different names: “multilevel models”; “hierarchical linear models”; “mixed-effect models”; “mixed models”; “nested data models”; “random coefficient models”; “random-effects models”; “random parameter models”… and so on.

What the idea boils down to is that model parameters vary at more than one level. This week, we’re going to explore what that means.

Throughout this course, we will tend to use the terms “mixed effect model”, “linear mixed model (LMM)” and “multilevel model (MLM)” interchangeably.

Multilevel Models (MLMs), or “Linear Mixed Models” (LMMs), take the approach of allowing the groups/clusters to vary around our \(\beta\) estimates.

In the lectures, we saw this as:

\[ \begin{align} & \text{for observation }j\text{ in group }i \\ \quad \\ & \text{Level 1:} \\ & \color{red}{y_{ij}}\color{black} = \color{blue}{\beta_{0i} \cdot 1 + \beta_{1i} \cdot x_{ij}}\color{black} + \varepsilon_{ij} \\ & \text{Level 2:} \\ & \color{blue}{\beta_{0i}}\color{black} = \gamma_{00} + \color{orange}{\zeta_{0i}} \\ & \color{blue}{\beta_{1i}}\color{black} = \gamma_{10} + \color{orange}{\zeta_{1i}} \\ \quad \\ & \text{Where:} \\ & \gamma_{00}\text{ is the population intercept, and }\color{orange}{\zeta_{0i}}\color{black}\text{ is the deviation of group }i\text{ from }\gamma_{00} \\ & \gamma_{10}\text{ is the population slope, and }\color{orange}{\zeta_{1i}}\color{black}\text{ is the deviation of group }i\text{ from }\gamma_{10} \\ \end{align} \]

We are now assuming \(\color{orange}{\zeta_0}\), \(\color{orange}{\zeta_1}\), and \(\varepsilon\) to be normally distributed with a mean of 0, and we denote their variances as \(\sigma_{\color{orange}{\zeta_0}}^2\), \(\sigma_{\color{orange}{\zeta_1}}^2\), \(\sigma_\varepsilon^2\) respectively.

The \(\color{orange}{\zeta}\) components are also termed the “random effects” part of the model, hence names like “random effects model”, etc.

Many people use the symbol \(u\) in place of \(\zeta\), and in various resources, you are likely to see \(\alpha\) used to denote the intercept instead of \(\beta_0\).

Sometimes, you will see the levels collapsed into one equation, as it might make for more intuitive reading.

This often fits with the name “mixed effects” for these models, as the “effect” of a predictor is a mix of both a fixed and a random part:

\[\color{red}{y_{ij}} = \underbrace{(\gamma_{00} + \color{orange}{\zeta_{0i}})}_{\color{blue}{\beta_{0i}}} \cdot 1 + \underbrace{(\gamma_{10} + \color{orange}{\zeta_{1i}})}_{\color{blue}{\beta_{1i}}} \cdot x_{ij} + \varepsilon_{ij} \]

In words, this equation is denoting: \[ \begin{align} \color{red}{\text{outcome}}\color{black} = &(\color{blue}{\text{overall intercept}}\color{black} + \color{orange}{\text{random adjustment to intercept by group}}\color{black}) \cdot 1 \, + \\ &( \color{blue}{\text{overall slope}}\color{black} + \color{orange}{\text{random adjustment to slope by group}} \color{black}) \cdot \text{predictor}\, + \\ & \text{residual} \\ \end{align} \]
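To make this mixing of fixed and random parts concrete, here is a minimal sketch simulating data from exactly this equation (all the numeric values are arbitrary choices for illustration):

set.seed(987)
n_groups <- 20
n_per    <- 10
g <- rep(1:n_groups, each = n_per)       # group membership of each observation
x <- rnorm(n_groups * n_per)             # a level-1 predictor
gamma00 <- 40                            # population intercept (arbitrary)
gamma10 <- 0.5                           # population slope (arbitrary)
zeta0 <- rnorm(n_groups, 0, 3)           # group deviations from gamma00
zeta1 <- rnorm(n_groups, 0, 0.8)         # group deviations from gamma10
y <- (gamma00 + zeta0[g]) * 1 + (gamma10 + zeta1[g]) * x +
  rnorm(n_groups * n_per, 0, 2)          # plus residual error
df <- data.frame(y, x, g = factor(g))

Each group gets its own intercept (\(\gamma_{00} + \zeta_{0i}\)) and slope (\(\gamma_{10} + \zeta_{1i}\)), but all the deviations are drawn from the same normal distributions: this is precisely the Level 2 part of the model.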

To fit multilevel models, we’re going to use the lme4 package, and specifically the functions lmer() and glmer().1

You will have seen some use of these functions in the lectures. The broad syntax is:

lmer(formula,
        data = dataframe,
        REML = logical,
        control = lmerControl(options)
        )


We write the first bit of our formula just the same as our old friend the normal linear model: y ~ 1 + x + x2 + ..., where y is the name of our outcome variable, 1 is the intercept (which we don’t have to state explicitly, as it will be included anyway), and x, x2, etc. are the names of our explanatory variables.

With lme4, we now have the addition of random effect terms, specified in parentheses with the | operator (the vertical line | is often found to the left of the z key on QWERTY keyboards). We use the | operator to separate the parameters (intercept, slope, etc.) on the left-hand side from the grouping variable(s) on the right-hand side, by which we would like to model these parameters as varying.

Random Intercepts
Let us suppose that we wish to model our intercept not as a fixed constant, but as varying randomly according to some grouping around a fixed centre. We can fit such a model by allowing the intercept to vary by our grouping variable (g below):

lmer(y ~ 1 + x + (1|g), data = df)

\[ \begin{align} & \text{Level 1:} \\ & \color{red}{Y_{ij}} = \color{blue}{\beta_{0i} \cdot 1 + \beta_{1} \cdot X_{ij}} + \varepsilon_{ij} \\ & \text{Level 2:} \\ & \color{blue}{\beta_{0i}} = \gamma_{00} + \color{orange}{\zeta_{0i}} \\ \end{align} \]

Random Intercepts and Slopes
By extension, we can also allow the effect of x on y to vary between groups, by including x on the left-hand side of | in the random effects part of the call to lmer().

lmer(y ~ 1 + x + (1 + x | g), data = df)

\[ \begin{align} & \text{Level 1:} \\ & \color{red}{y_{ij}} = \color{blue}{\beta_{0i} \cdot 1 + \beta_{1i} \cdot x_{ij}} + \varepsilon_{ij} \\ & \text{Level 2:} \\ & \color{blue}{\beta_{0i}} = \gamma_{00} + \color{orange}{\zeta_{0i}} \\ & \color{blue}{\beta_{1i}} = \gamma_{10} + \color{orange}{\zeta_{1i}} \\ \end{align} \]

We can choose whether to estimate our model parameters with ML (maximum likelihood) or REML (restricted maximum likelihood) with the REML argument of lmer():


lmer(formula,
        data = dataframe,
        REML = logical,
        control = lmerControl(options)
        )


lmer() models are by default fitted with REML, which tends to be better for small samples.

Maximum Likelihood (ML)

Think back to DAPR2, when we introduced logistic regression and briefly discussed maximum likelihood estimation as part of explaining how models are fitted.

The key idea of maximum likelihood estimation (MLE) is that we (well, the computer) iteratively find the set of estimates for our model which best reproduces our observed data. Recall our simple linear regression model of how time spent outdoors (hrs per week) is associated with mental wellbeing: \[ \color{red}{Wellbeing_i} = \color{blue}{\beta_0 \cdot{} 1 + \beta_1 \cdot{} OutdoorTime_{i}} + \varepsilon_i \] There are values of \(\beta_0\), \(\beta_1\) and \(\sigma_\varepsilon\) which maximise the probability of observing the data that we have. For linear regression, we obtained these same values a different way, via minimising the sums of squares. That approach is not possible for more complex models (e.g., logistic regression), which is why we turn to MLE.
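As a toy illustration of “finding the top of the curve” (this is not what lmer() does internally, just the general idea), we can scan a grid of candidate slopes and see which one makes the observed wellbeing data most likely. For simplicity, the intercept and residual SD are set at their best values given each candidate slope:

scotmw <- read.csv("https://uoepsy.github.io/data/LAAwellbeing.csv")
loglik_b1 <- function(b1) {
  b0  <- mean(scotmw$wellbeing) - b1 * mean(scotmw$outdoor_time)
  res <- scotmw$wellbeing - (b0 + b1 * scotmw$outdoor_time)
  sum(dnorm(res, mean = 0, sd = sd(res), log = TRUE))
}
b1_grid <- seq(-1, 1, by = 0.01)
ll <- sapply(b1_grid, loglik_b1)
plot(b1_grid, ll, type = "l")   # the curve we are climbing
b1_grid[which.max(ll)]          # lands close to coef(lm(wellbeing ~ outdoor_time, scotmw))["outdoor_time"]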

To read about the subtle difference between “likelihood” and “probability”, you can find a short explanation here

If we are estimating just one single parameter (e.g. a mean), then we can imagine the process of maximum likelihood estimation in a one-dimensional world - simply finding the top of the curve:

Figure 1: MLE

However, our typical models estimate a whole bunch of parameters. The simple regression model above already has to estimate \(\beta_0\), \(\beta_1\) and \(\sigma_\varepsilon\), and our multilevel models have far more! With lots of parameters being estimated, all interacting to influence the likelihood, our nice curved line becomes a complex surface (see the left panel of Figure 2). So what we (our computers) need to do is find the maximum, while avoiding local maxima and singularities (see Figure 3).

Figure 2: MLE for a more complex model

Restricted Maximum Likelihood (REML)

When it comes to estimating multilevel models, maximum likelihood will consider the fixed effects as fixed, known values when it estimates the variance components (the random effect variances). This leads to biased estimates of the variance components, specifically biasing them toward being too small, especially if \(n_\textrm{clusters} - n_\textrm{level 2 predictors} - 1 < 50\). This leads to the standard errors of the fixed effects being too small, thereby inflating our type 1 error rate (i.e. greater chance of incorrectly rejecting our null hypothesis).

Restricted Maximum Likelihood (REML) is a method that separates the estimation of fixed and random parts of the model, leading to unbiased estimates of the variance components.
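To see the ML/REML difference in practice, here is a small sketch reusing the generic y, x, g, and df names from the syntax examples above (if you ran the simulation sketch earlier, this runs as-is):

library(lme4)
m_reml <- lmer(y ~ 1 + x + (1 + x | g), data = df, REML = TRUE)
m_ml   <- lmer(y ~ 1 + x + (1 + x | g), data = df, REML = FALSE)
VarCorr(m_reml)   # REML estimates of the variance components
VarCorr(m_ml)     # ML estimates, typically a little smaller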

Alongside the ML/REML choice for model estimation, we have some control over the underlying algorithm that is used to move around/search the likelihood surface for our estimates. We’ll learn more about this next week.


lmer(formula,
        data = dataframe,
        REML = logical,
        control = lmerControl(options)
        )


For large datasets and/or complex models (lots of random-effects terms), it is quite common to get a convergence warning. There are lots of different ways to deal with these (to try to rule out hypotheses about what is causing them).

For the time being, if lmer() gives you convergence errors, you could try changing the optimiser. bobyqa is a good one: add control = lmerControl(optimizer = "bobyqa") when you run your model.

lmer(y ~ 1 + x1 + ... + (1 + .... | g), data = df, 
     control = lmerControl(optimizer = "bobyqa"))

What is a convergence warning??

There are different techniques for maximum likelihood estimation, which we apply by using different ‘optimisers’. Technical problems to do with model convergence and ‘singular fit’ come into play when the optimiser we are using either can’t find a suitable maximum, or gets stuck in a singularity (think of it like a black hole of likelihood, which signifies that there is not enough variation in our data to construct such a complex model).

Figure 3: local/global maxima and singularities
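If you hit a “singular fit”, lme4 provides a couple of helpers for diagnosing it (with model standing in for your fitted lmer() model):

isSingular(model)   # TRUE if some variance component is estimated at (or very near) zero
VarCorr(model)      # look for variances near 0, or correlations of exactly -1 or 1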


Exercises: Cross-Sectional

Data: Wellbeing Across Scotland

Recall our dataset from last week, in which we used linear regression to determine how outdoor time (hours per week) is associated with wellbeing in different local authority areas (LAAs) of Scotland. We have data from various LAAs, from Glasgow City to the Highlands.

scotmw <- read_csv("https://uoepsy.github.io/data/LAAwellbeing.csv")
variable      description
ppt           Participant ID
name          Participant Name
laa           Local Authority Area
outdoor_time  Self-reported estimate of the number of hours per week spent outdoors
wellbeing     Warwick-Edinburgh Mental Wellbeing Scale (WEMWBS), a self-report measure of mental health and wellbeing. The scale is scored by summing responses to each item, answered on a 1 to 5 Likert scale; the minimum scale score is 14 and the maximum is 70.
density       LAA population density (people per square km)
Question 1

Using lmer() from the lme4 package, fit a model predicting wellbeing from outdoor_time, with by-LAA random intercepts.
Pass the model to summary() to see the output.

Question 2

Sometimes the easiest way to start understanding your model is to visualise it.

Load the package broom.mixed. Along with tidy() and glance(), which give us the information we see in summary(), it has a handy function called augment(), which returns the data in the model plus the fitted values, residuals, hat values, Cook’s D, etc.

The broom.mixed package has some useful functions (just like those in the broom package for lm()):

library(broom.mixed)
glance(model) # for overall model stats
# A tibble: 1 × 7
   nobs sigma logLik   AIC   BIC REMLcrit df.residual
  <int> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int>
1   132  5.05  -433.  875.  886.     867.         128
tidy(model) # for parameter stats
# A tibble: 4 × 6
  effect   group    term            estimate std.error statistic
  <chr>    <chr>    <chr>              <dbl>     <dbl>     <dbl>
1 fixed    <NA>     (Intercept)       38.2      2.65       14.4 
2 fixed    <NA>     outdoor_time       0.213    0.0724      2.95
3 ran_pars laa      sd__(Intercept)   10.3     NA          NA   
4 ran_pars Residual sd__Observation    5.05    NA          NA   
augment(model) # observation level stuff (data + model)
# A tibble: 132 × 14
   wellbeing outdoor_time laa          .fitted .resid  .hat .cooksd .fixed   .mu
       <dbl>        <dbl> <fct>          <dbl>  <dbl> <dbl>   <dbl>  <dbl> <dbl>
 1        37           20 West Lothian    32.4  4.64  0.139 7.91e-2   42.5  32.4
 2        34           23 Falkirk         31.7  2.28  0.192 3.00e-2   43.1  31.7
 3        39           29 Falkirk         33.0  6.00  0.195 2.13e-1   44.4  33.0
 4        42           21 Scottish Bo…    40.1  1.95  0.163 1.74e-2   42.7  40.1
 5        37           10 Dumfries an…    37.4 -0.407 0.167 7.84e-4   40.3  37.4
 6        42           19 Argyll and …    43.9 -1.91  0.122 1.12e-2   42.2  43.9
 7        38           13 Perth and K…    46.1 -8.06  0.139 2.39e-1   41.0  46.1
 8        44           21 East Renfre…    44.4 -0.414 0.168 8.17e-4   42.7  44.4
 9        47           16 Inverclyde      42.7  4.30  0.193 1.07e-1   41.6  42.7
10        35           12 Midlothian      33.1  1.94  0.161 1.69e-2   40.8  33.1
# ℹ 122 more rows
# ℹ 5 more variables: .offset <dbl>, .sqrtXwt <dbl>, .sqrtrwt <dbl>,
#   .weights <dbl>, .wtres <dbl>

Add to the code below to plot the model fitted values, and color them according to LAA. (you will need to edit ri_model to be whatever name you assigned to your model).

augment(ri_model) %>%
  ggplot(aes(x = outdoor_time, y = ...... 

Question 3

We have just fitted the model: \[ \begin{align} & \text{For person } j \text{ in LAA } i \\ & \color{red}{\textrm{Wellbeing}_{ij}}\color{black} = \color{blue}{\beta_{0i} \cdot 1 + \beta_{1} \cdot \textrm{Outdoor Time}_{ij}}\color{black} + \varepsilon_{ij} \\ & \color{blue}{\beta_{0i}}\color{black} = \gamma_{00} + \color{orange}{\zeta_{0i}} \\ \end{align} \]

For our estimates of \(\gamma_{00}\) (the fixed value around which LAA intercepts vary) and \(\beta_1\) (the fixed estimate of the relationship between wellbeing and outdoor time), we can use fixef().

fixef(ri_model)
 (Intercept) outdoor_time 
   38.189795     0.213492 

Can you add a thick black line, with the intercept and slope given by fixef(), to the plot from the previous question?

Hint: geom_abline()
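If geom_abline() is unfamiliar, here is a generic illustration using built-in data (the values 37 and -5 are arbitrary; for the exercise, you would pull the intercept and slope out of fixef(ri_model)):

library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_abline(intercept = 37, slope = -5, linewidth = 1, colour = "black")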

Question 4
By now, you should have a plot which looks more or less like the left-hand figure below (we have added the raw data, the points).

Figure 4: Model fitted values

Figure 5: Summary model output
lmer(wellbeing ~ 1 + outdoor_time + (1 | laa),
     data = scotmw)



Match the coloured sections Yellow (W), Red (X), Blue (Y), and Orange (Z) in Figure 5 to the descriptions below of parts A through D of Figure 4.

  1. where the black line cuts the y axis
  2. the standard deviation of the distances from all the individual LAA lines to the black line
  3. the slope of the black line
  4. the standard deviation of the distances from all the individual observations to the line for the LAA to which it belongs.

Question 5

Match the coloured sections and descriptions from the previous question to the mathematical terms in the model equation:

\[ \begin{align} & \text{Level 1:} \\ & \color{red}{Wellbeing_{ij}}\color{black} = \color{blue}{\beta_{0i} \cdot 1 + \beta_{1} \cdot OutdoorTime_{ij}}\color{black} + \varepsilon_{ij} \\ & \text{Level 2:} \\ & \color{blue}{\beta_{0i}}\color{black} = \gamma_{00} + \color{orange}{\zeta_{0i}} \\ \quad \\ & \text{where} \\ & \color{orange}{\zeta_0}\color{black} \sim N(0, \sigma_{\color{orange}{\zeta_{0}}}\color{black}) \text{ independently} \\ & \varepsilon \sim N(0, \sigma_{\varepsilon}) \text{ independently} \\ \end{align} \]

Question 6

Fit a model which also allows (along with the intercept) the effect of outdoor_time to vary by LAA.

Then, using augment() again, plot the model fitted values. What do you think you will see?
Does it look like this model better represents the individual LAAs? Take a look at, for instance, Glasgow City.


Exercises: Repeated Measures

While the wellbeing example considers the groupings or ‘clusters’ of different LAAs, a more common grouping in psychological research is that of several observations belonging to the same individual. One obvious benefit of this is that we can collect many more observations from fewer participants, while accounting for the resulting dependency of observations.

Data: REPLICATION Audio/SDMT study

Recall the data from the previous week, from an experiment in which executive functioning was measured (via the SDMT) for people when listening to different types of audio, either via normal speakers or via noise-cancelling headphones.

This week, we have data from a replication of that study, in which the researchers managed to recruit 30 participants. Unfortunately, some participants did not complete all the trials, so we have an unbalanced design. The data is available at https://uoepsy.github.io/data/ef_replication.csv.

variable     description
PID          Participant ID
trial_n      Trial number (1-15)
audio        Audio heard during the test ('no_audio', 'white_noise', 'music')
headphones   Whether the participant listened via speakers in the room or via noise-cancelling headphones
SDMT         Symbol Digit Modalities Test (SDMT) score
Question 7

How many participants are there in the data?
How many have complete data (15 trials)?
What is the average number of trials that participants completed? What is the minimum?
Does every participant have some data for each type of audio?

The count() function will likely be useful here.
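For example, a sketch of how count() can be pointed at the grouping structure (this won't answer every part of the question on its own):

library(tidyverse)
efrep <- read_csv("https://uoepsy.github.io/data/ef_replication.csv")
efrep %>% count(PID)          # how many trials does each participant have?
efrep %>% count(PID, audio)   # and how many per audio type?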

Question 8

The model below is sometimes referred to as the “null model” (or “intercept only model”). The grouping structure of the data is specified in the model, but nothing more.

nullmod <- lmer(SDMT ~ 1 + (1 | PID), data = efrep)

Fit the model and examine the summary.
How much of the variation in SDMT scores is down to participant grouping?
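One possible route is to extract the variance components from the fitted nullmod:

vc <- as.data.frame(VarCorr(nullmod))
vc
# proportion of total variance attributable to participants (the ICC):
vc$vcov[vc$grp == "PID"] / sum(vc$vcov)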

Question 9

Set the reference levels of the audio and headphones variables to “no audio” and “speakers” respectively.
Fit a multilevel model to address the research question below.

How do different types of audio interfere with executive functioning, and does this interference differ depending upon whether or not noise-cancelling headphones are used?

  • what is our outcome variable of interest?
  • what are our predictor variables (and interactions?) that we are interested in?
    • these should be in the fixed effects part.
  • what is the clustering?
    • this should be the random effects (1 | cluster) part
  • does audio type (audio) vary within clusters, or between?
    • if so, we might be able to fit a random slope of audio | cluster. if not, then it doesn’t make sense to do so.
  • does delivery mode (headphones) vary within clusters, or between?
    • if so, we might be able to fit a random slope of headphones | cluster. if not, then it doesn’t make sense to do so.

If you get an error about model convergence, consider changing the optimiser (see the “model estimation” box)

Question 10

We now have a model, but we don’t have any p-values, confidence intervals, or inferential criteria on which to draw conclusions.

Pick a method of your choosing and perform a test of, or provide an interval for, the relevant effect of interest.
Provide a brief write-up of the results along with a visualisation.

As with normal regression, we have two main ways in which we can conduct inference. We can focus on our coefficients, or we can compare models.2

There are a whole load of different methods available for drawing inferences from multilevel models, which means it can be a bit of a never-ending rabbit hole. For the purposes of this course, we’ll limit ourselves to these two:

df approximations (fit models with REML = TRUE; a good option for small samples):
  • tests or CIs for model parameters:
    library(parameters)
    model_parameters(model, ci_method = "kr")
  • model comparison (different fixed effects, same random effects):
    library(pbkrtest)
    KRmodcomp(model1, model0)

likelihood-based (fit models with REML = FALSE; needs large N at both levels (40+)):
  • tests or CIs for model parameters:
    confint(model, method = "profile")
  • model comparison (different fixed effects, same random effects):
    anova(model0, model1)
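For example, with mod standing in for your Question 9 model and mod0 for a restricted version of it without the terms of interest (both names hypothetical):

library(parameters)
model_parameters(mod, ci_method = "kr")   # KR-approximated tests and CIs (fit mod with REML = TRUE)

library(pbkrtest)
KRmodcomp(mod, mod0)                      # KR model comparison (full model first)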


Exercises: Longitudinal

Another crucial advantage of these methods is that we can use them to study how people change over time.

Data: Wellbeing in Work

The “Wellbeing in Work” dataset contains information on employees who were randomly assigned to one of three employment conditions:

  • control: No change to employment. Employees continue at 5 days a week, with standard allocated annual leave quota.
  • unlimited_leave : Employees were given no limit to their annual leave, but were still expected to meet required targets as specified in their job description.
  • fourday_week: Employees worked a 4 day week for no decrease in pay, and were still expected to meet required targets as specified in their job description.

Wellbeing was assessed at baseline (the start of the study), 12 months post, 24 months post, and 36 months post.

The researchers had two main questions:

  • Q1: Overall, did the participants’ wellbeing stay the same or did it change?
  • Q2: Did the employment condition groups differ in how wellbeing changed over the assessment period?

The data is available, in .rda format, at https://uoepsy.github.io/data/wellbeingwork3.rda. You can read it directly into your R environment using:

load(url("https://uoepsy.github.io/data/wellbeingwork3.rda"))

After running the code above you will find the data in an object called wellbeingwork3 in your environment.

Question 11

Q1: Overall, did the participants’ wellbeing stay the same or did it change?

Each of our participants has measurements at 4 assessments. We need to think about what this means for the random effects that we will include in our model (our random effect structure). Would we like our models to allow individuals to vary in their overall wellbeing, to vary in how their wellbeing changes over the course of the assessment period, or both?

To investigate whether wellbeing changed over the course of the assessments, or whether it stayed the same, we can fit and compare 2 models:

  1. A null model.
  2. A model with wellbeing predicted by time point.

And we can then compare them in terms of model fit (as mentioned above, there are lots of different ways we might do this).

Our sample size here (180 participants, each with 4 observations) is reasonably large given the relative simplicity of our model. We might consider running a straightforward Likelihood Ratio Test using anova(restricted_model, full_model) to compare our two models (in which case we should fit them with REML=FALSE); a sketch follows the notes below.

  • Remember, we shouldn’t use likelihood ratio tests to compare models with different random effect structures.
  • (For now, don’t worry too much about “singular fits”. We’ll talk more about how we might deal with them next week!)
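Putting that comparison into code, a minimal sketch (the column names Wellbeing and ID are assumptions here; check names(wellbeingwork3) after loading, as only TimePoint and Condition are confirmed by later questions):

library(lme4)
load(url("https://uoepsy.github.io/data/wellbeingwork3.rda"))
# Wellbeing and ID are assumed column names - check names(wellbeingwork3)
m0 <- lmer(Wellbeing ~ 1 + (1 + TimePoint | ID),
           data = wellbeingwork3, REML = FALSE)
m1 <- lmer(Wellbeing ~ 1 + TimePoint + (1 + TimePoint | ID),
           data = wellbeingwork3, REML = FALSE)
anova(m0, m1)   # likelihood ratio test of the effect of time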

Question 12

Q: Did the employment condition groups differ in how wellbeing changed over the assessment period?

It helps to break it down. There are two questions here:

  1. do groups differ overall?
  2. do groups differ over time?

We can begin to see that we’re asking two questions about the Condition variable here: “is there an effect of Condition?” and “Is there an interaction between TimePoint and Condition?”.

Try fitting two more models which incrementally build these levels of complexity, and compare them (perhaps to one another, perhaps to models from the previous question; think about what each comparison is testing!).
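If you fitted something like m1 from the sketch in Question 11, one way to build up incrementally (a sketch, with the same assumed names):

m2 <- update(m1, . ~ . + Condition)             # adds overall group differences
m3 <- update(m2, . ~ . + TimePoint:Condition)   # adds group differences in change over time
anova(m1, m2, m3)   # each row tests the terms added at that step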

Question 13

Visualise the model estimated change in wellbeing over time for each Condition.

There are lots of ways you can visualise the model, try a couple:

  1. Using the effects package, this might help: as.data.frame(effect("TimePoint*Condition", model)) (see the sketch below)
  2. We can also use sjPlot, as we have seen in DAPR2
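For option 1, a sketch of how the effects output can be passed to ggplot (here m3 is the hypothetical interaction model from the earlier sketches; fit, lower, and upper are the columns that as.data.frame(effect(...)) returns):

library(effects)
library(ggplot2)
plotdat <- as.data.frame(effect("TimePoint*Condition", m3))
ggplot(plotdat, aes(x = TimePoint, y = fit,
                    colour = Condition, group = Condition)) +
  geom_ribbon(aes(ymin = lower, ymax = upper, fill = Condition),
              alpha = .2, colour = NA) +
  geom_line()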

Question 14

Examine the parameter estimates and interpret them (i.e., what does each parameter represent?).
Can you match them with parts of the plot obtained from plot_model(m.full, type="int")?

We can get the fixed effects using fixef(model); we can also use tidy(model) from the broom.mixed package; and, similar to lm models in DAPR2, we can pull out the coefficients part of the summary() using summary(model)$coefficients.

Footnotes

  1. “(g)lmer” here stands for “(generalised) linear mixed effects regression”.

  2. What is the difference between testing coefficients and comparing models? This is most easily seen when the model includes a categorical variable as a predictor. The summary() function returns, for each level of the predictor (excluding the reference level), a coefficient, its standard error, the t-statistic, and the p-value for a test of whether the coefficient differs from 0. A model comparison between that model and a null model without the categorical predictor instead gives a single joint test across all levels of the predictor.