W3 Exercises: Nested and Crossed Structures

Psychoeducation Treatment Effects

Data: lmm_gadeduc.csv

These are synthetic data from a randomised controlled trial, in which 30 therapists randomly assigned patients (each therapist saw between 2 and 28 patients) to a control or treatment group, and monitored their scores over time on a measure of generalised anxiety disorder (GAD7 - a 7-item questionnaire with 5-point Likert scales).

The control group of patients received the standard sessions offered by the therapists. For the treatment group, 10 minutes of each session was replaced with a specific psychoeducational component, and patients were given relevant tasks to complete between sessions. All patients had monthly therapy sessions. Generalised Anxiety Disorder was assessed at baseline and then at every visit over 4 months of sessions (5 assessments in total).

The data are available at https://uoepsy.github.io/data/lmm_gadeduc.csv

You can find a data dictionary below:

Table 1: Data Dictionary: lmm_gadeduc.csv

variable	description
patient	A patient code in which the labels take the form <Therapist initials>_<group>_<patient number>.
visit_0	Score on the GAD7 at baseline
visit_1	GAD7 at 1 month assessment
visit_2	GAD7 at 2 month assessment
visit_3	GAD7 at 3 month assessment
visit_4	GAD7 at 4 month assessment

Question 1

Uh-oh… these data aren’t in the same shape as the other datasets we’ve been giving you…

Can you get it into a format that is ready for modelling?

Hints

It’s wide, and we want it long.
Once it’s long, “visit_0”, “visit_1”, … need to become the numbers 0, 1, …
One variable (patient) contains lots of information that we want to separate out. There’s a handy function in the tidyverse called separate(), check out the help docs!
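
The hints above can be sketched on a small made-up dataset. This is only one possible approach: the toy values are invented, and the new column names (therapist, group, GAD) are assumptions chosen to match the model code later in these exercises.

```r
library(tidyr)
library(dplyr)

# Toy wide data in the shape described in the data dictionary (values are made up)
toy_wide <- tibble(
  patient = c("AB_Control_1", "AB_Treatment_2", "CD_Control_1"),
  visit_0 = c(25, 27, 22),
  visit_1 = c(24, 25, 23),
  visit_2 = c(24, 22, 21)
)

toy_long <- toy_wide |>
  # wide to long: one row per patient per visit
  pivot_longer(starts_with("visit_"), names_to = "visit", values_to = "GAD") |>
  # "visit_2" becomes the number 2
  mutate(visit = as.numeric(gsub("visit_", "", visit))) |>
  # split the patient code at each underscore into three columns
  separate(patient, into = c("therapist", "group", "patient"), sep = "_")

toy_long
```

The same pipeline should carry over to the real data once it has been read in, provided the visit columns and patient codes follow the pattern in the data dictionary.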

Question 2

Visualise the data. Does it look like the treatment had an effect over time? Does it look like the treatment worked when used by every therapist?

Hints

Remember, stat_summary() is very useful for aggregating data inside a plot.
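
As a rough illustration of that hint, here is a sketch using invented data. The column names (visit, GAD, group) are assumed to match the long-format data from Question 1, and the simulated values are not meant to resemble the real trial.

```r
library(ggplot2)

# Toy long data (made-up values): 6 patients per group, visits 0 to 4
set.seed(1)
toy <- expand.grid(visit = 0:4, patient = 1:6, group = c("Control", "Treatment"))
toy$GAD <- 25 - toy$visit * ifelse(toy$group == "Treatment", 1.5, 0.5) +
  rnorm(nrow(toy))

p <- ggplot(toy, aes(x = visit, y = GAD, col = group)) +
  # mean and +/- 1 standard error at each visit, per group
  stat_summary(geom = "pointrange", fun.data = mean_se) +
  # a line joining the group means
  stat_summary(geom = "line", fun = mean)
p
```

To look at each therapist separately, adding something like facet_wrap(~therapist) should split the plot into one panel per therapist, assuming the data has a therapist column.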

Question 3

Fit a model to test if the psychoeducational treatment is associated with greater improvement in anxiety over time.

Step 1: Choose the appropriate fixed effects.

Step 2: Think about the grouping structure in the data.

Step 3: Choose the appropriate random effects.

Note that the patient variable does not uniquely specify the individual patients. That is, patient “1” from therapist “AO” is a different person from patient “1” from therapist “BJ”.
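
A tiny made-up example shows why this matters. The column name patient_uid is hypothetical, used here only for illustration.

```r
# Toy data: patient numbers restart within each therapist (made-up values)
toy <- data.frame(
  therapist = c("AO", "AO", "BJ", "BJ"),
  patient   = c("1", "2", "1", "2")
)

# Ignoring the therapist, it looks like there are only 2 patients
length(unique(toy$patient))      # 2

# Combining therapist and patient identifies 4 distinct people
toy$patient_uid <- paste(toy$therapist, toy$patient, sep = "_")
length(unique(toy$patient_uid))  # 4
```

In lme4 formula syntax, this kind of nesting can be expressed either by grouping on a unique identifier like the one above, or with the therapist/patient shorthand, which expands to therapist plus therapist:patient.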

Question 4

For each of the models below, what is wrong with the random effect structure?

modelA <- lmer(GAD ~ visit*group + 
               (1+visit*group|therapist)+
               (1+visit|patient),
             geduc_long)

modelB <- lmer(GAD ~ visit*group + 
               (1+visit*group|therapist/patient),
             geduc_long)

Question 5

Let’s suppose that I don’t want the psychoeducation treatment, I just want the standard therapy sessions that the ‘Control’ group received. Which therapist should I go to?

Hints

You don’t need to fit a new model here, you can use the one you fitted above.

ranef() and dotplot.ranef.mer() will help! You can read about ranef in Chapter 2 #making-model-predictions.
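
A minimal sketch of how these two functions behave, using the sleepstudy data that ships with lme4 as a stand-in (it is not the therapy data, and the model here is not the answer to Question 3):

```r
library(lme4)

# sleepstudy: reaction times for 18 subjects over 10 days of sleep deprivation
m <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

# group-level deviations from the fixed intercept and the fixed slope
re <- ranef(m)$Subject

# subjects ordered by their slope adjustment, smallest first
head(re[order(re$Days), ])

# plot of the adjustments for every subject, with intervals
dotplot.ranef.mer(ranef(m))
```

For the question above, the relevant adjustments are those that describe change over visits for the reference (Control) group, so it is worth checking which level of group is the reference level in your model.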

Question 6

Recreate this plot.

The faint lines represent the model estimated lines for each patient. The points and ranges represent our fixed effect estimates and their uncertainty.

Make sure you’re plotting model estimates, not the raw data.

Hints

You can get the patient-specific lines using augment() from the broom.mixed package, and the fixed effects estimates using effect() from the effects package.
Remember that the “patient” column doesn’t group observations into unique patients.
Remember you can pull multiple datasets into ggplot:

ggplot(data = dataset1, aes(x=x,y=y)) + 
  geom_point() + # points from dataset1
  geom_line(data = dataset2) # lines from dataset2

see more in Chapter 2 #visualising-models
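
Putting those hints together, the general pattern might look like the sketch below. It again uses lme4's sleepstudy data as a stand-in rather than the therapy data, so the variable names (Reaction, Days, Subject) are not the ones you will need.

```r
library(lme4)
library(broom.mixed)
library(effects)
library(ggplot2)

m <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

# one row per observation, with the cluster-specific fitted value in .fitted
fits <- augment(m)

# fixed effect estimates and confidence limits at chosen values of Days
fx <- as.data.frame(effect("Days", m, xlevels = list(Days = 0:9)))

p <- ggplot(fx, aes(x = Days, y = fit)) +
  # faint lines: model-estimated line for each subject
  geom_line(data = fits, aes(y = .fitted, group = Subject), alpha = 0.3) +
  # points and ranges: fixed effect estimates and their uncertainty
  geom_pointrange(aes(ymin = lower, ymax = upper))
p
```

With the therapy data, the group aesthetic for the faint lines would need to identify unique patients, for example via interaction(therapist, patient), because the patient column alone does not.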

Jokes

Data: lmm_laughs.csv

These data are simulated to imitate an experiment that investigates the effect of visual non-verbal communication (i.e. gestures, facial expressions) on joke appreciation. 90 participants took part in the experiment, in which they each rated how funny they found a set of 30 jokes. For each participant, the order of these 30 jokes was randomised for each run of the experiment. For each participant, the set of jokes was randomly split into two halves, with the first half being presented in audio-only, and the second half being presented in audio and video. This meant that each participant saw 15 jokes with video and 15 without, and each joke would be presented with video roughly half of the time.

The researchers want to investigate whether the delivery (audio/audiovideo) of jokes is associated with differences in humour-ratings.

Data are available at https://uoepsy.github.io/data/lmm_laughs.csv

Table 2: Data Dictionary: lmm_laughs.csv

variable	description
ppt	Participant Identification Number
joke_label	The text of the joke
joke_id	Joke Identification Number
delivery	Experimental manipulation: whether joke was presented in audio-only ('audio') or in audiovideo ('video')
rating	Humour rating chosen on a slider from 0 to 100

Question 7

Don’t look at the actual data yet!

Even before getting hold of any data, we should be able to write out the structure of our ideal “maximal” model based only on the information above.

Can you do so?

Hints

Don’t know where to start? Try following the steps in Chapter 8 #maximal-model.

Question 8

Read in and clean the data (if necessary).

Create some plots showing:

the average rating for audio vs audio+video for each joke
the average rating for audio vs audio+video for each participant

Hints

You could use facet_wrap, or even stat_summary!
You might want to use joke_id, rather than joke_label (the labels are very long!)

Question 9

Fit an appropriate model to address the research aims of the study.

Hints

This should be the one you came up with a couple of questions ago!

Question 10

Which joke is funniest when presented just in audio? For which joke does the video make the most difference to ratings?

Hints

These can all be answered by examining the random effects with ranef().
See Chapter 2 #making-model-predictions.

If you’re using joke_id, can you find out the actual joke that these correspond to?

Solution 16. Which joke is funniest in audio?

Audio is the reference level of delivery, so we only need to look at the intercept adjustments.

ranef(mod)$joke_id |>
  arrange(desc(`(Intercept)`))

   (Intercept) deliveryvideo
19       4.227        1.5179
9        3.384        0.3888
27       3.317        2.4125
17       2.776        1.3163
20       2.291        2.8854
10       1.761       -1.2474
29       1.232       -1.5029
2        1.066       -1.0460
25       0.963        3.2455
28       0.807        3.3737
23       0.755        0.0733
22       0.672       -0.9136
26       0.616       -1.2609
8        0.383        1.3332
24       0.248        1.5378
21      -0.157        1.6145
5       -0.190        0.1775
13      -0.606        2.9318
18      -0.760       -0.6693
6       -0.810        1.0232
7       -1.295       -0.2572
12      -1.327       -0.4446
1       -1.433       -1.8001
30      -1.872       -3.9547
3       -1.955        0.4889
4       -1.999       -1.0400
16      -2.468       -4.1913
15      -2.497       -1.2938
14      -3.386       -1.4002
11      -3.743       -3.2984

dotplot.ranef.mer(ranef(mod))$joke_id

Joke 19 is the funniest apparently!

Lots of ways to find what the joke actually is. Here is one way:

laughs |> count(joke_id, joke_label) |>
  filter(joke_id==19) |>
  pull(joke_label)

[1] "How many psychiatrists does it take to change a lightbulb? Just one, but the lightbulb really has to want to change."

(not sure I agree)

To find out which joke benefits most from video, we should look at the by-joke adjustments to the slope over delivery.

We can see the biggest slope adjustment in the plot above, which shows us that Joke 28 benefits most from video. We can also quickly check this with something like:

ranef(mod)$joke_id |>
  filter(deliveryvideo == max(deliveryvideo))

   (Intercept) deliveryvideo
28       0.807          3.37

Which joke is it?

laughs |> count(joke_id, joke_label) |>
  filter(joke_id==28) |>
  pull(joke_label)

[1] "An Alsatian went to a telegram office, took out a blank form and wrote:\n\"Woof. Woof. Woof. Woof. Woof. Woof. Woof. Woof. Woof.\"\nThe clerk examined the paper and politely told the dog: \"There are only nine\nwords here. You could send another \x91Woof' for the same price.\"\n\"But,\" the dog replied, \"that would make no sense at all.\""

The joke itself is a bit weird… but I can imagine that the video really helped!

Question 11

Do jokes that are rated funnier when presented in audio-only tend to also benefit more from the addition of video?

Hints

Think carefully about this question. The random effects show us that jokes vary in their intercepts (ratings in audio-only) and in their effects of delivery (the random slopes). We want to know if these are related… one might even say … co-related …
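
As a sketch of where that information lives in a fitted model, here is an example with lme4's sleepstudy data (a stand-in, not the jokes data):

```r
library(lme4)

m <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

# the printed summary of the random effects includes a Corr column
VarCorr(m)

# the same correlation matrix extracted from the fitted model
re_cor <- attr(VarCorr(m)$Subject, "correlation")
re_cor["(Intercept)", "Days"]
```

A positive value would mean that groups with higher intercepts tend to have higher slopes, and a negative value the reverse.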

Question 12

Create a plot of the estimated effect of video on humour ratings. Try to plot not only the fixed effects, but the raw data too.

Hints

See e.g. Chapter 2 #visualising-models

Extra: Test Enhanced Learning

Data: Test-enhanced learning

An experiment was run to conceptually replicate “test-enhanced learning” (Roediger & Karpicke, 2006): two groups of 25 participants were presented with material to learn. One group studied the material twice (StudyStudy), the other group studied the material once then did a test (StudyTest). Recall was tested immediately (one minute) after the learning session and one week later. The recall tests were composed of 175 items identified by a keyword (Test_word).

The critical (replication) prediction is that the StudyStudy group will perform better on the immediate test, but the StudyTest group will retain the material better and thus perform better on the 1-week follow-up test.

Test performance is measured as the time taken to correctly recall a given word.

The following code loads the data into your R environment by creating a variable called tel:

load(url("https://uoepsy.github.io/data/testenhancedlearning.RData"))

Table 3: Data Dictionary: testenhancedlearning.RData

variable	description
Subject_ID	Unique Participant Identifier
Group	Group denoting whether the participant studied the material twice (StudyStudy), or studied it once then did a test (StudyTest)
Delay	Time of recall test ('min' = Immediate, 'week' = One week later)
Test_word	Word being recalled (175 different test words)
Correct	Whether or not the word was correctly recalled
Rtime	Time to recall word (milliseconds)

Question 13

Here is the beginning of our modelling.

# packages used below
library(tidyverse)
library(lme4)

# load in the data (creates an object called tel)
load(url("https://uoepsy.github.io/data/testenhancedlearning.RData"))

# performance is measured by time taken to *correctly*
# recall a word.
# so we're going to have to discard all the incorrects:
tel <- tel |> filter(Correct == 1)

# preliminary plot
# makes it look like studytest are better at immediate (contrary to prediction)
# both groups get slower from immediate > week, 
# but studystudy slows more than studytest
ggplot(tel, aes(x = Delay, y = Rtime, col = Group)) + 
  stat_summary(geom="pointrange")

mmod <- lmer(Rtime ~ Delay*Group +
             (1 + Delay | Subject_ID) +
             (1 + Delay * Group | Test_word),
             data=tel)

Solution 19.

This is what I did. You might do something else!

First I removed the interaction from the random effects

mod1 <- lmer(Rtime ~ Delay*Group +
             (1 + Delay | Subject_ID) +
             (1 + Delay + Group | Test_word),
             data=tel)

boundary (singular) fit: see help('isSingular')

This model is a singular fit, suggesting it needs further simplification. The variance-covariance matrix of the random effects isn’t giving us many pointers.

# examine vcov
VarCorr(mod1)

 Groups     Name           Std.Dev. Corr       
 Test_word  (Intercept)     21.5               
            Delayweek       14.5     0.17      
            GroupStudyTest  23.4    -0.77 -0.76
 Subject_ID (Intercept)     27.1               
            Delayweek       10.2    0.01       
 Residual                  240.1

There are various things we could try here. See Chapter 8 #simplifying-random-effect-structures for some of the more in-depth options.

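However, sometimes it is simplest just to use trial and error, removing different possible terms. Here we try two options: mod2a removes the by-word random slope of Delay (Delay | Test_word), and mod2b removes the by-participant random slope of Delay (Delay | Subject_ID):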

mod2a <- lmer(Rtime ~ Delay*Group +
             (1 + Delay | Subject_ID) +
             (1 + Group | Test_word),
             data=tel)
mod2b <- lmer(Rtime ~ Delay*Group +
             (1 | Subject_ID) +
             (1 + Delay + Group | Test_word),
             data=tel)

boundary (singular) fit: see help('isSingular')

The second model (mod2b) is a singular fit, but the first one (mod2a) is not. Just for safety, let’s check:

isSingular(mod2a)

[1] FALSE

All looks good there.

Sometimes it can be useful to check how estimates of fixed effects and their standard errors differ across possible candidate models with different random effect structures. More often than not, this simply provides us with reassurance that the removal of random effects hasn’t actually had too much of an impact on anything we’re going to conduct inferences on. If they differ a lot, then we have a lot more to discuss!

Here are the fixed effects from each model:

term	mod1	mod2a	mod2b
(Intercept)	740.55 (7.17)	740.57 (7.21)	740.69 (7.23)
Delayweek	27.65 (6.97)	27.64 (6.87)	27.23 (6.64)
GroupStudyTest	-31.82 (10.26)	-31.75 (10.23)	-31.73 (10.35)
Delayweek:GroupStudyTest	-17.2 (9.69)	-17.26 (9.7)	-17.18 (9.19)
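
A comparison table of this kind could be assembled along the following lines. This is only a sketch: it uses lme4's sleepstudy data and two stand-in models (mA, mB), and the helper est_se() is a hypothetical function written for this example.

```r
library(lme4)

# two candidate random effect structures for the same fixed effects
mA <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)
mB <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# format "estimate (SE)" for each fixed effect of one model
est_se <- function(mod) {
  cf <- coef(summary(mod))
  sprintf("%.2f (%.2f)", cf[, "Estimate"], cf[, "Std. Error"])
}

# one row per fixed effect, one column per model
comparison <- data.frame(
  term = names(fixef(mA)),
  mA = est_se(mA),
  mB = est_se(mB)
)
comparison
```

This relies on all models sharing the same fixed effects in the same order, which holds when only the random effect structure differs.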

In all these models, the fixed effects estimates are pretty similar, suggesting that these parameter estimates are largely invariant to our refinement of the random effects. This makes me feel better: there’s less worry that our final conclusions are going to be influenced by the inclusion or exclusion of one of these random effect terms.

I would definitely settle on mod2a because that is the one that is not a singular fit, but we could add a footnote to mention that mod2b finds the same pattern of results.

Footnotes

If it does, head back to where we learned about interactions in the single level regressions with lm(). It’s just the same here.