Reasoning about random effects

In this lab, you’ll apply the tools you saw in the lectures this week to the same three datasets you got to know in Week 1.

You’ll figure out how to model the grouping structure in each dataset in terms of random intercepts and random slopes (which, together, we refer to as random effects).

Get set up

Create a new .Rmd file for this week’s exercises.
Save it somewhere you can find it again.
Give it a clear name (for example, dapr3_lab03.Rmd).
In the first code chunk, load the packages you’ll need this week:
- tidyverse
- stats (for xtabs())

Clothing

Read in the dataset located at https://uoepsy.github.io/data/dapr3_mannequin.csv and name it clothing.

RQ: Are people more likely to purchase clothing when they see it displayed on a model, and is this association dependent on item price?

variable	description
purch_rating	Purchase rating (sliding scale 0 to 100, with higher ratings indicating greater perceived likelihood of purchase)
price	Price presented for item (range £5 to £100)
ppt	Participant identifier
condition	Whether items are seen on a model or on a white background

More detail about this dataset

Thirty participants were presented with a set of pictures of items of clothing, and rated each item how likely they were to buy it. Each participant saw 20 items, ranging in price from £5 to £100. 15 participants saw these items worn by a model, while the other 15 saw the items hanging against a white background.

From the Week 1 lab, here’s the key information about clothing:

Outcome: purch_rating
Predictors: price, condition (and their interaction)
Randomly-varying grouping variable: ppt

Question 1

Write the R formula for the fixed effects part of this model (that is, the outcome and predictors, the y ~ x + z part, the kind of models you’ve done before in DAPR2).

🗂️ See Fixed effects flash card.

Solution 1.

purch_rating ~ price * condition

(Note: We know this has to be an interaction model because the RQ is asking about how an association between predictor and outcome depends on the value of another predictor.)

Question 2

Now we’re looking at how each randomly-varying grouping variable relates to the model’s predictors. We want to know: for each randomly-varying grouping variable, do at least some of its levels appear with more than one distinct value of each of the model’s predictors?

Specifically, for clothing:

Do at least some levels of ppt appear with more than one distinct value of price?
Do at least some levels of ppt appear with more than one distinct value of condition?

🗂️ See Identify possible random effects flash card.

Solution 2.

clothing <- read_csv("https://uoepsy.github.io/data/dapr3_mannequin.csv")

Do at least some levels of ppt appear with more than one distinct value of price? Yes, every ppt sees every price once. In other words, price varies within ppt.

stats::xtabs(~ ppt + price, data = clothing)

        price
ppt      5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
  ppt_1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1   1
  ppt_10 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1   1
  ppt_11 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1   1
  ppt_12 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1   1
  ppt_13 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1   1
  ppt_14 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1   1
  ppt_15 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1   1
  ppt_16 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1   1
  ppt_17 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1   1
  ppt_18 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1   1
  ppt_19 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1   1
  ppt_2  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1   1
  ppt_20 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1   1
  ppt_21 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1   1
  ppt_22 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1   1
  ppt_23 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1   1
  ppt_24 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1   1
  ppt_25 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1   1
  ppt_26 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1   1
  ppt_27 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1   1
  ppt_28 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1   1
  ppt_29 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1   1
  ppt_3  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1   1
  ppt_30 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1   1
  ppt_4  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1   1
  ppt_5  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1   1
  ppt_6  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1   1
  ppt_7  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1   1
  ppt_8  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1   1
  ppt_9  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1   1

Do at least some levels of ppt appear with more than one distinct value of condition? No, every ppt sees only one condition or the other, not both. In other words, condition varies between ppts.

stats::xtabs(~ ppt + condition, data = clothing)

        condition
ppt      item_only model
  ppt_1         20     0
  ppt_10        20     0
  ppt_11        20     0
  ppt_12        20     0
  ppt_13        20     0
  ppt_14        20     0
  ppt_15        20     0
  ppt_16         0    20
  ppt_17         0    20
  ppt_18         0    20
  ppt_19         0    20
  ppt_2         20     0
  ppt_20         0    20
  ppt_21         0    20
  ppt_22         0    20
  ppt_23         0    20
  ppt_24         0    20
  ppt_25         0    20
  ppt_26         0    20
  ppt_27         0    20
  ppt_28         0    20
  ppt_29         0    20
  ppt_3         20     0
  ppt_30         0    20
  ppt_4         20     0
  ppt_5         20     0
  ppt_6         20     0
  ppt_7         20     0
  ppt_8         20     0
  ppt_9         20     0

Question 3

Let’s bring all this information together to reason about the random effects in this dataset.

We know that every randomly-varying grouping variable requires a random intercept. For each of those grouping variables, ask yourself: can we additionally include random slopes over each of the model’s predictor(s) by that variable?

Specifically, for clothing, we know we need a random intercept by ppt.

Can we also include a random slope over price by ppt?
Can we also include a random slope over condition by ppt?

These answers will be based on what you found in answer to Question 2.

Write out the complete R formula for this model, expanding on the fixed-effects-only model you wrote above by adding on the appropriate random effects.

🗂️ See Add random effects to a model formula flash card.

Solution 3. In addition to the random intercept by ppt, can we also include a random slope over price by ppt? Yes, because we figured out in the last question that each value of ppt appears with more than one value of price (that is, each participant sees more than one price).

Can we also include a random slope over condition by ppt? No, because we figured out in the last question that each value of ppt appears with only one value of condition (that is, each participant sees only one condition, not both).

Therefore the full model formula is:

purch_rating ~ price * condition + (1 + price | ppt)

Monkey status

Read in the dataset located at https://uoepsy.github.io/data/msmr_monkeystatus.csv and name it monkey.

RQ: How is the social status of monkeys associated with their ability to solve problems, while controlling for the difficulty of the problem?

variable	description
status	Social status of monkey (adolescent, subordinate adult, or dominant adult)
difficulty	Problem difficulty ('easy' vs 'difficult')
monkeyID	Monkey name
solved	Whether or not the problem was successfully solved by the monkey

More detail about this dataset

Researchers have given a sample of Rhesus Macaques various problems to solve in order to receive treats. Troops of Macaques have a complex social structure, but adult monkeys tend can be loosely categorised as having either a “dominant” or “subordinate” status. The monkeys in our sample are either adolescent monkeys, subordinate adults, or dominant adults. Each monkey attempted various problems before they got bored/distracted/full of treats. Each problems were classed as either “easy” or “difficult”, and the researchers recorded whether or not the monkey solved each problem.

From the Week 1 lab, here’s the key information about monkeystatus:

Outcome: solved
Predictors: status, difficulty
Randomly-varying grouping variable: monkeyID

Question 4

Write the R formula for the fixed effects part of this model.

🗂️ See Fixed effects flash card.

Solution 4.

solved ~ status + difficulty

(Note: We know this is not an interaction model because the RQ is asking for us to control for a specific variable, not to estimate how one effect depends on a different variable. To control for a specific covariate, we just need to add it on.)

Question 5

🗂️ See Identify possible random effects flash card.

Solution 5.

monkey <- read_csv('https://uoepsy.github.io/data/msmr_monkeystatus.csv')

Do at least some levels of monkeyID appear with more than one distinct value of status? No, every monkeyID has only one status.

stats::xtabs(~ monkeyID + status, data = monkey)

           status
monkeyID    adolescent dominant subordinate
  Aliyya             0        7           0
  Ashley             6        0           0
  Billy              0        7           0
  Brianna            0        5           0
  Catherine          9        0           0
  Celestina          5        0           0
  Cheyenne           6        0           0
  Cinoi              0        0          10
  Courtney           0        7           0
  Daniel             0        0           8
  David              0        0          11
  Delray             6        0           0
  Devonte            0        5           0
  Efren              0        7           0
  Eric               0        8           0
  Erik               6        0           0
  Fuaad              0        0           6
  Gienry             0       11           0
  Iffat              0        8           0
  Jaarallah          0       11           0
  Jashjaun           0        6           0
  Jason              0       10           0
  Jayson            10        0           0
  Jennifer          10        0           0
  Jessica            0        0           8
  Jonathan           7        0           0
  Kaamil            11        0           0
  Maya               0        7           0
  Micah              0        0          10
  Nadheera           0        0           7
  Najiyya            0       11           0
  Nathan             0        5           0
  Nichol             0        0          10
  Peter              7        0           0
  Rachel             0       11           0
  Rebecca            9        0           0
  Richard            0        0           3
  Riley              0        0           8
  Robert             7        0           0
  Saydie             0        8           0
  Sean               0        9           0
  Seunghoo           0        0           9
  Shajee'a           0       10           0
  Shannon           10        0           0
  Sierra             0        6           0
  Sydney             0        8           0
  Taylor             0        6           0
  Van               10        0           0
  Vance              0        8           0
  Young Joo          7        0           0

Do at least some levels of monkeyID appear with more than one distinct value of difficulty? Yes, nearly every monkeyID sees both levels of difficulty.

stats::xtabs(~ monkeyID + difficulty, data = monkey)

           difficulty
monkeyID    difficult easy
  Aliyya            3    4
  Ashley            3    3
  Billy             3    4
  Brianna           3    2
  Catherine         4    5
  Celestina         3    2
  Cheyenne          2    4
  Cinoi             4    6
  Courtney          2    5
  Daniel            5    3
  David             5    6
  Delray            5    1
  Devonte           2    3
  Efren             4    3
  Eric              3    5
  Erik              2    4
  Fuaad             5    1
  Gienry            5    6
  Iffat             1    7
  Jaarallah         5    6
  Jashjaun          2    4
  Jason             6    4
  Jayson            5    5
  Jennifer          7    3
  Jessica           4    4
  Jonathan          2    5
  Kaamil            7    4
  Maya              2    5
  Micah             5    5
  Nadheera          0    7
  Najiyya           5    6
  Nathan            3    2
  Nichol            4    6
  Peter             1    6
  Rachel            5    6
  Rebecca           4    5
  Richard           3    0
  Riley             3    5
  Robert            4    3
  Saydie            4    4
  Sean              3    6
  Seunghoo          3    6
  Shajee'a          7    3
  Shannon           6    4
  Sierra            5    1
  Sydney            4    4
  Taylor            3    3
  Van               3    7
  Vance             7    1
  Young Joo         3    4

(It’s fine that Nadheera and Richard only see one difficulty level each—we have enough data from the other monkeys to still be able to estimate the random effects. Speaking of which…)

Question 6

Let’s bring all this information together to reason about the random effects in this dataset.

Write out the complete R formula for this model, expanding on the fixed-effects-only model you wrote above by adding on the appropriate random effects.

🗂️ See Add random effects to a model formula flash card.

Solution 6. In addition to the random intercept by monkeyID, can we also include a random slope over status by monkeyID? No, because each monkey only has one status.

Can we also include a random slope over difficulty by monkeyID? Yes, because at least some monkeys have seen both levels of difficulty.

Therefore the full model formula is:

solved ~ status + difficulty + (1 + difficulty | monkeyID)

Laughs

Read in the dataset located at https://uoepsy.github.io/data/lmm_laughs.csv and name it laughs.

RQ: How is the delivery format of jokes (audio-only vs. audio AND video) associated with differences in humour ratings?

variable	description
ppt	Participant identification number
joke_label	Joke presented
joke_id	Joke identification number
delivery	Experimental manipulation: whether joke was presented in audio-only ('audio') or in audiovideo ('video')
rating	Humour rating chosen on a slider from 0 to 100

More detail about this dataset

These data are simulated to imitate an experiment that investigates the effect of visual non-verbal communication (i.e., gestures, facial expressions) on joke appreciation. Ninety participants took part in the experiment, in which they each rated how funny they found a set of 30 jokes. For each participant, the order of these 30 jokes was randomised for each run of the experiment. For each participant, the set of jokes was randomly split into two halves, with the first half being presented in audio-only, and the second half being presented in audio and video. This meant that each participant saw 15 jokes with video and 15 without, and each joke would be presented with video roughly half of the time.

From the Week 1 lab, here’s the key information about laughs:

Outcome: rating
Predictors: delivery
Randomly-varying grouping variables: ppt and joke_label/joke_id (two forms of the same information; for simplicity we can just use joke_id)

Question 7

Write the R formula for the fixed effects part of this model.

🗂️ See Fixed effects flash card.

Solution 7.

rating ~ delivery

Question 8

🗂️ See Identify possible random effects flash card.

Solution 8.

laughs <- read_csv('https://uoepsy.github.io/data/lmm_laughs.csv')

Do at least some levels of ppt appear with more than one distinct value of delivery? Yes, every ppt sees both kinds of delivery.

stats::xtabs(~ ppt + delivery, data = laughs)

         delivery
ppt       audio video
  PPTID1     15    15
  PPTID10    15    15
  PPTID11    15    15
  PPTID12    15    15
  PPTID13    15    15
  PPTID14    15    15
  PPTID15    15    15
  PPTID16    15    15
  PPTID17    15    15
  PPTID18    15    15
  PPTID19    15    15
  PPTID2     15    15
  PPTID20    15    15
  PPTID21    15    15
  PPTID22    15    15
  PPTID23    15    15
  PPTID24    15    15
  PPTID25    15    15
  PPTID26    15    15
  PPTID27    15    15
  PPTID28    15    15
  PPTID29    15    15
  PPTID3     15    15
  PPTID30    15    15
  PPTID31    15    15
  PPTID32    15    15
  PPTID33    15    15
  PPTID34    15    15
  PPTID35    15    15
  PPTID36    15    15
  PPTID37    15    15
  PPTID38    15    15
  PPTID39    15    15
  PPTID4     15    15
  PPTID40    15    15
  PPTID41    15    15
  PPTID42    15    15
  PPTID43    15    15
  PPTID44    15    15
  PPTID45    15    15
  PPTID46    15    15
  PPTID47    15    15
  PPTID48    15    15
  PPTID49    15    15
  PPTID5     15    15
  PPTID50    15    15
  PPTID51    15    15
  PPTID52    15    15
  PPTID53    15    15
  PPTID54    15    15
  PPTID55    15    15
  PPTID56    15    15
  PPTID57    15    15
  PPTID58    15    15
  PPTID59    15    15
  PPTID6     15    15
  PPTID60    15    15
  PPTID61    15    15
  PPTID62    15    15
  PPTID63    15    15
  PPTID64    15    15
  PPTID65    15    15
  PPTID66    15    15
  PPTID67    15    15
  PPTID68    15    15
  PPTID69    15    15
  PPTID7     15    15
  PPTID70    15    15
  PPTID71    15    15
  PPTID72    15    15
  PPTID73    15    15
  PPTID74    15    15
  PPTID75    15    15
  PPTID76    15    15
  PPTID77    15    15
  PPTID78    15    15
  PPTID79    15    15
  PPTID8     15    15
  PPTID80    15    15
  PPTID81    15    15
  PPTID82    15    15
  PPTID83    15    15
  PPTID84    15    15
  PPTID85    15    15
  PPTID86    15    15
  PPTID87    15    15
  PPTID88    15    15
  PPTID89    15    15
  PPTID9     15    15
  PPTID90    15    15

Do at least some levels of joke_id appear with more than one distinct value of delivery? Yes, every joke_id is told with both kinds of delivery.

stats::xtabs(~ joke_id + delivery, data = laughs)

       delivery
joke_id audio video
     1     40    50
     2     43    47
     3     48    42
     4     46    44
     5     43    47
     6     35    55
     7     39    51
     8     52    38
     9     45    45
     10    48    42
     11    46    44
     12    47    43
     13    36    54
     14    47    43
     15    49    41
     16    52    38
     17    50    40
     18    39    51
     19    46    44
     20    42    48
     21    54    36
     22    45    45
     23    47    43
     24    44    46
     25    39    51
     26    46    44
     27    39    51
     28    56    34
     29    40    50
     30    47    43

Question 9

Let’s bring all this information together to reason about the random effects in this dataset.

Write out the complete R formula for this model, expanding on the fixed-effects-only model you wrote above by adding on the appropriate random effects.

🗂️ See Add random effects to a model formula flash card.

Solution 9. In addition to the random intercept by ppt, can we also include a random slope over delivery by ppt? Yes, because each ppt has seen each kind of delivery.

In addition to the random intercept by joke_id, can we also include a random slope over delivery by joke_id? Yes, because each joke_id was told in each kind of delivery.

Therefore the full model formula is:

rating ~ delivery + (1 + delivery | ppt) + (1 + delivery | joke_id)

Interpreting LMM summary TODO

Question 10

TODO choose one dataset that doesn’t have convergence issues and task w interpreting LMM summary