class: center, middle, inverse, title-slide

.title[
# Random Effect Structures
]
.subtitle[
## Data Analysis for Psychology in R 3
]
.author[
### Josiah King
]
.institute[
### Department of Psychology
The University of Edinburgh
]

---
# Grouping structures so far...

.pull-left[
- children within schools
- people within areas
- trials within participants
- timepoints within participants
- nurses within hospitals
- and probably some others...
]

.pull-right[
]

---
# Look at your data! Read the study design!

.pull-left[
- children within schools
- people within areas
- trials within participants
- timepoints within participants
- nurses within hospitals
- and probably some others...
]

.pull-right[
```
##   g   x   y
##   1   1  34
##   1   7  32
##   1   3  37
## ... ... ...
## ... ... ...
##   i   5  25
##   i   1  29
## ... ... ...
```
]

---
count: false
# Look at your data! Read the study design!

.pull-left[
- children within schools
- people within areas
- trials within participants
- timepoints within participants
- nurses within hospitals
- **observations within clusters**
]

.pull-right[
```
##   g   x   y
##   1   1  34
##   1   7  32
##   1   3  37
## ... ... ...
## ... ... ...
##   i   5  25
##   i   1  29
## ... ... ...
```

When data is in long format:

- rows of data are grouped by values of the group identifier `g`
]

---
# Adding more levels!

.pull-left[
- children within schools *within districts*
- people within areas *within countries*
- trials within participants *within pairs*
- timepoints within participants *within families*
- nurses within hospitals *within health boards*
- **observations within clusters _within higher clusters_**
]

.pull-right[
```
##  g1  g2   x   y
##   A   1   1  34
##   A   1   7  32
##   A   1   3  37
## ... ... ... ...
## ... ... ... ...
##   A   i   5  25
##   A   i   1  29
## ... ... ... ...
##
##   B 101   4  31
##   B 101   6  25
##   B 102   2  27
## ... ... ... ...
## ... ... ... ...
##   B ... ... ...
```

When data is in long format:

- rows of data are grouped by values of the group identifier `g2`, which are in turn grouped by values of the higher-level group identifier `g1`
]

---
# Nested Structures

- the things in a cluster belong __only__ to that cluster.

<img src="https://media.gettyimages.com/photos/albatross-chick-between-parents-feet-falkland-islands-picture-id642348358?s=2048x2048" width="450px" style="display: block; margin: auto;" />

---
count:false
# Nested Structures

- the things in a cluster belong __only__ to that cluster.
- **`(1 | school/class)`** or **`(1 | school) + (1 | class:school)`**

<img src="jk_img_sandbox/structure_nestednew.png" width="1575" style="display: block; margin: auto;" />

---
# Nested Structures - labels!

- the things in a cluster belong __only__ to that cluster.
- If labels are unique, **`(1 | school) + (1 | class)`** is the same as **`(1 | school/class)`**

<img src="jk_img_sandbox/structure_nestedlabnew.png" width="1575" style="display: block; margin: auto;" />

---
count:false
# Example

.pull-left[
One study site recruits 20 participants.
Each participant has 10 datapoints.

```r
library(tidyverse)  # provides read_csv() and ggplot()
d3 <- read_csv("https://uoepsy.github.io/data/dapr3_mindfuldecline.csv")
```

```
##  sitename  ppt     condition    visit  age   ACE  imp
##  Sncbk     PPT_1   control          1   60  84.5  unimp
##  Sncbk     PPT_1   control          2   62  85.6  imp
##  Sncbk     PPT_1   control          3   64  84.5  imp
##  Sncbk     PPT_1   control          4   66  83.1  imp
##  ...       ...     ...            ...  ...   ...  ...
##  Sncbk     PPT_11  mindfulness      1   60  85.6  imp
##  Sncbk     PPT_11  mindfulness      2   62  84.5  unimp
##  Sncbk     PPT_11  mindfulness      3   64  85.7  imp
##  Sncbk     PPT_11  mindfulness      4   66  84.8  unimp
```

```r
ggplot(d3, aes(x=visit, y=ACE))+
  geom_line(aes(group=ppt, col=condition), alpha=.7)
```
]

.pull-right[
```r
... + (1 + ... | ppt)
```

![](dapr3_2324_04a_ranef_files/figure-html/unnamed-chunk-11-1.svg)
]

---
# Nested Example

.pull-left[
14 study sites each recruit between 15 and 30 participants.
Each participant has 10 datapoints.
```r
d3full <- read_csv("https://uoepsy.github.io/data/dapr3_mindfuldeclineFULL.csv")
```

```
##  sitename  ppt    condition    visit  age   ACE  imp
##  Savdz     PPT_1  control          1   60  84.8  imp
##  Savdz     PPT_1  control          2   62  85    imp
##  Savdz     PPT_1  control          3   64  83.9  imp
##  Savdz     PPT_1  control          4   66  83    imp
##  Savdz     PPT_1  control          5   68  82.2  imp
##  Savdz     PPT_1  control          6   70  81.9  imp
##  ...       ...    ...            ...  ...   ...  ...
##  ...       ...    ...            ...  ...   ...  ...
##  Slonb     PPT_8  control          9   76  82.1  imp
##  Slonb     PPT_8  control         10   78  81.6  imp
##  Slonb     PPT_9  mindfulness      1   60  85    imp
##  Slonb     PPT_9  mindfulness      2   62  85.1  imp
##  ...       ...    ...            ...  ...   ...  ...
```

```r
ggplot(d3full, aes(x=visit, y=ACE))+
  geom_line(aes(group=ppt, col=condition), alpha=.7) +
  facet_wrap(~sitename)
```
]

.pull-right[
```r
... + (1 + ... | sitename / ppt)
```

![](dapr3_2324_04a_ranef_files/figure-html/unnamed-chunk-16-1.svg)
]

---
# Crossed Structures

- "crossed" = not nested!

--

- **`(1 | subject) + (1 | task)`**
- the things in a cluster can also belong to other clusters

<img src="jk_img_sandbox/structure_crossednew.png" width="1571" style="display: block; margin: auto;" />

---
# Random Effects Revisited

**What do we mean by "random effects"?**

$$
\text{... + }\underbrace{\text{(random intercept + random slopes | grouping structure)}}_{\text{random effects}}
$$

.pull-left[
People use different phrasings...

- when referring to random slopes:
    - "random effects of x for g"
    - "random effects of x by g"
    - "by-g random effects of x"
- when referring to random intercepts:
    - "random effect for g"

A common definition: "allow ___ to vary by g"
]

.pull-right[
__Nested__
```
... + (1 + ... | g1 / g2)
... + (1 + ... | g1 ) + (1 + ... | g1:g2)
```

__Crossed__
```
... + (1 + ... | g1 ) + (1 + ... | g2)
```
]

---
# Random Effects Revisited (2)

**Should variable `g` be fixed or random?**

| Criterion: | Repetition: <br> _If the experiment were repeated:_ | Desired inference: <br> _The conclusions refer to:_ |
|----------------|--------------------------------------------------|----------------------------------------------------|
| Fixed<br>**y ~ ... + g** | <center>Same groups would be used</center> | <center>The groups used</center> |
| Random<br>**y ~ ... + (...&#124;g)** | <center>Different groups would be used</center> | <center>A population from which the groups used<br> are just a (random) sample</center> |

- If there are only a small number of groups, estimates of the variance components may be unstable.
- Partialling out group differences as fixed effects *may* be preferable.

---
# Random Effects Revisited (3)

**I have `y ~ 1 + x + (1 | g)`. Should I include a by-g random slope of x?**

If the effect of x can vary by g, then including `x | g` will give a better estimate of the uncertainty in the fixed effect of x.

<br><br>

```
1. ACE ~ visit + (1 + visit | ppt)
2. ACE ~ visit + (1 | ppt)
```

1 is preferable to 2, especially because we're interested in estimating and testing the effect of visit.

```
3. ACE ~ visit + covariate + (1 + visit + covariate | ppt)
4. ACE ~ visit + covariate + (1 + visit | ppt)
```

3 is preferable to 4 because it more accurately represents the world (people vary in how the covariate influences cognition). But it's less crucial here: we're not interested in assessing the significance of the covariate; we're just controlling for it.
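
---
# Does x vary within g?

Before adding a by-g random slope of x, it can help to check whether x actually takes more than one value within each group. Below is a minimal sketch in base R, assuming a small made-up data frame `toy` (hypothetical, not the study data) laid out like the mindfulness study, and a hypothetical helper `varies_within()` that is not part of lme4:

```r
# Hypothetical toy data (NOT the study data): 4 participants, 3 visits each,
# with condition fixed within participant, as in the mindfulness design
toy <- data.frame(
  ppt       = rep(c("PPT_1", "PPT_2", "PPT_3", "PPT_4"), each = 3),
  condition = rep(c("control", "control", "mindfulness", "mindfulness"), each = 3),
  visit     = rep(1:3, times = 4)
)

# Hypothetical helper: TRUE if x takes more than one distinct value
# within at least one level of the grouping variable g
varies_within <- function(data, x, g) {
  n_distinct <- tapply(data[[x]], data[[g]], function(v) length(unique(v)))
  any(n_distinct > 1)
}

varies_within(toy, "visit", "ppt")      # TRUE:  a by-ppt slope of visit exists
varies_within(toy, "condition", "ppt")  # FALSE: no by-ppt slope of condition
```

A `FALSE` means there is no within-group slope to estimate, so a term like `(condition | ppt)` makes no sense for this design. A `TRUE` only says the slope is possible in principle, not that there is enough variation in the sample to fit it.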
---
# Random Effects Revisited (4)

.pull-left[
```r
d3 <- read_csv("https://uoepsy.github.io/data/dapr3_mindfuldecline.csv")
ggplot(d3, aes(x=visit,y=ACE,col=condition))+
  geom_point()+
  facet_wrap(~ppt)
```

![](dapr3_2324_04a_ranef_files/figure-html/unnamed-chunk-18-1.svg)
]

.pull-right[
- multiple observations from each participant: `(1 | ppt)` theoretically makes sense (participants may vary in their average cognition)

{{content}}
]

--

- for a single ppt, the slope of `ACE ~ visit` exists in our study design. Therefore, this *could* be different for different ppts! `(visit | ppt)` makes theoretical sense.

{{content}}

--

- for a single ppt, the slope of `ACE ~ condition` does not exist in our study design (each ppt is in either one condition or the other, never both). ~~`(condition | ppt)`~~ makes no sense.

---
# Random Effects Extended

.pull-left[
```r
d3full <- read_csv("https://uoepsy.github.io/data/dapr3_mindfuldeclineFULL.csv")
ggplot(d3full, aes(x=visit, y=ACE))+
  geom_line(aes(group=ppt, col=condition), alpha=.7) +
  facet_wrap(~sitename)
```

![](dapr3_2324_04a_ranef_files/figure-html/unnamed-chunk-19-1.svg)
]

.pull-right[
- multiple observations from each participant: `(1 | ppt)`

{{content}}
]

--

- multiple participants nested within study sites: `(1 | sitename/ppt)`

{{content}}

--

- for a single ppt, the slope of `ACE ~ visit` exists in our study design: `(visit | ppt)`
- for a single study site, the slope of `ACE ~ visit` exists in our study design: `(visit | sitename)`

{{content}}

--

- for a single ppt, the slope of `ACE ~ condition` does not exist in our study design: ~~`(condition | ppt)`~~
- for a single study site, the slope of `ACE ~ condition` exists in our study design: `(condition | sitename)`

---
# Random Effects Extended (2)

.pull-left[
```r
d3full <- read_csv("https://uoepsy.github.io/data/dapr3_mindfuldeclineFULL.csv")
ggplot(d3full, aes(x=visit, y=ACE))+
  geom_line(aes(group=ppt, col=condition), alpha=.7) +
  facet_wrap(~sitename)
```

![](dapr3_2324_04a_ranef_files/figure-html/unnamed-chunk-20-1.svg)
]

.pull-right[
- `1 | ppt`
- `1 | sitename/ppt`
- `visit | ppt`
- `visit | sitename`
- ~~`condition | ppt`~~
- `condition | sitename`

```
... + (1 + visit + condition | sitename / ppt)
```

The `/` shorthand puts the same terms at both levels, so this would also include `condition | ppt:sitename`. Writing the two levels out separately avoids that:

```
... + (1 + visit + condition | sitename ) + (1 + visit | ppt:sitename)
```

{{content}}
]

--

**ONLY IF** ppt labels are unique to each study site:

```
... + (1 + visit + condition | sitename ) + (1 + visit | ppt)
```

```
##  sitename  ppt    ACE  visit  condition
##  Savdz     ppt_1  ...  ...    ...
##  Sbfxt     ppt_1  ...  ...    ...
##  ...       ...    ...  ...    ...
```

In data like these, where `ppt_1` appears at both `Savdz` and `Sbfxt`, labels are not unique, so `(1 + visit | ppt)` would treat two different people as one participant.

---
# The poke in the eye

It's a trade-off...

.pull-left[
**Accurately representing the world**

everything that can vary is modelled as varying
]

--

.pull-right[
**Being able to fit the model**

in our sample, some things will not vary _enough_ to fit `x|g`

{{content}}
]

--

- not enough groups in `g`
    - fit `+g` instead of `(1|g)`
- predictors on different scales
    - `mm|g` vs `km|g`
    - can be fixed with scaling
- not enough variance in `y~x` between groups

---
class: inverse, center, middle, animated, rotateInDownLeft

# End