class: center, middle, inverse, title-slide

.title[
# Bonus Content: GLMM (OPTIONAL)
]
.subtitle[
## Data Analysis for Psychology in R 3
]
.author[
### Josiah King
]
.institute[
### Department of Psychology
The University of Edinburgh
]

---

# lm() and glm()

.pull-left[
### lm()

$$
`\begin{align}
& \color{red}{y} = \color{blue}{\beta_0 + \beta_1(x_1) + ... + \beta_k(x_k)} + \boldsymbol{\varepsilon} \\
\end{align}`
$$
]

---
count:false

# lm() and glm()

.pull-left[
### lm()

$$
`\begin{align}
& \color{red}{y} = \color{blue}{\underbrace{\beta_0 + \beta_1(x_1) + ... + \beta_k(x_k)}_{\mathbf{X \boldsymbol{\beta}}}} + \boldsymbol{\varepsilon} \\
\end{align}`
$$
]

---
count:false

# lm() and glm()

.pull-left[
### lm()

$$
`\begin{align}
& \color{red}{y} = \color{blue}{\underbrace{\beta_0 + \beta_1(x_1) + ... + \beta_k(x_k)}_{\mathbf{X \boldsymbol{\beta}}}} + \boldsymbol{\varepsilon} \\
& \text{where } -\infty \leq \color{red}{y} \leq \infty \\
\end{align}`
$$
]

--

.pull-right[
###

$$
`\begin{align}
\color{red}{??} = & \color{blue}{\underbrace{\beta_0 + \beta_1(x_1) + ... + \beta_k(x_k)}_{\mathbf{X \boldsymbol{\beta}}}} + \boldsymbol{\varepsilon} \\
\end{align}`
$$
]

---
count:false

# lm() and glm()

.pull-left[
### lm()

$$
`\begin{align}
& \color{red}{y} = \color{blue}{\underbrace{\beta_0 + \beta_1(x_1) + ... + \beta_k(x_k)}_{\mathbf{X \boldsymbol{\beta}}}} + \boldsymbol{\varepsilon} \\
& \text{where } -\infty \leq \color{red}{y} \leq \infty \\
\end{align}`
$$
]

.pull-right[
### glm()

$$
`\begin{align}
\color{red}{ln \left( \frac{p}{1-p} \right) } & = \color{blue}{\underbrace{\beta_0 + \beta_1(x_1) + ... + \beta_k(x_k)}_{\mathbf{X \boldsymbol{\beta}}}} + \boldsymbol{\varepsilon} \\
& \text{where } 0 \leq p \leq 1 \\
\end{align}`
$$
]

---
count:false

# lm() and glm()

.pull-left[
### lm()

$$
`\begin{align}
& \color{red}{y} = \color{blue}{\underbrace{\beta_0 + \beta_1(x_1) + ... + \beta_k(x_k)}_{\mathbf{X \boldsymbol{\beta}}}} + \boldsymbol{\varepsilon} \\
& \text{where } -\infty \leq \color{red}{y} \leq \infty \\
\end{align}`
$$
]

.pull-right[
### glm()

$$
`\begin{align}
\color{red}{ln \left( \frac{p}{1-p} \right) } & = \color{blue}{\underbrace{\beta_0 + \beta_1(x_1) + ... + \beta_k(x_k)}_{\mathbf{X \boldsymbol{\beta}}}} + \boldsymbol{\varepsilon} \\
& \text{where } 0 \leq p \leq 1 \\
\end{align}`
$$

glm() is the __generalised__ linear model. we can specify the link function to model outcomes with different distributions. this allows us to fit models such as the _logistic_ regression model:

```r
glm(y ~ x, family = binomial(link = "logit"))
```
]

---

# logistic regression visualised

.pull-left[
### continuous outcome

<br><br>

![](dapr3_2324_04c_glmer_files/figure-html/unnamed-chunk-1-1.svg)<!-- -->
]
.pull-right[
### binary outcome

<br><br>

![](dapr3_2324_04c_glmer_files/figure-html/unnamed-chunk-2-1.svg)<!-- -->
]

---
count:false

# logistic regression visualised

.pull-left[
### linear regression

we model __y__ directly as a linear combination of one or more predictor variables

![](dapr3_2324_04c_glmer_files/figure-html/unnamed-chunk-3-1.svg)<!-- -->
]
.pull-right[
### logistic regression

__probability__ is _not_ linear...

but we can model it indirectly

![](dapr3_2324_04c_glmer_files/figure-html/unnamed-chunk-4-1.svg)<!-- -->
]

---
count:false

# logistic regression visualised

`\(ln \left( \frac{p}{1-p} \right)\)` __log-odds__ are linear

<img src="jk_img_sandbox/logoddp.png" width="1977" height="400px" />
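---
count:false

# logistic regression visualised

since the log-odds scale is where the model is linear, it helps to be able to move between probabilities and log-odds directly. a quick sketch in base R (the probabilities here are arbitrary examples):

```r
# qlogis() gives the log-odds (logit) of a probability: log(p/(1-p))
qlogis(c(0.1, 0.5, 0.9))
# ≈ -2.197  0.000  2.197  (symmetric around p = 0.5)

# plogis() is the inverse, mapping any real number back into (0, 1)
plogis(c(-2.197, 0, 2.197))
# ≈ 0.1  0.5  0.9
```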
---

# lmer() and glmer()

.pull-left[
![](dapr3_2324_04c_glmer_files/figure-html/unnamed-chunk-7-1.svg)<!-- -->
]
.pull-right[
![](dapr3_2324_04c_glmer_files/figure-html/unnamed-chunk-8-1.svg)<!-- -->
]

---
count:false

# lmer() and glmer()

.pull-left[
![](dapr3_2324_04c_glmer_files/figure-html/unnamed-chunk-9-1.svg)<!-- -->
]
.pull-right[
![](dapr3_2324_04c_glmer_files/figure-html/unnamed-chunk-10-1.svg)<!-- -->
]

---

# fitting glmer()

.pull-left[
> In a study examining how cognition changes over time, a sample of 20 participants took the Addenbrooke's Cognitive Examination (ACE) every 2 years from age 60 to age 78. Researchers are interested in whether the **probability of being classified as cognitively impaired** changes over age.
]
.pull-right[

```r
d3 <- read_csv("https://uoepsy.github.io/data/dapr3_mindfuldecline.csv") %>%
  mutate(isImp = ifelse(imp == "imp", 1, 0))
d3 %>% select(ppt, visit, isImp) %>% head()
```

```
## # A tibble: 6 × 3
##   ppt   visit isImp
##   <chr> <dbl> <dbl>
## 1 PPT_1     1     0
## 2 PPT_1     2     1
## 3 PPT_1     3     1
## 4 PPT_1     4     1
## 5 PPT_1     5     1
## 6 PPT_1     6     1
```
]

```r
impmod <- glmer(isImp ~ 1 + visit + (1 + visit | ppt),
                data = d3,
                family = "binomial",
                control = glmerControl(optimizer = "bobyqa"))
```

---

# fitting glmer()

.pull-left[

```r
summary(impmod)
```

```
## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [
## glmerMod]
##  Family: binomial  ( logit )
## Formula: isImp ~ 1 + visit + (1 + visit | ppt)
##    Data: d3
## Control: glmerControl(optimizer = "bobyqa")
## 
##      AIC      BIC   logLik deviance df.resid 
##    122.7    138.5    -56.3    112.7      172 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.4107  0.0050  0.0512  0.2700  1.4308 
## 
## Random effects:
##  Groups Name        Variance Std.Dev. Corr 
##  ppt    (Intercept) 3.50     1.87          
##         visit       1.22     1.10     -0.90
## Number of obs: 177, groups:  ppt, 20
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept)   -1.019      0.734   -1.39    0.165   
## visit          1.007      0.373    2.70    0.007 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##       (Intr)
## visit -0.839
```
]
.pull-right[

```r
exp(fixef(impmod))
```

```
## (Intercept)       visit 
##      0.3611      2.7362
```
]

---

# interpreting coefficients

- `lm(y ~ x + ...)`
  - `\(\beta_x\)` denotes the change in the average `\(y\)` when `\(x\)` is increased by one unit and all other covariates are fixed.
- `lmer(y ~ x + ... + (1 + x + ... | cluster))`
  - `\(\beta_x\)` denotes the change in the average `\(y\)` when `\(x\)` is increased by one unit, averaged across clusters.
- `glmer(ybin ~ x + ... + (1 + x + ... | cluster), family=binomial)`
  - `\(e^{\beta_x}\)` denotes the multiplicative change in the odds of `\(y\)` when `\(x\)` is increased by one unit, __holding cluster constant__ (see the sketch on the next slide).
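---
count:false

# interpreting coefficients

a minimal sketch of moving between these scales, using the `impmod` fitted earlier (the predicted probability here is for a participant whose random effects are zero):

```r
# fixed effects are on the log-odds scale
fixef(impmod)

# exponentiating gives odds ratios: holding participant constant,
# the odds of impairment multiply by ~2.74 with each additional visit
exp(fixef(impmod))

# plogis() maps log-odds back to a probability, e.g. the predicted
# probability of impairment at visit 1 for a participant whose
# random effects are zero
plogis(fixef(impmod)[1] + fixef(impmod)[2] * 1)
```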
---

# why are glmer() coefficients cluster-specific?

consider a __linear__ multilevel model: `lmer(respiratory_rate ~ treatment + (1|hospital))`

Imagine two patients from different hospitals. One has a treatment, one does not.

- patient `\(j\)` from hospital `\(i\)` is "control"
- patient `\(j'\)` from hospital `\(i'\)` is "treatment"

The difference in estimated outcome between patient `\(j\)` and patient `\(j'\)` is "the effect of having treatment" plus the distance in random deviations between hospitals `\(i\)` and `\(i'\)`

model for patient `\(j\)` from hospital `\(i\)`

`\(\hat{y}_{ij} = (\gamma_{00} + \zeta_{0i}) + \beta_1 (Treatment_{ij} = 0)\)`

model for patient `\(j'\)` from hospital `\(i'\)`

`\(\hat{y}_{i'j'} = (\gamma_{00} + \zeta_{0i'}) + \beta_1 (Treatment_{i'j'} = 1)\)`

difference: `\(\hat{y}_{i'j'} - \hat{y}_{ij} = \beta_1 + (\zeta_{0i'} - \zeta_{0i})\)`

Because `\(\zeta \sim N(0,\sigma_\zeta)\)`, the differences `\(\zeta_{0i'} - \zeta_{0i}\)` average out to be 0 across all such pairs of hospitals, so the difference is, on average, simply `\(\beta_1\)`.

---

# why are glmer() coefficients cluster-specific?

consider a __logistic__ multilevel model: `glmer(needs_op ~ treatment + (1|hospital), family="binomial")`

Imagine two patients from different hospitals. One has a treatment, one does not.

- patient `\(j\)` from hospital `\(i\)` is "control"
- patient `\(j'\)` from hospital `\(i'\)` is "treatment"

The difference in __probability of outcome__ between patient `\(j\)` and patient `\(j'\)` is "the effect of having treatment" plus the distance in random deviations between hospitals `\(i\)` and `\(i'\)`

model for patient `\(j\)` from hospital `\(i\)`

`\(log \left( \frac{p_{ij}}{1 - p_{ij}} \right) = (\gamma_{00} + \zeta_{0i}) + \beta_1 (Treatment_{ij} = 0)\)`

model for patient `\(j'\)` from hospital `\(i'\)`

`\(log \left( \frac{p_{i'j'}}{1 - p_{i'j'}} \right) = (\gamma_{00} + \zeta_{0i'}) + \beta_1 (Treatment_{i'j'} = 1)\)`

difference (log odds): `\(log \left( \frac{p_{i'j'}}{1 - p_{i'j'}} \right) - log \left( \frac{p_{ij}}{1 - p_{ij}} \right) = \beta_1 + (\zeta_{0i'} - \zeta_{0i})\)`

---

# why are glmer() coefficients cluster-specific?

consider a __logistic__ multilevel model: `glmer(needs_op ~ treatment + (1|hospital), family="binomial")`

Imagine two patients from different hospitals. One has a treatment, one does not.

- patient `\(j\)` from hospital `\(i\)` is "control"
- patient `\(j'\)` from hospital `\(i'\)` is "treatment"

The difference in __probability of outcome__ between patient `\(j\)` and patient `\(j'\)` is "the effect of having treatment" plus the distance in random deviations between hospitals `\(i\)` and `\(i'\)`

model for patient `\(j\)` from hospital `\(i\)`

`\(log \left( \frac{p_{ij}}{1 - p_{ij}} \right) = (\gamma_{00} + \zeta_{0i}) + \beta_1 (Treatment_{ij} = 0)\)`

model for patient `\(j'\)` from hospital `\(i'\)`

`\(log \left( \frac{p_{i'j'}}{1 - p_{i'j'}} \right) = (\gamma_{00} + \zeta_{0i'}) + \beta_1 (Treatment_{i'j'} = 1)\)`

difference (odds ratio): `\(\frac{p_{i'j'}/(1 - p_{i'j'})}{p_{ij}/(1 - p_{ij})} = \exp(\beta_1 + (\zeta_{0i'} - \zeta_{0i}))\)`

---

# why are glmer() coefficients cluster-specific?

consider a __logistic__ multilevel model: `glmer(needs_op ~ treatment + (1|hospital), family="binomial")`

Imagine two patients from different hospitals. One has a treatment, one does not.

- patient `\(j\)` from hospital `\(i\)` is "control"
- patient `\(j'\)` from hospital `\(i'\)` is "treatment"

The difference in __probability of outcome__ between patient `\(j\)` and patient `\(j'\)` is "the effect of having treatment" plus the distance in random deviations between hospitals `\(i\)` and `\(i'\)`

model for patient `\(j\)` from hospital `\(i\)`

`\(log \left( \frac{p_{ij}}{1 - p_{ij}} \right) = (\gamma_{00} + \zeta_{0i}) + \beta_1 (Treatment_{ij} = 0)\)`

model for patient `\(j'\)` from hospital `\(i'\)`

`\(log \left( \frac{p_{i'j'}}{1 - p_{i'j'}} \right) = (\gamma_{00} + \zeta_{0i'}) + \beta_1 (Treatment_{i'j'} = 1)\)`

difference (odds ratio): `\(\frac{p_{i'j'}/(1 - p_{i'j'})}{p_{ij}/(1 - p_{ij})} = \exp(\beta_1 + (\zeta_{0i'} - \zeta_{0i})) \neq \exp(\beta_1)\)`

unlike in the linear case, these random deviations do not average out on this scale: because `\(\exp()\)` is nonlinear, the average of `\(\exp(\beta_1 + (\zeta_{0i'} - \zeta_{0i}))\)` over hospitals is not `\(\exp(\beta_1)\)`.

---

# why are glmer() coefficients cluster-specific?

consider a __logistic__ multilevel model: `glmer(needs_op ~ treatment + (1|hospital), family="binomial")`

Hence, the interpretation of `\(e^{\beta_1}\)` is not the odds ratio for the effect of treatment "averaged over hospitals", but rather for patients _from the same hospital_.
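---

# why are glmer() coefficients cluster-specific?

we can see this numerically with a quick simulation sketch (the values of `b0`, `b1`, and the random-intercept sd below are made up purely for illustration):

```r
set.seed(987)
b0 <- 0      # fixed intercept, on the log-odds scale
b1 <- 1      # treatment effect, so the within-hospital OR is exp(1)
zeta <- rnorm(1e5, mean = 0, sd = 2)   # random intercepts for many hospitals

# probability of the outcome for control and treated patients in each hospital
p_con <- plogis(b0 + zeta)
p_trt <- plogis(b0 + b1 + zeta)

# odds ratio computed from probabilities averaged across hospitals
odds <- function(p) p / (1 - p)
odds(mean(p_trt)) / odds(mean(p_con))  # ≈ 1.9: smaller than...
exp(b1)                                # ...the within-hospital OR of ≈ 2.72
```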
---

# Summary

- Differences between linear and logistic multilevel models are analogous to the differences between single-level linear and logistic regression models.
- Odds Ratios from fixed effects in logistic multilevel models are "conditional upon" holding the cluster constant.

---
class: inverse, center, middle, animated, rotateInDownLeft

# End