class: center, middle, inverse, title-slide

.title[
# Bonus Content: GLMM (OPTIONAL)
]
.subtitle[
## Data Analysis for Psychology in R 3
]
.author[
### Josiah King
]
.institute[
### Department of Psychology
The University of Edinburgh
]

---

# lm() and glm()

.pull-left[
### lm()

$$
`\begin{align}
& \color{red}{y} = \color{blue}{\beta_0 + \beta_1(x_1) + ... + \beta_k(x_k)} + \boldsymbol{\varepsilon} \\
\end{align}`
$$
]

---
count:false

# lm() and glm()

.pull-left[
### lm()

$$
`\begin{align}
& \color{red}{y} = \color{blue}{\underbrace{\beta_0 + \beta_1(x_1) + ... + \beta_k(x_k)}_{\mathbf{X \boldsymbol{\beta}}}} + \boldsymbol{\varepsilon} \\
\end{align}`
$$
]

---
count:false

# lm() and glm()

.pull-left[
### lm()

$$
`\begin{align}
& \color{red}{y} = \color{blue}{\underbrace{\beta_0 + \beta_1(x_1) + ... + \beta_k(x_k)}_{\mathbf{X \boldsymbol{\beta}}}} + \boldsymbol{\varepsilon} \\
& \text{where } -\infty \leq \color{red}{y} \leq \infty \\
\end{align}`
$$
]

--

.pull-right[
###

$$
`\begin{align}
\color{red}{??} = & \color{blue}{\underbrace{\beta_0 + \beta_1(x_1) + ... + \beta_k(x_k)}_{\mathbf{X \boldsymbol{\beta}}}} + \boldsymbol{\varepsilon} \\
\end{align}`
$$
]

---
count:false

# lm() and glm()

.pull-left[
### lm()

$$
`\begin{align}
& \color{red}{y} = \color{blue}{\underbrace{\beta_0 + \beta_1(x_1) + ... + \beta_k(x_k)}_{\mathbf{X \boldsymbol{\beta}}}} + \boldsymbol{\varepsilon} \\
& \text{where } -\infty \leq \color{red}{y} \leq \infty \\
\end{align}`
$$
]

.pull-right[
### glm()

$$
`\begin{align}
\color{red}{ln \left( \frac{p}{1-p} \right) } & = \color{blue}{\underbrace{\beta_0 + \beta_1(x_1) + ... + \beta_k(x_k)}_{\mathbf{X \boldsymbol{\beta}}}} + \boldsymbol{\varepsilon} \\
& \text{where } 0 \leq p \leq 1 \\
\end{align}`
$$
]

---
count:false

# lm() and glm()

.pull-left[
### lm()

$$
`\begin{align}
& \color{red}{y} = \color{blue}{\underbrace{\beta_0 + \beta_1(x_1) + ... + \beta_k(x_k)}_{\mathbf{X \boldsymbol{\beta}}}} + \boldsymbol{\varepsilon} \\
& \text{where } -\infty \leq \color{red}{y} \leq \infty \\
\end{align}`
$$
]

.pull-right[
### glm()

$$
`\begin{align}
\color{red}{ln \left( \frac{p}{1-p} \right) } & = \color{blue}{\underbrace{\beta_0 + \beta_1(x_1) + ... + \beta_k(x_k)}_{\mathbf{X \boldsymbol{\beta}}}} + \boldsymbol{\varepsilon} \\
& \text{where } 0 \leq p \leq 1 \\
\end{align}`
$$

glm() is the __generalised__ linear model. we can specify the link function to model outcomes with different distributions. this allows us to fit models such as the _logistic_ regression model:

```r
glm(y ~ x, family = binomial(link = "logit"))
```
]

---

# logistic regression visualised

.pull-left[
### continuous outcome

<br><br>

![](dapr3_2324_04c_glmer_files/figure-html/unnamed-chunk-1-1.svg)<!-- -->
]
.pull-right[
### binary outcome

<br><br>

![](dapr3_2324_04c_glmer_files/figure-html/unnamed-chunk-2-1.svg)<!-- -->
]

---
count:false

# logistic regression visualised

.pull-left[
### linear regression

we model __y__ directly as a linear combination of one or more predictor variables

![](dapr3_2324_04c_glmer_files/figure-html/unnamed-chunk-3-1.svg)<!-- -->
]
.pull-right[
### logistic regression

__probability__ is _not_ linear...

but we can model it indirectly

![](dapr3_2324_04c_glmer_files/figure-html/unnamed-chunk-4-1.svg)<!-- -->
]

---
count:false

# logistic regression visualised

`\(ln \left( \frac{p}{1-p} \right)\)` __log-odds__ are linear

<img src="jk_img_sandbox/logoddp.png" width="1977" height="400px" />
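---
count:false

# logistic regression visualised

since the log-odds scale is where the model is linear, it helps to be able to move between probabilities and log-odds directly. a quick sketch in base R (the probabilities here are arbitrary examples):

```r
# qlogis() gives the log-odds (logit) of a probability: log(p/(1-p))
qlogis(c(0.1, 0.5, 0.9))
# ≈ -2.197  0.000  2.197  (symmetric around p = 0.5)

# plogis() is the inverse, mapping any real number back into (0, 1)
plogis(c(-2.197, 0, 2.197))
# ≈ 0.1  0.5  0.9
```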
---

# lmer() and glmer()

.pull-left[
![](dapr3_2324_04c_glmer_files/figure-html/unnamed-chunk-7-1.svg)<!-- -->
]
.pull-right[
![](dapr3_2324_04c_glmer_files/figure-html/unnamed-chunk-8-1.svg)<!-- -->
]

---
count:false

# lmer() and glmer()

.pull-left[
![](dapr3_2324_04c_glmer_files/figure-html/unnamed-chunk-9-1.svg)<!-- -->
]
.pull-right[
![](dapr3_2324_04c_glmer_files/figure-html/unnamed-chunk-10-1.svg)<!-- -->
]

---

# fitting glmer()

.pull-left[
> In a study examining how cognition changes over time, a sample of 20 participants took the Addenbrooke's Cognitive Examination (ACE) every 2 years from age 60 to age 78. Researchers are interested in whether the **probability of being classified as cognitively impaired** changes over age.
]
.pull-right[

```r
d3 <- read_csv("https://uoepsy.github.io/data/dapr3_mindfuldecline.csv") %>%
  mutate(isImp = ifelse(imp == "imp", 1, 0))
d3 %>% select(ppt, visit, isImp) %>% head()
```

```
## # A tibble: 6 × 3
##   ppt   visit isImp
##   <chr> <dbl> <dbl>
## 1 PPT_1     1     0
## 2 PPT_1     2     1
## 3 PPT_1     3     1
## 4 PPT_1     4     1
## 5 PPT_1     5     1
## 6 PPT_1     6     1
```
]

```r
impmod <- glmer(isImp ~ 1 + visit + (1 + visit | ppt),
                data = d3,
                family = "binomial",
                control = glmerControl(optimizer = "bobyqa"))
```

---

# fitting glmer()

.pull-left[

```r
summary(impmod)
```

```
## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [
## glmerMod]
##  Family: binomial  ( logit )
## Formula: isImp ~ 1 + visit + (1 + visit | ppt)
##    Data: d3
## Control: glmerControl(optimizer = "bobyqa")
## 
##      AIC      BIC   logLik deviance df.resid 
##    122.7    138.5    -56.3    112.7      172 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.4107  0.0050  0.0512  0.2700  1.4308 
## 
## Random effects:
##  Groups Name        Variance Std.Dev. Corr 
##  ppt    (Intercept) 3.50     1.87          
##         visit       1.22     1.10     -0.90
## Number of obs: 177, groups:  ppt, 20
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept)   -1.019      0.734   -1.39    0.165   
## visit          1.007      0.373    2.70    0.007 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##       (Intr)
## visit -0.839
```
]
.pull-right[

```r
exp(fixef(impmod))
```

```
## (Intercept)       visit 
##      0.3611      2.7362
```
]

---

# interpreting coefficients

- `lm(y ~ x + ...)`
  - `\(\beta_x\)` denotes the change in the average `\(y\)` when `\(x\)` is increased by one unit and all other covariates are fixed.
- `lmer(y ~ x + ... + (1 + x + ... | cluster))`
  - `\(\beta_x\)` denotes the change in the average `\(y\)` when `\(x\)` is increased by one unit, averaged across clusters.
- `glmer(ybin ~ x + ... + (1 + x + ... | cluster), family=binomial)`
  - `\(e^{\beta_x}\)` denotes the multiplicative change in the odds of `\(y\)` when `\(x\)` is increased by one unit, __holding cluster constant__ (see the sketch on the next slide).
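---
count:false

# interpreting coefficients

a minimal sketch of moving between these scales, using the `impmod` fitted earlier (the predicted probability here is for a participant whose random effects are zero):

```r
# fixed effects are on the log-odds scale
fixef(impmod)

# exponentiating gives odds ratios: holding participant constant,
# the odds of impairment multiply by ~2.74 with each additional visit
exp(fixef(impmod))

# plogis() maps log-odds back to a probability, e.g. the predicted
# probability of impairment at visit 1 for a participant whose
# random effects are zero
plogis(fixef(impmod)[1] + fixef(impmod)[2] * 1)
```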
---

# why are glmer() coefficients cluster-specific?

consider a __linear__ multilevel model: `lmer(respiratory_rate ~ treatment + (1|hospital))`

Imagine two patients from different hospitals. One has a treatment, one does not.

- patient `\(j\)` from hospital `\(i\)` is "control"
- patient `\(j'\)` from hospital `\(i'\)` is "treatment"

The difference in estimated outcome between patient `\(j\)` and patient `\(j'\)` is "the effect of having treatment" plus the distance in random deviations between hospitals `\(i\)` and `\(i'\)`

model for patient `\(j\)` from hospital `\(i\)`

`\(\hat{y}_{ij} = (\gamma_{00} + \zeta_{0i}) + \beta_1 (Treatment_{ij} = 0)\)`

model for patient `\(j'\)` from hospital `\(i'\)`

`\(\hat{y}_{i'j'} = (\gamma_{00} + \zeta_{0i'}) + \beta_1 (Treatment_{i'j'} = 1)\)`

difference: `\(\hat{y}_{i'j'} - \hat{y}_{ij} = \beta_1 + (\zeta_{0i'} - \zeta_{0i})\)`

Because `\(\zeta \sim N(0,\sigma_\zeta)\)`, the differences `\(\zeta_{0i'} - \zeta_{0i}\)` average out to be 0 across all such pairs of hospitals, so the difference is, on average, simply `\(\beta_1\)`.

---

# why are glmer() coefficients cluster-specific?

consider a __logistic__ multilevel model: `glmer(needs_op ~ treatment + (1|hospital), family="binomial")`

Imagine two patients from different hospitals. One has a treatment, one does not.

- patient `\(j\)` from hospital `\(i\)` is "control"
- patient `\(j'\)` from hospital `\(i'\)` is "treatment"

The difference in __probability of outcome__ between patient `\(j\)` and patient `\(j'\)` is "the effect of having treatment" plus the distance in random deviations between hospitals `\(i\)` and `\(i'\)`

model for patient `\(j\)` from hospital `\(i\)`

`\(log \left( \frac{p_{ij}}{1 - p_{ij}} \right) = (\gamma_{00} + \zeta_{0i}) + \beta_1 (Treatment_{ij} = 0)\)`

model for patient `\(j'\)` from hospital `\(i'\)`

`\(log \left( \frac{p_{i'j'}}{1 - p_{i'j'}} \right) = (\gamma_{00} + \zeta_{0i'}) + \beta_1 (Treatment_{i'j'} = 1)\)`

difference (log odds): `\(log \left( \frac{p_{i'j'}}{1 - p_{i'j'}} \right) - log \left( \frac{p_{ij}}{1 - p_{ij}} \right) = \beta_1 + (\zeta_{0i'} - \zeta_{0i})\)`

---

# why are glmer() coefficients cluster-specific?

consider a __logistic__ multilevel model: `glmer(needs_op ~ treatment + (1|hospital), family="binomial")`

Imagine two patients from different hospitals. One has a treatment, one does not.

- patient `\(j\)` from hospital `\(i\)` is "control"
- patient `\(j'\)` from hospital `\(i'\)` is "treatment"

The difference in __probability of outcome__ between patient `\(j\)` and patient `\(j'\)` is "the effect of having treatment" plus the distance in random deviations between hospitals `\(i\)` and `\(i'\)`

model for patient `\(j\)` from hospital `\(i\)`

`\(log \left( \frac{p_{ij}}{1 - p_{ij}} \right) = (\gamma_{00} + \zeta_{0i}) + \beta_1 (Treatment_{ij} = 0)\)`

model for patient `\(j'\)` from hospital `\(i'\)`

`\(log \left( \frac{p_{i'j'}}{1 - p_{i'j'}} \right) = (\gamma_{00} + \zeta_{0i'}) + \beta_1 (Treatment_{i'j'} = 1)\)`

difference (odds ratio): `\(\frac{p_{i'j'}/(1 - p_{i'j'})}{p_{ij}/(1 - p_{ij})} = \exp(\beta_1 + (\zeta_{0i'} - \zeta_{0i}))\)`

---

# why are glmer() coefficients cluster-specific?

consider a __logistic__ multilevel model: `glmer(needs_op ~ treatment + (1|hospital), family="binomial")`

Imagine two patients from different hospitals. One has a treatment, one does not.

- patient `\(j\)` from hospital `\(i\)` is "control"
- patient `\(j'\)` from hospital `\(i'\)` is "treatment"

The difference in __probability of outcome__ between patient `\(j\)` and patient `\(j'\)` is "the effect of having treatment" plus the distance in random deviations between hospitals `\(i\)` and `\(i'\)`

model for patient `\(j\)` from hospital `\(i\)`

`\(log \left( \frac{p_{ij}}{1 - p_{ij}} \right) = (\gamma_{00} + \zeta_{0i}) + \beta_1 (Treatment_{ij} = 0)\)`

model for patient `\(j'\)` from hospital `\(i'\)`

`\(log \left( \frac{p_{i'j'}}{1 - p_{i'j'}} \right) = (\gamma_{00} + \zeta_{0i'}) + \beta_1 (Treatment_{i'j'} = 1)\)`

difference (odds ratio): `\(\frac{p_{i'j'}/(1 - p_{i'j'})}{p_{ij}/(1 - p_{ij})} = \exp(\beta_1 + (\zeta_{0i'} - \zeta_{0i})) \neq \exp(\beta_1)\)`

unlike in the linear case, these random deviations do not average out on this scale: because `\(\exp()\)` is nonlinear, the average of `\(\exp(\beta_1 + (\zeta_{0i'} - \zeta_{0i}))\)` over hospitals is not `\(\exp(\beta_1)\)`.

---

# why are glmer() coefficients cluster-specific?

consider a __logistic__ multilevel model: `glmer(needs_op ~ treatment + (1|hospital), family="binomial")`

Hence, the interpretation of `\(e^{\beta_1}\)` is not the odds ratio for the effect of treatment "averaged over hospitals", but rather for patients _from the same hospital_.
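---

# why are glmer() coefficients cluster-specific?

we can see this numerically with a quick simulation sketch (the values of `b0`, `b1`, and the random-intercept sd below are made up purely for illustration):

```r
set.seed(987)
b0 <- 0      # fixed intercept, on the log-odds scale
b1 <- 1      # treatment effect, so the within-hospital OR is exp(1)
zeta <- rnorm(1e5, mean = 0, sd = 2)   # random intercepts for many hospitals

# probability of the outcome for control and treated patients in each hospital
p_con <- plogis(b0 + zeta)
p_trt <- plogis(b0 + b1 + zeta)

# odds ratio computed from probabilities averaged across hospitals
odds <- function(p) p / (1 - p)
odds(mean(p_trt)) / odds(mean(p_con))  # ≈ 1.9: smaller than...
exp(b1)                                # ...the within-hospital OR of ≈ 2.72
```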
---

# Summary

- Differences between linear and logistic multilevel models are analogous to the differences between single-level linear and logistic regression models.
- Odds Ratios from fixed effects in logistic multilevel models are "conditional upon" holding the cluster constant.

---
class: inverse, center, middle, animated, rotateInDownLeft

# End