class: center, middle, inverse, title-slide .title[ #
Linear Models and Clustered Data
] .subtitle[ ## Data Analysis for Psychology in R 3 ] .author[ ### Josiah King ] .institute[ ### Department of Psychology
The University of Edinburgh ] --- class: inverse, center, middle <h2>Part 1: Linear Regression Refresh</h2> <h2 style="text-align: left;opacity:0.3;">Part 2: Clustered Data</h2> <h2 style="text-align: left;opacity:0.3;">Part 3: Where we're going</h2> <h2 style="text-align: left;opacity:0.3;">Extra slides (optional): Other Approaches</h2> --- # Models .pull-left[ __deterministic__ given the same input, deterministic functions return *exactly* the same output - `\(y = mx + c\)` - area of sphere = `\(4\pi r^2\)` - height of fall = `\(1/2 g t^2\)` - `\(g = \textrm{gravitational constant, }9.8m/s^2\)` - `\(t = \textrm{time (in seconds) of fall}\)` ] -- .pull-right[ __statistical__ .br2.f4.white.bg-gray[ $$ \textrm{outcome} = (\textrm{model}) + \color{black}{\textrm{error}} $$ ] - handspan = height + randomness - cognitive test score = age + premorbid IQ + ... + randomness ] --- # The Linear Model .br3.pa2.f2[ $$ `\begin{align} \color{red}{\textrm{outcome}} & = \color{blue}{(\textrm{model})} + \textrm{error} \\ \color{red}{y_i} & = \color{blue}{\beta_0 \cdot{} 1 + \beta_1 \cdot{} x_i} + \varepsilon_i \\ \text{where } \\ \varepsilon_i & \sim N(0, \sigma) \text{ independently} \\ \end{align}` $$ ] --- # The Linear Model .flex.items-top[ .w-50.pa2[ Our proposed model of the world: `\(\color{red}{y_i} = \color{blue}{\beta_0 \cdot{} 1 + \beta_1 \cdot{} x_i} + \varepsilon_i\)` {{content}} ] .w-50.pa2[ ![](01_lmcluster_files/figure-html/bb-1.svg)<!-- --> ]] -- Our model _fitted_ to some data (note the `\(\widehat{\textrm{hats}}\)`): `\(\hat{y}_i = \color{blue}{\hat \beta_0 \cdot{} 1 + \hat \beta_1 \cdot{} x_i}\)` {{content}} -- For the `\(i^{th}\)` observation: - `\(\color{red}{y_i}\)` is the value we observe for `\(x_i\)` - `\(\hat{y}_i\)` is the value the model _predicts_ for `\(x_i\)` - `\(\color{red}{y_i} = \hat{y}_i + \hat\varepsilon_i\)` --- # An Example .flex.items-top[ .w-50.pa2[ `\(\color{red}{y_i} = \color{blue}{5 \cdot{} 1 + 2 \cdot{} x_i} + \hat\varepsilon_i\)` __e.g.__ for the observation `\(x_i = 1.2, \; y_i = 9.9\)`: $$ `\begin{align} \color{red}{9.9} & = \color{blue}{5 \cdot{}} 1 + \color{blue}{2 \cdot{}} 1.2 + \hat\varepsilon_i \\ & = 7.4 + \hat\varepsilon_i \\ & = 7.4 + 2.5 \\ \end{align}` $$ ] .w-50.pa2[ ![](01_lmcluster_files/figure-html/errplot-1.svg)<!-- --> ]] --- # Categorical Predictors .pull-left[ <br><table> <thead> <tr> <th style="text-align:left;"> y </th> <th style="text-align:left;"> x </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 7.99 </td> <td style="text-align:left;"> Category1 </td> </tr> <tr> <td style="text-align:left;"> 4.73 </td> <td style="text-align:left;"> Category0 </td> </tr> <tr> <td style="text-align:left;"> 3.66 </td> <td style="text-align:left;"> Category0 </td> </tr> <tr> <td style="text-align:left;"> 3.41 </td> <td style="text-align:left;"> Category0 </td> </tr> <tr> <td style="text-align:left;"> 5.75 </td> <td style="text-align:left;"> Category1 </td> </tr> <tr> <td style="text-align:left;"> 5.66 </td> <td style="text-align:left;"> Category0 </td> </tr> <tr> <td style="text-align:left;"> ... </td> <td style="text-align:left;"> ... 
</td> </tr> </tbody> </table> ] .pull-right[ ![](01_lmcluster_files/figure-html/unnamed-chunk-2-1.svg)<!-- --> ] --- # Multiple Predictors .pull-left[ ![](01_lmcluster_files/figure-html/unnamed-chunk-3-1.svg)<!-- --> ] -- .pull-right[ ![](01_lmcluster_files/figure-html/unnamed-chunk-4-1.svg)<!-- --> ] --- exclude: true # Multiple Predictors .pull-left[ <img src="jk_img_sandbox/SSblank1.png" width="400px" height="350px" style="display: block; margin: auto;" /> ] .pull-right[] --- # Multiple Predictors .pull-left[ <img src="jk_img_sandbox/SSblank1cov.png" width="400px" height="350px" style="display: block; margin: auto;" /> ] .pull-right[ ```r lm(y ~ x1) ``` ``` ## ## Call: ## lm(formula = y ~ x1) ## ## Coefficients: ## (Intercept) x1 ## 0.0469 0.1436 ``` ] --- # Multiple Predictors .pull-left[ <img src="jk_img_sandbox/SStype3.png" width="400px" height="350px" style="display: block; margin: auto;" /> ] .pull-right[ ```r lm(y ~ x1 + x2) ``` ``` ## ## Call: ## lm(formula = y ~ x1 + x2) ## ## Coefficients: ## (Intercept) x1 x2 ## 0.0428 0.1425 0.1273 ``` ```r resx1.x2 = resid(lm(x1 ~ x2)) lm(y ~ resx1.x2) ``` ``` ## ## Call: ## lm(formula = y ~ resx1.x2) ## ## Coefficients: ## (Intercept) resx1.x2 ## 0.0198 0.1425 ``` ] --- # Interactions .pull-left[ ![](01_lmcluster_files/figure-html/unnamed-chunk-12-1.svg)<!-- --> ] -- .pull-right[ ![](01_lmcluster_files/figure-html/unnamed-chunk-13-1.svg)<!-- --> ] --- # Notation `\(\begin{align} \color{red}{y} \;\;\;\; & = \;\;\;\;\; \color{blue}{\beta_0 \cdot{} 1 + \beta_1 \cdot{} x_1 + ... + \beta_k \cdot x_k} & + & \;\;\;\varepsilon \\ \qquad \\ \color{red}{\begin{bmatrix}y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ \vdots \\ y_n \end{bmatrix}} & = \color{blue}{\begin{bmatrix} 1 & x_{11} & x_{21} & \dots & x_{k1} \\ 1 & x_{12} & x_{22} & & x_{k2} \\ 1 & x_{13} & x_{23} & & x_{k3} \\ 1 & x_{14} & x_{24} & & x_{k4} \\ 1 & x_{15} & x_{25} & & x_{k5} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{1n} & x_{2n} & \dots & x_{kn} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix}} & + & \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \vdots \\ \varepsilon_n \end{bmatrix} \\ \qquad \\ \\\color{red}{\boldsymbol y} \;\;\;\;\; & = \qquad \qquad \;\;\; \mathbf{\color{blue}{X \qquad \qquad \qquad \;\;\;\: \boldsymbol \beta}} & + & \;\;\; \boldsymbol \varepsilon \\ \end{align}\)` --- # Link functions `\(\begin{align} \color{red}{y} = \mathbf{\color{blue}{X \boldsymbol{\beta}} + \boldsymbol{\varepsilon}} & \qquad & (-\infty, \infty) \\ \qquad \\ \qquad \\ \color{red}{ln \left( \frac{p}{1-p} \right) } = \mathbf{\color{blue}{X \boldsymbol{\beta}} + \boldsymbol{\varepsilon}} & \qquad & [0,1] \\ \qquad \\ \qquad \\ \color{red}{ln (y) } = \mathbf{\color{blue}{X \boldsymbol{\beta}} + \boldsymbol{\varepsilon}} & \qquad & (0, \infty) \\ \end{align}\)` --- # Linear Models in R - Linear regression ```r linear_model <- lm(continuous_y ~ x1 + x2 + x3*x4, data = df) ``` - Logistic regression ```r logistic_model <- glm(binary_y ~ x1 + x2 + x3*x4, data = df, family=binomial(link="logit")) ``` - Poisson regression ```r poisson_model <- glm(count_y ~ x1 + x2 + x3*x4, data = df, family=poisson(link="log")) ``` --- # Inference for Coefficients <img src="jk_img_sandbox/sum1.png" width="2560" height="500px" /> --- # Inference for Coefficients <img src="jk_img_sandbox/sum2.png" width="2560" height="500px" /> --- # Inference for Coefficients <img src="jk_img_sandbox/sum3.png" width="2560" 
height="500px" /> --- # Inference for Coefficients <img src="jk_img_sandbox/sum4.png" width="2560" height="500px" /> --- # Sums of Squares Rather than specific coefficients, we can also think of our model as a whole. We can talk in terms of the sums of squared residuals `\(\sum^{n}_{i=1}(y_i - \hat y_i)^2\)`. .pull-left[ <img src="jk_img_sandbox/SS3xmodel1.png" width="400px" height="350px" style="display: block; margin: auto;" /> ] .pull-right[ ![](01_lmcluster_files/figure-html/unnamed-chunk-22-1.svg)<!-- --> ] --- # `\(R^2\)` .pull-left[ <img src="jk_img_sandbox/SSr2.png" width="400px" height="350px" style="display: block; margin: auto;" /> ] .pull-right[ `\(R^2 = \frac{SS_{Model}}{SS_{Total}} = 1 - \frac{SS_{Residual}}{SS_{Total}}\)` ```r mdl <- lm(y ~ x1 + x2) summary(mdl)$r.squared ``` ``` ## [1] 0.1935 ``` ] --- # Inference: Joint tests We can test reduction in residual sums of squares: .pull-left[ ```r m1 <- lm(y ~ x1, data = df) ``` <img src="jk_img_sandbox/SS3xmodel1.png" width="400px" height="350px" style="display: block; margin: auto;" /> ] .pull-right[ ```r m2 <- lm(y ~ x1 + x2 + x3, data = df) ``` <img src="jk_img_sandbox/SS3xfullmodel.png" width="400px" height="350px" style="display: block; margin: auto;" /> ] --- # Inference: Joint tests i.e. isolating the improvement in model fit due to inclusion of additional parameters .pull-left[ ```r m1 <- lm(y ~ x1, data = df) m2 <- lm(y ~ x1 + x2 + x3, data = df) anova(m1,m2) ``` ``` ## Analysis of Variance Table ## ## Model 1: y ~ x1 ## Model 2: y ~ x1 + x2 + x3 ## Res.Df RSS Df Sum of Sq F Pr(>F) ## 1 98 1141 ## 2 96 357 2 785 106 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ] .pull-right[ <img src="jk_img_sandbox/SS3xincrement.png" width="400px" height="350px" style="display: block; margin: auto;" /> ] --- # Inference: Joint tests "additional parameters" could be a set of coefficients for levels of a categorical variable. This provides a way of assessing "are there differences in group means?". .pull-left[ ```r m1 = lm(y ~ x1, data = df) m2 = lm(y ~ x1 + grp, data = df) coef(m2) ``` ``` ## (Intercept) x1 grpb grpc grpd ## 13.841 1.059 -3.162 -2.943 -3.415 ``` ```r anova(m1, m2) ``` ``` ## Analysis of Variance Table ## ## Model 1: y ~ x1 ## Model 2: y ~ x1 + grp ## Res.Df RSS Df Sum of Sq F Pr(>F) ## 1 98 1141 ## 2 95 950 3 192 6.39 0.00055 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ] .pull-right[ <img src="jk_img_sandbox/SSblankq.png" width="400px" height="350px" style="display: block; margin: auto;" /> ] --- exclude: true # Inference: Joint tests This is kind of where traditional analysis of variance sits. Think of it as testing the addition of each variable entered in to the model, .pull-left[ ```r m2 = lm(y ~ x1 + grp, data = df) anova(m2) ``` ``` ## Analysis of Variance Table ## ## Response: y ## Df Sum Sq Mean Sq F value Pr(>F) ## x1 1 91 90.6 9.06 0.00334 ** ## grp 3 192 63.9 6.39 0.00055 *** ## Residuals 95 950 10.0 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 ``` ] .pull-right[ <img src="jk_img_sandbox/SStype1.png" width="400px" height="350px" style="display: block; margin: auto;" /> ] --- # Assumptions Our model: `\(\color{red}{y} = \color{blue}{\mathbf{X \boldsymbol \beta}} + \varepsilon \qquad \text{where } \boldsymbol \varepsilon \sim N(0, \sigma) \text{ independently}\)` Our ability to generalise from the model we fit on sample data to the wider population requires making some _assumptions._ -- - assumptions about the nature of the **model** .tr[ (linear) ] -- - assumptions about the nature of the **errors** .tr[ (normal) ] .footnote[ You can also phrase the linear model as: `\(\color{red}{\boldsymbol y} \sim Normal(\color{blue}{\mathbf{X \boldsymbol \beta}}, \sigma)\)` ] --- # Assumptions: The Broad Idea All our work here is in aim of making **models of the world**. - Models are models. They are simplifications. They are therefore wrong. .pull-left[] .pull-right[ ![](jk_img_sandbox/joeymap.jpg) ] --- count:false # Assumptions: The Broad Idea All our work here is in aim of making **models of the world**. - Models are models. They are simplifications. They are therefore wrong. - Our residuals ( `\(y - \hat{y}\)` ) reflect everything that we **don't** account for in our model. -- - In an ideal world, our model accounts for _all_ the systematic relationships. The leftovers (our residuals) are just random noise. -- - If our model is mis-specified, or we don't measure some systematic relationship, then our residuals will reflect this. -- - We check by examining how much "like randomness" the residuals appear to be (zero mean, normally distributed, constant variance, i.i.d ("independent and identically distributed") - _these ideas tend to get referred to as our "assumptions"_ -- - We will **never** know whether our residuals contain only randomness - we can never observe everything! --- # Assumptions .pull-left[ What does "zero mean and constant variance" look like? - mean of the residuals = zero across the predicted values on the linear predictor. - spread of residuals is normally distributed and constant across the predicted values on the linear predictor. ] .pull-right[ ![](01_lmcluster_files/figure-html/unnamed-chunk-37-1.svg)<!-- --> ] --- # Assumptions .pull-left[ What does "zero mean and constant variance" look like? - __mean of the residuals = zero across the predicted values on the linear predictor.__ - spread of residuals is normally distributed and constant across the predicted values on the linear predictor. ] .pull-right[ ![](01_lmcluster_files/figure-html/unnamed-chunk-38-1.svg)<!-- --> ] --- # Assumptions .pull-left[ What does "zero mean and constant variance" look like? - mean of the residuals = zero across the predicted values on the linear predictor. - __spread of residuals is normally distributed and constant across the predicted values on the linear predictor.__ ] .pull-right[ ![](01_lmcluster_files/figure-html/unnamed-chunk-39-1.svg)<!-- --> ] --- # Assumptions .pull-left[ What does "zero mean and constant variance" look like? - mean of the residuals = zero across the predicted values on the linear predictor. - spread of residuals is normally distributed and constant across the predicted values on the linear predictor. 
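A compact way to eyeball all of the standard diagnostic panels at once (a sketch, assuming a fitted `lm` object such as the `my_model` shown opposite):

```r
# the four default diagnostic plots for an lm object, arranged in a 2x2 grid
par(mfrow = c(2, 2))
plot(my_model)
par(mfrow = c(1, 1))
```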
] .pull-right[ __`plot(model)`__ ```r my_model <- lm(y ~ x, data = df) plot(my_model, which = 1) ``` ![](01_lmcluster_files/figure-html/unnamed-chunk-41-1.svg)<!-- --> ] --- # Assumptions: Recipe Book <br><br> <center> <div class="acronym"> L </div> inearity<br> <div class="acronym"> I </div> ndependence<br> <div class="acronym"> N </div> ormality<br> <div class="acronym"> E </div> qual variance<br> </center> .footnote["Line without N is a Lie!" (Umberto)] --- # What if our model doesn't meet assumptions? - is our model mis-specified? - is the relationship non-linear? higher order terms? (e.g. `\(y \sim x + x^2\)`) - is there an omitted variable or interaction term? -- - transform the outcome variable? - makes things look more "normal" - but can make things more tricky to interpret: `lm(y ~ x)` and `lm(log(y) ~ x)` are quite different models -- - bootstrap - do many times: resample (w/ replacement) your data, and refit your model. - obtain a distribution of parameter estimate of interest. - compute a confidence interval for estimate - celebrate -- __looking ahead:__ these don't help if we have violated our assumption of independence... --- # Summary - we can fit a linear regression model which takes the form `\(\color{red}{y} = \color{blue}{\mathbf{X} \boldsymbol{\beta}} + \boldsymbol{\varepsilon}\)` - in R, we fit this with `lm(y ~ x1 + .... xk, data = mydata)`. - we can extend this to different link functions to model outcome variables which follow different distributions. - when drawing inferences from a fitted model to the broader population, we rely on certain assumptions. - one of these is that the errors are independent. --- class: inverse, center, middle, animated, rotateInDownLeft # End of Part 1 --- class: inverse, center, middle <h2 style="text-align: left;opacity:0.3;">Part 1: Linear Regression Refresh</h2> <h2>Part 2: Clustered Data</h2> <h2 style="text-align: left;opacity:0.3;">Part 3: Where we're going</h2> <h2 style="text-align: left;opacity:0.3;">Extra slides (optional): Other Approaches</h2> --- # Clustered Data .pull-left[ - children within schools - patients within clinics - observations within individuals ] .pull-left[ <img src="jk_img_sandbox/h2.png" width="1716" /> ] --- # Clustered Clustered Data? .pull-left[ - children within classrooms within schools within districts etc... - patients within doctors within hospitals... - time-periods within trials within individuals ] .pull-right[ <img src="jk_img_sandbox/h3.png" width="1716" /> ] .footnote[ Other relevant terms you will tend to see: "grouping structure", "levels", "hierarchies". ] --- # Importance of Clustering Clustering will likely result in measurements on observational units within a given cluster being more similar to each other than to those in other clusters. - For example, our measure of academic performance for children in a given class will tend to be more similar to one another (because of class specific things such as the teacher) than to children in other classes. <img src="jk_img_sandbox/lev1.png" width="60%" style="display: block; margin: auto;" /> --- # ICC (intra-class correlation coefficient) Clustering is expressed in terms of the correlation among the measurements within the same cluster - known as the __intra-class correlation coefficient (ICC).__ There are various formulations of ICC, but the basic principle = ratio of *variance between groups* to *total variance*. 
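One way to make that ratio concrete (a sketch, not the only formulation): fit a random-intercept model with **lme4** (introduced properly later in the course) and take the ratio of the estimated variance components — the formula below defines the two variances. The data frame `df`, outcome `y`, and cluster identifier `cluster` here are placeholders.

```r
# sketch: ICC from the variance components of a random-intercept model
library(lme4)
ri_mod <- lmer(y ~ 1 + (1 | cluster), data = df)
vc <- as.data.frame(VarCorr(ri_mod))   # between-cluster and residual variances
vc$vcov[1] / sum(vc$vcov)              # sigma^2_b / (sigma^2_b + sigma^2_e)
```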
<br> `\(\rho = \frac{\sigma^2_{b}}{\sigma^2_{b} + \sigma^2_e} \\ \qquad \\\textrm{Where:} \\ \sigma^2_{b} = \textrm{variance between clusters} \\ \sigma^2_e = \textrm{variance within clusters (residual variance)} \\\)` -- Can also be interpreted as the correlation between two observations from the same group. --- # ICC (intra-class correlation coefficient) The larger the ICC, the lower the variability is within the clusters (relative to the variability between clusters). The greater the correlation between two observations from the same group. .pull-left[ <img src="01_lmcluster_files/figure-html/unnamed-chunk-45-1.svg" style="display: block; margin: auto;" /> ] -- .pull-right[ <img src="01_lmcluster_files/figure-html/unnamed-chunk-46-1.svg" style="display: block; margin: auto;" /> ] --- # Clustered Data & Linear Models .pull-left[ #### Why is it a problem? Clustering is something **systematic** that our model should (arguably) take into account. - remember, `\(\varepsilon \sim N(0, \sigma) \textbf{ independently}\)` ] -- .pull-right[ #### How is it a problem? Standard errors tend to be smaller than they should be, meaning that: - confidence intervals will be too narrow - `\(t\)`-statistics will be too large - `\(p\)`-values will be misleadingly small ] --- class: inverse, center, middle <h2 style="text-align: left;opacity:0.3;">Part 1: Linear Regression Refresh</h2> <h2><b style="opacity:0.4;">Part 2: Clustered Data</b><b>A Practical Comment on Data</b></h2> <h2 style="text-align: left;opacity:0.3;">Part 3: Where we're going</h2> <h2 style="text-align: left;opacity:0.3;">Extra slides (optional): Other Approaches</h2> --- # Wide Data/Long Data .pull-left[ __Wide Data__ ``` ## # A tibble: 5 × 5 ## ID age trial_1 trial_2 trial_3 ## <chr> <chr> <chr> <chr> <chr> ## 1 001 28 10 12.5 18 ## 2 002 36 7.5 7 5 ## 3 003 61 12 14.5 11 ## 4 004 45 10.5 17 14 ## 5 ... ... ... ... ... ``` ] .pull-right[ __Long Data__ ``` ## # A tibble: 13 × 4 ## ID age trial score ## <chr> <chr> <chr> <chr> ## 1 001 36 trial_1 10 ## 2 001 36 trial_2 12.5 ## 3 001 36 trial_3 18 ## 4 002 70 trial_1 7.5 ## 5 002 70 trial_2 7 ## 6 002 70 trial_3 5 ## 7 003 68 trial_1 12 ## 8 003 68 trial_2 14.5 ## 9 003 68 trial_3 11 ## 10 004 31 trial_1 10.5 ## 11 004 31 trial_2 17 ## 12 004 31 trial_3 14 ## 13 ... ... ... ... ``` ] --- # Wide Data/Long Data ![](https://www.fromthebottomoftheheap.net/assets/img/posts/tidyr-longer-wider.gif)<!-- --> .footnote[ Source: Examples of wide and long representations of the same data. 
Source: Garrick Aden-Buie’s (\@grrrck) [Tidy Animated Verbs](https://github.com/gadenbuie/tidyexplain) ] --- # Long Data = Good for Plotting .pull-left[ __`group` aesthetic__ ```r ggplot(longd, aes(x=trial,y=score, group=ID, col=ID))+ geom_point(size=4)+ geom_path() ``` ![](01_lmcluster_files/figure-html/unnamed-chunk-50-1.svg)<!-- --> ] .pull-right[ __`facet_wrap()`__ ```r ggplot(longd, aes(x=trial,y=score))+ geom_point(size=4)+ geom_path(aes(group=1))+ facet_wrap(~ID) ``` ![](01_lmcluster_files/figure-html/unnamed-chunk-51-1.svg)<!-- --> ] --- # Long Data = Good for by-Cluster Computations ```r longd %>% group_by(ID) %>% summarise( ntrials = n_distinct(trial), meanscore = mean(score), sdscore = sd(score) ) ``` ``` ## # A tibble: 4 × 4 ## ID ntrials meanscore sdscore ## <chr> <int> <dbl> <dbl> ## 1 001 3 13.5 4.09 ## 2 002 3 6.5 1.32 ## 3 003 3 12.5 1.80 ## 4 004 3 13.8 3.25 ``` --- # Summary - Clustering can take many forms, and exist at many levels - Clustering is something systematic that we would want our model to take into account - Ignoring it can lead to incorrect statistical inferences - Clustering is typically assessed using intra-class correlation coefficient (ICC) - the ratio of variance between clusters to the total variance `\(\rho = \frac{\sigma^2_b}{\sigma^2_b + \sigma^2_e}\)` - Tidying your data and converting it to *long* format (one observational unit per row, and a variable identifying the cluster ID) is a good start. --- class: inverse, center, middle, animated, rotateInDownLeft # End of Part 2 --- class: inverse, center, middle <h2 style="text-align: left;opacity:0.3;">Part 1: Linear Regression Refresh</h2> <h2 style="text-align: left;opacity:0.3;">Part 2: Clustered Data</h2> <h2>Part 3: Where we're going</h2> <h2 style="text-align: left;opacity:0.3;">Extra slides (optional): Other Approaches</h2> --- # Our Data .pull-left[ > Sample of 200 pupils from 20 schools completed a survey containing the Emotion Dysregulation Scale (EDS) and the Child Routines Questionnaire (CRQ). ```r crq_data <- read_csv("https://uoepsy.github.io/data/crqdata.csv") head(crq_data) ``` ``` ## # A tibble: 6 × 6 ## emot_dysreg crq int schoolid sleep age ## <dbl> <dbl> <chr> <chr> <chr> <dbl> ## 1 4.12 1.92 Treatment school1 <8hr 14 ## 2 3.22 1.65 Treatment school1 <8hr 11 ## 3 4.86 3.56 Treatment school1 <8hr 16 ## 4 4.79 1.45 Treatment school1 8hr+ 16 ## 5 3.58 0.81 Treatment school1 <8hr 12 ## 6 4.41 2.71 Treatment school1 <8hr 15 ``` ```r library(ICC) ICCbare(x = schoolid, y = emot_dysreg, data = crq_data) ``` ``` ## [1] 0.2443 ``` ] .pull-right[ <img src="01_lmcluster_files/figure-html/unnamed-chunk-55-1.svg" style="display: block; margin: auto;" /> ] --- # Ignoring Clustering .pull-left[ __(Complete pooling)__ + `lm(y ~ 1 + x, data = df)` + Information from all clusters is pooled together to estimate over x ```r model <- lm(emot_dysreg ~ crq, data = crq_data) ``` ``` ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 4.4470 0.1259 35.31 <2e-16 *** ## crq -0.0525 0.0448 -1.17 0.24 ``` ] .pull-right[ ![](01_lmcluster_files/figure-html/unnamed-chunk-58-1.svg)<!-- --> ] --- # Ignoring Clustering .pull-left[ __(Complete pooling)__ + `lm(y ~ 1 + x, data = df)` + Information from all clusters is pooled together to estimate over x ```r model <- lm(emot_dysreg ~ crq, data = crq_data) ``` ``` ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 4.4470 0.1259 35.31 <2e-16 *** ## crq -0.0525 0.0448 -1.17 0.24 ``` But different clusters show different patterns. Residuals are __not__ independent. 
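A quick way to see this (a sketch, assuming the `model` and `crq_data` objects above, the tidyverse loaded, and no rows dropped due to missingness):

```r
# residuals from the complete-pooling model, grouped by school -
# systematic differences between schools show the residuals are not independent
crq_data %>%
  mutate(resid = resid(model)) %>%
  ggplot(aes(x = schoolid, y = resid)) +
  geom_boxplot()
```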
] .pull-right[ ![](01_lmcluster_files/figure-html/unnamed-chunk-61-1.svg)<!-- --> ] --- count:false # Ignoring Clustering .pull-left[ __(Complete pooling)__ + `lm(y ~ 1 + x, data = df)` + Information from all clusters is __pooled__ together to estimate over x ```r model <- lm(emot_dysreg ~ crq, data = crq_data) ``` ``` ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 4.4470 0.1259 35.31 <2e-16 *** ## crq -0.0525 0.0448 -1.17 0.24 ``` But different clusters show different patterns. Residuals are __not__ independent. ] .pull-right[ ![](01_lmcluster_files/figure-html/unnamed-chunk-64-1.svg)<!-- --> ] --- # Fixed Effects Models .pull-left[ __(No pooling)__ - `lm(y ~ x * cluster, data = df)` - Information from a cluster contributes to estimate *for that cluster*, but information is not pooled to estimate an overall effect. ```r model <- lm(emot_dysreg ~ 1 + crq * schoolid, data = crq_data) ``` {{content}} ] .pull-right[ ![](01_lmcluster_files/figure-html/unnamed-chunk-66-1.svg)<!-- --> ] -- + Lots of estimates (separate for each cluster). + Variance estimates constructed based on information *only* within each cluster. + No overall estimate of effect over x. ``` ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 3.31835 0.45187 7.34 1.8e-11 *** ## crq 0.38247 0.20087 1.90 0.05904 . ## schoolidschool10 1.89925 0.88643 2.14 0.03396 * ## schoolidschool11 2.15526 0.72639 2.97 0.00356 ** ## schoolidschool12 1.31007 0.63199 2.07 0.04009 * ## schoolidschool13 0.39744 0.69852 0.57 0.57033 ## schoolidschool14 0.67908 0.70058 0.97 0.33414 ## schoolidschool15 1.09387 0.60494 1.81 0.07281 . ## schoolidschool16 2.08396 0.74354 2.80 0.00582 ** ## schoolidschool17 1.23261 0.63109 1.95 0.05289 . ## schoolidschool18 -0.20698 0.92070 -0.22 0.82247 ## schoolidschool19 1.34223 0.59066 2.27 0.02465 * ## schoolidschool2 1.31507 0.86143 1.53 0.12922 ## schoolidschool20 1.05688 0.63934 1.65 0.10066 ## schoolidschool3 1.17962 0.65361 1.80 0.07336 . ## schoolidschool4 0.94437 0.60360 1.56 0.12005 ## schoolidschool5 1.49329 0.67013 2.23 0.02752 * ## schoolidschool6 2.76414 0.68697 4.02 9.5e-05 *** ## schoolidschool7 -0.00683 0.60138 -0.01 0.99095 ## schoolidschool8 1.15007 0.62754 1.83 0.06907 . ## schoolidschool9 0.62842 0.81937 0.77 0.44446 ## crq:schoolidschool10 -0.67112 0.34804 -1.93 0.05594 . ## crq:schoolidschool11 -0.68897 0.29202 -2.36 0.01975 * ## crq:schoolidschool12 -0.49875 0.25901 -1.93 0.05627 . ``` --- # Random Effects Models (MLM) .pull-left[ __(Partial Pooling)__ - `lmer(y ~ 1 + x + (1 + x| cluster), data = df)` - cluster-level variance in intercepts and slopes is modeled as randomly distributed around fixed parameters. - effects are free to vary by cluster, but information from clusters contributes (according to cluster `\(n\)` and outlyingness of cluster) to an overall fixed parameter. ```r library(lme4) model <- lmer(emot_dysreg ~ 1 + crq + (1 + crq | schoolid), data = crq_data) summary(model)$coefficients ``` ``` ## Estimate Std. Error t value ## (Intercept) 4.43772 0.15690 28.28 ## crq -0.05166 0.05063 -1.02 ``` ] .pull-right[ ![](01_lmcluster_files/figure-html/unnamed-chunk-69-1.svg)<!-- --> ] --- # Random Effects Models (MLM) .pull-left[ __(Partial Pooling)__ - `lmer(y ~ 1 + x + (1 + x| cluster), data = df)` - cluster-level variance in intercepts and slopes is modeled as randomly distributed around fixed parameters. - effects are free to vary by cluster, but information from clusters contributes (according to cluster `\(n\)` and outlyingness of cluster) to an overall fixed parameter. 
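Once the model is fitted (code below), the school-specific estimates that get partially pooled toward the fixed parameters can be inspected directly (a sketch, assuming the `model` object fitted below):

```r
# overall (fixed) parameters vs. partially pooled per-school estimates
fixef(model)                 # overall intercept and crq slope
head(coef(model)$schoolid)   # school-specific intercepts and slopes
```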
```r library(lme4) model <- lmer(emot_dysreg ~ 1 + crq + (1 + crq | schoolid), data = crq_data) summary(model)$coefficients ``` ``` ## Estimate Std. Error t value ## (Intercept) 4.43772 0.15690 28.28 ## crq -0.05166 0.05063 -1.02 ``` ] .pull-right[ ![](01_lmcluster_files/figure-html/unnamed-chunk-71-1.svg)<!-- --> ] --- # Summary With clustered data, there are many possible approaches. Some of the main ones are: - Ignore it (**complete pooling**) - and make inappropriate inferences. - Completely partition out any variance due to clustering into fixed effects for each cluster (__no pooling__). - and limit our estimates to being cluster specific and low power. - Model cluster-level variance as randomly distributed around fixed parameters, and partially pool information across clusters. - best of both worlds? --- class: inverse, center, middle, animated, rotateInDownLeft # End --- class: inverse, center, middle <h2 style="text-align: left;opacity:0.3;">Part 1: Linear Regression Refresh</h2> <h2 style="text-align: left;opacity:0.3;">Part 2: Clustered Data</h2> <h2 style="text-align: left;opacity:0.3;">Part 3: Where we're going</h2> <h2>Extra slides (optional): Other Approaches</h2> --- # Other: Repeated Measures ANOVA ## ANOVA revisited .pull-left[ - We've been using `anova()` as a test of whether _several_ parameters are simultaneously zero - The "omnibus"/"overall F"/"joint" test (can also be viewed as a model comparison) ] -- .pull-right[ - Traditional "ANOVA" is essentially a linear model with categorical predictor(s) where it looks for overall differences in group means (or differences in differences in group means etc). - Still quite popular in psychology because we often design experiments with discrete conditions. - Require post-hoc tests to examine _where_ any differences are. ] --- exclude: true # ANOVA revisited In R, functions typically create anova tables from linear model objects: ```r lin_mod <- lm(y ~ grp, df) anova(lin_mod) car::Anova(lin_mod) ``` or are wrappers which use the `lm()` function internally. So they're doing the same thing. ```r summary(aov(y ~ grp, df)) ``` <img src="01_lmcluster_files/figure-html/unnamed-chunk-75-1.svg" style="display: block; margin: auto;" /> --- exclude: true # ANOVA sums of squares .pull-left[ With multiple predictors, sums of squares can be calculated differently 1. Sequential Sums of Squares = Order matters 2. <p style="opacity:0.4">Partially Sequential Sums of Squares</p> 3. <p style="opacity:0.4">Partial Sums of Squares</p> <img src="jk_img_sandbox/SStype1b.png" width="200px" height="150px" style="display: block; margin: auto;" /> ] .pull-right[ __sequential SS__ ```r anova(lm(y~x1*x2,df)) ``` ``` ## Analysis of Variance Table ## ## Response: y ## Df Sum Sq Mean Sq F value Pr(>F) ## x1 1 91 91 13.55 0.00038 *** ## x2 1 484 484 72.45 2.3e-13 *** ## x1:x2 1 16 16 2.32 0.13097 ## Residuals 96 642 7 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ```r anova(lm(y~x2*x1,df)) ``` ``` ## Analysis of Variance Table ## ## Response: y ## Df Sum Sq Mean Sq F value Pr(>F) ## x2 1 449 449 67.14 1.1e-12 *** ## x1 1 126 126 18.86 3.5e-05 *** ## x2:x1 1 16 16 2.32 0.13 ## Residuals 96 642 7 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ] --- exclude: true # ANOVA sums of squares .pull-left[ with multiple predictors, sums of squares can be calculated differently 1. <p style="opacity:0.4">Sequential Sums of Squares</p> 2. <p style="opacity:0.4">Partially Sequential Sums of Squares</p> 3. 
Partial Sums of Squares = Each one calculated as if its the last one in sequential SS (order doesn't matter). <img src="jk_img_sandbox/SStype3.png" width="200px" height="150px" style="display: block; margin: auto;" /> ] .pull-right[ __partial SS__ ```r car::Anova(lm(y~x1*x2,df), type="III") ``` ``` ## Anova Table (Type III tests) ## ## Response: y ## Sum Sq Df F value Pr(>F) ## (Intercept) 11924 1 1784.14 < 2e-16 *** ## x1 122 1 18.27 4.5e-05 *** ## x2 497 1 74.30 1.4e-13 *** ## x1:x2 16 1 2.32 0.13 ## Residuals 642 96 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ```r car::Anova(lm(y~x2*x1,df), type="III") ``` ``` ## Anova Table (Type III tests) ## ## Response: y ## Sum Sq Df F value Pr(>F) ## (Intercept) 11924 1 1784.14 < 2e-16 *** ## x2 497 1 74.30 1.4e-13 *** ## x1 122 1 18.27 4.5e-05 *** ## x2:x1 16 1 2.32 0.13 ## Residuals 642 96 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ] --- # Other: Repeated Measures ANOVA ## ANOVA: Partitioning Variance <img src="jk_img_sandbox/anova.png" width="1288" /> .footnote[ The terminology here can be a nightmare. `\(SS_{between}\)` sometimes gets referred to as `\(SS_{model}\)`, `\(SS_{condition}\)`, `\(SS_{regression}\)`, or `\(SS_{treatment}\)`. Meanwhile `\(SS_{residual}\)` also gets termed `\(SS_{error}\)`. To make it all worse, there are inconsistencies in the acronyms used, `\(SSR\)` vs. `\(RSS\)`! ] --- # Other: Repeated Measures ANOVA ## ANOVA: Partitioning Variance <img src="jk_img_sandbox/rmanova.png" width="1288" /> --- # Other: Repeated Measures ANOVA in R .pull-left[ ![](01_lmcluster_files/figure-html/unnamed-chunk-82-1.svg)<!-- --> ] .pull-right[ ```r library(afex) aov_ez(id = "subject", dv = "y", data = df, within = "t") ``` ``` ## Anova Table (Type 3 tests) ## ## Response: y ## Effect df MSE F ges p.value ## 1 t 1.00, 39.18 3.20 36.65 *** .074 <.001 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1 ## ## Sphericity correction method: GG ``` ```r library(ez) ezANOVA(data = df, dv = y, wid = subject, within = t) ``` ``` ## $ANOVA ## Effect DFn DFd F p p<.05 ges ## 2 t 2 78 36.65 5.991e-12 * 0.07383 ## ## $`Mauchly's Test for Sphericity` ## Effect W p p<.05 ## 2 t 0.008994 1.335e-39 * ## ## $`Sphericity Corrections` ## Effect GGe p[GG] p[GG]<.05 HFe p[HF] p[HF]<.05 ## 2 t 0.5023 4.14e-07 * 0.5024 4.123e-07 * ``` ] --- # Other: Mixed ANOVA in R .pull-left[ ![](01_lmcluster_files/figure-html/unnamed-chunk-84-1.svg)<!-- --> ] .pull-right[ ```r library(afex) aov_ez(id = "subject", dv = "y", data = df, between = "condition", within = "t") ``` ``` ## Anova Table (Type 3 tests) ## ## Response: y ## Effect df MSE F ges p.value ## 1 condition 3, 36 10.38 31.46 *** .695 <.001 ## 2 t 1.01, 36.35 1.57 74.18 *** .215 <.001 ## 3 condition:t 3.03, 36.35 1.57 14.31 *** .137 <.001 ## --- ## Signif. 
codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1 ## ## Sphericity correction method: GG ``` ```r library(ez) ezANOVA(data = df, dv = y, wid = subject, within = t, between = condition) ``` ``` ## $ANOVA ## Effect DFn DFd F p p<.05 ges ## 2 condition 3 36 31.46 3.654e-10 * 0.6945 ## 3 t 2 72 74.18 3.241e-18 * 0.2148 ## 4 condition:t 6 72 14.31 1.156e-10 * 0.1367 ## ## $`Mauchly's Test for Sphericity` ## Effect W p p<.05 ## 3 t 0.01948 1.164e-30 * ## 4 condition:t 0.01948 1.164e-30 * ## ## $`Sphericity Corrections` ## Effect GGe p[GG] p[GG]<.05 HFe p[HF] p[HF]<.05 ## 3 t 0.5049 2.389e-10 * 0.5053 2.352e-10 * ## 4 condition:t 0.5049 2.428e-06 * 0.5053 2.407e-06 * ``` ] --- # Other: Advantages of MLM over ANOVA - Rpt Measures and Mixed ANOVA are special cases of the MLM - MLM can easily be extended to modelling different types of dependent variables. ANOVA cannot. - MLM can be extended to model more complex grouping structures (e.g. children in schools in districts, or independent clusters of both participants as well as experimental items) - MLM allows us to model an effect as varying - MLM provides more options for dealing with unbalanced designs and missing data. --- # Other: Cluster Robust Standard Errors .pull-left[ <!-- https://rlbarter.github.io/Practical-Statistics/2017/05/10/generalized-estimating-equations-gee/ --> Don't include clustering as part of the model directly, but incorporate the dependency into our residuals term. $$ `\begin{align} \color{red}{\textrm{outcome}} & = \color{blue}{(\textrm{model})} + \textrm{error}^* \\ \end{align}` $$ .footnote[`*` Where errors are clustered] ] .pull-right[ Simply adjusts our standard errors: ```r library(plm) clm <- plm(emot_dysreg ~ 1 + crq, data=crq_data, model="pooling", index="schoolid") ``` ``` ## Coefficients: ## Estimate Std. Error t-value Pr(>|t|) ## (Intercept) 4.4470 0.1259 35.31 <2e-16 *** ## crq -0.0525 0.0448 -1.17 0.24 ``` ```r sqrt(diag(vcovHC(clm, method='arellano', cluster='group'))) ``` ``` ## (Intercept) crq ## 0.14410 0.04601 ``` ] --- # Other: Generalised Estimating Equations .pull-left[ Don't include clustering as part of the model directly, but incorporate the dependency into our residuals term. $$ `\begin{align} \color{red}{\textrm{outcome}} & = \color{blue}{(\textrm{model})} + \textrm{error}^* \\ \end{align}` $$ GEE is a "population average" model: - __population average:__ how the _average_ response changes for a 1-unit increase in `\(x\)`, while accounting for within-group correlation. - __subject specific:__ how the response changes for a group when they increase 1-unit in `\(x\)`. .footnote[`*` Where errors are clustered, and follow some form of correlational<br>structure within clusters (e.g. based on how many timepoints apart<br>two observations are).] ] .pull-right[ Specifying a correlational structure for residuals within clusters can influence _what_ we are estimating ```r library(geepack) # needs to be arranged by cluster, # and for cluster to be numeric crq_data <- crq_data %>% mutate( cluster_id = as.numeric(as.factor(schoolid)) ) %>% arrange(cluster_id) # geemod = geeglm(emot_dysreg ~ 1 + crq, data = crq_data, corstr = 'exchangeable', id = cluster_id) ``` ``` ## Coefficients: ## Estimate Std.err Wald Pr(>|W|) ## (Intercept) 4.4304 0.1471 906.97 <2e-16 *** ## crq -0.0510 0.0473 1.16 0.28 ``` ] --- class: inverse, center, middle, animated, rotateInDownLeft # End