class: center, middle, inverse, title-slide

.title[
# Clustered Data
]

.subtitle[
## Data Analysis for Psychology in R 3
]

.author[
### Josiah King
]

.institute[
### Department of Psychology
The University of Edinburgh
]

---
class: inverse, center, middle

<h1 style="text-align: left;">This Lecture:</h1>
<h3 style="text-align: left;">1. Clustered Data</h3>
<h3 style="text-align: left;">2. Practical stuff about data</h3>
<h3 style="text-align: left;">3. Where we're going...</h3>

---
class: inverse, center, middle

<h1 style="text-align: left;">This Lecture:</h1>
<h3 style="text-align: left;">1. Clustered Data</h3>
<h3 style="text-align: left; opacity:.4">2. Practical stuff about data</h3>
<h3 style="text-align: left; opacity:.4">3. Where we're going...</h3>

---
# Examples of clustered data

.pull-left[
- children within schools
- patients within clinics
- observations within individuals
]
.pull-right[
<img src="jk_img_sandbox/h2.png" width="1716" />
]

---
# Clustered Clustered Data?

.pull-left[
- children within classrooms within schools within districts etc...
- patients within doctors within hospitals...
- time-periods within trials within individuals
]
.pull-right[
<img src="jk_img_sandbox/h3.png" width="1716" />
]

.footnote[
Other relevant terms you will tend to see: "grouping structure", "levels", "hierarchies".
]

---
# Importance of Clustering

Clustering will likely result in measurements on observational units within a given cluster being more similar to each other than to those in other clusters.

- For example, our measures of academic performance for children in a given class will tend to be more similar to one another (because of class-specific things such as the teacher) than to those of children in other classes.

<img src="jk_img_sandbox/lev1.png" width="60%" style="display: block; margin: auto;" />

---
# ICC (intra-class correlation coefficient)

Clustering is expressed in terms of the correlation among the measurements within the same cluster - known as the __intra-class correlation coefficient (ICC).__

There are various formulations of the ICC, but the basic principle = ratio of *variance between groups* to *total variance*.

<br>
`\(\rho = \frac{\sigma^2_{b}}{\sigma^2_{b} + \sigma^2_e} \\ \qquad \\\textrm{Where:} \\ \sigma^2_{b} = \textrm{variance between clusters} \\ \sigma^2_e = \textrm{variance within clusters (residual variance)} \\\)`

--

The ICC can also be interpreted as the correlation between two observations from the same group.

---
# ICC (intra-class correlation coefficient)

The larger the ICC, the lower the variability within clusters (relative to the variability between clusters), and the greater the correlation between two observations from the same group.

.pull-left[
<img src="dapr3_2324_01b_clusters_files/figure-html/unnamed-chunk-4-1.svg" style="display: block; margin: auto;" />
]

--

.pull-right[
<img src="dapr3_2324_01b_clusters_files/figure-html/unnamed-chunk-5-1.svg" style="display: block; margin: auto;" />
]

---
# Clustered Data & Linear Models

.pull-left[
#### Why is it a problem?

Clustering is something **systematic** that our model should (arguably) take into account.

- remember, `\(\varepsilon \sim N(0, \sigma) \textbf{ independently}\)`
]

--

.pull-right[
#### How is it a problem?

Standard errors will often be smaller than they should be, meaning that:

- confidence intervals will often be too narrow
- `\(t\)`-statistics will often be too large
- `\(p\)`-values will often be misleadingly small
]

---
class: inverse, center, middle

<h1 style="text-align: left;">This Lecture:</h1>
<h3 style="text-align: left; opacity:.4">1. Clustered Data</h3>
<h3 style="text-align: left;">2. Practical stuff about data</h3>
<h3 style="text-align: left; opacity:.4">3. 
Where we're going...</h3>

---
# Wide Data/Long Data

.pull-left[
__Wide Data__

```
## # A tibble: 5 × 5
##   ID    age   trial_1 trial_2 trial_3
##   <chr> <chr> <chr>   <chr>   <chr>  
## 1 001   28    10      12.5    18     
## 2 002   36    7.5     7       5      
## 3 003   61    12      14.5    11     
## 4 004   45    10.5    17      14     
## 5 ...   ...   ...     ...     ...    
```
]
.pull-right[
__Long Data__

```
## # A tibble: 13 × 4
##    ID    age   trial   score
##    <chr> <chr> <chr>   <chr>
##  1 001   36    trial_1 10   
##  2 001   36    trial_2 12.5 
##  3 001   36    trial_3 18   
##  4 002   70    trial_1 7.5  
##  5 002   70    trial_2 7    
##  6 002   70    trial_3 5    
##  7 003   68    trial_1 12   
##  8 003   68    trial_2 14.5 
##  9 003   68    trial_3 11   
## 10 004   31    trial_1 10.5 
## 11 004   31    trial_2 17   
## 12 004   31    trial_3 14   
## 13 ...   ...   ...     ...  
```
]

---
# Wide Data/Long Data

![](https://www.fromthebottomoftheheap.net/assets/img/posts/tidyr-longer-wider.gif)<!-- -->

.footnote[
Examples of wide and long representations of the same data. Source: Garrick Aden-Buie's (\@grrrck) [Tidy Animated Verbs](https://github.com/gadenbuie/tidyexplain)
]

---
# Long Data = Good for Plotting

.pull-left[
__`group` aesthetic__

```r
ggplot(longd, aes(x=trial,y=score, group=ID, col=ID))+
  geom_point(size=4)+
  geom_path()
```

![](dapr3_2324_01b_clusters_files/figure-html/unnamed-chunk-9-1.svg)<!-- -->
]
.pull-right[
__`facet_wrap()`__

```r
ggplot(longd, aes(x=trial,y=score))+
  geom_point(size=4)+
  geom_path(aes(group=1))+
  facet_wrap(~ID)
```

![](dapr3_2324_01b_clusters_files/figure-html/unnamed-chunk-10-1.svg)<!-- -->
]

---
# Long Data = Good for by-Cluster Computations

```r
longd %>%
  group_by(ID) %>%
  summarise(
    ntrials   = n_distinct(trial),
    meanscore = mean(score),
    sdscore   = sd(score)
  )
```

```
## # A tibble: 4 × 4
##   ID    ntrials meanscore sdscore
##   <chr>   <int>     <dbl>   <dbl>
## 1 001         3      13.5    4.09
## 2 002         3       6.5    1.32
## 3 003         3      12.5    1.80
## 4 004         3      13.8    3.25
```

---
# Summary

- Clustering can take many forms, and can exist at many levels
- Clustering is something systematic that we would want our model to take into account
- Ignoring it can lead to incorrect statistical inferences
- Clustering is typically assessed using the intra-class correlation coefficient (ICC)
  - the ratio of variance between clusters to the total variance `\(\rho = \frac{\sigma^2_b}{\sigma^2_b + \sigma^2_e}\)`
- Tidying your data and converting it to *long* format (one observational unit per row, and a variable identifying the cluster ID) is a good start.

---
class: inverse, center, middle

<h1 style="text-align: left;">This Lecture:</h1>
<h3 style="text-align: left; opacity:.4">1. Clustered Data</h3>
<h3 style="text-align: left; opacity:.4">2. Practical stuff about data</h3>
<h3 style="text-align: left;">3. Where we're going...</h3>

---
# Our Data

.pull-left[
> In a study examining how cognition changes over time, a sample of 20 participants took the Addenbrooke's Cognitive Examination (ACE) every 2 years from age 60 to age 78. Each participant has 10 datapoints. Participants are clusters.
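The code on these slides assumes the __tidyverse__ is loaded (for `read_csv()`, the `%>%` pipe, and `ggplot()`) — a minimal setup line, if you are following along:

```r
# loads readr, dplyr, ggplot2, etc.
library(tidyverse)
```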
```r d3 <- read_csv("https://uoepsy.github.io/data/dapr3_mindfuldecline.csv") head(d3) ``` ``` ## # A tibble: 6 × 7 ## sitename ppt condition visit age ACE imp ## <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> ## 1 Sncbk PPT_1 control 1 60 84.5 unimp ## 2 Sncbk PPT_1 control 2 62 85.6 imp ## 3 Sncbk PPT_1 control 3 64 84.5 imp ## 4 Sncbk PPT_1 control 4 66 83.1 imp ## 5 Sncbk PPT_1 control 5 68 82.3 imp ## 6 Sncbk PPT_1 control 6 70 83.3 imp ``` ```r library(ICC) ICCbare(x = ppt, y = ACE, data = d3) ``` ``` ## [1] 0.4799 ``` ] .pull-right[ <img src="dapr3_2324_01b_clusters_files/figure-html/unnamed-chunk-15-1.svg" style="display: block; margin: auto;" /> ] --- # Ignoring Clustering .pull-left[ __(Complete pooling)__ + `lm(y ~ 1 + x, data = df)` + Information from all clusters is pooled together to estimate over x ```r model <- lm(ACE ~ 1 + visit, data = d3) ``` ``` ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 85.6259 0.3022 283.3 < 2e-16 *** ## visit -0.3857 0.0488 -7.9 2.9e-13 *** ``` ] .pull-right[ ![](dapr3_2324_01b_clusters_files/figure-html/unnamed-chunk-18-1.svg)<!-- --> ] --- # Ignoring Clustering .pull-left[ __(Complete pooling)__ + `lm(y ~ 1 + x, data = df)` + Information from all clusters is pooled together to estimate over x ```r model <- lm(ACE ~ 1 + visit, data = d3) ``` ``` ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 85.6259 0.3022 283.3 < 2e-16 *** ## visit -0.3857 0.0488 -7.9 2.9e-13 *** ``` But different clusters show different patterns. Residuals are __not__ independent. ] .pull-right[ ![](dapr3_2324_01b_clusters_files/figure-html/unnamed-chunk-21-1.svg)<!-- --> ] --- count:false # Ignoring Clustering .pull-left[ __(Complete pooling)__ + `lm(y ~ 1 + x, data = df)` + Information from all clusters is __pooled__ together to estimate over x ```r model <- lm(ACE ~ 1 + visit, data = d3) ``` ``` ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 85.6259 0.3022 283.3 < 2e-16 *** ## visit -0.3857 0.0488 -7.9 2.9e-13 *** ``` But different clusters show different patterns. Residuals are __not__ independent. ] .pull-right[ ![](dapr3_2324_01b_clusters_files/figure-html/unnamed-chunk-24-1.svg)<!-- --> ] --- # Fixed Effects Models .pull-left[ __(No pooling)__ - `lm(y ~ x * cluster, data = df)` - Information from a cluster contributes to estimate *for that cluster*, but information is not pooled (estimate for cluster 1 completely independent from estimate for cluster 2). ```r model <- lm(ACE ~ 1 + visit * ppt, data = d3) ``` {{content}} ] .pull-right[ ![](dapr3_2324_01b_clusters_files/figure-html/unnamed-chunk-26-1.svg)<!-- --> ] -- + Lots of estimates (separate for each cluster). + Variance estimates constructed based on information *only* within each cluster. ``` ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 85.60667 0.33876 252.71 < 2e-16 *** ## visit -0.51030 0.05460 -9.35 2.3e-16 *** ## pptPPT_10 0.38667 0.47908 0.81 0.42100 ## pptPPT_11 -0.10162 0.48862 -0.21 0.83555 ## pptPPT_12 0.63780 0.50929 1.25 0.21259 ## pptPPT_13 0.50667 0.47908 1.06 0.29210 ## pptPPT_14 -0.65026 0.58996 -1.10 0.27231 ## pptPPT_15 0.42667 0.47908 0.89 0.37470 ## pptPPT_16 -0.56000 0.47908 -1.17 0.24447 ## pptPPT_17 -0.00667 0.47908 -0.01 0.98892 ## pptPPT_18 -0.96667 0.47908 -2.02 0.04557 * ## pptPPT_19 -0.34667 0.47908 -0.72 0.47054 ## pptPPT_2 -0.91333 0.47908 -1.91 0.05869 . 
## pptPPT_20 -0.67333 0.47908 -1.41 0.16214 ## pptPPT_3 0.68000 0.47908 1.42 0.15806 ## pptPPT_4 0.54667 0.47908 1.14 0.25583 ## pptPPT_5 0.08667 0.47908 0.18 0.85671 ## pptPPT_6 -0.15101 0.56266 -0.27 0.78881 ## pptPPT_7 0.29333 0.47908 0.61 0.54136 ## pptPPT_8 -0.28308 0.81805 -0.35 0.72984 ## pptPPT_9 0.36667 0.47908 0.77 0.44537 ## visit:pptPPT_10 -0.36485 0.07721 -4.73 5.6e-06 *** ## visit:pptPPT_11 0.36450 0.08434 4.32 3.0e-05 *** ## visit:pptPPT_12 0.19207 0.08067 2.38 0.01865 * ``` --- count: false # Fixed Effects Models .pull-left[ __(No pooling)__ - `lm(y ~ x * cluster, data = df)` - Information from a cluster contributes to estimate *for that cluster*, but information is not pooled (estimate for cluster 1 completely independent from estimate for cluster 2). ```r model <- lm(ACE ~ 1 + visit * ppt, data = d3) ``` + Lots of estimates (separate for each cluster). + Variance estimates constructed based on information *only* within each cluster. ``` ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 85.60667 0.33876 252.71 < 2e-16 *** ## visit -0.51030 0.05460 -9.35 2.3e-16 *** ## pptPPT_10 0.38667 0.47908 0.81 0.42100 ## pptPPT_11 -0.10162 0.48862 -0.21 0.83555 ## pptPPT_12 0.63780 0.50929 1.25 0.21259 ## pptPPT_13 0.50667 0.47908 1.06 0.29210 ## pptPPT_14 -0.65026 0.58996 -1.10 0.27231 ## pptPPT_15 0.42667 0.47908 0.89 0.37470 ## pptPPT_16 -0.56000 0.47908 -1.17 0.24447 ## pptPPT_17 -0.00667 0.47908 -0.01 0.98892 ## pptPPT_18 -0.96667 0.47908 -2.02 0.04557 * ## pptPPT_19 -0.34667 0.47908 -0.72 0.47054 ## pptPPT_2 -0.91333 0.47908 -1.91 0.05869 . ## pptPPT_20 -0.67333 0.47908 -1.41 0.16214 ## pptPPT_3 0.68000 0.47908 1.42 0.15806 ## pptPPT_4 0.54667 0.47908 1.14 0.25583 ## pptPPT_5 0.08667 0.47908 0.18 0.85671 ## pptPPT_6 -0.15101 0.56266 -0.27 0.78881 ## pptPPT_7 0.29333 0.47908 0.61 0.54136 ## pptPPT_8 -0.28308 0.81805 -0.35 0.72984 ## pptPPT_9 0.36667 0.47908 0.77 0.44537 ## visit:pptPPT_10 -0.36485 0.07721 -4.73 5.6e-06 *** ## visit:pptPPT_11 0.36450 0.08434 4.32 3.0e-05 *** ## visit:pptPPT_12 0.19207 0.08067 2.38 0.01865 * ``` ] .pull-right[ ![](dapr3_2324_01b_clusters_files/figure-html/unnamed-chunk-30-1.svg)<!-- --> ] --- # Random Effects Models (MLM/LMM) .pull-left[ __(Partial Pooling)__ - `lmer(y ~ 1 + x + (1 + x| cluster), data = df)` - cluster-level variance in intercepts and slopes is modeled as randomly distributed around fixed parameters. - effects are free to vary by cluster, but information from clusters contributes (according to cluster `\(n\)` and outlyingness of cluster) to an overall fixed parameter. ```r library(lme4) model <- lmer(ACE ~ 1 + visit + (1 + visit | ppt), data = d3) summary(model)$coefficients ``` ``` ## Estimate Std. Error t value ## (Intercept) 85.5843 0.12009 712.674 ## visit -0.3588 0.07333 -4.893 ``` ] .pull-right[ ![](dapr3_2324_01b_clusters_files/figure-html/unnamed-chunk-32-1.svg)<!-- --> ] --- # Random Effects Models (MLM/LMM) .pull-left[ __(Partial Pooling)__ - `lmer(y ~ 1 + x + (1 + x| cluster), data = df)` - cluster-level variance in intercepts and slopes is modeled as randomly distributed around fixed parameters. - effects are free to vary by cluster, but information from clusters contributes (according to cluster `\(n\)` and outlyingness of cluster) to an overall fixed parameter. ```r library(lme4) model <- lmer(ACE ~ 1 + visit + (1 + visit | ppt), data = d3) summary(model)$coefficients ``` ``` ## Estimate Std. 
Error t value ## (Intercept) 85.5843 0.12009 712.674 ## visit -0.3588 0.07333 -4.893 ``` ] .pull-right[ ![](dapr3_2324_01b_clusters_files/figure-html/unnamed-chunk-34-1.svg)<!-- --> ] --- # Summary With clustered data, there are many possible approaches. Some of the main ones are: - Ignore it (**complete pooling**) - and make inappropriate inferences. - Completely partition out any variance due to clustering into fixed effects for each cluster (__no pooling__). - and limit our estimates to being cluster specific and low power. - Model cluster-level variance as randomly distributed around fixed parameters, __partially pooling__ information across clusters. - best of both worlds? --- class: inverse, center, middle, animated, rotateInDownLeft # End --- class: inverse, center, middle <h1 style="text-align: left;">This Lecture:</h1> <h3 style="text-align: left; opacity:.4">1. Clustered Data</h3> <h3 style="text-align: left; opacity:.4">2. Practical stuff about data</h3> <h3 style="text-align: left; opacity:.4">3. Where we're going...</h3> <h3 style="text-align: left; ">Bonus content</h3> --- # (OPTIONAL) Repeated Measures ANOVA ## ANOVA revisited .pull-left[ - We've been using `anova()` as a test of whether _several_ parameters are simultaneously zero - The "omnibus"/"overall F"/"joint" test (can also be viewed as a model comparison) ] -- .pull-right[ - Traditional "ANOVA" is essentially a linear model with categorical predictor(s) where it looks for overall differences in group means (or differences in differences in group means etc). - Still quite popular in psychology because we often design experiments with discrete conditions. - Require post-hoc tests to examine _where_ any differences are. ] --- exclude: true # ANOVA revisited In R, functions typically create anova tables from linear model objects: ```r lin_mod <- lm(y ~ grp, df) anova(lin_mod) car::Anova(lin_mod) ``` or are wrappers which use the `lm()` function internally. So they're doing the same thing. ```r summary(aov(y ~ grp, df)) ``` <img src="dapr3_2324_01b_clusters_files/figure-html/unnamed-chunk-38-1.svg" style="display: block; margin: auto;" /> --- exclude: true # ANOVA sums of squares .pull-left[ With multiple predictors, sums of squares can be calculated differently 1. Sequential Sums of Squares = Order matters 2. <p style="opacity:0.4">Partially Sequential Sums of Squares</p> 3. <p style="opacity:0.4">Partial Sums of Squares</p> <img src="jk_img_sandbox/SStype1b.png" width="200px" height="150px" style="display: block; margin: auto;" /> ] .pull-right[ __sequential SS__ ```r anova(lm(y~x1*x2,df)) ``` ``` ## Analysis of Variance Table ## ## Response: y ## Df Sum Sq Mean Sq F value Pr(>F) ## x1 1 91 91 13.55 0.00038 *** ## x2 1 484 484 72.45 2.3e-13 *** ## x1:x2 1 16 16 2.32 0.13097 ## Residuals 96 642 7 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ```r anova(lm(y~x2*x1,df)) ``` ``` ## Analysis of Variance Table ## ## Response: y ## Df Sum Sq Mean Sq F value Pr(>F) ## x2 1 449 449 67.14 1.1e-12 *** ## x1 1 126 126 18.86 3.5e-05 *** ## x2:x1 1 16 16 2.32 0.13 ## Residuals 96 642 7 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ] --- exclude: true # ANOVA sums of squares .pull-left[ with multiple predictors, sums of squares can be calculated differently 1. <p style="opacity:0.4">Sequential Sums of Squares</p> 2. <p style="opacity:0.4">Partially Sequential Sums of Squares</p> 3. 
Partial Sums of Squares = Each one calculated as if its the last one in sequential SS (order doesn't matter). <img src="jk_img_sandbox/SStype3.png" width="200px" height="150px" style="display: block; margin: auto;" /> ] .pull-right[ __partial SS__ ```r car::Anova(lm(y~x1*x2,df), type="III") ``` ``` ## Anova Table (Type III tests) ## ## Response: y ## Sum Sq Df F value Pr(>F) ## (Intercept) 11924 1 1784.14 < 2e-16 *** ## x1 122 1 18.27 4.5e-05 *** ## x2 497 1 74.30 1.4e-13 *** ## x1:x2 16 1 2.32 0.13 ## Residuals 642 96 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ```r car::Anova(lm(y~x2*x1,df), type="III") ``` ``` ## Anova Table (Type III tests) ## ## Response: y ## Sum Sq Df F value Pr(>F) ## (Intercept) 11924 1 1784.14 < 2e-16 *** ## x2 497 1 74.30 1.4e-13 *** ## x1 122 1 18.27 4.5e-05 *** ## x2:x1 16 1 2.32 0.13 ## Residuals 642 96 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ] --- # (OPTIONAL) Repeated Measures ANOVA ## ANOVA: Partitioning Variance <img src="jk_img_sandbox/anova.png" width="1288" /> .footnote[ The terminology here can be a nightmare. `\(SS_{between}\)` sometimes gets referred to as `\(SS_{model}\)`, `\(SS_{condition}\)`, `\(SS_{regression}\)`, or `\(SS_{treatment}\)`. Meanwhile `\(SS_{residual}\)` also gets termed `\(SS_{error}\)`. To make it all worse, there are inconsistencies in the acronyms used, `\(SSR\)` vs. `\(RSS\)`! ] --- # (OPTIONAL) Repeated Measures ANOVA ## ANOVA: Partitioning Variance <img src="jk_img_sandbox/rmanova.png" width="1288" /> --- # (OPTIONAL) Repeated Measures ANOVA in R .pull-left[ ![](dapr3_2324_01b_clusters_files/figure-html/unnamed-chunk-45-1.svg)<!-- --> ] .pull-right[ ```r library(afex) aov_ez(id = "subject", dv = "y", data = df, within = "t") ``` ``` ## Anova Table (Type 3 tests) ## ## Response: y ## Effect df MSE F ges p.value ## 1 t 1.00, 39.18 3.20 36.65 *** .074 <.001 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1 ## ## Sphericity correction method: GG ``` ```r library(ez) ezANOVA(data = df, dv = y, wid = subject, within = t) ``` ``` ## $ANOVA ## Effect DFn DFd F p p<.05 ges ## 2 t 2 78 36.65 5.991e-12 * 0.07383 ## ## $`Mauchly's Test for Sphericity` ## Effect W p p<.05 ## 2 t 0.008994 1.335e-39 * ## ## $`Sphericity Corrections` ## Effect GGe p[GG] p[GG]<.05 HFe p[HF] p[HF]<.05 ## 2 t 0.5023 4.14e-07 * 0.5024 4.123e-07 * ``` ] --- # (OPTIONAL) Mixed ANOVA in R .pull-left[ ![](dapr3_2324_01b_clusters_files/figure-html/unnamed-chunk-47-1.svg)<!-- --> ] .pull-right[ ```r library(afex) aov_ez(id = "subject", dv = "y", data = df, between = "condition", within = "t") ``` ``` ## Anova Table (Type 3 tests) ## ## Response: y ## Effect df MSE F ges p.value ## 1 condition 3, 36 10.38 31.46 *** .695 <.001 ## 2 t 1.01, 36.35 1.57 74.18 *** .215 <.001 ## 3 condition:t 3.03, 36.35 1.57 14.31 *** .137 <.001 ## --- ## Signif. 
codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1 ## ## Sphericity correction method: GG ``` ```r library(ez) ezANOVA(data = df, dv = y, wid = subject, within = t, between = condition) ``` ``` ## $ANOVA ## Effect DFn DFd F p p<.05 ges ## 2 condition 3 36 31.46 3.654e-10 * 0.6945 ## 3 t 2 72 74.18 3.241e-18 * 0.2148 ## 4 condition:t 6 72 14.31 1.156e-10 * 0.1367 ## ## $`Mauchly's Test for Sphericity` ## Effect W p p<.05 ## 3 t 0.01948 1.164e-30 * ## 4 condition:t 0.01948 1.164e-30 * ## ## $`Sphericity Corrections` ## Effect GGe p[GG] p[GG]<.05 HFe p[HF] p[HF]<.05 ## 3 t 0.5049 2.389e-10 * 0.5053 2.352e-10 * ## 4 condition:t 0.5049 2.428e-06 * 0.5053 2.407e-06 * ``` ] --- # (OPTIONAL) Advantages of MLM over ANOVA - MLM can easily be extended to modelling different types of dependent variables. ANOVA cannot. - MLM can be extended to model more complex grouping structures (e.g. children in schools in districts, or independent clusters of both participants as well as experimental items) - MLM provides more options for dealing with unbalanced designs and missing data. - Rpt Measures and Mixed ANOVA are special cases of the MLM! You can still get out your ANOVA-style tables from the MLM! --- # (OPTIONAL) Cluster Robust Standard Errors .pull-left[ <!-- https://rlbarter.github.io/Practical-Statistics/2017/05/10/generalized-estimating-equations-gee/ --> Don't include clustering as part of the model directly, but incorporate the dependency into our residuals term. $$ `\begin{align} \color{red}{\textrm{outcome}} & = \color{blue}{(\textrm{model})} + \textrm{error}^* \\ \end{align}` $$ .footnote[`*` Where errors are clustered] ] .pull-right[ Simply adjusts our standard errors: ```r library(plm) clm <- plm(ACE ~ 1 + visit, data = d3, model="pooling", index="ppt") ``` ``` ## Coefficients: ## Estimate Std. Error t-value Pr(>|t|) ## (Intercept) 85.6259 0.3022 283.3 < 2e-16 *** ## visit -0.3857 0.0488 -7.9 2.9e-13 *** ``` ```r sqrt(diag(vcovHC(clm, method='arellano', cluster='group'))) ``` ``` ## (Intercept) visit ## 0.11639 0.07501 ``` ] --- # (OPTIONAL) Generalised Estimating Equations .pull-left[ Don't include clustering as part of the model directly, but incorporate the dependency into our residuals term. $$ `\begin{align} \color{red}{\textrm{outcome}} & = \color{blue}{(\textrm{model})} + \textrm{error}^* \\ \end{align}` $$ GEE is a "population average" model: - __population average:__ how the _average_ response changes for a 1-unit increase in `\(x\)`, while accounting for within-group correlation. - __subject specific:__ how the response changes for a group when they increase 1-unit in `\(x\)`. .footnote[`*` Where errors are clustered, and follow some form of correlational<br>structure within clusters (e.g. based on how many timepoints apart<br>two observations are).] ] .pull-right[ Specifying a correlational structure for residuals within clusters can influence _what_ we are estimating ```r library(geepack) # needs to be arranged by cluster, # and for cluster to be numeric d3 <- d3 %>% mutate( cluster_id = as.numeric(as.factor(ppt)) ) %>% arrange(cluster_id) # geemod = geeglm(ACE ~ 1 + visit, data = d3, corstr = 'exchangeable', id = cluster_id) ``` ``` ## Coefficients: ## Estimate Std.err Wald Pr(>|W|) ## (Intercept) 85.6502 0.1336 411010.0 < 2e-16 *** ## visit -0.3792 0.0755 25.2 5.2e-07 *** ``` ] --- class: inverse, center, middle, animated, rotateInDownLeft # End
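
---
# (OPTIONAL) ICC from a random intercept model

The ICC formula from earlier maps directly onto the variance components of a random-intercept-only model. As a rough check, here is a minimal sketch of how you *could* compute it for the `d3` data with __lme4__ — the value should be close to (though not identical to) the `ICCbare()` estimate shown earlier, since the two approaches estimate the variance components differently.

```r
library(lme4)

# random-intercept-only model: ACE scores clustered by participant
ri_mod <- lmer(ACE ~ 1 + (1 | ppt), data = d3)

# extract the variance components
vc <- as.data.frame(VarCorr(ri_mod))
# row 1 = between-participant (intercept) variance, sigma^2_b
# row 2 = residual (within-participant) variance,   sigma^2_e

# ICC = sigma^2_b / (sigma^2_b + sigma^2_e)
vc$vcov[1] / sum(vc$vcov)
```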