class: center, middle, inverse, title-slide

.title[
# Multilevel Model Inference
]
.subtitle[
## Data Analysis for Psychology in R 3
]
.author[
### Josiah King
]
.institute[
### Department of Psychology
The University of Edinburgh
]

---
class: inverse, center, middle

<h1 style="text-align: left;">This Lecture:</h1>
<h3 style="text-align: left;">1. Inferential methods with `lmer`</h3>

---
# <p></p> you might have noticed...

```r
library(lme4)
library(tidyverse) # for read_csv()
d3 <- read_csv("https://uoepsy.github.io/data/dapr3_mindfuldecline.csv")
cogtime_model <- lmer(ACE ~ 1 + visit + (1 + visit | ppt), data = d3)
summary(cogtime_model)
```

.pull-left[
```
## Linear mixed model fit by REML ['lmerMod']
## Formula: ACE ~ 1 + visit + (1 + visit | ppt)
##    Data: d3
## 
## REML criterion at convergence: 365.2
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.3120 -0.6818 -0.0526  0.7016  2.4022 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  ppt      (Intercept) 0.157    0.396         
##           visit       0.104    0.322    -0.59
##  Residual             0.244    0.494         
## Number of obs: 177, groups:  ppt, 20
## 
## Fixed effects:
##             Estimate Std. Error t value
*## (Intercept)  85.5843     0.1201  712.67
*## visit        -0.3588     0.0733   -4.89
## 
## Correlation of Fixed Effects:
##       (Intr)
## visit -0.534
```
]
.pull-right[
![](jk_img_sandbox/wotnop.png)
{{content}}
]

--

<br><br><br>
<center><b>Extensive debate about how best to test parameters from MLMs.</b></center>

---
# p-values in `lm()`

In simple LM, we test the reduction in residual sums of squares (SS) with an `\(F\)` statistic, which follows an `\(F\)` distribution with known `\(df\)`.

`\(F\)` tests will have `\(k\)` numerator degrees of freedom (the number of parameters being tested) and `\(n-k-1\)` denominator degrees of freedom. The `\(t\)`-statistics for individual coefficients follow `\(t\)`-distributions with the same `\(n-k-1\)` degrees of freedom.

<br>
$$
`\begin{align}
F \qquad & = \qquad \frac{MS_{model}}{MS_{residual}} \qquad &= \qquad \frac{SS_{model}/df_{model}}{SS_{residual}/df_{residual}} \\
\quad \\
& & df_{model} = k \\
& & df_{residual} = n-k-1 \\
\end{align}`
$$
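
As a minimal sketch of where these `\(df\)` come from, using the `d3` data loaded earlier (`\(n = 177\)`, one predictor): note this single-level model ignores the clustering in the data, so it is for illustration only.

```r
# a single-level lm() purely to illustrate the known df
# (it ignores the nesting of observations within participants)
mod_lm <- lm(ACE ~ visit, data = d3)
df.residual(mod_lm) # n - k - 1 = 177 - 1 - 1 = 175
anova(mod_lm)       # F test of visit, on 1 and 175 df
```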
---
count: false
exclude: true
# p-values in `lm()`

In simple LM, we test the reduction in residual sums of squares (SS) with an `\(F\)` statistic, which follows an `\(F\)` distribution with known `\(df\)`.

`\(F\)` tests will have `\(k\)` numerator degrees of freedom (the number of parameters being tested) and `\(n-k-1\)` denominator degrees of freedom. The `\(t\)`-statistics for individual coefficients follow `\(t\)`-distributions with the same `\(n-k-1\)` degrees of freedom.

<br>
$$
`\begin{align}
F \qquad & = \qquad \frac{\text{variance explained by model}/\text{nr model parameters}}{\text{variance left over}/\text{nr residuals that are free to vary}} & \\
\quad \\
& = \frac{\text{avg variance explained per model parameter}}{\text{avg variance explained per residual}} &
\end{align}`
$$

---
# p-values in `lmer()`?

- In certain *very specific* conditions, we can work out what the degrees of freedom are.
- Parameter estimates from these models are not based on minimising sums of squared residuals
    - they are the ML/REML estimates
    - this is good: it means they can handle unbalanced designs and complex random effect structures
- But:
    - it is unclear how to calculate the denominator `\(df\)`
    - it is unclear whether the test statistics even follow an `\(F\)` distribution

---
# Options for inference

<ol>
<li>approximating ddf</li>
<ul>
<li>satterthwaite</li>
<li>kenward-roger</li>
<li>m-l-1</li>
<li>...</li>
</ul>
<li>likelihood-based methods</li>
<ul>
<li>profile likelihood confidence interval</li>
<li>likelihood ratio tests</li>
</ul>
<li>bootstrap</li>
<ul>
<li>parametric</li>
<ul>
<li>confidence interval</li>
<li>likelihood ratio test</li>
</ul>
<li>case-based</li>
<ul>
<li>confidence interval</li>
</ul>
</ul>
</ol>
<p>(there are others...)</p>

---
# Options for inference

<ol>
<li>approximating ddf</li>
<ul>
<li style="opacity:.4">satterthwaite</li>
<li>kenward-roger</li>
<li style="opacity:.4">m-l-1</li>
<li style="opacity:.4">...</li>
</ul>
<li>likelihood-based methods</li>
<ul>
<li>profile likelihood confidence interval</li>
<li>likelihood ratio tests</li>
</ul>
<li style="opacity:.4">bootstrap</li>
<ul>
<li style="opacity:.4">parametric</li>
<ul>
<li style="opacity:.4">confidence interval: <code>confint(model, method = "boot")</code></li>
<li style="opacity:.4">likelihood ratio test: <code>pbkrtest::PBmodcomp(full_model, reduced_model)</code></li>
</ul>
<li style="opacity:.4">case-based</li>
<ul>
<li style="opacity:.4">confidence interval</li>
</ul>
</ul>
</ol>
<p style="opacity:.4">(there are others...)</p>

---
# Kenward-Roger

- Adjusts standard errors to avoid small-sample bias
- Approximates denominator degrees of freedom
- Models must be fitted with REML (which is good for small samples!)

__fixed effects__

```r
library(parameters)
model_parameters(cogtime_model, ci_method = "kr") # %>% print_html()
```

```
## # Fixed Effects
## 
## Parameter   | Coefficient |   SE |         95% CI |      t |    df |      p
## ---------------------------------------------------------------------------
## (Intercept) |       85.58 | 0.12 | [85.33, 85.84] | 710.54 | 18.51 | < .001
## visit       |       -0.36 | 0.07 | [-0.51, -0.21] |  -4.89 | 18.99 | < .001
## 
## # Random Effects
## 
## Parameter                  | Coefficient |   SE |         95% CI
## ---------------------------------------------------------------
## SD (Intercept: ppt)        |        0.40 | 0.12 | [ 0.22,  0.71]
## SD (visit: ppt)            |        0.32 | 0.05 | [ 0.23,  0.45]
## Cor (Intercept~visit: ppt) |       -0.59 | 0.36 | [-0.94,  0.38]
## SD (Residual)              |        0.49 | 0.03 | [ 0.44,  0.56]
```

---
# Kenward-Roger

- Adjusts standard errors to avoid small-sample bias
- Approximates denominator degrees of freedom
- Models must be fitted with REML (which is good for small samples!)

__Model Comparison__

```r
restricted_model <- lmer(ACE ~ 1 + (1 + visit | ppt), data = d3, REML = TRUE)
full_model <- lmer(ACE ~ 1 + visit + (1 + visit | ppt), data = d3, REML = TRUE)

library(pbkrtest)
KRmodcomp(full_model, restricted_model)
```

```
## large : ACE ~ 1 + visit + (1 + visit | ppt)
## small : ACE ~ 1 + (1 + visit | ppt)
##       stat ndf ddf F.scaling p.value    
## Ftest 23.9 1.0 19.0         1  0.0001 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

---
# Likelihood ratio tests

.pull-left[
- Compares the log-likelihood of two competing models.
- `\(-2 \times\)` the log of the ratio of the two likelihoods is **asymptotically** `\(\chi^2\)` distributed.
    - *this means for small samples (at any level) it may be unreliable*
    - how small is too small? ¯\\\_(ツ)\_/¯ 42?
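
A hand-computed sketch (using the `full_model` and `restricted_model` fitted below):

```r
# the chi-square statistic anova() reports below is -2 times
# the difference in the two models' (ML) log-likelihoods
-2 * (as.numeric(logLik(restricted_model)) -
        as.numeric(logLik(full_model)))
```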
]
.pull-right[
__LRTs and ML/REML__

<ul>
<li>if models differ in fixed effects only, then fit with ML</li>
<li style="opacity:.6">if models differ in random effects only, then fit with REML (or ML if big N)</li>
<li style="opacity:.6">if models differ in both, then fit with ML</li>
</ul>

`anova()` will automatically re-fit models with ML.
]

<br>

```r
restricted_model <- lmer(ACE ~ 1 + (1 + visit | ppt), data = d3, REML = FALSE)
full_model <- lmer(ACE ~ 1 + visit + (1 + visit | ppt), data = d3, REML = FALSE)

anova(restricted_model, full_model)
```

```
## Data: d3
## Models:
## restricted_model: ACE ~ 1 + (1 + visit | ppt)
## full_model: ACE ~ 1 + visit + (1 + visit | ppt)
##                  npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)    
## restricted_model    5 385 401   -188      375                        
## full_model          6 371 390   -180      359  16.3  1   0.000055 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

---
# Likelihood-based CIs

- Evaluates the curvature of the likelihood surface at the estimate.
    - sharp curvature = more certainty in estimate
    - gradual curvature = less certainty in estimate
- models need to be fitted with ML, not REML

```r
confint(cogtime_model, method="profile")
```

```
##                2.5 %   97.5 %
## .sig01       0.1537  0.64010
## .sig02      -0.9368 -0.04886
## .sig03       0.2339  0.44750
## .sigma       0.4413  0.55875
## (Intercept) 85.3421 85.82385
## visit       -0.5057 -0.21155
```

---
# Summary

- Lots of debate around how best to conduct statistical inferences based on multilevel models.

--

- Lots of options.

--

| | df approximations | likelihood-based |
| ---------------- | ------------------------------------------------------------------ | ------------------------------------------------------------------- |
| tests or CIs for model parameters | `library(parameters)`<br>`model_parameters(model, ci_method="kr")` | `confint(model, method="profile")` |
| model comparison<br><small>(different fixed effects, same random effects)</small> | `library(pbkrtest)`<br>`KRmodcomp(model1,model0)` | `anova(model0,model1)` |
| | fit models with `REML=TRUE`.<br>good option for small samples | fit models with `REML=FALSE`.<br>needs large N at both levels (40+) |

- check your sample size! but if in doubt, go for KR.

---
class: inverse, center, middle, animated, rotateInDownLeft
# End

---
class: inverse, center, middle

<h1 style="text-align: left;">This Lecture:</h1>
<h3 style="text-align: left;opacity:.4">1. Inferential methods with `lmer`</h3>
<h3 style="text-align: left; ">Bonus content</h3>

---
# Bootstrap

<ul>
<li>Parametric Bootstrap<br>assumes that explanatory variables are fixed and that model specification and the distributions such as `\(\zeta_i \sim N(0,\sigma_{\zeta})\)` and `\(\varepsilon_i \sim N(0,\sigma_{\varepsilon})\)` are correct.</li><br>
<li style="opacity: .6">Case-based Bootstrap<br>minimal assumptions - we just need to ensure that we correctly specify the hierarchical dependency of data.<br>requires a decision about which levels to resample at.<br>(discussed more next week)</li>
</ul>

---
# Parametric Bootstrap

The basic premise is that we:

1. fit model(s) to data
2. Do many times:
    - simulate data based on the parameters of the fitted model(s)
    - compute some statistic/estimates from the simulated data
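3. Construct a distribution of possible values

A minimal by-hand sketch of this recipe, assuming the `cogtime_model` from earlier (`simulate()` and `refit()` are the `lme4` tools for step 2; `confint(., method="boot")` and `PBmodcomp()` wrap all of this up for us):

```r
# hand-rolled parametric bootstrap of the fixed effect of visit
# (may occasionally give convergence warnings on some resamples)
boot_ests <- replicate(1000, {
  sim_y   <- simulate(cogtime_model)[[1]]           # 2a. simulate data from the fitted model
  sim_mod <- refit(cogtime_model, newresp = sim_y)  # 2b. re-fit model to the simulated data
  fixef(sim_mod)["visit"]                           #     statistic of interest
})
quantile(boot_ests, c(.025, .975))                  # 3.  distribution -> 95% CI
```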
---
# Parametric Bootstrap

.pull-left[
## Confidence Intervals

```r
full_model <- lmer(ACE ~ 1 + visit + (1 + visit | ppt), data = d3)
confint(full_model, method = "boot")
```

```
              2.5 %  97.5 %
.sig01       0.1130  0.6133
.sig02      -1.0000  0.1028
.sig03       0.2084  0.4225
.sigma       0.4253  0.5542
(Intercept) 85.3443 85.8334
visit       -0.5029 -0.2137
```
]
.pull-right[
## LRT

(a bootstrapped version of the likelihood ratio test):

```r
restricted_model <- lmer(ACE ~ 1 + (1 + visit | ppt), data = d3)
full_model <- lmer(ACE ~ 1 + visit + (1 + visit | ppt), data = d3)

library(pbkrtest)
PBmodcomp(full_model, restricted_model)
```

```
Bootstrap test; time: 71.91 sec; samples: 1000; extremes: 0;
large : ACE ~ 1 + visit + (1 + visit | ppt)
ACE ~ 1 + (1 + visit | ppt)
         stat df  p.value    
LRT      16.3  1 0.000055 ***
PBtest   16.3       0.001 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```
]

---
# Adding more options

| | df approximations | likelihood-based | parametric bootstrap |
| ---------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------- |
| model parameters | `library(parameters)`<br>`model_parameters(model, ci_method="kr")` | `confint(model, method="profile")` | `confint(model, method="boot")` |
| model comparison | `library(pbkrtest)`<br>`KRmodcomp(model1,model0)` | `anova(model0,model1)` | `library(pbkrtest)`<br>`PBmodcomp(model1,model0)` |
| | for KR, fit models with `REML=TRUE` (a good option for small samples).<br>Other options available (e.g. Satterthwaite, m-l-1, ...)<br>easy to implement, but debate as to whether statistics actually follow an F distribution | if models differ in fixed effects, we cannot use `\(-2\Delta LL\)` when models are fitted with REML. | Time-consuming, but probably the most reliable option (although can be problematic with unstable models) |
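
Because the parametric bootstrap is time-consuming, it can help to choose the number of simulations and to set a seed for reproducibility. A small sketch (the specific values here are just for illustration; `nsim` is an argument of `pbkrtest::PBmodcomp()`):

```r
set.seed(2023)  # makes the bootstrap result reproducible
# fewer simulations run faster, but give a coarser p-value
PBmodcomp(full_model, restricted_model, nsim = 500)
```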