Multiple Comparisons

class: center, middle, inverse, title-slide

# Multiple Comparisons 
## Data Analysis for Psychology in R 2 
### Alex Doumas and Tom Booth
### Department of Psychology The University of Edinburgh
### AY 2020-2021

---

# Weeks Learning Objectives
1. Understand the problem of multiple comparisons.

2. Run appropriate tests and corrections for multiple comparisons.

3. Apply assumption tests to linear models for experimental designs.

---
# Topics for today

+ What happens when we have lots of tests?

+ Define the multiple comparisons problem.

+ Talk about a couple of approaches to addressing it.

---
# Types of errors

+ Type I error = False Positive
  + Reject the null when the null is true. 
  + `$\alpha = P(\text{Type I Error})$`

+ Type II error = False negative
  + Fail to reject the null when the null is false. 
  + `$\beta = P(\text{Type II Error})$`
  
---
# A single test
+ If we perform a single test, our Type I error rate is `$\alpha$`.
  + So if we set `$\alpha = 0.05$`, the probability of a false positive is 0.05

+ However, what if we do multiple tests ( `$m$` ) each with the same `$\alpha$` level?

+ What is the probability of a false positive among `$m$` tests?

---
# Multiple tests

`$$P(\text{Type I error}) = \alpha$$`
`$$P(\text{not making a Type I error}) = 1 - \alpha$$`

`$$P(\text{Not making a Type I error in m tests}) = (1 - \alpha)^m$$`

`$$P(\text{Making a Type I error in m tests}) = 1 - (1-\alpha)^m$$`

---
# P(Making a Type I error in m tests)

+ Suppose `$m=2$` and `$\alpha = 0.05$`

```r
1 - ((1-0.05)^2)
```

```
## [1] 0.0975
```

+ Suppose `$m=5$` and `$\alpha = 0.05$`

```r
1 - ((1-0.05)^5)
```

```
## [1] 0.2262191
```

+ Suppose `$m=10$` and `$\alpha = 0.05$`

```r
1 - ((1-0.05)^10)
```

```
## [1] 0.4012631
```

---
# Why does this matter?

+ The `$P(\text{Making a Type I error in m tests}) = 1 - (1-\alpha)^m$` is referred to as the family-wise error rate.

+ A "family" is a set of related tests.

+ When we analyse an experimental design, and we look at lots of specific comparisons, we can think of all these tests as a "family".

+ The larger the family, the more likely we are to find a false positive (see previous slide).

---
# Corrections
+ There are various methods designed to control for the number of tests.
  + Here control means to keep the Type I Error rate at an intended `$\alpha$`.

+ Many options. Some of most common:
  + Bonferroni
  + Sidak
  + Tukey
  + Scheffe

+ Others you may see:
  + Holm's step-down
  + Hochberg's step-up

---
# Example
+ The data comes from a study into patient care in a paediatric wards.

+ A researcher was interested in whether the subjective well-being of patients differed dependent on the post-operation treatment schedule they were given, and the hospital in which they were staying.

+ **Condition 1**: `Treatment` (Levels: TreatA, TreatB, TreatC).
  
+ **Condition 2**: `Hosp` (Levels: Hosp1, Hosp2). 
  
+ Total sample n = 180 (30 patients in each of 6 groups).
  + Between person design.

+ **Outcome**: Subjective well-being (SWB). 
  + An average of multiple raters (the patient, a member of their family, and a friend). 
  + SWB score ranged from 0 to 20.

---
# The data

```r
hosp_tbl <- read_csv("hospital.csv", col_types = "dff")
hosp_tbl %>%
 slice(1:10)
```

```
## # A tibble: 10 x 3
## SWB Treatment Hospital
## <dbl> <fct> <fct> 
## 1 6.2 TreatA Hosp1 
## 2 15.9 TreatA Hosp1 
## 3 7.2 TreatA Hosp1 
## 4 11.3 TreatA Hosp1 
## 5 11.2 TreatA Hosp1 
## 6 9 TreatA Hosp1 
## 7 14.5 TreatA Hosp1 
## 8 7.3 TreatA Hosp1 
## 9 13.7 TreatA Hosp1 
## 10 12.6 TreatA Hosp1
```

---
# Our results

```r
m4 <- lm(SWB ~ Treatment + Hospital + Treatment*Hospital, data = hosp_tbl)
anova(m4)
```

```
## Analysis of Variance Table
## 
## Response: SWB
## Df Sum Sq Mean Sq F value Pr(>F) 
## Treatment 2 177.02 88.511 21.5597 4.315e-09 ***
## Hospital 1 9.57 9.568 2.3306 0.1287 
## Treatment:Hospital 2 392.18 196.088 47.7635 < 2.2e-16 ***
## Residuals 174 714.34 4.105 
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

---
# Bonferroni & Sidak
+ Both are considered "conservative" adjustments.

+ Each treats individual tests within the family as if they are independent.
  + Consider an `$\alpha = 0.05$` and `$m=\text{number of tests}=10$`

+ **Bonferroni**: `$\alpha_{Bonferroni} = \frac{\alpha}{m}$`

```r
0.05/10
```

```
## [1] 0.005
```

+ **Sidak**: `$\alpha_{Sidak} = 1 - (1- \alpha)^{\frac{1}{m}}$`

```r
1-((1-0.05)^(1/10))
```

```
## [1] 0.005116197
```

---
# Bonferroni in action
-

```r
tA1_A2 <- t.test(SWB ~ Hospital, data = filter(hosp_tbl, (Treatment == "TreatA")))
tA1_B1 <- t.test(SWB ~ Treatment, data = filter(hosp_tbl, ((Treatment == "TreatA" & Hospital == "Hosp1") | (Treatment == "TreatB" & Hospital == "Hosp1"))))
tA1_C1 <- t.test(SWB ~ Treatment, data = filter(hosp_tbl, ((Treatment == "TreatA" & Hospital == "Hosp1") | (Treatment == "TreatC" & Hospital == "Hosp1"))))
tA1_B2 <- t.test(SWB ~ Treatment, data = filter(hosp_tbl, ((Treatment == "TreatA" & Hospital == "Hosp1") | (Treatment == "TreatB" & Hospital == "Hosp2"))))
tA1_C2 <- t.test(SWB ~ Treatment, data = filter(hosp_tbl, ((Treatment == "TreatA" & Hospital == "Hosp1") | (Treatment == "TreatC" & Hospital == "Hosp2"))))
tA2_B1 <- t.test(SWB ~ Treatment, data = filter(hosp_tbl, ((Treatment == "TreatA" & Hospital == "Hosp2") | (Treatment == "TreatB" & Hospital == "Hosp1"))))
tA2_C1 <- t.test(SWB ~ Treatment, data = filter(hosp_tbl, ((Treatment == "TreatA" & Hospital == "Hosp2") | (Treatment == "TreatC" & Hospital == "Hosp1"))))
tA2_B2 <- t.test(SWB ~ Treatment, data = filter(hosp_tbl, ((Treatment == "TreatA" & Hospital == "Hosp2") | (Treatment == "TreatB" & Hospital == "Hosp2"))))
tA2_C2 <- t.test(SWB ~ Treatment, data = filter(hosp_tbl, ((Treatment == "TreatA" & Hospital == "Hosp2") | (Treatment == "TreatC" & Hospital == "Hosp2"))))
tB1_C1 <- t.test(SWB ~ Treatment, data = filter(hosp_tbl, ((Treatment == "TreatB" & Hospital == "Hosp1") | (Treatment == "TreatC" & Hospital == "Hosp1"))))
tB1_B2 <- t.test(SWB ~ Hospital, data = filter(hosp_tbl, ((Treatment == "TreatB" & Hospital == "Hosp1") | (Treatment == "TreatB" & Hospital == "Hosp2"))))
tB1_C2 <- t.test(SWB ~ Treatment, data = filter(hosp_tbl, ((Treatment == "TreatB" & Hospital == "Hosp1") | (Treatment == "TreatC" & Hospital == "Hosp2"))))
tB2_C1 <- t.test(SWB ~ Treatment, data = filter(hosp_tbl, ((Treatment == "TreatB" & Hospital == "Hosp2") | (Treatment == "TreatC" & Hospital == "Hosp1"))))
tB2_C2 <- t.test(SWB ~ Treatment, data = filter(hosp_tbl, ((Treatment == "TreatB" & Hospital == "Hosp2") | (Treatment == "TreatC" & Hospital == "Hosp2"))))
tC1_C2 <- t.test(SWB ~ Hospital, data = filter(hosp_tbl, ((Treatment == "TreatC" & Hospital == "Hosp1") | (Treatment == "TreatC" & Hospital == "Hosp2"))))
```

---
# Bonferroni in action
- 15 tests, so our adjusted `$\alpha = 0.05/15 = 0.003$`

```r
tA1_A2
```

```
## 
## 	Welch Two Sample t-test
## 
## data:  SWB by Hospital
## t = 4.5985, df = 45.013, p-value = 3.455e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.656055 4.237278
## sample estimates:
## mean in group Hosp1 mean in group Hosp2 
##           10.800000            7.853333
```
  
---
# Bonferroni in action
- 15 tests, so our adjusted `$\alpha = 0.05/15 = 0.003$`

```r
tA1_B1
```

```
## 
## 	Welch Two Sample t-test
## 
## data:  SWB by Treatment
## t = 1.9726, df = 52.958, p-value = 0.05377
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.02304211  2.76304211
## sample estimates:
## mean in group TreatA mean in group TreatB 
##                10.80                 9.43
```
  
---
# Bonferroni in action
- 15 tests, so our adjusted `$\alpha = 0.05/15 = 0.003$`

```r
tA1_B2
```

```
## 
## 	Welch Two Sample t-test
## 
## data:  SWB by Treatment
## t = -3.8923, df = 36.002, p-value = 0.0004123
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.523772 -1.109561
## sample estimates:
## mean in group TreatA mean in group TreatB 
##             10.80000             13.11667
```
  
---
# Bonferroni in action
- 15 tests, so our adjusted `$\alpha = 0.05/15 = 0.003$`

```r
tA1_C1
```

```
## 
## 	Welch Two Sample t-test
## 
## data:  SWB by Treatment
## t = 0.99523, df = 53.555, p-value = 0.3241
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.7070316  2.1003649
## sample estimates:
## mean in group TreatA mean in group TreatC 
##             10.80000             10.10333
```
  
---
# Bonferroni in action
- 15 tests, so our adjusted `$\alpha = 0.05/15 = 0.003$`

```r
tA1_C2
```

```
## 
## 	Welch Two Sample t-test
## 
## data:  SWB by Treatment
## t = 4.7928, df = 34.573, p-value = 3.073e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.624994 4.015006
## sample estimates:
## mean in group TreatA mean in group TreatC 
##                10.80                 7.98
```

---
# Bonferroni in action
- A little easier, but less thorough...

```r
pairwise_ts <- hosp_tbl %>%
 group_by(Hospital) %>% 
 t_test(SWB ~ Treatment, p.adjust.method = "bonferroni")
```

---
# Bonferroni in action
- A little easier, but less thorough...

```r
pairwise_ts
```

```
## # A tibble: 6 x 11
## Hospital .y. group1 group2 n1 n2 statistic df p p.adj
## * <fct> <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl>
## 1 Hosp1 SWB TreatA TreatB 30 30 1.97 53.0 5.40e- 2 1.61e- 1
## 2 Hosp1 SWB TreatA TreatC 30 30 0.995 53.6 3.24e- 1 9.72e- 1
## 3 Hosp1 SWB TreatB TreatC 30 30 -1.15 58.0 2.54e- 1 7.62e- 1
## 4 Hosp2 SWB TreatA TreatB 30 30 -14.4 49.2 2.83e-19 8.49e-19
## 5 Hosp2 SWB TreatA TreatC 30 30 -0.357 45.9 7.22e- 1 1.00e+ 0
## 6 Hosp2 SWB TreatB TreatC 30 30 19.5 57.2 5.72e-27 1.72e-26
## # … with 1 more variable: p.adj.signif <chr>
```

---
# Tukey & Scheffe
+ **Scheffe procedure** involves an adjustment to the critical value of `$F$`.
  + The adjustment relates to the number of comparisons being made.
  + And makes the critical value of `$F$` larger for a fixed `$\alpha$`. 
  + The more tests, the larger `$F$`.

+ The square-root of the adjusted F provides and adjusted `$t$`.

+ **Tukey's HSD** (Honest significant Differences)
  + Compares all pairwise group means. 
  + Each difference is divided by the `$SE$` of the sum of means. 
  + This produces a `$q$` statistic for each comparison. 
  + And is compared against a studentized range distribution.

---
# Tukey in R
- TukeyHSD() function works on an aov object.

```r
m5 <- aov(SWB ~ Treatment + Hospital + Treatment*Hospital, data = hosp_tbl)
TukeyHSD(m5)
```
<img src="./figs/tukey.png" width="40%" />

---
# Summary of today

+ Recapped the inflation of Type I error rate with multiple tests.

+ Briefly introduced tools for correcting for multiple comparisons.

+ In next weeks practical exercises, we will look at applying these to the analysis of factorial designs.