Analysing Factorial Designs

class: center, middle, inverse, title-slide

# <b>Analysing Factorial Designs </b>
## Data Analysis for Psychology in R 2<br><br>
### Tom Booth and Alex Doumas
### Department of Psychology<br>The University of Edinburgh
### AY 2020-2021

---

# Week's Learning Objectives
1. Interpret the output from a model using dummy coding and sum-to-zero coding.

2. Create specific contrast matrices to test specific effects.

3. Recognise other forms of contrasts.

4. Construct models to test factorial designs.

---
# Topics for today
+ Tabulating data from a factorial design.

+ Recap of the effects of interest in factorial designs.
  + Main effects
  + Simple effects/contrasts
  + Interactions

+ Show the tests of main effects via model comparison using `$F$`-tests.

---
# Example
+ The data come from a study into patient care in paediatric wards.

+ A researcher was interested in whether the subjective well-being of patients differed dependent on the post-operation treatment schedule they were given, and the hospital in which they were staying.

+ **Condition 1**: `Treatment` (Levels: TreatA, TreatB, TreatC).
  
+ **Condition 2**: `Hospital` (Levels: Hosp1, Hosp2).
  
+ Total sample n = 180 (30 patients in each of 6 groups).
  + Between person design.

+ **Outcome**: Subjective well-being (SWB)
  + An average of the ratings from multiple raters (the patient, a member of their family, and a friend).
  + SWB score ranged from 0 to 20.

---
# The data

```r
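library(tidyverse)  # provides read_csv(), slice() and %>%
# Read SWB as a double ("d") and Treatment/Hospital as factors ("f")
hosp_tbl <- read_csv("hospital.csv", col_types = "dff")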
hosp_tbl %>%
  slice(1:10)
```

```
## # A tibble: 10 x 3
##      SWB Treatment Hospital
##    <dbl> <fct>     <fct>   
##  1   6.2 TreatA    Hosp1   
##  2  15.9 TreatA    Hosp1   
##  3   7.2 TreatA    Hosp1   
##  4  11.3 TreatA    Hosp1   
##  5  11.2 TreatA    Hosp1   
##  6   9   TreatA    Hosp1   
##  7  14.5 TreatA    Hosp1   
##  8   7.3 TreatA    Hosp1   
##  9  13.7 TreatA    Hosp1   
## 10  12.6 TreatA    Hosp1
```

---
# Table of means

.pull-left[

```r
mean(hosp_tbl$SWB)
```

```
## [1] 9.880556
```

```r
aggregate(SWB ~ Treatment + Hospital, 
  hosp_tbl, mean)
```

```
##   Treatment Hospital       SWB
## 1    TreatA    Hosp1 10.800000
## 2    TreatB    Hosp1  9.430000
## 3    TreatC    Hosp1 10.103333
## 4    TreatA    Hosp2  7.853333
## 5    TreatB    Hosp2 13.116667
## 6    TreatC    Hosp2  7.980000
```

]

.pull-right[

```r
aggregate(SWB ~ Hospital, 
  hosp_tbl, mean)
```

```
##   Hospital      SWB
## 1    Hosp1 10.11111
## 2    Hosp2  9.65000
```

```r
aggregate(SWB ~ Treatment, 
  hosp_tbl, mean)
```

```
##   Treatment       SWB
## 1    TreatA  9.326667
## 2    TreatB 11.273333
## 3    TreatC  9.041667
```

]

---
# Table of means

+ Combining the outputs above gives us the full table of cell and marginal means.

<table>
 <thead>
  <tr>
   <th style="text-align:left;">  </th>
   <th style="text-align:left;"> Hosp1 </th>
   <th style="text-align:left;"> Hosp2 </th>
   <th style="text-align:left;"> Marginal </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> TreatA </td>
   <td style="text-align:left;"> 10.80 </td>
   <td style="text-align:left;"> 7.85 </td>
   <td style="text-align:left;"> 9.33 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> TreatB </td>
   <td style="text-align:left;"> 9.43 </td>
   <td style="text-align:left;"> 13.12 </td>
   <td style="text-align:left;"> 11.27 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> TreatC </td>
   <td style="text-align:left;"> 10.10 </td>
   <td style="text-align:left;"> 7.98 </td>
   <td style="text-align:left;"> 9.04 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Marginal </td>
   <td style="text-align:left;"> 10.11 </td>
   <td style="text-align:left;"> 9.65 </td>
   <td style="text-align:left;"> 9.88 </td>
  </tr>
</tbody>
</table>
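
+ A minimal sketch of how this table can be assembled in R. It assumes the cell means are typed in by hand from the `aggregate()` output (the object names `cell_means` and `means_tbl` are made up for illustration), and it relies on the design being balanced, so each marginal mean is a plain average of cell means.

```r
# Cell means copied from the aggregate() output (rows: Treatment, columns: Hospital)
cell_means <- matrix(
  c(10.800000,  7.853333,
     9.430000, 13.116667,
    10.103333,  7.980000),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("TreatA", "TreatB", "TreatC"), c("Hosp1", "Hosp2"))
)

# Balanced design (n = 30 per cell): marginal means are averages of cell means
means_tbl <- cbind(cell_means, Marginal = rowMeans(cell_means))
means_tbl <- rbind(means_tbl, Marginal = colMeans(means_tbl))

round(means_tbl, 2)
```

+ With unequal cell sizes this shortcut would no longer hold, and the marginal means would need to be computed from the raw data.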

---
# Hypotheses we test in Factorial Designs
+ Main effects
  + An overall, or average, effect of a condition.
  + In our example, is there an effect of `Treatment` ignoring `Hospital` (and vice versa)?

+ Simple contrasts/effects
  + An effect of one condition at a specific level of another.
  + Is there an effect of `Hospital` for those receiving `TreatA`? (...and so on for all combinations.)

+ Interactions (categorical*categorical)
  + A change in the effect of some condition as a function of another.
  + Does the effect of `Treatment` differ by `Hospital`?
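
+ These three kinds of effect can be illustrated descriptively from the cell means. The sketch below only computes differences between means (it is not a significance test), and the object names are invented for this example.

```r
# Cell means from the example data (rows: Treatment, columns: Hospital)
cell_means <- matrix(
  c(10.800000,  7.853333,
     9.430000, 13.116667,
    10.103333,  7.980000),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("TreatA", "TreatB", "TreatC"), c("Hosp1", "Hosp2"))
)

# Simple effects of Hospital (Hosp2 - Hosp1) at each level of Treatment
simple_hosp <- cell_means[, "Hosp2"] - cell_means[, "Hosp1"]

# Main effect of Hospital: the simple effects averaged over Treatment
main_hosp <- mean(simple_hosp)

# Interaction: how far each simple effect departs from the average effect
interaction_dev <- simple_hosp - main_hosp

round(simple_hosp, 2)
round(main_hosp, 2)
round(interaction_dev, 2)
```

+ If there were no interaction in the sample means, every entry of `interaction_dev` would be zero.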

---
# Our model and coefficients
+ The linear model with two categorical variables:

`$$y_{ijk} = b_0 + \alpha_i + \tau_j + \epsilon_{ijk}$$`
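+ where:
  + `$i = 1, \dots, g_A$`, `$j = 1, \dots, g_B$`, `$k = 1, \dots, n$`, with `$g_A$` and `$g_B$` the number of levels of each factor and `$n$` the number of observations per cell,
  + `$y_{ijk}$` is the `$k$`th observation at level `$i$` of the first factor and level `$j$` of the second factor,
  + `$\alpha_i$` is the effect of level `$i$` of the first factor,
  + `$\tau_j$` is the effect of level `$j$` of the second factor.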

+ But remember: whichever coding scheme we use, a condition with `$g$` levels is represented by `$g-1$` variables.
  + So for `Treatment` we have 2 predictors (D1 & D2)
  + And for `Hospital` we have 1 predictor (D3)
  
+ We can write the linear model more explicitly as:

`$$y_{ijk} = b_0 + \underbrace{(b_1D_1 + b_2D_2)}_{\text{Treatment}} + \underbrace{b_3D_3}_{\text{Hospital}} + \epsilon_{ijk}$$`
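
+ To see the `$D$` variables concretely, we can ask R for the design matrix. This is a sketch on a toy grid with one row per cell of the design (not the real data), and it assumes R's default dummy (treatment) coding with `TreatA` and `Hosp1` as the reference levels.

```r
# One row per cell of the 3 x 2 design (toy grid, not the real data)
design <- expand.grid(
  Treatment = factor(c("TreatA", "TreatB", "TreatC")),
  Hospital  = factor(c("Hosp1", "Hosp2"))
)

# Additive model: intercept, D1 and D2 (Treatment), D3 (Hospital)
X_main <- model.matrix(~ Treatment + Hospital, data = design)
X_main
```

+ The columns `TreatmentTreatB` and `TreatmentTreatC` play the role of D1 and D2, and `HospitalHosp2` plays the role of D3.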
  
---
# Number of interaction terms
+ To include terms for the interaction, we need to cross each level of one condition with the levels of the other.

+ In general, this means we need `$(r-1)(c-1)$` interaction terms
  + where `$r$` and `$c$` represent the number of levels of each condition.
  + In our case this is `$(3-1)(2-1) = 2$`.

`$$y_{ijk} = b_0 + \underbrace{(b_1D_1 + b_2D_2)}_{\text{Treatment}} + \underbrace{b_3D_3}_{\text{Hospital}} + \underbrace{b_4D_{13} + b_5D_{23}}_{\text{Interactions}} + \epsilon_{ijk}$$`

+ We will go into more detail about this soon.
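
+ A quick way to check the number of interaction terms is to build the design matrix with the interaction included. As a sketch, this uses a toy grid with one row per cell and assumes R's default dummy coding; under that coding each interaction column is the product of the two dummy variables it crosses.

```r
# One row per cell of the 3 x 2 design (toy grid, not the real data)
design <- expand.grid(
  Treatment = factor(c("TreatA", "TreatB", "TreatC")),
  Hospital  = factor(c("Hosp1", "Hosp2"))
)

# Full factorial model: main effects plus interaction
X_int <- model.matrix(~ Treatment * Hospital, data = design)
colnames(X_int)

# Count the interaction columns: (3 - 1) * (2 - 1)
n_interaction <- sum(grepl(":", colnames(X_int)))
n_interaction

# Each interaction column is the product of the dummies it crosses
prod_B <- X_int[, "TreatmentTreatB"] * X_int[, "HospitalHosp2"]
all(X_int[, "TreatmentTreatB:HospitalHosp2"] == prod_B)
```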

---
# Testing the overall effects
+ The goal of our `$F$`-tests for the overall effect of a condition or interaction is to assess whether adding all of the coefficients that code that condition (or interaction) improves the model.

+ Hopefully, this practice sounds familiar to you.
  + It's just using incremental `$F$`-tests. 
  
+ To do incremental `$F$`-tests, we need to define a set of models:

```r
m1 <- lm(SWB ~ Treatment, data = hosp_tbl)
m2 <- lm(SWB ~ Hospital, data = hosp_tbl)
m3 <- lm(SWB ~ Treatment + Hospital, data = hosp_tbl)
m4 <- lm(SWB ~ Treatment + Hospital + Treatment*Hospital, data = hosp_tbl)
```

---
# Testing the overall effects

+ For the effect of `Treatment`:

```r
m2 <- lm(SWB ~ Hospital, data = hosp_tbl)
m3 <- lm(SWB ~ Treatment + Hospital, data = hosp_tbl)
anova(m2,m3)
```

```
## Analysis of Variance Table
## 
## Model 1: SWB ~ Hospital
## Model 2: SWB ~ Treatment + Hospital
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)    
## 1    178 1283.5                                 
## 2    176 1106.5  2    177.02 14.078 2.13e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

+ There is a significant overall effect of `Treatment`, `$F(2, 176) = 14.08, p < .001$`.
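
+ The `$F$` value in the table above can be reproduced by hand from the two residual sums of squares. This is a sketch: `incremental_F()` is a helper written for this slide (not a built-in function), and the inputs are the rounded values printed in the output, so the result agrees only to about two decimal places.

```r
# Incremental F statistic for two nested models, from their residual SS and df
incremental_F <- function(rss_restricted, df_restricted, rss_full, df_full) {
  df_diff <- df_restricted - df_full
  F_stat  <- ((rss_restricted - rss_full) / df_diff) / (rss_full / df_full)
  p_val   <- pf(F_stat, df_diff, df_full, lower.tail = FALSE)
  c(F = F_stat, df1 = df_diff, df2 = df_full, p = p_val)
}

# Values read off the anova(m2, m3) table (rounded as printed)
treat_test <- incremental_F(rss_restricted = 1283.5, df_restricted = 178,
                            rss_full = 1106.5, df_full = 176)
treat_test
```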

---
# Testing the overall effects

+ For the effect of `Hospital`:

```r
m1 <- lm(SWB ~ Treatment, data = hosp_tbl)
m3 <- lm(SWB ~ Treatment + Hospital, data = hosp_tbl)
anova(m1,m3)
```

```
## Analysis of Variance Table
## 
## Model 1: SWB ~ Treatment
## Model 2: SWB ~ Treatment + Hospital
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1    177 1116.1                           
## 2    176 1106.5  1    9.5681 1.5219  0.219
```

+ There is no significant overall effect of `Hospital`, `$F(1, 176) = 1.52, p = .219$`.

---
# Testing the overall effects

+ For the effect of interaction:

```r
m3 <- lm(SWB ~ Treatment + Hospital, data = hosp_tbl)
m4 <- lm(SWB ~ Treatment + Hospital + Treatment*Hospital, data = hosp_tbl)
anova(m3,m4)
```

```
## Analysis of Variance Table
## 
## Model 1: SWB ~ Treatment + Hospital
## Model 2: SWB ~ Treatment + Hospital + Treatment * Hospital
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
## 1    176 1106.51                                  
## 2    174  714.34  2    392.18 47.764 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

+ There is a significant `Treatment` by `Hospital` interaction, `$F(2, 174) = 47.76, p < .001$`.

---
# Testing the overall effects
+ Using `anova()`:

```r
m4 <- lm(SWB ~ Treatment + Hospital + Treatment*Hospital, data = hosp_tbl)
anova(m4)
```

```
## Analysis of Variance Table
## 
## Response: SWB
##                     Df Sum Sq Mean Sq F value    Pr(>F)    
## Treatment            2 177.02  88.511 21.5597 4.315e-09 ***
## Hospital             1   9.57   9.568  2.3306    0.1287    
## Treatment:Hospital   2 392.18 196.088 47.7635 < 2.2e-16 ***
## Residuals          174 714.34   4.105                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

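+ The sums of squares match those from the model comparisons, but the `$F$` values for the two main effects do not: `anova(m4)` divides every mean square by the residual mean square of the full model `m4` (4.105), whereas the comparisons against `m3` used the residual mean square of `m3`.

+ The interaction test is identical, and the pattern of results is the same in both approaches.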
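
+ One source of the difference between the two sets of `$F$` values is the denominator. The sketch below recomputes the `Treatment` test both ways, using the rounded sums of squares printed in the outputs (so the results match the tables only to about two decimal places); the object names are made up for illustration.

```r
# Sum of squares and df for Treatment (the same in both outputs)
ss_treat <- 177.02
df_treat <- 2

# Residual SS and df for the additive model (m3) and the full model (m4)
rss_m3 <- 1106.51; df_m3 <- 176
rss_m4 <- 714.34;  df_m4 <- 174

ms_treat <- ss_treat / df_treat

# Same numerator, different error term in the denominator
F_vs_m3 <- ms_treat / (rss_m3 / df_m3)  # as in anova(m2, m3)
F_vs_m4 <- ms_treat / (rss_m4 / df_m4)  # as in anova(m4)

round(c(incremental = F_vs_m3, anova_m4 = F_vs_m4), 2)
```

+ Because the interaction accounts for a large share of the variance, the residual mean square of `m4` is smaller than that of `m3`, which makes the `$F$` values from `anova(m4)` larger.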

---
# Summary of today

+ We looked at constructing `$F$`-tests for the overall effects of conditions (categorical variables), and of their interaction, in a factorial design.

+ Now we can move on to consider the interaction term in more detail.