Testing Contrasts and One-way Analyses

class: center, middle, inverse, title-slide

.title[
# <b> Testing Contrasts and One-way Analyses </b>
]
.subtitle[
## Data Analysis for Psychology in R 2<br><br>
]
.author[
### dapR2 Team
]
.institute[
### Department of Psychology<br>The University of Edinburgh
]

---

# Week's Learning Objectives
1. Introduce rules for constructing contrasts
2. Introduce `emmeans` as a tool for testing different effects in models with categorical predictors.
3. Brief refresher on experimental design
4. Distinguish between main effects, simple effects and contrasts
5. Be able to estimate main effects via use of `$F$`-tests

---
# Manual contrast testing
+ We can structure a wide variety of contrasts so long as they can be written:

1. A as a linear combination of population means.
  2. The associated coefficients (weights) sum to zero.

+ So

$$H_0: c_1\mu_1 + c_2\mu_2 + c_3\mu_3 $$

+ With

`$$c_1 + c_2 + c_3 = 0$$`

---
# Rules for assigning weights

+ **Rule 1**: Weights range between -1 and 1
+ **Rule 2**: The group(s) in one chunk are given negative weights, the group(s) in the other get positive weights
+ **Rule 3**: The sum of the weights of the comparison must be 0
+ **Rule 4**: If a group is not involved in the comparison, weight is 0
+ **Rule 5**: For a given comparison, weights assigned to group(s) are equal to 1 divided by the number of groups in that chunk.
+ **Rule 6**: Restrict yourself to running `$k$` - 1 comparisons (where `$k$` = number of groups)
+ **Rule 7**: Each contrast can only compare 2 chunks of variance
+ **Rule 8**: Once a group singled out, it can not enter other contrasts

---
# New example
+ Suppose we were interested in the effect of various relationship statuses on an individuals subjective well-being (`swb`)
  + Keeping with a theme on our outcome.

+ Our predictor is `status` which has 5 levels:
  + Married or Civil Partnership
  + Cohabiting relationship
  + Single
  + Widowed
  + Divorced

+ Let's say we have data on 500 people.

---
# Data
<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> status </th>
   <th style="text-align:right;"> n </th>
   <th style="text-align:right;"> mean </th>
   <th style="text-align:right;"> sd </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Cohab </td>
   <td style="text-align:right;"> 100 </td>
   <td style="text-align:right;"> 11.44 </td>
   <td style="text-align:right;"> 4.22 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Divorced </td>
   <td style="text-align:right;"> 50 </td>
   <td style="text-align:right;"> 9.37 </td>
   <td style="text-align:right;"> 2.34 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Married/CP </td>
   <td style="text-align:right;"> 275 </td>
   <td style="text-align:right;"> 10.63 </td>
   <td style="text-align:right;"> 3.41 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Single </td>
   <td style="text-align:right;"> 50 </td>
   <td style="text-align:right;"> 8.06 </td>
   <td style="text-align:right;"> 2.19 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Widowed </td>
   <td style="text-align:right;"> 25 </td>
   <td style="text-align:right;"> 6.00 </td>
   <td style="text-align:right;"> 1.07 </td>
  </tr>
</tbody>
</table>

---
# Applying rules
+ Let's say we want to make two contrasts

1. Those who are currently or previously married or in a civil partnership vs not.
2. Those who are currently married or in a civil partnership vs those who have previously been.

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> group </th>
   <th style="text-align:right;"> contrast1 </th>
   <th style="text-align:right;"> contrast2 </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Cohab </td>
   <td style="text-align:right;"> -0.50 </td>
   <td style="text-align:right;"> 0.0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Divorced </td>
   <td style="text-align:right;"> 0.33 </td>
   <td style="text-align:right;"> -0.5 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Married/CP </td>
   <td style="text-align:right;"> 0.33 </td>
   <td style="text-align:right;"> 1.0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Single </td>
   <td style="text-align:right;"> -0.50 </td>
   <td style="text-align:right;"> 0.0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Widowed </td>
   <td style="text-align:right;"> 0.33 </td>
   <td style="text-align:right;"> -0.5 </td>
  </tr>
</tbody>
</table>

---
# Orthogonal vs. Non-orthogonal Contrasts
+ Orthogonal contrasts test independent sources of variation.
  + If we follow the rules above, we will have orthogonal contrasts.

+ Non-orthogonal contrasts test non-independent sources of variation.
  + This presents some further statistical challenges in terms of making inferences. 
  + We will come back to this discussion later in the course.

---
# Rule 10: Checking if contrasts are orthogonal
+ The sum of the products of the weights will = 0 for any pair of orthogonal comparisons

`$$\sum{c_{1j}c_{2j}} = 0$$`

---
# From our example

+ Below we can see the product of `$c_1c_2$` for each level, and the row-wise sums for each contrast and the products.
  + The 0 for contrast 1 and 2 show we have set correct weights.
  + The 0 for the product shows the contrasts are orthogonal

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> Contrast </th>
   <th style="text-align:right;"> Cohab </th>
   <th style="text-align:right;"> Divorced </th>
   <th style="text-align:right;"> Married_CP </th>
   <th style="text-align:right;"> Single </th>
   <th style="text-align:right;"> Widowed </th>
   <th style="text-align:right;"> Sum </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Contrast1 </td>
   <td style="text-align:right;"> -0.5 </td>
   <td style="text-align:right;"> 0.330 </td>
   <td style="text-align:right;"> 0.33 </td>
   <td style="text-align:right;"> -0.5 </td>
   <td style="text-align:right;"> 0.330 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Contrast2 </td>
   <td style="text-align:right;"> 0.0 </td>
   <td style="text-align:right;"> -0.500 </td>
   <td style="text-align:right;"> 1.00 </td>
   <td style="text-align:right;"> 0.0 </td>
   <td style="text-align:right;"> -0.500 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Product </td>
   <td style="text-align:right;"> 0.0 </td>
   <td style="text-align:right;"> -0.165 </td>
   <td style="text-align:right;"> 0.33 </td>
   <td style="text-align:right;"> 0.0 </td>
   <td style="text-align:right;"> -0.165 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
</tbody>
</table>

---
class: center, middle
# Questions....

---
# Using `emmeans` to test contrasts

+ We will use the package `emmeans` to test our contrasts
  + We will also be using this in the next few weeks to look at analysing experimental designs.

+ **E**stimated
+ **M**arginal
+ **Means**

+ Essentially this package provides us with a lot of tools to help us model contrasts and linear functions.

---
# Working with `emmeans`
+ First we run our model:

```r
status_res <- lm(swb ~ status, wb_tib)
```

+ wNext we use the `emmeans` to get the estimated means of our groups.

```r
status_mean <- emmeans(status_res, ~status)
status_mean
```

```
##  status     emmean    SE  df lower.CL upper.CL
##  Cohab       11.44 0.333 495    10.78    12.09
##  Divorced     9.37 0.471 495     8.45    10.30
##  Married/CP  10.63 0.201 495    10.23    11.02
##  Single       8.06 0.471 495     7.13     8.99
##  Widowed      6.00 0.666 495     4.70     7.31
## 
## Confidence level used: 0.95
```

---
# Visualise estimated means

.pull-left[

```r
plot(status_mean)
```

+ We then use these means to test contrasts

]

.pull-right[
![](dapr2_13_onewaycontrasts_files/figure-html/unnamed-chunk-9-1.png)

]

---
# Defining the contrast

+ **KEY POINT**: The order of your categorical variable matters as `emmeans` uses this order.

```r
levels(wb_tib$status)
```

```
## [1] "Cohab"      "Divorced"   "Married/CP" "Single"     "Widowed"
```

```r
status_comp <- list("Married or CP vs not" = c(-1/2, 1/3, 1/3, -1/2, 1/3),
                    "Current vs Not current" = c(0, -1/2, 1, 0, -1/2))
```

---
# Requesting the test
+ In order to test our effects, we use the `contrast` function from `emmeans`

```r
status_comp_test <- contrast(status_mean, status_comp)
status_comp_test
```

```
##  contrast               estimate    SE  df t.ratio p.value
##  Married or CP vs not      -1.08 0.402 495  -2.690  0.0074
##  Current vs Not current     2.94 0.455 495   6.459  <.0001
```
+ We can see we have p-values, but we can also request confidence intervals

```r
confint(status_comp_test)
```

```
##  contrast               estimate    SE  df lower.CL upper.CL
##  Married or CP vs not      -1.08 0.402 495    -1.87   -0.291
##  Current vs Not current     2.94 0.455 495     2.04    3.829
## 
## Confidence level used: 0.95
```

---
# Interpreting the results
+ The estimate is the difference between the average of the group means within each chunk.

```r
confint(status_comp_test)
```

```r
((10.63 + 6.00 + 9.37)/3) - ((11.44 + 8.06)/2)
```

```
## [1] -1.083333
```
+ So those who are not currently or previously married or in a civial partnership have higher SWB.
  + And this is significant.

---
class: center, middle
# Questions....

---
# Experimental Design: manipulation
+ A key feature of experimental designs is that we actively manipulate our predictor (IV).

+ The intention is that changing the predictor will result in changes in the outcome (DV).

+ That is our manipulation will lead to variation in the outcome.

+ Our experiments can fail because we design these manipulations poorly.

+ The predictors in an experiment are (primarily) experimental conditions.

---
# Conditions/Factors & levels
+ **Conditions**: 
  + Are part of our experimental designs.
  + They are what is manipulated.
  
+ **Factors**
  + The resultant variables in our data set that code the experimental conditions are typically called factors.
  + Generally the terms conditions and factors are used interchangeably.
  + But it is useful to differentiate the design (conditions) and the data that represents aspects of the design (factors)

+ Factors can have **levels**
  + These are the number of ways we vary or manipulate the condition

---
# Example
+ So for our now very familiar example:

+ **Condition 1**: `Treatment` (Levels: TreatA, TreatB, TreatC).
  
+ **Condition 2**: `Hosp` (Levels: Hosp1, Hosp2). 
  
+ **Outcome**: Subjective well-being (SWB)

---
# Models and Experiments
+ Our linear model can be simply stated as:

`$$outcome = model + error$$`

+ When we have an experiment:

`$$outcome = design + error$$`

+ The design is simply sets of categorical variables.

`$$y = b_0 + \underbrace{(b_1E_1 + b_2E_2)}_{\text{Conditin1}} + \underbrace{b_3E_3}_{\text{Condition2}} + \underbrace{b_4E_{13} + b_5E_{23}}_{\text{Interactions}} + \underbrace{\epsilon_{i}}_{\text{error}}$$`

+ So to analyse an experiment, we are simply analysing a linear model with categorical predictors.

---
# Hypotheses we test in experimental studies
+ In a one-way design we only have one condition that is manipulated:

`$$y = b_0 + \underbrace{(b_1E_1 + b_2E_2)}_{\text{Treatment}} +  \underbrace{\epsilon_{i}}_{\text{error}}$$`

+ One-way designs:
  + **Main effect**: Tests overall effect of a condition  ( `$F$`-tests)
  + **Contrasts**: Tests differences between specific group means (based on coding schemes and associated `$\beta$` )

---
# Hypotheses we test in experimental studies
+ In a two-way (or 2+ way) design we manipulate multiple conditions:

`$$y_{ijk} = b_0 + \underbrace{(b_1E_1 + b_2E_2)}_{\text{Treatment}} + \underbrace{b_3E_3}_{\text{Hospital}} + \underbrace{b_4E_{13} + b_5E_{23}}_{\text{Interactions}} + \epsilon_{i}$$`

+ Factorial designs:
  + Main effects & Contrasts
  + **Interactions**: Categorical*categorical and usually based on effects (sum to zero) coding ( `$F$`-tests & `$\beta$` )
  + **Simple contrasts/effects**: Effects of one level in one condition, across levels of another condition.

---
# Hypotheses we test 
+ Main effects
  + An overall, or average, effect of a condition.
  + Is there an effect of `Treatment` averaged over `Hospital`? 
  + Is there an effect of `Hospital` averaged over `Treatment`?

+ Interactions (categorical*categorical)
  + A change in the effect of some condition as a function of another.
  + Does the effect of `Treatment` differ by `Hospital`? 
  
+ Simple contrasts/effects
  + An effect of one condition at a specific level of another.
  + Is there an effect of `Hospital` for those receiving `Treatment A`? (...and so on for all combinations.)

---
# One way main effects

+ As we have an experiment, we typically use effects coding:

```r
contrasts(hosp_tbl$Treatment) <- contr.sum
```

+ Run the model:

```r
m1 <- lm(SWB ~ Treatment, data = hosp_tbl)
anova(m1)
```

```
## Analysis of Variance Table
## 
## Response: SWB
##            Df  Sum Sq Mean Sq F value    Pr(>F)    
## Treatment   2  177.02  88.511  14.037 2.196e-06 ***
## Residuals 177 1116.08   6.306                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

---
# One way main effects

```r
summary(m1)
```

```
## 
## Call:
## lm(formula = SWB ~ Treatment, data = hosp_tbl)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -5.373 -1.987 -0.300  1.838  7.173 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.8806     0.1872  52.791  < 2e-16 ***
## Treatment1   -0.5539     0.2647  -2.093   0.0378 *  
## Treatment2    1.3928     0.2647   5.262 4.09e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.511 on 177 degrees of freedom
## Multiple R-squared:  0.1369,	Adjusted R-squared:  0.1271 
## F-statistic: 14.04 on 2 and 177 DF,  p-value: 2.196e-06
```

---
class: center, middle
# All good?
**So lets look at main effects for factorial designs**

---
# Using model comparisons
+ In order to tests main effects, we need to compare sets of models that differ by the variable of interest.

+ To do all effects in our two-way design, we need 4 models:

```r
comp1 <- lm(SWB ~ Treatment, data = hosp_tbl)
comp2 <- lm(SWB ~ Hospital, data = hosp_tbl)
comp3 <- lm(SWB ~ Treatment + Hospital, data = hosp_tbl)
comp4 <- lm(SWB ~ Treatment + Hospital + Treatment*Hospital, data = hosp_tbl)
```

---
# Testing the overall effects

+ For the effect of `Treatment`:

```r
comp2 <- lm(SWB ~ Hospital, data = hosp_tbl)
comp3 <- lm(SWB ~ Treatment + Hospital, data = hosp_tbl)

anova(comp2,comp3)
```

```
## Analysis of Variance Table
## 
## Model 1: SWB ~ Hospital
## Model 2: SWB ~ Treatment + Hospital
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)    
## 1    178 1283.5                                 
## 2    176 1106.5  2    177.02 14.078 2.13e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

+ An effect of Treatment

---
# Testing the overall effects

+ For the effect of `Hospital`:

```r
comp1 <- lm(SWB ~ Treatment, data = hosp_tbl)
comp3 <- lm(SWB ~ Treatment + Hospital, data = hosp_tbl)

anova(comp1, comp3)
```

```
## Analysis of Variance Table
## 
## Model 1: SWB ~ Treatment
## Model 2: SWB ~ Treatment + Hospital
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1    177 1116.1                           
## 2    176 1106.5  1    9.5681 1.5219  0.219
```

+ No effect of hospital

---
# Testing the overall effects

+ For the effect of interaction:

```r
comp3 <- lm(SWB ~ Treatment + Hospital, data = hosp_tbl)
comp4 <- lm(SWB ~ Treatment + Hospital + Treatment*Hospital, data = hosp_tbl)

anova(comp3, comp4)
```

```
## Analysis of Variance Table
## 
## Model 1: SWB ~ Treatment + Hospital
## Model 2: SWB ~ Treatment + Hospital + Treatment * Hospital
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
## 1    176 1106.51                                  
## 2    174  714.34  2    392.18 47.764 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

+ An interaction

---
# Using `anova` on a single model

```r
anova(comp4)
```

```
## Analysis of Variance Table
## 
## Response: SWB
##                     Df Sum Sq Mean Sq F value    Pr(>F)    
## Treatment            2 177.02  88.511 21.5597 4.315e-09 ***
## Hospital             1   9.57   9.568  2.3306    0.1287    
## Treatment:Hospital   2 392.18 196.088 47.7635 < 2.2e-16 ***
## Residuals          174 714.34   4.105                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

---
# Testing the overall effects
+ You may have noted using `anova()` for a single model, and for the model comparison approach yield slightly different results.
  + Sums of squares difference is the same
  + Degrees of freedom are the same
  + `$F$` is slightly different for `Treatment` and `Hospital` (and therefore so is `$p$`-value)

+ Note the main conclusions do not change.

+ This difference relates to differences in the degrees of freedom associated with the `$F$`-test.

---
# Summary
+ This week we have looked at the use of `emmeans` to test specific contrasts.
  + Run the model
  + Estimate the means
  + Define the contrast
  + Test the contrast

+ We recapped experimental designs

+ And we began to explore testing them.
  + Next week we will continue this to recap interactions, look at interacting contrasts, simple and pairwise tests

---
class: center, middle
# Thanks for listening!