Sum-to-zero (Effects) coding & Manual Contrasts

class: center, middle, inverse, title-slide

.title[
# <b> Sum-to-zero (Effects) coding & Manual Contrasts </b>
]
.subtitle[
## Data Analysis for Psychology in R 2<br><br>
]
.author[
### dapR2 Team
]
.institute[
### Department of Psychology<br>The University of Edinburgh
]

---

# Course Overview

.pull-left[

<table style="border: 1px solid black;">
  <tr style="padding: 0 1em 0 1em;">
    <td rowspan="5" style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1;text-align:center;vertical-align: middle">
        <b>Introduction to Linear Models</b></td>
    <td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1">
        Intro to Linear Regression</td>
  </tr>
  <tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1">
        Interpreting Linear Models</td></tr>
  <tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1">
        Testing Individual Predictors</td></tr>
  <tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1">
        Model Testing & Comparison</td></tr>
  <tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1">
        Linear Model Analysis</td></tr>

<tr style="padding: 0 1em 0 1em;">
    <td rowspan="5" style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1;text-align:center;vertical-align: middle">
        <b>Analysing Experimental Studies</b></td>
    <td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1">
        Categorical Predictors & Dummy Coding</td>
  </tr>
  <tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1">
        <b>	Effects Coding & Coding Specific Contrasts</b></td></tr>
  <tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4">
        Assumptions & Diagnostics</td></tr>
  <tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4">
        Bootstrapping</td></tr>
  <tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4">
        	Categorical Predictor Analysis</td></tr>
</table>

]

.pull-right[

<table style="border: 1px solid black;">
  <tr style="padding: 0 1em 0 1em;">
    <td rowspan="5" style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4;text-align:center;vertical-align: middle">
        <b>Interactions</b></td>
    <td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4">
        Interactions I</td>
  </tr>
  <tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4">
        Interactions II</td></tr>
  <tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4">
        Interactions III</td></tr>
  <tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4">
        Analysing Experiments</td></tr>
  <tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4">
        Interaction Analysis</td></tr>

<tr style="padding: 0 1em 0 1em;">
    <td rowspan="5" style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4;text-align:center;vertical-align: middle">
        <b>Advanced Topics</b></td>
    <td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4">
        Power Analysis</td>
  </tr>
  <tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4">
        Binary Logistic Regression I</td></tr>
  <tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4">
        Binary Logistic Regression II</td></tr>
  <tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4">
        Logistic Regression Analysis</td></tr>
  <tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4">
        	Exam Prep and Course Q&A</td></tr>
</table>

]

---

# This Week's Learning Objectives
1. Understand the difference between dummy and sum-to-zero coding

2. Understand the core principle of different coding schemes

3. Interpret the output from a model using sum-to-zero coding

4. Review rules for constructing contrasts

5. Continue using `emmeans` to investigate manual contrasts

---
class: inverse, center, middle

# Part 1: Why can't we always use dummy coding?

---
# Why not always use dummy coding?
+ Last week we discussed dummy coding:
  + Dummy coding creates a set of `$k$`-1 dummy variables (coded `0` or `1`)
  + Each variable's `$\beta$` reflects the difference between the group coded `1`, and the reference group (coded `0` across all dummy variables)
  + As such, we say it uses a reference group constraint to estimate our group means

+ This is a neat and (comparatively) straightforward way to deal with categorical variables

+ But it doesn't always give us the exact test we need. We might want to compare to:
  + The overall mean of different groups (the grand mean)
  + Group 1 vs groups 2, 3, 4 combined
  + etc.
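
As a reminder of what the reference group constraint looks like in R, the sketch below prints the dummy (treatment) coding matrix for a three-level factor. The level names are borrowed from the class study example used later in these slides; the object names `method_levels` and `dummy_codes` are for illustration only.

``` r
# Levels of the study-method factor used later in the lecture
method_levels <- c("read", "self-test", "summarise")

# k levels -> k - 1 dummy variables; the first level is the reference
dummy_codes <- contr.treatment(method_levels)
dummy_codes

# The reference group ("read") is coded 0 on every dummy variable
rowSums(dummy_codes)
```

Each column is one dummy variable, so each coefficient compares one group with `read` and nothing else.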

---
# Why not always use dummy coding?
+ Different coding schemes answer different research questions

+ This week we will consider the two examples on the previous slide:

1. Comparing a specific group to the overall mean of the groups in your sample (the grand mean). This is **sum-to-zero** or **effects coding**
2. Comparing specific combinations of groups. These are **manual contrasts**

+ Let's start with the grand mean, using our class study example

---
# Effects coding (sum to zero coding)

.pull-left[
![](dapr2_07_lmcategorical2_files/figure-html/unnamed-chunk-5-1.png)
]

.pull-right[

+ To interpret the plot:
  + Coloured points: individual test scores for students in each group
  + Solid coloured lines: group means
  + Dashed grey line: the grand mean (the mean of all the group means)

+ We can already see a key difference from dummy coding
  + Rather than the other groups being compared to the mean of `read`, each group will be compared to the grand mean (the grey line)

]

> **Test your understanding:** If our coefficients reflect the comparison of each group to the grand mean, what direction of coefficient would we expect for each group?

> Where is the biggest absolute difference?

---
# Model with the grand mean
+ If we write our model including the grand mean, we get:

`$$y_{ij} = \mu + \beta_j + \epsilon_{ij}$$`
+ where
  + `$y_{ij}$` is the score for a given individual ( `$i$` ) in a given group ( `$j$` )
  + `$\mu$` is the grand mean
  + `$\beta_j$` is a group specific effect
  + `$\epsilon_{ij}$` is the individual deviation from the group mean
  
+ Let's briefly consider the constraints we apply, before looking at how we do this in R

---
# Model with the grand mean
+ Each group mean is:

`$$\mu_{read} = \mu + \beta_{read}$$`

`$$\mu_{self-test} = \mu + \beta_{self-test}$$`

`$$\mu_{summarise} = \mu + \beta_{summarise}$$`

+ And as with dummy coding, this means we have 4 things to estimate ( `$\mu$` , `$\beta_{read}$` , `$\beta_{self-test}$` , `$\beta_{summarise}$` ), but only 3 group means

---
# Sum to zero constraint

+ In sum to zero coding, we fix this with the following constraint:

`$$\sum_{j=1}^k \beta_j = 0$$`

+ Or alternatively written for the 3 group case:

`$$\beta_1 + \beta_2 + \beta_3 = 0$$`

---
# Sum to zero constraint
+ This constraint leads to the following interpretations:

+ `$\beta_0$` is the grand mean or `$\mu$`

+ `$\beta_j$` are the differences between the coded group and the grand mean:

`$$\beta_j = \mu_j - \mu$$`

---
# Why the grand mean?

`$$\beta_1 + \beta_2 + \beta_3 = 0$$`

+ Since each group mean is `$\mu_j = \beta_0 + \beta_j$` , substitute `$\beta_j = \mu_j - \beta_0$` :

`$$(\mu_1 - \beta_0) + (\mu_2 - \beta_0) + (\mu_3 - \beta_0) = 0$$`

`$$\mu_1 + \mu_2 + \mu_3 = 3\beta_0$$`

`$$\beta_0 = \frac{\mu_1 + \mu_2 + \mu_3}{3}$$`

`$$\beta_0 = \mu$$`

---
# Sum to zero constraint

+ Finally, we can get back to our group means from the coefficients as follows:

`$$\mu_1 = \beta_0 + \beta_1$$`

`$$\mu_2 = \beta_0 + \beta_2$$`

`$$\mu_3 = \beta_0 - (\beta_1 + \beta_2)$$`

---
class: center, middle

# Questions?

---
class: inverse, center, middle

# Part 2: Calculating coefficients with sum-to-zero coding

---
# Group Means

``` r
test_study3 %>%
  select(1,2,6) %>%
  group_by(method) %>%
  summarise(
    mean = round(mean(score),3),
    sd = round(sd(score),1),
    N = n()
  )
```

```
## # A tibble: 3 × 4
##   method     mean    sd     N
##   <fct>     <dbl> <dbl> <int>
## 1 read       23.4   8      87
## 2 self-test  27.6   8.3    66
## 3 summarise  24.2   8      97
```
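
The grand mean used in sum-to-zero coding is the mean of the group means, which is not the same as the mean of all scores when group sizes differ. A quick check, using the group means to three decimals (as reported on the later slides) and the group sizes from the table above; the object names are for illustration only:

``` r
# Group means (3 d.p., from the later slides) and group sizes (table above)
group_means <- c(read = 23.414, `self-test` = 27.576, summarise = 24.196)
group_n     <- c(read = 87,     `self-test` = 66,     summarise = 97)

# Grand mean: every group counts equally
grand_mean <- mean(group_means)
grand_mean

# Mean of all scores: larger groups count for more
weighted.mean(group_means, w = group_n)

# Deviation of each group mean from the grand mean
round(group_means - grand_mean, 3)
```

The first value is the one that sum-to-zero coding treats as `$\mu$`. The second is lower here because the two lower-scoring groups are also the larger ones.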

---
# Effects (sum to zero) model

+ We need to change the contrast scheme from the default before running `lm`

``` r
contrasts(test_study3$method) <- contr.sum 
contrasts(test_study3$method)
```

```
##           [,1] [,2]
## read         1    0
## self-test    0    1
## summarise   -1   -1
```

---
# Effects (sum to zero) model

``` r
summary(lm(score ~ method, data = test_study3))
```

```
## 
## Call:
## lm(formula = score ~ method, data = test_study3)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -23.4138  -5.3593  -0.1959   5.7496  17.8041 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  25.0618     0.5177  48.409  < 2e-16 ***
## method1      -1.6480     0.7198  -2.290  0.02289 *  
## method2       2.5139     0.7731   3.252  0.00131 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.079 on 247 degrees of freedom
## Multiple R-squared:  0.04224,	Adjusted R-squared:  0.03448 
## F-statistic: 5.447 on 2 and 247 DF,  p-value: 0.004845
```

---
# Effects (sum to zero) model

.pull-left[

```
## (Intercept)     method1     method2 
##      25.062      -1.648       2.514
```

+ Coefficients from group means

`$$\beta_0 = \frac{\mu_1 + \mu_2 + \mu_3}{3}$$`

`$$\beta_1 = \mu_1 - \mu$$`

`$$\beta_2 = \mu_2 - \mu$$`

]

.pull-right[

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> method </th>
   <th style="text-align:right;"> mean </th>
   <th style="text-align:right;"> Gmean </th>
   <th style="text-align:right;"> Coefficients </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> read </td>
   <td style="text-align:right;"> 23.414 </td>
   <td style="text-align:right;"> 25.062 </td>
   <td style="text-align:right;"> -1.648 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> self-test </td>
   <td style="text-align:right;"> 27.576 </td>
   <td style="text-align:right;"> 25.062 </td>
   <td style="text-align:right;"> 2.514 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> summarise </td>
   <td style="text-align:right;"> 24.196 </td>
   <td style="text-align:right;"> 25.062 </td>
   <td style="text-align:right;"> -0.866 </td>
  </tr>
</tbody>
</table>

]
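
The coefficient for `summarise` in the table is not printed by `lm`; it follows from the sum-to-zero constraint. A small check using the rounded coefficients shown on this slide (the names `b0` to `b3` are ours):

``` r
# Rounded coefficients from the model output
b0 <- 25.062
b1 <- -1.648   # read
b2 <-  2.514   # self-test

# Sum-to-zero constraint: b1 + b2 + b3 = 0, so b3 = -(b1 + b2)
b3 <- -(b1 + b2)
b3

# All three group effects sum to zero (up to floating-point error)
all.equal(b1 + b2 + b3, 0)
```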

---
# Effects (sum to zero) model

.pull-left[

```
## (Intercept)     method1     method2 
##      25.062      -1.648       2.514
```

+ Group means from coefficients:

<br>
`$$\mu_1 = \beta_0 + \beta_1$$`

`$$\mu_2 = \beta_0 + \beta_2$$`

`$$\mu_3 = \beta_0 - (\beta_1 + \beta_2)$$`
]

.pull-right[

``` r
25.062 + -1.648
```

```
## [1] 23.414
```

``` r
25.062 + 2.514
```

```
## [1] 27.576
```

``` r
25.062 - (-1.648 + 2.514)
```

```
## [1] 24.196
```

]
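
The same three sums can be written in one step by multiplying the coding matrix by the coefficients. This is only a sketch of how the contrast codes enter the model, using the rounded coefficients above; `X` and `b` are names chosen for illustration.

``` r
# Rounded coefficients from the sum-to-zero model
b <- c(25.062, -1.648, 2.514)

# Intercept column plus the sum-to-zero codes for three groups
X <- cbind(1, contr.sum(3))
rownames(X) <- c("read", "self-test", "summarise")
X

# Each row of X picks out one group's predicted mean
group_means <- drop(X %*% b)
group_means
```

The last row of `X` is `1, -1, -1`, which is exactly the `$\beta_0 - (\beta_1 + \beta_2)$` calculation for the third group.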

---
# The wide world of contrasts 
+ We have now seen two examples of coding schemes (dummy and effect).

+ There are **lots** of different coding schemes we can use for categorical variables to make different comparisons.
  + If you are interested, see the excellent resource on the [UCLA website](https://stats.idre.ucla.edu/r/library/r-library-contrast-coding-systems-for-categorical-variables/)

+ But always remember...

**The data is the same, the tested contrast differs**

---
class: inverse, center, middle

# Part 3: The data doesn't change, what we compare does

---
# The data is the same, the tested contrasts differ

+ We can run our model for `method` using both dummy and sum-to-zero coding schemes

``` r
contrasts(test_study3$method) <- contr.treatment
m_dummy <- lm(score ~ method, data = test_study3)

# Change the contrasts and run again
contrasts(test_study3$method) <- contr.sum
m_zero <- lm(score ~ method, data = test_study3)
```

+ We see that the model coefficients are different, because the tested contrast differs:

.pull-left[

``` r
coef(m_dummy)
```

```
##     (Intercept) methodself-test methodsummarise 
##      23.4137931       4.1619645       0.7820832
```
]

.pull-right[

``` r
coef(m_zero)
```

```
## (Intercept)     method1     method2 
##   25.061809   -1.648016    2.513949
```
]

---
# The data is the same, the tested contrasts differ

However, if we create a small data frame:

``` r
treat <- tibble(method = c("read", "self-test", "summarise"))
```

and add the predicted values from our models:

``` r
treat %>%
  mutate(
    pred_dummy = predict(m_dummy, newdata = .),
    pred_zero = predict(m_zero, newdata = .)
  )
```

```
## # A tibble: 3 × 3
##   method    pred_dummy pred_zero
##   <chr>          <dbl>     <dbl>
## 1 read            23.4      23.4
## 2 self-test       27.6      27.6
## 3 summarise       24.2      24.2
```

You can see that no matter what coding or contrasts we use, we are still modelling the same group means!
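
The same point can be checked on any data set. The sketch below uses simulated scores (the data frame `sim`, its group sizes and its means are invented for illustration and are not the class study data) and sets the coding through the `contrasts` argument of `lm`:

``` r
set.seed(1)

# Hypothetical data: three groups of unequal size
sim <- data.frame(
  method = factor(rep(c("read", "self-test", "summarise"), times = c(30, 20, 40)))
)
sim$score <- rnorm(nrow(sim), mean = c(23, 28, 24)[as.integer(sim$method)], sd = 8)

# Same model, two coding schemes
sim_dummy <- lm(score ~ method, data = sim, contrasts = list(method = "contr.treatment"))
sim_zero  <- lm(score ~ method, data = sim, contrasts = list(method = "contr.sum"))

# The coefficients differ ...
coef(sim_dummy)
coef(sim_zero)

# ... but the fitted values (the group means) are identical
all.equal(fitted(sim_dummy), fitted(sim_zero))

# Intercept under sum-to-zero coding = mean of the group means
group_means <- tapply(sim$score, sim$method, mean)
all.equal(unname(coef(sim_zero)[1]), mean(group_means))
```

Passing the coding scheme through `contrasts =` leaves the factor itself untouched, which can be easier to keep track of than reassigning `contrasts()` between models.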

---
class: center, middle

# Questions?

---
class: inverse, center, middle

# Part 4: Setting up our own specific tests

---
# Why do we need manual contrasts?
+ We have looked now at dummy and sum-to-zero coding

+ These provide us with coefficients which test the significance of the difference between means of groups and some other mean (either reference group or the grand mean)
  + The other coding schemes we linked to do exactly the same thing

+ ***Sometimes*** we have a research question that requires the test of the difference between particular combinations of groups for which there is no *"off the shelf"* test

+ For such situations, we can apply a set of rules and test what are referred to as manual contrasts

+ We can structure a wide variety of contrasts so long as they can be written:
  + As a linear combination of weighted group means
  + With the associated weights on coefficients summing to zero
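
Written out, a manual contrast is a weighted sum of the group means. The symbol `$\psi$` for the contrast is our notation here (it is not used elsewhere in these slides); `$c_j$` are the weights and `$\mu_j$` the group means:

`$$\psi = \sum_{j=1}^{k} c_j \mu_j \quad \text{with} \quad \sum_{j=1}^{k} c_j = 0$$`

For example, comparing group 1 with groups 2, 3 and 4 combined corresponds to weights `$(1, -\frac{1}{3}, -\frac{1}{3}, -\frac{1}{3})$` :

`$$\psi = \mu_1 - \frac{\mu_2 + \mu_3 + \mu_4}{3}$$`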

---
# New example
+ Suppose we were interested in the effect of various relationship statuses on an individual's subjective well-being (`swb`)

+ Our predictor is `status` which has 5 levels:
  + Married or Civil Partnership
  + Cohabiting relationship
  + Single
  + Widowed
  + Divorced

+ Let's say we have data on 500 people

---
# Data
<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> status </th>
   <th style="text-align:right;"> n </th>
   <th style="text-align:right;"> mean </th>
   <th style="text-align:right;"> sd </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Cohab </td>
   <td style="text-align:right;"> 100 </td>
   <td style="text-align:right;"> 11.44 </td>
   <td style="text-align:right;"> 4.22 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Divorced </td>
   <td style="text-align:right;"> 50 </td>
   <td style="text-align:right;"> 9.37 </td>
   <td style="text-align:right;"> 2.34 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Married/CP </td>
   <td style="text-align:right;"> 275 </td>
   <td style="text-align:right;"> 10.63 </td>
   <td style="text-align:right;"> 3.41 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Single </td>
   <td style="text-align:right;"> 50 </td>
   <td style="text-align:right;"> 8.06 </td>
   <td style="text-align:right;"> 2.19 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Widowed </td>
   <td style="text-align:right;"> 25 </td>
   <td style="text-align:right;"> 6.00 </td>
   <td style="text-align:right;"> 1.07 </td>
  </tr>
</tbody>
</table>

---
# Our questions
+ Suppose we want to know if there are `swb` differences between:

1. Those who are currently or previously married or in a civil partnership, vs those who have never been married or in a civil partnership
    + Group 1: `Married/CP`, `Divorced`, `Widowed`
    + Group 2: `Single`, `Cohab`

2. Those who are currently married or in a civil partnership, vs those who have previously been
    + Group 1: `Married/CP`
    + Group 2: `Divorced`, `Widowed`

+ To test this, we need to:
  + group the levels of our factor `status`
  + calculate a mean of these new sub-groups making sure all levels contribute equally to their respective groups
  + then test the difference between these means

+ Manual contrasts can do this for us, if we follow some rules

---
# Rules for manual contrasts

+ **Rule 1**: Weights ( `$c$`) range between -1 and 1

+ **Rule 2**: The group(s) in one chunk are given negative weights, the group(s) in the other get positive weights

+ **Rule 3**: The sum of the weights of the comparison must be 0

+ **Rule 4**: If a group is not involved in the comparison, weight is 0

+ **Rule 5**: For a given comparison, weights assigned to group(s) are equal to 1 divided by the number of groups in that chunk

+ **Rule 6**: Restrict yourself to running `$k$` - 1 comparisons (where `$k$` = number of groups)

+ **Rule 7**: Each contrast can only compare 2 chunks

+ **Rule 8**: Once a group is singled out, it cannot enter other contrasts

---
# Applying rules

.pull-left[

+ Let's construct two contrasts:

1. Those who are currently or previously married or in a civil partnership vs not.

2. Those who are currently married or in a civil partnership vs those who have previously been.

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> group </th>
   <th style="text-align:left;"> contrast1 </th>
   <th style="text-align:left;"> contrast2 </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Cohab </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;">  </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Divorced </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;">  </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Married/CP </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;">  </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Single </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;">  </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Widowed </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;">  </td>
  </tr>
</tbody>
</table>
]

.pull-right[
+ **Rule 1**: Weights range between -1 and 1

+ **Rule 2**: Groups in one chunk are given negative weights, groups in the other get positive weights

+ **Rule 3**: The sum of the weights of the comparison must be 0

+ **Rule 4**: If a group is not involved in the comparison, weight is 0

+ **Rule 5**: For a given comparison, weights assigned to group(s) = 1 divided by the number of groups in that chunk.

+ **Rule 6**: Restrict yourself to running `$k$` - 1 comparisons

+ **Rule 7**: Each contrast can only compare 2 chunks

+ **Rule 8**: Once a group is singled out, it cannot enter other contrasts
]

---
# Applying rules

.pull-left[

+ Let's construct two contrasts:

1. Those who are currently or previously married or in a civil partnership vs not.

2. Those who are currently married or in a civil partnership vs those who have previously been.

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> group </th>
   <th style="text-align:right;"> contrast1 </th>
   <th style="text-align:left;"> contrast2 </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Cohab </td>
   <td style="text-align:right;"> -0.50 </td>
   <td style="text-align:left;">  </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Divorced </td>
   <td style="text-align:right;"> 0.33 </td>
   <td style="text-align:left;">  </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Married/CP </td>
   <td style="text-align:right;"> 0.33 </td>
   <td style="text-align:left;">  </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Single </td>
   <td style="text-align:right;"> -0.50 </td>
   <td style="text-align:left;">  </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Widowed </td>
   <td style="text-align:right;"> 0.33 </td>
   <td style="text-align:left;">  </td>
  </tr>
</tbody>
</table>
]

.pull-right[
+ **Rule 1**: Weights range between -1 and 1

+ **Rule 2**: Groups in one chunk are given negative weights, groups in the other get positive weights

+ **Rule 3**: The sum of the weights of the comparison must be 0

+ **Rule 4**: If a group is not involved in the comparison, weight is 0

+ **Rule 5**: For a given comparison, weights assigned to group(s) = 1 divided by the number of groups in that chunk

+ **Rule 6**: Restrict yourself to running `$k$` - 1 comparisons

+ **Rule 7**: Each contrast can only compare 2 chunks

+ **Rule 8**: Once a group is singled out, it cannot enter other contrasts
]

---
# Applying rules

.pull-left[

+ Let's construct two contrasts:

1. Those who are currently or previously married or in a civil partnership vs not

2. Those who are currently married or in a civil partnership vs those who have previously been

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> group </th>
   <th style="text-align:right;"> contrast1 </th>
   <th style="text-align:right;"> contrast2 </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Cohab </td>
   <td style="text-align:right;"> -0.50 </td>
   <td style="text-align:right;"> 0.0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Divorced </td>
   <td style="text-align:right;"> 0.33 </td>
   <td style="text-align:right;"> -0.5 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Married/CP </td>
   <td style="text-align:right;"> 0.33 </td>
   <td style="text-align:right;"> 1.0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Single </td>
   <td style="text-align:right;"> -0.50 </td>
   <td style="text-align:right;"> 0.0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Widowed </td>
   <td style="text-align:right;"> 0.33 </td>
   <td style="text-align:right;"> -0.5 </td>
  </tr>
</tbody>
</table>
]

.pull-right[
+ **Rule 1**: Weights range between -1 and 1

+ **Rule 2**: Groups in one chunk are given negative weights, groups in the other get positive weights

+ **Rule 3**: The sum of the weights of the comparison must be 0

+ **Rule 4**: If a group is not involved in the comparison, weight is 0

+ **Rule 5**: For a given comparison, weights assigned to group(s) = 1 divided by the number of groups in that chunk.

+ **Rule 6**: Restrict yourself to running `$k$` - 1 comparisons

+ **Rule 7**: Each contrast can only compare 2 chunks

+ **Rule 8**: Once a group is singled out, it cannot enter other contrasts
]
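
Rules 2 to 5 can be turned into a few lines of R. The helper `make_contrast()` below is a hypothetical convenience function written for this slide (it is not part of `emmeans` or base R). It assigns 1 divided by the number of groups to one chunk, the negative equivalent to the other, and leaves every other level at 0:

``` r
# Hypothetical helper: build contrast weights from two chunks of levels
make_contrast <- function(all_levels, positive, negative) {
  stopifnot(all(c(positive, negative) %in% all_levels),
            length(intersect(positive, negative)) == 0)
  w <- setNames(numeric(length(all_levels)), all_levels)  # Rule 4: default 0
  w[positive] <-  1 / length(positive)                    # Rules 2 and 5
  w[negative] <- -1 / length(negative)
  w
}

status_levels <- c("Cohab", "Divorced", "Married/CP", "Single", "Widowed")

contrast1 <- make_contrast(status_levels,
                           positive = c("Divorced", "Married/CP", "Widowed"),
                           negative = c("Cohab", "Single"))
contrast2 <- make_contrast(status_levels,
                           positive = "Married/CP",
                           negative = c("Divorced", "Widowed"))

round(rbind(contrast1, contrast2), 2)

# Rule 3: the weights in each contrast sum to zero (up to floating-point error)
c(sum(contrast1), sum(contrast2))
```

Which chunk gets the positive weights is a free choice; it only flips the sign of the estimate, so it is worth naming the contrast in a way that makes the direction obvious.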

---
# Orthogonal vs. Non-orthogonal Contrasts
+ Orthogonal contrasts test independent sources of variation
  + If we follow the rules above, we will have orthogonal contrasts

+ Non-orthogonal contrasts test non-independent sources of variation
  + This presents some further statistical challenges in terms of making inferences
  + We will come back to this discussion later in the course

---
# Checking if contrasts are orthogonal
+ The sum of the products of the weights will equal 0 for any pair of orthogonal comparisons

`$$\sum{c_{1j}c_{2j}} = 0$$`

---
# From our example

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> group </th>
   <th style="text-align:right;"> contrast1 </th>
   <th style="text-align:right;"> contrast2 </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Cohab </td>
   <td style="text-align:right;"> -0.50 </td>
   <td style="text-align:right;"> 0.0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Divorced </td>
   <td style="text-align:right;"> 0.33 </td>
   <td style="text-align:right;"> -0.5 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Married/CP </td>
   <td style="text-align:right;"> 0.33 </td>
   <td style="text-align:right;"> 1.0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Single </td>
   <td style="text-align:right;"> -0.50 </td>
   <td style="text-align:right;"> 0.0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Widowed </td>
   <td style="text-align:right;"> 0.33 </td>
   <td style="text-align:right;"> -0.5 </td>
  </tr>
</tbody>
</table>

+ Below we can see the product `$c_1c_2$` for each level, and the row-wise sums of the weights for each contrast and of the products
  + The sums of 0 for contrast 1 and contrast 2 show that each set of weights sums to zero
  + The sum of 0 for the products shows the contrasts are orthogonal

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> Contrast </th>
   <th style="text-align:right;"> Cohab </th>
   <th style="text-align:right;"> Divorced </th>
   <th style="text-align:right;"> Married_CP </th>
   <th style="text-align:right;"> Single </th>
   <th style="text-align:right;"> Widowed </th>
   <th style="text-align:right;"> Sum </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Contrast1 </td>
   <td style="text-align:right;"> -0.5 </td>
   <td style="text-align:right;"> 0.330 </td>
   <td style="text-align:right;"> 0.33 </td>
   <td style="text-align:right;"> -0.5 </td>
   <td style="text-align:right;"> 0.330 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Contrast2 </td>
   <td style="text-align:right;"> 0.0 </td>
   <td style="text-align:right;"> -0.500 </td>
   <td style="text-align:right;"> 1.00 </td>
   <td style="text-align:right;"> 0.0 </td>
   <td style="text-align:right;"> -0.500 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Product </td>
   <td style="text-align:right;"> 0.0 </td>
   <td style="text-align:right;"> -0.165 </td>
   <td style="text-align:right;"> 0.33 </td>
   <td style="text-align:right;"> 0.0 </td>
   <td style="text-align:right;"> -0.165 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
</tbody>
</table>
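
The same check takes one line per sum in R. In this sketch the weights are entered as exact fractions rather than the rounded 0.33 shown in the tables, in the level order Cohab, Divorced, Married/CP, Single, Widowed:

``` r
contrast1 <- c(-1/2, 1/3, 1/3, -1/2, 1/3)
contrast2 <- c(0, -1/2, 1, 0, -1/2)

# Weights within each contrast should sum to zero
sum(contrast1)
sum(contrast2)

# Sum of products of the weights: zero means the pair is orthogonal
sum(contrast1 * contrast2)

# Floating-point arithmetic can leave a tiny non-zero remainder,
# so compare against zero with a tolerance
isTRUE(all.equal(sum(contrast1 * contrast2), 0))
```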

---
class: inverse, center, middle

# Part 5: Testing manual contrasts using emmeans

---
# Using `emmeans` to test contrasts

+ We will use the package `emmeans` to test our contrasts
  + We will also be using this in the next few weeks to look at analysing experimental designs

+ **E**stimated
+ **M**arginal
+ **Means**

+ Essentially this package provides us with a lot of tools to help us model contrasts and linear functions

---
# Working with `emmeans`
+ First we run our model:

``` r
status_res <- lm(swb ~ status, wb_tib)
```

+ Next we use `emmeans()` to get the estimated means of our groups.

``` r
status_mean <- emmeans(status_res, ~status)
status_mean
```

```
##  status     emmean    SE  df lower.CL upper.CL
##  Cohab       11.44 0.333 495    10.78    12.09
##  Divorced     9.37 0.471 495     8.45    10.30
##  Married/CP  10.63 0.201 495    10.23    11.02
##  Single       8.06 0.471 495     7.13     8.99
##  Widowed      6.00 0.666 495     4.70     7.31
## 
## Confidence level used: 0.95
```

---
# Visualise estimated means

.pull-left[

``` r
plot(status_mean)
```

+ We then use these means to test contrasts

]

.pull-right[
![](dapr2_07_lmcategorical2_files/figure-html/unnamed-chunk-30-1.png)

]

---
# Defining the contrast

+ **KEY POINT**: The order of the levels of your categorical variable matters, as `emmeans` uses this order when applying the weights

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> group </th>
   <th style="text-align:right;"> contrast1 </th>
   <th style="text-align:right;"> contrast2 </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Cohab </td>
   <td style="text-align:right;"> -0.50 </td>
   <td style="text-align:right;"> 0.0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Divorced </td>
   <td style="text-align:right;"> 0.33 </td>
   <td style="text-align:right;"> -0.5 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Married/CP </td>
   <td style="text-align:right;"> 0.33 </td>
   <td style="text-align:right;"> 1.0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Single </td>
   <td style="text-align:right;"> -0.50 </td>
   <td style="text-align:right;"> 0.0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Widowed </td>
   <td style="text-align:right;"> 0.33 </td>
   <td style="text-align:right;"> -0.5 </td>
  </tr>
</tbody>
</table>

``` r
levels(wb_tib$status)
```

```
## [1] "Cohab"      "Divorced"   "Married/CP" "Single"     "Widowed"
```

``` r
status_comp <- list("Married or CP vs not" = c(-1/2, 1/3, 1/3, -1/2, 1/3),
                    "Current vs Not current" = c(0, -1/2, 1, 0, -1/2))
```

---
# Requesting the test
+ In order to test our effects, we use the `contrast` function from `emmeans`

``` r
status_comp_test <- contrast(status_mean, status_comp)
status_comp_test
```

```
##  contrast               estimate    SE  df t.ratio p.value
##  Married or CP vs not      -1.08 0.402 495  -2.690  0.0074
##  Current vs Not current     2.94 0.455 495   6.459  <.0001
```
+ We can see we have p-values, but we can also request confidence intervals

``` r
confint(status_comp_test)
```

```
##  contrast               estimate    SE  df lower.CL upper.CL
##  Married or CP vs not      -1.08 0.402 495    -1.87   -0.291
##  Current vs Not current     2.94 0.455 495     2.04    3.829
## 
## Confidence level used: 0.95
```

---
# Interpreting the results
+ The estimate is the difference between the averages of the group means within each chunk

``` r
confint(status_comp_test)
```

``` r
((10.63 + 6.00 + 9.37)/3) - ((11.44 + 8.06)/2)
```

```
## [1] -1.083333
```
+ So those who are not currently or previously married or in a civil partnership have higher SWB.
  + And this difference is significant
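
Both rows of the `contrast` output can be reproduced by hand from the estimated means and standard errors printed earlier. This is a sketch under one assumption worth stating: in a one-way model like this the estimated group means are independent, so the variance of a weighted sum is the sum of the squared weights times the squared standard errors. The values are copied from the rounded `emmeans` output, so small rounding differences from the slide are expected; the object names are ours.

``` r
# Estimated means and SEs copied from the emmeans output (rounded)
emm <- c(Cohab = 11.44, Divorced = 9.37, `Married/CP` = 10.63,
         Single = 8.06, Widowed = 6.00)
se  <- c(0.333, 0.471, 0.201, 0.471, 0.666)
resid_df <- 495

contrasts_list <- list(
  "Married or CP vs not"   = c(-1/2, 1/3, 1/3, -1/2, 1/3),
  "Current vs Not current" = c(0, -1/2, 1, 0, -1/2)
)

by_hand <- t(sapply(contrasts_list, function(w) {
  est   <- sum(w * emm)               # weighted sum of group means
  se_c  <- sqrt(sum(w^2 * se^2))      # SE of the weighted sum
  t_val <- est / se_c
  half  <- qt(0.975, resid_df) * se_c # half-width of the 95% CI
  c(estimate = est, SE = se_c, t.ratio = t_val,
    p.value = 2 * pt(-abs(t_val), resid_df),
    lower.CL = est - half, upper.CL = est + half)
}))
round(by_hand, 3)
```

The second row compares those currently married or in a civil partnership with those who previously were. Its positive estimate indicates higher SWB in the current group, read in the same way as the first contrast.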

---
class: center, middle

# Questions?

---
# Summary of today

+ We have considered different ways in which we can code categorical predictors

+ Take home:
  + Use of coding schemes allows us to compare groups (or levels) in lots of ways
  + Our `$\beta$`'s will represent differences in group means
  + The scheme we use determines which group or combination of groups we are comparing
  + **In all cases the underlying data is unchanged**

+ We also looked at the use of `emmeans` in testing manual contrasts
  + Run the model
  + Estimate the means
  + Define the contrast
  + Test the contrast

+ Coding schemes are a very flexible tool for testing hypotheses

---

## This week

.pull-left[

### Tasks

**Attend your lab and work together on the exercises**

<br>

**Complete the weekly quiz**

]

.pull-right[

### Support

**Help each other on the Piazza forum**

<br>

**Attend office hours (see Learn page for details)**

]

---
class: inverse, center, middle

# Thanks for listening