class: center, middle, inverse, title-slide .title[ #
Sum-to-zero (Effects) coding & Manual Contrasts
] .subtitle[ ## Data Analysis for Psychology in R 2
] .author[ ### dapR2 Team ] .institute[ ### Department of Psychology
The University of Edinburgh
]

---
# Course Overview

.pull-left[

<table style="border: 1px solid black;">
<tr style="padding: 0 1em 0 1em;"> <td rowspan="5" style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1;text-align:center;vertical-align: middle"> <b>Introduction to Linear Models</b></td> <td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1"> Intro to Linear Regression</td> </tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1"> Interpreting Linear Models</td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1"> Testing Individual Predictors</td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1"> Model Testing & Comparison</td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1"> Linear Model Analysis</td></tr>
<tr style="padding: 0 1em 0 1em;"> <td rowspan="5" style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1;text-align:center;vertical-align: middle"> <b>Analysing Experimental Studies</b></td> <td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1"> Categorical Predictors & Dummy Coding</td> </tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1"> <b> Effects Coding & Coding Specific Contrasts</b></td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Assumptions & Diagnostics</td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Bootstrapping</td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Categorical Predictor Analysis</td></tr>
</table>

]

.pull-right[

<table style="border: 1px solid black;">
<tr style="padding: 0 1em 0 1em;"> <td rowspan="5" style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4;text-align:center;vertical-align: middle"> <b>Interactions</b></td> <td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Interactions I</td> </tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Interactions II</td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Interactions III</td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Analysing Experiments</td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Interaction Analysis</td></tr>
<tr style="padding: 0 1em 0 1em;"> <td rowspan="5" style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4;text-align:center;vertical-align: middle"> <b>Advanced Topics</b></td> <td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Power Analysis</td> </tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Binary Logistic Regression I</td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Binary Logistic Regression II</td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Logistic Regression Analysis</td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Exam Prep and Course Q&A</td></tr>
</table>

]

---
# This Week's Learning Objectives

1. Understand the difference between dummy and sum-to-zero coding

2. Understand the core principle of different coding schemes

3. Interpret the output from a model using sum-to-zero coding

4. Review rules for constructing contrasts

5. Continue using `emmeans` to investigate manual contrasts

---
class: inverse, center, middle

# Part 1: Why can't we always use dummy coding?

---
# Why not always use dummy coding?
+ Last week we discussed dummy coding:
  + Dummy coding creates a set of `\(k\)`-1 dummy variables (coded `0` or `1`)
  + Each variable's `\(\beta\)` reflects the difference between the group coded `1` and the reference group (coded `0` across all dummy variables)
  + As such, we say it uses a reference group constraint to estimate our group means

+ This is a neat and (comparatively) straightforward way to deal with categorical variables

+ But it doesn't always give us the exact test we need. We might want to compare to:
  + The overall mean of different groups (the grand mean)
  + Group 1 vs groups 2, 3, 4 combined
  + etc.

---
# Why not always use dummy coding?

+ Different coding schemes answer different research questions

+ This week we will consider the two examples on the previous slide:
  1. Comparing a specific group to the overall mean of groups in your sample (grand mean). This is **sum-to-zero** or **effects coding**
  2. Comparing specific combinations of groups. These are **manual contrasts**

+ Let's start with the grand mean with our class study example

---
# Effects coding (sum to zero coding)

.pull-left[

![](dapr2_07_lmcategorical2_files/figure-html/unnamed-chunk-5-1.png)<!-- -->

]

.pull-right[

+ To interpret the plot:
  + Coloured points: individual test scores for students in each group
  + Solid coloured lines: group means
  + Dashed grey line: the grand mean (the mean of all the group means)

+ We can already see a key difference from dummy coding
  + Rather than all groups being compared to the mean of `read`, all will be compared to the grey line

]

--

> **Test your understanding:** If our coefficients reflect the comparison of each group to the grand mean, what direction of coefficient would we expect for each group?

--

> Where is the biggest absolute difference?
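
---
# Sketch: grand mean vs mean of all scores

The grey line is the mean of the group means, which only matches the mean of all individual scores when the groups are the same size. A minimal sketch of that distinction, assuming hypothetical toy scores (these are not the class study data, and the object names are made up for illustration):

``` r
# Hypothetical toy scores for three groups of unequal size
scores <- list(
  read = c(20, 22, 24),
  `self-test` = c(28, 30),
  summarise = c(23, 25, 24, 24)
)

group_means <- sapply(scores, mean)   # mean within each group
grand_mean  <- mean(group_means)      # mean of the group means
overall     <- mean(unlist(scores))   # mean of every individual score

group_means
grand_mean
overall

# Deviations of each group mean from the grand mean sum to zero
deviations <- group_means - grand_mean
deviations
sum(deviations)
```

In this toy example the group means are 22, 29 and 24, so the grand mean is 25, while the mean of all nine scores is about 24.4.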
---
# Model with the grand mean

+ If we write our model including the grand mean, we get:

`$$y_{ij} = \mu + \beta_j + \epsilon_{ij}$$`

+ where
  + `\(y_{ij}\)` is the score for a given individual (`\(i\)`) in a given group (`\(j\)`)
  + `\(\mu\)` is the grand mean
  + `\(\beta_j\)` is a group specific effect
  + `\(\epsilon_{ij}\)` is the individual deviation from the group mean

+ Let's briefly consider the constraints we apply, before looking at how we do this in R

---
# Model with the grand mean

+ Each group mean is:

`$$\mu_{read} = \mu + \beta_{read}$$`

`$$\mu_{self-test} = \mu + \beta_{self-test}$$`

`$$\mu_{summarise} = \mu + \beta_{summarise}$$`

+ And as with dummy coding, this means we have 4 things to estimate (`\(\mu\)`, `\(\beta_{read}\)`, `\(\beta_{self-test}\)`, `\(\beta_{summarise}\)`), but only 3 group means

---
# Sum to zero constraint

+ In sum to zero coding, we fix this with the following constraint:

`$$\sum_{j=1}^m \beta_j = 0$$`

+ Or alternatively written for the 3 group case:

`$$\beta_1 + \beta_2 + \beta_3 = 0$$`

---
# Sum to zero constraint

+ This constraint leads to the following interpretations:

+ `\(\beta_0\)` is the grand mean or `\(\mu\)`

+ `\(\beta_j\)` are the differences between the coded group and the grand mean:

`$$\beta_j = \mu_j - \mu$$`

---
# Why the grand mean?

`$$\beta_1 + \beta_2 + \beta_3 = 0$$`

+ Substitute `\(\beta_j = \mu_j - \beta_0\)` for each group:

`$$(\mu_1 - \beta_0) + (\mu_2 - \beta_0) + (\mu_3 - \beta_0) = 0$$`

`$$\mu_1 + \mu_2 + \mu_3 = 3\beta_0$$`

`$$\beta_0 = \frac{\mu_1 + \mu_2 + \mu_3}{3}$$`

`$$\beta_0 = \mu$$`

---
# Sum to zero constraint

+ Finally, we can get back to our group means from the coefficients as follows:

`$$\mu_1 = \beta_0 + \beta_1$$`

`$$\mu_2 = \beta_0 + \beta_2$$`

`$$\mu_3 = \beta_0 - (\beta_1 + \beta_2)$$`

---
class: center, middle

# Questions?
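
---
# Sketch: checking the constraint in R

The interpretations above can be checked numerically. This is a minimal sketch, assuming hypothetical simulated data (the data frame `sim` and its values are invented for illustration and are not the course data set); it uses base R only.

``` r
# Hypothetical simulated data: three methods with unequal group sizes
set.seed(1)
sim <- data.frame(
  method = factor(rep(c("read", "self-test", "summarise"),
                      times = c(30, 20, 40)))
)
sim$score <- c(23, 28, 24)[as.integer(sim$method)] + rnorm(nrow(sim), sd = 8)

# Apply sum-to-zero coding and fit the model
contrasts(sim$method) <- contr.sum(3)
fit <- lm(score ~ method, data = sim)
b <- unname(coef(fit))   # b[1] = intercept, b[2] = method1, b[3] = method2

# Group means and the grand mean (mean of the group means)
group_means <- sapply(split(sim$score, sim$method), mean)
grand_mean <- mean(group_means)

# Intercept equals the grand mean
all.equal(b[1], grand_mean)

# Each coefficient equals its group mean minus the grand mean
all.equal(b[2], unname(group_means["read"] - grand_mean))
all.equal(b[3], unname(group_means["self-test"] - grand_mean))

# The effect for the last group is implied by the constraint
b3 <- -(b[2] + b[3])
all.equal(b3, unname(group_means["summarise"] - grand_mean))
```

Each `all.equal()` call should return `TRUE`, because a model with one categorical predictor reproduces the group means exactly whatever coding is used.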
---
class: inverse, center, middle

# Part 2: Calculating coefficients with sum-to-zero coding

---
# Group Means

``` r
# Mean, SD and group size of score for each study method
test_study3 %>%
  select(1,2,6) %>%
  group_by(method) %>%
  summarise(
    mean = round(mean(score),3),
    sd = round(sd(score),1),
    N = n()
  )
```

```
## # A tibble: 3 × 4
##   method     mean    sd     N
##   <fct>     <dbl> <dbl> <int>
## 1 read       23.4   8      87
## 2 self-test  27.6   8.3    66
## 3 summarise  24.2   8      97
```

---
# Effects (sum to zero) model

+ We need to change the contrast scheme from the default before running `lm`

``` r
# Switch the factor to sum-to-zero coding, then inspect the coding matrix
contrasts(test_study3$method) <- contr.sum
contrasts(test_study3$method)
```

```
##           [,1] [,2]
## read         1    0
## self-test    0    1
## summarise   -1   -1
```

---
# Effects (sum to zero) model

``` r
summary(lm(score ~ method, data = test_study3))
```

```
## 
## Call:
## lm(formula = score ~ method, data = test_study3)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -23.4138  -5.3593  -0.1959   5.7496  17.8041 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  25.0618     0.5177  48.409  < 2e-16 ***
## method1      -1.6480     0.7198  -2.290  0.02289 *  
## method2       2.5139     0.7731   3.252  0.00131 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.079 on 247 degrees of freedom
## Multiple R-squared:  0.04224, Adjusted R-squared:  0.03448 
## F-statistic: 5.447 on 2 and 247 DF,  p-value: 0.004845
```

---
# Effects (sum to zero) model

.pull-left[

```
## (Intercept)     method1     method2 
##      25.062      -1.648       2.514
```

+ Coefficients from group means

`$$\beta_0 = \frac{\mu_1 + \mu_2 + \mu_3}{3}$$`

`$$\beta_1 = \mu_1 - \mu$$`

`$$\beta_2 = \mu_2 - \mu$$`

]

.pull-right[

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> method </th> <th style="text-align:right;"> mean </th> <th style="text-align:right;"> Gmean </th> <th style="text-align:right;"> Coefficients </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> read </td> <td style="text-align:right;"> 23.414 </td> <td style="text-align:right;"> 25.062 </td> <td style="text-align:right;"> -1.648 </td> </tr>
<tr> <td style="text-align:left;"> self-test </td> <td style="text-align:right;"> 27.576 </td> <td style="text-align:right;"> 25.062 </td> <td style="text-align:right;"> 2.514 </td> </tr>
<tr> <td style="text-align:left;"> summarise </td> <td style="text-align:right;"> 24.196 </td> <td style="text-align:right;"> 25.062 </td> <td style="text-align:right;"> -0.866 </td> </tr>
</tbody>
</table>

]

---
# Effects (sum to zero) model

.pull-left[

```
## (Intercept)     method1     method2 
##      25.062      -1.648       2.514
```

+ Group means from coefficients:

<br>

`$$\mu_1 = \beta_0 + \beta_1$$`

<br> <br>

`$$\mu_2 = \beta_0 + \beta_2$$`

<br> <br>

`$$\mu_3 = \beta_0 - (\beta_1 + \beta_2)$$`

]

.pull-right[

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> method </th> <th style="text-align:right;"> mean </th> <th style="text-align:right;"> Gmean </th> <th style="text-align:right;"> Coefficients </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> read </td> <td style="text-align:right;"> 23.414 </td> <td style="text-align:right;"> 25.062 </td> <td style="text-align:right;"> -1.648 </td> </tr>
<tr> <td style="text-align:left;"> self-test </td> <td style="text-align:right;"> 27.576 </td> <td style="text-align:right;"> 25.062 </td> <td style="text-align:right;"> 2.514 </td> </tr>
<tr> <td style="text-align:left;"> summarise </td> <td style="text-align:right;"> 24.196 </td> <td style="text-align:right;"> 25.062 </td> <td style="text-align:right;"> -0.866 </td> </tr>
</tbody>
</table>

``` r
25.062 + -1.648
```

```
## [1] 23.414
```

``` r
25.062 + 2.514
```

```
## [1] 27.576
```

``` r
25.062 - (-1.648 + 2.514)
```

```
## [1] 24.196
```

]

---
# The wide world of contrasts

+ We have now seen two examples of coding schemes (dummy and effects).

+ There are **lots** of different coding schemes we can use for categorical variables to make different comparisons.
  + If you are interested, see the excellent resource on the [UCLA website](https://stats.idre.ucla.edu/r/library/r-library-contrast-coding-systems-for-categorical-variables/)

+ But always remember...
**The data is the same, the tested contrast differs**

---
class: inverse, center, middle

# Part 3: The data doesn't change, what we compare does

---
# The data is the same, the tested contrasts differ

+ We can run our model for `method` using both dummy and sum-to-zero coding schemes

``` r
contrasts(test_study3$method) <- contr.treatment
m_dummy <- lm(score ~ method, data = test_study3)

# Change the contrasts and run again
contrasts(test_study3$method) <- contr.sum
m_zero <- lm(score ~ method, data = test_study3)
```

+ We see that the model coefficients are different, because the tested contrast differs:

.pull-left[

``` r
coef(m_dummy)
```

```
##     (Intercept) methodself-test methodsummarise 
##      23.4137931       4.1619645       0.7820832
```

]

.pull-right[

``` r
coef(m_zero)
```

```
## (Intercept)     method1     method2 
##   25.061809   -1.648016    2.513949
```

]

---
# The data is the same, the tested contrasts differ

However, if we create a small data frame:

``` r
treat <- tibble(method = c("read", "self-test", "summarise"))
```

and add the predicted values from our models:

``` r
treat %>%
  mutate(
    pred_dummy = predict(m_dummy, newdata = .),
    pred_zero = predict(m_zero, newdata = .)
  )
```

```
## # A tibble: 3 × 3
##   method    pred_dummy pred_zero
##   <chr>          <dbl>     <dbl>
## 1 read            23.4      23.4
## 2 self-test       27.6      27.6
## 3 summarise       24.2      24.2
```

You can see that no matter what coding or contrasts we use, we are still modelling the same group means!

---
class: center, middle

# Questions?

---
class: inverse, center, middle

# Part 4: Setting up our own specific tests

---
# Why do we need manual contrasts?
+ We have now looked at dummy and sum-to-zero coding

+ These provide us with coefficients which test the significance of the difference between means of groups and some other mean (either the reference group mean or the grand mean)
  + The other coding schemes we linked to do exactly the same thing

+ ***Sometimes*** we have a research question that requires the test of the difference between particular combinations of groups for which there is no *"off the shelf"* test

+ For such situations, we can apply a set of rules and test what are referred to as manual contrasts

--

+ We can structure a wide variety of contrasts so long as they can be written:
  + As a linear combination of weighted group means
  + With the associated weights on coefficients summing to zero

---
# New example

+ Suppose we were interested in the effect of various relationship statuses on an individual's subjective well-being (`swb`)

+ Our predictor is `status` which has 5 levels:
  + Married or Civil Partnership
  + Cohabiting relationship
  + Single
  + Widowed
  + Divorced

+ Let's say we have data on 500 people

---
# Data

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> status </th> <th style="text-align:right;"> n </th> <th style="text-align:right;"> mean </th> <th style="text-align:right;"> sd </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Cohab </td> <td style="text-align:right;"> 100 </td> <td style="text-align:right;"> 11.44 </td> <td style="text-align:right;"> 4.22 </td> </tr>
<tr> <td style="text-align:left;"> Divorced </td> <td style="text-align:right;"> 50 </td> <td style="text-align:right;"> 9.37 </td> <td style="text-align:right;"> 2.34 </td> </tr>
<tr> <td style="text-align:left;"> Married/CP </td> <td style="text-align:right;"> 275 </td> <td style="text-align:right;"> 10.63 </td> <td style="text-align:right;"> 3.41 </td> </tr>
<tr> <td style="text-align:left;"> Single </td> <td style="text-align:right;"> 50 </td> <td style="text-align:right;"> 8.06 </td> <td style="text-align:right;"> 2.19 </td> </tr>
<tr> <td style="text-align:left;"> Widowed </td> <td style="text-align:right;"> 25 </td> <td style="text-align:right;"> 6.00 </td> <td style="text-align:right;"> 1.07 </td> </tr>
</tbody>
</table>

---
# Our questions

+ Suppose we want to know if there are `swb` differences between:
  1. Those who are currently or previously married or in a civil partnership, vs those who have never been married or in a civil partnership
      + Group 1: `Married/CP`, `Divorced`, `Widowed`
      + Group 2: `Single`, `Cohab`
  2. Those who are currently married or in a civil partnership, vs those who have previously been
      + Group 1: `Married/CP`
      + Group 2: `Divorced`, `Widowed`

--

+ To test this, we need to:
  + group the levels of our factor `status`
  + calculate a mean of these new sub-groups, making sure all levels contribute equally to their respective groups
  + then test the difference between these means

+ Manual contrasts can do this for us, if we follow some rules

---
# Rules for manual contrasts

+ **Rule 1**: Weights (`\(c\)`) range between -1 and 1

+ **Rule 2**: The group(s) in one chunk are given negative weights, the group(s) in the other get positive weights

+ **Rule 3**: The sum of the weights of the comparison must be 0

+ **Rule 4**: If a group is not involved in the comparison, weight is 0

+ **Rule 5**: For a given comparison, weights assigned to group(s) are equal to 1 divided by the number of groups in that chunk

+ **Rule 6**: Restrict yourself to running `\(k\)` - 1 comparisons (where `\(k\)` = number of groups)

+ **Rule 7**: Each contrast can only compare 2 chunks

+ **Rule 8**: Once a group is singled out, it cannot enter other contrasts

---
# Applying rules

.pull-left[

+ Let's construct two contrasts:
  1. Those who are currently or previously married or in a civil partnership vs not.
  2. Those who are currently married or in a civil partnership vs those who have previously been.
<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> group </th> <th style="text-align:left;"> contrast1 </th> <th style="text-align:left;"> contrast2 </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Cohab </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> </tr>
<tr> <td style="text-align:left;"> Divorced </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> </tr>
<tr> <td style="text-align:left;"> Married/CP </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> </tr>
<tr> <td style="text-align:left;"> Single </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> </tr>
<tr> <td style="text-align:left;"> Widowed </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> </tr>
</tbody>
</table>

]

.pull-right[

+ **Rule 1**: Weights range between -1 and 1

+ **Rule 2**: Groups in one chunk are given negative weights, groups in the other get positive weights

+ **Rule 3**: The sum of the weights of the comparison must be 0

+ **Rule 4**: If a group is not involved in the comparison, weight is 0

+ **Rule 5**: For a given comparison, weights assigned to group(s) = 1 divided by the number of groups in that chunk.

+ **Rule 6**: Restrict yourself to running `\(k\)` - 1 comparisons

+ **Rule 7**: Each contrast can only compare 2 chunks

+ **Rule 8**: Once a group is singled out, it can not enter other contrasts

]

---
# Applying rules

.pull-left[

+ Let's construct two contrasts:
  1. Those who are currently or previously married or in a civil partnership vs not.
  2. Those who are currently married or in a civil partnership vs those who have previously been.
<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> group </th> <th style="text-align:right;"> contrast1 </th> <th style="text-align:left;"> contrast2 </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Cohab </td> <td style="text-align:right;"> -0.50 </td> <td style="text-align:left;"> </td> </tr>
<tr> <td style="text-align:left;"> Divorced </td> <td style="text-align:right;"> 0.33 </td> <td style="text-align:left;"> </td> </tr>
<tr> <td style="text-align:left;"> Married/CP </td> <td style="text-align:right;"> 0.33 </td> <td style="text-align:left;"> </td> </tr>
<tr> <td style="text-align:left;"> Single </td> <td style="text-align:right;"> -0.50 </td> <td style="text-align:left;"> </td> </tr>
<tr> <td style="text-align:left;"> Widowed </td> <td style="text-align:right;"> 0.33 </td> <td style="text-align:left;"> </td> </tr>
</tbody>
</table>

]

.pull-right[

+ **Rule 1**: Weights range between -1 and 1

+ **Rule 2**: Groups in one chunk are given negative weights, groups in the other get positive weights

+ **Rule 3**: The sum of the weights of the comparison must be 0

+ **Rule 4**: If a group is not involved in the comparison, weight is 0

+ **Rule 5**: For a given comparison, weights assigned to group(s) = 1 divided by the number of groups in that chunk

+ **Rule 6**: Restrict yourself to running `\(k\)` - 1 comparisons

+ **Rule 7**: Each contrast can only compare 2 chunks

+ **Rule 8**: Once a group is singled out, it cannot enter other contrasts

]

---
# Applying rules

.pull-left[

+ Let's construct two contrasts:
  1. Those who are currently or previously married or in a civil partnership vs not
  2. Those who are currently married or in a civil partnership vs those who have previously been

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> group </th> <th style="text-align:right;"> contrast1 </th> <th style="text-align:right;"> contrast2 </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Cohab </td> <td style="text-align:right;"> -0.50 </td> <td style="text-align:right;"> 0.0 </td> </tr>
<tr> <td style="text-align:left;"> Divorced </td> <td style="text-align:right;"> 0.33 </td> <td style="text-align:right;"> -0.5 </td> </tr>
<tr> <td style="text-align:left;"> Married/CP </td> <td style="text-align:right;"> 0.33 </td> <td style="text-align:right;"> 1.0 </td> </tr>
<tr> <td style="text-align:left;"> Single </td> <td style="text-align:right;"> -0.50 </td> <td style="text-align:right;"> 0.0 </td> </tr>
<tr> <td style="text-align:left;"> Widowed </td> <td style="text-align:right;"> 0.33 </td> <td style="text-align:right;"> -0.5 </td> </tr>
</tbody>
</table>

]

.pull-right[

+ **Rule 1**: Weights range between -1 and 1

+ **Rule 2**: Groups in one chunk are given negative weights, groups in the other get positive weights

+ **Rule 3**: The sum of the weights of the comparison must be 0

+ **Rule 4**: If a group is not involved in the comparison, weight is 0

+ **Rule 5**: For a given comparison, weights assigned to group(s) = 1 divided by the number of groups in that chunk.

+ **Rule 6**: Restrict yourself to running `\(k\)` - 1 comparisons

+ **Rule 7**: Each contrast can only compare 2 chunks

+ **Rule 8**: Once a group is singled out, it cannot enter other contrasts

]

---
# Orthogonal vs. Non-orthogonal Contrasts

+ Orthogonal contrasts test independent sources of variation
  + If we follow the rules above, we will have orthogonal contrasts

+ Non-orthogonal contrasts test non-independent sources of variation
  + This presents some further statistical challenges in terms of making inferences
  + We will come back to this discussion later in the course

---
# Checking if contrasts are orthogonal

+ The sum of the products of the weights will = 0 for any pair of orthogonal comparisons

`$$\sum{c_{1j}c_{2j}} = 0$$`

---
# From our example

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> group </th> <th style="text-align:right;"> contrast1 </th> <th style="text-align:right;"> contrast2 </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Cohab </td> <td style="text-align:right;"> -0.50 </td> <td style="text-align:right;"> 0.0 </td> </tr>
<tr> <td style="text-align:left;"> Divorced </td> <td style="text-align:right;"> 0.33 </td> <td style="text-align:right;"> -0.5 </td> </tr>
<tr> <td style="text-align:left;"> Married/CP </td> <td style="text-align:right;"> 0.33 </td> <td style="text-align:right;"> 1.0 </td> </tr>
<tr> <td style="text-align:left;"> Single </td> <td style="text-align:right;"> -0.50 </td> <td style="text-align:right;"> 0.0 </td> </tr>
<tr> <td style="text-align:left;"> Widowed </td> <td style="text-align:right;"> 0.33 </td> <td style="text-align:right;"> -0.5 </td> </tr>
</tbody>
</table>

+ Below we can see the product of `\(c_1c_2\)` for each level, and the row-wise sums for each contrast and for the products
  + The sums of 0 for contrast 1 and contrast 2 show we have set valid weights (the displayed 0.33 is the weight 1/3, rounded)
  + The sum of 0 for the product shows the contrasts are orthogonal

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> Contrast </th> <th style="text-align:right;"> Cohab </th> <th style="text-align:right;"> Divorced </th> <th style="text-align:right;"> Married_CP </th> <th style="text-align:right;"> Single </th> <th style="text-align:right;"> Widowed </th> <th style="text-align:right;"> Sum </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Contrast1 </td> <td style="text-align:right;"> -0.5 </td> <td style="text-align:right;"> 0.330 </td> <td style="text-align:right;"> 0.33 </td> <td style="text-align:right;"> -0.5 </td> <td style="text-align:right;"> 0.330 </td> <td style="text-align:right;"> 0 </td> </tr>
<tr> <td style="text-align:left;"> Contrast2 </td> <td style="text-align:right;"> 0.0 </td> <td style="text-align:right;"> -0.500 </td> <td style="text-align:right;"> 1.00 </td> <td style="text-align:right;"> 0.0 </td> <td style="text-align:right;"> -0.500 </td> <td style="text-align:right;"> 0 </td> </tr>
<tr> <td style="text-align:left;"> Product </td> <td style="text-align:right;"> 0.0 </td> <td style="text-align:right;"> -0.165 </td> <td style="text-align:right;"> 0.33 </td> <td style="text-align:right;"> 0.0 </td> <td style="text-align:right;"> -0.165 </td> <td style="text-align:right;"> 0 </td> </tr>
</tbody>
</table>

---
class: inverse, center, middle

# Part 5: Testing manual contrasts using emmeans

---
# Using `emmeans` to test contrasts

+ We will use the package `emmeans` to test our contrasts

+ We will also be using this in the next few weeks to look at analysing experimental designs

+ **E**stimated

+ **M**arginal

+ **Means**

+ Essentially this package provides us with a lot of tools to help us model contrasts and linear functions

---
# Working with `emmeans`

+ First we run our model:

``` r
status_res <- lm(swb ~ status, wb_tib)
```

+ Next we use `emmeans` to get the estimated means of our groups.
``` r
status_mean <- emmeans(status_res, ~status)
status_mean
```

```
##  status     emmean    SE  df lower.CL upper.CL
##  Cohab       11.44 0.333 495    10.78    12.09
##  Divorced     9.37 0.471 495     8.45    10.30
##  Married/CP  10.63 0.201 495    10.23    11.02
##  Single       8.06 0.471 495     7.13     8.99
##  Widowed      6.00 0.666 495     4.70     7.31
## 
## Confidence level used: 0.95
```

---
# Visualise estimated means

.pull-left[

``` r
plot(status_mean)
```

+ We then use these means to test contrasts

]

.pull-right[

![](dapr2_07_lmcategorical2_files/figure-html/unnamed-chunk-30-1.png)<!-- -->

]

---
# Defining the contrast

+ **KEY POINT**: The order of the levels of your categorical variable matters, as `emmeans` applies the weights in this order

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> group </th> <th style="text-align:right;"> contrast1 </th> <th style="text-align:right;"> contrast2 </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Cohab </td> <td style="text-align:right;"> -0.50 </td> <td style="text-align:right;"> 0.0 </td> </tr>
<tr> <td style="text-align:left;"> Divorced </td> <td style="text-align:right;"> 0.33 </td> <td style="text-align:right;"> -0.5 </td> </tr>
<tr> <td style="text-align:left;"> Married/CP </td> <td style="text-align:right;"> 0.33 </td> <td style="text-align:right;"> 1.0 </td> </tr>
<tr> <td style="text-align:left;"> Single </td> <td style="text-align:right;"> -0.50 </td> <td style="text-align:right;"> 0.0 </td> </tr>
<tr> <td style="text-align:left;"> Widowed </td> <td style="text-align:right;"> 0.33 </td> <td style="text-align:right;"> -0.5 </td> </tr>
</tbody>
</table>

``` r
levels(wb_tib$status)
```

```
## [1] "Cohab"      "Divorced"   "Married/CP" "Single"     "Widowed"
```

``` r
# Weights are listed in the same order as levels(wb_tib$status)
status_comp <- list("Married or CP vs not" = c(-1/2, 1/3, 1/3, -1/2, 1/3),
                    "Current vs Not current" = c(0, -1/2, 1, 0, -1/2))
```

---
# Requesting the test

+ In order to test our effects, we use the `contrast` function from `emmeans`

``` r
status_comp_test <- contrast(status_mean, status_comp)
status_comp_test
```

```
##  contrast               estimate    SE  df t.ratio p.value
##  Married or CP vs not      -1.08 0.402 495  -2.690  0.0074
##  Current vs Not current     2.94 0.455 495   6.459  <.0001
```

+ We can see we have p-values, but we can also request confidence intervals

``` r
confint(status_comp_test)
```

```
##  contrast               estimate    SE  df lower.CL upper.CL
##  Married or CP vs not      -1.08 0.402 495    -1.87   -0.291
##  Current vs Not current     2.94 0.455 495     2.04    3.829
## 
## Confidence level used: 0.95
```

---
# Interpreting the results

+ The estimate is the difference between the average of the group means within each chunk

``` r
confint(status_comp_test)
```

```
##  contrast               estimate    SE  df lower.CL upper.CL
##  Married or CP vs not      -1.08 0.402 495    -1.87   -0.291
##  Current vs Not current     2.94 0.455 495     2.04    3.829
## 
## Confidence level used: 0.95
```

+ So for `Married or CP vs not`:

``` r
((10.63 + 6.00 + 9.37)/3) - ((11.44 + 8.06)/2)
```

```
## [1] -1.083333
```

+ So those who are not currently or previously married or in a civil partnership have higher SWB.

+ And this difference is significant (p = 0.0074)

---
class: center, middle

# Questions?
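
---
# Sketch: contrasts by hand

The contrast estimates and the orthogonality check can be reproduced with base R. This is a minimal sketch, assuming the rounded group means from the Data slide (the object names `means`, `c1`, `c2`, `est1` and `est2` are made up for illustration), so the results may differ slightly in the last decimal from the `emmeans` output, which works from the unrounded means.

``` r
# Rounded group means from the Data slide, in factor-level order
means <- c(Cohab = 11.44, Divorced = 9.37, `Married/CP` = 10.63,
           Single = 8.06, Widowed = 6.00)

# Contrast weights, matching status_comp
c1 <- c(-1/2, 1/3, 1/3, -1/2, 1/3)   # Married or CP vs not
c2 <- c(0, -1/2, 1, 0, -1/2)         # Current vs Not current

# Rule 3: the weights within each contrast sum to zero
all.equal(sum(c1), 0)
all.equal(sum(c2), 0)

# Orthogonality: the sum of the products of the weights is zero
all.equal(sum(c1 * c2), 0)

# Each estimate is a weighted sum of the group means
est1 <- sum(c1 * means)
est2 <- sum(c2 * means)
round(c(est1, est2), 3)
```

With these rounded means the two estimates come out at about -1.083 and 2.945, in line with the `emmeans` estimates above.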
---
# Summary of today

+ We have considered different ways in which we can code categorical predictors

+ Take home:
  + Use of coding schemes allows us to compare groups (or levels) in lots of ways
  + Our `\(\beta\)`'s will represent differences in group means
  + The scheme we use determines which group or combination of groups we are comparing
  + **In all cases the underlying data is unchanged**

+ We also looked at the use of `emmeans` in testing manual contrasts
  + Run the model
  + Estimate the means
  + Define the contrast
  + Test the contrast

+ Coding schemes are a very flexible tool for testing hypotheses

---
## This week

.pull-left[

### Tasks

<img src="figs/labs.svg" width="10%" />

**Attend your lab and work together on the exercises**

<br>

<img src="figs/exam.svg" width="10%" />

**Complete the weekly quiz**

]

.pull-right[

### Support

<img src="figs/forum.svg" width="10%" />

**Help each other on the Piazza forum**

<br>

<img src="figs/oh.png" width="10%" />

**Attend office hours (see Learn page for details)**

]

---
class: inverse, center, middle

# Thanks for listening