class: center, middle, inverse, title-slide .title[ #
Categorical Predictors
] .subtitle[ ## Data Analysis for Psychology in R 2
] .author[ ### dapR2 Team ] .institute[ ### Department of Psychology
The University of Edinburgh ] --- # Weeks Learning Objectives 1. Understand the meaning of model coefficients in the case of a binary predictor. 2. Understand how to apply dummy coding to include categorical predictors with 2+ levels. 3. Be able to include categorical predictors into an `lm` model in R 4. Introduce `emmeans` as a tool for testing different effects in models with categorical predictors. --- class: inverse, center, middle # Part 1: Categorical variables --- # Categorical variables (recap dapR1) + Categorical variables can only take discrete values + E.g., animal type: 1= duck, 2= cow, 3= wasp + They are mutually exclusive + No duck-wasps or cow-ducks! + In R, categorical variables should be of class `factor` + The discrete values are `levels` + Levels can have numeric values (1, 2 3) and labels (e.g. "duck", "cow", "wasp") + All that these numbers represent is category membership. --- # Understanding category membership + When we have looked at variables like hours of study, an additional unit (1 hour) is more time. + As values change (and they do not need to change by whole hours), we get a sense of more or less time being spent studying + Across our whole sample, that gives us variation in hours of study. + When we have a categorical variable, we do not have this level of precision. + Suppose we split hours of study into the following groups: + 0 to 2.99 hours = low study time + 3 - 9.99 hours = medium study time + 10+ hours = high study time + **Side note**: This is for point of demonstration, it is never a good idea to break a continuous measure into groups --- # Understanding category membership .pull-left[ <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> ID </th> <th style="text-align:right;"> score </th> <th style="text-align:right;"> hours </th> <th style="text-align:left;"> hours_cat </th> <th style="text-align:left;"> hours_catNum </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> ID102 </td> <td style="text-align:right;"> 36 </td> <td style="text-align:right;"> 12 </td> <td style="text-align:left;"> high </td> <td style="text-align:left;"> 3 </td> </tr> <tr> <td style="text-align:left;"> ID104 </td> <td style="text-align:right;"> 29 </td> <td style="text-align:right;"> 13 </td> <td style="text-align:left;"> high </td> <td style="text-align:left;"> 3 </td> </tr> <tr> <td style="text-align:left;"> ID106 </td> <td style="text-align:right;"> 29 </td> <td style="text-align:right;"> 12 </td> <td style="text-align:left;"> high </td> <td style="text-align:left;"> 3 </td> </tr> <tr> <td style="text-align:left;"> ID107 </td> <td style="text-align:right;"> 18 </td> <td style="text-align:right;"> 11 </td> <td style="text-align:left;"> high </td> <td style="text-align:left;"> 3 </td> </tr> <tr> <td style="text-align:left;"> ID110 </td> <td style="text-align:right;"> 41 </td> <td style="text-align:right;"> 15 </td> <td style="text-align:left;"> high </td> <td style="text-align:left;"> 3 </td> </tr> <tr> <td style="text-align:left;"> ID103 </td> <td style="text-align:right;"> 15 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:left;"> medium </td> <td style="text-align:left;"> 2 </td> </tr> <tr> <td style="text-align:left;"> ID105 </td> <td style="text-align:right;"> 18 </td> <td style="text-align:right;"> 9 </td> <td style="text-align:left;"> medium </td> <td style="text-align:left;"> 2 </td> </tr> <tr> <td style="text-align:left;"> ID101 </td> <td style="text-align:right;"> 18 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> low </td> <td style="text-align:left;"> 1 </td> </tr> <tr> <td style="text-align:left;"> ID108 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> low </td> <td style="text-align:left;"> 1 </td> </tr> <tr> <td style="text-align:left;"> ID109 </td> <td style="text-align:right;"> 17 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:left;"> low </td> <td style="text-align:left;"> 1 </td> </tr> </tbody> </table> ] .pull-right[ ``` ## hours hours_cat hours_catNum ## Min. : 0.00 low : 45 1: 45 ## 1st Qu.: 4.00 medium: 96 2: 96 ## Median : 9.00 high :109 3:109 ## Mean : 8.02 ## 3rd Qu.:12.00 ## Max. :15.00 ``` ] --- # Binary variable (recap) + Binary variable is a categorical variable with two levels. + Traditionally coded with a 0 and 1 + In `lm`, binary variables are often referred to as dummy variables + when we use multiple dummies, we talk about the general procedure of dummy coding + Why 0 and 1? -- Quick version: It has some nice properties when it comes to interpretation. -- > **Test your understanding:** > What is the interpretation of the intercept? -- > What about the slope? --- class: center, middle # Questions? --- class: inverse, center, middle # Part 2: lm with a single binary predictor --- # Extending our example .pull-left[ + Our in-class example so far has used *test scores*, *revision time* and *motivation*. + Let's say we also collected data on who they studied with (`study`); + 0 = alone; 1 = with others + And also which of three different revisions methods they used for the test (`study_method`) + 1 = Notes re-reading; 2 = Notes summarising; 3 = Self-testing ([see here](https://www.psychologicalscience.org/publications/journals/pspi/learning-techniques.html)) + We collect a new sample of 200 students. ] .pull-right[ ```r head(test_study3) %>% kable(.) %>% kable_styling(full_width = F) ``` <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> ID </th> <th style="text-align:right;"> score </th> <th style="text-align:right;"> hours </th> <th style="text-align:right;"> motivation </th> <th style="text-align:left;"> study </th> <th style="text-align:left;"> method </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> ID101 </td> <td style="text-align:right;"> 18 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> -0.42 </td> <td style="text-align:left;"> others </td> <td style="text-align:left;"> self-test </td> </tr> <tr> <td style="text-align:left;"> ID102 </td> <td style="text-align:right;"> 36 </td> <td style="text-align:right;"> 12 </td> <td style="text-align:right;"> 0.15 </td> <td style="text-align:left;"> alone </td> <td style="text-align:left;"> summarise </td> </tr> <tr> <td style="text-align:left;"> ID103 </td> <td style="text-align:right;"> 15 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 0.35 </td> <td style="text-align:left;"> alone </td> <td style="text-align:left;"> summarise </td> </tr> <tr> <td style="text-align:left;"> ID104 </td> <td style="text-align:right;"> 29 </td> <td style="text-align:right;"> 13 </td> <td style="text-align:right;"> 0.16 </td> <td style="text-align:left;"> others </td> <td style="text-align:left;"> summarise </td> </tr> <tr> <td style="text-align:left;"> ID105 </td> <td style="text-align:right;"> 18 </td> <td style="text-align:right;"> 9 </td> <td style="text-align:right;"> -1.12 </td> <td style="text-align:left;"> alone </td> <td style="text-align:left;"> read </td> </tr> <tr> <td style="text-align:left;"> ID106 </td> <td style="text-align:right;"> 29 </td> <td style="text-align:right;"> 12 </td> <td style="text-align:right;"> -0.64 </td> <td style="text-align:left;"> others </td> <td style="text-align:left;"> read </td> </tr> </tbody> </table> ] --- # LM with binary predictors + Now lets ask the question: + **Do students who study with others score better than students who study alone?** + Our equation does not change: `$$score_i = \beta_0 + \beta_1 study_{i} + \epsilon_i$$` + And in R: ```r performance_study <- lm(score ~ study, data = test_study3) ``` --- # Model results ```r summary(performance_study) ``` ``` ## ## Call: ## lm(formula = score ~ study, data = test_study3) ## ## Residuals: ## Min 1Q Median 3Q Max ## -22.7902 -4.7902 0.2098 5.4766 18.2098 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 22.7902 0.6603 34.52 < 2e-16 *** ## studyothers 4.7332 1.0093 4.69 4.53e-06 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 7.896 on 248 degrees of freedom ## Multiple R-squared: 0.08145, Adjusted R-squared: 0.07775 ## F-statistic: 21.99 on 1 and 248 DF, p-value: 4.525e-06 ``` --- # Interpretation .pull-left[ + As before, the intercept `\(\hat \beta_0\)` is the expected value of `\(y\)` when `\(x=0\)` + What is `\(x=0\)` here? + It is the students who study alone. + So what about `\(\hat \beta_1\)`? + `\(\beta_1\)` = + **Look at the output on the right hand side.** + What do you notice about the difference in averages? + Let's look back at the summary results again.... ] .pull-right[ ```r test_study3 %>% * group_by(., study) %>% summarise( * Average = round(mean(score),4) ) ``` ``` ## # A tibble: 2 × 2 ## study Average ## <fct> <dbl> ## 1 alone 22.8 ## 2 others 27.5 ``` ] --- # Model results .pull-left[ ![](figs/catModResults.png)<!-- --> ] .pull-right[ ``` ## # A tibble: 2 × 2 ## study Average ## <fct> <dbl> ## 1 alone 22.8 ## 2 others 27.5 ``` ] --- # Model results .pull-left[ ![](figs/catModResults.png)<!-- --> ] .pull-right[ ``` ## # A tibble: 2 × 2 ## study Average ## <fct> <dbl> ## 1 alone 22.8 ## 2 others 27.5 ``` ] --- # Interpretation + `\(\hat \beta_0\)` = predicted expected value of `\(y\)` when `\(x = 0\)` + Or, the mean of group coded 0 (those who study alone) + `\(\hat \beta_1\)` = predicted difference between the means of the two groups. + Group 1 - Group 0 (Mean `score` for those who study with others - mean `score` of those who study alone) + Notice how this maps to our question. + Do students who study with others score better than students who study alone? --- class: center, middle # Questions? --- class: inverse, center, middle # Part 3: Prediction equations for groups & significance --- # Equations for each group + What would our linear model look like if we added the values for `\(x\)`. `$$\widehat{score} = \hat \beta_0 + \hat \beta_1 study$$` + For those who study alone ( `\(study = 0\)` ): `$$\widehat{score}_{alone} = \hat \beta_0 + \hat \beta_1 \times 0$$` + So; `$$\widehat{score}_{alone} = \hat \beta_0$$` --- # Equations for each group + For those who study with others ( `\(study = 1\)` ): `$$\widehat{score}_{others} = \hat \beta_0 + \hat \beta_1 \times 1$$` + So; `$$\widehat{score}_{others} = \hat \beta_0 + \hat \beta_1$$` + And if we re-arrange; `$$\hat \beta_1 = \widehat{score}_{others} - \hat \beta_0$$` + Remembering that `\(\widehat{score}_{alone} = \hat \beta_0\)`, we finally obtain: `$$\hat \beta_1 = \widehat{score}_{others} - \widehat{score}_{alone}$$` --- # Visualize the model .center[ <img src="dapr2_06_LMcategorical1_files/figure-html/unnamed-chunk-13-1.png" width="504" /> ] --- count: false # Visualize the model .center[ <img src="dapr2_06_LMcategorical1_files/figure-html/unnamed-chunk-14-1.png" width="504" /> ] --- count: false # Visualize the model .center[ <img src="dapr2_06_LMcategorical1_files/figure-html/unnamed-chunk-15-1.png" width="504" /> ] --- count: false # Visualize the model .center[ <img src="dapr2_06_LMcategorical1_files/figure-html/unnamed-chunk-16-1.png" width="504" /> ] --- # Evaluation of model and significance of `\(\beta_1\)` + `\(R^2\)` and `\(F\)`-ratio interpretation are identical to their interpretation in models with only continuous predictors. + And we assess the significance of predictors in the same way + We use the standard error of the coefficient to construct: + We calculate the `\(\hat \beta_1\)` = difference between groups + `\(t\)`-value and associated `\(p\)`-value for the coefficient + Or a confidence interval around the coefficient --- # Hold on... it's a t-test ```r test_study3 %>% t.test(score ~ study, var.equal = T, .) ``` ``` ## ## Two Sample t-test ## ## data: score by study ## t = -4.6895, df = 248, p-value = 4.525e-06 ## alternative hypothesis: true difference in means between group alone and group others is not equal to 0 ## 95 percent confidence interval: ## -6.721058 -2.745251 ## sample estimates: ## mean in group alone mean in group others ## 22.79021 27.52336 ``` --- # Model results + **Test your understanding:** How do we interpret our results? .pull-left[ ![](figs/catModResults.png)<!-- --> ] -- .pull-right[ Study habits significantly predicted student test scores, *F*(1, 248) = 21.99, *p* < .001. Study habits explained 0.08% of the variance in test scores. Specifically, students who studied with others (*M* = 27.52, *SD* = 7.94 ) scored significantly higher than students who studied alone (*M* = 22.79, *SD* = 7.86), `\(\beta_1\)` = 4.73, *SE* = 1.01, *t* = 4.69, *p* < .001. ] --- class: center, middle # Questions? --- class: inverse, center, middle # Part 4: Categorical variables with >2 levels --- # Including categorical predictors with >2 levels in a regression + The goal when analyzing categorical data with 2 or more levels is that each of our `\(\beta\)` coefficients represents a specific difference between means. + e.g. When using a single binary predictor, `\(\beta_1\)` is the difference between the two groups. + To be able to do this when we have 2+ levels, we need to apply a **coding scheme**. + Two common coding schemes are: + Dummy coding (which we will discuss now) + Effects coding (or sum to zero, which we will discuss next week) + There are LOTS of ways to do this; if curious, [see here](https://stats.oarc.ucla.edu/r/library/r-library-contrast-coding-systems-for-categorical-variables/) --- # Dummy coding + Dummy coding uses 0's and 1's to represent group membership + One level is chosen as a baseline + All other levels are compared against that baseline + Notice, this is identical to binary variables already discussed. + Dummy coding is simply the process of producing a set of binary coded variables + For any categorical variable, we will create `\(k\)`-1 dummy variables + `\(k\)` = number of levels --- # Steps in dummy coding by hand 1. Choose a baseline level 2. Assign everyone in the baseline group `0` for all `\(k\)`-1 dummy variables 3. Assign everyone in the next group a `1` for the first dummy variable and a `0` for all the other dummy variables 4. Assign everyone in the next again group a `1` for the second dummy variable and a `0` for all the other dummy variables 5. Repeat until all `\(k\)`-1 dummy variables have had 0's and 1's assigned 6. Enter the `\(k\)`-1 dummy variables into your regression --- # Choosing a baseline? + Each level of your categorical predictor will be compared against the baseline. + Good baseline levels could be: + The control group in an experiment + The group expected to have the lowest score on the outcome + The largest group + It is best the baseline is not: + A poorly defined level, e.g. an `Other` group + Much smaller than the other groups --- # Dummy coding by hand .pull-left[ + We start out with a dataset that looks like: ``` ## ID score method ## 1 ID101 18 self-test ## 2 ID102 36 summarise ## 3 ID103 15 summarise ## 4 ID104 29 summarise ## 5 ID105 18 read ## 6 ID106 29 read ## 7 ID107 18 summarise ## 8 ID108 0 read ## 9 ID109 17 read ## 10 ID110 41 read ``` ] .pull-right[ + And end up with one that looks like: ``` ## ID score method method1 method2 ## 1 ID101 18 self-test 1 0 ## 2 ID102 36 summarise 0 1 ## 3 ID103 15 summarise 0 1 ## 4 ID104 29 summarise 0 1 ## 5 ID105 18 read 0 0 ## 6 ID106 29 read 0 0 ## 7 ID107 18 summarise 0 1 ## 8 ID108 0 read 0 0 ## 9 ID109 17 read 0 0 ## 10 ID110 41 read 0 0 ``` ] --- # Dummy coding by hand + We would then enter the dummy coded variables into a regression model: .pull-left[ ``` ## ID score method method1 method2 ## 1 ID101 18 self-test 1 0 ## 2 ID102 36 summarise 0 1 ## 3 ID103 15 summarise 0 1 ## 4 ID104 29 summarise 0 1 ## 5 ID105 18 read 0 0 ## 6 ID106 29 read 0 0 ## 7 ID107 18 summarise 0 1 ## 8 ID108 0 read 0 0 ## 9 ID109 17 read 0 0 ## 10 ID110 41 read 0 0 ``` ] .pull-right[ ![](figs/dcResults.png)<!-- --> ] -- > **Test your understanding:** Which method is our baseline? > How does our model know to treat it as the baseline? --- # Dummy coding with `lm` + `lm` automatically applies dummy coding when you include a variable of class `factor` in a model. + It selects the first group (alphabetically) as the baseline group + It represents this as a contrast matrix which looks like: ```r contrasts(test_study3$method) ``` ``` ## self-test summarise ## read 0 0 ## self-test 1 0 ## summarise 0 1 ``` --- # Dummy coding with `lm` + `lm` does all the dummy coding work for us: .pull-left[ .center[**Dummy Coding in lm**] ```r summary(lm(score ~ method, data = dummyCoded)) ``` ![](figs/RdcResults.png)<!-- --> ] .pull-right[ .center[**Dummy Coding by Hand**] ```r summary(lm(score ~ method1+method2, data = dummyCoded)) ``` <img src="figs/dcResults.png" width="90%" /> ] --- # Dummy coding with `lm` .pull-left[ + The intercept is the mean of the baseline group (notes re-reading) + The coefficient for `methodself-test` is the mean difference between the self-testing group and the baseline group (re-reading) + The coefficient for `methodsummarise` is the mean difference between the note summarising group and the baseline group (re-reading) ] .pull-right[ ```r mod1 ``` ``` ## ## Call: ## lm(formula = score ~ method, data = test_study3) ## ## Coefficients: ## (Intercept) methodself-test methodsummarise ## 23.4138 4.1620 0.7821 ``` ``` ## # A tibble: 3 × 2 ## method Group_Mean ## <fct> <dbl> ## 1 read 23.4 ## 2 self-test 27.6 ## 3 summarise 24.2 ``` ] --- # Dummy coding with `lm` (full results) ```r summary(mod1) ``` ``` ## ## Call: ## lm(formula = score ~ method, data = test_study3) ## ## Residuals: ## Min 1Q Median 3Q Max ## -23.4138 -5.3593 -0.1959 5.7496 17.8041 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 23.4138 0.8662 27.031 <2e-16 *** ## methodself-test 4.1620 1.3188 3.156 0.0018 ** ## methodsummarise 0.7821 1.1930 0.656 0.5127 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 8.079 on 247 degrees of freedom ## Multiple R-squared: 0.04224, Adjusted R-squared: 0.03448 ## F-statistic: 5.447 on 2 and 247 DF, p-value: 0.004845 ``` --- # Changing the baseline group + The level that `lm` chooses as it's baseline may not always be the best choice + You can change it using: ```r contrasts(test_study3$method) <- contr.treatment(3, base = 2) ``` + `contrasts` updates the variable with the new coding scheme + `contr.treatment` Specifies that you want dummy coding + `3` is No. of levels of your predictor + `base=2` is the level number of your new baseline --- # Results using the new baseline .pull-left[ + The intercept is the now the mean of the second group (Self-Testing) + `method1` is now the difference between Notes re-reading and Self-Testing + `method3` is now the difference between Notes summarising and Self-testing ] .pull-right[ ```r contrasts(test_study3$method) <- contr.treatment(3, base = 2) mod2 <- lm(score ~ method, data = test_study3) mod2 ``` ``` ## ## Call: ## lm(formula = score ~ method, data = test_study3) ## ## Coefficients: ## (Intercept) method1 method3 ## 27.576 -4.162 -3.380 ``` ``` ## # A tibble: 3 × 2 ## method Group_Mean ## <fct> <dbl> ## 1 read 23.4 ## 2 self-test 27.6 ## 3 summarise 24.2 ``` ] --- # New baseline (full results) ```r summary(mod2) ``` ``` ## ## Call: ## lm(formula = score ~ method, data = test_study3) ## ## Residuals: ## Min 1Q Median 3Q Max ## -23.4138 -5.3593 -0.1959 5.7496 17.8041 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 27.5758 0.9945 27.729 < 2e-16 *** ## method1 -4.1620 1.3188 -3.156 0.00180 ** ## method3 -3.3799 1.2892 -2.622 0.00929 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 8.079 on 247 degrees of freedom ## Multiple R-squared: 0.04224, Adjusted R-squared: 0.03448 ## F-statistic: 5.447 on 2 and 247 DF, p-value: 0.004845 ``` ??? + Note that the choice of baseline does not affect the R^2 or F-ratio --- class: center, middle # Questions? --- class: inverse, center, middle # Part 5: Testing manual contrasts using emmeans --- # Using `emmeans` to test contrasts + We will use the package `emmeans` to test our contrasts + We will also be using this in the next few weeks to look at analysing experimental designs. + **E**stimated + **M**arginal + **Means** + Essentially this package provides us with a lot of tools to help us model contrasts and linear functions. --- # Working with `emmeans` + We already have our model: ```r mod1 <- lm(score ~ method, test_study3) ``` + Next we use `emmeans` to get the estimated means of our groups. ```r method_mean <- emmeans(mod1, ~method) method_mean ``` ``` ## method emmean SE df lower.CL upper.CL ## read 23.4 0.866 247 21.7 25.1 ## self-test 27.6 0.994 247 25.6 29.5 ## summarise 24.2 0.820 247 22.6 25.8 ## ## Confidence level used: 0.95 ``` --- # Visualise estimated means .pull-left[ ```r plot(method_mean) ``` + We then use these means to test contrasts ] .pull-right[ <img src="dapr2_06_LMcategorical1_files/figure-html/unnamed-chunk-40-1.png" width="504" /> ] --- # Defining the contrast + **KEY POINT**: The order of your categorical variable matters as `emmeans` uses this order. <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> self-test </th> <th style="text-align:right;"> summarise </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> read </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> </tr> <tr> <td style="text-align:left;"> self-test </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> </tr> <tr> <td style="text-align:left;"> summarise </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> </tr> </tbody> </table> ```r levels(test_study3$method) ``` ``` ## [1] "read" "self-test" "summarise" ``` ```r method_comp <- list("Reading vs Self-Testing" = c(1, -1, 0), "Reading vs Summarising" = c(1, 0, -1), "Self-Testing vs Summarising" = c(0, 1, -1)) ``` --- # Requesting the test + In order to test our effects, we use the `contrast` function from `emmeans` ```r method_comp_test <- contrast(method_mean, method_comp) method_comp_test ``` ``` ## contrast estimate SE df t.ratio p.value ## Reading vs Self-Testing -4.162 1.32 247 -3.156 0.0018 ## Reading vs Summarising -0.782 1.19 247 -0.656 0.5127 ## Self-Testing vs Summarising 3.380 1.29 247 2.622 0.0093 ``` -- + We can see we have p-values, but we can also request confidence intervals ```r confint(method_comp_test) ``` ``` ## contrast estimate SE df lower.CL upper.CL ## Reading vs Self-Testing -4.162 1.32 247 -6.760 -1.56 ## Reading vs Summarising -0.782 1.19 247 -3.132 1.57 ## Self-Testing vs Summarising 3.380 1.29 247 0.841 5.92 ## ## Confidence level used: 0.95 ``` --- # Interpreting the results + The estimate is the difference between the average of the group means within each chunk. .pull-left[ ```r method_comp_test ``` ![](figs/contrastEstimates.png)<!-- --> ] .pull-right[ ```r method_mean ``` <img src="figs/contrastMeans.png" width="75%" /> ] + So for `Self-Testing vs Summarising` : ```r 27.6-24.2 ``` ``` ## [1] 3.4 ``` + So those who test themselves on the content of their notes have higher scores than those who simply read their notes. + And this is significant. --- class: center, middle # Questions? --- # Summary of today + When we have categorical predictors, we are modelling the means of groups + A categorical variable with 2 levels can be represented as a binary (0,1) variable + The associated `\(\beta\)` is the difference between groups + When the categorical predictor has more than one level, we can represent it with dummy coded variables. + We have k-1 dummy variables, where k = levels + Each dummy represents the difference between the mean of the group scored 1 and the reference group + In R, we need to make sure... + R recognises the variable as a factor (`factor()`), and + we have the correct reference level (`contr.treatment()`) + We also looked at the use of `emmeans` to test specific contrasts. + Run the model + Estimate the means + Define the contrast + Test the contrast --- class: inverse, center, middle # Thanks for listening