class: center, middle, inverse, title-slide

.title[
# Interactions 3
]
.subtitle[
## Data Analysis for Psychology in R 2
]
.author[
### dapR2 Team
]
.institute[
### Department of Psychology
The University of Edinburgh
]

---

# Course Overview

.pull-left[

<!--- I've just copied the output of the Sem 1 table here and removed the bolding on the last week, so things look consistent with the trailing opacity produced by the course_table.R script otherwise. -->

<table style="border: 1px solid black;">
<tr style="padding: 0 1em 0 1em;">
<td rowspan="5" style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1;text-align:center;vertical-align: middle"> <b>Introduction to Linear Models</b></td>
<td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1"> Intro to Linear Regression</td>
</tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1"> Interpreting Linear Models</td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1"> Testing Individual Predictors</td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1"> Model Testing & Comparison</td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1"> Linear Model Analysis</td></tr>
<tr style="padding: 0 1em 0 1em;">
<td rowspan="5" style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1;text-align:center;vertical-align: middle"> <b>Analysing Experimental Studies</b></td>
<td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1"> Categorical Predictors & Dummy Coding</td>
</tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1"> Effects Coding & Coding Specific Contrasts</td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1"> Assumptions & Diagnostics</td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1"> Bootstrapping</td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1"> Categorical Predictor Analysis</td></tr>
</table>

]

.pull-right[

<table style="border: 1px solid black;">
<tr style="padding: 0 1em 0 1em;">
<td rowspan="5" style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1;text-align:center;vertical-align: middle"> <b>Interactions</b></td>
<td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1"> Interactions I</td>
</tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1"> Interactions II</td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1"> <b>Interactions III</b></td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Analysing Experiments</td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Interaction Analysis</td></tr>
<tr style="padding: 0 1em 0 1em;">
<td rowspan="5" style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4;text-align:center;vertical-align: middle"> <b>Advanced Topics</b></td>
<td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Power Analysis</td>
</tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Binary Logistic Regression I</td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Binary Logistic Regression II</td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Logistic Regression Analysis</td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Exam Prep and Course Q&A</td></tr>
</table>

]

---

# Week's Learning Objectives

1. Interpret interactions between two categorical variables (with dummy coding)
    + Begin with 2x2 (two binary variables)
    + Move on to look at examples with more levels
2. Visualise and probe interactions
3. Be able to read interaction plots

---

# General definition

+ When the effects of one predictor on the outcome differ across levels of another predictor
+ Categorical `\(\times\)` categorical interaction:
    + There is a difference in the differences between groups across levels of a second factor
+ This idea of a difference in differences can be quite tricky to think about
    + So we will start with some visualisation, and then look at two examples

---

# Visualising interactions

+ In our class example we have looked at predicting salary from years of service (a continuous variable) and department (a categorical variable)
+ Suppose our company also had two sites, Edinburgh and Dundee, and we wanted to see if the salaries across departments differed depending on location
+ The table below contains hypothetical average salaries in thousands of pounds for each group

|         | Edinburgh| Dundee|
|:--------|---------:|------:|
|Accounts |        50|     40|
|Manager  |        30|     20|

---

# Basic plot

.pull-left[

|         | Edinburgh| Dundee|
|:--------|---------:|------:|
|Accounts |        50|     40|
|Manager  |        30|     20|

+ Let's look at the plot:
    + x-axis shows locations
    + y-axis is our salaries
    + The colours represent departments

]

.pull-right[

<!-- -->

]

---

# Difference in differences (1)

.pull-left[

|         | Edinburgh| Dundee|
|:--------|---------:|------:|
|Accounts |        50|     40|
|Manager  |        30|     20|

+ In each plot we look at, think about subtracting the average store manager's salary (blue triangle) from the average accounts salary (red circle)
+ For both Edinburgh and Dundee, this difference is £20,000
+ Note, the lines are parallel
    + When the lines are parallel, there is no interaction
    + The effect of one variable (department) does not vary along values or levels of the other variable (location)

]

.pull-right[

<!-- -->

]

---

# Difference in differences (2)

.pull-left[

Let's imagine our group means look like this instead:

|         | Edinburgh| Dundee|
|:--------|---------:|------:|
|Accounts |        50|     40|
|Manager  |        40|     20|

+ This time
we can see the difference differs
    + £20,000 in Dundee
    + £10,000 in Edinburgh
+ Note the lines are no longer parallel
    + Suggests interaction
    + But not crossing (so ordinal interaction)

]

.pull-right[

<!-- -->

]

---

# Difference in differences (3)

.pull-left[

Now consider the following group means:

|         | Edinburgh| Dundee|
|:--------|---------:|------:|
|Accounts |        40|     40|
|Manager  |        60|     20|

+ This time we can see the difference differs
    + positive difference of £20,000 in Dundee
    + negative difference of £20,000 in Edinburgh
+ Note the lines are no longer parallel
    + Suggests interaction
    + Now crossing (so disordinal interaction)

]

.pull-right[

<!-- -->

]

---
class: center, middle

# Questions?

---

# Lecture notation

`$$y_i = \beta_0 + \beta_1 x_{i} + \beta_2 z_{i} + \beta_3 x_{i}z_{i} + \epsilon_i$$`

+ Lecture notation:
    + `\(y\)` is a continuous outcome
    + `\(x\)` is a categorical predictor (`location`)
    + `\(z\)` is a categorical predictor (`department`)
    + `\(xz\)` is their product, or interaction predictor

---

# Product for dummy variables

+ Let's set Edinburgh as the baseline for `location` and Accounts as the baseline for `department`
+ The interaction is the product of the two variables
+ We can then figure out what to substitute for `\(x\)`, `\(z\)` and `\(xz\)` in our regression formula to get predictions for each group once we have estimated our `\(\beta\)` coefficients

|location  |department |  x|  z| xz|
|:---------|:----------|--:|--:|--:|
|Edinburgh |Accounts   |  0|  0|  0|
|Edinburgh |Manager    |  0|  1|  0|
|Dundee    |Accounts   |  1|  0|  0|
|Dundee    |Manager    |  1|  1|  1|

---

# Interpretation: Categorical `\(\times\)` categorical interaction (dummy codes)

`$$y_i = \beta_0 + \beta_1 x_{i} + \beta_2 z_{i} + \beta_3 x_{i}z_{i} + \epsilon_i$$`

+ `\(\beta_0\)` = Value of `\(y\)` when `\(x\)` and `\(z\)` are 0
    + Expected salary for Accounts in Edinburgh

---

# Interpretation: Categorical `\(\times\)` categorical interaction (dummy codes)

`$$y_i = \beta_0 + \beta_1 x_{i} + \beta_2 z_{i} + \beta_3 x_{i}z_{i} + \epsilon_i$$`

+ Remember `\(x\)` is our `location` variable and `\(z\)` is our `department` variable
+ `\(\beta_1\)` = Difference between levels of `\(x\)` when `\(z\)` = 0
    + The difference in salary between Accounts in Edinburgh and Dundee

---

# Interpretation: Categorical `\(\times\)` categorical interaction (dummy codes)

`$$y_i = \beta_0 + \beta_1 x_{i} + \beta_2 z_{i} + \beta_3 x_{i}z_{i} + \epsilon_i$$`

+ Remember `\(x\)` is our `location` variable and `\(z\)` is our `department` variable
+ `\(\beta_2\)` = Difference between levels of `\(z\)` when `\(x\)` = 0
    + The difference in salary between Accounts and Store managers in Edinburgh

---

# Interpretation: Categorical `\(\times\)` categorical interaction (dummy codes)

`$$y_i = \beta_0 + \beta_1 x_{i} + \beta_2 z_{i} + \beta_3 x_{i}z_{i} + \epsilon_i$$`

+ `\(\beta_3\)` = Difference between levels of `\(x\)` across levels of `\(z\)`
    + How the salary difference between Accounts and Store managers changes between Edinburgh and Dundee
    + "Difference in differences"

---

# Example: Categorical `\(\times\)` categorical

.pull-left[

Let's examine the actual group means in our data set

``` r
salary3 %>%
  group_by(location, department) %>%
  summarise(
    Salary = mean(salary)
  )
```

]

.pull-right[

```
## # A tibble: 4 × 3
## # Groups:   location [2]
##   location  department Salary
##   <fct>     <fct>       <dbl>
## 1 Edinburgh Accounts     50.7
## 2 Edinburgh Manager      47.2
## 3 Dundee    Accounts     48.7
## 4 Dundee    Manager      36.8
```

]

---

# Example: Categorical `\(\times\)` categorical

``` r
m1 <- lm(salary ~ location*department, salary3)
```

```
## 
## Call:
## lm(formula = salary ~ location * department, data = salary3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -10.551  -3.389   0.579   2.937  14.325 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                       50.6704     0.7144  70.927  < 2e-16 ***
## locationDundee                    -1.9796     1.4872  -1.331 0.186297    
## departmentManager                 -3.4625     1.3365  -2.591 0.011072 *  
## locationDundee:departmentManager  -8.4153     2.2779  -3.694 0.000367 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.052 on 96 degrees of freedom
## Multiple R-squared:  0.4774, Adjusted R-squared:  0.461 
## F-statistic: 29.23 on 3 and 96 DF,  p-value: 1.636e-13
```

---

# Example: Categorical `\(\times\)` categorical

.pull-left[

+ We can visualise categorical interactions using `cat_plot()` from the `interactions` package
    + `probe_interaction()` does not work with two categorical predictors

``` r
cat_plot(m1, pred = location, modx = department)
```

+ Plot shows group means and 95% confidence intervals based on `m1`

]

.pull-right[

<!-- -->

]

---

# Example: Categorical `\(\times\)` categorical

```
##                                  Estimate Std. Error t value Pr(>|t|)
## (Intercept)                         50.67       0.71   70.93     0.00
## locationDundee                      -1.98       1.49   -1.33     0.19
## departmentManager                   -3.46       1.34   -2.59     0.01
## locationDundee:departmentManager    -8.42       2.28   -3.69     0.00
```

.pull-left[

+ `\(\beta_0\)` = Value of `\(y\)` when `\(x\)` and `\(z\)` are 0
    + Expected salary for Accounts in Edinburgh is £50,670

]

.pull-right[

|         | Edinburgh| Dundee|
|:--------|---------:|------:|
|Accounts |     50.67|  48.69|
|Manager  |     47.21|  36.81|

]

---

# Example: Categorical `\(\times\)` categorical

```
##                                  Estimate Std. Error t value Pr(>|t|)
## (Intercept)                         50.67       0.71   70.93     0.00
## locationDundee                      -1.98       1.49   -1.33     0.19
## departmentManager                   -3.46       1.34   -2.59     0.01
## locationDundee:departmentManager    -8.42       2.28   -3.69     0.00
```

.pull-left[

+ `\(\beta_1\)` = Difference between levels of `\(x\)` when `\(z\)` = 0
    + The difference in salary between Accounts in Edinburgh and Dundee is £1,980. The salary is lower in Dundee.
(But note this is not statistically significant)
+ With dummy coding, slopes for marginal effects refer to `\(\text{Group} - \text{Reference group (intercept)}\)`

]

.pull-right[

|         | Edinburgh| Dundee|
|:--------|---------:|------:|
|Accounts |     50.67|  48.69|
|Manager  |     47.21|  36.81|

]

---

# Example: Categorical `\(\times\)` categorical

```
##                                  Estimate Std. Error t value Pr(>|t|)
## (Intercept)                         50.67       0.71   70.93     0.00
## locationDundee                      -1.98       1.49   -1.33     0.19
## departmentManager                   -3.46       1.34   -2.59     0.01
## locationDundee:departmentManager    -8.42       2.28   -3.69     0.00
```

.pull-left[

+ `\(\beta_2\)` = Difference between levels of `\(z\)` when `\(x\)` = 0
    + The difference in salary between Accounts and Store managers in Edinburgh is £3,460. The salary is lower for Store managers.
+ With dummy coding, slopes for marginal effects refer to `\(\text{Group} - \text{Reference group (intercept)}\)`

]

.pull-right[

|         | Edinburgh| Dundee|
|:--------|---------:|------:|
|Accounts |     50.67|  48.69|
|Manager  |     47.21|  36.81|

]

---

# Example: Categorical `\(\times\)` categorical

```
##                                  Estimate Std. Error t value Pr(>|t|)
## (Intercept)                         50.67       0.71   70.93     0.00
## locationDundee                      -1.98       1.49   -1.33     0.19
## departmentManager                   -3.46       1.34   -2.59     0.01
## locationDundee:departmentManager    -8.42       2.28   -3.69     0.00
```

.pull-left[

+ `\(\beta_3\)` = Difference between levels of `\(x\)` across levels of `\(z\)`
    + The difference between salary for Accounts and Store managers differs by £8,420 between Edinburgh and Dundee. The difference is greater in Dundee than in Edinburgh.
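
As a quick check, the interaction estimate can be recovered by hand from the rounded group means shown on this slide (Manager minus Accounts within each site, then Dundee minus Edinburgh). This is only a worked illustration of the "difference in differences" idea, not an additional analysis:

`$$(36.81 - 48.69) - (47.21 - 50.67) = -11.88 - (-3.46) = -8.42$$`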
]

.pull-right[

|         | Edinburgh| Dundee|
|:--------|---------:|------:|
|Accounts |     50.67|  48.69|
|Manager  |     47.21|  36.81|

]

---

# Predictions from the regression formula

+ Let's plug our `\(\beta\)` coefficients back into our regression formula:

`$$\hat{y} = \beta_0 + \beta_1 x + \beta_2 z + \beta_3 xz$$`

`$$\hat{y} = 50.67 -1.98 x -3.46 z -8.42 xz$$`

> **Test yourself: What is the estimated mean salary for each group?**

<br>

|location  |department |  x|  z| xz|
|:---------|:----------|--:|--:|--:|
|Edinburgh |Accounts   |  0|  0|  0|
|Edinburgh |Manager    |  0|  1|  0|
|Dundee    |Accounts   |  1|  0|  0|
|Dundee    |Manager    |  1|  1|  1|

---
class: center, middle

# Questions?

---

# Extending past 2x2

+ When fitting an interaction to categorical variables with more than 2 levels, we need additional interaction terms
+ Remember, a categorical variable with 3 levels is represented by the model as 2 binary variables
    + So that means we have two variables to create products with
+ The general rule on the number of interaction terms is:

`$$(r-1) \times (c-1)$$`

+ Where
    + `\(r\)` (row) = number of levels of the first categorical variable
    + `\(c\)` (column) = number of levels of the second categorical variable

---

# Example

+ The data comes from a study into patient care in paediatric wards
+ A researcher was interested in whether the subjective well-being of patients differed dependent on the post-operation treatment schedule they were given, and the hospital in which they were staying
+ **Factor 1**: `Treatment` (Levels: TreatA, TreatB, TreatC).
+ **Factor 2**: `Hospital` (Levels: Hosp1, Hosp2).
+ Total sample n = 180 (30 patients in each of 6 groups)
+ Between-person design
+ **Outcome variable**: Subjective well-being (SWB)
    + An average of multiple raters (the patient, a member of their family, and a friend)
    + SWB scores ranged from 0 to 20

---

# The data

``` r
head(hosp_tbl, 10)
```

```
## # A tibble: 10 × 3
##      SWB Treatment Hospital
##    <dbl> <fct>     <fct>   
##  1   6.2 TreatA    Hosp1   
##  2  15.9 TreatA    Hosp1   
##  3   7.2 TreatA    Hosp1   
##  4  11.3 TreatA    Hosp1   
##  5  11.2 TreatA    Hosp1   
##  6   9   TreatA    Hosp1   
##  7  14.5 TreatA    Hosp1   
##  8   7.3 TreatA    Hosp1   
##  9  13.7 TreatA    Hosp1   
## 10  12.6 TreatA    Hosp1   
```

---

# The group means

<br>

|       |Hosp1 |Hosp2 |
|:------|:-----|:-----|
|TreatA |10.80 |7.85  |
|TreatB |9.43  |13.11 |
|TreatC |10.10 |7.98  |

--

Some ways of phrasing our research question:

+ Is there a difference in how subjective wellbeing varies for the three different treatment options between the two hospitals?
+ Does the hospital a patient stays at influence how the type of treatment influences their subjective wellbeing?
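
---

# The group means: difference in differences

A minimal sketch of the "difference in differences" idea, using only the rounded group means from the previous table. The object names (`means`, `hosp_diff`, `did`) are illustrative and not part of the course data or code. Because the means are rounded to two decimals, results may differ in the last digit from a model fitted to the raw data.

``` r
# Rounded group means copied from the table (rows = treatment, columns = hospital)
means <- matrix(
  c(10.80,  7.85,
     9.43, 13.11,
    10.10,  7.98),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("TreatA", "TreatB", "TreatC"), c("Hosp1", "Hosp2"))
)

# Hospital difference (Hosp2 - Hosp1) within each treatment
hosp_diff <- means[, "Hosp2"] - means[, "Hosp1"]

# Difference in differences: how much the hospital difference for
# TreatB and TreatC departs from the hospital difference for TreatA
did <- hosp_diff[c("TreatB", "TreatC")] - hosp_diff["TreatA"]
round(did, 2)
# TreatB TreatC 
#   6.63   0.83
```

With three treatments and two hospitals this gives `\((3-1) \times (2-1) = 2\)` difference-in-differences values, one for each interaction term the model will need.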
---

# Model equations and coding

`$$y_{ijk} = b_0 + \underbrace{(b_1D_1 + b_2D_2)}_{\text{Treatment}} + \underbrace{b_3D_3}_{\text{Hospital}} + \underbrace{b_4D_{13} + b_5D_{23}}_{\text{Interactions}} + \epsilon_{ijk}$$`

```
## # A tibble: 6 × 7
##   Treatment Hospital    D1    D2    D3   D13   D23
##   <chr>     <chr>    <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 A         Hosp1        0     0     0     0     0
## 2 A         Hosp2        0     0     1     0     0
## 3 B         Hosp1        1     0     0     0     0
## 4 B         Hosp2        1     0     1     1     0
## 5 C         Hosp1        0     1     0     0     0
## 6 C         Hosp2        0     1     1     0     1
```

<br>

+ Note that "D" stands for dummy-coded variables

---

# Interpretation with dummy coding

`$$y_{ijk} = b_0 + \underbrace{(b_1D_1 + b_2D_2)}_{\text{Treatment}} + \underbrace{b_3D_3}_{\text{Hospital}} + \underbrace{b_4D_{13} + b_5D_{23}}_{\text{Interactions}} + \epsilon_{ijk}$$`

+ Treatment A and Hospital 1 as reference levels
+ `\(b_0\)` = Mean of Treatment A in Hospital 1
+ `\(b_1\)` = Difference between Treatment B and Treatment A in Hospital 1
+ `\(b_2\)` = Difference between Treatment C and Treatment A in Hospital 1
+ `\(b_3\)` = Difference between Treatment A in Hospital 1 and Hospital 2
+ `\(b_4\)` = Difference between Treatment A and Treatment B between Hospital 1 and Hospital 2
+ `\(b_5\)` = Difference between Treatment A and Treatment C between Hospital 1 and Hospital 2

---

# Our results

``` r
m4 <- lm(SWB ~ Treatment + Hospital + Treatment*Hospital, data = hosp_tbl)
m4sum <- summary(m4)
m4res <- round(m4sum$coefficients,2)
m4res
```

```
##                               Estimate Std. Error t value Pr(>|t|)
## (Intercept)                      10.80       0.37   29.19     0.00
## TreatmentTreatB                  -1.37       0.52   -2.62     0.01
## TreatmentTreatC                  -0.70       0.52   -1.33     0.18
## HospitalHosp2                    -2.95       0.52   -5.63     0.00
## TreatmentTreatB:HospitalHosp2     6.63       0.74    8.97     0.00
## TreatmentTreatC:HospitalHosp2     0.82       0.74    1.11     0.27
```

---

# Interpretation with dummy coding

```
##                               Estimate Std. Error t value Pr(>|t|)
## (Intercept)                      10.80       0.37   29.19     0.00
## TreatmentTreatB                  -1.37       0.52   -2.62     0.01
## TreatmentTreatC                  -0.70       0.52   -1.33     0.18
## HospitalHosp2                    -2.95       0.52   -5.63     0.00
## TreatmentTreatB:HospitalHosp2     6.63       0.74    8.97     0.00
## TreatmentTreatC:HospitalHosp2     0.82       0.74    1.11     0.27
```

.pull-left[

+ `\(b_0\)` = Mean of Treatment A in Hospital 1
+ `\(b_1\)` = Difference between Treatment B and Treatment A in Hospital 1
+ `\(b_2\)` = Difference between Treatment C and Treatment A in Hospital 1
+ `\(b_3\)` = Difference between Treatment A in Hospital 1 and Hospital 2
+ `\(b_4\)` = Difference between Treatment A and Treatment B between Hospital 1 and Hospital 2
+ `\(b_5\)` = Difference between Treatment A and Treatment C between Hospital 1 and Hospital 2

]

.pull-right[

|       |Hosp1 |Hosp2 |
|:------|:-----|:-----|
|TreatA |10.80 |7.85  |
|TreatB |9.43  |13.11 |
|TreatC |10.10 |7.98  |

]

---

# Brief comment: 3-way interactions

+ **In principle** we can extend interactions to more than two variables
+ Consider our salary example:
    + We could ask: Are there location-based differences in how years of service contribute to salary differently across departments?
+ Note that we'd then have three interacting predictors:
    + Years of service (continuous)
    + Department (binary; Store Manager, Accounts)
    + Location (binary; Edinburgh, Dundee)
+ This is a plausible question concerning organisational fairness

---

# Brief comment: 3-way interactions

+ In general, three-way interactions are tricky to interpret
+ They require lots of data to test, as the effects are often very small
    + This extends the issues of power already discussed
+ **Only test a three-way interaction if you have a strong theoretical reason for doing so**

---

# Summary

+ We have considered how we fit and interpret linear models with categorical interactions
+ We have focused on dummy coded variables (we'll return to effects coding next week)
+ We saw an example of an interaction involving categorical variables with more than two levels (multiple interaction terms)

---

## This week

.pull-left[

### Tasks

<img src="figs/labs.svg" width="10%" />

**Attend your lab and work together on the exercises**

<br>

<img src="figs/exam.svg" width="10%" />

**Complete the weekly quiz**

]

.pull-right[

### Support

<img src="figs/forum.svg" width="10%" />

**Help each other on the Piazza forum**

<br>

<img src="figs/oh.png" width="10%" />

**Attend office hours (see Learn page for details)**

]