Interactions

Univariate Statistics and Methodology using R

Martin Corley

Psychology, PPLS

University of Edinburgh

Two Weeks Ago

Two Weeks Ago

age hrs_wk method R_AGE
9.982 5.137 phonics 14.090
8.006 4.353 phonics 11.762
9.349 5.808 phonics 13.838
8.620 3.612 word 6.620
8.060 4.447 word 7.524
6.117 5.085 word 5.502

Two Weeks Ago

mod.m <- lm(R_AGE ~ age + hrs_wk, data = reading)
summary(mod.m)

Call:
lm(formula = R_AGE ~ age + hrs_wk, data = reading)

Residuals:
   Min     1Q Median     3Q    Max 
-3.820 -2.382  0.074  2.404  3.549 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)    0.518      2.422    0.21    0.832  
age            0.578      0.222    2.61    0.012 *
hrs_wk         0.945      0.406    2.33    0.024 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.55 on 47 degrees of freedom
Multiple R-squared:  0.253, Adjusted R-squared:  0.221 
F-statistic: 7.97 on 2 and 47 DF,  p-value: 0.00105

Two Weeks Ago

Interactions 1

Before we Reach for R

  • our model says that age and hrs_wk have orthogonal effects

    • whatever your age, an extra hour of reading/wk improves your reading age by 0.95 years
    • however much you read, an extra year of age improves your reading age by 0.58 years
  • what if practice affects people of different ages differently?

  • possibly our model isn’t a good fit to the data?

How Good is the Model?

plot(mod.m, which=1:4)

Interaction is Just Multiplication

linear model with two predictors

\[\color{red}{\textrm{outcome}_i}=\color{blue}{(\textrm{model})_i}+\textrm{error}_i\] \[\color{red}{y_i}=\color{blue}{b_0}\cdot{}\color{orange}{1}+\color{blue}{b_1}\cdot{}\color{orange}{x_{1i}}+\color{blue}{b_2}\cdot{}\color{orange}{x_{2i}}+\epsilon_i\]

linear model with interaction

\[\color{red}{y_i}=\color{blue}{b_0}\cdot{}\color{orange}{1}+\color{blue}{b_1}\cdot{}\color{orange}{x_{1i}}+\color{blue}{b_2}\cdot{}\color{orange}{x_{2i}}+\color{blue}{b_3}\cdot{}\color{orange}{x_{1i}x_{2i}}+\epsilon_i\] \[\color{red}{\hat{y_i}}=\color{blue}{b_0}\cdot{}\color{orange}{1}+\color{blue}{b_1}\cdot{}\color{orange}{x_{1i}}+\color{blue}{b_2}\cdot{}\color{orange}{x_{2i}}+\color{blue}{b_3}\cdot{}\color{orange}{x_{1i}x_{2i}}\]

Model with Interaction

\[\color{red}{y_i}=\color{blue}{b_0}\cdot{}\color{orange}{1}+\color{blue}{b_1}\cdot{}\color{orange}{x_{1i}}+\color{blue}{b_2}\cdot{}\color{orange}{x_{2i}}+\color{blue}{b_3}\cdot{}\color{orange}{x_{1i}x_{2i}}+\epsilon_i\]

\[\color{red}{\hat{y_i}}=\color{blue}{b_0}\cdot{}\color{orange}{1}+\color{blue}{b_1}\cdot{}\color{orange}{x_{1i}}+\color{blue}{b_2}\cdot{}\color{orange}{x_{2i}}+\color{blue}{b_3}\cdot{}\color{orange}{x_{1i}x_{2i}}\]

 

 

mod.i <- lm(R_AGE ~ 1 + hrs_wk + age + hrs_wk:age, data = reading)

\[\color{red}{\hat{y_i}}=\color{blue}{b_0}\cdot{}\color{orange}{1}+\color{blue}{b_1}\cdot{}\color{orange}{x_{1i}}+\color{blue}{b_2}\cdot{}\color{orange}{x_{2i}}+\color{blue}{b_3}\cdot{}\color{orange}{x_{1i}x_{2i}}\]

\(\color{blue}{b_0=1}\)

\(\color{blue}{b_1=1.5}\)

\(\color{blue}{b_2=2}\)

\(\color{blue}{b_3=0.5}\)

 

\(\color{red}{x_{1i}=2}\)

\[\color{red}{\hat{y_i}}=\color{blue}{b_0}\cdot{}\color{orange}{1}+\color{blue}{b_1}\cdot{}\color{orange}{x_{1i}}+\color{blue}{b_2}\cdot{}\color{orange}{x_{2i}}+\color{blue}{b_3}\cdot{}\color{orange}{x_{1i}x_{2i}}\]

\(\color{blue}{b_0=1}\)

\(\color{blue}{b_1=1.5}\)

\(\color{blue}{b_2=2}\)

\(\color{blue}{b_3=0.5}\)

 

\(\color{red}{x_{1i}=2}\)

\(\color{green}{x_{1i}=4}\)

Model with Interaction

mod.i <- lm(R_AGE ~ 1 + hrs_wk + age + hrs_wk:age, data = reading)
summary(mod.i)

Call:
lm(formula = R_AGE ~ 1 + hrs_wk + age + hrs_wk:age, data = reading)

Residuals:
   Min     1Q Median     3Q    Max 
-4.217 -2.172 -0.049  2.144  4.318 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   24.792     11.904    2.08    0.043 *
hrs_wk        -3.818      2.324   -1.64    0.107  
age           -2.348      1.423   -1.65    0.106  
hrs_wk:age     0.569      0.274    2.08    0.043 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.46 on 46 degrees of freedom
Multiple R-squared:  0.317, Adjusted R-squared:  0.273 
F-statistic: 7.13 on 3 and 46 DF,  p-value: 0.000496

Model with Interaction

Model with Interaction

Interactions 2

Categorical Predictors

  • how does interaction work with categorical predictors?

  • (as you’ll see) it’s all just multiplication

age hrs_wk method R_AGE
9.982 5.137 phonics 14.090
8.006 4.353 phonics 11.762
9.349 5.808 phonics 13.838
8.620 3.612 word 6.620
8.060 4.447 word 7.524
6.117 5.085 word 5.502
  • how is method coded by R?
contrasts(reading$method)
        word
phonics    0
word       1

A (Different) Two-Predictor Model

  • we know that hrs_wk affects reading age

  • perhaps method affects reading age too?

  • this is a question of model improvement

mod.m2 <- lm(R_AGE ~ hrs_wk + method, data = reading)
anova(mod.m2)
Analysis of Variance Table

Response: R_AGE
          Df Sum Sq Mean Sq F value  Pr(>F)    
hrs_wk     1   59.4    59.4    26.9 4.4e-06 ***
method     1  245.6   245.6   111.3 5.5e-14 ***
Residuals 47  103.7     2.2                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

A Two-Predictor Model

summary(mod.m2)

Call:
lm(formula = R_AGE ~ hrs_wk + method, data = reading)

Residuals:
   Min     1Q Median     3Q    Max 
-3.389 -1.008  0.209  1.055  2.430 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)    7.237      1.205    6.00  2.6e-07 ***
hrs_wk         1.003      0.231    4.35  7.2e-05 ***
methodword    -4.446      0.421  -10.55  5.5e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.49 on 47 degrees of freedom
Multiple R-squared:  0.746, Adjusted R-squared:  0.735 
F-statistic: 69.1 on 2 and 47 DF,  p-value: 1.01e-14

A Two-Predictor Model

  • note that the lines are parallel

  • an hour of practice has the same effect, however you’re taught

Different Effects for Different Methods?

mod.m3 <- lm(R_AGE ~ hrs_wk + method + hrs_wk:method, data = reading)
anova(mod.m3)
Analysis of Variance Table

Response: R_AGE
              Df Sum Sq Mean Sq F value  Pr(>F)    
hrs_wk         1   59.4    59.4    29.5 2.0e-06 ***
method         1  245.6   245.6   122.2 1.5e-14 ***
hrs_wk:method  1   11.3    11.3     5.6   0.022 *  
Residuals     46   92.5     2.0                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Different Effects for Different Methods

summary(mod.m3)

Call:
lm(formula = R_AGE ~ hrs_wk + method + hrs_wk:method, data = reading)

Residuals:
   Min     1Q Median     3Q    Max 
-2.995 -0.674  0.242  1.017  2.593 

Coefficients:
                  Estimate Std. Error t value  Pr(>|t|)    
(Intercept)          5.499      1.365    4.03   0.00021 ***
hrs_wk               1.347      0.264    5.11 0.0000061 ***
methodword           1.183      2.413    0.49   0.62629    
hrs_wk:methodword   -1.134      0.479   -2.37   0.02224 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.42 on 46 degrees of freedom
Multiple R-squared:  0.774, Adjusted R-squared:  0.759 
F-statistic: 52.4 on 3 and 46 DF,  p-value: 6.95e-15

Interaction is Just Multiplication

\[\color{red}{\hat{y_i}}=\color{blue}{b_0}\cdot{}\color{orange}{1}+\color{blue}{b_1}\cdot{}\color{orange}{x_{1i}}+\color{blue}{b_2}\cdot{}\color{orange}{x_{2i}}+\color{blue}{b_3}\cdot{}\color{orange}{x_{1i}x_{2i}}\]

      (Intercept)            hrs_wk        methodword hrs_wk:methodword 
            5.499             1.347             1.183            -1.134 

\[\color{red}{\hat{\textrm{R_AGE}}}=\color{blue}{5.499}\cdot{}\color{orange}{1}+\color{blue}{1.347}\cdot{}\color{orange}{\textrm{hrs_wk}}+\color{blue}{1.183}\cdot{}\color{orange}{\textrm{method}}+\color{blue}{-1.134}\cdot{}\color{orange}{\textrm{hrs_wk}\cdot{}\textrm{method}}\]

  • the coefficients show you which way things are coded

  • methodword can be read as “when method is word

Interaction is Just Multiplication

  • when method is (coded as) zero (for phonics):

\[\color{red}{\hat{\textrm{R_AGE}}}=\color{blue}{5.499}\cdot{}\color{orange}{1}+\color{blue}{1.347}\cdot{}\color{orange}{\textrm{hrs_wk}}+\color{blue}{1.183}\cdot{}\color{orange}{0}+\color{blue}{-1.134}\cdot{}\color{orange}{\textrm{hrs_wk}\cdot{}0}\]

\[\color{red}{\hat{\textrm{R_AGE}}}=\color{blue}{5.499}\cdot{}\color{orange}{1}+\color{blue}{1.347}\cdot{}\color{orange}{\textrm{hrs_wk}}\]

  • when method is (coded as) one (for word):

\[\color{red}{\hat{\textrm{R_AGE}}}=\color{blue}{5.499}\cdot{}\color{orange}{1}+\color{blue}{1.347}\cdot{}\color{orange}{\textrm{hrs_wk}}+\color{blue}{1.183}\cdot{}\color{orange}{1}+\color{blue}{-1.134}\cdot{}\color{orange}{\textrm{hrs_wk}\cdot{}1}\]

\[\color{red}{\hat{\textrm{R_AGE}}}=\color{blue}{6.682}\cdot{}\color{orange}{1}+\color{blue}{0.213}\cdot{}\color{orange}{\textrm{hrs_wk}}\]

No More Parallel Lines

A Nice Graph

reading |>
    ggplot(aes(x = hrs_wk,
        y = R_AGE, colour = method)) +
    xlab("practice (hrs/wk)") +
    ylab("reading age") +
    geom_point(size = 3) +
    geom_smooth(method = "lm")

  • geom_smooth(method="lm") effectively runs a linear model, to make a graph

  • it’s not an analysis

Interactions 3

Category x Category

age hrs_wk method school R_AGE
9.982 5.137 phonics private 14.090
8.006 4.353 phonics state 11.762
9.349 5.808 phonics state 13.838
8.620 3.612 word private 6.620
8.060 4.447 word private 7.524
6.117 5.085 word state 5.502
  • shoehorning in one more reading example

  • does where it’s taught affect the efficacy of a method?

Where to Start

  • when looking at continuous predictors, we started with a model criticism

  • when looking at mixed predictors, we looked at model improvement
Analysis of Variance Table

Response: R_AGE
              Df Sum Sq Mean Sq F value  Pr(>F)
hrs_wk         1   59.4    59.4    29.5 2.0e-06
method         1  245.6   245.6   122.2 1.5e-14
hrs_wk:method  1   11.3    11.3     5.6   0.022
Residuals     46   92.5     2.0                

Where to Start

  • where you start depends on what you’re doing

    • have I got a good model? (no: one of many possible issues: missing predictor)

    • can I improve a model? (for example, I am exploring which predictors are relevant)

  • in the present case, we already have a theory (that different schools are using the teaching methods differently)

  • this time we’ll start by looking at the data

Reading Age by School and Method

A Much Better Graph

An Analysis

  • looking at the graph, it seems as if

    • the state school is doing a bit better

    • but it depends which method we look at

  • we wouldn’t be able to predict how well a child would do without knowing which kind of school and which method

  • it seems (graphically) as if our theory/hunch was right

  • what about statistically?

Fitting a Model

mod.s <- lm(R_AGE ~ method + school + method:school, data = reading)
anova(mod.s)
Analysis of Variance Table

Response: R_AGE
              Df Sum Sq Mean Sq F value  Pr(>F)    
method         1    263   263.2  110.56 8.1e-14 ***
school         1      5     5.0    2.11 0.15357    
method:school  1     31    31.0   13.02 0.00076 ***
Residuals     46    110     2.4                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 

  • save your fingers! a * b is equivalent to a + b + a:b
mod.s <- lm(R_AGE ~ method * school, data=reading)

What the Model Tells Us

summary(mod.s)

Call:
lm(formula = R_AGE ~ method + school + method:school, data = reading)

Residuals:
   Min     1Q Median     3Q    Max 
-2.620 -1.132 -0.030  0.926  3.877 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)              11.258      0.428   26.31  < 2e-16 ***
methodword               -3.038      0.618   -4.92 0.000012 ***
schoolstate               2.209      0.618    3.58  0.00083 ***
methodword:schoolstate   -3.152      0.873   -3.61  0.00076 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.54 on 46 degrees of freedom
Multiple R-squared:  0.732, Adjusted R-squared:  0.715 
F-statistic: 41.9 on 3 and 46 DF,  p-value: 3.32e-13

F(3,46)=41.9

What the Model Tells Us

                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)              11.258      0.428   26.31  < 2e-16 ***
methodword               -3.038      0.618   -4.92 0.000012 ***
schoolstate               2.209      0.618    3.58  0.00083 ***
methodword:schoolstate   -3.152      0.873   -3.61  0.00076 ***
  • what is the intercept?

the predicted reading age of someone who has been to private school and taught using phonics

  • what does methodword mean?

being taught using the word method reduces your reading age by 3 years if you went to private school

What the Model Tells Us (2)

                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)              11.258      0.428   26.31  < 2e-16 ***
methodword               -3.038      0.618   -4.92 0.000012 ***
schoolstate               2.209      0.618    3.58  0.00083 ***
methodword:schoolstate   -3.152      0.873   -3.61  0.00076 ***
  • what does schoolstate mean?

being taught in a state school increases your predicted reading age by 2.2 years if you are taught using the phonics method

  • and the interaction?

compared to being taught using the word method in a private school, your predicted reading age is -3+2.2+-3.2 years different (i.e., it is 11.3-4 or 7.3 years)

What the Model Doesn’t Tell Us

  • we can’t say anything general about the value of phonics

  • we can’t say anything general about the effectiveness of schools

  • no main effects

  • possibly the only useful information we have is the interaction

  • we can fix this…

Main Effects

Coding Categories, Again

  • all of the maths on the last few slides works out because the categorical predictors are, underlyingly, dummy coded
contrasts(reading$method)
        word
phonics    0
word       1
contrasts(reading$school)
        state
private     0
state       1
  • for example the interaction term can only be added when \(\color{orange}{x_{1i}}\) and \(\color{orange}{x_{2i}}\) (here, method and school) are equal to 1

    • it applies for state school, word method

What If We Change Things?

\[\color{red}{\hat{y_i}}=\color{blue}{b_0}\cdot{}\color{orange}{1}+\color{blue}{b_1}\cdot{}\color{orange}{x_{1i}}+\color{blue}{b_2}\cdot{}\color{orange}{x_{2i}}+\color{blue}{b_3}\cdot{}\color{orange}{x_{1i}x_{2i}}\]

 

contrasts(reading$method) <- c(-0.5, 0.5)
contrasts(reading$school) <- c(-0.5, 0.5)
  • what we’ve done is changed the values of \(\color{orange}{x_{1}}\) and \(\color{orange}{x_{2}}\)

    • where methodi is phonics, \(\color{orange}{x_{1i}}=-0.5\)

    • where methodi is word, \(\color{orange}{x_{1i}}=+0.5\)

  • similarly for school and \(\color{orange}{x_{2}}\)

Running the Model

  • NB., we’ve changed the contrast coding in the previous slide
mod.sC <- lm(R_AGE ~ method * school, data = reading)
summary(mod.sC)

Call:
lm(formula = R_AGE ~ method * school, data = reading)

Residuals:
   Min     1Q Median     3Q    Max 
-2.620 -1.132 -0.030  0.926  3.877 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)       10.056      0.218   46.05  < 2e-16 ***
method1           -4.614      0.437  -10.56  6.9e-14 ***
school1            0.634      0.437    1.45  0.15357    
method1:school1   -3.152      0.873   -3.61  0.00076 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.54 on 46 degrees of freedom
Multiple R-squared:  0.732, Adjusted R-squared:  0.715 
F-statistic: 41.9 on 3 and 46 DF,  p-value: 3.32e-13

F(3,46)=41.9

What the Model Tells Us

                Estimate Std. Error t value Pr(>|t|)    
(Intercept)       10.056      0.218   46.05  < 2e-16 ***
method1           -4.614      0.437  -10.56  6.9e-14 ***
school1            0.634      0.437    1.45  0.15357    
method1:school1   -3.152      0.873   -3.61  0.00076 ***
  • note the output isn’t as helpful here, you have to remember which values are 1

    • depends which you assigned positive values

    • here, word method, state school

  • for someone in private school using phonics method

reading age = 10.1 - \(\frac{1}{2}\)·-4.6 - \(\frac{1}{2}\)·0.6 + \(\frac{1}{4}\)·-3.2

reading age = 10.1 + 2.3 - 0.3 - 0.8 = 11.3 years

…etc.

What’s Good About This?

  • effect of school with (-.5, .5) contrast coding

What’s Good About This? (2)

  • effect of method with (-.5, .5) contrast coding

These are Main Effects

                Estimate Std. Error t value Pr(>|t|)    
(Intercept)       10.056      0.218   46.05  < 2e-16 ***
method1           -4.614      0.437  -10.56  6.9e-14 ***
school1            0.634      0.437    1.45  0.15357    
method1:school1   -3.152      0.873   -3.61  0.00076 ***
  • on average, phonics is a better method for teaching

  • on average, there is no (statistical) difference between types of school

  • there’s also an interaction between these two predictors (but we should really calculate it out to ensure that we know what it “means”)

  • a quick read: when school is state and method is word, reading age is reduced

  • but this obscures the fact that state schools are particularly good on phonics—perhaps we should recode?

Sum Coding

  • R provides a quick and dirty way of setting similar contrasts
contrasts(reading$method) <- c(-0.5, 0.5)
contrasts(reading$method)
        [,1]
phonics -0.5
word     0.5
mod.a <- lm(R_AGE~method,data=reading)
summary(mod.a)
            Estimate Std. Error t value        Pr(>|t|)    
(Intercept)   10.025      0.246   40.71         < 2e-16 ***
method1       -4.589      0.492   -9.32 0.0000000000024 ***
contrasts(reading$method) <- contr.sum(2)
contrasts(reading$method)
        [,1]
phonics    1
word      -1
mod.b <- lm(R_AGE~method,data=reading)
summary(mod.b)
            Estimate Std. Error t value        Pr(>|t|)    
(Intercept)   10.025      0.246   40.71         < 2e-16 ***
method1        2.294      0.246    9.32 0.0000000000024 ***

End