Interactions

Univariate Statistics and Methodology using R

Martin Corley

Psychology, PPLS

University of Edinburgh

Two Weeks Ago

age	hrs_wk	method	R_AGE
9.982	5.137	phonics	14.090
8.006	4.353	phonics	11.762
9.349	5.808	phonics	13.838
8.620	3.612	word	6.620
8.060	4.447	word	7.524
6.117	5.085	word	5.502

Two Weeks Ago

mod.m <- lm(R_AGE ~ age + hrs_wk, data = reading)
summary(mod.m)


Call:
lm(formula = R_AGE ~ age + hrs_wk, data = reading)

Residuals:
   Min     1Q Median     3Q    Max 
-3.820 -2.382  0.074  2.404  3.549 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)    0.518      2.422    0.21    0.832  
age            0.578      0.222    2.61    0.012 *
hrs_wk         0.945      0.406    2.33    0.024 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.55 on 47 degrees of freedom
Multiple R-squared:  0.253, Adjusted R-squared:  0.221 
F-statistic: 7.97 on 2 and 47 DF,  p-value: 0.00105

Two Weeks Ago

Interactions 1

Before we Reach for R

our model says that age and hrs_wk have orthogonal effects
- whatever your age, an extra hour of reading/wk improves your reading age by 0.95 years
- however much you read, an extra year of age improves your reading age by 0.58 years
what if practice affects people of different ages differently?

possibly our model isn’t a good fit to the data?

How Good is the Model?

plot(mod.m, which=1:4)

Interaction is Just Multiplication

linear model with two predictors

\[\color{red}{\textrm{outcome}_i}=\color{blue}{(\textrm{model})_i}+\textrm{error}_i\] \[\color{red}{y_i}=\color{blue}{b_0}\cdot{}\color{orange}{1}+\color{blue}{b_1}\cdot{}\color{orange}{x_{1i}}+\color{blue}{b_2}\cdot{}\color{orange}{x_{2i}}+\epsilon_i\]

linear model with interaction

\[\color{red}{y_i}=\color{blue}{b_0}\cdot{}\color{orange}{1}+\color{blue}{b_1}\cdot{}\color{orange}{x_{1i}}+\color{blue}{b_2}\cdot{}\color{orange}{x_{2i}}+\color{blue}{b_3}\cdot{}\color{orange}{x_{1i}x_{2i}}+\epsilon_i\] \[\color{red}{\hat{y_i}}=\color{blue}{b_0}\cdot{}\color{orange}{1}+\color{blue}{b_1}\cdot{}\color{orange}{x_{1i}}+\color{blue}{b_2}\cdot{}\color{orange}{x_{2i}}+\color{blue}{b_3}\cdot{}\color{orange}{x_{1i}x_{2i}}\]

Model with Interaction

\[\color{red}{\hat{y_i}}=\color{blue}{b_0}\cdot{}\color{orange}{1}+\color{blue}{b_1}\cdot{}\color{orange}{x_{1i}}+\color{blue}{b_2}\cdot{}\color{orange}{x_{2i}}+\color{blue}{b_3}\cdot{}\color{orange}{x_{1i}x_{2i}}\]

mod.i <- lm(R_AGE ~ 1 + hrs_wk + age + hrs_wk:age, data = reading)

\(\color{blue}{b_0=1}\)

\(\color{blue}{b_1=1.5}\)

\(\color{blue}{b_2=2}\)

\(\color{blue}{b_3=0.5}\)

\(\color{red}{x_{1i}=2}\)

\(\color{blue}{b_0=1}\)

\(\color{blue}{b_1=1.5}\)

\(\color{blue}{b_2=2}\)

\(\color{blue}{b_3=0.5}\)

\(\color{red}{x_{1i}=2}\)

\(\color{green}{x_{1i}=4}\)

Model with Interaction

mod.i <- lm(R_AGE ~ 1 + hrs_wk + age + hrs_wk:age, data = reading)
summary(mod.i)


Call:
lm(formula = R_AGE ~ 1 + hrs_wk + age + hrs_wk:age, data = reading)

Residuals:
   Min     1Q Median     3Q    Max 
-4.217 -2.172 -0.049  2.144  4.318 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   24.792     11.904    2.08    0.043 *
hrs_wk        -3.818      2.324   -1.64    0.107  
age           -2.348      1.423   -1.65    0.106  
hrs_wk:age     0.569      0.274    2.08    0.043 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.46 on 46 degrees of freedom
Multiple R-squared:  0.317, Adjusted R-squared:  0.273 
F-statistic: 7.13 on 3 and 46 DF,  p-value: 0.000496

Model with Interaction

Interactions 2

Categorical Predictors

how does interaction work with categorical predictors?
(as you’ll see) it’s all just multiplication

age	hrs_wk	method	R_AGE
9.982	5.137	phonics	14.090
8.006	4.353	phonics	11.762
9.349	5.808	phonics	13.838
8.620	3.612	word	6.620
8.060	4.447	word	7.524
6.117	5.085	word	5.502

how is method coded by R?

contrasts(reading$method)

        word
phonics    0
word       1

A (Different) Two-Predictor Model

we know that hrs_wk affects reading age
perhaps method affects reading age too?
this is a question of model improvement

mod.m2 <- lm(R_AGE ~ hrs_wk + method, data = reading)
anova(mod.m2)

Analysis of Variance Table

Response: R_AGE
          Df Sum Sq Mean Sq F value  Pr(>F)    
hrs_wk     1   59.4    59.4    26.9 4.4e-06 ***
method     1  245.6   245.6   111.3 5.5e-14 ***
Residuals 47  103.7     2.2                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

A Two-Predictor Model

summary(mod.m2)


Call:
lm(formula = R_AGE ~ hrs_wk + method, data = reading)

Residuals:
   Min     1Q Median     3Q    Max 
-3.389 -1.008  0.209  1.055  2.430 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)    7.237      1.205    6.00  2.6e-07 ***
hrs_wk         1.003      0.231    4.35  7.2e-05 ***
methodword    -4.446      0.421  -10.55  5.5e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.49 on 47 degrees of freedom
Multiple R-squared:  0.746, Adjusted R-squared:  0.735 
F-statistic: 69.1 on 2 and 47 DF,  p-value: 1.01e-14

A Two-Predictor Model

note that the lines are parallel
an hour of practice has the same effect, however you’re taught

Different Effects for Different Methods?

mod.m3 <- lm(R_AGE ~ hrs_wk + method + hrs_wk:method, data = reading)
anova(mod.m3)

Analysis of Variance Table

Response: R_AGE
              Df Sum Sq Mean Sq F value  Pr(>F)    
hrs_wk         1   59.4    59.4    29.5 2.0e-06 ***
method         1  245.6   245.6   122.2 1.5e-14 ***
hrs_wk:method  1   11.3    11.3     5.6   0.022 *  
Residuals     46   92.5     2.0                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Different Effects for Different Methods

summary(mod.m3)


Call:
lm(formula = R_AGE ~ hrs_wk + method + hrs_wk:method, data = reading)

Residuals:
   Min     1Q Median     3Q    Max 
-2.995 -0.674  0.242  1.017  2.593 

Coefficients:
                  Estimate Std. Error t value  Pr(>|t|)    
(Intercept)          5.499      1.365    4.03   0.00021 ***
hrs_wk               1.347      0.264    5.11 0.0000061 ***
methodword           1.183      2.413    0.49   0.62629    
hrs_wk:methodword   -1.134      0.479   -2.37   0.02224 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.42 on 46 degrees of freedom
Multiple R-squared:  0.774, Adjusted R-squared:  0.759 
F-statistic: 52.4 on 3 and 46 DF,  p-value: 6.95e-15

Interaction is Just Multiplication

      (Intercept)            hrs_wk        methodword hrs_wk:methodword 
            5.499             1.347             1.183            -1.134

\[\color{red}{\hat{\textrm{R_AGE}}}=\color{blue}{5.499}\cdot{}\color{orange}{1}+\color{blue}{1.347}\cdot{}\color{orange}{\textrm{hrs_wk}}+\color{blue}{1.183}\cdot{}\color{orange}{\textrm{method}}+\color{blue}{-1.134}\cdot{}\color{orange}{\textrm{hrs_wk}\cdot{}\textrm{method}}\]

the coefficients show you which way things are coded
methodword can be read as “when method is word”

Interaction is Just Multiplication

when method is (coded as) zero (for phonics):

\[\color{red}{\hat{\textrm{R_AGE}}}=\color{blue}{5.499}\cdot{}\color{orange}{1}+\color{blue}{1.347}\cdot{}\color{orange}{\textrm{hrs_wk}}+\color{blue}{1.183}\cdot{}\color{orange}{0}+\color{blue}{-1.134}\cdot{}\color{orange}{\textrm{hrs_wk}\cdot{}0}\]

\[\color{red}{\hat{\textrm{R_AGE}}}=\color{blue}{5.499}\cdot{}\color{orange}{1}+\color{blue}{1.347}\cdot{}\color{orange}{\textrm{hrs_wk}}\]

when method is (coded as) one (for word):

\[\color{red}{\hat{\textrm{R_AGE}}}=\color{blue}{5.499}\cdot{}\color{orange}{1}+\color{blue}{1.347}\cdot{}\color{orange}{\textrm{hrs_wk}}+\color{blue}{1.183}\cdot{}\color{orange}{1}+\color{blue}{-1.134}\cdot{}\color{orange}{\textrm{hrs_wk}\cdot{}1}\]

\[\color{red}{\hat{\textrm{R_AGE}}}=\color{blue}{6.682}\cdot{}\color{orange}{1}+\color{blue}{0.213}\cdot{}\color{orange}{\textrm{hrs_wk}}\]

No More Parallel Lines

A Nice Graph

reading |>
    ggplot(aes(x = hrs_wk,
        y = R_AGE, colour = method)) +
    xlab("practice (hrs/wk)") +
    ylab("reading age") +
    geom_point(size = 3) +
    geom_smooth(method = "lm")

geom_smooth(method="lm") effectively runs a linear model, to make a graph
it’s not an analysis

Interactions 3

Category x Category

age	hrs_wk	method	school	R_AGE
9.982	5.137	phonics	private	14.090
8.006	4.353	phonics	state	11.762
9.349	5.808	phonics	state	13.838
8.620	3.612	word	private	6.620
8.060	4.447	word	private	7.524
6.117	5.085	word	state	5.502

shoehorning in one more reading example
does where it’s taught affect the efficacy of a method?

Where to Start

when looking at continuous predictors, we started with a model criticism

when looking at mixed predictors, we looked at model improvement

Analysis of Variance Table

Response: R_AGE
              Df Sum Sq Mean Sq F value  Pr(>F)
hrs_wk         1   59.4    59.4    29.5 2.0e-06
method         1  245.6   245.6   122.2 1.5e-14
hrs_wk:method  1   11.3    11.3     5.6   0.022
Residuals     46   92.5     2.0

Where to Start

where you start depends on what you’re doing
- have I got a good model? (no: one of many possible issues: missing predictor)
- can I improve a model? (for example, I am exploring which predictors are relevant)

in the present case, we already have a theory (that different schools are using the teaching methods differently)
this time we’ll start by looking at the data

Reading Age by School and Method

A Much Better Graph

An Analysis

looking at the graph, it seems as if
- the state school is doing a bit better
- but it depends which method we look at
we wouldn’t be able to predict how well a child would do without knowing which kind of school and which method

it seems (graphically) as if our theory/hunch was right
what about statistically?

Fitting a Model

mod.s <- lm(R_AGE ~ method + school + method:school, data = reading)
anova(mod.s)

Analysis of Variance Table

Response: R_AGE
              Df Sum Sq Mean Sq F value  Pr(>F)    
method         1    263   263.2  110.56 8.1e-14 ***
school         1      5     5.0    2.11 0.15357    
method:school  1     31    31.0   13.02 0.00076 ***
Residuals     46    110     2.4                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

save your fingers! a * b is equivalent to a + b + a:b

mod.s <- lm(R_AGE ~ method * school, data=reading)

What the Model Tells Us

summary(mod.s)


Call:
lm(formula = R_AGE ~ method + school + method:school, data = reading)

Residuals:
   Min     1Q Median     3Q    Max 
-2.620 -1.132 -0.030  0.926  3.877 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)              11.258      0.428   26.31  < 2e-16 ***
methodword               -3.038      0.618   -4.92 0.000012 ***
schoolstate               2.209      0.618    3.58  0.00083 ***
methodword:schoolstate   -3.152      0.873   -3.61  0.00076 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.54 on 46 degrees of freedom
Multiple R-squared:  0.732, Adjusted R-squared:  0.715 
F-statistic: 41.9 on 3 and 46 DF,  p-value: 3.32e-13

F(3,46)=41.9

What the Model Tells Us

                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)              11.258      0.428   26.31  < 2e-16 ***
methodword               -3.038      0.618   -4.92 0.000012 ***
schoolstate               2.209      0.618    3.58  0.00083 ***
methodword:schoolstate   -3.152      0.873   -3.61  0.00076 ***

what is the intercept?

the predicted reading age of someone who has been to private school and taught using phonics

what does methodword mean?

being taught using the word method reduces your reading age by 3 years if you went to private school

What the Model Tells Us (2)

                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)              11.258      0.428   26.31  < 2e-16 ***
methodword               -3.038      0.618   -4.92 0.000012 ***
schoolstate               2.209      0.618    3.58  0.00083 ***
methodword:schoolstate   -3.152      0.873   -3.61  0.00076 ***

what does schoolstate mean?

being taught in a state school increases your predicted reading age by 2.2 years if you are taught using the phonics method

and the interaction?

compared to being taught using the word method in a private school, your predicted reading age is -3+2.2+-3.2 years different (i.e., it is 11.3-4 or 7.3 years)

What the Model Doesn’t Tell Us

we can’t say anything general about the value of phonics
we can’t say anything general about the effectiveness of schools
no main effects
possibly the only useful information we have is the interaction
we can fix this…

Main Effects

Coding Categories, Again

all of the maths on the last few slides works out because the categorical predictors are, underlyingly, dummy coded

contrasts(reading$method)

        word
phonics    0
word       1

contrasts(reading$school)

        state
private     0
state       1

for example the interaction term can only be added when \(\color{orange}{x_{1i}}\) and \(\color{orange}{x_{2i}}\) (here, method and school) are equal to 1
- it applies for state school, word method

What If We Change Things?

contrasts(reading$method) <- c(-0.5, 0.5)
contrasts(reading$school) <- c(-0.5, 0.5)

what we’ve done is changed the values of \(\color{orange}{x_{1}}\) and \(\color{orange}{x_{2}}\)
- where method_i is phonics, \(\color{orange}{x_{1i}}=-0.5\)
- where method_i is word, \(\color{orange}{x_{1i}}=+0.5\)
similarly for school and \(\color{orange}{x_{2}}\)

Running the Model

NB., we’ve changed the contrast coding in the previous slide

mod.sC <- lm(R_AGE ~ method * school, data = reading)
summary(mod.sC)


Call:
lm(formula = R_AGE ~ method * school, data = reading)

Residuals:
   Min     1Q Median     3Q    Max 
-2.620 -1.132 -0.030  0.926  3.877 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)       10.056      0.218   46.05  < 2e-16 ***
method1           -4.614      0.437  -10.56  6.9e-14 ***
school1            0.634      0.437    1.45  0.15357    
method1:school1   -3.152      0.873   -3.61  0.00076 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.54 on 46 degrees of freedom
Multiple R-squared:  0.732, Adjusted R-squared:  0.715 
F-statistic: 41.9 on 3 and 46 DF,  p-value: 3.32e-13

F(3,46)=41.9

What the Model Tells Us

                Estimate Std. Error t value Pr(>|t|)    
(Intercept)       10.056      0.218   46.05  < 2e-16 ***
method1           -4.614      0.437  -10.56  6.9e-14 ***
school1            0.634      0.437    1.45  0.15357    
method1:school1   -3.152      0.873   -3.61  0.00076 ***

note the output isn’t as helpful here, you have to remember which values are 1
- depends which you assigned positive values
- here, word method, state school

for someone in private school using phonics method

reading age = 10.1 - \(\frac{1}{2}\)·-4.6 - \(\frac{1}{2}\)·0.6 + \(\frac{1}{4}\)·-3.2

reading age = 10.1 + 2.3 - 0.3 - 0.8 = 11.3 years

…etc.

What’s Good About This?

effect of school with (-.5, .5) contrast coding

What’s Good About This? (2)

effect of method with (-.5, .5) contrast coding

These are Main Effects

                Estimate Std. Error t value Pr(>|t|)    
(Intercept)       10.056      0.218   46.05  < 2e-16 ***
method1           -4.614      0.437  -10.56  6.9e-14 ***
school1            0.634      0.437    1.45  0.15357    
method1:school1   -3.152      0.873   -3.61  0.00076 ***

on average, phonics is a better method for teaching
on average, there is no (statistical) difference between types of school
there’s also an interaction between these two predictors (but we should really calculate it out to ensure that we know what it “means”)

a quick read: when school is state and method is word, reading age is reduced
but this obscures the fact that state schools are particularly good on phonics—perhaps we should recode?

Sum Coding

R provides a quick and dirty way of setting similar contrasts

contrasts(reading$method) <- c(-0.5, 0.5)
contrasts(reading$method)

        [,1]
phonics -0.5
word     0.5

mod.a <- lm(R_AGE~method,data=reading)
summary(mod.a)

            Estimate Std. Error t value        Pr(>|t|)    
(Intercept)   10.025      0.246   40.71         < 2e-16 ***
method1       -4.589      0.492   -9.32 0.0000000000024 ***

contrasts(reading$method) <- contr.sum(2)
contrasts(reading$method)

        [,1]
phonics    1
word      -1

mod.b <- lm(R_AGE~method,data=reading)
summary(mod.b)

            Estimate Std. Error t value        Pr(>|t|)    
(Intercept)   10.025      0.246   40.71         < 2e-16 ***
method1        2.294      0.246    9.32 0.0000000000024 ***

Interactions

Two Weeks Ago

Two Weeks Ago

Two Weeks Ago

Two Weeks Ago

Interactions 1

Before we Reach for R

How Good is the Model?

Interaction is Just Multiplication

linear model with two predictors

linear model with interaction

Model with Interaction

Model with Interaction

Model with Interaction

Model with Interaction

Interactions 2

Categorical Predictors

A (Different) Two-Predictor Model

A Two-Predictor Model

A Two-Predictor Model

Different Effects for Different Methods?

Different Effects for Different Methods

Interaction is Just Multiplication

Interaction is Just Multiplication

No More Parallel Lines

A Nice Graph

Interactions 3

Category x Category

Where to Start

Where to Start

Reading Age by School and Method

A Much Better Graph

An Analysis

Fitting a Model

What the Model Tells Us

What the Model Tells Us

What the Model Tells Us (2)

What the Model Doesn’t Tell Us

Main Effects

Coding Categories, Again

What If We Change Things?

Running the Model

What the Model Tells Us

What’s Good About This?

What’s Good About This? (2)

These are Main Effects

Sum Coding

End