Data Analysis for Psychology in R 2
Department of Psychology
University of Edinburgh
2025–2026
| Block | Topic |
|---|---|
| Introduction to Linear Models | Intro to linear regression |
| | Interpreting linear models |
| | Testing individual predictors |
| | Model testing & comparison |
| | Linear model analysis |
| Analysing Experimental Studies | Categorical predictors and dummy coding |
| | Effect coding and manual post-hoc contrasts |
| | Assumptions and diagnostics |
| | Bootstrapping and confidence intervals |
| | Categorical predictors: Practice analysis |
| Interactions | Mean-centering and numeric/categorical interactions |
| | Numeric/numeric interactions |
| | Categorical/categorical interactions |
| | Manual contrast interactions and multiple comparisons |
| | Interactions: Practice analysis |
| Advanced Topics | Power analysis |
| | Binary logistic regression I |
| | Binary logistic regression II |
| | Logistic regression: Practice analysis |
| | Exam prep and course Q&A |
wooclap.com, enter code GECHXE
What does the interaction coefficient of a linear model mean?
If we know the contrast coding for two interacting categorical predictors, how do we work out the coding of the interaction term?
What’s the difference between a “simple slope” and a “simple effect”?
How can we calculate how many interaction terms a model will have when two categorical predictors interact?
An interaction is how we allow a model to estimate that the association between one predictor and the outcome is different, depending on the value of another predictor.
Post-travel anxiety ratings for people using two different kinds of transport:
Motorist:

Cyclist:

on two different road_types:
RuralRoad:

(all images from pixabay)
CityStreet:

# A tibble: 12 x 4
ppt_id anx transport road_type
<dbl> <dbl> <fct> <fct>
1 1 39 Cyclist RuralRoad
2 2 41 Cyclist CityStreet
3 3 38 Motorist RuralRoad
4 5 18 Motorist RuralRoad
5 7 25 Motorist RuralRoad
6 8 29 Motorist CityStreet
7 10 38 Cyclist CityStreet
8 12 11 Cyclist RuralRoad
9 13 33 Cyclist CityStreet
10 14 11 Motorist RuralRoad
11 15 40 Motorist RuralRoad
12 16 19 Motorist CityStreet
Does the difference in anxiety after travelling on different road types depend on the kind of transport?
Or, specifically: Is the difference in anxiety after travelling on rural roads vs. city streets different for motorists and cyclists?
Or, equivalently: Is the difference in anxiety between motorists and cyclists different after travelling on rural roads and on city streets?
wooclap.com, enter code GECHXE
The default contrast coding (treatment coding):
In other words: to get the coding of the interaction term, we multiply together the two predictors' coding values in each row.
| transport | road_type | transportCyclist | road_typeCityStreet | tC:rtCS |
|---|---|---|---|---|
| Motorist | RuralRoad | 0 | 0 | 0 * 0 = 0 |
| Motorist | CityStreet | 0 | 1 | 0 * 1 = 0 |
| Cyclist | RuralRoad | 1 | 0 | 1 * 0 = 0 |
| Cyclist | CityStreet | 1 | 1 | 1 * 1 = 1 |
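A minimal base-R sketch to check this multiplication rule, assuming R's default `options(contrasts)` (treatment coding) and factor levels ordered so that Motorist and RuralRoad are the reference levels. `model.matrix()` builds the same dummy columns that `lm()` uses internally; the data frame `cells` is a hypothetical helper with one row per combination of levels, not the course data.

```r
# One row per cell of the 2x2 design (hypothetical helper, not the course data).
# Setting the levels explicitly makes Motorist and RuralRoad the reference levels.
cells <- expand.grid(
  transport = factor(c("Motorist", "Cyclist"), levels = c("Motorist", "Cyclist")),
  road_type = factor(c("RuralRoad", "CityStreet"), levels = c("RuralRoad", "CityStreet"))
)

# Design matrix with R's default treatment (dummy) coding
X <- model.matrix(~ transport * road_type, data = cells)

# Multiply the two dummy columns together, row by row
manual_interaction <- X[, "transportCyclist"] * X[, "road_typeCityStreet"]

# Show each cell next to R's dummy columns (dropping the intercept column)
cbind(cells, X[, -1])

# TRUE if our hand-made product matches R's interaction column
all.equal(manual_interaction, X[, "transportCyclist:road_typeCityStreet"])
```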
The interaction model anx ~ transport * road_type will have four coefficients:
Intercept (but you’re pretty good at interpreting the intercept by now, so we’ll spend time on the harder stuff!)
transportCyclist
road_typeCityStreet
transportCyclist:road_typeCityStreet
Without looking ahead, try to figure out: What will each coefficient represent?
wooclap.com, enter code GECHXE
\[ \text{anx} = \beta_0 + (\beta_1 \cdot \text{transport}) + (\beta_2 \cdot \text{road_type}) + (\beta_3 \cdot \text{transport} \cdot \text{road_type}) + \epsilon \]
Call:
lm(formula = anx ~ transport * road_type, data = anx1)
Residuals:
Min 1Q Median 3Q Max
-23.1176 -7.0303 -0.1963 6.0811 22.5263
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.0303 1.6213 16.672 <2e-16 ***
transportCyclist 0.4434 2.2162 0.200 0.8417
road_typeCityStreet 4.8886 2.2300 2.192 0.0300 *
transportCyclist:road_typeCityStreet 7.7553 3.1316 2.476 0.0145 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 9.314 on 138 degrees of freedom
Multiple R-squared: 0.2413, Adjusted R-squared: 0.2248
F-statistic: 14.63 on 3 and 138 DF, p-value: 2.52e-08
Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.03 1.62 16.67 0.00
transportCyclist 0.44 2.22 0.20 0.84
road_typeCityStreet 4.89 2.23 2.19 0.03
transportCyclist:road_typeCityStreet 7.76 3.13 2.48 0.01
Intercept
When transport = 0 = Motorist and road_type = 0 = RuralRoad, the estimated average anxiety is 27.03 points.
transportCyclist
When road_type = 0 = RuralRoad, being a cyclist (rather than a motorist) is associated with an increase in anxiety of 0.44 points.
road_typeCityStreet
When transport = 0 = Motorist, being on city streets (rather than rural roads) is associated with an increase in anxiety of 4.89 points.
transportCyclist:road_typeCityStreet
Being a cyclist (rather than a motorist) changes the association of road_type with anx by 7.76 points. The association of road_type with anx when …
transport = 0 = Motorist is 4.89
transport = 1 = Cyclist is 4.89 + 7.76 = 12.65
Equivalently, being on city streets (rather than rural roads) changes the association of transport with anx by 7.76 points. The association of transport with anx when …
road_type = 0 = RuralRoad is 0.44
road_type = 1 = CityStreet is 0.44 + 7.76 = 8.2
A more formal version of “find the difference of differences”. (You’ll practice this more in labs!)
Model coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.03 1.62 16.67 0.00
transportCyclist 0.44 2.22 0.20 0.84
road_typeCityStreet 4.89 2.23 2.19 0.03
transportCyclist:road_typeCityStreet 7.76 3.13 2.48 0.01
Group means:
| | RuralRoad | CityStreet |
|---|---|---|
| Motorist | 27.03 | 31.92 |
| Cyclist | 27.47 | 40.12 |
How model coefficients and group means are related:
| Coefficient | In terms of group means | Value |
|---|---|---|
| (Intercept) | mean(Motorist, RuralRoad) | 27.03 |
| transportCyclist | mean(Cyclist, RuralRoad) – mean(Motorist, RuralRoad) | 27.47 – 27.03 = 0.44 |
| road_typeCityStreet | mean(Motorist, CityStreet) – mean(Motorist, RuralRoad) | 31.92 – 27.03 = 4.89 |
| tC:rtCS | [mean(Cyclist, CityStreet) – mean(Motorist, CityStreet)] – [mean(Cyclist, RuralRoad) – mean(Motorist, RuralRoad)] | (40.12 – 31.92) – (27.47 – 27.03) = 7.76 |
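The same arithmetic can be sketched in base R. This sketch uses only the rounded group means from the table, so it reproduces the coefficients to two decimal places; `means` and `b0` to `b3` are illustrative names, not part of any package.

```r
# Group means from the table (rows = transport, columns = road_type).
# matrix() fills column by column: RuralRoad first, then CityStreet.
means <- matrix(
  c(27.03, 27.47,   # RuralRoad: Motorist, Cyclist
    31.92, 40.12),  # CityStreet: Motorist, Cyclist
  nrow = 2,
  dimnames = list(transport = c("Motorist", "Cyclist"),
                  road_type = c("RuralRoad", "CityStreet"))
)

b0 <- means["Motorist", "RuralRoad"]        # (Intercept)
b1 <- means["Cyclist", "RuralRoad"] - b0    # transportCyclist
b2 <- means["Motorist", "CityStreet"] - b0  # road_typeCityStreet
# Interaction: difference between the cyclist-vs-motorist differences
b3 <- (means["Cyclist", "CityStreet"] - means["Motorist", "CityStreet"]) -
      (means["Cyclist", "RuralRoad"] - means["Motorist", "RuralRoad"])

coefs <- round(c(b0, b1, b2, b3), 2)
coefs  # 27.03, 0.44, 4.89, 7.76
```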
When we had continuous predictors, we used probe_interaction() from the interactions package.
Now that both predictors are categorical, we use cat_plot() (from the same package) instead.
If you like to think about interactions in terms of differences in vertical distances between two groups, then this way of visualising things might work for you.
cat_plot() can also link each group mean with lines, using the argument geom = "line".
If you like to think about interactions in terms of differences of slopes between two groups, then this way of visualising things might work for you.
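A minimal sketch of both plot styles, assuming the interactions package is installed (the code skips the plots otherwise). The data frame `toy` is hypothetical: it is not the course data, just two made-up observations per cell placed 1 point either side of the group means reported on these slides, so its coefficients match the model above. `pred` sets the x-axis variable and `modx` the grouping variable; `geom = "line"` is the argument mentioned above.

```r
# Hypothetical toy data (NOT the course data): two observations per cell,
# 1 point either side of the group means reported on these slides.
cells <- expand.grid(
  transport = factor(c("Motorist", "Cyclist"), levels = c("Motorist", "Cyclist")),
  road_type = factor(c("RuralRoad", "CityStreet"), levels = c("RuralRoad", "CityStreet"))
)
cell_means <- c(27.03, 27.47, 31.92, 40.12)  # same row order as expand.grid
toy <- rbind(transform(cells, anx = cell_means - 1),
             transform(cells, anx = cell_means + 1))

mdl <- lm(anx ~ transport * road_type, data = toy)

# cat_plot() comes from the interactions package; only run it if installed
if (requireNamespace("interactions", quietly = TRUE)) {
  # Default: points, for comparing vertical distances between groups
  p_points <- interactions::cat_plot(mdl, pred = transport, modx = road_type)
  # Points joined by lines, for comparing slopes between groups
  p_lines <- interactions::cat_plot(mdl, pred = transport, modx = road_type,
                                    geom = "line")
}
```

Print `p_points` or `p_lines` to draw the plots.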
Simple effects: The association of a categorical predictor with the outcome, at a specific value of another predictor.
Simple slopes: The association of a continuous predictor with the outcome, at a specific value of another predictor.
We need the model’s linear expression:
\[ \text{anx} = \beta_0 + (\beta_1 \cdot \text{transport}) + (\beta_2 \cdot \text{road_type}) + (\beta_3 \cdot \text{transport} \cdot \text{road_type}) + \epsilon \\ \]
And the model’s coefficient estimates:
(Intercept) transportCyclist
27.03 0.44
road_typeCityStreet transportCyclist:road_typeCityStreet
4.89 7.76
We substitute the coefficient estimates into the linear expression, to give:
\[ \text{anx} = 27.03 + (0.44 \cdot \text{transport}) + (4.89 \cdot \text{road_type}) + (7.76 \cdot \text{transport} \cdot \text{road_type}) + \epsilon \\ \]
Simple effect of transport for rural roads
Rural roads are represented by road_type = 0, so we substitute 0 for \(\text{road_type}\).
\[\begin{align} \text{anx} &= 27.03 + (0.44 \cdot \text{transport}) + (4.89 \cdot \text{road_type}) + (7.76 \cdot \text{transport} \cdot \text{road_type}) + \epsilon \\ \text{anx}_{\text{Rural}} &= 27.03 + (0.44 \cdot \text{transport}) + (4.89 \cdot 0) + (7.76 \cdot \text{transport} \cdot 0) + \epsilon \\ \text{anx}_{\text{Rural}} &= 27.03 + (0.44 \cdot \text{transport}) + \epsilon \\ \end{align}\]
This is the equation for the blue line in this plot:
When we ask for a “simple effect”, we ask for the slope of this line: the difference between groups at a specific level of another predictor. So here, the simple effect is 0.44.
Simple effect of transport for city streets
City streets are represented by road_type = 1, so we substitute 1 for \(\text{road_type}\) below.
\[\begin{align} \text{anx} &= 27.03 + (0.44 \cdot \text{transport}) + (4.89 \cdot \text{road_type}) + (7.76 \cdot \text{transport} \cdot \text{road_type}) + \epsilon \\ \text{anx}_{\text{City}} &= 27.03 + (0.44 \cdot \text{transport}) + (4.89 \cdot 1) + (7.76 \cdot \text{transport} \cdot 1) + \epsilon \\ \text{anx}_{\text{City}} &= 27.03 + 4.89 + (0.44 \cdot \text{transport}) + (7.76 \cdot \text{transport}) + \epsilon \\ \text{anx}_{\text{City}} &= 31.92 + ((0.44 + 7.76) \cdot \text{transport}) + \epsilon \\ \text{anx}_{\text{City}} &= 31.92 + (8.2 \cdot \text{transport}) + \epsilon \\ \end{align}\]
This is the equation for the orange line in this plot:
If we ask for the simple effect of transport for city streets, it’s the slope of this line: 8.2.
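The substitution steps can be sketched as a small R function that evaluates the model's linear expression (without the error term) using the rounded coefficient estimates. The names `anx_hat` and `simple_effect_transport` are hypothetical helpers for illustration; a simple effect of transport is the difference between the predictions for cyclists (transport = 1) and motorists (transport = 0) at a fixed road_type.

```r
# Rounded coefficient estimates from the model output
b <- c(b0 = 27.03, b1 = 0.44, b2 = 4.89, b3 = 7.76)

# Linear expression without the error term.
# transport: 0 = Motorist, 1 = Cyclist; road_type: 0 = RuralRoad, 1 = CityStreet
anx_hat <- function(transport, road_type) {
  b[["b0"]] + b[["b1"]] * transport + b[["b2"]] * road_type +
    b[["b3"]] * transport * road_type
}

# Simple effect of transport = slope of the line at a fixed road_type
simple_effect_transport <- function(road_type) {
  anx_hat(1, road_type) - anx_hat(0, road_type)
}

simple_effect_transport(0)  # rural roads: 0.44
simple_effect_transport(1)  # city streets: 0.44 + 7.76 = 8.2
```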
What about the simple effects of road_type?
Also doable! Those would be the symmetrical simple effects: also true, just a different angle of looking at the same data.
Those simple effects would match the blue and orange slopes in this plot:
I’ll leave calculating those lines to you—it’s good practice :)
Post-travel anxiety ratings for people using two different kinds of transport:
Motorist:

Cyclist:

now on three different road_types:
RuralRoad:

CityStreet:

DualCarr(iageway):

The data is exactly the same as before, plus the new category of DualCarr.
Factor levels:
The default contrast coding (treatment coding):
Our model of this data will have two interaction terms:
transportCyclist:road_typeCityStreet and
transportCyclist:road_typeDualCarr
The number of interaction terms is given by
\[ (r - 1) \times (c - 1) \]
To see why we are talking about “rows” and “columns”, imagine the variables arranged like this:
| | RuralRoad | CityStreet | DualCarr |
|---|---|---|---|
| Motorist | |||
| Cyclist | |||
So in our 2x3 data, where \(r=2\) and \(c=3\), we have
\[\begin{align} ~& (r - 1) \times (c - 1)\\ =~& (2 - 1) \times (3 - 1)\\ =~& 1 \times 2\\ =~& 2 \end{align}\]
interaction terms.
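A short sketch of the same count in R, using `nlevels()` to get \(r\) and \(c\) from each factor. `n_interaction_terms` is a hypothetical helper, and the count assumes the default treatment coding, where each factor contributes one dummy per non-reference level.

```r
# Hypothetical helper: number of interaction terms when two categorical
# predictors interact, under treatment (dummy) coding: (r - 1) * (c - 1)
n_interaction_terms <- function(f1, f2) {
  (nlevels(f1) - 1) * (nlevels(f2) - 1)
}

transport <- factor(c("Motorist", "Cyclist"), levels = c("Motorist", "Cyclist"))
road_type_2 <- factor(c("RuralRoad", "CityStreet"),
                      levels = c("RuralRoad", "CityStreet"))
road_type_3 <- factor(c("RuralRoad", "CityStreet", "DualCarr"),
                      levels = c("RuralRoad", "CityStreet", "DualCarr"))

n_interaction_terms(transport, road_type_2)  # 2x2 design: 1
n_interaction_terms(transport, road_type_3)  # 2x3 design: 2
```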
Let’s start with transportCyclist:road_typeCityStreet:
| transport | road_type | transportCyclist | road_typeCityStreet | tC:rtCS |
|---|---|---|---|---|
| Motorist | RuralRoad | 0 | 0 | ? |
| Motorist | CityStreet | 0 | 1 | ? |
| Motorist | DualCarr | 0 | 0 | ? |
| Cyclist | RuralRoad | 1 | 0 | ? |
| Cyclist | CityStreet | 1 | 1 | ? |
| Cyclist | DualCarr | 1 | 0 | ? |
wooclap.com, enter code GECHXE
Continue on with transportCyclist:road_typeDualCarr:
| transport | road_type | transportCyclist | road_typeDualCarr | tC:rtDC |
|---|---|---|---|---|
| Motorist | RuralRoad | 0 | 0 | ? |
| Motorist | CityStreet | 0 | 0 | ? |
| Motorist | DualCarr | 0 | 1 | ? |
| Cyclist | RuralRoad | 1 | 0 | ? |
| Cyclist | CityStreet | 1 | 0 | ? |
| Cyclist | DualCarr | 1 | 1 | ? |
wooclap.com, enter code GECHXE
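Once you have filled in the tables, you could check your answers with a sketch like this one, assuming R's default treatment coding and the factor levels shown above. `cells` is a hypothetical helper with one row per combination of levels, not the course data.

```r
# One row per cell of the 2x3 design (hypothetical helper, not the course data)
cells <- expand.grid(
  transport = factor(c("Motorist", "Cyclist"), levels = c("Motorist", "Cyclist")),
  road_type = factor(c("RuralRoad", "CityStreet", "DualCarr"),
                     levels = c("RuralRoad", "CityStreet", "DualCarr"))
)

# Design matrix with R's default treatment coding
X <- model.matrix(~ transport * road_type, data = cells)

# Each interaction column is transportCyclist times one road_type dummy
tC_rtCS <- X[, "transportCyclist"] * X[, "road_typeCityStreet"]
tC_rtDC <- X[, "transportCyclist"] * X[, "road_typeDualCarr"]

# Show each cell next to the products computed by hand
cbind(cells, tC_rtCS, tC_rtDC)

# R creates exactly these two interaction columns
grep(":", colnames(X), value = TRUE)
```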
\[ \begin{align} \text{anx} ~=~& \beta_0 + (\beta_1 \cdot \text{transport}) + (\beta_2 \cdot \text{road_type}_{\text{CS}}) + (\beta_3 \cdot \text{road_type}_{\text{DC}}) + \\ & (\beta_4 \cdot \text{transport} \cdot \text{road_type}_{\text{CS}}) + (\beta_5 \cdot \text{transport} \cdot \text{road_type}_{\text{DC}}) + \epsilon \end{align} \]
Call:
lm(formula = anx ~ transport * road_type, data = anx2)
Residuals:
Min 1Q Median 3Q Max
-23.8235 -7.0303 0.3514 6.1765 22.5263
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.0303 1.6584 16.299 < 2e-16 ***
transportCyclist 0.4434 2.2669 0.196 0.84512
road_typeCityStreet 4.8886 2.2811 2.143 0.03325 *
road_typeDualCarr 6.0197 2.2404 2.687 0.00779 **
transportCyclist:road_typeCityStreet 7.7553 3.2033 2.421 0.01633 *
transportCyclist:road_typeDualCarr 14.3301 3.1744 4.514 1.06e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 9.527 on 210 degrees of freedom
Multiple R-squared: 0.3692, Adjusted R-squared: 0.3542
F-statistic: 24.58 on 5 and 210 DF, p-value: < 2.2e-16
Estimate
(Intercept) 27.03
transportCyclist 0.44
road_typeCityStreet 4.89
road_typeDualCarr 6.02
transportCyclist:road_typeCityStreet 7.76
transportCyclist:road_typeDualCarr 14.33
Intercept (same as before)
At the reference levels of transport and road_type (Motorist and RuralRoad, where all predictors = 0), the estimated average anx is 27.03 points.
transportCyclist (same as before)
At the reference level of road_type (RuralRoad), being a cyclist is associated with an increase in anxiety of 0.44 points.
road_typeCityStreet (same as before)
At the reference level of transport (Motorist), being on city streets is associated with an increase in anxiety of 4.89 points.
road_typeDualCarr (new!)
At the reference level of transport (Motorist), being on dual carriageways is associated with an increase in anxiety of 6.02 points.
transportCyclist:road_typeCityStreet
Being a cyclist (rather than a motorist) changes the association of road_typeCityStreet with anx by 7.76 points. The association of road_typeCityStreet with anx when …
transport = 0 = Motorist is 4.89
transport = 1 = Cyclist is 4.89 + 7.76 = 12.65
Being on city streets (rather than rural roads) changes the association of transport with anx by 7.76 points. The association of transport with anx when …
road_type = RuralRoad (both road_type dummies = 0) is 0.44
road_type = CityStreet (road_typeCityStreet = 1) is 0.44 + 7.76 = 8.2
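transportCyclist:road_typeDualCarr
Being a cyclist (rather than a motorist) changes the association of road_typeDualCarr with anx by 14.33 points. The association of road_typeDualCarr with anx when …
transport = 0 = Motorist is 6.02
transport = 1 = Cyclist is 6.02 + 14.33 = 20.35
Being on dual carriageways (rather than rural roads) changes the association of transport with anx by 14.33 points. The association of transport with anx when …
road_type = RuralRoad (both road_type dummies = 0) is 0.44
road_type = DualCarr (road_typeDualCarr = 1) is 0.44 + 14.33 = 14.77
cat_plot()
wooclap.com, enter code GECHXE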
Whenever one predictor’s association with the outcome depends on another predictor, then we’re dealing with interactions.
So far, we’ve been focusing on interactions between two predictors, called “two-way interactions”.
But in principle, we could throw another variable into the mix:
Maybe countries with different amounts of bike infrastructure differ in the anxiety that cyclists vs. motorists feel on different road types.


This would be a three-way interaction between country, transport, and road type.
My advice to you: Try not to design studies that involve three-way interactions.
What does the interaction coefficient of a linear model mean?
If we know the contrast coding for two interacting categorical predictors, how do we work out the coding of the interaction term?
What’s the difference between a “simple slope” and a “simple effect”?
How can we calculate how many interaction terms a model will have when two categorical predictors interact?
\[ (r - 1) \times (c - 1) \]
Tasks:
Attend your lab and work together on the exercises
Support:
Help each other on the Piazza forum
Complete the weekly quiz

Attend office hours (see Learn page for details)
One way:
\[ 12.65 - 4.89 = 7.76 \]
Another way:
\[ 8.2 - 0.44 = 7.76 \]
No matter which angle we look at the interaction data from, the difference between differences—which appears in the model as the coefficient of the interaction term—is the same.
As we saw last week: interactions are symmetrical.
(In practice, people usually only report the angle that makes the most sense for their research question.)
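A minimal R sketch of this symmetry, using the rounded 2x2 group means from these slides. Both angles rearrange to the same four-term sum (CyclistCity − CyclistRural − MotoristCity + MotoristRural), which is why they must agree; the variable names are illustrative only.

```r
# Rounded group means from the 2x2 table
mot_rural <- 27.03; mot_city <- 31.92
cyc_rural <- 27.47; cyc_city <- 40.12

# Angle 1: how much larger is the city-vs-rural difference for cyclists?
angle_1 <- (cyc_city - cyc_rural) - (mot_city - mot_rural)  # 12.65 - 4.89
# Angle 2: how much larger is the cyclist-vs-motorist difference on city streets?
angle_2 <- (cyc_city - mot_city) - (cyc_rural - mot_rural)  # 8.2 - 0.44

c(angle_1, angle_2)           # both 7.76
all.equal(angle_1, angle_2)   # TRUE
```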
(Motorist and RuralRoad are the reference levels)
Group means:
| | RuralRoad | CityStreet |
|---|---|---|
| Motorist | 27.03 | 31.92 |
| Cyclist | 27.47 | 40.12 |
For motorists, the difference between city and rural:
\[ \begin{align} & \text{CityStreet}~ - \text{RuralRoad} \\ =~ & 31.92 - 27.03 \\ =~ & 4.89 \\ \end{align} \]
For cyclists, the difference between city and rural:
\[ \begin{align} & \text{CityStreet}~ - \text{RuralRoad} \\ =~ & 40.12 - 27.47 \\ =~ & 12.65 \\ \end{align} \]
The difference between cyclists’ difference and motorists’ difference:
\[ \begin{align} & \text{CyclistDiff}~ - \text{MotoristDiff} \\ =~ & 12.65 - 4.89 \\ =~ & 7.76 \\ \end{align} \]
For rural roads, the difference between cyclists and motorists:
\[ \begin{align} & \text{Cyclist}~ - \text{Motorist} \\ =~ & 27.47 - 27.03 \\ =~ & 0.44 \\ \end{align} \]
For city streets, the difference between cyclists and motorists:
\[ \begin{align} & \text{Cyclist}~ - \text{Motorist} \\ =~ & 40.12 - 31.92 \\ =~ & 8.2 \\ \end{align} \]
The difference between city streets’ difference and rural roads’ difference:
\[ \begin{align} & \text{CityStreetsDiff}~ - \text{RuralRoadsDiff} \\ =~ & 8.2 - 0.44 \\ =~ & 7.76 \\ \end{align} \]
Model coefficients:
Estimate
(Intercept) 27.03
transportCyclist 0.44
road_typeCityStreet 4.89
road_typeDualCarr 6.02
transportCyclist:road_typeCityStreet 7.76
transportCyclist:road_typeDualCarr 14.33
Group means:
| | RuralRoad | CityStreet | DualCarr |
|---|---|---|---|
| Motorist | 27.03 | 31.92 | 33.05 |
| Cyclist | 27.47 | 40.12 | 47.82 |
How model coefficients and group means are related:
| Coefficient | In terms of group means | Value |
|---|---|---|
| (Intercept) | mean(Motorist, RuralRoad) | 27.03 |
| transportCyclist | mean(Cyclist, RuralRoad) – mean(Motorist, RuralRoad) | 27.47 – 27.03 = 0.44 |
| road_typeCityStreet | mean(Motorist, CityStreet) – mean(Motorist, RuralRoad) | 31.92 – 27.03 = 4.89 |
| road_typeDualCarr | mean(Motorist, DualCarr) – mean(Motorist, RuralRoad) | 33.05 – 27.03 = 6.02 |
| tC:rtCS | [mean(Cyclist, CityStreet) – mean(Motorist, CityStreet)] – [mean(Cyclist, RuralRoad) – mean(Motorist, RuralRoad)] | (40.12 – 31.92) – (27.47 – 27.03) = 7.76 |
| tC:rtDC | [mean(Cyclist, DualCarr) – mean(Motorist, DualCarr)] – [mean(Cyclist, RuralRoad) – mean(Motorist, RuralRoad)] | (47.82 – 33.05) – (27.47 – 27.03) = 14.33 |
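To confirm that `lm()` recovers exactly these differences of group means, here is a sketch using a hypothetical toy data set: it is not the course data, just two made-up observations per cell placed 1 point either side of the 2x3 group means in the table, so each cell mean equals the reported group mean.

```r
# Hypothetical toy data (NOT the course data): two observations per cell,
# 1 point either side of the 2x3 group means shown in the table.
cells <- expand.grid(
  transport = factor(c("Motorist", "Cyclist"), levels = c("Motorist", "Cyclist")),
  road_type = factor(c("RuralRoad", "CityStreet", "DualCarr"),
                     levels = c("RuralRoad", "CityStreet", "DualCarr"))
)
# Group means in expand.grid row order (transport varies fastest)
cell_means <- c(27.03, 27.47,   # RuralRoad: Motorist, Cyclist
                31.92, 40.12,   # CityStreet: Motorist, Cyclist
                33.05, 47.82)   # DualCarr: Motorist, Cyclist
toy <- rbind(transform(cells, anx = cell_means - 1),
             transform(cells, anx = cell_means + 1))

mdl <- lm(anx ~ transport * road_type, data = toy)
round(coef(mdl), 2)  # 27.03, 0.44, 4.89, 6.02, 7.76, 14.33
```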