Data Analysis for Psychology in R 2
Department of Psychology
University of Edinburgh
2025–2026
| Block | Lecture |
|---|---|
| Introduction to Linear Models | Intro to linear regression |
| | Interpreting linear models |
| | Testing individual predictors |
| | Model testing & comparison |
| | Linear model analysis |
| Analysing Experimental Studies | Categorical predictors and dummy coding |
| | Effect coding and manual post-hoc contrasts |
| | Assumptions and diagnostics |
| | Bootstrapping and confidence intervals |
| | Categorical predictors: Practice analysis |
| Interactions | Mean-centering and numeric/categorical interactions |
| | Numeric/numeric interactions |
| | Categorical/categorical interactions |
| | Manual contrast interactions and multiple comparisons |
| | Interactions: Practice analysis |
| Advanced Topics | Power analysis |
| | Binary logistic regression I |
| | Binary logistic regression II |
| | Logistic regression: Practice analysis |
| | Exam prep and course Q&A |
How do we specify an interaction between two numeric predictors?
What does it mean when we say “interactions are symmetrical”?
What is a simple slope?
When we have an interaction between two numeric predictors, what values do we typically use to compute simple slopes?
An interaction is how we allow a model to estimate that the association between one predictor and the outcome is different, depending on the value of another predictor.
# A tibble: 12 x 4
id wellbeing outdoor_time hrs_sun
<chr> <dbl> <dbl> <dbl>
1 ID101 80.2 2.2 3
2 ID102 124. 4.5 5
3 ID103 80.6 2.4 3
4 ID104 84.4 4.6 4
5 ID105 83.8 4.8 3
6 ID106 118. 4.4 4
7 ID107 96.4 4.3 5
8 ID108 96.5 5 5
9 ID109 88.2 2.4 3
10 ID110 144. 4.6 6
11 ID111 120. 3 5
12 ID112 130. 2.3 4
How does the association between outdoor time and people’s wellbeing change,
depending on the number of hours of sunlight in a day?
Different slopes in different panels \(\rightarrow\) The association between outdoor time and wellbeing appears different for different amounts of sunlight.
In mathematical notation:
\[ \text{wellbeing} = \beta_0 + (\beta_1 \cdot \text{outdoor_time}) + (\beta_2 \cdot \text{hrs_sun}) + (\beta_3 \cdot \text{outdoor_time} \cdot \text{hrs_sun}) + \epsilon \]
In R, where both of these options have the same result:
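A minimal sketch of the two equivalent specifications. The data frame below is simulated purely for illustration (it is not the course data, and the numbers used to generate it are assumptions), so the estimates will not match the slides.

```r
# Simulated stand-in for the course data (an assumption for illustration only)
set.seed(1)
n <- 100
outdoors <- data.frame(
  outdoor_time = runif(n, 0, 6),
  hrs_sun      = sample(1:7, n, replace = TRUE)
)
outdoors$wellbeing <- 88 - 11 * outdoors$outdoor_time + 3 * outdoors$hrs_sun +
  3 * outdoors$outdoor_time * outdoors$hrs_sun + rnorm(n, sd = 17)

# Option 1: spell out both predictors and their interaction
m_long <- lm(wellbeing ~ outdoor_time + hrs_sun + outdoor_time:hrs_sun,
             data = outdoors)

# Option 2: the * shorthand expands to the same three terms
m_short <- lm(wellbeing ~ outdoor_time * hrs_sun, data = outdoors)

# Both fits give the same coefficients
all.equal(coef(m_long), coef(m_short))
```

The `*` form is shorter to type; the long form makes it explicit that the individual predictors are in the model too.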
Does an interaction model need to contain each of the interacting predictors on their own, too? Yes. Otherwise we’d only see how one predictor’s effect depends on another predictor, with no idea how they behave individually.
- Better: Y ~ A + B + A:B
- Worse: Y ~ A:B

Including each interacting predictor as well as the interaction (the “Better” example above) is an example of the “principle of marginality” … yet another terrible name for a stats concept :(
For every predictor that goes into your model, know what zero represents.
- What does outdoor_time = 0 represent?
- What does hrs_sun = 0 represent?

A model’s intercept is the estimated mean outcome when every predictor in the model is at zero.
- What does the intercept represent in the model wellbeing ~ outdoor_time * hrs_sun?
Each predictor’s slope represents the association of the given predictor with the outcome when every other predictor is at zero.
If predictors interact, then slope estimates are conditional on the other predictor being zero. (In other words, each predictor’s slope estimate is only true when the other interacting predictor is zero.)
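- What does the slope of outdoor_time represent?
- What does the slope of hrs_sun represent?

The interaction term tells us how much the association between one predictor and the outcome changes when the other predictor increases by one unit (for example, from 0 to 1).
- What does the coefficient of outdoor_time:hrs_sun represent?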
Call:
lm(formula = wellbeing ~ outdoor_time * hrs_sun, data = outdoors)
Residuals:
Min 1Q Median 3Q Max
-43.008 -9.710 -1.068 8.674 48.494
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 87.920 16.376 5.369 5.51e-07 ***
outdoor_time -10.944 4.538 -2.412 0.01779 *
hrs_sun 3.154 4.311 0.732 0.46614
outdoor_time:hrs_sun 3.255 1.193 2.728 0.00758 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 17.55 on 96 degrees of freedom
Multiple R-squared: 0.5404, Adjusted R-squared: 0.5261
F-statistic: 37.63 on 3 and 96 DF, p-value: 3.631e-16
Intercept
- When outdoor_time = 0 hours outdoors and hrs_sun = 0 sunlight hours, the estimated average wellbeing is 87.9.

outdoor_time
- When hrs_sun = 0 sunlight hours, spending an additional hour outdoors is associated with a decrease in wellbeing of 10.9 points.

hrs_sun
- When outdoor_time = 0 hours spent outdoors, the sun shining for an extra hour is associated with an increase in wellbeing of 3.2 points.
outdoor_time:hrs_sun
Increasing sunlight hours by 1 changes the association of outdoor_time with wellbeing by 3.3.
So, the association of outdoor_time with wellbeing when …
- hrs_sun = 0 is –10.9
- hrs_sun = 1 is –10.9 + 3.3 = –7.6
- hrs_sun = 2 is –10.9 + 3.3 + 3.3 = –4.3

Now the x axis shows hours of sunlight, and each panel shows a group of similar outdoor_times.
This plot shows that the association between sunlight hours and wellbeing is different, when different amounts of time are spent outdoors. The interacting variables are flipped, but the interaction is still there.
Interpretation 1:
Increasing hrs_sun by 1 means that the association between outdoor_time and wellbeing changes by about 3.
Interpretation 2 flips the two interacting variables and is still true:
Increasing outdoor_time by 1 means that the association between hrs_sun and wellbeing changes by about 3.
A common example of symmetry: When a shape looks the same, even when it is mirrored.
Image from Wikimedia Commons
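If we have an interaction between predictors A and B, then both “mirror images” are always true:
- The association between A and the outcome depends on the value of B.
- The association between B and the outcome depends on the value of A.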
So we say that interactions are symmetrical.
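One way to see why both readings hold is to group the terms of the same model equation in two different ways. This is only a rearrangement of the equation given earlier, with no new assumptions:

\[ \begin{align} \text{wellbeing} &= (\beta_0 + \beta_2 \cdot \text{hrs_sun}) + (\beta_1 + \beta_3 \cdot \text{hrs_sun}) \cdot \text{outdoor_time} + \epsilon \\ \text{wellbeing} &= (\beta_0 + \beta_1 \cdot \text{outdoor_time}) + (\beta_2 + \beta_3 \cdot \text{outdoor_time}) \cdot \text{hrs_sun} + \epsilon \end{align} \]

In the first line, the slope of outdoor_time is \(\beta_1 + \beta_3 \cdot \text{hrs_sun}\); in the second, the slope of hrs_sun is \(\beta_2 + \beta_3 \cdot \text{outdoor_time}\). The same \(\beta_3\) does the adjusting in both.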
In practice, usually we choose the interpretation that’s narratively simpler or makes more intuitive sense (like I secretly did last week!)
outdoor_time:hrs_sun
Increasing sunlight hours by 1 changes the association of outdoor_time with wellbeing by 3.3.
So, the association of outdoor_time with wellbeing when …
- hrs_sun = 0 is –10.9
- hrs_sun = 1 is –10.9 + 3.3 = –7.6
- hrs_sun = 2 is –10.9 + 3.3 + 3.3 = –4.3

Equivalently: Increasing outdoor time by 1 changes the association of hrs_sun with wellbeing by 3.3.
So, the association of hrs_sun with wellbeing when …
- outdoor_time = 0 is 3.2
- outdoor_time = 1 is 3.2 + 3.3 = 6.5
- outdoor_time = 2 is 3.2 + 3.3 + 3.3 = 9.8

One of the best ways to understand an interaction is to figure out what the model thinks the values of these different slopes are.
Specifically, we want to know the slopes of the associations that the model estimates between outdoor_time and wellbeing for particular values of hrs_sun. We call these simple slopes.
A simple slope is the slope of the association between one predictor and the outcome, at a specific value of another predictor.
I’ve been showing you simple slopes when interpreting interaction coefficients. But now let’s calculate them formally!
What we need to know:
The model’s linear expression (where w stands for wellbeing):
\[ \text{w} = \beta_0 + (\beta_1 \cdot \text{outdoor_time}) + (\beta_2 \cdot \text{hrs_sun}) + (\beta_3 \cdot \text{outdoor_time} \cdot \text{hrs_sun}) + \epsilon \]
The model’s coefficients:
| parameter | beta | estimate |
|---|---|---|
| (Intercept) | beta0 | 87.92 |
| outdoor_time | beta1 | -10.94 |
| hrs_sun | beta2 | 3.15 |
| outdoor_time:hrs_sun | beta3 | 3.25 |
Step 1: Substitute the estimated coefficients into the linear expression.
\[ \text{w} = 87.92 + (-10.94 \cdot \text{outdoor_time}) + (3.15 \cdot \text{hrs_sun}) + (3.25 \cdot \text{outdoor_time} \cdot \text{hrs_sun}) + \epsilon \]
Step 2: Decide what specific value of hrs_sun we want to find the slope over outdoor_time for. Substitute that value in for hrs_sun.
Let’s start with hrs_sun = 0.
\[ \text{w}_{\text{hrs_sun} = 0} = 87.92 + (-10.94 \cdot \text{outdoor_time}) + (3.15 \cdot 0) + (3.25 \cdot \text{outdoor_time} \cdot 0) + \epsilon \]
Step 3: Simplify this expression by working out the multiplication and addition step by step.
\[ \begin{align} \text{w}_{\text{hrs_sun} = 0} &= 87.92 + (-10.94 \cdot \text{outdoor_time}) + (3.15 \cdot 0) + (3.25 \cdot \text{outdoor_time} \cdot 0) + \epsilon \\ \text{w}_{\text{hrs_sun} = 0} &= 87.92 + (-10.94 \cdot \text{outdoor_time}) + \epsilon \\ \end{align} \]
So the line that associates outdoor_time with wellbeing, when hrs_sun = 0, has an intercept of 87.92 and a slope of –10.94. –10.94 is the simple slope.
Step 1: Substitute the estimated coefficients into the linear expression.
\[ \text{w} = 87.92 + (-10.94 \cdot \text{outdoor_time}) + (3.15 \cdot \text{hrs_sun}) + (3.25 \cdot \text{outdoor_time} \cdot \text{hrs_sun}) + \epsilon \]
Step 2: Decide what specific value of hrs_sun we want to find the slope over outdoor_time for. Substitute that value in for hrs_sun.
Let’s do the mean value for hrs_sun, which is 3.8.
\[ \text{w}_{\text{hrs_sun} = 3.8} = 87.92 + (-10.94 \cdot \text{outdoor_time}) + (3.15 \cdot 3.8) + (3.25 \cdot \text{outdoor_time} \cdot 3.8) + \epsilon \]
Step 3: Simplify this expression by working out the multiplication and addition step by step.
Breakdown of steps on next slide.
The starting expression that we want to simplify:
\[ \text{w}_{\text{hrs_sun} = 3.8} = 87.92 + (-10.94 \cdot \text{outdoor_time}) + (3.15 \cdot 3.8) + (3.25 \cdot \text{outdoor_time} \cdot 3.8) + \epsilon \]
Multiply the 3.8s together with the numbers in the same brackets.
\[ \text{w}_{\text{hrs_sun} = 3.8} = 87.92 + (-10.94 \cdot \text{outdoor_time}) + 11.97 + (12.35 \cdot \text{outdoor_time}) + \epsilon \]
Add together the freestanding numbers.
\[ \text{w}_{\text{hrs_sun} = 3.8} = 99.89 + (-10.94 \cdot \text{outdoor_time}) + (12.35 \cdot \text{outdoor_time}) + \epsilon \]
Combine the numbers that are multiplied with \(\text{outdoor_time}\). (In math terms, we’re factoring out the variable \(\text{outdoor_time}\) from both terms that contain it.)
\[ \text{w}_{\text{hrs_sun} = 3.8} = 99.89 + ((-10.94 + 12.35) \cdot \text{outdoor_time}) + \epsilon \]
Add up those combined numbers, and we’re done!
\[ \text{w}_{\text{hrs_sun} = 3.8} = 99.89 + (1.41 \cdot \text{outdoor_time}) + \epsilon \]
So the line that associates outdoor_time with wellbeing, when hrs_sun is at its mean of 3.8, has an intercept of about 99.89 and a slope of about 1.41. 1.41 is the simple slope.
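The three steps above boil down to two small formulas: at a chosen value of hrs_sun, the intercept is \(\beta_0 + \beta_2 \cdot \text{hrs_sun}\) and the simple slope is \(\beta_1 + \beta_3 \cdot \text{hrs_sun}\). Here is a short sketch that applies them to the rounded estimates from the coefficient table; `simple_line()` is a hypothetical helper written for this example, not a function from any package.

```r
# Coefficient estimates copied from the table above
b0 <- 87.92   # (Intercept)
b1 <- -10.94  # outdoor_time
b2 <- 3.15    # hrs_sun
b3 <- 3.25    # outdoor_time:hrs_sun

# Intercept and slope of the outdoor_time line at a chosen value of hrs_sun.
# simple_line() is a hypothetical helper, not part of any package.
simple_line <- function(hrs_sun) {
  c(intercept = b0 + b2 * hrs_sun,
    slope     = b1 + b3 * hrs_sun)
}

simple_line(0)    # intercept 87.92, slope -10.94
simple_line(3.8)  # intercept 99.89, slope 1.41
```

Any other value of hrs_sun can be passed in the same way, for example one standard deviation above or below the mean.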
probe_interaction(), from the interactions package, can compute simple slopes and test whether they’re significantly different from zero.
JOHNSON-NEYMAN INTERVAL
When hrs_sun is OUTSIDE the interval [1.79, 4.33], the slope of
outdoor_time is p < .05.
Note: The range of observed values of hrs_sun is [1.00, 7.00]
SIMPLE SLOPES ANALYSIS
Slope of outdoor_time when hrs_sun = 2.536687 (- 1 SD):
Est. S.E. t val. p
------- ------ -------- ------
-2.69 1.88 -1.43 0.16
Slope of outdoor_time when hrs_sun = 3.800000 (Mean):
Est. S.E. t val. p
------ ------ -------- ------
1.42 1.36 1.04 0.30
Slope of outdoor_time when hrs_sun = 5.063313 (+ 1 SD):
Est. S.E. t val. p
------ ------ -------- ------
5.54 2.18 2.54 0.01
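The estimates, standard errors, t values and p values in a simple slopes table like this one can also be worked out by hand from the model’s coefficients and their covariance matrix. The sketch below uses base R only and simulated data (an assumption for illustration; it is not the course data, so the numbers will differ from the output above).

```r
# Simulated stand-in for the course data (an assumption for illustration only)
set.seed(1)
n <- 100
outdoors <- data.frame(
  outdoor_time = runif(n, 0, 6),
  hrs_sun      = sample(1:7, n, replace = TRUE)
)
outdoors$wellbeing <- 88 - 11 * outdoors$outdoor_time + 3 * outdoors$hrs_sun +
  3 * outdoors$outdoor_time * outdoors$hrs_sun + rnorm(n, sd = 17)

m1 <- lm(wellbeing ~ outdoor_time * hrs_sun, data = outdoors)
b1 <- coef(m1)[["outdoor_time"]]
b3 <- coef(m1)[["outdoor_time:hrs_sun"]]
V  <- vcov(m1)  # covariance matrix of the coefficient estimates

# Values of hrs_sun to probe: mean - 1 SD, mean, mean + 1 SD
z <- mean(outdoors$hrs_sun) + c(-1, 0, 1) * sd(outdoors$hrs_sun)

# Simple slope of outdoor_time at each z, and its standard error
est <- b1 + b3 * z
se  <- sqrt(V["outdoor_time", "outdoor_time"] +
            z^2 * V["outdoor_time:hrs_sun", "outdoor_time:hrs_sun"] +
            2 * z * V["outdoor_time", "outdoor_time:hrs_sun"])

# t test of each simple slope against zero
t_val <- est / se
p_val <- 2 * pt(abs(t_val), df = df.residual(m1), lower.tail = FALSE)

simple_slopes <- data.frame(hrs_sun = z, est = est, se = se,
                            t = t_val, p = p_val)
simple_slopes
```

With the interactions package installed, `sim_slopes(m1, pred = outdoor_time, modx = hrs_sun)` (the function that probe_interaction() wraps) should report matching estimates for the same model.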
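The Johnson-Neyman (JN) interval tells us the values of one predictor at which the simple slope of the other predictor is significantly different from zero. (This is also called a region of significance analysis.) Here, the slope of outdoor_time is significantly different from zero when hrs_sun is below 1.79 or above 4.33.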
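To see where bounds like these come from, here is a sketch that finds them by hand: the boundary values are the points where the t statistic of the simple slope is exactly at the critical value, which turns out to be a quadratic equation. It uses base R and simulated data (an assumption for illustration, not the course data), and it assumes the interaction itself is significant so that two real roots exist.

```r
# Simulated stand-in for the course data (an assumption for illustration only)
set.seed(1)
n <- 100
outdoors <- data.frame(
  outdoor_time = runif(n, 0, 6),
  hrs_sun      = sample(1:7, n, replace = TRUE)
)
outdoors$wellbeing <- 88 - 11 * outdoors$outdoor_time + 3 * outdoors$hrs_sun +
  3 * outdoors$outdoor_time * outdoors$hrs_sun + rnorm(n, sd = 17)

m1  <- lm(wellbeing ~ outdoor_time * hrs_sun, data = outdoors)
b1  <- coef(m1)[["outdoor_time"]]
b3  <- coef(m1)[["outdoor_time:hrs_sun"]]
V   <- vcov(m1)
V11 <- V["outdoor_time", "outdoor_time"]
V33 <- V["outdoor_time:hrs_sun", "outdoor_time:hrs_sun"]
V13 <- V["outdoor_time", "outdoor_time:hrs_sun"]

# Critical t value for a two-sided test at alpha = .05
tcrit <- qt(0.975, df = df.residual(m1))

# The slope b1 + b3 * z sits exactly on the significance boundary when
# (b1 + b3 * z)^2 = tcrit^2 * (V11 + 2 * z * V13 + z^2 * V33),
# which is a quadratic in z: qa * z^2 + qb * z + qc = 0
qa <- b3^2 - tcrit^2 * V33
qb <- 2 * (b1 * b3 - tcrit^2 * V13)
qc <- b1^2 - tcrit^2 * V11

# Two real roots exist when the interaction is significant (qa > 0)
jn_bounds <- sort((-qb + c(-1, 1) * sqrt(qb^2 - 4 * qa * qc)) / (2 * qa))
jn_bounds
```

Between the two bounds the slope of outdoor_time is not significantly different from zero; outside them it is.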
To plot the JN interval, one option is johnson_neyman() from the interactions package, which draws the plot by default.
To mean-centre a variable means to transform it so that the mean of the centered version is zero.
Specifically, this involves subtracting the mean of a variable from every observation of that variable.
# A tibble: 6 x 6
id wellbeing outdoor_time hrs_sun outdoor_time_c hrs_sun_c
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ID101 80.2 2.2 3 -1.27 -0.8
2 ID102 124. 4.5 5 1.03 1.2
3 ID103 80.6 2.4 3 -1.07 -0.8
4 ID104 84.4 4.6 4 1.13 0.200
5 ID105 83.8 4.8 3 1.33 -0.8
6 ID106 118. 4.4 4 0.931 0.200
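A minimal sketch of mean-centering in base R. It uses only the six rows printed above as toy data, so the means (and therefore the centered values) differ from the ones in the table, which were computed from the full dataset.

```r
# Toy data: the six rows shown above (for illustration only)
outdoors <- data.frame(
  id           = c("ID101", "ID102", "ID103", "ID104", "ID105", "ID106"),
  outdoor_time = c(2.2, 4.5, 2.4, 4.6, 4.8, 4.4),
  hrs_sun      = c(3, 5, 3, 4, 3, 4)
)

# Subtract each variable's mean from every observation of that variable
outdoors$outdoor_time_c <- outdoors$outdoor_time - mean(outdoors$outdoor_time)
outdoors$hrs_sun_c      <- outdoors$hrs_sun - mean(outdoors$hrs_sun)

# The centered versions have a mean of zero (up to floating-point error)
colMeans(outdoors[, c("outdoor_time_c", "hrs_sun_c")])
```

The spread of each variable is untouched: only the location of zero moves.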
Before (each panel is one value of hrs_sun):
After (each panel is one value of hrs_sun_c, which aren’t round numbers because the original mean of 3.8 isn’t a round number):
The interaction model wellbeing ~ outdoor_time_c * hrs_sun_c will have four coefficients:
- Intercept
- outdoor_time_c
- hrs_sun_c
- outdoor_time_c:hrs_sun_c

What will each coefficient represent?
\[ \text{wellbeing} = \beta_0 + (\beta_1 \cdot \text{outdoor_time_c}) + (\beta_2 \cdot \text{hrs_sun_c}) + (\beta_3 \cdot \text{outdoor_time_c} \cdot \text{hrs_sun_c}) + \epsilon \]
Call:
lm(formula = wellbeing ~ outdoor_time_c * hrs_sun_c, data = outdoors)
Residuals:
Min 1Q Median 3Q Max
-43.008 -9.710 -1.068 8.674 48.494
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 104.848 1.757 59.686 < 2e-16 ***
outdoor_time_c 1.425 1.364 1.044 0.29890
hrs_sun_c 14.445 1.399 10.328 < 2e-16 ***
outdoor_time_c:hrs_sun_c 3.255 1.193 2.728 0.00758 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 17.55 on 96 degrees of freedom
Multiple R-squared: 0.5404, Adjusted R-squared: 0.5261
F-statistic: 37.63 on 3 and 96 DF, p-value: 3.631e-16
Estimate Std. Error t value Pr(>|t|)
(Intercept) 104.85 1.76 59.69 0.00
outdoor_time_c 1.42 1.36 1.04 0.30
hrs_sun_c 14.45 1.40 10.33 0.00
outdoor_time_c:hrs_sun_c 3.25 1.19 2.73 0.01
Intercept
- When outdoor_time_c = 0 (mean hours outdoors) and hrs_sun_c = 0 (mean sunlight hours), the estimated average wellbeing is 104.8.

outdoor_time_c
- When hrs_sun_c = 0 (at the mean value for sunlight hours), spending an additional hour outdoors is associated with an increase in wellbeing of 1.4 points.

hrs_sun_c
- When outdoor_time_c = 0 (at the mean number of hours spent outdoors), the sun shining for an extra hour is associated with an increase in wellbeing of 14.4 points.
outdoor_time_c:hrs_sun_c
Increasing centered sunlight hours by 1 changes the association of outdoor_time_c with wellbeing by 3.3.
So, the association of outdoor_time_c with wellbeing when …
- hrs_sun_c = 0 is 1.4
- hrs_sun_c = 1 is 1.4 + 3.3 = 4.7
- hrs_sun_c = 2 is 1.4 + 3.3 + 3.3 = 8.0

Equivalently: Increasing centered outdoor time by 1 changes the association of hrs_sun_c with wellbeing by 3.3.
So, the association of hrs_sun_c with wellbeing when …
- outdoor_time_c = 0 is 14.4
- outdoor_time_c = 1 is 14.4 + 3.3 = 17.7
- outdoor_time_c = 2 is 14.4 + 3.3 + 3.3 = 21.0

Compare to the old plot of simple slopes: the shape is the same! Mean-centering doesn’t change the relationship between simple slopes.
Compare to the old plot: the shape is the same! Mean-centering doesn’t change the regions of (non)significance in how the predictors interact.
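A quick way to convince yourself that centering only moves the zero point is to fit both versions and compare them. The sketch below uses simulated data (an assumption for illustration, not the course data) and borrows the names m1 and m2 for the non-centered and mean-centered models.

```r
# Simulated stand-in for the course data (an assumption for illustration only)
set.seed(1)
n <- 100
outdoors <- data.frame(
  outdoor_time = runif(n, 0, 6),
  hrs_sun      = sample(1:7, n, replace = TRUE)
)
outdoors$wellbeing <- 88 - 11 * outdoors$outdoor_time + 3 * outdoors$hrs_sun +
  3 * outdoors$outdoor_time * outdoors$hrs_sun + rnorm(n, sd = 17)

# Mean-centre both predictors
outdoors$outdoor_time_c <- outdoors$outdoor_time - mean(outdoors$outdoor_time)
outdoors$hrs_sun_c      <- outdoors$hrs_sun - mean(outdoors$hrs_sun)

m1 <- lm(wellbeing ~ outdoor_time * hrs_sun, data = outdoors)
m2 <- lm(wellbeing ~ outdoor_time_c * hrs_sun_c, data = outdoors)

# 1. The interaction coefficient is the same in both models
coef(m1)[["outdoor_time:hrs_sun"]]
coef(m2)[["outdoor_time_c:hrs_sun_c"]]

# 2. The outdoor_time_c slope in m2 equals m1's simple slope at mean hrs_sun
slope_at_mean <- coef(m1)[["outdoor_time"]] +
  coef(m1)[["outdoor_time:hrs_sun"]] * mean(outdoors$hrs_sun)
slope_at_mean
coef(m2)[["outdoor_time_c"]]

# 3. Both models make the same predictions
all.equal(fitted(m1), fitted(m2))
```

The same pattern shows up in the slides’ own numbers: –10.94 + 3.25 × 3.8 is about 1.4, the outdoor_time_c coefficient in the centered model.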
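How do we specify an interaction between two numeric predictors?
- In the model formula, using A*B (which stands for the individual predictors and their interaction) or A:B (which stands for just the interaction).

What does it mean when we say “interactions are symmetrical”?
- If the association between A and the outcome depends on B, then the association between B and the outcome depends on A, and by the same amount.

What is a simple slope?
- The slope of the association between one predictor and the outcome, at a specific value of another predictor.

When we have an interaction between two numeric predictors, what values do we typically use to compute simple slopes?
- One standard deviation below the mean of the other predictor (mean(B) - sd(B)), the mean (mean(B)), and one standard deviation above the mean (mean(B) + sd(B)).

Tasks:
- Attend your lab and work together on the exercises
- Complete the weekly quiz

Support:
- Help each other on the Piazza forum
- Attend office hours (see Learn page for details)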
The old coefficients in the non-centered model, m1:
| parameter | estimate |
|---|---|
| (Intercept) | 87.9 |
| outdoor_time | -10.9 |
| hrs_sun | 3.2 |
| outdoor_time:hrs_sun | 3.3 |

The new coefficients in the mean-centered model, m2:
| parameter | estimate |
|---|---|
| (Intercept) | 104.8 |
| outdoor_time_c | 1.4 |
| hrs_sun_c | 14.4 |
| outdoor_time_c:hrs_sun_c | 3.3 |