lm(DV ~ IV, data = datasetName)
Data Analysis for Psychology in R 2
Department of Psychology
University of Edinburgh
2025–2026
| Block | Topic |
|---|---|
| Introduction to Linear Models | Intro to Linear Regression |
| | Interpreting Linear Models |
| | Testing Individual Predictors |
| | Model Testing & Comparison |
| | Linear Model Analysis |
| Analysing Experimental Studies | Categorical Predictors & Dummy Coding |
| | Effects Coding & Coding Specific Contrasts |
| | Assumptions & Diagnostics |
| | Bootstrapping |
| | Categorical Predictor Analysis |
| Interactions | Interactions I |
| | Interactions II |
| | Interactions III |
| | Analysing Experiments |
| | Interaction Analysis |
| Advanced Topics | Power Analysis |
| | Binary Logistic Regression I |
| | Binary Logistic Regression II |
| | Logistic Regression Analysis |
| | Exam Prep and Course Q&A |
Be able to interpret the coefficients from a simple linear model
Understand how and why we standardise coefficients and how this impacts interpretation
Understand how these interpretations change when we add more predictors
\[y_i = \beta_0 + \beta_1 x_{i} + \epsilon_i\]
lm in R

We fit the model using the `lm()` function:
Call:
lm(formula = score ~ hours, data = test)
Coefficients:
(Intercept) hours
0.40 1.05
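A minimal sketch of the call that produces this output. The data frame `test` and the columns `score` and `hours` come from the call shown above; the object name `mod1` is just a placeholder:

```r
# Fit a simple linear model predicting test score from hours of study
mod1 <- lm(score ~ hours, data = test)
mod1  # printing the model shows the call and the coefficients
```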
lm in R

Calling `summary()` on the fitted model gives the full output:
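A minimal sketch, reusing the hypothetical `mod1` object fitted above:

```r
# Full summary: residuals, coefficient SEs and t-tests,
# residual SE, R-squared, and the overall F-test
summary(mod1)
```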
Call:
lm(formula = score ~ hours, data = test)
Residuals:
Min 1Q Median 3Q Max
-1.618 -1.077 -0.746 1.177 2.436
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.400 1.111 0.36 0.728
hours 1.055 0.358 2.94 0.019 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.63 on 8 degrees of freedom
Multiple R-squared: 0.52, Adjusted R-squared: 0.46
F-statistic: 8.67 on 1 and 8 DF, p-value: 0.0186
Intercept is the expected value of Y when X is 0
Slope is the number of units by which Y increases, on average, for a unit increase in X
In our example, 0 has a meaning: studying for zero hours
But it is not always the case that 0 is meaningful
Suppose our predictor variable was not hours of study, but age
Imagine the model has age as a predictor variable instead of the number of hours studied. How would we interpret an intercept of 0.4?
Imagine a model looking at the association between an employee’s salary and their duration of employment
For reference/hint:
\(\beta_0\) = Intercept is the expected value of Y when X is 0
\(\beta_1\) = Slope is the number of units by which Y increases, on average, for a unit increase in X
Imagine a model looking at the association between the length of cats’ tails and their weight
For reference/hint:
\(\beta_0\) = Intercept is the expected value of Y when X is 0
\(\beta_1\) = Slope is the number of units by which Y increases, on average, for a unit increase in X
Imagine a model looking at the association between healthy eating habits and personality
For reference/hint:
\(\beta_0\) = Intercept is the expected value of Y when X is 0
\(\beta_1\) = Slope is the number of units by which Y increases, on average, for a unit increase in X
Why might standard units be useful?
\[\hat{\beta_1^*} = \hat \beta_1 \frac{s_x}{s_y}\]
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.40 1.111 0.36 0.7282
hours 1.05 0.358 2.94 0.0186
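Using the formula above, a sketch of computing the standardised slope by hand, again assuming the hypothetical `test` data frame and `mod1` object:

```r
# Standardised slope: multiply the unstandardised slope by sd(x)/sd(y)
coef(mod1)["hours"] * sd(test$hours) / sd(test$score)
```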
Another option is to transform continuous predictor and outcome variables to \(z\)-scores (mean=0, SD=1) prior to fitting the model
If both \(x\) and \(y\) are standardised, our model coefficients (betas) are standardised too
\(z\)-score for \(x\):
\[z_{x_i} = \frac{x_i - \bar{x}}{s_x}\]
\(z\)-score for \(y\):

\[z_{y_i} = \frac{y_i - \bar{y}}{s_y}\]
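A small sketch of computing these \(z\)-scores manually (same `test` data frame assumption):

```r
# z-score x by hand, then confirm it matches R's scale()
z_hours <- (test$hours - mean(test$hours)) / sd(test$hours)
all.equal(z_hours, as.vector(scale(test$hours)))
```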
We can use R's `scale()` function:

`center = T` subtracts the mean:

\[z_{x_i} = \frac{\color{#BF1932}{x_i - \bar{x}}}{s_x}\]

`scale = T` divides by the standard deviation:

\[z_{x_i} = \frac{x_i - \bar{x}}{\color{#BF1932}{s_x}}\]
Another option is not to transform the variables and save them to your dataset, but instead to scale the variables directly in the model. The defaults for `center` and `scale` are both `TRUE`.
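A sketch of this approach, scaling inside the model formula itself:

```r
# scale() both centers (center = TRUE) and scales (scale = TRUE) by default,
# so each variable enters the model as a z-score
mod1_std <- lm(scale(score) ~ scale(hours), data = test)
coef(mod1_std)
```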
Unstandardised vs. standardised model output:
The \(R^2\), the \(F\)-test, the \(t\)-tests, and their corresponding \(p\)-values remain the same for the standardised coefficients as for the unstandardised coefficients
\(\beta_0\) (intercept) = zero when all variables are standardised:
\[\beta_0 = \bar{y}-\hat \beta_1\bar{x}\]
With standardised variables, \(\bar{y} = 0\) and \(\bar{x} = 0\), so:

\[\bar{y} - \hat \beta_1 \bar{x} = 0 - \hat \beta_1 \cdot 0 = 0\]
The interpretation of the slope coefficient(s) becomes the increase in \(y\) in standard deviation units for every standard deviation increase in \(x\)
So, in our example:
For every standard deviation increase in hours of study, test score increases by 0.72 standard deviations
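A quick check, under the same `test` data frame assumption: in a simple linear model, the standardised slope equals the Pearson correlation between \(x\) and \(y\):

```r
# In a simple linear model the standardised slope equals Pearson's r
cor(test$hours, test$score)  # ~ 0.72, matching the standardised slope
```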
The aim of a linear model is to explain variance in an outcome
In simple linear models, we have a single predictor, but the model can accommodate (in principle) any number of predictors
If we have multiple predictors for an outcome, those predictors may be correlated with each other
A linear model with multiple predictors finds the optimal prediction of the outcome from several predictors, taking into account their redundancy with one another
For prediction: multiple predictors may lead to improved prediction
For theory testing: often our theories suggest that multiple variables together contribute to variation in an outcome
For covariate control: we might want to assess the effect of a specific predictor, controlling for the influence of others
\[y_i = \beta_0 + \beta_1 \cdot x_{1i} + \epsilon_i\]
\[y_i = \beta_0 + \beta_1 \cdot x_{1i} + \beta_2 \cdot x_{2i} + \beta_3 \cdot x_{3i} + \epsilon_i\]
\[y_i = \beta_0 + \beta_1 \cdot x_{1i} + \beta_2 \cdot x_{2i} + ... + \beta_j \cdot x_{ji} + \epsilon_i\]
Given that we have additional variables, our interpretation of the regression coefficients changes a little
\(\beta_0\) = the predicted value for \(y\) when all \(x\) are 0
Each \(\beta_j\) is now a partial regression coefficient
Refers to finding the effect of the predictor when the values of the other predictors are fixed
It may also be expressed as the effect of controlling for, or partialling out, or residualising for the other \(x\)’s
With multiple predictors, `lm` isolates the effects and estimates the unique contributions of the predictors
A linear model with one continuous predictor
A linear model with two continuous predictors
`lm` with 2 Predictors

Imagine we extend our study of test scores:
We sample 150 students taking a multiple choice Biology exam (max score 40)
We give all students a survey at the start of the year measuring their school motivation
We then measure the hours they spent studying for the test, and record their scores on the test
`lm` code

We add the additional predictor with `+` in the model specification:

\[\text{Score}_i = \color{blue}{\beta_0} + \color{blue}{\beta_1} \cdot \color{orange}{\text{Hours}_{i}} + \color{blue}{\beta_2} \cdot \color{orange}{\text{Motivation}_{i}} + \color{blue}{\epsilon_i}\]
Blue: values of the linear model (coefficients)

Orange: values we provide (inputs)
\[\text{Score}_i = \color{blue}{\beta_0} + \color{blue}{\beta_1} \cdot \color{orange}{\text{Hours}_{i}} + \color{blue}{\beta_2} \cdot \color{orange}{\text{Motivation}_{i}} + \epsilon_i\]
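A minimal sketch of the corresponding call. The column names `score`, `hours`, and `motivation` come from the output below; the data frame name `test_study2` is hypothetical:

```r
# `+` adds motivation as a second predictor
mod2 <- lm(score ~ hours + motivation, data = test_study2)
summary(mod2)
```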
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.87 0.65 10.49 0.00
hours 1.38 0.08 17.22 0.00
motivation 0.92 0.38 2.39 0.02
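As an illustration of how the coefficients combine, the predicted score for hypothetical input values of 2 hours of study and a motivation score of 3:

```r
# Predicted score = b0 + b1*hours + b2*motivation
6.87 + 1.38 * 2 + 0.92 * 3
#> [1] 12.39
```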
What is the interpretation of the…

…`hours` coefficient?

…`motivation` coefficient?

Attend your lab and work together on the exercises
Complete the weekly quiz
Help each other on the Piazza forum
Attend office hours (see Learn page for details)