class: center, middle, inverse, title-slide

.title[
# Linear Model: Fundamentals
]

.subtitle[
## Data Analysis for Psychology in R 2
]

.author[
### dapR2 Team
]

.institute[
### Department of Psychology
The University of Edinburgh
]

---
# Quick Announcements / Reminders

+ **Lab Exercises:** Solutions for labs are released each Friday at 12 noon
+ **Lab Schedule:** Only attend the lab shown on your timetable
  + If you need to switch your lab time, you must formally request the change using the [Group Change Request Form](https://registryservices.ed.ac.uk/timetabling-examinations/timetabling/personalised-timetables) on the Registry website
  + Attend your originally scheduled lab until your change request has been approved
+ **Office Hours:** Times and locations for Marju's and Emma's office hours are listed on the Learn page (Course Overview > Course Contacts)
  + No need to make a booking, just show up

---
# Course Overview

.pull-left[
<table style="border: 1px solid black;">
<tr style="padding: 0 1em 0 1em;">
<td rowspan="5" style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1;text-align:center;vertical-align: middle"> <b>Introduction to Linear Models</b></td>
<td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1"> Intro to Linear Regression</td>
</tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:1"> <b>Interpreting Linear Models</b></td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Testing Individual Predictors</td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Model Testing & Comparison</td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Linear Model Analysis</td></tr>
<tr style="padding: 0 1em 0 1em;">
<td rowspan="5" style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4;text-align:center;vertical-align: middle"> <b>Analysing Experimental Studies</b></td>
<td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Categorical Predictors & Dummy Coding</td>
</tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Effects Coding & Coding Specific Contrasts</td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Assumptions & Diagnostics</td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Bootstrapping</td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Categorical Predictor Analysis</td></tr>
</table>
]

.pull-right[
<table style="border: 1px solid black;">
<tr style="padding: 0 1em 0 1em;">
<td rowspan="5" style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4;text-align:center;vertical-align: middle"> <b>Interactions</b></td>
<td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Interactions I</td>
</tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Interactions II</td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Interactions III</td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Analysing Experiments</td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Interaction Analysis</td></tr>
<tr style="padding: 0 1em 0 1em;">
<td rowspan="5" style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4;text-align:center;vertical-align: middle"> <b>Advanced Topics</b></td>
<td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Power Analysis</td>
</tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Binary Logistic Regression I</td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Binary Logistic Regression II</td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Logistic Regression Analysis</td></tr>
<tr><td style="border: 1px solid black;padding: 0 1em 0 1em;opacity:0.4"> Exam Prep and Course Q&A</td></tr>
</table>
]

---
# This Week's Learning Objectives

1. Be able to interpret the coefficients from a simple linear model
2. Understand how these interpretations change when we add more predictors
3. Understand how and why we standardise coefficients and how this impacts interpretation

---
class: inverse, center, middle

# Part 1: Recap & Coefficient Interpretation

---
# Linear Model

+ Last week we introduced the linear model:

`$$y_i = \beta_0 + \beta_1 x_{i} + \epsilon_i$$`

+ Where:
  + `\(y_i\)` is our measured outcome variable
  + `\(x_i\)` is our measured predictor variable
  + `\(\beta_0\)` is the model intercept
  + `\(\beta_1\)` is the model slope
  + `\(\epsilon_i\)` is the residual error (the difference between the model-predicted and the observed value of `\(y\)`)

+ We spoke about calculating the coefficients by hand, and introduced the key concept of **residuals**

---
# `lm` in R

+ You also saw the basic structure of the `lm()` function:


``` r
lm(DV ~ IV, data = datasetName)
```

+ And we ran our first model:


``` r
lm(score ~ hours, data = test)
```

```
## 
## Call:
## lm(formula = score ~ hours, data = test)
## 
## Coefficients:
## (Intercept)        hours  
##       0.400        1.055
```

+ This week, we are going to focus on the interpretation of our model, and on how we extend it to include more predictors
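---
# `lm` in R

+ Since we will keep referring back to this model, it is often easier to store the fitted model as an object and extract the pieces we need from it

+ A minimal sketch, assuming the same `test` data:


``` r
mod <- lm(score ~ hours, data = test)  # store the fitted model as an object
coef(mod)                              # extract just the two coefficients
```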
---
# `lm` in R


``` r
summary(lm(score ~ hours, data = test))
```

```
## 
## Call:
## lm(formula = score ~ hours, data = test)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.6182 -1.0773 -0.7454  1.1773  2.4364 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   0.4000     1.1111   0.360   0.7282  
## hours         1.0545     0.3581   2.945   0.0186 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.626 on 8 degrees of freedom
## Multiple R-squared:  0.5201, Adjusted R-squared:  0.4601 
## F-statistic:  8.67 on 1 and 8 DF,  p-value: 0.01858
```

---
# Interpretation

.pull-left[
+ **Slope is the number of units by which Y increases, on average, for a unit increase in X.**
]

.pull-right[
<img src="dapr2_02_LM2_files/figure-html/unnamed-chunk-7-1.png" width="80%" />

<img src="figs/firstLMresults.png" width="80%" />
]

---
count: false
# Interpretation

.pull-left[
+ **Slope is the number of units by which Y increases, on average, for a unit increase in X.**
  + Unit of Y = 1 point on the test
  + Unit of X = 1 hour of study
]

.pull-right[
<img src="dapr2_02_LM2_files/figure-html/unnamed-chunk-9-1.png" width="80%" />

<img src="figs/firstLMresults.png" width="80%" />
]

---
count: false
# Interpretation

.pull-left[
+ **Slope is the number of units by which Y increases, on average, for a unit increase in X.**
  + Unit of Y = 1 point on the test
  + Unit of X = 1 hour of study
  + As the slope for **hours** is 1.0545, we conclude that for every hour of study, test score increases on average by 1.055 points
]

.pull-right[
<img src="dapr2_02_LM2_files/figure-html/unnamed-chunk-11-1.png" width="80%" />

<img src="figs/firstLMresults.png" width="80%" />
]

---
# Interpretation

.pull-left[
+ **Slope is the number of units by which Y increases, on average, for a unit increase in X.**
  + Unit of Y = 1 point on the test
  + Unit of X = 1 hour of study
  + As the slope for **hours** is 1.0545, we conclude that for every hour of study, test score increases on average by 1.055 points

+ **Intercept is the expected value of Y when X is 0.**
  + X = 0 is a student who does not study
]

.pull-right[
<img src="dapr2_02_LM2_files/figure-html/unnamed-chunk-13-1.png" width="80%" />

<img src="figs/firstLMresults.png" width="80%" />
]

---
count: false
# Interpretation

.pull-left[
+ **Slope is the number of units by which Y increases, on average, for a unit increase in X.**
  + Unit of Y = 1 point on the test
  + Unit of X = 1 hour of study
  + As the slope for **hours** is 1.0545, we conclude that for every hour of study, test score increases on average by 1.055 points

+ **Intercept is the expected value of Y when X is 0.**
  + X = 0 is a student who does not study
  + As the intercept is 0.4000, we conclude that a student who does not study would be expected to score 0.40 on the test
]

.pull-right[
<img src="dapr2_02_LM2_files/figure-html/unnamed-chunk-15-1.png" width="80%" />

<img src="figs/firstLMresults.png" width="80%" />
]

---
# Note of caution on intercepts

+ In our example, 0 has a meaning
  + It is a student who has studied for 0 hours

+ But it is not always the case that 0 is meaningful
  + Suppose our predictor variable was not hours of study, but age

+ **Imagine the model has age as a predictor variable instead of the number of hours studied. How would we interpret an intercept of 0.4?**

--

+ This is a general lesson about interpreting statistical tests:
  + The interpretation is always in the context of the constructs and how we have measured them
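---
count: false
# Note of caution on intercepts

+ One common remedy is to **mean-centre** the predictor, so that 0 means "a person of average age", and the intercept becomes the expected score at the average age

+ A hypothetical sketch (there is no `age` variable in our `test` data):


``` r
# centre age at its mean, so age_c = 0 is the average-aged student (hypothetical variable)
test$age_c <- test$age - mean(test$age)

# the intercept is now the expected test score at the average age
coef(lm(score ~ age_c, data = test))
```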
---
# Practice with scales of measurement (1)

.pull-left[
+ Imagine a model looking at the relationship between an employee's duration of employment and their salary
  + `\(X\)` = unit is 1 year
  + `\(Y\)` = unit is £1000
  + `\(\beta_1\)` = 0.4

+ What can we interpret based on `\(\beta_1\)`?
]

.pull-right[
+ `\(\beta_0\)` = **Intercept is the expected value of Y when X is 0.**

+ `\(\beta_1\)` = **Slope is the number of units by which Y increases, on average, for a unit increase in X.**
]

---
# Practice with scales of measurement (2)

.pull-left[
+ Imagine a model looking at the relationship between cats' weight and the length of their tails
  + `\(X\)` = unit is 1kg
  + `\(Y\)` = unit is 1cm
  + `\(\beta_1\)` = -3.2

+ What can we interpret based on `\(\beta_1\)`?
]

.pull-right[
+ `\(\beta_0\)` = **Intercept is the expected value of Y when X is 0.**

+ `\(\beta_1\)` = **Slope is the number of units by which Y increases, on average, for a unit increase in X.**
]

---
# Practice with scales of measurement (3)

.pull-left[
+ Imagine a model looking at the relationship between personality and eating
  + `\(X\)` = unit is 1 increment on a Likert scale ranging from 1 to 5 measuring conscientiousness
  + `\(Y\)` = unit is 1 increment on a healthy eating scale
  + `\(\beta_1\)` = 0.25

+ What can we interpret based on `\(\beta_1\)`?
]

.pull-right[
+ `\(\beta_0\)` = **Intercept is the expected value of Y when X is 0.**

+ `\(\beta_1\)` = **Slope is the number of units by which Y increases, on average, for a unit increase in X.**
]

---
class: center, middle
# Questions?

---
class: inverse, center, middle
# Part 2: Standardisation

---
# Unstandardised vs standardised coefficients

+ So far we have calculated an _unstandardised_ `\(\hat \beta_1\)`
  + This means we use the units of the variables we measured
  + We interpreted the slope as the change in `\(y\)` units for a unit change in `\(x\)`, where the unit is determined by how we have measured our variables

+ However, sometimes these units are not helpful for interpretation
  + We can then perform **standardisation** to aid interpretation

---
# Standardised units

Why might standard units be useful?

--

+ **If the scales of our variables are arbitrary**
  + Example: a sum score of questionnaire items answered on a Likert scale
  + A unit here would equal moving from e.g. a 2 to 3 on one item
  + This is not especially meaningful (and actually has A LOT of associated assumptions)

--

+ **If we want to compare the effects of variables on different scales**
  + If we want to say something like "the effect of `\(x_1\)` is stronger than the effect of `\(x_2\)`", we need a common scale

---
# Option 1: Standardising the coefficients

+ After calculating a `\(\hat \beta_1\)`, it can be standardised by:

`$$\hat{\beta_1^*} = \hat \beta_1 \frac{s_x}{s_y}$$`

+ where:
  + `\(\hat{\beta_1^*}\)` = standardised beta coefficient
  + `\(\hat \beta_1\)` = unstandardised beta coefficient
  + `\(s_x\)` = standard deviation of `\(x\)`
  + `\(s_y\)` = standard deviation of `\(y\)`

---
# Implementing in R

+ **Step 1: Obtain coefficients from the model**


``` r
m1 <- lm(score ~ hours, data = test)
summary(m1)$coefficients
```

```
##             Estimate Std. Error   t value  Pr(>|t|)
## (Intercept) 0.400000  1.1111010 0.3600033 0.7281636
## hours       1.054545  0.3581403 2.9445039 0.0185812
```

--

+ **Step 2: Take the slope coefficient and standardise it**


``` r
*round(1.054545 * (sd(test$hours)/sd(test$score)),3)
```

```
## [1] 0.721
```
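--

+ Rather than retyping the estimate, we can also pull the slope straight from the stored model; a small sketch, using `m1` from Step 1:


``` r
b1 <- coef(m1)["hours"]                           # unstandardised slope from the model
round(b1 * (sd(test$hours) / sd(test$score)), 3)  # standardised slope, as above
```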
---
# Option 2: Standardising the variables

+ Another option is to transform continuous predictor and outcome variables to `\(z\)`-scores (mean = 0, SD = 1) prior to fitting the model

+ If both `\(x\)` and `\(y\)` are standardised, our model coefficients (betas) are standardised too

+ `\(z\)`-score for `\(x\)`:

`$$z_{x_i} = \frac{x_i - \bar{x}}{s_x}$$`

+ and the `\(z\)`-score for `\(y\)`:

`$$z_{y_i} = \frac{y_i - \bar{y}}{s_y}$$`

+ That is, we divide the individual deviations from the mean by the standard deviation

---
# Implementing in R

+ **Step 1: Convert predictor and outcome variables to z-scores** (here using the `scale` function)


``` r
test <- test %>%
  mutate(
*   z_score = scale(score, center = T, scale = T),
*   z_hours = scale(hours, center = T, scale = T)
  )
```

--

`$$z_{x_i} = \frac{\color{#BF1932}{x_i - \bar{x}}}{s_x}$$`

+ `center = T` indicates `\(x\)` should be mean-centred

---
count: false
# Implementing in R

+ **Step 1: Convert predictor and outcome variables to z-scores** (here using the `scale` function)


``` r
test <- test %>%
  mutate(
*   z_score = scale(score, center = T, scale = T),
*   z_hours = scale(hours, center = T, scale = T)
  )
```

`$$z_{x_i} = \frac{x_i - \bar{x}}{\color{#BF1932}{s_x}}$$`

+ `center = T` indicates `\(x\)` should be mean-centred

+ `scale = T` indicates `\(x\)` should be divided by `\(s_x\)`

---
# Implementing in R

+ **Step 2: Run the model on the z-scored variables**


``` r
performance_z <- lm(z_score ~ z_hours, data = test)
round(summary(performance_z)$coefficients, 3)
```

```
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    0.000      0.232   0.000    1.000
## z_hours        0.721      0.245   2.945    0.019
```

---
## Interpreting standardised regression coefficients

.pull-left[
**Unstandardised**

<img src="figs/unstandardizedResults.png" width="80%" />
]

.pull-right[
**Standardised**

<img src="figs/standardizedResults.png" width="80%" />
]

--

+ `\(R^2\)`, `\(F\)` and `\(t\)`-tests and their corresponding `\(p\)`-values remain the same for the standardised coefficients as for the unstandardised coefficients

--

+ `\(\beta_0\)` (intercept) = zero when all variables are standardised:

`$$\beta_0 = \bar{y}-\hat \beta_1\bar{x}$$`

`$$\bar{y} - \hat \beta_1 \bar{x} = 0 - \hat \beta_1 \cdot 0 = 0$$`

---
## Interpreting standardised regression coefficients

+ The interpretation of the slope coefficient(s) becomes the increase in `\(y\)` in standard deviation units for every standard deviation increase in `\(x\)`

+ So, in our example:

>**For every standard deviation increase in hours of study, test score increases by 0.72 standard deviations**
---
# Which should we use?

+ Unstandardised regression coefficients are often more useful when the variables are on meaningful scales
  + E.g. X additional hours of exercise per week adds Y years of healthy life

+ Sometimes it's useful to obtain standardised regression coefficients
  + When the scales of variables are arbitrary
  + When there is a desire to compare the effects of variables measured on different scales

+ Cautions
  + Just because you can put regression coefficients on a common metric doesn't mean they can be meaningfully compared
  + The SD is a poor measure of spread for skewed distributions, so be cautious about using standardised coefficients with skewed variables

---
# Relationship to correlation ( `\(r\)` )

+ **If a linear model has a single, continuous predictor,** then the standardised slope ( `\(\hat \beta_1^*\)` ) = correlation coefficient ( `\(r\)` )

+ For example:


``` r
round(lm(z_score ~ z_hours, data = test)$coefficients, 2)
```

```
## (Intercept)     z_hours 
##        0.00        0.72
```

``` r
round(cor(test$hours, test$score), 2)
```

```
## [1] 0.72
```

---
# Relationship to correlation ( `\(r\)` )

+ They are equivalent:
  + `\(r\)` is a standardised measure of linear association
  + `\(\hat \beta_1^*\)` is a standardised measure of the linear slope

+ A similar idea holds for linear models with multiple predictors
  + Slopes are now closely related to the *part correlation coefficient*

---
class: center, middle
# Questions?

---
class: inverse, center, middle
# Part 3: Adding more predictors to our model

---
# Multiple predictors (multiple regression)

+ The aim of a linear model is to explain variance in an outcome

+ In simple linear models, we have a single predictor, but the model can accommodate (in principle) any number of predictors

+ If we have multiple predictors for an outcome, those predictors may be correlated with each other

+ A linear model with multiple predictors finds the optimal prediction of the outcome from several predictors, **taking into account their redundancy with one another**

---
# Uses of multiple regression

+ **For prediction:** multiple predictors may lead to improved prediction

+ **For theory testing:** often our theories suggest that multiple variables together contribute to variation in an outcome

+ **For covariate control:** we might want to assess the effect of a specific predictor, controlling for the influence of others
  + E.g., effects of personality on health after removing the effects of age and gender (see the sketch below)
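+ As an illustrative sketch of the covariate-control case (the variable and data names here are hypothetical, not part of the course data):


``` r
# effect of personality on health, controlling for age and gender (hypothetical names)
lm(health ~ personality + age + gender, data = survey_data)
```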
---
# Extending the regression model

+ Our model for a single predictor:

`$$y_i = \beta_0 + \beta_1 x_{1i} + \epsilon_i$$`

+ is extended to include additional `\(x\)`'s:

`$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \epsilon_i$$`

+ For each `\(x\)`, we have an additional `\(\beta\)`
  + `\(\beta_1\)` is the coefficient for the 1st predictor
  + `\(\beta_2\)` for the second, etc.

---
# Interpreting coefficients in multiple regression

`$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_j x_{ji} + \epsilon_i$$`

+ Given that we have additional variables, our interpretation of the regression coefficients changes a little

+ `\(\beta_0\)` = the predicted value for `\(y\)` when **all** `\(x\)` are 0

+ Each `\(\beta_j\)` is now a **partial regression coefficient**
  + It captures the change in `\(y\)` for a one unit change in `\(x_j\)` **when all other x's are held constant**

---
# What does holding constant mean?

+ It refers to finding the effect of the predictor when the values of the other predictors are fixed

+ It may also be expressed as the effect of **controlling for**, or **partialling out**, or **residualising for** the other `\(x\)`'s

+ With multiple predictors, `lm` isolates these effects and estimates the unique contribution of each predictor

---
# Visualising models

.pull-left[
A linear model with one continuous predictor

<img src="dapr2_02_LM2_files/figure-html/unnamed-chunk-26-1.png" width="80%" />
]

.pull-right[
A linear model with two continuous predictors

<img src="figs/lm_surface.png" width="80%" />
]

???
+ In simple linear models, we could visualise the model as a straight line in 2D space
+ Least squares finds the coefficients that produce the *regression line* that minimises the vertical distances of the observed y-values from the line
+ In a regression with 2 predictors, this becomes a regression plane in 3D space
+ The goal now becomes finding the set of coefficients that minimises the vertical distances between the *regression plane* and the observed y-values
+ The logic extends to any number of predictors
+ (but becomes very difficult to visualise!)

---
# Example: `lm` with 2 predictors

+ Imagine we extend our study of test scores
  + We sample 150 students taking a multiple choice Biology exam (max score 40)
  + We give all students a survey at the start of the year measuring their school motivation
    + We standardise this variable so the mean is 0, negative numbers are low motivation, and positive numbers high motivation
  + We then measure the hours they spent studying for the test, and record their scores on the test

---
# Our data


``` r
slice(test_study2, 1:6)
```

```
##      ID score hours motivation
## 1 ID101     7     2      -1.42
## 2 ID102    23    12      -0.41
## 3 ID103    17     4       0.49
## 4 ID104     6     2       0.24
## 5 ID105    12     2       0.09
## 6 ID106    24    12       1.05
```

---
# `lm` code


``` r
*performance <- lm(score ~ hours + motivation, data = test_study2)
```

+ Multiple predictors are separated by `+` in the model specification

---
# Walk-through of multiple regression

+ Let's run our model and illustrate the application of the linear model equation

--

`$$Score_i = \beta_0 + \beta_1 Hours_{i} + \beta_2 Motivation_{i} + \epsilon_i$$`

--

.pull-left[
<img src="figs/perfResults.png" width="125%" />
]

.pull-right[
+ `\(\beta_0=\)`
+ `\(\beta_1=\)`
+ `\(\beta_2=\)`
]

---
count: false
# Walk-through of multiple regression

+ Let's run our model and illustrate the application of the linear model equation

`$$Score_i = \beta_0 + \beta_1 Hours_{i} + \beta_2 Motivation_{i} + \epsilon_i$$`

.pull-left[
<img src="figs/perfResults.png" width="125%" />
]

.pull-right[
+ `\(\beta_0 = 6.87\)`
+ `\(\beta_1 = 1.38\)`
+ `\(\beta_2 = 0.92\)`
]

---
count: false
# Walk-through of multiple regression

+ Let's run our model and illustrate the application of the linear model equation

`$$Score_i = \beta_0 + \beta_1 Hours_{i} + \beta_2 Motivation_{i} + \epsilon_i$$`

.pull-left[
<img src="figs/perfResults.png" width="125%" />
]

.pull-right[
+ `\(\beta_0 = 6.87\)`
+ `\(\beta_1 = 1.38\)`
+ `\(\beta_2 = 0.92\)`


``` r
test_study2[1,]
```

```
##      ID score hours motivation
## 1 ID101     7     2      -1.42
```

``` r
round(residuals(performance)[1],2)
```

```
##     1 
## -1.32
```
]

--

`$$7 = 6.87 + 1.38 \times 2 + 0.92 \times (-1.42) + (-1.32)$$`
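---
# What 'holding constant' looks like

+ A sketch of the residualising idea from earlier, using the `performance` model and `test_study2` data: strip `motivation` out of both `score` and `hours`, and the slope between what is left over reproduces the partial coefficient for `hours` (1.38)


``` r
# remove the part of score, and the part of hours, that motivation can explain
r_score <- resid(lm(score ~ motivation, data = test_study2))
r_hours <- resid(lm(hours ~ motivation, data = test_study2))

# slope of residualised score on residualised hours = the partial slope for hours
coef(lm(r_score ~ r_hours))["r_hours"]
```

+ This equivalence is exact (the Frisch-Waugh theorem), and it is why we can read `\(\beta_1\)` as the effect of hours *with motivation held constant*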
---
# Multiple regression coefficients


``` r
res <- summary(performance)
round(res$coefficients, 2)
```

```
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)     6.87       0.65   10.49     0.00
## hours           1.38       0.08   17.22     0.00
## motivation      0.92       0.38    2.39     0.02
```

--

The interpretation of the intercept coefficient?

--

+ **A student who did not study, and who has average school motivation, would be expected to score 6.87 on the test.**

--

The interpretation of the slope for `hours`?

--

+ **Controlling for students' level of motivation, for every additional hour studied, test score increases by 1.38 points.**

--

The interpretation of the slope for `motivation`?

--

+ **Controlling for hours of study, for every SD unit increase in motivation, test score increases by 0.92 points.**

---
class: center, middle
# Questions?

---
# Key Take Homes

1. We run linear models using `lm()` in R
2. The intercept is the value of `\(Y\)` when `\(X\)` = 0
3. The slope is the unit change in `\(Y\)` for each unit change in `\(X\)`
4. In certain cases, we may standardise our variables; this will affect their interpretation
5. We can easily add more predictors to our model
6. When we do, we interpret each coefficient with all other predictors held constant

--

**Plans for next week**

+ We will look at significance testing in the context of linear regression
+ And also discuss sampling variability

If you want a refresher, go back and review the sampling and hypothesis testing material from dapR1

---
## This week

.pull-left[
### Tasks

<img src="figs/labs.svg" width="10%" />

**Attend your lab and work together on the exercises**

<br>

<img src="figs/exam.svg" width="10%" />

**Complete the weekly quiz**
]

.pull-right[
### Support

<img src="figs/forum.svg" width="10%" />

**Help each other on the Piazza forum**

<br>

<img src="figs/oh.png" width="10%" />

**Attend office hours (see Learn page for details)**
]

---
class: center, middle
# Thanks for listening!