class: center, middle, inverse, title-slide .title[ #
Linear Regression Refresh
] .subtitle[ ## Data Analysis for Psychology in R 3 ] .author[ ### Josiah King ] .institute[ ### Department of Psychology
The University of Edinburgh ] --- # QUICK ANNOUNCEMENTS - This afternoon's lecture is in EICC Sidlaw Theatre (see Learn announcement) - Please look at the "Hello and Welcome to DAPR3 (README FIRST)" page on Learn. - The course intro video covers _A LOT_ of logistics - Please update/install RStudio on your computers - see Learn page for details and instructions - Do this sooner rather than later. --- class: inverse, center, middle <h1 style="text-align: left;">This Lecture:</h1> <h3 style="text-align: left;">1. Linear Models Refresher</h3> --- # Models .pull-left[ __deterministic__ given the same input, deterministic functions return *exactly* the same output - `\(y = mx + c\)` - surface area of sphere = `\(4\pi r^2\)` - height of fall = `\(1/2 g t^2\)` - `\(g = \textrm{acceleration due to gravity, }9.8m/s^2\)` - `\(t = \textrm{time (in seconds) of fall}\)` ] -- .pull-right[ __statistical__ .br2.f4.white.bg-gray[ $$ \textrm{outcome} = (\textrm{model}) + \color{black}{\textrm{error}} $$ ] - handspan = height + randomness - cognitive test score = age + premorbid IQ + ... + randomness ] --- # The Linear Model .br3.pa2.f2[ $$ `\begin{align} \color{red}{\textrm{outcome}} & = \color{blue}{(\textrm{model})} + \textrm{error} \\ \color{red}{y_i} & = \color{blue}{\beta_0 \cdot{} 1 + \beta_1 \cdot{} x_i} + \varepsilon_i \\ \text{where } \\ \varepsilon_i & \sim N(0, \sigma) \text{ independently} \\ \end{align}` $$ ] --- # The Linear Model .flex.items-top[ .w-50.pa2[ Our proposed model of the world: `\(\color{red}{y_i} = \color{blue}{\beta_0 \cdot{} 1 + \beta_1 \cdot{} x_i} + \varepsilon_i\)` {{content}} ] .w-50.pa2[ ![](dapr3_2324_01a_lmrefresh_files/figure-html/bb-1.svg)<!-- --> ]] -- Our model _fitted_ to some data (note the `\(\widehat{\textrm{hats}}\)`): `\(\hat{y}_i = \color{blue}{\hat \beta_0 \cdot{} 1 + \hat \beta_1 \cdot{} x_i}\)` {{content}} -- For the `\(i^{th}\)` observation: - `\(\color{red}{y_i}\)` is the value we observe for `\(x_i\)` - `\(\hat{y}_i\)` is the value the model _predicts_ for `\(x_i\)` - `\(\color{red}{y_i} = \hat{y}_i + \hat\varepsilon_i\)` --- # An Example .flex.items-top[ .w-50.pa2[ `\(\color{red}{y_i} = \color{blue}{5 \cdot{} 1 + 2 \cdot{} x_i} + \hat\varepsilon_i\)` __e.g.__ for the observation `\(x_i = 1.2, \; y_i = 9.9\)`: $$ `\begin{align} \color{red}{9.9} & = \color{blue}{5 \cdot{}} 1 + \color{blue}{2 \cdot{}} 1.2 + \hat\varepsilon_i \\ & = 7.4 + \hat\varepsilon_i \\ & = 7.4 + 2.5 \\ \end{align}` $$ ] .w-50.pa2[ ![](dapr3_2324_01a_lmrefresh_files/figure-html/errplot-1.svg)<!-- --> ]] --- # Categorical Predictors .pull-left[ <br><table> <thead> <tr> <th style="text-align:left;"> y </th> <th style="text-align:left;"> x </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 7.99 </td> <td style="text-align:left;"> Category1 </td> </tr> <tr> <td style="text-align:left;"> 4.73 </td> <td style="text-align:left;"> Category0 </td> </tr> <tr> <td style="text-align:left;"> 3.66 </td> <td style="text-align:left;"> Category0 </td> </tr> <tr> <td style="text-align:left;"> 3.41 </td> <td style="text-align:left;"> Category0 </td> </tr> <tr> <td style="text-align:left;"> 5.75 </td> <td style="text-align:left;"> Category1 </td> </tr> <tr> <td style="text-align:left;"> 5.66 </td> <td style="text-align:left;"> Category0 </td> </tr> <tr> <td style="text-align:left;"> ... </td> <td style="text-align:left;"> ... 
</td> </tr> </tbody> </table> ] .pull-right[ ![](dapr3_2324_01a_lmrefresh_files/figure-html/unnamed-chunk-2-1.svg)<!-- --> ] --- # Multiple Predictors .pull-left[ ![](dapr3_2324_01a_lmrefresh_files/figure-html/unnamed-chunk-3-1.svg)<!-- --> ] -- .pull-right[ ![](dapr3_2324_01a_lmrefresh_files/figure-html/unnamed-chunk-4-1.svg)<!-- --> ] --- exclude: true # Multiple Predictors .pull-left[ <img src="jk_img_sandbox/SSblank1.png" width="400px" height="350px" style="display: block; margin: auto;" /> ] .pull-right[] --- # Multiple Predictors .pull-left[ <img src="jk_img_sandbox/SSblank1cov.png" width="400px" height="350px" style="display: block; margin: auto;" /> ] .pull-right[ ```r lm(y ~ x1) ``` ``` ## ## Call: ## lm(formula = y ~ x1) ## ## Coefficients: ## (Intercept) x1 ## -0.0509 0.2007 ``` ] --- # Multiple Predictors .pull-left[ <img src="jk_img_sandbox/SSblank1_uncor.png" width="400px" height="350px" style="display: block; margin: auto;" /> ] .pull-right[ ```r lm(y ~ x1 + x2u) ``` ``` ## ## Call: ## lm(formula = y ~ x1 + x2u) ## ## Coefficients: ## (Intercept) x1 x2u ## -0.0446 0.2007 0.1304 ``` ] --- # Multiple Predictors .pull-left[ <img src="jk_img_sandbox/SStype3.png" width="400px" height="350px" style="display: block; margin: auto;" /> ] .pull-right[ ```r lm(y ~ x1 + x2) ``` ``` ## ## Call: ## lm(formula = y ~ x1 + x2) ## ## Coefficients: ## (Intercept) x1 x2 ## -0.0644 0.0912 0.3302 ``` {{content}} ] -- ```r resx1.x2 = resid(lm(x1 ~ x2)) lm(y ~ resx1.x2) ``` ``` ## ## Call: ## lm(formula = y ~ resx1.x2) ## ## Coefficients: ## (Intercept) resx1.x2 ## -0.0697 0.0912 ``` --- # Interactions .pull-left[ ![](dapr3_2324_01a_lmrefresh_files/figure-html/unnamed-chunk-14-1.svg)<!-- --> ] -- .pull-right[ ![](dapr3_2324_01a_lmrefresh_files/figure-html/unnamed-chunk-15-1.svg)<!-- --> ] --- # Interactions .pull-left[ ![](dapr3_2324_01a_lmrefresh_files/figure-html/unnamed-chunk-16-1.svg)<!-- --> ] .pull-right[ ![](dapr3_2324_01a_lmrefresh_files/figure-html/unnamed-chunk-17-1.svg)<!-- --> ] --- # Notation `\(\begin{align} \color{red}{y} \;\;\;\; & = \;\;\;\;\; \color{blue}{\beta_0 \cdot{} 1 + \beta_1 \cdot{} x_1 + ... 
+ \beta_k \cdot x_k} & + & \;\;\;\varepsilon \\ \qquad \\ \color{red}{\begin{bmatrix}y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ \vdots \\ y_n \end{bmatrix}} & = \color{blue}{\begin{bmatrix} 1 & x_{11} & x_{21} & \dots & x_{k1} \\ 1 & x_{12} & x_{22} & & x_{k2} \\ 1 & x_{13} & x_{23} & & x_{k3} \\ 1 & x_{14} & x_{24} & & x_{k4} \\ 1 & x_{15} & x_{25} & & x_{k5} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{1n} & x_{2n} & \dots & x_{kn} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix}} & + & \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \vdots \\ \varepsilon_n \end{bmatrix} \\ \qquad \\ \\\color{red}{\boldsymbol y} \;\;\;\;\; & = \qquad \qquad \;\;\; \mathbf{\color{blue}{X \qquad \qquad \qquad \;\;\;\: \boldsymbol \beta}} & + & \;\;\; \boldsymbol \varepsilon \\ \end{align}\)` --- # Link functions `\(\begin{align} \color{red}{y} = \mathbf{\color{blue}{X \boldsymbol{\beta}} + \boldsymbol{\varepsilon}} & \qquad & (-\infty, \infty) \\ \qquad \\ \qquad \\ \color{red}{ln \left( \frac{p}{1-p} \right) } = \mathbf{\color{blue}{X \boldsymbol{\beta}} + \boldsymbol{\varepsilon}} & \qquad & [0,1] \\ \qquad \\ \qquad \\ \color{red}{ln (y) } = \mathbf{\color{blue}{X \boldsymbol{\beta}} + \boldsymbol{\varepsilon}} & \qquad & (0, \infty) \\ \end{align}\)` --- # Linear Models in R - Linear regression ```r linear_model <- lm(continuous_y ~ x1 + x2 + x3*x4, data = df) ``` - Logistic regression ```r logistic_model <- glm(binary_y ~ x1 + x2 + x3*x4, data = df, family=binomial(link="logit")) ``` - Poisson regression ```r poisson_model <- glm(count_y ~ x1 + x2 + x3*x4, data = df, family=poisson(link="log")) ``` --- # Null Hypothesis Testing ![](dapr3_2324_01a_lmrefresh_files/figure-html/unnamed-chunk-21-1.svg)<!-- --> --- # Inference for Coefficients <img src="jk_img_sandbox/sum1.png" width="2560" height="500px" /> --- # Inference for Coefficients <img src="jk_img_sandbox/sum2.png" width="2560" height="500px" /> --- # Inference for Coefficients <img src="jk_img_sandbox/sum3.png" width="2560" height="500px" /> --- # Inference for Coefficients <img src="jk_img_sandbox/sum4.png" width="2560" height="500px" /> --- # Sums of Squares Rather than specific coefficients, we can also think of our model as a whole. We can talk in terms of the sums of squared residuals `\(\sum^{n}_{i=1}(y_i - \hat y_i)^2\)`. .pull-left[ <img src="jk_img_sandbox/SS3xmodel1.png" width="400px" height="350px" style="display: block; margin: auto;" /> ] .pull-right[ ![](dapr3_2324_01a_lmrefresh_files/figure-html/unnamed-chunk-27-1.svg)<!-- --> ] --- # `\(R^2\)` .pull-left[ <img src="jk_img_sandbox/SSr2.png" width="400px" height="350px" style="display: block; margin: auto;" /> ] .pull-right[ `\(R^2 = \frac{SS_{Model}}{SS_{Total}} = 1 - \frac{SS_{Residual}}{SS_{Total}}\)` ```r mdl <- lm(y ~ x1 + x2) summary(mdl)$r.squared ``` ``` ## [1] 0.1935 ``` ] --- # Inference: Joint tests We can test reduction in residual sums of squares: .pull-left[ ```r m1 <- lm(y ~ x1, data = df) ``` <img src="jk_img_sandbox/SS3xmodel1.png" width="400px" height="350px" style="display: block; margin: auto;" /> ] .pull-right[ ```r m2 <- lm(y ~ x1 + x2 + x3, data = df) ``` <img src="jk_img_sandbox/SS3xfullmodel.png" width="400px" height="350px" style="display: block; margin: auto;" /> ] --- # Inference: Joint tests i.e. 
isolating the improvement in model fit due to the inclusion of additional parameters .pull-left[ ```r m1 <- lm(y ~ x1, data = df) m2 <- lm(y ~ x1 + x2 + x3, data = df) anova(m1,m2) ``` ``` ## Analysis of Variance Table ## ## Model 1: y ~ x1 ## Model 2: y ~ x1 + x2 + x3 ## Res.Df RSS Df Sum of Sq F Pr(>F) ## 1 98 1141 ## 2 96 357 2 785 106 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ] .pull-right[ <img src="jk_img_sandbox/SS3xincrement.png" width="400px" height="350px" style="display: block; margin: auto;" /> ] --- # Inference: Joint tests "additional parameters" could be a set of coefficients for levels of a categorical variable. This provides a way of assessing "are there differences in group means?". .pull-left[ ```r m1 = lm(y ~ x1, data = df) m2 = lm(y ~ x1 + species, data = df) coef(m2) ``` ``` ## (Intercept) x1 speciesb speciesc speciesd ## 13.841 1.059 -3.162 -2.943 -3.415 ``` ```r anova(m1, m2) ``` ``` ## Analysis of Variance Table ## ## Model 1: y ~ x1 ## Model 2: y ~ x1 + species ## Res.Df RSS Df Sum of Sq F Pr(>F) ## 1 98 1141 ## 2 95 950 3 192 6.39 0.00055 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ] .pull-right[ <img src="jk_img_sandbox/SSblankq.png" width="400px" height="350px" style="display: block; margin: auto;" /> ] --- exclude: true # Inference: Joint tests This is kind of where traditional analysis of variance sits. Think of it as testing the addition of each variable entered into the model. .pull-left[ ```r m2 = lm(y ~ x1 + species, data = df) anova(m2) ``` ``` ## Analysis of Variance Table ## ## Response: y ## Df Sum Sq Mean Sq F value Pr(>F) ## x1 1 91 90.6 9.06 0.00334 ** ## species 3 192 63.9 6.39 0.00055 *** ## Residuals 95 950 10.0 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ] .pull-right[ <img src="jk_img_sandbox/SStype1.png" width="400px" height="350px" style="display: block; margin: auto;" /> ] --- # Assumptions Our model: `\(\color{red}{y} = \color{blue}{\mathbf{X \boldsymbol \beta}} + \varepsilon \qquad \text{where } \boldsymbol \varepsilon \sim N(0, \sigma) \text{ independently}\)` Our ability to generalise from the model we fit on sample data to the wider population requires making some _assumptions._ -- - assumptions about the nature of the **model** .tr[ (linear) ] -- - assumptions about the nature of the **errors** .tr[ (normal) ] .footnote[ You can also phrase the linear model as: `\(\color{red}{\boldsymbol y} \sim Normal(\color{blue}{\mathbf{X \boldsymbol \beta}}, \sigma)\)` ] --- # Assumptions: The Broad Idea All our work here is aimed at making **models of the world**. - Models are models. They are simplifications. They are therefore wrong. .pull-left[] .pull-right[ ![](jk_img_sandbox/joeymap.jpg) ] --- count:false # Assumptions: The Broad Idea All our work here is aimed at making **models of the world**. - Models are models. They are simplifications. They are therefore wrong. - Our residuals ( `\(y - \hat{y}\)` ) reflect everything that we **don't** account for in our model. -- - In an ideal world, our model accounts for _all_ the systematic relationships. The leftovers (our residuals) are just random noise. -- - If our model is mis-specified, or we don't measure some systematic relationship, then our residuals will reflect this. 
-- - We check by examining how much the residuals look like randomness (zero mean, normally distributed, constant variance, i.i.d. "independent and identically distributed") - _these ideas tend to get referred to as our "assumptions"_ -- - We will **never** know whether our residuals contain only randomness - we can never observe everything! --- # Assumptions .pull-left[ What does "zero mean and constant variance" look like? - mean of the residuals = zero across the predicted values on the linear predictor. - spread of residuals is normally distributed and constant across the predicted values on the linear predictor. ] .pull-right[ ![](dapr3_2324_01a_lmrefresh_files/figure-html/unnamed-chunk-42-1.svg)<!-- --> ] --- # Assumptions .pull-left[ What does "zero mean and constant variance" look like? - __mean of the residuals = zero across the predicted values on the linear predictor.__ - spread of residuals is normally distributed and constant across the predicted values on the linear predictor. ] .pull-right[ ![](dapr3_2324_01a_lmrefresh_files/figure-html/unnamed-chunk-43-1.svg)<!-- --> ] --- # Assumptions .pull-left[ What does "zero mean and constant variance" look like? - mean of the residuals = zero across the predicted values on the linear predictor. - __spread of residuals is normally distributed and constant across the predicted values on the linear predictor.__ ] .pull-right[ ![](dapr3_2324_01a_lmrefresh_files/figure-html/unnamed-chunk-44-1.svg)<!-- --> ] --- # Assumptions .pull-left[ What does "zero mean and constant variance" look like? - mean of the residuals = zero across the predicted values on the linear predictor. - spread of residuals is normally distributed and constant across the predicted values on the linear predictor. ] .pull-right[ __`plot(model)`__ ```r my_model <- lm(y ~ x, data = df) plot(my_model, which = 1) ``` ![](dapr3_2324_01a_lmrefresh_files/figure-html/unnamed-chunk-46-1.svg)<!-- --> ] --- # Assumptions: Recipe Book <br><br> <center> <div class="acronym"> L </div> inearity<br> <div class="acronym"> I </div> ndependence<br> <div class="acronym"> N </div> ormality<br> <div class="acronym"> E </div> qual variance<br> </center> .footnote["Line without N is a Lie!" (Umberto)] --- # What if our model doesn't meet assumptions? - is our model mis-specified? - is the relationship non-linear? higher order terms? (e.g. `\(y \sim x + x^2\)`) - is there an omitted variable or interaction term? -- - transform the outcome variable? - makes things look more "normal" - but can make things more tricky to interpret: `lm(y ~ x)` and `lm(log(y) ~ x)` are quite different models -- - bootstrap\* (a rough sketch is included after the Summary slide) - do many times: resample (w/ replacement) your data, and refit your model. - obtain a distribution of the parameter estimate of interest. - compute a confidence interval for the estimate - celebrate .footnote[\* not great with small samples.] -- __looking ahead:__ these don't help if we have violated our assumption of independence... --- # Summary - we can fit a linear regression model which takes the form `\(\color{red}{y} = \color{blue}{\mathbf{X} \boldsymbol{\beta}} + \boldsymbol{\varepsilon}\)` - in R, we fit this with `lm(y ~ x1 + ... + xk, data = mydata)`. - we can extend this to different link functions to model outcome variables which follow different distributions. - when drawing inferences from a fitted model to the broader population, we rely on certain assumptions. - one of these is that the errors are independent. 
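---
# Extra: a bootstrap sketch

To make the "resample and refit" recipe from the earlier slide concrete, here is a minimal sketch of a case-resampling bootstrap for a single regression coefficient. The data (`df`, with `x` and `y`) are simulated purely for illustration; in practice you may prefer a packaged implementation (e.g. the `boot` or `car` packages).

```r
# simulate some toy data (purely illustrative)
set.seed(123)
df <- data.frame(x = rnorm(100))
df$y <- 5 + 2 * df$x + rnorm(100)

# the estimate from the original sample
coef(lm(y ~ x, data = df))["x"]

# resample rows with replacement, refit the model, keep the coefficient of interest
boot_est <- replicate(1000, {
  resample <- df[sample(nrow(df), replace = TRUE), ]
  coef(lm(y ~ x, data = resample))["x"]
})

# percentile 95% confidence interval from the bootstrap distribution
quantile(boot_est, probs = c(.025, .975))
```

The same idea applies to any statistic you can compute from a refitted model; the percentile interval shown here is just the simplest way to summarise the bootstrap distribution.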
--- class: inverse, center, middle, animated, rotateInDownLeft # End