class: center, middle, inverse, title-slide

# LM multiple predictors
## Data Analysis for Psychology in R 2
### dapR2 Team
### Department of Psychology
### The University of Edinburgh

---
# Week's Learning Objectives
1. Understand how to extend a simple regression to multiple predictors.
2. Understand and interpret the coefficients in multiple linear regression models.
3. Understand how to include and interpret models with categorical variables with 2+ levels.

---
# Topics for today
+ Introducing additional predictors

+ Evaluation of the overall model

+ Evaluation of individual predictors

---
# Multiple regression
+ The aim of a linear model is to explain variance in an outcome.

+ In simple linear models, we have a single predictor, but the model can accommodate (in principle) any number of predictors.

+ However, when we include multiple predictors, those predictors are likely to correlate.

+ Thus, a linear model with multiple predictors finds the optimal prediction of the outcome from several predictors, **taking into account their redundancy with one another**.

---
# Uses of multiple regression
+ **For prediction:** multiple predictors may lead to improved prediction.

+ **For theory testing:** often our theories suggest that multiple variables together contribute to variation in an outcome.

+ **For covariate control:** we might want to assess the effect of a specific predictor, controlling for the influence of others.
  + E.g., effects of personality on health after removing the effects of age and sex.

---
# Extending the regression model

+ Our model for a single predictor:

`$$y_i = \beta_0 + \beta_1 x_{1i} + \epsilon_i$$`

+ is extended to include additional `\(x\)`'s:

`$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \epsilon_i$$`

+ For each additional `\(x\)`, we estimate an additional `\(\beta\)`:
  + `\(\beta_1\)` is the coefficient for the 1st predictor.
  + `\(\beta_2\)` for the second, etc.

---
# Interpreting coefficients in multiple regression

`$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_j x_{ji} + \epsilon_i$$`

+ Given that we have additional variables, our interpretation of the regression coefficients changes a little.

+ `\(\beta_0\)` = the predicted value for `\(y\)` when **all** `\(x\)`'s are 0.

+ Each `\(\beta_j\)` is now a **partial regression coefficient**.
  + It captures the change in `\(y\)` for a one unit change in `\(x_j\)` **when all other `\(x\)`'s are held constant**.

+ What does holding constant mean?
  + It refers to finding the effect of a predictor when the values of the other predictors are fixed.
  + It may also be expressed as the effect of **controlling for**, or **partialling out**, or **residualizing for** the other `\(x\)`'s (a short code sketch of this idea follows in a couple of slides).

+ With multiple predictors, `lm` isolates these effects and estimates the unique contribution of each predictor.

---
# Visualizing models

.pull-left[
![](dapr2_08_LM_multiple_files/figure-html/unnamed-chunk-2-1.png)<!-- -->
]

.pull-right[
<img src="./lm_surface.png" width="1276" />
]

???
+ In simple linear models, we could visualise the model as a straight line in 2D space.
  + Least squares finds the coefficients that produce the *regression line* that minimises the vertical distances of the observed y-values from the line.
+ In a regression with 2 predictors, this becomes a regression plane in 3D space.
  + The goal now becomes finding the set of coefficients that minimises the vertical distances between the *regression plane* and the observed y-values.
+ The logic extends to any number of predictors.
  + (but becomes very difficult to visualise!)
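---
# Optional: what "holding constant" looks like in code

+ Below is a minimal sketch with **simulated data** (not the course dataset) to illustrate partialling out: the slope for `x1` in a model containing both predictors equals the slope from regressing `y` on the residuals of `lm(x1 ~ x2)`, i.e. the part of `x1` not shared with `x2`.

```r
# Simulate two correlated predictors and an outcome (illustration only)
set.seed(42)
n  <- 500
x2 <- rnorm(n)
x1 <- 0.6 * x2 + rnorm(n)            # x1 correlates with x2
y  <- 0.4 * x1 + 0.3 * x2 + rnorm(n)

coef(lm(y ~ x1 + x2))["x1"]          # partial slope for x1

x1_resid <- resid(lm(x1 ~ x2))       # x1 with x2 "partialled out"
coef(lm(y ~ x1_resid))["x1_resid"]   # same value as the partial slope above
```

+ This is why a partial regression coefficient is described as the effect of a predictor *after residualizing for* the other predictors.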
---
# Example: lm with 2 predictors

+ Imagine we were interested in examining predictors of school performance.
  + We get a teacher rating of the child's performance, a self-report measure of self-control, and a teacher rating of class interaction.

+ We collect data on a sample of n=650 12-year-olds and fit a linear model.

+ We'll fit the model to `\(z\)`-scores for all variables.
  + Remember, `\(z\)`-scores have a mean of 0 and an SD of 1.
  + So "1 unit" of a `\(z\)`-score is 1 SD.

---
# `lm` code

```r
*perf <- lm(z_perf ~ z_SC + z_interaction, data = data)
```

+ Multiple predictors are separated by `+`

---
# Multiple regression coefficients

```r
summary(perf)
```

```
## 
## Call:
## lm(formula = z_perf ~ z_SC + z_interaction, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.3169 -0.5840 -0.0989  0.5284  3.9093 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    9.822e-17  3.436e-02   0.000    1.000    
## z_SC           4.839e-01  3.444e-02  14.048   <2e-16 ***
## z_interaction -1.175e-02  3.444e-02  -0.341    0.733    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8761 on 647 degrees of freedom
## Multiple R-squared:  0.2349, Adjusted R-squared:  0.2325 
## F-statistic: 99.32 on 2 and 647 DF,  p-value: < 2.2e-16
```

---
# Multiple regression coefficients

```r
res <- summary(perf)
res$coefficients
```

```
##                    Estimate Std. Error       t value     Pr(>|t|)
## (Intercept)    9.822069e-17 0.03436173  2.858432e-15 1.000000e+00
## z_SC           4.838522e-01 0.03444282  1.404798e+01 2.558540e-39
## z_interaction -1.175418e-02 0.03444282 -3.412664e-01 7.330139e-01
```

+ **Controlling for class interaction, for every SD unit increase in self-control, there is a 0.48 SD unit increase in academic performance.**

---
# Multiple regression coefficients

```r
res <- summary(perf)
res$coefficients
```

```
##                    Estimate Std. Error       t value     Pr(>|t|)
## (Intercept)    9.822069e-17 0.03436173  2.858432e-15 1.000000e+00
## z_SC           4.838522e-01 0.03444282  1.404798e+01 2.558540e-39
## z_interaction -1.175418e-02 0.03444282 -3.412664e-01 7.330139e-01
```

+ **Controlling for self-control, for every SD unit increase in the rating of class interaction, there is a 0.01 SD unit decrease in academic performance (the coefficient is -0.01).**

---
class: center, middle

# Time for a break

**Quiz time!**

---
class: center, middle

# Welcome Back!

**Where we left off...**

We gave an overview of `lm` with multiple predictors.

Now we will look at model evaluation.

---
# `\(R^2\)`

+ Like in simple regression, we use `\(R^2\)` for overall model evaluation.

+ The sums of squares used to calculate `\(R^2\)` are defined in the same way as for simple regression:

`$$SS_{Total} = \sum_{i=1}^{n}(y_i - \bar{y})^2$$`

`$$SS_{Residual} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$`

`$$SS_{Model} = SS_{Total} - SS_{Residual}$$`

???
+ The only difference is that the model predictions `\(\hat{y}_i\)` are now based on multiple IVs.

---
# `\(R^2\)`

+ `\(R^2\)` is then calculated in the same way as in simple regression:

`$$R^2 = 1 - \frac{SS_{Residual}}{SS_{Total}}$$`

+ or

`$$R^2 = \frac{SS_{Model}}{SS_{Total}}$$`

---
# `\(R^2\)` interpretation

+ `\(R^2\)` = the proportion of variation in the outcome accounted for by the model (all the predictors together).

+ Its square root is now the multiple correlation coefficient between the predictors and the outcome.
  + The multiple correlation coefficient summarizes the relationship between `\(y\)` and the set of `\(x\)`'s.

+ Equivalently, `\(R^2\)` is the squared correlation between the observed and the predicted `\(y\)` values (a short code check follows on the next slide).
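---
# `\(R^2\)` in code

+ A minimal sketch (assuming the `perf` model and `data` object used throughout these slides) showing that building `\(R^2\)` from the sums of squares, or from the correlation between observed and fitted values, matches the `summary()` output.

```r
ss_total    <- sum((data$z_perf - mean(data$z_perf))^2)  # SS_Total
ss_residual <- sum(resid(perf)^2)                        # SS_Residual
ss_model    <- ss_total - ss_residual                    # SS_Model

1 - ss_residual / ss_total          # R^2 via 1 - SS_Residual/SS_Total
ss_model / ss_total                 # same value via SS_Model/SS_Total
cor(data$z_perf, fitted(perf))^2    # squared observed-fitted correlation
summary(perf)$r.squared             # lm's reported value (0.2349)
```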
---
# Adjusted `\(R^2\)`

+ We can also compute an adjusted `\(R^2\)` when our lm has 2+ predictors.
  + `\(R^2\)` is an inflated estimate of the corresponding population value.
  + Due to random sampling fluctuation, even when `\(R^2 = 0\)` in the population, its value in the sample may be `\(\neq 0\)`.
  + In **smaller samples**, the fluctuations from zero will be larger on average.
  + With **more IVs**, there are more opportunities to add to the positive fluctuation.

`$$\hat R^2 = 1 - (1 - R^2)\frac{N-1}{N-k-1}$$`

+ Adjusted `\(R^2\)` adjusts for both sample size ( `\(N\)` ) and number of predictors ( `\(k\)` ).

---
# In our academic performance example

```r
res
```

```
## 
## Call:
## lm(formula = z_perf ~ z_SC + z_interaction, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.3169 -0.5840 -0.0989  0.5284  3.9093 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    9.822e-17  3.436e-02   0.000    1.000    
## z_SC           4.839e-01  3.444e-02  14.048   <2e-16 ***
## z_interaction -1.175e-02  3.444e-02  -0.341    0.733    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8761 on 647 degrees of freedom
## Multiple R-squared:  0.2349, Adjusted R-squared:  0.2325 
## F-statistic: 99.32 on 2 and 647 DF,  p-value: < 2.2e-16
```

---
# In our academic performance example

+ **Based on adjusted R-squared, self-control and class interaction together explain 23.3% of the variance in academic performance.**

+ As the sample size is large and the number of predictors small, the unadjusted ( 0.235 ) and adjusted ( 0.233 ) R-squared values are similar.

---
# `\(F\)`-ratio

+ Like in simple regression, the `\(F\)`-ratio is used to test the null hypothesis that **all** model slopes are zero.

+ It is calculated in exactly the same way as in the simple linear model:

`$$F = \frac{MS_{Model}}{MS_{Residual}} = \frac{\frac{SS_{Model}}{df_{Model}}}{\frac{SS_{Residual}}{df_{Residual}}}$$`

+ Where
  + `\(df_{Model} = k\)`
  + `\(df_{Residual} = N - k - 1\)`
  + `\(N\)` = sample size
  + `\(k\)` = number of predictors

---
# In our academic performance example?

```r
res
```

```
## 
## Call:
## lm(formula = z_perf ~ z_SC + z_interaction, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.3169 -0.5840 -0.0989  0.5284  3.9093 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    9.822e-17  3.436e-02   0.000    1.000    
## z_SC           4.839e-01  3.444e-02  14.048   <2e-16 ***
## z_interaction -1.175e-02  3.444e-02  -0.341    0.733    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8761 on 647 degrees of freedom
## Multiple R-squared:  0.2349, Adjusted R-squared:  0.2325 
## F-statistic: 99.32 on 2 and 647 DF,  p-value: < 2.2e-16
```

---
# In our academic performance example?

+ Our overall model was significant (*F(2,647)=99.32, p<.001*); see the short code check on the next slide.
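---
# `\(F\)`-ratio and adjusted `\(R^2\)` in code

+ A minimal sketch (again assuming the `perf` model and `data` from earlier) building the `\(F\)`-ratio, its p-value, and the adjusted `\(R^2\)` directly from the formulas above.

```r
n <- nrow(data)   # 650
k <- 2            # number of predictors

ss_total    <- sum((data$z_perf - mean(data$z_perf))^2)
ss_residual <- sum(resid(perf)^2)
ss_model    <- ss_total - ss_residual

F_stat <- (ss_model / k) / (ss_residual / (n - k - 1))
F_stat                                                    # ~ 99.32
pf(F_stat, df1 = k, df2 = n - k - 1, lower.tail = FALSE)  # p < .001

r2 <- ss_model / ss_total
1 - (1 - r2) * (n - 1) / (n - k - 1)                      # adjusted R^2, ~ 0.233
```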
---
class: center, middle

# Time for a break

**No specific task this time**

But use this as a chance to post some questions on the discussion board.

---
class: center, middle

# Welcome Back!

**Where we left off...**

We have evaluated the overall model...

...now it is time for the individual predictors.

---
# Evaluating individual predictors

+ This broadly follows the same procedure as in simple regression:
  + Standard errors (SEs) for each regression slope are computed.
  + The SE gives a measure of the sampling variability of a regression coefficient.
  + `\(t\)`-tests and confidence intervals evaluate the statistical significance of the regression slopes.

---
# Standard errors

.pull-left[
`$$SE(\hat \beta_j) = \sqrt{\frac{SS_{Residual}/(n-k-1)}{\sum(x_i - \bar{x})^2}}$$`
]

.pull-right[
`$$SE(\hat \beta_j) = \sqrt{\frac{SS_{Residual}/(n-k-1)}{\sum(x_{ij} - \bar{x}_j)^2(1-R_{x_j}^2)}}$$`

+ `\(R_{x_j}^2\)` is the `\(R^2\)` from regressing `\(x_j\)` on all the other `\(x\)`'s, so `\(1-R_{x_j}^2\)` captures how much of `\(x_j\)` is independent of the other predictors.
]

---
# Standard errors

`$$SE(\hat \beta_j) = \sqrt{\frac{SS_{Residual}/(n-k-1)}{\sum(x_{ij} - \bar{x}_j)^2(1-R_{x_j}^2)}}$$`

+ Examining the above formula we can see that:
  + `\(SE\)` is smaller when the residual variance ( `\(SS_{Residual}\)` ) is smaller.
  + `\(SE\)` is smaller when the sample size ( `\(n\)` ) is larger.
  + `\(SE\)` is larger when the number of predictors ( `\(k\)` ) is larger.
  + `\(SE\)` is larger when a predictor is strongly correlated with the other predictors ( `\(R_{x_j}^2\)` ).

???
+ We'll return to this later when we discuss multicollinearity issues.

---
# Significance of coefficients

+ Once we have the standard error, all else is the same:

`$$t = \frac{\hat \beta_j}{SE(\hat \beta_j)}$$`

+ This is a `\(t\)`-test of the null hypothesis that `\(\beta_j = 0\)`.

+ The `\(t\)`-value is compared to a `\(t\)`-distribution with `\(n-k-1\)` degrees of freedom to assess statistical significance at a given `\(\alpha\)`.

---
# Our academic performance example

```r
res$coefficients
```

```
##                    Estimate Std. Error       t value     Pr(>|t|)
## (Intercept)    9.822069e-17 0.03436173  2.858432e-15 1.000000e+00
## z_SC           4.838522e-01 0.03444282  1.404798e+01 2.558540e-39
## z_interaction -1.175418e-02 0.03444282 -3.412664e-01 7.330139e-01
```

**Self-control (t(647)=14.05, p<.001) was a significant predictor of academic performance ( `\(\alpha = 0.05\)` ), and so we reject the null hypothesis of no effect. However, we failed to reject the null for class interaction (t(647)=-0.34, p>.05).**

---
# Confidence intervals

+ Like in simple regression, we can also compute confidence intervals for slopes in multiple regression.

+ The `\(100(1-\alpha)\)`% confidence interval for a slope is:

`$$\hat \beta_j \pm t^* \times SE(\hat \beta_j)$$`

```r
tibble(
  lower = res$coefficients[2,1] - round(qt(0.975, 647), 3) * res$coefficients[2,2],
  upper = res$coefficients[2,1] + round(qt(0.975, 647), 3) * res$coefficients[2,2]
)
```

```
## # A tibble: 1 x 2
##   lower upper
##   <dbl> <dbl>
## 1 0.416 0.551
```

---
# Confidence intervals

+ Or the much easier version...

```r
confint(perf)
```

```
##                     2.5 %     97.5 %
## (Intercept)   -0.06747398 0.06747398
## z_SC           0.41621897 0.55148537
## z_interaction -0.07938738 0.05587903
```

---
# Standardising coefficients

+ Lastly, standardisation follows the same steps as we have already seen:
  + Variables can be `\(z\)`-scored prior to analysis (our example).
  + Or coefficients can be standardised after model estimation (see the sketch on the next slide).

+ However, unlike in a simple lm, with multiple predictors the standardised regression coefficient for an IV is not equal to its Pearson correlation with `\(y\)`.
  + It is instead closely related to the **partial** correlation coefficient.
  + That is, the correlation between a predictor and the outcome holding all the other predictors constant.
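---
# Standardising after estimation

+ A possible post-hoc recipe (a sketch, assuming the `perf` model and `data` from earlier): multiply each raw slope by `\(SD(x_j)/SD(y)\)`. Because all variables in our example were `\(z\)`-scored before fitting, the standardised and raw slopes are essentially identical here.

```r
b_raw <- coef(perf)[-1]                                  # raw slopes (drop the intercept)
sd_x  <- sapply(data[, c("z_SC", "z_interaction")], sd)  # SDs of the predictors
sd_y  <- sd(data$z_perf)                                 # SD of the outcome

b_std <- b_raw * sd_x / sd_y                             # standardised slopes
b_std
```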
---
# Summary of today

+ Considered the evaluation and interpretation of individual coefficients in lm with multiple predictors.

+ Looked at model evaluation for an lm with multiple predictors.

+ Seen that, broadly, nothing much has changed.
  + Apart from the fact that we must now take account of the presence of other predictors and the correlations between them.

---
class: center, middle

# Thanks for listening!