class: center, middle, inverse, title-slide

# Week 4: LM Assumptions
## Data Analysis for Psychology in R 2
### TOM BOOTH & ALEX DOUMAS
### Department of Psychology<br>The University of Edinburgh
### AY 2020-2021

---
# Week's Learning Objectives
1. Understand the meaning of model coefficients in the case of a binary predictor.
2. Be able to state the assumptions underlying a linear model.
3. Understand how to assess whether a fitted model satisfies the linear model assumptions.
4. Understand how to use transformations when the model violates assumptions.

---
# Topics for today
+ What are the assumptions of the linear model, and how can we test them?
  + Linearity
  + Independence of errors
  + Normality of errors
  + Equal variance (homoscedasticity)

---
# Linear model assumptions
+ So far, we have discussed evaluating linear models with respect to:
  + Overall model fit ( `\(F\)`-ratio, `\(R^2\)` )
  + Individual predictors
+ However, the linear model is also built on a set of assumptions.
+ If these assumptions are violated, the model will not be very accurate.
+ Thus, we also need to test these assumptions.

---
# Today's example
+ Today we will continue using our test score example.
+ As of next week, we will move on to using examples from published papers.

```r
library(tidyverse)  # provides read_csv() and the %>% pipe
df <- read_csv("./dapr2_lec07.csv")
m1 <- lm(score ~ hours, data = df)
m2 <- lm(score ~ study, data = df)
```

+ We do all our assumption testing after fitting the `lm()` model.
+ We will need to use information from the objects `m1` and `m2`.

???
+ This is a good time to make sure we are happy with the idea of objects.
+ `m1` is an `lm()` model object.
+ It contains information about the model we ran, estimates of the residuals, predicted scores, etc.

---
# Visualizations vs tests
+ In talking about assumption checks, we will present statistical tests and visualizations.
+ In general, graphical methods are often more useful:
  + It is easier to see the nature and magnitude of the assumption violation.
  + There is also a very useful function for producing them all.
+ Statistical tests often suggest assumptions are violated even when the problem is small.

???
Make the general point about power and tests.

---
# Visualizations made easy
+ For the majority of assumption and diagnostic plots, we will make use of the `plot()` function.
+ If we give `plot()` a linear model object (e.g. `m1` or `m2`), we can automatically get six useful plots, as sketched below.
+ We will explain these over the next few weeks.
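+ As a minimal sketch of what this looks like (using base R's `par()` to arrange the panels; the `which` argument selects which of the six plots to draw):

```r
par(mfrow = c(2, 3))   # arrange the plots in a 2 x 3 grid
plot(m1, which = 1:6)  # which = 1:6 requests all six diagnostic plots
par(mfrow = c(1, 1))   # reset the plotting layout
```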
---
# Linearity
+ **Assumption**: The relationship between `\(y\)` and `\(x\)` is linear.
  + Assuming a linear relationship when the true relationship is non-linear can result in under-estimating that relationship.
+ **Investigated with**:
  + Scatterplots with loess lines.

---
# Linear vs non-linear
.pull-left[
<img src="dapR2_lec08_LMassumptions_files/figure-html/unnamed-chunk-3-1.png" width="90%" />
]
.pull-right[
<img src="dapR2_lec08_LMassumptions_files/figure-html/unnamed-chunk-4-1.png" width="90%" />
]

---
# What is a loess line?
+ A method for helping to visualize the shape of relationships:
+ Stands for...
  + **LO**cally
  + **E**stimated
  + **S**catterplot
  + **S**moothing
+ Essentially produces a line which follows the data.

---
# Visualization
.pull-left[
```r
lin_m1 <- df %>%
  ggplot(., aes(x = hours, y = score)) +
  geom_point() +
  geom_smooth(method = "lm", se = F) +
* geom_smooth(method = "loess", se = F, col = "red") +
  labs(x = "Hours Study", y = "Test Score",
       title = "Scatterplot with linear (blue) and loess (red) lines")
```
]
.pull-right[
<img src="dapR2_lec08_LMassumptions_files/figure-html/unnamed-chunk-6-1.png" width="90%" />
]

---
# Normally distributed errors
+ **Assumption**: The errors ( `\(\epsilon_i\)` ) are normally distributed around each predicted value.
+ **Investigated with**:
  + QQ-plots
  + Histograms
  + Shapiro-Wilk test

---
# Visualizations
+ **Histograms**: Plot the frequency distribution of the residuals.

```r
hist(m1$residuals)
```

--

+ **Q-Q Plots**: Quantile comparison plots.
  + Plot the standardized residuals from the model against their theoretically expected values.
  + If the residuals are normally distributed, the points should fall neatly on the diagonal of the plot.
  + Non-normally distributed residuals cause deviations of points from the diagonal.
  + The specific shape of these deviations is characteristic of the distribution of the residuals.

```r
*plot(m1, which = 2)
```

---
# Visualizations
.pull-left[
<img src="dapR2_lec08_LMassumptions_files/figure-html/unnamed-chunk-9-1.png" width="90%" />
]
.pull-right[
<img src="dapR2_lec08_LMassumptions_files/figure-html/unnamed-chunk-10-1.png" width="90%" />
]

---
# shapiro.test()
+ The Shapiro-Wilk test provides a significance test of the departure from normality.
+ A significant `\(p\)`-value ( `\(\alpha = .05\)` ) suggests that the residuals deviate from normality.

```r
shapiro.test(m1$residuals)
```

```
## 
##  Shapiro-Wilk normality test
## 
## data:  m1$residuals
## W = 0.99198, p-value = 0.5628
```

---
# Equal variance (Homoscedasticity)
+ **Assumption**: The variance of the errors is constant across values of the predictors `\(x_1\)`, ..., `\(x_k\)`, and across values of the fitted values `\(\hat{y}\)`.
+ Heteroscedasticity refers to when this assumption is violated (non-constant variance).
+ **Investigated with**:
  + Plots of residual values against the predicted values ( `\(\hat{y}\)` ).
  + Breusch-Pagan test (non-constant variance test)

---
# Residual-vs-predicted values plot
+ In R, we can plot the residuals vs predicted values using the `residualPlot()` function from the `car` package.
  + Categorical predictors should show a similar spread of residual values across their levels.
  + The plots for continuous predictors should look like a random array of dots.
  + The solid line should follow the dashed line closely.

```r
library(car)      # residualPlot() is part of the car package
residualPlot(m1)
```

---
# Residual-vs-predicted values plot
.pull-left[
<img src="dapR2_lec08_LMassumptions_files/figure-html/unnamed-chunk-13-1.png" width="90%" />
]
.pull-right[
<img src="dapR2_lec08_LMassumptions_files/figure-html/unnamed-chunk-14-1.png" width="90%" />
]

???
Discuss the right-hand plot for the binary variable.

---
# Breusch-Pagan test
.pull-left[
+ Also called the non-constant variance test.
+ Tests whether the residual variance depends on the predicted values.
+ Implemented using the `ncvTest()` function in R.
+ A non-significant `\(p\)`-value suggests the homoscedasticity assumption holds.
]
.pull-right[
```r
ncvTest(m1)
```

```
## Non-constant Variance Score Test 
## Variance formula: ~ fitted.values 
## Chisquare = 4.975437, Df = 1, p = 0.02571
```
]

---
# Independence of errors
+ **Assumption**: The errors are not correlated with one another.
+ Difficult to test unless we know the potential source of correlation between cases.
+ We can test a limited form of the assumption by testing for autocorrelation between errors.
  + That is, we can test the correlation between each case and adjacent cases in the dataset.
  + Achieved using the Durbin-Watson test.

---
# Durbin-Watson test
+ The Durbin-Watson test is implemented in R using the `durbinWatsonTest()` function, also from the `car` package:

```r
durbinWatsonTest(m1)
```

```
##  lag Autocorrelation D-W Statistic p-value
##    1     -0.07672954      2.148216   0.368
##  Alternative hypothesis: rho != 0
```

+ The D-W statistic can take values between 0 and 4.
  + 2 = no autocorrelation
+ Therefore, we ideally want D-W values close to 2 and a non-significant `\(p\)`-value.
  + Values < 1 or > 3 may indicate problems.

---
class: center, middle

# Time for a break
**And a quiz... identify the plot and the assumption**

---
class: center, middle

# Violated Assumptions
What do we do about non-normality of residuals, heteroscedasticity, and non-linearity?

---
# Non-linear transformations
+ Non-normal residuals, heteroscedasticity, and non-linearity can often be ameliorated by a non-linear transformation of the outcome and/or predictors.
+ This involves applying a function (see first week) to the values of a variable.
  + This changes the values and the overall shape of the distribution.
+ For non-normal residuals and heteroscedasticity, skewed outcomes can be transformed to normality.
+ Non-linearity may be helped by a transformation of both predictors and outcomes; a sketch of the basic workflow is shown below.
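+ As a hypothetical sketch (assuming `score` is strictly positive; `log_score` and `m1_log` are names introduced here purely for illustration), the workflow is simply transform, refit, and re-check:

```r
# A minimal sketch, assuming score contains only positive values
df <- df %>% mutate(log_score = log(score))  # transform the outcome
m1_log <- lm(log_score ~ hours, data = df)   # refit the model
plot(m1_log, which = 2)                      # re-check the QQ-plot of residuals
```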
---
# Transforming variables to normality
+ Positively skewed data can be made more normally distributed using a log-transformation.
+ Negatively skewed data can be made more normally distributed using the same procedure, but first reflecting the variable (making the biggest values the smallest and the smallest values the biggest) and then applying the log-transform.
+ What does skew look like?

---
# Visualizing Skew
.pull-left[
<img src="dapR2_lec08_LMassumptions_files/figure-html/unnamed-chunk-17-1.png" width="90%" />
]
.pull-right[
<img src="dapR2_lec08_LMassumptions_files/figure-html/unnamed-chunk-18-1.png" width="90%" />
]

---
# Log-transformations
+ Log-transformations can be implemented in R using the `log()` function.
+ If your variable contains zero or negative values, you need to first add a constant to make all your values positive.
  + A good strategy is to add a constant so that your minimum value is one.
  + E.g., if your minimum value is -1.5, add 2.5 to all your values.

---
# Log-transformation in action

```r
df_skew <- df_skew %>%
  mutate(
*   log_pos = log(pos),                   # log-transform the positively skewed variable
*   neg_ref = ((-1)*neg) + (max(neg)+1),  # reflect the negatively skewed variable
*   log_neg = log(neg_ref)                # then log-transform the reflected values
  )
```

---
# Log-transformation in action
.pull-left[
<img src="dapR2_lec08_LMassumptions_files/figure-html/unnamed-chunk-20-1.png" width="90%" />
]
.pull-right[
<img src="dapR2_lec08_LMassumptions_files/figure-html/unnamed-chunk-21-1.png" width="90%" />
]

---
# Log-transformation in action
.pull-left[
<img src="dapR2_lec08_LMassumptions_files/figure-html/unnamed-chunk-22-1.png" width="90%" />
]
.pull-right[
<img src="dapR2_lec08_LMassumptions_files/figure-html/unnamed-chunk-23-1.png" width="90%" />
]

---
# Summary of today
+ Looked at the third set of model evaluations: assumptions.
+ Described and considered how to assess:
  + Linearity
  + Independence of errors
  + Normality of errors
  + Equal variance (homoscedasticity)
+ Key take-home point:
  + There are no hard and fast rules for assessing assumptions.
  + It takes practice to judge whether violations are a problem.

---
# Next tasks
+ This week:
  + Complete your lab.
  + Come to office hours.
  + Weekly quiz: assessed quiz on Week 3 content.
    + Opens Monday 09:00
    + Closes Sunday 17:00