Standardized Coefficients

class: center, middle, inverse, title-slide

# Standardized Coefficients
## Data Analysis for Psychology in R 2 
### dapR2 Team
### Department of Psychology, The University of Edinburgh

---

# Week's Learning Objectives
1. Understand the calculation and interpretation of the coefficient of determination.

2. Understand the calculation and interpretation of the F-test of model utility.

3. Understand how to standardize model coefficients and when this is appropriate to do.

4. Understand the relationship between the correlation coefficient and the regression slope.

5. Understand the meaning of model coefficients in the case of a binary predictor.

---
# Topics for today
+ Standardization of coefficients
  + Two ways to calculate standardized coefficients
  + Reasons to standardize
  + Standardized `$\hat \beta_1^*$` and correlation coefficient ( `$r$` )

---
# Unstandardized vs standardized coefficients
+ So far we have calculated unstandardized `$\hat \beta_1$`.

+ We interpreted the slope as the change in `$y$` units for a unit change in `$x$`
  + Where the unit is determined by how we have measured our variables.

+ In our running example:
  + A unit of study time is 1 hour.
  + A unit of test score is 1 point.
  
+ However, sometimes we may want to represent our results in standard units.

---
# Standardized units
+ Why might standard units be useful?

+ **If the scales of our variables are arbitrary.**
  + Example: A sum score of questionnaire items answered on a Likert scale.
  + A unit here would equal moving from a 2 to 3 on one item.
  + This is not especially meaningful (and actually has A LOT of associated assumptions)

+ **If we want to compare the effects of variables on different scales**
  + If we want to say something like "the effect of `$x_1$` is stronger than the effect of `$x_2$`", we need a common scale.

---
# Standardizing the coefficients
+ After calculating a `$\hat \beta_1$`, it can be standardized by:

`$$\hat{\beta_1^*} = \hat \beta_1 \frac{s_x}{s_y}$$`

+ where:
  + `$\hat{\beta_1^*}$` = standardized beta coefficient
  + `$\hat \beta_1$` = unstandardized beta coefficient
  + `$s_x$` = standard deviation of `$x$`
  + `$s_y$` = standard deviation of `$y$`
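
As a minimal sketch of this formula in R, the snippet below fits a simple regression to a small made-up data set and rescales the slope. The data frame `toy` and the helper `standardize_slope()` are hypothetical names introduced for illustration; they are not part of the lecture's data or of any package.

```r
# Hypothetical toy data, invented for illustration only
toy <- data.frame(
  hours = c(1, 2, 3, 4, 5, 6),
  score = c(2, 3, 3, 5, 4, 6)
)

# Hypothetical helper: rescale an unstandardized slope by s_x / s_y
standardize_slope <- function(b, x, y) {
  b * sd(x) / sd(y)
}

fit <- lm(score ~ hours, data = toy)
b1 <- coef(fit)[["hours"]]  # unrounded slope, so no rounding error creeps in

b1_std <- standardize_slope(b1, toy$hours, toy$score)
b1_std
```

Pulling the slope out with `coef()` rather than retyping it from the printed output keeps all of its decimal places.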

---
# Standardizing the variables

+ Alternatively, for continuous variables, transforming both the IV and DV to `$z$`-scores (mean = 0, SD = 1) prior to fitting the model yields standardized betas.

+ `$z$`-score for `$x$`:

`$$z_{x_i} = \frac{x_i - \bar{x}}{s_x}$$`

+ and the `$z$`-score for `$y$`:

`$$z_{y_i} = \frac{y_i - \bar{y}}{s_y}$$`

+ That is, we divide the individual deviations from the mean by the standard deviation
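
The calculation can be sketched by hand in R and checked against the built-in `scale()` function. The vector `x` below holds hypothetical values invented for illustration.

```r
# Hypothetical values, invented for illustration only
x <- c(1, 2, 3, 4, 5, 6)

# z-score by hand: deviation from the mean divided by the SD
z_manual <- (x - mean(x)) / sd(x)

# scale() returns a one-column matrix, so convert it to a plain vector
z_scale <- as.numeric(scale(x))

all.equal(z_manual, z_scale)  # TRUE: the two versions agree
mean(z_manual)                # 0, up to floating-point error
sd(z_manual)                  # 1, up to floating-point error
```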
  
---
# Two approaches in action

```r
res <- lm(score ~ hours, data = test)
summary(res)$coefficients
```

```
##             Estimate Std. Error   t value  Pr(>|t|)
## (Intercept) 0.400000  1.1111010 0.3600033 0.7281636
## hours       1.054545  0.3581403 2.9445039 0.0185812
```

```r
*1.055 * (sd(test$hours)/sd(test$score))
```

```
## [1] 0.7214897
```

---
# Two approaches in action

```r
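library(dplyr)  # provides %>% and mutate()

# z-score both variables: scale() centers on the mean and divides by the SD
test <- test %>%
  mutate(
    z_score = scale(score, center = TRUE, scale = TRUE),
    z_hours = scale(hours, center = TRUE, scale = TRUE)
  )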

*res_z <- lm(z_score ~ z_hours, data = test)
summary(res_z)$coefficients
```

```
##                  Estimate Std. Error       t value  Pr(>|t|)
## (Intercept) -1.208554e-16  0.2323550 -5.201327e-16 1.0000000
## z_hours      7.211789e-01  0.2449237  2.944504e+00 0.0185812
```

+ The first approach gave 0.7215 rather than 0.7212 only because the slope was typed in rounded to 1.055; using the unrounded estimate (1.054545) reproduces 0.7212.

---
#  Interpreting standardized regression coefficients  
+ `$R^2$`, `$F$` and the `$t$`-test for the slope are the same for the standardized coefficients as for the unstandardized coefficients.

+ `$\hat \beta_0^*$` (intercept) = zero when all variables are standardized, because the mean of a `$z$`-scored variable is 0:

`$$\hat \beta_0^* = \bar{z}_y - \hat \beta_1^* \bar{z}_x = 0 - \hat \beta_1^* \times 0 = 0$$`

+ The slope is now interpreted as the change in `$y$`, in standard deviation units, for every one standard deviation increase in `$x$`.

+ So, in our example:

>**For every standard deviation increase in hours of study, test score increases by 0.72 standard deviations**
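
The claims on this slide can be checked with a short sketch. It fits the same model to raw and `$z$`-scored versions of a made-up data set and compares the results. The data frame `toy` and its values are hypothetical, invented for illustration.

```r
# Hypothetical toy data, invented for illustration only
toy <- data.frame(
  hours = c(1, 2, 3, 4, 5, 6),
  score = c(2, 3, 3, 5, 4, 6)
)
toy$z_hours <- as.numeric(scale(toy$hours))
toy$z_score <- as.numeric(scale(toy$score))

raw <- summary(lm(score ~ hours, data = toy))
std <- summary(lm(z_score ~ z_hours, data = toy))

# Model-level statistics are unchanged by standardizing
all.equal(raw$r.squared, std$r.squared)
all.equal(raw$fstatistic[["value"]], std$fstatistic[["value"]])

# So is the t statistic for the slope
all.equal(raw$coefficients["hours", "t value"],
          std$coefficients["z_hours", "t value"])

# The standardized intercept is zero, up to floating-point error
std$coefficients["(Intercept)", "Estimate"]
```

Only the slope's `$t$` statistic is compared here; the intercept's estimate and test do change, since the standardized intercept is fixed at zero.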

---
# Relationship to r
+ Standardized slope ( `$\hat \beta_1^*$` ) = correlation coefficient ( `$r$` ) for a linear model with a single continuous predictor.

+ In our example, `$\hat \beta_{hours}^*$` = 0.72

```r
cor(test$hours, test$score)
```

```
## [1] 0.7211789
```

+ `$r$` is a standardized measure of linear association

+ `$\hat \beta_1^*$` is a standardized measure of the linear slope.
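
The equality can be seen directly from the formulas. Write `$s_{xy}$` for the sample covariance of `$x$` and `$y$`, a symbol introduced here for illustration and not used elsewhere in these slides. Using the standard results that the least-squares slope is `$\hat \beta_1 = s_{xy} / s_x^2$` and that `$r = s_{xy} / (s_x s_y)$`:

`$$\hat \beta_1^* = \hat \beta_1 \frac{s_x}{s_y} = \frac{s_{xy}}{s_x^2} \cdot \frac{s_x}{s_y} = \frac{s_{xy}}{s_x s_y} = r$$`

This holds only for a model with a single predictor; with several predictors the standardized slopes generally differ from the pairwise correlations.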

---
# Which should we use? 
+ Unstandardized regression coefficients are often more useful when the variables are on meaningful scales
	+ E.g. X additional hours of exercise per week adds Y years of healthy life

+ Sometimes it's useful to obtain standardized regression coefficients
	+ When the scales of variables are arbitrary
	+ When there is a desire to compare the effects of variables measured on different scales

+ Cautions
	+ Just because you can put regression coefficients on a common metric doesn't mean they can be meaningfully compared.
	+ The SD is a poor measure of spread for skewed distributions, so be cautious about using standardized coefficients with skewed variables.
	
---
# Summary of today

+ We have looked at how to calculate standardized coefficients.
  + Either after a model has been run
  + Or by z-scoring the variables before fitting the model.

+ When predictors have no meaningful scale, standardizing can help us interpret the effects.

+ But when the scales of the IVs are meaningful, it may be more useful to interpret unstandardized effects.

---
class: center, middle
# Thanks for listening!