class: center, middle, inverse, title-slide #
Week 3: Standardized Coefficients
## Data Analysis for Psychology in R 2
### TOM BOOTH & ALEX DOUMAS ### Department of Psychology
The University of Edinburgh ### AY 2020-2021 --- # Weeks Learning Objectives 1. Understand the calculation and interpretation of the coefficient of determination. 2. Understand the calculation and interpretation of the F-test of model utility. 3. Understand how to standardize model coefficients and when this is appropriate to do. 4. Understand the relationship between the correlation coefficient and the regression slope. --- # Topics for today + Standardization of coefficients + Two ways to calculate standardized coefficients + Reasons to standardize + Standardized `\(\hat \beta_1^*\)` and correlation coefficient ( `\(r\)` ) --- # Unstandardized vs standardized coefficients - So far we have calculated unstandardized `\(\hat \beta_1\)`. + We interpreted the slope as the change in `\(y\)` units for a unit change in `\(x\)` + Where the unit is determined by how we have measured our variables. + In our running example: + A unit of study time is 1 hour. + A unit of test score is 1 point. + However, sometimes we may want to represent our results in standard units. --- # Standardized units + Why might standard units be useful? -- + **If the scales of our variables are arbitrary.** + Example: A sum score of questionnaire items answered on a Likert scale. + A unit here would equal moving from a 2 to 3 on one item. + This is not especially meaningful (and actually has A LOT of associated assumptions) -- + **If we want to compare the effects of variables on different scales** + If we want to say something like, the effect of `\(x_1\)` is stronger than the effect of `\(x_2\)`, we need a common scale. --- # Standardizing the coefficients + After calculating a `\(\hat \beta_1\)`, it can be standardized by: `$$\hat{\beta_1^*} = \hat \beta_1 \frac{s_x}{s_y}$$` + where; + `\(\hat{\beta_1^*}\)` = standardized beta coefficient + `\(\hat \beta_1\)` = unstandardized beta coefficient + `\(s_x\)` = standard deviation of `\(x\)` + `\(s_y\)` = standard deviation of `\(y\)` --- # Standardizing the variables + Alternatively, for continuous variables, transforming both the IV and DV to `\(z\)`-scores (mean=0, SD=1) prior to fitting the model yields standardised betas. + `\(z\)`-score for `\(x\)`: `$$z_{x_i} = \frac{x_i - \bar{x}}{s_x}$$` + and the `\(z\)`-score for `\(y\)`: `$$z_{y_i} = \frac{y_i - \bar{y}}{s_y}$$` + That is, we divide the individual deviations from the mean by the standard deviation --- # Two approaches in action ```r res <- lm(score ~ hours, data = test) summary(res)$coefficients ``` ``` ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 0.400000 1.1111010 0.3600033 0.7281636 ## hours 1.054545 0.3581403 2.9445039 0.0185812 ``` ```r *1.055 * (sd(test$hours)/sd(test$score)) ``` ``` ## [1] 0.7214897 ``` --- # Two approaches in action ```r test <- test %>% mutate( z_score = scale(score, center = T, scale = T), z_hours = scale(hours, center = T, scale = T) ) *res_z <- lm(z_score ~ z_hours, data = test) summary(res_z)$coefficients ``` ``` ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -1.208554e-16 0.2323550 -5.201327e-16 1.0000000 ## z_hours 7.211789e-01 0.2449237 2.944504e+00 0.0185812 ``` --- # Interpreting standardized regression coefficients + `\(R^2\)` , `\(F\)` and `\(t\)`-test remain the same for the standardized coefficients as for unstandardised coefficients. + `\(b_0\)` (intercept) = zero when all variables are standardized: $$ \bar{y} - \hat \beta_1 \bar{x} = 0 - \hat \beta_1 0 = 0 $$ + The interpretation of the coefficients becomes the increase in `\(y\)` in standard deviation units for every standard deviation increase in `\(x\)` + So, in our example: >**For every standard deviation increase in hours of study, test score increases by 0.72 standard deviations** --- # Relationship to r + Standardized slope ( `\(\hat \beta_1^*\)` ) = correlation coefficient ( `\(r\)` ) for a linear model with a single continuous predictor. + In our example, `\(\hat \beta_{hours}^*\)` = 0.72 ```r cor(test$hours, test$score) ``` ``` ## [1] 0.7211789 ``` + `\(r\)` is a standardized measure of linear association + `\(\hat \beta_1^*\)` is a standardized measure of the linear slope. --- # Which should we use? + Unstandardized regression coefficients are often more useful when the variables are on meaningful scales + E.g. X additional hours of exercise per week adds Y years of healthy life + Sometimes it's useful to obtain standardized regression coefficients + When the scales of variables are arbitrary + When there is a desire to compare the effects of variables measured on different scales + Cautions + Just because you can put regression coefficients on a common metric doesn't mean they can be meaningfully compared. + The SD is a poor measure of spread for skewed distributions, therefore, be cautious of their use with skewed variables --- # Summary of today + We have looked at how to calculate standardized coefficients. + Either after a model has been run + Or do z-scoring the predictors. + When predictors have no meaningful scale, standardizing can help us interpret the effects. + But when the scale of the IVs are meaningful, it may be more useful to interpret unstandardized effects. --- # Next tasks + This week: + Complete your lab + Come to office hours + Weekly quiz: First assessed quiz - Week 2 content. + Open Monday 09:00 + Closes Sunday 17:00