class: center, middle, inverse, title-slide

# LM multiple predictors
## Data Analysis for Psychology in R 2
### dapR2 Team
### Department of Psychology
### The University of Edinburgh

---
# Week's Learning Objectives
1. Understand how to extend a simple regression to multiple predictors.
2. Understand and interpret the coefficients in multiple linear regression models.
3. Understand how to include and interpret models with categorical variables with 2+ levels.

---
# Topics for today
+ Introducing additional predictors

+ Evaluation of the overall model

+ Evaluation of individual predictors

---
# Multiple regression
+ The aim of a linear model is to explain variance in an outcome.

+ In simple linear models, we have a single predictor, but the model can accommodate (in principle) any number of predictors.

+ However, when we include multiple predictors, those predictors are likely to correlate.

+ Thus, a linear model with multiple predictors finds the optimal prediction of the outcome from several predictors, **taking into account their redundancy with one another**.

---
# Uses of multiple regression
+ **For prediction:** multiple predictors may lead to improved prediction.

+ **For theory testing:** often our theories suggest that multiple variables together contribute to variation in an outcome.

+ **For covariate control:** we might want to assess the effect of a specific predictor, controlling for the influence of others.
  + E.g., effects of personality on health after removing the effects of age and sex.

---
# Extending the regression model

+ Our model for a single predictor:

`$$y_i = \beta_0 + \beta_1 x_{1i} + \epsilon_i$$`

+ is extended to include additional `\(x\)`'s:

`$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \epsilon_i$$`

+ For each additional `\(x\)`, we estimate an additional `\(\beta\)`:
  + `\(\beta_1\)` is the coefficient for the 1st predictor.
  + `\(\beta_2\)` for the second, etc.

---
# Interpreting coefficients in multiple regression

`$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_j x_{ji} + \epsilon_i$$`

+ Given that we have additional variables, our interpretation of the regression coefficients changes a little.

+ `\(\beta_0\)` = the predicted value for `\(y\)` when **all** `\(x\)`'s are 0.

+ Each `\(\beta_j\)` is now a **partial regression coefficient**.
  + It captures the change in `\(y\)` for a one unit change in `\(x_j\)` **when all other `\(x\)`'s are held constant**.

+ What does holding constant mean?
  + It refers to finding the effect of a predictor when the values of the other predictors are fixed.
  + It may also be expressed as the effect of **controlling for**, or **partialling out**, or **residualizing for** the other `\(x\)`'s (a short code sketch of this idea follows in a couple of slides).

+ With multiple predictors, `lm` isolates these effects and estimates the unique contribution of each predictor.

---
# Visualizing models

.pull-left[
![](dapr2_08_LM_multiple_files/figure-html/unnamed-chunk-2-1.png)<!-- -->
]

.pull-right[
<img src="./lm_surface.png" width="1276" />
]

???
+ In simple linear models, we could visualise the model as a straight line in 2D space.
  + Least squares finds the coefficients that produce the *regression line* that minimises the vertical distances of the observed y-values from the line.
+ In a regression with 2 predictors, this becomes a regression plane in 3D space.
  + The goal now becomes finding the set of coefficients that minimises the vertical distances between the *regression plane* and the observed y-values.
+ The logic extends to any number of predictors.
  + (but becomes very difficult to visualise!)
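---
# Optional: what "holding constant" looks like in code

+ Below is a minimal sketch with **simulated data** (not the course dataset) to illustrate partialling out: the slope for `x1` in a model containing both predictors equals the slope from regressing `y` on the residuals of `lm(x1 ~ x2)`, i.e. the part of `x1` not shared with `x2`.

```r
# Simulate two correlated predictors and an outcome (illustration only)
set.seed(42)
n  <- 500
x2 <- rnorm(n)
x1 <- 0.6 * x2 + rnorm(n)            # x1 correlates with x2
y  <- 0.4 * x1 + 0.3 * x2 + rnorm(n)

coef(lm(y ~ x1 + x2))["x1"]          # partial slope for x1

x1_resid <- resid(lm(x1 ~ x2))       # x1 with x2 "partialled out"
coef(lm(y ~ x1_resid))["x1_resid"]   # same value as the partial slope above
```

+ This is why a partial regression coefficient is described as the effect of a predictor *after residualizing for* the other predictors.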
---
# Example: lm with 2 predictors

+ Imagine we were interested in examining predictors of school performance.
  + We get a teacher rating of the child's performance, a self-report measure of self-control, and a teacher rating of class interaction.

+ We collect data on a sample of n=650 12-year-olds and fit a linear model.

+ We'll fit the model to `\(z\)`-scores for all variables.
  + Remember, `\(z\)`-scores have a mean of 0 and an SD of 1.
  + So "1 unit" of a `\(z\)`-score is 1 SD.

---
# `lm` code

```r
*perf <- lm(z_perf ~ z_SC + z_interaction, data = data)
```

+ Multiple predictors are separated by `+`

---
# Multiple regression coefficients

```r
summary(perf)
```

```
## 
## Call:
## lm(formula = z_perf ~ z_SC + z_interaction, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.3169 -0.5840 -0.0989  0.5284  3.9093 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    9.822e-17  3.436e-02   0.000    1.000    
## z_SC           4.839e-01  3.444e-02  14.048   <2e-16 ***
## z_interaction -1.175e-02  3.444e-02  -0.341    0.733    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8761 on 647 degrees of freedom
## Multiple R-squared:  0.2349, Adjusted R-squared:  0.2325 
## F-statistic: 99.32 on 2 and 647 DF,  p-value: < 2.2e-16
```

---
# Multiple regression coefficients

```r
res <- summary(perf)
res$coefficients
```

```
##                    Estimate Std. Error       t value     Pr(>|t|)
## (Intercept)    9.822069e-17 0.03436173  2.858432e-15 1.000000e+00
## z_SC           4.838522e-01 0.03444282  1.404798e+01 2.558540e-39
## z_interaction -1.175418e-02 0.03444282 -3.412664e-01 7.330139e-01
```

+ **Controlling for class interaction, for every SD unit increase in self-control, there is a 0.48 SD unit increase in academic performance.**

---
# Multiple regression coefficients

```r
res <- summary(perf)
res$coefficients
```

```
##                    Estimate Std. Error       t value     Pr(>|t|)
## (Intercept)    9.822069e-17 0.03436173  2.858432e-15 1.000000e+00
## z_SC           4.838522e-01 0.03444282  1.404798e+01 2.558540e-39
## z_interaction -1.175418e-02 0.03444282 -3.412664e-01 7.330139e-01
```

+ **Controlling for self-control, for every SD unit increase in the rating of class interaction, there is a 0.01 SD unit decrease in academic performance (the coefficient is -0.01).**

---
class: center, middle

# Time for a break

**Quiz time!**

---
class: center, middle

# Welcome Back!

**Where we left off...**

We gave an overview of `lm` with multiple predictors.

Now we will look at model evaluation.

---
# `\(R^2\)`

+ Like in simple regression, we use `\(R^2\)` for overall model evaluation.

+ The sums of squares used to calculate `\(R^2\)` are defined in the same way as for simple regression:

`$$SS_{Total} = \sum_{i=1}^{n}(y_i - \bar{y})^2$$`

`$$SS_{Residual} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$`

`$$SS_{Model} = SS_{Total} - SS_{Residual}$$`

???
+ The only difference is that the model predictions `\(\hat{y}_i\)` are now based on multiple IVs.

---
# `\(R^2\)`

+ `\(R^2\)` is then calculated in the same way as in simple regression:

`$$R^2 = 1 - \frac{SS_{Residual}}{SS_{Total}}$$`

+ or

`$$R^2 = \frac{SS_{Model}}{SS_{Total}}$$`

---
# `\(R^2\)` interpretation

+ `\(R^2\)` = the proportion of variation in the outcome accounted for by the model (all the predictors together).

+ Its square root is now the multiple correlation coefficient between the predictors and the outcome.
  + The multiple correlation coefficient summarizes the relationship between `\(y\)` and the set of `\(x\)`'s.

+ Equivalently, `\(R^2\)` is the squared correlation between the observed and the predicted `\(y\)` values (a short code check follows on the next slide).
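---
# `\(R^2\)` in code

+ A minimal sketch (assuming the `perf` model and `data` object used throughout these slides) showing that building `\(R^2\)` from the sums of squares, or from the correlation between observed and fitted values, matches the `summary()` output.

```r
ss_total    <- sum((data$z_perf - mean(data$z_perf))^2)  # SS_Total
ss_residual <- sum(resid(perf)^2)                        # SS_Residual
ss_model    <- ss_total - ss_residual                    # SS_Model

1 - ss_residual / ss_total          # R^2 via 1 - SS_Residual/SS_Total
ss_model / ss_total                 # same value via SS_Model/SS_Total
cor(data$z_perf, fitted(perf))^2    # squared observed-fitted correlation
summary(perf)$r.squared             # lm's reported value (0.2349)
```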
---
# Adjusted `\(R^2\)`

+ We can also compute an adjusted `\(R^2\)` when our lm has 2+ predictors.
  + `\(R^2\)` is an inflated estimate of the corresponding population value.
  + Due to random sampling fluctuation, even when `\(R^2 = 0\)` in the population, its value in the sample may be `\(\neq 0\)`.
  + In **smaller samples**, the fluctuations from zero will be larger on average.
  + With **more IVs**, there are more opportunities to add to the positive fluctuation.

`$$\hat R^2 = 1 - (1 - R^2)\frac{N-1}{N-k-1}$$`

+ Adjusted `\(R^2\)` adjusts for both sample size ( `\(N\)` ) and number of predictors ( `\(k\)` ).

---
# In our academic performance example

```r
res
```

```
## 
## Call:
## lm(formula = z_perf ~ z_SC + z_interaction, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.3169 -0.5840 -0.0989  0.5284  3.9093 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    9.822e-17  3.436e-02   0.000    1.000    
## z_SC           4.839e-01  3.444e-02  14.048   <2e-16 ***
## z_interaction -1.175e-02  3.444e-02  -0.341    0.733    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8761 on 647 degrees of freedom
## Multiple R-squared:  0.2349, Adjusted R-squared:  0.2325 
## F-statistic: 99.32 on 2 and 647 DF,  p-value: < 2.2e-16
```

---
# In our academic performance example

+ **Based on adjusted R-squared, self-control and class interaction together explain 23.3% of the variance in academic performance.**

+ As the sample size is large and the number of predictors small, the unadjusted ( 0.235 ) and adjusted ( 0.233 ) R-squared values are similar.

---
# `\(F\)`-ratio

+ Like in simple regression, the `\(F\)`-ratio is used to test the null hypothesis that **all** model slopes are zero.

+ It is calculated in exactly the same way as in the simple linear model:

`$$F = \frac{MS_{Model}}{MS_{Residual}} = \frac{\frac{SS_{Model}}{df_{Model}}}{\frac{SS_{Residual}}{df_{Residual}}}$$`

+ Where
  + `\(df_{Model} = k\)`
  + `\(df_{Residual} = N - k - 1\)`
  + `\(N\)` = sample size
  + `\(k\)` = number of predictors

---
# In our academic performance example?

```r
res
```

```
## 
## Call:
## lm(formula = z_perf ~ z_SC + z_interaction, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.3169 -0.5840 -0.0989  0.5284  3.9093 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    9.822e-17  3.436e-02   0.000    1.000    
## z_SC           4.839e-01  3.444e-02  14.048   <2e-16 ***
## z_interaction -1.175e-02  3.444e-02  -0.341    0.733    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8761 on 647 degrees of freedom
## Multiple R-squared:  0.2349, Adjusted R-squared:  0.2325 
## F-statistic: 99.32 on 2 and 647 DF,  p-value: < 2.2e-16
```

---
# In our academic performance example?

+ Our overall model was significant (*F(2,647)=99.32, p<.001*); see the short code check on the next slide.
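---
# `\(F\)`-ratio and adjusted `\(R^2\)` in code

+ A minimal sketch (again assuming the `perf` model and `data` from earlier) building the `\(F\)`-ratio, its p-value, and the adjusted `\(R^2\)` directly from the formulas above.

```r
n <- nrow(data)   # 650
k <- 2            # number of predictors

ss_total    <- sum((data$z_perf - mean(data$z_perf))^2)
ss_residual <- sum(resid(perf)^2)
ss_model    <- ss_total - ss_residual

F_stat <- (ss_model / k) / (ss_residual / (n - k - 1))
F_stat                                                    # ~ 99.32
pf(F_stat, df1 = k, df2 = n - k - 1, lower.tail = FALSE)  # p < .001

r2 <- ss_model / ss_total
1 - (1 - r2) * (n - 1) / (n - k - 1)                      # adjusted R^2, ~ 0.233
```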
---
class: center, middle

# Time for a break

**No specific task this time**

But use this as a chance to post some questions on the discussion board.

---
class: center, middle

# Welcome Back!

**Where we left off...**

We have evaluated the overall model...

...now it is time for the individual predictors.

---
# Evaluating individual predictors

+ This broadly follows the same procedure as in simple regression:
  + Standard errors (SEs) for each regression slope are computed.
  + The SE gives a measure of the sampling variability of a regression coefficient.
  + `\(t\)`-tests and confidence intervals evaluate the statistical significance of the regression slopes.

---
# Standard errors

.pull-left[
`$$SE(\hat \beta_j) = \sqrt{\frac{SS_{Residual}/(n-k-1)}{\sum(x_i - \bar{x})^2}}$$`
]

.pull-right[
`$$SE(\hat \beta_j) = \sqrt{\frac{SS_{Residual}/(n-k-1)}{\sum(x_{ij} - \bar{x}_j)^2(1-R_{x_j}^2)}}$$`

+ `\(R_{x_j}^2\)` is the `\(R^2\)` from regressing `\(x_j\)` on all the other `\(x\)`'s, so `\(1-R_{x_j}^2\)` captures how much of `\(x_j\)` is independent of the other predictors.
]

---
# Standard errors

`$$SE(\hat \beta_j) = \sqrt{\frac{SS_{Residual}/(n-k-1)}{\sum(x_{ij} - \bar{x}_j)^2(1-R_{x_j}^2)}}$$`

+ Examining the above formula we can see that:
  + `\(SE\)` is smaller when the residual variance ( `\(SS_{Residual}\)` ) is smaller.
  + `\(SE\)` is smaller when the sample size ( `\(n\)` ) is larger.
  + `\(SE\)` is larger when the number of predictors ( `\(k\)` ) is larger.
  + `\(SE\)` is larger when a predictor is strongly correlated with the other predictors ( `\(R_{x_j}^2\)` ).

???
+ We'll return to this later when we discuss multicollinearity issues.

---
# Significance of coefficients

+ Once we have the standard error, all else is the same:

`$$t = \frac{\hat \beta_j}{SE(\hat \beta_j)}$$`

+ This is a `\(t\)`-test of the null hypothesis that `\(\beta_j = 0\)`.

+ The `\(t\)`-value is compared to a `\(t\)`-distribution with `\(n-k-1\)` degrees of freedom to assess statistical significance at a given `\(\alpha\)`.

---
# Our academic performance example

```r
res$coefficients
```

```
##                    Estimate Std. Error       t value     Pr(>|t|)
## (Intercept)    9.822069e-17 0.03436173  2.858432e-15 1.000000e+00
## z_SC           4.838522e-01 0.03444282  1.404798e+01 2.558540e-39
## z_interaction -1.175418e-02 0.03444282 -3.412664e-01 7.330139e-01
```

**Self-control (t(647)=14.05, p<.001) was a significant predictor of academic performance ( `\(\alpha = 0.05\)` ), and so we reject the null hypothesis of no effect. However, we failed to reject the null for class interaction (t(647)=-0.34, p>.05).**

---
# Confidence intervals

+ Like in simple regression, we can also compute confidence intervals for slopes in multiple regression.

+ The `\(100(1-\alpha)\)`% confidence interval for a slope is:

`$$\hat \beta_j \pm t^* \times SE(\hat \beta_j)$$`

```r
tibble(
  lower = res$coefficients[2,1] - round(qt(0.975, 647), 3) * res$coefficients[2,2],
  upper = res$coefficients[2,1] + round(qt(0.975, 647), 3) * res$coefficients[2,2]
)
```

```
## # A tibble: 1 x 2
##   lower upper
##   <dbl> <dbl>
## 1 0.416 0.551
```

---
# Confidence intervals

+ Or the much easier version...

```r
confint(perf)
```

```
##                     2.5 %     97.5 %
## (Intercept)   -0.06747398 0.06747398
## z_SC           0.41621897 0.55148537
## z_interaction -0.07938738 0.05587903
```

---
# Standardising coefficients

+ Lastly, standardisation follows the same steps as we have already seen:
  + Variables can be `\(z\)`-scored prior to analysis (our example).
  + Or coefficients can be standardised after model estimation (see the sketch on the next slide).

+ However, unlike in a simple lm, with multiple predictors the standardised regression coefficient for an IV is not equal to its Pearson correlation with `\(y\)`.
  + It is instead closely related to the **partial** correlation coefficient.
  + That is, the correlation between a predictor and the outcome holding all the other predictors constant.
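---
# Standardising after estimation

+ A possible post-hoc recipe (a sketch, assuming the `perf` model and `data` from earlier): multiply each raw slope by `\(SD(x_j)/SD(y)\)`. Because all variables in our example were `\(z\)`-scored before fitting, the standardised and raw slopes are essentially identical here.

```r
b_raw <- coef(perf)[-1]                                  # raw slopes (drop the intercept)
sd_x  <- sapply(data[, c("z_SC", "z_interaction")], sd)  # SDs of the predictors
sd_y  <- sd(data$z_perf)                                 # SD of the outcome

b_std <- b_raw * sd_x / sd_y                             # standardised slopes
b_std
```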
---
# Summary of today

+ Considered the evaluation and interpretation of individual coefficients in lm with multiple predictors.

+ Looked at model evaluation for an lm with multiple predictors.

+ Seen that, broadly, nothing much has changed.
  + Apart from the fact that we must now take account of the presence of other predictors and the correlations between them.

---
class: center, middle

# Thanks for listening!