LM multiple predictors
## Data Analysis for Psychology in R 2
### dapR2 Team

### Department of Psychology
The University of Edinburgh

---
# Week's Learning Objectives

1. Understand how to extend a simple regression to multiple predictors.

2. Understand and interpret the coefficients in multiple linear regression models.

3. 
    + Introducing additional predictors
    + Evaluation of the overall model
    + Evaluation of individual predictors

---
# Multiple regression

+ The aim of a linear model is to explain variance in an outcome.

+ In simple linear models, we have a single predictor, but the model can accommodate (in principle) any number of predictors.

+ However, when we include multiple predictors, those predictors are likely to correlate.

+ Thus, a linear model with multiple predictors finds the optimal prediction of the outcome from several predictors, **taking into account their redundancy with one another**.

---
# Uses of multiple regression

+ **For prediction:** multiple predictors may lead to improved prediction.

+ **For theory testing:** often our theories suggest that multiple variables together contribute to variation in an outcome.

+ **For covariate control:** we might want to assess the effect of a specific predictor, controlling for the influence of others.
  + E.g., effects of personality on health after removing the effects of age and sex.

---
# Extending the regression model

+ Our model for a single predictor:

`$$y_i = \beta_0 + \beta_1 x_{1i} + \epsilon_i$$`

+ is extended to include additional `\(x\)`'s:

`$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \epsilon_i$$`

+ For each `\(x\)`, we have an additional `\(\beta\)`.
  + `\(\beta_1\)` is the coefficient for the 1st predictor.
  + `\(\beta_2\)` for the second, etc.

---
# Interpreting coefficients in multiple regression

`$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_j x_{ji} + \epsilon_i$$`

+ Given that we have additional variables, our interpretation of the regression coefficients changes a little.

+ `\(\beta_0\)` = the predicted value for `\(y\)` when **all** `\(x\)` are 0.
+ Each `\(\beta_j\)` is now a **partial regression coefficient**.
  + It captures the change in `\(y\)` for a one-unit change in `\(x_j\)` **when all other `\(x\)`'s are held constant**.

+ What does holding constant mean?
  + It refers to finding the effect of the predictor when the values of the other predictors are fixed.
  + It may also be expressed as the effect of **controlling for**, **partialling out**, or **residualizing for** the other `\(x\)`'s.

+ With multiple predictors, `lm` isolates the effects and estimates the unique contribution of each predictor.

---
# Visualizing models

.pull-left[
]

.pull-right[
<img src="./lm_surface.png" width="1276" />
]

???
+ In simple linear models, we could visualise the model as a straight line in 2D space
  + Least squares finds the coefficients that produce the *regression line* that minimises the sum of squared vertical distances of the observed y-values from the line

+ In a regression with 2 predictors, this becomes a regression plane in 3D space
  + The goal now becomes finding the set of coefficients that minimises the sum of squared vertical distances between the *regression plane* and the observed y-values

+ The logic extends to any number of predictors
  + (but becomes very difficult to visualise!)

---
# Example: lm with 2 predictors

+ Imagine we were interested in examining predictors of school performance.
  + We collect a teacher rating of each child's performance, a self-report measure of self-control, and a teacher rating of class interaction.

+ We collect data on a sample of n = 650 12-year-olds and fit a linear model.

+ We'll fit the model to `\(z\)`-scores for all variables.
  + Remember `\(z\)`-scores have a mean of 0 and an SD of 1.
  + So "1 unit" of a `\(z\)`-score is 1 SD.

---
# `lm` code

```r
perf <- lm(z_perf ~ z_SC + z_interaction, data = data)
```

+ Multiple predictors are separated by `+`.

---
# Multiple regression coefficients

```r
summary(perf)
```

```
## 
## Call:
## lm(formula = z_perf ~ z_SC + z_interaction, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.3169 -0.5840 -0.0989  0.5284  3.9093 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    9.822e-17  3.436e-02   0.000    1.000    
## z_SC           4.839e-01  3.444e-02  14.048   <2e-16 ***
## z_interaction -1.175e-02  3.444e-02  -0.341    0.733    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8761 on 647 degrees of freedom
## Multiple R-squared:  0.2349,  Adjusted R-squared:  0.2325 
## F-statistic: 99.32 on 2 and 647 DF,  p-value: < 2.2e-16
```

---
# Multiple regression coefficients

```r
res <- summary(perf)
res$coefficients
```

```
##                    Estimate Std. Error       t value     Pr(>|t|)
## (Intercept)    9.822069e-17 0.03436173  2.858432e-15 1.000000e+00
## z_SC           4.838522e-01 0.03444282  1.404798e+01 2.558540e-39
## z_interaction -1.175418e-02 0.03444282 -3.412664e-01 7.330139e-01
```

+ **Controlling for class interaction, for every SD unit increase in self-control, there is a 0.48 SD unit increase in academic performance**

---
# Multiple regression coefficients

```r
res <- summary(perf)
res$coefficients
```

```
##                    Estimate Std. Error       t value     Pr(>|t|)
## (Intercept)    9.822069e-17 0.03436173  2.858432e-15 1.000000e+00
## z_SC           4.838522e-01 0.03444282  1.404798e+01 2.558540e-39
## z_interaction -1.175418e-02 0.03444282 -3.412664e-01 7.330139e-01
```

+ **Controlling for self-control, for every SD unit increase in rating of class interaction, there is a 0.01 SD unit decrease in academic performance**

---
# `\(R^2\)`

+ Like in simple regression, we use `\(R^2\)` for overall model evaluation.

+ The sums of squares used to calculate `\(R^2\)` are defined in the same way as for simple regression.

`$$SS_{total} = \sum_{i=1}^{n}(y_i - \bar{y})^2$$`

`$$SS_{residual} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$`

`$$SS_{model} = SS_{total} - SS_{residual}$$`

???
+ The only difference is that `\(\hat{y}_i\)` is now based on multiple IVs.

---
# `\(R^2\)`

+ `\(R^2\)` is then calculated in the same way as in simple regression:

`$$R^2 = 1 - \frac{SS_{residual}}{SS_{total}}$$`

+ or

`$$R^2 = \frac{SS_{model}}{SS_{total}}$$`

---
# `\(R^2\)` interpretation

+ `\(R^2\)` = the proportion of variation in the outcome accounted for by the model (all the predictors)

+ Its square root, `\(R\)`, is now the multiple correlation coefficient between the predictors and the outcome
  + The multiple correlation coefficient summarizes the shared relationship between `\(y\)` and a set of variables (the `\(x\)`'s)
  + It is the correlation between the observed `\(y\)` and predicted `\(\hat{y}\)` values, so `\(R^2\)` is their squared correlation

---
# Adjusted `\(R^2\)`

+ We can also compute an adjusted `\(R^2\)` when our lm has 2+ predictors.
  + `\(R^2\)` is an inflated estimate of the corresponding population value

+ Due to random sampling fluctuation, even when `\(R^2 = 0\)` in the population, its value in the sample may be `\(\neq 0\)`

+ In **smaller samples**, the fluctuations from zero will be larger on average

+ With **more IVs**, there are more opportunities to add to the positive fluctuation

`$$\hat R^2 = 1 - (1 - R^2)\frac{N-1}{N-k-1}$$`

+ Adjusted `\(R^2\)` adjusts for both sample size ( `\(N\)` ) and number of predictors ( `\(k\)` )

---
# In our academic performance example

```r
res
```

```
## 
## Call:
## lm(formula = z_perf ~ z_SC + z_interaction, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.3169 -0.5840 -0.0989  0.5284  3.9093 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    9.822e-17  3.436e-02   0.000    1.000    
## z_SC           4.839e-01  3.444e-02  14.048   <2e-16 ***
## z_interaction -1.175e-02  3.444e-02  -0.341    0.733    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8761 on 647 degrees of freedom
## Multiple R-squared:  0.2349,  Adjusted R-squared:  0.2325 
## F-statistic: 99.32 on 2 and 647 DF,  p-value: < 2.2e-16
```

---
# In our academic performance example

+ **Based on adjusted R-squared, self-control and class interaction together explain 23.3% of the variance in academic performance**
  + As the sample size is large and the number of predictors small, unadjusted ( 0.235 ) and adjusted R-squared ( 0.233 ) are similar.

---
# `\(F\)`-ratio

+ Like in simple regression, the `\(F\)`-ratio is used to test the null hypothesis that **all** model slopes are zero.

+ It is calculated in exactly the same way as in the simple linear model:

`$$F = \frac{MS_{model}}{MS_{residual}} = \frac{SS_{model}/df_{model}}{SS_{residual}/df_{residual}}$$`

+ Where
  + `\(df_{model} = k\)`
  + `\(df_{residual} = N - k - 1\)`
    + `\(N\)` = sample size
    + `\(k\)` = number of predictors

---
# In our academic performance example

```r
res
```

```
## 
## Call:
## lm(formula = z_perf ~ z_SC + z_interaction, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.3169 -0.5840 -0.0989  0.5284  3.9093 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    9.822e-17  3.436e-02   0.000    1.000    
## z_SC           4.839e-01  3.444e-02  14.048   <2e-16 ***
## z_interaction -1.175e-02  3.444e-02  -0.341    0.733    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8761 on 647 degrees of freedom
## Multiple R-squared:  0.2349,  Adjusted R-squared:  0.2325 
## F-statistic: 99.32 on 2 and 647 DF,  p-value: < 2.2e-16
```

+ Our overall model was significant (*F(2,647)=99.32, p<.001*).
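
---
# Checking `\(R^2\)`, adjusted `\(R^2\)` and `\(F\)` by hand

The formulas above can be checked against `summary()` in a few lines of R. This is a minimal sketch that assumes **simulated** data: the names `x1`, `x2`, `y`, the sample size and the population slopes are hypothetical choices for illustration, not the school-performance data used above. It computes the sums of squares directly, then `\(R^2\)`, adjusted `\(R^2\)`, `\(F\)` and the multiple correlation `\(R\)`, and prints them next to the values `lm` reports.

```r
# Minimal sketch on SIMULATED data: x1, x2, y and all generating values
# are hypothetical, not the academic performance data from the slides.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)            # make the two predictors correlate
y  <- 0.4 * x1 + 0.1 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)
k   <- 2                             # number of predictors

# Sums of squares, defined as for simple regression
ss_total    <- sum((y - mean(y))^2)
ss_residual <- sum(residuals(fit)^2)
ss_model    <- ss_total - ss_residual

r2     <- 1 - ss_residual / ss_total
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)
f      <- (ss_model / k) / (ss_residual / (n - k - 1))
p_f    <- pf(f, df1 = k, df2 = n - k - 1, lower.tail = FALSE)
r_mult <- cor(y, fitted(fit))        # multiple correlation R

# Hand calculations (first row) next to lm's own values (second row)
s <- summary(fit)
rbind(by_hand = c(R2 = r2, adj_R2 = adj_r2, F = f),
      lm      = c(s$r.squared, s$adj.r.squared, s$fstatistic[["value"]]))

# R^2 should equal the squared correlation of observed and fitted y
c(R_squared = r_mult^2, R2 = r2, p_F = p_f)
```

If the hand calculations match the formulas, each pair of values should agree up to floating-point rounding, and `R_squared` should equal `R2`.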
---
# Evaluating individual predictors

+ Broadly follows the same procedure as in simple regression:
  + Standard errors (SEs) for each regression slope are computed
  + The SE gives a measure of the sampling variability of a regression coefficient
  + `\(t\)`-tests and confidence intervals evaluate the statistical significance of regression slopes

---
# Standard errors

.pull-left[
`$$SE(\hat \beta_j) = \sqrt{\frac{ SS_{residual}/(N-k-1)}{\sum(x_i - \bar{x})^2}}$$`
]

.pull-right[
`$$SE(\hat \beta_j) = \sqrt{\frac{ SS_{residual}/(N-k-1)}{\sum(x_{ij} - \bar{x}_j)^2(1-R_{xj}^2)}}$$`

+ `\(R_{xj}^2\)` is the `\(R^2\)` from regressing `\(x_j\)` on all other `\(x\)`'s, so `\(1-R_{xj}^2\)` captures how much of `\(x_j\)` is *not* shared with the other predictors
]

---
# Standard errors

`$$SE(\hat \beta_j) = \sqrt{\frac{ SS_{residual}/(N-k-1)}{\sum(x_{ij} - \bar{x}_j)^2(1-R_{xj}^2)}}$$`

+ Examining the above formula we can see that:
  + `\(SE\)` is smaller when residual variance ( `\(SS_{residual}\)` ) is smaller
  + `\(SE\)` is smaller when sample size ( `\(N\)` ) is larger
  + `\(SE\)` is larger when the number of predictors ( `\(k\)` ) is larger
  + `\(SE\)` is larger when a predictor is strongly correlated with other predictors ( `\(R_{xj}^2\)` )

???
+ We'll return to this later when we discuss multicollinearity issues

---
# Significance of coefficients

+ Once we have the standard error, all else is the same:

`$$t = \frac{\hat \beta_j}{SE(\hat \beta_j)}$$`

+ A `\(t\)`-test of the null hypothesis that `\(\beta_j = 0\)`

+ The `\(t\)`-value is compared to a `\(t\)`-distribution with `\(N-k-1\)` degrees of freedom to assess statistical significance at a given `\(\alpha\)`.

---
# Our academic performance example

```r
res$coefficients
```

```
##                    Estimate Std. Error       t value     Pr(>|t|)
## (Intercept)    9.822069e-17 0.03436173  2.858432e-15 1.000000e+00
## z_SC           4.838522e-01 0.03444282  1.404798e+01 2.558540e-39
## z_interaction -1.175418e-02 0.03444282 -3.412664e-01 7.330139e-01
```

**Self-control (t(647)=14.05, p<.001) was a significant predictor of academic performance ( `\(\alpha = 0.05\)` ), and so we reject the null hypothesis of no effect.**
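
---
# Checking standard errors and `\(t\)`-values by hand

A minimal sketch in R, again assuming **simulated** data (the names `x1`, `x2`, `y` and the generating values are hypothetical, not the study data). It computes `\(R_{x1}^2\)` by regressing `x1` on the other predictor, plugs it into the standard-error formula, and derives `\(t\)` and its two-sided `\(p\)`-value on `\(N-k-1\)` degrees of freedom, for comparison with the row `lm` reports. The last lines illustrate what "holding constant" means: the slope of `y` on the residualized `x1` (the part of `x1` not shared with `x2`) should match the partial coefficient from the full model.

```r
# Minimal sketch on SIMULATED data: x1, x2, y and all generating values
# are hypothetical, not the academic performance data from the slides.
set.seed(2)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)            # correlated predictors
y  <- 0.5 * x1 - 0.2 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)
k   <- 2                             # number of predictors

# R^2_{x1}: R-squared from regressing x1 on all other predictors
r2_x1 <- summary(lm(x1 ~ x2))$r.squared

# SE formula for a partial slope
ss_residual <- sum(residuals(fit)^2)
se_x1 <- sqrt((ss_residual / (n - k - 1)) /
              (sum((x1 - mean(x1))^2) * (1 - r2_x1)))

# t-test of H0: beta_1 = 0 on n - k - 1 df (two-sided)
b_x1 <- coef(fit)[["x1"]]
t_x1 <- b_x1 / se_x1
p_x1 <- 2 * pt(abs(t_x1), df = n - k - 1, lower.tail = FALSE)

# Hand calculations (first row) next to lm's own row for x1
cf <- coef(summary(fit))
rbind(by_hand = c(b_x1, se_x1, t_x1, p_x1),
      lm      = cf["x1", ])

# "Holding x2 constant": regress y on the part of x1 not shared with x2
x1_resid <- residuals(lm(x1 ~ x2))
c(partial_slope      = b_x1,
  residualized_slope = coef(lm(y ~ x1_resid))[["x1_resid"]])
```

Increasing the `0.6` that links `x2` to `x1` raises `r2_x1`, which shrinks `1 - r2_x1` and inflates `se_x1`, as the formula predicts.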