class: center, middle, inverse, title-slide

# Week 3: Model evaluation
## Data Analysis for Psychology in R 2
### Tom Booth & Alex Doumas

### Department of Psychology
### The University of Edinburgh

### AY 2020-2021

---
# Week's Learning Objectives

1. Understand the calculation and interpretation of the coefficient of determination.
2. Understand the calculation and interpretation of the F-test of model utility.
3. Understand how to standardize model coefficients and when this is appropriate to do.
4. Understand the relationship between the correlation coefficient and the regression slope.

---
# Topics for today

+ Overall model evaluation
+ Coefficient of determination ( `\(R^2\)` )
+ F-test for the model

---
# Quality of the overall model

+ When we measure an outcome ( `\(y\)` ) in some data, the scores will vary (we hope).
  + Variation in `\(y\)` = total variation of interest.

--

+ The aim of our linear model is to describe `\(y\)` as a function of `\(x\)`.
  + That is, we are trying to explain variation in `\(y\)` using `\(x\)`.

--

+ But the model won't explain all of that variation.
  + What is left unexplained is called the residual variance.

--

+ So we can break down the variation in our data into sums of squares:

`$$SS_{Total} = SS_{Model} + SS_{Residual}$$`

---
# Coefficient of determination

+ One way to evaluate how good our model is would be to consider the proportion of the total variance it accounts for.

`$$R^2 = \frac{SS_{Model}}{SS_{Total}} = 1 - \frac{SS_{Residual}}{SS_{Total}}$$`

+ `\(R^2\)` = coefficient of determination

--

+ Quantifies the amount of variability in the outcome accounted for by the predictors.
  + The more variance accounted for, the better.
  + Represents the extent to which the prediction of `\(y\)` is improved when predictions are based on the linear relation between `\(x\)` and `\(y\)`.

--

+ Let's see how it works.
  + To do so, we need to calculate the different sums of squares.

---
# Total Sum of Squares

.pull-left[
+ Sums of squares quantify different sources of variation.

`$$SS_{Total} = \sum_{i=1}^{n}(y_i - \bar{y})^2$$`

+ Squared distance of each data point from the mean of `\(y\)`.

+ The mean is our baseline.
  + Without any other information, our best guess at the value of `\(y\)` for any person is the mean.
]

.pull-right[
<img src="dapR2_lec05_LMmodeleval_files/figure-html/unnamed-chunk-2-1.png" width="90%" />
]

---
# Calculations

.pull-left[

```r
ss_tab <- test %>%
  mutate(
    y_dev = score - mean(score),
    y_dev2 = y_dev^2
  )
```

```r
ss_tab %>%
  summarize(
    ss_tot = sum(y_dev2)
  )
```

```
## # A tibble: 1 x 1
##   ss_tot
##    <dbl>
## 1   44.1
```
]

.pull-right[
<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> student </th> <th style="text-align:right;"> hours </th> <th style="text-align:right;"> score </th> <th style="text-align:right;"> y_dev </th> <th style="text-align:right;"> y_dev2 </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> ID1 </td> <td style="text-align:right;"> 0.5 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> -2.3 </td> <td style="text-align:right;"> 5.29 </td> </tr>
<tr> <td style="text-align:left;"> ID2 </td> <td style="text-align:right;"> 1.0 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> -0.3 </td> <td style="text-align:right;"> 0.09 </td> </tr>
<tr> <td style="text-align:left;"> ID3 </td> <td style="text-align:right;"> 1.5 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> -2.3 </td> <td style="text-align:right;"> 5.29 </td> </tr>
<tr> <td style="text-align:left;"> ID4 </td> <td style="text-align:right;"> 2.0 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> -1.3 </td> <td style="text-align:right;"> 1.69 </td> </tr>
<tr> <td style="text-align:left;"> ID5 </td> <td style="text-align:right;"> 2.5 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> -1.3 </td> <td style="text-align:right;"> 1.69 </td> </tr>
<tr> <td style="text-align:left;"> ID6 </td> <td style="text-align:right;"> 3.0 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 2.7 </td> <td style="text-align:right;"> 7.29 </td> </tr>
<tr> <td style="text-align:left;"> ID7 </td> <td style="text-align:right;"> 3.5 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> -0.3 </td> <td style="text-align:right;"> 0.09 </td> </tr>
<tr> <td style="text-align:left;"> ID8 </td> <td style="text-align:right;"> 4.0 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> -0.3 </td> <td style="text-align:right;"> 0.09 </td> </tr>
<tr> <td style="text-align:left;"> ID9 </td> <td style="text-align:right;"> 4.5 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 0.7 </td> <td style="text-align:right;"> 0.49 </td> </tr>
<tr> <td style="text-align:left;"> ID10 </td> <td style="text-align:right;"> 5.0 </td> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 4.7 </td> <td style="text-align:right;"> 22.09 </td> </tr>
</tbody>
</table>
]
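---
# A quick check in base R

+ The same calculation can be done in one line. A minimal sketch, assuming the `test` data from our running example is loaded:

```r
# SS_total: sum of squared deviations of each score from the mean score
sum((test$score - mean(test$score))^2)
```

+ This should match the `ss_tot` value above (44.1).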
---
# Residual sum of squares

.pull-left[
+ Sums of squares quantify different sources of variation.

`$$SS_{Residual} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$`

+ Which you may recognise.

+ Squared distance of each point from the predicted value.
]

.pull-right[
<img src="dapR2_lec05_LMmodeleval_files/figure-html/unnamed-chunk-6-1.png" width="90%" />
]

---
# Calculations

.pull-left[

```r
ss_tab <- ss_tab %>%
  mutate(
    y_pred = round(res$fitted.values, 2),
    pred_dev = round((score - y_pred), 2),
    pred_dev2 = round(pred_dev^2, 2)
  )
```

```r
ss_tab %>%
  summarize(
    ss_tot = sum(y_dev2),
*   ss_resid = sum(pred_dev2)
  )
```

```
## # A tibble: 1 x 2
##   ss_tot ss_resid
##    <dbl>    <dbl>
## 1   44.1     21.2
```
]

.pull-right[
<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> student </th> <th style="text-align:right;"> hours </th> <th style="text-align:right;"> score </th> <th style="text-align:right;"> y_pred </th> <th style="text-align:right;"> pred_dev </th> <th style="text-align:right;"> pred_dev2 </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> ID1 </td> <td style="text-align:right;"> 0.5 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0.93 </td> <td style="text-align:right;"> 0.07 </td> <td style="text-align:right;"> 0.00 </td> </tr>
<tr> <td style="text-align:left;"> ID2 </td> <td style="text-align:right;"> 1.0 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 1.45 </td> <td style="text-align:right;"> 1.55 </td> <td style="text-align:right;"> 2.40 </td> </tr>
<tr> <td style="text-align:left;"> ID3 </td> <td style="text-align:right;"> 1.5 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1.98 </td> <td style="text-align:right;"> -0.98 </td> <td style="text-align:right;"> 0.96 </td> </tr>
<tr> <td style="text-align:left;"> ID4 </td> <td style="text-align:right;"> 2.0 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 2.51 </td> <td style="text-align:right;"> -0.51 </td> <td style="text-align:right;"> 0.26 </td> </tr>
<tr> <td style="text-align:left;"> ID5 </td> <td style="text-align:right;"> 2.5 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 3.04 </td> <td style="text-align:right;"> -1.04 </td> <td style="text-align:right;"> 1.08 </td> </tr>
<tr> <td style="text-align:left;"> ID6 </td> <td style="text-align:right;"> 3.0 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 3.56 </td> <td style="text-align:right;"> 2.44 </td> <td style="text-align:right;"> 5.95 </td> </tr>
<tr> <td style="text-align:left;"> ID7 </td> <td style="text-align:right;"> 3.5 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 4.09 </td> <td style="text-align:right;"> -1.09 </td> <td style="text-align:right;"> 1.19 </td> </tr>
<tr> <td style="text-align:left;"> ID8 </td> <td style="text-align:right;"> 4.0 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 4.62 </td> <td style="text-align:right;"> -1.62 </td> <td style="text-align:right;"> 2.62 </td> </tr>
<tr> <td style="text-align:left;"> ID9 </td> <td style="text-align:right;"> 4.5 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 5.15 </td> <td style="text-align:right;"> -1.15 </td> <td style="text-align:right;"> 1.32 </td> </tr>
<tr> <td style="text-align:left;"> ID10 </td> <td style="text-align:right;"> 5.0 </td> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 5.67 </td> <td style="text-align:right;"> 2.33 </td> <td style="text-align:right;"> 5.43 </td> </tr>
</tbody>
</table>
]
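---
# A quick check in base R

+ Again, a minimal sketch of the same quantity, assuming the model `res <- lm(score ~ hours, data = test)` has been fitted:

```r
# SS_residual: sum of squared residuals (observed minus fitted values)
sum(resid(res)^2)
```

+ This should agree with `ss_resid` above (21.2), up to the rounding we introduced when working "by hand".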
---
# Model sums of squares

.pull-left[
+ Sums of squares quantify different sources of variation.

`$$SS_{Model} = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2$$`

+ That is, the sum of squared deviations of the predicted values from the mean of `\(y\)`.

+ But it is easier to simply take:

`$$SS_{Model} = SS_{Total} - SS_{Residual}$$`
]

.pull-right[
<img src="dapR2_lec05_LMmodeleval_files/figure-html/unnamed-chunk-10-1.png" width="90%" />
]

---
# Calculations

.pull-left[
`$$SS_{Model} = SS_{Total} - SS_{Residual}$$`

```r
ss_tab %>%
  summarize(
    ss_tot = sum(y_dev2),
    ss_resid = sum(pred_dev2)
  ) %>%
* mutate(
*   ss_mod = ss_tot - ss_resid
  )
```

```
## # A tibble: 1 x 3
##   ss_tot ss_resid ss_mod
##    <dbl>    <dbl>  <dbl>
## 1   44.1     21.2   22.9
```
]

.pull-right[
<img src="dapR2_lec05_LMmodeleval_files/figure-html/unnamed-chunk-12-1.png" width="90%" />
]
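---
# A quick check in base R

+ One more minimal sketch: `\(SS_{Model}\)` computed directly from its definition, assuming `res` and `test` as before:

```r
# SS_model: sum of squared deviations of the fitted values from the mean score
sum((fitted(res) - mean(test$score))^2)
```

+ This should agree with `ss_mod` above (22.9), up to rounding.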
---
# Coefficient of determination

+ Now we can finally come back to `\(R^2\)`.

`$$R^2 = 1 - \frac{SS_{Residual}}{SS_{Total}}$$`

+ Or

`$$R^2 = \frac{SS_{Model}}{SS_{Total}}$$`

+ So in our example:

`$$R^2 = \frac{SS_{Model}}{SS_{Total}} = \frac{22.9}{44.1} = 0.519$$`

+ **`\(R^2\)` = 0.519 means that 52% of the variation in test scores is accounted for by hours of revision.**

---
# Our example

```r
res <- lm(score ~ hours, data = test)
summary(res)
```

```
## 
## Call:
## lm(formula = score ~ hours, data = test)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.6182 -1.0773 -0.7454  1.1773  2.4364 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   0.4000     1.1111   0.360   0.7282  
## hours         1.0545     0.3581   2.945   0.0186 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.626 on 8 degrees of freedom
## Multiple R-squared:  0.5201,	Adjusted R-squared:  0.4601 
## F-statistic:  8.67 on 1 and 8 DF,  p-value: 0.01858
```

???
As at the end of last session, we can check this against the R output.

Be sure to flag the small rounding differences that come from working "by hand" and presenting to fewer decimal places.

---
class: center, middle
# Time for a break

**Quiz time!**

---
class: center, middle
# Welcome Back!

**Where we left off...**

We had just calculated `\(R^2\)`.

Now let's look at calculating significance tests for our model.

---
# Significance of the overall model

+ The test of the individual predictors (IVs, or `\(x\)`'s) does not tell us if the overall model is significant or not.
  + Neither does `\(R^2\)`.
  + But both are indicative.

+ To test the significance of the model as a whole, we conduct an `\(F\)`-test.

---
# F-ratio

+ The `\(F\)`-ratio tests the null hypothesis that all the regression slopes in a model are zero.
  + We are currently talking about a model with only one `\(x\)`, thus one slope.
  + But the `\(F\)`-ratio test will generalise.

--

+ The `\(F\)`-ratio is a ratio of the explained to unexplained variance:

`$$F = \frac{MS_{Model}}{MS_{Residual}}$$`

+ Where `\(MS\)` = mean squares.

--

+ **What are mean squares?**
  + Mean squares are sums of squares divided by the associated degrees of freedom.
  + The degrees of freedom are defined by the number of "independent" values associated with the different calculations.
  + The next slide works through these numbers for our example.
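---
# Mean squares: a worked sketch

+ A minimal worked example for our data ( `\(n = 10\)` students, `\(k = 1\)` predictor), using the sums of squares calculated earlier:

```r
# Degrees of freedom
df_mod <- 1           # k (number of predictors)
df_res <- 10 - 1 - 1  # n - k - 1 = 8

# Mean squares = SS / df
ms_mod <- 22.9 / df_mod  # 22.9
ms_res <- 21.2 / df_res  # 2.65

# F-ratio = MS model / MS residual
ms_mod / ms_res          # approximately 8.64
```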
---
# F-table

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> Component </th> <th style="text-align:left;"> df </th> <th style="text-align:left;"> MS </th> <th style="text-align:left;"> F-ratio </th> <th style="text-align:left;"> p-value </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Model </td> <td style="text-align:left;"> k </td> <td style="text-align:left;"> SS model / df model </td> <td style="text-align:left;"> MS model / MS residual </td> <td style="text-align:left;"> F(df model, df residual) </td> </tr>
<tr> <td style="text-align:left;"> Residual </td> <td style="text-align:left;"> n-k-1 </td> <td style="text-align:left;"> SS residual / df residual </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> </tr>
<tr> <td style="text-align:left;"> Total </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> </tr>
</tbody>
</table>

---
# Our example: F-table

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> Component </th> <th style="text-align:right;"> SS </th> <th style="text-align:left;"> df </th> <th style="text-align:left;"> MS </th> <th style="text-align:left;"> F-ratio </th> <th style="text-align:left;"> p-value </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Model </td> <td style="text-align:right;"> 22.9 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 22.9 </td> <td style="text-align:left;"> 8.641509 </td> <td style="text-align:left;"> F(1,8) </td> </tr>
<tr> <td style="text-align:left;"> Residual </td> <td style="text-align:right;"> 21.2 </td> <td style="text-align:left;"> 8 </td> <td style="text-align:left;"> 2.65 </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> </tr>
<tr> <td style="text-align:left;"> Total </td> <td style="text-align:right;"> 44.1 </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> </tr>
</tbody>
</table>

---
# F-ratio

+ Bigger `\(F\)`-ratios indicate better models.
  + This means the model variance is big compared to the residual variance.

--

+ The null hypothesis for the model says that the best guess of any individual's `\(y\)` value is the mean of `\(y\)` plus error.
  + Or, that the `\(x\)` variables carry no information collectively about `\(y\)`.

--

+ The `\(F\)`-ratio will be close to 1 when the null hypothesis is true.
  + If model and residual variation are equivalent, `\(F\)` = 1.
  + If there is more model than residual variation, `\(F\)` > 1.

--

+ The `\(F\)`-ratio is then evaluated against an `\(F\)`-distribution with `\(df_{Model}\)` and `\(df_{Residual}\)` degrees of freedom and a pre-defined `\(\alpha\)`.

--

+ Testing the `\(F\)`-ratio evaluates the statistical significance of the overall model.

---
# Visualize the test

.pull-left[
<img src="dapR2_lec05_LMmodeleval_files/figure-html/unnamed-chunk-16-1.png" width="90%" />
]

.pull-right[
+ Critical value and `\(p\)`-value:

```r
tibble(
  Crit = round(qf(0.95, 1, 8), 3),
  Exactp = 1 - pf(8.64, 1, 8)
)
```

```
## # A tibble: 1 x 2
##    Crit Exactp
##   <dbl>  <dbl>
## 1  5.32 0.0187
```

+ From this we would **reject the null**.
]
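---
# The shortcut in practice

+ We rarely build the F-table by hand: base R's `anova()` prints the sums of squares, degrees of freedom, mean squares, `\(F\)`-ratio, and `\(p\)`-value for a fitted model.

```r
# F-table for the fitted model; matches our by-hand table up to rounding
anova(res)
```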
---
# Our example

```r
res <- lm(score ~ hours, data = test)
summary(res)
```

```
## 
## Call:
## lm(formula = score ~ hours, data = test)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.6182 -1.0773 -0.7454  1.1773  2.4364 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   0.4000     1.1111   0.360   0.7282  
## hours         1.0545     0.3581   2.945   0.0186 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.626 on 8 degrees of freedom
## Multiple R-squared:  0.5201,	Adjusted R-squared:  0.4601 
## F-statistic:  8.67 on 1 and 8 DF,  p-value: 0.01858
```

???
As at the end of last session, we can check this against the R output.

Comment on the minor differences due to rounding.

---
# Summary of today

+ We have looked at evaluating the overall model.
+ `\(R^2\)`, the coefficient of determination, tells us how much of the total variance is explained by our model.
+ The `\(F\)`-ratio, or `\(F\)`-test, provides a significance test of the overall model.

---
# Next tasks

+ This week:
  + Complete your lab
  + Come to office hours
  + Weekly quiz: first assessed quiz - Week 2 content.
      + Opens Monday 09:00
      + Closes Sunday 17:00