class: center, middle, inverse, title-slide

# Model evaluation
## Data Analysis for Psychology in R 2
### dapR2 Team
### Department of Psychology
The University of Edinburgh --- # Week's Learning Objectives 1. Understand the calculation and interpretation of the coefficient of determination. 2. Understand the calculation and interpretation of the F-test of model utility. 3. Understand how to standardize model coefficients and when this is appropriate to do. 4. Understand the relationship between the correlation coefficient and the regression slope. 5. Understand the meaning of model coefficients in the case of a binary predictor. --- # Topics for today + Overall model evaluation + Coefficient of determination ( `\(R^2\)` ) + F-test for the model --- # Quality of the overall model + When we measure an outcome ( `\(y\)` ) in some data, the scores will vary (we hope). + Variation in `\(y\)` = total variation of interest. -- + The aim of our linear model is to build a model which describes `\(y\)` as a function of `\(x\)`. + That is, we are trying to explain variation in `\(y\)` using `\(x\)`. -- + But it won't explain it all. + What is left unexplained is called the residual variance. -- + So we can break down variation in our data into sums of squares: `$$SS_{Total} = SS_{Model} + SS_{Residual}$$` --- # Coefficient of determination + One way to evaluate how good our model is would be to consider the proportion of total variance our model accounts for. `$$R^2 = \frac{SS_{Model}}{SS_{Total}} = 1 - \frac{SS_{Residual}}{SS_{Total}}$$` + `\(R^2\)` = coefficient of determination -- + Quantifies the amount of variability in the outcome accounted for by the predictors. + The more variance accounted for, the better. + Represents the extent to which the prediction of `\(y\)` is improved when predictions are based on the linear relation between `\(x\)` and `\(y\)`. -- + Let's see how it works. + To do so, we need to calculate the different sums of squares. --- # Total Sum of Squares .pull-left[ + Sums of squares quantify different sources of variation.
`$$SS_{Total} = \sum_{i=1}^{n}(y_i - \bar{y})^2$$` + Squared distance of each data point from the mean of `\(y\)`. + Mean is our baseline. + Without any other information, our best guess at the value of `\(y\)` for any person is the mean. ] .pull-right[ <img src="dapr2_05_LMmodeleval_files/figure-html/unnamed-chunk-2-1.png" width="90%" /> ] --- # Calculations .pull-left[ ```r ss_tab <- test %>% mutate( y_dev = score - mean(score), y_dev2 = y_dev^2 ) ``` ```r ss_tab %>% summarize( ss_tot = sum(y_dev2) ) ``` ``` ## # A tibble: 1 x 1 ## ss_tot ## <dbl> ## 1 44.1 ``` ] .pull-right[ <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> student </th> <th style="text-align:right;"> hours </th> <th style="text-align:right;"> score </th> <th style="text-align:right;"> y_dev </th> <th style="text-align:right;"> y_dev2 </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> ID1 </td> <td style="text-align:right;"> 0.5 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> -2.3 </td> <td style="text-align:right;"> 5.29 </td> </tr> <tr> <td style="text-align:left;"> ID2 </td> <td style="text-align:right;"> 1.0 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> -0.3 </td> <td style="text-align:right;"> 0.09 </td> </tr> <tr> <td style="text-align:left;"> ID3 </td> <td style="text-align:right;"> 1.5 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> -2.3 </td> <td style="text-align:right;"> 5.29 </td> </tr> <tr> <td style="text-align:left;"> ID4 </td> <td style="text-align:right;"> 2.0 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> -1.3 </td> <td style="text-align:right;"> 1.69 </td> </tr> <tr> <td style="text-align:left;"> ID5 </td> <td style="text-align:right;"> 2.5 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> -1.3 </td> <td style="text-align:right;"> 1.69 </td> 
</tr> <tr> <td style="text-align:left;"> ID6 </td> <td style="text-align:right;"> 3.0 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 2.7 </td> <td style="text-align:right;"> 7.29 </td> </tr> <tr> <td style="text-align:left;"> ID7 </td> <td style="text-align:right;"> 3.5 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> -0.3 </td> <td style="text-align:right;"> 0.09 </td> </tr> <tr> <td style="text-align:left;"> ID8 </td> <td style="text-align:right;"> 4.0 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> -0.3 </td> <td style="text-align:right;"> 0.09 </td> </tr> <tr> <td style="text-align:left;"> ID9 </td> <td style="text-align:right;"> 4.5 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 0.7 </td> <td style="text-align:right;"> 0.49 </td> </tr> <tr> <td style="text-align:left;"> ID10 </td> <td style="text-align:right;"> 5.0 </td> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 4.7 </td> <td style="text-align:right;"> 22.09 </td> </tr> </tbody> </table> ] --- # Residual sum of squares .pull-left[ + Sums of squares quantify different sources of variation. `$$SS_{Residual} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$` + Which you may recognise as the sum of the squared residuals. + Squared distance of each point from the predicted value.
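+ As a cross-check, we could compute this directly from a fitted model. A minimal sketch, rebuilding the example data shown in the tables (the same `test` data and `res` model used throughout these slides):

```r
# rebuild the example data from the slide tables
test <- data.frame(
  hours = seq(0.5, 5, by = 0.5),
  score = c(1, 3, 1, 2, 2, 6, 3, 3, 4, 8)
)
res <- lm(score ~ hours, data = test)

# residuals are observed minus fitted values;
# squaring and summing gives SS_Residual
sum(resid(res)^2)
```

+ This agrees with the hand calculation (21.2) up to rounding.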
] .pull-right[ <img src="dapr2_05_LMmodeleval_files/figure-html/unnamed-chunk-6-1.png" width="90%" /> ] --- # Calculations .pull-left[ ```r ss_tab <- ss_tab %>% mutate( y_pred = round(res$fitted.values,2), pred_dev = round((score - y_pred),2), pred_dev2 = round(pred_dev^2,2) ) ``` ```r ss_tab %>% summarize( ss_tot = sum(y_dev2), * ss_resid = sum(pred_dev2) ) ``` ``` ## # A tibble: 1 x 2 ## ss_tot ss_resid ## <dbl> <dbl> ## 1 44.1 21.2 ``` ] .pull-right[ <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> student </th> <th style="text-align:right;"> hours </th> <th style="text-align:right;"> score </th> <th style="text-align:right;"> y_pred </th> <th style="text-align:right;"> pred_dev </th> <th style="text-align:right;"> pred_dev2 </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> ID1 </td> <td style="text-align:right;"> 0.5 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0.93 </td> <td style="text-align:right;"> 0.07 </td> <td style="text-align:right;"> 0.00 </td> </tr> <tr> <td style="text-align:left;"> ID2 </td> <td style="text-align:right;"> 1.0 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 1.45 </td> <td style="text-align:right;"> 1.55 </td> <td style="text-align:right;"> 2.40 </td> </tr> <tr> <td style="text-align:left;"> ID3 </td> <td style="text-align:right;"> 1.5 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1.98 </td> <td style="text-align:right;"> -0.98 </td> <td style="text-align:right;"> 0.96 </td> </tr> <tr> <td style="text-align:left;"> ID4 </td> <td style="text-align:right;"> 2.0 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 2.51 </td> <td style="text-align:right;"> -0.51 </td> <td style="text-align:right;"> 0.26 </td> </tr> <tr> <td style="text-align:left;"> ID5 </td> <td style="text-align:right;"> 2.5 </td> <td 
style="text-align:right;"> 2 </td> <td style="text-align:right;"> 3.04 </td> <td style="text-align:right;"> -1.04 </td> <td style="text-align:right;"> 1.08 </td> </tr> <tr> <td style="text-align:left;"> ID6 </td> <td style="text-align:right;"> 3.0 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 3.56 </td> <td style="text-align:right;"> 2.44 </td> <td style="text-align:right;"> 5.95 </td> </tr> <tr> <td style="text-align:left;"> ID7 </td> <td style="text-align:right;"> 3.5 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 4.09 </td> <td style="text-align:right;"> -1.09 </td> <td style="text-align:right;"> 1.19 </td> </tr> <tr> <td style="text-align:left;"> ID8 </td> <td style="text-align:right;"> 4.0 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 4.62 </td> <td style="text-align:right;"> -1.62 </td> <td style="text-align:right;"> 2.62 </td> </tr> <tr> <td style="text-align:left;"> ID9 </td> <td style="text-align:right;"> 4.5 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 5.15 </td> <td style="text-align:right;"> -1.15 </td> <td style="text-align:right;"> 1.32 </td> </tr> <tr> <td style="text-align:left;"> ID10 </td> <td style="text-align:right;"> 5.0 </td> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 5.67 </td> <td style="text-align:right;"> 2.33 </td> <td style="text-align:right;"> 5.43 </td> </tr> </tbody> </table> ] --- # Model sums of squares .pull-left[ + Sums of squares quantify different sources of variation. `$$SS_{Model} = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2$$` + That is, the sum of the squared deviations of the predicted scores from the mean of `\(y\)`.
+ But it is easier to simply take: `$$SS_{Model} = SS_{Total} - SS_{Residual}$$` ] .pull-right[ <img src="dapr2_05_LMmodeleval_files/figure-html/unnamed-chunk-10-1.png" width="90%" /> ] --- # Calculations .pull-left[ `$$SS_{Model} = SS_{Total} - SS_{Residual}$$` ```r ss_tab %>% summarize( ss_tot = sum(y_dev2), ss_resid = sum(pred_dev2) ) %>% * mutate( * ss_mod = ss_tot - ss_resid ) ``` ``` ## # A tibble: 1 x 3 ## ss_tot ss_resid ss_mod ## <dbl> <dbl> <dbl> ## 1 44.1 21.2 22.9 ``` ] .pull-right[ <img src="dapr2_05_LMmodeleval_files/figure-html/unnamed-chunk-12-1.png" width="90%" /> ] --- # Coefficient of determination + Now we can finally come back to `\(R^2\)`. `$$R^2 = 1 - \frac{SS_{Residual}}{SS_{Total}}$$` + Or `$$R^2 = \frac{SS_{Model}}{SS_{Total}}$$` + So in our example: `$$R^2 = \frac{SS_{Model}}{SS_{Total}} = \frac{22.9}{44.1} = 0.519$$` + ** `\(R^2\)` = 0.519 means that approximately 52% of the variation in test scores is accounted for by hours of revision.** --- # Our example ```r res <- lm(score ~ hours, data = test) summary(res) ``` ``` ## ## Call: ## lm(formula = score ~ hours, data = test) ## ## Residuals: ## Min 1Q Median 3Q Max ## -1.6182 -1.0773 -0.7454 1.1773 2.4364 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 0.4000 1.1111 0.360 0.7282 ## hours 1.0545 0.3581 2.945 0.0186 * ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 1.626 on 8 degrees of freedom ## Multiple R-squared: 0.5201, Adjusted R-squared: 0.4601 ## F-statistic: 8.67 on 1 and 8 DF, p-value: 0.01858 ``` ??? As at the end of the last session, we can check this against the R output. Be sure to flag the small rounding differences that come from working "by hand" and presenting fewer decimal places. --- class: center, middle # Time for a break **Quiz time!** --- class: center, middle # Welcome Back! **Where we left off... 
** We had just calculated `\(R^2\)`. Now let's look at calculating significance tests for our model. --- # Significance of the overall model + The test of the individual predictors (IVs, or `\(x\)`'s) does not tell us if the overall model is significant or not. + Neither does `\(R^2\)`. + But both are indicative. + To test the significance of the model as a whole, we conduct an `\(F\)`-test. --- # F-ratio + The `\(F\)`-ratio tests the null hypothesis that all the regression slopes in a model are zero. + We are currently talking about a model with only one `\(x\)`, thus one slope. + But the `\(F\)`-ratio test will generalise. -- + The `\(F\)`-ratio is a ratio of the explained to unexplained variance: `$$F = \frac{MS_{Model}}{MS_{Residual}}$$` + Where MS = mean squares -- + **What are mean squares?** + Mean squares are sums of squares divided by their associated degrees of freedom. + The degrees of freedom are defined by the number of "independent" values associated with the different calculations.
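+ Putting these pieces together for our example, a minimal sketch of the calculation in R, using the sums of squares worked out earlier ( `\(SS_{Model} = 22.9\)`, `\(SS_{Residual} = 21.2\)`, with `\(k = 1\)` predictor and `\(n = 10\)` observations):

```r
ss_mod   <- 22.9   # model sum of squares (from the earlier slides)
ss_resid <- 21.2   # residual sum of squares
k <- 1             # number of predictors
n <- 10            # sample size

ms_mod   <- ss_mod / k               # mean square for the model, df = k
ms_resid <- ss_resid / (n - k - 1)   # mean square for the residual, df = n-k-1
ms_mod / ms_resid                    # F-ratio, approximately 8.64
```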
--- # F-table <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Component </th> <th style="text-align:left;"> df </th> <th style="text-align:left;"> MS </th> <th style="text-align:left;"> F-ratio </th> <th style="text-align:left;"> Distribution </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Model </td> <td style="text-align:left;"> k </td> <td style="text-align:left;"> SS model/df model </td> <td style="text-align:left;"> MS model/ MS residual </td> <td style="text-align:left;"> F(df model, df residual) </td> </tr> <tr> <td style="text-align:left;"> Residual </td> <td style="text-align:left;"> n-k-1 </td> <td style="text-align:left;"> SS residual/df residual </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> </tr> <tr> <td style="text-align:left;"> Total </td> <td style="text-align:left;"> n-1 </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> </tr> </tbody> </table> --- # Our example: F-table <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Component </th> <th style="text-align:right;"> SS </th> <th style="text-align:left;"> df </th> <th style="text-align:left;"> MS </th> <th style="text-align:left;"> F-ratio </th> <th style="text-align:left;"> Distribution </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Model </td> <td style="text-align:right;"> 22.9 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 22.9 </td> <td style="text-align:left;"> 8.64 </td> <td style="text-align:left;"> F(1,8) </td> </tr> <tr> <td style="text-align:left;"> Residual </td> <td style="text-align:right;"> 21.2 </td> <td style="text-align:left;"> 8 </td> <td style="text-align:left;"> 2.65 </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> </tr> <tr> <td
style="text-align:left;"> Total </td> <td style="text-align:right;"> 44.1 </td> <td style="text-align:left;"> 9 </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> </tr> </tbody> </table> --- # F-ratio + Bigger `\(F\)`-ratios indicate better models. + It means the model variance is large compared to the residual variance. -- + The null hypothesis for the model says that the best guess of any individual's `\(y\)` value is the mean of `\(y\)` plus error. + Or, that the `\(x\)` variables carry no information collectively about `\(y\)`. -- + The `\(F\)`-ratio will be close to 1 when the null hypothesis is true. + If there is equivalent residual to model variation, `\(F\)`=1 + If there is more model than residual variation, `\(F\)` > 1 -- + The `\(F\)`-ratio is then evaluated against an `\(F\)`-distribution with `\(df_{Model}\)` and `\(df_{Residual}\)` and a pre-defined `\(\alpha\)` -- + Testing the `\(F\)`-ratio evaluates the statistical significance of the overall model --- # Visualize the test .pull-left[ <img src="dapr2_05_LMmodeleval_files/figure-html/unnamed-chunk-16-1.png" width="90%" /> ] .pull-right[ + Critical value and `\(p\)`-value: ```r tibble( Crit = round(qf(0.95, 1, 8),3), Exactp = 1-pf(8.64, 1, 8) ) ``` ``` ## # A tibble: 1 x 2 ## Crit Exactp ## <dbl> <dbl> ## 1 5.32 0.0187 ``` + From this we would **reject the null**. ] --- # Our example ```r res <- lm(score ~ hours, data = test) summary(res) ``` ``` ## ## Call: ## lm(formula = score ~ hours, data = test) ## ## Residuals: ## Min 1Q Median 3Q Max ## -1.6182 -1.0773 -0.7454 1.1773 2.4364 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 0.4000 1.1111 0.360 0.7282 ## hours 1.0545 0.3581 2.945 0.0186 * ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 1.626 on 8 degrees of freedom ## Multiple R-squared: 0.5201, Adjusted R-squared: 0.4601 ## F-statistic: 8.67 on 1 and 8 DF, p-value: 0.01858 ``` ??? 
As at the end of the last session, we can check this against the R output. Comment on the minor rounding differences. --- # Summary of today + We have looked at evaluating the overall model. + `\(R^2\)`, or the coefficient of determination, tells us how much of the total variance is explained by our model + The `\(F\)`-ratio, or `\(F\)`-test, provides a significance test of the overall model --- class: center, middle # Thanks for listening!