class: center, middle, inverse, title-slide

# Week 3: Model evaluation
## Data Analysis for Psychology in R 2
### Tom Booth & Alex Doumas

### Department of Psychology
### The University of Edinburgh

### AY 2020-2021

---
# Week's Learning Objectives

1. Understand the calculation and interpretation of the coefficient of determination.
2. Understand the calculation and interpretation of the F-test of model utility.
3. Understand how to standardize model coefficients and when this is appropriate to do.
4. Understand the relationship between the correlation coefficient and the regression slope.

---
# Topics for today

+ Overall model evaluation
+ Coefficient of determination ( `\(R^2\)` )
+ F-test for the model

---
# Quality of the overall model

+ When we measure an outcome ( `\(y\)` ) in some data, the scores will vary (we hope).
  + Variation in `\(y\)` = total variation of interest.

--

+ The aim of our linear model is to describe `\(y\)` as a function of `\(x\)`.
  + That is, we are trying to explain variation in `\(y\)` using `\(x\)`.

--

+ But the model won't explain all of that variation.
  + What is left unexplained is called the residual variance.

--

+ So we can break down the variation in our data into sums of squares:

`$$SS_{Total} = SS_{Model} + SS_{Residual}$$`

---
# Coefficient of determination

+ One way to evaluate how good our model is would be to consider the proportion of the total variance it accounts for.

`$$R^2 = \frac{SS_{Model}}{SS_{Total}} = 1 - \frac{SS_{Residual}}{SS_{Total}}$$`

+ `\(R^2\)` = coefficient of determination

--

+ Quantifies the amount of variability in the outcome accounted for by the predictors.
  + The more variance accounted for, the better.
  + Represents the extent to which the prediction of `\(y\)` is improved when predictions are based on the linear relation between `\(x\)` and `\(y\)`.

--

+ Let's see how it works.
  + To do so, we need to calculate the different sums of squares.

---
# Total Sum of Squares

.pull-left[
+ Sums of squares quantify different sources of variation.

`$$SS_{Total} = \sum_{i=1}^{n}(y_i - \bar{y})^2$$`

+ Squared distance of each data point from the mean of `\(y\)`.

+ The mean is our baseline.
  + Without any other information, our best guess at the value of `\(y\)` for any person is the mean.
]

.pull-right[
<img src="dapR2_lec05_LMmodeleval_files/figure-html/unnamed-chunk-2-1.png" width="90%" />
]

---
# Calculations

.pull-left[

```r
ss_tab <- test %>%
  mutate(
    y_dev = score - mean(score),
    y_dev2 = y_dev^2
  )
```

```r
ss_tab %>%
  summarize(
    ss_tot = sum(y_dev2)
  )
```

```
## # A tibble: 1 x 1
##   ss_tot
##    <dbl>
## 1   44.1
```
]

.pull-right[
<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> student </th> <th style="text-align:right;"> hours </th> <th style="text-align:right;"> score </th> <th style="text-align:right;"> y_dev </th> <th style="text-align:right;"> y_dev2 </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> ID1 </td> <td style="text-align:right;"> 0.5 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> -2.3 </td> <td style="text-align:right;"> 5.29 </td> </tr>
<tr> <td style="text-align:left;"> ID2 </td> <td style="text-align:right;"> 1.0 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> -0.3 </td> <td style="text-align:right;"> 0.09 </td> </tr>
<tr> <td style="text-align:left;"> ID3 </td> <td style="text-align:right;"> 1.5 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> -2.3 </td> <td style="text-align:right;"> 5.29 </td> </tr>
<tr> <td style="text-align:left;"> ID4 </td> <td style="text-align:right;"> 2.0 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> -1.3 </td> <td style="text-align:right;"> 1.69 </td> </tr>
<tr> <td style="text-align:left;"> ID5 </td> <td style="text-align:right;"> 2.5 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> -1.3 </td> <td style="text-align:right;"> 1.69 </td> </tr>
<tr> <td style="text-align:left;"> ID6 </td> <td style="text-align:right;"> 3.0 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 2.7 </td> <td style="text-align:right;"> 7.29 </td> </tr>
<tr> <td style="text-align:left;"> ID7 </td> <td style="text-align:right;"> 3.5 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> -0.3 </td> <td style="text-align:right;"> 0.09 </td> </tr>
<tr> <td style="text-align:left;"> ID8 </td> <td style="text-align:right;"> 4.0 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> -0.3 </td> <td style="text-align:right;"> 0.09 </td> </tr>
<tr> <td style="text-align:left;"> ID9 </td> <td style="text-align:right;"> 4.5 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 0.7 </td> <td style="text-align:right;"> 0.49 </td> </tr>
<tr> <td style="text-align:left;"> ID10 </td> <td style="text-align:right;"> 5.0 </td> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 4.7 </td> <td style="text-align:right;"> 22.09 </td> </tr>
</tbody>
</table>
]
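---
# A quick check in base R

+ The same calculation can be done in one line. A minimal sketch, assuming the `test` data from our running example is loaded:

```r
# SS_total: sum of squared deviations of each score from the mean score
sum((test$score - mean(test$score))^2)
```

+ This should match the `ss_tot` value above (44.1).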
---
# Residual sum of squares

.pull-left[
+ Sums of squares quantify different sources of variation.

`$$SS_{Residual} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$`

+ Which you may recognise.

+ Squared distance of each point from the predicted value.
]

.pull-right[
<img src="dapR2_lec05_LMmodeleval_files/figure-html/unnamed-chunk-6-1.png" width="90%" />
]

---
# Calculations

.pull-left[

```r
ss_tab <- ss_tab %>%
  mutate(
    y_pred = round(res$fitted.values, 2),
    pred_dev = round((score - y_pred), 2),
    pred_dev2 = round(pred_dev^2, 2)
  )
```

```r
ss_tab %>%
  summarize(
    ss_tot = sum(y_dev2),
*   ss_resid = sum(pred_dev2)
  )
```

```
## # A tibble: 1 x 2
##   ss_tot ss_resid
##    <dbl>    <dbl>
## 1   44.1     21.2
```
]

.pull-right[
<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> student </th> <th style="text-align:right;"> hours </th> <th style="text-align:right;"> score </th> <th style="text-align:right;"> y_pred </th> <th style="text-align:right;"> pred_dev </th> <th style="text-align:right;"> pred_dev2 </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> ID1 </td> <td style="text-align:right;"> 0.5 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0.93 </td> <td style="text-align:right;"> 0.07 </td> <td style="text-align:right;"> 0.00 </td> </tr>
<tr> <td style="text-align:left;"> ID2 </td> <td style="text-align:right;"> 1.0 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 1.45 </td> <td style="text-align:right;"> 1.55 </td> <td style="text-align:right;"> 2.40 </td> </tr>
<tr> <td style="text-align:left;"> ID3 </td> <td style="text-align:right;"> 1.5 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1.98 </td> <td style="text-align:right;"> -0.98 </td> <td style="text-align:right;"> 0.96 </td> </tr>
<tr> <td style="text-align:left;"> ID4 </td> <td style="text-align:right;"> 2.0 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 2.51 </td> <td style="text-align:right;"> -0.51 </td> <td style="text-align:right;"> 0.26 </td> </tr>
<tr> <td style="text-align:left;"> ID5 </td> <td style="text-align:right;"> 2.5 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 3.04 </td> <td style="text-align:right;"> -1.04 </td> <td style="text-align:right;"> 1.08 </td> </tr>
<tr> <td style="text-align:left;"> ID6 </td> <td style="text-align:right;"> 3.0 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 3.56 </td> <td style="text-align:right;"> 2.44 </td> <td style="text-align:right;"> 5.95 </td> </tr>
<tr> <td style="text-align:left;"> ID7 </td> <td style="text-align:right;"> 3.5 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 4.09 </td> <td style="text-align:right;"> -1.09 </td> <td style="text-align:right;"> 1.19 </td> </tr>
<tr> <td style="text-align:left;"> ID8 </td> <td style="text-align:right;"> 4.0 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 4.62 </td> <td style="text-align:right;"> -1.62 </td> <td style="text-align:right;"> 2.62 </td> </tr>
<tr> <td style="text-align:left;"> ID9 </td> <td style="text-align:right;"> 4.5 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 5.15 </td> <td style="text-align:right;"> -1.15 </td> <td style="text-align:right;"> 1.32 </td> </tr>
<tr> <td style="text-align:left;"> ID10 </td> <td style="text-align:right;"> 5.0 </td> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 5.67 </td> <td style="text-align:right;"> 2.33 </td> <td style="text-align:right;"> 5.43 </td> </tr>
</tbody>
</table>
]
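---
# A quick check in base R

+ Again, a minimal sketch of the same quantity, assuming the model `res <- lm(score ~ hours, data = test)` has been fitted:

```r
# SS_residual: sum of squared residuals (observed minus fitted values)
sum(resid(res)^2)
```

+ This should agree with `ss_resid` above (21.2), up to the rounding we introduced when working "by hand".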
---
# Model sums of squares

.pull-left[
+ Sums of squares quantify different sources of variation.

`$$SS_{Model} = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2$$`

+ That is, the sum of squared deviations of the predicted values from the mean of `\(y\)`.

+ But it is easier to simply take:

`$$SS_{Model} = SS_{Total} - SS_{Residual}$$`
]

.pull-right[
<img src="dapR2_lec05_LMmodeleval_files/figure-html/unnamed-chunk-10-1.png" width="90%" />
]

---
# Calculations

.pull-left[
`$$SS_{Model} = SS_{Total} - SS_{Residual}$$`

```r
ss_tab %>%
  summarize(
    ss_tot = sum(y_dev2),
    ss_resid = sum(pred_dev2)
  ) %>%
* mutate(
*   ss_mod = ss_tot - ss_resid
  )
```

```
## # A tibble: 1 x 3
##   ss_tot ss_resid ss_mod
##    <dbl>    <dbl>  <dbl>
## 1   44.1     21.2   22.9
```
]

.pull-right[
<img src="dapR2_lec05_LMmodeleval_files/figure-html/unnamed-chunk-12-1.png" width="90%" />
]
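---
# A quick check in base R

+ One more minimal sketch: `\(SS_{Model}\)` computed directly from its definition, assuming `res` and `test` as before:

```r
# SS_model: sum of squared deviations of the fitted values from the mean score
sum((fitted(res) - mean(test$score))^2)
```

+ This should agree with `ss_mod` above (22.9), up to rounding.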
---
# Coefficient of determination

+ Now we can finally come back to `\(R^2\)`.

`$$R^2 = 1 - \frac{SS_{Residual}}{SS_{Total}}$$`

+ Or

`$$R^2 = \frac{SS_{Model}}{SS_{Total}}$$`

+ So in our example:

`$$R^2 = \frac{SS_{Model}}{SS_{Total}} = \frac{22.9}{44.1} = 0.519$$`

+ **`\(R^2\)` = 0.519 means that 52% of the variation in test scores is accounted for by hours of revision.**

---
# Our example

```r
res <- lm(score ~ hours, data = test)
summary(res)
```

```
## 
## Call:
## lm(formula = score ~ hours, data = test)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.6182 -1.0773 -0.7454  1.1773  2.4364 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   0.4000     1.1111   0.360   0.7282  
## hours         1.0545     0.3581   2.945   0.0186 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.626 on 8 degrees of freedom
## Multiple R-squared:  0.5201,	Adjusted R-squared:  0.4601 
## F-statistic:  8.67 on 1 and 8 DF,  p-value: 0.01858
```

???
As at the end of last session, we can check this against the R output.

Be sure to flag the small rounding differences that come from working "by hand" and presenting to fewer decimal places.

---
class: center, middle
# Time for a break

**Quiz time!**

---
class: center, middle
# Welcome Back!

**Where we left off...**

We had just calculated `\(R^2\)`.

Now let's look at calculating significance tests for our model.

---
# Significance of the overall model

+ The test of the individual predictors (IVs, or `\(x\)`'s) does not tell us if the overall model is significant or not.
  + Neither does `\(R^2\)`.
  + But both are indicative.

+ To test the significance of the model as a whole, we conduct an `\(F\)`-test.

---
# F-ratio

+ The `\(F\)`-ratio tests the null hypothesis that all the regression slopes in a model are zero.
  + We are currently talking about a model with only one `\(x\)`, thus one slope.
  + But the `\(F\)`-ratio test will generalise.

--

+ The `\(F\)`-ratio is a ratio of the explained to unexplained variance:

`$$F = \frac{MS_{Model}}{MS_{Residual}}$$`

+ Where `\(MS\)` = mean squares.

--

+ **What are mean squares?**
  + Mean squares are sums of squares divided by the associated degrees of freedom.
  + The degrees of freedom are defined by the number of "independent" values associated with the different calculations.
  + The next slide works through these numbers for our example.
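---
# Mean squares: a worked sketch

+ A minimal worked example for our data ( `\(n = 10\)` students, `\(k = 1\)` predictor), using the sums of squares calculated earlier:

```r
# Degrees of freedom
df_mod <- 1           # k (number of predictors)
df_res <- 10 - 1 - 1  # n - k - 1 = 8

# Mean squares = SS / df
ms_mod <- 22.9 / df_mod  # 22.9
ms_res <- 21.2 / df_res  # 2.65

# F-ratio = MS model / MS residual
ms_mod / ms_res          # approximately 8.64
```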
---
# F-table

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> Component </th> <th style="text-align:left;"> df </th> <th style="text-align:left;"> MS </th> <th style="text-align:left;"> F-ratio </th> <th style="text-align:left;"> p-value </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Model </td> <td style="text-align:left;"> k </td> <td style="text-align:left;"> SS model / df model </td> <td style="text-align:left;"> MS model / MS residual </td> <td style="text-align:left;"> F(df model, df residual) </td> </tr>
<tr> <td style="text-align:left;"> Residual </td> <td style="text-align:left;"> n-k-1 </td> <td style="text-align:left;"> SS residual / df residual </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> </tr>
<tr> <td style="text-align:left;"> Total </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> </tr>
</tbody>
</table>

---
# Our example: F-table

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> Component </th> <th style="text-align:right;"> SS </th> <th style="text-align:left;"> df </th> <th style="text-align:left;"> MS </th> <th style="text-align:left;"> F-ratio </th> <th style="text-align:left;"> p-value </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Model </td> <td style="text-align:right;"> 22.9 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 22.9 </td> <td style="text-align:left;"> 8.641509 </td> <td style="text-align:left;"> F(1,8) </td> </tr>
<tr> <td style="text-align:left;"> Residual </td> <td style="text-align:right;"> 21.2 </td> <td style="text-align:left;"> 8 </td> <td style="text-align:left;"> 2.65 </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> </tr>
<tr> <td style="text-align:left;"> Total </td> <td style="text-align:right;"> 44.1 </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> </tr>
</tbody>
</table>

---
# F-ratio

+ Bigger `\(F\)`-ratios indicate better models.
  + This means the model variance is big compared to the residual variance.

--

+ The null hypothesis for the model says that the best guess of any individual's `\(y\)` value is the mean of `\(y\)` plus error.
  + Or, that the `\(x\)` variables carry no information collectively about `\(y\)`.

--

+ The `\(F\)`-ratio will be close to 1 when the null hypothesis is true.
  + If model and residual variation are equivalent, `\(F\)` = 1.
  + If there is more model than residual variation, `\(F\)` > 1.

--

+ The `\(F\)`-ratio is then evaluated against an `\(F\)`-distribution with `\(df_{Model}\)` and `\(df_{Residual}\)` degrees of freedom and a pre-defined `\(\alpha\)`.

--

+ Testing the `\(F\)`-ratio evaluates the statistical significance of the overall model.

---
# Visualize the test

.pull-left[
<img src="dapR2_lec05_LMmodeleval_files/figure-html/unnamed-chunk-16-1.png" width="90%" />
]

.pull-right[
+ Critical value and `\(p\)`-value:

```r
tibble(
  Crit = round(qf(0.95, 1, 8), 3),
  Exactp = 1 - pf(8.64, 1, 8)
)
```

```
## # A tibble: 1 x 2
##    Crit Exactp
##   <dbl>  <dbl>
## 1  5.32 0.0187
```

+ From this we would **reject the null**.
]
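---
# The shortcut in practice

+ We rarely build the F-table by hand: base R's `anova()` prints the sums of squares, degrees of freedom, mean squares, `\(F\)`-ratio, and `\(p\)`-value for a fitted model.

```r
# F-table for the fitted model; matches our by-hand table up to rounding
anova(res)
```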
---
# Our example

```r
res <- lm(score ~ hours, data = test)
summary(res)
```

```
## 
## Call:
## lm(formula = score ~ hours, data = test)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.6182 -1.0773 -0.7454  1.1773  2.4364 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   0.4000     1.1111   0.360   0.7282  
## hours         1.0545     0.3581   2.945   0.0186 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.626 on 8 degrees of freedom
## Multiple R-squared:  0.5201,	Adjusted R-squared:  0.4601 
## F-statistic:  8.67 on 1 and 8 DF,  p-value: 0.01858
```

???
As at the end of last session, we can check this against the R output.

Comment on the minor differences due to rounding.

---
# Summary of today

+ We have looked at evaluating the overall model.
+ `\(R^2\)`, the coefficient of determination, tells us how much of the total variance is explained by our model.
+ The `\(F\)`-ratio, or `\(F\)`-test, provides a significance test of the overall model.

---
# Next tasks

+ This week:
  + Complete your lab
  + Come to office hours
  + Weekly quiz: first assessed quiz - Week 2 content.
      + Opens Monday 09:00
      + Closes Sunday 17:00