This document shows the worked-through calculations for the sums of squares (total, residual, and model) and \(R^2\). We will make our calculations for the model we have been using in class this week:
performance <- lm(score ~ hours + motivation, data = test_study2)
summary(performance)
##
## Call:
## lm(formula = score ~ hours + motivation, data = test_study2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.9548 -2.8042 -0.2847 2.9344 13.8240
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.86679 0.65473 10.488 <2e-16 ***
## hours 1.37570 0.07989 17.220 <2e-16 ***
## motivation 0.91634 0.38376 2.388 0.0182 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.386 on 147 degrees of freedom
## Multiple R-squared: 0.6696, Adjusted R-squared: 0.6651
## F-statistic: 148.9 on 2 and 147 DF, p-value: < 2.2e-16
First, we calculate the total sum of squares:

\[SS_{Total} = \sum_{i=1}^{n}(y_i - \bar{y})^2\]
Applying the formula involves:

- taking the score for each individual ( \(y_i\) )
- calculating the mean score ( \(\bar{y}\) ), which is the same for everyone, and subtracting it from each individual value of score

We will build an object called ss_tab (to stand for “sums of squares table”). Here we will add our calculations to the original data.
ss_tab <- test_study2 %>%
  mutate(
    y_dev = score - mean(score),
    y_dev2 = y_dev^2
  )
In the code above, y_dev = \((y_i - \bar{y})\), and y_dev2 squares these values. These values are calculated for each individual, as we can see below, where we print the top 10 rows of ss_tab.
| ID | score | hours | motivation | y_dev | y_dev2 |
|---|---|---|---|---|---|
| ID101 | 7 | 2 | -1.42 | -9.14 | 83.5396 |
| ID102 | 23 | 12 | -0.41 | 6.86 | 47.0596 |
| ID103 | 17 | 4 | 0.49 | 0.86 | 0.7396 |
| ID104 | 6 | 2 | 0.24 | -10.14 | 102.8196 |
| ID105 | 12 | 2 | 0.09 | -4.14 | 17.1396 |
| ID106 | 24 | 12 | 1.05 | 7.86 | 61.7796 |
| ID107 | 11 | 3 | -0.09 | -5.14 | 26.4196 |
| ID108 | 23 | 8 | 0.39 | 6.86 | 47.0596 |
| ID109 | 25 | 10 | 0.34 | 8.86 | 78.4996 |
| ID110 | 15 | 9 | -0.08 | -1.14 | 1.2996 |
We can then calculate the sum of y_dev2 to give us our
sums of squares total:
ss_tot <- sum(ss_tab$y_dev2)
ss_tot
## [1] 8556.06
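As a quick sanity check (an addition to the original calculation, using the same test_study2 data), \(SS_{Total}\) is exactly \((n-1)\) times the sample variance of score, because var() divides by \(n-1\):

```r
# SS_total equals (n - 1) * var(y), since var() uses an n - 1 denominator
n <- nrow(test_study2)
(n - 1) * var(test_study2$score)  # should reproduce ss_tot, 8556.06
```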
Next, we calculate the residual sum of squares:

\[SS_{Residual} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2\]

Applying the formula involves:

- taking the score for each individual ( \(y_i\) )
- subtracting the predicted score for that individual ( \(\hat{y}_i\) )

ss_tab <- ss_tab %>%
  mutate(
    y_pred = round(performance$fitted.values, 2), # extract the predicted values from the model
    pred_dev = round((score - y_pred), 2),        # subtract these from the observed values of score
    pred_dev2 = round(pred_dev^2, 2)              # square these values
  )
| ID | score | hours | y_dev2 | y_pred | pred_dev | pred_dev2 |
|---|---|---|---|---|---|---|
| ID101 | 7 | 2 | 83.5396 | 8.32 | -1.32 | 1.74 |
| ID102 | 23 | 12 | 47.0596 | 23.00 | 0.00 | 0.00 |
| ID103 | 17 | 4 | 0.7396 | 12.82 | 4.18 | 17.47 |
| ID104 | 6 | 2 | 102.8196 | 9.84 | -3.84 | 14.75 |
| ID105 | 12 | 2 | 17.1396 | 9.70 | 2.30 | 5.29 |
| ID106 | 24 | 12 | 61.7796 | 24.34 | -0.34 | 0.12 |
| ID107 | 11 | 3 | 26.4196 | 10.91 | 0.09 | 0.01 |
| ID108 | 23 | 8 | 47.0596 | 18.23 | 4.77 | 22.75 |
| ID109 | 25 | 10 | 78.4996 | 20.94 | 4.06 | 16.48 |
| ID110 | 15 | 9 | 1.2996 | 19.17 | -4.17 | 17.39 |
As we did previously, we can then sum this final column to give the sums of squares residual:
ss_resid <- sum(ss_tab$pred_dev2)
ss_resid
## [1] 2826.83
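An equivalent route to the residual sum of squares (a sketch, not part of the table-based calculation above) is to use the model's residuals directly via resid(). Because this skips the rounding we applied to y_pred, the result will differ very slightly from ss_resid:

```r
# residual sum of squares from the unrounded model residuals
sum(resid(performance)^2)
```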
Next, we calculate the model sum of squares:

\[SS_{Model} = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2\]

Applying the formula involves:

- taking the predicted score for each individual from our model ( \(\hat{y}_i\) )
- subtracting the mean score ( \(\bar{y}\) ) from each of the \(\hat{y}_i\) values

However, there is a much more efficient way to do this, which is simply:
\[SS_{Model} = SS_{Total} - SS_{Residual}\]
ss_mod <- ss_tot - ss_resid
ss_mod
## [1] 5729.23
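If we did want to apply the \(SS_{Model}\) formula directly, a minimal sketch using the model's fitted values looks like this (again, expect tiny rounding differences from ss_mod):

```r
# SS_model computed directly: squared deviations of the predictions from the mean
sum((fitted(performance) - mean(test_study2$score))^2)
```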
Finally, we can calculate \(R^2\):

\[R^2 = \frac{SS_{Model}}{SS_{Total}} = 1 - \frac{SS_{Residual}}{SS_{Total}}\]
ss_mod/ss_tot
## [1] 0.6696108
or
1 - (ss_resid/ss_tot)
## [1] 0.6696108
We can compare this to the value at the bottom of our model summary:
summary(performance)
##
## Call:
## lm(formula = score ~ hours + motivation, data = test_study2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.9548 -2.8042 -0.2847 2.9344 13.8240
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.86679 0.65473 10.488 <2e-16 ***
## hours 1.37570 0.07989 17.220 <2e-16 ***
## motivation 0.91634 0.38376 2.388 0.0182 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.386 on 147 degrees of freedom
## Multiple R-squared: 0.6696, Adjusted R-squared: 0.6651
## F-statistic: 148.9 on 2 and 147 DF, p-value: < 2.2e-16
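Our sums of squares also reproduce the F-statistic at the bottom of this summary (an additional check, not part of the original calculation): divide \(SS_{Model}\) and \(SS_{Residual}\) by their degrees of freedom (2 for the model, 147 for the residual) and take the ratio:

```r
k <- 2           # number of predictors
df_resid <- 147  # residual degrees of freedom, as reported in the summary
(ss_mod / k) / (ss_resid / df_resid)  # close to the reported 148.9 (rounding aside)
```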
If we want to save this specific value from the summary output, we can do the following:
sum_performance <- summary(performance)
sum_performance$r.squared
## [1] 0.6695598
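The value here (0.6695598) differs very slightly from our hand calculation (0.6696108) because we rounded the predicted values to 2 decimal places along the way. As a final check (an addition, using the degrees of freedom from the summary), the Adjusted R-squared follows from \(R^2\), the sample size \(n\), and the number of predictors \(k\):

```r
n <- 150  # sample size (147 residual df + 2 predictors + intercept)
k <- 2    # number of predictors
1 - (1 - sum_performance$r.squared) * (n - 1) / (n - k - 1)  # 0.6651, matching the summary
```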