This document shows the worked-through calculations for the sums of squares (\(SS_{Total}\), \(SS_{Residual}\), \(SS_{Model}\)) and \(R^2\).

We will make our calculations for the model we have been using in class this week:

performance <- lm(score ~ hours + motivation, data = test_study2)
summary(performance)
## 
## Call:
## lm(formula = score ~ hours + motivation, data = test_study2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.9548  -2.8042  -0.2847   2.9344  13.8240 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  6.86679    0.65473  10.488   <2e-16 ***
## hours        1.37570    0.07989  17.220   <2e-16 ***
## motivation   0.91634    0.38376   2.388   0.0182 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.386 on 147 degrees of freedom
## Multiple R-squared:  0.6696, Adjusted R-squared:  0.6651 
## F-statistic: 148.9 on 2 and 147 DF,  p-value: < 2.2e-16

Sums of Squares Total

\[SS_{Total} = \sum_{i=1}^{n}(y_i - \bar{y})^2\]

Here we are calculating each individual's deviation from the mean of the outcome, squaring these deviations, and summing them across all observations.

We will build an object called ss_tab (standing for sums of squares table), adding our calculations as new columns to the original data.

ss_tab <- test_study2 %>%
    mutate(
        y_dev = score - mean(score), 
        y_dev2 = y_dev^2
    )

In the code above, y_dev = \((y_i - \bar{y})\), and y_dev2 squares these values. These values are calculated for each individual, as we can see below where we print the top 10 rows of ss_tab.

ID      score  hours  motivation   y_dev    y_dev2
ID101       7      2       -1.42   -9.14   83.5396
ID102      23     12       -0.41    6.86   47.0596
ID103      17      4        0.49    0.86    0.7396
ID104       6      2        0.24  -10.14  102.8196
ID105      12      2        0.09   -4.14   17.1396
ID106      24     12        1.05    7.86   61.7796
ID107      11      3       -0.09   -5.14   26.4196
ID108      23      8        0.39    6.86   47.0596
ID109      25     10        0.34    8.86   78.4996
ID110      15      9       -0.08   -1.14    1.2996

We can then calculate the sum of y_dev2 to give us our sums of squares total:

ss_tot <- sum(ss_tab$y_dev2)
ss_tot
## [1] 8556.06
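As a quick sanity check (a sketch, assuming test_study2 is loaded as above), \(SS_{Total}\) is by definition the sample variance of the outcome multiplied by \(n - 1\), since var() divides the sum of squared deviations by \(n - 1\):

```r
# Check: SS_total equals (n - 1) times the sample variance of the outcome.
n <- nrow(test_study2)
(n - 1) * var(test_study2$score)   # should reproduce ss_tot
```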

Sums of Squares Residual

\[SS_{Residual} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2\]

Here we are calculating the difference between each observed value and the value the model predicts for it, then squaring and summing these residuals:

ss_tab <- ss_tab %>%
  mutate(
    y_pred = round(performance$fitted.values,2), # extract the predicted values from the model
    pred_dev = round((score - y_pred),2), # subtract these from the observed values of score
    pred_dev2 = round(pred_dev^2,2) # square these values
  )

The top 10 rows of ss_tab now include these new columns:

ID      score  hours    y_dev2  y_pred  pred_dev  pred_dev2
ID101       7      2   83.5396    8.32     -1.32       1.74
ID102      23     12   47.0596   23.00      0.00       0.00
ID103      17      4    0.7396   12.82      4.18      17.47
ID104       6      2  102.8196    9.84     -3.84      14.75
ID105      12      2   17.1396    9.70      2.30       5.29
ID106      24     12   61.7796   24.34     -0.34       0.12
ID107      11      3   26.4196   10.91      0.09       0.01
ID108      23      8   47.0596   18.23      4.77      22.75
ID109      25     10   78.4996   20.94      4.06      16.48
ID110      15      9    1.2996   19.17     -4.17      17.39

As we did previously, we can then sum this final column to give the sums of squares residual:

ss_resid <- sum(ss_tab$pred_dev2)
ss_resid
## [1] 2826.83
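R also stores the residual sum of squares directly: for an lm fit, deviance() returns \(\sum_{i=1}^{n}(y_i - \hat{y}_i)^2\) computed from the unrounded fitted values. Because we rounded y_pred to 2 decimal places above, our hand-built ss_resid will differ very slightly from this exact value. A quick check (assuming the performance model fitted above):

```r
# Exact residual sum of squares, with no intermediate rounding.
deviance(performance)   # equivalently: sum(residuals(performance)^2)

# Our hand-calculated ss_resid should agree to within rounding error.
all.equal(ss_resid, deviance(performance), tolerance = 1e-3)
```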

Sums of Squares Model

\[SS_{Model} = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2\]

Here we are calculating the deviation of each model-predicted value from the mean of the outcome, then squaring and summing these deviations.
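Following the same pattern as before, we could compute this directly from the definition, using the ss_tab object built above (because y_pred was rounded to 2 decimal places, the result will match ss_mod below only up to rounding error):

```r
# Direct calculation of SS_model from its definition.
ss_tab <- ss_tab %>%
  mutate(
    mod_dev  = y_pred - mean(score),   # deviation of each predicted value from the mean
    mod_dev2 = mod_dev^2               # square these deviations
  )
sum(ss_tab$mod_dev2)
```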

However, there is a much more efficient way to do this, which is simply:

\[SS_{Model} = SS_{Total} - SS_{Residual}\]

ss_mod <- ss_tot - ss_resid
ss_mod
## [1] 5729.23
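We can also check this against R's built-in ANOVA decomposition: anova() reports sequential (Type I) sums of squares, and the rows for the two predictors add up to the model sums of squares (a sketch, assuming the performance model fitted above):

```r
# The 'Sum Sq' entries for hours and motivation together give SS_model;
# the Residuals row gives SS_residual.
anova_tab <- anova(performance)
sum(anova_tab$`Sum Sq`[1:2])   # close to ss_mod, up to our earlier rounding
anova_tab$`Sum Sq`[3]          # exact SS_residual
```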

Calculate \(R^2\)

\[R^2 = \frac{SS_{Model}}{SS_{Total}} = 1 - \frac{SS_{Residual}}{SS_{Total}}\]

ss_mod/ss_tot
## [1] 0.6696108

or

1 - (ss_resid/ss_tot)
## [1] 0.6696108

We can compare this to the value at the bottom of our model summary:

summary(performance)
## 
## Call:
## lm(formula = score ~ hours + motivation, data = test_study2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.9548  -2.8042  -0.2847   2.9344  13.8240 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  6.86679    0.65473  10.488   <2e-16 ***
## hours        1.37570    0.07989  17.220   <2e-16 ***
## motivation   0.91634    0.38376   2.388   0.0182 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.386 on 147 degrees of freedom
## Multiple R-squared:  0.6696, Adjusted R-squared:  0.6651 
## F-statistic: 148.9 on 2 and 147 DF,  p-value: < 2.2e-16

If we want to save this specific value from the summary output, we can:

sum_performance <- summary(performance)
sum_performance$r.squared
## [1] 0.6695598
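Our hand-calculated 0.6696108 differs very slightly from summary()'s 0.6695598 only because we rounded the fitted values to 2 decimal places along the way. As a final extension, the Adjusted R-squared reported in the summary can also be recovered from our sums of squares by dividing each by its degrees of freedom:

\[R^2_{adj} = 1 - \frac{SS_{Residual}/(n-k-1)}{SS_{Total}/(n-1)}\]

A sketch using the values computed above (n = 150 observations and k = 2 predictors, matching the 147 residual degrees of freedom in the summary):

```r
# Adjusted R-squared from our sums of squares.
n <- 150   # observations
k <- 2     # predictors
1 - (ss_resid / (n - k - 1)) / (ss_tot / (n - 1))
# close to the Adjusted R-squared of 0.6651 reported by summary()
```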