F-Ratio & Model Comparisons
Data Analysis for Psychology in R 2
In this document we are going to work through the hand calculations for:
- \(F\)-statistic
- Incremental \(F\)-test
Packages
For this example we will only need the tidyverse package.
library(tidyverse)
Data
test_study2 <- read_csv("https://uoepsy.github.io/data/test_study2.csv")
We will make our calculations for the model we have been using in class this week:
performance <- lm(score ~ hours + motivation, data = test_study2)
summary(performance)
Call:
lm(formula = score ~ hours + motivation, data = test_study2)
Residuals:
Min 1Q Median 3Q Max
-12.9548 -2.8042 -0.2847 2.9344 13.8240
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.86679 0.65473 10.488 <2e-16 ***
hours 1.37570 0.07989 17.220 <2e-16 ***
motivation 0.91634 0.38376 2.388 0.0182 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.386 on 147 degrees of freedom
Multiple R-squared: 0.6696, Adjusted R-squared: 0.6651
F-statistic: 148.9 on 2 and 147 DF, p-value: < 2.2e-16
\(F\)-Statistic
\[ F = \frac{\frac{SS_{model}}{df_{model}}}{\frac{SS_{residual}}{df_{residual}}} = \frac{MS_{Model}}{MS_{Residual}} \]
Sums of Squares Residual
\[ SS_{Residual} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 \]
ss_tab <- test_study2 |>
mutate(
y_pred = performance$fitted.values, # extract the predicted values from the model
pred_dev = round((score - y_pred),2), # subtract these from the observed values of score
pred_dev2 = round(pred_dev^2,2) # square these values, and round to 2dp
)
As we did previously, we can then sum this final column to give the sums of squares residual:
ss_resid <- sum(ss_tab$pred_dev2)
ss_resid
[1] 2826.83
Sums of Squares Model
\[ SS_{Model} = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 \]
ss_tab <- ss_tab |>
mutate(
mean_pred_dev = performance$fitted.values - mean(score), # extract the predicted values from the model and subtract mean
mean_pred_dev2 = round(mean_pred_dev^2,2) # square these values, and round to 2dp
)
ss_mod <- sum(ss_tab$mean_pred_dev2)
ss_mod
[1] 5728.81
DF Model
\[ df_\text{model} = k = 2 \]
DF Residual
\[ df_\text{residual} = n - k - 1= 150 - 2 - 1 = 147 \]
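Rather than counting by hand, both degrees of freedom can also be read straight off a fitted lm object. The sketch below uses the built-in mtcars data with two predictors (so \(k = 2\), as in our model), since the idea does not depend on the particular dataset:

```r
# fit a two-predictor model on built-in data (k = 2 predictors)
m <- lm(mpg ~ wt + hp, data = mtcars)

# df_model = k = number of estimated slopes
df_model <- length(coef(m)) - 1   # subtract 1 for the intercept

# df_residual = n - k - 1
df_residual <- df.residual(m)

df_model      # 2
df_residual   # 32 - 2 - 1 = 29
```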
Back to \(F\)-Statistic
\[ \begin{align} F = \frac{\frac{SS_{model}}{df_{model}}}{\frac{SS_{residual}}{df_{residual}}} = \frac{MS_{Model}}{MS_{Residual}} \\ \\ F = \frac{\frac{5728.81}{2}}{\frac{2826.83}{147}} = \frac{2864.40}{19.23} \\ \\ F = 148.95 \end{align} \]
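The same arithmetic can be run in R using the sums of squares we calculated above. Note that because those values were rounded to 2dp along the way, the result will differ very slightly from the 148.93 reported by summary():

```r
ss_mod   <- 5728.81   # hand-calculated SS_model (rounded to 2dp)
ss_resid <- 2826.83   # hand-calculated SS_residual (rounded to 2dp)

ms_mod   <- ss_mod / 2      # MS_model = SS_model / df_model
ms_resid <- ss_resid / 147  # MS_residual = SS_residual / df_residual

f_stat <- ms_mod / ms_resid
f_stat
```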
We can compare this to the value at the bottom of our model summary():
summary(performance)
Call:
lm(formula = score ~ hours + motivation, data = test_study2)
Residuals:
Min 1Q Median 3Q Max
-12.9548 -2.8042 -0.2847 2.9344 13.8240
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.86679 0.65473 10.488 <2e-16 ***
hours 1.37570 0.07989 17.220 <2e-16 ***
motivation 0.91634 0.38376 2.388 0.0182 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.386 on 147 degrees of freedom
Multiple R-squared: 0.6696, Adjusted R-squared: 0.6651
F-statistic: 148.9 on 2 and 147 DF, p-value: < 2.2e-16
If we wanted to save this specific value from the summary output, we can do the following:
sum_performance <- summary(performance)
sum_performance$fstatistic
   value    numdf    dendf
148.9305   2.0000 147.0000
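If we also wanted the p-value the summary prints, we can compute it ourselves from the \(F\)-distribution with pf(), using the value and dfs extracted above (typed in here so the snippet stands on its own):

```r
# upper-tail probability of F(2, 147) at the observed value
pf(148.9305, df1 = 2, df2 = 147, lower.tail = FALSE)
```

This returns a value smaller than the 2.2e-16 floor that summary() reports as "< 2.2e-16".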
Model Comparisons
Research Question: Does motivation significantly predict score over and above the effects of study hours?
Step 1: Fit & Run Models
#model with just study hours
m1 <- lm(score ~ hours, data = test_study2)
summary(m1)
Call:
lm(formula = score ~ hours, data = test_study2)
Residuals:
Min 1Q Median 3Q Max
-13.4074 -2.8477 -0.4074 3.2950 13.3165
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.78794 0.66420 10.22 <2e-16 ***
hours 1.36195 0.08094 16.83 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.455 on 148 degrees of freedom
Multiple R-squared: 0.6567, Adjusted R-squared: 0.6544
F-statistic: 283.2 on 1 and 148 DF, p-value: < 2.2e-16
#model with study hours and motivation
m2 <- lm(score ~ hours + motivation, data = test_study2)
summary(m2)
Call:
lm(formula = score ~ hours + motivation, data = test_study2)
Residuals:
Min 1Q Median 3Q Max
-12.9548 -2.8042 -0.2847 2.9344 13.8240
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.86679 0.65473 10.488 <2e-16 ***
hours 1.37570 0.07989 17.220 <2e-16 ***
motivation 0.91634 0.38376 2.388 0.0182 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.386 on 147 degrees of freedom
Multiple R-squared: 0.6696, Adjusted R-squared: 0.6651
F-statistic: 148.9 on 2 and 147 DF, p-value: < 2.2e-16
Step 2: Statistically Compare Models
Using anova()
anova(m1, m2)
Analysis of Variance Table
Model 1: score ~ hours
Model 2: score ~ hours + motivation
Res.Df RSS Df Sum of Sq F Pr(>F)
1 148 2936.9
2 147 2827.3 1 109.66 5.7017 0.01822 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
By Hand (ADVANCED/OPTIONAL)
We can extract the information we need from the summary() output given the relationship between the residual sum of squares and residual standard error:
\[ \text{SS}_\text{Residual} = \text{Residual Standard Error}^2 \cdot df_\text{Residual} \]
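This identity is easy to verify on any fitted model; for a self-contained check we can use a model fit to the built-in mtcars data rather than our study data:

```r
m <- lm(mpg ~ wt, data = mtcars)
s <- summary(m)

# residual sum of squares computed directly
rss_direct <- sum(resid(m)^2)

# reconstructed from the residual standard error and residual df
rss_from_sigma <- s$sigma^2 * df.residual(m)

all.equal(rss_direct, rss_from_sigma)   # TRUE
```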
Therefore we can calculate our \(SS_\text{Residual}\) as follows:
Restricted Model
sum_m1 <- summary(m1)
sum_m1$sigma #the residual standard error, 4.4547
[1] 4.454672
sum_m1$df #the residual df is the 2nd element: 148
[1]   2 148   2
\[ \begin{align} SSR_R = (4.4547^2) \cdot 148 \\ \\ SSR_R = 2936.964 \\ \end{align} \]
Full Model
sum_m2 <- summary(m2)
sum_m2$sigma #the residual standard error, 4.3856
[1] 4.385557
sum_m2$df #the residual df is the 2nd element: 147
[1]   3 147   3
\[ \begin{align} SSR_F = (4.3856^2) \cdot 147 \\ \\ SSR_F = 2827.323 \\ \end{align} \]
Once we have obtained these values, we can substitute into the formula:
\[ \begin{align} F_{(df_R-df_F),df_F} = \frac{(SSR_R-SSR_F)/(df_R-df_F)}{SSR_F / df_F} \\ \\ F_{(148-147),147} = \frac{(2936.964-2827.323)/(148-147)}{2827.323 / 147} \\ \\ F_{1,147} = \frac{109.641/1}{ 19.2335} \\ \\ F_{1,147} = \frac{109.641}{19.2335} \\ \\ F_{1,147} = 5.70 \end{align} \]
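The substitution above is plain arithmetic, so we can let R do it with the values we just extracted (typed in here so the snippet runs on its own):

```r
ssr_r <- 2936.964  # SS_residual, restricted model (m1)
ssr_f <- 2827.323  # SS_residual, full model (m2)
df_r  <- 148       # residual df, restricted model
df_f  <- 147       # residual df, full model

f_inc <- ((ssr_r - ssr_f) / (df_r - df_f)) / (ssr_f / df_f)
round(f_inc, 2)   # matches the 5.70 above (anova() reports 5.7017)

# corresponding p-value from the F(1, 147) distribution
pf(f_inc, df1 = 1, df2 = 147, lower.tail = FALSE)   # approx .018
```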
Step 3: Write Up
Motivation was found to explain a significant amount of variance in scores over and above the effects of study hours alone, \(F(1, 147) = 5.70, p < .05\).