Model Comparisons

Learning Objectives

At the end of this lab, you will:

  1. Understand measures of model fit using F.
  2. Understand the principles of model selection and how to compare models via F tests.
  3. Understand AIC and BIC.

What You Need

  1. Be up to date with lectures
  2. Have completed previous lab exercises from Semester 1

Required R Packages

Remember to load all packages within a code chunk at the start of your RMarkdown file using library(). If you do not have a package and need to install it, do so in the console using install.packages(" "). For further guidance on installing/updating packages, see Section C here.

For this lab, you will need to load the following package(s):

  • tidyverse
  • stargazer

Lab Data

You can download the data required for this lab here or read it in via this link https://uoepsy.github.io/data/wellbeing_rural.csv

Study Overview

Research Questions

  • RQ1: Is there an overall effect of the number of social interactions on wellbeing scores?
  • RQ2: Does the association between number of social interactions and wellbeing differ between rural and non-rural residents?
  • RQ3: Does weekly outdoor time explain a significant amount of variance in wellbeing scores over and above the interaction between weekly social interactions and location (rural vs not-rural)?

Wellbeing/Rurality data codebook.

Setup

  1. Create a new RMarkdown file
  2. Load the required package(s)
  3. Read the wellbeing_rural dataset into R, assigning it to an object named wrdata
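
The setup steps above can be sketched as follows. This is a minimal example that assumes both packages are already installed and that the data URL given in the Lab Data section is reachable from your machine.

```r
# Load the packages listed for this lab
library(tidyverse)
library(stargazer)

# Read the data from the URL given in the Lab Data section
wrdata <- read_csv("https://uoepsy.github.io/data/wellbeing_rural.csv")

# Quick look at the variable names and types
glimpse(wrdata)
```

Running glimpse() (or summary()) straight after reading the data is a convenient way to spot variables that have been read in with an unexpected type.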

Solution

Exercises

Question 1

Check coding of variables (e.g., that categorical variables are coded as factors), and create a new binary variable which specifies whether or not each participant lives in a rural location.

You will need to use two functions - firstly mutate(), and then ifelse() when creating your new variable (which should also be coded as a factor).
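
As an illustration of how mutate() and ifelse() combine, here is a sketch on a small made-up tibble rather than the lab data. The column name location and its labels ("rural", "city", "suburb") are assumptions for the example; check the codebook for the actual names and values in wrdata.

```r
library(tidyverse)

# Toy data for illustration only: `location` and its labels are assumed
toy <- tibble(
  location  = c("rural", "city", "suburb", "rural"),
  wellbeing = c(30, 41, 38, 35)
)

toy <- toy %>%
  mutate(
    # code the categorical variable as a factor
    location = factor(location),
    # new binary factor: "rural" if location is rural, otherwise "not rural"
    isRural = factor(ifelse(location == "rural", "rural", "not rural"))
  )

# Check the result
toy %>% count(location, isRural)
```

ifelse() returns a character vector here, so wrapping it in factor() is what makes the new variable a factor.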

Solution


Question 2

Using fct_relevel(), specify ‘not rural’ as your reference group for your newly created variable (i.e., the isRural variable).
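
A minimal sketch of fct_relevel() on a made-up factor (not the lab data): naming a level moves it to the front, and the first level is the one R treats as the reference group in lm().

```r
library(tidyverse)

# Toy factor whose first level is deliberately "rural"
toy <- tibble(
  isRural = factor(c("rural", "not rural", "rural"),
                   levels = c("rural", "not rural"))
)
levels(toy$isRural)   # "rural" is currently the reference level

# Move "not rural" to the front so it becomes the reference level
toy <- toy %>%
  mutate(isRural = fct_relevel(isRural, "not rural"))

levels(toy$isRural)
```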

Solution


Question 3

Fit the below 5 models required to address the three research questions stated above. Note down which model(s) will be used to address each research question, and examine the results of each model.

Name the models as follows: “wb_mdl0”, “wb_mdl1”, “wb_mdl2”, “wb_mdl3”, and “wb_mdl4”.


\[ \text{Wellbeing} = \beta_0 + \epsilon \]
\[ \text{Wellbeing} = \beta_0 + \beta_1 \cdot \text{Social Interactions} + \epsilon \]
\[ \text{Wellbeing} = \beta_0 + \beta_1 \cdot \text{Social Interactions} + \beta_2 \cdot \text{Location}_{\text{Rural}} + \epsilon \]
\[ \begin{split} \text{Wellbeing} = \beta_0 + \beta_1 \cdot \text{Social Interactions} + \beta_2 \cdot \text{Location}_{\text{Rural}} \\+ \beta_3 \cdot (\text{Social Interactions} \cdot \text{Location}_{\text{Rural}}) + \epsilon \end{split} \]
\[ \begin{split} \text{Wellbeing} = \beta_0 + \beta_1 \cdot \text{Social Interactions} + \beta_2 \cdot \text{Location}_{\text{Rural}} \\+ \beta_3 \cdot (\text{Social Interactions} \cdot \text{Location}_{\text{Rural}}) + \beta_4 \cdot \text{Outdoor Time} + \epsilon \end{split} \]

The summary() function will be useful to examine the model output.
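
As a reminder of the formula syntax, the sketch below fits an intercept-only model, an additive model, and an interaction model to simulated data. The data and the model names (toy_null, toy_add, toy_int) are made up for illustration; only the lm() syntax carries over to wrdata.

```r
# Simulated toy data (illustration only, not the lab data)
set.seed(1)
n <- 100
toy <- data.frame(
  social_int = rpois(n, 12),
  isRural    = factor(sample(c("not rural", "rural"), n, replace = TRUE))
)
toy$wellbeing <- 30 + 0.5 * toy$social_int + rnorm(n, sd = 4)

# Intercept-only (null) model: `~ 1` means "no predictors"
toy_null <- lm(wellbeing ~ 1, data = toy)

# Additive model: two main effects
toy_add <- lm(wellbeing ~ social_int + isRural, data = toy)

# Interaction model: `a * b` expands to a + b + a:b
toy_int <- lm(wellbeing ~ social_int * isRural, data = toy)

summary(toy_int)
```

Because a * b already includes both main effects, there is no need to list them separately when specifying an interaction.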

Solution


Question 4

Provide key model results from the two models required to address RQ1 - whether there is an overall effect of the number of social interactions on wellbeing scores - in a single formatted table.

You will need to use a new package to do this - stargazer.

Like tab_model(), which you have used in many previous labs, stargazer() can take lots of different arguments to customize and build a table. You may want to consider specifying the below (and remember you can open the help page via ?stargazer for further information about the functionality of the package):

  • title = - specify the title of your table
  • dep.var.labels = - specify the name of your dependent variable(s)
  • covariate.labels = - specify the names of your covariates (or independent variables)
  • type = - specify whether you want ‘html’ (use when knitting to HTML), ‘latex’ (use when knitting to PDF), or ‘text’ (use when knitting to Word)
  • digits = - specify rounding (remember APA standard is, in most cases, 2 decimal places)
  • intercept.bottom = - specify if you want the intercept (or ‘constant’) value to be printed at the bottom (TRUE) or top (FALSE) of the output

Note

Your table will only render once you have knitted your document. Within your code chunk options, you may need to specify results = 'asis'.
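
To show how the arguments listed above fit together, here is a sketch on simulated data with made-up labels. It uses type = "text" so that the table prints directly in the console; when knitting to HTML you would switch to type = "html" and set results = 'asis' in the chunk options.

```r
library(stargazer)

# Simulated toy data (illustration only, not the lab data)
set.seed(2)
toy <- data.frame(x = rnorm(50))
toy$y <- 2 + 0.6 * toy$x + rnorm(50)

toy_null <- lm(y ~ 1, data = toy)
toy_x    <- lm(y ~ x, data = toy)

# Two models side by side in one table
stargazer(toy_null, toy_x,
          type = "text",                       # "html" when knitting to HTML
          title = "Toy regression results",
          dep.var.labels = "Outcome (y)",
          covariate.labels = "Predictor (x)",
          digits = 2,
          intercept.bottom = TRUE)
```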

You can learn more about updating your code chunk options here, and you should end up with the below:

Solution


Question 5

Is there a main effect of the number of weekly social interactions?

Check that the \(F\)-statistic and the \(p\)-value from your model comparison are the same as those given at the bottom of summary(wb_mdl1).

Use the anova() function to perform a model comparison between your model with social interactions (wb_mdl1) and the null model (wb_mdl0).

Remember that comparing a model to the null model tests the null hypothesis that all beta coefficients other than the intercept are zero. By comparing wb_mdl0 to wb_mdl1, we can test whether we should include the IV of social_int.
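
The logic of this check can be sketched on simulated data (the data and model names below are made up for illustration). With a single nested comparison against the null model, the incremental F from anova() should equal the overall F reported by summary().

```r
# Simulated toy data (illustration only, not the lab data)
set.seed(3)
toy <- data.frame(x = rnorm(60))
toy$y <- 1 + 0.5 * toy$x + rnorm(60)

toy_null <- lm(y ~ 1, data = toy)   # restricted model
toy_x    <- lm(y ~ x, data = toy)   # full model

# Incremental F-test:
# F = ((SSR_restricted - SSR_full) / (df_restricted - df_full)) / (SSR_full / df_full)
cmp <- anova(toy_null, toy_x)
cmp

# F from the model comparison (second row of the anova table)
cmp$F[2]

# Overall F from the model summary
summary(toy_x)$fstatistic["value"]
```

Always list the simpler (restricted) model first in anova(), so that the table reads as "what is gained by adding predictors".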

Solution


Question 6

Does the association between number of social interactions and wellbeing differ between rural and non-rural residents?

Provide key model results from the two models in a single formatted table, and report the results of the model comparison in APA format.

To address RQ2, you need to compare “wb_mdl2” and “wb_mdl3”.

Solution


Question 7

Look at the amount of variation in wellbeing scores explained by models “wb_mdl3” and “wb_mdl4”.

From this, can we answer the third research question of whether weekly outdoor time explains a significant amount of variance in wellbeing scores over and above the interaction between weekly social interactions and location (rural vs not-rural)?

Provide justification/rationale for your answer.

Recall from Semester 1 that to determine how much variation is explained by a model, we need to look at our \(R^2\) values (specifically the adjusted \(R^2\) value in this case, since the models have multiple predictors).
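
For reference, both values can be pulled out of a model summary directly. The sketch below uses simulated data and made-up model names. Note that it only returns the two descriptive values for each model; it does not, by itself, test whether the difference between them is statistically significant.

```r
# Simulated toy data (illustration only, not the lab data)
set.seed(4)
toy <- data.frame(x1 = rnorm(80), x2 = rnorm(80))
toy$y <- 1 + 0.7 * toy$x1 + rnorm(80)

toy_small <- lm(y ~ x1, data = toy)
toy_big   <- lm(y ~ x1 + x2, data = toy)

# R-squared never decreases when a predictor is added
summary(toy_small)$r.squared
summary(toy_big)$r.squared

# Adjusted R-squared penalises the extra predictor
summary(toy_small)$adj.r.squared
summary(toy_big)$adj.r.squared
```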

Solution


Question 8

Does weekly outdoor time explain a significant amount of variance in wellbeing scores over and above the interaction between weekly social interactions and location (rural vs not-rural)?

Provide key model results from the two models in a single formatted table, and report the results of the model comparison in APA format.

To address RQ3, you need to compare “wb_mdl3” and “wb_mdl4”.

Solution


Question 9

Compare the two following models, each looking at the associations of Wellbeing scores and two different predictor variables.

\(\text{Wellbeing} = \beta_0 + \beta_1 \cdot \text{Social Interactions} + \beta_2 \cdot \text{Age} + \epsilon\)

\(\text{Wellbeing} = \beta_0 + \beta_1 \cdot \text{Outdoor Time} + \beta_2 \cdot \text{Routine} + \epsilon\)

In APA format, report which model you think best fits the data.

Compare using AIC() and BIC() since the models are non-nested.
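
A sketch of how AIC() and BIC() are called on two non-nested models, using simulated data and made-up variable names. Passing both models in one call returns a small table; for both measures, smaller values indicate better fit after the penalty for complexity.

```r
# Simulated toy data (illustration only, not the lab data)
set.seed(5)
toy <- data.frame(x1 = rnorm(100), x2 = rnorm(100),
                  x3 = rnorm(100), x4 = rnorm(100))
toy$y <- 2 + 0.8 * toy$x1 + 0.3 * toy$x2 + rnorm(100)

# Two non-nested models: neither set of predictors is a subset of the other
toy_a <- lm(y ~ x1 + x2, data = toy)
toy_b <- lm(y ~ x3 + x4, data = toy)

# Smaller values indicate the preferred model
AIC(toy_a, toy_b)
BIC(toy_a, toy_b)
```

The df column counts the estimated parameters, which for lm() is the number of coefficients plus one for the residual variance.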

Solution


Question 10

The code below fits 5 different models based on our wrdata:

model1 <- lm(wellbeing ~ social_int + outdoor_time, data = wrdata)
model2 <- lm(wellbeing ~ social_int + outdoor_time + age, data = wrdata)
model3 <- lm(wellbeing ~ social_int + outdoor_time + routine, data = wrdata)
model4 <- lm(wellbeing ~ social_int + outdoor_time + routine + age, data = wrdata)
model5 <- lm(wellbeing ~ social_int + outdoor_time + routine + steps_k, data = wrdata)

For each of the below pairs of models, what methods are/are not available for us to use for comparison and why?

  • model1 vs model2
  • model2 vs model3
  • model1 vs model4
  • model3 vs model5

This flowchart might help you to reach your decision:

You may need to examine the dataset, and check for accuracy (e.g., are there any impossible / out of range values?) and completeness (e.g., are there any missing values?).
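
One way to carry out these checks is sketched below on simulated data, with missing values introduced deliberately. It shows how to count missing values per variable and how to confirm whether two models were fitted to the same number of observations, which matters because lm() silently drops rows with missing values on any variable in the model.

```r
# Simulated toy data with deliberately introduced missing values
set.seed(6)
toy <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
toy$y <- 1 + toy$x1 + rnorm(30)
toy$x2[c(3, 7)] <- NA

# Completeness: number of missing values per variable
colSums(is.na(toy))

# Accuracy: ranges help to spot impossible or out-of-range values
summary(toy)

# lm() drops incomplete rows, so these models use different data
toy_a <- lm(y ~ x1, data = toy)
toy_b <- lm(y ~ x1 + x2, data = toy)
nobs(toy_a)
nobs(toy_b)
```

If nobs() differs between two models, they were not fitted to the same data, and anova() will refuse to compare them; AIC and BIC values from such models are also not comparable.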

Solution