age | outdoor_time | social_int | routine | wellbeing | location | steps_k |
---|---|---|---|---|---|---|
28 | 12 | 13 | 1 | 36 | rural | 21.6 |
56 | 5 | 15 | 1 | 41 | rural | 12.3 |
25 | 19 | 11 | 1 | 35 | rural | 49.8 |
60 | 25 | 15 | 0 | 35 | rural | NA |
19 | 9 | 18 | 1 | 32 | rural | 48.1 |
34 | 18 | 13 | 1 | 34 | rural | 67.3 |
Model Comparisons
Learning Objectives
At the end of this lab, you will:
- Understand measures of model fit using the \(F\)-statistic.
- Understand the principles of model selection and how to compare models via F tests.
- Understand AIC and BIC.
What You Need
- Be up to date with lectures
- Have completed previous lab exercises from Semester 1
Required R Packages
Remember to load all packages within a code chunk at the start of your RMarkdown file using library(). If you do not have a package and need to install it, do so within the console using install.packages(" "). For further guidance on installing/updating packages, see Section C here.
For this lab, you will need to load the following package(s):
- tidyverse
- stargazer
Lab Data
You can download the data required for this lab here or read it in via this link https://uoepsy.github.io/data/wellbeing_rural.csv
Study Overview
Research Questions
- RQ1: Is there an overall effect of the number of social interactions on wellbeing scores?
- RQ2: Does the association between number of social interactions and wellbeing differ between rural and non-rural residents?
- RQ3: Does weekly outdoor time explain a significant amount of variance in wellbeing scores over and above the interaction between weekly social interactions and location (rural vs not-rural)?
Setup
- Create a new RMarkdown file
- Load the required package(s)
- Read the wellbeing_rural dataset into R, assigning it to an object named wrdata
Exercises
Check coding of variables (e.g., that categorical variables are coded as factors), and create a new binary variable which specifies whether or not each participant lives in a rural location.
Using fct_relevel(), specify ‘not rural’ as your reference group for your newly created variable (i.e., the isRural variable).
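Below is a minimal base-R sketch of the setup and recoding steps. It assumes the column names shown in the data preview (age, outdoor_time, social_int, routine, wellbeing, location). So that it runs without an internet connection, it builds a small simulated stand-in for wrdata instead of reading the real file. The non-rural location labels ("suburb", "city") are placeholders. The sketch sets the reference level with base R's relevel(). forcats::fct_relevel(isRural, "not rural"), as the exercise asks, should give the same level order.

```r
# Hypothetical, simulated stand-in for wrdata (NOT the lab data).
# In the lab, load tidyverse and read the real file instead, e.g.
# wrdata <- read_csv("https://uoepsy.github.io/data/wellbeing_rural.csv")
set.seed(2024)
n <- 200
wrdata <- data.frame(
  age          = sample(18:70, n, replace = TRUE),
  outdoor_time = sample(1:35, n, replace = TRUE),
  social_int   = sample(3:24, n, replace = TRUE),
  routine      = rbinom(n, 1, 0.5),
  location     = sample(c("rural", "suburb", "city"), n, replace = TRUE)  # placeholder labels
)
wrdata$wellbeing <- 25 + 0.4 * wrdata$social_int +
  0.2 * wrdata$outdoor_time + rnorm(n, sd = 4)

# Check coding: location is stored as character text, so make it a factor
str(wrdata)
wrdata$location <- factor(wrdata$location)

# New binary variable: does each participant live in a rural location?
wrdata$isRural <- factor(ifelse(wrdata$location == "rural", "rural", "not rural"))

# Make "not rural" the reference (first) level
wrdata$isRural <- relevel(wrdata$isRural, ref = "not rural")
levels(wrdata$isRural)
table(wrdata$location, wrdata$isRural)  # cross-check the recoding
```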
Fit the below 5 models required to address the three research questions stated above. Note down which model(s) will be used to address each research question, and examine the results of each model.
Name the models as follows: “wb_mdl0”, “wb_mdl1”, “wb_mdl2”, “wb_mdl3”, and “wb_mdl4”.
\[
\text{Wellbeing} = \beta_0 + \epsilon
\]
\[
\text{Wellbeing} = \beta_0 + \beta_1 \cdot \text{Social Interactions} + \epsilon
\]
\[
\text{Wellbeing} = \beta_0 + \beta_1 \cdot \text{Social Interactions} + \beta_2 \cdot \text{Location}_{\text{Rural}} + \epsilon
\]
\[
\begin{split}
\text{Wellbeing} &= \beta_0 + \beta_1 \cdot \text{Social Interactions} + \beta_2 \cdot \text{Location}_{\text{Rural}} \\
&\quad + \beta_3 \cdot (\text{Social Interactions} \cdot \text{Location}_{\text{Rural}}) + \epsilon
\end{split}
\]
\[
\begin{split}
\text{Wellbeing} &= \beta_0 + \beta_1 \cdot \text{Social Interactions} + \beta_2 \cdot \text{Location}_{\text{Rural}} \\
&\quad + \beta_3 \cdot (\text{Social Interactions} \cdot \text{Location}_{\text{Rural}}) + \beta_4 \cdot \text{Outdoor Time} + \epsilon
\end{split}
\]
The summary() function will be useful to examine the model output.
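One way to fit the five models with lm() is sketched below. It uses a simulated stand-in for wrdata (hypothetical data with the lab's column names, so it runs offline). In R's formula syntax, social_int * isRural expands to social_int + isRural + social_int:isRural, so the interaction models also contain both main effects.

```r
# Hypothetical, simulated stand-in for wrdata (NOT the lab data)
set.seed(2024)
n <- 200
wrdata <- data.frame(
  age          = sample(18:70, n, replace = TRUE),
  outdoor_time = sample(1:35, n, replace = TRUE),
  social_int   = sample(3:24, n, replace = TRUE),
  routine      = rbinom(n, 1, 0.5),
  location     = sample(c("rural", "suburb", "city"), n, replace = TRUE)  # placeholder labels
)
wrdata$wellbeing <- 25 + 0.4 * wrdata$social_int +
  0.2 * wrdata$outdoor_time + rnorm(n, sd = 4)
wrdata$isRural <- relevel(factor(ifelse(wrdata$location == "rural", "rural", "not rural")),
                          ref = "not rural")

wb_mdl0 <- lm(wellbeing ~ 1, data = wrdata)                                   # null (intercept-only)
wb_mdl1 <- lm(wellbeing ~ social_int, data = wrdata)
wb_mdl2 <- lm(wellbeing ~ social_int + isRural, data = wrdata)
wb_mdl3 <- lm(wellbeing ~ social_int * isRural, data = wrdata)                # adds the interaction
wb_mdl4 <- lm(wellbeing ~ social_int * isRural + outdoor_time, data = wrdata) # adds outdoor time

summary(wb_mdl3)
names(coef(wb_mdl4))
```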
Provide key model results from the two models required to address RQ1 (whether there is an overall effect of the number of social interactions on wellbeing scores) in a single formatted table.
You will need to use a new package to do this: stargazer.
Like tab_model(), which you have used in many previous labs, stargazer() can take lots of different arguments to customize and build a table. You may want to consider specifying the arguments below (remember you can open the help page via ?stargazer for further information about the functionality of the package):
- title: specify the title of your table
- dep.var.labels: specify the name(s) of your dependent variable(s)
- covariate.labels: specify the names of your covariate (i.e., independent) variables
- type: specify whether you want ‘html’ (use when knitting to HTML), ‘latex’ (use when knitting to PDF), or ‘text’ (use when knitting to Word)
- digits: specify rounding (remember APA standard is, in most cases, 2 decimal places)
- intercept.bottom: specify whether you want the intercept (or ‘constant’) value to be printed at the bottom (TRUE) or top (FALSE) of the output
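Below is a sketch of a stargazer() call that uses these arguments. The data are a simulated stand-in for wrdata (hypothetical, so it runs offline), and the title and labels are illustrative. The call is wrapped in requireNamespace() so the snippet still runs if stargazer is not installed. type = "text" prints to the console. When knitting to HTML you would instead use type = "html" in a chunk with results = 'asis'.

```r
# Hypothetical, simulated stand-in for wrdata (NOT the lab data)
set.seed(2024)
n <- 200
wrdata <- data.frame(
  outdoor_time = sample(1:35, n, replace = TRUE),
  social_int   = sample(3:24, n, replace = TRUE)
)
wrdata$wellbeing <- 25 + 0.4 * wrdata$social_int +
  0.2 * wrdata$outdoor_time + rnorm(n, sd = 4)

wb_mdl0 <- lm(wellbeing ~ 1, data = wrdata)
wb_mdl1 <- lm(wellbeing ~ social_int, data = wrdata)

if (requireNamespace("stargazer", quietly = TRUE)) {
  stargazer::stargazer(
    wb_mdl0, wb_mdl1,
    type             = "text",   # "html" when knitting to HTML
    title            = "Wellbeing regressed on weekly social interactions",
    dep.var.labels   = "Wellbeing",
    covariate.labels = "Social Interactions",
    digits           = 2,        # APA-style rounding
    intercept.bottom = TRUE      # constant printed on the last row
  )
}
```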
Your table will only render once you have knitted your document. Within your code chunk options, you may need to specify results = 'asis'.
You can learn more about updating your code chunk options here, and you should end up with the below:
Is there a main effect of the number of weekly social interactions?
Check that the \(F\)-statistic and the \(p\)-value are the same as those given at the bottom of summary(wb_mdl1).
Use the anova() function to perform a model comparison between your model with social interactions (wb_mdl1) and the null model (wb_mdl0).
Remember that the null (intercept-only) model corresponds to the null hypothesis that all slope coefficients (every \(\beta\) other than the intercept \(\beta_0\)) are zero. By comparing wb_mdl0 to wb_mdl1, we can test whether we should include social_int as an IV.
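The comparison and the check described above can be sketched as follows, on a simulated stand-in for wrdata (hypothetical data so it runs offline). wb_mdl1 differs from the intercept-only model by a single predictor. The incremental \(F\)-test from anova() should therefore reproduce the overall \(F\)-statistic and \(p\)-value reported by summary(wb_mdl1).

```r
# Hypothetical, simulated stand-in for wrdata (NOT the lab data)
set.seed(2024)
n <- 200
wrdata <- data.frame(
  outdoor_time = sample(1:35, n, replace = TRUE),
  social_int   = sample(3:24, n, replace = TRUE)
)
wrdata$wellbeing <- 25 + 0.4 * wrdata$social_int +
  0.2 * wrdata$outdoor_time + rnorm(n, sd = 4)

wb_mdl0 <- lm(wellbeing ~ 1, data = wrdata)
wb_mdl1 <- lm(wellbeing ~ social_int, data = wrdata)

comp01 <- anova(wb_mdl0, wb_mdl1)   # incremental F-test: null vs social_int model
comp01

# Overall F from summary(): a named vector with value, numdf, dendf
fs <- summary(wb_mdl1)$fstatistic
p_summary <- pf(fs["value"], fs["numdf"], fs["dendf"], lower.tail = FALSE)

all.equal(comp01$F[2], unname(fs["value"]))        # same F
all.equal(comp01$`Pr(>F)`[2], unname(p_summary))   # same p
```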
Does the association between number of social interactions and wellbeing differ between rural and non-rural residents?
Provide key model results from the two models in a single formatted table, and report the results of the model comparison in APA format.
To address RQ2, you need to compare “wb_mdl2” and “wb_mdl3”.
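Here is a sketch of the RQ2 comparison, with one possible way of assembling an APA-style \(F\)-test string from the anova() output. It runs on simulated stand-in data, and the formatting lines are illustrative rather than part of any package. Here \(df_1\) is the number of extra parameters in wb_mdl3 (the interaction term) and \(df_2\) is the residual degrees of freedom of the larger model.

```r
# Hypothetical, simulated stand-in for wrdata (NOT the lab data)
set.seed(2024)
n <- 200
wrdata <- data.frame(
  outdoor_time = sample(1:35, n, replace = TRUE),
  social_int   = sample(3:24, n, replace = TRUE),
  location     = sample(c("rural", "suburb", "city"), n, replace = TRUE)  # placeholder labels
)
wrdata$wellbeing <- 25 + 0.4 * wrdata$social_int +
  0.2 * wrdata$outdoor_time + rnorm(n, sd = 4)
wrdata$isRural <- relevel(factor(ifelse(wrdata$location == "rural", "rural", "not rural")),
                          ref = "not rural")

wb_mdl2 <- lm(wellbeing ~ social_int + isRural, data = wrdata)
wb_mdl3 <- lm(wellbeing ~ social_int * isRural, data = wrdata)

comp23 <- anova(wb_mdl2, wb_mdl3)
comp23

df1 <- comp23$Df[2]       # extra parameters in wb_mdl3 (the interaction)
df2 <- comp23$Res.Df[2]   # residual df of the larger model
p   <- comp23$`Pr(>F)`[2]

# APA style drops the leading zero from p and reports very small p as p < .001
p_txt <- if (p < .001) "p < .001" else paste("p =", sub("^0\\.", ".", sprintf("%.3f", p)))
apa <- sprintf("F(%d, %d) = %.2f, %s", df1, df2, comp23$F[2], p_txt)
apa
```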
Look at the amount of variation in wellbeing scores explained by models “wb_mdl3” and “wb_mdl4”.
From this, can we answer the third research question of whether weekly outdoor time explains a significant amount of variance in wellbeing scores over and above the interaction between weekly social interactions and location (rural vs not-rural)?
Provide justification/rationale for your answer.
Recall from Semester 1 that to determine how much variation is explained by a model, we need to look at our \(R^2\) values (specifically, the adjusted \(R^2\) values in this case, since the models have multiple predictors).
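The \(R^2\) and adjusted \(R^2\) values can be pulled from summary(), as sketched below on simulated stand-in data (hypothetical, so it runs offline). For nested models fitted to the same data, plain \(R^2\) can never decrease when a predictor is added. That is why the adjusted value, which penalises extra predictors, is the more informative of the two here.

```r
# Hypothetical, simulated stand-in for wrdata (NOT the lab data)
set.seed(2024)
n <- 200
wrdata <- data.frame(
  outdoor_time = sample(1:35, n, replace = TRUE),
  social_int   = sample(3:24, n, replace = TRUE),
  location     = sample(c("rural", "suburb", "city"), n, replace = TRUE)  # placeholder labels
)
wrdata$wellbeing <- 25 + 0.4 * wrdata$social_int +
  0.2 * wrdata$outdoor_time + rnorm(n, sd = 4)
wrdata$isRural <- relevel(factor(ifelse(wrdata$location == "rural", "rural", "not rural")),
                          ref = "not rural")

wb_mdl3 <- lm(wellbeing ~ social_int * isRural, data = wrdata)
wb_mdl4 <- lm(wellbeing ~ social_int * isRural + outdoor_time, data = wrdata)

fit <- data.frame(
  model  = c("wb_mdl3", "wb_mdl4"),
  r2     = c(summary(wb_mdl3)$r.squared,     summary(wb_mdl4)$r.squared),
  adj_r2 = c(summary(wb_mdl3)$adj.r.squared, summary(wb_mdl4)$adj.r.squared)
)
fit
diff(fit$adj_r2)   # change in adjusted R^2 from adding outdoor_time
```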
Does weekly outdoor time explain a significant amount of variance in wellbeing scores over and above the interaction between weekly social interactions and location (rural vs not-rural)?
Provide key model results from the two models in a single formatted table, and report the results of the model comparison in APA format.
To address RQ3, you need to compare “wb_mdl3” and “wb_mdl4”.
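Here is a sketch of the RQ3 comparison on simulated stand-in data (hypothetical, so it runs offline). wb_mdl4 adds exactly one parameter to wb_mdl3. The incremental \(F\) from anova() therefore equals the square of the \(t\)-value for outdoor_time in summary(wb_mdl4), which makes a handy cross-check.

```r
# Hypothetical, simulated stand-in for wrdata (NOT the lab data)
set.seed(2024)
n <- 200
wrdata <- data.frame(
  outdoor_time = sample(1:35, n, replace = TRUE),
  social_int   = sample(3:24, n, replace = TRUE),
  location     = sample(c("rural", "suburb", "city"), n, replace = TRUE)  # placeholder labels
)
wrdata$wellbeing <- 25 + 0.4 * wrdata$social_int +
  0.2 * wrdata$outdoor_time + rnorm(n, sd = 4)
wrdata$isRural <- relevel(factor(ifelse(wrdata$location == "rural", "rural", "not rural")),
                          ref = "not rural")

wb_mdl3 <- lm(wellbeing ~ social_int * isRural, data = wrdata)
wb_mdl4 <- lm(wellbeing ~ social_int * isRural + outdoor_time, data = wrdata)

comp34 <- anova(wb_mdl3, wb_mdl4)
comp34

# With one added parameter, the nested-model F equals t^2 for that coefficient
t_out <- coef(summary(wb_mdl4))["outdoor_time", "t value"]
c(F = comp34$F[2], t_squared = t_out^2)
```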
Compare the following two models, each of which examines the association between wellbeing scores and two different predictor variables.
\(\text{Wellbeing} = \beta_0 + \beta_1 \cdot \text{Social Interactions} + \beta_2 \cdot \text{Age} + \epsilon\)
\(\text{Wellbeing} = \beta_0 + \beta_1 \cdot \text{Outdoor Time} + \beta_2 \cdot \text{Routine} + \epsilon\)
In APA format, report which model you think best fits the data.
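These two models are not nested, since neither predictor set contains the other. An anova() \(F\)-test is therefore not appropriate. AIC() and BIC() can still be compared as long as both models are fitted to the same observations, and lower values indicate better fit after penalising complexity. The sketch below runs on simulated stand-in data, and the object names mdl_si_age and mdl_out_rout are hypothetical.

```r
# Hypothetical, simulated stand-in for wrdata (NOT the lab data)
set.seed(2024)
n <- 200
wrdata <- data.frame(
  age          = sample(18:70, n, replace = TRUE),
  outdoor_time = sample(1:35, n, replace = TRUE),
  social_int   = sample(3:24, n, replace = TRUE),
  routine      = rbinom(n, 1, 0.5)
)
wrdata$wellbeing <- 25 + 0.4 * wrdata$social_int +
  0.2 * wrdata$outdoor_time + rnorm(n, sd = 4)

mdl_si_age   <- lm(wellbeing ~ social_int + age, data = wrdata)
mdl_out_rout <- lm(wellbeing ~ outdoor_time + routine, data = wrdata)

aics <- AIC(mdl_si_age, mdl_out_rout)   # data frame with df and AIC columns
bics <- BIC(mdl_si_age, mdl_out_rout)   # data frame with df and BIC columns
aics
bics
rownames(aics)[which.min(aics$AIC)]     # model with the lower AIC
```

Both models estimate the same number of parameters, so the AIC and BIC differences between them are identical here, and the two criteria must agree on which model is preferred.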
The code below fits 5 different models based on our wrdata:
model1 <- lm(wellbeing ~ social_int + outdoor_time, data = wrdata)
model2 <- lm(wellbeing ~ social_int + outdoor_time + age, data = wrdata)
model3 <- lm(wellbeing ~ social_int + outdoor_time + routine, data = wrdata)
model4 <- lm(wellbeing ~ social_int + outdoor_time + routine + age, data = wrdata)
model5 <- lm(wellbeing ~ social_int + outdoor_time + routine + steps_k, data = wrdata)
For each of the below pairs of models, what methods are/are not available for us to use for comparison and why?
- model1 vs model2
- model2 vs model3
- model1 vs model4
- model3 vs model5
This flowchart might help you to reach your decision:
You may need to examine the dataset, and check for accuracy (e.g., are there any impossible / out of range values?) and completeness (e.g., are there any missing values?).
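The sketch below shows these accuracy and completeness checks, and why missing values matter for the model3 vs model5 comparison (the data preview at the top of this page shows an NA in steps_k). It uses a simulated stand-in for wrdata with hypothetical steps_k values, 10 of which are artificially set to missing. By default lm() drops incomplete rows, so model5 is fitted to fewer observations than model3. R then refuses the anova() comparison and warns when AIC() is given models with different numbers of observations.

```r
# Hypothetical, simulated stand-in for wrdata (NOT the lab data)
set.seed(2024)
n <- 200
wrdata <- data.frame(
  outdoor_time = sample(1:35, n, replace = TRUE),
  social_int   = sample(3:24, n, replace = TRUE),
  routine      = rbinom(n, 1, 0.5)
)
wrdata$wellbeing <- 25 + 0.4 * wrdata$social_int +
  0.2 * wrdata$outdoor_time + rnorm(n, sd = 4)
wrdata$steps_k <- round(runif(n, 0, 30), 1)
wrdata$steps_k[sample(n, 10)] <- NA   # hypothetical missing values

summary(wrdata)          # accuracy: scan min/max for impossible or out-of-range values
colSums(is.na(wrdata))   # completeness: number of missing values per column

model3 <- lm(wellbeing ~ social_int + outdoor_time + routine, data = wrdata)
model5 <- lm(wellbeing ~ social_int + outdoor_time + routine + steps_k, data = wrdata)
c(model3 = nobs(model3), model5 = nobs(model5))   # rows actually used by each model

# anova() stops because the models were fitted to different numbers of rows
res <- tryCatch(anova(model3, model5), error = function(e) e)
conditionMessage(res)

# AIC() still returns values but warns about the differing numbers of observations
w <- tryCatch(AIC(model3, model5), warning = function(w) w)
conditionMessage(w)
```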