age | outdoor_time | social_int | routine | wellbeing | location | steps_k |
---|---|---|---|---|---|---|
28 | 12 | 13 | 1 | 36 | rural | 21.6 |
56 | 5 | 15 | 1 | 41 | rural | 12.3 |
25 | 19 | 11 | 1 | 35 | rural | 49.8 |
60 | 25 | 15 | 0 | 35 | rural | NA |
19 | 9 | 18 | 1 | 32 | rural | 48.1 |
34 | 18 | 13 | 1 | 34 | rural | 67.3 |
Model Comparisons
Learning Objectives
At the end of this lab, you will:
- Understand measures of model fit using F.
- Understand the principles of model selection and how to compare models via F tests.
- Understand AIC and BIC.
What You Need
- Be up to date with lectures
- Have completed previous lab exercises from Semester 1
Required R Packages
Remember to load all packages within a code chunk at the start of your RMarkdown file using library(). If you do not have a package and need to install it, do so within the console using install.packages(" "). For further guidance on installing/updating packages, see Section C here.
For this lab, you will need to load the following package(s):
- tidyverse
- stargazer
Lab Data
You can download the data required for this lab here or read it in via this link https://uoepsy.github.io/data/wellbeing_rural.csv
Study Overview
Research Questions
- RQ1: Is there an overall effect of the number of social interactions on wellbeing scores?
- RQ2: Does the association between number of social interactions and wellbeing differ between rural and non-rural residents?
- RQ3: Does weekly outdoor time explain a significant amount of variance in wellbeing scores over and above the interaction between weekly social interactions and location (rural vs not-rural)?
Setup
- Create a new RMarkdown file
- Load the required package(s)
- Read the wellbeing_rural dataset into R, assigning it to an object named wrdata
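The setup steps above can be sketched as follows. This is a minimal sketch that assumes the tidyverse and stargazer packages are already installed and that the data are read straight from the link given under Lab Data; glimpse() and summary() are just one way of taking a first look at the data.

```r
# Load the packages needed for this lab
library(tidyverse)   # read_csv(), mutate(), fct_relevel(), ...
library(stargazer)   # formatted regression tables

# Read the data from the link given in the Lab Data section
wrdata <- read_csv("https://uoepsy.github.io/data/wellbeing_rural.csv")

# Quick look at the variables and their types
glimpse(wrdata)
summary(wrdata)
```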
Exercises
Check coding of variables (e.g., that categorical variables are coded as factors), and create a new binary variable which specifies whether or not each participant lives in a rural location.
Using fct_relevel(), specify ‘not rural’ as your reference group for your newly created variable (i.e., the isRural variable).
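One possible way to do the recoding is sketched below. It assumes that rural residents are labelled "rural" in the location column (as in the data preview) and that routine is a 0/1 indicator which you want to treat as categorical; the labels "rural" and "not rural" for the new variable are a choice made here, not something fixed by the data.

```r
library(tidyverse)

wrdata <- read_csv("https://uoepsy.github.io/data/wellbeing_rural.csv")

wrdata <- wrdata %>%
  mutate(
    # treat the categorical variables as factors
    location = factor(location),
    routine  = factor(routine),
    # binary indicator: rural vs everything else
    isRural  = factor(if_else(location == "rural", "rural", "not rural")),
    # make 'not rural' the reference (first) level
    isRural  = fct_relevel(isRural, "not rural")
  )

# The first level listed is the reference group
levels(wrdata$isRural)
table(wrdata$location, wrdata$isRural)
```

The cross-tabulation in the last line is a quick check that every location has ended up in the group you intended.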
Fit the five models below, which are required to address the three research questions stated above. Note down which model(s) will be used to address each research question, and examine the results of each model.
Name the models as follows: “wb_mdl0”, “wb_mdl1”, “wb_mdl2”, “wb_mdl3”, and “wb_mdl4”.
\[
\text{Wellbeing} = \beta_0 + \epsilon
\]
\[
\text{Wellbeing} = \beta_0 + \beta_1 \cdot \text{Social Interactions} + \epsilon
\]
\[
\text{Wellbeing} = \beta_0 + \beta_1 \cdot \text{Social Interactions} + \beta_2 \cdot \text{Location}_{\text{Rural}} + \epsilon
\]
\[
\begin{split}
\text{Wellbeing} = \beta_0 + \beta_1 \cdot \text{Social Interactions} + \beta_2 \cdot \text{Location}_{\text{Rural}} \\
+ \beta_3 \cdot (\text{Social Interactions} \cdot \text{Location}_{\text{Rural}}) + \epsilon
\end{split}
\]
\[
\begin{split}
\text{Wellbeing} = \beta_0 + \beta_1 \cdot \text{Social Interactions} + \beta_2 \cdot \text{Location}_{\text{Rural}} \\
+ \beta_3 \cdot (\text{Social Interactions} \cdot \text{Location}_{\text{Rural}}) + \beta_4 \cdot \text{Outdoor Time} + \epsilon
\end{split}
\]
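The five equations above might translate into lm() calls along the following lines. This is a sketch that assumes the column names shown in the data preview (wellbeing, social_int, outdoor_time) and an isRural factor with ‘not rural’ as the reference level; the recoding is done in base R here only to keep the block self-contained.

```r
wrdata <- read.csv("https://uoepsy.github.io/data/wellbeing_rural.csv")

# Base-R version of the isRural recoding, 'not rural' as reference level
wrdata$isRural <- factor(
  ifelse(wrdata$location == "rural", "rural", "not rural"),
  levels = c("not rural", "rural")
)

# Null (intercept-only) model
wb_mdl0 <- lm(wellbeing ~ 1, data = wrdata)
# Social interactions only
wb_mdl1 <- lm(wellbeing ~ social_int, data = wrdata)
# Additive model: social interactions + location
wb_mdl2 <- lm(wellbeing ~ social_int + isRural, data = wrdata)
# Interaction model: social_int * isRural expands to both
# main effects plus their product term
wb_mdl3 <- lm(wellbeing ~ social_int * isRural, data = wrdata)
# Interaction model plus weekly outdoor time
wb_mdl4 <- lm(wellbeing ~ social_int * isRural + outdoor_time, data = wrdata)

summary(wb_mdl4)
```

Calling summary() on each model in turn lets you examine its coefficients, \(R^2\), and overall \(F\)-test.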
Provide key model results from the two models required to address RQ1 - whether there is an overall effect of the number of social interactions on wellbeing scores - in a single formatted table.
Is there a main effect of the number of weekly social interactions?
Check that the \(F\)-statistic and the \(p\)-value are the same as those given at the bottom of summary(wb_mdl1).
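One way to carry out this check is sketched below, assuming RQ1 is addressed by comparing the intercept-only model with the model containing social interactions. Comparing a model against the null model with anova() should reproduce the overall \(F\)-test that summary() prints for that model.

```r
wrdata <- read.csv("https://uoepsy.github.io/data/wellbeing_rural.csv")

wb_mdl0 <- lm(wellbeing ~ 1, data = wrdata)
wb_mdl1 <- lm(wellbeing ~ social_int, data = wrdata)

# Incremental F-test: does adding social_int improve on the null model?
cmp <- anova(wb_mdl0, wb_mdl1)
cmp

# Overall F-test stored in the model summary
# (named vector with elements value, numdf, dendf)
f <- summary(wb_mdl1)$fstatistic
f

# p-value for the overall F-test, computed from the F distribution
pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)
```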
Does the association between number of social interactions and wellbeing differ between rural and non-rural residents?
Provide key model results from the two models in a single formatted table, and report the results of the model comparison in APA format.
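A possible approach is sketched below. It assumes that RQ2 is addressed by comparing the additive model (wb_mdl2) with the interaction model (wb_mdl3), and it uses stargazer() with type = "text" so that the table prints in the console; other output types may suit a knitted document better.

```r
library(stargazer)

wrdata <- read.csv("https://uoepsy.github.io/data/wellbeing_rural.csv")
wrdata$isRural <- factor(
  ifelse(wrdata$location == "rural", "rural", "not rural"),
  levels = c("not rural", "rural")
)

wb_mdl2 <- lm(wellbeing ~ social_int + isRural, data = wrdata)
wb_mdl3 <- lm(wellbeing ~ social_int * isRural, data = wrdata)

# Both models side by side in one table
stargazer(wb_mdl2, wb_mdl3, type = "text",
          title = "Wellbeing models with and without the interaction")

# Nested comparison: does the interaction term improve model fit?
cmp_rq2 <- anova(wb_mdl2, wb_mdl3)
cmp_rq2
```

The second row of the anova() output holds the degrees of freedom, \(F\)-statistic, and \(p\)-value needed for the APA-style report.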
Look at the amount of variation in wellbeing scores explained by models “wb_mdl3” and “wb_mdl4”.
From this, can we answer the third research question of whether weekly outdoor time explains a significant amount of variance in wellbeing scores over and above the interaction between weekly social interactions and location (rural vs not-rural)?
Provide justification/rationale for your answer.
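To look at the variance explained by each model, something like the following sketch could be used. Keep in mind that \(R^2\) cannot decrease when a predictor is added to a model fitted to the same data, so a larger \(R^2\) for the bigger model does not by itself show that the improvement is statistically significant.

```r
wrdata <- read.csv("https://uoepsy.github.io/data/wellbeing_rural.csv")
wrdata$isRural <- factor(
  ifelse(wrdata$location == "rural", "rural", "not rural"),
  levels = c("not rural", "rural")
)

wb_mdl3 <- lm(wellbeing ~ social_int * isRural, data = wrdata)
wb_mdl4 <- lm(wellbeing ~ social_int * isRural + outdoor_time, data = wrdata)

# Collect R-squared and adjusted R-squared for both models
fit_stats <- data.frame(
  model         = c("wb_mdl3", "wb_mdl4"),
  r_squared     = c(summary(wb_mdl3)$r.squared,
                    summary(wb_mdl4)$r.squared),
  adj_r_squared = c(summary(wb_mdl3)$adj.r.squared,
                    summary(wb_mdl4)$adj.r.squared)
)
fit_stats
```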
Does weekly outdoor time explain a significant amount of variance in wellbeing scores over and above the interaction between weekly social interactions and location (rural vs not-rural)?
Provide key model results from the two models in a single formatted table, and report the results of the model comparison in APA format.
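The incremental \(F\)-test for this comparison can also be computed by hand from the residual sums of squares, which may help make clear what anova() is doing. The sketch below assumes RQ3 is addressed by comparing wb_mdl3 (restricted) with wb_mdl4 (full); the names rss_r, rss_f, F_inc and so on are illustrative only.

```r
wrdata <- read.csv("https://uoepsy.github.io/data/wellbeing_rural.csv")
wrdata$isRural <- factor(
  ifelse(wrdata$location == "rural", "rural", "not rural"),
  levels = c("not rural", "rural")
)

wb_mdl3 <- lm(wellbeing ~ social_int * isRural, data = wrdata)
wb_mdl4 <- lm(wellbeing ~ social_int * isRural + outdoor_time, data = wrdata)

# Residual sums of squares and residual df for each model
rss_r <- deviance(wb_mdl3)      # restricted model
rss_f <- deviance(wb_mdl4)      # full model
df_r  <- df.residual(wb_mdl3)
df_f  <- df.residual(wb_mdl4)

# F = (reduction in RSS per extra parameter) / (residual variance of full model)
F_inc <- ((rss_r - rss_f) / (df_r - df_f)) / (rss_f / df_f)
p_inc <- pf(F_inc, df_r - df_f, df_f, lower.tail = FALSE)
c(F = F_inc, p = p_inc)

# The same test via anova()
cmp_rq3 <- anova(wb_mdl3, wb_mdl4)
cmp_rq3
```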
Compare the following two models, each of which predicts wellbeing scores from a different pair of predictor variables.
\(\text{Wellbeing} = \beta_0 + \beta_1 \cdot \text{Social Interactions} + \beta_2 \cdot \text{Age} + \epsilon\)
\(\text{Wellbeing} = \beta_0 + \beta_1 \cdot \text{Outdoor Time} + \beta_2 \cdot \text{Routine} + \epsilon\)
In APA format, report which model you think best fits the data.
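Because neither of these models is a special case of the other, an incremental \(F\)-test is not appropriate; information criteria are one option instead. The sketch below uses illustrative object names (wb_soc_age and wb_out_rout are not prescribed by the lab) and assumes both models are fitted to the same set of observations.

```r
wrdata <- read.csv("https://uoepsy.github.io/data/wellbeing_rural.csv")

# Two non-nested models with the same outcome
wb_soc_age  <- lm(wellbeing ~ social_int + age, data = wrdata)
wb_out_rout <- lm(wellbeing ~ outdoor_time + routine, data = wrdata)

# AIC/BIC are only comparable if both models use the same observations
c(nobs(wb_soc_age), nobs(wb_out_rout))

# Smaller values indicate the better trade-off between fit and complexity
aic_tab <- AIC(wb_soc_age, wb_out_rout)
bic_tab <- BIC(wb_soc_age, wb_out_rout)
aic_tab
bic_tab
```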
The code below fits five different models to our wrdata:
model1 <- lm(wellbeing ~ social_int + outdoor_time, data = wrdata)
model2 <- lm(wellbeing ~ social_int + outdoor_time + age, data = wrdata)
model3 <- lm(wellbeing ~ social_int + outdoor_time + routine, data = wrdata)
model4 <- lm(wellbeing ~ social_int + outdoor_time + routine + age, data = wrdata)
model5 <- lm(wellbeing ~ social_int + outdoor_time + routine + steps_k, data = wrdata)
For each of the pairs of models below, what methods are/are not available for us to use for comparison, and why?
- model1 vs model2
- model2 vs model3
- model1 vs model4
- model3 vs model5
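Two things are worth checking for each pair: whether one model's predictors are a subset of the other's (nested), and whether both models were fitted to the same observations. The sketch below is one way to check the second point. It relies on the fact that lm() drops rows with missing values by default, and the data preview shows at least one NA in steps_k; the objects wr_complete, model3_cc and model5_cc are illustrative names.

```r
wrdata <- read.csv("https://uoepsy.github.io/data/wellbeing_rural.csv")

model1 <- lm(wellbeing ~ social_int + outdoor_time, data = wrdata)
model2 <- lm(wellbeing ~ social_int + outdoor_time + age, data = wrdata)
model3 <- lm(wellbeing ~ social_int + outdoor_time + routine, data = wrdata)
model4 <- lm(wellbeing ~ social_int + outdoor_time + routine + age, data = wrdata)
model5 <- lm(wellbeing ~ social_int + outdoor_time + routine + steps_k, data = wrdata)

# Number of observations actually used by each model
sapply(list(model1 = model1, model2 = model2, model3 = model3,
            model4 = model4, model5 = model5), nobs)

# model5 uses fewer rows because of missing steps_k values, so it is
# not directly comparable with model3. One option: refit both models
# on the rows where steps_k is observed.
wr_complete <- wrdata[!is.na(wrdata$steps_k), ]
model3_cc <- update(model3, data = wr_complete)
model5_cc <- update(model5, data = wr_complete)

anova(model3_cc, model5_cc)   # nested and same data: F-test is possible
AIC(model3_cc, model5_cc)     # same data: AIC is comparable
```

For pairs fitted to the same data, nested models can be compared with an incremental \(F\)-test as well as AIC/BIC, whereas non-nested models are limited to AIC/BIC.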
This flowchart might help you to reach your decision: