Week 8 Exercises: Multiple Regression

Wellbeing and Outdoor Time

Research Question: after taking into account differences due to people’s level of social interactions, to what extent is well-being associated with the time people spend outdoors?

Data: wellbeing.csv

Researchers interviewed 32 participants, selected at random from the population of residents of Edinburgh & Lothians. They used the Warwick-Edinburgh Mental Wellbeing Scale (WEMWBS), a self-report measure of mental health and well-being. The scale is scored by summing responses to each item, with items answered on a 1 to 5 Likert scale. The minimum scale score is 14 and the maximum is 70.
The researchers also asked participants to estimate the average number of hours they spend outdoors each week, the average number of social interactions they have each week (whether on-line or in-person), and whether they believe that they stick to a routine throughout the week (Yes/No).

The dataset is available at https://uoepsy.github.io/data/wellbeing.csv and contains five attributes:

  • wellbeing: Warwick-Edinburgh Mental Wellbeing Scale (WEMWBS), a self-report measure of mental health and well-being. The scale is scored by summing responses to each item, with items answered on a 1 to 5 Likert scale. The minimum scale score is 14 and the maximum is 70.
  • outdoor_time: Self report estimated number of hours per week spent outdoors
  • social_int: Self report estimated number of social interactions per week (both online and in-person)
  • routine: Binary Yes/No response to the question “Do you follow a daily routine throughout the week?”
  • location: Location of primary residence (City, Suburb, Rural)
Question 1

Read in the data, and explore and describe the variables and relationships that are relevant to the research question.

Hints:

You might want to:

  • plot the marginal distributions (the distributions of each variable in the analysis without reference to the other variables)
  • plot the marginal relationships between the outcome variable and each of the explanatory variables.
  • make a quick correlation matrix of the variables which are to be used in the analysis, and write a short paragraph describing the relationships.
    • e.g. cor(data[,c(1:4)]) will give us a matrix of correlations between each pair of the first 4 variables in the data

Solution

Question 2

Research Question: after taking into account differences due to people’s level of social interactions, to what extent is well-being associated with the time people spend outdoors?

Fit a regression model that you can use to answer the research question (make sure to give it a name to store it in your environment).

Hints:

\(Wellbeing = b_0 \ + \ b_1 \cdot Social Interactions \ + \ b_2 \cdot Outdoor Time \ + \ \epsilon\)

Solution

Question 3

Extract and interpret the parameter estimates (the coefficients) from your model.

Hints:

Solution

Question 4

Along with the p-values given by summary() that test the null hypothesis that each coefficient is equal to zero, we can obtain confidence intervals for our estimates to provide a range of plausible values for each coefficient. This may be useful if we want to focus more on the estimated value, rather than “is it zero or not?”.

Look up the function confint() and use it to obtain some 95% Confidence Intervals for the coefficients.

Solution

Question 5

Does your model provide a better fit to the data than a model with no explanatory variables? (i.e., test against the alternative hypothesis that at least one of the explanatory variables significantly predicts wellbeing scores).

Hints:

  • it’s all in the summary()!!
  • this might not be a useful question to ask, but we’re just trying to go through what each bit of the model output shows.

Solution

Optional Question 6

Just to prove it to ourselves, conduct a model comparison between your model and the “null model”, and check that the \(F\)-statistic from the summary() of your model matches that from the comparison.

Hints:

  • we talked about the ‘null model’, and conducting model comparison in 8A#model-comparisons.
  • we can fit a model with no predictors such as lm(wellbeing ~ 1, data = mwdata).

Solution

Question 7

Check the assumptions of your model.

Hints:

  • See 8B#assumptions
  • You don’t necessarily have to perform statistical tests of assumptions, but check the plots at the very least!
  • Interpreting these plots is ultimately a judgement call, and comes with experience.

Solution

Question 8

Check for influential observations.

Hints:

  • 8B#individual-case-diagnostics covers a whole load of different metrics you might use. As a starting point, perhaps consider Cook’s Distances plots, or use influence.measures() to get lots of info.

Solution

Question 9

Create a visualisation of the relationship between wellbeing and outdoor time, after accounting for social interactions.

Hints:

  • There’s an example of visualising multiple regression coefficients in 8A#multiple-regression-coefficients.
  • to visualise just one association, you might need the terms argument in plot_model(). Don’t forget you can look up the documentation by typing ?plot_model in the console.

Solution

Question 10

Create a regression table to present your results

Hints:

  • There is a useful function called tab_model(), also from the sjPlot package. You can see it used in 7A#example.

Solution

Student Sleep Quality

USMR 2022 Data

The data from the USMR 2022 survey (now closed) can be found at https://uoepsy.github.io/data/usmr2022.csv.

note, this is the survey data just from USMR this year, not other students on other courses or in previous years

Question 11

Recall that last week we discussed my little theory that people who believe they have more control over life will rate their sleep quality as being better.

Figure 3: linear model predicting sleep quality ratings by perceived locus of control

Monica has pointed out that it’s likely that the association we have found (Figure 3) might be due to other things. There are lots of explanations we could come up with. Some of these might involve variables we have measured.1 What if the relationship we see between peoples’ loc and sleeprating is better explained by their emotional stability? Monica makes a convincing argument that having higher emotional stability may be correlated with feeling more in control of one’s life, and may also influence sleep quality.
Sometimes it can help to think of models in the form of diagrams:

Figure 4: Monica’s suggestion, where emotional stability and locus of control are correlated (indicated by double headed arrow), and both influence sleep quality (indicated by single headed arrows)

Monica tells me that if I want to look at how ‘locus of control’ is associated with sleep quality, it would be more useful to think about the association after controlling for emotional stability.

How does locus of control (loc) influence sleep quality (sleeprating) after accounting for emotional stability (emot_stability)?

Hints:

  • See section 8A#multiple-regression-coefficients for how this might be achieved.

  • the way the question is phrased suggests that sleeprating is our outcome (the thing being explained/influenced/predicted); loc is the focal predictor (the main predictor of interest); and emot_stability is a covariate. we treat “covariates” and “focal predictors” exactly the same in the model: lm(outcome ~ covariates and focal predictors), it is just terminology that we use to distinguish what we are interested in from what we want to acknowledge as being theoretically relevant.

  • Don’t worry about assumptions, we’ll take a look at them next.

Solution

Question 12

Monica and I want to write up our study of how locus of control is associated with sleep ratings.

However, we haven’t checked any of our model assumptions, and so we shouldn’t really trust the p-values we have been looking at so far.

Take a look at the assumptions and diagnostics of your model. If your assumptions are worryingly weird, there are things that may be useful to consider doing. We would probably recommend bootstrapping (see 8B#back-to-the-bootstrap).

Hints:

  • You can ask yourself one of two questions (or both if you like), but we recommend the visualisation approach:

  • It’s also worth taking a look at the ‘diagnostics’ (such as the influence of individual cases, see 8B#individual-case-diagnostics.

  • In order to bootstrap, you might need to fit the model to a dataset without any NAs in it.

Solution

Question 13

Monica convinces me that we have theoretical motivation for thinking that sleeprating is influenced by both emotional stability and locus of control.

Tia is curious if other aspects of personality influence sleep quality ratings. She knows that we measured 5 personality traits: emotional stability, imagination, extraversion, agreeableness, and conscientiousness.

Tia asks us:

Over and above the influence of locus of control and emotional stability, are other personality traits important predictors of how sleep quality is rated?

Hints:

  • This question asks about the explanatory power of a set of predictors. The best way to test this might be model comparison (see 8A#model-comparisons)
  • To compare two models, the models need to be fitted to the same data. However, when we fit a model using lm(), rows where there is missing data for any of the variables in the model will be excluded.
    • if we have a model mod1 <- lm(y ~ x1) and we compare it to mod2 <- lm(y ~ x1 + x2), if there is missing data on x2, then mod2 will be fitted to a smaller set of the data.
    • a good approach is to make a separate object in your environment which is the data you want to model.
    # select all variables included in either model, and omit the NAs
    moddata <- data %>% select(y, x1, x2) %>% na.omit
    # these two models will now be fitted to the same data
    mod1 <- lm(y ~ x1, moddata)
    mod2 <- lm(y ~ x1 + x2, moddata)
    # meaning we can compare them:
    anova(mod1, mod2)

Solution

RMarkdown

Question 14

Create an RMarkdown document that presents and describes analyses that addresses the research questions below, and then presents and interprets the results.

Research Questions

  1. After accounting for emotional stability, how is perceived control over one’s life associated with quality of sleep?
  2. In addition to the associations studied above, are other personality traits associated with differences in sleep quality?

Hints:

  • You’ve done the modelling already in the last couple of questions, so this just a matter of writing, interpreting, and presenting!
  • Don’t forget that we have the “Rmd-bootcamp” materials for reference: https://uoepsy.github.io/scs/rmd-bootcamp/

Optional Extra: Conscientiousness

Optional Question 15

This is Eleanor Abernathy. She has been learning about astrology and horoscopes, but is still skeptical. She wants to know if, after accounting for differences between cat/dog people, a person’s month-of-birth is associated with how conscientious they are.

Using the same USMR 2022 survey data, fit the a model and use it to address the following research question:

After accounting for differences between cat/dog people, is month-of-birth associated with conscientiousness?

Hint:

  • Note that the question is more a matter of “is”/“does”, and not “what is”/“how does”.
  • To answer this question, we might be more inclined to analyse the variance in conscientiousness explained by birth-months (rather than looking at specific differences) - see 8A#analysis-of-variance.

Solution

Footnotes

  1. When designing a study/collected data, it’s good practice to do this sort of thinking before hand, to make sure you measure all the things you want to measure↩︎