Modelling group-level variability using random effects

Get set up

Create a new .Rmd file for this week’s exercises.
Save it somewhere you can find it again.
Give it a clear name (for example, dapr3_lab02.Rmd).
In the first code chunk, load the packages you’ll need this week:
- tidyverse
- lme4

Read in the dataset located at https://uoepsy.github.io/data/lmm_jsup.csv and name it wp_data.
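A minimal sketch of what that first chunk might look like, assuming both packages are already installed. glimpse() is just one way to check that the data arrived as expected.

```r
library(tidyverse)  # provides read_csv(), ggplot(), count() and friends
library(lme4)       # provides lmer() for fitting linear mixed models

# Read the CSV straight from the URL given above and store it as wp_data
wp_data <- read_csv("https://uoepsy.github.io/data/lmm_jsup.csv")

# Quick look at the column names and types
glimpse(wp_data)
```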

RQ: How is the pride that government employees feel about their workplace associated with how long they have been employed and their seniority level?

variable	description
department_name	Name of government department
dept	Department acronym
virtual	Whether the department functions as a hybrid department, with some employees working remotely (1), or as a fully in-person office (0)
role	Employee role (A, B or C)
seniority	Employee's seniority level. These map to roles, such that role A is 0-4, role B is 5-9, role C is 10-14. Higher numbers indicate more seniority
employment_length	Length of employment in the department (years)
wp	Composite measure of 'workplace pride'

More detail about this dataset

A questionnaire was sent to all UK civil service departments, and the lmm_jsup.csv dataset contains all responses that were received. Some of these departments work as hybrid or ‘virtual’ departments, with a mix of remote and office-based employees. Others are fully office-based.

The questionnaire included items asking how much the respondent believes in the department and in how it engages with the community, what it produces, how it operates, and how it treats its people. A composite measure of ‘workplace pride’ was constructed for each employee. Employees in the civil service are categorised into 3 different roles: A, B and C. The roles tend to increase in responsibility, with role C being more managerial and role A having less responsibility. We also have data on the length of time each employee has been in the department (sometimes new employees come straight in at role C, but many of them start in role A and work up over time).

Question 1

The research question (see above) is asking about how workplace pride is associated with employment length and seniority. From this question, we know that our outcome variable will be wp, and our predictors will be employment_length and seniority.

Now: Our analysis will also contain one set of random effects, because we’ll need to model the random variability contributed by a grouping variable. Identify the grouping variable that we will need to model using random effects.

🗂️ See Identify grouping variables flash card.

Question 2

Get to know the dataset and describe the grouping variable.

Specifically:

- What does each observation in wp_data represent?
- How many levels are there in the grouping variable?
- What does each level represent?
- How many observations do we have from each level? Which level has the most observations? Which has the fewest?

🗂️ See Describe group-structured data flash card.
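One possible way to count levels and observations per level, sketched on the sleepstudy data that ships with lme4 rather than on wp_data, so the exercise itself is left to you. In sleepstudy the grouping variable is Subject; that name is specific to the example data, so swap in your own data and grouping column.

```r
library(tidyverse)
library(lme4)  # only needed here for the built-in sleepstudy example data

# How many levels does the grouping variable have?
n_distinct(sleepstudy$Subject)

# How many observations are there per level, sorted from most to fewest?
sleepstudy |>
  count(Subject, sort = TRUE)
```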

Question 3

Here is a plot of the overall association between employment_length and wp. Each dot is coloured based on that employee’s seniority, so all variables from the RQ are included. First, copy this code and generate this plot yourself.

wp_data |>
  ggplot(aes(x = employment_length, y = wp, colour = seniority)) +
  geom_point()

Modify this plot to show the same data split into different panels (also called facets), one for each level of the grouping variable.

🗂️ See Plot group-structured data flash card.
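As a rough illustration of faceting, here is a sketch using lme4's built-in sleepstudy data, where Subject plays the role of the grouping variable. The variable names Days, Reaction and Subject belong to that example dataset, not to wp_data.

```r
library(tidyverse)
library(lme4)  # only needed here for the built-in sleepstudy example data

# facet_wrap() draws one panel per level of the variable named after the ~
facet_plot <- sleepstudy |>
  ggplot(aes(x = Days, y = Reaction)) +
  geom_point() +
  facet_wrap(~ Subject)

facet_plot
```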

Question 4

Make some predictions based on the plot you just created.

Specifically:

- Which level(s) of the grouping variable appear to have higher workplace pride than the others? Which levels appear to have lower workplace pride?
- Are there particular levels of this grouping variable that appear to have a stronger effect of employment length on workplace pride?

You’ll refer to these levels later, so write them down.

Question 5

Now we’ll fit a linear mixed model to this data.

We want to estimate the effects of employment length and seniority on workplace pride. Therefore, we know that our fixed effects will be wp ~ employment_length + seniority.

The model should include random intercepts by the grouping variable, as well as random slopes by that variable over both predictors. Fill in the ... blanks in the R code below with the appropriate values and variable names.

wp_lmm <- lmer(
  wp ~ employment_length + seniority + (... + ... + ... | ...), 
  data = wp_data
)

🗂️ See Add random effects to a model formula flash card.
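For comparison, here is a sketch of the same formula structure on lme4's built-in sleepstudy data, which has a single predictor (Days) and a grouping variable called Subject. Those names are specific to the example data; the point is only the shape of the (intercept + slope | group) term.

```r
library(lme4)

# Random intercepts (the 1) plus random slopes of Days, both varying by Subject
sleep_lmm <- lmer(
  Reaction ~ Days + (1 + Days | Subject),
  data = sleepstudy
)

summary(sleep_lmm)
```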

Question 6

Plot the random effects estimated by wp_lmm using dotplot.ranef.mer().

🗂️ See Plot random effects flash card.
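A sketch of the call on a model fitted to lme4's built-in sleepstudy data (sleep_lmm is a name made up for this example). The same pattern should carry over to wp_lmm.

```r
library(lme4)

# Example model on the built-in sleepstudy data
sleep_lmm <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

# ranef() extracts the estimated random effects;
# dotplot.ranef.mer() plots them, with one panel per random-effect term
re_plots <- dotplot.ranef.mer(ranef(sleep_lmm))
re_plots
```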

Question 7

Imagine you were including this dotplot of random effects in a report, and you had to write a caption to describe it to your reader. Write one sentence per panel to tell the reader what is being displayed in each panel, and make sure to include what the dots and error bars represent.

Question 8

Compare your earlier predictions to the model-estimated random effects.

Back in Q4, you named a few levels that you thought might behave differently from the others.

Locate those levels in the dotplot of your random effects.

Does it look like the levels you identified are behaving differently from average in the way you predicted? How can you tell?

Question 9

Why are the numbers given to you by ranef(wp_lmm) and by coef(wp_lmm) different?

Explain in a sentence or two.

🗂️ See Extract estimates from a fitted model flash card.
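If you want to see the three extractor functions side by side before answering, here is a sketch on a model fitted to lme4's built-in sleepstudy data (sleep_lmm is a name made up for this example). It only prints the values; working out how they relate is left to you.

```r
library(lme4)

# Example model on the built-in sleepstudy data
sleep_lmm <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

fixef(sleep_lmm)                # named vector of fixed-effect estimates
head(ranef(sleep_lmm)$Subject)  # one row per Subject, one column per random-effect term
head(coef(sleep_lmm)$Subject)   # same rows and columns as the ranef() table
```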

Question 10

How would you use the values given to you by ranef(wp_lmm) and fixef(wp_lmm) to recreate the values of coef(wp_lmm)?

Describe in one sentence.

Bonus coding challenge: Actually use the values from ranef() and fixef() to recreate the values of coef().