W1 Exercises: Regression Refresher

Workplace Pride

Data: lmm_jsup.csv

A questionnaire was sent to all UK civil service departments, and the lmm_jsup.csv dataset contains all responses that were received. Some of these departments work as hybrid or ‘virtual’ departments, with a mix of remote and office-based employees. Others are fully office-based.

The questionnaire included items asking about how much the respondent believes in the department and how it engages with the community, what it produces, how it operates and how it treats its people. A composite measure of ‘workplace-pride’ was constructed for each employee. Employees in the civil service are categorised into 3 different roles: A, B and C. The roles tend to increase in responsibility, with role C being more managerial, and role A having less responsibility. We also have data on the length of time each employee has been in the department (sometimes new employees come straight in at role C, but many of them start in role A and work up over time).

We’re interested in whether the different roles are associated with differences in workplace-pride.

Dataset: https://uoepsy.github.io/data/lmm_jsup.csv.

variable           description
department_name    Name of government department
dept               Department acronym
virtual            Whether the department functions as a hybrid department with various employees working remotely (1), or as a fully in-person office (0)
role               Employee role (A, B or C)
seniority          Employee's seniority point. These map to roles, such that role A is 0-4, role B is 5-9, role C is 10-14. Higher numbers indicate more seniority
employment_length  Length of employment in the department (years)
wp                 Composite measure of 'Workplace Pride'

Question 1

Read in the data and provide some descriptive statistics.

Don’t remember how to do descriptives? Think back to previous courses - it’s time for some means, standard deviations, mins and maxes. For categorical variables we can do counts or proportions.

We’ve seen various functions such as summary(), and also describe() from the psych package.
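
As a reminder of what these look like in practice, here is a minimal sketch using a small made-up data frame. The names toy, y and x are hypothetical stand-ins (this is not the lmm_jsup.csv data), and only base R functions are used; describe() from the psych package would give a similar table of means and SDs in one call.

```r
# Toy data, invented purely for illustration (NOT the lmm_jsup.csv data)
set.seed(1)
toy <- data.frame(
  y = rnorm(30, mean = 25, sd = 5),        # a continuous variable
  x = rep(c("A", "B", "C"), times = 10),   # a categorical variable
  stringsAsFactors = TRUE
)

# The real data could instead be read in with something like:
# jsup <- read.csv("https://uoepsy.github.io/data/lmm_jsup.csv")

summary(toy)   # quick overview of every column

# mean, SD, min and max for the continuous variable
desc_y <- c(mean = mean(toy$y), sd = sd(toy$y),
            min = min(toy$y), max = max(toy$y))
desc_y

counts_x <- table(toy$x)          # counts for the categorical variable
props_x <- prop.table(counts_x)   # the same counts expressed as proportions
counts_x
props_x
```

The same calls should carry over to the real dataset once the column names are swapped in.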

Question 2

Are there differences in ‘workplace-pride’ between people in different roles?

Does y [continuous variable] differ by x [three groups]? Try lm(y ~ x).
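
To make that hint concrete, here is a sketch on simulated data. Everything in it (toy, y, x, and the group means used to simulate) is made up for illustration, so the numbers say nothing about the real dataset.

```r
# Simulated example: a continuous y and a three-level factor x
set.seed(2)
toy <- data.frame(x = factor(rep(c("A", "B", "C"), each = 20)))
# group means of 30, 27 and 24 are arbitrary choices for the simulation
toy$y <- c(30, 27, 24)[as.integer(toy$x)] + rnorm(60, sd = 3)

m1 <- lm(y ~ x, data = toy)

# With default treatment coding, the intercept is the mean of group A,
# and xB and xC are the differences of groups B and C from group A
summary(m1)

# Overall F-test of whether the group means differ
anova(m1)
```

With a factor predictor, R picks the first level (alphabetically, here A) as the reference group, which is why the coefficients are read as differences from A.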

Question 3

Is it something about the roles that makes people report differences in workplace-pride, or is it possibly just that people who are newer to the company tend to feel more pride than those who have been there for a while (they’re all jaded), and that the people in role A tend to be much newer to the company (making it look like being in role A leads to more pride)? In other words, if we were to compare people in role A vs role B vs role C while holding their employment_length constant, might we see something different?

Fit another model to find out.

To help with interpreting the model, make a plot that shows all of the relevant variables that are in the model in one way or another.

So we want to adjust for how long people have been part of the company.
Remember - if we want to estimate the effect of x on y while adjusting for z, we can do lm(y ~ z + x).

For the plot - put something on the x, something on the y, and colour it by the other variable.
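
Here is one possible shape for this, sketched on simulated data and assuming the ggplot2 package is installed. The names toy, y, x and z are hypothetical, and the data are deliberately simulated so that y depends only on z, while z differs between the groups of x.

```r
library(ggplot2)

# Simulated example: y depends on z only, but z differs between groups of x
set.seed(3)
toy <- data.frame(x = factor(rep(c("A", "B", "C"), each = 20)))
toy$z <- runif(60, 0, 10) + 3 * (as.integer(toy$x) - 1)
toy$y <- 30 - 0.8 * toy$z + rnorm(60, sd = 2)

# Group differences in y, holding z constant
m2 <- lm(y ~ z + x, data = toy)
summary(m2)

# z on the x-axis, y on the y-axis, points coloured by group
p <- ggplot(toy, aes(x = z, y = y, colour = x)) +
  geom_point()
p
```

Comparing summary(m2) with a model that leaves out z is a quick way to see how much an apparent group difference can change once the covariate is adjusted for.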

Question 4

Do roles differ in their workplace-pride, when adjusting for time in the company?

This may feel like a repeat of the previous question, but note that this is not a question about specific group differences. It is about whether, overall, the role groups differ. So we want to test the joint effect of the two additional parameters we’ve just added to our model (hint hint: model comparison!).
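
A model comparison of this kind can be sketched as below. The data are simulated and the names (toy, m_restricted, m_full) are hypothetical; the point is only the pattern of fitting a model without the grouping variable, one with it, and comparing the two.

```r
# Simulated example: does adding the three-level factor x improve on y ~ z?
set.seed(4)
toy <- data.frame(
  x = factor(rep(c("A", "B", "C"), each = 20)),
  z = runif(60, 0, 10)
)
toy$y <- 30 - 0.8 * toy$z + c(0, 1.5, 3)[as.integer(toy$x)] + rnorm(60, sd = 2)

m_restricted <- lm(y ~ z, data = toy)       # without the grouping variable
m_full       <- lm(y ~ z + x, data = toy)   # adds two parameters (xB and xC)

# Incremental F-test of the two extra parameters jointly
cmp <- anova(m_restricted, m_full)
cmp
```

The Df column of the comparison shows 2, because a three-level factor adds two coefficients to the model, and the F-test assesses them together rather than one at a time.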

Question 5

Let’s take a step back and remember what data we actually have. We’ve got 295 people in our dataset, from 16 departments.

Departments may well differ in the general amount of workplace-pride people report. People love to say that they work in the “National Crime Agency”, but other departments might not elicit such pride (*cough* HM Revenue & Customs *cough*). We need to be careful not to mistake department differences for something else (like differences due to the job role).

Make a couple of plots to look at:

  1. how many of each role we have from each department
  2. how departments differ in their employees’ pride in their workplace
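
One way these two plots might look, sketched with ggplot2 on simulated data. The names toy, g (a grouping variable), x and y are hypothetical, and bar charts and boxplots are just one choice among several.

```r
library(ggplot2)

# Simulated example: 80 people spread over 4 groups (g) and 3 categories (x)
set.seed(5)
toy <- data.frame(
  g = factor(sample(paste0("dept", 1:4), 80, replace = TRUE)),
  x = factor(sample(c("A", "B", "C"), 80, replace = TRUE))
)
toy$y <- 25 + 2 * as.integer(toy$g) + rnorm(80, sd = 3)

# 1. how many of each category there are in each group
p_counts <- ggplot(toy, aes(x = g, fill = x)) +
  geom_bar()

# 2. how the outcome is distributed within each group
p_outcome <- ggplot(toy, aes(x = g, y = y)) +
  geom_boxplot()

p_counts
p_outcome
```

A plain table(toy$g, toy$x) gives the same counts as the first plot in numeric form, which can be handy for spotting empty cells.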

Question 6

Adjusting for both length of employment and department, are there differences in ‘workplace-pride’ between the different roles?

Can you make a plot of all four of the variables involved in our model?

Making the plot might take some thinking. We’ve now added dept into the mix, so a nice way might be to use facet_wrap() to make the same plot as the one we did previously, but for each department.
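
A sketch of that idea on simulated data, assuming ggplot2 is installed. The names toy, y, z, x and g are hypothetical stand-ins for the outcome, the continuous covariate, the categorical predictor and the grouping variable.

```r
library(ggplot2)

# Simulated example: 4 groups (g), 3 categories (x), 10 people in each cell
set.seed(6)
toy <- expand.grid(
  id = 1:10,
  x = c("A", "B", "C"),
  g = paste0("dept", 1:4),
  stringsAsFactors = TRUE
)
toy$z <- runif(nrow(toy), 0, 10)
toy$y <- 30 - 0.8 * toy$z + 2 * as.integer(toy$g) + rnorm(nrow(toy), sd = 2)

# Model adjusting for both the covariate z and the grouping variable g
m3 <- lm(y ~ z + g + x, data = toy)
summary(m3)

# The earlier scatterplot, repeated in one panel per group
p <- ggplot(toy, aes(x = z, y = y, colour = x)) +
  geom_point() +
  facet_wrap(~ g)
p
```

Each panel uses the same axes by default, so differences in overall level between groups show up as panels whose points sit higher or lower.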

Question 7

Now we’re starting to acknowledge the grouped structure of our data - these people in our dataset are related to one another in that some belong to dept 1, some to dept 2, and so on.

Let’s try to describe our sample in a bit more detail.

  • how many participants do we have, and from how many departments?
  • how many participants are there, on average, from each department? what is the minimum and maximum?
  • what is the average employment length for our participants?
  • how many departments are ‘virtual departments’ vs office-based?
  • what is the overall average reported workplace-pride?
  • how much variation in workplace-pride is due to differences between departments?

The first lot of these questions can be answered using things like count(), summary(), table(), mean(), min() etc. See 1: Clustered Data #determining-sample-sizes
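
For instance, the sample-size questions could be approached as in this base R sketch. The data frame toy and its columns g (group) and v (a 0/1 group-level variable) are made up for illustration, with group sizes fixed by hand.

```r
# Made-up example: 60 people in 5 groups of unequal size
toy <- data.frame(
  g = rep(paste0("dept", 1:5), times = c(8, 15, 10, 20, 7))
)
# a group-level 0/1 variable: the same value for everyone in a group
toy$v <- as.integer(toy$g %in% c("dept1", "dept2"))

n_total  <- nrow(toy)              # number of participants
n_groups <- length(unique(toy$g))  # number of groups

per_group <- table(toy$g)          # participants per group
c(mean = mean(as.vector(per_group)),
  min  = min(per_group),
  max  = max(per_group))

# Count group-level characteristics over one row per group, not per person
group_level <- unique(toy[, c("g", "v")])
table(group_level$v)
```

The last step matters: running table(toy$v) on the full data would count people rather than groups.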

For the last one, we can use the ICC! See 1: Clustered Data #icc
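
As a rough illustration, here is a base R sketch of one common ANOVA-based ICC estimator on simulated data. This is an assumption about how one might compute it, not necessarily the method used in the course readings, and the names toy, g, y, msb, msw and n0 are hypothetical.

```r
# Simulated example: 8 groups of 10, with between-group SD 2 and within-group SD 3
set.seed(8)
g <- factor(rep(paste0("dept", 1:8), each = 10))
u <- rnorm(8, sd = 2)   # group-level deviations
toy <- data.frame(g = g, y = 25 + u[as.integer(g)] + rnorm(80, sd = 3))

# One-way ANOVA: mean squares between and within groups
aov_tab <- anova(lm(y ~ g, data = toy))
msb <- aov_tab["g", "Mean Sq"]
msw <- aov_tab["Residuals", "Mean Sq"]

# Group-size term for the ICC formula (equals the common group size when balanced)
n_i <- as.vector(table(toy$g))
n0  <- (sum(n_i) - sum(n_i^2) / sum(n_i)) / (length(n_i) - 1)

# ICC: estimated proportion of variance in y due to differences between groups
icc <- (msb - msw) / (msb + (n0 - 1) * msw)
icc
```

The estimate will not exactly match the value implied by the simulation settings, since it is computed from a single random sample.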

Question 8

What if we would like to know whether, when adjusting for differences due to employment length and roles, workplace-pride differs between people working in virtual departments compared to office-based ones?

Can you add this to the model? What happens?
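
To see the general mechanics on something other than the real data, here is a sketch on simulated data. It assumes the model already contains the grouping variable as a fixed predictor, and the names toy, g, v, z and y are hypothetical.

```r
# Simulated example: v is a group-level 0/1 variable (constant within each group)
set.seed(9)
g <- factor(rep(paste0("dept", 1:6), each = 10))
toy <- data.frame(
  g = g,
  v = as.integer(g %in% c("dept1", "dept2", "dept3")),
  z = runif(60, 0, 10)
)
toy$y <- 25 - 0.5 * toy$z + rnorm(60, sd = 2)

# g already separates every group, so v carries no additional information
m <- lm(y ~ z + g + v, data = toy)

# v is a linear combination of the intercept and the g dummy variables,
# so lm() cannot estimate it and reports NA for its coefficient
coef(m)
```

The NA is not a bug: once every group has its own coefficient, any variable that only varies between groups is perfectly collinear with those coefficients.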