Interactions: Numeric * Numeric

Be sure to check the solutions to last week’s exercises.
You can still ask any questions about previous weeks’ materials if things aren’t clear!

LEARNING OBJECTIVES

Understand the concept of an interaction.
Interpret the meaning of a numeric \(\times\) numeric interaction.
Understand the principle of marginality and why this impacts modelling choices with interactions.
Visualize and probe interactions.

Exercises

Question 1

Previous research has identified an association between an individual’s perception of their social rank and symptoms of depression, anxiety and stress. We are interested in the individual differences in this relationship.

Create a new RMarkdown document, load the tidyverse package, read the social comparison study data into R and assign it the name “scs_study”. The data are available at https://uoepsy.github.io/data/scs_study.csv

Solution

Research question
Does the effect of social comparison on symptoms of depression, anxiety and stress vary depending on level of neuroticism?

To investigate whether the effect of social comparison on symptoms of depression, anxiety and stress varies depending on level of neuroticism, we will need to fit a multiple regression model with an interaction term. Before we think about fitting our model, it is important that we understand the data available: how many participants are there? What type (or class) of data will we be working with? What was the measurement scale? How were the scales scored? Look at the social comparison study data codebook below.

Social Comparison Study data codebook

zo	zc	ze	za	zn	scs	dass
0.7587304	1.5838564	-0.79153841	-0.09104726	1.32023703	30	56
0.3033394	-0.2698774	-0.08636742	0.08849465	-0.40280445	30	48
-0.1347251	0.6566343	-0.79796588	-0.94759192	0.92927138	35	48
1.0640775	-1.0218540	-0.16475623	-0.50409122	-0.02021664	29	48
1.7411706	-0.7773205	-1.55003059	-2.86050116	-1.14145028	41	43
0.2203023	-0.4054477	0.78397367	0.90388509	-0.24641819	37	60

Refresher: Z-scores

When we standardise a variable, we re-express each value as the distance from the mean in units of standard deviations. These transformed values are called z-scores.

To transform a given value \(x_i\) into a z-score \(z_i\), we simply calculate the distance from \(x_i\) to the mean, \(\bar{x}\), and divide this by the standard deviation, \(s\):
\[ z_i = \frac{x_i - \bar{x}}{s} \]

The z-score of a value is the number of standard deviations that the value falls above (positive z) or below (negative z) the mean.
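As a quick check of the formula, the sketch below computes z-scores by hand and compares them with base R's scale(), which by default centres on the mean and divides by the sample standard deviation. The values in x are made up for illustration. Inside a tidyverse pipeline the same calculation would be written as mutate(z = (x - mean(x)) / sd(x)).

```r
# Hypothetical example values (not from the study data)
x <- c(10, 12, 15, 18, 20)

# z-score by hand: distance from the mean, in units of the sample SD
z_manual <- (x - mean(x)) / sd(x)

# scale() centres on the mean and divides by sd() by default;
# it returns a one-column matrix, so convert it to a plain vector
z_scale <- as.vector(scale(x))

round(z_manual, 3)
all.equal(z_manual, z_scale)
```

Standardised values always have mean 0 and standard deviation 1. A z-score of 0 therefore corresponds to an average value.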

Question 2

Specify the model you plan to fit in order to answer the research question (e.g., \(\text{??} = \beta_0 + \beta_1 \cdot \text{??} + \dots + \epsilon\))

Solution
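One way to write the model is sketched below. It assumes the interaction enters as the product of the two explanatory variables, which is how lm() encodes scs*zn in Question 5:

\[ \text{DASS}_i = \beta_0 + \beta_1 \cdot \text{SCS}_i + \beta_2 \cdot \text{Zn}_i + \beta_3 \cdot \text{SCS}_i \cdot \text{Zn}_i + \epsilon_i \]

Here \(\text{Zn}_i\) is the z-scored neuroticism of participant \(i\), and \(\beta_3\) captures how the association between SCS and DASS-21 scores changes as neuroticism changes.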

Question 3

Produce plots of the relevant distributions and relationships involved in the analysis.

Solution

ggplot(data = scs_study, aes(x=dass)) + 
  geom_density() + 
  geom_boxplot(width = 1/50) +
  labs(title="Marginal distribution of DASS-21 Scores", 
       x = "Depression Anxiety and Stress Scale", y = "Probability density")

The marginal distribution of scores on the Depression, Anxiety and Stress Scale (DASS-21) is unimodal with a mean of approximately 45 and a standard deviation of 7.

ggplot(data = scs_study, aes(x=scs)) + 
  geom_density() + 
  geom_boxplot(width = 1/50) +
  labs(title="Marginal distribution of Social Comparison Scale (SCS) scores", 
       x = "Social Comparison Scale Score", y = "Probability density")

The marginal distribution of scores on the Social Comparison Scale (SCS) is unimodal with a mean of approximately 36 and a standard deviation of 4. There appear to be a number of outliers at the upper end of the scale.

ggplot(data = scs_study, aes(x=zn)) + 
  geom_density() + 
  geom_boxplot(width = 1/50) +
  labs(title="Marginal distribution of Neuroticism (Z-Scored)", 
       x = "Neuroticism (Z-Scored)", y = "Probability density")

The marginal distribution of Neuroticism (Z-scored) is positively skewed, with 25% of scores falling below -0.8 and 75% of scores falling below 0.59.
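You can compute summaries like the ones quoted above (mean, standard deviation and quartiles) directly, rather than reading them off a plot. The helper below is a hypothetical convenience function, not part of any package, and x is a toy vector. With the study data you would pass a column such as scs_study$zn instead.

```r
# Hypothetical helper: mean, SD and quartiles of a numeric vector
describe <- function(x) {
  c(mean = mean(x),
    sd   = sd(x),
    q25  = quantile(x, 0.25, names = FALSE),
    q75  = quantile(x, 0.75, names = FALSE))
}

# Toy vector for illustration; e.g. describe(scs_study$zn) with the real data
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
round(describe(x), 2)
```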

library(patchwork) # for arranging plots side by side
library(kableExtra) # for making tables look nice

p1 <- ggplot(data = scs_study, aes(x=scs, y=dass)) + 
  geom_point()+
  labs(x = "SCS", y = "DASS-21")

p2 <- ggplot(data = scs_study, aes(x=zn, y=dass)) + 
  geom_point()+
  labs(x = "Neuroticism", y = "DASS-21")

p1 | p2

# the kable() function from the kableExtra package can make table outputs print nicely into html.
scs_study %>%
  select(dass, scs, zn) %>%
  cor() %>% 
  kable(digits = 2) %>%
  kable_styling(full_width = FALSE)

	dass	scs	zn
dass	1.00	-0.23	0.20
scs	-0.23	1.00	0.11
zn	0.20	0.11	1.00

There is a weak, negative, linear relationship (r = -0.23) between scores on the Social Comparison Scale and scores on the Depression Anxiety and Stress Scale for the participants in the sample. Severity of symptoms measured on the DASS-21 tends to decrease, on average, the more favourably participants view their social rank.
There is a weak, positive, linear relationship (r = 0.20) between levels of Neuroticism and scores on the DASS-21. Participants who are more neurotic tend, on average, to display more severe symptoms of depression, anxiety and stress.

Question 4

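Run the code below. It takes the dataset and uses the cut() function to add a new variable called “zn_group”, which is the “zn” variable split into 4 groups. Note that cut(zn, 4) divides the range of zn into 4 intervals of equal width, so the groups will not necessarily contain equal numbers of participants.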

Remember: we have to re-assign this output to the name of the dataset (the scs_study <- bit at the beginning) for these changes to appear in our environment (the top-right window of RStudio). Without that first line, the code would simply print the output.

scs_study <-
  scs_study %>%
  mutate(
    zn_group = cut(zn, 4)
  )

We can see how it has split the “zn” variable by plotting the two against one another:
(Note that the levels of the new variable are named according to the cut-points).

ggplot(data = scs_study, aes(x = zn_group, y = zn)) + 
  geom_point()
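The toy example below shows what cut(x, 4) actually does (the values are made up). It splits the range into 4 intervals of equal width, so a single large value can leave some groups empty. Cutting at the quartiles instead gives groups of roughly equal size. dplyr's ntile() is another option for equal-count groups.

```r
# Toy vector (hypothetical values): most values are small, one is large
x <- c(0, 1, 2, 10)

# cut(x, 4): 4 intervals of EQUAL WIDTH across the range of x
# (the outer limits are nudged out by 0.1% of the range)
equal_width <- cut(x, 4)
table(equal_width)

# For groups with (roughly) equal counts instead, cut at the quartiles
equal_count <- cut(x, breaks = quantile(x, probs = seq(0, 1, 0.25)),
                   include.lowest = TRUE)
table(equal_count)
```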

Plot the relationship between scores on the SCS and scores on the DASS-21, for each group of the variable we just created.
How does the pattern change? Does it suggest an interaction?

Tip: Rather than creating four separate plots, you might want to map some feature of the plot to the variable we created in the data, or make use of facet_wrap()/facet_grid().
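Alongside the plot, you can put a number on "how does the pattern change?" by fitting a separate DASS ~ SCS line within each group and comparing the slopes. The sketch below uses simulated data, not the study data, with an interaction built in on purpose and arbitrary made-up coefficients. With the real data you would apply the same split()/lm() steps to scs_study and zn_group.

```r
set.seed(1)
n <- 2000

# Simulated, hypothetical data (NOT the study data). The coefficients are
# arbitrary; the scs:zn term makes the SCS slope depend on neuroticism.
sim <- data.frame(scs = rnorm(n, mean = 36, sd = 4), zn = rnorm(n))
sim$dass <- 45 - 0.4 * (sim$scs - 36) + 1.5 * sim$zn -
  0.5 * (sim$scs - 36) * sim$zn + rnorm(n, sd = 5)

# Same grouping as in the exercise: 4 equal-width neuroticism groups
sim$zn_group <- cut(sim$zn, 4)

# Fit DASS ~ SCS separately within each group and keep the SCS slope
group_slopes <- sapply(split(sim, sim$zn_group), function(d) {
  coef(lm(dass ~ scs, data = d))[["scs"]]
})

round(group_slopes, 2)
```

Here the slopes drift steadily across the groups, by construction. That steady drift is the pattern an interaction term models. Roughly parallel lines would instead suggest little or no interaction.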

Solution

Cutting one of the explanatory variables up into groups essentially turns a numeric variable into a categorical one. We did this just to make it easier to visualise how a relationship changes across the values of another variable, because we can imagine a separate line for the relationship between SCS and DASS-21 scores for each of the groups of neuroticism. However, in grouping a numeric variable like this we lose information. Neuroticism is measured on a continuous scale, and we want to capture how the relationship between SCS and DASS-21 changes across that continuum (rather than cutting it into chunks).
We could imagine cutting it into more and more chunks (see Figure 1), until we end up with an infinite number of lines, i.e., a surface in three-dimensional space (recall that for a multiple regression model with 2 explanatory variables, we can think of the model as having three dimensions). Including the interaction term simply means that this surface is no longer necessarily flat. You can see this in Figure 2.
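Rearranging the interaction model makes the "no longer flat" idea concrete. Write the model as in the first line below, then collect the terms that multiply SCS:

\[ \text{DASS} = \beta_0 + \beta_1 \cdot \text{SCS} + \beta_2 \cdot \text{Zn} + \beta_3 \cdot \text{SCS} \cdot \text{Zn} + \epsilon \]
\[ \text{DASS} = (\beta_0 + \beta_2 \cdot \text{Zn}) + (\beta_1 + \beta_3 \cdot \text{Zn}) \cdot \text{SCS} + \epsilon \]

Each value of neuroticism therefore picks out a line whose intercept and slope both depend on Zn. If \(\beta_3 = 0\), every line has the same slope \(\beta_1\) and the surface is a flat plane. A non-zero \(\beta_3\) lets the slope change smoothly along the neuroticism axis, which is what twists the surface.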

Figure 1: Separate regression lines for DASS ~ SCS when neuroticism is cut into 4 (left), 6 (center) or 12 (right) groups

Figure 2: 3D plot of regression surface with interaction.

Question 5

Fit your model using lm().

Solution

dass_mdl <- lm(dass ~ 1 + scs*zn, data = scs_study)
summary(dass_mdl)

## 
## Call:
## lm(formula = dass ~ 1 + scs * zn, data = scs_study)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -16.301  -3.825  -0.173   3.733  45.777 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 60.80887    2.45399  24.780  < 2e-16 ***
## scs         -0.44391    0.06834  -6.495 1.64e-10 ***
## zn          20.12813    2.35951   8.531  < 2e-16 ***
## scs:zn      -0.51861    0.06552  -7.915 1.06e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.123 on 652 degrees of freedom
## Multiple R-squared:  0.1825, Adjusted R-squared:  0.1787 
## F-statistic:  48.5 on 3 and 652 DF,  p-value: < 2.2e-16

Alternatively, because scs*zn expands to scs + zn + scs:zn, you could fit exactly the same model using the code below. You only need to run one of the two.

dass_mdl <- lm(dass ~ 1 + scs + zn + scs:zn, data = scs_study)
summary(dass_mdl)
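You can confirm the expansion without fitting anything. terms() lists the model terms R builds from each formula, and model.matrix() shows the columns lm() would use. The small data frame d is made up purely for illustration. It shows that the interaction column is simply the product of the two explanatory variables.

```r
# The two formulas from above: shorthand and fully written out
f_star <- dass ~ 1 + scs * zn
f_long <- dass ~ 1 + scs + zn + scs:zn

# terms() lists the model terms R builds from each formula
attr(terms(f_star), "term.labels")
attr(terms(f_long), "term.labels")

# A tiny made-up data frame, just to look at the design matrix
d <- data.frame(scs = c(30, 35, 40), zn = c(-1, 0, 1))
mm <- model.matrix(~ scs * zn, data = d)
mm

# The interaction column is simply scs multiplied by zn
all(mm[, "scs:zn"] == d$scs * d$zn)
```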

Question 6

	Estimate	Std. Error	t value	Pr(>|t|)
(Intercept)	60.8	2.45	24.8	5e-96
scs	-0.444	0.0683	-6.5	1.64e-10
zn	20.1	2.36	8.53	1.02e-16
scs:zn	-0.519	0.0655	-7.92	1.06e-14

Recall that the coefficients for zn and scs in our model now reflect the estimated change in the outcome associated with an increase of 1 in that explanatory variable, when the other explanatory variable is zero.
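The sketch below uses the estimates printed for dass_mdl to illustrate this. The slope of SCS is \(\beta_1 + \beta_3 \cdot \text{zn}\), so the scs coefficient on its own is the slope when zn = 0. Likewise, the zn coefficient is the slope of neuroticism when scs = 0. The two helper functions are hypothetical, and the coefficients are copied (rounded) from the output rather than extracted with coef().

```r
# Estimates copied (rounded) from summary(dass_mdl) above
b1 <- -0.44391   # scs
b2 <- 20.12813   # zn
b3 <- -0.51861   # scs:zn

# Hypothetical helpers: the slope of one variable at a given value of the other
scs_slope <- function(zn)  b1 + b3 * zn    # change in DASS per 1-point SCS increase
zn_slope  <- function(scs) b2 + b3 * scs   # change in DASS per 1-unit (1 SD) increase in zn

# At zn = 0 the SCS slope is just b1; at scs = 0 the zn slope is just b2
scs_slope(c(-1, 0, 1))
zn_slope(0)
```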

Think: what does 0 represent on each variable? What does an increase of 1 represent? Are these meaningful? Would you suggest recentering either variable?

Solution

Question 7

Recenter one or both of your explanatory variables to ensure that 0 is a meaningful value.
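The sketch below illustrates what mean-centring does, using simulated data (not the study data) with arbitrary coefficients. Only SCS is centred, as in Question 8. The fitted values, the SCS slope and the interaction are unchanged. Only the intercept and the zn coefficient move, because they now describe someone with an average SCS score rather than an SCS score of 0. With the study data, the centring step would be mutate(scs_mc = scs - mean(scs)).

```r
set.seed(3)
n <- 300

# Simulated, hypothetical data (NOT the study data); coefficients are arbitrary
sim <- data.frame(scs = rnorm(n, mean = 36, sd = 4), zn = rnorm(n))
sim$dass <- 60 - 0.4 * sim$scs + 20 * sim$zn - 0.5 * sim$scs * sim$zn +
  rnorm(n, sd = 6)

# Mean-centre SCS so that 0 now means "an average SCS score"
sim$scs_mc <- sim$scs - mean(sim$scs)

m_raw <- lm(dass ~ scs * zn, data = sim)
m_mc  <- lm(dass ~ scs_mc * zn, data = sim)

# Side-by-side estimates from the two models
round(cbind(raw = coef(m_raw), centred = coef(m_mc)), 3)
```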

Solution

Question 8

We re-fit the model using mean-centered SCS scores instead of the original variable. Here are the parameter estimates:

scs_study <- scs_study %>% mutate(scs_mc = scs - mean(scs)) # mean-centred SCS (see Question 7)
dass_mdl2 <- lm(dass ~ 1 + scs_mc * zn, data = scs_study)

# pull out the coefficients from the summary():
summary(dass_mdl2)$coefficients

##               Estimate Std. Error    t value     Pr(>|t|)
## (Intercept) 44.9324476 0.24052861 186.807079 0.000000e+00
## scs_mc      -0.4439065 0.06834135  -6.495431 1.643265e-10
## zn           1.5797687 0.24086372   6.558766 1.105118e-10
## scs_mc:zn   -0.5186142 0.06552100  -7.915236 1.063297e-14
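Because centring only re-expresses SCS, these estimates follow algebraically from dass_mdl. The new intercept is \(\beta_0 + \beta_1 \bar{x}_{SCS}\) and the new zn coefficient is \(\beta_2 + \beta_3 \bar{x}_{SCS}\), where \(\bar{x}_{SCS}\) is the mean SCS score. The sketch below uses the printed (rounded) estimates to back out the mean implied by each equation. The two answers should agree up to rounding, and both should sit close to the mean of about 36 described earlier.

```r
# Estimates copied (rounded) from the two summary() outputs above
b_raw <- c(int = 60.80887, scs = -0.44391, zn = 20.12813, scs_zn = -0.51861)
b_mc  <- c(int = 44.9324476, zn = 1.5797687)

# Centred model:  intercept = b0 + b1 * mean(SCS)
#                 zn slope  = b2 + b3 * mean(SCS)
# Solve each equation for the mean SCS score it implies
mean_from_int <- (b_mc[["int"]] - b_raw[["int"]]) / b_raw[["scs"]]
mean_from_zn  <- (b_mc[["zn"]]  - b_raw[["zn"]])  / b_raw[["scs_zn"]]

round(c(mean_from_int, mean_from_zn), 2)
```

The scs and scs:zn estimates are the same in both outputs. Centring SCS moves where 0 sits on the SCS axis but does not change any slope involving SCS.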

Fill in the blanks in the statements below.

For those of average neuroticism and who score average on the SCS, the estimated DASS-21 Score is ???
For those who score ??? on the SCS, an increase of ??? in neuroticism is associated with a change of 1.58 in DASS-21 Scores
For those of average neuroticism, an increase of ??? on the SCS is associated with a change of -0.44 in DASS-21 Scores
For every increase of ??? in neuroticism, the change in DASS-21 associated with an increase of ??? on the SCS is adjusted by ???
For every increase of ??? in SCS, the change in DASS-21 associated with an increase of ??? in neuroticism is adjusted by ???

Solution