Information about solutions
Solutions for these exercises are available immediately below each question.
We would like to emphasise that much evidence suggests that testing enhances learning, and we strongly encourage you to make a concerted attempt at answering each question before looking at the solutions. Immediately looking at the solutions and then copying the code into your work will lead to poorer learning.
We would also like to note that there are always many different ways to achieve the same thing in R, and the solutions provided are simply one approach.
Relevant packages
Let’s imagine we are interested in peoples’ intention to get vaccinated, and we observe the following variables:
If we draw out our variables, and think about this in the form of a standard regression model with “Intention to vaccinate” as our outcome variable, then all the lines are filled in for us (see Figure 1)
But what if our theory suggests that some other model might be of more relevance? For instance, what if we believe that participants’ religiosity has an effect on their Health Locus of Control score, which in turn affects the intention to vaccinate (see Figure 2)?
In this case, the HLC variable is thought of as a mediator, because it mediates the effect of religiosity on intention to vaccinate. In this theoretical model, we distinguishing between two possible types of effect: direct and indirect.
Direct vs Indirect
In path diagrams:
We’ve seen how path analysis works in last week’s lab, and we can use that same logic to investigate models which have quite different structures such as those including mediating variables.
Because we have multiple endogenous variables here, then we’re immediately drawn to path analysis, because we’re in essence thinking of conducting several regression models. As we can’t fit our theoretical model into a nice straightforward regression model (we would need several), then we can use path analysis instead, and just smush lots of regressions together, allowing us to estimate things all at once.
So let’s look at fitting our mediation model.
Data: VaxDat.csv
100 parents were recruited and completed a survey that included measures of Health Locus of Control, Religiosity, and a sliding scale of 0 to 100 on how definite they were that their child would receive the vaccination for measles, mumps & rubella.
The data is available at https://uoepsy.github.io/data/vaxdat.csv
First we read in our data:
<- read_csv("https://uoepsy.github.io/data/vaxdat.csv")
vax summary(vax)
## religiosity hlc intention
## Min. :-1.0 Min. :0.40 Min. :39.0
## 1st Qu.: 1.8 1st Qu.:2.00 1st Qu.:59.0
## Median : 2.4 Median :3.00 Median :64.0
## Mean : 2.4 Mean :2.99 Mean :65.1
## 3rd Qu.: 3.0 3rd Qu.:3.60 3rd Qu.:74.0
## Max. : 4.6 Max. :5.80 Max. :88.0
It looks like we have a value we shouldn’t have there. We have a value of religiosity of -1. But the average of 5 items which each score 0 to 6 will have to fall within those bounds.
Let’s replace any values <0 with NA.
<-
vax %>%
vax mutate(religiosity = ifelse(religiosity < 0, NA, religiosity))
summary(vax)
## religiosity hlc intention
## Min. :0.20 Min. :0.40 Min. :39.0
## 1st Qu.:1.80 1st Qu.:2.00 1st Qu.:59.0
## Median :2.40 Median :3.00 Median :64.0
## Mean :2.43 Mean :2.99 Mean :65.1
## 3rd Qu.:3.00 3rd Qu.:3.60 3rd Qu.:74.0
## Max. :4.60 Max. :5.80 Max. :88.0
## NA's :1
Now let’s just check the marginal distributions of each variable.
There are lots of functions that we might use, but for a quick eyeball, I quite like the pairs.panels()
and multi.hist()
functions from the psych package.
library(psych)
pairs.panels(vax)
Okay, we’re ready to start thinking about our model.
First we specify the relevant paths:
<- "
med_model intention ~ religiosity
intention ~ hlc
hlc ~ religiosity
"
If we fit this model as it is, we won’t actually be testing the indirect effect, we will simply be fitting a couple of regressions.
To specifically test the indirect effect, we need to explicitly define the indirect effect in our model, by first creating a label for each of its sub-component paths, and then defining the indirect effect itself as the product of these two paths (why the product? Click here for a lovely explanation from Aja Murray).
To do this, we use a new operator, :=
.
<- "
med_model intention ~ religiosity
intention ~ a*hlc
hlc ~ b*religiosity
indirect:=a*b
"
:=
This operator ‘defines’ new parameters which take on values that are an arbitrary function of the original model parameters. The function, however, must be specified in terms of the parameter labels that are explicitly mentioned in the model syntax.
Note. The labels we use are completely up to us. This would be equivalent:
<- "
med_model intention ~ religiosity
intention ~ peppapig * hlc
hlc ~ kermit * religiosity
indirect:= kermit * peppapig
"
It is common to estimate the indirect effect using a bootstrapping approach (remember, bootstrapping involves resampling the data with replacement, thousands of times, in order to empirically generate a sampling distribution).
Why do we bootstrap mediation analysis?
We compute our indirect effect as the product of the sub-component paths. However, this results in the estimated indirect effect rarely following a normal distribution, and makes our usual analytically derived standard errors & p-values inappropriate.
Instead, bootstrapping has become the norm for assessing sampling distributions of indirect effects in mediation models.
We can do this easily in lavaan:
<- sem(med_model, data=vax, se = "bootstrap")
mm1.est summary(mm1.est, ci = TRUE)
## lavaan 0.6-8 ended normally after 25 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 5
##
## Used Total
## Number of observations 99 100
##
## Model Test User Model:
##
## Test statistic 0.000
## Degrees of freedom 0
##
## Parameter Estimates:
##
## Standard errors Bootstrap
## Number of requested bootstrap draws 1000
## Number of successful bootstrap draws 1000
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper
## intention ~
## religiosty 0.616 1.109 0.555 0.579 -1.651 2.620
## hlc (a) 5.823 0.962 6.051 0.000 3.950 7.794
## hlc ~
## religiosty (b) 0.557 0.083 6.736 0.000 0.393 0.716
##
## Variances:
## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper
## .intention 62.112 7.733 8.032 0.000 45.544 76.037
## .hlc 0.741 0.104 7.143 0.000 0.535 0.950
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper
## indirect 3.243 0.685 4.736 0.000 2.033 4.684
::semTable(mm1.est, type="html") semTable
## <table style="padding-right:20px;padding-left:20px;">
## <tr><td></td><td colspan = '4'; align = 'center'>Model</td></tr>
## <tr><td></td><td colspan = '1'; align = 'center'>Estimate</td><td colspan = '1'; align = 'center'>Std. Err.</td><td colspan = '1'; align = 'center'>z</td><td colspan = '1'; align = 'center'>p</td></tr>
## <tr><td></td><td colspan = '4'; align = 'center'><span style="text-decoration: underline;">Regression Slopes</span></td></tr> <tr><td colspan = '1'; align = 'left'><span style="text-decoration: underline;">intention</span></td></tr>
## <tr><td>religiosity</td><td>0.62</td><td>1.11</td><td>0.56</td><td>.579</td></tr>
## <tr><td>hlc</td><td>5.82</td><td>0.96</td><td>6.05</td><td>.000</td></tr>
## <tr><td colspan = '1'; align = 'left'><span style="text-decoration: underline;">hlc</span></td></tr>
## <tr><td>religiosity</td><td>0.56</td><td>0.08</td><td>6.74</td><td>.000</td></tr>
## <tr><td></td><td colspan = '4'; align = 'center'><span style="text-decoration: underline;">Residual Variances</span></td></tr>
## <tr><td>intention</td><td>62.11</td><td>7.73</td><td>8.03</td><td>.000</td></tr>
## <tr><td>hlc</td><td>0.74</td><td>0.10</td><td>7.14</td><td>.000</td></tr>
## <tr><td>religiosity</td><td>0.94<sup>+</sup></td><td></td><td></td><td></td></tr>
## <tr><td></td><td colspan = '4'; align = 'center'><span style="text-decoration: underline;">Constructed</span></td></tr>
## <tr><td>indirect</td><td>3.24</td><td>0.68</td><td>4.74</td><td>.000</td></tr>
## <tr><td></td><td colspan = '4'; align = 'center'><span style="text-decoration: underline;">Fit Indices</span></td></tr>
## <tr><td>χ<sup>2</sup></td><td>0.00(0)</td><td></td><td></td><td></td></tr>
## <tr><td>CFI</td><td>1.00</td><td></td><td></td><td></td></tr>
## <tr><td>TLI</td><td>1.00</td><td></td><td></td><td></td></tr>
## <tr><td>RMSEA</td><td>0.00</td><td></td><td></td><td></td></tr>
## <tr><td colspan = '5'; align = 'left'><sup>+</sup>Fixed parameter</td></tr>
## </table><br>
##
We can see that the 95% bootstrapped confidence interval for the indirect effect of religiosity on intention to vaccinate does not include zero. We can conclude that the indirect effect is significant at \(p <.05\). The direct effect is not significantly different from zero, suggesting that we have complete mediation (religiosity has no effect on intention to vaccinate after controlling for health locus of control).
Partial/Complete Mediation
If we have a variable \(X\) that we take to cause variable \(Y\), then our path diagram will look like so:
In this diagram, path \(c\) is the total effect. This is the unmediated effect of \(X\) on \(Y\).
However, while the effect of \(X\) on \(Y\) could in part be explained by the process of being mediated by some variable \(M\), the variable \(X\) could still affect \(Y\) directly.
Our mediating model is shown below:
In this case, path \(c'\) is the direct effect.
Complete mediation is when \(X\) no longer affects \(Y\) after \(M\) has been controlled (so path \(c'\) is not significantly different from zero). Partial mediation is when the path from \(X\) to \(Y\) is reduced in magnitude when the mediator \(M\) is introduced, but still different from zero.
Finally, we can visualise the estimates for our model using the semPaths()
function from the semPlot package, and also with the graph_sem()
function from the tidySEM package.
Often, if we want to include a path diagram in a report then the output of these functions would not usually meet publication standards, and instead we tend to draw them in programs like powerpoint!)
library(tidySEM)
graph_sem(mm1.est, layout = get_layout(mm1.est))
This week’s lab focuses on the technique of path analysis using data on conduct problems in adolescence.
Data: conduct problems
A researcher has collected data on n=557 adolescents and would like to know whether there are associations between conduct problems (both aggressive and non-aggressive) and academic performance and whether the relations are mediated by the quality of relationships with teachers.
variable | description |
---|---|
ID | Participant ID |
Acad | Academic performance (standardised) |
Teach_r | Teacher Relationship Quality (standardised) |
Non_agg | Non-aggressive conduct problems (standardised) |
Agg | Aggressive conduct problems (standardised) |
The data is available at https://uoepsy.github.io/data/cp_teachacad.csv
First, read in the dataset from https://uoepsy.github.io/data/cp_teachacad.csv
Use the sem()
function in lavaan to specify and estimate a straightforward linear regression model to test whether aggressive and non-aggressive conduct problems significantly predict academic performance.
How do your results compare to those you obtain using the lm()
function?
Now specify a model in which non-aggressive conduct problems have both a direct and indirect effect (via teacher relationships) on academic performance
Make sure in your model you define (using the :=
operator) the indirect effect in order to test the hypothesis that non-aggressive conduct problems have both a direct and an indirect effect (via teacher relationships) on academic performance.
Fit the model and examine the 95% CI.
Specify a new parameter which is the total (direct+indirect) effect of non-aggressive conduct problems on academic performance.
Hint: we should have already got labels in our model for the constituent effects, so we can just use :=
to create a sum of them.
Now visualise the estimated model and its parameters using either the semPaths()
function from the semPlot package, or the graph_sem()
function from the tidySEM package.
Now specify a model in which both aggressive and non-aggressive conduct problems have both direct and indirect effects (via teacher relationships) on academic performance. Include the parameters for the indirect effects.
Now estimate the model and test the significance of the indirect effects
Write a brief paragraph reporting on the results of the model estimates in Question B2. Include a figure or table to display the parameter estimates.