Week 10 Exercises: Structural Equation Modelling (SEM)

You have probably heard the term “Structural Equation Modelling (SEM)” for a few weeks now, but we haven’t been very clear on what exactly it is. Is it CFA? Is it Path Analysis? In fact it is both - it is the overarching framework of which CFA and Path Analysis are just particular cases. The beauty comes in when we put the CFA and Path Analysis approaches together.

Path analysis, as we saw last week, offers a way of specifying and evaluating a structural model, in which variables relate to one another in various ways, via different (sometimes indirect) paths. Common models like our old friend multiple regression can be expressed in a Path Analysis framework.
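
To make the point about multiple regression concrete, here is a minimal sketch that fits the same two-predictor regression with lm() and with lavaan's sem(). It uses lavaan's built-in PoliticalDemocracy data purely as a stand-in, so the variables y1, x1 and x2 are assumptions for illustration and not part of this week's dataset.

```r
library(lavaan)

# Stand-in data shipped with lavaan (not this week's dataset)
dat <- PoliticalDemocracy

# Multiple regression the familiar way
fit_lm <- lm(y1 ~ x1 + x2, data = dat)

# The same regression written as a path model
fit_path <- sem("y1 ~ x1 + x2", data = dat)

# The slopes should agree (up to optimiser tolerance)
round(coef(fit_lm)[c("x1", "x2")], 3)
round(coef(fit_path)[c("y1~x1", "y1~x2")], 3)
```

By default sem() does not estimate the intercept here (no mean structure is requested), which is why only the slopes are compared.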

Factor Analysis, on the other hand, brings something absolutely crucial to the table: it allows us to mitigate some of the problems associated with measurement error by specifying a latent variable that is measured via a set of observed variables. No single question can perfectly measure someone’s level of “anxiety”. But if we take a set of 10 carefully chosen questions, we can treat the covariance shared between those 10 questions as representing the construct common to all of them (they all ask, in different ways, about “anxiety”), while also modelling the unique error with which each individual question fails to perfectly represent the entire construct.
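
As a small sketch of this idea, the code below fits a one-factor model in which a single latent variable is measured by three observed items. It uses lavaan's built-in HolzingerSwineford1939 data as a stand-in, so the latent name (visual) and the items (x1 to x3) are illustrative assumptions and not the anxiety example above.

```r
library(lavaan)

# Stand-in data shipped with lavaan (not this week's dataset)
dat <- HolzingerSwineford1939

# One latent variable measured by three observed items
model <- "visual =~ x1 + x2 + x3"
fit <- cfa(model, data = dat)

# Standardised loadings (=~ rows) show how strongly each item reflects
# the latent variable; the ~~ rows for x1 to x3 are the item-specific
# residual (unique) variances
est <- standardizedSolution(fit)
est[est$op %in% c("=~", "~~"), c("lhs", "op", "rhs", "est.std")]
```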

Combine them and we can reap the rewards of having both a structural model and a measurement model. The measurement model specifies the relationships between the items we directly observed and the latent variables of which we consider these items to be manifestations. The structural model specifies the relationships between the latent variables.
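
In lavaan syntax the two parts sit side by side in one model string. The sketch below uses lavaan's built-in PoliticalDemocracy data as a stand-in; the latent variables (ind60, dem60, dem65) and their indicators belong to that example dataset and are not part of this week's data.

```r
library(lavaan)

model <- "
  # measurement model: latent variables and their observed indicators
  ind60 =~ x1 + x2 + x3
  dem60 =~ y1 + y2 + y3 + y4
  dem65 =~ y5 + y6 + y7 + y8

  # structural model: relationships between the latent variables
  dem60 ~ ind60
  dem65 ~ ind60 + dem60
"

# Stand-in data shipped with lavaan (not this week's dataset)
fit <- sem(model, data = PoliticalDemocracy)
summary(fit, standardized = TRUE)
```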

Figure 1: SEM diagram. Measurement model in orange, Structural model in purple

You can’t test the structural model if the measurement model is bad

If you test the relationships between a set of latent factors, and they are not reliably measured by the observed items, then this error propagates up to influence the fit of the structural model.
To test the measurement model, it is typical to saturate the structural model (i.e., allow all the latent variables to correlate with one another). This way any misfit is due to the measurement model only.
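
One way to sketch this is a CFA in which every latent variable is free to covary with every other. The example uses lavaan's built-in HolzingerSwineford1939 data as a stand-in, so the factor and item names are assumptions for illustration only.

```r
library(lavaan)

measurement <- "
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
"

# cfa() lets all latent variables covary by default, so the structural
# part is saturated and any misfit comes from the measurement model
fit <- cfa(measurement, data = HolzingerSwineford1939)

fitMeasures(fit, c("chisq", "df", "cfi", "rmsea", "srmr"))

# The model-implied latent covariances that saturate the structural part
lavInspect(fit, "cov.lv")
```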

Alternatively, we can fit individual CFA models for each construct and assess their fit (making any reasonable adjustments if necessary) prior to then fitting the full SEM.

Exercising Exercises

Dataset: tpb2

The “Theory of Planned Behaviour” is a general theory of why people engage in a given behaviour; here it is applied to physical activity (i.e. why people exercise).

The theory is represented in the diagram in Figure 2 (only the latent variables and not the measured items are shown). Attitudes refer to the extent to which a person has a favourable view of exercising; subjective norms refer to whether they believe others whose opinions they care about believe exercise to be a good thing; and perceived behavioural control refers to the extent to which they believe exercising is under their control. Intentions refer to whether a person intends to exercise and behaviour is a measure of the extent to which they exercised. Each construct is measured using four items, with the exception of intentions, for which Table 1 lists five (int1 to int5).

Figure 2: Theory of planned behaviour (latent variables only)

The data are available either:

Table 1: Data Dictionary for TPB data
variable question
SN1 When I think about people whose opinions matter to me, I believe they value and support regular exercise
SN2 I feel pressure from those I care about to exercise regularly
SN3 Most people who are important to me approve of my exercising
SN4 Most people like me exercise regularly
PBC1 My exercise routine is up to me and only me
PBC2 I am confident that if I want to then I can exercise regularly
PBC3 I believe I have the ability to overcome any obstacles that may prevent me from exercising regularly.
PBC4 I feel capable of sticking to a consistent exercise schedule, even when faced with challenges or distractions
attitude1 I see exercising as an enjoyable and rewarding activity.
attitude2 I believe that exercising contributes positively to my overall well-being and health.
attitude3 I view exercising as an important part of maintaining a healthy lifestyle.
attitude4 I feel energized and invigorated after engaging in physical exercise.
int1 I am determined to take concrete steps towards establishing a consistent exercise habit
int2 I intend to exercise for at least 20 minutes, three times per week for the next three months.
int3 I have made a firm decision to prioritize exercise and allocate time for it in my schedule
int4 I intend to be in shape within the next three months.
int5 I am committed to incorporating regular exercise into my weekly routine.
beh1 I currently engage in physical activity for at least 20 minutes, three times per week, as recommended.
beh2 I already allocate time for exercise in my weekly schedule and adhere to it regularly.
beh3 I track my exercise sessions and ensure I meet my weekly goals
beh4 I do not currently exercise enough
Question 1

Load in the various packages you will probably need (tidyverse, lavaan), and read in the data using the appropriate function.

We’ve given you .csv files for a long time now, but it’s good to be prepared to encounter all sorts of weird filetypes. Can you successfully read in the data from both file types?

Question 2

Before we test the theory of planned behaviour, we want to think about the measurement models for each of the constructs we are trying to capture.

Test separate one-factor models for each construct.
Are the measurement models satisfactory? (check their fit measures).
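
The general pattern for this step can be sketched as follows: keep one model string per construct, fit each separately, and collect the fit measures in one table. The example uses two four-item constructs from lavaan's built-in PoliticalDemocracy data as a stand-in; for the exercise you would swap in the item names from Table 1 and your own data frame.

```r
library(lavaan)

# Stand-in: two four-item constructs from lavaan's built-in data
models <- list(
  dem60 = "dem60 =~ y1 + y2 + y3 + y4",
  dem65 = "dem65 =~ y5 + y6 + y7 + y8"
)

# Fit each one-factor model separately
fits <- lapply(models, cfa, data = PoliticalDemocracy)

# Collect a few commonly reported fit measures, one column per construct
fit_tab <- sapply(
  fits, fitMeasures,
  fit.measures = c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "srmr")
)
round(fit_tab, 3)
```

A one-factor model with four items has only 2 degrees of freedom, so it is worth reading the fit measures alongside the loadings and not relying on any single index.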

Question 3

Using lavaan syntax, specify the full structural equation model that corresponds to the model in Figure 2. For each construct use the measurement models from the previous question.

This involves specifying the measurement models for all the latent variables, and then also specifying the relationships between those latent variables. All in the same model!
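
A hedged sketch of what such a specification could look like is given below. The latent variable names (att, sn, pbc, intent, beh) are illustrative choices, and the item names come from Table 1 (using all five intention items listed there). The structural paths are an assumption based on the usual form of the theory of planned behaviour, since Figure 2 is not reproduced here, so check them against the figure before using them.

```r
library(lavaan)

# Latent names are illustrative; item names come from Table 1.
# The structural paths are ASSUMED, so check them against Figure 2.
tpb_model <- "
  # measurement model
  att    =~ attitude1 + attitude2 + attitude3 + attitude4
  sn     =~ SN1 + SN2 + SN3 + SN4
  pbc    =~ PBC1 + PBC2 + PBC3 + PBC4
  intent =~ int1 + int2 + int3 + int4 + int5
  beh    =~ beh1 + beh2 + beh3 + beh4

  # structural model (assumed paths)
  intent ~ att + sn + pbc
  beh    ~ intent + pbc
"

# Parse the syntax without data to check what has been specified
pt <- lavaanify(tpb_model)
table(pt$op)
```

With the data read in as a data frame (called, say, tpb, a hypothetical name), the model could then be estimated with sem(tpb_model, data = tpb).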

Question 4

Estimate and evaluate the model

  • Does the model fit well?
  • Are the hypothesised paths significant?

Question 5

Examine the modification indices and expected parameter changes - are there any additional parameters you would consider including?
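
As a sketch of how to pull these out of a fitted model, the example below uses lavaan's modindices() on a model fitted to the built-in PoliticalDemocracy data (a stand-in, not this week's dataset). The mi column is the modification index and epc is the expected parameter change.

```r
library(lavaan)

model <- "
  ind60 =~ x1 + x2 + x3
  dem60 =~ y1 + y2 + y3 + y4
  dem65 =~ y5 + y6 + y7 + y8
  dem60 ~ ind60
  dem65 ~ ind60 + dem60
"

# Stand-in data shipped with lavaan (not this week's dataset)
fit <- sem(model, data = PoliticalDemocracy)

# Ten largest modification indices, with (standardised) expected
# parameter changes for each parameter that is currently fixed
mi <- modindices(fit, sort. = TRUE, maximum.number = 10)
mi[, c("lhs", "op", "rhs", "mi", "epc", "sepc.all")]
```

A large modification index on its own is not a reason to add a parameter; it should also make substantive sense for the items or constructs involved.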

Question 6

Test the indirect effect of attitudes, subjective norms, and perceived behavioural control on behaviour via intentions.

Remember, when you fit the model with sem(), use se = 'bootstrap' to get bootstrapped standard errors (it may take a few minutes). When you inspect the model using summary(), get the 95% confidence intervals for parameters with ci = TRUE.
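
The mechanics can be sketched on lavaan's built-in PoliticalDemocracy data (a stand-in, not this week's dataset): label the paths that make up the indirect effect, define the product with :=, and bootstrap the standard errors. The labels a, b and c and the names indirect and total are arbitrary choices for illustration.

```r
library(lavaan)

model <- "
  ind60 =~ x1 + x2 + x3
  dem60 =~ y1 + y2 + y3 + y4
  dem65 =~ y5 + y6 + y7 + y8

  # label the paths that make up the indirect effect
  dem60 ~ a*ind60
  dem65 ~ b*dem60 + c*ind60

  # define new parameters as functions of the labelled paths
  indirect := a*b
  total    := c + a*b
"

set.seed(10)
# bootstrap = 200 keeps this sketch quick; lavaan's default is 1000 draws
fit <- sem(model, data = PoliticalDemocracy,
           se = "bootstrap", bootstrap = 200)

# Percentile bootstrap confidence intervals for the defined parameters
pe <- parameterEstimates(fit, boot.ci.type = "perc", level = 0.95)
pe[pe$op == ":=", c("lhs", "est", "se", "ci.lower", "ci.upper")]
```

For the exercise there are three predictors, so each path into intentions needs its own label, and each indirect effect is the product of that path and the path from intentions to behaviour.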

Question 7

Write up your analysis as if you were presenting the work in an academic paper, with brief separate ‘Method’ and ‘Results’ sections.