The goal for today is to specify, fit and evaluate a path model. First, let’s load our packages:
# Load tidyverse and lavaan packages
library(tidyverse)
library(lavaan)
Let’s load and preview some data:
# Load the data
org <- read_csv("https://uoepsy.github.io/dapr3/2324/lectures/org_performance.csv")
## Rows: 487 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): OrgID, Sex, PubPri, Region
## dbl (3): Mot, Perform, Size
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Preview the data
org
## # A tibble: 487 × 7
## OrgID Sex Mot Perform PubPri Region Size
## <chr> <chr> <dbl> <dbl> <chr> <chr> <dbl>
## 1 Org1 female 3 58 public England 3
## 2 Org1 female 7 59 public England 3
## 3 Org1 male 4 60 public England 3
## 4 Org1 male 1 48 public England 3
## 5 Org1 female 4 59 public England 3
## 6 Org1 male 4 50 public England 3
## 7 Org1 female 1 43 public England 3
## 8 Org1 female 2 48 public England 3
## 9 Org1 male 5 65 public England 3
## 10 Org1 male 2 41 public England 3
## # ℹ 477 more rows
When cleaning data, you should always make sure you’ve got a description of the variables somewhere:
Let’s start with a baseline to compare against, a standard linear model:
# Fit the model
m1 <- lm(Perform ~ Mot + Sex + PubPri, data = org)
# View the model
summary(m1)
##
## Call:
## lm(formula = Perform ~ Mot + Sex + PubPri, data = org)
##
## Residuals:
## Min 1Q Median 3Q Max
## -24.3555 -7.0569 0.2756 6.9709 20.6624
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 44.7244 0.8313 53.803 < 2e-16 ***
## Mot 2.6727 0.3026 8.833 < 2e-16 ***
## Sexmale -0.7322 0.8501 -0.861 0.39
## PubPripublic 4.9781 0.9388 5.302 1.74e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.342 on 483 degrees of freedom
## Multiple R-squared: 0.2399, Adjusted R-squared: 0.2351
## F-statistic: 50.8 on 3 and 483 DF, p-value: < 2.2e-16
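One thing to check before handing these data to lavaan: `Sex` and `PubPri` are stored as character columns, and lavaan generally expects numeric (or ordered) variables. A minimal sketch of dummy coding, shown on a small made-up data frame rather than on `org` itself. The data frame `toy`, the `_num` column names and the level `"private"` are hypothetical, and the choice of male = 1 and public = 1 is an assumption that mirrors the `Sexmale` and `PubPripublic` coefficients in the `lm()` output above:

```r
# Toy data standing in for the character columns in `org` (values are made up)
toy <- data.frame(
  Sex    = c("female", "male", "male", "female"),
  PubPri = c("public", "private", "public", "private")
)

# Recode to 0/1 indicators (assumption: male = 1, public = 1)
toy$Sex_num    <- as.numeric(toy$Sex == "male")
toy$PubPri_num <- as.numeric(toy$PubPri == "public")

# Check the recoding against the original labels
toy
```

The same two recoding lines could be applied to `org` before estimation, overwriting or adding columns as you prefer.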
We can also fit this in lavaan:
# Specify the model
lm1 <- 'Perform ~ Mot + Sex + PubPri'
If we look at the environment, we can see that this has created a single string, which is our model specification. This string is the first input to a path analysis function, and the data is the second:
# Estimate the model
lm1_out <- sem(lm1, data = org)
# Do you see that the object it's created is different to a normal linear model? It's an S4 object of class "lavaan", not an lm list
# View the model
summary(lm1_out)
## lavaan 0.6.16 ended normally after 1 iteration
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 4
##
## Number of observations 487
##
## Model Test User Model:
##
## Test statistic 0.000
## Degrees of freedom 0
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## Perform ~
## Mot 2.673 0.301 8.870 0.000
## Sex -0.732 0.847 -0.865 0.387
## PubPri 4.978 0.935 5.324 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .Perform 86.549 5.546 15.604 0.000
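Comparing the two outputs: the slopes match the `lm()` estimates, but the residual variance (86.549) is slightly smaller than the squared residual standard error from `lm()`. That is what we would expect if maximum likelihood divides the residual sum of squares by N rather than by the residual degrees of freedom. A quick arithmetic check, using only numbers printed in the outputs above (the variable names are just for illustration):

```r
# Values copied from summary(m1)
rse  <- 9.342  # residual standard error
df_r <- 483    # residual degrees of freedom
n    <- 487    # number of observations

# Rescale the lm() residual variance from a df_r denominator to an n denominator
ml_var <- rse^2 * df_r / n
ml_var  # compare with the .Perform variance in the lavaan output
```

The same rescaling would also account for the slightly smaller standard errors in the lavaan output.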
We can standardise the estimates too:
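A minimal sketch of two ways to get standardised estimates out of lavaan. It runs on simulated stand-in data so that it is self-contained; `sim`, `sim_mod` and `sim_out` are hypothetical names, not objects from this session:

```r
library(lavaan)

# Simulated stand-in data (not the org data): y depends on x1 and x2
set.seed(1)
sim <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
sim$y <- 0.5 * sim$x1 + 0.3 * sim$x2 + rnorm(200)

# Same workflow as above: specify the model as a string, then estimate
sim_mod <- 'y ~ x1 + x2'
sim_out <- sem(sim_mod, data = sim)

# Option 1: add standardised columns (Std.lv, Std.all) to the summary
summary(sim_out, standardized = TRUE)

# Option 2: return the standardised estimates as a data frame
std <- standardizedSolution(sim_out)
std
```

For the model fitted above, the equivalent call would be `summary(lm1_out, standardized = TRUE)`.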
Extending to a path model
# Define the model
path1 <- '
Perform ~ Mot
Mot ~ Sex + PubPri
'
Now let’s estimate and view the model:
# Estimate the model
path1_out <- sem(path1, data = org)
# Print the model
summary(path1_out)
Make sure to talk about degrees of freedom here, and count them when they appear in the model output.
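The count for `path1` can be sketched by hand: degrees of freedom are the number of unique variances and covariances among the observed variables minus the number of parameters the model uses. This is a worked count under the usual path-analysis bookkeeping, not lavaan output. Note that lavaan treats the exogenous predictors as fixed by default, so, following the pattern in the earlier output (where only slopes and the residual variance were counted), it should report fewer model parameters than the total below while arriving at the same degrees of freedom:

```r
# Observed variables in path1: Perform, Mot, Sex, PubPri
p <- 4
n_moments <- p * (p + 1) / 2  # unique variances and covariances

n_regressions   <- 3  # Perform ~ Mot, Mot ~ Sex, Mot ~ PubPri
n_resid_vars    <- 2  # residual variances of Perform and Mot
n_exogenous_cov <- 3  # variances of Sex and PubPri plus their covariance

# Degrees of freedom = known moments minus parameters used
df <- n_moments - (n_regressions + n_resid_vars + n_exogenous_cov)
df
```

The degrees of freedom correspond to the two paths the model leaves out: the direct effects of `Sex` and `PubPri` on `Perform`.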