DAPR3 - Week 7 - Path Analysis - Live R

The goal for today is to specify, fit and evaluate a path model. First, let’s load our packages:

# Load tidyverse and lavaan packages

library(tidyverse)
library(lavaan)

Let’s load and preview some data:

# Load the data

org <- read_csv("https://uoepsy.github.io/dapr3/2324/lectures/org_performance.csv")
## Rows: 487 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): OrgID, Sex, PubPri, Region
## dbl (3): Mot, Perform, Size
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Preview the data

org
## # A tibble: 487 × 7
##    OrgID Sex      Mot Perform PubPri Region   Size
##    <chr> <chr>  <dbl>   <dbl> <chr>  <chr>   <dbl>
##  1 Org1  female     3      58 public England     3
##  2 Org1  female     7      59 public England     3
##  3 Org1  male       4      60 public England     3
##  4 Org1  male       1      48 public England     3
##  5 Org1  female     4      59 public England     3
##  6 Org1  male       4      50 public England     3
##  7 Org1  female     1      43 public England     3
##  8 Org1  female     2      48 public England     3
##  9 Org1  male       5      65 public England     3
## 10 Org1  male       2      41 public England     3
## # ℹ 477 more rows

When cleaning data, you should always make sure you’ve got a description somewhere of the variables in your data:

  1. What do you think each variable represents?
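A quick way to get a feel for what each variable holds is to look at its type, its range, and how many distinct values it takes. Here is a minimal sketch of that, using a tiny made-up tibble called `toy` that only borrows some column names from `org` (the values, and the `private` rows in particular, are invented for illustration):

```r
library(dplyr)

# Hypothetical stand-in for the org data: invented values, same column names
toy <- tibble(
  OrgID   = c("Org1", "Org1", "Org2", "Org2"),
  Sex     = c("female", "male", "female", "male"),
  Mot     = c(3, 7, 4, 1),
  Perform = c(58, 59, 60, 48),
  PubPri  = c("public", "public", "private", "private"),
  Size    = c(3, 3, 5, 5)
)

# Column types and the first few values of every variable
glimpse(toy)

# Categorical variables: which levels exist, and how often?
sector_counts <- count(toy, PubPri)
sector_counts

# Numeric variables: what range do they cover?
ranges <- toy %>%
  summarise(across(where(is.numeric), list(min = min, max = max)))
ranges

# Is a variable constant within an organisation (a sign it is measured at the organisation level)?
size_per_org <- toy %>%
  group_by(OrgID) %>%
  summarise(n_sizes = n_distinct(Size))
size_per_org
```

Running the same calls on `org` should help you decide which variables describe individuals and which describe organisations.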

Standard linear model

Let’s start with a standard linear model, which will serve as a baseline for comparison:

# Fit the model
m1 <- lm(Perform ~ Mot + Sex + PubPri, data = org)

# View the model
summary(m1)
## 
## Call:
## lm(formula = Perform ~ Mot + Sex + PubPri, data = org)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -24.3555  -7.0569   0.2756   6.9709  20.6624 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   44.7244     0.8313  53.803  < 2e-16 ***
## Mot            2.6727     0.3026   8.833  < 2e-16 ***
## Sexmale       -0.7322     0.8501  -0.861     0.39    
## PubPripublic   4.9781     0.9388   5.302 1.74e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.342 on 483 degrees of freedom
## Multiple R-squared:  0.2399, Adjusted R-squared:  0.2351 
## F-statistic:  50.8 on 3 and 483 DF,  p-value: < 2.2e-16

We can also fit this in lavaan:

# Specify the model
lm1 = 'Perform ~ Mot + Sex + PubPri'

Look at the environment: this has created a single character string that holds our model specification. The model string is the first argument to the path analysis function, and the data is the second:

# Estimate the model
lm1_out <- sem(lm1, data = org)

# Notice that the object this creates is different from a normal linear model: it is a fitted model object of class lavaan (an S4 object) rather than an lm list

# View the model
summary(lm1_out)
## lavaan 0.6.16 ended normally after 1 iteration
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         4
## 
##   Number of observations                           487
## 
## Model Test User Model:
##                                                       
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Perform ~                                           
##     Mot               2.673    0.301    8.870    0.000
##     Sex              -0.732    0.847   -0.865    0.387
##     PubPri            4.978    0.935    5.324    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Perform          86.549    5.546   15.604    0.000
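The slopes from sem() match the lm() coefficients. The standard errors and the residual variance are slightly smaller because maximum likelihood divides by n, whereas lm() divides by the residual degrees of freedom. Here is a minimal sketch of that comparison on simulated data rather than the real org file (the data frame `sim` and the 0/1 dummy variables `male` and `public` are made up for illustration):

```r
library(lavaan)

# Simulated stand-in for the org data (hypothetical values)
set.seed(1)
n <- 300
sim <- data.frame(
  Mot    = rnorm(n, mean = 4, sd = 1.5),
  male   = rbinom(n, 1, 0.5),   # 0 = female, 1 = male
  public = rbinom(n, 1, 0.5)    # 0 = private, 1 = public
)
sim$Perform <- 45 + 2.5 * sim$Mot + 5 * sim$public + rnorm(n, sd = 9)

# Same regression, fitted two ways
ols_fit <- lm(Perform ~ Mot + male + public, data = sim)
sem_fit <- sem("Perform ~ Mot + male + public", data = sim)

# Slopes side by side
ols_slopes <- coef(ols_fit)[c("Mot", "male", "public")]
sem_slopes <- coef(sem_fit)[c("Perform~Mot", "Perform~male", "Perform~public")]
round(cbind(ols = ols_slopes, sem = unname(sem_slopes)), 3)

# Residual variance: lm() uses RSS / (n - 4), ML uses RSS / n
ols_resid_var <- sigma(ols_fit)^2
ml_resid_var  <- unname(coef(sem_fit)["Perform~~Perform"])
c(ols = ols_resid_var, ml = ml_resid_var, ml_rescaled = ml_resid_var * n / (n - 4))
```

Note that sem() does not report an intercept by default, so only the slopes and the residual variance are compared here.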

We can standardise it too:
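Here is a minimal sketch of two ways to ask lavaan for standardised estimates, again on simulated data (the data frame `sim` and the dummy variable `public` are hypothetical stand-ins, not the real org variables):

```r
library(lavaan)

# Simulated stand-in data (hypothetical values)
set.seed(2)
n <- 300
sim <- data.frame(
  Mot    = rnorm(n, mean = 4, sd = 1.5),
  public = rbinom(n, 1, 0.5)
)
sim$Perform <- 45 + 2.5 * sim$Mot + 5 * sim$public + rnorm(n, sd = 9)

fit <- sem("Perform ~ Mot + public", data = sim)

# Option 1: add standardised columns (Std.lv, Std.all) to the usual summary
summary(fit, standardized = TRUE)

# Option 2: a data frame of fully standardised estimates with their own standard errors
std <- standardizedSolution(fit)
std_paths <- std[std$op == "~", c("lhs", "op", "rhs", "est.std", "se", "pvalue")]
std_paths

# In this just-identified model the standardised slope is b * sd(x) / sd(y)
by_hand <- unname(coef(fit)["Perform~Mot"]) * sd(sim$Mot) / sd(sim$Perform)
by_hand
```

For binary predictors such as the dummy here, standardising the predictor is hard to interpret; `standardizedSolution(fit, type = "std.nox")` standardises the outcome only and may be the more sensible choice.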

  1. No degrees of freedom - What level of identification does this model have?

Extending to a path model

# Define the model
path1 = '
Perform ~ Mot
Mot ~ Sex + PubPri
'

Now let’s estimate and view the model:

# Estimate the model
path1_out <- sem(path1, data = org)

# Print the model
summary(path1_out)

Model evaluation

Pay attention to the degrees of freedom here: count them by hand first, then check your count against the value reported in the model output.

  1. How many degrees of freedom?
  2. What level of identification does this model have?
  3. Any paths non-significant?
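One way to check your answers to the first two questions is to count the degrees of freedom by hand and compare them with what lavaan reports. Here is a minimal sketch, assuming a model with the same structure as path1 but fitted to simulated data (the data frame `sim`, the dummy variables `male` and `public`, and the model string `path_sketch` are made up for illustration):

```r
library(lavaan)

# Simulated stand-in data (hypothetical values)
set.seed(3)
n <- 300
sim <- data.frame(
  male   = rbinom(n, 1, 0.5),
  public = rbinom(n, 1, 0.5)
)
sim$Mot     <- 3 + 0.5 * sim$public + rnorm(n, sd = 1.5)
sim$Perform <- 45 + 2.5 * sim$Mot + rnorm(n, sd = 9)

# Same structure as path1: two predictors -> Mot -> Perform
path_sketch <- '
Perform ~ Mot
Mot ~ male + public
'
fit <- sem(path_sketch, data = sim)

# Knowns: unique variances and covariances among the 4 observed variables
p      <- 4
knowns <- p * (p + 1) / 2

# Unknowns: 3 regression paths + 2 residual variances
#           + 2 variances and 1 covariance for the exogenous predictors
unknowns <- 3 + 2 + 3

df_by_hand <- knowns - unknowns
df_lavaan  <- as.numeric(fitMeasures(fit, "df"))
c(by_hand = df_by_hand, lavaan = df_lavaan)
```

The summary output reports fewer model parameters than the hand count because lavaan, by default, treats the variances and covariance of the exogenous predictors as fixed rather than free; the degrees of freedom come out the same either way. The degrees of freedom that remain correspond to the two direct paths from the predictors to Perform that the model leaves out.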