Multivariate Statistics and Methodology using R

Practical Issues in Structural Equation Modeling

Aja Murray

This week

Learning outcomes

SEM with continuous normally distributed items

Why item distributions matter

SEM with continuous non-normally distributed items

Using a robust estimator for non-normally distributed items

Example in lavaan

library(psych)
describe(Agg_data)
##       vars   n mean   sd median trimmed  mad   min  max range skew kurtosis   se
## item1    1 500 0.01 1.02  -0.40   -0.18 0.59 -0.83 5.32  6.15 1.75     3.30 0.05
## item2    2 500 0.05 1.31  -0.42   -0.21 0.83 -1.04 6.34  7.39 1.76     3.13 0.06
## item3    3 500 0.17 1.28  -0.24   -0.06 0.96 -0.98 5.25  6.23 1.64     2.63 0.06
## item4    4 500 0.11 1.16  -0.37   -0.12 0.64 -0.83 7.26  8.09 2.05     5.57 0.05

Observed variable distributions
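
One way to inspect observed distributions is to plot a histogram per item and compute moment-based skewness and excess kurtosis. The sketch below is a minimal base R illustration, not the code behind the original slide: `sim_items` is simulated right-skewed data standing in for `Agg_data`, and `skewness()` and `excess_kurtosis()` are hypothetical helpers. Their values can differ slightly from `psych::describe()`, which may apply a different small-sample formula.

```r
# Simulated stand-in for Agg_data (hypothetical): four right-skewed items
set.seed(1)
sim_items <- as.data.frame(replicate(4, rexp(500) - 1))
names(sim_items) <- paste0("item", 1:4)

# Moment-based skewness and excess kurtosis (hypothetical helpers)
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}
excess_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

item_skew <- sapply(sim_items, skewness)
item_kurt <- sapply(sim_items, excess_kurtosis)
print(round(rbind(skew = item_skew, kurtosis = item_kurt), 2))

# One histogram per item, arranged in a 2 x 2 grid
old_par <- par(mfrow = c(2, 2))
for (nm in names(sim_items)) {
  hist(sim_items[[nm]], main = nm, xlab = "Score")
}
par(old_par)
```

Positive skew and positive excess kurtosis, as in the `describe()` output for the four items, indicate a long right tail and heavier tails than a normal distribution.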

Fitting the model ignoring non-normality

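library(lavaan)
# One-factor CFA for the four aggression items, using the default ML estimator
model1 <- 'Agg =~ item1 + item2 + item3 + item4'
model1.est <- cfa(model1, data = Agg_data)
summary(model1.est, fit.measures = TRUE, standardized = TRUE)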
## lavaan 0.6-11 ended normally after 31 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         8
##                                                       
##   Number of observations                           500
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                12.998
##   Degrees of freedom                                 2
##   P-value (Chi-square)                           0.002
## 
## Model Test Baseline Model:
## 
##   Test statistic                               187.030
##   Degrees of freedom                                 6
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.939
##   Tucker-Lewis Index (TLI)                       0.818
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -3097.350
##   Loglikelihood unrestricted model (H1)      -3090.851
##                                                       
##   Akaike (AIC)                                6210.700
##   Bayesian (BIC)                              6244.417
##   Sample-size adjusted Bayesian (BIC)         6219.024
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.105
##   90 Percent confidence interval - lower         0.056
##   90 Percent confidence interval - upper         0.162
##   P-value RMSEA <= 0.05                          0.035
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.038
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   Agg =~                                                                
##     item1             1.000                               0.583    0.570
##     item2             1.179    0.183    6.439    0.000    0.688    0.524
##     item3             0.942    0.160    5.907    0.000    0.550    0.429
##     item4             1.080    0.167    6.481    0.000    0.630    0.541
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .item1             0.705    0.068   10.424    0.000    0.705    0.675
##    .item2             1.252    0.108   11.610    0.000    1.252    0.726
##    .item3             1.337    0.100   13.352    0.000    1.337    0.816
##    .item4             0.957    0.086   11.182    0.000    0.957    0.707
##     Agg               0.340    0.070    4.826    0.000    1.000    1.000
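
The fit indices in this output can be reproduced by hand from the two chi-square tests. The sketch below uses the usual textbook formulas with the values copied from the output above; it is an illustration only, and lavaan's internal computation may differ in detail.

```r
# Values copied from the lavaan output above
chisq_model <- 12.998; df_model <- 2
chisq_base  <- 187.030; df_base <- 6
n <- 500

# CFI: 1 - model non-centrality / baseline non-centrality
cfi <- 1 - max(chisq_model - df_model, 0) /
  max(chisq_base - df_base, chisq_model - df_model, 0)

# TLI: compares chi-square/df ratios of the baseline and user models
tli <- ((chisq_base / df_base) - (chisq_model / df_model)) /
  ((chisq_base / df_base) - 1)

# RMSEA: misfit per degree of freedom, per observation
rmsea <- sqrt(max(chisq_model - df_model, 0) / (df_model * n))

round(c(CFI = cfi, TLI = tli, RMSEA = rmsea), 3)
```

Rounded to three decimals, these match the reported CFI (0.939), TLI (0.818) and RMSEA (0.105).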

Using a robust estimator for non-normal data

Using MLM for non-normal data

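# estimator = 'MLM': ML estimates with robust standard errors and a
# Satorra-Bentler scaled test statistic
model1 <- 'Agg =~ item1 + item2 + item3 + item4'
model1.est <- cfa(model1, data = Agg_data, estimator = 'MLM')
summary(model1.est, fit.measures = TRUE, standardized = TRUE, ci = TRUE)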
## lavaan 0.6-11 ended normally after 31 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         8
##                                                       
##   Number of observations                           500
##                                                       
## Model Test User Model:
##                                               Standard      Robust
##   Test Statistic                                12.998       9.941
##   Degrees of freedom                                 2           2
##   P-value (Chi-square)                           0.002       0.007
##   Scaling correction factor                                  1.307
##        Satorra-Bentler correction                                 
## 
## Model Test Baseline Model:
## 
##   Test statistic                               187.030     114.973
##   Degrees of freedom                                 6           6
##   P-value                                        0.000       0.000
##   Scaling correction factor                                  1.627
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.939       0.927
##   Tucker-Lewis Index (TLI)                       0.818       0.781
##                                                                   
##   Robust Comparative Fit Index (CFI)                         0.941
##   Robust Tucker-Lewis Index (TLI)                            0.824
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -3097.350   -3097.350
##   Loglikelihood unrestricted model (H1)      -3090.851   -3090.851
##                                                                   
##   Akaike (AIC)                                6210.700    6210.700
##   Bayesian (BIC)                              6244.417    6244.417
##   Sample-size adjusted Bayesian (BIC)         6219.024    6219.024
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.105       0.089
##   90 Percent confidence interval - lower         0.056       0.045
##   90 Percent confidence interval - upper         0.162       0.140
##   P-value RMSEA <= 0.05                          0.035       0.069
##                                                                   
##   Robust RMSEA                                               0.102
##   90 Percent confidence interval - lower                     0.045
##   90 Percent confidence interval - upper                     0.169
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.038       0.038
## 
## Parameter Estimates:
## 
##   Standard errors                           Robust.sem
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper   Std.lv  Std.all
##   Agg =~
##     item1             1.000                               1.000    1.000    0.583    0.570
##     item2             1.179    0.216    5.465    0.000    0.756    1.602    0.688    0.524
##     item3             0.942    0.209    4.510    0.000    0.533    1.352    0.550    0.429
##     item4             1.080    0.203    5.321    0.000    0.682    1.478    0.630    0.541
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper   Std.lv  Std.all
##    .item1             0.705    0.091    7.705    0.000    0.526    0.884    0.705    0.675
##    .item2             1.252    0.142    8.799    0.000    0.973    1.531    1.252    0.726
##    .item3             1.337    0.138    9.690    0.000    1.066    1.607    1.337    0.816
##    .item4             0.957    0.128    7.489    0.000    0.707    1.208    0.957    0.707
##     Agg               0.340    0.091    3.738    0.000    0.162    0.518    1.000    1.000

Using MLR for non-normal data

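# estimator = 'MLR': ML estimates with robust (sandwich) standard errors and a
# Yuan-Bentler scaled test statistic
model1 <- 'Agg =~ item1 + item2 + item3 + item4'
model1.est <- cfa(model1, data = Agg_data, estimator = 'MLR')
summary(model1.est, fit.measures = TRUE, standardized = TRUE, ci = TRUE)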
## lavaan 0.6-11 ended normally after 31 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         8
##                                                       
##   Number of observations                           500
##                                                       
## Model Test User Model:
##                                                Standard      Robust
##   Test Statistic                                 12.998      13.837
##   Degrees of freedom                                  2           2
##   P-value (Chi-square)                            0.002       0.001
##   Scaling correction factor                                   0.939
##        Yuan-Bentler correction (Mplus variant)                     
## 
## Model Test Baseline Model:
## 
##   Test statistic                               187.030     148.841
##   Degrees of freedom                                 6           6
##   P-value                                        0.000       0.000
##   Scaling correction factor                                  1.257
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.939       0.917
##   Tucker-Lewis Index (TLI)                       0.818       0.751
##                                                                   
##   Robust Comparative Fit Index (CFI)                         0.938
##   Robust Tucker-Lewis Index (TLI)                            0.814
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -3097.350   -3097.350
##   Scaling correction factor                                  2.129
##       for the MLR correction                                      
##   Loglikelihood unrestricted model (H1)      -3090.851   -3090.851
##   Scaling correction factor                                  1.891
##       for the MLR correction                                      
##                                                                   
##   Akaike (AIC)                                6210.700    6210.700
##   Bayesian (BIC)                              6244.417    6244.417
##   Sample-size adjusted Bayesian (BIC)         6219.024    6219.024
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.105       0.109
##   90 Percent confidence interval - lower         0.056       0.058
##   90 Percent confidence interval - upper         0.162       0.168
##   P-value RMSEA <= 0.05                          0.035       0.030
##                                                                   
##   Robust RMSEA                                               0.105
##   90 Percent confidence interval - lower                     0.058
##   90 Percent confidence interval - upper                     0.161
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.038       0.038
## 
## Parameter Estimates:
## 
##   Standard errors                             Sandwich
##   Information bread                           Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper   Std.lv  Std.all
##   Agg =~
##     item1             1.000                               1.000    1.000    0.583    0.570
##     item2             1.179    0.194    6.090    0.000    0.800    1.559    0.688    0.524
##     item3             0.942    0.260    3.619    0.000    0.432    1.453    0.550    0.429
##     item4             1.080    0.249    4.339    0.000    0.592    1.568    0.630    0.541
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper   Std.lv  Std.all
##    .item1             0.705    0.098    7.201    0.000    0.513    0.897    0.705    0.675
##    .item2             1.252    0.148    8.434    0.000    0.961    1.543    1.252    0.726
##    .item3             1.337    0.143    9.341    0.000    1.056    1.617    1.337    0.816
##    .item4             0.957    0.134    7.148    0.000    0.695    1.220    0.957    0.707
##     Agg               0.340    0.099    3.433    0.001    0.146    0.534    1.000    1.000
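
In both robust outputs the "Robust" test statistic is, to rounding, the standard ML statistic divided by the reported scaling correction factor. The short check below uses only numbers copied from the MLM and MLR outputs above; it is an arithmetic illustration, not a description of how lavaan derives the correction factors themselves.

```r
chisq_ml  <- 12.998   # standard ML test statistic
scale_mlm <- 1.307    # Satorra-Bentler scaling factor (MLM output)
scale_mlr <- 0.939    # Yuan-Bentler scaling factor (MLR output)

# Scaled statistic = standard statistic / scaling correction factor
robust_mlm <- chisq_ml / scale_mlm
robust_mlr <- chisq_ml / scale_mlr
round(c(MLM = robust_mlm, MLR = robust_mlr), 2)
```

The results agree with the reported 9.941 (MLM) and 13.837 (MLR) to two decimals; the small remaining gap comes from the scaling factors being printed to three decimals. A factor above 1 shrinks the statistic and a factor below 1 inflates it.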

Summary of what to do when you have non-normal continuous data

Ordinal-categorical data

Solutions for ordinal-categorical data

Example: Ordinal data

Fitting a model with ordinal-categorical data

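# Treats the ordinal items as if they were continuous (default ML estimator)
model1 <- 'Agg =~ item1 + item2 + item3 + item4'
model1.est <- cfa(model1, data = Agg_data2)
summary(model1.est, fit.measures = TRUE, standardized = TRUE)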
## lavaan 0.6-11 ended normally after 36 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         8
##                                                       
##   Number of observations                           500
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                12.856
##   Degrees of freedom                                 2
##   P-value (Chi-square)                           0.002
## 
## Model Test Baseline Model:
## 
##   Test statistic                                64.410
##   Degrees of freedom                                 6
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.814
##   Tucker-Lewis Index (TLI)                       0.442
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -768.162
##   Loglikelihood unrestricted model (H1)       -761.734
##                                                       
##   Akaike (AIC)                                1552.325
##   Bayesian (BIC)                              1586.042
##   Sample-size adjusted Bayesian (BIC)         1560.649
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.104
##   90 Percent confidence interval - lower         0.055
##   90 Percent confidence interval - upper         0.162
##   P-value RMSEA <= 0.05                          0.036
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.045
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   Agg =~                                                                
##     item1             1.000                               0.128    0.375
##     item2             1.413    0.440    3.214    0.001    0.180    0.458
##     item3             1.504    0.457    3.287    0.001    0.192    0.387
##     item4             0.527    0.184    2.868    0.004    0.067    0.267
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .item1             0.100    0.008   11.743    0.000    0.100    0.859
##    .item2             0.123    0.013    9.232    0.000    0.123    0.790
##    .item3             0.210    0.018   11.413    0.000    0.210    0.851
##    .item4             0.059    0.004   13.973    0.000    0.059    0.929
##     Agg               0.016    0.007    2.393    0.017    1.000    1.000

Fitting a model with a categorical estimator

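# Declaring the items as ordered makes lavaan use a categorical estimator
# (DWLS with robust corrections, as reported in the output below)
model1 <- 'Agg =~ item1 + item2 + item3 + item4'
model1.est <- cfa(model1, data = Agg_data2,
                  ordered = c('item1', 'item2', 'item3', 'item4'))
summary(model1.est, fit.measures = TRUE, standardized = TRUE)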
## lavaan 0.6-11 ended normally after 24 iterations
## 
##   Estimator                                       DWLS
##   Optimization method                           NLMINB
##   Number of model parameters                         8
##                                                       
##   Number of observations                           500
##                                                       
## Model Test User Model:
##                                               Standard      Robust
##   Test Statistic                                 7.257       7.807
##   Degrees of freedom                                 2           2
##   P-value (Chi-square)                           0.027       0.020
##   Scaling correction factor                                  0.943
##   Shift parameter                                            0.115
##        simple second-order correction                             
## 
## Model Test Baseline Model:
## 
##   Test statistic                                80.264      75.274
##   Degrees of freedom                                 6           6
##   P-value                                        0.000       0.000
##   Scaling correction factor                                  1.072
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.929       0.916
##   Tucker-Lewis Index (TLI)                       0.788       0.749
##                                                                   
##   Robust Comparative Fit Index (CFI)                            NA
##   Robust Tucker-Lewis Index (TLI)                               NA
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.073       0.076
##   90 Percent confidence interval - lower         0.021       0.026
##   90 Percent confidence interval - upper         0.133       0.136
##   P-value RMSEA <= 0.05                          0.194       0.166
##                                                                   
##   Robust RMSEA                                                  NA
##   90 Percent confidence interval - lower                        NA
##   90 Percent confidence interval - upper                        NA
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.089       0.089
## 
## Parameter Estimates:
## 
##   Standard errors                           Robust.sem
##   Information                                 Expected
##   Information saturated (h1) model        Unstructured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   Agg =~                                                                
##     item1             1.000                               0.516    0.516
##     item2             1.115    0.296    3.767    0.000    0.576    0.576
##     item3             1.102    0.314    3.504    0.000    0.569    0.569
##     item4             1.300    0.393    3.308    0.001    0.671    0.671
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .item1             0.000                               0.000    0.000
##    .item2             0.000                               0.000    0.000
##    .item3             0.000                               0.000    0.000
##    .item4             0.000                               0.000    0.000
##     Agg               0.000                               0.000    0.000
## 
## Thresholds:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     item1|t1          1.108    0.071   15.690    0.000    1.108    1.108
##     item2|t1          0.871    0.065   13.484    0.000    0.871    0.871
##     item3|t1         -0.146    0.056   -2.590    0.010   -0.146   -0.146
##     item4|t1         -1.491    0.086  -17.370    0.000   -1.491   -1.491
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .item1             0.733                               0.733    0.733
##    .item2             0.669                               0.669    0.669
##    .item3             0.676                               0.676    0.676
##    .item4             0.550                               0.550    0.550
##     Agg               0.267    0.106    2.526    0.012    1.000    1.000
## 
## Scales y*:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     item1             1.000                               1.000    1.000
##     item2             1.000                               1.000    1.000
##     item3             1.000                               1.000    1.000
##     item4             1.000                               1.000    1.000

Summary of what to do with ordered-categorical items

Missing data

Missing data mechanisms

MAR

MCAR

MNAR
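
The three mechanisms can be illustrated by deleting values from simulated data in three different ways. This is a minimal sketch under simplifying assumptions: the variables `x` and `y`, the 30% missingness rate, and the deterministic cut-offs used for MAR and MNAR are all hypothetical choices made for clarity.

```r
set.seed(42)
n <- 500
x <- rnorm(n)              # fully observed covariate
y <- 0.6 * x + rnorm(n)    # variable that will receive missing values

# MCAR: each y value has the same 30% chance of being missing
y_mcar <- ifelse(rbinom(n, 1, 0.3) == 1, NA, y)

# MAR: missingness in y depends only on the observed x
x_cut <- quantile(x, 0.7)
y_mar <- ifelse(x > x_cut, NA, y)

# MNAR: missingness in y depends on the unobserved y value itself
y_cut <- quantile(y, 0.7)
y_mnar <- ifelse(y > y_cut, NA, y)

# Mean of the observed y values under each mechanism versus the complete data
round(c(complete = mean(y),
        MCAR = mean(y_mcar, na.rm = TRUE),
        MAR  = mean(y_mar,  na.rm = TRUE),
        MNAR = mean(y_mnar, na.rm = TRUE)), 2)
```

Under MNAR the observed mean is necessarily too low here because the largest values are the ones removed. Under MAR the observed mean is also distorted, but the cause is visible in `x`, which is what makes methods that condition on observed data workable.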

Methods of dealing with missingness

Listwise deletion

Pairwise deletion

Mean imputation

Regression imputation

Multiple imputation

Full Information Maximum Likelihood (FIML) Estimation method
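
The simpler methods in this list can be sketched in a few lines of base R. The example below is illustrative only: the data are simulated, and the variable names and the choice of 60 missing values are hypothetical. Multiple imputation and FIML are not shown because they need dedicated software.

```r
set.seed(7)
n <- 200
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)
y[sample(n, 60)] <- NA     # hypothetical: 60 values missing completely at random
dat <- data.frame(x = x, y = y)

# Listwise deletion: keep only rows with no missing values
dat_listwise <- na.omit(dat)
nrow(dat_listwise)

# Mean imputation: replace each missing y with the observed mean of y
dat_mean <- dat
dat_mean$y[is.na(dat_mean$y)] <- mean(dat$y, na.rm = TRUE)

# Regression imputation: replace each missing y with its prediction from x
fit <- lm(y ~ x, data = dat)   # lm() drops incomplete rows by default
miss <- is.na(dat$y)
dat_reg <- dat
dat_reg$y[miss] <- predict(fit, newdata = dat[miss, , drop = FALSE])

# Mean imputation always shrinks the variance of y;
# regression imputation typically does too
round(c(observed = var(dat$y, na.rm = TRUE),
        mean_imputed = var(dat_mean$y),
        regression_imputed = var(dat_reg$y)), 3)
```

Listwise deletion discards 60 of the 200 rows, while both single-imputation methods keep every row at the cost of treating imputed values as if they had been observed.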

Missingness in lavaan

# missing = 'ML' requests full information maximum likelihood (FIML)
model1 <- 'Agg =~ item1 + item2 + item3 + item4'
model1.est <- cfa(model1, data = Agg_data, missing = 'ML')
summary(model1.est, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-11 ended normally after 31 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        12
##                                                       
##   Number of observations                           500
##   Number of missing patterns                         1
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                12.998
##   Degrees of freedom                                 2
##   P-value (Chi-square)                           0.002
## 
## Model Test Baseline Model:
## 
##   Test statistic                               187.030
##   Degrees of freedom                                 6
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.939
##   Tucker-Lewis Index (TLI)                       0.818
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -3097.350
##   Loglikelihood unrestricted model (H1)      -3090.851
##                                                       
##   Akaike (AIC)                                6218.700
##   Bayesian (BIC)                              6269.275
##   Sample-size adjusted Bayesian (BIC)         6231.186
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.105
##   90 Percent confidence interval - lower         0.056
##   90 Percent confidence interval - upper         0.162
##   P-value RMSEA <= 0.05                          0.035
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.032
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   Agg =~                                                                
##     item1             1.000                               0.583    0.570
##     item2             1.179    0.171    6.911    0.000    0.688    0.524
##     item3             0.942    0.179    5.276    0.000    0.550    0.429
##     item4             1.080    0.180    6.010    0.000    0.630    0.541
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .item1             0.011    0.046    0.245    0.807    0.011    0.011
##    .item2             0.050    0.059    0.851    0.395    0.050    0.038
##    .item3             0.172    0.057    3.004    0.003    0.172    0.134
##    .item4             0.105    0.052    2.024    0.043    0.105    0.091
##     Agg               0.000                               0.000    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .item1             0.705    0.070   10.127    0.000    0.705    0.675
##    .item2             1.252    0.110   11.346    0.000    1.252    0.726
##    .item3             1.337    0.103   13.001    0.000    1.337    0.816
##    .item4             0.957    0.090   10.656    0.000    0.957    0.707
##     Agg               0.340    0.072    4.699    0.000    1.000    1.000
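
Compared with the default ML fit of the same model, this output has the same log-likelihood and chi-square but 12 parameters instead of 8, because the four item intercepts are now estimated. The sketch below shows how that alone accounts for the different AIC and BIC values; `info_criteria()` is a hypothetical helper and all numbers are copied from the two outputs.

```r
loglik <- -3097.350   # user-model log-likelihood, identical in both fits
n <- 500

# Hypothetical helper: information criteria from log-likelihood and parameter count
info_criteria <- function(loglik, k, n) {
  c(AIC = -2 * loglik + 2 * k,
    BIC = -2 * loglik + k * log(n))
}

ic_default <- info_criteria(loglik, k = 8,  n = n)   # loadings and variances only
ic_fiml    <- info_criteria(loglik, k = 12, n = n)   # plus four item intercepts
round(rbind(default = ic_default, fiml = ic_fiml), 3)
```

The results match the reported values (AIC 6210.700 and BIC 6244.417 for the default fit; AIC 6218.700 and BIC 6269.275 with `missing = 'ML'`). The parameter estimates are unchanged because `Agg_data` has a single missingness pattern, that is, no missing values.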

Advanced topics in missingness

Summary of missingness