class: center, middle, inverse, title-slide .title[ #
WEEK 2
Path Mediation
] .subtitle[ ## Data Analysis for Psychology in R 3 ] .author[ ### dapR3 Team ] .institute[ ### Department of Psychology
The University of Edinburgh ] --- # Learning Objectives 1. Understand the purpose of mediation models and the conceptual challenges 2. Be able to describe direct, indirect and total effects in a mediation model. 3. Estimate and interpret a mediation model using `lavaan` --- class: inverse, center, middle <h2>Part 1: Introduction to mediation</h2> <h2 style="text-align: left;opacity:0.3;">Part 2: Direct, indirect and total effects</h2> <h2 style="text-align: left;opacity:0.3;">Part 3: Estimating mediation in `lavaan`</h2> <h2 style="text-align: left;opacity:0.3;">Part 4: Reporting</h2> --- # Mediation - Is when a predictor X, has an effect on an outcome Y, via a mediating variable M - The mediator **transmits** the effect of X to Y - Examples of mediation hypotheses: - Conscientiousness (X) affects health (Y) via health behaviors (M) - Conduct problems (X) increase the risk of depression (Y) via peer problems (M) - Attitudes to smoking (X) predict intentions to smoke (M) which in turn predicts smoking behavior (Y) - An intervention (X) to reduce youth crime (Y) works by increasing youth self-control (M) --- # Visualising a mediation model .pull-left[ - In a SEM diagram we can represent mediation as: ] .pull-right[ <img src="week2_pathmediation_files/figure-html/unnamed-chunk-1-1.png" width="17777" /> ] --- # Mediation...not to be confused with moderation - Mediation is commonly confused with **moderation** - Moderation is when a moderator z modifies the effect of X on Y - e.g., the effect of X on Y is stronger at higher levels of Z - Also known as an **interaction** between X and Z - Examples of moderation could be: - An intervention (X) works better to reduce bullying (Y) at older ages (Z) of school pupil - The relation between stress (X) and depression (Y) is lower for those scoring higher on spirituality (Z) --- class: inverse, center, middle, animated, rotateInDownLeft # End of Part 1 --- class: inverse, center, middle <h2 style="text-align: left;opacity:0.3;">Part 1: Introduction to mediation</h2> <h2>Part 2: Direct, indirect and total effects</h2> <h2 style="text-align: left;opacity:0.3;">Part 3: Estimating mediation in `lavaan`</h2> <h2 style="text-align: left;opacity:0.3;">Part 4: Reporting</h2> --- # Direct and indirect effects in mediation - We seldom hypothesize that a mediator completely explains the relation between X and Y - More commonly, we expect both **indirect effects** and **direct effects** of X on Y - The indirect effects of X on Y are those transmitted via the mediator - The direct effect of X on Y is the remaining effect of X on Y --- # Visualing direct and indirect effects in mediation <img src="week2_pathmediation_files/figure-html/unnamed-chunk-2-1.png" width="1707" /> --- # Testing mediation .pull-left[ - Traditionally, mediation was tested using a series of separate linear models: 1. Y~X 2. Y~X+M 3. M~X - May see this referred to as the Baron and Kenny approach. ] .pull-right[ <img src="week2_pathmediation_files/figure-html/unnamed-chunk-3-1.png" width="1707" /> ] --- # Traditional methods for mediation .pull-left[ - The three regression models: 1. Y~X 2. Y~X+M 3. M~X ] .pull-right[ - Model 1 estimates the overall effect of X on Y - Model 2 estimates the partial effects of X and M on Y - Model 3 estimates the effect of X on M - If the following conditions were met, mediation was assumed to hold: - The effect of X on Y (eq.1) is significant - The effect of X on M (eq.3) is significant - The effect of X on Y becomes reduced when M is added into the model (eq.2) ] --- # Limitations of traditional methods for mediation - Low power - Very cumbersome for multiple mediators, predictors, or outcomes - You don't get an estimate of the magnitude of the indirect effect - Much better way: **path mediation model** --- # BREAK QUIZ - Quiz question: - Which of these hypotheses is a mediation hypothesis? - 1) Vocabulary development in childhood follows a non-linear trajectory - 2) The effects of conscientiousness on academic achievement are stronger at low levels of cognitive ability - 3) Poverty affects child behavior problems through increasing parental stress - 4) Earlier pubertal onset increases the risk of antisocial behavior only in girls and not boys --- class: inverse, center, middle, animated, rotateInDownLeft # End of Part 2 --- class: inverse, center, middle <h2 style="text-align: left;opacity:0.3;">Part 1: Introduction to mediation</h2> <h2 style="text-align: left;opacity:0.3;">Part 2: Direct, indirect and total effects</h2> <h2>Part 3: Estimating mediation in `lavaan`</h2> <h2 style="text-align: left;opacity:0.3;">Part 4: Reporting</h2> --- # WELCOME BACK - Welcome back! - The answer to the quiz question is... - Which of these hypotheses is a mediation hypothesis? - 1) Vocabulary development in childhood follows a non-linear trajectory - 2) The effects of conscientiousness on academic achievement are stronger at low levels of cognitive ability - 3) **Poverty affects child behavior problems through increasing parental stress** - 4) Earlier pubertal onset increases the risk of antisocial behavior only in girls and not boys --- # Testing a path mediation model in lavaan - Specification - Create a lavaan syntax object - Estimation - Estimate the model using e.g., maximum likelihood estimation - Evaluation/interpretation - Inspect the model to judge how good it is - Interpret the parameter estimates --- # Example .pull-left[ - Does peer rejection mediate the association between aggression and depression? ] .pull-right[ <img src="week2_pathmediation_files/figure-html/unnamed-chunk-4-1.png" width="17777" /> ] --- # The data ```r slice(agg.data2, 1:10) ``` ``` ## Dep PR Agg ## 1 0.2466 0.7660 0.356742 ## 2 0.3543 1.5883 1.723399 ## 3 -1.7653 -0.6616 -0.891406 ## 4 0.9377 1.9045 0.852624 ## 5 0.1180 -0.1149 -0.653107 ## 6 -0.4619 2.3821 -0.667404 ## 7 0.4619 -1.0537 0.075925 ## 8 0.9918 0.2231 -0.152125 ## 9 0.4318 0.2338 0.006892 ## 10 -0.3855 -1.7180 0.127499 ``` --- # Mediation Example - Does peer rejection mediate the association between aggression and depression? ```r model1<-'Dep ~ PR # Depression predicted by peer rejection Dep ~ Agg # Depression predicted by aggression (the direct effect) PR ~ Agg # Peer rejection predicted by aggression ' ``` - Estimate the model ```r model1.est<-sem(model1, data=agg.data2) ``` --- # Model Ouput + Typically we want to see: + Model estimates + Model fit + Standardized solutions + Possibly modification indices + We get those by: ```r summary(model1.est, fit.measures = T, std=T, modindices = T) ``` --- # The model output .scroll-output[ ```r summary(model1.est) ``` ``` ## lavaan 0.6-12 ended normally after 1 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 5 ## ## Number of observations 500 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## Dep ~ ## PR 0.295 0.046 6.387 0.000 ## Agg 0.196 0.046 4.236 0.000 ## PR ~ ## Agg 0.507 0.039 13.089 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .Dep 0.817 0.052 15.811 0.000 ## .PR 0.766 0.048 15.811 0.000 ``` ] --- # Things to note from the model output - All three regressions paths are statistically significant - The model is **just-identified** - The degrees of freedom are equal to 0 - The model fit cannot be tested - The model fit statistics (TLI, CFI, RMSEA, SRMR) all suggest perfect fit but this is meaningless --- # Visualising the model using `semPaths()` - We can use semPaths() from the semPlot package to help us visualize the model - Shows the parameter estimates within an SEM diagram ```r library(semPlot) semPaths(model1.est, what='est') ``` --- # Visualising the model using `semPaths()` ![](week2_pathmediation_files/figure-html/unnamed-chunk-10-1.png)<!-- --> --- # Calculating the indirect effects .pull-left[ - To calculate the indirect effect of X on Y in path mediation, we need to create some new parameters - The indirect effect of X on Y via M is: - `\(a*b\)` - `\(a\)` = the regression coefficient for M~X - `\(b\)` = the regression coefficient for Y~M ] .pull-right[ <img src="week2_pathmediation_files/figure-html/unnamed-chunk-11-1.png" width="17777" /> ] --- # Calculating indirect effects in lavaan .pull-left[ - To calculate the indirect effect of X on Y in lavaan we: - Use parameter labels 'a' and 'b' to label the relevant paths - a is for the effect of X on M - b is for the effect of M on Y - Use the ':=' operator to create a new parameter 'ind' - 'ind' represents our indirect effect ] .pull-right[ ```r model1<-'Dep~b*PR Dep~Agg PR~a*Agg ind:=a*b ' ``` ] --- # Indirect effects in the output .scroll-output[ ```r model1.est<-sem(model1, data=agg.data2) summary(model1.est) ``` ``` ## lavaan 0.6-12 ended normally after 1 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 5 ## ## Number of observations 500 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## Dep ~ ## PR (b) 0.295 0.046 6.387 0.000 ## Agg 0.196 0.046 4.236 0.000 ## PR ~ ## Agg (a) 0.507 0.039 13.089 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .Dep 0.817 0.052 15.811 0.000 ## .PR 0.766 0.048 15.811 0.000 ## ## Defined Parameters: ## Estimate Std.Err z-value P(>|z|) ## ind 0.150 0.026 5.740 0.000 ``` ] --- # Statistical significance of the indirect effects - Default method of assessing the statistical significance of indirect effects assume normal sampling distribution - May not hold for indirect effects which are the product of regression coefficients - Instead we can use **bootstrapping** - Allows 95% confidence intervals (CIs) to be computed - If 95% CI includes 0, the indirect effect is not significant at alpha=.05 --- # Bootstapped CIs for indirect effect in lavaan ```r model1<-'Dep~b*PR Dep~Agg PR~a*Agg ind:=a*b' model1.est<-sem(model1, data=agg.data2, se='bootstrap') #we add the argument se='bootstrap' ``` --- # Output for bootstrapped CIs .scroll-output[ ```r summary(model1.est, ci=T) # we add the argument ci=T to see the confidence intervals in the output ``` ``` ## lavaan 0.6-12 ended normally after 1 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 5 ## ## Number of observations 500 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## ## Parameter Estimates: ## ## Standard errors Bootstrap ## Number of requested bootstrap draws 1000 ## Number of successful bootstrap draws 1000 ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper ## Dep ~ ## PR (b) 0.295 0.043 6.862 0.000 0.204 0.381 ## Agg 0.196 0.049 4.043 0.000 0.102 0.295 ## PR ~ ## Agg (a) 0.507 0.041 12.333 0.000 0.426 0.590 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper ## .Dep 0.817 0.050 16.235 0.000 0.708 0.909 ## .PR 0.766 0.050 15.160 0.000 0.668 0.866 ## ## Defined Parameters: ## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper ## ind 0.150 0.026 5.818 0.000 0.100 0.201 ``` ] --- # Total effects in path mediation - As well as the direct and indirect effect, it is often of interest to know the **total** effect of X on Y `$$Total = Indirect + Direct$$` --- # Total effects in path mediation `$$Total = a*b + c$$` <img src="week2_pathmediation_files/figure-html/unnamed-chunk-12-1.png" width="17777" /> --- # Total effect in lavaan ```r model1<-'Dep~b*PR Dep~c*Agg # we add the label c for our direct effect PR~a*Agg ind:=a*b total:=a*b+c # we add a new parameter for the total effect' model1.est<-sem(model1, data=agg.data2, se='bootstrap') #we add the argument se='bootstrap' ``` --- # Total effect in lavaan output .scroll-output[ ```r summary(model1.est, ci=T) ``` ``` ## lavaan 0.6-12 ended normally after 1 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 5 ## ## Number of observations 500 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## ## Parameter Estimates: ## ## Standard errors Bootstrap ## Number of requested bootstrap draws 1000 ## Number of successful bootstrap draws 1000 ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper ## Dep ~ ## PR (b) 0.295 0.043 6.879 0.000 0.212 0.381 ## Agg (c) 0.196 0.049 3.983 0.000 0.096 0.290 ## PR ~ ## Agg (a) 0.507 0.041 12.388 0.000 0.426 0.590 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper ## .Dep 0.817 0.050 16.487 0.000 0.711 0.910 ## .PR 0.766 0.048 15.867 0.000 0.664 0.863 ## ## Defined Parameters: ## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper ## ind 0.150 0.025 5.878 0.000 0.103 0.204 ## total 0.346 0.047 7.412 0.000 0.254 0.442 ``` ] --- # Why code the total effect in lavaan? - We could have just added up the coefficients for the direct and indirect effects - By coding it in lavaan, however, we can assess the statistical significance of the total effect - Useful because sometimes the direct and indirect effects are not individually significant but the total effect is - May be especially relevant in cases where there are many mediators of small effect --- # Interpreting the total, direct, and indirect effect coefficients - The total effect can be interpreted as the **unit increase in Y expected to occur when X increases by one unit** - The indirect effect can be interpreted as the **unit increase in Y expected to occur via M when X increases by one unit** - The direct effect can be interpreted as the **unit increase in Y expected to occur with a unit increase in X over and above the increase transmitted by M** - **Note**: 'direct' effect may not actually be direct - it may be acting via other mediators not included in our model --- # Standardised parameters - As with CFA models, standardized parameters can be obtained using: ```r summary(model1.est, ci=T, std=T) ``` --- # Standardised parameters .scroll-output[ ``` ## lavaan 0.6-12 ended normally after 1 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 5 ## ## Number of observations 500 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## ## Parameter Estimates: ## ## Standard errors Bootstrap ## Number of requested bootstrap draws 1000 ## Number of successful bootstrap draws 1000 ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper ## Dep ~ ## PR (b) 0.295 0.043 6.879 0.000 0.212 0.381 ## Agg (c) 0.196 0.049 3.983 0.000 0.096 0.290 ## PR ~ ## Agg (a) 0.507 0.041 12.388 0.000 0.426 0.590 ## Std.lv Std.all ## ## 0.295 0.298 ## 0.196 0.198 ## ## 0.507 0.505 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper ## .Dep 0.817 0.050 16.487 0.000 0.711 0.910 ## .PR 0.766 0.048 15.867 0.000 0.664 0.863 ## Std.lv Std.all ## 0.817 0.812 ## 0.766 0.745 ## ## Defined Parameters: ## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper ## ind 0.150 0.025 5.878 0.000 0.103 0.204 ## total 0.346 0.047 7.412 0.000 0.254 0.442 ## Std.lv Std.all ## 0.150 0.151 ## 0.346 0.349 ``` ] --- # BREAK QUIZ - Time for a pause - Quiz question - If the effect of X on M is b=.30 and the effect of M on Y is b=.10, what is the indirect effect of X on Y? - 1) b=.40 - 2) b=.03 - 3) b=.30 - 4) b=.10 --- class: inverse, center, middle, animated, rotateInDownLeft # End of Part 3 --- class: inverse, center, middle <h2 style="text-align: left;opacity:0.3;">Part 1: Introduction to mediation</h2> <h2 style="text-align: left;opacity:0.3;">Part 2: Direct, indirect and total effects</h2> <h2 style="text-align: left;opacity:0.3;">Part 3: Estimating mediation in `lavaan`</h2> <h2>Part 4: Reporting</h2> --- # Welcome back - The answer to the quiz question is... - Quiz question - If the effect of X on M is b=.30 and the effect of M on Y is b=.10, what is the indirect effect of X on Y? - 1) b=.40 - 2) **b=.03** - 3) b=.30 - 4) b=.10 --- # Reporting path mediation models - Methods/ Analysis Strategy - The model being tested - e.g. 'Y was regressed on both X and M and M was regressed on X' - The estimator used (e.g., maximum likelihood estimation) - The method used to test the significance of indirect effects ('bootstrapped 95% confidence intervals') - Results - Model fit (for over-identified models) - The parameter estimates for the path mediation and their statistical significance - Can be useful to present these in a SEM diagram - The diagrams from R not considered 'publication quality' draw in powerpoint or similar --- # Reporting path mediation models - example of SEM diagram with results .pull-left[ - Include the key parameter estimates - Indicate statistically significant paths (e.g. with an '*') - Include a figure note that explains how statistically significant paths (and at what level) are signified ] .pull-right[ <img src="week2_pathmediation_files/figure-html/unnamed-chunk-14-1.png" width="1707" /> ] --- # Reporting path mediation models - the indirect effects - Results - The coefficient for the indirect effect and the bootstrapped 95% confidence intervals - Common to also report **proportion mediation**: `$$\frac{indirect}{total}$$` - However, important to be aware of limitations: - Big proportion mediation possible when total effect is small - makes effect seem more impressive - Small proportion mediation even when total effect is big - can underplay importance of effect - Should be interpreted in context of total effect - Tricky interpretation if there are a mix of negative and positive effects involved --- # Extensions of path mediation models - We can extend our path mediation model in various ways: - Several mediators in sequence or parallel - Multiple outcomes - Multiple predictors - Multiple groups (e.g., comparing direct and indirect effects across males and females) - Add covariates to adjust for potential confounders --- # Example: Multiple mediation model .pull-left[ ```r model2<-'Dep~b2*Aca Aca~a2*Agg Dep~b1*PR PR~a1*Agg Dep~c*Agg ind1:=a1*b1 ind2:=a2*b2 total=a1*b1+a2*b2+c ' ``` ] .pull-right[ <img src="week2_pathmediation_files/figure-html/unnamed-chunk-15-1.png" width="17777" /> ] --- ## Other path analysis models - Path mediation models are a common application of path models - But they are just one example - Anything that can be expressed in terms of regressions between observed variables can be tested as a path model - Can include ordinal or binary variables - Can include moderation - Other common path analysis models include: - Auto-regressive models for longitudinal data - Cross-lagged panel models for longitudinal data --- # Making model modifications - You **may** want to make some modifications to your initially hypothesized model - non-significant paths that you want to trim - include some additional paths not initially included - Remember that this now moves us into exploratory territory where: - Model modifications should be substantively as well as statistically justifiable - You must be aware of the possibility that you are capitalizing on chance - You should aim to replicate the modifications in independent data --- # Cautions regarding path analysis models - **Assumption** that the paths represent causal effects is only an assumption - Especially if using cross-sectional data - Mediation models should ideally be estimated on longitudinal data. - X time 1 - M time 2 - Y time 3 - The parameters are only accurate if the model is correctly specified --- # Cautions: Indistinguishable models <img src="week2_pathmediation_files/figure-html/unnamed-chunk-16-1.png" width="17777" /> --- # Measurement error in path analysis - Path analysis models use observed variables - Assumes no measurement error in these variables - Path coefficients likely to be attenuated due to unmodeled measurement error - Structural equation models solve this issue - They are path analysis models where the paths are between latent rather than observed variables - ...very brief comment on this in the final week --- # Path analysis summary - Path analysis can be used to fit sets of regression models - Common path analysis model is the path mediation model - But very flexible huge range of models that can be tested - In R, path analysis can be done using the `sem()` function in `lavaan` - Need to be aware that we aren't *testing* causality but assuming it --- class: extra, inverse, center, middle, animated, rotateInDownLeft # Bonus `laavan` --- # 2+ category nominal predictors + To include these variables, we need to create the 0/1 dummy coded variables: ```r tib1 <- tibble( city = c("Bristol","Bristol","Birmingham","Birmingham", "London", "London"), cityd1 = ifelse(city =="Bristol", 1, 0), cityd2 = ifelse(city == "Birmingham", 1, 0) ) tib1 ``` ``` ## # A tibble: 6 × 3 ## city cityd1 cityd2 ## <chr> <dbl> <dbl> ## 1 Bristol 1 0 ## 2 Bristol 1 0 ## 3 Birmingham 0 1 ## 4 Birmingham 0 1 ## 5 London 0 0 ## 6 London 0 0 ``` +`cityd1` and `cityd2` can then be added to a `lavaan` model syntax. --- # Interactions + Here we need to create the product variable in our data set: ```r tib2 <- tibble( pred1 = rnorm(100, 15, 2), pred2 = rnorm(100, 10, 4), pred1z = scale(pred1)[,1], pred2z = scale(pred2)[,1], prod12 = pred1z*pred2z ) slice(tib2, 1:3) ``` ``` ## # A tibble: 3 × 5 ## pred1 pred2 pred1z pred2z prod12 ## <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 12.7 11.5 -1.01 0.288 -0.291 ## 2 14.0 13.0 -0.327 0.684 -0.223 ## 3 13.1 13.2 -0.769 0.727 -0.560 ``` --- class: extra, inverse, center, middle, animated, rotateInDownLeft # End