class: center, middle, inverse, title-slide

# WEEK 2<br>Path Mediation
## Data Analysis for Psychology in R 3
### dapR3 Team
### Department of Psychology
### The University of Edinburgh

---
# Learning Objectives

1. Understand the purpose of mediation models and their conceptual challenges
2. Be able to describe direct, indirect and total effects in a mediation model
3. Estimate and interpret a mediation model using `lavaan`

---
class: inverse, center, middle

<h2>Part 1: Introduction to mediation</h2>
<h2 style="text-align: left;opacity:0.3;">Part 2: Direct, indirect and total effects</h2>
<h2 style="text-align: left;opacity:0.3;">Part 3: Estimating mediation in `lavaan`</h2>
<h2 style="text-align: left;opacity:0.3;">Part 4: Reporting</h2>

---
# Mediation

- Mediation occurs when a predictor X has an effect on an outcome Y via a mediating variable M
- The mediator **transmits** the effect of X to Y
- Examples of mediation hypotheses:
    - Conscientiousness (X) affects health (Y) via health behaviours (M)
    - Conduct problems (X) increase the risk of depression (Y) via peer problems (M)
    - Attitudes to smoking (X) predict intentions to smoke (M), which in turn predict smoking behaviour (Y)
    - An intervention (X) to reduce youth crime (Y) works by increasing youth self-control (M)

---
# Visualising a mediation model

.pull-left[
- In a SEM diagram we can represent mediation as:
]

.pull-right[
<img src="Mediation diagram basic.png" width="17777" />
]

---
# Mediation...not to be confused with moderation

- Mediation is commonly confused with **moderation**
- Moderation is when a moderator Z modifies the effect of X on Y
    - e.g., the effect of X on Y is stronger at higher levels of Z
    - Also known as an **interaction** between X and Z
- Examples of moderation could be:
    - An intervention (X) works better to reduce bullying (Y) among older school pupils (age, Z)
    - The relation between stress (X) and depression (Y) is weaker for those scoring higher on spirituality (Z)

---
class: inverse, center, middle, animated, rotateInDownLeft

# End of Part 1

---
class: inverse, center, middle

<h2 style="text-align: left;opacity:0.3;">Part 1: Introduction to mediation</h2>
<h2>Part 2: Direct, indirect and total effects</h2>
<h2 style="text-align: left;opacity:0.3;">Part 3: Estimating mediation in `lavaan`</h2>
<h2 style="text-align: left;opacity:0.3;">Part 4: Reporting</h2>

---
# Direct and indirect effects in mediation

- We seldom hypothesise that a mediator completely explains the relation between X and Y
- More commonly, we expect both **indirect effects** and **direct effects** of X on Y
    - The indirect effects of X on Y are those transmitted via the mediator
    - The direct effect of X on Y is the remaining effect of X on Y, not transmitted via the mediator

---
# Visualising direct and indirect effects in mediation

<img src="Mediation diagram basic indirect.png" width="1707" />

---
# Testing mediation

.pull-left[
- Traditionally, mediation was tested using a series of separate linear models:
    1. Y~X
    2. Y~X+M
    3. M~X
- You may see this referred to as the Baron and Kenny approach
]

.pull-right[
<img src="trad med.png" width="1707" />
]

---
# Traditional methods for mediation

.pull-left[
- The three regression models:
    1. Y~X
    2. Y~X+M
    3. M~X
]

.pull-right[
- Model 1 estimates the overall effect of X on Y
- Model 2 estimates the partial effects of X and M on Y
- Model 3 estimates the effect of X on M
- If the following conditions were met, mediation was assumed to hold:
    - The effect of X on Y (model 1) is significant
    - The effect of X on M (model 3) is significant
    - The effect of X on Y is reduced when M is added to the model (model 2)
]

---
# Limitations of traditional methods for mediation

- Low power
- Very cumbersome for multiple mediators, predictors, or outcomes
- No estimate of the magnitude of the indirect effect itself
- A better approach: the **path mediation model**

---
# BREAK QUIZ

- Quiz question:
    - Which of these hypotheses is a mediation hypothesis?
        - 1) Vocabulary development in childhood follows a non-linear trajectory
        - 2) The effects of conscientiousness on academic achievement are stronger at low levels of cognitive ability
        - 3) Poverty affects child behaviour problems through increasing parental stress
        - 4) Earlier pubertal onset increases the risk of antisocial behaviour only in girls and not boys

---
class: inverse, center, middle, animated, rotateInDownLeft

# End of Part 2

---
class: inverse, center, middle

<h2 style="text-align: left;opacity:0.3;">Part 1: Introduction to mediation</h2>
<h2 style="text-align: left;opacity:0.3;">Part 2: Direct, indirect and total effects</h2>
<h2>Part 3: Estimating mediation in `lavaan`</h2>
<h2 style="text-align: left;opacity:0.3;">Part 4: Reporting</h2>

---
# WELCOME BACK

- Welcome back!
- The answer to the quiz question is...
    - Which of these hypotheses is a mediation hypothesis?
        - 1) Vocabulary development in childhood follows a non-linear trajectory
        - 2) The effects of conscientiousness on academic achievement are stronger at low levels of cognitive ability
        - 3) **Poverty affects child behaviour problems through increasing parental stress**
        - 4) Earlier pubertal onset increases the risk of antisocial behaviour only in girls and not boys
- Options 2 and 4 are moderation hypotheses; option 1 describes the shape of change over time

---
# Testing a path mediation model in lavaan

- Specification
    - Create a lavaan syntax object
- Estimation
    - Estimate the model using, e.g., maximum likelihood estimation
- Evaluation/interpretation
    - Inspect model fit to judge how well the model reproduces the observed data
    - Interpret the parameter estimates

---
# Example

.pull-left[
- Does peer rejection mediate the association between aggression and depression?
]

.pull-right[
<img src="Mediation diagram example.png" width="17777" />
]

---
# The data

```r
library(dplyr) # provides slice()
slice(agg.data2, 1:10)
```

```
##         Dep      PR      Agg
## 1  -0.60953  0.1402 -0.79755
## 2  -0.17544  1.3130  1.94009
## 3  -0.91570 -1.1912  0.28842
## 4  -0.58408  2.0781  1.18015
## 5   1.04598 -1.2614 -0.27574
## 6  -0.82088 -1.1755 -1.04011
## 7   0.53421 -1.6130 -0.08443
## 8  -0.70440  0.9898 -0.73269
## 9  -0.19926 -0.8087 -0.06078
## 10  0.07733 -0.8847 -1.13479
```

---
# Mediation Example

- Does peer rejection mediate the association between aggression and depression?

```r
model1<-'Dep ~ PR   # Depression predicted by peer rejection
         Dep ~ Agg  # Depression predicted by aggression (the direct effect)
         PR ~ Agg   # Peer rejection predicted by aggression
'
```

- Estimate the model

```r
library(lavaan) # provides sem()
model1.est<-sem(model1, data=agg.data2)
```

---
# The model output

.scroll-output[
```r
summary(model1.est, fit.measures=TRUE)
```

```
## lavaan 0.6-9 ended normally after 12 iterations
##
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         5
##
##   Number of observations                           500
##
## Model Test User Model:
##
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
##
## Model Test Baseline Model:
##
##   Test statistic                               210.280
##   Degrees of freedom                                 3
##   P-value                                        0.000
##
## User Model versus Baseline Model:
##
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.000
##
## Loglikelihood and Information Criteria:
##
##   Loglikelihood user model (H0)              -1315.140
##   Loglikelihood unrestricted model (H1)      -1315.140
##
##   Akaike (AIC)                                2640.279
##   Bayesian (BIC)                              2661.352
##   Sample-size adjusted Bayesian (BIC)         2645.482
##
## Root Mean Square Error of Approximation:
##
##   RMSEA                                          0.000
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.000
##   P-value RMSEA <= 0.05                             NA
##
## Standardized Root Mean Square Residual:
##
##   SRMR                                           0.000
##
## Parameter Estimates:
##
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Dep ~
##    PR                 0.280    0.048    5.799    0.000
##    Agg                0.247    0.047    5.290    0.000
##   PR ~
##    Agg                0.430    0.039   11.103    0.000
##
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   .Dep                0.878    0.056   15.811    0.000
##   .PR                 0.752    0.048   15.811    0.000
```
]

---
# Things to note from the model output

- All three regression paths are statistically significant
- The model is **just-identified**
    - The degrees of freedom are equal to 0
    - The model fit cannot be tested
    - The fit statistics (TLI, CFI, RMSEA, SRMR) all indicate perfect fit, but this is meaningless: a just-identified model always reproduces the observed covariances exactly

---
# Visualising the model using `semPaths()`

- We can use `semPaths()` from the `semPlot` package to help us visualise the model
    - Shows the parameter estimates within an SEM diagram

```r
library(semPlot)
semPaths(model1.est, what='est')
```

---
# Visualising the model using `semPaths()`

![](week2_pathmediation_files/figure-html/unnamed-chunk-9-1.png)

---
# Calculating the indirect effects

.pull-left[
- To calculate the indirect effect of X on Y in path mediation, we need to create a new parameter
- The indirect effect of X on Y via M is:
    - `\(a*b\)`
    - `\(a\)` = the regression coefficient for M~X
    - `\(b\)` = the regression coefficient for Y~M (controlling for X)
]

.pull-right[
<img src="Mediation diagram example a b.png" width="17777" />
]

---
# Calculating indirect effects in lavaan

.pull-left[
- To calculate the indirect effect of X on Y in lavaan we:
    - Use parameter labels 'a' and 'b' to label the relevant paths
        - a is for the effect of X on M
        - b is for the effect of M on Y
    - Use the ':=' operator to create a new parameter 'ind'
        - 'ind' represents our indirect effect
]

.pull-right[
```r
model1<-'Dep~b*PR
         Dep~Agg
         PR~a*Agg
         ind:=a*b
'
```
]

---
# Indirect effects in the output

.scroll-output[
```r
model1.est<-sem(model1, data=agg.data2)
summary(model1.est)
```

```
## lavaan 0.6-9 ended normally after 12 iterations
##
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         5
##
##   Number of observations                           500
##
## Model Test User Model:
##
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
##
## Parameter Estimates:
##
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Dep ~
##    PR          (b)    0.280    0.048    5.799    0.000
##    Agg                0.247    0.047    5.290    0.000
##   PR ~
##    Agg         (a)    0.430    0.039   11.103    0.000
##
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   .Dep                0.878    0.056   15.811    0.000
##   .PR                 0.752    0.048   15.811    0.000
##
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    ind                0.121    0.023    5.140    0.000
```
]

---
# Statistical significance of the indirect effects

- The default method of assessing the statistical significance of indirect effects (delta-method standard errors and z-tests) assumes a normal sampling distribution
    - This may not hold for indirect effects, because the product of two regression coefficients often has a skewed sampling distribution
- Instead we can use **bootstrapping**
    - Allows 95% confidence intervals (CIs) to be computed
    - If the 95% CI includes 0, the indirect effect is not significant at alpha = .05

---
# Bootstrapped CIs for indirect effect in lavaan

```r
model1<-'Dep~b*PR
         Dep~Agg
         PR~a*Agg
         ind:=a*b
'
model1.est<-sem(model1, data=agg.data2, se='bootstrap') # we add the argument se='bootstrap'
```

---
# Output for bootstrapped CIs

.scroll-output[
```r
summary(model1.est, ci=TRUE) # we add the argument ci=TRUE to see the confidence intervals in the output
```

```
## lavaan 0.6-9 ended normally after 12 iterations
##
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         5
##
##   Number of observations                           500
##
## Model Test User Model:
##
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
##
## Parameter Estimates:
##
##   Standard errors                            Bootstrap
##   Number of requested bootstrap draws             1000
##   Number of successful bootstrap draws            1000
##
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
##   Dep ~
##    PR          (b)    0.280    0.049    5.677    0.000    0.185    0.378
##    Agg                0.247    0.047    5.277    0.000    0.148    0.337
##   PR ~
##    Agg         (a)    0.430    0.037   11.514    0.000    0.352    0.504
##
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
##   .Dep                0.878    0.058   15.052    0.000    0.760    0.990
##   .PR                 0.752    0.049   15.268    0.000    0.657    0.851
##
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
##    ind                0.121    0.024    5.047    0.000    0.075    0.171
```
]

---
# Total effects in path mediation

- As well as the direct and indirect effects, it is often of interest to know the **total** effect of X on Y

`$$\text{Total} = \text{Indirect} + \text{Direct}$$`

---
# Total effects in path mediation

`$$\text{Total} = a*b + c$$`

<img src="Mediation diagram example a b c.png" width="17777" />

---
# Total effect in lavaan

```r
model1<-'Dep~b*PR
         Dep~c*Agg     # we add the label c for our direct effect
         PR~a*Agg
         ind:=a*b
         total:=a*b+c  # we add a new parameter for the total effect
'
model1.est<-sem(model1, data=agg.data2, se='bootstrap') # we add the argument se='bootstrap'
```

---
# Total effect in lavaan output

.scroll-output[
```r
summary(model1.est, ci=TRUE)
```

```
## lavaan 0.6-9 ended normally after 12 iterations
##
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         5
##
##   Number of observations                           500
##
## Model Test User Model:
##
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
##
## Parameter Estimates:
##
##   Standard errors                            Bootstrap
##   Number of requested bootstrap draws             1000
##   Number of successful bootstrap draws            1000
##
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
##   Dep ~
##    PR          (b)    0.280    0.049    5.756    0.000    0.182    0.376
##    Agg         (c)    0.247    0.045    5.522    0.000    0.159    0.337
##   PR ~
##    Agg         (a)    0.430    0.036   11.846    0.000    0.358    0.500
##
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
##   .Dep                0.878    0.059   14.933    0.000    0.766    0.996
##   .PR                 0.752    0.047   16.044    0.000    0.665    0.846
##
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
##    ind                0.121    0.024    5.109    0.000    0.075    0.169
##    total              0.368    0.042    8.757    0.000    0.292    0.451
```
]

---
# Why code the total effect in lavaan?

- We could simply add up the coefficients for the direct and indirect effects
- By coding it in lavaan, however, we can also assess the statistical significance of the total effect
- This is useful because sometimes the direct and indirect effects are not individually significant but the total effect is
    - May be especially relevant when there are many mediators, each with a small effect

---
# Interpreting the total, direct, and indirect effect coefficients

- The total effect can be interpreted as the **expected change in Y (in Y's units) when X increases by one unit**
- The indirect effect can be interpreted as the **expected change in Y transmitted via M when X increases by one unit**
- The direct effect can be interpreted as the **expected change in Y when X increases by one unit, over and above the change transmitted via M**
- **Note**: the 'direct' effect may not actually be direct; it may be acting via other mediators not included in our model

---
# Standardised parameters

- As with CFA models, standardised parameters can be obtained using:

```r
summary(model1.est, ci=TRUE, standardized=TRUE)
```

---
# Standardised parameters

.scroll-output[
```
## lavaan 0.6-9 ended normally after 12 iterations
##
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         5
##
##   Number of observations                           500
##
## Model Test User Model:
##
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
##
## Parameter Estimates:
##
##   Standard errors                            Bootstrap
##   Number of requested bootstrap draws             1000
##   Number of successful bootstrap draws            1000
##
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
##   Dep ~
##    PR          (b)    0.280    0.049    5.756    0.000    0.182    0.376
##    Agg         (c)    0.247    0.045    5.522    0.000    0.159    0.337
##   PR ~
##    Agg         (a)    0.430    0.036   11.846    0.000    0.358    0.500
##   Std.lv  Std.all
##
##    0.280    0.262
##    0.247    0.239
##
##    0.430    0.445
##
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
##   .Dep                0.878    0.059   14.933    0.000    0.766    0.996
##   .PR                 0.752    0.047   16.044    0.000    0.665    0.846
##   Std.lv  Std.all
##    0.878    0.819
##    0.752    0.802
##
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
##    ind                0.121    0.024    5.109    0.000    0.075    0.169
##    total              0.368    0.042    8.757    0.000    0.292    0.451
##   Std.lv  Std.all
##    0.121    0.117
##    0.368    0.355
```
]

---
# BREAK QUIZ

- Time for a pause
- Quiz question
    - If the effect of X on M is b=.30 and the effect of M on Y is b=.10, what is the indirect effect of X on Y?
        - 1) b=.40
        - 2) b=.03
        - 3) b=.30
        - 4) b=.10

---
class: inverse, center, middle, animated, rotateInDownLeft

# End of Part 3

---
class: inverse, center, middle

<h2 style="text-align: left;opacity:0.3;">Part 1: Introduction to mediation</h2>
<h2 style="text-align: left;opacity:0.3;">Part 2: Direct, indirect and total effects</h2>
<h2 style="text-align: left;opacity:0.3;">Part 3: Estimating mediation in `lavaan`</h2>
<h2>Part 4: Reporting</h2>

---
# Welcome back

- The answer to the quiz question is...
- Quiz question
    - If the effect of X on M is b=.30 and the effect of M on Y is b=.10, what is the indirect effect of X on Y?
        - 1) b=.40
        - 2) **b=.03**
        - 3) b=.30
        - 4) b=.10
- The indirect effect is the product of the two paths: .30 × .10 = .03

---
# Reporting path mediation models

- Methods/Analysis strategy
    - The model being tested
        - e.g., 'Y was regressed on both X and M, and M was regressed on X'
    - The estimator used (e.g., maximum likelihood estimation)
    - The method used to test the significance of indirect effects (e.g., 'bootstrapped 95% confidence intervals')
- Results
    - Model fit (for over-identified models)
    - The parameter estimates for the path mediation model and their statistical significance
        - Can be useful to present these in a SEM diagram
        - The diagrams produced in R are not usually considered 'publication quality', so draw them in PowerPoint or similar

---
# Reporting path mediation models - example of SEM diagram with results

.pull-left[
- Include the key parameter estimates
- Indicate statistically significant paths (e.g., with an '*')
- Include a figure note that explains how statistically significant paths (and at what level) are signified
]

.pull-right[
<img src="med reporting.png" width="1707" />
]

---
# Reporting path mediation models - the indirect effects

- Results
    - The coefficient for the indirect effect and its bootstrapped 95% confidence interval
    - It is common to also report the **proportion mediated**: `$$\frac{\text{indirect}}{\text{total}}$$`
    - However, it is important to be aware of its limitations:
        - A big proportion mediated is possible when the total effect is small, which makes the effect seem more impressive
        - A small proportion mediated can occur when the total effect is big, even if the indirect effect itself is sizeable, which can underplay its importance
        - It should be interpreted in the context of the total effect
        - Interpretation is tricky if there is a mix of negative and positive effects involved

---
# Extensions of path mediation models

- We can extend our path mediation model in various ways:
    - Several mediators in sequence or parallel
    - Multiple outcomes
    - Multiple predictors
    - Multiple groups (e.g., comparing direct and indirect effects across males and females)
    - Adding covariates to adjust for potential confounders

---
# Example: Multiple mediation model

.pull-left[
```r
model2<-'Dep~b2*Aca
         Aca~a2*Agg
         Dep~b1*PR
         PR~a1*Agg
         Dep~c*Agg
         ind1:=a1*b1
         ind2:=a2*b2
         total:=a1*b1+a2*b2+c  # := (not =) defines a new parameter
'
```
]

.pull-right[
<img src="Mediation diagram example multiple mediation.png" width="17777" />
]

---
# Other path analysis models

- Path mediation models are a common application of path models
    - But they are just one example
- Anything that can be expressed in terms of regressions between observed variables can be tested as a path model
    - Can include ordinal or binary variables
    - Can include moderation
- Other common path analysis models include:
    - Autoregressive models for longitudinal data
    - Cross-lagged panel models for longitudinal data

---
# Making model modifications

- You **may** want to make some modifications to your initially hypothesised model
    - Trimming non-significant paths
    - Including some additional paths not initially included
- Remember that this now moves us into exploratory territory, where:
    - Model modifications should be substantively as well as statistically justifiable
    - You must be aware of the possibility that you are capitalising on chance
    - You should aim to replicate the modifications in independent data

---
# Cautions regarding path analysis models

- The **assumption** that the paths represent causal effects is only an assumption
    - Especially if using cross-sectional data
- Mediation models should ideally be estimated on longitudinal data:
    - X at time 1
    - M at time 2
    - Y at time 3
- The parameters are only accurate if the model is correctly specified

---
# Cautions: Indistinguishable models

<img src="med versus confounding.png" width="17777" />

---
# Measurement error in path analysis

- Path analysis models use observed variables
    - They assume no measurement error in these variables
    - Path coefficients are likely to be attenuated due to unmodelled measurement error
- Structural equation models solve this issue
    - They are path analysis models where the paths are between latent rather than observed variables
    - ...very brief comment on this in the final week

---
# Path analysis summary

- Path analysis can be used to fit sets of regression models
- A common path analysis model is the path mediation model
    - But path analysis is very flexible: a huge range of models can be tested
- In R, path analysis can be done using the `sem()` function in `lavaan`
- We need to be aware that we aren't *testing* causality but assuming it

---
class: extra, inverse, center, middle, animated, rotateInDownLeft

# End
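
---
# Appendix: moderation as an interaction term

To contrast moderation with mediation from Part 1: moderation is usually tested with an interaction term in a single regression, not with a chain of regressions. This is a minimal base-R sketch on hypothetical simulated data (the variables and effect sizes are made up, not the lecture data) in which the effect of X on Y grows with Z. `Y ~ X * Z` expands to `X + Z + X:Z`, and the `X:Z` coefficient estimates how much the slope of X changes per unit of Z. The helper `simple_slope()` is our own, for illustration.

```r
# Hypothetical simulated data: the slope of Y on X is 0.2 + 0.3 * Z
set.seed(99)
n <- 500
X <- rnorm(n)
Z <- rnorm(n)
Y <- 0.2 * X + 0.1 * Z + 0.3 * X * Z + rnorm(n)

# Y ~ X * Z expands to Y ~ X + Z + X:Z; X:Z is the interaction (moderation) term
fit <- lm(Y ~ X * Z)
coef(summary(fit))["X:Z", ]

# Hypothetical helper: the slope of X at a chosen value of the moderator Z
simple_slope <- function(z) coef(fit)[["X"]] + coef(fit)[["X:Z"]] * z
simple_slope(c(-1, 0, 1))  # low, average and high Z (Z is roughly standard normal here)
```

A positive `X:Z` coefficient means the simple slope of X increases as Z increases; no variable "transmits" the effect, which is what distinguishes this from mediation.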
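
---
# Appendix: the three traditional regressions in base R

A minimal sketch of the Baron and Kenny approach from Part 2, using hypothetical simulated data (`X`, `M`, `Y` and their effect sizes are made up, not the lecture data). It fits the three regressions with `lm()` and checks that, for ordinary least squares fitted to the same cases, the total effect from model 1 equals the direct effect from model 2 plus `a*b`.

```r
# Hypothetical simulated data: X affects Y directly and via M
set.seed(2023)
n <- 500
X <- rnorm(n)
M <- 0.4 * X + rnorm(n)
Y <- 0.3 * M + 0.2 * X + rnorm(n)
d <- data.frame(X, M, Y)

m1 <- lm(Y ~ X, data = d)      # model 1: overall (total) effect of X on Y
m2 <- lm(Y ~ X + M, data = d)  # model 2: partial effects of X and M on Y
m3 <- lm(M ~ X, data = d)      # model 3: effect of X on M

total  <- coef(m1)[["X"]]  # c: total effect
direct <- coef(m2)[["X"]]  # c': direct effect
b      <- coef(m2)[["M"]]  # b: M -> Y, controlling for X
a      <- coef(m3)[["X"]]  # a: X -> M

# Baron and Kenny style checks: is c significant, is a significant, is c' smaller than c?
p_total <- summary(m1)$coefficients["X", "Pr(>|t|)"]
p_a     <- summary(m3)$coefficients["X", "Pr(>|t|)"]
reduced <- abs(direct) < abs(total)

# For OLS on the same cases the decomposition is exact: total = direct + a*b
all.equal(total, direct + a * b)  # TRUE
```

The same decomposition shows up in the lecture output: the defined `total` (0.368) equals `ind` (0.121) plus the direct effect (0.247). Unlike the three separate models, the path model gives `a*b` and its uncertainty directly.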
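
---
# Appendix: a percentile bootstrap for a*b by hand

A hand-rolled sketch of the idea behind `se='bootstrap'`, not lavaan's actual implementation: resample cases with replacement, re-estimate `a` and `b` each time, and take the 2.5th and 97.5th percentiles of the resampled `a*b` values (a percentile interval). The data are hypothetical and simulated, and the function name `indirect()` is our own, for illustration. lavaan offers several bootstrap interval types through `parameterEstimates()`.

```r
# Hypothetical simulated data (not the lecture data)
set.seed(42)
n <- 500
X <- rnorm(n)
M <- 0.4 * X + rnorm(n)
Y <- 0.3 * M + 0.2 * X + rnorm(n)
d <- data.frame(X, M, Y)

# Hypothetical helper: indirect effect a*b from the two regressions that define it
indirect <- function(data) {
  a <- coef(lm(M ~ X, data = data))[["X"]]
  b <- coef(lm(Y ~ X + M, data = data))[["M"]]
  a * b
}

est <- indirect(d)

# Resample rows with replacement and re-estimate a*b each time
n_boot <- 1000
boot_ind <- replicate(n_boot, indirect(d[sample.int(nrow(d), replace = TRUE), ]))

# Percentile 95% CI: if it excludes 0, the indirect effect is significant at alpha = .05
ci <- quantile(boot_ind, probs = c(0.025, 0.975), names = FALSE)
significant <- ci[1] > 0 || ci[2] < 0
```

Because the resamples are random, calling `set.seed()` first is what makes the interval reproducible; this is also why the bootstrapped CIs for `ind` differ slightly between runs in the lecture output.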
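
---
# Appendix: why the standardised indirect effect is a product of standardised paths

In the standardised output, the Std.all indirect effect (.117) is the product of the standardised `a` and `b` paths (.445 × .262). Standardising rescales `a` by sd(X)/sd(M) and `b` by sd(M)/sd(Y), so the sd(M) terms cancel. This base-R check uses hypothetical simulated data and refits the regressions on `scale()`d variables.

```r
# Hypothetical simulated data (not the lecture data)
set.seed(7)
n <- 500
X <- rnorm(n)
M <- 0.4 * X + rnorm(n)
Y <- 0.3 * M + 0.2 * X + rnorm(n)

# Unstandardised paths
a <- coef(lm(M ~ X))[["X"]]
b <- coef(lm(Y ~ X + M))[["M"]]

# Standardised paths: refit with every variable z-scored
zX <- as.vector(scale(X))
zM <- as.vector(scale(M))
zY <- as.vector(scale(Y))
a_std <- coef(lm(zM ~ zX))[["zX"]]
b_std <- coef(lm(zY ~ zX + zM))[["zM"]]

ind_std <- a_std * b_std
# The same value, obtained by rescaling the unstandardised indirect effect
ind_rescaled <- a * b * sd(X) / sd(Y)
all.equal(ind_std, ind_rescaled)  # TRUE
```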
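
---
# Appendix: how the proportion mediated can mislead

A small helper makes the limitations of the proportion mediated concrete. The first call uses the unstandardised indirect (0.121) and direct (0.247) effects from the lecture output. The second uses made-up values in which a positive indirect effect and a negative direct effect nearly cancel, so the total is tiny and the "proportion" exceeds 1. The function name `prop_mediated()` is our own, for illustration.

```r
# Hypothetical helper: proportion mediated = indirect / total, with total = indirect + direct
prop_mediated <- function(indirect, direct) {
  total <- indirect + direct
  indirect / total
}

# Values from the lecture output: about 0.33, i.e., a third of the total effect
prop_mediated(indirect = 0.121, direct = 0.247)

# Hypothetical values with opposite signs: the total is only 0.01, so the ratio is about 5
prop_mediated(indirect = 0.05, direct = -0.04)
```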