class: center, middle, inverse, title-slide #
Week 5: Individual Differences, part 2
## Multivariate Statistics and Methodology using R (MSMR)
### Dan Mirman ### Department of Psychology
The University of Edinburgh ### AY 2020-2021 --- # "Internal" individual differences * No "external" measure that can be entered as a fixed effect * Individual differences needed for a different analysis (e.g., LSM) .pull-left[ ### A simple example <img src="./figs/IndivDiffsDemo.png" width="75%" /> ] .pull-right[ **Random effects provide a way to quantify individual effect sizes in the context of a model of overall group performance**: Participant A: `\(\zeta_{A1} - \zeta_{A0} = 1 - (-1) = 2\)` Participant B: `\(\zeta_{B1} - \zeta_{B0} = (-1) - 1 = -2\)` ] --- # Example Data (Made-up): Effect of school mental health services on educational achievement (`EducMH`) ```r load("./data/EducMH.RData") summary(EducMH) ``` ``` ## ID Condition Year SDQ Math ## 101 : 6 Control :540 Min. :2009 Min. : 6.6 Min. : 81.7 ## 102 : 6 Treatment:540 1st Qu.:2010 1st Qu.:16.2 1st Qu.:147.5 ## 103 : 6 Median :2012 Median :17.9 Median :165.1 ## 104 : 6 Mean :2012 Mean :17.9 Mean :164.8 ## 105 : 6 3rd Qu.:2013 3rd Qu.:19.6 3rd Qu.:181.2 ## 106 : 6 Max. :2014 Max. :33.8 Max. :259.3 ## (Other):1044 NA's :540 ``` * `Condition` = Treatment (students who received mental health services) vs. Control (academically matched group of students who did not receive services) * `SDQ` = Strengths and Difficulties Questionnaire: a brief behavioural screening for mental health, only available for Treatment group. Lower scores are better (Total difficulties). * `Math` = Score on standardised math test --- ```r ggplot(EducMH, aes(Year, Math, color=Condition, fill=Condition)) + stat_summary(fun=mean, geom="line") + stat_summary(fun.data=mean_se, geom="ribbon", color=NA, alpha=0.3) + labs(y="Math Achievement Score") + theme_bw(base_size=12) + scale_color_manual(values=c("red", "blue")) + scale_fill_manual(values=c("red", "blue")) ``` ![](msmr_lec05-2_IndivDiffs_Int_files/figure-html/unnamed-chunk-2-1.png)<!-- --> **Question 1**: Did the school mental health services improve academic achievement? That is, did the two groups differ on math achievement at baseline and over the 6 years of the study? **Question 2**: For the treatment group, was individual-level improvement in mental health associated with improvement in math scores? --- # Question 1 **Did the school mental health services improve academic achievement? That is, did the two groups differ on math achievement at baseline and over the 6 years of the study?** ```r # adjust time variable to have a sensible intercept EducMH$Time <- EducMH$Year - 2009 # fit the models m.base <- lmer(Math ~ Time + (Time | ID), data=EducMH, REML=F) m.0 <- lmer(Math ~ Time + Condition + (Time | ID), data=EducMH, REML=F) m.1 <- lmer(Math ~ Time*Condition + (Time | ID), data=EducMH, REML=F) ``` --- # Question 1 **Did the school mental health services improve academic achievement? That is, did the two groups differ on math achievement at baseline and over the 6 years of the study?** Compare the models ```r anova(m.base, m.0, m.1) ``` ``` ## Data: EducMH ## Models: ## m.base: Math ~ Time + (Time | ID) ## m.0: Math ~ Time + Condition + (Time | ID) ## m.1: Math ~ Time * Condition + (Time | ID) ## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq) ## m.base 6 7847 7876 -3917 7835 ## m.0 7 7848 7883 -3917 7834 0.15 1 0.6947 ## m.1 8 7843 7883 -3914 7827 7.02 1 0.0081 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` There was no group difference at baseline, but there was a group difference on slope. That is, math achievement in the two groups started out the same, but increased more quickly in the Treatment group. --- ```r gt(tidy(m.1)) ```
effect
group
term
estimate
std.error
statistic
df
p.value
fixed
NA
(Intercept)
149.29737
2.0295
73.5636
180
3.006e-136
fixed
NA
Time
5.21577
0.3737
13.9560
180
1.776e-30
fixed
NA
ConditionTreatment
1.33710
2.8701
0.4659
180
6.419e-01
fixed
NA
Time:ConditionTreatment
1.41412
0.5285
2.6755
180
8.149e-03
ran_pars
ID
sd__(Intercept)
18.86595
NA
NA
NA
NA
ran_pars
ID
cor__(Intercept).Time
0.09138
NA
NA
NA
NA
ran_pars
ID
sd__Time
3.31043
NA
NA
NA
NA
ran_pars
Residual
sd__Observation
5.31090
NA
NA
NA
NA
--- # Question 2 **For the treatment group, was individual-level improvement in mental health associated with improvement in math scores?** First make a plot of what we're interested in: the treatment group's change in the SDQ over time showing both group mean (black line with error bars) and individual variability (grey lines) ![](msmr_lec05-2_IndivDiffs_Int_files/figure-html/unnamed-chunk-6-1.png)<!-- --> Within the treatment group, there is not an overall change in mental health (SDQ), but it looks like there is lots of variability in response to the mental health services. Some people responded really well (big decreases in difficulties on SDQ), some people didn't respond well (increased difficulties according to SDQ). -- We want to know whether this variability is associated with variability in improved math achievement. --- # Analysis strategy 1. Build separate models for change in SDQ and change in Math scores over time 2. Use random effects to quantify individual differences in change over time for the two scores 3. Test the correlation between change in SDQ and in Math achievement (and make a scatterplot showing this). --- # Analysis 1. Build separate models for change in SDQ and change in Math scores over time ```r m.math <- lmer(Math ~ Time + (Time | ID), data=subset(EducMH, Condition == "Treatment"), REML=F) m.sdq <- lmer(SDQ ~ Time + (Time | ID), data=subset(EducMH, Condition == "Treatment"), REML=F) ``` --- # Analysis 1. Build separate models for change in SDQ and change in Math scores over time 2. Use random effects to quantify individual differences in change over time for the two scores ```r source("get_ranef.R") # get_ranef() will extract the named random effect and clean them up a bit re.math <- get_ranef(m.math, "ID") re.sdq <- get_ranef(m.sdq, "ID") # merge() will combine those into one data frame, but needs some help because the variable names are all the same re <- merge(re.math, re.sdq, by="ID", suffixes = c(".math", ".sdq")) summary(re) ``` ``` ## ID Intercept.math Time.math Intercept.sdq ## Length:90 Min. :-59.50 Min. :-3.372 Min. :-2.0326 ## Class :character 1st Qu.: -9.85 1st Qu.:-0.733 1st Qu.:-0.7895 ## Mode :character Median : -0.01 Median : 0.118 Median : 0.0341 ## Mean : 0.00 Mean : 0.000 Mean : 0.0000 ## 3rd Qu.: 11.48 3rd Qu.: 0.694 3rd Qu.: 0.6024 ## Max. : 57.64 Max. : 2.747 Max. : 3.0828 ## Time.sdq ## Min. :-2.4951 ## 1st Qu.:-0.7468 ## Median : 0.0989 ## Mean : 0.0000 ## 3rd Qu.: 0.5482 ## Max. : 3.0944 ``` ```r head(re) ``` ``` ## ID Intercept.math Time.math Intercept.sdq Time.sdq ## 1 201 0.8412 -1.542050 -0.9793 0.8381 ## 2 202 9.6071 0.029529 0.5728 0.3952 ## 3 203 -14.6657 -0.573849 0.5872 1.1176 ## 4 204 -3.5781 -0.290299 -0.8056 0.4423 ## 5 205 2.1411 -0.852670 0.4991 0.8463 ## 6 206 31.4592 0.006922 0.2029 0.6119 ``` --- # Analysis 1. Build separate models for change in SDQ and change in Math scores over time 2. Use random effects to quantify individual differences in change over time for the two scores 3. Test the correlation between change in SDQ and in Math achievement (and make a scatterplot showing this). ```r cor.test(re$Time.math, re$Time.sdq) ``` ``` ## ## Pearson's product-moment correlation ## ## data: re$Time.math and re$Time.sdq ## t = -11, df = 88, p-value <2e-16 ## alternative hypothesis: true correlation is not equal to 0 ## 95 percent confidence interval: ## -0.8455 -0.6749 ## sample estimates: ## cor ## -0.7739 ``` Strong correlation ( `\(r = -0.77, p < 0.0001\)` ) indicating that response to mental health intervention (decreased difficulties) was associated with larger increases in math achievement. Note that the key quantities here are **slopes**. That is, the **rate** of decreased mental health difficulties is associated with a higher **rate** of math achievement. --- # Analysis 1. Build separate models for change in SDQ and change in Math scores over time 2. Use random effects to quantify individual differences in change over time for the two scores 3. Test the correlation between change in SDQ and in Math achievement (and **make a scatterplot showing this**). ```r ggplot(re, aes(Time.math, Time.sdq)) + geom_point() + stat_smooth(method="lm") + labs(x="Relative Rate of Increase in\nMath Score", y="Relative Rate of Decrease in\nSDQ Total Difficulties Score") + theme_bw(base_size=12) ``` ![](msmr_lec05-2_IndivDiffs_Int_files/figure-html/unnamed-chunk-10-1.png)<!-- --> --- # Why use random effects instead of individual models? .pull-left[ <img src="./figs/shrinkage-plot-1.png" width="75%"/> http://tjmahr.github.io/plotting-partial-pooling-in-mixed-effects-models/ ] .pull-right[ An individual's performance (on the math test, on the SDQ) is their actual level plus some noise. Individual models (no pooling) don't make that distinction, so you have a noisy estimate of individual differences. Multilevel models reduce the noise component using the mean and variance of the rest of the group (partial pooling). **This produces a better estimate of true individual differences.** See also: Efron, B. & Morris, C. (1977). Stein's Paradox in Statistics. Scientific American, 236:5, 119-127. ] --- # Key points * Individual differences provide additional insights into phenomena of interest. Can serve as further tests of a hypothesis * Random effects provide a useful way to quantify individual differences in the context of a group-level model * Partial pooling / shrinkage improves individual difference estimates