class: center, middle, inverse, title-slide

.title[
# Centering Predictors in MLM
]
.subtitle[
## Data Analysis for Psychology in R 3
]
.author[
### Josiah King
]
.institute[
### Department of Psychology
The University of Edinburgh
]

---
# Centering

.pull-left[
Suppose we have a variable for which the mean is 100.

![](dapr3_2324_03b_centering_files/figure-html/unnamed-chunk-1-1.svg)<!-- -->
]

--

.pull-right[
We can re-center this so that the mean becomes zero:

![](dapr3_2324_03b_centering_files/figure-html/unnamed-chunk-2-1.svg)<!-- -->
]

---
count:false
# Centering

.pull-left[
Suppose we have a variable for which the mean is 100.

![](dapr3_2324_03b_centering_files/figure-html/unnamed-chunk-3-1.svg)<!-- -->
]
.pull-right[
We can re-center this so that _any_ value becomes zero:

![](dapr3_2324_03b_centering_files/figure-html/unnamed-chunk-4-1.svg)<!-- -->
]

---
# Scaling

.pull-left[
Suppose we have a variable for which the mean is 100.
The standard deviation is 15.

![](dapr3_2324_03b_centering_files/figure-html/unnamed-chunk-5-1.svg)<!-- -->
]

--

.pull-right[
We can scale this so that a change of 1 corresponds to a change of 1 standard deviation:

![](dapr3_2324_03b_centering_files/figure-html/unnamed-chunk-6-1.svg)<!-- -->
]

---
# Centering predictors in LM

.pull-left[
```r
m1 <- lm(y~x,data=df)                            # raw predictor
m2 <- lm(y~scale(x, center=T,scale=F),data=df)   # mean-centered
m3 <- lm(y~scale(x, center=T,scale=T),data=df)   # standardised
m4 <- lm(y~I(x-5), data=df)                      # centered on x = 5
```
]

---
count: false
# Centering predictors in LM

.pull-left[
```r
m1 <- lm(y~x,data=df)                            # raw predictor
m2 <- lm(y~scale(x, center=T,scale=F),data=df)   # mean-centered
m3 <- lm(y~scale(x, center=T,scale=T),data=df)   # standardised
m4 <- lm(y~I(x-5), data=df)                      # centered on x = 5
```
```r
anova(m1,m2,m3,m4)
```
```
## Analysis of Variance Table
##
## Model 1: y ~ x
## Model 2: y ~ scale(x, center = T, scale = F)
## Model 3: y ~ scale(x, center = T, scale = T)
## Model 4: y ~ I(x - 5)
##   Res.Df RSS Df Sum of Sq F Pr(>F)
## 1    198 177
## 2    198 177  0 -2.84e-14
## 3    198 177  0  0.00e+00
## 4    198 177  0  0.00e+00
```
]

--

.pull-right[
<img src="dapr3_2324_03b_centering_files/figure-html/unnamed-chunk-11-1.svg" style="display: block; margin: auto;" />
]

---
# Big Fish Little Fish

<img src="dapr3_2324_03b_centering_files/figure-html/unnamed-chunk-13-1.svg" style="display: block; margin: auto;" />

Data available at https://uoepsy.github.io/data/bflp.csv

---
# Things are different with multi-level data

<img src="dapr3_2324_03b_centering_files/figure-html/unnamed-chunk-14-1.svg" style="display: block; margin: auto;" />

---
# Multiple means

.pull-left[
__Grand mean__

![](dapr3_2324_03b_centering_files/figure-html/unnamed-chunk-15-1.svg)<!-- -->
]

--

.pull-right[
__Group means__

![](dapr3_2324_03b_centering_files/figure-html/unnamed-chunk-16-1.svg)<!-- -->
]

---
# Group-mean centering

.pull-left[
<center>__ `\(x_{ij} - \bar{x}_i\)` __</center><br>

![](dapr3_2324_03b_centering_files/figure-html/unnamed-chunk-17-1.svg)<!-- -->
]

---
# Group-mean centering

<br>
<img src="jk_img_sandbox/center.gif" style="display: block; margin: auto;" />

---
# Group-mean centering

.pull-left[
<center>__ `\(x_{ij} - \bar{x}_i\)` __</center><br>

![](dapr3_2324_03b_centering_files/figure-html/unnamed-chunk-21-1.svg)<!-- -->
]
.pull-right[
<center>__ `\(\bar{x}_i\)` __</center><br>

![](dapr3_2324_03b_centering_files/figure-html/unnamed-chunk-22-1.svg)<!-- -->
]

---
# Disaggregating within & between

.pull-left[
**RE model**
$$
`\begin{align}
y_{ij} &= \beta_{0i} + \beta_{1}(x_{ij}) + \varepsilon_{ij} \\
\beta_{0i} &= \gamma_{00} + \zeta_{0i} \\
... \\
\end{align}`
$$

```r
library(lme4)
rem <- lmer(self_esteem ~ fish_weight + (1 | pond), data=bflp)
```
]

--

.pull-right[
**Within-between model**
$$
`\begin{align}
y_{ij} &= \beta_{0i} + \beta_{1}(\bar{x}_i) + \beta_2(x_{ij} - \bar{x}_i)+ \varepsilon_{ij} \\
\beta_{0i} &= \gamma_{00} + \zeta_{0i} \\
... \\
\end{align}`
$$

```r
library(dplyr)
bflp <- bflp %>%
  group_by(pond) %>%
  mutate(
    fw_pondm = mean(fish_weight),                # pond mean (between)
    fw_pondc = fish_weight - mean(fish_weight)   # deviation from pond mean (within)
  ) %>%
  ungroup()
wbm <- lmer(self_esteem ~ fw_pondm + fw_pondc + (1 | pond), data=bflp)
fixef(wbm)
```
```
## (Intercept)    fw_pondm    fw_pondc
##     4.76802    -0.05586     0.04067
```
]

---
# Disaggregating within & between

.pull-left[
<img src="dapr3_2324_03b_centering_files/figure-html/unnamed-chunk-25-1.svg" style="display: block; margin: auto;" />
]
.pull-right[
**Within-between model**
$$
`\begin{align}
y_{ij} &= \beta_{0i} + \beta_{1}(\bar{x}_i) + \beta_2(x_{ij} - \bar{x}_i)+ \varepsilon_{ij} \\
\beta_{0i} &= \gamma_{00} + \zeta_{0i} \\
... \\
\end{align}`
$$

```r
library(dplyr)
bflp <- bflp %>%
  group_by(pond) %>%
  mutate(
    fw_pondm = mean(fish_weight),                # pond mean (between)
    fw_pondc = fish_weight - mean(fish_weight)   # deviation from pond mean (within)
  ) %>%
  ungroup()
wbm <- lmer(self_esteem ~ fw_pondm + fw_pondc + (1 | pond), data=bflp)
fixef(wbm)
```
```
## (Intercept)    fw_pondm    fw_pondc
##     4.76802    -0.05586     0.04067
```
]

---
# A more realistic example

.pull-left[
A research study investigates how anxiety is associated with drinking habits. Data were collected from 50 participants. Researchers administered the generalised anxiety disorder (GAD-7) questionnaire to measure levels of anxiety over the past week, and collected information on the units of alcohol participants had consumed within the week. Each participant was observed on 10 different occasions.
]
.pull-right[
![](dapr3_2324_03b_centering_files/figure-html/unnamed-chunk-28-1.svg)<!-- -->

Data available at https://uoepsy.github.io/data/alcgad.csv
]

---
# A more realistic example

.pull-left[
Is being more nervous (than you usually are) associated with higher consumption of alcohol?
]
.pull-right[
![](dapr3_2324_03b_centering_files/figure-html/unnamed-chunk-29-1.svg)<!-- -->
]

---
# A more realistic example

.pull-left[
Is being generally more nervous (relative to others) associated with higher consumption of alcohol?
]
.pull-right[
![](dapr3_2324_03b_centering_files/figure-html/unnamed-chunk-30-1.svg)<!-- -->
]

---
# Modelling within & between effects

.pull-left[
```r
alcgad <- alcgad %>%
  group_by(ppt) %>%
  mutate(
    gadm=mean(gad),   # participant mean (between)
    gadmc=gad-gadm    # deviation from own mean (within)
  )
alcmod <- lmer(alcunits ~ gadm + gadmc + (1 + gadmc | ppt),
               data=alcgad,
               control=lmerControl(optimizer = "bobyqa"))
```
]
.pull-right[
```r
summary(alcmod)
```
```
## Linear mixed model fit by REML ['lmerMod']
## Formula: alcunits ~ gadm + gadmc + (1 + gadmc | ppt)
##    Data: alcgad
## Control: lmerControl(optimizer = "bobyqa")
##
## REML criterion at convergence: 1424
##
## Scaled residuals:
##     Min      1Q  Median      3Q     Max
## -2.8466 -0.6264  0.0642  0.6292  3.0281
##
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  ppt      (Intercept) 3.7803   1.944
##           gadmc       0.0935   0.306    -0.30
##  Residual             1.7234   1.313
## Number of obs: 375, groups:  ppt, 50
##
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  14.5802     0.8641   16.87
## gadm         -0.7584     0.1031   -7.35
## gadmc         0.6378     0.0955    6.68
##
## Correlation of Fixed Effects:
##       (Intr) gadm
## gadm  -0.945
## gadmc -0.055  0.012
```
]

---
# Modelling within & between interactions

.pull-left[
```r
alcmod <- lmer(alcunits ~ (gadm + gadmc)*interv + (1 | ppt),
               data=alcgad,
               control=lmerControl(optimizer = "bobyqa"))
```
]
.pull-right[
```r
summary(alcmod)
```
```
## Linear mixed model fit by REML ['lmerMod']
## Formula: alcunits ~ (gadm + gadmc) * interv + (1 | ppt)
##    Data: alcgad
## Control: lmerControl(optimizer = "bobyqa")
##
## REML criterion at convergence: 1404
##
## Scaled residuals:
##     Min      1Q  Median      3Q     Max
## -2.8183 -0.6354  0.0142  0.5928  3.0874
##
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  ppt      (Intercept) 3.59     1.9
##  Residual             1.69     1.3
## Number of obs: 375, groups:  ppt, 50
##
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept)    14.858      1.275   11.65
## gadm           -0.876      0.154   -5.70
## gadmc           1.092      0.128    8.56
## interv         -0.549      1.711   -0.32
## gadm:interv     0.205      0.205    1.00
## gadmc:interv   -0.757      0.166   -4.57
##
## Correlation of Fixed Effects:
##             (Intr) gadm   gadmc  interv gdm:nt
## gadm        -0.939
## gadmc        0.000  0.000
## interv      -0.746  0.700  0.000
## gadm:interv  0.705 -0.750  0.000 -0.944
## gadmc:intrv  0.000  0.000 -0.770  0.000  0.000
```
]

---
# Total effect

.pull-left[
```r
alcmod2 <- lmer(alcunits ~ gad + (1 | ppt),
                data=alcgad,
                control=lmerControl(optimizer = "bobyqa"))
```
]
.pull-right[
```r
summary(alcmod2)
```
```
## Linear mixed model fit by REML ['lmerMod']
## Formula: alcunits ~ gad + (1 | ppt)
##    Data: alcgad
## Control: lmerControl(optimizer = "bobyqa")
##
## REML criterion at convergence: 1494
##
## Scaled residuals:
##     Min      1Q  Median      3Q     Max
## -2.9940 -0.6414  0.0258  0.5808  2.9825
##
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  ppt      (Intercept) 14.32    3.78
##  Residual              1.83    1.35
## Number of obs: 375, groups:  ppt, 50
##
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   5.1787     0.8198    6.32
## gad           0.4281     0.0779    5.50
##
## Correlation of Fixed Effects:
##     (Intr)
## gad -0.752
```
]

---
# Within & Between effects

.pull-left[
![](dapr3_2324_03b_centering_files/figure-html/unnamed-chunk-38-1.svg)<!-- -->
]
.pull-right[
![](dapr3_2324_03b_centering_files/figure-html/unnamed-chunk-39-1.svg)<!-- -->
]

---
count:false
# Within & Between effects

.pull-left[
![](dapr3_2324_03b_centering_files/figure-html/unnamed-chunk-40-1.svg)<!-- -->
]

--

.pull-right[
![](dapr3_2324_03b_centering_files/figure-html/unnamed-chunk-41-1.svg)<!-- -->
]

---
# When do we need to think about it?

We need to think about it when all of the following hold:

- We have a predictor `\(x\)` that varies _within_ a cluster, **and**
- clusters have different average levels of `\(x\)`. This typically only happens when `\(x\)` is *observed* (rather than manipulated as part of the study), **and**
- our question concerns `\(x\)`. If `\(x\)` is just a covariate, there is no need.
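
The first two conditions on this slide can be checked directly from the data. The sketch below is a minimal illustration in base R, assuming a small made-up data frame `toy` with columns `cluster` and `x` (hypothetical names and values, not one of the course datasets). It computes each cluster's mean and standard deviation of `x`, then asks whether `x` varies within clusters and whether the cluster means differ.

```r
# Hypothetical toy data (not from the lecture): 3 clusters, 4 observations each
toy <- data.frame(
  cluster = rep(c("a", "b", "c"), each = 4),
  x       = c(1, 2, 3, 4,  5, 6, 7, 8,  9, 10, 11, 12)
)

# Per-cluster mean and standard deviation of x
x_mean <- tapply(toy$x, toy$cluster, mean)
x_sd   <- tapply(toy$x, toy$cluster, sd)

# Condition 1: does x vary within clusters? (some within-cluster SD above zero)
varies_within <- any(x_sd > 0)

# Condition 2: do clusters differ in their average level of x?
means_differ <- diff(range(x_mean)) > 0

varies_within
means_differ
```

If both checks come back `TRUE` and the research question is about `x`, separating the within and between parts is probably worth considering. With the alcohol data one might pass `alcgad$gad` and `alcgad$ppt` in place of the toy columns.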
---
# Summary

- Applying the same linear transformation to a predictor (e.g. grand-mean centering, or standardising) makes __no difference__ to our model fit
  - but it may change the meaning and/or interpretation of our parameters (e.g. the intercept, and any test of it)

- When data are clustered, we can apply group-level transformations, e.g. __group-mean centering.__

- Group-mean centering our predictors allows us to disaggregate __within__ from __between__ effects.
  - allowing us to ask the theoretical questions that we are actually interested in

---
class: inverse, center, middle, animated, rotateInDownLeft

# End