class: center, middle, inverse, title-slide .title[ #
Linear Models and Clustered Data
] .subtitle[ ## Data Analysis for Psychology in R 3 ] .author[ ### Josiah King ] .institute[ ### Department of Psychology
The University of Edinburgh ] --- class: inverse, center, middle <h2>Part 1: Linear Regression Refresh</h2> <h2 style="text-align: left;opacity:0.3;">Part 2: Clustered Data</h2> <h2 style="text-align: left;opacity:0.3;">Part 3: Where we're going</h2> <h2 style="text-align: left;opacity:0.3;">Extra slides (optional): Other Approaches</h2> --- # Models .pull-left[ __deterministic__ given the same input, deterministic functions return *exactly* the same output - `\(y = mx + c\)` - area of sphere = `\(4\pi r^2\)` - height of fall = `\(1/2 g t^2\)` - `\(g = \textrm{gravitational constant, }9.8m/s^2\)` - `\(t = \textrm{time (in seconds) of fall}\)` ] -- .pull-right[ __statistical__ .br2.f4.white.bg-gray[ $$ \textrm{outcome} = (\textrm{model}) + \color{black}{\textrm{error}} $$ ] - handspan = height + randomness - cognitive test score = age + premorbid IQ + ... + randomness ] --- # The Linear Model .br3.pa2.f2[ $$ `\begin{align} \color{red}{\textrm{outcome}} & = \color{blue}{(\textrm{model})} + \textrm{error} \\ \color{red}{y_i} & = \color{blue}{\beta_0 \cdot{} 1 + \beta_1 \cdot{} x_i} + \varepsilon_i \\ \text{where } \\ \varepsilon_i & \sim N(0, \sigma) \text{ independently} \\ \end{align}` $$ ] --- # The Linear Model .flex.items-top[ .w-50.pa2[ Our proposed model of the world: `\(\color{red}{y_i} = \color{blue}{\beta_0 \cdot{} 1 + \beta_1 \cdot{} x_i} + \varepsilon_i\)` {{content}} ] .w-50.pa2[ ![](01_lmcluster_files/figure-html/bb-1.svg)<!-- --> ]] -- Our model _fitted_ to some data (note the `\(\widehat{\textrm{hats}}\)`): `\(\hat{y}_i = \color{blue}{\hat \beta_0 \cdot{} 1 + \hat \beta_1 \cdot{} x_i}\)` {{content}} -- For the `\(i^{th}\)` observation: - `\(\color{red}{y_i}\)` is the value we observe for `\(x_i\)` - `\(\hat{y}_i\)` is the value the model _predicts_ for `\(x_i\)` - `\(\color{red}{y_i} = \hat{y}_i + \hat\varepsilon_i\)` --- # An Example .flex.items-top[ .w-50.pa2[ `\(\color{red}{y_i} = \color{blue}{5 \cdot{} 1 + 2 \cdot{} x_i} + \hat\varepsilon_i\)` __e.g.__ for the observation `\(x_i = 1.2, \; y_i = 9.9\)`: $$ `\begin{align} \color{red}{9.9} & = \color{blue}{5 \cdot{}} 1 + \color{blue}{2 \cdot{}} 1.2 + \hat\varepsilon_i \\ & = 7.4 + \hat\varepsilon_i \\ & = 7.4 + 2.5 \\ \end{align}` $$ ] .w-50.pa2[ ![](01_lmcluster_files/figure-html/errplot-1.svg)<!-- --> ]] --- # Categorical Predictors .pull-left[ <br><table> <thead> <tr> <th style="text-align:left;"> y </th> <th style="text-align:left;"> x </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 7.99 </td> <td style="text-align:left;"> Category1 </td> </tr> <tr> <td style="text-align:left;"> 4.73 </td> <td style="text-align:left;"> Category0 </td> </tr> <tr> <td style="text-align:left;"> 3.66 </td> <td style="text-align:left;"> Category0 </td> </tr> <tr> <td style="text-align:left;"> 3.41 </td> <td style="text-align:left;"> Category0 </td> </tr> <tr> <td style="text-align:left;"> 5.75 </td> <td style="text-align:left;"> Category1 </td> </tr> <tr> <td style="text-align:left;"> 5.66 </td> <td style="text-align:left;"> Category0 </td> </tr> <tr> <td style="text-align:left;"> ... </td> <td style="text-align:left;"> ... 
</td> </tr> </tbody> </table> ] .pull-right[ ![](01_lmcluster_files/figure-html/unnamed-chunk-2-1.svg)<!-- --> ] --- # Multiple Predictors .pull-left[ ![](01_lmcluster_files/figure-html/unnamed-chunk-3-1.svg)<!-- --> ] -- .pull-right[ ![](01_lmcluster_files/figure-html/unnamed-chunk-4-1.svg)<!-- --> ] --- exclude: true # Multiple Predictors .pull-left[ <img src="jk_img_sandbox/SSblank1.png" width="400px" height="350px" style="display: block; margin: auto;" /> ] .pull-right[] --- # Multiple Predictors .pull-left[ <img src="jk_img_sandbox/SSblank1cov.png" width="400px" height="350px" style="display: block; margin: auto;" /> ] .pull-right[ ```r lm(y ~ x1) ``` ``` ## ## Call: ## lm(formula = y ~ x1) ## ## Coefficients: ## (Intercept) x1 ## 0.0469 0.1436 ``` ] --- # Multiple Predictors .pull-left[ <img src="jk_img_sandbox/SStype3.png" width="400px" height="350px" style="display: block; margin: auto;" /> ] .pull-right[ ```r lm(y ~ x1 + x2) ``` ``` ## ## Call: ## lm(formula = y ~ x1 + x2) ## ## Coefficients: ## (Intercept) x1 x2 ## 0.0428 0.1425 0.1273 ``` ```r resx1.x2 = resid(lm(x1 ~ x2)) lm(y ~ resx1.x2) ``` ``` ## ## Call: ## lm(formula = y ~ resx1.x2) ## ## Coefficients: ## (Intercept) resx1.x2 ## 0.0198 0.1425 ``` ] --- # Interactions .pull-left[ ![](01_lmcluster_files/figure-html/unnamed-chunk-12-1.svg)<!-- --> ] -- .pull-right[ ![](01_lmcluster_files/figure-html/unnamed-chunk-13-1.svg)<!-- --> ] --- # Notation `\(\begin{align} \color{red}{y} \;\;\;\; & = \;\;\;\;\; \color{blue}{\beta_0 \cdot{} 1 + \beta_1 \cdot{} x_1 + ... + \beta_k \cdot x_k} & + & \;\;\;\varepsilon \\ \qquad \\ \color{red}{\begin{bmatrix}y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ \vdots \\ y_n \end{bmatrix}} & = \color{blue}{\begin{bmatrix} 1 & x_{11} & x_{21} & \dots & x_{k1} \\ 1 & x_{12} & x_{22} & & x_{k2} \\ 1 & x_{13} & x_{23} & & x_{k3} \\ 1 & x_{14} & x_{24} & & x_{k4} \\ 1 & x_{15} & x_{25} & & x_{k5} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{1n} & x_{2n} & \dots & x_{kn} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix}} & + & \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \vdots \\ \varepsilon_n \end{bmatrix} \\ \qquad \\ \\\color{red}{\boldsymbol y} \;\;\;\;\; & = \qquad \qquad \;\;\; \mathbf{\color{blue}{X \qquad \qquad \qquad \;\;\;\: \boldsymbol \beta}} & + & \;\;\; \boldsymbol \varepsilon \\ \end{align}\)` --- # Link functions `\(\begin{align} \color{red}{y} = \mathbf{\color{blue}{X \boldsymbol{\beta}} + \boldsymbol{\varepsilon}} & \qquad & (-\infty, \infty) \\ \qquad \\ \qquad \\ \color{red}{ln \left( \frac{p}{1-p} \right) } = \mathbf{\color{blue}{X \boldsymbol{\beta}} + \boldsymbol{\varepsilon}} & \qquad & [0,1] \\ \qquad \\ \qquad \\ \color{red}{ln (y) } = \mathbf{\color{blue}{X \boldsymbol{\beta}} + \boldsymbol{\varepsilon}} & \qquad & (0, \infty) \\ \end{align}\)` --- # Linear Models in R - Linear regression ```r linear_model <- lm(continuous_y ~ x1 + x2 + x3*x4, data = df) ``` - Logistic regression ```r logistic_model <- glm(binary_y ~ x1 + x2 + x3*x4, data = df, family=binomial(link="logit")) ``` - Poisson regression ```r poisson_model <- glm(count_y ~ x1 + x2 + x3*x4, data = df, family=poisson(link="log")) ``` --- # Inference for Coefficients <img src="jk_img_sandbox/sum1.png" width="2560" height="500px" /> --- # Inference for Coefficients <img src="jk_img_sandbox/sum2.png" width="2560" height="500px" /> --- # Inference for Coefficients <img src="jk_img_sandbox/sum3.png" width="2560" 
height="500px" /> --- # Inference for Coefficients <img src="jk_img_sandbox/sum4.png" width="2560" height="500px" /> --- # Sums of Squares Rather than specific coefficients, we can also think of our model as a whole. We can talk in terms of the sums of squared residuals `\(\sum^{n}_{i=1}(y_i - \hat y_i)^2\)`. .pull-left[ <img src="jk_img_sandbox/SS3xmodel1.png" width="400px" height="350px" style="display: block; margin: auto;" /> ] .pull-right[ ![](01_lmcluster_files/figure-html/unnamed-chunk-22-1.svg)<!-- --> ] --- # `\(R^2\)` .pull-left[ <img src="jk_img_sandbox/SSr2.png" width="400px" height="350px" style="display: block; margin: auto;" /> ] .pull-right[ `\(R^2 = \frac{SS_{Model}}{SS_{Total}} = 1 - \frac{SS_{Residual}}{SS_{Total}}\)` ```r mdl <- lm(y ~ x1 + x2) summary(mdl)$r.squared ``` ``` ## [1] 0.1935 ``` ] --- # Inference: Joint tests We can test reduction in residual sums of squares: .pull-left[ ```r m1 <- lm(y ~ x1, data = df) ``` <img src="jk_img_sandbox/SS3xmodel1.png" width="400px" height="350px" style="display: block; margin: auto;" /> ] .pull-right[ ```r m2 <- lm(y ~ x1 + x2 + x3, data = df) ``` <img src="jk_img_sandbox/SS3xfullmodel.png" width="400px" height="350px" style="display: block; margin: auto;" /> ] --- # Inference: Joint tests i.e. isolating the improvement in model fit due to inclusion of additional parameters .pull-left[ ```r m1 <- lm(y ~ x1, data = df) m2 <- lm(y ~ x1 + x2 + x3, data = df) anova(m1,m2) ``` ``` ## Analysis of Variance Table ## ## Model 1: y ~ x1 ## Model 2: y ~ x1 + x2 + x3 ## Res.Df RSS Df Sum of Sq F Pr(>F) ## 1 98 1141 ## 2 96 357 2 785 106 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ] .pull-right[ <img src="jk_img_sandbox/SS3xincrement.png" width="400px" height="350px" style="display: block; margin: auto;" /> ] --- # Inference: Joint tests "additional parameters" could be a set of coefficients for levels of a categorical variable. This provides a way of assessing "are there differences in group means?". .pull-left[ ```r m1 = lm(y ~ x1, data = df) m2 = lm(y ~ x1 + grp, data = df) coef(m2) ``` ``` ## (Intercept) x1 grpb grpc grpd ## 13.841 1.059 -3.162 -2.943 -3.415 ``` ```r anova(m1, m2) ``` ``` ## Analysis of Variance Table ## ## Model 1: y ~ x1 ## Model 2: y ~ x1 + grp ## Res.Df RSS Df Sum of Sq F Pr(>F) ## 1 98 1141 ## 2 95 950 3 192 6.39 0.00055 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ] .pull-right[ <img src="jk_img_sandbox/SSblankq.png" width="400px" height="350px" style="display: block; margin: auto;" /> ] --- exclude: true # Inference: Joint tests This is kind of where traditional analysis of variance sits. Think of it as testing the addition of each variable entered in to the model, .pull-left[ ```r m2 = lm(y ~ x1 + grp, data = df) anova(m2) ``` ``` ## Analysis of Variance Table ## ## Response: y ## Df Sum Sq Mean Sq F value Pr(>F) ## x1 1 91 90.6 9.06 0.00334 ** ## grp 3 192 63.9 6.39 0.00055 *** ## Residuals 95 950 10.0 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 ``` ] .pull-right[ <img src="jk_img_sandbox/SStype1.png" width="400px" height="350px" style="display: block; margin: auto;" /> ] --- # Assumptions Our model: `\(\color{red}{y} = \color{blue}{\mathbf{X \boldsymbol \beta}} + \varepsilon \qquad \text{where } \boldsymbol \varepsilon \sim N(0, \sigma) \text{ independently}\)` Our ability to generalise from the model we fit on sample data to the wider population requires making some _assumptions._ -- - assumptions about the nature of the **model** .tr[ (linear) ] -- - assumptions about the nature of the **errors** .tr[ (normal) ] .footnote[ You can also phrase the linear model as: `\(\color{red}{\boldsymbol y} \sim Normal(\color{blue}{\mathbf{X \boldsymbol \beta}}, \sigma)\)` ] --- # Assumptions: The Broad Idea All our work here is in aim of making **models of the world**. - Models are models. They are simplifications. They are therefore wrong. .pull-left[] .pull-right[ ![](jk_img_sandbox/joeymap.jpg) ] --- count:false # Assumptions: The Broad Idea All our work here is in aim of making **models of the world**. - Models are models. They are simplifications. They are therefore wrong. - Our residuals ( `\(y - \hat{y}\)` ) reflect everything that we **don't** account for in our model. -- - In an ideal world, our model accounts for _all_ the systematic relationships. The leftovers (our residuals) are just random noise. -- - If our model is mis-specified, or we don't measure some systematic relationship, then our residuals will reflect this. -- - We check by examining how much "like randomness" the residuals appear to be (zero mean, normally distributed, constant variance, i.i.d ("independent and identically distributed") - _these ideas tend to get referred to as our "assumptions"_ -- - We will **never** know whether our residuals contain only randomness - we can never observe everything! --- # Assumptions .pull-left[ What does "zero mean and constant variance" look like? - mean of the residuals = zero across the predicted values on the linear predictor. - spread of residuals is normally distributed and constant across the predicted values on the linear predictor. ] .pull-right[ ![](01_lmcluster_files/figure-html/unnamed-chunk-37-1.svg)<!-- --> ] --- # Assumptions .pull-left[ What does "zero mean and constant variance" look like? - __mean of the residuals = zero across the predicted values on the linear predictor.__ - spread of residuals is normally distributed and constant across the predicted values on the linear predictor. ] .pull-right[ ![](01_lmcluster_files/figure-html/unnamed-chunk-38-1.svg)<!-- --> ] --- # Assumptions .pull-left[ What does "zero mean and constant variance" look like? - mean of the residuals = zero across the predicted values on the linear predictor. - __spread of residuals is normally distributed and constant across the predicted values on the linear predictor.__ ] .pull-right[ ![](01_lmcluster_files/figure-html/unnamed-chunk-39-1.svg)<!-- --> ] --- # Assumptions .pull-left[ What does "zero mean and constant variance" look like? - mean of the residuals = zero across the predicted values on the linear predictor. - spread of residuals is normally distributed and constant across the predicted values on the linear predictor. 
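A compact way to eyeball all of the standard diagnostic panels at once (a sketch, assuming a fitted `lm` object such as the `my_model` shown opposite):

```r
# the four default diagnostic plots for an lm object, arranged in a 2x2 grid
par(mfrow = c(2, 2))
plot(my_model)
par(mfrow = c(1, 1))
```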
] .pull-right[ __`plot(model)`__ ```r my_model <- lm(y ~ x, data = df) plot(my_model, which = 1) ``` ![](01_lmcluster_files/figure-html/unnamed-chunk-41-1.svg)<!-- --> ] --- # Assumptions: Recipe Book <br><br> <center> <div class="acronym"> L </div> inearity<br> <div class="acronym"> I </div> ndependence<br> <div class="acronym"> N </div> ormality<br> <div class="acronym"> E </div> qual variance<br> </center> .footnote["Line without N is a Lie!" (Umberto)] --- # What if our model doesn't meet assumptions? - is our model mis-specified? - is the relationship non-linear? higher order terms? (e.g. `\(y \sim x + x^2\)`) - is there an omitted variable or interaction term? -- - transform the outcome variable? - makes things look more "normal" - but can make things more tricky to interpret: `lm(y ~ x)` and `lm(log(y) ~ x)` are quite different models -- - bootstrap - do many times: resample (w/ replacement) your data, and refit your model. - obtain a distribution of parameter estimate of interest. - compute a confidence interval for estimate - celebrate -- __looking ahead:__ these don't help if we have violated our assumption of independence... --- # Summary - we can fit a linear regression model which takes the form `\(\color{red}{y} = \color{blue}{\mathbf{X} \boldsymbol{\beta}} + \boldsymbol{\varepsilon}\)` - in R, we fit this with `lm(y ~ x1 + .... xk, data = mydata)`. - we can extend this to different link functions to model outcome variables which follow different distributions. - when drawing inferences from a fitted model to the broader population, we rely on certain assumptions. - one of these is that the errors are independent. --- class: inverse, center, middle, animated, rotateInDownLeft # End of Part 1 --- class: inverse, center, middle <h2 style="text-align: left;opacity:0.3;">Part 1: Linear Regression Refresh</h2> <h2>Part 2: Clustered Data</h2> <h2 style="text-align: left;opacity:0.3;">Part 3: Where we're going</h2> <h2 style="text-align: left;opacity:0.3;">Extra slides (optional): Other Approaches</h2> --- # Clustered Data .pull-left[ - children within schools - patients within clinics - observations within individuals ] .pull-left[ <img src="jk_img_sandbox/h2.png" width="1716" /> ] --- # Clustered Clustered Data? .pull-left[ - children within classrooms within schools within districts etc... - patients within doctors within hospitals... - time-periods within trials within individuals ] .pull-right[ <img src="jk_img_sandbox/h3.png" width="1716" /> ] .footnote[ Other relevant terms you will tend to see: "grouping structure", "levels", "hierarchies". ] --- # Importance of Clustering Clustering will likely result in measurements on observational units within a given cluster being more similar to each other than to those in other clusters. - For example, our measure of academic performance for children in a given class will tend to be more similar to one another (because of class specific things such as the teacher) than to children in other classes. <img src="jk_img_sandbox/lev1.png" width="60%" style="display: block; margin: auto;" /> --- # ICC (intra-class correlation coefficient) Clustering is expressed in terms of the correlation among the measurements within the same cluster - known as the __intra-class correlation coefficient (ICC).__ There are various formulations of ICC, but the basic principle = ratio of *variance between groups* to *total variance*. 
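One way to make that ratio concrete (a sketch, not the only formulation): fit a random-intercept model with **lme4** (introduced properly later in the course) and take the ratio of the estimated variance components — the formula below defines the two variances. The data frame `df`, outcome `y`, and cluster identifier `cluster` here are placeholders.

```r
# sketch: ICC from the variance components of a random-intercept model
library(lme4)
ri_mod <- lmer(y ~ 1 + (1 | cluster), data = df)
vc <- as.data.frame(VarCorr(ri_mod))   # between-cluster and residual variances
vc$vcov[1] / sum(vc$vcov)              # sigma^2_b / (sigma^2_b + sigma^2_e)
```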
<br> `\(\rho = \frac{\sigma^2_{b}}{\sigma^2_{b} + \sigma^2_e} \\ \qquad \\\textrm{Where:} \\ \sigma^2_{b} = \textrm{variance between clusters} \\ \sigma^2_e = \textrm{variance within clusters (residual variance)} \\\)` -- Can also be interpreted as the correlation between two observations from the same group. --- # ICC (intra-class correlation coefficient) The larger the ICC, the lower the variability is within the clusters (relative to the variability between clusters). The greater the correlation between two observations from the same group. .pull-left[ <img src="01_lmcluster_files/figure-html/unnamed-chunk-45-1.svg" style="display: block; margin: auto;" /> ] -- .pull-right[ <img src="01_lmcluster_files/figure-html/unnamed-chunk-46-1.svg" style="display: block; margin: auto;" /> ] --- # Clustered Data & Linear Models .pull-left[ #### Why is it a problem? Clustering is something **systematic** that our model should (arguably) take into account. - remember, `\(\varepsilon \sim N(0, \sigma) \textbf{ independently}\)` ] -- .pull-right[ #### How is it a problem? Standard errors tend to be smaller than they should be, meaning that: - confidence intervals will be too narrow - `\(t\)`-statistics will be too large - `\(p\)`-values will be misleadingly small ] --- class: inverse, center, middle <h2 style="text-align: left;opacity:0.3;">Part 1: Linear Regression Refresh</h2> <h2><b style="opacity:0.4;">Part 2: Clustered Data</b><b>A Practical Comment on Data</b></h2> <h2 style="text-align: left;opacity:0.3;">Part 3: Where we're going</h2> <h2 style="text-align: left;opacity:0.3;">Extra slides (optional): Other Approaches</h2> --- # Wide Data/Long Data .pull-left[ __Wide Data__ ``` ## # A tibble: 5 × 5 ## ID age trial_1 trial_2 trial_3 ## <chr> <chr> <chr> <chr> <chr> ## 1 001 28 10 12.5 18 ## 2 002 36 7.5 7 5 ## 3 003 61 12 14.5 11 ## 4 004 45 10.5 17 14 ## 5 ... ... ... ... ... ``` ] .pull-right[ __Long Data__ ``` ## # A tibble: 13 × 4 ## ID age trial score ## <chr> <chr> <chr> <chr> ## 1 001 36 trial_1 10 ## 2 001 36 trial_2 12.5 ## 3 001 36 trial_3 18 ## 4 002 70 trial_1 7.5 ## 5 002 70 trial_2 7 ## 6 002 70 trial_3 5 ## 7 003 68 trial_1 12 ## 8 003 68 trial_2 14.5 ## 9 003 68 trial_3 11 ## 10 004 31 trial_1 10.5 ## 11 004 31 trial_2 17 ## 12 004 31 trial_3 14 ## 13 ... ... ... ... ``` ] --- # Wide Data/Long Data ![](https://www.fromthebottomoftheheap.net/assets/img/posts/tidyr-longer-wider.gif)<!-- --> .footnote[ Source: Examples of wide and long representations of the same data. 
Source: Garrick Aden-Buie’s (\@grrrck) [Tidy Animated Verbs](https://github.com/gadenbuie/tidyexplain) ] --- # Long Data = Good for Plotting .pull-left[ __`group` aesthetic__ ```r ggplot(longd, aes(x=trial,y=score, group=ID, col=ID))+ geom_point(size=4)+ geom_path() ``` ![](01_lmcluster_files/figure-html/unnamed-chunk-50-1.svg)<!-- --> ] .pull-right[ __`facet_wrap()`__ ```r ggplot(longd, aes(x=trial,y=score))+ geom_point(size=4)+ geom_path(aes(group=1))+ facet_wrap(~ID) ``` ![](01_lmcluster_files/figure-html/unnamed-chunk-51-1.svg)<!-- --> ] --- # Long Data = Good for by-Cluster Computations ```r longd %>% group_by(ID) %>% summarise( ntrials = n_distinct(trial), meanscore = mean(score), sdscore = sd(score) ) ``` ``` ## # A tibble: 4 × 4 ## ID ntrials meanscore sdscore ## <chr> <int> <dbl> <dbl> ## 1 001 3 13.5 4.09 ## 2 002 3 6.5 1.32 ## 3 003 3 12.5 1.80 ## 4 004 3 13.8 3.25 ``` --- # Summary - Clustering can take many forms, and exist at many levels - Clustering is something systematic that we would want our model to take into account - Ignoring it can lead to incorrect statistical inferences - Clustering is typically assessed using intra-class correlation coefficient (ICC) - the ratio of variance between clusters to the total variance `\(\rho = \frac{\sigma^2_b}{\sigma^2_b + \sigma^2_e}\)` - Tidying your data and converting it to *long* format (one observational unit per row, and a variable identifying the cluster ID) is a good start. --- class: inverse, center, middle, animated, rotateInDownLeft # End of Part 2 --- class: inverse, center, middle <h2 style="text-align: left;opacity:0.3;">Part 1: Linear Regression Refresh</h2> <h2 style="text-align: left;opacity:0.3;">Part 2: Clustered Data</h2> <h2>Part 3: Where we're going</h2> <h2 style="text-align: left;opacity:0.3;">Extra slides (optional): Other Approaches</h2> --- # Our Data .pull-left[ > Sample of 200 pupils from 20 schools completed a survey containing the Emotion Dysregulation Scale (EDS) and the Child Routines Questionnaire (CRQ). ```r crq_data <- read_csv("https://uoepsy.github.io/data/crqdata.csv") head(crq_data) ``` ``` ## # A tibble: 6 × 6 ## emot_dysreg crq int schoolid sleep age ## <dbl> <dbl> <chr> <chr> <chr> <dbl> ## 1 4.12 1.92 Treatment school1 <8hr 14 ## 2 3.22 1.65 Treatment school1 <8hr 11 ## 3 4.86 3.56 Treatment school1 <8hr 16 ## 4 4.79 1.45 Treatment school1 8hr+ 16 ## 5 3.58 0.81 Treatment school1 <8hr 12 ## 6 4.41 2.71 Treatment school1 <8hr 15 ``` ```r library(ICC) ICCbare(x = schoolid, y = emot_dysreg, data = crq_data) ``` ``` ## [1] 0.2443 ``` ] .pull-right[ <img src="01_lmcluster_files/figure-html/unnamed-chunk-55-1.svg" style="display: block; margin: auto;" /> ] --- # Ignoring Clustering .pull-left[ __(Complete pooling)__ + `lm(y ~ 1 + x, data = df)` + Information from all clusters is pooled together to estimate over x ```r model <- lm(emot_dysreg ~ crq, data = crq_data) ``` ``` ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 4.4470 0.1259 35.31 <2e-16 *** ## crq -0.0525 0.0448 -1.17 0.24 ``` ] .pull-right[ ![](01_lmcluster_files/figure-html/unnamed-chunk-58-1.svg)<!-- --> ] --- # Ignoring Clustering .pull-left[ __(Complete pooling)__ + `lm(y ~ 1 + x, data = df)` + Information from all clusters is pooled together to estimate over x ```r model <- lm(emot_dysreg ~ crq, data = crq_data) ``` ``` ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 4.4470 0.1259 35.31 <2e-16 *** ## crq -0.0525 0.0448 -1.17 0.24 ``` But different clusters show different patterns. Residuals are __not__ independent. 
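A quick way to see this (a sketch, assuming the `model` and `crq_data` objects above, the tidyverse loaded, and no rows dropped due to missingness):

```r
# residuals from the complete-pooling model, grouped by school -
# systematic differences between schools show the residuals are not independent
crq_data %>%
  mutate(resid = resid(model)) %>%
  ggplot(aes(x = schoolid, y = resid)) +
  geom_boxplot()
```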
] .pull-right[ ![](01_lmcluster_files/figure-html/unnamed-chunk-61-1.svg)<!-- --> ] --- count:false # Ignoring Clustering .pull-left[ __(Complete pooling)__ + `lm(y ~ 1 + x, data = df)` + Information from all clusters is __pooled__ together to estimate over x ```r model <- lm(emot_dysreg ~ crq, data = crq_data) ``` ``` ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 4.4470 0.1259 35.31 <2e-16 *** ## crq -0.0525 0.0448 -1.17 0.24 ``` But different clusters show different patterns. Residuals are __not__ independent. ] .pull-right[ ![](01_lmcluster_files/figure-html/unnamed-chunk-64-1.svg)<!-- --> ] --- # Fixed Effects Models .pull-left[ __(No pooling)__ - `lm(y ~ x * cluster, data = df)` - Information from a cluster contributes to estimate *for that cluster*, but information is not pooled to estimate an overall effect. ```r model <- lm(emot_dysreg ~ 1 + crq * schoolid, data = crq_data) ``` {{content}} ] .pull-right[ ![](01_lmcluster_files/figure-html/unnamed-chunk-66-1.svg)<!-- --> ] -- + Lots of estimates (separate for each cluster). + Variance estimates constructed based on information *only* within each cluster. + No overall estimate of effect over x. ``` ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 3.31835 0.45187 7.34 1.8e-11 *** ## crq 0.38247 0.20087 1.90 0.05904 . ## schoolidschool10 1.89925 0.88643 2.14 0.03396 * ## schoolidschool11 2.15526 0.72639 2.97 0.00356 ** ## schoolidschool12 1.31007 0.63199 2.07 0.04009 * ## schoolidschool13 0.39744 0.69852 0.57 0.57033 ## schoolidschool14 0.67908 0.70058 0.97 0.33414 ## schoolidschool15 1.09387 0.60494 1.81 0.07281 . ## schoolidschool16 2.08396 0.74354 2.80 0.00582 ** ## schoolidschool17 1.23261 0.63109 1.95 0.05289 . ## schoolidschool18 -0.20698 0.92070 -0.22 0.82247 ## schoolidschool19 1.34223 0.59066 2.27 0.02465 * ## schoolidschool2 1.31507 0.86143 1.53 0.12922 ## schoolidschool20 1.05688 0.63934 1.65 0.10066 ## schoolidschool3 1.17962 0.65361 1.80 0.07336 . ## schoolidschool4 0.94437 0.60360 1.56 0.12005 ## schoolidschool5 1.49329 0.67013 2.23 0.02752 * ## schoolidschool6 2.76414 0.68697 4.02 9.5e-05 *** ## schoolidschool7 -0.00683 0.60138 -0.01 0.99095 ## schoolidschool8 1.15007 0.62754 1.83 0.06907 . ## schoolidschool9 0.62842 0.81937 0.77 0.44446 ## crq:schoolidschool10 -0.67112 0.34804 -1.93 0.05594 . ## crq:schoolidschool11 -0.68897 0.29202 -2.36 0.01975 * ## crq:schoolidschool12 -0.49875 0.25901 -1.93 0.05627 . ``` --- # Random Effects Models (MLM) .pull-left[ __(Partial Pooling)__ - `lmer(y ~ 1 + x + (1 + x| cluster), data = df)` - cluster-level variance in intercepts and slopes is modeled as randomly distributed around fixed parameters. - effects are free to vary by cluster, but information from clusters contributes (according to cluster `\(n\)` and outlyingness of cluster) to an overall fixed parameter. ```r library(lme4) model <- lmer(emot_dysreg ~ 1 + crq + (1 + crq | schoolid), data = crq_data) summary(model)$coefficients ``` ``` ## Estimate Std. Error t value ## (Intercept) 4.43772 0.15690 28.28 ## crq -0.05166 0.05063 -1.02 ``` ] .pull-right[ ![](01_lmcluster_files/figure-html/unnamed-chunk-69-1.svg)<!-- --> ] --- # Random Effects Models (MLM) .pull-left[ __(Partial Pooling)__ - `lmer(y ~ 1 + x + (1 + x| cluster), data = df)` - cluster-level variance in intercepts and slopes is modeled as randomly distributed around fixed parameters. - effects are free to vary by cluster, but information from clusters contributes (according to cluster `\(n\)` and outlyingness of cluster) to an overall fixed parameter. 
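Once the model is fitted (code below), the school-specific estimates that get partially pooled toward the fixed parameters can be inspected directly (a sketch, assuming the `model` object fitted below):

```r
# overall (fixed) parameters vs. partially pooled per-school estimates
fixef(model)                 # overall intercept and crq slope
head(coef(model)$schoolid)   # school-specific intercepts and slopes
```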
```r library(lme4) model <- lmer(emot_dysreg ~ 1 + crq + (1 + crq | schoolid), data = crq_data) summary(model)$coefficients ``` ``` ## Estimate Std. Error t value ## (Intercept) 4.43772 0.15690 28.28 ## crq -0.05166 0.05063 -1.02 ``` ] .pull-right[ ![](01_lmcluster_files/figure-html/unnamed-chunk-71-1.svg)<!-- --> ] --- # Summary With clustered data, there are many possible approaches. Some of the main ones are: - Ignore it (**complete pooling**) - and make inappropriate inferences. - Completely partition out any variance due to clustering into fixed effects for each cluster (__no pooling__). - and limit our estimates to being cluster specific and low power. - Model cluster-level variance as randomly distributed around fixed parameters, and partially pool information across clusters. - best of both worlds? --- class: inverse, center, middle, animated, rotateInDownLeft # End --- class: inverse, center, middle <h2 style="text-align: left;opacity:0.3;">Part 1: Linear Regression Refresh</h2> <h2 style="text-align: left;opacity:0.3;">Part 2: Clustered Data</h2> <h2 style="text-align: left;opacity:0.3;">Part 3: Where we're going</h2> <h2>Extra slides (optional): Other Approaches</h2> --- # Other: Repeated Measures ANOVA ## ANOVA revisited .pull-left[ - We've been using `anova()` as a test of whether _several_ parameters are simultaneously zero - The "omnibus"/"overall F"/"joint" test (can also be viewed as a model comparison) ] -- .pull-right[ - Traditional "ANOVA" is essentially a linear model with categorical predictor(s) where it looks for overall differences in group means (or differences in differences in group means etc). - Still quite popular in psychology because we often design experiments with discrete conditions. - Require post-hoc tests to examine _where_ any differences are. ] --- exclude: true # ANOVA revisited In R, functions typically create anova tables from linear model objects: ```r lin_mod <- lm(y ~ grp, df) anova(lin_mod) car::Anova(lin_mod) ``` or are wrappers which use the `lm()` function internally. So they're doing the same thing. ```r summary(aov(y ~ grp, df)) ``` <img src="01_lmcluster_files/figure-html/unnamed-chunk-75-1.svg" style="display: block; margin: auto;" /> --- exclude: true # ANOVA sums of squares .pull-left[ With multiple predictors, sums of squares can be calculated differently 1. Sequential Sums of Squares = Order matters 2. <p style="opacity:0.4">Partially Sequential Sums of Squares</p> 3. <p style="opacity:0.4">Partial Sums of Squares</p> <img src="jk_img_sandbox/SStype1b.png" width="200px" height="150px" style="display: block; margin: auto;" /> ] .pull-right[ __sequential SS__ ```r anova(lm(y~x1*x2,df)) ``` ``` ## Analysis of Variance Table ## ## Response: y ## Df Sum Sq Mean Sq F value Pr(>F) ## x1 1 91 91 13.55 0.00038 *** ## x2 1 484 484 72.45 2.3e-13 *** ## x1:x2 1 16 16 2.32 0.13097 ## Residuals 96 642 7 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ```r anova(lm(y~x2*x1,df)) ``` ``` ## Analysis of Variance Table ## ## Response: y ## Df Sum Sq Mean Sq F value Pr(>F) ## x2 1 449 449 67.14 1.1e-12 *** ## x1 1 126 126 18.86 3.5e-05 *** ## x2:x1 1 16 16 2.32 0.13 ## Residuals 96 642 7 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ] --- exclude: true # ANOVA sums of squares .pull-left[ with multiple predictors, sums of squares can be calculated differently 1. <p style="opacity:0.4">Sequential Sums of Squares</p> 2. <p style="opacity:0.4">Partially Sequential Sums of Squares</p> 3. 
Partial Sums of Squares = Each one calculated as if its the last one in sequential SS (order doesn't matter). <img src="jk_img_sandbox/SStype3.png" width="200px" height="150px" style="display: block; margin: auto;" /> ] .pull-right[ __partial SS__ ```r car::Anova(lm(y~x1*x2,df), type="III") ``` ``` ## Anova Table (Type III tests) ## ## Response: y ## Sum Sq Df F value Pr(>F) ## (Intercept) 11924 1 1784.14 < 2e-16 *** ## x1 122 1 18.27 4.5e-05 *** ## x2 497 1 74.30 1.4e-13 *** ## x1:x2 16 1 2.32 0.13 ## Residuals 642 96 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ```r car::Anova(lm(y~x2*x1,df), type="III") ``` ``` ## Anova Table (Type III tests) ## ## Response: y ## Sum Sq Df F value Pr(>F) ## (Intercept) 11924 1 1784.14 < 2e-16 *** ## x2 497 1 74.30 1.4e-13 *** ## x1 122 1 18.27 4.5e-05 *** ## x2:x1 16 1 2.32 0.13 ## Residuals 642 96 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ] --- # Other: Repeated Measures ANOVA ## ANOVA: Partitioning Variance <img src="jk_img_sandbox/anova.png" width="1288" /> .footnote[ The terminology here can be a nightmare. `\(SS_{between}\)` sometimes gets referred to as `\(SS_{model}\)`, `\(SS_{condition}\)`, `\(SS_{regression}\)`, or `\(SS_{treatment}\)`. Meanwhile `\(SS_{residual}\)` also gets termed `\(SS_{error}\)`. To make it all worse, there are inconsistencies in the acronyms used, `\(SSR\)` vs. `\(RSS\)`! ] --- # Other: Repeated Measures ANOVA ## ANOVA: Partitioning Variance <img src="jk_img_sandbox/rmanova.png" width="1288" /> --- # Other: Repeated Measures ANOVA in R .pull-left[ ![](01_lmcluster_files/figure-html/unnamed-chunk-82-1.svg)<!-- --> ] .pull-right[ ```r library(afex) aov_ez(id = "subject", dv = "y", data = df, within = "t") ``` ``` ## Anova Table (Type 3 tests) ## ## Response: y ## Effect df MSE F ges p.value ## 1 t 1.00, 39.18 3.20 36.65 *** .074 <.001 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1 ## ## Sphericity correction method: GG ``` ```r library(ez) ezANOVA(data = df, dv = y, wid = subject, within = t) ``` ``` ## $ANOVA ## Effect DFn DFd F p p<.05 ges ## 2 t 2 78 36.65 5.991e-12 * 0.07383 ## ## $`Mauchly's Test for Sphericity` ## Effect W p p<.05 ## 2 t 0.008994 1.335e-39 * ## ## $`Sphericity Corrections` ## Effect GGe p[GG] p[GG]<.05 HFe p[HF] p[HF]<.05 ## 2 t 0.5023 4.14e-07 * 0.5024 4.123e-07 * ``` ] --- # Other: Mixed ANOVA in R .pull-left[ ![](01_lmcluster_files/figure-html/unnamed-chunk-84-1.svg)<!-- --> ] .pull-right[ ```r library(afex) aov_ez(id = "subject", dv = "y", data = df, between = "condition", within = "t") ``` ``` ## Anova Table (Type 3 tests) ## ## Response: y ## Effect df MSE F ges p.value ## 1 condition 3, 36 10.38 31.46 *** .695 <.001 ## 2 t 1.01, 36.35 1.57 74.18 *** .215 <.001 ## 3 condition:t 3.03, 36.35 1.57 14.31 *** .137 <.001 ## --- ## Signif. 
codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1 ## ## Sphericity correction method: GG ``` ```r library(ez) ezANOVA(data = df, dv = y, wid = subject, within = t, between = condition) ``` ``` ## $ANOVA ## Effect DFn DFd F p p<.05 ges ## 2 condition 3 36 31.46 3.654e-10 * 0.6945 ## 3 t 2 72 74.18 3.241e-18 * 0.2148 ## 4 condition:t 6 72 14.31 1.156e-10 * 0.1367 ## ## $`Mauchly's Test for Sphericity` ## Effect W p p<.05 ## 3 t 0.01948 1.164e-30 * ## 4 condition:t 0.01948 1.164e-30 * ## ## $`Sphericity Corrections` ## Effect GGe p[GG] p[GG]<.05 HFe p[HF] p[HF]<.05 ## 3 t 0.5049 2.389e-10 * 0.5053 2.352e-10 * ## 4 condition:t 0.5049 2.428e-06 * 0.5053 2.407e-06 * ``` ] --- # Other: Advantages of MLM over ANOVA - Rpt Measures and Mixed ANOVA are special cases of the MLM - MLM can easily be extended to modelling different types of dependent variables. ANOVA cannot. - MLM can be extended to model more complex grouping structures (e.g. children in schools in districts, or independent clusters of both participants as well as experimental items) - MLM allows us to model an effect as varying - MLM provides more options for dealing with unbalanced designs and missing data. --- # Other: Cluster Robust Standard Errors .pull-left[ <!-- https://rlbarter.github.io/Practical-Statistics/2017/05/10/generalized-estimating-equations-gee/ --> Don't include clustering as part of the model directly, but incorporate the dependency into our residuals term. $$ `\begin{align} \color{red}{\textrm{outcome}} & = \color{blue}{(\textrm{model})} + \textrm{error}^* \\ \end{align}` $$ .footnote[`*` Where errors are clustered] ] .pull-right[ Simply adjusts our standard errors: ```r library(plm) clm <- plm(emot_dysreg ~ 1 + crq, data=crq_data, model="pooling", index="schoolid") ``` ``` ## Coefficients: ## Estimate Std. Error t-value Pr(>|t|) ## (Intercept) 4.4470 0.1259 35.31 <2e-16 *** ## crq -0.0525 0.0448 -1.17 0.24 ``` ```r sqrt(diag(vcovHC(clm, method='arellano', cluster='group'))) ``` ``` ## (Intercept) crq ## 0.14410 0.04601 ``` ] --- # Other: Generalised Estimating Equations .pull-left[ Don't include clustering as part of the model directly, but incorporate the dependency into our residuals term. $$ `\begin{align} \color{red}{\textrm{outcome}} & = \color{blue}{(\textrm{model})} + \textrm{error}^* \\ \end{align}` $$ GEE is a "population average" model: - __population average:__ how the _average_ response changes for a 1-unit increase in `\(x\)`, while accounting for within-group correlation. - __subject specific:__ how the response changes for a group when they increase 1-unit in `\(x\)`. .footnote[`*` Where errors are clustered, and follow some form of correlational<br>structure within clusters (e.g. based on how many timepoints apart<br>two observations are).] ] .pull-right[ Specifying a correlational structure for residuals within clusters can influence _what_ we are estimating ```r library(geepack) # needs to be arranged by cluster, # and for cluster to be numeric crq_data <- crq_data %>% mutate( cluster_id = as.numeric(as.factor(schoolid)) ) %>% arrange(cluster_id) # geemod = geeglm(emot_dysreg ~ 1 + crq, data = crq_data, corstr = 'exchangeable', id = cluster_id) ``` ``` ## Coefficients: ## Estimate Std.err Wald Pr(>|W|) ## (Intercept) 4.4304 0.1471 906.97 <2e-16 *** ## crq -0.0510 0.0473 1.16 0.28 ``` ] --- class: inverse, center, middle, animated, rotateInDownLeft # End