class: center, middle, inverse, title-slide #
Model Comparisons
## Data Analysis for Psychology in R 2
### dapR2 Team ### Department of Psychology
The University of Edinburgh --- # Week's Learning Objectives + Understand how to use model comparisons to test different types of question. + Understand the calculation of the incremental `\(F\)`-test. + Understand the difference between nested and non-nested models, and the appropriate statistics to use for comparison in each case. --- # Topics for today + Discuss some motivating examples: + Categorical variables with 2+ levels + Interactions with categorical variables with 2+ levels + Controlling for covariates + Statistical tools for selection/comparison + Incremental `\(F\)`-test + Nested vs. non-nested models + AIC & BIC --- # Some data + We have previously looked at this example. + A researcher was interested in whether the subjective well-being of patients differed dependent on the post-operation treatment schedule they were given, and the hospital in which they were staying. + **Condition 1**: `Treatment` (Levels: TreatA, TreatB, TreatC). + **Condition 2**: `Hospital` (Levels: Hosp1, Hosp2). + Total sample n = 180 (30 patients in each of 6 groups). + Between person design. + **Outcome**: Subjective well-being (`SWB`) + An average of multiple raters (the patient, a member of their family, and a friend). + SWB score ranged from 0 to 20. --- # The data ```r hosp_tbl <- read_csv("hospital.csv", col_types = "dff") hosp_tbl %>% slice(1:10) ``` ``` ## # A tibble: 10 x 3 ## SWB Treatment Hospital ## <dbl> <fct> <fct> ## 1 6.2 TreatA Hosp1 ## 2 15.9 TreatA Hosp1 ## 3 7.2 TreatA Hosp1 ## 4 11.3 TreatA Hosp1 ## 5 11.2 TreatA Hosp1 ## 6 9 TreatA Hosp1 ## 7 14.5 TreatA Hosp1 ## 8 7.3 TreatA Hosp1 ## 9 13.7 TreatA Hosp1 ## 10 12.6 TreatA Hosp1 ``` --- # Example 1: Categorical Variables with 2+ levels + What if the researcher wanted to ask a general question; Is there an overall effect of `Treatment`? + How might we do this with the skills we have learned already? ```r summary(lm(SWB ~ Treatment, data = hosp_tbl)) ``` --- # Results ``` ## ## Call: ## lm(formula = SWB ~ Treatment, data = hosp_tbl) ## ## Residuals: ## Min 1Q Median 3Q Max ## -5.373 -1.987 -0.300 1.838 7.173 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 9.3267 0.3242 28.770 < 2e-16 *** ## TreatmentTreatB 1.9467 0.4585 4.246 3.51e-05 *** ## TreatmentTreatC -0.2850 0.4585 -0.622 0.535 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 2.511 on 177 degrees of freedom ## Multiple R-squared: 0.1369, Adjusted R-squared: 0.1271 ## F-statistic: 14.04 on 2 and 177 DF, p-value: 2.196e-06 ``` --- # Example 2: Categorical Interactions with 2+ levels + If we stay with the same example, what if we asked the question: + Is there an interaction between `Hospital` and `Treatement`? ```r summary(lm(SWB ~ Treatment*Hospital, data = hosp_tbl)) ``` --- # Results ``` ## ## Call: ## lm(formula = SWB ~ Treatment * Hospital, data = hosp_tbl) ## ## Residuals: ## Min 1Q Median 3Q Max ## -6.6000 -1.2533 0.1083 1.2650 5.7000 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 10.8000 0.3699 29.195 < 2e-16 *** ## TreatmentTreatB -1.3700 0.5232 -2.619 0.0096 ** ## TreatmentTreatC -0.6967 0.5232 -1.332 0.1847 ## HospitalHosp2 -2.9467 0.5232 -5.632 7.02e-08 *** ## TreatmentTreatB:HospitalHosp2 6.6333 0.7399 8.966 4.74e-16 *** ## TreatmentTreatC:HospitalHosp2 0.8233 0.7399 1.113 0.2673 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 2.026 on 174 degrees of freedom ## Multiple R-squared: 0.4476, Adjusted R-squared: 0.4317 ## F-statistic: 28.2 on 5 and 174 DF, p-value: < 2.2e-16 ``` --- # Some more data + How about this example based on data from the Midlife In United States (MIDUS2) study. + Outcome: self-rated health + Covariates: Age, sex + Predictors: Big Five traits and Purpose in Life. --- # The data ```r midus <- read_csv("MIDUS2.csv") midus2 <- midus %>% select(1:4, 31:42) %>% mutate( PIL = rowMeans(.[grep("PIL", names(.))],na.rm=T) ) %>% select(1:4, 12:17) %>% drop_na(.) slice(midus2, 1:3) ``` ``` ## # A tibble: 3 x 10 ## ID age sex health O C E A N PIL ## <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 10002 69 MALE 8 2.14 2.8 2.6 3.4 2 5.86 ## 2 10019 51 MALE 8 3.14 3 3.4 3.6 1.5 5.71 ## 3 10023 78 FEMALE 4 3.57 3.4 3.6 4 1.75 5.14 ``` --- # Example 3: Controlling for covariates + Suppose our question was.... + Does personality significantly predict self-rated health over and above the effects of age and sex? ```r summary(lm(health ~ age + sex + O + C + E + A + N, data = midus2)) ``` --- ``` ## ## Call: ## lm(formula = health ~ age + sex + O + C + E + A + N, data = midus2) ## ## Residuals: ## Min 1Q Median 3Q Max ## -6.7723 -0.7921 0.2532 1.0097 3.9550 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 6.66172 0.45100 14.771 < 2e-16 *** ## age -0.01310 0.00298 -4.396 1.17e-05 *** ## sexMALE -0.09571 0.07955 -1.203 0.229 ## O 0.09308 0.08306 1.121 0.263 ## C 0.57147 0.08507 6.717 2.49e-11 *** ## E 0.56771 0.08061 7.043 2.70e-12 *** ## A -0.40380 0.09025 -4.474 8.15e-06 *** ## N -0.56493 0.06189 -9.128 < 2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 1.521 on 1753 degrees of freedom ## Multiple R-squared: 0.1484, Adjusted R-squared: 0.145 ## F-statistic: 43.65 on 7 and 1753 DF, p-value: < 2.2e-16 ``` --- # What do these questions have in common? + A key feature of these questions is that they require us to evaluate whether multiple variables (think more than one beta coefficient) are significant. + Another - potentially more useful - way to think are "are significant" is to say "do they improve my model"? + Up to this point, we have only discussed ways to explore this: + In a very limited case (i.e. the single categorical predictor with 2+ levels) like example 1 + Descriptively + What we will look at next is how we can formally test such questions. --- class: center, middle # Questions on the general concept... --- # Recall the `\(F\)`-test + `\(F\)`-ratio is a ratio of the explained to unexplained variance: `$$F = \frac{MS_{Model}}{MS_{Residual}}$$` + Where the mean squares (MS) are the sums of squares divided by the degrees of freedom. So we can also write: `$$F = \frac{SS_{Model}/df_{Model}}{SS_{Residual}/df_{residual}}$$` --- # F-ratio + Bigger `\(F\)`-ratios indicate better models. + It means the model variance is big compared to the residual variance. + The null hypothesis for the model says that the best guess of any individuals `\(y\)` value is the mean of `\(y\)` plus error. + Or, that the `\(x\)` variables carry no information collectively about `\(y\)`. + Or, a test that all `\(\beta\)` = 0 + `\(F\)`-ratio will be close to 1 when the null hypothesis is true + If there is equivalent residual to model variation, `\(F\)`=1 + If there is more model than residual `\(F\)` > 1 + `\(F\)`-ratio is then evaluated against an `\(F\)`-distribution with `\(df_{Model}\)` and `\(df_{Residual}\)` and a pre-defined `\(\alpha\)` + Testing the `\(F\)`-ratio evaluates statistical significance of the overall model --- # `\(F\)`-test as an incremental test + One important way we can think about the `\(F\)`-test and the `\(F\)`-ratio is as an incremental test against an "empty" or null model. + A null or empty model is a linear model with only the intercept. + In this model, our predicted value of the outcome for every case in our data set, is the mean of the outcome. + That is, with no predictors, we have no information that may help us predict the outcome. + So we will be "least wrong" by guessing the mean of the outcome. + An empty model is the same as saying all `\(\beta\)` = 0. + So in this way, the `\(F\)`-test we have already seen **is comparing two models**. + We can extend this idea, and use the `\(F\)`-test to compare two models that contain different sets of predictors. + This is the **incremental `\(F\)`-test** --- # Incremental `\(F\)`-test .pull-left[ + The incremental `\(F\)`-test evaluates the statistical significance of the improvement in variance explained in an outcome with the addition of further predictor(s) + It is based on the difference in `\(F\)`-values between two models. + We call the model with the additional predictor(s) model 1 or full model + We call the model without model 0 or restricted model ] .pull-right[ `$$F_{(df_R-df_F),df_F} = \frac{(SSR_R-SSR_F)/(df_R-df_F)}{SSR_F / df_F}$$` $$ `\begin{align} & \text{Where:} \\ & SSR_R = \text{residual sums of squares for the restricted model} \\ & SSR_F = \text{residual sums of squares for the full model} \\ & df_R = \text{residual degrees of freedom from the restricted model} \\ & df_F = \text{residual degrees of freedom from the full model} \\ \end{align}` $$ ] --- # Incremental `\(F\)`-test in R + In order to apply the `\(F\)`-test for model comparison in R, we use the `anova()` function. + `anova()` takes as its arguments models that we wish to compare + Here we will show examples with 2 models, but we can use more. --- # Application to example 1 + Is there an overall effect of `Treatment`? ```r ex1_r <- lm(SWB ~ 1, data = hosp_tbl) ex1_f <- lm(SWB ~ Treatment, data = hosp_tbl) anova(ex1_r, ex1_f) ``` ``` ## Analysis of Variance Table ## ## Model 1: SWB ~ 1 ## Model 2: SWB ~ Treatment ## Res.Df RSS Df Sum of Sq F Pr(>F) ## 1 179 1293.1 ## 2 177 1116.1 2 177.02 14.037 2.196e-06 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` --- # Application to example 1 ```r summary(ex1_f) ``` ``` ## ## Call: ## lm(formula = SWB ~ Treatment, data = hosp_tbl) ## ## Residuals: ## Min 1Q Median 3Q Max ## -5.373 -1.987 -0.300 1.838 7.173 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 9.3267 0.3242 28.770 < 2e-16 *** ## TreatmentTreatB 1.9467 0.4585 4.246 3.51e-05 *** ## TreatmentTreatC -0.2850 0.4585 -0.622 0.535 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 2.511 on 177 degrees of freedom ## Multiple R-squared: 0.1369, Adjusted R-squared: 0.1271 ## F-statistic: 14.04 on 2 and 177 DF, p-value: 2.196e-06 ``` --- # Application to example 2 + Is there an interaction between `Hospital` and `Treatment`? ```r ex2_r <- lm(SWB ~ Treatment + Hospital, data = hosp_tbl) ex2_f <- lm(SWB ~ Treatment*Hospital, data = hosp_tbl) anova(ex2_r, ex2_f) ``` ``` ## Analysis of Variance Table ## ## Model 1: SWB ~ Treatment + Hospital ## Model 2: SWB ~ Treatment * Hospital ## Res.Df RSS Df Sum of Sq F Pr(>F) ## 1 176 1106.51 ## 2 174 714.34 2 392.18 47.764 < 2.2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` --- # Application to example 3 + Does personality significantly predict self-rated health over and above the effects of age and sex? ```r ex3_r <- lm(health ~ age + sex, data = midus2) ex3_f <- lm(health ~ age + sex + O + C + E + A + N, data = midus2) anova(ex3_r, ex3_f) ``` ``` ## Analysis of Variance Table ## ## Model 1: health ~ age + sex ## Model 2: health ~ age + sex + O + C + E + A + N ## Res.Df RSS Df Sum of Sq F Pr(>F) ## 1 1758 4740.2 ## 2 1753 4055.4 5 684.85 59.208 < 2.2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` --- class: center, middle # Questions.... --- # Nested vs non-nested models + The `\(F\)`-ratio depends on the models being compared being nested + Nested means that the predictors in one model are a subset of the predictors in the other + We also require the models to be computed on the same data --- # Nested vs non-nested models .pull-left[ **Nested** ```r m0 <- lm(outcome ~ x1 + x2 , data = data) m1 <- lm(outcome ~ x1 + x2 + x3, data = data) ``` + These models are nested. + `x1` and `x2` appear in both models ] .pull-right[ **Non-nested** ```r m0 <- lm(outcome ~ x1 + x2 + x4, data = data) m1 <- lm(outcome ~ x1 + x2 + x3, data = data) ``` + These models are non-nested + There are unique variables in both models + `x4` in `m0` + `x3` in `m1` ] --- # Model comparison for non-nested models + So what happens when we have non-nested models? + There are two commonly used alternatives + AIC + BIC + Unlike the incremental `\(F\)`-test AIC and BIC do not require two models to be nested + Smaller (more negative) values indicate better fitting models. + So we compare values and choose the model with the smaller AIC or BIC value --- # AIC & BIC .pull-left[ `$$AIC = n\,\text{ln}\left( \frac{SS_{residual}}{n} \right) + 2k$$` $$ `\begin{align} & \text{Where:} \\ & SS_{residual} = \text{sum of squares residuals} \\ & n = \text{sample size} \\ & k = \text{number of explanatory variables} \\ & \text{ln} = \text{natural log function} \end{align}` $$ ] .pull-right[ `$$BIC = n\,\text{ln}\left( \frac{SS_{residual}}{n} \right) + k\,\text{ln}(n)$$` $$ `\begin{align} & \text{Where:} \\ & SS_{residual} = \text{sum of squares residuals} \\ & n = \text{sample size} \\ & k = \text{number of explanatory variables} \\ & \text{ln} = \text{natural log function} \end{align}` $$ ] --- # Parsimony corrections + Both AIC and BIC contain something called a parsimony correction + In essence, they penalise models for being complex + This is to help us avoid overfitting (adding predictors arbitarily to improve fit) `$$AIC = n\,\text{ln}\left( \frac{SS_{residual}}{n} \right) + 2k$$` `$$BIC = n\,\text{ln}\left( \frac{SS_{residual}}{n} \right) + k\,\text{ln}(n)$$` + BIC has a harsher parsimony penalty for typical sample sizes when applying linear models than AIC + When `\(\text{ln}(n) > 2\)` BIC will have a more severe parsimony penalty (i.e. essentially all the time!) --- # In R ```r AIC(ex3_r, ex3_f) ``` ``` ## df AIC ## ex3_r 4 6749.246 ## ex3_f 9 6484.457 ``` --- # Applied to non-nested models ```r ex3_nn1 <- lm(health ~ O + C + E + A + N, data=midus2) ex3_nn2 <- lm(health ~ age + sex + PIL, data = midus2) AIC(ex3_nn1, ex3_nn2) ``` ``` ## df AIC ## ex3_nn1 7 6501.524 ## ex3_nn2 5 6564.953 ``` --- # In R ```r BIC(ex3_r, ex3_f) ``` ``` ## df BIC ## ex3_r 4 6771.141 ## ex3_f 9 6533.719 ``` ```r BIC(ex3_nn1, ex3_nn2) ``` ``` ## df BIC ## ex3_nn1 7 6539.840 ## ex3_nn2 5 6592.321 ``` --- # Considerations for use of AIC and BIC + The AIC and BIC for a single model are not meaningful + They only make sense for model comparisons + AIC and BIC can be used for both nested and non-nested models. + For AIC, there are no cut-offs to suggest how big a difference in two models is needed to conclude that one is substantively better than the other + For BIC, a difference of 10 can be used as a rule of thumb to suggest that one model is substantively better than another --- # Summary of today + We have set out the types of question that may require us to use model comparison methods. + We have introduced the incremental `\(F\)`-test and linked it to the `\(F\)`-test from semester 1. + We also introduced the concepts of nested and non-neste tests, and the use of AIC and BIC for model comparison of non-nested models. --- class: center, middle # Thanks for listening!