class: center, middle, inverse, title-slide

# Binary Logistic Model
## Data Analysis for Psychology in R 2
### dapR2 Team
### Department of Psychology, The University of Edinburgh
### AY 2020-2021
---
# Week's Learning Objectives

1. Identify and provide examples of binary psychological outcomes.
2. Understand why a standard LM is not appropriate for binary data.
3. Fit and interpret a logistic model.

---
# Topics for today

+ More logistic regression
+ The effects of individual IVs
+ Model selection
+ Other issues

---
# Our data and model

+ Imagine we're interested in predicting hiring decisions.
+ We collect data on n = 242 job-seekers
  + Age
  + Effort put into job application
+ Our variables:
  + DV: `work` (0 = did not get job; 1 = did get job)
  + IV1: `age` (in years)
  + IV2: `msrch` (effort put into job application; 0 = low effort, 1 = high effort)

```r
m2 <- glm(work ~ age + msrch, data = hire, family = "binomial")
summary(m2)
```

---
# Job-seeking example

```
## 
## Call:
## glm(formula = work ~ age + msrch, family = "binomial", data = hire)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8834  -1.0496   0.6436   0.9204   2.0589  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  4.52505    1.56972   2.883 0.003943 ** 
## age         -0.11848    0.03214  -3.687 0.000227 ***
## msrch1       1.68335    0.33446   5.033 4.83e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 335.22  on 241  degrees of freedom
## Residual deviance: 285.36  on 239  degrees of freedom
## AIC: 291.36
## 
## Number of Fisher Scoring iterations: 4
```

---
# Model equation: job-seeking example

+ Below we have the general form with two `\(x\)`'s:

`$$P(y_i) = \frac{1}{1+e^{-(b_0 + b_1x_1 + b_2x_2)}}$$`

+ And we can insert the estimates from our model results on the previous slide:

`$$P(y_i) = \frac{1}{1+e^{-(4.525 - 0.118age + 1.683msrch)}}$$`

+ The next slide plugs some example values into this equation
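---
# Using the model equation

+ As a quick illustration (not part of the original example), we can plug values for a hypothetical applicant into the equation above
+ The sketch below uses the rounded coefficients from the previous slide and base R's `plogis()` function, which computes `\(1/(1+e^{-x})\)`; the applicant's age (30) and effort level (high) are chosen purely for illustration

```r
# predicted log odds for a hypothetical applicant: age 30, high effort (msrch = 1)
log_odds <- 4.525 - 0.118*30 + 1.683*1

# convert the log odds to a predicted probability of a job offer
plogis(log_odds)   # a high predicted probability (around 0.93-0.94)
```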
---
# Interpreting logistic model coefficients

+ In linear regression, the `\(b\)` coefficient for each IV gives the change in `\(Y\)` for every unit increase in `\(X\)` (holding other IVs constant)

+ In logistic regression, the `\(b\)` coefficient for each IV gives the **change in the log odds of `\(Y\)` for every unit increase in `\(X\)`** (holding other IVs constant)

---
# What are log odds?

`\(b\)` = **the change in log odds of `\(Y\)` for every unit increase in `\(X\)`**

+ The odds of an event occurring (e.g., a job offer; Y = 1) are defined as the ratio of the probability of the event occurring to the probability of the event not occurring:

`$$odds = \frac{P(Y=1)}{1-P(Y=1)}$$`

+ `\(P(Y=1)\)` is the same as the `\(P(y_i)\)` calculated by the logistic regression model

+ Think of a coin toss:
  + Probability of tails = 0.5
  + Probability of not tails = 0.5
  + Odds of tails = 0.5/0.5 = 1

---
# What are log odds?

`\(b\)` = **the change in log odds of `\(Y\)` for every unit increase in `\(X\)`**

+ Log odds are then the natural logarithm of the odds:

`$$\text{log odds} = \ln \left( \frac{P(Y=1)}{1-P(Y=1)} \right)$$`

---
# Probabilities, odds and log-odds

```r
# tibble(), %>% and mutate() come from the tidyverse
tibble(
  Probs = seq(0.1, 0.9, 0.1)
) %>%
  mutate(
    Odds = round(Probs/(1-Probs), 2),
    Logits = round(log(Odds), 2)
  )
```

```
## # A tibble: 9 x 3
##   Probs  Odds Logits
##   <dbl> <dbl>  <dbl>
## 1   0.1  0.11  -2.21
## 2   0.2  0.25  -1.39
## 3   0.3  0.43  -0.84
## 4   0.4  0.67  -0.4 
## 5   0.5  1      0   
## 6   0.6  1.5    0.41
## 7   0.7  2.33   0.85
## 8   0.8  4      1.39
## 9   0.9  9      2.2 
```

---
# For our job-seekers example

+ For every additional year of age, the log odds of a job offer decrease by 0.118

+ Those who showed high effort in their application have log odds of a job offer 1.683 higher than those who showed low effort

---
# Odds ratio

+ Log odds don't provide an easily interpretable way of understanding how the DV changes with the IVs

+ The `\(b\)` coefficients from logistic regression are therefore often converted to odds ratios
  + Odds ratios are a bit easier to interpret

+ Odds ratios are obtained by exponentiating the `\(b\)` coefficients
  + In R, we exponentiate coefficients using the `exp()` function

---
# Exponentiating `\(b\)` coefficients

```r
exp(coef(m2))
```

```
## (Intercept)         age      msrch1 
##  92.3001400   0.8882662   5.3835809
```

---
# Interpreting odds ratios

+ When the coefficients are converted to odds ratios, they represent the **change in odds with a unit increase in X**
  + Specifically, the *ratio of odds* at X = x and X = x + 1

+ An odds ratio of 1 indicates no effect
+ An odds ratio < 1 indicates a negative effect
+ An odds ratio > 1 indicates a positive effect

---
# Interpreting odds ratios

```r
exp(coef(m2))
```

```
## (Intercept)         age      msrch1 
##  92.3001400   0.8882662   5.3835809
```

+ For every additional year of `age`, the odds of being hired change by a factor of 0.89 (i.e., they decrease)
+ For those who put high effort into their applications, the odds of being hired increase by a factor of 5.38

---
class: center, middle

# Time for a break

---
class: center, middle

# Welcome Back!

**Now let's look at the significance of predictors**

---
# Statistical significance of predictors

+ We can also evaluate the statistical significance of the predictors

+ To do this we can use a `\(z\)`-test:

`$$z = \frac{b}{SE(b)}$$`

+ However, we should be aware that the `\(z\)`-test is a little prone to Type II errors
  + We can supplement it using model selection procedures (see later)

+ The `\(z\)`-test and associated `\(p\)`-value are provided as part of the summary output for `glm()`

---
# The z-test

```
## 
## Call:
## glm(formula = work ~ age + msrch, family = "binomial", data = hire)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8834  -1.0496   0.6436   0.9204   2.0589  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  4.52505    1.56972   2.883 0.003943 ** 
## age         -0.11848    0.03214  -3.687 0.000227 ***
## msrch1       1.68335    0.33446   5.033 4.83e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 335.22  on 241  degrees of freedom
## Residual deviance: 285.36  on 239  degrees of freedom
## AIC: 291.36
## 
## Number of Fisher Scoring iterations: 4
```
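---
# The z-test by hand

+ As a quick check (not part of the original slides), the `z value` column above is just each estimate divided by its standard error
+ A minimal sketch using the coefficient table returned by `summary()`:

```r
# extract the coefficient table (columns: Estimate, Std. Error, z value, Pr(>|z|))
coefs <- summary(m2)$coefficients

# z = b / SE(b), matching the z value column above
z <- coefs[, "Estimate"] / coefs[, "Std. Error"]
z

# two-sided p-values from the standard normal distribution
2 * pnorm(abs(z), lower.tail = FALSE)
```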
---
# Confidence intervals

+ We can also compute confidence intervals for our coefficients and the associated odds ratios

+ For odds ratios, a value of 1 = no effect

+ The question is therefore whether the confidence interval includes 1 or not

---
# 95% confidence intervals for our job-seekers example

.pull-left[
+ We can use the `confint()` function to compute confidence intervals

+ We can embed this in the `exp()` function to convert our coefficients to odds ratios

+ Neither 95% CI includes 1; therefore, both predictors are significant at `\(p\)` < .05
]

.pull-right[

```r
exp(confint(m2))
```

```
## Waiting for profiling to be done...
```

```
##                 2.5 %       97.5 %
## (Intercept) 4.3974299 2107.9649011
## age         0.8328194    0.9449901
## msrch1      2.8472535   10.6300211
```
]

---
# Model selection

+ Just as in linear regression, we can compare logistic models differing in their predictors to choose the best-fitting model

+ Methods we can use:
  + Likelihood ratio test
  + AIC
  + BIC

---
# Likelihood ratio test

+ We already encountered this when we compared our model to a baseline model with no predictors

+ We can compare any set of **nested** models using the likelihood ratio test
  + Including models differing in one predictor
  + This tests the statistical significance of the effect of that predictor
  + Provides an alternative to the z-test

---
# Likelihood ratio test in R

```r
m_null <- glm(work ~ 1, data = hire, family = "binomial")
m_age  <- glm(work ~ age, data = hire, family = "binomial")
m_full <- glm(work ~ age + msrch, data = hire, family = "binomial")

anova(m_null, m_age, m_full, test = "Chisq")
```

```
## Analysis of Deviance Table
## 
## Model 1: work ~ 1
## Model 2: work ~ age
## Model 3: work ~ age + msrch
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1       241     335.22                          
## 2       240     313.98  1   21.242 4.047e-06 ***
## 3       239     285.36  1   28.616 8.826e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

---
# AIC and BIC

+ We met AIC and BIC in the model selection section for linear regression

+ They can be used to compare either nested or non-nested models

+ Smaller values of AIC and BIC indicate better-fitting models

+ In the context of regression, BIC penalises extra predictors more heavily
  + BIC differences > 10 indicate that one model is better than another to a practically significant extent

---
# AIC and BIC in R

```r
AIC(m_null, m_age, m_full)
```

```
##        df      AIC
## m_null  1 337.2187
## m_age   2 317.9762
## m_full  3 291.3604
```

```r
BIC(m_null, m_age, m_full)
```

```
##        df      BIC
## m_null  1 340.7077
## m_age   2 324.9541
## m_full  3 301.8273
```

---
# Summary of today

+ Logistic regression coefficients are converted to odds ratios to make them more interpretable

+ Odds ratios tell us how the odds of the event change with a unit increase in X
  + 1 is no effect
  + Less than 1 is a negative effect
  + More than 1 is a positive effect

+ Statistical significance of predictors can be assessed via:
  + The z-test
  + Confidence intervals
  + The likelihood ratio test

+ Model selection uses:
  + The likelihood ratio test
  + AIC and BIC
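---
# Bonus: odds ratios as ratios of odds

+ A small sketch (not in the original materials) illustrating the earlier point that an odds ratio is the ratio of odds at `\(X = x\)` and `\(X = x + 1\)`
+ The two ages (30 and 31) and the high-effort value are chosen purely for illustration

```r
b <- coef(m2)

# model-implied log odds of a job offer for two hypothetical high-effort
# applicants who differ by one year of age
log_odds_30 <- b["(Intercept)"] + b["age"]*30 + b["msrch1"]*1
log_odds_31 <- b["(Intercept)"] + b["age"]*31 + b["msrch1"]*1

# the ratio of the two odds equals the odds ratio for age, exp(b["age"]) ~ 0.89
exp(log_odds_31) / exp(log_odds_30)
exp(b["age"])
```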