class: center, middle, inverse, title-slide

# Binary Logistic Model
## Data Analysis for Psychology in R 2
### dapR2 Team
### Department of Psychology, The University of Edinburgh
### AY 2020-2021
---
# Week's Learning Objectives

1. Identify and provide examples of binary psychological outcomes.
2. Understand why a standard LM is not appropriate for binary data.
3. Fit and interpret a logistic model.

---
# Topics for today

+ More logistic regression
+ The effects of individual IVs
+ Model selection
+ Other issues

---
# Our data and model

+ Imagine we're interested in predicting hiring decisions.
+ We collect data on n = 242 job-seekers
  + Age
  + Effort put into job application
+ Our variables:
  + DV: `work` (0 = did not get job; 1 = did get job)
  + IV1: `age` (in years)
  + IV2: `msrch` (effort put into job application; 0 = low effort, 1 = high effort)

```r
m2 <- glm(work ~ age + msrch, data = hire, family = "binomial")
summary(m2)
```

---
# Job-seeking example

```
## 
## Call:
## glm(formula = work ~ age + msrch, family = "binomial", data = hire)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8834  -1.0496   0.6436   0.9204   2.0589  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  4.52505    1.56972   2.883 0.003943 ** 
## age         -0.11848    0.03214  -3.687 0.000227 ***
## msrch1       1.68335    0.33446   5.033 4.83e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 335.22  on 241  degrees of freedom
## Residual deviance: 285.36  on 239  degrees of freedom
## AIC: 291.36
## 
## Number of Fisher Scoring iterations: 4
```

---
# Model equation: job-seeking example

+ Below we have the general form with two `\(x\)`'s:

`$$P(y_i) = \frac{1}{1+e^{-(b_0 + b_1x_1 + b_2x_2)}}$$`

+ And we can insert the estimates from our model results on the previous slide:

`$$P(y_i) = \frac{1}{1+e^{-(4.525 - 0.118age + 1.683msrch)}}$$`

+ The next slide plugs some example values into this equation
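---
# Using the model equation

+ As a quick illustration (not part of the original example), we can plug values for a hypothetical applicant into the equation above
+ The sketch below uses the rounded coefficients from the previous slide and base R's `plogis()` function, which computes `\(1/(1+e^{-x})\)`; the applicant's age (30) and effort level (high) are chosen purely for illustration

```r
# predicted log odds for a hypothetical applicant: age 30, high effort (msrch = 1)
log_odds <- 4.525 - 0.118*30 + 1.683*1

# convert the log odds to a predicted probability of a job offer
plogis(log_odds)   # a high predicted probability (around 0.93-0.94)
```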
---
# Interpreting logistic model coefficients

+ In linear regression, the `\(b\)` coefficient for each IV gives the change in `\(Y\)` for every unit increase in `\(X\)` (holding other IVs constant)

+ In logistic regression, the `\(b\)` coefficient for each IV gives the **change in the log odds of `\(Y\)` for every unit increase in `\(X\)`** (holding other IVs constant)

---
# What are log odds?

`\(b\)` = **the change in log odds of `\(Y\)` for every unit increase in `\(X\)`**

+ The odds of an event occurring (e.g., a job offer; Y = 1) are defined as the ratio of the probability of the event occurring to the probability of the event not occurring:

`$$odds = \frac{P(Y=1)}{1-P(Y=1)}$$`

+ `\(P(Y=1)\)` is the same as the `\(P(y_i)\)` calculated by the logistic regression model

+ Think of a coin toss:
  + Probability of tails = 0.5
  + Probability of not tails = 0.5
  + Odds of tails = 0.5/0.5 = 1

---
# What are log odds?

`\(b\)` = **the change in log odds of `\(Y\)` for every unit increase in `\(X\)`**

+ Log odds are then the natural logarithm of the odds:

`$$\text{log odds} = \ln \left( \frac{P(Y=1)}{1-P(Y=1)} \right)$$`

---
# Probabilities, odds and log-odds

```r
# tibble(), %>% and mutate() come from the tidyverse
tibble(
  Probs = seq(0.1, 0.9, 0.1)
) %>%
  mutate(
    Odds = round(Probs/(1-Probs), 2),
    Logits = round(log(Odds), 2)
  )
```

```
## # A tibble: 9 x 3
##   Probs  Odds Logits
##   <dbl> <dbl>  <dbl>
## 1   0.1  0.11  -2.21
## 2   0.2  0.25  -1.39
## 3   0.3  0.43  -0.84
## 4   0.4  0.67  -0.4 
## 5   0.5  1      0   
## 6   0.6  1.5    0.41
## 7   0.7  2.33   0.85
## 8   0.8  4      1.39
## 9   0.9  9      2.2 
```

---
# For our job-seekers example

+ For every additional year of age, the log odds of a job offer decrease by 0.118

+ Those who showed high effort in their application have log odds of a job offer 1.683 higher than those who showed low effort

---
# Odds ratio

+ Log odds don't provide an easily interpretable way of understanding how the DV changes with the IVs

+ The `\(b\)` coefficients from logistic regression are therefore often converted to odds ratios
  + Odds ratios are a bit easier to interpret

+ Odds ratios are obtained by exponentiating the `\(b\)` coefficients
  + In R, we exponentiate coefficients using the `exp()` function

---
# Exponentiating `\(b\)` coefficients

```r
exp(coef(m2))
```

```
## (Intercept)         age      msrch1 
##  92.3001400   0.8882662   5.3835809
```

---
# Interpreting odds ratios

+ When the coefficients are converted to odds ratios, they represent the **change in odds with a unit increase in X**
  + Specifically, the *ratio of odds* at X = x and X = x + 1

+ An odds ratio of 1 indicates no effect
+ An odds ratio < 1 indicates a negative effect
+ An odds ratio > 1 indicates a positive effect

---
# Interpreting odds ratios

```r
exp(coef(m2))
```

```
## (Intercept)         age      msrch1 
##  92.3001400   0.8882662   5.3835809
```

+ For every additional year of `age`, the odds of being hired change by a factor of 0.89 (i.e., they decrease)
+ For those who put high effort into their applications, the odds of being hired increase by a factor of 5.38

---
class: center, middle

# Time for a break

---
class: center, middle

# Welcome Back!

**Now let's look at the significance of predictors**

---
# Statistical significance of predictors

+ We can also evaluate the statistical significance of the predictors

+ To do this we can use a `\(z\)`-test:

`$$z = \frac{b}{SE(b)}$$`

+ However, we should be aware that the `\(z\)`-test is a little prone to Type II errors
  + We can supplement it using model selection procedures (see later)

+ The `\(z\)`-test and associated `\(p\)`-value are provided as part of the summary output for `glm()`

---
# The z-test

```
## 
## Call:
## glm(formula = work ~ age + msrch, family = "binomial", data = hire)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8834  -1.0496   0.6436   0.9204   2.0589  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  4.52505    1.56972   2.883 0.003943 ** 
## age         -0.11848    0.03214  -3.687 0.000227 ***
## msrch1       1.68335    0.33446   5.033 4.83e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 335.22  on 241  degrees of freedom
## Residual deviance: 285.36  on 239  degrees of freedom
## AIC: 291.36
## 
## Number of Fisher Scoring iterations: 4
```
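---
# The z-test by hand

+ As a quick check (not part of the original slides), the `z value` column above is just each estimate divided by its standard error
+ A minimal sketch using the coefficient table returned by `summary()`:

```r
# extract the coefficient table (columns: Estimate, Std. Error, z value, Pr(>|z|))
coefs <- summary(m2)$coefficients

# z = b / SE(b), matching the z value column above
z <- coefs[, "Estimate"] / coefs[, "Std. Error"]
z

# two-sided p-values from the standard normal distribution
2 * pnorm(abs(z), lower.tail = FALSE)
```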
---
# Confidence intervals

+ We can also compute confidence intervals for our coefficients and the associated odds ratios

+ For odds ratios, a value of 1 = no effect

+ The question is therefore whether the confidence interval includes 1 or not

---
# 95% confidence intervals for our job-seekers example

.pull-left[
+ We can use the `confint()` function to compute confidence intervals

+ We can embed this in the `exp()` function to convert our coefficients to odds ratios

+ Neither 95% CI includes 1; therefore, both predictors are significant at `\(p\)` < .05
]

.pull-right[

```r
exp(confint(m2))
```

```
## Waiting for profiling to be done...
```

```
##                 2.5 %       97.5 %
## (Intercept) 4.3974299 2107.9649011
## age         0.8328194    0.9449901
## msrch1      2.8472535   10.6300211
```
]

---
# Model selection

+ Just as in linear regression, we can compare logistic models differing in their predictors to choose the best-fitting model

+ Methods we can use:
  + Likelihood ratio test
  + AIC
  + BIC

---
# Likelihood ratio test

+ We already encountered this when we compared our model to a baseline model with no predictors

+ We can compare any set of **nested** models using the likelihood ratio test
  + Including models differing in one predictor
  + This tests the statistical significance of the effect of that predictor
  + Provides an alternative to the z-test

---
# Likelihood ratio test in R

```r
m_null <- glm(work ~ 1, data = hire, family = "binomial")
m_age  <- glm(work ~ age, data = hire, family = "binomial")
m_full <- glm(work ~ age + msrch, data = hire, family = "binomial")

anova(m_null, m_age, m_full, test = "Chisq")
```

```
## Analysis of Deviance Table
## 
## Model 1: work ~ 1
## Model 2: work ~ age
## Model 3: work ~ age + msrch
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1       241     335.22                          
## 2       240     313.98  1   21.242 4.047e-06 ***
## 3       239     285.36  1   28.616 8.826e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

---
# AIC and BIC

+ We met AIC and BIC in the model selection section for linear regression

+ They can be used to compare either nested or non-nested models

+ Smaller values of AIC and BIC indicate better-fitting models

+ In the context of regression, BIC penalises extra predictors more heavily
  + BIC differences > 10 indicate that one model is better than another to a practically significant extent

---
# AIC and BIC in R

```r
AIC(m_null, m_age, m_full)
```

```
##        df      AIC
## m_null  1 337.2187
## m_age   2 317.9762
## m_full  3 291.3604
```

```r
BIC(m_null, m_age, m_full)
```

```
##        df      BIC
## m_null  1 340.7077
## m_age   2 324.9541
## m_full  3 301.8273
```

---
# Summary of today

+ Logistic regression coefficients are converted to odds ratios to make them more interpretable

+ Odds ratios tell us how the odds of the event change with a unit increase in X
  + 1 is no effect
  + Less than 1 is a negative effect
  + More than 1 is a positive effect

+ Statistical significance of predictors can be assessed via:
  + The z-test
  + Confidence intervals
  + The likelihood ratio test

+ Model selection uses:
  + The likelihood ratio test
  + AIC and BIC
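---
# Bonus: odds ratios as ratios of odds

+ A small sketch (not in the original materials) illustrating the earlier point that an odds ratio is the ratio of odds at `\(X = x\)` and `\(X = x + 1\)`
+ The two ages (30 and 31) and the high-effort value are chosen purely for illustration

```r
b <- coef(m2)

# model-implied log odds of a job offer for two hypothetical high-effort
# applicants who differ by one year of age
log_odds_30 <- b["(Intercept)"] + b["age"]*30 + b["msrch1"]*1
log_odds_31 <- b["(Intercept)"] + b["age"]*31 + b["msrch1"]*1

# the ratio of the two odds equals the odds ratio for age, exp(b["age"]) ~ 0.89
exp(log_odds_31) / exp(log_odds_30)
exp(b["age"])
```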