Chi-Square Tests

.title[
# <b>Chi-Square Tests</b>
]
.subtitle[
## Data Analysis for Psychology in R 1
]
.author[
### DapR1 Team
]
.institute[
### Department of Psychology<br/>The University of Edinburgh
]

---

# Week's Learning Objectives

1. Understand the difference between the `$\chi^2$` goodness-of-fit test and the `$\chi^2$` test of independence

2. Perform a `$\chi^2$` goodness-of-fit test and interpret results

3. Perform a `$\chi^2$` test of independence and interpret results

4. Understand the assumptions for `$\chi^2$` tests

---

# Part 1
## Introduction to `$\chi^2$`

---

# Moving on from `$t$`-tests...

+ `$t$`-tests have allowed you to make comparisons using *continuous* data:

  + A continuous outcome variable from two separate groups (independent-samples `$t$`-test)
  + A continuous outcome variable from one group at two timepoints (paired-samples `$t$`-test)
  + One continuous variable against a single value (one-sample `$t$`-test)

+ You may instead want to test whether data are distributed across *categories* in the way that you would expect:

  + Is your sample distributed equally across levels of education?
  + Is smoking (Y/N) associated with cardiovascular disease (Y/N)? 
  + Do sharks prefer to eat humans or fish?

+ In this case, you will need a test that checks whether data are grouped according to your expectations.

+ `$\chi^2$`-tests are used to compare **frequencies** across categories in your data

---
# `$\chi^2$`-tests vs `$t$`-tests

+ As with a `$t$`-test, you:
    1. Compute a test statistic
    2. Locate the test statistic on a distribution that gives the probability of each test statistic value, given that `$H_0$` is true
    3. Consider your results significant if the probability associated with your test statistic is small enough

+ As with the `$t$`-distribution, the shape of the `$\chi^2$` distribution depends on the degrees of freedom

+ Unlike in a `$t$`-test, *df* in a `$\chi^2$` test is based on the number of categories in your data rather than on the sample size

.pull-left[
.center[
**`$t$` Distribution**
![](dapR1_lec19_Chisquare_files/figure-html/unnamed-chunk-1-1.svg)
]
]

.pull-right[
.center[
**`$\chi^2$` Distribution**
![](dapR1_lec19_Chisquare_files/figure-html/unnamed-chunk-2-1.svg)

]
]

---
# `$\chi^2$` distribution

.pull-left[

+ As *df* increases, larger `$\chi^2$` values become more probable
  + A wider range of `$\chi^2$` values becomes more likely

+ The `$\chi^2$` distribution begins at 0

+ Categorical variables don't have direction
  + We can investigate this further by looking at the `$\chi^2$` formula
]

.pull-right[
.center[
**`$\chi^2$` Distribution**
![](dapR1_lec19_Chisquare_files/figure-html/unnamed-chunk-3-1.svg)

]
]
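
One way to see how the distribution shifts with *df* is to look at the upper 5% cut-off for a few *df* values. This is a minimal sketch; the *df* values and the 5% level are chosen only for illustration.

```r
# Upper 5% cut-off of the chi-square distribution for several df values
# (the df values below are illustrative, not taken from the lecture data)
dfs <- c(1, 3, 6, 10)
critVals <- qchisq(0.95, df = dfs)

# The cut-off moves to the right as df increases
round(critVals, 2)
```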

---

# The basic `$\chi^2$` formula

.center.f2[
`$\chi^2 = \Sigma \frac{(O-E)^2}{E}$`
]

+ `$\Sigma$` = sum up

+ `$E$` = Expected Cases
  + The values that you expect, given `$H_0$` is true
  
+ `$O$` = Observed Cases
  + The values you actually have
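
The formula can be tried on a toy example. The counts below are hypothetical and only show how the pieces fit together.

```r
# Hypothetical counts for three categories (not from the lecture data)
observed <- c(18, 22, 20)
expected <- c(20, 20, 20)   # what we would expect if H0 is true

# Squared difference in each category, divided by the expected count, summed
chiSq <- sum((observed - expected)^2 / expected)
chiSq
```

Because each difference is squared, the statistic cannot be negative, whichever direction the observed counts depart from the expected ones.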

---

# Assumptions of `$\chi^2$` tests

+ Sufficiently large `$n$` so that the sampling distribution of the test statistic is well approximated by the `$\chi^2$` distribution
  + What is 'sufficiently large' will depend on the number of cells you have.

+ Expected cases > 5

+ Observations are independent

+ Each observation appears only in a single cell.
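
The expected-count assumption can be checked before running a test. A minimal sketch, assuming hypothetical counts in four categories and equal expected proportions:

```r
# Hypothetical observed counts in four categories
observed <- c(12, 9, 4, 5)

# Expected counts if all four categories are equally likely under H0
expected <- sum(observed) * rep(1 / 4, 4)
expected

# TRUE if every expected count is above 5
all(expected > 5)
```

Note that the check is on the expected counts, not the observed ones: an observed count of 4 is fine here because its expected count is 7.5.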

---

# Types of `$\chi^2$` tests

+ Goodness of Fit

+ Test of Independence

---

# Part 2 
## `$\chi^2$` Goodness of Fit test

---

# `$\chi^2$` Goodness of Fit test

.pull-left[

+ Looks at the distribution of data across the levels of a single categorical variable

+ **Hypotheses:**

  + `$H_0: p_1 = p_{1,0},\ p_2 = p_{2,0},\ ...,\ p_C = p_{C,0}$`
  
  + `$H_1:$` Some `$p_i \neq p_{i,0}$`
]

---
count: false

# `$\chi^2$` Goodness of Fit test
.pull-left[
+ Tests whether the values you actually have are consistent with the values you expect.

+ Looks at the distribution of data across the levels of a single categorical variable

+ **Hypotheses:**

  + `$H_0: p_1 = p_{1,0},\ p_2 = p_{2,0},\ ...,\ p_C = p_{C,0}$`
  
  + `$H_1:$` Some `$p_i \neq p_{i,0}$`

]

.pull-right[
.pull-left.center[ **Expected Values**

<img src="figures/chiSqGoF_Exp.png" width="80%" />
]

.pull-right.center[ **Observed Values**

<img src="figures/chiSqGoF_Obs.png" width="80%" />
]
]

---

# `$\chi^2$` Goodness of Fit test

.center.f3[ 
`$\chi^2 = \sum\limits_{i=1}^k \frac{(O_i - E_i)^2}{E_i}$`
]

+ `$\sum\limits_{i=1}^k$` : Sum all values from levels 1 through k

+ `$i$` : Current level

---

# Performing a `$\chi^2$` Goodness of Fit test

.pull-left[

+ A new flower shop is trying to decide which days of the week they will be open

+ They want to know whether order number is consistent across days of the week

+ They count the total number of orders they take each day of the week over the course of a month

]

.pull-right.center[
<br>
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Day </th>
   <th style="text-align:right;"> Orders </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Monday </td>
   <td style="text-align:right;"> 54 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Tuesday </td>
   <td style="text-align:right;"> 39 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Wednesday </td>
   <td style="text-align:right;"> 44 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Thursday </td>
   <td style="text-align:right;"> 47 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Friday </td>
   <td style="text-align:right;"> 68 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Saturday </td>
   <td style="text-align:right;"> 72 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Sunday </td>
   <td style="text-align:right;"> 53 </td>
  </tr>
</tbody>
</table>
]
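
The code on the following slides refers to a data frame called `flowerDat` with columns `Day` and `Orders`. Its construction is not shown, so the sketch below is an assumption about how the table above could be entered in R.

```r
# Assumed structure of flowerDat: one row per day, Day stored as a factor
days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")

flowerDat <- data.frame(
  Day = factor(days, levels = days),          # keep days in calendar order
  Orders = c(54, 39, 44, 47, 68, 72, 53)      # counts from the table above
)

# Total number of orders (n)
sum(flowerDat$Orders)
```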

---
count: false

# Performing a `$\chi^2$` Goodness of Fit test

.pull-left[

+ A new flower shop is trying to decide which days of the week they will be open

+ They want to know whether order number is consistent across days of the week

+ They count the total number of orders they take each day of the week over the course of a month

+ `$H_0$`: Orders will be consistent throughout the week 
  + `$p_{Monday}=p_{Tuesday}=\cdots=p_{Sunday}$`
  
+ `$H_1$`: Orders will differ across the week
  + Some `$p_i \neq p_{i,0}$`
  
]

---

# Performing a `$\chi^2$` Goodness of Fit test

**Compute the test statistic**

.center.f3[
`$\chi^2 = \sum\limits_{i=1}^k \frac{(O_i-E_i)^2}{E_i}$`
]

+ `$E_i=n\cdot\ p_i$`

+ In this example, we expect each level to be approximately equal, so the expected proportion will be the same across levels.

---
count: false

# Performing a `$\chi^2$` Goodness of Fit test

**Compute the test statistic**

.center.f3[
`$\chi^2 = \sum\limits_{i=1}^k \frac{(O_i-\color{#BF1932}{E_i})^2}{\color{#BF1932}{E_i}}$`
]

+ `$E_i=n\cdot\ p_i$`

+ In this example, we expect each level to be approximately equal, so the expected proportion will be the same across levels.

```r
# Expected count per day under H0: total orders split equally across the days
exVal <- sum(flowerDat$Orders)*(1/length(levels(flowerDat$Day)))

round(exVal, 2)
```

```
## [1] 53.86
```

---

# Performing a `$\chi^2$` Goodness of Fit test

**Compute the test statistic**

.center.f3[
`$\chi^2 = \sum\limits_{i=1}^k \frac{(O_i-\color{#BF1932}{E_i})^2}{\color{#BF1932}{E_i}}$`
]

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Day </th>
   <th style="text-align:right;"> Orders </th>
   <th style="text-align:right;"> Expected </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Monday </td>
   <td style="text-align:right;"> 54 </td>
   <td style="text-align:right;"> 53.86 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Tuesday </td>
   <td style="text-align:right;"> 39 </td>
   <td style="text-align:right;"> 53.86 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Wednesday </td>
   <td style="text-align:right;"> 44 </td>
   <td style="text-align:right;"> 53.86 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Thursday </td>
   <td style="text-align:right;"> 47 </td>
   <td style="text-align:right;"> 53.86 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Friday </td>
   <td style="text-align:right;"> 68 </td>
   <td style="text-align:right;"> 53.86 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Saturday </td>
   <td style="text-align:right;"> 72 </td>
   <td style="text-align:right;"> 53.86 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Sunday </td>
   <td style="text-align:right;"> 53 </td>
   <td style="text-align:right;"> 53.86 </td>
  </tr>
</tbody>
</table>

---

# Performing a `$\chi^2$` Goodness of Fit test

**Compute the test statistic**

.center.f3[
`$\chi^2 = \sum\limits_{i=1}^k \frac{(\color{#BF1932}{O_i - E_i})^2}{E_i}$`
]

---

# Performing a `$\chi^2$` Goodness of Fit test

**Compute the test statistic**

.center.f3[
`$\chi^2 = \sum\limits_{i=1}^k \frac{(O_i - E_i)\color{#BF1932}{^2}}{E_i}$`
]

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Day </th>
   <th style="text-align:right;"> Orders </th>
   <th style="text-align:right;"> Expected </th>
   <th style="text-align:right;"> Difference </th>
   <th style="text-align:right;"> Squared </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Monday </td>
   <td style="text-align:right;"> 54 </td>
   <td style="text-align:right;"> 53.86 </td>
   <td style="text-align:right;"> 0.14 </td>
   <td style="text-align:right;"> 0.02 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Tuesday </td>
   <td style="text-align:right;"> 39 </td>
   <td style="text-align:right;"> 53.86 </td>
   <td style="text-align:right;"> -14.86 </td>
   <td style="text-align:right;"> 220.73 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Wednesday </td>
   <td style="text-align:right;"> 44 </td>
   <td style="text-align:right;"> 53.86 </td>
   <td style="text-align:right;"> -9.86 </td>
   <td style="text-align:right;"> 97.16 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Thursday </td>
   <td style="text-align:right;"> 47 </td>
   <td style="text-align:right;"> 53.86 </td>
   <td style="text-align:right;"> -6.86 </td>
   <td style="text-align:right;"> 47.02 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Friday </td>
   <td style="text-align:right;"> 68 </td>
   <td style="text-align:right;"> 53.86 </td>
   <td style="text-align:right;"> 14.14 </td>
   <td style="text-align:right;"> 200.02 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Saturday </td>
   <td style="text-align:right;"> 72 </td>
   <td style="text-align:right;"> 53.86 </td>
   <td style="text-align:right;"> 18.14 </td>
   <td style="text-align:right;"> 329.16 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Sunday </td>
   <td style="text-align:right;"> 53 </td>
   <td style="text-align:right;"> 53.86 </td>
   <td style="text-align:right;"> -0.86 </td>
   <td style="text-align:right;"> 0.73 </td>
  </tr>
</tbody>
</table>

---

# Performing a `$\chi^2$` Goodness of Fit test

**Compute the test statistic**

.center.f3[
`$\chi^2 = \sum\limits_{i=1}^k \color{#BF1932}{\frac{(O_i - E_i)^2}{E_i}}$`
]

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Day </th>
   <th style="text-align:right;"> Orders </th>
   <th style="text-align:right;"> Expected </th>
   <th style="text-align:right;"> Difference </th>
   <th style="text-align:right;"> Squared </th>
   <th style="text-align:right;"> SqbyExp </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Monday </td>
   <td style="text-align:right;"> 54 </td>
   <td style="text-align:right;"> 53.86 </td>
   <td style="text-align:right;"> 0.14 </td>
   <td style="text-align:right;"> 0.02 </td>
   <td style="text-align:right;"> 0.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Tuesday </td>
   <td style="text-align:right;"> 39 </td>
   <td style="text-align:right;"> 53.86 </td>
   <td style="text-align:right;"> -14.86 </td>
   <td style="text-align:right;"> 220.73 </td>
   <td style="text-align:right;"> 4.10 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Wednesday </td>
   <td style="text-align:right;"> 44 </td>
   <td style="text-align:right;"> 53.86 </td>
   <td style="text-align:right;"> -9.86 </td>
   <td style="text-align:right;"> 97.16 </td>
   <td style="text-align:right;"> 1.80 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Thursday </td>
   <td style="text-align:right;"> 47 </td>
   <td style="text-align:right;"> 53.86 </td>
   <td style="text-align:right;"> -6.86 </td>
   <td style="text-align:right;"> 47.02 </td>
   <td style="text-align:right;"> 0.87 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Friday </td>
   <td style="text-align:right;"> 68 </td>
   <td style="text-align:right;"> 53.86 </td>
   <td style="text-align:right;"> 14.14 </td>
   <td style="text-align:right;"> 200.02 </td>
   <td style="text-align:right;"> 3.71 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Saturday </td>
   <td style="text-align:right;"> 72 </td>
   <td style="text-align:right;"> 53.86 </td>
   <td style="text-align:right;"> 18.14 </td>
   <td style="text-align:right;"> 329.16 </td>
   <td style="text-align:right;"> 6.11 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Sunday </td>
   <td style="text-align:right;"> 53 </td>
   <td style="text-align:right;"> 53.86 </td>
   <td style="text-align:right;"> -0.86 </td>
   <td style="text-align:right;"> 0.73 </td>
   <td style="text-align:right;"> 0.01 </td>
  </tr>
</tbody>
</table>

---

# Performing a `$\chi^2$` Goodness of Fit test

**Compute the test statistic**

.center.f3[
`$\chi^2 = \color{#BF1932}{\sum\limits_{i=1}^k} \frac{(O_i - E_i)^2}{E_i}=$` 16.62
]

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Day </th>
   <th style="text-align:right;"> Orders </th>
   <th style="text-align:right;"> Expected </th>
   <th style="text-align:right;"> Difference </th>
   <th style="text-align:right;"> Squared </th>
   <th style="text-align:right;"> SqbyExp </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Monday </td>
   <td style="text-align:right;"> 54 </td>
   <td style="text-align:right;"> 53.86 </td>
   <td style="text-align:right;"> 0.14 </td>
   <td style="text-align:right;"> 0.02 </td>
   <td style="text-align:right;color: #BF1932 !important;"> 0.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Tuesday </td>
   <td style="text-align:right;"> 39 </td>
   <td style="text-align:right;"> 53.86 </td>
   <td style="text-align:right;"> -14.86 </td>
   <td style="text-align:right;"> 220.73 </td>
   <td style="text-align:right;color: #BF1932 !important;"> 4.10 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Wednesday </td>
   <td style="text-align:right;"> 44 </td>
   <td style="text-align:right;"> 53.86 </td>
   <td style="text-align:right;"> -9.86 </td>
   <td style="text-align:right;"> 97.16 </td>
   <td style="text-align:right;color: #BF1932 !important;"> 1.80 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Thursday </td>
   <td style="text-align:right;"> 47 </td>
   <td style="text-align:right;"> 53.86 </td>
   <td style="text-align:right;"> -6.86 </td>
   <td style="text-align:right;"> 47.02 </td>
   <td style="text-align:right;color: #BF1932 !important;"> 0.87 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Friday </td>
   <td style="text-align:right;"> 68 </td>
   <td style="text-align:right;"> 53.86 </td>
   <td style="text-align:right;"> 14.14 </td>
   <td style="text-align:right;"> 200.02 </td>
   <td style="text-align:right;color: #BF1932 !important;"> 3.71 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Saturday </td>
   <td style="text-align:right;"> 72 </td>
   <td style="text-align:right;"> 53.86 </td>
   <td style="text-align:right;"> 18.14 </td>
   <td style="text-align:right;"> 329.16 </td>
   <td style="text-align:right;color: #BF1932 !important;"> 6.11 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Sunday </td>
   <td style="text-align:right;"> 53 </td>
   <td style="text-align:right;"> 53.86 </td>
   <td style="text-align:right;"> -0.86 </td>
   <td style="text-align:right;"> 0.73 </td>
   <td style="text-align:right;color: #BF1932 !important;"> 0.01 </td>
  </tr>
</tbody>
</table>
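
The table can be rebuilt column by column. This is a sketch that re-enters the order counts directly; the object name `stepTab` is illustrative and not part of the lecture code.

```r
# Order counts from the flower shop example
orders <- c(Monday = 54, Tuesday = 39, Wednesday = 44, Thursday = 47,
            Friday = 68, Saturday = 72, Sunday = 53)

# One column per step of the formula
stepTab <- data.frame(
  Orders = orders,
  Expected = sum(orders) / length(orders)   # equal split under H0
)
stepTab$Difference <- stepTab$Orders - stepTab$Expected
stepTab$Squared <- stepTab$Difference^2
stepTab$SqbyExp <- stepTab$Squared / stepTab$Expected

round(stepTab, 2)

# The test statistic is the sum of the last column
chiSq <- sum(stepTab$SqbyExp)
round(chiSq, 2)
```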

---

# Performing a `$\chi^2$` Goodness of Fit test

**Find the test statistic on the distribution**

---
count: false

# Performing a `$\chi^2$` Goodness of Fit test

**Find the test statistic on the distribution**

.pull-left[

```r
# df = number of levels - 1
length(levels(flowerDat$Day))-1
```

```
## [1] 6
```
]

.pull-right[
![](dapR1_lec19_Chisquare_files/figure-html/unnamed-chunk-15-1.svg)
]

---
count: false

# Performing a `$\chi^2$` Goodness of Fit test

**Find the test statistic on the distribution**

.pull-left[

```r
# df = number of levels - 1
length(levels(flowerDat$Day))-1
```

```
## [1] 6
```

+ `$\chi^2 =$` 16.62
]

.pull-right[
![](dapR1_lec19_Chisquare_files/figure-html/unnamed-chunk-17-1.svg)
]
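
Locating the statistic on the distribution can also be done numerically by comparing it with a cut-off. The sketch below assumes `$\alpha = .05$`, which is not stated on the slide.

```r
chiSq <- 16.62   # test statistic from the flower shop example
dfGoF <- 6       # number of days minus 1

# Value that cuts off the upper 5% of the chi-square distribution with 6 df
# (alpha = .05 is an assumption made for this illustration)
critVal <- qchisq(0.95, df = dfGoF)
round(critVal, 2)

# TRUE if the statistic falls in the upper 5% tail
chiSq > critVal
```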

---

# Performing a `$\chi^2$` Goodness of Fit test

**Compute the probability of a score at least as extreme as the test statistic**

---
# Performing a `$\chi^2$` Goodness of Fit test

**Compute the probability of a score at least as extreme as the test statistic**

.pull-left[

```r
pchisq(sum(flowerDat$SqbyExp), 
       df = 6, 
       lower.tail = FALSE)
```

```
## [1] 0.01080571
```
+ If `$H_0$` is true, the probability of obtaining a `$\chi^2$` value at least as extreme as 16.62 is only about 0.011.

]
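
As a cross-check, base R's `chisq.test()` runs the same goodness-of-fit test in one call. By default it assumes equal proportions across levels, which matches `$H_0$` in this example. The object name `gofTest` is illustrative.

```r
# Order counts, Monday to Sunday
orders <- c(54, 39, 44, 47, 68, 72, 53)

# Goodness-of-fit test against equal proportions (the default for a vector)
gofTest <- chisq.test(orders)

gofTest$statistic   # chi-square statistic
gofTest$parameter   # degrees of freedom
gofTest$p.value     # upper-tail probability
```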

---

# Exploring our Results Further

+ If our results are significant, we are likely interested in knowing which levels within our category had the biggest differences.

+ We can get this information by looking at the Pearson residuals: each difference between observed and expected counts, scaled by the square root of the expected count.

+ `$\frac{O_i-E_i}{\sqrt{E_i}}$`

```r
# Pearson residual for the first level (Monday)
(flowerDat$Orders[1]-flowerDat$Expected[1])/sqrt(flowerDat$Expected[1])
```

```
## [1] 0.01946616
```

---

# Exploring our Results Further

.pull-left[
+ Positive residuals indicate that the frequency of the corresponding level is higher than expected

+ Negative residuals indicate that the frequency of the corresponding level is lower than expected

+ More extreme residuals indicate that the values are contributing more strongly to the results

  + Values `$\leq$` -2 indicate the frequency of that level is **much lower** than expected
  
  + Values `$\geq$` 2 indicate the frequency of that level is **much higher** than expected
]

.pull-right[
<br>
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Day </th>
   <th style="text-align:right;"> Orders </th>
   <th style="text-align:right;"> Residuals </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Monday </td>
   <td style="text-align:right;"> 54 </td>
   <td style="text-align:right;"> 0.02 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Tuesday </td>
   <td style="text-align:right;"> 39 </td>
   <td style="text-align:right;"> -2.02 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Wednesday </td>
   <td style="text-align:right;"> 44 </td>
   <td style="text-align:right;"> -1.34 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Thursday </td>
   <td style="text-align:right;"> 47 </td>
   <td style="text-align:right;"> -0.93 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Friday </td>
   <td style="text-align:right;"> 68 </td>
   <td style="text-align:right;"> 1.93 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Saturday </td>
   <td style="text-align:right;"> 72 </td>
   <td style="text-align:right;"> 2.47 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Sunday </td>
   <td style="text-align:right;"> 53 </td>
   <td style="text-align:right;"> -0.12 </td>
  </tr>
</tbody>
</table>
]
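
The residuals in the table can also be taken from the object returned by `chisq.test()`, whose `residuals` component holds the Pearson residuals. A minimal sketch, re-entering the order counts; the object names are illustrative.

```r
orders <- c(Monday = 54, Tuesday = 39, Wednesday = 44, Thursday = 47,
            Friday = 68, Saturday = 72, Sunday = 53)

gofTest <- chisq.test(orders)

# Pearson residuals: (observed - expected) / sqrt(expected)
pearsonRes <- gofTest$residuals
round(pearsonRes, 2)

# Days whose residual is at least 2 in absolute value
names(orders)[abs(pearsonRes) >= 2]
```
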
---

# Drawing Conclusions

---

# Part 3 
## `$\chi^2$` Test of Independence

---
# `$\chi^2$` Test of Independence

.pull-left[

+ Checks whether two categorical variables from a single population are independent of each other.

+ Specifically, tests whether membership in a level of Variable A depends on membership in a level of Variable B

+ **Hypotheses:**

  + `$H_0:$` Variable A is not associated with Variable B
  
  + `$H_1:$` Variable A is associated with Variable B

]

---
count: false

# `$\chi^2$` Test of Independence

.pull-left[

+ Checks whether two categorical variables from a single population are independent of each other.

+ Specifically, tests whether membership in a level of Variable A depends on membership in a level of Variable B

+ **Hypotheses:**

  + `$H_0:$` Variable A is not associated with Variable B
  
  + `$H_1:$` Variable A is associated with Variable B
  
]

.pull-right.center[

**Expected Values**

**Observed Values**

<img src="figures/chiSqToI_Obs.png" width="90%" />
]

---

# `$\chi^2$` Test of Independence

.center.f3[ 
`$\chi^2 = \sum\limits_{i=1}^r \sum\limits_{j=1}^c \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$`
]

+ `$i$` : current level within Variable A

+ `$r$` : total levels within A

+ `$j$` : current level within Variable B

+ `$c$` : total levels within B

---

# Performing a `$\chi^2$` Test of Independence

.pull-left[

+ The flower shop is trying to decide on their flower stock

+ They want to know whether the flower type that sells the best depends on the season

+ `$H_0$`: Flower orders will be independent of season
  
+ `$H_1$`: Flower orders will be dependent on season

]

.pull-right.center[
<br>
<table>
 <thead>
  <tr>
   <th style="text-align:left;">   </th>
   <th style="text-align:right;"> Lilies </th>
   <th style="text-align:right;"> Roses </th>
   <th style="text-align:right;"> Tulips </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Spring </td>
   <td style="text-align:right;"> 186 </td>
   <td style="text-align:right;"> 232 </td>
   <td style="text-align:right;"> 185 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Summer </td>
   <td style="text-align:right;"> 172 </td>
   <td style="text-align:right;"> 228 </td>
   <td style="text-align:right;"> 192 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Autumn </td>
   <td style="text-align:right;"> 168 </td>
   <td style="text-align:right;"> 219 </td>
   <td style="text-align:right;"> 164 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Winter </td>
   <td style="text-align:right;"> 183 </td>
   <td style="text-align:right;"> 246 </td>
   <td style="text-align:right;"> 173 </td>
  </tr>
</tbody>
</table>
]
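
For the calculations that follow, the table above can be entered as a matrix of counts. This is a sketch; the name `seasonTab` is illustrative and is not the `seasonDat` object used in the lecture code.

```r
# Observed counts: seasons in rows, flower types in columns
seasonTab <- matrix(
  c(186, 232, 185,
    172, 228, 192,
    168, 219, 164,
    183, 246, 173),
  nrow = 4, byrow = TRUE,
  dimnames = list(Season = c("Spring", "Summer", "Autumn", "Winter"),
                  Flowers = c("Lilies", "Roses", "Tulips"))
)

# Add row and column totals
addmargins(as.table(seasonTab))
```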

---

# Performing a `$\chi^2$` Test of Independence

**Compute the test statistic**

.center.f3[ 
`$\chi^2 = \sum\limits_{i=1}^r \sum\limits_{j=1}^c \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$`
]

+ `$E_{ij}=\frac{R_i\ \cdot\ C_j}{n}$`

+ Under `$H_0$`, season and flower type are independent, so each expected count depends only on its row total `$R_i$`, its column total `$C_j$`, and the overall total `$n$`

---

# Performing a `$\chi^2$` Test of Independence

**Compute the test statistic**

+ `$E_{ij}=\frac{R_i\ \cdot\ C_j}{n}$`

<table>
 <thead>
  <tr>
   <th style="text-align:left;">   </th>
   <th style="text-align:right;"> Lilies </th>
   <th style="text-align:right;"> Roses </th>
   <th style="text-align:right;"> Tulips </th>
   <th style="text-align:right;"> Sum </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Spring </td>
   <td style="text-align:right;"> 186 </td>
   <td style="text-align:right;"> 232 </td>
   <td style="text-align:right;"> 185 </td>
   <td style="text-align:right;color: #BF1932 !important;"> 603 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Summer </td>
   <td style="text-align:right;"> 172 </td>
   <td style="text-align:right;"> 228 </td>
   <td style="text-align:right;"> 192 </td>
   <td style="text-align:right;color: #BF1932 !important;"> 592 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Autumn </td>
   <td style="text-align:right;"> 168 </td>
   <td style="text-align:right;"> 219 </td>
   <td style="text-align:right;"> 164 </td>
   <td style="text-align:right;color: #BF1932 !important;"> 551 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Winter </td>
   <td style="text-align:right;"> 183 </td>
   <td style="text-align:right;"> 246 </td>
   <td style="text-align:right;"> 173 </td>
   <td style="text-align:right;color: #BF1932 !important;"> 602 </td>
  </tr>
  <tr>
   <td style="text-align:left;color: #BF1932 !important;"> Sum </td>
   <td style="text-align:right;color: #BF1932 !important;"> 709 </td>
   <td style="text-align:right;color: #BF1932 !important;"> 925 </td>
   <td style="text-align:right;color: #BF1932 !important;"> 714 </td>
   <td style="text-align:right;color: #BF1932 !important;color: #BF1932 !important;"> 2348 </td>
  </tr>
</tbody>
</table>
<br>

| Season |       Lilies     |      Roses       |      Tulips      |
|--------|------------------|------------------|------------------|
| Spring |(603 x 709)/2348|(603 x 925)/2348|(603 x 714)/2348| 
| Summer |(592 x 709)/2348|(592 x 925)/2348|(592 x 714)/2348|
| Autumn |(551 x 709)/2348|(551 x 925)/2348|(551 x 714)/2348|
| Winter |(602 x 709)/2348|(602 x 925)/2348|(602 x 714)/2348|
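
All twelve expected counts can be computed at once from the row and column totals. A minimal sketch, with the observed counts re-entered as a matrix under the illustrative name `seasonTab`:

```r
seasonTab <- matrix(
  c(186, 232, 185,
    172, 228, 192,
    168, 219, 164,
    183, 246, 173),
  nrow = 4, byrow = TRUE,
  dimnames = list(Season = c("Spring", "Summer", "Autumn", "Winter"),
                  Flowers = c("Lilies", "Roses", "Tulips"))
)

# E_ij = (row total i * column total j) / n, for every cell
expectedTab <- outer(rowSums(seasonTab), colSums(seasonTab)) / sum(seasonTab)
round(expectedTab, 2)
```

`outer()` multiplies every row total by every column total, which is the numerator of each cell in the table above.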

---

# Performing a `$\chi^2$` Test of Independence

**Compute the test statistic**

.center.f3[
`$\chi^2 = \sum\limits_{i=1}^r \sum\limits_{j=1}^c \frac{(O_{ij} - \color{#BF1932}{E_{ij}})^2}{\color{#BF1932}{E_{ij}}}$`
]

.pull-left.center[
**Observed Values**
<table>
 <thead>
  <tr>
   <th style="text-align:left;">   </th>
   <th style="text-align:right;"> Lilies </th>
   <th style="text-align:right;"> Roses </th>
   <th style="text-align:right;"> Tulips </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Spring </td>
   <td style="text-align:right;"> 186 </td>
   <td style="text-align:right;"> 232 </td>
   <td style="text-align:right;"> 185 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Summer </td>
   <td style="text-align:right;"> 172 </td>
   <td style="text-align:right;"> 228 </td>
   <td style="text-align:right;"> 192 </td>
  </tr>
</tbody>
</table>
]

.pull-right.center[ 
**Expected Values**
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Seasons </th>
   <th style="text-align:right;"> Lilies </th>
   <th style="text-align:right;"> Roses </th>
   <th style="text-align:right;"> Tulips </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Spring </td>
   <td style="text-align:right;"> 182.08 </td>
   <td style="text-align:right;"> 237.55 </td>
   <td style="text-align:right;"> 183.37 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Summer </td>
   <td style="text-align:right;"> 178.76 </td>
   <td style="text-align:right;"> 233.22 </td>
   <td style="text-align:right;"> 180.02 </td>
  </tr>
</tbody>
</table>
]

---

# Performing a `$\chi^2$` Test of Independence

**Compute the test statistic**

.center.f3[
`$\chi^2 = \sum\limits_{i=1}^r \sum\limits_{j=1}^c \frac{\color{#BF1932}{(O_{ij} - E_{ij})}^2}{E_{ij}}$`
]

.center[
**Difference**
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Seasons </th>
   <th style="text-align:right;"> Lilies </th>
   <th style="text-align:right;"> Roses </th>
   <th style="text-align:right;"> Tulips </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Spring </td>
   <td style="text-align:right;"> 3.92 </td>
   <td style="text-align:right;"> -5.55 </td>
   <td style="text-align:right;"> 1.63 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Summer </td>
   <td style="text-align:right;"> -6.76 </td>
   <td style="text-align:right;"> -5.22 </td>
   <td style="text-align:right;"> 11.98 </td>
  </tr>
</tbody>
</table>
]

---

# Performing a `$\chi^2$` Test of Independence

**Compute the test statistic**

.center.f3[
`$\chi^2 = \sum\limits_{i=1}^r \sum\limits_{j=1}^c \frac{(O_{ij} - E_{ij})\color{#BF1932}{^2}}{E_{ij}}$`
]

.pull-left.center[
**Difference**
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Seasons </th>
   <th style="text-align:right;"> Lilies </th>
   <th style="text-align:right;"> Roses </th>
   <th style="text-align:right;"> Tulips </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Spring </td>
   <td style="text-align:right;"> 3.92 </td>
   <td style="text-align:right;"> -5.55 </td>
   <td style="text-align:right;"> 1.63 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Summer </td>
   <td style="text-align:right;"> -6.76 </td>
   <td style="text-align:right;"> -5.22 </td>
   <td style="text-align:right;"> 11.98 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Autumn </td>
   <td style="text-align:right;"> 1.62 </td>
   <td style="text-align:right;"> 1.93 </td>
   <td style="text-align:right;"> -3.55 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Winter </td>
   <td style="text-align:right;"> 1.22 </td>
   <td style="text-align:right;"> 8.84 </td>
   <td style="text-align:right;"> -10.06 </td>
  </tr>
</tbody>
</table>
]

.pull-right.center[ 
**Squared**
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Seasons </th>
   <th style="text-align:right;"> Lilies </th>
   <th style="text-align:right;"> Roses </th>
   <th style="text-align:right;"> Tulips </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Spring </td>
   <td style="text-align:right;"> 15.36 </td>
   <td style="text-align:right;"> 30.84 </td>
   <td style="text-align:right;"> 2.67 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Summer </td>
   <td style="text-align:right;"> 45.69 </td>
   <td style="text-align:right;"> 27.25 </td>
   <td style="text-align:right;"> 143.51 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Autumn </td>
   <td style="text-align:right;"> 2.63 </td>
   <td style="text-align:right;"> 3.73 </td>
   <td style="text-align:right;"> 12.62 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Winter </td>
   <td style="text-align:right;"> 1.49 </td>
   <td style="text-align:right;"> 78.16 </td>
   <td style="text-align:right;"> 101.23 </td>
  </tr>
</tbody>
</table>
]

---

# Performing a `$\chi^2$` Test of Independence

**Compute the test statistic**

.center.f3[
`$\chi^2 = \sum\limits_{i=1}^r \sum\limits_{j=1}^c \color{#BF1932}{\frac{(O_{ij} - E_{ij})^2}{E_{ij}}}$`
]

.pull-left.center[
**Squared**
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Seasons </th>
   <th style="text-align:right;"> Lilies </th>
   <th style="text-align:right;"> Roses </th>
   <th style="text-align:right;"> Tulips </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Spring </td>
   <td style="text-align:right;"> 15.36 </td>
   <td style="text-align:right;"> 30.84 </td>
   <td style="text-align:right;"> 2.67 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Summer </td>
   <td style="text-align:right;"> 45.69 </td>
   <td style="text-align:right;"> 27.25 </td>
   <td style="text-align:right;"> 143.51 </td>
  </tr>
</tbody>
</table>
]

.pull-right.center[ 
**Expected**
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Seasons </th>
   <th style="text-align:right;"> Lilies </th>
   <th style="text-align:right;"> Roses </th>
   <th style="text-align:right;"> Tulips </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Spring </td>
   <td style="text-align:right;"> 182.08 </td>
   <td style="text-align:right;"> 237.55 </td>
   <td style="text-align:right;"> 183.37 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Summer </td>
   <td style="text-align:right;"> 178.76 </td>
   <td style="text-align:right;"> 233.22 </td>
   <td style="text-align:right;"> 180.02 </td>
  </tr>
</tbody>
</table>
]

.center[ 
**Squared over Expected**
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Seasons </th>
   <th style="text-align:right;"> Lilies </th>
   <th style="text-align:right;"> Roses </th>
   <th style="text-align:right;"> Tulips </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Spring </td>
   <td style="text-align:right;"> 0.08 </td>
   <td style="text-align:right;"> 0.13 </td>
   <td style="text-align:right;"> 0.01 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Summer </td>
   <td style="text-align:right;"> 0.26 </td>
   <td style="text-align:right;"> 0.12 </td>
   <td style="text-align:right;"> 0.80 </td>
  </tr>
</tbody>
</table>
]

---
# Performing a `$\chi^2$` Test of Independence

**Compute the test statistic**

.center.f3[
`$\chi^2 = \color{#BF1932}{\sum\limits_{i=1}^r \sum\limits_{j=1}^c}\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$`
]

.pull-left.center[
**Squared over Expected**

```r
kable(divTab, digits = 2)
```

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Seasons </th>
   <th style="text-align:right;"> Lilies </th>
   <th style="text-align:right;"> Roses </th>
   <th style="text-align:right;"> Tulips </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Spring </td>
   <td style="text-align:right;"> 0.08 </td>
   <td style="text-align:right;"> 0.13 </td>
   <td style="text-align:right;"> 0.01 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Summer </td>
   <td style="text-align:right;"> 0.26 </td>
   <td style="text-align:right;"> 0.12 </td>
   <td style="text-align:right;"> 0.80 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Autumn </td>
   <td style="text-align:right;"> 0.02 </td>
   <td style="text-align:right;"> 0.02 </td>
   <td style="text-align:right;"> 0.08 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Winter </td>
   <td style="text-align:right;"> 0.01 </td>
   <td style="text-align:right;"> 0.33 </td>
   <td style="text-align:right;"> 0.55 </td>
  </tr>
</tbody>
</table>
]

.pull-right.center[ 
**`$\chi^2$`**

```r
sum(divTab[flowerCols])
```

```
## [1] 2.397417
```
]
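
---

# Performing a `$\chi^2$` Test of Independence

**Check the hand calculation in R**

The steps so far (expected values, squared differences over expected, then the sum) can be sketched end to end as below. This is a minimal sketch, not the lecture's own code: the counts in `obsDemo` are made up for illustration and are not the lecture data, and all object names are assumptions. The built-in `chisq.test()` is used only as a cross-check.

```r
# Hypothetical observed counts (NOT the lecture data): 4 seasons x 3 flowers
obsDemo <- matrix(c(30, 40, 30,
                    25, 45, 30,
                    35, 35, 30,
                    28, 42, 30),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(Season  = c("Spring", "Summer", "Autumn", "Winter"),
                                  Flowers = c("Lilies", "Roses", "Tulips")))

n <- sum(obsDemo)

# Expected count per cell: (row total * column total) / n
expDemo <- outer(rowSums(obsDemo), colSums(obsDemo)) / n

# Squared difference over expected, summed over every cell
chiSqByHand <- sum((obsDemo - expDemo)^2 / expDemo)

# Cross-check against the built-in test
chiSqBuiltIn <- unname(chisq.test(obsDemo)$statistic)

all.equal(chiSqByHand, chiSqBuiltIn)
```

If the hand calculation is right, `all.equal()` returns `TRUE`, because `chisq.test()` applies the same formula to tables larger than 2x2.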

---

# Performing a `$\chi^2$` Test of Independence

**Find the test statistic on the distribution**

.pull-left[
+ `$c$` = number of levels within Variable 1

+ `$r$` = number of levels within Variable 2
]

---
count: false

# Performing a `$\chi^2$` Test of Independence

**Find the test statistic on the distribution**

.pull-left[
+ `$c$` = number of levels within Variable 1

+ `$r$` = number of levels within Variable 2

```r
r <- length(levels(seasonDat$Season))   # number of Season levels
c <- length(levels(seasonDat$Flowers))  # number of Flowers levels

# degrees of freedom
(r-1)*(c-1)
```

```
## [1] 6
```
]

.pull-right[
![](dapR1_lec19_Chisquare_files/figure-html/unnamed-chunk-42-1.svg)
]

---
count: false

# Performing a `$\chi^2$` Test of Independence

**Find the test statistic on the distribution**

.pull-left[
+ `$c$` = number of levels within Variable 1

+ `$r$` = number of levels within Variable 2

```r
r <- length(levels(seasonDat$Season))   # number of Season levels
c <- length(levels(seasonDat$Flowers))  # number of Flowers levels

# degrees of freedom
(r-1)*(c-1)
```

```
## [1] 6
```
]

.pull-right[
![](dapR1_lec19_Chisquare_files/figure-html/unnamed-chunk-44-1.svg)
]

---

# Performing a `$\chi^2$` Test of Independence

**Compute the probability a score at least as extreme as the test statistic**

---
# Performing a `$\chi^2$` Test of Independence

**Compute the probability a score at least as extreme as the test statistic**

```r
pchisq(sum(divTab[flowerCols]), 
       df = 6, 
       lower.tail = FALSE)
```

```
## [1] 0.8797671
```
+ If `$H_0$` is true, the probability of obtaining a `$\chi^2$` value at least as extreme as 2.40 is 0.88. This is well above conventional `$\alpha$` levels such as .05, so we fail to reject `$H_0$`.

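
---

# Performing a `$\chi^2$` Test of Independence

**Compare against a critical value**

An equivalent way to reach the decision is to compare the test statistic with the critical value of the `$\chi^2$` distribution. The sketch below reuses the statistic and *df* reported on the previous slides; `alpha = 0.05` is an assumed significance level, and the object names are illustrative only.

```r
chiSq <- 2.397417   # test statistic reported on the slides
df    <- 6          # (4 - 1) * (3 - 1)
alpha <- 0.05       # assumed significance level

# Upper-tail probability of a value at least as extreme as chiSq
pValue <- pchisq(chiSq, df = df, lower.tail = FALSE)

# Value that cuts off the top alpha of the distribution
critVal <- qchisq(1 - alpha, df = df)

chiSq > critVal   # is the statistic beyond the critical value?
pValue < alpha    # same decision, expressed as a probability
```

Both comparisons always lead to the same decision, since they describe the same point on the distribution in two different ways.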

---
class: center, middle

---

# Exploring our Results Further

+ We can also compute standardized residuals for the Test of Independence

+ In this case, you calculate one residual per cell (row `$i$`, column `$j$`):

+ `$\frac{O_{ij}-E_{ij}}{\sqrt{E_{ij}}}$`

+ Squaring a cell's residual gives that cell's contribution to `$\chi^2$`

```r
(ObsVals['Spring', 'Lilies']-exVals[1, 'Lilies'])/sqrt(exVals[1, 'Lilies'])
```

```
##      Lilies
## 1 0.2904051
```
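
---

# Exploring our Results Further

The single-cell calculation can be applied to every cell at once. The sketch below uses made-up counts (`obsDemo` is not the lecture data, and the object names are assumptions). It also checks the result against the `residuals` element returned by `chisq.test()`, which uses the same formula; note that the separate `stdres` element is scaled differently and will not match.

```r
# Hypothetical observed counts (NOT the lecture data): 4 seasons x 3 flowers
obsDemo <- matrix(c(30, 40, 30,
                    25, 45, 30,
                    35, 35, 30,
                    28, 42, 30),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(Season  = c("Spring", "Summer", "Autumn", "Winter"),
                                  Flowers = c("Lilies", "Roses", "Tulips")))

# Expected count per cell: (row total * column total) / n
expDemo <- outer(rowSums(obsDemo), colSums(obsDemo)) / sum(obsDemo)

# (O - E) / sqrt(E) for every cell in one step
residDemo <- (obsDemo - expDemo) / sqrt(expDemo)
round(residDemo, 2)

# Same values as the residuals stored by chisq.test()
testDemo <- chisq.test(obsDemo)
all.equal(residDemo, testDemo$residuals, check.attributes = FALSE)

# Squared residuals add up to the chi-square statistic
all.equal(sum(residDemo^2), unname(testDemo$statistic))
```

Cells with a positive residual have more observations than expected under `$H_0$`; cells with a negative residual have fewer.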

---

# Effect Sizes

+ Three effect size measures are commonly used with `$\chi^2$` tests:
  + Phi coefficient
  + Cramer's V
  + Odds Ratios

+ You will learn more about odds ratios in DapR2, so we will focus on Phi and Cramer's V.

---

# Phi coefficient

.center.f3[
`$\phi=\sqrt{\frac{\chi^2}{n}}$`
]

+ `$n$`: total number of observations

+ Should only be used when you have a 2x2 contingency table (2 categorical variables with 2 levels each)

+ Interpretation:
  + 0.1: small effect
  + 0.3: medium effect
  + 0.5: large effect
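
---

# Phi coefficient

As a minimal sketch of the formula, the code below computes `$\phi$` for a made-up 2x2 table. The counts and object names are hypothetical. One assumption worth flagging: `chisq.test()` applies a continuity correction to 2x2 tables by default, so `correct = FALSE` is set here to obtain the uncorrected `$\chi^2$` that the formula expects.

```r
# Hypothetical 2x2 counts (NOT real data)
tabDemo <- matrix(c(30, 20,
                    10, 40),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(Smoker = c("Yes", "No"),
                                  CVD    = c("Yes", "No")))

n <- sum(tabDemo)

# Uncorrected chi-square statistic (no Yates' continuity correction)
chiSq <- unname(chisq.test(tabDemo, correct = FALSE)$statistic)

# phi = sqrt(chi-square / n)
phi <- sqrt(chiSq / n)
phi
```

For these made-up counts `$\phi$` is about 0.41, which falls between the medium and large benchmarks above.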

---

# Cramer's V

.center.f3[
`$V=\sqrt{\frac{\chi^2}{n \cdot df^*}}$`
]

+ where `$df^* = \min(r-1, c-1)$`

+ Can be used when your contingency table is larger than 2x2 (for a 2x2 table, `$df^* = 1$` and `$V = \phi$`)

+ Interpretation:
  + Cramer's V is interpreted based on `$df^*$`:

| `$df^*$` | small | medium | large |
|--------|-------|--------|-------|
|   1    |  .10  |  .30   |  .50  |
|   2    |  .07  |  .21   |  .35  |
|   3    |  .06  |  .17   |  .29  |
|   4    |  .05  |  .15   |  .25  |
|   5    |  .04  |  .13   |  .22  |
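
---

# Cramer's V

The formula can be wrapped in a small function, sketched below. `cramersV()` is a hypothetical helper written for this slide, not a base R function, and `obsDemo` contains made-up counts rather than the lecture data.

```r
# Hypothetical helper: Cramer's V for a table of counts
cramersV <- function(tab) {
  # Uncorrected chi-square statistic
  chiSq  <- unname(chisq.test(tab, correct = FALSE)$statistic)
  n      <- sum(tab)
  # df* = min(r - 1, c - 1)
  dfStar <- min(nrow(tab) - 1, ncol(tab) - 1)
  sqrt(chiSq / (n * dfStar))
}

# Hypothetical observed counts (NOT the lecture data): 4 seasons x 3 flowers
obsDemo <- matrix(c(30, 40, 30,
                    25, 45, 30,
                    35, 35, 30,
                    28, 42, 30),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(Season  = c("Spring", "Summer", "Autumn", "Winter"),
                                  Flowers = c("Lilies", "Roses", "Tulips")))

vDemo <- cramersV(obsDemo)
vDemo
```

Here `$df^* = \min(4-1, 3-1) = 2$`, so the result (about 0.06 for these made-up counts) is read against the second row of the table above.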

---

# Summary of Today

+ We learned about the `$\chi^2$` distribution and how it compares to the `$t$` distribution.

+ We discussed the assumptions of `$\chi^2$` tests.

+ We differentiated between the `$\chi^2$` Goodness of Fit test and the `$\chi^2$` Test of Independence.

+ We walked through how to calculate both types of `$\chi^2$` values.

+ We talked about standardized residuals and how they relate to your `$\chi^2$` results.

+ We covered the measures of effect size you may use with `$\chi^2$` tests.

---