Bootstrapping Theory

class: center, middle, inverse, title-slide

# Bootstrapping Theory 
## Data Analysis for Psychology in R 2 
### dapR2 Team
### Department of Psychology, The University of Edinburgh

---

# Week's Learning Objectives
1. Recap the principles of bootstrapping.
2. Recap the concept of the bootstrap distribution.
3. Recap confidence intervals.
4. Apply the bootstrap confidence interval to inference in linear models.

---
# Topics for this week

1. Bootstrapping theory (recap)
2. Confidence intervals (recap)
3. Why is this useful for linear models?
4. Applying bootstrap inference to linear models

---
class: inverse, center, middle

# Part 1
## Bootstrapping

---

# Samples
<center>
<img src="jk_img_sandbox/statistical_inference.png" width="600" height="500" />
</center>
---

# Good Samples

- If a sample of size `$n$` is drawn at **random** from a population of size `$N$`, it will be unbiased and representative of that population.
- Point estimates from such samples will be good estimates of the population parameter.
    - There is no need for a census.

![](jk_img_sandbox/sampling_bias.png)

---

# Recap on sampling distributions
.pull-left[
- We have a population.
- We take a sample of size `$n$` from it, and calculate our statistic.
    - The statistic is our estimate of the population parameter.
    
- We do this repeatedly, and we can construct a sampling distribution.

- The mean of the sampling distribution will be a good approximation to the population parameter.

- To quantify sampling variation we can refer to the standard deviation of the sampling distribution (the **standard error**) 
]
.pull-right[
{{content}}
]
--
+ University students
{{content}}
--

+ We take a sample of 30 students and calculate their mean height.
{{content}}
--
    + This is our estimate of the mean height of all university students.
{{content}}
--

+ Do this repeatedly (take another sample of 30, calculate mean height).
{{content}}
--

+ The mean of these sample means will be a good approximation of the population mean.
{{content}}
--

+ To quantify sampling variation in mean heights of 30 students, we can refer to the standard deviation of these sample means.
{{content}}
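
---

# Sampling distributions: a simulation sketch

The repeated-sampling process described on the previous slide can be sketched in R. This is only an illustration: the population of 10,000 heights (mean 170 cm, SD 10 cm) and the choice of 500 samples are assumptions made for the example, not real data.

```r
set.seed(1)  # make the simulation reproducible

# Hypothetical population: heights (cm) of 10,000 students (assumed values)
population <- rnorm(10000, mean = 170, sd = 10)

# Draw 500 samples of n = 30 and store the mean of each sample
sample_means <- replicate(500, mean(sample(population, size = 30)))

mean(population)    # the population parameter
mean(sample_means)  # centre of the sampling distribution
sd(sample_means)    # standard error of the mean
```

The mean of `sample_means` should sit close to the population mean, and its standard deviation is the standard error for samples of 30.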

---

# Practical problem:

.pull-left[
![](dapr2_14_BootstrapTheory_files/figure-html/unnamed-chunk-2-1.png)
]
.pull-right[
- This process allows us to get an estimate of the sampling variability, **but is this realistic?**
 
- Can I really go out and collect 500 samples of 30 students from the population?
 
 - Probably not...
{{content}} 
]

- So how else can I get a sense of the variability in my sample estimates?

---

.pull-left[
## Solution 1  
### Theoretical

- Collect one sample.

- Estimate the standard error using the formula:
 
`$\text{SE} = \frac{\sigma}{\sqrt{n}}$`

- In practice `$\sigma$` is unknown, so we plug in the sample standard deviation `$s$`.

]
.pull-right[
## Solution 2  
### Bootstrap

- Collect one sample.

- Mimic the act of repeated sampling from the population by repeated **resampling with replacement** from the original sample.

- Estimate the standard error using the standard deviation of the distribution of **resample** statistics.

]
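
---

# Comparing the two solutions: a sketch

The two routes to a standard error can be compared on a single sample. This is a minimal sketch under stated assumptions: `heights` is a simulated sample of 30 values rather than real data, the sample standard deviation stands in for `$\sigma$` in the formula, and 1,000 resamples is an arbitrary choice.

```r
set.seed(2)  # make the simulation reproducible

# One hypothetical sample of 30 heights (cm); the values are simulated
heights <- rnorm(30, mean = 170, sd = 10)
n <- length(heights)

# Solution 1: the formula, with the sample SD standing in for sigma
se_formula <- sd(heights) / sqrt(n)

# Solution 2: resample with replacement, each resample the same size as the sample
boot_means <- replicate(1000, mean(sample(heights, size = n, replace = TRUE)))
se_boot <- sd(boot_means)

c(formula = se_formula, bootstrap = se_boot)
```

For the mean, the two estimates should be similar. The bootstrap route does not rely on a formula, which is why it carries over to statistics that have no simple formula.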

---

# Resampling 1: The sample

.pull-left[
<img src="jk_img_sandbox/sample.png" width="350" />

Suppose I am interested in the mean age of all characters in The Simpsons, and I have collected a sample of `$n=10$`. 
{{content}}
]

+ The mean age of my sample is 44.9.

.pull-right[

```
## # A tibble: 10 × 2
##    name                 age
##    <chr>              <dbl>
##  1 Homer Simpson         39
##  2 Ned Flanders          60
##  3 Chief Wiggum          43
##  4 Milhouse              10
##  5 Patty Bouvier         43
##  6 Janey Powell           8
##  7 Montgomery Burns     104
##  8 Sherri Mackleberry    10
##  9 Krusty the Clown      52
## 10 Jacqueline Bouvier    80
```

```r
library(dplyr)  # provides %>% and summarise()
# Mean age across the 10 sampled characters
simpsons_sample %>%
  summarise(mean_age = mean(age))
```

```
## # A tibble: 1 × 1
##   mean_age
##      <dbl>
## 1     44.9
```

]
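
---

# Resampling 1: The sample in R (sketch)

To follow along, the sample can be recreated from the printed values. The sketch below assumes the `tibble` and `dplyr` packages are installed; the object name `simpsons_sample` matches the slide, while the seed and the single resample at the end are illustrative choices, not part of the original analysis.

```r
library(tibble)
library(dplyr)

# Sample of n = 10 characters, copied from the printed tibble
simpsons_sample <- tibble(
  name = c("Homer Simpson", "Ned Flanders", "Chief Wiggum", "Milhouse",
           "Patty Bouvier", "Janey Powell", "Montgomery Burns",
           "Sherri Mackleberry", "Krusty the Clown", "Jacqueline Bouvier"),
  age  = c(39, 60, 43, 10, 43, 8, 104, 10, 52, 80)
)

# Mean age of the original sample
simpsons_sample %>%
  summarise(mean_age = mean(age))

# One resample: draw 10 rows with replacement, then recompute the mean
set.seed(3)
simpsons_sample %>%
  slice_sample(n = 10, replace = TRUE) %>%
  summarise(mean_age = mean(age))
```

The first summary reproduces the sample mean of 44.9. Because rows are drawn with replacement, some characters can appear more than once in the resample and others not at all, so its mean will usually differ from 44.9.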

---