Week 10: Continuous Probability Distributions

class: center, middle, inverse, title-slide

.title[
# Week 10: Continuous Probability Distributions 
]
.subtitle[
## Data Analysis for Psychology in R 1 
]
.author[
### DapR1 Team
]
.institute[
### Department of Psychology The University of Edinburgh
]

---

# This Week's Learning Objectives
1. Understand the key difference between discrete and continuous probability distributions.
2. Apply understanding of continuous probability distributions to the example of a normal distribution.
3. Understand how to use a range from a continuous probability distribution. 
4. Introduce other continuous probability distributions.

---
## Discrete vs. continuous
- Recall that a _discrete probability distribution_ describes a random variable that produces a discrete set of outcomes.

- By contrast, a _continuous probability distribution_ describes a random variable that produces a continuous set of outcomes, for example:
  - Temperature
  - Height
  - Reaction Time

- If you have arbitrary precision of measurement, you have a continuous random variable.

- As a result, while a discrete probability distribution is jagged, a continuous probability distribution is smooth. 
.pull-left[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-1-1.svg)
]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-2-1.svg)
]

---
## Discrete vs. continuous

- Continuous probability distributions differ from discrete in two other important ways.

--
    
    - `$P(X=x)=0$`

.center[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-3-1.svg)
]

---
count: false

## Discrete vs. continuous

- Continuous probability distributions differ from discrete in two other important ways. 
    - `$P(X=x)=0$`

.center[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-4-1.svg)
]

---
count: false

## Discrete vs. continuous

- Continuous probability distributions differ from discrete in two other important ways. 
    - `$P(X=x)=0$`

.center[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-5-1.svg)
]

---
count: false

## Discrete vs. continuous

- Continuous probability distributions differ from discrete in two other important ways. 
    - `$P(X=x)=0$`

.center[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-6-1.svg)
]

---
count: false

## Discrete vs. continuous

- Continuous probability distributions differ from discrete in two other important ways. 
    - `$P(X=x)=0$`

.center[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-7-1.svg)
]

---

count: false

## Discrete vs. continuous

- Continuous probability distributions differ from discrete in two other important ways. 
    - `$P(X=x)=0$`

- Continuous probability distributions are described using the **probability density function (PDF)**, rather than the **probability mass function**.

- Now, let's take a look at perhaps the most widely used continuous probability distribution...
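
A quick way to see why `$P(X=x)=0$`: the sketch below (using a standard normal and interval widths chosen purely for illustration) computes the probability of ever narrower intervals around a single value. `pnorm()` gives the area below a value, and `dnorm()` gives the height of the curve, which is a density rather than a probability.

```r
# Assumed example: standard normal, intervals centred on x = 0
widths <- c(1, 0.1, 0.01, 0.001)

# Probability of landing inside each interval (area under the curve)
probs <- pnorm(0 + widths / 2) - pnorm(0 - widths / 2)
data.frame(width = widths, probability = probs)

# Height of the curve at x = 0: a density, not a probability
dnorm(0)
```

As the interval shrinks towards a single point, the probability shrinks towards 0, even though the density at that point stays the same.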

---
class: center, middle

## Questions?

---
## Normal distribution

.pull-left[
- A **normal distribution** (AKA the Gaussian distribution) is a continuous distribution.

- It is uni-modal (one peak) and symmetrical.

]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-8-1.svg)
]

---
## Normal: PDF

$$
f(x;\mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x - \mu)^2}{2\sigma^2}}
$$

- A bit scary!

- But the basic points are:
  
  - It is a function of data *x*
  - And *two* parameters `$\mu$` and `$\sigma$` (mean and SD)

- There is not one single normal distribution.
- We have a family of different distributions that are defined by their mean, `$\mu$`, and standard deviation, `$\sigma$`.
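
To make the formula less scary, here is a minimal sketch that types it out by hand and compares it with R's built-in `dnorm()`. The function name `normal_pdf` and the example values are our own choices for illustration.

```r
# Hand-written version of the normal PDF (hypothetical helper name)
normal_pdf <- function(x, mu, sigma) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

# Illustrative data values and parameters
x <- c(160, 168, 175)
by_hand  <- normal_pdf(x, mu = 168, sigma = 7.5)
built_in <- dnorm(x, mean = 168, sd = 7.5)

by_hand
built_in
all.equal(by_hand, built_in)  # the two agree
```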

---
## The Standard Normal Distribution

- The **standard normal distribution** is a normal distribution where `$\mu=0$` and `$\sigma=1$`

.center[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-9-1.svg)
]

---
## Different Normal Distributions - Adjusting `$\mu$`

- Adjusting `$\mu$` changes where the curve is centered on the `$x$`-axis

.center[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-10-1.svg)
]

---
## Different Normal Distributions - Adjusting `$\sigma$`

- Adjusting `$\sigma$` changes the shape of the curve

.center[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-11-1.svg)
]
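
The effect of each parameter can also be checked numerically. This is a rough sketch with parameter values picked for illustration (they are not necessarily the ones used in the figures): it finds where each curve peaks and how tall the peak is.

```r
# Grid of x values to evaluate the curves on
x <- seq(-10, 10, by = 0.01)

# Changing mu moves the peak along the x-axis
peak_mu0 <- x[which.max(dnorm(x, mean = 0, sd = 1))]
peak_mu3 <- x[which.max(dnorm(x, mean = 3, sd = 1))]
c(peak_mu0, peak_mu3)

# Changing sigma changes the shape: doubling sigma halves the peak height
height_sd1 <- dnorm(0, mean = 0, sd = 1)
height_sd2 <- dnorm(0, mean = 0, sd = 2)
c(height_sd1, height_sd2)
```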

---
## Properties of Normal Distributions

.pull-left[
- Properties of any normal distribution: 
  - `$\approx$` 68% of area falls within 1 SD on either side of mean.
]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-12-1.svg)
]

---
count: false

## Properties of Normal Distributions

.pull-left[
- Properties of any normal distribution: 
  - `$\approx$` 68% of area falls within 1 SD on either side of mean.
  - `$\approx$` 95% of area falls within 2 SD on either side of mean.
      - More precisely, 95% falls within +/- **1.96 SD**
]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-13-1.svg)
]

---
count: false

## Properties of Normal Distributions

.pull-left[
- Properties of any normal distribution:
  - `$\approx$` 68% of area falls within 1 SD on either side of mean.
  - `$\approx$` 95% of area falls within 2 SD on either side of mean.
      - More precisely, 95% falls within +/- **1.96 SD**
  - `$\approx$` 99.7% of area falls within 3 SD on either side of mean.
]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-14-1.svg)
]
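
These areas can be checked in R. A minimal sketch, assuming a standard normal for the first part and, for the second part, an arbitrary mean and SD to show the rule does not depend on the parameters:

```r
# Area within 1, 2, 3 and 1.96 SDs of the mean (standard normal)
sds  <- c(1, 2, 3, 1.96)
area <- pnorm(sds) - pnorm(-sds)
round(area, 4)

# Same idea for another normal distribution (illustrative values)
mu    <- 168
sigma <- 7.5
within_1sd <- pnorm(mu + sigma, mean = mu, sd = sigma) -
  pnorm(mu - sigma, mean = mu, sd = sigma)
within_1sd
```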

---
class: center, middle

## Questions?

---
## Using the PDF of the normal distribution
- Let's use the normal distribution to illustrate how continuous probability distributions work.

- With a discrete random variable it makes sense to ask: 'what's the probability associated with a specific value of the random variable?'.
  - e.g., what's the probability of getting heads on a fair coin?

- With a continuous random variable it makes sense to ask about ranges of scores.
  - e.g., what's the probability of sampling someone between 1.6 and 1.7 meters tall if we sample students from a university?

---
## Using the PDF of the normal distribution

.pull-left[
- Let's imagine that in some course, student height is normally distributed.
  + `$\mu = 168$` cm 
  + `$\sigma = 7.5$` cm
    
- We can ask: what is the probability of sampling someone between 175 and 180 cm?
  + This question translates to: `$p(175 \leq x \leq 180) = ?$`
  + Let's unpack this... 
]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-16-1.svg)
]

---
## Using the PDF of the normal distribution
.pull-left[

`$$p(175 \leq x \leq 180) = ?$$`

+ Let's draw these boundaries on our plot...

]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-17-1.svg)
]

---
## Using the PDF of the normal distribution
.pull-left[

`$$p(175 \leq x \leq 180) = ?$$`

+ Let's draw these boundaries on our plot...

+ What is the value of the area under the curve between these two lines?

]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-18-1.svg)
]

---
## Using the PDF of the normal distribution
.pull-left[

- We get the area under a curve by calculating an integral
  
  `$$\int_{a}^{b} f(x) \,dx$$`
  
  + Don't worry, you don't need to know the details of integrals, but you may encounter the equation above.
  
  - This equation can be read as: the integral of the function `$f$` of variable `$x$` between the vertical lines at `$a$` and `$b$`
  
  - We can calculate this value using the probability density function...

]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-19-1.svg)
]
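
If you are curious, R can approximate this integral numerically. This is only a sketch of the idea, using the base R function `integrate()` on the normal density `dnorm()` with the height parameters from this example; you will not need to do this in practice.

```r
# Numerically integrate the normal PDF between a = 175 and b = 180
area <- integrate(dnorm, lower = 175, upper = 180, mean = 168, sd = 7.5)

# The estimated area under the curve between the two lines
area$value
```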

---
## Using the PDF of the normal distribution
.pull-left[

- `pnorm(q, mean, sd)` 
  + *q* is the upper threshold; the function will output the probability of all values less than this.
  
  + *mean* and *sd* give the parameters of the distribution
  
  + Returns the area under the normal distribution below *q*. 
  
  + Remember, the normal curve changes based on the values of `$\mu$` and `$\sigma$`, so it makes sense that this function requires these parameters. Strictly speaking, `pnorm()` is the cumulative distribution function (the area under the PDF), not the PDF itself.
]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-20-1.svg)
]

---
## Using the PDF of the normal distribution
.pull-left[

```r
pnorm(180, mean=168, sd=7.5)
```

```
## [1] 0.9452007
```

> **Test Your Understanding:** How do you interpret this output?

]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-22-1.svg)
]

---
## Using the PDF of the normal distribution

.pull-left[
- We can also calculate the area under the curve below 175:

```r
pnorm(175, mean=168, sd=7.5) 
```

```
## [1] 0.8246761
```
]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-24-1.svg)
]

> **Test Your Understanding:** Now you know that 94.52% of student heights fall below 180 cm, and 82.47% of student heights fall below 175 cm. How do you calculate the probability of selecting a student whose height falls between 175 and 180 cm?

---
## Using the PDF of the normal distribution

.pull-left[
+ `$$P(175 \leq X \leq 180) = P(X<180) - P(X<175)$$`

```r
p180 <- pnorm(180, mean=168, sd=7.5)
p175 <- pnorm(175, mean=168, sd=7.5)

p180-p175
```

```
## [1] 0.1205247
```

+ So, the probability of randomly selecting a student with a height between 175 and 180 cm is about 0.12.

]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-26-1.svg)
]
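
One way to sanity-check this answer is by simulation. The sketch below is our own illustration (the seed and sample size are arbitrary): it draws many heights from the same normal distribution and counts the proportion that land between 175 and 180 cm.

```r
set.seed(1)  # arbitrary seed so the result is reproducible

# Simulate 100,000 student heights from N(168, 7.5)
heights <- rnorm(100000, mean = 168, sd = 7.5)

# Proportion of simulated heights between 175 and 180 cm
prop <- mean(heights >= 175 & heights <= 180)
prop
```

The simulated proportion should be close to, but not exactly, the value from `pnorm()`, because it is based on a random sample.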
---
## Using the PDF of the normal distribution

.pull-left[
- We can also ask about the probability of a sampled element having a value from one of 2+ ranges.

- What is the probability that a person will have a height below 151 or greater than 185? `$p(x \leq 151 \:or\: x \geq 185)$`

```r
pnorm(151, mean=168, sd=7.5)
```

```
## [1] 0.0117053
```

```r
1 - pnorm(185, mean=168, sd=7.5)
```

```
## [1] 0.0117053
```

+ **Test your understanding:** Why are we subtracting a value from 1 here?

]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-30-1.svg)
]

---
## Using the PDF of the normal distribution

.pull-left[
- `$p(x \leq 151 \cup x \geq 185) = p(x \leq 151) + p(x \geq 185)$`

- `$0.0117 + 0.0117 \approx 0.023$`
]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-31-1.svg)
]
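
The same calculation in R, as a minimal sketch. The `lower.tail = FALSE` argument of `pnorm()` returns the area above the threshold directly, which is equivalent to subtracting from 1.

```r
# Area below 151 cm
lower <- pnorm(151, mean = 168, sd = 7.5)

# Area above 185 cm (same as 1 - pnorm(185, mean = 168, sd = 7.5))
upper <- pnorm(185, mean = 168, sd = 7.5, lower.tail = FALSE)

# The two ranges do not overlap, so their probabilities add
lower + upper
```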

---
## Using the PDF of the normal distribution
- What if I wanted to know where the 5% of the most extreme values (i.e., smallest and largest) in this distribution fall?

--
  
   - The normal distribution is symmetric, which means that there are the same number of extreme values at the bottom and top end.

- This means the most extreme 5% will be the 2.5% at the bottom of the distribution and the 2.5% at the top.

- So our question is: what is the height below which there are only 2.5% of students, and what is the height above which there are only 2.5% of students?

---
## Using the PDF of the normal distribution
.pull-left[

- To get these values, you can use `qnorm(p, mean, sd)`, where *p* is the proportion of the distribution falling below the value you want

- For a normally distributed range of heights with a mean of 168 cm and an SD of 7.5 cm:

- The height below which 2.5% of students fall:

```r
qnorm(.025, mean=168, sd=7.5)
```

```
## [1] 153.3003
```

- The height above which 2.5% of students fall:

```r
qnorm(.975, mean=168, sd=7.5)
```

```
## [1] 182.6997
```
]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-34-1.svg)
]
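
`qnorm()` and `pnorm()` undo each other: one goes from a probability to a value, the other from a value to a probability. A short sketch using the same height parameters (the variable name `cutoffs` is ours):

```r
# Both cut-offs in one call: qnorm() accepts a vector of probabilities
cutoffs <- qnorm(c(.025, .975), mean = 168, sd = 7.5)
cutoffs

# Feeding the cut-offs back into pnorm() recovers 0.025 and 0.975
pnorm(cutoffs, mean = 168, sd = 7.5)

# Roughly the same cut-offs via the +/- 1.96 SD rule
168 + c(-1.96, 1.96) * 7.5
```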

---
## Take this knowledge forward

+ These examples might seem a bit bizarre (when will you ever need to calculate extreme heights?), but this will be incredibly relevant when you discuss:
  + 1- and 2-tailed distributions
  + `$p$`-values
  + Distributions of test statistics
    
+ You may find it helpful to come back and review these slides when you get to these topics later in the course.

---
class: center, middle

## Questions?

---
## Remember `$z$`-scores

.pull-left[
.center[
$$
Z = \frac{x - \mu}{\sigma}
$$
]
- It is quite typical to present a normal distribution in terms of **`$z$`-scores**.
- `$z$`-scores standardize values of `$x$`.
  - The numerator: converts `$x$` to deviations from the mean.
  - The denominator: scales these values based on the observed spread in the data (SD).
- When `$x$` is normally distributed, the result follows the **standard normal distribution**, also known as the `$z$`-distribution
]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-35-1.svg)
]
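
As a small worked example (reusing the height distribution from earlier), converting a height of 180 cm to a `$z$`-score and looking it up on the standard normal gives the same probability as using the original scale:

```r
# z-score for a height of 180 cm when mu = 168 and sigma = 7.5
z <- (180 - 168) / 7.5
z

# Area below z on the standard normal (mean 0, SD 1 are pnorm's defaults)
p_z <- pnorm(z)

# Area below 180 on the original height scale
p_x <- pnorm(180, mean = 168, sd = 7.5)

c(p_z, p_x)  # identical
```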

---
## Standard normal vs. `$t$` distribution
.pull-left[

- There are other continuous probability distributions you'll be working with next semester, such as the `$t$`-distribution 
- The `$t$`-distribution is a bit like the `$z$`-distribution, but the shape differs slightly. 
  - When calculating `$t$`, we replace `$\sigma$` with the sample estimate `$sd$`.
  - As a result, the tails of the `$t$`-distribution are slightly heavier to account for extra variability, or uncertainty from using an estimate ( `$sd$` ) rather than the actual population value ( `$\sigma$` )
]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-36-1.svg)
]
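
The heavier tails can be seen with R's `qt()` and `pt()`, the `$t$`-distribution counterparts of `qnorm()` and `pnorm()`. This is only a sketch; the degrees of freedom below are arbitrary values chosen for illustration.

```r
# Value cutting off the top 2.5% of the standard normal
z_cut <- qnorm(.975)

# The same cut-off for t-distributions with 5, 30 and 1000 degrees of freedom
t_cut <- qt(.975, df = c(5, 30, 1000))

z_cut
t_cut  # larger than z_cut, approaching it as df increases

# Area below -2: more of the t-distribution sits in the tail
tail_t <- pt(-2, df = 5)
tail_z <- pnorm(-2)
c(tail_t, tail_z)
```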

---
## Summary of today
- Continuous probability distributions 
- The normal distribution
- Using the normal distribution to make estimates about the probability of events
- The normal distribution and the `$t$`-distribution

---
## Next Tasks

+ Tomorrow, I'll present a live R session focused on continuous probability distributions

+ Next week, we will talk about samples and populations

+ This week:
  + Attend the live R session
  + Complete your lab
  + Check the [reading list](https://eu01.alma.exlibrisgroup.com/leganto/readinglist/lists/43349908530002466) for recommended reading
  + Come to office hours
    + Monica's office hours are today (Monday) from 2:30 - 3:30 in 7 George Sq, room UG44
  + Weekly quiz
    + Opens Monday 09:00
    + Closes Sunday 17:00