class: center, middle, inverse, title-slide .title[ #
Week 10: Continuous Probability Distributions
] .subtitle[ ## Data Analysis for Psychology in R 1
] .author[ ### DapR1 Team ] .institute[ ### Department of Psychology
The University of Edinburgh
]

---
# Week's Learning Objectives

1. Understand the key difference between discrete and continuous probability distributions.
2. Review the difference between a PDF and CDF.
3. Apply understanding of continuous probability distributions to the example of a normal distribution.
4. Use a range from a continuous probability distribution.
5. Introduce other continuous probability distributions.

---
## Today

- Discrete and continuous probability distributions.
- Properties of the normal distribution.
- Using ranges from the normal distribution to calculate probability estimates.
- The standard normal distribution.
- The standard normal distribution and the t distribution.

---
## Discrete vs. continuous

- Recall that a _discrete probability distribution_ describes a random variable that produces a discrete set of outcomes.

--

- By contrast, a _continuous probability distribution_ describes a random variable that produces a continuous set of outcomes
    - Temperature
    - Height
    - Reaction Time

- If you have arbitrary precision of measurement, you have a continuous random variable.

--

- As a result, while a discrete probability distribution is jagged, a continuous probability distribution is smooth.

.pull-left[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-1-1.svg)<!-- -->
]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-2-1.svg)<!-- -->
]

---
## Discrete vs. continuous

- Continuous probability distributions differ from discrete in two other important ways.

--

- `\(P(X=x)=0\)`

.center[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-3-1.svg)<!-- -->
]

---
count: false

## Discrete vs. continuous

- Continuous probability distributions differ from discrete in two other important ways.

- `\(P(X=x)=0\)`

.center[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-4-1.svg)<!-- -->
]

---
count: false

## Discrete vs. continuous

- Continuous probability distributions differ from discrete in two other important ways.

- `\(P(X=x)=0\)`

.center[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-5-1.svg)<!-- -->
]

---
count: false

## Discrete vs. continuous

- Continuous probability distributions differ from discrete in two other important ways.

- `\(P(X=x)=0\)`

.center[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-6-1.svg)<!-- -->
]

---
count: false

## Discrete vs. continuous

- Continuous probability distributions differ from discrete in two other important ways.

- `\(P(X=x)=0\)`

.center[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-7-1.svg)<!-- -->
]

---
count: false

## Discrete vs. continuous

- Continuous probability distributions differ from discrete in two other important ways.

- `\(P(X=x)=0\)`

- Continuous probability distributions are described using the **probability density function (PDF)**, rather than the **probability mass function**.

--

- Now, let's take a look at perhaps the most widely used continuous probability distribution...

---
## Normal distribution

.pull-left[
- A **normal distribution** (AKA the Gaussian distribution) is a continuous distribution.

- It is uni-modal (one peak) and symmetrical.
]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-8-1.svg)<!-- -->
]

---
## Normal: PDF

$$
f(x;\mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x - \mu)^2}{2\sigma^2}}
$$

- A bit scary!

- But the basic points are:
    - It is a function of data *x*
    - And *two* parameters `\(\mu\)` and `\(\sigma\)` (mean and SD)

--

- There is not one single normal distribution.
    - We have a family of different distributions that are defined by their mean, `\(\mu\)`, and standard deviation, `\(\sigma\)`.
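
---
## Normal: PDF in R

- The formula above can be typed into R directly. This is only a sketch for illustration: `normal_pdf` is a made-up helper name (not a base R function) and the values in `x` are arbitrary.

- R's built-in `dnorm()` gives the normal density, so the two should agree.

```r
# Normal PDF written out term by term from the formula.
# `normal_pdf` is a hypothetical helper name used only for this illustration.
normal_pdf <- function(x, mu, sigma) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

x <- c(-2, 0, 1.5)  # arbitrary example values

# Compare with R's built-in normal density for two members of the family
all.equal(normal_pdf(x, mu = 0, sigma = 1), dnorm(x, mean = 0, sd = 1))
all.equal(normal_pdf(x, mu = 10, sigma = 2), dnorm(x, mean = 10, sd = 2))
```

- Changing `mu` and `sigma` picks out a different member of the normal family, which is why both functions take these two parameters.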
---
## The Standard Normal Distribution

- The **standard normal distribution** is a normal distribution where `\(\mu=0\)` and `\(\sigma=1\)`

.center[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-9-1.svg)<!-- -->
]

---
## Different Normal Distributions - Adjusting `\(\mu\)`

- Adjusting `\(\mu\)` changes where the curve is centered on the `\(x\)`-axis

.center[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-10-1.svg)<!-- -->
]

---
## Different Normal Distributions - Adjusting `\(\sigma\)`

- Adjusting `\(\sigma\)` changes the spread of the curve: a larger `\(\sigma\)` gives a wider, flatter curve

.center[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-11-1.svg)<!-- -->
]

---
## Properties of Normal Distributions

.pull-left[
- Properties of any normal distribution:
    - `\(\approx\)` 68% of area falls within 1 SD on either side of the mean.
]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-12-1.svg)<!-- -->
]

---
count: false

## Properties of Normal Distributions

.pull-left[
- Properties of any normal distribution:
    - `\(\approx\)` 68% of area falls within 1 SD on either side of the mean.
    - `\(\approx\)` 95% of area falls within 2 SD on either side of the mean.
        - More precisely, 95% falls within +/- **1.96 SD**
]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-13-1.svg)<!-- -->
]

---
count: false

## Properties of Normal Distributions

.pull-left[
- Properties of any normal distribution:
    - `\(\approx\)` 68% of area falls within 1 SD on either side of the mean.
    - `\(\approx\)` 95% of area falls within 2 SD on either side of the mean.
        - More precisely, 95% falls within +/- **1.96 SD**
    - `\(\approx\)` 99.7% of area falls within 3 SD on either side of the mean.
]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-14-1.svg)<!-- -->
]

---
## Using the PDF of the normal distribution

- Let's use the normal distribution to illustrate how continuous probability distributions work.

--

- With a discrete random variable it makes sense to ask: 'what's the probability associated with a specific value of the random variable?'.
    - e.g., what's the probability of getting heads on a fair coin?

--

- With a continuous random variable it makes sense to ask about ranges of scores
    - e.g., what's the probability of sampling someone between 1.6 and 1.7 meters tall if we sample students from a university?

---
## Using the PDF of the normal distribution

.pull-left[
- Let's imagine that in some course, student height is normally distributed.
    - `\(\mu = 168\)` cm
    - `\(\sigma = 7.5\)` cm

- We can ask: what is the probability of sampling someone between 175 and 180 cm?

- This question translates to: `\(p(175 \leq x \leq 180) = ?\)`

- Let's unpack this...
]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-16-1.svg)<!-- -->
]

---
## Using the PDF of the normal distribution

.pull-left[
- `\(p(175 \leq x \leq 180) = ?\)`

- Let's draw these boundaries on our plot...
]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-17-1.svg)<!-- -->
]

---
## Using the PDF of the normal distribution

.pull-left[
- `\(p(175 \leq x \leq 180) = ?\)`

- Let's draw these boundaries on our plot...

- What is the value of the area under the curve between these two lines?
]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-18-1.svg)<!-- -->
]

---
## Using the PDF of the normal distribution

.pull-left[
- We get the area under a curve by calculating an integral

`$$\int_{a}^{b} f(x) \,dx$$`

- Don't worry, you don't need to know the details of integrals, but you may encounter the equation above.

- This equation can be read as: the integral of the function *f* of variable *x* between the vertical lines at *a* and *b*

- We can calculate this value using the probability density function...
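- As a rough illustration (not part of the course workflow), base R's `integrate()` can approximate this integral numerically. The sketch assumes the height example from earlier (`\(\mu = 168\)`, `\(\sigma = 7.5\)`, *a* = 175, *b* = 180) and uses `dnorm()`, R's built-in normal density, as *f*:

```r
mu    <- 168   # mean height in cm (from the example)
sigma <- 7.5   # SD of height in cm (from the example)

# Numerically integrate the normal density between a = 175 and b = 180.
# Extra arguments (mean, sd) are passed on to dnorm().
area <- integrate(dnorm, lower = 175, upper = 180, mean = mu, sd = sigma)

area$value   # area under the curve between the two lines, roughly 0.12
```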
]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-19-1.svg)<!-- -->
]

---
## Using the PDF of the normal distribution

.pull-left[
- *pnorm(x, mean, sd)*
    - *x* is a vector of values; *mean* and *sd* give the parameters of the distribution
    - Returns the area under the normal distribution below x.

- Remember, the normal curve changes based on the values of `\(\mu\)` and `\(\sigma\)`, so it makes sense that this function (and the original formula) requires these parameters.
]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-20-1.svg)<!-- -->
]

---
## Using the PDF of the normal distribution

.pull-left[
- *pnorm(x, mean, sd)*
    - *x* is a vector of values; *mean* and *sd* give the parameters of the distribution
    - Returns the area under the normal distribution below x.

- Remember, the normal curve changes based on the values of `\(\mu\)` and `\(\sigma\)`, so it makes sense that this function (and the original formula) requires these parameters.

```r
pnorm(180, mean=168, sd=7.5)
```

```
## [1] 0.9452007
```

- Now you know the proportion under the curve below 180, so how do we find the area between 175 and 180?
]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-22-1.svg)<!-- -->
]

---
## Using the PDF of the normal distribution

.pull-left[
- We can also calculate the area under the curve below 175:

```r
pnorm(175, mean=168, sd=7.5)
```

```
## [1] 0.8246761
```
]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-24-1.svg)<!-- -->
]

---
## Using the PDF of the normal distribution

.pull-left[
- If you subtract that value from the proportion under the curve below 180, you're left with the proportion under the curve between 175 and 180.

```r
pnorm(180, mean=168, sd=7.5) - pnorm(175, mean=168, sd=7.5)
```

```
## [1] 0.1205247
```

- So we know there is a probability of 0.12 that someone sampled from this university will have a height between 175 and 180 cm.
]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-26-1.svg)<!-- -->
]

---
## Using the PDF of the normal distribution

.pull-left[
- We can also ask about the probability of a sampled element having a value from one of 2+ ranges.

- What is the probability that a person will have a height below 151 or greater than 185? `\(p(x \leq 151 \:or\: x \geq 185)\)`

```r
pnorm(151, mean=168, sd=7.5)
```

```
## [1] 0.0117053
```

```r
1 - pnorm(185, mean=168, sd=7.5)
```

```
## [1] 0.0117053
```

- (Why are we subtracting a value from 1 here?)
]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-30-1.svg)<!-- -->
]

---
## Using the PDF of the normal distribution

.pull-left[
- `\(p(x \leq 151 \:or\: x \geq 185) = p(x \leq 151) + p(x \geq 185)\)`

- `\(0.01 + 0.01 = 0.02\)`
]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-31-1.svg)<!-- -->
]

---
## Using the PDF of the normal distribution

- What if I wanted to know where the most extreme 5% of values (i.e., the smallest and largest) in this distribution fall?

--

- First, this distribution is symmetric, which means that there is the same proportion of extreme values at the bottom and top ends.

--

- Second, as the distribution is symmetric, the most extreme 5% will be the 2.5% at the bottom of the distribution and the 2.5% at the top.

--

- So, what I want to know is: What is the height below which there are only 2.5% of students, and what is the height above which there are only 2.5% of students?
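
---
## Checking the symmetry argument

- The symmetry argument above can be checked numerically. This is a minimal sketch that assumes the same height distribution (`\(\mu = 168\)`, `\(\sigma = 7.5\)`); the variable names and the distance `d` are chosen only for illustration.

- Two cut-offs that sit the same distance from the mean should leave the same area in each tail.

```r
mu    <- 168   # mean height in cm
sigma <- 7.5   # SD of height in cm
d     <- 17    # distance from the mean: 168 - 17 = 151 and 168 + 17 = 185

# Area below the lower cut-off
lower_tail <- pnorm(mu - d, mean = mu, sd = sigma)

# Area above the upper cut-off: pnorm() returns the area BELOW a value,
# so the area above it is 1 minus that
upper_tail <- 1 - pnorm(mu + d, mean = mu, sd = sigma)

all.equal(lower_tail, upper_tail)   # TRUE: the two tails have equal area
lower_tail + upper_tail             # total probability in both tails, about 0.02
```

- Because the two tails do not overlap, their probabilities can simply be added.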

---
## Using the PDF of the normal distribution

.pull-left[
- To get these values, you can use *qnorm(p, mean, sd)*, where *p* is the proportion of the distribution that falls below the value you want

- For a normally distributed range of heights with a mean of 168 cm and an SD of 7.5 cm:

- The height below which 2.5% of students fall:

```r
qnorm(.025, mean=168, sd=7.5)
```

```
## [1] 153.3003
```

- The height above which 2.5% of students fall:

```r
qnorm(.975, mean=168, sd=7.5)
```

```
## [1] 182.6997
```
]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-34-1.svg)<!-- -->
]

---
## Remember `\(z\)`-scores

.pull-left[
.center[
$$
z = \frac{x - \mu}{\sigma}
$$
]

- It is quite typical to present a normal distribution in terms of **`\(z\)`-scores**.

- `\(z\)`-scores standardize values of `\(x\)`.

- The numerator: converts `\(x\)` to deviations from the mean.

- The denominator: scales these values based on the observed spread in the data (SD)

- When `\(x\)` is normally distributed, the result is the **standard normal distribution**, also known as the `\(z\)`-distribution
]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-35-1.svg)<!-- -->
]

---
## Standard normal vs. `\(t\)` distribution

.pull-left[
- There are other continuous probability distributions you'll be working with next semester, such as the `\(t\)`-distribution

- The `\(t\)` distribution is a bit like the `\(z\)`-distribution, but the shape differs slightly.

- When calculating `\(t\)`, we replace the population value `\(\sigma\)` with the sample estimate `\(sd\)`.

- As a result, the tails of the `\(t\)`-distribution are slightly higher to account for extra variability, or uncertainty from using an estimate ( `\(sd\)` ) rather than the actual population value ( `\(\sigma\)` )
]

.pull-right[
![](dapR1_lec9_ContinuousProbabilityDist_files/figure-html/unnamed-chunk-36-1.svg)<!-- -->
]

---
# Summary of today

- Continuous probability distributions

- The normal distribution

- Using the normal distribution to make estimates about the probability of events

- The normal distribution and the `\(t\)`-distribution

---
# Next tasks

+ Next week, we will cover Sampling

+ This week:
    + Come to the Live R session
    + Complete your lab
    + Come to office hours
    + Weekly quiz - on week 9 (lect 8) content
        + Open Monday 09:00
        + Closes Sunday 17:00