class: center, middle, inverse, title-slide

.title[
# Week 10: Continuous Probability Distributions
]
.subtitle[
## Data Analysis for Psychology in R 1
]
.author[
### DapR1 Team
]
.institute[
### Department of Psychology
The University of Edinburgh
]

---
# Week's Learning Objectives

1. Understand the key difference between discrete and continuous probability distributions.

2. Apply understanding of continuous probability distributions to the example of a normal distribution.

3. Understand how to calculate the probability of a range of values from a continuous probability distribution.

4. Introduce other continuous probability distributions.

---
## Discrete vs. continuous

- Recall that a _discrete probability distribution_ describes a random variable that produces a discrete set of outcomes.

--

- By contrast, a _continuous probability distribution_ describes a random variable that produces a continuous set of outcomes:
  - Temperature
  - Height
  - Reaction time

- If a variable can, in principle, be measured with arbitrary precision, it is a continuous random variable.

--

- As a result, while a discrete probability distribution is jagged, a continuous probability distribution is smooth.

.pull-left[
<!-- -->
]
.pull-right[
<!-- -->
]

---
## Discrete vs. continuous

- Continuous probability distributions differ from discrete ones in two other important ways:
  - `\(P(X=x)=0\)`: the probability of any single, exact value is zero, because a single point has no width and so no area under the curve.
  - Continuous probability distributions are described using a **probability density function (PDF)**, rather than a **probability mass function (PMF)**.

--

- Now, let's take a look at perhaps the most widely used continuous probability distribution...

---
class: center, middle

## Questions?

---
## Normal distribution

.pull-left[
- A **normal distribution** (AKA the Gaussian distribution) is a continuous distribution.
- It is unimodal (one peak) and symmetrical around its mean.
]
.pull-right[
<!-- -->
]

---
## Normal: PDF

$$
f(x;\mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x - \mu)^2}{2\sigma^2}}
$$

- A bit scary!
- But the basic points are:
  - It is a function of data *x*
  - And *two* parameters `\(\mu\)` and `\(\sigma\)` (mean and SD)

--

- There is not one single normal distribution.
  - We have a family of different distributions that are defined by their mean, `\(\mu\)`, and standard deviation, `\(\sigma\)`.

---
## The Standard Normal Distribution

- The **standard normal distribution** is a normal distribution where `\(\mu=0\)` and `\(\sigma=1\)`

.center[
<!-- -->
]

---
## Different Normal Distributions

- Adjusting `\(\mu\)`
  - Adjusting `\(\mu\)` changes where the curve is centered on the `\(x\)`-axis

.center[
<!-- -->
]

---
## Different Normal Distributions

- Adjusting `\(\sigma\)`
  - Adjusting `\(\sigma\)` changes the spread of the curve: a larger `\(\sigma\)` gives a wider, flatter curve, and a smaller `\(\sigma\)` gives a narrower, taller one

.center[
<!-- -->
]

---
## Properties of Normal Distributions

.pull-left[
- Properties of any normal distribution:
  - `\(\approx\)` 68% of the area falls within 1 SD of the mean.
  - `\(\approx\)` 95% of the area falls within 2 SD of the mean.
    - More precisely, 95% falls within `\(\pm\)` **1.96 SD** of the mean.
  - `\(\approx\)` 99.7% of the area falls within 3 SD of the mean.
]
.pull-right[
<!-- -->
]

---
class: center, middle

## Questions?

---
## Using the PDF of the normal distribution

- Let's use the normal distribution to illustrate how continuous probability distributions work.

--

- With a discrete random variable it makes sense to ask: 'What is the probability associated with a specific value of the random variable?'
  - e.g., what is the probability of getting heads on a fair coin?

--

- With a continuous random variable it makes sense to ask about ranges of values instead.
  - e.g., if we sample students from a university, what is the probability of sampling someone between 1.6 and 1.7 meters tall?

---
## Using the PDF of the normal distribution

.pull-left[
- Let's imagine that in some course, student height is normally distributed, with:
  + `\(\mu = 168\)` cm
  + `\(\sigma = 7.5\)` cm
- We can ask: what is the probability of sampling someone between 175 and 180 cm tall?
  + This question translates to: `\(p(175 \leq x \leq 180) = ?\)`
  + Let's unpack this...
]
.pull-right[
<!-- -->
]

---
## Using the PDF of the normal distribution

.pull-left[
`$$p(175 \leq x \leq 180) = ?$$`
+ Let's draw these boundaries on our plot...
]
.pull-right[
<!-- -->
]

---
## Using the PDF of the normal distribution

.pull-left[
`$$p(175 \leq x \leq 180) = ?$$`
+ Let's draw these boundaries on our plot...
+ What is the value of the area under the curve between these two lines?
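
One way to make "area under the curve" concrete is to slice the region into many thin rectangles and add up their areas. The sketch below is an optional illustration, not part of the course code: it writes the normal PDF out by hand from the formula on the *Normal: PDF* slide (the helper name `normal_pdf` is hypothetical), checks it against base R's density function `dnorm()`, and then sums 5,000 thin rectangles between 175 and 180 cm.

```r
# Hypothetical helper: the normal PDF, written out from the formula
normal_pdf <- function(x, mu, sigma) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

# Sanity check: the hand-written PDF should agree with base R's dnorm()
all.equal(normal_pdf(175, mu = 168, sigma = 7.5),
          dnorm(175, mean = 168, sd = 7.5))

# Approximate the area between 175 and 180 cm with thin rectangles
n_slices <- 5000                                 # number of rectangles
width <- (180 - 175) / n_slices                  # width of each rectangle (cm)
mids <- 175 + width * (seq_len(n_slices) - 0.5)  # midpoint of each rectangle
# Each rectangle's area is its height (the density) times its width
area <- sum(normal_pdf(mids, mu = 168, sigma = 7.5) * width)
area
```

The more rectangles you use, the closer this sum gets to the exact area, which is what the integral describes.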
]
.pull-right[
<!-- -->
]

---
## Using the PDF of the normal distribution

.pull-left[
- We get the area under a curve by calculating an integral: `$$\int_{a}^{b} f(x) \,dx$$`
  + Don't worry, you don't need to know the details of integrals, but you may encounter the equation above.
- This equation can be read as: the integral of the function `\(f(x)\)` between the vertical lines at `\(x = a\)` and `\(x = b\)`, i.e., the area under the curve between those two lines.
- In R, we don't need to integrate by hand: `pnorm()` gives the area under the normal PDF below a given value (the *cumulative* probability)...
]
.pull-right[
<!-- -->
]

---
## Using the PDF of the normal distribution

.pull-left[
- `pnorm(q, mean, sd)`
  + *q* is the upper threshold (a value of *x*); the function returns the probability of a value falling below this threshold.
  + *mean* and *sd* give the parameters of the normal distribution.
  + In other words, it returns the area under the normal curve below *q*.
  + Remember, the normal curve changes based on the values of `\(\mu\)` and `\(\sigma\)`, so it makes sense that this function requires these parameters.
]
.pull-right[
<!-- -->
]

---
## Using the PDF of the normal distribution

.pull-left[
- `pnorm(q, mean, sd)`
  + *q* is the upper threshold (a value of *x*); the function returns the probability of a value falling below this threshold.
  + *mean* and *sd* give the parameters of the normal distribution.
  + In other words, it returns the area under the normal curve below *q*.
  + Remember, the normal curve changes based on the values of `\(\mu\)` and `\(\sigma\)`, so it makes sense that this function requires these parameters.

```r
pnorm(180, mean=168, sd=7.5)
```

```
## [1] 0.9452007
```

> **Test Your Understanding:** How do you interpret this output?
]
.pull-right[
<!-- -->
]

---
## Using the PDF of the normal distribution

.pull-left[
- We can also calculate the area under the curve below 175 cm:

```r
pnorm(175, mean=168, sd=7.5)
```

```
## [1] 0.8246761
```
]
.pull-right[
<!-- -->
]

--

> **Test Your Understanding:** Now you know that 94.52% of student heights fall below 180 cm, and 82.47% of student heights fall below 175 cm.
How do you calculate the probability of selecting a student whose height falls between 175 and 180 cm?

---
## Using the PDF of the normal distribution

.pull-left[
+ `$$p(175 \leq x \leq 180) = p(x \leq 180) - p(x \leq 175)$$`

```r
p180 <- pnorm(180, mean=168, sd=7.5)
p175 <- pnorm(175, mean=168, sd=7.5)
p180-p175
```

```
## [1] 0.1205247
```

+ So, the probability of randomly selecting a student with a height between 175 and 180 cm is about 0.12.
]
.pull-right[
<!-- -->
]

---
## Using the PDF of the normal distribution

.pull-left[
- We can also ask about the probability of a sampled value falling in one of two (or more) ranges.
- What is the probability that a person will have a height below 151 cm or above 185 cm? `\(p(x \leq 151 \:or\: x \geq 185)\)`

```r
pnorm(151, mean=168, sd=7.5)
```

```
## [1] 0.0117053
```

```r
1 - pnorm(185, mean=168, sd=7.5)
```

```
## [1] 0.0117053
```

+ **Test your understanding:** Why are we subtracting a value from 1 here?
]
.pull-right[
<!-- -->
]

---
## Using the PDF of the normal distribution

.pull-left[
- `\(p(x \leq 151 \cup x \geq 185) = p(x \leq 151) + p(x \geq 185)\)`
- `\(0.0117 + 0.0117 \approx 0.023\)`
]
.pull-right[
<!-- -->
]

---
## Using the PDF of the normal distribution

- What if I wanted to know where the 5% of the most extreme values (i.e., smallest and largest) in this distribution fall?

--

- The normal distribution is symmetric, which means that the same proportion of extreme values lies in the lower tail as in the upper tail.

--

- This means the most extreme 5% will be the 2.5% at the bottom of the distribution and the 2.5% at the top.

--

- So our question is: what is the height below which there are only 2.5% of students, and what is the height above which there are only 2.5% of students?
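
The symmetry claim can be checked numerically. A minimal sketch, assuming the same height distribution as above (mean 168 cm, SD 7.5 cm); the distances `d` are arbitrary values chosen for illustration. It compares the area below `mu - d` with the area above `mu + d`, using the `lower.tail` argument of `pnorm()` for the upper tail.

```r
mu <- 168          # mean height (cm)
sigma <- 7.5       # SD of height (cm)
d <- c(5, 10, 17)  # arbitrary distances from the mean (cm), for illustration

# Area in the lower tail: below mu - d
lower_tail <- pnorm(mu - d, mean = mu, sd = sigma)
# Area in the upper tail: above mu + d (equivalent to 1 - pnorm(mu + d, ...))
upper_tail <- pnorm(mu + d, mean = mu, sd = sigma, lower.tail = FALSE)

# Symmetry: each lower-tail area matches the corresponding upper-tail area
all.equal(lower_tail, upper_tail)
```

With `d = 17`, the two tails are the 151 cm and 185 cm cut-offs from the earlier example.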

---
## Using the PDF of the normal distribution

.pull-left[
- To get these values, you can use `qnorm(p, mean, sd)`, where *p* is the cumulative probability (the area below the value you want).
- For a normally distributed range of heights with a mean of 168 cm and an SD of 7.5 cm:
  - The height below which 2.5% of students fall:

```r
qnorm(.025, mean=168, sd=7.5)
```

```
## [1] 153.3003
```

  - The height above which 2.5% of students fall (i.e., below which 97.5% fall):

```r
qnorm(.975, mean=168, sd=7.5)
```

```
## [1] 182.6997
```
]
.pull-right[
<!-- -->
]

---
## Take this knowledge forward

+ These examples might seem a bit bizarre (when will you ever need to calculate extreme heights?), but this will be incredibly relevant when you discuss:
  + 1- and 2-tailed tests
  + `\(p\)`-values
  + Distributions of test statistics
+ You may find it helpful to come back and review these slides when you get to these topics later in the course.

---
class: center, middle

## Questions?

---
## Remember `\(z\)`-scores

.pull-left[
.center[
$$
z = \frac{x - \mu}{\sigma}
$$
]

- It is quite typical to present a normal distribution in terms of **`\(z\)`-scores**.
- `\(z\)`-scores standardize values of `\(x\)`.
  - The numerator converts `\(x\)` to deviations from the mean.
  - The denominator scales these deviations by the spread of the distribution (the SD).
- If `\(x\)` is normally distributed, the result is the **standard normal distribution**, also known as the `\(z\)`-distribution.
]
.pull-right[
<!-- -->
]

---
## Standard normal vs. `\(t\)` distribution

.pull-left[
- There are other continuous probability distributions you'll be working with next semester, such as the `\(t\)`-distribution.
- The `\(t\)`-distribution is a bit like the `\(z\)`-distribution, but its shape differs slightly.
  - When calculating `\(t\)`, we replace the population SD, `\(\sigma\)`, with the sample estimate, `\(sd\)`.
  - As a result, the tails of the `\(t\)`-distribution are slightly heavier (higher) than those of the `\(z\)`-distribution, to account for the extra uncertainty that comes from using an estimate (`\(sd\)`) rather than the actual population value (`\(\sigma\)`).
]
.pull-right[
<!-- -->
]

---
## Summary of today

- Continuous probability distributions
- The normal distribution
- Using the normal distribution to estimate the probability of events
- The normal distribution and the `\(t\)`-distribution

---
## Next Tasks

+ Tomorrow, I'll present a live R session focused on continuous probability distributions
+ Next week, we will talk about samples and populations
+ This week:
  + Attend the live R session
  + Complete your lab
  + Check the [reading list](https://eu01.alma.exlibrisgroup.com/leganto/readinglist/lists/43349908530002466) for recommended reading
  + Come to office hours
    + Monica's office hours are today (Monday) from 2:30 - 3:30 in 7 George Sq, room UG44
  + Weekly quiz
    + Opens Monday 09:00
    + Closes Sunday 17:00
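
---
## Extra: `\(z\)`-scores and the `\(t\)`-distribution in R

This optional sketch is an illustration, not part of the course materials. Part 1 standardizes a height of 180 cm using the `\(z\)`-score formula and shows that `pnorm()` gives the same area on the raw scale and on the standard normal scale. Part 2 compares the upper-tail area beyond 1.96 under the standard normal and under a `\(t\)`-distribution, using base R's `pt()`. `pt()` needs a degrees-of-freedom value, a concept you will meet next semester; `df = 5` is an arbitrary choice for illustration.

```r
# Part 1: z-scores, using the height example (mean 168 cm, SD 7.5 cm)
mu <- 168     # mean height (cm)
sigma <- 7.5  # SD of height (cm)
x <- 180      # height of interest (cm)

z <- (x - mu) / sigma                     # standardized height: 12 / 7.5
p_raw <- pnorm(x, mean = mu, sd = sigma)  # area below 180 cm on the raw scale
p_z <- pnorm(z)                           # area below z on the standard normal
all.equal(p_raw, p_z)                     # same area, just a different scale

# Part 2: heavier tails of the t-distribution
df <- 5  # degrees of freedom: an arbitrary choice for illustration
tail_z <- pnorm(1.96, lower.tail = FALSE)       # area above 1.96, standard normal
tail_t <- pt(1.96, df = df, lower.tail = FALSE) # area above 1.96, t-distribution
c(normal = tail_z, t = tail_t)                  # the t-distribution has more area in the tail
```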