class: center, middle, inverse, title-slide .title[ #
Week 9: Discrete Probability Distributions
] .subtitle[ ## Data Analysis for Psychology in R 1
] .author[ ### DapR1 Team ] .institute[ ### Department of Psychology
The University of Edinburgh
]

---

# Week's Learning Objectives

1. Understand the concept of a random variable.
2. Understand the process of assigning probabilities to all outcomes.
3. Apply the understanding of discrete probability distributions to the example of the binomial distribution.
4. Understand the difference between a probability mass function (PMF) and a cumulative distribution function (CDF).

---

## Probability as it relates to Psychology...

+ Recall our definition of a **random experiment:**
  + It could (theoretically) be infinitely repeated
  + The outcome is uncertain

+ When we conduct a random experiment, we are sampling simple events from a *sample space* to get an outcome.
  + We can't be 100% certain which outcome will occur each time the experiment is repeated
  + An outcome's probability provides us with information that can be used to make decisions about data when we're faced with randomness

---

## Probability as it relates to Psychology...

.pull-left[
+ *Sample Space:* all student eye colours

+ *Simple Event:* the eye colour of an individual student

+ *Random Experiment:* Randomly selecting a student and checking their eye colour
]

.pull-right[
**DapR Student Eye Colours**

<!-- -->
]

---

## Random variables

.pull-left[
+ A **random variable** is a set of values that quantify the outcome of the random experiment.

+ Allows you to map the outcomes of a random experiment to numbers.

+ Usually denoted with a capital letter
]

--

.pull-right[
**Random Experiment:** Checking eye colour

**Random Variable:**

`$$X = \begin{cases}1\ if\ amber \\2\ if\ blue\\3\ if\ brown\\4\ if\ green\\5\ if\ grey\\6\ if\ hazel \end{cases}$$`
]

--

- A **discrete random variable** can assume only a finite number of different values
  - e.g., outcome of a coin toss; number of children in a family

--

- A **continuous random variable** is arbitrarily precise, and thus can take any of the infinitely many values in some range.
  - e.g., height, age, distance

--

> **Test your understanding:** What kind of variable is eye colour?

---

## Probability distributions

- A probability distribution maps each value of a random variable to the probability of its occurrence.

.center[
<!-- -->
]

---

## Probability Mass Function

+ For **discrete distributions**, probability is mapped to each outcome value via a **probability mass function**.

+ A probability mass function gives the probability that a discrete random variable exactly equals a specific value:

`$$f(x) = P(X=x)$$`

+ In the case of our eye colour example:

`$$f(x) = P(X = hazel)$$`

---

## Probability Mass Function

`$$f(x) = P(X=x)$$`

.pull-left[
+ Some observations (remember probability rules from last week):

+ If you have a random experiment with N possible outcomes, then:

`$$\sum_{i=1}^{N}(f(x_{i}))=1$$`

+ For any subset A of the sample space:

`$$P(A)=\sum_{i \in A}(f(x_{i}))$$`
]

--

.pull-right[
`$$P(hazel)=\sum_{i \in hazel}(f(h_i))$$`

<!-- -->

`$$P(hazel) = P(h_1) + P(h_2) + \dots + P(h_{43}) = \frac{43}{374}$$`
]

???
i.e., the probability of subset A is the sum of the probabilities of all the simple events x within A.

---

## Discrete random variables: An example

+ **Simple Experiment:** Rolling two six-sided dice

> **Test your understanding:** What might we use for our random variable?

--

+ **Discrete random variable:** The sum of the two upward-facing sides

--

- **Assumptions:**
  1. The dice are fair
  2. The outcome of each die is *independent* of the outcome of the other.

---

## Discrete random variables: An example

.pull-left[
**Sample space**, `\(S\)`:

<img src="figures/DiceSampleSpace.png" width="85%" />
]

--

.pull-right[
+ We can represent `\(S\)` as a frequency distribution.
+ **Frequency distribution:** Mapping the values of the random variable to how often they occur

<table> <tbody> <tr> <td style="text-align:left;font-weight: bold;"> Outcome </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> 6 </td> <td style="text-align:left;"> 7 </td> <td style="text-align:left;"> 8 </td> <td style="text-align:left;"> 9 </td> <td style="text-align:left;"> 10 </td> <td style="text-align:left;"> 11 </td> <td style="text-align:left;"> 12 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> Frequency </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> 6 </td> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> </tr> </tbody> </table>
]

---
count: false

## Discrete random variables: An example

.pull-left[
**Sample space**, `\(S\)`:

<img src="figures/DiceSampleSpace.png" width="85%" />
]

.pull-right[
+ We can represent `\(S\)` as a frequency distribution.
+ **Frequency distribution:** Mapping the values of the random variable to how often they occur

<table> <tbody> <tr> <td style="text-align:left;font-weight: bold;"> Outcome </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> 6 </td> <td style="text-align:left;"> 7 </td> <td style="text-align:left;"> 8 </td> <td style="text-align:left;"> 9 </td> <td style="text-align:left;"> 10 </td> <td style="text-align:left;"> 11 </td> <td style="text-align:left;"> 12 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> Frequency </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> 6 </td> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> </tr> </tbody> </table>

+ Probabilities are just frequency over total possible outcomes:

`$$P(x) = \frac{\text{ways } x \text{ can happen}}{\text{total possible outcomes}}$$`

> **Test Your Understanding:** What is the probability of the dice summing to `7`?
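
--

+ We can check this in R by enumerating the sample space (a quick sketch using base R's `expand.grid` and `rowSums`; not part of the original lesson code):

```r
# enumerate all 36 equally likely (die 1, die 2) outcomes
dice <- expand.grid(die1 = 1:6, die2 = 1:6)

# frequency of each possible sum
table(rowSums(dice))
```

+ Reading the count for `7` from this table gives the answer to the question above.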
]

---

## Discrete random variables: An example

.pull-left[
+ First, we need to **sum the frequencies** to get the total number of possible outcomes:

```r
sum(table_data$Frequency)
```

```
## [1] 36
```
]

--

.pull-right[
+ Next, we **divide the frequency of each outcome by the total frequency**:

`$$P(X=2) = \frac{1}{36} = .03 \\P(X=3) = \frac{2}{36} = .06 \\\vdots \\P(X=12) = \frac{1}{36} = .03$$`
]

--

+ This gives us a **discrete probability distribution:**

<table> <tbody> <tr> <td style="text-align:left;font-weight: bold;"> Outcome </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> 6 </td> <td style="text-align:left;"> 7 </td> <td style="text-align:left;"> 8 </td> <td style="text-align:left;"> 9 </td> <td style="text-align:left;"> 10 </td> <td style="text-align:left;"> 11 </td> <td style="text-align:left;"> 12 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> Frequency </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> 6 </td> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> Probability </td> <td style="text-align:left;"> 0.03 </td> <td style="text-align:left;"> 0.06 </td> <td style="text-align:left;"> 0.08 </td> <td style="text-align:left;"> 0.11 </td> <td style="text-align:left;"> 0.14 </td> <td style="text-align:left;"> 0.17 </td> <td style="text-align:left;"> 0.14 </td> <td style="text-align:left;"> 0.11 </td> <td style="text-align:left;"> 0.08 </td> <td style="text-align:left;"> 0.06 </td> <td style="text-align:left;"> 0.03 </td> </tr>
</tbody> </table> --- ## Probability mass function - You can plot a discrete probability distribution using a bar plot: .center[ <table> <tbody> <tr> <td style="text-align:left;font-weight: bold;"> Outcome </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> 6 </td> <td style="text-align:left;"> 7 </td> <td style="text-align:left;"> 8 </td> <td style="text-align:left;"> 9 </td> <td style="text-align:left;"> 10 </td> <td style="text-align:left;"> 11 </td> <td style="text-align:left;"> 12 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> Frequency </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> 6 </td> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> Probability </td> <td style="text-align:left;"> 0.03 </td> <td style="text-align:left;"> 0.06 </td> <td style="text-align:left;"> 0.08 </td> <td style="text-align:left;"> 0.11 </td> <td style="text-align:left;"> 0.14 </td> <td style="text-align:left;"> 0.17 </td> <td style="text-align:left;"> 0.14 </td> <td style="text-align:left;"> 0.11 </td> <td style="text-align:left;"> 0.08 </td> <td style="text-align:left;"> 0.06 </td> <td style="text-align:left;"> 0.03 </td> </tr> </tbody> </table> <!-- --> ] --- class: center, middle # Questions? 
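
---

## Building the distribution in R

+ The frequency-to-probability calculation above can be sketched in a few lines of base R (the vector names here are illustrative, not from the lesson code):

```r
outcome     <- 2:12
frequency   <- c(1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1)

# probability = frequency / total possible outcomes
probability <- frequency / sum(frequency)
round(probability, 2)
```

```
## [1] 0.03 0.06 0.08 0.11 0.14 0.17 0.14 0.11 0.08 0.06 0.03
```

+ Note that the probabilities sum to 1, as a probability distribution must.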
---

## Binomial Distributions

+ A common type of discrete probability distribution is the **binomial distribution**

+ Properties:
  + There are only two possible outcomes, one reflecting `success` and one reflecting `failure`
  + The number of observations (*n*) is fixed
  + Each observation is independent of the others
  + The probability of success (*p*) is the same for each observation.

--

+ We are interested in the number of successes (*k*) given a fixed number of trials (*n*)

--

> **Test your understanding:** Identify `success` and `\(n\)` in the following examples:
> + The number of tails in a sequence of 5 coin tosses

--

> + The incidence of a disease in a sample of 100 participants

---

## Binomial Probability Mass Function

$$ P(X = k) = \binom{n}{k}p^{k}q^{n-k} $$

- `\(k\)` = number of `successes`
- `\(n\)` = total number of trials
- `\(p\)` = probability of `success`
- `\(q\)` = `\(1-p\)`, or the probability of `failure`
- `\(\binom{n}{k}\)` = `\(n\)` choose `\(k\)`, or the number of ways to select `\(k\)` `successes` from `\(n\)` observations (aka a *combination*).

---

## Binomial PMF - Worked Example

$$ P(X = k) = \binom{n}{k}p^{k}q^{n-k} $$

+ **Example:**
  + Random Experiment - Participants were asked to guess which hand a coin was hidden in, across 5 trials.
  + We want to calculate the probability of the participant selecting the correct hand on 3 of the 5 trials.

+ This looks overwhelming, but let's break it down into its separate parts.

> Step 1 - Identify `\(n\)`, `\(p\)`, `\(q\)`, and `\(k\)` and plug them into the equation

--

> + `\(n\)` = 5
> + `\(p\)` = 0.5
> + `\(q\)` = 0.5
> + `\(k\)` = 3

---

## Binomial PMF - Worked Example

$$ P(X = 3) = \binom{5}{3}\times0.5^{3}\times0.5^{5-3} $$

.pull-left[
> Step 2 - `\(\binom{5}{3}\)`

+ Reflects the number of ways we could get 3 `successes` from 5 trials

+ This could happen in multiple ways...
] -- .pull-right[ | Trial 1 | Trial 2 | Trial 3 | Trial 4 | Trial 5 | |---------|---------|---------|---------|---------| | <span style='color: #BF1932;'>Y</span> | <span style='color: #BF1932;'>Y</span> | <span style='color: #BF1932;'>Y</span> | N | N | | <span style='color: #BF1932;'>Y</span> | <span style='color: #BF1932;'>Y</span> | N | <span style='color: #BF1932;'>Y</span> | N | | <span style='color: #BF1932;'>Y</span> | <span style='color: #BF1932;'>Y</span> | N | N | <span style='color: #BF1932;'>Y</span> | | <span style='color: #BF1932;'>Y</span> | N | <span style='color: #BF1932;'>Y</span> | <span style='color: #BF1932;'>Y</span> | N | | <span style='color: #BF1932;'>Y</span> | N | <span style='color: #BF1932;'>Y</span> | N | <span style='color: #BF1932;'>Y</span> | | <span style='color: #BF1932;'>Y</span> | N | N | <span style='color: #BF1932;'>Y</span> | <span style='color: #BF1932;'>Y</span> | | N | <span style='color: #BF1932;'>Y</span> | <span style='color: #BF1932;'>Y</span> | <span style='color: #BF1932;'>Y</span> | N | | N | <span style='color: #BF1932;'>Y</span> | <span style='color: #BF1932;'>Y</span> | N | <span style='color: #BF1932;'>Y</span> | | N | <span style='color: #BF1932;'>Y</span> | N | <span style='color: #BF1932;'>Y</span> | <span style='color: #BF1932;'>Y</span> | | N | N | <span style='color: #BF1932;'>Y</span> | <span style='color: #BF1932;'>Y</span> | <span style='color: #BF1932;'>Y</span> | ] --- ## Binomial PMF - Worked Example $$ P(X = 3) = \binom{5}{3}\times0.5^{3}\times0.5^{5-3} $$ .pull-left[ > Step 2 - `\(\binom{5}{3}\)` + Reflects the number of ways we could get 3 `successes` from 5 trials + This could happen in multiple ways... 
+ We could calculate this by hand, but it's much easier to use the formula for `\(\binom{n}{k}\)`: + `$$\binom{n}{k} = \frac{n!}{k!(n-k)!}$$` ] .pull-right[ | Trial 1 | Trial 2 | Trial 3 | Trial 4 | Trial 5 | |---------|---------|---------|---------|---------| | <span style='color: #BF1932;'>Y</span> | <span style='color: #BF1932;'>Y</span> | <span style='color: #BF1932;'>Y</span> | N | N | | <span style='color: #BF1932;'>Y</span> | <span style='color: #BF1932;'>Y</span> | N | <span style='color: #BF1932;'>Y</span> | N | | <span style='color: #BF1932;'>Y</span> | <span style='color: #BF1932;'>Y</span> | N | N | <span style='color: #BF1932;'>Y</span> | | <span style='color: #BF1932;'>Y</span> | N | <span style='color: #BF1932;'>Y</span> | <span style='color: #BF1932;'>Y</span> | N | | <span style='color: #BF1932;'>Y</span> | N | <span style='color: #BF1932;'>Y</span> | N | <span style='color: #BF1932;'>Y</span> | | <span style='color: #BF1932;'>Y</span> | N | N | <span style='color: #BF1932;'>Y</span> | <span style='color: #BF1932;'>Y</span> | | N | <span style='color: #BF1932;'>Y</span> | <span style='color: #BF1932;'>Y</span> | <span style='color: #BF1932;'>Y</span> | N | | N | <span style='color: #BF1932;'>Y</span> | <span style='color: #BF1932;'>Y</span> | N | <span style='color: #BF1932;'>Y</span> | | N | <span style='color: #BF1932;'>Y</span> | N | <span style='color: #BF1932;'>Y</span> | <span style='color: #BF1932;'>Y</span> | | N | N | <span style='color: #BF1932;'>Y</span> | <span style='color: #BF1932;'>Y</span> | <span style='color: #BF1932;'>Y</span> | ] --- ## Binomial PMF - Worked Example Step 2 `$$\binom{5}{3} = \frac{5!}{3!(5-3)!}$$` `$$5! 
= 5\times 4\times 3\times 2\times 1 = 120$$`

--

`$$\binom{5}{3}=\frac{5!}{3!(5-3)!} = \frac{5!}{3!2!} = \frac{120}{6\times 2} = 10$$`

+ There are 10 ways to get 3 `successes` from 5 trials

---

## Binomial PMF - Worked Example Steps 3 & 4

$$ P(X = 3) = 10 \times 0.5^{3} \times 0.5^{5-3} $$

--

> Step 3 - `\(p^{k}\)`

+ `\(0.5^3 = 0.125\)`

--

> Step 4 - `\(q^{n-k}\)`

+ `\(0.5^{5-3} = 0.5^2 = 0.25\)`

---

## Binomial PMF - Worked Example Step 5

> Step 5 - Put it all together

$$ P(X = 3) = 10 \times 0.125 \times 0.25 = 0.3125 $$

--

+ Congratulations! We've worked out the probability of one possible outcome ( `\(X=3\)` ) of our random experiment!

--

... but we still have 5 more.

| `\(k\)` | `\(P(X=k)\)` |
|-----|----------|
| 0 | ? |
| 1 | ? |
| 2 | ? |
| 3 | .3125 |
| 4 | ? |
| 5 | ? |

---

## Binomial PMF in R

+ Luckily, you can use the `dbinom` function in R to calculate these probabilities for you:

```r
dbinom(x, size, prob)
```

+ Where:
  + `x` = `\(k\)`
  + `size` = `\(n\)`
  + `prob` = `\(p\)`

--

```r
dbinom(3, 5, 0.5)
```

```
## [1] 0.3125
```

---
class: center, middle

# Questions?
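
---

## Checking the worked example in R

+ Each step of the hand calculation above has a base R counterpart (a quick sanity check, not part of the original worked example):

```r
factorial(5)                          # 5! = 120
choose(5, 3)                          # ways to get 3 successes in 5 trials: 10

# P(X = 3), assembled from the pieces
choose(5, 3) * 0.5^3 * 0.5^(5 - 3)

# the full distribution for k = 0, ..., 5
round(dbinom(0:5, size = 5, prob = 0.5), 2)
```

+ `dbinom` agrees with the step-by-step calculation: `\(P(X = 3) = 0.3125\)`.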
---

## Visualising binomial probability distribution

.pull-left[
+ We can pass these values to `ggplot` to produce a bar plot that shows the binomial probability distribution for this random experiment:

<table> <thead> <tr> <th style="text-align:right;"> k </th> <th style="text-align:right;"> Pk </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0.03 </td> </tr> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0.16 </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 0.31 </td> </tr> <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 0.31 </td> </tr> <tr> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 0.16 </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 0.03 </td> </tr> </tbody> </table>
]

.pull-right[
<!-- -->
]

---

## Cumulative probability

.pull-left[
+ We've been using the **probability mass function** to investigate the probability of each individual outcome.

+ The **cumulative distribution function** (CDF) gives the total probability of all values up to and including a given point.

+ With a binomial distribution, the cumulative probability function simply sums the probabilities of the individual outcomes.
+ In R, we can use `pbinom` to get cumulative probabilities:

```r
round(pbinom(0:5, 5, 0.5), 2)
```

```
## [1] 0.03 0.19 0.50 0.81 0.97 1.00
```
]

.pull-right[
<!-- -->
]

---

## Interpreting the CDF

.pull-left[
+ **A** reflects the probability of selecting the correct hand 0, 1, or 2 times out of five trials
  + In this example, 50%

+ **B** reflects the individual probability of selecting the correct hand 3 out of 5 trials
  + The difference between the probability of selecting the correct hand on 0, 1, 2, or 3 trials and the probability of selecting the correct hand on 0, 1, or 2 trials
  + `\(0.8125 - 0.5 = 0.3125\)`
]

.pull-right[
<!-- -->
]

---
class: center, middle

# Questions?

---

# Summary of today

- Random variables and random experiments
- Assigning probabilities to outcomes and defining a probability distribution
- Probability mass functions vs. cumulative distribution functions
- The binomial distribution for assigning probabilities to sets of outcomes

---

# Next tasks

+ Tomorrow, I'll present a live R session focused on computing and plotting discrete probability distributions

+ Next week, we will talk about continuous probability distributions

+ This week:
  + Attend the live R session
  + Complete your lab
  + Check the [reading list](https://eu01.alma.exlibrisgroup.com/leganto/readinglist/lists/43349908530002466) for recommended reading
  + Come to office hours
    + Monica's office hours are TODAY (Monday) from 2:30 - 3:30 in 7 George Sq, room UG44
  + Weekly quiz
    + Opens Monday 09:00
    + Closes Sunday 17:00