class: center, middle, inverse, title-slide

.title[
# Week 3: Testing Statistical Hypotheses
]
.subtitle[
## Univariate Statistics and Methodology using R
]
.author[
### 
]
.institute[
### Department of Psychology
The University of Edinburgh
]

---
# Today's Key Topics

+ One-tailed vs Two-tailed Hypotheses
+ Null vs Alternative Hypotheses
+ The Null Distribution
+ z-Scores
+ t-tests
+ `\(\alpha\)`

---
class: inverse, center, middle

# Part 1

---
# More about Height

.pull-left[
+ Last time we simulated the heights of a population of 10,000 people
+ `\(\mu\)` = 170 cm
+ `\(\sigma\)` = 12 cm

```
##   height
## 1  183.4
## 2  168.6
## 3  179.1
## 4  170.1
```

![](lecture_3_files/figure-html/unnamed-chunk-2-1.svg)<!-- -->
]

--

.pull-right[
+ This time, you'll learn how to compute the probability of randomly observing a specific value within the normal distribution

.center[
<img src="lecture_3_files/img/playmo_tms.jpg" width="65%" />
]]

???
More specifically, a more extreme value than your value of interest

---
# How Unusual is Casper?

.pull-left[
- In his socks, Casper is 198 cm tall
- How likely would we be to find someone Casper's height in our population?

.center[
<img src="lecture_3_files/img/playmo_tms.jpg" width="55%" />
]
]
.pull-right[
![](lecture_3_files/figure-html/unnamed-chunk-5-1.svg)<!-- -->
]

???
- we can mark Casper's height on our normal curve
- but we can't do anything with this information
  + technically the line has "no width", so we can't calculate area
  + we need to reformulate the question

---
# How Unusual is Casper (Take 2)?

.pull-left[
- In his socks, Casper is 198 cm tall
- How likely would we be to find someone Casper's height **or taller** in our population?

.center[
<img src="lecture_3_files/img/playmo_tms.jpg" width="55%" />
]
]
.pull-right[
![](lecture_3_files/figure-html/unnamed-chunk-7-1.svg)<!-- -->
]

---
count: false
# How Unusual is Casper (Take 2)?

.pull-left[
- In his socks, Casper is 198 cm tall
- How likely would we be to find someone Casper's height **or taller** in our population?
- The area is 0.0098
- So the probability of finding someone in the population of Casper's height or greater is 0.0098 (or, `\(p=0.0098\)`)
]
.pull-right[
![](lecture_3_files/figure-html/unnamed-chunk-8-1.svg)<!-- -->
]

---
# Area under the Curve

.pull-left[
- So now we know that the area under the curve can be used to quantify **probability**
- But how do we calculate area under the curve?
- Luckily, R has us covered, using (in this case) the `pnorm()` function

```r
pnorm(198, mean = 170, sd = 12, lower.tail = FALSE)
```

```
## [1] 0.009815
```
]
.pull-right[
![](lecture_3_files/figure-html/unnamed-chunk-9-1.svg)<!-- -->
]

---
# Area under the Curve

.center[
![](lecture_3_files/figure-html/tails-1.svg)<!-- -->
]

.pull-left[
```r
pnorm(198, mean = 170, sd = 12, lower.tail = TRUE)
```

```
## [1] 0.9902
```
]
.pull-right[
```r
pnorm(198, mean = 170, sd = 12, lower.tail = FALSE)
```

```
## [1] 0.009815
```
]

--

```r
pnorm(198, mean = 170, sd = 12, lower.tail = TRUE) +
  pnorm(198, mean = 170, sd = 12, lower.tail = FALSE)
```

```
## [1] 1
```

???
+ Note that if you add the two outputs together, you get 1, the total possible area under the curve.
+ Obviously you could have come to this conclusion by looking at the numbers, but I also wanted to reinforce that you can add function outputs together in R.

---
# Tailedness

.pull-left[
+ In this example, we kind of knew that Casper was _tall_
+ It made sense to ask what the likelihood of finding someone 198 cm _or greater_ was
+ This is called a **one-tailed hypothesis** (we're not expecting Casper to be well _below_ average height!)
]

---
count: false
# Tailedness

.pull-left[
- In this example, we kind of knew that Casper was _tall_
  + It made sense to ask what the likelihood of finding someone 198 cm _or greater_ was
  + This is called a **one-tailed hypothesis**
- Often our hypothesis might be vaguer
  + We expect Casper to be _"different"_, but we're not sure how
  + In this case, we would make a non-directional, or **two-tailed hypothesis**
]
.pull-right[
![](lecture_3_files/figure-html/unnamed-chunk-13-1.svg)<!-- -->
]

---
# Tailedness

.pull-left[
+ For a two-tailed hypothesis we need to sum the relevant upper and lower areas:

```r
2 * pnorm(198, 170, 12, lower.tail = FALSE)
```

```
## [1] 0.01963
```
]
.pull-right[
![](lecture_3_files/figure-html/unnamed-chunk-15-1.svg)<!-- -->
]

???
Since the normal curve is symmetrical, summing the two tails is easy!

---
# So: Is Casper Special?

- How surprised should we be that Casper is 198 cm tall?
- Given the population he's in, the probability that he's 28 cm or more above the mean of 170 is 0.0098
  + (Keep in mind, this is according to a _one-tailed hypothesis_)

--

- A more accurate way of saying this is that 0.0098 is the probability of selecting him (or someone even taller than him) from the population at random
  + There is about a 1% chance of selecting someone Casper's size or taller from the population.

---
# A Judgement Call

We have to decide:

--

.pull-left[
.center[
If a 1% probability is _small enough_
]
.center[
<img src="lecture_3_files/img/playmo_good.jpg" width="55%" />
]]

???
If a 1% probability of selecting someone 28 cm or more above the mean is enough to surprise us, then we should be surprised

If, on the other hand, we think 1% isn't particularly low, then we shouldn't be surprised

--

.pull-right[
.center[
If a 1% chance doesn't impress us much
]
.center[
<img src="lecture_3_files/img/playmo_bad.jpg" width="55%" />
]]

--

Note that, in either case, we have nothing (mathematical) to say about the _reasons_ for Casper's height

???
when it comes to comparing means, we'll see that there are conventions for the criteria we use, but they are just that: conventions, because this is a judgement call.

And importantly, none of the maths we have done will tell us anything about the _reasons_ for Casper's height: that's for us to reason about. Perhaps he's so tall because he's weightless, or perhaps he had a lot of milk in his diet. That's the substance of the scientific paper we're going to write: the statistical calculation just tells us that he's mildly unusual.

In the next part, we're going to look at how this works for group means as opposed to individuals, but the TL;DR is: pretty much the same.

---
class: inverse, center, middle, animated, rubberBand

# End of Part 1

---
class: inverse, center, middle

# Part 2
### Group Means

---
# Sleeping Guidelines

.br3.pa2.pt2.bg-gray.white.f3[
The USMR instructors are concerned that university students are not following the recommended sleep guidelines of 8 hours per night, and worry this could affect their academic performance. Is this idea worth further investigation?
]

--

![](lecture_3_files/figure-html/invest-1.svg)<!-- -->

???
We ask students to record the number of hours they slept the night before (also, can anyone think of any problems with this method?).
Have students state the hypothesis and null hypothesis

---
# Information About our Study

.pull-left[
+ **Null Hypothesis** ( `\(\sf{H_{0}}\)` )
  + Students are getting the recommended amount of sleep
+ **Alternative Hypothesis** ( `\(\sf{H_{1}}\)` )
  + Students are getting less than the recommended amount of sleep
+ There are 12 students
]
.pull-right[
```r
summary(m)
```

```
##      sleep          names  
##  Min.   :3.28   Abigail:1  
##  1st Qu.:5.16   Brent  :1  
##  Median :7.38   Chenyu :1  
##  Mean   :6.81   Dave   :1  
##  3rd Qu.:8.33   Emil   :1  
##  Max.   :9.79   Fergus :1  
##                 (Other):6  
```

```r
sd(m$sleep)
```

```
## [1] 2.273
```
]

--

.br3.pa2.pt2.bg-gray.white.f4[
What is the probability of a group of 12 people getting a mean of 6.81 hours of sleep, given that most adults need around 8? (assuming they came from the same population)
]

???
Is this a one-tailed or two-tailed hypothesis? What would the two-tailed version be?

Why is that the question that we care about? And what does it mean to come from the same population? To understand that, we need to go back to the idea of the normal distribution we talked about last week.

---
# Back to the Normal Distribution

.center[
![](lecture_3_files/figure-html/unnamed-chunk-21-1.svg)<!-- -->
]

.br3.pa2.pt2.bg-gray.white.f4[
What is the probability of a group of 12 people getting a mean of 6.81 hours of sleep, given that most adults need around 8? (assuming they came from the same population)
]

???
Let's take a guess that hours of sleep is a normally distributed variable in the population, with a mean of around 8. So let's look at a density plot of this variable (remember, this shows the frequency of each value of the variable, or how often that value is measured). If hours of sleep is a normally distributed variable, it means that the hours of sleep most people need falls right around the mean, with some people needing fewer hours and some people needing more.

What do we mean when we say same population? In general, we mean a collection of individuals who have similar characteristics. For the purpose of this example, our population will refer to adults with typical sleep needs. For example, the sleep recommendation of around 8 hours is for those 18 years and older, so in this case children aren't part of our population. Someone with a sleep disorder is also not part of our population.

---
count: false
# Back to the Normal Distribution

.center[
![](lecture_3_files/figure-html/unnamed-chunk-22-1.svg)<!-- -->
]

.br3.pa2.pt2.bg-gray.white.f4[
What is the probability of a group of 12 people getting a mean of 6.81 hours of sleep, given that most adults need around 8? (assuming they came from the same population)
]

---
# The Null Distribution

.center[
![](lecture_3_files/figure-html/unnamed-chunk-23-1.svg)<!-- -->
]

+ `\(\sf{H_{0}}\)`: Students are getting the recommended amount of sleep
+ If `\(\sf{H_{0}}\)` reflects the **ground truth** (and _sleep_ is a normally distributed variable), we would expect a frequency distribution of student measurements to look pretty similar to this

---
count: false
# The Null Distribution

.center[
![](lecture_3_files/figure-html/unnamed-chunk-24-1.svg)<!-- -->
]

+ `\(\sf{H_{0}}\)`: Students are getting the recommended amount of sleep
+ If `\(\sf{H_{0}}\)` reflects the **ground truth** (and _sleep_ is a normally distributed variable), we would expect a frequency distribution of student measurements to look pretty similar to this

???
The null distribution is the probability distribution of the scores when the null hypothesis is accurate.
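As a rough check, we could even simulate this null distribution ourselves. This is a minimal sketch (assuming, under `\(\sf{H_{0}}\)`, that sleep is roughly normal with a mean of 8 and the sd of 2.27 we observed; the seed and group count are arbitrary, not lecture code):

```r
# simulate the null distribution of the mean for many groups of 12
set.seed(1)                # arbitrary seed (an assumption)
null_means <- replicate(10000, mean(rnorm(12, mean = 8, sd = 2.27)))
hist(null_means)           # centred on 8: the null distribution of the mean
mean(null_means <= 6.81)   # roughly the 0.0348 we get from pnorm() shortly
```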
But what if the null hypothesis isn't accurate? What if the reason we see differences in the two means is that there are systematic differences in our group that affect their sleep? If that were truly the case, the distribution of the scores that we measured may support our alternative hypothesis. So let's map those.

---
# The Null Distribution

.center[
![](lecture_3_files/figure-html/unnamed-chunk-25-1.svg)<!-- -->
]

--

+ From this, it _looks_ as though our students may be getting less sleep than they should.

--

+ However, simply seeing a shift in the curves isn't enough evidence to make that claim with certainty.

---
# Back to the Normal Distribution...Again

We can compute the probability of a score's occurrence if it is part of a standardized normal distribution ( `\(\mu\)` = 0, `\(\sigma\)` = 1)

.pull-left[
![](lecture_3_files/figure-html/unnamed-chunk-26-1.svg)<!-- -->
]

--

.pull-right[
If our data are normally distributed, we can standardize the mean by converting it to a **z-score**

.center[
`\(z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}\)`
]]

---
count: false
# Back to the Normal Distribution...Again

We can compute the probability of a score's occurrence if it is part of a standardized normal distribution ( `\(\mu\)` = 0, `\(\sigma\)` = 1)

.pull-left[
![](lecture_3_files/figure-html/unnamed-chunk-27-1.svg)<!-- -->
]
.pull-right[
If our data are normally distributed, we can standardize the mean by converting it to a **z-score**

.center[
`\(z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}\)`
]

```r
(mean(m$sleep) - 8)/(sd(m$sleep)/sqrt(12))
```

```
## [1] -1.814
```
]

---
count: false
# Back to the Normal Distribution...Again

We can compute the probability of a score's occurrence if it is part of a standardized normal distribution ( `\(\mu\)` = 0, `\(\sigma\)` = 1)

.pull-left[
![](lecture_3_files/figure-html/unnamed-chunk-29-1.svg)<!-- -->
]
.pull-right[
If our data are normally distributed, we can standardize the mean by converting it to a **z-score**

.center[
`\(z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}\)`
]

```r
(mean(m$sleep) - 8)/(sd(m$sleep)/sqrt(12))
```

```
## [1] -1.814
```

```r
pnorm(-1.814, mean = 0, sd = 1)
```

```
## [1] 0.03484
```
]

---
# Back to the Normal Distribution...Again

We can compute the probability of a score's occurrence if it is part of a standardized normal distribution ( `\(\mu\)` = 0, `\(\sigma\)` = 1)

.pull-left[
![](lecture_3_files/figure-html/unnamed-chunk-32-1.svg)<!-- -->
]
.pull-right[
**If you picked 12 people at random from a population of people who get the recommended number of hours of sleep, there would be about a 3.5% chance that their average sleep would be 6.81 hours or less**
]

---
class: inverse, middle, center, animated, rubberBand

# End of Part 2

---
class: inverse, middle, center

# Part 3
### The `\(t\)`-test

---
# A Small Confession

.pull-left[
<img src="lecture_3_files/img/playmo_liar.jpg" width="70%" />
]
.pull-right[
### Part 2 wasn't entirely true

- All of the principles are correct, but for smaller `\(n\)` the normal curve isn't the best estimate
- For that we use the `\(t\)` distribution
]

---
# The `\(t\)` Distribution

.pull-left[
.center[
<img src="lecture_3_files/img/gossett.jpg" width="40%" />

"A. Student", or William Sealy Gossett
]]
.pull-right[
![](lecture_3_files/figure-html/unnamed-chunk-35-1.svg)<!-- -->
]

???
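For reference, a quick sketch (mine, not from the lecture code) of drawing `\(t\)` curves against the normal in base R:

```r
# the t distribution has heavier tails than the normal, especially with few df
curve(dnorm(x), from = -4, to = 4, ylab = "density")
curve(dt(x, df = 2), add = TRUE, lty = 2)   # very heavy tails
curve(dt(x, df = 11), add = TRUE, lty = 3)  # much closer to the normal
```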
The official name for the `\(t\)`-distribution is "Student's t-distribution", after William Gossett's pen name.

Gossett specialised in statistics for relatively small numbers of observations, working with Pearson, Fisher, and others

---
count: false
# The `\(t\)` Distribution

.pull-left[
.center[
<img src="lecture_3_files/img/gossett.jpg" width="40%" />

"A. Student", or William Sealy Gossett
]]
.pull-right[
![](lecture_3_files/figure-html/unnamed-chunk-37-1.svg)<!-- -->
]

.pt2[
Note that the shape changes according to _degrees of freedom_
]

---
count: false
# The `\(t\)` Distribution

.pull-left[
.center[
<img src="lecture_3_files/img/gossett.jpg" width="40%" />

"A. Student", or William Sealy Gossett
]]
.pull-right[
![](lecture_3_files/figure-html/unnamed-chunk-39-1.svg)<!-- -->
]

.pt2[
Note that the shape changes according to _degrees of freedom_
]

---
# The `\(t\)` Distribution

+ Conceptually, the `\(t\)` distribution increases uncertainty when the sample is small
+ The probability of more extreme values is slightly higher
+ Exact shape of distribution depends on sample size

![](lecture_3_files/figure-html/ts-1.svg)<!-- -->

???
here, I'm only showing the right-hand side of each distribution, so that you can see the differences between different degrees of freedom

the distributions are, of course, symmetrical

---
# Using the `\(t\)` Distribution

- In Part 2, we calculated the mean hours of sleep for the group as 6.81
- We used the formula `\(z=\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}\)` to calculate `\(z\)`, and the standard normal curve to calculate probability

--

- **The formula for a one-sample `\(t\)`-test is the same as the formula for `\(z\)`**
  + What differs is the _distribution we are using to calculate probability_
  + We need to know the degrees of freedom (to get the right `\(t\)`-curve)
- so `\(t(\textrm{df}) = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}\)`

---
# Probability According to `\(t\)`

.pull-left[
- For 12 people who got a mean of 6.81 hours of sleep with a sd of 2.2732
- `\(t(11) = \frac{6.81 - 8}{2.2732/\sqrt{12}} \approx -1.814\)`
]
.pull-right[
![](lecture_3_files/figure-html/unnamed-chunk-40-1.svg)<!-- -->
]

---
count: false
# Probability According to `\(t\)`

.pull-left[
- For 12 people who got a mean of 6.81 hours of sleep with a sd of 2.2732
- `\(t(11) = \frac{6.81 - 8}{2.2732/\sqrt{12}} \approx -1.814\)`
- Instead of `pnorm()` we use `pt()` for the `\(t\)` distribution
]
.pull-right[
![](lecture_3_files/figure-html/unnamed-chunk-41-1.svg)<!-- -->
]

---
count: false
# Probability According to `\(t\)`

.pull-left[
- For 12 people who got a mean of 6.81 hours of sleep with a sd of 2.2732
- `\(t(11) = \frac{6.81 - 8}{2.2732/\sqrt{12}} \approx -1.814\)`
- Instead of `pnorm()` we use `pt()` for the `\(t\)` distribution
- `pt()` requires the degrees of freedom:

```r
pt(-1.814, df = 11, lower.tail = TRUE)
```

```
## [1] 0.04851
```
]
.pull-right[
![](lecture_3_files/figure-html/unnamed-chunk-43-1.svg)<!-- -->
]

---
# Did We Have to Do All That Work?

--

No.

--

.flex.items-top[
.w-70.pa2[
```r
head(m$sleep)
```

```
## [1] 7.122 9.791 5.674 8.264 8.176 7.647
```

```r
t.test(m$sleep, mu = 8, alternative = "less")
```

```
## 
##  One Sample t-test
## 
## data:  m$sleep
## t = -1.8, df = 11, p-value = 0.05
## alternative hypothesis: true mean is less than 8
## 95 percent confidence interval:
##  -Inf 7.988
## sample estimates:
## mean of x 
##     6.809
```
]
.w-30.pa2[
- **One-sample** `\(t\)`-test
- Compares a single sample against a hypothetical mean (`mu`)
]]

???
note the use of `alternative = "less"` here. we'll talk about that on the next slide.
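as a quick sketch (using the `m` data from above), you can also pull the same numbers out of the object `t.test()` returns, and confirm they match our `pt()` calculation:

```r
# the t statistic and p-value match the pt() tail area above
tt <- t.test(m$sleep, mu = 8, alternative = "less")
tt$statistic                                  # t, approx. -1.814
tt$p.value                                    # approx. 0.0485
pt(tt$statistic, df = tt$parameter, lower.tail = TRUE)  # same as tt$p.value
```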
---
# Types of Hypothesis

```r
t.test(m$sleep, mu = 8, alternative = "less")
```

+ Note the use of `alternative = "less"`
  + This refers to the direction of our **alternative hypothesis, `\(H_{1}\)`**
  + `\(H_{1}\)` is that our students would be getting _less_ sleep than the average person.
+ Can also have `alternative = "greater"`...

--

  + Our students are getting _more_ sleep than the average person

--

+ ...And `alternative = "two.sided"`

--

  + Our students are getting _different_ amounts of sleep than the average person

---
# Putting it Together

For `\(t(11) = -1.814\)`, `\(p = 0.0485\)`:

.br3.pa2.pt2.bg-gray.white.f4[
If you picked 12 people at random from a population of people who get the recommended number of hours of sleep, there would be about a 5% chance that their average sleep would be 6.81 hours or less
]

--

.pt2[
- Is 5% low enough for you to believe that a group mean this low probably wasn't due to chance?
- Perhaps we'd better face up to this question!
]

---
# Making a Decision

+ To make this decision, we use a cut-off value for `\(p\)` called `\(\alpha\)`

--

+ `\(\alpha\)` is the probability of rejecting `\(H_{0}\)` when it actually reflects the ground truth

--

+ If `\(p\)` is less than `\(\alpha\)`, we can decide to _reject H<sub>0</sub>_ and _accept H<sub>1</sub>_
+ If `\(p\)` is greater than `\(\alpha\)`, we _fail to reject H<sub>0</sub>_

--

+ Typically, in Psychology, **`\(\alpha\)` is set to .05**
  + We're willing to take a 5% risk of incorrectly rejecting the null hypothesis.

--

+ It's important to set `\(\alpha\)` _before_ any statistical analysis

---
# Making a Decision

.pull-left[
.center[
**One-Tailed**
]
![](lecture_3_files/figure-html/unnamed-chunk-46-1.svg)<!-- -->
]
.pull-right[
.center[
**Two-Tailed**
]
![](lecture_3_files/figure-html/unnamed-chunk-47-1.svg)<!-- -->
]

???
I've shown you here on the normal curve because it's a more general distribution than the `\(t\)` distribution, which requires degrees of freedom, but the graphs would look quite similar, as you know

- if we're testing a medicine, with possible side-effects, we want alpha to be low
- for general psychology, meh...
- no leading zeroes because `\(p\)` is always less than 1
- no cheating now!

---
# `\(p < .05\)`

- The `\(p\)`-value is the probability of finding results at least as extreme as ours under H<sub>0</sub>, the null hypothesis
- H<sub>0</sub> is essentially "💩 happens"
- `\(\alpha\)` is the maximum level of `\(p\)` at which we are prepared to conclude that H<sub>0</sub> is false (and argue for H<sub>1</sub>)

--

.flash.animated[
## there is a 5% probability of falsely rejecting H<sub>0</sub>
]

- Wrongly rejecting H<sub>0</sub> (false positive) is a **type 1 error**
- Wrongly failing to reject H<sub>0</sub> (false negative) is a **type 2 error**

---
# Types of `\(t\)`-tests

+ All `\(t\)`-tests compare two means, but with different group constraints

--

+ **One-sample `\(t\)`-test**

--

  + Compares the mean from a range of scores to a specific value

--

  + Lets you examine whether the mean of your data is significantly different from a set value

--

+ **Independent-samples `\(t\)`-test**

--

  + Compares the means of two independent groups

--

  + Lets you examine whether two groups differ significantly from each other on the variable of interest

--

+ **Paired-samples `\(t\)`-test**

--

  + Compares means that are paired in some way

--

  + Allows you to compare measures that come from the same individual, e.g. a measure taken before and after an intervention (see the sketch in the notes on the final slide)

???
Bearing all that in mind, let's get ready to jump into the Live R

---
class: inverse, center, middle, animated, rubberBand

# End
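???
For the curious: minimal sketches of the other two tests from the "Types of `\(t\)`-tests" slide. The data here are simulated purely for illustration (an assumption, not the lecture's data):

```r
set.seed(2)  # arbitrary seed
# independent-samples: two separate groups of people
g1 <- rnorm(12, mean = 7, sd = 2)   # e.g. students
g2 <- rnorm(12, mean = 8, sd = 2)   # e.g. non-students
t.test(g1, g2)                      # two-sided by default

# paired-samples: two measurements from the same people
before <- rnorm(12, mean = 7, sd = 2)
after  <- before + rnorm(12, mean = 0.5, sd = 1)
t.test(before, after, paired = TRUE)
```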