class: center, middle, inverse, title-slide #
S2W3 - Hypothesis Testing: critical values
## Data Analysis for Psychology in R 1 ### Umberto Noè ### Department of Psychology
The University of Edinburgh ### AY 2020-2021 --- # Learning objectives 1. Recognise the difference between a bootstrap and null distribution. 1. Understand the parallel between p-values and critical values. 1. Be able to perform a one-sided or two-sided hypothesis test using the critical value method. 1. Understand the link between z-scores and critical values. --- class: inverse, center, middle # Part 0 ## What you need to know --- # Parameters, statistics, best estimates - _Parameter_: a number that summarises some aspect of the population. -- - _Statistic_: a numerical summary of the sample data. -- - The statistic calculated from the sample is our _best estimate_ of the true but unknown value of the population parameter .footnote[ To revise [click here](https://uoepsy.github.io/dapr1/labs/11_sampling_distributions.html) ] --- # Normal distributions .pull-left[ In a bell-shaped distribution, approximately: - 68% of the values are within 1 SD of the mean - 95.4% of the values are within 2 SD of the mean - 99.7% of the values are within 3 SD of the mean ] .pull-right[ <img src="https://uoepsy.github.io/dapr1/labs/images/prob/normal_rule.png" width="100%" style="display: block; margin: auto;" /> ] To be precise, 95% of the values are within 1.96 SD of the mean: ```r qnorm(p = c(0.025, 0.975), mean = 0, sd = 1) ``` ``` ## [1] -1.96 1.96 ``` --- # Quantiles (a.k.a. percentiles) .pull-left[ - The `\(p\)`-quantile is the value that cuts an area equal to `\(p\)` to its left. - It is the value `\(x_p\)` such that `\(P(X \leq x_p) = p\)` - We use the term percentile when using _percentages_. The 0.5-quantile is the 50th percentile. ] .pull-right[ <img src="https://uoepsy.github.io/dapr1/labs/images/prob/normal_quantile.png" width="90%" style="display: block; margin: auto;" /> ] .footnote[ To revise [click here](https://uoepsy.github.io/dapr1/labs/10_continuous_distributions.html) ] --- # Quantiles (a.k.a. percentiles) .pull-left[ **Resampling approach** With data `df$y` ``` quantile(<data>, probs = <probs_to_the_left>) quantile(df$y, probs = c(0.25, 0.5, 0.75)) ``` With bootstrap means `bootstrap$means`: ```r quantile(bootstrap$means, probs = 0.5) ``` ``` ## 50% ## 1.265 ``` ] .pull-right[ **Theoretical approach** Find the quantiles of a probability distribution with the function `q` followed by the distribution name. _Example._ Quantiles of a normal distribution: ``` qnorm(p = <prob_to_left>, mean, sd) ``` The 0.025 and 0.975-quantiles of a standard normal distribution are: ```r qnorm(p = c(0.025, 0.975), mean = 0, sd = 1) ``` ``` ## [1] -1.96 1.96 ``` ] --- # Hypothesis testing 101 <!-- - We have a research question or hypothesis about the population. --> - Pose a question that you would like to investigate or an hypothesis you'd like to empirically check. -- - Identify the relevant population _parameters_. -- - Translate that question or hypothesis into null `\((H_0)\)` and alternative `\((H_1)\)` hypotheses. For example: `$$H_0 : \mu = 0$$` `$$H_1 : \mu > 0$$` -- - Find or collect data that will help you answer this question. -- - Compute the statistic that estimates the parameter of interest. For example, the sample mean `\(\bar x\)` .footnote[ To revise [click here](https://uoepsy.github.io/dapr1/labs/13_hypothesis_testing.html) ] --- # Measuring evidence against `\(H_0\)` - Light blue: _null distribution_ of sample means from samples of size `\(n = 50\)`. - Red line: _observed statistic_. .pull-left[ <img src="dapR1_lec13_nhstcrit_files/figure-html/unnamed-chunk-7-1.png" width="70%" style="display: block; margin: auto;" /> ] .pull-right[ <img src="dapR1_lec13_nhstcrit_files/figure-html/unnamed-chunk-8-1.png" width="70%" style="display: block; margin: auto;" /> ] Which scenario do you think provides more evidence that the population mean is greater than 0? --- # P-values and statistical significance .pull-left[ <img src="dapR1_lec13_nhstcrit_files/figure-html/unnamed-chunk-9-1.png" width="60%" style="display: block; margin: auto;" /> $$ p = 0.14 $$ ] .pull-right[ <img src="dapR1_lec13_nhstcrit_files/figure-html/unnamed-chunk-10-1.png" width="60%" style="display: block; margin: auto;" /> $$ p < .001 $$ ] If results as extreme or more extreme than the observed statistic are unlikely to occur by sampling variation alone when the null hypothesis is true, we say the sample results are statistically significant. Statistical significance means that we have convincing evidence against `\(H_0\)` and in favour of `\(H_1\)`. --- class: inverse, center, middle # Part A ## Research question and data --- # Research question > Is reaction time in identifying ink colours increased when the ink is used to spell a different colour? Researchers recruited 131 participants for a study. Each participant was asked to complete two tasks, both requiring them to pronounce the _colour_ of words shown on a screen. In task (a) the colour and words matched, while in task (b) the colour and words did not match. <img src="https://uoepsy.github.io/dapr1/labs/images/numeric/stroop1.png" width="90%" style="display: block; margin: auto;" /> --- # Research question To evaluate whether mismatching words and colours increased participants reaction times, we can compute for each participant the _difference_ between the time to complete the mismatching colour-word task and the matching colour-word task. If the _average difference_ is larger than 0, then the mismatching colour-word task took _on average_ longer to complete. We are not interested in whether the mismatching colour-word task took longer to complete than the other task for one particular individual. This might happen by change. What we really want to do is assess if there is a pattern, hence why the mean! --- # Data The data can be found at: https://uoepsy.github.io/data/stroopexpt2.csv .pull-left[ ```r library(tidyverse) library(kableExtra) data <- read_csv("https://uoepsy.github.io/data/stroopexpt2.csv") dim(data) ``` ``` ## [1] 131 5 ``` ```r head(data) %>% kable(digits = 2) ``` ] .pull-right[ <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;"> id </th> <th style="text-align:right;"> age </th> <th style="text-align:right;"> matching </th> <th style="text-align:right;"> mismatching </th> <th style="text-align:right;"> stroop_effect </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 40 </td> <td style="text-align:right;"> 12.61 </td> <td style="text-align:right;"> 14.00 </td> <td style="text-align:right;"> 1.39 </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 48 </td> <td style="text-align:right;"> 14.84 </td> <td style="text-align:right;"> 14.87 </td> <td style="text-align:right;"> 0.03 </td> </tr> <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 35 </td> <td style="text-align:right;"> 15.94 </td> <td style="text-align:right;"> 19.60 </td> <td style="text-align:right;"> 3.66 </td> </tr> <tr> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 47 </td> <td style="text-align:right;"> 9.73 </td> <td style="text-align:right;"> 4.64 </td> <td style="text-align:right;"> -5.09 </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 27 </td> <td style="text-align:right;"> 14.71 </td> <td style="text-align:right;"> 14.65 </td> <td style="text-align:right;"> -0.06 </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 55 </td> <td style="text-align:right;"> 20.20 </td> <td style="text-align:right;"> 17.10 </td> <td style="text-align:right;"> -3.10 </td> </tr> </tbody> </table> ] <br> where: `stroop_effect = mismatching - matching` --- # Data Visualise the distribution of the `stroop_effect` variable in the sample data: .pull-left[ ```r ggplot(data) + geom_histogram(aes(x = stroop_effect), color = 'white') + labs(x = 'Stroop effect') ``` It is not very symmetric and bell-shaped... ```r data %>% summarise(Mean = mean(stroop_effect), SD = sd(stroop_effect)) %>% kable(digits = 3) ``` <table> <thead> <tr> <th style="text-align:right;"> Mean </th> <th style="text-align:right;"> SD </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 0.884 </td> <td style="text-align:right;"> 4.737 </td> </tr> </tbody> </table> ] .pull-right[ <img src="dapR1_lec13_nhstcrit_files/figure-html/unnamed-chunk-17-1.png" style="display: block; margin: auto;" /> ] --- # Parameters and hypotheses - `\(\texttt{mismatching}_i\)` = time participant `\(i\)` took to complete the _mismatching_ colour-word task - `\(\texttt{matching}_i\)` = time participant `\(i\)` took to complete the _matching_ colour-word task -- - `\(D_i = \texttt{mismatching}_i - \texttt{matching}_i\)` = difference in completion times ("_Stroop effect_") -- - `\(\mu\)` = population mean difference in completion times -- We wish to test whether the population mean difference in completion times is larger than 0. -- That is, if the mean completion time of the mismatching colour-word task is higher than the matching colour-word task. -- `$$H_0 : \mu = 0$$` `$$H_1 : \mu > 0$$` --- # Sample statistic The observed sample mean difference in completion times is: ```r xbar_obs <- mean(data$stroop_effect) xbar_obs ``` ``` ## [1] 0.8843 ``` <br> This is just the mean of the differences (`stroop_effect`) in the sample data: `$$\bar{x}_{obs} = \frac{ \sum_{i=1}^n D_i }{ n }$$` --- class: inverse, center, middle # Part B ## Bootstrap distribution vs Null distribution --- # Bootstrap distribution ```r source('https://uoepsy.github.io/files/rep_sample_n.R') ``` Set the random seed: ```r set.seed(1) ``` Generate the bootstrap distribution: ```r n <- nrow(data) n ``` ``` ## [1] 131 ``` ```r boot_dist <- data %>% rep_sample_n(n = n, samples = 5000, replace = TRUE) %>% group_by(sample) %>% summarise(xbar = mean(stroop_effect)) ``` --- # Bootstrap distribution .pull-left[ ```r head(boot_dist) ``` ``` ## # A tibble: 6 x 2 ## sample xbar ## <dbl> <dbl> ## 1 1 1.24 ## 2 2 0.797 ## 3 3 1.10 ## 4 4 0.267 ## 5 5 0.901 ## 6 6 1.07 ``` ] .pull-right[ <img src="dapR1_lec13_nhstcrit_files/figure-html/unnamed-chunk-24-1.png" width="80%" style="display: block; margin: auto;" /> ] - Centre = mean of bootstrap distribution = 0.884 = observed sample mean - Spread = standard deviation of bootstrap distribution = 0.411 = bootstrap standard error <!-- - **Theoretical SE**: Recall the SD of the data is `\(s =\)` 4.737. The SE of the mean is `\(s / \sqrt{n}\)` = 0.414. --> --- # Null distribution: Resampling approach Centred at the value specified in the null hypothesis! ```r data <- data %>% mutate( stroop_effect_shifted = stroop_effect - xbar_obs ) mean(data$stroop_effect_shifted) %>% round(digits = 3) ``` ``` ## [1] 0 ``` ```r null_dist <- data %>% rep_sample_n(n = n, samples = 5000, replace = TRUE) %>% group_by(sample) %>% summarise(xbar = mean(stroop_effect_shifted)) ``` --- # Null distribution: Resampling approach .pull-left[ Centre and spread ```r # mean mu <- mean(null_dist$xbar) # standard error se <- sd(null_dist$xbar) tibble(mu, se) %>% kable(digits = 3) ``` <table> <thead> <tr> <th style="text-align:right;"> mu </th> <th style="text-align:right;"> se </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 0.006 </td> <td style="text-align:right;"> 0.411 </td> </tr> </tbody> </table> ] .pull-right[ <img src="dapR1_lec13_nhstcrit_files/figure-html/unnamed-chunk-28-1.png" style="display: block; margin: auto;" /> ] --- # Null distribution: Theoretical approach Recall: $$ \bar X \sim N(\mu, SE), \qquad \qquad SE = \frac{s}{\sqrt{n}} $$ -- <br> But under the null hypothesis we assume that `\(H_0: \mu = 0\)`, so $$ \bar X \sim N(0, SE), \qquad \qquad SE = \frac{s}{\sqrt{n}} $$ --- # Null distribution: Theoretical approach .pull-left[ In R: ```r mu_theory <- 0 se_theory <- sd(data$stroop_effect) / sqrt(n) tibble(mu_theory, se_theory) %>% kable(digits = 3) ``` <table> <thead> <tr> <th style="text-align:right;"> mu_theory </th> <th style="text-align:right;"> se_theory </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0.414 </td> </tr> </tbody> </table> Then use `dnorm` for the density, `qnorm` for the quantiles, and `pnorm` for the probabilities of a normal distribution having mean and SE computed above. ] .pull-right[ <img src="dapR1_lec13_nhstcrit_files/figure-html/unnamed-chunk-30-1.png" style="display: block; margin: auto;" /> ] --- class: inverse, center, middle # Part C ## Critical values (one-sided `\(H_1\)`) `$$H_0: \mu = 0$$` `$$H_1: \mu > 0$$` --- # Recap of p-values! .pull-left[ ```r pvalue <- sum(null_dist$xbar >= xbar_obs) / nrow(null_dist) pvalue ``` ``` ## [1] 0.0152 ``` The probability of observing a sample mean as large as 0.884 or larger, when the null hypothesis is true, is 0.0152. At the 5% significance level, the sample results provide strong evidence that the population mean difference in completion times is larger than 0. ] .pull-right[ <img src="dapR1_lec13_nhstcrit_files/figure-html/unnamed-chunk-32-1.png" style="display: block; margin: auto;" /> ] --- # Recap of p-values! <img src="dapR1_lec13_nhstcrit_files/figure-html/unnamed-chunk-33-1.png" width="55%" style="display: block; margin: auto;" /> --- # Critical values: Resampling approach .pull-left[ ```r q0.95 <- quantile(null_dist$xbar, probs = 0.95) q0.95 ``` ``` ## 95% ## 0.6755 ``` ```r tibble(q0.95, xbar_obs, xbar_obs >= q0.95) %>% kable() ``` <table> <thead> <tr> <th style="text-align:right;"> q0.95 </th> <th style="text-align:right;"> xbar_obs </th> <th style="text-align:left;"> xbar_obs >= q0.95 </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 0.6755 </td> <td style="text-align:right;"> 0.8843 </td> <td style="text-align:left;"> TRUE </td> </tr> </tbody> </table> ] .pull-right[ <img src="dapR1_lec13_nhstcrit_files/figure-html/unnamed-chunk-35-1.png" style="display: block; margin: auto;" /> ] --- # Critical values: Theoretical approach .pull-left[ ```r # Resampling quantile(null_dist$xbar, probs = 0.95) ``` ``` ## 95% ## 0.6755 ``` ```r # Theoretical q0.95_theory <- qnorm(p = 0.95, mu_theory, se_theory) q0.95_theory ``` ``` ## [1] 0.6808 ``` <table> <thead> <tr> <th style="text-align:right;"> q0.95_theory </th> <th style="text-align:right;"> xbar_obs </th> <th style="text-align:left;"> xbar_obs >= q0.95_theory </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 0.6808 </td> <td style="text-align:right;"> 0.8843 </td> <td style="text-align:left;"> TRUE </td> </tr> </tbody> </table> ] .pull-right[ <img src="dapR1_lec13_nhstcrit_files/figure-html/unnamed-chunk-38-1.png" style="display: block; margin: auto;" /> ] --- class: inverse, center, middle # Part D ## Critical values (two-sided `\(H_1\)`) `$$H_0: \mu = 0$$` `$$H_1: \mu \neq 0$$` --- # Critical values: Resampling approach .pull-left[ ```r quantile(null_dist$xbar, probs = c(0.025, 0.975)) ``` ``` ## 2.5% 97.5% ## -0.8074 0.7998 ``` ```r # Observed statistic xbar_obs ``` ``` ## [1] 0.8843 ``` The observed statistic `\(\bar x_{obs}\)` is larger than the 97.5th percentile, so we reject the null hypothesis. People often say that the observed statistic falls in the rejection region (the red intervals). ] .pull-right[ <img src="dapR1_lec13_nhstcrit_files/figure-html/unnamed-chunk-40-1.png" style="display: block; margin: auto;" /> ] --- # Critical values: Theoretical approach .pull-left[ ```r # Resampling quantile(null_dist$xbar, probs = c(0.025, 0.975)) ``` ``` ## 2.5% 97.5% ## -0.8074 0.7998 ``` ```r # Theoretical qnorm(p = c(0.025, 0.975), mean = mu_theory, sd = se_theory) ``` ``` ## [1] -0.8112 0.8112 ``` ```r # Observed statistic xbar_obs ``` ``` ## [1] 0.8843 ``` ] .pull-right[ <img src="dapR1_lec13_nhstcrit_files/figure-html/unnamed-chunk-42-1.png" style="display: block; margin: auto;" /> ] --- class: inverse, center, middle # Part E ## Standardized statistics (a.k.a. z-scores) --- # Z-scores - For data `\(x_1, ..., x_n\)`, the z-score is $$ z_i = \frac{x_i - \mu}{\sigma} $$ -- - For our 5,000 means `\(\bar x_1, ..., \bar x_{5000}\)` from the null distribution, we compute the z-score as: $$ z_i = \frac{\bar x_i - 0}{SE} = \frac{\bar x_i - \text{hypothesised value}}{ SE } $$ where `\(SE\)` = standard error of the mean. -- - Don't forget to also transform the observed statistic to standard units! We need to z-score the observed statistic to bring it to the same scale: `$$z_{obs} = \frac{\bar x_{obs} - \text{hypothesised value}}{ SE }$$` --- # Z-scores How is the standard error computed? - **Resampling approach**: `\(SE\)` = standard deviation of the null distribution - **Theoretical approach**: `\(SE = \frac{s}{\sqrt{n}}\)` --- # Resampling approach .pull-left[ ```r hypothesised_value <- 0 null_dist$z <- (null_dist$xbar - hypothesised_value) / sd(null_dist$xbar) z_obs <- (xbar_obs - hypothesised_value) / sd(null_dist$xbar) z_obs ``` ``` ## [1] 2.152 ``` ```r quantile(null_dist$z, probs = c(0.025, 0.975)) ``` ``` ## 2.5% 97.5% ## -1.965 1.947 ``` ] .pull-right[ <img src="dapR1_lec13_nhstcrit_files/figure-html/unnamed-chunk-46-1.png" style="display: block; margin: auto;" /> ] --- # Theoretical approach .pull-left[ ```r # Resampling quantile(null_dist$z, probs = c(0.025, 0.975)) ``` ``` ## 2.5% 97.5% ## -1.965 1.947 ``` ```r # Theoretical q <- qnorm(c(0.025, 0.975), mean=0, sd=1) q ``` ``` ## [1] -1.96 1.96 ``` ```r # Observed z-score z_obs_theory <- (xbar_obs - hypothesised_value) / se_theory z_obs_theory ``` ``` ## [1] 2.136 ``` <!-- Check the probability to the left of the theoretical quantiles: --> ] .pull-right[ <img src="dapR1_lec13_nhstcrit_files/figure-html/unnamed-chunk-49-1.png" style="display: block; margin: auto;" /> ] --- # What if you don't want to z-score `\(\bar x_{obs}\)`? - Recall that $$ \text{When } H_0 : \mu = 0 \text{ is true:} \qquad \bar X \sim N(0, SE) $$ -- - We also know that for a normal distribution, 95% of the values lie within 1.96 SE of the mean (= 0 in this case). -- - So, we would reject the null hypothesis if the observed sample mean is smaller than `\(-1.96 \cdot SE\)` or larger than `\(1.96 \cdot SE\)`. -- - If we use the absolute value to ignore the sign, we can simply refer to the upper tail and `$$\text{Reject } H_0 \text{ if:} \qquad |\bar x_{obs}| \geq 1.96 \cdot SE$$` --- class: inverse, center, middle, animated, rotateInDownLeft # End