  1. Understand what Type I and Type II errors are in hypothesis testing.

  2. Recognise the significance level as measuring the tolerable chance of committing a Type I error.

  3. Recognise the effect of sample size on power.

  4. Be able to check the assumptions underlying the t-test for a population mean.

Exercises: t-test, effect size, assumptions

You will be using the following dataset: https://uoepsy.github.io/data/students_reading_scores.csv

The dataset contains measurements of age (in years) and reading scores (0 to 100) for 35 university students randomly sampled from a hypothetical university.

Question 1

Read the data into R.

Solution

Question 2

Perform preliminary checks on the data by:

  • plotting the variables in the dataset;
  • computing a table of descriptive statistics.

Solution

Question 3

What do you notice? Are there any issues with the data that need fixing?

If yes, make appropriate changes to the data to fix those issues.

Solution

Question 4

We will only be using the reading scores.

Remove from the data any rows for which the reading score is missing.

Solution
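As a minimal sketch of the idea, here is how rows with a missing value in one column can be dropped. It uses a small made-up data frame rather than the real data, and the column names `age` and `reading_score` are assumptions: check the actual names with `names()` before adapting it.

```r
# Toy data frame; the column names are hypothetical, not taken from the dataset
toy <- data.frame(
  age = c(19, 21, 20, 22),
  reading_score = c(55, NA, 61, 48)
)

# Keep only the rows where the reading score is not missing
toy_complete <- toy[!is.na(toy$reading_score), ]
nrow(toy_complete)
```

Filtering on a single column, rather than calling `na.omit()` on the whole data frame, avoids discarding rows that are only missing a variable you do not need.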

Question 5

At the 5% significance level, test whether the sample data provide significant evidence that the mean reading score for all students in that university is not equal to 50.

Solution
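One possible approach is a two-sided one-sample t-test. The sketch below runs it on simulated scores, not on the real data, so the numbers it prints say nothing about the actual study; the vector name `scores` and the simulation settings are assumptions made purely for illustration.

```r
set.seed(1)
# Simulated stand-in for the 35 reading scores (NOT the real data)
scores <- rnorm(35, mean = 55, sd = 10)

# Two-sided one-sample t-test of H0: mu = 50 against H1: mu != 50
res <- t.test(scores, mu = 50, alternative = "two.sided")
res$statistic   # observed t statistic
res$parameter   # degrees of freedom, n - 1
res$p.value     # reject H0 at the 5% level if this is below 0.05
```

With the real data you would replace `scores` by the reading score column after removing the missing values.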

Question 6

The t-test for one population mean relies on some assumptions for the results to be valid. Check whether these are satisfied or not in this sample.

Solution
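The one-sample t-test assumes that the data are a random sample from the population and that either the population is normally distributed or the sample is large enough for the sample mean to be approximately normal. The first is addressed by the study design; the second can be examined roughly as sketched below. The data here are simulated (an assumption for illustration only), so the output does not describe the real reading scores.

```r
set.seed(1)
# Simulated stand-in for the reading scores (NOT the real data)
scores <- rnorm(35, mean = 55, sd = 10)

# Visual check: points close to the line suggest approximate normality
qqnorm(scores)
qqline(scores)

# Formal check: Shapiro-Wilk test, where H0 is that the data come from a normal distribution
sw <- shapiro.test(scores)
sw$p.value   # a large p-value gives no evidence against normality
```

A non-significant Shapiro-Wilk test does not prove normality, so it is best read alongside the plot.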

Question 7

Provide a write-up of your results in the context of the study.

Solution

Question 8

Is there a significant effect? Is the effect important?

Solution
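Statistical significance and practical importance are separate questions. One common way to gauge importance is a standardised effect size such as Cohen's D, the distance between the sample mean and the hypothesised mean in units of the sample standard deviation. The sketch below computes it on simulated data; the vector `scores` is a made-up stand-in, not the real reading scores.

```r
set.seed(1)
# Simulated stand-in for the reading scores (NOT the real data)
scores <- rnorm(35, mean = 55, sd = 10)
mu0 <- 50   # value of the mean under the null hypothesis

# Cohen's D for a one-sample design: (sample mean - mu0) / sample SD
D <- (mean(scores) - mu0) / sd(scores)
D
```

Values of about 0.2, 0.5 and 0.8 in absolute value are conventionally described as small, medium and large effects, although what counts as important depends on the context of the study.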

Exercises: power

To compute power, you need to know the distribution of the sample mean

  • when \(H_0\) is true
  • when \(H_1\) is true

The distribution under \(H_1\) is seldom known in practice, so these exercises will be more conceptual.

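Before going ahead, though, remember that if the population has mean \(\mu\), the sampling distribution of the mean is \(N(\mu, \frac{\sigma}{\sqrt n})\), where \(\sigma\) is the population SD and \(n\) is the sample size. The second parameter, \(\frac{\sigma}{\sqrt n}\), is the standard deviation of the sample mean (its standard error). This holds exactly when the population is normal and approximately when \(n\) is large.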
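A quick simulation can make this result concrete. The sketch below assumes a normal population and uses illustrative values for \(\mu\), \(\sigma\) and \(n\); it draws many samples, computes each sample mean, and compares the spread of those means with \(\frac{\sigma}{\sqrt n}\).

```r
set.seed(123)
mu <- 5      # illustrative population mean
sigma <- 3   # illustrative population SD
n <- 30      # sample size

# Draw 10,000 samples of size n and store the mean of each one
xbars <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))

mean(xbars)       # should be close to mu
sd(xbars)         # should be close to sigma / sqrt(n)
sigma / sqrt(n)   # theoretical standard error
```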

Suppose you are testing

\[ H_0 : \mu = 0 \\ H_1 : \mu > 0 \]

What is the power of the test if:

  • the population has a mean \(\mu = 5\)
  • the population has a standard deviation \(\sigma = 3\)
  • you will take a sample of size 30

With these values, the distribution of the sample mean is

\[\bar X \sim N(5, \frac{3}{\sqrt{30}})\]

If \(H_0\) is true, however, the distribution would be:

\[\bar X \sim N(0, \frac{3}{\sqrt{30}})\]

You would reject the null hypothesis, at the 5% significance level, for values of the sample mean larger than the 0.95 quantile of a \(N(0, \frac{3}{\sqrt{30}})\) distribution:

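sigma <- 3
n <- 30
# Standard error of the sample mean
SE <- sigma / sqrt(n)
# Critical value: 0.95 quantile of the distribution of the mean under H0
tstar <- qnorm(0.95, mean = 0, sd = SE)
tstar
## [1] 0.9009234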

The power is the probability of rejecting the null hypothesis when the null is false.

So, if the null is false, the mean follows the distribution \(N(5, \frac{3}{\sqrt{30}})\). The probability of correctly rejecting the null is the probability, under that distribution, of a sample mean beyond the critical value 0.90:

# Probability of exceeding the critical value when the true mean is 5
Pow <- 1 - pnorm(tstar, mean = 5, sd = SE)
Pow
## [1] 1

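The power is essentially 1. The exact value is slightly below 1 but rounds to 1: a true mean of 5 is so far from 0, relative to the standard error, that the test almost always rejects the null hypothesis.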
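The two steps of the worked example can be wrapped into a small helper to see how power changes with sample size. This is a sketch under the same assumptions as above (known \(\sigma\), one-sided test of \(H_0: \mu = 0\) using the normal distribution); the function name `power_one_sided` and the alternative mean of 1 used for the comparison are hypothetical choices for illustration.

```r
# Power of the one-sided test H0: mu = mu0 vs H1: mu > mu0 with known sigma
power_one_sided <- function(mu1, sigma, n, alpha = 0.05, mu0 = 0) {
  se <- sigma / sqrt(n)                            # standard error of the mean
  crit <- qnorm(1 - alpha, mean = mu0, sd = se)    # rejection threshold under H0
  1 - pnorm(crit, mean = mu1, sd = se)             # P(reject H0) when the true mean is mu1
}

# Reproduces the worked example above
power_one_sided(mu1 = 5, sigma = 3, n = 30)

# Power for a smaller true mean at increasing sample sizes
sizes <- c(10, 30, 100)
powers <- sapply(sizes, function(n) power_one_sided(mu1 = 1, sigma = 3, n = n))
powers
```

For a fixed true mean, the values in `powers` increase with the sample size, because a larger \(n\) shrinks the standard error and pulls the two sampling distributions apart.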

Question 9

Assume now that the true population mean is \(\mu = 2\). What is the power of the test?

Solution

Question 10

Assume now that the true population mean is \(\mu = 2\). What is the probability of a Type II error?

Solution
