Hypothesis testing and confidence intervals

Semester 2 - Week 4

1 Formative Report C

Instructions and data were released in week 1 of semester 2.

Next week: Submission of Formative report C

Next week, your group must submit one PDF file for formative report C by 12 noon on Friday the 14th of February 2025. Name your submission: Group NUMBER.LETTER Formative C.pdf

One person must submit on behalf of the entire group and let the group know when they have submitted by leaving a note on the Group Discussion Space.

To submit go to the course Learn page, click “Assessment”, then click “Submit Formative Report C (PDF file only)”.

No extensions. As mentioned in the Assessment Information page, no extensions are possible for group-based reports.

1.1 This week’s task

Task C4

Tidy up your report so far, making sure to have 3 main sections: Introduction, Analysis, and Discussion. After those, you can have Appendix A (optional, for additional figures or tables that don’t fit in the page limit) and a compulsory Appendix B (displaying the R code used).

Sub-steps

Below there are sub-steps you need to consider to complete this week’s task.

Tip

To see the hints, hover your cursor on the superscript numbers.

  • Reopen last week’s Rmd file, as you will continue last week’s work and build on it.1
  • Did you install tinytex? If yes, go to the next bullet point. If not, check the hint. This package is required to compile an Rmd file into a PDF for submission.2
  • Organise the report to have the following structure:

    • Introduction:

      • What are the data that you are analysing (i.e. give a brief intro) and where can these be found?
      • What are the variables in the data and the type of these variables?
      • Which questions of interest are you investigating in the report?
      • Which variables will you use to answer those questions and what do those variables represent?
      • Are there any missing values in these variables?
    • Analysis: Present and interpret your results. This section should only contain text, figures, and tables. No R code or R output printout should be visible.

    • Discussion: Summarise the key findings from the analysis section, and provide take-home messages that directly answer the questions of interest. Link your answers to the questions detailed in the introduction. No new statistical results should be presented in the discussion.

    • Appendix A (optional): For additional figures and tables that don’t fit in the page limit. Any figures/tables presented here should be referenced in the main part of the report and have a caption. Appendix A doesn’t count in the page limit.

    • Appendix B (compulsory): Presents all the R code used. This is automatically created for you if you used the template Rmd file. If you haven’t copy and paste that section from the template into your file. Appendix B doesn’t count in the page limit.

  • Knit the document to PDF: click File > Knit Document.

Any Knitting Errors? Work Through Successful Knitting Checklist!

If you encounter errors when knitting the Rmd file, work through the suggestions in the Successful knitting checklist step by step to fix the source of errors. If none of those suggestions fix it, please seek the assistance of a tutor in the lab!

  • Edit your figures/tables formatting as required to ensure that your report meets the page limit.
Formatting Resources

The Formatting Resources webpage compiles helpful guidance on many formatting topics that you will want to consider when fixing your report formatting:

  • Fixing knitting errors using the Successful Knitting Checklist section
  • Follow APA style guidelines
  • Hiding R code and/or output
  • Referencing figures and tables
  • Resizing figures or placing figures side by side
  • Ensure any outstanding write-ups/interpretations are added this week, and that you are happy with the report formatting. Next week you will be asked to add assumption checks, then knit the file, and submit the PDF.

2 Worked Example

The R code is visible here for instructional purposes only, but it should not be visible in a PDF report. It should only appear as part of the appendix.

library(tidyverse)
library(patchwork)
library(kableExtra)

pass_scores <- read_csv("https://uoepsy.github.io/data/pass_scores.csv")
dim(pass_scores)
head(pass_scores)
glimpse(pass_scores)

summary(pass_scores)

plt_hist <- ggplot(pass_scores, aes(x = PASS)) + 
    geom_histogram(color = 'white') +
    labs(x = "PASS scores", title = "(a) Histogram")

plt_box <- ggplot(pass_scores, aes(x = PASS)) + 
    geom_boxplot() +
    labs(x = "PASS scores", title = "(b) Boxplot")

plt_hist / plt_box
stats <- pass_scores |>
    summarise(n = n(),
              Min = min(PASS),
              Max = max(PASS),
              M = mean(PASS),
              SD = sd(PASS))

kbl(stats, booktabs = TRUE, digits = 2, 
    caption = "Descriptive statistics for PASS scores")

# Confidence interval
xbar <- stats$M
s <- stats$SD
n <- stats$n
se <- s / sqrt(n)
tstar <- qt(c(0.025, 0.975), df = n - 1)

xbar + tstar * se

# observed t-statistic
tobs <- (xbar - 33) / se
tobs

# p-value method
pvalue <- 2 * pt(abs(tobs), df = n - 1, lower.tail = FALSE)
pvalue

# critical value method
tstar
tobs

A random sample of 20 students from the University of Edinburgh completed a questionnaire measuring their total endorsement of procrastination. The data, available from https://uoepsy.github.io/data/pass_scores.csv, were used to estimate the average procrastination score of all Edinburgh University students, as well as testing whether the mean procrastination score differed from the Solomon & Rothblum reported average of 33 at the 5% significance level. The recorded variables include a subject identifier (sid, categorical), the school each belongs to (school, categorical), and the total score on the Procrastination Assessment Scale for Students (PASS, numeric). The data do not include any impossible values for the PASS scores, as they were all within the possible range of 0 – 90. To answer the questions of interest, in the following we will only focus on the total PASS score variable.

The distribution of PASS scores, as shown in Figure 1(a), is roughly bell shaped and does not have any impossible values. The outlier (40) depicted in the boxplot shown in Figure 1(b) is well within the range of plausible values for the PASS scale (0–90) and as such was not removed for the analysis.

Figure 1: Distribution of PASS scores for a sample of Edinburgh University students
Table 1: Descriptive statistics for PASS scores
n Min Max M SD
20 24 40 30.7 3.31

Table 1 displays summary statistics for the PASS scores in the sample of Edinburgh University students. From the sample data we obtain an average procrastination score of \(M = 30.7\), 95% CI [29.15, 32.25]. Hence, we are 95% confident that a Edinburgh University student will have a procrastination score between 29.15 and 32.25, which is between 0.75 and 3.85 lower than the average score of 33 reported by Solomon & Rothblum.

Let \(\mu\) denote the mean PASS score of all Edinburgh University students. At the 5% significance level, we performed a one sample t-test of \(H_0 : \mu = 33\) against \(H_1 : \mu \neq 33\). The sample data provide very strong evidence against the null hypothesis and in favour of the alternative one that the mean procrastination score of Edinburgh University students is significantly different from the Solomon & Rothblum reported average of 33: \(t(19) = -3.11, p = .006\), two-sided.

Data including the Procrastination Assessment Scale for Students (PASS) scores for a random sample of 20 students at Edinburgh University we used to estimate the average procrastination score for a student of that university. In addition, the data were used to test whether there is a significant difference between that average score and the Solomon & Rothblum reported average of 33.

We are 95% confident that a Edinburgh University student will have a procrastination score between 29.15 and 32.25. Furthermore, at the 5% significance level, the data provide very strong evidence that the mean procrastination score of Edinburgh University students is different from 33. The confidence interval, reported above, indicates that a Edinburgh University student tends to have a mean procrastination score between 0.75 and 3.85 lower than the Solomon & Rothblum reported average of 33.

What is missing from this instructional example:

  • Appendix A
  • Appendix B
Back to top

Footnotes

  1. Hint: Ask last week’s driver for the Rmd file, they should share it with the group via email or the group discussion space. To download the file from the server, go to the RStudio Files pane, tick the box next to the Rmd file, and select More > Export.↩︎

  2. Installing tinytex

    Step 1. Copy the line below and paste it into the console:

    install.packages("tinytex")

    press Enter.

    Step 2. Next, copy and paste the line below into the console:

    tinytex::install_tinytex(force = TRUE)

    press Enter.↩︎