Block 4 Analysis & Write-Up Example

Learning Objectives

At the end of this lab, you will:

  1. Understand how to write-up and provide interpretation of a binary logistic regression model

What You Need

  1. Be up to date with lectures
  2. Have completed previous lab exercises from Semester 2 Week 7 and Semester 2 Week 8

Required R Packages

Remember to load all packages within a code chunk at the start of your RMarkdown file using library(). If you do not have a package and need to install, do so within the console using install.packages(" "). For further guidance on installing/updating packages, see Section C here.

For this lab, you will need to load the following package(s):

  • tidyverse
  • patchwork
  • kableExtra
  • psych
  • sjPlot

Lab Data

You can download the data required for this lab here or read it in via this link https://uoepsy.github.io/data/mallow2.csv

Section A: Write-Up

In this lab you will be presented with the output from a statistical analysis, and your job will be to write-up and present the results. We’re going to use a simulated dataset based on a paper (the same that you have worked on in lectures this week) concerning delayed gratification and age.

The aim in writing should be that a reader is able to more or less replicate your analyses without referring to your R code. This requires detailing all of the steps you took in conducting the analysis. The point of using RMarkdown is that you can pull your results directly from the code. If your analysis changes, so does your report!

Make sure that your final report doesn’t show any R functions or code. Remember you are interpreting and reporting your results in text, tables, or plots, targeting a generic reader who may use different software or may not know R at all. If you need a reminder on how to hide code, format tables, etc., make sure to review the rmd bootcamp.

Important - Write-Up Examples & Plagiarism

The example write-up sections included below are not perfect - they instead should give you a good example of what information you should include within each section, and how to structure this. For example, some information is missing (e.g., description of data checks, interpretation of descriptive statistics), some information could be presented more clearly (e.g., variable names in tables, table/figure titles/captions, and rationales for choices), and writing could be more concise in places (e.g., discussion section could be more succinct and more focused on the research questions in places).

Further, you must not copy any of the write-up included below for future reports - if you do, you will be committing plagiarism, and this type of academic misconduct is taken very seriously by the University. You can find out more here.

Study Overview

Research Aim

Explore the associations among the ability to delay gratification and age, visibility, and time of day.

Research Question

  • Does the probability of delaying gratification change as a function of age, marshmallow visibility, and time of day?
Marshmallows: Data Codebook

Setup

Setup
  1. Create a new RMarkdown file
  2. Load the required package(s)
  3. Read the mallow2 dataset into R, assigning it to an object named marshmallow

library(tidyverse)
library(patchwork)
library(kableExtra)
library(psych) 
library(sjPlot)

#read in data
marshmallow <- read_csv("https://uoepsy.github.io/data/mallow2.csv")

Analysis Code

Try to answer the research question above without referring to the provided analysis code below, and then check how your script matches up - is there anything you missed or done differently? If so, discuss the differences with a tutor - there are lots of ways to code to the same solution!

Provided Analysis Code

The 3-Act Structure: Analysis Strategy, Results, & Discussion

We need to present our report in three clear sections - think of your sections like the 3 key parts of a play or story - we need to (1) provide some background and scene setting for the reader, (2) present our results in the context of the research question, and (3) present a resolution to our story - relate our findings back to the question we were asked and provide our answer.

Act I: Analysis Strategy

Question 1

Attempt to draft an analysis strategy section based on the above research question and analysis provided.

Your analysis strategy will contain a number of different elements detailing plans and changes to your plan. Remember, your analysis strategy should not contain any results. You may wish to include the following sections:

  • Very brief data and design description:
    • Give the reader some background on the context of your write-up. For example, you may wish to describe the data source, data collection strategy, study design, number of observational units.
    • Specify the variables of interest in relation to the research question, including their unit of measurement, the allowed range (e.g., for Likert scales), and how they are scored. If you have categorical data, you will need to specify the levels and coding of your variables, and what was specified as your reference level and the justification for this choice.
  • Data management:
    • Describe any data cleaning and/or recoding.
    • Are there any observations that have been excluded based on pre-defined criteria? How/why, and how many?
    • Describe any transformations performed to aid your interpretation (i.e., mean centering, standardisation, etc.)
  • Model specification:
    • Clearly state your hypotheses and specify your chosen significance level.
    • What type of statistical analysis do you plan to use to answer the research question? (e.g., simple linear regression, multiple linear regression, binary logistic regression, etc.)
    • In some cases, you may wish to include some visualisations and descriptive tables to motivate your model specification.
    • Specify the model(s) equation you will fit to answer your given research question and analysis structure. Clearly specify the response and explanatory variables included in your model(s). This includes specifying the type of coding scheme applied if using categorical data.
    • Specify the assumption and diagnostic checks that you will conduct. Specify what plots you will use, and how you will evaluate these.

As noted and encouraged throughout the course, one of the main benefits of using RMarkdown is the ability to include inline R code in your document. Try to incorporate this in your write up so you can automatically pull the specified values from your code. If you need a reminder on how to do this, see Lesson 3 of the Rmd Bootcamp.

The mallow2 dataset contained information on 304 participants who took part in a study concerning delayed gratification - where children were presented with a single marshmallow, but were told that if they could leave it for 10 minutes, they would be rewarded with two marshmallows (scored dichotomously as taken or waited). The children participated in either the morning (am) or afternoon (pm), and the marshmallow was either visible or hidden for the 10 minute duration of the experiment. The age of each child was also recorded. All participant data was complete (no missing values), but two participants were excluded due to their age being outside of the range included in the study, and one was excluded due to an inaccurate time of day recording. This resulted in a final sample size of 301.

To investigate whether the probability of taking the marshmallow changed as a function of time of day (am/pm), age, and marshmallow visibility (visible/hidden), a binary logistic regression model was used. Effects were considered statistically significant at \(\alpha = .05\). The following model specification was used:

\[ \begin{aligned} M_1 &: \qquad \log \left( \frac{p}{1 - p}\right) = \beta_0 + \beta_1 \cdot \text{Time of day}_\text{PM} + \beta_2 \cdot \text{Age} + \beta_3 \cdot \text{Visibility}_\text{Hidden} \end{aligned} \]

\[ \begin{aligned} \text{where}~{p}~ &=~ \text{probability of taking the marshmallow} \end{aligned} \]

To address the research question of whether the probability of taking the marshmallow changed as a function of time of day, age, and marshmallow visibility, this formally corresponded to:

\[ H_0: \text{All} ~~ \beta_j ~~ = ~~ 0 ~~ (for ~~ j ~~ = 1, 2, 3) \]

\[ H_1: \text{At least one} ~~ \beta_j ~~ \neq ~~ 0 ~~ (for ~~ j ~~ = 1, 2, 3) \]

To assess how well the model fits the data, we visually assessed the standardized deviance residuals and Cook’s Distance. We expected the former to identify outliers (or extreme values), and we expected residuals to fall within the range of -2 to 2. We used the latter to check for influential observations, and visually assessed if any of our 302 observations had a Cook’s distance > 0.5 (moderately influential) or > 1 (highly influential).

Act II: Results

Question 2

Attempt to draft a results section based on your detailed analysis strategy and the analysis provided.

The results section should follow from your analysis strategy. This is where you would present the evidence and results that will be used to answer the research questions and can support your conclusions. Make sure that you address all aspects of the approach you outlined in the analysis strategy (including the evaluation of assumptions and diagnostics).

In this section, it is useful to include tables and/or plots to clearly present your findings to your reader. It is important, however, to carefully select what is the key information that should be presented. You do not want to overload the reader with unnecessary or duplicate information (e.g., do not present print outs of the head of a dataset, or the same information in tables and plots, etc.), and you also want to save space in case there is a page limit. Make use of figures with multiple panels where you can. You can also make use of an Appendix to present your assumption and diagnostic* plots/tables, but remember that you must evaluate these in-text within the results section and clearly refer the reader to the relevant plots within the Appendix.

As a broad guideline, you want to start with the results of any exploratory data analysis, presenting tables of summary statistics and exploratory plots. You may also want to visualise associations between/among variables and report covariances or correlations. Then, you should move on to the results from your model.

Descriptive statistics are displayed in Table 1.

Table 1: Descriptive Statistics
Descriptive Statistics
visibility timeofday M_Age SD_Age percent_taken
visible am 7.17 1.69 52.70
visible pm 6.04 1.78 59.21
hidden am 7.47 1.72 28.57
hidden pm 5.71 1.71 37.04

It appeared that when the marshmallow was visible, a greater percentage of children took it; and the percentage of children who engaged in delayed gratification was higher in the morning sessions (see Figure 1).

Figure 1: Association between Delayed Gratification, Age, Visibility of Marshmallow, and Time of Day

A binary logistic regression model was fitted to determine whether the probability of taking the marshmallow changed as a function of time of day, age, and marshmallow visibility.

Our model did not raise any concerns regarding outliers or high influence cases. Though there appeared to be a few residuals with a value slightly larger than 2 in absolute value (see left-hand plot in Figure 2), they were not influential points (see right-hand plot in Figure 2), since none of our observations had a Cook’s distance value > 0.5.

Figure 2: Model Fit Plots

Age and visibility were significant predictors of delayed gratification (see Table 2). For every year increase in age, the odds of taking the marshmallow decreased by a factor of 0.56 (\(95\%\, CI\, [0.47, 0.66])\). When the marshmallow was hidden (as opposed to visible), the odds of taking the marshmallow decreased by a factor of 0.3 (\(95\%\, CI\, [0.17, 0.51])\). Thus, we rejected the null hypothesis.

Table 2: Regression Table for Marshmallow Model
Regression Table for Marshmallow Model
  Marshmallow Taken
Predictors Odds Ratios CI p
Intercept 81.93 22.47 – 332.30 <0.001
Time of Day - PM 0.61 0.34 – 1.08 0.096
Age (in years) 0.56 0.47 – 0.66 <0.001
Visibility - Hidden 0.30 0.17 – 0.51 <0.001
Observations 301
R2 Tjur 0.233

Act III: Discussion

Question 3

Attempt to draft a discussion section based on your results and the analysis provided.

In the discussion section, you should summarise the key findings from the results section and provide the reader with a few take-home sentences drawing the analysis together and relating it back to the original question.

The discussion should be relatively brief, and should not include any statistical analysis - instead think of the discussion as a conclusion, providing an answer to the research question(s).

The probability of children delaying gratification did change as a function of age and marshmallow visibility (but not time of day). In summary, older age and conditions where the marshmallow was hidden appeared to be associated with greater odds of successfully delaying gratification.

Section B: Block 4 (Weeks 6-9) Recap

In the second part of the lab, there is no new content - the purpose of the recap section is for you to revisit and revise the concepts you have learned over the last 4 weeks (or the full academic year if you feel that it would be beneficial to revise the materials from blocks 1, 2, and 3 too).

We would encourage you to complete any outstanding work on these exercises (e.g., complete partial write-ups), and review solutions.

Given that we have now covered all DAPR2 course content, we would also strongly encourage you to start creating your revision materials in advance of the exam. You can access all the flashcards that you’ve been presented with in this block here. These will provide a good starting point for collating your notes together on the contents of blocks 1, 2, 3, and 4. We also suggest that you review your weekly quiz feedback (as many of you have learned in Psychology 2A, it is important to provide feedback to allow learners to improve their learning and retention of information, as well as correct any misunderstandings!).