Write Up Example & Block 4 Recap
Learning Objectives
At the end of this lab, you will:
- Understand how to write-up and provide interpretation of a binary logistic regression model
What You Need
- Be up to date with lectures
- Have completed Labs 7-9
Required R Packages
Remember to load all packages within a code chunk at the start of your RMarkdown file using library(). If you do not have a package and need to install it, do so within the console using install.packages(" "). For further guidance on installing/updating packages, see Section C here.
For this lab, you will need to load the following package(s):
- tidyverse
- psych
- kableExtra
- sjPlot
Lab Data
You can download the data required for this lab here or read it in via this link https://uoepsy.github.io/data/SenilityWAIS.csv.
Section A: Write-Up
In this section of the lab you will be presented with a research question, and tasked with writing up and presenting your analyses.
The aim in writing should be that a reader is able to more or less replicate your analyses without referring to your R
code. This requires detailing all of the steps you took in conducting the analysis. The point of using RMarkdown is that you can pull your results directly from the code. If your analysis changes, so does your report!
Make sure that your final report doesn’t show any R functions or code. Remember you are interpreting and reporting your results in text, tables, or plots, targeting a generic reader who may use different software or may not know R at all. If you need a reminder on how to hide code, format tables, etc., make sure to review the rmd bootcamp.
Study Overview
Research Question
Does the probability of having senility symptoms change as a function of the WAIS score?
Description
A sample of elderly people was given a psychiatric examination to determine if symptoms of senility were present. Other measurements taken at the same time included the score on a subset of the Wechsler Adult Intelligence Scale (WAIS).
The data in SenilityWAIS.csv
contain two attributes collected from a sample of \(n=54\) participants:
- wais: Score on WAIS
- senility: Whether symptoms of senility were (senility = 1) or were not (senility = 0) present
Preview
The first six rows of the data are:

wais | senility
---|---
9 | 1
13 | 1
6 | 1
8 | 1
10 | 1
4 | 1
Setup
- Create a new RMarkdown file
- Load the required package(s)
- Read the SenilityWAIS dataset into R, assigning it to an object named sen
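A minimal setup sketch (assuming you read the data directly from the link given above; your file path or approach may differ):

#load required packages
library(tidyverse)
library(psych)
library(kableExtra)
library(sjPlot)

#read in the data, assigning it to an object named sen
sen <- read_csv("https://uoepsy.github.io/data/SenilityWAIS.csv")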
Analysis Code
Try to answer the research question above without referring to the provided analysis code below, and then check how your script matches up - is there anything you missed or did differently? If so, discuss the differences with a tutor - there are lots of ways to code to the same solution!
######Step 1 is always to read in the data, then to explore, check, describe, and visualise it.
#check coding of variables - are they coded as they should be?
str(sen)
spc_tbl_ [54 × 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ wais : num [1:54] 9 13 6 8 10 4 14 8 11 7 ...
$ senility: num [1:54] 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "spec")=
.. cols(
.. wais = col_double(),
.. senility = col_double()
.. )
- attr(*, "problems")=<externalptr>
head(sen)
# A tibble: 6 × 2
wais senility
<dbl> <dbl>
1 9 1
2 13 1
3 6 1
4 8 1
5 10 1
6 4 1
#both variables currently coded as should be - no changes needed.
#create descriptives table
descript <- sen %>%
group_by(senility) %>%
summarise(
Mean_WAIS = mean(wais),
SD_WAIS = sd(wais),
Min_WAIS = min(wais),
Max_WAIS = max(wais)) %>%
kable(caption = "Descriptive Statistics", digits = 2) %>%
kable_styling()
descript
senility | Mean_WAIS | SD_WAIS | Min_WAIS | Max_WAIS |
---|---|---|---|---|
0 | 12.50 | 3.46 | 4 | 20 |
1 | 8.93 | 3.17 | 4 | 14 |
#bar plot - senility presence
sen_plt1 <- ggplot(data = sen, aes(x = as_factor(senility), fill = as_factor(senility))) +
geom_bar() +
labs(x = "Senility Symptoms Present (0 = No, 1 = Yes)", fill = "Senility Symptoms Present \n(0 = No, 1 = Yes)", y = "Frequency")
sen_plt1
#density plot - senility & WAIS
sen_plt2 <- ggplot(data = sen, aes(x = wais, fill = as_factor(senility))) +
geom_density() +
labs(x = "WAIS Score", fill = "Senility Symptoms Present \n(0 = No, 1 = Yes)")
sen_plt2
######Step 2 is to run your model(s) of interest to answer your research question, and make sure that the data meet the assumptions of your chosen test
#build model & examine output
sen_mdl1 <- glm(senility ~ wais, family = "binomial", data = sen)
summary(sen_mdl1)
Call:
glm(formula = senility ~ wais, family = "binomial", data = sen)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.6702 -0.7402 -0.4749 0.5200 2.1157
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.4040 1.1918 2.017 0.04369 *
wais -0.3235 0.1140 -2.838 0.00453 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 61.806 on 53 degrees of freedom
Residual deviance: 51.017 on 52 degrees of freedom
AIC: 55.017
Number of Fisher Scoring iterations: 5
exp(coefficients(sen_mdl1))
(Intercept) wais
11.06784 0.72359
#check model fit
plot(rstandard(sen_mdl1, type = "deviance"), ylab = "Standardised Deviance Residuals")
plot(cooks.distance(sen_mdl1), ylab = "Cook's Distance")
#compare to null - conduct model comparison
#fit null
sen_mdl0 <- glm(senility ~ 1, family = "binomial", data = sen)
#compare models - models are nested
anova(sen_mdl0, sen_mdl1, test = "Chisq")
Analysis of Deviance Table
Model 1: senility ~ 1
Model 2: senility ~ wais
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 53 61.806
2 52 51.017 1 10.789 0.001021 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
AIC(sen_mdl0, sen_mdl1)
df AIC
sen_mdl0 1 63.80632
sen_mdl1 2 55.01738
BIC(sen_mdl0, sen_mdl1)
df BIC
sen_mdl0 1 65.79530
sen_mdl1 2 58.99535
#plot model
plt_mdl1 <- plot_model(sen_mdl1, type = "eff")
plt_mdl1
[Figure: model-predicted probabilities of senility symptoms across WAIS scores]
#results in formatted table
tab_model(sen_mdl1,
dv.labels = "Senility Symptoms",
pred.labels = c("wais" = "WAIS"),
title = "Regression Table for Senility Model")
Senility Symptoms

Predictors | Odds Ratios | CI | p
---|---|---|---
(Intercept) | 11.07 | 1.23 – 145.24 | 0.044
WAIS | 0.72 | 0.56 – 0.89 | 0.005
Observations | 54 | |
R2 Tjur | 0.198 | |
The 3-Act Structure: Analysis Strategy, Results, & Discussion
Recall that we need to present our report in three clear sections - think of your sections like the 3 key parts of a play or story - we need to (1) provide some background and scene setting for the reader, (2) present our results in the context of the research question, and (3) present a resolution to our story - relate our findings back to the question we were asked and provide our answer.
If you need a reminder of what to include within each section, refer to Semester 1 Lab 11, and read through the ‘what to include’ sections for Analysis Strategy, Results, and Discussion.
Act I: Analysis Strategy
Attempt to draft an analysis strategy section based on the above research question and the analysis provided.
The SenilityWAIS dataset contained information on 54 participants who took part in a study concerning the presence of senility symptoms (scored dichotomously as present (1) or not present (0)). The sample of older adults also completed the Wechsler Adult Intelligence Scale (WAIS), and scores were available for a subset of these items. All participant data were complete (no missing values).
To investigate whether the probability of having senility symptoms changes as a function of WAIS score, a binary logistic regression model was used. Effects were considered statistically significant at \(\alpha = .05\). The following model specification was used:
\[ \begin{aligned} M_1 &: \qquad \log \left( \frac{p}{1 - p}\right) = \beta_0 + \beta_1 \cdot \text{WAIS} \end{aligned} \]
Addressing the research question of whether the probability of having senility symptoms changes as a function of WAIS score formally corresponded to testing whether the WAIS coefficient was equal to zero:
\[ H_0: \beta_1 = 0 \]
\[ H_1: \beta_1 \neq 0 \]
To assess model fit, we visually assessed the standardized deviance residuals and Cook's Distance. We used the former to identify outliers (or extreme values), expecting residuals to fall within the range of -2 to 2. We used the latter to check for influential observations, visually assessing whether any of our 54 observations had a Cook's distance > 0.5 (moderately influential) or > 1 (highly influential).
Act II: Results
Attempt to draft a results section based on your detailed analysis strategy and the analysis provided.
Descriptive statistics are displayed in Table 1.
senility | Mean_WAIS | SD_WAIS | Min_WAIS | Max_WAIS |
---|---|---|---|---|
0 | 12.50 | 3.46 | 4 | 20 |
1 | 8.93 | 3.17 | 4 | 14 |
It appeared that those without senility symptoms present had higher WAIS scores than those with symptoms present (see Figure 1).
A binary logistic regression model was fitted to determine whether the probability of having senility symptoms changed as a function of WAIS score.
Our model did not raise any concerns regarding fit. Though there appeared to be a few residuals with a value slightly larger than 2 in absolute value (see left-hand plot in Figure 2), they were not influential points (see right-hand plot in Figure 2), since none of our observations had a Cook’s distance value > 0.5.
WAIS scores were a significant predictor of whether or not individuals experienced senility symptoms (see Table 2), where for every additional point scored on the WAIS, the odds of having senility symptoms decreased by a factor of 0.72 (\(95\%\, CI\, [0.56, 0.89]\)).
Senility Symptoms

Predictors | Odds Ratios | CI | p
---|---|---|---
(Intercept) | 11.07 | 1.23 – 145.24 | 0.044
WAIS | 0.72 | 0.56 – 0.89 | 0.005
Observations | 54 | |
R2 Tjur | 0.198 | |
We visualised the predicted probability based on our model estimates (see Figure 3), which suggested that higher WAIS scores were associated with a lower probability of endorsing senility symptoms.
[Figure 3: model-predicted probability of senility symptoms as a function of WAIS score]
We performed a deviance goodness-of-fit test to compare our fitted model to the null. At the 5% significance level, the addition of information about participants' WAIS scores resulted in a significant decrease in model deviance, \(\chi^2(1) = 10.79, p = .001\). Hence, we have strong evidence that WAIS score was a helpful predictor of whether or not participants experienced symptoms of senility.
Act III: Discussion
Attempt to draft a discussion section based on your results and the analysis provided.
Our results led us to reject the null hypothesis, as the direction of the association was clear from the results of our binary logistic regression model - lower WAIS scores increased the odds of endorsing senility symptoms. In conclusion, the presence of senility symptoms varied as a function of WAIS score.
Section B: Weeks 6-10 Recap
In the second part of the lab, there is no new content - the purpose of the recap section is for you to revisit and revise the concepts you have learned over the last 4/5 weeks.
Before you expand each of the boxes below, think about how comfortable you feel with each concept.
The probability \(p\) ranges from 0 to 1.
The odds \(\frac{p}{1-p}\) ranges from 0 to \(\infty\).
The log-odds \(\log \left( \frac{p}{1-p} \right)\) ranges from \(-\infty\) to \(\infty\).
In order to understand the connections among these concepts, let's work with an example where the probability of an event occurring is 0.2:
- Odds of event occurring:
\[ \text{odds} = \frac{0.2}{0.8} = 0.25 \]
- Log-odds of the event occurring:
\[ \log\left(\frac{0.2}{0.8}\right) = \log(0.25) = -1.3863 \]
- Probability can be reconstructed as:
\[ \frac{\exp(\log(\text{odds}))}{1+\exp(\log(\text{odds}))} = \frac{\exp(-1.3863)}{1+\exp(-1.3863)} = \frac{0.25}{1.25} = 0.2 \]
In R:
#obtain the odds
odds <- 0.2 / (1 - 0.2)

#obtain the log-odds for a given probability
log_odds <- log(0.25)
#OR
log_odds <- qlogis(0.2)

#obtain the probability from the odds
prob_O <- odds / (1 + odds)

#obtain the probability from the log-odds
prob_LO <- exp(log_odds) / (1 + exp(log_odds))
#OR
prob_LO <- plogis(log_odds)
When a response (y) is binary coded (e.g., 0/1; failure/success; no/yes; fail/pass; unemployed/employed) we must use logistic regression. The predictors can either be continuous or categorical.
\[ {\log(\frac{p_i}{1-p_i})} = \beta_0 + \beta_1 \ x_{i1} \]
In R:
glm(y ~ x1 + x2, data = data, family = binomial)
To interpret the fitted coefficients, we first exponentiate the model: \[ \begin{aligned} \log \left( \frac{p_x}{1-p_x} \right) &= \beta_0 + \beta_1 x \\ e^{ \log \left( \frac{p_x}{1-p_x} \right) } &= e^{\beta_0 + \beta_1 x } \\ \frac{p_x}{1-p_x} &= e^{\beta_0} \ e^{\beta_1 x} \end{aligned} \]
and recall that the probability of success divided by the probability of failure is the odds of success. \[ \frac{p_x}{1-p_x} = \text{odds} \]
Let’s apply this to our lab example from Section A above:
Intercept
\[ \text{If }x = 0, \qquad \frac{p_0}{1-p_0} = e^{\beta_0} \ e^{\beta_1 x} = e^{\beta_0} \]
That is, \(e^{\beta_0}\) represents the odds of having symptoms of senility for individuals with a WAIS score of 0.
In other words, for those with a WAIS score of 0, the probability of having senility symptoms is \(e^{\beta_0}\) times that of non having them.
Slope
\[ \text{If }x = 1, \qquad \frac{p_1}{1-p_1} = e^{\beta_0} \ e^{\beta_1 x} = e^{\beta_0} \ e^{\beta_1} \]
Now consider taking the ratio of the odds of senility symptoms when \(x=1\) to the odds of senility symptoms when \(x = 0\):
\[ \frac{\text{odds}_{x=1}}{\text{odds}_{x=0}} = \frac{p_1 / (1 - p_1)}{p_0 / (1 - p_0)} = \frac{e^{\beta_0} \ e^{\beta_1}}{e^{\beta_0}} = e^{\beta_1} \]
So, \(e^{\beta_1}\) represents the odds ratio for a 1 unit increase in WAIS score.
It is typically interpreted by saying that for a one-unit increase in WAIS score the odds of senility symptoms increase by a factor of \(e^{\beta_1}\).
Equivalently, we say that \(e^{\beta_1}\) represents the multiplicative increase (or decrease) in the odds of success when \(x\) is increased by 1 unit.
In R:
To translate log-odds to odds in order to aid interpretation, we can exponentiate (i.e., by using exp()) the coefficients from your model using R via the following command:
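#exponentiate the fitted coefficients to obtain odds ratios
exp(coef(model))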
We can also use R to extract predicted probabilities for us from our models.
- Calculate the predicted log-odds (probabilities on the logit scale):
predict(model, type="link")
- Calculate the predicted probabilities:
predict(model, type="response")
Generalized linear models can be fitted in R using the glm function, which is similar to the lm function for fitting linear models. However, we also need to specify the family (which determines the link function). There are three key components to consider:
- Random component / probability distribution - The distribution of the response/outcome variable. Can be from any family of distributions as listed below.
- Systematic component / linear predictor - the explanatory/predictor variable(s) (can be continuous or discrete).
- Link function - specifies the link between the random and systematic components.
Formula:
\(y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i\)
The family argument takes (the name of) a family function which specifies the link function and variance function (as well as a few other arguments not entirely relevant to the purpose of this course).
The exponential family functions available in R are:
- binomial (link = "logit")
- gaussian (link = "identity")
- poisson (link = "log")
- Gamma (link = "inverse")
- inverse.gaussian (link = "1/mu^2")
See ?glm for other modelling options. See ?family for other allowable link functions for each family.
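For instance, a hedged sketch (the outcome y and predictor x1 are placeholders) fitting a model for a count outcome:

#Poisson regression with the log link
glm(y ~ x1, data = data, family = poisson(link = "log"))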
When moving from linear regression to more advanced and flexible models, testing of goodness of fit is more often done by comparing a model of interest to a simpler one.
The only caveat is that the two models need to be nested, i.e., one model needs to be a simplification of the other, and all predictors of one model need to be contained within the other.
We want to compare the model we previously fitted against a model where all slopes are 0, i.e. a baseline model:
\[ \begin{aligned} M_1 : \qquad\log \left( \frac{p}{1 - p} \right) &= \beta_0 \\ M_2 : \qquad \log \left( \frac{p}{1 - p} \right) &= \beta_0 + \beta_1 x \end{aligned} \]
In R: We do the comparison as follows:
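#likelihood ratio (analysis of deviance) test for the nested models from Section A
anova(sen_mdl0, sen_mdl1, test = "Chisq")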
In the output we can see the residual deviance of each model. Remember that deviance is the equivalent of the residual sum of squares in linear regression.
Deviance measures lack of fit, and it can be reduced to zero by making the model more and more complex, effectively estimating the value at each single data point. However, this involves adding more and more predictors, which makes the model more complex (and less interpretable).
Typically, simpler models are preferred when they still explain the data almost as well. This is why information criteria were devised: to account for both model misfit and model complexity.
\[ \text{Information Criterion} = \text{Deviance} + \text{Penalty for model complexity} \]
Depending on the chosen penalty, you get different criteria. Two common ones are the Akaike and Bayesian Information Criteria, AIC and BIC respectively:
\[ \begin{aligned} \text{AIC} &= \text{Deviance} + 2 p \\ \text{BIC} &= \text{Deviance} + p \log(n) \end{aligned} \]
where \(n\) is the sample size and \(p\) is the number of regression coefficients in the model. Models that produce smaller values of these fitting criteria should be preferred.
AIC and BIC differ in their degrees of penalization for number of regression coefficients, with BIC usually favouring models with fewer terms.
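For example, the comparison from Section A:

#smaller values indicate a better trade-off between fit and complexity
AIC(sen_mdl0, sen_mdl1)
BIC(sen_mdl0, sen_mdl1)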
When testing a hypothesis, we reach one of the following two decisions:
- failing to reject \(H_0\) as the evidence against it is not sufficient
- rejecting \(H_0\) as we have enough evidence against it
However, irrespective of our decision, the underlying truth can either be that
- \(H_0\) is actually false
- \(H_0\) is actually true
Hence, we have four possible outcomes following a hypothesis test:
- We failed to reject \(H_0\) when it was true, meaning we made a Correct decision
- We rejected \(H_0\) when it was false, meaning we made a Correct decision
- We rejected \(H_0\) when it was true, committing a Type I error
- We failed to reject \(H_0\) when it was false, committing a Type II error
In the first two cases we are correct, while in the latter two we committed an error.
When we reject the null hypothesis, we never know if we were correct or committed a Type I error, however we can control the chance of us committing a Type I error. Similarly, when we fail to reject the null hypothesis, we never know if we are correct or we committed a Type II error, but we can also control the chance of us committing a Type II error.
The following table summarises the two types of errors that we can commit:
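Decision | \(H_0\) true | \(H_0\) false
---|---|---
Fail to reject \(H_0\) | Correct decision | Type II error
Reject \(H_0\) | Type I error | Correct decision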
A Type I error corresponds to a false discovery, while a type II error corresponds to a failed discovery/missed opportunity.
Each error has a corresponding probability:
- The probability of incorrectly rejecting a true null hypothesis is \(\alpha = P(\text{Type I error})\)
- The probability of incorrectly not rejecting a false null hypothesis is \(\beta = P(\text{Type II error})\)
A related quantity is Power, which is defined as the probability of correctly rejecting a false null hypothesis.
Power
Power is the probability of rejecting a false null hypothesis. That is, it is the probability that we will find an effect when it is in fact present. \[ \text{Power} = 1 - P(\text{Type II error}) = 1 - \beta \]
In practice, it is ideal for studies to have high power while using a relatively small significance level such as .05 or .01. For a fixed \(\alpha\), power increases in the same cases that P(Type II error) decreases, namely as the sample size increases and as the parameter value moves farther into the \(H_1\) values away from the \(H_0\) value.
The power of a test is affected by the following factors:
- sample size. Power increases as the sample size increases.
- effect size. Power increases as the parameter value moves farther into the \(H_1\) values away from the \(H_0\) value.
- significance level. Power increases as the significance level increases.
Out of these, increasing the significance level \(\alpha\) is never an acceptable way to increase power as it leads to more Type I errors, i.e. a higher chance of incorrectly rejecting a true null hypothesis (false discoveries).
As you can see, the four quantities above are linked in complex ways, and in these exercises you will see:
- How to compute the minimum sample size required to correctly reject a false null hypothesis with a given degree of confidence.
- Given sample size constraints, how to find the probability that your test will correctly reject a false null hypothesis. If it's too low, you might want to reconsider your study or perhaps even abandon it.
Effect size refers to the “detectability” of your alternative hypothesis. In simple terms, it compares the distance between the alternative and the null hypothesis to the variability in your data.
For simplicity consider comparing a mean: \(H_0: \mu = 0\) vs \(H_1: \mu \neq 0\). If the sample mean were really 0.1 and the null hypothesis is 0, the distance is 0.1 - 0 = 0.1.
Now, a distance of 0.1 has a different weight in the following two scenarios.
Scenario 1. Data vary between -1 and 1.
Scenario 2. Data vary between -1000 and 1000.
Clearly, in Scenario 1 a distance of 0.1 is a big difference. Conversely, in Scenario 2 a distance of 0.1 is not an interesting difference; it's negligible compared to the magnitude of the data.
You will perform power analysis using the pwr package. To install it, run install.packages("pwr") in your console, then run library(pwr) in your RMarkdown file.
The following functions are available.
Function | Description
---|---
pwr.2p.test | Two proportions (equal n)
pwr.2p2n.test | Two proportions (unequal n)
pwr.anova.test | Balanced one-way ANOVA
pwr.chisq.test | Chi-square test
pwr.f2.test | General linear model
pwr.p.test | Proportion (one sample)
pwr.r.test | Correlation
pwr.t.test | t-tests (one sample, two samples, paired)
pwr.t2n.test | t-test (two samples with unequal n)
For each function, you can specify three of four arguments (sample size, alpha, effect size, power) and the fourth argument will be calculated for you.
Of the four quantities, effect size is often the most difficult to specify. Calculating effect size typically requires some experience with the measures involved and knowledge of past research.
Typically, specifying effect size requires you to read published literature or past papers on your research topic, to see what effect sizes were found and what significant results were reported. Other times, this might come from previous collected data or subject-knowledge from your colleagues.
But what can you do if you have no clue what effect size to expect in a given study? Cohen (1988) provided guidelines for what a small, medium, or large effect typically is in the behavioral sciences.
Type of test | Small | Medium | Large
---|---|---|---
t-test | 0.20 | 0.50 | 0.80
ANOVA | 0.10 | 0.25 | 0.40
Linear regression | 0.02 | 0.15 | 0.35
We compare the mean of a response variable between two groups using a t-test. For example, if you are comparing the mean response between two groups, say treatment and control, the null and alternative hypotheses are: \[ H_0 : \mu_t - \mu_c = 0 \\ H_1 : \mu_t - \mu_c \neq 0 \]
The effect size in this case is Cohen’s \(D\): \[ D = \frac{(\bar x_t - \bar x_c) - 0}{s_p} \] where
- \(\bar x_t\) and \(\bar x_c\) are the sample means in the treatment and control groups, respectively
- \(s_p\) is the “pooled” standard deviation
Cohen’s \(D\) measures the distance between (a) the observed difference in means from (b) the hypothesised value 0, and compares this to the variability in the data.
In R we use the function
pwr.t.test(n = , d = , sig.level = , power = , type = , alternative = )
where
- n = the sample size
- d = the effect size
- sig.level = the significance level \(\alpha\) (the default is 0.05)
- power = the power level
- type = the type of t-test to perform: either a two-sample t-test ("two.sample"), a one-sample t-test ("one.sample"), or a dependent samples t-test ("paired"). A two-sample test is the default.
- alternative = whether the alternative hypothesis is two-sided ("two.sided") or one-sided ("less" or "greater"). A two-sided test is the default.
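For example, leaving n unspecified asks pwr.t.test() to solve for the required sample size per group (here assuming a medium effect, d = 0.5, from Cohen's guidelines above):

#solve for n: two-sided, two-sample t-test, 80% power, alpha = .05
pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.80,
           type = "two.sample", alternative = "two.sided")

The returned n is per group and should be rounded up to the next whole participant.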
In linear regression, the relevant function in R is:
library(pwr)
pwr.f2.test(u = , v = , f2 = , sig.level = , power = )
where
- u = numerator degrees of freedom
- v = denominator degrees of freedom
- f2 = effect size
Here you have one model,
\[ y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + \epsilon \]
and you wish to find the minimum sample size required to answer the following test of hypothesis with a given power:
\[ \begin{aligned} H_0 &: \beta_1 = \beta_2 = \dots = \beta_k = 0 \\ H_1 &: \text{At least one } \beta_i \neq 0 \end{aligned} \]
The appropriate formula for the effect size is:
\[ f^2 = \frac{R^2}{1 - R^2} \]
And the numerator degrees of freedom are \(\texttt u = k\), the number of predictors in the model.
The denominator degrees of freedom returned by the function will give you:
\[ \texttt v = n - (k + 1) = n - k - 1 \]
From which you can infer the sample size as
\[ n = \texttt v + k + 1 \]
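A sketch under assumed values (a model with k = 3 predictors and an anticipated \(R^2\) of .15; both numbers are hypothetical):

k  <- 3
R2 <- 0.15
f2 <- R2 / (1 - R2)  #effect size, f2 = R^2 / (1 - R^2)
res <- pwr.f2.test(u = k, f2 = f2, sig.level = 0.05, power = 0.80)
n <- ceiling(res$v) + k + 1  #minimum sample size, n = v + k + 1
n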
In this case you would have a smaller model \(m\) with \(k\) predictors and a larger model \(M\) with \(K\) predictors:
\[ \begin{aligned} m &: \quad y = \beta_0 + \beta_1 x_1 + \dots + \beta_{k} x_k + \epsilon \\ M &: \quad y = \beta_0 + \beta_1 x_1 + \dots + \beta_{k} x_k + \underbrace{\beta_{k + 1} x_{k+1} + \dots + \beta_{K} x_K}_{\text{extra predictors}} + \epsilon \end{aligned} \]
This case is when you wish to find the minimum sample size required to answer the following test of hypothesis:
\[ \begin{aligned} H_0 &: \beta_{k+1} = \beta_{k+2} = \dots = \beta_K = 0 \\ H_1 &: \text{At least one of the above } \beta \neq 0 \end{aligned} \]
You need to use the R-squared from the larger model \(R^2_M\) and the R-squared from the smaller model \(R^2_m\). The appropriate formula for the effect size is:
\[ f^2 = \frac{R^2_{M} - R^2_{m}}{1 - R^2_M} \]
Here, the numerator degrees of freedom are the extra predictors: \(\texttt u = K - k\).
The denominator degrees of freedom returned by the function will give you (here you use \(K\), the number of all predictors in the larger model):
\[ \texttt v = n - (K + 1) = n - K - 1 \]
From which you can infer the sample size as
\[ n = \texttt v + K + 1 \]
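And the matching sketch for a nested comparison (the predictor counts and R-squared values are hypothetical):

k  <- 2                           #predictors in the smaller model
K  <- 5                           #predictors in the larger model
f2 <- (0.20 - 0.10) / (1 - 0.20)  #assumed R^2_M = .20, R^2_m = .10
res <- pwr.f2.test(u = K - k, f2 = f2, sig.level = 0.05, power = 0.80)
n <- ceiling(res$v) + K + 1       #minimum sample size, n = v + K + 1
n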
Exploratory analyses are conducted when either (1) you have a hypothesis but no clear analysis plan/strategy, or (2) you have lots of variables that might be associated with an outcome variable, but you're not sure which, or in what ways (i.e., no clear predictions). Exploratory analysis also provides tools for hypothesis generation, particularly via visualisation of data.
Confirmatory analyses are conducted when you have a specific research question to test. You can think of this type of analysis as putting your hypotheses to trial - does your data support or fail to support your argument?
There are a number of steps involved in exploratory analyses:
- Step 1: Check coding of data and visualise variables of interest
- Step 2: Compare models of interest to find if your variables of interest are good predictors of your outcome variable
- Step 3: Compute the k-fold cross validation MSE for each of your models
- Step 4: Identify best fitting model
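A hedged sketch of Step 3, assuming two candidate models m1 and m2 fitted with glm() to a data frame dat (all names hypothetical), using cv.glm() from the boot package:

library(boot)
#10-fold cross-validated prediction error for each candidate model
cv_m1 <- cv.glm(data = dat, glmfit = m1, K = 10)
cv_m2 <- cv.glm(data = dat, glmfit = m2, K = 10)
cv_m1$delta[1]  #estimated CV MSE for m1
cv_m2$delta[1]  #compare across models: smaller is better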