Write Up Example & Block 2 Recap

Learning Objectives

At the end of this lab, you will:

Understand how to write up and interpret a linear model with multiple predictors.

What You Need

Be up to date with lectures
Have completed Labs 7-10

Required R Packages

Remember to load all packages within a code chunk at the start of your RMarkdown file using library(). If you do not have a package and need to install it, do so in the console using install.packages(" "). For further guidance on installing/updating packages, see Section C here.

For this lab, you will need to load the following package(s):

tidyverse
patchwork
sjPlot
sandwich
interactions

Lab Data

You can download the data required for this lab here or read it in via this link https://uoepsy.github.io/data/scs_study.csv.

Note: This is the same data as Lab 8.

Section A: Write-Up

In this lab you will be presented with the output from a statistical analysis, and your job will be to write up and present the results. We’re going to use an example analysis based on one of the datasets we have worked with in a number of exercises in previous labs, concerning personality traits, social comparison, depression, and anxiety.

The aim in writing should be that a reader is able to more or less replicate your analyses without referring to your R code. This requires detailing all of the steps you took in conducting the analysis.
The point of using RMarkdown is that you can pull your results directly from the code. If your analysis changes, so does your report!

Make sure that your final report doesn’t show any R functions or code. Remember you are interpreting and reporting your results in text, tables, or plots, targeting a generic reader who may use different software or may not know R at all. If you need a reminder on how to hide code, format tables, etc., make sure to review the rmd bootcamp.

Important - Write-Up Examples & Plagiarism

The example write-up sections included below are not perfect - they instead should give you a good example of what information you should include within each section, and how to structure this. For example, some information is missing (e.g., interpretation of descriptive statistics, what type of interaction is present), some information could be presented more clearly (e.g., variable names in tables, table/figure titles/captions, and rationales for choices), and writing could be more concise in places (e.g., discussion section is quite long).

Further, you must not copy any of the write-up included below for future reports - if you do, you will be committing plagiarism, and this type of academic misconduct is taken very seriously by the University. You can find out more here.

Study Overview

Research Question

Controlling for other personality traits, does neuroticism moderate effects of social comparison on symptoms of depression, anxiety and stress?

Previous research has identified an association between an individual’s perception of their social rank and symptoms of depression, anxiety and stress. We are interested in the individual differences in this association.

To investigate whether the effect of social comparison on symptoms of depression, anxiety and stress varies depending on level of Neuroticism, we will need to fit a multiple regression model with an interaction term and control for other personality traits.
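As a minimal sketch of what "an interaction term" means in R's formula syntax, the example below uses simulated stand-in data rather than the study data (the variables `y`, `x`, and `w` are hypothetical). It illustrates that `x * w` is shorthand for both main effects plus their product term.

```r
# Simulated stand-in data (hypothetical variables, not the SCS study data)
set.seed(1)
n <- 200
x <- rnorm(n)   # focal predictor
w <- rnorm(n)   # moderator
y <- 2 - 1.5 * x + 1 * w - 2 * x * w + rnorm(n)
sim <- data.frame(y, x, w)

# x * w expands to x + w + x:w, so these two calls fit the same model
m_short <- lm(y ~ x * w, data = sim)
m_long  <- lm(y ~ x + w + x:w, data = sim)

names(coef(m_short))                     # intercept, x, w, and the x:w product term
all.equal(coef(m_short), coef(m_long))   # identical estimates
```

Additional control variables are simply added to the right-hand side of the formula alongside the interaction.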

Social Comparison Study data codebook

zo	zc	ze	za	zn	scs	dass
0.76	1.58	-0.79	-0.09	1.32	30	56
0.30	-0.27	-0.09	0.09	-0.40	30	48
-0.13	0.66	-0.80	-0.95	0.93	35	48
1.06	-1.02	-0.16	-0.50	-0.02	29	48
1.74	-0.78	-1.55	-2.86	-1.14	41	43
0.22	-0.41	0.78	0.90	-0.25	37	60

Provided Analysis Code

library(tidyverse) # for all things!
library(psych) # good for descriptive stats
library(kableExtra) # useful for creating nice tables
library(car) # for assumption tests
library(sandwich)
library(interactions) # for plotting models

scs_study <- read_csv("https://uoepsy.github.io/data/scs_study.csv")

# standardise scs score
scs_study <- 
  scs_study %>% 
    mutate(
      zscs = (scs-mean(scs))/sd(scs)
    )
#alternatively, you could do zscs = scale(scs, center = TRUE, scale = TRUE)

# the describe() function is from the psych package, and kable() from kableExtra which is used to make a nice table where the values are rounded to 2 decimal places using digits = 2. 
describe(scs_study %>% 
        select(dass, scs, zn))[,c(2:4,8:9)] %>% 
        kable(., caption = "DASS-21, SCS, and Neuroticism Descriptive Statistics", digits = 2) %>%
        kable_styling()

DASS-21, SCS, and Neuroticism Descriptive Statistics
	n	mean	sd	min	max
dass	656	44.72	6.76	23.00	68.00
scs	656	35.77	3.53	27.00	54.00
zn	656	0.00	1.00	-1.45	3.35

# scatterplot matrix of dataset without the zscs variable
pairs.panels(scs_study %>% select(-zscs))

dass_mdl <- lm(dass ~ zscs*zn + zo + zc + ze + za, data = scs_study)
par(mfrow=c(2,2))
plot(dass_mdl)

# observation 35 seems to be a very influential point, let's remove it and re-run the model

dass_mdl2 <- lm(dass ~ zscs*zn + zo + zc + ze + za, data = scs_study[-35, ])

# check assumptions for updated model
par(mfrow=c(2,2))
plot(dass_mdl2)

par(mfrow=c(1,1))

# N.B. we cannot use crPlots for interactions

# Additional diagnostic plots for independence and homoscedasticity

# checking for independence
plot(resid(dass_mdl2))

# alternative check for equal variances (Homoscedasticity)
residualPlots(dass_mdl2)

           Test stat Pr(>|Test stat|)  
zscs          1.8141          0.07013 .
zn           -0.5911          0.55467  
zo            1.7801          0.07553 .
zc           -0.2403          0.81018  
ze           -0.9951          0.32004  
za            0.0725          0.94219  
Tukey test   -1.6406          0.10089  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# multicollinearity
vif(dass_mdl2)

there are higher-order terms (interactions) in this model
consider setting type = 'predictor'; see ?vif

    zscs       zn       zo       zc       ze       za  zscs:zn 
1.015133 1.015736 1.013310 1.008235 2.332486 2.342220 1.012475

# model output
summary(dass_mdl2)


Call:
lm(formula = dass ~ zscs * zn + zo + zc + ze + za, data = scs_study[-35, 
    ])

Residuals:
     Min       1Q   Median       3Q      Max 
-17.1455  -3.8155  -0.0066   3.6905  18.1483 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 44.97703    0.22635 198.708  < 2e-16 ***
zscs        -1.93818    0.23042  -8.412 2.58e-16 ***
zn           1.41639    0.22661   6.250 7.44e-10 ***
zo          -0.31435    0.22056  -1.425    0.155    
zc           0.09134    0.22515   0.406    0.685    
ze           0.52695    0.34233   1.539    0.124    
za           0.33847    0.34281   0.987    0.324    
zscs:zn     -2.76609    0.24097 -11.479  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.733 on 647 degrees of freedom
Multiple R-squared:  0.279, Adjusted R-squared:  0.2712 
F-statistic: 35.76 on 7 and 647 DF,  p-value: < 2.2e-16

#create table for results
tab_model(dass_mdl2,
          dv.labels = c("DASS-21 Scores"),
          pred.labels = c("zscs"="Social Comparison Scale (Z-scored)", 
                          "zn"="Neuroticism (Z-scored)", 
                          "zo"="Openness (Z-scored)", 
                          "zc"="Conscientiousness (Z-scored)",
                          "ze"="Extraversion (Z-scored)",
                          "za"="Agreeableness (Z-scored)",
                          "zscs:zn"="Social Comparison Scale (Z-scored): Neuroticism (Z-scored)"),
          title = "Regression table for DASS-21 model")

Regression table for DASS-21 model
	DASS-21 Scores
Predictors	Estimates	CI	p
(Intercept)	44.98	44.53 – 45.42	<0.001
Social Comparison Scale (Z-scored)	-1.94	-2.39 – -1.49	<0.001
Neuroticism (Z-scored)	1.42	0.97 – 1.86	<0.001
Openness (Z-scored)	-0.31	-0.75 – 0.12	0.155
Conscientiousness (Z-scored)	0.09	-0.35 – 0.53	0.685
Extraversion (Z-scored)	0.53	-0.15 – 1.20	0.124
Agreeableness (Z-scored)	0.34	-0.33 – 1.01	0.324
Social Comparison Scale (Z-scored): Neuroticism (Z-scored)	-2.77	-3.24 – -2.29	<0.001
Observations	655
R² / R² adjusted	0.279 / 0.271

#interaction plot and simple slopes:
plt_scs_mdl <- probe_interaction(model = dass_mdl2, 
                  pred = zscs, 
                  modx = zn, 
                  cond.int = TRUE,
                  interval = TRUE, 
                  jnplot = TRUE,
                  main.title = "Neuroticism moderating the effect of\nsocial comparison on depression and anxiety",
                  x.label = "Social Comparison Scale (Z-scored)",
                  y.label = "DASS-21 Scores",
                  legend.main = "Neuroticism (Z-scored)")

plt_scs_mdl$interactplot

plt_scs_mdl$simslopes

JOHNSON-NEYMAN INTERVAL 

When zn is OUTSIDE the interval [-0.93, -0.52], the slope of zscs is p <
.05.

Note: The range of observed values of zn is [-1.45, 3.35]

SIMPLE SLOPES ANALYSIS 

When zn = -1.000344918 (- 1 SD): 

                               Est.   S.E.   t val.      p
--------------------------- ------- ------ -------- ------
Slope of zscs                  0.83   0.33     2.49   0.01
Conditional intercept         43.53   0.32   136.42   0.00

When zn = -0.003414067 (Mean): 

                               Est.   S.E.   t val.      p
--------------------------- ------- ------ -------- ------
Slope of zscs                 -1.93   0.23    -8.37   0.00
Conditional intercept         44.96   0.23   199.64   0.00

When zn =  0.993516784 (+ 1 SD): 

                               Est.   S.E.   t val.      p
--------------------------- ------- ------ -------- ------
Slope of zscs                 -4.69   0.33   -14.05   0.00
Conditional intercept         46.39   0.32   145.48   0.00

Setup

Create a new RMarkdown file
Load the required package(s)
Read the scs dataset into R, assigning it to an object named scs_study

The 3-Act Structure

We need to present our report in three clear sections - think of your sections like the 3 key parts of a play or story - we need to (1) provide some background and scene setting for the reader, (2) present our results in the context of the research question, and (3) present a resolution to our story - relate our findings back to the question we were asked and provide our answer.

Act I: Analysis Strategy

Question 1

Attempt to draft an analysis strategy section based on the above research question and analysis provided.

Analysis Strategy - What to Include

Your analysis strategy will contain a number of different elements detailing plans and changes to your plan. Remember, your analysis strategy should not contain any results. You may wish to include the following sections:

Very brief data and design description:
- Give the reader some background on the context of your write-up. For example, you may wish to describe the data source, data collection strategy, study design, number of observational units.
- Specify the variables of interest in relation to the research question, including their unit of measurement, the allowed range (for Likert scales), how they are scored, and if they are factors make sure to list the order of the levels.
Data management:
- Describe any data cleaning and/or recoding.
- Are there any observations that have been excluded based on pre-defined criteria? How/why, and how many?
- * Describe any transformations performed to aid your interpretation (i.e., log transformation, mean centering, standardisation, etc.)
Model specification:
- Clearly state your hypotheses and specify your chosen significance level.
- What type of statistical analysis do you plan to use to answer the research question? (e.g., t-test, simple linear regression, multiple linear regression, etc.)
- In some cases, you may wish to include some visualisations and descriptive tables to motivate your model specification.
- Specify the model(s) to be fitted to answer your given research question and analysis structure. Clearly specify the response and explanatory variables included in your model(s) and remember to describe the coding of categorical variables (i.e., factors) so the reader is aware of any reference levels.
- Detail the steps that you will undertake to ensure that your model(s) do not violate the appropriate assumptions.
- If applicable, detail any required changes/modifications to the model specification to satisfy assumptions. Consider the following: Was there anything you had to do differently than planned during the analysis? Did the modelling highlight issues in your data? Did you have to do anything (e.g., transform any variables, exclude any observations) in order to meet assumptions?

Note that the * used on occasion in the above indicates that you may/should in some cases repeat these steps if you decide to make any modifications to your data (e.g., removing outliers, etc.).
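One of the transformations listed above, standardisation, can be sketched as follows. This is an illustration on simulated scores (the vector `raw` is hypothetical, not the study data), showing that computing \(Z\)-scores by hand and using `scale()` give the same result.

```r
# Hypothetical raw scale totals, simulated for illustration only
set.seed(2)
raw <- round(rnorm(50, mean = 36, sd = 3.5))

# Z-score by hand: subtract the mean, divide by the standard deviation
z_manual <- (raw - mean(raw)) / sd(raw)

# scale() does the same thing but returns a one-column matrix,
# so it is converted back to a plain numeric vector here
z_scale <- as.numeric(scale(raw, center = TRUE, scale = TRUE))

all.equal(z_manual, z_scale)
round(c(mean = mean(z_manual), sd = sd(z_manual)), 2)
```

Whichever approach you use, the write-up should state that the variable was standardised, since this changes the units in which coefficients are interpreted.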

As noted and encouraged throughout the course, one of the main benefits of using RMarkdown is the ability to include inline R code in your document. Try to incorporate this in your write up so you can automatically pull the specified values from your code. If you need a reminder on how to do this, see Lesson 4 of the Rmd Bootcamp.
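A minimal sketch of this workflow is given below, using the built-in `cars` dataset as a stand-in for your own model (the object names `mdl`, `n_obs`, and `b_speed` are hypothetical). Values are stored in a code chunk and then referenced inline in the text, so the sentence updates automatically if the analysis changes.

```r
# Built-in `cars` data used as a stand-in for your own fitted model
mdl <- lm(dist ~ speed, data = cars)

# Store the values you want to report
n_obs   <- nobs(mdl)
b_speed <- round(coef(mdl)[["speed"]], 2)

# In the body text of the .Rmd file you could then write, for example:
#   The model was fitted to `r n_obs` observations; the slope for speed
#   was `r b_speed`.
# The same sentence can be previewed in the console:
sprintf("The model was fitted to %d observations; the slope for speed was %.2f.",
        n_obs, b_speed)
```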

Example Write-Up of Analysis Strategy Section

The dataset contained information on 656 participants, including \(Z\)-scores on 5 personality traits assessed by the Big-Five Aspects Scale (BFAS; Openness, Conscientiousness, Extraversion, Agreeableness and Neuroticism). Participants were also assessed on the Social Comparison Scale (SCS), which is an 11-item scale measuring self-perception (relative to others) of social rank, attractiveness and belonging, and the Depression Anxiety and Stress Scale (DASS-21) - a 21 item measure with higher scores indicating higher severity of symptoms. For both of these measures, only total scores were available. Items in the SCS were measured on a 5-point scale, giving minimum and maximum possible scores of 11 and 55 respectively. Items in the DASS-21 were measured on a 4-point scale, meaning that scores could range from 21 to 84.

All participant data was complete (no missing values), with scores on the SCS and the DASS-21 all within possible ranges.

To investigate whether, when controlling for other personality traits, Neuroticism moderated the effect of social comparison on symptoms of depression, anxiety and stress, total scores on the DASS-21 were modeled using multiple linear regression. The \(Z\)-scored measures on each of the big-five personality traits were included as predictors, along with scores on the SCS (\(Z\)-scored) and its interaction with the measure of Neuroticism. Effects were considered statistically significant at \(\alpha = 0.05\).

The following model specification was used: \[ \begin{aligned} \text{DASS-21} = \beta_0 + \beta_1 \text{O} + \beta_2 \text{C} + \beta_3 \text{E} + \beta_4 \text{A} + \beta_5 \text{N} + \beta_6 \text{SCS} + \beta_7 (\text{SCS} \cdot \text{N}) + \epsilon \\ \end{aligned} \]

\[ \begin{aligned} \text{where } \\ \\ & \text{O = Openness, z-scored} \\ & \text{C = Conscientiousness, z-scored} \\ & \text{E = Extraversion, z-scored} \\ & \text{A = Agreeableness, z-scored} \\ & \text{N = Neuroticism, z-scored} \\ & \text{SCS = Social Comparison Scale, z-scored} \\ \end{aligned} \]

To address the research question of whether Neuroticism moderated the effect of social comparison on depression and anxiety, we tested whether the interaction between SCS and Neuroticism was significant. Formally, this corresponded to testing whether the interaction coefficient was equal to zero:

\[ \begin{aligned} H_0: \beta_7 = 0 \\ H_1: \beta_7 \neq 0 \\ \end{aligned} \]
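To illustrate what this test corresponds to in R, the sketch below uses simulated data (hypothetical variables `y`, `x`, `w`, not the study data). With a single interaction coefficient, the \(t\)-test reported by `summary()` is equivalent to an \(F\)-test comparing the models with and without the interaction: the squared \(t\)-value equals the \(F\)-value.

```r
# Simulated stand-in data (hypothetical, for illustration only)
set.seed(3)
n <- 300
x <- rnorm(n)
w <- rnorm(n)
y <- 1 + 0.5 * x + 0.5 * w + 0.4 * x * w + rnorm(n)
sim <- data.frame(y, x, w)

m_main <- lm(y ~ x + w, data = sim)   # model under H0 (no interaction)
m_int  <- lm(y ~ x * w, data = sim)   # model under H1 (with interaction)

# t-value for the interaction coefficient, and F from the model comparison
t_int <- summary(m_int)$coefficients["x:w", "t value"]
f_cmp <- anova(m_main, m_int)$F[2]

c(t_squared = t_int^2, F = f_cmp)
```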

The following assumptions were assessed visually using diagnostic plots: linearity (via plot of residuals vs fitted values; red line should be approximately horizontal, with residual values randomly scattered around zero and thus showing no pattern in relation to fitted values), independence (with the previous plot and a plot of residuals vs index; no dependence should be indicated), equal variances (via a scale-location plot; residuals should be evenly spread across the range of fitted values, where the spread should be constant across the range of fitted values), and normality (via a QQplot of the residuals; points should follow along the diagonal line). We also checked if there was any evidence of multicollinearity by checking VIF values, where values > 5 were considered to indicate moderate multicollinearity, and values > 10 severe. Outliers were assessed via Cook’s Distance, where values >2 indicated influential points.
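The two numeric checks mentioned above can be sketched in base R as follows. This uses simulated data with hypothetical variables (`y`, `x1`, `x2`), and computes VIF by hand rather than with `car::vif()`, purely to show what the statistic measures.

```r
# Simulated stand-in data (hypothetical, for illustration only)
set.seed(4)
n <- 150
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)   # deliberately correlated with x1
y  <- 1 + x1 + x2 + rnorm(n)
sim <- data.frame(y, x1, x2)
m <- lm(y ~ x1 + x2, data = sim)

# Cook's distance for every observation; inspect the largest values
cd <- cooks.distance(m)
head(sort(cd, decreasing = TRUE), 3)

# VIF by hand: regress each predictor on the other(s), then 1 / (1 - R^2)
vif_x1 <- 1 / (1 - summary(lm(x1 ~ x2, data = sim))$r.squared)
vif_x2 <- 1 / (1 - summary(lm(x2 ~ x1, data = sim))$r.squared)
c(x1 = vif_x1, x2 = vif_x2)
```

The observed values would then be compared against whichever cut-offs were stated in the analysis strategy.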

Act II: Results

Question 2

Attempt to draft a results section based on your detailed analysis strategy and the analysis provided.

Results - What To Include

The results section should follow from your analysis strategy. This is where you would present the evidence and results that will be used to answer the research questions and can support your conclusions. Make sure that you address all aspects of the approach you outlined in the analysis strategy.

In this section, it is useful to include tables and plots to clearly present your findings to your reader. It is important, however, to carefully select what is the key information that should be presented. You don’t want to overload the reader with unnecessary information, and you also want to save space in case there is a page limit. Make use of figures with multiple panels where you can.

As a broad guideline, you want to start with the results of an exploratory data analysis, presenting tables of summary statistics and exploratory plots. You may also want to visualise relationships between variables and report covariances or correlations. Then, you should move on to the results from your model. Remember that in the main part of the report you should only interpret and report for models that do not violate the assumptions. You should also interpret all of the results presented, and remember to make reference to and comment on your assumption and diagnostic checks for key models.

Example Write-Up of Results Section

Descriptive statistics are displayed in Table 1.

Table 1: DASS-21, SCS, and Neuroticism Descriptive Statistics
	n	mean	sd	min	max
dass	656	44.72	6.76	23.00	68.00
scs	656	35.77	3.53	27.00	54.00
zn	656	0.00	1.00	-1.45	3.35

Bivariate correlations showed a moderate negative association between DASS-21 and SCS scores; a moderate positive association between DASS-21 and Neuroticism, and a weak positive correlation between SCS and Neuroticism (see Figure 1).

Figure 1: Bivariate scatter plots (below diagonal), histograms (diagonal), and Pearson correlation coefficient (above diagonal) for DASS-21 scores, SCS, and Big 5 Personality variables

One observation (unit 35) was judged to be too influential on the model (Cook’s Distance = 2.66) and as such was excluded from the final analysis, leaving 655 observations.

The final model met assumptions of linearity and independence (see top left panel of Figure 2; residuals were randomly scattered with a mean of zero and there was no clear dependence), homoscedasticity (see bottom left panel of Figure 2; there was a constant spread of residuals), and normality (see top right panel of Figure 2; the QQplot showed very little deviation from the diagonal line). All VIF values were <5, and hence there was no evidence of multicollinearity.

Full regression results including 95% Confidence Intervals are shown in Table 2. The \(F\)-test for model utility was significant (\(F(7,647) = 35.76, p<.001\)), and the model explained approximately 27.12% of the variability in DASS-21 Scores (adjusted \(R^2\)).
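As a quick check on the adjusted \(R^2\) reported here, it can be recovered from the multiple \(R^2\) of 0.279 in the model output, with \(n = 655\) observations and \(k = 7\) predictors. This uses the standard adjustment formula with rounded inputs, so the result is approximate:

\[ \begin{aligned} R^2_{adj} &= 1 - (1 - R^2) \cdot \frac{n - 1}{n - k - 1} \\ &= 1 - (1 - 0.279) \cdot \frac{654}{647} \\ &\approx 0.271 \end{aligned} \]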

Table 2: Regression table for DASS-21 model
	DASS-21 Scores
Predictors	Estimates	CI	p
(Intercept)	44.98	44.53 – 45.42	<0.001
Social Comparison Scale (Z-scored)	-1.94	-2.39 – -1.49	<0.001
Neuroticism (Z-scored)	1.42	0.97 – 1.86	<0.001
Openness (Z-scored)	-0.31	-0.75 – 0.12	0.155
Conscientiousness (Z-scored)	0.09	-0.35 – 0.53	0.685
Extraversion (Z-scored)	0.53	-0.15 – 1.20	0.124
Agreeableness (Z-scored)	0.34	-0.33 – 1.01	0.324
Social Comparison Scale (Z-scored): Neuroticism (Z-scored)	-2.77	-3.24 – -2.29	<0.001
Observations	655
R² / R² adjusted	0.279 / 0.271
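The confidence intervals in the table can be reproduced from the estimates and standard errors in the model summary. The sketch below copies three of those values by hand for illustration (in practice `confint()` on the fitted model would do this directly) and applies estimate ± critical \(t\) × standard error.

```r
# Estimates and standard errors copied from the model summary shown earlier
est <- c(zscs = -1.93818, zn = 1.41639, `zscs:zn` = -2.76609)
se  <- c(zscs =  0.23042, zn = 0.22661, `zscs:zn` =  0.24097)
df_resid <- 647                      # residual degrees of freedom

t_crit <- qt(0.975, df = df_resid)   # two-sided 95% critical value
ci <- cbind(lower = est - t_crit * se,
            upper = est + t_crit * se)
round(ci, 2)
```

Rounded to two decimal places, these limits agree with the corresponding rows of the regression table.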

Results showed a significant conditional association between SCS scores (\(Z\)-scored) and DASS-21 Scores (\(\beta\) = -1.94, \(SE\) = 0.23, \(p\) <.001), which suggested that for those at the mean level of Neuroticism, scores on the DASS-21 decreased by 1.94 for every 1 standard deviation increase in SCS scores. A significant conditional association was also evident between Neuroticism (Z-scored) and DASS-21 Scores (\(\beta\) = 1.42, \(SE\) = 0.23, \(p\) <.001), which suggested that for those with average scores on the SCS, scores on the DASS-21 increased by 1.42 for every 1 standard deviation increase in Neuroticism.

Crucially, the association between social comparison and symptoms of depression and anxiety was found to be dependent upon the level of Neuroticism, with a greater negative association between the two for those with high levels of Neuroticism (\(\beta\) = -2.77, \(SE\) = 0.24, \(p\) <.001). This interaction is visually presented in Figure 3.

The results presented here indicated that the association between social comparison and depression and anxiety did depend upon individuals’ levels of Neuroticism, with perceived social rank perhaps leading to more symptoms of depression and anxiety for highly neurotic individuals. The Johnson-Neyman technique (see Figure 4) indicated that the association between DASS-21 scores and SCS was significant when Neuroticism scores were more than 0.93 standard deviations below the mean, or anywhere above 0.52 standard deviations below the mean (i.e., outside the interval [-0.93, -0.52] on the \(Z\)-scored Neuroticism scale).
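The simple slopes behind this interpretation follow directly from the model coefficients: the slope of SCS at a given level of Neuroticism is the SCS coefficient plus the interaction coefficient multiplied by that Neuroticism value. A minimal sketch, with the coefficients and moderator values copied by hand from the output shown earlier:

```r
b_zscs <- -1.93818   # conditional slope of zscs (at zn = 0)
b_int  <- -2.76609   # zscs:zn interaction coefficient

# Moderator values used in the simple slopes output (-1 SD, mean, +1 SD)
zn_vals <- c(-1.000344918, -0.003414067, 0.993516784)

# Slope of zscs at each level of zn
simple_slopes <- b_zscs + b_int * zn_vals
round(simple_slopes, 2)

# Value of zn at which the estimated slope of zscs is exactly zero
zn_zero_slope <- -b_zscs / b_int
round(zn_zero_slope, 2)
```

The point at which the slope crosses zero falls inside the Johnson-Neyman interval, which is why the association is not significant in that region and changes sign on either side of it.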

Act III: Discussion

Question 3

Attempt to draft a discussion section based on your results and the analysis provided.

Discussion - What To Include

In the discussion section, you should summarise the key findings from the results section and provide the reader with take-home sentences drawing the analysis together and relating it back to the original question.

The discussion should be relatively brief, and should not include any statistical analysis - instead think of the discussion as a conclusion, providing an answer to the research question(s).

Example Write-Up of Discussion Section

Section B: Weeks 6 - 11 Recap

In the second part of the lab, there is no new content - the purpose of the recap section is for you to revisit and revise the concepts you have learned over the last 4/5 weeks.

Before you expand each of the boxes below, think about how comfortable you feel with each concept.

Specifying Interaction Models

Interpreting Coefficients

Mean Centering

Assumptions: Linearity

Assumptions: Equal Variances (Homoscedasticity)

Assumptions: Independence (of errors)

Assumptions: Normality (of errors)

Assumptions: Interactions

Multicollinearity

Individual Case Diagnostics

Section C: Mock Exam Questions

In the exam, there will be 3 sections containing different types of questions, as outlined below:

Section A: You will answer multiple choice questions (see Question 1 for an example)
Section B: You will solve by-hand calculation problems (see Question 2 for an example)
Section C: You will be asked questions about basic R-code and to interpret results of analyses (see Question 3 for an example)

Below there is a mock question from each section described above - note that solutions are not provided. These are just to give you an example of the types of questions you will be presented with. If you have questions about these, ask your tutor or come to office hours to discuss.

Question 1

Which assumption is being checked in the following line of code:

residualPlots(model1)

A: Linearity
B: Normality
C: Independence
D: Equal variances / Homoscedasticity

Question 2

Using the values below, calculate \(R^2\).

SS_Model = 70
SS_Residual = 56
SS_Total = 126

Question 3

Researchers have a sample of 100 people, and they have measured their resting heart rate (rhr) and their caffeine consumption (caffeine). They were interested in estimating how caffeine consumption was associated with differences in resting heart rate, after controlling for age (age; since heart rate increases with advancing age and because they thought that older people tend to drink less caffeine).

From the following output (see Figure 7), write out and interpret the regression equation for the model following APA guidelines.
