Write Up Example & Block 3 Recap
Learning Objectives
At the end of this lab, you will:
- Understand how to write up and provide interpretation of a 2x2 factorial ANOVA
What You Need
- Be up to date with lectures
- Have completed Labs 1-4
Required R Packages
Remember to load all packages within a code chunk at the start of your RMarkdown file using library(). If you do not have a package and need to install it, do so within the console using install.packages(" "). For further guidance on installing/updating packages, see Section C here.
For this lab, you will need to load the following package(s):
- tidyverse
- psych
- kableExtra
- emmeans
Lab Data
You can download the data required for this lab here or read it in via this link https://uoepsy.github.io/data/lietraining.csv.
Section A: Write-Up
In this section of the lab you will be presented with a research question, and tasked with writing up and presenting your analyses.
The aim in writing should be that a reader is able to more or less replicate your analyses without referring to your R code. This requires detailing all of the steps you took in conducting the analysis. The point of using RMarkdown is that you can pull your results directly from the code. If your analysis changes, so does your report!
Make sure that your final report doesn’t show any R functions or code. Remember you are interpreting and reporting your results in text, tables, or plots, targeting a generic reader who may use different software or may not know R at all. If you need a reminder on how to hide code, format tables, etc., make sure to review the rmd bootcamp.
Study Overview
Research Question
Do Police training materials and the mode of communication influence the accuracy of veracity judgements?
Description
A total of 120 participants took part in a study in which they were presented with 100 recordings, and were tasked with guessing whether the speaker in each recording was lying or whether they were telling the truth.
Participants scored points every time they correctly identified a truth or a lie, and lost points whenever they mistook a lie for a truth (or vice versa). The maximum possible points to be scored was 100.
Half of the participants (\(n = 60\)) were shown recordings in audio and video; the other half were presented with only the audio track.
Prior to taking part in the experiment, participants were given material to read for 10 minutes. Half of the participants in each condition (30 in the audio-only condition, and 30 in the audiovideo condition) were given instructional material used by the Police Force to train detectives to pick up on dishonesty during interrogations via various verbal and non-verbal cues. The remaining 30 participants in each condition were given a series of cartoon strips to read.
The data in lietraining.csv contain five attributes collected from a sample of \(n=120\) participants:

- pid: Participant ID
- age: Age (in years) of participant
- trained: Whether participants were given instructional material used by the Police Force to train detectives to pick up on dishonesty during interrogations via various verbal and non-verbal cues (yes = y / no = n)
- audiovideo: Audio-video recording condition - either audio+video for those participants shown recordings in audio and video, or audio-only for those participants presented with only the audio track
- points: Points scored for identifying a truth or a lie (range: 0-100)
Preview
The first six rows of the data are:

pid | age | trained | audiovideo | points |
---|---|---|---|---|
ppt_1 | 26 | y | audio+video | 36.23952 |
ppt_2 | 22 | y | audio+video | 38.46473 |
ppt_3 | 18 | y | audio+video | 34.19058 |
ppt_4 | 22 | y | audio+video | 52.62804 |
ppt_5 | 21 | y | audio+video | 38.89564 |
ppt_6 | 27 | y | audio+video | 37.46595 |
Setup
- Create a new RMarkdown file
- Load the required package(s)
- Read the lietraining dataset into R, assigning it to an object named liedat (a minimal sketch of this setup follows the list)
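A minimal setup chunk might look like this (reading the data directly from the course URL; the exact chunk contents are a sketch):

#load packages
library(tidyverse)
library(psych)
library(kableExtra)
library(emmeans)

#read in the data, assigning it to an object named liedat
liedat <- read_csv("https://uoepsy.github.io/data/lietraining.csv")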
Analysis Code
Try to answer the research question above without referring to the provided analysis code below, and then check how your script matches up - is there anything you missed or did differently? If so, discuss the differences with a tutor - there are lots of ways to code the same solution!
######Step 1 is always to read in the data, then to explore, check, describe, and visualise it.
#check coding of variables - are they coded as they should be?
str(liedat)
spc_tbl_ [120 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ pid : chr [1:120] "ppt_1" "ppt_2" "ppt_3" "ppt_4" ...
$ age : num [1:120] 26 22 18 22 21 27 27 26 25 22 ...
$ trained : chr [1:120] "y" "y" "y" "y" ...
$ audiovideo: chr [1:120] "audio+video" "audio+video" "audio+video" "audio+video" ...
$ points : num [1:120] 36.2 38.5 34.2 52.6 38.9 ...
- attr(*, "spec")=
.. cols(
.. pid = col_character(),
.. age = col_double(),
.. trained = col_character(),
.. audiovideo = col_character(),
.. points = col_double()
.. )
- attr(*, "problems")=<externalptr>
head(liedat)
# A tibble: 6 × 5
pid age trained audiovideo points
<chr> <dbl> <chr> <chr> <dbl>
1 ppt_1 26 y audio+video 36.2
2 ppt_2 22 y audio+video 38.5
3 ppt_3 18 y audio+video 34.2
4 ppt_4 22 y audio+video 52.6
5 ppt_5 21 y audio+video 38.9
6 ppt_6 27 y audio+video 37.5
#make variables factors & label
liedat$audiovideo <- factor(liedat$audiovideo, labels=c("audio","audio+video"))
liedat$trained <- factor(liedat$trained, labels = c("untrained","trained"))
#create descriptives table
descript <- liedat %>%
group_by(audiovideo, trained) %>%
summarise(
meanpoints = round(mean(points), 2),
se = round(sd(points)/sqrt(n()), 2)
)
`summarise()` has grouped output by 'audiovideo'. You can override using the
`.groups` argument.
descript
# A tibble: 4 × 4
# Groups: audiovideo [2]
audiovideo trained meanpoints se
<fct> <fct> <dbl> <dbl>
1 audio untrained 52.8 1.27
2 audio trained 53.7 1.55
3 audio+video untrained 49.9 1.01
4 audio+video trained 40.4 1.12
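#the descriptives table could be rendered for the report with kableExtra
#(a sketch - the column names and caption below are illustrative)
descript %>%
  kbl(col.names = c("Mode", "Training", "Mean Points", "SE"),
      caption = "Descriptive Statistics") %>%
  kable_styling()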
#boxplot
p0 <- ggplot(data = liedat, aes(x = audiovideo, y = points, color = trained)) +
geom_boxplot() +
ylim(0,100) +
labs(x = "Communication Condition", y = "Accuracy Score")
p0
#plot showing the mean points for each condition
p1 <- ggplot(descript, aes(x = audiovideo, y = meanpoints, color = trained)) +
geom_point(size = 3) +
geom_linerange(aes(ymin = meanpoints - 2 * se, ymax = meanpoints + 2 * se)) +
geom_path(aes(x = as.numeric(audiovideo)))
p1
######Step 2 is to run your model(s) of interest to answer your research question, and make sure that the data meet the assumptions of your chosen test
#build model
lie_mdl <- lm(points ~ audiovideo * trained, data = liedat)
#check assumptions
par(mfrow=c(2,2))
plot(lie_mdl)
#model summary
summary(lie_mdl)
Call:
lm(formula = points ~ audiovideo * trained, data = liedat)
Residuals:
Min 1Q Median 3Q Max
-15.1947 -4.2134 -0.7559 4.5235 21.1755
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 52.7591 1.2576 41.953 < 2e-16 ***
audiovideoaudio+video -2.8367 1.7785 -1.595 0.113
trainedtrained 0.9759 1.7785 0.549 0.584
audiovideoaudio+video:trainedtrained -10.4830 2.5152 -4.168 5.95e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.888 on 116 degrees of freedom
Multiple R-squared: 0.3768, Adjusted R-squared: 0.3607
F-statistic: 23.38 on 3 and 116 DF, p-value: 6.591e-12
anova(lie_mdl)
Analysis of Variance Table
Response: points
Df Sum Sq Mean Sq F value Pr(>F)
audiovideo 1 1957.7 1957.71 41.263 3.046e-09 ***
trained 1 545.8 545.85 11.505 0.00095 ***
audiovideo:trained 1 824.2 824.20 17.372 5.949e-05 ***
Residuals 116 5503.6 47.45
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#int model plot
plt_mdl <- emmip(lie_mdl, trained~audiovideo,
CIs = TRUE,
xlab = "Mode of Communication",
ylab = "Predicted Accuracy Scores",
title = "Interaction Model")
plt_mdl
#####Step 3 somewhat depends on the outcomes of step 2. Here, you may need to consider conducting further analyses before writing up / describing your results in relation to the research question.
#Perform a pairwise comparison of the mean accuracy (as measured by points accrued) across the 2×2 factorial design, making sure to adjust for multiple comparisons.
emms_lie <- emmeans(lie_mdl, ~ audiovideo * trained)
lie_con <- contrast(emms_lie, method = "pairwise", adjust="tukey")
lie_con
 contrast                                         estimate   SE  df t.ratio p.value
 audio untrained - (audio+video untrained)           2.837 1.78 116   1.595  0.3855
 audio untrained - audio trained                     -0.976 1.78 116  -0.549  0.9467
 audio untrained - (audio+video trained)             12.344 1.78 116   6.941  <.0001
 (audio+video untrained) - audio trained             -3.813 1.78 116  -2.144  0.1456
 (audio+video untrained) - (audio+video trained)      9.507 1.78 116   5.346  <.0001
 audio trained - (audio+video trained)               13.320 1.78 116   7.489  <.0001

P value adjustment: tukey method for comparing a family of 4 estimates
# confidence intervals
confint(lie_con)
 contrast                                         estimate   SE  df lower.CL upper.CL
 audio untrained - (audio+video untrained)           2.837 1.78 116    -1.80    7.473
 audio untrained - audio trained                     -0.976 1.78 116    -5.61    3.660
 audio untrained - (audio+video trained)             12.344 1.78 116     7.71   16.980
 (audio+video untrained) - audio trained             -3.813 1.78 116    -8.45    0.823
 (audio+video untrained) - (audio+video trained)      9.507 1.78 116     4.87   14.143
 audio trained - (audio+video trained)               13.320 1.78 116     8.68   17.956

Confidence level used: 0.95
Conf-level adjustment: tukey method for comparing a family of 4 estimates
#plot
plot(lie_con)
The 3-Act Structure: Analysis Strategy, Results, & Discussion
Recall that we need to present our report in three clear sections - think of your sections like the 3 key parts of a play or story - we need to (1) provide some background and scene setting for the reader, (2) present our results in the context of the research question, and (3) present a resolution to our story - relate our findings back to the question we were asked and provide our answer.
If you need a reminder of what to include within each section, refer to Semester 1 Lab 11, and read through the ‘what to include’ sections for Analysis Strategy, Results, and Discussion.
Act I: Analysis Strategy
Attempt to draft an analysis strategy section based on the above research question and the analysis provided.
The lietraining dataset contained information on 120 participants who took part in a study concerning lie detection. Participants were each presented with 100 recordings (half were shown recordings in audio and video, and the other half audio only), and were tasked with judging whether the speaker in each recording was lying or whether they were telling the truth. Participants scored 1 point each time they correctly identified a truth or a lie, and lost 1 point whenever they mistook a lie for a truth (or vice versa). The maximum score was 100, where higher scores reflected higher levels of accuracy. Prior to taking part in the experiment, participants were given materials to read. Half of the participants in each condition were given instructional material used by the Police Force (used to train detectives to pick up on dishonesty during interrogations via various verbal and non-verbal cues) and the remaining 30 participants in each condition were given a series of cartoon strips to read.
All participant data were complete, and accuracy scores (points) were within range (i.e., 0-100). Categorical variables were coded as factors, where audio was designated as the reference level for mode of communication, and untrained as the reference level for training materials.
To investigate whether police training materials (trained vs untrained) and the mode of communication (audio vs audiovideo) interacted to influence the accuracy of veracity judgements, a two-way ANOVA model was used. Effects were considered statistically significant at \(\alpha = 0.05\). Using dummy coding, the following model specification was used:
\[ \begin{aligned} \text{Accuracy Scores} &= \beta_0 \\ &+ \beta_1 A_\text{AudioVideo} + \beta_2 T_\text{Trained} \\ &+ \beta_3 (A_\text{AudioVideo} * T_\text{Trained}) \\ &+ \epsilon \end{aligned} \] To address the research question of whether the interaction between training materials and mode of communication was statistically significant, this formally corresponded to testing whether the interaction coefficient was equal to zero:
\[ H_0: \beta_3 = 0 \]
\[ H_1: \beta_3 \neq 0 \] The following assumptions were assessed visually using diagnostic plots: independence (via the residuals vs fitted plot and a plot of residuals vs index; no dependence should be indicated), equal variances (via a scale-location plot; residuals should be evenly spread across the range of fitted values, and the spread should be constant across that range), and normality (via a QQ plot of the residuals; points should follow along the diagonal line). We also checked whether there was any evidence of multicollinearity by checking VIF values, where values > 5 were considered to indicate moderate multicollinearity, and values > 10 severe. Outliers were assessed via Cook's Distance, where values > 2 indicated influential points.
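These multicollinearity and influence checks are not shown in the Analysis Code above; a minimal sketch of how they could be run is given below (note that the car package used for the VIFs is an assumption here - it is not in this lab's required package list):

#Cook's Distance for each observation, flagging values above the cut-off described above
which(cooks.distance(lie_mdl) > 2)

#VIF/GVIF values for the model terms; with an interaction in the model,
#some inflation between the main effects and the interaction is expected
car::vif(lie_mdl)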
Act II: Results
Attempt to draft a results section based on your detailed analysis strategy and the analysis provided.
Descriptive statistics are displayed in Table 1.
audiovideo | trained | mean_points | se |
---|---|---|---|
audio | untrained | 52.76 | 1.27 |
audio | trained | 53.74 | 1.55 |
audio+video | untrained | 49.92 | 1.01 |
audio+video | trained | 40.42 | 1.12 |
In the audio condition, there did not appear to be a difference between the scores of trained and untrained participants. However, untrained participants scored higher than trained participants in the audio+video condition. This suggested that there may be an interaction (see Figure 1).
Accuracy of veracity judgements (measured by points scored in the lie-detecting game) was analysed with a 2 (audio vs audiovideo) \(\times\) 2 (untrained vs trained) between-subjects ANOVA.
The model met assumptions of linearity and independence (see top left panel of Figure 2; residuals were randomly scattered with a mean of zero and there was no clear dependence), homoscedasticity (see bottom left panel of Figure 2; there was a constant spread of residuals), and normality (see top right panel of Figure 2; the QQplot showed very little deviation from the diagonal line).
There was a significant interaction between presentation mode and whether or not participants had received training for detecting lies, \(F(1, 116) = 17.37, p < .001\) (see Table 2).
Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
---|---|---|---|---|---|
audiovideo | 1 | 1957.71 | 1957.71 | 41.26 | 0.00e+00 |
trained | 1 | 545.85 | 545.85 | 11.50 | 9.50e-04 |
audiovideo:trained | 1 | 824.20 | 824.20 | 17.37 | 5.95e-05 |
Residuals | 116 | 5503.63 | 47.45 | NA | NA |
As displayed in Figure 3, results suggested that points did not differ significantly between trained and untrained participants in the audio condition, but that they did differ significantly in the audio+video condition, where untrained participants had higher accuracy scores than trained participants.
To explore the interaction further, pairwise comparisons were conducted. Tukey’s Honestly Significant Difference comparisons (see Figure 4) indicated that, contrary to what one might expect, participants who were presented with audiovisual recordings scored on average 9.5 points lower when they had read the police training materials compared to when they had received no training (95% CI [4.87, 14.14]). The presentation mode (audio vs audio-video) was not found to result in a significantly different average score for those who were untrained (95% CI [-1.80, 7.47]), and nor did training appear to have any effect on detecting lies in the audio-only condition (95% CI [-5.61, 3.66]).
Act III: Discussion
Attempt to draft a discussion section based on your results and the analysis provided.
The findings indicated that, in general, people were inaccurate when trying to distinguish between a lie and the truth, as the overall mean score was 49.21 out of 100 (SD = 8.61), where a series of completely random guesses would be expected to score 50/100. Our results led us to reject the null hypothesis that the interaction coefficient was equal to zero, as the results indicated that police training materials and mode of communication did interact to influence the accuracy of veracity judgements. The direction of this association was surprising - trained participants performed more poorly (compared to untrained) in the audio-visual condition. This may indicate that the training materials focus too heavily on visual cues (since there was no difference between police training conditions in the audio-only communication condition), which perhaps are not actually associated with dishonesty in the appropriate way.
Section B: Weeks 1-5 Recap
In the second part of the lab, there is no new content - the purpose of the recap section is for you to revisit and revise the concepts you have learned over the last 4/5 weeks.
Before you expand each of the boxes below, think about how comfortable you feel with each concept.
As in simple linear regression, the \(F\)-ratio is used to test the null hypothesis that all regression slopes are zero.
It is called the \(F\)-ratio because it is the ratio of how much of the variation is explained by the model (per parameter) versus how much of the variation is unexplained (per remaining degrees of freedom).
\[ F_{df_{model},df_{residual}} = \frac{MS_{Model}}{MS_{Residual}} = \frac{SS_{Model}/df_{Model}}{SS_{Residual}/df_{Residual}} \\ \quad \\ \]
\[ \begin{align} & \text{Where:} \\ & df_{model} = k \\ & df_{residual} = n-k-1 \\ & n = \text{sample size} \\ & k = \text{number of explanatory variables} \\ \end{align} \]
In R, this is shown at the bottom of the output of summary(<modelname>).

The \(F\)-ratio you see at the bottom of summary(model) is actually a comparison between two models: your model (with some explanatory variables in predicting \(y\)) and the null model.
In regression, the null model can be thought of as the model in which all explanatory variables have zero regression coefficients. It is also referred to as the intercept-only model, because if all predictor variable coefficients are zero, then we are only estimating \(y\) via an intercept (which will be the mean, \(\bar y\)).
Alongside viewing the \(F\)-ratio, you can see the results from testing the null hypothesis that all of the coefficients are \(0\) (the alternative hypothesis is that at least one coefficient is \(\neq 0\)). Under the null hypothesis that all coefficients are 0, the ratio of explained to unexplained variance should be approximately 1.
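As a sketch of this idea using the model from Section A (the lie_null object name is illustrative), the \(F\)-statistic reported at the bottom of summary(lie_mdl) is identical to the one obtained from an explicit comparison against the intercept-only model:

#intercept-only (null) model: estimates only the mean of points
lie_null <- lm(points ~ 1, data = liedat)
#full model from Section A
lie_mdl <- lm(points ~ audiovideo * trained, data = liedat)
#the F-test here matches the F-statistic at the bottom of summary(lie_mdl)
anova(lie_null, lie_mdl)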
Nested Models
Consider that you have two regression models, where Model 1 contains a subset of the predictors contained in Model 2, and both models are fitted to the same data. More simply, Model 2 contains all of the predictors included in Model 1, plus additional predictor(s). This means that Model 1 is nested within Model 2, or that Model 1 is a submodel of Model 2. These two terms, at least in this setting, are interchangeable - it might be easier to think of Model 1 as your null and Model 2 as your alternative.
Non-Nested Models
Consider that you have two regression models where Model 1 contains different variables to those contained in Model 2, where both models are fitted to the same data. More simply, Model 1 and Model 2 contain unique variables that are not shared. This means that Model 1 and Model 2 are not nested.
If (and only if) two models are nested can we compare them using an incremental F-test.
This is a formal test of whether the additional predictors provide a better fitting model.
Formally this is the test of:
- \(H_0:\) coefficients for the added/omitted variables are all zero.
- \(H_1:\) at least one of the added/omitted variables has a coefficient that is not zero.
The \(F\)-ratio for comparing the residual sums of squares between two models can be calculated as:
\[ F_{(df_R-df_F),df_F} = \frac{(SSR_R-SSR_F)/(df_R-df_F)}{SSR_F / df_F} \\ \quad \\ \] \[ \begin{align} & \text{Where:} \\ \\ & SSR_R = \text{residual sums of squares for the restricted model} \\ & SSR_F = \text{residual sums of squares for the full model} \\ & df_R = \text{residual degrees of freedom from the restricted model} \\ & df_F = \text{residual degrees of freedom from the full model} \\ \end{align} \]
Remember that you want your models to be parsimonious, or in other words, only as complex as they need to be in order to describe the data well. This means that you need to be able to justify your model choice, and one way to do so is by comparing models via anova(). If your model with multiple IVs does not provide a significantly better fit to your data than a simpler model with fewer IVs, then the simpler model should be preferred.
In R, we can conduct an incremental \(F\)-test by constructing two linear regression models, and passing them to the anova()
function:
anova(model1, model2)
If the \(p\)-value is sufficiently low (i.e., below your predetermined significance level - usually .05), then you would conclude that Model 2 fits significantly better than Model 1. If \(p\) is not < .05, then you should prefer the simpler model.
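For example, using the lab data (model names here are purely illustrative), Model 1 is nested within Model 2, so an incremental \(F\)-test is appropriate:

#Model 1: mode of communication only
mdl1 <- lm(points ~ audiovideo, data = liedat)
#Model 2: Model 1's predictor plus training condition
mdl2 <- lm(points ~ audiovideo + trained, data = liedat)
#incremental F-test: does adding 'trained' significantly improve fit?
anova(mdl1, mdl2)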
AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) combine information about the sample size, the number of model parameters, and the residual sums of squares (\(SS_{residual}\)). Models do not need to be nested to be compared via AIC and BIC, but they need to have been fit to the same dataset.
For both of these fit indices, lower values are better, and both include a penalty for the number of predictors in the model (although BIC’s penalty is harsher):
\[ AIC = n\,\text{ln}\left( \frac{SS_{residual}}{n} \right) + 2k \\ \quad \\ BIC = n\,\text{ln}\left( \frac{SS_{residual}}{n} \right) + k\,\text{ln}(n) \\ \quad \\ \] \[ \begin{align} & \text{Where:} \\ & SS_{residual} = \text{sum of squares residuals} \\ & n = \text{sample size} \\ & k = \text{number of explanatory variables} \\ & \text{ln} = \text{natural log function} \end{align} \]
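As an illustration, two non-nested models fitted to the same lietraining data can be compared with AIC() and BIC() (model names are illustrative; lower values indicate better fit):

#two non-nested models - neither contains the other's predictor
mod_av <- lm(points ~ audiovideo, data = liedat)
mod_tr <- lm(points ~ trained, data = liedat)
#information criteria for each model; prefer the model with the lower value
AIC(mod_av, mod_tr)
BIC(mod_av, mod_tr)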
Possible side-constraints on the parameters are:
Name | Constraint | Meaning of \(\beta_0\) | R |
---|---|---|---|
Sum to zero (Effects Coding) | \(\beta_1 + \beta_2 + \beta_3 = 0\) | \(\beta_0 = \mu\) | contr.sum |
Reference group (Dummy Coding) | \(\beta_1 = 0\) | \(\beta_0 = \mu_1\) | contr.treatment |
IMPORTANT

By default, R uses the reference group constraint (dummy coding). If your factor has \(g\) levels, your regression model will have \(g-1\) dummy variables (R creates them for you).

If you have changed the coding scheme (e.g., to effects coding), you can switch back to the default reference group constraint by applying either of these:
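A minimal sketch of the two usual approaches (the trained factor from this lab is used purely for illustration):

#option 1: set the contrasts for one specific factor
contrasts(liedat$trained) <- "contr.treatment"
#option 2: set the global default for all unordered (and ordered) factors
options(contrasts = c("contr.treatment", "contr.poly"))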
If you instead want to define your own planned comparisons (contrasts), the weights you assign must follow these rules (a worked example follows the list):

- Rule 1: Weights range between -1 and 1 (\(-1 \leq x \leq 1\))
- Rule 2: The group(s) in one chunk are given negative weights, the group(s) in the other chunk get positive weights
- Rule 3: The sum of the weights of the comparison must be 0
- Rule 4: If a group is not involved in the comparison, its weight is 0
- Rule 5: For a given comparison, the weight assigned to each group is equal to 1 divided by the number of groups in that chunk
- Rule 6: Restrict yourself to running \(k - 1\) comparisons (where \(k\) = number of groups)
- Rule 7: Each contrast can only compare 2 chunks of variance
- Rule 8: Once a group is singled out, it cannot enter other contrasts
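As a sketch of a single planned contrast following these rules, consider comparing the audio chunk against the audio+video chunk in the Section A model. The weight ordering below assumes the four cells appear in the order shown in the earlier emmeans output (audio untrained, audio+video untrained, audio trained, audio+video trained):

#the audio chunk (2 groups) gets weights of +1/2, the audio+video chunk gets -1/2;
#weights sum to zero and each equals 1 / (number of groups in its chunk)
emms_lie <- emmeans(lie_mdl, ~ audiovideo * trained)
contrast(emms_lie, method = list("audio vs audio+video" = c(1/2, -1/2, 1/2, -1/2)))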
If we are conducting all possible pairwise comparisons, we can calculate how many tests are being conducted via the following rule:
\[ _nC_r = \frac{n!}{r!(n-r)!} \\ \] \[ \begin{align} \\ & \text{Where:} \\ & n = \text{total number in the set} \\ & r = \text{number chosen} \\ & _nC_r = \text{number of combinations of r from n} \\ \end{align} \]
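For the 2x2 design in Section A there are 4 cell means, so the number of possible pairwise comparisons is \({}_4C_2 = \frac{4!}{2!(4-2)!} = 6\), matching the six rows of the earlier emmeans pairwise output. In R:

#number of pairwise comparisons among 4 group means
choose(4, 2)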
So, why does the number of tests matter? First, think back to “Type 1 errors” from DAPR1 - when we conduct a hypothesis test and set \(\alpha = .05\), we will reject the null hypothesis \(H_0\) when we find \(p < .05\). Now remember what a \(p\)-value represents - it is the chance of observing a statistic at least as extreme as the one we have, assuming the null hypothesis to be true. This means that if \(H_0\) is true, we will still observe \(p < .05\) 5% of the time. So our chance of making this error is equal to the threshold (\(\alpha\)) below which a \(p\)-value leads us to reject \(H_0\).
But this error-rate applies to each statistical hypothesis we test. So if we conduct an experiment in which we plan on conducting lots of tests of different comparisons, the chance of an error being made increases substantially. Across the family of tests performed that chance will be much higher than 5%.1
Each test conducted at \(\alpha = .05\) has a .05 (or 5%) probability of a Type I error (wrongly rejecting the null hypothesis). If we conduct 9 tests, the experimentwise error rate is \(\alpha_{ew} \leq 9 \times .05\), where 9 is the number of comparisons made as part of the experiment.
Thus, if nine independent comparisons were made at the \(\alpha = 0.05\) level, the experimentwise Type I error rate \(\alpha_{ew}\) would be at most \(9 \times 0.05 = 0.45\). That is, we could wrongly reject at least one null hypothesis in up to 45 out of 100 such experiments. To complicate matters further, many of the tests in a family are not independent (see the lecture slides for the calculation of the error rate for dependent tests).
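For comparison, when the nine tests are independent the exact experimentwise error rate can be computed directly (a quick sketch in R):

#probability of at least one Type I error across 9 independent tests at alpha = .05
1 - (1 - 0.05)^9
#approximately 0.37, which sits below the Bonferroni upper bound of 0.45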
Bonferroni
- Use Bonferroni’s method when you are interested in a small number of planned contrasts (or pairwise comparisons).
- Bonferroni’s method is to divide alpha by the number of tests/confidence intervals.
- Assumes that all comparisons are independent of one another.
- It sacrifices slightly more power than Tukey’s method (discussed below), but it can be applied to any set of contrasts or linear combinations (i.e., it is useful in more situations than Tukey).
- It is usually better than Tukey if we want to do a small number of planned comparisons.
Šídák
- (A bit) more powerful than the Bonferroni method.
- Assumes that all comparisons are independent of one another.
- Less common than Bonferroni method, largely because it is more difficult to calculate (not a problem now we have computers).
Tukey
- It specifies an exact family significance level for comparing all pairs of treatment means.
- Use Tukey’s method when you are interested in all (or most) pairwise comparisons of means.
Scheffé
- It is the most conservative (least powerful) of all tests.
- It controls the family alpha level for testing all possible contrasts.
- It should be used if you have not planned contrasts in advance.
- For testing pairs of treatment means it is too conservative (you should use Bonferroni or Šídák).
In R, you can easily change which correction you are using via the adjust =
argument.
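For example, the Tukey-adjusted pairwise comparisons from Section A could be re-run with a Bonferroni adjustment by changing a single argument (this assumes the emms_lie object created earlier):

#same pairwise comparisons as before, but with a Bonferroni adjustment
contrast(emms_lie, method = "pairwise", adjust = "bonferroni")
#other available options include "sidak", "scheffe", "tukey", and "none"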
- A parameter is a numerical summary for the population, e.g. the population slope \(\beta_1\)
- A statistic is a numerical summary calculated from the sample data, e.g. the estimated slope in the sample \(\widehat \beta_1\). We use the sample statistic as a best guess, or estimate, for the unknown population parameter.
- A bootstrap sample is chosen with replacement from an existing sample, using the same sample size.
- A bootstrap statistic is a statistic computed for each bootstrap sample.
- A bootstrap distribution collects bootstrap statistics for many bootstrap samples.
The bootstrap is a general approach to assessing whether the sample results are statistically significant or not, and allows us to draw inferences to the population from a regression model. This method is assumption-free and does not rely on conditions such as normality of the residuals.
It is based on sampling repeatedly with replacement (to avoid always getting the original sample exactly) from the data at hand, and then computing the regression coefficients from each re-sample. We will equivalently use the word “bootstrap sample” or “resample” (for sample with replacement).
The basic principle is:
The population is to the original sample
as
the original sample is to the bootstrap samples.
Because we only have one sample of size \(n\), and we do not have access to the data for the entire population, we consider our original sample as our best approximation to the population.
To be more precise, we assume that the population is made up of many, many copies of our original sample. Then, we take multiple samples each of size \(n\) from this assumed population. This is equivalent to sampling with replacement from the original sample.
Follow these steps (a sketch of the full workflow is given after the list):

- 1: Load the car library.
- 2: Use the Boot() function (do not forget the uppercase B!), which takes as arguments:
  - the fitted model;
  - f, saying which bootstrap statistics to compute on each bootstrap sample. By default f = coef, returning the regression coefficients;
  - R, saying how many bootstrap samples to compute. By default R = 999, but this could be any number. To experiment we recommend 1000; when you want to produce results for journals, it is typical to go with 10,000 or more;
  - ncores, saying whether to perform the calculations in parallel (and more efficiently). This will depend on your PC - you can find how many cores you have by running parallel::detectCores(). By default the function uses ncores = 1.
- 3: Run the code. However, please remember that the Boot() function does not want a model which was fitted using data with NAs. To remove these, you could, for example, use na.omit.
- 4: Look at the summary() of the bootstrap results. The output will show, for each regression coefficient, the value in the original sample in the original column, and in the bootSE column, an estimate of the variability of the coefficient from bootstrap sample to bootstrap sample. The bootSE is the bootstrap standard error (bootstrap SE in short), which we can use to answer the key question of how accurate our estimate is.
- 5: Compute confidence intervals. Use your preferred confidence level (usually, and by default, 95%).
- 6: Provide interpretation in the context of your research question and report results in APA format.
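A minimal sketch of these steps applied to the lie_mdl model from Section A (object names are illustrative; R = 1000 is used here just as an example):

#step 1: load the car library
library(car)
#steps 2-3: bootstrap the regression coefficients (f = coef is the default)
boot_lie <- Boot(lie_mdl, f = coef, R = 1000)
#step 4: original estimates and their bootstrap standard errors (bootSE column)
summary(boot_lie)
#step 5: 95% percentile bootstrap confidence intervals for each coefficient
Confint(boot_lie, level = 0.95, type = "perc")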
Footnotes
What defines a ‘family’ of tests is debatable.