Manual contrast interactions and multiple comparisons


Data Analysis for Psychology in R 2

Elizabeth Pankratz (elizabeth.pankratz@ed.ac.uk)


Department of Psychology
University of Edinburgh
2025–2026

Course Overview


Introduction to linear Models Intro to linear regression
Interpreting linear models
Testing individual predictors
Model testing & comparison
Linear model analysis
Analysing Experimental Studies Categorical predictors and dummy coding
Effect coding and manual post-hoc contrasts
Assumptions and diagnostics
Bootstrapping and confidence intervals
Categorical predictors: Practice analysis
Interactions Mean-centering and numeric/categorical interactions
Numeric/numeric interactions
Categorical/categorical interactions
Manual contrast interactions and multiple comparisons
Interactions: Practice analysis
Advanced Topics Power analysis
Binary logistic regression I
Binary logistic regression II
Logistic regression: Practice analysis
Exam prep and course Q&A

This week’s learning objectives


How do we test an interaction between two manually-coded post-hoc contrasts?

What’s the main danger of running multiple statistical tests on the same data?

What are two common ways that we can correct for running multiple statistical tests on the same data?

Where we are in the analysis workflow

Interactions between manual post-hoc contrasts

Today’s data

The subjective wellbeing (SWB) of patients at two different hospitals:

Hosp1:

Hosp2:

who have undergone one of three different treatments for depression:

Cognitive Behavioural Therapy (CBT)

Eye Movement Desensitisation and Reprocessing therapy (EMDR)

Antidepressant medication
(Meds):

Visualising the data


hosp |>
  head(12)
# A tibble: 12 × 3
     SWB Treatment Hospital
   <dbl> <fct>     <fct>   
 1   6.2 Meds      Hosp1   
 2  15.9 Meds      Hosp1   
 3   7.2 Meds      Hosp1   
 4  11.3 Meds      Hosp1   
 5  11.2 Meds      Hosp1   
 6   9   Meds      Hosp1   
 7  14.5 Meds      Hosp1   
 8   7.3 Meds      Hosp1   
 9  13.7 Meds      Hosp1   
10  12.6 Meds      Hosp1   
11  13.8 Meds      Hosp1   
12  12.5 Meds      Hosp1   

What we want to know: Is the difference in SWB between talk therapy (both CBT and EMDR) and pharmaceutical therapy (Meds) different in each hospital?

  • We’ll manually chunk CBT and EMDR together in a post-hoc contrast, which we’ll call TrtmtType.
  • We’ll also specify a very simple manual post-hoc contrast for Hospital.
  • Then we’ll make a post-hoc interaction contrast to test whether the Hospital contrast interacts with the TrtmtType contrast.

Set up the model


We only care about the post-hoc contrasts, so it’s fine to fit the model using R’s default treatment coding.


Check on the reference levels:

contrasts(hosp$Treatment)
     EMDR Meds
CBT     0    0
EMDR    1    0
Meds    0    1
contrasts(hosp$Hospital)
      Hosp2
Hosp1     0
Hosp2     1


The model in R syntax:

m1 <- lm(SWB ~ Treatment * Hospital, data = hosp)


On your own time, practice writing out the model’s linear expression in mathematical terms, and check the appendix for the solution :)

Get estimated marginal means


m1_emm <- emmeans(m1, ~ Treatment * Hospital)
m1_emm
 Treatment Hospital emmean   SE  df lower.CL upper.CL
 CBT       Hosp1     10.10 0.37 174     9.37    10.83
 EMDR      Hosp1      9.43 0.37 174     8.70    10.16
 Meds      Hosp1     10.80 0.37 174    10.07    11.53
 CBT       Hosp2      7.98 0.37 174     7.25     8.71
 EMDR      Hosp2     13.12 0.37 174    12.39    13.85
 Meds      Hosp2      7.85 0.37 174     7.12     8.58

Confidence level used: 0.95 

Defining post-hoc contrasts

Define post-hoc contrast for TrtmtType

Step 1: “Chunk” together the two group(s) that the research question is comparing.

  • Chunk 1: Meds.
  • Chunk 2: CBT, EMDR.

Step 2: Assign a 0 to any group(s) that aren’t in one of the chunks from Step 1.

Step 3: Assign a plus sign to every group in Chunk 1, and a minus sign to every group in Chunk 2.

Step 4: Count the plus signs and minus signs.

Step 5: To figure out the actual values for each cell, start with 1 and –1. Divide 1 by \(n_{plus}\), and divide –1 by \(n_{minus}\).

Step 6: In the coding matrix, replace the plus signs with the positive coding value from Step 5, and replace the minus signs with the negative coding value from Step 5. Done!

Treatment TrtmtType
CBT
EMDR
Meds


wooclap.com

code UBEQCF

Define post-hoc contrast for Hospital

Step 1: “Chunk” together the two group(s) that the research question is comparing.

  • Chunk 1: Hosp2.
  • Chunk 2: Hosp1.

Step 2: Assign a 0 to any group(s) that aren’t in one of the chunks from Step 1.

Step 3: Assign a plus sign to every group in Chunk 1, and a minus sign to every group in Chunk 2.

Step 4: Count the plus signs and minus signs.

Step 5: To figure out the actual values for each cell, start with 1 and –1. Divide 1 by \(n_{plus}\), and divide –1 by \(n_{minus}\).

Step 6: In the coding matrix, replace the plus signs with the positive coding value from Step 5, and replace the minus signs with the negative coding value from Step 5. Done!

Hospital Hosp
Hosp1
Hosp2


wooclap.com

code UBEQCF

Visualising the interaction we’ll test

We want to know: Is there a difference between hospitals in how different treatment types (talk therapy vs. pharmaceutical therapy) are associated with SWB?

H0/H1 for interactions between manual contrasts

Step 1: Define means of all combinations of factor levels

When we test differences between groups, we’re testing differences between group means.

So, we will start by defining the group means for all combinations of factor levels in our data.

Next, we’ll combine these group means in the way that the manual contrasts specify.

Step 2: “Chunk” some of the means together

Within each hospital, we chunk CBT and EMDR together, because they are both talk therapy.


When we chunk two group means together, mathematically we are taking the mean of those group means.

To represent the mean of the blue chunk (top left):

\[ \frac{ \mu_{\text{CBT,Hosp1}} + \mu_{\text{EMDR,Hosp1}} }{2} \]

To represent the mean of the green chunk (top right):

\[ \frac{ \mu_{\text{CBT,Hosp2}} + \mu_{\text{EMDR,Hosp2}} }{2} \]

Step 3: Set up H0 as differences between chunk means, based on RQ

RQ: Is the difference in SWB between talk therapy (both CBT and EMDR) and pharmaceutical therapy (Meds) different in each hospital?


Null hypothesis in words: The difference in SWB between talk therapy (both CBT and EMDR) and pharmaceutical therapy (Meds) is the same in each hospital.


In words arranged like the math:

  • (the mean SWB of CBT and EMDR in Hosp1)
  • minus
  • (the mean SWB of Meds in Hosp1)
  • equals
  • (the mean SWB of CBT and EMDR in Hosp2)
  • minus
  • (the mean SWB of Meds in Hosp2)

In math arranged like the words:

  • \(\frac{ \mu_{\text{CBT,Hosp1}} + \mu_{\text{EMDR,Hosp1}} }{2}\)
  • \(\mu_{\text{Meds,Hosp1}}\)
  • =
  • \(\frac{ \mu_{\text{CBT,Hosp2}} + \mu_{\text{EMDR,Hosp2}} }{2}\)
  • \(\mu_{\text{Meds,Hosp2}}\)

Formal H0 and H1 for interactions between manual contrasts

Version 1: Differences are the same (H0) or not the same (H1):

\[\begin{align} H_0 &: \frac{ \mu_{\text{CBT,Hosp1}} + \mu_{\text{EMDR,Hosp1}} }{2} - \mu_{\text{Meds,Hosp1}} = \frac{ \mu_{\text{CBT,Hosp2}} + \mu_{\text{EMDR,Hosp2}} }{2} - \mu_{\text{Meds,Hosp2}} \\ H_1 &: \frac{ \mu_{\text{CBT,Hosp1}} + \mu_{\text{EMDR,Hosp1}} }{2} - \mu_{\text{Meds,Hosp1}} \neq \frac{ \mu_{\text{CBT,Hosp2}} + \mu_{\text{EMDR,Hosp2}} }{2} - \mu_{\text{Meds,Hosp2}} \end{align}\]


Version 2, also fine: Difference between differences is zero (H0) or not zero (H1):

\[\begin{align} H_0 &: \Big( \frac{ \mu_{\text{CBT,Hosp1}} + \mu_{\text{EMDR,Hosp1}} }{2} - \mu_{\text{Meds,Hosp1}} \Big) - \Big( \frac{ \mu_{\text{CBT,Hosp2}} + \mu_{\text{EMDR,Hosp2}} }{2} - \mu_{\text{Meds,Hosp2}} \Big) = 0 \\ H_1 &: \Big( \frac{ \mu_{\text{CBT,Hosp1}} + \mu_{\text{EMDR,Hosp1}} }{2} - \mu_{\text{Meds,Hosp1}} \Big) - \Big( \frac{ \mu_{\text{CBT,Hosp2}} + \mu_{\text{EMDR,Hosp2}} }{2} - \mu_{\text{Meds,Hosp2}} \Big) \neq 0 \\ \end{align}\]




(LaTeX code in appendix)

Post-hoc contrasts in R

Post-hoc contrasts in R


Treatment TrtmtType
CBT –1/2
EMDR –1/2
Meds 1
Hospital Hosp
Hosp1 –1
Hosp2 1


TrtmtType_contr <- c(
  'CBT'  = -1/2,
  'EMDR' = -1/2,
  'Meds' =    1
)

Hosp_contr <- c(
  'Hosp1' = -1,
  'Hosp2' =  1
)

The interaction’s contrast is the product of the two interacting contrasts


In other words: To get the interaction’s contrast, we multiply together the two contrasts for the interacting predictors.


Treatment Hospital TrtmtType Hosp TrtmtType:Hosp
CBT Hosp1 –0.5 –1
EMDR Hosp1 –0.5 –1
Meds Hosp1 1 –1
CBT Hosp2 –0.5 1
EMDR Hosp2 –0.5 1
Meds Hosp2 1 1


wooclap.com, code UBEQCF

Put the interaction’s contrast into R, two ways

(1) The fancy way:

Here are the two individual manual contrasts we defined a couple slides ago:

TrtmtType_contr
 CBT EMDR Meds 
-0.5 -0.5  1.0 
Hosp_contr
Hosp1 Hosp2 
   -1     1 

Multiply them together using outer(). Write TrtmtType first because we told emmeans   ~ Treatment * Hospital, with Treatment first.

outer(TrtmtType_contr, Hosp_contr)
     Hosp1 Hosp2
CBT    0.5  -0.5
EMDR   0.5  -0.5
Meds  -1.0   1.0

Flatten this table using as.vector():

as.vector( outer(TrtmtType_contr, Hosp_contr) )
[1]  0.5  0.5 -1.0 -0.5 -0.5  1.0


Name the contrast, wrap it in a list, and assign that list to a variable called interaction_contr:

interaction_contr <- list(
  'TrtmtType:Hospital' = as.vector(outer(TrtmtType_contr, Hosp_contr))
)

Put the interaction’s contrast into R, two ways

(2) The manual way:

Check the order of items that emmeans uses:

m1_emm
 Treatment Hospital emmean   SE  df lower.CL upper.CL
 CBT       Hosp1     10.10 0.37 174     9.37    10.83
 EMDR      Hosp1      9.43 0.37 174     8.70    10.16
 Meds      Hosp1     10.80 0.37 174    10.07    11.53
 CBT       Hosp2      7.98 0.37 174     7.25     8.71
 EMDR      Hosp2     13.12 0.37 174    12.39    13.85
 Meds      Hosp2      7.85 0.37 174     7.12     8.58

Confidence level used: 0.95 


Figure out the correct coding values manually (as we did a few slides ago).

Make sure to match emmeans’ order and type up the contrast coding by hand:

interaction_contr <- list( "TrtmtType:Hospital" =
    c(
        0.5,    # CBT  Hosp1 
        0.5,    # EMDR Hosp1
         -1,    # Meds Hosp1
       -0.5,    # CBT  Hosp2
       -0.5,    # EMDR Hosp2
          1     # Meds Hosp2
    )
)

Test the interaction contrast

When we test the contrast using contrast(), we are testing the H0 that the estimate is equal to zero. For this interaction, the H0 says there’s no difference between Hospital in how TrtmtType is associated with SWB.

(m1_ixn_test <- contrast(m1_emm, interaction_contr))
 contrast           estimate    SE  df t.ratio p.value
 TrtmtType:Hospital    -3.73 0.641 174  -5.819 <0.0001


Get the associated 95% CIs using confint():

(m1_ixn_confint <- confint(m1_ixn_test))
 contrast           estimate    SE  df lower.CL upper.CL
 TrtmtType:Hospital    -3.73 0.641 174    -4.99    -2.46

Confidence level used: 0.95 


The null hypothesis:

  • There’s no difference between hospitals in how different treatment types (talk therapy vs. pharmaceutical therapy) are associated with SWB.
  • In other words, if the null hypothesis were true, the data for both hospitals would pattern the same.

Can we reject the null hypothesis?

Where does this estimate of –3.73 come from?

It’s the result of multiplying each group mean (column mean_SWB) by the corresponding value from the interaction’s contrast (column TrtmtType:Hospital), shown in the column product, and then adding all those numbers together.

Treatment Hospital mean_SWB TrtmtType:Hospital product
CBT Hosp1 10.10 0.5 5.05
EMDR Hosp1 9.43 0.5 4.72
Meds Hosp1 10.80 -1.0 -10.80
CBT Hosp2 7.98 -0.5 -3.99
EMDR Hosp2 13.12 -0.5 -6.56
Meds Hosp2 7.85 1.0 7.85


sum(hosp_manual$product) |> round(2)
[1] -3.73


Can we give this number a more intuitive explanation?

Not really tbh :( Interaction coefficients are adjustments to other effects, not effects themselves. But we don’t have any other coefficients to relate this adjustment to, so we can’t compute simple slopes or anything. And it’s not very useful to interpret this single number on its own.

The reason manual interactions are useful is essentially just the significance test.

Extending the analysis workflow

Testing simple effects

Retrieval practice: Testing simple effects


  • What’s a simple effect?
  • If I ask for simple effects of Hospital at each level of Treatment, what am I looking for?
  • What’s the difference between simple effects and simple slopes?
  • What does it mean to test a simple effect?

Testing simple effects with pairs()


We’ve seen the math of how to compute simple effects by hand.

But that only gives us the value of the effect, not a test of whether that value is significantly different from zero.


When we had continuous predictors, we could test simple slopes using probe_interactions().

Now that we have categorical predictors, we can test simple effects by giving the pairs() function the model’s emmeans object and specifing the simple= argument.


For example, to get the simple effects of Hospital at each level of Treatment:

pairs(m1_emm, simple = "Hospital")

Simple effects of Hospital at each level of Treatment

pairs(m1_emm, simple = "Hospital")
Treatment = CBT:
 contrast      estimate    SE  df t.ratio p.value
 Hosp1 - Hosp2     2.12 0.523 174   4.059 <0.0001

Treatment = EMDR:
 contrast      estimate    SE  df t.ratio p.value
 Hosp1 - Hosp2    -3.69 0.523 174  -7.047 <0.0001

Treatment = Meds:
 contrast      estimate    SE  df t.ratio p.value
 Hosp1 - Hosp2     2.95 0.523 174   5.632 <0.0001

Note on the positive/negative signs:

  • pairs() computes Hosp1 – Hosp2 because of how factor levels in the factor Hospital are ordered.
  • To compute Hosp2 – Hosp1 instead, we would have to code Hosp2 as the reference level of Hospital before fitting the model.

Simple effects of Treatment for each level of Hospital

pairs(m1_emm, simple = "Treatment")
Hospital = Hosp1:
 contrast    estimate    SE  df t.ratio p.value
 CBT - EMDR     0.673 0.523 174   1.287  0.4044
 CBT - Meds    -0.697 0.523 174  -1.332  0.3796
 EMDR - Meds   -1.370 0.523 174  -2.619  0.0259

Hospital = Hosp2:
 contrast    estimate    SE  df t.ratio p.value
 CBT - EMDR    -5.137 0.523 174  -9.819 <0.0001
 CBT - Meds     0.127 0.523 174   0.242  0.9682
 EMDR - Meds    5.263 0.523 174  10.061 <0.0001

P value adjustment: tukey method for comparing a family of 3 estimates 


What’s this “P value adjustment: tukey method” thing? We’ll find out soon, but first …

Visualising cat/cat interactions

Using cat_plot():

interactions::cat_plot(
  m1,
  pred = Hospital,
  modx = Treatment,
  geom = 'line'
)

Using emmip() (estimated marginal means interaction plot):

emmip(
  m1_emm, 
  Treatment ~ Hospital, 
  CIs = TRUE
)

The multiple comparisons problem

Let’s roll some dice

On a twenty-sided die (a d20), the probability of rolling a 1 (or any other individual number) is 1/20 = 5%.

But if we roll the d20 over and over again, the probability that we’ll roll a 1 at least once goes up and up and up.

\(.05\)

\(= 1 – 0.95\)

\(.0975\)

\(= 1 – (0.95 \times 0.95)\)

\(.1426\)

\(= 1 – (0.95 \times 0.95 \times 0.95) = 1 - 0.95^3\)

In general:

\[ P(\text{rolling a 1 in m rolls}) = 1 - (1 - 0.05)^m \]

What’s the probability that we’ll roll a 1 at least once in twenty rolls?

wooclap.com, code UBEQCF

From dice to statistical tests


Why did we spend time rolling those dice?

  • When \(\alpha = .05\),
    • we accept the risk that 5% of the time, we’ll reject the H0 when actually we shouldn’t.
    • (aka that we’ll get a false positive result)
    • (aka that we’ll make a Type I error)
  • A d20 rolling a 1 happens with the same probability as us rejecting the H0 when actually we shouldn’t: one time in twenty.
  • We saw that the probability of rolling a 1 at least once increases, the more dice you roll. It’s also true that the probability of rejecting the H0 when actually we shouldn’t will increase, the more statistical tests we run.
  • So if we run 20 statistical tests, say, then even though we only want to incorrectly reject the H0 one time out of twenty (5% of the time), we would actually reject it with much higher probability.
  • The actual level of risk becomes higher than the level of risk that we wanted.

relevant XKCD (https://xkcd.com/882/)

The more tests you run, the more likely you are to see something unlikely just due to chance


With \(\alpha = 0.05\), we accept the risk that, we will incorrectly reject the null hypothesis 5% of the time.

And if we accept that positive result and ignore all the negative results (green jelly bean style), then we run into the multiple comparisons problem: we are more than 5% likely to see at least one false positive, the more tests (aka, the more comparisons) we run.


How much more likely?

Depends on the number of tests.

\[ \begin{align} P(\text{incorrectly reject H0}) &= \alpha \\ P(\text{correctly reject H0}) &= 1 - \alpha \\ P(\text{correctly reject H0 in} ~m~ \text{tests}) &= (1 - \alpha)^m \\ P(\text{incorrectly reject H0 in} ~m~ \text{tests}) &= 1 - (1 - \alpha)^m \\ \end{align} \]

How likely are we to incorrectly reject H0?


For the green jelly beans scenario:

\[ \begin{align} P(\text{incorrectly reject H0 in 20 tests}) &= 1 - (1 - 0.05)^{20} \\ &= 1 - (0.95)^{20} \\ &= 1 - 0.36 \\ &= 0.64 \\ \end{align} \]

We accepted the risk of incorrectly rejecting the null hypothesis 5% of the time. But if we run 20 tests and accept any positive result, we will risk incorrectly rejecting H0 64% of the time!


How do we fix it?

Lower the \(\alpha\) level (aka our accepted level of risk, aka the false positive rate, aka the Type I error rate) to compensate for the increased probability of incorrectly rejecting H0.

Common corrections


  • Tukey
  • Bonferroni
  • Šídák
  • Scheffe
  • Holm’s step-down
  • Hochberg’s step-up


You only need to know about Tukey and Bonferroni, the two most common.


We’ll start with Bonferroni, because it’s simplest.

The Bonferroni correction


The goal is to lower the \(\alpha\) level of our tests so that it matches our desired level of risk.

The Bonferroni correction takes our original \(\alpha\) level and divides it by the number of tests:

\[ \alpha_{\text{Bonferroni}} = \frac{\alpha}{\text{number of tests}} \]


For the jelly bean scenario, with 20 tests:

0.05 / 20
[1] 0.0025


For our simple effects of Treatment by Hospital, with three tests:

0.05 / 3
[1] 0.01666667

Bonferroni in action


pairs(m1_emm, simple = 'Treatment', adjust = "bonferroni")
Hospital = Hosp1:
 contrast    estimate    SE  df t.ratio p.value
 CBT - EMDR     0.673 0.523 174   1.287  0.5993
 CBT - Meds    -0.697 0.523 174  -1.332  0.5541
 EMDR - Meds   -1.370 0.523 174  -2.619  0.0288

Hospital = Hosp2:
 contrast    estimate    SE  df t.ratio p.value
 CBT - EMDR    -5.137 0.523 174  -9.819 <0.0001
 CBT - Meds     0.127 0.523 174   0.242  1.0000
 EMDR - Meds    5.263 0.523 174  10.061 <0.0001

P value adjustment: bonferroni method for 3 tests 

Tukey’s Honest Significant Differences (HSD)


Think back to independent-samples t-tests in DAPR1:

  • This kind of t-test compares whether two group means are significantly different from one another.
  • The difference between group means is represented as a t-statistic.
  • The t-statistic is compared to a critical value from the t-distribution to decide whether the difference is significant.

At heart, Tukey’s HSD is very similar!

  • Tukey’s HSD compares the means of the members of one pair in a pairwise comparison.
  • The difference between group means is represented as a q-statistic.
  • The q-statistic is compared to a critical value from the studentised range distribution to decide whether the difference is significant.


You do not need to know how to calculate this adjustment by hand!

If you want to know, here’s a YouTube video that walks through the calculation. (The main action is between 3:41–7:42.)

Tukey in action


pairs(m1_emm, simple = 'Treatment', adjust = "tukey")
Hospital = Hosp1:
 contrast    estimate    SE  df t.ratio p.value
 CBT - EMDR     0.673 0.523 174   1.287  0.4044
 CBT - Meds    -0.697 0.523 174  -1.332  0.3796
 EMDR - Meds   -1.370 0.523 174  -2.619  0.0259

Hospital = Hosp2:
 contrast    estimate    SE  df t.ratio p.value
 CBT - EMDR    -5.137 0.523 174  -9.819 <0.0001
 CBT - Meds     0.127 0.523 174   0.242  0.9682
 EMDR - Meds    5.263 0.523 174  10.061 <0.0001

P value adjustment: tukey method for comparing a family of 3 estimates 

When to choose Bonferroni vs. Tukey


When to choose Bonferroni:

  • When you’re interested in a small number of comparisons.
  • When you’ve got a set of p-values that come from the same data, regardless of what kind of test generated them.


When to choose Tukey:

  • When you want to compare the means of levels of a categorical predictor.
  • When you’re interested in many/all pairwise comparisons between groups.


Check the flashcards for more details.

Analysis workflow: The final form!

Back matter

Revisiting this week’s learning objectives

How do we test an interaction between two manually-coded post-hoc contrasts?

  • Manually code the interaction between them, using the same procedure that we saw for treatment-coded predictors last week:
    • get all combinations of levels, and
    • multiply each pair of coding values together.
    • The result is the contrast coding of the interaction.
  • Test that contrast using our emmeans() toolkit.

What’s the main danger of running multiple statistical tests on the same data?

  • We’ll end up rejecting the H0 at least once in all those tests with a probability much higher than our desired \(\alpha\) level.
  • This phenomenon is called the “multiple comparisons problem”.
  • The solution: Make the \(\alpha\) level smaller, to compensate for this increased likelihood of rejecting the H0.

What are two common ways that we can correct for running multiple statistical tests on the same data?

  • Bonferroni correction (a global adjustment to all \(\alpha\)s).
  • Tukey’s HSD (an adjustment to each comparison’s \(\alpha\) based on how different the two group means being compared are).

This week


Tasks:


Attend your lab and work together on the exercises

Support:


Help each other on the Piazza forum


Complete the weekly quiz

Attend office hours (see Learn page for details)

Appendix

Code for data plot

hosp |>
  ggplot(aes(x = Treatment, y = SWB, colour = Treatment, fill = Treatment)) +
  geom_violin(alpha = 0.5) +
  geom_jitter(alpha = 0.5, size = 5) +
  facet_wrap(~ Hospital) +
  stat_summary(fun = mean, geom = 'point', colour = 'black', size = 8, show.legend = F) +
  theme(legend.position = 'none') +
  NULL

Code for visualising manual chunks

hosp |>
  mutate(
    TrtmtType = ifelse(Treatment == 'Meds', 'PharmaTherapy', 'TalkTherapy')
  ) |>
  ggplot(aes(x = TrtmtType, y = SWB)) +
  geom_violin() +
  geom_jitter(aes(colour = Treatment), alpha = 0.5, size = 5) +
  facet_wrap(~ Hospital) +
  NULL

H0 and H1 LaTeX code

Version 1: Differences are the same (H0) or not the same (H1):

\[\begin{align} H_0 &: \frac{ \mu_{\text{CBT,Hosp1}} + \mu_{\text{EMDR,Hosp1}} }{2} - \mu_{\text{Meds,Hosp1}} = \frac{ \mu_{\text{CBT,Hosp2}} + \mu_{\text{EMDR,Hosp2}} }{2} - \mu_{\text{Meds,Hosp2}} \\ H_1 &: \frac{ \mu_{\text{CBT,Hosp1}} + \mu_{\text{EMDR,Hosp1}} }{2} - \mu_{\text{Meds,Hosp1}} \neq \frac{ \mu_{\text{CBT,Hosp2}} + \mu_{\text{EMDR,Hosp2}} }{2} - \mu_{\text{Meds,Hosp2}} \end{align}\]

$$
\begin{align}
H_0 &: \frac{ \mu_{\text{CBT,Hosp1}} + \mu_{\text{EMDR,Hosp1}} }{2} - \mu_{\text{Meds,Hosp1}} = 
\frac{ \mu_{\text{CBT,Hosp2}} + \mu_{\text{EMDR,Hosp2}} }{2} - \mu_{\text{Meds,Hosp2}} \\

H_1 &: \frac{ \mu_{\text{CBT,Hosp1}} + \mu_{\text{EMDR,Hosp1}} }{2} - \mu_{\text{Meds,Hosp1}} \neq 
\frac{ \mu_{\text{CBT,Hosp2}} + \mu_{\text{EMDR,Hosp2}} }{2} - \mu_{\text{Meds,Hosp2}} 
\end{align}
$$


Version 2, also fine: Difference between differences is zero (H0) or not zero (H1):

\[\begin{align} H_0 &: \Big( \frac{ \mu_{\text{CBT,Hosp1}} + \mu_{\text{EMDR,Hosp1}} }{2} - \mu_{\text{Meds,Hosp1}} \Big) - \Big( \frac{ \mu_{\text{CBT,Hosp2}} + \mu_{\text{EMDR,Hosp2}} }{2} - \mu_{\text{Meds,Hosp2}} \Big) = 0 \\ H_1 &: \Big( \frac{ \mu_{\text{CBT,Hosp1}} + \mu_{\text{EMDR,Hosp1}} }{2} - \mu_{\text{Meds,Hosp1}} \Big) - \Big( \frac{ \mu_{\text{CBT,Hosp2}} + \mu_{\text{EMDR,Hosp2}} }{2} - \mu_{\text{Meds,Hosp2}} \Big) \neq 0 \\ \end{align}\]

$$
\begin{align}
H_0 &: \Big( \frac{ \mu_{\text{CBT,Hosp1}} + \mu_{\text{EMDR,Hosp1}} }{2} - \mu_{\text{Meds,Hosp1}} \Big) - 
\Big( \frac{ \mu_{\text{CBT,Hosp2}} + \mu_{\text{EMDR,Hosp2}} }{2} - \mu_{\text{Meds,Hosp2}} \Big) = 0 \\

H_1 &: \Big( \frac{ \mu_{\text{CBT,Hosp1}} + \mu_{\text{EMDR,Hosp1}} }{2} - \mu_{\text{Meds,Hosp1}} \Big) - 
\Big( \frac{ \mu_{\text{CBT,Hosp2}} + \mu_{\text{EMDR,Hosp2}} }{2} - \mu_{\text{Meds,Hosp2}} \Big) \neq 0 \\
\end{align}
$$

Linear expression for model


\[ \begin{align} \text{SWB} ~=~& \beta_0 + (\beta_1 \cdot \text{Treatment}_{\text{EMDR}}) + (\beta_2 \cdot \text{Treatment}_{\text{Meds}}) + (\beta_3 \cdot \text{Hospital}) ~ + \\ & (\beta_4 \cdot \text{Treatment}_{\text{EMDR}} \cdot \text{Hospital}) + (\beta_5 \cdot \text{Treatment}_{\text{Meds}} \cdot \text{Hospital}) + \epsilon \end{align} \]

Are we adjusting \(\alpha\) or p?

If we compare pairs() with no correction and with the Bonferroni correction, it’s only the p-values that change.

pairs(m1_emm, simple = 'Treatment', adjust = "none")
Hospital = Hosp1:
 contrast    estimate    SE  df t.ratio p.value
 CBT - EMDR     0.673 0.523 174   1.287  0.1998
 CBT - Meds    -0.697 0.523 174  -1.332  0.1847
 EMDR - Meds   -1.370 0.523 174  -2.619  0.0096

Hospital = Hosp2:
 contrast    estimate    SE  df t.ratio p.value
 CBT - EMDR    -5.137 0.523 174  -9.819 <0.0001
 CBT - Meds     0.127 0.523 174   0.242  0.8090
 EMDR - Meds    5.263 0.523 174  10.061 <0.0001
pairs(m1_emm, simple = 'Treatment', adjust = "bonferroni")
Hospital = Hosp1:
 contrast    estimate    SE  df t.ratio p.value
 CBT - EMDR     0.673 0.523 174   1.287  0.5993
 CBT - Meds    -0.697 0.523 174  -1.332  0.5541
 EMDR - Meds   -1.370 0.523 174  -2.619  0.0288

Hospital = Hosp2:
 contrast    estimate    SE  df t.ratio p.value
 CBT - EMDR    -5.137 0.523 174  -9.819 <0.0001
 CBT - Meds     0.127 0.523 174   0.242  1.0000
 EMDR - Meds    5.263 0.523 174  10.061 <0.0001

P value adjustment: bonferroni method for 3 tests 

So technically, pairs() adjusts p, not \(\alpha\). But whichever one is adjusted, the results are equivalent!

Adjusting \(\alpha\) by dividing by 3 is the same as adjusting p by multiplying by 3 (and if the value ends up above 1, setting it equal to 1).

For example, the unadjusted p-value in Hosp1 for CBT - Meds, 0.1847, multiplied by 3 (the number of tests), is the Bonferroni-adjusted p-value, 0.5541.

0.1847 * 3
[1] 0.5541