Contrasts: a Quick Overview

We largely avoided the discussion of contrasts in USMR because they can be quite incomprehensible at first, and we wanted to focus on really consolidating our learning of the linear model basics.

What are contrasts and when are they relevant?

Recall when we have a categorical predictors? To continue the example in USMR, Martin’s lectures considered the utility of different types of toys (Playmo, SuperZings, and Lego).

[USMR Week 9 Lecture](https://uoepsy.github.io/usmr/lectures/lecture_8.html#36) utility of different toy_types

Figure 1: USMR Week 9 Lecture utility of different toy_types

When we put a categorical variable into our model as a predictor, we are always relying on contrasts, which are essentially just combinations of 0s & 1s which we use to define the differences which we wish to test.

The default, treatment coding, treats one level as a reference group, and results in coefficients which compare each group to that reference.
When a variable is defined as a factor in R, there will always be an attached “attribute” defining the contrasts:

#read the data
toy_utils <- read_csv("https://uoepsy.github.io/msmr/data/toyutility.csv")

#make type a factor
toy_utils <- toy_utils %>%
  mutate(
    type = factor(type)
  )

# look at the contrasts
contrasts(toy_utils$type)

##        playmo zing
## lego        0    0
## playmo      1    0
## zing        0    1

Currently, the resulting coefficients from the contrasts above will correspond to:

Intercept = estimated mean utility for Lego (reference group)
typeplaymo = estimated difference in mean utility Playmo vs Lego
typezing = estimated different in mean utility Zing vs Lego

Question 1

Read in the toy utility data from https://uoepsy.github.io/msmr/data/toyutility.csv.
We’re going to switch back to the normal lm() world here for a bit, so no multi-level stuff while we learn more about contrasts.

Fit the model below

model_treatment <- lm(UTILITY ~ type, data = toy_utils)

Calculate the means for each group, and the differences between them, to match them to the coefficients from the model above

Solution

You don’t have to do it this way. This was just the first way to calculate the difference in means that came to my head. group_by() will definitely come in handy though.

toy_utils %>% group_by(type) %>%
  summarise(
    mutil = mean(UTILITY)
  ) %>% 
  mutate(
    diff = mutil - mutil[type == "lego"]
  )

## # A tibble: 3 × 3
##   type   mutil   diff
##   <fct>  <dbl>  <dbl>
## 1 lego    2.82  0    
## 2 playmo  6.62  3.8  
## 3 zing    2.54 -0.280

coef(model_treatment)

## (Intercept)  typeplaymo    typezing 
##        2.82        3.80       -0.28

How can we change the way contrasts are coded?

In our regression model we are fitting an intercept plus a coefficient for each explanatory variable, and any categorical predictors are coded as 0s and 1s.

We can add constraints to our the way our contrasts are coded. In fact, with the default treatment coding, we already have applied one - we constrained them such that the intercept (\(\beta_0\)) = the mean of one of the groups.

Optional: Why constraints?

The other common constraint that is used is the sum-to-zero constraint. This approach gets called “effects coding”.
Under this constraint the interpretation of the coefficients becomes:

\(\beta_0\) represents the global mean (sometimes referred to as the “grand mean”).
\(\beta_1\) the effect due to group \(i\) — that is, the mean response in group \(i\) minus the global mean.

Changing contrasts in R
There are two main ways to change the contrasts in R.

Change them in your data object.

contrasts(toy_utils$type) <- "contr.sum"

Change them only in the fit of your model (don’t change them in the data object)

model_sum <- lm(UTILITY ~ type, contrasts = list(type = "contr.sum"), data = toy_utils)

Question 2

Refit the same model of utility by toy-type, using effects coding for the type variable
Then calculate the global mean, and the differences from each mean to the global mean, to match them to the coefficients

Solution

model_sum <- lm(UTILITY ~ type, contrasts = list(type = "contr.sum"), data = toy_utils)

globmean <- mean(toy_utils$UTILITY)
globmean

## [1] 3.99

toy_utils %>% group_by(type) %>%
  summarise(
    mutil = mean(UTILITY)
  ) %>% 
  mutate(
    diff = mutil - globmean
  )

## # A tibble: 3 × 3
##   type   mutil  diff
##   <fct>  <dbl> <dbl>
## 1 lego    2.82 -1.17
## 2 playmo  6.62  2.63
## 3 zing    2.54 -1.45

coef(model_sum)

## (Intercept)       type1       type2 
##        3.99       -1.17        2.63