We largely avoided the discussion of contrasts in USMR because they can be quite incomprehensible at first, and we wanted to focus on really consolidating our learning of the linear model basics.
What are contrasts and when are they relevant?
Recall the situation where we have a categorical predictor. To continue the example from USMR, Martin's lectures considered the utility of different types of toys (Playmo, SuperZings, and Lego). When we put a categorical variable into our model as a predictor, we are always relying on contrasts, which are essentially just combinations of 0s and 1s which we use to define the differences we wish to test.
The default, treatment coding, treats one level as a reference group, and results in coefficients which compare each group to that reference.
When a variable is defined as a factor in R, there will always be an attached “attribute” defining the contrasts:
# load the tidyverse (for read_csv() and the pipe)
library(tidyverse)
# read the data
toy_utils <- read_csv("https://uoepsy.github.io/msmr/data/toyutility.csv")
# make type a factor
toy_utils <- toy_utils %>%
  mutate(
    type = factor(type)
  )
# look at the contrasts
contrasts(toy_utils$type)
## playmo zing
## lego 0 0
## playmo 1 0
## zing 0 1
With the treatment coding above, the resulting coefficients will correspond to:
(Intercept): the mean utility of the reference group (lego)
typeplaymo: the difference in mean utility between playmo and lego
typezing: the difference in mean utility between zing and lego
Read in the toy utility data from https://uoepsy.github.io/msmr/data/toyutility.csv.
We’re going to switch back to the normal lm() world here for a bit, so no multi-level stuff while we learn more about contrasts.
Fit the model below
model_treatment <- lm(UTILITY ~ type, data = toy_utils)
Calculate the means for each group, and the differences between them, to match them to the coefficients from the model above
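A sketch of that check in R (assuming the data has been read in and `model_treatment` fitted as above):

```r
library(tidyverse)

# group means of utility for each toy type
toy_utils %>%
  group_by(type) %>%
  summarise(mean_utility = mean(UTILITY))

# with treatment coding, the coefficients should line up as:
#   (Intercept) = mean of the reference group (lego)
#   typeplaymo  = mean(playmo) - mean(lego)
#   typezing    = mean(zing)   - mean(lego)
coef(model_treatment)
```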
How can we change the way contrasts are coded?
In our regression model we are fitting an intercept plus a coefficient for each explanatory variable, and any categorical predictors are coded as 0s and 1s.
We can add constraints to the way our contrasts are coded. In fact, with the default treatment coding, we have already applied one: we constrained them such that the intercept (\(\beta_0\)) equals the mean of one of the groups.
The other common constraint is the sum-to-zero constraint. This approach gets called “effects coding” (or “sum coding”).
Under this constraint the interpretation of the coefficients becomes:
(Intercept): the “grand mean” (the mean of the group means)
type1, type2, …: the difference between each group’s mean and the grand mean (the last level does not get its own coefficient; its deviation from the grand mean is minus the sum of the others)
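You can inspect the sum-to-zero coding itself with `contr.sum()`, where the argument is the number of factor levels (3 here):

```r
# sum-to-zero contrasts for a 3-level factor:
# each row is a level, each column a model coefficient
contr.sum(3)
##   [,1] [,2]
## 1    1    0
## 2    0    1
## 3   -1   -1
```

Note that each column sums to zero (hence the name), and the last level is coded as -1 in every column rather than getting its own column.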
Changing contrasts in R
There are two main ways to change the contrasts in R.
Change them in your data object.
contrasts(toy_utils$type) <- "contr.sum"
Change them only in the fit of your model (don’t change them in the data object)
model_sum <- lm(UTILITY ~ type, contrasts = list(type = "contr.sum"), data = toy_utils)
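As a quick sanity check (a sketch, assuming `model_sum` has been fitted as above), we can compare the sum-coded coefficients against the group means:

```r
# mean utility for each toy type
group_means <- tapply(toy_utils$UTILITY, toy_utils$type, mean)

# under sum coding:
#   (Intercept)  should equal the grand mean (mean of the group means)
#   type1, type2 should equal the 1st/2nd group mean minus the grand mean
mean(group_means)
coef(model_sum)
```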