Grouping variables
Grouping variables are ordinal or categorical variables whose values each appear more than once in a dataset. In other words, when multiple observations in a dataset share the same value of an ordinal or categorical variable, that variable is a grouping variable.
For example:
- The same child is observed at a few different times of day, so our data contains multiple observations from that child.
- The same experimental participant sees all conditions of an experiment, so our data contains multiple observations from that participant.
- The same experimental condition is seen by many participants, so our data contains multiple observations of that experimental condition.
- The same test question is seen by every student, so our data contains multiple observations of that test question.
- The same student sees many test questions, so our data contains multiple observations from that student.
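The last two examples can be made concrete with a small sketch. The dataset below is invented purely for illustration (the column names `student`, `question` and `correct` are assumptions, not taken from any course data). It stores one row per observation and counts how often each value of a candidate grouping variable occurs:

```python
from collections import Counter

# Hypothetical long-format data: one row per observation.
# All names and values here are made up for illustration.
observations = [
    {"student": "s1", "question": "q1", "correct": 1},
    {"student": "s1", "question": "q2", "correct": 0},
    {"student": "s2", "question": "q1", "correct": 1},
    {"student": "s2", "question": "q2", "correct": 1},
    {"student": "s3", "question": "q1", "correct": 0},
    {"student": "s3", "question": "q2", "correct": 1},
]


def observations_per_level(rows, column):
    """Count how many rows share each value of `column`."""
    return Counter(row[column] for row in rows)


student_counts = observations_per_level(observations, "student")
question_counts = observations_per_level(observations, "question")

print(dict(student_counts))   # {'s1': 2, 's2': 2, 's3': 2}
print(dict(question_counts))  # {'q1': 3, 'q2': 3}
```

Every student and every question appears in more than one row, so in this toy dataset both `student` and `question` would be grouping variables.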
Why are grouping variables important?
In DAPR2, you learned that linear models make a number of assumptions (revisit last year’s DAPR2 flashcards here).
One of those assumptions was the assumption of independence of errors. For this assumption to be met, the error for each observation must be independent of the error for every other observation. But when (for example) our data contains multiple observations from the same child, those data points are not independent: because they all come from the same child, they tend to be more similar to one another than to observations from other children.
In general, when our data contains grouping structures, the data points that belong to the same member of a group (the same child, the same participant, the same test question) are not independent of one another. Therefore our analysis will violate the assumption of independence of errors unless we include the grouping variables in our model.
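A small simulation can illustrate why ignoring a grouping variable violates independence of errors. This is only a sketch under assumed values: the number of children, the two standard deviations and the baseline of 10 are all invented. Each simulated child has their own baseline and is observed twice. A model that ignores the child (here, just the grand mean) leaves residuals that are strongly correlated within a child, but not across different children.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(1)   # fixed seed so the sketch is reproducible
n_children = 500         # assumed number of children
child_sd = 2.0           # assumed spread of child baselines
noise_sd = 1.0           # assumed observation-level noise

first_obs, second_obs = [], []
for _ in range(n_children):
    child_baseline = rng.gauss(0, child_sd)  # shared by both observations
    first_obs.append(10 + child_baseline + rng.gauss(0, noise_sd))
    second_obs.append(10 + child_baseline + rng.gauss(0, noise_sd))

# "Model" that ignores the grouping variable: predict the grand mean.
grand_mean = (sum(first_obs) + sum(second_obs)) / (2 * n_children)
res_first = [y - grand_mean for y in first_obs]
res_second = [y - grand_mean for y in second_obs]

# Residuals paired within the same child.
same_child_r = pearson(res_first, res_second)

# Residuals paired across (almost always) different children.
shuffled = res_second[:]
rng.shuffle(shuffled)
different_child_r = pearson(res_first, shuffled)

print(f"same child:      r = {same_child_r:.2f}")
print(f"different child: r = {different_child_r:.2f}")
```

Under these assumed values the within-child correlation should come out near 0.8 (child variance divided by total variance, 4 / 5), while the across-child correlation should be near zero.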
Depending on the data generating process, we model grouping variables either as fixed effects or as random effects.
In particular, if we have a randomly varying grouping variable (see data generating process), then our model needs to contain a random intercept by that variable (and may also contain random slopes for one or more of the model’s predictors by that variable).
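As a sketch of what this looks like in equation form, assume a single predictor x and a grouping variable with levels j (for example, children), with i indexing observations within each level. The notation below is an assumption chosen for illustration and may differ from the notation used elsewhere in the course:

```latex
% Random intercept by grouping variable j, observation i within group j
y_{ij} = \beta_0 + u_{0j} + \beta_1 x_{ij} + \varepsilon_{ij},
\qquad u_{0j} \sim N(0, \sigma_{u_0}^2),
\qquad \varepsilon_{ij} \sim N(0, \sigma_{\varepsilon}^2)

% Adding a random slope of the predictor x by the same grouping variable
y_{ij} = (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\, x_{ij} + \varepsilon_{ij}
```

Here u_{0j} shifts the intercept for each member of the group and u_{1j} shifts the slope of x for each member. When both are included, they are typically allowed to covary.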
Linked flash cards
Outgoing links
- TODO
Backlinks
- TODO