11: Reporting on analyses with MLM

This reading:

A (non-exhaustive) checklist of things to think about/include when writing up analyses with multilevel models

The sample data

Descriptives of hierarchical data are sometimes a bit more difficult than when we don’t have any ‘levels’. Typically, what we are wanting to do is provide our readers with a picture of the characteristics of our sample. “Our sample” now refers to multiple levels, so we want to describe each of these. More often than not, one of these levels will be a bit more interesting to us as a population we are hoping to generalise to. In psychology we are usually interested in “people”, so if we have data that is multiple trials per participant, we would probably want to focus on describing the participants (the clusters) as the individual trials are something we exert control over as the experimenter. If each datapoint was a child and they were nested in schools, we would probably want to describe both the children and the schools that are in our sample.

The aim here is to provide a picture of our sample so that a reader can get a sense of how ‘transportable’ the findings are to different contexts. For instance, if participants in our study are all university students, then we want to be careful about thinking that the findings will apply in other populations (see e.g. “most people aren’t WEIRD”).

A checklist

what is the hierarchical data structure (how many levels, what is each level?)
Describe any data cleaning outlier/data removal prior to calculating descriptive statistics (these tend to be the impossible values - i.e. observations that you would never want in your data anyway)
sample sizes: how many at each level?
- how many lower-level within each higher level unit? (if this varies, provide an average, and possibly a min and a max)
scales of measured variables
descriptive statistics of relevant variables that characterise your sample.
- these should be computed at the level at which they were measured. For instance, if you have observations grouped by participant, mean(data$age) would give the average age of your observations (which isn’t meaningful, and would differ from the average age of your participants if you have a different number of observations for each participant).
- How much of the variability in the outcome variable is attributable to the clustering? (i.e. ICC)

The methods

When writing up any statistical analysis, one important thing to keep in mind is transparency in the decisions and actions taken in the analysis process. The aim is to avoid a reader wondering “how did they end up with these results?”. Ideally, another researcher would be able to reproduce your analysis based on your explanation of what you have done.

With multilevel models, there’s a lot of choices that we make - the scaling and centering of variables, models being fitted with ML vs REML, the method used to conduct inference, and so on. In addition, in the event that we arrived at our final model after a series of non-converging models that were then simplified, we would ideally explain this process.

A checklist

Describe any transformations to the data that are made prior to conducting the analysis (e.g., you’ll often re-center a time variable)
Describe the process that led to your final model(s)
- Clearly explain the structure of your initial model (e.g. this might be the ‘maximal model’), and if this failed to converge, explain what random effects were removed and in what order? if possible, explain why.
- State the software packages and versions used to fit models, along with the estimation method (ML/REML) and optimiser used.
What is the structure of your final model(s)?
- You don’t need to write a complicated mathematical equation for your model. Describing it in words is fine provided you’re clear. e.g. “the outcome variable Y was modelled using mixed effects regression with afixed effects including a main effect of A and B as well as their interaction. The random effects include a random intercept by participant”
- Linear/binomial/poisson/… - if not linear, what link function (e.g., logit, log) was used?
- Specify all fixed effects.
- Specify all random effects according to the sampling units (e.g. schools/children etc) with which they interact. Be careful to make sure it’s clear what slopes are for which groupings!.
It’s often useful to state clearly the relevant test/comparison/parameter estimate of interest, and link this explicitly to the research questions/hypotheses.
Any model comparisons should be clearly stated so that the reader understands the structure of both models being compared.
Specify the methods used to conduct inference (e.g. LRT, bootstrap), and if relevant, explain why (e.g. Kenward Rogers might be used due to a small number of level 2 units).

The results

Writing up results will vary depending on the strategies employed. The important part is to highlight the relevant test/comparison that addresses the research aims, and explain what the result means with respect to the question at hand.
Additionally, be sure to take some time to understand what the estimate actually means ($p<.05$ is just a small part of the story). With models like these we are almost always just looking at outcome “differences” between levels of a categorical predictor or “change” across some continuous predictor. Does the estimated difference/change, and its direction, make sense to you? What does it mean practically? Asking yourself questions like this is also a good way of sense checking your analysis (i.e. a strong counter-intuitive finding could mean you have a variable coded back to front!).

For reporting parameter estimates, ideally we would include both the estimate and the precision (i.e. the standard error or a confidence interval). When reporting statistical tests, make sure to include the test statistic ($t$, $F$, $\chi^2$, etc.), the relevant degrees of freedom, and the p-value.

A checklist

results of model comparisons and what they mean in the context of the research question
parameter estimates and precision for relevant fixed effects.
variance components
- how does the effect of interest vary between groups?
- is it related to other group level variance (i.e. the random effect correlations if modelled)
if relevant - sensitivity to influential observations and clusters.