Linear modelling workflow
You won’t need to use every step below for every analysis, and they don’t need to be in this specific order either. Think of these steps like a buffet to pick and choose from, depending on what your analysis needs.
Phase 0: Before collecting data
- Run a power analysis to see how many participants you’ll need to recruit, given the effect size you expect to see or the smallest effect size that’s still theoretically interesting.
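
As a rough illustration of the power analysis step, here is a minimal sketch using `power.t.test()` from base R. It assumes a simple two-group comparison, and the effect size, alpha level, and target power are placeholder values rather than recommendations. Designs with grouping variables usually call for simulation-based power analysis instead, which this sketch does not cover.

```r
# Assumed values for illustration only: a standardised effect size of 0.5,
# a two-sided alpha of .05, and a target power of .80.
pwr_result <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80,
                           type = "two.sample", alternative = "two.sided")

# Round up, since you cannot recruit a fraction of a participant.
n_per_group <- ceiling(pwr_result$n)
n_per_group
```

If you are planning around the smallest effect size of interest instead, put that value in `delta`.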
Phase 1: Before model fitting
- Load the required R packages and read in your data.
- Tidy the data (e.g., any missingness? any implausible values? are data types set correctly?).
- Identify the relevant variables by mapping them to the research question (RQ) and the data-generating process: What’s the outcome? Which variables contribute reproducible variability? Which grouping variables contribute random variability?
- Get summary statistics for the relevant variables.
- Explore patterns in the data by plotting the relevant variables together.
- Set up categorical (fixed) predictors (e.g., factor levels? which a priori coding scheme?).
- Set up continuous (fixed) predictors (e.g., any transformations?).
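- Based on the outcome variable, choose between regular (linear) regression for a continuous outcome and logistic regression for a binary outcome.
- Decide whether the RQ requires an interaction model.
- Identify the maximal model (the most complex random effect structure that the study design justifies).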
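
Several of the Phase 1 steps can be sketched in base R. The data frame below is a made-up toy example: the column names (`participant`, `condition`, `age`, `rt`), the plausibility cut-off for age, and the choice of sum coding are all assumptions for illustration, not part of any real dataset or a recommended default.

```r
# Hypothetical toy data; all names and values are illustrative only.
dat <- data.frame(
  participant = c("p1", "p2", "p3", "p4", "p5", "p6"),
  condition   = c("control", "treatment", "control",
                  "treatment", "control", "treatment"),
  age         = c(21, 34, NA, 45, 29, 230),  # one missing, one implausible value
  rt          = c(512, 430, 610, 455, 580, 470)
)

# Tidy: count missing values per column, then treat implausible ages as missing.
# The cut-off of 120 is an assumption for this toy example.
colSums(is.na(dat))
dat$age[which(dat$age > 120)] <- NA

# Data types: grouping and categorical variables should be factors.
dat$participant <- factor(dat$participant)
dat$condition   <- factor(dat$condition, levels = c("control", "treatment"))

# Summary statistics for the relevant variables.
summary(dat[, c("condition", "age", "rt")])

# Coding scheme: treatment (dummy) coding is R's default for unordered factors;
# sum coding is shown here as one possible a priori alternative.
contrasts(dat$condition) <- contr.sum(2)

# Continuous predictor: mean-centring as one example of a transformation.
dat$age_c <- dat$age - mean(dat$age, na.rm = TRUE)
```

Which coding scheme and which transformations are appropriate depends on your RQ, so treat the choices above as placeholders.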
Phase 2: Model fitting and troubleshooting
- Try fitting the maximal model to the data.
- If convergence errors: Try a different optimiser and fit model again.
- If continued convergence errors or singular fit: Simplify random effect structure and fit model again.
- If suggestion to rescale: Convert continuous predictor(s) to z-scores and fit model again.
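
The troubleshooting steps above could look roughly like this with the `lme4` package. This is a sketch rather than a fixed recipe: it uses the `sleepstudy` data that ships with `lme4` purely as a stand-in, and the choice of `"bobyqa"` as the alternative optimiser and of dropping the intercept-slope correlation as the first simplification are assumptions about what you might try, not the only options.

```r
library(lme4)

# Work on a copy of the sleepstudy data that ships with lme4.
dat <- sleepstudy

# Maximal model for this design: by-subject random intercepts
# and by-subject random slopes for Days.
m_max <- lmer(Reaction ~ Days + (1 + Days | Subject), data = dat)

# If there are convergence problems: try a different optimiser and refit.
m_bobyqa <- lmer(Reaction ~ Days + (1 + Days | Subject), data = dat,
                 control = lmerControl(optimizer = "bobyqa"))

# Check whether the fit is singular (TRUE would suggest simplifying).
isSingular(m_bobyqa)

# If problems continue: simplify the random effect structure, for example by
# removing the correlation between random intercepts and random slopes.
m_simpler <- lmer(Reaction ~ Days + (1 + Days || Subject), data = dat)

# If lme4 suggests rescaling: convert the continuous predictor to z-scores.
dat$Days_z <- as.numeric(scale(dat$Days))
m_z <- lmer(Reaction ~ Days_z + (1 + Days_z | Subject), data = dat)
```

The `sleepstudy` model happens to fit without trouble, so here the later steps only show the syntax. With your own data you would move to the next step only if the previous one did not resolve the problem.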
Phase 3: After model fitting
- Check model assumptions.
  - see DAPR2 regular regression assumptions flash card
  - see DAPR2 logistic regression assumptions flash card
  - see LMM assumptions flash card
- If the model assumes normally distributed errors with equal variance and these assumptions are violated, bootstrap the model estimates.
- Run diagnostics for multicollinearity.
- Run diagnostics for outlying and influential data points (aka case diagnostics).
  - see DAPR2 regular regression assumptions flash card
  - see DAPR2 logistic regression assumptions flash card
- If outlying/influential data points detected: run sensitivity analysis.
- Interpret the model estimates.
- If model contains interactions:
  - Compute simple slopes/simple effects (incl. corrections for multiple comparisons, if necessary).
  - Plot simple slopes/simple effects.
- Plot fixed effects along with group-level variability.
- Write up your analysis and results.
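
To make some of the Phase 3 steps concrete, here is a minimal base-R sketch for a regular regression. Everything specific in it is an assumption for illustration: the built-in `mtcars` data, the 4/n rule of thumb for Cook's distance, 1000 bootstrap resamples, and a case-resampling bootstrap. Mixed models need other tools for the same steps, which this sketch does not cover.

```r
# Regular regression on R's built-in mtcars data (an arbitrary illustrative choice).
m <- lm(mpg ~ wt + hp, data = mtcars)

# Assumption checks are usually visual; in an interactive session, try:
# plot(m, which = 1:2)  # residuals vs fitted values, and a normal QQ plot

# Multicollinearity: variance inflation factor, VIF = 1 / (1 - R^2), where R^2
# comes from regressing one predictor on the other predictor(s).
vif_wt <- 1 / (1 - summary(lm(wt ~ hp, data = mtcars))$r.squared)

# Case diagnostics: Cook's distance, flagged with the 4/n rule of thumb.
cooks_d     <- cooks.distance(m)
influential <- cooks_d > 4 / nrow(mtcars)

# Sensitivity analysis: refit without the flagged cases and compare estimates.
m_sens <- lm(mpg ~ wt + hp, data = mtcars[!influential, ])
rbind(all_cases = coef(m), without_flagged = coef(m_sens))

# Bootstrap: resample rows with replacement, refit, and collect the wt slope.
set.seed(1)
boot_wt <- replicate(1000, {
  rows <- sample(nrow(mtcars), replace = TRUE)
  coef(lm(mpg ~ wt + hp, data = mtcars[rows, ]))[["wt"]]
})
ci_wt <- quantile(boot_wt, probs = c(0.025, 0.975))  # percentile interval

# Simple slopes for an interaction with a 0/1 predictor (am):
# the slope of wt at am = 0 is the wt coefficient; at am = 1 it is
# the wt coefficient plus the interaction coefficient.
m_int     <- lm(mpg ~ wt * am, data = mtcars)
slope_am0 <- coef(m_int)[["wt"]]
slope_am1 <- coef(m_int)[["wt"]] + coef(m_int)[["wt:am"]]
```

If the estimates in `m_sens` lead to the same conclusions as those in `m`, the flagged cases are not driving the results.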