Get p-values for LMM coefficients

When we fit LMMs, we are usually still interested in testing whether the fixed effects are significantly different from zero. We don’t need to do any significance testing for the random effects.

Example: Life satisfaction in Scotland

These data come from 112 people across 12 different Scottish dwellings (cities and towns). Information is captured on their ages and a measure of life satisfaction. The researchers are interested in if there is an association between age and life-satisfaction.

Data are available at https://uoepsy.github.io/data/lmm_lifesatscot.csv.

variable description
age Age (years)
lifesat Life Satisfaction score
dwelling Dwelling (town/city in Scotland)
size Size of Dwelling (> or <100k people)
lifesatscot <- read_csv("https://uoepsy.github.io/data/lmm_lifesatscot.csv")

Our model is the following.

lifesat_mod <- lmer(
  lifesat ~ age + (1 + age | dwelling),
  data = lifesatscot,
  control = lmerControl(optimizer = "bobyqa")
)

To see how we figured out this random effect structure, see Identify possible random effects. To see where the control= argument comes from, see Change optimiser.

Statistical significance of LMM coefficients

By default, summaries of LMMs do not show p-values beside each fixed effect coefficient the way that simple LMs do.

Backstory: In the linear models you’ve been fitting with lm(), p-values were obtained based on \(t\) (for the coefficient estimate) and \(F\) (the reduction in residual sums of squares) tests. Each of those tests has associated degrees of freedom.

For example, if we fit a linear model lm(y~x) to 10 data points, then our tests would have \(10-2=8\) degrees of freedom (\(n\) observations minus \(k\) parameters). We would therefore compare test statistics against, e.g., a \(t\) distribution with 8 degrees of freedom. Or if we fit the model to 100 observations, then we would compare to a \(t\) distribution with 98 degrees of freedom.

The degrees of freedom reflects the fact that there is more variability in statistics from smaller samples. Another way of thinking of degrees of freedom is that they are the number of independent datapoints that are left “free to vary” around our model parameters.

But we are now working with multilevel data, and in the scenario where we have, e.g., \(n_p\) pupils grouped into \(n_s\) schools, how many independent bits of information do we have to begin with? Is it \(n_p\)? Or \(n_s\)? Or somewhere in between? Our random effects are not “free to vary” in the sense that they are estimated under certain constraints (such as following a normal distribution).

In very specific situations that correspond to classical experimental designs (in which, e.g., we have perfectly balanced numbers across experimental factors and equal sizes within groups) it is possible to conduct similar \(F\) tests (and hence \(t\)-tests too) with a known degrees of freedom. Unfortunately, transferring this to more general scenarios is problematic (e.g., any missing data, unbalanced designs, more complex random effect structures). This is why the author of lme4 removed p-values from the model output.

However, there are various strategies that we can use to conduct inferences that either attempt to approximate the degrees of freedom, or use an alternative method based on, e.g., likelihoods or bootstrapping.

The most common way to add p-values into the model summary is to load the library lmerTest and then fit the model again using the lmer() function. (And when you report your model, be sure to mention that the p-values were estimated using the Satterthwaite method.)

library(lmerTest)

lifesat_mod <- lmer(
  lifesat ~ age + (1 + age | dwelling),
  data = lifesatscot,
  control = lmerControl(optimizer = "bobyqa")
)

summary(lifesat_mod)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: lifesat ~ age + (1 + age | dwelling)
   Data: lifesatscot
Control: lmerControl(optimizer = "bobyqa")

REML criterion at convergence: 822

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.9184 -0.6946 -0.0004  0.5799  1.9561 

Random effects:
 Groups   Name        Variance Std.Dev. Corr 
 dwelling (Intercept) 320.337  17.898        
          age           0.178   0.422   -0.87
 Residual              63.478   7.967        
Number of obs: 112, groups:  dwelling, 12

Fixed effects:
            Estimate Std. Error     df t value Pr(>|t|)   
(Intercept)   28.259      6.334  9.659    4.46   0.0013 **
age            0.523      0.149  8.300    3.52   0.0075 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
    (Intr)
age -0.904

The Satterthwaite method that comes with lmerTest is a way of approximating the degrees of freedom of the \(t\) distribution that we compare our observed \(t\) statistics to.

The lmerTest package simply overwrites the lmer() function to use a version that has the degrees of freedom and associated p-values displayed in the summary.

It can be used for models fitted with either ML or REML, and generally scales well, so if you are fitting models to big datasets, it won’t take too much extra time.

Linked flash cards