Day 2: hierarchical and nonlinear models

many groups and curving lines

Andrew MacDonald (Universite de Sherbrooke)
03-24-2022

Outline of today:

Quick review

Bird masses

This example is based on work by Marie-Eve at UdeS!

We imagine a model like the following:

\[ \begin{align} \text{Nestlings}_i & \sim \text{Poisson}(\lambda_i) \\ \text{log}(\lambda_i) &= \beta_0 + \beta_1 \times \text{Mass}_i \\ \beta_0 & \sim \text{Normal}(??, ??) \\ \beta_1 & \sim \text{Normal}(??, ??) \end{align} \]

\(i\) keeps track of which bird we are talking about. You can think of it as “bird number i”

Note: We could also write the model like this:

\[ \begin{align} \text{Nestlings}_i & \sim \text{Poisson}(e^{\beta_0} \times e^{\beta_1 \times \text{Mass}_i}) \\ \beta_0 & \sim \text{Normal}(??, ??) \\ \beta_1 & \sim \text{Normal}(??, ??) \end{align} \]

Centering variables

Centering variables is one of the most important things we can do to help our models be more interpretable. This also helps us to set good priors.

Centering a variable means to subtract the mean from the variable:

\[ \begin{align} \text{Nestlings}_i & \sim \text{Poisson}(\lambda_i) \\ \text{log}(\lambda_i) &= \beta_0 + \beta_1 \times (\text{Mass}_i - \overline{\text{Mass}}) \\ \beta_0 & \sim \text{Normal}(??, ??) \\ \beta_1 & \sim \text{Normal}(??, ??) \end{align} \] Question How does this change the meaning of \(\beta_0\) and/or \(\beta_1\), if at all? (Hint: what will be the equation for a bird who has exactly average mass?)

set.seed(1234)

n_birds <- 15
avg_nestlings_at_avg_mass <- log(4.2)
effect_of_one_gram <- .2

mother_masses_g <- rnorm(n_birds, mean = 15, sd = 3)
avg_mother_mass <- mean(mother_masses_g)

log_average_nestlings <- avg_nestlings_at_avg_mass + 
  effect_of_one_gram * (mother_masses_g - avg_mother_mass)

nestlings <- rpois(n = n_birds, lambda = exp(log_average_nestlings))

Plot these to get an idea of it:

suppressPackageStartupMessages(library(tidyverse))
imaginary_birds <- tibble(mother_masses_g, nestlings)

ggplot(imaginary_birds, aes(x = mother_masses_g, y = nestlings)) + 
  geom_point()

NOTE We can also fit this very same model by frequentist statistics, using lm

coef(glm(nestlings ~ 1 + I(mother_masses_g - mean(mother_masses_g)), family = "poisson"))
                               (Intercept) 
                                 1.4138103 
I(mother_masses_g - mean(mother_masses_g)) 
                                 0.1727791 
# compare to known values
avg_nestlings_at_avg_mass
[1] 1.435085
effect_of_one_gram
[1] 0.2

Bayesian workflow: define a model and priors

library(brms)

imaginary_birds_centered <- imaginary_birds |> 
  mutate(mother_mass_g_cen = mother_masses_g - mean(mother_masses_g))

bird_form <- bf(nestlings ~ 1 + mother_mass_g_cen, family = poisson(link = "log"))

get_prior(bird_form, data = imaginary_birds_centered)
                  prior     class              coef group resp dpar
                 (flat)         b                                  
                 (flat)         b mother_mass_g_cen                
 student_t(3, 1.4, 2.5) Intercept                                  
 nlpar bound       source
                  default
             (vectorized)
                  default

We set a prior for each parameter.

bird_priors <- c(
  prior(normal(1, .5), class = "Intercept"),
  prior(normal(.1, .1), class = "b", coef = "mother_mass_g_cen")
)

prior predictive checks

prior_predictions <- brm(bird_form,
                         data = imaginary_birds_centered,
                         prior = bird_priors, 
                         sample_prior = "only", 
                         file = "bird_model",
                         file_refit = "on_change",
                         refresh = FALSE)

plot a few of these

library(tidybayes)
imaginary_birds_centered |> 
  add_predicted_draws(prior_predictions, ndraws = 6, seed = 4321) |> 
  ggplot(aes(x = mother_masses_g, y = .prediction)) + geom_point() + facet_wrap(~.draw)

Question are we satisfied with these priors?

Fit to the data

bird_posterior <- update(prior_predictions, sample_prior = "yes", 
                         file = "bird_posterior", 
                         file_refit = "on_change", refresh = FALSE)
summary(bird_posterior)
 Family: poisson 
  Links: mu = log 
Formula: nestlings ~ 1 + mother_mass_g_cen 
   Data: imaginary_birds_centered (Number of observations: 15) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects: 
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
Intercept             1.39      0.13     1.13     1.64 1.00     2613
mother_mass_g_cen     0.16      0.04     0.08     0.25 1.00     2536
                  Tail_ESS
Intercept             2494
mother_mass_g_cen     2402

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
knitr::kable(head(tidybayes::tidy_draws(bird_posterior)))
.chain .iteration .draw b_Intercept b_mother_mass_g_cen prior_Intercept prior_b_mother_mass_g_cen lprior lp__ accept_stat__ stepsize__ treedepth__ n_leapfrog__ divergent__ energy__
1 1 1 1.643954 0.1813006 0.5520973 0.1319884 -0.0019872 -31.73046 0.8433474 0.819504 2 3 0 33.76309
1 2 2 1.140257 0.1877014 1.3621787 0.0842671 0.7339344 -30.96103 1.0000000 0.819504 2 3 0 32.63675
1 3 3 1.060480 0.1662869 0.4521973 -0.1016976 0.9308422 -32.60084 0.7210825 0.819504 1 1 0 32.68563
1 4 4 1.138671 0.1571804 1.0768996 0.2397268 0.9559159 -31.31604 1.0000000 0.819504 1 1 0 32.62936
1 5 5 1.443166 0.2059157 1.4607618 0.0400797 0.2041559 -29.77087 0.9950998 0.819504 2 7 0 31.04836
1 6 6 1.545824 0.1237708 1.3101264 0.0343844 0.5337555 -29.80001 0.8797944 0.819504 2 5 0 31.09643

How do our priors and posteriors compare?

library(ggridges)
tidybayes::tidy_draws(bird_posterior) |> 
  select(.draw, b_Intercept:prior_b_mother_mass_g_cen) |> 
  pivot_longer(-.draw) |> 
  ggplot(aes(x = value, y = name)) + geom_density_ridges()

Can we draw the regression line?

average_mom <- mean(mother_masses_g)

range(imaginary_birds_centered$mother_mass_g_cen)
[1] -6.025202  4.265214
tibble(mother_mass_g_cen = modelr::seq_range(imaginary_birds_centered$mother_mass_g_cen, 
                                             n = 10)) |> 
  tidybayes::add_epred_draws(bird_posterior) |> 
  ungroup() |> 
  ggplot(aes(x = average_mom + mother_mass_g_cen, y = .epred)) + 
  stat_lineribbon() + 
  scale_fill_brewer(palette = "Greens", direction = -1) + 
  geom_point(aes(x = mother_masses_g, y = nestlings),
             data = imaginary_birds_centered, pch = 21,
             fill = "orange", size = 3)

let’s also try drawing the prediction intervals

average_mom <- mean(mother_masses_g)

range(imaginary_birds_centered$mother_mass_g_cen)
[1] -6.025202  4.265214
tibble(mother_mass_g_cen = modelr::seq_range(imaginary_birds_centered$mother_mass_g_cen, 
                                             n = 10)) |> 
  tidybayes::add_predicted_draws(bird_posterior) |> 
  ungroup() |> 
  ggplot(aes(x = average_mom + mother_mass_g_cen, y = .prediction)) + 
  stat_lineribbon() + 
  scale_fill_brewer(palette = "Greens", direction = -1) + 
  geom_point(aes(x = mother_masses_g, y = nestlings),
             data = imaginary_birds_centered, pch = 21,
             fill = "orange", size = 3)

Other checks we can do:

bird_posterior_onlyparam <- update(prior_predictions, sample_prior = "no", 
                         file = "bird_posterior", 
                         file_refit = "on_change", refresh = FALSE)

shinystan::launch_shinystan(bird_posterior_onlyparam)

Multilevel models

Based on the awesome vignette for vignette for tidybayes

We begin by sampling some data from five different “conditions”:

library(modelr)
set.seed(5)
n <- 10
n_condition <- 5
ABC <-
  data_frame(
    condition = rep(c("A", "B", "C", "D", "E"), n),
    response = rnorm(n * 5, c(0, 1, 2, 1, -1), 0.5)
  )

ABC %>%
  ggplot(aes(y = condition, x = response)) +
  geom_point(pch = 21, size = 4, stroke = 1.4, fill = "#41b6c4")

And by fitting a model to these data, with varying intercepts for each group:

m <- brm(
  response ~ (1 | condition), data = ABC, 
  control = list(adapt_delta = .99),
  prior = c(
    prior(normal(0, 1), class = Intercept),
    prior(student_t(3, 0, 1), class = sd),
    prior(student_t(3, 0, 1), class = sigma)
  )
)

An easy way to visualize these results is with a ridgeline plot as above

ABC %>%
  modelr::data_grid(condition) %>%
  tidybayes::add_predicted_draws(m) %>%
  ggplot(aes(x = .prediction, y = condition)) +
  geom_density_ridges(fill = "#41b6c4") + 
  theme_minimal()

Alright. This used the simple vanilla option, add_predicted_samples(m). This uses the default options for making predictions, which recall is “NULL (default), include all group-level effects”. If you set add_predicted_samples(m, re_formula = NULL), you’ll get exactly the same figure.
So we can see that to “include” an effect is to take the actual estimated intercepts for each specific group we studied and use them to make new predictions for the same groups. This is Case 1 from McElreath’s list (though in this case, because we only have groups and nothing else, Case 1 and 2 are the same).

We can also say the exact same thing using a formula:

ABC %>%
  data_grid(condition) %>%
  add_predicted_draws(m, re_formula = ~(1|condition)) %>%
  ggplot(aes(x = .prediction, y = condition)) +
  geom_density_ridges(fill = "#41b6c4") +  
  theme_minimal()

That’s right, there are three ways to say the exact same thing: say nothing, say NULL, or say the original “random effects” formula1. You go with what you feel in your heart is right, but I prefer the formula.
In all three cases, we are using the model to predict the means for the groups in our varying-intercepts model. This is what the documentation means by “including” these varying intercepts.

Squishing those random effects

OK so that was three separate ways to make predictions for the same groups. What else can we do? Let’s try that thing with the NA argument, which means “include no group-level effects”:

ABC %>%
  data_grid(condition) %>%
  add_predicted_draws(m, re_formula = NA,
                        n = 2000) %>%
  ggplot(aes(x = .prediction, y = condition)) +
  geom_density_ridges(fill = "#41b6c4") +    theme_minimal()

Ah, so if you do this, all the groups come out the same! But if they’re all the same, what do they represent? It seems reasonable that they represent the model’s intercept, as if the varying intercepts were all 0. Let’s calculate predicitons that ignore the varying effects – that is, using only the model intercept and the standard deviation of the response – using a bit of [handy purrr magic]2:

m %>% 
  spread_draws(b_Intercept, sigma) %>% 
  mutate(prediction = rnorm(length(b_Intercept), b_Intercept, sigma),
         #map2_dbl(b_Intercept, sigma, ~ rnorm(1, mean = .x, sd = .y)),
         Prediction = "prediction") %>% #glimpse %>% 
  ggplot(aes(x = prediction, y = Prediction)) +
  geom_density_ridges(fill = "#41b6c4") +    
  theme_minimal()

As you can see, this distribution has exactly the same shape as the five in the previous figure! It is as if we calculated the predictions for a group which was exactly at the average (in other words, it had a varying intercept of 0.) In the Rethinking book, readers are taught to do this in a much more explicit way: you actually generate all the 0 intercepts yourself, and give that to the model in place of the estimated intercepts! A very manual and concrete way to “set something to 0”.

brms does this too. As the documentation says >NA values within factors in newdata, are interpreted as if all dummy variables of this factor are zero.

The brms phrasing certainly takes less space, though it also requires you to remember that this is what NA gets you!

We can also remove random effects from our predictions by excluding them from the re_formula. In our model, we have only one varying effect – yet an even simpler formula is possible, a model with no intercept at all:

ABC %>%
  data_grid(condition) %>%
  add_predicted_draws(m, re_formula = ~ 0,
                        n = 2000) %>%
  ggplot(aes(x = .prediction, y = condition)) +
  geom_density_ridges(fill = "#41b6c4") + theme_minimal() 

Once again, the same distribution appears: it is as if all group effects had been set to zero. If we had two random effects and omitted one, this is what we would get for the omitted effect – the expected value if all its effects were 0.

New levels

I’m going to show how to create predictions for new levels, but first I’m going to show two mistakes that I made frequently while learning:

First, asking for new levels without specifying allow_new_levels = TRUE:

# this does not work at all!!
data_frame(condition = "bugaboo") %>%
  add_predicted_draws(m, re_formula = ~(1|condition),
                        n = 2000)
Error: Levels 'bugaboo' of grouping factor 'condition' cannot be found in the fitted model. Consider setting argument 'allow_new_levels' to TRUE.

That fails because I tried to pass in a level of my grouping variable that wasn’t in the original model!

Second, passing in new levels – but telling the function to just ignore them:

data_frame(condition = "bugaboo") %>%
  add_predicted_draws(m, re_formula = NA,#~(1|condition),
                        n = 2000) %>%
  ggplot(aes(x = .prediction, y = condition)) +
  geom_density_ridges(fill = "#41b6c4") + 
  theme_minimal()

Here, i’m still passing in the unknown level – but the function doesn’t complain, because I’m not including random effects at all! This is the same result from above, when we used NA or ~0 to remove varying effects altogether. This is definitely something to watch for if you are passing in new data (I made this mistake, and it cost me an afternoon!)

If we avoid both of these errors, we get what we expect: our means for our original groups, and a new predicted mean for "bugaboo":

ABC %>%
  data_grid(condition) %>% 
  add_row(condition = "bugaboo") %>%
  add_predicted_draws(m, re_formula = ~(1|condition),
                        allow_new_levels = TRUE,
                        n = 2000) %>%
  ggplot(aes(x = .prediction, y = condition)) +
  geom_density_ridges(fill = "#41b6c4") +    theme_minimal()

Here you can see that the new level is much flatter than the other original five. It comes from the same population as the others, which is rather variable (the group means are sort of different to each other). As a result, this new distribution is quite wide, including all that uncertainty.

An ecologist might do something like this if we were had data on some species in a community, but wanted to make predictions for new, as yet unobserved, species we might find next year.


  1. this impulse in R to “help your users” by making it possible to say a great deal by saying almost nothing is… actually pretty counterproductive, I’d argue? But that’s a different post↩︎

  2. no magic required! rnorm is already vectorized↩︎