Summarizing many univariate models

A secret weapon for when you’re building hierarchical models.

We’ve already looked at univariate models. When we fit the same model to multiple different groups, we don’t expect the same values for all the coefficients. Each thing we are studying will respond to the same variable in different ways.

Hierarchial models represent a way to model this variation, in ways that range from simple to complex.

Before we dive in with hierarchical structure, let’s build a bridge between these two approaches.

This is useful to help us understand what a hierarchical model does.

However it is also useful from a strict model-building perspective – so useful that Andrew Gelman calls it a “Secret Weapon”

data(mite, package = "vegan")
data("mite.env", package = "vegan")
library(tidyverse)
library(cmdstanr)
This is cmdstanr version 0.7.1
- CmdStanR documentation and vignettes: mc-stan.org/cmdstanr
- CmdStan path: /home/andrew/software/cmdstan
- CmdStan version: 2.34.1
# combine data and environment
mite_data_long <- bind_cols(mite.env, mite) |> 
  pivot_longer(Brachy:Trimalc2, names_to = "spp", values_to = "abd")

To keep things simple and univariate, let’s consider only water:

First, a quick word about centering and scaling a predictor variable:

  1. I center the predictor by subtracting the mean. This changes the intercept of my linear predictor. it becomes the mean log-odds of occurrance when the water content is average
  2. I divide water content by 100. The dataset has units of grams per Litre of water (see ?vegan::mite.env for more details). This is fine, but I don’t think mites are able to sense differences as precise as a millimeter of water either way. by dividing by 10 I transform this into centilitres, which is more informative.
mite_data_long_transformed <- mite_data_long |> 
  mutate(presabs = as.numeric(abd>0),
         # center predictors
         water = (WatrCont - mean(WatrCont)) / 100
         )

mite_data_long_transformed |> 
  ggplot(aes(x = water, y = presabs)) + 
  geom_point() + 
  stat_smooth(method = "glm", method.args = list(family = "binomial")) + 
  facet_wrap(~spp)
`geom_smooth()` using formula = 'y ~ x'

some things to notice about this figure:

mite_many_glms <- mite_data_long_transformed |> 
  nest_by(spp) |> 
  mutate(logistic_regressions = list(
    glm(presabs ~ water,
        family = "binomial",
        data = data))) |> 
  mutate(coefs = list(broom::tidy(logistic_regressions)))
Split-Apply-Combine

To explore this kind of thinking, we are going to use an approach sometimes called “split-apply-combine”

There are many possible ways to do this in practice. We are using a technique here from the tidyverse, which you can read more about.

mite_many_glm_coefs <- mite_many_glms |> 
  select(-data, -logistic_regressions) |> 
  unnest(coefs)

mite_many_glm_coefs |> 
  ggplot(aes(x = estimate, y = spp,
             xmin = estimate - std.error,
             xmax = estimate + std.error)) + 
  geom_pointrange() + 
  facet_wrap(~term, scales = "free")

As you can see, some of these estimates are high, others low. We could also plot these as histograms to see this distribution.

mite_many_glm_coefs |> 
  ggplot(aes(x = estimate)) + 
  geom_histogram(binwidth = .5) + 
  facet_wrap(~term, scales = "free")

Once again, the two parameters of this model represent:

Say it in Stan

The above tidyverse approach is very appealing and intuitive, but we can also do the same procedure in Stan.

all_species_unpooled <- cmdstan_model(
  stan_file = "topics/correlated_effects/all_species_unpooled.stan", 
  pedantic = TRUE)

all_species_unpooled
data {
  // number of rows in dataset
  int<lower=0> Nsites;
  // number of species
  int<lower=0> S;
  // one environmental variable to use in prediction
  vector[Nsites] x;
  // response site (rows) by species (columns) 2D array
  array[Nsites,S] int <lower=0,upper=1> y;
}
parameters {
  // parameters are now VECTORS
  vector[S] intercept;
  vector[S] slope;
}
model {
  for (s in 1:S){
    y[,s] ~ bernoulli_logit(intercept[s] + slope[s] * x);
  }
  // priors don't change because Stan is vectorized:
  // every element of the vector gets the same prior
  intercept ~ normal(0, .5);
  slope ~ normal(0, .5);
}

Let’s fit this model by passing in the data:

mite_bin <- mite
mite_bin[mite_bin>0] <- 1

mite_pa_list <- list(
      Nsites = nrow(mite_bin),
      S = ncol(mite_bin),
      x = with(mite.env, (WatrCont - mean(WatrCont))/100),
      y = as.matrix(mite_bin)
    )

all_species_unpooled_posterior <- 
  all_species_unpooled$sample(
    data = mite_pa_list, 
    refresh = 1000, parallel_chains = 4
    )
Running MCMC with 4 parallel chains...

Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 2 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 3 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 4 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 1 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 1 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 2 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 2 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 3 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 3 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 4 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 4 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 2 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 3 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 4 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 2 finished in 1.7 seconds.
Chain 3 finished in 1.6 seconds.
Chain 4 finished in 1.6 seconds.
Chain 1 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 1 finished in 1.9 seconds.

All 4 chains finished successfully.
Mean chain execution time: 1.7 seconds.
Total execution time: 2.1 seconds.

now let’s try to plot this:

# start by looking at the names of variables
# get_variables(all_species_unpooled_posterior)

post_pred <- tidybayes::spread_rvars(all_species_unpooled_posterior, 
             intercept[spp_id], slope[spp_id]) |> 
  expand_grid(water = seq(from = -4, to = 4, length.out = 10)) |> 
  mutate(prob = posterior::rfun(plogis)(intercept + slope*water),
         spp = colnames(mite_bin)[spp_id]) |> 
  ggplot(aes(x = water, dist = prob)) + 
  tidybayes::stat_lineribbon() + 
  facet_wrap(~spp) + 
  scale_fill_brewer(palette = "Greens")

post_pred

We can imitate the original figure by adding the observed data in orange:

post_pred + 
  geom_point(aes(x = water, y = presabs), 
             inherit.aes = FALSE, 
             data = mite_data_long_transformed,
             pch = 21, 
             fill = "orange")

Plot and compare to frequentist point estimates

long_rvars <- tidybayes::gather_rvars(
  all_species_unpooled_posterior, 
             intercept[spp_id], slope[spp_id]) 


mite_many_glm_coefs |> 
  select(spp, term, estimate)
# A tibble: 70 × 3
# Groups:   spp [35]
   spp      term        estimate
   <chr>    <chr>          <dbl>
 1 Brachy   (Intercept)   2.43  
 2 Brachy   water        -0.547 
 3 Ceratoz1 (Intercept)   0.466 
 4 Ceratoz1 water         0.0273
 5 Ceratoz3 (Intercept)  -0.237 
 6 Ceratoz3 water         0.280 
 7 Eupelops (Intercept)  -0.464 
 8 Eupelops water        -0.535 
 9 FSET     (Intercept)  -0.644 
10 FSET     water        -1.82  
# ℹ 60 more rows

Saying it another way: with matrix algebra

First let’s look at how to scale a univariate distribution:

z <- rnorm(300, mean = 0, sd = 1)
hist(z)
sd(z)
hist(z*2.7)
sd(z*2.7)

[1] 0.9623279

[1] 2.598285

you can also scale two things at once with a diagonal matrix:

two_sds <- diag(c(.4, 7))

zz <- matrix(rnorm(200), ncol = 2)

rescaled_zz <- zz %*% two_sds

plot(rescaled_zz)

apply(rescaled_zz, 2, sd)
[1] 0.3745896 7.4239736

We can apply this approach in Stan to create the SAME model as above:

all_species_unpooled_diag <- cmdstan_model(
  stan_file = "topics/correlated_effects/all_species_unpooled_diag.stan", 
  pedantic = TRUE)

all_species_unpooled_diag
data {
  int<lower=0> Nsites;         // num of sites in the dataset
  // int<lower=1> 2;              // num of site predictors
  int<lower=1> S;              // num of species
  matrix[Nsites, 2] x;         // site-level predictors
  array[Nsites, S] int<lower=0, upper=1> y;  // species presence or absence
}
transformed data {
 vector<lower=0>[2] sd_params = [0.5, 0.5]'; 
}
parameters {
  // species departures from the average
  matrix[2, S] z;               
  // AVERAGE of slopes and intercepts
  vector[2] gamma;
}
transformed parameters {
  matrix[2, S] beta = rep_matrix(gamma, S) + diag_pre_multiply(sd_params, z);
}
model {
  matrix[Nsites, S] mu;
  // calculate the model average
  mu = x * beta;
  // Likelihood
  for (s in 1:S) {
    y[,s] ~ bernoulli_logit(mu[,s]);
  }
  // priors
  to_vector(z) ~ std_normal();
  gamma ~ normal(0, 2);
}
all_species_unpooled_diag_sample <- all_species_unpooled_diag$sample(
  data = mite_pa_list, 
  refresh = 1000, parallel_chains = 4
)
Running MCMC with 4 parallel chains...
Chain 1 Exception: mismatch in number dimensions declared and found in context; processing stage=data initialization; variable name=x; dims declared=(70,2); dims found=(70) (in '/tmp/RtmpLDZ6cu/model-f7225abe2cdc.stan', line 5, column 2 to column 22)
Chain 2 Exception: mismatch in number dimensions declared and found in context; processing stage=data initialization; variable name=x; dims declared=(70,2); dims found=(70) (in '/tmp/RtmpLDZ6cu/model-f7225abe2cdc.stan', line 5, column 2 to column 22)
Chain 3 Exception: mismatch in number dimensions declared and found in context; processing stage=data initialization; variable name=x; dims declared=(70,2); dims found=(70) (in '/tmp/RtmpLDZ6cu/model-f7225abe2cdc.stan', line 5, column 2 to column 22)
Chain 4 Exception: mismatch in number dimensions declared and found in context; processing stage=data initialization; variable name=x; dims declared=(70,2); dims found=(70) (in '/tmp/RtmpLDZ6cu/model-f7225abe2cdc.stan', line 5, column 2 to column 22)
Warning: Chain 1 finished unexpectedly!
Warning: Chain 2 finished unexpectedly!
Warning: Chain 3 finished unexpectedly!
Warning: Chain 4 finished unexpectedly!
Warning: All chains finished unexpectedly! Use the $output(chain_id) method for more information.
Warning: Use read_cmdstan_csv() to read the results of the failed chains.
Warning: No chains finished successfully. Unable to retrieve the fit.

Making it hierarchical

all_species_partpooled_diag <- cmdstan_model(
  stan_file = "topics/correlated_effects/all_species_partpooled_diag.stan", 
  pedantic = TRUE)

all_species_partpooled_diag
data {
  int<lower=0> Nsites;         // num of sites in the dataset
  int<lower=1> S;              // num of species
  matrix[Nsites, 2] x;         // site-level predictors
  array[Nsites, S] int<lower=0, upper=1> y;  // species presence or absence
}
parameters {
  matrix[2, S] z;               // species departures from the average
  vector[2] gamma;              // AVERAGE of slopes and intercepts
  vector<lower=0>[2] sd_params;          // standard deviations of species departures
}
transformed parameters {
  matrix[2, S] beta = rep_matrix(gamma, S) + diag_pre_multiply(sd_params, z);
}
model {
  matrix[Nsites, S] mu;
  // calculate the model average
  mu = x * beta;
  // Likelihood
  for (s in 1:S) {
    y[,s] ~ bernoulli_logit(mu[,s]);
  }
  // priors
  to_vector(z) ~ std_normal();
  sd_params ~ exponential(2);
  gamma ~ normal(0, 2);
}
Covariance matrices

If you’ve fit a lot of hierarchical models you may know that usually, slopes and intercepts are modelled as correlated. In this course, we’ve decided to stop here, but the material below is available for anyone who wants

Modelling COvariation

all_species_partpooled_diag_corr <- cmdstan_model(
  stan_file = "topics/correlated_effects/all_species_partpooled_diag_corr.stan", 
  pedantic = TRUE)

all_species_partpooled_diag_corr
data {
  int<lower=0> Nsites;         // num of sites in the dataset
  int<lower=1> S;              // num of species
  matrix[Nsites, 2] x;         // site-level predictors
  array[Nsites, S] int<lower=0, upper=1> y;  // species presence or absence
}
parameters {
  // species departures from the average
  matrix[2, S] z;               
  // AVERAGE of slopes and intercepts
  vector[2] gamma;              
  // standard deviations of species departures
  vector<lower=0>[2] sd_params; 
  // add the correlations between slope and intercept
  cholesky_factor_corr[2] slope_inter_corr;
}
transformed parameters {
  matrix[2, S] beta = rep_matrix(gamma, S) + 
    diag_pre_multiply(sd_params, slope_inter_corr) * z;
}
model {
  matrix[Nsites, S] mu;
  // calculate the model average
  mu = x * beta;
  // Likelihood
  for (s in 1:S) {
    y[,s] ~ bernoulli_logit(mu[,s]);
  }
  // priors
  to_vector(z) ~ std_normal();
  sd_params ~ exponential(2);
  gamma ~ normal(0, .5);
  slope_inter_corr ~ lkj_corr_cholesky(2);
}
mite_data_list <- list(
  Nsites = nrow(mite_bin),
  S = ncol(mite_bin),
  x = cbind(1, with(mite.env, (WatrCont - mean(WatrCont))/100)),
  y = as.matrix(mite_bin))

all_species_partpooled_diag_corr_posterior <- 
  all_species_partpooled_diag_corr$sample(
  data = mite_data_list,
  refresh = 0, parallel_chains = 4)
Running MCMC with 4 parallel chains...
Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 1 Exception: lkj_corr_cholesky_lpdf: Random variable[2] is 0, but must be positive! (in '/tmp/RtmpLDZ6cu/model-f72247c0874.stan', line 33, column 2 to column 42)
Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Chain 1 
Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 2 Exception: lkj_corr_cholesky_lpdf: Random variable[2] is 0, but must be positive! (in '/tmp/RtmpLDZ6cu/model-f72247c0874.stan', line 33, column 2 to column 42)
Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Chain 2 
Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 2 Exception: lkj_corr_cholesky_lpdf: Random variable[2] is 0, but must be positive! (in '/tmp/RtmpLDZ6cu/model-f72247c0874.stan', line 33, column 2 to column 42)
Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Chain 2 
Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 2 Exception: lkj_corr_cholesky_lpdf: Random variable[2] is 0, but must be positive! (in '/tmp/RtmpLDZ6cu/model-f72247c0874.stan', line 33, column 2 to column 42)
Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Chain 2 
Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 2 Exception: lkj_corr_cholesky_lpdf: Random variable[2] is 0, but must be positive! (in '/tmp/RtmpLDZ6cu/model-f72247c0874.stan', line 33, column 2 to column 42)
Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Chain 2 
Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 2 Exception: lkj_corr_cholesky_lpdf: Random variable[2] is 0, but must be positive! (in '/tmp/RtmpLDZ6cu/model-f72247c0874.stan', line 33, column 2 to column 42)
Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Chain 2 
Chain 3 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 3 Exception: bernoulli_logit_lpmf: Logit transformed probability parameter[1] is -nan, but must be not nan! (in '/tmp/RtmpLDZ6cu/model-f72247c0874.stan', line 27, column 4 to column 36)
Chain 3 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 3 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Chain 3 
Chain 4 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 4 Exception: lkj_corr_cholesky_lpdf: Random variable[2] is 0, but must be positive! (in '/tmp/RtmpLDZ6cu/model-f72247c0874.stan', line 33, column 2 to column 42)
Chain 4 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 4 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Chain 4 
Chain 4 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 4 Exception: lkj_corr_cholesky_lpdf: Random variable[2] is 0, but must be positive! (in '/tmp/RtmpLDZ6cu/model-f72247c0874.stan', line 33, column 2 to column 42)
Chain 4 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 4 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Chain 4 
Chain 2 finished in 3.4 seconds.
Chain 3 finished in 4.0 seconds.
Chain 1 finished in 4.1 seconds.
Chain 4 finished in 4.3 seconds.

All 4 chains finished successfully.
Mean chain execution time: 4.0 seconds.
Total execution time: 4.4 seconds.

plot these, reproducing the figure from earlier:

# get the unpooled numbers
unpooled_slopes <- all_species_unpooled_posterior |> 
  tidybayes::spread_rvars(slope[spp])

# tidybayes::get_variables(all_species_partpooled_diag_posterior)

partpooled_slopes <- all_species_partpooled_diag_corr_posterior |> 
  tidybayes::spread_rvars(beta[param, spp]) |> 
  filter(param == 2)

left_join(unpooled_slopes,
          partpooled_slopes,
          by = "spp") |> 
  ggplot(aes(x = median(slope), y = median(beta))) + 
  geom_point() + 
  geom_abline(intercept = 0, slope = 1)

unpooled_params <- all_species_unpooled_posterior |> 
  tidybayes::spread_rvars(slope[spp], intercept[spp]) |> 
  mutate(slope = median(slope), 
         intercept = median(intercept))



all_species_partpooled_diag_corr_posterior |> 
  tidybayes::spread_rvars(beta[param, spp]) |> 
  mutate(beta = median(beta),
         param = c("intercept", "slope")[param]) |> 
  pivot_wider(names_from = "param", values_from = beta) |> 
  ggplot(aes(x = intercept, y = slope)) + 
  geom_point() + 
  geom_point(data = unpooled_params, col = "red")

Correlating numbers with a cholesky decomposition of a correlation matrix.

eg <- rethinking::rlkjcorr(1, 2, 1)
cc <- chol(eg)

zz <- matrix(data = rnorm(2000), ncol = 2)
plot(zz)

cor(zz)
            [,1]        [,2]
[1,]  1.00000000 -0.08092257
[2,] -0.08092257  1.00000000
rr <- t(cc) %*% t(zz)
plot(t(rr))

cor(t(rr))[1,2]
[1] 0.5570892

The LKJ prior distribution is a prior over correlation matrices

# rethinking::rlkjcorr(1, 5, .3)