A secret weapon for when you’re building hierarchical models.
We’ve already looked at univariate models. When we fit the same model to multiple different groups, we don’t expect the same values for all the coefficients. Each thing we are studying will respond to the same variable in different ways.
Hierarchial models represent a way to model this variation, in ways that range from simple to complex.
Before we dive in with hierarchical structure, let’s build a bridge between these two approaches.
This is useful to help us understand what a hierarchical model does.
- CmdStanR documentation and vignettes: mc-stan.org/cmdstanr
- CmdStan path: /home/andrew/software/cmdstan
- CmdStan version: 2.34.1
# combine data and environmentmite_data_long <-bind_cols(mite.env, mite) |>pivot_longer(Brachy:Trimalc2, names_to ="spp", values_to ="abd")
To keep things simple and univariate, let’s consider only water:
First, a quick word about centering and scaling a predictor variable:
I center the predictor by subtracting the mean. This changes the intercept of my linear predictor. it becomes the mean log-odds of occurrance when the water content is average
I divide water content by 100. The dataset has units of grams per Litre of water (see ?vegan::mite.env for more details). This is fine, but I don’t think mites are able to sense differences as precise as a millimeter of water either way. by dividing by 10 I transform this into centilitres, which is more informative.
data {// number of rows in datasetint<lower=0> Nsites;// number of speciesint<lower=0> S;// one environmental variable to use in predictionvector[Nsites] x;// response site (rows) by species (columns) 2D arrayarray[Nsites,S] int <lower=0,upper=1> y;}parameters {// parameters are now VECTORSvector[S] intercept;vector[S] slope;}model {for (s in1:S){ y[,s] ~ bernoulli_logit(intercept[s] + slope[s] * x); }// priors don't change because Stan is vectorized:// every element of the vector gets the same prior intercept ~ normal(0, .5); slope ~ normal(0, .5);}
data {int<lower=0> Nsites; // num of sites in the dataset// int<lower=1> 2; // num of site predictorsint<lower=1> S; // num of speciesmatrix[Nsites, 2] x; // site-level predictorsarray[Nsites, S] int<lower=0, upper=1> y; // species presence or absence}transformed data {vector<lower=0>[2] sd_params = [0.5, 0.5]'; }parameters {// species departures from the averagematrix[2, S] z; // AVERAGE of slopes and interceptsvector[2] gamma;}transformed parameters {matrix[2, S] beta = rep_matrix(gamma, S) + diag_pre_multiply(sd_params, z);}model {matrix[Nsites, S] mu;// calculate the model average mu = x * beta;// Likelihoodfor (s in1:S) { y[,s] ~ bernoulli_logit(mu[,s]); }// priors to_vector(z) ~ std_normal(); gamma ~ normal(0, 2);}
Chain 1 Exception: mismatch in number dimensions declared and found in context; processing stage=data initialization; variable name=x; dims declared=(70,2); dims found=(70) (in '/tmp/RtmpLDZ6cu/model-f7225abe2cdc.stan', line 5, column 2 to column 22)
Chain 2 Exception: mismatch in number dimensions declared and found in context; processing stage=data initialization; variable name=x; dims declared=(70,2); dims found=(70) (in '/tmp/RtmpLDZ6cu/model-f7225abe2cdc.stan', line 5, column 2 to column 22)
Chain 3 Exception: mismatch in number dimensions declared and found in context; processing stage=data initialization; variable name=x; dims declared=(70,2); dims found=(70) (in '/tmp/RtmpLDZ6cu/model-f7225abe2cdc.stan', line 5, column 2 to column 22)
Chain 4 Exception: mismatch in number dimensions declared and found in context; processing stage=data initialization; variable name=x; dims declared=(70,2); dims found=(70) (in '/tmp/RtmpLDZ6cu/model-f7225abe2cdc.stan', line 5, column 2 to column 22)
Warning: Chain 1 finished unexpectedly!
Warning: Chain 2 finished unexpectedly!
Warning: Chain 3 finished unexpectedly!
Warning: Chain 4 finished unexpectedly!
Warning: All chains finished unexpectedly! Use the $output(chain_id) method for more information.
Warning: Use read_cmdstan_csv() to read the results of the failed chains.
Warning: No chains finished successfully. Unable to retrieve the fit.
data {int<lower=0> Nsites; // num of sites in the datasetint<lower=1> S; // num of speciesmatrix[Nsites, 2] x; // site-level predictorsarray[Nsites, S] int<lower=0, upper=1> y; // species presence or absence}parameters {matrix[2, S] z; // species departures from the averagevector[2] gamma; // AVERAGE of slopes and interceptsvector<lower=0>[2] sd_params; // standard deviations of species departures}transformed parameters {matrix[2, S] beta = rep_matrix(gamma, S) + diag_pre_multiply(sd_params, z);}model {matrix[Nsites, S] mu;// calculate the model average mu = x * beta;// Likelihoodfor (s in1:S) { y[,s] ~ bernoulli_logit(mu[,s]); }// priors to_vector(z) ~ std_normal(); sd_params ~ exponential(2); gamma ~ normal(0, 2);}
Covariance matrices
If you’ve fit a lot of hierarchical models you may know that usually, slopes and intercepts are modelled as correlated. In this course, we’ve decided to stop here, but the material below is available for anyone who wants
data {int<lower=0> Nsites; // num of sites in the datasetint<lower=1> S; // num of speciesmatrix[Nsites, 2] x; // site-level predictorsarray[Nsites, S] int<lower=0, upper=1> y; // species presence or absence}parameters {// species departures from the averagematrix[2, S] z; // AVERAGE of slopes and interceptsvector[2] gamma; // standard deviations of species departuresvector<lower=0>[2] sd_params; // add the correlations between slope and interceptcholesky_factor_corr[2] slope_inter_corr;}transformed parameters {matrix[2, S] beta = rep_matrix(gamma, S) + diag_pre_multiply(sd_params, slope_inter_corr) * z;}model {matrix[Nsites, S] mu;// calculate the model average mu = x * beta;// Likelihoodfor (s in1:S) { y[,s] ~ bernoulli_logit(mu[,s]); }// priors to_vector(z) ~ std_normal(); sd_params ~ exponential(2); gamma ~ normal(0, .5); slope_inter_corr ~ lkj_corr_cholesky(2);}
Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 1 Exception: lkj_corr_cholesky_lpdf: Random variable[2] is 0, but must be positive! (in '/tmp/RtmpLDZ6cu/model-f72247c0874.stan', line 33, column 2 to column 42)
Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Chain 1
Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 2 Exception: lkj_corr_cholesky_lpdf: Random variable[2] is 0, but must be positive! (in '/tmp/RtmpLDZ6cu/model-f72247c0874.stan', line 33, column 2 to column 42)
Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Chain 2
Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 2 Exception: lkj_corr_cholesky_lpdf: Random variable[2] is 0, but must be positive! (in '/tmp/RtmpLDZ6cu/model-f72247c0874.stan', line 33, column 2 to column 42)
Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Chain 2
Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 2 Exception: lkj_corr_cholesky_lpdf: Random variable[2] is 0, but must be positive! (in '/tmp/RtmpLDZ6cu/model-f72247c0874.stan', line 33, column 2 to column 42)
Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Chain 2
Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 2 Exception: lkj_corr_cholesky_lpdf: Random variable[2] is 0, but must be positive! (in '/tmp/RtmpLDZ6cu/model-f72247c0874.stan', line 33, column 2 to column 42)
Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Chain 2
Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 2 Exception: lkj_corr_cholesky_lpdf: Random variable[2] is 0, but must be positive! (in '/tmp/RtmpLDZ6cu/model-f72247c0874.stan', line 33, column 2 to column 42)
Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Chain 2
Chain 3 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 3 Exception: bernoulli_logit_lpmf: Logit transformed probability parameter[1] is -nan, but must be not nan! (in '/tmp/RtmpLDZ6cu/model-f72247c0874.stan', line 27, column 4 to column 36)
Chain 3 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 3 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Chain 3
Chain 4 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 4 Exception: lkj_corr_cholesky_lpdf: Random variable[2] is 0, but must be positive! (in '/tmp/RtmpLDZ6cu/model-f72247c0874.stan', line 33, column 2 to column 42)
Chain 4 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 4 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Chain 4
Chain 4 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 4 Exception: lkj_corr_cholesky_lpdf: Random variable[2] is 0, but must be positive! (in '/tmp/RtmpLDZ6cu/model-f72247c0874.stan', line 33, column 2 to column 42)
Chain 4 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 4 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Chain 4
Chain 2 finished in 3.4 seconds.
Chain 3 finished in 4.0 seconds.
Chain 1 finished in 4.1 seconds.
Chain 4 finished in 4.3 seconds.
All 4 chains finished successfully.
Mean chain execution time: 4.0 seconds.
Total execution time: 4.4 seconds.
plot these, reproducing the figure from earlier:
# get the unpooled numbersunpooled_slopes <- all_species_unpooled_posterior |> tidybayes::spread_rvars(slope[spp])# tidybayes::get_variables(all_species_partpooled_diag_posterior)partpooled_slopes <- all_species_partpooled_diag_corr_posterior |> tidybayes::spread_rvars(beta[param, spp]) |>filter(param ==2)left_join(unpooled_slopes, partpooled_slopes,by ="spp") |>ggplot(aes(x =median(slope), y =median(beta))) +geom_point() +geom_abline(intercept =0, slope =1)
The LKJ prior distribution is a prior over correlation matrices
# rethinking::rlkjcorr(1, 5, .3)
Source Code
---title: "Summarizing many univariate models"description: | A secret weapon for when you're building hierarchical models.execute: freeze: trueformat: html: code-tools: true---We've already looked at univariate models. When we fit the same model to multiple different groups, we don't expect the same values for all the coefficients. Each thing we are studying will respond to the same variable in different ways.Hierarchial models represent a way to *model* this variation, in ways that range from simple to complex.Before we dive in with hierarchical structure, let's build a bridge between these two approaches.This is useful to help us understand what a hierarchical model does.However it is also useful from a strict model-building perspective -- so useful that [Andrew Gelman calls it a "Secret Weapon"](https://statmodeling.stat.columbia.edu/2005/03/07/the_secret_weap/)```{r}data(mite, package ="vegan")data("mite.env", package ="vegan")library(tidyverse)library(cmdstanr)# combine data and environmentmite_data_long <-bind_cols(mite.env, mite) |>pivot_longer(Brachy:Trimalc2, names_to ="spp", values_to ="abd")```To keep things simple and univariate, let's consider only water:First, a quick word about centering and scaling a predictor variable:1. I center the predictor by subtracting the mean. This changes the *intercept* of my linear predictor. it becomes the mean log-odds of occurrance when the water content is average2. I divide water content by 100. The dataset has units of **grams per Litre** of water (see `?vegan::mite.env` for more details). This is fine, but I don't think mites are able to sense differences as precise as a millimeter of water either way. by dividing by 10 I transform this into centilitres, which is more informative.```{r}mite_data_long_transformed <- mite_data_long |>mutate(presabs =as.numeric(abd>0),# center predictorswater = (WatrCont -mean(WatrCont)) /100 )mite_data_long_transformed |>ggplot(aes(x = water, y = presabs)) +geom_point() +stat_smooth(method ="glm", method.args =list(family ="binomial")) +facet_wrap(~spp)```some things to notice about this figure:- the x-axis scale has been transformed from "grams per litre" to "centilitres away from average- there is a ton of variation in how different species respond to water!```{r}mite_many_glms <- mite_data_long_transformed |>nest_by(spp) |>mutate(logistic_regressions =list(glm(presabs ~ water,family ="binomial",data = data))) |>mutate(coefs =list(broom::tidy(logistic_regressions)))```:::{.callout-note}## Split-Apply-CombineTo explore this kind of thinking, we are going to use an approach sometimes called ["split-apply-combine"](https://vita.had.co.nz/papers/plyr.pdf)There are many possible ways to do this in practice. We are using a technique here from the tidyverse, which you can [read more about](https://dplyr.tidyverse.org/articles/rowwise.html).:::```{r}mite_many_glm_coefs <- mite_many_glms |>select(-data, -logistic_regressions) |>unnest(coefs)mite_many_glm_coefs |>ggplot(aes(x = estimate, y = spp,xmin = estimate - std.error,xmax = estimate + std.error)) +geom_pointrange() +facet_wrap(~term, scales ="free")```As you can see, some of these estimates are high, others low. We could also plot these as histograms to see this distribution.```{r}mite_many_glm_coefs |>ggplot(aes(x = estimate)) +geom_histogram(binwidth = .5) +facet_wrap(~term, scales ="free")```Once again, the two parameters of this model represent:- *Intercept* The probability (in log-odds) of a species being present at the average water concentration. some species are common, others are rare.- *water* this is the change in probability (in log-odds) as water increases by one centilitre per litre of substrate.## Say it in StanThe above tidyverse approach is very appealing and intuitive, but we can also do the same procedure in Stan.```{r}#| class-output: stanall_species_unpooled <-cmdstan_model(stan_file ="topics/correlated_effects/all_species_unpooled.stan", pedantic =TRUE)all_species_unpooled```Let's fit this model by passing in the data:```{r}mite_bin <- mitemite_bin[mite_bin>0] <-1mite_pa_list <-list(Nsites =nrow(mite_bin),S =ncol(mite_bin),x =with(mite.env, (WatrCont -mean(WatrCont))/100),y =as.matrix(mite_bin) )all_species_unpooled_posterior <- all_species_unpooled$sample(data = mite_pa_list, refresh =1000, parallel_chains =4 )```now let's try to plot this:```{r}#| warning: false# start by looking at the names of variables# get_variables(all_species_unpooled_posterior)post_pred <- tidybayes::spread_rvars(all_species_unpooled_posterior, intercept[spp_id], slope[spp_id]) |>expand_grid(water =seq(from =-4, to =4, length.out =10)) |>mutate(prob = posterior::rfun(plogis)(intercept + slope*water),spp =colnames(mite_bin)[spp_id]) |>ggplot(aes(x = water, dist = prob)) + tidybayes::stat_lineribbon() +facet_wrap(~spp) +scale_fill_brewer(palette ="Greens")post_pred```We can imitate the original figure by adding the observed data in orange:```{r}#| warning: falsepost_pred +geom_point(aes(x = water, y = presabs), inherit.aes =FALSE, data = mite_data_long_transformed,pch =21, fill ="orange")```Plot and compare to frequentist point estimates```{r}long_rvars <- tidybayes::gather_rvars( all_species_unpooled_posterior, intercept[spp_id], slope[spp_id]) mite_many_glm_coefs |>select(spp, term, estimate)```## Saying it another way: with matrix algebraFirst let's look at how to scale a univariate distribution:```{r}#| layout-ncol: 2z <-rnorm(300, mean =0, sd =1)hist(z)sd(z)hist(z*2.7)sd(z*2.7)```you can also scale two things at once with a diagonal matrix:```{r}two_sds <-diag(c(.4, 7))zz <-matrix(rnorm(200), ncol =2)rescaled_zz <- zz %*% two_sdsplot(rescaled_zz)apply(rescaled_zz, 2, sd)```We can apply this approach in Stan to create the SAME model as above:```{r}#| class-output: stanall_species_unpooled_diag <-cmdstan_model(stan_file ="topics/correlated_effects/all_species_unpooled_diag.stan", pedantic =TRUE)all_species_unpooled_diag``````{r}all_species_unpooled_diag_sample <- all_species_unpooled_diag$sample(data = mite_pa_list, refresh =1000, parallel_chains =4)```## Making it hierarchical```{r}#| class-output: stanall_species_partpooled_diag <-cmdstan_model(stan_file ="topics/correlated_effects/all_species_partpooled_diag.stan", pedantic =TRUE)all_species_partpooled_diag```:::{.callout-note}## Covariance matricesIf you've fit a lot of hierarchical models you may know that usually, slopes and intercepts are modelled as correlated. In this course, we've decided to stop here, but the material below is available for anyone who wants :::## Modelling COvariation```{r}#| class.output: stanall_species_partpooled_diag_corr <-cmdstan_model(stan_file ="topics/correlated_effects/all_species_partpooled_diag_corr.stan", pedantic =TRUE)all_species_partpooled_diag_corr``````{r}mite_data_list <-list(Nsites =nrow(mite_bin),S =ncol(mite_bin),x =cbind(1, with(mite.env, (WatrCont -mean(WatrCont))/100)),y =as.matrix(mite_bin))all_species_partpooled_diag_corr_posterior <- all_species_partpooled_diag_corr$sample(data = mite_data_list,refresh =0, parallel_chains =4)```plot these, reproducing the figure from earlier:```{r}# get the unpooled numbersunpooled_slopes <- all_species_unpooled_posterior |> tidybayes::spread_rvars(slope[spp])# tidybayes::get_variables(all_species_partpooled_diag_posterior)partpooled_slopes <- all_species_partpooled_diag_corr_posterior |> tidybayes::spread_rvars(beta[param, spp]) |>filter(param ==2)left_join(unpooled_slopes, partpooled_slopes,by ="spp") |>ggplot(aes(x =median(slope), y =median(beta))) +geom_point() +geom_abline(intercept =0, slope =1)```## ```{r}unpooled_params <- all_species_unpooled_posterior |> tidybayes::spread_rvars(slope[spp], intercept[spp]) |>mutate(slope =median(slope), intercept =median(intercept))all_species_partpooled_diag_corr_posterior |> tidybayes::spread_rvars(beta[param, spp]) |>mutate(beta =median(beta),param =c("intercept", "slope")[param]) |>pivot_wider(names_from ="param", values_from = beta) |>ggplot(aes(x = intercept, y = slope)) +geom_point() +geom_point(data = unpooled_params, col ="red")```## Correlating numbers with a cholesky decomposition of a correlation matrix.```{r}eg <- rethinking::rlkjcorr(1, 2, 1)cc <-chol(eg)zz <-matrix(data =rnorm(2000), ncol =2)plot(zz)cor(zz)rr <-t(cc) %*%t(zz)plot(t(rr))cor(t(rr))[1,2]```The LKJ prior distribution is a prior over correlation matrices```{r}# rethinking::rlkjcorr(1, 5, .3)```