We can also get a huge data frame. If you are comfortable manipulating data frames, you can use all your regular techniques here. Here I use tidyverse tools to reshape and plot the posterior distribution of the three \(\sigma\) variables:
Warning: Dropping 'draws_df' class as required metadata was removed.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Caution
Parameters from a posterior are NOT independent. If you want to combine parameters in any way (to calculate an average for example) you have to do it WITHIN each posterior sample. Work on ROWS of the draws data frame above
Calculate the posterior distribution of average abundance for each species
To do this we need to extract the average \(mu\) and add it to the species effects, \(\beta_{\text{species}}\)
spp_mu_rvars |>mutate(spp_avg = mu + spp_effects) |>ggplot(aes(dist = spp_avg, group = spp_id)) + tidybayes::stat_slab(col ="black") +coord_flip() +theme_minimal()
Source Code
---title: Fitting an intercept-only modeldescription: | Where is the variationexecute: freeze: trueformat: html: code-tools: true---## Resources* [tidybayes](http://mjskay.github.io/tidybayes/index.html) is an incredible tool, and the vignette is a great read for visualization approaches (even if you aren't using `rvars`)* the [posterior package](https://mc-stan.org/posterior/) is the best place to learn about how to manipulate Stan posterior distributions.$$\begin{align}\text{Abundance}_i &\sim \text{Poisson}(\lambda_i) \\\log{\lambda_i} &\sim \mu + \beta_{\text{sample}[i]} + \beta_{\text{species[i]}} + \beta_i\\\mu &\sim \text{Normal}(3, 1)\\\beta_{\text{sample}} &\sim \text{Normal}(0, \sigma_{\text{samp}})\\\beta_{\text{species}} &\sim \text{Normal}(0, \sigma_{\text{species}})\\\beta_i &\sim \text{Normal}(0, \sigma_{\text{obs}}) \\\sigma_{\text{samp}} &\sim \text{Exponential}(3)\\\sigma_{\text{species}} &\sim \text{Exponential}(3)\\\sigma_{\text{obs}} &\sim \text{Exponential}(3)\end{align}$$```{r}library(tidyverse)library(cmdstanr)library(tidybayes)data("mite", package ="vegan")spp_names <-colnames(mite)spp_names <-setNames(1:ncol(mite), colnames(mite))mite_long <- mite |>mutate(site_id =seq_len(nrow(mite))) |> tidyr::pivot_longer(-site_id,names_to ="spp",values_to ="abd") |> dplyr::mutate(spp_id = spp_names[spp])``````{r}#| class-output: stanlibrary(cmdstanr)spp_site_obs_intercepts <-cmdstan_model("topics/intercept_only/spp_site_obs_intercepts.stan", pedantic =TRUE)spp_site_obs_intercepts```Now we can sample this model. :::{.callout-warning}## Warning: irresponsible statisticsI'm sampling only 2 chains below, for illustration purposes only! use more chains in your research.:::```{r eval=FALSE}spp_site_obs_intercepts_samp <- spp_site_obs_intercepts$sample(data =list(N =nrow(mite_long),N_spp =max(mite_long$spp_id),spp_id = mite_long$spp_id,N_sites =max(mite_long$site_id),site_id = mite_long$site_id,abd = mite_long$abd ),chains =2, parallel_chains =2)spp_site_obs_intercepts_samp$save_object("topics/intercept_only/spp_site_obs_intercepts.rds")``````{r}spp_site_obs_intercepts_samp <-read_rds("topics/intercept_only/spp_site_obs_intercepts.rds")```## Exploring the model output```{r}spp_site_obs_intercepts_samp```The sampler was just fine! Note that we have just estimated many more parameters than observations.## Plot the standard deviations```{r}sigma_post <- spp_site_obs_intercepts_samp$draws(variables =c("sigma_spp", "sigma_sites", "sigma_obs"))sigma_post```The default output is a draws array```{r}sigma_post_df <- spp_site_obs_intercepts_samp$draws(variables =c("sigma_spp", "sigma_sites", "sigma_obs"),format ="data.frame")sigma_post_df```We can also get a huge data frame. If you are comfortable manipulating data frames, you can use all your regular techniques here. Here I use tidyverse tools to reshape and plot the posterior distribution of the three $\sigma$ variables:```{r}sigma_post_df |>pivot_longer(starts_with("sigma"), names_to ="sigma", values_to ="value") |>ggplot(aes(x = value)) +geom_histogram() +facet_wrap(~sigma)```:::{.callout-warning}## CautionParameters from a posterior are NOT independent. If you want to combine parameters in any way (to calculate an average for example) you have to do it WITHIN each posterior sample. Work on ROWS of the draws data frame above:::### Calculate the posterior distribution of average abundance for each speciesTo do this we need to extract the average $mu$ and add it to the species effects, $\beta_{\text{species}}$```{r}spp_avg_effects_df <- spp_site_obs_intercepts_samp$draws(variables =c("mu", "spp_effects"),format ="data.frame")spp_avg_effects_df |>select(mu, starts_with("spp_effects")) |>mutate(row_id =seq_along(mu)) |>pivot_longer(-c("mu", "row_id"), names_to ="parname", values_to ="spp_effect") |>mutate(spp_avg = mu + spp_effect) |>ggplot(aes(x = spp_avg, group = parname)) +geom_density()```## with rvarsWe can do the same operation even more quickly by using some of the tools from tidybayes```{r}spp_mu_rvars <- spp_site_obs_intercepts_samp |> tidybayes::spread_rvars(mu, spp_effects[spp_id])``````{r}spp_mu_rvars |>mutate(spp_avg = mu + spp_effects) |>ggplot(aes(dist = spp_avg, group = spp_id)) + tidybayes::stat_slab(col ="black") +coord_flip() +theme_minimal()```