Stan

Andrew MacDonald – Guillaume Blanchet

2024-04-30

What is the point of this?

\[ \begin{equation} P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} \end{equation} \]

What is the point of this?

\[ \begin{equation} P(\boldsymbol{\theta}|\text{data}) = \frac{P(\text{data}|\boldsymbol{\theta}) \cdot P(\boldsymbol{\theta})}{P(\text{data})} \end{equation} \]

What is the point of this?

\[ \begin{equation} P(\boldsymbol{\theta}|\text{data}) \propto P(\text{data}|\boldsymbol{\theta}) \cdot P(\boldsymbol{\theta}) \end{equation} \]

What we talk about when we talk about \(P(\text{data}|\boldsymbol{\theta})\)

A concrete example

Laid    Hatched
Egg 1   Chick 1
Egg 2   Chick 2
Egg 3   Chick 3

What we talk about when we talk about \(P(\text{data}|\boldsymbol{\theta})\)

What’s the probability of observing this dataset, given a value of \(p\)?
Viewed as a function of the parameters, this probability is called the likelihood

\[ \begin{align} \text{hatch} &\sim \text{Binomial}(p, 5) \end{align} \]
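In R, this binomial pmf is dbinom(). As a quick illustration (the value p = 0.8 is not from the slides, chosen only for demonstration), the probability of 3 chicks hatching out of 5 eggs:

dbinom(3, size = 5, prob = 0.8)
# [1] 0.2048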

What we talk about when we talk about \(P(\text{data}|\boldsymbol{\theta})\)

\[ \begin{align*} P(3, 4, 5 | p) &= \text{Binomial}(3 | p, 5) \\ &\quad \times \text{Binomial}(4 | p, 5) \\ &\quad \times \text{Binomial}(5 | p, 5) \end{align*} \]
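This product is a one-liner in R, again at the illustrative p = 0.8:

prod(dbinom(c(3, 4, 5), size = 5, prob = 0.8))
# [1] 0.02748779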

What we talk about when we talk about \(P(\text{data}|\boldsymbol{\theta})\)

\[ \begin{align*} \log{P(3, 4, 5 | p)} &= \log(\text{Binomial}(3 | p, 5)) \\ &+ \log(\text{Binomial}(4 | p, 5)) \\ &+ \log(\text{Binomial}(5 | p, 5)) \end{align*} \]

Log-likelihood code in R

# chicks hatched in each of three clutches of 5 eggs
surv <- c(3, 4, 5)

# log-likelihood of the data at hatching probability x
calc_ll <- function(x) {
  sum(dbinom(surv, size = 5,
             prob = x,
             log = TRUE))
}

# evaluate on a grid of candidate probabilities
prob_val <- seq(from = 0, to = 1,
                length.out = 30)
log_lik <- numeric(length(prob_val))

for (i in seq_along(prob_val)) {
  log_lik[i] <- calc_ll(prob_val[i])
}
par(cex = 3)  # large text for the slide
# the endpoints give -Inf and are silently omitted from the plot
plot(prob_val, log_lik, type = "b")
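Since calc_ll returns the log-likelihood itself, base R's optimize() can locate its maximum; for these data the MLE is simply the overall hatch rate, 12/15 = 0.8:

# the maximizer should land at roughly p = 0.8
optimize(calc_ll, interval = c(0, 1), maximum = TRUE)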

Make it Bayes: add a prior

\[ \begin{align} \text{hatch} &\sim \text{Binomial}(p, 5) \\ p &\sim \text{Uniform}(0,1) \end{align} \]

Make it Bayes: add a prior

\[ \begin{align*} P(\boldsymbol{\theta}|\text{data}) &\propto P(\text{data}|\boldsymbol{\theta}) \cdot P(\boldsymbol{\theta}) \\[10pt] &\propto \text{Bin}(3|p,5) \cdot \text{Bin}(4|p,5) \\ &\qquad \cdot \text{Bin}(5|p,5) \cdot \text{Uniform}(p|0,1) \\[10pt] \log(P(\boldsymbol{\theta}|\text{data})) &\propto \log(\text{Bin}(3|p,5)) + \log(\text{Bin}(4|p,5)) \\ &\qquad + \log(\text{Bin}(5|p,5)) + \log(\text{Uniform}(p|0,1)) \end{align*} \]
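Because the right-hand side only needs to be known up to a constant, a grid approximation already recovers the posterior. A sketch reusing the surv data above (the flat prior contributes log 1 = 0 but is kept visible):

p_grid <- seq(from = 0.001, to = 0.999, length.out = 200)
log_post <- sapply(p_grid, function(p)
  sum(dbinom(surv, size = 5, prob = p, log = TRUE)) +
    dunif(p, min = 0, max = 1, log = TRUE))
post <- exp(log_post - max(log_post))  # subtract the max before exp() for stability
post <- post / sum(post)               # normalize over the grid
plot(p_grid, post, type = "l")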

Sampling the uncalculable

[Images: MANIAC I, 1956 (top); Arianna W. Rosenbluth]
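The denominator P(data) is the uncalculable part, but the Metropolis algorithm, which Rosenbluth programmed on the MANIAC, only ever uses posterior ratios, so it cancels. A minimal random-walk Metropolis sketch for the hatching posterior (proposal sd and iteration count are arbitrary illustrative choices, not a tuned sampler):

# unnormalized log-posterior; the uniform prior contributes 0 inside (0, 1)
log_post <- function(p) {
  if (p <= 0 || p >= 1) return(-Inf)  # outside the prior's support
  sum(dbinom(surv, size = 5, prob = p, log = TRUE))
}

n_iter <- 5000
draws <- numeric(n_iter)
p <- 0.5                              # starting value
for (i in 1:n_iter) {
  p_prop <- p + rnorm(1, sd = 0.1)    # symmetric random-walk proposal
  # accept with probability min(1, posterior ratio); P(data) has cancelled
  if (log(runif(1)) < log_post(p_prop) - log_post(p)) p <- p_prop
  draws[i] <- p
}
hist(draws, main = "Metropolis draws of p")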

Sampling the uncalculable

What is Stan?


[Image: Stanisław Ulam, after whom Stan is named]

Stan

https://mc-stan.org/

A comprehensive software ecosystem for applied Bayesian inference

Full Bayesian statistical inference with MCMC sampling (plus optimization and variational inference)

Integrated with most data analysis languages (R, Python, MATLAB, Julia, Stata)

Why Stan?

  • Open source
  • Extensive documentation
  • Powerful sampling algorithm
  • Large and active online community!

Hamiltonian Monte Carlo (HMC)


HMC

Metropolis and Gibbs limitations:

  • Require a lot of tuning to find the sweet spot between steps that are too large and too small
  • Inefficient in high-dimensional spaces
  • Can’t travel long distances between isolated local minima

Hamiltonian Monte Carlo:

  • Uses gradients of the log-posterior to reduce random-walk behaviour (and hence autocorrelation)

  • Static HMC (a minimal sketch follows this list)

  • No-U-Turn Sampler (NUTS)

  • Don’t get it? Viz it!
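To make the leapfrog mechanics concrete, here is a minimal static-HMC sketch for the hatching posterior, sampled on the unconstrained scale theta = logit(p). With the Jacobian of the transform included, the log-posterior for our data reduces to 13*theta - 17*log(1 + exp(theta)) up to a constant (13 = sum(surv) + 1, 17 = 5*3 + 2). Step size and path length are arbitrary illustrative choices; this sketches the idea, not Stan's implementation:

# log-posterior and its gradient on the log-odds scale theta = logit(p)
lp   <- function(theta) 13 * theta - 17 * log1p(exp(theta))
grad <- function(theta) 13 - 17 * plogis(theta)

# one HMC transition: draw a momentum, simulate Hamiltonian dynamics with
# the leapfrog integrator, then accept or reject on the total energy
hmc_step <- function(theta, eps = 0.1, L = 25) {
  r0 <- rnorm(1)
  th <- theta
  r <- r0 + eps / 2 * grad(th)            # half step for momentum
  for (l in 1:L) {
    th <- th + eps * r                    # full step for position
    if (l < L) r <- r + eps * grad(th)    # full step for momentum
  }
  r <- r + eps / 2 * grad(th)             # final half step
  # the gradient guides proposals far from the current value while the
  # total energy barely changes, so acceptance stays high
  if (log(runif(1)) < (lp(th) - r^2 / 2) - (lp(theta) - r0^2 / 2)) th else theta
}

draws <- numeric(2000)
theta <- 0
for (i in seq_along(draws)) {
  theta <- hmc_step(theta)
  draws[i] <- theta
}
hist(plogis(draws), main = "HMC draws of p")  # back-transform to p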

How to Stan
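As a preview, the hatching model written as a Stan program and fit from R. A minimal sketch assuming the rstan package is installed and a recent Stan version for the array syntax (cmdstanr works similarly):

library(rstan)

# the hatching model: binomial likelihood, uniform prior on p
model_code <- "
data {
  int<lower=1> N;                        // number of clutches
  array[N] int<lower=0, upper=5> hatch;  // chicks hatched per clutch of 5
}
parameters {
  real<lower=0, upper=1> p;              // hatching probability
}
model {
  p ~ uniform(0, 1);                     // the prior
  hatch ~ binomial(5, p);                // the likelihood
}
"

fit <- stan(model_code = model_code,
            data = list(N = 3, hatch = c(3, 4, 5)))
print(fit)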

WHY to Stan