Stan

Andrew MacDonald – Guillaume Blanchet

2024-04-30

What is the point of this?

\[ \begin{equation} P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} \end{equation} \]

What is the point of this?

\[ \begin{equation} P(\boldsymbol{\theta}|\text{data}) = \frac{P(\text{data}|\boldsymbol{\theta}) \cdot P(\boldsymbol{\theta})}{P(\text{data})} \end{equation} \]

What is the point of this?

\[ \begin{equation} P(\boldsymbol{\theta}|\text{data}) \propto P(\text{data}|\boldsymbol{\theta}) \cdot P(\boldsymbol{\theta}) \end{equation} \]

What we talk about when we talk about \(P(\text{data}|\boldsymbol{\theta})\)

A concrete example

Laid    Hatched
Egg 1   Chick 1
Egg 2   Chick 2
Egg 3   Chick 3

What we talk about when we talk about \(P(\text{data}|\boldsymbol{\theta})\)

What’s the probability of observing this dataset, given a value of \(p\)?
Viewed as a function of the parameters, this probability is called the likelihood

\[ \begin{align} \text{hatch} &\sim \text{Binomial}(p, 5) \end{align} \]
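In R, this binomial pmf is dbinom(). As a quick illustration (the value p = 0.8 is not from the slides, chosen only for demonstration), the probability of 3 chicks hatching out of 5 eggs:

dbinom(3, size = 5, prob = 0.8)
# [1] 0.2048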

What we talk about when we talk about \(P(\text{data}|\boldsymbol{\theta})\)

\[ \begin{align*} P(3, 4, 5 | p) &= \text{Binomial}(3 | p, 5) \\ &\quad \times \text{Binomial}(4 | p, 5) \\ &\quad \times \text{Binomial}(5 | p, 5) \end{align*} \]
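This product is a one-liner in R, again at the illustrative p = 0.8:

prod(dbinom(c(3, 4, 5), size = 5, prob = 0.8))
# [1] 0.02748779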

What we talk about when we talk about \(P(\text{data}|\boldsymbol{\theta})\)

\[ \begin{align*} \log{P(3, 4, 5 | p)} &= \log(\text{Binomial}(3 | p, 5)) \\ &+ \log(\text{Binomial}(4 | p, 5)) \\ &+ \log(\text{Binomial}(5 | p, 5)) \end{align*} \]

Log-likelihood code in R

# chicks hatched in each of three clutches of 5 eggs
surv <- c(3, 4, 5)

# log-likelihood of the data at hatching probability x
calc_ll <- function(x) {
  sum(dbinom(surv, size = 5,
             prob = x,
             log = TRUE))
}

# evaluate on a grid of candidate probabilities
prob_val <- seq(from = 0, to = 1,
                length.out = 30)
log_lik <- numeric(length(prob_val))

for (i in seq_along(prob_val)) {
  log_lik[i] <- calc_ll(prob_val[i])
}
par(cex = 3)  # large text for the slide
# the endpoints give -Inf and are silently omitted from the plot
plot(prob_val, log_lik, type = "b")
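Since calc_ll returns the log-likelihood itself, base R's optimize() can locate its maximum; for these data the MLE is simply the overall hatch rate, 12/15 = 0.8:

# the maximizer should land at roughly p = 0.8
optimize(calc_ll, interval = c(0, 1), maximum = TRUE)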

Make it Bayes: add a prior

\[ \begin{align} \text{hatch} &\sim \text{Binomial}(p, 5) \\ p &\sim \text{Uniform}(0,1) \end{align} \]

Make it Bayes: add a prior

\[ \begin{align*} P(\boldsymbol{\theta}|\text{data}) &\propto P(\text{data}|\boldsymbol{\theta}) \cdot P(\boldsymbol{\theta}) \\[10pt] &\propto \text{Bin}(3|p,5) \cdot \text{Bin}(4|p,5) \\ &\qquad \cdot \text{Bin}(5|p,5) \cdot \text{Uniform}(p|0,1) \\[10pt] \log(P(\boldsymbol{\theta}|\text{data})) &\propto \log(\text{Bin}(3|p,5)) + \log(\text{Bin}(4|p,5)) \\ &\qquad + \log(\text{Bin}(5|p,5)) + \log(\text{Uniform}(p|0,1)) \end{align*} \]
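Because the right-hand side only needs to be known up to a constant, a grid approximation already recovers the posterior. A sketch reusing the surv data above (the flat prior contributes log 1 = 0 but is kept visible):

p_grid <- seq(from = 0.001, to = 0.999, length.out = 200)
log_post <- sapply(p_grid, function(p)
  sum(dbinom(surv, size = 5, prob = p, log = TRUE)) +
    dunif(p, min = 0, max = 1, log = TRUE))
post <- exp(log_post - max(log_post))  # subtract the max before exp() for stability
post <- post / sum(post)               # normalize over the grid
plot(p_grid, post, type = "l")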

Sampling the uncalculable

[Images: MANIAC I, 1956 (top); Arianna W. Rosenbluth]
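The denominator P(data) is the uncalculable part, but the Metropolis algorithm, which Rosenbluth programmed on the MANIAC, only ever uses posterior ratios, so it cancels. A minimal random-walk Metropolis sketch for the hatching posterior (proposal sd and iteration count are arbitrary illustrative choices, not a tuned sampler):

# unnormalized log-posterior; the uniform prior contributes 0 inside (0, 1)
log_post <- function(p) {
  if (p <= 0 || p >= 1) return(-Inf)  # outside the prior's support
  sum(dbinom(surv, size = 5, prob = p, log = TRUE))
}

n_iter <- 5000
draws <- numeric(n_iter)
p <- 0.5                              # starting value
for (i in 1:n_iter) {
  p_prop <- p + rnorm(1, sd = 0.1)    # symmetric random-walk proposal
  # accept with probability min(1, posterior ratio); P(data) has cancelled
  if (log(runif(1)) < log_post(p_prop) - log_post(p)) p <- p_prop
  draws[i] <- p
}
hist(draws, main = "Metropolis draws of p")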

Sampling the uncalculable

What is Stan?


[Image: Stanisław Ulam, after whom Stan is named]

Stan

https://mc-stan.org/

A comprehensive software ecosystem for applied Bayesian inference

Full Bayesian statistical inference with MCMC sampling (plus optimization and variational inference)

Integrated with most data analysis languages (R, Python, MATLAB, Julia, Stata)

Why Stan?

  • Open source
  • Extensive documentation
  • Powerful sampling algorithm
  • Large and active online community!

Hamiltonian Monte Carlo (HMC)


HMC

Metropolis and Gibbs limitations:

  • Require a lot of tuning to find the sweet spot between steps that are too large and too small
  • Inefficient in high-dimensional spaces
  • Can’t travel long distances between isolated local minima

Hamiltonian Monte Carlo:

  • Uses gradients of the log-posterior to reduce random-walk behaviour (and hence autocorrelation)

  • Static HMC (a minimal sketch follows this list)

  • No-U-Turn Sampler (NUTS)

  • Don’t get it? Viz it!
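To make the leapfrog mechanics concrete, here is a minimal static-HMC sketch for the hatching posterior, sampled on the unconstrained scale theta = logit(p). With the Jacobian of the transform included, the log-posterior for our data reduces to 13*theta - 17*log(1 + exp(theta)) up to a constant (13 = sum(surv) + 1, 17 = 5*3 + 2). Step size and path length are arbitrary illustrative choices; this sketches the idea, not Stan's implementation:

# log-posterior and its gradient on the log-odds scale theta = logit(p)
lp   <- function(theta) 13 * theta - 17 * log1p(exp(theta))
grad <- function(theta) 13 - 17 * plogis(theta)

# one HMC transition: draw a momentum, simulate Hamiltonian dynamics with
# the leapfrog integrator, then accept or reject on the total energy
hmc_step <- function(theta, eps = 0.1, L = 25) {
  r0 <- rnorm(1)
  th <- theta
  r <- r0 + eps / 2 * grad(th)            # half step for momentum
  for (l in 1:L) {
    th <- th + eps * r                    # full step for position
    if (l < L) r <- r + eps * grad(th)    # full step for momentum
  }
  r <- r + eps / 2 * grad(th)             # final half step
  # the gradient guides proposals far from the current value while the
  # total energy barely changes, so acceptance stays high
  if (log(runif(1)) < (lp(th) - r^2 / 2) - (lp(theta) - r0^2 / 2)) th else theta
}

draws <- numeric(2000)
theta <- 0
for (i in seq_along(draws)) {
  theta <- hmc_step(theta)
  draws[i] <- theta
}
hist(plogis(draws), main = "HMC draws of p")  # back-transform to p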

How to Stan
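As a preview, the hatching model written as a Stan program and fit from R. A minimal sketch assuming the rstan package is installed and a recent Stan version for the array syntax (cmdstanr works similarly):

library(rstan)

# the hatching model: binomial likelihood, uniform prior on p
model_code <- "
data {
  int<lower=1> N;                        // number of clutches
  array[N] int<lower=0, upper=5> hatch;  // chicks hatched per clutch of 5
}
parameters {
  real<lower=0, upper=1> p;              // hatching probability
}
model {
  p ~ uniform(0, 1);                     // the prior
  hatch ~ binomial(5, p);                // the likelihood
}
"

fit <- stan(model_code = model_code,
            data = list(N = 3, hatch = c(3, 4, 5)))
print(fit)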

WHY to Stan