Gaussian process

Guillaume Blanchet – Andrew MacDonald

2024-05-02

Hierarchical models so far

An already (very !) general formulation

So far, we have built hierarchical models that can be integrated into the following framework.

\[(\mathbf{y}|\mathbf{X},\mathbf{Z}, \boldsymbol{\beta}, \mathbf{b}, \sigma_\mathbf{y}^2)\sim \mathcal{MVN}(\mathbf{X} \boldsymbol{\beta} + \mathbf{Z}\mathbf{b}, \sigma_\mathbf{y}^2\mathbf{I})\]

where

\[\mathbf{b}\sim \mathcal{MVN}(\mu, \mathbf{\Sigma})\]

  • \(\mathbf{y}\) is a vector quantifying a response variable of length \(n\)
  • \(\mathbf{X}\) is a matrix of explanatory variables with \(n\) rows (samples) and \(p\) columns (explanatory variables)
  • \(\boldsymbol{\beta}\) is a vector \(p\) pararameters weighting the importance of each explanatory variables in \(\mathbf{X}\)
  • \(\sigma_\mathbf{y}^2\) is a measure of variance of the error in the regression model
  • \(\mathbf{I}\) is an \(n \times n\) identity matrix
  • \(\mathbf{Z}\) is designed matrix of “explanatory” variables with \(n\) rows (samples) and \(q\) columns
  • \(\mu\) is a vector defining the average importance of hierarchical parameters
  • \(\mathbf{\Sigma}\) is a matrix defining the covariance structure of hierarchical parameters

Hierarchical models so far

An already (very !) general formulation

Another way to write this generalized formulation is

\[\mathbf{y}_i = \mathbf{X}_{ij} \boldsymbol{\beta}_j + \mathbf{Z}_{ik}\mathbf{b}_{k} + \boldsymbol{\varepsilon}_{ij}\] where

\[\mathbf{b}\sim \mathcal{MVN}(\mu, \mathbf{\Sigma})\] and

\[\varepsilon \sim \mathcal{N}(0, \sigma^2)\]

Even more complex hierarchical models!?

So far we have seen many (!) versions of hierarchical models, which got increasingly more complex in their structure.

Let’s continue on that slippery slope…

Would you know how to constraint (spatially, temporally, phylogenetically, etc.) such a model ?

Even more complex hierarchical models!

If we get back to a model that has a single hierarchy on the intercept such that

\[\mathbf{y} \sim \mathcal{N}(\mathbf{b},\sigma^2)\]

where

\[\mathbf{b} \sim \mathcal{N}(\mu, \sigma^2)\]

If we want to account for a constraint on the previously presented model we can rewrite the equation fo \(\mathbf{b}\) as

\[\mathbf{b} \sim \mathcal{N}\left(\mu, f(d)\right)\]

where

  • \(f(d)\) is a function of a distance matrix

Even more complex hierarchical models!

\[\mathbf{b} \sim \mathcal{N}\left(\mu, f(d)\right)\]

What this equation means conceptually is that the variance associated to \(\mathbf{b}\) is not a constant, it changes based on distance (across space, time, phylogeny, etc.). This is known as a Gaussian process.

A bit of history

In statistics, Gaussian processes have a unique history. The development of this type of model is closely linked to the estimation of mineral deposits.

Spatial Gaussian processes are also called geostatistical models, where the prefix geo refers to geology, not geography, as one may be led to believe.

As mining engineers are at the root of the development of Gaussian processes, the language associated with this type of model is influenced by this field.

A bit of history

Gaussian processes have been developed in the 1950s by

Daniel G. Krige (1919–2013)

Georges Mathéron (1930–2000)

Assumption with Gaussian processes

In general, when defining a Gaussian process, we make the following assumptions:

  • The closer two sites are, the more similar they are.
  • After a certain distance, it is no longer necessary to consider that a site influences another site.

Note The distance of influence of a site on another can be different depending on what is being studied, where it is being studied and when it is being studied

\(f(d)\)

So, what does \(f(d)\) looks like exactly ?

In theory, \(f(d)\) can be anything…

However, in practice, there are particularities of the functions that are defined by the assumptions we impose on our model.

As such, here is a classic structure these variance function take

\(f(d)\) - A bit of vocabulary

Nugget effect

\(f(d)\) - A bit of vocabulary

Range

\(f(d)\) - A bit of vocabulary

Sill

\(f(d)\) - types of functions

Many functions have been proposed have this general shape

In this lecture, we will use the exponential function

\[C_0 + C_1 \left(1-e^{-d/a}\right)\]

where

  • \(C_0\) is the nugget effect
  • \(C_1\) is the sill
  • \(d\) is the distance
  • \(a\) is the range

Exponential function

Nugget : 2 – Sill : 5 – Range : 10

An illustrative example

To present how Gaussian processes can be used, let’s study the distribution of Sylvilagus oviparus in Montréal.

A few characteristics of Sylvilagus oviparus

  • They are found mainly in urban parks of Montréal and are very efficient at hiding in hollow trees and burrows.
  • They lay their eggs (often pastel-coloured) on the Sunday following the first full moon after the spring equinox.
  • They move well in an urban setting and are not affected by the level of urbanisation

A typical member of the species

Distribution of S. oviparus in Montréal

In 2015, a survey was carried out to find Sylvilagus oviparus in Montréal’s park. Here are the data . . .

Within the censused park

  • Blue parks : observed

  • Pink parks : not observed

Model

Since we have presence-absence data

\[P(y = 1) = \frac{\exp(b)}{1 - \exp(b)}\]

where \[b \sim{\cal N}\left(0, C_0 +C_1\left(1 - e^{\frac{-d}{a}}\right)\right)\]

Fitting the model

If we estimate the parameters of the model presented in the previous slide we get a Gaussian process that looks like

What can we learn from this model ?

Kringing

If we want to interpolate across the region of interest, this is known as kriging.

Simple kriging

So far, we have seen the most simplistic Gaussian process where there is not even any intercept that is estimated. In short, the model is constructed using only the covariance function.

In a linear regression perspective, this means that

\[\mathbf{y}\sim\mathcal{N}(0, f(d))\]

Kringing

Ordinary kriging

If we want to account for an intercept in the model such that

\[\mathbf{y}\sim\mathcal{N}(\beta_0, f(d))\]

This is known as ordinary kriging.

Universal kriging

If we want to account for one or more explanatory variables in the model such that

\[\mathbf{y}\sim\mathcal{N}(\beta_0 + \mathbf{X}\beta, f(d))\] This is known as Universal kriging.

Prediction map