2024-04-30
Hierarchical models are a generalized version of the classic regression models you have seen in your undergraduate courses.
In its simplest form, a regression model is usually presented as
\[ y_i = \beta_0 + \beta_1 x_{i} + \varepsilon \]
It is known as a simple linear model, where:
\(y_i\) is the value of a response variable for observation \(i\)
\(x_i\) is the value of an explanatory variable for observation \(i\)
\(\beta_0\) is the model intercept
\(\beta_1\) is the model slope
\(\varepsilon\) is the error term
The cool thing about the simple linear model is that it can be studied visually quite easily.
For example, if we are interested in knowing how a newly discovered plant species (Bidonia exemplaris) reacts to humidity, we can relate the biomass of B. exemplaris sampled at 100 sites to the soil humidity content and readily visualize the data and the trend.
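A minimal sketch in R, using simulated values in place of the actual B. exemplaris measurements (all parameter values below are made up for illustration):

```r
# Simulated example: biomass of B. exemplaris as a function of soil humidity
set.seed(42)
humidity <- runif(100, min = 0, max = 1)          # soil humidity at 100 sites
biomass  <- 2 + 5 * humidity + rnorm(100, sd = 1) # linear trend + Gaussian noise

# Visualize the data and the fitted trend
plot(humidity, biomass,
     xlab = "Soil humidity", ylab = "Biomass of B. exemplaris")
abline(lm(biomass ~ humidity), col = "red")
```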
Generally, we focus on the two most commonly assessed parameters of the simple linear model, the slope and the intercept, but there is another one that is very important to consider, especially for this course.
Any ideas which one it is?
If we go back to the mathematical description of the model
\[ y_i = \beta_0 + \beta_1 x_{i} + \varepsilon \]
we can see that in the simple linear regression the error term (\(\varepsilon\)) actually has a very precise definition:
\[\varepsilon \sim \mathcal{N}(0, \sigma^2)\]
where \(\sigma^2\) is an estimated variance.
In words, it means that the error in a simple linear regression follows a Gaussian distribution with a variance that is estimated.
For most of the course, we will play with the variance parameter \(\sigma^2\) in a bunch of different ways.
But before we do this, we need to understand a bit more about how this parameter influences the model.
A first way to do this is to think about the simple linear regression in a slightly different way. Specifically, based on what we learned in the previous slide, the simple linear regression can be rewritten as \[ y_i \sim \mathcal{N}(\beta_0 + \beta_1 x_{i}, \sigma^2) \]
As we will see later in this course, this writing style will become particularly useful.
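As a small illustration of this way of writing the model, data can be simulated directly from \(\mathcal{N}(\beta_0 + \beta_1 x_{i}, \sigma^2)\) in R (the parameter values below are arbitrary):

```r
# Simulate directly from y_i ~ N(beta_0 + beta_1 * x_i, sigma^2)
set.seed(1)
n      <- 100
x      <- runif(n)
beta_0 <- 2
beta_1 <- 5
sigma  <- 1   # rnorm() takes the standard deviation, not the variance

y <- rnorm(n, mean = beta_0 + beta_1 * x, sd = sigma)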
Variance of the model (\(\sigma^2\))
In essence, \(\sigma^2\) tells us about what the model could not account for.
For example, let’s compare the biomass of Bidonia exemplaris with that of Ilovea chicktighii, another species (a carnivorous plant)
Variance of the model (\(\sigma^2\))
By regressing the biomass of both plants on humidity, we can obtain the estimated parameters for each regression (which are all available using summary.lm).
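A sketch of how this can be done in R; the data are simulated here for illustration, with Ilovea chicktighii deliberately given a noisier relationship so that the two estimates of \(\sigma^2\) differ:

```r
# Fit one simple linear regression per species (simulated data)
set.seed(7)
humidity        <- runif(100)
bidonia_biomass <- 2 + 5 * humidity + rnorm(100, sd = 1)  # well explained by humidity
ilovea_biomass  <- 3 + 1 * humidity + rnorm(100, sd = 4)  # much noisier relationship

# summary() on an lm object dispatches to summary.lm and reports the intercept,
# the slope and the residual standard error (the estimate of sigma)
summary(lm(bidonia_biomass ~ humidity))
summary(lm(ilovea_biomass ~ humidity))
```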
Variance of the model (\(\sigma^2\))
For Bidonia exemplaris
There are two major pitfalls of the simple linear model for problems in the life sciences: it accounts for only one explanatory variable, and it assumes the response follows a Gaussian distribution.
Simple linear regression can be extended to account for multiple explanatory variables to study more complex problems. This type of regression model is known as a multiple linear regression.
Mathematically, a multiple linear regression can be defined as
\[ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \varepsilon \]
Theoretically, estimating the parameters of a multiple regression model is done the same way as for simple linear regression. However, in practice, matrix algebra is quite practical to use in this context, and also for this course in general.
In this respect, let’s take a bit of time to get acquainted with some basic (and maybe not so basic!) matrix algebra.
\[ \mathbf{A} = \begin{bmatrix} A_{11} & A_{12} & \dots & A_{1j} & \dots & A_{1n}\\ A_{21} & A_{22} & \dots & A_{2j} & \dots & A_{2n}\\ \vdots & \vdots & \ddots & \vdots & & \vdots\\ A_{i1} & A_{i2} & \dots & A_{ij} & \dots & A_{in}\\ \vdots & \vdots & & \vdots & \ddots & \vdots\\ A_{m1} & A_{m2} & \dots & A_{mj} & \dots & A_{mn}\\ \end{bmatrix} \] \[\mathbf{A} = \left[A_{ij}\right]=\left[A_{ij}\right]_{m\times n}\]
\[A = \begin{bmatrix} 5 & -6 & 4 & -4\\ \end{bmatrix} \]
\[B = \begin{bmatrix} -8\\ 9\\ -2\\ \end{bmatrix} \]
\[C = \begin{bmatrix} -4 & 1\\ 2 & -5\\ \end{bmatrix} \]
\[A^t=\begin{bmatrix} 5\\ -6\\ 4\\ -4\\ \end{bmatrix} \]
\[B^t =\begin{bmatrix} -8 & 9 & -2\\ \end{bmatrix} \]
\[C^t =\begin{bmatrix} -4 & 2\\ 1 & -5\\ \end{bmatrix} \]
In R
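A quick sketch in base R of how the matrices above and their transposes can be built, using `matrix()` and `t()`:

```r
# Build the matrices A, B and C shown above
A <- matrix(c(5, -6, 4, -4), nrow = 1)  # 1 x 4 row matrix
B <- matrix(c(-8, 9, -2), ncol = 1)     # 3 x 1 column matrix
C <- matrix(c(-4, 2, 1, -5), nrow = 2)  # 2 x 2 matrix (filled column by column)

# t() returns the transpose
t(A)
t(B)
t(C)
```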
\[\mathbf{B} = c\mathbf{A}\] \[B_{ij} = cA_{ij}\]
\[ 0.3 \begin{bmatrix} 3 & 5\\ 1 & -2\\ \end{bmatrix} = \begin{bmatrix} 0.9 & 1.5\\ 0.3 & -0.6\\ \end{bmatrix} \]
In R
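A sketch of the scalar multiplication example above in R; `*` multiplies every element by the scalar:

```r
# Scalar multiplication: each element is multiplied by the scalar
A <- matrix(c(3, 1, 5, -2), nrow = 2)  # the 2 x 2 matrix from the example
0.3 * A
```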
\[\mathbf{C} = \mathbf{A} \cdot \mathbf{B}\]
\[C_{ik} = \sum^{n}_{j=1}A_{ij}B_{jk}\]
Rules
Associative: \(\mathbf{A}(\mathbf{B}\mathbf{C}) = (\mathbf{A}\mathbf{B})\mathbf{C}\)
Distributive: \(\mathbf{A}(\mathbf{B} + \mathbf{C}) = \mathbf{A}\mathbf{B}+\mathbf{A}\mathbf{C}\)
Not commutative (in general): \(\mathbf{A}\mathbf{B} \neq \mathbf{B}\mathbf{A}\)
\[(\mathbf{Ax})_i=\sum_{j=1}^{n}A_{ij}x_j\]
\[ \begin{bmatrix} 3 & 5\\ 1 & -2\\ \end{bmatrix} \begin{bmatrix} 2\\ 5\\ \end{bmatrix} = \begin{bmatrix} (3, 5) \cdot (2, 5)\\ (1, -2) \cdot (2, 5) \\ \end{bmatrix} = \begin{bmatrix} 3 \times 2 + 5 \times 5\\ 1 \times 2 -2 \times 5\\ \end{bmatrix} = \begin{bmatrix} 31\\ -8\\ \end{bmatrix} \]
In R
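A sketch of the matrix-vector product above in R; note that `%*%` is the matrix product, while `*` would multiply element-wise:

```r
# Matrix (and matrix-vector) multiplication uses %*%
A <- matrix(c(3, 1, 5, -2), nrow = 2)
x <- c(2, 5)

A %*% x   # matrix-vector product: (31, -8)
```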
The identity matrix is a square matrix where all off-diagonal values are 0 and all diagonal values are 1.
\[ \mathbf{I}=\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\\ \end{bmatrix} \]
The identity matrix is important because
\[\mathbf{A} \cdot \mathbf{I}_n = \mathbf{A}\] or
\[\mathbf{I}_m \cdot \mathbf{A} = \mathbf{A}\]
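In R, the identity matrix is obtained with `diag()`; a quick check of the property above (the 3 × 2 example matrix is arbitrary):

```r
# diag(n) builds the n x n identity matrix
I <- diag(3)
A <- matrix(c(3, 1, 5, -2, 0, 7), nrow = 3)  # an arbitrary 3 x 2 matrix

I %*% A   # multiplying by the identity matrix leaves A unchanged
```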
A diagonal matrix is a square matrix where all values outside the diagonal are 0.
\[D= \begin{bmatrix} d_1 & 0 & \dots & 0\\ 0 & d_2 & \dots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \dots & d_n\\ \end{bmatrix}\]
An example
\[ \begin{bmatrix} -1 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 6\\ \end{bmatrix} \]
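In R, passing a vector to `diag()` builds the corresponding diagonal matrix:

```r
# diag() with a vector places that vector on the diagonal
D <- diag(c(-1, 0, 6))
D
```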
In this course, we will rely heavily on multiple linear regression and expand on it by studying how some of the parameters (the \(\beta\)s) can depend on other data and parameters.
As we saw earlier, a classic way to write multiple linear regression is
\[ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \varepsilon \]
However, we can rewrite this using matrix notation as
\[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} \]
Matrix notation
\[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} \]
Using the matrix notation, we assume that \(\mathbf{y}\) is an \(n \times 1\) vector of response values, \(\mathbf{X}\) is an \(n \times (p + 1)\) matrix of explanatory variables (with a first column of 1s for the intercept), \(\boldsymbol{\beta}\) is a \((p + 1) \times 1\) vector of regression parameters, and \(\boldsymbol{\varepsilon}\) is an \(n \times 1\) vector of errors.
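To illustrate why this notation (and matrix algebra) is practical, here is a sketch using simulated data: `model.matrix()` builds the design matrix \(\mathbf{X}\), and the ordinary least squares estimate \((\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{y}\), computed by hand, matches the coefficients returned by `lm()`:

```r
# Simulated data for a multiple linear regression with two explanatory variables
set.seed(123)
n  <- 100
x1 <- runif(n)
x2 <- runif(n)
y  <- 1 + 2 * x1 - 3 * x2 + rnorm(n, sd = 0.5)

# Design matrix X (first column of 1s for the intercept)
X <- model.matrix(~ x1 + x2)

# Ordinary least squares by matrix algebra: beta_hat = (X'X)^{-1} X'y
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y

# Same estimates via lm()
coef(lm(y ~ x1 + x2))
beta_hat
```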
As previously mentioned, in (simple and multiple!) linear regression the error term (\(\varepsilon\)) actually has a very precise definition:
\[\varepsilon \sim \mathcal{N}(0, \sigma^2)\] where \(\sigma^2\) is an estimated variance
which means that the error in a linear regression follows a Gaussian distribution with an estimated variance.
Note: The model can also be written using matrix notation as
\[\mathbf{y} \sim \mathcal{N}(\mathbf{X}\boldsymbol{\beta}, \sigma^2)\] where \(\sigma^2\) is an estimated variance.
To get around this problem (data for which the Gaussian assumption does not hold), generalized linear models (GLMs) have been proposed. In essence, GLMs use link functions to adapt models so that they can be used on non-Gaussian data.
Mathematically, the generic way to write a generalized linear model is
\[ \widehat{y}_i = g^{-1}(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip}) \]
or in matrix notation
\[ \widehat{\mathbf{y}} = g^{-1}(\mathbf{X}\boldsymbol{\beta}) \]
where \(g\) is the link function and \(g^{-1}\) the inverse link function.
Link functions
There are many types of link functions and they are usually directly associated to the underlying data the analysis is carried out on.
Arguably the most common link function in ecology is
logit link function
It is commonly used for modelling binary (0-1) data.
\[ \mathbf{X}\boldsymbol{\beta} = \ln\left(\frac{\widehat{\mathbf{y}}}{1 - \widehat{\mathbf{y}}}\right) \]
The inverse logit link function is
\[ \widehat{\mathbf{y}} = \frac{\exp(\mathbf{X}\boldsymbol{\beta})}{1 + \exp(\mathbf{X}\boldsymbol{\beta})} = \frac{1}{1 + \exp(-\mathbf{X}\boldsymbol{\beta})} \]
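In R, the logit and its inverse are available as `qlogis()` and `plogis()`; a quick check:

```r
# qlogis() is the logit:         log(p / (1 - p))
# plogis() is the inverse logit: 1 / (1 + exp(-x))
p <- 0.8
qlogis(p)          # logit of 0.8
plogis(qlogis(p))  # back-transform, returns 0.8
```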
Link functions
Another commonly used link function is
log link function
It is commonly used for modelling count data.
\[ \mathbf{X}\boldsymbol{\beta} = \ln\left(\widehat{\mathbf{y}}\right) \]
The inverse log link function is
\[ \widehat{\mathbf{y}} = \exp(\mathbf{X}\boldsymbol{\beta}) \]
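A minimal sketch of a GLM with a log link in R, using simulated count data (all parameter values below are arbitrary):

```r
# Simulated count data where log(lambda) is a linear function of x
set.seed(1)
n <- 100
x <- runif(n)
counts <- rpois(n, lambda = exp(0.5 + 1.2 * x))

# Poisson GLM with a log link
fit <- glm(counts ~ x, family = poisson(link = "log"))
summary(fit)

# Predictions on the response scale use the inverse link, exp()
exp(coef(fit)[1] + coef(fit)[2] * 0.5)
```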