How to do a count analysis for different efforts, or exposure.
let’s imagine that we have counted dandelions.
Dandelions occur on average 6 per square meter
However we have five kinds of quadrat: 1, 4, 9 and 25 square meters
library(tidyverse)
imaginary_dandelions <- tibble(quadrat_size = rep(c(1,4, 9, 25), each = 15),
n_per_m2 = purrr::map(quadrat_size, rpois, lambda = 6),
obs_dandelions = map_dbl(n_per_m2, sum))
ggplot(imaginary_dandelions, aes(x = obs_dandelions)) + geom_histogram() +
facet_wrap(~quadrat_size)

How can we get the correct number of dandelions?
\[ \begin{align} y &\sim \text{Poisson}(\lambda) \\ \text{log}(\lambda) &= \beta \end{align} \] \(\lambda\) is the average response. If we want to measure the average per unit effort, we can do that too:
\[ \begin{align} y &\sim \text{Poisson}(\lambda) \\ \text{log}(\lambda/Q) &= \beta \end{align} \]
\[ \begin{align} y &\sim \text{Poisson}(\lambda) \\ \text{log}(\lambda) - \text{log}(Q) &= \beta \end{align} \]
\[ \begin{align} y &\sim \text{Poisson}(\lambda) \\ \text{log}(\lambda) &= \beta + \text{log}(Q) \end{align} \]
In other words, we need a way to add a log coefficient to a model and give it a slope of exactly one. Fortunately the function offset() is here to do exactly this:
dandelion_model <- glm(obs_dandelions ~ 1, family = poisson(link = "log"), data = imaginary_dandelions)
summary(dandelion_model)
Call:
glm(formula = obs_dandelions ~ 1, family = poisson(link = "log"),
data = imaginary_dandelions)
Deviance Residuals:
Min 1Q Median 3Q Max
-9.986 -6.513 -2.443 2.700 11.891
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 4.07102 0.01686 241.4 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 3066.6 on 59 degrees of freedom
Residual deviance: 3066.6 on 59 degrees of freedom
AIC: 3387.5
Number of Fisher Scoring iterations: 5
This gives the wrong answer!
dandelion_model <- glm(obs_dandelions ~ 1 + offset(log(quadrat_size)),
family = poisson(link = "log"),
data = imaginary_dandelions)
summary(dandelion_model)
Call:
glm(formula = obs_dandelions ~ 1 + offset(log(quadrat_size)),
family = poisson(link = "log"), data = imaginary_dandelions)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.38711 -0.73105 -0.01465 0.77425 3.07455
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.79375 0.01686 106.4 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 71.52 on 59 degrees of freedom
Residual deviance: 71.52 on 59 degrees of freedom
AIC: 392.5
Number of Fisher Scoring iterations: 4
the coefficient should be close to 6, after we reverse the link function: