Day3: offsets

How to do a count analysis for different efforts, or exposure.

Andrew MacDonald true
2022-03-29

let’s imagine that we have counted dandelions.

Dandelions occur on average 6 per square meter

However we have five kinds of quadrat: 1, 4, 9 and 25 square meters

library(tidyverse)

imaginary_dandelions <- tibble(quadrat_size = rep(c(1,4, 9, 25), each = 15),
       n_per_m2 = purrr::map(quadrat_size, rpois, lambda = 6),
       obs_dandelions = map_dbl(n_per_m2, sum))

ggplot(imaginary_dandelions, aes(x = obs_dandelions)) + geom_histogram() + 
  facet_wrap(~quadrat_size)

How can we get the correct number of dandelions?

Poisson count model

\[ \begin{align} y &\sim \text{Poisson}(\lambda) \\ \text{log}(\lambda) &= \beta \end{align} \] \(\lambda\) is the average response. If we want to measure the average per unit effort, we can do that too:

\[ \begin{align} y &\sim \text{Poisson}(\lambda) \\ \text{log}(\lambda/Q) &= \beta \end{align} \]

\[ \begin{align} y &\sim \text{Poisson}(\lambda) \\ \text{log}(\lambda) - \text{log}(Q) &= \beta \end{align} \]

\[ \begin{align} y &\sim \text{Poisson}(\lambda) \\ \text{log}(\lambda) &= \beta + \text{log}(Q) \end{align} \]

In other words, we need a way to add a log coefficient to a model and give it a slope of exactly one. Fortunately the function offset() is here to do exactly this:

dandelion_model <- glm(obs_dandelions ~ 1, family = poisson(link = "log"), data = imaginary_dandelions)
summary(dandelion_model) 

Call:
glm(formula = obs_dandelions ~ 1, family = poisson(link = "log"), 
    data = imaginary_dandelions)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-9.986  -6.513  -2.443   2.700  11.891  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  4.07102    0.01686   241.4   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 3066.6  on 59  degrees of freedom
Residual deviance: 3066.6  on 59  degrees of freedom
AIC: 3387.5

Number of Fisher Scoring iterations: 5

This gives the wrong answer!

dandelion_model <- glm(obs_dandelions ~ 1 + offset(log(quadrat_size)),
                       family = poisson(link = "log"),
                       data = imaginary_dandelions)
summary(dandelion_model) 

Call:
glm(formula = obs_dandelions ~ 1 + offset(log(quadrat_size)), 
    family = poisson(link = "log"), data = imaginary_dandelions)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.38711  -0.73105  -0.01465   0.77425   3.07455  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.79375    0.01686   106.4   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 71.52  on 59  degrees of freedom
Residual deviance: 71.52  on 59  degrees of freedom
AIC: 392.5

Number of Fisher Scoring iterations: 4

the coefficient should be close to 6, after we reverse the link function:

exp(coef(dandelion_model)[[1]])
[1] 6.011966