Probabilities. Basic probability theory

class: title-slide, middle

# Day 1. Probabilities.

## Basic probability theory

.instructors[
  **ECL707/807** - Dominique Gravel
]

---

# Resources

MacKay, David J.C. 2003. Information thoery, Inference, and learning algorithms. Chapter 2. Cambridge University Press. Cambridge.

Bolker, Benjamin M. 2009. Ecological Models and Data in R. Chapter 4. Princeton University Press. Princeton.

---

# Rule 1.

*If two events are mtually exclusive (e.g. survival or death during a time interval), then the probability that either occurs (the probability of `\(A\)` or `\(B\)`, or `\(P(A \bigcup B)\)`) is the sum of their individual probabilities : `\(P(A \bigcup B) = P(A) + P(B)\)`.*

**Example** : the probability of picking a value lower or equal to three with a dice is `\(P(x \leq 3) = P(x=1) + P(x=2) + P(x = 3)\)`

---

# Rule 2.

*If two events `\(A\)` and `\(B\)` are not mutually exclusive -- the joint probability that they occur together, `\(P(A \bigcap B)\)`, is greater than zero -- then we have to correct the rule for combining probabilities to account for double counting:*

`\(P(A \bigcup B) = P(A) + P(B) - P(A \bigcap B)\)`

**Example** : If we are tabulating the color and sex of animals, `\(P(blue \bigcup male) = P(blue) + P(male) - P(blue \bigcap male)\)`.

---

# Rule 3.

*The probabilities of all possible outcomes of an observation or experiment add to 1.*

**Example** : Leslie matrices are used to represent the probability of changing from one ecological state (e.g. size class) to another over a time interval. The row sums usually do not sum to one, the missing piece is the mortality.

---

# Rule 4.

*The conditional probability of `\(A\)` given `\(B\)`, `\(P(A|B)\)`, is the probability that `\(A\)` happens if we know that `\(B\)` happens. The conditional probability equals :*

`\(P(A|B) = P(A \bigcap B)/P(B)\)`

**Example** : The mortality probability of an individual in a structured population is often dependent on size or age. The estimated probability of a randomly picked individual will be more precise if we have information about age (conditional probability) than if we do not (marginal probability).

---

# Rule 5.

*If the conditional probability of `\(A\)` given `\(B\)`, `\(P(A|B)\)`, equals the unconditional probability of `\(A\)`, `\(P(A)\)`, then `\(A\)` is independent of `\(B\)`. Knowing about `\(B\)` provides no information about the probability of `\(A\)`. Independence implies that:*

`\(P(A \bigcap B) = P(A)P(B)\)`

**Exemple** : Co-occurrence analysis consists of comparing the observed joint probability of occurrence of a pair of species to the expectation where species are independently distributed. A spatial association is detected when `\(P(A,B)\)` differs significantly from `\(P(A)P(B)\)`.

---

# Exercise 1.3.

Five trees of a focal species located around a seed trap at various distances. You have a model telling you that the probability a seed falling from these trees getting into the trap is `\(P = \{0.01, 0.2, 0.17, 0.24, 0.06\}\)`.

What is the probability Jonathan measure at least one seed in the trap ? What is the probability there is not a single seed ?

---

# Exercise 1.4.

Jaccard similarity index is one of the most widely used metrics to compute the similarity between two communities. It is based on the occurrence of different types of joint events :

`\(J(A,B) = \frac{N_{11}}{N_{01}+ N_{10} + N_{11}}\)`

where `\(N_{11}\)` is the number of species present at both locations, `\(N_{01}\)` is the number of species absent at first location but present at second and `\(N_{10}\)` is the inverse.

You are interested by the expected Jaccard similarity because you want to know if it differs from a random distribution. You therefore need to compute the expected Jaccard index, knowing that regional prevalences of species `\(A\)`, `\(B\)` and `\(C\)` are `\(\{0.22, 0.15, 0.37\}\)` respectively. What is the expected value ?

---

# Exercise 1.4. (advanced)

Now suppose that you found a study revealing that dispersal limitations generates spatial autocorrelation such that the conditional occurrence at location `\(x\)` depends on the occurrence at location `\(y\)` and the distance `\(D_{xy}\)` between two locations, following the relationship

`\(P(x|y, D_{xy}) = (1-P(x))*exp(-\alpha* D_{xy}) + P(x)\)`.

This function is also reversed for the conditional probability of no occurrence.

Plot the expected relationship between Jaccard similarity index and distance between the locations for different `\(\alpha\)` coefficients.

---

# Exercise 1.5. 
## Uncertainty in species distribution models

Guillaume is interested by the distribution of a rare bird species and potential threats of a changing climate. He used presence-absence data to fit a GLM for binomial data with temperature as predictor. The model returns an occurrence probability `\(p_x\)` for every location `\(x\)`,. The variance of occurrence equals to number of trials (observations) times `\(p*(1-p)\)`. In other words, the variance is maximal at 0.5 and it shrinks as the occurrence probability either tends to `\(0\)` or `\(1\)`.

Guillaume thought his model was pretty crappy because the largest occurrence probability across the range was `\(0.2\)`. Paradoxically, this means the uncertainty in the observation of the species is largest at locations where the species is known to be the most likely to occur. As many ecologists tend to do, he hypothesized that he had not enough information and went out to double his sample size. His results were still the same.

---

# Exercise 1.5. 
## Uncertainty in species distribution models

Guillaume is confused because treating the distribution as a stochastic phenomenon got him to think about several questions :
- Why the species is not 100% sure to occur at its optimal location ? 
- What are the sources of uncertainty in the model ?
- Can we reduce this uncertainty ?

---

# Exercise 1.5. 
## Follow up.

Guillaume consulted a friend ornitologist who knows a lot about that species. He was not surprised by the results since that particular bird species requires a very specific type of habitat to occur. Guillaume therefore worked hard to collect information about the distribution of the habitat and managed to re-run the model with that additional information.

Results were very promising, the occurrence probability went up to `\(0.8\)` when the habitat was favourable and as low as `\(0.05\)` for unfavourable habitats, even at optimal climatic condition. That's a significant reduction in the uncertainty, now the model is much better.

With these number in hands, can you guess what is the relative proportion of this habitat in the landscape, at optimal climatic conditions ?

---

# Exercise 1.5. 
## Follow up.

Guillaume now wants to run climate change scenarios to see where the species could be distributed in the future. The problem he faces is that he does not have scenarios for the potential distribution of this particular habitat feature. He therefore decided to keep this proportion of habitats constant for the future. What are the impacts of that decision on the uncertainty in the forecasts?

---

# Discussion. Is nature fundamentally noisy ?

Clark (2007, TREE) makes the following argument :

*Consider a word model like this:*

`\(response = f(covariates, parameters) + error\)`

*One way to view progress in science is what occurs when variation moves from the second term (unknown) to the first term (known). As information accumulates, we can incorporate more process in the first term.*

In the limit, this reasoning implies that there is no fundamental stochasticity in nature, only deterministic responses to random conditions. Ecologists are constantly fighting stochasticity to find signal in very noisy data. Is it only ignorance or part of nature is fundamentally stochastic ?

---

# Bayes rule

The Bayes rule in its most basic form could be derived from the above rules of probability:

`\(P(A|B) = \frac{P(B|A)P(A)}{P(B)}\)`

As we'll see, it's a very handy equation for bunch of problems. It is obviously fundamental to bayesian statistics, but its application goes beyond and one must be able to play with it before jumping to more sophisticated models.

---

# A classic (and so contemporary) example

Suppose a very hypothetical situation where a nasty disease hits the population and every individual is required to perform a test each time symptoms appear .... The disease is still rare, with approximately 1% of the population currently infected.  Scientists worked to develop a reliable PCR test but there are always problems with contaminations leading to false positives. `\(5\)` out of `\(100\)` who do not have the disease will test positive anyway.

Guillaume was caughing yesterdy and felt pretty bad so he went out to perform a test, which ended up positive. What is the probability that he has the disease ?

---

# Solution 
## First, define variables properly

`\(a = 1\)` : Guillaume has the disease

`\(a = 0\)` : Guillaume does not have the disease

`\(b = 1\)` : The test is positive

`\(b = 0\)` : The test is negative

---

# Solution
## Second, define probabilities

What is a false positive ?

---

# Solution
## Second, define probabilities

`\(P(b=1|a=1) = 0.95\)` : True positive

`\(P(b=1|a=0) = 0.05\)` : False positive

`\(P(a=1) = 0.01\)` : Disease prevalence

We are looking for the probability that Guillaume has the disease and it has been properly detected, `\(P(a=1|b=1)\)`

---

# Solution 
## Use Bayes theorem

`\(P(a=1|b=1) = \frac{P(b=1|a=1)P(a=1)}{P(b=1|a=1)P(a=1)+P(b=1|a=0)P(a=0)}\)`

`\(P(a=1|b=1) = \frac{0.95 X 0.01}{0.95 X 0.01 + 0.05 X 0.99}\)`

`\(P(a=1|b=1) = 0.1\)`

---

# Exercise 1.6.

You are interested to know if a certain parasite species `\(P\)` can infect an host species `\(H\)`. You have observed the interactions among several species at different locations and across the sample, you observed that both the species `\(P\)` and `\(H\)` co-occur at `\(5\)` different locations. However, you never observed them interacting with each other. You looked at all of the data for this parasite and you realized it is a fairly generalist species since it interacts with roughly with `\(20\)` percent of all other possible hosts in the community.

a) What is the probability of the data (ie no interaction) if the probability of a link is `\(0.2\)` ?

b) What is the probability of the observation (no interaction) because the species do not interact ?

c) What is the probability there is a link given the observation ?

---

# Exercise 1.6.
## Hint

The function *dbinom(x, size, prob)* returns you the probability of observing `\(x\)` events for `\(size\)` bernouilli trials with associated probability `\(prob\)`.