---
title: "Introduction to nimbleEcology"
author: "Perry de Valpine and Ben Goldstein"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
description: >
  An overview of functioning in nimbleEcology.
vignette: >
  %\VignetteIndexEntry{Introduction to nimbleEcology}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
  ,eval = TRUE ## uncomment this to build quickly without running code.
)
```



Welcome to `nimbleEcology`.  This package provides distributions that can be used in NIMBLE models for common ecological model components.  These include:

- Cormack-Jolly-Seber (CJS) capture-recapture models.
- Dynamic hidden Markov models (DHMMs), which are used in multi-state and multi-event capture-recapture models.
- Occupancy models.
- Dynamic occupancy models.
- N-mixture models.

# What is nimble?

NIMBLE is a system for writing hierarchical statistical models and algorithms.  It is distributed as an R package [nimble](https://CRAN.R-project.org/package=nimble).  NIMBLE stands for "Numerical Inference for statistical Models using Bayesian and Likelihood Estimation".  NIMBLE includes:

1. A dialect of the BUGS model language that is extensible.  NIMBLE uses almost the same model code as WinBUGS, OpenBUGS, and JAGS.  Being "extensible" means that it is possible to write new functions and distributions and use them in your models.

2. An algorithm library including Markov chain Monte Carlo (MCMC) and other methods.

3. A compiler that generates C++ for each model and algorithm, compiles the C++, and lets you use it from R.  You don't need to know anything about C++ to use nimble.

More information about NIMBLE can be found at [https://r-nimble.org](https://r-nimble.org).

The paper that describes NIMBLE is [here](https://www.tandfonline.com/doi/full/10.1080/10618600.2016.1172487).

### Getting help

The best way to seek user support is the nimble-users list.  Information on how to join can be found at [https://r-nimble.org](https://r-nimble.org).

# The concept of using new distributions for ecological model components.

The distributions provided in `nimbleEcology` let you simplify model code and the algorithms that use it, such as MCMC.  For the ecological models in `nimbleEcology`, the simplification comes from removing some discrete latent states from the model and instead doing the corresponding probability (or likelihood) calculations in a specialized distribution that marginalizes over the latent states.

For each of the ecological model components provided by `nimbleEcology`, here are the discrete latent states that are replaced by use of a marginalized distribution:

- CJS (basic capture-recapture): Latent individual alive-or-dead state at each time.
- HMM and DHMM: Latent individual state, such as location or breeding status, as well as alive-or-dead, at each time.
- Occupancy: Latent occupancy status of a site.
- Dynamic occupancy: Latent occupancy status of a site at each time.
- N-mixture: Latent number of individuals at a site.

Before going further, let's illustrate how `nimbleEcology` can be used for a basic occupancy model.

## Illustration: A simple occupancy model

Occupancy models are used for data from repeated visits to a set of sites, where the detection (1) or non-detection (0) of a species of interest is recorded on each visit.  Define `y[i, j]` as the observation from site `i` on visit `j`.  `y[i, j]` is 1 if the species was seen and 0 if not.

Typical code for for an occupancy model would be as follows.  Naturally, this is written for `nimble`, but the same code should work for JAGS or BUGS (WinBUGS or OpenBUGS) when used as needed for those packages.

```{r, results='hide', messages=FALSE,warnings=FALSE}
library(nimble)
library(nimbleEcology)
```

```{r}
occupancy_code <- nimbleCode({
  psi ~ dunif(0,1)
  p ~ dunif (0,1)
  for(i in 1:nSites) {
    z[i] ~ dbern(psi)
    for(j in 1:nVisits) {
      y[i, j] ~ dbern(z[i] * p)
    }
  }
})
```

In this code:

- `psi` is occupancy probability;
- `p` is detection probability;
- `z[i]` is the latent state of whether a site is really occupied (`z[i]` = 1) or not (`z[i]` = 0);
- `nSites` is the number of sites; and
- `nVisits` is the number of sampling visits to each site.

The new version of this model using `nimbleEcology`'s specialized occupancy distribution will only work in `nimble` (not JAGS or BUGS).  It is:

```{r}
occupancy_code_new <- nimbleCode({
  psi ~ dunif(0,1)
  p ~ dunif (0,1)
  for(i in 1:nSites) {
    y[i, 1:nVisits] ~ dOcc_s(probOcc = psi, probDetect = p, len = nVisits)
  }
})
```

In the new code, the vector of data from all visits to site `i`, namely `y[i, 1:nVisits]`, has its likelihood contribution calculated in one step, `dOcc_s`.  This occupancy distribution calculates the total probability of the data by summing over the cases that the site is occupied or unoccupied. That means that `z[i]` is not needed in the model, and MCMC will not need to sample over `z[i]`.  Details of all calculations, and discussion of the pros and cons of changing models in this way, are given later this vignette.

The `_s` part of `dOcc_s` means that `p` is a scalar, i.e. it does not vary with visit.  If it should vary with visit, a condition sometimes described as being time-dependent, it would need to be provided as a vector, and the distribution function should be `dOcc_v`.

## MCMC with both versions of the example occupancy model.

We can run an MCMC for this model in the following steps:

1. Build the model.
2. Configure the MCMC.
3. Build the MCMC.
4. Compile the model and MCMC.
5. Run the MCMC.
6. Extract the samples.

The function `nimbleMCMC` does all of these steps for you.  The function `runMCMC` does steps 5-6 for you, with convenient management of options such as discarding burn-in samples.  The full set of steps allows greater control over how you use a model and configure and use an MCMC.  We will go through the steps 1-4 and then use `runMCMC` for steps 5-6.  

In this example, we also need simulated data.  We can use the same model to create simulated data, rather than writing separate R code for that purpose.

#### Build the model

```{r}
occupancy_model <- nimbleModel(occupancy_code,
                               constants = list(nSites = 50, nVisits = 5))
```

#### Simulate data

```{r}
occupancy_model$psi <- 0.7
occupancy_model$p <- 0.15
simNodes <- occupancy_model$getDependencies(c("psi", "p"), self = FALSE)
occupancy_model$simulate(simNodes)
occupancy_model$z
head(occupancy_model$y, 10) ## first 10 rows
occupancy_model$setData('y') ## set "y" as data
```

#### Configure and build the MCMC

```{r}
MCMCconf <- configureMCMC(occupancy_model)
MCMC <- buildMCMC(occupancy_model)
```

#### Compile the model and MCMC

```{r}
## These can be done in one step, but many people
## find it convenient to do them in two steps.
Coccupancy_model <- compileNimble(occupancy_model)
CMCMC <- compileNimble(MCMC, project = occupancy_model)
```

#### Run the MCMC and extract the samples

```{r, results='hide', messages=FALSE,warnings=FALSE}
samples <- runMCMC(CMCMC, niter = 10000, nburnin = 500, thin = 10)
```

### Do it all for the new version of the model

Next we show all of the same steps, except for simulating data, using the new version of the model.

```{r, results='hide', messages=FALSE,warnings=FALSE}
occupancy_model_new <- nimbleModel(occupancy_code_new,
                                   constants = list(nSites = 50, nVisits = 5),
                                   data = list(y = occupancy_model$y),
                                   inits = list(psi = 0.7, p = 0.15))
MCMC_new <- buildMCMC(occupancy_model_new) ## This will use default call to configureMCMC.
Coccupancy_model_new <- compileNimble(occupancy_model_new)
CMCMC_new <- compileNimble(MCMC_new, project = occupancy_model_new)
samples_new <- runMCMC(CMCMC_new, niter = 10000, nburnin = 500, thin = 10)
```

The results of the two versions match closely.

```{r, echo = FALSE, results='hide', messages=FALSE,warnings=FALSE}

{
plot(density(as.data.frame(samples)$psi), col = "red", main = "psi")
points(density(as.data.frame(samples_new)$psi), col = "blue", type = "l")
}

{
plot(density(as.data.frame(samples)$p), col = "red", main = "p")
points(density(as.data.frame(samples_new)$p), col = "blue", type = "l")
}

```

The posterior density plots show that the familiar, conventional version of the model yields the same posterior distribution as the new version, which uses `dOcc_s`.

## Use the new version for something different: maximum likelihood or maximum a posteriori estimation.

It is useful that the new way to write the model does not have discrete latent states.  Since this example also does not have other latent states or random effects, we can use it simply as a likelihood or posterior density calculator.  More about how to do so can be found in the nimble User Manual.  Here we illustrate making a compiled `nimbleFunction` for likelihood calculations for parameters `psi` and `p` and maximizing the likelihood using R's `optim`.

```{r}
CalcLogLik <- nimbleFunction(
  setup = function(model, nodes)
    calcNodes <- model$getDependencies(nodes, self = FALSE),
  run = function(v = double(1)) {
    values(model, nodes) <<- v
    return(model$calculate(calcNodes))
    returnType(double(0))
  }
)
OccLogLik <- CalcLogLik(occupancy_model_new, c("psi", "p"))
COccLogLik <- compileNimble(OccLogLik, project = occupancy_model_new)
optim(c(0.5, 0.5), COccLogLik$run, control = list(fnscale = -1))$par
```

## Support for automatic differentiation with `nimbleEcology`.

As of `nimble` version 1.0.0, there is a system for automatic (or algorithmic) differentiation, known as AD. This is used by algorithms such as Hamiltonian Monte Carlo (see package [`nimbleHMC`](https://cran.r-project.org/package=nimbleHMC)) and Laplace approximation (`buildLaplace` in `nimble`).

The distributions provided in `nimbleEcology` support AD as much as possible. There are three main points to keep in mind:

1. It is not possible to take derivatives with respect to discrete values, and the "data" for the `nimbleEcology` distributions are all discrete values. It *is* possible to take derivatives with respect to continuous parameters of the distributions. If the "data" are marked as `data` in the model (and hence will not be sampled by MCMC, for example), there is no problem.
2. Some values will be "baked in" to the AD calculations, meaning that the values first present will be used permanently in later AD calculations. In all cases of `nimbleEcology` distributions, the values "baked in" sizes of variables. In some cases (such ash dHMM and dDHMM) they also include the data values. See the help page for each distribution for more details (e.g. `help(dOcc)`). If the data are scientific data that do not need to be changed after creating the model and algorithm, there is no problem.
3. For the N-mixture distributions only, one needs to use different distribution names. Every `dNmixture` portion of a distribution name below should be replaced with `dNmixtureAD`.

# Distributions provided in `nimbleEcology`

In this section, we introduce each of the `nimbleEcology` distributions in more detail.  We will summarize the calculations using mathematical notation and then describe how to use the distributions in a `nimble` model.

### A note on static typing

Some distribution names are followed by a suffix indicating the type of some parameters, for example the `_s` in `dOcc_s`. NIMBLE uses a static typing system, meaning that a function must know in advance if a certain argument will be a scalar, vector, or matrix. There may be notation such as `sv` or `svm` if there are two or three parameters that can be time-independent (`s`) or time-dependent (`v` or `m`) in one or more dimensions. In general, `s` corresponds to a scalar argument, `v` to a vector argument, and `m` to a matrix argument. The order of these type indicators will correspond to the order of the relevant parameters, but always check the documentation when using a new distribution with the R function `?` (e.g., both `?dOcc` and `?dOcc_s` work).

The static typing requirement may be relaxed somewhat in the future.


## Capture-recapture (dCJS)

Cormack-Jolly-Seber models give the probability of a capture history for each of many individuals, conditional on first capture, based on parameters for survival and detection probability.  `nimbleEcology` provides a distribution for individual capture histories, with variants for time-independent and time-dependent survival and/or detection probability.  Of course, the survival and detection parameters for the CJS probability may themselves depend on other parameters and/or random effects. The rest of this summary will focus on one individual's capture history.

Define $\phi_t$ as survival from time $t$ to $t+1$ and $\mathbf{\phi} = (\phi_1, \ldots, \phi_{T-1})$, where $T$ is the length of the series.  We use "time" and "sampling occasion" as synonyms in this context, so $T$ is the number of sampling occasions.  (Be careful with time indexing. Sometimes you might see $\phi_t$ defined as survival from time $t-1$ to $t$.)  Define $p_t$ as detection probability at time $t$ and $\mathbf{p} = (p_1, \ldots, p_T)$.  Define the capture history as $\mathbf{y} = y_{1:T} = (y_1, \ldots, y_T)$, where each $y_t$ is 0 or 1.  The notation $y_{i:j}$ means the sequence of observations from time $i$ to time $j$.  The first observation of the capture history should always be 1: $y_1 = 1$.  The CJS probability calculations condition on this first capture.

There are multiple ways to write the CJS probability.  We will do so in a state-space format because that leads to the more general DHMM case next. The probability of observations given parameters, $P(\mathbf{y} | \mathbf{\phi}, \mathbf{p})$, is factored as:
\[
P(\mathbf{y} | \mathbf{\phi}, \mathbf{p}) = \prod_{t = 1}^{T-1} P(y_{t+1} | y_{1:t}, \mathbf{\phi}, \mathbf{p})
\]

Each factor $P(y_{t+1} | y_{1:t}, \mathbf{\phi}, \mathbf{p})$ is calculated as:
\[
P(y_{t+1} | y_{1:t}, \mathbf{\phi}, \mathbf{p}) = I(y_{t+1} = 1) (A_{t+1} p_{t+1}) + I(y_{t+1} = 0) (A_{t+1} (1-p_{t+1}) + (1-A_{t+1}))
\]
The indicator function $I(y_t = 1)$ is 1 if it $y_t$ is 1, 0 otherwise, and vice versa for $I(y_t = 0)$. Here $A_{t+1}$ is the probability that the individual is alive at time $t+1$ given $y_{1:t}$, the data up to the previous time.  This is calculated as:
\[
A_{t+1} = G_{t} \phi_{t}
\]
where $G_{t}$ is the probability that the individual is alive at time $t$ given $y_{1:t}$, the data up to the current time.  This is calculated as:
\[
G_{t} = I(y_t = 1) 1 + I(y_t = 0) \frac{A_t (1-p_t)}{A_t (1-p_t) + (1-A_t)}
\]
The sequential calculation is initialized with $G_1 = 1$.  For time step $t+1$, we calculate $A_{t+1}$, then $P(y_{t+1} | y_{1:t}, \mathbf{\phi}, \mathbf{p})$, then $G_{t+1}$, leaving us ready for time step $t+2$.  This is a simple case of a hidden Markov model where the latent state, alive or dead, is not written explicitly.

In the cases with time-independent survival or capture probability, we simply drop the time indexing for the corresponding parameter.

### CJS distributions in `nimbleEcology`

CJS models are available in four distributions in `nimbleEcology`.  These differ only in whether survival probability and/or capture probability are time-dependent or time-independent, yielding four combinations:

- `dCJS_ss`: Both are time-independent (scalar).
- `dCJS_sv`: Survival is time-independent (scalar).  Capture probability is time-dependent (vector).
- `dCJS_vs`: Survival is time-dependent (vector).  Capture probability is time-independent (scalar).
- `dCJS_vv`: Both are time-dependent (vector).

The usage for each is similar.  An example for `dCJS_vs` is:

`y[i, 1:T] ~ dCJS_sv(probSurvive = phi, probCapture = p[i, 1:T], len = T)`

Note the following points:

- `y[i, 1:T]` is a vector of the capture history.  It is written as if `i` indexes individual, but it could be any vector in any variable in the model.
- Arguments to `dCJS_sv` are named.  As in R, this is optional but helpful.  Without names, the order matters.
- `probSurvive` is provided as a scalar value, assuming there is a variable called `phi`.
- In variants where `probSurvive` is a vector (`dOcc_vs` and `dOcc_vv`), the $t^{\mbox{th}}$ element of `probSurvive` is $\phi_t$ above, namely the probability of survival from occasion $t$ to $t+1$.
- `probCapture` is provided as a vector value, assuming there is a matrix variable called `p`.  The value of `probCapture` could be any vector from any variable in the model.
- `probCapture[t]` (i.e., the $t^{\mbox{th}}$ element of `probCapture`, which is `p[i, t]` in this example) is $p_t$ above, namely the probability of capture, if alive, at time $t$.
- The length parameter `len` is in some cases redundant (the information could be determined by the length of other inputs), but nevertheless it is required.

## Hidden Markov models (HMMs) and Dynamic hidden Markov models (DHMMs)

Hidden Markov models give the probability of detection history that can handle:

- Detections in different states, such as locations or breeding states,
- Incorrect state information, such as if breeding state might not be observed correctly,
- Probabilities of transitions between states.

In a HMM, "dead" is simply another state, and "unobserved" is a possible observation.  Thus, HMMs are generalizations of the CJS model. In capture-recapture, HMMs encompass multi-state and multi-event models.  The HMM calculations do not condition on first capture (unlike CJS above), so the time steps below begin with probabilities of states and observation at time 1 (and "$y_{1:0}$" is empty).

We again use the factorization
\[
P(\mathbf{y} | \mathbf{\phi}, \mathbf{p}) = \prod_{t = 1}^T P(y_{t} | y_{1:t-1}, \mathbf{\phi}, \mathbf{p})
\]
In the case of a HMM, $y_t$ is the observed state at time $t$, taking an integer value.  Define $S$ as the number of possible true (latent) states and $K$ as the number of possible observed states.  Observed states need not correspond one-to-one to real states.  For example, often there is an observed state for "unobserved".  Another example is that two real states might never be distinguishable in observations, so they may correpond to only one observed state.

Define $A_{i, t}$ as the probability that the individual is in state $i$ at time $t$, given $y_{1:t-1}$, the data up to the previous time.  Define $p_{i,j,t}$ as the probability that, at time $t$, an individual in state $i$ is observed in state $j$.  Then $P(y_{t} | y_{1:t-1}, \mathbf{\phi}, \mathbf{p})$ is calculated as:
\[
P(y_{t} | y_{1:t-1}, \mathbf{\phi}, \mathbf{p}) = \sum_{i=1}^S A_{i, t} p_{i, y_t, t}
\]
where $j = y_t$ is the observed state of the individual.

Define $G_{i, t}$ as the probability that the individual is in state $i$ at time $t$ given $y_{1:t}$, the data up to the current time.  Define $\psi_{i, j, t}$ as the probability that, from time $t$ to $t+1$, an individual in state $i$ transitions to state $j$. Then the calculation of $A_{j,t}$ for each possible state $j$ is given by:
\[
A_{j, t} = \sum_{i=1}^S G_{i, t-1} \psi_{i, j, t-1}, \mbox{ for } j=1,\ldots,S
\]

Finally, calculation of $G_{i, t}$ for each possible state $i$ is done by conditioning on $y_t$ as follows:
\[
G_{i, t} = \frac{A_{i, t} p_{i, y_t, t}}{P(y_{t} | y_{1:t-1}, \mathbf{\phi}, \mathbf{p})}, \mbox{ for } j=1,\ldots,S
\]

The calculation of the total probability of a detection history with uncertain state information starts by initializing $A_{i, 1}$ to some choice of initial (or prior) probability of being in each state before any observations are made.  Then, starting with $t = 1$, we calculate $P(y_{t} | y_{1:t-1}, \mathbf{\phi}, \mathbf{p})$.  If $t < T$, we prepare for time step $t+1$ by calculating $G_{i, t}$ for each $i$ followed by $A_{i, t+1}$ for each $i$.

If the transition probabilities, $\psi_{i, j, t}$, actually depend on $t$, we refer to the model as "Dynamic", namely a DHMM instead of an HMM.  It may also be the case that observation probabilities $p_{i, j, t}$ depend on $t$ or not.

## HMM and DHMM distributions in `nimbleEcology`

HMMs and DHMMs are available in four distributions in `nimbleEcology`.  These differ only in whether transition and/or observation probabilities are time-dependent or time-independent, yielding four combinations:

- `dHMM`: Both are time-independent.
- `dDHMM`: State transitions are time-dependent (dynamic).  Observation probabilities are time-independent.
- `dHMMo`: Observation probabilities are time-dependent. State transitions are time-independent (not dynamic).
- `dDHMMo`: Both are time-dependent.

In this notation, the leading `D` is for "dynamic" (time-dependent state transitions), while the trailing "o" is for "observations" being time-dependent.

The usage for each is similar.  An example for `dDHMM` is:

`y[i, 1:T] ~ dDHMM(init = initial_probs[i, 1:T], obsProb = p[1:nStates, 1:nCat], transProb = Trans[i, 1:nStates, 1:nStates, 1:(T-1)], len = T)`

Note the following points:

- As above, this is written as if `i` indexes individuals in the model, but this is arbitrary as an example.
- `nStates` is $S$ above.
- `nCat` is $K$ above.
- `init[i]` is the initial probability of being in state `i`, namely $A_{i, 1}$ above.
- `obsProb[i, j]` (i.e., `p[i, j]` in this example) is probability of observing an individual who is truly in state `i` as being in observation state `j`.  This is $p_{i, j}$ above if indexing by $t$ is not needed.  If observation probabilities are time-dependent (in `dHMMo` and `dDHMMo`), then `obsProb[i, j, t]` is $p_{i, j, t}$ above.
- `transProb[i, j, t]` (i.e., `Trans[i, j, t]` in this example) is the probability that an individual who is truly in state `i` changes to state `j` during the transition from time step `t` to `t+1`.  This is $\psi_{i,j,t}$ above.
- `len` is the length of the observation record, `T` in this example.
- `len` must match the length of the data, `y[i, 1:T]` in this example.
- The dimensions of `obsProb` must be $K \times S$ in the time-independent case (`dHMM` or `dDHMM`) or $K \times S \times T$ in the time-dependent case (`dHMMo` or `dDHMMo`).
- The dimensions of `transProb` must be $S \times S$ in the time-independent case (`dHMM` or `dHMMo`) or $S \times S \times (T-1)$ in the time-dependent case (`dDHMM` or `dDHMMo`).  The last dimension is one less than $T$ because no transition to time $T+1$ is needed.

## Occupancy

An occupancy model gives the probability of a series of detection/non-detection records for a species during multiple visits to a site.  The occupancy distributions in `nimbleEcology` give the probability of the detection history for one site, so this summary focuses on data from one site.

Define $y_t$ to be the observation at time $t$, with $y_t = 1$ for a detection and $y_t = 0$ for a non-detection.  Again, we use "time" as a synonym for "sampling occasion".  Again, define the vector of observations as $\mathbf{y} = (y_1, \ldots, y_T)$, where $T$ is the number of sampling occasions.

Define $\psi$ as the probability that a site is occupied. Define $p_t$ as the probability of a detection on sampling occasion $t$ if the site is occupied, and $\mathbf{p} = (p_1, \ldots, p_T)$.  Then the probability of the data given the parameters is:
\[
P(\mathbf{y} | \psi, \mathbf{p}) = \psi \prod_{t = 1}^T p_t^{y_t} (1-p_t)^{1-y_t} + (1-\psi) I\left(\sum_{t=1}^T y_t= 0 \right)
\]
The indicator function usage in the last term, $I(\cdot)$, is 1 if the given summation is 0, i.e. if no detections were made.  Otherwise it is 0.

### Occupancy models in `nimbleEcology`

Occupancy models are available in two distributions in `nimbleEcology`. These differ only in whether detection probability depends on time or not:

- `dOcc_s`: Detection probability is time-independent (scalar).
- `dOcc_v`: Detection probability is time-dependent (vector).

An example for `dOcc_v` is:

`y[i, 1:T] ~ dOcc_v(probOcc = psi, probDetect = p[i, 1:T], len = T)`

Note the following points:

- This is written as if `i` indexes site, but the variables could be arranged in other ways.
- `y[i, 1:T]` is the detection record.
- `probOcc` is the probability of occupancy, $\psi$ above.
- `probDetect` is the vector of detection probabilities, $\mathbf{p}$ above.  In the case of `dOcc_s`, `probDetect` would be a scalar.
- `len` is the length of the detection record.

## Dynamic occupancy

Dynamic occupancy models give the probability of detection records from multiple seasons (primary periods) in each of which there were multiple sampling occasions (secondary periods) at each of multiple sites.  The dynamic occupancy distribution in `nimbleEcology` provides probability calculations for data from one site at a time.

We will use "year" for primary periods and "time" or "sampling occasion" as above for secondary periods.  Define $y_{r, t}$ as the observation (1 or 0) on sampling occasion $t$ of year $r$.  Define $\mathbf{y}_r$ as the detection history in year $r$, i.e. $\mathbf{y}_r = (y_{r, 1}, \ldots, y_{r, T})$ .  Define $\phi_t$ as the probability of being occupied at time $t+1$ given the site was occupied at time $t$, called "persistence".  Define $\gamma_t$ as the probability of being occupied at time $t+1$ given the site was unoccupied at time $t$, called "colonization".  Define $p_{r, t}$ as the detection probability on sampling occasion $t$ of year $r$ given the site is occupied.

The probability of all the data given parameters is:
\[
P(\mathbf{y} | \mathbf{\phi}, \mathbf{\gamma}, \mathbf{p}) =  \prod_{r = 1}^R P(\mathbf{y}_{r} | \mathbf{y}_{1:r-1}, \mathbf{\phi}, \mathbf{\gamma}, \mathbf{p})
\]
Each factor $P(\mathbf{y}_{r} | \mathbf{y}_{1:r-1}, \mathbf{\phi}, \mathbf{\gamma}, \mathbf{p})$ is calculated as:
\[
P(\mathbf{y}_{r} | \mathbf{y}_{1:r-1}, \mathbf{\phi}, \mathbf{\gamma}, \mathbf{p}) = A_{r} \prod_{t = 1}^T p_{r,t}^{y_{r,t}} (1-p_{r,t})^{1-y_{r,t}} + (1-A_{r}) I\left(\sum_{t=1}^T y_{r,t} = 0 \right)
\]
Here $A_r$ is the probability that the site is occupied in year $r$ given observations up to the previous year $\mathbf{y}_{1:r-1}$.  Otherwise, this equation is just like the occupancy model above, except there are indices for year $r$ in many places.  $A_r$ is calculated as:
\[
A_r = G_{r-1} \phi_{r-1} + (1-G_{r-1}) \gamma_{r-1}
\]
Here $G_r$ is the probability that the site is occupied given the data up to time $r$, $\mathbf{y}_{1:r}$.  This is calculated as
\[
G_r = \frac{A_{r} \prod_{t = 1}^T p_{r,t}^{y_{r,t}} (1-p_{r,t})^{1-y_{r,t}}}{P(\mathbf{y}_{r} | \mathbf{y}_{1:r-1}, \mathbf{\phi}, \mathbf{\gamma}, \mathbf{p})}
\]

The sequential calculation is initiated with $A_1$, which is natural to think of as "$\psi_1$", probability of occupancy in the first year.  Then for year $r$, starting with $r = 1$, we calculate $P(\mathbf{y}_{r} | \mathbf{y}_{1:r-1}, \mathbf{\phi}, \mathbf{\gamma}, \mathbf{p})$.  If $r < R$, we calculate $G_r$ and then $A_{r+1}$, leaving us ready to increment $r$ and iterate.

### Dynamic occupancy models in `nimbleEcology`

Dynamic occupancy models are available in twelve parameterizations in `nimbleEcology`. These differ in whether persistence, colonization, and/or detection probabilities are time-dependent, with a "s" (time-independent) and "v" (time-dependent) notation similar to the distributions above. Detection probabilities can be the same for all seasons and sampling events ("s"), constant within each season but different season to season ("v"), or time-dependent by sampling event within season ("m"), in which case a matrix argument is required. The distributions are named by `dDynOcc_` followed by three letters. Each letter indicates the typing (or dimension) of the persistence, colonization, and detection probabilities, respectively:

- `dDynOcc_s**` functions take time-independent (scalar) persistence probabilities, while `dDynOcc_v**`functions take time-dependent (vector) persistence probabilities
- `dDynOcc_*s*` functions take time-independent (scalar) colonization probabilities, while `dDynOcc_*v*`functions take time-dependent (vector) colonization probabilities
- `dDynOcc_**s` functions take time-independent (scalar) observation probabilities, while `dDynOcc_**v` functions take observation probabilities dependent on time step (vector) and `dDynOcc_**m` functions take observation probabilities dependent on both time step and observation event (matrix)

Expanding these typing possibilities gives $2 \times 2 \times 3 = 12$ total functions:

- `dDynOcc_sss`
- `dDynOcc_svs`
- `dDynOcc_vss`
- `dDynOcc_vvs`
- `dDynOcc_ssv`
- `dDynOcc_svv`
- `dDynOcc_vsv`
- `dDynOcc_vvv`
- `dDynOcc_ssm`
- `dDynOcc_svm`
- `dDynOcc_vsm`
- `dDynOcc_vvm`

An example for `dDynOcc_svs` is:

`y[i, 1:T] ~ dDynOcc_svs(init = psi1[i], probPersist = phi[i], probColonize = gamma[i, 1:T], p = p, len = T)`

Note the following points:

- As in the examples above, this is written as if `i` indexes the individual site, but the variables could be arranged in other ways.
- `y[i, 1:T]` is the detection record.
- `probPersist` is the probability of persistence, $\phi$ above.
- `probColonize` is the vector of detection probabilities, $\mathbf{\gamma}$ above.  In the case of `dDynOcc_*s*`, `probColonize` would be a scalar.
- `len` is the length of the detection record.
- `p` here is a single constant value of observation probability for all samples. If `p` changed with season or season and observation event, we would need to use a different function (`dDynOcc_**v` or `dDynOcc_**m`).

## N-mixture

An N-mixture model gives the probability of a set of counts from repeated visits to each of multiple sites.  The N-mixture distribution in `nimbleEcology` gives probability calculations for data from one site.

Define $y_t$ as the number of individuals counted at the site on sampling occasion (time) $t$.  Define $\mathbf{y} = (y_1, \ldots, y_t)$.  Define $\lambda$ as the average density of individuals, such that the true number of individuals, $N$, follows a Poisson distribution with mean $\lambda$.  Define $p_t$ to be the detection probability for a single individual at time $t$, and $\mathbf{p} = (p_1, \ldots, p_t)$.

The probability of the data given the parameters is:
\[
P(\mathbf{y} | \lambda, \mathbf{p}) = \sum_{N = 1}^\infty \left[ P(N | \lambda)
\prod_{t = 1}^T P(y_t | N) \right]
\]
where $P(N | \lambda)$ is a Poisson probability and $P(y_t | N)$ is a binomial probability.  That is, $y_t \sim \mbox{Binomial}(N, p_t)$, and the $y_t$s are independent.

In practice, the summation over $N$ can start at a value greater than 0 and must be truncated at some value less than infinity. Two options are provided for the range of summation:

1. The user can provide values $Nmix$ and $Nmax$ to start and end the summation, respectively. A typical choice for $Nmin$ would be the largest value of $y_t$ (there must be at least this many individuals).
2. The following heuristic can be used:

If we consider a single $y_t$, then $N - y_t | y_t \sim \mbox{Poisson}(\lambda (1-p_t))$ (*See opening example of [Royle and Dorazio, 2008](https://www.mbr-pwrc.usgs.gov/pubanalysis/roylebook/roylebook.html)*).  Thus, a natural upper end for the summation range of $N$ would be $y_t$ plus a very high quantile of The $\mbox{Poisson}(\lambda (1-p_t))$ distribution.  For a set of observations, a natural choice would be the maximum of such values across the observation times.  We use the 0.99999 quantile to be conservative.

Correspondingly, the summation can begin at smallest of the 0.00001 quantiles of $N | y_t$. If $p_t$ is small, this can be considerably larger than the maximum value of $y_t$, allowing more efficient computation.

### N-mixture models in `nimbleEcology`

Standard (binomial-Poisson) N-mixture models are available in two distributions in `nimbleEcology`. They differ in whether probability of detection is visit-dependent (vector case, corresponding to `dNmixture_v`) or visit-independent (scalar, `dNmixture_s`).

An example is:

`y[i, 1:T] ~ dNmixture_v(lambda = lambda, p = p[1:T], Nmin = Nmin, Nmax = Nmax, len = T)`

- As in the examples above, this is written as if `i` indexes the individual site, but the variables could be arranged in other ways.
- `lambda` is $\lambda$ above.
- `p[1:T]` is $\mathbf{p}$ above. If $p$ were constant across visits, we would use `dNmixture_s` and a scalar value of `p`.
- `len` is $T$.
- `Nmin` and `Nmax` provide the lower and upper bounds for the sum over Ns (option 1 above). If both are set to `-1`, bounds are chosen dynamically using quantiles of the Poisson distribution (option 2 above).

Three variations of the N-mixture model are also available, in which the Poisson distribution is replaced by negative binomial, the binomial is replaced by beta binomial, or both. These are called `dNmixture_BNB_*`, `dNmixture_BBP_*`, and `dNmixture_BBNB_*`, respectively. Each has three suffixes: `_v` and `_s` correspond to the cases provided above, and `_oneObs` distributions are provided for the case where the data are scalar (i.e., only one observation at the site). No `_oneObs` observation is provided for the default `dNmixture` because `dNmixture(x[1:1], lambda, prob[1:1])` is equivalent to `dpois(x[1:1], lambda * prob[1:1])`.

These combinations lead to the following set of 11 N-mixture distributions:

- `dNmixture_v`
- `dNmixture_s`
- `dNmixture_BNB_v`
- `dNmixture_BNB_s`
- `dNmixture_BNB_oneObs`
- `dNmixture_BBP_v`
- `dNmixture_BBP_s`
- `dNmixture_BBP_oneObs`
- `dNmixture_BBNB_v`
- `dNmixture_BBNB_s`
- `dNmixture_BBNB_oneObs`

If an N-mixture distribution needs to be used with AD (e.g. for HMC or Laplace approximation), replace `dNmixture` with `dNmixtureAD`. In that case, one must provide `Nmin` and `Nmax` values manually; the second (heuristic) option described above is not available.

Further details on all the distributions in `nimbleEcology` can be found on the help pages within R, e.g. `help(dNmixture)`.