According to the article by Albert and Chib (1993), a possible model is to assume the existence of an underlying latent variable related to our observed binary variable through the following proposition:
$$ \begin{aligned} &z_{ij} = \alpha_i + X_i'\beta_j+ W_i'\lambda_j + \epsilon_{ij},\\ &\text{ with } \epsilon_{ij} \sim \mathcal{N}(0,1) \ \forall ij \text{ and such that: } \\ &y_{ij}= \begin{cases} 1 & \text{ if } z_{ij} > 0 \\ 0 & \text{ otherwise,} \end{cases} \end{aligned} \Rightarrow \begin{cases} y_{ij} \sim \mathcal{B}ernoulli(\theta_{ij}) \text{ with } \\ \theta_{ij} = \Phi(\alpha_i + X_i'\beta_j+ W_i'\lambda_j), \\ \text{where } \Phi \text{ denotes the cumulative distribution} \\ \text{function of the standard normal distribution.} \end{cases} $$
$$\begin{aligned} \mathbb{P}(y_{ij}=1) & = \mathbb{P}(z_{ij} > 0)\\ & = \mathbb{P}(\alpha_i + X_i'\beta_j + W_i'\lambda_j + \epsilon_{ij} > 0)\\ & = \mathbb{P}(\epsilon_{ij} > - (\alpha_i + X_i'\beta_j + W_i'\lambda_j) \ ) \\ & = \mathbb{P}(\epsilon_{ij} \leq \alpha_i + X_i'\beta_j + W_i'\lambda_j) \\ & = \Phi( \alpha_i + X_i'\beta_j + W_i'\lambda_j) \\ \end{aligned}$$
In the same way:
$$\begin{aligned} \mathbb{P}(y_{ij}=0) & = \mathbb{P}(z_{ij} \leq 0)\\ & = \mathbb{P}(\epsilon_{ij} \leq - (\alpha_i + X_i'\beta_j + W_i'\lambda_j) \ ) \\ & = \mathbb{P}(\epsilon_{ij} > \alpha_i + X_i'\beta_j + W_i'\lambda_j) \\ & = 1 - \Phi( \alpha_i + X_i'\beta_j + W_i'\lambda_j) \\ \end{aligned}$$
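As a minimal numerical illustration of this link function, θij can be computed in R with the standard normal cumulative distribution function pnorm(); all values below are invented:

```r
# Sketch: probit success probability for one site i and species j,
# with invented values (intercept plus p = 2 covariates, q = 2 latent axes).
alpha_i  <- 0.3                 # random site effect
X_i      <- c(1, 0.5, -1.2)     # intercept and covariates at site i
beta_j   <- c(-0.2, 0.8, 0.1)   # regression coefficients of species j
W_i      <- c(0.4, -0.7)        # latent variables of site i
lambda_j <- c(1.1, 0.6)         # factor loadings of species j

theta_ij <- pnorm(alpha_i + sum(X_i * beta_j) + sum(W_i * lambda_j))
theta_ij  # P(y_ij = 1)
```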
The model involves the following parameters and priors:
Latent variables: Wi = (Wi1, …, Wiq)′, where q is the number of latent variables considered, which has to be fixed by the user (q = 2 by default). We assume that Wi ∼ 𝒩(0, Iq) and we define the associated factor loadings λj = (λj1, …, λjq)′. We use a 𝒩(μλ, Vλ) prior distribution for each λjl that is not constrained: the entries of Λ above the diagonal are fixed at 0 and its diagonal entries are constrained to strictly positive values.
Explanatory variables:
The corresponding regression coefficients for each species j are denoted βj = (βj0, βj1, …, βjp)′, where βj0 represents the intercept for species j, which is assumed to be a fixed effect.
αi represents the random effect of site i such that αi ∼ 𝒩(0, Vα), and we assume the prior distribution Vα ∼ ℐ𝒢(shape = 0.5, rate = 0.005) by default.
We go back to a model of the form Z′ = Xβ + ϵ to estimate the posterior distributions of the betas, lambdas and latent variables Wi of the model. For example, concerning λj, we define Z′ij = Zij − αi − Xi′βj such that Z′ij = Wi′λj + ϵij, so Z′ij | Wi, λj ∼ 𝒩(Wi′λj, 1).
In this case we can use the following proposition:
$$\begin{cases} Y \ | \ \beta &\sim \mathcal{N}_n ( X\beta, I_n) \\ \beta &\sim \mathcal{N}_p (m,V) \end{cases} \Rightarrow \begin{cases} \beta|Y &\sim \mathcal{N}_p (m^*,V^*) \text{ with } \\ m^* &= (V^{-1} + X'X)^{-1}(V^{-1}m + X'Y)\\ V^*&=(V^{-1} + X'X)^{-1} \end{cases}$$.
$$\begin{aligned} p(\beta \ | \ Y) & \propto p(Y \ | \ \beta) \ p(\beta) \\ & \propto \frac{1}{(2\pi)^{\frac{n}{2}}}\exp\left(-\frac{1}{2}(Y-X\beta)'(Y-X\beta)\right)\frac{1}{(2\pi)^{\frac{p}{2}}|V|^{\frac{1}{2}}}\exp\left(-\frac{1}{2}(\beta-m)'V^{-1}(\beta-m)\right) \\ & \propto \exp\left(-\frac{1}{2}\left((\beta-m)'V^{-1}(\beta-m) + (Y-X\beta)'(Y-X\beta)\right)\right) \\ & \propto \exp\left(-\frac{1}{2}\left(\beta'V^{-1}\beta + m'V^{-1}m - m'V^{-1}\beta -\beta'V^{-1}m + Y'Y + \beta'X'X\beta - Y'X\beta - \beta'X'Y\right)\right) \\ & \propto \exp\left(-\frac{1}{2}\left(\beta'(V^{-1}+X'X)\beta -\beta'(V^{-1}m + X'Y) - (Y'X + m'V^{-1})\beta + m'V^{-1}m + Y'Y \right)\right) \\ & \propto \exp\left(-\frac{1}{2}\left(\beta'(V^{-1}+X'X)\beta -\beta'(V^{-1}m + X'Y) - (X'Y + V^{-1}m)'\beta + m'V^{-1}m + Y'Y \right)\right) \\ & \propto \exp(-\frac{1}{2}\left(\beta - (V^{-1}+X'X)^{-1}(V^{-1}m + X'Y)\right)'(V^{-1}+X'X)\left(\beta - (V^{-1}+X'X)^{-1}(V^{-1}m + X'Y)\right)\\ & \quad -(V^{-1}m + X'Y)'(V^{-1}+X'X)^{-1}(V^{-1}m + X'Y) +m'V^{-1}m + Y'Y)\\ & \propto \exp\left(-\frac{1}{2}\left(\beta - \underbrace{(V^{-1}+X'X)^{-1}(V^{-1}m + X'Y)}_{m^*}\right)'\underbrace{(V^{-1}+X'X)}_{{V^*}^{-1}}\left(\beta - (V^{-1}+X'X)^{-1}(V^{-1}m + X'Y)\right)\right) \end{aligned}$$
We use this proposition to estimate the betas and lambdas, as well as the gammas if species traits data are provided.
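As an illustrative check, and not the package's internal code, this conjugate update can be written in a few lines of R, with simulated data and invented prior values:

```r
# Sketch: conjugate update beta | Y ~ N(m*, V*) for Y | beta ~ N(X beta, I_n)
# and beta ~ N(m, V), on simulated data.
set.seed(42)
n <- 100; p <- 3
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))
Y <- as.vector(X %*% c(0.5, -1, 2) + rnorm(n))

m <- rep(0, p); V_inv <- diag(1 / 10, p)      # prior N(m, V) with V = 10 I_p
V_star <- solve(V_inv + crossprod(X))         # (V^-1 + X'X)^-1
m_star <- V_star %*% (V_inv %*% m + crossprod(X, Y))

# One posterior draw from N(m*, V*):
beta_draw <- as.vector(m_star + t(chol(V_star)) %*% rnorm(p))
beta_draw
```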
Concerning the posterior distribution of the random site effects (αi)i = 1, …, nsite, we can use a transformation of the form Z′ij = αi + ϵij, with Z′ij = Zij − Wi′λj − Xi′βj, so Z′ij | Wi, λj, βj, αi ∼ 𝒩(αi, 1). We then use the following proposition:
$$\begin{cases} x \ | \ \theta & \sim \mathcal{N}(\theta, \ \sigma^2) \\ \theta & \sim \mathcal{N}(\mu_0,{\tau_0}^2) \\ \sigma^2 & \text{ known} \end{cases} \Rightarrow \begin{cases} \theta | \ x &\sim \mathcal{N}(\mu_1,{\tau_1}^2) \text{ with }\\ \mu_1 &= \dfrac{{\tau_0}^{-2}\mu_0 + x\sigma^{-2}}{{\tau_0}^{-2}+\sigma^{-2}} \\ {\tau_1}^{-2} &={\tau_0}^{-2}+\sigma^{-2} \end{cases}$$.
$$\begin{aligned} p(\theta \ | \ x) & \propto p(x \ | \ \theta) \ p(\theta) \\ & \propto \frac{1}{(2\pi\sigma^2)^{\frac{1}{2}}}\exp\left(-\frac{1}{2\sigma^2}(x-\theta)^2\right)\frac{1}{(2\pi{\tau_0}^2)^{\frac{1}{2}}}\exp\left(-\frac{1}{2{\tau_0}^2}(\theta-\mu_0)^2\right) \\ & \propto \exp\left(-\frac{1}{2{\tau_0}^2}(\theta-\mu_0)^2-\frac{1}{2\sigma^2}(x-\theta)^2\right) \\ & \propto \exp\left(-\frac{1}{2{\tau_0}^2}(\theta^2-2\mu_0\theta)-\frac{1}{2\sigma^2}(\theta^2-2x\theta)\right)\\ & \propto \exp\left(-\frac{1}{2}\left(\theta^2 ({\tau_0}^{-2}+\sigma^{-2})-2\mu_0\theta{\tau_0}^{-2}-2x\theta\sigma^{-2}\right)\right)\\ & \propto \exp\left(-\frac{1}{2({\tau_0}^{-2}+\sigma^{-2})^{-1}}\left(\theta^2 -2\theta \frac{\mu_0{\tau_0}^{-2}+ x\sigma^{-2}}{{\tau_0}^{-2}+\sigma^{-2}}\right)\right)\\ \end{aligned}$$
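The scalar case reduces to a couple of lines of R; a sketch with invented values:

```r
# Sketch: theta | x ~ N(mu1, tau1^2) for x | theta ~ N(theta, sigma2)
# and theta ~ N(mu0, tau02), sigma2 known.
x <- 1.2; sigma2 <- 1; mu0 <- 0; tau02 <- 10
prec1 <- 1 / tau02 + 1 / sigma2                 # tau1^-2
mu1   <- (mu0 / tau02 + x / sigma2) / prec1
theta_draw <- rnorm(1, mean = mu1, sd = sqrt(1 / prec1))
theta_draw
```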
Concerning the posterior distribution of Vα, the variance of the random site effects (αi)i = 1, …, nsite, we use the following proposition:
$$\begin{cases} x \ | \ \sigma^2 & \sim \mathcal{N}_n (\theta, \ \sigma^2I_n) \\ \sigma^2 & \sim \mathcal{IG} (a,b) \\ \theta & \text{ known} \end{cases} \Rightarrow \begin{cases} \sigma^2|x \sim \mathcal{IG}(a',b') \text{ with } \\ a' = a + \frac{n}{2} \\ b' = \frac{1}{2}\sum\limits_{i=1}^n(x_i-\theta)^2 + b. \end{cases}$$
$$\begin{aligned} p(\sigma^2 \ | \ x) & \propto p(x \ | \ \sigma^2) \ p(\sigma^2) \\ & \propto \frac{1}{(2\pi\sigma^2)^{\frac{n}{2}}}\exp\left(-\frac{1}{2\sigma^2}(x-\theta)'(x-\theta)\right)\frac{b^a}{\Gamma(a)}{(\sigma^2)}^{-(a+1)}\exp\left(-\frac{b}{\sigma^2}\right) \\ & \propto {(\sigma^2)}^{-\left(\underbrace{\frac{n}{2}+a}_{a'}+1\right)}\exp\left(-\frac{1}{\sigma^2}\underbrace{\left(b+\frac{1}{2}\sum\limits_{i=1}^n(x_i-\theta)^2\right)}_{b'}\right) \end{aligned}$$
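In R, a draw from ℐ𝒢(a′, b′) can be obtained as the reciprocal of a Gamma draw; a sketch using the default hyperparameters given above for Vα, on simulated data:

```r
# Sketch: sigma2 | x ~ IG(a', b'); if sigma2 ~ IG(shape = a, rate = b)
# then 1/sigma2 ~ Gamma(shape = a, rate = b).
a <- 0.5; b <- 0.005                       # defaults used for V_alpha
x <- rnorm(50, mean = 0, sd = 1.5)         # simulated data, theta = 0 known
a_post <- a + length(x) / 2
b_post <- b + 0.5 * sum((x - 0)^2)
sigma2_draw <- 1 / rgamma(1, shape = a_post, rate = b_post)
sigma2_draw
```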
In the Bayesian framework, the Gibbs algorithm produces a realization of the parameter θ = (θ1, …, θm) from the posterior distribution Π(θ | x), as soon as we are able to express the full conditional distributions Π(θi | θ1, …, θi − 1, θi + 1, …, θm, x) for i = 1, …, m.
Gibbs sampling consists of:
Initialization: arbitrary choice of θ(0) = (θ1(0), …, θm(0)).
Iteration t: generate θ(t) as follows:
θ1(t) ∼ Π(θ1 | θ2(t − 1), …, θm(t − 1), x)
θ2(t) ∼ Π(θ2 | θ1(t), θ3(t − 1), …, θm(t − 1), x)
⋮
θm(t) ∼ Π(θm | θ1(t), …, θm − 1(t), x)
Successive iterations of this algorithm generate the states of a Markov chain {θ(t), t > 0} with values in ℝm; one can show that this chain admits the posterior distribution as its invariant measure.
For a sufficiently large number of iterations, the vector θ obtained can thus be considered as a realization of the joint posterior distribution Π(θ | x).
Consequently, implementing a Gibbs sampler requires knowing the posterior distribution of each parameter conditionally on the other parameters of the model. These full conditionals can be deduced from the conjugate-prior formulas above in the case of the probit model, but they are not explicitly expressible when a logit or log link function is used.
The algorithm used in the jSDM_binomial_probit() function to estimate the parameters of the probit model is therefore as follows:
Initialize all parameters, to 0 for example, except the diagonal values of Λ, which are initialized at 1, and Vα(0) = 1.
Gibbs sampler: at each iteration t, for t = 1, …, NGibbs, we repeat each of these steps:
Generate the latent variable Z(t) = (Zij(t))i = 1, …, Ij = 1, …, J such that $$Z_{ij}^{(t)} \sim \begin{cases} \mathcal{N}\left(\alpha_i^{(t-1)} + X_i\beta_j^{(t-1)} + W_i^{(t-1)}\lambda_j^{(t-1)}, \ 1 \right) \text{ right truncated by } 0 & \text{ if } y_{ij} =0 \\ \mathcal{N}\left(\alpha_i^{(t-1)} + X_i\beta_j^{(t-1)} + W_i^{(t-1)}\lambda_j^{(t-1)}, \ 1 \right) \text{ left truncated by } 0 & \text{ if } y_{ij} =1 \end{cases}$$ At the first iteration, the latent variable is thus initialized by drawing from these truncated normal distributions, whose means are 0 given the initial parameter values (see the sketch after this algorithm for how such truncated-normal draws can be obtained).
If species traits data are provided, generate the effects of species-specific traits on species’ responses γ(t) = (γrk(t))k = 0, …, pr = 0, …, nt such that γrk(t) | β1k(t − 1), …, βJk(t − 1) ∼ 𝒩(m⋆, V⋆), with $$m^\star = \left(V_{\gamma_{rk}}^{-1} + T_r'T_r\right)^{-1}\left(V_{\gamma_{rk}}^{-1}\mu_{\gamma_{rk}} + T_r'\left(\beta_k^{(t-1)} - \sum\limits_{r' \neq r} T_{r'} \gamma_{r'k}^{(t-1)} \right)\right) \text{ and } V^\star = \left(V_{\gamma_{rk}}^{-1}+ T_r'T_r\right)^{-1}.$$
Generate the fixed species effects βj(t) = (βj0(t), βj1(t), …, βjp(t))′ for j = 1, …, J such that βj(t) | Z(t), W1(t − 1), α1(t − 1), …, WI(t − 1), αI(t − 1), λj1(t − 1), …, λjq(t − 1) ∼ 𝒩p + 1(m⋆, V⋆), with m⋆ = (Vβ−1 + X′X)−1(Vβ−1μβj + X′Zj⋆) and V⋆ = (Vβ−1 + X′X)−1, where Zj⋆ = (Z1j⋆, …, ZIj⋆)′ such that Zij⋆ = Zij(t) − Wi(t − 1)λj(t − 1) − αi(t − 1).
Generate the loading factors related to the latent variables λj(t) = (λj1(t), …, λjq(t))′ for j = 1, …, J such that λjl(t) | Z(t), βj(t), α(t − 1), W(t − 1), λ1(t − 1), …, λl − 1(t − 1), λl + 1(t − 1), …, λq(t − 1) ∼ 𝒩(m⋆, V⋆), with m⋆ = (Vλ−1 + Wl(t − 1)′Wl(t − 1))−1(Vλ−1μλ + Wl(t − 1)′Zj⋆) and V⋆ = (Vλ−1 + Wl(t − 1)′Wl(t − 1))−1, $$\text{ where } Z_j^\star =(Z_{1j}^\star,\ldots,Z_{Ij}^\star)' \text{ such that } Z^\star_{ij} = Z_{ij}^{(t)}-X_i\beta_j^{(t)}-\alpha_i^{(t-1)}-\sum\limits_{l'\neq l}W_{il'}\lambda_{jl'}.$$ In order to constrain the diagonal values of Λ = (λjl)j = 1, …, Jl = 1, …, q to positive values and make the matrix lower triangular, the values of λj(t) are simulated according to the following conditions: $$ \lambda_{jl}^{(t)} \sim \begin{cases} P \text{ such that } \mathbb{P}(\lambda_{jl} = 0)=1 & \text{ if } l>j, \\ \mathcal{N}(m^\star,V^\star) \text{ left truncated by } 0 & \text{ if } l=j, \\ \mathcal{N}(m^\star,V^\star) & \text{ if } l<j. \end{cases}$$
Generate the latent variables (or unmeasured predictors) Wi(t) for i = 1, …, I according to: Wi(t) | Z(t), λ(t), β(t), αi(t − 1) ∼ 𝒩q((Iq + Λ(t)′Λ(t))−1(Λ(t)′Zi⋆), (Iq + Λ(t)′Λ(t))−1), where Zi⋆ = (Zi1⋆, …, ZiJ⋆)′ such that Zij⋆ = Zij(t) − αi(t − 1) − Xiβj(t).
Generate the random site effects αi(t) for i = 1, …, I according to: $$ \alpha_i^{(t)} \ | \ Z^{(t)}, \lambda^{(t)}, \beta^{(t)}, W_i^{(t)} \sim \mathcal{N}\left(\dfrac{ \sum_{j=1}^J \left( Z_{ij}^{(t)} - X_i\beta_j^{(t)} - W_i^{(t)}\lambda_j^{(t)}\right)}{{V_{\alpha}^{(t-1)}}^{-1} + J} , \left( \frac{1}{V_{\alpha}^{(t-1)}}+ J \right)^{-1} \right)$$
Generate the variance of random site effects Vα(t) according to: $$V_\alpha^{(t)} \ | \ \alpha_1^{(t)},\ldots,\alpha_I^{(t)} \sim \mathcal{IG}\left( \text{shape}=0.5 + \frac{I}{2}, \text{rate}=0.005 + \frac{1}{2}\sum\limits_{i=1}^I \left(\alpha_i^{(t)}\right)^2\right)$$
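The following minimal R sketch illustrates the two key ingredients of this sampler, the truncated-normal draw of the latent variable Z and the conjugate update of the regression coefficients, on a deliberately simplified single-species probit model without site effects or latent variables. It is not the jSDM implementation, only an illustration of the mechanism:

```r
# Simplified Albert & Chib-type Gibbs sampler: probit regression, beta only.
set.seed(1)
I <- 200; p <- 2
X <- cbind(1, matrix(rnorm(I * p), I, p))
beta_true <- c(-0.5, 1, -1)
y <- rbinom(I, 1, pnorm(as.vector(X %*% beta_true)))

V_inv <- diag(1 / 10, p + 1)              # prior beta ~ N(0, 10 I)
beta  <- rep(0, p + 1)                    # initialization at 0
n_gibbs <- 1000
draws <- matrix(NA, n_gibbs, p + 1)
for (t in seq_len(n_gibbs)) {
  mu <- as.vector(X %*% beta)
  u  <- runif(I)
  # Latent Z: N(mu, 1) truncated to (0, Inf) if y = 1, to (-Inf, 0] if y = 0,
  # drawn by inverting the normal CDF.
  Z <- mu + ifelse(y == 1,
                   qnorm(pnorm(-mu) + u * (1 - pnorm(-mu))),
                   qnorm(u * pnorm(-mu)))
  # Conjugate update beta | Z ~ N(m*, V*), prior mean 0.
  V_star <- solve(V_inv + crossprod(X))
  m_star <- V_star %*% crossprod(X, Z)
  beta   <- as.vector(m_star + t(chol(V_star)) %*% rnorm(p + 1))
  draws[t, ] <- beta
}
colMeans(draws[-(1:200), ])               # posterior means after burn-in
```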
In the same way as for the probit model, the logit model can be defined by means of a latent variable: Zij = αi + Xiβj + Wiλj + ϵij for i = 1, …, I and j = 1, …, J, with ϵij ∼ Logistic(0, 1) iid and such that: $$y_{ij}=
\begin{cases}
1 & \text{ if } Z_{ij} > 0 \\
0 & \text{ otherwise.}
\end{cases}$$ However, in this case the prior distributions of the latent variable and the parameters are not conjugate, so we cannot use the properties of conjugate priors and modelling by means of a latent variable is of no use here.
In this case it is assumed that yij | θij ∼ ℬinomial(ni, θij), with logit(θij) = αi + Xiβj + Wiλj and ni the number of visits to site i.
Therefore, the parameters of this model will be sampled by estimating their conditional posterior distributions using an adaptive Metropolis algorithm. A prior distribution is specified for each parameter of the model:
$$\begin{array}{lll}
V_{\alpha} & \sim & \mathcal{IG}(\text{shape}=0.5, \text{rate}=0.005) \text{ with } \mathrm{rate}=\frac{1}{\mathrm{scale}}, \\
\beta_{jk} & \sim & \begin{cases}
\mathcal{N}(\mu_{\beta_{jk}},V_{\beta_{k}}) \text{ for } j=1,\ldots,J \text{ and } k=0,\ldots,p, & \text{if species traits data are provided,} \\
\text{ where } \mu_{\beta_{jk}} = \sum_{r=0}^{nt} t_{jr}\gamma_{rk} \text{ and } \gamma_{rk} \sim \mathcal{N}(\mu_{\gamma_{rk}},V_{\gamma_{rk}}) & \\
\text{ for } r=0,\ldots,nt \text{ and } k=0,\ldots,p; & \\
\mathcal{N}(\mu_{\beta_{k}},V_{\beta_{k}}) \text{ for } j=1,\ldots,J \text{ and } k=0,\ldots,p, & \text{if species traits data are not provided;} \\
\end{cases} \\
\lambda_{jl} & \sim & \begin{cases}
\mathcal{N}(\mu_{\lambda_{l}},V_{\lambda_{l}}) & \text{if } l < j, \\
\mathcal{N}(\mu_{\lambda_{l}},V_{\lambda_{l}}) \text{ left truncated by } 0 & \text{if } l=j, \\
P \text{ such that } \mathbb{P}(\lambda_{jl} = 0)=1 & \text{if } l>j, \\
\end{cases} \\
\quad & \quad & \text{ for } j=1,\ldots,J \text{ and } l=1,\ldots,q.
\end{array}$$
This algorithm belongs to the family of MCMC methods and produces a realization of the parameter θ = (θ1, …, θm) according to its conditional posterior distributions Π(θi | θ1, …, θi − 1, θi + 1, …, θm, x), for i = 1, …, m, which are known only up to a multiplicative constant.
It is called adaptive because the variance of the conditional instrumental (proposal) density is adapted according to the number of acceptances over the last iterations.
Initialization: θ(0) = (θ1(0), …, θm(0)) arbitrarily set, the acceptance numbers (niA)i = 1, …, m are initialized at 0 and the variances (σi2)i = 1, …, m are initialized at 1.
Iteration t: for i = 1, …, m:
Generate θi⋆ ∼ q(θi(t − 1), ·), with a symmetric conditional instrumental density q(θi(t − 1), θi⋆); for example, a 𝒩(θi(t − 1), σi2) distribution.
Calculate the acceptance probability: $$\gamma= \min\left(1,\dfrac{\Pi\left(\theta_i^\star \ | \ \theta_1^{(t-1)},\dots,\theta_{i-1}^{(t-1)},\theta_{i+1}^{(t-1)},\ldots,\theta_m^{(t-1)}, x \right)}{\Pi\left(\theta_i^{(t-1)} \ | \ \theta_1^{(t-1)},\dots,\theta_{i-1}^{(t-1)},\theta_{i+1}^{(t-1)},\ldots,\theta_m^{(t-1)},x\right)}\right)$$.
$$\theta_i^{(t)} = \begin{cases} \theta_i^\star & \text{ with probability } \gamma, \text{ in which case the acceptance number becomes } n^A_{i} \leftarrow n^A_{i} +1, \\ \theta_i^{(t-1)} & \text{ with probability } 1-\gamma. \\ \end{cases}$$
During the burn-in, every DIV iterations, with $$\mathrm{DIV} = \begin{cases}
100 & \text{ if } N_{Gibbs} \geq 1000 \\
\dfrac{N_{Gibbs}}{10}& \text{ otherwise, } \\
\end{cases}$$ where NGibbs is the total number of iterations performed, the variances are modified as a function of the acceptance numbers as follows, for i = 1, …, m:
The acceptance rate is calculated: $r^A_{i} = \dfrac{ n^A_i}{\mathrm{DIV}}$.
The variances are adapted according to the acceptance rate and a fixed constant Ropt: $$\sigma_i \leftarrow \begin{cases} \sigma_i\left(2-\dfrac{1-r^A_i}{1-R_{opt}}\right) & \text{ if } r^A_{i} \geq R_{opt}, \\ \\ \dfrac{\sigma_i}{2-\dfrac{r^A_i}{R_{opt}}} & \text{ otherwise,} \end{cases}$$ so that the proposal variance increases when the acceptance rate is above Ropt and decreases when it is below.
We reset the acceptance numbers: niA ← 0.
Every $\dfrac{N_{Gibbs}}{10}$ iterations, the average acceptance rate $m^A = \dfrac{1}{m}\sum\limits_{i=1,\ldots,m}r^A_i$ is calculated and displayed.
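A minimal R sketch of one such adaptive update for a scalar parameter, assuming a user-supplied function log_cond() that returns the log of its conditional posterior density up to an additive constant (all names here are illustrative, not the package's internals):

```r
# One Metropolis step with a symmetric normal proposal.
mh_step <- function(theta, sigma, log_cond) {
  theta_star <- rnorm(1, mean = theta, sd = sigma)
  log_gamma  <- log_cond(theta_star) - log_cond(theta)   # log acceptance ratio
  if (log(runif(1)) < log_gamma) {
    list(theta = theta_star, accepted = 1)               # n_i^A <- n_i^A + 1
  } else {
    list(theta = theta, accepted = 0)
  }
}

# Variance adaptation applied every DIV iterations during the burn-in.
adapt_sigma <- function(sigma, r_acc, r_opt) {
  if (r_acc >= r_opt) sigma * (2 - (1 - r_acc) / (1 - r_opt))  # widen proposal
  else                sigma / (2 - r_acc / r_opt)              # narrow proposal
}
```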
An adaptive Metropolis algorithm is used to sample the model parameters according to their conditional posterior distributions, which are known only up to a multiplicative constant.
First we define the function f that calculates the likelihood of the model as a function of the estimated parameters: f : (λj, βj, αi, Wi, Xi, yij, ni) ↦ f(λj, βj, αi, Wi, Xi, yij, ni) = L(θij).
Compute logit(θij) = αi + Xiβj + Wiλj.
Compute $\theta_{ij}= \dfrac{1}{1+\exp\left(-\mathrm{logit}(\theta_{ij})\right)}$.
Return $\mathrm{L}(\theta_{ij})= p(y_{ij} \ | \ \theta_{ij},n_i)= \dbinom{n_i}{y_{ij}}(\theta_{ij})^{y_{ij}}(1-\theta_{ij})^{n_i-y_{ij}}$.
We repeat those steps for i = 1, …, I and j = 1, …, J, and then we define θ = (θij)i = 1, …, Ij = 1, …, J. This allows us to calculate the likelihood of the model: $\mathrm{L}(\theta)= \prod\limits_{\substack{1\leq i\leq I \\ 1 \leq j\leq J}}\mathrm{L}(\theta_{ij})$.
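A sketch of f in R, where plogis() is the inverse logit and dbinom() the binomial probability mass function (the argument values are invented):

```r
# Sketch: likelihood term L(theta_ij) of the logit model for one observation.
f <- function(lambda_j, beta_j, alpha_i, W_i, X_i, y_ij, n_i) {
  theta_ij <- plogis(alpha_i + sum(X_i * beta_j) + sum(W_i * lambda_j))
  dbinom(y_ij, size = n_i, prob = theta_ij)
}
f(lambda_j = c(1, 0.5), beta_j = c(0, 1), alpha_i = 0.2,
  W_i = c(-0.3, 0.8), X_i = c(1, 0.4), y_ij = 2, n_i = 3)
```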
According to Bayes’ formula we have p(θ | Y) ∝ Π(θ)L(θ). We thus use the following relations, for i = 1, …, I, j = 1, …, J, k = 1, …, p and l = 1, …, q, to approach the conditional posterior densities of each of the parameters, where Π(.) denotes the densities corresponding to their prior distributions. $$\begin{aligned} & p(\beta_{jk} \ | \ \beta_{j0},\beta_{j1},\ldots,\beta_{jk-1},\beta_{jk+1},\ldots,\beta_{jp}, \lambda_j,\alpha_1,\ldots,\alpha_I, W_1,\ldots,W_I,Y) \propto \Pi(\beta_{jk})\prod\limits_{1\leq i\leq I} \mathrm{L}(\theta_{ij})\\ &p(\lambda_{jl} \ | \ \lambda_{j1},\ldots,\lambda_{jl-1},\lambda_{jl+1},\ldots,\lambda_{jq}, \beta_j,\alpha_1,\ldots,\alpha_I, W_1,\ldots,W_I,Y) \propto \Pi(\lambda_{jl}) \prod\limits_{1\leq i \leq I}\mathrm{L}(\theta_{ij})\\ &p(W_{il} \ | \ W_{i1},\ldots,W_{il-1},W_{il+1},\ldots,W_{iq},\alpha_i,\beta_1,\ldots,\beta_J,\lambda_1,\ldots, \lambda_J,Y) \propto \Pi(W_{il}) \prod\limits_{1\leq j\leq J}\mathrm{L}(\theta_{ij})\\ &p(\alpha_i \ | \ W_i,\beta_1,\ldots,\beta_J,\lambda_1,\ldots, \lambda_J,V_{\alpha},Y) \propto \Pi(\alpha_i \ | \ V_{\alpha}) \prod\limits_{1\leq j\leq J}\mathrm{L}(\theta_{ij}) \end{aligned}$$
The algorithm implemented in the jSDM_binomial_logit() function, based on the articles of Rosenthal (2009) and Roberts & Rosenthal (2001), to estimate the parameters of the logit model is the following:
Definition of the constants NGibbs, Nburn, Nthin and Ropt, where NGibbs corresponds to the number of iterations performed by the algorithm, Nburn to the number of iterations required for the burn-in (or warm-up) time, and Nthin to the thinning interval, so that $N_{samp}= \dfrac{N_{Gibbs}-N_{burn}}{N_{thin}}$ corresponds to the number of estimated values retained for each parameter. Indeed, we record the estimated parameters only at certain iterations in order to obtain Nsamp values, allowing us to represent a posterior distribution for each parameter.
We set Ropt, the optimal acceptance rate used in the adaptive Metropolis algorithms implemented for each parameter of the model.
Initialize all parameters, to 0 for example, except the diagonal values of Λ, which are initialized at 1, and Vα(0) = 1. The acceptance number of each parameter is initialized to 0 and the variances of their conditional instrumental densities take the value 1.
Gibbs sampler: at each iteration t, for t = 1, …, NGibbs, we repeat each of these steps:
Generate the random site effects αi(t) for i = 1, …, I according to an adaptive Metropolis algorithm that simulates αi⋆ ∼ 𝒩(αi(t − 1), σαi2) and then calculates the acceptance probability as follows:
$$\gamma =\min\left(1, \ \dfrac{\Pi\left(\alpha_i^\star \ | \ V_{\alpha}^{(t-1)}\right)\prod\limits_{1\leq j\leq J}f\left(\alpha_i^\star, W_i^{(t-1)},\beta_j^{(t-1)}, \lambda_j^{(t-1)}, X_i,y_{ij},n_i\right)}{\Pi\left(\alpha_i^{(t-1)} \ | \ V_{\alpha}^{(t-1)}\right)\prod\limits_{1\leq j\leq J}f\left(\alpha_i^{(t-1)}, W_i^{(t-1)},\beta_j^{(t-1)}, \lambda_j^{(t-1)}, X_i,y_{ij},n_i\right)}\right).$$
Generate the variance of random site effects Vα(t) according to: $$V_\alpha^{(t)} \ | \ \alpha_1^{(t)},\ldots,\alpha_I^{(t)} \sim \mathcal{IG}\left( \text{shape}=0.5 + \frac{I}{2}, \text{rate}=0.005 + \frac{1}{2}\sum\limits_{i=1}^I \left(\alpha_i^{(t)}\right)^2\right)$$
Generate the latent variables (or unmeasured predictors) Wil(t) for i = 1, …, I and l = 1, …, q according to an adaptive Metropolis algorithm that simulates Wil⋆ ∼ 𝒩(Wil(t − 1), σWil2) and then calculates the acceptance probability as follows:
$$\gamma = \min\left(1,\ \dfrac{\Pi\left(W_{il}^\star\right)\prod\limits_{1\leq j\leq J}f\left(W_{il}^\star, \alpha_i^{(t)},\beta_j^{(t-1)}, \lambda_j^{(t-1)},X_i,y_{ij},n_i\right)} {\Pi\left(W_{il}^{(t-1)}\right)\prod\limits_{1\leq j\leq J}f\left(W_{il}^{(t-1)}, \alpha_i^{(t)},\beta_j^{(t-1)}, \lambda_j^{(t-1)}, X_i,y_{ij},n_i\right)}\right).$$
If species traits data are provided, generate the effects of species-specific traits on species’ responses γ(t) = (γrk(t))k = 0, …, pr = 0, …, nt such that γrk(t) | β1k(t − 1), …, βJk(t − 1) ∼ 𝒩(m⋆, V⋆), with $$m^\star = \left(V_{\gamma_{rk}}^{-1} + T_r'T_r\right)^{-1}\left(V_{\gamma_{rk}}^{-1}\mu_{\gamma_{rk}} + T_r'\left(\beta_k^{(t-1)} - \sum\limits_{r' \neq r} T_{r'} \gamma_{r'k}^{(t-1)} \right)\right) \text{ and } V^\star = \left(V_{\gamma_{rk}}^{-1}+ T_r'T_r\right)^{-1}.$$
Generate the fixed species effects βjk(t) for j = 1, …, J and k = 0, …, p according to an adaptive Metropolis algorithm that simulates βjk⋆ ∼ 𝒩(βjk(t − 1), σβjk2) and then calculates the acceptance probability as follows:
$$\gamma = \min\left(1,\dfrac{\Pi\left(\beta_{jk}^\star\right)\prod\limits_{1\leq i\leq I}f\left(\beta_{j0}^{(t)},\small{\ldots},\beta_{jk-1}^{(t)},\beta_{jk}^\star,\beta_{jk+1}^{(t-1)},\small{\ldots}, \beta_{jp}^{(t-1)},\lambda_j^{(t-1)}, \alpha_1^{(t)},W_1^{(t)},\small{\ldots},\alpha_I^{(t)}, W_I^{(t)},X_i,y_{ij},n_i\right)} {\Pi\left(\beta_{jk}^{(t-1)}\right)\prod\limits_{1\leq i\leq I}f\left(\beta_{j0}^{(t)},\small{\ldots},\beta_{jk-1}^{(t)},\beta_{jk}^{(t-1)},\beta_{jk+1}^{(t-1)},\small{\ldots}, \beta_{jp}^{(t-1)},\lambda_j^{(t-1)}, \alpha_1^{(t)},W_1^{(t)}, \small{\ldots},\alpha_I^{(t)}, W_I^{(t)},X_i,y_{ij},n_i\right)}\right).$$
According to the article by Hui (2016), we can use the Poisson distribution for the analysis of multivariate abundance data, with estimation performed using Bayesian Markov chain Monte Carlo methods.
In this case, it is assumed that yij ∼ 𝒫oisson(θij), with log(θij) = αi + Xiβj + Wiλj.
We therefore consider abundance data with a response variable denoted Y = (yij)j = 1, …, nspi = 1, …, nsite such that:
$$y_{ij}=\begin{cases} 0 & \text{if species $j$ has been observed as absent at site $i$}\\ n & \text{if $n$ individuals of the species $j$ have been observed at the site $i$}. \end{cases}$$
In this case, we cannot use the properties of conjugate priors; therefore, the parameters of this model will be sampled by estimating their conditional posterior distributions using an adaptive Metropolis algorithm within the Gibbs sampler, in the same way as for the logit model.
We use the same algorithm as before, replacing the logit link function by a log link function and the binomial distribution by a Poisson distribution to calculate the likelihood of the model in the jSDM_poisson_log() function.
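The corresponding likelihood term, sketched in R, only swaps the inverse link and the distribution (with invented argument values again):

```r
# Sketch: likelihood term of the Poisson model with a log link.
f_pois <- function(lambda_j, beta_j, alpha_i, W_i, X_i, y_ij) {
  theta_ij <- exp(alpha_i + sum(X_i * beta_j) + sum(W_i * lambda_j))
  dpois(y_ij, lambda = theta_ij)
}
f_pois(lambda_j = c(1, 0.5), beta_j = c(0, 1), alpha_i = 0.2,
       W_i = c(-0.3, 0.8), X_i = c(1, 0.4), y_ij = 4)
```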