According to the article by Albert and Chib (1993), a possible model is to assume the existence of an underlying latent variable related to our observed binary variable through the following proposition:
$$ \begin{aligned} &z_{ij} = \alpha_i + X_i'\beta_j+ W_i'\lambda_j + \epsilon_{ij},\\ &\text{ with } \epsilon_{ij} \sim \mathcal{N}(0,1) \ \forall ij \text{ and such that: } \\ &y_{ij}= \begin{cases} 1 & \text{ if } z_{ij} > 0 \\ 0 & \text{ otherwise,} \end{cases} \end{aligned} \Rightarrow \begin{cases} y_{ij} \sim \mathcal{B}ernoulli(\theta_{ij}) \text{ with } \\ \theta_{ij} = \Phi(\alpha_i + X_i'\beta_j+ W_i'\lambda_j), \\ \text{where } \Phi \text{ denotes the cumulative distribution} \\ \text{function of the standard normal distribution.} \end{cases} $$
$$\begin{aligned} \mathbb{P}(y_{ij}=1) & = \mathbb{P}(z_{ij} > 0)\\ & = \mathbb{P}(\alpha_i + X_i'\beta_j + W_i'\lambda_j + \epsilon_{ij} > 0)\\ & = \mathbb{P}(\epsilon_{ij} > - (\alpha_i + X_i'\beta_j + W_i'\lambda_j) \ ) \\ & = \mathbb{P}(\epsilon_{ij} \leq \alpha_i + X_i'\beta_j + W_i'\lambda_j) \\ & = \Phi( \alpha_i + X_i'\beta_j + W_i'\lambda_j) \\ \end{aligned}$$
In the same way:
$$\begin{aligned} \mathbb{P}(y_{ij}=0) & = \mathbb{P}(z_{ij} \leq 0)\\ & = \mathbb{P}(\epsilon_{ij} \leq - (\alpha_i + X_i'\beta_j + W_i'\lambda_j) \ ) \\ & = \mathbb{P}(\epsilon_{ij} > \alpha_i + X_i'\beta_j + W_i'\lambda_j) \\ & = 1 - \Phi( \alpha_i + X_i'\beta_j + W_i'\lambda_j) \\ \end{aligned}$$
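As a minimal numerical illustration of this link function, θij can be computed in R with the standard normal cumulative distribution function pnorm(); all values below are invented:

```r
# Sketch: probit success probability for one site i and species j,
# with invented values (intercept plus p = 2 covariates, q = 2 latent axes).
alpha_i  <- 0.3                 # random site effect
X_i      <- c(1, 0.5, -1.2)     # intercept and covariates at site i
beta_j   <- c(-0.2, 0.8, 0.1)   # regression coefficients of species j
W_i      <- c(0.4, -0.7)        # latent variables of site i
lambda_j <- c(1.1, 0.6)         # factor loadings of species j

theta_ij <- pnorm(alpha_i + sum(X_i * beta_j) + sum(W_i * lambda_j))
theta_ij  # P(y_ij = 1)
```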
The model involves the following parameters and priors:
Latent variables: Wi = (Wi1, …, Wiq)′, where q is the number of latent variables considered, which has to be fixed by the user (q = 2 by default). We assume that Wi ∼ 𝒩(0, Iq) and we define the associated factor loadings λj = (λj1, …, λjq)′. We use a 𝒩(μλ, Vλ) prior distribution for each λjl that is not constrained: the entries of Λ above the diagonal are fixed at 0 and its diagonal entries are constrained to strictly positive values.
Explanatory variables:
The corresponding regression coefficients for each species j are denoted βj = (βj0, βj1, …, βjp)′, where βj0 represents the intercept for species j, which is assumed to be a fixed effect.
αi represents the random effect of site i such that αi ∼ 𝒩(0, Vα), and we assume the prior distribution Vα ∼ ℐ𝒢(shape = 0.5, rate = 0.005) by default.
We go back to a model of the form Z′ = Xβ + ϵ to estimate the posterior distributions of the betas, lambdas and latent variables Wi of the model. For example, concerning λj, we define Z′ij = Zij − αi − Xi′βj such that Z′ij = Wi′λj + ϵij, so Z′ij | Wi, λj ∼ 𝒩(Wi′λj, 1).
In this case we can use the following proposition:
$$\begin{cases} Y \ | \ \beta &\sim \mathcal{N}_n ( X\beta, I_n) \\ \beta &\sim \mathcal{N}_p (m,V) \end{cases} \Rightarrow \begin{cases} \beta|Y &\sim \mathcal{N}_p (m^*,V^*) \text{ with } \\ m^* &= (V^{-1} + X'X)^{-1}(V^{-1}m + X'Y)\\ V^*&=(V^{-1} + X'X)^{-1} \end{cases}$$.
$$\begin{aligned} p(\beta \ | \ Y) & \propto p(Y \ | \ \beta) \ p(\beta) \\ & \propto \frac{1}{(2\pi)^{\frac{n}{2}}}\exp\left(-\frac{1}{2}(Y-X\beta)'(Y-X\beta)\right)\frac{1}{(2\pi)^{\frac{p}{2}}|V|^{\frac{1}{2}}}\exp\left(-\frac{1}{2}(\beta-m)'V^{-1}(\beta-m)\right) \\ & \propto \exp\left(-\frac{1}{2}\left((\beta-m)'V^{-1}(\beta-m) + (Y-X\beta)'(Y-X\beta)\right)\right) \\ & \propto \exp\left(-\frac{1}{2}\left(\beta'V^{-1}\beta + m'V^{-1}m - m'V^{-1}\beta -\beta'V^{-1}m + Y'Y + \beta'X'X\beta - Y'X\beta - \beta'X'Y\right)\right) \\ & \propto \exp\left(-\frac{1}{2}\left(\beta'(V^{-1}+X'X)\beta -\beta'(V^{-1}m + X'Y) - (Y'X + m'V^{-1})\beta + m'V^{-1}m + Y'Y \right)\right) \\ & \propto \exp\left(-\frac{1}{2}\left(\beta'(V^{-1}+X'X)\beta -\beta'(V^{-1}m + X'Y) - (X'Y + V^{-1}m)'\beta + m'V^{-1}m + Y'Y \right)\right) \\ & \propto \exp(-\frac{1}{2}\left(\beta - (V^{-1}+X'X)^{-1}(V^{-1}m + X'Y)\right)'(V^{-1}+X'X)\left(\beta - (V^{-1}+X'X)^{-1}(V^{-1}m + X'Y)\right)\\ & \quad -(V^{-1}m + X'Y)'(V^{-1}+X'X)^{-1}(V^{-1}m + X'Y) +m'V^{-1}m + Y'Y)\\ & \propto \exp\left(-\frac{1}{2}\left(\beta - \underbrace{(V^{-1}+X'X)^{-1}(V^{-1}m + X'Y)}_{m^*}\right)'\underbrace{(V^{-1}+X'X)}_{{V^*}^{-1}}\left(\beta - (V^{-1}+X'X)^{-1}(V^{-1}m + X'Y)\right)\right) \end{aligned}$$
We use this proposition to estimate the betas and lambdas, as well as the gammas if species traits data are provided.
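As an illustrative check, and not the package's internal code, this conjugate update can be written in a few lines of R, with simulated data and invented prior values:

```r
# Sketch: conjugate update beta | Y ~ N(m*, V*) for Y | beta ~ N(X beta, I_n)
# and beta ~ N(m, V), on simulated data.
set.seed(42)
n <- 100; p <- 3
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))
Y <- as.vector(X %*% c(0.5, -1, 2) + rnorm(n))

m <- rep(0, p); V_inv <- diag(1 / 10, p)      # prior N(m, V) with V = 10 I_p
V_star <- solve(V_inv + crossprod(X))         # (V^-1 + X'X)^-1
m_star <- V_star %*% (V_inv %*% m + crossprod(X, Y))

# One posterior draw from N(m*, V*):
beta_draw <- as.vector(m_star + t(chol(V_star)) %*% rnorm(p))
beta_draw
```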
Concerning the posterior distribution of the random site effects (αi)i = 1, …, nsite, we can use a transformation of the form Z′ij = αi + ϵij, with Z′ij = Zij − Wi′λj − Xi′βj, so Z′ij | Wi, λj, βj, αi ∼ 𝒩(αi, 1). We then use the following proposition:
$$\begin{cases} x \ | \ \theta & \sim \mathcal{N}(\theta, \ \sigma^2) \\ \theta & \sim \mathcal{N}(\mu_0,{\tau_0}^2) \\ \sigma^2 & \text{ known} \end{cases} \Rightarrow \begin{cases} \theta | \ x &\sim \mathcal{N}(\mu_1,{\tau_1}^2) \text{ with }\\ \mu_1 &= \dfrac{{\tau_0}^{-2}\mu_0 + x\sigma^{-2}}{{\tau_0}^{-2}+\sigma^{-2}} \\ {\tau_1}^{-2} &={\tau_0}^{-2}+\sigma^{-2} \end{cases}$$.
$$\begin{aligned} p(\theta \ | \ x) & \propto p(x \ | \ \theta) \ p(\theta) \\ & \propto \frac{1}{(2\pi\sigma^2)^{\frac{1}{2}}}\exp\left(-\frac{1}{2\sigma^2}(x-\theta)^2\right)\frac{1}{(2\pi{\tau_0}^2)^{\frac{1}{2}}}\exp\left(-\frac{1}{2{\tau_0}^2}(\theta-\mu_0)^2\right) \\ & \propto \exp\left(-\frac{1}{2{\tau_0}^2}(\theta-\mu_0)^2-\frac{1}{2\sigma^2}(x-\theta)^2\right) \\ & \propto \exp\left(-\frac{1}{2{\tau_0}^2}(\theta^2-2\mu_0\theta)-\frac{1}{2\sigma^2}(\theta^2-2x\theta)\right)\\ & \propto \exp\left(-\frac{1}{2}\left(\theta^2 ({\tau_0}^{-2}+\sigma^{-2})-2\mu_0\theta{\tau_0}^{-2}-2x\theta\sigma^{-2}\right)\right)\\ & \propto \exp\left(-\frac{1}{2({\tau_0}^{-2}+\sigma^{-2})^{-1}}\left(\theta^2 -2\theta \frac{\mu_0{\tau_0}^{-2}+ x\sigma^{-2}}{{\tau_0}^{-2}+\sigma^{-2}}\right)\right)\\ \end{aligned}$$
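The scalar case reduces to a couple of lines of R; a sketch with invented values:

```r
# Sketch: theta | x ~ N(mu1, tau1^2) for x | theta ~ N(theta, sigma2)
# and theta ~ N(mu0, tau02), sigma2 known.
x <- 1.2; sigma2 <- 1; mu0 <- 0; tau02 <- 10
prec1 <- 1 / tau02 + 1 / sigma2                 # tau1^-2
mu1   <- (mu0 / tau02 + x / sigma2) / prec1
theta_draw <- rnorm(1, mean = mu1, sd = sqrt(1 / prec1))
theta_draw
```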
Concerning the posterior distribution of Vα, the variance of the random site effects (αi)i = 1, …, nsite, we use the following proposition:
$$\begin{cases} x \ | \ \sigma^2 & \sim \mathcal{N}_n (\theta, \ \sigma^2I_n) \\ \sigma^2 & \sim \mathcal{IG} (a,b) \\ \theta & \text{ known} \end{cases} \Rightarrow \begin{cases} \sigma^2|x \sim \mathcal{IG}(a',b') \text{ with } \\ a' = a + \frac{n}{2} \\ b' = \frac{1}{2}\sum\limits_{i=1}^n(x_i-\theta)^2 + b. \end{cases}$$
$$\begin{aligned} p(\sigma^2 \ | \ x) & \propto p(x \ | \ \sigma^2) \ p(\sigma^2) \\ & \propto \frac{1}{(2\pi\sigma^2)^{\frac{n}{2}}}\exp\left(-\frac{1}{2\sigma^2}(x-\theta)'(x-\theta)\right)\frac{b^a}{\Gamma(a)}{(\sigma^2)}^{-(a+1)}\exp\left(-\frac{b}{\sigma^2}\right) \\ & \propto {(\sigma^2)}^{-\left(\underbrace{\frac{n}{2}+a}_{a'}+1\right)}\exp\left(-\frac{1}{\sigma^2}\underbrace{\left(b+\frac{1}{2}\sum\limits_{i=1}^n(x_i-\theta)^2\right)}_{b'}\right) \end{aligned}$$
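In R, a draw from ℐ𝒢(a′, b′) can be obtained as the reciprocal of a Gamma draw; a sketch using the default hyperparameters given above for Vα, on simulated data:

```r
# Sketch: sigma2 | x ~ IG(a', b'); if sigma2 ~ IG(shape = a, rate = b)
# then 1/sigma2 ~ Gamma(shape = a, rate = b).
a <- 0.5; b <- 0.005                       # defaults used for V_alpha
x <- rnorm(50, mean = 0, sd = 1.5)         # simulated data, theta = 0 known
a_post <- a + length(x) / 2
b_post <- b + 0.5 * sum((x - 0)^2)
sigma2_draw <- 1 / rgamma(1, shape = a_post, rate = b_post)
sigma2_draw
```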
In the Bayesian framework, the Gibbs algorithm produces a realization of the parameter θ = (θ1, …, θm) from the posterior distribution Π(θ | x), as soon as we are able to express the full conditional distributions Π(θi | θ1, …, θi − 1, θi + 1, …, θm, x) for i = 1, …, m.
Gibbs sampling consists of:
Initialization: arbitrary choice of θ(0) = (θ1(0), …, θm(0)).
Iteration t: generate θ(t) as follows:
θ1(t) ∼ Π(θ1 | θ2(t − 1), …, θm(t − 1), x)
θ2(t) ∼ Π(θ2 | θ1(t), θ3(t − 1), …, θm(t − 1), x)
⋮
θm(t) ∼ Π(θm | θ1(t), …, θm − 1(t), x)
Successive iterations of this algorithm generate the states of a Markov chain {θ(t), t > 0} with values in ℝm; one can show that this chain admits the posterior distribution as its invariant measure.
For a sufficiently large number of iterations, the vector θ obtained can thus be considered as a realization of the joint posterior distribution Π(θ | x).
Consequently, implementing a Gibbs sampler requires knowing the posterior distribution of each parameter conditionally on the other parameters of the model. These full conditionals can be deduced from the conjugate-prior formulas above in the case of the probit model, but they are not explicitly expressible when a logit or log link function is used.
The algorithm used in the jSDM_binomial_probit() function to estimate the parameters of the probit model is therefore as follows:
Initialize all parameters, to 0 for example, except the diagonal values of Λ, which are initialized at 1, and Vα(0) = 1.
Gibbs sampler: at each iteration t, for t = 1, …, NGibbs, we repeat each of these steps:
Generate the latent variable Z(t) = (Zij(t))i = 1, …, Ij = 1, …, J such that $$Z_{ij}^{(t)} \sim \begin{cases} \mathcal{N}\left(\alpha_i^{(t-1)} + X_i\beta_j^{(t-1)} + W_i^{(t-1)}\lambda_j^{(t-1)}, \ 1 \right) \text{ right truncated by } 0 & \text{ if } y_{ij} =0 \\ \mathcal{N}\left(\alpha_i^{(t-1)} + X_i\beta_j^{(t-1)} + W_i^{(t-1)}\lambda_j^{(t-1)}, \ 1 \right) \text{ left truncated by } 0 & \text{ if } y_{ij} =1 \end{cases}$$ At the first iteration, the latent variable is thus initialized by drawing from these truncated normal distributions, whose means are 0 given the initial parameter values (see the sketch after this algorithm for how such truncated-normal draws can be obtained).
If species traits data are provided, generate the effects of species-specific traits on species’ responses γ(t) = (γrk(t))k = 0, …, pr = 0, …, nt such that γrk(t) | β1k(t − 1), …, βJk(t − 1) ∼ 𝒩(m⋆, V⋆), with $$m^\star = \left(V_{\gamma_{rk}}^{-1} + T_r'T_r\right)^{-1}\left(V_{\gamma_{rk}}^{-1}\mu_{\gamma_{rk}} + T_r'\left(\beta_k^{(t-1)} - \sum\limits_{r' \neq r} T_{r'} \gamma_{r'k}^{(t-1)} \right)\right) \text{ and } V^\star = \left(V_{\gamma_{rk}}^{-1}+ T_r'T_r\right)^{-1}.$$
Generate the fixed species effects βj(t) = (βj0(t), βj1(t), …, βjp(t))′ for j = 1, …, J such that βj(t) | Z(t), W1(t − 1), α1(t − 1), …, WI(t − 1), αI(t − 1), λj1(t − 1), …, λjq(t − 1) ∼ 𝒩p + 1(m⋆, V⋆), with m⋆ = (Vβ−1 + X′X)−1(Vβ−1μβj + X′Zj⋆) and V⋆ = (Vβ−1 + X′X)−1, where Zj⋆ = (Z1j⋆, …, ZIj⋆)′ such that Zij⋆ = Zij(t) − Wi(t − 1)λj(t − 1) − αi(t − 1).
Generate the loading factors related to the latent variables λj(t) = (λj1(t), …, λjq(t))′ for j = 1, …, J such that λjl(t) | Z(t), βj(t), α(t − 1), W(t − 1), λ1(t − 1), …, λl − 1(t − 1), λl + 1(t − 1), …, λq(t − 1) ∼ 𝒩(m⋆, V⋆), with m⋆ = (Vλ−1 + Wl(t − 1)′Wl(t − 1))−1(Vλ−1μλ + Wl(t − 1)′Zj⋆) and V⋆ = (Vλ−1 + Wl(t − 1)′Wl(t − 1))−1, $$\text{ where } Z_j^\star =(Z_{1j}^\star,\ldots,Z_{Ij}^\star)' \text{ such that } Z^\star_{ij} = Z_{ij}^{(t)}-X_i\beta_j^{(t)}-\alpha_i^{(t-1)}-\sum\limits_{l'\neq l}W_{il'}\lambda_{jl'}.$$ In order to constrain the diagonal values of Λ = (λjl)j = 1, …, Jl = 1, …, q to positive values and make the matrix lower triangular, the values of λj(t) are simulated according to the following conditions: $$ \lambda_{jl}^{(t)} \sim \begin{cases} P \text{ such that } \mathbb{P}(\lambda_{jl} = 0)=1 & \text{ if } l>j, \\ \mathcal{N}(m^\star,V^\star) \text{ left truncated by } 0 & \text{ if } l=j, \\ \mathcal{N}(m^\star,V^\star) & \text{ if } l<j. \end{cases}$$
Generate the latent variables (or unmeasured predictors) Wi(t) for i = 1, …, I according to: Wi(t) | Z(t), λ(t), β(t), αi(t − 1) ∼ 𝒩q((Iq + Λ(t)′Λ(t))−1(Λ(t)′Zi⋆), (Iq + Λ(t)′Λ(t))−1), where Zi⋆ = (Zi1⋆, …, ZiJ⋆)′ such that Zij⋆ = Zij(t) − αi(t − 1) − Xiβj(t).
Generate the random site effects αi(t) for i = 1, …, I according to: $$ \alpha_i^{(t)} \ | \ Z^{(t)}, \lambda^{(t)}, \beta^{(t)}, W_i^{(t)} \sim \mathcal{N}\left(\dfrac{ \sum_{j=1}^J \left( Z_{ij}^{(t)} - X_i\beta_j^{(t)} - W_i^{(t)}\lambda_j^{(t)}\right)}{{V_{\alpha}^{(t-1)}}^{-1} + J} , \left( \frac{1}{V_{\alpha}^{(t-1)}}+ J \right)^{-1} \right)$$
Generate the variance of random site effects Vα(t) according to: $$V_\alpha^{(t)} \ | \ \alpha_1^{(t)},\ldots,\alpha_I^{(t)} \sim \mathcal{IG}\left( \text{shape}=0.5 + \frac{I}{2}, \text{rate}=0.005 + \frac{1}{2}\sum\limits_{i=1}^I \left(\alpha_i^{(t)}\right)^2\right)$$
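The following minimal R sketch illustrates the two key ingredients of this sampler, the truncated-normal draw of the latent variable Z and the conjugate update of the regression coefficients, on a deliberately simplified single-species probit model without site effects or latent variables. It is not the jSDM implementation, only an illustration of the mechanism:

```r
# Simplified Albert & Chib-type Gibbs sampler: probit regression, beta only.
set.seed(1)
I <- 200; p <- 2
X <- cbind(1, matrix(rnorm(I * p), I, p))
beta_true <- c(-0.5, 1, -1)
y <- rbinom(I, 1, pnorm(as.vector(X %*% beta_true)))

V_inv <- diag(1 / 10, p + 1)              # prior beta ~ N(0, 10 I)
beta  <- rep(0, p + 1)                    # initialization at 0
n_gibbs <- 1000
draws <- matrix(NA, n_gibbs, p + 1)
for (t in seq_len(n_gibbs)) {
  mu <- as.vector(X %*% beta)
  u  <- runif(I)
  # Latent Z: N(mu, 1) truncated to (0, Inf) if y = 1, to (-Inf, 0] if y = 0,
  # drawn by inverting the normal CDF.
  Z <- mu + ifelse(y == 1,
                   qnorm(pnorm(-mu) + u * (1 - pnorm(-mu))),
                   qnorm(u * pnorm(-mu)))
  # Conjugate update beta | Z ~ N(m*, V*), prior mean 0.
  V_star <- solve(V_inv + crossprod(X))
  m_star <- V_star %*% crossprod(X, Z)
  beta   <- as.vector(m_star + t(chol(V_star)) %*% rnorm(p + 1))
  draws[t, ] <- beta
}
colMeans(draws[-(1:200), ])               # posterior means after burn-in
```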
In the same way as for the probit model, the logit model can be defined by means of a latent variable: Zij = αi + Xiβj + Wiλj + ϵij for i = 1, …, I and j = 1, …, J, with ϵij ∼ Logistic(0, 1) iid and such that: $$y_{ij}=
\begin{cases}
1 & \text{ if } Z_{ij} > 0 \\
0 & \text{ otherwise.}
\end{cases}$$ However, in this case the prior distributions of the latent variable and the parameters are not conjugate, so we cannot use the properties of conjugate priors and modelling by means of a latent variable is of no use here.
In this case it is assumed that yij | θij ∼ ℬinomial(ni, θij), with logit(θij) = αi + Xiβj + Wiλj and ni the number of visits to site i.
Therefore, the parameters of this model will be sampled by estimating their conditional posterior distributions using an adaptive Metropolis algorithm. A prior distribution is specified for each parameter of the model:
$$\begin{array}{lll}
V_{\alpha} & \sim & \mathcal{IG}(\text{shape}=0.5, \text{rate}=0.005) \text{ with } \mathrm{rate}=\frac{1}{\mathrm{scale}}, \\
\beta_{jk} & \sim & \begin{cases}
\mathcal{N}(\mu_{\beta_{jk}},V_{\beta_{k}}) \text{ for } j=1,\ldots,J \text{ and } k=0,\ldots,p, & \text{if species traits data are provided,} \\
\text{ where } \mu_{\beta_{jk}} = \sum_{r=0}^{nt} t_{jr}\gamma_{rk} \text{ and } \gamma_{rk} \sim \mathcal{N}(\mu_{\gamma_{rk}},V_{\gamma_{rk}}) & \\
\text{ for } r=0,\ldots,nt \text{ and } k=0,\ldots,p; & \\
\mathcal{N}(\mu_{\beta_{k}},V_{\beta_{k}}) \text{ for } j=1,\ldots,J \text{ and } k=0,\ldots,p, & \text{if species traits data are not provided;} \\
\end{cases} \\
\lambda_{jl} & \sim & \begin{cases}
\mathcal{N}(\mu_{\lambda_{l}},V_{\lambda_{l}}) & \text{if } l < j, \\
\mathcal{N}(\mu_{\lambda_{l}},V_{\lambda_{l}}) \text{ left truncated by } 0 & \text{if } l=j, \\
P \text{ such that } \mathbb{P}(\lambda_{jl} = 0)=1 & \text{if } l>j, \\
\end{cases} \\
\quad & \quad & \text{ for } j=1,\ldots,J \text{ and } l=1,\ldots,q.
\end{array}$$
This algorithm belongs to the family of MCMC methods and produces a realization of the parameter θ = (θ1, …, θm) according to its conditional posterior distributions Π(θi | θ1, …, θi − 1, θi + 1, …, θm, x), for i = 1, …, m, which are known only up to a multiplicative constant.
It is called adaptive because the variance of the conditional instrumental (proposal) density is adapted according to the number of acceptances over the last iterations.
Initialization: θ(0) = (θ1(0), …, θm(0)) arbitrarily set, the acceptance numbers (niA)i = 1, …, m are initialized at 0 and the variances (σi2)i = 1, …, m are initialized at 1.
Iteration t: for i = 1, …, m:
Generate θi⋆ ∼ q(θi(t − 1), ·), with a symmetric conditional instrumental density q(θi(t − 1), θi⋆); for example, a 𝒩(θi(t − 1), σi2) distribution.
Calculate the acceptance probability: $$\gamma= \min\left(1,\dfrac{\Pi\left(\theta_i^\star \ | \ \theta_1^{(t-1)},\dots,\theta_{i-1}^{(t-1)},\theta_{i+1}^{(t-1)},\ldots,\theta_m^{(t-1)}, x \right)}{\Pi\left(\theta_i^{(t-1)} \ | \ \theta_1^{(t-1)},\dots,\theta_{i-1}^{(t-1)},\theta_{i+1}^{(t-1)},\ldots,\theta_m^{(t-1)},x\right)}\right)$$.
$$\theta_i^{(t)} = \begin{cases} \theta_i^\star & \text{ with probability } \gamma, \text{ in which case the acceptance number becomes } n^A_{i} \leftarrow n^A_{i} +1, \\ \theta_i^{(t-1)} & \text{ with probability } 1-\gamma. \\ \end{cases}$$
During the burn-in, every DIV iterations, with $$\mathrm{DIV} = \begin{cases}
100 & \text{ if } N_{Gibbs} \geq 1000 \\
\dfrac{N_{Gibbs}}{10}& \text{ otherwise, } \\
\end{cases}$$ where NGibbs is the total number of iterations performed, the variances are modified as a function of the acceptance numbers as follows, for i = 1, …, m:
The acceptance rate is calculated: $r^A_{i} = \dfrac{ n^A_i}{\mathrm{DIV}}$.
The variances are adapted according to the acceptance rate and a fixed constant Ropt: $$\sigma_i \leftarrow \begin{cases} \sigma_i\left(2-\dfrac{1-r^A_i}{1-R_{opt}}\right) & \text{ if } r^A_{i} \geq R_{opt}, \\ \\ \dfrac{\sigma_i}{2-\dfrac{r^A_i}{R_{opt}}} & \text{ otherwise,} \end{cases}$$ so that the proposal variance increases when the acceptance rate is above Ropt and decreases when it is below.
We reset the acceptance numbers: niA ← 0.
Every $\dfrac{N_{Gibbs}}{10}$ iterations, the average acceptance rate $m^A = \dfrac{1}{m}\sum\limits_{i=1,\ldots,m}r^A_i$ is calculated and displayed.
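A minimal R sketch of one such adaptive update for a scalar parameter, assuming a user-supplied function log_cond() that returns the log of its conditional posterior density up to an additive constant (all names here are illustrative, not the package's internals):

```r
# One Metropolis step with a symmetric normal proposal.
mh_step <- function(theta, sigma, log_cond) {
  theta_star <- rnorm(1, mean = theta, sd = sigma)
  log_gamma  <- log_cond(theta_star) - log_cond(theta)   # log acceptance ratio
  if (log(runif(1)) < log_gamma) {
    list(theta = theta_star, accepted = 1)               # n_i^A <- n_i^A + 1
  } else {
    list(theta = theta, accepted = 0)
  }
}

# Variance adaptation applied every DIV iterations during the burn-in.
adapt_sigma <- function(sigma, r_acc, r_opt) {
  if (r_acc >= r_opt) sigma * (2 - (1 - r_acc) / (1 - r_opt))  # widen proposal
  else                sigma / (2 - r_acc / r_opt)              # narrow proposal
}
```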
An adaptive Metropolis algorithm is used to sample the model parameters according to their conditional posterior distributions, which are known only up to a multiplicative constant.
First we define the function f that calculates the likelihood of the model as a function of the estimated parameters: f : (λj, βj, αi, Wi, Xi, yij, ni) ↦ f(λj, βj, αi, Wi, Xi, yij, ni) = L(θij).
Compute logit(θij) = αi + Xiβj + Wiλj.
Compute $\theta_{ij}= \dfrac{1}{1+\exp\left(-\mathrm{logit}(\theta_{ij})\right)}$.
Return $\mathrm{L}(\theta_{ij})= p(y_{ij} \ | \ \theta_{ij},n_i)= \dbinom{n_i}{y_{ij}}(\theta_{ij})^{y_{ij}}(1-\theta_{ij})^{n_i-y_{ij}}$.
We repeat those steps for i = 1, …, I and j = 1, …, J, and then we define θ = (θij)i = 1, …, Ij = 1, …, J. This allows us to calculate the likelihood of the model: $\mathrm{L}(\theta)= \prod\limits_{\substack{1\leq i\leq I \\ 1 \leq j\leq J}}\mathrm{L}(\theta_{ij})$.
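A sketch of f in R, where plogis() is the inverse logit and dbinom() the binomial probability mass function (the argument values are invented):

```r
# Sketch: likelihood term L(theta_ij) of the logit model for one observation.
f <- function(lambda_j, beta_j, alpha_i, W_i, X_i, y_ij, n_i) {
  theta_ij <- plogis(alpha_i + sum(X_i * beta_j) + sum(W_i * lambda_j))
  dbinom(y_ij, size = n_i, prob = theta_ij)
}
f(lambda_j = c(1, 0.5), beta_j = c(0, 1), alpha_i = 0.2,
  W_i = c(-0.3, 0.8), X_i = c(1, 0.4), y_ij = 2, n_i = 3)
```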
According to Bayes’ formula we have p(θ | Y) ∝ Π(θ)L(θ). We thus use the following relations, for i = 1, …, I, j = 1, …, J, k = 1, …, p and l = 1, …, q, to approach the conditional posterior densities of each of the parameters, where Π(.) denotes the densities corresponding to their prior distributions. $$\begin{aligned} & p(\beta_{jk} \ | \ \beta_{j0},\beta_{j1},\ldots,\beta_{jk-1},\beta_{jk+1},\ldots,\beta_{jp}, \lambda_j,\alpha_1,\ldots,\alpha_I, W_1,\ldots,W_I,Y) \propto \Pi(\beta_{jk})\prod\limits_{1\leq i\leq I} \mathrm{L}(\theta_{ij})\\ &p(\lambda_{jl} \ | \ \lambda_{j1},\ldots,\lambda_{jl-1},\lambda_{jl+1},\ldots,\lambda_{jq}, \beta_j,\alpha_1,\ldots,\alpha_I, W_1,\ldots,W_I,Y) \propto \Pi(\lambda_{jl}) \prod\limits_{1\leq i \leq I}\mathrm{L}(\theta_{ij})\\ &p(W_{il} \ | \ W_{i1},\ldots,W_{il-1},W_{il+1},\ldots,W_{iq},\alpha_i,\beta_1,\ldots,\beta_J,\lambda_1,\ldots, \lambda_J,Y) \propto \Pi(W_{il}) \prod\limits_{1\leq j\leq J}\mathrm{L}(\theta_{ij})\\ &p(\alpha_i \ | \ W_i,\beta_1,\ldots,\beta_J,\lambda_1,\ldots, \lambda_J,V_{\alpha},Y) \propto \Pi(\alpha_i \ | \ V_{\alpha}) \prod\limits_{1\leq j\leq J}\mathrm{L}(\theta_{ij}) \end{aligned}$$
The algorithm implemented in the jSDM_binomial_logit() function, based on the articles of Rosenthal (2009) and Roberts & Rosenthal (2001), to estimate the parameters of the logit model is the following:
Definition of the constants NGibbs, Nburn, Nthin and Ropt, where NGibbs corresponds to the number of iterations performed by the algorithm, Nburn to the number of iterations required for the burn-in (or warm-up) time, and Nthin to the thinning interval, so that $N_{samp}= \dfrac{N_{Gibbs}-N_{burn}}{N_{thin}}$ corresponds to the number of estimated values retained for each parameter. Indeed, we record the estimated parameters only at certain iterations in order to obtain Nsamp values, allowing us to represent a posterior distribution for each parameter.
We set Ropt, the optimal acceptance rate used in the adaptive Metropolis algorithms implemented for each parameter of the model.
Initialize all parameters, to 0 for example, except the diagonal values of Λ, which are initialized at 1, and Vα(0) = 1. The acceptance number of each parameter is initialized to 0 and the variances of their conditional instrumental densities take the value 1.
Gibbs sampler: at each iteration t, for t = 1, …, NGibbs, we repeat each of these steps:
Generate the random site effects αi(t) for i = 1, …, I according to an adaptive Metropolis algorithm that simulates αi⋆ ∼ 𝒩(αi(t − 1), σαi2) and then calculates the acceptance probability as follows:
$$\gamma =\min\left(1, \ \dfrac{\Pi\left(\alpha_i^\star \ | \ V_{\alpha}^{(t-1)}\right)\prod\limits_{1\leq j\leq J}f\left(\alpha_i^\star, W_i^{(t-1)},\beta_j^{(t-1)}, \lambda_j^{(t-1)}, X_i,y_{ij},n_i\right)}{\Pi\left(\alpha_i^{(t-1)} \ | \ V_{\alpha}^{(t-1)}\right)\prod\limits_{1\leq j\leq J}f\left(\alpha_i^{(t-1)}, W_i^{(t-1)},\beta_j^{(t-1)}, \lambda_j^{(t-1)}, X_i,y_{ij},n_i\right)}\right).$$
Generate the variance of random site effects Vα(t) according to: $$V_\alpha^{(t)} \ | \ \alpha_1^{(t)},\ldots,\alpha_I^{(t)} \sim \mathcal{IG}\left( \text{shape}=0.5 + \frac{I}{2}, \text{rate}=0.005 + \frac{1}{2}\sum\limits_{i=1}^I \left(\alpha_i^{(t)}\right)^2\right)$$
Generate the latent variables (or unmeasured predictors) Wil(t) for i = 1, …, I and l = 1, …, q according to an adaptive Metropolis algorithm that simulates Wil⋆ ∼ 𝒩(Wil(t − 1), σWil2) and then calculates the acceptance probability as follows:
$$\gamma = \min\left(1,\ \dfrac{\Pi\left(W_{il}^\star\right)\prod\limits_{1\leq j\leq J}f\left(W_{il}^\star, \alpha_i^{(t)},\beta_j^{(t-1)}, \lambda_j^{(t-1)},X_i,y_{ij},n_i\right)} {\Pi\left(W_{il}^{(t-1)}\right)\prod\limits_{1\leq j\leq J}f\left(W_{il}^{(t-1)}, \alpha_i^{(t)},\beta_j^{(t-1)}, \lambda_j^{(t-1)}, X_i,y_{ij},n_i\right)}\right).$$
If species traits data are provided, generate the effects of species-specific traits on species’ responses γ(t) = (γrk(t))k = 0, …, pr = 0, …, nt such that γrk(t) | β1k(t − 1), …, βJk(t − 1) ∼ 𝒩(m⋆, V⋆), with $$m^\star = \left(V_{\gamma_{rk}}^{-1} + T_r'T_r\right)^{-1}\left(V_{\gamma_{rk}}^{-1}\mu_{\gamma_{rk}} + T_r'\left(\beta_k^{(t-1)} - \sum\limits_{r' \neq r} T_{r'} \gamma_{r'k}^{(t-1)} \right)\right) \text{ and } V^\star = \left(V_{\gamma_{rk}}^{-1}+ T_r'T_r\right)^{-1}.$$
Generate the fixed species effects βjk(t) for j = 1, …, J and k = 0, …, p according to an adaptive Metropolis algorithm that simulates βjk⋆ ∼ 𝒩(βjk(t − 1), σβjk2) and then calculates the acceptance probability as follows:
$$\gamma = \min\left(1,\dfrac{\Pi\left(\beta_{jk}^\star\right)\prod\limits_{1\leq i\leq I}f\left(\beta_{j0}^{(t)},\small{\ldots},\beta_{jk-1}^{(t)},\beta_{jk}^\star,\beta_{jk+1}^{(t-1)},\small{\ldots}, \beta_{jp}^{(t-1)},\lambda_j^{(t-1)}, \alpha_1^{(t)},W_1^{(t)},\small{\ldots},\alpha_I^{(t)}, W_I^{(t)},X_i,y_{ij},n_i\right)} {\Pi\left(\beta_{jk}^{(t-1)}\right)\prod\limits_{1\leq i\leq I}f\left(\beta_{j0}^{(t)},\small{\ldots},\beta_{jk-1}^{(t)},\beta_{jk}^{(t-1)},\beta_{jk+1}^{(t-1)},\small{\ldots}, \beta_{jp}^{(t-1)},\lambda_j^{(t-1)}, \alpha_1^{(t)},W_1^{(t)}, \small{\ldots},\alpha_I^{(t)}, W_I^{(t)},X_i,y_{ij},n_i\right)}\right).$$
According to the article by Hui (2016), we can use the Poisson distribution for the analysis of multivariate abundance data, with estimation performed using Bayesian Markov chain Monte Carlo methods.
In this case, it is assumed that yij ∼ 𝒫oisson(θij), with log(θij) = αi + Xiβj + Wiλj.
We therefore consider abundance data with a response variable denoted Y = (yij)j = 1, …, nspi = 1, …, nsite such that:
$$y_{ij}=\begin{cases} 0 & \text{if species $j$ has been observed as absent at site $i$}\\ n & \text{if $n$ individuals of the species $j$ have been observed at the site $i$}. \end{cases}$$
In this case, we cannot use the properties of conjugate priors; therefore, the parameters of this model will be sampled by estimating their conditional posterior distributions using an adaptive Metropolis algorithm within the Gibbs sampler, in the same way as for the logit model.
We use the same algorithm as before, replacing the logit link function by a log link function and the binomial distribution by a Poisson distribution to calculate the likelihood of the model in the jSDM_poisson_log() function.
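The corresponding likelihood term, sketched in R, only swaps the inverse link and the distribution (with invented argument values again):

```r
# Sketch: likelihood term of the Poisson model with a log link.
f_pois <- function(lambda_j, beta_j, alpha_i, W_i, X_i, y_ij) {
  theta_ij <- exp(alpha_i + sum(X_i * beta_j) + sum(W_i * lambda_j))
  dpois(y_ij, lambda = theta_ij)
}
f_pois(lambda_j = c(1, 0.5), beta_j = c(0, 1), alpha_i = 0.2,
       W_i = c(-0.3, 0.8), X_i = c(1, 0.4), y_ij = 4)
```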