In this tutorial, we will use the lynx
dataset as an
example, which can be accessed directly from R. Let’s load the dataset
and visualize it:
data <- data.frame(year = seq(1821, 1934, by = 1), y = as.numeric(lynx))
data$x <- data$year - min(data$year)
plot(lynx)
Based on a visual examination of the dataset, we can observe an obvious 10-year quasi-periodic behavior in the lynx count with evolving amplitudes over time. Therefore, we will consider fitting the following sGP model:
$$ \begin{aligned} y_i|\lambda_i &\sim \text{Poisson}(\lambda_i) ,\\ \log(\lambda_i) &= \eta_i = \beta_0 + g(x_i) + \xi_i,\\ g &\sim \text{sGP} \bigg(a = \frac{2\pi}{10}, \sigma\bigg),\\ \xi_i &\sim N(0,\sigma_\xi). \end{aligned} $$
Here, each yi represents the lynx count, xi represents the number of years since 1821, and ξi is an observation-level random intercept to account for overdispersion effect.
To specify the priors for the sGP boundary conditions and the intercept parameter, we assume independent normal priors with mean 0 and variance 1000. For the overdispersion parameter σξ, we assign an exponential prior with P(σξ > 1) = 0.01.
To determine the prior for the standard deviation parameter σ of the sGP, we employ the concept
of predictive standard deviation (PSD). We start with an exponential
prior on the 50 years PSD: P(σ(50) > 1) = 0.01. To
convert this prior to the original standard deviation parameter σ, we use the
compute_d_step_sGPsd
function from the sGPfit
package:
To fit the sGP model with BayesGP, specify
model = "sGP"
, and then specify the frequency parameter
a
and the number of sB spline basis used to approximate
k
.
mod <- model_fit(formula = y ~ f(x = year,
model = "sGP",
a = 2*pi/10, k = 30,
sd.prior = list(prior = "exp", param = prior_SD, h = 2)) +
f(x = x,
model = "IID",
sd.prior = list(prior = "exp", param = 0.5)),
family = "Poisson",
data = data,
method = "aghq")
The fitted sGP model is presented below:
summary(mod)
#> Here are some posterior/prior summaries for the parameters:
#> name median q0.025 q0.975 prior prior:P1 prior:P2
#> 1 intercept 6.689 6.518 6.853 Normal 0.000 1e+03
#> 2 year (PSD) 0.461 0.364 0.581 Exponential 0.126 1e-02
#> 3 x (SD) 0.262 0.203 0.342 Exponential 0.500 5e-01
#> For Normal prior, P1 is its mean and P2 is its variance.
#> For Exponential prior, prior is specified as P(theta > P1) = P2.
We can similarly use predict
to evaluate the effect at
different locations:
pred_sGP <- predict(mod, variable = "year", newdata = data.frame(year = seq(min(data$year), max(data$year), by = 0.1)))
plot(mean ~ year, data = pred_sGP, type = 'l', col = 'black')
lines(q0.025 ~ year, data = pred_sGP, lty = 'dashed', col = 'grey')
lines(q0.975 ~ year, data = pred_sGP, lty = 'dashed', col = 'grey')
mod <- model_fit(formula = y ~ f(x = year,
model = "sGP",
a = 2*pi/10, k = 30,
sd.prior = list(prior = "exp", param = prior_SD, h = 2),
boundary.prior = list(prec = c(Inf, Inf))) +
f(x = x,
model = "IID",
sd.prior = list(prior = "exp", param = 0.5)),
family = "Poisson",
data = data,
method = "aghq")
par(oldpar)