Compared to the univariate gridded Gaussian case, we now place the data irregularly and assume we observe counts rather than a Gaussian response.
library(magrittr)
library(dplyr)
library(ggplot2)
library(meshed)
set.seed(2021)
SS <- 30 # coord values for jth dimension
dd <- 2 # spatial dimension
n <- SS^2 # number of locations
p <- 3 # number of covariates
# irregularly spaced
coords <- cbind(runif(n), runif(n))
colnames(coords) <- c("Var1", "Var2")
sigmasq <- 1.5
phi <- 10
# covariance at coordinates
CC <- sigmasq * exp(-phi * as.matrix(dist(coords)))
# cholesky of covariance matrix
LC <- t(chol(CC))
# spatial latent effects are a GP
w <- LC %*% rnorm(n)
# covariates and coefficients
X <- matrix(rnorm(n*p), ncol=p)
Beta <- matrix(rnorm(p), ncol=1)
# univariate outcome, fully observed
y_full <- rpois(n, exp(-3 + X %*% Beta + w))
# now introduce some NA values in the outcomes
y <- y_full %>% matrix(ncol=1)
y[sample(1:n, n/5, replace=FALSE), ] <- NA
simdata <- coords %>%
cbind(data.frame(Outcome_full= y_full,
Outcome_obs = y,
w = w))
simdata %>% ggplot(aes(Var1, Var2, color=w)) +
geom_point() +
scale_color_viridis_c() +
theme_minimal() + ggtitle("w: Latent GP")
simdata %>% ggplot(aes(Var1, Var2, color=y)) +
geom_point() +
scale_color_viridis_c() +
theme_minimal() + ggtitle("y: Observed outcomes")
We now fit the following spatial regression model y Poisson(η), where log(η) = X, and w ∼ MGP are spatial random effects. For spatial data, an exponential covariance function is used by default: C(h) = σ2exp (−ϕh) where h is the spatial distance.
The prior for the spatial decay ϕ is U[l, u] and the values of l and u must be specified. We choose l = 1, u = 30 for this dataset.1
Setting verbose
tells spmeshed
how many
intermediate messages to output while running MCMC. For brevity, we opt
to run a very short chain of MCMC with only 2000 iterations, of which we
discard the first third. Since the data are irregularly spaced, we build
a grid of knots of size 1600 using argument grid_size
,
which will facilitate computations. Then, just like in the gridded case
we use block_size=20
to specify the coarseness of domain
partitioning.
Finally, we note that NA
values for the outcome are OK
since the full dataset is on a grid. spmeshed
will figure
this out and use settings optimized for partly observed lattices, and
automatically predict the outcomes at missing locations. On the other
hand, X
values are assumed to not be missing.
mcmc_keep <- 200 # too small! this is just a vignette.
mcmc_burn <- 400
mcmc_thin <- 2
mesh_total_time <- system.time({
meshout <- spmeshed(y, X, coords,
family="poisson",
grid_size=c(20, 20),
block_size = 20,
n_samples = mcmc_keep, n_burn = mcmc_burn, n_thin = mcmc_thin,
n_threads = 16,
verbose = 5,
prior=list(phi=c(1,30))
)})
#> Bayesian Meshed GP regression model fit via Markov chain Monte Carlo
#> Sending to MCMC.
#> 20.2% elapsed: 514ms (+ 499ms). ETR: 2.23s.
#> theta: Metrop. acceptance 31.00%, average 38.27%
#> theta = 1.637 0.155
#> lambda = 2.254
#>
#> 40.4% elapsed: 1061ms (+ 546ms). ETR: 1.72s.
#> theta: Metrop. acceptance 25.50%, average 30.96%
#> theta = 1.743 0.244
#> lambda = 2.236
#>
#> 60.5% elapsed: 1608ms (+ 546ms). ETR: 1.15s.
#> theta: Metrop. acceptance 24.50%, average 29.13%
#> theta = 3.171 0.907
#> lambda = 2.520
#>
#> 80.6% elapsed: 2215ms (+ 607ms). ETR: 0.59s.
#> theta: Metrop. acceptance 28.00%, average 28.84%
#> theta = 1.876 0.325
#> lambda = 1.763
#>
#> MCMC done [2.789s]
We can now do some postprocessing of the results. We extract
posterior marginal summaries for σ2, ϕ, τ2, and β2. The model that
spmeshed
targets is a slight reparametrization of the
above:2
log(η) = Xβ + λw,
where w ∼ MGP
has unitary variance. This model is equivalent to the previous one and
in fact we find σ2 = λ2.
Naturally, it is much more difficult to estimate parameters when data
are counts.
summary(meshout$lambda_mcmc[1,1,]^2)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 2.465 4.480 6.105 6.376 7.571 15.741
summary(meshout$theta_mcmc[1,1,])
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 1.001 1.248 1.515 1.675 1.988 3.171
summary(meshout$beta_mcmc[1,1,])
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> -0.73118 -0.41000 -0.33049 -0.33305 -0.26083 -0.07672
We proceed to plot predictions across the domain along with the
recovered latent effects. We plot the latent effects at the grid we used
for fitting spmeshed
. Instead, we plot our predictions at
the original data locations. We may see some pattern by plotting the
data on the log scale.
# process means
wmesh <- data.frame(w_mgp = meshout$w_mcmc %>% summary_list_mean())
# predictions
ymesh <- data.frame(y_mgp = meshout$yhat_mcmc %>% summary_list_mean())
outdf <- meshout$coordsdata %>%
cbind(wmesh, ymesh) %>%
left_join(simdata, by = c("Var1", "Var2"))
outdf %>% filter(forced_grid==1) %>%
ggplot(aes(Var1, Var2, fill=w_mgp)) +
geom_raster() +
scale_fill_viridis_c() +
theme_minimal() + ggtitle("w: recovered")
outdf %>% filter(forced_grid==0) %>%
ggplot(aes(Var1, Var2, color=y_mgp)) +
geom_point() +
scale_color_viridis_c() +
theme_minimal() + ggtitle("y: predictions")
spmeshed
implements a model which can be
interpreted as assigning σ2 a folded-t via
parameter expansion if settings$ps=TRUE
, or an inverse
Gamma with parameters a = 2
and b = 1 if
settings$ps=FALSE
, which cannot be changed at this time.
τ2 is assigned an
exponential prior.↩︎
At its core, spmeshed
implements the
spatial factor model Y(s) Poisson(exp(X(s)β + Λv(s)))
where w(s) = Λv(s)
is modeled via linear coregionalization.↩︎