The Direct Sampling Spatial Prior (DSSP) is based on the thin-plate splines solution to the smoothing problem of minimising the penalised sum of squares $$ S_{\eta}(f) = \frac{1}{n}\sum^{n}_{i}W_i(y_i - f(\mathbf{x}_i))^2 +\eta J_m(f) $$ which can be written as minν (y − ν)′W(y − ν) + ην′Mν yielding the solution $$ \hat{\mathbf{\nu}}= \left(\mathbf{W}+\eta\mathbf{M}\right)^{-1}\mathbf{y}. $$ Note that the matrix M is based on thin-plate spline basis functions subject to boundary conditions. The details are omitted here for clarity and to focus on the results and implementation in package.
If we assume that the observed data are from a Gaussian distribution
y ∼ N(ν, δW−1)
and specify the prior for ν $$
\left[\mathbf{\nu}\mid\eta,\delta\right]\propto\
\frac{\eta}{\delta}^{-{r}/2}
\exp\left(-\frac{\eta}{2\delta}\mathbf{\nu}'\mathbf{M}\mathbf{\nu}\right),
$$ the resulting conditional posterior of ν is
$$
[\mathbf{\nu}|\eta,\delta,\mathbf{y}]\propto\exp\left\{
-\frac{1}{2\delta}\left[
(\mathbf{y}-\mathbf{\nu})'\mathbf{W}({\mathbf{y}}-\mathbf{\nu})
-\eta\mathbf{\nu}'\mathbf{M}\mathbf{\nu}\right]\right\}.
$$ This can be written as π(ν|η, δ, y) ∼ N(a, B)
where $$
\begin{aligned}
\operatorname{E}(\mathbf{\nu}|\eta,\delta,\mathbf{y})&=\mathbf{a}\\
&=\left(\mathbf{W}+\eta\mathbf{M}\right)^{-1}\mathbf{y}\\
\operatorname{Cov}(\mathbf{\nu})&=\mathbf{B}\\
&=\delta\left(\mathbf{W}+\eta\mathbf{M}\right)^{-1}.
\end{aligned}
$$ Note that the posterior mean with the same solution as the
penalised least squares.
The complete model is specified with a Gaussian likelihood, the
improper prior for ν, an
inverse gamma prior for δ, and
a prior for η. With this
specification the joint posterior is written π(ν, δ, η|y) ∝ f(y|ν, δ)π(ν|η, δ)π(δ)π(η).
It is possible to derive the set of conditional posterior distributions
$$
\pi\left(\mathbf{\nu}|\delta,\eta,\mathbf{y}\right)\\
\pi\left(\delta|\eta,\mathbf{y}\right)\\
\pi\left(\eta|\mathbf{y}\right)
$$ which can be sampled directly in sequence to create a draw
from the joint posterior π(ν, δ0, η|y).
This is the heart of what the function DSSP()
does1.
This short example demonstrates the use of the functions
DSSP()
and predict()
to analyse spatial
data.
The example here uses the Meuse River data set from the package
{sp}
. First start by loading the library and data set
meuse.all
.
Next, set the coordinates for the data and extract the transform the dataset into a spatial dataframe by specifying the coordinates.
meuse.train <- meuse.all[1:155, ]
meuse.valid <- meuse.all[156:164, ]
coordinates(meuse.train) <- ~ x + y
coordinates(meuse.valid) <- ~ x + y
Next, we run DSSP()
.
The function DSSP()
takes as its argument:
formula
: the model formula. This first example fits an
intercept only model for log(zinc)
.data
: the data.frame
that contains the
data specified in the formula. This is either a spatial
data.frame
or a data.frame
with columns for
it’s coordinates that are specified in the coords
argument.N
: the number of samples desiredpars
: the prior shape and rate parameters for the
inverse-gamma prior distribution on δ (the variance parameter for the
Gaussian likelihood).log_prior
: a function evaluating the log of the prior
density of η.coords
: spatial coordinates. Only required if the
data
argument is not a spatial data.frame.meuse.fit <- DSSP(
formula = log(zinc) ~ 1, data = meuse.train, N = 10000,
pars = c(0.001, 0.001), log_prior = function(x) -2 * log(1 + x)
)
Check the observed vs. fitted values.
meuse.train$Yhat <- rowMeans(exp(predict(meuse.fit)))
ggplot(as.data.frame(meuse.train), aes(Yhat, zinc)) +
geom_point(size = 3) +
geom_abline() +
labs(
x = "Smoothed Values", y = "Observed Values",
title = "Smoothed vs. Observed Values"
)
Check the posterior for the model parameters: η and δ.
ggplot(data.frame(x = meuse.fit$eta)) +
geom_density(aes(x = x)) +
labs(
x = expression(eta), y = "posterior density",
title = expression("Posterior Density of " * eta)
)
ggplot(data.frame(x = meuse.fit$delta)) +
geom_density(aes(x = x)) +
labs(
x = expression(delta), y = "posterior density",
title = expression("Posterior Density of " * delta)
)
Check the autocorrelation function (ACF) plots.
eta_acf <- acf(meuse.fit$eta, plot = FALSE)
eta_acfdf <- with(eta_acf, data.frame(lag, acf))
ggplot(data = eta_acfdf, mapping = aes(x = lag, y = acf)) +
geom_hline(aes(yintercept = 0)) +
geom_segment(mapping = aes(xend = lag, yend = 0)) +
labs(
x = "Lag", y = "ACF",
title = expression("ACF for Samples from Posterior of " * eta)
) +
theme(plot.title = element_text(hjust = 0.5))
delta_acf <- acf(meuse.fit$delta, plot = FALSE)
delta_acfdf <- with(delta_acf, data.frame(lag, acf))
ggplot(data = delta_acfdf, mapping = aes(x = lag, y = acf)) +
geom_hline(aes(yintercept = 0)) +
geom_segment(mapping = aes(xend = lag, yend = 0)) +
labs(
x = "Lag", y = "ACF",
title = expression("ACF for Samples from Posterior of " * delta)
) +
theme(plot.title = element_text(hjust = 0.5))
eta_cummean_df <- data.frame(
x = 1:length(meuse.fit$eta),
y = cumsum(meuse.fit$eta) / (1:length(meuse.fit$eta))
)
ggplot(eta_cummean_df, aes(x = x, y = y)) +
geom_line() +
labs(
x = "sample", y = expression(eta),
title = bquote(atop("Cumulative Mean of Samples", "from Posterior of" ~ eta))
) +
theme(plot.title = element_text(hjust = 0.5))
delta_cummean_df <- data.frame(
x = 1:length(meuse.fit$delta),
y = cumsum(meuse.fit$delta) / (1:length(meuse.fit$delta))
)
ggplot(delta_cummean_df, aes(x = x, y = y)) +
geom_line() +
labs(
x = "sample", y = expression(delta),
title = bquote(atop("Cumulative Mean of Samples", "from Posterior of" ~ delta))
) +
theme(plot.title = element_text(hjust = 0.5))
Check predictions against true values in validation data.
Ypred_mat <- exp(predict(meuse.fit, meuse.valid))
meuse.valid$Ypred <- rowMeans(Ypred_mat)
min_value <- min(c(meuse.valid$Ypred, meuse.valid$zinc))
max_value <- max(c(meuse.valid$Ypred, meuse.valid$zinc))
ggplot(as.data.frame(meuse.valid), aes(x = Ypred, y = zinc)) +
geom_point(size = 3) +
geom_abline() +
labs(
x = "Predicted Values", y = "True Values",
title = "Predicted vs. True Values"
) +
xlim(min_value, max_value) +
ylim(min_value, max_value) +
theme(plot.title = element_text(hjust = 0.5))
ggplot(stack(as.data.frame(t(Ypred_mat)))) +
geom_boxplot(aes(x = ind, y = values)) +
geom_point(
data = data.frame(Y.true = meuse.valid$zinc),
aes(x = 1:9, y = Y.true),
shape = 4, size = 3
) +
labs(
x = "", y = "Y",
title = bquote(atop("Boxplot of Predicted Values of", "Y and True Values (X)"))
) +
theme(plot.title = element_text(hjust = 0.5))
You can specify a formula with covariates in DSSP()
.
summary()
will return a summary of the covariates as well
as the model parameters, η and
δ.
meuse.fit.covariates <- DSSP(
formula = log(zinc) ~ log(lead) + lime, data = meuse.train, N = 10000,
pars = c(0.001, 0.001), log_prior = function(x) -2 * log(1 + x)
)
summary(meuse.fit.covariates)
#> Formula: log(zinc) ~ log(lead) + lime
#> Number of observations: 155
#> Number of iterations: 10000
#>
#> Summary of model:
#> Estimate Est.Error l-95% CI u-95% CI ESS
#> eta 306.38 966.93 3.89 1706.99 10000.00
#> delta 0.11 0.01 0.09 0.14 10000.00
#>
#> Summary of covariates:
#> Estimate Est.Error min q0.025 q0.25 q0.50 q0.75 q0.975 max
#> (Intercept) 1.25 0.21 0.28 0.85 1.11 1.24 1.38 1.69 2.46
#> log(lead) 0.96 0.04 0.80 0.89 0.94 0.96 0.99 1.04 1.11
#> lime 0.24 0.05 0.05 0.13 0.20 0.24 0.27 0.34 0.44
plot()
can be used to get several plots from the
model.
plot(meuse.fit.covariates)
#> `geom_smooth()` using formula = 'y ~ x'
#> Warning in get_plot_component(plot, "guide-box"): Multiple components found;
#> returning the first one. To return all, use `return_all = TRUE`.
The posterior density of the covariates
(covariates_posterior
) can be accessed directly from the
dsspMod
object.
ggplot(
data.frame(log_lead = meuse.fit.covariates$covariates_posterior["log(lead)", ]),
aes(x = log_lead)
) +
geom_density() +
labs(
title = "Posterior density of log(lead)",
y = "posterior density", x = "log(lead)"
)
ggplot(
data.frame(lime = meuse.fit.covariates$covariates_posterior["lime", ]),
aes(x = lime)
) +
geom_density() +
labs(
title = "Posterior density of lime",
y = "posterior density"
)
Note that in both examples diagnostic plots are not necessary as sample are drawn directly from the posterior. The plots are included here for clarity and illustration, which may be useful in some circumstances.
G. White, D. Sun, P. Speckman (2019) <arXiv:1906.05575>↩︎