--- title: "MGPs for univariate spatial gridded data" author: "M Peruzzi" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{MGPs for univariate spatial gridded data} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", out.width = '40%', fig.align="center", message=FALSE ) ``` This is the simplest case for Meshed GPs. We start by generating some data ```{r} library(magrittr) library(dplyr) library(ggplot2) library(meshed) set.seed(2021) SS <- 50 # coord values for jth dimension dd <- 2 # spatial dimension n <- SS^2 # number of locations p <- 3 # number of covariates xlocs <- seq(0, 1, length.out=SS) coords <- expand.grid(list(xlocs, xlocs)) sigmasq <- 1.5 nu <- 0.5 phi <- 10 tausq <- .1 # covariance at coordinates CC <- sigmasq * exp(-phi * as.matrix(dist(coords))) # cholesky of covariance matrix LC <- t(chol(CC)) # spatial latent effects are a GP w <- LC %*% rnorm(n) # measurement errors eps <- tausq^.5 * rnorm(n) # covariates and coefficients X <- matrix(rnorm(n*p), ncol=p) Beta <- matrix(rnorm(p), ncol=1) # univariate outcome, fully observed y_full <- X %*% Beta + w + eps # now introduce some NA values in the outcomes y <- y_full %>% matrix(ncol=1) y[sample(1:n, n/5, replace=FALSE), ] <- NA simdata <- coords %>% cbind(data.frame(Outcome_full= y_full, Outcome_obs = y, w = w)) simdata %>% ggplot(aes(Var1, Var2, fill=w)) + geom_raster() + scale_fill_viridis_c() + theme_minimal() + ggtitle("w: Latent GP") simdata %>% ggplot(aes(Var1, Var2, fill=y)) + geom_raster() + scale_fill_viridis_c() + theme_minimal() + ggtitle("y: Observed outcomes") ``` We now fit the following spatial regression model $$ y = X \beta + w + \epsilon, $$ where $w \sim MGP$ are spatial random effects, and $\epsilon \sim N(0, \tau^2)$. For spatial data, an exponential covariance function is used by default: $C(h) = \sigma^2 \exp( -\phi h )$ where $h$ is the spatial distance. The prior for the spatial decay $\phi$ is $U[l,u]$ and the values of $l$ and $u$ must be specified. We choose $l=1$, $u=30$ for this dataset.^[`spmeshed` implements a model which can be interpreted as assigning $\sigma^2$ a folded-t via parameter expansion if `settings$ps=TRUE`, or an inverse Gamma with parameters $a=2$ and $b=1$ if `settings$ps=FALSE`, which cannot be changed at this time. $\tau^2$ is assigned an exponential prior.] Setting `verbose` tells `spmeshed` how many intermediate messages to output while running MCMC. We set up MCMC to run for 3000 iterations, of which we discard the first third. We use the argument `block_size=25` to specify the coarseness of domain partitioning. In this case, the same result can be achieved by setting `axis_partition=c(10, 10)`. Finally, we note that `NA` values for the outcome are OK since the full dataset is on a grid. `spmeshed` will figure this out and use settings optimized for partly observed lattices, and automatically predict the outcomes at missing locations. On the other hand, `X` values are assumed to not be missing. ```{r} mcmc_keep <- 200 # too small! this is just a vignette. mcmc_burn <- 400 mcmc_thin <- 2 mesh_total_time <- system.time({ meshout <- spmeshed(y, X, coords, #axis_partition=c(10,10), #same as block_size=25 block_size = 25, n_samples = mcmc_keep, n_burn = mcmc_burn, n_thin = mcmc_thin, n_threads = 2, verbose = 0, prior=list(phi=c(1,30)) )}) ``` We can now do some postprocessing of the results. We extract posterior marginal summaries for $\sigma^2$, $\phi$, $\tau^2$, and $\beta_2$. The model that `spmeshed` targets is a slight reparametrization of the above:^[At its core, `spmeshed` implements the spatial factor model $Y(s) = X(s)\beta + \Lambda v(s) + \epsilon(s)$ where $w(s) = \Lambda v(s)$ is modeled via linear coregionalization.] $$ y = X \beta + \lambda w + \epsilon, $$ where $w\sim MGP$ has unitary variance. This model is equivalent to the previous one and in fact we find $\sigma^2=\lambda^2$. ```{r} summary(meshout$lambda_mcmc[1,1,]^2) summary(meshout$theta_mcmc[1,1,]) summary(meshout$tausq_mcmc[1,]) summary(meshout$beta_mcmc[1,1,]) ``` We proceed to plot predictions across the domain along with the recovered latent effects. ```{r} # process means wmesh <- data.frame(w_mgp = meshout$w_mcmc %>% summary_list_mean()) # predictions ymesh <- data.frame(y_mgp = meshout$yhat_mcmc %>% summary_list_mean()) outdf <- meshout$coordsdata %>% cbind(wmesh, ymesh) %>% left_join(simdata) outdf %>% ggplot(aes(Var1, Var2, fill=w_mgp)) + geom_raster() + scale_fill_viridis_c() + theme_minimal() + ggtitle("w: recovered") outdf %>% ggplot(aes(Var1, Var2, fill=y_mgp)) + geom_raster() + scale_fill_viridis_c() + theme_minimal() + ggtitle("y: predictions") ```