Title: | Modeling Spatially Varying Coefficients |
---|---|
Description: | Implements a maximum likelihood estimation (MLE) method for estimation and prediction of Gaussian process-based spatially varying coefficient (SVC) models (Dambon et al. (2021a) <doi:10.1016/j.spasta.2020.100470>). Covariance tapering (Furrer et al. (2006) <doi:10.1198/106186006X132178>) can be applied such that the method scales to large data. Further, it implements a joint variable selection of the fixed and random effects (Dambon et al. (2021b) <doi:10.1080/13658816.2022.2097684>). The package and its capabilities are described in (Dambon et al. (2021c) <arXiv:2106.02364>). |
Authors: | Jakob A. Dambon [aut, cre] , Fabio Sigrist [ctb] , Reinhard Furrer [ctb] |
Maintainer: | Jakob A. Dambon <[email protected]> |
License: | GPL-2 |
Version: | 0.3.4 |
Built: | 2024-12-10 06:39:42 UTC |
Source: | CRAN |
Ensures that the covariance parameters define a positive definite covariance
matrix. It takes the vector
and checks if
all
, all
, and
.
check_cov_lower(cv, q)
check_cov_lower(cv, q)
cv |
( |
q |
( |
logical(1)
with TRUE
if all conditions above are
fulfilled.
# first one is true, all other are false check_cov_lower(c(0.1, 0, 0.2, 1, 0.2), q = 2) check_cov_lower(c(0 , 0, 0.2, 1, 0.2), q = 2) check_cov_lower(c(0.1, 0, 0.2, 1, 0 ), q = 2) check_cov_lower(c(0.1, 0, 0.2, -1, 0 ), q = 2)
# first one is true, all other are false check_cov_lower(c(0.1, 0, 0.2, 1, 0.2), q = 2) check_cov_lower(c(0 , 0, 0.2, 1, 0.2), q = 2) check_cov_lower(c(0.1, 0, 0.2, 1, 0 ), q = 2) check_cov_lower(c(0.1, 0, 0.2, -1, 0 ), q = 2)
Method to extract the mean effects from an SVC_mle
or SVC_selection
object.
## S3 method for class 'SVC_mle' coef(object, ...) ## S3 method for class 'SVC_selection' coef(object, ...)
## S3 method for class 'SVC_mle' coef(object, ...) ## S3 method for class 'SVC_selection' coef(object, ...)
object |
|
... |
further arguments |
named vector with mean effects, i.e. from
SVC_mle
Jakob Dambon
Function to extract the covariance parameters from an
SVC_mle
or SVC_selection
object.
cov_par(...) ## S3 method for class 'SVC_mle' cov_par(object, ...) ## S3 method for class 'SVC_selection' cov_par(object, ...)
cov_par(...) ## S3 method for class 'SVC_mle' cov_par(object, ...) ## S3 method for class 'SVC_selection' cov_par(object, ...)
... |
further arguments |
object |
|
vector with covariance parameters with the following attributes:
"GRF"
, charachter, describing the covariance function used for
the GP, see SVC_mle_control
.
"tapering"
, either NULL
if no tapering is applied of
the taper range.
Jakob Dambon
Method to extract the fitted values from an SVC_mle
object. This is only possible if save.fitted
was set to TRUE
in the control of the function call
## S3 method for class 'SVC_mle' fitted(object, ...)
## S3 method for class 'SVC_mle' fitted(object, ...)
object |
|
... |
further arguments |
Data frame, fitted values to given data, i.e., the SVC as well as the response and their locations
Jakob Dambon
Computes the GLS estimate using the formula:
The computation is done depending on the input class of the Cholesky factor
R
. It relies on the classical solve
or on
using forwardsolve
and backsolve
functions of package
spam
, see solve
. This is much faster than
computing the inverse of , especially since we have to compute
the Cholesky decomposition of
either way.
GLS_chol(R, X, y) ## S3 method for class 'spam.chol.NgPeyton' GLS_chol(R, X, y) ## S3 method for class 'matrix' GLS_chol(R, X, y)
GLS_chol(R, X, y) ## S3 method for class 'spam.chol.NgPeyton' GLS_chol(R, X, y) ## S3 method for class 'matrix' GLS_chol(R, X, y)
R |
( |
X |
( |
y |
( |
A numeric(p)
vector, i.e., the mean effects.
Jakob Dambon
# generate data n <- 10 X <- cbind(1, 20+1:n) y <- rnorm(n) A <- matrix(runif(n^2)*2-1, ncol=n) Sigma <- t(A) %*% A # two possibilities ## using standard Cholesky decomposition R_mat <- chol(Sigma); str(R_mat) mu_mat <- GLS_chol(R_mat, X, y) ## using spam R_spam <- chol(spam::as.spam(Sigma)); str(R_spam) mu_spam <- GLS_chol(R_spam, X, y) # should be identical to the following mu <- solve(crossprod(X, solve(Sigma, X))) %*% crossprod(X, solve(Sigma, y)) ## check abs(mu - mu_mat) abs(mu - mu_spam)
# generate data n <- 10 X <- cbind(1, 20+1:n) y <- rnorm(n) A <- matrix(runif(n^2)*2-1, ncol=n) Sigma <- t(A) %*% A # two possibilities ## using standard Cholesky decomposition R_mat <- chol(Sigma); str(R_mat) mu_mat <- GLS_chol(R_mat, X, y) ## using spam R_spam <- chol(spam::as.spam(Sigma)); str(R_spam) mu_spam <- GLS_chol(R_spam, X, y) # should be identical to the following mu <- solve(crossprod(X, solve(Sigma, X))) %*% crossprod(X, solve(Sigma, y)) ## check abs(mu - mu_mat) abs(mu - mu_spam)
A dataset containing the prices and other attributes of 25,357 houses in
Lucas County, Ohio. The selling dates span years 1993 to 1998. Data taken
from house
(spData
package) and slightly modified to a data.frame
.
house
house
A data frame with 25357 rows and 25 variables:
(integer
) selling price, in US dollars
(integer
) year the house was built
(factor
) levels are "one", "bilevel",
"multilvl", "one+half", "two", "two+half", "three"
(integer
) total living area, in square feet.
(factor
) levels are "stucdrvt", "ccbtile",
"metlvnyl", "brick", "stone", "wood", "partbrk"
(integer
) number of corresponding
rooms / facilities.
dimensions of the lot. Unit is feet.
(factor
) levels are "no garage", "basement",
"attached", "detached", "carport"
(integer
) garage area, in square feet. If
garage == "no garage"
, then garagesqft == 0
.
(integer
) number of rooms
(integer
) area of lot, in square feet
(Date
) selling date, in format yyyy-mm-dd
(int
) appraised value
(int
) dummies for
selling year.
(factor
) levels are selling years "1993", "1994",
"1995", "1996", "1997", "1998"
(numeric
) location of houses. Longitude and
Latitude are given in CRS(+init=epsg:2834)
, the Ohio North State
Plane. Units are meters.
http://www.spatial-econometrics.com/html/jplv6.zip
Methods to calculate information criteria for
SVC_mle
objects. Currently, two are supported: the conditional
Akaike's Information Criteria
and the Bayesian Information Criteria
.
Note that the Akaike's Information Criteria is of the corrected form, that
is:
is the effective degrees of freedom which is derived as the
trace of the hat matrices and df is the degree of freedoms with respect to
mean parameters.
## S3 method for class 'SVC_mle' BIC(object, ...) ## S3 method for class 'SVC_mle' AIC(object, conditional = "BW", ...)
## S3 method for class 'SVC_mle' BIC(object, ...) ## S3 method for class 'SVC_mle' AIC(object, conditional = "BW", ...)
object |
|
... |
further arguments |
conditional |
string. If |
numeric, value of information criteria
Jakob Dambon
Sets bounds and initial values for optim
by
extracting potentially given values from SVC_mle_control
and
checking them, or calculating them from given data. See Details.
init_bounds_optim(control, p, q, id_obj, med_dist, y_var, OLS_mu)
init_bounds_optim(control, p, q, id_obj, med_dist, y_var, OLS_mu)
control |
( |
p |
( |
q |
( |
id_obj |
( |
med_dist |
( |
y_var |
( |
OLS_mu |
( |
If values are not provided, then they are set in the following way.
Let be the median distance
med_dist
, let be
the variance of the response
y_var
, and let be the OLS
coefficients of the linear model. The computed values are given in the
table below.
Parameter | Lower bound | Initial Value | Upper Bound |
Range | |
|
|
Variance | |
|
|
Nugget | |
|
|
Mean |
-Inf |
|
Inf |
A list
with three entries: lower
, init
,
and upper
.
Jakob Dambon
Method to extract the computed (penalized) log (profile) Likelihood from an SVC_mle
object.
## S3 method for class 'SVC_mle' logLik(object, ...)
## S3 method for class 'SVC_mle' logLik(object, ...)
object |
|
... |
further arguments |
an object of class logLik
with attributes
"penalized"
, logical, if the likelihood (FALSE
) or some penalized likelihood (TRUE
) was optimized.
"profileLik"
, logical, if the optimization was done using the profile likelihood (TRUE
) or not.
"nobs"
, integer of number of observations
"df"
, integer of how many parameters were estimated. Note: This includes only the covariance parameters if the profile likelihood was used.
Jakob Dambon
Function to extract the number of unique locations in the data
set used in an MLE of the SVC_mle
object.
nlocs(object)
nlocs(object)
object |
|
integer with the number of unique locations
Jakob Dambon
Method to extract the number of observations used in MLE for an SVC_mle
object.
## S3 method for class 'SVC_mle' nobs(object, ...)
## S3 method for class 'SVC_mle' nobs(object, ...)
object |
|
... |
further arguments |
an integer of number of observations
Jakob Dambon
SVC_mle
modelMethod to plot the residuals from an SVC_mle
object. For this, save.fitted
has to be TRUE
in
SVC_mle_control
.
## S3 method for class 'SVC_mle' plot(x, which = 1:2, ...)
## S3 method for class 'SVC_mle' plot(x, which = 1:2, ...)
x |
( |
which |
( |
... |
further arguments |
a maximum 2 plots
Tukey-Anscombe plot, i.e. residuals vs. fitted
QQ-plot
Jakob Dambon
#' ## ---- toy example ---- ## sample data # setting seed for reproducibility set.seed(123) m <- 7 # number of observations n <- m*m # number of SVC p <- 3 # sample data y <- rnorm(n) X <- matrix(rnorm(n*p), ncol = p) # locations on a regular m-by-m-grid locs <- expand.grid(seq(0, 1, length.out = m), seq(0, 1, length.out = m)) ## preparing for maximum likelihood estimation (MLE) # controls specific to MLE control <- SVC_mle_control( # initial values of optimization init = rep(0.1, 2*p+1), # using profile likelihood profileLik = TRUE ) # controls specific to optimization procedure, see help(optim) opt.control <- list( # number of iterations (set to one for demonstration sake) maxit = 1, # tracing information trace = 6 ) ## starting MLE fit <- SVC_mle(y = y, X = X, locs = locs, control = control, optim.control = opt.control) ## output: convergence code equal to 1, since maxit was only 1 summary(fit) ## plot residuals # only QQ-plot plot(fit, which = 2) # two plots next to each other oldpar <- par(mfrow = c(1, 2)) plot(fit) par(oldpar)
#' ## ---- toy example ---- ## sample data # setting seed for reproducibility set.seed(123) m <- 7 # number of observations n <- m*m # number of SVC p <- 3 # sample data y <- rnorm(n) X <- matrix(rnorm(n*p), ncol = p) # locations on a regular m-by-m-grid locs <- expand.grid(seq(0, 1, length.out = m), seq(0, 1, length.out = m)) ## preparing for maximum likelihood estimation (MLE) # controls specific to MLE control <- SVC_mle_control( # initial values of optimization init = rep(0.1, 2*p+1), # using profile likelihood profileLik = TRUE ) # controls specific to optimization procedure, see help(optim) opt.control <- list( # number of iterations (set to one for demonstration sake) maxit = 1, # tracing information trace = 6 ) ## starting MLE fit <- SVC_mle(y = y, X = X, locs = locs, control = control, optim.control = opt.control) ## output: convergence code equal to 1, since maxit was only 1 summary(fit) ## plot residuals # only QQ-plot plot(fit, which = 2) # two plots next to each other oldpar <- par(mfrow = c(1, 2)) plot(fit) par(oldpar)
Prediction of SVCs (and response variable)
## S3 method for class 'SVC_mle' predict( object, newlocs = NULL, newX = NULL, newW = NULL, newdata = NULL, compute.y.var = FALSE, ... )
## S3 method for class 'SVC_mle' predict( object, newlocs = NULL, newX = NULL, newW = NULL, newdata = NULL, compute.y.var = FALSE, ... )
object |
( |
newlocs |
( |
newX |
( |
newW |
( |
newdata |
( |
compute.y.var |
( |
... |
further arguments |
The function returns a data frame of n.new
rows and with
columns
SVC_1, ..., SVC_p
: the predicted SVC at locations newlocs
.
y.pred
, if newX
and newW
are provided
y.var
, if newX
and newW
are provided and
compute.y.var
is set to TRUE
.
loc_x, loc_y
, the locations of the predictions
Jakob Dambon
Dambon, J. A., Sigrist, F., Furrer, R. (2021) Maximum likelihood estimation of spatially varying coefficient models for large data with an application to real estate price prediction, Spatial Statistics doi:10.1016/j.spasta.2020.100470
## ---- toy example ---- ## We use the sampled, i.e., one dimensional SVCs str(SVCdata) # sub-sample data to have feasible run time for example set.seed(123) id <- sample(length(SVCdata$locs), 50) ## SVC_mle call with matrix arguments fit_mat <- with(SVCdata, SVC_mle( y[id], X[id, ], locs[id], control = SVC_mle_control(profileLik = TRUE, cov.name = "mat32"))) ## SVC_mle call with formula df <- with(SVCdata, data.frame(y = y[id], X = X[id, -1])) fit_form <- SVC_mle( y ~ X, data = df, locs = SVCdata$locs[id], control = SVC_mle_control(profileLik = TRUE, cov.name = "mat32") ) ## prediction # predicting SVCs predict(fit_mat, newlocs = 1:2) predict(fit_form, newlocs = 1:2) # predicting SVCs and response providing new covariates predict( fit_mat, newX = matrix(c(1, 1, 3, 4), ncol = 2), newW = matrix(c(1, 1, 3, 4), ncol = 2), newlocs = 1:2 ) predict(fit_form, newdata = data.frame(X = 3:4), newlocs = 1:2)
## ---- toy example ---- ## We use the sampled, i.e., one dimensional SVCs str(SVCdata) # sub-sample data to have feasible run time for example set.seed(123) id <- sample(length(SVCdata$locs), 50) ## SVC_mle call with matrix arguments fit_mat <- with(SVCdata, SVC_mle( y[id], X[id, ], locs[id], control = SVC_mle_control(profileLik = TRUE, cov.name = "mat32"))) ## SVC_mle call with formula df <- with(SVCdata, data.frame(y = y[id], X = X[id, -1])) fit_form <- SVC_mle( y ~ X, data = df, locs = SVCdata$locs[id], control = SVC_mle_control(profileLik = TRUE, cov.name = "mat32") ) ## prediction # predicting SVCs predict(fit_mat, newlocs = 1:2) predict(fit_form, newlocs = 1:2) # predicting SVCs and response providing new covariates predict( fit_mat, newX = matrix(c(1, 1, 3, 4), ncol = 2), newW = matrix(c(1, 1, 3, 4), ncol = 2), newlocs = 1:2 ) predict(fit_form, newdata = data.frame(X = 3:4), newlocs = 1:2)
summary.SVC_mle
Printing Method for summary.SVC_mle
## S3 method for class 'summary.SVC_mle' print(x, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'summary.SVC_mle' print(x, digits = max(3L, getOption("digits") - 3L), ...)
x |
|
digits |
the number of significant digits to use when printing. |
... |
further arguments |
The printed output of the summary in the console.
SVC_mle
Method to print an SVC_mle
object.
## S3 method for class 'SVC_mle' print(x, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'SVC_mle' print(x, digits = max(3L, getOption("digits") - 3L), ...)
x |
|
digits |
( |
... |
further arguments |
Jakob Dambon
Method to extract the residuals from an SVC_mle
object. This is only possible if save.fitted
was set to TRUE
.
## S3 method for class 'SVC_mle' residuals(object, ...)
## S3 method for class 'SVC_mle' residuals(object, ...)
object |
|
... |
further arguments |
(numeric(n)
)
Residuals of model
Jakob Dambon
Samples SVC data at given locations. The SVCs parameters and the covariance function have to be provided. The sampled model matrix can be provided or it is sampled. The SVCs are sampled according to their given parametrization and at respective observation locations. The error vector is sampled from a nugget effect. Finally, the response vector is computed. Please note that the function is not optimized for sampling large data sets.
sample_SVCdata( df.pars, nugget.sd, locs, cov.name = c("exp", "sph", "mat32", "mat52", "wend1", "wend2"), X = NULL )
sample_SVCdata( df.pars, nugget.sd, locs, cov.name = c("exp", "sph", "mat32", "mat52", "wend1", "wend2"), X = NULL )
df.pars |
( |
nugget.sd |
( |
locs |
( |
cov.name |
( |
X |
( |
The parameters of the model can be chosen such that we obtain data
from a not full model, i.e., not all covariates are associated with a
fixed and a random effect. Using var = 0
for instance yields a
constant beta coefficient for respective covariate. Note that in that
case the scale
value is neglected.
list
Returns a list with the response y
, model matrix
X
, a matrix beta
containing the sampled SVC at given
locations, a vector eps
containing the error, and a matrix
locs
containing the original locations. The true_pars
contains the data frame of covariance parameters that were used to
sample the GP-based SVCs. The nugget variance has been added to the
original argument of the function with its respective variance, but
NA
for "mean"
and "scale"
.
set.seed(123) # SVC parameters (df.pars <- data.frame( var = c(2, 1), scale = c(3, 1), mean = c(1, 2))) # nugget standard deviation tau <- 0.5 # sample locations s <- sort(runif(500, min = 0, max = 10)) SVCdata <- sample_SVCdata( df.pars = df.pars, nugget.sd = tau, locs = s, cov.name = "mat32" )
set.seed(123) # SVC parameters (df.pars <- data.frame( var = c(2, 1), scale = c(3, 1), mean = c(1, 2))) # nugget standard deviation tau <- 0.5 # sample locations s <- sort(runif(500, min = 0, max = 10)) SVCdata <- sample_SVCdata( df.pars = df.pars, nugget.sd = tau, locs = s, cov.name = "mat32" )
SVC_mle
Method to construct a summary.SVC_mle
object out of a
SVC_mle
object.
## S3 method for class 'SVC_mle' summary(object, ...)
## S3 method for class 'SVC_mle' summary(object, ...)
object |
|
... |
further arguments |
object of class summary.SVC_mle
with summarized values of the MLE.
Jakob Dambon
Conducts a maximum likelihood estimation (MLE) for a Gaussian process-based spatially varying coefficient model as described in Dambon et al. (2021) doi:10.1016/j.spasta.2020.100470.
SVC_mle(...) ## Default S3 method: SVC_mle(y, X, locs, W = NULL, control = NULL, optim.control = list(), ...) ## S3 method for class 'formula' SVC_mle( formula, data, RE_formula = NULL, locs, control = NULL, optim.control = list(), ... )
SVC_mle(...) ## Default S3 method: SVC_mle(y, X, locs, W = NULL, control = NULL, optim.control = list(), ...) ## S3 method for class 'formula' SVC_mle( formula, data, RE_formula = NULL, locs, control = NULL, optim.control = list(), ... )
... |
further arguments |
y |
( |
X |
( |
locs |
( |
W |
( |
control |
( |
optim.control |
( |
formula |
Formula describing the fixed effects in SVC model. The response,
i.e. LHS of the formula, is not allowed to have functions such as |
data |
data frame containing the observations |
RE_formula |
Formula describing the random effects in SVC model.
Only RHS is considered. If |
The GP-based SVC model is defined with some abuse of notation as:
where:
is the response (vector of length
)
is the data matrix for the fixed effects covariates. The
dimensions are
times
. This leads to
fixed effects.
is the vector containing the fixed effects
W is the data matrix for the SVCs modeled by GPs. The dimensions are
times
. This lead to
SVCs in the model.
are the SVCs represented by a GP.
is the nugget effect
The MLE is an numeric optimization that runs optim
or
(if parallelized) optimParallel
.
You can call the function in two ways. Either, you define the model matrices
yourself and provide them using the arguments X
and W
. As usual,
the individual columns correspond to the fixed and random effects, i.e., the
Gaussian processes, respectively. The second way is to call the function with
formulas, like you would in lm
. From the data.frame
provided in argument data
, the respective model matrices as described
above are implicitly built. Using simple arguments formula
and
RE_formula
with data
column names, we can decide which
covariate is modeled with a fixed or random effect (SVC).
Note that similar to model matrix call from above, if the RE_formula
is not provided, we use the one as in argument formula
. Further, note
that the intercept is implicitly constructed in the model matrix if not
prohibited.
Object of class SVC_mle
if control$extract_fun = FALSE
,
meaning that a MLE has been conducted. Otherwise, if control$extract_fun = TRUE
,
the function returns a list with two entries:
obj_fun
: the objective function used in the optimization
args
: the arguments to evaluate the objective function.
For further details, see description of SVC_mle_control
.
Jakob Dambon
Dambon, J. A., Sigrist, F., Furrer, R. (2021) Maximum likelihood estimation of spatially varying coefficient models for large data with an application to real estate price prediction, Spatial Statistics doi:10.1016/j.spasta.2020.100470
## ---- toy example ---- ## We use the sampled, i.e., one dimensional SVCs str(SVCdata) # sub-sample data to have feasible run time for example set.seed(123) id <- sample(length(SVCdata$locs), 50) ## SVC_mle call with matrix arguments fit <- with(SVCdata, SVC_mle( y[id], X[id, ], locs[id], control = SVC_mle_control(profileLik = TRUE, cov.name = "mat32"))) ## SVC_mle call with formula df <- with(SVCdata, data.frame(y = y[id], X = X[id, -1])) fit <- SVC_mle( y ~ X, data = df, locs = SVCdata$locs[id], control = SVC_mle_control(profileLik = TRUE, cov.name = "mat32") ) class(fit) summary(fit) ## ---- real data example ---- require(sp) ## get data set data("meuse", package = "sp") # construct data matrix and response, scale locations y <- log(meuse$cadmium) X <- model.matrix(~1+dist+lime+elev, data = meuse) locs <- as.matrix(meuse[, 1:2])/1000 ## starting MLE # the next call takes a couple of seconds fit <- SVC_mle( y = y, X = X, locs = locs, # has 4 fixed effects, but only 3 random effects (SVC) # elev is missing in SVC W = X[, 1:3], control = SVC_mle_control( # inital values for 3 SVC # 7 = (3 * 2 covariance parameters + nugget) init = c(rep(c(0.4, 0.2), 3), 0.2), profileLik = TRUE ) ) ## summary and residual output summary(fit) plot(fit) ## predict # new locations newlocs <- expand.grid( x = seq(min(locs[, 1]), max(locs[, 1]), length.out = 30), y = seq(min(locs[, 2]), max(locs[, 2]), length.out = 30)) # predict SVC for new locations SVC <- predict(fit, newlocs = as.matrix(newlocs)) # visualization sp.SVC <- SVC coordinates(sp.SVC) <- ~loc_1+loc_2 spplot(sp.SVC, colorkey = TRUE)
## ---- toy example ---- ## We use the sampled, i.e., one dimensional SVCs str(SVCdata) # sub-sample data to have feasible run time for example set.seed(123) id <- sample(length(SVCdata$locs), 50) ## SVC_mle call with matrix arguments fit <- with(SVCdata, SVC_mle( y[id], X[id, ], locs[id], control = SVC_mle_control(profileLik = TRUE, cov.name = "mat32"))) ## SVC_mle call with formula df <- with(SVCdata, data.frame(y = y[id], X = X[id, -1])) fit <- SVC_mle( y ~ X, data = df, locs = SVCdata$locs[id], control = SVC_mle_control(profileLik = TRUE, cov.name = "mat32") ) class(fit) summary(fit) ## ---- real data example ---- require(sp) ## get data set data("meuse", package = "sp") # construct data matrix and response, scale locations y <- log(meuse$cadmium) X <- model.matrix(~1+dist+lime+elev, data = meuse) locs <- as.matrix(meuse[, 1:2])/1000 ## starting MLE # the next call takes a couple of seconds fit <- SVC_mle( y = y, X = X, locs = locs, # has 4 fixed effects, but only 3 random effects (SVC) # elev is missing in SVC W = X[, 1:3], control = SVC_mle_control( # inital values for 3 SVC # 7 = (3 * 2 covariance parameters + nugget) init = c(rep(c(0.4, 0.2), 3), 0.2), profileLik = TRUE ) ) ## summary and residual output summary(fit) plot(fit) ## predict # new locations newlocs <- expand.grid( x = seq(min(locs[, 1]), max(locs[, 1]), length.out = 30), y = seq(min(locs[, 2]), max(locs[, 2]), length.out = 30)) # predict SVC for new locations SVC <- predict(fit, newlocs = as.matrix(newlocs)) # visualization sp.SVC <- SVC coordinates(sp.SVC) <- ~loc_1+loc_2 spplot(sp.SVC, colorkey = TRUE)
SVC_mle
Function to set up control parameters for SVC_mle
.
In the following, we assume the GP-based SVC model to have GPs which
model the SVCs and
fixed effects.
SVC_mle_control(...) ## Default S3 method: SVC_mle_control( cov.name = c("exp", "sph", "mat32", "mat52", "wend1", "wend2"), tapering = NULL, parallel = NULL, init = NULL, lower = NULL, upper = NULL, save.fitted = TRUE, profileLik = FALSE, mean.est = c("GLS", "OLS"), pc.prior = NULL, extract_fun = FALSE, hessian = TRUE, dist = list(method = "euclidean"), parscale = TRUE, ... ) ## S3 method for class 'SVC_mle' SVC_mle_control(object, ...)
SVC_mle_control(...) ## Default S3 method: SVC_mle_control( cov.name = c("exp", "sph", "mat32", "mat52", "wend1", "wend2"), tapering = NULL, parallel = NULL, init = NULL, lower = NULL, upper = NULL, save.fitted = TRUE, profileLik = FALSE, mean.est = c("GLS", "OLS"), pc.prior = NULL, extract_fun = FALSE, hessian = TRUE, dist = list(method = "euclidean"), parscale = TRUE, ... ) ## S3 method for class 'SVC_mle' SVC_mle_control(object, ...)
... |
Further Arguments yet to be implemented |
cov.name |
( |
tapering |
( |
parallel |
( |
init |
( |
lower |
( |
upper |
( |
save.fitted |
( |
profileLik |
( |
mean.est |
( |
pc.prior |
( |
extract_fun |
( |
hessian |
( |
dist |
( |
parscale |
( |
object |
( |
If not provided, the initial values as well as the lower and upper
bounds are calculated given the provided data. In particular, we require
the median distance between observations, the variance of the response and,
the ordinary least square (OLS) estimates, see init_bounds_optim
.
The argument extract_fun
is useful, when one wants to modify
the objective function. Further, when trying to parallelize the
optimization, it is useful to check whether a single evaluation of the
objective function takes longer than 0.05 seconds to evaluate,
cf. Gerber and Furrer (2019) doi:10.32614/RJ-2019-030. Platform specific
issues can be sorted out by the user by setting up their own optimization.
A list with which SVC_mle
can be controlled.
Jakob Dambon
control <- SVC_mle_control(init = rep(0.3, 10)) # or control <- SVC_mle_control() control$init <- rep(0.3, 10) # Code for setting up parallel computing require(parallel) # exchange number of nodes (1) for detectCores()-1 or appropriate number cl <- makeCluster(1, setup_strategy = "sequential") clusterEvalQ( cl = cl, { library(spam) library(varycoef) }) # use this list for parallel argument in SVC_mle_control parallel.control <- list(cl = cl, forward = TRUE, loginfo = TRUE) # SVC_mle goes here ... # DO NOT FORGET TO STOP THE CLUSTER! stopCluster(cl); rm(cl)
control <- SVC_mle_control(init = rep(0.3, 10)) # or control <- SVC_mle_control() control$init <- rep(0.3, 10) # Code for setting up parallel computing require(parallel) # exchange number of nodes (1) for detectCores()-1 or appropriate number cl <- makeCluster(1, setup_strategy = "sequential") clusterEvalQ( cl = cl, { library(spam) library(varycoef) }) # use this list for parallel argument in SVC_mle_control parallel.control <- list(cl = cl, forward = TRUE, loginfo = TRUE) # SVC_mle goes here ... # DO NOT FORGET TO STOP THE CLUSTER! stopCluster(cl); rm(cl)
This function implements the variable selection for Gaussian process-based SVC models using a penalized maximum likelihood estimation (PMLE, Dambon et al., 2021, <arXiv:2101.01932>). It jointly selects the fixed and random effects of GP-based SVC models.
SVC_selection(obj.fun, mle.par, control = NULL, ...)
SVC_selection(obj.fun, mle.par, control = NULL, ...)
obj.fun |
( |
mle.par |
( |
control |
( |
... |
Further arguments. |
Returns an object of class SVC_selection
. It contains parameter estimates under PMLE and the optimization as well as choice of the shrinkage parameters.
Jakob Dambon
Dambon, J. A., Sigrist, F., Furrer, R. (2021). Joint Variable Selection of both Fixed and Random Effects for Gaussian Process-based Spatially Varying Coefficient Models, ArXiv Preprint https://arxiv.org/abs/2101.01932
Function to set up control parameters for
SVC_selection
. The underlying Gaussian Process-based
SVC model is defined in SVC_mle
. SVC_selection
then jointly selects fixed and random effects of the GP-based
SVC model using a penalized maximum likelihood estimation (PMLE).
In this function, one can set the parameters for the PMLE and
its optimization procedures (Dambon et al., 2022).
SVC_selection_control( IC.type = c("BIC", "cAIC_BW", "cAIC_VB"), method = c("grid", "MBO"), r.lambda = c(1e-10, 10), n.lambda = 10L, n.init = 10L, n.iter = 10L, CD.conv = list(N = 20L, delta = 1e-06, logLik = TRUE), hessian = FALSE, adaptive = FALSE, parallel = NULL, optim.args = list() )
SVC_selection_control( IC.type = c("BIC", "cAIC_BW", "cAIC_VB"), method = c("grid", "MBO"), r.lambda = c(1e-10, 10), n.lambda = 10L, n.init = 10L, n.iter = 10L, CD.conv = list(N = 20L, delta = 1e-06, logLik = TRUE), hessian = FALSE, adaptive = FALSE, parallel = NULL, optim.args = list() )
IC.type |
( |
method |
( |
r.lambda |
( |
n.lambda |
( |
n.init |
( |
n.iter |
( |
CD.conv |
( |
hessian |
( |
adaptive |
( |
parallel |
( |
optim.args |
( |
A list of control parameters for SVC selection.
Jakob Dambon
Bischl, B., Richter, J., Bossek, J., Horn, D., Thomas, J., Lang, M. (2017). mlrMBO: A Modular Framework for Model-Based Optimization of Expensive Black-Box Functions, ArXiv preprint https://arxiv.org/abs/1703.03373
Dambon, J. A., Sigrist, F., Furrer, R. (2022). Joint Variable Selection of both Fixed and Random Effects for Gaussian Process-based Spatially Varying Coefficient Models, International Journal of Geographical Information Science doi:10.1080/13658816.2022.2097684
# Initializing parameters and switching logLik to FALSE selection_control <- SVC_selection_control( CD.conv = list(N = 20L, delta = 1e-06, logLik = FALSE) ) # or selection_control <- SVC_selection_control() selection_control$CD.conv$logLik <- FALSE
# Initializing parameters and switching logLik to FALSE selection_control <- SVC_selection_control( CD.conv = list(N = 20L, delta = 1e-06, logLik = FALSE) ) # or selection_control <- SVC_selection_control() selection_control$CD.conv$logLik <- FALSE
A list object that contains sampled data of 500 observations. The data has
been sampled using the RandomFields
package (Schlather et al., 2015).
It is given in the list object SVCdata
which contains the following.
SVCdata
SVCdata
A list
with the following entries:
(numeric
) Response
(numeric
) Covariates; first columns contains ones to model
an intercept, the second column contains standard-normal sampled data.
(numeric
) The sampled Gaussian processes, which are
usually unobserved. It uses a Matern covariance function and the true
parameters are given in the entry 'true_pars'.
(numeric
) Error (or Nugget effect), i.e., drawn from a
zero-mean normal distribution with 0.5 standard deviation.
(numeric
) Locations sampled from a uniform distribution
on the interval 0 to 10.
(data.frame
) True parameters of the GP-based SVC
model with Gaussian process mean, variance, and range. Additionally, the
smoothness (nu) is given.
Schlather, M., Malinowski, A., Menck, P. J., Oesting, M., Strokorb, K. (2015) Analysis, simulation and prediction of multivariate random fields with package RandomFields, Journal of Statistical Software, doi:10.18637/jss.v063.i08
This package offers functions to estimate and predict Gaussian process-based spatially varying coefficient (SVC) models. Briefly described, one generalizes a linear regression equation such that the coefficients are no longer constant, but have the possibility to vary spatially. This is enabled by modeling the coefficients using Gaussian processes with (currently) either an exponential or spherical covariance function. The advantages of such SVC models are that they are usually quite easy to interpret, yet they offer a very high level of flexibility.
The ensemble of the function SVC_mle
and the method
predict
estimates the defined SVC model and gives predictions of the
SVC as well as the response for some pre-defined locations. This concept
should be rather familiar as it is the same for the classical regression
(lm
) or local polynomial regression (loess
),
to name a couple. As the name suggests, we are using a maximum
likelihood estimation (MLE) approach in order to estimate the model. The
predictor is obtained by the empirical best linear unbiased predictor.
to give location-specific predictions. A detailed tutorial with examples is
given in a vignette; call vignette("example", package = "varycoef")
.
We also refer to the original article Dambon et al. (2021) which lays the
methodological foundation of this package.
With the before mentioned SVC_mle
function one gets an object
of class SVC_mle
. And like the method predict
for
predictions, there are several more methods in order to diagnose the model,
see methods(class = "SVC_mle")
.
As of version 0.3.0 of varycoef
, a joint variable selection of both
fixed and random effect of the Gaussian process-based SVC model is
implemented. It uses a penalized maximum likelihood estimation (PMLE)
which is implemented via a gradient descent. The estimation of the shrinkage
parameter is available using a model-based optimization (MBO). Here,
we use the framework by Bischl et al. (2017). The methodological foundation
of the PMLE is described in Dambon et al. (2022).
Jakob Dambon
Bischl, B., Richter, J., Bossek, J., Horn, D., Thomas, J., Lang, M. (2017). mlrMBO: A Modular Framework for Model-Based Optimization of Expensive Black-Box Functions, ArXiv preprint https://arxiv.org/abs/1703.03373
Dambon, J. A., Sigrist, F., Furrer, R. (2021). Maximum likelihood estimation of spatially varying coefficient models for large data with an application to real estate price prediction, Spatial Statistics 41 100470 doi:10.1016/j.spasta.2020.100470
Dambon, J. A., Sigrist, F., Furrer, R. (2022). Joint Variable Selection of both Fixed and Random Effects for Gaussian Process-based Spatially Varying Coefficient Models, International Journal of Geographical Information Science doi:10.1080/13658816.2022.2097684
vignette("manual", package = "varycoef") methods(class = "SVC_mle")
vignette("manual", package = "varycoef") methods(class = "SVC_mle")