| Title: | Spatio-Temporal Estimation and Prediction for Censored/Missing Responses |
|---|---|
| Description: | It estimates the parameters of spatio-temporal models with censored or missing data using the SAEM algorithm (Delyon et al., 1999). This algorithm is a stochastic approximation of the widely used EM algorithm and is particularly valuable for models in which the E-step lacks a closed-form expression. It also provides a function to compute the observed information matrix using the method developed by Louis (1982). To assess the performance of the fitted model, case-deletion diagnostics are provided. |
| Authors: | Larissa A. Matos [aut, cre] (ORCID: <https://orcid.org/0000-0002-2635-0901>), Katherine L. Valeriano [aut] (ORCID: <https://orcid.org/0000-0001-6388-4753>), Victor H. Lachos [ctb] (ORCID: <https://orcid.org/0000-0002-7239-2459>) |
| Maintainer: | Larissa A. Matos <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 1.2.0 |
| Built: | 2026-05-10 07:07:05 UTC |
| Source: | https://github.com/cran/StempCens |
It estimates the parameters of spatio-temporal models with censored or missing data using the SAEM algorithm (Delyon et al., 1999). This algorithm is a stochastic approximation of the widely used EM algorithm and is particularly valuable for models in which the E-step lacks a closed-form expression. It also provides a function to compute the observed information matrix using the method developed by Louis (1982). To assess the performance of the fitted model, case-deletion diagnostics are provided.
The functions provided are:
- CovarianceM: computes the spatio-temporal covariance matrix for balanced data.
- EffectiveRange: computes the effective range for an isotropic spatial correlation function.
- EstStempCens: returns the maximum likelihood estimates of the unknown parameters.
- PredStempCens: performs spatio-temporal prediction in a set of new S spatial locations for fixed time points.
- CrossStempCens: performs cross-validation, which measure the performance of the predictive model on new test dataset.
- DiagStempCens: returns measures and graphics for diagnostic analysis.
Larissa A. Matos (ORCID), Katherine L. Valeriano (ORCID) and Victor H. Lachos (ORCID)
Maintainer: Larissa A. Matos ([email protected]).
Cook R (1977). “Detection of influential observation in linear regression.” Technometrics, 19(1), 15–18. doi:10.1080/00401706.1977.10489493.
Delyon B, Lavielle M, Moulines E (1999). “Convergence of a stochastic approximation version of the EM algorithm.” Annals of Statistics, 27(1), 94–128. doi:10.1214/aos/1018031103.
Louis T (1982). “Finding the observed information matrix when using the EM algorithm.” Journal of the Royal Statistical Society: Series B (Methodological), 44(2), 226–233. doi:10.1111/j.2517-6161.1982.tb01203.x.
Zhu H, Lee S, Wei B, Zhou J (2001). “Case-deletion measures for models with incomplete data.” Biometrika, 88(3), 727–737. doi:10.1093/biomet/88.3.727.
It computes the spatio-temporal covariance matrix for balanced data, i.e., when we have the same temporal indexes per location. To compute the spatial correlation it provides 5 functions: exponential, gaussian, matern, spherical and power exponential. To compute the temporal correlation is used an autocorrelation function of an AR(1) process.
CovarianceM(phi, rho, tau2, sigma2, distSpa, disTemp, kappa, type.S)CovarianceM(phi, rho, tau2, sigma2, distSpa, disTemp, kappa, type.S)
phi |
value of the spatial scaling parameter. |
rho |
value of the time scaling parameter. |
tau2 |
value of the the nugget effect parameter. |
sigma2 |
value of the partial sill. |
distSpa |
|
disTemp |
|
kappa |
parameter for all spatial covariance functions. In the case of exponential, gaussian and spherical function |
type.S |
type of spatial correlation function: ' |
The function returns the spatio-temporal covariance matrix for balanced data.
Katherine L. Valeriano, Victor H. Lachos and Larissa A. Matos
set.seed(1000) # Parameter values phi <- 5 rho <- 0.45 tau2 <- 0.80 sigma2 <- 2 # Coordinates and time points coords <- matrix(runif(20, 0, 10), ncol=2) # Cartesian coordinates without repetitions time <- as.matrix(1:5) # Time index without repetitions Ms <- as.matrix(dist(coords)) # Spatial distances Mt <- as.matrix(dist(time)) # Temporal distances # Covariance matrix Cov <- CovarianceM(phi, rho, tau2, sigma2, distSpa=Ms, disTemp=Mt, kappa=0, type.S="exponential")set.seed(1000) # Parameter values phi <- 5 rho <- 0.45 tau2 <- 0.80 sigma2 <- 2 # Coordinates and time points coords <- matrix(runif(20, 0, 10), ncol=2) # Cartesian coordinates without repetitions time <- as.matrix(1:5) # Time index without repetitions Ms <- as.matrix(dist(coords)) # Spatial distances Mt <- as.matrix(dist(time)) # Temporal distances # Covariance matrix Cov <- CovarianceM(phi, rho, tau2, sigma2, distSpa=Ms, disTemp=Mt, kappa=0, type.S="exponential")
This function performs cross-validation in spatio-temporal model with censored/missing responses, which measure the performance of the predictive model on new test dataset. The cross-validation method for assessing the model performance is validation set approach (or data split).
CrossStempCens(Pred.StempCens, yObs.pre)CrossStempCens(Pred.StempCens, yObs.pre)
Pred.StempCens |
an object of class |
yObs.pre |
a vector of the observed responses, the test data. |
Bias |
bias prediction error. |
Mspe |
mean squared prediction error. |
Rmspe |
root mean squared prediction error. |
Mae |
mean absolute error. |
Katherine L. Valeriano, Victor H. Lachos and Larissa A. Matos
set.seed(400) # Parameter values beta <- c(-1,1.50) phi <- 5 rho <- 0.6 tau2 <- 0.80 sigma2 <- 2 # Coordinates and covariates coords <- matrix(round(runif(14, 0, 10), 9), ncol=2) # Coordinates without repetitions time <- as.matrix(seq(1, 5)) # Time index without repetitions x <- cbind(rexp(35, 2), rnorm(35, 2, 1)) # Data data <- rnStempCens(x, time, coords, beta, phi, rho, tau2, sigma2, type.S="gaussian", kappa=0, cens="left", pcens=0.2) # Splitting the dataset train <- data[1:32,] test <- data[33:35,] # Estimation x <- cbind(train$x1, train$x2) cc <- train$ci est_teste <- EstStempCens(y=train$yObs, x, cc=train$ci, time=train$time, coord=train[, 1:2], LI=train$lcl, LS=train$ucl, init.phi=3.5, init.rho=0.5, init.tau2=1, type.Data="unbalanced", method="nlminb", kappa=0, type.S="gaussian", IMatrix=TRUE, M=20, perc=0.25, MaxIter=300, pc=0.20) # Prediction xPre <- cbind(test$x1, test$x2) pre_teste <- PredStempCens(est_teste, test[,1:2], test$time, xPre) class(pre_teste) # Cross-validation cross_teste <- CrossStempCens(pre_teste, test$yObs) cross_teste$Mspe # MSPEset.seed(400) # Parameter values beta <- c(-1,1.50) phi <- 5 rho <- 0.6 tau2 <- 0.80 sigma2 <- 2 # Coordinates and covariates coords <- matrix(round(runif(14, 0, 10), 9), ncol=2) # Coordinates without repetitions time <- as.matrix(seq(1, 5)) # Time index without repetitions x <- cbind(rexp(35, 2), rnorm(35, 2, 1)) # Data data <- rnStempCens(x, time, coords, beta, phi, rho, tau2, sigma2, type.S="gaussian", kappa=0, cens="left", pcens=0.2) # Splitting the dataset train <- data[1:32,] test <- data[33:35,] # Estimation x <- cbind(train$x1, train$x2) cc <- train$ci est_teste <- EstStempCens(y=train$yObs, x, cc=train$ci, time=train$time, coord=train[, 1:2], LI=train$lcl, LS=train$ucl, init.phi=3.5, init.rho=0.5, init.tau2=1, type.Data="unbalanced", method="nlminb", kappa=0, type.S="gaussian", IMatrix=TRUE, M=20, perc=0.25, MaxIter=300, pc=0.20) # Prediction xPre <- cbind(test$x1, test$x2) pre_teste <- PredStempCens(est_teste, test[,1:2], test$time, xPre) class(pre_teste) # Cross-validation cross_teste <- CrossStempCens(pre_teste, test$yObs) cross_teste$Mspe # MSPE
Return measures and graphics for diagnostic analysis in spatio-temporal model with censored/missing responses.
DiagStempCens(Est.StempCens, type.diag = "individual", diag.plot = TRUE, ck)DiagStempCens(Est.StempCens, type.diag = "individual", diag.plot = TRUE, ck)
Est.StempCens |
an object of class |
type.diag |
type of diagnostic: ' |
diag.plot |
|
ck |
the value for |
This function uses the case deletion approach to study the impact of deleting one or more observations from the dataset on the parameters estimates, using the ideas of Cook (1977) and Zhu et al. (2001). The measure is defined by
where is the estimate of using the complete data,
are the estimates obtained after deletion of the i-th observation (or group of observations) and
is the Hessian matrix.
We can eliminate an observation, an entire location or an entire time index.
The function returns a list with the diagnostic measures.
type.diag == individual | time | location:GD is a data.frame with the index value of the observation and the GD measure.
type.diag == all:GDind is a data.frame with the index value of the observation and the GD measure for individual.
GDtime is a data.frame with the time index value and the GD measure for time.
GDloc is a data.frame with the side index value and the GD measure for location.
Katherine L. Valeriano, Victor H. Lachos and Larissa A. Matos
set.seed(12345) # Parameter values beta <- c(-1,1.5) phi <- 3 rho <- 0.40 tau2 <- 1 sigma2 <- 2 # Simulating data coord <- matrix(runif(10, 0, 10), ncol=2) # Cartesian coordinates without repetitions time <- as.matrix(1:5) # Time index without repetitions x <- cbind(rexp(25,2), rnorm(25,2,1)) data <- rnStempCens(x, time, coord, beta, phi, rho, tau2, sigma2, type.S="exponential", cens="right", pcens=0.20) data$yObs[17] <- abs(data$yObs[17]) + 2*sd(data$yObs) # perturbed observation # Estimation est <- EstStempCens(y=data$yObs, x, cc=data$ci, data$time, cbind(data$x.coord,data$y.coord), LI=data$lcl, LS=data$ucl, init.phi=2.5, init.rho=0.5, init.tau2=0.8, type.Data="balanced", method="nlminb", kappa=0, type.S="exponential", IMatrix=TRUE, lower.lim=c(0.01,-0.99,0.01), upper.lim=c(30,0.99,20), M=20, perc=0.25, MaxIter=300, pc=0.20) # Diagnostic set.seed(12345) diag <- DiagStempCens(est, type.diag="time", diag.plot = TRUE, ck=1)set.seed(12345) # Parameter values beta <- c(-1,1.5) phi <- 3 rho <- 0.40 tau2 <- 1 sigma2 <- 2 # Simulating data coord <- matrix(runif(10, 0, 10), ncol=2) # Cartesian coordinates without repetitions time <- as.matrix(1:5) # Time index without repetitions x <- cbind(rexp(25,2), rnorm(25,2,1)) data <- rnStempCens(x, time, coord, beta, phi, rho, tau2, sigma2, type.S="exponential", cens="right", pcens=0.20) data$yObs[17] <- abs(data$yObs[17]) + 2*sd(data$yObs) # perturbed observation # Estimation est <- EstStempCens(y=data$yObs, x, cc=data$ci, data$time, cbind(data$x.coord,data$y.coord), LI=data$lcl, LS=data$ucl, init.phi=2.5, init.rho=0.5, init.tau2=0.8, type.Data="balanced", method="nlminb", kappa=0, type.S="exponential", IMatrix=TRUE, lower.lim=c(0.01,-0.99,0.01), upper.lim=c(30,0.99,20), M=20, perc=0.25, MaxIter=300, pc=0.20) # Diagnostic set.seed(12345) diag <- DiagStempCens(est, type.diag="time", diag.plot = TRUE, ck=1)
It computes the effective range for an isotropic spatial correlation function, which is commonly defined to be the distance from which the correlation becomes small, typically below 0.05.
EffectiveRange(cor = 0.05, phi, kappa = 0, Sp.model = "exponential")EffectiveRange(cor = 0.05, phi, kappa = 0, Sp.model = "exponential")
cor |
effective correlation to check for. By default = 0.05. |
phi |
spatial scaling parameter. |
kappa |
smoothness parameter, required by the matern and power exponential functions. By default = 0. |
Sp.model |
type of spatial correlation function: ' |
The available isotropic spatial correlation functions are:
,
,
,
,
,
where is the Euclidean distance between two observations, is the spatial scaling
parameter, is the gamma function, is the smoothness parameter and
is the modified Bessel function of the second kind of order .
The function returns the effective range, i.e., the approximate distance from which the
spatial correlation is lower than cor.
Katherine L. Valeriano, Victor H. Lachos and Larissa A. Matos
phi <- 164.60 range1 <- EffectiveRange(0.05, phi, kappa=0, Sp.model="exponential") range2 <- EffectiveRange(0.05, phi, kappa=1, Sp.model="pow.exp") # Note that these functions are equivalent.phi <- 164.60 range1 <- EffectiveRange(0.05, phi, kappa=0, Sp.model="exponential") range2 <- EffectiveRange(0.05, phi, kappa=1, Sp.model="pow.exp") # Note that these functions are equivalent.
Return the maximum likelihood estimates of the unknown parameters of spatio-temporal model with censored/missing responses. The estimates are obtained using SAEM algorithm. The function also computes the observed information matrix using the method developed by Louis (1982). The types of censoring considered are left, right, interval or missing values.
EstStempCens(y, x, cc, time, coord, LI, LS, init.phi, init.rho, init.tau2, tau2.fixo = FALSE, type.Data = "balanced", method = "nlminb", kappa = 0, type.S = "exponential", IMatrix = TRUE, lower.lim = c(0.01, -0.99, 0.01), upper.lim = c(30, 0.99, 20), M = 20, perc = 0.25, MaxIter = 300, pc = 0.2, error = 1e-06)EstStempCens(y, x, cc, time, coord, LI, LS, init.phi, init.rho, init.tau2, tau2.fixo = FALSE, type.Data = "balanced", method = "nlminb", kappa = 0, type.S = "exponential", IMatrix = TRUE, lower.lim = c(0.01, -0.99, 0.01), upper.lim = c(30, 0.99, 20), M = 20, perc = 0.25, MaxIter = 300, pc = 0.2, error = 1e-06)
y |
a vector of responses. |
x |
a matrix or vector of covariates. |
cc |
a vector of censoring indicators. For each observation: |
time |
a vector of time. |
coord |
a matrix of coordinates of the spatial locations. |
LI |
lower limit of detection. For each observation: if non-censored/non-missing |
LS |
upper limit of detection. For each observation: if non-censored/non-missing |
init.phi |
initial value of the spatial scaling parameter. |
init.rho |
initial value of the time scaling parameter. |
init.tau2 |
initial value of the the nugget effect parameter. |
tau2.fixo |
|
type.Data |
type of the data: ' |
method |
optimization method used to estimate ( |
kappa |
parameter for all spatial covariance functions. In the case of exponential, gaussian and spherical function |
type.S |
type of spatial correlation function: ' |
IMatrix |
|
lower.lim, upper.lim
|
vectors of lower and upper bounds for the optimization method.
If unspecified, the default is |
M |
number of Monte Carlo samples for stochastic approximation. By default = |
perc |
percentage of burn-in on the Monte Carlo sample. By default = |
MaxIter |
the maximum number of iterations of the SAEM algorithm. By default = |
pc |
percentage of iterations of the SAEM algorithm with no memory. By default = |
error |
the convergence maximum error. By default = |
The spatio-temporal Gaussian model is giving by:
where the deterministic term and the stochastic terms ,
can depend on the observed spatio-temporal indexes for .
We assume is normally distributed with zero-mean and covariance matrix ,
where is the partial sill, is the spatio-temporal correlation matrix,
and are the spatial and time scaling parameters; is an independent and
identically distributed measurement error with , variance
(the nugget effect) and
for all or .
In particular, we define , the mean of the stochastic process as
where are known functions of , and
are unknown parameters to be estimated. Equivalently, in matrix notation, we have the spatio-temporal linear model as follows:
Therefore the spatio-temporal process, , has normal distribution with mean and
variance . We assume that is non-singular
and has full rank.
The estimation process was computed via SAEM algorithm initially proposed by Delyon et al. (1999).
The function returns an object of class Est.StempCens which is a list given by:
m.dataReturns a list with all data components given in input.
m.resultsA list given by:
theta |
final estimation of |
Theta |
estimated parameters in all iterations, |
beta |
estimated |
sigma2 |
estimated |
tau2 |
estimated |
phi |
estimated |
rho |
estimated |
Eff.range |
estimated effective range. |
PsiInv |
estimated |
Cov |
estimated |
SAEMy |
stochastic approximation of the first moment for the truncated normal distribution. |
SAEMyy |
stochastic approximation of the second moment for the truncated normal distribution. |
Hessian |
Hessian matrix, the negative of the conditional expected second derivative matrix given the observed values. |
Louis |
the observed information matrix using the Louis' method. |
loglik |
log likelihood for SAEM method. |
AIC |
Akaike information criteria. |
BIC |
Bayesian information criteria. |
AICcorr |
corrected AIC by the number of parameters. |
iteration |
number of iterations needed to convergence. |
Katherine L. Valeriano, Victor H. Lachos and Larissa A. Matos
set.seed(12345) # Initial parameter values beta <- c(-1,1.50) phi <- 5 rho <- 0.45 tau2 <- 0.80 sigma2 <- 1.5 coord <- matrix(runif(10, 0, 10), ncol=2) time <- matrix(1:5, ncol=1) x <- cbind(rexp(25,2), rnorm(25,2,1)) # Data simulation data <- rnStempCens(x, time, coord, beta, phi, rho, tau2, sigma2, type.S="matern", kappa=1.5, cens="left", pcens=0.20) # Estimation est_teste <- EstStempCens(data$yObs, x, data$ci, data$time, cbind(data$x.coord, data$y.coord), data$lcl, data$ucl, init.phi=3.5, init.rho=0.5, init.tau2=0.7, tau2.fixo=FALSE, kappa=1.5, type.S="matern", IMatrix=TRUE, M=20, perc=0.25, MaxIter=300, pc=0.2)set.seed(12345) # Initial parameter values beta <- c(-1,1.50) phi <- 5 rho <- 0.45 tau2 <- 0.80 sigma2 <- 1.5 coord <- matrix(runif(10, 0, 10), ncol=2) time <- matrix(1:5, ncol=1) x <- cbind(rexp(25,2), rnorm(25,2,1)) # Data simulation data <- rnStempCens(x, time, coord, beta, phi, rho, tau2, sigma2, type.S="matern", kappa=1.5, cens="left", pcens=0.20) # Estimation est_teste <- EstStempCens(data$yObs, x, data$ci, data$time, cbind(data$x.coord, data$y.coord), data$lcl, data$ucl, init.phi=3.5, init.rho=0.5, init.tau2=0.7, tau2.fixo=FALSE, kappa=1.5, type.S="matern", IMatrix=TRUE, M=20, perc=0.25, MaxIter=300, pc=0.2)
This function performs spatio-temporal prediction in a set of new S spatial locations for fixed time points.
PredStempCens(Est.StempCens, locPre, timePre, xPre)PredStempCens(Est.StempCens, locPre, timePre, xPre)
Est.StempCens |
an object of class |
locPre |
a matrix of coordinates for which prediction is performed. |
timePre |
the time point vector for which prediction is performed. |
xPre |
a matrix of covariates for which prediction is performed. |
The function returns an object of class Pred.StempCens which is a list given by:
predValues |
predicted values. |
VarPred |
predicted covariance matrix. |
Katherine L. Valeriano, Victor H. Lachos and Larissa A. Matos
set.seed(12345) # Parameter values beta <- c(-1,1.50) phi <- 5 rho <- 0.60 tau2 <- 0.80 sigma2 <- 2 coord <- matrix(runif(34, 0, 10), ncol=2) time <- matrix(1:5, ncol=1) x <- cbind(rexp(85,2), rnorm(85,2,1)) # Data simulation data <- rnStempCens(x, time, coord, beta, phi, rho, tau2, sigma2, type.S="pow.exp", kappa=0.5, cens="left", pcens=0.10) # Splitting the dataset train <- data[11:85,] test <- data[1:10,] # Estimation x <- cbind(train$x1, train$x2) coord2 <- cbind(train$x.coord, train$y.coord) est_teste <- EstStempCens(train$yObs, x, train$ci, train$time, coord2, train$lcl, train$ucl, init.phi=3.5, init.rho=0.5, init.tau2=1, kappa=0.5, type.S="pow.exp", IMatrix=FALSE, M=20, perc=0.25, MaxIter=300, pc=0.20) # Prediction xPre <- cbind(test$x1, test$x2) pre_test <- PredStempCens(est_teste, test[,1:2], test$time, xPre)set.seed(12345) # Parameter values beta <- c(-1,1.50) phi <- 5 rho <- 0.60 tau2 <- 0.80 sigma2 <- 2 coord <- matrix(runif(34, 0, 10), ncol=2) time <- matrix(1:5, ncol=1) x <- cbind(rexp(85,2), rnorm(85,2,1)) # Data simulation data <- rnStempCens(x, time, coord, beta, phi, rho, tau2, sigma2, type.S="pow.exp", kappa=0.5, cens="left", pcens=0.10) # Splitting the dataset train <- data[11:85,] test <- data[1:10,] # Estimation x <- cbind(train$x1, train$x2) coord2 <- cbind(train$x.coord, train$y.coord) est_teste <- EstStempCens(train$yObs, x, train$ci, train$time, coord2, train$lcl, train$ucl, init.phi=3.5, init.rho=0.5, init.tau2=1, kappa=0.5, type.S="pow.exp", IMatrix=FALSE, M=20, perc=0.25, MaxIter=300, pc=0.20) # Prediction xPre <- cbind(test$x1, test$x2) pre_test <- PredStempCens(est_teste, test[,1:2], test$time, xPre)
It simulates balanced censored spatio-temporal data with a linear structure for an established censoring rate o limit of detection.
rnStempCens(x, time, coords, beta, phi, rho, tau2, sigma2, type.S = "exponential", kappa = 0, cens = "left", pcens = 0.1, lod = NULL)rnStempCens(x, time, coords, beta, phi, rho, tau2, sigma2, type.S = "exponential", kappa = 0, cens = "left", pcens = 0.1, lod = NULL)
x |
design matrix of dimensions |
time |
vector containing the unique time points at which the observations are made, of length |
coords |
2D unique spatial coordinates of dimension |
beta |
linear regression parameters. |
phi |
value of the spatial scaling parameter. |
rho |
value of the time scaling parameter. |
tau2 |
value of the the nugget effect parameter. |
sigma2 |
value of the partial sill. |
type.S |
type of spatial correlation function: ' |
kappa |
parameter for all spatial covariance functions. In the case of exponential, gaussian and spherical function |
cens |
' |
pcens |
desired censoring rate. By default= |
lod |
desired detection limit for censored observations. By default= |
The function returns a data.frame containing the simulated data.
Katherine L. Valeriano, Victor H. Lachos and Larissa A. Matos
set.seed(1000) # Initial parameter values phi <- 5 rho <- 0.45 tau2 <- 0.80 sigma2 <- 2 beta <- c(1, 2.5) x <- cbind(1, rnorm(50)) # Coordinates coords <- matrix(runif(20, 0, 10), ncol=2) # Cartesian coordinates without repetitions time <- 1:5 # Time index without repetitions # Data simulation data <- rnStempCens(x, time, coords, beta, phi, rho, tau2, sigma2, type.S="exponential", cens="left", pcens=0.10)set.seed(1000) # Initial parameter values phi <- 5 rho <- 0.45 tau2 <- 0.80 sigma2 <- 2 beta <- c(1, 2.5) x <- cbind(1, rnorm(50)) # Coordinates coords <- matrix(runif(20, 0, 10), ncol=2) # Cartesian coordinates without repetitions time <- 1:5 # Time index without repetitions # Data simulation data <- rnStempCens(x, time, coords, beta, phi, rho, tau2, sigma2, type.S="exponential", cens="left", pcens=0.10)