Title: | Conditional Multivariate Reference Regions |
---|---|
Description: | An R package for estimating conditional multivariate reference regions. The reference region is non parametrically estimated using a kernel density estimator. Covariates effects on the multivariate response means vector and variance-covariance matrix, thus on the region shape, are estimated by flexible additive predictors. Continuous covariates non linear effects might be estimated using penalized splines smoothers. Confidence intervals for the covariates estimated effects might be derived from bootstrap resampling. Kernel density bandwidth can be estimated with different methods, including a method that optimize the region coverage. Numerical, and graphical, summaries can be obtained by the user in order to evaluate reference region performance with real data. Full mathematical details can be found in <doi:10.1002/sim.9163> and <doi:10.1007/s00477-020-01901-1>. |
Authors: | Oscar Lado-Baleato [cre, aut], Javier Roca-Pardinas [aut, ctb], Carmen Cadarso-Suarez [aut, ths], Gude Francisco [aut, ths] |
Maintainer: | Oscar Lado-Baleato <[email protected]> |
License: | GPL-2 |
Version: | 0.1.1 |
Built: | 2024-11-14 06:48:45 UTC |
Source: | CRAN |
This function implements Alternating Conditional Expectation algorithm (ACE, Hastie and Tibshirani, 1990). This is a bivRegr's inner function for estimating variance, and correlation models.
ACE( y = "y", predictor = "~s(x)", restriction = "positive", eps = 0.01, itmax = 10, data = data, ... )
ACE( y = "y", predictor = "~s(x)", restriction = "positive", eps = 0.01, itmax = 10, data = data, ... )
y |
A character defining the response variable. |
predictor |
The regression predictor, following the mgcv package structure (see example below). |
restriction |
Type of restriction to be imposed to the response variable, "positive" for variance, and "correlation" for the correlation. |
eps |
A number defining the allowed estimation error, default = 0.01. |
itmax |
Maximum number of iterations of the algorithm, default = 10. |
data |
A data frame containing the response, and predictor variables. |
... |
Additional mgcv::gam() parameters to be modified by the user. |
This function returns a mgcv::gam() fit for a transformed response.
Hastie, T. & Tibshirani, J. (1990) Generalized additive models. CRC press. London.
n <- 100 x <- runif(n, -1, 1) y <- x^2 + rnorm(n, sd = 0.1) df <- data.frame(y, x) plot(df$x, df$y) m1 <- ACE( y = "y", predictor = "~s(x)", restriction = "positive", eps = 0.01, itmax = 10, data = df )$fit nw <- data.frame(x = seq(-1, 1, 0.1)) abline(h = 0, col = 2) lines(exp(predict(m1, newdata = nw)) ~ nw$x, col = 3, lwd = 2) legend("top", legend = c("ACE fit", "Zero"), lty = 1, lwd = 2, col = c(3, 2), bty = "n")
n <- 100 x <- runif(n, -1, 1) y <- x^2 + rnorm(n, sd = 0.1) df <- data.frame(y, x) plot(df$x, df$y) m1 <- ACE( y = "y", predictor = "~s(x)", restriction = "positive", eps = 0.01, itmax = 10, data = df )$fit nw <- data.frame(x = seq(-1, 1, 0.1)) abline(h = 0, col = 2) lines(exp(predict(m1, newdata = nw)) ~ nw$x, col = 3, lwd = 2) legend("top", legend = c("ACE fit", "Zero"), lty = 1, lwd = 2, col = c(3, 2), bty = "n")
The aegis dataset was obtained from the Estrada Glycation and Inflammation (AEGIS) project. A cross sectional, population based study that was performed in the municipality of A Estrada (Galicia, NW Spain). The study objective was to investigate the association between glycation, inflammation status, lifestyle and common diseases, and reasons for glycemic markers discordances (Gude et al., 2017).
aegis
aegis
A data frame containing 1516 observations, and 7 variables:
Anonymized patient identificator.
A factor variable describing patient gender, (female, and male).
Patients' age.
Diabetes mellitus indicator (no, and yes).
Fasting plasma glucose levels (mg/dL).
Glycated hemoglobin percentage.
Fructosamine concentration (mg/dL).
Gude F, Diaz–Vidal P, Rua–Perez C, et al. Glycemic variability and its association with demographics and lifestyles in ageneral adult population. J Diabetes Sci Technol. 2017; 11(4): 780–790
This functions estimate a probabilistic/reference region for bivariate data. It is based on a kernel density estimation. It may be applied to a set of bivariate data points, or to a bivRegr object. In the former case, the function will estimate a bivariate reference region for the model standarized residuals.
bivRegion( Y = fit, H_choice = "Hcov", tau = 0.95, k = 20, display_plot = TRUE, shape = NULL, ... )
bivRegion( Y = fit, H_choice = "Hcov", tau = 0.95, k = 20, display_plot = TRUE, shape = NULL, ... )
Y |
A set of bivariate data points, or a bivRegr object. |
H_choice |
Kernel bandwidth selection method: "plug.in" for plug.in method, "LSCV" for least squate cross valiation, "SCV" for smooth cross validation, and "Hcov" for a bandwidth selection method which optimize the region coverage. |
tau |
A number or vector defining the desired coverage(s) of the bivariate reference region. |
k |
In case of using "Hcov" the number of k fold cross validations replicates to be performed. |
display_plot |
A logical indicating if plot must be displayed during "Hcov" bandwidht estimation procedure. The plot depicts region's coverage, evaluated with k fold cross validation, depending on kernel bandwidth value. |
shape |
Shape parameter modulating the final shape of the bivariate probabilistic/reference region by hand. |
... |
Additional parameters to be modified in KernSmooth::bkde2D() function by the user (e.g. gridsize). |
This function return a region or a set of regions containing a given percentage of bivariate data points.
Duong, T. (2019) ks: Kernel Smoothing. R package version 1.11.6. https://CRAN.R–project.org/package=ks.
Matt Wand (2020). KernSmooth: Functions for Kernel Smoothing Supporting Wand & Jones (1995). R package version 2.23–18. https://CRAN.R–project.org/package=KernSmooth
Y <- cbind(rnorm(100), rnorm(100)) Y <- as.data.frame(Y) names(Y) <- c("y1", "y2") region <- bivRegion(Y, tau = 0.95, shape = 2) plot(region)
Y <- cbind(rnorm(100), rnorm(100)) Y <- as.data.frame(Y) names(Y) <- c("y1", "y2") region <- bivRegion(Y, tau = 0.95, shape = 2) plot(region)
This function estimates the covariates effects on the means vector, and variance covariance matrix from a bivariate variable. Non linear effects might be estimated for continuous covariates using penalized spline smoothers.
bivRegr(f = f, data = data, ace.eps = 0.01, ace.itmax = 25)
bivRegr(f = f, data = data, ace.eps = 0.01, ace.itmax = 25)
f |
A list of five formulas defining the covariates effects in both responses means, variances, and in their correlation. The formulas follow the same structure as mgcv::gam() (see example below). |
data |
A data frame containing the reponses, and predictor variables values. |
ace.eps |
A number defining the error rate in the ACE algorithm. |
ace.itmax |
A number defining the maximum number of ACE algorithm iterations. |
This function returns the covariates effect on the means, variances, and correlation of a bivariate response variable.
# Bivariate reference region for fasting plasma glucose (fpg) # and glycated hemoglobin (hba1c) levels depending on age dm_no <- subset(aegis, aegis$dm == "no") # select healthy patients # 1.1) Define predictors mu1 <- fpg ~ s(age) mu2 <- hba1c ~ s(age) var1 <- ~ s(age) var2 <- ~ s(age) rho <- ~ s(age) f <- list(mu1, mu2, var1, var2, rho) fit <- bivRegr(f, data = dm_no) # 1.2) Depict the estimated covariates effects plot(fit, eq = 1) plot(fit, eq = 2) plot(fit, eq = 3) plot(fit, eq = 4) plot(fit, eq = 5) # 1.2.1) Depict the estimated covariates effects with CI (Not Run) s0 <- summary_boot(fit, B = 100) # no parallelization # s1 = summary_boot(fit,B=100,parallel=TRUE) #parallelization plot(s0, eq = 1) # 1.3) Obtain the reference region in the standarized residuals region <- bivRegion(fit, tau = 0.95, shape = 2) plot(region) # 1.4) Identify those patients located outside the reference region summary(region) # 1.5) Depict the conditional reference region for two ages plot(region, cond = TRUE, newdata = data.frame(age = c(20, 50)), col = "grey", pch = "*", reg.lwd = 2, reg.lty = 2 )
# Bivariate reference region for fasting plasma glucose (fpg) # and glycated hemoglobin (hba1c) levels depending on age dm_no <- subset(aegis, aegis$dm == "no") # select healthy patients # 1.1) Define predictors mu1 <- fpg ~ s(age) mu2 <- hba1c ~ s(age) var1 <- ~ s(age) var2 <- ~ s(age) rho <- ~ s(age) f <- list(mu1, mu2, var1, var2, rho) fit <- bivRegr(f, data = dm_no) # 1.2) Depict the estimated covariates effects plot(fit, eq = 1) plot(fit, eq = 2) plot(fit, eq = 3) plot(fit, eq = 4) plot(fit, eq = 5) # 1.2.1) Depict the estimated covariates effects with CI (Not Run) s0 <- summary_boot(fit, B = 100) # no parallelization # s1 = summary_boot(fit,B=100,parallel=TRUE) #parallelization plot(s0, eq = 1) # 1.3) Obtain the reference region in the standarized residuals region <- bivRegion(fit, tau = 0.95, shape = 2) plot(region) # 1.4) Identify those patients located outside the reference region summary(region) # 1.5) Depict the conditional reference region for two ages plot(region, cond = TRUE, newdata = data.frame(age = c(20, 50)), col = "grey", pch = "*", reg.lwd = 2, reg.lty = 2 )
This function implements a method for estimating bivariate kernel bandwidth based on data covarage. The method starts with the plug in estimate (which usually overfits the data), and then increase this bandwidth value until the desired coverage is obtained. Region coverage is evaluated in an out sample design, using a k fold cross validation scheme.
Hcov(Y, shape = seq(1, 10, 0.5), k = 20, tau = 0.9, display_plot = TRUE)
Hcov(Y, shape = seq(1, 10, 0.5), k = 20, tau = 0.9, display_plot = TRUE)
Y |
A matrix containing bivariate data values. |
shape |
A sequence of values which controls plug in estimator increasing. |
k |
A number indicating k fold cross validations to be performed. |
tau |
The desired region coverage |
display_plot |
A logical indicating if a plot must be displaying, during the function estimation process, summarizing the results. |
This function return a diagonal kernel bandwidth matrix.
This function allow to depict the estimated bivariate reference/probabilistic region, in the estandarized residuals scale (cond=FALSE), or for any covariate value (cond=TRUE).
## S3 method for class 'bivRegion' plot( x, tau = 0.95, newdata = NULL, reg.col = NULL, reg.lwd = 1, reg.lty = NULL, axes = TRUE, axes.col = "black", axes.lwd = 2L, cond = FALSE, add = FALSE, legend = TRUE, ... )
## S3 method for class 'bivRegion' plot( x, tau = 0.95, newdata = NULL, reg.col = NULL, reg.lwd = 1, reg.lty = NULL, axes = TRUE, axes.col = "black", axes.lwd = 2L, cond = FALSE, add = FALSE, legend = TRUE, ... )
x |
A bivRegion object. |
tau |
A number, or vector, defining the desired coverage(s) of the bivariate reference region. |
newdata |
If cond=FALSE, a data.frame with new values to be depicted in the standarized residuals scale. If cond=TRUE, a data frame containing covariate values for which the reference/probabilistic region will be depicted. |
reg.col |
Region line colour, in case of more than one tau it can be a vector. |
reg.lwd |
Region line width, in case of more than one tau it can be a vector. |
reg.lty |
Region line type, in case of more than one tau it can be a vector. |
axes |
Logical; if TRUE (and cond=FALSE), vertical and horizontal lines are added indicating four quadrants in the model residuals scale. |
axes.col |
Axes colour. |
axes.lwd |
Axes line width. |
cond |
A logical argument, if TRUE a conditional reference region is depicted. |
add |
A logical argument, if TRUE the conditional reference region is depicted over a pre existing plot. |
legend |
A logical argument, if TRUE a legend is given along with the reference region. |
... |
Further plot parameters. |
This function return a graphical representation for a bivRegion object.
Y <- cbind(rnorm(100), rnorm(100)) Y <- as.data.frame(Y) names(Y) <- c("y1", "y2") reg <- bivRegion(Y, tau = 0.95, shape = 2) plot(reg)
Y <- cbind(rnorm(100), rnorm(100)) Y <- as.data.frame(Y) names(Y) <- c("y1", "y2") reg <- bivRegion(Y, tau = 0.95, shape = 2) plot(reg)
This function takes an bivRegr object and plots the estimated effects for the conditional response means, variances or correlation. summary_boot function must be applied by the user in order to get the estimated effects confidence intervals.
## S3 method for class 'bivRegr' plot(x, eq = 1, newdata = NULL, ...)
## S3 method for class 'bivRegr' plot(x, eq = 1, newdata = NULL, ...)
x |
A bivRegr fit. |
eq |
A number indicating the model effects to be depicted; 1 = first response mean, 2 = second response mean, 3 = first response variance, 4 = second response variance, and 5 = correlation model. |
newdata |
An optional data frame containing covariates values. |
... |
Additional plot arguments. |
This function returns a plot for bivRegr mean, variance and correlation models.
This function depict the univariate conditional quantile model based on the non parametric location scale model fitted with the refcurv function.
## S3 method for class 'refcurve' plot( x, newdata = data.frame(x = seq(0, 1, 0.01)), tau = seq(0.1, 0.9, 0.2), ... )
## S3 method for class 'refcurve' plot( x, newdata = data.frame(x = seq(0, 1, 0.01)), tau = seq(0.1, 0.9, 0.2), ... )
x |
A refcurv object. |
newdata |
A data frame defining a sequence of the predictor variables values. |
tau |
A number or vector defining desired quantile. |
... |
Additional plot options. |
This function returns a plot of the refcurve model.
This function takes the bivRegr bootstrap replicates obtained with summary_boot function, and plots the parametric, and smooth effects for each model.
## S3 method for class 'summary_boot' plot(x, eq = 1, select = NULL, ...)
## S3 method for class 'summary_boot' plot(x, eq = 1, select = NULL, ...)
x |
A summary_boot object. |
eq |
A number indicating the model effects to be depicted; 1 = first response mean, 2 = second response mean, 3 = first response variance, 4 = second response variance, and 5 = correlation model. |
select |
An optional parameter to represent an specific effect for each equation. |
... |
Additional plot parameters, not yet implemented. |
This function returns a ggplot2 plot for the estimated effects along with bootstrap 95% confidence intervals.
This function allow to depict in an interactive rgl plot the estimated trivariate reference/probabilistic region. If cond=FALSE it showes trivariate standarized residuals, while with cond=TRUE it represents the region shape for any covariate(s) value.
## S3 method for class 'trivRegion' plot( x, cond = FALSE, planes = FALSE, newdata = NULL, add = FALSE, reg.col = NULL, incol = "grey", legend = FALSE, ... )
## S3 method for class 'trivRegion' plot( x, cond = FALSE, planes = FALSE, newdata = NULL, add = FALSE, reg.col = NULL, incol = "grey", legend = FALSE, ... )
x |
A trivRegion object. |
cond |
A logical argument, if TRUE a conditional reference region is depicted. |
planes |
Logical; if TRUE, planes are added indicating (x=0,y=0,z=0). |
newdata |
If cond==TRUE, a data frame containing covariate values for which the reference/probabilistic region will be depicted. |
add |
A logical argument, if TRUE the conditional reference region is depicted over a pre existing plot. |
reg.col |
Region contour colour. |
incol |
Colours for the points included inside the reference region. |
legend |
A logical argument, if TRUE a legend is given along with the reference region. |
... |
Further rgl plot parameters. |
This function return an interactive rgl plot of a bivRegion object.
The pollution dataset contains the SO2 and Nox concentrations in the As Pontes thermal power plant surrondings, during a year. The data was analyzed from a bivariate perspective in Roca–Pardiñas et al. (2021).
pollution
pollution
A data frame with 6179 observations and 13 variables:
The observation day, hour, and minute.
So2 air concentration at the present time.
Nox air concentration at the present time.
So2 air concentration registered 30 minutes before than the So2 one.
Nox air concentration registered 30 minutes before than the Nox one.
So2 air concentration registered 45 minutes before than the So2 one.
Nox air concentration registered 45 minutes before than the Nox one.
So2 air concentration registered 60 minutes before than the So2 one.
Nox air concentration registered 60 minutes before than the Nox one.
So2 air concentration registered 75 minutes before than the So2 one.
Nox air concentration registered 75 minutes before than the Nox one.
So2 air concentration registered 90 minutes before than the So2 one.
Nox air concentration registered 90 minutes before than the Nox one.
Roca–Pardinas, J., Ordonez, C., & Lado Baleato, O. (2021). Nonparametric location scale model for the joint forecasting of SO2 and NOx pollution episodes. Stochastic Environmental Research and Risk Assessment, 35(2), 231–244.
SO2 and Nox concentrations in the As Pontes thermal power plant surrondings, during a pollution episode.
pollution_episode
pollution_episode
A data frame with 288 observations and 13 variables:
The observation day, hour and minute.
So2 air concentration at the present time.
Nox air concentration at the present time.
So2 air concentration registered 30 minutes before than the So2 one.
Nox air concentration registered 30 minutes before than the Nox one.
So2 air concentration registered 45 minutes before than the So2 one.
Nox air concentration registered 45 minutes before than the Nox one.
So2 air concentration registered 60 minutes before than the So2 one.
Nox air concentration registered 60 minutes before than the Nox one.
So2 air concentration registered 75 minutes before than the So2 one.
Nox air concentration registered 75 minutes before than the Nox one.
So2 air concentration registered 90 minutes before than the So2 one.
Nox air concentration registered 90 minutes before than the Nox one.
This function takes a bivRegion object and allow to obtain region limits for a given covariate value if cond=TRUE. If not, it can be applied to a new dataset to evaluate which points will be included inside the standarized region (for instance, if we estimate a reference region with healthy patients results, we can know where non healthy patients will be located).
## S3 method for class 'bivRegion' predict(object, tau = 0.95, newdata = NULL, cond = FALSE, ...)
## S3 method for class 'bivRegion' predict(object, tau = 0.95, newdata = NULL, cond = FALSE, ...)
object |
A bivRegion object. |
tau |
A number or vector defining the desired coverage(s) of the bivariate reference/probabilistic region. |
newdata |
An optional data frame containing new covariate values, or new observations. |
cond |
A logical argument, if TRUE, a matrix of values defining the conditional reference region limits is given. |
... |
Additional prediction parameters. |
This function returns reference region limits values for a given covariate value (if cond=TRUE), or which observations falls outside the estimated region (if cond=FALSE).
Obtains predictions for a bivRegr object, that is the bivRegr means, variances, and correlation for given covariate values.
## S3 method for class 'bivRegr' predict(object, newdata = NULL, ...)
## S3 method for class 'bivRegr' predict(object, newdata = NULL, ...)
object |
A bivRegr fit. |
newdata |
A data frame defining the covariate values for prediction. Default is NULL and the prediction will be done in the original data. |
... |
Additional predict options. |
This function returns prediction of bivRegr mean, variance and correlation models.
This function obtain univariate conditional quantiles as described in Martinez-Silva et. al (2016).
refcurv(mu = "y~s(x)", sigma = "~s(x)", data = data)
refcurv(mu = "y~s(x)", sigma = "~s(x)", data = data)
mu |
A formula object for the response mean model following the mgcv package structure (see example below). |
sigma |
a formula object for fitting a model to the response variance (see example below). |
data |
A data frame containing both the response, and predictor variables. |
In the Martinez Silva et. al (2016) the non linear effects of the continuous covariates are estimating through polynomial kernel smoother, in this package we implement the same methodology but using penalized splines in order to reduce computational cost.
This function returns univariate conditional quantiles estimated using a non parametric location scale model.
Martinez–Silva, I., Roca–Pardinas, J., & Ordonez, C. (2016). Forecasting SO2 pollution incidents by means of quantile curves based on additive models. Environmetrics, 27(3), 147–157.
#--- Glycation hemoglobin reference curve depending on age dm_no <- subset(aegis, aegis$dm == "no") fit1 <- refcurv(mu = "hba1c~s(age)", sigma = "~s(age)", data = dm_no) plot(fit1, newdata = data.frame(age = 18:90), tau = c(0.025, 0.05, 0.10, 0.90, 0.95, 0.975))
#--- Glycation hemoglobin reference curve depending on age dm_no <- subset(aegis, aegis$dm == "no") fit1 <- refcurv(mu = "hba1c~s(age)", sigma = "~s(age)", data = dm_no) plot(fit1, newdata = data.frame(age = 18:90), tau = c(0.025, 0.05, 0.10, 0.90, 0.95, 0.975))
This function perform a bootstrap procedure in order to obtain confidence intervals for the bivRegr estimated effects. The function allow to paralelize the bootstrap resampling scheme using doParalell, and foreach libraries.
summary_boot(object, B = 100, parallel = FALSE, cores = NULL)
summary_boot(object, B = 100, parallel = FALSE, cores = NULL)
object |
A bivRegr fit. |
B |
A number indicating the bootstrap iterations. |
parallel |
A logical indicating if bootstrap parallelization must be applied. |
cores |
If parallel = TRUE, a number indicating computer cores to be used during paralellization. If NULL the function use all cores available but one. |
This function returns the bootstrap replicates of the bivRegr sub models. Results might be checked applying plot.summary_boot().
This function takes an bivRegion object and indicates which observations are located outside the estimated reference region and why.
## S3 method for class 'bivRegion' summary(object, tau = 0.95, ...)
## S3 method for class 'bivRegion' summary(object, tau = 0.95, ...)
object |
A bivRegion object. |
tau |
The data coverage proportion previously obtained by the bivRegion function. |
... |
Additional summary options. |
This function indicates the region apparent coverage, and which patients are located outside the estimated reference region.
This functions estimate a probabilistic/reference region for trivariate data. It is based on a non parametric kernel density estimation. It can only be applied to a trivRegr object, and for one single tau.
trivRegion(fit, tau = 0.9)
trivRegion(fit, tau = 0.9)
fit |
A trivRegr object. |
tau |
A number defining the desired coverage of the trivariate reference region. |
This function return a region containing a given percentage of trivariate data points.
Duong, T. (2019) ks: Kernel Smoothing. R package version 1.11.6. https://CRAN.R–project.org/package=ks.
This function estimates the covariates effects on the means vector, and variance covariance matrix of a trivariate variable. Non linear effects might be estimated for continuous covariates using penalized splines.
trivRegr(f = f, data = data)
trivRegr(f = f, data = data)
f |
A list of 9 formulas defining the covariates effects in three responses means, in their variances, and in their correlations. The formulas follow the mgcv::gam() structure. |
data |
A data frame containing the reponses, and predictor variables values. |
This function returns the covariates effect on the means, variances, and correlation for a trivariate response variable.
dm_no <- subset(aegis, aegis$dm == "no") # Model formulas mu1 <- fpg ~ s(age) mu2 <- hba1c ~ s(age) mu3 <- fru~s(age) var1 <- ~ s(age) var2 <- ~ s(age) var3 <- ~ s(age) theta12 <- ~ s(age) theta13 <- ~ s(age) theta23 <- ~ s(age) f <- list(mu1, mu2, mu3, var1, var2, var3, theta12, theta13, theta23) # Model fit fit <- trivRegr(f, data = dm_no) # Trivariate region estimation region <- trivRegion(fit, tau = 0.95) plot(region, col = 2, planes = TRUE) plot(region, cond = TRUE, newdata = data.frame(age = c(20, 80)), xlab = "FPG, mg/dl", ylab = "HbA1c, %", zlab = "Fru, mg/dL" )
dm_no <- subset(aegis, aegis$dm == "no") # Model formulas mu1 <- fpg ~ s(age) mu2 <- hba1c ~ s(age) mu3 <- fru~s(age) var1 <- ~ s(age) var2 <- ~ s(age) var3 <- ~ s(age) theta12 <- ~ s(age) theta13 <- ~ s(age) theta23 <- ~ s(age) f <- list(mu1, mu2, mu3, var1, var2, var3, theta12, theta13, theta23) # Model fit fit <- trivRegr(f, data = dm_no) # Trivariate region estimation region <- trivRegion(fit, tau = 0.95) plot(region, col = 2, planes = TRUE) plot(region, cond = TRUE, newdata = data.frame(age = c(20, 80)), xlab = "FPG, mg/dl", ylab = "HbA1c, %", zlab = "Fru, mg/dL" )