Title: Regularized Calibrated Estimation
Description: Regularized calibrated estimation for causal inference and missing-data problems with high-dimensional data, based on Tan (2020a) <doi:10.1093/biomet/asz059>, Tan (2020b) <doi:10.1214/19-AOS1824> and Sun and Tan (2020) <arXiv:2009.09286>.
Authors: Zhiqiang Tan, Baoluo Sun
Maintainer: Zhiqiang Tan <[email protected]>
License: GPL (>= 2)
Version: 2.0
Built: 2024-12-24 07:00:07 UTC
Source: CRAN
Regularized calibrated estimation for causal inference and missing-data problems with high-dimensional data.
The R package RCAL, version 2.0, can be used for two main tasks:
to estimate the mean of an outcome in the presence of missing data, and
to estimate average treatment effects (ATEs) and local average treatment effects (LATEs) in causal inference.
There are 3 high-level functions provided for the first task:
mn.nreg: inference using non-regularized calibrated estimation,
mn.regu.cv: inference using regularized calibrated estimation based on cross validation,
mn.regu.path: inference using regularized calibrated estimation along a regularization path.
The first function mn.nreg is appropriate only in relatively low-dimensional settings, whereas the functions mn.regu.cv and mn.regu.path are designed to deal with high-dimensional data (namely, the number of covariates close to or greater than the sample size).
In parallel, there are 3 functions for estimating the average treatment effect in the second task: ate.nreg, ate.regu.cv, and ate.regu.path.
These functions can also be used to perform inference for the average treatment effects on the treated or on the untreated.
Currently, the treatment is assumed to be binary (i.e., untreated or treated).
There are also 3 functions for estimating the local average treatment effect using instrumental variables: late.nreg, late.regu.cv, and late.regu.path.
Currently both the treatment and the instrumental variable are assumed to be binary. Extensions to multi-valued treatments and instrumental variables will be incorporated in later versions.
The package also provides lower-level functions, including glm.nreg to implement non-regularized M-estimation and glm.regu to implement Lasso regularized M-estimation for fitting generalized linear models, currently with continuous or binary outcomes.
The latter function glm.regu uses an active-set descent algorithm, which enjoys a finite termination property for solving least-squares Lasso problems.
See the vignettes for more details.
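As a minimal sketch of the intended workflow (using the simu.data example dataset documented below; the call mirrors the mn.regu.cv examples and is only one of several possible entry points):

library(RCAL)
data(simu.data)
y <- simu.data[,1]
tr <- simu.data[,2]
x <- scale(simu.data[,-(1:2)])
# outcome treated as missing for the untreated, as in the mn.regu.cv example
y[tr==0] <- NA
mn.cv.rcal <- mn.regu.cv(fold=c(5,5), nrho=c(11,11), rho.seq=NULL,
                         y=y, tr=tr, x=x, ploss="cal", yloss="gaus")
unlist(mn.cv.rcal$est)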
This function implements augmented inverse probability weighted (IPW) estimation of average treatment effects (ATEs), provided both fitted propensity scores and fitted values from outcome regression.
ate.aipw(y, tr, mfp, mfo, off = NULL)
y |
An |
tr |
An |
mfp |
An |
mfo |
An |
off |
A |
one |
A |
ipw |
A |
or |
A |
est |
A |
var |
The estimated variances associated with the augmented IPW estimates of means. |
ze |
The z-statistics for the augmented IPW estimates of means, compared to |
diff |
The augmented IPW estimate of ATE. |
diff.var |
The estimated variance associated with the augmented IPW estimate of ATE. |
diff.ze |
The z-statistic for the augmented IPW estimate of ATE. |
Tan, Z. (2020a) Regularized calibrated estimation of propensity scores with model misspecification and high-dimensional data, Biometrika, 107, 137–158.
Tan, Z. (2020b) Model-assisted inference for treatment effects using regularized calibrated estimation with high-dimensional data, Annals of Statistics, 48, 811–837.
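No worked example is given above for ate.aipw; the following sketch (assuming the mfp and mfo components returned by ate.nreg are in the form expected by this function) re-applies the augmented IPW estimator to fitted values obtained from ate.nreg:

data(simu.data)
p <- dim(simu.data)[2]-2
y <- simu.data[,1]
tr <- simu.data[,2]
x2 <- scale(simu.data[,2+1:p])[,1:10]   # include only 10 covariates
fit.cal <- ate.nreg(y, tr, x2, ploss="cal", yloss="gaus")
# re-apply augmented IPW estimation to the fitted propensity scores (mfp)
# and outcome regression values (mfo) returned by ate.nreg
out <- ate.aipw(y, tr, mfp=fit.cal$mfp, mfo=fit.cal$mfo)
out$diff      # augmented IPW estimate of ATE
out$diff.var  # estimated variance
out$diff.ze   # z-statistic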
This function implements inverse probability weighted (IPW) estimation of average treatment effects (ATEs), provided fitted propensity scores.
ate.ipw(y, tr, mfp)
y |
An |
tr |
An |
mfp |
An |
one |
The direct IPW estimates of 1. |
est |
The ratio IPW estimates of means. |
diff |
The ratio IPW estimate of ATE. |
Tan, Z. (2020a) Regularized calibrated estimation of propensity scores with model misspecification and high-dimensional data, Biometrika, 107, 137–158.
Tan, Z. (2020b) Model-assisted inference for treatment effects using regularized calibrated estimation with high-dimensional data, Annals of Statistics, 48, 811–837.
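Similarly, a short sketch for ate.ipw (assuming the mfp component returned by ate.nreg matches the fitted propensity scores expected here):

data(simu.data)
p <- dim(simu.data)[2]-2
y <- simu.data[,1]
tr <- simu.data[,2]
x2 <- scale(simu.data[,2+1:p])[,1:10]   # include only 10 covariates
fit.cal <- ate.nreg(y, tr, x2, ploss="cal", yloss="gaus")
out <- ate.ipw(y, tr, mfp=fit.cal$mfp)
out$one   # direct IPW estimates of 1
out$est   # ratio IPW estimates of means
out$diff  # ratio IPW estimate of ATE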
This function implements model-assisted inference for average treatment effects, using non-regularized calibrated estimation.
ate.nreg(y, tr, x, ploss = "cal", yloss = "gaus", off = NULL)
y |
An |
tr |
An |
x |
An |
ploss |
A loss function used in propensity score estimation (either "ml" or "cal"). |
yloss |
A loss function used in outcome regression (either "gaus" for continuous outcomes or "ml" for binary outcomes). |
off |
A |
For calibrated estimation, two sets of propensity scores are separately estimated for the untreated and treated as discussed in Tan (2020a, 2020b).
See also Details for mn.nreg.
ps |
A list containing the results from fitting the propensity score model by |
mfp |
An |
or |
A list containing the results from fitting the outcome regression model by |
mfo |
An |
est |
A list containing the results from augmented IPW estimation by |
Tan, Z. (2020a) Regularized calibrated estimation of propensity scores with model misspecification and high-dimensional data, Biometrika, 107, 137–158.
Tan, Z. (2020b) Model-assisted inference for treatment effects using regularized calibrated estimation with high-dimensional data, Annals of Statistics, 48, 811–837.
data(simu.data)
n <- dim(simu.data)[1]
p <- dim(simu.data)[2]-2

y <- simu.data[,1]
tr <- simu.data[,2]
x <- simu.data[,2+1:p]
x <- scale(x)

# include only 10 covariates
x2 <- x[,1:10]

ate.cal <- ate.nreg(y, tr, x2, ploss="cal", yloss="gaus")
matrix(unlist(ate.cal$est), ncol=2, byrow=TRUE,
       dimnames=list(c("one", "ipw", "or", "est", "var", "ze",
                       "diff.est", "diff.var", "diff.ze"),
                     c("untreated", "treated")))
This function implements model-assisted inference for average treatment effects, using regularized calibrated estimation based on cross validation.
ate.regu.cv(fold, nrho = NULL, rho.seq = NULL, y, tr, x, ploss = "cal", yloss = "gaus", off = NULL, ...)
fold |
A vector of length 2 giving the fold numbers for cross validation in propensity score estimation and outcome regression respectively. |
nrho |
A vector of length 2 giving the numbers of tuning parameters searched in cross validation. |
rho.seq |
A list of two vectors giving the tuning parameters in propensity score estimation (first vector) and outcome regression (second vector). |
y |
An |
tr |
An |
x |
An |
ploss |
A loss function used in propensity score estimation (either "ml" or "cal"). |
yloss |
A loss function used in outcome regression (either "gaus" for continuous outcomes or "ml" for binary outcomes). |
off |
A |
... |
Additional arguments to |
For calibrated estimation, two sets of propensity scores are separately estimated for the untreated and treated as discussed in Tan (2020a, 2020b).
See also Details for mn.regu.cv.
ps |
A list containing the results from fitting the propensity score model by |
mfp |
An |
or |
A list containing the results from fitting the outcome regression model by |
mfo |
An |
est |
A list containing the results from augmented IPW estimation by |
Tan, Z. (2020a) Regularized calibrated estimation of propensity scores with model misspecification and high-dimensional data, Biometrika, 107, 137–158.
Tan, Z. (2020b) Model-assisted inference for treatment effects using regularized calibrated estimation with high-dimensional data, Annals of Statistics, 48, 811–837.
data(simu.data)
n <- dim(simu.data)[1]
p <- dim(simu.data)[2]-2

y <- simu.data[,1]
tr <- simu.data[,2]
x <- simu.data[,2+1:p]
x <- scale(x)

ate.cv.rcal <- ate.regu.cv(fold=5*c(1,1), nrho=(1+10)*c(1,1), rho.seq=NULL,
                           y, tr, x, ploss="cal", yloss="gaus")
matrix(unlist(ate.cv.rcal$est), ncol=2, byrow=TRUE,
       dimnames=list(c("one", "ipw", "or", "est", "var", "ze",
                       "diff.est", "diff.var", "diff.ze"),
                     c("untreated", "treated")))
This function implements model-assisted inference for average treatment effects, using regularized calibrated estimation along regularization paths for propensity score (PS) estimation while based on cross validation for outcome regression (OR).
ate.regu.path(fold, nrho = NULL, rho.seq = NULL, y, tr, x, ploss = "cal", yloss = "gaus", off = NULL, ...)
fold |
A vector of length 2, with the second component giving the fold number for cross validation in outcome regression. The first component is not used. |
nrho |
A vector of length 2 giving the number of tuning parameters in a regularization path for PS estimation and that in cross validation for OR. |
rho.seq |
A list of two vectors giving the tuning parameters for propensity score estimation (first vector) and outcome regression (second vector). |
y |
An |
tr |
An |
x |
An |
ploss |
A loss function used in propensity score estimation (either "ml" or "cal"). |
yloss |
A loss function used in outcome regression (either "gaus" for continuous outcomes or "ml" for binary outcomes). |
off |
A |
... |
Additional arguments to |
See Details for ate.regu.cv.
ps |
A list of 2 objects, giving the results from fitting the propensity score model by |
mfp |
A list of 2 matrices of fitted propensity scores, along the PS regularization path, for untreated (first matrix) and treated (second matrix). |
or |
A list of 2 lists of objects for untreated (first) and treated (second), where each object gives the results from fitting the outcome regression model by |
mfo |
A list of 2 matrices of fitted values from outcome regression based on cross validation, along the PS regularization path, for untreated (first matrix) and treated (second matrix). |
est |
A list containing the results from augmented IPW estimation by |
rho |
A vector of tuning parameters leading to converged results in propensity score estimation. |
Tan, Z. (2020a) Regularized calibrated estimation of propensity scores with model misspecification and high-dimensional data, Biometrika, 107, 137–158.
Tan, Z. (2020b) Model-assisted inference for treatment effects using regularized calibrated estimation with high-dimensional data, Annals of Statistics, 48, 811–837.
data(simu.data)
n <- dim(simu.data)[1]
p <- dim(simu.data)[2]-2

y <- simu.data[,1]
tr <- simu.data[,2]
x <- simu.data[,2+1:p]
x <- scale(x)

ate.path.rcal <- ate.regu.path(fold=5*c(0,1), nrho=(1+10)*c(1,1), rho.seq=NULL,
                               y, tr, x, ploss="cal", yloss="gaus")
ate.path.rcal$est
This function implements non-regularized M-estimation for fitting generalized linear models with continuous or binary responses, including maximum likelihood, calibrated estimation, and covariate-balancing estimation in the latter case of fitting propensity score models.
glm.nreg(y, x, iw = NULL, loss = "cal", init = NULL)
y |
An |
x |
An |
iw |
An |
loss |
A loss function used, which can be specified as "gaus" for continuous responses, or "ml", "cal", or "bal" for binary responses. |
init |
A |
Least squares estimation is implemented by calling lm for continuous responses (loss="gaus"). For binary responses, maximum likelihood estimation (loss="ml") is implemented by calling glm. Calibrated estimation (loss="cal") is implemented by using a trust-region algorithm in the R package trust to minimize the calibration loss, i.e., (6) in Tan (2020).
Covariate-balancing estimation (loss="bal") in Imai and Ratkovic (2014) is implemented by using trust to minimize (36) in Tan (2020).
coef |
The |
fit |
The |
conv |
Logical; 1 if loss="gaus" for continuous responses or convergence is obtained within 1000 iterations by |
Imai, K. and Ratkovic, M. (2014) Covariate balancing propensity score, Journal of the Royal Statistical Society, Ser. B, 76, 243-263.
Tan, Z. (2020) Regularized calibrated estimation of propensity scores with model misspecification and high-dimensional data, Biometrika, 107, 137–158.
data(simu.data)
n <- dim(simu.data)[1]
p <- dim(simu.data)[2]-2

y <- simu.data[,1]
tr <- simu.data[,2]
x <- simu.data[,2+1:p]
x <- scale(x)

# include only 10 covariates
x2 <- x[,1:10]

ps.ml <- glm.nreg(y=tr, x=x2, loss="ml")
check.ml <- mn.ipw(x2, tr, ps.ml$fit)
check.ml

ps.cal <- glm.nreg(y=tr, x=x2, loss="cal")
check.cal <- mn.ipw(x2, tr, ps.cal$fit)
check.cal  # should be numerically 0

ps.bal <- glm.nreg(y=tr, x=x2, loss="bal")
check.bal <- mn.ipw(x2, tr, ps.bal$fit)
check.bal
This function implements regularized M-estimation for fitting generalized linear models with continuous or binary responses for a fixed choice of tuning parameters.
glm.regu(y, x, iw = NULL, loss = "cal", init = NULL, rhos, test = NULL, offs = NULL, id = NULL, Wmat = NULL, Rmat = NULL, zzs = NULL, xxs = NULL, n.iter = 100, eps = 1e-06, bt.lim = 3, nz.lab = NULL, pos = 10000)
y |
An |
x |
An |
iw |
An |
loss |
A loss function, which can be specified as "gaus" for continuous responses, or "ml" or "cal" for binary responses. |
init |
A |
rhos |
A |
test |
A vector giving the indices of observations between 1 and |
offs |
An |
id |
An argument which can be used to speed up computation. |
Wmat |
An argument which can be used to speed up computation. |
Rmat |
An argument which can be used to speed up computation. |
zzs |
An argument which can be used to speed up computation. |
xxs |
An argument which can be used to speed up computation. |
n.iter |
The maximum number of iterations allowed. An iteration is defined by computing a quadratic approximation and solving a least-squares Lasso problem. |
eps |
The tolerance at which the difference in the objective (loss plus penalty) values is considered close enough to 0 to declare convergence. |
bt.lim |
The maximum number of backtracking steps allowed. |
nz.lab |
A |
pos |
A value which can be used to facilitate recording the numbers of nonzero coefficients with or without the restriction by |
For continuous responses, this function uses an active-set descent algorithm (Osborne et al. 2000; Yang and Tan 2018) to solve the least-squares Lasso problem. For binary responses, regularized calibrated estimation is implemented using the Fisher scoring descent algorithm in Tan (2020), whereas regularized maximum likelihood estimation is implemented in a similar manner based on quadratic approximation as in the R package glmnet.
iter |
The number of iterations performed up to |
conv |
1 if convergence is obtained, 0 if exceeding the maximum number of iterations, or -1 if exceeding maximum number of backtracking steps. |
nz |
A value defined as (nz0 * |
inter |
The estimated intercept. |
bet |
The |
fit |
The vector of fitted values in the training set. |
eta |
The vector of linear predictors in the training set. |
tau |
The |
obj.train |
The average loss in the training set. |
pen |
The Lasso penalty of the estimates. |
obj |
The average loss plus the Lasso penalty. |
fit.test |
The vector of fitted values in the test set. |
eta.test |
The vector of linear predictors in the test set. |
obj.test |
The average loss in the test set. |
id |
This can be re-used to speed up computation. |
Wmat |
This can be re-used to speed up computation. |
Rmat |
This can be re-used to speed up computation. |
zzs |
This can be re-used to speed up computation. |
xxs |
This can be re-used to speed up computation. |
Osborne, M., Presnell, B., and Turlach, B. (2000) A new approach to variable selection in least squares problems, IMA Journal of Numerical Analysis, 20, 389-404.
Yang, T. and Tan, Z. (2018) Backfitting algorithms for total-variation and empirical-norm penalized additive modeling with high-dimensional data, Stat, 7, e198.
Tibshirani, R. (1996) Regression shrinkage and selection via the Lasso, Journal of the Royal Statistical Society, Ser. B, 58, 267-288.
Tan, Z. (2020) Regularized calibrated estimation of propensity scores with model misspecification and high-dimensional data, Biometrika, 107, 137–158.
data(simu.data)
n <- dim(simu.data)[1]
p <- dim(simu.data)[2]-2

y <- simu.data[,1]
tr <- simu.data[,2]
x <- simu.data[,2+1:p]
x <- scale(x)

### Example 1: linear regression
# rhos should be a vector of length p, even though a constant vector
out.rgaus <- glm.regu(y[tr==1], x[tr==1,], rhos=rep(.05,p), loss="gaus")

# the intercept
out.rgaus$inter
# the estimated coefficients and generalized signs; the first 10 are shown
cbind(out.rgaus$bet, out.rgaus$tau)[1:10,]
# the number of nonzero coefficients
out.rgaus$nz

### Example 2: logistic regression using likelihood loss
out.rml <- glm.regu(tr, x, rhos=rep(.01,p), loss="ml")

out.rml$inter
cbind(out.rml$bet, out.rml$tau)[1:10,]
out.rml$nz

### Example 3: logistic regression using calibration loss
out.rcal <- glm.regu(tr, x, rhos=rep(.05,p), loss="cal")

out.rcal$inter
cbind(out.rcal$bet, out.rcal$tau)[1:10,]
out.rcal$nz
This function implements regularized M-estimation for fitting generalized linear models with binary or continuous responses based on cross validation.
glm.regu.cv(fold, nrho = NULL, rho.seq = NULL, y, x, iw = NULL, loss = "cal", n.iter = 100, eps = 1e-06, tune.fac = 0.5, tune.cut = TRUE, ann.init = TRUE, nz.lab = NULL, permut = NULL)
fold |
A fold number used for cross validation. |
nrho |
The number of tuning parameters searched in cross validation. |
rho.seq |
A vector of tuning parameters searched in cross validation. If both |
y |
An |
x |
An |
iw |
An |
loss |
A loss function, which can be specified as "gaus" for continuous responses, or "ml" or "cal" for binary responses. |
n.iter |
The maximum number of iterations allowed as in |
eps |
The tolerance used to declare convergence as in |
tune.fac |
The multiplier (factor) used to define |
tune.cut |
Logical; if |
ann.init |
Logical; if |
nz.lab |
A |
permut |
An |
Cross validation is performed as described in Tan (2020a, 2020b). If not specified by users, the sequence of tuning parameters searched is defined as a geometric series of length nrho, starting from the value which yields a zero solution, and then decreasing by a factor tune.fac successively.
After cross validation, two tuning parameters are selected. The first and default choice is the value yielding the smallest average test loss. The second choice is the largest value giving an average test loss within one standard error of that of the first choice (Hastie, Tibshirani, and Friedman 2016).
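Schematically, the default sequence can be thought of as follows, where rho.max is a placeholder for the value yielding a zero solution (computed internally by the package; the snippet only illustrates the geometric spacing):

nrho <- 1+10       # number of tuning parameters searched
tune.fac <- 0.5    # multiplicative decrease between successive values
rho.max <- 1       # placeholder: smallest value giving an all-zero solution
rho.seq <- rho.max * tune.fac^(0:(nrho-1))
rho.seq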
permut |
An |
rho |
The vector of tuning parameters, searched in cross validation. |
non.conv |
A vector indicating the non-convergence status found or imputed if |
err.ave |
A vector giving the averages of the test losses in cross validation. |
err.sd |
A vector giving the standard deviations of the test losses in cross validation. |
sel.rho |
A vector of two selected tuning parameters by cross validation; see Details. |
sel.nz |
A vector of numbers of nonzero coefficients estimated for the selected tuning parameters. |
sel.bet |
The |
sel.fit |
The |
Hastie, T., Tibshirani, R., and Friedman. J. (2016) The Elements of Statistical Learning (second edition), Springer: New York.
Tan, Z. (2020a) Regularized calibrated estimation of propensity scores with model misspecification and high-dimensional data, Biometrika, 107, 137–158.
Tan, Z. (2020b) Model-assisted inference for treatment effects using regularized calibrated estimation with high-dimensional data, Annals of Statistics, 48, 811–837.
data(simu.data)
n <- dim(simu.data)[1]
p <- dim(simu.data)[2]-2

y <- simu.data[,1]
tr <- simu.data[,2]
x <- simu.data[,2+1:p]
x <- scale(x)

### Example 1: Regularized maximum likelihood estimation of propensity scores
ps.cv.rml <- glm.regu.cv(fold=5, nrho=1+10, y=tr, x=x, loss="ml")

ps.cv.rml$rho
ps.cv.rml$err.ave
ps.cv.rml$err.sd
ps.cv.rml$sel.rho
ps.cv.rml$sel.nz

fp.cv.rml <- ps.cv.rml$sel.fit[,1]
check.cv.rml <- mn.ipw(x, tr, fp.cv.rml)
check.cv.rml$est

### Example 2: Regularized calibrated estimation of propensity scores
ps.cv.rcal <- glm.regu.cv(fold=5, nrho=1+10, y=tr, x=x, loss="cal")

ps.cv.rcal$rho
ps.cv.rcal$err.ave
ps.cv.rcal$err.sd
ps.cv.rcal$sel.rho
ps.cv.rcal$sel.nz

fp.cv.rcal <- ps.cv.rcal$sel.fit[,1]
check.cv.rcal <- mn.ipw(x, tr, fp.cv.rcal)
check.cv.rcal$est
This function implements regularized M-estimation for fitting generalized linear models with binary or continuous responses along a regularization path.
glm.regu.path(nrho = NULL, rho.seq = NULL, y, x, iw = NULL, loss = "cal", n.iter = 100, eps = 1e-06, tune.fac = 0.5, tune.cut = TRUE, ann.init = TRUE, nz.lab = NULL)
nrho |
The number of tuning parameters in a regularization path. |
rho.seq |
A vector of tuning parameters in a regularization path. If both |
y |
An |
x |
An |
iw |
An |
loss |
A loss function, which can be specified as "gaus" for continuous responses, or "ml" or "cal" for binary responses. |
n.iter |
The maximum number of iterations allowed as in |
eps |
The tolerance used to declare convergence as in |
tune.fac |
The multiplier (factor) used to define rho.seq if only |
tune.cut |
Logical; if |
ann.init |
Logical; if |
nz.lab |
A |
If not specified by users, the sequence of tuning parameters (i.e., the regularization path) is defined as a geometric series of length nrho, starting from the value which yields a zero solution, and then decreasing by a factor tune.fac successively.
rho |
The vector of tuning parameters included in the regularization path. |
non.conv |
A vector indicating the non-convergence status found or imputed if |
nz.all |
A vector giving the numbers of nonzero coefficients estimated, along the regularization path. |
bet.all |
A matrix giving estimated intercept and coefficients, column by column, along the regularization path. |
fit.all |
A matrix giving fitted values, column by column, along the regularization path. |
Tan, Z. (2020a) Regularized calibrated estimation of propensity scores with model misspecification and high-dimensional data, Biometrika, 107, 137–158.
Tan, Z. (2020b) Model-assisted inference for treatment effects using regularized calibrated estimation with high-dimensional data, Annals of Statistics, 48, 811–837.
data(simu.data)
n <- dim(simu.data)[1]
p <- dim(simu.data)[2]-2

y <- simu.data[,1]
tr <- simu.data[,2]
x <- simu.data[,2+1:p]
x <- scale(x)

### Example 1: linear regression
out.rgaus.path <- glm.regu.path(rho.seq=c(.01, .02, .05, .1, .2, .5),
                                y=y[tr==1], x=x[tr==1,], loss="gaus")
# the estimated intercept and coefficients; the first 10 are shown
out.rgaus.path$bet.all[1:10,]

### Example 2: logistic regression using likelihood loss
out.rml.path <- glm.regu.path(rho.seq=c(.002, .005, .01, .02, .05, .1),
                              y=tr, x=x, loss="ml")
out.rml.path$bet.all[1:10,]

### Example 3: logistic regression using calibration loss
out.rcal.path <- glm.regu.path(rho.seq=c(.005, .01, .02, .05, .1, .2),
                               y=tr, x=x, loss="cal")
out.rcal.path$bet.all[1:10,]
This function implements augmented inverse probability weighted (IPW) estimation of local average treatment effects (LATEs) as proposed in Tan (2006), provided the fitted instrument propensity scores and fitted values from both treatment and outcome regressions.
late.aipw(y, tr, iv, mfp, mft, mfo, off = NULL)
y |
An |
tr |
An |
iv |
An |
mfp |
An |
mft |
An |
mfo |
An |
off |
A |
The individual expectations of the potential outcomes among compliers are estimated separately, using inverse probability weighting ("ipw"), treatment and outcome regressions ("or"), and augmented IPW methods as proposed in Tan (2006). The population LATE is defined as the difference between these two expectations, that is, the average treatment effect among compliers.
ipw |
A |
or |
A |
est |
A |
var |
The estimated variances associated with the augmented IPW estimates of |
ze |
The z-statistics for the augmented IPW estimates of |
late.est |
The augmented IPW estimate of LATE. |
late.var |
The estimated variance associated with the augmented IPW estimate of LATE. |
late.ze |
The z-statistic for the augmented IPW estimate of LATE, compared to |
Tan, Z. (2006) Regression and weighting methods for causal inference using instrumental variables, Journal of the American Statistical Association, 101, 1607–1618.
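No worked example is given above for late.aipw; the sketch below (assuming the mfp, mft and mfo components returned by late.nreg are in the form expected here) re-applies the augmented IPW estimator to fitted values from late.nreg:

data(simu.iv.data)
p <- dim(simu.iv.data)[2]-3
y <- simu.iv.data[,1]
tr <- simu.iv.data[,2]
iv <- simu.iv.data[,3]
x2 <- scale(simu.iv.data[,3+1:p])[,1:10]   # include only 10 covariates
fit.cal <- late.nreg(y, tr, iv, fx=x2, gx=x2, hx=x2, arm=2, d1=1, d2=3,
                     ploss="cal", yloss="gaus")
out <- late.aipw(y, tr, iv, mfp=fit.cal$mfp, mft=fit.cal$mft, mfo=fit.cal$mfo)
out$late.est  # augmented IPW estimate of LATE
out$late.var  # estimated variance
out$late.ze   # z-statistic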
This function implements model-assisted inference for local average treatment effects, using non-regularized calibrated estimation.
late.nreg(y, tr, iv, fx, gx, hx, arm = 2, d1 = NULL, d2 = NULL, ploss = "cal", yloss = "gaus", off = NULL)
y |
An |
tr |
An |
iv |
An |
fx |
An |
gx |
An |
hx |
An |
arm |
An integer 0, 1 or 2 indicating whether |
d1 |
Degree of truncated polynomials of fitted values from treatment regression to be included as regressors in the outcome regression (NULL: no adjustment, 0: piecewise constant, 1: piecewise linear, etc.). |
d2 |
Number of knots of fitted values from treatment regression to be included as regressors in the outcome regression, with knots specified as the |
ploss |
A loss function used in instrument propensity score estimation (either "ml" for likelihood estimation or "cal" for calibrated estimation). |
yloss |
A loss function used in outcome regression (either "gaus" for continuous outcomes or "ml" for binary outcomes). |
off |
A |
For ploss="cal", calibrated estimation of the instrument propensity score (IPS) and weighted likelihood estimation of the treatment and outcome regression models are performed, similarly as in Sun and Tan (2020), but without regularization.
See also Details for mn.nreg.
ips |
A list containing the results from fitting the instrument propensity score models by |
mfp |
An |
tps |
A list containing the results from fitting the treatment regression models by |
mft |
An |
or |
A list containing the results from fitting the outcome regression models by |
mfo |
An |
est |
A list containing the results from augmented IPW estimation by |
Tan, Z. (2006) Regression and weighting methods for causal inference using instrumental variables, Journal of the American Statistical Association, 101, 1607–1618.
Sun, B. and Tan, Z. (2020) High-dimensional model-assisted inference for local average treatment effects with instrumental variables, arXiv:2009.09286.
data(simu.iv.data)
n <- dim(simu.iv.data)[1]
p <- dim(simu.iv.data)[2]-3

y <- simu.iv.data[,1]
tr <- simu.iv.data[,2]
iv <- simu.iv.data[,3]
x <- simu.iv.data[,3+1:p]
x <- scale(x)

# include only 10 covariates
x2 <- x[,1:10]

late.cal <- late.nreg(y, tr, iv, fx=x2, gx=x2, hx=x2, arm=2, d1=1, d2=3,
                      ploss="cal", yloss="gaus")
matrix(unlist(late.cal$est), ncol=2, byrow=TRUE,
       dimnames=list(c("ipw", "or", "est", "var", "ze",
                       "late.est", "late.var", "late.ze"),
                     c("theta1", "theta0")))
This function implements model-assisted inference for LATEs with instrumental variables, using regularized calibrated estimation based on cross validation.
late.regu.cv(fold, nrho = NULL, rho.seq = NULL, y, tr, iv, fx, gx, hx, arm = 2, d1 = NULL, d2 = NULL, ploss = "cal", yloss = "gaus", off = NULL, ...)
fold |
A vector of length 3 giving the fold numbers for cross validation in instrument propensity score estimation, treatment and outcome regressions respectively. |
nrho |
A vector of length 3 giving the numbers of tuning parameters searched in cross validation. |
rho.seq |
A list of three vectors giving the tuning parameters in instrument propensity score estimation (first vector), treatment (second vector) and outcome (third vector) regressions. |
y |
An |
tr |
An |
iv |
An |
fx |
An |
gx |
An |
hx |
An |
arm |
An integer 0, 1 or 2 indicating whether |
d1 |
Degree of truncated polynomials of fitted values from treatment regression to be included as regressors in the outcome regression (NULL: no adjustment, 0: piecewise constant, 1: piecewise linear etc.). |
d2 |
Number of knots of fitted values from treatment regression to be included as regressors in the outcome regression, with knots specified as the |
ploss |
A loss function used in instrument propensity score estimation (either "ml" for likelihood estimation or "cal" for calibrated estimation). |
yloss |
A loss function used in outcome regression (either "gaus" for continuous outcomes or "ml" for binary outcomes). |
off |
A |
... |
Additional arguments to |
For ploss="cal", regularized calibrated estimation of the instrument propensity score (IPS) and regularized weighted likelihood estimation of the treatment and outcome regression models are performed. The method leads to model-assisted inference for LATE, in which confidence intervals are valid with high-dimensional data if the IPS model is correctly specified, but the treatment and outcome regression models may be misspecified (Sun and Tan 2020). For ploss="ml", regularized maximum likelihood estimation is used (Chernozhukov et al. 2018). In this case, standard errors are only shown to be valid if the IPS, treatment and outcome models are all correctly specified.
ips |
A list containing the results from fitting the instrument propensity score models by |
mfp |
An |
tps |
A list containing the results from fitting the treatment regression models by |
mft |
An |
or |
A list containing the results from fitting the outcome regression models by |
mfo |
An |
est |
A list containing the results from augmented IPW estimation by |
Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W. and Robins, J.M. (2018) Double/debiased machine learning for treatment and structural parameters, The Econometrics Journal, 21, C1–C68.
Sun, B. and Tan, Z. (2020) High-dimensional model-assisted inference for local average treatment effects with instrumental variables, arXiv:2009.09286.
data(simu.iv.data)
n <- dim(simu.iv.data)[1]
p <- dim(simu.iv.data)[2]-3

y <- simu.iv.data[,1]
tr <- simu.iv.data[,2]
iv <- simu.iv.data[,3]
x <- simu.iv.data[,3+1:p]
x <- scale(x)

late.cv.rcal <- late.regu.cv(fold=5*c(1,1,1), nrho=(1+10)*c(1,1,1), rho.seq=NULL,
                             y, tr, iv, fx=x, gx=x, hx=x, arm=2, d1=1, d2=3,
                             ploss="cal", yloss="gaus")
matrix(unlist(late.cv.rcal$est), ncol=2, byrow=TRUE,
       dimnames=list(c("ipw", "or", "est", "var", "ze",
                       "late.est", "late.var", "late.ze"),
                     c("theta1", "theta0")))
This function implements model-assisted inference for local average treatment effects (LATEs) using regularized calibrated estimation along regularization paths for instrument propensity score (IPS) estimation, while based on cross validation for the treatment and outcome regressions.
late.regu.path(fold, nrho = NULL, rho.seq = NULL, y, tr, iv, fx, gx, hx, arm = 2, d1 = NULL, d2 = NULL, ploss = "cal", yloss = "gaus", off = NULL, ...)
fold |
A vector of length 3, with the second and third components giving the fold number for cross validation in the treatment and outcome regressions respectively. The first component is not used. |
nrho |
A vector of length 3 giving the number of tuning parameters in a regularization path for IPS estimation and that in cross validation for the treatment and outcome regressions. |
rho.seq |
A list of three vectors giving the tuning parameters for IPS estimation (first vector), treatment (second vector) and outcome (third vector) regressions. |
y |
An |
tr |
An |
iv |
An |
fx |
An |
gx |
An |
hx |
An |
arm |
An integer 0, 1 or 2 indicating whether |
d1 |
Degree of truncated polynomials of fitted values from treatment regression to be included as regressors in the outcome regression (NULL: no adjustment, 0: piecewise constant, 1: piecewise linear etc.). |
d2 |
Number of knots of fitted values from treatment regression to be included as regressors in the outcome regression, with knots specified as the |
ploss |
A loss function used in instrument propensity score estimation (either "ml" for likelihood estimation or "cal" for calibrated estimation). |
yloss |
A loss function used in outcome regression (either "gaus" for continuous outcomes or "ml" for binary outcomes). |
off |
A |
... |
Additional arguments to |
ips |
A list of 2 objects, giving the results from fitting the IPS models by |
mfp |
A list of 2 matrices of fitted instrument propensity scores, along the IPS regularization path, for |
tps |
A list of 2 lists of objects for |
mft |
A list of 2 matrices of fitted treatment regression models based on cross validation, along the IPS regularization path, for |
or |
A list of 4 lists of objects for |
mfo |
A list of 4 matrices of fitted outcome regression models based on cross validation, along the IPS regularization path, for |
est |
A list containing the results from augmented IPW estimation by |
rho |
A vector of tuning parameters leading to converged results in IPS estimation. |
Sun, B. and Tan, Z. (2020) High-dimensional model-assisted inference for local average treatment effects with instrumental variables, arXiv:2009.09286.
data(simu.iv.data)
n <- dim(simu.iv.data)[1]
p <- dim(simu.iv.data)[2]-3

y <- simu.iv.data[,1]
tr <- simu.iv.data[,2]
iv <- simu.iv.data[,3]
x <- simu.iv.data[,3+1:p]
x <- scale(x)

late.path.rcal <- late.regu.path(fold=5*c(0,1,1), nrho=(1+10)*c(1,1,1), rho.seq=NULL,
                                 y, tr, iv, fx=x, gx=x, hx=x, arm=2, d1=1, d2=3,
                                 ploss="cal", yloss="gaus")
late.path.rcal$est
This function implements augmented inverse probability weighted (IPW) estimation of population means with missing data, provided both fitted propensity scores and fitted values from outcome regression.
mn.aipw(y, tr, fp, fo, off = 0)
y |
An |
tr |
An |
fp |
An |
fo |
An |
off |
An offset value (e.g., the true value in simulations) used to calculate the z-statistic. |
one |
The direct IPW estimate of 1. |
ipw |
The ratio IPW estimate. |
or |
The outcome regression estimate. |
est |
The augmented IPW estimate. |
var |
The estimated variance associated with the augmented IPW estimate. |
ze |
The z-statistic for the augmented IPW estimate, compared to |
Tan, Z. (2020a) Regularized calibrated estimation of propensity scores with model misspecification and high-dimensional data, Biometrika, 107, 137–158.
Tan, Z. (2020b) Model-assisted inference for treatment effects using regularized calibrated estimation with high-dimensional data, Annals of Statistics, 48, 811–837.
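No worked example is given above for mn.aipw; the following sketch (assuming the fp and fo components returned by mn.nreg match the fitted values expected here) re-applies the augmented IPW estimator after fitting the models with mn.nreg:

data(simu.data)
p <- dim(simu.data)[2]-2
y <- simu.data[,1]
tr <- simu.data[,2]
x2 <- scale(simu.data[,2+1:p])[,1:10]   # include only 10 covariates
y[tr==0] <- NA   # outcome missing for the untreated
mn.cal <- mn.nreg(y, tr, x2, ploss="cal", yloss="gaus")
out <- mn.aipw(y, tr, fp=mn.cal$fp, fo=mn.cal$fo)
out$est  # augmented IPW estimate
out$var  # estimated variance
out$ze   # z-statistic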
This function implements inverse probability weighted (IPW) estimation of population means with missing data, provided fitted propensity scores.
mn.ipw(y, tr, fp)
y |
An |
tr |
An |
fp |
An |
The ratio IPW estimate is the direct IPW estimate divided by that with y replaced by a vector of 1s. The latter is referred to as the direct IPW estimate of 1.
one |
The direct IPW estimate of 1. |
est |
The ratio IPW estimate. |
Tan, Z. (2020a) Regularized calibrated estimation of propensity scores with model misspecification and high-dimensional data, Biometrika, 107, 137–158.
Tan, Z. (2020b) Model-assisted inference for treatment effects using regularized calibrated estimation with high-dimensional data, Annals of Statistics, 48, 811–837.
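As an illustration of the ratio IPW estimate described above, the sketch below fits propensity scores by regularized calibrated estimation and then applies mn.ipw; the manual lines at the end recompute the standard direct and ratio IPW expressions and may differ in minor details from the package internals:

data(simu.data)
n <- dim(simu.data)[1]
y <- simu.data[,1]
tr <- simu.data[,2]
x <- scale(simu.data[,-(1:2)])

ps.cv.rcal <- glm.regu.cv(fold=5, nrho=1+10, y=tr, x=x, loss="cal")
fp <- ps.cv.rcal$sel.fit[,1]
mn.ipw(y, tr, fp)

# manual computation for illustration
one.ipw <- sum(tr/fp)/n                # direct IPW estimate of 1
est.ratio <- sum(tr*y/fp)/n / one.ipw  # ratio IPW estimate of the mean
c(one.ipw, est.ratio)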
This function implements model-assisted inference for population means with missing data, using non-regularized calibrated estimation.
mn.nreg(y, tr, x, ploss = "cal", yloss = "gaus", off = 0)
y |
An |
tr |
An |
x |
An |
ploss |
A loss function used in propensity score estimation (either "ml" or "cal"). |
yloss |
A loss function used in outcome regression (either "gaus" for continuous outcomes or "ml" for binary outcomes). |
off |
An offset value (e.g., the true value in simulations) used to calculate the z-statistic from augmented IPW estimation. |
Two steps are involved in this function: first fitting propensity score and outcome regression models and then applying the augmented IPW estimator for a population mean. For ploss="cal", calibrated estimation is performed similarly as in Tan (2020a, 2020b), but without regularization. The method then leads to model-assisted inference, in which confidence intervals are valid if the propensity score model is correctly specified but the outcome regression model may be misspecified. With linear outcome models, the inference is also doubly robust (Kim and Haziza 2014; Vermeulen and Vansteelandt 2015). For ploss="ml", maximum likelihood estimation is used (Robins et al. 1994). In this case, standard errors are in general conservative if the propensity score model is correctly specified but the outcome regression model may be misspecified.
ps |
A list containing the results from fitting the propensity score model by |
fp |
The |
or |
A list containing the results from fitting the outcome regression model by |
fo |
The |
est |
A list containing the results from augmented IPW estimation by |
Kim, J.K. and Haziza, D. (2014) Doubly robust inference with missing data in survey sampling, Statistica Sinica, 24, 375-394.
Robins, J.M., Rotnitzky, A., and Zhao, L.P. (1994) Estimation of regression coefficients when some regressors are not always observed, Journal of the American Statistical Association, 89, 846-866.
Vermeulen, K. and Vansteelandt, S. (2015) Bias-reduced doubly robust estimation, Journal of the American Statistical Association, 110, 1024-1036.
Tan, Z. (2020a) Regularized calibrated estimation of propensity scores with model misspecification and high-dimensional data, Biometrika, 107, 137–158.
Tan, Z. (2020b) Model-assisted inference for treatment effects using regularized calibrated estimation with high-dimensional data, Annals of Statistics, 48, 811–837.
data(simu.data)
n <- dim(simu.data)[1]
p <- dim(simu.data)[2]-2

y <- simu.data[,1]
tr <- simu.data[,2]
x <- simu.data[,2+1:p]
x <- scale(x)

# missing data
y[tr==0] <- NA

# include only 10 covariates
x2 <- x[,1:10]

mn.cal <- mn.nreg(y, tr, x2, ploss="cal", yloss="gaus")
unlist(mn.cal$est)
This function implements model-assisted inference for population means with missing data, using regularized calibrated estimation based on cross validation.
mn.regu.cv(fold, nrho = NULL, rho.seq = NULL, y, tr, x, ploss = "cal", yloss = "gaus", off = 0, ...)
fold |
A vector of length 2 giving the fold numbers for cross validation in propensity score estimation and outcome regression respectively. |
nrho |
A vector of length 2 giving the numbers of tuning parameters searched in cross validation. |
rho.seq |
A list of two vectors giving the tuning parameters in propensity score estimation (first vector) and outcome regression (second vector). |
y |
An |
tr |
An |
x |
An |
ploss |
A loss function used in propensity score estimation (either "ml" or "cal"). |
yloss |
A loss function used in outcome regression (either "gaus" for continuous outcomes or "ml" for binary outcomes). |
off |
An offset value (e.g., the true value in simulations) used to calculate the z-statistic from augmented IPW estimation. |
... |
Additional arguments to |
Two steps are involved in this function: first fitting propensity score and outcome regression models and then applying the augmented IPW estimator for a population mean. For ploss="cal", regularized calibrated estimation is performed with cross validation as described in Tan (2020a, 2020b). The method then leads to model-assisted inference, in which confidence intervals are valid with high-dimensional data if the propensity score model is correctly specified but the outcome regression model may be misspecified. With linear outcome models, the inference is also doubly robust. For ploss="ml", regularized maximum likelihood estimation is used (Belloni et al. 2014; Farrell 2015). In this case, standard errors are only shown to be valid if both the propensity score model and the outcome regression model are correctly specified.
ps |
A list containing the results from fitting the propensity score model by |
fp |
The |
or |
A list containing the results from fitting the outcome regression model by |
fo |
The |
est |
A list containing the results from augmented IPW estimation by |
Belloni, A., Chernozhukov, V., and Hansen, C. (2014) Inference on treatment effects after selection among high-dimensional controls, Review of Economic Studies, 81, 608-650.
Farrell, M.H. (2015) Robust inference on average treatment effects with possibly more covariates than observations, Journal of Econometrics, 189, 1-23.
Tan, Z. (2020a) Regularized calibrated estimation of propensity scores with model misspecification and high-dimensional data, Biometrika, 107, 137–158.
Tan, Z. (2020b) Model-assisted inference for treatment effects using regularized calibrated estimation with high-dimensional data, Annals of Statistics, 48, 811–837.
data(simu.data)
n <- dim(simu.data)[1]
p <- dim(simu.data)[2]-2

y <- simu.data[,1]
tr <- simu.data[,2]
x <- simu.data[,2+1:p]
x <- scale(x)

# missing data
y[tr==0] <- NA

mn.cv.rcal <- mn.regu.cv(fold=5*c(1,1), nrho=(1+10)*c(1,1), rho.seq=NULL,
                         y, tr, x, ploss="cal", yloss="gaus")
unlist(mn.cv.rcal$est)
This function implements model-assisted inference for population means with missing data, using regularized calibrated estimation along a regularization path for propensity score (PS) estimation while based on cross validation for outcome regression (OR).
mn.regu.path(fold, nrho = NULL, rho.seq = NULL, y, tr, x, ploss = "cal", yloss = "gaus", off = 0, ...)
fold |
A vector of length 2, with the second component giving the fold number for cross validation in outcome regression. The first component is not used. |
nrho |
A vector of length 2 giving the number of tuning parameters in a regularization path for PS estimation and that in cross validation for OR. |
rho.seq |
A list of two vectors giving the tuning parameters for propensity score estimation (first vector) and outcome regression (second vector). |
y |
An |
tr |
An |
x |
An |
ploss |
A loss function used in propensity score estimation (either "ml" or "cal"). |
yloss |
A loss function used in outcome regression (either "gaus" for continuous outcomes or "ml" for binary outcomes). |
off |
An offset value (e.g., the true value in simulations) used to calculate the z-statistic from augmented IPW estimation. |
... |
Additional arguments to |
See Details for mn.regu.cv.
ps |
A list containing the results from fitting the propensity score model by |
fp |
The matrix of fitted propensity scores, column by column, along the PS regularization path. |
or |
A list of objects, each giving the results from fitting the outcome regression model by |
fo |
The matrix of fitted values from outcome regression based on cross validation, column by column, along the PS regularization path. |
est |
A list containing the results from augmented IPW estimation by |
rho |
A vector of tuning parameters leading to converged results in propensity score estimation. |
Tan, Z. (2020a) Regularized calibrated estimation of propensity scores with model misspecification and high-dimensional data, Biometrika, 107, 137–158.
Tan, Z. (2020b) Model-assisted inference for treatment effects using regularized calibrated estimation with high-dimensional data, Annals of Statistics, 48, 811–837.
data(simu.data)
n <- dim(simu.data)[1]
p <- dim(simu.data)[2]-2

y <- simu.data[,1]
tr <- simu.data[,2]
x <- simu.data[,2+1:p]
x <- scale(x)

# missing data
y[tr==0] <- NA

mn.path.rcal <- mn.regu.path(fold=5*c(0,1), nrho=(1+10)*c(1,1), y=y, tr=tr, x=x,
                             ploss="cal", yloss="gaus")
mn.path.rcal$est
A dataset simulated as in Tan (2020), Section 4.
data(simu.data)
A data matrix with 800 rows and 202 columns.
The dataset is generated as follows, where y, tr, and x represent an outcome, a treatment, and covariates respectively.
library(MASS)

###
mt0 <- 1-pnorm(-1)
mt1 <- dnorm(-1)
mt2 <- -(2*pnorm(-1)-1)/2 - dnorm(-1) +1/2
mt3 <- 3*dnorm(-1)
mt4 <- -3/2*(2*pnorm(-1)-1) - 4*dnorm(-1) +3/2

m.z1 <- mt0 + 2*mt1 + mt2
v.z1 <- mt0 + 4*mt1 + 6*mt2 + 4*mt3 + mt4
v.z1 <- v.z1 + 1 + 2*(mt1 + 2*mt2 + mt3)
sd.z1 <- sqrt(v.z1 -m.z1^2)

###
set.seed(123)

n <- 800
p <- 200

noise <- rnorm(n)

covm <- matrix(1,p,p)
for (i1 in 1:p)
  for (i2 in 1:p) {
    covm[i1,i2] <- 2^(-abs(i1-i2))
  }
x <- mvrnorm(n, mu=rep(0,p), Sigma=covm)

# transformation
z <- x
for (i in 1:4) {
  z[,i] <- ifelse(x[,i]>-1, x[,i]+(x[,i]+1)^2, x[,i])
  z[,i] <- (z[,i]-m.z1)/sd.z1   # standardized
}

# treatment
eta <- 1+ c( z[,1:4] %*% c(1, .5, .25, .125) )
tr <- rbinom(n, size=1, prob=expit(eta))

# outcome
eta.y <- c( z[,1:4] %*% c(1, .5, .25, .125) )
y <- eta.y + noise

# save; if using main effects of x, then both the propensity score
# and outcome regression models are misspecified
simu.data <- cbind(y, tr, x)
save(simu.data, file="simu.data.rda")
Tan, Z. (2020) Model-assisted inference for treatment effects using regularized calibrated estimation with high-dimensional data, Annals of Statistics, 48, 811–837.
A dataset simulated as in Sun and Tan (2020), Section 4.
data(simu.iv.data)
A data matrix with 800 rows and 203 columns.
The dataset is generated as follows, where y, iv, tr, and x represent an outcome, an instrumental variable, a treatment, and covariates respectively.
g <- function(z) {
  1/(1+exp(z/b))^2*dnorm(z)
}

rnorm.trunct <- function(n, mu, sig, lft, rgt) {
  x <- rep(0,n)
  for (i in 1:n) {
    x[i] <- rnorm(1,mu,sig)
    while (x[i]<=lft | x[i]>rgt)
      x[i] <- rnorm(1,mu,sig)
  }
  return(x)
}

### covariate mean and variance computed as in preprint of Tan (2020)
a <- 2.5; c <- 2*pnorm(a)-1; b <- sqrt(1-2*a*dnorm(a)/c)

m1 <- exp(1/(8*b^2))*(pnorm(a-1/(2*b))-pnorm(-a-1/(2*b)))/c
v1 <- exp(1/(2*b^2))*(pnorm(a-1/b)-pnorm(-a-1/b))/c-m1^2

m2 <- 10; v2 <- 1/c*integrate(g,-a,a)$value  # by numerical integration

m3 <- 3/(25^2)*0.6+(0.6)^3
mu4 <- (1/(b^4*c))*((3/2*(2*pnorm(a)-1)-a*(a^2+3)*dnorm(a))
                    -(3/2*(2*pnorm(-a)-1)-(-a)*((-a)^2+3)*dnorm(-a)))
mu6 <- (1/(b^6*c))*((15/2*(2*pnorm(a)-1)-a*(a^4+5*a^2+15)*dnorm(a))
                    -(15/2*(2*pnorm(-a)-1)-(-a)*((-a)^4+5*(-a)^2+15)*dnorm(-a)))
v3 <- mu6^2/25^6+15*mu4^2/25^4*0.6^2+15/25^2*0.6^4+0.6^6-m3^2

m4 <- 2+20^2; v4 <- (2*mu4+6)+6*2*20^2+20^4-m4^2

###
set.seed(120)

n <- 800
p <- 200

# covariates
x <- matrix(rnorm.trunct(p*n, 0, 1, -a, a),n,p)/b

# transformation
z <- x
z[,1] <- (exp(0.5*x[,1])-m1)/sqrt(v1)
z[,2] <- (10+x[,2]/(1+exp(x[,1]))-m2)/sqrt(v2)
z[,3] <- ((0.04*x[,1]*x[,3]+0.6)^3-m3)/sqrt(v3)
z[,4] <- ((x[,2]+x[,4]+20)^2-m4)/sqrt(v4)

# instrumental variable
eta <- z[,1:4]
iv <- rbinom(n,1,prob=expit(eta))

# unmeasured confounder in latent index model
u <- rlogis(n, location = 0, scale = 1)

# treatment
eta.d <- 1+cbind(iv,z[,1:4])
tr <- as.numeric(eta.d >=u)

# outcome
late <- 1
eta.y <- late*tr +z[,1:4]
y <- rnorm(n, mean=eta.y, sd=1)

# save; if using main effects of x, then both the instrument propensity score
# and outcome models are misspecified
simu.iv.data <- cbind(y,tr,iv,x)
save(simu.iv.data, file="simu.iv.data.rda")
Tan, Z. (2020) Model-assisted inference for treatment effects using regularized calibrated estimation with high-dimensional data, Annals of Statistics, 48, 811–837.
Sun, B. and Tan, Z. (2020) High-dimensional model-assisted inference for local average treatment effects with instrumental variables, arXiv:2009.09286.