Title: | Cross-Fitting for Doubly Robust Evaluation of High-Dimensional Surrogate Markers |
---|---|
Description: | Doubly robust methods for evaluating surrogate markers as outlined in: Agniel D, Hejblum BP, Thiebaut R & Parast L (2022). "Doubly robust evaluation of high-dimensional surrogate markers", Biostatistics <doi:10.1093/biostatistics/kxac020>. You can use these methods to determine how much of the overall treatment effect is explained by a (possibly high-dimensional) set of surrogate markers. |
Authors: | Denis Agniel [aut, cre], Boris P. Hejblum [aut] |
Maintainer: | Denis Agniel <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.1.1 |
Built: | 2024-12-12 07:05:01 UTC |
Source: | CRAN |
A simple function to simulate example data.
sim_data(n, p)
n | number of simulated observations |
p | number of simulated variables |
A toy dataset used for demonstrating the methods, with outcome y, treatment a, covariates x.1, x.2, and surrogates s.1, s.2, ...
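The column naming scheme described above makes it easy to build the name vectors that the estimation functions expect. The following is a minimal sketch in base R; the data frame `toy` is a hypothetical stand-in with the same naming scheme, not actual output from sim_data, and the variable names `x_names` and `s_names` are illustrative.

```r
set.seed(1)

# Hypothetical stand-in with the documented column layout:
# outcome y, treatment a, covariates x.1, x.2, surrogates s.1, ..., s.p.
p <- 3
toy <- data.frame(
  y = rnorm(5),
  a = rbinom(5, 1, 0.5),
  x.1 = rnorm(5),
  x.2 = rnorm(5),
  matrix(rnorm(5 * p), ncol = p, dimnames = list(NULL, paste0("s.", 1:p)))
)

# Select covariate and surrogate names by their prefixes.
x_names <- grep("^x\\.", names(toy), value = TRUE)
s_names <- grep("^s\\.", names(toy), value = TRUE)

x_names
s_names
```

Selecting names by prefix avoids hard-coding the number of surrogates, which is convenient when p is large.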
A function for estimating the proportion of treatment effect explained using cross-fitting.
xf_surrogate( ds, x = NULL, s, y, a, K = 5, outcome_learners = NULL, ps_learners = outcome_learners, interaction_model = TRUE, trim_at = 0.05, outcome_family = gaussian(), mthd = "superlearner", n_ptb = 0, ncores = parallel::detectCores() - 1, ... )
ds | a data frame |
x | names of all covariates in ds. Default is NULL |
s | names of surrogates in ds |
y | name of the outcome in ds |
a | treatment variable name (e.g., groups). Expects a binary variable made of 0s and 1s |
K | number of folds for cross-fitting. Default is 5 |
outcome_learners | string vector indicating learners to be used for estimation of the outcome function (e.g., "SL.ridge"). Default is NULL |
ps_learners | string vector indicating learners to be used for estimation of the propensity score function (e.g., "SL.glm"). Default is outcome_learners |
interaction_model | logical indicating whether outcome functions for treated and control should be estimated separately. Default is TRUE |
trim_at | threshold at which to trim propensity scores. Default is 0.05 |
outcome_family | family of the outcome model. Default is gaussian() |
mthd | selected regression method. Default is "superlearner"; the examples also use "lasso" |
n_ptb | number of perturbations. Default is 0 |
ncores | number of CPUs used for parallel computations. Default is parallel::detectCores() - 1 |
... | additional parameters (in particular for super_learner) |
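To make the roles of K and trim_at more concrete, here is a minimal base R sketch of random fold assignment and propensity score trimming. It is an illustration only and not the package's internal code: the clipping rule (forcing scores into [trim_at, 1 - trim_at]) is an assumed, commonly used convention, and `fold`, `ps_hat`, and `ps_trimmed` are hypothetical names.

```r
set.seed(1)
n <- 12
K <- 4
trim_at <- 0.05

# Randomly assign each observation to one of K folds of equal size.
fold <- sample(rep(seq_len(K), length.out = n))

# Hypothetical estimated propensity scores, some of them extreme.
ps_hat <- c(0.01, 0.20, 0.50, 0.97, 0.03, 0.60,
            0.99, 0.40, 0.75, 0.10, 0.90, 0.35)

# Assumed trimming rule: clip scores into [trim_at, 1 - trim_at].
ps_trimmed <- pmin(pmax(ps_hat, trim_at), 1 - trim_at)

table(fold)
range(ps_trimmed)
```

In cross-fitting, nuisance models are fit on the observations outside a fold and evaluated on the observations inside it, so every observation receives an out-of-sample prediction.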
a tibble with columns:
R: estimate of the proportion of treatment effect explained, equal to 1 - deltahat_s/deltahat.
R_se: standard error for the PTE.
deltahat_s: residual treatment effect estimate.
deltahat_s_se: standard error for the residual treatment effect.
pi_o: estimate of the proportion of overlap.
R_o: PTE only in the overlap region.
R_o_se: the standard error for R_o.
deltahat_s_o: residual treatment effect in overlap region.
deltahat_s_se_o: standard error for deltahat_s_o.
deltahat: overall treatment effect estimate.
deltahat_se: standard error for overall treatment effect estimate.
delta_diff: difference between the treatment effects, equal to the numerator of PTE.
dd_se: standard error for delta_diff.
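The relationship between R, deltahat, deltahat_s, and delta_diff can be checked with a little arithmetic. The numbers below are hypothetical and chosen only to illustrate the formulas stated above; they are not output from the package.

```r
# Hypothetical values, for illustration only.
deltahat   <- 2.0  # overall treatment effect
deltahat_s <- 0.5  # residual treatment effect

# Numerator of the PTE: difference between the two treatment effects.
delta_diff <- deltahat - deltahat_s

# Proportion of treatment effect explained.
R <- 1 - deltahat_s / deltahat

# Both expressions for the PTE agree.
stopifnot(isTRUE(all.equal(R, delta_diff / deltahat)))
R  # 0.75
```

A value of R near 1 means the surrogates capture most of the treatment effect, while a value near 0 means most of the effect remains after accounting for them.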
# Simulate a toy dataset with p surrogates and q covariates.
n <- 300
p <- 50
q <- 2
wds <- sim_data(n = n, p = p)

if (interactive()) {
  # Cross-fitted estimate using SuperLearner for the nuisance functions.
  sl_est <- xf_surrogate(
    ds = wds,
    x = paste('x.', 1:q, sep = ''),
    s = paste('s.', 1:p, sep = ''),
    a = 'a',
    y = 'y',
    K = 4,
    trim_at = 0.01,
    mthd = 'superlearner',
    outcome_learners = c("SL.mean", "SL.lm", "SL.svm", "SL.ridge"),
    ps_learners = c("SL.mean", "SL.glm", "SL.svm", "SL.lda"),
    ncores = 1
  )

  # Cross-fitted estimate using the lasso.
  lasso_est <- xf_surrogate(
    ds = wds,
    x = paste('x.', 1:q, sep = ''),
    s = paste('s.', 1:p, sep = ''),
    a = 'a',
    y = 'y',
    K = 4,
    trim_at = 0.01,
    mthd = 'lasso',
    ncores = 1
  )
}
A function for estimating the proportion of treatment effect explained using repeated cross-fitting.
xfr_surrogate( ds, x = NULL, s, y, a, splits = 50, K = 5, outcome_learners = NULL, ps_learners = NULL, interaction_model = TRUE, trim_at = 0.05, outcome_family = gaussian(), mthd = "superlearner", n_ptb = 0, ... )
ds | a data frame |
x | names of all covariates in ds. Default is NULL |
s | names of surrogates in ds |
y | name of the outcome in ds |
a | treatment variable name (e.g., groups). Expects a binary variable made of 0s and 1s |
splits | number of data splits to perform. Default is 50 |
K | number of folds for cross-fitting. Default is 5 |
outcome_learners | string vector indicating learners to be used for estimation of the outcome function (e.g., "SL.ridge"). Default is NULL |
ps_learners | string vector indicating learners to be used for estimation of the propensity score function (e.g., "SL.glm"). Default is NULL |
interaction_model | logical indicating whether outcome functions for treated and control should be estimated separately. Default is TRUE |
trim_at | threshold at which to trim propensity scores. Default is 0.05 |
outcome_family | family of the outcome model. Default is gaussian() |
mthd | selected regression method. Default is "superlearner"; the examples also use "lasso" |
n_ptb | number of perturbations. Default is 0 |
... | additional parameters (in particular for super_learner) |
a tibble with columns:
Rm: estimate of the proportion of treatment effect explained, computed as the median over the repeated splits.
R_se0: standard error for the PTE, accounting for the variability due to splitting.
R_cil0: lower confidence interval value for the PTE.
R_cih0: upper confidence interval value for the PTE.
Dm: estimate of the overall treatment effect, computed as the median over the repeated splits.
D_se0: standard error for the overall treatment effect, accounting for the variability due to splitting.
D_cil0: lower confidence interval value for the overall treatment effect.
D_cih0: upper confidence interval value for the overall treatment effect.
Dsm: estimate of the residual treatment effect, computed as the median over the repeated splits.
Ds_se0: standard error for the residual treatment effect, accounting for the variability due to splitting.
Ds_cil0: lower confidence interval value for the residual treatment effect.
Ds_cih0: upper confidence interval value for the residual treatment effect.
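The median aggregation over repeated splits can be sketched in a few lines of base R. The per-split estimates and standard errors below are hypothetical. The standard error adjustment shown (median of each split's variance plus its squared deviation from the median estimate) is one commonly used way to account for splitting variability; it is an assumption here and may differ from what the package computes for R_se0.

```r
# Hypothetical per-split PTE estimates and standard errors (5 splits).
R_split  <- c(0.70, 0.74, 0.68, 0.77, 0.72)
se_split <- c(0.05, 0.06, 0.05, 0.07, 0.06)

# Point estimate: median over the repeated splits.
Rm <- median(R_split)

# Assumed adjustment: add each split's squared deviation from the median
# to its variance, take the median, and convert back to a standard error.
se_adj <- sqrt(median(se_split^2 + (R_split - Rm)^2))

Rm      # 0.72
se_adj
```

Using the median rather than the mean makes the aggregate less sensitive to an occasional unlucky split.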
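# Simulate a small toy dataset with p surrogates and q covariates.
n <- 100
p <- 20
q <- 2
wds <- sim_data(n = n, p = p)

if (interactive()) {
  # Repeated cross-fitting with the lasso; few splits and folds for speed.
  lasso_est <- xfr_surrogate(
    ds = wds,
    x = paste('x.', 1:q, sep = ''),
    s = paste('s.', 1:p, sep = ''),
    a = 'a',
    y = 'y',
    splits = 2,
    K = 2,
    trim_at = 0.01,
    mthd = 'lasso',
    ncores = 1
  )
}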