Title: Precision Medicine
Description: A doubly robust precision medicine approach to fit, cross-validate, and visualize prediction models for the conditional average treatment effect (CATE). It implements the doubly robust estimation and semiparametric modeling approach for treatment-covariate interactions proposed by Yadlowsky et al. (2020) <doi:10.1080/01621459.2020.1772080>.
Authors: Lu Tian [aut], Xiaotong Jiang [aut], Gabrielle Simoneau [aut], Biogen MA Inc. [cph], Thomas Debray [ctb, cre], Stan Wijn [ctb], Joana Caldas [ctb]
Maintainer: Thomas Debray <[email protected]>
License: Apache License (== 2.0)
Version: 1.1.0
Built: 2024-12-05 07:10:23 UTC
Source: CRAN
"precmed"
objectCompute the area between curves (ABC) for each scoring method in the "precmed"
object.
This should be run only after results of catecv()
have been obtained.
abc(x, ...)

## Default S3 method:
abc(x, ...)

## S3 method for class 'precmed'
abc(x, ...)
x: An object of class "precmed".

...: Additional arguments (currently unused).
The ABC is the area between a validation curve and the overall ATE in the validation set.
It is calculated for each scoring method separately. Higher ABC values are preferable as they
indicate that more treatment effect heterogeneity is captured by the scoring method.
Negative values of ABC are possible if segments of the validation curve cross the overall ATE line.
The ABC is calculated with the auc() function in utility.R using natural cubic spline interpolation. The calculation of the ABC is always based on validation curves evaluated at 100 proportions equally spaced from min(prop.cutoff) to max(prop.cutoff).
The ABC is a metric to help users select the best scoring method in terms of capturing treatment effect heterogeneity in the data. It should be used as a complement to visual inspection of the validation curves in the validation set produced by plot().
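The idea can be sketched outside the package with the auc() helper documented further down this page (use precmed:::auc() if it is not exported). The proportions, curve, and overall ATE below are made-up illustrations, and the sign orientation used internally additionally depends on higher.y; this is a conceptual sketch, not the package's internal computation.

# Conceptual sketch of an ABC-type quantity: signed area between a
# hypothetical validation curve and the overall ATE reference line.
prop <- seq(0.5, 1, length.out = 100)      # 100 equally spaced proportions
ate_curve <- 0.8 + 0.4 * (1 - prop)        # hypothetical subgroup ATEs along the curve
ate_overall <- 0.8                         # hypothetical overall ATE in the validation set
auc(x = prop, y = ate_curve - ate_overall, type = "spline")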
Returns a matrix of numeric values with the number of columns equal to the number of cross-validation iterations and the number of rows equal to the number of scoring methods in x.
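Averaging this matrix across its columns gives one ABC summary per scoring method; a minimal sketch, assuming cv_count is the "precmed" object created in the examples below.

# Rows are scoring methods, columns are CV iterations
abc_mat <- abc(cv_count)
round(rowMeans(abc_mat, na.rm = TRUE), 3)   # mean ABC per scoring method
# The method with the largest mean ABC captures the most heterogeneity;
# confirm against plot(cv_count) before choosing a method.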
Zhao, L., Tian, L., Cai, T., Claggett, B., & Wei, L. J. (2013). Effectively selecting a target population for a future comparative study. Journal of the American Statistical Association, 108(502), 527-539.
See also: the catecv() function and the plot() and boxplot() methods for "precmed" objects.
# Count outcome
cv_count <- catecv(response = "count",
                   data = countExample,
                   score.method = "poisson",
                   cate.model = y ~ age + female + previous_treatment +
                     previous_cost + previous_number_relapses + offset(log(years)),
                   ps.model = trt ~ age + previous_treatment,
                   higher.y = FALSE,
                   cv.n = 5,
                   verbose = 1)

# ABC of the validation curves for each method and each CV iteration
abc(cv_count)

# Survival outcome
library(survival)
cv_surv <- catecv(response = "survival",
                  data = survivalExample,
                  score.method = c("poisson", "randomForest"),
                  cate.model = Surv(y, d) ~ age + female + previous_cost +
                    previous_number_relapses,
                  ps.model = trt ~ age + previous_treatment,
                  higher.y = FALSE,
                  cv.n = 5)

# ABC of the validation curves for each method and each CV iteration
abc(cv_surv)
"precmed"
objectCompute the area between curves (ABC) for each scoring method in the "precmed"
object.
This should be run only after results of catecv()
have been obtained.
## S3 method for class 'precmed' abc(x, ...)
## S3 method for class 'precmed' abc(x, ...)
x |
An object of class |
... |
Additional arguments (currently unused). |
The ABC is the area between a validation curve and the overall ATE in the validation set.
It is calculated for each scoring method separately. Higher ABC values are preferable as they
indicate that more treatment effect heterogeneity is captured by the scoring method.
Negative values of ABC are possible if segments of the validation curve cross the overall ATE line.
The ABC is calculated with the auc()
in utility.R
with a natural
cubic spline interpolation. The calculation of the ABC is always based on validation curves based on
100 proportions equally spaced from min(prop.cutoff)
to max(prop.cutoff)
.
The ABC is a metric to help users select the best scoring method in terms of capturing treatment
effect heterogeneity in the data. It should be used in complement to the visual inspection of
the validation curves in the validation set in plot()
.
Returns a matrix of numeric values with number of columns equal to the number cross-validation
iteration and number of rows equal to the number of scoring methods in x
.
Zhao, L., Tian, L., Cai, T., Claggett, B., & Wei, L. J. (2013). Effectively selecting a target population for a future comparative study. Journal of the American Statistical Association, 108(502), 527-539.
catecv()
function and plot()
, boxplot()
methods for
"precmed"
objects.
# Count outcome cv_count <- catecv(response = "count", data = countExample, score.method = "poisson", cate.model = y ~ age + female + previous_treatment + previous_cost + previous_number_relapses + offset(log(years)), ps.model = trt ~ age + previous_treatment, higher.y = FALSE, cv.n = 5, verbose = 1) # ABC of the validation curves for each method and each CV iteration abc(cv_count) # Survival outcome library(survival) cv_surv <- catecv(response = "survival", data = survivalExample, score.method = c("poisson", "randomForest"), cate.model = Surv(y, d) ~ age + female + previous_cost + previous_number_relapses, ps.model = trt ~ age + previous_treatment, higher.y = FALSE, cv.n = 5) # ABC of the validation curves for each method and each CV iteration abc(cv_surv)
# Count outcome cv_count <- catecv(response = "count", data = countExample, score.method = "poisson", cate.model = y ~ age + female + previous_treatment + previous_cost + previous_number_relapses + offset(log(years)), ps.model = trt ~ age + previous_treatment, higher.y = FALSE, cv.n = 5, verbose = 1) # ABC of the validation curves for each method and each CV iteration abc(cv_count) # Survival outcome library(survival) cv_surv <- catecv(response = "survival", data = survivalExample, score.method = c("poisson", "randomForest"), cate.model = Surv(y, d) ~ age + female + previous_cost + previous_number_relapses, ps.model = trt ~ age + previous_treatment, higher.y = FALSE, cv.n = 5) # ABC of the validation curves for each method and each CV iteration abc(cv_surv)
Check arguments of pmcount(), cvcount(), drcount.inference(), catefitsurv(), catecvsurv(), and drsurv.inference()

Check arguments, catered to all types of outcome. Applied at the beginning of pmcount(), cvcount(), drcount.inference(), catefitsurv(), catecvsurv(), and drsurv.inference().
arg.checks( fun, response, data, followup.time = NULL, tau0 = NULL, surv.min = NULL, ipcw.method = NULL, ps.method, minPS, maxPS, higher.y = NULL, score.method = NULL, abc = NULL, prop.cutoff = NULL, prop.multi = NULL, train.prop = NULL, cv.n = NULL, error.max = NULL, max.iter = NULL, initial.predictor.method = NULL, tree.depth = NULL, n.trees.rf = NULL, n.trees.boosting = NULL, B = NULL, Kfold = NULL, plot.gbmperf = NULL, error.maxNR = NULL, max.iterNR = NULL, tune = NULL, n.boot = NULL, plot.boot = NULL, interactions = NULL )
fun |
A function for which argument check is needed; "catefit" for |
response |
The type of response. Always 'survival' for this function. |
data |
A data frame containing the variables in the outcome and propensity score models;
a data frame with |
followup.time |
Follow-up time, interpreted as the potential censoring time. If the potential censoring time is known,
followup.time is the name of a corresponding column in the data. Otherwise, set |
tau0 |
The truncation time for defining restricted mean time lost. |
surv.min |
Lower truncation limit for probability of being censored (positive and very close to 0). |
ipcw.method |
The censoring model. Allowed values are: |
ps.method |
A character value for the method to estimate the propensity score.
Allowed values include one of:
|
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
higher.y |
A logical value indicating whether higher ( |
score.method |
A vector of one or multiple methods to estimate the CATE score.
Allowed values are: |
abc |
A logical value indicating whether the area between curves (ABC) should be calculated
at each cross-validation iterations, for each |
prop.cutoff |
A vector of numerical values (in '(0, 1]') specifying percentiles of the
estimated log CATE scores to define nested subgroups. Each element represents the cutoff to
separate observations in nested subgroups (below vs above cutoff).
The length of |
prop.multi |
A vector of numerical values (in '[0, 1]') specifying percentiles of the
estimated log CATE scores to define mutually exclusive subgroups.
It should start with 0, end with 1, and be of |
train.prop |
A numerical value (in '(0, 1)') indicating the proportion of total data used
for training. Default is |
cv.n |
A positive integer value indicating the number of cross-validation iterations.
Default is |
error.max |
A numerical value > 0 indicating the tolerance (maximum value of error)
for the largest standardized absolute difference in the covariate distributions or in the
doubly robust estimated rate ratios between the training and validation sets. This is used
to define a balanced training-validation splitting. Default is |
max.iter |
A positive integer value indicating the maximum number of iterations when
searching for a balanced training-validation split. Default is |
initial.predictor.method |
A character vector for the method used to get initial
outcome predictions conditional on the covariates in |
tree.depth |
A positive integer specifying the depth of individual trees in boosting
(usually 2-3). Used only if |
n.trees.rf |
A positive integer specifying the maximum number of trees in random forest.
Used if |
n.trees.boosting |
A positive integer specifying the maximum number of trees in boosting
(usually 100-1000). Used if |
B |
A positive integer specifying the number of time cross-fitting is repeated in
|
Kfold |
A positive integer specifying the number of folds (parts) used in cross-fitting
to partition the data in |
plot.gbmperf |
A logical value indicating whether to plot the performance measures in
boosting. Used only if |
error.maxNR |
A numerical value > 0 indicating the minimum value of the mean absolute
error in Newton Raphson algorithm. Used only if |
max.iterNR |
A positive integer indicating the maximum number of iterations in the
Newton Raphson algorithm. Used only if |
tune |
A vector of 2 numerical values > 0 specifying tuning parameters for the
Newton Raphson algorithm. |
n.boot |
A numeric value indicating the number of bootstrap samples used. This is only relevant
if |
plot.boot |
A logic value indicating whether histograms of the bootstrapped log(rate ratio) should
be produced at every |
interactions |
A logical value indicating whether the outcome model should assume interactions
between |
Nothing. Will stop if arguments are incorrect.
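The validator follows the usual fail-fast pattern: each argument is tested against its allowed type or range and stop() is raised at the first violation, otherwise nothing is returned. A minimal sketch of that pattern (illustrative only; the check and message below are not the package's actual code):

# Illustrative fail-fast check in the style of arg.checks() (not the actual code)
check_ps_bounds <- function(minPS, maxPS) {
  if (!is.numeric(minPS) || minPS < 0 || minPS > 1)
    stop("minPS must be a numerical value in [0, 1].")
  if (!is.numeric(maxPS) || maxPS <= minPS || maxPS > 1)
    stop("maxPS must be in (0, 1] and strictly greater than minPS.")
  invisible(NULL)   # return nothing when all checks pass
}
check_ps_bounds(minPS = 0.01, maxPS = 0.99)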
Check arguments that are common to all types of outcome

Check arguments that are common to all types of outcome. Used inside arg.checks().
arg.checks.common( fun, ps.method, minPS, maxPS, higher.y = NULL, abc = NULL, prop.cutoff = NULL, prop.multi = NULL, B = NULL, Kfold = NULL, plot.gbmperf = NULL, tree.depth = NULL, n.trees.boosting = NULL, error.maxNR = NULL, max.iterNR = NULL, tune = NULL, train.prop = NULL, cv.n = NULL, error.max = NULL, max.iter = NULL, n.boot = NULL, plot.boot = NULL )
fun |
A function for which argument check is needed; "pm" for |
ps.method |
A character value for the method to estimate the propensity score.
Allowed values include one of:
|
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
higher.y |
A logical value indicating whether higher ( |
abc |
A logical value indicating whether the area between curves (ABC) should be calculated
at each cross-validation iterations, for each |
prop.cutoff |
A vector of numerical values (in '(0, 1]') specifying percentiles of the
estimated log CATE scores to define nested subgroups. Each element represents the cutoff to
separate observations in nested subgroups (below vs above cutoff).
The length of |
prop.multi |
A vector of numerical values (in '[0, 1]') specifying percentiles of the
estimated log CATE scores to define mutually exclusive subgroups.
It should start with 0, end with 1, and be of |
B |
A positive integer specifying the number of time cross-fitting is repeated in
|
Kfold |
A positive integer specifying the number of folds (parts) used in cross-fitting
to partition the data in |
plot.gbmperf |
A logical value indicating whether to plot the performance measures in
boosting. Used only if |
tree.depth |
A positive integer specifying the depth of individual trees in boosting
(usually 2-3). Used only if |
n.trees.boosting |
A positive integer specifying the maximum number of trees in boosting
(usually 100-1000). Used only if |
error.maxNR |
A numerical value > 0 indicating the minimum value of the mean absolute
error in Newton Raphson algorithm. Used only if |
max.iterNR |
A positive integer indicating the maximum number of iterations in the
Newton Raphson algorithm. Used only if |
tune |
A vector of 2 numerical values > 0 specifying tuning parameters for the
Newton Raphson algorithm. |
train.prop |
A numerical value (in '(0, 1)') indicating the proportion of total data used
for training. Default is |
cv.n |
A positive integer value indicating the number of cross-validation iterations.
Default is |
error.max |
A numerical value > 0 indicating the tolerance (maximum value of error)
for the largest standardized absolute difference in the covariate distributions or in the
doubly robust estimated rate ratios between the training and validation sets. This is used
to define a balanced training-validation splitting. Default is |
max.iter |
A positive integer value indicating the maximum number of iterations when
searching for a balanced training-validation split. Default is |
n.boot |
A numeric value indicating the number of bootstrap samples used. This is only relevant
if |
plot.boot |
A logic value indicating whether histograms of the bootstrapped log(rate ratio) should
be produced at every |
Nothing. Will stop if arguments are incorrect.
Doubly robust estimator of the average treatment effect between two treatments, which is the rate ratio for count outcomes, the restricted mean time lost ratio for survival outcomes, and the mean difference for continuous outcomes. The bootstrap is used for inference.
atefit( response, data, cate.model, ps.model, ps.method = "glm", ipcw.model = NULL, ipcw.method = "breslow", minPS = 0.01, maxPS = 0.99, followup.time = NULL, tau0 = NULL, surv.min = 0.025, interactions = TRUE, n.boot = 500, seed = NULL, verbose = 0 )
response |
A string describing the type of outcome in the data. Allowed values include
"count" (see |
data |
A data frame containing the variables in the outcome, propensity score, and inverse
probability of censoring models (if specified); a data frame with |
cate.model |
A formula describing the outcome model to be fitted.
The outcome must appear on the left-hand side. For survival outcomes,
a |
ps.model |
A formula describing the propensity score (PS) model to be fitted. The treatment must
appear on the left-hand side. The treatment must be a numeric vector coded as 0/1.
If data are from a randomized controlled trial, specify |
ps.method |
A character value for the method to estimate the propensity score.
Allowed values include one of:
|
ipcw.model |
A formula describing the inverse probability of censoring weighting (IPCW)
model to be fitted. The left-hand side must be empty. Only applies for survival outcomes.
Default is |
ipcw.method |
A character value for the censoring model. Only applies for survival
outcomes. Allowed values are: |
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
followup.time |
A column name in |
tau0 |
The truncation time for defining restricted mean time lost. Only applies for
survival outcomes. Default is |
surv.min |
Lower truncation limit for the probability of being censored.
It must be a positive value and should be chosen close to 0. Only applies for survival
outcomes. Default is |
interactions |
A logical value indicating whether the outcome model should assume interactions
between |
n.boot |
A numeric value indicating the number of bootstrap samples used. Default is |
seed |
An optional integer specifying an initial randomization seed for reproducibility.
Default is |
verbose |
An integer value indicating whether intermediate progress messages and histograms should
be printed. |
For a count response, see the details in atefitcount(). For a survival response, see the details in atefitsurv().

For a count response, see the description of outputs in atefitcount(). For a survival response, see the description of outputs in atefitsurv().
# Count outcome
output <- atefit(response = "count",
                 data = countExample,
                 cate.model = y ~ age + female + previous_treatment +
                   previous_cost + previous_number_relapses + offset(log(years)),
                 ps.model = trt ~ age + previous_treatment,
                 n.boot = 50,
                 seed = 999)
output
plot(output)

# Survival outcome
tau0 <- with(survivalExample,
             min(quantile(y[trt == "drug1"], 0.95),
                 quantile(y[trt == "drug0"], 0.95)))

output2 <- atefit(response = "survival",
                  data = survivalExample,
                  cate.model = survival::Surv(y, d) ~ age + female +
                    previous_cost + previous_number_relapses,
                  ps.model = trt ~ age + previous_treatment,
                  tau0 = tau0,
                  seed = 999)
output2
plot(output2)
Doubly robust estimator of the average treatment effect between two treatments, which is the rate ratio for count outcomes. Bootstrap is used for inference.
atefitcount( data, cate.model, ps.model, ps.method = "glm", minPS = 0.01, maxPS = 0.99, interactions = TRUE, n.boot = 500, seed = NULL, verbose = 0 )
data |
A data frame containing the variables in the outcome, propensity
score, and inverse probability of censoring models (if specified); a data
frame with |
cate.model |
A formula describing the outcome model to be fitted. The outcome must appear on the left-hand side. |
ps.model |
A formula describing the propensity score (PS) model to be
fitted. The treatment must appear on the left-hand side. The treatment must
be a numeric vector coded as 0 or 1. If data are from a randomized controlled
trial, specify |
ps.method |
A character value for the method to estimate the propensity
score. Allowed values include one of: |
minPS |
A numerical value between 0 and 1 below which estimated
propensity scores should be truncated. Default is |
maxPS |
A numerical value between 0 and 1 above which estimated
propensity scores should be truncated. Must be strictly greater than
|
interactions |
A logical value indicating whether the outcome model
should assume treatment-covariate interaction by |
n.boot |
A numeric value indicating the number of bootstrap samples
used. Default is |
seed |
An optional integer specifying an initial randomization seed for
reproducibility. Default is |
verbose |
An integer value indicating whether intermediate progress
messages should be printed. |
This helper function estimates the average treatment effect (ATE) between two treatment groups in a given dataset. The ATE is estimated with a doubly robust estimator that accounts for imbalances in covariate distributions between the two treatment groups with inverse probability treatment weighting. For count outcomes, the estimated ATE is the estimated rate ratio between treatment 1 versus treatment 0.
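The doubly robust construction combines per-arm outcome regressions with inverse probability of treatment weighting, so the ATE remains consistent if either model is correctly specified. The sketch below illustrates a generic augmented IPTW rate-ratio estimator for a count outcome; it is not the exact estimator implemented in atefitcount(), it ignores the person-years offset for brevity, and it assumes countExample with trt coded drug0/drug1 as in the examples.

# Generic augmented IPTW (doubly robust) sketch for a count outcome:
# per-arm Poisson outcome models combined with a logistic propensity model.
# Illustration only, not the estimator implemented in atefitcount().
dat <- countExample
dat$trt01 <- as.integer(dat$trt == "drug1")   # assumes trt is coded drug0/drug1; adapt if already 0/1

ps_fit <- glm(trt01 ~ age + previous_treatment, family = binomial, data = dat)
ps <- pmin(pmax(predict(ps_fit, type = "response"), 0.01), 0.99)   # truncate like minPS/maxPS

out1 <- glm(y ~ age + female + previous_number_relapses, family = poisson,
            data = dat, subset = trt01 == 1)
out0 <- glm(y ~ age + female + previous_number_relapses, family = poisson,
            data = dat, subset = trt01 == 0)
mu1 <- predict(out1, newdata = dat, type = "response")
mu0 <- predict(out0, newdata = dat, type = "response")

rate1 <- mean(dat$trt01 * (dat$y - mu1) / ps + mu1)                # DR estimate of E[Y(1)]
rate0 <- mean((1 - dat$trt01) * (dat$y - mu0) / (1 - ps) + mu0)    # DR estimate of E[Y(0)]
rate1 / rate0                                                      # doubly robust rate ratio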
Returns an object of class atefit with the following elements:

log.rate.ratio: A vector of numeric values of the estimated ATE (expressed as the log rate ratio of trt=1 over trt=0), the bootstrap standard error, the lower and upper limits of the 95% confidence interval, and the p-value.

rate0: A numeric value of the estimated rate in the group trt=0.

rate1: A numeric value of the estimated rate in the group trt=1.

trt.boot: Estimated log rate ratios in each bootstrap sample.

warning: A warning message produced if the treatment variable was not coded as 0 or 1. The key mapping the original coding of the variable to a 0-1 coding is displayed in the warning to facilitate interpretation of the rest of the output.
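Because trt.boot stores the log rate ratio from every bootstrap replicate, a percentile interval on the rate-ratio scale can be recovered directly from the fitted object; a short sketch, assuming output is the atefitcount() result from the example below.

# Percentile bootstrap 95% CI for the rate ratio, back-transformed from the
# bootstrapped log rate ratios (unlist() in case trt.boot is a one-column data frame)
boot_draws <- unlist(output$trt.boot)
exp(quantile(boot_draws, probs = c(0.025, 0.975), na.rm = TRUE))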
output <- atefitcount(data = countExample,
                      cate.model = y ~ age + female + previous_treatment +
                        previous_cost + previous_number_relapses + offset(log(years)),
                      ps.model = trt ~ age + previous_treatment,
                      verbose = 1,
                      n.boot = 50,
                      seed = 999)
output
plot(output)
Doubly robust estimator of the average treatment effect between two treatments, which is the rate ratio of treatment 1 over treatment 0 for count outcomes. Bootstrap is used for inference.
atefitmean( data, cate.model, ps.model, ps.method = "glm", minPS = 0.01, maxPS = 0.99, interactions = TRUE, n.boot = 500, plot.boot = FALSE, seed = NULL, verbose = 0 )
data |
A data frame containing the variables in the outcome and
propensity score models; a data frame with |
cate.model |
A formula describing the outcome model to be fitted. The outcome must appear on the left-hand side. |
ps.model |
A formula describing the propensity score model to be fitted.
The treatment must appear on the left-hand side. The treatment must be a
numeric vector coded as 0/1. If data are from a RCT, specify |
ps.method |
A character value for the method to estimate the propensity
score. Allowed values include one of: |
minPS |
A numerical value between 0 and 1 below which estimated
propensity scores should be truncated. Default is |
maxPS |
A numerical value between 0 and 1 above which estimated
propensity scores should be truncated. Must be strictly greater than
|
interactions |
A logical value indicating whether the outcome model
should be fitted separately by treatment arm with the variables in
|
n.boot |
A numeric value indicating the number of bootstrap samples
used. Default is |
plot.boot |
A logical value indicating whether histograms of the
bootstrapped treatment effect estimates should be produced at every
|
seed |
An optional integer specifying an initial randomization seed for
reproducibility. Default is |
verbose |
An integer value indicating whether intermediate progress
messages and histograms should be printed. |
This helper function estimates the average treatment effect (ATE) between two treatment groups in a given dataset specified by y, trt, x.cate, x.ps, and time. The ATE is estimated with a doubly robust estimator that accounts for imbalances in covariate distributions between the two treatment groups with inverse probability treatment weighting. For count outcomes, the estimated ATE is the estimated rate ratio of treatment 1 versus treatment 0. Both the original and log-transformed ATEs are returned, as well as the rate in either treatment group.

If inference = TRUE, the variability of the estimated rate ratio is also calculated using the bootstrap. Additional variability outputs include the standard error of the log rate ratio, the 95% confidence interval of the rate ratio, the p-value, and a histogram of the log rate ratio.
Returns a list with the following elements:

log.rate.ratio: A numeric value of the estimated log rate ratio.

se.boot.log.rate.ratio: A numeric value of the bootstrap standard error of the log rate ratio.

rate.ratio: A numeric value of the estimated rate ratio.

rate.ratio0: A numeric value of the estimated rate in the group trt=0.

rate.ratio1: A numeric value of the estimated rate in the group trt=1.

rate.ratio.CIl: A numeric value of the lower limit of the 95% bootstrap confidence interval for the estimated rate ratio.

rate.ratio.CIu: A numeric value of the upper limit of the 95% bootstrap confidence interval for the estimated rate ratio.

pvalue: A numeric value of the p-value derived from the bootstrapped values based on a chi-squared distribution.

warning: A warning message produced if the treatment variable was not coded as 0/1. The key mapping the original coding of the variable to a 0/1 coding is displayed in the warning to facilitate interpretation of the rest of the output.

plot: If plot.boot is TRUE, a histogram displaying the distribution of the bootstrapped log rate ratios. The red vertical reference line in the histogram represents the estimated log rate ratio.
# This module is not implemented yet!
Doubly robust estimator of the average treatment effect between two treatments, which is the restricted mean time lost ratio for survival outcomes. Bootstrap is used for inference.
atefitsurv( data, cate.model, ps.model, ps.method = "glm", ipcw.model = NULL, ipcw.method = "breslow", minPS = 0.01, maxPS = 0.99, followup.time = NULL, tau0 = NULL, surv.min = 0.025, n.boot = 500, seed = NULL, verbose = 0 )
data |
A data frame containing the variables in the outcome, propensity score, and inverse
probability of censoring models (if specified); a data frame with |
cate.model |
A formula describing the outcome model to be fitted.
The outcome must appear on the left-hand side. For survival outcomes, a |
ps.model |
A formula describing the propensity score (PS) model to be fitted. The treatment must
appear on the left-hand side. The treatment must be a numeric vector coded as 0/1.
If data are from a randomized controlled trial, specify |
ps.method |
A character value for the method to estimate the propensity score.
Allowed values include one of:
|
ipcw.model |
A formula describing the inverse probability of censoring weighting (IPCW)
model to be fitted. The left-hand side must be empty. Only applies for survival outcomes.
Default is |
ipcw.method |
A character value for the censoring model. Only applies for survival
outcomes. Allowed values are: |
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
followup.time |
A column name in |
tau0 |
The truncation time for defining restricted mean time lost. Only applies for
survival outcomes. Default is |
surv.min |
Lower truncation limit for the probability of being censored.
It must be a positive value and should be chosen close to 0. Only applies for survival
outcomes. Default is |
n.boot |
A numeric value indicating the number of bootstrap samples used. Default is |
seed |
An optional integer specifying an initial randomization seed for reproducibility.
Default is |
verbose |
An integer value indicating whether intermediate progress messages should
be printed. |
This helper function estimates the average treatment effect (ATE) for survival data between two treatment groups in a given dataset. The ATE is estimated with a doubly robust estimator that accounts for imbalances in covariate distributions between the two treatment groups with inverse probability treatment and censoring weighting. For survival outcomes, the estimated ATE is the estimated restricted mean time lost (RMTL) ratio of treatment 1 versus treatment 0. The log-transformed ATEs and log-transformed adjusted hazard ratios are returned, as well as the estimated RMST in either treatment group. The variability of the estimated RMTL ratio is calculated using the bootstrap. Additional outputs include the standard error of the log RMTL ratio, the 95% confidence interval, the p-value, and a histogram of the bootstrap estimates.
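To build intuition for the estimand, the RMST and RMTL can be inspected without covariate adjustment using survival::survfit(); the sketch below is a crude, unadjusted illustration only, not the doubly robust, censoring-weighted estimator implemented here (survivalExample and tau0 are as in the examples).

# Unadjusted illustration of the RMTL estimand (not the doubly robust estimator):
# restricted mean survival time (RMST) up to tau0, per treatment arm.
library(survival)
tau0 <- with(survivalExample,
             min(quantile(y[trt == "drug1"], 0.95),
                 quantile(y[trt == "drug0"], 0.95)))
fit <- survfit(Surv(y, d) ~ trt, data = survivalExample)
# print.survfit() with rmean reports the RMST per arm up to tau0;
# the RMTL in each arm is tau0 minus that value, and atefitsurv() targets
# the ratio of these RMTLs (drug1 vs drug0) with covariate adjustment.
print(fit, rmean = tau0)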
Returns an object of class atefit with 6 elements:

rmst1: A vector of numeric values of the estimated RMST, the bootstrap standard error, the lower and upper limits of the 95% confidence interval, and the p-value in the group trt=1.

rmst0: A vector of numeric values of the estimated RMST, the bootstrap standard error, the lower and upper limits of the 95% confidence interval, and the p-value in the group trt=0.

log.rmtl.ratio: A vector of numeric values of the estimated log RMTL ratio of trt=1 over trt=0, the bootstrap standard error, the lower and upper limits of the 95% confidence interval, and the p-value.

log.hazard.ratio: A vector of numeric values of the estimated adjusted log hazard ratio of trt=1 over trt=0, the bootstrap standard error, the lower and upper limits of the 95% confidence interval, and the p-value.

trt.boot: Estimates of rmst1, rmst0, log.rmtl.ratio and log.hazard.ratio in each bootstrap sample.

warning: A warning message produced if the treatment variable was not coded as 0/1. The key mapping the original coding of the variable to a 0/1 coding is displayed in the warning to facilitate interpretation of the rest of the output.
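If the per-replicate draws are needed, for example to form a percentile interval for the RMTL ratio, they can be pulled from trt.boot; the sketch below assumes trt.boot has a column named log.rmtl.ratio (an assumption about its layout) and that output is the fitted object from the example below.

# Percentile bootstrap 95% CI for the RMTL ratio (assumes a "log.rmtl.ratio"
# column in trt.boot, one row per bootstrap sample)
boot_log_rmtl <- output$trt.boot[, "log.rmtl.ratio"]
exp(quantile(boot_log_rmtl, probs = c(0.025, 0.975), na.rm = TRUE))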
library(survival)

tau0 <- with(survivalExample,
             min(quantile(y[trt == "drug1"], 0.95),
                 quantile(y[trt == "drug0"], 0.95)))

output <- atefitsurv(data = survivalExample,
                     cate.model = Surv(y, d) ~ age + female +
                       previous_cost + previous_number_relapses,
                     ps.model = trt ~ age + previous_treatment,
                     tau0 = tau0,
                     n.boot = 50,
                     seed = 999,
                     verbose = 1)
output
plot(output)
This function computes the area under the curve for two vectors where one corresponds to the x values and the other corresponds to the y values. It supports both linear and spline interpolation.
auc( x, y, from = min(x, na.rm = TRUE), to = max(x, na.rm = TRUE), type = c("linear", "spline"), subdivisions = 100, ... )
x: A numeric vector of x values.

y: A numeric vector of y values of the same length as x.

from: The value from which to start calculating the area under the curve. Defaults to the smallest x value.

to: The value at which to end the calculation of the area under the curve. Defaults to the greatest x value.

type: The type of interpolation: "linear" or "spline". Defaults to "linear".

subdivisions: An integer indicating how many subdivisions to use for 'integrate' (for spline-based approximations).

...: Additional arguments passed on to 'approx' (for linear interpolations).
A numeric value representing the area under the curve.
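A quick sanity check on a curve with a known area (the integral of x^2 on [0, 1] is 1/3); if auc() is not exported by the package, call it as precmed:::auc().

x <- seq(0, 1, by = 0.05)
y <- x^2
auc(x, y, type = "linear")   # linear interpolation via approx()
auc(x, y, type = "spline")   # natural cubic spline interpolation, closer to 1/3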
Split the given dataset into balanced training and validation sets (within a pre-specified tolerance). Balanced means that (1) the ratio of treated and controls is maintained in the training and validation sets, and (2) the covariate distributions are balanced between the training and validation sets.
balance.split( y, trt, x.cate, x.ps, time, minPS = 0.01, maxPS = 0.99, train.prop = 3/4, error.max = 0.1, max.iter = 5000 )
y |
Observed outcome; vector of size |
trt |
Treatment received; vector of size |
x.cate |
Matrix of |
x.ps |
Matrix of |
time |
Log-transformed person-years of follow-up; vector of size |
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
train.prop |
A numerical value (in '(0, 1)') indicating the proportion of total data used
for training. Default is |
error.max |
A numerical value > 0 indicating the tolerance (maximum value of error)
for the largest standardized absolute difference in the covariate distributions or in the
doubly robust estimated rate ratios between the training and validation sets. This is used
to define a balanced training-validation splitting. Default is |
max.iter |
A positive integer value indicating the maximum number of iterations when
searching for a balanced training-validation split. Default is |
A list of 10 objects, 5 training and 5 validation of y, trt, x.cate, x.ps, time:

y.train - observed outcome in the training set; vector of size m (observations in the training set)
trt.train - treatment received in the training set; vector of size m coded as 0/1
x.cate.train - baseline covariates for the outcome model in the training set; matrix of dimension m by p.cate
x.ps.train - baseline covariates (plus intercept) for the propensity score model in the training set; matrix of dimension m by p.ps + 1
time.train - log-transformed person-years of follow-up in the training set; vector of size m
y.valid - observed outcome in the validation set; vector of size n-m
trt.valid - treatment received in the validation set; vector of size n-m coded as 0/1
x.cate.valid - baseline covariates for the outcome model in the validation set; matrix of dimension n-m by p.cate
x.ps.valid - baseline covariates (plus intercept) for the propensity score model in the validation set; matrix of dimension n-m by p.ps + 1
time.valid - log-transformed person-years of follow-up in the validation set; vector of size n-m
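The balance criterion described above (largest standardized absolute difference between the training and validation sets) can be written in a few lines; the helper below illustrates that criterion under the assumption that the covariate matrices are numeric, and is not the package's internal code.

# Standardized absolute difference of each covariate between training and
# validation sets; a split is "balanced" when the largest value is small
# (e.g., below error.max).
std_abs_diff <- function(x_train, x_valid) {
  pooled_sd <- sqrt((apply(x_train, 2, var) + apply(x_valid, 2, var)) / 2)
  abs(colMeans(x_train) - colMeans(x_valid)) / pooled_sd
}
# Example use with a hypothetical split object returned by balance.split():
# max(std_abs_diff(split$x.cate.train, split$x.cate.valid)) < 0.1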
Split the given dataset into balanced training and validation sets (within a pre-specified tolerance). Balanced means that (1) the ratio of treated and controls is maintained in the training and validation sets, and (2) the covariate distributions are balanced between the training and validation sets.
balancemean.split( y, trt, x.cate, x.ps, minPS = 0.01, maxPS = 0.99, train.prop = 3/4, error.max = 0.1, max.iter = 5000 )
y |
Observed outcome; vector of size |
trt |
Treatment received; vector of size |
x.cate |
Matrix of |
x.ps |
Matrix of |
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
train.prop |
A numerical value (in '(0, 1)') indicating the proportion of total data used
for training. Default is |
error.max |
A numerical value > 0 indicating the tolerance (maximum value of error)
for the largest standardized absolute difference in the covariate distributions or in the
doubly robust estimated rate ratios between the training and validation sets. This is used
to define a balanced training-validation splitting. Default is |
max.iter |
A positive integer value indicating the maximum number of iterations when
searching for a balanced training-validation split. Default is |
A list of 9 objects, 4 training and 4 validation versions of y, trt, x.cate, x.ps, plus the identifiers of the validation set:

y.train - observed outcome in the training set; vector of size m (observations in the training set)
trt.train - treatment received in the training set; vector of size m coded as 0/1
x.cate.train - baseline covariates for the outcome model in the training set; matrix of dimension m by p.cate
x.ps.train - baseline covariates (plus intercept) for the propensity score model in the training set; matrix of dimension m by p.ps + 1
y.valid - observed outcome in the validation set; vector of size n-m
trt.valid - treatment received in the validation set; vector of size n-m coded as 0/1
x.cate.valid - baseline covariates for the outcome model in the validation set; matrix of dimension n-m by p.cate
x.ps.valid - baseline covariates (plus intercept) for the propensity score model in the validation set; matrix of dimension n-m by p.ps + 1
bestid.valid - id for the validation set from the best split; vector of size n-m
Split the given time-to-event dataset into balanced training and validation sets (within a pre-specified tolerance). Balanced means that (1) the ratio of treated and controls is maintained in the training and validation sets, and (2) the covariate distributions are balanced between the training and validation sets.
balancesurv.split( y, d, trt, x.cate, x.ps, x.ipcw, yf = NULL, train.prop = 3/4, error.max = 0.1, max.iter = 5000 )
y |
Observed survival or censoring time; vector of size |
d |
The event indicator, normally |
trt |
Treatment received; vector of size |
x.cate |
Matrix of |
x.ps |
Matrix of |
x.ipcw |
Matrix of |
yf |
Follow-up time, interpreted as the potential censoring time; vector of size |
train.prop |
A numerical value (in '(0, 1)') indicating the proportion of total data used
for training. Default is |
error.max |
A numerical value > 0 indicating the tolerance (maximum value of error)
for the largest standardized absolute difference in the covariate distributions or in the
doubly robust estimated rate ratios between the training and validation sets. This is used
to define a balanced training-validation splitting. Default is |
max.iter |
A positive integer value indicating the maximum number of iterations when
searching for a balanced training-validation split. Default is |
A list of 14 objects, 7 training and 7 validation versions of y, d, trt, x.cate, x.ps, x.ipcw, yf:

y.train - observed survival or censoring time in the training set; vector of size m (observations in the training set)
d.train - event indicator in the training set; vector of size m coded as 0/1
trt.train - treatment received in the training set; vector of size m coded as 0/1
x.cate.train - baseline covariates for the outcome model in the training set; matrix of dimension m by p.cate
x.ps.train - baseline covariates (plus intercept) for the propensity score model in the training set; matrix of dimension m by p.ps + 1
x.ipcw.train - baseline covariates for inverse probability of censoring in the training set; matrix of dimension m by p.ipw
yf.train - follow-up time in the training set; if known, vector of size m; if unknown, yf == NULL
y.valid - observed survival or censoring time in the validation set; vector of size n-m
d.valid - event indicator in the validation set; vector of size n-m coded as 0/1
trt.valid - treatment received in the validation set; vector of size n-m coded as 0/1
x.cate.valid - baseline covariates for the outcome model in the validation set; matrix of dimension n-m by p.cate
x.ps.valid - baseline covariates (plus intercept) for the propensity score model in the validation set; matrix of dimension n-m by p.ps + 1
x.ipcw.valid - baseline covariates for inverse probability of censoring in the validation set; matrix of dimension n-m by p.ipw
yf.valid - follow-up time in the validation set; if known, vector of size n-m; if unknown, yf == NULL
"precmed"
objectProvides box plots which depict distributions of estimated ATEs for each multi-category subgroup in
the validation set across all cross-validation iterations. The subgroups are mutually exclusive and
are categorized by the CATE score percentiles (prop.multi
specified in catecv()
or
catecvmean()
). Box plots of mutually exclusive subgroups are constructed separately by scoring
method specified in catecv()
. This should be run only after results of catecv()
or
catecvmean()
) have been obtained.
## S3 method for class 'precmed'
boxplot(x, ylab = NULL, plot.hr = FALSE, title = waiver(), theme = theme_classic(), ...)
x |
An object of class |
ylab |
A character value for the y-axis label to describe what the ATE is. Default is |
plot.hr |
A logical value indicating whether the hazard ratios should be plotted in the
validation curves ( |
title |
The text for the title |
theme |
Defaults to |
... |
Other parameters |
boxplot() takes in outputs from catecv() and generates the box plots of estimated ATEs for multi-category subgroups of the validation set. The box plots, together with the overall ATE reference line, can help compare the scoring methods' ability to distinguish subgroups of patients with different treatment effects.

For a given scoring method, box plots showing increasing or decreasing trends across the multi-category subgroups indicate presence of treatment effect heterogeneity (and the ability of the scoring method to capture it). In contrast, box plots which are relatively aligned across the multi-category subgroups indicate absence of treatment effect heterogeneity (or the inability of the scoring method to capture it).

Returns sets of box plots, one set for each scoring method, over each of the multi-category subgroups. A gray horizontal dashed line at the overall ATE is included as a reference.
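The ylab, title, and theme arguments control the display; a small usage sketch, assuming eval_1 is the object created in the examples below and that any complete ggplot2 theme can be supplied.

# Customizing the box plot output (eval_1 is created in the examples below)
library(ggplot2)
boxplot(eval_1,
        ylab = "Rate ratio of drug1 vs drug0 in each subgroup",
        title = "Mutually exclusive subgroups by CATE score",
        theme = theme_bw())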
Yadlowsky, S., Pellegrini, F., Lionetto, F., Braune, S., & Tian, L. (2020). Estimation and validation of ratio-based conditional average treatment effects using observational data. Journal of the American Statistical Association, 1-18. DOI: 10.1080/01621459.2020.1772080.
See also: the plot() and abc() methods for "precmed" objects.
# Count outcome
eval_1 <- catecv(response = "count",
                 data = countExample,
                 score.method = "poisson",
                 cate.model = y ~ age + female + previous_treatment +
                   previous_cost + previous_number_relapses + offset(log(years)),
                 ps.model = trt ~ age + previous_treatment,
                 higher.y = FALSE,
                 cv.n = 5)

boxplot(eval_1, ylab = "Rate ratio of drug1 vs drug0 in each subgroup")

# Survival outcome
library(survival)
tau0 <- with(survivalExample,
             min(quantile(y[trt == "drug1"], 0.95),
                 quantile(y[trt == "drug0"], 0.95)))

eval_2 <- catecv(response = "survival",
                 data = survivalExample,
                 score.method = c("poisson", "randomForest"),
                 cate.model = Surv(y, d) ~ age + female + previous_cost +
                   previous_number_relapses,
                 ps.model = trt ~ age + previous_treatment,
                 initial.predictor.method = "randomForest",
                 ipcw.model = ~ age + previous_cost + previous_treatment,
                 tau0 = tau0,
                 higher.y = TRUE,
                 cv.n = 5,
                 seed = 999)

boxplot(eval_2, ylab = "RMTL ratio of drug1 vs drug0 in each subgroup")
Provides (doubly robust) estimation of the average treatment effect (ATE) for count, survival or continuous outcomes in nested and mutually exclusive subgroups of patients defined by an estimated conditional average treatment effect (CATE) score via cross-validation (CV).
catecv( response, data, score.method, cate.model, ps.model, ps.method = "glm", init.model = NULL, initial.predictor.method = NULL, ipcw.model = NULL, ipcw.method = "breslow", minPS = 0.01, maxPS = 0.99, followup.time = NULL, tau0 = NULL, higher.y = TRUE, prop.cutoff = seq(0.5, 1, length = 6), prop.multi = c(0, 1/3, 2/3, 1), abc = TRUE, train.prop = 3/4, cv.n = 10, error.max = 0.1, max.iter = 5000, surv.min = 0.025, xvar.smooth.score = NULL, xvar.smooth.init = NULL, tree.depth = 2, n.trees.rf = 1000, n.trees.boosting = 200, B = 3, Kfold = 5, error.maxNR = 0.001, max.iterNR = 150, tune = c(0.5, 2), seed = NULL, plot.gbmperf = TRUE, verbose = 0 )
response |
A string describing the type of outcome in the data.
Allowed values include "count" (see |
data |
A data frame containing the variables in the outcome, propensity score, and inverse
probability of censoring models (if specified); a data frame with |
score.method |
A vector of one or multiple methods to estimate the CATE score.
Allowed values are: |
cate.model |
A formula describing the outcome model to be fitted.
The outcome must appear on the left-hand side. For survival outcomes, a
|
ps.model |
A formula describing the propensity score (PS) model to be fitted. The treatment must
appear on the left-hand side. The treatment must be a numeric vector coded as 0/1.
If data are from a randomized controlled trial, specify |
ps.method |
A character value for the method to estimate the propensity score.
Allowed values include one of:
|
init.model |
A formula describing the initial predictor model. The outcome must appear on the left-hand side.
It must be specified when |
initial.predictor.method |
A character vector for the method used to get initial
outcome predictions conditional on the covariates specified in |
ipcw.model |
A formula describing the inverse probability of censoring weighting (IPCW)
model to be fitted. The left-hand side must be empty. Only applies for survival outcomes.
Default is |
ipcw.method |
A character value for the censoring model. Only applies for survival
outcomes. Allowed values are: |
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
followup.time |
A column name in |
tau0 |
The truncation time for defining restricted mean time lost. Only applies for
survival outcomes. Default is |
higher.y |
A logical value indicating whether higher ( |
prop.cutoff |
A vector of numerical values (in '(0, 1]') specifying percentiles of the
estimated log CATE scores to define nested subgroups. Each element represents the cutoff to
separate observations in nested subgroups (below vs above cutoff).
The length of |
prop.multi |
A vector of numerical values (in '[0, 1]') specifying percentiles of the
estimated log CATE scores to define mutually exclusive subgroups.
It should start with 0, end with 1, and be of |
abc |
A logical value indicating whether the area between curves (ABC) should be calculated
at each cross-validation iterations, for each |
train.prop |
A numerical value (in '(0, 1)') indicating the proportion of total data used
for training. Default is |
cv.n |
A positive integer value indicating the number of cross-validation iterations.
Default is |
error.max |
A numerical value > 0 indicating the tolerance (maximum value of error)
for the largest standardized absolute difference in the covariate distributions or in the
doubly robust estimated rate ratios between the training and validation sets. This is used
to define a balanced training-validation splitting. Default is |
max.iter |
A positive integer value indicating the maximum number of iterations when
searching for a balanced training-validation split. Default is |
surv.min |
Lower truncation limit for the probability of being censored.
It must be a positive value and should be chosen close to 0. Only applies for survival
outcomes. Default is |
xvar.smooth.score |
A vector of characters indicating the name of the variables used as
the smooth terms if |
xvar.smooth.init |
A vector of characters indicating the name of the variables used as
the smooth terms if |
tree.depth |
A positive integer specifying the depth of individual trees in boosting
(usually 2-3). Used only if |
n.trees.rf |
A positive integer specifying the maximum number of trees in random forest.
Used if |
n.trees.boosting |
A positive integer specifying the maximum number of trees in boosting
(usually 100-1000). Used if |
B |
A positive integer specifying the number of time cross-fitting is repeated in
|
Kfold |
A positive integer specifying the number of folds used in cross-fitting
to partition the data in |
error.maxNR |
A numerical value > 0 indicating the minimum value of the mean absolute
error in Newton Raphson algorithm. Used only if |
max.iterNR |
A positive integer indicating the maximum number of iterations in the
Newton Raphson algorithm. Used only if |
tune |
A vector of 2 numerical values > 0 specifying tuning parameters for the
Newton Raphson algorithm. |
seed |
An optional integer specifying an initial randomization seed for reproducibility.
Default is |
plot.gbmperf |
A logical value indicating whether to plot the performance measures in
boosting. Used only if |
verbose |
An integer value indicating what kind of intermediate progress messages should
be printed. |
For a count response, see the details in catecvcount(). For a survival response, see the details in catecvsurv(). For a continuous response, see the details in catecvmean().

For a count response, see the description of outputs in catecvcount(). For a survival response, see the description of outputs in catecvsurv(). For a continuous response, see the description of outputs in catecvmean().
Yadlowsky, S., Pellegrini, F., Lionetto, F., Braune, S., & Tian, L. (2020). Estimation and validation of ratio-based conditional average treatment effects using observational data. Journal of the American Statistical Association, 1-18. DOI: 10.1080/01621459.2020.1772080.
See also: the catefit() function and the boxplot() and abc() methods for "precmed" objects.
cate_1 <- catecv(response = "count",
                 data = countExample,
                 score.method = "poisson",
                 cate.model = y ~ age + female + previous_treatment +
                   previous_cost + previous_number_relapses + offset(log(years)),
                 ps.model = trt ~ age + previous_treatment,
                 higher.y = FALSE,
                 cv.n = 5,
                 seed = 999,
                 verbose = 1)

plot(cate_1, ylab = "Rate ratio of drug1 vs drug0 in each subgroup")
boxplot(cate_1, ylab = "Rate ratio of drug1 vs drug0 in each subgroup")
abc(cate_1)

# Survival outcome
library(survival)
tau0 <- with(survivalExample,
             min(quantile(y[trt == "drug1"], 0.95),
                 quantile(y[trt == "drug0"], 0.95)))

cate_2 <- catecv(response = "survival",
                 data = survivalExample,
                 score.method = c("poisson", "randomForest"),
                 cate.model = Surv(y, d) ~ age + female + previous_cost +
                   previous_number_relapses,
                 ps.model = trt ~ age + previous_treatment,
                 initial.predictor.method = "randomForest",
                 ipcw.model = ~ age + previous_cost + previous_treatment,
                 tau0 = tau0,
                 higher.y = TRUE,
                 surv.min = 0.025,
                 cv.n = 5,
                 seed = 999)

plot(cate_2, ylab = "RMTL ratio of drug1 vs drug0 in each subgroup")
boxplot(cate_2, ylab = "RMTL ratio of drug1 vs drug0 in each subgroup")
abc(cate_2)
Provides doubly robust estimation of the average treatment effect (ATE) in
nested and mutually exclusive subgroups of patients defined by an estimated
conditional average treatment effect (CATE) score via cross-validation (CV).
The CATE score can be estimated with up to 5 methods among the following:
Poisson regression, boosting, two regressions, contrast regression, and
negative binomial (see score.method).
catecvcount( data, score.method, cate.model, ps.model, ps.method = "glm", initial.predictor.method = "boosting", minPS = 0.01, maxPS = 0.99, higher.y = TRUE, prop.cutoff = seq(0.5, 1, length = 6), prop.multi = c(0, 1/3, 2/3, 1), abc = TRUE, train.prop = 3/4, cv.n = 10, error.max = 0.1, max.iter = 5000, xvar.smooth = NULL, tree.depth = 2, n.trees.boosting = 200, B = 3, Kfold = 5, error.maxNR = 0.001, max.iterNR = 150, tune = c(0.5, 2), seed = NULL, plot.gbmperf = TRUE, verbose = 0, ... )
data |
A data frame containing the variables in the outcome and
propensity score model; a data frame with |
score.method |
A vector of one or multiple methods to estimate the CATE
score. Allowed values are: |
cate.model |
A formula describing the outcome model to be fitted. The outcome must appear on the left-hand side. |
ps.model |
A formula describing the propensity score model to be fitted.
The treatment must appear on the left-hand side. The treatment must be a
numeric vector coded as 0 or 1. If data are from a randomized trial, specify
|
ps.method |
A character value for the method to estimate the propensity
score. Allowed values include one of: |
initial.predictor.method |
A character vector for the method used to get
initial outcome predictions conditional on the covariates in
|
minPS |
A numerical value between 0 and 1 below which estimated
propensity scores should be truncated. Default is |
maxPS |
A numerical value between 0 and 1 above which estimated
propensity scores should be truncated. Must be strictly greater than
|
higher.y |
A logical value indicating whether higher ( |
prop.cutoff |
A vector of numerical values between 0 and 1 specifying
percentiles of the estimated log CATE scores to define nested subgroups. Each
element represents the cutoff to separate observations in nested subgroups
(below vs above cutoff). The length of |
prop.multi |
A vector of numerical values between 0 and 1 specifying
percentiles of the estimated log CATE scores to define mutually exclusive
subgroups. It should start with 0, end with 1, and be of
|
abc |
A logical value indicating whether the area between curves (ABC)
should be calculated at each cross-validation iterations, for each
|
train.prop |
A numerical value between 0 and 1 indicating the proportion
of total data used for training. Default is |
cv.n |
A positive integer value indicating the number of
cross-validation iterations. Default is |
error.max |
A numerical value > 0 indicating the tolerance (maximum
value of error) for the largest standardized absolute difference in the
covariate distributions or in the doubly robust estimated rate ratios between
the training and validation sets. This is used to define a balanced
training-validation splitting. Default is |
max.iter |
A positive integer value indicating the maximum number of
iterations when searching for a balanced training-validation split. Default
is |
xvar.smooth |
A vector of characters indicating the name of the variables used as
the smooth terms if |
tree.depth |
A positive integer specifying the depth of individual trees in boosting
(usually 2-3). Used only if |
n.trees.boosting |
A positive integer specifying the maximum number of trees in boosting
(usually 100-1000). Used if |
B |
A positive integer specifying the number of time cross-fitting is repeated in
|
Kfold |
A positive integer specifying the number of folds used in cross-fitting
to partition the data in |
error.maxNR |
A numerical value > 0 indicating the minimum value of the mean absolute
error in Newton Raphson algorithm. Used only if |
max.iterNR |
A positive integer indicating the maximum number of iterations in the
Newton Raphson algorithm. Used only if |
tune |
A vector of 2 numerical values > 0 specifying tuning parameters for the
Newton Raphson algorithm. |
seed |
An optional integer specifying an initial randomization seed for reproducibility.
Default is |
plot.gbmperf |
A logical value indicating whether to plot the performance measures in
boosting. Used only if |
verbose |
An integer value indicating what kind of intermediate progress messages should
be printed. |
... |
Additional arguments for |
The CATE score represents an individual-level treatment effect expressed as a rate ratio for count outcomes. It can be estimated with boosting, Poisson regression, negative binomial regression, or the doubly robust estimator two regressions (Yadlowsky, 2020), each applied separately by treatment group, or with the other doubly robust estimator, contrast regression (Yadlowsky, 2020), applied to the entire data set.
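Because the regression-based scores are returned on the log rate-ratio scale, exponentiating a linear score gives the individual rate ratio. The following is a minimal illustrative sketch with made-up coefficient values (they are hypothetical and are not output of this package):

# Hypothetical log-CATE coefficients (illustration only, not package output)
delta <- c(intercept = -0.20, age = 0.01, female = -0.30)
# One new patient: intercept, age 45, female
x_new <- c(1, 45, 1)
log_cate <- sum(delta * x_new)  # log rate ratio (e.g., treated vs control)
exp(log_cate)                   # CATE expressed as a rate ratio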
Internal CV is applied to reduce optimism in choosing the CATE estimation method that
captures the most treatment effect heterogeneity. The CV is applied by repeating the
following steps cv.n
times:
Split the data into a training and validation set according to train.prop
.
The training and validation sets must be balanced with respect to covariate distributions
and doubly robust rate ratio estimates (see error.max
).
Estimate the CATE score in the training set with the specified scoring method.
Predict the CATE score in the validation set using the scoring model fitted from the training set.
Build nested subgroups of treatment responders in the training and validation sets,
separately, and estimate the ATE within each nested subgroup. For each element i of
prop.cutoff
(e.g., prop.cutoff[i]
= 0.6), take the following steps:
Identify high responders as observations with the 60%
(i.e., prop.cutoff[i]
x100%) highest (if higher.y = TRUE
) or
lowest (if higher.y = FALSE
) estimated CATE scores.
Estimate the ATE in the subgroup of high responders using a doubly robust estimator.
Conversely, identify low responders as observations with the 40%
(i.e., (1 - prop.cutoff[i]) x 100%) lowest (if higher.y
= TRUE) or
highest (if higher.y
= FALSE) estimated CATE scores.
Estimate the ATE in the subgroup of low responders using a doubly robust estimator.
If abc
= TRUE, calculate the area between the ATE and the series of ATEs in
nested subgroups of high responders in the validation set.
Build mutually exclusive subgroups of treatment responders in the training and
validation sets, separately, and estimate the ATE within each subgroup. Mutually exclusive
subgroups are built by splitting the estimated CATE scores according to prop.multi
(see the short sketch after this list).
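The following minimal base-R sketch illustrates how the nested and mutually exclusive subgroups could be formed from a vector of estimated CATE scores. It assumes higher.y = FALSE, so high responders are the observations with the lowest scores, and it uses simulated scores rather than package output:

set.seed(1)
score <- rnorm(200)                      # stand-in for estimated CATE scores
prop.cutoff <- seq(0.5, 1, length = 6)
prop.multi <- c(0, 1/3, 2/3, 1)

# Nested subgroups of high responders: lowest prop.cutoff[i] x 100% of scores
nested <- lapply(prop.cutoff, function(p) which(score <= quantile(score, p)))
sapply(nested, length)                   # subgroup sizes grow with the cutoff

# Mutually exclusive subgroups: split the scores at the prop.multi percentiles
groups <- cut(score, breaks = quantile(score, prop.multi), include.lowest = TRUE)
table(groups)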
Returns a list containing the following components saved as a "precmed"
object:
ate.poisson
: A list of results output if score.method
includes
'poisson'
:
ate.est.train.high.cv
: A matrix of numerical values with
length(prop.cutoff)
rows and cv.n
columns.
The ith row/jth column cell contains the estimated ATE in the nested subgroup of high responders
defined by CATE score above (if higher.y = TRUE
) or below (if higher.y = FALSE
) the
prop.cutoff[i]
x100% percentile of the estimated CATE score in the training set in the jth
cross-validation iteration.
ate.est.train.low.cv
: A matrix of numerical values with
length(prop.cutoff) - 1
rows and cv.n
columns.
The ith row/jth column cell contains the estimated ATE in the nested subgroup of low responders
defined by CATE score below (if higher.y = TRUE
) or above (if higher.y = FALSE
) the
prop.cutoff[i]
x100% percentile of the estimated CATE score in the training set in the jth
cross-validation iteration.
ate.est.valid.high.cv
: Same as ate.est.train.high.cv
,
but in the validation set.
ate.est.valid.low.cv
: Same as ate.est.train.low.cv
,
but in the validation set.
ate.est.train.group.cv
: A matrix of numerical values with
length(prop.multi) - 1
rows and cv.n
columns.
The jth column contains the estimated ATE in length(prop.multi) - 1
mutually exclusive subgroups defined by prop.multi
in the training set in jth
cross-validation iteration.
ate.est.valid.group.cv
: Same as ate.est.train.group.cv
, but in the
validation set.
abc.valid
: A vector of numerical values of length cv.n
.
The ith element returns the ABC of the validation curve in the ith cross-validation
iteration. Only returned if abc = TRUE
.
ate.boosting
: A list of results similar to ate.poisson
output
if score.method
includes 'boosting'
.
ate.twoReg
: A list of results similar to ate.poisson
output
if score.method
includes 'twoReg'
.
ate.contrastReg
: A list of results similar to ate.poisson
output
if score.method
includes 'contrastReg'
.
This method has an additional element in the list of results:
converge.contrastReg.cv
: A vector of logical values of length cv.n
.
The ith element indicates whether the algorithm converged in the ith cross-validation
iteration.
ate.negBin
: A list of results similar to ate.poisson
output
if score.method
includes 'negBin'
.
props
: A list of 3 elements:
prop.onlyhigh
: The original argument prop.cutoff
,
reformatted as necessary.
prop.bi
: The original argument prop.cutoff
,
similar to prop.onlyhigh
but reformatted to exclude 1.
prop.multi
: The original argument prop.multi
,
reformatted as necessary to include 0 and 1.
overall.ate.valid
: A vector of numerical values of length cv.n
.
The ith element contains the ATE in the validation set of the ith cross-validation
iteration, estimated with the doubly robust estimator.
overall.ate.train
: A vector of numerical values of length cv.n
.
The ith element contains the ATE in the training set of the ith cross-validation
iteration, estimated with the doubly robust estimator.
fgam
: The formula used in GAM if initial.predictor.method = 'gam'
.
higher.y
: The original higher.y
argument.
abc
: The original abc
argument.
cv.n
: The original cv.n
argument.
response
: The type of response. Always 'count' for this function.
formulas
: A list of 3 elements: (1) cate.model
argument,
(2) ps.model
argument and (3) original labels of the left-hand side variable in
ps.model
(treatment) if it was not 0/1.
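As a minimal sketch of how these components might be inspected, assuming cv_count is an object returned by catecvcount() with score.method = 'poisson' (as in the example below):

# Validation-set ATEs (rate ratios) in nested subgroups of high responders:
# one row per prop.cutoff value, one column per cross-validation iteration
cv_count$ate.poisson$ate.est.valid.high.cv

# Average validation curve across the cross-validation iterations
rowMeans(cv_count$ate.poisson$ate.est.valid.high.cv)

# Overall doubly robust ATE in each validation split and the ABC per iteration
cv_count$overall.ate.valid
cv_count$ate.poisson$abc.valid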
Yadlowsky, S., Pellegrini, F., Lionetto, F., Braune, S., & Tian, L. (2020). Estimation and validation of ratio-based conditional average treatment effects using observational data. Journal of the American Statistical Association, 1-18. DOI: 10.1080/01621459.2020.1772080.
plot.precmed()
, boxplot.precmed()
,
abc()
methods for "precmed"
objects,
and catefitcount()
function.
catecv <- catecvcount(data = countExample,
                      score.method = "poisson",
                      cate.model = y ~ age + female + previous_treatment +
                        previous_cost + previous_number_relapses + offset(log(years)),
                      ps.model = trt ~ age + previous_treatment,
                      higher.y = FALSE,
                      cv.n = 5,
                      seed = 999,
                      plot.gbmperf = FALSE,
                      verbose = 1)

plot(catecv, ylab = "Rate ratio of drug1 vs drug0 in each subgroup")
boxplot(catecv, ylab = "Rate ratio of drug1 vs drug0 in each subgroup")
abc(catecv)
Provides doubly robust estimation of the average treatment effect (ATE) in nested and
mutually exclusive subgroups of patients defined by an estimated conditional average
treatment effect (CATE) score via cross-validation (CV). The CATE score can be estimated
with up to 6 methods among the following: Linear regression, boosting, two regressions,
contrast regression, random forest and generalized additive model (see score.method
).
catecvmean(data, score.method, cate.model, ps.model, ps.method = "glm",
  init.model = NULL, initial.predictor.method = "boosting", minPS = 0.01,
  maxPS = 0.99, higher.y = TRUE, prop.cutoff = seq(0.5, 1, length = 6),
  prop.multi = c(0, 1/3, 2/3, 1), abc = TRUE, train.prop = 3/4, cv.n = 10,
  error.max = 0.1, max.iter = 5000, xvar.smooth.score = NULL,
  xvar.smooth.init = NULL, tree.depth = 2, n.trees.rf = 1000,
  n.trees.boosting = 200, B = 3, Kfold = 6, plot.gbmperf = TRUE,
  error.maxNR = 0.001, tune = c(0.5, 2), seed = NULL, verbose = 0, ...)
data |
A data frame containing the variables in the outcome and propensity score models;
a data frame with |
score.method |
A vector of one or multiple methods to estimate the CATE score.
Allowed values are: |
cate.model |
A formula describing the outcome model to be fitted. The outcome must appear on the left-hand side. |
ps.model |
A formula describing the propensity score model to be fitted.
The treatment must appear on the left-hand side. The treatment must be a numeric vector
coded as 0/1. If data are from a RCT, specify |
ps.method |
A character value for the method to estimate the propensity score.
Allowed values include one of:
|
init.model |
A formula describing the initial predictor model. The outcome must appear on the left-hand side.
It must be specified when |
initial.predictor.method |
A character vector for the method used to get initial
outcome predictions conditional on the covariates in |
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
higher.y |
A logical value indicating whether higher ( |
prop.cutoff |
A vector of numerical values (in '(0, 1]') specifying percentiles of the
estimated CATE scores to define nested subgroups. Each element represents the cutoff to
separate observations in nested subgroups (below vs above cutoff).
The length of |
prop.multi |
A vector of numerical values (in '[0, 1]') specifying percentiles of the
estimated CATE scores to define mutually exclusive subgroups.
It should start with 0, end with 1, and be of |
abc |
A logical value indicating whether the area between curves (ABC) should be calculated
at each cross-validation iterations, for each |
train.prop |
A numerical value (in '(0, 1)') indicating the proportion of total data used
for training. Default is |
cv.n |
A positive integer value indicating the number of cross-validation iterations.
Default is |
error.max |
A numerical value > 0 indicating the tolerance (maximum value of error)
for the largest standardized absolute difference in the covariate distributions or in the
doubly robust estimated rate ratios between the training and validation sets. This is used
to define a balanced training-validation splitting. Default is |
max.iter |
A positive integer value indicating the maximum number of iterations when
searching for a balanced training-validation split. Default is |
xvar.smooth.score |
A vector of characters indicating the name of the variables used as
the smooth terms if |
xvar.smooth.init |
A vector of characters indicating the name of the variables used as
the smooth terms if |
tree.depth |
A positive integer specifying the depth of individual trees in boosting
(usually 2-3). Used only if |
n.trees.rf |
A positive integer specifying the maximum number of trees in random forest.
Used if |
n.trees.boosting |
A positive integer specifying the maximum number of trees in boosting
(usually 100-1000). Used only if |
B |
A positive integer specifying the number of time cross-fitting is repeated in
|
Kfold |
A positive integer specifying the number of folds (parts) used in cross-fitting
to partition the data in |
plot.gbmperf |
A logical value indicating whether to plot the performance measures in
boosting. Used only if |
error.maxNR |
A numerical value > 0 indicating the minimum value of the mean absolute
error in Newton Raphson algorithm. Used only if |
tune |
A vector of 2 numerical values > 0 specifying tuning parameters for the
Newton Raphson algorithm. |
seed |
An optional integer specifying an initial randomization seed for reproducibility.
Default is |
verbose |
An integer value indicating what kind of intermediate progress messages should
be printed. |
... |
Additional arguments for |
The CATE score represents an individual-level treatment effect for continuous data, estimated with boosting, linear regression, random forest, generalized additive model, or the doubly robust estimator two regressions (Yadlowsky, 2020), each applied separately by treatment group, or with the other doubly robust estimator, contrast regression (Yadlowsky, 2020), applied to the entire data set.
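As a rough illustration of what "applied separately by treatment group" means, the sketch below fits an outcome model in each arm of a simulated data set and takes the difference in predicted means as a plug-in CATE. This is not the doubly robust estimator implemented in catecvmean(); it only conveys the idea of arm-specific outcome modeling:

set.seed(1)
n <- 300
x <- rnorm(n)
trt <- rbinom(n, 1, 0.5)
y <- 1 + 0.5 * x + trt * (0.8 - 0.4 * x) + rnorm(n)   # simulated continuous outcome
dat <- data.frame(y, x, trt)

fit1 <- lm(y ~ x, data = subset(dat, trt == 1))   # outcome model, treated arm
fit0 <- lm(y ~ x, data = subset(dat, trt == 0))   # outcome model, control arm

# Plug-in CATE: difference in predicted means given the covariates
cate_plugin <- predict(fit1, newdata = dat) - predict(fit0, newdata = dat)
summary(cate_plugin)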
Internal CV is applied to reduce optimism in choosing the CATE estimation method that
captures the most treatment effect heterogeneity. The CV is applied by repeating the
following steps cv.n
times:
Split the data into a training and validation set according to train.prop
.
The training and validation sets must be balanced with respect to covariate distributions
and doubly robust rate ratio estimates (see error.max
).
Estimate the CATE score in the training set with the specified scoring method.
Predict the CATE score in the validation set using the scoring model fitted from the training set.
Build nested subgroups of treatment responders in the training and validation sets,
separately, and estimate the ATE within each nested subgroup. For each element i of
prop.cutoff
(e.g., prop.cutoff[i]
= 0.6), take the following steps:
Identify high responders as observations with the 60%
(i.e., prop.cutoff[i]
x100%) highest (if higher.y = TRUE
) or
lowest (if higher.y = FALSE
) estimated CATE scores.
Estimate the ATE in the subgroup of high responders using a doubly robust estimator.
Conversely, identify low responders as observations with the 40%
(i.e., (1 - prop.cutoff[i]) x 100%) lowest (if higher.y
= TRUE) or
highest (if higher.y
= FALSE) estimated CATE scores.
Estimate the ATE in the subgroup of low responders using a doubly robust estimator.
Build mutually exclusive subgroups of treatment responders in the training and
validation sets, separately, and estimate the ATE within each subgroup. Mutually exclusive
subgroups are built by splitting the estimated CATE scores according to prop.multi
.
If abc
= TRUE, calculate the area between the ATE and the series of ATEs in
nested subgroups of high responders in the validation set.
Returns a list containing the following components saved as a "precmed"
object:
ate.gaussian
: A list of results output if score.method
includes
'gaussian'
:
ate.est.train.high.cv
: A matrix of numerical values with
length(prop.cutoff)
rows and cv.n
columns.
The ith column/jth row cell contains the estimated ATE in the nested subgroup of high responders
defined by CATE score above (if higher.y = TRUE
) or below (if higher.y = FALSE
) the
prop.cutoff[j]
x100% percentile of the estimated CATE score in the training set in the ith
cross-validation iteration.
ate.est.train.low.cv
: A matrix of numerical values with
length(prop.cutoff) - 1
rows and cv.n
columns.
The ith column/jth row cell contains the estimated ATE in the nested subgroup of low responders
defined by CATE score below (if higher.y = TRUE
) or above (if higher.y = FALSE
) the
prop.cutoff[j]
x100% percentile of the estimated CATE score in the training set in the ith
cross-validation iteration.
ate.est.valid.high.cv
: Same as ate.est.train.high.cv
,
but in the validation set.
ate.est.valid.low.cv
: Same as ate.est.train.low.cv
,
but in the validation set.
ate.est.train.group.cv
: A matrix of numerical values with
length(prop.multi) - 1
rows and cv.n
columns.
The ith column contains the estimated ATE in length(prop.multi) - 1
mutually exclusive subgroups defined by prop.multi
in the training set in ith
cross-validation iteration.
ate.est.valid.group.cv
: Same as ate.est.train.group.cv
, but in the
validation set.
abc.valid
: A vector of numerical values of length cv.n
.
The ith element returns the ABC of the validation curve in the ith cross-validation
iteration. Only returned if abc = TRUE
.
ate.boosting
: A list of results similar to ate.gaussian
output
if score.method
includes 'boosting'
.
ate.twoReg
: A list of results similar to ate.gaussian
output
if score.method
includes 'twoReg'
.
ate.contrastReg
: A list of results similar to ate.gaussian
output
if score.method
includes 'contrastReg'
.
ate.randomForest
: A list of results similar to ate.gaussian
output if score.method
includes 'randomForest'
.
ate.gam
: A list of results similar to ate.gaussian
output
if score.method
includes 'gam'
.
props
: A list of 3 elements:
prop.onlyhigh
: The original argument prop.cutoff
,
reformatted as necessary.
prop.bi
: The original argument prop.cutoff
,
similar to prop.onlyhigh
but reformatted to exclude 1.
prop.multi
: The original argument prop.multi
,
reformatted as necessary.
overall.ate.train
: A vector of numerical values of length cv.n
.
The ith element contains the ATE in the training set of the ith cross-validation
iteration, estimated with the doubly robust estimator.
overall.ate.valid
: A vector of numerical values of length cv.n
.
The ith element contains the ATE in the validation set of the ith cross-validation
iteration, estimated with the doubly robust estimator.
higher.y
: The original higher.y
argument.
abc
: The original abc
argument.
cv.n
: The original cv.n
argument.
response
: The type of response. Always 'continuous' for this function.
formulas
: A list of 3 elements: (1) cate.model
argument,
(2) ps.model
argument and (3) original labels of the left-hand side variable in
ps.model
(treatment) if it was not 0/1.
Yadlowsky, S., Pellegrini, F., Lionetto, F., Braune, S., & Tian, L. (2020). Estimation and validation of ratio-based conditional average treatment effects using observational data. Journal of the American Statistical Association, 1-18. DOI: 10.1080/01621459.2020.1772080.
plot.precmed()
, boxplot.precmed()
, abc()
methods for "precmed"
objects,
and catefitmean()
function.
# Not implemented yet!
Provides doubly robust estimation of the average treatment effect (ATE) by the
RMTL (restricted mean time lost) ratio in nested and mutually exclusive subgroups of patients
defined by an estimated conditional average treatment effect (CATE) score via
cross-validation (CV). The CATE score can be estimated with up to 5 methods among the following:
Random forest, boosting, Poisson regression, two regressions, and contrast regression
(see score.method
).
catecvsurv(data, score.method, cate.model, ps.model, ps.method = "glm",
  initial.predictor.method = "randomForest", ipcw.model = NULL,
  ipcw.method = "breslow", minPS = 0.01, maxPS = 0.99, followup.time = NULL,
  tau0 = NULL, higher.y = TRUE, prop.cutoff = seq(0.5, 1, length = 6),
  prop.multi = c(0, 1/3, 2/3, 1), abc = TRUE, train.prop = 3/4, cv.n = 10,
  error.max = 0.1, max.iter = 5000, surv.min = 0.025, tree.depth = 2,
  n.trees.rf = 1000, n.trees.boosting = 200, B = 3, Kfold = 5,
  error.maxNR = 0.001, max.iterNR = 150, tune = c(0.5, 2), seed = NULL,
  plot.gbmperf = TRUE, verbose = 0)
data |
A data frame containing the variables in the outcome, propensity score, and inverse
probability of censoring models (if specified); a data frame with |
score.method |
A vector of one or multiple methods to estimate the CATE score.
Allowed values are: |
cate.model |
A standard |
ps.model |
A formula describing the propensity score (PS) model to be fitted. The treatment must
appear on the left-hand side. The treatment must be a numeric vector coded as 0/1.
If data are from a randomized controlled trial, specify |
ps.method |
A character value for the method to estimate the propensity score.
Allowed values include one of:
|
initial.predictor.method |
A character vector for the method used to get initial
outcome predictions conditional on the covariates specified in |
ipcw.model |
A formula describing the inverse probability of censoring weighting (IPCW)
model to be fitted. The left-hand side must be empty. Default is |
ipcw.method |
A character value for the censoring model. Allowed values are:
|
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
followup.time |
A column name in |
tau0 |
The truncation time for defining restricted mean time lost. Default is |
higher.y |
A logical value indicating whether higher ( |
prop.cutoff |
A vector of numerical values (in '(0, 1]') specifying percentiles of the
estimated log CATE scores to define nested subgroups. Each element represents the cutoff to
separate observations in nested subgroups (below vs above cutoff).
The length of |
prop.multi |
A vector of numerical values (in '[0, 1]') specifying percentiles of the
estimated log CATE scores to define mutually exclusive subgroups.
It should start with 0, end with 1, and be of |
abc |
A logical value indicating whether the area between curves (ABC) should be calculated
at each cross-validation iterations, for each |
train.prop |
A numerical value (in '(0, 1)') indicating the proportion of total data used
for training. Default is |
cv.n |
A positive integer value indicating the number of cross-validation iterations.
Default is |
error.max |
A numerical value > 0 indicating the tolerance (maximum value of error)
for the largest standardized absolute difference in the covariate distributions or in the
doubly robust estimated rate ratios between the training and validation sets. This is used
to define a balanced training-validation splitting. Default is |
max.iter |
A positive integer value indicating the maximum number of iterations when
searching for a balanced training-validation split. Default is |
surv.min |
Lower truncation limit for the probability of being censored.
It must be a positive value and should be chosen close to 0. Default is |
tree.depth |
A positive integer specifying the depth of individual trees in boosting
(usually 2-3). Used only if |
n.trees.rf |
A positive integer specifying the maximum number of trees in random forest.
Used if |
n.trees.boosting |
A positive integer specifying the maximum number of trees in boosting
(usually 100-1000). Used if |
B |
A positive integer specifying the number of time cross-fitting is repeated in
|
Kfold |
A positive integer specifying the number of folds used in cross-fitting
to partition the data in |
error.maxNR |
A numerical value > 0 indicating the minimum value of the mean absolute
error in Newton Raphson algorithm. Used only if |
max.iterNR |
A positive integer indicating the maximum number of iterations in the
Newton Raphson algorithm. Used only if |
tune |
A vector of 2 numerical values > 0 specifying tuning parameters for the
Newton Raphson algorithm. |
seed |
An optional integer specifying an initial randomization seed for reproducibility.
Default is |
plot.gbmperf |
A logical value indicating whether to plot the performance measures in
boosting. Used only if |
verbose |
An integer value indicating what kind of intermediate progress messages should
be printed. |
The CATE score represents an individual-level treatment effect expressed as the restricted mean time lost (RMTL) ratio for survival outcomes. It can be estimated with boosting, Poisson regression, random forest, or the doubly robust estimator two regressions (Yadlowsky, 2020), each applied separately by treatment group, or with the other doubly robust estimator, contrast regression (Yadlowsky, 2020), applied to the entire data set.
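To give intuition for the RMTL scale, the sketch below computes an unadjusted RMTL ratio from arm-specific Kaplan-Meier curves, assuming the packaged survivalExample data with columns y (time), d (event indicator) and trt. This crude estimate ignores covariates and censoring weights and is not the doubly robust estimator used by catecvsurv():

library(survival)

# Restricted mean survival time: area under the Kaplan-Meier curve up to tau0
rmst_km <- function(time, status, tau0) {
  fit <- survfit(Surv(time, status) ~ 1)
  keep <- fit$time <= tau0
  t <- c(0, fit$time[keep], tau0)
  s <- c(1, fit$surv[keep])
  sum(diff(t) * s)
}

tau0 <- with(survivalExample,
             min(quantile(y[trt == "drug1"], 0.95),
                 quantile(y[trt == "drug0"], 0.95)))

rmtl1 <- tau0 - with(subset(survivalExample, trt == "drug1"), rmst_km(y, d, tau0))
rmtl0 <- tau0 - with(subset(survivalExample, trt == "drug0"), rmst_km(y, d, tau0))
rmtl1 / rmtl0   # unadjusted RMTL ratio of drug1 vs drug0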
Internal CV is applied to reduce optimism in choosing the CATE estimation method that
captures the most treatment effect heterogeneity. The CV is applied by repeating the
following steps cv.n
times:
Split the data into a training and validation set according to train.prop
.
The training and validation sets must be balanced with respect to covariate distributions
and doubly robust RMTL ratio estimates (see error.max
).
Estimate the CATE score in the training set with the specified scoring method.
Predict the CATE score in the validation set using the scoring model fitted from the training set.
Build nested subgroups of treatment responders in the training and validation sets,
separately, and estimate the ATE within each nested subgroup. For each element i of
prop.cutoff
(e.g., prop.cutoff[i]
= 0.6), take the following steps:
Identify high responders as observations with the 60%
(i.e., prop.cutoff[i]
x100%) highest (if higher.y = FALSE
) or
lowest (if higher.y = TRUE
) estimated CATE scores.
Estimate the ATE in the subgroup of high responders using a doubly robust estimator.
Conversely, identify low responders as observations with the 40%
(i.e., (1 - prop.cutoff[i]) x 100%) lowest (if higher.y
= FALSE) or
highest (if higher.y
= TRUE) estimated CATE scores.
Estimate the ATE in the subgroup of low responders using a doubly robust estimator.
If abc
= TRUE, calculate the area between the ATE and the series of ATEs in
nested subgroups of high responders in the validation set.
Build mutually exclusive subgroups of treatment responders in the training and
validation sets, separately, and estimate the ATE within each subgroup. Mutually exclusive
subgroups are built by splitting the estimated CATE scores according to prop.multi
.
Returns a list containing the following components saved as a "precmed"
object:
ate.randomForest
: A list of ATE output measured by the RMTL ratio if
score.method
includes 'randomForest'
:
ate.est.train.high.cv
: A matrix of numerical values with
length(prop.cutoff)
rows and cv.n
columns.
The ith column/jth row cell contains the estimated ATE in the nested subgroup of high responders
defined by CATE score above (if higher.y = FALSE
) or below (if higher.y = TRUE
) the
prop.cutoff[j]
x100% percentile of the estimated CATE score in the training set in the ith
cross-validation iteration.
ate.est.train.low.cv
: A matrix of numerical values with
length(prop.cutoff) - 1
rows and cv.n
columns.
The ith column/jth row cell contains the estimated ATE in the nested subgroup of low responders
defined by CATE score below (if higher.y = FALSE
) or above (if higher.y = TRUE
) the
prop.cutoff[j]
x100% percentile of the estimated CATE score in the training set in the ith
cross-validation iteration.
ate.est.valid.high.cv
: Same as ate.est.train.high.cv
,
but in the validation set.
ate.est.valid.low.cv
: Same as ate.est.train.low.cv
,
but in the validation set.
ate.est.train.group.cv
: A matrix of numerical values with
length(prop.multi) - 1
rows and cv.n
columns.
The ith column contains the estimated ATE in length(prop.multi) - 1
mutually exclusive subgroups defined by prop.multi
in the training set in ith
cross-validation iteration.
ate.est.valid.group.cv
: Same as ate.est.train.group.cv
, but in the
validation set.
abc.valid
: A vector of numerical values of length cv.n
.
The ith element returns the ABC of the validation curve in the ith cross-validation
iteration. Only returned if abc = TRUE
.
ate.boosting
: A list of results similar to ate.randomForest
output
if score.method
includes 'boosting'
.
ate.poisson
: A list of results similar to ate.randomForest
output
if score.method
includes 'poisson'
.
ate.twoReg
: A list of results similar to ate.randomForest
output
if score.method
includes 'twoReg'
.
ate.contrastReg
: A list of results similar to ate.randomForest
output
if score.method
includes 'contrastReg'
.
This method has an additional element in the list of results:
converge.contrastReg.cv
: A vector of logical values of length cv.n
.
The ith element indicates whether the algorithm converged in the ith cross-validation
iteration.
hr.randomForest
: A list of adjusted hazard ratio if score.method
includes
'randomForest'
:
hr.est.train.high.cv
: A matrix of numerical values with
length(prop.cutoff)
rows and cv.n
columns.
The ith column/jth row cell contains the estimated HR in the nested subgroup of high responders
defined by CATE score above (if higher.y = FALSE
) or below (if higher.y = TRUE
) the
prop.cutoff[j]
x100% percentile of the estimated CATE score in the training set in the ith
cross-validation iteration.
hr.est.train.low.cv
: A matrix of numerical values with
length(prop.cutoff) - 1
rows and cv.n
columns.
The ith column/jth row cell contains the estimated HR in the nested subgroup of low responders
defined by CATE score below (if higher.y = FALSE
) or above (if higher.y = TRUE
) the
prop.cutoff[j]
x100% percentile of the estimated CATE score in the training set in the ith
cross-validation iteration.
hr.est.valid.high.cv
: Same as hr.est.train.high.cv
,
but in the validation set.
hr.est.valid.low.cv
: Same as hr.est.train.low.cv
,
but in the validation set.
hr.est.train.group.cv
: A matrix of numerical values with
length(prop.multi) - 1
rows and cv.n
columns.
The ith column contains the estimated HR in length(prop.multi) - 1
mutually exclusive subgroups defined by prop.multi
in the training set in ith
cross-validation iteration.
hr.est.valid.group.cv
: Same as hr.est.train.group.cv
, but in the
validation set.
hr.boosting
: A list of results similar to hr.randomForest
output
if score.method
includes 'boosting'
.
hr.poisson
: A list of results similar to hr.randomForest
output
if score.method
includes 'poisson'
.
hr.twoReg
: A list of results similar to hr.randomForest
output
if score.method
includes 'twoReg'
.
hr.contrastReg
: A list of results similar to hr.randomForest
output
if score.method
includes 'contrastReg'
.
props
: A list of 3 elements:
prop.onlyhigh
: The original argument prop.cutoff
,
reformatted as necessary.
prop.bi
: The original argument prop.cutoff
,
similar to prop.onlyhigh
but reformatted to exclude 1.
prop.multi
: The original argument prop.multi
,
reformatted as necessary to include 0 and 1.
overall.ate.train
: A vector of numerical values of length cv.n
.
The ith element contains the ATE (RMTL ratio) in the training set of the ith cross-validation
iteration, estimated with the doubly robust estimator.
overall.hr.train
: A vector of numerical values of length cv.n
.
The ith element contains the ATE (HR) in the training set of the ith cross-validation
iteration.
overall.ate.valid
: A vector of numerical values of length cv.n
.
The ith element contains the ATE (RMTL ratio) in the validation set of the ith cross-validation
iteration, estimated with the doubly robust estimator.
overall.hr.valid
: A vector of numerical values of length cv.n
.
The ith element contains the ATE (HR) in the validation set of the ith cross-validation
iteration.
errors/warnings
: A nested list of errors and warnings that were wrapped during the
calculation of ATE. Errors and warnings are organized by score.method
and
position in the CV flow.
higher.y
: The original higher.y
argument.
abc
: The original abc
argument.
cv.n
: The original cv.n
argument.
response
: The type of response. Always 'survival' for this function.
formulas
: A list of 3 elements: (1) cate.model
argument,
(2) ps.model
argument and (3) original labels of the left-hand side variable in
ps.model
(treatment) if it was not 0/1.
Yadlowsky, S., Pellegrini, F., Lionetto, F., Braune, S., & Tian, L. (2020). Estimation and validation of ratio-based conditional average treatment effects using observational data. Journal of the American Statistical Association, 1-18. DOI: 10.1080/01621459.2020.1772080.
catefitsurv()
function and boxplot()
, abc()
methods for
"precmed"
objects.
library(survival)

tau0 <- with(survivalExample,
             min(quantile(y[trt == "drug1"], 0.95),
                 quantile(y[trt == "drug0"], 0.95)))

catecv <- catecvsurv(data = survivalExample,
                     score.method = "poisson",
                     cate.model = Surv(y, d) ~ age + female +
                       previous_cost + previous_number_relapses,
                     ps.model = trt ~ age + previous_treatment,
                     initial.predictor.method = "logistic",
                     ipcw.model = ~ age + previous_cost + previous_treatment,
                     tau0 = tau0,
                     higher.y = TRUE,
                     cv.n = 5,
                     seed = 999,
                     verbose = 1)

# Try:
plot(catecv, ylab = "RMTL ratio of drug1 vs drug0 in each subgroup")
boxplot(catecv, ylab = "RMTL ratio of drug1 vs drug0 in each subgroup")
abc(catecv)
Provides singly robust and doubly robust estimation of the CATE score for count, survival and continuous data with the following scoring methods: random forest (survival and continuous only), boosting, Poisson regression (count and survival only), two regressions, contrast regression, negative binomial regression (count only), linear regression (continuous only), and generalized additive model (continuous only).
catefit(response, data, score.method, cate.model, ps.model, ps.method = "glm",
  init.model = NULL, initial.predictor.method = NULL, ipcw.model = NULL,
  ipcw.method = "breslow", minPS = 0.01, maxPS = 0.99, followup.time = NULL,
  tau0 = NULL, higher.y = TRUE, prop.cutoff = seq(0.5, 1, length = 6),
  surv.min = 0.025, xvar.smooth.score = NULL, xvar.smooth.init = NULL,
  tree.depth = 2, n.trees.rf = 1000, n.trees.boosting = 200, B = 3, Kfold = 5,
  error.maxNR = 0.001, max.iterNR = 150, tune = c(0.5, 2), seed = NULL,
  plot.gbmperf = TRUE, verbose = 0, ...)
response |
A string describing the type of outcome in the data. Allowed values include
"count" (see |
data |
A data frame containing the variables in the outcome, propensity score, and inverse
probability of censoring models (if specified); a data frame with |
score.method |
A vector of one or multiple methods to estimate the CATE score.
Allowed values are: |
cate.model |
A formula describing the outcome model to be fitted.
The outcome must appear on the left-hand side. For survival outcomes, a |
ps.model |
A formula describing the propensity score (PS) model to be fitted. The treatment must
appear on the left-hand side. The treatment must be a numeric vector coded as 0/1.
If data are from a randomized controlled trial, specify |
ps.method |
A character value for the method to estimate the propensity score.
Allowed values include one of:
|
init.model |
A formula describing the initial predictor model. The outcome must appear on the left-hand side.
It must be specified when |
initial.predictor.method |
A character vector for the method used to get initial
outcome predictions conditional on the covariates specified in |
ipcw.model |
A formula describing the inverse probability of censoring weighting (IPCW)
model to be fitted. The left-hand side must be empty. Only applies for survival outcomes.
Default is |
ipcw.method |
A character value for the censoring model. Only applies for survival
outcomes. Allowed values are: |
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
followup.time |
A column name in |
tau0 |
The truncation time for defining restricted mean time lost. Only applies for
survival outcomes. Default is |
higher.y |
A logical value indicating whether higher ( |
prop.cutoff |
A vector of numerical values (in '(0, 1]') specifying percentiles of the
estimated log CATE scores to define nested subgroups. Each element represents the cutoff to
separate observations in nested subgroups (below vs above cutoff).
The length of |
surv.min |
Lower truncation limit for the probability of being censored.
It must be a positive value and should be chosen close to 0. Only applies for survival
outcomes. Default is |
xvar.smooth.score |
A vector of characters indicating the name of the variables used as
the smooth terms if |
xvar.smooth.init |
A vector of characters indicating the name of the variables used as
the smooth terms if |
tree.depth |
A positive integer specifying the depth of individual trees in boosting
(usually 2-3). Used only if |
n.trees.rf |
A positive integer specifying the maximum number of trees in random forest.
Used if |
n.trees.boosting |
A positive integer specifying the maximum number of trees in boosting
(usually 100-1000). Used if |
B |
A positive integer specifying the number of time cross-fitting is repeated in
|
Kfold |
A positive integer specifying the number of folds used in cross-fitting
to partition the data in |
error.maxNR |
A numerical value > 0 indicating the minimum value of the mean absolute
error in Newton Raphson algorithm. Used only if |
max.iterNR |
A positive integer indicating the maximum number of iterations in the
Newton Raphson algorithm. Used only if |
tune |
A vector of 2 numerical values > 0 specifying tuning parameters for the
Newton Raphson algorithm. |
seed |
An optional integer specifying an initial randomization seed for reproducibility.
Default is |
plot.gbmperf |
A logical value indicating whether to plot the performance measures in
boosting. Used only if |
verbose |
An integer value indicating whether intermediate progress messages and histograms should
be printed. |
... |
Additional arguments for |
For count response, see details in catefitcount()
.
For survival response, see details in catefitsurv()
.
For count response, see description of outputs in catefitcount()
.
For survival response, see description of outputs in catefitsurv()
.
Yadlowsky, S., Pellegrini, F., Lionetto, F., Braune, S., & Tian, L. (2020). Estimation and validation of ratio-based conditional average treatment effects using observational data. Journal of the American Statistical Association, 1-18. DOI: 10.1080/01621459.2020.1772080.
catecv()
# Count outcome
fit_1 <- catefit(response = "count",
                 data = countExample,
                 score.method = "poisson",
                 cate.model = y ~ age + female + previous_treatment +
                   previous_cost + previous_number_relapses + offset(log(years)),
                 ps.model = trt ~ age + previous_treatment,
                 higher.y = TRUE,
                 seed = 999)

coef(fit_1)

# Survival outcome
library(survival)

tau0 <- with(survivalExample,
             min(quantile(y[trt == "drug1"], 0.95),
                 quantile(y[trt == "drug0"], 0.95)))

fit_2 <- catefit(response = "survival",
                 data = survivalExample,
                 score.method = c("poisson", "boosting", "randomForest"),
                 cate.model = Surv(y, d) ~ age + female +
                   previous_cost + previous_number_relapses,
                 ps.model = trt ~ age + previous_treatment,
                 initial.predictor.method = "logistic",
                 ipcw.model = ~ age + previous_cost + previous_treatment,
                 tau0 = tau0,
                 higher.y = TRUE,
                 seed = 999,
                 n.cores = 1)

coef(fit_2)
Provides singly robust and doubly robust estimation of CATE score with up to 5 scoring methods among the following: Poisson regression, boosting, two regressions, contrast regression, and negative binomial.
catefitcount(data, score.method, cate.model, ps.model, ps.method = "glm",
  initial.predictor.method = "boosting", minPS = 0.01, maxPS = 0.99,
  higher.y = TRUE, prop.cutoff = seq(0.5, 1, length = 6), xvar.smooth = NULL,
  tree.depth = 2, n.trees.boosting = 200, B = 3, Kfold = 5,
  error.maxNR = 0.001, max.iterNR = 150, tune = c(0.5, 2), seed = NULL,
  plot.gbmperf = FALSE, verbose = 0, ...)
data |
A data frame containing the variables in the outcome and propensity score model; a data frame with |
score.method |
A vector of one or multiple methods to estimate the CATE score.
Allowed values are: |
cate.model |
A formula describing the outcome model to be fitted. The outcome must appear on the left-hand side. |
ps.model |
A formula describing the propensity score model to be fitted.
The treatment must appear on the left-hand side. The treatment must be a numeric vector
coded as 0/1. If data are from a randomized trial, specify |
ps.method |
A character value for the method to estimate the propensity score.
Allowed values include one of:
|
initial.predictor.method |
A character vector for the method used to get initial
outcome predictions conditional on the covariates in |
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
higher.y |
A logical value indicating whether higher ( |
prop.cutoff |
A vector of numerical values (in '(0, 1]') specifying percentiles of the
estimated log CATE scores to define nested subgroups. Each element represents the cutoff to
separate observations in nested subgroups (below vs above cutoff).
The length of |
xvar.smooth |
A vector of characters indicating the name of the variables used as
the smooth terms if |
tree.depth |
A positive integer specifying the depth of individual trees in boosting
(usually 2-3). Used only if |
n.trees.boosting |
A positive integer specifying the maximum number of trees in boosting
(usually 100-1000). Used if |
B |
A positive integer specifying the number of time cross-fitting is repeated in
|
Kfold |
A positive integer specifying the number of folds used in cross-fitting
to partition the data in |
error.maxNR |
A numerical value > 0 indicating the minimum value of the mean absolute
error in Newton Raphson algorithm. Used only if |
max.iterNR |
A positive integer indicating the maximum number of iterations in the
Newton Raphson algorithm. Used only if |
tune |
A vector of 2 numerical values > 0 specifying tuning parameters for the
Newton Raphson algorithm. |
seed |
An optional integer specifying an initial randomization seed for reproducibility.
Default is |
plot.gbmperf |
A logical value indicating whether to plot the performance measures in
boosting. Used only if |
verbose |
An integer value indicating what kind of intermediate progress messages should
be printed. |
... |
Additional arguments for |
The CATE score represents an individual-level treatment effect, estimated with Poisson regression, boosting, or negative binomial regression applied separately by treatment group, or with two doubly robust estimators, two regressions and contrast regression (Yadlowsky, 2020), applied to the entire data set.
catefitcount()
provides the coefficients of the CATE score for each scoring method requested
through score.method
. Currently, contrast regression is the only method which allows
for inference of the CATE coefficients by providing standard errors of the coefficients.
The coefficients can be used to learn the effect size of each variable and predict the
CATE score for a new observation.
catefitcount()
also provides the predicted CATE score of each observation in the data set,
for each scoring method. The predictions allow ranking the observations from potentially
high responders to the treatment to potentially low or standard responders.
The estimated ATEs among nested subgroups of high responders are also provided by scoring method.
Note that the ATEs in catefitcount()
are derived based on the CATE score which is estimated
using the same data sample. Therefore, overfitting may be an issue. catecvcount()
is more
suitable to inspect the estimated ATEs across scoring methods as it implements internal cross
validation to reduce optimism.
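A minimal sketch of how these outputs might be used, assuming fit is the catefitcount() output from the example below (score.method = "poisson", higher.y = FALSE):

summary(fit$score.poisson)       # estimated log-CATE (log rate ratio) scores
# With higher.y = FALSE, the most promising responders have the lowest scores
head(order(fit$score.poisson))
fit$ate.poisson                  # doubly robust ATEs in nested subgroups (prop.cutoff)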
Returns a list containing the following components:
ate.poisson
: A vector of numerical values of length prop.cutoff
containing the estimated ATE in nested subgroups (defined by prop.cutoff
)
constructed based on the estimated CATE scores with poisson regression.
Only provided if score.method
includes 'poisson'
.
ate.boosting
: Same as ate.poisson
, but with the nested subgroups based on
the estimated CATE scores with boosting. Only provided if score.method
includes 'boosting'
.
ate.twoReg
: Same as ate.poisson
, but with the nested subgroups based on
the estimated CATE scores with two regressions.
Only provided if score.method
includes 'twoReg'
.
ate.contrastReg
: Same as ate.poisson
, but with the nested subgroups based on
the estimated CATE scores with contrast regression.
Only provided if score.method
includes 'contrastReg'
.
ate.negBin
: Same as ate.poisson
, but with the nested subgroups based on
the estimated CATE scores with negative binomial regression.
Only provided if score.method
includes 'negBin'
.
score.poisson
: A vector of numerical values of length n
(number of observations in data
) containing the estimated log-CATE scores
according to the Poisson regression. Only provided if score.method
includes 'poisson'
.
score.boosting
: Same as score.poisson
, but with estimated log-CATE score
according to boosting. Only provided if score.method
includes
'boosting'
.
score.twoReg
: Same as score.poisson
, but with estimated log-CATE score
according to two regressions. Only provided if score.method
includes
'twoReg'
.
score.contrastReg
: Same as score.poisson
, but with estimated log-CATE score
according to contrast regression. Only provided if score.method
includes
'contrastReg'
.
score.negBin
: Same as score.poisson
, but with estimated log-CATE score
according to negative binomial regression. Only provided if score.method
includes 'negBin'
.
fit
: Additional details on model fitting if score.method
includes 'boosting' or 'contrastReg':
result.boosting
: Details on the boosting model fitted to observations
with treatment = 0 ($fit0.boosting)
and to observations with treatment = 1 ($fit1.boosting)
.
Only provided if score.method
includes 'boosting'
.
result.contrastReg$sigma.contrastReg
: Variance-covariance matrix of
the estimated log-CATE coefficients in contrast regression.
Only provided if score.method
includes 'contrastReg'
.
coefficients
: A data frame with the coefficients of the estimated log-CATE
score by score.method
. The data frame has number of rows equal to the number of
covariates in cate.model
and number of columns equal to length(score.method)
.
If score.method
includes 'contrastReg'
, the data frame has an additional
column containing the standard errors of the coefficients estimated with contrast regression.
'boosting'
does not have coefficient results because tree-based methods typically do not
express the log-CATE as a linear combination of coefficients and covariates.
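For example, assuming a hypothetical object cf obtained with score.method = c("poisson", "contrastReg") on countExample, the coefficients and the contrast regression standard errors could be inspected as follows (a sketch, not a tested workflow):

cf$coefficients                  # log-CATE coefficients, one column per scoring method
# Standard errors from the contrast regression variance-covariance matrix
sqrt(diag(cf$fit$result.contrastReg$sigma.contrastReg))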
Yadlowsky, S., Pellegrini, F., Lionetto, F., Braune, S., & Tian, L. (2020). Estimation and validation of ratio-based conditional average treatment effects using observational data. Journal of the American Statistical Association, 1-18. DOI: 10.1080/01621459.2020.1772080.
fit <- catefitcount(data = countExample,
                    score.method = "poisson",
                    cate.model = y ~ age + female + previous_treatment +
                      previous_cost + previous_number_relapses + offset(log(years)),
                    ps.model = trt ~ age + previous_treatment,
                    higher.y = FALSE,
                    seed = 999,
                    verbose = 1)
Provides singly robust and doubly robust estimation of CATE score with up to 6 scoring methods among the following: Linear regression, boosting, two regressions, contrast regression, random forest and generalized additive model.
catefitmean(data, score.method, cate.model, ps.model, ps.method = "glm",
  init.model = NULL, initial.predictor.method = "boosting", minPS = 0.01,
  maxPS = 0.99, higher.y = TRUE, prop.cutoff = seq(0.5, 1, length = 6),
  xvar.smooth.score = NULL, xvar.smooth.init = NULL, tree.depth = 2,
  n.trees.rf = 1000, n.trees.boosting = 200, B = 3, Kfold = 6,
  plot.gbmperf = FALSE, error.maxNR = 0.001, tune = c(0.5, 2), seed = NULL,
  verbose = 0, ...)
data |
A data frame containing the variables in the outcome and propensity score models;
a data frame with |
score.method |
A vector of one or multiple methods to estimate the CATE score.
Allowed values are: |
cate.model |
A formula describing the outcome model to be fitted. The outcome must appear on the left-hand side. |
ps.model |
A formula describing the propensity score model to be fitted.
The treatment must appear on the left-hand side.
The treatment must be a numeric vector coded as 0/1.
If data are from a RCT, specify |
ps.method |
A character value for the method to estimate the propensity score.
Allowed values include one of: |
init.model |
A formula describing the initial predictor model. The outcome must appear on the left-hand side.
It must be specified when |
initial.predictor.method |
A character vector for the method used to get initial outcome
predictions conditional on the covariates in |
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
higher.y |
A logical value indicating whether higher ( |
prop.cutoff |
A vector of numerical values (in '(0, 1]') specifying percentiles of
the estimated log CATE scores to define nested subgroups. Each element represents the
cutoff to separate observations in nested subgroups (below vs above cutoff).
The length of |
xvar.smooth.score |
A vector of characters indicating the name of the variables used as the
smooth terms if |
xvar.smooth.init |
A vector of characters indicating the name of the variables used as the
smooth terms if |
tree.depth |
A positive integer specifying the depth of individual trees in boosting
(usually 2-3). Used only if |
n.trees.rf |
A positive integer specifying the number of trees. Used only if
|
n.trees.boosting |
A positive integer specifying the maximum number of trees in boosting
(usually 100-1000). Used only if |
B |
A positive integer specifying the number of time cross-fitting is repeated in
|
Kfold |
A positive integer specifying the number of folds (parts) used in cross-fitting
to partition the data in |
plot.gbmperf |
A logical value indicating whether to plot the performance measures
in boosting. Used only if |
error.maxNR |
A numerical value > 0 indicating the minimum value of the mean absolute
error in Newton Raphson algorithm. Used only if |
tune |
A vector of 2 numerical values > 0 specifying tuning parameters for the
Newton Raphson algorithm. |
seed |
An optional integer specifying an initial randomization seed for reproducibility.
Default is |
verbose |
An integer value indicating what kind of intermediate progress messages should
be printed. |
... |
Additional arguments for |
The CATE score represents an individual-level treatment effect, estimated with linear regression, boosting, random forest, or generalized additive model applied separately by treatment group, or with two doubly robust estimators, two regressions and contrast regression (Yadlowsky, 2020), applied to the entire data set.
catefitmean()
provides the coefficients of the CATE score for each scoring method requested
through score.method
. Currently, contrast regression is the only method which allows
for inference of the CATE coefficients by providing standard errors of the coefficients.
The coefficients can be used to learn the effect size of each variable and predict the
CATE score for a new observation.
catefitmean()
also provides the predicted CATE score of each observation in the data set,
for each scoring method. The predictions allow ranking the observations from potentially
high responders to the treatment to potentially low or standard responders.
The estimated ATEs among nested subgroups of high responders are also provided by scoring method.
Note that the ATEs in catefitmean()
are derived based on the CATE score which is estimated
using the same data sample. Therefore, overfitting may be an issue. catecvmean()
is more
suitable for inspecting the estimated ATEs across scoring methods as it implements internal cross-validation
to reduce optimism.
Returns a list containing the following components:
ate.gaussian
: A vector of numerical values of length prop.cutoff
containing the estimated ATE in nested subgroups (defined by prop.cutoff
)
constructed based on the estimated CATE scores with linear regression.
Only provided if score.method
includes 'gaussian'
.
ate.boosting
: Same as ate.gaussian
, but with the nested subgroups based on
the estimated CATE scores with boosting. Only provided if score.method
includes 'boosting'
.
ate.twoReg
: Same as ate.gaussian
, but with the nested subgroups based on
the estimated CATE scores with two regressions.
Only provided if score.method
includes 'twoReg'
.
ate.contrastReg
: Same as ate.gaussian
, but with the nested subgroups based on
the estimated CATE scores with contrast regression.
Only provided if score.method
includes 'contrastReg'
.
ate.randomForest
: Same as ate.gaussian
, but with the nested subgroups based on
the estimated CATE scores with random forest.
Only provided if score.method
includes 'randomForest'
.
ate.gam
: Same as ate.gaussian
, but with the nested subgroups based on
the estimated CATE scores with generalized additive model.
Only provided if score.method
includes 'gam'
.
score.gaussian
: A vector of numerical values of length n
(number of observations in data
) containing the estimated CATE scores
according to the linear regression. Only provided if score.method
includes 'gaussian'
.
score.boosting
: Same as score.gaussian
, but with estimated CATE score
according to boosting. Only provided if score.method
includes
'boosting'
.
score.twoReg
: Same as score.gaussian
, but with estimated CATE score
according to two regressions. Only provided if score.method
includes
'twoReg'
.
score.contrastReg
: Same as score.gaussian
, but with estimated CATE score
according to contrast regression. Only provided if score.method
includes
'contrastReg'
.
score.randomForest
: Same as score.gaussian
, but with estimated CATE score
according to random forest. Only provided if score.method
includes 'randomForest'
.
score.gam
: Same as score.gaussian
, but with estimated CATE score
according to generalized additive model. Only provided if score.method
includes 'gam'
.
fit
: Additional details on model fitting if score.method
includes 'boosting', 'randomForest', 'gam', or 'contrastReg':
result.boosting
: Details on the boosting model fitted to observations
with treatment = 0 ($fit0.boosting)
and to observations with treatment = 1 ($fit1.boosting)
.
Only provided if score.method
includes 'boosting'
.
result.randomForest
: Details on the random forest model fitted to observations
with treatment = 0 ($fit0.randomForest)
and to observations with treatment = 1 ($fit1.randomForest)
.
Only provided if score.method
includes 'randomForest'
.
result.gam
: Details on the generalized additive model fitted to observations
with treatment = 0 ($fit0.gam)
and to observations with treatment = 1 ($fit1.gam)
.
Only provided if score.method
includes 'gam'
.
result.contrastReg$sigma.contrastReg
: Variance-covariance matrix of
the estimated CATE coefficients in contrast regression.
Only provided if score.method
includes 'contrastReg'
.
coefficients
: A data frame with the coefficients of the estimated CATE
score by score.method
. The data frame has number of rows equal to the number of
covariates in cate.model
and number of columns equal to length(score.method)
.
If score.method
includes 'contrastReg'
, the data frame has an additional
column containing the standard errors of the coefficients estimated with contrast regression.
'boosting'
, 'randomForest'
, 'gam'
do not have coefficient results because these methods do not
express the CATE as a linear combination of coefficients and covariates.
Yadlowsky, S., Pellegrini, F., Lionetto, F., Braune, S., & Tian, L. (2020). Estimation and validation of ratio-based conditional average treatment effects using observational data. Journal of the American Statistical Association, 1-18. DOI: 10.1080/01621459.2020.1772080.
catecvmean()
function
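No example is shown above for catefitmean(). The following is a minimal, untested sketch of a call on the meanExample dataset documented later in this manual; it assumes that meanExample uses the same column names as countExample and that catefitmean() is available (exported) in the installed version. An init.model is supplied because the doubly robust scores ('twoReg', 'contrastReg') require an initial predictor model.
# Continuous outcome (hypothetical call; column names assumed to mirror countExample)
fit_mean <- catefitmean(data = meanExample,
                        score.method = c("gaussian", "contrastReg"),
                        cate.model = y ~ age + female + previous_treatment +
                          previous_cost + previous_number_relapses,
                        init.model = y ~ age + female + previous_treatment +
                          previous_cost + previous_number_relapses,
                        ps.model = trt ~ age + previous_treatment,
                        higher.y = TRUE,
                        seed = 999)
fit_mean$coefficients  # CATE coefficients per scoring method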
Provides singly robust and doubly robust estimation of the CATE score for survival data with up to 5 scoring methods among the following: random forest, boosting, Poisson regression, two regressions, and contrast regression.
catefitsurv( data, score.method, cate.model, ps.model, ps.method = "glm", initial.predictor.method = "randomForest", ipcw.model = NULL, ipcw.method = "breslow", minPS = 0.01, maxPS = 0.99, followup.time = NULL, tau0 = NULL, higher.y = TRUE, prop.cutoff = seq(0.5, 1, length = 6), surv.min = 0.025, tree.depth = 2, n.trees.rf = 1000, n.trees.boosting = 200, B = 3, Kfold = 5, plot.gbmperf = TRUE, error.maxNR = 0.001, max.iterNR = 100, tune = c(0.5, 2), seed = NULL, verbose = 0, ... )
data |
A data frame containing the variables in the outcome, propensity score, and inverse
probability of censoring models (if specified); a data frame with |
score.method |
A vector of one or multiple methods to estimate the CATE score.
Allowed values are: |
cate.model |
A standard |
ps.model |
A formula describing the propensity score (PS) model to be fitted. The treatment must
appear on the left-hand side. The treatment must be a numeric vector coded as 0/1.
If data are from a randomized controlled trial, specify |
ps.method |
A character value for the method to estimate the propensity score.
Allowed values include one of:
|
initial.predictor.method |
A character vector for the method used to get initial
outcome predictions conditional on the covariates specified in |
ipcw.model |
A formula describing the inverse probability of censoring weighting (IPCW)
model to be fitted. The left-hand side must be empty. Default is |
ipcw.method |
A character value for the censoring model. Allowed values are:
|
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
followup.time |
A column name in |
tau0 |
The truncation time for defining restricted mean time lost. Default is |
higher.y |
A logical value indicating whether higher ( |
prop.cutoff |
A vector of numerical values (in '(0, 1]') specifying percentiles of the
estimated log CATE scores to define nested subgroups. Each element represents the cutoff to
separate observations in nested subgroups (below vs above cutoff).
The length of |
surv.min |
Lower truncation limit for the probability of being censored.
It must be a positive value and should be chosen close to 0. Default is |
tree.depth |
A positive integer specifying the depth of individual trees in boosting
(usually 2-3). Used only if |
n.trees.rf |
A positive integer specifying the maximum number of trees in random forest.
Used if |
n.trees.boosting |
A positive integer specifying the maximum number of trees in boosting
(usually 100-1000). Used if |
B |
A positive integer specifying the number of time cross-fitting is repeated in
|
Kfold |
A positive integer specifying the number of folds used in cross-fitting
to partition the data in |
plot.gbmperf |
A logical value indicating whether to plot the performance measures in
boosting. Used only if |
error.maxNR |
A numerical value > 0 indicating the minimum value of the mean absolute
error in Newton Raphson algorithm. Used only if |
max.iterNR |
A positive integer indicating the maximum number of iterations in the
Newton Raphson algorithm. Used only if |
tune |
A vector of 2 numerical values > 0 specifying tuning parameters for the
Newton Raphson algorithm. |
seed |
An optional integer specifying an initial randomization seed for reproducibility.
Default is |
verbose |
An integer value indicating what kind of intermediate progress messages should
be printed. |
... |
Additional arguments for |
The CATE score represents an individual-level treatment effect for survival data, estimated with random forest, boosting, Poisson regression, or the doubly robust estimator two regressions (Yadlowsky, 2020) applied separately by treatment group, or with the other doubly robust estimator, contrast regression (Yadlowsky, 2020), applied to the entire data set.
catefitsurv()
provides the coefficients of the CATE score for each scoring method requested
through score.method
. Currently, contrast regression is the only method which allows
for inference of the CATE coefficients by providing standard errors of the coefficients.
The coefficients can be used to learn the effect size of each variable and predict the
CATE score for a new observation.
catefitsurv()
also provides the predicted CATE score of each observation in the data set,
for each scoring method. The predictions allow ranking the observations from potentially
high responders to the treatment to potentially low or standard responders.
The estimated ATEs among nested subgroups of high responders are also provided by scoring method.
Note that the ATEs in catefitsurv()
are derived based on the CATE score which is estimated
using the same data sample. Therefore, overfitting may be an issue. catecvsurv()
is more
suitable to inspect the estimated ATEs across scoring methods as it implements internal cross
validation to reduce optimism.
Returns an object of the class catefit
containing the following components:
ate.randomForest
: A vector of numerical values of length prop.cutoff
containing the estimated ATE by the RMTL ratio in nested subgroups (defined by prop.cutoff
)
constructed based on the estimated CATE scores with random forest method.
Only provided if score.method
includes 'randomForest'
.
ate.boosting
: Same as ate.randomForest
, but with the nested subgroups based on
the estimated CATE scores with boosting. Only provided if score.method
includes 'boosting'
.
ate.poisson
: Same as ate.randomForest
, but with the nested subgroups based on
the estimated CATE scores with Poisson regression.
Only provided if score.method
includes 'poisson'
.
ate.twoReg
: Same as ate.randomForest
, but with the nested subgroups based on
the estimated CATE scores with two regressions.
Only provided if score.method
includes 'twoReg'
.
ate.contrastReg
: Same as ate.randomForest
, but with the nested subgroups based on
the estimated CATE scores with contrast regression.
Only provided if score.method
includes 'contrastReg'
.
hr.randomForest
: A vector of numerical values of length prop.cutoff
containing the adjusted hazard ratio in nested subgroups (defined by prop.cutoff
)
constructed based on the estimated CATE scores with random forest method.
Only provided if score.method
includes 'randomForest'
.
hr.boosting
: Same as hr.randomForest
, but with the nested subgroups based on
the estimated CATE scores with boosting. Only provided if score.method
includes 'boosting'
.
hr.poisson
: Same as hr.randomForest
, but with the nested subgroups based on
the estimated CATE scores with Poisson regression.
Only provided if score.method
includes 'poisson'
.
hr.twoReg
: Same as hr.randomForest
, but with the nested subgroups based on
the estimated CATE scores with two regressions.
Only provided if score.method
includes 'twoReg'
.
hr.contrastReg
: Same as hr.randomForest
, but with the nested subgroups based on
the estimated CATE scores with contrast regression.
Only provided if score.method
includes 'contrastReg'
.
score.randomForest
: A vector of numerical values of length n
(number of observations in data
) containing the estimated log-CATE scores
according to random forest. Only provided if score.method
includes 'randomForest'
.
score.boosting
: Same as score.randomForest
, but with estimated log-CATE score
according to boosting. Only provided if score.method
includes
'boosting'
.
score.poisson
: Same as score.randomForest
, but with estimated log-CATE score
according to the Poisson regression. Only provided if score.method
includes 'poisson'
.
score.twoReg
: Same as score.randomForest
, but with estimated log-CATE score
according to two regressions. Only provided if score.method
includes
'twoReg'
.
score.contrastReg
: Same as score.randomForest
, but with estimated log-CATE score
according to contrast regression. Only provided if score.method
includes
'contrastReg'
.
fit
: Additional details on model fitting if score.method
includes 'randomForest', 'boosting' or 'contrastReg':
result.randomForest
: Details on the random forest model fitted to observations
with treatment = 0 ($fit0.rf)
and to observations with treatment = 1 ($fit1.rf)
.
Only provided if score.method
includes 'randomForest'
.
result.boosting
: Details on the boosting model fitted to observations
with treatment = 0, ($fit0.boosting)
and ($fit0.gam)
and to observations with treatment = 1,
($fit1.boosting)
and ($fit1.gam)
.
Only provided if score.method
includes 'boosting'
.
result.contrastReg$converge.contrastReg
: Whether the contrast regression algorithm converged
or not. Only provided if score.method
includes 'contrastReg'
.
coefficients
: A data frame with the coefficients of the estimated log-CATE
score by score.method
. The data frame has number of rows equal to the number of
covariates in cate.model
and number of columns equal to length(score.method)
.
If score.method
includes 'contrastReg'
, the data frame has an additional
column containing the standard errors of the coefficients estimated with contrast regression.
'randomForest'
and 'boosting'
do not have coefficient results because
tree-based methods typically do not express the log-CATE as a linear combination of coefficients
and covariates.
errors/warnings
: A nested list of errors and warnings that were caught during the
calculation of ATE. Errors and warnings are organized by score.method
.
Yadlowsky, S., Pellegrini, F., Lionetto, F., Braune, S., & Tian, L. (2020). Estimation and validation of ratio-based conditional average treatment effects using observational data. Journal of the American Statistical Association, 1-18. DOI: 10.1080/01621459.2020.1772080.
library(survival) tau0 <- with(survivalExample, min(quantile(y[trt == "drug1"], 0.95), quantile(y[trt == "drug0"], 0.95))) fit <- catefitsurv(data = survivalExample, score.method = "randomForest", cate.model = Surv(y, d) ~ age + female + previous_cost + previous_number_relapses, ps.model = trt ~ age + previous_treatment, ipcw.model = ~ age + previous_cost + previous_treatment, tau0 = tau0, seed = 999) coef(fit)
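The components documented in the Value section can then be inspected directly from the returned object, for example:
fit$ate.randomForest          # RMTL ratios in the nested subgroups
fit$hr.randomForest           # adjusted hazard ratios in the nested subgroups
head(fit$score.randomForest)  # estimated log-CATE score for each observation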
A dataset containing a count outcome, a length of follow-up and 6 baseline covariates
data(countExample)
A dataframe with 4000 rows (patients) and 9 variables:
age at baseline, centered to 48 years old, in years
sex, 0 for male, 1 for female
previous treatment, "drugA", "drugB", or "drugC"
previous medical cost, in US dollars
previous number of symptoms, "0", "1", or ">=2"
previous number of relapses
current treatment, "drug0" or "drug1"
count outcome, current number of relapses
length of follow-up, in years
data(countExample) str(countExample) rate <- countExample$y / countExample$years
Estimate restricted mean survival time (RMST) based on Cox regression model
cox.rmst(y, d, x.cate, xnew, tau0)
y |
Observed survival or censoring time; vector of size |
d |
The event indicator, normally |
x.cate |
Matrix of |
xnew |
Matrix of |
tau0 |
The truncation time for defining restricted mean time lost. |
The estimated RMST for new subjects with covariates xnew
; vector of size m
.
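cox.rmst() is an internal helper, so a direct call may require the precmed::: operator; the following untested sketch uses simulated inputs whose shapes follow the argument descriptions above.
library(survival)
set.seed(1)
n <- 200
x <- cbind(age = rnorm(n), female = rbinom(n, 1, 0.5))   # covariate matrix (x.cate)
y <- rexp(n, rate = exp(0.2 * x[, "age"]))                # observed survival/censoring times
d <- rbinom(n, 1, 0.8)                                    # event indicator
tau0 <- unname(quantile(y, 0.9))                          # truncation time
rmst_new <- cox.rmst(y = y, d = d, x.cate = x, xnew = x[1:5, ], tau0 = tau0)
rmst_new                                                  # estimated RMST for the 5 "new" subjects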
pmcount()
and cvcount()
, after arg.checks()
Data preprocessing
Apply at the beginning of pmcount()
and cvcount()
, after arg.checks()
data.preproc( fun, cate.model, ps.model, data, prop.cutoff = NULL, prop.multi = NULL, ps.method, initial.predictor.method = NULL )
fun |
A function for which argument check is needed; "pm" for |
cate.model |
A formula describing the outcome model to be fitted. The outcome must appear on the left-hand side. |
ps.model |
A formula describing the propensity score model to be fitted.
The treatment must appear on the left-hand side. The treatment must be a numeric vector
coded as 0/1. If data are from a RCT, specify |
data |
A data frame containing the variables in the outcome and propensity score models;
a data frame with |
prop.cutoff |
A vector of numerical values (in '(0, 1]') specifying percentiles of the
estimated log CATE scores to define nested subgroups. Each element represents the cutoff to
separate observations in nested subgroups (below vs above cutoff).
The length of |
prop.multi |
A vector of numerical values (in '[0, 1]') specifying percentiles of the
estimated log CATE scores to define mutually exclusive subgroups.
It should start with 0, end with 1, and be of |
ps.method |
A character value for the method to estimate the propensity score.
Allowed values include one of:
|
initial.predictor.method |
A character vector for the method used to get initial
outcome predictions conditional on the covariates. Only applies when |
A list of 6 elements:
- y: outcome; vector of length n
(observations)
- trt: binary treatment; vector of length n
- x.ps: matrix of p.ps
baseline covariates (plus intercept); dimension n
by p.ps + 1
- x.cate: matrix of p.cate
baseline covariates; dimension n
by p.cate
- time: offset; vector of length n
- if fun = "pm"
:
- prop: formatted prop.cutoff
- if fun = "cv"
- prop.onlyhigh: formatted prop.cutoff
with 0 removed if applicable
- prop.bi: formatted prop.cutoff
with 0 and 1 removed if applicable
- prop.multi: formatted prop.multi
, starting with 0 and ending with 1
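data.preproc() is an internal preprocessing step normally invoked inside the fitting and cross-validation functions; the untested sketch below only illustrates the inputs and a few of the returned elements listed above, and a direct call may require precmed:::.
# Hypothetical direct call on the packaged countExample data
prep <- data.preproc(fun = "pm",
                     cate.model = y ~ age + female + previous_number_relapses +
                       offset(log(years)),
                     ps.model = trt ~ age + previous_treatment,
                     data = countExample,
                     prop.cutoff = seq(0.5, 1, length = 6),
                     ps.method = "glm")
str(prep$y)        # outcome vector
dim(prep$x.cate)   # covariate matrix for the outcome model
prep$prop          # formatted prop.cutoff (fun = "pm")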
catefitmean()
and catecvmean()
, after arg.checks()
Data preprocessing
Apply at the beginning of catefitmean()
and catecvmean()
, after arg.checks()
data.preproc.mean( fun, cate.model, init.model, ps.model, data, prop.cutoff = NULL, prop.multi = NULL, ps.method, score.method = NULL, initial.predictor.method = NULL )
fun |
A function for which argument check is needed; "pm" for |
cate.model |
A formula describing the outcome model to be fitted. The outcome must appear on the left-hand side. |
init.model |
A formula describing the initial predictor model. The outcome must appear on the left-hand side.
It must be specified when |
ps.model |
A formula describing the propensity score model to be fitted.
The treatment must appear on the left-hand side. The treatment must be a numeric vector
coded as 0/1. If data are from a RCT, specify |
data |
A data frame containing the variables in the outcome and propensity score models;
a data frame with |
prop.cutoff |
A vector of numerical values (in '(0, 1]') specifying percentiles of the
estimated log CATE scores to define nested subgroups. Each element represents the cutoff to
separate observations in nested subgroups (below vs above cutoff).
The length of |
prop.multi |
A vector of numerical values (in '[0, 1]') specifying percentiles of the
estimated log CATE scores to define mutually exclusive subgroups.
It should start with 0, end with 1, and be of |
ps.method |
A character value for the method to estimate the propensity score.
Allowed values include one of:
|
score.method |
A vector of one or multiple methods to estimate the CATE score.
Allowed values are: |
initial.predictor.method |
A character vector for the method used to get initial
outcome predictions conditional on the covariates. Only applies when |
A list of 6 elements:
- y: outcome; vector of length n
(observations)
- trt: binary treatment; vector of length n
- x.ps: matrix of p.ps
baseline covariates (plus intercept); dimension n
by p.ps + 1
- x.cate: matrix of p.cate
baseline covariates; dimension n
by p.cate
- x.init: matrix of p.init
baseline covariates; dimension n
by p.init
- if fun = "pm"
:
- prop: formatted prop.cutoff
- if fun = "cv"
- prop.onlyhigh: formatted prop.cutoff
with 0 removed if applicable
- prop.bi: formatted prop.cutoff
with 0 and 1 removed if applicable
- prop.multi: formatted prop.multi
, starting with 0 and ending with 1
catefitcount()
, catecvcount()
, catefitsurv()
, and catecvsurv()
, after arg.checks()
Data preprocessing
Apply at the beginning of catefitcount()
, catecvcount()
, catefitsurv()
, and catecvsurv()
, after arg.checks()
data.preproc.surv( fun, cate.model, ps.model, ipcw.model = NULL, tau0 = NULL, data, prop.cutoff = NULL, prop.multi = NULL, ps.method, initial.predictor.method = NULL, response = "count" )
fun |
A function for which argument check is needed; "catefit" for |
cate.model |
A formula describing the outcome model to be fitted. The outcome must appear on the left-hand side. |
ps.model |
A formula describing the propensity score model to be fitted.
The treatment must appear on the left-hand side. The treatment must be a numeric vector
coded as 0/1. If data are from a RCT, specify |
ipcw.model |
A formula describing the inverse probability of censoring weighting (IPCW) model to be fitted.
If covariates are the same as outcome model, set |
tau0 |
The truncation time for defining restricted mean time lost. Default is |
data |
A data frame containing the variables in the outcome, propensity score, and IPCW models;
a data frame with |
prop.cutoff |
A vector of numerical values (in '(0, 1]') specifying percentiles of the
estimated log CATE scores to define nested subgroups. Each element represents the cutoff to
separate observations in nested subgroups (below vs above cutoff).
The length of |
prop.multi |
A vector of numerical values (in '[0, 1]') specifying percentiles of the
estimated log CATE scores to define mutually exclusive subgroups.
It should start with 0, end with 1, and be of |
ps.method |
A character value for the method to estimate the propensity score.
Allowed values include one of:
|
initial.predictor.method |
A character vector for the method used to get initial
outcome predictions conditional on the covariates. Only applies when |
response |
The type of response variables; |
A list of elements:
- y: outcome; vector of length n
(observations)
- d : the event indicator; vector of length n
; only if response = "survival"
- trt: binary treatment; vector of length n
- x.ps: matrix of p.ps
baseline covariates specified in the propensity score model (plus intercept); dimension n
by p.ps + 1
- x.cate: matrix of p.cate
baseline covariates specified in the outcome model; dimension n
by p.cate
- x.ipcw: matrix of p.ipw
baseline covariates specified in the inverse probability of censoring weighting model; dimension n
by p.ipw
- time: offset; vector of length n
; only if response = "count"
- if fun = "catefit"
:
- prop: formatted prop.cutoff
- prop.no1: formatted prop.cutoff
with 1 removed if applicable; otherwise prop.no1 is the same as prop
- if fun = "crossv"
- prop.onlyhigh: formatted prop.cutoff
with 0 removed if applicable
- prop.bi: formatted prop.cutoff
with 0 and 1 removed if applicable
- prop.multi: formatted prop.multi
, starting with 0 and ending with 1
Doubly robust estimator of the average treatment effect between two treatments, which is the rate ratio of treatment 1 over treatment 0 for count outcomes.
drcount( y, trt, x.cate, x.ps, time, ps.method = "glm", minPS = 0.01, maxPS = 0.99, interactions = TRUE )
y |
A numeric vector of size |
trt |
A numeric vector (in {0, 1}) of size |
x.cate |
A numeric matrix of dimension |
x.ps |
A numeric matrix of dimension |
time |
A numeric vector of size |
ps.method |
A character value for the method to estimate the propensity
score. Allowed values include one of: |
minPS |
A numerical value between 0 and 1 below which estimated
propensity scores should be truncated. Default is |
maxPS |
A numerical value between 0 and 1 above which estimated
propensity scores should be truncated. Must be strictly greater than
|
interactions |
A logical value indicating whether the outcome model
should allow for treatment-covariate interaction by |
Return a list of 4 elements:
log.rate.ratio
: A numeric value of the estimated log rate ratio.
rate0
: A numeric value of the estimated rate in the group trt=0.
rate1
: A numeric value of the estimated rate in the group trt=1.
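A hedged, untested sketch of a direct call to drcount() on simulated count data. The log person-years offset follows the argument descriptions above, and giving x.ps an explicit intercept column follows the data.preproc() output description; both layouts are assumptions rather than confirmed requirements, and a direct call may need precmed::: if the function is not exported.
set.seed(123)
n <- 500
x <- cbind(age = rnorm(n), female = rbinom(n, 1, 0.5))
trt <- rbinom(n, 1, plogis(0.3 * x[, "age"]))
years <- runif(n, 0.5, 2)
time <- log(years)                                   # log person-years of follow-up
y <- rpois(n, lambda = exp(0.1 + 0.4 * trt + time))  # count outcome
est <- drcount(y = y, trt = trt, x.cate = x, x.ps = cbind(1, x), time = time)
exp(est$log.rate.ratio)                              # doubly robust rate ratio (trt 1 vs 0)
c(est$rate0, est$rate1)                              # estimated rates by treatment group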
Doubly robust estimator of the average treatment effect between two treatments, which is the mean difference between treatment 1 and treatment 0 for continuous outcomes.
drmean( y, trt, x.cate, x.ps, ps.method = "glm", minPS = 0.01, maxPS = 0.99, interactions = TRUE )
y |
A numeric vector of size |
trt |
A numeric vector (in {0, 1}) of size |
x.cate |
A numeric matrix of dimension |
x.ps |
A numeric matrix of dimension |
ps.method |
A character value for the method to estimate the propensity
score. Allowed values include one of:
|
minPS |
A numerical value between 0 and 1 below which estimated propensity
scores should be truncated. Default is |
maxPS |
A numerical value between 0 and 1 above which estimated propensity
scores should be truncated. Must be strictly greater than |
interactions |
A logical value indicating whether the outcome model
should assume interactions between |
Return a list of 4 elements:
mean.diff
: A numeric value of the estimated mean difference.
mean.diff0
: A numeric value of the estimated mean difference
in treatment group 0.
mean.diff1
: A numeric value of the estimated mean difference
in treatment group 1.
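Analogously, a hedged, untested sketch of drmean() on simulated continuous-outcome data (same caveats as for drcount() above):
set.seed(321)
n <- 500
x <- cbind(age = rnorm(n), female = rbinom(n, 1, 0.5))
trt <- rbinom(n, 1, plogis(0.3 * x[, "age"]))
y <- 1 + 0.5 * trt + 0.3 * x[, "age"] + rnorm(n)     # continuous outcome
est <- drmean(y = y, trt = trt, x.cate = x, x.ps = cbind(1, x))
est$mean.diff                                        # doubly robust mean difference (trt 1 vs 0)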
Doubly robust estimator of the average treatment effect between two treatments, which is the restricted mean time lost (RMTL) ratio of treatment 1 over treatment 0 for survival outcomes.
drsurv( y, d, x.cate, x.ps, x.ipcw, trt, yf = NULL, tau0, surv.min = 0.025, ps.method = "glm", minPS = 0.01, maxPS = 0.99, ipcw.method = "breslow" )
y |
Observed survival or censoring time; vector of size |
d |
The event indicator, normally |
x.cate |
Matrix of |
x.ps |
Matrix of |
x.ipcw |
Matrix of |
trt |
Treatment received; vector of size |
yf |
Follow-up time, interpreted as the potential censoring time; vector of size |
tau0 |
The truncation time for defining restricted mean time lost. |
surv.min |
Lower truncation limit for probability of being censored (positive and very close to 0). |
ps.method |
A character value for the method to estimate the propensity score. Allowed values include one of:
|
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
ipcw.method |
The censoring model. Allowed values are: |
Return a list of 4 elements:
rmst1
: A numeric value of the estimated restricted mean survival time in the group trt = 1
.
rmst0
: A numeric value of the estimated restricted mean survival time in the group trt = 0
.
log.rmtl.ratio
: A numeric value of the estimated log RMTL ratio.
log.hazard.ratio
: A numeric value of the estimated log hazard ratio.
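A hedged, untested sketch of drsurv() on simulated survival data; the covariate matrices, the intercept column in x.ps, and the choice of tau0 are illustrative assumptions, and a direct call may require precmed::: if the function is not exported.
set.seed(42)
n <- 400
x <- cbind(age = rnorm(n), female = rbinom(n, 1, 0.5))
trt <- rbinom(n, 1, plogis(0.4 * x[, "age"]))
t.event <- rexp(n, rate = 0.3 * exp(0.4 * trt))
t.cens  <- rexp(n, rate = 0.2)
y <- pmin(t.event, t.cens)                           # observed time
d <- as.numeric(t.event <= t.cens)                   # event indicator
tau0 <- unname(quantile(y, 0.9))                     # truncation time
est <- drsurv(y = y, d = d, x.cate = x, x.ps = cbind(1, x), x.ipcw = x,
              trt = trt, tau0 = tau0)
exp(est$log.rmtl.ratio)                              # doubly robust RMTL ratio (trt 1 vs 0)
exp(est$log.hazard.ratio)                            # hazard ratio on the ratio scale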
If only the higher subgroup (above the cutoff) is of interest, only trt.est.high is needed, so set onlyhigh
to TRUE.
Scores are adjusted to the opposite sign if higher.y
== FALSE; scores stay the same if higher.y
== TRUE;
this is because estcount.bilevel.subgroups() always takes the subgroup with the highest adjusted scores,
and higher adjusted scores should always represent high responders to trt = 1.
estcount.bilevel.subgroups( y, x.cate, x.ps, time, trt, score, higher.y, prop, onlyhigh, ps.method = "glm", minPS = 0.01, maxPS = 0.99 )
y |
Observed outcome; vector of size |
x.cate |
Matrix of |
x.ps |
Matrix of |
time |
Log-transformed person-years of follow-up; vector of size |
trt |
Treatment received; vector of size |
score |
Estimated log CATE scores for all |
higher.y |
A logical value indicating whether higher ( |
prop |
Proportions corresponding to percentiles in the estimated log CATE scores that define subgroups to calculate ATE for;
vector of floats in '(0, 1]' (if onlyhigh=T) or in '(0, 1)' (if onlyhigh=F):
Each element of |
onlyhigh |
Indicator of returning only the ATEs in the higher-than-cutoff category of the bi-level subgroups; boolean |
ps.method |
A character value for the method to estimate the propensity score. Allowed values include one of:
|
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
ate.est.high: estimated ATEs in the multiple bi-level subgroups that are in the higher-than-cutoff category;
vector of size equal to the length of prop; always returned
ate.est.low: estimated ATEs in the multiple bi-level subgroups that are in the lower-than-cutoff category;
vector of size equal to the length of prop; returned only when onlyhigh
== FALSE
Scores are adjusted to the opposite sign if higher.y
== FALSE; scores stay the same if higher.y
== TRUE;
this is because subgroups defined in estcount.multilevel.subgroup() start from the lowest to the highest adjusted scores,
and higher adjusted scores should always represent high responders to trt = 1
estcount.multilevel.subgroup( y, x.cate, x.ps, time, trt, score, higher.y, prop, ps.method = "glm", minPS = 0.01, maxPS = 0.99 )
y |
Observed outcome; vector of size |
x.cate |
Matrix of |
x.ps |
Matrix of |
time |
Log-transformed person-years of follow-up; vector of size |
trt |
Treatment received; vector of size |
score |
Estimated log CATE scores for all |
higher.y |
A logical value indicating whether higher ( |
prop |
Proportions corresponding to percentiles in the estimated log CATE scores that define subgroups to calculate ATE for;
vector of floats in '[0, 1]' always starting with 0 and ending with 1:
Each element of |
ps.method |
A character value for the method to estimate the propensity score. Allowed values include one of:
|
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
estimated ATEs of all categories in the one multilevel subgroup; vector of size equal to the number of categories in the multilevel subgroup
If only the higher subgroup (above the cutoff) is of interest, only
trt.est.high is needed, so set onlyhigh
to TRUE. Scores are adjusted to the
opposite sign if higher.y
== FALSE; scores stay the same if
higher.y
== TRUE. This is because estmean.bilevel.subgroups()
always takes the subgroup with the highest adjusted scores, and higher
adjusted scores should always represent high responders in treatment group 1.
estmean.bilevel.subgroups( y, x.cate, x.ps, trt, score, higher.y, prop, onlyhigh, ps.method = "glm", minPS = 0.01, maxPS = 0.99 )
y |
Observed outcome; vector of size |
x.cate |
Matrix of |
x.ps |
Matrix of |
trt |
Treatment received; vector of size |
score |
Estimated CATE scores for all |
higher.y |
A logical value indicating whether higher ( |
prop |
Proportions corresponding to percentiles in the estimated CATE scores that define subgroups to calculate ATE for;
vector of floats in '(0, 1]' (if onlyhigh=T) or in '(0, 1)' (if onlyhigh=F):
Each element of |
onlyhigh |
Indicator of returning only the ATEs in the higher-than-cutoff category of the bi-level subgroups; boolean |
ps.method |
A character value for the method to estimate the propensity score. Allowed values include one of:
|
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
ate.est.high: estimated ATEs in the multiple bi-level subgroups that are in the higher-than-cutoff category;
vector of size equal to the length of prop; always returned
ate.est.low: estimated ATEs in the multiple bi-level subgroups that are in the lower-than-cutoff category;
vector of size equal to the length of prop; returned only when onlyhigh
== FALSE
Scores are adjusted to the opposite sign if higher.y
== FALSE; scores stay the same if higher.y
== TRUE;
this is because subgroups defined in estmean.multilevel.subgroup() start from the lowest to the highest adjusted scores,
and higher adjusted scores should always represent high responders to trt = 1
estmean.multilevel.subgroup( y, x.cate, x.ps, trt, score, higher.y, prop, ps.method = "glm", minPS = 0.01, maxPS = 0.99 )
y |
Observed outcome; vector of size |
x.cate |
Matrix of |
x.ps |
Matrix of |
trt |
Treatment received; vector of size |
score |
Estimated CATE scores for all |
higher.y |
A logical value indicating whether higher ( |
prop |
Proportions corresponding to percentiles in the estimated CATE scores that define subgroups to calculate ATE for;
vector of floats in '[0, 1]' always starting with 0 and ending with 1:
Each element of |
ps.method |
A character value for the method to estimate the propensity score. Allowed values include one of:
|
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
estimated ATEs of all categories in the one multilevel subgroup; vector of size equal to the number of categories in the multilevel subgroup
If only the higher subgroup (above the cutoff) is of interest, only ate.rmtl.high and hr.high are needed, so set onlyhigh to TRUE.
Scores are adjusted to the opposite sign if higher.y
== FALSE; scores stay the same if higher.y
== TRUE;
this is because estsurv.bilevel.subgroups() always takes the subgroup with the highest adjusted scores,
and higher adjusted scores should always represent high responders to trt = 1.
estsurv.bilevel.subgroups( y, d, x.cate, x.ps, x.ipcw, trt, yf, tau0 = tau0, score, higher.y, prop, onlyhigh, surv.min = 0.025, ps.method = "glm", minPS = 0.01, maxPS = 0.99, ipcw.method = "breslow" )
y |
Observed survival or censoring time; vector of size |
d |
The event indicator, normally |
x.cate |
Matrix of |
x.ps |
Matrix of |
x.ipcw |
Matrix of |
trt |
Treatment received; vector of size |
yf |
Follow-up time, interpreted as the potential censoring time; vector of size |
tau0 |
The truncation time for defining restricted mean time lost. |
score |
Estimated log CATE scores for all |
higher.y |
A logical value indicating whether higher ( |
prop |
Proportions corresponding to percentiles in the estimated log CATE scores that define subgroups to calculate ATE for;
vector of floats in '(0, 1]' (if |
onlyhigh |
Indicator of returning only the ATEs in the higher-than-cutoff category of the bi-level subgroups; boolean. |
surv.min |
Lower truncation limit for probability of being censored (positive and very close to 0). |
ps.method |
A character value for the method to estimate the propensity score. Allowed values include one of:
|
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
ipcw.method |
The censoring model. Allowed values are: |
ate.rmtl.high: estimated ATEs (ratio of RMTL) in the multiple bi-level subgroups that are in the higher-than-cutoff category;
vector of size equal to the length of prop; always returned.
ate.rmtl.low: estimated ATEs (ratio of RMTL) in the multiple bi-level subgroups that are in the lower-than-cutoff category;
vector of size equal to the length of prop; returned only when onlyhigh = FALSE
.
hr.high: unadjusted hazard ratio in the multiple bi-level subgroups that are in the higher-than-cutoff category;
vector of size equal to the length of prop; always returned.
hr.low: unadjusted hazard ratio in the multiple bi-level subgroups that are in the lower-than-cutoff category;
vector of size equal to the length of prop; returned only when onlyhigh = FALSE
Scores are adjusted to the opposite sign if higher.y
== FALSE; scores stay the same if higher.y
== TRUE;
this is because the subgroups defined in estsurv.multilevel.subgroups() start from the lowest to the highest adjusted scores,
and higher adjusted scores should always represent high responders to trt = 1
estsurv.multilevel.subgroups( y, d, x.cate, x.ps, x.ipcw, trt, yf, tau0 = tau0, score, higher.y, prop, surv.min = 0.025, ps.method = "glm", minPS = 0.01, maxPS = 0.99, ipcw.method = "breslow" )
y |
Observed survival or censoring time; vector of size |
d |
The event indicator, normally |
x.cate |
Matrix of |
x.ps |
Matrix of |
x.ipcw |
Matrix of |
trt |
Treatment received; vector of size |
yf |
Follow-up time, interpreted as the potential censoring time; vector of size |
tau0 |
The truncation time for defining restricted mean time lost. |
score |
Estimated log CATE scores for all |
higher.y |
A logical value indicating whether higher ( |
prop |
Proportions corresponding to percentiles in the estimated log CATE scores that define subgroups to calculate ATE for;
vector of floats in '[0, 1]' always starting with 0 and ending with 1:
Each element of |
surv.min |
Lower truncation limit for probability of being censored (positive and very close to 0). |
ps.method |
A character value for the method to estimate the propensity score. Allowed values include one of:
|
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
ipcw.method |
The censoring model. Allowed values are: |
ate.rmtl: estimated ATEs (ratio of RMTL) of all categories in the one multilevel subgroup; vector of size equal to the number of categories in the multilevel subgroup.
ate.hr: unadjusted hazard ratio of all categories in the one multilevel subgroup; vector of size equal to the number of categories in the multilevel subgroup.
This function generates indices for K-fold cross-validation based on the total sample size 'N' and the number of folds 'Kfold'. If 'reverse = TRUE', the remainder indices will be assigned in reverse order.
generate_kfold_indices(N, Kfold, reverse = FALSE)
N |
Integer. Total sample size (number of observations). |
Kfold |
Integer. The number of folds to split the data into. |
reverse |
Logical. Whether to reverse the remainder indices when 'N' is not divisible by 'Kfold'. Defaults to 'FALSE'. |
A vector of length 'N' containing the fold assignments (from 1 to 'Kfold').
Thomas Debray
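A brief illustration of the fold assignment (the sample size and number of folds are arbitrary):
folds <- generate_kfold_indices(N = 103, Kfold = 5)
table(folds)       # fold sizes when N is not divisible by Kfold
folds_rev <- generate_kfold_indices(N = 103, Kfold = 5, reverse = TRUE)
table(folds_rev)   # leftover indices assigned in reverse order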
Propensity score based on a multivariate logistic regression with LASSO penalization on the two-way interactions
glm.ps(trt, x.ps, xnew = NULL, minPS = 0.01, maxPS = 0.99)
trt |
Treatment received; vector of size |
x.ps |
Matrix of |
xnew |
Matrix of |
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
The trimmed propensity score for each unit; vector of size n
(if xnew
is NULL) or m
Propensity score based on a multivariate logistic regression with main effects only
glm.simplereg.ps(trt, x.ps, xnew = NULL, minPS = 0.01, maxPS = 0.99)
trt |
Treatment received; vector of size |
x.ps |
A matrix of |
xnew |
A matrix of |
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A numerical value (in '(0, 1]') above which estimated propensity scores should be
truncated. Must be strictly greater than |
The estimated propensity score for each unit; vector of size n
(if xnew
is NULL) or m
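An illustrative, untested sketch with simulated data; whether x.ps should contain an explicit intercept column is an assumption here, and the function may require precmed::: if it is not exported.
set.seed(7)
n <- 300
x.ps <- cbind(1, age = rnorm(n), female = rbinom(n, 1, 0.5))  # intercept + 2 covariates (assumed layout)
trt <- rbinom(n, 1, plogis(0.5 * x.ps[, "age"]))
ps <- glm.simplereg.ps(trt = trt, x.ps = x.ps, minPS = 0.01, maxPS = 0.99)
summary(ps)                                                   # estimated (truncated) propensity scores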
Coefficients of the CATE estimated with boosting, naive Poisson, two regressions, contrast regression, and negative binomial
intxcount( y, trt, x.cate, x.ps, time, score.method = c("boosting", "poisson", "twoReg", "contrastReg", "negBin"), ps.method = "glm", minPS = 0.01, maxPS = 0.99, initial.predictor.method = "boosting", xvar.smooth = NULL, tree.depth = 2, n.trees.boosting = 200, B = 3, Kfold = 6, plot.gbmperf = TRUE, error.maxNR = 0.001, max.iterNR = 150, tune = c(0.5, 2), ... )
y |
Observed outcome; vector of size |
trt |
Treatment received; vector of size |
x.cate |
Matrix of |
x.ps |
Matrix of |
time |
Log-transformed person-years of follow-up; vector of size |
score.method |
A vector of one or multiple methods to estimate the CATE score.
Allowed values are: |
ps.method |
A character value for the method to estimate the propensity score.
Allowed values include one of:
|
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A number above which estimated propensity scores should be trimmed; scalar |
initial.predictor.method |
A character vector for the method used to get initial
outcome predictions conditional on the covariates in |
xvar.smooth |
A vector of characters indicating the name of the variables used as
the smooth terms if |
tree.depth |
A positive integer specifying the depth of individual trees in boosting
(usually 2-3). Used only if |
n.trees.boosting |
A positive integer specifying the maximum number of trees in boosting
(usually 100-1000). Used only if |
B |
A positive integer specifying the number of time cross-fitting is repeated in
|
Kfold |
A positive integer specifying the number of folds (parts) used in cross-fitting
to partition the data in |
plot.gbmperf |
A logical value indicating whether to plot the performance measures in
boosting. Used only if |
error.maxNR |
A numerical value > 0 indicating the minimum value of the mean absolute
error in Newton Raphson algorithm. Used only if |
max.iterNR |
A positive integer indicating the maximum number of iterations in the
Newton Raphson algorithm. Used only if |
tune |
A vector of 2 numerical values > 0 specifying tuning parameters for the
Newton Raphson algorithm. |
... |
Additional arguments for |
Depending on score.method, the output is a combination of the following:
result.boosting: Results of boosting fit and best iteration, for trt = 0 and trt = 1 separately
result.poisson: Naive Poisson estimator (beta1 - beta0); vector of length p.cate
+ 1
result.twoReg: Two regression estimator (beta1 - beta0); vector of length p.cate
+ 1
result.contrastReg: A list of the contrast regression results with 3 elements:
$delta.contrastReg: Contrast regression DR estimator; vector of length p.cate
+ 1
$sigma.contrastReg: Variance covariance matrix for delta.contrastReg; matrix of size p.cate
+ 1 by p.cate
+ 1
$converge.contrastReg: Indicator that the Newton Raphson algorithm converged for delta_0
; boolean
result.negBin: Negative binomial estimator (beta1 - beta0); vector of length p.cate
+ 1
best.iter: Largest best iterations for boosting (if used)
fgam: Formula applied in GAM (if used)
Coefficients of the CATE estimated with boosting, linear regression, two regressions, contrast regression, random forest, and generalized additive model
intxmean( y, trt, x.cate, x.init, x.ps, score.method = c("boosting", "gaussian", "twoReg", "contrastReg", "gam", "randomForest"), ps.method = "glm", minPS = 0.01, maxPS = 0.99, initial.predictor.method = "boosting", xvar.smooth.init, xvar.smooth.score, tree.depth = 2, n.trees.rf = 1000, n.trees.boosting = 200, B = 1, Kfold = 2, plot.gbmperf = TRUE, ... )
y |
Observed outcome; vector of size |
trt |
Treatment received; vector of size |
x.cate |
Matrix of |
x.init |
Matrix of |
x.ps |
Matrix of |
score.method |
A vector of one or multiple methods to estimate the CATE score.
Allowed values are: |
ps.method |
A character value for the method to estimate the propensity score.
Allowed values include one of:
|
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A number above which estimated propensity scores should be trimmed; scalar |
initial.predictor.method |
A character vector for the method used to get initial
outcome predictions conditional on the covariates in |
xvar.smooth.init |
A vector of characters indicating the name of the variables used as
the smooth terms if |
xvar.smooth.score |
A vector of characters indicating the name of the variables used as
the smooth terms if |
tree.depth |
A positive integer specifying the depth of individual trees in boosting
(usually 2-3). Used only if |
n.trees.rf |
A positive integer specifying the number of trees. Used only if
|
n.trees.boosting |
A positive integer specifying the maximum number of trees in boosting
(usually 100-1000). Used only if |
B |
A positive integer specifying the number of time cross-fitting is repeated in
|
Kfold |
A positive integer specifying the number of folds (parts) used in cross-fitting
to partition the data in |
plot.gbmperf |
A logical value indicating whether to plot the performance measures in
boosting. Used only if |
... |
Additional arguments for |
Depending on score.method, the output is a combination of the following:
result.boosting: Results of boosting fit and best iteration, for trt = 0 and trt = 1 separately
result.gaussian: Linear regression estimator (beta1 - beta0); vector of length p.cate
+ 1
result.twoReg: Two regression estimator (beta1 - beta0); vector of length p.cate
+ 1
result.contrastReg: A list of the contrast regression results with 3 elements:
$delta.contrastReg: Contrast regression DR estimator; vector of length p.cate
+ 1
$sigma.contrastReg: Variance covariance matrix for delta.contrastReg; matrix of size p.cate
+ 1 by p.cate
+ 1
result.randomForest: Results of random forest fit and best iteration, for trt = 0 and trt = 1 separately
result.gam: Results of generalized additive model fit and best iteration, for trt = 0 and trt = 1 separately
best.iter: Largest best iterations for boosting (if used)
fgam: Formula applied in GAM when initial.predictor.method = 'gam'
warn.fit: Warnings that occurred when fitting score.method
err.fit: Errors that occurred when fitting score.method
Coefficients of the CATE estimated with random forest, boosting, naive Poisson, two regressions, and contrast regression
intxsurv( y, d, trt, x.cate, x.ps, x.ipcw, yf = NULL, tau0, surv.min = 0.025, score.method = c("randomForest", "boosting", "poisson", "twoReg", "contrastReg"), ps.method = "glm", minPS = 0.01, maxPS = 0.99, ipcw.method = "breslow", initial.predictor.method = "randomForest", tree.depth = 3, n.trees.rf = 1000, n.trees.boosting = 150, B = 3, Kfold = 5, plot.gbmperf = TRUE, error.maxNR = 0.001, max.iterNR = 100, tune = c(0.5, 2), ... )
y |
Observed survival or censoring time; vector of size |
d |
The event indicator, normally |
trt |
Treatment received; vector of size |
x.cate |
Matrix of |
x.ps |
Matrix of |
x.ipcw |
Matrix of |
yf |
Follow-up time, interpreted as the potential censoring time; vector of size |
tau0 |
The truncation time for defining restricted mean time lost. |
surv.min |
Lower truncation limit for probability of being censored (positive and very close to 0). |
score.method |
A vector of one or multiple methods to estimate the CATE score.
Allowed values are: |
ps.method |
A character vector for the method to estimate the propensity score.
Allowed values include one of:
|
minPS |
A numerical value (in '[0, 1]') below which estimated propensity scores should be
truncated. Default is |
maxPS |
A number above which estimated propensity scores should be trimmed; scalar |
ipcw.method |
The censoring model. Allowed values are: |
initial.predictor.method |
A character vector for the method used to get initial
outcome predictions conditional on the covariates in |
tree.depth |
A positive integer specifying the depth of individual trees in boosting
(usually 2-3). Used only if |
n.trees.rf |
A positive integer specifying the maximum number of trees in random forest.
Used if |
n.trees.boosting |
A positive integer specifying the maximum number of trees in boosting
(usually 100-1000). Used if |
B |
A positive integer specifying the number of time cross-fitting is repeated in
|
Kfold |
A positive integer specifying the number of folds (parts) used in cross-fitting
to partition the data in |
plot.gbmperf |
A logical value indicating whether to plot the performance measures in
boosting. Used only if |
error.maxNR |
A numerical value > 0 indicating the minimum value of the mean absolute
error in Newton Raphson algorithm. Used only if |
max.iterNR |
A positive integer indicating the maximum number of iterations in the
Newton Raphson algorithm. Used only if |
tune |
A vector of 2 numerical values > 0 specifying tuning parameters for the
Newton Raphson algorithm. |
... |
Additional arguments for |
Depending on score.method, the output is a combination of the following:
result.randomForest: Results of random forest fit, for trt = 0 and trt = 1 separately
result.boosting: Results of boosting fit, for trt = 0 and trt = 1 separately
result.poisson: Naive Poisson estimator (beta1 - beta0); vector of length p.cate
+ 1
result.twoReg: Two regression estimator (beta1 - beta0); vector of length p.cate
+ 1
result.contrastReg: A list of the contrast regression results with 2 elements:
$delta.contrastReg: Contrast regression DR estimator; vector of length p.cate
+ 1
$converge.contrastReg: Indicator that the Newton Raphson algorithm converged for delta_0
; boolean
Probability of being censored, which is used to correct for the effect of right censoring.
ipcw.surv( y, d, x.ipcw, yf = NULL, ipcw.method = "breslow", tau0, surv.min = 0.025 )
y |
Observed survival or censoring time; vector of size |
d |
The event indicator, normally |
x.ipcw |
Matrix of |
yf |
Follow-up time, interpreted as the potential censoring time; vector of size |
ipcw.method |
The censoring model. Allowed values are: |
tau0 |
The truncation time for defining restricted mean time lost. |
surv.min |
Lower truncation limit for probability of being censored (positive and very close to 0). |
A vector of size n
with the estimated probabilities Pr(C > min(y, tau0) | x.ipcw)
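A hedged sketch of how these censoring probabilities could be turned into inverse probability of censoring weights, using simulated data (input shapes assumed from the argument descriptions; the function may require precmed::: if not exported):
set.seed(11)
n <- 250
x <- cbind(age = rnorm(n))
t.event <- rexp(n, 0.5)
t.cens  <- rexp(n, 0.3)
y <- pmin(t.event, t.cens)
d <- as.numeric(t.event <= t.cens)
tau0 <- unname(quantile(y, 0.9))
pr.cens <- ipcw.surv(y = y, d = d, x.ipcw = x, tau0 = tau0, surv.min = 0.025)
w.ipcw <- 1 / pr.cens   # inverse probability of censoring weights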
Storing the errors and warnings that occurred when estimating the ATEs in the nested subgroups. If there are no errors and no warnings, the estimated mean difference is provided. If there are warnings but no errors, the estimated mean difference is provided with a warning attribute attached. If there are errors, NA values are returned for the mean difference and an error attribute is attached.
meanCatch(fun)
fun |
The drsurv function... |
A list containing
A dataset containing a continuous outcome and 6 baseline covariates
data(meanExample)
A dataframe with 4000 rows (patients) and 9 variables:
age at baseline, centered to 48 years old, in years
sex, 0 for male, 1 for female
previous treatment, "drugA", "drugB", or "drugC"
previous medical cost, in US dollars
previous number of symptoms, "0", "1", or ">=2"
previous number of relapses
current treatment, "drug0" or "drug1"
continuous outcome
data(meanExample) str(meanExample)
Doubly robust estimators of the coefficients in the two regression
onearmglmcount.dr(y, x.cate, time, trt, ps, f.predictor)
y |
Observed outcome; vector of size |
x.cate |
Matrix of |
time |
Log-transformed person-years of follow-up; vector of size |
trt |
Treatment received; vector of size |
ps |
Estimated propensity scores for all observations; vector of size |
f.predictor |
Initial prediction of the outcome (expected number of relapses for one unit of exposure time) conditioned
on the covariates |
Doubly robust estimators of the regression coefficients beta_r in the doubly robust estimating equation where r = 0, 1 is treatment received; vector of size p + 1 (intercept included)
Doubly robust estimators of the coefficients in the two regression method
onearmglmmean.dr(y, x.cate, trt, ps, f.predictor)
y |
Observed outcome; vector of size |
x.cate |
Matrix of |
trt |
Treatment received; vector of size |
ps |
Estimated propensity scores for all observations; vector of size |
f.predictor |
Initial prediction of the outcome (expected number of relapses for one unit of exposure time) conditioned
on the covariates |
Doubly robust estimators of the regression coefficients beta_r in the doubly robust estimating equation where r = 0, 1 is treatment received; vector of size p + 1 (intercept included)
Doubly robust estimators of the coefficients in the two regression method
onearmsurv.dr(ynew, dnew, trt, x.cate, tau0, weightsurv, ps, f.predictor)
ynew |
Truncated survival or censoring time; vector of size |
dnew |
The event indicator after truncation, |
trt |
Treatment received; vector of size |
x.cate |
Matrix of |
tau0 |
The truncation time for defining restricted mean time lost. |
weightsurv |
Estimated inverse probability of censoring weights with truncation for all observations; vector of size |
ps |
Estimated propensity scores for all observations; vector of size |
f.predictor |
Initial prediction of the outcome (restricted mean time loss) conditioned on the covariates |
Doubly robust estimators of the two regression coefficients beta_r where r = 0, 1 is treatment received; vector of size p.cate + 1 (intercept included)
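For orientation, the restricted mean time lost (RMTL) up to tau0 referenced by these survival estimators can be illustrated with a small sketch; the data are simulated and a Kaplan-Meier fit is used, so this is not the package's internal code.
library(survival)
set.seed(11)
n <- 300
t.event <- rexp(n, 0.10); t.cens <- rexp(n, 0.05)   # hypothetical event and censoring times
y <- pmin(t.event, t.cens)
d <- as.numeric(t.event <= t.cens)
tau0 <- 12                                           # truncation time
km <- survfit(Surv(y, d) ~ 1)
# integrate the Kaplan-Meier step function up to tau0 to get restricted mean survival time
tt <- c(0, km$time[km$time <= tau0], tau0)
ss <- c(1, km$surv[km$time <= tau0])
rmst <- sum(diff(tt) * ss)
rmtl <- tau0 - rmst                                  # restricted mean time lost up to tau0
rmtl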
Histogram of bootstrap estimates
## S3 method for class 'atefit'
plot(x, bins, alpha = 0.7, title = waiver(), theme = theme_classic(), ...)
x |
An object of class |
bins |
Number of bins |
alpha |
Opacity |
title |
The text for the title |
theme |
Defaults to |
... |
Other parameters |
Create a histogram displaying the distribution of the bootstrap estimates. The red vertical reference line represents the final estimate.
A plot of class ggplot, displaying the estimated ATE across the bootstrap samples.
Thomas Debray
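The following is an illustrative sketch only, not the package's internal plotting code: it shows how a histogram of hypothetical bootstrap ATE estimates, with a red reference line at the point estimate, can be drawn with ggplot2.
library(ggplot2)
set.seed(42)
boot.est <- rnorm(500, mean = 0.3, sd = 0.1)   # hypothetical bootstrap ATE estimates
ggplot(data.frame(estimate = boot.est), aes(x = estimate)) +
  geom_histogram(bins = 30, alpha = 0.7) +
  geom_vline(xintercept = mean(boot.est), colour = "red") +   # reference line at the point estimate
  theme_classic()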
"precmed"
objectProvides validation curves in two side-by-side plots, visualizing the estimated ATEs in a series
of nested subgroups in the training set and validation set separately, where each line represents
one scoring method specified in catecv()
or catecvmean()
. This should be run
only after results of catecv()
or catecvmean()
have been obtained.
## S3 method for class 'precmed'
plot(x, cv.i = NULL, combine = "mean", show.abc = TRUE, valid.only = FALSE,
     plot.hr = FALSE, ylab = NULL, legend.position = "bottom", xlim = NULL,
     title = waiver(), theme = theme_classic(), ...)
x |
An object of class |
cv.i |
A positive integer indicating the index of the CV iteration results to be plotted.
Allowed values are: a positive integer |
combine |
A character value indicating how to combine the estimated ATEs across all CV
iterations into a validation curve for each nested subgroup, separately for the training and
validation results. Allowed values are: |
show.abc |
A logical value indicating whether to show the ABC statistics in the validation set. Used
only if |
valid.only |
A logical value indicating whether only the validation curves in the validation set
should be plotted ( |
plot.hr |
A logical value indicating whether the hazard ratios should be plotted in the
validation curves ( |
ylab |
A character value for the y-axis label to describe what the ATE is. Default is |
legend.position |
A character value for the legend position argument to be passed to |
xlim |
A numeric value for the range of the x-axis. Default is |
title |
The text for the title |
theme |
Defaults to |
... |
Other parameters |
plot()
takes in outputs from catecv()
and generates two plots
of validation curves side-by-side, one for the training set and one for the validation set.
Separate validation curves are produced for each scoring method specified via score.method
in catecv()
or catecvmean()
.
The validation curves (and ABC statistics, if applicable) can help compare the performance of different scoring methods in terms of discerning potential treatment heterogeneity in subgroups with internal validation. Steeper validation curves in the validation set suggest presence of treatment effect heterogeneity (and the ability of the scoring methods to capture it) while flat validation curves indicate absence of treatment effect heterogeneity (or inability of the scoring method to capture it).
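The nested subgroups underlying these curves are formed by ranking subjects on the estimated CATE score and retaining increasing proportions of the sample. A small illustrative sketch of how such nested subgroups can be formed follows; it is not the package's internal code, and the direction of the ranking and the proportions depend on higher.y and prop.cutoff in catecv().
set.seed(3)
score <- rnorm(200)                                   # hypothetical CATE scores
props <- seq(0.5, 1, by = 0.1)                        # proportions of the sample to retain
subgroups <- lapply(props, function(p) which(score <= quantile(score, p)))
sapply(subgroups, length)                             # nested subgroup sizes: 100, 120, ..., 200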
Returns two side-by-side line plots, one showing the validation curves in the training
sets and the other the validation curves in the validation sets. A gray horizontal dashed line
representing the overall ATE is included as a reference. ABC statistics will be added to the legend if
show.abc = TRUE
.
Yadlowsky, S., Pellegrini, F., Lionetto, F., Braune, S., & Tian, L. (2020). Estimation and validation of ratio-based conditional average treatment effects using observational data. Journal of the American Statistical Association, 1-18. DOI: 10.1080/01621459.2020.1772080.
abc()
and boxplot()
for "precmed"
objects.
# Count outcome
eval_1 <- catecv(response = "count",
                 data = countExample,
                 score.method = "poisson",
                 cate.model = y ~ age + female + previous_treatment +
                   previous_cost + previous_number_relapses + offset(log(years)),
                 ps.model = trt ~ age + previous_treatment,
                 higher.y = FALSE,
                 cv.n = 5)

# default setting
plot(eval_1)

# turn off ABC annotation
plot(eval_1, show.abc = FALSE)

# use a different theme
plot(eval_1, theme = ggplot2::theme_bw())

# plot the validation curves from the 2nd CV iteration instead of the mean
# of all validation curves
plot(eval_1, cv.i = 2)

# median of the validation curves
plot(eval_1, combine = "median")

# plot validation curves in validation set only
plot(eval_1, valid.only = TRUE)

# Survival outcome
library(survival)
tau0 <- with(survivalExample,
             min(quantile(y[trt == "drug1"], 0.95), quantile(y[trt == "drug0"], 0.95)))

eval_2 <- catecv(response = "survival",
                 data = survivalExample,
                 score.method = c("poisson", "randomForest"),
                 cate.model = Surv(y, d) ~ age + female + previous_cost +
                   previous_number_relapses,
                 ps.model = trt ~ age + previous_treatment,
                 initial.predictor.method = "randomForest",
                 ipcw.model = ~ age + previous_cost + previous_treatment,
                 tau0 = tau0,
                 cv.n = 5,
                 seed = 999)

# default setting, plot RMTL ratios in both training and validation sets
plot(eval_2)

# plot hazard ratio
plot(eval_2, plot.hr = TRUE)
Print function for atefit
## S3 method for class 'atefit'
print(x, ...)
x |
An object of class |
... |
Other parameters |
Display the estimated treatment effects for survival outcomes (log restricted mean time lost ratio and log hazard ratio) and count outcomes (the log rate ratio).
No return value
Thomas Debray
Print function for catefit
## S3 method for class 'catefit'
print(x, ...)
x |
An object of class |
... |
Other parameters |
Display the estimated treatment effects for survival outcomes (log restricted mean time lost ratio and log hazard ratio) and count outcomes (the log rate ratio).
No return value
Thomas Debray
Computes the CATE score based on the intxcount results of the CATE coefficients estimated with boosting, naive Poisson, two regression, contrast regression, or negative binomial.
scorecount( fit, x.cate, time, score.method = c("boosting", "poisson", "twoReg", "contrastReg", "negBin") )
fit |
List of objects generated from intxcount: outputs of boosting, naive Poisson, two regression, contrast regression, negative binomial |
x.cate |
Matrix of |
time |
Log-transformed person-years of follow-up; vector of size |
score.method |
A vector of one or multiple methods to estimate the CATE score.
Allowed values are: |
score.boosting: Estimated log CATE score for all n observations with the boosting method; vector of size n
score.poisson: Estimated log CATE score for all n observations with the naive Poisson method; vector of size n
score.twoReg: Estimated log CATE score for all n observations with the two regression method; vector of size n
score.contrastReg: Estimated log CATE score for all n observations with the contrast regression method; vector of size n
score.negBin: Estimated log CATE score for all n observations with the negative binomial method; vector of size n
score = NA if the corresponding method is not called
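As an illustration, and under the assumption that the regression-based scores are linear combinations of the covariates and the estimated coefficients (intercept included), a log CATE score could be computed as in the sketch below; the coefficients and covariates are hypothetical, not package output.
delta.hat <- c(0.10, -0.50, 0.30)                       # hypothetical intercept + 2 coefficients
x.cate <- cbind(age = c(40, 55), female = c(0, 1))      # hypothetical covariate matrix (n = 2)
score <- as.vector(cbind(1, x.cate) %*% delta.hat)      # log CATE score, one value per observation
score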
Computes the CATE score based on the intxmean results of the CATE coefficients estimated with boosting, linear regression, two regression, contrast regression, random forest, or generalized additive model.
scoremean( fit, x.cate, score.method = c("boosting", "gaussian", "twoReg", "contrastReg", "randomForest", "gam") )
fit |
List of objects generated from intxmean: outputs of boosting, linear regression, two regression, contrast regression, random forest, generalized additive model |
x.cate |
Matrix of |
score.method |
A vector of one or multiple methods to estimate the CATE score.
Allowed values are: |
score.boosting: Estimated CATE score for all n observations with the boosting method; vector of size n
score.gaussian: Estimated CATE score for all n observations with the linear regression method; vector of size n
score.twoReg: Estimated CATE score for all n observations with the two regression method; vector of size n
score.contrastReg: Estimated CATE score for all n observations with the contrast regression method; vector of size n
score.randomForest: Estimated CATE score for all n observations with the random forest method; vector of size n
score.gam: Estimated CATE score for all n observations with the generalized additive model; vector of size n
score = NA if the corresponding method is not called
Computes the CATE score based on the intxsurv results of the CATE coefficients estimated with random forest, boosting, naive Poisson, two regression, or contrast regression.
scoresurv( fit, x.cate, tau0, score.method = c("randomForest", "boosting", "poisson", "twoReg", "contrastReg") )
fit |
List of objects generated from intxsurv: outputs of random forest, boosting, naive Poisson, two regression, contrast regression |
x.cate |
Matrix of |
tau0 |
The truncation time for defining restricted mean time lost. |
score.method |
A vector of one or multiple methods to estimate the CATE score.
Allowed values are: |
score.randomForest: Estimated log CATE score for all n observations with the random forest method; vector of size n
score.boosting: Estimated log CATE score for all n observations with the boosting method; vector of size n
score.poisson: Estimated log CATE score for all n observations with the naive Poisson method; vector of size n
score.twoReg: Estimated log CATE score for all n observations with the two regression method; vector of size n
score.contrastReg: Estimated log CATE score for all n observations with the contrast regression method; vector of size n
score = NA if the corresponding method is not called
Stores the errors and warnings that occurred when estimating the ATEs in the nested subgroups. If there are no errors and no warnings, the estimated log.rmtl.ratio and log.hazard.ratio are returned. If there are warnings but no errors, the estimated log.rmtl.ratio and log.hazard.ratio are returned with a warning attribute set. If there are errors, NA values are returned for log.rmtl.ratio and log.hazard.ratio and an error attribute is also set.
survCatch(fun)
fun |
The drsurv function... |
A list containing
A dataset containing a time-to-event outcome, an event indicator, treatment group, and 6 baseline covariates
data(survivalExample)
A dataframe with 4000 rows (patients) and 9 variables:
age at baseline, centered to 48 years old, in years
sex, 0 for male, 1 for female
previous treatment, "drugA", "drugB", or "drugC"
previous medical cost, in US dollars
previous number of symptoms, "0", "1", or ">=2"
previous number of relapses
current treatment, "drug0" or "drug1"
time to first relapse or censoring
event indicator, 1: relapse, 0: censored
data(survivalExample)
str(survivalExample)
The Newton-Raphson algorithm is used to solve the estimating equation bar S_n(delta) = 0
twoarmglmcount.dr( y, x.cate, time, trt, ps, f1.predictor, f0.predictor, error.maxNR = 0.001, max.iterNR = 150, tune = c(0.5, 2) )
y |
Observed outcome; vector of size |
x.cate |
Matrix of |
time |
Log-transformed person-years of follow-up; vector of size |
trt |
Treatment received; vector of size |
ps |
Estimated propensity scores for all observations; vector of size |
f1.predictor |
Initial predictions of the outcome (expected number of relapses for one unit of exposure time)
conditioned on the covariates |
f0.predictor |
Initial predictions of the outcome (expected number of relapses for one unit of exposure time)
conditioned on the covariates |
error.maxNR |
A numerical value > 0 indicating the minimum value of the mean absolute
error in the Newton-Raphson algorithm. Used only if |
max.iterNR |
A positive integer indicating the maximum number of iterations in the
Newton-Raphson algorithm. Used only if |
tune |
A vector of 2 numerical values > 0 specifying tuning parameters for the
Newton-Raphson algorithm. |
coef: Doubly robust estimators of the regression coefficients delta_0; vector of size p + 1 (intercept included)
vcov: Variance-covariance matrix of the estimated coefficient delta_0; matrix of size p + 1 by p + 1
converge: Indicator of whether the Newton-Raphson algorithm converged for delta_0; boolean
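A generic Newton-Raphson sketch follows, illustrative only and not the internal solver, showing how error.maxNR and max.iterNR act as the stopping tolerance and the iteration cap; the estimating function S and Jacobian J are user-supplied here.
newton_raphson <- function(S, J, delta0, error.maxNR = 0.001, max.iterNR = 150) {
  delta <- delta0
  for (iter in seq_len(max.iterNR)) {
    delta <- delta - drop(solve(J(delta), S(delta)))   # Newton step: delta - J^{-1} S(delta)
    if (mean(abs(S(delta))) < error.maxNR)             # stop when the mean absolute value of S is small
      return(list(coef = delta, converge = TRUE, iter = iter))
  }
  list(coef = delta, converge = FALSE, iter = max.iterNR)
}
# Toy example: solve S(delta) = delta^3 - 8 = 0 (root at delta = 2)
S <- function(d) d^3 - 8
J <- function(d) matrix(3 * d^2)
newton_raphson(S, J, delta0 = 1)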
Solves the estimating equation bar S_n(delta) = 0
twoarmglmmean.dr(y, x.cate, trt, ps, f1.predictor, f0.predictor)
y |
Observed outcome; vector of size |
x.cate |
Matrix of |
trt |
Treatment received; vector of size |
ps |
Estimated propensity scores for all observations; vector of size |
f1.predictor |
Initial predictions of the outcome (expected number of relapses for one unit of exposure time)
conditioned on the covariates |
f0.predictor |
Initial predictions of the outcome (expected number of relapses for one unit of exposure time)
conditioned on the covariates |
coef: Doubly robust estimators of the regression coefficients delta_0; vector of size p + 1 (intercept included)
vcov: Variance-covariance matrix of the estimated coefficient delta_0; matrix of size p + 1 by p + 1
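Given coefficients and a variance-covariance matrix of this form, Wald-type 95% confidence intervals can be obtained as in the following sketch; the coef and vcov values below are hypothetical, not output from a real fit.
coef.hat <- c(`(Intercept)` = 0.20, age = -0.10)   # hypothetical delta_0 estimates
vcov.hat <- diag(c(0.040, 0.010))                  # hypothetical variance-covariance matrix
se <- sqrt(diag(vcov.hat))                         # standard errors from the diagonal
cbind(estimate = coef.hat,
      lower = coef.hat - qnorm(0.975) * se,
      upper = coef.hat + qnorm(0.975) * se)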
The Newton-Raphson algorithm is used to solve the estimating equation bar S_n(delta) = 0
twoarmsurv.dr( ynew, dnew, trt, x.cate, tau0, weightsurv, ps, f1.predictor, f0.predictor, error.maxNR = 0.001, max.iterNR = 100, tune = c(0.5, 2) )
ynew |
Truncated survival time; vector of size |
dnew |
Event indicator after truncation; vector of size |
trt |
Treatment received; vector of size |
x.cate |
Matrix of |
tau0 |
The truncation time for defining restricted mean time lost. |
weightsurv |
Estimated inverse probability of censoring weights with truncation for all observations; vector of size |
ps |
Estimated propensity scores for all observations; vector of size |
f1.predictor |
Initial predictions of the outcome (restricted mean time loss) conditioned on the covariates |
f0.predictor |
Initial predictions of the outcome (restricted mean time loss) conditioned on the covariates |
error.maxNR |
A numerical value > 0 indicating the minimum value of the mean absolute
error in the Newton-Raphson algorithm. Used only if |
max.iterNR |
A positive integer indicating the maximum number of iterations in the
Newton-Raphson algorithm. Used only if |
tune |
A vector of 2 numerical values > 0 specifying tuning parameters for the
Newton-Raphson algorithm. |
coef: Doubly robust estimators of the contrast regression coefficients delta_0; vector of size p.cate + 1 (intercept included)
converge: Indicator of whether the Newton-Raphson algorithm converged for delta_0; boolean