| Title: | Joint Covariance and Treatment-Effect Tests for Multiple Outcomes |
|---|---|
| Description: | Fits generalized linear models, Cox proportional-hazards models, log-rank tests, generalized estimating equations, mixed models with repeated measures, Kaplan-Meier curves, quantile differences, and hierarchical net-benefit (win-difference) and log win-ratio statistics jointly across multiple endpoints, and returns the full asymptotic covariance matrix linking them. Implements PATED (Prognostic Assisted Treatment Effect Detection), a randomized-trial method that exploits balanced prognostic covariates to tighten standard errors and increase statistical power without introducing bias. |
| Authors: | Han Zhang [aut, cre] |
| Maintainer: | Han Zhang <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.18.1 |
| Built: | 2026-06-15 15:57:46 UTC |
| Source: | https://github.com/cran/multipleOutcomes |
actg dataset from Hosmer et al.
A data frame
Identification Code
Time to AIDS diagnosis or death (days).
Event indicator. 1 = AIDS defining diagnosis, 0 = Otherwise.
Time to death (days)
Event indicator for death (only). 1 = Death, 0 = Otherwise.
Treatment indicator. 1 = Treatment includes IDV, 0 = Control group.
Treatment group indicator. 1 = ZDV + 3TC. 2 = ZDV + 3TC + IDV. 3 = d4T + 3TC. 4 = d4T + 3TC + IDV.
CD4 stratum at screening. 0 = CD4 <= 50. 1 = CD4 > 50.
0 = Male. 1 = Female.
Race/Ethnicity. 1 = White Non-Hispanic. 2 = Black Non-Hispanic. 3 = Hispanic. 4 = Asian, Pacific Islander. 5 = American Indian, Alaskan Native. 6 = Other/unknown.
IV drug use history. 1 = Never. 2 = Currently. 3 = Previously.
Hemophiliac. 1 = Yes. 0 = No.
Karnofsky Performance Scale. 100 = Normal; no complaint no evidence of disease. 90 = Normal activity possible; minor signs/symptoms of disease. 80 = Normal activity with effort; some signs/symptoms of disease. 70 = Cares for self; normal activity/active work not possible.
Baseline CD4 count (Cells/Milliliter).
Months of prior ZDV use (months).
Age at Enrollment (years).
ftp://ftp.wiley.com/public/sci_tech_med/survival
Hosmer, D.W. and Lemeshow, S. and May, S. (2008) Applied Survival Analysis: Regression Modeling of Time to Event Data: Second Edition, John Wiley and Sons Inc., New York, NY
data(actg)data(actg)
coef is a generic function.
## S3 method for class 'jointCovariance' coef(object, model_index = NULL, ...)## S3 method for class 'jointCovariance' coef(object, model_index = NULL, ...)
object |
an object returned by |
model_index |
|
... |
for debugging only |
a vector of coefficient estimates
coxph_ is a wrapper function of survival::coxph to create an object to be
passed into jointCovariance, the main function of this package through its
argument .... The object defines how a proportional hazard model would be
fitted.
coxph_(formula, data_index = 1)coxph_(formula, data_index = 1)
formula |
see |
data_index |
integer. Index of the data frame in the |
Not all arguments of survival::coxph are supported in coxph_ due to the
complexity in handling environment and scope, which is particularly difficult
for arguments like weights, subset, etc.
gee_ is a wrapper function of gee::gee to create an object to be passed
into jointCovariance, the main function of this package through its
argument .... The object defines how a GEE model would be fitted.
This package does not import the package gee. Instead, codes of gee are
modified and integrated to compute score and information matrix. Thus, users
does not need to install the package gee to use this package.
gee_(formula, family, corstr, R = NULL, b = NULL, Mv = 1, data_index = 1)gee_(formula, family, corstr, R = NULL, b = NULL, Mv = 1, data_index = 1)
formula |
see |
family |
see |
corstr |
see |
R |
see |
b |
see |
Mv |
see |
data_index |
integer. Index of the data frame in the |
Not all arguments of stats::gee are supported in gee_ due to the
complexity in handling environment and scope, which is particularly difficult
for arguments like subset, etc.
glm_ is a wrapper function of stats::glm to create an object to be passed
into jointCovariance, the main function of this package through its
argument .... The object defines how a GLM model would be fitted.
glm_(formula, family, data_index = 1)glm_(formula, family, data_index = 1)
formula |
see |
family |
currently supports |
data_index |
integer. Index of the data frame in the |
Not all arguments of stats::glm are supported in glm_ due to the
complexity in handling environment and scope, which is particularly difficult
for arguments like weights, subset, etc.
Patient-level data from a multi-center, randomized, double-blind, placebo-controlled 2-arm trial (n = 602) of rectal indomethacin (100 mg) versus placebo to prevent post-ERCP pancreatitis in high-risk patients, as reported by Elmunzer, Higgins, et al. (2012) in the New England Journal of Medicine.
This dataset was originally collected, cleaned, reformatted, and released
for public teaching and research use by Dr. Peter D. R. Higgins in the
medicaldata R package as indo_rct. The version shipped here is
redistributed in support of the worked examples in this package. The
variable definitions below follow medicaldata; note that a few
columns are stored as numeric (rather than factor) in this copy. Users who
need the authoritative copy or accompanying documentation should consult
medicaldata.
A data frame with 602 observations on the following 33 variables:
idSubject identifier (numeric); leading digit indicates center. Range 1001–4003.
siteStudy site (factor, 4 levels):
1_UM = University of Michigan,
2_IU = Indiana University,
3_UK = University of Kentucky,
4_Case = Case Western Reserve University.
ageAge in years (numeric), range 19–90.
riskRisk score for post-ERCP pancreatitis (numeric), range 1–5.5.
genderSex (factor): 1_female, 2_male.
outcomePrimary outcome: post-ERCP pancreatitis (numeric, 1 = yes, 0 = no).
sodSphincter of Oddi dysfunction present (factor):
0_no, 1_yes.
pepHistory of prior post-ERCP pancreatitis (factor):
0_no, 1_yes.
recpancHistory of recurrent pancreatitis (factor):
0_no, 1_yes.
psphincPancreatic sphincterotomy performed (factor):
0_no, 1_yes.
precutSphincter pre-cut needed to enter papilla (factor):
0_no, 1_yes.
difcanCannulation of papilla was difficult (factor):
0_no, 1_yes.
pneudilPneumatic dilation of papilla performed (factor):
0_no, 1_yes.
ampAmpullectomy performed (factor):
0_no, 1_yes.
paninjContrast injected into pancreas (factor):
0_no, 1_yes.
acinarPancreas appeared to have acinarization on imaging
(factor): 0_no, 1_yes.
brushBrushings taken from pancreatic duct (factor):
0_no, 1_yes.
asa81Aspirin used at 81 mg per day (factor with 3 levels):
0_no, 1_yes, and a third level retained from the
source coding.
asa325Aspirin used at 325 mg per day (factor with 3
levels): 0_no, 1_yes, and a third level retained from
the source coding.
asaAspirin used at any dose (factor with 3 levels):
0_no, 1_yes, and a third level retained from the
source coding.
prophystentPancreatic duct stent placed per endoscopist
judgment (factor): 0_no, 1_yes.
therastentPancreatic duct stent placed to treat narrowing
(factor): 0_no, 1_yes.
pdstentPancreatic duct stent placed for any reason
(factor): 0_no, 1_yes.
sodsomSphincter of Oddi manometry performed (factor):
0_no, 1_yes.
bsphincBiliary sphincterotomy performed (factor):
0_no, 1_yes.
bstentBiliary stent placed to relieve obstruction
(factor): 0_no, 1_yes.
choleCholedocholithiasis present (factor):
0_no, 1_yes.
pbmalBiliary duct or pancreatic malignancy found (factor):
0_no, 1_yes.
trainTrainee participated in ERCP (factor):
0_no, 1_yes.
statusPatient status (factor):
0_inpatient, 1_outpatient.
typeSphincter of Oddi dysfunction type (factor):
0_no SOD, 1_type 1, 2_type 2, 3_type 3.
rxTreatment assignment (numeric, 1 = indomethacin, 0 = placebo).
bleedReportable gastrointestinal bleeding (numeric,
coded 1 = no, 2 = yes; NA when not assessed).
Higgins, P. D. R. medicaldata: Data Package for Medical Datasets.
R package, dataset indo_rct.
https://CRAN.R-project.org/package=medicaldata
Elmunzer BJ, Higgins PDR, Saini SD, et al. A randomized trial of rectal indomethacin to prevent post-ERCP pancreatitis. New England Journal of Medicine 2012; 366(15):1414–1422. doi:10.1056/NEJMoa1111103
data(indo)data(indo)
jointCovariance can fit different types of models for multiple outcomes
simultaneously and return model parameters and variance-covariance matrix
for further analysis.
jointCovariance(..., data, nboot = 0, compute_cov = TRUE, seed = NULL)jointCovariance(..., data, nboot = 0, compute_cov = TRUE, seed = NULL)
... |
objects returned by |
data |
a data frame if all models are fitted on the same dataset;
otherwise a list of data frames for fitting models in |
nboot |
non-zero integer if bootstrap is adopted. By default 0. |
compute_cov |
logic. If |
seed |
random seed when generate bootstrap data. |
It returns an object of class "jointCovariance", which is a list containing the following components:
coefficients |
an unnamed vector of coefficients of all fitted models.
Use id_map for variable mapping. |
mcov |
a unnamed matrix of covariance of coefficients. Use id_map
for variable mapping. |
id_map |
a list mapping the elements in coefficients and mcov to
variable names. |
n_shared_sample_sizes |
a matrix of shared sample sizes between datasets being used to fit the models. |
call |
the matched call. |
## More examples can be found in the vignettes. library(survival) library(mvtnorm) library(tidyr) genData <- function(seed = NULL){ set.seed(seed) n <- 300 sigma <- matrix(.7, 4, 4) diag(sigma) <- 1 v <- rmvnorm(n, sigma = sigma) x1 <- v[, 1] x2 <- v[, 2] z1 <- (v[, 3] > 0) + 0 z2 <- v[, 4] trt <- rbinom(n, 1, .5) bet <- c(-.3,.3) y <- -log(runif(n))/ exp(-.3 * x1 + .3 * x2 + z1 * .5 - z2 * .3 + .1 * trt + rnorm(n)) z1[sample.int(n, 50)] <- NA z2[sample.int(n, 50)] <- NA x1[sample.int(n, 50)] <- NA x2[sample.int(n, 50)] <- NA death <- ifelse(y > 2, 0, 1) y[y > 2] <- 2 pid <- paste0('pid-', 1:n) ret <- data.frame( y = y, trt = trt, z1 = z1, z2 = z2, x1 = x1, x2 = x2, death, pid) ret } dat1 <- genData() ## create a dataset with repeated measurements x dat2 <- dat1 %>% pivot_longer(c(x1, x2), names_to='visit', values_to='x') %>% dplyr::select(x, trt, visit, pid) %>% as.data.frame() dat2$visit <- as.factor(dat2$visit) dat2$pid <- as.factor(dat2$pid) fit <- jointCovariance( coxph_(Surv(time = y, event = death) ~ trt, data_index = 1), logrank_(Surv(time = y, event=death) ~ trt, data_index = 1), glm_(z1 ~ trt, family = 'binomial', data_index = 1), glm_(z2 ~ trt, family = 'gaussian', data_index = 1), mmrm_(x ~ trt + us(visit | pid), reml = TRUE, data_index = 2), gee_(x ~ trt, family = 'gaussian', corstr = 'independence', data_index = 2), data = list(dat1, dat2)) fit bfit <-jointCovariance( coxph_(Surv(time=y, event=death) ~ trt, data_index = 1), logrank_(Surv(time=y, event=death) ~ trt, data_index = 1), glm_(z1 ~ trt, family = 'binomial', data_index = 1), glm_(z2 ~ trt, family = 'gaussian', data_index = 1), mmrm_(x ~ trt + us(visit | pid), reml = TRUE, data_index = 2), gee_(x ~ trt, family = 'gaussian', corstr = 'independence', data_index = 2), data = list(dat1, dat2), nboot = 10) summary(bfit) ## km_() and quantile_() require nboot > 0 because they have no ## closed-form score. compute_cov is forced to FALSE for km_(). ## When all models share one dataset, `data_index` and `list(...)` ## can be omitted. kfit <- jointCovariance( km_(Surv(time = y, event = death) ~ trt, conf_type = 'log', times = c(0.5, 1, 1.5)), glm_(z1 ~ trt, family = 'binomial'), data = dat1, nboot = 30, seed = 1) qfit <- jointCovariance( quantile_(y ~ trt, probs = c(0.25, 0.5, 0.75)), glm_(z2 ~ trt, family = 'gaussian'), data = dat1, nboot = 30, seed = 1)## More examples can be found in the vignettes. library(survival) library(mvtnorm) library(tidyr) genData <- function(seed = NULL){ set.seed(seed) n <- 300 sigma <- matrix(.7, 4, 4) diag(sigma) <- 1 v <- rmvnorm(n, sigma = sigma) x1 <- v[, 1] x2 <- v[, 2] z1 <- (v[, 3] > 0) + 0 z2 <- v[, 4] trt <- rbinom(n, 1, .5) bet <- c(-.3,.3) y <- -log(runif(n))/ exp(-.3 * x1 + .3 * x2 + z1 * .5 - z2 * .3 + .1 * trt + rnorm(n)) z1[sample.int(n, 50)] <- NA z2[sample.int(n, 50)] <- NA x1[sample.int(n, 50)] <- NA x2[sample.int(n, 50)] <- NA death <- ifelse(y > 2, 0, 1) y[y > 2] <- 2 pid <- paste0('pid-', 1:n) ret <- data.frame( y = y, trt = trt, z1 = z1, z2 = z2, x1 = x1, x2 = x2, death, pid) ret } dat1 <- genData() ## create a dataset with repeated measurements x dat2 <- dat1 %>% pivot_longer(c(x1, x2), names_to='visit', values_to='x') %>% dplyr::select(x, trt, visit, pid) %>% as.data.frame() dat2$visit <- as.factor(dat2$visit) dat2$pid <- as.factor(dat2$pid) fit <- jointCovariance( coxph_(Surv(time = y, event = death) ~ trt, data_index = 1), logrank_(Surv(time = y, event=death) ~ trt, data_index = 1), glm_(z1 ~ trt, family = 'binomial', data_index = 1), glm_(z2 ~ trt, family = 'gaussian', data_index = 1), mmrm_(x ~ trt + us(visit | pid), reml = TRUE, data_index = 2), gee_(x ~ trt, family = 'gaussian', corstr = 'independence', data_index = 2), data = list(dat1, dat2)) fit bfit <-jointCovariance( coxph_(Surv(time=y, event=death) ~ trt, data_index = 1), logrank_(Surv(time=y, event=death) ~ trt, data_index = 1), glm_(z1 ~ trt, family = 'binomial', data_index = 1), glm_(z2 ~ trt, family = 'gaussian', data_index = 1), mmrm_(x ~ trt + us(visit | pid), reml = TRUE, data_index = 2), gee_(x ~ trt, family = 'gaussian', corstr = 'independence', data_index = 2), data = list(dat1, dat2), nboot = 10) summary(bfit) ## km_() and quantile_() require nboot > 0 because they have no ## closed-form score. compute_cov is forced to FALSE for km_(). ## When all models share one dataset, `data_index` and `list(...)` ## can be omitted. kfit <- jointCovariance( km_(Surv(time = y, event = death) ~ trt, conf_type = 'log', times = c(0.5, 1, 1.5)), glm_(z1 ~ trt, family = 'binomial'), data = dat1, nboot = 30, seed = 1) qfit <- jointCovariance( quantile_(y ~ trt, probs = c(0.25, 0.5, 0.75)), glm_(z2 ~ trt, family = 'gaussian'), data = dat1, nboot = 30, seed = 1)
km_ is a wrapper function creating an object of Kaplan-Meier curve to be
passed into jointCovariance, the main function of this package through its
argument .... The object defines how a Kaplan-Meier curve would be fitted.
km_(formula, times = NULL, conf_type, data_index = 1)km_(formula, times = NULL, conf_type, data_index = 1)
formula |
a formula created by |
times |
numeric vector of time. Survival probabilities at |
conf_type |
character. Type of confidence interval. It must be one of
|
data_index |
integer. Index of the data frame in the |
Usually, g-transformation is applied to the survival probability S(t)
to obtain pointwise confidence interval of a Kaplan-Meier curve. This
can be achieved by specifying conf_type. For identity transformation,
use conf_type = "plain".
This function can only be used with jointCovariance when the bootstrap
method is used to estimate variance-covariance matrix of multiple outcome
models.
logrank_ is a wrapper function survival::coxph to create an object to be
passed into jointCovariance, the main function of this package through its
argument .... Logrank test is the score test under the proportional hazards
regression model. The object defines how a logrank test would be computed.
logrank_(formula, ties = c("efron", "breslow", "exact"), data_index = 1)logrank_(formula, ties = c("efron", "breslow", "exact"), data_index = 1)
formula |
see |
ties |
character string specifying the method for tie handling. One of
|
data_index |
integer. Index of the data frame in the |
Not all arguments of survival::coxph are supported in logrank_ due to the
complexity in handling environment and scope, which is particularly difficult
for arguments like weights, subset, etc.
mmrm_ is a wrapper function of mmrm::mmrm to create an object to be passed
into jointCovariance, the main function of this package through its
argument .... The object defines how a MMRM model would be fitted.
mmrm_( formula, covariance = NULL, reml = TRUE, control = mmrm::mmrm_control(...), ..., data_index = 1 )mmrm_( formula, covariance = NULL, reml = TRUE, control = mmrm::mmrm_control(...), ..., data_index = 1 )
formula |
see |
covariance |
see |
reml |
see |
control |
see |
... |
see |
data_index |
integer. Index of the data frame in the |
The argument weights of mmrm::mmrm is supported in mmrm_ due to the
complexity in handling environment and scope.
Please always refer to help document of mmrm::mmrm before using mmrm_.
For example, time variable and observation ID must be factor variables in
some cases, otherwise error may be prompted. Users can call mmrm::mmrm
using the same arguments being passed to mmrm_ to check validity.
These helpers build the individual endpoint specifications that
netbenefit_() consumes through its endpoints argument. Each call
returns a small list describing one comparison rule; the order in
which they appear in the endpoints list defines the hierarchical
priority (first = highest).
nb_tte( time, event, direction = c("longer_better", "shorter_better"), margin = 0, censor_rule = c("informative", "ignore") ) nb_continuous( value, direction = c("larger_better", "smaller_better"), margin = 0 ) nb_binary(value, direction = c("larger_better", "smaller_better"))nb_tte( time, event, direction = c("longer_better", "shorter_better"), margin = 0, censor_rule = c("informative", "ignore") ) nb_continuous( value, direction = c("larger_better", "smaller_better"), margin = 0 ) nb_binary(value, direction = c("larger_better", "smaller_better"))
time |
character. Name of the time column in |
event |
character. Name of the event-indicator column in |
direction |
one of |
margin |
non-negative numeric. Minimum directional gap required to declare a winner on this endpoint. Defaults to 0. |
censor_rule |
( |
value |
character. Name of the outcome column in |
An object of class c("nb_endpoint_<type>", "nb_endpoint")
carrying the rule and its arguments.
For each endpoint type, direction says whether a larger value (or
longer time, for TTE) counts as a win for the subject. With the
default convention, a treatment subject "wins" the endpoint when its
value, possibly minus a margin, exceeds the control subject's value.
Flip the default for endpoints where smaller (or shorter) is better
(e.g., pain scores; time to symptom relief).
margin is a non-negative clinically meaningful difference threshold.
A pair only declares a winner when the directional gap strictly exceeds
margin; otherwise the pair ties at this endpoint and falls through to
the next one. nb_binary() does not expose margin because binary
values admit only two outcomes.
nb_tte() accepts a censor_rule argument. The default
"informative" matches the conventional Pocock/Buyse rule: if one
subject is censored and the other observed, a winner can only be
declared in the direction the censoring is uninformative about (i.e.,
the censored time is already past the event time). "ignore" drops
pairs where either subject is censored and treats them as ties at this
level.
netbenefit_ creates an object to be passed into jointCovariance or
pated through its ... argument. The object defines a hierarchical
net-benefit (a.k.a. win-difference, proportion-in-favor) statistic
across a list of endpoints, each declared by nb_tte(),
nb_continuous(), or nb_binary().
netbenefit_(formula, endpoints, data_index = 1)netbenefit_(formula, endpoints, data_index = 1)
formula |
a two-sided R formula |
endpoints |
a non-empty list of endpoint specs built by |
data_index |
integer. Index of the data frame in the |
The estimator is
where , , and are the numbers of treatment
wins, losses, and overall ties across all pairs
(control vs. treatment subject). The per-subject influence function is
available in closed form, so both the asymptotic and the bootstrap paths
of jointCovariance are supported.
The arm reference level is inferred from the arm column the same way
model.matrix(~ arm) would: levels(arm)[1] for factor, the smaller
value for numeric or logical, and the alphabetically first value for
character. To override, convert the column to a factor with the desired
level order before calling netbenefit_().
An object of class c("jc_spec_netbenefit", "jc_spec").
nb_tte(), nb_continuous(), nb_binary(), jointCovariance(),
pated().
Patient-level data (primary intention-to-treat population, n = 850) from OAK, an international, open-label, randomized Phase III trial (NCT02008227) comparing the anti-PD-L1 antibody atezolizumab (MPDL3280A, 1200 mg IV every 3 weeks) with docetaxel (75 mg/m^2 IV every 3 weeks) as second- or third-line therapy in patients with previously treated locally advanced or metastatic non-small-cell lung cancer (NSCLC). Patients were randomized 1:1 to the two arms.
The variables retained here are those released in Supplementary
Table 8 of Gandara et al. (2018), Nature Medicine, who
retrospectively assayed banked baseline plasma samples for
blood-based tumor mutational burden (bTMB) and reported the
biomarker-evaluable analyses on this primary ITT cohort. The shipped
copy applies the following purely cosmetic transformations to the
source: source variable names lower-cased; categorical values
lower-cased; the string "." (the source's missing code)
converted to NA; numeric-looking character columns coerced to
numeric; and the event indicators inverted so that 1 = event,
0 = censored (the source uses the opposite convention). Categorical
variables are stored as character vectors (not factors);
convert to factor on demand with factor() if a modeling
function requires it. No rows are dropped: the full primary ITT
population is included.
Authoritative reference. This documentation is a convenience summary; users should consult the original publication and its supplementary materials (Gandara et al. 2018; Rittmeyer et al. 2017) for the definitive description of the trial design, eligibility criteria, endpoint definitions, and assay protocol.
A data frame with 850 observations on 27 variables:
idAnonymized patient identifier; integer 301–1150. (identifier)
armRandomized treatment assignment, character:
"docetaxel" or "atezolizumab".
(treatment assignment)
pfsProgression-free survival, months (numeric). (post-randomization outcome)
pfs_eventPFS event indicator, integer: 1 = progression
or death, 0 = censored. Inverted from the source's
PFS.CNSR. (post-randomization outcome)
osOverall survival, months (numeric). (post-randomization outcome)
os_eventOS event indicator, integer: 1 = death,
0 = censored. Inverted from the source's OS.CNSR.
(post-randomization outcome)
bcorBest confirmed overall response per investigator
(RECIST 1.1), character: "CR", "PR", "SD", "PD", "NE";
NA when not assessable.
(post-randomization outcome)
btmbBlood-based tumor mutational burden: count of
somatic base substitutions on a 394-gene, 1.1 Mb panel
of cell-free DNA from the baseline plasma sample (numeric);
NA when the assay was not run or failed QC.
(baseline characteristic; predictive of atezolizumab
benefit at ; not strongly prognostic in itself)
msafMaximum somatic allele frequency in the baseline cfDNA sample (numeric, 0–1); a tumor-content / shedding surrogate. (baseline characteristic; weakly prognostic)
cfdna_ngcfDNA mass entering NGS library construction (nanograms, numeric, capped at 100 ng). (baseline characteristic; not prognostic — assay input)
coverageMedian exon sequencing coverage (numeric). (baseline characteristic; not prognostic — assay quality)
qcbTMB assay QC status, character: "pass",
"fail"; NA when not run.
(baseline characteristic; not prognostic — assay quality)
bepLogical flag for the biomarker-evaluable population
used in the bTMB analyses: coverage >= 800 & qc == "pass" &
msaf >= 0.01 (n = 642 of 850).
(baseline derived flag; not prognostic in itself)
ageBaseline age, years (numeric); range 33–85. (baseline characteristic; modestly prognostic)
sexSex, character: "F", "M".
(baseline characteristic; weakly prognostic; correlates
with smoking and bTMB)
raceRace, character: "white", "asian",
"other".
(baseline characteristic; not robustly prognostic in this
setting)
histologyTumor histology, character:
"non-squamous", "squamous".
(baseline characteristic; modestly prognostic)
ecogECOG performance status at baseline, integer (0 or 1). (baseline characteristic; strongly prognostic for PFS and OS)
prior_txNumber of prior lines of systemic therapy, integer (1 or 2). (baseline characteristic; prognostic — patients with 2 prior lines tend to do worse)
smokingSmoking history, character: "never",
"previous", "current".
(baseline characteristic; modestly prognostic and also a
correlate of IO benefit through bTMB)
sldBaseline sum of longest diameters of target lesions per RECIST 1.1, mm (numeric). (baseline characteristic; strongly prognostic — tumor burden)
metsitesNumber of metastatic sites at enrollment, integer. (baseline characteristic; strongly prognostic — disease burden)
krasKRAS mutation status from CRF, character:
"negative", "positive"; NA when not tested.
(baseline characteristic; weakly / inconsistently
prognostic; possible predictive signal for IO)
egfrEGFR mutation status from CRF, character:
"negative", "positive"; NA when not tested.
(baseline characteristic; weakly prognostic in IO-treated
populations; strongly predictive of differential treatment
effect — EGFR-positive patients typically derive less benefit
from anti-PD-L1)
eml4EML4-ALK rearrangement status from CRF,
character: "negative", "positive"; NA when not
tested.
(baseline characteristic; analogous to egfr in
interpretation — predictive of reduced IO benefit)
tc1ic1PD-L1 IHC (Ventana SP142) dichotomized at the
TC1IC1 cut-point, character: "low" = TC0 and IC0;
"high" = TC1/2/3 or IC1/2/3; NA when unknown.
(baseline characteristic; predictive of atezolizumab
benefit; weakly prognostic)
tc3ic3PD-L1 IHC dichotomized at the TC3IC3
cut-point, character: "low" = TC0/1/2 and IC0/1/2;
"high" = TC3 or IC3; NA when unknown.
(baseline characteristic; predictive of atezolizumab
benefit; weakly prognostic)
OAK's trial-level eligibility criteria (locally advanced or metastatic
NSCLC after platinum failure, ECOG 0–1, measurable disease per
RECIST 1.1, no prior immune-checkpoint therapy, adequate organ
function, etc.) were applied before randomization, so the
n = 850 primary ITT cohort is randomized 1:1. The bTMB assay was
performed retrospectively on banked baseline plasma; all assay-derived
variables therefore reflect quantities present before treatment
exposure but were generated post-randomization in calendar time.
EGFR, EML4-ALK, and PD-L1 IHC are recorded at screening
and can be used to subset without compromising randomization.
The published bTMB analyses (PFS and OS hazard ratios at the
btmb >= 16 cut-point) used the subset
bep & !(egfr %in% "positive") & !(eml4 %in% "positive") (n = 583).
OAK published a primary ITT of 850 patients and a secondary ITT of
1225 patients; only the primary ITT is included here. Variant-level
information (per-mutation chromosome, position, gene, protein change,
etc.) was released alongside this table but is not included here.
Each variable in the format section above carries a role tag in italic parentheses at the end of its description. The tags combine two dimensions:
Baseline characteristic — measured (or, for the bTMB assay, derived from material collected) at the baseline visit before the first dose of study treatment. Using such variables as covariates does not break randomization in expectation, because they are functions of pre-treatment quantities only.
Prognostic — based on the clinical-oncology literature for previously treated advanced NSCLC, whether the variable has been reported to be associated with PFS or OS independent of treatment assignment. “Predictive” indicates association with differential atezolizumab vs docetaxel benefit rather than with outcome per se. Strength qualifiers (strong / modest / weak) are the author's reading of the literature and are no substitute for a formal analysis on the dataset at hand.
Variables that are outcomes, identifiers, or treatment assignment carry a single role tag in place of the two-dimensional tag.
For users planning to apply PATED or any analysis that relies on baseline covariates being balanced across arms:
age, sex, race, histology, ecog, prior_tx,
smoking, sld, metsites, kras, egfr, eml4, tc1ic1, tc3ic3.
btmb, msaf, cfdna_ng, coverage, qc, bep.
pfs, pfs_event, os, os_event, bcor.
Conditioning on the biomarker-evaluable population
(bep == TRUE, n = 642) or further restricting to
!(egfr %in% "positive") & !(eml4 %in% "positive") (the paper's primary
bTMB analysis subgroup, n = 583) is a selection on pre-treatment
variables and does not break randomization in expectation; Supp.
Tables 4 and 5 of the source paper confirm empirical arm-balance
(no nominally significant imbalance on age, race, ECOG, histology,
prior lines, KRAS, or PD-L1 IHC; the btmb >= 16
subgroup is enriched for males and for smokers, but those are
biomarker–covariate associations, not arm–covariate associations).
Users should still verify arm-balance on their working subset (e.g.
with tableone) before invoking methods that assume it.
Built from Supplementary Table 8 (Excel file MOESM3) of:
Gandara DR, Paul SM, Kowanetz M, Schleifman E, Zou W, Li Y,
Rittmeyer A, Fehrenbacher L, Otto G, Malboeuf C, Lieber DS, Lipson D,
Silterra J, Amler L, Riehl T, Cummings CA, Hegde PS, Sandler A,
Ballinger M, Fabrizio D, Mok T, Shames DS. Blood-based tumor
mutational burden as a predictor of clinical benefit in non-small-cell
lung cancer patients treated with atezolizumab. Nature Medicine
2018; 24(9):1441–1448. doi:10.1038/s41591-018-0134-3.
Supplementary file downloaded from
https://www.nature.com/articles/s41591-018-0134-3.
See data-raw/poplar_oak.R in the package source for the exact
cleaning steps. Redistribution in this package is for non-commercial
research and teaching use; users intending other use should verify
the licensing terms of the Springer Nature supplementary materials.
Always cite the original publication when using this dataset.
Rittmeyer A, Barlesi F, Waterkamp D, Park K, Ciardiello F, von Pawel J, Gadgeel SM, Hida T, Kowalski DM, Dols MC, Cortinovis DL, Leach J, Polikoff J, Barrios C, Kabbinavar F, Frontera OA, De Marinis F, Turna H, Lee JS, Ballinger M, Kowanetz M, He P, Chen DS, Sandler A, Gandara DR; OAK Study Group. Atezolizumab versus docetaxel in patients with previously treated non-small-cell lung cancer (OAK): a Phase 3, open-label, multicentre randomised controlled trial. Lancet 2017; 389(10066):255–265. doi:10.1016/S0140-6736(16)32517-X.
data(oak) table(oak$arm, useNA = "ifany") summary(oak$btmb) # Primary bTMB analysis subset (paper's n = 583); %in% keeps NA rows # (untested patients) since they are not confirmed positive. oak_bep <- subset(oak, bep & !(egfr %in% "positive") & !(eml4 %in% "positive"))data(oak) table(oak$arm, useNA = "ifany") summary(oak$btmb) # Primary bTMB analysis subset (paper's n = 583); %in% keeps NA rows # (untested patients) since they are not confirmed positive. oak_bep <- subset(oak, bep & !(egfr %in% "positive") & !(eml4 %in% "positive"))
pated is a wrapper function of jointCovariance for testing treatment effect
in randomized clinical trials. It assumes that prognostic variables are fully
randomized. This assumption can help enhancing statistical power of conventional
approaches in detecting the treatment effect. Specifically, the sensitivity
of the conventional models specified in ... are improved by pated.
pated( ..., data, nboot = 0, compute_cov = TRUE, seed = NULL, transform = "identity" )pated( ..., data, nboot = 0, compute_cov = TRUE, seed = NULL, transform = "identity" )
... |
model specifications built by |
data |
either a single data frame (when all models are fitted on
the same dataset) or a list of data frames (one entry per |
nboot |
non-zero integer if bootstrap is adopted. By default 0. |
compute_cov |
logic. If |
seed |
random seed when generate bootstrap data. |
transform |
character. Now only supports |
a data frame of testing results.
## More examples can be found in the vignettes. library(survival) library(mvtnorm) genData <- function(seed = NULL){ set.seed(seed) n <- 300 sigma <- matrix(c(1, .6, .6, 1), 2) x <- rmvnorm(n, sigma = sigma) z1 <- rbinom(n, 1, .6) z2 <- rnorm(n) trt <- rbinom(n, 1, .5) bet <- c(-.2, .2) y <- -.5 + x %*% bet + z1 * .3 - z2 * .1 + .1 * trt - .1 * rnorm(n) death <- rbinom(n, 1, .8) data.frame( y = as.numeric(y), trt = trt, z1 = z1, z2 = z2, x1 = x[, 1], x2 = x[, 2], death, pid = paste0('s-', seq_len(n)) ) } dat <- genData(seed = 31415926) ## `data_index` defaults to 1 in every spec constructor and a single ## data.frame is auto-wrapped into a list, so neither needs spelling out ## when all models are fitted on the same dataset. fit <- pated( coxph_(Surv(time = y, event = death) ~ trt), glm_(z1 ~ trt, family = 'binomial'), glm_(z2 ~ trt, family = 'gaussian'), glm_(x1 ~ trt, family = 'gaussian'), glm_(x2 ~ trt, family = 'gaussian'), data = dat ) fit## More examples can be found in the vignettes. library(survival) library(mvtnorm) genData <- function(seed = NULL){ set.seed(seed) n <- 300 sigma <- matrix(c(1, .6, .6, 1), 2) x <- rmvnorm(n, sigma = sigma) z1 <- rbinom(n, 1, .6) z2 <- rnorm(n) trt <- rbinom(n, 1, .5) bet <- c(-.2, .2) y <- -.5 + x %*% bet + z1 * .3 - z2 * .1 + .1 * trt - .1 * rnorm(n) death <- rbinom(n, 1, .8) data.frame( y = as.numeric(y), trt = trt, z1 = z1, z2 = z2, x1 = x[, 1], x2 = x[, 2], death, pid = paste0('s-', seq_len(n)) ) } dat <- genData(seed = 31415926) ## `data_index` defaults to 1 in every spec constructor and a single ## data.frame is auto-wrapped into a list, so neither needs spelling out ## when all models are fitted on the same dataset. fit <- pated( coxph_(Surv(time = y, event = death) ~ trt), glm_(z1 ~ trt, family = 'binomial'), glm_(z2 ~ trt, family = 'gaussian'), glm_(x1 ~ trt, family = 'gaussian'), glm_(x2 ~ trt, family = 'gaussian'), data = dat ) fit
Plot PATED Analysis Results
## S3 method for class 'pated' plot(x, ...)## S3 method for class 'pated' plot(x, ...)
x |
an object returned from |
... |
currently not supported. |
NULL
Patient-level data (full intention-to-treat population, n = 287) from POPLAR, a multi-center, open-label, randomized Phase II trial (NCT01903993) comparing the anti-PD-L1 antibody atezolizumab (MPDL3280A, 1200 mg IV every 3 weeks) with docetaxel (75 mg/m^2 IV every 3 weeks) as second- or third-line therapy in patients with previously treated locally advanced or metastatic non-small-cell lung cancer (NSCLC). Patients were randomized 1:1 to the two arms.
The variables retained here are those released in Supplementary
Table 8 of Gandara et al. (2018), Nature Medicine, who
retrospectively assayed banked baseline plasma samples for
blood-based tumor mutational burden (bTMB). The shipped copy applies
the following purely cosmetic transformations to the source: source
variable names lower-cased; categorical values lower-cased; the
string "." (the source's missing code) converted to NA;
numeric-looking character columns coerced to numeric; and the event
indicators inverted so that 1 = event, 0 = censored (the source
uses the opposite convention). Categorical variables are stored as
character vectors (not factors); convert to factor on demand
with factor() if a modeling function requires it. No rows are
dropped: the full ITT population is included.
Authoritative reference. This documentation is a convenience summary; users should consult the original publication and its supplementary materials (Gandara et al. 2018; Fehrenbacher et al. 2016) for the definitive description of the trial design, eligibility criteria, endpoint definitions, and assay protocol.
A data frame with 287 observations on 25 variables:
idAnonymized patient identifier; integer 1–287. (identifier)
armRandomized treatment assignment, character:
"docetaxel" or "atezolizumab".
(treatment assignment)
pfsProgression-free survival, months (numeric). (post-randomization outcome)
pfs_eventPFS event indicator, integer: 1 = progression
or death, 0 = censored. Inverted from the source's
PFS.CNSR. (post-randomization outcome)
osOverall survival, months (numeric). (post-randomization outcome)
os_eventOS event indicator, integer: 1 = death,
0 = censored. Inverted from the source's OS.CNSR.
(post-randomization outcome)
bcorBest confirmed overall response per investigator
(RECIST 1.1), character: "CR", "PR", "SD", "PD", "NE";
NA when not assessable.
(post-randomization outcome)
btmbBlood-based tumor mutational burden: count of
somatic base substitutions on a 394-gene, 1.1 Mb panel
of cell-free DNA from the baseline plasma sample (numeric);
NA when the assay was not run or failed QC.
(baseline characteristic; predictive of atezolizumab
benefit at ; not strongly prognostic in itself)
msafMaximum somatic allele frequency in the baseline cfDNA sample (numeric, 0–1); a tumor-content / shedding surrogate. (baseline characteristic; weakly prognostic)
cfdna_ngcfDNA mass entering NGS library construction (nanograms, numeric, capped at 100 ng). (baseline characteristic; not prognostic — assay input)
coverageMedian exon sequencing coverage (numeric). (baseline characteristic; not prognostic — assay quality)
qcbTMB assay QC status, character: "pass",
"fail"; NA when not run.
(baseline characteristic; not prognostic — assay quality)
bepLogical flag for the biomarker-evaluable population
used in the bTMB analyses: coverage >= 800 & qc == "pass" &
msaf >= 0.01 (n = 211 of 287).
(baseline derived flag; not prognostic in itself)
ageBaseline age, years (numeric); range 36–84. (baseline characteristic; modestly prognostic)
sexSex, character: "F", "M".
(baseline characteristic; weakly prognostic; correlates
with smoking and bTMB)
raceRace, character: "white", "asian",
"other".
(baseline characteristic; not robustly prognostic in this
setting)
histologyTumor histology, character:
"non-squamous", "squamous".
(baseline characteristic; modestly prognostic)
ecogECOG performance status at baseline, integer (0 or 1). (baseline characteristic; strongly prognostic for PFS and OS)
prior_txNumber of prior lines of systemic therapy, integer (1 or 2). (baseline characteristic; prognostic — patients with 2 prior lines tend to do worse)
smokingSmoking history, character: "never",
"previous", "current".
(baseline characteristic; modestly prognostic and also a
correlate of IO benefit through bTMB)
sldBaseline sum of longest diameters of target lesions per RECIST 1.1, mm (numeric). (baseline characteristic; strongly prognostic — tumor burden)
metsitesNumber of metastatic sites at enrollment, integer. (baseline characteristic; strongly prognostic — disease burden)
krasKRAS mutation status from CRF, character:
"negative", "positive"; NA when not tested.
(baseline characteristic; weakly / inconsistently
prognostic; possible predictive signal for IO)
egfrEGFR mutation status from CRF, character:
"negative", "positive"; NA when not tested. A single
T790M patient is coded "positive".
(baseline characteristic; weakly prognostic in IO-treated
populations; strongly predictive of differential treatment
effect — EGFR-positive patients typically derive less benefit
from anti-PD-L1)
eml4EML4-ALK rearrangement status from CRF,
character: "negative", "positive"; NA when not
tested.
(baseline characteristic; analogous to egfr in
interpretation — predictive of reduced IO benefit)
POPLAR's trial-level eligibility criteria (locally advanced or metastatic NSCLC after platinum failure, ECOG 0–1, measurable disease, no prior anti-PD-L1/PD-1 therapy, adequate organ function, etc.) were applied before randomization, so the n = 287 ITT cohort is randomized 1:1. The bTMB assay was performed retrospectively on banked baseline plasma; all assay-derived variables therefore reflect quantities present before treatment exposure, but were generated post-randomization in calendar time. EGFR and EML4-ALK mutation status are CRF variables collected at screening and can be used to subset without compromising randomization. Variant-level information (per-mutation chromosome, position, gene, protein change, etc.) was released alongside this table but is not included here; consult the source supplementary file if needed.
Each variable in the format section above carries a role tag in italic parentheses at the end of its description. The tags combine two dimensions:
Baseline characteristic — measured (or, for the bTMB assay, derived from material collected) at the baseline visit before the first dose of study treatment. Using such variables as covariates does not break randomization in expectation, because they are functions of pre-treatment quantities only.
Prognostic — based on the clinical-oncology literature for previously treated advanced NSCLC, whether the variable has been reported to be associated with PFS or OS independent of treatment assignment. “Predictive” indicates association with differential atezolizumab vs docetaxel benefit rather than with outcome per se. Strength qualifiers (strong / modest / weak) are the author's reading of the literature and are no substitute for a formal analysis on the dataset at hand.
Variables that are outcomes, identifiers, or treatment assignment carry a single role tag in place of the two-dimensional tag.
For users planning to apply PATED or any analysis that relies on baseline covariates being balanced across arms:
age, sex, race, histology, ecog, prior_tx,
smoking, sld, metsites, kras, egfr, eml4.
btmb, msaf, cfdna_ng, coverage, qc, bep.
pfs, pfs_event, os, os_event, bcor.
Conditioning on the biomarker-evaluable population (bep == TRUE)
or further restricting to !(egfr %in% "positive") & !(eml4 %in% "positive")
is a selection on pre-treatment variables and does not break
randomization in expectation; Supp. Tables 3 and 5 of the source paper
confirm empirical arm-balance. Users should still verify arm-balance
on their working subset (e.g. with tableone) before invoking
methods that assume it.
Built from Supplementary Table 8 (Excel file MOESM3) of:
Gandara DR, Paul SM, Kowanetz M, Schleifman E, Zou W, Li Y,
Rittmeyer A, Fehrenbacher L, Otto G, Malboeuf C, Lieber DS, Lipson D,
Silterra J, Amler L, Riehl T, Cummings CA, Hegde PS, Sandler A,
Ballinger M, Fabrizio D, Mok T, Shames DS. Blood-based tumor
mutational burden as a predictor of clinical benefit in non-small-cell
lung cancer patients treated with atezolizumab. Nature Medicine
2018; 24(9):1441–1448. doi:10.1038/s41591-018-0134-3.
Supplementary file downloaded from
https://www.nature.com/articles/s41591-018-0134-3.
See data-raw/poplar_oak.R in the package source for the exact
cleaning steps. Redistribution in this package is for non-commercial
research and teaching use; users intending other use should verify
the licensing terms of the Springer Nature supplementary materials.
Always cite the original publication when using this dataset.
Fehrenbacher L, Spira A, Ballinger M, Kowanetz M, Vansteenkiste J, Mazieres J, Park K, Smith D, Artal-Cortes A, Lewanski C, Braiteh F, Waterkamp D, He P, Zou W, Chen DS, Yi J, Sandler A, Rittmeyer A; POPLAR Study Group. Atezolizumab versus docetaxel for patients with previously treated non-small-cell lung cancer (POPLAR): a multicentre, open-label, Phase 2 randomised controlled trial. Lancet 2016; 387(10030):1837–1846. doi:10.1016/S0140-6736(16)00587-0.
data(poplar) table(poplar$arm, useNA = "ifany") summary(poplar$btmb) # Biomarker-evaluable subset used in the published bTMB analyses poplar_bep <- subset(poplar, bep)data(poplar) table(poplar$arm, useNA = "ifany") summary(poplar$btmb) # Biomarker-evaluable subset used in the published bTMB analyses poplar_bep <- subset(poplar, bep)
Summarize an analysis of multiple outcomes.
## S3 method for class 'summary.jointCovariance' print(x, ...)## S3 method for class 'summary.jointCovariance' print(x, ...)
x |
an object returned by |
... |
for debugging only. |
an invisible object.
## no example## no example
quantile_ is a wrapper function creating an object that, for each
requested probability, computes the difference between the two arms'
sample quantiles of the outcome. The object is passed to jointCovariance
or pated through .... Because the empirical-quantile estimator has no
tractable closed-form score, this engine is available only through the
bootstrap path (nboot > 0).
quantile_(formula, probs = c(0.25, 0.5, 0.75), data_index = 1)quantile_(formula, probs = c(0.25, 0.5, 0.75), data_index = 1)
formula |
a two-sided formula |
probs |
numeric vector of probabilities in |
data_index |
integer. Index of the data frame in the |
simulateMoData generates data for simulation and testing purposes.
simulateMoData(n = 500, hr = 0.8, seed = NULL)simulateMoData(n = 500, hr = 0.8, seed = NULL)
n |
an integer for total sample size of a randomized control trial of two arms. |
hr |
hazard ratio of treatment. |
seed |
random seed. By default |
summary method for class jointCovariance.
## S3 method for class 'jointCovariance' summary(object, model_index = NULL, ...)## S3 method for class 'jointCovariance' summary(object, model_index = NULL, ...)
object |
an object returned by |
model_index |
|
... |
for debugging only |
a list
Returns the variance-covariance matrix of the main parameters of fitted model
objects. The "main" parameters of models correspond to those returned by coef.
## S3 method for class 'jointCovariance' vcov(object, model_index = NULL, ...)## S3 method for class 'jointCovariance' vcov(object, model_index = NULL, ...)
object |
an object returned by |
model_index |
|
... |
for debugging only |
a matrix of covariance of all estimates
winratio_ creates an object to be passed into jointCovariance or
pated through its ... argument. The object defines a hierarchical
win-ratio statistic across a list of endpoints, each declared by
nb_tte(), nb_continuous(), or nb_binary().
winratio_(formula, endpoints, data_index = 1)winratio_(formula, endpoints, data_index = 1)
formula |
a two-sided R formula |
endpoints |
a non-empty list of endpoint specs built by |
data_index |
integer. Index of the data frame in the |
The fitted coefficient is on the log scale:
where and are the proportions of
all control-treatment pairs where the treatment subject wins or loses,
respectively. Tied pairs remain in the denominator of both proportions
but do not contribute to either numerator.
The raw win ratio can be recovered by exponentiating the coefficient. The log scale is used because it gives the standard large-sample Wald representation and maps the null win ratio of 1 to 0.
The arm reference level is inferred from the arm column the same way
model.matrix(~ arm) would: levels(arm)[1] for factor, the smaller
value for numeric or logical, and the alphabetically first value for
character. To override, convert the column to a factor with the desired
level order before calling winratio_().
An object of class c("jc_spec_winratio", "jc_spec").
nb_tte(), nb_continuous(), nb_binary(), jointCovariance(),
pated().