sim.BA
)This is a vignette for sim.BA
. This package
conducts individual level simulations that can allow researchers to
characterize the bias arising from unmeasured confounding with a
specified but modifiable structure during the study design.
Time-to-event outcomes are generated in simulated datasets based on
user-specified treatment and outcome prevalence and other related
variables including unmeasured confounders and their measured proxies.
Results are summarized quantitatively to evaluate balance and bias when
conducting analyses that do not account for unmeasured confounding
(Level 1 adjustment) versus when accounting for unmeasured confounding
through measured proxies (Level 2 adjustment).
First, a user can create the parameters
object that will
be supplied to simBA()
containing the parameters for the
simulation design.
parameters <- create_parameters(nbinary = 6,
ncontinuous = 1,
ncount = 1,
# file = "parameters.csv",
unmeasured_conf = "u1",
unmeasured_type = "binary")
# Note: uncomment `file` argument to automatically save file to supplied path
The user can manually fill in the parameter values in the csv file. An example file comes with the package, which we can load in using the following:
parameters <- read.csv(system.file("extdata", "parameters.csv",
package = "sim.BA"))
parameters
#> Variable Description Type prevalence mean sd
#> 1 c1 Gender binary 0.21 NA NA
#> 2 c2 Age continuous NA 77 7.6
#> 3 c3 GI complication history binary 0.03 NA NA
#> 4 c4 Concurrent GI protective agent use binary 0.21 NA NA
#> 5 c5 Warfarin use history binary 0.08 NA NA
#> 6 c6 Number of hospitalizations count NA 8 NA
#> 7 c7 Gastric ulcer history binary 0.14 NA NA
#> 8 c8 OA diagnosis binary 0.35 NA NA
#> 9 c9 RA diagnosis binary 0.04 NA NA
#> 10 c10 GI protective agent use history binary 0.28 NA NA
#> 11 c11 Any hospital admission in the CAP binary 0.18 NA NA
#> 12 c12 COPD binary 0.15 NA NA
#> 13 c13 Corticosteroid use history binary 0.07 NA NA
#> 14 u1 Unmeasured binary 0.10 NA NA
#> coeff_treatment_model coeff_outcome_model
#> 1 -0.2783 0.46667
#> 2 0.0321 0.07803
#> 3 0.1660 1.03895
#> 4 0.0910 -0.68778
#> 5 0.5499 0.23121
#> 6 0.0110 0.05539
#> 7 0.0488 NA
#> 8 0.4507 NA
#> 9 0.5488 NA
#> 10 0.1569 NA
#> 11 -0.0435 NA
#> 12 NA 0.24222
#> 13 NA -0.13412
#> 14 1.0000 1.00000
A representative directed acyclic graph (DAG) for data generation for the simulation study used in this vignette is shown below. Users are encouraged to use DAGs to explicitly convey design of their own simulations.
simBA()
The following table provides guidance on how to design your
simulations using various options available in the simBA()
function.
Option | Specification notes |
iterations |
The number of simulation iterations. Default is 500. |
size |
The size of each sample to be generated. Default is 1000. |
treatment_prevalence |
The desired prevalence of treatment. Should be a number between 0 and 1. This is REQUIRED to be specified with no defaults. |
treatment_coeff |
The coefficient on the treatment variable in the data-generating model for survival times. This is REQUIRED to be specified with no defaults. |
outcome_prevalence |
The desired prevalence of the outcome. Based on the specified value, censoring time are adjusted for the data-generating model to achieve the desired prevalence. This is REQUIRED to be specified with no defaults. |
dist |
The distribution to use to generate survival times. Allowable
options include "exponential" (default) and
"weibull" . Abbreviations allowed. |
unmeasured_conf |
The name of the variable in parameters corresponding to
the unmeasured confounder. |
n_proxies |
the number of proxies for the unmeasured confounder to include in the simulation. Default is 0. |
proxy_type |
When n_proxies is greater than 0, the type of variable
the proxies should be. Allowable options include "binary"
(default) and "continuous" . Abbreviations allowed. |
corr |
When n_proxies is greater than 0, the desired
correlations between the proxy variable and the unmeasured confounder in
the simulation. Should be length 1 (in which case all proxies have the
same correlation with the unmeasured confounder) or length equal to
n_proxies . |
adj |
The method used to adjust for the confounders. Allowable options
include "matching" (the default), which uses
MatchIt::matchit() , and "weighting" , which
uses WeightIt::weightit() . Abbreviations allowed. |
adj_args |
A list of arguments passed to MatchIt::matchit() or
WeightIt::weightit() depending on the argument to
adj . If not supplied, the parameter defaults will be used.
Take care to specify these arguments to ensure the adjustment method is
as desired. |
estimand |
The desired estimand to target. Allowable options include
"ATT" (default), "ATC" , and
"ATE" . Note this is also passed to the
estimand argument of the function used for adjustment as
specified by adj if omitted in adj_args . |
keep_data |
Default is FALSE . Setting to TRUE will
keep the datasets generated in each simulation and make the output
object large. |
cl |
A cluster object created by parallel::makeCluster() , or
an integer to indicate number of child-processes (integer values are
ignored on Windows) for parallel evaluations. See
?pbapply::pbapply for details. Default is NULL
for no parallel evaluation. |
verbose |
Whether to print information about the progress of the simulation,
including a progress bar. Default is TRUE . |
sim <- simBA(parameters,
iterations = 500,
size = 1000,
treatment_prevalence = .2,
treatment_coeff = -.5,
outcome_prevalence = .05,
dist = "weibull",
unmeasured_conf = "u1",
n_proxies = 2,
proxy_type = "binary",
corr = c(0.5, 0.4),
adj = "matching",
adj_args = list(method = "nearest", caliper=0.2),
estimand = "ATT",
keep_data = FALSE,
# cl = 4, #uncomment for speed on Mac
verbose = FALSE)
sim
#> A `simBA` object
#> - sample size: 1000
#> - treatment prevalence: 0.2
#> - treatment coefficient: -0.5 (true HR: 0.62)
#> - outcome prevalence: 0.05
#> - outcome distribution: Weibull
#> - unmeasured confounder: "u1"
#> - proxies: 2 (binary; r = 0.5, 0.4)
#> - adjustment method: matching
#> Use `summary()` and `plot()` to examine results.
summary(sim)
#> - Average Standardized Mean Difference (SMD):
#> Variable Adjustment SMD
#> p1 crude 1.45e-01
#> p1 L1 1.49e-01
#> p1 L2 2.33e-04
#> p2 crude 1.18e-01
#> p2 L1 1.22e-01
#> p2 L2 -4.63e-05
#> u1 crude 2.66e-01
#> u1 L1 2.72e-01
#> u1 L2 1.86e-01
#>
#> - Hazard Ratios (HR):
#> Adjustment HR
#> crude 0.798
#> L1 0.732
#> L2 0.701
#> true 0.616
The summary provides numerical results for key metrics including balance (standardized mean differences) and hazard ratios averaged over all simulation iterations.
The balance plot provides distribution of absolute standardized mean differences (SMD) in the unmeasured confounder and proxies (when included) over the simulation runs. Vertical lines are placed at 0 (solid) to denote perfect balance, 0.1 (dashed) to denote ‘reasonable’ balance, and at the crude SMD for the unmeasured confounder (dotted). As you go from the unadjusted (which does not adjust for any confounding) to Level 1 (which only adjusts for measured confounding), large imbalances remain in the unmeasured confounder and proxies between treatment groups. Including proxies of unmeasured confounding in adjustment Level 2, excellent balance is achieved in the included proxies and imbalance also reduces for the unmeasured confounder due to correlations with the proxies.
The hazard ratio (HR) plot plots hazard ratios on a log scale for the x-axis. Vertical lines are placed at 1 (solid) and the true marginal HR (dashed). As you go from the unadjusted to Level 1 and Level 2, the distribution shifts closer to the simulated truth suggesting that accounting for unmeasured confounding through proxies in Level 2 helps in reducing the threat of unmeasured confounding.
We can request a similar plot that displays the relative bias of the
HR estimates ((est_HR - true_HR) / true_HR
) by setting
type = "bias"
.
See ?plot.simBA
for further details.