| Title: | Variance-Based Sensitivity Analysis for Weighting Estimators |
|---|---|
| Description: | Provides methods for variance-based sensitivity analysis and weighting estimators in observational studies based on methodology by Huang & Pimentel (2025) <doi:10.1093/biomet/asae040>. Includes bootstrap inference, bias bounds estimation, and visualization tools for sensitivity parameters. |
| Authors: | Jiayao Gan [aut, cre], Melody Huang [aut], Samuel D. Pimentel [aut], Andy A. Shen [aut], National Science Foundation [fnd] (Grant #2142146) |
| Maintainer: | Jiayao Gan <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-07-01 07:37:51 UTC |
| Source: | https://github.com/cran/vbm |
This function computes a benchmarked lambda value for the Marginal Sensitivity Model (MSM) by omitting a specified covariate from the weighting model. The resulting lambda represents the worst-case odds ratio bound that would be required to account for the confounding due to omitting that covariate. This allows researchers to calibrate sensitivity parameters against observed covariates (Hsu & Small, 2013; Soriano et al., 2023).
benchmark_lambda(omit, weightlist, data, treatment_name = "treatment")benchmark_lambda(omit, weightlist, data, treatment_name = "treatment")
omit |
A character string specifying the name of the covariate to omit from the weighting model when computing benchmark weights. |
weightlist |
A pre-fitted weightit object from the WeightIt package. Must contain the full weighting specification including all covariates, method, and any additional parameters (e.g., tols, maxit). |
data |
A data frame containing the covariates and treatment indicator. |
treatment_name |
Character string naming the treatment variable in data. Default is "treatment". |
A data frame with one row containing:
variable |
The name of the omitted variable |
lambda |
The benchmarked lambda value (rounded to 1 decimal place) |
This function assumes that the original weights were estimated using the full set of covariates including the omitted variable.
The benchmarked lambda represents the worst-case individual-level deviation in the odds of treatment that would result from omitting the given covariate. Larger lambda values indicate that the covariate is more important for addressing confounding.
Hsu, J. Y., & Small, D. S. (2013). Calibrating sensitivity analyses to observed covariates in observational studies. Biometrics, 69(4), 803-811.
Soriano, D., Ben-Michael, E., Bickel, P. J., Feller, A., & Pimentel, S. D. (2023). Interpretable sensitivity analysis for balancing weights. Journal of the Royal Statistical Society Series A: Statistics in Society, 186(4), 707-721.
library(WeightIt) data("nhanes-clean") weightlist <- WeightIt::weightit(treatment ~ gender + age + income + income.missing + education + smoking.now + smoking.ever + race, data = nhanes.clean, method = "ebal", estimand = "ATT") # Benchmark omitting "education" benchmark_result <- benchmark_lambda( omit = "education", weightlist, nhanes.clean ) print(benchmark_result) # Interpret the result # If critical lambda* from sensitivity analysis is 5, and benchmarked lambda # for "education" is 4.4, then an unmeasured confounder would need to be stronger # than "education" to explain away the effect. # glm call for weighting, must specify formula, data in glm glm_call <- quote(glm( formula=treatment ~ age + education + income , data = nhanes.clean,family = binomial() )) glm_result <- benchmark_lambda( omit = "education", glm_call, nhanes.clean ) print(glm_result)library(WeightIt) data("nhanes-clean") weightlist <- WeightIt::weightit(treatment ~ gender + age + income + income.missing + education + smoking.now + smoking.ever + race, data = nhanes.clean, method = "ebal", estimand = "ATT") # Benchmark omitting "education" benchmark_result <- benchmark_lambda( omit = "education", weightlist, nhanes.clean ) print(benchmark_result) # Interpret the result # If critical lambda* from sensitivity analysis is 5, and benchmarked lambda # for "education" is 4.4, then an unmeasured confounder would need to be stronger # than "education" to explain away the effect. # glm call for weighting, must specify formula, data in glm glm_call <- quote(glm( formula=treatment ~ age + education + income , data = nhanes.clean,family = binomial() )) glm_result <- benchmark_lambda( omit = "education", glm_call, nhanes.clean ) print(glm_result)
Computes benchmark R-squared, bias, and correlation values for a given omitted covariate by re-fitting the weighting model without that covariate. This function is used to assess how sensitive results are to unobserved confounders similar to observed covariates.
benchmark_R2( omit, weightlist, data, Y = "Y", treatment = "treatment", estimate )benchmark_R2( omit, weightlist, data, Y = "Y", treatment = "treatment", estimate )
omit |
A character string specifying the name of the covariate to omit from the weighting model. |
weightlist |
A pre-fitted weightit object from the WeightIt package. Must contain the full weighting specification including all covariates, method, and any additional parameters (e.g., tols, maxit). |
data |
Data frame containing the original data used to fit W. Must include all variables from the weighting formula plus the outcome and treatment variables. |
Y |
Character string specifying the name of the outcome variable in data. Default is "Y". |
treatment |
Character string specifying the name of the treatment variable in data. Default is "treatment". |
estimate |
Numeric value of the treatment effect estimate from the full weighting model (e.g., IPW estimate). Used for calculating additional metrics like MRCS if needed. |
A data frame with one row containing the following columns:
variable |
The name of the omitted covariate |
bias |
The estimated bias from omitting the covariate |
R2_benchmark |
The R-squared value representing the proportion of variance in weights explained by the omitted covariate |
rho_benchmark |
The correlation between the weight difference (full - benchmark) and the outcome among control units |
Huang, M., & Pimentel, S. D. (2025). Variance-based sensitivity analysis for weighting estimators results in more informative bounds. Biometrika, 112(1).
library(WeightIt) weightlist <- weightit(treatment ~ age + education + income, data = nhanes.clean, method = "ebal", estimand = "ATT", tols = 0.01) # First get the IPW estimate model_ipw <- estimatr::lm_robust(Y ~ treatment, data = nhanes.clean, weights = weightlist$weights) ipw_estimate <- coef(model_ipw)[2] # Benchmark R-squared for omitting "age" result <- benchmark_R2( omit = "age", weightlist = weightlist, data = nhanes.clean, estimate = ipw_estimate ) print(result) # glm call glm_call <- quote(glm(formula = treatment ~ age + education + gender + income, data = nhanes.clean,family = binomial())) result <- benchmark_R2( omit = "age", weightlist = glm_call, data = nhanes.clean, estimate = ipw_estimate ) print(result)library(WeightIt) weightlist <- weightit(treatment ~ age + education + income, data = nhanes.clean, method = "ebal", estimand = "ATT", tols = 0.01) # First get the IPW estimate model_ipw <- estimatr::lm_robust(Y ~ treatment, data = nhanes.clean, weights = weightlist$weights) ipw_estimate <- coef(model_ipw)[2] # Benchmark R-squared for omitting "age" result <- benchmark_R2( omit = "age", weightlist = weightlist, data = nhanes.clean, estimate = ipw_estimate ) print(result) # glm call glm_call <- quote(glm(formula = treatment ~ age + education + gender + income, data = nhanes.clean,family = binomial())) result <- benchmark_R2( omit = "age", weightlist = glm_call, data = nhanes.clean, estimate = ipw_estimate ) print(result)
This function performs a percentile bootstrap procedure to construct confidence intervals for the Average Treatment Effect on the Treated (ATT) under either the Marginal Sensitivity Model (MSM) or the Variance-Based Sensitivity Model (VBM). The intervals account for both sampling uncertainty and unmeasured confounding, following the methods of Zhao et al. (2019) and Soriano et al. (2023).
estimate_confidence_intervals( alpha = 0.05, weightlist, df, lambda_seq = seq(1, 8, by = 0.25), R2_seq = seq(0, 0.9, by = 0.01), cor_eps_seq = NULL, treatment_variable = "treatment", approach, covariates = NULL, n_bootstrap = 1000, n_cores = 4, seed = 331 )estimate_confidence_intervals( alpha = 0.05, weightlist, df, lambda_seq = seq(1, 8, by = 0.25), R2_seq = seq(0, 0.9, by = 0.01), cor_eps_seq = NULL, treatment_variable = "treatment", approach, covariates = NULL, n_bootstrap = 1000, n_cores = 4, seed = 331 )
alpha |
Significance level for confidence intervals (default = 0.05 for 95% CI) |
weightlist |
A pre-fitted weightit object from the WeightIt package. Must contain the estimated weights and the original call specification. |
df |
Data frame containing the original data used to fit W. Must include all variables from the weighting formula plus the outcome. |
lambda_seq |
Numeric vector of lambda values for MSM approach. Default = seq(1, 8, by = 0.25). Lambda bounds the odds ratio deviation from ignorability. |
R2_seq |
Numeric vector of R-squared values for VBM approach. Default = seq(0, 0.9, by = 0.01). R-squared represents proportion of variance in weights from unobserved confounders. |
cor_eps_seq |
Numeric vector of correlation values for VBM with correlation approach. Must be same length as R2_seq. If NULL, uses worst-case bounds. |
treatment_variable |
Character string naming the treatment variable in df. Default = "treatment". |
approach |
Character string specifying the sensitivity approach. Must be one of: "msm" (Marginal Sensitivity Model), "vbm" (Variance-Based Model), or "vbm_w_cor" (VBM with user-specified correlation). |
covariates |
Optional character vector of variable names to add to the output data frame. If provided, each variable name will be repeated for all rows. |
n_bootstrap |
Integer number of bootstrap iterations. Default = 1000. |
n_cores |
Integer number of CPU cores for parallel computation. Default = 4. |
seed |
Integer seed for reproducibility. Default = 331. |
A data frame containing confidence intervals for the ATT across sensitivity parameter values:
For approach = "msm": Columns lambda, ci_low, ci_high
For approach = "vbm": Columns R2, ci_low, ci_high
(and rho if cor_eps_seq provided)
If covariates is provided: an additional column variable
This function uses parallel processing via parallel::mclapply() with 4 cores.
For Windows systems, you may need to adjust the parallel backend.
The function assumes the outcome variable is named "Y" in the input data frame.
Zhao, Q., Small, D. S., & Bhattacharya, B. B. (2019). Sensitivity analysis for inverse probability weighting estimators via the percentile bootstrap. Journal of the Royal Statistical Society: Series B, 81(4), 735-761.
Soriano, D., Ben-Michael, E., Bickel, P. J., Feller, A., & Pimentel, S. D. (2023). Interpretable sensitivity analysis for balancing weights. Journal of the Royal Statistical Society Series A: Statistics in Society, 186(4), 707-721.
Huang, M., & Pimentel, S. D. (2025). Variance-based sensitivity analysis for weighting estimators results in more informative bounds. Biometrika.
# MSM confidence intervals library(WeightIt) # First, fit the weighting model weightlist_msm <- weightit(treatment ~ age + education + income, data = nhanes.clean, method = "ebal", estimand = "ATT") # Run MSM confidence intervals msm_results <- estimate_confidence_intervals( alpha = 0.05, weightlist = weightlist_msm, df = nhanes.clean, lambda_seq = seq(1, 5, by = 0.5), treatment_variable = "treatment", approach = "msm", n_bootstrap = 1000, n_cores = 1, seed = 123 ) print(msm_results) # VBM confidence intervals weightlist_vbm <- weightit(treatment ~ age + education + income, data = nhanes.clean, method = "optweight", estimand = "ATT", tols = 0.01) vbm_results <- estimate_confidence_intervals( alpha = 0.05, weightlist = weightlist_vbm, df = nhanes.clean, R2_seq = seq(0, 0.9, by = 0.1), treatment_variable = "treatment", approach = "vbm", n_bootstrap = 1000, n_cores = 1, seed = 123 ) print(vbm_results) # VBM with user-specified correlation and custom numbers of bootstrap iterations and cores weightlist_vbm_cor <- weightit(treatment ~ age + education + income, data = nhanes.clean, method = "optweight", estimand = "ATT") vbm_results_cor <- estimate_confidence_intervals( alpha = 0.05, weightlist = weightlist_vbm_cor, df = nhanes.clean, R2_seq = seq(0, 0.9, by = 0.1), cor_eps_seq = rep(0.9, 10), treatment_variable = "treatment", approach = "vbm", n_bootstrap = 500, n_cores = 1, seed = 123 ) print(vbm_results_cor) # glm call glm_call <-quote(glm(treatment ~ age + education + income, data = nhanes.clean, family = binomial())) glm_results <- estimate_confidence_intervals( alpha = 0.05, weightlist = glm_call, df = nhanes.clean, lambda_seq = seq(1, 5, by = 0.5), treatment_variable = "treatment", approach = "msm", n_bootstrap = 1000, n_cores = 1, seed = 123 ) print(glm_results)# MSM confidence intervals library(WeightIt) # First, fit the weighting model weightlist_msm <- weightit(treatment ~ age + education + income, data = nhanes.clean, method = "ebal", estimand = "ATT") # Run MSM confidence intervals msm_results <- estimate_confidence_intervals( alpha = 0.05, weightlist = weightlist_msm, df = nhanes.clean, lambda_seq = seq(1, 5, by = 0.5), treatment_variable = "treatment", approach = "msm", n_bootstrap = 1000, n_cores = 1, seed = 123 ) print(msm_results) # VBM confidence intervals weightlist_vbm <- weightit(treatment ~ age + education + income, data = nhanes.clean, method = "optweight", estimand = "ATT", tols = 0.01) vbm_results <- estimate_confidence_intervals( alpha = 0.05, weightlist = weightlist_vbm, df = nhanes.clean, R2_seq = seq(0, 0.9, by = 0.1), treatment_variable = "treatment", approach = "vbm", n_bootstrap = 1000, n_cores = 1, seed = 123 ) print(vbm_results) # VBM with user-specified correlation and custom numbers of bootstrap iterations and cores weightlist_vbm_cor <- weightit(treatment ~ age + education + income, data = nhanes.clean, method = "optweight", estimand = "ATT") vbm_results_cor <- estimate_confidence_intervals( alpha = 0.05, weightlist = weightlist_vbm_cor, df = nhanes.clean, R2_seq = seq(0, 0.9, by = 0.1), cor_eps_seq = rep(0.9, 10), treatment_variable = "treatment", approach = "vbm", n_bootstrap = 500, n_cores = 1, seed = 123 ) print(vbm_results_cor) # glm call glm_call <-quote(glm(treatment ~ age + education + income, data = nhanes.clean, family = binomial())) glm_results <- estimate_confidence_intervals( alpha = 0.05, weightlist = glm_call, df = nhanes.clean, lambda_seq = seq(1, 5, by = 0.5), treatment_variable = "treatment", approach = "msm", n_bootstrap = 1000, n_cores = 1, seed = 123 ) print(glm_results)
This function computes the range of possible point estimates (i.e., bounds) for the Average Treatment Effect on the Treated (ATT) under either the Marginal Sensitivity Model (MSM) or the Variance-Based Sensitivity Model (VBM). These bounds represent the worst-case treatment effects possible given a specified level of unmeasured confounding, without accounting for sampling uncertainty.
estimate_point_estimate_bounds( weights, Y, treatment, approach = "vbm", lambda_range = seq(1, 8, by = 0.25), R2_range = seq(0, 0.9, by = 0.01), variables = NULL, data )estimate_point_estimate_bounds( weights, Y, treatment, approach = "vbm", lambda_range = seq(1, 8, by = 0.25), R2_range = seq(0, 0.9, by = 0.01), variables = NULL, data )
weights |
A numeric vector of balancing weights (estimated from observed data).
Typically obtained from |
Y |
A numeric vector of outcomes. |
treatment |
A numeric or logical vector indicating treatment assignment (1 = treated, 0 = control). |
approach |
A character string specifying the sensitivity model:
|
lambda_range |
A numeric vector of lambda values (>= 1) for the MSM.
Default is |
R2_range |
A numeric vector of R-squared values (0 to 1) for the VBM.
Default is |
variables |
Optional character vector of variable names to add to the output data frame. If provided, each variable name will be repeated for all rows. |
data |
Optional data frame containing the variables. Currently not used but may be included for compatibility with other functions. |
A data frame containing the point estimate bounds:
For approach = "msm": Columns lambda, ATT_low, ATT_high
For approach = "vbm": Columns R2, ATT_low, ATT_high
If variables is provided: an additional column variable
These point estimate bounds do NOT account for sampling uncertainty. For confidence
intervals that incorporate sampling variability, use a bootstrap procedure (e.g.
estimate_confidence_intervals).
The bounds represent the range of possible point estimates under the specified confounding level. As lambda increases (MSM) or R2 increases (VBM), the bounds widen.
Soriano, D., Ben-Michael, E., Bickel, P. J., Feller, A., & Pimentel, S. D. (2023). Interpretable sensitivity analysis for balancing weights. Journal of the Royal Statistical Society Series A: Statistics in Society, 186(4), 707-721.
Zhao, Q., Small, D. S., & Bhattacharya, B. B. (2019). Sensitivity analysis for inverse probability weighting estimators via the percentile bootstrap. Journal of the Royal Statistical Society: Series B, 81(4), 735-761.
Huang, M., & Pimentel, S. D. (2025). Variance-based sensitivity analysis for weighting estimators results in more informative bounds. Biometrika.
# MSM approach library(WeightIt) weightlist <- weightit(treatment ~ age + education, data = nhanes.clean, method = "ps", estimand = "ATT") weights <- weightlist$weights bounds_msm <- estimate_point_estimate_bounds( weights = weights, Y = nhanes.clean$Y, treatment = nhanes.clean$treatment, approach = "msm", lambda_range = seq(1, 5, by = 0.5) ) print(bounds_msm) # VBM approach bounds_vbm <- estimate_point_estimate_bounds( weights = weights, Y = nhanes.clean$Y, treatment = nhanes.clean$treatment, approach = "vbm", R2_range = seq(0, 0.9, by = 0.1) ) print(bounds_vbm) # With variable names for plotting bounds_with_names <- estimate_point_estimate_bounds( weights = weights, Y = nhanes.clean$Y, treatment = nhanes.clean$treatment, approach = "msm", lambda_range = c(2,3), variables = c("age","education") ) print(bounds_with_names)# MSM approach library(WeightIt) weightlist <- weightit(treatment ~ age + education, data = nhanes.clean, method = "ps", estimand = "ATT") weights <- weightlist$weights bounds_msm <- estimate_point_estimate_bounds( weights = weights, Y = nhanes.clean$Y, treatment = nhanes.clean$treatment, approach = "msm", lambda_range = seq(1, 5, by = 0.5) ) print(bounds_msm) # VBM approach bounds_vbm <- estimate_point_estimate_bounds( weights = weights, Y = nhanes.clean$Y, treatment = nhanes.clean$treatment, approach = "vbm", R2_range = seq(0, 0.9, by = 0.1) ) print(bounds_vbm) # With variable names for plotting bounds_with_names <- estimate_point_estimate_bounds( weights = weights, Y = nhanes.clean$Y, treatment = nhanes.clean$treatment, approach = "msm", lambda_range = c(2,3), variables = c("age","education") ) print(bounds_with_names)
The National Child Development Survey (NCDS) is a longitudinal study of individuals born in the United Kingdom during the week of 3-9 March 1958. The original study collected information on educational attainment, familial backgrounds, and socioeconomic and health well-being for 17,415 individuals.
ncdsncds
A data frame with 3,642 rows and 14 columns.
Following Battistin and Sianesi (2011), the data were pre-processed to obtain a subset of 3,642 males who were employed in 1991 and had complete educational attainment and wage information. Missing covariates were imputed a single time using Multiple Imputation by Chained Equations (MICE; Buuren and Groothuis-Oudshoorn, 2010).
Outcome Variable:
wage: Log gross hourly wage in British Pounds.
Treatment Variables:
Dany (binary): Indicator of any academic qualification.
Levels: 1 = qualification (n = 2,399), 0 = no qualification
(n = 1,243).
Covariates (Pre-treatment Confounders):
white: Indicator of self-identified white race.
maemp: Mother's employment status in 1974.
scht: School type attended at age 16.
qmab, qmab2: Mathematics test scores at ages 7 and 11.
qvab, qvab2: Reading test scores at ages 7 and 11.
paed_u, maed_u: Father and mother's years of education.
sib_u: Number of siblings.
agepa, agema: age of father and mother in 1974.
Use str(NCDS) to inspect the full structure of the dataset.
Battistin, E., & Sianesi, B. (2011). Misclassified Treatment Status and Treatment Effects: An Application to Returns to Education in the United Kingdom. Review of Economics and Statistics, 93(2), 495-509.
https://cls.ucl.ac.uk/cls-studies/1958-national-child-development-study/
Battistin E, Sianesi B. (2011). Misclassified Treatment Status and Treatment Effects: an Application to Returns to Education in the United Kindom. Review of Economics and Statistics, 93(2), 495-509.
Battistin E, Sianesi B. (2012). Replication data for: Misclassified Treatment Status and Treatment Effects: an Application to Returns to Education in the United Kindom. URL https://doi.org.10.7910/DVN/EPCYUL.
A cleaned subset of data from the 2013-2014 National Health and Nutrition Examination Survey (NHANES), used for sensitivity analysis in causal inference studies of fish consumption on blood mercury levels.
nhanes.cleannhanes.clean
A data frame with 1,107 rows and 9 columns:
Total blood mercury (in log2 scale, micrograms per liter). A one-unit difference implies a twofold difference in total blood mercury.
Binary indicator of fish/shellfish consumption in the preceding month. 1 = more than 12 servings, 0 = 12 or fewer servings.
Participant gender.
Participant age in years.
Household income level.
Whether income data is missing.
Participant race/ethnicity.
Educational attainment level.
Current smoking status.
Lifetime smoking history (ever smoked or not).
The outcome variable is total blood mercury measured in micrograms per liter, transformed to log2 scale. An estimated treated-control outcome difference of one implies that a treated person's total blood mercury is twice that of a control.
The treatment variable indicates whether an individual consumed more than 12 servings of fish or shellfish in the month preceding the survey.
The dataset contains:
234 treated units (consumed larger than 12 servings of fish/shellfish)
873 control units (consumed no larger than 12 servings of fish/shellfish)
Demographic variables (gender, age, income, race, education, and smoking history) are included as covariates to adjust for non-random treatment assignment. These are commonly used to estimate propensity score weights via logistic regression for causal analyses of fish consumption on blood mercury levels.
National Health and Nutrition Examination Survey (NHANES) 2013-2014. https://www.cdc.gov/nchs/nhanes/
Original study describing this data preprocessing approach. (Add citation if available.)
data("nhanes.clean") head(nhanes.clean)data("nhanes.clean") head(nhanes.clean)
Creates a comprehensive visualization of sensitivity analysis results showing how treatment effect estimates and confidence intervals vary across different levels of unmeasured confounding. The function generates a publication-ready ggplot2 object with customizable options for benchmarks, axis breaks, and titles.
plot_sensitivity_analysis( results, parameter_level = NULL, ci = TRUE, benchmarking = FALSE, benchmark_variable = NULL, variable_name = NULL, header = NULL, x_axis_breaks = "default", x_break_value = NULL )plot_sensitivity_analysis( results, parameter_level = NULL, ci = TRUE, benchmarking = FALSE, benchmark_variable = NULL, variable_name = NULL, header = NULL, x_axis_breaks = "default", x_break_value = NULL )
results |
A list object returned by
|
parameter_level |
Numeric vector of parameter values to display on the x-axis.
Values are rounded to 5 decimal places for matching. If |
ci |
Logical flag indicating whether there is confidence interval evaluation in |
benchmarking |
Logical flag indicating whether to add benchmark reference
lines from observed confounders. When |
benchmark_variable |
Character vector specifying which benchmark variables
to display as reference lines. If |
variable_name |
Character vector for renaming benchmark variables in the plot.
Must have the same length as the selected |
header |
Character string for the plot title. If |
x_axis_breaks |
Character string controlling x-axis break behavior. Options:
Default is |
x_break_value |
Numeric value specifying the interval between custom breaks.
Only used when |
The plot includes:
Point estimate bounds as vertical line segments
95 percent confidence intervals as error bars
A dashed vertical line at the critical threshold (R* or *)
An IPW point estimate at the reference parameter value of 0
Optional benchmark reference lines from observed confounders
A horizontal reference line at y = 0 (no treatment effect)
Colors indicate whether estimates fall below (steelblue) or above (slategray)
the critical threshold. The x-axis represents the confounding parameter
(R² for VBM approaches, for other approaches).
The function performs several data processing steps before plotting:
Extracts point bounds and confidence intervals from the results list.
Determines approach-specific parameters (R² for VBM, for others).
Optionally filters data to specified parameter_level values.
Color-codes estimates based on position relative to critical threshold.
Optionally filters and renames benchmark variables.
Constructs the ggplot with appropriate scales and themes.
Parameter values are rounded to 5 decimal places when matching to parameter_level to avoid floating-point precision issues. Warnings are issued when requested parameter levels or benchmark variables are not found.
A ggplot2 object that can be further customized using standard
ggplot2 syntax (e.g., adding + theme(...) or + labs(...)).
The plot displays:
X-axis: Confounding parameter (R² or )
Y-axis: Estimated Average Treatment Effect (ATT)
Point estimates and confidence intervals colored by critical threshold
Optional benchmark reference lines with labels
The IPW estimate is always plotted at param = 0 regardless of whether
0 is included in parameter_level. This represents the unadjusted
treatment effect estimate.
run_sensitivity_analysis for generating the input list results
benchmark_lambda for generating benchmark parameters
ggplot for additional plot customization options
# Basic usage with default settings weightlist <- WeightIt::weightit(treatment ~ gender + age + income + income.missing + education + smoking.now + smoking.ever + race, data = nhanes.clean, method = "ps", estimand = "ATT") results <- run_sensitivity_analysis(weightlist=weightlist, Y="Y", treatment="treatment", data=nhanes.clean, approach="vbm", R2_seq = seq(0, 0.8, by = 0.01), alpha = 0.05, benchmarking = TRUE, n_bootstrap = 1000, n_cores = 1, seed = 331) # Add benchmark references with custom labels plot_sensitivity_analysis(results=results, benchmarking=TRUE, benchmark_variable = c( "age","education","smoking.now","smoking.ever","race"), variable_name=c("age","education","smoking","smoking history","race")) # Focus on specific parameter range with custom title plot_sensitivity_analysis(results=results, parameter_level=seq(0, 0.6, by = 0.1),, header = "Sensitivity Analysis: R² Range 0-0.6", x_axis_breaks = "pretty") # Custom x-axis breaks plot_sensitivity_analysis(results=results, x_axis_breaks = "custom", x_break_value = 0.02) # Minimal plot without benchmarks or title plot_sensitivity_analysis(results, x_axis_breaks = "none")# Basic usage with default settings weightlist <- WeightIt::weightit(treatment ~ gender + age + income + income.missing + education + smoking.now + smoking.ever + race, data = nhanes.clean, method = "ps", estimand = "ATT") results <- run_sensitivity_analysis(weightlist=weightlist, Y="Y", treatment="treatment", data=nhanes.clean, approach="vbm", R2_seq = seq(0, 0.8, by = 0.01), alpha = 0.05, benchmarking = TRUE, n_bootstrap = 1000, n_cores = 1, seed = 331) # Add benchmark references with custom labels plot_sensitivity_analysis(results=results, benchmarking=TRUE, benchmark_variable = c( "age","education","smoking.now","smoking.ever","race"), variable_name=c("age","education","smoking","smoking history","race")) # Focus on specific parameter range with custom title plot_sensitivity_analysis(results=results, parameter_level=seq(0, 0.6, by = 0.1),, header = "Sensitivity Analysis: R² Range 0-0.6", x_axis_breaks = "pretty") # Custom x-axis breaks plot_sensitivity_analysis(results=results, x_axis_breaks = "custom", x_break_value = 0.02) # Minimal plot without benchmarks or title plot_sensitivity_analysis(results, x_axis_breaks = "none")
Performs sensitivity analysis to assess the robustness of average treatment effect estimates (ATT) to unmeasured confounding using various approaches including Huang & Pimentel's variance-based sensitivity model (vbm), variance-based sensitivity model using a relaxed correlation bound (vbm_w_cor) and Zhao's marginal sensitivity model with bootstrap confidence interval (msm). The function handles weighting, bootstrap confidence intervals, and optional benchmarking against observed confounders.
run_sensitivity_analysis( weightlist = NULL, weights = NULL, Y, treatment, data, approach, lambda_seq = seq(1, 8, by = 0.25), R2_seq = seq(0, 0.9, by = 0.01), cor_eps_seq = NULL, alpha = 0.05, benchmarking = FALSE, n_bootstrap = 1000, n_cores = 4, seed = 123, verbose = TRUE )run_sensitivity_analysis( weightlist = NULL, weights = NULL, Y, treatment, data, approach, lambda_seq = seq(1, 8, by = 0.25), R2_seq = seq(0, 0.9, by = 0.01), cor_eps_seq = NULL, alpha = 0.05, benchmarking = FALSE, n_bootstrap = 1000, n_cores = 4, seed = 123, verbose = TRUE )
weightlist |
A WeightIt object from the |
weights |
A vector of weights fitted on data. Should have the same length as data. Confidence interval and benchmarking will be automatically turned off. If neither of weightlist or weights are provided, execution will halt and throw an error. |
Y |
A character string specifying the name of the outcome variable in
|
treatment |
A character string specifying the name of the treatment
variable in |
data |
A data frame containing the original data used to fit |
approach |
A character string specifying the sensitivity analysis
approach. Must be one of: |
lambda_seq |
A numeric vector of lambda values for MSM approach.
Default is |
R2_seq |
A numeric vector of R-squared values for VBM approach.
Default is |
cor_eps_seq |
A numeric vector of correlation values for
|
alpha |
A numeric value specifying the significance level for confidence
intervals. Default is |
benchmarking |
A logical value indicating whether to perform
benchmarking using leave-one-out covariate omission.
Default is |
n_bootstrap |
An integer specifying the number of bootstrap iterations
for confidence interval estimation. Default is |
n_cores |
An integer specifying the number of CPU cores to use for
parallel bootstrap computation. Default is |
seed |
An integer seed for random number generation to ensure
reproducibility. Default is |
verbose |
A boolean specifying whether give detailed massage at each stage.
Default is |
The function implements three sensitivity analysis approaches:
VBM (Huang & Pimentel's variance-based sensitivity model): Uses the R2 parameter to quantify the proportion of residual variance in the outcome explained by unmeasured confounders.
MSM (Zhao's marginal sensitivity model): Uses the lambda parameter representing the ratio of largest possible error on individual weights that can arise from omitting a confounder.
VBM with Correlation: Extends VBM by incorporating correlation between the error terms of the treatment imbalance and outcome.
A list object of class sensitivity_analysis containing:
difference_in_means: Unadjusted difference-in-means estimate
ipw_estimate: Inverse probability weighted treatment effect
estimate
difference_in_means_se: Standard error of unadjusted estimate
ipw_se: Standard error of IPW estimate
approach: Sensitivity analysis method used
point_bounds: Data frame with parameter values and
corresponding bounds for the point estimate of ATT
confidence_intervals: Data frame with parameter values and
confidence intervals
Rstar: (if approach = "vbm" or
approach = "vbm_w_cor") Critical threshold value for R-squared
lambda_star: (if approach = "msm") Critical threshold
value for lambda
benchmark_parameters: (if benchmarking = TRUE) Benchmark
values. For approach = "msm": A data frame with variable names
and benchmarked lambdas. For approach = "vbm" or
approach = "vbm_w_cor": A data frame with variable names,
benchmarked R-squared values, correlations, and the maximum bias of
point estimates
benchmark_point_bounds: (if benchmarking = TRUE) Point
bounds at benchmark values
benchmark_confidence_intervals: (if benchmarking = TRUE)
Confidence intervals at benchmark values
The dataset must have no missing values in treatment, outcome, or covariates.
Either weightlist or weights should be specified. Otherwise, analysis will halt and give a warning message.
Estimand must be ATT if weightlist is a WeightIt object. Otherwise, analysis will halt and give a warning message.
formula and data should be specified if if weightlist is a glm call.
Otherwise, analysis will halt and give a warning message.
Treatment variable should be binary (0/1).
For approach = "vbm_w_cor", cor_eps_seq must be provided
and have the same length as R2_seq. Otherwise, a warning will
be issued and execution will halt.
Bootstrap procedures can be computationally intensive for large datasets or many parameter values.
Huang, M., & Pimentel, S. D. (2025). Variance-based sensitivity analysis for weighting estimators results in more informative bounds. Biometrika, 112(1).
Zhao, Q., Small, D. S., & Bhattacharya, B. B. (2019). Sensitivity analysis for inverse probability weighting estimators via the percentile bootstrap. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 81(4), 735-761.
plot_sensitivity_analysis for visualizing results,
estimate_point_estimate_bounds for bound calculations,
estimate_confidence_intervals for bootstrap CI computation,
benchmark_lambda for MSM benchmark calculations
data("nhanes-clean") weightlist <- WeightIt::weightit(treatment ~ gender + age + income + income.missing + education + smoking.now + smoking.ever + race, data = nhanes.clean, method = "ps", estimand = "ATT") # Basic MSM sensitivity analysis results <- run_sensitivity_analysis(weightlist=weightlist, Y="Y", treatment="treatment", data=nhanes.clean, approach="msm", lambda_seq = seq(1, 8, by = 0.25), alpha = 0.05, n_cores = 1, seed = 123) # VBM with benchmarking results <- run_sensitivity_analysis(weightlist=weightlist, Y="Y", treatment="treatment", data=nhanes.clean, approach="vbm", R2_seq = seq(0, 0.8, by = 0.01), alpha = 0.05, benchmarking = TRUE, n_bootstrap = 1000, n_cores = 1, seed = 331) # VBM with correlation results <- run_sensitivity_analysis(weightlist=weightlist, Y="Y", treatment="treatment", data=nhanes.clean, approach="vbm_w_cor", R2_seq = seq(0, 0.8, by=0.01), cor_eps_seq = rep(0.9, 81), n_cores = 1, alpha = 0.05) # if only weights are proviced only_weights <- run_sensitivity_analysis(weights=weightlist$weights, Y="Y", treatment="treatment", data=nhanes.clean, approach="msm", R2_seq = seq(1, 8, by = 0.25), alpha = 0.05, n_cores = 1, seed = 123) # if weightit call is glm glm_call <- quote(glm(formula = treatment ~ gender + age + income + income.missing + education + smoking.now + smoking.ever + race, data = nhanes.clean,family = binomial())) glm_results <- run_sensitivity_analysis(weightlist=glm_call, Y="Y", treatment="treatment", data=nhanes.clean, approach="msm", R2_seq = seq(1, 8, by = 0.25), alpha = 0.05, seed = 123, n_cores = 1, benchmarking = TRUE)data("nhanes-clean") weightlist <- WeightIt::weightit(treatment ~ gender + age + income + income.missing + education + smoking.now + smoking.ever + race, data = nhanes.clean, method = "ps", estimand = "ATT") # Basic MSM sensitivity analysis results <- run_sensitivity_analysis(weightlist=weightlist, Y="Y", treatment="treatment", data=nhanes.clean, approach="msm", lambda_seq = seq(1, 8, by = 0.25), alpha = 0.05, n_cores = 1, seed = 123) # VBM with benchmarking results <- run_sensitivity_analysis(weightlist=weightlist, Y="Y", treatment="treatment", data=nhanes.clean, approach="vbm", R2_seq = seq(0, 0.8, by = 0.01), alpha = 0.05, benchmarking = TRUE, n_bootstrap = 1000, n_cores = 1, seed = 331) # VBM with correlation results <- run_sensitivity_analysis(weightlist=weightlist, Y="Y", treatment="treatment", data=nhanes.clean, approach="vbm_w_cor", R2_seq = seq(0, 0.8, by=0.01), cor_eps_seq = rep(0.9, 81), n_cores = 1, alpha = 0.05) # if only weights are proviced only_weights <- run_sensitivity_analysis(weights=weightlist$weights, Y="Y", treatment="treatment", data=nhanes.clean, approach="msm", R2_seq = seq(1, 8, by = 0.25), alpha = 0.05, n_cores = 1, seed = 123) # if weightit call is glm glm_call <- quote(glm(formula = treatment ~ gender + age + income + income.missing + education + smoking.now + smoking.ever + race, data = nhanes.clean,family = binomial())) glm_results <- run_sensitivity_analysis(weightlist=glm_call, Y="Y", treatment="treatment", data=nhanes.clean, approach="msm", R2_seq = seq(1, 8, by = 0.25), alpha = 0.05, seed = 123, n_cores = 1, benchmarking = TRUE)