Title: | A Calibrated Sensitivity Analysis for Matched Observational Studies |
---|---|
Description: | Implements the calibrated sensitivity analysis approach for matched observational studies. Our sensitivity analysis framework views matched sets as drawn from a super-population. The unmeasured confounder is modeled as a random variable. We combine matching and model-based covariate-adjustment methods to estimate the treatment effect. The hypothesized unmeasured confounder enters the picture as a missing covariate. We adopt a state-of-the-art Expectation Maximization (EM) algorithm to handle this missing covariate problem in generalized linear models (GLMs). As our method also estimates the effect of each observed covariate on the outcome and treatment assignment, we are able to calibrate the unmeasured confounder to observed covariates. Zhang, B., Small, D. S. (2018). <arXiv:1812.00215>. |
Authors: | Bo Zhang |
Maintainer: | Bo Zhang <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.0.1 |
Built: | 2025-02-20 07:04:32 UTC |
Source: | CRAN |
This is another main function in the package. For a given p and the border of the sensitivity parameters (lambda, delta), a calibration plot is made for each (lambda, delta) pair on the border.
calibrate_anim(border, q, u, p, degree, xmax, ymax, data_matched)
border |
Border or frontier of the sensitivity parameters for a fixed p. |
q |
Number of matched covariates plus treatment. |
u |
Unmeasured confounder; u = c(1,0) if the unmeasured confounder is assumed to be binary. |
p |
The probability vector corresponding to u; p = c(0.5, 0.5) if the unmeasured confounder is assumed to be Bernoulli(0.5). |
degree |
Degrees of freedom of the spline fit for the boundary. |
xmax |
Maximum xlim of the plot. |
ymax |
Maximum ylim of the plot. |
data_matched |
The matched dataset. |
border is the data frame returned by the function find_border. It must contain at least (k+1) distinct lambda/delta pairs in order to fit a smoothing spline with k degrees of freedom.
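As a rough illustration of this requirement, the sketch below fits a smoothing spline to a set of border points using base R. It is an assumption that stats::smooth.spline behaves like the spline fit used inside calibrate_anim; the lambda/delta values are the ones from the package example, and k = 10 mirrors the degree argument used there.

```r
# Border points (lambda, delta); values copied from the package example.
lambda_vec <- c(seq(0.1, 1.9, 0.1), 2.2, 2.5, 3, 3.5, 4)
delta_vec <- c(7.31, 5.34, 4.38, 3.76, 3.18, 2.87, 2.55, 2.36, 2.16, 1.99,
               1.86, 1.74, 1.63, 1.54, 1.44, 1.40, 1.31, 1.28, 1.22, 1.08,
               0.964, 0.877, 0.815, 0.750)
border <- data.frame(lambda_vec, delta_vec)

# A spline with k degrees of freedom needs at least k + 1 distinct pairs.
k <- 10
stopifnot(nrow(unique(border)) >= k + 1)

# Assumption: smooth.spline stands in for the spline fit used by the package.
fit <- smooth.spline(border$lambda_vec, border$delta_vec, df = k)

# Evaluate the smoothed border on a fine grid and draw it over the raw points.
grid <- seq(min(lambda_vec), max(lambda_vec), length.out = 100)
smoothed <- predict(fit, grid)
plot(border, pch = 19, xlab = "lambda", ylab = "delta")
lines(smoothed$x, smoothed$y)
```

With fewer than k + 1 distinct pairs the stopifnot check fails before the fit is attempted, which is the situation the note above warns about.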
data(NHANES_blood_lead_small_matched)
attach(NHANES_blood_lead_small_matched)
# Prepare the border
lambda_vec = c(seq(0.1,1.9,0.1), 2.2, 2.5, 3, 3.5, 4)
delta_vec = c(7.31, 5.34, 4.38, 3.76, 3.18, 2.87, 2.55, 2.36, 2.16, 1.99, 1.86, 1.74,
              1.63, 1.54, 1.44, 1.40, 1.31, 1.28, 1.22, 1.08, 0.964, 0.877, 0.815, 0.750)
border = data.frame(lambda_vec, delta_vec)
calibrate_anim(border, 9, c(1,0), c(0.5,0.5), 10, 5, 3.5, NHANES_blood_lead_small_matched)
detach(NHANES_blood_lead_small_matched)
This is a main function in the package. Given a matched dataset and one particular (p, lambda, delta) triple, it obtains the corresponding coefficients of the observed covariates and plots them with a legend. This graph is meant to provide an intuitive interpretation of the magnitude of the sensitivity parameters lambda and delta by contrasting them with the estimated coefficients of the observed covariates.
calibrate_one(lambda_vec, delta_vec, q, u, p, lambda, delta, label_vec, data_matched)
lambda_vec |
A vector of lambdas that define the border. |
delta_vec |
A vector of deltas that define the border. |
q |
Number of matched covariates plus treatment. |
u |
Unmeasured confounder; u = c(1,0) if the unmeasured confounder is assumed to be binary. |
p |
The probability vector corresponding to u; p = c(0.5, 0.5) if the unmeasured confounder is assumed to be Bernoulli(0.5). |
lambda |
Sensitivity parameter that controls association between U and treatment assignment. |
delta |
Sensitivity parameter that controls association between U and response. |
label_vec |
A character vector of length q-1 containing the names of the observed/matched covariates. |
data_matched |
The matched dataset. |
border is the dataframe returned by the function find_border. It has to contain at least 7 different lambda/delta pairs in order to fit a smoothing spline with 6 dfs.
lambda and delta form a pair on the border.
label_vec is typically taken to be the column names of the dataset, i.e., the names of the q - 1 observed covariates.
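The idea of contrasting (lambda, delta) with the coefficients of observed covariates can be sketched in base R as follows. This is not the package's implementation: the data are simulated, the covariate names are hypothetical, the coefficients come from ordinary glm and lm fits rather than from the EM algorithm, and plotting absolute values is a choice made for this illustration only.

```r
set.seed(7)
n <- 400
# Hypothetical covariates, already on a roughly comparable scale.
covs <- data.frame(age = rnorm(n) / 2, male = rbinom(n, 1, 0.5), income = rnorm(n) / 2)
treat <- rbinom(n, 1, plogis(0.8 * covs$age - 0.4 * covs$male + 0.3 * covs$income))
resp <- 1 + 0.5 * treat + 0.6 * covs$age + 0.2 * covs$male - 0.7 * covs$income + rnorm(n)
toy <- cbind(covs, treat = treat, resp = resp)

# Treatment-assignment model and outcome model on the observed covariates.
label_vec <- names(covs)
ps_fit <- glm(treat ~ age + male + income, family = binomial(), data = toy)
out_fit <- lm(resp ~ age + male + income + treat, data = toy)

# One row per observed covariate: its coefficient in each model.
calib <- data.frame(
  label = label_vec,
  treat_coef = abs(unname(coef(ps_fit)[label_vec])),
  resp_coef = abs(unname(coef(out_fit)[label_vec]))
)

# A hypothetical (lambda, delta) pair to compare against the covariates.
lambda <- 1
delta <- 0.5
plot(calib$treat_coef, calib$resp_coef, pch = 19,
     xlim = c(0, max(calib$treat_coef, lambda) * 1.2),
     ylim = c(0, max(calib$resp_coef, delta) * 1.2),
     xlab = "Association with treatment assignment",
     ylab = "Association with response")
text(calib$treat_coef, calib$resp_coef, labels = calib$label, pos = 3)
points(lambda, delta, pch = 4, col = "red", lwd = 2)
legend("topright", legend = c("observed covariate", "(lambda, delta)"),
       pch = c(19, 4), col = c("black", "red"))
```

A (lambda, delta) point that sits far beyond every observed covariate suggests that only an unusually strong unmeasured confounder could explain away the effect.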
data(NHANES_blood_lead_small_matched)
attach(NHANES_blood_lead_small_matched)
# Prepare the lambda_vec and delta_vec
lambda_vec = c(seq(0.1,1.9,0.1), 2.2, 2.5, 3, 3.5, 4)
delta_vec = c(7.31, 5.34, 4.38, 3.76, 3.18, 2.87, 2.55, 2.36, 2.16, 1.99, 1.86, 1.74,
              1.63, 1.54, 1.44, 1.40, 1.31, 1.28, 1.22, 1.08, 0.964, 0.877, 0.815, 0.750)
calibrate_one(lambda_vec, delta_vec, 9, c(1,0), c(0.5,0.5), 1, 0.492,
              colnames(NHANES_blood_lead_small_matched)[1:8], NHANES_blood_lead_small_matched)
detach(NHANES_blood_lead_small_matched)
This is a main function in the package. Given a dataset and sensitivity parameters (p, lambda, delta), the function returns a 95% confidence interval for the estimated treatment effect.
CI_block_boot(q, u, p, lambda, delta, data_matched, n_boot = 2000)
q |
Number of matched covariates plus treatment. |
u |
Unmeasured confounder; u = c(1,0) if the unmeasured confounder is assumed to be binary. |
p |
The probability vector corresponding to u; p = c(0.5, 0.5) if the unmeasured confounder is assumed to be Bernoulli(0.5). |
lambda |
Sensitivity parameter that controls association between U and treatment assignment. |
delta |
Sensitivity parameter that controls association between U and response. |
data_matched |
The dataset after matching. |
n_boot |
Number of bootstrap samples. |
If the number of matched covariates is k, then q = k + 1.
If the hypothesized unmeasured confounder is binary, then u = c(1,0) and p = c(p, 1-p).
data_matched should be in the following format: the first (q-1) columns are matched covariates, the qth column is the treatment status, and the (q+1)th column is the response. See the NHANES_blood_lead_small_matched dataset for an example.
Note that the input for this function is a dataset before matching. To run this function, the optmatch package must be installed and loaded.
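The function name suggests a block bootstrap, in which whole matched sets rather than individual rows are resampled. A minimal sketch of that idea follows, under several assumptions: the data are simulated, block_boot_ci and diff_in_means are hypothetical helpers, a simple difference in means replaces the EM-based treatment effect estimate, and a percentile interval is used.

```r
set.seed(42)

# Simulated matched sets: one treated unit and 1 to 4 controls per set.
n_sets <- 60
toy <- do.call(rbind, lapply(seq_len(n_sets), function(s) {
  size <- sample(2:5, 1)
  treat <- c(1, rep(0, size - 1))
  data.frame(treat = treat, y = 0.5 * treat + rnorm(size), matches = s)
}))

# Hypothetical helper: resample matched sets with replacement and
# return a percentile confidence interval for stat_fn.
block_boot_ci <- function(data, set_col, stat_fn, n_boot = 500, level = 0.95) {
  sets <- unname(split(data, data[[set_col]]))
  boot_stats <- replicate(n_boot, {
    picked <- sample(length(sets), replace = TRUE)
    stat_fn(do.call(rbind, sets[picked]))
  })
  alpha <- 1 - level
  quantile(boot_stats, c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

# Stand-in estimator (assumption): difference in mean response.
diff_in_means <- function(d) mean(d$y[d$treat == 1]) - mean(d$y[d$treat == 0])

ci <- block_boot_ci(toy, "matches", diff_in_means, n_boot = 500)
ci
```

Resampling entire sets keeps the within-set dependence intact in every bootstrap replicate, which is the point of treating matched sets as the sampling unit.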
data(NHANES_blood_lead_small_matched)
attach(NHANES_blood_lead_small_matched)
CI_block_boot(9, c(1,0), c(0.5,0.5), 0, 0, NHANES_blood_lead_small_matched, n_boot = 10)
detach(NHANES_blood_lead_small_matched)
This is a main function in the package. Given a matched dataset and sensitivity parameters (p, lambda, delta), the function runs the EM algorithm by the method of weights and returns the estimated coefficients of the propensity score model and the outcome regression model.
EM_Algorithm(q, u, p, lambda, delta, data_matched, all_coef = FALSE, aug_data = FALSE, tol = 0.0001)
q |
Number of matched covariates plus treatment. |
u |
Unmeasured confounder; u = c(1,0) if the unmeasured confounder is assumed to be binary. |
p |
The probability vector corresponding to u; p = c(0.5, 0.5) if the unmeasured confounder is assumed to be Bernoulli(0.5). |
lambda |
Sensitivity parameter that controls association between U and treatment assignment. |
delta |
Sensitivity parameter that controls association between U and response. |
data_matched |
A matched dataset. See details below. |
all_coef |
If TRUE, all estimated coefficients are returned; if FALSE, only the estimated treatment effect is returned. |
aug_data |
If TRUE, the augmented data frame at the time of convergence is returned. |
tol |
Tolerance for the algorithm convergence. |
If the number of matched covariates is k, then q = k + 1.
If the hypothesized unmeasured confounder is binary, then u = c(1,0) and p = c(p, 1-p).
data_matched should be in the following format: the first (q-1) columns are the matched covariates, the qth column is the treatment status, the (q+1)th column is the column of unmeasured confounders U0, the (q+2)th column is the response, and the last column, i.e., the (q+3)th column, is the matched set assignment. We use the fullmatch function in the optmatch package to perform the full matching. See NHANES_blood_lead_small_matched for an example of a matched dataset, and the examples section there for instructions on how to construct such a matched dataset.
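To make the method of weights more concrete, here is a small self-contained sketch for a binary U. It is not the package's code. It assumes a logistic model for treatment and a normal linear model for the response, holds the coefficients of U fixed at lambda and delta through offsets, and uses simulated data with hypothetical names (x, z, y, em_weights_sketch).

```r
set.seed(1)
n <- 300
x <- rnorm(n)
u_true <- rbinom(n, 1, 0.5)                       # never shown to the algorithm
z <- rbinom(n, 1, plogis(0.3 * x + u_true - 0.5)) # treatment
y <- 1 + 0.5 * x + 0.8 * z + u_true + rnorm(n)    # response
obs <- data.frame(x = x, z = z, y = y)

em_weights_sketch <- function(obs, u = c(1, 0), p = c(0.5, 0.5),
                              lambda = 0, delta = 0, tol = 1e-4, max_iter = 200) {
  n <- nrow(obs)
  # Augmented data: one copy of every row per support point of U.
  aug <- do.call(rbind, lapply(seq_along(u), function(k) {
    cbind(obs, id = seq_len(n), U = u[k], prior = p[k])
  }))
  aug$w <- aug$prior  # start from the prior probabilities of U
  tau_old <- Inf
  for (iter in seq_len(max_iter)) {
    # M-step: weighted fits; U enters only through fixed offsets.
    ps_fit <- glm(z ~ x + offset(lambda * U), family = quasibinomial(),
                  data = aug, weights = w)
    out_fit <- lm(y ~ x + z + offset(delta * U), data = aug, weights = w)
    sigma <- sqrt(sum(aug$w * residuals(out_fit)^2) / n)
    # E-step: posterior weight of each U value given (x, z, y).
    pz <- fitted(ps_fit)
    lik <- aug$prior * ifelse(aug$z == 1, pz, 1 - pz) *
      dnorm(aug$y, mean = fitted(out_fit), sd = sigma)
    aug$w <- lik / ave(lik, aug$id, FUN = sum)
    tau <- unname(coef(out_fit)["z"])
    if (abs(tau - tau_old) < tol) break
    tau_old <- tau
  }
  list(tau = tau, iterations = iter)
}

fit_none <- em_weights_sketch(obs, lambda = 0, delta = 0)  # no unmeasured confounding
fit_conf <- em_weights_sketch(obs, lambda = 1, delta = 1)  # hypothesized confounding
c(fit_none$tau, fit_conf$tau)
```

With lambda = delta = 0 the weights never move away from the prior, so the estimate coincides with an ordinary regression of y on x and z; nonzero values let the hypothesized confounder absorb part of the association.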
data(NHANES_blood_lead_small_matched)
attach(NHANES_blood_lead_small_matched)
# Run the EM algorithm assuming no unmeasured confounding, i.e., lambda = delta = 0
EM_Algorithm(9, c(1,0), c(0.5,0.5), 0, 0, NHANES_blood_lead_small_matched)
# Run the EM algorithm assuming the magnitude of the unmeasured confounding is lambda = delta = 1
EM_Algorithm(9, c(1,0), c(0.5,0.5), 1, 1, NHANES_blood_lead_small_matched)
detach(NHANES_blood_lead_small_matched)
Given the dataset, unmeasured confounder, sensitivity parameter p, and a sequence of lambda values, the function uses binary search to find, for each lambda in lambda_vec, the delta such that the estimated 95% confidence interval for the treatment effect barely covers 0. The function returns a data frame consisting of lambda_vec and the corresponding deltas. See below for an example.
find_border(q, u, p, lambda_vec, start_value_low, start_value_high, data_matched, n_boot = 2000, tol = 0.01)
q |
Number of matched covariates plus treatment. |
u |
Unmeasured confounder; u = c(1,0) if the unmeasured confounder is assumed to be binary. |
p |
The probability vector corresponding to u; p = c(0.5, 0.5) if the unmeasured confounder is assumed to be Bernoulli(0.5). |
lambda_vec |
A sequence of lambda values. |
start_value_low |
Starting value for the binary search (the lower endpoint). |
start_value_high |
Starting value for the binary search (the higher endpoint). |
data_matched |
The dataset after matching. |
n_boot |
Number of bootstrap samples used to approximate the CI. |
tol |
Tolerance for the binary search. |
start_value_low and start_value_high are user-supplied numbers that bracket the binary search.
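The shape of the returned border can be illustrated with a deterministic stand-in. In the sketch below, lower_ci_toy is a purely hypothetical formula for the lower CI endpoint, and stats::uniroot (Brent's method) is used in place of the binary search that the function performs, so this shows only the structure of the result, not how the package computes it.

```r
# Hypothetical stand-in for the lower endpoint of the 95% CI at (lambda, delta).
# In practice this value would come from a bootstrap confidence interval.
lower_ci_toy <- function(lambda, delta) 1 - 0.25 * lambda * delta

lambda_vec <- c(1, 1.5, 2)

# For each lambda, find the delta at which the lower endpoint reaches 0.
# lower and upper play the role of start_value_low and start_value_high.
delta_vec <- vapply(lambda_vec, function(lam) {
  uniroot(function(d) lower_ci_toy(lam, d), lower = 0, upper = 10, tol = 1e-8)$root
}, numeric(1))

border <- data.frame(lambda_vec, delta_vec)
border
```

The bracketing values must produce lower endpoints of opposite sign, otherwise uniroot stops with an error; the same care applies when choosing the starting values for a binary search.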
data(NHANES_blood_lead_small_matched)
attach(NHANES_blood_lead_small_matched)
find_border(9, c(1,0), c(0.5,0.5), c(0.5,1,1.5), 0, 4, NHANES_blood_lead_small_matched, n_boot = 1000)
detach(NHANES_blood_lead_small_matched)
Estimate the maximum delta value for a given p and lambda such that the estimated 95% confidence interval for the treatment effect still indicates a significant effect. Note that the optmatch package must be installed and loaded to run this function.
find_delta(q, u, p, lambda, start_value_low, start_value_high, data_matched, n_boot = 200, tol = 0.01)
q |
Number of matched covariates plus treatment. |
u |
Unmeasured confounder; u = c(1,0) if the unmeasured confounder is assumed to be binary. |
p |
The probability vector corresponding to u; p = c(0.5, 0.5) if the unmeasured confounder is assumed to be Bernoulli(0.5). |
lambda |
A lambda value. |
start_value_low |
Starting value for the binary search (the lower endpoint). |
start_value_high |
Starting value for the binary search (the higher endpoint). |
data_matched |
The dataset after matching. |
n_boot |
Number of bootstrap samples used to approximate the CI. |
tol |
Tolerance for the binary search. |
start_value_low and start_value_high are user-supplied numbers that bracket the binary search.
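A binary search of this kind can be sketched as follows. The sketch assumes a positive treatment effect, so that "significant" means the lower CI endpoint is above 0, and it replaces the bootstrap CI with a hypothetical deterministic function lower_ci_toy; bisect_delta is likewise a name made up for this illustration.

```r
# Hypothetical lower CI endpoint as a function of delta (lambda held fixed).
lower_ci_toy <- function(delta) 0.9 - 0.4 * delta

# Bisection between low (effect still significant) and high (no longer significant).
bisect_delta <- function(lower_ci, low, high, tol = 0.01) {
  stopifnot(lower_ci(low) > 0, lower_ci(high) <= 0)
  while (high - low > tol) {
    mid <- (low + high) / 2
    if (lower_ci(mid) > 0) {
      low <- mid   # still significant: the maximum delta is at or above mid
    } else {
      high <- mid  # CI covers 0: the maximum delta is below mid
    }
  }
  low  # largest delta tried at which the effect is still significant
}

delta_max <- bisect_delta(lower_ci_toy, low = 0, high = 4)
delta_max
```

Because a real bootstrap CI is noisy, each evaluation in the loop is itself an approximation, which is one reason to pair a modest tol with a reasonably large n_boot.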
data(NHANES_blood_lead_small_matched)
attach(NHANES_blood_lead_small_matched)
find_delta(9, c(1,0), c(0.5,0.5), 1, 1, 3, NHANES_blood_lead_small_matched, n_boot = 1000)
detach(NHANES_blood_lead_small_matched)
A dataset constructed from NHANES III.
data(NHANES_blood_lead)
A data frame with 4519 observations on the following 10 variables.
COP
treatment, 1 if cotinine level is between 0.563-14.9 ng/ml and 0 otherwise
DMARETHN
1 if white, 0 if others
DMPPIR
Poverty income ratio
HFE1
1 if the house is built before 1974, 0 if after 1974
HFE2
number of rooms in the house
HFHEDUCR
education level of the reference adult
HSAGEIR
age at the time of interview
HSFSIZER
size of the family
HSSEX
1 if male, 0 if female
PBP
blood lead level
We follow Mannino et al. (2003) in constructing a dataset that includes children aged 4-16 years old for whom both serum cotinine levels and blood lead levels were measured in the Third National Health and Nutrition Examination Survey (NHANES III), along with the following variables: race/ethnicity, age, sex, poverty income ratio, education level of the reference adult, family size, number of rooms in the house, and year the house was constructed. The biomarker cotinine is a metabolite of nicotine and an indicator of second-hand smoke exposure. Treatment status is 1 if cotinine level is between 0.563-14.9 ng/ml and 0 otherwise. All continuous/ordinal variables are standardized by subtracting the mean and dividing by 2 standard deviations so that they are more comparable to binary covariates (Gelman 2008).
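The two preprocessing steps described above can be sketched on toy values. The data frame and the helper standardize_2sd are hypothetical, and treating both cotinine cutoffs as inclusive is an assumption, since the description does not say how the endpoints are handled.

```r
# Toy values (hypothetical), not taken from NHANES III.
toy <- data.frame(
  cotinine = c(0.1, 0.563, 2.4, 14.9, 20, 0.3),
  age = c(4, 7, 9, 12, 16, 10)
)

# Treatment indicator: 1 if cotinine is within 0.563-14.9 ng/ml.
# Assumption: both endpoints are treated as inclusive.
toy$COP <- as.integer(toy$cotinine >= 0.563 & toy$cotinine <= 14.9)

# Standardize by subtracting the mean and dividing by 2 standard deviations.
standardize_2sd <- function(x) (x - mean(x)) / (2 * sd(x))
toy$age_std <- standardize_2sd(toy$age)

toy
```

After this rescaling a continuous covariate has standard deviation 0.5, the same as a balanced binary covariate, which is what makes the coefficients roughly comparable.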
NHANES III, the Third US National Health and Nutrition Examination Survey.
D. M. Mannino, R. Albalak, S. D. Grosse, and J. Repace. Second-hand smoke exposure and blood lead levels in U.S. children. Epidemiology, 14:719-727, 2003.
A. Gelman. Scaling regression inputs by dividing by two standard deviations. Statistics in Medicine, 27:2865-2873, 2008.
data(NHANES_blood_lead)
A random subset of NHANES_blood_lead data for the purpose of testing.
data(NHANES_blood_lead_small)
A random sample from the NHANES_blood_lead dataset. It consists of 500 instances and the same 10 variables as the NHANES_blood_lead data.
COP
treatment, 1 if cotinine level is between 0.563-14.9 ng/ml and 0 otherwise
DMARETHN
1 if white, 0 if others
DMPPIR
Poverty income ratio
HFE1
1 if the house is built before 1974, 0 if after 1974
HFE2
number of rooms in the house
HFHEDUCR
education level of the reference adult
HSAGEIR
age at the time of interview
HSFSIZER
size of the family
HSSEX
1 if male, 0 if female
PBP
blood lead level
We take a random sample of 500 observations from the NHANES_blood_lead dataset. This small dataset is primarily for the purpose of testing the algorithm.
NHANES III, the Third US National Health and Nutrition Examination Survey.
D. M. Mannino, R. Albalak, S. D. Grosse, and J. Repace. Second-hand smoke exposure and blood lead levels in U.S. children. Epidemiology, 14:719-727, 2003.
A. Gelman. Scaling regression inputs by dividing by two standard deviations. Statistics in Medicine, 27:2865-2873, 2008.
data(NHANES_blood_lead_small)
NHANES_blood_lead_small data after a full matching using the optmatch package
data(NHANES_blood_lead_small_matched)
NHANES_blood_lead_small dataset after a full matching. It consists of 500 instances and the following 12 variables:
COP
treatment, 1 if cotinine level is between 0.563-14.9 ng/ml and 0 otherwise
DMARETHN
1 if white, 0 if others
DMPPIR
Poverty income ratio
HFE1
1 if the house is built before 1974, 0 if after 1974
HFE2
number of rooms in the house
HFHEDUCR
education level of the reference adult
HSAGEIR
age at the time of interview
HSFSIZER
size of the family
HSSEX
1 if male, 0 if female
PBP
blood lead level
U0
placeholder for the hypothesized unmeasured confounder U
matches
matched set assignment
We perform a full matching on the NHANES_blood_lead_small dataset using the optmatch package. The code for constructing this matched dataset from the original dataset is given in the examples section. We add a column U0 as a placeholder for the unmeasured confounder U.
NHANES III, the Third US National Health and Nutrition Examination Survey.
D. M. Mannino, R. Albalak, S. D. Grosse, and J. Repace. Second-hand smoke exposure and blood lead levels in U.S. children. Epidemiology, 14:719-727, 2003.
A. Gelman. Scaling regression inputs by dividing by two standard deviations. Statistics in Medicine, 27:2865-2873, 2008.
## Not run:
# To run this example, optmatch must be installed
set.seed(1)
library(optmatch)
data(NHANES_blood_lead_small)
attach(NHANES_blood_lead_small)
# Perform a full matching
fm = fullmatch(COP ~ ., data = NHANES_blood_lead_small[, 1:9], min.controls = 1/4, max.controls = 4)
NHANES_blood_lead_small_matched = cbind(NHANES_blood_lead_small, matches = fm)
# Add a U0 column as a placeholder for the unmeasured confounder
U0 = rep(1, dim(NHANES_blood_lead_small_matched)[1])
NHANES_blood_lead_small_matched = cbind(NHANES_blood_lead_small_matched[, 1:9], U0, NHANES_blood_lead_small_matched[, 10:11])
## End(Not run)