Title: | Simultaneous Multi-Bias Adjustment |
---|---|
Description: | Quantify the causal effect of a binary exposure on a binary outcome with adjustment for multiple biases. The functions can simultaneously adjust for any combination of uncontrolled confounding, exposure/outcome misclassification, and selection bias. The underlying method generalizes the concept of combining inverse probability of selection weighting with predictive value weighting. Simultaneous multi-bias analysis can be used to enhance the validity and transparency of real-world evidence obtained from observational, longitudinal studies. Based on the work from Paul Brendel, Aracelis Torres, and Onyebuchi Arah (2023) <doi:10.1093/ije/dyad001>. |
Authors: | Paul Brendel [aut, cre, cph] |
Maintainer: | Paul Brendel <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.6 |
Built: | 2024-10-27 12:33:18 UTC |
Source: | CRAN |
adjust_em
returns the exposure-outcome odds ratio and confidence
interval, adjusted for exposure misclassification.
adjust_em( data_observed, data_validation = NULL, x_model_coefs = NULL, level = 0.95 )
data_observed |
Object of class |
data_validation |
Object of class |
x_model_coefs |
The regression coefficients corresponding to the model: $\operatorname{logit}(P(X=1)) = \delta_0 + \delta_1 X^* + \delta_2 Y + \delta_{2+j} C_j$, where X represents the binary true exposure, X* is the binary misclassified exposure, Y is the outcome, C represents the vector of measured confounders (if any), and j corresponds to the number of measured confounders. The number of parameters is therefore 3 + j. |
level |
Value from 0-1 representing the full range of the confidence interval. Default is 0.95. |
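To make the coefficient ordering concrete, the sketch below evaluates the x_model_coefs model for a single record in base R. It assumes the vector is ordered as (intercept, X*, Y, C1), following the model equation above. The helper name predict_true_exposure is hypothetical and is not part of the package; the coefficient values are the ones used in the adjust_em example.

```r
# Hypothetical helper (not part of multibias): P(X = 1 | X*, Y, C) under
# logit(P(X=1)) = d0 + d1*X* + d2*Y + d(2+j)*Cj.
predict_true_exposure <- function(coefs, xstar, y, confounders = numeric(0)) {
  design <- c(1, xstar, y, confounders)        # intercept, X*, Y, then C1..Cj
  stopifnot(length(coefs) == length(design))   # expects 3 + j parameters
  plogis(sum(coefs * design))                  # inverse logit of the linear predictor
}

# One measured confounder, so 3 + 1 = 4 coefficients.
x_model_coefs <- c(-2.10, 1.62, 0.63, 0.35)

# Probability of true exposure for a case (Y = 1) with C1 = 0,
# when the recorded exposure is 1 versus 0.
p_flagged   <- predict_true_exposure(x_model_coefs, xstar = 1, y = 1, confounders = 0)
p_unflagged <- predict_true_exposure(x_model_coefs, xstar = 0, y = 1, confounders = 0)
```

A positive coefficient on X* means that a record flagged as exposed is assigned a higher probability of being truly exposed than an otherwise identical unflagged record.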
Values for the regression coefficients can be applied as fixed values or as
single draws from a probability distribution (e.g., rnorm(1, mean = 2, sd = 1)).
The latter has the advantage of allowing the researcher to capture the
uncertainty in the bias parameter estimates. To incorporate this uncertainty in
the estimate and confidence interval, this function should be run in a loop
across bootstrap samples of the dataframe for analysis. The estimate and
confidence interval would then be obtained from the median and quantiles
of the distribution of odds ratio estimates.
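The bootstrap procedure described above can be sketched as follows. This is a self-contained illustration in base R, not the package's implementation: adjust_once is a hypothetical stand-in that imputes the true exposure from the x_model_coefs model and refits the outcome model, the data are simulated, and the standard deviation of 0.1 placed on each bias parameter is an arbitrary assumption. In a real analysis the call to adjust_once would be replaced by a call to adjust_em on each bootstrap sample.

```r
set.seed(1)

# Simulated stand-in for an observed data set (columns Xstar, Y, C1).
n <- 2000
sim <- data.frame(C1 = rbinom(n, 1, 0.5))
sim$Xstar <- rbinom(n, 1, plogis(-1 + 0.5 * sim$C1))
sim$Y     <- rbinom(n, 1, plogis(-2 + 0.7 * sim$Xstar + 0.4 * sim$C1))

# Hypothetical stand-in for one bias-adjusted analysis: impute X from the
# bias model, then estimate the X-Y odds ratio adjusted for C1.
adjust_once <- function(data, coefs) {
  p_x <- plogis(
    coefs[1] + coefs[2] * data$Xstar + coefs[3] * data$Y + coefs[4] * data$C1
  )
  data$X <- rbinom(nrow(data), 1, p_x)
  fit <- glm(Y ~ X + C1, family = binomial(), data = data)
  unname(exp(coef(fit)["X"]))
}

n_boot <- 200
or_draws <- numeric(n_boot)

for (b in seq_len(n_boot)) {
  # Resample rows with replacement.
  boot <- sim[sample(nrow(sim), replace = TRUE), ]
  # Draw each bias parameter once per iteration (assumed sd = 0.1).
  coefs <- rnorm(4, mean = c(-2.10, 1.62, 0.63, 0.35), sd = 0.1)
  or_draws[b] <- adjust_once(boot, coefs)
}

# Summarise the distribution of odds ratio estimates.
estimate <- median(or_draws)
ci <- quantile(or_draws, probs = c(0.025, 0.975))
```

Drawing the coefficients inside the loop is what lets the final interval reflect uncertainty in the bias parameters as well as sampling variability.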
A list where the first item is the odds ratio estimate of the effect of the exposure on the outcome and the second item is the confidence interval as the vector: (lower bound, upper bound).
df_observed <- data_observed(
  data = df_em,
  exposure = "Xstar",
  outcome = "Y",
  confounders = "C1"
)

# Using validation data -----------------------------------------------------
df_validation <- data_validation(
  data = df_em_source,
  true_exposure = "X",
  true_outcome = "Y",
  confounders = "C1",
  misclassified_exposure = "Xstar"
)

adjust_em(
  data_observed = df_observed,
  data_validation = df_validation
)

# Using x_model_coefs -------------------------------------------------------
adjust_em(
  data_observed = df_observed,
  x_model_coefs = c(-2.10, 1.62, 0.63, 0.35)
)
adjust_em_om
returns the exposure-outcome odds ratio and confidence interval, adjusted for
exposure misclassification and outcome misclassification. Two different options
for the bias parameters are available here: 1) parameters from separate models
of X and Y (x_model_coefs and y_model_coefs) or 2) parameters from a joint
model of X and Y (x1y0_model_coefs, x0y1_model_coefs, and x1y1_model_coefs).
adjust_em_om( data_observed, x_model_coefs = NULL, y_model_coefs = NULL, x1y0_model_coefs = NULL, x0y1_model_coefs = NULL, x1y1_model_coefs = NULL, level = 0.95 )
data_observed |
Object of class |
x_model_coefs |
The regression coefficients corresponding to the model: $\operatorname{logit}(P(X=1)) = \delta_0 + \delta_1 X^* + \delta_2 Y^* + \delta_{2+j} C_j$, where X represents the binary true exposure, X* is the binary misclassified exposure, Y* is the binary misclassified outcome, C represents the vector of measured confounders (if any), and j corresponds to the number of measured confounders. The number of parameters is therefore 3 + j. |
y_model_coefs |
The regression coefficients corresponding to the model: $\operatorname{logit}(P(Y=1)) = \beta_0 + \beta_1 X + \beta_2 Y^* + \beta_{2+j} C_j$, where Y represents the binary true outcome, X is the binary exposure, Y* is the binary misclassified outcome, C represents the vector of measured confounders (if any), and j corresponds to the number of measured confounders. The number of parameters is therefore 3 + j. |
x1y0_model_coefs |
The regression coefficients corresponding to the model: $\log\left(\frac{P(X=1,Y=0)}{P(X=0,Y=0)}\right) = \gamma_{1,0} + \gamma_{1,1} X^* + \gamma_{1,2} Y^* + \gamma_{1,2+j} C_j$, where X is the binary true exposure, Y is the binary true outcome, X* is the binary misclassified exposure, Y* is the binary misclassified outcome, C represents the vector of measured confounders (if any), and j corresponds to the number of measured confounders. |
x0y1_model_coefs |
The regression coefficients corresponding to the model: $\log\left(\frac{P(X=0,Y=1)}{P(X=0,Y=0)}\right) = \gamma_{2,0} + \gamma_{2,1} X^* + \gamma_{2,2} Y^* + \gamma_{2,2+j} C_j$, where X is the binary true exposure, Y is the binary true outcome, X* is the binary misclassified exposure, Y* is the binary misclassified outcome, C represents the vector of measured confounders (if any), and j corresponds to the number of measured confounders. |
x1y1_model_coefs |
The regression coefficients corresponding to the model: $\log\left(\frac{P(X=1,Y=1)}{P(X=0,Y=0)}\right) = \gamma_{3,0} + \gamma_{3,1} X^* + \gamma_{3,2} Y^* + \gamma_{3,2+j} C_j$, where X is the binary true exposure, Y is the binary true outcome, X* is the binary misclassified exposure, Y* is the binary misclassified outcome, C represents the vector of measured confounders (if any), and j corresponds to the number of measured confounders. |
level |
Value from 0-1 representing the full range of the confidence interval. Default is 0.95. |
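The three joint-model coefficient vectors describe log-odds relative to the reference category (X=0, Y=0), which is the form of a multinomial logistic model. A minimal base R sketch of how they translate into the four joint probabilities for one record follows. joint_probs is a hypothetical helper, not a package function, and the coefficient values are the ones used in the adjust_em_om example.

```r
# Hypothetical helper (not part of multibias): joint probabilities of the
# true (X, Y) pair given X*, Y*, and measured confounders.
joint_probs <- function(x1y0, x0y1, x1y1, xstar, ystar, confounders = numeric(0)) {
  design <- c(1, xstar, ystar, confounders)    # intercept, X*, Y*, then C1..Cj
  stopifnot(
    length(x1y0) == length(design),
    length(x0y1) == length(design),
    length(x1y1) == length(design)
  )
  # Linear predictors; the reference category (X=0, Y=0) is fixed at 0.
  eta <- c(
    x0y0 = 0,
    x1y0 = sum(x1y0 * design),
    x0y1 = sum(x0y1 * design),
    x1y1 = sum(x1y1 * design)
  )
  exp(eta) / sum(exp(eta))                     # normalise to probabilities
}

probs <- joint_probs(
  x1y0 = c(-2.18, 1.63, 0.23, 0.36),
  x0y1 = c(-3.17, 0.22, 1.60, 0.40),
  x1y1 = c(-4.76, 1.82, 1.83, 0.72),
  xstar = 1, ystar = 1, confounders = 0
)
```

Because the four categories are exhaustive, the returned probabilities sum to 1 for any record.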
Values for the regression coefficients can be applied as fixed values or as
single draws from a probability distribution (e.g., rnorm(1, mean = 2, sd = 1)).
The latter has the advantage of allowing the researcher to capture the
uncertainty in the bias parameter estimates. To incorporate this uncertainty in
the estimate and confidence interval, this function should be run in a loop
across bootstrap samples of the dataframe for analysis. The estimate and
confidence interval would then be obtained from the median and quantiles
of the distribution of odds ratio estimates.
A list where the first item is the odds ratio estimate of the effect of the exposure on the outcome and the second item is the confidence interval as the vector: (lower bound, upper bound).
df <- data_observed(
  data = df_em_om,
  exposure = "Xstar",
  outcome = "Ystar",
  confounders = "C1"
)

# Using x_model_coefs and y_model_coefs -------------------------------------
adjust_em_om(
  df,
  x_model_coefs = c(-2.15, 1.64, 0.35, 0.38),
  y_model_coefs = c(-3.10, 0.63, 1.60, 0.39)
)

# Using x1y0_model_coefs, x0y1_model_coefs, and x1y1_model_coefs ------------
adjust_em_om(
  df,
  x1y0_model_coefs = c(-2.18, 1.63, 0.23, 0.36),
  x0y1_model_coefs = c(-3.17, 0.22, 1.60, 0.40),
  x1y1_model_coefs = c(-4.76, 1.82, 1.83, 0.72)
)
adjust_em_sel
returns the exposure-outcome odds ratio and confidence
interval, adjusted for exposure misclassification and selection bias.
adjust_em_sel(data_observed, x_model_coefs, s_model_coefs, level = 0.95)
data_observed |
Object of class |
x_model_coefs |
The regression coefficients corresponding to the model: $\operatorname{logit}(P(X=1)) = \delta_0 + \delta_1 X^* + \delta_2 Y + \delta_{2+j} C_j$, where X represents the binary true exposure, X* is the binary misclassified exposure, Y is the outcome, C represents the vector of measured confounders (if any), and j corresponds to the number of measured confounders. The number of parameters is therefore 3 + j. |
s_model_coefs |
The regression coefficients corresponding to the model: $\operatorname{logit}(P(S=1)) = \beta_0 + \beta_1 X^* + \beta_2 Y + \beta_{2+j} C_j$, where S represents binary selection, X* is the binary misclassified exposure, Y is the outcome, C represents the vector of measured confounders (if any), and j corresponds to the number of measured confounders. The number of parameters is therefore 3 + j. |
level |
Value from 0-1 representing the full range of the confidence interval. Default is 0.95. |
Values for the regression coefficients can be applied as fixed values or as
single draws from a probability distribution (e.g., rnorm(1, mean = 2, sd = 1)).
The latter has the advantage of allowing the researcher to capture the
uncertainty in the bias parameter estimates. To incorporate this uncertainty in
the estimate and confidence interval, this function should be run in a loop
across bootstrap samples of the dataframe for analysis. The estimate and
confidence interval would then be obtained from the median and quantiles
of the distribution of odds ratio estimates.
A list where the first item is the odds ratio estimate of the effect of the exposure on the outcome and the second item is the confidence interval as the vector: (lower bound, upper bound).
df <- data_observed(
  data = df_em_sel,
  exposure = "Xstar",
  outcome = "Y",
  confounders = "C1"
)

adjust_em_sel(
  df,
  x_model_coefs = c(-2.78, 1.62, 0.58, 0.34),
  s_model_coefs = c(0.04, 0.18, 0.92, 0.05)
)
adjust_om
returns the exposure-outcome odds ratio and confidence
interval, adjusted for outcome misclassification.
adjust_om( data_observed, data_validation = NULL, y_model_coefs = NULL, level = 0.95 )
data_observed |
Object of class |
data_validation |
Object of class |
y_model_coefs |
The regression coefficients corresponding to the model: $\operatorname{logit}(P(Y=1)) = \delta_0 + \delta_1 X + \delta_2 Y^* + \delta_{2+j} C_j$, where Y represents the binary true outcome, X is the exposure, Y* is the binary misclassified outcome, C represents the vector of measured confounders (if any), and j corresponds to the number of measured confounders. The number of parameters is therefore 3 + j. |
level |
Value from 0-1 representing the full range of the confidence interval. Default is 0.95. |
Values for the regression coefficients can be applied as fixed values or as
single draws from a probability distribution (e.g., rnorm(1, mean = 2, sd = 1)).
The latter has the advantage of allowing the researcher to capture the
uncertainty in the bias parameter estimates. To incorporate this uncertainty in
the estimate and confidence interval, this function should be run in a loop
across bootstrap samples of the dataframe for analysis. The estimate and
confidence interval would then be obtained from the median and quantiles
of the distribution of odds ratio estimates.
A list where the first item is the odds ratio estimate of the effect of the exposure on the outcome and the second item is the confidence interval as the vector: (lower bound, upper bound).
df_observed <- data_observed(
  data = df_om,
  exposure = "X",
  outcome = "Ystar",
  confounders = "C1"
)

# Using validation data -----------------------------------------------------
df_validation <- data_validation(
  data = df_om_source,
  true_exposure = "X",
  true_outcome = "Y",
  confounders = "C1",
  misclassified_outcome = "Ystar"
)

adjust_om(
  data_observed = df_observed,
  data_validation = df_validation
)

# Using y_model_coefs -------------------------------------------------------
adjust_om(
  data_observed = df_observed,
  y_model_coefs = c(-3.1, 0.6, 1.6, 0.4)
)
adjust_om_sel
returns the exposure-outcome odds ratio and confidence
interval, adjusted for outcome misclassification and selection bias.
adjust_om_sel(data_observed, y_model_coefs, s_model_coefs, level = 0.95)
data_observed |
Object of class |
y_model_coefs |
The regression coefficients corresponding to the model: $\operatorname{logit}(P(Y=1)) = \delta_0 + \delta_1 X + \delta_2 Y^* + \delta_{2+j} C_j$, where Y represents the binary true outcome, X is the exposure, Y* is the binary misclassified outcome, C represents the vector of measured confounders (if any), and j corresponds to the number of measured confounders. The number of parameters is therefore 3 + j. |
s_model_coefs |
The regression coefficients corresponding to the model: $\operatorname{logit}(P(S=1)) = \beta_0 + \beta_1 X + \beta_2 Y^* + \beta_{2+j} C_j$, where S represents binary selection, X is the exposure, Y* is the binary misclassified outcome, C represents the vector of measured confounders (if any), and j corresponds to the number of measured confounders. The number of parameters is therefore 3 + j. |
level |
Value from 0-1 representing the full range of the confidence interval. Default is 0.95. |
Values for the regression coefficients can be applied as fixed values or as
single draws from a probability distribution (e.g., rnorm(1, mean = 2, sd = 1)).
The latter has the advantage of allowing the researcher to capture the
uncertainty in the bias parameter estimates. To incorporate this uncertainty in
the estimate and confidence interval, this function should be run in a loop
across bootstrap samples of the dataframe for analysis. The estimate and
confidence interval would then be obtained from the median and quantiles
of the distribution of odds ratio estimates.
A list where the first item is the odds ratio estimate of the effect of the exposure on the outcome and the second item is the confidence interval as the vector: (lower bound, upper bound).
df <- data_observed(
  data = df_om_sel,
  exposure = "X",
  outcome = "Ystar",
  confounders = "C1"
)

adjust_om_sel(
  df,
  y_model_coefs = c(-3.24, 0.58, 1.59, 0.45),
  s_model_coefs = c(0.03, 0.92, 0.12, 0.05)
)
adjust_sel
returns the exposure-outcome odds ratio and confidence
interval, adjusted for selection bias.
adjust_sel( data_observed, data_validation = NULL, s_model_coefs = NULL, level = 0.95 )
data_observed |
Object of class |
data_validation |
Object of class |
s_model_coefs |
The regression coefficients corresponding to the model: $\operatorname{logit}(P(S=1)) = \beta_0 + \beta_1 X + \beta_2 Y$, where S represents binary selection, X is the exposure, and Y is the outcome. The number of parameters is therefore 3. |
level |
Value from 0-1 representing the full range of the confidence interval. Default is 0.95. |
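The package description refers to inverse probability of selection weighting. As a rough illustration of what a set of s_model_coefs implies, the base R sketch below computes P(S=1) and its inverse for each exposure-outcome combination, using the coefficients from the adjust_sel example. It illustrates the weighting idea under the stated selection model and is not a reproduction of the package's internals; the column names p_selected and weight are chosen here for illustration.

```r
# Coefficients ordered as (intercept, X, Y), matching logit(P(S=1)) = b0 + b1*X + b2*Y.
s_model_coefs <- c(0, 0.9, 0.9)

# All four exposure-outcome combinations.
grid <- expand.grid(X = 0:1, Y = 0:1)

# Probability of selection into the study for each combination.
grid$p_selected <- plogis(
  s_model_coefs[1] + s_model_coefs[2] * grid$X + s_model_coefs[3] * grid$Y
)

# Inverse probability of selection weight: records from groups that were less
# likely to be selected count for more.
grid$weight <- 1 / grid$p_selected
```

With these coefficients, unexposed non-cases have a selection probability of 0.5 and therefore a weight of 2, while exposed cases are the most likely to be selected and receive the smallest weight.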
Values for the regression coefficients can be applied as fixed values or as
single draws from a probability distribution (e.g., rnorm(1, mean = 2, sd = 1)).
The latter has the advantage of allowing the researcher to capture the
uncertainty in the bias parameter estimates. To incorporate this uncertainty in
the estimate and confidence interval, this function should be run in a loop
across bootstrap samples of the dataframe for analysis. The estimate and
confidence interval would then be obtained from the median and quantiles
of the distribution of odds ratio estimates.
A list where the first item is the odds ratio estimate of the effect of the exposure on the outcome and the second item is the confidence interval as the vector: (lower bound, upper bound).
df_observed <- data_observed(
  data = df_sel,
  exposure = "X",
  outcome = "Y",
  confounders = "C1"
)

# Using validation data -----------------------------------------------------
df_validation <- data_validation(
  data = df_sel_source,
  true_exposure = "X",
  true_outcome = "Y",
  confounders = "C1",
  selection = "S"
)

adjust_sel(
  data_observed = df_observed,
  data_validation = df_validation
)

# Using s_model_coefs -------------------------------------------------------
adjust_sel(
  data_observed = df_observed,
  s_model_coefs = c(0, 0.9, 0.9)
)
adjust_uc
returns the exposure-outcome odds ratio and confidence
interval, adjusted for uncontrolled confounding from a binary confounder.
adjust_uc( data_observed, data_validation = NULL, u_model_coefs = NULL, level = 0.95 )
data_observed |
Object of class |
data_validation |
Object of class |
u_model_coefs |
The regression coefficients corresponding to the model: $\operatorname{logit}(P(U=1)) = \alpha_0 + \alpha_1 X + \alpha_2 Y + \alpha_{2+j} C_j$, where U is the binary unmeasured confounder, X is the exposure, Y is the outcome, C represents the vector of measured confounders (if any), and j corresponds to the number of measured confounders. The number of parameters therefore equals 3 + j. |
level |
Value from 0-1 representing the full range of the confidence interval. Default is 0.95. |
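When a data source that records U is available, one plausible way to arrive at values for u_model_coefs is to fit the stated logistic model to that source and read off the coefficients. The base R sketch below does this on simulated data. The data frame val, its generating parameters, and the single confounder C1 are all assumptions made for illustration; this is not taken from the package's own workflow.

```r
set.seed(42)

# Simulated stand-in for a data source in which U was measured.
n <- 5000
val <- data.frame(
  C1 = rbinom(n, 1, 0.5),
  U  = rbinom(n, 1, 0.5)
)
val$X <- rbinom(n, 1, plogis(-1 + 0.5 * val$C1 + 0.6 * val$U))
val$Y <- rbinom(n, 1, plogis(-2 + 0.7 * val$X + 0.4 * val$C1 + 0.8 * val$U))

# Fit logit(P(U=1)) = a0 + a1*X + a2*Y + a3*C1.
u_fit <- glm(U ~ X + Y + C1, family = binomial(), data = val)

# Coefficients in the order (intercept, X, Y, C1): 3 + j values with j = 1.
u_model_coefs <- unname(coef(u_fit))
```

The order of terms in the model formula determines the order of the resulting vector, so it should follow the order in the model equation: intercept, exposure, outcome, then confounders.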
Values for the regression coefficients can be applied as fixed values or as
single draws from a probability distribution (e.g., rnorm(1, mean = 2, sd = 1)).
The latter has the advantage of allowing the researcher to capture the
uncertainty in the bias parameter estimates. To incorporate this uncertainty in
the estimate and confidence interval, this function should be run in a loop
across bootstrap samples of the dataframe for analysis. The estimate and
confidence interval would then be obtained from the median and quantiles
of the distribution of odds ratio estimates.
A list where the first item is the odds ratio estimate of the effect of the exposure on the outcome and the second item is the confidence interval as the vector: (lower bound, upper bound).
df_observed <- data_observed(
  data = df_uc,
  exposure = "X_bi",
  outcome = "Y_bi",
  confounders = c("C1", "C2", "C3")
)

# Using validation data -----------------------------------------------------
df_validation <- data_validation(
  data = df_uc_source,
  true_exposure = "X_bi",
  true_outcome = "Y_bi",
  confounders = c("C1", "C2", "C3", "U")
)

adjust_uc(
  data_observed = df_observed,
  data_validation = df_validation
)

# Using u_model_coefs -------------------------------------------------------
adjust_uc(
  data_observed = df_observed,
  u_model_coefs = c(-0.19, 0.61, 0.70, -0.09, 0.10, -0.15)
)
adjust_uc_em
returns the exposure-outcome odds ratio and confidence interval, adjusted for
uncontrolled confounding and exposure misclassification. Two different options
for the bias parameters are available here: 1) parameters from separate models
of U and X (u_model_coefs and x_model_coefs) or 2) parameters from a joint
model of U and X (x1u0_model_coefs, x0u1_model_coefs, and x1u1_model_coefs).
adjust_uc_em( data_observed, u_model_coefs = NULL, x_model_coefs = NULL, x1u0_model_coefs = NULL, x0u1_model_coefs = NULL, x1u1_model_coefs = NULL, level = 0.95 )
data_observed |
Object of class |
u_model_coefs |
The regression coefficients corresponding to the model: $\operatorname{logit}(P(U=1)) = \alpha_0 + \alpha_1 X + \alpha_2 Y$, where U is the binary unmeasured confounder, X is the binary true exposure, and Y is the outcome. The number of parameters therefore equals 3. |
x_model_coefs |
The regression coefficients corresponding to the model: $\operatorname{logit}(P(X=1)) = \delta_0 + \delta_1 X^* + \delta_2 Y + \delta_{2+j} C_j$, where X represents the binary true exposure, X* is the binary misclassified exposure, Y is the outcome, C represents the vector of measured confounders (if any), and j corresponds to the number of measured confounders. The number of parameters therefore equals 3 + j. |
x1u0_model_coefs |
The regression coefficients corresponding to the model: $\log\left(\frac{P(X=1,U=0)}{P(X=0,U=0)}\right) = \gamma_{1,0} + \gamma_{1,1} X^* + \gamma_{1,2} Y + \gamma_{1,2+j} C_j$, where X is the binary true exposure, U is the binary unmeasured confounder, X* is the binary misclassified exposure, Y is the outcome, C represents the vector of measured confounders (if any), and j corresponds to the number of measured confounders. |
x0u1_model_coefs |
The regression coefficients corresponding to the model: $\log\left(\frac{P(X=0,U=1)}{P(X=0,U=0)}\right) = \gamma_{2,0} + \gamma_{2,1} X^* + \gamma_{2,2} Y + \gamma_{2,2+j} C_j$, where X is the binary true exposure, U is the binary unmeasured confounder, X* is the binary misclassified exposure, Y is the outcome, C represents the vector of measured confounders (if any), and j corresponds to the number of measured confounders. |
x1u1_model_coefs |
The regression coefficients corresponding to the model: $\log\left(\frac{P(X=1,U=1)}{P(X=0,U=0)}\right) = \gamma_{3,0} + \gamma_{3,1} X^* + \gamma_{3,2} Y + \gamma_{3,2+j} C_j$, where X is the binary true exposure, U is the binary unmeasured confounder, X* is the binary misclassified exposure, Y is the outcome, C represents the vector of measured confounders (if any), and j corresponds to the number of measured confounders. |
level |
Value from 0-1 representing the full range of the confidence interval. Default is 0.95. |
Values for the regression coefficients can be applied as fixed values or as
single draws from a probability distribution (e.g., rnorm(1, mean = 2, sd = 1)).
The latter has the advantage of allowing the researcher to capture the
uncertainty in the bias parameter estimates. To incorporate this uncertainty in
the estimate and confidence interval, this function should be run in a loop
across bootstrap samples of the dataframe for analysis. The estimate and
confidence interval would then be obtained from the median and quantiles
of the distribution of odds ratio estimates.
A list where the first item is the odds ratio estimate of the effect of the exposure on the outcome and the second item is the confidence interval as the vector: (lower bound, upper bound).
df <- data_observed(
  data = df_uc_em,
  exposure = "Xstar",
  outcome = "Y",
  confounders = "C1"
)

# Using u_model_coefs and x_model_coefs -------------------------------------
adjust_uc_em(
  df,
  u_model_coefs = c(-0.23, 0.63, 0.66),
  x_model_coefs = c(-2.47, 1.62, 0.73, 0.32)
)

# Using x1u0_model_coefs, x0u1_model_coefs, x1u1_model_coefs ----------------
adjust_uc_em(
  df,
  x1u0_model_coefs = c(-2.82, 1.62, 0.68, -0.06),
  x0u1_model_coefs = c(-0.20, 0.00, 0.68, -0.05),
  x1u1_model_coefs = c(-2.36, 1.62, 1.29, 0.27)
)
adjust_uc_em_sel
returns the exposure-outcome odds ratio and confidence interval, adjusted for
uncontrolled confounding, exposure misclassification, and selection bias. Two
different options for the bias parameters are available here: 1) parameters
from separate models of U and X (u_model_coefs and x_model_coefs) or
2) parameters from a joint model of U and X (x1u0_model_coefs,
x0u1_model_coefs, and x1u1_model_coefs). Both approaches require s_model_coefs.
adjust_uc_em_sel( data_observed, u_model_coefs = NULL, x_model_coefs = NULL, x1u0_model_coefs = NULL, x0u1_model_coefs = NULL, x1u1_model_coefs = NULL, s_model_coefs, level = 0.95 )
data_observed |
Object of class |
u_model_coefs |
The regression coefficients corresponding to the model: $\operatorname{logit}(P(U=1)) = \alpha_0 + \alpha_1 X + \alpha_2 Y$, where U is the binary unmeasured confounder, X is the binary true exposure, and Y is the outcome. The number of parameters therefore equals 3. |
x_model_coefs |
The regression coefficients corresponding to the model: $\operatorname{logit}(P(X=1)) = \delta_0 + \delta_1 X^* + \delta_2 Y + \delta_{2+j} C_j$, where X represents the binary true exposure, X* is the binary misclassified exposure, Y is the outcome, C represents the vector of measured confounders (if any), and j corresponds to the number of measured confounders. The number of parameters therefore equals 3 + j. |
x1u0_model_coefs |
The regression coefficients corresponding to the model: $\log\left(\frac{P(X=1,U=0)}{P(X=0,U=0)}\right) = \gamma_{1,0} + \gamma_{1,1} X^* + \gamma_{1,2} Y + \gamma_{1,2+j} C_j$, where X is the binary true exposure, U is the binary unmeasured confounder, X* is the binary misclassified exposure, Y is the outcome, C represents the vector of measured confounders (if any), and j corresponds to the number of measured confounders. |
x0u1_model_coefs |
The regression coefficients corresponding to the model: $\log\left(\frac{P(X=0,U=1)}{P(X=0,U=0)}\right) = \gamma_{2,0} + \gamma_{2,1} X^* + \gamma_{2,2} Y + \gamma_{2,2+j} C_j$, where X is the binary true exposure, U is the binary unmeasured confounder, X* is the binary misclassified exposure, Y is the outcome, C represents the vector of measured confounders (if any), and j corresponds to the number of measured confounders. |
x1u1_model_coefs |
The regression coefficients corresponding to the model: $\log\left(\frac{P(X=1,U=1)}{P(X=0,U=0)}\right) = \gamma_{3,0} + \gamma_{3,1} X^* + \gamma_{3,2} Y + \gamma_{3,2+j} C_j$, where X is the binary true exposure, U is the binary unmeasured confounder, X* is the binary misclassified exposure, Y is the outcome, C represents the vector of measured confounders (if any), and j corresponds to the number of measured confounders. |
s_model_coefs |
The regression coefficients corresponding to the model: $\operatorname{logit}(P(S=1)) = \beta_0 + \beta_1 X^* + \beta_2 Y + \beta_{2+j} C_j$, where S represents binary selection, X* is the binary misclassified exposure, Y is the outcome, C represents the vector of measured confounders (if any), and j corresponds to the number of measured confounders. The number of parameters therefore equals 3 + j. |
level |
Value from 0-1 representing the full range of the confidence interval. Default is 0.95. |
Values for the regression coefficients can be applied as fixed values or as
single draws from a probability distribution (e.g., rnorm(1, mean = 2, sd = 1)).
The latter has the advantage of allowing the researcher to capture the
uncertainty in the bias parameter estimates. To incorporate this uncertainty in
the estimate and confidence interval, this function should be run in a loop
across bootstrap samples of the dataframe for analysis. The estimate and
confidence interval would then be obtained from the median and quantiles
of the distribution of odds ratio estimates.
A list where the first item is the odds ratio estimate of the effect of the exposure on the outcome and the second item is the confidence interval as the vector: (lower bound, upper bound).
df <- data_observed(
  data = df_uc_em_sel,
  exposure = "Xstar",
  outcome = "Y",
  confounders = c("C1", "C2", "C3")
)

# Using u_model_coefs, x_model_coefs, s_model_coefs -------------------------
adjust_uc_em_sel(
  df,
  u_model_coefs = c(-0.32, 0.59, 0.69),
  x_model_coefs = c(-2.44, 1.62, 0.72, 0.32, -0.15, 0.85),
  s_model_coefs = c(0.00, 0.26, 0.78, 0.03, -0.02, 0.10)
)

# Using x1u0_model_coefs, x0u1_model_coefs, x1u1_model_coefs, s_model_coefs
adjust_uc_em_sel(
  df,
  x1u0_model_coefs = c(-2.78, 1.62, 0.61, 0.36, -0.27, 0.88),
  x0u1_model_coefs = c(-0.17, -0.01, 0.71, -0.08, 0.07, -0.15),
  x1u1_model_coefs = c(-2.36, 1.62, 1.29, 0.25, -0.06, 0.74),
  s_model_coefs = c(0.00, 0.26, 0.78, 0.03, -0.02, 0.10)
)
adjust_uc_om
returns the exposure-outcome odds ratio and confidence interval, adjusted for
uncontrolled confounding and outcome misclassification. Two different options
for the bias parameters are available here: 1) parameters from separate models
of U and Y (u_model_coefs and y_model_coefs) or 2) parameters from a joint
model of U and Y (u1y0_model_coefs, u0y1_model_coefs, and u1y1_model_coefs).
adjust_uc_om( data_observed, u_model_coefs = NULL, y_model_coefs = NULL, u1y0_model_coefs = NULL, u0y1_model_coefs = NULL, u1y1_model_coefs = NULL, level = 0.95 )
data_observed |
Object of class |
u_model_coefs |
The regression coefficients corresponding to the model: $\operatorname{logit}(P(U=1)) = \alpha_0 + \alpha_1 X + \alpha_2 Y$, where U is the binary unmeasured confounder, X is the exposure, and Y is the binary true outcome. The number of parameters therefore equals 3. |
y_model_coefs |
The regression coefficients corresponding to the model: $\operatorname{logit}(P(Y=1)) = \delta_0 + \delta_1 X + \delta_2 Y^* + \delta_{2+j} C_j$, where Y represents the binary true outcome, X is the exposure, Y* is the binary misclassified outcome, C represents the vector of measured confounders (if any), and j corresponds to the number of measured confounders. The number of parameters therefore equals 3 + j. |
u1y0_model_coefs |
The regression coefficients corresponding to the model: log(P(U=1,Y=0)/P(U=0,Y=0)) = γ1,0 + γ1,1X + γ1,2Y* + γ1,2+jCj, where U is the binary unmeasured confounder, Y is the binary true outcome, X is the exposure, Y* is the binary misclassified outcome, C represents the vector of measured confounders (if any), and j corresponds to the number of measured confounders. |
u0y1_model_coefs |
The regression coefficients corresponding to the model: log(P(U=0,Y=1)/P(U=0,Y=0)) = γ2,0 + γ2,1X + γ2,2Y* + γ2,2+jCj, where U is the binary unmeasured confounder, Y is the binary true outcome, X is the exposure, Y* is the binary misclassified outcome, C represents the vector of measured confounders (if any), and j corresponds to the number of measured confounders. |
u1y1_model_coefs |
The regression coefficients corresponding to the model: log(P(U=1,Y=1)/P(U=0,Y=0)) = γ3,0 + γ3,1X + γ3,2Y* + γ3,2+jCj, where U is the binary unmeasured confounder, Y is the binary true outcome, X is the exposure, Y* is the binary misclassified outcome, C represents the vector of measured confounders (if any), and j corresponds to the number of measured confounders. |
level |
Value from 0-1 representing the full range of the confidence interval. Default is 0.95. |
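The three joint-model coefficient vectors describe log-odds relative to the reference category (U=0, Y=0), so they can be turned into the four joint probabilities of (U, Y) with a multinomial-logit (softmax) transform. The following is a minimal base R sketch of that arithmetic for one observation with a single confounder. The helper joint_uy_probs is hypothetical and not part of the package, and the package's internal computation may differ.

```r
# Hypothetical helper (not a package function): convert the three log-odds
# vectors into the four joint probabilities of (U, Y) for one observation.
joint_uy_probs <- function(x, ystar, c1, u1y0, u0y1, u1y1) {
  design <- c(1, x, ystar, c1)      # intercept, X, Y*, C1
  eta <- c(
    u0y0 = 0,                       # reference category (U=0, Y=0)
    u1y0 = sum(u1y0 * design),
    u0y1 = sum(u0y1 * design),
    u1y1 = sum(u1y1 * design)
  )
  exp(eta) / sum(exp(eta))          # multinomial-logit (softmax) transform
}

# Coefficient values taken from the adjust_uc_om example.
p <- joint_uy_probs(
  x = 1, ystar = 0, c1 = 1,
  u1y0 = c(-0.19, 0.61, 0.00, -0.07),
  u0y1 = c(-3.21, 0.60, 1.60, 0.36),
  u1y1 = c(-2.72, 1.24, 1.59, 0.34)
)
p
```

The four probabilities sum to 1, and the log of any probability divided by the reference probability recovers the corresponding linear predictor.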
Values for the regression coefficients can be applied as fixed values or as
single draws from a probability distribution (e.g., rnorm(1, mean = 2, sd = 1)).
The latter has the advantage of allowing the researcher to capture the
uncertainty in the bias parameter estimates. To incorporate this uncertainty
in the estimate and confidence interval, this function should be run in a loop
across bootstrap samples of the dataframe for analysis. The estimate and
confidence interval are then obtained from the median and quantiles of the
distribution of odds ratio estimates.
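The bootstrap procedure described above can be sketched in base R as follows. This is only an illustration of the loop structure: the data are simulated, and estimate_or is a hypothetical stand-in for wrapping data_observed() and an adjust function. Here it fits a plain logistic regression and ignores the drawn bias parameter, which a real analysis would pass on to the adjustment call.

```r
set.seed(1234)

# Simulated stand-in for an observed dataframe (hypothetical columns).
n <- 2000
dat <- data.frame(X = rbinom(n, 1, 0.5), C1 = rbinom(n, 1, 0.3))
dat$Y <- rbinom(n, 1, plogis(-1 + log(2) * dat$X + 0.5 * dat$C1))

# Hypothetical stand-in for a bias-adjustment call. `bias_coef` marks where
# a drawn bias parameter would be used; this placeholder does not use it.
estimate_or <- function(d, bias_coef) {
  fit <- glm(Y ~ X + C1, family = binomial(link = "logit"), data = d)
  unname(exp(coef(fit)["X"]))
}

n_boot <- 200
or_draws <- numeric(n_boot)
for (i in seq_len(n_boot)) {
  boot <- dat[sample(nrow(dat), replace = TRUE), ]  # bootstrap resample
  draw <- rnorm(1, mean = 0.70, sd = 0.1)           # one draw per resample
  or_draws[i] <- estimate_or(boot, bias_coef = draw)
}

median(or_draws)                      # point estimate
quantile(or_draws, c(0.025, 0.975))   # 95% interval
```

Drawing the bias parameter inside the loop means each bootstrap sample sees a different plausible value, so the spread of or_draws reflects both sampling variability and uncertainty in the bias parameter.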
A list where the first item is the odds ratio estimate of the effect of the exposure on the outcome and the second item is the confidence interval as the vector: (lower bound, upper bound).
    df <- data_observed(
      data = df_uc_om,
      exposure = "X",
      outcome = "Ystar",
      confounders = "C1"
    )

    # Using u_model_coefs and y_model_coefs -------------------------------------
    adjust_uc_om(
      df,
      u_model_coefs = c(-0.22, 0.61, 0.70),
      y_model_coefs = c(-2.85, 0.73, 1.60, 0.38)
    )

    # Using u1y0_model_coefs, u0y1_model_coefs, u1y1_model_coefs ----------------
    adjust_uc_om(
      df,
      u1y0_model_coefs = c(-0.19, 0.61, 0.00, -0.07),
      u0y1_model_coefs = c(-3.21, 0.60, 1.60, 0.36),
      u1y1_model_coefs = c(-2.72, 1.24, 1.59, 0.34)
    )
adjust_uc_om_sel returns the exposure-outcome odds ratio and confidence
interval, adjusted for uncontrolled confounding, outcome misclassification,
and selection bias. Two options for the bias parameters are available:
1) parameters from separate models of U and Y (u_model_coefs and
y_model_coefs) or 2) parameters from a joint model of U and Y
(u1y0_model_coefs, u0y1_model_coefs, and u1y1_model_coefs). Both approaches
require s_model_coefs.
    adjust_uc_om_sel(
      data_observed,
      u_model_coefs = NULL,
      y_model_coefs = NULL,
      u0y1_model_coefs = NULL,
      u1y0_model_coefs = NULL,
      u1y1_model_coefs = NULL,
      s_model_coefs,
      level = 0.95
    )
data_observed |
Object of class |
u_model_coefs |
The regression coefficients corresponding to the model: $\operatorname{logit}(P(U=1)) = \alpha_0 + \alpha_1 X + \alpha_2 Y$, where U is the binary unmeasured confounder, X is the exposure, and Y is the binary true outcome. The number of parameters therefore equals 3. |
y_model_coefs |
The regression coefficients corresponding to the model: $\operatorname{logit}(P(Y=1)) = \delta_0 + \delta_1 X + \delta_2 Y^* + \delta_{2+j} C_j$, where Y represents the binary true outcome, X is the exposure, Y* is the binary misclassified outcome, C represents the vector of measured confounders (if any), and j corresponds to the number of measured confounders. The number of parameters therefore equals 3 + j. |
u0y1_model_coefs |
The regression coefficients corresponding to the model: $\log\left(\frac{P(U=0,Y=1)}{P(U=0,Y=0)}\right) = \gamma_{2,0} + \gamma_{2,1} X + \gamma_{2,2} Y^* + \gamma_{2,2+j} C_j$, where U is the binary unmeasured confounder, Y is the binary true outcome, X is the exposure, Y* is the binary misclassified outcome, C represents the vector of measured confounders (if any), and j corresponds to the number of measured confounders. The number of parameters therefore equals 3 + j. |
u1y0_model_coefs |
The regression coefficients corresponding to the model: $\log\left(\frac{P(U=1,Y=0)}{P(U=0,Y=0)}\right) = \gamma_{1,0} + \gamma_{1,1} X + \gamma_{1,2} Y^* + \gamma_{1,2+j} C_j$, where U is the binary unmeasured confounder, Y is the binary true outcome, X is the exposure, Y* is the binary misclassified outcome, C represents the vector of measured confounders (if any), and j corresponds to the number of measured confounders. The number of parameters therefore equals 3 + j. |
u1y1_model_coefs |
The regression coefficients corresponding to the model: $\log\left(\frac{P(U=1,Y=1)}{P(U=0,Y=0)}\right) = \gamma_{3,0} + \gamma_{3,1} X + \gamma_{3,2} Y^* + \gamma_{3,2+j} C_j$, where U is the binary unmeasured confounder, Y is the binary true outcome, X is the exposure, Y* is the binary misclassified outcome, C represents the vector of measured confounders (if any), and j corresponds to the number of measured confounders. The number of parameters therefore equals 3 + j. |
s_model_coefs |
The regression coefficients corresponding to the model: $\operatorname{logit}(P(S=1)) = \beta_0 + \beta_1 X + \beta_2 Y^* + \beta_{2+j} C_j$, where S represents binary selection, X is the exposure, Y* is the binary misclassified outcome, C represents the vector of measured confounders (if any), and j corresponds to the number of measured confounders. The number of parameters therefore equals 3 + j. |
level |
Value from 0-1 representing the full range of the confidence interval. Default is 0.95. |
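As a rough illustration of how selection-model coefficients relate to inverse probability of selection weighting, the sketch below converts s_model_coefs into a selection probability and a weight for each row. The three data rows are made up, the coefficient values come from the adjust_uc_om_sel example, and the weighting shown (1 / P(S=1)) is the textbook form, which is an assumption here rather than a description of the package's internals.

```r
# Made-up observed rows; column names follow the package examples.
obs <- data.frame(
  X = c(0, 1, 1), Ystar = c(0, 0, 1),
  C1 = c(1, 0, 1), C2 = c(0, 0, 1), C3 = c(1, 1, 0)
)
s_model_coefs <- c(0.00, 0.74, 0.19, 0.02, -0.06, 0.02)

# Linear predictor: intercept + X + Y* + C1 + C2 + C3 (3 + j terms, j = 3).
design <- cbind(1, as.matrix(obs[, c("X", "Ystar", "C1", "C2", "C3")]))
p_selected <- plogis(drop(design %*% s_model_coefs))

# Inverse probability of selection weight for each row.
obs$weight <- 1 / p_selected
obs
```

Rows with a lower estimated probability of selection receive larger weights, so that the weighted sample resembles the population before selection.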
Values for the regression coefficients can be applied as fixed values or as
single draws from a probability distribution (e.g., rnorm(1, mean = 2, sd = 1)).
The latter has the advantage of allowing the researcher to capture the
uncertainty in the bias parameter estimates. To incorporate this uncertainty
in the estimate and confidence interval, this function should be run in a loop
across bootstrap samples of the dataframe for analysis. The estimate and
confidence interval are then obtained from the median and quantiles of the
distribution of odds ratio estimates.
A list where the first item is the odds ratio estimate of the effect of the exposure on the outcome and the second item is the confidence interval as the vector: (lower bound, upper bound).
    df <- data_observed(
      data = df_uc_om_sel,
      exposure = "X",
      outcome = "Ystar",
      confounders = c("C1", "C2", "C3")
    )

    # Using u_model_coefs, y_model_coefs, s_model_coefs -------------------------
    adjust_uc_om_sel(
      df,
      u_model_coefs = c(-0.32, 0.59, 0.69),
      y_model_coefs = c(-2.85, 0.71, 1.63, 0.40, -0.85, 0.22),
      s_model_coefs = c(0.00, 0.74, 0.19, 0.02, -0.06, 0.02)
    )

    # Using u1y0_model_coefs, u0y1_model_coefs, u1y1_model_coefs, s_model_coefs
    adjust_uc_om_sel(
      df,
      u1y0_model_coefs = c(-0.20, 0.62, 0.01, -0.08, 0.10, -0.15),
      u0y1_model_coefs = c(-3.28, 0.63, 1.65, 0.42, -0.85, 0.26),
      u1y1_model_coefs = c(-2.70, 1.22, 1.64, 0.32, -0.77, 0.09),
      s_model_coefs = c(0.00, 0.74, 0.19, 0.02, -0.06, 0.02)
    )
adjust_uc_sel returns the exposure-outcome odds ratio and confidence
interval, adjusted for uncontrolled confounding and selection bias.
adjust_uc_sel(data_observed, u_model_coefs, s_model_coefs, level = 0.95)
data_observed |
Object of class |
u_model_coefs |
The regression coefficients corresponding to the model: $\operatorname{logit}(P(U=1)) = \alpha_0 + \alpha_1 X + \alpha_2 Y + \alpha_{2+j} C_j$, where U is the binary unmeasured confounder, X is the exposure, Y is the outcome, C represents the vector of measured confounders (if any), and j corresponds to the number of measured confounders. The number of parameters therefore equals 3 + j. |
s_model_coefs |
The regression coefficients corresponding to the model: $\operatorname{logit}(P(S=1)) = \beta_0 + \beta_1 X + \beta_2 Y$, where S represents binary selection, X is the exposure, and Y is the outcome. The number of parameters therefore equals 3. |
level |
Value from 0-1 representing the full range of the confidence interval. Default is 0.95. |
Values for the regression coefficients can be applied as fixed values or as
single draws from a probability distribution (e.g., rnorm(1, mean = 2, sd = 1)).
The latter has the advantage of allowing the researcher to capture the
uncertainty in the bias parameter estimates. To incorporate this uncertainty
in the estimate and confidence interval, this function should be run in a loop
across bootstrap samples of the dataframe for analysis. The estimate and
confidence interval are then obtained from the median and quantiles of the
distribution of odds ratio estimates.
A list where the first item is the odds ratio estimate of the effect of the exposure on the outcome and the second item is the confidence interval as the vector: (lower bound, upper bound).
    df <- data_observed(
      data = df_uc_sel,
      exposure = "X",
      outcome = "Y",
      confounders = c("C1", "C2", "C3")
    )

    adjust_uc_sel(
      df,
      u_model_coefs = c(-0.19, 0.61, 0.72, -0.09, 0.10, -0.15),
      s_model_coefs = c(-0.01, 0.92, 0.94)
    )
data_observed combines the observed dataframe with specific identification
of the columns corresponding to the exposure, outcome, and confounders. It is
an essential input of all adjust functions.
data_observed(data, exposure, outcome, confounders = NULL)
data |
Dataframe for bias analysis. |
exposure |
String name of the column in |
outcome |
String name of the column in |
confounders |
String name(s) of the column(s) in |
    df <- data_observed(
      data = df_sel,
      exposure = "X",
      outcome = "Y",
      confounders = c("C1", "C2", "C3")
    )
data_validation combines the validation dataframe with specific
identification of the appropriate columns for bias adjustment, including:
true exposure, true outcome, confounders, misclassified exposure,
misclassified outcome, and selection. The purpose of validation data is to
use an external data source to transport the necessary causal relationships
that are missing in the observed data.
    data_validation(
      data,
      true_exposure,
      true_outcome,
      confounders = NULL,
      misclassified_exposure = NULL,
      misclassified_outcome = NULL,
      selection = NULL
    )
data |
Dataframe of validation data |
true_exposure |
String name of the column in |
true_outcome |
String name of the column in |
confounders |
String name(s) of the column(s) in |
misclassified_exposure |
String name of the column in |
misclassified_outcome |
String name of the column in |
selection |
String name of the column in |
    df <- data_validation(
      data = df_sel_source,
      true_exposure = "X",
      true_outcome = "Y",
      confounders = c("C1", "C2", "C3"),
      selection = "S"
    )
Data containing one source of bias, three known confounders, and
100,000 observations. This data is obtained from df_em_source by removing
the column X. The resulting data corresponds to what a researcher would see
in the real world: a misclassified exposure, Xstar, and no data on the true
exposure. As seen in df_em_source, the true, unbiased exposure-outcome odds
ratio = 2.
df_em
A dataframe with 100,000 rows and 5 columns:
misclassified exposure, 1 = present and 0 = absent
outcome, 1 = present and 0 = absent
1st confounder, 1 = present and 0 = absent
2nd confounder, 1 = present and 0 = absent
3rd confounder, 1 = present and 0 = absent
Data containing two sources of bias, three known confounders, and
100,000 observations. This data is obtained from df_em_om_source by
removing the columns X and Y. The resulting data corresponds to what a
researcher would see in the real world: a misclassified exposure, Xstar,
and a misclassified outcome, Ystar. As seen in df_em_om_source, the true,
unbiased exposure-outcome odds ratio = 2.
df_em_om
A dataframe with 100,000 rows and 5 columns:
misclassified exposure, 1 = present and 0 = absent
misclassified outcome, 1 = present and 0 = absent
1st confounder, 1 = present and 0 = absent
2nd confounder, 1 = present and 0 = absent
3rd confounder, 1 = present and 0 = absent
df_em_om
Data with complete information on the two sources of bias, three known
confounders, and 100,000 observations. This data is used to derive df_em_om
and can be used to obtain bias parameters for purposes of validating the
simultaneous multi-bias adjustment method with df_em_om. With this source
data, the fitted regression
$\operatorname{logit}(P(Y=1)) = \alpha_0 + \alpha_1 X + \alpha_2 C_1 + \alpha_3 C_2 + \alpha_4 C_3$
shows that the true, unbiased exposure-outcome odds ratio = 2.
df_em_om_source
A dataframe with 100,000 rows and 7 columns:
true exposure, 1 = present and 0 = absent
outcome, 1 = present and 0 = absent
1st confounder, 1 = present and 0 = absent
2nd confounder, 1 = present and 0 = absent
3rd confounder, 1 = present and 0 = absent
misclassified exposure, 1 = present and 0 = absent
misclassified outcome, 1 = present and 0 = absent
Data containing two sources of bias, three known confounders, and
100,000 observations. This data is obtained by sampling with replacement
with probability = S from df_em_sel_source, then removing the columns X and
S. The resulting data corresponds to what a researcher would see in the
real world: a misclassified exposure, Xstar, and missing data for those not
selected into the study (S=0). As seen in df_em_sel_source, the true,
unbiased exposure-outcome odds ratio = 2.
df_em_sel
A dataframe with 100,000 rows and 5 columns:
misclassified exposure, 1 = present and 0 = absent
outcome, 1 = present and 0 = absent
1st confounder, 1 = present and 0 = absent
2nd confounder, 1 = present and 0 = absent
3rd confounder, 1 = present and 0 = absent
df_em_sel
Data with complete information on the two sources of bias, three known
confounders, and 100,000 observations. This data is used to derive df_em_sel
and can be used to obtain bias parameters for purposes of validating the
simultaneous multi-bias adjustment method with df_em_sel. With this source
data, the fitted regression
$\operatorname{logit}(P(Y=1)) = \alpha_0 + \alpha_1 X + \alpha_2 C_1 + \alpha_3 C_2 + \alpha_4 C_3$
shows that the true, unbiased exposure-outcome odds ratio = 2.
df_em_sel_source
A dataframe with 100,000 rows and 7 columns:
true exposure, 1 = present and 0 = absent
outcome, 1 = present and 0 = absent
1st confounder, 1 = present and 0 = absent
2nd confounder, 1 = present and 0 = absent
3rd confounder, 1 = present and 0 = absent
misclassified exposure, 1 = present and 0 = absent
selection, 1 = selected into the study and 0 = not selected into the study
df_em
Data with complete information on one source of bias, three known
confounders, and 100,000 observations. This data is used to derive df_em
and can be used to obtain bias parameters for purposes of validating the
simultaneous multi-bias adjustment method with df_em. With this source
data, the fitted regression
$\operatorname{logit}(P(Y=1)) = \alpha_0 + \alpha_1 X + \alpha_2 C_1 + \alpha_3 C_2 + \alpha_4 C_3$
shows that the true, unbiased exposure-outcome odds ratio = 2.
df_em_source
A dataframe with 100,000 rows and 6 columns:
true exposure, 1 = present and 0 = absent
outcome, 1 = present and 0 = absent
1st confounder, 1 = present and 0 = absent
2nd confounder, 1 = present and 0 = absent
3rd confounder, 1 = present and 0 = absent
misclassified exposure, 1 = present and 0 = absent
Data containing one source of bias, three known confounders, and
100,000 observations. This data is obtained from df_om_source by removing
the column Y. The resulting data corresponds to what a researcher would see
in the real world: a misclassified outcome, Ystar, and no data on the true
outcome. As seen in df_om_source, the true, unbiased exposure-outcome odds
ratio = 2.
df_om
A dataframe with 100,000 rows and 5 columns:
exposure, 1 = present and 0 = absent
misclassified outcome, 1 = present and 0 = absent
1st confounder, 1 = present and 0 = absent
2nd confounder, 1 = present and 0 = absent
3rd confounder, 1 = present and 0 = absent
Data containing two sources of bias, three known confounders, and
100,000 observations. This data is obtained by sampling with replacement
with probability = S from df_om_sel_source, then removing the columns Y and
S. The resulting data corresponds to what a researcher would see in the
real world: a misclassified outcome, Ystar, and missing data for those not
selected into the study (S=0). As seen in df_om_sel_source, the true,
unbiased exposure-outcome odds ratio = 2.
df_om_sel
A dataframe with 100,000 rows and 5 columns:
exposure, 1 = present and 0 = absent
misclassified outcome, 1 = present and 0 = absent
1st confounder, 1 = present and 0 = absent
2nd confounder, 1 = present and 0 = absent
3rd confounder, 1 = present and 0 = absent
df_om_sel
Data with complete information on the two sources of bias, three known
confounders, and 100,000 observations. This data is used to derive df_om_sel
and can be used to obtain bias parameters for purposes of validating the
simultaneous multi-bias adjustment method with df_om_sel. With this source
data, the fitted regression
$\operatorname{logit}(P(Y=1)) = \alpha_0 + \alpha_1 X + \alpha_2 C_1 + \alpha_3 C_2 + \alpha_4 C_3$
shows that the true, unbiased exposure-outcome odds ratio = 2.
df_om_sel_source
A dataframe with 100,000 rows and 7 columns:
exposure, 1 = present and 0 = absent
true outcome, 1 = present and 0 = absent
1st confounder, 1 = present and 0 = absent
2nd confounder, 1 = present and 0 = absent
3rd confounder, 1 = present and 0 = absent
misclassified outcome, 1 = present and 0 = absent
selection, 1 = selected into the study and 0 = not selected into the study
df_om
Data with complete information on one source of bias, three known
confounders, and 100,000 observations. This data is used to derive df_om
and can be used to obtain bias parameters for purposes of validating the
simultaneous multi-bias adjustment method with df_om. With this source
data, the fitted regression
$\operatorname{logit}(P(Y=1)) = \alpha_0 + \alpha_1 X + \alpha_2 C_1 + \alpha_3 C_2 + \alpha_4 C_3$
shows that the true, unbiased exposure-outcome odds ratio = 2.
df_om_source
A dataframe with 100,000 rows and 6 columns:
exposure, 1 = present and 0 = absent
true outcome, 1 = present and 0 = absent
1st confounder, 1 = present and 0 = absent
2nd confounder, 1 = present and 0 = absent
3rd confounder, 1 = present and 0 = absent
misclassified outcome, 1 = present and 0 = absent
Data containing one source of bias, three known confounders, and 100,000
observations. This data is obtained by sampling with replacement with
probability = S from df_sel_source, then removing the S column. The
resulting data corresponds to what a researcher would see in the real
world: missing data for those not selected into the study (S=0). As seen in
df_sel_source, the true, unbiased exposure-outcome odds ratio = 2.
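The derivation described above (resample with probability S, then drop S) can be sketched in base R. The source data below are simulated with made-up parameter values, and reading "with probability = S" as passing the S column to the prob argument of sample() is an assumption about how the package data were built.

```r
set.seed(42)

# Hypothetical source population with a binary selection indicator S.
n <- 10000
src <- data.frame(X = rbinom(n, 1, 0.5), C1 = rbinom(n, 1, 0.4))
src$Y <- rbinom(n, 1, plogis(-2 + log(2) * src$X + 0.4 * src$C1))
src$S <- rbinom(n, 1, plogis(-1 + 0.9 * src$X + 0.9 * src$Y))

# Resample n rows with replacement, using S as the sampling probability,
# so rows with S = 0 can never be drawn; then drop the S column.
idx <- sample(seq_len(n), size = n, replace = TRUE, prob = src$S)
observed <- src[idx, setdiff(names(src), "S")]

nrow(observed)
names(observed)
```

Because selection depends on both X and Y in this simulation, an unadjusted odds ratio estimated from observed would generally differ from the one in src.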
df_sel
A dataframe with 100,000 rows and 5 columns:
exposure, 1 = present and 0 = absent
outcome, 1 = present and 0 = absent
1st confounder, 1 = present and 0 = absent
2nd confounder, 1 = present and 0 = absent
3rd confounder, 1 = present and 0 = absent
df_sel
Data with complete information on study selection, three known
confounders, and 100,000 observations. This data is used to derive df_sel
and can be used to obtain bias parameters for purposes of validating the
simultaneous multi-bias adjustment method with df_sel. With this source
data, the fitted regression
$\operatorname{logit}(P(Y=1)) = \alpha_0 + \alpha_1 X + \alpha_2 C_1 + \alpha_3 C_2 + \alpha_4 C_3$
shows that the true, unbiased exposure-outcome odds ratio = 2.
df_sel_source
A dataframe with 100,000 rows and 6 columns:
true exposure, 1 = present and 0 = absent
outcome, 1 = present and 0 = absent
1st confounder, 1 = present and 0 = absent
2nd confounder, 1 = present and 0 = absent
3rd confounder, 1 = present and 0 = absent
selection, 1 = selected into the study and 0 = not selected into the study
Data containing one source of bias, three known confounders, and
100,000 observations. This data is obtained from df_uc_source by removing
the column U. The resulting data corresponds to what a researcher would see
in the real world: information on known confounders (C1, C2, and C3), but
not on confounder U. As seen in df_uc_source, the true, unbiased
exposure-outcome effect estimate = 2.
df_uc
A dataframe with 100,000 rows and 7 columns:
binary exposure, 1 = present and 0 = absent
continuous exposure
binary outcome corresponding to exposure X_bi, 1 = present and 0 = absent
continuous outcome corresponding to exposure X_cont
1st confounder, 1 = present and 0 = absent
2nd confounder, 1 = present and 0 = absent
3rd confounder, 1 = present and 0 = absent
Data containing two sources of bias, three known confounders, and
100,000 observations. This data is obtained from df_uc_em_source by
removing the columns X and U. The resulting data corresponds to what a
researcher would see in the real world: a misclassified exposure, Xstar,
and missing data on a confounder U. As seen in df_uc_em_source, the true,
unbiased exposure-outcome odds ratio = 2.
df_uc_em
A dataframe with 100,000 rows and 5 columns:
misclassified exposure, 1 = present and 0 = absent
outcome, 1 = present and 0 = absent
1st confounder, 1 = present and 0 = absent
2nd confounder, 1 = present and 0 = absent
3rd confounder, 1 = present and 0 = absent
Data containing three sources of bias, three known confounders, and
100,000 observations. This data is obtained by sampling with replacement
with probability = S from df_uc_em_sel_source, then removing the columns X,
U, and S. The resulting data corresponds to what a researcher would see in
the real world: a misclassified exposure, Xstar; missing data on a
confounder U; and missing data for those not selected into the study (S=0).
As seen in df_uc_em_sel_source, the true, unbiased exposure-outcome odds
ratio = 2.
df_uc_em_sel
A dataframe with 100,000 rows and 5 columns:
misclassified exposure, 1 = present and 0 = absent
outcome, 1 = present and 0 = absent
1st confounder, 1 = present and 0 = absent
2nd confounder, 1 = present and 0 = absent
3rd confounder, 1 = present and 0 = absent
df_uc_em_sel
Data with complete information on the three sources of bias, three known
confounders, and 100,000 observations. This data is used to derive
df_uc_em_sel and can be used to obtain bias parameters for purposes of
validating the simultaneous multi-bias adjustment method with df_uc_em_sel.
With this source data, the fitted regression
$\operatorname{logit}(P(Y=1)) = \alpha_0 + \alpha_1 X + \alpha_2 C_1 + \alpha_3 C_2 + \alpha_4 C_3 + \alpha_5 U$
shows that the true, unbiased exposure-outcome odds ratio = 2.
df_uc_em_sel_source
A dataframe with 100,000 rows and 8 columns:
true exposure, 1 = present and 0 = absent
outcome, 1 = present and 0 = absent
1st confounder, 1 = present and 0 = absent
2nd confounder, 1 = present and 0 = absent
3rd confounder, 1 = present and 0 = absent
unmeasured confounder, 1 = present and 0 = absent
misclassified exposure, 1 = present and 0 = absent
selection, 1 = selected into the study and 0 = not selected into the study
df_uc_em
Data with complete information on the two sources of bias, three known
confounders, and 100,000 observations. This data is used to derive df_uc_em
and can be used to obtain bias parameters for purposes of validating the
simultaneous multi-bias adjustment method with df_uc_em. With this source
data, the fitted regression
$\operatorname{logit}(P(Y=1)) = \alpha_0 + \alpha_1 X + \alpha_2 C_1 + \alpha_3 U$
shows that the true, unbiased exposure-outcome odds ratio = 2.
df_uc_em_source
A dataframe with 100,000 rows and 7 columns:
true exposure, 1 = present and 0 = absent
outcome, 1 = present and 0 = absent
1st confounder, 1 = present and 0 = absent
2nd confounder, 1 = present and 0 = absent
3rd confounder, 1 = present and 0 = absent
unmeasured confounder, 1 = present and 0 = absent
misclassified exposure, 1 = present and 0 = absent
Data containing two sources of bias, three known confounders, and
100,000 observations. This data is obtained from df_uc_om_source by
removing the columns Y and U. The resulting data corresponds to what a
researcher would see in the real world: a misclassified outcome, Ystar, and
missing data on the binary confounder U. As seen in df_uc_om_source, the
true, unbiased exposure-outcome odds ratio = 2.
df_uc_om
A dataframe with 100,000 rows and 5 columns:
exposure, 1 = present and 0 = absent
misclassified outcome, 1 = present and 0 = absent
1st confounder, 1 = present and 0 = absent
2nd confounder, 1 = present and 0 = absent
3rd confounder, 1 = present and 0 = absent
Data containing three sources of bias, three known confounders, and
100,000 observations. This data is obtained by sampling with replacement
with probability = S from df_uc_om_sel_source, then removing the columns Y,
U, and S. The resulting data corresponds to what a researcher would see in
the real world: a misclassified outcome, Ystar; missing data on a
confounder U; and missing data for those not selected into the study (S=0).
As seen in df_uc_om_sel_source, the true, unbiased exposure-outcome odds
ratio = 2.
df_uc_om_sel
A dataframe with 100,000 rows and 5 columns:
exposure, 1 = present and 0 = absent
misclassified outcome, 1 = present and 0 = absent
1st confounder, 1 = present and 0 = absent
2nd confounder, 1 = present and 0 = absent
3rd confounder, 1 = present and 0 = absent
df_uc_om_sel
Data with complete information on the three sources of bias, three known
confounders, and 100,000 observations. This data is used to derive
df_uc_om_sel and can be used to obtain bias parameters for purposes of
validating the simultaneous multi-bias adjustment method with df_uc_om_sel.
With this source data, the fitted regression
$\operatorname{logit}(P(Y=1)) = \alpha_0 + \alpha_1 X + \alpha_2 C_1 + \alpha_3 C_2 + \alpha_4 C_3 + \alpha_5 U$
shows that the true, unbiased exposure-outcome odds ratio = 2.
df_uc_om_sel_source
A dataframe with 100,000 rows and 8 columns:
exposure, 1 = present and 0 = absent
true outcome, 1 = present and 0 = absent
1st confounder, 1 = present and 0 = absent
2nd confounder, 1 = present and 0 = absent
3rd confounder, 1 = present and 0 = absent
unmeasured confounder, 1 = present and 0 = absent
misclassified outcome, 1 = present and 0 = absent
selection, 1 = selected into the study and 0 = not selected into the study
df_uc_om
Data with complete information on the two sources of bias, three known
confounders, and 100,000 observations. This data is used to derive df_uc_om
and can be used to obtain bias parameters for purposes of validating the
simultaneous multi-bias adjustment method with df_uc_om. With this source
data, the fitted regression
$\operatorname{logit}(P(Y=1)) = \alpha_0 + \alpha_1 X + \alpha_2 C_1 + \alpha_3 U$
shows that the true, unbiased exposure-outcome odds ratio = 2.
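One way to obtain bias parameters from a source dataframe is to fit the bias models with glm() and extract the coefficients. The sketch below uses simulated data with made-up parameter values as a stand-in for the source dataframe, with columns named X, Y, C1, U, and Ystar as in the documentation. The model formulas mirror the u_model_coefs and y_model_coefs definitions for adjust_uc_om with one confounder; treat this as an illustration rather than the package's prescribed workflow.

```r
set.seed(2024)

# Simulated stand-in for a source dataframe (made-up parameter values).
n <- 20000
src <- data.frame(C1 = rbinom(n, 1, 0.5), U = rbinom(n, 1, 0.5))
src$X <- rbinom(n, 1, plogis(-0.5 + 0.5 * src$C1 + 0.6 * src$U))
src$Y <- rbinom(
  n, 1, plogis(-2.5 + log(2) * src$X + 0.4 * src$C1 + 0.7 * src$U)
)
src$Ystar <- rbinom(n, 1, ifelse(src$Y == 1, 0.85, 0.05))

# Bias model for U: logit(P(U=1)) = a0 + a1*X + a2*Y
u_model <- glm(U ~ X + Y, family = binomial(link = "logit"), data = src)

# Bias model for Y: logit(P(Y=1)) = d0 + d1*X + d2*Ystar + d3*C1
y_model <- glm(Y ~ X + Ystar + C1, family = binomial(link = "logit"),
               data = src)

u_model_coefs <- unname(coef(u_model))  # 3 parameters
y_model_coefs <- unname(coef(y_model))  # 3 + j parameters, here j = 1
u_model_coefs
y_model_coefs
```

The resulting vectors have the lengths that adjust_uc_om expects for one measured confounder, and the order of terms in each formula matches the order of coefficients in the model definitions.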
df_uc_om_source
A dataframe with 100,000 rows and 7 columns:
exposure, 1 = present and 0 = absent
outcome, 1 = present and 0 = absent
1st confounder, 1 = present and 0 = absent
2nd confounder, 1 = present and 0 = absent
3rd confounder, 1 = present and 0 = absent
unmeasured confounder, 1 = present and 0 = absent
misclassified outcome, 1 = present and 0 = absent
Data containing two sources of bias, three known confounders, and 100,000
observations. This data is obtained by sampling with replacement with
probability = S from df_uc_sel_source, then removing the columns U and S.
The resulting data corresponds to what a researcher would see in the real
world: missing data on confounder U and missing data for those not selected
into the study (S=0). As seen in df_uc_sel_source, the true, unbiased
exposure-outcome odds ratio = 2.
df_uc_sel
A dataframe with 100,000 rows and 5 columns:
exposure, 1 = present and 0 = absent
outcome, 1 = present and 0 = absent
1st confounder, 1 = present and 0 = absent
2nd confounder, 1 = present and 0 = absent
3rd confounder, 1 = present and 0 = absent
df_uc_sel
Data with complete information on the two sources of bias, three known
confounders, and 100,000 observations. This data is used to derive df_uc_sel
and can be used to obtain bias parameters for purposes of validating the
simultaneous multi-bias adjustment method with df_uc_sel. With this source
data, the fitted regression
$\operatorname{logit}(P(Y=1)) = \alpha_0 + \alpha_1 X + \alpha_2 C_1 + \alpha_3 C_2 + \alpha_4 C_3 + \alpha_5 U$
shows that the true, unbiased exposure-outcome odds ratio = 2.
df_uc_sel_source
A dataframe with 100,000 rows and 7 columns:
true exposure, 1 = present and 0 = absent
outcome, 1 = present and 0 = absent
1st confounder, 1 = present and 0 = absent
2nd confounder, 1 = present and 0 = absent
3rd confounder, 1 = present and 0 = absent
unmeasured confounder, 1 = present and 0 = absent
selection, 1 = selected into the study and 0 = not selected into the study
df_uc
Data with complete information on one source of bias, three known
confounders, and 100,000 observations. This data is used to derive df_uc
and can be used to obtain bias parameters for purposes of validating the
simultaneous multi-bias adjustment method with df_uc. With this source
data, the fitted regression
$g(P(Y=1)) = \alpha_0 + \alpha_1 X + \alpha_2 C_1 + \alpha_3 C_2 + \alpha_4 C_3 + \alpha_5 U$
shows that the true, unbiased exposure-outcome effect estimate = 2 when:
g = logit, Y = Y_bi, and X = X_bi, or
g = identity, Y = Y_cont, and X = X_cont.
df_uc_source
A dataframe with 100,000 rows and 8 columns:
binary exposure, 1 = present and 0 = absent
continuous exposure
binary outcome corresponding to exposure X_bi, 1 = present and 0 = absent
continuous outcome corresponding to exposure X_cont
1st confounder, 1 = present and 0 = absent
2nd confounder, 1 = present and 0 = absent
3rd confounder, 1 = present and 0 = absent
uncontrolled confounder, 1 = present and 0 = absent
Data from a cohort study in which white males in Evans County were followed for 7 years, with coronary heart disease as the outcome of interest.
evans
A dataframe with 609 rows and 9 columns:
subject identification
outcome variable; 1 = coronary heart disease
age (in years)
cholesterol, mg/dl
1 = subject has ever smoked
1 = presence of electrocardiogram abnormality
diastolic blood pressure, mmHg
systolic blood pressure, mmHg
1 = SBP greater than or equal to 160 or DBP greater than or equal to 95
http://web1.sph.emory.edu/dkleinb/logreg3.htm#data