The pseudo population dataset is computed based on a user-defined causal inference approach (e.g., matching or weighting). A covariate balance test is then performed on the pseudo population dataset. Users can specify covariate balance criteria, activate an adaptive approach, and set the number of attempts to search for a target pseudo population dataset that meets the covariate balance criteria.
Input parameters:
w
A data.frame of observed continuous exposure, including id and w columns.
c
A data.frame or matrix of observed baseline covariates, also including an id column.
ci_appr
The causal inference approach. Options are “matching” and “weighting”.
dist_measure
The distance measuring function.
scale
A specified scale parameter to control the relative weight that is attributed to the distance measures of the exposure versus the GPS estimates.
delta_n
A specified caliper parameter on the exposure.
covar_bl_method
A specified covariate balance method.
covar_bl_trs
A specified covariate balance threshold.
max_attempt
The maximum number of attempts to satisfy covariate balance.
The matching algorithm aims to match an observed unit j to each j′ at each exposure level w(l).
We specify delta_n (δn), a caliper for any exposure level w, which constitutes equally sized bins, i.e., [w − δn, w + δn]. Based on the caliper delta_n, we define a predetermined set of L exposure levels {w(1) = min(w) + δn, w(2) = min(w) + 3δn, ..., w(L) = min(w) + (2L − 1)δn}, where $L = \lfloor \frac{\max(w)-\min(w)}{2\delta_n} + \frac{1}{2} \rfloor$. Each exposure level w(l) is the midpoint of an equally sized bin, [w(l) − δn, w(l) + δn].
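The construction of the exposure levels can be illustrated with a short sketch. This is plain Python written for illustration only, not the package's implementation; the function name `exposure_levels` is hypothetical, and only the formulas for L and w(l) come from the text above.

```python
import math

def exposure_levels(w, delta_n):
    """Return the bin midpoints w(1), ..., w(L) for a caliper delta_n.

    Hypothetical helper for illustration; not part of the package.
    """
    w_min, w_max = min(w), max(w)
    # L = floor((max(w) - min(w)) / (2 * delta_n) + 1/2)
    n_levels = math.floor((w_max - w_min) / (2 * delta_n) + 0.5)
    # w(l) = min(w) + (2l - 1) * delta_n for l = 1, ..., L
    return [w_min + (2 * l - 1) * delta_n for l in range(1, n_levels + 1)]

# Exposures spanning [0, 10] with delta_n = 1 give L = floor(5.5) = 5 bins.
levels = exposure_levels([0.0, 2.5, 7.0, 10.0], delta_n=1.0)
print(levels)  # [1.0, 3.0, 5.0, 7.0, 9.0]
```

Each returned value is the midpoint of a bin of width 2δn, so consecutive levels are 2δn apart and the bins do not overlap except at their endpoints.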
We implement a nested-loop algorithm, with l in 1, 2, …, L as the outer loop and j′ in 1, …, N as the inner loop. The algorithm outputs the final product of our design stage, i.e., a matched set with N × L units.
for l = 1, 2, …, L do
Choose one exposure level of interest w(l) ∈ {w(1), w(2), ..., w(L)}.
for j′ = 1, …, N do
2.1 Evaluate the GPS ê(w(l), cj′) (for short, ej′(l)) at w(l), based on the fitted GPS model in Step 1, for each unit j′ having observed covariates cj′.
2.2 Implement the matching to find an observed unit, denoted by j, that is matched with j′ with respect to both the exposure wj ≈ w(l) and the estimated GPS ê(wj, cj) ≈ ej′(l) (under a standardized Euclidean transformation). More specifically, we find j as
$$
j_{gps}(e^{(l)}_{j'}, w^{(l)}) = \underset{j:\, w_j \in [w^{(l)}-\delta_n,\, w^{(l)}+\delta_n]}{\arg\min} \ \left\| \left( \lambda \hat{e}^{*}(w_j,\mathbf{c}_j),\ (1-\lambda) w^{*}_j \right) - \left( \lambda e_{j'}^{(l)*},\ (1-\lambda) w^{(l)*} \right) \right\|,
$$
where dist_measure (||.||) is a pre-specified two-dimensional metric, scale (λ) is the scale parameter assigning weights to the corresponding two dimensions (i.e., the GPS and the exposure), and δn is the caliper defined in Step 2, which allows only a unit j with an observed exposure wj ∈ [w(l) − δn, w(l) + δn] to be matched.
2.3 Impute Yj′(w(l)) as: $\hat{Y}_{j'}(w^{(l)}) = Y^{obs}_{j_{gps}(e^{(l)}_{j'}, w^{(l)})}$.
end for
end for
After implementing the matching algorithm, we construct the matched set with N × L units by combining all Ŷj′(w(l)) for j′ = 1, …, N and for all w(l) ∈ {w(1), w(2), ..., w(L)}.
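The nested loop above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the package's code: `match_units`, `standardize`, and the callable `gps_at` are hypothetical names, the "*" transformation is assumed to be min-max scaling, and the metric is assumed to be Euclidean.

```python
import math

def standardize(x, lo, hi):
    """Min-max scale x; an assumed stand-in for the '*' transformation."""
    return (x - lo) / (hi - lo) if hi > lo else 0.0

def match_units(w, gps_obs, gps_at, levels, delta_n, scale):
    """Return (l, j_prime, j) triples, one per unit j_prime and level l.

    w       : observed exposures w_j
    gps_obs : estimated GPS at the observed exposure, e_hat(w_j, c_j)
    gps_at  : hypothetical callable (j_prime, level) -> e_hat(level, c_j')
    j is None when no observed unit falls inside the caliper bin.
    """
    w_lo, w_hi = min(w), max(w)
    e_lo, e_hi = min(gps_obs), max(gps_obs)
    matched = []
    for l, level in enumerate(levels, start=1):           # outer loop over w(l)
        # Only units with w_j in [w(l) - delta_n, w(l) + delta_n] are eligible.
        candidates = [j for j, wj in enumerate(w) if abs(wj - level) <= delta_n]
        for j_prime in range(len(w)):                     # inner loop over units
            e_target = gps_at(j_prime, level)             # step 2.1
            best_j, best_d = None, math.inf
            for j in candidates:                          # step 2.2
                d_gps = scale * (standardize(gps_obs[j], e_lo, e_hi)
                                 - standardize(e_target, e_lo, e_hi))
                d_w = (1 - scale) * (standardize(w[j], w_lo, w_hi)
                                     - standardize(level, w_lo, w_hi))
                d = math.hypot(d_gps, d_w)                # assumed Euclidean metric
                if d < best_d:
                    best_j, best_d = j, d
            matched.append((l, j_prime, best_j))
    return matched

# Toy data: five units, two exposure levels, and a constant counterfactual GPS.
w = [0.0, 1.0, 2.0, 3.0, 4.0]
gps_obs = [0.2, 0.4, 0.6, 0.4, 0.2]
matched = match_units(w, gps_obs, lambda j_prime, level: 0.4,
                      levels=[1.0, 3.0], delta_n=1.0, scale=0.5)
```

Step 2.3 then amounts to looking up the observed outcome of the matched unit, for example `y[j]` for each triple whose `j` is not `None`. The result has N × L entries, matching the size of the matched set described above.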
We introduce the absolute correlation measure (covar_bl_method = “absolute”) to assess covariate balance for continuous exposures. The absolute correlation between the exposure and each pre-exposure covariate is a global measure and can inform whether the whole matched set is balanced. This measure builds upon the work of Austin (2019), who examines covariate balance conditions with continuous exposures. We adapt it to the proposed matching framework.
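In a balanced pseudo population dataset, the correlations between the exposure and pre-exposure covariates should be close to zero, that is, E[ci*wi*] ≈ 0. We calculate the absolute correlation in the pseudo population dataset as the absolute value of the correlation between the exposure and each covariate, computed over the units of the pseudo population dataset.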
The average absolute correlation is defined as the average of the absolute correlations across all covariates. We specify a threshold covar_bl_trs (ϵ1), for example 0.1, on the average absolute correlation as the criterion for covariate balance in the pseudo population dataset.
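The balance check described above can be sketched as follows. This is an illustrative, unweighted version in plain Python, not the package's implementation: the function names are hypothetical, the strict inequality against the threshold is an assumption, and a pseudo population built by weighting would need a weighted correlation instead.

```python
import math

def abs_correlation(x, w):
    """Absolute Pearson correlation between a covariate x and the exposure w."""
    n = len(x)
    mean_x, mean_w = sum(x) / n, sum(w) / n
    cov = sum((a - mean_x) * (b - mean_w) for a, b in zip(x, w))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_w = sum((b - mean_w) ** 2 for b in w)
    return abs(cov / math.sqrt(var_x * var_w))

def is_balanced(covariates, w, covar_bl_trs=0.1):
    """Return (average absolute correlation, whether it is below the threshold).

    covariates : dict mapping a covariate name to its list of values
    """
    abs_corrs = [abs_correlation(col, w) for col in covariates.values()]
    mean_abs = sum(abs_corrs) / len(abs_corrs)
    # Assumption: balance is met when the average is strictly below the threshold.
    return mean_abs, mean_abs < covar_bl_trs

# Toy pseudo population: c1 is uncorrelated with w, c2 is perfectly correlated.
w = [1.0, 2.0, 3.0, 4.0]
covariates = {"c1": [1.0, -1.0, -1.0, 1.0], "c2": [2.0, 4.0, 6.0, 8.0]}
mean_abs, balanced = is_balanced(covariates, w, covar_bl_trs=0.1)
```

Here the average of the two absolute correlations (0 and 1) is well above 0.1, so this toy dataset would not pass the check, whereas c1 on its own would.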