Generate Synthetic Data

We provide the gen_syn_data function to generate synthetic data for the CausalGPS package.

Usage

Input parameters:

sample_size Number of data samples

seed The seed of R’s random number generator

outcome_sd Standard deviation used to generate the outcome

gps_spec A numerical value (1-7) that indicates the GPS model used to generate synthetic data. See the following section for more details.

cova_spec A numerical value (1-2) to modify the covariates. See the code for more details.
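A minimal sketch of a call using the parameters listed above. The argument values are illustrative assumptions, not documented defaults, and the call is guarded because it assumes the installed CausalGPS version exports the function under the name gen_syn_data.

```r
# Illustrative argument values (assumptions, not taken from the package docs).
gen_args <- list(
  sample_size = 1000,  # number of data samples
  seed        = 300,   # seed of R's random number generator
  outcome_sd  = 10,    # standard deviation used to generate the outcome
  gps_spec    = 1,     # GPS model specification
  cova_spec   = 1      # covariate modification
)

# Run only if CausalGPS is installed and exports gen_syn_data under that name.
if (requireNamespace("CausalGPS", quietly = TRUE) &&
    "gen_syn_data" %in% getNamespaceExports("CausalGPS")) {
  syn_data <- do.call(CausalGPS::gen_syn_data, gen_args)
  str(syn_data)  # inspect the structure of the returned object
}
```

Passing the arguments by name keeps the call readable and makes it obvious which parameter each value belongs to.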

Technical Details for Data Generating Process

We generate six confounders (C₁, C₂, ..., C₆), a mix of continuous and categorical variables, and then generate the exposure W from one of six specifications of the generalized propensity score (GPS) model:

$W = 9 \{-0.8+ (0.1,0.1,-0.1,0.2,0.1,0.1) \boldsymbol{C}\} + 17 + N(0,5)$
$W = 15 \{-0.8+ (0.1,0.1,-0.1,0.2,0.1,0.1) \boldsymbol{C}\} + 22 + T(2)$
$W = 9 \{-0.8+ (0.1,0.1,-0.1,0.2,0.1,0.1) \boldsymbol{C}\} + \frac{3}{2} C_3^2 + 15 + N(0,5)$
$W = \frac{49 \exp(\{-0.8+ (0.1,0.1,-0.1,0.2,0.1,0.1) \boldsymbol{C}\})}{1+ \exp(\{-0.8+ (0.1,0.1,-0.1,0.2,0.1,0.1) \boldsymbol{C}\})} -6 + N(0,5)$
$W = \frac{42}{1+ \exp(\{-0.8+ (0.1,0.1,-0.1,0.2,0.1,0.1) \boldsymbol{C}\})} - 18 + N(0,5)$
$W = 7 \log(\{-0.8+ (0.1,0.1,-0.1,0.2,0.1,0.1) \boldsymbol{C}\}) + 13 + N(0,4)$
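The specifications above can be sketched in base R without the package. This is an illustration under stated assumptions, not the package's implementation: the confounder distributions are hypothetical (the text only says they mix continuous and categorical variables), the specifications are numbered in the order listed, N(0, 5) is read as a normal with standard deviation 5, and T(2) as a Student t with 2 degrees of freedom. The sixth (logarithmic) specification is left out for brevity.

```r
set.seed(1)
n <- 500

# ASSUMPTION: four standard normal, one categorical on {-2, ..., 2},
# and one uniform confounder. The text does not specify these distributions.
cf <- cbind(
  matrix(rnorm(n * 4), ncol = 4),
  sample(-2:2, n, replace = TRUE),
  runif(n, min = -3, max = 3)
)

# Shared linear predictor: -0.8 + (0.1, 0.1, -0.1, 0.2, 0.1, 0.1) C
lin_pred <- function(cf) {
  as.vector(-0.8 + cf %*% c(0.1, 0.1, -0.1, 0.2, 0.1, 0.1))
}

# Deterministic part of W for specifications 1 to 5 (numbered in listed order).
gps_mean <- function(cf, spec) {
  lp <- lin_pred(cf)
  switch(as.character(spec),
    "1" = 9 * lp + 17,
    "2" = 15 * lp + 22,
    "3" = 9 * lp + 1.5 * cf[, 3]^2 + 15,
    "4" = 49 * exp(lp) / (1 + exp(lp)) - 6,
    "5" = 42 / (1 + exp(lp)) - 18,
    stop("this sketch covers specifications 1 to 5 only")
  )
}

# Add the noise term: t(2) for specification 2, normal with sd = 5 otherwise.
gen_w <- function(cf, spec) {
  n_obs <- nrow(cf)
  noise <- if (spec == 2) rt(n_obs, df = 2) else rnorm(n_obs, sd = 5)
  gps_mean(cf, spec) + noise
}

w <- gen_w(cf, spec = 1)
summary(w)
```

Separating the deterministic part (gps_mean) from the noise makes each specification easy to check against its formula.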

We generate Y from an outcome model that is assumed to be a cubic function of W, with additive terms for the confounders and interactions between W and the confounders C. The standard deviation sd is set by the outcome_sd parameter:

$Y \mid W, \boldsymbol{C} \sim N\{\mu(W, \boldsymbol{C}), sd^2\}$

$\mu(W, \boldsymbol{C}) = -10 - (2, 2, 3, -1, 2, 2) \boldsymbol{C} - W(0.1 - 0.1 C_1 + 0.1 C_4 + 0.1 C_5 + 0.1 C_3^2) + 0.13^2 W^3$
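The outcome model can be sketched in base R as follows. As before, this is an illustration rather than the package's code: the confounder distributions and the value of outcome_sd are assumptions, and W is drawn from the first listed GPS specification with N(0, 5) read as standard deviation 5.

```r
set.seed(2)
n <- 500
outcome_sd <- 10  # illustrative value for the outcome standard deviation

# ASSUMPTION: hypothetical confounder distributions (not given in the text).
cf <- cbind(
  matrix(rnorm(n * 4), ncol = 4),
  sample(-2:2, n, replace = TRUE),
  runif(n, min = -3, max = 3)
)
colnames(cf) <- paste0("C", 1:6)

# Exposure from the first listed GPS specification.
w <- as.vector(9 * (-0.8 + cf %*% c(0.1, 0.1, -0.1, 0.2, 0.1, 0.1)) + 17) +
  rnorm(n, sd = 5)

# Mean outcome mu(W, C): cubic in W, additive in C, with W-by-C interactions.
mu_fun <- function(w, cf) {
  lin <- as.vector(cf %*% c(2, 2, 3, -1, 2, 2))
  -10 - lin -
    w * (0.1 - 0.1 * cf[, 1] + 0.1 * cf[, 4] + 0.1 * cf[, 5] +
           0.1 * cf[, 3]^2) +
    0.13^2 * w^3
}

# Draw Y | W, C from a normal distribution centered at mu(W, C).
y <- rnorm(n, mean = mu_fun(w, cf), sd = outcome_sd)

syn <- data.frame(Y = y, W = w, cf)
head(syn)
```

Keeping mu_fun as a standalone function makes it possible to evaluate the true mean outcome at any exposure level, which is useful when comparing an estimated exposure-response curve against the known data generating process.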
