---
title: "CRE"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{CRE}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Installation

Installing from CRAN.

```{r, eval=FALSE}
install.packages("CRE")
```

Installing the latest development version.

```{r, eval=FALSE}
library(devtools)
install_github("NSAPH-Software/CRE", ref = "develop")
```

Import.

```{r, eval=FALSE}
library("CRE")
```

# Arguments

__Data (required)__

**`y`** The observed response/outcome vector (binary or continuous).

**`z`** The treatment/exposure/policy vector (binary).

**`X`** The covariate matrix (binary or continuous).

__Parameters (not required)__

**`method_params`** The list of parameters to define the models used, including:

- **`ratio_dis`** The ratio of data delegated to the discovery sub-sample (default: 0.5).
- **`ite_method`** The method to estimate the individual treatment effect (default: "aipw") [1].
- **`learner_ps`** The [SuperLearner](https://CRAN.R-project.org/package=SuperLearner) model for propensity score estimation (default: "SL.xgboost"; used only for the "aipw", "bart", and "cf" ITE estimators).
- **`learner_y`** The [SuperLearner](https://CRAN.R-project.org/package=SuperLearner) model for outcome estimation (default: "SL.xgboost"; used only for the "aipw", "slearner", "tlearner", and "xlearner" ITE estimators).

**`hyper_params`** The list of hyperparameters to fine-tune the method, including:

- **`intervention_vars`** Intervention-able variables used for rule generation (default: `NULL`).
- **`ntrees`** The number of decision trees for random forest (default: 20).
- **`node_size`** Minimum size of the trees' terminal nodes (default: 20).
- **`max_rules`** Maximum number of candidate decision rules (default: 50).
- **`max_depth`** Maximum rule length (default: 3).
- **`t_decay`** The decay threshold for rule pruning (default: 0.025).
- **`t_ext`** The threshold to define too generic or too specific (extreme) rules (default: 0.01).
- **`t_corr`** The threshold to define correlated rules (default: 1).
- **`stability_selection`** The stability selection method used to select the rules: `vanilla` for plain stability selection, `error_control` for stability selection with error control, and `no` for no stability selection (default: `vanilla`).
- **`B`** Number of bootstrap samples for stability selection in rule selection and uncertainty quantification in estimation (default: 20).
- **`subsample`** Bootstrap subsample ratio for stability selection in rule selection and uncertainty quantification in estimation (default: 0.5).
- **`offset`** Name of the covariate to use as offset (e.g., "x1") for T-Poisson ITE estimation. `NULL` if not used (default: `NULL`).
- **`cutoff`** Threshold defining the minimum cutoff value for the stability scores in stability selection (default: 0.9).
- **`pfer`** Upper bound for the per-family error rate (tolerated amount of falsely selected rules) in error-control stability selection (default: 1).

__Additional Estimates (not required)__

**`ite`** The estimated ITE vector. If given, the ITE estimation steps in both discovery and inference are skipped (default: `NULL`).

## Notes

### Options for the ITE estimation

**[1]** Options for the ITE estimation are as follows:

- [S-Learner](https://CRAN.R-project.org/package=SuperLearner) (`slearner`)
- [T-Learner](https://CRAN.R-project.org/package=SuperLearner) (`tlearner`)
- T-Poisson (`tpoisson`)
- [X-Learner](https://CRAN.R-project.org/package=SuperLearner) (`xlearner`)
- [Augmented Inverse Probability Weighting](https://CRAN.R-project.org/package=SuperLearner) (`aipw`)
- [Causal Forests](https://CRAN.R-project.org/package=grf) (`cf`)
- [Causal Bayesian Additive Regression Trees](https://CRAN.R-project.org/package=bartCause) (`bart`)

If other estimates of the ITE are provided through the additional `ite` argument, the ITE estimation steps in both discovery and inference are skipped and the provided estimates are used instead.

The ITE estimator also requires an outcome learner and/or a propensity score learner from the [SuperLearner](https://CRAN.R-project.org/package=SuperLearner) package (e.g., "SL.lm", "SL.svm"). Both of these models are simple classifiers/regressors. By default, the XGBoost algorithm is used for both steps.

### Customized wrapper for SuperLearner

One can create a customized wrapper for the packages that SuperLearner uses internally. The following example sets the number of cores (e.g., 12) for the xgboost package on a shared-memory system.

```R
m_xgboost <- function(nthread = 12, ...) {
  SuperLearner::SL.xgboost(nthread = nthread, ...)
}
```

Then use "m_xgboost" instead of "SL.xgboost".
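As a sketch of how the wrapper above might be plugged in, the list below reuses the `method_params` fields described under Arguments and swaps both learner names for "m_xgboost". The value `nthread = 12` is only the illustrative core count from the text, and the final `cre()` call is left commented out because it assumes that `y`, `z`, and `X` already exist and that CRE, SuperLearner, and xgboost are installed.

```R
# Custom wrapper: same learner as SL.xgboost, but with a fixed thread count.
# Defined at the top level so that it can be looked up by name later.
m_xgboost <- function(nthread = 12, ...) {
  SuperLearner::SL.xgboost(nthread = nthread, ...)
}

# Refer to the wrapper by its name (a string), as for the built-in learners.
method_params <- list(ratio_dis = 0.5,
                      ite_method = "aipw",
                      learner_ps = "m_xgboost",
                      learner_y = "m_xgboost")

# Assumes y, z, and X are already defined (see the Examples section):
# cre_results <- cre(y, z, X, method_params)
```

Only the learner names change; the remaining entries keep the default values listed above.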
# Examples

Example 1 (*default parameters*)

```R
set.seed(9687)
# Generate a synthetic data set
dataset <- generate_cre_dataset(n = 1000,
                                rho = 0,
                                n_rules = 2,
                                p = 10,
                                effect_size = 2,
                                binary_covariates = TRUE,
                                binary_outcome = FALSE,
                                confounding = "no")
y <- dataset[["y"]]
z <- dataset[["z"]]
X <- dataset[["X"]]

# Run CRE with default parameters
cre_results <- cre(y, z, X)
summary(cre_results)
plot(cre_results)
ite_pred <- predict(cre_results, X)
```

Example 2 (*personalized ITE estimation*)

```R
set.seed(9687)
dataset <- generate_cre_dataset(n = 1000,
                                rho = 0,
                                n_rules = 2,
                                p = 10,
                                effect_size = 2,
                                binary_covariates = TRUE,
                                binary_outcome = FALSE,
                                confounding = "no")
y <- dataset[["y"]]
z <- dataset[["z"]]
X <- dataset[["X"]]

ite_pred <- ... # personalized ite estimation
# Passing `ite` skips the internal ITE estimation steps
cre_results <- cre(y, z, X, ite = ite_pred)
summary(cre_results)
plot(cre_results)
ite_pred <- predict(cre_results, X)
```

Example 3 (*setting parameters*)

```R
set.seed(9687)
dataset <- generate_cre_dataset(n = 1000,
                                rho = 0,
                                n_rules = 2,
                                p = 10,
                                effect_size = 2,
                                binary_covariates = TRUE,
                                binary_outcome = FALSE,
                                confounding = "no")
y <- dataset[["y"]]
z <- dataset[["z"]]
X <- dataset[["X"]]

# Models used for ITE estimation
method_params <- list(ratio_dis = 0.5,
                      ite_method = "aipw",
                      learner_ps = "SL.xgboost",
                      learner_y = "SL.xgboost")

# Hyperparameters for rule generation and selection
hyper_params <- list(intervention_vars = c("x1", "x2", "x3", "x4"),
                     offset = NULL,
                     ntrees = 20,
                     node_size = 20,
                     max_rules = 50,
                     max_depth = 3,
                     t_decay = 0.025,
                     t_ext = 0.025,
                     t_corr = 1,
                     stability_selection = "vanilla",
                     cutoff = 0.8,
                     pfer = 1,
                     B = 10,
                     subsample = 0.5)

cre_results <- cre(y, z, X, method_params, hyper_params)
summary(cre_results)
plot(cre_results)
ite_pred <- predict(cre_results, X)
```

More synthetic data sets can be generated using `generate_cre_dataset()`.
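One possible way to produce several synthetic data sets is to loop over a small grid of settings. The sketch below varies only `n` and `effect_size` and keeps the other arguments at the values used in the examples above. The grid values are arbitrary choices for illustration, and the generation step runs only if the CRE package is installed.

```R
# Illustrative grid of settings (values are arbitrary, not recommendations).
settings <- expand.grid(n = c(500, 1000), effect_size = c(1, 2))

datasets <- list()
if (requireNamespace("CRE", quietly = TRUE)) {
  set.seed(9687)
  # Generate one synthetic data set per row of the grid.
  datasets <- lapply(seq_len(nrow(settings)), function(i) {
    CRE::generate_cre_dataset(n = settings$n[i],
                              rho = 0,
                              n_rules = 2,
                              p = 10,
                              effect_size = settings$effect_size[i],
                              binary_covariates = TRUE,
                              binary_outcome = FALSE,
                              confounding = "no")
  })
}
```

Each element of `datasets` can then be unpacked into `y`, `z`, and `X` in the same way as in the examples.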