---
title: "Estimating GPS"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{estimating_gps}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

The propensity score (PS) is the conditional probability of assignment to a particular treatment given a vector of observed covariates [@rosenbaum_1983]. @hirano_2004 extended the idea to studies with a continuous treatment (or exposure) and called it the generalized propensity score (GPS), which is the conditional probability density of the treatment given the covariates. In this package, we train the GPS model as a density estimation procedure [@kennedy_2017], using either a parametric model (a standard linear regression model) or a non-parametric model (a flexible machine learning model). After training, we estimate GPS values from the model predictions. The machine learning models are built with the SuperLearner package [@superlearner_2007]. For more details on the problem framework and assumptions, please see @wu_2018.

Whether predictive performance should be the primary criterion when training the prediction model is an open research question. In this package, users have complete control over the hyperparameters and can tune the prediction models to reach different performance levels.

## Available SuperLearner Libraries

Users can use any library in the SuperLearner package. However, to keep control over the internal libraries, we provide customized wrappers. The following table lists the available customized wrappers and their hyperparameters.


| Package name | `sl_lib` name | Prefix | Available hyperparameters |
|:------------:|:-------------:|:------:|:-------------------------:|
| [XGBoost](https://xgboost.readthedocs.io/en/latest/index.html) | `m_xgboost` | `xgb_` | nrounds, eta, max_depth, min_child_weight, verbose |
| [ranger](https://cran.r-project.org/package=ranger) | `m_ranger` | `rgr_` | num.trees, write.forest, replace, verbose, family |

## Implementation

Both the `XGBoost` and `ranger` libraries are designed for efficient processing on multiple cores. The only requirement is that OpenMP is installed on the system. Users need to pass the number of threads (`nthread`) when running the `estimate_gps` function.

In the following section, we conduct several analyses to test scalability and performance. These analyses give a rough estimate of what to expect for different data sizes and computational resources.

## References