---
title: "Estimating GPS"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{estimating_gps}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

The propensity score (PS) is the conditional probability of assignment to a 
particular treatment given a vector of observed covariates [@rosenbaum_1983]. 
@hirano_2004 extended the idea to studies with continuous treatment (or exposure) 
and labeled it as the generalized propensity score (GPS), which is a probability 
density function. In this package, we use either a parametric model (a standard 
linear regression model) or a non-parametric model (a flexible machine learning 
model) to train the GPS model as a density estimation procedure [@kennedy_2017]. 
After the model training, we can estimate GPS values based on the model prediction. 
The machine learning models are developed using the SuperLearner Package [@superlearner_2007]. 
For more details on the problem framework and assumptions, please see @wu_2018.

<!-- [TBD: Data flow and mathematical equations] -->

Whether the prediction models' performance should be considered the primary 
parameter in the training of the prediction model is an open research question. 
In this package, the users have complete control over the hyperparameters, which 
can fine-tune the prediction models to achieve different performance levels. 


## Available SuperLearner Libraries

The users can use any library in the SuperLearner package. However, in order to 
have control on internal libraries we generate customized wrappers. The following 
table represents the available customized wrappers as well as hyperparameters.

| Package name | `sl_lib` name | prefix| available hyperparameters |
|:------------:|:-------------:|:-----:|:-------------------------:|
| [XGBoost](https://xgboost.readthedocs.io/en/latest/index.html)| `m_xgboost` | `xgb_`|  nrounds, eta, max_depth, min_child_weight, verbose |
| [ranger](https://cran.r-project.org/package=ranger) |`m_ranger`| `rgr_` | num.trees, write.forest, replace, verbose, family |


## Implementation

Both `XGBoost` and `ranger` libraries are developed for efficient processing on 
multiple cores. The only requirement is making sure that OpenMP is installed on 
the system. User needs to pass the number of threads (`nthread`) in running the 
`estimate_gps` function.

In the following section, we conduct several analyses to test the scalability and
performance. These analyses can be used to have a rough estimate of what to expect
in different data sizes and computational resources.


## References