--- title: "Continual Reassessment Method" output: rmarkdown::html_vignette: df_print: tibble bibliography: library.bib vignette: > %\VignetteIndexEntry{Continual Reassessment Method} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- The `escalation` package by Kristian Brock. Documentation is hosted at https://brockk.github.io/escalation/ ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 5 ) options(rmarkdown.html_vignette.check_title = FALSE) ``` # Introduction The continual reassessment method (CRM) was introduced by @OQuigley1990. It has proved to be a truly seminal dose-finding design, spurring many revisions, variants and imitations. # Summary of the CRM Design Pinning a finger on what is _the_ CRM design is complicated because there have been so many versions over the years. At its simplest, the CRM is a dose-escalation design that seeks a dose with probability of toxicity closest to some pre-specified target toxicity rate, $p_T$, in an homogeneous patient group. The hallmark that unifies the CRM variants is the assumption that the probability of toxicity, $p_i$, at the $i$th dose, $x_i$, can be modelled using a smooth mathematical function: $$ p_i = F(x_i, \theta), $$ where $\theta$ is a general vector of parameters. Priors are specified on $\theta$, and the dose with posterior estimate of $p_i$ closest to $p_T$ is iteratively recommended to the next patient(s). Different variants of the CRM use different forms for $F$. We consider those briefly now. ### Hyperbolic tangent model @OQuigley1990 first introduced the method using $$ p_i = \left( \frac{\tanh{(x_i)} + 1}{2} \right)^\beta, $$ with an exponential prior was placed on $\beta$. ### Empiric model (aka power model) $$ p_i = x_i^{\exp(\beta)}, $$ for dose $x_i \in (0, 1)$; ### One-parameter logistic model $$ \text{logit} p_i = a_0 + \exp{(\beta)} x_i, $$ where $a_0$ is a pre-specified constant, and $x_i \in \mathbb{R}$. ### Two-parameter logistic model $$ \text{logit} p_i = \alpha + \exp{(\beta)} x_i, $$ for $x_i \in \mathbb{R}$. Other model considerations include: ### Priors In each of these models, different distributions may be used for the parameter priors. ### Toxicity skeletons and standardised doses The $x_i$ dose variables in the models above do not reflect the raw dose quantities given to patients. For example, if the dose is 10mg, we do not use `x=10`. Instead, a _skeleton_ containing estimates of the probabilities of toxicity at the doses is identified. This skeleton could reflect the investigators' prior expectations of toxicities at all of the doses; or it could reflect their expectations for some of the doses with the others interpolated in some plausible way. The $x_i$ are then calculated so that the model-estimated probabilities of toxicity with the parameters taking their prior mean values match the skeleton. This will be much clearer with an example. In a five dose setting, let the skeleton be $\pi = (0.05, 0.1, 0.2, 0.4, 0.7)$. That is, the investigators believe before the trial commences that the probbaility of toxicity at the second dose is 10%, and so on. Let us assume that we are using a one-parameter logistic model with $a_0 = 3$ and $\beta ~ \sim N(0, 1)$. Then we require that $$ \text{logit} \pi_i = 3 + e^0 x_i = 3 + x_i, $$ i.e. $$ x_i = \text{logit} \pi_i - 3. $$ This yields the vector of standardised doses $x = (-5.94, -5.20, -4.39, -3.41, -2.15)$. Equivalent transformations can be derived for the other model forms. The $x_i$ are then used as covariates in the model-fitting. CRM users specify their skeleton, $\pi$, and their parameter priors. From these, the software calculates the $x_i$. The actual doses given to patients in SI units do not actually feature in the model. # Implementation in `escalation` `escalation` simply aims to give a common interface to dose-selection models to facilitate a grammar for specifying dose-finding trial designs. Where possible, it delegates the mathematical model-fitting to existing R packages. There are several R packages that implement CRM models. The two used in `escalation` are the `dfcrm` package [@Cheung2011; @dfcrm]; and the `trialr` package [@Brock2019; @trialr]. They have different strengths and weaknesses, so are suitable to different scenarios. We discuss those now. ### dfcrm [`dfcrm`](https://cran.r-project.org/package=dfcrm) offers: * the **empiric** model with normal prior on $\beta$; * the **one-parameter logistic** model with normal prior on $\beta$. `dfcrm` models are fit in `escalation` using the `get_dfcrm` function. Examples are given below. ### trialr [`trialr`](https://cran.r-project.org/package=trialr) offers: * the **empiric** model with normal prior on $\beta$; * the **one-parameter logistic** model with normal prior on $\beta$; * the **one-parameter logistic** model with gamma prior on $\beta$; * the **two-parameter logistic** model with normal priors on $\alpha$ and $\beta$. `trialr` models are fit in `escalation` using the `get_trialr_crm` function. Let us commence by replicating an example from p.21 of @Cheung2011. They choose the following parameters: ```{r} skeleton <- c(0.05, 0.12, 0.25, 0.40, 0.55) target <- 0.25 a0 <- 3 beta_sd <- sqrt(1.34) ``` Let us define a model fitter using the `dfcrm` package: ```{r, message=FALSE} library(escalation) model1 <- get_dfcrm(skeleton = skeleton, target = target, model = 'logistic', intcpt = a0, scale = beta_sd) ``` and a fitter using the `trialr` package: ```{r} model2 <- get_trialr_crm(skeleton = skeleton, target = target, model = 'logistic', a0 = a0, beta_mean = 0, beta_sd = beta_sd) ``` Names for the function parameters `skeleton`, `target`, and `model` are standardised by `escalation` because they are fundamental. Further parameters (i.e. those in the second lines of each of the above examples) are passed onwards to the model-fitting functions in `dfcrm` and `trialr`. You can see that some of these parameter names vary between the approaches. E.g., what `dfcrm` calls the `intcpt`, `trialr` calls `a0`. Refer to the documentation of the `crm` function in `dfcrm` and `stan_crm` in `trialr` for further information. We then fit those models to the notional outcomes described in the source text: ```{r} outcomes <- '3N 5N 5T 3N 4N' fit1 <- model1 %>% fit(outcomes) fit2 <- model2 %>% fit(outcomes) ``` The dose recommended by each of the models for the next patient is: ```{r} fit1 %>% recommended_dose() fit2 %>% recommended_dose() ``` Thankfully, the models agree. They advocate staying at dose 4, wary of the toxicity already seen at dose 5. If we take a summary of each model fit: ```{r} fit1 %>% summary() ``` ```{r} fit2 %>% summary() ``` We can see that they closely agree on model estimates of the probability of toxicity at each dose. Note that the median perfectly matches the mean in the `dfcrm` fit because it assumes a normal posterior distribution on $\beta$. In contrast, the `trialr` class uses `Stan` to fit the model using Hamiltonian Monte Carlo sampling. The posterior distributions for the probabilities of toxicity are evidently non-normal and positively-skewed because the median estimates are less than the mean estimates. Let us imagine instead that we want to fit the empiric model. That simply requires we change the `model` variable and adjust the prior parameters: ```{r} model3 <- get_dfcrm(skeleton = skeleton, target = target, model = 'empiric', scale = beta_sd) model4 <- get_trialr_crm(skeleton = skeleton, target = target, model = 'empiric', beta_sd = beta_sd) ``` Fitting each to the same set of outcomes yields: ```{r} fit3 <- model3 %>% fit(outcomes) fit4 <- model4 %>% fit(outcomes) ``` ```{r} fit3 %>% summary() ``` ```{r} fit4 %>% summary() ``` In this example, the model estimates are broadly consistent across methodology and model type. However, this is not the general case. To illustrate this point, let us examine a two parameter logistic model fit using `trialr` (note: this model is not implemented in `dfcrm`): ```{r} model5 <- get_trialr_crm(skeleton = skeleton, target = target, model = 'logistic2', alpha_mean = 0, alpha_sd = 2, beta_mean = 0, beta_sd = 1) fit5 <- model5 %>% fit(outcomes) fit5 %>% summary() ``` Now the estimate of toxicity at the highest dose is high relative to the other models. The extra free parameter in the two-parameter model offers more flexibility. There has been a debate in the literature about one-parameter vs two-parameter models (and possibly more). It is generally accepted that a single parameter model is too simplistic to accurately estimate $p_i$ over the entire dose range. However, it will be sufficient to identify the dose closest to $p_T$, and if that is the primary objective of the trial, the simplicity of a one-parameter model may be entirely justified. The interested reader is directed to @OQuigley1990 and @Neuenschwander2008. Note that the CRM does not natively implement stopping rules, so these classes on their own will always advocate trial continuance: ```{r} fit1 %>% continue() fit5 %>% continue() ``` and identify each dose as admissible: ```{r} fit1 %>% dose_admissible() ``` This behaviour can be altered by appending classes to advocate stopping for consensus: ```{r} model6 <- get_trialr_crm(skeleton = skeleton, target = 0.3, model = 'empiric', beta_sd = 1) %>% stop_when_n_at_dose(dose = 'recommended', n = 6) fit6 <- model6 %>% fit('2NNN 3TTT 2NTN') fit6 %>% continue() fit6 %>% recommended_dose() ``` Or for stopping under excess toxicity: ```{r} model7 <- get_trialr_crm(skeleton = skeleton, target = 0.3, model = 'empiric', beta_sd = 1) %>% stop_when_too_toxic(dose = 1, tox_threshold = 0.3, confidence = 0.8) fit7 <- model7 %>% fit('1NTT 1TTN') fit7 %>% continue() fit7 %>% recommended_dose() fit7 %>% dose_admissible() ``` Or both: ```{r} model8 <- get_trialr_crm(skeleton = skeleton, target = 0.3, model = 'empiric', beta_sd = 1) %>% stop_when_n_at_dose(dose = 'recommended', n = 6) %>% stop_when_too_toxic(dose = 1, tox_threshold = 0.3, confidence = 0.8) ``` For more information, check the package README and the other vignettes. ### dfcrm vs trialr So which method should you use? The answer depends on how you plan to use the models. The `trialr` classes produce posterior samples: ```{r} fit7 %>% prob_tox_samples(tall = TRUE) %>% head() ``` and these facilitate flexible visualisation: ```{r, message=FALSE} library(ggplot2) library(dplyr) fit7 %>% prob_tox_samples(tall = TRUE) %>% mutate(.draw = .draw %>% as.integer()) %>% filter(.draw <= 200) %>% ggplot(aes(dose, prob_tox)) + geom_line(aes(group = .draw), alpha = 0.2) ``` However, MCMC sampling is an expensive computational procedure compared to the numerical integration used in `dfcrm`. If you envisage fitting lots of models, perhaps in simulations or dose-paths (see below) and favour a model offered by `dfcrm`, we recommend using `get_dfcrm`. However, if you favour a model only offered by `trialr`, or if you are willing for calculation to be slow in order to get posterior samples, then use `trialr`. ## Dose paths We can use the `get_dose_paths` function in `escalation` to calculate exhaustive model recommendations in response to every possible set of outcomes in future cohorts. For instance, at the start of a trial using an empiric CRM, we can examine all possible paths a trial might take in the first two cohorts of three patients, starting at dose 2: ```{r, message=FALSE} skeleton <- c(0.05, 0.12, 0.25, 0.40, 0.55) target <- 0.25 beta_sd <- 1 model <- get_dfcrm(skeleton = skeleton, target = target, model = 'empiric', scale = beta_sd) paths <- model %>% get_dose_paths(cohort_sizes = c(3, 3), next_dose = 2) graph_paths(paths) ``` We see that the design would willingly skip dose 3 if no tox is seen in the first cohort. This might warrant suppressing dose-dkipping by appending a `dont_skip_doses(when_escalating = TRUE)` selector. Dose-paths can also be run for in-progress trials where some outcomes have been established. For more information on working with dose-paths, refer to the dose-paths vignette. ## Simulation We can use the `simulate_trials` function to calculate operating characteristics for a design. Let us use the example above and tell the design to stop when the lowest dose is too toxic, when 9 patients have already been evaluated at the candidate dose, or when a sample size of $n=24$ is reached: ```{r} model <- get_dfcrm(skeleton = skeleton, target = target, model = 'empiric', scale = beta_sd) %>% stop_when_too_toxic(dose = 1, tox_threshold = target, confidence = 0.8) %>% stop_when_n_at_dose(dose = 'recommended', n = 9) %>% stop_at_n(n = 24) ``` For the sake of speed, we will run just fifty iterations: ```{r} num_sims <- 50 ``` In real life, however, we would naturally run many thousands of iterations. Let us investigate under the following true probabilities of toxicity: ```{r} sc1 <- c(0.25, 0.5, 0.6, 0.7, 0.8) ``` The simulated behaviour is: ```{r} set.seed(123) sims <- model %>% simulate_trials(num_sims = num_sims, true_prob_tox = sc1, next_dose = 1) sims ``` We see that the chances of stopping for excess toxicity and recommending no dose is about 1-in-4. Dose 1 is the clear favourite to be identified. Interestingly, the `stop_when_n_at_dose` class reduces the expected sample size to 12-13 patints. Without it: ```{r} get_dfcrm(skeleton = skeleton, target = target, model = 'empiric', scale = beta_sd) %>% stop_when_too_toxic(dose = 1, tox_threshold = target, confidence = 0.8) %>% stop_at_n(n = 24) %>% simulate_trials(num_sims = num_sims, true_prob_tox = sc1, next_dose = 1) ``` expected sample size is much higher and the chances of erroneously stopping early are also higher. These phenomena would justify a wider simulation study in a real situation. For more information on running dose-finding simulations, refer to the simulation vignette. # References