uddbart: Dynamic Interval-Censored Risk Prediction

Overview

The uddbart package provides tools for dynamic risk prediction from irregular longitudinal biomarker data with interval-censored outcomes.

The package is designed for studies where patients are followed over time, biomarker measurements are collected at irregular visit times, and the clinical event is known only to occur between two observation times.

A motivating example is chronic myeloid leukemia (CML), where patients are monitored using repeated BCR–ABL measurements and the event of interest is deep molecular response.

Installation

install.packages("uddbart")

The development version can be installed from GitHub:

# install.packages("pak")
pak::pak("xulinpan/uddbart")

Load the package

library(uddbart)

Example data

The package includes two example datasets:

data("cml_long", package = "uddbart")
data("cml_event", package = "uddbart")

The longitudinal dataset contains repeated biomarker measurements:

head(cml_long)
#>   patient_id  t_months log_mrd
#> 1      P0001  2.004107    -1.3
#> 2      P0001 11.498973    -0.8
#> 3      P0001 17.347023    -1.3
#> 4      P0001 22.636550    -1.2
#> 5      P0001 32.065708    -1.0
#> 6      P0001 53.059548     1.0

The event dataset contains interval-censored outcome information:

head(cml_event)
#>   patient_id         L         R         C delta
#> 1      P0001 53.059548 53.059548 53.059548     0
#> 2      P0002 87.162218 87.162218 87.162218     0
#> 3      P0003  5.979466 10.184805 10.184805     1
#> 4      P0004  7.096509 11.006160 11.006160     1
#> 5      P0005  2.694045  6.045175  6.045175     1
#> 6      P0006 18.595483 38.110883 38.110883     1

Data structure

The longitudinal biomarker data should contain one row per patient visit. A typical structure is:

head(cml_long)

The columns used in the example data are:

  • patient_id: patient identifier
  • t_months: visit time, in months
  • log_mrd: longitudinal biomarker value at that visit, on a log scale
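
A minimal sketch of how this layout might be checked before fitting. The helper check_long() is hypothetical and not part of uddbart; the toy data frame copies the rows printed by head(cml_long) above.

```r
# Toy copy of the rows printed by head(cml_long) above.
long <- data.frame(
  patient_id = rep("P0001", 6),
  t_months   = c(2.004107, 11.498973, 17.347023, 22.636550, 32.065708, 53.059548),
  log_mrd    = c(-1.3, -0.8, -1.3, -1.2, -1.0, 1.0)
)

# check_long() is a hypothetical helper, not a uddbart function.
check_long <- function(d) {
  required <- c("patient_id", "t_months", "log_mrd")
  missing_cols <- setdiff(required, names(d))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  stopifnot(is.numeric(d$t_months), is.numeric(d$log_mrd))
  # One row per patient visit: no repeated (patient, time) pairs.
  stopifnot(!anyDuplicated(d[, c("patient_id", "t_months")]))
  invisible(TRUE)
}

check_long(long)

# Visits per patient and gaps between consecutive visits (months).
n_visits <- table(long$patient_id)
gaps <- tapply(long$t_months, long$patient_id, function(t) diff(sort(t)))
```

Unequal values in gaps reflect the irregular visit schedule that the package is designed to handle.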

The event data should contain one row per patient:

head(cml_event)

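The columns used in the example data are:

  • patient_id: patient identifier
  • L: left endpoint of the event interval (last time the patient was known to be event-free)
  • R: right endpoint of the event interval (first time the event was known to have occurred)
  • C: censoring time
  • delta: event indicator (1 = event observed between L and R, 0 = censored)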
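
The sketch below applies a few basic consistency checks to the rows printed by head(cml_event) above. The checks and the derived columns status and width are illustrative assumptions, not part of uddbart. In the six rows shown, patients with delta = 0 have L = R = C, while patients with delta = 1 have L < R; whether this convention holds for the full dataset is not stated here.

```r
# Toy copy of the rows printed by head(cml_event) above.
event <- data.frame(
  patient_id = c("P0001", "P0002", "P0003", "P0004", "P0005", "P0006"),
  L     = c(53.059548, 87.162218, 5.979466, 7.096509, 2.694045, 18.595483),
  R     = c(53.059548, 87.162218, 10.184805, 11.006160, 6.045175, 38.110883),
  C     = c(53.059548, 87.162218, 10.184805, 11.006160, 6.045175, 38.110883),
  delta = c(0, 0, 1, 1, 1, 1)
)

# Illustrative checks (assumed, not taken from the package).
stopifnot(
  !anyDuplicated(event$patient_id),   # one row per patient
  all(event$delta %in% c(0, 1)),      # binary event indicator
  all(event$L <= event$R)             # well-ordered interval
)

# Derived columns for inspection only (hypothetical names).
event$status <- ifelse(event$delta == 1, "interval-censored", "right-censored")
event$width  <- event$R - event$L     # interval width in months

table(event$status)
```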

Fitting a model

The following example demonstrates the basic workflow.

The model fit is not evaluated when this vignette is built, because Bayesian tree fitting is too slow for CRAN checks.

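fit <- uddbart(
  long_data = cml_long,    # one row per patient visit
  event_data = cml_event,  # one row per patient
  landmark = c(6, 12),     # landmark times s
  horizon = 12,            # prediction horizon Delta
  ntree = 20,              # number of trees
  ndpost = 50,             # posterior draws kept
  nskip = 25,              # burn-in draws discarded
  seed = 1                 # random seed for reproducibility
)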

Prediction

After fitting a model, predicted risks can be obtained using predict().

pred <- predict(fit)

head(pred)

The predicted values represent individualized probabilities of experiencing the event within the specified prediction horizon after each landmark time.

Model output

The structure of a fitted uddbart object can be inspected with str():

str(fit)

Common components include:

  • landmark-specific prediction data
  • posterior risk estimates
  • fitted Bayesian tree model
  • model settings
  • prediction horizon
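
How the posterior risk estimates are stored in the fitted object is not documented here, so the sketch below uses a simulated matrix of draws as a stand-in (rows are posterior draws, columns are predictions). It shows one common way to reduce such draws to a posterior mean and a 95% credible interval. The names draws and summary_tab are hypothetical.

```r
set.seed(1)
n_draws <- 50   # same value as ndpost in the example call
n_rows  <- 4    # hypothetical number of patient-landmark predictions

# Simulated stand-in for posterior risk draws; NOT output from uddbart.
draws <- matrix(rbeta(n_draws * n_rows, shape1 = 2, shape2 = 5),
                nrow = n_draws, ncol = n_rows)

# Posterior mean and equal-tailed 95% credible interval per prediction.
summary_tab <- data.frame(
  mean  = colMeans(draws),
  lower = apply(draws, 2, quantile, probs = 0.025),
  upper = apply(draws, 2, quantile, probs = 0.975)
)

summary_tab
```

With only 50 draws the interval endpoints are noisy; a larger number of posterior draws would usually be preferable outside a quick demonstration.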

Practical interpretation

For a landmark time \(s\) and prediction horizon \(\Delta\), uddbart estimates:

\[ P(T \le s + \Delta \mid T > s, \mathcal{H}(s)), \]

where \(T\) is the event time and \(\mathcal{H}(s)\) is the longitudinal biomarker history observed up to and including time \(s\).
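
It can help to rewrite this quantity in terms of a conditional survival function. The notation \(S(t \mid \mathcal{H}(s)) = P(T > t \mid \mathcal{H}(s))\) is introduced here for illustration only and is not taken from the package:

\[ P(T \le s + \Delta \mid T > s, \mathcal{H}(s)) = \frac{S(s \mid \mathcal{H}(s)) - S(s + \Delta \mid \mathcal{H}(s))}{S(s \mid \mathcal{H}(s))} = 1 - \frac{S(s + \Delta \mid \mathcal{H}(s))}{S(s \mid \mathcal{H}(s))}. \]

As a numerical illustration with made-up values, if \(S(s \mid \mathcal{H}(s)) = 0.8\) and \(S(s + \Delta \mid \mathcal{H}(s)) = 0.6\), the predicted probability is \(1 - 0.6 / 0.8 = 0.25\).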

In the CML example, this is the probability that a patient will achieve deep molecular response within the next prediction window, given their observed BCR–ABL monitoring history up to the landmark time.
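
The conditioning on \(T > s\) and \(\mathcal{H}(s)\) can be made concrete with a small landmarking sketch in base R. This is a generic illustration under stated assumptions, not the procedure used inside uddbart: a patient is treated as at risk at \(s\) only when known to be event-free at \(s\) (L >= s), and the history is summarised by the last biomarker value at or before \(s\). All data values are invented.

```r
# Invented toy data in the same layout as cml_long and cml_event.
long <- data.frame(
  patient_id = c("A", "A", "A", "B", "B", "C"),
  t_months   = c(2, 5, 9, 3, 7, 4),
  log_mrd    = c(-0.5, -1.0, -1.8, -0.2, -0.4, -1.1)
)
event <- data.frame(
  patient_id = c("A", "B", "C"),
  L = c(9, 3, 20), R = c(14, 5, 20), delta = c(1, 1, 0)
)

s <- 6  # landmark time in months

# Assumption: at risk at s means known event-free at s, i.e. L >= s.
# Patients with L < s are excluded under this rule.
at_risk <- event$patient_id[event$L >= s]

# History H(s): visits at or before s among at-risk patients.
hist_s <- long[long$t_months <= s & long$patient_id %in% at_risk, ]

# Summarise each history by its last observed value (one possible choice).
last_rows <- hist_s[order(hist_s$patient_id, hist_s$t_months), ]
last_rows <- last_rows[!duplicated(last_rows$patient_id, fromLast = TRUE), ]

last_rows
```

Patient B is dropped because the event interval ends before the landmark, and patient A's visit at month 9 is ignored because it falls after the landmark.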

Notes for CRAN

The computationally intensive code chunks are set to eval = FALSE so that the vignette builds quickly during CRAN checks.

Users can copy and run these examples interactively after installing all required dependencies.
