Package 'xpect' reference manual

Title:	Probabilistic Time Series Forecasting with XGBoost and Conformal Inference
Description:	Implements a probabilistic approach to time series forecasting combining XGBoost regression with conformal inference methods. The package provides functionality for generating predictive distributions, evaluating uncertainty, and optimizing hyperparameters using Bayesian, coarse-to-fine, or random search strategies.
Authors:	Giancarlo Vercellino [aut, cre, cph]
Maintainer:	Giancarlo Vercellino <giancarlo.vercellino@gmail.com>
License:	GPL-3
Version:	1.0
Built:	2025-03-24 13:32:04 UTC
Source:	CRAN

xpect

Description

This function implements probabilistic time series forecasting by combining gradient-boosted regression (XGBoost) with conformal inference techniques. It produces predictive distributions that capture forecast uncertainty and optimizes hyperparameters through Bayesian, coarse-to-fine, or random search methods. The approach uses historical observations from the predictor series to estimate future values of a specified target series. Users can customize the forecasting model by setting parameters for model complexity, regularization, and conformal calibration.
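
The conformal step can be pictured with a generic split-conformal sketch in base R. This is only an illustration of the general idea, not the package's internal code: `lm()` stands in for the boosted model, and the variable names (`calib_err`, `sims`, `pred_cdf`, `pred_icdf`) are hypothetical. The `calib_rate` and `n_sim` values mirror the arguments of the same name only by analogy.

```r
# Generic split-conformal sketch in base R (illustrative; not the xpect internals).
set.seed(42)

n <- 200
x <- cumsum(rnorm(n))
y <- 0.5 * x + rnorm(n)
dat <- data.frame(x = x, y = y)

# Hold out the most recent observations for calibration (cf. calib_rate).
calib_rate <- 0.5
n_calib <- floor(n * calib_rate)
train <- dat[seq_len(n - n_calib), ]
calib <- dat[(n - n_calib + 1):n, ]

# Any point forecaster works here; lm() is a stand-in for a boosted model.
fit <- lm(y ~ x, data = train)

# Calibration errors: observed minus predicted on the held-out part.
calib_err <- calib$y - predict(fit, newdata = calib)

# Resample the errors around a new point prediction (cf. n_sim).
n_sim <- 1000
point_pred <- unname(predict(fit, newdata = data.frame(x = 1.5)))
sims <- point_pred + sample(calib_err, n_sim, replace = TRUE)

# Empirical predictive distribution: cdf, inverse cdf, and an 80% interval.
pred_cdf <- ecdf(sims)
pred_icdf <- function(p) unname(quantile(sims, probs = p))
interval_80 <- pred_icdf(c(0.1, 0.9))
print(interval_80)
```

The simulated values play the role of a predictive sample from which density, distribution, quantile, and sampling functions can be derived.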

Usage

xpect(
  predictors,
  target,
  future,
  past = 1L,
  coverage = 0.5,
  max_depth = 3L,
  eta = 0.1,
  gamma = 0,
  alpha = 0,
  lambda = 1,
  subsample = 0.8,
  colsample_bytree = 0.8,
  search = "none",
  calib_rate = 0.5,
  n_sim = 1000,
  nrounds = 200,
  n_samples = 10,
  n_exploration = 10,
  n_phases = 3,
  top_k = 3,
  seed = 42
)

Arguments

`predictors`	A data frame containing multiple time series predictors and the target series to forecast.
`target`	Character string specifying the name of the target series to forecast within the predictors dataset.
`future`	Integer specifying the number of future time steps to forecast.
`past`	Integer or numeric vector specifying how many past observations are used as input features. A single value fixes the parameter (default: 1); NULL uses the standard search range (1L-30L); two values define a custom range.
`coverage`	Numeric or numeric vector giving the fraction of total variance preserved during SVD. A single value fixes the parameter (default: 0.5); NULL uses the standard search range (0.05-0.95); two values define a custom range.
`max_depth`	Integer or numeric vector giving the maximum depth of XGBoost trees. A single value fixes the parameter (default: 3); NULL uses the standard search range (3L-10L); two values define a custom range.
`eta`	Numeric or numeric vector giving the XGBoost learning rate. A single value fixes the parameter (default: 0.1); NULL uses the standard search range (0.01-0.3); two values define a custom range.
`gamma`	Numeric or numeric vector giving the minimum loss reduction required to split a leaf node. A single value fixes the parameter (default: 0); NULL uses the standard search range (0-5); two values define a custom range.
`alpha`	Numeric or numeric vector giving the L1 regularization strength. A single value fixes the parameter (default: 0); NULL uses the standard search range (0-1); two values define a custom range.
`lambda`	Numeric or numeric vector giving the L2 regularization strength. A single value fixes the parameter (default: 1); NULL uses the standard search range (0-1); two values define a custom range.
`subsample`	Numeric or numeric vector (0-1) giving the instance subsampling ratio per tree. A single value fixes the parameter (default: 0.8); NULL uses the standard search range (0-1); two values define a custom range.
`colsample_bytree`	Numeric or numeric vector (0-1) giving the column subsampling ratio per tree. A single value fixes the parameter (default: 0.8); NULL uses the standard search range (0-1); two values define a custom range.
`search`	Character string specifying the hyperparameter search method. Options: "none" (default), "random_search", "bayesian", "coarse_to_fine".
`calib_rate`	Numeric fraction (default: 0.5) of observations allocated for conformal calibration, influencing the uncertainty estimation.
`n_sim`	Integer (default: 1000) determining the number of simulated calibration error samples used during conformal inference.
`nrounds`	Integer (default: 200) specifying the maximum number of boosting iterations allowed during model training.
`n_samples`	Integer (default: 10) specifying the number of parameter configurations evaluated during random search or initial Bayesian sampling.
`n_exploration`	Integer (default: 10) specifying the number of exploratory evaluations during Bayesian optimization, used to balance exploration and exploitation.
`n_phases`	Integer (default: 3) specifying how many iterative refinement phases are performed in coarse-to-fine optimization.
`top_k`	Integer (default: 3) indicating how many top-performing parameter configurations are retained in each coarse-to-fine optimization iteration.
`seed`	Integer (default: 42) setting the random seed for reproducibility.
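
The convention shared by the tunable arguments (a single value, NULL, or two values) can be sketched with a small helper. `resolve_range()` is a hypothetical function written for this illustration and is not exported by the package. Drawing a candidate uniformly from the range is likewise an assumption about how a random search might use it.

```r
# Hypothetical helper: not part of xpect, shown only to illustrate the convention.
resolve_range <- function(value, standard_range) {
  if (is.null(value)) {
    return(standard_range)       # NULL: fall back to the standard range
  }
  if (length(value) == 1L) {
    return(c(value, value))      # single value: the parameter is fixed
  }
  if (length(value) == 2L) {
    return(sort(value))          # two values: custom search range
  }
  stop("expected NULL, one value, or two values")
}

fixed_eta    <- resolve_range(0.1, c(0.01, 0.3))
standard_eta <- resolve_range(NULL, c(0.01, 0.3))
custom_eta   <- resolve_range(c(0.01, 0.05), c(0.01, 0.3))

# One random-search candidate drawn from the custom range (uniform draw assumed).
set.seed(123)
eta_candidate <- runif(1, min = custom_eta[1], max = custom_eta[2])
print(eta_candidate)
```

With `search = "none"`, every tunable argument would normally be given as a single fixed value, since no range is explored.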

Value

A list containing:

history: A data frame logging each evaluated hyperparameter configuration and its associated cross-entropy performance against the selected benchmark.
best_model: The optimal forecasting model, including probability density functions (pdf), cumulative distribution functions (cdf), inverse cumulative distribution functions (icdf), and random sampling functions (sampler) for each point in the forecasted horizon.
best_params: A named vector detailing the selected hyperparameters of the best-performing forecasting model.
plot: A visualization displaying the optimal forecasts alongside confidence bands derived from conformal intervals, facilitating intuitive uncertainty interpretation.
time_log: Duration tracking the computational time required for the complete optimization and model-building process.

Author(s)

Giancarlo Vercellino <giancarlo.vercellino@gmail.com>

Maintainer: Giancarlo Vercellino <giancarlo.vercellino@gmail.com> [copyright holder]

Examples



library(xpect)

set.seed(123)  # make the simulated series reproducible
dummy_data <- data.frame(
  target_series = cumsum(rnorm(100)),
  predictor1 = cumsum(rnorm(100))
)

result <- xpect(
  predictors = dummy_data,
  target = "target_series",
  future = 3,
  past = c(5L, 20L),         # custom range
  coverage = 0.9,
  max_depth = c(3L, 8L),     # custom range
  eta = c(0.01, 0.05),       # custom range
  gamma = NULL,              # standard range
  alpha = NULL,              # standard range
  lambda = NULL,             # standard range
  subsample = 0.8,
  colsample_bytree = 0.8,
  search = "random_search",
  n_samples = 3,
  seed = 123
)


