Forecasting intermittent time series with fable.intermittent

Introduction

fable.intermittent is an R package for modeling and forecasting intermittent time series in the tidyverts framework, providing a suite of probabilistic methods and data sets. This vignette introduces the main features and demonstrates basic usage.

Installation

You can install the development version from GitHub:

# install.packages("devtools")
devtools::install_github("StefanoDamato/fable.intermittent")

Usage

We show different features of the package, including the data sets, a pipeline to fit models, generate forecasts and evaluate them, and our implementation of the Tweedie distribution. First, load the package.

library(fable.intermittent)

Data

The package includes two data sets, auto and raf. Each is stored as a tsibble, a tidy data structure for time series. The series_id column holds the identifier of each time series, the index column holds the time index, and the value column holds the observed values.

In this vignette, we use a random subset of 200 time series from the auto data set, which contains 3000 time series of monthly sales of car spare parts. We draw 200 identifiers at random and use the filter() function to keep the corresponding series. Each time series consists of 24 observations, spanning January 2010 to December 2011.

data(auto)
idx <- paste0("TS", sample(3000, 200))
data <- auto |>
  dplyr::filter(series_id %in% idx)

First observations of the data subset:

series_id  index     value
TS100      2010 Jan      0
TS100      2010 Feb      6
TS100      2010 Mar      8
TS100      2010 Apr      8
TS100      2010 May     34
TS100      2010 Jun      0

Example time series from the auto data set
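
Because the identifiers are drawn at random, the subset (and every table that follows) changes from run to run. A minimal base R sketch of a reproducible draw is given here; the seed value is an arbitrary assumption and is not part of the original vignette:

```r
# Fix the random number generator state so the same 200 series are
# selected on every run. The seed value 42 is arbitrary (an assumption).
set.seed(42)

# sample(3000, 200) draws 200 distinct integers from 1..3000
# (sampling without replacement is the default).
idx <- paste0("TS", sample(3000, 200))

length(idx)         # number of selected identifiers
anyDuplicated(idx)  # 0 means no identifier was drawn twice
```

The resulting idx can then be passed to dplyr::filter() exactly as in the code above.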


Fitting and forecasting

The package provides several models from the literature, including:

  • BETANBB()
  • EMPDISTR()
  • GAMPOISB()
  • HSPES()
  • MARWAL()
  • NEGBINES()
  • VZ()
  • WSS()

Moreover, two novel methods are implemented:

  • TWEES()
  • STATICDISTR()

See the documentation for details on each function.

Before fitting the methods, we use the filter() function to drop the last 6 observations from each time series; these are held out for evaluating the forecasts. The methods are fitted using the model() function. In this vignette, we only use a pool of 4 models, two of which are the novel ones.

fit <- data |>
  dplyr::filter(index <= tsibble::yearmonth("2011 June")) |>
  fabletools::model(
    betanbb = BETANBB(value),
    staticdistr = STATICDISTR(value),
    twees = TWEES(value),
    wss = WSS(value)
  )

Fitted models:

series_id  betanbb  staticdistr  twees  wss
TS100      BETANBB  STATICDISTR  TWEES  WSS
TS1015     BETANBB  STATICDISTR  TWEES  WSS
TS1027     BETANBB  STATICDISTR  TWEES  WSS
TS103      BETANBB  STATICDISTR  TWEES  WSS
TS1036     BETANBB  STATICDISTR  TWEES  WSS
TS1078     BETANBB  STATICDISTR  TWEES  WSS
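
To make the holdout split concrete, the following base R sketch reproduces the date arithmetic behind the filter on the index column. It uses plain Date objects instead of tsibble's yearmonth class, which is a simplification for illustration only:

```r
# Monthly index from January 2010 to December 2011, as described for the
# auto data set (first day of each month stands in for the month).
index <- seq(as.Date("2010-01-01"), as.Date("2011-12-01"), by = "month")

# Cutoff matching the filter used before fitting: keep data up to June 2011.
cutoff <- as.Date("2011-06-01")

train_index <- index[index <= cutoff]  # used for fitting
test_index  <- index[index > cutoff]   # held out for evaluation

length(train_index)  # 18 training months
length(test_index)   # 6 held-out months
```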

Forecasts are generated using the forecast() function, which produces a fable object containing the forecasts and their associated uncertainty. Besides the time series identifiers and the time indices, it contains a .model column indicating the method used to generate the forecasts, a value column storing the predictive distributions (implemented as distributional objects), and a .mean column storing the point forecast, that is, the mean of the predictive distribution. We generate forecasts for the next 6 months.

fc <- fit |>
  fabletools::forecast(h = "6 months")

Forecasts for the next 6 months:

series_id  .model   index     value         .mean
TS100      betanbb  2011 Jul  sample[5000]  8.1980
TS100      betanbb  2011 Aug  sample[5000]  8.1302
TS100      betanbb  2011 Sep  sample[5000]  8.1690
TS100      betanbb  2011 Oct  sample[5000]  8.7408
TS100      betanbb  2011 Nov  sample[5000]  8.2350
TS100      betanbb  2011 Dec  sample[5000]  8.4800
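
The sample[5000] entries suggest that each predictive distribution is represented by 5000 simulated draws. The base R sketch below shows how a point forecast and quantiles can be read off such a sample. The negative binomial distribution and its parameters are hypothetical stand-ins chosen for illustration; they are not the output of any model in the package:

```r
set.seed(1)  # arbitrary seed, for reproducibility of the sketch only

# Hypothetical predictive sample for one series and one horizon:
# 5000 draws of a count-valued demand (assumed distribution and parameters).
draws <- rnbinom(5000, size = 0.5, mu = 8)

# The sample mean plays the role of the .mean column.
point_forecast <- mean(draws)

# Empirical quantiles give quantile forecasts at the chosen levels.
q <- quantile(draws, probs = c(0.5, 0.9, 0.95))

# Share of draws equal to zero: the predicted probability of no demand.
p_zero <- mean(draws == 0)
```

For intermittent series the probability of zero demand is often as informative as the mean, which is one reason to keep the whole predictive distribution rather than a single number.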

Evaluation

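We evaluate the forecasts using the accuracy() function. Among the available scoring rules, we use the root mean squared scaled error (RMSSE) for the mean forecasts and the pinball loss for the quantile forecasts; lower values are better for both. We pass the full data set to accuracy(), which uses the held-out observations to compute the forecast errors and the training portion to scale them. To compare the forecasting methods, we group the scores by model (via group_by()) and average them (via summarise()).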

results <- fc |>
  fabletools::accuracy(
    data,
    measures = list(
      RMSSE = fabletools::RMSSE,
      pinball_loss = fabletools::pinball_loss
    )
  ) |>
  # average the per-series scores within each model
  dplyr::group_by(.model) |>
  dplyr::summarise(
    RMSSE = mean(RMSSE),
    pinball_loss = mean(pinball_loss)
  )

Forecast accuracy metrics:

.model       RMSSE  pinball_loss
betanbb      0.847  1.298
staticdistr  0.846  1.224
twees        0.848  1.197
wss          0.891  1.317

Example forecast for one time series
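
For readers unfamiliar with the two scores, the following base R sketch implements their textbook definitions on toy numbers. The helper names pinball() and rmsse() and all values are invented for illustration; the fabletools implementations may differ in scaling conventions and in the default quantile level:

```r
# Pinball (quantile) loss of a quantile forecast q at probability level tau:
# under-forecasts are weighted by tau, over-forecasts by 1 - tau.
pinball <- function(actual, q, tau) {
  ifelse(actual >= q, tau * (actual - q), (1 - tau) * (q - actual))
}

# RMSSE: out-of-sample mean squared error, scaled by the in-sample mean
# squared error of the one-step naive forecast, then square-rooted.
rmsse <- function(actual, forecast, train) {
  sqrt(mean((actual - forecast)^2) / mean(diff(train)^2))
}

# Toy data (invented values, not taken from the auto data set)
train  <- c(0, 2, 0, 0, 4, 0)  # training observations
actual <- c(0, 3, 1)           # held-out observations
fc_mean <- c(1, 1, 1)          # mean forecasts
fc_q90  <- c(2, 2, 2)          # 90% quantile forecasts

score_rmsse   <- rmsse(actual, fc_mean, train)
score_pinball <- mean(pinball(actual, fc_q90, tau = 0.9))
```

Scaling by the in-sample naive error makes the RMSSE comparable across series with different demand levels, which is what justifies averaging it over the 200 series.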


References