Introduction

fable.intermittent is an R package for modeling and forecasting intermittent time series in the tidyverts framework, providing a suite of probabilistic methods and data sets. This vignette introduces the main features and demonstrates basic usage.

Installation

You can install the development version from GitHub:

# install.packages("devtools")
devtools::install_github("StefanoDamato/fable.intermittent")

Usage

We show different features of the package, including the data sets, a pipeline to fit models, generate forecasts, and evaluate them, and our implementation of the Tweedie distribution. First, load the package.

library(fable.intermittent)

Data

The package includes two data sets, auto and raf. Each data set is saved as a tsibble object, a tidy data structure for time series data. The series_id column stores the identifier of each time series, the index column stores the time index, and the value column stores the observed values.

In this vignette, we use a random subset of 200 time series from the auto data set, which contains 3000 time series of monthly sales of spare parts for cars. We use the filter() function to select the sampled time series. Each time series consists of 24 observations spanning from January 2010 to December 2011.

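data(auto)
# Fix the seed so that the same 200 series are drawn on every run
# (the seed value is arbitrary; the tables shown below come from a different draw).
set.seed(123)
idx <- paste0("TS", sample(3000, 200))
# Keep only the sampled series
data <- auto |>
  dplyr::filter(series_id %in% idx)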

First observations of the data subset
series_id	index	value
TS100	2010 Jan	0
TS100	2010 Feb	6
TS100	2010 Mar	8
TS100	2010 Apr	8
TS100	2010 May	34
TS100	2010 Jun	0

Example time series from the auto data set

Fitting and forecasting

The package provides several models from the literature, including:

BETANBB()
EMPDISTR()
GAMPOISB()
HSPES()
MARWAL()
NEGBINES()
VZ()
WSS()

Moreover, two novel methods are implemented:

TWEES()
STATICDISTR()

See the documentation for details on each function.

Before fitting the methods, we use the filter() function to drop the last 6 observations from each time series (July to December 2011), which are held out for evaluating the forecasts. The methods are fitted using the model() function. In this vignette, we only use a pool of 4 models, two of which are the novel ones.

fit <- data |>
  dplyr::filter(index <= tsibble::yearmonth("2011 June")) |>
  fabletools::model(
    betanbb = BETANBB(value),
    staticdistr = STATICDISTR(value),
    twees = TWEES(value),
    wss = WSS(value)
  )

Fitted models
series_id	betanbb	staticdistr	twees	wss
TS100	BETANBB	STATICDISTR	TWEES	WSS
TS1015	BETANBB	STATICDISTR	TWEES	WSS
TS1027	BETANBB	STATICDISTR	TWEES	WSS
TS103	BETANBB	STATICDISTR	TWEES	WSS
TS1036	BETANBB	STATICDISTR	TWEES	WSS
TS1078	BETANBB	STATICDISTR	TWEES	WSS
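
As a quick sanity check on the split described above, the following base R sketch counts how many monthly observations fall on each side of the June 2011 cutoff. It uses plain Date objects rather than the tsibble index, and the variable names are illustrative only.

```r
# Monthly index matching the auto data: January 2010 to December 2011.
months <- seq(as.Date("2010-01-01"), by = "month", length.out = 24)

# Same cutoff as in the filter() call: keep everything up to June 2011.
cutoff <- as.Date("2011-06-01")
train_months <- months[months <= cutoff]
test_months <- months[months > cutoff]

length(train_months)  # 18 observations used for fitting
length(test_months)   # 6 observations held out for evaluation
```

The 6 held-out months match the forecast horizon used in the next step.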

Forecasts are generated using the forecast() function, which produces a fable object containing the forecasts and their associated uncertainty. Besides the time series identifiers and the time indices, it contains a .model column indicating the method used to generate the forecasts, a value column storing the predictive distributions (implemented as distributional objects), and a .mean column storing the point forecast (the mean of the predictive distribution). We generate forecasts for the next 6 months.

fc <- fit |>
  fabletools::forecast(h = "6 months")

Forecasts for the next 6 months
series_id	.model	index	value	.mean
TS100	betanbb	2011 Jul	sample[5000]	8.1980
TS100	betanbb	2011 Aug	sample[5000]	8.1302
TS100	betanbb	2011 Sep	sample[5000]	8.1690
TS100	betanbb	2011 Oct	sample[5000]	8.7408
TS100	betanbb	2011 Nov	sample[5000]	8.2350
TS100	betanbb	2011 Dec	sample[5000]	8.4800
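
The sample[5000] entries suggest that each predictive distribution is represented by 5000 simulated draws. As a rough base R illustration of how a point forecast and quantile forecasts follow from such a sample, the sketch below simulates draws from a negative binomial distribution. The distribution and its parameters are arbitrary assumptions chosen for illustration, not the output of any model in the package.

```r
set.seed(1)  # arbitrary seed, only to make this sketch reproducible

# Hypothetical predictive sample: 5000 draws of a count-valued demand.
draws <- rnbinom(5000, size = 0.5, mu = 8)

# The point forecast is the average of the draws.
point_forecast <- mean(draws)

# Quantile forecasts are read off the empirical distribution of the draws.
quantile_forecasts <- quantile(draws, probs = c(0.5, 0.9, 0.95))

# Share of draws equal to zero, i.e. the predicted probability of no demand.
p_zero <- mean(draws == 0)
```

For intermittent series, the probability of zero demand and the upper quantiles are often more informative than the mean alone, which is why the forecasts are stored as full distributions.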

Evaluation

We evaluate the forecasts using the accuracy() function. Among the available scoring rules, we use the RMSSE (root mean squared scaled error) for mean forecasts and the pinball loss for quantile forecasts. To compare the forecasting methods, we group the scores by model (via group_by()) and average them over the time series (via summarise()); lower values are better for both scores.

results <- fc |>
  fabletools::accuracy(
    data,
    measures = list(
      RMSSE = fabletools::RMSSE,
      pinball_loss = fabletools::pinball_loss
    )
  ) |>
  dplyr::group_by(.model) |>
  dplyr::summarise(
    RMSSE = mean(RMSSE),
    pinball_loss = mean(pinball_loss)
  )

Forecast accuracy metrics
.model	RMSSE	pinball_loss
betanbb	0.847	1.298
staticdistr	0.846	1.224
twees	0.848	1.197
wss	0.891	1.317
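
To make the two scores concrete, here is a minimal base R sketch of their usual textbook definitions for a single series. The helper names rmsse() and pinball() and the hold-out numbers are assumptions made for illustration. fabletools may differ in details such as scaling constants and the default quantile level, so these helpers are not drop-in replacements for fabletools::RMSSE or fabletools::pinball_loss.

```r
# RMSSE: root mean squared forecast error, scaled by the in-sample
# mean squared error of the one-step naive forecast.
rmsse <- function(actual, forecast, train) {
  scale <- mean(diff(train)^2)
  sqrt(mean((actual - forecast)^2) / scale)
}

# Pinball loss of a quantile forecast q at probability level tau:
# under-forecasts are weighted by tau, over-forecasts by 1 - tau.
pinball <- function(actual, q, tau) {
  mean(ifelse(actual >= q, tau * (actual - q), (1 - tau) * (q - actual)))
}

# Training values are the first observations of TS100 shown earlier;
# the hold-out values and forecasts are hypothetical.
train <- c(0, 6, 8, 8, 34, 0)
actual <- c(0, 10, 4)

rmsse_value <- rmsse(actual, forecast = rep(8, 3), train = train)
pinball_value <- pinball(actual, q = 20, tau = 0.9)
```

Because the pinball loss weights errors asymmetrically, a high quantile forecast is penalized lightly for lying above the observation and heavily for lying below it.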

Example forecast for one time series
