fable.intermittent is an R package for modeling and
forecasting intermittent time series in the tidyverts
framework, providing a suite of probabilistic methods and data sets.
This vignette introduces the main features and demonstrates basic
usage.
You can install the development version from GitHub:
We demonstrate the main features of the package: the data sets, a pipeline for fitting models, generating forecasts, and evaluating them, and our implementation of the Tweedie distribution. First, load the package.
The package includes two data sets, `auto` and
`raf`. Each data set is stored as a `tsibble`
object, a tidy data structure for time series data. The
time series identifiers are stored in the `series_id`
column, the time indices in the `index` column, and the
observed values in the `value` column.
In this vignette, we use a subset of 200 time series sampled from the
`auto` data set, which contains 3000 time series of monthly
sales of spare parts for cars. We use the `filter()` function
to select the desired time series. Each time series consists of 24
observations spanning January 2010 to December 2011.
| series_id | index | value |
|---|---|---|
| TS100 | 2010 Jan | 0 |
| TS100 | 2010 Feb | 6 |
| TS100 | 2010 Mar | 8 |
| TS100 | 2010 Apr | 8 |
| TS100 | 2010 May | 34 |
| TS100 | 2010 Jun | 0 |
Example time series from the auto data set
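The subsetting step can be sketched on toy data. The block below is a minimal illustration in base R, not the vignette's own code: the data frame `toy`, its simulated values, and the names `keep_ids` and `toy_subset` are assumptions made for the example; only the column names `series_id`, `index`, and `value` come from the package's data sets.

```r
# Toy stand-in for the `auto` data: 10 series with 24 monthly observations each.
# The column names follow the vignette; the values are simulated.
set.seed(1)
toy <- data.frame(
  series_id = rep(paste0("TS", 1:10), each = 24),
  index     = rep(seq(as.Date("2010-01-01"), by = "month", length.out = 24),
                  times = 10),
  value     = rpois(240, lambda = 0.8)
)

# Draw a random subset of series identifiers (3 here; the vignette uses 200).
keep_ids <- sample(unique(toy$series_id), size = 3)

# Keep only the rows that belong to the sampled series.
toy_subset <- toy[toy$series_id %in% keep_ids, ]
```

The same logical condition, `series_id %in% keep_ids`, could be passed to `dplyr::filter()` when working with a `tsibble`.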
The package provides several models from the literature, including:
- `BETANBB()`
- `EMPDISTR()`
- `GAMPOISB()`
- `HSPES()`
- `MARWAL()`
- `NEGBINES()`
- `VZ()`
- `WSS()`

Moreover, two novel methods are implemented:
- `TWEES()`
- `STATICDISTR()`

See the documentation for details on each function.
Before fitting the models, we use the `filter()` function
to hold out the last 6 observations of each time series; these are used
later to evaluate the forecasts. The models are fitted using the
`model()` function. In this vignette, we use only 4
models, two of which are the novel ones.
```r
fit <- data |>
  dplyr::filter(index <= tsibble::yearmonth("2011 June")) |>
  fabletools::model(
    betanbb = BETANBB(value),
    staticdistr = STATICDISTR(value),
    twees = TWEES(value),
    wss = WSS(value)
  )
```

| series_id | betanbb | staticdistr | twees | wss |
|---|---|---|---|---|
| TS100 | BETANBB | STATICDISTR | TWEES | WSS |
| TS1015 | BETANBB | STATICDISTR | TWEES | WSS |
| TS1027 | BETANBB | STATICDISTR | TWEES | WSS |
| TS103 | BETANBB | STATICDISTR | TWEES | WSS |
| TS1036 | BETANBB | STATICDISTR | TWEES | WSS |
| TS1078 | BETANBB | STATICDISTR | TWEES | WSS |
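The hold-out split applied before fitting can be illustrated on a single series. This is a minimal base R sketch under stated assumptions: the simulated `value` vector and the names `cutoff`, `train`, and `test` are hypothetical, and plain `Date` objects stand in for the `yearmonth` index used by the package's data.

```r
# Monthly index covering January 2010 to December 2011 (24 observations).
index <- seq(as.Date("2010-01-01"), by = "month", length.out = 24)

set.seed(42)
value <- rpois(24, lambda = 1.5)  # simulated demand, for illustration only

# Training window: everything up to and including June 2011.
cutoff   <- as.Date("2011-06-01")
is_train <- index <= cutoff

train <- value[is_train]   # first 18 months, used for fitting
test  <- value[!is_train]  # last 6 months, held out for evaluation
```

Holding out the final 6 months means the models never see the observations they are later scored on.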
Forecasts are generated using the `forecast()` function,
which produces a `fable` object containing the forecasts and
their associated uncertainty. In addition to the time series identifiers and
the time indices, it contains a `.model` column
indicating the method used to generate the forecasts, a
`value` column storing the predictive distributions
(implemented as `distributional` objects), and a
`.mean` column for the point forecast (the mean of the predictive
distribution). We generate forecasts for the next 6 months.
| series_id | .model | index | value | .mean |
|---|---|---|---|---|
| TS100 | betanbb | 2011 Jul | sample[5000] | 8.1980 |
| TS100 | betanbb | 2011 Aug | sample[5000] | 8.1302 |
| TS100 | betanbb | 2011 Sep | sample[5000] | 8.1690 |
| TS100 | betanbb | 2011 Oct | sample[5000] | 8.7408 |
| TS100 | betanbb | 2011 Nov | sample[5000] | 8.2350 |
| TS100 | betanbb | 2011 Dec | sample[5000] | 8.4800 |
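The `sample[5000]` entries indicate that each predictive distribution is represented by 5000 draws. As a rough illustration of how point and quantile forecasts can be read off such a sample, the base R sketch below simulates draws from a negative binomial distribution. The distribution and its parameters are assumptions chosen for the example and are not what any of the package's models produce.

```r
set.seed(123)

# Hypothetical predictive draws for one series and one horizon.
draws <- rnbinom(5000, size = 0.5, mu = 8)

# Point forecast: the sample mean, analogous to the .mean column.
point_forecast <- mean(draws)

# Quantile forecasts at selected probability levels.
q <- quantile(draws, probs = c(0.5, 0.9, 0.95))

# Estimated probability of zero demand, often of interest for intermittent series.
p_zero <- mean(draws == 0)
```

Because the distribution is stored as a sample, any summary (mean, quantiles, probability of zero demand) is an empirical estimate whose precision depends on the number of draws.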
We evaluate the forecasts using the `accuracy()` function.
Among the available scoring rules, we use the RMSSE (root mean squared
scaled error) for mean forecasts and the pinball loss for quantile
forecasts. To compare the performance of the forecasting methods, we group
the scores by model (via `group_by()`) and average them (via
`summarise()`).
```r
results <- fc |>
  fabletools::accuracy(data, measures = list(
    RMSSE = fabletools::RMSSE,
    pinball_loss = fabletools::pinball_loss
  )) |>
  dplyr::group_by(.model) |>
  dplyr::summarise(
    RMSSE = mean(RMSSE),
    pinball_loss = mean(pinball_loss)
  )
```

| .model | RMSSE | pinball_loss |
|---|---|---|
| betanbb | 0.847 | 1.298 |
| staticdistr | 0.846 | 1.224 |
| twees | 0.848 | 1.197 |
| wss | 0.891 | 1.317 |
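To make the two scores concrete, the following base R sketch computes them by hand on toy numbers. It is an illustration under stated assumptions, not the `fabletools` implementation: the helper names `rmsse()` and `pinball()` are hypothetical, and the RMSSE here is scaled by the lag-1 (naive) in-sample squared differences, whereas `fabletools::RMSSE` may scale differently (for example, with seasonal differences), so values need not match the table above.

```r
# RMSSE: out-of-sample MSE scaled by the in-sample MSE of the lag-1 naive forecast.
rmsse <- function(actual, forecast, train) {
  mse_forecast <- mean((actual - forecast)^2)
  mse_naive    <- mean(diff(train)^2)
  sqrt(mse_forecast / mse_naive)
}

# Pinball (quantile) loss at probability level tau, averaged over observations.
pinball <- function(actual, q_forecast, tau) {
  err <- actual - q_forecast
  mean(ifelse(err >= 0, tau * err, (tau - 1) * err))
}

# Toy data, for illustration only.
train    <- c(0, 2, 0, 1, 0, 3)
actual   <- c(1, 0, 2)
mean_fc  <- c(1, 1, 1)

score_rmsse   <- rmsse(actual, mean_fc, train)
score_pinball <- pinball(actual = c(3, 7), q_forecast = c(5, 5), tau = 0.9)
```

For both scores, lower is better. The pinball loss penalizes an observation above the forecast quantile with weight `tau` and one below it with weight `1 - tau`, so high quantile levels are penalized more heavily for under-forecasting.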
Example forecast for one time series