| Title: | Functions and Utilities for Tidy Time Series Forecasting and Time Series Cross-Validation |
|---|---|
| Description: | Provides functions and tools for tidy time series analysis and forecasting as well as time series cross-validation. This is mainly a set of wrapper and helper functions as well as some extensions for the packages 'tsibble', 'fable', and 'fabletools'. |
| Authors: | Alexander Häußer [aut, cre, cph] (ORCID: <https://orcid.org/0009-0000-5419-8479>) |
| Maintainer: | Alexander Häußer <[email protected]> |
| License: | GPL-3 |
| Version: | 1.0.0 |
| Built: | 2026-05-13 14:43:56 UTC |
| Source: | https://github.com/cran/tscv |
Estimate the sample autocorrelation function of a numeric vector.
acf_vec(x, lag_max = 24, ...)

| x | Numeric vector. |
|---|---|
| lag_max | Integer. Maximum lag for which the autocorrelation is estimated. |
| ... | Further arguments passed to |
acf_vec() is a small wrapper around stats::acf(). It returns
the sample autocorrelations as a numeric vector and removes lag 0 from the
output, because lag 0 is always equal to 1 and is usually not needed for
diagnostics.
A numeric vector containing the sample autocorrelations for lags
1 to lag_max.
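The wrapper behaviour described above can be sketched in base R. This is only an illustration, not the package's source: the helper name `acf_vec_sketch` is hypothetical, and the sketch assumes the input contains no missing values.

```r
# Hypothetical helper for illustration only (not the tscv implementation).
acf_vec_sketch <- function(x, lag_max = 24, ...) {
  # stats::acf() returns an object whose $acf element starts at lag 0
  result <- stats::acf(x, lag.max = lag_max, plot = FALSE, ...)
  # Drop the first element (lag 0, always equal to 1) and return a plain vector
  as.numeric(result$acf)[-1]
}

set.seed(1)
x <- cumsum(rnorm(200))  # simulated random walk, strongly autocorrelated

rho <- acf_vec_sketch(x, lag_max = 12)
length(rho)  # one value per lag from 1 to lag_max
```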
Other data analysis:
estimate_acf(),
estimate_kurtosis(),
estimate_mode(),
estimate_pacf(),
estimate_skewness(),
pacf_vec(),
summarise_data(),
summarise_split(),
summarise_stats()
library(dplyr)

x <- M4_monthly_data |>
  filter(series == first(series)) |>
  pull(value)

acf_vec(
  x = x,
  lag_max = 12
)
Check that the input data is a valid regular and ordered tsibble, fill
implicit gaps if requested, and convert wide data to long format.
check_data(data, fill_missing = TRUE)

| data | A |
|---|---|
| fill_missing | Logical value. If |
check_data() is a data preparation helper for time series workflows.
It performs three tasks:
checks that data is a tsibble;
checks that the time index is regular and ordered by key and index;
optionally turns implicit missing values into explicit missing values
using fill_gaps().
If the input data has no key variables, it is treated as wide data and is
converted to long format. The resulting output contains the original index
column, a variable column containing the former column names, and a
value column containing the corresponding observations.
Existing explicit missing values are not changed.
A tsibble prepared for downstream use. Wide data is returned in long
format with one measurement variable named value.
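The wide-to-long conversion can be pictured with a small base R analogue. This sketch works on a plain data frame rather than a tsibble, and the column names `A` and `B` are made up for illustration; it only shows the shape of the output (index, variable, value), not how `check_data()` is implemented.

```r
# Illustration only: base R analogue of a wide-to-long conversion.
wide <- data.frame(
  index = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01")),
  A = c(1, 2, 3),
  B = c(4, NA, 6)  # an explicit missing value, which stays as it is
)

# Every column except the index is treated as a measurement column
value_cols <- setdiff(names(wide), "index")

long <- data.frame(
  index = rep(wide$index, times = length(value_cols)),
  variable = rep(value_cols, each = nrow(wide)),
  value = unlist(wide[value_cols], use.names = FALSE)
)

long
```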
Other data preparation:
interpolate_missing(),
smooth_outlier()
library(dplyr)
library(tsibble)

data <- M4_monthly_data |>
  filter(series %in% c("M23100", "M14395")) |>
  as_tsibble(
    index = index,
    key = series
  )

check_data(data)

wide_data <- data |>
  as_tibble() |>
  select(index, series, value) |>
  tidyr::pivot_wider(
    names_from = series,
    values_from = value
  ) |>
  as_tsibble(index = index)

check_data(wide_data)
Specify a Double Seasonal Holt-Winters model for use with
fabletools::model().
DSHW(formula, ...)

| formula | A model formula specifying the response variable, for example |
|---|---|
| ... | Further arguments passed to |
DSHW() is a model specification wrapper around
forecast::dshw() for the fable, tsibble, and
fabletools ecosystem.
The model is useful for time series with two important seasonal patterns, such as hourly data with daily and weekly seasonality.
The seasonal periods must be supplied through the periods argument.
A model definition that can be used inside fabletools::model().
Other DSHW:
fitted.DSHW(),
forecast.DSHW(),
model_sum.DSHW(),
residuals.DSHW()
library(dplyr)
library(tsibble)
library(fabletools)

train_frame <- elec_load |>
  filter(bidding_zone == "DE") |>
  slice_head(n = 24 * 28) |>
  as_tsibble(index = time)

model_frame <- train_frame |>
  model("DSHW" = DSHW(value, periods = c(24, 168)))

model_frame
Hourly tibble with actual electricity loads and load forecasts from the ENTSO-E Transparency Platform. The data set contains time series data from 2019-01-01 00:00:00 to 2019-12-31 23:00:00 for 8 bidding zones within Europe (DE, DK1, ES, FI, FR, NL, NO1, SE1). The original data are quarter-hourly (15-minute intervals) and have been aggregated to hourly values.
data(elec_load)
A time series object of class tibble with 140,160 rows and 5 columns:
time: Date and time index
item: Time series name
unit: Measured unit
bidding_zone: Bidding zone
value: Measurement variable
data(elec_load)
Hourly tibble with day-ahead electricity spot prices from the ENTSO-E Transparency Platform. The data set contains time series data from 2019-01-01 00:00:00 to 2020-12-31 23:00:00 for 8 bidding zones within Europe (DE, DK1, ES, FI, FR, NL, NO1, SE1).
data(elec_price)
A time series object of class tibble with 140,352 rows and 5 columns:
time: Date and time index
item: Time series name
unit: Measured unit
bidding_zone: Bidding zone
value: Measurement variable
data(elec_price)
Estimate the sample autocorrelation function for one or more time series in a
tibble.
estimate_acf(.data, context, lag_max = 24, level = 0.9, ...)

| .data | A |
|---|---|
| context | A named |
| lag_max | Integer. Maximum lag for which the autocorrelation is estimated. |
| level | Numeric value. Confidence level used to calculate the approximate significance bound. |
| ... | Further arguments passed to |
estimate_acf() groups the input data by the series identifier supplied
in context and estimates the sample autocorrelation function for each
time series separately.
The output contains one row per series and lag. The column bound
contains an approximate significance threshold based on the selected
confidence level. The logical column sign indicates whether the
absolute autocorrelation is larger than this threshold.
A tibble with the series identifier and the columns type,
lag, value, bound, and sign.
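The grouping and the significance bound can be sketched in base R. The sketch below is an assumption-laden illustration rather than the package's code: the helper `acf_table` is hypothetical, and the bound is assumed to be the usual large-sample approximation `qnorm((1 + level) / 2) / sqrt(n)`, which the text above does not state explicitly.

```r
# Illustration only: per-series ACF with an approximate significance bound.
# Assumption: bound = z / sqrt(n), with z the two-sided normal quantile.
acf_table <- function(x, lag_max = 24, level = 0.9) {
  rho <- as.numeric(stats::acf(x, lag.max = lag_max, plot = FALSE)$acf)[-1]
  bound <- stats::qnorm((1 + level) / 2) / sqrt(length(x))
  data.frame(
    type = "ACF",
    lag = seq_along(rho),
    value = rho,
    bound = bound,
    sign = abs(rho) > bound  # TRUE if the autocorrelation exceeds the bound
  )
}

set.seed(42)
data <- data.frame(
  series = rep(c("a", "b"), each = 100),
  value = c(cumsum(rnorm(100)), rnorm(100))
)

# Estimate the ACF separately for each series, then stack the results
pieces <- lapply(split(data$value, data$series), acf_table, lag_max = 12)
result <- do.call(
  rbind,
  lapply(names(pieces), function(id) cbind(series = id, pieces[[id]]))
)

head(result)
```

The result has one row per series and lag, mirroring the output structure described above.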
Other data analysis:
acf_vec(),
estimate_kurtosis(),
estimate_mode(),
estimate_pacf(),
estimate_skewness(),
pacf_vec(),
summarise_data(),
summarise_split(),
summarise_stats()
library(dplyr)

context <- list(
  series_id = "series",
  value_id = "value",
  index_id = "index"
)

data <- M4_monthly_data |>
  filter(series %in% c("M23100", "M14395"))

estimate_acf(
  .data = data,
  context = context,
  lag_max = 12
)
Estimate the kurtosis of a numeric distribution.
estimate_kurtosis(x, na_rm = TRUE)

| x | Numeric vector. |
|---|---|
| na_rm | Logical value. If |
The function computes the moment-based kurtosis. Missing values are removed by default.
This returns the usual kurtosis, not excess kurtosis. A normal distribution
has kurtosis close to 3.
A numeric value giving the estimated kurtosis.
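As a rough illustration, a moment-based kurtosis can be written as the fourth central moment divided by the squared second central moment. The sketch below assumes population moments (division by n); the package's exact normalization is not stated here and may differ, and the name `kurtosis_sketch` is hypothetical.

```r
# Illustration only; assumes population (1/n) central moments.
kurtosis_sketch <- function(x, na_rm = TRUE) {
  if (na_rm) x <- x[!is.na(x)]
  deviations <- x - mean(x)
  m2 <- mean(deviations^2)  # second central moment
  m4 <- mean(deviations^4)  # fourth central moment
  m4 / m2^2                 # kurtosis, not excess kurtosis
}

kurtosis_sketch(c(1, 2, 3, 4, 5, NA))  # 6.8 / 2^2 = 1.7
```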
Other data analysis:
acf_vec(),
estimate_acf(),
estimate_mode(),
estimate_pacf(),
estimate_skewness(),
pacf_vec(),
summarise_data(),
summarise_split(),
summarise_stats()
x <- c(1, 2, 3, 4, 5, NA)

estimate_kurtosis(x)
estimate_kurtosis(x, na_rm = TRUE)

set.seed(123)
y <- rnorm(100)

estimate_kurtosis(y)
Estimate the mode of a numeric distribution using kernel density estimation.
estimate_mode(x, na_rm = TRUE, ...)

| x | Numeric vector. |
|---|---|
| na_rm | Logical value. If |
| ... | Further arguments passed to |
The function computes a kernel density estimate with stats::density()
and returns the value of x at which the estimated density is largest.
Missing values are removed by default. Additional arguments are passed to
stats::density(), for example bw, kernel, or n.
A numeric value giving the estimated mode of the distribution.
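The density-based approach described above can be sketched with `stats::density()` alone. The helper name `mode_sketch` is hypothetical and the code is an illustration, not the package's source.

```r
# Illustration only: mode as the location of the maximum of a kernel density.
mode_sketch <- function(x, na_rm = TRUE, ...) {
  if (na_rm) x <- x[!is.na(x)]
  dens <- stats::density(x, ...)  # evaluates the density on a grid (dens$x, dens$y)
  dens$x[which.max(dens$y)]       # grid point with the highest estimated density
}

set.seed(123)
y <- rnorm(1000, mean = 5)

mode_sketch(y)             # close to the true mode of 5
mode_sketch(y, bw = "SJ")  # same estimate with a different bandwidth selector
```

Because the result is a grid point of the density estimate, it depends on the bandwidth and on the number of grid points `n`.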
Other data analysis:
acf_vec(),
estimate_acf(),
estimate_kurtosis(),
estimate_pacf(),
estimate_skewness(),
pacf_vec(),
summarise_data(),
summarise_split(),
summarise_stats()
x <- c(1, 1, 2, 2, 2, 3, 4, NA)

estimate_mode(x)
estimate_mode(x, na_rm = TRUE)
estimate_mode(x, bw = "nrd0")

set.seed(123)
y <- rnorm(100, mean = 5)

estimate_mode(y)
Estimate the sample partial autocorrelation function for one or more time
series in a tibble.
estimate_pacf(.data, context, lag_max = 24, level = 0.9, ...)

| .data | A |
|---|---|
| context | A named |
| lag_max | Integer. Maximum lag for which the partial autocorrelation is estimated. |
| level | Numeric value. Confidence level used to calculate the approximate significance bound. |
| ... | Further arguments passed to |
estimate_pacf() groups the input data by the series identifier
supplied in context and estimates the sample partial autocorrelation
function for each time series separately.
The output contains one row per series and lag. The column bound
contains an approximate significance threshold based on the selected
confidence level. The logical column sign indicates whether the
absolute partial autocorrelation is larger than this threshold.
A tibble with the series identifier and the columns type,
lag, value, bound, and sign.
Other data analysis:
acf_vec(),
estimate_acf(),
estimate_kurtosis(),
estimate_mode(),
estimate_skewness(),
pacf_vec(),
summarise_data(),
summarise_split(),
summarise_stats()
library(dplyr)

context <- list(
  series_id = "series",
  value_id = "value",
  index_id = "index"
)

data <- M4_monthly_data |>
  filter(series %in% c("M23100", "M14395"))

estimate_pacf(
  .data = data,
  context = context,
  lag_max = 12
)
Estimate the skewness of a numeric distribution.
estimate_skewness(x, na_rm = TRUE)

| x | Numeric vector. |
|---|---|
| na_rm | Logical value. If |
The function computes the moment-based skewness. Missing values are removed by default. Positive values indicate a distribution with a longer or heavier right tail; negative values indicate a distribution with a longer or heavier left tail.
A numeric value giving the estimated skewness.
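A moment-based skewness is commonly written as the third central moment divided by the second central moment raised to the power 3/2. The sketch below assumes population moments (division by n), which may differ from the package's normalization; `skewness_sketch` is a hypothetical name.

```r
# Illustration only; assumes population (1/n) central moments.
skewness_sketch <- function(x, na_rm = TRUE) {
  if (na_rm) x <- x[!is.na(x)]
  deviations <- x - mean(x)
  m2 <- mean(deviations^2)  # second central moment
  m3 <- mean(deviations^3)  # third central moment
  m3 / m2^(3 / 2)
}

skewness_sketch(c(1, 2, 3, 4, 5))       # symmetric sample: 0
skewness_sketch(c(1, 2, 3, 4, 10, NA))  # positive: longer right tail
```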
Other data analysis:
acf_vec(),
estimate_acf(),
estimate_kurtosis(),
estimate_mode(),
estimate_pacf(),
pacf_vec(),
summarise_data(),
summarise_split(),
summarise_stats()
x <- c(1, 2, 3, 4, 10, NA)

estimate_skewness(x)
estimate_skewness(x, na_rm = TRUE)

set.seed(123)
y <- rexp(100)

estimate_skewness(y)
Extract fitted values from a fitted DSHW model.
## S3 method for class 'DSHW'
fitted(object, ...)

| object | A fitted |
|---|---|
| ... | Additional arguments. Currently not used. |
Fitted values.
Other DSHW:
DSHW(),
forecast.DSHW(),
model_sum.DSHW(),
residuals.DSHW()
library(dplyr)
library(tsibble)
library(fabletools)

train_frame <- elec_load |>
  filter(bidding_zone == "DE") |>
  slice_head(n = 24 * 28) |>
  as_tsibble(index = time)

model_frame <- train_frame |>
  model("DSHW" = DSHW(value, periods = c(24, 168)))

fitted(model_frame)
Extract fitted values from a fitted MEDIAN model.
## S3 method for class 'MEDIAN'
fitted(object, ...)

| object | A fitted |
|---|---|
| ... | Additional arguments. Currently not used. |
Fitted values.
Other MEDIAN:
MEDIAN(),
forecast.MEDIAN(),
model_sum.MEDIAN(),
residuals.MEDIAN()
library(dplyr)
library(tsibble)
library(fabletools)

train_frame <- M4_monthly_data |>
  filter(series == first(series)) |>
  as_tsibble(index = index)

model_frame <- train_frame |>
  model("MEDIAN" = MEDIAN(value ~ window()))

fitted(model_frame)
Extract fitted values from a fitted SMEAN model.
## S3 method for class 'SMEAN'
fitted(object, ...)

| object | A fitted |
|---|---|
| ... | Additional arguments. Currently not used. |
Fitted values.
Other SMEAN:
SMEAN(),
forecast.SMEAN(),
model_sum.SMEAN(),
residuals.SMEAN()
library(dplyr)
library(tsibble)
library(fabletools)

train_frame <- M4_monthly_data |>
  filter(series == first(series)) |>
  as_tsibble(index = index)

model_frame <- train_frame |>
  model("SMEAN" = SMEAN(value ~ lag("year")))

fitted(model_frame)
Extract fitted values from a fitted SMEDIAN model.
## S3 method for class 'SMEDIAN'
fitted(object, ...)

| object | A fitted |
|---|---|
| ... | Additional arguments. Currently not used. |
Fitted values.
Other SMEDIAN:
SMEDIAN(),
forecast.SMEDIAN(),
model_sum.SMEDIAN(),
residuals.SMEDIAN()
library(dplyr)
library(tsibble)
library(fabletools)

train_frame <- elec_price |>
  filter(bidding_zone == "DE") |>
  slice_head(n = 24 * 21) |>
  as_tsibble(index = time)

model_frame <- train_frame |>
  model("SMEDIAN" = SMEDIAN(value ~ lag("week")))

fitted(model_frame)
Extract fitted values from a fitted SNAIVE2 model.
## S3 method for class 'SNAIVE2'
fitted(object, ...)

| object | A fitted |
|---|---|
| ... | Additional arguments. Currently not used. |
Fitted values.
Other SNAIVE2:
SNAIVE2(),
forecast.SNAIVE2(),
model_sum.SNAIVE2(),
residuals.SNAIVE2()
library(dplyr)
library(tsibble)
library(fabletools)

train_frame <- elec_price |>
  filter(bidding_zone == "DE") |>
  slice_head(n = 24 * 21) |>
  as_tsibble(index = time)

model_frame <- train_frame |>
  model("SNAIVE2" = SNAIVE2(value))

fitted(model_frame)
Extract fitted values from a fitted TBATS model.
## S3 method for class 'TBATS'
fitted(object, ...)

| object | A fitted |
|---|---|
| ... | Additional arguments. Currently not used. |
This method is used by fitted() when extracting fitted values from a
mable containing a TBATS model.
Fitted values.
Other TBATS:
TBATS(),
forecast.TBATS(),
model_sum.TBATS(),
residuals.TBATS()
library(dplyr)
library(tsibble)
library(fabletools)

train_frame <- elec_price |>
  filter(bidding_zone == "DE") |>
  slice_head(n = 24 * 21) |>
  as_tsibble(index = time)

model_frame <- train_frame |>
  model("TBATS" = TBATS(value, periods = c(24, 168)))

fitted(model_frame)
Forecast a fitted DSHW model.
## S3 method for class 'DSHW'
forecast(object, new_data, specials = NULL, ...)

| object | A fitted |
|---|---|
| new_data | A |
| specials | Parsed specials. Currently not used. |
| ... | Additional arguments. Currently not used. |
A vector of forecast distributions.
Other DSHW:
DSHW(),
fitted.DSHW(),
model_sum.DSHW(),
residuals.DSHW()
library(dplyr)
library(tsibble)
library(fabletools)

train_frame <- elec_load |>
  filter(bidding_zone == "DE") |>
  slice_head(n = 24 * 28) |>
  as_tsibble(index = time)

model_frame <- train_frame |>
  model("DSHW" = DSHW(value, periods = c(24, 168)))

forecast(model_frame, h = 24)
Forecast a fitted MEDIAN model.
## S3 method for class 'MEDIAN'
forecast(object, new_data, specials = NULL, ...)

| object | A fitted |
|---|---|
| new_data | A |
| specials | Parsed specials. Currently not used. |
| ... | Additional arguments. Currently not used. |
A vector of forecast distributions.
Other MEDIAN:
MEDIAN(),
fitted.MEDIAN(),
model_sum.MEDIAN(),
residuals.MEDIAN()
library(dplyr)
library(tsibble)
library(fabletools)

train_frame <- M4_monthly_data |>
  filter(series == first(series)) |>
  as_tsibble(index = index)

model_frame <- train_frame |>
  model("MEDIAN" = MEDIAN(value ~ window()))

forecast(model_frame, h = 12)
Forecast a fitted SMEAN model.
## S3 method for class 'SMEAN'
forecast(object, new_data, specials = NULL, ...)

| object | A fitted |
|---|---|
| new_data | A |
| specials | Parsed specials. Currently not used. |
| ... | Additional arguments. Currently not used. |
A vector of forecast distributions.
Other SMEAN:
SMEAN(),
fitted.SMEAN(),
model_sum.SMEAN(),
residuals.SMEAN()
library(dplyr)
library(tsibble)
library(fabletools)

train_frame <- M4_monthly_data |>
  filter(series == first(series)) |>
  as_tsibble(index = index)

model_frame <- train_frame |>
  model("SMEAN" = SMEAN(value ~ lag("year")))

forecast(model_frame, h = 12)
Forecast a fitted SMEDIAN model.
## S3 method for class 'SMEDIAN'
forecast(object, new_data, specials = NULL, ...)

| object | A fitted |
|---|---|
| new_data | A |
| specials | Parsed specials. Currently not used. |
| ... | Additional arguments. Currently not used. |
A vector of forecast distributions.
Other SMEDIAN:
SMEDIAN(),
fitted.SMEDIAN(),
model_sum.SMEDIAN(),
residuals.SMEDIAN()
library(dplyr)
library(tsibble)
library(fabletools)

train_frame <- elec_price |>
  filter(bidding_zone == "DE") |>
  slice_head(n = 24 * 21) |>
  as_tsibble(index = time)

model_frame <- train_frame |>
  model("SMEDIAN" = SMEDIAN(value ~ lag("week")))

forecast(model_frame, h = 24)
Forecast a fitted SNAIVE2 model.
## S3 method for class 'SNAIVE2'
forecast(object, new_data, specials = NULL, ...)

| object | A fitted |
|---|---|
| new_data | A |
| specials | Parsed specials. Currently not used. |
| ... | Additional arguments. Currently not used. |
A vector of forecast distributions.
Other SNAIVE2:
SNAIVE2(),
fitted.SNAIVE2(),
model_sum.SNAIVE2(),
residuals.SNAIVE2()
library(dplyr)
library(tsibble)
library(fabletools)

train_frame <- elec_price |>
  filter(bidding_zone == "DE") |>
  slice_head(n = 24 * 21) |>
  as_tsibble(index = time)

model_frame <- train_frame |>
  model("SNAIVE2" = SNAIVE2(value))

forecast(model_frame, h = 24)
Forecast a fitted TBATS model.
## S3 method for class 'TBATS'
forecast(object, new_data, specials = NULL, ...)

| object | A fitted |
|---|---|
| new_data | A |
| specials | Parsed specials. Currently not used. |
| ... | Additional arguments. Currently not used. |
This method is used by forecast() when forecasting a mable containing
a TBATS model.
A vector of forecast distributions.
Other TBATS:
TBATS(),
fitted.TBATS(),
model_sum.TBATS(),
residuals.TBATS()
library(dplyr)
library(tsibble)
library(fabletools)

train_frame <- elec_price |>
  filter(bidding_zone == "DE") |>
  slice_head(n = 24 * 21) |>
  as_tsibble(index = time)

model_frame <- train_frame |>
  model("TBATS" = TBATS(value, periods = c(24, 168)))

forecast(model_frame, h = 24)
Interpolate missing values in a numeric time series.
interpolate_missing(x, periods, ...)

| x | Numeric vector containing the time series observations. |
|---|---|
| periods | Numeric vector giving the seasonal periods of the time series, for example |
| ... | Further arguments passed to |
interpolate_missing() is a small wrapper around
forecast::na.interp(). The input vector is first converted to an
msts object using the seasonal periods supplied in periods.
For non-seasonal time series, missing values are replaced using linear
interpolation. For seasonal time series, forecast::na.interp() uses an
STL-based approach: the series is decomposed, the seasonally adjusted series
is interpolated, and the seasonal component is added back.
The function returns a plain numeric vector with the same length as the input.
A numeric vector with missing values interpolated.
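For the non-seasonal case, linear interpolation can be illustrated with `stats::approx()`. This is a simplified stand-in, not what `forecast::na.interp()` runs internally; the helper name `linear_interp_sketch` is hypothetical, and the treatment of leading or trailing gaps (`rule = 2`) is an assumption of this sketch.

```r
# Illustration only: linear interpolation of missing values with base R.
linear_interp_sketch <- function(x) {
  idx <- seq_along(x)
  observed <- !is.na(x)
  # rule = 2 carries the nearest observed value into leading/trailing gaps
  stats::approx(idx[observed], x[observed], xout = idx, rule = 2)$y
}

linear_interp_sketch(c(1, NA, 3, NA, NA, 6))  # 1 2 3 4 5 6
```

The output has the same length as the input, and observed values are returned unchanged.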
Other data preparation:
check_data(),
smooth_outlier()
library(dplyr)

x <- M4_monthly_data |>
  filter(series == first(series)) |>
  pull(value)

x_missing <- x
x_missing[c(10, 20, 30)] <- NA

x_interpolated <- interpolate_missing(
  x = x_missing,
  periods = 12
)

anyNA(x_missing)
anyNA(x_interpolated)

hourly <- elec_price |>
  filter(bidding_zone == "DE") |>
  slice_head(n = 24 * 14) |>
  pull(value)

hourly_missing <- hourly
hourly_missing[c(24, 48, 72)] <- NA

interpolate_missing(
  x = hourly_missing,
  periods = c(24, 168)
)
The data set contains 30 selected time series on a monthly basis from the M4 Competition.
data(M4_monthly_data)
A time series object of class tibble with 7881 rows and 4 columns:
index: Date and time index
series: Time series ID from M4 forecasting competition
category: Category from M4 forecasting competition
value: Measurement variable
data(M4_monthly_data)
The data set contains 30 selected time series on a quarterly basis from the M4 Competition.
data(M4_quarterly_data)
A time series object of class tibble with 2818 rows and 4 columns:
index: Date and time index
series: Time series ID from M4 forecasting competition
category: Category from M4 forecasting competition
value: Measurement variable
data(M4_quarterly_data)
Calculate the mean absolute error between actual values and forecasts.
mae_vec(truth, estimate, na_rm = TRUE)

| truth | Numeric vector containing the actual values. |
|---|---|
| estimate | Numeric vector containing the forecasts. |
| na_rm | Logical value. If |
mae_vec() computes the average absolute forecast error
abs(truth - estimate). The metric is reported in the same units as the
original data.
A numeric value.
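The calculation is short enough to write out in base R. The sketch below mirrors the description above; `mae_sketch` is a hypothetical name and not the package's function.

```r
# Illustration only: mean of the absolute forecast errors.
mae_sketch <- function(truth, estimate, na_rm = TRUE) {
  mean(abs(truth - estimate), na.rm = na_rm)
}

mae_sketch(c(10, 20, 30), c(8, 22, 25))  # (2 + 2 + 5) / 3 = 3
mae_sketch(c(10, 20, NA), c(8, 22, 25))  # NA pair dropped: (2 + 2) / 2 = 2
```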
Other accuracy functions:
make_accuracy(),
make_errors(),
mape_vec(),
me_vec(),
mpe_vec(),
mse_vec(),
rmse_vec(),
smape_vec()
truth <- c(10, 20, 30)
estimate <- c(8, 22, 25)

mae_vec(truth, estimate)

truth_na <- c(10, 20, NA)
estimate_na <- c(8, 22, 25)

mae_vec(truth_na, estimate_na)
Estimate accuracy metrics for point forecasts generated from rolling-origin time series cross-validation.
make_accuracy(
  future_frame,
  main_frame,
  context,
  dimension = "split",
  benchmark = NULL
)

| future_frame | A |
|---|---|
| main_frame | A |
| context | A named |
| dimension | Character value. Determines the dimension over which accuracy is summarized. Common choices are |
| benchmark | Optional character value giving the model name used as the benchmark for the relative mean absolute error |
make_accuracy() compares point forecasts in future_frame with
the observed values in main_frame. The two data sets are joined using
the series identifier and time index defined in context.
Accuracy can be summarized along different cross-validation dimensions:
dimension = "split" summarizes accuracy separately for each
test split.
dimension = "horizon" summarizes accuracy separately for each
forecast horizon.
The following point forecast accuracy metrics are returned:
ME: mean error.
MAE: mean absolute error.
MSE: mean squared error.
RMSE: root mean squared error.
MAPE: mean absolute percentage error.
sMAPE: symmetric mean absolute percentage error.
MPE: mean percentage error.
If benchmark is supplied, the function also computes the relative mean
absolute error rMAE. The rMAE is calculated as the model's
MAE divided by the MAE of the benchmark model for the same
series and selected dimension.
A tibble containing the forecast accuracy metrics. The output contains
the series identifier, model name, selected dimension, dimension value
n, metric name, and metric value.
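The listed metrics and the rMAE ratio can be sketched for a single group of forecasts in base R. The definitions below are common textbook forms and are assumptions of this sketch: percentage errors are taken relative to the actual value, and sMAPE uses the sum of absolute actual and forecast values in the denominator. The package may define them differently, and `point_metrics` is a hypothetical helper.

```r
# Illustration only: point forecast metrics for one series and one group.
point_metrics <- function(actual, point) {
  error <- actual - point
  pct_error <- error / actual * 100  # assumption: relative to the actual value
  c(
    ME = mean(error),
    MAE = mean(abs(error)),
    MSE = mean(error^2),
    RMSE = sqrt(mean(error^2)),
    MAPE = mean(abs(pct_error)),
    sMAPE = mean(2 * abs(error) / (abs(actual) + abs(point))) * 100,  # assumed form
    MPE = mean(pct_error)
  )
}

actual <- c(10, 20, 30)
model <- point_metrics(actual, point = c(8, 22, 25))
benchmark <- point_metrics(actual, point = c(9, 21, 28))

# Relative MAE: model MAE divided by benchmark MAE for the same group
rMAE <- unname(model["MAE"] / benchmark["MAE"])
rMAE  # 3 / (4 / 3) = 2.25
```

A value of rMAE above 1 means the model has a larger MAE than the benchmark for that group.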
Other accuracy functions:
mae_vec(),
make_errors(),
mape_vec(),
me_vec(),
mpe_vec(),
mse_vec(),
rmse_vec(),
smape_vec()
library(dplyr)
library(tsibble)
library(fabletools)
library(fable)

context <- list(
  series_id = "series",
  value_id = "value",
  index_id = "index"
)

main_frame <- M4_monthly_data |>
  filter(series %in% c("M23100", "M14395"))

split_frame <- make_split(
  main_frame = main_frame,
  context = context,
  type = "first",
  value = 120,
  n_ahead = 18,
  n_skip = 17,
  n_lag = 0,
  mode = "stretch",
  exceed = FALSE
)

train_frame <- slice_train(
  main_frame = main_frame,
  split_frame = split_frame,
  context = context
) |>
  as_tsibble(
    index = index,
    key = c(series, split)
  )

model_frame <- train_frame |>
  model(
    "SNAIVE" = SNAIVE(value ~ lag("year"))
  )

fable_frame <- model_frame |>
  forecast(h = 18)

future_frame <- make_future(
  fable = fable_frame,
  context = context
)

accuracy_horizon <- make_accuracy(
  future_frame = future_frame,
  main_frame = main_frame,
  context = context,
  dimension = "horizon"
)

accuracy_horizon

accuracy_split <- make_accuracy(
  future_frame = future_frame,
  main_frame = main_frame,
  context = context,
  dimension = "split"
)

accuracy_split
Calculate forecast errors and percentage forecast errors for point forecasts.
make_errors(future_frame, main_frame, context)

| future_frame | A |
|---|---|
| main_frame | A |
| context | A named |
make_errors() compares point forecasts in future_frame with the
observed values in main_frame. The two data sets are joined by the
series identifier and time index specified in context.
The forecast error is calculated as error = actual - point. The
percentage forecast error is calculated as
pct_error = ((actual - point) / point) * 100.
Positive errors indicate that the forecast is below the observed value. Negative errors indicate that the forecast is above the observed value.
The returned data contains:
series_id: Unique identifier for the time series as specified
in context.
model: Forecasting model name.
split: Train-test split identifier.
horizon: Forecast horizon.
error: Forecast error.
pct_error: Percentage forecast error.
A tibble containing forecast errors and percentage forecast errors.
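The join and the sign convention of the forecast error can be illustrated with plain data frames. The toy data and the use of `merge()` are assumptions of this sketch; it covers only the `error` column and does not reproduce the package's implementation.

```r
# Illustration only: join forecasts to observations and compute the error.
future_frame <- data.frame(
  index = c(1, 2, 1, 2),
  series = c("a", "a", "b", "b"),
  model = "SNAIVE",
  split = 1,
  horizon = c(1, 2, 1, 2),
  point = c(8, 22, 25, 30)
)

main_frame <- data.frame(
  index = c(1, 2, 1, 2),
  series = c("a", "a", "b", "b"),
  value = c(10, 20, 30, 28)
)

# Join on the series identifier and the time index
joined <- merge(future_frame, main_frame, by = c("series", "index"))

# error = actual - point: positive when the forecast is below the observation
joined$error <- joined$value - joined$point

joined[, c("series", "model", "split", "horizon", "error")]
```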
Other accuracy functions:
mae_vec(),
make_accuracy(),
mape_vec(),
me_vec(),
mpe_vec(),
mse_vec(),
rmse_vec(),
smape_vec()
library(dplyr)
library(tsibble)
library(fabletools)
library(fable)

context <- list(
  series_id = "series",
  value_id = "value",
  index_id = "index"
)

main_frame <- M4_monthly_data |>
  filter(series %in% c("M23100", "M14395"))

split_frame <- make_split(
  main_frame = main_frame,
  context = context,
  type = "first",
  value = 120,
  n_ahead = 18,
  n_skip = 17,
  n_lag = 0,
  mode = "stretch",
  exceed = FALSE
)

train_frame <- slice_train(
  main_frame = main_frame,
  split_frame = split_frame,
  context = context
) |>
  as_tsibble(
    index = index,
    key = c(series, split)
  )

model_frame <- train_frame |>
  model(
    "SNAIVE" = SNAIVE(value ~ lag("year"))
  )

fable_frame <- model_frame |>
  forecast(h = 18)

future_frame <- make_future(
  fable = fable_frame,
  context = context
)

error_frame <- make_errors(
  future_frame = future_frame,
  main_frame = main_frame,
  context = context
)

error_frame
Convert forecasts from a fable object to a standardized forecast
table.
make_future(fable, context)
fable |
A |
context |
A named |
make_future() converts the output of forecast() into a
tibble with a consistent structure for downstream evaluation,
plotting, and accuracy calculation.
The returned future_frame contains one row per forecasted observation,
time series, split, and model. It includes the following columns:
the time index column specified by context$index_id;
the series identifier column specified by context$series_id;
model: the forecasting model name;
split: the train-test split identifier;
horizon: the forecast horizon within each series, split, and
model;
point: the point forecast, taken from the .mean column
of the fable.
This format is used by functions such as make_accuracy() and
make_errors().
A tibble containing forecasts in standardized future_frame
format.
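A rough base R sketch of the reshaping described above follows. The input is a made-up plain data frame that only mimics a fable (a `.model` column and a `.mean` column); the way `horizon` is derived here, by counting rows within each series, split, and model, is an assumption for illustration.

```r
# Hypothetical forecast output, flattened to a plain data frame.
fc <- data.frame(
  index = c(121, 122, 123, 121, 122, 123),
  series = rep(c("A", "B"), each = 3),
  split = 1L,
  .model = "SNAIVE",
  .mean = c(10.5, 11.0, 11.2, 20.1, 19.8, 20.4)
)

# Order by group and time so that the row count within a group is the horizon.
fc <- fc[order(fc$series, fc$split, fc$.model, fc$index), ]

# Horizon = running row number within each series, split, and model.
horizon <- ave(
  seq_len(nrow(fc)),
  fc$series, fc$split, fc$.model,
  FUN = seq_along
)

# Assemble the columns listed for future_frame.
future_frame <- data.frame(
  index = fc$index,
  series = fc$series,
  model = fc$.model,
  split = fc$split,
  horizon = horizon,
  point = fc$.mean
)

future_frame
```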
Other time series cross-validation:
make_split(),
make_tsibble(),
slice_test(),
slice_train(),
split_index()
library(dplyr)
library(tsibble)
library(fable)
library(fabletools)

context <- list(
  series_id = "series",
  value_id = "value",
  index_id = "index"
)

main_frame <- M4_monthly_data |>
  filter(series %in% c("M23100", "M14395"))

split_frame <- make_split(
  main_frame = main_frame,
  context = context,
  type = "first",
  value = 120,
  n_ahead = 18,
  n_skip = 17,
  n_lag = 0,
  mode = "stretch",
  exceed = FALSE
)

train_frame <- slice_train(
  main_frame = main_frame,
  split_frame = split_frame,
  context = context
) |>
  as_tsibble(
    index = index,
    key = c(series, split)
  )

model_frame <- train_frame |>
  model(
    "SNAIVE" = SNAIVE(value ~ lag("year"))
  )

fable_frame <- model_frame |>
  forecast(h = 18)

future_frame <- make_future(
  fable = fable_frame,
  context = context
)

future_frame
Create a split frame with train and test indices for one or more time series.
make_split(
  main_frame, context, type, value, n_ahead,
  n_skip = 0, n_lag = 0, mode = "slide", exceed = TRUE
)
main_frame |
A |
context |
A named |
type |
Character value. The type of initial split. Possible values are
|
value |
Numeric value specifying the initial split. |
n_ahead |
Integer. The forecast horizon, i.e. the number of observations in each test window. |
n_skip |
Integer. The number of observations to skip between split
origins. The default is |
n_lag |
Integer. The number of lagged observations to include before the
test window. This is useful if lagged predictors are required when
constructing test features. The default is |
mode |
Character value. Either |
exceed |
Logical value. If |
make_split() creates rolling-origin train-test splits for time series
cross-validation. The output is used by functions such as
slice_train() and slice_test() to extract the corresponding
training and testing samples from main_frame.
The function supports two training-window modes:
mode = "slide" uses a fixed window. The training window has constant
length and moves forward over time.
mode = "stretch" uses an expanding window. The training window starts
at the first observation and grows over time.
The initial training window is controlled by type and value:
type = "first" uses the first value observations as the
initial training window.
type = "last" keeps the last value observations for
testing and derives the initial training window from the remaining sample.
type = "prob" uses floor(value * n_total) observations
as the initial training window.
The argument n_skip controls how far the rolling origin moves between
consecutive splits. For non-overlapping test windows, use
n_skip = n_ahead - 1.
A tibble containing the split plan. The output has one row per time
series and split, with list-columns train and test containing
integer row positions.
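The two window modes can be sketched in base R for a single series. The helper `rolling_origin()` below is hypothetical and not part of the package. It assumes that the origin advances by `n_skip + 1` observations per split (which is consistent with the advice to use n_skip = n_ahead - 1 for non-overlapping test windows) and that test windows running past the end of the sample are dropped. It ignores n_lag and the type argument.

```r
# Hypothetical helper: row positions for rolling-origin splits of one series.
rolling_origin <- function(n_total, n_init, n_ahead, n_skip = 0,
                           mode = c("slide", "stretch")) {
  mode <- match.arg(mode)
  stopifnot(n_init + n_ahead <= n_total)

  # Assumption: the origin moves forward by n_skip + 1 observations.
  step <- n_skip + 1

  # Last training position of each split; the test window must fit the sample.
  train_end <- seq(from = n_init, to = n_total - n_ahead, by = step)

  lapply(train_end, function(end) {
    # Expanding window starts at 1; fixed window keeps n_init observations.
    start <- if (mode == "stretch") 1 else end - n_init + 1
    list(train = start:end, test = (end + 1):(end + n_ahead))
  })
}

fixed <- rolling_origin(180, n_init = 120, n_ahead = 18, n_skip = 17, mode = "slide")
expanding <- rolling_origin(180, n_init = 120, n_ahead = 18, n_skip = 17, mode = "stretch")

range(fixed[[2]]$train)      # 19 138: constant length of 120
range(expanding[[2]]$train)  # 1 138: grows with each split
range(fixed[[2]]$test)       # 139 156: starts right after the first test window
```

With 180 observations, both calls yield three splits whose test windows (121 to 138, 139 to 156, 157 to 174) do not overlap.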
Other time series cross-validation:
make_future(),
make_tsibble(),
slice_test(),
slice_train(),
split_index()
library(dplyr)

context <- list(
  series_id = "series",
  value_id = "value",
  index_id = "index"
)

main_frame <- M4_monthly_data |>
  filter(series == "M23100")

# Fixed-window split plan
fixed_split <- make_split(
  main_frame = main_frame,
  context = context,
  type = "first",
  value = 120,
  n_ahead = 18,
  n_skip = 17,
  n_lag = 0,
  mode = "slide",
  exceed = FALSE
)

fixed_split

# Expanding-window split plan
expanding_split <- make_split(
  main_frame = main_frame,
  context = context,
  type = "first",
  value = 120,
  n_ahead = 18,
  n_skip = 17,
  n_lag = 0,
  mode = "stretch",
  exceed = FALSE
)

expanding_split
Convert a tibble containing time series data to a tsibble.
make_tsibble(main_frame, context)
main_frame |
A |
context |
A named |
make_tsibble() is a small helper for time series cross-validation
workflows. It uses the time index and series identifier supplied in
context to create a regular tsibble.
The input data must contain the columns specified by context$index_id
and context$series_id. The column specified by context$index_id
is used as the time index, and the column specified by
context$series_id is used as the key.
A tsibble with the same columns as main_frame, using the index
and key defined in context.
Other time series cross-validation:
make_future(),
make_split(),
slice_test(),
slice_train(),
split_index()
library(dplyr)
library(tsibble)

context <- list(
  series_id = "series",
  value_id = "value",
  index_id = "index"
)

main_frame <- M4_monthly_data |>
  filter(series %in% c("M23100", "M14395"))

tsibble_frame <- make_tsibble(
  main_frame = main_frame,
  context = context
)

tsibble_frame
Calculate the mean absolute percentage error (MAPE) between numeric vectors of actual values and forecasts.
mape_vec(truth, estimate, na_rm = TRUE)
truth |
Numeric vector containing the actual values. |
estimate |
Numeric vector containing the forecasts. |
na_rm |
Logical value. If |
mape_vec() computes the mean of the absolute percentage forecast errors,
abs((truth - estimate) / truth) * 100.
This metric is undefined when truth is zero and may return
Inf or NaN in such cases.
A numeric value.
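The formula can be checked by hand with a few lines of base R. `mape_sketch()` is a hypothetical stand-in written for this illustration; it assumes that na_rm is simply forwarded to mean() as na.rm.

```r
# Hypothetical stand-in for the formula described above.
mape_sketch <- function(truth, estimate, na_rm = TRUE) {
  mean(abs((truth - estimate) / truth) * 100, na.rm = na_rm)
}

# Percentage errors are 20, 10, and 25, so the mean is 55 / 3.
mape_full <- mape_sketch(c(10, 20, 40), c(8, 22, 30))

# The pair with a missing actual value is dropped: (20 + 10) / 2 = 15.
mape_na <- mape_sketch(c(10, 20, NA), c(8, 22, 25))

# A zero in truth makes the ratio infinite.
mape_zero <- mape_sketch(c(0, 20), c(1, 22))

c(mape_full, mape_na, mape_zero)
```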
Other accuracy functions:
mae_vec(),
make_accuracy(),
make_errors(),
me_vec(),
mpe_vec(),
mse_vec(),
rmse_vec(),
smape_vec()
truth <- c(10, 20, 40)
estimate <- c(8, 22, 30)

mape_vec(truth, estimate)

truth_na <- c(10, 20, NA)
estimate_na <- c(8, 22, 25)

mape_vec(truth_na, estimate_na)
Calculate the mean error (ME) between numeric vectors of actual values and forecasts.
me_vec(truth, estimate, na_rm = TRUE)
truth |
Numeric vector containing the actual values. |
estimate |
Numeric vector containing the forecasts. |
na_rm |
Logical value. If |
me_vec() computes the average signed forecast error
truth - estimate. Positive values indicate that forecasts are, on
average, below the observed values. Negative values indicate that forecasts
are, on average, above the observed values.
The metric is reported in the same units as the original data.
A numeric value.
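The sign interpretation is easy to verify in base R. `me_sketch()` below is a hypothetical stand-in that follows the truth - estimate convention stated above and assumes na_rm is forwarded to mean().

```r
# Hypothetical stand-in for the signed mean error.
me_sketch <- function(truth, estimate, na_rm = TRUE) {
  mean(truth - estimate, na.rm = na_rm)
}

truth <- c(10, 20, 30)

# Forecasts that are always 2 units too low give a positive mean error.
me_under <- me_sketch(truth, truth - 2)

# Forecasts that are always 2 units too high give a negative mean error.
me_over <- me_sketch(truth, truth + 2)

# Errors of opposite sign cancel: +2 and -2 average to zero.
me_cancel <- me_sketch(c(10, 20), c(8, 22))

c(me_under, me_over, me_cancel)
```

Because positive and negative errors offset each other, a mean error near zero indicates little bias but says nothing about the size of individual errors.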
Other accuracy functions:
mae_vec(),
make_accuracy(),
make_errors(),
mape_vec(),
mpe_vec(),
mse_vec(),
rmse_vec(),
smape_vec()
truth <- c(10, 20, 30)
estimate <- c(8, 22, 25)

me_vec(truth, estimate)

truth_na <- c(10, 20, NA)
estimate_na <- c(8, 22, 25)

me_vec(truth_na, estimate_na)
Specify a median benchmark model for use with fabletools::model().
MEDIAN(formula, ...)
formula |
A model formula specifying the response and optional
|
... |
Further arguments. |
MEDIAN() forecasts future values using the median of the observed
response. The window() special controls whether the median is estimated
using all observations, a fixed trailing window, or a rolling window.
A model definition that can be used inside fabletools::model().
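The idea behind a median benchmark can be sketched in base R. `median_forecast()` and its `window_size` argument are hypothetical names chosen for this illustration; they are not the arguments of the window() special, and the sketch covers only the "all observations" and "fixed trailing window" cases.

```r
# Hypothetical helper: flat forecast equal to the median of the history.
median_forecast <- function(y, h, window_size = NULL) {
  # Optionally restrict the history to the last window_size observations.
  if (!is.null(window_size)) {
    y <- tail(y, window_size)
  }
  rep(median(y, na.rm = TRUE), h)
}

y <- c(5, 7, 6, 20, 8, 9)

# Median of all six observations: (7 + 8) / 2 = 7.5.
fc_all <- median_forecast(y, h = 3)

# Median of the last three observations (20, 8, 9) is 9.
fc_tail <- median_forecast(y, h = 3, window_size = 3)

fc_all
fc_tail
```

Compared with a mean forecast, the median is less affected by the single outlier of 20 in the full sample.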
Other MEDIAN:
fitted.MEDIAN(),
forecast.MEDIAN(),
model_sum.MEDIAN(),
residuals.MEDIAN()
library(dplyr)
library(tsibble)
library(fabletools)

train_frame <- M4_monthly_data |>
  filter(series == first(series)) |>
  as_tsibble(index = index)

model_frame <- train_frame |>
  model("MEDIAN" = MEDIAN(value ~ window()))

model_frame
Return a short model label for a fitted DSHW model.
## S3 method for class 'DSHW'
model_sum(x)
x |
A fitted |
A character string.
Other DSHW:
DSHW(),
fitted.DSHW(),
forecast.DSHW(),
residuals.DSHW()
library(dplyr)
library(tsibble)
library(fabletools)

train_frame <- elec_load |>
  filter(bidding_zone == "DE") |>
  slice_head(n = 24 * 28) |>
  as_tsibble(index = time)

model_frame <- train_frame |>
  model("DSHW" = DSHW(value, periods = c(24, 168)))

model_frame
Return a short model label for a fitted MEDIAN model.
## S3 method for class 'MEDIAN'
model_sum(x)
x |
A fitted |
A character string.
Other MEDIAN:
MEDIAN(),
fitted.MEDIAN(),
forecast.MEDIAN(),
residuals.MEDIAN()
library(dplyr)
library(tsibble)
library(fabletools)

train_frame <- M4_monthly_data |>
  filter(series == first(series)) |>
  as_tsibble(index = index)

model_frame <- train_frame |>
  model("MEDIAN" = MEDIAN(value ~ window()))

model_frame
Return a short model label for a fitted SMEAN model.
## S3 method for class 'SMEAN'
model_sum(x)
x |
A fitted |
A character string.
Other SMEAN:
SMEAN(),
fitted.SMEAN(),
forecast.SMEAN(),
residuals.SMEAN()
library(dplyr)
library(tsibble)
library(fabletools)

train_frame <- M4_monthly_data |>
  filter(series == first(series)) |>
  as_tsibble(index = index)

model_frame <- train_frame |>
  model("SMEAN" = SMEAN(value ~ lag("year")))

model_frame
Return a short model label for a fitted SMEDIAN model.
## S3 method for class 'SMEDIAN'
model_sum(x)
x |
A fitted |
A character string.
Other SMEDIAN:
SMEDIAN(),
fitted.SMEDIAN(),
forecast.SMEDIAN(),
residuals.SMEDIAN()
library(dplyr)
library(tsibble)
library(fabletools)

train_frame <- elec_price |>
  filter(bidding_zone == "DE") |>
  slice_head(n = 24 * 21) |>
  as_tsibble(index = time)

model_frame <- train_frame |>
  model("SMEDIAN" = SMEDIAN(value ~ lag("week")))

model_frame
Return a short model label for a fitted SNAIVE2 model.
## S3 method for class 'SNAIVE2'
model_sum(x)
x |
A fitted |
A character string.
Other SNAIVE2:
SNAIVE2(),
fitted.SNAIVE2(),
forecast.SNAIVE2(),
residuals.SNAIVE2()
library(dplyr)
library(tsibble)
library(fabletools)

train_frame <- elec_price |>
  filter(bidding_zone == "DE") |>
  slice_head(n = 24 * 21) |>
  as_tsibble(index = time)

model_frame <- train_frame |>
  model("SNAIVE2" = SNAIVE2(value))

model_frame
Return a short model label for a fitted TBATS model.
## S3 method for class 'TBATS'
model_sum(x)
x |
A fitted |
A character string.
Other TBATS:
TBATS(),
fitted.TBATS(),
forecast.TBATS(),
residuals.TBATS()
library(dplyr)
library(tsibble)
library(fabletools)

train_frame <- elec_price |>
  filter(bidding_zone == "DE") |>
  slice_head(n = 24 * 21) |>
  as_tsibble(index = time)

model_frame <- train_frame |>
  model("TBATS" = TBATS(value, periods = c(24, 168)))

model_frame
Calculate the mean percentage error (MPE) between numeric vectors of actual values and forecasts.
mpe_vec(truth, estimate, na_rm = TRUE)
truth |
Numeric vector containing the actual values. |
estimate |
Numeric vector containing the forecasts. |
na_rm |
Logical value. If |
mpe_vec() computes the average signed percentage forecast error:
((truth - estimate) / truth) * 100. Positive values indicate that
forecasts are, on average, below the observed values in percentage terms.
Negative values indicate that forecasts are, on average, above the observed
values.
This metric is undefined when truth is zero and may return
Inf, -Inf, or NaN in such cases.
A numeric value.
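The behaviour at zero can be reproduced with plain R arithmetic. `mpe_sketch()` is a hypothetical stand-in for the formula above and assumes na_rm is forwarded to mean().

```r
# Hypothetical stand-in for the signed mean percentage error.
mpe_sketch <- function(truth, estimate, na_rm = TRUE) {
  mean((truth - estimate) / truth * 100, na.rm = na_rm)
}

# Percentage errors are 20, -10, and 25, so the mean is 35 / 3.
mpe_full <- mpe_sketch(c(10, 20, 40), c(8, 22, 30))

# truth = 0 with a positive forecast: (0 - 1) / 0 is -Inf.
mpe_neg_inf <- mpe_sketch(0, 1)

# truth = 0 with a negative forecast: (0 + 1) / 0 is Inf.
mpe_pos_inf <- mpe_sketch(0, -1)

# truth = 0 with a zero forecast: 0 / 0 is NaN.
# na_rm = FALSE keeps the NaN, because mean(..., na.rm = TRUE) would drop it.
mpe_nan <- mpe_sketch(0, 0, na_rm = FALSE)

c(mpe_full, mpe_neg_inf, mpe_pos_inf, mpe_nan)
```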
Other accuracy functions:
mae_vec(),
make_accuracy(),
make_errors(),
mape_vec(),
me_vec(),
mse_vec(),
rmse_vec(),
smape_vec()
truth <- c(10, 20, 40)
estimate <- c(8, 22, 30)

mpe_vec(truth, estimate)

truth_na <- c(10, 20, NA)
estimate_na <- c(8, 22, 25)

mpe_vec(truth_na, estimate_na)
Calculate the mean squared error (MSE) between numeric vectors of actual values and forecasts.
mse_vec(truth, estimate, na_rm = TRUE)
truth |
Numeric vector containing the actual values. |
estimate |
Numeric vector containing the forecasts. |
na_rm |
Logical value. If |
mse_vec() computes the average squared forecast error
(truth - estimate)^2. The metric is reported in squared units of the
original data.
A numeric value.
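A short base R sketch shows how squaring weights large errors. `mse_sketch()` is a hypothetical stand-in for the formula above and assumes na_rm is forwarded to mean().

```r
# Hypothetical stand-in for the mean squared error.
mse_sketch <- function(truth, estimate, na_rm = TRUE) {
  mean((truth - estimate)^2, na.rm = na_rm)
}

truth <- c(10, 20, 30)
estimate <- c(8, 22, 25)

# Squared errors are 4, 4, and 25, so the mean is 33 / 3 = 11.
mse_value <- mse_sketch(truth, estimate)

# Taking the square root returns to the original units (the RMSE).
rmse_value <- sqrt(mse_value)

# Share of the total squared error caused by the largest single error.
largest_share <- max((truth - estimate)^2) / sum((truth - estimate)^2)

c(mse_value, rmse_value, largest_share)
```

Here the single error of 5 accounts for 25 of the 33 squared units, which is why the MSE is sensitive to outliers.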
Other accuracy functions:
mae_vec(),
make_accuracy(),
make_errors(),
mape_vec(),
me_vec(),
mpe_vec(),
rmse_vec(),
smape_vec()
truth <- c(10, 20, 30)
estimate <- c(8, 22, 25)

mse_vec(truth, estimate)

truth_na <- c(10, 20, NA)
estimate_na <- c(8, 22, 25)

mse_vec(truth_na, estimate_na)
Estimate the sample partial autocorrelation function of a numeric vector.
pacf_vec(x, lag_max = 24, ...)
x |
Numeric vector. |
lag_max |
Integer. Maximum lag for which the partial autocorrelation is estimated. |
... |
Further arguments passed to |
pacf_vec() is a small wrapper around stats::pacf(). It returns
the sample partial autocorrelations as a numeric vector for lags
1 to lag_max.
A numeric vector containing the sample partial autocorrelations for lags
1 to lag_max.
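A wrapper of this kind can be approximated as follows. `pacf_sketch()` is a hypothetical re-implementation for illustration, not the package source; it relies only on stats::pacf() with plot = FALSE and uses a simulated AR(1) series instead of package data.

```r
# Simulated AR(1) series, used here in place of package data.
set.seed(1)
x <- as.numeric(arima.sim(model = list(ar = 0.7), n = 200))

# Hypothetical wrapper: return the partial autocorrelations as a plain vector.
pacf_sketch <- function(x, lag_max = 24, ...) {
  result <- stats::pacf(x, lag.max = lag_max, plot = FALSE, ...)
  # The estimates are stored as an array; drop the dimensions.
  as.numeric(result$acf)
}

pacf_values <- pacf_sketch(x, lag_max = 12)

length(pacf_values)  # one value per lag from 1 to 12
round(pacf_values, 2)
```

Unlike stats::acf(), stats::pacf() does not report lag 0, so no element has to be removed.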
Other data analysis:
acf_vec(),
estimate_acf(),
estimate_kurtosis(),
estimate_mode(),
estimate_pacf(),
estimate_skewness(),
summarise_data(),
summarise_split(),
summarise_stats()
library(dplyr)

x <- M4_monthly_data |>
  filter(series == first(series)) |>
  pull(value)

pacf_vec(
  x = x,
  lag_max = 12
)
Create a bar chart for grouped numeric values.
plot_bar(
  data, x, y,
  position = "dodge",
  facet_var = NULL, facet_scale = "free", facet_nrow = NULL, facet_ncol = NULL,
  color = NULL, flip = FALSE, reorder = FALSE,
  title = NULL, subtitle = NULL, xlab = NULL, ylab = NULL, caption = NULL,
  bar_size = 0.75, bar_color = "grey35", bar_alpha = 1,
  theme_set = theme_tscv(), theme_config = list(),
  ...
)
data |
A |
x |
Unquoted column in |
y |
Unquoted column in |
position |
Character value defining the bar position. Common values are
|
facet_var |
Optional unquoted column in |
facet_scale |
Character value defining facet axis scaling. Common values
are |
facet_nrow |
Optional integer. Number of rows in the facet layout. |
facet_ncol |
Optional integer. Number of columns in the facet layout. |
color |
Optional unquoted column in |
flip |
Logical value. If |
reorder |
Logical value. If |
title |
Character value. Plot title. |
subtitle |
Character value. Plot subtitle. |
xlab |
Character value. Label for the x-axis. |
ylab |
Character value. Label for the y-axis. |
caption |
Character value. Plot caption. |
bar_size |
Numeric value defining the bar border line width. |
bar_color |
Character value defining the bar fill color. Ignored when
|
bar_alpha |
Numeric value between |
theme_set |
A complete |
theme_config |
A named |
... |
Currently not used. |
plot_bar() is a convenience wrapper around ggplot2::geom_bar()
with stat = "identity". It is intended for data that already contains
summarized values, for example accuracy metrics, counts, or grouped summary
statistics.
The arguments x, y, facet_var, and color are
passed as unquoted column names.
If color is supplied, bar fill colors are mapped to that variable and
bar_color is ignored. If color is not supplied, all bars are
drawn using bar_color.
The argument position controls how bars are displayed when
color is supplied. Common values are "dodge" and
"stack".
If flip = TRUE, the x-axis and y-axis are swapped using
ggplot2::coord_flip().
If reorder = TRUE, tidytext::scale_x_reordered() is added.
This is useful when the x-axis has been reordered within facets with
tidytext::reorder_within().
Additional theme settings can be supplied through theme_config. This
should be a named list of arguments passed to ggplot2::theme().
An object of class ggplot.
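For orientation, the building blocks named above can be combined directly in ggplot2. This is a sketch of an equivalent plain-ggplot2 plot with made-up data, not the body of plot_bar(); in particular, applying `theme_config` through do.call() is an assumption about how a named list of theme() arguments might be used.

```r
library(ggplot2)

# Hypothetical summarized data: one value per series.
stats <- data.frame(
  series = c("A", "B", "C"),
  mean = c(12.3, 8.1, 15.7)
)

# Hypothetical extra theme settings, in the shape described for theme_config.
theme_config <- list(legend.position = "bottom")

p <- ggplot(stats, aes(x = series, y = mean)) +
  # stat = "identity" plots the values as given instead of counting rows.
  geom_bar(stat = "identity", fill = "grey35", alpha = 1) +
  labs(title = "Average Value by Series", x = "Series", y = "Mean") +
  # Pass the named list as arguments to theme().
  do.call(theme, theme_config)

# Swap the axes, as described for flip = TRUE.
p_flipped <- p + coord_flip()

p
p_flipped
```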
Other data visualization:
plot_density(),
plot_histogram(),
plot_line(),
plot_point(),
plot_qq(),
scale_color_tscv(),
scale_fill_tscv(),
theme_tscv(),
tscv_cols(),
tscv_pal()
library(dplyr)

context <- list(
  series_id = "series",
  value_id = "value",
  index_id = "index"
)

data <- M4_monthly_data |>
  filter(series %in% c("M23100", "M14395"))

stats <- summarise_stats(
  .data = data,
  context = context
)

plot_bar(
  data = stats,
  x = series,
  y = mean,
  title = "Average Value by Series",
  xlab = "Series",
  ylab = "Mean"
)

acf_data <- estimate_acf(
  .data = data,
  context = context,
  lag_max = 12
)

plot_bar(
  data = acf_data,
  x = lag,
  y = value,
  facet_var = series,
  title = "Autocorrelation by Series",
  subtitle = "Sample autocorrelation up to lag 12",
  xlab = "Lag",
  ylab = "ACF"
)
Create a density plot for one or more numeric variables using kernel density estimation.
plot_density(
  data, x,
  facet_var = NULL, facet_scale = "free", facet_nrow = NULL, facet_ncol = NULL,
  color = NULL, fill = NULL,
  title = NULL, subtitle = NULL, xlab = NULL, ylab = NULL, caption = NULL,
  line_width = 0.1, line_type = "solid", line_color = "grey35",
  fill_color = "grey35", fill_alpha = 0.5,
  theme_set = theme_tscv(), theme_config = list(),
  ...
)
data |
A |
x |
Unquoted column in |
facet_var |
Optional unquoted column in |
facet_scale |
Character value defining facet axis scaling. Common values
are |
facet_nrow |
Optional integer. Number of rows in the facet layout. |
facet_ncol |
Optional integer. Number of columns in the facet layout. |
color |
Optional unquoted column in |
fill |
Optional unquoted column in |
title |
Character value. Plot title. |
subtitle |
Character value. Plot subtitle. |
xlab |
Character value. Label for the x-axis. |
ylab |
Character value. Label for the y-axis. |
caption |
Character value. Plot caption. |
line_width |
Numeric value defining the density line width. |
line_type |
Character or numeric value defining the density line type. |
line_color |
Character value defining the density line color. Ignored
when |
fill_color |
Character value defining the fill color under the density
curve. Ignored when |
fill_alpha |
Numeric value between |
theme_set |
A complete |
theme_config |
A named |
... |
Further arguments passed to |
plot_density() is a convenience wrapper around
ggplot2::geom_density(). It is useful for comparing the distribution
of values across one or more time series, models, groups, or residual sets.
The arguments x, facet_var, color, and fill are
passed as unquoted column names.
If color is supplied, both line color and fill color are mapped to that
variable. In this case, line_color and fill_color are ignored.
If color is not supplied, all density curves use line_color and
fill_color.
Missing values are removed before plotting.
Additional arguments can be passed to ggplot2::geom_density() through
..., for example adjust, bw, or kernel.
Additional theme settings can be supplied through theme_config. This
should be a named list of arguments passed to ggplot2::theme().
An object of class ggplot.
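The grouped case can be reproduced with plain ggplot2 as a rough guide. The data are simulated, the explicit removal of missing values is an assumption about what "removed before plotting" means, and the `linewidth` argument assumes ggplot2 3.4.0 or later; this is not the body of plot_density().

```r
library(ggplot2)

# Simulated values for two hypothetical series, with two missing values.
set.seed(42)
data <- data.frame(
  series = rep(c("A", "B"), each = 100),
  value = c(rnorm(100, mean = 10, sd = 2), rnorm(100, mean = 15, sd = 3))
)
data$value[c(5, 150)] <- NA

# Drop missing values before plotting.
data_complete <- data[!is.na(data$value), ]

p <- ggplot(data_complete, aes(x = value, color = series, fill = series)) +
  # adjust > 1 widens the kernel bandwidth and gives a smoother curve.
  geom_density(alpha = 0.5, linewidth = 0.1, adjust = 1.2) +
  labs(x = "Value", y = "Density")

p
```

Mapping both `color` and `fill` to the same variable keeps each curve's outline and shaded area in matching colors.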
Other data visualization:
plot_bar(),
plot_histogram(),
plot_line(),
plot_point(),
plot_qq(),
scale_color_tscv(),
scale_fill_tscv(),
theme_tscv(),
tscv_cols(),
tscv_pal()
library(dplyr)

data <- M4_monthly_data |>
  filter(series %in% c("M23100", "M14395"))

plot_density(
  data = data,
  x = value,
  facet_var = series,
  title = "Distribution of M4 Monthly Values",
  subtitle = "Kernel density estimates by series",
  xlab = "Value",
  ylab = "Density"
)

plot_density(
  data = data,
  x = value,
  color = series,
  title = "Distribution of M4 Monthly Values",
  subtitle = "Kernel density estimates by series",
  xlab = "Value",
  ylab = "Density",
  adjust = 1.2
)
Create a histogram for one or more numeric variables.
plot_histogram(
  data, x,
  facet_var = NULL, facet_scale = "free", facet_nrow = NULL, facet_ncol = NULL,
  color = NULL, fill = NULL,
  title = NULL, subtitle = NULL, xlab = NULL, ylab = NULL, caption = NULL,
  line_color = "grey35", line_width = 0.5,
  fill_color = "grey35", fill_alpha = 1,
  theme_set = theme_tscv(), theme_config = list(),
  ...
)
data |
A |
x |
Unquoted column in |
facet_var |
Optional unquoted column in |
facet_scale |
Character value defining facet axis scaling. Common values
are |
facet_nrow |
Optional integer. Number of rows in the facet layout. |
facet_ncol |
Optional integer. Number of columns in the facet layout. |
color |
Optional unquoted column in |
fill |
Optional unquoted column in |
title |
Character value. Plot title. |
subtitle |
Character value. Plot subtitle. |
xlab |
Character value. Label for the x-axis. |
ylab |
Character value. Label for the y-axis. |
caption |
Character value. Plot caption. |
line_color |
Character value defining the histogram bar outline color.
Ignored when |
line_width |
Numeric value defining the histogram bar outline width. |
fill_color |
Character value defining the histogram bar fill color.
Ignored when |
fill_alpha |
Numeric value between |
theme_set |
A complete |
theme_config |
A named |
... |
Further arguments passed to |
plot_histogram() is a convenience wrapper around
ggplot2::geom_histogram(). It is useful for visualizing the
distribution of values across one or more time series, models, groups, or
residual sets.
The arguments x, facet_var, color, and fill are
passed as unquoted column names.
If color is supplied, both bar outline color and fill color are mapped
to that variable. In this case, line_color and fill_color are
ignored. If color is not supplied, all histogram bars use
line_color and fill_color.
Missing values are removed before plotting.
Additional arguments can be passed to ggplot2::geom_histogram() through
..., for example bins, binwidth, or boundary.
Additional theme settings can be supplied through theme_config. This
should be a named list of arguments passed to ggplot2::theme().
An object of class ggplot.
Other data visualization:
plot_bar(),
plot_density(),
plot_line(),
plot_point(),
plot_qq(),
scale_color_tscv(),
scale_fill_tscv(),
theme_tscv(),
tscv_cols(),
tscv_pal()
library(dplyr)

data <- M4_monthly_data |>
  filter(series %in% c("M23100", "M14395"))

plot_histogram(
  data = data,
  x = value,
  facet_var = series,
  title = "Distribution of M4 Monthly Values",
  subtitle = "Histograms by series",
  xlab = "Value",
  ylab = "Count",
  bins = 20
)

plot_histogram(
  data = data,
  x = value,
  color = series,
  title = "Distribution of M4 Monthly Values",
  subtitle = "Grouped histograms by series",
  xlab = "Value",
  ylab = "Count",
  bins = 20,
  position = "identity",
  fill_alpha = 0.4
)
Create a line chart for one or more time series or grouped numeric variables.
plot_line(
  data, x, y,
  facet_var = NULL, facet_scale = "free", facet_nrow = NULL, facet_ncol = NULL,
  color = NULL,
  title = NULL, subtitle = NULL, xlab = NULL, ylab = NULL, caption = NULL,
  line_size = 0.75, line_type = "solid", line_color = "grey35", line_alpha = 1,
  theme_set = theme_tscv(), theme_config = list(),
  ...
)
data |
A |
x |
Unquoted column in |
y |
Unquoted column in |
facet_var |
Optional unquoted column in |
facet_scale |
Character value defining facet axis scaling. Common values
are |
facet_nrow |
Optional integer. Number of rows in the facet layout. |
facet_ncol |
Optional integer. Number of columns in the facet layout. |
color |
Optional unquoted column in |
title |
Character value. Plot title. |
subtitle |
Character value. Plot subtitle. |
xlab |
Character value. Label for the x-axis. |
ylab |
Character value. Label for the y-axis. |
caption |
Character value. Plot caption. |
line_size |
Numeric value defining the line width. |
line_type |
Character or numeric value defining the line type. |
line_color |
Character value defining the line color. Ignored when
|
line_alpha |
Numeric value between |
theme_set |
A complete |
theme_config |
A named |
... |
Currently not used. |
plot_line() is a convenience wrapper around ggplot2::geom_line()
for plotting data in long format. It supports optional grouping by color,
optional faceting, and the default tscv theme.
The arguments x, y, facet_var, and color are
passed as unquoted column names.
If color is supplied, line colors are mapped to that variable and
line_color is ignored. If color is not supplied, all lines are
drawn using line_color.
Additional theme settings can be supplied through theme_config. This
should be a named list of arguments passed to ggplot2::theme().
An object of class ggplot.
Other data visualization:
plot_bar(),
plot_density(),
plot_histogram(),
plot_point(),
plot_qq(),
scale_color_tscv(),
scale_fill_tscv(),
theme_tscv(),
tscv_cols(),
tscv_pal()
library(dplyr)

data <- M4_monthly_data |>
  filter(series %in% c("M23100", "M14395"))

plot_line(
  data = data,
  x = index,
  y = value,
  facet_var = series,
  title = "M4 Monthly Time Series",
  subtitle = "Selected monthly series from the M4 forecasting competition",
  xlab = "Time",
  ylab = "Value",
  caption = "Data: M4 Forecasting Competition"
)

plot_line(
  data = data,
  x = index,
  y = value,
  color = series,
  title = "M4 Monthly Time Series",
  xlab = "Time",
  ylab = "Value",
  line_size = 1.5
)
Create a scatterplot for two variables.
plot_point(
  data,
  x,
  y,
  facet_var = NULL,
  facet_scale = "free",
  facet_nrow = NULL,
  facet_ncol = NULL,
  color = NULL,
  title = NULL,
  subtitle = NULL,
  xlab = NULL,
  ylab = NULL,
  caption = NULL,
  point_size = 1.5,
  point_type = 16,
  point_color = "grey35",
  point_alpha = 1,
  theme_set = theme_tscv(),
  theme_config = list(),
  ...
)
data |
A |
x |
Unquoted column in |
y |
Unquoted column in |
facet_var |
Optional unquoted column in |
facet_scale |
Character value defining facet axis scaling. Common values
are |
facet_nrow |
Optional integer. Number of rows in the facet layout. |
facet_ncol |
Optional integer. Number of columns in the facet layout. |
color |
Optional unquoted column in |
title |
Character value. Plot title. |
subtitle |
Character value. Plot subtitle. |
xlab |
Character value. Label for the x-axis. |
ylab |
Character value. Label for the y-axis. |
caption |
Character value. Plot caption. |
point_size |
Numeric value defining the point size. |
point_type |
Numeric or character value defining the point shape. |
point_color |
Character value defining the point color. Ignored when
|
point_alpha |
Numeric value between |
theme_set |
A complete |
theme_config |
A named |
... |
Currently not used. |
plot_point() is a convenience wrapper around
ggplot2::geom_point(). It is useful for plotting relationships between
two variables, for example observed values over time, forecast errors by
horizon, or one numeric diagnostic against another.
The arguments x, y, facet_var, and color are
passed as unquoted column names.
If color is supplied, point colors are mapped to that variable and
point_color is ignored. If color is not supplied, all points are
drawn using point_color.
Additional theme settings can be supplied through theme_config. This
should be a named list of arguments passed to ggplot2::theme().
An object of class ggplot.
Other data visualization:
plot_bar(),
plot_density(),
plot_histogram(),
plot_line(),
plot_qq(),
scale_color_tscv(),
scale_fill_tscv(),
theme_tscv(),
tscv_cols(),
tscv_pal()
library(dplyr)

data <- M4_monthly_data |>
  filter(series == "M23100")

plot_point(
  data = data,
  x = index,
  y = value,
  title = "M4 Monthly Time Series",
  subtitle = "Series M23100",
  xlab = "Time",
  ylab = "Value"
)

acf_data <- estimate_acf(
  .data = M4_monthly_data |>
    filter(series %in% c("M23100", "M14395")),
  context = list(
    series_id = "series",
    value_id = "value",
    index_id = "index"
  ),
  lag_max = 12
)

plot_point(
  data = acf_data,
  x = lag,
  y = value,
  color = series,
  title = "Autocorrelation by Series",
  subtitle = "Sample autocorrelation up to lag 12",
  xlab = "Lag",
  ylab = "ACF",
  point_size = 4
)
Create a quantile-quantile plot for one or more numeric variables.
plot_qq(
  data,
  x,
  facet_var = NULL,
  facet_scale = "free",
  facet_nrow = NULL,
  facet_ncol = NULL,
  color = NULL,
  title = NULL,
  subtitle = NULL,
  xlab = NULL,
  ylab = NULL,
  caption = NULL,
  point_size = 2,
  point_shape = 16,
  point_color = "grey35",
  point_fill = "grey35",
  point_alpha = 0.25,
  line_width = 0.25,
  line_type = "solid",
  line_color = "grey35",
  line_alpha = 1,
  band_color = "grey35",
  band_alpha = 0.25,
  theme_set = theme_tscv(),
  theme_config = list(),
  ...
)
data |
A |
x |
Unquoted column in |
facet_var |
Optional unquoted column in |
facet_scale |
Character value defining facet axis scaling. Common values
are |
facet_nrow |
Optional integer. Number of rows in the facet layout. |
facet_ncol |
Optional integer. Number of columns in the facet layout. |
color |
Optional unquoted column in |
title |
Character value. Plot title. |
subtitle |
Character value. Plot subtitle. |
xlab |
Character value. Label for the x-axis. |
ylab |
Character value. Label for the y-axis. |
caption |
Character value. Plot caption. |
point_size |
Numeric value defining the point size. |
point_shape |
Numeric or character value defining the point shape. |
point_color |
Character value defining the point outline color. Ignored
when |
point_fill |
Character value defining the point fill color. Ignored when
|
point_alpha |
Numeric value between |
line_width |
Numeric value defining the qq-line width. |
line_type |
Character or numeric value defining the qq-line type. |
line_color |
Character value defining the qq-line color. Ignored when
|
line_alpha |
Numeric value between |
band_color |
Character value defining the confidence-band fill color.
Ignored when |
band_alpha |
Numeric value between |
theme_set |
A complete |
theme_config |
A named |
... |
Further arguments passed to |
plot_qq() is a convenience wrapper around the qqplotr
functions stat_qq_point(), stat_qq_line(), and
stat_qq_band(). It is useful for checking whether values, residuals,
or forecast errors approximately follow a theoretical distribution.
By default, the function creates a normal quantile-quantile plot with pointwise confidence bands.
The arguments x, facet_var, and color are passed as
unquoted column names.
If color is supplied, point colors, line colors, and confidence-band
fills are mapped to that variable. In this case, point_color,
point_fill, line_color, and band_color are ignored.
If color is not supplied, the fixed styling arguments are used.
Additional arguments can be passed to the underlying qqplotr
statistics through ..., for example distributional arguments supported
by qqplotr.
Additional theme settings can be supplied through theme_config. This
should be a named list of arguments passed to ggplot2::theme().
An object of class ggplot.
Other data visualization:
plot_bar(),
plot_density(),
plot_histogram(),
plot_line(),
plot_point(),
scale_color_tscv(),
scale_fill_tscv(),
theme_tscv(),
tscv_cols(),
tscv_pal()
library(dplyr)

data <- M4_monthly_data |>
  filter(series %in% c("M23100", "M14395"))

plot_qq(
  data = data,
  x = value,
  facet_var = series,
  title = "QQ Plot of M4 Monthly Values",
  subtitle = "Normal quantile-quantile plots by series",
  xlab = "Theoretical quantiles",
  ylab = "Sample quantiles"
)

stats <- data |>
  group_by(series) |>
  mutate(value_centered = value - mean(value, na.rm = TRUE)) |>
  ungroup()

plot_qq(
  data = stats,
  x = value_centered,
  color = series,
  title = "QQ Plot of Centered M4 Monthly Values",
  subtitle = "Normal quantile-quantile plots by series",
  xlab = "Theoretical quantiles",
  ylab = "Sample quantiles"
)
Extract residuals from a fitted DSHW model.
## S3 method for class 'DSHW'
residuals(object, ...)
object |
A fitted |
... |
Additional arguments. Currently not used. |
Residuals.
Other DSHW:
DSHW(),
fitted.DSHW(),
forecast.DSHW(),
model_sum.DSHW()
library(dplyr)
library(tsibble)
library(fabletools)

train_frame <- elec_load |>
  filter(bidding_zone == "DE") |>
  slice_head(n = 24 * 28) |>
  as_tsibble(index = time)

model_frame <- train_frame |>
  model("DSHW" = DSHW(value, periods = c(24, 168)))

residuals(model_frame)
Extract residuals from a fitted MEDIAN model.
## S3 method for class 'MEDIAN'
residuals(object, ...)
object |
A fitted |
... |
Additional arguments. Currently not used. |
Residuals.
Other MEDIAN:
MEDIAN(),
fitted.MEDIAN(),
forecast.MEDIAN(),
model_sum.MEDIAN()
library(dplyr)
library(tsibble)
library(fabletools)

train_frame <- M4_monthly_data |>
  filter(series == first(series)) |>
  as_tsibble(index = index)

model_frame <- train_frame |>
  model("MEDIAN" = MEDIAN(value ~ window()))

residuals(model_frame)
Extract residuals from a fitted SMEAN model.
## S3 method for class 'SMEAN'
residuals(object, ...)
object |
A fitted |
... |
Additional arguments. Currently not used. |
Residuals.
Other SMEAN:
SMEAN(),
fitted.SMEAN(),
forecast.SMEAN(),
model_sum.SMEAN()
library(dplyr)
library(tsibble)
library(fabletools)

train_frame <- M4_monthly_data |>
  filter(series == first(series)) |>
  as_tsibble(index = index)

model_frame <- train_frame |>
  model("SMEAN" = SMEAN(value ~ lag("year")))

residuals(model_frame)
Extract residuals from a fitted SMEDIAN model.
## S3 method for class 'SMEDIAN'
residuals(object, ...)
object |
A fitted |
... |
Additional arguments. Currently not used. |
Residuals.
Other SMEDIAN:
SMEDIAN(),
fitted.SMEDIAN(),
forecast.SMEDIAN(),
model_sum.SMEDIAN()
library(dplyr)
library(tsibble)
library(fabletools)

train_frame <- elec_price |>
  filter(bidding_zone == "DE") |>
  slice_head(n = 24 * 21) |>
  as_tsibble(index = time)

model_frame <- train_frame |>
  model("SMEDIAN" = SMEDIAN(value ~ lag("week")))

residuals(model_frame)
Extract residuals from a fitted SNAIVE2 model.
## S3 method for class 'SNAIVE2'
residuals(object, ...)
object |
A fitted |
... |
Additional arguments. Currently not used. |
Residuals.
Other SNAIVE2:
SNAIVE2(),
fitted.SNAIVE2(),
forecast.SNAIVE2(),
model_sum.SNAIVE2()
library(dplyr)
library(tsibble)
library(fabletools)

train_frame <- elec_price |>
  filter(bidding_zone == "DE") |>
  slice_head(n = 24 * 21) |>
  as_tsibble(index = time)

model_frame <- train_frame |>
  model("SNAIVE2" = SNAIVE2(value))

residuals(model_frame)
Extract residuals from a fitted TBATS model.
## S3 method for class 'TBATS'
residuals(object, ...)
object |
A fitted |
... |
Additional arguments. Currently not used. |
This method is used by residuals() when extracting residuals from a
mable containing a TBATS model.
Residuals.
Other TBATS:
TBATS(),
fitted.TBATS(),
forecast.TBATS(),
model_sum.TBATS()
library(dplyr)
library(tsibble)
library(fabletools)

train_frame <- elec_price |>
  filter(bidding_zone == "DE") |>
  slice_head(n = 24 * 21) |>
  as_tsibble(index = time)

model_frame <- train_frame |>
  model("TBATS" = TBATS(value, periods = c(24, 168)))

residuals(model_frame)
Calculate the root mean squared error between actual values and forecasts.
rmse_vec(truth, estimate, na_rm = TRUE)
truth |
Numeric vector containing the actual values. |
estimate |
Numeric vector containing the forecasts. |
na_rm |
Logical value. If |
rmse_vec() computes the square root of the mean squared forecast error.
The metric is reported in the same units as the original data.
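As a cross-check, the definition above can be reproduced in base R. This is a minimal sketch, not the package's implementation: rmse_manual() is a hypothetical helper, and passing na_rm straight to mean() is an assumption about how missing values are handled.

```r
# Hypothetical helper, not part of tscv: RMSE computed from its definition.
rmse_manual <- function(truth, estimate, na_rm = TRUE) {
  error <- estimate - truth
  sqrt(mean(error^2, na.rm = na_rm))
}

truth <- c(10, 20, 30)
estimate <- c(8, 22, 25)

# Squared errors are 4, 4, and 25, so the result equals sqrt(11).
rmse_manual(truth, estimate)
```

Because the squared errors are averaged before the square root is taken, large errors weigh more heavily than they do in the mean absolute error.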
A numeric value.
Other accuracy functions:
mae_vec(),
make_accuracy(),
make_errors(),
mape_vec(),
me_vec(),
mpe_vec(),
mse_vec(),
smape_vec()
truth <- c(10, 20, 30)
estimate <- c(8, 22, 25)

rmse_vec(truth, estimate)

truth_na <- c(10, 20, NA)
estimate_na <- c(8, 22, 25)

rmse_vec(truth_na, estimate_na)
Create a ggplot2 color scale based on a predefined tscv palette.
scale_color_tscv(palette = "main", discrete = TRUE, reverse = FALSE, ...)
palette |
Character value. Name of the palette. |
discrete |
Logical value. If |
reverse |
Logical value. If |
... |
Additional arguments passed to |
scale_color_tscv() creates either a discrete or continuous color scale
for the color aesthetic.
For discrete variables, the function uses ggplot2::discrete_scale().
For continuous variables, it uses ggplot2::scale_color_gradientn().
Available palettes are "main", "cool", "hot",
"mixed", and "grey".
A ggplot2 scale object.
Other data visualization:
plot_bar(),
plot_density(),
plot_histogram(),
plot_line(),
plot_point(),
plot_qq(),
scale_fill_tscv(),
theme_tscv(),
tscv_cols(),
tscv_pal()
library(dplyr)

data <- M4_monthly_data |>
  filter(series %in% c("M23100", "M14395"))

plot_line(
  data = data,
  x = index,
  y = value,
  color = series,
  title = "M4 Monthly Time Series",
  subtitle = "Selected monthly series",
  xlab = "Time",
  ylab = "Value"
) +
  scale_color_tscv(palette = "main")
Create a ggplot2 fill scale based on a predefined tscv palette.
scale_fill_tscv(palette = "main", discrete = TRUE, reverse = FALSE, ...)
palette |
Character value. Name of the palette. |
discrete |
Logical value. If |
reverse |
Logical value. If |
... |
Additional arguments passed to |
scale_fill_tscv() creates either a discrete or continuous fill scale
for the fill aesthetic.
For discrete variables, the function uses ggplot2::discrete_scale().
For continuous variables, it uses ggplot2::scale_fill_gradientn().
Available palettes are "main", "cool", "hot",
"mixed", and "grey".
A ggplot2 scale object.
Other data visualization:
plot_bar(),
plot_density(),
plot_histogram(),
plot_line(),
plot_point(),
plot_qq(),
scale_color_tscv(),
theme_tscv(),
tscv_cols(),
tscv_pal()
library(dplyr)

context <- list(
  series_id = "series",
  value_id = "value",
  index_id = "index"
)

data <- M4_monthly_data |>
  filter(series %in% c("M23100", "M14395"))

stats <- summarise_stats(
  .data = data,
  context = context
)

plot_bar(
  data = stats,
  x = series,
  y = mean,
  color = series,
  title = "Average Value by Series",
  xlab = "Series",
  ylab = "Mean"
) +
  scale_fill_tscv(palette = "main")
Extract test observations from a complete time series data set according to a
split plan created by make_split().
slice_test(main_frame, split_frame, context)
main_frame |
A |
split_frame |
A |
context |
A named |
slice_test() uses the row positions stored in the test
list-column of split_frame to extract the corresponding observations
from main_frame. The function is designed for rolling-origin time
series cross-validation workflows.
The returned data has the same columns as main_frame, plus a
split column identifying the train-test split. If main_frame
contains multiple time series, slicing is performed separately for each
series using the series identifier supplied in context.
When make_split() was called with n_lag > 0, the test data may
include lagged observations before the forecast horizon.
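The slicing idea can be illustrated with a toy example in base R. This is only a sketch under stated assumptions: main and test_rows are made-up stand-ins for main_frame and the test list-column of split_frame, and the actual function may differ in details such as the handling of multiple series.

```r
# Toy stand-ins (hypothetical) for main_frame and the test list-column.
main <- data.frame(index = 1:10, value = seq(10, 100, by = 10))
test_rows <- list(7:8, 9:10)

# Extract the rows of each split and label them with the split number.
test_frame <- do.call(rbind, lapply(seq_along(test_rows), function(i) {
  cbind(main[test_rows[[i]], ], split = i)
}))

test_frame
```

Each element of the list contributes one block of rows, so the result keeps the original columns and gains a split identifier.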
A tibble containing the sliced test data. It contains the same columns
as main_frame, plus a split column.
Other time series cross-validation:
make_future(),
make_split(),
make_tsibble(),
slice_train(),
split_index()
library(dplyr)

context <- list(
  series_id = "series",
  value_id = "value",
  index_id = "index"
)

main_frame <- M4_monthly_data |>
  filter(series == "M23100")

split_frame <- make_split(
  main_frame = main_frame,
  context = context,
  type = "first",
  value = 120,
  n_ahead = 18,
  n_skip = 17,
  n_lag = 0,
  mode = "stretch",
  exceed = FALSE
)

test_frame <- slice_test(
  main_frame = main_frame,
  split_frame = split_frame,
  context = context
)

test_frame
Extract training observations from a complete time series data set according
to a split plan created by make_split().
slice_train(main_frame, split_frame, context)
main_frame |
A |
split_frame |
A |
context |
A named |
slice_train() uses the row positions stored in the train
list-column of split_frame to extract the corresponding observations
from main_frame. The function is designed for rolling-origin time
series cross-validation workflows.
The returned data has the same columns as main_frame, plus a
split column identifying the train-test split. If main_frame
contains multiple time series, slicing is performed separately for each
series using the series identifier supplied in context.
A tibble containing the sliced training data. It contains the same
columns as main_frame, plus a split column.
Other time series cross-validation:
make_future(),
make_split(),
make_tsibble(),
slice_test(),
split_index()
library(dplyr)

context <- list(
  series_id = "series",
  value_id = "value",
  index_id = "index"
)

main_frame <- M4_monthly_data |>
  filter(series == "M23100")

split_frame <- make_split(
  main_frame = main_frame,
  context = context,
  type = "first",
  value = 120,
  n_ahead = 18,
  n_skip = 17,
  n_lag = 0,
  mode = "stretch",
  exceed = FALSE
)

train_frame <- slice_train(
  main_frame = main_frame,
  split_frame = split_frame,
  context = context
)

train_frame
Calculate the symmetric mean absolute percentage error between actual values and forecasts.
smape_vec(truth, estimate, na_rm = TRUE)
truth |
Numeric vector containing the actual values. |
estimate |
Numeric vector containing the forecasts. |
na_rm |
Logical value. If |
smape_vec() computes the symmetric mean absolute percentage error, defined as the mean over all observations of
abs(estimate - truth) / ((abs(truth) + abs(estimate)) / 2) * 100.
This metric is undefined when both truth and estimate are zero
and may return NaN in such cases.
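The formula and the zero-denominator case can be checked in base R. This is a minimal sketch, not the package's code: smape_manual() is a hypothetical helper that averages the per-observation ratios, and its treatment of na_rm is an assumption.

```r
# Hypothetical helper, not part of tscv: sMAPE computed from the formula.
smape_manual <- function(truth, estimate, na_rm = TRUE) {
  ratio <- abs(estimate - truth) / ((abs(truth) + abs(estimate)) / 2)
  mean(ratio, na.rm = na_rm) * 100
}

truth <- c(10, 20, 40)
estimate <- c(8, 22, 30)

# Per-observation ratios are 2/9, 2/21, and 10/35.
smape_manual(truth, estimate)

# A pair of zeros gives 0 / 0, which R evaluates to NaN.
smape_manual(truth = 0, estimate = 0, na_rm = FALSE)
```

The denominator is the average magnitude of the actual value and the forecast, which is why the ratio is undefined only when both are zero.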
A numeric value.
Other accuracy functions:
mae_vec(),
make_accuracy(),
make_errors(),
mape_vec(),
me_vec(),
mpe_vec(),
mse_vec(),
rmse_vec()
truth <- c(10, 20, 40)
estimate <- c(8, 22, 30)

smape_vec(truth, estimate)

truth_na <- c(10, 20, NA)
estimate_na <- c(8, 22, 25)

smape_vec(truth_na, estimate_na)
Specify a seasonal mean benchmark model for use with
fabletools::model().
SMEAN(formula, ...)
formula |
A model formula specifying the response and |
... |
Further arguments. |
SMEAN() forecasts each future observation using the historical mean of
the matching seasonal position. Use the lag() special to define the
seasonal period, for example lag("year") for monthly data.
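The idea of averaging by seasonal position can be sketched in base R. The helper seasonal_mean_sketch() is hypothetical and is not how SMEAN() is implemented. It assumes a plain numeric vector that starts at seasonal position 1, a fixed integer period, and at least one observation for every position.

```r
# Hypothetical helper, not part of tscv: seasonal mean forecast for a
# numeric vector `y` with a fixed seasonal `period` and horizon `h`.
seasonal_mean_sketch <- function(y, period, h) {
  # Seasonal position (1, ..., period) of each observation.
  position <- (seq_along(y) - 1) %% period + 1
  # Historical mean for each seasonal position.
  position_mean <- tapply(y, position, mean, na.rm = TRUE)
  # Seasonal positions of the next h observations.
  future_position <- (length(y) + seq_len(h) - 1) %% period + 1
  as.numeric(position_mean[as.character(future_position)])
}

# Two complete cycles of length 3; the position means are 2, 3, and 4.
y <- c(1, 2, 3, 3, 4, 5)
seasonal_mean_sketch(y, period = 3, h = 4)
```

Forecasts simply cycle through the position means, so a horizon longer than one period repeats the same pattern.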
A model definition that can be used inside fabletools::model().
Other SMEAN:
fitted.SMEAN(),
forecast.SMEAN(),
model_sum.SMEAN(),
residuals.SMEAN()
library(dplyr)
library(tsibble)
library(fabletools)

train_frame <- M4_monthly_data |>
  filter(series == first(series)) |>
  as_tsibble(index = index)

model_frame <- train_frame |>
  model("SMEAN" = SMEAN(value ~ lag("year")))

model_frame
Specify a seasonal median benchmark model for use with
fabletools::model().
SMEDIAN(formula, ...)
formula |
A model formula specifying the response and |
... |
Further arguments. |
SMEDIAN() forecasts each future observation using the historical median
of the matching seasonal position. Use the lag() special to define the
seasonal period, for example lag("week") for hourly data with weekly
seasonality.
A model definition that can be used inside fabletools::model().
Other SMEDIAN:
fitted.SMEDIAN(),
forecast.SMEDIAN(),
model_sum.SMEDIAN(),
residuals.SMEDIAN()
library(dplyr)
library(tsibble)
library(fabletools)

train_frame <- elec_price |>
  filter(bidding_zone == "DE") |>
  slice_head(n = 24 * 21) |>
  as_tsibble(index = time)

model_frame <- train_frame |>
  model("SMEDIAN" = SMEDIAN(value ~ lag("week")))

model_frame
Identify outliers in a numeric time series and replace them with smoothed values.
smooth_outlier(x, periods, ...)
x |
Numeric vector containing the time series observations. |
periods |
Numeric vector giving the seasonal periods of the time series,
for example |
... |
Further arguments passed to |
smooth_outlier() is a small wrapper around
forecast::tsoutliers(). The input vector is first converted to an
msts object using the seasonal periods supplied in periods.
For non-seasonal time series, forecast::tsoutliers() uses a
supsmu-based approach. For seasonal time series, the series is
decomposed using STL and outliers are identified on the remainder component.
Detected outliers are replaced by the replacement values returned by
forecast::tsoutliers().
The function returns a plain numeric vector with the same length as the input.
A numeric vector where detected outliers are replaced by smoothed values.
Other data preparation:
check_data(),
interpolate_missing()
library(dplyr)

x <- M4_monthly_data |>
  filter(series == first(series)) |>
  pull(value)

x_outlier <- x
x_outlier[20] <- x_outlier[20] * 5

x_smoothed <- smooth_outlier(
  x = x_outlier,
  periods = 12
)

x_outlier[20]
x_smoothed[20]

hourly <- elec_price |>
  filter(bidding_zone == "DE") |>
  slice_head(n = 24 * 14) |>
  pull(value)

hourly_outlier <- hourly
hourly_outlier[48] <- hourly_outlier[48] * 5

smooth_outlier(
  x = hourly_outlier,
  periods = c(24, 168)
)
Specify a seasonal naive benchmark model for use with
fabletools::model().
SNAIVE2(formula, ...)
formula |
A model formula specifying the response variable, for example
|
... |
Further arguments. |
SNAIVE2() is intended for hourly time series. It uses a daily lag for
Tuesday to Friday observations and a weekly lag otherwise. This can be useful
for electricity price or load data where weekdays have similar intraday
structure and weekends require a weekly comparison.
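The weekday rule described above can be sketched in base R. This is an illustration of the lag selection only, not the model's implementation; the example timestamps and the variable names are made up, and the time zone is fixed to UTC to avoid daylight-saving effects.

```r
# One week of hourly timestamps starting on Monday, 2024-01-01 (UTC).
time <- seq(
  from = as.POSIXct("2024-01-01 00:00:00", tz = "UTC"),
  by = "hour",
  length.out = 24 * 7
)

# POSIXlt weekday codes: 0 = Sunday, 1 = Monday, ..., 6 = Saturday.
wday <- as.POSIXlt(time, tz = "UTC")$wday

# Daily lag (24 hours) for Tuesday to Friday, weekly lag (168 hours) otherwise.
lag_hours <- ifelse(wday %in% 2:5, 24L, 168L)

table(lag_hours)
```

Monday falls back to the weekly lag because its previous day is a Sunday, which usually has a different intraday profile.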
A model definition that can be used inside fabletools::model().
Other SNAIVE2:
fitted.SNAIVE2(),
forecast.SNAIVE2(),
model_sum.SNAIVE2(),
residuals.SNAIVE2()
library(dplyr)
library(tsibble)
library(fabletools)

train_frame <- elec_price |>
  filter(bidding_zone == "DE") |>
  slice_head(n = 24 * 21) |>
  as_tsibble(index = time)

model_frame <- train_frame |>
  model("SNAIVE2" = SNAIVE2(value))

model_frame
Create train and test indices for time series cross-validation.
split_index(
  n_total,
  n_init,
  n_ahead,
  n_skip = 0,
  n_lag = 0,
  mode = "slide",
  exceed = FALSE
)
n_total |
Integer. The total number of observations in the time series. |
n_init |
Integer. The number of observations in the initial training window. |
n_ahead |
Integer. The forecast horizon, i.e. the number of observations in each test window. |
n_skip |
Integer. The number of observations to skip between split
origins. The default is |
n_lag |
Integer. The number of lagged observations to include before the
test window. The default is |
mode |
Character value. Either |
exceed |
Logical value. If |
split_index() creates integer index vectors for rolling-origin
resampling. The function can create either fixed-window or expanding-window
splits:
mode = "slide" creates a fixed training window that moves
forward over time.
mode = "stretch" creates an expanding training window that
always starts at the first observation.
The first training window contains n_init observations. Each test
window contains n_ahead observations. The argument n_skip
controls how many observations are skipped between consecutive split origins.
For example, with n_ahead = 18 and n_skip = 17, consecutive
test windows are non-overlapping.
If n_lag > 0, the test indices include lagged observations before the
forecast horizon. This is useful when lagged predictors are needed for
constructing features during testing.
If exceed = TRUE, additional out-of-sample test indices are allowed to
exceed the original sample size.
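The index logic described above can be approximated in base R. The following is a sketch under stated assumptions and is not the code of split_index(): rolling_index_sketch() is a hypothetical helper, it assumes that consecutive origins are n_skip + 1 observations apart, and it ignores the exceed option by keeping only test windows that fit within n_total.

```r
# Hypothetical helper, not part of tscv. Assumes origins advance by
# n_skip + 1 observations and that every test window fits within n_total.
rolling_index_sketch <- function(n_total, n_init, n_ahead,
                                 n_skip = 0, n_lag = 0, mode = "slide") {
  step <- n_skip + 1
  # Last training index of every split whose test window still fits.
  train_end <- seq(from = n_init, to = n_total - n_ahead, by = step)
  train <- lapply(train_end, function(end) {
    start <- if (mode == "slide") end - n_init + 1 else 1
    seq(start, end)
  })
  # Test indices, optionally preceded by n_lag lagged observations.
  test <- lapply(train_end, function(end) {
    seq(end - n_lag + 1, end + n_ahead)
  })
  list(train = train, test = test)
}

index <- rolling_index_sketch(
  n_total = 180, n_init = 120, n_ahead = 18, n_skip = 17, mode = "stretch"
)

# First and last test index of each split (one column per split).
sapply(index$test, range)
```

Under these assumptions, n_ahead = 18 with n_skip = 17 moves each origin forward by exactly one forecast horizon, which is why the test windows sit next to each other without overlapping.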
A list with two elements:
train: a list of integer vectors with training indices.
test: a list of integer vectors with test indices.
Other time series cross-validation:
make_future(),
make_split(),
make_tsibble(),
slice_test(),
slice_train()
# Fixed-window splits
fixed_index <- split_index(
  n_total = 180,
  n_init = 120,
  n_ahead = 18,
  n_skip = 17,
  n_lag = 0,
  mode = "slide",
  exceed = FALSE
)

fixed_index

# Expanding-window splits
expanding_index <- split_index(
  n_total = 180,
  n_init = 120,
  n_ahead = 18,
  n_skip = 17,
  n_lag = 0,
  mode = "stretch",
  exceed = FALSE
)

expanding_index
Calculate basic data-quality summary statistics for one or more time series.
summarise_data(.data, context)
.data |
A |
context |
A named |
summarise_data() groups the input data by the series identifier
supplied in context and returns one row per time series.
The function reports:
start: first time index;
end: last time index;
n_obs: number of observations;
n_missing: number of missing values;
pct_missing: percentage of missing values;
n_zeros: number of zero values;
pct_zeros: percentage of zero values.
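For a single series, the count-based statistics listed above can be reproduced in base R. This is only a sketch: the vector x is made up, and expressing both percentages relative to the total number of observations is an assumption about the package's definition.

```r
# Made-up series with two missing values and three zeros.
x <- c(0, 5, NA, 0, 3, NA, 7, 0)

n_obs <- length(x)
n_missing <- sum(is.na(x))
pct_missing <- 100 * n_missing / n_obs  # assumed denominator: n_obs
n_zeros <- sum(x == 0, na.rm = TRUE)
pct_zeros <- 100 * n_zeros / n_obs      # assumed denominator: n_obs

c(
  n_obs = n_obs,
  n_missing = n_missing,
  pct_missing = pct_missing,
  n_zeros = n_zeros,
  pct_zeros = pct_zeros
)
```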
A tibble containing one row per time series and the calculated summary
statistics.
Other data analysis:
acf_vec(),
estimate_acf(),
estimate_kurtosis(),
estimate_mode(),
estimate_pacf(),
estimate_skewness(),
pacf_vec(),
summarise_split(),
summarise_stats()
library(dplyr)

context <- list(
  series_id = "series",
  value_id = "value",
  index_id = "index"
)

data <- M4_monthly_data |>
  filter(series %in% c("M23100", "M14395"))

summarise_data(
  .data = data,
  context = context
)
Summarise the time and row-index ranges of training and test samples.
summarise_split(data)
data |
A valid |
summarise_split() is intended for data sets that contain sliced
training and test observations from a time series cross-validation workflow.
The input must be a tsibble with the columns split,
sample, and id:
split: train-test split identifier;
sample: sample label, usually "train" or "test";
id: integer row index within the original time series.
The function returns one row per split. For each split, it reports the time range and index range of each sample.
A tibble containing the summarized split ranges.
Other data analysis:
acf_vec(),
estimate_acf(),
estimate_kurtosis(),
estimate_mode(),
estimate_pacf(),
estimate_skewness(),
pacf_vec(),
summarise_data(),
summarise_stats()
library(dplyr)
library(tsibble)

context <- list(
  series_id = "bidding_zone",
  value_id = "value",
  index_id = "time"
)

main_frame <- elec_price |>
  filter(bidding_zone == "DE") |>
  slice_head(n = 120)

split_frame <- make_split(
  main_frame = main_frame,
  context = context,
  type = "first",
  value = 48,
  n_ahead = 24,
  n_skip = 23,
  n_lag = 0,
  mode = "stretch",
  exceed = FALSE
)

train_frame <- slice_train(
  main_frame = main_frame,
  split_frame = split_frame,
  context = context
) |>
  mutate(sample = "train")

test_frame <- slice_test(
  main_frame = main_frame,
  split_frame = split_frame,
  context = context
) |>
  mutate(sample = "test")

split_data <- bind_rows(train_frame, test_frame) |>
  group_by(bidding_zone, split, sample) |>
  mutate(id = row_number()) |>
  ungroup() |>
  as_tsibble(
    index = time,
    key = c(bidding_zone, split, sample)
  )

summarise_split(split_data)
Calculate descriptive statistics for one or more time series.
summarise_stats(.data, context)
.data |
A |
context |
A named |
summarise_stats() groups the input data by the series identifier
supplied in context and returns one row per time series.
The function reports:
mean: arithmetic mean;
median: median;
mode: kernel-density based mode estimate;
sd: standard deviation;
p0: minimum;
p25: 25 percent quantile;
p75: 75 percent quantile;
p100: maximum;
skewness: moment-based skewness;
kurtosis: moment-based kurtosis.
Missing values are removed when calculating the statistics.
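The statistics listed above can be approximated for a single numeric vector with base R. This is a minimal sketch, not the package's implementation: the helper name describe_vec() is hypothetical, the mode is taken as the peak of stats::density() with its default settings, and skewness and kurtosis use the plain moment ratios m3 / m2^1.5 and m4 / m2^2. The estimators used inside summarise_stats() may differ in detail.

```r
# Hypothetical helper: descriptive statistics for one numeric vector.
describe_vec <- function(x) {
  x <- x[!is.na(x)]                       # remove missing values first
  m <- mean(x)
  m2 <- mean((x - m)^2)                   # second central moment
  m3 <- mean((x - m)^3)                   # third central moment
  m4 <- mean((x - m)^4)                   # fourth central moment
  dens <- stats::density(x)               # kernel density estimate
  q <- stats::quantile(x, probs = c(0, 0.25, 0.75, 1), names = FALSE)
  data.frame(
    mean = m,
    median = stats::median(x),
    mode = dens$x[which.max(dens$y)],     # location of the density peak
    sd = stats::sd(x),
    p0 = q[1],
    p25 = q[2],
    p75 = q[3],
    p100 = q[4],
    skewness = m3 / m2^1.5,
    kurtosis = m4 / m2^2
  )
}

describe_vec(c(2, 4, 4, 5, 7, 9, NA))

# One row per series, mimicking the grouping by the series identifier
df <- data.frame(
  series = rep(c("A", "B"), each = 5),
  value = c(1, 2, 3, 4, 5, 2, 2, 3, 8, NA)
)
do.call(rbind, lapply(split(df$value, df$series), describe_vec))
```

Splitting the value column by the series column and binding the per-series results gives one row per series, which mirrors the shape of the output described above.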
A tibble containing one row per time series and the calculated
descriptive statistics.
Other data analysis:
acf_vec(),
estimate_acf(),
estimate_kurtosis(),
estimate_mode(),
estimate_pacf(),
estimate_skewness(),
pacf_vec(),
summarise_data(),
summarise_split()
library(dplyr)

context <- list(
  series_id = "series",
  value_id = "value",
  index_id = "index"
)

data <- M4_monthly_data |>
  filter(series %in% c("M23100", "M14395"))

summarise_stats(
  .data = data,
  context = context
)
Specify a TBATS model for use with fabletools::model().
TBATS(formula, ...)
formula |
A model formula specifying the response variable, for example
|
... |
Further arguments passed to |
TBATS() is a model specification wrapper around
forecast::tbats() for the fable, tsibble, and
fabletools ecosystem.
TBATS stands for trigonometric seasonality, Box-Cox transformation, ARMA errors, trend, and seasonal components. It can be useful for time series with multiple or complex seasonal patterns.
The seasonal periods must be supplied through the periods argument.
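The "trigonometric seasonality" part of the name refers to representing each seasonal period with sine and cosine terms. The base R sketch below illustrates only that idea on simulated hourly data with a daily (24) and a weekly (168) period. It fits the harmonics with stats::lm(); it is not the TBATS state space model and does not reproduce what forecast::tbats() does internally. The helper fourier_terms(), the number of harmonics, and the simulated series are assumptions made for illustration.

```r
# Hypothetical helper: sine/cosine regressors for one seasonal period.
fourier_terms <- function(t, period, K) {
  terms <- lapply(seq_len(K), function(k) {
    cbind(sin(2 * pi * k * t / period), cos(2 * pi * k * t / period))
  })
  out <- do.call(cbind, terms)
  colnames(out) <- paste0(
    rep(c("sin", "cos"), times = K),
    rep(seq_len(K), each = 2),
    "_p", period
  )
  out
}

# Simulated hourly series: three weeks with a daily and a weekly pattern
set.seed(1)
t <- seq_len(24 * 21)
y <- 10 +
  3 * sin(2 * pi * t / 24) +
  2 * cos(2 * pi * t / 168) +
  rnorm(length(t), sd = 0.5)

# Two harmonics per seasonal period, fitted jointly by least squares
X <- cbind(fourier_terms(t, 24, K = 2), fourier_terms(t, 168, K = 2))
fit <- lm(y ~ X)

round(coef(fit), 2)
summary(fit)$r.squared
```

Both periods are handled in one model by stacking their harmonic regressors side by side, which is why a specification with several seasonal periods such as c(24, 168) is natural for hourly data.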
A model definition that can be used inside fabletools::model().
Other TBATS:
fitted.TBATS(),
forecast.TBATS(),
model_sum.TBATS(),
residuals.TBATS()
library(dplyr)
library(tsibble)
library(fabletools)

train_frame <- elec_price |>
  filter(bidding_zone == "DE") |>
  slice_head(n = 24 * 21) |>
  as_tsibble(index = time)

model_frame <- train_frame |>
  model("TBATS" = TBATS(value, periods = c(24, 168)))

model_frame
Create the default ggplot2 theme used by the tscv package.
theme_tscv(
  base_size = 11,
  base_family = "",
  base_line_size = base_size/22,
  base_rect_size = base_size/22
)
base_size |
Numeric value. Base font size. |
base_family |
Character value. Base font family. |
base_line_size |
Numeric value. Base line width for line elements. |
base_rect_size |
Numeric value. Base line width for rectangle elements. |
theme_tscv() returns a complete ggplot2 theme with a clean
layout, subtle grid lines, bottom legend placement, and formatting for plot
titles, subtitles, captions, facets, and axes.
The theme is used as the default theme in the plotting helpers of the
tscv package, such as plot_line(), plot_point(),
plot_bar(), plot_histogram(), plot_density(), and
plot_qq().
Since the returned object is a regular ggplot2 theme, it can also be
added directly to any ggplot object with + theme_tscv().
A complete ggplot2 theme.
Other data visualization:
plot_bar(),
plot_density(),
plot_histogram(),
plot_line(),
plot_point(),
plot_qq(),
scale_color_tscv(),
scale_fill_tscv(),
tscv_cols(),
tscv_pal()
library(dplyr)
library(ggplot2)

data <- M4_monthly_data |>
  filter(series == "M23100") |>
  mutate(index = as.Date(index))

# Plot with the default tscv theme
plot_line(
  data = data,
  x = index,
  y = value,
  title = "M4 Monthly Time Series",
  subtitle = "Series M23100",
  xlab = "Time",
  ylab = "Value",
  theme_set = theme_tscv()
)

# The same plot with the default ggplot2 grey theme
plot_line(
  data = data,
  x = index,
  y = value,
  title = "M4 Monthly Time Series",
  subtitle = "Series M23100",
  xlab = "Time",
  ylab = "Value",
  theme_set = theme_grey()
)

# theme_tscv() can also be added to regular ggplot objects
ggplot(data, aes(x = index, y = value)) +
  geom_line(color = "grey35") +
  labs(
    title = "M4 Monthly Time Series",
    subtitle = "Series M23100",
    x = "Time",
    y = "Value"
  ) +
  theme_tscv()
Extract named colors from the tscv color palette as hexadecimal color
codes.
tscv_cols(...)
... |
Character values giving the names of colors to extract. |
tscv_cols() returns the hexadecimal color codes used by the
tscv package. If no color names are supplied, all available colors are
returned.
Available colors are:
"red", "green", "blue", "orange",
"yellow", "light grey", and "dark grey".
A named character vector of hexadecimal color codes.
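The lookup behaviour described above (color names in, hexadecimal codes out, and the full set returned when no name is given) can be sketched with a named character vector. The function demo_cols() and the hex codes below are placeholders chosen for illustration; they are not the values defined by the tscv package, and how tscv_cols() itself treats unknown names is not stated here.

```r
# Placeholder palette: these hex codes are illustrative only.
demo_colors <- c(
  "red" = "#CC0000",
  "blue" = "#0066CC",
  "light grey" = "#CCCCCC"
)

# Hypothetical analogue of tscv_cols().
demo_cols <- function(...) {
  cols <- c(...)
  if (is.null(cols)) {
    return(demo_colors)        # no names supplied: return every color
  }
  demo_colors[cols]            # subset by name, keeping the names
}

demo_cols()
demo_cols("blue", "light grey")
demo_cols("steelblue")         # name not in the palette: NA in this sketch
```

Because subsetting a named vector by an unknown name yields NA, a misspelled color name in this sketch fails silently rather than raising an error, so it is worth checking requested names against the list of available colors.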
Other data visualization:
plot_bar(),
plot_density(),
plot_histogram(),
plot_line(),
plot_point(),
plot_qq(),
scale_color_tscv(),
scale_fill_tscv(),
theme_tscv(),
tscv_pal()
# Return all available tscv colors
tscv_cols()

# Return selected colors
tscv_cols("blue", "orange", "green")

# Use a tscv color in a plot
library(dplyr)

data <- M4_monthly_data |>
  filter(series == "M23100")

plot_line(
  data = data,
  x = index,
  y = value,
  title = "M4 Monthly Time Series",
  subtitle = "Series M23100",
  xlab = "Time",
  ylab = "Value",
  line_color = tscv_cols("blue")
)
Create a color interpolation function based on one of the predefined
tscv palettes.
tscv_pal(palette = "main", reverse = FALSE, ...)
palette |
Character value. Name of the palette. |
reverse |
Logical value. If TRUE, the order of the palette colors is reversed. |
... |
Additional arguments passed to grDevices::colorRampPalette(). |
tscv_pal() returns a palette function created with
grDevices::colorRampPalette(). The returned function can be used to
generate any number of colors from the selected palette.
Available palettes are:
"main": blue, green, yellow.
"cool": blue, green.
"hot": yellow, orange, red.
"mixed": blue, green, yellow, orange, red.
"grey": light grey, dark grey.
A palette function that takes an integer and returns hexadecimal color codes.
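The mechanism described above can be sketched directly with grDevices::colorRampPalette(). The function demo_pal() and the palettes below are hypothetical stand-ins built from base R color names, so the resulting hex codes are not those of the tscv palettes; only the structure (pick a palette, optionally reverse it, return an interpolation function) follows the description.

```r
# Placeholder palettes built from base R color names (not the tscv hex codes).
demo_palettes <- list(
  main = c("blue", "green", "yellow"),
  hot = c("yellow", "orange", "red")
)

# Hypothetical analogue of tscv_pal().
demo_pal <- function(palette = "main", reverse = FALSE, ...) {
  cols <- demo_palettes[[palette]]
  if (reverse) {
    cols <- rev(cols)                      # flip the order of the anchor colors
  }
  grDevices::colorRampPalette(cols, ...)   # returns a function of n
}

# Five colors interpolated between the three anchor colors
demo_pal("main")(5)

# Reversed palette: starts at red instead of yellow
demo_pal("hot", reverse = TRUE)(3)
```

Returning a function rather than a fixed vector lets the caller decide how many colors are needed, which is convenient when the number of groups in a plot is not known in advance.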
Other data visualization:
plot_bar(),
plot_density(),
plot_histogram(),
plot_line(),
plot_point(),
plot_qq(),
scale_color_tscv(),
scale_fill_tscv(),
theme_tscv(),
tscv_cols()
# Create a palette function
pal <- tscv_pal("main")

# Generate five colors
pal(5)

# Reverse the palette
tscv_pal("hot", reverse = TRUE)(5)

# Use generated colors in base R
barplot(
  height = c(3, 5, 4),
  col = tscv_pal("main")(3)
)