This vignette provides an overview of Global Trend time series forecasting models. The LGT stands for Local and Global Trend model and is devised to forecast non-seasonal time series. Meanwhile, SGT stands for Seasonal model with Global Trend and is devised to complement the former on seasonal time series. The combination of these models were tested on the 3003 time-series M3-competition dataset, and they have been shown to outperform all of the models originally participating in the competition by a significant margin. It is quite likely that L/SGT is still the most accurate algorithm on the M3 data sets.
Model | sMAPE | MASE |
---|---|---|
Best M3 Model(Theta) | 12.76 | 1.39 |
ETS(ZZZ) | 13.13 | 1.43 |
LGT&SGT | 12.27 | 1.30 |
Additionally, the package provides the S2GT model that extends SGT to deal with two seasonalities, following Taylor (2003).
The models can be seen as extensions of following Exponential Smoothing models: Holt Linear Method (LGT), Holt-Winters model (SGT), and Taylor dual seasonality model (S2GT). The main differences are as follows:
The package also provides some variants of these base algorithms, they are described in later sections.
The models are Bayesian and are fitted using Markov Chain Monte Carlo with the RStan package. The user can “nudge” the models using many hyperparamters of the parameters’ prior distributions, altough this is usually not necessary. While the Hamiltonian Monte Carlo engine of the Stan system is relatively fast, the computation time required for a single time series is typically between a couple to up to 20 minutes. A substantially longer computation time is likely a sign of not a good match of the chosen model to the time series. The current version of the models only supports time series data with entirely positive data values. Many time series data fall into this category, while for others, a simple transformation can be performed to obtain the required positive time-series data.
The following sections present the models’ formulas:
yt + 1 ∼ Student(ν, ŷt + 1, σt + 1) (eq.1.1)
ŷt + 1 = lt + γltρ + λbt (eq.1.2)
lt + 1 = αyt + 1 + (1 − α)lt (eq.1.3)
bt + 1 = β(lt + 1 − lt) + (1 − β)bt (eq.1.4)
σ̂t + 1 = σŷt + 1τ + ξ (eq.1.5)
eq. 1.1. Student’s-t error distribution
We assume that the time series values follow Student’s t-distribution around the expected value with a time-varying size of error. The behavior of Student-t distribution changes smoothly from being similar to Normal distribution (for larger, i.e. above 20, values of degrees of freedom) to being close to Cauchy distribution (for smaller, close to 1 degrees of freedom), therefore accomodating a possible need for a fat-tailed distribution of error. The parameter ν, degrees of freedom, like all other parameters, is fitted automatically, not user specified.
eq. 1.2: Expected value for next step
The expected value for the next step is modeled as a sum of level lt, global nonlinear trend γltρ, and damped local linear trend λbt. We use the term “global” in the nonlinear trend description to stress the fact that both γ and ρ are fitted on the entire time series and do not change over time. Typically the global trend will dominate over the short-term behavior caused by the local linear trend.
Also, note that the simple function of global trend has some interesting properties: if ρ becomes close to 0, the contribution of the global trend to the expected value of the next step becomes constant, and therefore the trend becomes close to linear; when ρ becomes close to 1, the contribution of the global trend to the expected value of the next step becomes almost proportional to the level value, therefore together with the level providing the compound interest formula, and the trajectory of a multistep forecast becomes exponential. So, the global trend can model well a typical situation in fast growing businesses: the growth is faster than linear but slower than exponential.
The last term refers to the usual local trend in ETS model. However, there is an addition of the dampening parameter λ, constrained such that 0 < λ < 1, to reduce the strength of the local trend model.
eq. 1.5. Heteroscedastic error
The size of the error is modelled with a monotonically growing function of the expected value. Typically, the size of the error grows in absolute terms as the level and expected value grow, but due to averaging effects, it gets proportionally smaller - this formula models well such behavior. Please note that in extreme cases: when τ is close to 0, the error size is constant, and when τ gets cose to 1, the error size is proportional to the expected value, and thus approaches behavior of the multiplicative error.
eq. 1.3: Level adjustment formula
This formular is very similar to that of the Holt Linear Method, except that it does not include local nor global trends.
eq. 1.4: Local trend adjustment formula
The same as in the Holt Linear Method.
yt + 1 ∼ Student(ν, ŷt + 1, σt + 1) (eq.2.1)
ŷt + 1 = (lt + γltρ)st + 1 (eq.2.2)
$$l_{t+1}= \alpha \frac{y_{t+1}}{s_{t+1}}+ \left( 1- \alpha \right) l_{t} \quad (eq. 2.3)$$ $$s_{t+m+1}= \zeta \frac{y_{t+1}}{l_{t+1}}+ \left( 1- \zeta \right) s_{t+1} \quad (eq. 2.4)$$
σ̂t + 1 = σŷt + 1τ + ξ (eq.2.5)
The model is similar to the non-seasonal LGT model described above. There are a couple of modifications as follows:
eq. 2.3.Level adjustment formula
It is similar to the relevant Holt-Winters formula, but as in the LGT case, the trend is not included.
The S2GT model is an extension of the SGT model to be used on time series with two seasonalities. A classic example of this type of data is hourly electricity consumption which is affected by the time of the day as well as the day of the week. Such series can be modelled with dual 24 and 168 (=7*24) seasonality. We follow here Taylor (2003). Work on this algorithm is still ongoing, it is usually better to use the SGT model with seasonality equal to the maximum of a multiple seasonalities of the series, e.g. 168 for hourly data with weekly cycle.
yt + 1 ∼ Student(ν, ŷt + 1, σt + 1) (eq.3.1)
ŷt + 1 = (lt + γltρ)st + 1wt + 1 (eq.3.2)
$$l_{t+1}= \alpha \frac{y_{t+1}}{s_{t+1}w_{t+1}}+ \left( 1- \alpha \right) l_{t} \quad (eq. 3.3)$$
$$s_{t+m+1}= \zeta \frac{y_{t+1}}{l_{t+1}w_{t+1}}+ \left( 1- \zeta \right) s_{t+1} \quad (eq. 3.4)$$
$$w_{t+d+1}= \delta \frac{y_{t+1}}{l_{t+1}s_{t+1}}+ \left( 1- \delta \right) w_{t+1} \quad (eq. 3.5)$$
σ̂t + 1 = σŷt + 1τ + ξ (eq.3.6)
So far we have described standard models, that are well tested and usually work well. The package provides also some additional variants that can be useful in some situations; they are described in the following sections
When the error size appears to have its own dynamics that isn’t just a reflection of the current expected value of the series, e.g. during periods of high volatility on stock markets, a better function for the error size may be one that is proportional to an average of absolute values of recent “innovations” or errors. The formula is:
at + 1 = η|yt + 1 − ŷt + 1| + (1 − η)at (eq.4.1)
σ̂t + 1 = σat + 1 + ξ (eq.4.2)
This is an interesting option, but more demanding computationally and not advantegous for typical business series.
By default the models use multiplicative seasonality, as the additive seasonality is no good for series with strong trend. However, the multiplicative seasonality may produce too strong seasonality effects in some cases. The generalized seasonality spans additive to multiplicative seasonality and may be a better choice in situations where the seasonality effect gets proportinally smaller with the series growth.
yt + 1 ∼ Student(ν, ŷt + 1, σt + 1) (eq.2.1) ŷt + 1 = lt + γltρ + st + 1ltθ (eq.5.2) lt + 1 = α(yt + 1 − st + 1ltθ) + (1 − α)lt (eq.5.3)
$$s_{t+m+1}= \zeta \frac{y_{t+1}-l_{t+1}}{l_{t+1}^{\theta}}+ \left( 1- \zeta \right) s_{t+1} \quad (eq. 5.4)$$
σ̂t + 1 = σŷt + 1τ + ξ (eq.2.5)
Three formulas are different as compared to the SGT model with multiplicative seasonality.
eq. 5.2. Expected value formula
Instead of multiplying by st + 1 we are adding st + 1ltτ. It has interesting consequences: if τ is getting close to 0, ltτ gets close to 1 and then the whole term gets close to st + 1 and we get the additive seasonality. However on the opposite end, when τ is getting close to 1, as in the case of the nonlinear trend, we are getting the compound formula and the seasonality similar to the multiplicative one. Therefore the formula provides for the behavior that falls between additive and multiplicative seasonality.
Formulas for S2GT are modified along similar lines.
We provide two alternative method for calculating level in the seasonal models (SGT and S2GT), instead of using the Holt-Winters’ style way, as in eq.2.3 and 3.3.
Here the level is calculated as a smoothed moving average of the last SEASONALITY points in case of SGT, and SEASONALITY2 points in case of S2GT. This approach makes sense for series with a stable or steadily changing level, and unimportant occasional flare-ups. The benefits may include faster calculation and better accuracy. Eq. 2.3 becomes: lt + 1 = α * movingAvg[t − SEASONALITY + 1, t] + (1 − α)lt
It is an weighted average of the standard (HW) and seasAvg methods. Eq. 2.3 becomes: $$k_{t+1}= \alpha \frac{y_{t+1}}{s_{t+1}}+ \left( 1- \alpha \right) k_{t}$$ where k is calculated as the standard level, and the final level
lt + 1 = α * movingAvg[t − SEASONALITY + 1, t] + (1 − α)kt This is an interesting, although a bit slower method, often most accurate one.
All models support additional regressors. The data needs to be passed as a matrix (with number of columns equal to the number of regressors) to both rlgt (for training) and forecast functions.
The prior distributions of parameters have their hyper-parameters documented and available for tweaking, although most of the time it is not required. Perhaps the most likely hyperparameter to change would be POW_TREND_ALPHA which is the alpha parameter of the Beta distribution of the global trend power coefficient. Occasionally it may be useful to increase it above 1, to say 3-6, to encourage higher curvature of the global trend.
The seasonal models support specifying non-integer seasonalities. Internally this is done through a partial allocation of newly calculated seasonality coefficients.