| Title: | Efficient Outlier Detection for Large Time Series Databases |
|---|---|
| Description: | Programs for detecting and cleaning outliers in single time series and in time series from homogeneous and heterogeneous databases using an Orthogonal Greedy Algorithm (OGA) for saturated linear regression models. The programs implement the procedures presented in the paper entitled "Efficient Outlier Detection for Large Time Series Databases" by Pedro Galeano, Daniel Peña and Ruey S. Tsay (2026), working paper, Universidad Carlos III de Madrid. Version 1.1.2 fixes one bug. |
| Authors: | Pedro Galeano [aut, cre] (ORCID: <https://orcid.org/0000-0003-2577-2747>), Daniel Peña [aut] (ORCID: <https://orcid.org/0000-0002-9137-1557>), Ruey S. Tsay [aut] (ORCID: <https://orcid.org/0000-0002-4949-4035>) |
| Maintainer: | Pedro Galeano <[email protected]> |
| License: | GPL-3 |
| Version: | 1.1.2 |
| Built: | 2026-06-01 08:59:02 UTC |
| Source: | https://github.com/cran/outliers.ts.oga |

Detects and cleans Additive Outliers (AOs) and Level Shifts (LSs) in time series that form a heterogeneous database, i.e., the series may differ in definition, sample size and/or frequency. The function runs in parallel on the available CPU cores.
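Here is a concrete illustration of the two outlier types. It follows the standard definitions, which are assumed here and not taken from the package code. An AO of size w at time h shifts only observation h by w. An LS of size w at time h shifts every observation from h onward. The base-R sketch below builds both effects as pulse and step indicators. `ao_indicator`, `ls_indicator` and `add_outliers` are hypothetical helpers, not package functions.

```r
# Hypothetical helpers illustrating AO and LS effects (not part of the package).

# Pulse indicator: 1 at time h, 0 elsewhere (shape of an AO effect).
ao_indicator <- function(n, h) {
  x <- numeric(n)
  x[h] <- 1
  x
}

# Step indicator: 0 before time h, 1 from h onward (shape of an LS effect).
ls_indicator <- function(n, h) {
  as.numeric(seq_len(n) >= h)
}

# Contaminate a series y with an AO of size w_ao at time h_ao
# and an LS of size w_ls starting at time h_ls.
add_outliers <- function(y, h_ao, w_ao, h_ls, w_ls) {
  n <- length(y)
  y + w_ao * ao_indicator(n, h_ao) + w_ls * ls_indicator(n, h_ls)
}

set.seed(1)
y <- as.numeric(arima.sim(list(ar = 0.5), n = 100))  # clean AR(1) series
z <- add_outliers(y, h_ao = 30, w_ao = 5, h_ls = 70, w_ls = -3)
```

Cleaning a series of known effects is the reverse operation: subtract the same pulse and step terms. In practice the locations and sizes are unknown, so they must first be estimated. That estimation is what the OGA-based procedure is for.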
db_het_oga(Y)

| Argument | Description |
|---|---|
| Y | The database, a list of ts objects, one per series. |

The function applies single_oga to each time series in the database to detect outlier effects and remove them from the series. The series are processed in parallel across the available CPU cores, which substantially reduces computation time. The function returns a list of ts objects containing the original series cleaned of the effects of the detected AOs and LSs. It also returns the location, size and t-statistic of each detected outlier.
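The per-series parallel pattern described above can be sketched in base R with `parallel::mclapply()`. This is not the package's implementation. `toy_clean()` is a hypothetical stand-in for `single_oga()` that only flags large isolated values through a robust z-score. `clean_database()` is likewise hypothetical. Forking is not available on Windows, so the sketch falls back to a single core there.

```r
library(parallel)

# Hypothetical stand-in for single_oga(): flags observations whose robust
# z-score (median/MAD) exceeds k and replaces them by the median.
# It only mimics the per-series interface, not the OGA procedure.
toy_clean <- function(yt, k = 4) {
  z <- (yt - median(yt)) / mad(yt)
  idx <- which(abs(z) > k)
  yt_clean <- yt
  yt_clean[idx] <- median(yt)
  list(yt_clean = yt_clean, aos = idx)
}

# Apply the per-series function to every element of a list of ts objects.
# mclapply() forks worker processes on Unix-alikes; Windows allows 1 core only.
clean_database <- function(Y, fun = toy_clean) {
  cores <- if (.Platform$OS.type == "windows") 1L else
    max(1L, detectCores() - 1L, na.rm = TRUE)
  res <- mclapply(Y, fun, mc.cores = cores)
  list(
    n_AOs   = vapply(res, function(r) length(r$aos), integer(1)),
    Y_clean = lapply(res, `[[`, "yt_clean")
  )
}

# Usage: three monthly series, with one large spike planted in the second.
set.seed(2)
Y <- lapply(1:3, function(i) ts(rnorm(120), frequency = 12))
Y[[2]][50] <- 10
out <- clean_database(Y)
```

Each series is handled independently, so the work splits cleanly across cores. Total run time is then driven by the slowest series rather than by the sum over all of them.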
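
| Value | Description |
|---|---|
| n_AOs | The number of AOs detected in each series. |
| n_LSs | The number of LSs detected in each series. |
| AOs | The detected AOs, with the location, size and t-statistic of each. |
| LSs | The detected LSs, with the location, size and t-statistic of each. |
| Y_clean | The cleaned database, a list of ts objects. |
| result | A message indicating whether the procedure completed successfully or, if it stopped, the problem encountered. |
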
The computational cost depends on the size of the database and the level of contamination of the series. The function may take several minutes if the database contains hundreds of series with thousands of observations each.
Pedro Galeano.
Galeano, P., Peña, D. and Tsay, R. S. (2025). Efficient outlier detection for large time series databases. Working paper, Universidad Carlos III de Madrid.

    # Load the package and the FRED_MD dataset
    library(outliers.ts.oga)
    data("FRED_MD")
    # Define frequency s, the same for all series
    s <- 12
    # Build a list with the first 10 time series, each with frequency s
    X <- FRED_MD[, 1:10]
    Y <- vector(mode = "list", length = ncol(X))
    for (k in seq_len(ncol(X))) {
      Y[[k]] <- ts(X[, k], frequency = s)
    }
    # Apply the function to Y
    out_db_het_oga <- db_het_oga(Y)

Detects and cleans Additive Outliers (AOs) and Level Shifts (LSs) in time series that form a homogeneous database, i.e., all series are defined similarly and have the same length and the same frequency. The function runs in parallel on the available CPU cores.
db_hom_oga(Y, s = NULL)
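
| Argument | Description |
|---|---|
| Y | The database, a matrix (or data frame) with one series per column. |
| s | Optional, the time series frequency, i.e., the number of observations per unit of time (e.g., s = 12 for monthly data). |
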
The function applies single_oga to each time series in the database to detect outlier effects and remove them from the series. The series are processed in parallel across the available CPU cores, which substantially reduces computation time. The function returns a matrix containing the original series cleaned of the effects of the detected AOs and LSs. It also returns the location, size and t-statistic of each detected outlier.
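Every series in a homogeneous database has the same length and frequency, so the database fits in a T x N matrix. A heterogeneous database instead needs a list. The sketch below converts between the two layouts and checks whether a list of series is homogeneous, which is one way to decide whether db_hom_oga or db_het_oga applies. The helpers are hypothetical and not part of the package.

```r
# Hypothetical helpers (not package functions) for the two database layouts.

# Split a T x N matrix (or data frame) into a list of N ts objects.
matrix_to_ts_list <- function(X, s) {
  X <- as.matrix(X)
  lapply(seq_len(ncol(X)), function(k) ts(X[, k], frequency = s))
}

# TRUE if all series share the same length and the same frequency.
is_homogeneous <- function(Y) {
  lens  <- vapply(Y, length, integer(1))
  freqs <- vapply(Y, frequency, numeric(1))
  length(unique(lens)) == 1L && length(unique(freqs)) == 1L
}

# Bind a homogeneous list of series back into a T x N matrix.
ts_list_to_matrix <- function(Y) {
  stopifnot(is_homogeneous(Y))
  do.call(cbind, lapply(Y, as.numeric))
}

X <- matrix(rnorm(240), nrow = 60, ncol = 4)            # 60 months, 4 series
Y <- matrix_to_ts_list(X, s = 12)                       # homogeneous list
Y_het <- c(Y, list(ts(rnorm(30), frequency = 4)))       # add a quarterly series
```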
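
| Value | Description |
|---|---|
| n_AOs | The number of AOs detected in each series. |
| n_LSs | The number of LSs detected in each series. |
| AOs | The detected AOs, with the location, size and t-statistic of each. |
| LSs | The detected LSs, with the location, size and t-statistic of each. |
| Y_clean | The cleaned database, a matrix. |
| result | A message indicating whether the procedure completed successfully or, if it stopped, the problem encountered. |
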
The computational cost depends on the size of the database and the level of contamination of the series. The function may take several minutes if the database contains hundreds of series with thousands of observations each.
Pedro Galeano.
Galeano, P., Peña, D. and Tsay, R. S. (2025). Efficient outlier detection for large time series databases. Working paper, Universidad Carlos III de Madrid.

    # Load the package and the FRED_MD dataset
    library(outliers.ts.oga)
    data("FRED_MD")
    # Define frequency s
    s <- 12
    # Apply the procedure to the first 10 time series in FRED_MD
    Y <- FRED_MD[, 1:10]
    out_db_hom_oga <- db_hom_oga(Y, s = s)

Monthly macroeconomic data from the FRED-MD database of the Federal Reserve Bank of St. Louis, processed to remove missing values.
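The exact preprocessing applied to FRED_MD is not described here. The sketch below shows one possible approach, which is an assumption and uses a hypothetical helper name. It drops every series that contains a missing value, so the remaining series stay on a regular monthly time grid. Dropping rows instead, for example with `complete.cases()`, would open gaps in the time index.

```r
# Hypothetical helper: keep only the series (columns) with no missing values,
# so every remaining series keeps its full, regularly spaced time index.
drop_incomplete_series <- function(df) {
  df[, colSums(is.na(df)) == 0, drop = FALSE]
}

# Toy panel: series b and c each contain one missing value.
df <- data.frame(a = c(1, 2, 3, 4), b = c(NA, 2, 3, 4), c = c(5, 6, NA, 8))
clean <- drop_incomplete_series(df)
```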
data("FRED_MD")
An object of class "data.frame".
Algorithm for detecting and cleaning additive outliers and level shifts in a single time series with an Orthogonal Greedy Algorithm (OGA).
single_oga(yt, s = NULL)
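
| Argument | Description |
|---|---|
| yt | A numeric vector containing the time series. |
| s | Optional, the time series frequency, i.e., the number of observations per unit of time (e.g., s = 12 for monthly data). |
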
The program detects Additive Outliers (AOs) and Level Shifts (LSs) in a time series and removes their effects. It implements the procedure proposed in the paper 'Efficient outlier detection in heterogeneous time series databases' by Galeano, Peña and Tsay (2024), which consists of three automatic steps:

1. Fit a sufficiently high-order AR model to yt by robust regression to obtain an AR representation and a residual series. Then apply an Orthogonal Greedy Algorithm (OGA) to the residual series to identify a set of potential AOs and LSs, and remove their effects from yt. The identified outlying effects form the first set of potential outliers.
2. Identify and fit an ARIMA or SARIMA model (depending on whether seasonality is detected) to the outlier-adjusted series from the first step, and obtain a new residual series. Apply the OGA to this new residual series to identify a new set of potential AOs and LSs, if any. These form the second set of potential outliers.
3. Combine the potential outliers from the first and second steps, removing redundancies, to obtain a final set of potential AOs and LSs. Fit an ARIMA (or SARIMA) model jointly with this final set and discard any negligible outlying effects.

Finally, the detected AOs and LSs are removed from the observed series yt to produce an outlier-free time series.
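The core idea of an OGA search for outliers can be shown on a saturated regression. In that regression every time point carries a candidate AO (pulse) regressor and a candidate LS (step) regressor, so there are more regressors than observations.

The sketch below is a deliberate simplification, not the package's algorithm. It skips the robust AR and (S)ARIMA steps and applies the search to the series itself, assuming white noise around a constant level. Its threshold (4) and maximum number of steps (10) are illustrative choices, and `oga_outliers` is a hypothetical name.

What makes it an orthogonal greedy algorithm is how each step works. It adds the regressor most correlated with the current residual. It then re-estimates all selected coefficients jointly by least squares, so the new residual is orthogonal to every regressor chosen so far.

```r
# Simplified OGA sketch (hypothetical; not the package's implementation).
oga_outliers <- function(y, max_steps = 10, threshold = 4) {
  y <- as.numeric(y)
  n <- length(y)
  t_idx <- seq_len(n)
  # Candidates: pulses at 1..n (AOs) and steps starting at 2..(n-1) (LSs).
  # A step at 1 equals the intercept; a step at n equals the pulse at n.
  X <- cbind(diag(n), outer(t_idx, 2:(n - 1), ">=") * 1)
  type <- c(rep("AO", n), rep("LS", n - 2))
  loc  <- c(t_idx, 2:(n - 1))
  Xn <- sweep(X, 2, sqrt(colSums(X^2)), "/")   # unit-norm copies for scoring
  sigma <- mad(diff(y)) / sqrt(2)              # robust noise scale
  selected <- integer(0)
  r <- y - mean(y)                             # residual of intercept-only fit
  for (step in seq_len(max_steps)) {
    score <- abs(drop(crossprod(Xn, r))) / sigma
    score[selected] <- 0
    j <- which.max(score)
    if (score[j] < threshold) break            # no candidate stands out: stop
    selected <- c(selected, j)
    # Orthogonal step: refit y on the intercept and ALL selected regressors.
    r <- lm.fit(cbind(1, X[, selected, drop = FALSE]), y)$residuals
  }
  if (length(selected) == 0L) {
    return(data.frame(type = character(0), location = integer(0),
                      size = numeric(0), t_stat = numeric(0)))
  }
  Xs <- X[, selected, drop = FALSE]
  cf <- summary(lm(y ~ Xs))$coefficients[-1, , drop = FALSE]  # drop intercept
  data.frame(type = type[selected], location = loc[selected],
             size = cf[, "Estimate"], t_stat = cf[, "t value"],
             row.names = NULL)
}

set.seed(3)
n <- 150
y <- rnorm(n)
y[40] <- y[40] + 8            # additive outlier at t = 40
y[100:n] <- y[100:n] + 5      # level shift starting at t = 100
est <- oga_outliers(y)
```

Here the returned data frame is expected to contain an AO at t = 40 and an LS at or near t = 100. The sizes and t-statistics come from the final joint least-squares fit.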
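
| Value | Description |
|---|---|
| yt_clean | The outlier-free time series. |
| aos | The detected AOs. |
| lss | The detected LSs. |
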
Pedro Galeano.
Galeano, P., Peña, D. and Tsay, R. S. (2025). Efficient outlier detection for large time series databases. Working paper, Universidad Carlos III de Madrid.

    ## Load the package and the FRED_MD dataset
    library(outliers.ts.oga)
    data("FRED_MD")
    Y <- FRED_MD
    ## Define time series yt and frequency s
    yt <- Y[, 1]
    s <- 12
    ## Apply the function to yt
    out_single_oga <- single_oga(yt, s = s)