Title: | Distance Measures for Time Series Data |
---|---|
Description: | A set of commonly used distance measures and some additional functions which, although initially not designed for this purpose, can be used to measure the dissimilarity between time series. These measures can be used to perform clustering, classification or other data mining tasks which require the definition of a distance measure between time series. U. Mori, A. Mendiburu and J.A. Lozano (2016), <doi:10.32614/RJ-2016-058>. |
Authors: | Usue Mori [aut, cre], Alexander Mendiburu [aut], Jose A. Lozano [aut], Duarte Folgado [ctb] |
Maintainer: | Usue Mori |
License: | GPL (>= 2) |
Version: | 3.7.1 |
Built: | 2024-12-25 06:53:58 UTC |
Source: | CRAN |
A complete set of distance measures specifically designed to deal with time series.
Package: | TSdist |
Type: | Package |
Version: | 3.1 |
Date: | 2015-07-14 |
License: | GPL (>=2) |
This package provides a comprehensive set of time series distance measures published in the literature and some additional functions which, although initially not designed for this purpose, can be used to measure the dissimilarity between time series. These measures can be used to perform clustering, classification or other data mining tasks which require the definition of a distance measure between time series. Some of the measures are specifically implemented for this package, while others are originally hosted in other R packages. The measures included are:
Lp distances: LPDistance
Distance based on the cross-correlation: CCorDistance
Short Time Series distance (STS): STSDistance
Dynamic Time Warping (DTW): DTWDistance
LB_Keogh lower bound for the Dynamic Time Warping distance: LBKeoghDistance
Edit Distance for Real Sequences (EDR): EDRDistance
Longest Common Subsequence distance for real sequences (LCSS): LCSSDistance
Edit Distance with Real Penalty (ERP): ERPDistance
Distance based on the Discrete Fourier Transform: FourierDistance
TQuest distance: TquestDistance
Dissim distance: DissimDistance
Autocorrelation-based dissimilarity: ACFDistance
Partial autocorrelation-based dissimilarity: PACFDistance
Dissimilarity based on LPC cepstral coefficients: ARLPCCepsDistance
Model-based dissimilarity proposed by Maharaj (1996, 2000): ARMahDistance
Model-based dissimilarity proposed by Piccolo (1990): ARPicDistance
Compression-based dissimilarity measure: CDMDistance
Complexity-invariant distance measure: CIDDistance
Dissimilarities based on Pearson's correlation: CorDistance
Dissimilarity index which combines temporal correlation and raw value behaviors: CortDistance
Integrated periodogram based dissimilarity: IntPerDistance
Periodogram based dissimilarity: PerDistance
Symbolic Aggregate Approximation based dissimilarity: MindistSaxDistance
Normalized compression based distance: NCDDistance
Dissimilarity measure based on nonparametric forecasts: PredDistance
Dissimilarity based on the integrated squared difference between the log-spectra: SpecISDDistance
General spectral dissimilarity measure using local-linear estimation of the log-spectra: SpecLLRDistance
Permutation Distribution Distance: PDCDistance
Frechet distance: FrechetDistance
All the measures are implemented in separate functions but can also be invoked by means of the wrapper function TSDistances. Moreover, this wrapper function enables the use of time series objects of type ts, zoo and xts.
As an additional functionality of the package, pairwise distances between all the time series in a database can be easily computed by using the dist function from the proxy package or the TSDatabaseDistances function included in the TSdist package.
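To illustrate what a pairwise distance matrix involves, the base-R sketch below applies any two-argument distance function to every pair of series stored in a named list and returns a standard `dist` object. The helper `pairwise_ts_dist` is hypothetical and is not part of TSdist or proxy; in practice TSDatabaseDistances or proxy's dist play this role.

```r
# Hypothetical helper, not part of TSdist or proxy: fills a symmetric
# matrix with f(x, y) for every pair of series and returns a "dist" object.
pairwise_ts_dist <- function(series, f) {
  n <- length(series)
  m <- matrix(0, n, n, dimnames = list(names(series), names(series)))
  for (i in seq_len(max(n - 1, 0))) {
    for (j in (i + 1):n) {
      # Each pair is computed once; f is assumed to be symmetric
      m[i, j] <- m[j, i] <- f(series[[i]], series[[j]])
    }
  }
  as.dist(m)
}

# Usage: a plain Euclidean distance on a small "database" of equal-length series
euclid <- function(x, y) sqrt(sum((x - y)^2))
db <- list(a = c(0, 0, 0), b = c(3, 4, 0), c = c(0, 0, 1))
d <- pairwise_ts_dist(db, euclid)
print(d)
```

Computing only the upper triangle halves the work, which matters for the more expensive measures in this package; the trade-off is that the sketch assumes the distance is symmetric.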
Usue Mori, Alexander Mendiburu, Jose A. Lozano. Maintainer: Usue Mori.
Esling, P., & Agon, C. (2012). Time-series data mining. ACM Computing Surveys, 45(1), 1-34.
Liao, T. W. (2005). Clustering of time series data-a survey. Pattern Recognition, 38(11), 1857-1874.
Wang, X., Mueen, A., Ding, H., Trajcevski, G., Scheuermann, P., & Keogh, E. (2012). Experimental comparison of representation methods and distance measures for time series data. Data Mining and Knowledge Discovery, 26(2), 275-309.
David Meyer and Christian Buchta (2013). proxy: Distance and Similarity Measures. R package version 0.4-10. http://CRAN.R-project.org/package=proxy
library(TSdist);
Computes the dissimilarity between a pair of numeric time series based on their estimated autocorrelation coefficients.
ACFDistance(x, y, ...)
x | Numeric vector containing the first time series. |
y | Numeric vector containing the second time series. |
... | Additional parameters for the function, passed on to the wrapped function (see below). |
This is simply a wrapper for the diss.ACF function of package TSclust. As such, all the functionalities of the diss.ACF function are also available when using this function.
d | The computed distance between the pair of series. |
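The help page does not spell out the formula. As a hedged sketch, assuming the common choice of a Euclidean distance between the sample autocorrelations at lags 1 to L with equal weights, the idea can be written in base R with `stats::acf`. The name `acf_dist` and the default `L = 10` are illustrative assumptions, not the defaults of diss.ACF, so values will not necessarily match ACFDistance.

```r
# Sketch of an autocorrelation-based dissimilarity. Assumption: Euclidean
# distance between the sample autocorrelations at lags 1..L, equal weights.
acf_dist <- function(x, y, L = 10) {
  # acf() returns lags 0..L; lag 0 is always 1, so it is dropped
  rx <- drop(acf(x, lag.max = L, plot = FALSE)$acf)[-1]
  ry <- drop(acf(y, lag.max = L, plot = FALSE)$acf)[-1]
  sqrt(sum((rx - ry)^2))
}

# Usage on two simulated AR(1) series of different length
set.seed(1)
x <- as.numeric(arima.sim(list(ar = 0.8), n = 100))
y <- as.numeric(arima.sim(list(ar = -0.8), n = 120))
acf_dist(x, y)
```

Because both autocorrelation vectors have length L regardless of the series lengths, this kind of measure works directly on series of different length, as in the example below.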
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
Pablo Montero, José A. Vilar (2014). TSclust: An R Package for Time Series Clustering. Journal of Statistical Software, 62(1), 1-43. URL http://www.jstatsoft.org/v62/i01/.
Galeano, P., & Pella, D. (2000). Multivariate Analysis in Vector Time Series. Resenhas, the Journal of the Institute of Mathematics and Statistics of the University of Sao Paulo, 4, 383–403.
Lei, H., & Sun, B. (2007). A Study on the Dynamic Time Warping in Kernel Machines. In 2007 Third International IEEE Conference on Signal-Image Technologies and Internet-Based System (pp. 839–845).
To calculate this distance measure using ts, zoo or xts objects see TSDistances. To calculate distance matrices of time series databases using this measure see TSDatabaseDistances.
# The objects example.series3 and example.series4 are two
# numeric series of length 100 and 120 contained in the
# TSdist package.
data(example.series3)
data(example.series4)
# For information on their generation and shape see
# help page of example.series.
help(example.series)
# Calculate the autocorrelation based distance between the two series using
# the default parameters:
ACFDistance(example.series3, example.series4)
Computes the dissimilarity between two numeric time series in terms of their Linear Predictive Coding (LPC) ARIMA processes.
ARLPCCepsDistance(x, y, ...)
x | Numeric vector containing the first time series. |
y | Numeric vector containing the second time series. |
... | Additional parameters for the function, passed on to the wrapped function (see below). |
This is simply a wrapper for the diss.AR.LPC.CEPS function of package TSclust. As such, all the functionalities of the diss.AR.LPC.CEPS function are also available when using this function.
d | The computed distance between the pair of series. |
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
Pablo Montero, José A. Vilar (2014). TSclust: An R Package for Time Series Clustering. Journal of Statistical Software, 62(1), 1-43. URL http://www.jstatsoft.org/v62/i01/.
To calculate this distance measure using ts, zoo or xts objects see TSDistances. To calculate distance matrices of time series databases using this measure see TSDatabaseDistances.
# The objects example.series3 and example.series4 are two
# numeric series of length 100 and 120 contained in the
# TSdist package obtained from an ARIMA(3,0,2) process.
data(example.series3)
data(example.series4)
# For information on their generation and shape see
# help page of example.series.
help(example.series)
# Calculate the ar.lpc.ceps distance between the two series using
# the default parameters. In this case an AR model is automatically
# selected for each of the series:
ARLPCCepsDistance(example.series3, example.series4)
# Calculate the ar.lpc.ceps distance between the two series
# imposing the order of the ARIMA model of each series:
ARLPCCepsDistance(example.series3, example.series4, order.x=c(3,0,2),
                  order.y=c(3,0,2))
Computes the model based dissimilarity proposed by Maharaj.
ARMahDistance(x, y, ...)
x | Numeric vector containing the first time series. |
y | Numeric vector containing the second time series. |
... | Additional parameters for the function, passed on to the wrapped function (see below). |
This is simply a wrapper for the diss.AR.MAH function of package TSclust. As such, all the functionalities of the diss.AR.MAH function are also available when using this function.
statistic | The statistic of the homogeneity test. |
p-value | The p-value issued by the homogeneity test. |
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
Pablo Montero, José A. Vilar (2014). TSclust: An R Package for Time Series Clustering. Journal of Statistical Software, 62(1), 1-43. URL http://www.jstatsoft.org/v62/i01/.
To calculate this distance measure using ts, zoo or xts objects see TSDistances. To calculate distance matrices of time series databases using this measure see TSDatabaseDistances.
# The objects example.series3 and example.series4 are two
# numeric series of length 100 and 120 contained in the
# TSdist package obtained from an ARIMA(3,0,2) process.
data(example.series3)
data(example.series4)
# For information on their generation and shape see
# help page of example.series.
help(example.series)
# Calculate the ar.mah distance between the two series using
# the default parameters.
ARMahDistance(example.series3, example.series4)
# The p-value is close to 1, so the homogeneity test gives no evidence
# that the two series come from different ARMA processes.
Computes the model based dissimilarity proposed by Piccolo.
ARPicDistance(x, y, ...)
x | Numeric vector containing the first time series. |
y | Numeric vector containing the second time series. |
... | Additional parameters for the function, passed on to the wrapped function (see below). |
This is simply a wrapper for the diss.AR.PIC function of package TSclust. As such, all the functionalities of the diss.AR.PIC function are also available when using this function.
d | The computed distance between the pair of series. |
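Piccolo's distance is commonly described as the Euclidean distance between the autoregressive coefficients of AR models fitted to each series, with the shorter coefficient vector padded with zeros. The sketch below follows that description using `stats::ar` (Yule-Walker, order chosen by AIC). The name `pic_dist` and `order.max = 10` are hypothetical choices, and this is not the TSclust implementation, so its values need not match ARPicDistance.

```r
# Sketch of a Piccolo-style distance. Assumptions: AR models fitted with
# stats::ar() (Yule-Walker, order chosen by AIC up to order.max) and the
# Euclidean distance between zero-padded AR coefficient vectors.
pic_dist <- function(x, y, order.max = 10) {
  px <- ar(x, aic = TRUE, order.max = order.max)$ar
  py <- ar(y, aic = TRUE, order.max = order.max)$ar
  k <- max(length(px), length(py))
  # Pad the shorter coefficient vector with zeros
  px <- c(px, rep(0, k - length(px)))
  py <- c(py, rep(0, k - length(py)))
  sqrt(sum((px - py)^2))
}

# Usage on two simulated AR series with different dynamics
set.seed(42)
x <- as.numeric(arima.sim(list(ar = 0.7), n = 200))
y <- as.numeric(arima.sim(list(ar = c(-0.5, 0.3)), n = 200))
pic_dist(x, y)
```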
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
Pablo Montero, José A. Vilar (2014). TSclust: An R Package for Time Series Clustering. Journal of Statistical Software, 62(1), 1-43. URL http://www.jstatsoft.org/v62/i01/.
To calculate this distance measure using ts, zoo or xts objects see TSDistances. To calculate distance matrices of time series databases using this measure see TSDatabaseDistances.
# The objects example.series3 and example.series4 are two
# numeric series of length 100 and 120 contained in the
# TSdist package obtained from an ARIMA(3,0,2) process.
data(example.series3)
data(example.series4)
# For information on their generation and shape see
# help page of example.series.
help(example.series)
# Calculate the Piccolo distance between the two series using
# the default parameters. In this case an AR model is automatically
# selected for each of the series:
ARPicDistance(example.series3, example.series4)
# Calculate the Piccolo distance between the two series
# imposing the order of the ARMA model of each series:
ARPicDistance(example.series3, example.series4, order.x=c(3,0,2),
              order.y=c(3,0,2))
Computes the distance measure based on the cross-correlation between a pair of numeric time series.
CCorDistance(x, y, lag.max=(min(length(x), length(y))-1))
x | Numeric vector containing the first time series. |
y | Numeric vector containing the second time series. |
lag.max | Positive integer that defines the maximum lag considered in the cross-correlation calculations (default = min(length(x), length(y)) - 1). |
The cross-correlation based distance between two numeric time series is calculated as follows:
$$d(x,y) = \sqrt{\frac{1 - CC_0(x,y)^2}{\sum_{k=1}^{\mathrm{lag.max}} CC_k(x,y)^2}}$$
where $CC_k(x,y)$ is the cross-correlation between $x$ and $y$ at lag $k$.
Because the summation in the denominator runs from 1 to lag.max, this parameter must be a positive integer no larger than the length of the series.
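The formula above translates almost directly into base R with `stats::ccf`. This is a minimal sketch, assuming the denominator sums the squared cross-correlations at the positive lags 1 to lag.max reported by `ccf`; the name `ccor_dist` is hypothetical, and TSdist's own CCorDistance may differ in such details.

```r
# Sketch of the cross-correlation based distance using stats::ccf().
ccor_dist <- function(x, y, lag.max = min(length(x), length(y)) - 1) {
  cc <- ccf(x, y, lag.max = lag.max, type = "correlation", plot = FALSE)
  lags <- drop(cc$lag)
  vals <- drop(cc$acf)
  cc0 <- vals[lags == 0]                              # CC_0(x, y)
  denom <- sum(vals[lags >= 1 & lags <= lag.max]^2)   # sum of CC_k^2, k = 1..lag.max
  # max(0, .) guards against tiny negative values from rounding when |cc0| = 1
  sqrt(max(0, 1 - cc0^2) / denom)
}

# Usage: a random walk against white noise
set.seed(3)
x <- cumsum(rnorm(60))
y <- rnorm(60)
ccor_dist(x, y, lag.max = 10)
```

A series compared with itself has $CC_0 = 1$, so the numerator, and therefore the distance, is zero.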
d | The computed distance between the pair of series. |
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
Liao, T. W. (2005). Clustering of time series data-a survey. Pattern Recognition, 38(11), 1857-1874.
Pree, H., Herwig, B., Gruber, T., Sick, B., David, K., & Lukowicz, P. (2014). On general purpose time series similarity measures and their use as kernel functions in support vector machines. Information Sciences, 281, 478–495.
To calculate this distance measure using ts, zoo or xts objects see TSDistances. To calculate distance matrices of time series databases using this measure see TSDatabaseDistances.
# The objects example.series3 and example.series4 are two
# numeric series of length 100 and 120 contained in the
# TSdist package.
data(example.series3)
data(example.series4)
# For information on their generation and shape see
# help page of example.series.
help(example.series)
# Calculate the cross-correlation based distance
# using the default lag.max.
CCorDistance(example.series3, example.series4)
# Calculate the cross-correlation based distance
# with lag.max=50.
CCorDistance(example.series3, example.series4, lag.max=50)
Computes the dissimilarity between two numeric series based on their size after compression.
CDMDistance(x, y, ...)
x | Numeric vector containing the first time series. |
y | Numeric vector containing the second time series. |
... | Additional parameters for the function, passed on to the wrapped function (see below). |
This is simply a wrapper for the diss.CDM function of package TSclust. As such, all the functionalities of the diss.CDM function are also available when using this function.
d | The computed distance between the pair of series. |
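The compression-based dissimilarity is usually defined as CDM(x, y) = C(xy) / (C(x) + C(y)), where C(.) is the compressed size and xy is the concatenation of both inputs. The sketch below approximates this with base R's `memCompress` applied to a rounded text rendering of each series. The text encoding, the rounding to 2 decimals and the name `cdm_dist` are assumptions for illustration; diss.CDM applies its own preprocessing, so values will differ.

```r
# Sketch of CDM(x, y) = C(xy) / (C(x) + C(y)). Assumptions: each series is
# rounded to `digits` decimals, written as text, and gzip-compressed with
# base R's memCompress(); C(.) is the compressed size in bytes.
cdm_dist <- function(x, y, digits = 2) {
  csize <- function(s) length(memCompress(charToRaw(s), type = "gzip"))
  sx <- paste(round(x, digits), collapse = " ")
  sy <- paste(round(y, digits), collapse = " ")
  csize(paste(sx, sy)) / (csize(sx) + csize(sy))
}

set.seed(4)
x <- sin(1:200 / 10)
y <- rnorm(200)
cdm_dist(x, x)  # the second copy is almost free to compress
cdm_dist(x, y)  # unrelated content compresses poorly together
```

Similar inputs share patterns that the compressor can reuse, which pushes the ratio towards 0.5; unrelated inputs give ratios close to 1.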
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
Pablo Montero, José A. Vilar (2014). TSclust: An R Package for Time Series Clustering. Journal of Statistical Software, 62(1), 1-43. URL http://www.jstatsoft.org/v62/i01/.
To calculate this distance measure using ts, zoo or xts objects see TSDistances. To calculate distance matrices of time series databases using this measure see TSDatabaseDistances.
# The objects example.series3 and example.series4 are two
# numeric series of length 100 and 120.
data(example.series3)
data(example.series4)
# For information on their generation and shape see
# help page of example.series.
help(example.series)
# Calculate the compression based distance between the two series using
# the default parameters.
CDMDistance(example.series3, example.series4)
Computes the dissimilarity between two numeric series of the same length by calculating a correction of the Euclidean distance based on the complexity estimation of the series.
CIDDistance(x, y)
x | Numeric vector containing the first time series. |
y | Numeric vector containing the second time series. |
This is simply a wrapper for the diss.CID function of package TSclust. As such, all the functionalities of the diss.CID function are also available when using this function.
d | The computed distance between the pair of series. |
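The complexity-invariant distance is usually defined as the Euclidean distance multiplied by a correction factor, the ratio of the larger to the smaller complexity estimate of the two series, with CE(x) = sqrt(sum((x[t+1] - x[t])^2)). A minimal base-R sketch of that definition follows; the name `cid_dist` is hypothetical, and no claim is made that it reproduces diss.CID exactly.

```r
# Sketch of the complexity-invariant distance (equal-length series).
cid_dist <- function(x, y) {
  stopifnot(length(x) == length(y))
  ce <- function(s) sqrt(sum(diff(s)^2))        # complexity estimate
  ed <- sqrt(sum((x - y)^2))                    # Euclidean distance
  cf <- max(ce(x), ce(y)) / min(ce(x), ce(y))   # correction factor, >= 1
  ed * cf
}

x <- c(0, 1, 2, 3)
y <- c(0, 2, 4, 6)   # twice as "complex" as x, so the factor is 2
cid_dist(x, y)
```

Note that the factor is undefined for a constant series (complexity 0), a case the sketch does not handle.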
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
Pablo Montero, José A. Vilar (2014). TSclust: An R Package for Time Series Clustering. Journal of Statistical Software, 62(1), 1-43. URL http://www.jstatsoft.org/v62/i01/.
To calculate this distance measure using ts, zoo or xts objects see TSDistances. To calculate distance matrices of time series databases using this measure see TSDatabaseDistances.
# The objects example.series1 and example.series2 are two
# numeric series of length 100.
data(example.series1)
data(example.series2)
# For information on their generation and shape see
# help page of example.series.
help(example.series)
# Calculate the complexity-invariant distance between the two series.
CIDDistance(example.series1, example.series2)
Computes two different distance measures based on Pearson's correlation between a pair of numeric time series of the same length.
CorDistance(x, y, ...)
x | Numeric vector containing the first time series. |
y | Numeric vector containing the second time series. |
... | Additional parameters for the function, passed on to the wrapped function (see below). |
This is simply a wrapper for the diss.COR function of package TSclust. As such, all the functionalities of the diss.COR function are also available when using this function.
d | The computed distance between the pair of series. |
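The two correlation-based dissimilarities are usually written as d1 = sqrt(2 (1 - rho)) and d2 = sqrt(((1 - rho) / (1 + rho))^beta), where rho is Pearson's correlation. The sketch below assumes, mirroring the example further down, that supplying beta selects the second measure. The name `cor_dist` is hypothetical and the code is not the TSclust source.

```r
cor_dist <- function(x, y, beta = NULL) {
  # Pearson correlation, clamped to [-1, 1] to absorb rounding error
  rho <- min(1, max(-1, cor(x, y)))
  if (is.null(beta)) {
    sqrt(2 * (1 - rho))                  # first measure
  } else {
    sqrt(((1 - rho) / (1 + rho))^beta)   # second measure, tuned by beta
  }
}

x <- c(1, 2, 3, 4)
y <- c(1, 3, 2, 4)         # Pearson correlation with x is 0.8
cor_dist(x, y)             # sqrt(0.4)
cor_dist(x, y, beta = 2)   # 1/9
```

Larger values of beta make the second measure fall off faster for positively correlated series and grow faster for negatively correlated ones.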
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
Pablo Montero, José A. Vilar (2014). TSclust: An R Package for Time Series Clustering. Journal of Statistical Software, 62(1), 1-43. URL http://www.jstatsoft.org/v62/i01/.
Golay, X., Kollias, S., Stoll, G., Meier, D., Valavanis, A., & Boesiger, P. (1998). A new correlation-based fuzzy logic clustering algorithm for FMRI. Magnetic Resonance in Medicine, 40(2), 249–260.
To calculate this distance measure using ts, zoo or xts objects see TSDistances. To calculate distance matrices of time series databases using this measure see TSDatabaseDistances.
# The objects example.series1 and example.series2 are two
# numeric series of length 100.
data(example.series1)
data(example.series2)
# For information on their generation and shape see
# help page of example.series.
help(example.series)
# Calculate the first correlation based distance between the series.
CorDistance(example.series1, example.series2)
# Calculate the second correlation based distance between the series
# by specifying beta.
CorDistance(example.series1, example.series2, beta=2)
Computes the dissimilarity between two numeric series of the same length by combining the dissimilarity between the raw values and the dissimilarity between the temporal correlation behavior of the series.
CortDistance(x, y, ...)
x | Numeric vector containing the first time series. |
y | Numeric vector containing the second time series. |
... | Additional parameters for the function, passed on to the wrapped function (see below). |
This is simply a wrapper for the diss.CORT function of package TSclust. As such, all the functionalities of the diss.CORT function are also available when using this function.
d | The computed distance between the pair of series. |
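The combination of raw values and temporal correlation is commonly written as d(x, y) = phi_k(CORT(x, y)) * delta(x, y), where CORT is the correlation between the first differences of the two series, phi_k(u) = 2 / (1 + exp(k u)) is a tuning function and delta is a distance between the raw values. The sketch below assumes k = 2 and the Euclidean distance for delta; the name `cort_dist` is hypothetical, and diss.CORT may offer other choices for both.

```r
cort_dist <- function(x, y, k = 2) {
  stopifnot(length(x) == length(y))
  dx <- diff(x)
  dy <- diff(y)
  # Temporal correlation of the first differences, in [-1, 1]
  cort <- sum(dx * dy) / (sqrt(sum(dx^2)) * sqrt(sum(dy^2)))
  # Tuning function: decreasing in cort, equal to 1 when cort = 0
  phi <- 2 / (1 + exp(k * cort))
  # Scale the Euclidean distance between the raw values
  phi * sqrt(sum((x - y)^2))
}

x <- c(0, 1, 2, 3)
y <- c(1, 2, 3, 4)   # same increments as x, so cort = 1
cort_dist(x, y)
```

Series that move together (cort near 1) get their raw-value distance shrunk, while series moving in opposite directions get it inflated.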
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
Pablo Montero, José A. Vilar (2014). TSclust: An R Package for Time Series Clustering. Journal of Statistical Software, 62(1), 1-43. URL http://www.jstatsoft.org/v62/i01/.
Chouakria, A. D., & Nagabhushan, P. N. (2007). Adaptive dissimilarity index for measuring time series proximity. Advances in Data Analysis and Classification, 1(1), 5–21. http://doi.org/10.1007/s11634-006-0004-6
To calculate this distance measure using ts, zoo or xts objects see TSDistances. To calculate distance matrices of time series databases using this measure see TSDatabaseDistances.
# The objects example.series1 and example.series2 are two
# numeric series of length 100.
data(example.series1)
data(example.series2)
# For information on their generation and shape see
# help page of example.series.
help(example.series)
# Calculate the CORT-based distance between the series using the default
# parameters.
CortDistance(example.series1, example.series2)
Computes the Dissim distance between a pair of numeric series.
DissimDistance(x, y, tx, ty)
x | Numeric vector containing the first time series. |
y | Numeric vector containing the second time series. |
tx | If the sampling is not constant, a numeric vector that specifies the sampling index of the first series. |
ty | If the sampling is not constant, a numeric vector that specifies the sampling index of the second series. |
The Dissim distance is obtained by calculating the integral over time of the Euclidean distance between the two series. The series are assumed to be linear between sampling points.
The two series must start and end in the same interval, but they may have different and non-constant sampling rates. The sampling indices must be positive and strictly increasing. For more information see the reference below.
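Under the piecewise-linear assumption above, the integral can be evaluated exactly. On a grid that contains every sampling point of both series, the difference between the series is linear on each segment, and the integral of the absolute value of a linear function has a closed form (with a separate case when it changes sign). The sketch below uses that approach with `stats::approx`; the name `dissim_dist` is hypothetical, and TSdist's DissimDistance may evaluate the integral differently.

```r
dissim_dist <- function(x, y, tx = seq_along(x), ty = seq_along(y)) {
  # Common grid: every sampling point of either series
  t <- sort(unique(c(tx, ty)))
  # Linear interpolation of both series onto the common grid
  d <- approx(tx, x, xout = t)$y - approx(ty, y, xout = t)$y
  h <- diff(t)
  d0 <- head(d, -1)
  d1 <- tail(d, -1)
  # Exact integral of |linear function| over each segment:
  # trapezoid if no sign change, two triangles otherwise
  seg <- ifelse(d0 * d1 >= 0,
                h * (abs(d0) + abs(d1)) / 2,
                h * (d0^2 + d1^2) / (2 * (abs(d0) + abs(d1))))
  sum(seg)
}

# The difference goes from -1 to 1 on [1, 2]: area of two triangles
dissim_dist(c(0, 0), c(1, -1))
# Uneven sampling: x observed at times 0 and 2, y at 0, 1 and 2
dissim_dist(c(0, 0), c(1, 1, 1), tx = c(0, 2), ty = c(0, 1, 2))
```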
d | The computed distance between the pair of series. |
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
Frentzos, E., Gratsias, K., & Theodoridis, Y. (2007). Index-based Most Similar Trajectory Search. In Proceedings of the IEEE 23rd International Conference on Data Engineering (pp. 816-825).
Esling, P., & Agon, C. (2012). Time-series data mining. ACM Computing Surveys (CSUR), 45(1), 1–34.
To calculate this distance measure using ts, zoo or xts objects see TSDistances. To calculate distance matrices of time series databases using this measure see TSDatabaseDistances.
# The objects example.series1 and example.series2 are two
# numeric series of length 100 contained in the TSdist package.
data(example.series1)
data(example.series2)
# For information on their generation and shape see help
# page of example.series.
help(example.series)
# Calculate the Dissim distance assuming even sampling:
DissimDistance(example.series1, example.series2)
# Calculate the Dissim distance assuming uneven sampling:
tx <- unique(c(seq(2, 175, 2), seq(7, 175, 7)))
tx <- tx[order(tx)]
ty <- tx
DissimDistance(example.series1, example.series2, tx, ty)
Computes the Dynamic Time Warping distance between a pair of numeric time series.
DTWDistance(x, y, ...)
x | Numeric vector containing the first time series. |
y | Numeric vector containing the second time series. |
... | Additional parameters for the function, passed on to the wrapped function (see below). |
This is simply a wrapper for the dtw function of package dtw. As such, all the functionalities of the dtw function are also available when using this function.
d | The computed distance between the pair of series. |
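To show what the wrapped computation does, the sketch below implements the textbook dynamic-programming recurrence for DTW with an absolute-difference local cost and unit-weight steps (a symmetric1-style pattern). This is an illustrative assumption: the dtw package's default step pattern is symmetric2, which weights diagonal steps differently, so `dtw_dist` (a hypothetical name) will generally not return the same values as DTWDistance.

```r
dtw_dist <- function(x, y) {
  n <- length(x)
  m <- length(y)
  stopifnot(n > 0, m > 0)
  # Cumulative cost matrix with a padded first row and column
  D <- matrix(Inf, n + 1, m + 1)
  D[1, 1] <- 0
  for (i in 1:n) {
    for (j in 1:m) {
      cost <- abs(x[i] - y[j])
      # Best of diagonal, vertical and horizontal predecessors
      D[i + 1, j + 1] <- cost + min(D[i, j], D[i, j + 1], D[i + 1, j])
    }
  }
  D[n + 1, m + 1]
}

dtw_dist(c(1, 2, 3), c(1, 2, 2, 3))  # 0: the repeated 2 is absorbed by warping
dtw_dist(c(0, 0), c(1, 1))           # 2
```

The quadratic table is what makes DTW expensive on long series, which is why windowing constraints and lower bounds such as LB_Keogh are useful.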
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
Giorgino T (2009). Computing and Visualizing Dynamic Time Warping Alignments in R: The dtw Package. Journal of Statistical Software, 31(7), pp. 1-24. URL:http://www.jstatsoft.org/v31/i07/
Cuturi, M. (2011). Fast Global Alignment Kernels. In Proceedings of the 28th International Conference on Machine Learning (pp. 929–936).
Gaidon, A., Harchaoui, Z., & Schmid, C. (2011). A time series kernel for action recognition. In BMVC 2011 - British Machine Vision Conference (pp. 63.1–63.11).
Marteau, P.-F., & Gibet, S. (2014). On Recursive Edit Distance Kernels With Applications To Time Series Classification. IEEE Transactions on Neural Networks and Learning Systems, PP(6), 1–13.
Lei, H., & Sun, B. (2007). A Study on the Dynamic Time Warping in Kernel Machines. In 2007 Third International IEEE Conference on Signal-Image Technologies and Internet-Based System (pp. 839–845).
Pree, H., Herwig, B., Gruber, T., Sick, B., David, K., & Lukowicz, P. (2014). On general purpose time series similarity measures and their use as kernel functions in support vector machines. Information Sciences, 281, 478–495.
To calculate a lower bound of the DTW distance see LBKeoghDistance.
To calculate this distance measure using ts, zoo or xts objects see TSDistances. To calculate distance matrices of time series databases using this measure see TSDatabaseDistances.
# The objects example.series3 and example.series4 are two
# numeric series of length 100 and 120 contained in the TSdist
# package.
data(example.series3)
data(example.series4)
# For information on their generation and shape see
# help page of example.series.
help(example.series)
# Calculate the basic DTW distance for two series of different length.
DTWDistance(example.series3, example.series4)
# Calculate the DTW distance for two series of different length
# with a Sakoe-Chiba window of size 30:
DTWDistance(example.series3, example.series4, window.type="sakoechiba",
            window.size=30)
# Calculate the DTW distance for two series of different length
# with an asymmetric step pattern:
DTWDistance(example.series3, example.series4, step.pattern=asymmetric)
Computes the Edit Distance for Real Sequences between a pair of numeric time series.
EDRDistance(x, y, epsilon, sigma)
x | Numeric vector containing the first time series. |
y | Numeric vector containing the second time series. |
epsilon | A positive threshold value: two points closer than epsilon are considered equal. |
sigma | If desired, a Sakoe-Chiba windowing constraint can be added by specifying a positive integer representing the window size. |
The basic Edit Distance for Real Sequences between two numeric series is calculated. The idea is to count the number of edit operations (insert, delete, replace) that are necessary to transform one series into the other.
For that, if the Euclidean distance between two points $x_i$ and $y_j$ is smaller than epsilon, they are considered equal ($d(x_i, y_j) = 0$); if they are farther apart, they are considered different ($d(x_i, y_j) = 1$).
As a last detail, this distance permits gaps, that is, sequences of points that are not matched with any other point.
The lengths of series x and y may be different. Furthermore, if desired, a temporal constraint may be added to the EDR distance. In this package, only the most basic windowing function, introduced by H. Sakoe and S. Chiba (1978), is implemented. This function sets a band around the main diagonal of the distance matrix and prevents the matching of points that are farther apart in time than a specified window size $\sigma$.
The size of the window must be a positive integer value. Furthermore, the following condition must be fulfilled:
$$|\mathrm{length}(x) - \mathrm{length}(y)| < \sigma$$
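The edit-operation counting described above can be sketched in base R as a dynamic program. This is a minimal illustration, assuming the standard EDR recurrence (substitution cost 0 or 1 according to epsilon, cost 1 for each gap) and treating sigma as the maximum allowed |i - j|; the name `edr_dist` is hypothetical and the code is not TSdist's implementation.

```r
edr_dist <- function(x, y, epsilon, sigma = NULL) {
  n <- length(x)
  m <- length(y)
  stopifnot(n > 0, m > 0)
  D <- matrix(Inf, n + 1, m + 1)
  D[, 1] <- 0:n   # leaving the first i points of x unmatched costs i
  D[1, ] <- 0:m   # likewise for y
  for (i in 1:n) {
    for (j in 1:m) {
      # Skip cells outside the Sakoe-Chiba band (they stay Inf)
      if (!is.null(sigma) && abs(i - j) > sigma) next
      sub <- if (abs(x[i] - y[j]) < epsilon) 0 else 1
      D[i + 1, j + 1] <- min(D[i, j] + sub,     # match or replace
                             D[i, j + 1] + 1,   # gap in y
                             D[i + 1, j] + 1)   # gap in x
    }
  }
  D[n + 1, m + 1]
}

edr_dist(c(1, 2, 3), c(1, 5, 3), epsilon = 0.1)           # one replacement
edr_dist(c(1, 2), c(1, 2, 3), epsilon = 0.1, sigma = 1)   # one insertion
```

Because each mismatch costs at most 1 regardless of its size, EDR is less sensitive to outliers than Lp-based measures.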
d | The computed distance between the pair of series. |
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
Chen, L., Ozsu, M. T., & Oria, V. (2005). Robust and Fast Similarity Search for Moving Object Trajectories. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data (pp. 491-502).
Chen, L., & Ng, R. (2004). On The Marriage of Lp-norms and Edit Distance. In Proceedings of the Thirtieth International Conference on Very Large Data Bases (pp. 792–803).
Cuturi, M. (2011). Fast Global Alignment Kernels. In Proceedings of the 28th International Conference on Machine Learning (pp. 929–936).
Gaidon, A., Harchaoui, Z., & Schmid, C. (2011). A time series kernel for action recognition. In BMVC 2011 - British Machine Vision Conference (pp. 63.1–63.11).
Marteau, P.-F., & Gibet, S. (2014). On Recursive Edit Distance Kernels With Applications To Time Series Classification. IEEE Transactions on Neural Networks and Learning Systems, PP(6), 1–13.
Lei, H., & Sun, B. (2007). A Study on the Dynamic Time Warping in Kernel Machines. In 2007 Third International IEEE Conference on Signal-Image Technologies and Internet-Based System (pp. 839–845).
Pree, H., Herwig, B., Gruber, T., Sick, B., David, K., & Lukowicz, P. (2014). On general purpose time series similarity measures and their use as kernel functions in support vector machines. Information Sciences, 281, 478–495.
To calculate this distance measure using ts, zoo or xts objects, see TSDistances. To calculate distance matrices of time series databases using this measure, see TSDatabaseDistances.
# The objects example.series3 and example.series4 are two
# numeric series of length 100 and 120 contained in the TSdist
# package.
data(example.series3)
data(example.series4)
# For information on their generation and shape see
# help page of example.series.
help(example.series)
# Calculate the EDR distance for two series of different length
# with no windowing constraint:
EDRDistance(example.series3, example.series4, epsilon=0.1)
# Calculate the EDR distance for two series of different length
# with a window of size 30:
EDRDistance(example.series3, example.series4, epsilon=0.1, sigma=30)
Computes the Edit Distance with Real Penalty between a pair of numeric time series.
ERPDistance(x, y, g, sigma)
x | Numeric vector containing the first time series. |
y | Numeric vector containing the second time series. |
g | The reference value used to penalize gaps. |
sigma | If desired, a Sakoe-Chiba windowing constraint can be added by specifying a positive integer representing the window size. |
The basic Edit Distance with Real Penalty between two numeric series is calculated. Unlike other edit-based distances included in this package, this distance is a metric and fulfills the triangle inequality.
The idea is to search for the minimal path in a distance matrix that describes the mapping between the two series. This distance matrix is built using the Euclidean distance. However, unlike DTW, this distance permits gaps, that is, sequences of points that are not matched with any other point. These gaps are penalized based on the distance of the unmatched points from a reference value g.
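A minimal base R sketch of this idea is shown below, assuming the recursion of Chen & Ng (2004) with the absolute difference as the distance between scalar points. erp_sketch is a hypothetical helper, not the TSdist code, and it omits the optional windowing constraint.

```r
# Hypothetical helper, not part of TSdist: ERP between two numeric vectors.
# A point left unmatched (a gap) is penalized by its distance to g.
erp_sketch <- function(x, y, g = 0) {
  n <- length(x)
  m <- length(y)
  D <- matrix(0, nrow = n + 1, ncol = m + 1)
  # Boundaries: matching a prefix against nothing gaps every point in it.
  D[, 1] <- c(0, cumsum(abs(x - g)))
  D[1, ] <- c(0, cumsum(abs(y - g)))
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      D[i + 1, j + 1] <- min(D[i, j] + abs(x[i] - y[j]),  # match x[i] with y[j]
                             D[i, j + 1] + abs(x[i] - g),  # x[i] matched to a gap
                             D[i + 1, j] + abs(y[j] - g))  # y[j] matched to a gap
    }
  }
  D[n + 1, m + 1]
}

erp_sketch(c(1, 2, 3), c(1, 2), g = 0)  # 3
```

In the usage line the first two points match exactly and the unmatched 3 is penalized by its distance to g = 0. Because gaps are charged against a fixed reference value rather than a neighbouring point, the triangle inequality holds.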
As with other edit-based distances, the lengths of x and y may be different.
Furthermore, if desired, a temporal constraint may be added to the ERP distance. In this package, only the most basic windowing function, introduced by H. Sakoe and S. Chiba (1978), is implemented. This function sets a band around the main diagonal of the distance matrix and avoids matching points that are farther apart in time than a specified window size $\sigma$.
The size of the window must be a positive integer value. Furthermore, the following condition must be fulfilled: $|\mathrm{length}(x) - \mathrm{length}(y)| < \sigma$.
d | The computed distance between the pair of series. |
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
Chen, L., & Ng, R. (2004). On The Marriage of Lp-norms and Edit Distance. In Proceedings of the Thirtieth International Conference on Very Large Data Bases (pp. 792-803).
To calculate this distance measure using ts, zoo or xts objects, see TSDistances. To calculate distance matrices of time series databases using this measure, see TSDatabaseDistances.
# The objects example.series3 and example.series4 are two
# numeric series of length 100 and 120 contained in the TSdist
# package.
data(example.series3)
data(example.series4)
# For information on their generation and shape see
# help page of example.series.
help(example.series)
# Calculate the ERP distance for two series of different length
# with no windowing constraint:
ERPDistance(example.series3, example.series4, g=0)
# Calculate the ERP distance for two series of different length
# with a window of size 30:
ERPDistance(example.series3, example.series4, g=0, sigma=30)
Computes the Euclidean distance between a pair of numeric vectors.
EuclideanDistance(x, y)
x | Numeric vector containing the first time series. |
y | Numeric vector containing the second time series. |
The Euclidean distance is computed between the two numeric series using the following formula:
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2},$$
where $n$ is the length of the series.
The two series must have the same length. This distance is calculated with the help of the dist function of the proxy package.
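As a quick numerical check of the Euclidean formula, the sketch below compares the closed-form expression with a matrix-based distance function. It uses base R's stats::dist rather than proxy::dist (an assumption for the sake of a dependency-free example); both compute distances between the rows of a matrix.

```r
# Two random series of equal length (illustrative data only).
set.seed(123)
x <- rnorm(100)
y <- rnorm(100)

# Closed-form Euclidean distance.
d.formula <- sqrt(sum((x - y)^2))

# The same value via stats::dist on a two-row matrix.
d.dist <- as.numeric(dist(rbind(x, y), method = "euclidean"))

all.equal(d.formula, d.dist)  # TRUE
```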
d | The computed distance between the pair of series. |
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
David Meyer and Christian Buchta (2015). proxy: Distance and Similarity Measures. R package version 0.4-14. http://CRAN.R-project.org/package=proxy
This function can also be invoked by the wrapper function LPDistance.
Furthermore, to calculate this distance measure using ts, zoo or xts objects, see TSDistances. To calculate distance matrices of time series databases using this measure, see TSDatabaseDistances.
# The objects example.series1 and example.series2 are two
# numeric series of length 100 contained in the TSdist package.
data(example.series1)
data(example.series2)
# For information on their generation and shape see help
# page of example.series.
help(example.series)
# Compute the Euclidean distance between them:
EuclideanDistance(example.series1, example.series2)
Example database saved both as a numeric matrix and as a zoo object.
data(example.database); data(zoo.database);
example.database is saved in a numeric matrix.
zoo.database is saved as a zoo object with a given temporal index.
example.database is a numeric matrix composed of six ARMA(3,2) series of coefficients AR=(1, -0.24, 0.1) and MA=(1, 1.2) and length 100, stored row-wise. They are generated from innovation vectors obtained randomly from a normal distribution with mean 0 and standard deviation 1, using a different random seed for each series.
zoo.database is a copy of example.database, saved as a zoo object with a specific time index. In this object the series are stored column-wise.
data(example.database); data(zoo.database);
## In example.database the series are set in a row-wise format.
plot(example.database[1, ], type="l")
## In zoo.database the series are set in a column-wise format.
plot(zoo.database[, 1])
Example synthetic database with series belonging to 6 different classes.
data(example.database2);
example.database2 is a list composed of the following two elements:
data: The 100 time series, stored row-wise in a numeric matrix.
classes: A numeric vector of length 100 that takes values in {1,2,3,4,5,6}. Each element in the vector represents the class of one of the series.
example.database2 is a database composed of 100 series of length 100 obtained from 6 different classes. The class to which each series belongs is given in the classes vector. Each class is represented by the following function:
Class 1: random function
Class 2: periodic function
Class 3: increasing linear trend
Class 4: decreasing linear trend
Class 5: piecewise linear function which takes a value of for the first L/2+sh of the series and a value of
for the rest of the points.
Class 6: piecewise linear function which takes a value of for the first L/2+sh of the series and a value of
for the rest of the points.
is a random value issued from a
distribution,
is the length of the series, 100 in this case, and
is the period and is defined as a third of the length of the series.
is a random noise obtained from a
distribution. Finally,
is an integer value that takes a random value between
and shifts the series sh positions to the right or left, depending on the sign.
data(example.database2);
## The "data" element of the list contains the time series, set in a row-wise format.
plot(example.database2$data[1, ], type="l")
## The "classes" element in example.database2 contains the classes of the series:
example.database2$classes
Example synthetic database with ARMA series belonging to 5 different classes.
data(example.database3);
example.database3 is a list composed of the following two elements:
data: The 50 time series, stored row-wise in a numeric matrix.
classes: A numeric vector of length 50 that takes values in {1,2,3,4,5}. Each element in the vector represents the class of one of the series.
example.database3 is a database composed of 50 series of length 100 obtained from 5 different classes. Each class is obtained from a different initialization of an ARMA(3,2) process of coefficients AR=(1,-0.24,0.1) and MA=(1,1.2).
Random noise is added to all the series by sampling values from a distribution. R is obtained from the following formula:
Finally, all the series in the database are shifted positions to the right or left,
being a random integer value extracted from
in each case.
data(example.database3);
## The "data" element of the list contains the time series, set in a row-wise format.
plot(example.database3$data[1, ], type="l")
## The "classes" element in example.database3 contains the classes of the series:
example.database3$classes
Example series saved as numeric vectors and as zoo objects.
data(example.series1); data(example.series2); data(example.series3); data(example.series4); data(zoo.series1); data(zoo.series2);
example.series1, example.series2, example.series3 and example.series4 are saved in numeric vectors.
zoo.series1 and zoo.series2 are saved as zoo objects with a given temporal index.
example.series1 and example.series2 are generated based on the Two Patterns synthetic database introduced by Geurts (2002).
example.series3 and example.series4 are two ARMA(3,2) series of coefficients AR=(1, -0.24, 0.1) and MA=(1, 1.2) and length 100 and 120, respectively. They are generated from a pair of innovation vectors obtained randomly from a normal distribution with mean 0 and standard deviation 1, using different random seeds.
zoo.series1 and zoo.series2 are copies of example.series1 and example.series2, but with a specific time index.
Geurts, P. (2002). Contributions to decision tree induction: bias/variance tradeoff and time series classification. University of Liege, Belgium.
data(example.series1); data(example.series2); data(example.series3);
data(example.series4); data(zoo.series1); data(zoo.series2);
## Plot series
plot(example.series1, type="l")
plot(example.series2, type="l")
plot(example.series3, type="l")
plot(example.series4, type="l")
plot(zoo.series1)
plot(zoo.series2)
Computes the distance between a pair of numerical series based on their Discrete Fourier Transforms.
FourierDistance(x, y, n = (floor(length(x) / 2) + 1))
x | Numeric vector containing the first time series. |
y | Numeric vector containing the second time series. |
n | Positive integer that represents the number of Fourier coefficients to consider. (default=(floor(length(x) / 2) + 1)) |
The Euclidean distance between the first n Fourier coefficients of series x and y is computed. The series must have the same length. Furthermore, n should not be larger than the length of the series.
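The computation can be sketched in base R with stats::fft, as below. fourier_sketch is a hypothetical helper, not the TSdist code. It assumes the unnormalised DFT returned by stats::fft, so its values may differ from FourierDistance by a constant scaling.

```r
# Hypothetical helper: Euclidean distance between the first n coefficients
# of the (unnormalised) discrete Fourier transforms of x and y.
fourier_sketch <- function(x, y, n = floor(length(x) / 2) + 1) {
  stopifnot(length(x) == length(y), n >= 1, n <= length(x))
  fx <- fft(x)[seq_len(n)]  # first n complex coefficients of x
  fy <- fft(y)[seq_len(n)]  # first n complex coefficients of y
  sqrt(sum(Mod(fx - fy)^2))
}

set.seed(42)
x <- rnorm(64)
y <- rnorm(64)
fourier_sketch(x, y)          # default: first 33 coefficients
fourier_sketch(x, y, n = 20)  # only the first 20 coefficients
```

With all n = length(x) coefficients, Parseval's theorem makes the result equal to sqrt(length(x)) times the Euclidean distance between the series; keeping fewer coefficients drops non-negative terms, so the value can only decrease.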
d | The computed distance between the pair of series. |
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
Agrawal, R., Faloutsos, C., & Swami, A. (1993). Efficient similarity search in sequence databases. In Proceedings of the 4th International Conference of Foundations of Data Organization and Algorithms (Vol. 5, pp. 69-84).
To calculate this distance measure using ts, zoo or xts objects, see TSDistances. To calculate distance matrices of time series databases using this measure, see TSDatabaseDistances.
# The objects example.series1 and example.series2 are two
# numeric series of length 100 contained in the TSdist package.
data(example.series1)
data(example.series2)
# For information on their generation and shape see help
# page of example.series.
help(example.series)
# Calculate the Fourier coefficient based distance using
# the default number of coefficients:
FourierDistance(example.series1, example.series2)
# Calculate the Fourier coefficient based distance using
# only the first 20 Fourier coefficients:
FourierDistance(example.series1, example.series2, n=20)
Computes the Frechet distance between two numerical trajectories.
FrechetDistance(x, y, tx, ty, ...)
x | Numeric vector containing the first time series. |
y | Numeric vector containing the second time series. |
tx | If not constant, a numeric vector that specifies the sampling index of series x. |
ty | If not constant, a numeric vector that specifies the sampling index of series y. |
... | Additional parameters for the function. See distFrechet. |
This is essentially a wrapper for the distFrechet function of package longitudinalData. As such, all the functionalities of the distFrechet function are also available when using this function.
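To show what is being measured, the following base R sketch implements the discrete Frechet distance of Eiter & Mannila (1994), treating each series as the polygonal curve of points (t_i, x_i). frechet_sketch is a hypothetical helper, not distFrechet. The default time indices and the Euclidean point distance are assumptions, so its values need not coincide with FrechetDistance.

```r
# Hypothetical helper: discrete Frechet distance between the curves
# (tx[i], x[i]) and (ty[j], y[j]) using Euclidean distance between points.
frechet_sketch <- function(x, y, tx = seq_along(x), ty = seq_along(y)) {
  n <- length(x)
  m <- length(y)
  # Pairwise Euclidean distances between the points of both curves.
  d <- sqrt(outer(tx, ty, "-")^2 + outer(x, y, "-")^2)
  ca <- matrix(Inf, n, m)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      # Best coupling cost reaching (i, j) from an admissible predecessor.
      prev <- if (i == 1 && j == 1) {
        0
      } else {
        min(if (i > 1) ca[i - 1, j] else Inf,
            if (i > 1 && j > 1) ca[i - 1, j - 1] else Inf,
            if (j > 1) ca[i, j - 1] else Inf)
      }
      # A coupling is only as good as its longest "leash".
      ca[i, j] <- max(d[i, j], prev)
    }
  }
  ca[n, m]
}

frechet_sketch(c(0, 0, 0), c(1, 1, 1))  # 1
```

Unlike DTW, which sums the distances along the warping path, the Frechet distance keeps only the largest distance along the best coupling.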
d | The computed distance between the pair of series. |
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
Christophe Genolini (2014). longitudinalData: Longitudinal Data. R package version 2.2. http://CRAN.R-project.org/package=longitudinalData
Pablo Montero, José A. Vilar (2014). TSclust: An R Package for Time Series Clustering. Journal of Statistical Software, 62(1), 1-43. URL http://www.jstatsoft.org/v62/i01/.
Eiter, T., & Mannila, H. (1994). Computing Discrete Frechet Distance. Technical Report. Retrieved from http://www.kr.tuwien.ac.at/staff/eiter/et-archive/cdtr9464.pdf
To calculate this distance measure using ts, zoo or xts objects, see TSDistances. To calculate distance matrices of time series databases using this measure, see TSDatabaseDistances.
# The objects example.series3 and example.series4 are two
# numeric series of length 100 and 120, respectively.
data(example.series3)
data(example.series4)
# For information on their generation and shape see
# help page of example.series.
help(example.series)
# Calculate the Frechet distance between the series.
## Not run:
FrechetDistance(example.series3, example.series4)
Computes the infinite norm distance between a pair of numeric vectors.
InfNormDistance(x, y)
x | Numeric vector containing the first time series. |
y | Numeric vector containing the second time series. |
The infinite norm distance is computed between the two numeric series using the following formula:
$$d(x, y) = \max_{1 \le i \le n} |x_i - y_i|,$$
where $n$ is the length of the series.
The two series must have the same length. This distance is calculated with the help of the dist function of the proxy package.
d | The computed distance between the pair of series. |
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
David Meyer and Christian Buchta (2015). proxy: Distance and Similarity Measures. R package version 0.4-14. http://CRAN.R-project.org/package=proxy
This function can also be invoked by the wrapper function LPDistance.
To calculate this distance measure using ts, zoo or xts objects, see TSDistances. To calculate distance matrices of time series databases using this measure, see TSDatabaseDistances.
# The objects example.series1 and example.series2 are two
# numeric series of length 100 contained in the TSdist package.
data(example.series1)
data(example.series2)
# For information on their generation and shape see help
# page of example.series.
help(example.series)
# Compute the infinite norm distance between them:
InfNormDistance(example.series1, example.series2)
Calculates the dissimilarity between two numerical series of the same length based on the distance between their integrated periodograms.
IntPerDistance(x, y, ...)
x | Numeric vector containing the first time series. |
y | Numeric vector containing the second time series. |
... | Additional parameters for the function. See diss.INT.PER. |
This is simply a wrapper for the diss.INT.PER function of package TSclust. As such, all the functionalities of the diss.INT.PER function are also available when using this function.
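The idea of comparing integrated periodograms can be sketched in base R as below. This is an illustration under stated assumptions, not the diss.INT.PER code: the helper names are hypothetical, the periodogram is evaluated at the Fourier frequencies 2*pi*k/N for k = 1, ..., floor(N/2), each cumulative periodogram is normalized by its total, and the two curves are compared with an L1 distance.

```r
# Hypothetical helper: normalized cumulative (integrated) periodogram.
cum_periodogram <- function(x) {
  N <- length(x)
  k <- seq_len(floor(N / 2))          # Fourier frequencies 2*pi*k/N, k >= 1
  pgram <- Mod(fft(x)[k + 1])^2 / N   # periodogram ordinates (k = 0 dropped)
  cumsum(pgram) / sum(pgram)          # normalized so the curve ends at 1
}

# Hypothetical helper: L1 distance between the two normalized curves.
intper_sketch <- function(x, y) {
  stopifnot(length(x) == length(y))
  sum(abs(cum_periodogram(x) - cum_periodogram(y)))
}

set.seed(7)
x <- rnorm(100)
y <- rnorm(100)
intper_sketch(x, y)
```

Dropping the zero frequency and normalizing by the total makes this sketch insensitive to the mean and the scale of a series, so it compares only how the variance is distributed across frequencies.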
d | The computed distance between the pair of series. |
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
Pablo Montero, José A. Vilar (2014). TSclust: An R Package for Time Series Clustering. Journal of Statistical Software, 62(1), 1-43. URL http://www.jstatsoft.org/v62/i01/.
To calculate this distance measure using ts, zoo or xts objects, see TSDistances. To calculate distance matrices of time series databases using this measure, see TSDatabaseDistances.
# The objects example.series1 and example.series2 are two
# numeric series of length 100.
data(example.series1)
data(example.series2)
# For information on their generation and shape see
# help page of example.series.
help(example.series)
# Calculate the integrated periodogram based distance between
# the two series using the default parameters.
IntPerDistance(example.series1, example.series2)
Given a specific distance measure and a time series database, this function provides the K-medoids clustering result. Furthermore, if the ground truth clustering is provided, the associated F-value is also returned.
KMedoids(data, k, ground.truth, distance, ...)
data | Time series database saved in a numeric matrix, a list, an |
k | Integer value which represents the number of clusters. |
ground.truth | Numeric vector which indicates the ground truth clustering of the database. |
distance | Distance measure to be used. It must be one of: |
... | Additional parameters required by the chosen distance measure. |
This function is useful to evaluate the performance of different distance measures in the task of clustering time series.
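To make the clustering step concrete, the sketch below runs a simple alternating k-medoids algorithm on a precomputed distance matrix, such as one returned by TSDatabaseDistances or stats::dist. It is a swapped-in technique for illustration only. kmedoids_sketch is hypothetical, KMedoids may use a different algorithm (for example PAM) and initialization, and the F-value computation is omitted.

```r
# Hypothetical helper: alternating ("Voronoi iteration") k-medoids on a
# distance matrix D. Assumes distinct series, so no cluster becomes empty.
kmedoids_sketch <- function(D, k, max.iter = 100) {
  D <- as.matrix(D)
  n <- nrow(D)
  # Deterministic start: k indices spread evenly over 1..n.
  medoids <- unique(round(seq(1, n, length.out = k)))
  for (iter in seq_len(max.iter)) {
    # Assign each series to its closest medoid.
    clustering <- apply(D[, medoids, drop = FALSE], 1, which.min)
    # In each cluster, the member with the smallest total distance to the
    # other members becomes the new medoid.
    new.medoids <- sapply(seq_along(medoids), function(cl) {
      members <- which(clustering == cl)
      members[which.min(colSums(D[members, members, drop = FALSE]))]
    })
    if (all(new.medoids == medoids)) break
    medoids <- new.medoids
  }
  list(clustering = unname(clustering), medoids = unname(medoids))
}

# Ten constant series of length 10 forming two well-separated groups.
values <- c(0.1, 0.2, 0.3, 0.4, 0.5, 10.1, 10.2, 10.3, 10.4, 10.5)
tsdata <- outer(values, rep(1, 10))
res <- kmedoids_sketch(dist(tsdata), k = 2)
res$clustering  # 1 1 1 1 1 2 2 2 2 2
```

Because the algorithm only reads D, swapping the distance measure changes the clustering without touching the clustering code, which is what makes this kind of comparison between distance measures straightforward.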
clustering | Numeric vector providing the clustering result for the database. |
F | F-value corresponding to the clustering result. |
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
To calculate the distance matrices of time series databases, the TSDatabaseDistances function is used.
# The example.database3 synthetic database is loaded
data(example.database3)
tsdata <- example.database3[[1]]
groundt <- example.database3[[2]]
# Apply K-medoids clustering for different distance measures
KMedoids(data=tsdata, ground.truth=groundt, k=5, distance="euclidean")
KMedoids(data=tsdata, ground.truth=groundt, k=5, distance="cid")
KMedoids(data=tsdata, ground.truth=groundt, k=5, distance="pdc")
Computes the Keogh lower bound for the Dynamic Time Warping distance between a pair of numeric time series.
LBKeoghDistance(x, y, window.size)
x | Numeric vector containing the first time series (query time series). |
y | Numeric vector containing the second time series (reference time series). |
window.size | Window size that defines the upper and lower envelopes. |
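The lower bound introduced by Keogh and Ratanamahatana (2005) is calculated for the Dynamic Time Warping distance. Given window.size, the width of a Sakoe-Chiba band, an upper and a lower envelope of the query time series are calculated in the following manner:
$$U_i = \max(x_{i-r}, \ldots, x_{i+r}), \qquad L_i = \min(x_{i-r}, \ldots, x_{i+r}),$$
where $r$ is the half-width of the band defined by window.size.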
Based on this, the LB_Keogh distance is calculated as the Euclidean distance between the points of the reference time series (y) that fall outside the envelope (above the upper envelope or below the lower envelope) and their nearest point of the corresponding envelope.
The series must have the same length. Furthermore, the width of the window should be odd in order to ensure a symmetric band around the diagonal, and it should not exceed the length of the series.
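The envelope construction and the resulting bound can be sketched in base R as follows. lbkeogh_sketch is a hypothetical helper, not the TSdist code, and it assumes that the half-width of the band is floor(window.size / 2).

```r
# Hypothetical helper: LB_Keogh for equal-length series x (query) and
# y (reference), with envelope half-width r = floor(window.size / 2).
lbkeogh_sketch <- function(x, y, window.size) {
  stopifnot(length(x) == length(y))
  n <- length(x)
  r <- floor(window.size / 2)
  idx <- seq_len(n)
  # Upper and lower envelopes of the query over the window around i.
  upper <- sapply(idx, function(i) max(x[max(1, i - r):min(n, i + r)]))
  lower <- sapply(idx, function(i) min(x[max(1, i - r):min(n, i + r)]))
  # Points of y inside the envelope contribute 0; the others contribute
  # their distance to the nearest envelope.
  excess <- ifelse(y > upper, y - upper, ifelse(y < lower, lower - y, 0))
  sqrt(sum(excess^2))
}

set.seed(3)
x <- cumsum(rnorm(50))
y <- cumsum(rnorm(50))
lbkeogh_sketch(x, y, window.size = 11)
```

Because each envelope interval contains the corresponding query point, the bound never exceeds the Euclidean distance between x and y, and widening the window can only lower it. With window.size = 1 the envelope collapses onto x and the bound equals the Euclidean distance.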
d | The Keogh lower bound of the Dynamic Time Warping distance between the pair of series. |
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
Keogh, E., & Ratanamahatana, C. A. (2005). Exact indexing of dynamic time warping. Knowledge and Information Systems, 7(3), 358-386.
Sakoe, H., & Chiba, S. (1978). Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 26(1), 43-49.
Esling, P., & Agon, C. (2012). Time-series data mining. ACM Computing Surveys (CSUR), 45(1), 1–34.
To calculate the full DTW distance, see DTWDistance.
To calculate this distance measure using ts, zoo or xts objects, see TSDistances. To calculate distance matrices of time series databases using this measure, see TSDatabaseDistances.
# The objects example.series1 and example.series2 are two
# numeric series of length 100 contained in the TSdist package.
data(example.series1)
data(example.series2)
# For information on their generation and shape see help
# page of example.series.
help(example.series)
# Calculate the LB_Keogh distance measure for these two series
# with a band of width 11:
LBKeoghDistance(example.series1, example.series2, window.size=11)
Computes the Longest Common Subsequence distance between a pair of numeric time series.
LCSSDistance(x, y, epsilon, sigma)
x | Numeric vector containing the first time series. |
y | Numeric vector containing the second time series. |
epsilon | A positive threshold value that defines the distance. |
sigma | If desired, a Sakoe-Chiba windowing constraint can be added by specifying a positive integer representing the window size. |
The Longest Common Subsequence for two real sequences is computed. For this purpose, the distances between the points of x and y are reduced to 0 or 1. If the Euclidean distance between two points $x_i$ and $y_j$ is smaller than epsilon, they are considered equal and their distance is reduced to 0. Otherwise, the distance between them is represented with a value of 1.
Once the distance matrix is defined in this manner, the longest common subsequence is sought. As in other edit-based distances, gaps or unmatched regions are permitted, and they are penalized with a value proportional to their length.
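The following base R sketch computes the length of the longest common subsequence under the epsilon-matching rule described above, with an optional Sakoe-Chiba window sigma. The helper names are hypothetical. lcss_distance assumes the normalization 1 - LCSS / min(length(x), length(y)) used by Vlachos et al. (2002); LCSSDistance in TSdist may scale its result differently.

```r
# Hypothetical helper: LCSS length with epsilon matching and an optional
# window sigma (points more than sigma positions apart never match).
lcss_length <- function(x, y, epsilon, sigma = Inf) {
  n <- length(x)
  m <- length(y)
  L <- matrix(0L, nrow = n + 1, ncol = m + 1)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      if (abs(i - j) <= sigma && abs(x[i] - y[j]) < epsilon) {
        L[i + 1, j + 1] <- L[i, j] + 1L                  # extend a match
      } else {
        L[i + 1, j + 1] <- max(L[i, j + 1], L[i + 1, j])  # skip a point
      }
    }
  }
  L[n + 1, m + 1]
}

# Hypothetical helper: turn the similarity into a distance in [0, 1].
lcss_distance <- function(x, y, epsilon, sigma = Inf) {
  1 - lcss_length(x, y, epsilon, sigma) / min(length(x), length(y))
}

x <- c(1, 2, 3, 4, 5)
y <- c(3, 4, 5, 8, 9)
lcss_length(x, y, epsilon = 0.5)             # 3
lcss_length(x, y, epsilon = 0.5, sigma = 1)  # 0
```

The two calls show the effect of the window: the matching values 3, 4 and 5 sit two positions apart in x and y, so a window of size 1 forbids every match.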
Based on its definition, the lengths of series x and y may be different.
If desired, a temporal constraint may be added to the LCSS distance. In this package, only the most basic windowing function, introduced by H. Sakoe and S. Chiba (1978), is implemented. This function sets a band around the main diagonal of the distance matrix and avoids matching points that are farther apart in time than a specified window size $\sigma$.
The size of the window must be a positive integer value. Furthermore, the following condition must be fulfilled: $|\mathrm{length}(x) - \mathrm{length}(y)| < \sigma$.
d | The computed distance between the pair of series. |
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
Vlachos, M., Kollios, G., & Gunopulos, D. (2002). Discovering similar multidimensional trajectories. In Proceedings 18th International Conference on Data Engineering (pp. 673-684). IEEE Comput. Soc. doi:10.1109/ICDE.2002.994784
Chen, L., & Ng, R. (2004). On The Marriage of Lp-norms and Edit Distance. In Proceedings of the Thirtieth International Conference on Very Large Data Bases (pp. 792–803).
Cuturi, M. (2011). Fast Global Alignment Kernels. In Proceedings of the 28th International Conference on Machine Learning (pp. 929–936).
Gaidon, A., Harchaoui, Z., & Schmid, C. (2011). A time series kernel for action recognition. In BMVC 2011 - British Machine Vision Conference (pp. 63.1–63.11).
Marteau, P.-F., & Gibet, S. (2014). On Recursive Edit Distance Kernels With Applications To Time Series Classification. IEEE Transactions on Neural Networks and Learning Systems, PP(6), 1–13.
Lei, H., & Sun, B. (2007). A Study on the Dynamic Time Warping in Kernel Machines. In 2007 Third International IEEE Conference on Signal-Image Technologies and Internet-Based System (pp. 839–845).
Pree, H., Herwig, B., Gruber, T., Sick, B., David, K., & Lukowicz, P. (2014). On general purpose time series similarity measures and their use as kernel functions in support vector machines. Information Sciences, 281, 478–495.
To calculate this distance measure using ts, zoo or xts objects, see TSDistances. To calculate distance matrices of time series databases using this measure, see TSDatabaseDistances.
# The objects example.series3 and example.series4 are two
# numeric series of length 100 and 120 contained in the TSdist
# package.
data(example.series3)
data(example.series4)
# For information on their generation and shape see
# help page of example.series.
help(example.series)
# Calculate the LCSS distance for two series of different length
# with no windowing constraint:
LCSSDistance(example.series3, example.series4, epsilon=0.1)
# Calculate the LCSS distance for two series of different length
# with a window of size 30:
LCSSDistance(example.series3, example.series4, epsilon=0.1, sigma=30)
Computes the distance based on the chosen Lp norm between a pair of numeric vectors.
LPDistance(x, y, method="euclidean", ...)
x | Numeric vector containing the first time series. |
y | Numeric vector containing the second time series. |
method | A value in "euclidean", "manhattan", "infnorm", "minkowski". |
... | If method="minkowski", a positive integer value must be specified for p. |
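The distances based on Lp norms are computed between two numeric vectors of length $n$ using the following formulas:
Euclidean distance: $d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
Manhattan distance: $d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$
Infinite norm distance: $d(x, y) = \max_{1 \le i \le n} |x_i - y_i|$
Minkowski distance: $d(x, y) = \left(\sum_{i=1}^{n} |x_i - y_i|^p\right)^{1/p}$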
The two series must have the same length. Furthermore, in the case of the Minkowski distance, p must be specified as a positive integer value.
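The four Lp distances listed above can be sketched in a few lines of base R. lp_sketch is a hypothetical helper that mirrors the method names of LPDistance but is not its implementation; the Minkowski case is cross-checked against stats::dist.

```r
# Hypothetical helper: Lp distances between two equal-length vectors.
lp_sketch <- function(x, y, method = "euclidean", p) {
  stopifnot(length(x) == length(y))
  diffs <- abs(x - y)
  switch(method,
         euclidean = sqrt(sum(diffs^2)),     # p = 2
         manhattan = sum(diffs),             # p = 1
         infnorm   = max(diffs),             # limit as p -> Inf
         minkowski = sum(diffs^p)^(1 / p),   # general p
         stop("unknown method: ", method))
}

x <- c(0, 3, 1)
y <- c(4, 0, 1)
lp_sketch(x, y)               # 5
lp_sketch(x, y, "manhattan")  # 7
lp_sketch(x, y, "infnorm")    # 4
lp_sketch(x, y, "minkowski", p = 3)
```

The Euclidean and Manhattan cases are the Minkowski distance with p = 2 and p = 1, and the infinite norm is its limit as p grows, which is why LPDistance can expose all four through a single method argument.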
d | The computed distance between the pair of series. |
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
These distances are also implemented in separate functions. For more information see EuclideanDistance, ManhattanDistance, MinkowskiDistance and InfNormDistance.
To calculate this distance measure using ts, zoo or xts objects, see TSDistances. To calculate distance matrices of time series databases using this measure, see TSDatabaseDistances.
# The objects example.series1 and example.series2 are two
# numeric series of length 100 contained in the TSdist package.
data(example.series1)
data(example.series2)
# For information on their generation and shape see help
# page of example.series.
help(example.series)
# Compute the different Lp distances
# Euclidean distance
LPDistance(example.series1, example.series2, method="euclidean")
# Manhattan distance
LPDistance(example.series1, example.series2, method="manhattan")
# Infinite norm distance
LPDistance(example.series1, example.series2, method="infnorm")
# Minkowski distance with p=3.
LPDistance(example.series1, example.series2, method="minkowski", p=3)
Computes the Manhattan distance between a pair of numeric vectors.
ManhattanDistance(x, y)
x |
Numeric vector containing the first time series. |
y |
Numeric vector containing the second time series. |
The Manhattan distance is computed between the two numeric series using the following formula:
$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$
The two series must have the same length. This distance is calculated with the help of the dist function of the proxy package.
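Computing the distance "with the help of a dist function" amounts to stacking the two series as the rows of a matrix and asking for the distance between those two rows. The sketch below uses base R's stats::dist in place of proxy's dist, an assumption made only to keep the example dependency-free; the helper name is hypothetical.

```r
# Hypothetical helper: stack the two series as the rows of a 2 x n matrix;
# dist() then returns the single distance between row 1 and row 2.
manhattan_via_dist <- function(x, y) {
  stopifnot(length(x) == length(y))
  as.numeric(stats::dist(rbind(x, y), method = "manhattan"))
}

set.seed(1)
x <- cumsum(rnorm(100))
y <- cumsum(rnorm(100))
# Both routes give the same value: sum_i |x_i - y_i|
c(dist = manhattan_via_dist(x, y), formula = sum(abs(x - y)))
```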
d |
The computed distance between the pair of series. |
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
David Meyer and Christian Buchta (2015). proxy: Distance and Similarity Measures. R package version 0.4-14. http://CRAN.R-project.org/package=proxy
This function can also be invoked by the wrapper function LPDistance.
Furthermore, to calculate this distance measure using ts, zoo or xts objects, see TSDistances. To calculate distance matrices of time series databases using this measure, see TSDatabaseDistances.
library(TSdist)
# The objects example.series1 and example.series2 are two
# numeric series of length 100 contained in the TSdist package.
data(example.series1)
data(example.series2)
# For information on their generation and shape see help
# page of example.series.
help(example.series)
# Compute the Manhattan distance between them:
ManhattanDistance(example.series1, example.series2)
Calculates the dissimilarity between two numerical series based on the distance between their SAX representations.
MindistSaxDistance(x, y, w, ...)
x |
Numeric vector containing the first time series. |
y |
Numeric vector containing the second time series. |
w |
The number of equal-sized windows that the series will be reduced to. |
... |
Additional parameters for the function. See |
This is simply a wrapper for the diss.MINDIST.SAX function of package TSclust. As such, all the functionalities of the diss.MINDIST.SAX function are also available when using this function.
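diss.MINDIST.SAX itself is not reproduced here. As a rough illustration of the technique it wraps, the sketch below z-normalizes each series, reduces it to w segment means (PAA), maps each mean to one of alpha symbols using equiprobable N(0, 1) breakpoints, and applies the MINDIST distance defined for SAX by Lin et al. (2003). All function names are hypothetical, alpha = 4 is an assumed default, and, unlike the TSclust function, the sketch requires equal-length series whose length is a multiple of w.

```r
# Hypothetical sketch of SAX + MINDIST; not the TSclust implementation.
paa <- function(x, w) {
  n <- length(x)
  stopifnot(n %% w == 0)                     # simplifying assumption
  # Each column holds n / w consecutive values; its mean is one segment
  colMeans(matrix(x, nrow = n / w))
}

sax_symbols <- function(x, w, alpha = 4) {
  z <- (x - mean(x)) / sd(x)                 # z-normalize the series
  beta <- qnorm(seq_len(alpha - 1) / alpha)  # equiprobable breakpoints
  findInterval(paa(z, w), beta) + 1          # symbols coded 1..alpha
}

mindist_sax <- function(x, y, w, alpha = 4) {
  stopifnot(length(x) == length(y))
  n <- length(x)
  beta <- qnorm(seq_len(alpha - 1) / alpha)
  sx <- sax_symbols(x, w, alpha)
  sy <- sax_symbols(y, w, alpha)
  hi <- pmax(sx, sy)
  lo <- pmin(sx, sy)
  far <- (hi - lo) > 1
  # Cell distance: 0 for equal or adjacent symbols, otherwise the gap
  # between the breakpoints that separate the two symbols
  cell <- numeric(length(sx))
  cell[far] <- beta[hi[far] - 1] - beta[lo[far]]
  sqrt(n / w) * sqrt(sum(cell^2))
}

set.seed(42)
a <- cumsum(rnorm(100))
b <- cumsum(rnorm(100))
mindist_sax(a, b, w = 20)
```

MINDIST is designed to lower-bound the Euclidean distance between the z-normalized series, which is why it can be used to prune searches cheaply.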
d |
The computed distance between the pair of series. |
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
Pablo Montero, José A. Vilar (2014). TSclust: An R Package for Time Series Clustering. Journal of Statistical Software, 62(1), 1-43. URL http://www.jstatsoft.org/v62/i01/.
To calculate this distance measure using ts, zoo or xts objects, see TSDistances. To calculate distance matrices of time series databases using this measure, see TSDatabaseDistances.
library(TSdist)
# The objects example.series3 and example.series4 are two
# numeric series of length 100 and 120 respectively.
data(example.series3)
data(example.series4)
# For information on their generation and shape see
# help page of example.series.
help(example.series)
# Calculate the mindist.sax distance between the two series using
# 20 equal-sized windows for each series. The rest of the parameters
# are left at their default values.
MindistSaxDistance(example.series3, example.series4, w=20)
Computes the Minkowski distance between two numeric vectors for a given p.
MinkowskiDistance(x, y, p)
x |
Numeric vector containing the first time series. |
y |
Numeric vector containing the second time series. |
p |
A strictly positive integer value that defines the chosen |
The Minkowski distance is computed between the two numeric series using the following formula:
$$d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$$
The two series must have the same length and p must be a positive integer value. This distance is calculated with the help of the dist function of the proxy package.
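The role of p can be seen by evaluating the formula for several values. This is a minimal base-R sketch with a hypothetical helper name; the last line cross-checks one value against base R's stats::dist, which also offers a "minkowski" method.

```r
# Hypothetical helper: Minkowski distance for a positive integer p.
minkowski <- function(x, y, p) {
  stopifnot(length(x) == length(y), p >= 1, p == round(p))
  sum(abs(x - y)^p)^(1 / p)
}

x <- c(0, 1, 5, 2)
y <- c(1, 1, 1, 0)
ps <- c(1, 2, 3, 10, 50)
vals <- sapply(ps, function(p) minkowski(x, y, p))
names(vals) <- paste0("p=", ps)
vals
# Same value as base R's dist() for p = 3
as.numeric(stats::dist(rbind(x, y), method = "minkowski", p = 3))
```

With p = 1 and p = 2 the function returns the Manhattan (7) and Euclidean (sqrt(21)) distances of this pair; as p grows the values decrease towards max(abs(x - y)) = 4, the infinite norm distance.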
d |
The computed distance between the pair of series. |
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
David Meyer and Christian Buchta (2015). proxy: Distance and Similarity Measures. R package version 0.4-14. http://CRAN.R-project.org/package=proxy
This function can also be invoked by the wrapper function LPDistance.
Furthermore, to calculate this distance measure using ts, zoo or xts objects, see TSDistances. To calculate distance matrices of time series databases using this measure, see TSDatabaseDistances.
library(TSdist)
# The objects example.series1 and example.series2 are two
# numeric series of length 100 contained in the TSdist package.
data(example.series1)
data(example.series2)
# For information on their generation and shape see help
# page of example.series.
help(example.series)
# Compute the Minkowski distance between them:
MinkowskiDistance(example.series1, example.series2, p=3)
Calculates a normalized distance between two numerical series based on their compressed sizes.
NCDDistance(x, y, ...)
x |
Numeric vector containing the first time series. |
y |
Numeric vector containing the second time series. |
... |
Additional parameters for the function. See |
This is simply a wrapper for the diss.NCD function of package TSclust. As such, all the functionalities of the diss.NCD function are also available when using this function.
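The normalized compression distance is commonly defined as NCD(x, y) = (C(xy) - min{C(x), C(y)}) / max{C(x), C(y)}, where C(.) is a compressed size and xy is the concatenation of the two inputs. The sketch below computes this quantity with base R's memCompress; it is not diss.NCD, and the way the series are turned into bytes (their values printed with 6 significant digits) is an assumption that need not match TSclust.

```r
# Hypothetical sketch of the normalized compression distance (NCD);
# the byte encoding of the series is an assumption, not diss.NCD's.
ncd_sketch <- function(x, y, type = "gzip") {
  # Serialize a numeric series as text with 6 significant digits
  to_bytes <- function(v) charToRaw(paste(signif(v, 6), collapse = " "))
  # Compressed size in bytes, using base R's memCompress()
  csize <- function(bytes) length(memCompress(bytes, type = type))
  bx <- to_bytes(x)
  by <- to_bytes(y)
  cx <- csize(bx)
  cy <- csize(by)
  cxy <- csize(c(bx, by))  # compressed size of the concatenation xy
  (cxy - min(cx, cy)) / max(cx, cy)
}

set.seed(7)
x <- rnorm(200)
y <- rnorm(200)
c(same = ncd_sketch(x, x), different = ncd_sketch(x, y))
```

A series compared with itself compresses almost for free when concatenated, so its NCD is close to 0, while two unrelated series give a value much closer to 1.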
d |
The computed distance between the pair of series. |
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
Pablo Montero, José A. Vilar (2014). TSclust: An R Package for Time Series Clustering. Journal of Statistical Software, 62(1), 1-43. URL http://www.jstatsoft.org/v62/i01/.
To calculate this distance measure using ts, zoo or xts objects, see TSDistances. To calculate distance matrices of time series databases using this measure, see TSDatabaseDistances.
library(TSdist)
# The objects example.series3 and example.series4 are two
# numeric series of length 100 and 120 respectively.
data(example.series3)
data(example.series4)
# For information on their generation and shape see
# help page of example.series.
help(example.series)
# Calculate the normalized compression based distance between the two series
# using the default parameters.
NCDDistance(example.series3, example.series4)
Given a specific distance measure, this function provides the 1NN classification values and the associated error for a specific train/test pair of time series databases.
OneNN(train, trainc, test, testc, distance, ...)
train |
Time series database saved in a numeric matrix, a list, an |
trainc |
Numerical vector which indicates the class of each of the series in the training set. |
test |
Time series database saved in a numeric matrix, a list, an |
testc |
Numerical vector which indicates the class of each of the series in the testing set. |
distance |
Distance measure to be used. It must be one of: |
... |
Additional parameters required by the chosen distance measure. |
This function is useful to evaluate the performance of different distance measures in the task of classification of time series.
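The 1NN rule itself is simple: each test series receives the class of its closest training series. The sketch below illustrates it in base R under stated assumptions: it is not TSdist's OneNN, it fixes the distance to Euclidean, stores series as matrix rows, and takes the error to be the misclassification rate. The helper name is hypothetical.

```r
# Hypothetical 1NN sketch (not TSdist's OneNN): rows of `train` and `test`
# are series, the distance is Euclidean and the error is the
# misclassification rate.
one_nn_sketch <- function(train, trainc, test, testc) {
  classes <- apply(test, 1, function(s) {
    # Euclidean distance from series s to every training series
    d <- sqrt(rowSums(sweep(train, 2, s)^2))
    trainc[which.min(d)]  # class of the nearest training series
  })
  list(classes = classes, error = mean(classes != testc))
}

train <- rbind(c(0, 0, 0), c(10, 10, 10))
test <- rbind(c(1, 0, 1), c(9, 11, 10), c(0, 1, 0))
one_nn_sketch(train, c(1, 2), test, testc = c(1, 2, 2))
```

The third test series lies next to the class-1 training series but is labelled 2, so the sketch predicts classes 1, 2, 1 and reports an error of 1/3.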
classes |
Numerical vector providing the predicted class values for the series in the test set. |
error |
Error obtained in the 1NN classification process. |
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
The distance matrices of the time series databases are calculated with TSDatabaseDistances.
library(TSdist)
# The example.database2 synthetic database is loaded
data(example.database2)
# Create train/test by dividing the dataset 70%-30%
set.seed(100)
trainindex <- sample(1:100, 70, replace=FALSE)
train <- example.database2[[1]][trainindex, ]
test <- example.database2[[1]][-trainindex, ]
trainclass <- example.database2[[2]][trainindex]
testclass <- example.database2[[2]][-trainindex]
# Apply the 1NN classifier for different distance measures
OneNN(train, trainclass, test, testclass, "euclidean")
OneNN(train, trainclass, test, testclass, "pdc")
Computes the dissimilarity between a pair of numeric time series based on their estimated partial autocorrelation coefficients.
PACFDistance(x, y, ...)
x |
Numeric vector containing the first time series. |
y |
Numeric vector containing the second time series. |
... |
Additional parameters for the function. See |
This is simply a wrapper for the diss.PACF function of package TSclust. As such, all the functionalities of the diss.PACF function are also available when using this function.
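The idea behind diss.PACF is to compare the estimated partial autocorrelation coefficients of the two series. As a sketch (not TSclust's code), the version below uses stats::pacf and an unweighted Euclidean distance; the helper name and lag.max = 10 are assumptions, and TSclust's own lag choice and optional weights may differ. Because only coefficients are compared, the series may have different lengths.

```r
# Hypothetical sketch: unweighted Euclidean distance between the estimated
# partial autocorrelations at lags 1..lag.max.
pacf_distance_sketch <- function(x, y, lag.max = 10) {
  px <- as.vector(stats::pacf(x, lag.max = lag.max, plot = FALSE)$acf)
  py <- as.vector(stats::pacf(y, lag.max = lag.max, plot = FALSE)$acf)
  sqrt(sum((px - py)^2))
}

set.seed(3)
a <- arima.sim(list(ar = 0.8), n = 100)
b <- arima.sim(list(ar = -0.5), n = 120)  # lengths may differ
pacf_distance_sketch(a, b)
```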
d |
The computed distance between the pair of series. |
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
Pablo Montero, José A. Vilar (2014). TSclust: An R Package for Time Series Clustering. Journal of Statistical Software, 62(1), 1-43. URL http://www.jstatsoft.org/v62/i01/.
To calculate this distance measure using ts, zoo or xts objects, see TSDistances. To calculate distance matrices of time series databases using this measure, see TSDatabaseDistances.
library(TSdist)
# The objects example.series3 and example.series4 are two
# numeric series of length 100 and 120 contained in the
# TSdist package.
data(example.series3)
data(example.series4)
# For information on their generation and shape see
# help page of example.series.
help(example.series)
# Calculate the partial autocorrelation based distance between the two
# series using the default parameters:
PACFDistance(example.series3, example.series4)
Calculates the permutation distribution distance between two numerical series of the same length.
PDCDistance(x, y, ...)
x |
Numeric vector containing the first time series. |
y |
Numeric vector containing the second time series. |
... |
Additional parameters for the function. See |
This is simply a wrapper for the pdcDist function of package pdc. As such, all the functionalities of the pdcDist function are also available when using this function.
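A permutation distribution compares how often each ordinal pattern (the rank order of m consecutive values) appears in each series. The sketch below illustrates that idea only: it is not pdc's pdcDist, which selects the embedding dimension automatically and uses its own divergence. Here m = 3 is fixed and the Hellinger distance is used, both assumptions; the helper names are hypothetical.

```r
# Rank pattern of every window of m consecutive values, e.g. "231"
ordinal_patterns <- function(x, m = 3) {
  vapply(seq_len(length(x) - m + 1),
         function(i) paste(order(x[i:(i + m - 1)]), collapse = ""),
         character(1))
}

# Hypothetical sketch: Hellinger distance between the two pattern
# distributions (0 = identical distributions, 1 = disjoint supports).
pd_sketch <- function(x, y, m = 3) {
  px <- ordinal_patterns(x, m)
  py <- ordinal_patterns(y, m)
  lv <- union(px, py)
  fx <- as.numeric(table(factor(px, levels = lv))) / length(px)
  fy <- as.numeric(table(factor(py, levels = lv))) / length(py)
  sqrt(sum((sqrt(fx) - sqrt(fy))^2) / 2)
}

set.seed(9)
pd_sketch(cumsum(rnorm(100)), cumsum(rnorm(100)))
```

A strictly increasing series only produces the pattern "123" and a strictly decreasing one only "321", so their distance under this sketch is 1.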
d |
The computed distance between the pair of series. |
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
Andreas M. Brandmaier (2015). pdc: An R package for Complexity-Based Clustering of Time Series. Journal of Statistical Software, Vol 67, Issue 5.
Pablo Montero, José A. Vilar (2014). TSclust: An R Package for Time Series Clustering. Journal of Statistical Software, 62(1), 1-43. URL http://www.jstatsoft.org/v62/i01/.
To calculate this distance measure using ts, zoo or xts objects, see TSDistances. To calculate distance matrices of time series databases using this measure, see TSDatabaseDistances.
library(TSdist)
# The objects example.series1 and example.series2 are two
# numeric series of length 100.
data(example.series1)
data(example.series2)
# For information on their generation and shape see
# help page of example.series.
help(example.series)
# Calculate the permutation distribution distance between the two series
# using the default parameters.
PDCDistance(example.series1, example.series2)
Calculates the dissimilarity between two numerical series of the same length based on the distance between their periodograms.
PerDistance(x, y, ...)
x |
Numeric vector containing the first time series. |
y |
Numeric vector containing the second time series. |
... |
Additional parameters for the function. See |
This is simply a wrapper for the diss.PER function of package TSclust. As such, all the functionalities of the diss.PER function are also available when using this function.
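The periodogram-based dissimilarity compares the periodograms of the two series. As a sketch of that idea (not TSclust's code; any normalization or log-scaling options of diss.PER are not reproduced, and the helper names are hypothetical), the raw periodogram at the Fourier frequencies 2πk/n, k = 1, ..., floor(n/2), can be obtained from fft() and the two periodograms compared with the Euclidean distance.

```r
# Hypothetical helper: raw periodogram |DFT|^2 / n at Fourier frequencies
# k = 1..floor(n/2); fft(x)[1] is frequency 0 and is skipped.
periodogram <- function(x) {
  n <- length(x)
  k <- seq_len(floor(n / 2))
  Mod(fft(x)[k + 1])^2 / n
}

per_distance_sketch <- function(x, y) {
  stopifnot(length(x) == length(y))   # same Fourier frequencies needed
  sqrt(sum((periodogram(x) - periodogram(y))^2))
}

s <- sin(2 * pi * 5 * (1:100) / 100)  # exactly 5 cycles in 100 samples
set.seed(5)
u <- rnorm(100)
v <- s + rnorm(100, sd = 0.5)
per_distance_sketch(u, v)
```

The equal-length requirement comes from comparing the periodograms frequency by frequency, which is also why the original function is restricted to series of the same length.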
d |
The computed distance between the pair of series. |
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
Pablo Montero, José A. Vilar (2014). TSclust: An R Package for Time Series Clustering. Journal of Statistical Software, 62(1), 1-43. URL http://www.jstatsoft.org/v62/i01/.
To calculate this distance measure using ts, zoo or xts objects, see TSDistances. To calculate distance matrices of time series databases using this measure, see TSDatabaseDistances.
library(TSdist)
# The objects example.series1 and example.series2 are two
# numeric series of length 100.
data(example.series1)
data(example.series2)
# For information on their generation and shape see
# help page of example.series.
help(example.series)
# Calculate the periodogram based distance between the two series using
# the default parameters.
PerDistance(example.series1, example.series2)
The dissimilarity of two numerical series of the same length is calculated based on the L1 distance between the kernel estimators of their forecast densities at a given time horizon.
PredDistance(x, y, h, ...)
x |
Numeric vector containing the first time series. |
y |
Numeric vector containing the second time series. |
h |
Integer value representing the prediction horizon. |
... |
Additional parameters for the function. See |
This is simply a wrapper for the diss.PRED function of package TSclust. As such, all the functionalities of the diss.PRED function are also available when using this function.
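diss.PRED obtains the forecast densities of both series at horizon h before comparing them; the sketch below covers only the final comparison step, the L1 distance between two kernel density estimates. It takes two samples of forecasts as given (here simple random draws standing in for forecasts, an assumption), evaluates both estimates with stats::density on a common grid and integrates the absolute difference numerically. It is not TSclust's code, and the helper name is hypothetical.

```r
# Hypothetical sketch of the final step only: L1 distance between two
# kernel density estimates evaluated on a common grid.
l1_kde_distance <- function(fx, fy, n_grid = 2048) {
  pad <- 3 * max(sd(fx), sd(fy))
  lo <- min(fx, fy) - pad
  hi <- max(fx, fy) + pad
  dx <- stats::density(fx, from = lo, to = hi, n = n_grid)
  dy <- stats::density(fy, from = lo, to = hi, n = n_grid)
  step <- dx$x[2] - dx$x[1]
  sum(abs(dx$y - dy$y)) * step   # Riemann approximation of the integral
}

set.seed(11)
# Stand-ins for forecasts of two series at some horizon h
fx <- rnorm(500, mean = 0)
fy <- rnorm(500, mean = 1)
l1_kde_distance(fx, fy)
```

The L1 distance between two densities lies between 0 (identical) and 2 (no overlap), which gives a natural scale for interpreting the result.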
d |
The computed distance between the pair of series. |
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
Pablo Montero, José A. Vilar (2014). TSclust: An R Package for Time Series Clustering. Journal of Statistical Software, 62(1), 1-43. URL http://www.jstatsoft.org/v62/i01/.
To calculate this distance measure using ts, zoo or xts objects, see TSDistances. To calculate distance matrices of time series databases using this measure, see TSDatabaseDistances.
library(TSdist)
# The objects example.series1 and example.series2 are two
# numeric series of length 100.
data(example.series1)
data(example.series2)
# For information on their generation and shape see
# help page of example.series.
help(example.series)
# Calculate the prediction based distance between the two series using
# the default parameters.
PredDistance(example.series1, example.series2)
The dissimilarity of two numerical series of the same length is calculated based on an adaptation of the generalized likelihood ratio test.
SpecGLKDistance(x, y, ...)
x |
Numeric vector containing the first time series. |
y |
Numeric vector containing the second time series. |
... |
Additional parameters for the function. See |
This function is intended to be a simple wrapper for the diss.SPEC.GLK function of package TSclust. However, in version 1.2.3 of the TSclust package we found an error in the call to this function. Therefore, in this version, the more general diss function, designed for distance matrix calculations of time series databases, is used to calculate the spec.glk distance between two series. Once this bug is fixed in the original package, we will update our call procedure.
d |
The computed distance between the pair of series. |
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
Pablo Montero, José A. Vilar (2014). TSclust: An R Package for Time Series Clustering. Journal of Statistical Software, 62(1), 1-43. URL http://www.jstatsoft.org/v62/i01/.
To calculate this distance measure using ts, zoo or xts objects, see TSDistances. To calculate distance matrices of time series databases using this measure, see TSDatabaseDistances.
library(TSdist)
# The objects example.series1 and example.series2 are two
# numeric series of length 100.
data(example.series1)
data(example.series2)
# For information on their generation and shape see
# help page of example.series.
help(example.series)
# Calculate the spec.glk distance between the two series using
# the default parameters.
SpecGLKDistance(example.series1, example.series2)
The dissimilarity of two numerical series of the same length is calculated based on the integrated squared difference between the non-parametric estimators of their log-spectra.
SpecISDDistance(x, y, ...)
x |
Numeric vector containing the first time series. |
y |
Numeric vector containing the second time series. |
... |
Additional parameters for the function. See |
This is simply a wrapper for the diss.SPEC.ISD function of package TSclust. As such, all the functionalities of the diss.SPEC.ISD function are also available when using this function.
d |
The computed distance between the pair of series. |
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
Pablo Montero, José A. Vilar (2014). TSclust: An R Package for Time Series Clustering. Journal of Statistical Software, 62(1), 1-43. URL http://www.jstatsoft.org/v62/i01/.
To calculate this distance measure using ts, zoo or xts objects, see TSDistances. To calculate distance matrices of time series databases using this measure, see TSDatabaseDistances.
library(TSdist)
# The objects example.series1 and example.series2 are two
# numeric series of length 100.
data(example.series1)
data(example.series2)
# For information on their generation and shape see
# help page of example.series.
help(example.series)
# Calculate the spec.isd distance between the two series using
# the default parameters.
SpecISDDistance(example.series1, example.series2)
The dissimilarity of two numerical series of the same length is calculated based on the ratio between local linear estimations of the log-spectra.
SpecLLRDistance(x, y, ...)
x |
Numeric vector containing the first time series. |
y |
Numeric vector containing the second time series. |
... |
Additional parameters for the function. See |
This is simply a wrapper for the diss.SPEC.LLR function of package TSclust. As such, all the functionalities of the diss.SPEC.LLR function are also available when using this function.
d |
The computed distance between the pair of series. |
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
Pablo Montero, José A. Vilar (2014). TSclust: An R Package for Time Series Clustering. Journal of Statistical Software, 62(1), 1-43. URL http://www.jstatsoft.org/v62/i01/.
To calculate this distance measure using ts, zoo or xts objects, see TSDistances. To calculate distance matrices of time series databases using this measure, see TSDatabaseDistances.
library(TSdist)
# The objects example.series1 and example.series2 are two
# numeric series of length 100.
data(example.series1)
data(example.series2)
# For information on their generation and shape see
# help page of example.series.
help(example.series)
# Calculate the spec.llr distance between the two series using
# the default parameters.
SpecLLRDistance(example.series1, example.series2)
Computes the Short Time Series Distance between a pair of numeric time series.
STSDistance(x, y, tx, ty)
x |
Numeric vector containing the first time series. |
y |
Numeric vector containing the second time series. |
tx |
If not constant, a numeric vector that specifies the sampling index of series |
ty |
If not constant, a numeric vector that specifies the sampling index of series |
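The short time series distance between two series is designed especially for series with an equal but uneven sampling rate. However, it can also be used for time series with a constant sampling rate. It is calculated as follows:
$$d(x, y) = \sqrt{\sum_{k=1}^{N-1} \left( \frac{y_{k+1} - y_k}{ty_{k+1} - ty_k} - \frac{x_{k+1} - x_k}{tx_{k+1} - tx_k} \right)^2}$$
where $N$ is the length of series x and y.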
tx and ty must be positive and strictly increasing. Furthermore, the sampling rate in both indexes must be equal: $tx_{i+1} - tx_i = ty_{i+1} - ty_i$ for $i = 1, \ldots, N-1$.
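The formula compares the slopes of the two series between consecutive sampling points. The following base-R sketch transcribes it directly; it is not the TSdist implementation, and the helper name sts_sketch is hypothetical. When tx and ty are omitted, an even sampling 1, ..., N is assumed, in which case the slopes reduce to plain first differences.

```r
# Hypothetical sketch of the STS formula (not TSdist's STSDistance).
sts_sketch <- function(x, y, tx = seq_along(x), ty = seq_along(y)) {
  stopifnot(length(x) == length(y), length(tx) == length(x),
            length(ty) == length(y), all(tx > 0), all(ty > 0),
            all(diff(tx) > 0), all(diff(ty) > 0),
            isTRUE(all.equal(diff(tx), diff(ty))))  # equal sampling rate
  slope_x <- diff(x) / diff(tx)   # slopes between consecutive points
  slope_y <- diff(y) / diff(ty)
  sqrt(sum((slope_y - slope_x)^2))
}

set.seed(2)
x <- rnorm(100)
y <- rnorm(100)
# Uneven sampling index of length 100, as in the example below
tx <- sort(unique(c(seq(2, 175, 2), seq(7, 175, 7))))
c(even = sts_sketch(x, y), uneven = sts_sketch(x, y, tx, tx))
```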
d |
The computed distance between the pair of series. |
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
Möller-Levet, C. S., Klawonn, F., Cho, K., & Wolkenhauer, O. (2003). Fuzzy Clustering of Short Time-Series and Unevenly Distributed Sampling Points. In Proceedings of the 5th International Symposium on Intelligent Data Analysis.
To calculate this distance measure using ts, zoo or xts objects, see TSDistances. To calculate distance matrices of time series databases using this measure, see TSDatabaseDistances.
library(TSdist)
# The objects example.series1 and example.series2 are two
# numeric series of length 100 contained in the TSdist package.
data(example.series1)
data(example.series2)
# For information on their generation and shape see help
# page of example.series.
help(example.series)
# Calculate the STS distance assuming even sampling:
STSDistance(example.series1, example.series2)
# Calculate the STS distance providing an uneven sampling:
tx <- unique(c(seq(2, 175, 2), seq(7, 175, 7)))
tx <- tx[order(tx)]
ty <- tx
STSDistance(example.series1, example.series2, tx, ty)
Computes the Time Alignment Measurement between a pair of numeric time series.
TAMDistance(x, y)
x |
Numeric vector containing the first time series. |
y |
Numeric vector containing the second time series. |
The Time Alignment Measurement (TAM) between two numeric series is calculated. TAM quantifies the degree of temporal distortion between two time series.
The main idea behind TAM is to measure the warping cost between a given time series and another. TAM is calculated from the optimal alignment warping path between two time series provided by dtw, which allows characterizing the intervals in which the series are in phase, in advance or in delay. This distance penalizes signals where advance or delay is present and benefits series that are in phase with each other. As the distance increases, the dissimilarity between both signals also increases. The distance is bounded between 0 (both series are in phase) and 3 (both series are completely out of phase).
The lengths of series x and y may be different.
d |
The computed distance between the pair of series. |
Duarte Folgado
Duarte Folgado, Marília Barandas, Ricardo Matias, Rodrigo Martins, Miguel Carvalho, Hugo Gamboa (2018). Time Alignment Measurement for Time Series. Pattern Recognition 81, pp. 268-279.
To calculate this distance measure using ts, zoo or xts objects, see TSDistances. To calculate distance matrices of time series databases using this measure, see TSDatabaseDistances.
library(TSdist)
# The objects example.series1 and example.series2 are two
# numeric series of length 100 contained in the TSdist package.
data(example.series1)
data(example.series2)
# The objects example.series3 and example.series4 are two
# numeric series of length 100 and 120 contained in the TSdist
# package.
data(example.series3)
data(example.series4)
# For information on their generation and shape see
# help page of example.series.
help(example.series)
# Calculate the TAM distance for two series of the same length:
TAMDistance(example.series1, example.series2)
# Calculate the TAM distance for two series of different length:
TAMDistance(example.series3, example.series4)
Computes the Tquest distance between a pair of numeric vectors.
TquestDistance(x, y, tx, ty, tau)
x |
Numeric vector containing the first time series. |
y |
Numeric vector containing the second time series. |
tx |
If not constant, temporal index of series |
ty |
If not constant, temporal index of series |
tau |
Parameter (threshold) used to define the threshold passing intervals. |
The TQuest distance represents the series based on a set of intervals that fulfill the following conditions:
All the values that the time series takes during these time intervals must be strictly above a user-specified threshold tau.
They are the largest possible intervals that satisfy the previous condition.
The final distance between two series is defined in terms of the similarity between their threshold passing interval sets. For more information, see references.
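The interval sets described above can be extracted with a run-length encoding of x > tau. The sketch below does this on the sampling indices (no interpolation of the exact threshold crossings) and then compares two interval sets by averaging, in both directions, the distance from each interval to its closest counterpart. This set distance follows our reading of Aßfalg et al. (2006) up to possible normalizing constants, and it is not TSdist's TquestDistance; both helper names are hypothetical.

```r
# Maximal runs of consecutive samples strictly above tau, returned as a
# two-column matrix of (start, end) sampling indices.
threshold_intervals <- function(x, tau, tx = seq_along(x)) {
  r <- rle(x > tau)
  ends <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1
  keep <- r$values
  cbind(start = tx[starts[keep]], end = tx[ends[keep]])
}

# Hypothetical set distance: mean distance from each interval to its
# nearest counterpart, in both directions; intervals are 2-D points.
tquest_sketch <- function(x, y, tau, tx = seq_along(x), ty = seq_along(y)) {
  sx <- threshold_intervals(x, tau, tx)
  sy <- threshold_intervals(y, tau, ty)
  if (nrow(sx) == 0 || nrow(sy) == 0) stop("an interval set is empty")
  d <- outer(seq_len(nrow(sx)), seq_len(nrow(sy)), Vectorize(function(i, j)
    sqrt(sum((sx[i, ] - sy[j, ])^2))))
  mean(apply(d, 1, min)) + mean(apply(d, 2, min))
}

z <- c(1, 3, 4, 1, 5, 5, 5, 0)
w <- c(0, 0, 3, 3, 3, 0, 4, 0)
threshold_intervals(z, tau = 2)   # intervals [2, 3] and [5, 7]
tquest_sketch(z, w, tau = 2)
```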
d |
The computed distance between the pair of series. |
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
Aßfalg, J., Kriegel, H., Kröger, P., Kunath, P., Pryakhin, A., & Renz, M. (2006). Similarity Search on Time Series based on Threshold Queries. In Proceedings of the 10th international conference on Advances in Database Technology (pp. 276-294).
Esling, P., & Agon, C. (2012). Time-series data mining. ACM Computing Surveys (CSUR), 45(1), 1–34.
To calculate this distance measure using ts, zoo or xts objects, see TSDistances. To calculate distance matrices of time series databases using this measure, see TSDatabaseDistances.
library(TSdist)
# The objects example.series1 and example.series2 are two
# numeric series of length 100 contained in the TSdist package.
data(example.series1)
data(example.series2)
# For information on their generation and shape see help
# page of example.series.
help(example.series)
# Calculate the Tquest distance assuming even sampling:
TquestDistance(example.series1, example.series2, tau=2.5)
# The objects example.series3 and example.series4 are two
# numeric series of length 100 and 120 contained in the TSdist
# package.
data(example.series3)
data(example.series4)
# Calculate the Tquest distance for two series of different length:
TquestDistance(example.series3, example.series4, tau=2.5)
TSdist distance matrix computation for time series databases.
TSDatabaseDistances(X, Y=NULL, distance, ...)
X |
Time series database saved in a numeric matrix, a list, an |
Y |
Time series database saved in a numeric matrix, a list, an |
distance |
Distance measure to be used. It must be one of: |
... |
Additional parameters required by the chosen distance measure. |
The distance matrix of a time series database is calculated by providing the pairwise distances between the series that comprise it. X can be saved in a numeric matrix, a list or a mts, zoo or xts object. The following distance methods are supported:
"euclidean": Euclidean distance. EuclideanDistance
"manhattan": Manhattan distance. ManhattanDistance
"minkowski": Minkowski distance. MinkowskiDistance
"infnorm": Infinite norm distance. InfNormDistance
"ccor": Distance based on the cross-correlation. CCorDistance
"sts": Short time series distance. STSDistance
"dtw": Dynamic Time Warping distance. DTWDistance. Uses the dtw package (see dtw).
"lb.keogh": LB_Keogh lower bound for the Dynamic Time Warping distance. LBKeoghDistance
"edr": Edit distance for real sequences. EDRDistance
"erp": Edit distance with real penalty. ERPDistance
"lcss": Longest Common Subsequence Matching. LCSSDistance
"fourier": Distance based on the Fourier Discrete Transform. FourierDistance
"tquest": TQuest distance. TquestDistance
"dissim": Dissim distance. DissimDistance
"acf": Autocorrelation-based dissimilarity ACFDistance. Uses the TSclust package (see diss.ACF).
"pacf": Partial autocorrelation-based dissimilarity PACFDistance. Uses the TSclust package (see diss.PACF).
"ar.lpc.ceps": Dissimilarity based on LPC cepstral coefficients ARLPCCepsDistance. Uses the TSclust package (see diss.AR.LPC.CEPS).
"ar.mah": Model-based dissimilarity proposed by Maharaj (1996, 2000) ARMahDistance. Uses the TSclust package (see diss.AR.MAH).
"ar.pic": Model-based dissimilarity measure proposed by Piccolo (1990) ARPicDistance. Uses the TSclust package (see diss.AR.PIC).
"cdm": Compression-based dissimilarity measure CDMDistance. Uses the TSclust package (see diss.CDM).
"cid": Complexity-invariant distance measure CIDDistance. Uses the TSclust package (see diss.CID).
"cor": Dissimilarities based on Pearson's correlation CorDistance. Uses the TSclust package (see diss.COR).
"cort": Dissimilarity index which combines temporal correlation and raw value behaviors CortDistance. Uses the TSclust package (see diss.CORT).
"int.per": Integrated periodogram based dissimilarity IntPerDistance. Uses the TSclust package (see diss.INT.PER).
"per": Periodogram based dissimilarity PerDistance. Uses the TSclust package (see diss.PER).
"mindist.sax": Symbolic Aggregate Approximation based dissimilarity MindistSaxDistance. Uses the TSclust package (see diss.MINDIST.SAX).
"ncd": Normalized compression based distance NCDDistance. Uses the TSclust package (see diss.NCD).
"pred": Dissimilarity measure based on nonparametric forecasts PredDistance. Uses the TSclust package (see diss.PRED).
"spec.glk": Dissimilarity based on the generalized likelihood ratio test SpecGLKDistance. Uses the TSclust package (see diss.SPEC.GLK).
"spec.isd": Dissimilarity based on the integrated squared difference between the log-spectra SpecISDDistance. Uses the TSclust package (see diss.SPEC.ISD).
"spec.llr": General spectral dissimilarity measure using local-linear estimation of the log-spectra SpecLLRDistance
. Uses the TSclust package (see diss.SPEC.LLR
).
"pdc": Permutation Distribution Distance PDCDistance
. Uses the pdc package (see pdcDist
).
"frechet": Frechet distance FrechetDistance
. Uses the longitudinalData package (see distFrechet
).
"tam": Time Aligment Measurement TAMDistance
.
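A distance matrix built from pairwise distances can be illustrated in base R. The sketch below is not the TSDatabaseDistances implementation: the helper pairwise_distances() is hypothetical, it assumes X is a numeric matrix with one series per row (as in example.database), and it simply fills a symmetric matrix with the values returned by a two-series distance function. For the Manhattan case the result can be checked against base R's dist().

```r
# Hypothetical helper (not part of TSdist): fills a symmetric matrix with
# the distances between every pair of rows of X, one series per row.
pairwise_distances <- function(X, dist_fun) {
  n <- nrow(X)
  D <- matrix(0, n, n)
  for (i in seq_len(n - 1)) {
    for (j in seq(i + 1, n)) {
      d <- dist_fun(X[i, ], X[j, ])
      D[i, j] <- d
      D[j, i] <- d  # the measure is assumed to be symmetric
    }
  }
  D
}

# Manhattan (L1) distance between two equal-length numeric series
manhattan <- function(x, y) sum(abs(x - y))

set.seed(1)
X <- matrix(rnorm(6 * 20), nrow = 6)  # 6 toy series of length 20
D <- pairwise_distances(X, manhattan)

# Should agree with stats::dist() for the Manhattan case
all.equal(D, as.matrix(dist(X, method = "manhattan")), check.attributes = FALSE)
```

Computing each pair once and mirroring it halves the work, but that shortcut is only valid for symmetric measures; an asymmetric dissimilarity would need both triangles computed separately.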
Some distance measures may require additional arguments. See the individual help pages listed above for more information about each method. These arguments should be passed by name to avoid mismatches.
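The advice to name extra arguments can be illustrated with a toy dispatcher that selects one of the Lp-type measures by name and forwards additional arguments through `...`. This is a hypothetical sketch, not how TSDatabaseDistances or TSDistances is written; in particular, the argument name p for the Minkowski order is chosen here for illustration, so check MinkowskiDistance for the name the package actually expects.

```r
# Hypothetical dispatcher: picks a measure by name and reads any extra,
# named arguments (here only p, the Minkowski order) from `...`.
lp_distance <- function(x, y, distance, ...) {
  args <- list(...)
  switch(distance,
    euclidean = sqrt(sum((x - y)^2)),
    manhattan = sum(abs(x - y)),
    minkowski = {
      p <- args[["p"]]  # exact-name lookup; NULL if p was not named
      if (is.null(p)) stop("minkowski requires a named argument p")
      sum(abs(x - y)^p)^(1 / p)
    },
    infnorm = max(abs(x - y)),
    stop("unsupported distance: ", distance)  # default branch
  )
}

x <- c(0, 3, 1)
y <- c(4, 0, 1)
lp_distance(x, y, "euclidean")         # 5
lp_distance(x, y, "manhattan")         # 7
lp_distance(x, y, "infnorm")           # 4
lp_distance(x, y, "minkowski", p = 3)  # p is matched by name
```

Calling `lp_distance(x, y, "minkowski", 3)` instead fails, because the unnamed 3 lands in `...` without a name and is never matched to p.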
Finally, for the dissim, dissimapprox and sts options, databases composed of series with different sampling rates can be supplied as a list of zoo, xts or ts objects, where each element of the list is a time series with its own time index.
D |
The computed distance matrix of the time series database. For some measures, such as "ar.mah" or "pred", additional information is also returned. |
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
library(TSdist)
# The object example.database is a numeric matrix that stores
# 6 ARIMA time series in a row-wise format. For more information
# see the help page of example.database:
help(example.database)
data(example.database)
# To calculate the distance matrix of this database:
TSDatabaseDistances(example.database, distance="manhattan")
TSDatabaseDistances(example.database, distance="edr", epsilon=0.2)
TSDatabaseDistances(example.database, distance="fourier", n=20)
# The object zoo.database is a zoo object that stores
# the same 6 ARIMA time series saved in example.database.
data(zoo.database)
# To calculate the distance matrix of this database:
TSDatabaseDistances(zoo.database, distance="manhattan")
TSDatabaseDistances(zoo.database, distance="edr", epsilon=0.2)
TSDatabaseDistances(zoo.database, distance="fourier", n=20)
TSdist distance calculation between two time series.
TSDistances(x, y, tx, ty, distance, ...)
x |
Numeric vector or ts, zoo or xts object. |
y |
Numeric vector or ts, zoo or xts object. |
tx |
Optional temporal index of series x. |
ty |
Optional temporal index of series y. |
distance |
Distance measure to be used. It must be one of the method names listed below. |
... |
Additional parameters required by the distance method. |
The distance between the two time series x and y is calculated. x and y can be stored in a numeric vector or a ts, zoo or xts object. The following distance methods are supported:
"euclidean": Euclidean distance, computed by EuclideanDistance.
"manhattan": Manhattan distance, computed by ManhattanDistance.
"minkowski": Minkowski distance, computed by MinkowskiDistance.
"infnorm": Infinite norm distance, computed by InfNormDistance.
"ccor": Distance based on the cross-correlation, computed by CCorDistance.
"sts": Short time series distance, computed by STSDistance.
"dtw": Dynamic Time Warping distance, computed by DTWDistance using the dtw package (see dtw).
"lb.keogh": LB_Keogh lower bound for the Dynamic Time Warping distance, computed by LBKeoghDistance.
"edr": Edit distance for real sequences, computed by EDRDistance.
"erp": Edit distance with real penalty, computed by ERPDistance.
"lcss": Longest Common Subsequence matching, computed by LCSSDistance.
"fourier": Distance based on the Discrete Fourier Transform, computed by FourierDistance.
"tquest": TQuest distance, computed by TquestDistance.
"dissim": Dissim distance, computed by DissimDistance.
"acf": Autocorrelation-based dissimilarity, computed by ACFDistance using the TSclust package (see diss.ACF).
"pacf": Partial autocorrelation-based dissimilarity, computed by PACFDistance using the TSclust package (see diss.PACF).
"ar.lpc.ceps": Dissimilarity based on LPC cepstral coefficients, computed by ARLPCCepsDistance using the TSclust package (see diss.AR.LPC.CEPS).
"ar.mah": Model-based dissimilarity proposed by Maharaj (1996, 2000), computed by ARMahDistance using the TSclust package (see diss.AR.MAH).
"ar.pic": Model-based dissimilarity proposed by Piccolo (1990), computed by ARPicDistance using the TSclust package (see diss.AR.PIC).
"cdm": Compression-based dissimilarity measure, computed by CDMDistance using the TSclust package (see diss.CDM).
"cid": Complexity-invariant distance measure, computed by CIDDistance using the TSclust package (see diss.CID).
"cor": Dissimilarities based on Pearson's correlation, computed by CorDistance using the TSclust package (see diss.COR).
"cort": Dissimilarity index combining temporal correlation and raw-value behavior, computed by CortDistance using the TSclust package (see diss.CORT).
"int.per": Dissimilarity based on the integrated periodogram, computed by IntPerDistance using the TSclust package (see diss.INT.PER).
"per": Dissimilarity based on the periodogram, computed by PerDistance using the TSclust package (see diss.PER).
"mindist.sax": Dissimilarity based on Symbolic Aggregate Approximation (SAX), computed by MindistSaxDistance using the TSclust package (see diss.MINDIST.SAX).
"ncd": Normalized compression distance, computed by NCDDistance using the TSclust package (see diss.NCD).
"pred": Dissimilarity based on nonparametric forecasts, computed by PredDistance using the TSclust package (see diss.PRED).
"spec.glk": Dissimilarity based on the generalized likelihood ratio test, computed by SpecGLKDistance using the TSclust package (see diss.SPEC.GLK).
"spec.isd": Dissimilarity based on the integrated squared difference between the log-spectra, computed by SpecISDDistance using the TSclust package (see diss.SPEC.ISD).
"spec.llr": General spectral dissimilarity using local-linear estimation of the log-spectra, computed by SpecLLRDistance using the TSclust package (see diss.SPEC.LLR).
"pdc": Permutation distribution distance, computed by PDCDistance using the pdc package (see pdcDist).
"frechet": Fréchet distance, computed by FrechetDistance using the longitudinalData package (see distFrechet).
"tam": Time Alignment Measurement, computed by TAMDistance.
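To give a feel for what the "dtw" option measures, here is a textbook dynamic-programming version of Dynamic Time Warping in base R. It is a sketch under stated assumptions: the hypothetical helper dtw_basic() uses |x_i - y_j| as the local cost, lets match, insertion and deletion steps each add that cost once, and applies no warping window. TSdist delegates this option to the dtw package, whose default step pattern (symmetric2) weights diagonal steps differently, so the values are not expected to match DTWDistance.

```r
# Textbook DTW (hypothetical helper, not the dtw package's algorithm or
# defaults): cheapest cumulative |x_i - y_j| cost over all warping paths.
dtw_basic <- function(x, y) {
  n <- length(x)
  m <- length(y)
  # acc[i + 1, j + 1] holds the cheapest cost of aligning x[1:i] with y[1:j]
  acc <- matrix(Inf, n + 1, m + 1)
  acc[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- abs(x[i] - y[j])
      acc[i + 1, j + 1] <- cost + min(acc[i, j],      # advance in both
                                      acc[i, j + 1],  # advance in x only
                                      acc[i + 1, j])  # advance in y only
    }
  }
  acc[n + 1, m + 1]
}

# The repeated value in the second series is absorbed by the warping
dtw_basic(c(1, 2, 3), c(1, 2, 2, 3))  # 0
```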
Some distance measures may require additional arguments. See the individual help pages (detailed above) for more information about each method.
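The "edr" option is one such measure: the TSDatabaseDistances examples pass it a threshold epsilon. The sketch below is a textbook formulation of Edit Distance on Real sequences, assuming two values match when they differ by at most epsilon and every substitution, insertion or deletion of an unmatched element costs 1. The helper edr_basic() is hypothetical, and EDRDistance may normalize the result or treat edge cases differently.

```r
# Textbook EDR (hypothetical helper): substitution costs 0 if the values
# are within epsilon of each other and 1 otherwise; gaps cost 1.
edr_basic <- function(x, y, epsilon) {
  n <- length(x)
  m <- length(y)
  # ed[i + 1, j + 1] is the edit distance between x[1:i] and y[1:j]
  ed <- matrix(0, n + 1, m + 1)
  ed[, 1] <- 0:n
  ed[1, ] <- 0:m
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      subcost <- if (abs(x[i] - y[j]) <= epsilon) 0 else 1
      ed[i + 1, j + 1] <- min(ed[i, j] + subcost,  # match or substitute
                              ed[i, j + 1] + 1,    # delete x[i]
                              ed[i + 1, j] + 1)    # insert y[j]
    }
  }
  ed[n + 1, m + 1]
}

x <- c(1.0, 2.0, 3.0, 4.0)
y <- c(1.1, 2.1, 9.0, 4.1)
edr_basic(x, y, epsilon = 0.2)   # 1: only the pair (3.0, 9.0) fails to match
edr_basic(x, y, epsilon = 0.05)  # 4: no pair matches
```

Because every mismatch costs exactly 1 regardless of its size, the outlier 9.0 contributes no more than a near miss would, which is the robustness to noise that motivates EDR.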
d |
The computed distance between the pair of time series. |
Usue Mori, Alexander Mendiburu, Jose A. Lozano.
library(TSdist)
# The objects zoo.series1 and zoo.series2 are two
# zoo objects that store two series of length 100.
data(zoo.series1)
data(zoo.series2)
# For information on their generation and shape see the
# help page of example.series.
help(example.series)
# The distance calculation for these two series is done
# as follows:
TSDistances(zoo.series1, zoo.series2, distance="infnorm")
TSDistances(zoo.series1, zoo.series2, distance="cor", beta=3)
TSDistances(zoo.series1, zoo.series2, distance="dtw", sigma=20)