Title: | Time Series Outlier Detection |
---|---|
Description: | Time series outlier detection with a nonparametric test. This is a new outlier detection methodology (washer): efficient, because it saves time in processing and implementation; adaptable, because it relies on general assumptions and needs only very short time series; reliable and effective, because it uses a robust nonparametric test. Two approaches are available: single time series (a vector) and grouped time series (a data frame). For further information see Andrea Venturini (2011), Statistica - Universita di Bologna, Vol. 71, pp. 329-344. For an informal explanation, see R-bloggers on the web. |
Authors: | Andrea Venturini |
Maintainer: | Andrea Venturini <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1.3 |
Built: | 2024-12-03 06:58:39 UTC |
Source: | CRAN |
This sample data set contains invented meteorological measurements of the kind recorded by weather stations.
dati
A data frame with 800 rows and 4 variables:
Column 1: phenomenon (Temperature, Rain)
Column 2: ordered numbers for time (a number in the format YYYYMMDD [year, month, day] is also possible)
Column 3: classification label for the group, for example the identification code of a weather station
Column 4: values
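To make this four-column layout concrete, the following sketch builds a small table of the same shape. The column names (`phenomenon`, `time`, `station`, `value`), the station codes and the values are hypothetical and chosen only for illustration; they are not taken from `dati`.

```r
# Hypothetical example: a grouped time series table with four columns
# (phenomenon, time, group, values). Names and values are illustrative
# assumptions, not those of the `dati` data set.
days <- seq(as.Date("2024-01-01"), by = "day", length.out = 5)

toy <- expand.grid(
  time       = as.integer(format(days, "%Y%m%d")),  # ordered YYYYMMDD numbers
  station    = c("ST01", "ST02"),                   # group label
  phenomenon = c("Temperature", "Rain"),
  stringsAsFactors = FALSE
)

set.seed(1)
toy$value <- round(runif(nrow(toy), min = 0, max = 30), 1)

# Reorder the columns: phenomenon, time, group, values
toy <- toy[, c("phenomenon", "time", "station", "value")]
str(toy)
```

Writing dates as YYYYMMDD integers keeps the time column sortable as plain numbers, which is the property the time column relies on.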
This is an example of a single time series with an increasing trend and some variability.
ts
A data frame with 35 rows and 1 variable:
pseudo-random numbers
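For readers who want a comparable series to experiment with, here is a minimal sketch that generates 35 pseudo-random values with an increasing trend. It is not the data shipped in `ts`: the intercept, slope, noise level and seed are assumptions. Only the column name `dati` follows the `ts$dati` access used in the examples.

```r
# Illustrative only: a synthetic series with an increasing trend plus noise.
# NOT the packaged `ts` data; intercept, slope, noise and seed are assumptions.
set.seed(42)
n <- 35
toy_series <- data.frame(dati = 10 + 0.5 * seq_len(n) + rnorm(n, sd = 1))
str(toy_series)
```

The vector `toy_series$dati` can then be plotted with `plot(..., type = "b")` in the same way as `sts` in the examples.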
This function signals anomalies (optionally with a graphical visualization) when there is a 'jump' in a single time series, or when the 'jump' differs too much from those of similar time series in the same group.
wash.out( dati, graph = FALSE, linear_analysis = FALSE, val_test_limit = 5, save_out = FALSE, out_out = "out.csv", pdf_out = "out.pdf", r_out = 3, c_out = 2, first_line = 1, pace_line = 6 )
dati | data frame (grouped time series: phenomenon + date + group + values) or vector (single time series) |
graph | logical value: produce the graphical analysis (default=FALSE) |
linear_analysis | logical value: run the linear analysis (default=FALSE) |
val_test_limit | threshold controlling outlier detection sensitivity (default=5; max=10) |
save_out | logical value: save the detected outliers (default=FALSE) |
out_out | character file name for saving outliers in csv form, delimited with ";" and using "," as decimal separator (default=out.csv) |
pdf_out | character file name for saving the graphical analysis as a pdf file (default=out.pdf) |
r_out | number of rows of graphs (default=3) |
c_out | number of columns of graphs (default=2) |
first_line | value for the first dotted line in the graphical analysis (default=1) |
pace_line | step between dotted lines in the graphical analysis (default=6) |
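The csv convention used for `out_out` (";" as delimiter, "," as decimal separator) is the one handled by base R's `read.csv2()` and `write.csv2()`. The sketch below round-trips a made-up table through a temporary file to show the format; the table is a stand-in and not a real `wash.out()` result, and its column names are hypothetical.

```r
# Made-up table standing in for a saved outlier file (not real wash.out() output)
fake_out <- data.frame(series = c("A", "B"), test = c(5.7, 12.3))

path <- tempfile(fileext = ".csv")
write.csv2(fake_out, path, row.names = FALSE)  # ";" delimiter, "," decimals

readLines(path)           # raw text: decimals appear with a comma
back <- read.csv2(path)   # parses comma decimals back into numeric values
str(back)
unlink(path)              # remove the temporary file
```

Reading such a file with `read.csv()` instead would treat each line as a single field, so `read.csv2()` (or `read.table()` with `sep = ";"` and `dec = ","`) is the safer choice.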
Data frame of possible outliers, one row per triad. Output record: rows / time.2 / series / y1 / y2 / y3 / test(AV) / AV / n / median(AV) / mad(AV) / madindex(AV), where time.2 is the center of the triad y1, y2, y3; test(AV) is the number to compare with 5 (the default val_test_limit) to detect an outlier; n is the number of observations in the group ....
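The output columns suggest that each triad's AV value is judged against the median and MAD of AV within its group. The sketch below illustrates that kind of robust comparison on made-up numbers. The `av_index()` formula is an assumption for illustration (a simple measure of how far the middle value departs from its two neighbours) and may differ from the index the package actually computes; both helper names are hypothetical.

```r
# Hypothetical helper: how much the middle value y2 "jumps" relative to its
# neighbours y1 and y3. Assumed form, not taken from the package source.
av_index <- function(y1, y2, y3) {
  100 * (2 * y2 - y1 - y3) / (y1 + y2 + y3)
}

# Robust standardisation: distance from the group median, in MAD units
robust_test <- function(av) {
  abs(av - median(av)) / mad(av)
}

# Made-up triads for six similar series; the last one has a jump in y2
triads <- data.frame(
  y1 = c(10, 11, 10, 12, 11, 10),
  y2 = c(11, 12, 10, 13, 11, 25),
  y3 = c(10, 11, 11, 12, 12, 10)
)

av   <- with(triads, av_index(y1, y2, y3))
test <- robust_test(av)
data.frame(triads, AV = round(av, 2), test = round(test, 2), flagged = test > 5)
```

With these numbers only the last triad exceeds the threshold of 5. Using the median and MAD rather than the mean and standard deviation keeps the reference values from being distorted by the very outlier being tested.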
## we can start with data without outliers but structured with co-movement between groups
data("dati")
## 1st column: phenomenon
## 2nd column: time written in ordered numbers or strings
## 3rd column: group classification variable
## 4th column: values
str(dati)
#######################################
## a data frame without any outlier
#######################################
out=wash.out(dati)
out   ## empty data frame
length(out[,1])   ## no rows
## we can add two outliers
#### time=3 temperature value=0
dati[99,4]= 0
## ... and then for the "rain" phenomenon!
#### time=3 rain value=37
dati[118,4]= 37
#######################################
## data frame with 2 fresh outliers
#######################################
out=wash.out(dati)
## all "three terms" time series
## let's take a look at the anomalous time series
out
## ... the same, but we save the results in a file
## If we don't specify a name, out.csv is the default
out=wash.out(dati,save_out=TRUE,out_out="tabel_out.csv")
out
## we raise the parameter from 5 to 10, using this upper value to capture
## only particularly anomalous outliers
out=wash.out(dati, val_test_limit = 10)
out
## save plots and outliers in a pdf file ("out.pdf" by default)
out=wash.out(dati, val_test_limit = 10, graph=TRUE)
out
## we can run the usual analysis for groups, but we can also use the one
## reserved for every single time series
## (linear_analysis): two files for saved outliers (out.csv and linout.csv)
## and two pdf files for the graphs (out.pdf and linout.pdf)
out=wash.out(dati,val_test_limit=5,save_out=TRUE,linear_analysis=TRUE,graph=TRUE)
out   ## out returns only the linear analysis...
## ... in this case we lose the co-movement information and we run the risk
## of finding too much variance in a single time series
## and detecting outliers that are not very plausible
##########################################################
## single time series analysis
##########################################################
data(ts)
str(ts)
sts= ts$dati
plot(sts,type="b",pch=20,col="red")
## a time series with some variability and an increasing trend
## sts is a vector and linear analysis is the default one
out=wash.out(sts)
out   ## we find no outlier
out=wash.out(sts,val_test_limit=5,linear_analysis=TRUE,graph=TRUE)
out   ## no outlier
## we can add an outlier of limited size
sts[5]=sts[5]*2
plot(sts,type="b",pch=20,col="red")
out=wash.out(sts,val_test_limit=5)
out   ## test is slightly over 5
out=wash.out(sts,val_test_limit=5,save_out=TRUE,graph=TRUE)
out
data(ts)
sts= ts$dati
sts[5]=sts[5]*3
## we can try a greater value to insert a more important outlier
plot(sts,type="b",pch=20,col="blue")
out=wash.out(sts,val_test_limit=5,save_out=TRUE,graph=TRUE)
out   ## the washer procedure identifies three triads of outlier values
## clean up the generated files (requires a Unix-like shell)
system("rm *.csv *.pdf")