Package 'washeR'

Title: Time Series Outlier Detection
Description: Time series outlier detection with a non-parametric test. This is a new outlier detection methodology (washer): efficient, because it saves processing time and is simple to implement; adaptable, because it relies on general assumptions and needs only very short time series; reliable and effective, because it is based on a robust non-parametric test. Two approaches are available: single time series (a vector) and grouped time series (a data frame). For further information see Andrea Venturini (2011), Statistica - Universita di Bologna, Vol.71, pp.329-344. For an informal explanation, see R-bloggers on the web.
Authors: Andrea Venturini
Maintainer: Andrea Venturini <[email protected]>
License: GPL (>= 2)
Version: 0.1.3
Built: 2024-12-03 06:58:39 UTC
Source: CRAN

Help Index


Data frame of meteorological data

Description

This sample data set contains invented meteorological measurements, as they might be recorded by weather stations.

Usage

dati

Format

A data frame with 800 rows and 4 variables:

phen

Temperature, Rain

time

ordered numbers for time (a number in the format YYYYMMDD [Year Month Day] is possible too)

zone

classification label for the group, for example the identification code of a weather station.

value

values
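
The four-column layout described above can be reproduced for other data. The following is a minimal sketch with invented values: the object name my_data and the station codes are hypothetical, and only the column order (phenomenon, time, group, value) follows the format of dati.

```r
# Hypothetical example: two phenomena, three time points, two stations.
# Only the column order mirrors the layout of `dati`.
set.seed(1)
my_data <- expand.grid(
  phen = c("Temperature", "Rain"),
  time = c(20240101, 20240102, 20240103),  # YYYYMMDD as ordered numbers
  zone = c("st01", "st02"),
  stringsAsFactors = FALSE
)
# Invented measurements, one per phenomenon/time/station combination.
my_data$value <- round(runif(nrow(my_data), min = 0, max = 30), 1)
str(my_data)
```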


Time series

Description

This is an example of a single time series with increasing trend and some variability.

Usage

ts

Format

A data frame with 35 rows and 1 variable:

dati

pseudo-random numbers


Outlier detection for single or grouped time series

Description

This function provides anomaly signals (optionally with a graphical visualization) when there is a 'jump' in a single time series, or when the 'jump' differs too much from those of similar time series in the same group.
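
To make the notion of a 'jump' concrete, the sketch below splits an invented vector into triads of consecutive values and measures how far each middle value lies from its two neighbours. The measure used here is purely illustrative and is not claimed to be the formula used inside the package.

```r
# Illustrative only: not the package's internal formula.
x <- c(10, 11, 12, 30, 14, 15, 16)

# embed() returns lagged columns in reverse order, so reorder to y1, y2, y3.
triads <- embed(x, 3)[, 3:1]
colnames(triads) <- c("y1", "y2", "y3")

# A simple "jump" measure: twice the distance of the middle value
# from the midpoint of its two neighbours.
jump <- 2 * triads[, "y2"] - triads[, "y1"] - triads[, "y3"]
cbind(triads, jump)
```

The triad centered on the value 30 gets by far the largest jump, while triads lying on the straight trend get a jump of zero.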

Usage

wash.out(
  dati,
  graph = FALSE,
  linear_analysis = FALSE,
  val_test_limit = 5,
  save_out = FALSE,
  out_out = "out.csv",
  pdf_out = "out.pdf",
  r_out = 3,
  c_out = 2,
  first_line = 1,
  pace_line = 6
)

Arguments

dati

data frame (grouped time series: phenomenon+date+group+values) or vector (single time series)

graph

logical value for graphical analysis (default=FALSE)

linear_analysis

logical value for linear analysis (default=FALSE)

val_test_limit

threshold controlling outlier detection sensitivity (default=5; max=10)

save_out

logical value for saving detected outliers (default=FALSE)

out_out

character string with the file name for saving outliers in csv format, delimited with ";" and using "," as decimal separator (default="out.csv")

pdf_out

character string with the file name for saving the graphical analysis as a pdf file (default="out.pdf")

r_out

number of rows of graphs (default=3)

c_out

number of columns of graphs (default=2)

first_line

value for first dotted line in graphic analysis (default=1)

pace_line

value for pace in dotted line in graphic analysis (default=6)
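
A file delimited with ";" and using "," as decimal separator matches the defaults of base R's read.csv2(), so a saved outlier file can be read back with it. The sketch below does not call wash.out(); it writes a small stand-in table (hypothetical columns) to a temporary file to show the round trip.

```r
# Stand-in table with hypothetical columns, not actual wash.out() output.
demo <- data.frame(series = c("a", "b"), y2 = c(1.5, 37.25))

path <- tempfile(fileext = ".csv")
write.csv2(demo, path, row.names = FALSE)  # writes sep = ";" and dec = ","

back <- read.csv2(path)                    # reads the same conventions
str(back)
unlink(path)                               # remove the temporary file
```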

Value

Data frame of possible outliers, each reported as a triad. Output record: rows/time.2/series/y1/y2/y3/test(AV)/AV/n/median(AV)/mad(AV)/madindex(AV), where time.2 is the center of the triad y1, y2, y3; test(AV) is the number to compare with val_test_limit (default 5) to detect an outlier; n is the number of observations of the group ....
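
The column names median(AV), mad(AV) and test(AV) suggest a robust standardized score: the distance of each AV value from the group median, scaled by the median absolute deviation. The sketch below shows that general idea on an invented vector av. It is an assumption about the shape of the test, not the package's exact computation.

```r
# Invented index values for a group of series at one time point.
av <- c(0.2, -0.1, 0.0, 0.3, -0.2, 6.0)

# Robust score: distance from the median in units of MAD.
# (Assumed form; the package may differ in detail.)
score <- abs(av - median(av)) / mad(av)

# Compare each score with the detection threshold.
val_test_limit <- 5
data.frame(av, score = round(score, 2), flagged = score > val_test_limit)
```

Only the last value exceeds the threshold here; raising val_test_limit makes the rule stricter.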

Examples

## we start with data that contain no outliers but show co-movement between groups
data("dati")
## 1st column: phenomenon
## 2nd column: time, written as ordered numbers or strings
## 3rd column: group classification variable
## 4th column: values
str(dati)
#######################################
## a data frame without any outlier
#######################################
out=wash.out(dati)
out   ## empty data frame
length(out[,1])  ## no rows
## we can add two outliers
####  time=3 temperature value=0
dati[99,4]=  0
## ... and then for "rain" phenomenon!
####  time=3 rain value=37
dati[118,4]=  37
#######################################
##   data.frame with 2 fresh outliers
#######################################
out=wash.out(dati)
##  each row reports a "three terms" piece (triad) of a time series
## let's take a look at the anomalous time series
out
## ... the same, but saving the results in a file
## if we don't specify a name, out.csv is the default
out=wash.out(dati,save_out=TRUE,out_out="tabel_out.csv")
out
## we raise the parameter from 5 to 10 (its upper limit) to capture
##       only particularly anomalous outliers
out=wash.out(dati, val_test_limit = 10)
out
## save plots of the outliers in a pdf file ("out.pdf" by default)
out=wash.out(dati, val_test_limit = 10, graph=TRUE)
out
## besides the usual analysis by groups we can also run the analysis
## of every single time series on its own (linear_analysis):
## outliers are saved in two files (out.csv and linout.csv)
## and graphs in two pdf files (out.pdf and linout.pdf)
out=wash.out(dati,val_test_limit=5,save_out=TRUE,linear_analysis=TRUE,graph=TRUE)
out
## out returns only the linear analysis...
## ... in this case we lose the co-movement information and we run the risk
##     of finding too much variance in a single time series
##     and of flagging outliers that are not very plausible
##########################################################
##  single time series analysis
##########################################################
data(ts)
str(ts)
sts= ts$dati
plot(sts,type="b",pch=20,col="red")
## a time series with some variability and an increasing trend
## sts is a vector, so linear analysis is the default
out=wash.out(sts)
out
## we find no outlier
out=wash.out(sts,val_test_limit=5,linear_analysis=TRUE,graph=TRUE)
out
## no outlier
## we can add an outlier of limited size
sts[5]=sts[5]*2
plot(sts,type="b",pch=20,col="red")
out=wash.out(sts,val_test_limit=5)
out
## the test value is slightly above 5
out=wash.out(sts,val_test_limit=5,save_out=TRUE,graph=TRUE)
out
data(ts)
sts= ts$dati
sts[5]=sts[5]*3
## a larger multiplier produces a more pronounced outlier
plot(sts,type="b",pch=20,col="blue")
out=wash.out(sts,val_test_limit=5,save_out=TRUE,graph=TRUE)
out
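## the washer procedure identifies three triads of outlier values
## clean up: removes ALL csv and pdf files in the working directory (portable)
unlink(list.files(pattern = "\\.(csv|pdf)$"))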