Title: | Outlier Detection Using Invariant Coordinate Selection |
---|---|
Description: | Multivariate outlier detection is performed using invariant coordinates where the package offers different methods to choose the appropriate components. ICS is a general multivariate technique with many applications in multivariate analysis. ICSOutlier offers a selection of functions for automated detection of outliers in the data based on a fitted ICS object or by specifying the dataset and the scatters of interest. The current implementation targets data sets with only a small percentage of outliers. |
Authors: | Klaus Nordhausen [aut, cre] , Aurore Archimbaud [aut] , Anne Ruiz-Gazen [aut] |
Maintainer: | Klaus Nordhausen <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.4-0 |
Built: | 2024-10-09 06:45:24 UTC |
Source: | CRAN |
The DESCRIPTION file:
Package: | ICSOutlier |
Type: | Package |
Title: | Outlier Detection Using Invariant Coordinate Selection |
Version: | 0.4-0 |
Date: | 2023-12-13 |
Authors@R: | c(person("Klaus", "Nordhausen", email = "[email protected]", role = c("aut", "cre"), comment = c(ORCID = "0000-0002-3758-8501")), person("Aurore", "Archimbaud", email = "[email protected]", role = "aut", comment = c(ORCID = "0000-0002-6511-9091")), person("Anne", "Ruiz-Gazen", email = "[email protected]", role = "aut", comment = c(ORCID = "0000-0001-8970-8061"))) |
Author: | Klaus Nordhausen [aut, cre] (<https://orcid.org/0000-0002-3758-8501>), Aurore Archimbaud [aut] (<https://orcid.org/0000-0002-6511-9091>), Anne Ruiz-Gazen [aut] (<https://orcid.org/0000-0001-8970-8061>) |
Maintainer: | Klaus Nordhausen <[email protected]> |
Depends: | R (>= 3.0.0), methods, ICS (>= 1.4-0), moments |
Imports: | graphics, grDevices, mvtnorm, parallel |
Suggests: | ICSClust, REPPlab, testthat (>= 3.0.0) |
Description: | Multivariate outlier detection is performed using invariant coordinates where the package offers different methods to choose the appropriate components. ICS is a general multivariate technique with many applications in multivariate analysis. ICSOutlier offers a selection of functions for automated detection of outliers in the data based on a fitted ICS object or by specifying the dataset and the scatters of interest. The current implementation targets data sets with only a small percentage of outliers. |
License: | GPL (>= 2) |
NeedsCompilation: | no |
Packaged: | 2023-12-13 11:13:06 UTC; admin |
Repository: | CRAN |
Date/Publication: | 2023-12-13 15:10:02 UTC |
Config/testthat/edition: | 3 |
RoxygenNote: | 7.2.3 |
Encoding: | UTF-8 |
Index of help topics:
HTP | Production Measurements of High-Tech Parts - Full Rank Case |
HTP2 | Production Measurements of High-Tech Parts - Singular Case |
HTP3 | Production Measurements of High-Tech Parts - Nearly Singular Case |
ICSOutlier-package | Outlier Detection Using Invariant Coordinate Selection |
ICS_outlier | Outlier Detection Using ICS |
comp.norm.test | Selection of Nonnormal Invariant Components Using Marginal Normality Tests |
comp.simu.test | Selection of Nonnormal Invariant Components Using Simulations |
comp_norm_test | Selection of Nonnormal Invariant Components Using Marginal Normality Tests |
comp_simu_test | Selection of Nonnormal Invariant Components Using Simulations |
dist.simu.test | Cut-Off Values Using Simulations for the Detection of Extreme ICS Distances |
dist_simu_test | Cut-Off Values Using Simulations for the Detection of Extreme ICS Distances |
ics.distances | Squared ICS Distances for Invariant Coordinates |
ics.outlier | Outlier Detection Using ICS |
icsOut-class | Class icsOut |
ics_distances | Squared ICS Distances for Invariant Coordinates |
plot.ICS_Out | Distances Plot for an 'ICS_Out' Object |
plot.icsOut | Distances Plot for an icsOut Object |
print.ICS_Out | Vector of Outlier Indicators |
print.icsOut | Vector of Outlier Indicators |
summary.ICS_Out | Summary of an 'ICS_Out' Object: summarizes an 'ICS_Out' object in an informative way. |
summary.icsOut | Summarize an icsOut Object |
Archimbaud, A., Nordhausen, K. and Ruiz-Gazen, A. (2018), ICS for multivariate outlier detection with application to quality control. Computational Statistics & Data Analysis, 128:184-199. ISSN 0167-9473. doi:10.1016/j.csda.2018.06.011.
Archimbaud, A., Nordhausen, K. and Ruiz-Gazen, A. (2018), ICSOutlier: Unsupervised Outlier Detection for Low-Dimensional Contamination Structure. The R Journal, 10:234-250. doi:10.32614/RJ-2018-034.
Identifies invariant coordinates that are nonnormal using univariate normality tests.
comp_norm_test( object, test = "agostino.test", type = "smallprop", level = 0.05, adjust = TRUE )
object |
object of class |
test |
name of the normality test to be used. Possibilities are |
type |
currently the only option is |
level |
the initial level used to make a decision based on the test p-values. See details. |
adjust |
logical. If |
Currently the only available type is "smallprop", which determines which of the components deviate from univariate normality: it starts from the first component and stops when a component is detected as Gaussian. Five tests for univariate normality are available. See the normal_crit() function for more general cases.
If adjust = FALSE, all tests are performed at the same level. This, however, often leads to too many selected components, so a multiple testing adjustment can be useful. The current default adjusts the level for the jth component to level/j.
Note that the function is rarely called directly by the user; it is mostly called internally by ICS_outlier().
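The sequential "smallprop" rule with the level/j adjustment can be sketched in a few lines of base R. The helper select_nonnormal_sketch() below is hypothetical and not part of ICSOutlier: it takes a matrix whose columns stand in for the invariant coordinates of an ICS object, and it uses stats::shapiro.test (one of the five available tests) in place of the default agostino.test so that it runs without extra packages.

```r
# Hypothetical sketch (not the package's code): "smallprop" selection of
# nonnormal components from a matrix of invariant coordinates.
# Assumption: stats::shapiro.test stands in for the default agostino.test.
select_nonnormal_sketch <- function(scores, level = 0.05, adjust = TRUE) {
  p <- ncol(scores)
  # Level for the jth component: level / j when adjusted, else constant
  levels <- if (adjust) level / seq_len(p) else rep(level, p)
  # Marginal normality p-value for each component
  pvals <- apply(scores, 2, function(z) shapiro.test(z)$p.value)
  # Stop at the first component whose normality is not rejected
  first_normal <- which(pvals >= levels)[1]
  n_sel <- if (is.na(first_normal)) p else first_normal - 1
  list(index = seq_len(n_sel), criterion = pvals, levels = levels)
}

# Toy data standing in for invariant coordinates: 20 shifted
# observations on the first column, the rest standard normal.
set.seed(1)
Z <- matrix(rnorm(1000 * 6), ncol = 6)
Z[1:20, 1] <- Z[1:20, 1] + 10
res <- select_nonnormal_sketch(Z)
res$index
res$levels
```

Because the procedure stops at the first non-rejected component, the selected indices are always a leading block 1, ..., k, which is why the ordering of the invariant coordinates matters.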
A list containing:
index
: integer vector indicating the indices of the selected components.
test
: string with the name of the normality test used.
criterion
: vector of the p-values from the marginal normality tests for each component.
levels
: vector of the levels used for the decision for each component.
adjust
: logical. TRUE if adjusted.
type
: the type used.
Aurore Archimbaud and Klaus Nordhausen
Archimbaud, A., Nordhausen, K. and Ruiz-Gazen, A. (2018), ICS for multivariate outlier detection with application to quality control. Computational Statistics & Data Analysis, 128:184-199. ISSN 0167-9473. doi:10.1016/j.csda.2018.06.011.
ICS(), comp_simu_test(), jarque.test(), anscombe.test(), bonett.test(), agostino.test(), shapiro.test()
library(ICSOutlier)
library(mvtnorm)
Z <- rmvnorm(1000, rep(0, 6))
# Add 20 outliers on the first component
Z[1:20, 1] <- Z[1:20, 1] + 10
pairs(Z)
icsZ <- ICS(Z)
# Outliers created by a location shift can be displayed in one dimension
comp_norm_test(icsZ)
# Only one invariant component is nonnormal and selected.
comp_norm_test(icsZ, test = "bonett.test")
# Example with no outlier
Z0 <- rmvnorm(1000, rep(0, 6))
pairs(Z0)
icsZ0 <- ICS(Z0)
# Should select no component
comp_norm_test(icsZ0, level = 0.01)$index
Identifies invariant coordinates that are nonnormal using simulations under a standard multivariate normal model for a specific data setup and scatter combination.
comp_simu_test( object, S1 = NULL, S2 = NULL, S1_args = list(), S2_args = list(), m = 10000, type = "smallprop", level = 0.05, adjust = TRUE, n_cores = NULL, iseed = NULL, pkg = "ICSOutlier", q_type = 7, ... )
object |
object of class |
S1 |
an object of class |
S2 |
an object of class |
S1_args |
a list containing additional arguments for |
S2_args |
a list containing additional arguments for |
m |
number of simulations. Note that since extreme quantiles are of interest |
type |
currently the only type option is |
level |
the initial level used to make a decision. The cut-off values are the (1- |
adjust |
logical. If |
n_cores |
number of cores to be used. If |
iseed |
If parallel computation is used the seed passed on to |
pkg |
When using parallel computing, a character vector listing all the packages which need to be loaded on the different cores via |
q_type |
specifies the quantile algorithm used in |
... |
further arguments passed on to the function |
Based on simulations, the function detects which of the components deviate from univariate normality. More precisely, it identifies the observed eigenvalues that are larger than those obtained from normally distributed data: m standard normal data sets are simulated with the same data size and scatters as specified in the "ICS" object, and the cut-off values are determined from a quantile of these simulated eigenvalues.
Since the eigenvalues of ICS (also known as generalized kurtosis values) are ordered, it is natural to perform the comparison in a specific order depending on the purpose.
Currently the only available type is "smallprop": starting with the first component, the observed eigenvalues are successively compared to these cut-off values. The procedure stops when an eigenvalue is below the corresponding cut-off, that is, when a normal component is detected.
If adjust = FALSE, all eigenvalues are compared to the same (1-level)th quantile. This, however, often leads to too many selected components, so a multiple testing adjustment can be useful. The current default compares the eigenvalue of the jth component to the (1-level/j)th quantile.
Note that, depending on the data size and the scatters used, this can take a while, so parallelizing the computations is more efficient.
Note also that the function is rarely called directly by the user; it is mostly called internally by ICS_outlier().
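To illustrate how the simulated cut-offs and the 1-level/j adjustment interact, here is a base-R sketch for the scatter pair COV/COV4 used in the examples (S1 = ICS_cov, S2 = ICS_cov4). The helpers ics_eigen_cov_cov4() and comp_simu_sketch() are hypothetical, and the COV4 definition (fourth-moment scatter weighted by squared Mahalanobis distances and scaled by 1/(p+2), so that it matches COV at the normal model) is an assumption. The package itself works with the scatter functions passed as S1 and S2 and supports parallel computation, both of which this sketch omits.

```r
# Hypothetical sketch (not the package's code) of the simulation rule of
# comp_simu_test() for the scatter pair COV / COV4, in base R only.
# Assumption: COV4 is the fourth-moment scatter weighted by squared
# Mahalanobis distances and scaled by 1 / (p + 2).
ics_eigen_cov_cov4 <- function(X) {
  X <- as.matrix(X)
  p <- ncol(X)
  Xc <- sweep(X, 2, colMeans(X))
  # Whiten with the inverse symmetric square root of COV
  e1 <- eigen(cov(X), symmetric = TRUE)
  W <- e1$vectors %*% diag(1 / sqrt(e1$values), p) %*% t(e1$vectors)
  Y <- Xc %*% W
  r2 <- rowSums(Y^2)
  # COV4 of the whitened data; its eigenvalues are the generalized kurtosis values
  S2 <- crossprod(Y * sqrt(r2)) / (nrow(Y) * (p + 2))
  eigen(S2, symmetric = TRUE, only.values = TRUE)$values  # decreasing order
}

comp_simu_sketch <- function(X, m = 200, level = 0.05, adjust = TRUE) {
  n <- nrow(X)
  p <- ncol(X)
  lambda <- ics_eigen_cov_cov4(X)
  # m x p matrix: one row of ordered eigenvalues per simulated normal sample
  sim <- matrix(replicate(m, ics_eigen_cov_cov4(matrix(rnorm(n * p), n, p))),
                nrow = m, byrow = TRUE)
  levels <- if (adjust) level / seq_len(p) else rep(level, p)
  # Cut-off for component j: the (1 - levels[j]) quantile, type 7
  cutoffs <- vapply(seq_len(p), function(j)
    quantile(sim[, j], probs = 1 - levels[j], type = 7, names = FALSE),
    numeric(1))
  # Stop at the first eigenvalue below its cut-off (a "normal" component)
  first_normal <- which(lambda < cutoffs)[1]
  n_sel <- if (is.na(first_normal)) p else first_normal - 1
  list(index = seq_len(n_sel), criterion = cutoffs, levels = levels)
}

set.seed(123)
Z <- matrix(rnorm(1000 * 6), ncol = 6)
Z[1:20, 1] <- Z[1:20, 1] + 10
res <- comp_simu_sketch(Z, m = 200)
res$index
```

The eigenvalues are affine invariant, so simulating from a standard normal model is enough for any data with the same n and p; this is what makes a single set of simulated cut-offs valid for the observed data.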
A list containing:
index
: integer vector indicating the indices of the selected components.
test
: the string "simulation".
criterion
: vector of the cut-off values for all the eigenvalues.
levels
: vector of the levels used for the decision for each component.
adjust
: logical. TRUE if adjusted.
type
: the type used.
m
: the number of iterations m used in the simulations.
Aurore Archimbaud and Klaus Nordhausen
Archimbaud, A., Nordhausen, K. and Ruiz-Gazen, A. (2018), ICS for multivariate outlier detection with application to quality control. Computational Statistics & Data Analysis, 128:184-199. ISSN 0167-9473. doi:10.1016/j.csda.2018.06.011.
library(ICSOutlier)
library(mvtnorm)
# For a real analysis, use larger values of m and more cores if available
set.seed(123)
Z <- rmvnorm(1000, rep(0, 6))
# Add 20 outliers on the first component
Z[1:20, 1] <- Z[1:20, 1] + 10
pairs(Z)
icsZ <- ICS(Z)
# Small m for demonstration purposes only; should select the first component
comp_simu_test(icsZ, S1 = ICS_cov, S2 = ICS_cov4, m = 400, n_cores = 1)
## Not run:
# Using two cores
# Small m for demonstration purposes only; should select the first component
comp_simu_test(icsZ, S1 = ICS_cov, S2 = ICS_cov4, m = 500, n_cores = 2, iseed = 123)
# Using several cores and a scatter function from a different package;
# the parallel package detects the number of cores automatically
library(parallel)
# ICS with MCD estimates and the usual estimates
library(ICSClust)
icsZmcd <- ICS(Z, S1 = ICS_mcd_raw, S2 = ICS_cov, S1_args = list(alpha = 0.75))
# Small m for demonstration purposes only; should select the first component
comp_simu_test(icsZmcd, S1 = ICS_mcd_raw, S2 = ICS_cov,
               S1_args = list(alpha = 0.75, location = TRUE), m = 500,
               n_cores = detectCores() - 1, pkg = c("ICSOutlier", "ICSClust"), iseed = 123)
## End(Not run)
# Example with no outlier
Z0 <- rmvnorm(1000, rep(0, 6))
pairs(Z0)
icsZ0 <- ICS(Z0)
# Should select no component
comp_simu_test(icsZ0, S1 = ICS_cov, S2 = ICS_cov4, m = 400, level = 0.01, n_cores = 1)
Identifies invariant coordinates that are nonnormal using univariate normality tests.
comp.norm.test(object, test = "agostino.test", type = "smallprop", level = 0.05, adjust = TRUE)
object |
object of class |
test |
name of the normality test to be used. Possibilities are |
type |
currently the only option is |
level |
the initial level used to make a decision based on the test p-values. See details. |
adjust |
logical. If |
Currently the only available type is "smallprop", which determines which of the components deviate from univariate normality: it starts from the first component and stops when a component is detected as Gaussian. Five tests for univariate normality are available.
If adjust = FALSE, all tests are performed at the same level. This, however, often leads to too many selected components, so a multiple testing adjustment can be useful. The current default adjusts the level for the jth component to level/j.
Note that the function is rarely called directly by the user; it is mostly called internally by ics.outlier.
A list containing:
index |
integer vector indicating the indices of the selected components. |
test |
string with the name of the normality test used. |
criterion |
vector of the p-values from the marginal normality tests for each component. |
levels |
vector of the levels used for the decision for each component. |
adjust |
logical. |
type |
|
Function comp.norm.test has reached the end of its lifecycle; please use comp_norm_test() instead. In future versions, comp.norm.test will be deprecated and eventually removed.
Aurore Archimbaud and Klaus Nordhausen
Archimbaud, A., Nordhausen, K. and Ruiz-Gazen, A. (2018), ICS for multivariate outlier detection with application to quality control. Computational Statistics & Data Analysis, 128:184-199. ISSN 0167-9473. doi:10.1016/j.csda.2018.06.011.
ics2, comp.simu.test, jarque.test, anscombe.test, bonett.test, agostino.test, shapiro.test
library(ICSOutlier)
library(mvtnorm)
Z <- rmvnorm(1000, rep(0, 6))
# Add 20 outliers on the first component
Z[1:20, 1] <- Z[1:20, 1] + 10
pairs(Z)
icsZ <- ics2(Z)
# Outliers created by a location shift can be displayed in one dimension
comp.norm.test(icsZ)
# Only one invariant component is nonnormal and selected.
comp.norm.test(icsZ, test = "bo")
# Example with no outlier
Z0 <- rmvnorm(1000, rep(0, 6))
pairs(Z0)
icsZ0 <- ics2(Z0)
# Should select no component
comp.norm.test(icsZ0, level = 0.01)$index
Identifies invariant coordinates that are nonnormal using simulations under a standard multivariate normal model for a specific data setup and scatter combination.
comp.simu.test(object, m = 10000, type = "smallprop", level = 0.05, adjust = TRUE, ncores = NULL, iseed = NULL, pkg = "ICSOutlier", qtype = 7, ...)
object |
object of class |
m |
number of simulations. Note that since extreme quantiles are of interest |
type |
currently the only type option is |
level |
the initial level used to make a decision. The cut-off values are the (1- |
adjust |
logical. If |
ncores |
number of cores to be used. If |
iseed |
If parallel computation is used the seed passed on to |
pkg |
When using parallel computing, a character vector listing all the packages which need to be loaded on the different cores via |
qtype |
specifies the quantile algorithm used in |
... |
further arguments passed on to the function |
Based on simulations, the function detects which of the components deviate from univariate normality. More precisely, it identifies the observed eigenvalues that are larger than those obtained from normally distributed data: m standard normal data sets are simulated with the same data size and scatters as specified in the ics2 object, and the cut-off values are determined from a quantile of these simulated eigenvalues.
Since the eigenvalues of ICS (also known as generalized kurtosis values) are ordered, it is natural to perform the comparison in a specific order depending on the purpose.
Currently the only available type is "smallprop": starting with the first component, the observed eigenvalues are successively compared to these cut-off values. The procedure stops when an eigenvalue is below the corresponding cut-off, that is, when a normal component is detected.
If adjust = FALSE, all eigenvalues are compared to the same (1-level)th quantile. This, however, often leads to too many selected components, so a multiple testing adjustment can be useful. The current default compares the eigenvalue of the jth component to the (1-level/j)th quantile.
Note that, depending on the data size and the scatters used, this can take a while, so parallelizing the computations is more efficient.
Note also that the function is rarely called directly by the user; it is mostly called internally by ics.outlier.
A list containing:
index |
integer vector indicating the indices of the selected components. |
test |
string |
criterion |
vector of the cut-off values for all the eigenvalues. |
levels |
vector of the levels used to derive the cut-offs for each component. |
adjust |
logical. |
type |
|
m |
number of iterations |
Function comp.simu.test has reached the end of its lifecycle; please use comp_simu_test() instead. In future versions, comp.simu.test will be deprecated and eventually removed.
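When migrating code from the deprecated functions, the renamings visible in this manual are the function names and a few argument names: ncores becomes n_cores and qtype becomes q_type, and the scatter argument S1args of ics2() becomes S1_args in ICS(). The lookup tables and the helper migrate_args() below are hypothetical and only record those renamings; note that the new functions also expect S1 and S2 to be passed explicitly, which a rename alone does not handle.

```r
# Hypothetical lookup tables (not exported by ICSOutlier): deprecated
# dot-named functions and their replacements, plus argument names that
# differ between the old and new usage lines in this manual.
deprecated_map <- c(
  comp.norm.test = "comp_norm_test",
  comp.simu.test = "comp_simu_test",
  dist.simu.test = "dist_simu_test",
  ics.distances  = "ics_distances",
  ics.outlier    = "ICS_outlier"
)
arg_renames <- c(ncores = "n_cores", qtype = "q_type", S1args = "S1_args")

# Rename old-style argument names in a named list of call arguments
migrate_args <- function(args) {
  if (is.null(names(args))) return(args)
  hit <- names(args) %in% names(arg_renames)
  names(args)[hit] <- unname(arg_renames[names(args)[hit]])
  args
}

old_call <- list(m = 400, level = 0.01, ncores = 1, qtype = 7)
new_call <- migrate_args(old_call)
names(new_call)
```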
Aurore Archimbaud and Klaus Nordhausen
Archimbaud, A., Nordhausen, K. and Ruiz-Gazen, A. (2018), ICS for multivariate outlier detection with application to quality control. Computational Statistics & Data Analysis, 128:184-199. ISSN 0167-9473. doi:10.1016/j.csda.2018.06.011.
library(ICSOutlier)
library(mvtnorm)
# For a real analysis, use larger values of m and more cores if available
set.seed(123)
Z <- rmvnorm(1000, rep(0, 6))
# Add 20 outliers on the first component
Z[1:20, 1] <- Z[1:20, 1] + 10
pairs(Z)
icsZ <- ics2(Z)
# Small m for demonstration purposes only; should select the first component
comp.simu.test(icsZ, m = 400, ncores = 1)
## Not run:
# Using two cores
# Small m for demonstration purposes only; should select the first component
comp.simu.test(icsZ, m = 500, ncores = 2, iseed = 123)
# Using several cores and a scatter function from a different package;
# the parallel package detects the number of cores automatically
library(parallel)
# ICS with MCD estimates and the usual estimates
# A wrapper for CovMcd is needed so that it returns the location estimate
# first and the scatter estimate second.
library(rrcov)
myMCD <- function(x, ...) {
  mcd <- CovMcd(x, ...)
  return(list(location = mcd@center, scatter = mcd@cov))
}
icsZmcd <- ics2(Z, S1 = myMCD, S2 = MeanCov, S1args = list(alpha = 0.75))
# Small m for demonstration purposes only; should select the first component
comp.simu.test(icsZmcd, m = 500, ncores = detectCores() - 1,
               pkg = c("ICSOutlier", "rrcov"), iseed = 123)
## End(Not run)
# Example with no outlier
Z0 <- rmvnorm(1000, rep(0, 6))
pairs(Z0)
icsZ0 <- ics2(Z0)
# Should select no component
comp.simu.test(icsZ0, m = 400, level = 0.01, ncores = 1)
Computes the cut-off values for the identification of outliers based on the squared ICS distances, using simulations under a multivariate standard normal model for a specific data setup and scatter combination.
dist_simu_test( object, S1 = NULL, S2 = NULL, S1_args = list(), S2_args = list(), index, m = 10000, level = 0.025, n_cores = NULL, iseed = NULL, pkg = "ICSOutlier", q_type = 7, ... )
object |
object of class |
S1 |
an object of class |
S2 |
an object of class |
S1_args |
a list containing additional arguments for |
S2_args |
a list containing additional arguments for |
index |
integer vector specifying which components are used to compute the |
m |
number of simulations. Note that extreme quantiles are of interest and hence |
level |
the (1- |
n_cores |
number of cores to be used. If |
iseed |
If parallel computation is used the seed passed on to |
pkg |
When using parallel computing, a character vector listing all the packages which need to be loaded on the different cores via |
q_type |
specifies the quantile algorithm used in |
... |
further arguments passed on to the function |
The function basically extracts the dimension of the data from the "ICS" object and simulates, m times, the squared ICS distances based on the components specified in index for data drawn from a multivariate standard normal distribution. The resulting value is then the mean of the m corresponding (1-level)th quantiles of these distances.
Note that, depending on the data size and the scatters used, this can take a while, so parallelizing the computations is more efficient.
Note that the function is rarely called directly by the user; it is mostly called internally by ICS_outlier().
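The cut-off computation can be sketched in base R for the COV/COV4 pair. The helpers ics_scores_cov_cov4(), ics_dist_sketch() and dist_cutoff_sketch() are hypothetical, and COV4 is assumed to be the fourth-moment scatter weighted by squared Mahalanobis distances and scaled by 1/(p+2). In this sketch the invariant coordinates are standardized with respect to COV, so when index covers all p components the squared ICS distance reduces to the squared Mahalanobis distance with respect to COV.

```r
# Hypothetical sketch (not the package's code) of the dist_simu_test()
# cut-off for the scatter pair COV / COV4.
# Assumption: COV4 is weighted by squared Mahalanobis distances, scaled by 1 / (p + 2).
ics_scores_cov_cov4 <- function(X) {
  X <- as.matrix(X)
  p <- ncol(X)
  Xc <- sweep(X, 2, colMeans(X))
  e1 <- eigen(cov(X), symmetric = TRUE)
  W <- e1$vectors %*% diag(1 / sqrt(e1$values), p) %*% t(e1$vectors)
  Y <- Xc %*% W                                   # centered, whitened data
  r2 <- rowSums(Y^2)
  S2 <- crossprod(Y * sqrt(r2)) / (nrow(Y) * (p + 2))
  Y %*% eigen(S2, symmetric = TRUE)$vectors       # invariant coordinates
}

# Squared ICS distances restricted to the components in `index`
ics_dist_sketch <- function(X, index) {
  rowSums(ics_scores_cov_cov4(X)[, index, drop = FALSE]^2)
}

# Mean, over m standard normal samples of size n x p, of the
# (1 - level) quantile of the squared ICS distances
dist_cutoff_sketch <- function(n, p, index, m = 100, level = 0.025) {
  q <- replicate(m, quantile(ics_dist_sketch(matrix(rnorm(n * p), n, p), index),
                             probs = 1 - level, type = 7, names = FALSE))
  mean(q)
}

set.seed(123)
Z <- matrix(rnorm(1000 * 6), ncol = 6)
Z[1:20, 1] <- Z[1:20, 1] + 10
d1 <- ics_dist_sketch(Z, index = 1)
cut_off <- dist_cutoff_sketch(n = 1000, p = 6, index = 1, m = 50)
which(d1 >= cut_off)
```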
A vector with the values of the (1-level)th quantile.
Aurore Archimbaud and Klaus Nordhausen
Archimbaud, A., Nordhausen, K. and Ruiz-Gazen, A. (2018), ICS for multivariate outlier detection with application to quality control. Computational Statistics & Data Analysis, 128:184-199. ISSN 0167-9473. doi:10.1016/j.csda.2018.06.011.
library(ICSOutlier)
library(mvtnorm)
# For a real analysis, use larger values of m and more cores if available
Z <- rmvnorm(1000, rep(0, 6))
Z[1:20, 1] <- Z[1:20, 1] + 10
A <- matrix(rnorm(36), ncol = 6)
X <- tcrossprod(Z, A)
pairs(X)
icsX <- ICS(X, center = TRUE)
icsX.dist.1 <- ics_distances(icsX, index = 1)
CutOff <- dist_simu_test(icsX, S1 = ICS_cov, S2 = ICS_cov4, index = 1, m = 500, n_cores = 1)
# Check whether the outliers are above the cut-off value
plot(icsX.dist.1, col = rep(2:1, c(20, 980)))
abline(h = CutOff)
library(REPPlab)
data(ReliabilityData)
# Observations 414 and 512 are suspected to be outliers
icsReliability <- ICS(ReliabilityData, center = TRUE)
# Choice of the number of components with the screeplot: 2
screeplot(icsReliability)
# Computation of the distances with the first 2 components
ics.dist.scree <- ics_distances(icsReliability, index = 1:2)
# Computation of the cut-off of the distances
CutOff <- dist_simu_test(icsReliability, S1 = ICS_cov, S2 = ICS_cov4, index = 1:2,
                         m = 50, level = 0.02, n_cores = 1)
# Identification of the outliers based on the cut-off value
plot(ics.dist.scree)
abline(h = CutOff)
outliers <- which(ics.dist.scree >= CutOff)
text(outliers, ics.dist.scree[outliers], outliers, pos = 2, cex = 0.9)
## Not run:
# Using three cores; small m for demonstration purposes only
dist_simu_test(icsReliability, S1 = ICS_cov, S2 = ICS_cov4, index = 1:2, m = 500,
               level = 0.02, n_cores = 3, iseed = 123)
# Using several cores and a scatter function from a different package;
# the parallel package detects the number of cores automatically
library(parallel)
# ICS with Cauchy estimates
library(ICSClust)
icsReliabilityMLC <- ICS(ReliabilityData, S1 = ICS_mlc, S1_args = list(location = TRUE),
                         S2 = ICS_cov, center = TRUE)
# Computation of the cut-off of the distances. Small m for demonstration purposes only.
dist_simu_test(icsReliabilityMLC, S1 = ICS_mlc, S1_args = list(location = TRUE), S2 = ICS_cov,
               index = 1:2, m = 500, level = 0.02, n_cores = detectCores() - 1,
               pkg = c("ICSOutlier", "ICSClust"), iseed = 123)
## End(Not run)
Computes the cut-off values for the identification of outliers based on the squared ICS distances, using simulations under a multivariate standard normal model for a specific data setup and scatter combination.
dist.simu.test(object, index, m = 10000, level = 0.025, ncores = NULL, iseed = NULL, pkg = "ICSOutlier", qtype = 7, ...)
object |
object of class |
index |
integer vector specifying which components are used to compute the |
m |
number of simulations. Note that extreme quantiles are of interest and hence |
level |
the (1- |
ncores |
number of cores to be used. If |
iseed |
If parallel computation is used the seed passed on to |
pkg |
When using parallel computing, a character vector listing all the packages which need to be loaded on the different cores via |
qtype |
specifies the quantile algorithm used in |
... |
further arguments passed on to the function |
The function basically extracts the dimension of the data from the ics2 object and simulates, m times, the squared ICS distances based on the components specified in index for data drawn from a multivariate standard normal distribution. The resulting value is then the mean of the m corresponding (1-level)th quantiles of these distances.
Note that, depending on the data size and the scatters used, this can take a while, so parallelizing the computations is more efficient.
Note that the function is rarely called directly by the user; it is mostly called internally by ics.outlier.
A vector with the values of the (1-level)th quantile.
Function dist.simu.test has reached the end of its lifecycle; please use dist_simu_test() instead. In future versions, dist.simu.test will be deprecated and eventually removed.
Aurore Archimbaud and Klaus Nordhausen
Archimbaud, A., Nordhausen, K. and Ruiz-Gazen, A. (2018), ICS for multivariate outlier detection with application to quality control. Computational Statistics & Data Analysis, 128:184-199. ISSN 0167-9473. doi:10.1016/j.csda.2018.06.011.
library(ICSOutlier)
library(mvtnorm)
# For a real analysis, use larger values of m and more cores if available
Z <- rmvnorm(1000, rep(0, 6))
Z[1:20, 1] <- Z[1:20, 1] + 10
A <- matrix(rnorm(36), ncol = 6)
X <- tcrossprod(Z, A)
pairs(X)
icsX <- ics2(X)
icsX.dist.1 <- ics.distances(icsX, index = 1)
CutOff <- dist.simu.test(icsX, 1, m = 500, ncores = 1)
# Check whether the outliers are above the cut-off value
plot(icsX.dist.1, col = rep(2:1, c(20, 980)))
abline(h = CutOff)
library(REPPlab)
data(ReliabilityData)
# Observations 414 and 512 are suspected to be outliers
icsReliability <- ics2(ReliabilityData, S1 = MeanCov, S2 = Mean3Cov4)
# Choice of the number of components with the screeplot: 2
screeplot(icsReliability)
# Computation of the distances with the first 2 components
ics.dist.scree <- ics.distances(icsReliability, index = 1:2)
# Computation of the cut-off of the distances
CutOff <- dist.simu.test(icsReliability, 1:2, m = 50, level = 0.02, ncores = 1)
# Identification of the outliers based on the cut-off value
plot(ics.dist.scree)
abline(h = CutOff)
outliers <- which(ics.dist.scree >= CutOff)
text(outliers, ics.dist.scree[outliers], outliers, pos = 2, cex = 0.9)
## Not run:
# Using three cores; small m for demonstration purposes only
dist.simu.test(icsReliability, 1:2, m = 500, level = 0.02, ncores = 3, iseed = 123)
# Using several cores and a scatter function from a different package;
# the parallel package detects the number of cores automatically
library(parallel)
# ICS with the multivariate median and Tyler's shape matrix and the usual estimates
library(ICSNP)
icsReliabilityHRMest <- ics2(ReliabilityData, S1 = HR.Mest, S2 = MeanCov,
                             S1args = list(maxiter = 1000))
# Computation of the cut-off of the distances. Small m for demonstration purposes only.
dist.simu.test(icsReliabilityHRMest, 1:2, m = 500, level = 0.02, ncores = detectCores() - 1,
               pkg = c("ICSOutlier", "ICSNP"), iseed = 123)
## End(Not run)
The HTP data set contains 902 high-tech parts designed for consumer products, characterized by 88 tests. These tests are performed to ensure a high quality of the production. All these 902 parts were considered functional and have been sold. However, the two parts 581 and 619 showed defects in use and were returned to the manufacturer by the customer. Therefore, these two can be considered as outliers.
data("HTP")
A data frame with 902 observations and 88 numeric variables V.1 - V.88.
Anonymized data from a nondisclosed manufacturer.
# HTP data: the observations 581 and 619 are considered as outliers
data(HTP)
outliers <- c(581, 619)
boxplot(HTP)

# Outlier detection using ICS
icsHTP <- ics2(HTP)
# Selection of components based on a Normality Test, for demo purpose only small mDist value,
# but as extreme quantiles are of interest mDist should be much larger.
# Also more cores could be used if available.
icsOutlierDA <- ics.outlier(icsHTP, test = "agostino.test", level.test = 0.05,
                            level.dist = 0.02, mDist = 50, ncores = 1)
icsOutlierDA
summary(icsOutlierDA)
plot(icsOutlierDA)
text(outliers, icsOutlierDA@ics.distances[outliers], outliers, pos = 2, cex = 0.9, col = 2)

## Not run:
# Selection of components based on simulations
# This might take a while to run (around 30 minutes)
icsOutlierPA <- ics.outlier(icsHTP, method = "simulation", level.dist = 0.02,
                            level.test = 0.05, mEig = 10000, mDist = 10000)
icsOutlierPA
summary(icsOutlierPA)
plot(icsOutlierPA)
text(outliers, icsOutlierPA@ics.distances[outliers], outliers, pos = 2, cex = 0.9, col = 2)
## End(Not run)
The HTP2 data set contains 457 high-tech parts designed for consumer products, characterized by 149 tests. These tests are performed to ensure a high quality of the production. All these 457 parts were considered functional and have been sold. However, part 28 showed defects in use and was returned to the manufacturer by the customer. Therefore, this part can be considered an outlier.
data("HTP2")
A data frame with 457 rows and 149 variables V.1 - V.149, presenting some collinearity issues.
Anonymized data from a nondisclosed manufacturer.
Archimbaud, A., Drmac, Z., Nordhausen, K., Radojcic, U. and Ruiz-Gazen, A. (2023) Numerical Considerations and a New Implementation for Invariant Coordinate Selection. SIAM Journal on Mathematics of Data Science, 5(1), 97–121. doi:10.1137/22M1498759.
# HTP2 data: the observation 28 is considered as an outlier
data("HTP2")
outliers <- c(28)
boxplot(HTP2, horizontal = TRUE)

# Outlier detection using ICS
library(ICS)
## Not run:
out <- ICS_outlier(HTP2, ICS_algorithm = "QR",
                   method = "norm_test", test = "agostino.test",
                   level_test = 0.05, level_dist = 0.01, n_dist = 50)
# Here there is a singularity issue. One solution is to first reduce the
# dimension. To ensure higher numerical stability of the subsequent methods
# we suggest permuting the data and using the QR decomposition instead of
# the regular SVD decomposition.
Xt <- HTP2
# Centering by the column means
Xt.c <- sweep(HTP2, 2, colMeans(HTP2), "-")
# Permutation of the rows,
# in decreasing order of the infinity norm (absolute maximum)
norm_inf <- apply(Xt.c, 1, function(x) max(abs(x)))
order_rows <- order(norm_inf, decreasing = TRUE)
Xt_row_per <- Xt.c[order_rows, ]
# QR decomposition of Xt with column pivoting from LAPACK
qr_Xt <- qr(1/sqrt(nrow(Xt.c) - 1) * Xt_row_per, LAPACK = TRUE)
# Estimation of the rank q
# R is nxp, but with only zeros for rows > p.
# The absolute diagonal of R is already in decreasing order and gives a good
# approximation of the rank of Xt.c. To decide which values are zero we use
# a relative criterion based on the previous values.
# R should be pxp
R <- qr.R(qr_Xt)
r_all <- abs(diag(R))
r_ratios <- r_all[2:length(r_all)]/r_all[1:(length(r_all) - 1)]
q <- which(r_ratios < max(dim(Xt.c)) * .Machine$double.eps)[1]
q <- ifelse(is.na(q), length(r_all), q)
# Q should be nxp but we are only interested in nxq
Q1 <- qr.Q(qr_Xt)[, 1:q]
# QR decomposition of Rt
R_q <- R[1:q, ]
qr_R <- qr(t(R_q), LAPACK = TRUE)
Tau <- qr.Q(qr_R)[1:q, ]
Omega1 <- qr.R(qr_R)[1:q, 1:q]
# New X tilde
# permutation matrices
# permutation of rows
Pi2 <- data.frame(model.matrix(~ . -1, data = data.frame(row = as.character(order_rows))))
Pi2 <- Pi2[, order(as.numeric(substr(colnames(Pi2), start = 4, stop = nchar(colnames(Pi2)))))]
colnames(Pi2) <- rownames(Xt)
# permutation of cols
Pi3 <- data.frame(model.matrix(~ . -1, data = data.frame(col = as.character(qr_R$pivot))))
Pi3 <- t(Pi3[, order(as.numeric(substr(colnames(Pi3), start = 4, stop = nchar(colnames(Pi3)))))])
X_tilde <- sqrt(nrow(Xt) - 1) * Tau %*% t(Pi3) %*% t(Q1)
Xt_tilde <- t(Pi2) %*% t(X_tilde)
# Run ICS_outlier
out <- ICS_outlier(Xt_tilde, ICS_algorithm = "QR", method = "norm_test",
                   test = "agostino.test", level_test = 0.01,
                   level_dist = 0.01, n_dist = 50)
summary(out)
plot(out)
text(outliers, out$ics_distances[outliers], outliers, pos = 2, cex = 0.9, col = 2)
## End(Not run)
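The rank-estimation step in the example above (pivoted QR, then a relative test on consecutive diagonal entries of R) can be tried in isolation. The following is a minimal sketch on synthetic data with two exactly collinear columns; the data are an illustrative assumption, not HTP2, and only base R functions (qr() with LAPACK = TRUE, qr.R(), qr.Q()) are used. The threshold max(n, p) * .Machine$double.eps is the same relative criterion as in the example.

```r
# Synthetic data with exact collinearity (hypothetical example, not HTP2)
set.seed(7)
n <- 200
A <- matrix(rnorm(n * 3), n, 3)
X <- cbind(A, A[, 1] + A[, 2], 2 * A[, 3])   # columns 4 and 5 are linear combinations

Xc <- sweep(X, 2, colMeans(X))               # center each column
qr_X <- qr(Xc / sqrt(n - 1), LAPACK = TRUE)  # Householder QR with column pivoting

# With column pivoting, |diag(R)| is (approximately) non-increasing;
# a sudden relative drop marks the numerical rank
r_all <- abs(diag(qr.R(qr_X)))
r_ratios <- r_all[-1] / r_all[-length(r_all)]
q <- which(r_ratios < max(dim(Xc)) * .Machine$double.eps)[1]
q <- if (is.na(q)) length(r_all) else q      # no drop found: full column rank
q                                            # estimated rank

# Orthonormal basis (n x q) of the column space, as Q1 in the example
Q1 <- qr.Q(qr_X)[, 1:q]
dim(Q1)
```

Reducing the data to q columns before running ICS avoids inverting a singular first scatter matrix, which is the singularity issue the HTP2 example runs into.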
The HTP3 data set contains 371 high-tech parts designed for consumer products, characterized by 33 tests. These tests are performed to ensure a high quality of the production. All these 371 parts were considered functional and have been sold. However, part 32 showed defects in use and was returned to the manufacturer by the customer. Therefore, this part can be considered an outlier.
data("HTP3")
A data frame with 371 rows and 33 variables V.1 - V.33, presenting some approximate collinearity issues which may cause some numerical inaccuracies.
Anonymized data from a nondisclosed manufacturer.
Archimbaud, A., Drmac, Z., Nordhausen, K., Radojcic, U. and Ruiz-Gazen, A. (2023) Numerical Considerations and a New Implementation for Invariant Coordinate Selection. SIAM Journal on Mathematics of Data Science, 5(1), 97–121. doi:10.1137/22M1498759.
# HTP3 data: the observation 32 is considered as an outlier
data("HTP3")
outliers <- c(32)
boxplot(HTP3)

# Outlier detection using ICS
library(ICS)
out <- ICS_outlier(HTP3, ICS_algorithm = "QR",
                   method = "norm_test", test = "agostino.test",
                   level_test = 0.05, level_dist = 0.01, n_dist = 50)
summary(out)
plot(out)
text(outliers, out$ics_distances[outliers], outliers, pos = 2, cex = 0.9, col = 2)
Squared ICS Distances for Invariant Coordinates
ics_distances(object, index = NULL)
object |
object of class |
index |
vector of integers indicating the indices of the components to select. |
For outlier detection, the squared ICS distances can be used as a measure of outlierness. Denote as $Z_c$ the invariant coordinates centered with the location estimate specified in S1 (for details see ICS()). Let $Z_c^k$ be the $k$ components of $Z_c$ selected by index; then the squared ICS distance of observation $i$ is defined as
$$\lVert z^{k}_{c,i} \rVert^2 = \sum_{j=1}^{k} \left(z^{k}_{c,ij}\right)^2,$$
where $z^{k}_{c,i}$ denotes the $i$-th row of $Z_c^k$.
Note that if all components are selected, the ICS distances are equivalent to the Mahalanobis distances computed with respect to the first scatter and the associated location specified in S1.
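To make the definition concrete, here is a minimal base-R sketch that computes centered invariant coordinates for S1 = sample covariance (with the sample mean as location) and S2 = a fourth-moment, Cov4-type scatter, and then the squared ICS distances. The helper names ics_sketch() and ics_dist_sketch() are hypothetical, the Cov4 scaling is one common convention assumed here, and the code is not the package's ICS() or ics_distances(); it only illustrates the formula and the Mahalanobis equivalence stated above.

```r
# Hypothetical helper: ICS with S1 = cov() and S2 = a Cov4-type scatter (assumed scaling)
ics_sketch <- function(X) {
  X <- as.matrix(X)
  n <- nrow(X); p <- ncol(X)
  m <- colMeans(X)                                   # location associated with S1
  Xc <- sweep(X, 2, m)                               # centered data
  S1 <- cov(X)
  r2 <- mahalanobis(X, m, S1)                        # squared Mahalanobis distances
  S2 <- crossprod(Xc * sqrt(r2)) / ((n - 1) * (p + 2))
  e1 <- eigen(S1, symmetric = TRUE)
  W <- e1$vectors %*% diag(1 / sqrt(e1$values), p) %*% t(e1$vectors)  # S1^(-1/2)
  e2 <- eigen(W %*% S2 %*% W, symmetric = TRUE)      # eigenvalues in decreasing order
  list(Zc = Xc %*% W %*% e2$vectors,                 # centered invariant coordinates
       eigenvalues = e2$values)
}

# Hypothetical helper: squared Euclidean norm of the selected centered components
ics_dist_sketch <- function(Zc, index) rowSums(Zc[, index, drop = FALSE]^2)

set.seed(1)
Z <- matrix(rnorm(1000 * 6), ncol = 6)
Z[1:20, 1] <- Z[1:20, 1] + 5                         # shift 20 observations
X <- Z %*% matrix(rnorm(36), ncol = 6)               # random linear mixing

ics <- ics_sketch(X)
d_all <- ics_dist_sketch(ics$Zc, 1:6)
maha <- mahalanobis(X, center = colMeans(X), cov = cov(X))
all.equal(d_all, maha)    # TRUE: all components reproduce the Mahalanobis distances
d_first <- ics_dist_sketch(ics$Zc, 1)
```

The equivalence holds because the whitening matrix $B$ satisfies $B^\top B = S_1^{-1}$, so the squared norm of all centered coordinates equals $(x_i - m)^\top S_1^{-1} (x_i - m)$. Passing index = 1 restricts the distance to the first invariant coordinate only.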
A numeric vector containing the squared ICS distances.
Aurore Archimbaud and Klaus Nordhausen
Archimbaud, A., Nordhausen, K. and Ruiz-Gazen, A. (2018), ICS for multivariate outlier detection with application to quality control. Computational Statistics & Data Analysis, 128:184-199. ISSN 0167-9473. doi:10.1016/j.csda.2018.06.011.
library(mvtnorm)  # provides rmvnorm()
Z <- rmvnorm(1000, rep(0, 6))
Z[1:20, 1] <- Z[1:20, 1] + 5
A <- matrix(rnorm(36), ncol = 6)
X <- tcrossprod(Z, A)
pairs(X)
icsX <- ICS(X, center = TRUE)
icsX.dist.all <- ics_distances(icsX, index = 1:6)
maha <- mahalanobis(X, center = colMeans(X), cov = cov(X))
# in this case the distances should be the same
plot(icsX.dist.all, maha)
all.equal(icsX.dist.all, maha)
icsX.dist.first <- ics_distances(icsX, index = 1)
plot(icsX.dist.first)
In a multivariate framework, outliers are detected using ICS. The function performs ICS() and automatically decides on the number of invariant components used to search for outliers and on the number of outliers detected on these components. Currently the function is restricted to searching for outliers on the first components only.
ICS_outlier( X, S1 = ICS_cov, S2 = ICS_cov4, S1_args = list(), S2_args = list(), ICS_algorithm = c("whiten", "standard", "QR"), method = "norm_test", test = "agostino.test", n_eig = 10000, level_test = 0.05, adjust = TRUE, level_dist = 0.025, n_dist = 10000, type = "smallprop", n_cores = NULL, iseed = NULL, pkg = "ICSOutlier", q_type = 7, ... )
X |
a numeric matrix or data frame containing the data to be transformed. |
S1 |
an object of class |
S2 |
an object of class |
S1_args |
a list containing additional arguments for |
S2_args |
a list containing additional arguments for |
ICS_algorithm |
a character string specifying with which algorithm the invariant coordinate system is computed. Possible values are |
method |
name of the method used to select the ICS components involved in computing the ICS distances. Options are |
test |
name of the marginal normality test to use if |
n_eig |
number of simulations performed to derive the cut-off values for selecting the ICS components. Only if |
level_test |
for the |
adjust |
logical. For selecting the invariant coordinates, the level of the test can be adjusted for each component to deal with multiple testing. See |
level_dist |
|
n_dist |
number of simulations performed to derive the cut-off value for the ICS distances. See |
type |
currently the only option is |
n_cores |
number of cores to be used in |
iseed |
If parallel computation is used the seed passed on to |
pkg |
When using parallel computing, a character vector listing all the packages which need to be loaded on the different cores via |
q_type |
specifies the quantile algorithm used in |
... |
passed on to other methods. |
The ICS method has attractive properties for outlier detection in the case of a small proportion of outliers. As for PCA, three steps have to be performed: (i) select the components most useful for the detection, (ii) compute distances as outlierness measures for all observations, and finally (iii) label outliers using some cut-off value.
This function performs these three steps automatically:
For choosing the components of interest, two methods are proposed: "norm_test", based on some marginal normality tests (see details in comp_norm_test), or "simulation", based on a parallel analysis (see details in comp_simu_test). These two approaches rely on an intrinsic property of ICS in the case of a small proportion of outliers: choosing S1 "more robust" than S2 ensures that the outliers are found on the first components. Indeed, when using S1 = ICS_cov and S2 = ICS_cov4, the invariant coordinates are ordered according to their classical Pearson kurtosis values in decreasing order. The information needed to find the outliers should then be contained in the first k non-normal directions.
Then the ICS distances are computed as the Euclidean distances on the selected k centered components $Z^k$.
Finally, the outliers are identified based on a cut-off derived from simulations. If the distance of an observation exceeds the expectation under the normal model, this observation is labeled as an outlier (see details in dist_simu_test).
As a rule of thumb, the percentage of contamination should be limited to 10% in the case of a mixture of Gaussian distributions when using the default combination of locations and scatters for ICS.
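The three steps can be sketched end to end in base R. This is a simplified illustration under stated assumptions, not ICS_outlier(): S1 is the sample covariance, S2 a Cov4-type scatter, components are selected by sequential Shapiro-Wilk tests at a fixed level without any multiple-testing adjustment (a stand-in for "norm_test"), and the cut-off is approximated, as an assumption on our part, by averaging the (1 - level_dist) quantiles of the distances over m simulated Gaussian samples of the same size. All function names are hypothetical.

```r
# Hypothetical sketch of the three ICS outlier-detection steps (not ICS_outlier())

# Centered invariant coordinates for S1 = cov(), S2 = Cov4-type scatter (assumed scaling)
ics_scores <- function(X) {
  X <- as.matrix(X)
  n <- nrow(X); p <- ncol(X)
  m <- colMeans(X)
  Xc <- sweep(X, 2, m)
  S1 <- cov(X)
  r2 <- mahalanobis(X, m, S1)
  S2 <- crossprod(Xc * sqrt(r2)) / ((n - 1) * (p + 2))
  e1 <- eigen(S1, symmetric = TRUE)
  W <- e1$vectors %*% diag(1 / sqrt(e1$values), p) %*% t(e1$vectors)  # S1^(-1/2)
  e2 <- eigen(W %*% S2 %*% W, symmetric = TRUE)    # decreasing eigenvalues
  Xc %*% W %*% e2$vectors
}

# Step (i): keep the leading components while normality is rejected
select_components <- function(Z, level = 0.05) {
  k <- 0L
  for (j in seq_len(ncol(Z))) {
    if (shapiro.test(Z[, j])$p.value < level) k <- j else break
  }
  seq_len(k)
}

# Step (iii) helper: Monte Carlo cut-off under the normal model
simulate_cutoff <- function(n, p, index, level = 0.025, m = 50) {
  q <- replicate(m, {
    Zs <- ics_scores(matrix(rnorm(n * p), n, p))
    quantile(rowSums(Zs[, index, drop = FALSE]^2), 1 - level)
  })
  mean(q)
}

ics_out_sketch <- function(X, level_test = 0.05, level_dist = 0.025, m = 50) {
  Z <- ics_scores(X)
  index <- select_components(Z, level_test)
  if (length(index) == 0) {
    # no non-normal direction selected: nothing is flagged
    return(list(index = index, cutoff = NA_real_, outliers = integer(nrow(Z))))
  }
  d <- rowSums(Z[, index, drop = FALSE]^2)         # step (ii): squared ICS distances
  cutoff <- simulate_cutoff(nrow(Z), ncol(Z), index, level_dist, m)
  list(index = index, distances = d, cutoff = cutoff,
       outliers = as.integer(d > cutoff))          # 1 = outlier, 0 = not
}

set.seed(42)
Z0 <- matrix(rnorm(500 * 4), ncol = 4)
Z0[1:20, 1] <- Z0[1:20, 1] + 10                    # 20 shifted observations
X <- Z0 %*% matrix(rnorm(16), 4)                   # random linear mixing
res <- ics_out_sketch(X)
res$index                                          # selected components
sum(res$outliers[1:20])                            # planted observations flagged
```

As in the package, m is kept small here only for speed; since the cut-off concerns an extreme quantile, a real analysis would use many more simulations.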
An object of S3-class 'ICS_Out' which contains:
outliers: A vector containing ones for outliers and zeros for non-outliers.
ics_distances: A numeric vector containing the squared ICS distances.
ics_dist_cutoff: The cut-off for the distances to decide if an observation is outlying or not.
level_dist: The level for deciding upon the cut-off value for the ICS distances.
level_test: The initial level for selecting the invariant coordinates.
method: Name of the method used to decide upon the number of ICS components.
index: Vector giving the indices of the ICS components selected.
test: The name of the normality test as specified in the function call.
criterion: Vector giving the marginal levels for the components selection.
adjust: Whether the initial level used to decide upon the number of components has been adjusted for multiple testing or not.
type: Currently always the string "smallprop".
n_dist: Number of simulations performed to decide upon the cut-off for the ICS distances.
n_eig: Number of simulations performed for selecting the ICS components based on simulations.
S1_label: Name of S1.
S2_label: Name of S2.
Aurore Archimbaud and Klaus Nordhausen
Archimbaud, A., Nordhausen, K. and Ruiz-Gazen, A. (2018), ICS for multivariate outlier detection with application to quality control. Computational Statistics & Data Analysis, 128:184-199. ISSN 0167-9473. doi:10.1016/j.csda.2018.06.011.
ICS(), comp_norm_test(), comp_simu_test(), dist_simu_test() and the print(), plot(), summary() methods
# ReliabilityData example: the observations 414 and 512 are suspected to be outliers
library(REPPlab)
data(ReliabilityData)
# For demonstration purposes only a small n_dist value is used, but as extreme
# quantiles are of interest n_dist should be much larger. Also the number of
# cores used should be larger if available
icsOutlierDA <- ICS_outlier(ReliabilityData, S1 = ICS_tM, S2 = ICS_cov,
                            level_dist = 0.01, n_dist = 50, n_cores = 1)
icsOutlierDA
summary(icsOutlierDA)
plot(icsOutlierDA)

## Not run:
# For using several cores and for using a scatter function from a different package
# Using the parallel package to detect automatically the number of cores
library(parallel)
# ICS with reweighted MCD estimates (ICS_mcd_rwt from the ICSClust package)
# and the usual estimates
data(HTP)
library(ICSClust)
# For demonstration purposes only small n_eig and n_dist values are used;
# the procedure should select the first seven components
icsOutlier <- ICS_outlier(HTP, S1 = ICS_mcd_rwt, S2 = ICS_cov,
                          S1_args = list(location = TRUE, alpha = 0.75),
                          n_eig = 50, level_test = 0.05, adjust = TRUE,
                          level_dist = 0.025, n_dist = 50,
                          n_cores = detectCores() - 1, iseed = 123,
                          pkg = c("ICSOutlier", "ICSClust"))
icsOutlier
## End(Not run)

# Example of no direction and hence also no outlier
library(mvtnorm)  # provides rmvnorm()
set.seed(123)
X <- rmvnorm(500, rep(0, 2), diag(rep(0.1, 2)))
icsOutlierJB <- ICS_outlier(X, test = "jarque.test", level_dist = 0.01,
                            level_test = 0.01, n_dist = 100, n_cores = 1)
summary(icsOutlierJB)
plot(icsOutlierJB)
rm(.Random.seed)

# Example of no outlier
set.seed(123)
X <- matrix(rweibull(1000, 4, 4), 500, 2)
X <- apply(X, 2, function(x){ifelse(x < 5 & x > 2, x, runif(sum(!(x < 5 & x > 2)), 5, 5.5))})
icsOutlierAG <- ICS_outlier(X, test = "anscombe.test", level_dist = 0.01,
                            level_test = 0.05, n_dist = 100, n_cores = 1)
summary(icsOutlierAG)
plot(icsOutlierAG)
rm(.Random.seed)
Computes the squared ICS distances, defined as the Euclidean distances of the selected centered components.
ics.distances(object, index = NULL)
object |
object of class |
index |
vector of integers indicating the indices of the components to select. |
For outlier detection, the squared ICS distances can be used as a measure of outlierness. Denote as $Z_c$ the invariant coordinates centered with the location estimate specified in S1 (for details see ics2). Let $Z_c^k$ be the $k$ components of $Z_c$ selected by index; then the squared ICS distance of observation $i$ is defined as
$$\lVert z^{k}_{c,i} \rVert^2 = \sum_{j=1}^{k} \left(z^{k}_{c,ij}\right)^2,$$
where $z^{k}_{c,i}$ denotes the $i$-th row of $Z_c^k$.
Note that if all components are selected, the ICS distances are equivalent to the Mahalanobis distances computed with respect to the first scatter and the associated location specified in S1.
A numeric vector containing the squared ICS distances.
Function ics.distances() reached the end of its lifecycle; please use ics_distances() instead. In future versions, ics.distances() will be deprecated and eventually removed.
Aurore Archimbaud and Klaus Nordhausen
Archimbaud, A., Nordhausen, K. and Ruiz-Gazen, A. (2018), ICS for multivariate outlier detection with application to quality control. Computational Statistics & Data Analysis, 128:184-199. ISSN 0167-9473. <https://doi.org/10.1016/j.csda.2018.06.011>.
library(mvtnorm)  # provides rmvnorm()
Z <- rmvnorm(1000, rep(0, 6))
Z[1:20, 1] <- Z[1:20, 1] + 5
A <- matrix(rnorm(36), ncol = 6)
X <- tcrossprod(Z, A)
pairs(X)
icsX <- ics2(X)
icsX.dist.all <- ics.distances(icsX, index = 1:6)
maha <- mahalanobis(X, center = colMeans(X), cov = cov(X))
# in this case the distances should be the same
plot(icsX.dist.all, maha)
all.equal(icsX.dist.all, maha)
icsX.dist.first <- ics.distances(icsX, index = 1)
plot(icsX.dist.first)
In a multivariate framework, outliers are detected using ICS. The function works on an object of class ics2 and automatically decides on the number of invariant components used to search for outliers and on the number of outliers detected on these components. Currently the function is restricted to searching for outliers on the first components only.
ics.outlier(object, method = "norm.test", test = "agostino.test", mEig = 10000, level.test = 0.05, adjust = TRUE, level.dist = 0.025, mDist = 10000, type = "smallprop", ncores = NULL, iseed = NULL, pkg = "ICSOutlier", qtype = 7, ...)
object |
object of class |
method |
name of the method used to select the ICS components involved in computing the ICS distances. Options are |
test |
name of the marginal normality test to use if |
mEig |
number of simulations performed to derive the cut-off values for selecting the ICS components. Only if |
level.test |
|
adjust |
logical. For selecting the invariant coordinates, the level of the test can be adjusted for each component to deal with multiple testing. See |
level.dist |
|
mDist |
number of simulations performed to derive the cut-off value for the ICS distances. See |
type |
currently the only option is |
ncores |
number of cores to be used in |
iseed |
If parallel computation is used the seed passed on to |
pkg |
When using parallel computing, a character vector listing all the packages which need to be loaded on the different cores via |
qtype |
specifies the quantile algorithm used in |
... |
passed on to other methods. |
The ICS method has attractive properties for outlier detection in the case of a small proportion of outliers. As for PCA, three steps have to be performed: (i) select the components most useful for the detection, (ii) compute distances as outlierness measures for all observations, and finally (iii) label outliers using some cut-off value.
This function performs these three steps automatically:
For choosing the components of interest, two methods are proposed: "norm.test", based on some marginal normality tests (see details in comp.norm.test), or "simulation", based on a parallel analysis (see details in comp.simu.test). These two approaches rely on an intrinsic property of ICS in the case of a small proportion of outliers: choosing S1 "more robust" than S2 ensures that the outliers are found on the first components. Indeed, when using S1 = MeanCov and S2 = Mean3Cov4, the invariant coordinates are ordered according to their classical Pearson kurtosis values in decreasing order. The information needed to find the outliers should then be contained in the first k non-normal directions.
Then the ICS distances are computed as the Euclidean distances on the selected k centered components $Z^k$.
Finally, the outliers are identified based on a cut-off derived from simulations. If the distance of an observation exceeds the expectation under the normal model, this observation is labeled as an outlier (see details in dist.simu.test).
As a rule of thumb, the percentage of contamination should be limited to 10% in the case of a mixture of Gaussian distributions when using the default combination of locations and scatters for ICS.
An object of class icsOut.
Function ics.outlier reached the end of its lifecycle; please use ICS_outlier instead. In future versions, ics.outlier will be deprecated and eventually removed.
Aurore Archimbaud and Klaus Nordhausen
Archimbaud, A., Nordhausen, K. and Ruiz-Gazen, A. (2018), ICS for multivariate outlier detection with application to quality control. Computational Statistics & Data Analysis, 128:184-199. ISSN 0167-9473. <https://doi.org/10.1016/j.csda.2018.06.011>.
Archimbaud, A., Nordhausen, K. and Ruiz-Gazen, A. (2018), ICSOutlier: Unsupervised Outlier Detection for Low-Dimensional Contamination Structure. The R Journal, 10:234-250. <https://doi.org/10.32614/RJ-2018-034>.
ics2, comp.norm.test, comp.simu.test, dist.simu.test, icsOut-class
# ReliabilityData example: the observations 414 and 512 are suspected to be outliers
library(REPPlab)
data(ReliabilityData)
icsReliabilityData <- ics2(ReliabilityData, S1 = tM, S2 = MeanCov)
# For demonstration purposes only a small mDist value is used, but as extreme
# quantiles are of interest mDist should be much larger. Also the number of
# cores used should be larger if available
icsOutlierDA <- ics.outlier(icsReliabilityData, level.dist = 0.01, mDist = 50, ncores = 1)
icsOutlierDA
summary(icsOutlierDA)
plot(icsOutlierDA)

## Not run:
# For using several cores and for using a scatter function from a different package
# Using the parallel package to detect automatically the number of cores
library(parallel)
# ICS with MCD estimates and the usual estimates
# Need to create a wrapper for the CovMcd function to return first the location estimate
# and the scatter estimate secondly.
data(HTP)
library(rrcov)
myMCD <- function(x, ...){
  mcd <- CovMcd(x, ...)
  return(list(location = mcd@center, scatter = mcd@cov))
}
icsHTP <- ics2(HTP, S1 = myMCD, S2 = MeanCov, S1args = list(alpha = 0.75))
# For demonstration purposes only small mEig and mDist values are used;
# the procedure should select the first seven components
icsOutlier <- ics.outlier(icsHTP, mEig = 50, level.test = 0.05, adjust = TRUE,
                          level.dist = 0.025, mDist = 50,
                          ncores = detectCores() - 1, iseed = 123,
                          pkg = c("ICSOutlier", "rrcov"))
icsOutlier
## End(Not run)

# Example of no direction and hence also no outlier
library(mvtnorm)  # provides rmvnorm()
set.seed(123)
X <- rmvnorm(500, rep(0, 2), diag(rep(0.1, 2)))
icsX <- ics2(X)
icsOutlierJB <- ics.outlier(icsX, test = "jarque", level.dist = 0.01,
                            level.test = 0.01, mDist = 100, ncores = 1)
summary(icsOutlierJB)
plot(icsOutlierJB)
rm(.Random.seed)

# Example of no outlier
set.seed(123)
X <- matrix(rweibull(1000, 4, 4), 500, 2)
X <- apply(X, 2, function(x){ifelse(x < 5 & x > 2, x, runif(sum(!(x < 5 & x > 2)), 5, 5.5))})
icsX <- ics2(X)
icsOutlierAG <- ics.outlier(icsX, test = "anscombe", level.dist = 0.01,
                            level.test = 0.05, mDist = 100, ncores = 1)
summary(icsOutlierAG)
plot(icsOutlierAG)
rm(.Random.seed)
An S4 class to store results from performing outlier detection in an ICS context.
Objects can be created by calls of the form new("icsOut", ...), but usually they are created by the function ics.outlier.
outliers: Object of class "integer". A vector containing ones for outliers and zeros for non-outliers.
ics.distances: Object of class "numeric". Vector giving the squared ICS distances of the observations from the invariant coordinates centered with the location estimate specified in S1.
ics.dist.cutoff: Object of class "numeric". The cut-off for the distances used to decide whether an observation is outlying or not.
level.dist: Object of class "numeric". The level for deciding upon the cut-off value for the ICS distances.
level.test: Object of class "numeric". The initial level for selecting the invariant coordinates.
method: Object of class "character". Name of the method used to decide upon the number of ICS components.
index: Object of class "numeric". Vector giving the indices of the ICS components selected.
test: Object of class "character". The name of the normality test as specified in the function call.
criterion: Object of class "numeric". Vector giving the marginal levels for the components selection.
adjust: Object of class "logical". Whether the initial level used to decide upon the number of components has been adjusted for multiple testing or not.
type: Object of class "character". Currently always the string "smallprop".
mDist: Object of class "integer". Number of simulations performed to decide upon the cut-off for the ICS distances.
mEig: Object of class "integer". Number of simulations performed for selecting the ICS components based on simulations.
S1name: Object of class "character". Name of S1 in the original ics2 object.
S2name: Object of class "character". Name of S2 in the original ics2 object.
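The ics.distances slot can be related back to the fitted ics2 object. The sketch below assumes that the Scores slot of an ics2 object holds the invariant coordinates already centered with the S1 location estimate, and that the index slot lists the selected components; under those assumptions the squared ICS distances are the row sums of the squared selected scores. The simulated data and the small mDist value are illustrative choices only.

```r
# Sketch: recompute squared ICS distances from an ics2 fit.
# Assumption: icsZ@Scores holds the S1-centered invariant coordinates.
library(ICS)
library(mvtnorm)
library(ICSOutlier)

set.seed(1)
Z <- rmvnorm(500, rep(0, 4))
Z[1:10, 1] <- Z[1:10, 1] + 8   # shift 10 observations along one direction
icsZ <- ics2(Z)

# Small mDist for speed only; real analyses need a much larger value
res <- ics.outlier(icsZ, level.dist = 0.01, mDist = 50, ncores = 1)

scores <- as.matrix(icsZ@Scores)
k <- res@index                  # indices of the selected components
d_manual <- rowSums(scores[, k, drop = FALSE]^2)

# Compare the hand-computed distances with those stored in the icsOut object
print(all.equal(unname(d_manual), unname(res@ics.distances)))
```

If the assumption about the Scores slot holds, the comparison should report TRUE; otherwise it shows how far the two computations differ.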
For this class the following generic functions are available: print.icsOut, summary.icsOut and plot.ics.
If no extractor function exists for a slot, the component can be extracted in the usual way using '@'. This S4 class is created by ics.outlier, which has reached the end of its lifecycle; please use ICS_outlier instead, which returns an S3 object. In future versions, ics.outlier will be deprecated and eventually removed.
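As a minimal sketch of extracting results with '@', the code below fits ics.outlier on simulated data (the data and the small mDist value are illustrative assumptions) and reads a few slots. It also checks the decision rule stated for the distance plot, namely that observations whose ICS distance exceeds ics.dist.cutoff are the ones flagged in outliers. ICS_outlier remains the recommended replacement; the sketch uses ics.outlier only because it returns the icsOut class described here.

```r
# Sketch: pull results out of an icsOut object with '@'.
library(ICS)
library(mvtnorm)
library(ICSOutlier)

set.seed(2)
X <- rmvnorm(400, rep(0, 3))
X[1:8, ] <- X[1:8, ] + 6            # a small cluster of shifted points
res <- ics.outlier(ics2(X), level.dist = 0.01, mDist = 50, ncores = 1)

flagged <- which(res@outliers == 1)  # row numbers of detected outliers
cutoff  <- res@ics.dist.cutoff       # threshold applied to the distances
comps   <- res@index                 # selected invariant coordinates

# Outliers should be exactly the points whose distance exceeds the cut-off
same_rule <- all(res@outliers == as.integer(res@ics.distances > cutoff))
print(flagged)
print(comps)
print(same_rule)
```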
Aurore Archimbaud and Klaus Nordhausen
Distances plot for an 'ICS_Out' object visualizing the separation of the outliers from the good data points.
## S3 method for class 'ICS_Out' plot( x, pch.out = 16, pch.good = 4, col.out = 1, col.good = grey(0.5), col.cut = 1, lwd.cut = 1, lty.cut = 1, xlab = "Observation Number", ylab = "ICS distances", ... )
x | object of class |
pch.out | plotting symbol for the outliers. |
pch.good | plotting symbol for the 'good' data points. |
col.out | color for the outliers. |
col.good | color for the 'good' data points. |
col.cut | color for cut-off line. |
lwd.cut | lwd value for cut-off line. |
lty.cut | lty value for cut-off line. |
xlab | default x-axis label. |
ylab | default y-axis label. |
... | other arguments for |
In the figure, the ICS distances are plotted against the observation index. The cut-off value for the distances is drawn as a horizontal line, and all observations above the line are considered outliers.
A plot is displayed.
Aurore Archimbaud and Klaus Nordhausen
Distances plot for an icsOut object visualizing the separation of the outliers from the good data points.
## S4 method for signature 'icsOut,missing' plot(x, pch.out = 16, pch.good = 4, col.out = 1, col.good = grey(0.5), col.cut = 1, lwd.cut = 1, lty.cut = 1, xlab = "Observation Number", ylab = "ICS distances", ...)
x | object of class |
pch.out | plotting symbol for the outliers. |
pch.good | plotting symbol for the ‘good’ data points. |
col.out | color for the outliers. |
col.good | color for the ‘good’ data points. |
col.cut | color for cut-off line. |
lwd.cut | lwd value for cut-off line. |
lty.cut | lty value for cut-off line. |
xlab | default x-axis label. |
ylab | default y-axis label. |
... | other arguments for |
In the figure, the ICS distances are plotted against the observation index. The cut-off value for the distances is drawn as a horizontal line, and all observations above the line are considered outliers.
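To make the figure concrete, the sketch below rebuilds a plot of the same kind by hand from the icsOut slots, using base graphics and the default symbols and colors listed above. It is an illustration of what the figure shows, not the package's plotting code; the simulated data and the small mDist value are assumptions for the example.

```r
# Sketch: rebuild the distances plot by hand from the icsOut slots.
library(ICS)
library(mvtnorm)
library(ICSOutlier)

set.seed(3)
Z <- rmvnorm(300, rep(0, 4))
Z[1:6, 2] <- Z[1:6, 2] + 9          # shift 6 observations in the second variable
res <- ics.outlier(ics2(Z), level.dist = 0.01, mDist = 50, ncores = 1)

d   <- res@ics.distances
out <- res@outliers == 1

# Outliers: filled circles (pch 16, black); good points: crosses (pch 4, grey)
plot(seq_along(d), d,
     pch = ifelse(out, 16, 4),
     col = ifelse(out, "black", grey(0.5)),
     xlab = "Observation Number", ylab = "ICS distances")
# Horizontal cut-off line: points above it are the flagged outliers
abline(h = res@ics.dist.cutoff, col = "black", lwd = 1, lty = 1)
```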
Aurore Archimbaud and Klaus Nordhausen
Z <- rmvnorm(1000, rep(0, 6))
Z[1:20, 1] <- Z[1:20, 1] + 10
A <- matrix(rnorm(36), ncol = 6)
X <- tcrossprod(Z, A)
icsX <- ics2(X)
# For demonstration purposes mDist is small; it should be larger for real data analysis
icsXoutliers <- ics.outlier(icsX, mDist = 500)
plot(icsXoutliers, col.out = 2)
Short statement about how many components are selected for the outlier detection and how many outliers are detected.
## S3 method for class 'ICS_Out' print(x, ...)
x | object of class |
... | additional arguments, not used. |
The supplied object of class "ICS_Out_summary" is returned invisibly.
Aurore Archimbaud and Klaus Nordhausen
Short statement about how many components are selected for the outlier detection and how many outliers are detected.
## S4 method for signature 'icsOut' show(object)
object | object of class |
Aurore Archimbaud and Klaus Nordhausen
Summary of an 'ICS_Out' Object
Summarizes an 'ICS_Out' object in an informative way.
## S3 method for class 'ICS_Out' summary(object, ...)
object | object of class |
... | additional arguments passed to |
An object of class "ICS_Out_summary" with the following components:
comps: Vector giving the indices of the ICS components selected.
method: Name of the method used to decide upon the number of ICS components.
test: The name of the normality test as specified in the function call.
S1_label: Name of S1.
S2_label: Name of S2.
level_test: The initial level for selecting the invariant coordinates.
level_dist: The level for deciding upon the cut-off value for the ICS distances.
nb_outliers: The number of observations identified as outliers.
Aurore Archimbaud and Klaus Nordhausen
Summarizes and prints an icsOut object in an informative way.
## S4 method for signature 'icsOut' summary(object, digits = 4)
object | object of class |
digits | number of digits for the numeric output. |
Aurore Archimbaud and Klaus Nordhausen