Title: | Robust Principal Component Analysis Using the Cauchy Distribution |
---|---|
Description: | A new robust principal component analysis algorithm is implemented that relies upon the Cauchy Distribution. The algorithm is suitable for high dimensional data even if the sample size is less than the number of variables. The methodology is described in this paper: Fayomi A., Pantazis Y., Tsagris M. and Wood A.T.A. (2024). "Cauchy robust principal component analysis with applications to high-dimensional data sets". Statistics and Computing, 34: 26. <doi:10.1007/s11222-023-10328-x>. |
Authors: | Michail Tsagris [aut, cre], Aisha Fayomi [ctb], Yannis Pantazis [ctb], Andrew T.A. Wood [ctb] |
Maintainer: | Michail Tsagris <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.3 |
Built: | 2024-11-20 06:47:14 UTC |
Source: | CRAN |
Package: | cauchypca |
Type: | Package |
Version: | 1.3 |
Date: | 2024-01-24 |
License: | GPL-2 |
Michail Tsagris <[email protected]>.
Michail Tsagris [email protected], Aisha Fayomi [email protected], Yannis Pantazis [email protected] and Andrew T.A. Wood [email protected].
Fayomi A., Pantazis Y., Tsagris M. and Wood A.T.A. (2024). Cauchy robust principal component analysis with applications to high-dimensional data sets. Statistics and Computing, 34: 26. https://doi.org/10.1007/s11222-023-10328-x
MLE of the Cauchy distribution.
cauchy.mle(x, tol = 1e-07)
x |
A numerical vector with data. |
tol |
The tolerance level up to which the maximisation stops, set to 1e-07 by default. |
Instead of maximising the log-likelihood via a numerical optimiser we have used a Newton-Raphson algorithm which is faster. The Cauchy is the t distribution with 1 degree of freedom.
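To make the Newton-Raphson idea concrete, here is a minimal base R sketch for the two-parameter Cauchy (location mu, scale s). It is an illustration under stated assumptions, not the package's internal code: the function name cauchy_nr_sketch, the starting values (median and half the interquartile range), the max_iter safeguard and the stopping rule on the change in log-likelihood are all hypothetical choices made for this example.

```r
# Hypothetical helper, NOT part of cauchypca: Newton-Raphson for the
# maximum likelihood estimates of the Cauchy location (mu) and scale (s).
cauchy_nr_sketch <- function(x, tol = 1e-07, max_iter = 100) {
  n <- length(x)
  # Assumed starting values: median and half the interquartile range
  mu <- median(x)
  s <- IQR(x) / 2
  loglik <- function(mu, s) {
    sum(dcauchy(x, location = mu, scale = s, log = TRUE))
  }
  l_old <- loglik(mu, s)
  iters <- 0
  repeat {
    iters <- iters + 1
    d <- x - mu
    q <- s^2 + d^2
    # Gradient of the log-likelihood with respect to (mu, s)
    g <- c(sum(2 * d / q), n / s - sum(2 * s / q))
    # Hessian entries (second derivatives of the log-likelihood)
    h_mm <- sum(2 * (d^2 - s^2) / q^2)
    h_ms <- -sum(4 * s * d / q^2)
    h_ss <- -n / s^2 - h_mm
    H <- matrix(c(h_mm, h_ms, h_ms, h_ss), 2, 2)
    # Newton-Raphson update: theta_new = theta - H^{-1} g
    step <- solve(H, g)
    mu <- mu - step[1]
    s <- s - step[2]
    l_new <- loglik(mu, s)
    if (abs(l_new - l_old) < tol || iters >= max_iter) break
    l_old <- l_new
  }
  list(iters = iters, loglik = l_new, param = c(mu = mu, s = s))
}

set.seed(1)
x <- rcauchy(1000)
fit <- cauchy_nr_sketch(x)
fit$param
```

The returned list mirrors the element names documented for cauchy.mle (iters, loglik, param) purely for readability. The sketch has no step-halving or other safeguards, so it may fail from poor starting values or on very small samples.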
A list including:
iters |
The number of iterations required for the Newton-Raphson algorithm to converge. |
loglik |
The value of the maximised log-likelihood. |
param |
The vector of the parameters. |
Michail Tsagris
R implementation and documentation: Michail Tsagris <[email protected]>.
Johnson, Norman L. Kemp, Adrianne W. Kotz, Samuel (2005). Univariate Discrete Distributions (third edition). Hoboken, NJ: Wiley-Interscience.
https://en.wikipedia.org/wiki/Wigner_semicircle_distribution
x <- rcauchy(1000)
a <- cauchy.mle(x)
Robust PCA using the Cauchy distribution.
cauchy.pca(x, k = 1, center = "sm", scale = "mad", trials = 20, parallel = FALSE)
x |
A numerical matrix with the data. |
k |
The number of eigenvectors to extract. |
center |
The way to center the data. This can be either "sm", corresponding to the spatial median, or "med", corresponding to the classical variable-wise median. Alternatively, the user can specify their own vector. |
scale |
This is the method to scale the data. The default value is "mad" corresponding to the mean absolute deviation, computed column-wise. Alternatively the user can provide their own vector. |
trials |
The number of trials to attempt, that is, how many times the algorithm is run with different starting vectors. |
parallel |
If you want parallel computations set this equal to TRUE. |
This is the main function used to extract the Cauchy robust eigenvectors.
A list including:
runtime |
The duration (in seconds) of the algorithm. |
loglik |
The minimum of the maximised Cauchy log-likelihood. |
mu |
The estimated location parameter of the Cauchy distribution. |
su |
The estimated scale parameter of the Cauchy distribution. |
loadings |
A matrix with the robust eigenvectors. |
Michail Tsagris, Aisha Fayomi, Yannis Pantazis and Andrew T.A. Wood.
R implementation and documentation: Michail Tsagris [email protected].
Fayomi A., Pantazis Y., Tsagris M. and Wood A.T.A. (2024). Cauchy robust principal component analysis with applications to high-dimensional data sets. Statistics and Computing, 34: 26. https://doi.org/10.1007/s11222-023-10328-x
x <- as.matrix( iris[, 1:4] )
cauchy.pca(x, k = 1)
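As a slightly fuller illustration of the arguments and return values documented for cauchy.pca, the sketch below requests two robust eigenvectors with variable-wise median centering and inspects the returned list. It assumes the cauchypca package is installed, so the call is guarded. Only arguments and list elements named on this page are used.

```r
x <- as.matrix(iris[, 1:4])

# Guarded call: runs only if the cauchypca package is available
if (requireNamespace("cauchypca", quietly = TRUE)) {
  fit <- cauchypca::cauchy.pca(x, k = 2, center = "med", scale = "mad",
                               trials = 20, parallel = FALSE)
  print(fit$loadings)  # matrix with the robust eigenvectors
  print(fit$loglik)    # Cauchy log-likelihood value reported by the function
  print(fit$runtime)   # duration of the algorithm in seconds
}
```

Because each trial uses different starting vectors, the results may vary between runs.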