Title: | Sparse Principal Component Based on Least Trimmed Squares |
---|---|
Description: | Implementation of robust and sparse PCA algorithm of Wang and Van Aelst (2019) <DOI:10.1080/00401706.2019.1671234>. |
Authors: | Yixin Wang [aut, cre], Stefan Van Aelst [aut], Holger Cevallos Valdiviezo [ctb] (Original R code for the LTS-PCA algorithm), Tom Reynkens [ctb] (Original R code for angle in the rospca package) |
Maintainer: | Yixin Wang |
License: | GPL (>= 2) |
Version: | 0.1.0 |
Built: | 2024-11-21 06:52:17 UTC |
Source: | CRAN |
Standardised last principal angle between the subspaces generated by the columns of A and B.
Angle(A, B)
A |
numerical matrix of size p by k |
B |
numerical matrix of size p by l (A and B must have the same number of rows) |
Standardised last principal angle between A and B.
Tom Reynkens
Bjorck, A. and Golub, G. H. (1973), “Numerical Methods for Computing Angles Between Linear Subspaces,” Mathematics of Computation, 27, 579–594.
Hubert, M., Rousseeuw, P. J., and Vanden Branden, K. (2005), “ROBPCA: A New Approach to Robust Principal Component Analysis,” Technometrics, 47, 64–79.
Hubert, M., Reynkens, T., Schmitt, E. and Verdonck, T. (2016), “Sparse PCA for High-Dimensional Data With Outliers,” Technometrics, 58, 424–434.
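The base-R sketch below is a rough illustration of what Angle() returns. It takes orthonormal bases of the two column spaces from a QR decomposition and reads the cosines of the principal angles off the singular values of Qa'Qb, as in Bjorck and Golub (1973). It assumes that "standardised" means dividing the angle by pi/2, so the result lies in [0, 1]. It also assumes that A and B have full column rank. angle_sketch() is a hypothetical name and is not the package's implementation.

```r
# Hypothetical sketch: standardised largest ("last") principal angle between
# span(A) and span(B). ASSUMPTIONS: standardised = angle / (pi / 2), and
# A and B have full column rank.
angle_sketch <- function(A, B) {
  A <- as.matrix(A)
  B <- as.matrix(B)
  stopifnot(nrow(A) == nrow(B))               # both subspaces must live in R^p
  Qa <- qr.Q(qr(A))                            # orthonormal basis of span(A)
  Qb <- qr.Q(qr(B))                            # orthonormal basis of span(B)
  # singular values of Qa'Qb are the cosines of the min(k, l) principal angles
  cosines <- svd(crossprod(Qa, Qb), nu = 0, nv = 0)$d
  cos_last <- min(1, max(0, min(cosines)))     # clamp against rounding error
  acos(cos_last) / (pi / 2)                    # largest angle, scaled to [0, 1]
}

e <- diag(3)
angle_sketch(e[, 1], e[, 1] + e[, 2])          # 45 degrees, i.e. 0.5
```

Identical subspaces give 0 and orthogonal ones give 1. Computing small angles from cosines loses accuracy, which is one reason Bjorck and Golub also discuss sine-based formulas.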
Generate a simulated data set with blocks of correlated variables and, optionally, outliers.
dataSim(n = 200, p = 20, bLength = 4, a = c(0.9, 0.5, 0), SD = c(10, 5, 2), eps = 0, eta = 25, setting = "3", seed = 123, vc = NULL)
n |
number of observations |
p |
number of variables |
bLength |
the number of correlated variables in the first k blocks |
a |
numeric vector of length k+1 containing the correlation between the variables in each block (the last block contains uncorrelated variables); default is c(0.9, 0.5, 0) |
SD |
numeric vector of length k+1 containing the standard deviation of the variables in each block (the last block contains uncorrelated variables); default is c(10, 5, 2) |
eps |
proportion of outliers, default is 0 |
eta |
parameter that controls the outlyingness; default is 25 |
setting |
type of outliers: |
seed |
random seed used to simulate the data |
vc |
controls the direction of the score outliers within the PC subspace, default is NULL |
A list with components:
data |
generated data matrix |
ind |
row indices of outliers |
R |
correlation matrix of the data |
Sigma |
covariance matrix of the data |
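The arguments a, SD and bLength describe a block structure. The base-R sketch below shows one way such a correlation matrix R and covariance matrix Sigma = D R D (with D the diagonal matrix of standard deviations) could be built, and how clean Gaussian data could be drawn from Sigma. The block layout is an assumption: each of the first k blocks is taken to hold bLength variables with common correlation a[j] and standard deviation SD[j], and the remaining variables form the last block. block_cov_sketch() is hypothetical, does not simulate outliers, and is not the dataSim() code.

```r
# Hypothetical sketch of a block correlation / covariance structure.
# ASSUMPTION: blocks 1..k each hold bLength variables; the remaining
# p - k * bLength variables form the last block.
block_cov_sketch <- function(p = 20, bLength = 4, a = c(0.9, 0.5, 0),
                             SD = c(10, 5, 2)) {
  k <- length(a) - 1
  sizes <- c(rep(bLength, k), p - k * bLength)   # number of variables per block
  stopifnot(all(sizes > 0))
  R <- matrix(0, p, p)
  sds <- numeric(p)
  start <- 1
  for (j in seq_along(sizes)) {
    idx <- start:(start + sizes[j] - 1)
    R[idx, idx] <- a[j]                          # common correlation in block j
    sds[idx] <- SD[j]                            # common standard deviation
    start <- start + sizes[j]
  }
  diag(R) <- 1
  Sigma <- diag(sds) %*% R %*% diag(sds)         # Sigma = D R D
  list(R = R, Sigma = Sigma)
}

sim <- block_cov_sketch()
set.seed(123)
# n x p Gaussian sample with covariance Sigma, via the Cholesky factor
x <- matrix(rnorm(200 * 20), 200, 20) %*% chol(sim$Sigma)
```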
Glass data of Lemberge et al. (2000) containing Electron Probe X-ray Microanalysis (EPXMA) intensities for different wavelengths of 16–17th century archaeological glass vessels. This dataset was also used in Hubert et al. (2005) and Hubert et al. (2016).
Glass
A data frame with 180 observations and 750 variables. These variables correspond to EPXMA intensities for different wavelengths and are indicated by V1, V2, ..., V750.
Lemberge, P., De Raedt, I., Janssens, K. H., Wei, F., and Van Espen, P. J. (2000), “Quantitative Z-Analysis of the 16–17th Century Archaeological Glass Vessels using PLS Regression of EPXMA and µ-XRF Data,” Journal of Chemometrics, 14, 751–763.
Hubert, M., Rousseeuw, P. J., and Vanden Branden, K. (2005), “ROBPCA: A New Approach to Robust Principal Component Analysis,” Technometrics, 47, 64–79.
Hubert, M., Reynkens, T., Schmitt, E. and Verdonck, T. (2016), “Sparse PCA for High-Dimensional Data With Outliers,” Technometrics, 58, 424–434.
```r
## Not run:
data(Glass)
## End(Not run)
```
Compute LTS-PCA (least trimmed squares PCA).
ltspca(x, q, alpha = 0.5, b.choice = NULL, tol = 1e-06, N1 = 3, N2 = 2, N2bis = 10, Npc = 10)
x |
the input data matrix |
q |
the dimension of the PC subspace |
alpha |
the robustness parameter, taking values between 0 and 0.5; default is 0.5 |
b.choice |
initial loading matrix; default is NULL, in which case deterministic starting values are computed by the algorithm |
tol |
convergence criterion |
N1 |
number that controls the updates of a without updating b in the concentration step |
N2 |
number that controls the outer loop in the concentration step |
N2bis |
number that controls the outer loop for the selected b |
Npc |
number that controls the inner loop |
An object of class "ltspca" with components:
b |
the unnormalized loading matrix |
mu |
the center estimate |
ws |
indicates whether each observation is included in the h-subset |
best.cand |
the method which computes the best deterministic starting value in the concentration step |
Holger Cevallos Valdiviezo
Cevallos Valdiviezo, H. and Van Aelst, S. (2019), “Fast computation of robust subspace estimators,” Computational Statistics & Data Analysis, 134, 171–185.
```r
## Not run:
library(mvtnorm)  # loaded before dataSim() as in the ltsspca() example
x <- dataSim(n = 200, p = 20, seed = 123)$data
ltspcaM <- ltspca(x = x, q = 2, alpha = 0.5)
## End(Not run)
```
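The concentration-step idea behind LTS-PCA can be illustrated with a simplified base-R sketch: fit a q-dimensional PCA on an h-subset, compute the squared orthogonal distances of all observations to that subspace, keep the h smallest, and repeat. The trimmed sum of squared orthogonal distances cannot increase from one step to the next. This is an assumption-laden illustration, not ltspca(): it refits ordinary PCA rather than alternating updates of a and b, starts from all observations instead of deterministic starting values, and takes the subset size h directly, since the mapping from alpha to h is not stated here. lts_subspace_sketch() is a hypothetical name.

```r
# Simplified LTS-type concentration steps for a q-dimensional PC subspace.
# NOT the ltspca() algorithm: plain PCA refits on an h-subset.
lts_subspace_sketch <- function(x, q, h, n_iter = 20, tol = 1e-6) {
  x <- as.matrix(x)
  subset <- seq_len(nrow(x))                 # start from all observations
  obj_old <- Inf
  for (it in seq_len(n_iter)) {
    mu <- colMeans(x[subset, , drop = FALSE])
    xc <- sweep(x, 2, mu)                    # center all rows by the subset mean
    V <- svd(xc[subset, , drop = FALSE], nu = 0, nv = q)$v   # p x q basis
    resid <- xc - xc %*% V %*% t(V)          # part orthogonal to the subspace
    od2 <- rowSums(resid^2)                  # squared orthogonal distances
    subset <- order(od2)[seq_len(h)]         # keep the h best-fitting rows
    obj <- sum(od2[subset])                  # trimmed sum of squares
    if (obj_old - obj < tol) break           # objective no longer decreases
    obj_old <- obj
  }
  list(mu = mu, loadings = V, subset = sort(subset), objective = obj)
}
```

Because each step only reorders and refits, the result depends on the starting subset. This is why the package computes deterministic starting values (see b.choice and best.cand).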
Compute the initial LTS-SPCA estimate.
ltsspca(x, kmax, alpha = 0.5, mu.choice = NULL, l.search = NULL, ls.min = 1, tol = 1e-06, N1 = 3, N2 = 2, N2bis = 10, Npc = 10)
x |
the input data matrix |
kmax |
the maximal number of PCs searched by the initial LTS-SPCA |
alpha |
the robustness parameter, taking values between 0 and 0.5; default is 0.5 |
mu.choice |
the center estimate fixed by the user; by default, the center will be estimated automatically by the algorithm |
l.search |
a list of length kmax which contains the search grids chosen by the user; default is NULL |
ls.min |
the smallest grid step when searching for the sparsity of each PC; default is 1 |
tol |
convergence criterion |
N1 |
number that controls the updates of a without updating b in the concentration step for LTS-PCA |
N2 |
number that controls the outer loop in the concentration step for LTS-PCA |
N2bis |
number that controls the outer loop for the selected b for both LTS-PCA and LTS-SPCA |
Npc |
number that controls the inner loop for both LTS-PCA and LTS-SPCA |
An object of class "ltsspca" with components:
loadings |
the loading matrix initially estimated by LTS-SPCA |
mu |
the center estimates associated with each PC |
spca.it |
the list that contains the results of LTS-SPCA when searching for the individual PCs |
ls |
the list that contains the final search grid for each PC direction |
Yixin Wang
Wang, Y. and Van Aelst, S. (2019), “Sparse Principal Component Based on Least Trimmed Squares,” Technometrics, accepted.
```r
library(mvtnorm)
dataM <- dataSim(n = 200, p = 20, bLength = 4, a = c(0.9, 0.5, 0),
                 SD = c(10, 5, 2), eps = 0, seed = 123)
x <- dataM$data
# initial LTS-SPCA searching up to kmax = 5 PCs, then reweighting with k = 2
ltsspcaMI <- ltsspca(x = x, kmax = 5, alpha = 0.5)
ltsspcaMR <- ltsspcaRw(x = x, obj = ltsspcaMI, k = 2, alpha = 0.5)
matplot(ltsspcaMR$loadings, type = "b", ylab = "Loadings")
```
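To convey how trimming and sparsity can interact for a single PC, here is a simplified base-R illustration. It alternates two steps. First, it finds a hard-thresholded, unit-length direction on the current h-subset. Second, it reselects the h observations with the smallest squared orthogonal distances to that direction. The sparsity level n_keep (number of nonzero loadings) and the subset size h are fixed inputs here. The package instead searches a sparsity grid (l.search, ls.min) and handles several PCs, so sparse_dir() and lts_sparse_pc_sketch() are hypothetical helpers, not ltsspca().

```r
# Hypothetical single-PC illustration of least trimmed squares + sparsity.
# ASSUMPTIONS: fixed sparsity n_keep and subset size h; not ltsspca().
sparse_dir <- function(xc, n_keep, n_iter = 50) {
  v <- svd(xc, nu = 0, nv = 1)$v[, 1]                   # dense starting value
  for (it in seq_len(n_iter)) {
    u <- drop(xc %*% v)
    v <- drop(crossprod(xc, u))
    v[order(abs(v))[seq_len(ncol(xc) - n_keep)]] <- 0   # keep n_keep largest
    v <- v / sqrt(sum(v^2))
  }
  v
}

lts_sparse_pc_sketch <- function(x, h, n_keep, n_csteps = 10) {
  x <- as.matrix(x)
  subset <- seq_len(nrow(x))                            # start from all rows
  for (step in seq_len(n_csteps)) {
    mu <- colMeans(x[subset, , drop = FALSE])
    xc <- sweep(x, 2, mu)
    v <- sparse_dir(xc[subset, , drop = FALSE], n_keep)
    od2 <- rowSums((xc - tcrossprod(drop(xc %*% v), v))^2)  # squared ODs
    new_subset <- sort(order(od2)[seq_len(h)])
    if (identical(new_subset, subset)) break            # subset is stable
    subset <- new_subset
  }
  list(loading = v, mu = mu, subset = subset)
}
```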
Compute the reweighted LTS-SPCA estimate.
ltsspcaRw(x, obj, k = NULL, alpha = 0.5, co.sd = 0.25)
x |
the input data matrix |
obj |
the initial LTS-SPCA object returned by ltsspca() |
k |
dimension of the PC subspace; default is NULL, in which case k is set to kmax from the initial LTS-SPCA |
alpha |
the robustness parameter, taking values between 0 and 0.5; default is 0.5 |
co.sd |
cutoff value for score outlier weight, default is 0.25 |
An object of class "ltsspcaRw" with components:
loadings |
the sparse loading matrix estimated with reweighted LTS-SPCA |
scores |
the estimated score matrix |
eigenvalues |
the estimated eigenvalues |
mu |
the center estimate |
rw.obj |
the list that contains the results of sPCA_rSVD on the reduced data |
od |
the orthogonal distances with respect to the initially estimated PC subspace, with all noisy variables removed |
co.od |
the cutoff value for the orthogonal distances |
ws.od |
indicates whether each observation is outlying in the orthogonal complement of the initially estimated PC subspace |
sc.wt |
the score outlier weight, which is compared with 0.25 (by default) to flag score outliers |
co.sd |
the cutoff value for score outlier weight, default is 0.25 |
ws.sd |
indicates whether each observation is outlying within the PC subspace |
sc.out |
the object returned when computing the score outlier weights |
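The od, co.od and ws.od components can be illustrated with a base-R sketch. It computes orthogonal distances to a k-dimensional subspace and flags observations beyond a cutoff. The cutoff rule is an assumption: a ROBPCA-style rule in the spirit of Hubert et al. (2005), treating od^(2/3) as roughly normal, with the median and MAD standing in for the univariate MCD used there. The package may compute co.od differently, and od_flags_sketch() is a hypothetical name.

```r
# Hypothetical sketch: orthogonal distances and a ROBPCA-style cutoff.
# ASSUMPTION: cutoff = (median(od^(2/3)) + mad(od^(2/3)) * qnorm(level))^(3/2).
od_flags_sketch <- function(x, loadings, mu, level = 0.975) {
  xc <- sweep(as.matrix(x), 2, mu)                 # center the data
  P <- qr.Q(qr(as.matrix(loadings)))               # orthonormal basis (p x k)
  resid <- xc - xc %*% P %*% t(P)                  # part orthogonal to subspace
  od <- sqrt(rowSums(resid^2))                     # orthogonal distances
  z <- od^(2/3)                                    # roughly normal after transform
  cutoff <- (median(z) + mad(z) * qnorm(level))^(3/2)
  list(od = od, cutoff = cutoff, flagged = od > cutoff)
}
```

Orthonormalising the loadings first means the sketch also accepts unnormalized or non-orthogonal loading matrices, since only their column space matters for the distances.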
Make a diagnostic plot using the estimated PC subspace.
mydiagPlot(x, obj, k, alpha = 0.5, co.sd = 0.25)
x |
the input data matrix |
obj |
the object returned by ltsspcaRw() |
k |
dimension of the PC subspace |
alpha |
the robustness parameter, taking values between 0 and 0.5; default is 0.5 |
co.sd |
cutoff value for score outlier weight, default is 0.25 |
Outlier diagnostics with components:
od |
the orthogonal distances with respect to the k-dimensional PC subspace |
ws.od |
indicates whether each observation is outlying in the orthogonal complement of the PC subspace |
co.od |
the cutoff value for orthogonal distances |
sc.wt |
the score outlier weight, which is compared with 0.25 (by default) to flag score outliers |
ws.sd |
indicates whether each observation is outlying within the PC subspace |
co.sd |
the cutoff value for score outlier weight, default is 0.25 |
sc.out |
the object returned when computing the score outlier weights |
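A minimal base-R sketch of one panel such a diagnostic plot might contain is shown below. It plots the orthogonal distances against the observation index, marks flagged points as filled, and draws the cutoff as a dashed line. It assumes the distances and their cutoff are already available, for example as the od and co.od components listed above. The layout actually drawn by mydiagPlot() may differ, and od_plot_sketch() is a hypothetical name.

```r
# Hypothetical orthogonal-distance diagnostic plot (one panel only).
# ASSUMPTION: od and cutoff are precomputed; mydiagPlot() may differ.
od_plot_sketch <- function(od, cutoff, ...) {
  flagged <- od > cutoff
  plot(seq_along(od), od, pch = ifelse(flagged, 19, 1),   # filled = flagged
       xlab = "Observation index", ylab = "Orthogonal distance", ...)
  abline(h = cutoff, lty = 2)                             # cutoff line
  invisible(which(flagged))                               # indices of flagged rows
}

od_plot_sketch(c(0.2, 0.3, 5, 0.25), cutoff = 1)
```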
Compute sparse PCA via regularized SVD (sPCA_rSVD) of Shen and Huang (2008).
sPCA_rSVD(x, k, method = "hard", center = FALSE, scale = FALSE, l.search = NULL, ls.min = 1)
x |
the input data matrix |
k |
the maximal number of PCs to search for in the initial stage |
method |
threshold method used in the algorithm; If |
center |
if |
scale |
if |
l.search |
a list of length k which contains the search grids chosen by the user; default is NULL |
ls.min |
the smallest grid step when searching for the sparsity of each PC; default is 1 |
An object of class "sPCA_rSVD" with components:
loadings |
the sparse loading matrix estimated with sPCA_rSVD |
scores |
the estimated score matrix |
eigenvalues |
the estimated eigenvalues |
spca.it |
the list that contains the results of sPCA_rSVD when searching for the individual PCs |
ls |
the list that contains the final search grid for each PC direction |
Shen, H. and Huang, J. (2008), “Sparse principal component analysis via regularized low rank matrix decomposition,” Journal of Multivariate Analysis, 99, 1015–1034.
Shen, D., Shen, H., and Marron, J. (2013), “Consistency of sparse PCA in high dimensional low sample size context,” Journal of Multivariate Analysis, 115, 315–333.
```r
## Not run:
x <- dataSim(n = 200, p = 20, seed = 123)$data
nonrobM <- sPCA_rSVD(x = x, k = 2, center = TRUE, scale = FALSE)
## End(Not run)
```
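The rank-one step of sparse PCA via regularized SVD (Shen and Huang, 2008) alternates two updates. For fixed unit-length u, v is set to the hard-thresholded X'u. For fixed v, u is set to Xv / ||Xv||. Further PCs are then obtained after removing the fitted rank-one part. The base-R sketch below follows that scheme with one simplifying assumption: the threshold is chosen so that exactly n_keep loadings per PC stay nonzero, whereas sPCA_rSVD() searches the sparsity over a grid (l.search, ls.min) and also returns eigenvalues. spca_rsvd_sketch() is a hypothetical name.

```r
# Hypothetical sketch of sPCA-rSVD with hard thresholding.
# ASSUMPTION: sparsity fixed by n_keep (nonzero loadings per PC), no grid search.
spca_rsvd_sketch <- function(x, k, n_keep, center = FALSE,
                             max_iter = 100, tol = 1e-10) {
  xc <- scale(as.matrix(x), center = center, scale = FALSE)
  p <- ncol(xc)
  n_keep <- rep_len(n_keep, k)                 # one sparsity level per PC
  loadings <- matrix(0, p, k)
  xr <- xc                                     # residual matrix for deflation
  for (j in seq_len(k)) {
    s <- svd(xr, nu = 1, nv = 1)               # rank-one SVD as starting value
    u <- s$u[, 1]
    v <- s$d[1] * s$v[, 1]
    for (it in seq_len(max_iter)) {
      v_new <- drop(crossprod(xr, u))          # update v given u
      v_new[order(abs(v_new))[seq_len(p - n_keep[j])]] <- 0  # hard threshold
      xv <- drop(xr %*% v_new)
      u <- xv / sqrt(sum(xv^2))                # update u given v, unit length
      done <- sum((v_new - v)^2) < tol
      v <- v_new
      if (done) break
    }
    v <- v / sqrt(sum(v^2))                    # unit-length sparse loading
    loadings[, j] <- v
    xr <- xr - tcrossprod(drop(xr %*% v), v)   # remove the fitted rank-one part
  }
  list(loadings = loadings, scores = xc %*% loadings)
}
```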