In an elliptical model often principal component analysis (PCA) is used as a linear dimension reduction tool and it is assumed that the components with large variation are interesting and the directions with small and equal variation are the uninteresting ones. Note that those components with equal variation form then a subspherical subset.
Assume that X is a data matrix with n observations and p variables and that PCA is performed using a scatter matrix (and location vector) of the user’s choice, such as the regular covariance matrix (and the mean vector).
Then X is first centered using the location of choice and, assuming the matrix W contains the eigenvectors of the scatter matrix, S = XW gives then the principal components where the amount of variation of each component is given by the corresponding eigenvalues d1, ..., dp.
The null hypothesis of interest is H0: d1 ≥ ... ≥ dk > dk + 1 = ... = dp.
The function PCAasymp
offers an asymptotic test for the
above hypothesis where the user can choose between two scatter matrices.
Using the argument scatter = "cov"
centers the data with
regular mean vector and performs PCA using the regular covariance
matrix. Specifying instead scatter = "tyler"
estimates
jointly Tyler’s shape matrix and the spatial median using the function
HR.Mest
from the package ICSNP
and uses then
them accordingly.
The test statistic is in both cases based on the variance of the last p − k eigenvalues: $$ T = n / (2 \bar{d}^2 \sigma_1) \sum_{i=k+1}^p (d_i - \bar{d})^2, $$ where $\bar{d} = 1/n \sum_{i=k+1}^p d_i$ and σ1 is a constant specific for the used scatter matrix and depends on the underlying elliptic distribution and the dimension p. The value of σ1 is estimated from the data.
Under H0 this test statistic has a chisquare distribution with (p − k − 1)(p − k + 2)/2 degrees of freedom.
To demonstrate the function consider the following artificial data:
library(ICtest)
set.seed(1)
n <- 200
S <- cbind(rnorm(n, sd = 2), rnorm(n, sd = 1.5), rnorm(n), rnorm(n), rnorm(n))
A <- rorth(5)
X <- S %*% t(A)
pairs(X)
which means there are two components of interest.
##
## PCA subspericity test using cov
##
## data: X
## T = 0.80054, df = 5, p-value = 0.977
## alternative hypothesis: the last 3 eigenvalues are not equal
##
## PCA subspericity test using Tyler's shape matrix
##
## data: X
## T = 2.1696, df = 5, p-value = 0.8252
## alternative hypothesis: the last 3 eigenvalues are not equal
And in both cases this hypothesis is not rejected.
But for example testing for no interesting component using cov yields:
##
## PCA subspericity test using cov
##
## data: X
## T = 126.68, df = 14, p-value < 2.2e-16
## alternative hypothesis: the last 5 eigenvalues are not equal
To obtain a bootstrap test for this problem the package offers the
function PCAboot
. The function can perform the test for any
scatter matrix the user has a function for. Important is only that the
function returns a list which returns as first element the location
vector and as second element the scatter matrix. The function is
specified using the argument S
and Sargs
is an
argument where additional arguments to S
can be passed in
the form of a list.
The test statistic used for the bootstrapping tests is $$ T = n / (\bar{d}^2) \sum_{i=k+1}^p (d_i - \bar{d})^2, $$
and two different bootstrapping strategies are available:
s.boot="B1"
: The first strategy has the following
steps:s.boot="B2"
: The second strategy has the following
steps:The p-value is in both cases obtained then as: $$ \frac{\#(T_i^* \geq T)+1}{m+1}. $$
To create the random orthogonal matrices in the bootstrapping
approaches the function rorth
is used.
To demonstrate how to use another scatter matrix and how to pass on
additional arguments we use now the scatter based on the MLE of t2 distribution, as for
example implemented as the function tM
in the
ICS
package. For that purpose, using strategy two we test
using the data from above if k = 1 or k = 2.
##
## PCA subsphericity bootstrapping test using "tM" and strategy B1
##
## data: X
## T = 23.357, replications = 200, p-value = 0.004975
## alternative hypothesis: the last 4 eigenvalues are not equal
##
## PCA subsphericity bootstrapping test using "tM" and strategy B1
##
## data: X
## T = 1.417, replications = 200, p-value = 0.8557
## alternative hypothesis: the last 3 eigenvalues are not equal