Testing for the number of interesting components in SIR

Sliced inverse regression

Denote the p-variate predictors xi, i = 1, ..., n with corresponding responses yi. The predictors are assumed to follow the model xi = Asi + b, where A is a non-singular p × p matrix, b a p-vector and the random vector s can be decomposed into (s1T, s2T)T with E(s) = 0 and Cov(s) = Ip where s1 has dimension k and s2 dimension p − k. It is then assumed that s1 is the signal part, the interesting components, which are relevant to model y, whereas s2 is the noise part.

The working model assumption is then that (y, s1T)T is independent of s2. Defining S1 = E((x − b)(x − b)T) and S2 = E(E(x − b|y)E(x − b|y)T) the sliced inverse regression (SIR) can be interpreted as finding the transformation matrix W such that WS1WT = Ip and WS2WT = D, where D is a diagonal matrix with diagonal elements d1 ≥ d2 ≥ ... ≥ dk > dk + 1 = ... = dp = 0.

In practice S2 is estimated by approximating E(x − b|y) by dividing y into h slices where in this package y is divided into h intervals containing an equal number of observations.

The practical problem is to decide then the value k.

Asymptotic test a specific value of k

For an asymptotic test using the test statistic $$ T= n\sum_{i=k+1}^p d_i, $$ the limiting distribution under the null is then T ∼ χ(p − k)(h − k − 1)2. Therefore for the hypothetical value k and the number of slices h is required that k ≥ h − 1.

Bootstrapping test a specific value of k

Bootstrap tests can be constructed as follows

  1. Compute i = (xi − i), and decompose into i, 1 and i, 2.
  2. Sample a bootstrap sample (yi*, i, 1*) of size n from (yi, i, 1).
  3. Sample a bootstrap sample i, 2* of size n from i, 2.
  4. Compute xi* = −1((i, 1*)T, (i, 2*)T)T.
  5. Recompute the test statistic T* based on the bootstrap data xi*, i=,1, ..., n.
  6. Repeat the steps above m times to obtain bootstrap test statistics T1*, ..., Tm*.

The p-value is then $$ \frac{\#(T_i^* \geq T)+1}{m+1}. $$

Example

Some simulated data with true k = 2:

set.seed(1234)
n <- 200
p <- 10
X <- matrix(rnorm(p*n), ncol = p)
eps <- rnorm(n, sd=0.25)
y <- X[, 1]/ (0.5+(X[, 2]+1.5)^2)
pairs(cbind(y,X))

First performing the asymptotic test

library(ICtest)
SIRasympk2 <- SIRasymp(X,y,2)
screeplot(SIRasympk2)
SIRasympk2
## 
##  SIR test for subspace dimension
## 
## data:  X
## T = 65.121, df = 56, p-value = 0.1891
## alternative hypothesis: the last 8 eigenvalues are not zero

Then the bootstrap test

SIRbootk2 <- SIRboot(X,y,2)
SIRbootk2
## 
##  SIR bootstrapping test for subspace dimension
## 
## data:  X
## T = 0.32561, replications = 200, p-value = 0.209
## alternative hypothesis: the last 8 eigenvalues are not zero

Looking at the first two components and their relationship with the response

pairs(cbind(y, components(SIRbootk2, which="k")))