Package 'PEtests'

Title: Power-Enhanced (PE) Tests for High-Dimensional Data
Description: Two-sample power-enhanced mean tests, covariance tests, and simultaneous tests on mean vectors and covariance matrices for high-dimensional data. Methods of these PE tests are presented in Yu, Li, and Xue (2022) <doi:10.1080/01621459.2022.2126781>; Yu, Li, Xue, and Li (2022) <doi:10.1080/01621459.2022.2061354>.
Authors: Xiufan Yu [aut, cre], Danning Li [aut], Lingzhou Xue [aut], Runze Li [aut]
Maintainer: Xiufan Yu
License: GPL (>= 3)
Version: 0.1.0
Built: 2024-12-03 06:43:16 UTC
Source: CRAN


Power-Enhanced (PE) Tests for High-Dimensional Data

Description

The package implements several two-sample power-enhanced mean tests, covariance tests, and simultaneous tests on mean vectors and covariance matrices for high-dimensional data.

Details

There are three main functions:
covtest
meantest
simultest

References

Chen, S. X. and Qin, Y. L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. The Annals of Statistics, 38(2):808–835. doi:10.1214/09-AOS716

Cai, T. T., Liu, W., and Xia, Y. (2013). Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. Journal of the American Statistical Association, 108(501):265–277. doi:10.1080/01621459.2012.758041

Cai, T. T., Liu, W., and Xia, Y. (2014). Two-sample test of high dimensional means under dependence. Journal of the Royal Statistical Society: Series B: Statistical Methodology, 76(2):349–372. doi:10.1111/rssb.12034

Li, J. and Chen, S. X. (2012). Two sample tests for high-dimensional covariance matrices. The Annals of Statistics, 40(2):908–940. doi:10.1214/12-AOS993

Yu, X., Li, D., and Xue, L. (2022). Fisher’s combined probability test for high-dimensional covariance matrices. Journal of the American Statistical Association, (in press):1–14. doi:10.1080/01621459.2022.2126781

Yu, X., Li, D., Xue, L., and Li, R. (2022). Power-enhanced simultaneous test of high-dimensional mean vectors and covariance matrices with application to gene-set testing. Journal of the American Statistical Association, (in press):1–14. doi:10.1080/01621459.2022.2061354

Examples

n1 = 100; n2 = 100; pp = 500
set.seed(1)
X = matrix(rnorm(n1*pp), nrow=n1, ncol=pp)
Y = matrix(rnorm(n2*pp), nrow=n2, ncol=pp)
covtest(X, Y)
meantest(X, Y)
simultest(X, Y)

Two-sample covariance tests for high-dimensional data

Description

This function implements five two-sample covariance tests on high-dimensional covariance matrices. Let $\mathbf{X} \in \mathbb{R}^p$ and $\mathbf{Y} \in \mathbb{R}^p$ be two $p$-dimensional populations with mean vectors $(\boldsymbol{\mu}_1, \boldsymbol{\mu}_2)$ and covariance matrices $(\mathbf{\Sigma}_1, \mathbf{\Sigma}_2)$, respectively. The problem of interest is to test the equality of the two covariance matrices:

$$H_{0c}: \mathbf{\Sigma}_1 = \mathbf{\Sigma}_2.$$

Suppose $\{\mathbf{X}_1, \ldots, \mathbf{X}_{n_1}\}$ are i.i.d. copies of $\mathbf{X}$, and $\{\mathbf{Y}_1, \ldots, \mathbf{Y}_{n_2}\}$ are i.i.d. copies of $\mathbf{Y}$. We denote dataX $= (\mathbf{X}_1, \ldots, \mathbf{X}_{n_1})^\top \in \mathbb{R}^{n_1 \times p}$ and dataY $= (\mathbf{Y}_1, \ldots, \mathbf{Y}_{n_2})^\top \in \mathbb{R}^{n_2 \times p}$.

Usage

covtest(dataX,dataY,method='pe.comp',delta=NULL)

Arguments

dataX

an $n_1$ by $p$ data matrix

dataY

an $n_2$ by $p$ data matrix

method

the method type (default = 'pe.comp'); chosen from

  • 'clx': the $l_\infty$-norm-based covariance test, proposed in Cai et al. (2013);
    see covtest.clx for details.

  • 'lc': the $l_2$-norm-based covariance test, proposed in Li and Chen (2012);
    see covtest.lc for details.

  • 'pe.cauchy': the PE covariance test via Cauchy combination;
    see covtest.pe.cauchy for details.

  • 'pe.comp': the PE covariance test via the construction of PE components;
    see covtest.pe.comp for details.

  • 'pe.fisher': the PE covariance test via Fisher's combination;
    see covtest.pe.fisher for details.

delta

This is needed only in method='pe.comp'; see covtest.pe.comp for details. The default is NULL.

Value

method the method type

stat the value of the test statistic

pval the p-value for the test.

References

Cai, T. T., Liu, W., and Xia, Y. (2013). Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. Journal of the American Statistical Association, 108(501):265–277.

Li, J. and Chen, S. X. (2012). Two sample tests for high-dimensional covariance matrices. The Annals of Statistics, 40(2):908–940.

Yu, X., Li, D., and Xue, L. (2022). Fisher’s combined probability test for high-dimensional covariance matrices. Journal of the American Statistical Association, (in press):1–14.

Yu, X., Li, D., Xue, L., and Li, R. (2022). Power-enhanced simultaneous test of high-dimensional mean vectors and covariance matrices with application to gene-set testing. Journal of the American Statistical Association, (in press):1–14.

Examples

n1 = 100; n2 = 100; pp = 500
set.seed(1)
X = matrix(rnorm(n1*pp), nrow=n1, ncol=pp)
Y = matrix(rnorm(n2*pp), nrow=n2, ncol=pp)
covtest(X,Y)

Two-sample high-dimensional covariance test (Cai, Liu and Xia, 2013)

Description

This function implements the two-sample $l_\infty$-norm-based high-dimensional covariance test proposed in Cai, Liu and Xia (2013). Suppose $\{\mathbf{X}_1, \ldots, \mathbf{X}_{n_1}\}$ are i.i.d. copies of $\mathbf{X}$, and $\{\mathbf{Y}_1, \ldots, \mathbf{Y}_{n_2}\}$ are i.i.d. copies of $\mathbf{Y}$. The test statistic is defined as

$$T_{CLX} = \max_{1\leq i,j \leq p} \frac{(\hat\sigma_{ij1}-\hat\sigma_{ij2})^2}{\hat\theta_{ij1}/n_1+\hat\theta_{ij2}/n_2},$$

where $\hat\sigma_{ij1}$ and $\hat\sigma_{ij2}$ are the sample covariances, and $\hat\theta_{ij1}/n_1+\hat\theta_{ij2}/n_2$ estimates the variance of $\hat\sigma_{ij1}-\hat\sigma_{ij2}$. The explicit formulas for $\hat\sigma_{ij1}$, $\hat\sigma_{ij2}$, $\hat\theta_{ij1}$ and $\hat\theta_{ij2}$ can be found in Section 2 of Cai, Liu and Xia (2013). Under some regularity conditions and the null hypothesis $H_{0c}: \mathbf{\Sigma}_1 = \mathbf{\Sigma}_2$, the statistic $T_{CLX}-4\log p+\log\log p$ converges in distribution to the Gumbel distribution $G_{cov}(x) = \exp\left(-\frac{1}{\sqrt{8\pi}}\exp\left(-\frac{x}{2}\right)\right)$ as $n_1, n_2, p \rightarrow \infty$. The asymptotic $p$-value is obtained by

$$p_{CLX} = 1-G_{cov}(T_{CLX}-4\log p+\log\log p).$$
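To make the definition concrete, here is a minimal base-R sketch of $T_{CLX}$ and $p_{CLX}$. It is not the code behind covtest.clx. It assumes the Section 2 estimators of Cai, Liu and Xia (2013) take the form $\hat\sigma_{ij1} = n_1^{-1}\sum_{k=1}^{n_1}(X_{ki}-\bar X_i)(X_{kj}-\bar X_j)$ and $\hat\theta_{ij1} = n_1^{-1}\sum_{k=1}^{n_1}[(X_{ki}-\bar X_i)(X_{kj}-\bar X_j)-\hat\sigma_{ij1}]^2$, and analogously for the second sample; the function name clx_cov_sketch is hypothetical.

```r
# Hypothetical helper (not part of PEtests): l_inf-norm-based covariance test,
# assuming for each sample (Z = column-centred data, n = sample size)
#   sigma_ij = (1/n) sum_k Z_ki Z_kj
#   theta_ij = (1/n) sum_k (Z_ki Z_kj - sigma_ij)^2
clx_cov_sketch <- function(dataX, dataY) {
  stopifnot(ncol(dataX) == ncol(dataY))
  p <- ncol(dataX)
  moments <- function(D) {
    n <- nrow(D)
    Z <- scale(D, center = TRUE, scale = FALSE)  # centre each column
    S <- crossprod(Z) / n                        # all sigma_ij at once
    # theta_ij = mean_k (Z_ki Z_kj)^2 - sigma_ij^2 (expanding the square)
    Theta <- crossprod(Z^2) / n - S^2
    list(S = S, Theta = Theta, n = n)
  }
  mx <- moments(dataX)
  my <- moments(dataY)
  ratio <- (mx$S - my$S)^2 / (mx$Theta / mx$n + my$Theta / my$n)
  stat <- max(ratio)
  # Gumbel limit G_cov(x) = exp(-exp(-x/2) / sqrt(8*pi))
  x <- stat - 4 * log(p) + log(log(p))
  pval <- 1 - exp(-exp(-x / 2) / sqrt(8 * pi))
  list(stat = stat, pval = pval)
}
```

With the simulated X and Y from the Examples, clx_cov_sketch(X, Y) runs as is. Both sample covariance and $\hat\theta$ matrices are $p \times p$, so memory grows as $p^2$ (about 250,000 entries per matrix for $p = 500$).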

Usage

covtest.clx(dataX,dataY)

Arguments

dataX

an $n_1$ by $p$ data matrix

dataY

an $n_2$ by $p$ data matrix

Value

stat the value of the test statistic

pval the p-value for the test.

References

Cai, T. T., Liu, W., and Xia, Y. (2013). Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. Journal of the American Statistical Association, 108(501):265–277.

Examples

n1 = 100; n2 = 100; pp = 500
set.seed(1)
X = matrix(rnorm(n1*pp), nrow=n1, ncol=pp)
Y = matrix(rnorm(n2*pp), nrow=n2, ncol=pp)
covtest.clx(X,Y)

Two-sample high-dimensional covariance test (Li and Chen, 2012)

Description

This function implements the two-sample $l_2$-norm-based high-dimensional covariance test proposed by Li and Chen (2012). Suppose $\{\mathbf{X}_1, \ldots, \mathbf{X}_{n_1}\}$ are i.i.d. copies of $\mathbf{X}$, and $\{\mathbf{Y}_1, \ldots, \mathbf{Y}_{n_2}\}$ are i.i.d. copies of $\mathbf{Y}$. The test statistic $T_{LC}$ is defined as

$$T_{LC} = A_{n_1}+B_{n_2}-2C_{n_1,n_2},$$

where $A_{n_1}$, $B_{n_2}$, and $C_{n_1,n_2}$ are unbiased estimators of $\mathrm{tr}(\mathbf{\Sigma}^2_1)$, $\mathrm{tr}(\mathbf{\Sigma}^2_2)$, and $\mathrm{tr}(\mathbf{\Sigma}_1\mathbf{\Sigma}_2)$, respectively. Under the null hypothesis $H_{0c}: \mathbf{\Sigma}_1 = \mathbf{\Sigma}_2 = \mathbf{\Sigma}$, the leading variance of $T_{LC}$ is $\sigma^2_{T_{LC}} = 4\left(\frac{1}{n_1}+\frac{1}{n_2}\right)^2 \mathrm{tr}^2(\mathbf{\Sigma}^2)$, which can be consistently estimated by $\hat\sigma^2_{T_{LC}}$. The explicit formulas for $A_{n_1}$, $B_{n_2}$, $C_{n_1,n_2}$ and $\hat\sigma^2_{T_{LC}}$ can be found in Equations (2.1), (2.2) and Theorem 1 of Li and Chen (2012). Under some regularity conditions and $H_{0c}$, the standardized statistic $T_{LC}/\hat\sigma_{T_{LC}}$ converges in distribution to the standard normal distribution as $n_1, n_2, p \rightarrow \infty$. The asymptotic $p$-value is obtained by

$$p_{LC} = 1-\Phi(T_{LC}/\hat\sigma_{T_{LC}}),$$

where $\Phi(\cdot)$ is the cdf of the standard normal distribution.
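For intuition about the scale of $T_{LC}$, the leading null variance can be evaluated directly when the common covariance $\mathbf{\Sigma}$ is known. This is an oracle quantity for illustration only; in practice the test uses the estimator $\hat\sigma^2_{T_{LC}}$. The helper name lc_null_var is hypothetical. In the setting of the Examples ($n_1=n_2=100$, $p=500$, $\mathbf{\Sigma}=\mathbf{I}$), $\mathrm{tr}(\mathbf{\Sigma}^2)=500$ and the formula gives $4(0.02)^2\cdot 500^2 = 400$, so $\sigma_{T_{LC}} = 20$.

```r
# Hypothetical helper: oracle leading null variance of T_LC,
#   sigma^2 = 4 * (1/n1 + 1/n2)^2 * tr^2(Sigma^2),
# for a known, symmetric common covariance matrix Sigma.
lc_null_var <- function(Sigma, n1, n2) {
  # tr(Sigma %*% Sigma) = sum of elementwise products of Sigma and t(Sigma)
  tr_Sigma2 <- sum(Sigma * t(Sigma))
  4 * (1 / n1 + 1 / n2)^2 * tr_Sigma2^2
}

# Setting of the Examples: n1 = n2 = 100, p = 500, Sigma = identity
v <- lc_null_var(diag(500), n1 = 100, n2 = 100)
sqrt(v)   # 20
```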

Usage

covtest.lc(dataX,dataY)

Arguments

dataX

an $n_1$ by $p$ data matrix

dataY

an $n_2$ by $p$ data matrix

Value

stat the value of the test statistic

pval the p-value for the test.

References

Li, J. and Chen, S. X. (2012). Two sample tests for high-dimensional covariance matrices. The Annals of Statistics, 40(2):908–940.

Examples

n1 = 100; n2 = 100; pp = 500
set.seed(1)
X = matrix(rnorm(n1*pp), nrow=n1, ncol=pp)
Y = matrix(rnorm(n2*pp), nrow=n2, ncol=pp)
covtest.lc(X,Y)

Two-sample PE covariance test for high-dimensional data via Cauchy combination

Description

This function implements the two-sample PE covariance test via Cauchy combination. Suppose $\{\mathbf{X}_1, \ldots, \mathbf{X}_{n_1}\}$ are i.i.d. copies of $\mathbf{X}$, and $\{\mathbf{Y}_1, \ldots, \mathbf{Y}_{n_2}\}$ are i.i.d. copies of $\mathbf{Y}$. Let $p_{LC}$ and $p_{CLX}$ denote the $p$-values associated with the $l_2$-norm-based covariance test (see covtest.lc for details) and the $l_\infty$-norm-based covariance test (see covtest.clx for details), respectively. The PE covariance test statistic via Cauchy combination is defined as

$$T_{Cauchy} = \frac{1}{2}\tan((0.5-p_{LC})\pi) + \frac{1}{2}\tan((0.5-p_{CLX})\pi).$$

It has been proved that, under some regularity conditions and the null hypothesis $H_{0c}: \mathbf{\Sigma}_1 = \mathbf{\Sigma}_2$, the two tests are asymptotically independent as $n_1, n_2, p\rightarrow \infty$, and therefore $T_{Cauchy}$ converges in distribution to the standard Cauchy distribution. The asymptotic $p$-value is obtained by

$$p\text{-value} = 1-F_{Cauchy}(T_{Cauchy}),$$

where $F_{Cauchy}(\cdot)$ is the cdf of the standard Cauchy distribution.
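The combination step itself is only a few lines. The sketch below (function name hypothetical) takes two $p$-values as inputs and returns the Cauchy-combined statistic and its $p$-value using base R's pcauchy. The same formula is used for the mean test ($p_{CQ}$, $p_{CLX}$) and the simultaneous test ($p_{CQ}$, $p_{LC}$). A useful sanity check: when both $p$-values equal the same value $q$, the combined $p$-value is exactly $q$, because $1-F_{Cauchy}(\tan((0.5-q)\pi)) = q$.

```r
# Hypothetical helper: equal-weight Cauchy combination of two p-values
cauchy_combine <- function(p1, p2) {
  stat <- 0.5 * tan((0.5 - p1) * pi) + 0.5 * tan((0.5 - p2) * pi)
  # 1 - F_Cauchy(stat), computed directly in the upper tail
  pval <- pcauchy(stat, lower.tail = FALSE)
  list(stat = stat, pval = pval)
}

cauchy_combine(0.5, 0.5)$pval    # 0.5 (stat is 0)
cauchy_combine(0.03, 0.03)$pval  # 0.03, up to rounding
```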

Usage

covtest.pe.cauchy(dataX,dataY)

Arguments

dataX

an $n_1$ by $p$ data matrix

dataY

an $n_2$ by $p$ data matrix

Value

stat the value of the test statistic

pval the p-value for the test.

References

Yu, X., Li, D., and Xue, L. (2022). Fisher’s combined probability test for high-dimensional covariance matrices. Journal of the American Statistical Association, (in press):1–14.

Examples

n1 = 100; n2 = 100; pp = 500
set.seed(1)
X = matrix(rnorm(n1*pp), nrow=n1, ncol=pp)
Y = matrix(rnorm(n2*pp), nrow=n2, ncol=pp)
covtest.pe.cauchy(X,Y)

Two-sample PE covariance test for high-dimensional data via PE component

Description

This function implements the two-sample PE covariance test via the construction of the PE component. Let $T_{LC}/\hat\sigma_{T_{LC}}$ denote the $l_2$-norm-based covariance test statistic (see covtest.lc for details). The PE component is constructed as

$$J_c=\sqrt{p}\sum_{i=1}^p\sum_{j=1}^p T_{ij}\widehat\xi^{-1/2}_{ij} \mathcal{I}\{ \sqrt{2}T_{ij}\widehat\xi^{-1/2}_{ij} +1 > \delta_{cov} \},$$

where $\mathcal{I}\{\cdot\}$ is the indicator function and $\delta_{cov}$ is a threshold for the screening procedure, with recommended value $\delta_{cov}=4\log(\log (n_1+n_2))\log p$. The explicit forms of $T_{ij}$ and $\widehat\xi_{ij}$ can be found in Section 3.2 of Yu, Li, Xue, and Li (2022). The PE covariance test statistic is defined as

$$T_{PE}=T_{LC}/\hat\sigma_{T_{LC}}+J_c.$$

Under some regularity conditions and the null hypothesis $H_{0c}: \mathbf{\Sigma}_1 = \mathbf{\Sigma}_2$, the test statistic $T_{PE}$ converges in distribution to the standard normal distribution as $n_1, n_2, p \rightarrow \infty$. The asymptotic $p$-value is obtained by

$$p\text{-value}=1-\Phi(T_{PE}),$$

where $\Phi(\cdot)$ is the cdf of the standard normal distribution.
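The assembly of $T_{PE}$ from its ingredients can be sketched as follows. The sketch assumes the standardized $l_2$ statistic $T_{LC}/\hat\sigma_{T_{LC}}$ and the $p \times p$ matrix of standardized entries $T_{ij}\widehat\xi^{-1/2}_{ij}$ have already been computed (their formulas are in Yu, Li, Xue, and Li (2022) and are not reproduced here); the function and argument names are hypothetical.

```r
# Hypothetical helper: combine precomputed ingredients into T_PE.
#   z_l2 : standardized l2-norm statistic, T_LC / sigma_hat_{T_LC}
#   u    : p x p matrix of standardized entries T_ij * xi_hat_ij^(-1/2)
pe_cov_sketch <- function(z_l2, u, n1, n2, delta = NULL) {
  p <- nrow(u)
  if (is.null(delta)) {
    delta <- 4 * log(log(n1 + n2)) * log(p)  # recommended delta_cov
  }
  keep <- sqrt(2) * u + 1 > delta            # screening indicator I{.}
  J_c <- sqrt(p) * sum(u[keep])              # PE component (0 if nothing kept)
  stat <- z_l2 + J_c
  list(stat = stat, pval = pnorm(stat, lower.tail = FALSE), J_c = J_c)
}
```

Whenever $\delta_{cov} > 1$, every retained entry exceeds $(\delta_{cov}-1)/\sqrt{2} > 0$, so $J_c \ge 0$: the screening step can only raise the statistic, and under the null it is typically zero because no entry survives the threshold.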

Usage

covtest.pe.comp(dataX,dataY,delta=NULL)

Arguments

dataX

an $n_1$ by $p$ data matrix

dataY

an $n_2$ by $p$ data matrix

delta

a scalar; the thresholding value used in the construction of the PE component. If not specified, the function uses the default value $\delta_{cov}=4\log(\log (n_1+n_2))\log p$.

Value

stat the value of the test statistic

pval the p-value for the test.

References

Yu, X., Li, D., Xue, L., and Li, R. (2022). Power-enhanced simultaneous test of high-dimensional mean vectors and covariance matrices with application to gene-set testing. Journal of the American Statistical Association, (in press):1–14.

Examples

n1 = 100; n2 = 100; pp = 500
set.seed(1)
X = matrix(rnorm(n1*pp), nrow=n1, ncol=pp)
Y = matrix(rnorm(n2*pp), nrow=n2, ncol=pp)
covtest.pe.comp(X,Y)

Two-sample PE covariance test for high-dimensional data via Fisher's combination

Description

This function implements the two-sample PE covariance test via Fisher's combination. Suppose $\{\mathbf{X}_1, \ldots, \mathbf{X}_{n_1}\}$ are i.i.d. copies of $\mathbf{X}$, and $\{\mathbf{Y}_1, \ldots, \mathbf{Y}_{n_2}\}$ are i.i.d. copies of $\mathbf{Y}$. Let $p_{LC}$ and $p_{CLX}$ denote the $p$-values associated with the $l_2$-norm-based covariance test (see covtest.lc for details) and the $l_\infty$-norm-based covariance test (see covtest.clx for details), respectively. The PE covariance test statistic via Fisher's combination is defined as

$$T_{Fisher} = -2\log(p_{LC})-2\log(p_{CLX}).$$

It has been proved that, under some regularity conditions and the null hypothesis $H_{0c}: \mathbf{\Sigma}_1 = \mathbf{\Sigma}_2$, the two tests are asymptotically independent as $n_1, n_2, p\rightarrow \infty$, and therefore $T_{Fisher}$ converges in distribution to a $\chi_4^2$ distribution. The asymptotic $p$-value is obtained by

$$p\text{-value} = 1-F_{\chi_4^2}(T_{Fisher}),$$

where $F_{\chi_4^2}(\cdot)$ is the cdf of the $\chi_4^2$ distribution.
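Likewise, a sketch of Fisher's combination in base R (function name hypothetical). For two $p$-values, the $\chi^2_4$ survival function has the closed form $e^{-x/2}(1+x/2)$, which gives a quick check: with both $p$-values equal to $0.05$, $T_{Fisher} = -4\log 0.05 \approx 11.98$ and the combined $p$-value is $0.05^2(1-2\log 0.05)\approx 0.0175$.

```r
# Hypothetical helper: Fisher's combination of two p-values
fisher_combine <- function(p1, p2) {
  stat <- -2 * log(p1) - 2 * log(p2)
  # 1 - F_{chi^2_4}(stat), computed directly in the upper tail
  pval <- pchisq(stat, df = 4, lower.tail = FALSE)
  list(stat = stat, pval = pval)
}

fisher_combine(0.05, 0.05)   # stat approx 11.98, pval approx 0.0175
```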

Usage

covtest.pe.fisher(dataX,dataY)

Arguments

dataX

an $n_1$ by $p$ data matrix

dataY

an $n_2$ by $p$ data matrix

Value

stat the value of the test statistic

pval the p-value for the test.

References

Yu, X., Li, D., and Xue, L. (2022). Fisher’s combined probability test for high-dimensional covariance matrices. Journal of the American Statistical Association, (in press):1–14.

Examples

n1 = 100; n2 = 100; pp = 500
set.seed(1)
X = matrix(rnorm(n1*pp), nrow=n1, ncol=pp)
Y = matrix(rnorm(n2*pp), nrow=n2, ncol=pp)
covtest.pe.fisher(X,Y)

Two-sample mean tests for high-dimensional data

Description

This function implements five two-sample mean tests on high-dimensional mean vectors. Let $\mathbf{X} \in \mathbb{R}^p$ and $\mathbf{Y} \in \mathbb{R}^p$ be two $p$-dimensional populations with mean vectors $(\boldsymbol{\mu}_1, \boldsymbol{\mu}_2)$ and covariance matrices $(\mathbf{\Sigma}_1, \mathbf{\Sigma}_2)$, respectively. The problem of interest is to test the equality of the mean vectors of the two populations:

$$H_{0m}: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2.$$

Suppose $\{\mathbf{X}_1, \ldots, \mathbf{X}_{n_1}\}$ are i.i.d. copies of $\mathbf{X}$, and $\{\mathbf{Y}_1, \ldots, \mathbf{Y}_{n_2}\}$ are i.i.d. copies of $\mathbf{Y}$. We denote dataX $= (\mathbf{X}_1, \ldots, \mathbf{X}_{n_1})^\top \in \mathbb{R}^{n_1 \times p}$ and dataY $= (\mathbf{Y}_1, \ldots, \mathbf{Y}_{n_2})^\top \in \mathbb{R}^{n_2 \times p}$.

Usage

meantest(dataX,dataY,method='pe.comp',delta=NULL)

Arguments

dataX

an $n_1$ by $p$ data matrix

dataY

an $n_2$ by $p$ data matrix

method

the method type (default = 'pe.comp'); chosen from

  • 'clx': the $l_\infty$-norm-based mean test, proposed in Cai et al. (2014);
    see meantest.clx for details.

  • 'cq': the $l_2$-norm-based mean test, proposed in Chen and Qin (2010);
    see meantest.cq for details.

  • 'pe.cauchy': the PE mean test via Cauchy combination;
    see meantest.pe.cauchy for details.

  • 'pe.comp': the PE mean test via the construction of PE components;
    see meantest.pe.comp for details.

  • 'pe.fisher': the PE mean test via Fisher's combination;
    see meantest.pe.fisher for details.

delta

This is needed only in method='pe.comp'; see meantest.pe.comp for details. The default is NULL.

Value

method the method type

stat the value of the test statistic

pval the p-value for the test.

References

Chen, S. X. and Qin, Y. L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. The Annals of Statistics, 38(2):808–835.

Cai, T. T., Liu, W., and Xia, Y. (2014). Two-sample test of high dimensional means under dependence. Journal of the Royal Statistical Society: Series B: Statistical Methodology, 76(2):349–372.

Yu, X., Li, D., Xue, L., and Li, R. (2022). Power-enhanced simultaneous test of high-dimensional mean vectors and covariance matrices with application to gene-set testing. Journal of the American Statistical Association, (in press):1–14.

Examples

n1 = 100; n2 = 100; pp = 500
set.seed(1)
X = matrix(rnorm(n1*pp), nrow=n1, ncol=pp)
Y = matrix(rnorm(n2*pp), nrow=n2, ncol=pp)
meantest(X,Y)

Two-sample high-dimensional mean test (Cai, Liu and Xia, 2014)

Description

This function implements the two-sample $l_\infty$-norm-based high-dimensional mean test proposed in Cai, Liu and Xia (2014). Suppose $\{\mathbf{X}_1, \ldots, \mathbf{X}_{n_1}\}$ are i.i.d. copies of $\mathbf{X}$, and $\{\mathbf{Y}_1, \ldots, \mathbf{Y}_{n_2}\}$ are i.i.d. copies of $\mathbf{Y}$. The test statistic is defined as

$$M_{CLX}=\frac{n_1n_2}{n_1+n_2}\max_{1\leq j\leq p} \frac{(\bar{X}_j-\bar{Y}_j)^2}{\frac{1}{n_1+n_2}\left[\sum_{u=1}^{n_1} (X_{uj}-\bar{X}_j)^2+\sum_{v=1}^{n_2} (Y_{vj}-\bar{Y}_j)^2\right]},$$

where $\bar{X}_j$ and $\bar{Y}_j$ are the sample means of the $j$th coordinate. Under some regularity conditions and the null hypothesis $H_{0m}: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2$, the statistic $M_{CLX}-2\log p+\log\log p$ converges in distribution to the Gumbel distribution $G_{mean}(x) = \exp\left(-\frac{1}{\sqrt{\pi}}\exp\left(-\frac{x}{2}\right)\right)$ as $n_1, n_2, p \rightarrow \infty$. The asymptotic $p$-value is obtained by

$$p_{CLX} = 1-G_{mean}(M_{CLX}-2\log p+\log\log p).$$
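A direct base-R transcription of $M_{CLX}$ and $p_{CLX}$ as displayed above (columnwise pooled variances with denominator $n_1+n_2$, no further transformation of the data) could look like this. The name clx_mean_sketch is hypothetical, and meantest.clx in the package may differ in implementation details.

```r
# Hypothetical helper: l_inf-norm-based mean test as written in the formula above
clx_mean_sketch <- function(dataX, dataY) {
  n1 <- nrow(dataX); n2 <- nrow(dataY); p <- ncol(dataX)
  xbar <- colMeans(dataX); ybar <- colMeans(dataY)
  # pooled within-sample sum of squares per coordinate, divided by n1 + n2
  ss <- colSums(sweep(dataX, 2, xbar)^2) + colSums(sweep(dataY, 2, ybar)^2)
  pooled <- ss / (n1 + n2)
  stat <- n1 * n2 / (n1 + n2) * max((xbar - ybar)^2 / pooled)
  # Gumbel limit G_mean(x) = exp(-exp(-x/2) / sqrt(pi))
  x <- stat - 2 * log(p) + log(log(p))
  pval <- 1 - exp(-exp(-x / 2) / sqrt(pi))
  list(stat = stat, pval = pval)
}
```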

Usage

meantest.clx(dataX,dataY)

Arguments

dataX

an $n_1$ by $p$ data matrix

dataY

an $n_2$ by $p$ data matrix

Value

stat the value of the test statistic

pval the p-value for the test.

References

Cai, T. T., Liu, W., and Xia, Y. (2014). Two-sample test of high dimensional means under dependence. Journal of the Royal Statistical Society: Series B: Statistical Methodology, 76(2):349–372.

Examples

n1 = 100; n2 = 100; pp = 500
set.seed(1)
X = matrix(rnorm(n1*pp), nrow=n1, ncol=pp)
Y = matrix(rnorm(n2*pp), nrow=n2, ncol=pp)
meantest.clx(X,Y)

Two-sample high-dimensional mean test (Chen and Qin, 2010)

Description

This function implements the two-sample $l_2$-norm-based high-dimensional mean test proposed by Chen and Qin (2010). Suppose $\{\mathbf{X}_1, \ldots, \mathbf{X}_{n_1}\}$ are i.i.d. copies of $\mathbf{X}$, and $\{\mathbf{Y}_1, \ldots, \mathbf{Y}_{n_2}\}$ are i.i.d. copies of $\mathbf{Y}$. The test statistic $M_{CQ}$ is defined as

$$M_{CQ} = \frac{1}{n_1(n_1-1)}\sum_{u\neq v}^{n_1} \mathbf{X}_{u}^\top\mathbf{X}_{v} +\frac{1}{n_2(n_2-1)}\sum_{u\neq v}^{n_2} \mathbf{Y}_{u}^\top\mathbf{Y}_{v} -\frac{2}{n_1n_2}\sum_{u=1}^{n_1}\sum_{v=1}^{n_2} \mathbf{X}_{u}^\top\mathbf{Y}_{v}.$$

Under the null hypothesis $H_{0m}: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2$, the leading variance of $M_{CQ}$ is $\sigma^2_{M_{CQ}}=\frac{2}{n_1(n_1-1)}\mathrm{tr}(\mathbf{\Sigma}_1^2)+\frac{2}{n_2(n_2-1)}\mathrm{tr}(\mathbf{\Sigma}_2^2)+\frac{4}{n_1n_2}\mathrm{tr}(\mathbf{\Sigma}_1\mathbf{\Sigma}_2)$, which can be consistently estimated by $\widehat\sigma^2_{M_{CQ}}=\frac{2}{n_1(n_1-1)}\widehat{\mathrm{tr}(\mathbf{\Sigma}_1^2)}+\frac{2}{n_2(n_2-1)}\widehat{\mathrm{tr}(\mathbf{\Sigma}_2^2)}+\frac{4}{n_1n_2}\widehat{\mathrm{tr}(\mathbf{\Sigma}_1\mathbf{\Sigma}_2)}$. The explicit formulas for $\widehat{\mathrm{tr}(\mathbf{\Sigma}_1^2)}$, $\widehat{\mathrm{tr}(\mathbf{\Sigma}_2^2)}$, and $\widehat{\mathrm{tr}(\mathbf{\Sigma}_1\mathbf{\Sigma}_2)}$ can be found in Section 3 of Chen and Qin (2010). Under some regularity conditions and $H_{0m}$, the standardized statistic $M_{CQ}/\hat\sigma_{M_{CQ}}$ converges in distribution to the standard normal distribution as $n_1, n_2, p \rightarrow \infty$. The asymptotic $p$-value is obtained by

$$p_{CQ} = 1-\Phi(M_{CQ}/\hat\sigma_{M_{CQ}}),$$

where $\Phi(\cdot)$ is the cdf of the standard normal distribution.
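The double sums in $M_{CQ}$ do not need explicit loops, because $\sum_{u\neq v}\mathbf{X}_u^\top\mathbf{X}_v = \|\sum_u \mathbf{X}_u\|^2 - \sum_u\|\mathbf{X}_u\|^2$ and $\sum_u\sum_v\mathbf{X}_u^\top\mathbf{Y}_v = (\sum_u\mathbf{X}_u)^\top(\sum_v\mathbf{Y}_v)$. The sketch below (hypothetical name) uses these identities to compute $M_{CQ}$ in $O((n_1+n_2)p)$ time. It does not compute $\hat\sigma_{M_{CQ}}$, whose estimators are given in Section 3 of Chen and Qin (2010).

```r
# Hypothetical helper: unstandardized Chen-Qin statistic M_CQ
cq_stat_sketch <- function(dataX, dataY) {
  n1 <- nrow(dataX); n2 <- nrow(dataY)
  sx <- colSums(dataX); sy <- colSums(dataY)
  # sum_{u != v} X_u' X_v = ||sum_u X_u||^2 - sum_u ||X_u||^2
  within_x <- (sum(sx^2) - sum(dataX^2)) / (n1 * (n1 - 1))
  within_y <- (sum(sy^2) - sum(dataY^2)) / (n2 * (n2 - 1))
  # sum_u sum_v X_u' Y_v = (sum_u X_u)' (sum_v Y_v)
  cross <- 2 * sum(sx * sy) / (n1 * n2)
  within_x + within_y - cross
}
```

Dropping the $u = v$ terms is what makes $M_{CQ}$ an unbiased estimator of $\|\boldsymbol{\mu}_1-\boldsymbol{\mu}_2\|^2$, so its null mean is zero rather than a dimension-dependent bias.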

Usage

meantest.cq(dataX,dataY)

Arguments

dataX

an $n_1$ by $p$ data matrix

dataY

an $n_2$ by $p$ data matrix

Value

stat the value of the test statistic

pval the p-value for the test.

References

Chen, S. X. and Qin, Y. L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. The Annals of Statistics, 38(2):808–835.

Examples

n1 = 100; n2 = 100; pp = 500
set.seed(1)
X = matrix(rnorm(n1*pp), nrow=n1, ncol=pp)
Y = matrix(rnorm(n2*pp), nrow=n2, ncol=pp)
meantest.cq(X,Y)

Two-sample PE mean test for high-dimensional data via Cauchy combination

Description

This function implements the two-sample PE mean test via Cauchy combination. Suppose $\{\mathbf{X}_1, \ldots, \mathbf{X}_{n_1}\}$ are i.i.d. copies of $\mathbf{X}$, and $\{\mathbf{Y}_1, \ldots, \mathbf{Y}_{n_2}\}$ are i.i.d. copies of $\mathbf{Y}$. Let $p_{CQ}$ and $p_{CLX}$ denote the $p$-values associated with the $l_2$-norm-based mean test (see meantest.cq for details) and the $l_\infty$-norm-based mean test (see meantest.clx for details), respectively. The PE mean test statistic via Cauchy combination is defined as

$$M_{Cauchy} = \frac{1}{2}\tan((0.5-p_{CQ})\pi) + \frac{1}{2}\tan((0.5-p_{CLX})\pi).$$

It has been proved that, under some regularity conditions and the null hypothesis $H_{0m}: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2$, the two tests are asymptotically independent as $n_1, n_2, p\rightarrow \infty$, and therefore $M_{Cauchy}$ converges in distribution to the standard Cauchy distribution. The asymptotic $p$-value is obtained by

$$p\text{-value} = 1-F_{Cauchy}(M_{Cauchy}),$$

where $F_{Cauchy}(\cdot)$ is the cdf of the standard Cauchy distribution.

Usage

meantest.pe.cauchy(dataX,dataY)

Arguments

dataX

an $n_1$ by $p$ data matrix

dataY

an $n_2$ by $p$ data matrix

Value

stat the value of the test statistic

pval the p-value for the test.

References

Chen, S. X. and Qin, Y. L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. The Annals of Statistics, 38(2):808–835.

Cai, T. T., Liu, W., and Xia, Y. (2014). Two-sample test of high dimensional means under dependence. Journal of the Royal Statistical Society: Series B: Statistical Methodology, 76(2):349–372.

Examples

n1 = 100; n2 = 100; pp = 500
set.seed(1)
X = matrix(rnorm(n1*pp), nrow=n1, ncol=pp)
Y = matrix(rnorm(n2*pp), nrow=n2, ncol=pp)
meantest.pe.cauchy(X,Y)

Two-sample PE mean test for high-dimensional data via PE component

Description

This function implements the two-sample PE mean test via the construction of the PE component. Let $M_{CQ}/\hat\sigma_{M_{CQ}}$ denote the $l_2$-norm-based mean test statistic (see meantest.cq for details). The PE component is constructed as

$$J_m = \sqrt{p}\sum_{i=1}^p M_i\widehat\nu^{-1/2}_i \mathcal{I}\{ \sqrt{2}M_i\widehat\nu^{-1/2}_i + 1 > \delta_{mean} \},$$

where $\mathcal{I}\{\cdot\}$ is the indicator function and $\delta_{mean}$ is a threshold for the screening procedure, with recommended value $\delta_{mean}=2\log(\log (n_1+n_2))\log p$. The explicit forms of $M_i$ and $\widehat\nu_i$ can be found in Section 3.1 of Yu, Li, Xue, and Li (2022). The PE mean test statistic is defined as

$$M_{PE}=M_{CQ}/\hat\sigma_{M_{CQ}}+J_m.$$

Under some regularity conditions and the null hypothesis $H_{0m}: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2$, the test statistic $M_{PE}$ converges in distribution to the standard normal distribution as $n_1, n_2, p \rightarrow \infty$. The asymptotic $p$-value is obtained by

$$p\text{-value}= 1-\Phi(M_{PE}),$$

where $\Phi(\cdot)$ is the cdf of the standard normal distribution.

Usage

meantest.pe.comp(dataX,dataY,delta=NULL)

Arguments

dataX

an $n_1$ by $p$ data matrix

dataY

an $n_2$ by $p$ data matrix

delta

a scalar; the thresholding value used in the construction of the PE component. If not specified, the function uses the default value $\delta_{mean}=2\log(\log (n_1+n_2))\log p$.

Value

stat the value of the test statistic

pval the p-value for the test.

References

Yu, X., Li, D., Xue, L., and Li, R. (2022). Power-enhanced simultaneous test of high-dimensional mean vectors and covariance matrices with application to gene-set testing. Journal of the American Statistical Association, (in press):1–14.

Examples

n1 = 100; n2 = 100; pp = 500
set.seed(1)
X = matrix(rnorm(n1*pp), nrow=n1, ncol=pp)
Y = matrix(rnorm(n2*pp), nrow=n2, ncol=pp)
meantest.pe.comp(X,Y)

Two-sample PE mean test for high-dimensional data via Fisher's combination

Description

This function implements the two-sample PE mean test via Fisher's combination. Suppose $\{\mathbf{X}_1, \ldots, \mathbf{X}_{n_1}\}$ are i.i.d. copies of $\mathbf{X}$, and $\{\mathbf{Y}_1, \ldots, \mathbf{Y}_{n_2}\}$ are i.i.d. copies of $\mathbf{Y}$. Let $p_{CQ}$ and $p_{CLX}$ denote the $p$-values associated with the $l_2$-norm-based mean test (see meantest.cq for details) and the $l_\infty$-norm-based mean test (see meantest.clx for details), respectively. The PE mean test statistic via Fisher's combination is defined as

$$M_{Fisher} = -2\log(p_{CQ})-2\log(p_{CLX}).$$

It has been proved that, under some regularity conditions and the null hypothesis $H_{0m}: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2$, the two tests are asymptotically independent as $n_1, n_2, p\rightarrow \infty$, and therefore $M_{Fisher}$ converges in distribution to a $\chi_4^2$ distribution. The asymptotic $p$-value is obtained by

$$p\text{-value} = 1-F_{\chi_4^2}(M_{Fisher}),$$

where $F_{\chi_4^2}(\cdot)$ is the cdf of the $\chi_4^2$ distribution.

Usage

meantest.pe.fisher(dataX,dataY)

Arguments

dataX

an $n_1$ by $p$ data matrix

dataY

an $n_2$ by $p$ data matrix

Value

stat the value of the test statistic

pval the p-value for the test.

References

Chen, S. X. and Qin, Y. L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. The Annals of Statistics, 38(2):808–835.

Cai, T. T., Liu, W., and Xia, Y. (2014). Two-sample test of high dimensional means under dependence. Journal of the Royal Statistical Society: Series B: Statistical Methodology, 76(2):349–372.

Examples

n1 = 100; n2 = 100; pp = 500
set.seed(1)
X = matrix(rnorm(n1*pp), nrow=n1, ncol=pp)
Y = matrix(rnorm(n2*pp), nrow=n2, ncol=pp)
meantest.pe.fisher(X,Y)

Two-sample simultaneous tests on high-dimensional mean vectors and covariance matrices

Description

This function implements six two-sample simultaneous tests on high-dimensional mean vectors and covariance matrices. Let $\mathbf{X} \in \mathbb{R}^p$ and $\mathbf{Y} \in \mathbb{R}^p$ be two $p$-dimensional populations with mean vectors $(\boldsymbol{\mu}_1, \boldsymbol{\mu}_2)$ and covariance matrices $(\mathbf{\Sigma}_1, \mathbf{\Sigma}_2)$, respectively. The problem of interest is the simultaneous inference on the equality of mean vectors and covariance matrices of the two populations:

$$H_0: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 \ \text{ and } \ \mathbf{\Sigma}_1 = \mathbf{\Sigma}_2.$$

Suppose $\{\mathbf{X}_1, \ldots, \mathbf{X}_{n_1}\}$ are i.i.d. copies of $\mathbf{X}$, and $\{\mathbf{Y}_1, \ldots, \mathbf{Y}_{n_2}\}$ are i.i.d. copies of $\mathbf{Y}$. We denote dataX $= (\mathbf{X}_1, \ldots, \mathbf{X}_{n_1})^\top \in \mathbb{R}^{n_1 \times p}$ and dataY $= (\mathbf{Y}_1, \ldots, \mathbf{Y}_{n_2})^\top \in \mathbb{R}^{n_2 \times p}$.

Usage

simultest(dataX, dataY, method='pe.fisher', delta_mean=NULL, delta_cov=NULL)

Arguments

dataX

an $n_1$ by $p$ data matrix

dataY

an $n_2$ by $p$ data matrix

method

the method type (default = 'pe.fisher'); chosen from

  • 'cauchy': the simultaneous test via Cauchy combination;
    see simultest.cauchy for details.

  • 'chisq': the simultaneous test via chi-squared approximation;
    see simultest.chisq for details.

  • 'fisher': the simultaneous test via Fisher's combination;
    see simultest.fisher for details.

  • 'pe.cauchy': the PE simultaneous test via Cauchy combination;
    see simultest.pe.cauchy for details.

  • 'pe.chisq': the PE simultaneous test via chi-squared approximation;
    see simultest.pe.chisq for details.

  • 'pe.fisher': the PE simultaneous test via Fisher's combination;
    see simultest.pe.fisher for details.

delta_mean

the thresholding value used in the construction of the PE component for the mean test statistic. It is used only by the PE methods, i.e., method='pe.cauchy', method='pe.chisq', and method='pe.fisher'; see simultest.pe.cauchy,
simultest.pe.chisq, and simultest.pe.fisher for details. The default is NULL.

delta_cov

the thresholding value used in the construction of the PE component for the covariance test statistic. It is used only by the PE methods, i.e., method='pe.cauchy', method='pe.chisq', and method='pe.fisher'; see simultest.pe.cauchy,
simultest.pe.chisq, and simultest.pe.fisher for details. The default is NULL.

Value

method the method type

stat the value of the test statistic

pval the p-value for the test.

References

Chen, S. X. and Qin, Y. L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. The Annals of Statistics, 38(2):808–835.

Li, J. and Chen, S. X. (2012). Two sample tests for high-dimensional covariance matrices. The Annals of Statistics, 40(2):908–940.

Yu, X., Li, D., and Xue, L. (2022). Fisher’s combined probability test for high-dimensional covariance matrices. Journal of the American Statistical Association, (in press):1–14.

Yu, X., Li, D., Xue, L., and Li, R. (2022). Power-enhanced simultaneous test of high-dimensional mean vectors and covariance matrices with application to gene-set testing. Journal of the American Statistical Association, (in press):1–14.

Examples

n1 = 100; n2 = 100; pp = 500
set.seed(1)
X = matrix(rnorm(n1*pp), nrow=n1, ncol=pp)
Y = matrix(rnorm(n2*pp), nrow=n2, ncol=pp)
simultest(X,Y)

Two-sample simultaneous test using Cauchy combination

Description

This function implements the two-sample simultaneous test on high-dimensional mean vectors and covariance matrices using Cauchy combination. Suppose $\{\mathbf{X}_1, \ldots, \mathbf{X}_{n_1}\}$ are i.i.d. copies of $\mathbf{X}$ with mean vector $\boldsymbol{\mu}_1$ and covariance matrix $\mathbf{\Sigma}_1$, and $\{\mathbf{Y}_1, \ldots, \mathbf{Y}_{n_2}\}$ are i.i.d. copies of $\mathbf{Y}$ with mean vector $\boldsymbol{\mu}_2$ and covariance matrix $\mathbf{\Sigma}_2$. Let $p_{CQ}$ and $p_{LC}$ denote the $p$-values associated with the $l_2$-norm-based mean test proposed in Chen and Qin (2010) (see meantest.cq for details) and the $l_2$-norm-based covariance test proposed in Li and Chen (2012) (see covtest.lc for details), respectively. The simultaneous test statistic via Cauchy combination is defined as

$$C_{n_1, n_2} = \frac{1}{2}\tan((0.5-p_{CQ})\pi) + \frac{1}{2}\tan((0.5-p_{LC})\pi).$$

It has been proved that, under some regularity conditions and the null hypothesis $H_0: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 \ \text{ and } \ \mathbf{\Sigma}_1 = \mathbf{\Sigma}_2$, the two test statistics are asymptotically independent as $n_1, n_2, p \rightarrow \infty$; therefore $C_{n_1,n_2}$ converges in distribution to the standard Cauchy distribution. The asymptotic $p$-value is obtained by

$$p\text{-value} = 1-F_{Cauchy}(C_{n_1,n_2}),$$

where $F_{Cauchy}(\cdot)$ is the cdf of the standard Cauchy distribution.
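The combination step itself needs only base R. The sketch below assumes the two $p$-values have already been computed (for example with meantest.cq and covtest.lc); the helper name cauchy_combine is hypothetical and is not part of PEtests or its internal implementation.

```r
# Hypothetical helper, not part of PEtests: Cauchy combination of two p-values,
# following the formula for C_{n1,n2} above.
cauchy_combine <- function(p_mean, p_cov) {
  p <- c(p_mean, p_cov)
  stopifnot(all(p > 0 & p < 1))  # tan() is unbounded at p = 0 or p = 1
  # Equal weights 1/2 on the two transformed p-values
  stat <- 0.5 * tan((0.5 - p_mean) * pi) + 0.5 * tan((0.5 - p_cov) * pi)
  # Upper-tail probability 1 - F_Cauchy(stat) of the standard Cauchy
  pval <- pcauchy(stat, lower.tail = FALSE)
  list(stat = stat, pval = pval)
}

# Illustrative p-values only (not output of meantest.cq or covtest.lc)
res <- cauchy_combine(p_mean = 0.30, p_cov = 0.01)
res$pval
```

With these inputs the combined $p$-value is roughly 0.02, so a single small component $p$-value can drive the rejection even when the other one is unremarkable.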

Usage

simultest.cauchy(dataX,dataY)

Arguments

dataX

an $n_1$ by $p$ data matrix

dataY

an $n_2$ by $p$ data matrix

Value

stat the value of the test statistic

pval the p-value for the test.

References

Chen, S. X. and Qin, Y. L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. The Annals of Statistics, 38(2):808–835.

Li, J. and Chen, S. X. (2012). Two sample tests for high-dimensional covariance matrices. The Annals of Statistics, 40(2):908–940.

Yu, X., Li, D., Xue, L., and Li, R. (2022). Power-enhanced simultaneous test of high-dimensional mean vectors and covariance matrices with application to gene-set testing. Journal of the American Statistical Association, (in press):1–14.

Examples

n1 = 100; n2 = 100; pp = 500
set.seed(1)
X = matrix(rnorm(n1*pp), nrow=n1, ncol=pp)
Y = matrix(rnorm(n2*pp), nrow=n2, ncol=pp)
simultest.cauchy(X,Y)

Two-sample simultaneous test using chi-squared approximation

Description

This function implements the two-sample simultaneous test on high-dimensional mean vectors and covariance matrices using chi-squared approximation. Suppose $\{\mathbf{X}_1, \ldots, \mathbf{X}_{n_1}\}$ are i.i.d. copies of $\mathbf{X}$ with mean vector $\boldsymbol{\mu}_1$ and covariance matrix $\mathbf{\Sigma}_1$, and $\{\mathbf{Y}_1, \ldots, \mathbf{Y}_{n_2}\}$ are i.i.d. copies of $\mathbf{Y}$ with mean vector $\boldsymbol{\mu}_2$ and covariance matrix $\mathbf{\Sigma}_2$. Let $M_{CQ}/\hat\sigma_{M_{CQ}}$ denote the $l_2$-norm-based mean test statistic proposed in Chen and Qin (2010) (see meantest.cq for details), and let $T_{LC}/\hat\sigma_{T_{LC}}$ denote the $l_2$-norm-based covariance test statistic proposed in Li and Chen (2012) (see covtest.lc for details). The simultaneous test statistic via chi-squared approximation is defined as

$$S_{n_1, n_2} = M_{CQ}^2/\hat\sigma^2_{M_{CQ}} + T_{LC}^2/\hat\sigma^2_{T_{LC}}.$$

It has been proved that, under some regularity conditions and the null hypothesis $H_0: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 \ \text{ and } \ \mathbf{\Sigma}_1 = \mathbf{\Sigma}_2$, the two test statistics are asymptotically independent as $n_1, n_2, p \rightarrow \infty$; therefore $S_{n_1,n_2}$ converges in distribution to a $\chi_2^2$ distribution. The asymptotic $p$-value is obtained by

$$p\text{-value} = 1-F_{\chi_2^2}(S_{n_1,n_2}),$$

where $F_{\chi_2^2}(\cdot)$ is the cdf of the $\chi_2^2$ distribution.
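The chi-squared rule only needs the two standardized statistics. In the sketch below, z_mean and z_cov are hypothetical stand-ins for $M_{CQ}/\hat\sigma_{M_{CQ}}$ and $T_{LC}/\hat\sigma_{T_{LC}}$, and chisq_combine is an illustrative helper, not a PEtests function. It also uses the fact that the $\chi_2^2$ upper tail has the closed form $\exp(-x/2)$.

```r
# Hypothetical helper, not part of PEtests: chi-squared combination of two
# standardized statistics, following the formula for S_{n1,n2} above.
chisq_combine <- function(z_mean, z_cov) {
  stat <- z_mean^2 + z_cov^2
  # 1 - F_{chi^2_2}(stat); for 2 degrees of freedom this equals exp(-stat / 2)
  pval <- pchisq(stat, df = 2, lower.tail = FALSE)
  list(stat = stat, pval = pval)
}

# Illustrative standardized statistics only
res <- chisq_combine(z_mean = 1.2, z_cov = 2.5)
c(res$stat, res$pval, exp(-res$stat / 2))
```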

Usage

simultest.chisq(dataX,dataY)

Arguments

dataX

an $n_1$ by $p$ data matrix

dataY

an $n_2$ by $p$ data matrix

Value

stat the value of the test statistic

pval the p-value for the test.

References

Yu, X., Li, D., Xue, L., and Li, R. (2022). Power-enhanced simultaneous test of high-dimensional mean vectors and covariance matrices with application to gene-set testing. Journal of the American Statistical Association, (in press):1–14.

Examples

n1 = 100; n2 = 100; pp = 500
set.seed(1)
X = matrix(rnorm(n1*pp), nrow=n1, ncol=pp)
Y = matrix(rnorm(n2*pp), nrow=n2, ncol=pp)
simultest.chisq(X,Y)

Two-sample simultaneous test using Fisher's combination

Description

This function implements the two-sample simultaneous test on high-dimensional mean vectors and covariance matrices using Fisher's combination. Suppose $\{\mathbf{X}_1, \ldots, \mathbf{X}_{n_1}\}$ are i.i.d. copies of $\mathbf{X}$ with mean vector $\boldsymbol{\mu}_1$ and covariance matrix $\mathbf{\Sigma}_1$, and $\{\mathbf{Y}_1, \ldots, \mathbf{Y}_{n_2}\}$ are i.i.d. copies of $\mathbf{Y}$ with mean vector $\boldsymbol{\mu}_2$ and covariance matrix $\mathbf{\Sigma}_2$. Let $p_{CQ}$ and $p_{LC}$ denote the $p$-values associated with the $l_2$-norm-based mean test proposed in Chen and Qin (2010) (see meantest.cq for details) and the $l_2$-norm-based covariance test proposed in Li and Chen (2012) (see covtest.lc for details), respectively. The simultaneous test statistic via Fisher's combination is defined as

$$J_{n_1, n_2} = -2\log(p_{CQ}) - 2\log(p_{LC}).$$

It has been proved that, under some regularity conditions and the null hypothesis $H_0: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 \ \text{ and } \ \mathbf{\Sigma}_1 = \mathbf{\Sigma}_2$, the two test statistics are asymptotically independent as $n_1, n_2, p \rightarrow \infty$; therefore $J_{n_1,n_2}$ converges in distribution to a $\chi_4^2$ distribution. The asymptotic $p$-value is obtained by

$$p\text{-value} = 1-F_{\chi_4^2}(J_{n_1,n_2}),$$

where $F_{\chi_4^2}(\cdot)$ is the cdf of the $\chi_4^2$ distribution.
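As with the other combination rules, the final step is a short base-R computation once $p_{CQ}$ and $p_{LC}$ are available. The helper fisher_combine below is hypothetical (not a PEtests function) and the input $p$-values are illustrative.

```r
# Hypothetical helper, not part of PEtests: Fisher's combination of two
# p-values, following the formula for J_{n1,n2} above.
fisher_combine <- function(p_mean, p_cov) {
  p <- c(p_mean, p_cov)
  stopifnot(all(p > 0 & p <= 1))  # log(0) would give an infinite statistic
  stat <- -2 * log(p_mean) - 2 * log(p_cov)
  # 1 - F_{chi^2_4}(stat)
  pval <- pchisq(stat, df = 4, lower.tail = FALSE)
  list(stat = stat, pval = pval)
}

# Illustrative p-values only
res <- fisher_combine(p_mean = 0.30, p_cov = 0.01)
c(res$stat, res$pval)
```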

Usage

simultest.fisher(dataX,dataY)

Arguments

dataX

an $n_1$ by $p$ data matrix

dataY

an $n_2$ by $p$ data matrix

Value

stat the value of the test statistic

pval the p-value for the test.

References

Chen, S. X. and Qin, Y. L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. The Annals of Statistics, 38(2):808–835.

Li, J. and Chen, S. X. (2012). Two sample tests for high-dimensional covariance matrices. The Annals of Statistics, 40(2):908–940.

Yu, X., Li, D., Xue, L., and Li, R. (2022). Power-enhanced simultaneous test of high-dimensional mean vectors and covariance matrices with application to gene-set testing. Journal of the American Statistical Association, (in press):1–14.

Examples

n1 = 100; n2 = 100; pp = 500
set.seed(1)
X = matrix(rnorm(n1*pp), nrow=n1, ncol=pp)
Y = matrix(rnorm(n2*pp), nrow=n2, ncol=pp)
simultest.fisher(X,Y)

Two-sample PE simultaneous test using Cauchy combination

Description

This function implements the two-sample PE simultaneous test on high-dimensional mean vectors and covariance matrices using Cauchy combination. Suppose $\{\mathbf{X}_1, \ldots, \mathbf{X}_{n_1}\}$ are i.i.d. copies of $\mathbf{X}$ with mean vector $\boldsymbol{\mu}_1$ and covariance matrix $\mathbf{\Sigma}_1$, and $\{\mathbf{Y}_1, \ldots, \mathbf{Y}_{n_2}\}$ are i.i.d. copies of $\mathbf{Y}$ with mean vector $\boldsymbol{\mu}_2$ and covariance matrix $\mathbf{\Sigma}_2$. Let $M_{PE}$ and $T_{PE}$ denote the PE mean test statistic and the PE covariance test statistic, respectively (see meantest.pe.comp and covtest.pe.comp for details), and let $p_{m}$ and $p_{c}$ denote their respective $p$-values. The PE simultaneous test statistic via Cauchy combination is defined as

$$C_{PE} = \frac{1}{2}\tan((0.5-p_{m})\pi) + \frac{1}{2}\tan((0.5-p_{c})\pi).$$

It has been proved that, under some regularity conditions and the null hypothesis $H_0: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 \ \text{ and } \ \mathbf{\Sigma}_1 = \mathbf{\Sigma}_2$, the two test statistics are asymptotically independent as $n_1, n_2, p \rightarrow \infty$; therefore $C_{PE}$ converges in distribution to the standard Cauchy distribution. The asymptotic $p$-value is obtained by

$$p\text{-value} = 1-F_{Cauchy}(C_{PE}),$$

where $F_{Cauchy}(\cdot)$ is the cdf of the standard Cauchy distribution.
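Why the standard Cauchy limit is reasonable can be checked numerically. The sketch below idealizes $p_m$ and $p_c$ as independent Uniform(0,1) draws, which is what asymptotic independence under $H_0$ suggests but is an assumption of the simulation, not a finite-sample property of the PE tests. Each term $\tan((0.5-p)\pi)$ is then a standard Cauchy variable (it equals the upper quantile qcauchy(p, lower.tail = FALSE)), and an equal-weight average of independent standard Cauchy variables is again standard Cauchy.

```r
set.seed(1)
B <- 1e5  # number of simulated null replications

# Idealized null: independent Uniform(0,1) p-values (simulation assumption)
p_m <- runif(B)
p_c <- runif(B)

# tan((0.5 - p) * pi) matches the upper-tail standard Cauchy quantile
grid <- c(0.01, 0.1, 0.25, 0.5, 0.9)
isTRUE(all.equal(tan((0.5 - grid) * pi), qcauchy(grid, lower.tail = FALSE)))

# PE Cauchy combination statistic and its asymptotic p-value
C_PE <- 0.5 * tan((0.5 - p_m) * pi) + 0.5 * tan((0.5 - p_c) * pi)
pval <- pcauchy(C_PE, lower.tail = FALSE)

# Empirical rejection rate at the 5% level; should be close to 0.05
size <- mean(pval < 0.05)
size
```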

Usage

simultest.pe.cauchy(dataX,dataY,delta_mean=NULL,delta_cov=NULL)

Arguments

dataX

an $n_1$ by $p$ data matrix

dataY

an $n_2$ by $p$ data matrix

delta_mean

a scalar; the thresholding value used in the construction of the PE component for the mean test; see meantest.pe.comp for details.

delta_cov

a scalar; the thresholding value used in the construction of the PE component for the covariance test; see covtest.pe.comp for details.

Value

stat the value of the test statistic

pval the p-value for the test.

References

Yu, X., Li, D., Xue, L., and Li, R. (2022). Power-enhanced simultaneous test of high-dimensional mean vectors and covariance matrices with application to gene-set testing. Journal of the American Statistical Association, (in press):1–14.

Examples

n1 = 100; n2 = 100; pp = 500
set.seed(1)
X = matrix(rnorm(n1*pp), nrow=n1, ncol=pp)
Y = matrix(rnorm(n2*pp), nrow=n2, ncol=pp)
simultest.pe.cauchy(X,Y)

Two-sample PE simultaneous test using chi-squared approximation

Description

This function implements the two-sample PE simultaneous test on high-dimensional mean vectors and covariance matrices using chi-squared approximation. Suppose $\{\mathbf{X}_1, \ldots, \mathbf{X}_{n_1}\}$ are i.i.d. copies of $\mathbf{X}$ with mean vector $\boldsymbol{\mu}_1$ and covariance matrix $\mathbf{\Sigma}_1$, and $\{\mathbf{Y}_1, \ldots, \mathbf{Y}_{n_2}\}$ are i.i.d. copies of $\mathbf{Y}$ with mean vector $\boldsymbol{\mu}_2$ and covariance matrix $\mathbf{\Sigma}_2$. Let $M_{PE}$ and $T_{PE}$ denote the PE mean test statistic and the PE covariance test statistic, respectively (see meantest.pe.comp and covtest.pe.comp for details). The PE simultaneous test statistic via chi-squared approximation is defined as

$$S_{PE} = M_{PE}^2 + T_{PE}^2.$$

It has been proved that, under some regularity conditions and the null hypothesis $H_0: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 \ \text{ and } \ \mathbf{\Sigma}_1 = \mathbf{\Sigma}_2$, the two test statistics are asymptotically independent as $n_1, n_2, p \rightarrow \infty$; therefore $S_{PE}$ converges in distribution to a $\chi_2^2$ distribution. The asymptotic $p$-value is obtained by

$$p\text{-value} = 1-F_{\chi_2^2}(S_{PE}),$$

where $F_{\chi_2^2}(\cdot)$ is the cdf of the $\chi_2^2$ distribution.
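A similar idealized simulation illustrates the $\chi_2^2$ calibration. For the purpose of this sketch only, it assumes that $M_{PE}$ and $T_{PE}$ behave like independent $N(0,1)$ variables under $H_0$; the variable names are illustrative. The 5% critical value of $\chi_2^2$ has the closed form $-2\log(0.05) \approx 5.99$, because the $\chi_2^2$ upper tail is $\exp(-x/2)$.

```r
set.seed(1)
B <- 1e5  # number of simulated null replications

# Idealized null: independent standard normal statistics (simulation assumption)
M_PE <- rnorm(B)
T_PE <- rnorm(B)
S_PE <- M_PE^2 + T_PE^2

# 5% critical value of chi-squared with 2 df, and its closed form
crit <- qchisq(0.95, df = 2)
c(crit, -2 * log(0.05))

# Empirical rejection rate; should be close to 0.05
size <- mean(S_PE > crit)
size
```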

Usage

simultest.pe.chisq(dataX,dataY,delta_mean=NULL,delta_cov=NULL)

Arguments

dataX

an $n_1$ by $p$ data matrix

dataY

an $n_2$ by $p$ data matrix

delta_mean

a scalar; the thresholding value used in the construction of the PE component for the mean test; see meantest.pe.comp for details.

delta_cov

a scalar; the thresholding value used in the construction of the PE component for the covariance test; see covtest.pe.comp for details.

Value

stat the value of the test statistic

pval the p-value for the test.

References

Yu, X., Li, D., Xue, L., and Li, R. (2022). Power-enhanced simultaneous test of high-dimensional mean vectors and covariance matrices with application to gene-set testing. Journal of the American Statistical Association, (in press):1–14.

Examples

n1 = 100; n2 = 100; pp = 500
set.seed(1)
X = matrix(rnorm(n1*pp), nrow=n1, ncol=pp)
Y = matrix(rnorm(n2*pp), nrow=n2, ncol=pp)
simultest.pe.chisq(X,Y)

Two-sample PE simultaneous test using Fisher's combination

Description

This function implements the two-sample PE simultaneous test on high-dimensional mean vectors and covariance matrices using Fisher's combination. Suppose $\{\mathbf{X}_1, \ldots, \mathbf{X}_{n_1}\}$ are i.i.d. copies of $\mathbf{X}$ with mean vector $\boldsymbol{\mu}_1$ and covariance matrix $\mathbf{\Sigma}_1$, and $\{\mathbf{Y}_1, \ldots, \mathbf{Y}_{n_2}\}$ are i.i.d. copies of $\mathbf{Y}$ with mean vector $\boldsymbol{\mu}_2$ and covariance matrix $\mathbf{\Sigma}_2$. Let $M_{PE}$ and $T_{PE}$ denote the PE mean test statistic and the PE covariance test statistic, respectively (see meantest.pe.comp and covtest.pe.comp for details), and let $p_{m}$ and $p_{c}$ denote their respective $p$-values. The PE simultaneous test statistic via Fisher's combination is defined as

$$J_{PE} = -2\log(p_{m}) - 2\log(p_{c}).$$

It has been proved that, under some regularity conditions and the null hypothesis $H_0: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 \ \text{ and } \ \mathbf{\Sigma}_1 = \mathbf{\Sigma}_2$, the two test statistics are asymptotically independent as $n_1, n_2, p \rightarrow \infty$; therefore $J_{PE}$ converges in distribution to a $\chi_4^2$ distribution. The asymptotic $p$-value is obtained by

$$p\text{-value} = 1-F_{\chi_4^2}(J_{PE}),$$

where $F_{\chi_4^2}(\cdot)$ is the cdf of the $\chi_4^2$ distribution.
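With only two $p$-values, the $\chi_4^2$ upper tail has a closed form, which gives a quick hand check of the asymptotic $p$-value. The derivation below uses only the cdf of the $\chi_4^2$ distribution (a Gamma distribution with shape 2 and scale 2) and the definition of $J_{PE}$, with $\log$ the natural logarithm:

$$1 - F_{\chi_4^2}(x) = e^{-x/2}\left(1 + \frac{x}{2}\right), \qquad x \ge 0.$$

Substituting $x = J_{PE} = -2\log(p_m p_c)$ gives $e^{-x/2} = p_m p_c$ and $x/2 = -\log(p_m p_c)$, so

$$p\text{-value} = 1 - F_{\chi_4^2}(J_{PE}) = p_m\, p_c \bigl(1 - \log(p_m p_c)\bigr).$$

For instance, the illustrative values $p_m = 0.30$ and $p_c = 0.01$ give $p\text{-value} = 0.003\,(1 - \log 0.003) \approx 0.020$.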

Usage

simultest.pe.fisher(dataX,dataY,delta_mean=NULL,delta_cov=NULL)

Arguments

dataX

an $n_1$ by $p$ data matrix

dataY

an $n_2$ by $p$ data matrix

delta_mean

a scalar; the thresholding value used in the construction of the PE component for the mean test; see meantest.pe.comp for details.

delta_cov

a scalar; the thresholding value used in the construction of the PE component for the covariance test; see covtest.pe.comp for details.

Value

stat the value of the test statistic

pval the p-value for the test.

References

Yu, X., Li, D., Xue, L., and Li, R. (2022). Power-enhanced simultaneous test of high-dimensional mean vectors and covariance matrices with application to gene-set testing. Journal of the American Statistical Association, (in press):1–14.

Examples

n1 = 100; n2 = 100; pp = 500
set.seed(1)
X = matrix(rnorm(n1*pp), nrow=n1, ncol=pp)
Y = matrix(rnorm(n2*pp), nrow=n2, ncol=pp)
simultest.pe.fisher(X,Y)