Package 'HDNRA'

Title: High-Dimensional Location Testing with Normal-Reference Approaches
Description: We provide a collection of various classical tests and latest normal-reference tests for comparing high-dimensional mean vectors including two-sample and general linear hypothesis testing (GLHT) problem. Some existing tests for two-sample problem [see Bai, Zhidong, and Hewa Saranadasa.(1996) <https://www.jstor.org/stable/24306018>; Chen, Song Xi, and Ying-Li Qin.(2010) <doi:10.1214/09-aos716>; Srivastava, Muni S., and Meng Du.(2008) <doi:10.1016/j.jmva.2006.11.002>; Srivastava, Muni S., Shota Katayama, and Yutaka Kano.(2013)<doi:10.1016/j.jmva.2012.08.014>]. Normal-reference tests for two-sample problem [see Zhang, Jin-Ting, Jia Guo, Bu Zhou, and Ming-Yen Cheng.(2020) <doi:10.1080/01621459.2019.1604366>; Zhang, Jin-Ting, Bu Zhou, Jia Guo, and Tianming Zhu.(2021) <doi:10.1016/j.jspi.2020.11.008>; Zhang, Liang, Tianming Zhu, and Jin-Ting Zhang.(2020) <doi:10.1016/j.ecosta.2019.12.002>; Zhang, Liang, Tianming Zhu, and Jin-Ting Zhang.(2023) <doi:10.1080/02664763.2020.1834516>; Zhang, Jin-Ting, and Tianming Zhu.(2022) <doi:10.1080/10485252.2021.2015768>; Zhang, Jin-Ting, and Tianming Zhu.(2022) <doi:10.1007/s42519-021-00232-w>; Zhu, Tianming, Pengfei Wang, and Jin-Ting Zhang.(2023) <doi:10.1007/s00180-023-01433-6>]. Some existing tests for GLHT problem [see Fujikoshi, Yasunori, Tetsuto Himeno, and Hirofumi Wakaki.(2004) <doi:10.14490/jjss.34.19>; Srivastava, Muni S., and Yasunori Fujikoshi.(2006) <doi:10.1016/j.jmva.2005.08.010>; Yamada, Takayuki, and Muni S. Srivastava.(2012) <doi:10.1080/03610926.2011.581786>; Schott, James R.(2007) <doi:10.1016/j.jmva.2006.11.007>; Zhou, Bu, Jia Guo, and Jin-Ting Zhang.(2017) <doi:10.1016/j.jspi.2017.03.005>]. 
Normal-reference tests for GLHT problem [see Zhang, Jin-Ting, Jia Guo, and Bu Zhou.(2017) <doi:10.1016/j.jmva.2017.01.002>; Zhang, Jin-Ting, Bu Zhou, and Jia Guo.(2022) <doi:10.1016/j.jmva.2021.104816>; Zhu, Tianming, Liang Zhang, and Jin-Ting Zhang.(2022) <doi:10.5705/ss.202020.0362>; Zhu, Tianming, and Jin-Ting Zhang.(2022) <doi:10.1007/s00180-021-01110-6>; Zhang, Jin-Ting, and Tianming Zhu.(2022) <doi:10.1016/j.csda.2021.107385>].
Authors: Pengfei Wang [aut, cre], Shuqi Luo [aut], Tianming Zhu [aut], Bu Zhou [aut]
Maintainer: Pengfei Wang <[email protected]>
License: GPL (>= 3)
Version: 2.0.1
Built: 2024-10-23 05:22:14 UTC
Source: CRAN

Help Index


Normal-approximation-based test for two-sample problem proposed by Bai and Saranadasa (1996)

Description

Bai and Saranadasa (1996)'s test for testing equality of two-sample high-dimensional mean vectors, assuming that the two covariance matrices are the same.

Usage

BS1996.TS.NABT(y1, y2)

Arguments

y1

The data matrix ($n_1 \times p$) from the first population. Each row represents a $p$-dimensional observation.

y2

The data matrix ($n_2 \times p$) from the second population. Each row represents a $p$-dimensional observation.

Details

Suppose we have two independent high-dimensional samples:

$$\boldsymbol{y}_{i1},\ldots,\boldsymbol{y}_{in_i} \;\text{are i.i.d. with}\; \operatorname{E}(\boldsymbol{y}_{i1})=\boldsymbol{\mu}_i,\; \operatorname{Cov}(\boldsymbol{y}_{i1})=\boldsymbol{\Sigma},\; i=1,2.$$

The primary objective is to test

$$H_{0}: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 \;\text{ versus }\; H_{1}: \boldsymbol{\mu}_1 \neq \boldsymbol{\mu}_2.$$

Bai and Saranadasa (1996) proposed the following centralised $L^2$-norm-based test statistic:

$$T_{BS} = \frac{n_1n_2}{n} \|\bar{\boldsymbol{y}}_1 - \bar{\boldsymbol{y}}_2\|^2-\operatorname{tr}(\hat{\boldsymbol{\Sigma}}),$$

where $n=n_1+n_2$, $\bar{\boldsymbol{y}}_{i}$, $i=1,2$, are the sample mean vectors and $\hat{\boldsymbol{\Sigma}}$ is the pooled sample covariance matrix. They showed that under the null hypothesis, $T_{BS}$ is asymptotically normally distributed.
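The statistic above can be sketched in base R on simulated data. This is only an illustration, not the package's implementation: the sample sizes, the dimension, and the use of the divisor $n-2$ for the pooled covariance are assumptions made here.

```r
# Base-R sketch of the centred L2-norm statistic T_BS on simulated data.
set.seed(1)
n1 <- 10; n2 <- 12; p <- 50            # illustrative sizes (assumed)
y1 <- matrix(rnorm(n1 * p), n1, p)     # rows are p-dimensional observations
y2 <- matrix(rnorm(n2 * p), n2, p)
n  <- n1 + n2

ybar1 <- colMeans(y1)
ybar2 <- colMeans(y2)

# tr(pooled covariance) only needs the p diagonal entries, so the
# p x p matrix is never formed. Assumes the usual divisor n - 2.
ss1 <- sum(sweep(y1, 2, ybar1)^2)      # within-sample sum of squares, sample 1
ss2 <- sum(sweep(y2, 2, ybar2)^2)      # within-sample sum of squares, sample 2
tr_Sigma_hat <- (ss1 + ss2) / (n - 2)

T_BS <- (n1 * n2 / n) * sum((ybar1 - ybar2)^2) - tr_Sigma_hat
T_BS
```

Avoiding the full covariance matrix matters when $p$ is in the tens of thousands, as in the COVID19 data.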

Value

A list of class "NRtest" containing the results of the hypothesis test. See the help file for NRtest.object for details.

References

Bai Z, Saranadasa H (1996). “Effect of high dimension: by an example of a two sample problem.” Statistica Sinica, 311–329. https://www.jstor.org/stable/24306018.

Examples

library("HDNRA")
data("COVID19")
dim(COVID19)
group1 <- as.matrix(COVID19[c(2:19, 82:87), ]) ## healthy group
group2 <- as.matrix(COVID19[-c(1:19, 82:87), ]) ## COVID-19 patients
BS1996.TS.NABT(group1,group2)

HDNRA_data corneal

Description

This dataset was acquired during a keratoconus study, a collaborative project involving Ms. Nancy Tripoli and Dr. Kenneth L. Cohen of the Department of Ophthalmology at the University of North Carolina, Chapel Hill. The fitted feature vectors for the complete corneal surface dataset are collected into a feature matrix with dimensions 150 × 2000.

Usage

data(corneal)

Format

'corneal'

A data frame with 150 observations on the following 4 groups.

normal group1

rows 1 to 43 (43 rows in total) of the feature matrix correspond to observations from the normal group

unilateral suspect group2

rows 44 to 57 (14 rows in total) of the feature matrix correspond to observations from the unilateral suspect group

suspect map group3

rows 58 to 78 (21 rows in total) of the feature matrix correspond to observations from the suspect map group

clinical keratoconus group4

rows 79 to 150 (72 rows in total) of the feature matrix correspond to observations from the clinical keratoconus group

References

Smaga Ł, Zhang J (2019). “Linear hypothesis testing with functional data.” Technometrics, 61(1), 99–110. doi:10.1080/00401706.2018.1456976.

Examples

library(HDNRA)
data(corneal)
dim(corneal)
group1 <- as.matrix(corneal[1:43, ]) ## normal group
dim(group1)
group2 <- as.matrix(corneal[44:57, ]) ## unilateral suspect group
dim(group2)
group3 <- as.matrix(corneal[58:78, ]) ## suspect map group
dim(group3)
group4 <- as.matrix(corneal[79:150, ]) ## clinical keratoconus group
dim(group4)

HDNRA_data COVID19

Description

A COVID-19 data set from NCBI with ID GSE152641. The data set profiled peripheral blood from 24 healthy controls and 62 prospectively enrolled patients with community-acquired lower respiratory tract infection by SARS-CoV-2 within the first 24 hours of hospital admission using RNA sequencing.

Usage

data(COVID19)

Format

'COVID19'

A data frame with 86 observations on the following 2 groups.

healthy group1

row 2 to row 19, and row 82 to 87, in total 24 healthy controls

patients group2

row 20 to 81, in total 62 prospectively enrolled patients

References

Thair SA, He YD, Hasin-Brumshtein Y, Sakaram S, Pandya R, Toh J, Rawling D, Remmel M, Coyle S, Dalekos GN, others (2021). “Transcriptomic similarities and differences in host response between SARS-CoV-2 and other viral infections.” Iscience, 24(1). doi:10.1016/j.isci.2020.101947.

Examples

library(HDNRA)
data(COVID19)
dim(COVID19)
group1 <- as.matrix(COVID19[c(2:19, 82:87), ]) ## healthy group
dim(group1)
group2 <- as.matrix(COVID19[-c(1:19, 82:87), ]) ## COVID-19 patients
dim(group2)

Normal-approximation-based test for two-sample BF problem proposed by Chen and Qin (2010)

Description

Chen and Qin (2010)'s test for testing equality of two-sample high-dimensional mean vectors without assuming that the two covariance matrices are the same.

Usage

CQ2010.TSBF.NABT(y1, y2)

Arguments

y1

The data matrix ($n_1 \times p$) from the first population. Each row represents a $p$-dimensional observation.

y2

The data matrix ($n_2 \times p$) from the second population. Each row represents a $p$-dimensional observation.

Details

Suppose we have two independent high-dimensional samples:

$$\boldsymbol{y}_{i1},\ldots,\boldsymbol{y}_{in_i} \;\text{are i.i.d. with}\; \operatorname{E}(\boldsymbol{y}_{i1})=\boldsymbol{\mu}_i,\; \operatorname{Cov}(\boldsymbol{y}_{i1})=\boldsymbol{\Sigma}_i,\; i=1,2.$$

The primary objective is to test

$$H_{0}: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 \;\text{ versus }\; H_{1}: \boldsymbol{\mu}_1 \neq \boldsymbol{\mu}_2.$$

Chen and Qin (2010) proposed the following test statistic:

$$T_{CQ} = \frac{\sum_{i \neq j}^{n_1} \boldsymbol{y}_{1i}^\top \boldsymbol{y}_{1j}}{n_1 (n_1 - 1)} + \frac{\sum_{i \neq j}^{n_2} \boldsymbol{y}_{2i}^\top \boldsymbol{y}_{2j}}{n_2 (n_2 - 1)} - 2 \frac{\sum_{i = 1}^{n_1} \sum_{j = 1}^{n_2} \boldsymbol{y}_{1i}^\top \boldsymbol{y}_{2j}}{n_1 n_2}.$$

They showed that under the null hypothesis, $T_{CQ}$ is asymptotically normally distributed.
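The double sums in $T_{CQ}$ do not have to be evaluated pair by pair. The sketch below, on simulated data with assumed sizes, uses the identity $\sum_{i\neq j}\boldsymbol{y}_i^\top\boldsymbol{y}_j = \|\sum_i \boldsymbol{y}_i\|^2 - \sum_i\|\boldsymbol{y}_i\|^2$ and checks it against a brute-force Gram-matrix computation. The helper `cross_sum` is a hypothetical name introduced for this illustration and is not part of the package.

```r
# Base-R sketch of T_CQ on simulated data, with a brute-force check.
set.seed(2)
n1 <- 8; n2 <- 9; p <- 40              # illustrative sizes (assumed)
y1 <- matrix(rnorm(n1 * p), n1, p)
y2 <- matrix(rnorm(n2 * p), n2, p)

# Sum over i != j of y_i' y_j, via ||sum_i y_i||^2 - sum_i ||y_i||^2.
# (cross_sum is a hypothetical helper, not an HDNRA function.)
cross_sum <- function(y) sum(colSums(y)^2) - sum(y^2)

T_CQ <- cross_sum(y1) / (n1 * (n1 - 1)) +
        cross_sum(y2) / (n2 * (n2 - 1)) -
        2 * sum(colSums(y1) * colSums(y2)) / (n1 * n2)

# Brute-force version: off-diagonal sums of the Gram matrices.
G11 <- tcrossprod(y1); G22 <- tcrossprod(y2); G12 <- y1 %*% t(y2)
T_CQ_check <- (sum(G11) - sum(diag(G11))) / (n1 * (n1 - 1)) +
              (sum(G22) - sum(diag(G22))) / (n2 * (n2 - 1)) -
              2 * sum(G12) / (n1 * n2)

c(T_CQ = T_CQ, T_CQ_check = T_CQ_check)
```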

Value

A list of class "NRtest" containing the results of the hypothesis test. See the help file for NRtest.object for details.

References

Chen SX, Qin Y (2010). “A two-sample test for high-dimensional data with applications to gene-set testing.” The Annals of Statistics, 38(2). doi:10.1214/09-aos716.

Examples

library("HDNRA")
data("COVID19")
dim(COVID19)
group1 <- as.matrix(COVID19[c(2:19, 82:87), ]) ## healthy group
group2 <- as.matrix(COVID19[-c(1:19, 82:87), ]) ## COVID-19 patients
CQ2010.TSBF.NABT(group1,group2)

Normal-approximation-based test for GLHT problem proposed by Fujikoshi et al. (2004)

Description

Fujikoshi et al. (2004)'s test for the general linear hypothesis testing (GLHT) problem for high-dimensional data, assuming that the underlying covariance matrices are the same.

Usage

FHW2004.GLHT.NABT(Y,X,C,n,p)

Arguments

Y

A list of $k$ data matrices. The $i$th element represents the data matrix ($n_i \times p$) from the $i$th population with each row representing a $p$-dimensional observation.

X

A known $n\times k$ full-rank design matrix with $\operatorname{rank}(\boldsymbol{X})=k<n$.

C

A known matrix of size $q\times k$ with $\operatorname{rank}(\boldsymbol{C})=q<k$.

n

A vector of $k$ sample sizes. The $i$th element represents the sample size of group $i$, $n_i$.

p

The dimension of data.

Details

A high-dimensional linear regression model can be expressed as

$$\boldsymbol{Y}=\boldsymbol{X\Theta}+\boldsymbol{\epsilon},$$

where $\boldsymbol{\Theta}$ is a $k\times p$ unknown parameter matrix and $\boldsymbol{\epsilon}$ is an $n\times p$ error matrix.

It is of interest to test the following GLHT problem

$$H_0: \boldsymbol{C\Theta}=\boldsymbol{0}, \quad \text{ vs. } \quad H_1: \boldsymbol{C\Theta} \neq \boldsymbol{0}.$$

Fujikoshi et al. (2004) proposed the following test statistic:

$$T_{FHW}=\sqrt{p}\left[(n-k)\frac{\operatorname{tr}(\boldsymbol{S}_h)}{\operatorname{tr}(\boldsymbol{S}_e)}-q\right],$$

where $\boldsymbol{S}_h$ and $\boldsymbol{S}_e$ are the matrices of sums of squares and products due to the hypothesis and the error, respectively.

They showed that under the null hypothesis, $T_{FHW}$ is asymptotically normally distributed.
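As a rough illustration, the two traces can be computed without forming any $p\times p$ matrix. The sketch below assumes the usual least-squares definitions, $\boldsymbol{S}_e=(\boldsymbol{Y}-\boldsymbol{X}\hat{\boldsymbol{\Theta}})^\top(\boldsymbol{Y}-\boldsymbol{X}\hat{\boldsymbol{\Theta}})$ and $\boldsymbol{S}_h=(\boldsymbol{C}\hat{\boldsymbol{\Theta}})^\top[\boldsymbol{C}(\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{C}^\top]^{-1}(\boldsymbol{C}\hat{\boldsymbol{\Theta}})$, which this help page does not spell out. The data are simulated, and `N` denotes the total sample size written as $n$ in the formula.

```r
# Base-R sketch of T_FHW for a one-way layout (simulated data).
set.seed(3)
n_i <- c(6, 7, 8)                      # group sizes (assumed)
k <- length(n_i); N <- sum(n_i); p <- 30
grp  <- rep(seq_len(k), times = n_i)
X    <- outer(grp, seq_len(k), "==") * 1   # N x k group-indicator design
Ymat <- matrix(rnorm(N * p), N, p)         # stacked N x p data matrix
q <- k - 1
C <- cbind(diag(q), -rep(1, q))            # contrasts: group i vs. group k

XtX_inv   <- solve(crossprod(X))
Theta_hat <- XtX_inv %*% crossprod(X, Ymat)  # k x p least-squares estimate
resid     <- Ymat - X %*% Theta_hat
CT        <- C %*% Theta_hat                 # q x p

# Traces under the assumed least-squares definitions of S_e and S_h.
tr_Se <- sum(resid^2)
tr_Sh <- sum(CT * solve(C %*% XtX_inv %*% t(C), CT))

T_FHW <- sqrt(p) * ((N - k) * tr_Sh / tr_Se - q)
T_FHW
```

For this contrast matrix the hypothesis is equality of all group means, so `tr_Sh` coincides with the between-group sum of squares.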

Value

A list of class "NRtest" containing the results of the hypothesis test. See the help file for NRtest.object for details.

References

Fujikoshi Y, Himeno T, Wakaki H (2004). “Asymptotic results of a high dimensional MANOVA test and power comparison when the dimension is large compared to the sample size.” Journal of the Japan Statistical Society, 34(1), 19–26. doi:10.14490/jjss.34.19.

Examples

library("HDNRA")
data("corneal")
dim(corneal)
group1 <- as.matrix(corneal[1:43, ]) ## normal group
group2 <- as.matrix(corneal[44:57, ]) ## unilateral suspect group
group3 <- as.matrix(corneal[58:78, ]) ## suspect map group
group4 <- as.matrix(corneal[79:150, ]) ## clinical keratoconus group
p <- dim(corneal)[2]
Y <- list()
k <- 4
Y[[1]] <- group1
Y[[2]] <- group2
Y[[3]] <- group3
Y[[4]] <- group4
n <- c(nrow(Y[[1]]),nrow(Y[[2]]),nrow(Y[[3]]),nrow(Y[[4]]))
X <- matrix(c(rep(1,n[1]),rep(0,sum(n)),rep(1,n[2]), rep(0,sum(n)),
            rep(1,n[3]),rep(0,sum(n)),rep(1,n[4])),ncol=k,nrow=sum(n))
q <- k-1
C <- cbind(diag(q),-rep(1,q))
FHW2004.GLHT.NABT(Y,X,C,n,p)
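The construction of X in the example relies on column-major filling of `matrix()`, which can be hard to read. The following sketch, using the corneal group sizes from the example, shows that it is the group-indicator matrix and gives an equivalent construction. The name `X_alt` is introduced here for illustration only.

```r
# The design matrix from the example versus an explicit indicator matrix.
n <- c(43, 14, 21, 72)                 # corneal group sizes from the example
k <- length(n)

# Construction used in the example: blocks of ones separated by sum(n) zeros,
# filled column by column.
X_doc <- matrix(c(rep(1, n[1]), rep(0, sum(n)), rep(1, n[2]), rep(0, sum(n)),
                  rep(1, n[3]), rep(0, sum(n)), rep(1, n[4])),
                ncol = k, nrow = sum(n))

# Equivalent construction: row r has a 1 in the column of its group.
grp   <- rep(seq_len(k), times = n)
X_alt <- outer(grp, seq_len(k), "==") * 1

# Contrast matrix: each of the first k - 1 groups against the last group.
q <- k - 1
C <- cbind(diag(q), -rep(1, q))

all(X_doc == X_alt)
```

With this X, the rows of the parameter matrix are the group mean vectors, so the hypothesis defined by C states that all four group means are equal.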

S3 Class "NRtest"

Description

The "NRtest" objects provide a comprehensive summary of hypothesis test outcomes, including test statistics, p-values, parameter estimates, and confidence intervals, if applicable.

Usage

NRtest.object(
  statistic,
  p.value,
  method,
  null.value,
  alternative,
  parameter = NULL,
  sample.size = NULL,
  sample.dimension = NULL,
  estimation.method = NULL,
  data.name = NULL,
  ...
)

Arguments

statistic

Numeric scalar containing the value of the test statistic, with a names attribute indicating the name of the test statistic.

p.value

Numeric scalar containing the p-value for the test.

method

Character string giving the name of the test.

null.value

Character string indicating the null hypothesis.

alternative

Character string indicating the alternative hypothesis.

parameter

Numeric vector containing the estimated approximation parameter(s) associated with the approximation method. This vector has a names attribute describing its element(s).

sample.size

Numeric vector containing the number of observations in each group used for the hypothesis test.

sample.dimension

Numeric scalar containing the dimension of the dataset used for the hypothesis test.

estimation.method

Character string giving the name of the approximation approach used to approximate the null distribution of the test statistic.

data.name

Character string describing the data set used in the hypothesis test.

...

Additional optional arguments.

Details

A class of objects returned by high-dimensional hypothesis testing functions in the HDNRA package, designed to encapsulate detailed results from statistical hypothesis tests. These objects are structured similarly to htest objects in the package EnvStats but are tailored to the needs of the HDNRA package.

Value

An object of class "NRtest" containing both required and optional components depending on the specifics of the hypothesis test, shown as follows:

Required Components

These components must be present in every "NRtest" object:

statistic

Must be present.

p.value

Must be present.

null.value

Must be present.

alternative

Must be present.

method

Must be present.

Optional Components

These components are included depending on the specifics of the hypothesis test performed:

parameter

May be present.

sample.size

May be present.

sample.dimension

May be present.

estimation.method

May be present.

data.name

May be present.

Methods

The class has the following methods:

print.NRtest

Printing the contents of the NRtest object in a human-readable form.

Examples

# Example 1: Using Bai and Saranadasa (1996)'s test (two-sample problem)
NRtest.obj1 <- NRtest.object(
  statistic = c("T[BS]" = 2.208),
  p.value = 0.0136,
  method = "Bai and Saranadasa (1996)'s test",
  data.name = "group1 and group2",
  null.value = c("Difference between two mean vectors is 0"),
  alternative = "Difference between two mean vectors is not 0",
  parameter = NULL,
  sample.size = c(n1 = 24, n2 = 26),
  sample.dimension = 20460,
  estimation.method = "Normal approximation"
)
print(NRtest.obj1)

# Example 2: Using Fujikoshi et al. (2004)'s test (GLHT problem)
NRtest.obj2 <- NRtest.object(
  statistic = c("T[FHW]" = 6.4015),
  p.value = 0,
  method = "Fujikoshi et al. (2004)'s test",
  data.name = "Y",
  null.value  = "The general linear hypothesis is true",
  alternative = "The general linear hypothesis is not true",
  parameter = NULL,
  sample.size = c(n1 = 43, n2 = 14, n3 = 21, n4 = 72),
  sample.dimension = 2000,
  estimation.method = "Normal approximation"
)
print(NRtest.obj2)

Print Method for S3 Class "NRtest"

Description

Prints the details of the NRtest object in a user-friendly manner. This method provides a clear and concise presentation of the test results contained within the NRtest object, including all relevant statistical metrics and test details.

Usage

## S3 method for class 'NRtest'
print(x, ...)

Arguments

x

an NRtest object.

...

further arguments passed to or from other methods.

Details

The print.NRtest function formats and presents the contents of the NRtest object, which includes statistical test results and related parameters. This function is designed to provide a user-friendly display of the object's contents, making it easier to understand the results of the analysis.

Value

Invisibly returns the input x.
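The convention of a print method that formats an object and returns it invisibly can be sketched with a toy S3 class. The class name `toytest` and the constructor `new_toytest` are hypothetical and are not part of HDNRA; the actual layout produced by print.NRtest may differ.

```r
# Toy S3 class illustrating a print method that returns its input invisibly.
# "toytest" and new_toytest() are hypothetical names, not HDNRA objects.
new_toytest <- function(statistic, p.value, method) {
  structure(
    list(statistic = statistic, p.value = p.value, method = method),
    class = "toytest"
  )
}

print.toytest <- function(x, ...) {
  cat(x$method, "\n")
  cat(names(x$statistic), "=", format(x$statistic),
      ", p-value =", format(x$p.value), "\n")
  invisible(x)                         # lets callers chain or reassign the object
}

obj <- new_toytest(c("T" = 2.208), 0.0136, "Toy test")
res <- withVisible(print(obj))         # capture the value and its visibility
res$visible
```

Returning the object invisibly means that `y <- print(obj)` works without printing the object a second time at the console.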

Author(s)

Pengfei Wang [email protected]

See Also

NRtest.object


Normal-approximation-based test for one-way MANOVA problem proposed by Schott (2007)

Description

Schott (2007)'s test for the one-way MANOVA problem for high-dimensional data, assuming that the underlying covariance matrices are the same.

Usage

S2007.ks.NABT(Y, n, p)

Arguments

Y

A list of $k$ data matrices. The $i$th element represents the data matrix ($n_i \times p$) from the $i$th population with each row representing a $p$-dimensional observation.

n

A vector of $k$ sample sizes. The $i$th element represents the sample size of group $i$, $n_i$.

p

The dimension of data.

Details

Suppose we have the following $k$ independent high-dimensional samples:

$$\boldsymbol{y}_{i1},\ldots,\boldsymbol{y}_{in_i} \;\text{are i.i.d. with}\; \operatorname{E}(\boldsymbol{y}_{i1})=\boldsymbol{\mu}_i,\; \operatorname{Cov}(\boldsymbol{y}_{i1})=\boldsymbol{\Sigma},\; i=1,\ldots,k.$$

It is of interest to test the following one-way MANOVA problem:

$$H_0: \boldsymbol{\mu}_1=\cdots=\boldsymbol{\mu}_k, \quad \text{ vs. } \quad H_1: H_0 \text{ is not true}.$$

Schott (2007) proposed the following test statistic:

$$T_{S}=[\operatorname{tr}(\boldsymbol{H})/h-\operatorname{tr}(\boldsymbol{E})/e]/\sqrt{N-1},$$

where $\boldsymbol{H}=\sum_{i=1}^kn_i(\bar{\boldsymbol{y}}_i-\bar{\boldsymbol{y}})(\bar{\boldsymbol{y}}_i-\bar{\boldsymbol{y}})^\top$, $\boldsymbol{E}=\sum_{i=1}^k\sum_{j=1}^{n_i}(\boldsymbol{y}_{ij}-\bar{\boldsymbol{y}}_{i})(\boldsymbol{y}_{ij}-\bar{\boldsymbol{y}}_{i})^\top$, $h=k-1$, and $e=N-k$, with $N=n_1+\cdots+n_k$. Schott (2007) showed that under the null hypothesis, $T_{S}$ is asymptotically normally distributed.
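Since only the traces of $\boldsymbol{H}$ and $\boldsymbol{E}$ enter $T_S$, the statistic can be sketched in base R from sums of squares alone. The data and group sizes below are simulated assumptions, and the code is an illustration rather than the package's implementation.

```r
# Base-R sketch of Schott's T_S from between- and within-group sums of squares.
set.seed(4)
n_i <- c(5, 6, 7)                      # group sizes (assumed)
k <- length(n_i); N <- sum(n_i); p <- 25
grp  <- rep(seq_len(k), times = n_i)
Ymat <- matrix(rnorm(N * p), N, p)     # stacked N x p data matrix

grand <- colMeans(Ymat)                # overall mean vector
means <- rowsum(Ymat, grp) / n_i       # k x p matrix of group means

# tr(H): between-group sum of squares; tr(E): within-group sum of squares.
tr_H <- sum(n_i * rowSums(sweep(means, 2, grand)^2))
tr_E <- sum((Ymat - means[grp, ])^2)

h <- k - 1
e <- N - k
T_S <- (tr_H / h - tr_E / e) / sqrt(N - 1)
T_S
```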

Value

A list of class "NRtest" containing the results of the hypothesis test. See the help file for NRtest.object for details.

References

Schott JR (2007). “Some high-dimensional tests for a one-way MANOVA.” Journal of Multivariate Analysis, 98(9), 1825–1839. doi:10.1016/j.jmva.2006.11.007.

Examples

library("HDNRA")
data("corneal")
dim(corneal)
group1 <- as.matrix(corneal[1:43, ]) ## normal group
group2 <- as.matrix(corneal[44:57, ]) ## unilateral suspect group
group3 <- as.matrix(corneal[58:78, ]) ## suspect map group
group4 <- as.matrix(corneal[79:150, ]) ## clinical keratoconus group
p <- dim(corneal)[2]
Y <- list()
Y[[1]] <- group1
Y[[2]] <- group2
Y[[3]] <- group3
Y[[4]] <- group4
n <- c(nrow(Y[[1]]),nrow(Y[[2]]),nrow(Y[[3]]),nrow(Y[[4]]))
S2007.ks.NABT(Y, n, p)

Normal-approximation-based test for two-sample problem proposed by Srivastava and Du (2008)

Description

Srivastava and Du (2008)'s test for testing equality of two-sample high-dimensional mean vectors, assuming that the two covariance matrices are the same.

Usage

SD2008.TS.NABT(y1, y2)

Arguments

y1

The data matrix ($n_1 \times p$) from the first population. Each row represents a $p$-dimensional observation.

y2

The data matrix ($n_2 \times p$) from the second population. Each row represents a $p$-dimensional observation.

Details

Suppose we have two independent high-dimensional samples:

$$\boldsymbol{y}_{i1},\ldots,\boldsymbol{y}_{in_i} \;\text{are i.i.d. with}\; \operatorname{E}(\boldsymbol{y}_{i1})=\boldsymbol{\mu}_i,\; \operatorname{Cov}(\boldsymbol{y}_{i1})=\boldsymbol{\Sigma},\; i=1,2.$$

The primary objective is to test

$$H_{0}: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 \;\text{ versus }\; H_{1}: \boldsymbol{\mu}_1 \neq \boldsymbol{\mu}_2.$$

Srivastava and Du (2008) proposed the following test statistic:

$$T_{SD} = \frac{n^{-1}n_1n_2(\bar{\boldsymbol{y}}_1 - \bar{\boldsymbol{y}}_2)^\top \boldsymbol{D}_S^{-1}(\bar{\boldsymbol{y}}_1 - \bar{\boldsymbol{y}}_2) - \frac{(n-2)p}{n-4}}{\sqrt{2 \left[\operatorname{tr}(\boldsymbol{R}^2) - \frac{p^2}{n-2}\right] c_{p, n}}},$$

where $n=n_1+n_2$, $\bar{\boldsymbol{y}}_{i}$, $i=1,2$, are the sample mean vectors, $\boldsymbol{D}_S$ is the diagonal matrix of sample variances, $\boldsymbol{R}$ is the sample correlation matrix and $c_{p, n}$ is the adjustment coefficient proposed by Srivastava and Du (2008). They showed that under the null hypothesis, $T_{SD}$ is asymptotically normally distributed.

Value

A list of class "NRtest" containing the results of the hypothesis test. See the help file for NRtest.object for details.

References

Srivastava MS, Du M (2008). “A test for the mean vector with fewer observations than the dimension.” Journal of Multivariate Analysis, 99(3), 386–402. doi:10.1016/j.jmva.2006.11.002.

Examples

library("HDNRA")
data("COVID19")
dim(COVID19)
group1 <- as.matrix(COVID19[c(2:19, 82:87), ]) ## healthy group
group2 <- as.matrix(COVID19[-c(1:19, 82:87), ]) ## COVID-19 patients
SD2008.TS.NABT(group1,group2)

Normal-approximation-based test for GLHT problem proposed by Srivastava and Fujikoshi (2006)

Description

Srivastava and Fujikoshi (2006)'s test for the general linear hypothesis testing (GLHT) problem for high-dimensional data, assuming that the underlying covariance matrices are the same.

Usage

SF2006.GLHT.NABT(Y,X,C,n,p)

Arguments

Y

A list of $k$ data matrices. The $i$th element represents the data matrix ($n_i \times p$) from the $i$th population with each row representing a $p$-dimensional observation.

X

A known $n\times k$ full-rank design matrix with $\operatorname{rank}(\boldsymbol{X})=k<n$.

C

A known matrix of size $q\times k$ with $\operatorname{rank}(\boldsymbol{C})=q<k$.

n

A vector of $k$ sample sizes. The $i$th element represents the sample size of group $i$, $n_i$.

p

The dimension of data.

Details

A high-dimensional linear regression model can be expressed as

$$\boldsymbol{Y}=\boldsymbol{X\Theta}+\boldsymbol{\epsilon},$$

where $\boldsymbol{\Theta}$ is a $k\times p$ unknown parameter matrix and $\boldsymbol{\epsilon}$ is an $n\times p$ error matrix.

It is of interest to test the following GLHT problem

$$H_0: \boldsymbol{C\Theta}=\boldsymbol{0}, \quad \text{ vs. } \quad H_1: \boldsymbol{C\Theta} \neq \boldsymbol{0}.$$

Srivastava and Fujikoshi (2006) proposed the following test statistic:

$$T_{SF}=\left[2q\hat{a}_2(1+(n-k)^{-1}q)\right]^{-1/2}\left[\frac{\operatorname{tr}(\boldsymbol{B})}{\sqrt{p}}-\frac{q}{\sqrt{n-k}}\frac{\operatorname{tr}(\boldsymbol{W})}{\sqrt{(n-k)p}}\right],$$

where $\boldsymbol{W}$ and $\boldsymbol{B}$ are the matrices of sums of squares and products due to the error and the hypothesis, respectively, and $\hat{a}_2=[\operatorname{tr}(\boldsymbol{W}^2)-\operatorname{tr}^2(\boldsymbol{W})/(n-k)]/[(n-k-1)(n-k+2)p]$. They showed that under the null hypothesis, $T_{SF}$ is asymptotically normally distributed.

Value

A list of class "NRtest" containing the results of the hypothesis test. See the help file for NRtest.object for details.

References

Srivastava MS, Fujikoshi Y (2006). “Multivariate analysis of variance with fewer observations than the dimension.” Journal of Multivariate Analysis, 97(9), 1927–1940. doi:10.1016/j.jmva.2005.08.010.

Examples

library("HDNRA")
data("corneal")
dim(corneal)
group1 <- as.matrix(corneal[1:43, ]) ## normal group
group2 <- as.matrix(corneal[44:57, ]) ## unilateral suspect group
group3 <- as.matrix(corneal[58:78, ]) ## suspect map group
group4 <- as.matrix(corneal[79:150, ]) ## clinical keratoconus group
p <- dim(corneal)[2]
Y <- list()
k <- 4
Y[[1]] <- group1
Y[[2]] <- group2
Y[[3]] <- group3
Y[[4]] <- group4
n <- c(nrow(Y[[1]]),nrow(Y[[2]]),nrow(Y[[3]]),nrow(Y[[4]]))
X <- matrix(c(rep(1,n[1]),rep(0,sum(n)),rep(1,n[2]), rep(0,sum(n)),
            rep(1,n[3]),rep(0,sum(n)),rep(1,n[4])),ncol=k,nrow=sum(n))
q <- k-1
C <- cbind(diag(q),-rep(1,q))
SF2006.GLHT.NABT(Y,X,C,n,p)

Normal-approximation-based test for two-sample BF problem proposed by Srivastava et al. (2013)

Description

Srivastava et al. (2013)'s test for testing equality of two-sample high-dimensional mean vectors without assuming that the two covariance matrices are the same.

Usage

SKK2013.TSBF.NABT(y1, y2)

Arguments

y1

The data matrix ($n_1 \times p$) from the first population. Each row represents a $p$-dimensional observation.

y2

The data matrix ($n_2 \times p$) from the second population. Each row represents a $p$-dimensional observation.

Details

Suppose we have two independent high-dimensional samples:

$$\boldsymbol{y}_{i1},\ldots,\boldsymbol{y}_{in_i} \;\text{are i.i.d. with}\; \operatorname{E}(\boldsymbol{y}_{i1})=\boldsymbol{\mu}_i,\; \operatorname{Cov}(\boldsymbol{y}_{i1})=\boldsymbol{\Sigma}_i,\; i=1,2.$$

The primary objective is to test

$$H_{0}: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 \;\text{ versus }\; H_{1}: \boldsymbol{\mu}_1 \neq \boldsymbol{\mu}_2.$$

Srivastava et al. (2013) proposed the following test statistic:

$$T_{SKK} = \frac{(\bar{\boldsymbol{y}}_1 - \bar{\boldsymbol{y}}_2)^\top \hat{\boldsymbol{D}}^{-1}(\bar{\boldsymbol{y}}_1 - \bar{\boldsymbol{y}}_2) - p}{\sqrt{2 \widehat{\operatorname{Var}}(\hat{q}_n) c_{p,n}}},$$

where $\bar{\boldsymbol{y}}_{i}$, $i=1,2$, are the sample mean vectors, and $\hat{\boldsymbol{D}}=\hat{\boldsymbol{D}}_1/n_1+\hat{\boldsymbol{D}}_2/n_2$ with $\hat{\boldsymbol{D}}_i$, $i=1,2$, being the diagonal matrices consisting of only the diagonal elements of the sample covariance matrices. $\widehat{\operatorname{Var}}(\hat{q}_n)$ is given by equation (1.18) in Srivastava et al. (2013), and $c_{p, n}$ is the adjustment coefficient proposed by Srivastava et al. (2013). They showed that under the null hypothesis, $T_{SKK}$ is asymptotically normally distributed.

Value

A list of class "NRtest" containing the results of the hypothesis test. See the help file for NRtest.object for details.

References

Srivastava MS, Katayama S, Kano Y (2013). “A two sample test in high dimensional data.” Journal of Multivariate Analysis, 114, 349–358. doi:10.1016/j.jmva.2012.08.014.

Examples

library("HDNRA")
data("COVID19")
dim(COVID19)
group1 <- as.matrix(COVID19[c(2:19, 82:87), ]) ## healthy group
group2 <- as.matrix(COVID19[-c(1:19, 82:87), ]) ## COVID-19 patients
SKK2013.TSBF.NABT(group1,group2)

Normal-approximation-based test for GLHT problem proposed by Yamada and Srivastava (2012)

Description

Yamada and Srivastava (2012)'s test for the general linear hypothesis testing (GLHT) problem for high-dimensional data, assuming that the underlying covariance matrices are the same.

Usage

YS2012.GLHT.NABT(Y,X,C,n,p)

Arguments

Y

A list of $k$ data matrices. The $i$th element represents the data matrix ($n_i \times p$) from the $i$th population with each row representing a $p$-dimensional observation.

X

A known $n\times k$ full-rank design matrix with $\operatorname{rank}(\boldsymbol{X})=k<n$.

C

A known matrix of size $q\times k$ with $\operatorname{rank}(\boldsymbol{C})=q<k$.

n

A vector of $k$ sample sizes. The $i$th element represents the sample size of group $i$, $n_i$.

p

The dimension of data.

Details

A high-dimensional linear regression model can be expressed as

$$\boldsymbol{Y}=\boldsymbol{X\Theta}+\boldsymbol{\epsilon},$$

where $\boldsymbol{\Theta}$ is a $k\times p$ unknown parameter matrix and $\boldsymbol{\epsilon}$ is an $n\times p$ error matrix.

It is of interest to test the following GLHT problem

$$H_0: \boldsymbol{C\Theta}=\boldsymbol{0}, \quad \text{ vs. } \quad H_1: \boldsymbol{C\Theta} \neq \boldsymbol{0}.$$

Yamada and Srivastava (2012) proposed the following test statistic:

$$T_{YS}=\frac{(n-k)\operatorname{tr}(\boldsymbol{S}_h\boldsymbol{D}_{\boldsymbol{S}_e}^{-1})-(n-k)pq/(n-k-2)}{\sqrt{2q[\operatorname{tr}(\boldsymbol{R}^2)-p^2/(n-k)]c_{p,n}}},$$

where $\boldsymbol{S}_h$ and $\boldsymbol{S}_e$ are the variation matrices due to the hypothesis and the error, respectively, $\boldsymbol{D}_{\boldsymbol{S}_e}$ is the diagonal matrix formed from the diagonal elements of $\boldsymbol{S}_e$, and $\boldsymbol{R}$ is the sample correlation matrix. $c_{p, n}$ is the adjustment coefficient proposed by Yamada and Srivastava (2012). They showed that under the null hypothesis, $T_{YS}$ is asymptotically normally distributed.

Value

A list of class "NRtest" containing the results of the hypothesis test. See the help file for NRtest.object for details.

References

Yamada T, Srivastava MS (2012). “A test for multivariate analysis of variance in high dimension.” Communications in Statistics-Theory and Methods, 41(13-14), 2602–2615. doi:10.1080/03610926.2011.581786.

Examples

library("HDNRA")
data("corneal")
dim(corneal)
group1 <- as.matrix(corneal[1:43, ]) ## normal group
group2 <- as.matrix(corneal[44:57, ]) ## unilateral suspect group
group3 <- as.matrix(corneal[58:78, ]) ## suspect map group
group4 <- as.matrix(corneal[79:150, ]) ## clinical keratoconus group
p <- dim(corneal)[2]
Y <- list()
k <- 4
Y[[1]] <- group1
Y[[2]] <- group2
Y[[3]] <- group3
Y[[4]] <- group4
n <- c(nrow(Y[[1]]),nrow(Y[[2]]),nrow(Y[[3]]),nrow(Y[[4]]))
X <- matrix(c(rep(1,n[1]),rep(0,sum(n)),rep(1,n[2]), rep(0,sum(n)),rep(1,n[3]),
            rep(0,sum(n)),rep(1,n[4])),ncol=k,nrow=sum(n))
q <- k-1
C <- cbind(diag(q),-rep(1,q))
YS2012.GLHT.NABT(Y,X,C,n,p)

Normal-reference-test with two-cumulant (2-c) matched $\chi^2$-approximation for GLHT problem proposed by Zhang et al. (2017)

Description

Zhang et al. (2017)'s test for the general linear hypothesis testing (GLHT) problem for high-dimensional data, assuming that the underlying covariance matrices are the same.

Usage

ZGZ2017.GLHT.2cNRT(Y,G,n,p)

Arguments

Y

A list of $k$ data matrices. The $i$th element represents the data matrix ($n_i\times p$) from the $i$th population with each row representing a $p$-dimensional observation.

G

A known full-rank coefficient matrix ($q\times k$) with $\operatorname{rank}(\boldsymbol{G})<k$.

n

A vector of $k$ sample sizes. The $i$th element represents the sample size of group $i$, $n_i$.

p

The dimension of data.

Details

Suppose we have the following $k$ independent high-dimensional samples:

$$\boldsymbol{y}_{i1},\ldots,\boldsymbol{y}_{in_i} \;\text{are i.i.d. with}\; \operatorname{E}(\boldsymbol{y}_{i1})=\boldsymbol{\mu}_i,\; \operatorname{Cov}(\boldsymbol{y}_{i1})=\boldsymbol{\Sigma},\; i=1,\ldots,k.$$

It is of interest to test the following GLHT problem:

$$H_0: \boldsymbol{G M}=\boldsymbol{0} \quad \text{vs.} \quad H_1: \boldsymbol{G M} \neq \boldsymbol{0},$$

where $\boldsymbol{M}=(\boldsymbol{\mu}_1,\ldots,\boldsymbol{\mu}_k)^\top$ is a $k\times p$ matrix collecting $k$ mean vectors and $\boldsymbol{G}:q\times k$ is a known full-rank coefficient matrix with $\operatorname{rank}(\boldsymbol{G})<k$.

Zhang et al. (2017) proposed the following test statistic:

$$T_{ZGZ}=\|\boldsymbol{C} \hat{\boldsymbol{\mu}}\|^2,$$

where $\boldsymbol{C}=[(\boldsymbol{G D G}^\top)^{-1/2}\boldsymbol{G}]\otimes\boldsymbol{I}_p$, and $\hat{\boldsymbol{\mu}}=(\bar{\boldsymbol{y}}_1^\top,\ldots,\bar{\boldsymbol{y}}_k^\top)^\top$, with $\bar{\boldsymbol{y}}_{i},i=1,\ldots,k$ being the sample mean vectors and $\boldsymbol{D}=\operatorname{diag}(1/n_1,\ldots,1/n_k)$.

They showed that under the null hypothesis, $T_{ZGZ}$ and a chi-squared-type mixture have the same normal or non-normal limiting distribution.
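
As a rough illustration of the statistic above (not the package's internal code), the squared norm can be evaluated without forming the Kronecker product, because $\|\boldsymbol{C}\hat{\boldsymbol{\mu}}\|^2=\operatorname{tr}(\hat{\boldsymbol{M}}^\top\boldsymbol{H}\hat{\boldsymbol{M}})$ with $\boldsymbol{H}=\boldsymbol{G}^\top(\boldsymbol{G D G}^\top)^{-1}\boldsymbol{G}$ and $\hat{\boldsymbol{M}}$ the $k\times p$ matrix of sample means. The helper name `glht_norm_stat` and the simulated data are assumptions made for this sketch.

```r
# Hypothetical helper (not part of HDNRA): evaluates ||C mu_hat||^2 through
# the identity tr(M_hat' H M_hat), with H = G' (G D G')^{-1} G.
glht_norm_stat <- function(Y, G, n) {
  M_hat <- t(sapply(Y, colMeans))              # k x p matrix of sample means
  D <- diag(1 / n)                             # D = diag(1/n_1, ..., 1/n_k)
  H <- t(G) %*% solve(G %*% D %*% t(G)) %*% G  # k x k, symmetric
  sum((H %*% M_hat) * M_hat)                   # equals tr(M_hat' H M_hat)
}

set.seed(1)
k <- 3; p <- 5; n <- c(6, 8, 7)
Y <- lapply(n, function(ni) matrix(rnorm(ni * p), ni, p))  # simulated data
G <- cbind(diag(k - 1), rep(-1, k - 1))
T_fast <- glht_norm_stat(Y, G, n)

# Brute-force check with the explicit Kronecker product (small p only).
A <- G %*% diag(1 / n) %*% t(G)
e <- eigen(A, symmetric = TRUE)
A_inv_sqrt <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
C <- kronecker(A_inv_sqrt %*% G, diag(p))
mu_hat <- unlist(lapply(Y, colMeans))          # stacked means, length k * p
T_brute <- sum((C %*% mu_hat)^2)
```

The trace form only needs $k\times k$ and $k\times p$ matrices, which matters when $p$ is large; the Kronecker version is kept here purely as a check.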

Value

A list of class "NRtest" containing the results of the hypothesis test. See the help file for NRtest.object for details.

References

Zhang J, Guo J, Zhou B (2017). “Linear hypothesis testing in high-dimensional one-way MANOVA.” Journal of Multivariate Analysis, 155, 200–216. doi:10.1016/j.jmva.2017.01.002.

Examples

library("HDNRA")
data("corneal")
dim(corneal)
group1 <- as.matrix(corneal[1:43, ]) ## normal group
group2 <- as.matrix(corneal[44:57, ]) ## unilateral suspect group
group3 <- as.matrix(corneal[58:78, ]) ## suspect map group
group4 <- as.matrix(corneal[79:150, ]) ## clinical keratoconus group
p <- dim(corneal)[2]
Y <- list()
k <- 4
Y[[1]] <- group1
Y[[2]] <- group2
Y[[3]] <- group3
Y[[4]] <- group4
n <- c(nrow(Y[[1]]),nrow(Y[[2]]),nrow(Y[[3]]),nrow(Y[[4]]))
G <- cbind(diag(k-1),rep(-1,k-1))
ZGZ2017.GLHT.2cNRT(Y,G,n,p)

Normal-approximation-based test for GLHT problem under heteroscedasticity proposed by Zhou et al. (2017)

Description

Zhou et al. (2017)'s test for general linear hypothesis testing (GLHT) problem for high-dimensional data under heteroscedasticity.

Usage

ZGZ2017.GLHTBF.NABT(Y,G,n,p)

Arguments

Y

A list of $k$ data matrices. The $i$th element represents the data matrix ($n_i\times p$) from the $i$th population with each row representing a $p$-dimensional observation.

G

A known full-rank coefficient matrix ($q\times k$) with $\operatorname{rank}(\boldsymbol{G})<k$.

n

A vector of $k$ sample sizes. The $i$th element represents the sample size of group $i$, $n_i$.

p

The dimension of data.

Details

Suppose we have the following $k$ independent high-dimensional samples:

$$\boldsymbol{y}_{i1},\ldots,\boldsymbol{y}_{in_i} \;\text{are i.i.d. with}\; \operatorname{E}(\boldsymbol{y}_{i1})=\boldsymbol{\mu}_i,\; \operatorname{Cov}(\boldsymbol{y}_{i1})=\boldsymbol{\Sigma}_i,\; i=1,\ldots,k.$$

It is of interest to test the following GLHT problem:

$$H_0: \boldsymbol{G M}=\boldsymbol{0} \quad \text{vs.} \quad H_1: \boldsymbol{G M} \neq \boldsymbol{0},$$

where $\boldsymbol{M}=(\boldsymbol{\mu}_1,\ldots,\boldsymbol{\mu}_k)^\top$ is a $k\times p$ matrix collecting $k$ mean vectors and $\boldsymbol{G}:q\times k$ is a known full-rank coefficient matrix with $\operatorname{rank}(\boldsymbol{G})<k$.

Let $\bar{\boldsymbol{y}}_{i},i=1,\ldots,k$ be the sample mean vectors and $\hat{\boldsymbol{\Sigma}}_i,i=1,\ldots,k$ be the sample covariance matrices.

Zhou et al. (2017) proposed the following U-statistic based test statistic:

$$T_{ZGZ}=\|\boldsymbol{C} \hat{\boldsymbol{\mu}}\|^2-\sum_{i=1}^k h_{ii}\operatorname{tr}(\hat{\boldsymbol{\Sigma}}_i)/n_i,$$

where $\boldsymbol{C}=[(\boldsymbol{G D G}^\top)^{-1/2}\boldsymbol{G}]\otimes\boldsymbol{I}_p$, $\boldsymbol{D}=\operatorname{diag}(1/n_1,\ldots,1/n_k)$, and $h_{ij}$ is the $(i,j)$th entry of the $k\times k$ matrix $\boldsymbol{H}=\boldsymbol{G}^\top(\boldsymbol{G}\boldsymbol{D}\boldsymbol{G}^\top)^{-1}\boldsymbol{G}$.

They showed that under the null hypothesis, $T_{ZGZ}$ is asymptotically normally distributed.
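
The centred statistic above can be sketched in a few lines of R. This is only an illustration under stated assumptions: the helper name `glhtbf_centered_stat` and the simulated data are invented for the example, and the package's own implementation may differ.

```r
# Hypothetical helper (not part of HDNRA): ||C mu_hat||^2 minus the
# correction term sum_i h_ii * tr(Sigma_hat_i) / n_i.
glhtbf_centered_stat <- function(Y, G, n) {
  M_hat <- t(sapply(Y, colMeans))                        # k x p sample means
  H <- t(G) %*% solve(G %*% diag(1 / n) %*% t(G)) %*% G  # k x k
  # tr(Sigma_hat_i) is the sum of the p column variances, so the p x p
  # sample covariance matrix never has to be formed.
  tr_S <- sapply(Y, function(y) sum(apply(y, 2, var)))
  sum((H %*% M_hat) * M_hat) - sum(diag(H) * tr_S / n)
}

set.seed(2)
k <- 3; p <- 50; n <- c(10, 12, 15)
Y <- lapply(n, function(ni) matrix(rnorm(ni * p), ni, p))  # simulated data
G <- cbind(diag(k - 1), rep(-1, k - 1))
T_centered <- glhtbf_centered_stat(Y, G, n)
```

Unlike the uncentred norm, this quantity can be negative, since the subtracted term removes the part of $\|\boldsymbol{C}\hat{\boldsymbol{\mu}}\|^2$ that is due to sampling variability alone.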

Value

A list of class "NRtest" containing the results of the hypothesis test. See the help file for NRtest.object for details.

References

Zhou B, Guo J, Zhang J (2017). “High-dimensional general linear hypothesis testing under heteroscedasticity.” Journal of Statistical Planning and Inference, 188, 36–54. doi:10.1016/j.jspi.2017.03.005.

Examples

library("HDNRA")
data("corneal")
dim(corneal)
group1 <- as.matrix(corneal[1:43, ]) ## normal group
group2 <- as.matrix(corneal[44:57, ]) ## unilateral suspect group
group3 <- as.matrix(corneal[58:78, ]) ## suspect map group
group4 <- as.matrix(corneal[79:150, ]) ## clinical keratoconus group
p <- dim(corneal)[2]
Y <- list()
k <- 4
Y[[1]] <- group1
Y[[2]] <- group2
Y[[3]] <- group3
Y[[4]] <- group4
n <- c(nrow(Y[[1]]),nrow(Y[[2]]),nrow(Y[[3]]),nrow(Y[[4]]))
G <- cbind(diag(k-1),rep(-1,k-1))
ZGZ2017.GLHTBF.NABT(Y,G,n,p)

Normal-reference-test with two-cumulant (2-c) matched $\chi^2$-approximation for two-sample problem proposed by Zhang et al. (2020)

Description

Zhang et al. (2020)'s test for testing equality of two-sample high-dimensional mean vectors, assuming that the two covariance matrices are the same.

Usage

ZGZC2020.TS.2cNRT(y1, y2)

Arguments

y1

The data matrix ($n_1 \times p$) from the first population. Each row represents a $p$-dimensional observation.

y2

The data matrix ($n_2 \times p$) from the second population. Each row represents a $p$-dimensional observation.

Details

Suppose we have two independent high-dimensional samples:

$$\boldsymbol{y}_{i1},\ldots,\boldsymbol{y}_{in_i} \;\text{are i.i.d. with}\; \operatorname{E}(\boldsymbol{y}_{i1})=\boldsymbol{\mu}_i,\; \operatorname{Cov}(\boldsymbol{y}_{i1})=\boldsymbol{\Sigma},\; i=1,2.$$

The primary objective is to test

$$H_{0}: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 \quad \text{versus} \quad H_{1}: \boldsymbol{\mu}_1 \neq \boldsymbol{\mu}_2.$$

Zhang et al. (2020) proposed the following test statistic:

$$T_{ZGZC} = \frac{n_1n_2}{n} \|\bar{\boldsymbol{y}}_1 - \bar{\boldsymbol{y}}_2\|^2,$$

where $\bar{\boldsymbol{y}}_{i},i=1,2$ are the sample mean vectors and $n=n_1+n_2$. They showed that under the null hypothesis, $T_{ZGZC}$ and a chi-squared-type mixture have the same normal or non-normal limiting distribution.
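
For orientation, this statistic appears to coincide with the GLHT statistic $\|\boldsymbol{C}\hat{\boldsymbol{\mu}}\|^2$ documented for ZGZ2017.GLHT.2cNRT when $k=2$ and $\boldsymbol{G}=(1,-1)$. A short check, assuming $n=n_1+n_2$ and $\boldsymbol{D}=\operatorname{diag}(1/n_1,1/n_2)$:

```latex
\boldsymbol{G}\boldsymbol{D}\boldsymbol{G}^\top
  = \frac{1}{n_1}+\frac{1}{n_2}
  = \frac{n}{n_1 n_2},
\qquad
\|\boldsymbol{C}\hat{\boldsymbol{\mu}}\|^2
  = \Big(\frac{n}{n_1 n_2}\Big)^{-1}
    \|\bar{\boldsymbol{y}}_1-\bar{\boldsymbol{y}}_2\|^2
  = \frac{n_1 n_2}{n}\,
    \|\bar{\boldsymbol{y}}_1-\bar{\boldsymbol{y}}_2\|^2 .
```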

Value

A list of class "NRtest" containing the results of the hypothesis test. See the help file for NRtest.object for details.

References

Zhang J, Guo J, Zhou B, Cheng M (2020). “A simple two-sample test in high dimensions based on $L^2$-norm.” Journal of the American Statistical Association, 115(530), 1011–1027. doi:10.1080/01621459.2019.1604366.

Examples

library("HDNRA")
data("COVID19")
dim(COVID19)
group1 <- as.matrix(COVID19[c(2:19, 82:87), ]) ## healthy group
group2 <- as.matrix(COVID19[-c(1:19, 82:87), ]) ## COVID-19 patients
ZGZC2020.TS.2cNRT(group1, group2)

Normal-reference-test with two-cumulant (2-c) matched $\chi^2$-approximation for two-sample BF problem proposed by Zhu et al. (2023)

Description

Zhu et al. (2023)'s test for testing equality of two-sample high-dimensional mean vectors without assuming that two covariance matrices are the same.

Usage

ZWZ2023.TSBF.2cNRT(y1, y2)

Arguments

y1

The data matrix ($n_1 \times p$) from the first population. Each row represents a $p$-dimensional observation.

y2

The data matrix ($n_2 \times p$) from the second population. Each row represents a $p$-dimensional observation.

Details

Suppose we have two independent high-dimensional samples:

$$\boldsymbol{y}_{i1},\ldots,\boldsymbol{y}_{in_i} \;\text{are i.i.d. with}\; \operatorname{E}(\boldsymbol{y}_{i1})=\boldsymbol{\mu}_i,\; \operatorname{Cov}(\boldsymbol{y}_{i1})=\boldsymbol{\Sigma}_i,\; i=1,2.$$

The primary objective is to test

$$H_{0}: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 \quad \text{versus} \quad H_{1}: \boldsymbol{\mu}_1 \neq \boldsymbol{\mu}_2.$$

Zhu et al. (2023) proposed the following test statistic:

$$T_{ZWZ}=\frac{n_1n_2n^{-1}\|\bar{\boldsymbol{y}}_1-\bar{\boldsymbol{y}}_2\|^2}{\operatorname{tr}(\hat{\boldsymbol{\Omega}}_n)},$$

where $\bar{\boldsymbol{y}}_{i},i=1,2$ are the sample mean vectors, $n=n_1+n_2$, and $\hat{\boldsymbol{\Omega}}_n$ is the estimator of $\operatorname{Cov}[(n_1n_2/n)^{1/2}(\bar{\boldsymbol{y}}_1-\bar{\boldsymbol{y}}_2)]$. They showed that under the null hypothesis, $T_{ZWZ}$ and an F-type mixture have the same normal or non-normal limiting distribution.
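
To make the target of $\hat{\boldsymbol{\Omega}}_n$ concrete, the covariance being estimated can be written out from the independence of the two samples, assuming $n=n_1+n_2$. A natural plug-in estimator would replace each $\boldsymbol{\Sigma}_i$ by the corresponding sample covariance matrix, although this help page does not state which estimator the function uses.

```latex
\operatorname{Cov}\!\Big[\Big(\frac{n_1 n_2}{n}\Big)^{1/2}
  (\bar{\boldsymbol{y}}_1-\bar{\boldsymbol{y}}_2)\Big]
  = \frac{n_1 n_2}{n}
    \Big(\frac{\boldsymbol{\Sigma}_1}{n_1}+\frac{\boldsymbol{\Sigma}_2}{n_2}\Big)
  = \frac{n_2}{n}\boldsymbol{\Sigma}_1+\frac{n_1}{n}\boldsymbol{\Sigma}_2 .
```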

Value

A list of class "NRtest" containing the results of the hypothesis test. See the help file for NRtest.object for details.

References

Zhu T, Wang P, Zhang J (2023). “Two-sample Behrens–Fisher problems for high-dimensional data: a normal reference F-type test.” Computational Statistics, 1–24. doi:10.1007/s00180-023-01433-6.

Examples

library("HDNRA")
data("COVID19")
dim(COVID19)
group1 <- as.matrix(COVID19[c(2:19, 82:87), ]) ## healthy group
group2 <- as.matrix(COVID19[-c(1:19, 82:87), ]) ## COVID-19 patients
ZWZ2023.TSBF.2cNRT(group1, group2)

Normal-reference-test with three-cumulant (3-c) matched $\chi^2$-approximation for GLHT problem proposed by Zhu and Zhang (2022)

Description

Zhu and Zhang (2022)'s test for the general linear hypothesis testing (GLHT) problem for high-dimensional data, assuming that the underlying covariance matrices are the same.

Usage

ZZ2022.GLHT.3cNRT(Y,G,n,p)

Arguments

Y

A list of $k$ data matrices. The $i$th element represents the data matrix ($n_i\times p$) from the $i$th population with each row representing a $p$-dimensional observation.

G

A known full-rank coefficient matrix ($q\times k$) with $\operatorname{rank}(\boldsymbol{G})<k$.

n

A vector of $k$ sample sizes. The $i$th element represents the sample size of group $i$, $n_i$.

p

The dimension of data.

Details

Suppose we have the following $k$ independent high-dimensional samples:

$$\boldsymbol{y}_{i1},\ldots,\boldsymbol{y}_{in_i} \;\text{are i.i.d. with}\; \operatorname{E}(\boldsymbol{y}_{i1})=\boldsymbol{\mu}_i,\; \operatorname{Cov}(\boldsymbol{y}_{i1})=\boldsymbol{\Sigma},\; i=1,\ldots,k.$$

It is of interest to test the following GLHT problem:

$$H_0: \boldsymbol{G M}=\boldsymbol{0} \quad \text{vs.} \quad H_1: \boldsymbol{G M} \neq \boldsymbol{0},$$

where $\boldsymbol{M}=(\boldsymbol{\mu}_1,\ldots,\boldsymbol{\mu}_k)^\top$ is a $k\times p$ matrix collecting $k$ mean vectors and $\boldsymbol{G}:q\times k$ is a known full-rank coefficient matrix with $\operatorname{rank}(\boldsymbol{G})<k$.

Zhu and Zhang (2022) proposed the following test statistic:

$$T_{ZZ}=\|\boldsymbol{C} \hat{\boldsymbol{\mu}}\|^2-q \operatorname{tr}(\hat{\boldsymbol{\Sigma}}),$$

where $\boldsymbol{C}=[(\boldsymbol{G D G}^\top)^{-1/2}\boldsymbol{G}]\otimes\boldsymbol{I}_p$ with $\boldsymbol{D}=\operatorname{diag}(1/n_1,\ldots,1/n_k)$, and $\hat{\boldsymbol{\mu}}=(\bar{\boldsymbol{y}}_1^\top,\ldots,\bar{\boldsymbol{y}}_k^\top)^\top$, with $\bar{\boldsymbol{y}}_{i},i=1,\ldots,k$ being the sample mean vectors and $\hat{\boldsymbol{\Sigma}}$ being the usual pooled sample covariance matrix of the $k$ samples.

They showed that under the null hypothesis, $T_{ZZ}$ and a chi-squared-type mixture have the same normal or non-normal limiting distribution.
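
The centring term $q\operatorname{tr}(\hat{\boldsymbol{\Sigma}})$ can be motivated by a short calculation. The sketch below assumes $\boldsymbol{D}=\operatorname{diag}(1/n_1,\ldots,1/n_k)$, as in the other GLHT entries, and writes $\boldsymbol{\mu}=(\boldsymbol{\mu}_1^\top,\ldots,\boldsymbol{\mu}_k^\top)^\top$ for the stacked mean vector; it shows that $q\operatorname{tr}(\boldsymbol{\Sigma})$ is the null expectation of $\|\boldsymbol{C}\hat{\boldsymbol{\mu}}\|^2$.

```latex
\begin{aligned}
\operatorname{Cov}(\hat{\boldsymbol{\mu}})
  &= \boldsymbol{D}\otimes\boldsymbol{\Sigma},
  \qquad
  \operatorname{E}(\boldsymbol{C}\hat{\boldsymbol{\mu}})
  = \boldsymbol{C}\boldsymbol{\mu} = \boldsymbol{0}
  \ \text{under } H_0 \ (\text{since } \boldsymbol{G}\boldsymbol{M}=\boldsymbol{0}),\\
\operatorname{E}\|\boldsymbol{C}\hat{\boldsymbol{\mu}}\|^2
  &= \operatorname{tr}\{\boldsymbol{C}(\boldsymbol{D}\otimes\boldsymbol{\Sigma})\boldsymbol{C}^\top\}
   = \operatorname{tr}\{[(\boldsymbol{G}\boldsymbol{D}\boldsymbol{G}^\top)^{-1/2}
       \boldsymbol{G}\boldsymbol{D}\boldsymbol{G}^\top
       (\boldsymbol{G}\boldsymbol{D}\boldsymbol{G}^\top)^{-1/2}]\otimes\boldsymbol{\Sigma}\}\\
  &= \operatorname{tr}(\boldsymbol{I}_q)\operatorname{tr}(\boldsymbol{\Sigma})
   = q\operatorname{tr}(\boldsymbol{\Sigma}).
\end{aligned}
```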

Value

A list of class "NRtest" containing the results of the hypothesis test. See the help file for NRtest.object for details.

References

Zhu T, Zhang J (2022). “Linear hypothesis testing in high-dimensional one-way MANOVA: a new normal reference approach.” Computational Statistics, 37(1), 1–27. doi:10.1007/s00180-021-01110-6.

Examples

library("HDNRA")
data("corneal")
dim(corneal)
group1 <- as.matrix(corneal[1:43, ]) ## normal group
group2 <- as.matrix(corneal[44:57, ]) ## unilateral suspect group
group3 <- as.matrix(corneal[58:78, ]) ## suspect map group
group4 <- as.matrix(corneal[79:150, ]) ## clinical keratoconus group
p <- dim(corneal)[2]
Y <- list()
k <- 4
Y[[1]] <- group1
Y[[2]] <- group2
Y[[3]] <- group3
Y[[4]] <- group4
n <- c(nrow(Y[[1]]),nrow(Y[[2]]),nrow(Y[[3]]),nrow(Y[[4]]))
G <- cbind(diag(k-1),rep(-1,k-1))
ZZ2022.GLHT.3cNRT(Y,G,n,p)

Normal-reference-test with three-cumulant (3-c) matched $\chi^2$-approximation for GLHT problem under heteroscedasticity proposed by Zhang and Zhu (2022)

Description

Zhang and Zhu (2022)'s test for general linear hypothesis testing (GLHT) problem for high-dimensional data under heteroscedasticity.

Usage

ZZ2022.GLHTBF.3cNRT(Y,G,n,p)

Arguments

Y

A list of $k$ data matrices. The $i$th element represents the data matrix ($n_i\times p$) from the $i$th population with each row representing a $p$-dimensional observation.

G

A known full-rank coefficient matrix ($q\times k$) with $\operatorname{rank}(\boldsymbol{G})<k$.

n

A vector of $k$ sample sizes. The $i$th element represents the sample size of group $i$, $n_i$.

p

The dimension of data.

Details

Suppose we have the following $k$ independent high-dimensional samples:

$$\boldsymbol{y}_{i1},\ldots,\boldsymbol{y}_{in_i} \;\text{are i.i.d. with}\; \operatorname{E}(\boldsymbol{y}_{i1})=\boldsymbol{\mu}_i,\; \operatorname{Cov}(\boldsymbol{y}_{i1})=\boldsymbol{\Sigma}_i,\; i=1,\ldots,k.$$

It is of interest to test the following GLHT problem:

$$H_0: \boldsymbol{G M}=\boldsymbol{0} \quad \text{vs.} \quad H_1: \boldsymbol{G M} \neq \boldsymbol{0},$$

where $\boldsymbol{M}=(\boldsymbol{\mu}_1,\ldots,\boldsymbol{\mu}_k)^\top$ is a $k\times p$ matrix collecting $k$ mean vectors and $\boldsymbol{G}:q\times k$ is a known full-rank coefficient matrix with $\operatorname{rank}(\boldsymbol{G})<k$.

Let $\bar{\boldsymbol{y}}_{i},i=1,\ldots,k$ be the sample mean vectors and $\hat{\boldsymbol{\Sigma}}_i,i=1,\ldots,k$ be the sample covariance matrices.

Zhang and Zhu (2022) proposed the following U-statistic based test statistic:

$$T_{ZZ}=\|\boldsymbol{C} \hat{\boldsymbol{\mu}}\|^2-\sum_{i=1}^k h_{ii}\operatorname{tr}(\hat{\boldsymbol{\Sigma}}_i)/n_i,$$

where $\boldsymbol{C}=[(\boldsymbol{G D G}^\top)^{-1/2}\boldsymbol{G}]\otimes\boldsymbol{I}_p$, $\boldsymbol{D}=\operatorname{diag}(1/n_1,\ldots,1/n_k)$, and $h_{ij}$ is the $(i,j)$th entry of the $k\times k$ matrix $\boldsymbol{H}=\boldsymbol{G}^\top(\boldsymbol{G}\boldsymbol{D}\boldsymbol{G}^\top)^{-1}\boldsymbol{G}$.

Value

A list of class "NRtest" containing the results of the hypothesis test. See the help file for NRtest.object for details.

References

Zhang J, Zhu T (2022). “A new normal reference test for linear hypothesis testing in high-dimensional heteroscedastic one-way MANOVA.” Computational Statistics & Data Analysis, 168, 107385. doi:10.1016/j.csda.2021.107385.

Examples

library("HDNRA")
data("corneal")
dim(corneal)
group1 <- as.matrix(corneal[1:43, ]) ## normal group
group2 <- as.matrix(corneal[44:57, ]) ## unilateral suspect group
group3 <- as.matrix(corneal[58:78, ]) ## suspect map group
group4 <- as.matrix(corneal[79:150, ]) ## clinical keratoconus group
p <- dim(corneal)[2]
Y <- list()
k <- 4
Y[[1]] <- group1
Y[[2]] <- group2
Y[[3]] <- group3
Y[[4]] <- group4
n <- c(nrow(Y[[1]]),nrow(Y[[2]]),nrow(Y[[3]]),nrow(Y[[4]]))
G <- cbind(diag(k-1),rep(-1,k-1))
ZZ2022.GLHTBF.3cNRT(Y,G,n,p)

Normal-reference-test with three-cumulant (3-c) matched $\chi^2$-approximation for two-sample problem proposed by Zhang and Zhu (2022)

Description

Zhang and Zhu (2022)'s test for testing equality of two-sample high-dimensional mean vectors, assuming that the two covariance matrices are the same.

Usage

ZZ2022.TS.3cNRT(y1, y2)

Arguments

y1

The data matrix ($n_1 \times p$) from the first population. Each row represents a $p$-dimensional observation.

y2

The data matrix ($n_2 \times p$) from the second population. Each row represents a $p$-dimensional observation.

Details

Suppose we have two independent high-dimensional samples:

$$\boldsymbol{y}_{i1},\ldots,\boldsymbol{y}_{in_i} \;\text{are i.i.d. with}\; \operatorname{E}(\boldsymbol{y}_{i1})=\boldsymbol{\mu}_i,\; \operatorname{Cov}(\boldsymbol{y}_{i1})=\boldsymbol{\Sigma},\; i=1,2.$$

The primary objective is to test

$$H_{0}: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 \quad \text{versus} \quad H_{1}: \boldsymbol{\mu}_1 \neq \boldsymbol{\mu}_2.$$

Zhang and Zhu (2022) proposed the following test statistic:

$$T_{ZZ} = \frac{n_1n_2}{n} \|\bar{\boldsymbol{y}}_1 - \bar{\boldsymbol{y}}_2\|^2-\operatorname{tr}(\hat{\boldsymbol{\Sigma}}),$$

where $\bar{\boldsymbol{y}}_{i},i=1,2$ are the sample mean vectors, $n=n_1+n_2$, and $\hat{\boldsymbol{\Sigma}}$ is the pooled sample covariance matrix. They showed that under the null hypothesis, $T_{ZZ}$ and a chi-squared-type mixture have the same normal or non-normal limiting distribution.

Value

A list of class "NRtest" containing the results of the hypothesis test. See the help file for NRtest.object for details.

References

Zhang J, Zhu T (2022). “A revisit to Bai–Saranadasa's two-sample test.” Journal of Nonparametric Statistics, 34(1), 58–76. doi:10.1080/10485252.2021.2015768.

Examples

library("HDNRA")
data("COVID19")
dim(COVID19)
group1 <- as.matrix(COVID19[c(2:19, 82:87), ]) ## healthy group
group2 <- as.matrix(COVID19[-c(1:19, 82:87), ]) ## COVID-19 patients
ZZ2022.TS.3cNRT(group1, group2)

Normal-reference-test with three-cumulant (3-c) matched $\chi^2$-approximation for two-sample BF problem proposed by Zhang and Zhu (2022)

Description

Zhang and Zhu (2022)'s test for testing equality of two-sample high-dimensional mean vectors without assuming that two covariance matrices are the same.

Usage

ZZ2022.TSBF.3cNRT(y1, y2)

Arguments

y1

The data matrix ($n_1 \times p$) from the first population. Each row represents a $p$-dimensional observation.

y2

The data matrix ($n_2 \times p$) from the second population. Each row represents a $p$-dimensional observation.

Details

Suppose we have two independent high-dimensional samples:

$$\boldsymbol{y}_{i1},\ldots,\boldsymbol{y}_{in_i} \;\text{are i.i.d. with}\; \operatorname{E}(\boldsymbol{y}_{i1})=\boldsymbol{\mu}_i,\; \operatorname{Cov}(\boldsymbol{y}_{i1})=\boldsymbol{\Sigma}_i,\; i=1,2.$$

The primary objective is to test

$$H_{0}: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 \quad \text{versus} \quad H_{1}: \boldsymbol{\mu}_1 \neq \boldsymbol{\mu}_2.$$

Zhang and Zhu (2022) proposed the following test statistic:

$$T_{ZZ} = \|\bar{\boldsymbol{y}}_1 - \bar{\boldsymbol{y}}_2\|^2-\operatorname{tr}(\hat{\boldsymbol{\Omega}}_n),$$

where $\bar{\boldsymbol{y}}_{i},i=1,2$ are the sample mean vectors and $\hat{\boldsymbol{\Omega}}_n$ is the estimator of $\operatorname{Cov}(\bar{\boldsymbol{y}}_1-\bar{\boldsymbol{y}}_2)$. They showed that under the null hypothesis, $T_{ZZ}$ and a chi-squared-type mixture have the same normal or non-normal limiting distribution.
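
The link to Chen and Qin's U-statistic, mentioned in the title of the reference below, can be checked numerically. The sketch assumes the plug-in choice $\hat{\boldsymbol{\Omega}}_n=\hat{\boldsymbol{\Sigma}}_1/n_1+\hat{\boldsymbol{\Sigma}}_2/n_2$, which this help page does not state explicitly; the helper names `tsbf_centered_stat` and `cq_ustat` and the simulated data are likewise invented for the example.

```r
# Hypothetical helper (not part of HDNRA). Assumes the plug-in estimator
# Omega_hat_n = Sigma_hat_1 / n1 + Sigma_hat_2 / n2 (an assumption of this
# sketch, not something stated in the help page).
tsbf_centered_stat <- function(y1, y2) {
  n1 <- nrow(y1); n2 <- nrow(y2)
  d <- colMeans(y1) - colMeans(y2)
  tr_omega <- sum(apply(y1, 2, var)) / n1 + sum(apply(y2, 2, var)) / n2
  sum(d^2) - tr_omega
}

# U-statistic form (Chen-Qin type): inner products between distinct
# observations only, so the within-sample squared norms drop out.
cq_ustat <- function(y1, y2) {
  n1 <- nrow(y1); n2 <- nrow(y2)
  g11 <- tcrossprod(y1); g22 <- tcrossprod(y2)   # Gram matrices
  (sum(g11) - sum(diag(g11))) / (n1 * (n1 - 1)) +
    (sum(g22) - sum(diag(g22))) / (n2 * (n2 - 1)) -
    2 * sum(tcrossprod(y1, y2)) / (n1 * n2)
}

set.seed(3)
y1 <- matrix(rnorm(8 * 30), 8, 30)               # simulated data
y2 <- matrix(rnorm(11 * 30, sd = 2), 11, 30)
T_centered <- tsbf_centered_stat(y1, y2)
T_ustat <- cq_ustat(y1, y2)                      # agrees with T_centered
```

Under the stated assumption the two forms are algebraically identical, so the trace-corrected version is simply the cheaper way to compute the same number.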

Value

A list of class "NRtest" containing the results of the hypothesis test. See the help file for NRtest.object for details.

References

Zhang J, Zhu T (2022). “A further study on Chen-Qin’s test for two-sample Behrens–Fisher problems for high-dimensional data.” Journal of Statistical Theory and Practice, 16(1), 1. doi:10.1007/s42519-021-00232-w.

Examples

library("HDNRA")
data("COVID19")
dim(COVID19)
group1 <- as.matrix(COVID19[c(2:19, 82:87), ]) ## healthy group
group2 <- as.matrix(COVID19[-c(1:19, 82:87), ]) ## COVID-19 patients
ZZ2022.TSBF.3cNRT(group1, group2)

Normal-reference-test with two-cumulant (2-c) matched $\chi^2$-approximation for GLHT problem under heteroscedasticity proposed by Zhang et al. (2022)

Description

Zhang et al. (2022)'s test for general linear hypothesis testing (GLHT) problem for high-dimensional data under heteroscedasticity.

Usage

ZZG2022.GLHTBF.2cNRT(Y,G,n,p)

Arguments

Y

A list of $k$ data matrices. The $i$th element represents the data matrix ($n_i\times p$) from the $i$th population with each row representing a $p$-dimensional observation.

G

A known full-rank coefficient matrix ($q\times k$) with $\operatorname{rank}(\boldsymbol{G})<k$.

n

A vector of $k$ sample sizes. The $i$th element represents the sample size of group $i$, $n_i$.

p

The dimension of data.

Details

Suppose we have the following $k$ independent high-dimensional samples:

$$\boldsymbol{y}_{i1},\ldots,\boldsymbol{y}_{in_i} \;\text{are i.i.d. with}\; \operatorname{E}(\boldsymbol{y}_{i1})=\boldsymbol{\mu}_i,\; \operatorname{Cov}(\boldsymbol{y}_{i1})=\boldsymbol{\Sigma}_i,\; i=1,\ldots,k.$$

It is of interest to test the following GLHT problem:

$$H_0: \boldsymbol{G M}=\boldsymbol{0} \quad \text{vs.} \quad H_1: \boldsymbol{G M} \neq \boldsymbol{0},$$

where $\boldsymbol{M}=(\boldsymbol{\mu}_1,\ldots,\boldsymbol{\mu}_k)^\top$ is a $k\times p$ matrix collecting $k$ mean vectors and $\boldsymbol{G}:q\times k$ is a known full-rank coefficient matrix with $\operatorname{rank}(\boldsymbol{G})<k$.

Zhang et al. (2022) proposed the following test statistic:

$$T_{ZZG}=\|\boldsymbol{C} \hat{\boldsymbol{\mu}}\|^2,$$

where $\boldsymbol{C}=[(\boldsymbol{G D G}^\top)^{-1/2}\boldsymbol{G}]\otimes\boldsymbol{I}_p$ with $\boldsymbol{D}=\operatorname{diag}(1/n_1,\ldots,1/n_k)$, and $\hat{\boldsymbol{\mu}}=(\bar{\boldsymbol{y}}_1^\top,\ldots,\bar{\boldsymbol{y}}_k^\top)^\top$ with $\bar{\boldsymbol{y}}_{i},i=1,\ldots,k$ being the sample mean vectors.

They showed that under the null hypothesis, $T_{ZZG}$ and a chi-squared-type mixture have the same normal or non-normal limiting distribution.

Value

A list of class "NRtest" containing the results of the hypothesis test. See the help file for NRtest.object for details.

References

Zhang J, Zhou B, Guo J (2022). “Linear hypothesis testing in high-dimensional heteroscedastic one-way MANOVA: A normal reference $L^2$-norm based test.” Journal of Multivariate Analysis, 187, 104816. doi:10.1016/j.jmva.2021.104816.

Examples

library("HDNRA")
data("corneal")
dim(corneal)
group1 <- as.matrix(corneal[1:43, ]) ## normal group
group2 <- as.matrix(corneal[44:57, ]) ## unilateral suspect group
group3 <- as.matrix(corneal[58:78, ]) ## suspect map group
group4 <- as.matrix(corneal[79:150, ]) ## clinical keratoconus group
p <- dim(corneal)[2]
Y <- list()
k <- 4
Y[[1]] <- group1
Y[[2]] <- group2
Y[[3]] <- group3
Y[[4]] <- group4
n <- c(nrow(Y[[1]]),nrow(Y[[2]]),nrow(Y[[3]]),nrow(Y[[4]]))
G <- cbind(diag(k-1),rep(-1,k-1))
ZZG2022.GLHTBF.2cNRT(Y,G,n,p)

Normal-reference-test with two-cumulant (2-c) matched $\chi^2$-approximation for two-sample BF problem proposed by Zhang et al. (2021)

Description

Zhang et al. (2021)'s test for testing equality of two-sample high-dimensional mean vectors without assuming that two covariance matrices are the same.

Usage

ZZGZ2021.TSBF.2cNRT(y1, y2)

Arguments

y1

The data matrix ($n_1 \times p$) from the first population. Each row represents a $p$-dimensional observation.

y2

The data matrix ($n_2 \times p$) from the second population. Each row represents a $p$-dimensional observation.

Details

Suppose we have two independent high-dimensional samples:

$$\boldsymbol{y}_{i1},\ldots,\boldsymbol{y}_{in_i} \;\text{are i.i.d. with}\; \operatorname{E}(\boldsymbol{y}_{i1})=\boldsymbol{\mu}_i,\; \operatorname{Cov}(\boldsymbol{y}_{i1})=\boldsymbol{\Sigma}_i,\; i=1,2.$$

The primary objective is to test

$$H_{0}: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 \quad \text{versus} \quad H_{1}: \boldsymbol{\mu}_1 \neq \boldsymbol{\mu}_2.$$

Zhang et al. (2021) proposed the following test statistic:

$$T_{ZZGZ} = \frac{n_1n_2}{n} \|\bar{\boldsymbol{y}}_1 - \bar{\boldsymbol{y}}_2\|^2,$$

where $\bar{\boldsymbol{y}}_{i},i=1,2$ are the sample mean vectors and $n=n_1+n_2$. They showed that under the null hypothesis, $T_{ZZGZ}$ and a chi-squared-type mixture have the same normal or non-normal limiting distribution.

Value

A list of class "NRtest" containing the results of the hypothesis test. See the help file for NRtest.object for details.

References

Zhang J, Zhou B, Guo J, Zhu T (2021). “Two-sample Behrens-Fisher problems for high-dimensional data: A normal reference approach.” Journal of Statistical Planning and Inference, 213, 142–161. doi:10.1016/j.jspi.2020.11.008.

Examples

library("HDNRA")
data("COVID19")
dim(COVID19)
group1 <- as.matrix(COVID19[c(2:19, 82:87), ]) ## healthy group
group2 <- as.matrix(COVID19[-c(1:19, 82:87), ]) ## COVID-19 patients
ZZGZ2021.TSBF.2cNRT(group1, group2)

Normal-reference-test with two-cumulant (2-c) matched $\chi^2$-approximation for two-sample problem proposed by Zhang et al. (2020)

Description

Zhang et al. (2020)'s test for testing equality of two-sample high-dimensional mean vectors, assuming that the two covariance matrices are the same.

Usage

ZZZ2020.TS.2cNRT(y1, y2)

Arguments

y1

The data matrix ($n_1 \times p$) from the first population. Each row represents a $p$-dimensional observation.

y2

The data matrix ($n_2 \times p$) from the second population. Each row represents a $p$-dimensional observation.

Details

Suppose we have two independent high-dimensional samples:

$$\boldsymbol{y}_{i1},\ldots,\boldsymbol{y}_{in_i} \;\text{are i.i.d. with}\; \operatorname{E}(\boldsymbol{y}_{i1})=\boldsymbol{\mu}_i,\; \operatorname{Cov}(\boldsymbol{y}_{i1})=\boldsymbol{\Sigma},\; i=1,2.$$

The primary objective is to test

$$H_{0}: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 \quad \text{versus} \quad H_{1}: \boldsymbol{\mu}_1 \neq \boldsymbol{\mu}_2.$$

Zhang et al. (2020) proposed the following test statistic:

$$T_{ZZZ} = \frac{n_1n_2}{np}(\bar{\boldsymbol{y}}_1 - \bar{\boldsymbol{y}}_2)^\top \hat{\boldsymbol{D}}^{-1}(\bar{\boldsymbol{y}}_1 - \bar{\boldsymbol{y}}_2),$$

where $\bar{\boldsymbol{y}}_{i},i=1,2$ are the sample mean vectors, $n=n_1+n_2$, and $\hat{\boldsymbol{D}}$ is the diagonal matrix formed from the diagonal elements of the sample covariance matrix. They showed that under the null hypothesis, $T_{ZZZ}$ and a chi-squared-type mixture have the same limiting distribution.
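
Because $\hat{\boldsymbol{D}}$ is diagonal, the quadratic form reduces to a sum of squared mean differences, each divided by its coordinate's variance, and the statistic does not change when a variable is rescaled. The sketch below illustrates both points. It assumes that $\hat{\boldsymbol{D}}$ holds the pooled sample variances, and the helper name `scale_invariant_stat` and the simulated data are invented for the example.

```r
# Hypothetical helper (not part of HDNRA). D_hat is taken here to be the
# diagonal of the pooled sample covariance matrix; that choice is an
# assumption of this sketch.
scale_invariant_stat <- function(y1, y2) {
  n1 <- nrow(y1); n2 <- nrow(y2); n <- n1 + n2; p <- ncol(y1)
  d <- colMeans(y1) - colMeans(y2)
  # Pooled variance of each coordinate (diagonal of the pooled covariance).
  v <- ((n1 - 1) * apply(y1, 2, var) + (n2 - 1) * apply(y2, 2, var)) / (n - 2)
  n1 * n2 / (n * p) * sum(d^2 / v)
}

set.seed(4)
y1 <- matrix(rnorm(10 * 20), 10, 20)   # simulated data
y2 <- matrix(rnorm(12 * 20), 12, 20)
s <- runif(20, 0.1, 100)               # arbitrary positive scale per column
T_raw <- scale_invariant_stat(y1, y2)
T_scaled <- scale_invariant_stat(sweep(y1, 2, s, "*"), sweep(y2, 2, s, "*"))
```

Here `T_raw` and `T_scaled` agree up to floating-point error, which is the scale invariance referred to in the title of the reference below.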

Value

A list of class "NRtest" containing the results of the hypothesis test. See the help file for NRtest.object for details.

References

Zhang L, Zhu T, Zhang J (2020). “A simple scale-invariant two-sample test for high-dimensional data.” Econometrics and Statistics, 14, 131–144. doi:10.1016/j.ecosta.2019.12.002.

Examples

library("HDNRA")
data("COVID19")
dim(COVID19)
group1 <- as.matrix(COVID19[c(2:19, 82:87), ]) ## healthy group
group2 <- as.matrix(COVID19[-c(1:19, 82:87), ]) ## COVID-19 patients
ZZZ2020.TS.2cNRT(group1,group2)

Normal-reference-test with two-cumulant (2-c) matched $\chi^2$-approximation for GLHT problem proposed by Zhu et al. (2022)

Description

Zhu et al. (2022)'s test for the general linear hypothesis testing (GLHT) problem for high-dimensional data, assuming that the underlying covariance matrices are the same.

Usage

ZZZ2022.GLHT.2cNRT(Y,X,C,n,p)

Arguments

Y

A list of $k$ data matrices. The $i$th element represents the data matrix ($n_i \times p$) from the $i$th population with each row representing a $p$-dimensional observation.

X

A known $n\times k$ full-rank design matrix with $\operatorname{rank}(\boldsymbol{X})=k<n$.

C

A known matrix of size $q\times k$ with $\operatorname{rank}(\boldsymbol{C})=q<k$.

n

A vector of $k$ sample sizes. The $i$th element represents the sample size of group $i$, $n_i$.

p

The dimension of data.

Details

A high-dimensional linear regression model can be expressed as

$$\boldsymbol{Y}=\boldsymbol{X\Theta}+\boldsymbol{\epsilon},$$

where $\boldsymbol{\Theta}$ is a $k\times p$ unknown parameter matrix and $\boldsymbol{\epsilon}$ is an $n\times p$ error matrix.

It is of interest to test the following GLHT problem:

$$H_0: \boldsymbol{C\Theta}=\boldsymbol{0} \quad \text{vs.} \quad H_1: \boldsymbol{C\Theta} \neq \boldsymbol{0}.$$

Zhu et al. (2022) proposed the following test statistic:

$$T_{ZZZ}=\frac{(n-k-2)}{(n-k)pq}\operatorname{tr}(\boldsymbol{S}_h\boldsymbol{D}^{-1}),$$

where $\boldsymbol{S}_h$ and $\boldsymbol{S}_e$ are the variation matrices due to the hypothesis and error, respectively, and $\boldsymbol{D}$ is the diagonal matrix with the diagonal elements of $\boldsymbol{S}_e/(n-k)$. They showed that under the null hypothesis, $T_{ZZZ}$ and a chi-squared-type mixture have the same limiting distribution.
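
The following sketch shows one way the statistic could be computed from a stacked $n\times p$ response matrix. It is illustrative only: the help page does not spell out $\boldsymbol{S}_h$ and $\boldsymbol{S}_e$, so the usual multivariate-regression definitions are assumed here, namely $\boldsymbol{S}_e=(\boldsymbol{Y}-\boldsymbol{X}\hat{\boldsymbol{\Theta}})^\top(\boldsymbol{Y}-\boldsymbol{X}\hat{\boldsymbol{\Theta}})$ and $\boldsymbol{S}_h=(\boldsymbol{C}\hat{\boldsymbol{\Theta}})^\top[\boldsymbol{C}(\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{C}^\top]^{-1}(\boldsymbol{C}\hat{\boldsymbol{\Theta}})$ with $\hat{\boldsymbol{\Theta}}$ the least-squares estimator. The helper name `glht_regression_stat`, the stacked matrix `Ymat`, and the simulated data are assumptions of the example. In the formula $n$ is the total sample size, whereas the argument `n` of the package function is the vector of group sizes.

```r
# Hypothetical helper (not part of HDNRA). S_h and S_e follow the usual
# multivariate-regression definitions, which are an assumption here.
glht_regression_stat <- function(Ymat, X, C) {
  n <- nrow(Ymat); p <- ncol(Ymat); k <- ncol(X); q <- nrow(C)
  XtX_inv <- solve(crossprod(X))
  Theta_hat <- XtX_inv %*% crossprod(X, Ymat)  # k x p least-squares estimate
  resid <- Ymat - X %*% Theta_hat
  Se_diag <- colSums(resid^2)                  # diagonal of S_e
  CT <- C %*% Theta_hat                        # q x p
  W <- solve(C %*% XtX_inv %*% t(C))           # q x q
  Sh_diag <- colSums(CT * (W %*% CT))          # diagonal of S_h
  D_diag <- Se_diag / (n - k)
  # D is diagonal, so tr(S_h D^{-1}) only needs the two diagonals.
  (n - k - 2) / ((n - k) * p * q) * sum(Sh_diag / D_diag)
}

set.seed(5)
n_i <- c(8, 9, 10); k <- 3; p <- 40
X <- sapply(1:k, function(i) as.numeric(rep(1:k, n_i) == i))  # group indicators
Ymat <- matrix(rnorm(sum(n_i) * p), sum(n_i), p)              # simulated data
C <- cbind(diag(k - 1), -rep(1, k - 1))
T_zzz <- glht_regression_stat(Ymat, X, C)
```

Working with the diagonals avoids building the $p\times p$ matrices $\boldsymbol{S}_h$ and $\boldsymbol{S}_e$, which is convenient when $p$ is much larger than $n$.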

Value

A list of class "NRtest" containing the results of the hypothesis test. See the help file for NRtest.object for details.

References

Zhu T, Zhang L, Zhang J (2023). “Hypothesis Testing in High-Dimensional Linear Regression: A Normal Reference Scale-Invariant Test.” Statistica Sinica. doi:10.5705/ss.202020.0362.

Examples

library("HDNRA")
data("corneal")
dim(corneal)
group1 <- as.matrix(corneal[1:43, ]) ## normal group
group2 <- as.matrix(corneal[44:57, ]) ## unilateral suspect group
group3 <- as.matrix(corneal[58:78, ]) ## suspect map group
group4 <- as.matrix(corneal[79:150, ]) ## clinical keratoconus group
p <- dim(corneal)[2]
Y <- list()
k <- 4
Y[[1]] <- group1
Y[[2]] <- group2
Y[[3]] <- group3
Y[[4]] <- group4
n <- c(nrow(Y[[1]]),nrow(Y[[2]]),nrow(Y[[3]]),nrow(Y[[4]]))
X <- matrix(c(rep(1,n[1]),rep(0,sum(n)),rep(1,n[2]), rep(0,sum(n)),
            rep(1,n[3]),rep(0,sum(n)),rep(1,n[4])),ncol=k,nrow=sum(n))
q <- k-1
C <- cbind(diag(q),-rep(1,q))
ZZZ2022.GLHT.2cNRT(Y,X,C,n,p)
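
The construction of `X` in the example above is compact but easy to misread: it builds one long vector and relies on `matrix()` filling column by column, so the blocks of `sum(n)` zeros push each run of ones into the next column. The sketch below, using the group sizes implied by the row ranges of the corneal example, compares it with a more explicit indicator construction; the names `X_example`, `group`, and `X_indicator` are introduced only for this illustration.

```r
n <- c(43, 14, 21, 72)   # group sizes implied by the corneal row ranges
k <- length(n)

# Construction from the example: a single vector of length k * sum(n),
# filled column by column.
X_example <- matrix(c(rep(1, n[1]), rep(0, sum(n)), rep(1, n[2]), rep(0, sum(n)),
                      rep(1, n[3]), rep(0, sum(n)), rep(1, n[4])),
                    ncol = k, nrow = sum(n))

# Equivalent, more explicit construction: column i indicates group i.
group <- rep(seq_len(k), times = n)
X_indicator <- sapply(seq_len(k), function(i) as.numeric(group == i))
```

Both matrices have exactly one 1 per row and column sums equal to the group sizes, so either form can serve as the one-way design matrix.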

Normal-reference-test with two-cumulant (2-c) matched $\chi^2$-approximation for two-sample BF problem proposed by Zhang et al. (2023)

Description

Zhang et al. (2023)'s test for testing equality of two-sample high-dimensional mean vectors without assuming that two covariance matrices are the same.

Usage

ZZZ2023.TSBF.2cNRT(y1, y2, cutoff)

Arguments

y1

The data matrix ($n_1 \times p$) from the first population. Each row represents a $p$-dimensional observation.

y2

The data matrix ($n_2 \times p$) from the second population. Each row represents a $p$-dimensional observation.

cutoff

An empirical criterion for applying the adjustment coefficient.

Details

Suppose we have two independent high-dimensional samples:

$$\boldsymbol{y}_{i1},\ldots,\boldsymbol{y}_{in_i} \;\text{are i.i.d. with}\; \operatorname{E}(\boldsymbol{y}_{i1})=\boldsymbol{\mu}_i,\; \operatorname{Cov}(\boldsymbol{y}_{i1})=\boldsymbol{\Sigma}_i,\; i=1,2.$$

The primary objective is to test

$$H_{0}: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 \quad \text{versus} \quad H_{1}: \boldsymbol{\mu}_1 \neq \boldsymbol{\mu}_2.$$

Zhang et al. (2023) proposed the following test statistic:

$$T_{ZZZ}=\frac{n_1 n_2}{np}(\bar{\boldsymbol{y}}_1-\bar{\boldsymbol{y}}_2)^{\top} \hat{\boldsymbol{D}}_n^{-1}(\bar{\boldsymbol{y}}_1-\bar{\boldsymbol{y}}_2),$$

where $\bar{\boldsymbol{y}}_{i},i=1,2$ are the sample mean vectors, and $\hat{\boldsymbol{D}}_n=\operatorname{diag}(\hat{\boldsymbol{\Sigma}}_1/n+\hat{\boldsymbol{\Sigma}}_2/n)$ with $n=n_1+n_2$. They showed that under the null hypothesis, $T_{ZZZ}$ and a chi-squared-type mixture have the same limiting distribution.

Value

A list of class "NRtest" containing the results of the hypothesis test. See the help file for NRtest.object for details.

References

Zhang L, Zhu T, Zhang J (2023). “Two-sample Behrens–Fisher problems for high-dimensional data: a normal reference scale-invariant test.” Journal of Applied Statistics, 50(3), 456–476. doi:10.1080/02664763.2020.1834516.

Examples

library("HDNRA")
data("COVID19")
dim(COVID19)
group1 <- as.matrix(COVID19[c(2:19, 82:87), ]) ## healthy group
group2 <- as.matrix(COVID19[-c(1:19, 82:87), ]) ## COVID-19 patients
ZZZ2023.TSBF.2cNRT(group1,group2,cutoff=1.2)