Title: | Multiple Testing Procedure for Grouped Hypotheses |
---|---|
Description: | Contains functions for a two-stage multiple testing procedure for grouped hypotheses, aiming to control both the total posterior false discovery rate and the within-group false discovery rate. |
Authors: | Zhigen Zhao |
Maintainer: | Zhigen Zhao <[email protected]> |
License: | GPL-3 |
Version: | 1.0.1 |
Built: | 2024-11-05 06:38:06 UTC |
Source: | CRAN |
This package provides functions for multiple hypothesis testing when the hypotheses have a group structure.
Package: | GroupTest |
Type: | Package |
Version: | 1.0 |
Date: | 2015-11-20 |
License: | GPL-3 |
This package provides functions for multiple testing of grouped hypotheses. The data is an array of G lists, where G is the total number of groups. Each list corresponds to one group and has two elements: the test statistics and the group size. Under the null hypothesis, each test statistic follows a standard normal distribution.
The main function is GT.wrapper(). One example is provided under this function, explaining the data structure and how to use the package.
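The data structure described above can be sketched in base R as follows. This is a minimal illustration and is not taken from the package: it assumes that a plain R list of lists is an acceptable stand-in for the "array of lists", it uses the element names X and mg given in the argument descriptions of this manual, and the helper name make.group is hypothetical.

```r
# Minimal sketch (base R only): build a grouped-hypotheses data structure.
# Assumption: a plain list of lists with elements X and mg is acceptable.
make.group <- function(m) {
  list(X = rnorm(m),  # test statistics, drawn here under the null N(0, 1)
       mg = m)        # group size
}

set.seed(1)
group.sizes <- c(3, 4, 5)                        # three illustrative groups
TestStatistic <- lapply(group.sizes, make.group)
str(TestStatistic[[1]])                          # inspect the first group
```

Each element of TestStatistic then holds one group, so TestStatistic[[g]]$X gives the statistics of the g-th group.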
Zhigen Zhao <[email protected]>
Maintainer: Zhigen Zhao <[email protected]>
Liu, Y., Sarkar, S. K., and Zhao, Z. (2015) A New Approach to Multiple Testing of Grouped Hypotheses
He, L., Sarkar, S. K. and Zhao, Z. (2015) Capturing the severity of Type II errors in high-dimensional multiple testing. Journal of Multivariate Analysis. Vol. 142, 106-116.
This data set comes from the adequate yearly progress (AYP) study of California elementary schools in 2013. It compares the academic performance of socioeconomically advantaged (SEA) students with that of socioeconomically disadvantaged (SED) students in each elementary school. The quantities compared are the success rates of SEA students and SED students. A z-test statistic based on the two-sample test of proportions is calculated for each school. After removing schools with extremely small or large z-values, there are 4118 schools within 701 qualified school districts.
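As a rough illustration of the kind of statistic described above, the sketch below computes a two-sample z statistic for proportions in base R. The pooled standard error is an assumption (the data description does not say which form was used), the function name two.prop.z is hypothetical, and the counts are made up for illustration only.

```r
# Two-sample z statistic for comparing proportions (pooled standard error).
# Assumption: the pooled form is used; the data description does not say so.
two.prop.z <- function(x1, n1, x2, n2) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  p.pool <- (x1 + x2) / (n1 + n2)
  se <- sqrt(p.pool * (1 - p.pool) * (1 / n1 + 1 / n2))
  (p1 - p2) / se
}

# Made-up counts for one school: 80 of 100 SEA and 60 of 100 SED students succeed.
z <- two.prop.z(80, 100, 60, 100)

# Cross-check: z^2 equals the chi-squared statistic from stats::prop.test
# when the continuity correction is switched off.
chi2 <- unname(prop.test(c(80, 60), c(100, 100), correct = FALSE)$statistic)
```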
data("AYP")
An array of lists.
The AYP data set is an array of lists, with each list corresponding to one school district. Each list stores three variables:
X: the test statistics for the individual schools within this school district.
md: the number of schools within this school district.
School.District: the name of the school district.
http://www.cde.ca.gov/ta/ac/ay/aypdatafiles.asp
Liu, Y., Sarkar, S. K., and Zhao, Z. (2015) A New Approach to Multiple Testing of Grouped Hypotheses
Efron, B. (2008) Microarrays, empirical Bayes and the two-groups model. Statistical Science, 23, 1-22.
data(AYP)
# eta is given the same value as alpha
AYP.result <- GT.wrapper(
  AYP, alpha=0.1, eta=0.1,
  pi1.ini=0.5, pi2.1.ini=0.05, L=2,
  muL.ini=c(3,-2), sigmaL.ini=c(1,1), cL.ini=c(0.5,0.5),
  DELTA=0.0001, sigma.KNOWN=TRUE
)
A simulated data set to demonstrate the package. It contains three groups, with 3, 4, and 5 hypotheses respectively.
data("GroupTest_simulate")
An array of lists.
data(GroupTest_simulate)
# eta is given the same value as alpha
GT.test <- GT.wrapper(
  GroupTest_simulate, alpha=0.05, eta=0.05,
  pi1.ini=0.7, pi2.1.ini=0.4, L=2,
  muL.ini=c(-1,1), sigmaL.ini=c(1,2), cL.ini=c(0.4,0.6),
  DELTA=0.001, sigma.KNOWN=FALSE
)
Based on the alpha level and the local fdr scores, this function provides the decisions at the between-group and within-group levels.
GT.decision(TestStatistic, alpha = 0.05, eta = alpha)
TestStatistic |
An array of lists. Each list corresponds to one group and contains the test statistics, stored as X, and the group size, stored as mg. |
alpha |
the targeted FDR level. |
eta |
the targeted FDR level within each group. The default and recommended choice is alpha. |
TestStatistic |
An array of lists. Each list corresponds to one group. Two additional variables, within.group.rej and between.group.rej, are stored in each list. |
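To give a feel for how local fdr scores can be turned into rejections at a given level, the sketch below implements a generic step-up rule: sort the scores and reject the largest set whose average score stays at or below alpha. This is a common rule in the local fdr literature and is shown only as an assumption; it is not claimed to be the rule that GT.decision applies, and the function name lfdr.reject is hypothetical.

```r
# Generic local-fdr thresholding sketch (not necessarily the package's rule).
# Reject the k hypotheses with the smallest scores, where k is the largest
# number such that the average of those k scores is at most alpha.
lfdr.reject <- function(lfdr, alpha = 0.05) {
  ord <- order(lfdr)
  running.mean <- cumsum(lfdr[ord]) / seq_along(lfdr)
  # The running mean of ascending values is nondecreasing, so counting the
  # entries at or below alpha gives the largest admissible k.
  k <- sum(running.mean <= alpha)
  reject <- rep(FALSE, length(lfdr))
  if (k > 0) reject[ord[seq_len(k)]] <- TRUE
  reject
}

scores <- c(0.01, 0.5, 0.03, 0.2, 0.9)   # made-up local fdr scores
decision <- lfdr.reject(scores, alpha = 0.05)
```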
data(GroupTest_simulate)
GroupTest_simulate <- GT.localfdr(
  GroupTest_simulate, L=2, pi1=0.5, pi2.1=0.5,
  muL=c(-1, 1), sigmaL=c(1,2), cL=c(0.4,0.6)
)
GroupTest.decision <- GT.decision(GroupTest_simulate, alpha=0.05)
This function estimates all the parameters using the EM algorithm. The iteration is terminated when the sum of squared differences between the current and the previous parameter values is less than DELTA. A list containing the estimated values of all the parameters is returned.
GT.em(TestStatistic, pi1.ini, pi2.1.ini, L, muL.ini, sigmaL.ini, cL.ini, DELTA, sigma.KNOWN)
TestStatistic |
An array of lists. Each list corresponds to one group and contains the test statistics, stored as X, and the group size, stored as mg. |
L |
The number of Gaussian components under the alternative hypothesis. |
pi1.ini |
Initial value: the probability that a group is significant. |
pi2.1.ini |
Initial value: the probability that an individual null hypothesis is false given that the group is significant. |
muL.ini |
Initial value: a vector of means for the components of the Gaussian mixture. |
sigmaL.ini |
Initial value: a vector of standard deviations for the components of the Gaussian mixture. |
cL.ini |
Initial value: a vector of mixing probabilities for the components of the Gaussian mixture. |
DELTA |
The threshold used to stop the EM algorithm. |
sigma.KNOWN |
A logical value indicating whether the variance is known. |
This function returns a list containing the estimated values of all the parameters. The list has the following elements:
pi1 |
estimated value of pi1, the probability that a group is significant |
pi2.1 |
estimated value of pi2.1, the probability that an individual null hypothesis is false given that the group is significant |
muL |
a vector of estimated means for the components of the Gaussian mixture |
sigmaL |
a vector of estimated standard deviations for the components of the Gaussian mixture |
cL |
a vector of estimated mixing probabilities for the components of the Gaussian mixture |
L |
the number of components in the Gaussian mixture |
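The EM iteration and its stopping rule can be illustrated on a much simpler model than the one fitted by GT.em. The sketch below is an assumption-laden toy: it ignores the group structure entirely and fits a single mixture of a standard normal null and one Gaussian alternative, stopping when the sum of squared parameter changes drops below DELTA. The function name em.two.group and all of its arguments are hypothetical and are not part of the package.

```r
# Toy EM for the mixture (1 - p) * N(0, 1) + p * N(mu, sigma^2).
# Assumption: no group structure and a single alternative component.
em.two.group <- function(x, p.ini = 0.5, mu.ini = 1, sigma.ini = 1,
                         DELTA = 1e-8, max.iter = 1000) {
  theta <- c(p = p.ini, mu = mu.ini, sigma = sigma.ini)
  for (iter in seq_len(max.iter)) {
    # E-step: posterior probability that each observation is non-null
    f0 <- (1 - theta[["p"]]) * dnorm(x)
    f1 <- theta[["p"]] * dnorm(x, theta[["mu"]], theta[["sigma"]])
    w <- f1 / (f0 + f1)
    # M-step: weighted updates of the mixing proportion, mean and sd
    mu.new <- sum(w * x) / sum(w)
    theta.new <- c(p = mean(w), mu = mu.new,
                   sigma = sqrt(sum(w * (x - mu.new)^2) / sum(w)))
    # Stop when the sum of squared parameter changes is below DELTA
    done <- sum((theta.new - theta)^2) < DELTA
    theta <- theta.new
    if (done) break
  }
  c(as.list(theta), iterations = iter)
}

set.seed(42)
x <- c(rnorm(1400), rnorm(600, mean = 3, sd = 1))  # 30% non-null, simulated
fit <- em.two.group(x, mu.ini = 2)
```

The returned list mirrors the idea of the value described above (a list of estimated parameters), but its element names are chosen for this toy only.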
data(GroupTest_simulate)
em.estimate <- GT.em(
  GroupTest_simulate, L=2, pi1.ini=0.7, pi2.1.ini=0.4,
  muL.ini=c(-1,1), sigmaL.ini=c(1,2), cL.ini=c(0.4,0.6),
  DELTA=0.001, sigma.KNOWN=FALSE
)
This function calculates the between-group and within-group local fdr scores for a given set of parameter values.
GT.localfdr(TestStatistic, pi1, pi2.1, L, muL, sigmaL, cL)
TestStatistic |
An array of lists. Each element of the array corresponds to one group and contains the test statistics, stored as X, and the group size, stored as mg. |
L |
The number of Gaussian components under the alternative hypothesis. |
pi1 |
the probability that a group is significant. |
pi2.1 |
the probability that an individual null hypothesis is false given that the group is significant. |
muL |
a vector of means for the components of the Gaussian mixture. |
sigmaL |
a vector of standard deviations for the components of the Gaussian mixture. |
cL |
a vector of mixing probabilities for the components of the Gaussian mixture. |
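For orientation, the sketch below computes a local fdr score under a plain two-group model whose alternative density is a Gaussian mixture with the parameters listed above. It is an assumption about the general form of such a score, namely the null density weighted by its prior probability divided by the overall mixture density; it does not reproduce the exact between-group or within-group formulas used by GT.localfdr. The function name cond.lfdr is hypothetical.

```r
# Local fdr under a two-group model with a Gaussian-mixture alternative:
#   lfdr(x) = (1 - pi2.1) * f0(x) / ((1 - pi2.1) * f0(x) + pi2.1 * f1(x)),
# where f0 is the N(0, 1) density and f1 = sum_l cL[l] * N(muL[l], sigmaL[l]^2).
# Assumption: this ignores any dependence on the group-level probability pi1.
cond.lfdr <- function(x, pi2.1, muL, sigmaL, cL) {
  f0 <- dnorm(x)
  f1 <- numeric(length(x))
  for (l in seq_along(muL)) {
    f1 <- f1 + cL[l] * dnorm(x, mean = muL[l], sd = sigmaL[l])
  }
  (1 - pi2.1) * f0 / ((1 - pi2.1) * f0 + pi2.1 * f1)
}

x <- c(-3, 0, 0.5, 4)   # made-up test statistics
scores <- cond.lfdr(x, pi2.1 = 0.5, muL = c(-1, 1),
                    sigmaL = c(1, 1), cL = c(0.5, 0.5))
```

Statistics far from zero receive small scores, since they are better explained by the alternative components than by the standard normal null.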
This function returns an array of G lists where G is the number of groups.
TSGroupTest[[g]] |
in each element, the individual
conditional local fdr score ( |
data(GroupTest_simulate)
GroupTest_simulate <- GT.localfdr(
  GroupTest_simulate, L=2, pi1=0.5, pi2.1=0.5,
  muL=c(-1, 1), sigmaL=c(1,2), cL=c(0.4,0.6)
)
This function is the main function to perform the two-stage testing for the grouped hypotheses.
GT.wrapper(TestStatistic, alpha = 0.05, eta = alpha, pi1.ini = 0.7, pi2.1.ini = 0.4, L = 2, muL.ini = c(-1, 1), sigmaL.ini = c(1, 1), cL.ini = c(0.5, 0.5), DELTA = 0.001, sigma.KNOWN=FALSE)
TestStatistic |
An array of lists. Each list corresponds to one group and contains the test statistics, stored as X, and the group size, stored as mg. |
alpha |
the targeted FDR level. The default is 0.05. |
eta |
the targeted FDR level within each group. The default and recommended choice is alpha. |
pi1.ini |
Initial value: the probability that a group is significant. The default is 0.7. |
pi2.1.ini |
Initial value: the probability that an individual null hypothesis is false given that the group is significant. The default is 0.4. |
L |
The number of Gaussian components under the alternative hypothesis. The default is 2. |
muL.ini |
Initial value: a vector of means for the components of the Gaussian mixture. The default is -1 and 1. |
sigmaL.ini |
Initial value: a vector of standard deviations for the components of the Gaussian mixture. The default is 1 and 1. |
cL.ini |
Initial value: a vector of mixing probabilities for the components of the Gaussian mixture. The default is 0.5 and 0.5. |
DELTA |
The threshold used to stop the EM algorithm. In this algorithm, we calculate the maximum absolute difference between the current estimated value and the previous value of the parameters. The default, as given in the usage above, is 0.001. |
sigma.KNOWN |
A logical value indicating whether the variance is known. The default is FALSE. |
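This manual describes the EM stopping rule in two ways: as a sum of squared parameter changes (under GT.em) and as a maximum absolute change (under DELTA here). Which one the package applies is not settled by the text, so the sketch below only shows how the two criteria can disagree for the same DELTA. The helper names and the parameter values are hypothetical.

```r
# Two candidate convergence measures for an EM parameter update.
# Both take lists of parameters (old and new) and flatten them first.
sum.sq.change  <- function(old, new) sum((unlist(new) - unlist(old))^2)
max.abs.change <- function(old, new) max(abs(unlist(new) - unlist(old)))

# Made-up consecutive EM iterates: every parameter moves by about 0.01.
old <- list(pi1 = 0.70, pi2.1 = 0.40, muL = c(-1.00, 1.00))
new <- list(pi1 = 0.69, pi2.1 = 0.41, muL = c(-1.01, 1.01))
DELTA <- 0.001

stop.by.sum.sq  <- sum.sq.change(old, new) < DELTA    # about 4e-4 vs 1e-3
stop.by.max.abs <- max.abs.change(old, new) < DELTA   # about 1e-2 vs 1e-3
```

With these values the squared-change rule would stop while the absolute-change rule would continue, so the same DELTA implies a looser tolerance under the first rule.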
The function returns a TSGroupTest object. It contains
parameter |
this is a list, consisting of estimated parameters
based on the EM algorithm. The elements are |
TSGroupTest[[g]] |
all the quantities regarding the g-th group, including the test statistic within this group, the individual conditional local fdr score ( |
data(GroupTest_simulate)
# eta is given the same value as alpha
GT.Test <- GT.wrapper(
  GroupTest_simulate, alpha=0.05, eta=0.05,
  pi1.ini=0.7, pi2.1.ini=0.4, L=2,
  muL.ini=c(-1,1), sigmaL.ini=c(1,2), cL.ini=c(0.4,0.6),
  DELTA=0.001, sigma.KNOWN=FALSE
)