| Title: | Symmetrized Data Aggregation |
|---|---|
| Description: | We develop a new class of distribution free multiple testing rules for false discovery rate (FDR) control under general dependence. A key element in our proposal is a symmetrized data aggregation (SDA) approach to incorporating the dependence structure via sample splitting, data screening and information pooling. The proposed SDA filter first constructs a sequence of ranking statistics that fulfill global symmetry properties, and then chooses a data driven threshold along the ranking to control the FDR. For more information, see the website below and the accompanying paper: Du et al. (2023), "False Discovery Rate Control Under General Dependence By Symmetrized Data Aggregation", <doi:10.1080/01621459.2021.1945459>. Some optional functionality uses the archived R packages ‘huge’ and ‘pfa’, which are not available from CRAN’s main repositories. Users who need this optional functionality can obtain them from the CRAN Archive as follows: ‘huge’ at <https://cran.r-project.org/src/contrib/Archive/huge/>; ‘pfa’ at <https://cran.r-project.org/src/contrib/Archive/pfa/>. |
| Authors: | Lilun Du [aut, cre], Xu Guo [aut], Wenguang Sun [aut], Changliang Zou [aut] |
| Maintainer: | Lilun Du <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 1.0.1 |
| Built: | 2026-05-14 08:45:47 UTC |
| Source: | https://github.com/cran/sdafilter |
Symmetrized Data Aggregation for two-sample t-test
SDA_2S(dat_I, dat_II, alpha, Sigma_I, Sigma_II, stable = TRUE)SDA_2S(dat_I, dat_II, alpha, Sigma_I, Sigma_II, stable = TRUE)
dat_I |
a |
dat_II |
a |
alpha |
the FDR level |
Sigma_I |
the covariance matrix of sample 1; if it is missing, it will be estimated by the glasso package. |
Sigma_II |
the covariance matrix of sample 2; if it is missing, it will be estimated by the glasso package. |
stable |
If it is TRUE, the sample will be randomly splitted |
the indices of the hypotheses rejected
p = 100 n = 30 dat_I = matrix(rnorm(n*p),nrow = n) mu = rep(0, p) mu[1:10] = 1.5 dat_I = dat_I = rep(1, n)%*%t(mu) dat_II = matrix(rnorm(n*p), nrow = n) Sigma_I = diag(p) Sigma_II = diag(p) out = SDA_2S(dat_I, dat_II, alpha=0.05, Sigma_I, Sigma_II) print(out)p = 100 n = 30 dat_I = matrix(rnorm(n*p),nrow = n) mu = rep(0, p) mu[1:10] = 1.5 dat_I = dat_I = rep(1, n)%*%t(mu) dat_II = matrix(rnorm(n*p), nrow = n) Sigma_I = diag(p) Sigma_II = diag(p) out = SDA_2S(dat_I, dat_II, alpha=0.05, Sigma_I, Sigma_II) print(out)
This is the main function in the SDA paper. Other commonly used test statistics for the first part of data are also allowed in this function.
SDA_M( dat, alpha, Omega, nonsparse = FALSE, stable = TRUE, kwd = c("lasso", "de-lasso", "innovate", "pfa"), scale = TRUE )SDA_M( dat, alpha, Omega, nonsparse = FALSE, stable = TRUE, kwd = c("lasso", "de-lasso", "innovate", "pfa"), scale = TRUE )
dat |
a n by p data matrix |
alpha |
the FDR level |
Omega |
the inverse covariance matrix; if it is missing, it will be estimated by the glasso package. |
nonsparse |
If it is TRUE, the covariance matrix will be estimated by the POET package; otherwise it will be fitted by glasso by default. |
stable |
If it is TRUE, the sample will be randomly splitted |
kwd |
various methods for calculating the test statistics from the first part of data |
scale |
If it is TRUE, the test statistic from the first part of data will be standardized. |
We provide other commonly used test statistics for the first part of sample. These include the de-biased lasso, innovated transformation, and factor-adjusted test statistics.
the indices of the hypotheses rejected by the SDA method
n = 50 p = 100 rho = 0.8 Sig = matrix(rho, p, p) diag(Sig) = 1 dat <- MASS::mvrnorm(n, rep(0, p), Sig) mu = rep(0, p) mu[1:as.integer(0.1*p)]=0.5 dat = dat+rep(1, n)%*%t(mu) alpha = 0.2 out = SDA_M(dat, alpha, solve(Sig), kwd='lasso') print(out)n = 50 p = 100 rho = 0.8 Sig = matrix(rho, p, p) diag(Sig) = 1 dat <- MASS::mvrnorm(n, rep(0, p), Sig) mu = rep(0, p) mu[1:as.integer(0.1*p)]=0.5 dat = dat+rep(1, n)%*%t(mu) alpha = 0.2 out = SDA_M(dat, alpha, solve(Sig), kwd='lasso') print(out)