Title: | Generalized Kernel Two-Sample Tests |
---|---|
Description: | A new kernel-based test and fast tests for testing whether two samples come from the same distribution. They work particularly well for high-dimensional data. Song, H. and Chen, H. (2023) <arXiv:2011.06127>. |
Authors: | Hoseung Song [aut, cre], Hao Chen [aut] |
Maintainer: | Hoseung Song <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1.4 |
Built: | 2024-12-16 06:48:53 UTC |
Source: | CRAN |
This function provides generalized kernel-based two-sample tests.
kertests(X, Y, sigma, r1=1.2, r2=0.8, perm=0)
X | The first sample. |
Y | The second sample. |
sigma | The bandwidth of the Gaussian kernel. The median heuristic is recommended; it can be obtained with med_sigma. |
r1 | The constant r1 used in the test statistics. The default is 1.2. |
r2 | The constant r2 used in the test statistics. The default is 0.8. |
perm | The number of permutations used to calculate the permutation p-values of the tests. The default is 0, in which case no permutation is performed and only the approximate p-values based on asymptotic theory are returned. Permutation can be time-consuming, so be cautious about setting this value above 10,000. |
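To illustrate what the perm argument controls, here is a minimal base R sketch of a permutation p-value for a kernel two-sample statistic. It is not the package's implementation: the statistic is a stand-in (a biased squared MMD with a Gaussian kernel) rather than GPK, the kernel parametrization exp(-d^2 / (2 * sigma^2)) and the "+1" correction in the p-value are assumptions, and the names mmd2_stat and perm_pvalue are hypothetical.

```r
# Stand-in statistic (NOT the package's GPK): biased squared MMD computed
# from a precomputed kernel matrix K and two index sets.
mmd2_stat <- function(K, idx_x, idx_y) {
  mean(K[idx_x, idx_x]) + mean(K[idx_y, idx_y]) - 2 * mean(K[idx_x, idx_y])
}

# Hypothetical helper: permutation p-value for the stand-in statistic.
perm_pvalue <- function(X, Y, sigma, perm = 1000) {
  Z <- rbind(X, Y)                      # pooled observations, one per row
  m <- nrow(X)
  N <- nrow(Z)
  D2 <- as.matrix(dist(Z))^2            # squared Euclidean distances
  K <- exp(-D2 / (2 * sigma^2))         # Gaussian kernel (assumed form)
  observed <- mmd2_stat(K, 1:m, (m + 1):N)
  permuted <- replicate(perm, {
    idx <- sample(N)                    # shuffle the sample labels
    mmd2_stat(K, idx[1:m], idx[(m + 1):N])
  })
  # Share of permuted statistics at least as large as the observed one
  (sum(permuted >= observed) + 1) / (perm + 1)
}

set.seed(1)
X <- matrix(rnorm(20 * 5), 20)
Y <- matrix(rnorm(20 * 5, mean = 2), 20)
sigma <- median(as.numeric(dist(rbind(X, Y))))
p <- perm_pvalue(X, Y, sigma, perm = 200)
```

The kernel matrix is computed once and only the row labels are shuffled, so the cost grows linearly with perm. This is why large values of perm can be slow for large samples.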
Returns a list with two components: teststat, a list with the value of each test statistic, and pval, a list with the p-values of the tests. See below for more details.
GPK | The value of the test statistic GPK. |
ZW1 | The value of the test statistic ZW1. |
ZW2 | The value of the test statistic ZW2. |
ZD | The value of the test statistic ZD. |
fGPK_appr | The approximate p-value of fGPK based on asymptotic theory. |
fGPKM_appr | The approximate p-value of fGPKM based on asymptotic theory. |
fGPK_Simes_appr | The approximate p-value of fGPK based on asymptotic theory with a Simes procedure. |
fGPKM_Simes_appr | The approximate p-value of fGPKM based on asymptotic theory with a Simes procedure. |
GPK_perm | The permutation p-value of GPK when argument ‘perm’ is positive. |
fGPK_perm | The permutation p-value of fGPK when argument ‘perm’ is positive. |
fGPKM_perm | The permutation p-value of fGPKM when argument ‘perm’ is positive. |
fGPK_Simes_perm | The permutation p-value of fGPK with a Simes procedure when argument ‘perm’ is positive. |
fGPKM_Simes_perm | The permutation p-value of fGPKM with a Simes procedure when argument ‘perm’ is positive. |
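Several of the p-values above are described as using a Simes procedure. As a rough illustration, the standard Simes combination of m p-values takes the minimum of m * p_(i) / i over the sorted p-values p_(1) <= ... <= p_(m). The base R sketch below implements that generic rule. Which component p-values the package combines is not stated here, so the function name simes_pvalue and the input values are hypothetical.

```r
# Generic Simes combination of a vector of p-values (hypothetical helper,
# not a function exported by the package).
simes_pvalue <- function(p) {
  stopifnot(is.numeric(p), length(p) > 0, all(p >= 0 & p <= 1))
  m <- length(p)
  # min over i of m * p_(i) / i, capped at 1
  min(1, min(m * sort(p) / seq_len(m)))
}

example_p <- c(0.04, 0.01)   # hypothetical component p-values
combined <- simes_pvalue(example_p)
```

The Simes p-value is never larger than the Bonferroni-adjusted minimum, m * min(p), so it is the less conservative of the two combinations.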
## Mean difference in Gaussian distribution.
d = 100
mu = 0.2
sam = 100
n = 200
set.seed(500)
X = matrix(rnorm(d*sam), sam)
Y = matrix(rnorm(d*sam,mu), sam)
sigma = med_sigma(X, Y) # median heuristic
a = kertests(X, Y, sigma, r1=1.2, r2=0.8, perm=1000)
# output results based on the permutation and the asymptotic results
# the test statistic values can be found in a$teststat
# p-values can be found in a$pval
This package can be used to determine whether two samples are from the same distribution. It uses the Gaussian kernel with the median heuristic, that is, the median of all pairwise distances among observations, as the bandwidth. Use the function med_sigma to obtain the median heuristic. The main function is kertests.
Hoseung Song and Hao Chen
Maintainer: Hoseung Song ([email protected])
Song, Hoseung, and Hao Chen (2020). Generalized kernel two-sample tests. arXiv:2011.06127
## Mean difference in Gaussian distribution.
d = 100
mu = 0.2
sam = 100
n = 200
set.seed(500)
X = matrix(rnorm(d*sam), sam)
Y = matrix(rnorm(d*sam,mu), sam)
sigma = med_sigma(X, Y) # median heuristic
a = kertests(X, Y, sigma, r1=1.2, r2=0.8, perm=1000)
# output results based on the permutation and the asymptotic results
# the test statistic values can be found in a$teststat
# p-values can be found in a$pval
This function computes the median heuristic, the most popular choice of bandwidth for the Gaussian kernel.
med_sigma(X, Y)
X | The first sample. |
Y | The second sample. |
Returns a numeric value, the median heuristic, which is the median of all pairwise distances among pooled observations, as a bandwidth of the kernel.
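The description above can be sketched in a few lines of base R. This is only an illustration of "the median of all pairwise distances among pooled observations", assuming Euclidean distance and one observation per row. The function name median_heuristic is hypothetical, and med_sigma itself may differ in details such as scaling.

```r
# Hypothetical re-implementation of the description, not the package's code.
median_heuristic <- function(X, Y) {
  Z <- rbind(X, Y)              # pool the two samples (rows = observations)
  median(as.numeric(dist(Z)))   # median of all pairwise Euclidean distances
}

# Three points in the plane: (0,0) and (1,0) in X, (0,1) in Y.
X <- matrix(c(0, 0,
              1, 0), nrow = 2, byrow = TRUE)
Y <- matrix(c(0, 1), nrow = 1)
sigma_sketch <- median_heuristic(X, Y)
```

Here the pairwise distances are 1, 1 and sqrt(2), so the median is 1.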
## Mean difference in Gaussian distribution.
d = 100
mu = 0.2
sam = 100
n = 200
set.seed(500)
X = matrix(rnorm(d*sam), sam)
Y = matrix(rnorm(d*sam,mu), sam)
sigma = med_sigma(X, Y) # median heuristic (bandwidth)