| Title: | One- And Two-Sample Hausdorff Goodness-of-Fit Test |
|---|---|
| Description: | Computes the test statistic and p-values of the one-sample and two-sample Hausdorff (H) goodness-of-fit tests. The H statistic measures the Hausdorff distance, under the Chebyshev (l-infinity) metric, between the two cumulative distribution functions (cdfs) underlying the corresponding one-sample or two-sample null hypothesis. It coincides with the side length of the largest axis-aligned square (hypercube) that can be inscribed between the two cdfs. The following cases are covered: (i) one-sample univariate; (ii) two-sample univariate; and (iii) two-sample bivariate. Exact one-sample p-values are computed in O(n^2 log n) time via the 'Exact-KS-FFT' method of Dimitrova, Kaishev, and Tan (2020) <doi:10.18637/jss.v095.i10>; two-sample p-values are obtained by permutation. A key advantage of the H test is that its sensitivity can be directed towards the left tail, body, or right tail of the distribution by tuning a scale parameter sigma, thereby maximizing its power. Numerical studies show this power to be significantly higher than that of classical tests such as the Kolmogorov-Smirnov, Cramer-von Mises, and Anderson-Darling tests, especially when the right tail of the distribution is targeted. The sensitivity of the test (left tail, body, or right tail) is governed by two parameters, psi1 and psi2, whose values must be supplied; the optimal value of the scale parameter sigma is then computed automatically. |
| Authors: | Dimitrina S. Dimitrova [aut], Yun Jia [aut, cre], Vladimir K. Kaishev [aut] |
| Maintainer: | Yun Jia |
| License: | GPL (>= 3) |
| Version: | 0.3.0 |
| Built: | 2026-05-15 22:01:25 UTC |
| Source: | https://github.com/cran/HausdorffGoF |
This package computes the test statistic and p-values of the one-sample and
two-sample Hausdorff ($H$) goodness-of-fit tests, proposed by Dimitrova,
Jia, and Kaishev (2026a) and Dimitrova, Jia, and Kaishev (2026b) respectively.
Exploiting the scale dependence of $H$, it is shown that the sensitivity
of the test can be controlled, making it left-tail, central-body, or
right-tail sensitive and thereby allowing its power to be optimized for the
problem at hand.
Simulation studies further demonstrate that, in terms of statistical power, the
$H$ test substantially outperforms classical goodness-of-fit tests such as
the Kolmogorov-Smirnov, Cramer-von Mises, and Anderson-Darling tests. The
improvement is particularly pronounced when discrepancies in the right tail of
the distribution are of primary interest, as is often the case in finance and
insurance, economics, the natural sciences, and extreme value theory. These
results make the $H$ test a compelling and effective alternative to
existing testing procedures.
In the one-sample setting, given a random sample $X_1, \ldots, X_n$
of size $n$ with empirical cdf $F_n$
and a prespecified null cdf $F$, the $H$
statistic is defined as the Hausdorff distance between the planar curves (completed graphs)
$\Gamma_{F_n}$ and $\Gamma_F$ of $F_n$ and $F$
(completed by vertical segments at jump discontinuities). Based on Lemma 2.1 of Dimitrova, Jia, and Kaishev (2026a), the latter can be expressed
in terms of the Chebyshev ($\ell_\infty$) distance $\rho_\infty$,
i.e., $\rho_\infty\big((x_1, y_1), (x_2, y_2)\big) = \max\{|x_1 - x_2|, |y_1 - y_2|\}$.
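As a rough illustration of the two ingredients named above, the following base R sketch computes the Chebyshev distance between points and the Hausdorff distance between two finite point sets. It is not the package's algorithm: the helper names `chebyshev` and `hausdorff_points` are hypothetical, and the planar curves of the test are continuous objects rather than finite sets.

```r
# Chebyshev (l-infinity) distance between two points
chebyshev <- function(p, q) max(abs(p - q))

# Hausdorff distance between two finite point sets (rows of A and B)
hausdorff_points <- function(A, B) {
  # D[i, j] = Chebyshev distance between A[i, ] and B[j, ]
  D <- outer(seq_len(nrow(A)), seq_len(nrow(B)),
             Vectorize(function(i, j) chebyshev(A[i, ], B[j, ])))
  # larger of the two directed distances (farthest point of one set from the other)
  max(max(apply(D, 1, min)), max(apply(D, 2, min)))
}

A <- rbind(c(0, 0), c(1, 1))
B <- rbind(c(0, 0.5), c(3, 1))
h_AB <- hausdorff_points(A, B)
h_AB
```

Here the point (3, 1) of `B` is at Chebyshev distance 2 from its nearest neighbour in `A`, which drives the result.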
In the two-sample setting, the null hypothesis is $H_0: F(x) = G(x)$ for all $x \in \mathbb{R}^d$, against the
alternative $H_1: F(x) \neq G(x)$ for at least one $x$, where $F$ and $G$ are unknown $d$-dimensional cdfs.
The null hypothesis is tested using two independent random samples
$X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$ of sizes
$n$ and $m$ from $F$ and $G$
respectively.
The two-sample $H$ statistic is then defined as the Hausdorff
distance between the completed graphs $\Gamma_{F_n}$ and $\Gamma_{G_m}$ of
the empirical cdfs $F_n$ and $G_m$, again under the
Chebyshev distance $\rho_\infty$. As in the one-sample case, it admits an explicit expression (Lemma 2.4 of Dimitrova, Jia, and Kaishev 2026b).
A geometric interpretation given by Theorem 2.5 of Dimitrova, Jia, and Kaishev
(2026b) is that the $H$ statistic equals the side length of the largest
square (hypercube) that can be inscribed in the region between the two curves (hypersurfaces).
In the univariate case this is equivalent to the Lévy metric between
$F_n$ and $F$ (one-sample) or between the two empirical cdfs (two-sample); see Remark 6 of Dimitrova, Jia, and Kaishev (2026b).
The package HausdorffGoF implements efficient algorithms to compute the
$H$ statistic in the one-sample (Algorithm 1 of Dimitrova, Jia, and
Kaishev 2026a), two-sample univariate (Section 3.1 of Dimitrova, Jia, and
Kaishev 2026b) and two-sample bivariate (Appendix A of Dimitrova, Jia, and
Kaishev 2026b) cases. Exact one-sample p-values are provided via a
rectangle-probability representation (Theorem 4 of Dimitrova, Jia, and Kaishev
2026a), evaluated in $O(n^2 \log n)$ time using the Exact-KS-FFT method
of Dimitrova, Kaishev, and Tan (2020) via
ks_c_cdf_Rcpp from the package
KSgeneral. Two-sample p-values are obtained by exact or Monte Carlo
permutation (see functions H_test_2s_1d and H_test_2s_2d).
A key feature of the $H$ test is that its p-values are not
scale-invariant. This property is used to make the test
left-tail, body, or right-tail sensitive and correspondingly to optimize its power. This is achieved by rescaling the data with a scale parameter $\sigma$ that is selected as follows.
In the one-sample case the scaled statistic is the $H$ statistic computed
after rescaling both the sample and the null cdf by $\sigma$.
An explicit scale-tuning formula for $\sigma$,
depending only on the null distribution and two probability levels
$\psi_1$ and $\psi_2$, allows the power to be focused on the
right tail, the left tail, or the body of the distribution. Proposition
13 of Dimitrova, Jia, and Kaishev (2026a) shows that
the scale-tuned statistic and its p-value are then invariant under any
further affine rescaling of the data.
In the two-sample case no null distribution is assumed, so $\sigma$
cannot be computed from $F$ directly. Instead, the scale is estimated from
the pooled sample via a permutation-based rule (Equation (49) of
Dimitrova, Jia, and Kaishev 2026b) built from quantiles of permuted
sub-samples of the pooled sample. Theorem 4.4 of Dimitrova, Jia, and Kaishev
(2026b) shows that the estimated scale converges in probability to the one-sample
rule applied to the pooled distribution, and
that the critical value of the scale-tuned permutation test is asymptotically
equivalent to that of the original test. The recommended choices of
$(\psi_1, \psi_2)$ for the right tail, the body, and the left tail
are the same as in the one-sample case.
Scale tuning is integrated directly into
Hausdorff_test via the scale_psi
argument, which accepts the pair $(\psi_1, \psi_2)$ and computes
$\sigma$ automatically before running the test.
The package provides the following exported functions. For the
one-sample test:
H_stat_1s_1d computes the test
statistic, H_test_1s_1d computes
the exact p-value, and
H_test_c_cdf is the low-level
rectangle-probability engine used by H_test_1s_1d and useful when
the value of the statistic is already available.
For the two-sample test:
H_test_2s_1d performs the
univariate permutation test,
H_stat_2s_1d_tr computes
the univariate two-sample statistic via the C++ transformation method,
H_stat_2s_1d_p computes
the univariate two-sample statistic via the C++ projection method,
H_test_2s_2d performs the bivariate permutation test, and
H_stat_2s_2d computes the
Hausdorff distance between two bivariate empirical cdfs.
The unified interface functions
Hausdorff_test and
Hausdorff_stat are S3 generics that dispatch
to the appropriate underlying function based on the type of input, covering
one-sample, two-sample univariate, and two-sample bivariate cases in a single
call. Null distributions are specified via
distribution, which constructs a
"NullDist" object bundling the cdf, quantile function, and density.
No arguments. This is a package-level documentation page.
One-sample test:
The algorithm to compute the $H$ statistic is based on Lemmas 2 and 3 of
Dimitrova, Jia, and Kaishev (2026a). It identifies the locally farthest vertices
of the planar curve $\Gamma_{F_n}$ from the null curve
$\Gamma_F$, then for each such vertex finds the intersection point
of a line through the vertex
with $\Gamma_F$, and returns the largest of the resulting distances.
This is implemented in
H_stat_1s_1d.
The exact p-value is computed by
H_test_1s_1d using the
rectangle-probability representation of Theorem 4 of Dimitrova, Jia, and
Kaishev (2026a), which expresses the p-value through the probability that the
order statistics $U_{(1)} \le \cdots \le U_{(n)}$ of
$n$ i.i.d. Uniform$(0,1)$ random variables stay between lower and upper boundaries.
This rectangle probability is evaluated in $O(n^2 \log n)$ time via
KSgeneral::ks_c_cdf_Rcpp, with the boundary construction handled by
H_test_c_cdf. A Monte Carlo
alternative (bootstrap p-value) is available by passing method = "mc"
to Hausdorff_test.
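To make the notion of a rectangle probability concrete, here is a minimal base R sketch that estimates, by plain simulation, the probability that every uniform order statistic falls between given lower and upper boundaries. It is only an illustration: the function name `rect_prob_mc` and the boundary vectors are hypothetical, and the package evaluates this probability exactly via KSgeneral rather than by simulation.

```r
# Monte Carlo estimate of P(a_i <= U_(i) <= b_i for all i), where
# U_(1) <= ... <= U_(n) are order statistics of n i.i.d. Uniform(0, 1) draws.
rect_prob_mc <- function(a, b, nsim = 20000) {
  stopifnot(length(a) == length(b))
  n <- length(a)
  hits <- replicate(nsim, {
    u <- sort(runif(n))
    all(u >= a & u <= b)
  })
  mean(hits)
}

set.seed(1)
p_trivial <- rect_prob_mc(a = rep(0, 5), b = rep(1, 5))  # no constraint at all
p_single  <- rect_prob_mc(a = 0.2, b = 0.7)              # n = 1, true value 0.5
c(p_trivial, p_single)
```

The simulation error shrinks only at rate one over the square root of `nsim`, which is one reason an exact evaluation is preferable for small p-values.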
Two-sample test:
The two-sample statistic $H$
is computed by
H_stat_2s_1d_tr using the
transformation method of Section 3.1 of Dimitrova, Jia, and Kaishev (2026b).
The two staircase curves $\Gamma_{F_n}$ and $\Gamma_{G_m}$ are first
modified so that one lies entirely above the other (Lemma 2.13), then the plane
is rotated by $\pi/4$ so that $H$ becomes the
supremum of half the vertical difference between the rotated curves (Section
3.1, ibid.), attained over a finite set of transformed vertices. The bivariate
version H_stat_2s_2d implements
the projection method of Appendix A of Dimitrova, Jia, and Kaishev (2026b) via
an internal C++ routine.
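One way to see where the factor of one half comes from is the elementary identity $\max(|dx|, |dy|) = (|dx + dy| + |dy - dx|)/2$. The base R sketch below checks it numerically. The map $u = x + y$, $v = y - x$ is an assumption of this sketch (a rotation by $\pi/4$ up to scaling), not necessarily the map used in the paper, and the helper names are hypothetical.

```r
# Assumed rotation (up to scaling): u = x + y, v = y - x
rotate45 <- function(p) c(u = p[[1]] + p[[2]], v = p[[2]] - p[[1]])

# Chebyshev distance computed directly in the original coordinates
chebyshev <- function(p, q) max(abs(p - q))

# Same distance from the rotated coordinates: (|du| + |dv|) / 2
cheb_via_rotation <- function(p, q) sum(abs(rotate45(p) - rotate45(q))) / 2

set.seed(1)
pts <- matrix(runif(40, -2, 2), ncol = 2)
direct  <- sapply(2:nrow(pts), function(i) chebyshev(pts[1, ], pts[i, ]))
rotated <- sapply(2:nrow(pts), function(i) cheb_via_rotation(pts[1, ], pts[i, ]))
all.equal(direct, rotated)
```

For two points sharing the same rotated abscissa ($du = 0$) the distance reduces to $|dv|/2$, that is, half the vertical gap in the rotated plane.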
Because the distribution of $H$ depends on the unknown
underlying distributions $F$ and $G$, p-values are obtained by the
permutation approach of Proposition 3.5 of Dimitrova, Jia, and Kaishev (2026b).
The permutation Hausdorff statistic is
defined by randomly splitting the pooled sample into two groups
of sizes $n$ and $m$ and computing $H$ on each split; the permutation
p-value is the proportion of splits whose statistic is at least as large as the observed one.
Theorems 3.2–3.4 of Dimitrova, Jia, and Kaishev (2026b) establish that this
permutation p-value is asymptotically equivalent to the true p-value of
$H$ under the null and under fixed or contiguous
alternatives, and that the permutation test controls the type I error at the
nominal level. This is implemented in
H_test_2s_1d, which performs exact
enumeration when Exact = TRUE and Monte Carlo permutation
otherwise.
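The exact-enumeration idea can be sketched in a few lines of base R. This is a hedged illustration only: `stat_fun` is a stand-in statistic (the sup-distance between empirical cdfs), not the Hausdorff statistic, and `exact_perm_pvalue` is a hypothetical helper, not the package's implementation.

```r
# Stand-in statistic (NOT the Hausdorff statistic): sup-distance between ecdfs
stat_fun <- function(a, b) {
  grid <- sort(c(a, b))
  max(abs(ecdf(a)(grid) - ecdf(b)(grid)))
}

exact_perm_pvalue <- function(x1, x2, stat = stat_fun) {
  pooled <- c(x1, x2)
  n <- length(x1)
  obs <- stat(x1, x2)
  # each column holds the indices of the first group; the rest form the second
  splits <- combn(length(pooled), n)
  perm_stats <- apply(splits, 2, function(idx) stat(pooled[idx], pooled[-idx]))
  # small tolerance guards against floating-point ties
  mean(perm_stats >= obs - 1e-12)
}

x1 <- c(0.1, 0.4, 0.7, 0.9)
x2 <- c(2.1, 2.5, 3.0, 3.3)
p_exact <- exact_perm_pvalue(x1, x2)
p_exact
```

With two completely separated samples of size 4, only 2 of the choose(8, 4) = 70 splits reach the observed value, so the p-value is 2/70. The number of splits grows very quickly with the sample sizes, which is why Monte Carlo permutation is the practical default.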
The asymptotic null distribution of the suitably normalised statistic
$H$ is given by Theorem 3.6
of Dimitrova, Jia, and Kaishev (2026b) and involves the supremum of a Brownian
bridge weighted by the density of the pooled distribution.
Unified interface:
Hausdorff_test and
Hausdorff_stat are S3 generics that dispatch
on the class of their second argument y: a
distribution object triggers the one-sample path;
a numeric vector triggers the two-sample univariate path; a matrix or list
triggers the two-sample bivariate path. Hausdorff_test also accepts
scale_psi to activate scale tuning in a single call, computing
$\sigma$ automatically and returning the augmented "htest"
object with $sigma and $scale_psi components.
No return value. This is a package-level documentation page. See the
individual function help pages (e.g. Hausdorff_test,
Hausdorff_stat) for the return values of each exported
function.
Dimitrina S. Dimitrova, Yun Jia, Vladimir K. Kaishev
Maintainer: Yun Jia
Dimitrina S. Dimitrova, Yun Jia, Vladimir K. Kaishev (2026a). “On a One Sample Goodness-of-fit Test Based on the Hausdorff Metric”. Submitted.
Dimitrina S. Dimitrova, Yun Jia, Vladimir K. Kaishev (2026b). “On a Two-Sample Multivariate Goodness-of-Fit Test Based on the Hausdorff Metric”. Submitted.
Dimitrina S. Dimitrova, Vladimir K. Kaishev, Senren Tan (2020). “Computing the Kolmogorov-Smirnov Distribution When the Underlying CDF is Purely Discrete, Mixed or Continuous”. Journal of Statistical Software, 95(10): 1–42. doi:10.18637/jss.v095.i10.
Creates an object of class "NullDist" that bundles the null
cumulative distribution function $F$ together with its optional
quantile function $F^{-1}$ and density $f$.
The resulting object is passed as the second argument y to
Hausdorff_test or
Hausdorff_stat to trigger the one-sample
dispatch path.
distribution(CDF, CDFinverse = NULL, pdf = NULL)

| CDF | an R function for the null cdf $F$. |
|---|---|
| CDFinverse | (optional) an R function for the null quantile function $F^{-1}$. Only in-range values of … |
| pdf | (optional) an R function for the null density $f$. |
An object of class "NullDist": a named list with elements
CDF, CDFinverse, and pdf, all stored in their
vectorised and (for CDFinverse) boundary-extended forms. A
print method is provided.
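The idea of bundling a cdf with a quantile function, and falling back to numerical inversion when none is given, can be sketched as follows. This is a simplified, hypothetical stand-in (`make_null`, class "SketchNullDist", and the `lower`/`upper` bracketing arguments are all assumptions of this sketch), not the package's distribution() constructor.

```r
# Hypothetical helper: bundle a cdf with a quantile function, inverting the
# cdf numerically with uniroot() when no quantile function is supplied.
make_null <- function(CDF, CDFinverse = NULL, lower = -1e6, upper = 1e6) {
  if (is.null(CDFinverse)) {
    CDFinverse <- function(p) {
      vapply(p, function(prob) {
        uniroot(function(x) CDF(x) - prob, lower = lower, upper = upper,
                tol = 1e-10)$root
      }, numeric(1))
    }
  }
  structure(list(CDF = CDF, CDFinverse = CDFinverse), class = "SketchNullDist")
}

# Exp(2) null with only the cdf supplied; the bracket [0, 50] covers the support
null_exp2 <- make_null(CDF = function(x) pexp(x, rate = 2), lower = 0, upper = 50)
q_num <- null_exp2$CDFinverse(c(0.1, 0.5, 0.9))
cbind(numerical = q_num, exact = qexp(c(0.1, 0.5, 0.9), rate = 2))
```

Supplying an analytic quantile function when one is available avoids the repeated root-finding.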
Dimitrina S. Dimitrova, Yun Jia, Vladimir K. Kaishev (2026a). “On a One Sample Goodness-of-fit Test Based on the Hausdorff Metric”. Submitted.
Hausdorff_test,
Hausdorff_stat.
```r
## Standard normal null (all three functions supplied)
null_norm <- distribution(CDF = pnorm, CDFinverse = qnorm, pdf = dnorm)
null_norm

## Exp(2) null -- only CDF supplied (root-finding used internally)
null_exp2 <- distribution(CDF = function(x) pexp(x, rate = 2))
null_exp2
```
Computes the Hausdorff distance between the planar
curve $\Gamma_F$ of a prespecified null cdf $F$ and the planar curve
$\Gamma_{F_n}$ of the empirical cdf $F_n$ based on a given sample
$x_1, \ldots, x_n$.
H_stat_1s_1d(x, CDF, pdf = NULL, tol = 1e-10, max.init = 1000)

| x | a numeric vector of data sample values. |
|---|---|
| CDF | a vectorised R function for the null cdf $F$. |
| pdf | (optional) a vectorised R function for the null density $f$. |
| tol | a numeric value giving the tolerance for the root-finding step. Defaults to 1e-10. |
| max.init | maximum number of Newton–Raphson iterations. Used only when a density pdf is supplied. Defaults to 1000. |
Given a random sample $x_1, \ldots, x_n$ with empirical cdf
$F_n$ and a prespecified null cdf $F$,
H_stat_1s_1d computes the Hausdorff
test statistic as the largest Chebyshev distance from the locally farthest
vertices of the planar curve $\Gamma_{F_n}$ to the null curve $\Gamma_F$,
where for each such vertex the distance is measured to the
intersection of a line through the vertex with $\Gamma_F$.
This implements Algorithm 1 (Lemmas 2 and 3) of Dimitrova, Jia, and Kaishev
(2026a).
The function first identifies the set of locally
farthest vertices, then for each vertex solves the
intersection equation to find the corresponding point on $\Gamma_F$, and
finally returns the maximum of the resulting distances. When
pdf is supplied the equation is solved via Newton–Raphson; otherwise
uniroot is used.
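The root-finding step can be illustrated with a small base R sketch. It assumes that the line through a vertex $(x_0, y_0)$ has slope $-1$, so that the intersection with a continuous cdf solves $F(x) + x = x_0 + y_0$; this is an assumption of the sketch (it is where an axis-aligned square centred at the vertex first touches a nondecreasing curve), and the helper names are hypothetical rather than the package's internals.

```r
# Solve F(x) + x = x0 + y0 with uniroot(); requires 0 <= y0 <= 1
intersect_uniroot <- function(x0, y0, CDF, tol = 1e-10) {
  g <- function(x) CDF(x) + x - (x0 + y0)
  # g is strictly increasing and, since 0 <= CDF <= 1, its root lies in [x0 - 1, x0 + 1]
  uniroot(g, lower = x0 - 1, upper = x0 + 1, tol = tol)$root
}

# Same equation via Newton-Raphson, using the density as the derivative of the cdf
intersect_newton <- function(x0, y0, CDF, pdf, tol = 1e-10, max.iter = 1000) {
  x <- x0
  for (i in seq_len(max.iter)) {
    step <- (CDF(x) + x - (x0 + y0)) / (pdf(x) + 1)  # derivative of g is pdf + 1 >= 1
    x <- x - step
    if (abs(step) < tol) break
  }
  x
}

r1 <- intersect_uniroot(0, 1, CDF = pnorm)
r2 <- intersect_newton(0, 1, CDF = pnorm, pdf = dnorm)
d  <- abs(r1 - 0)  # Chebyshev distance from (0, 1) to the intersection point
c(uniroot = r1, newton = r2, distance = d)
```

On the slope $-1$ line the horizontal and vertical offsets are equal in absolute value, so either one gives the Chebyshev distance.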
A single numeric value: the observed Hausdorff statistic $H$.
Dimitrina S. Dimitrova, Yun Jia, Vladimir K. Kaishev (2026a). “On a One Sample Goodness-of-fit Test Based on the Hausdorff Metric”. Submitted.
```r
## Compute the H statistic for a sample from Exp(1) against the Exp(1) null
set.seed(1)
x <- rexp(50, rate = 1)
H_stat <- H_stat_1s_1d(x = x,
                       CDF = function(t) pexp(t, rate = 1),
                       pdf = function(t) dexp(t, rate = 1))
H_stat

## Compute the H statistic for a sample from N(0,1) against the N(0,1) null
set.seed(2)
y <- rnorm(100)
H_stat2 <- H_stat_1s_1d(x = y, CDF = pnorm, pdf = dnorm)
H_stat2
```
A C++ implementation (via Rcpp) of the projection method for computing the
Hausdorff distance $H$ between the planar curves of
the empirical cdfs of two independent univariate samples
$X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$.
See Appendix E of Dimitrova, Jia, and Kaishev (2026b) for a numerical accuracy
and timing comparison with
H_stat_2s_1d_tr.
H_stat_2s_1d_p(a, b)

| a | a numeric vector of data sample values. |
|---|---|
| b | a numeric vector of data sample values. |
H_stat_2s_1d_p implements the
projection method of Section 3.1 of Dimitrova, Jia, and Kaishev (2026b)
and proceeds in five steps identical to those described
in the bivariate function
H_stat_2s_2d, specialised to
$d = 1$. The computational cost is governed by the number of
distinct observations.
A single numeric value: the Hausdorff distance $H$.
Dimitrina S. Dimitrova, Yun Jia, Vladimir K. Kaishev (2026b). “On a Two-Sample Multivariate Goodness-of-Fit Test Based on the Hausdorff Metric”. Submitted.
H_stat_2s_1d_tr for the
faster transformation-method implementation.
```r
set.seed(1)
a <- rnorm(50)
b <- rnorm(50)
H_stat_2s_1d_p(a, b)

## Verify agreement with the transformation-method C++ implementation
H_stat_2s_1d_tr(a, b)
```
A C++ implementation (via Rcpp) of the transformation method for computing
the Hausdorff distance $H$ between the planar curves
of the empirical cdfs of two independent univariate samples
$X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$. This is
the faster of the two exported C++ routines; see Appendix E of Dimitrova, Jia, and
Kaishev (2026b) for timing comparisons.
H_stat_2s_1d_tr(a, b)

| a | a numeric vector of data sample values. |
|---|---|
| b | a numeric vector of data sample values. |
H_stat_2s_1d_tr implements
the transformation method of Section 3.1 of Dimitrova, Jia, and Kaishev
(2026b). The plane is rotated by $\pi/4$ via a linear map
(Equation (35), ibid.), after
which the Hausdorff distance equals half the supremum of the absolute vertical
difference between the two rotated staircase curves (Equation (36), ibid.). The
algorithm proceeds in four steps.
Step 1 (signed-increment encoding). For each sample, the sorted distinct values and their relative jump heights are extracted. Both samples are shifted by subtracting the joint minimum. Each staircase curve is encoded as an interleaved signed-increment vector.
Step 2 (rotated coordinates via cumulative sums).
The rotation is performed implicitly: the two rotated coordinates of every
staircase vertex are derived from the encoded vectors via two cumulative sums.
Step 3 (greedy pointer sweep).
A single left-to-right pass through both encoded curves simultaneously identifies,
for each vertex of one curve, the segment of the other curve whose
projection interval contains it.
Step 4 (distance formula and return value). At each vertex, the candidate Hausdorff distance is computed in closed form from the parity of the current pointer position. The Hausdorff distance is the maximum of these candidates.
The cost of the entire computation is governed by the number of distinct observations.
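The flavour of Steps 1 and 2 can be conveyed with a simplified base R sketch that extracts the distinct values and relative jump heights of a sample and builds the staircase corner vertices with cumulative sums. It does not reproduce the package's interleaved signed-increment encoding or its shift by the joint minimum; the helper names and the map $u = x + y$, $v = y - x$ are assumptions of this sketch.

```r
# Sorted distinct values and relative jump heights of an empirical cdf
ecdf_jumps <- function(x) {
  vals <- sort(unique(x))
  jumps <- tabulate(match(x, vals), nbins = length(vals)) / length(x)
  list(vals = vals, jumps = jumps)
}

# Corner vertices of the completed staircase: each distinct value contributes
# a lower corner (before the jump) and an upper corner (after the jump).
staircase_vertices <- function(x) {
  e <- ecdf_jumps(x)
  upper <- cumsum(e$jumps)
  lower <- upper - e$jumps
  xs <- rep(e$vals, each = 2)
  ys <- as.vector(rbind(lower, upper))  # interleave lower and upper corners
  # rotated coordinates under the assumed map u = x + y, v = y - x
  data.frame(x = xs, y = ys, u = xs + ys, v = ys - xs)
}

verts <- staircase_vertices(c(0.3, 1.2, 1.2, 2.5))
verts
```

Because both coordinates are nondecreasing along the staircase, the rotated coordinate `u` is nondecreasing too, which is what makes a single left-to-right sweep possible.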
A single numeric value: the Hausdorff distance $H$.
Dimitrina S. Dimitrova, Yun Jia, Vladimir K. Kaishev (2026b). “On a Two-Sample Multivariate Goodness-of-Fit Test Based on the Hausdorff Metric”. Submitted.
H_stat_2s_1d_p for the C++
projection-method implementation (same result, slower).
```r
set.seed(1)
a <- rnorm(50)
b <- rnorm(50)
H_stat_2s_1d_tr(a, b)

## Verify agreement with the projection-method C++ implementation
H_stat_2s_1d_p(a, b)
```
Computes the Hausdorff distance between the bivariate empirical cdfs of two
independent two-dimensional data samples
$X_1, \ldots, X_n$ and
$Y_1, \ldots, Y_m$.
H_stat_2s_2d(x, y, tol = 1e-6)

| x | a two-column numeric matrix representing the first bivariate sample of size $n$. |
|---|---|
| y | a two-column numeric matrix representing the second bivariate sample of size $m$. |
| tol | a numeric value giving the tolerance for locating omnidirectional jump vertices in the bivariate empirical cdfs (see Definition 2.11 of Dimitrova, Jia, and Kaishev 2026b). Defaults to 1e-6. |
H_stat_2s_2d computes the Hausdorff
distance $H$ between the three-dimensional
staircase surfaces corresponding to the bivariate empirical cdfs
$F_n$ and $G_m$ of x and y,
under the Chebyshev distance
$\rho_\infty$.
It implements the projection method of Appendix A of Dimitrova, Jia, and
Kaishev (2026b), which generalises the univariate algorithm of Section 3.1 to
$d = 2$.
Bivariate empirical cdf computation.
The evaluation of the bivariate empirical cdfs $F_n$ and $G_m$
at the required grid points is performed using the fast divide-and-conquer
algorithm of Langrené and Warin (2021). For $N$ data points in
$d$ dimensions this algorithm achieves
$O(N \log(N)^{(d-1) \vee 1})$ complexity, compared to the
naive $O(N^2)$, by recursively splitting the dataset along alternating
coordinate axes and accumulating counts in each sub-problem. In the bivariate
case ($d = 2$) this gives $O(N \log N)$ complexity.
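For comparison, the naive evaluation that the divide-and-conquer algorithm improves upon can be written directly in base R. The sketch below is the quadratic baseline only (one pass over the sample per query point); `ecdf2_naive` is a hypothetical helper and this is not the Langrené and Warin algorithm.

```r
# Naive bivariate empirical cdf: proportion of sample points that are
# component-wise less than or equal to each query point.
ecdf2_naive <- function(sample, query) {
  apply(query, 1, function(q) mean(sample[, 1] <= q[1] & sample[, 2] <= q[2]))
}

smp <- rbind(c(1, 1), c(2, 3), c(3, 2), c(4, 4))
qry <- rbind(c(0, 0), c(2, 3), c(3, 3), c(4, 4))
vals <- ecdf2_naive(smp, qry)
vals
```

Evaluating at all sample points in this way costs a number of comparisons proportional to the square of the sample size, which is the cost the fast algorithm avoids.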
Projection method (Appendix A, Dimitrova, Jia, and Kaishev 2026b). The algorithm proceeds in five steps.
Step 1 (omnidirectional jump vertices of the empirical cdf).
A point is an omnidirectional jump of the empirical cdf if and only if
the jump condition of Definition 2.11 holds for both coordinates.
Each omnidirectional jump gives rise to
two vertices of the planar curve (Equation (19)).
Step 2 (truncation boundary and additional vertices).
The region between the bivariate planar curves is unbounded. Lemma 2.3 is
applied to truncate both curves so that the computation involves only a
bounded region.
Step 3 (projection of vertices).
Each vertex of one curve is projected onto a
plane along a fixed direction.
Step 4 (projection partition for the other curve).
The vertices and sides of the other curve are projected in the same
direction, forming a partition of the plane whose regions
determine which coordinate of the Chebyshev distance drives the distance to
that curve.
Step 5 (nearest-point distances via Lemma A.3).
For each locally farthest vertex, the
nearest point on the other curve is found via the Pareto frontier of
dominated projected vertices, and the closed-form distance formula of Lemma
A.3 is applied. The Hausdorff distance is the maximum of these distances.
The combinatorial search is executed by the internal C++ routine
hsearch_Rcpp (via Rcpp).
Component-wise orders.
The function computes the statistic under the standard component-wise order
($(x_1, x_2) \le (y_1, y_2)$ iff
$x_1 \le y_1$ and $x_2 \le y_2$). The
order-invariant statistic
(Equation (17) of Dimitrova, Jia, and Kaishev 2026b) is available directly
via Hausdorff_stat with
invariant = TRUE, which calls this function over the four sign-flip
orderings and takes the maximum.
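The sign-flip construction can be sketched generically in base R. In this illustration `stat_std` is a stand-in statistic (the largest absolute difference between two naive bivariate empirical cdfs over the pooled points), not the Hausdorff statistic, and all helper names are hypothetical.

```r
# Naive bivariate empirical cdf under the component-wise order
ecdf2 <- function(sample, query) {
  apply(query, 1, function(q) mean(sample[, 1] <= q[1] & sample[, 2] <= q[2]))
}

# Stand-in statistic (NOT the Hausdorff statistic)
stat_std <- function(x, y) {
  pooled <- rbind(x, y)
  max(abs(ecdf2(x, pooled) - ecdf2(y, pooled)))
}

# Maximum of the statistic over the four sign-flip orderings of the coordinates
stat_invariant <- function(x, y, stat = stat_std) {
  flips <- rbind(c(1, 1), c(-1, 1), c(1, -1), c(-1, -1))
  max(apply(flips, 1, function(s) stat(sweep(x, 2, s, `*`), sweep(y, 2, s, `*`))))
}

set.seed(1)
x <- matrix(rnorm(60), ncol = 2)
y <- matrix(rnorm(60, mean = 0.5), ncol = 2)
s_inv <- stat_invariant(x, y)
# flipping the first coordinate of both samples leaves the maximum unchanged
s_inv_flipped <- stat_invariant(sweep(x, 2, c(-1, 1), `*`),
                                sweep(y, 2, c(-1, 1), `*`))
c(s_inv, s_inv_flipped)
```

Taking the maximum costs four evaluations of the underlying statistic per call, and the same factor applies to every permutation replicate in a test.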
The inputs x and y must each be two-column numeric matrices;
otherwise the function stops with an error. To pass list input, use
Hausdorff_stat, which converts lists to matrices before
calling this function.
A single numeric value: the Hausdorff distance $H$
between the bivariate empirical cdfs of x and y.
Dimitrina S. Dimitrova, Yun Jia, Vladimir K. Kaishev (2026b). “On a Two-Sample Multivariate Goodness-of-Fit Test Based on the Hausdorff Metric”. Submitted to the Annals of Statistics.
Nicolas Langrené, Xavier Warin (2021). “Fast Multivariate Empirical Cumulative Distribution Function with Connection to Kernel Density Estimation”. Computational Statistics & Data Analysis, 162, 107267. doi:10.1016/j.csda.2021.107267.
Hausdorff_stat,
Hausdorff_test.
```r
## Hausdorff distance between two bivariate samples from the same distribution
set.seed(1)
x <- matrix(rnorm(100), ncol = 2)
y <- matrix(rnorm(100), ncol = 2)
H_stat_2s_2d(x, y)

## Hausdorff distance between two bivariate samples from different distributions
set.seed(2)
x2 <- matrix(rnorm(100), ncol = 2)
y2 <- matrix(rnorm(100, mean = 0.5), ncol = 2)
H_stat_2s_2d(x2, y2)

## List input: use Hausdorff_stat() which handles the conversion
set.seed(3)
x3 <- list(rnorm(30), rnorm(30))
y3 <- list(rnorm(30), rnorm(30))
Hausdorff_stat(x3, y3)

## Order-invariant statistic via the unified wrapper
Hausdorff_stat(x, y, invariant = TRUE)
```
A thin wrapper that first computes the one-sample Hausdorff test statistic
$H$ via
H_stat_1s_1d, then delegates
the p-value computation to
H_test_c_cdf. The split into
two functions allows the p-value to be computed independently when
the value of $H$ is already known.
H_test_1s_1d(x, CDF, CDFinverse = NULL, pdf = NULL, tol = 1e-10, max.init = 1000)

| x | a numeric vector of data sample values. |
|---|---|
| CDF | a vectorised R function for the null cdf $F$. |
| CDFinverse | (optional) a vectorised R function for the null quantile function $F^{-1}$. |
| pdf | (optional) a vectorised R function for the null density $f$. |
| tol | tolerance for root-finding, passed to both H_stat_1s_1d and H_test_c_cdf. Defaults to 1e-10. |
| max.init | maximum number of Newton–Raphson iterations, passed to H_stat_1s_1d. Defaults to 1000. |
An object of class "htest" with the following components:

| statistic | the observed Hausdorff statistic $H$, named "H". |
|---|---|
| p.value | the exact p-value. |
| method | the character string "One-sample Hausdorff test". |
| alternative | the character string "two-sided". |
| data.name | a character string giving the name of the data object. |
H_stat_1s_1d,
H_test_c_cdf,
Hausdorff_test.
```r
set.seed(1)
x <- rexp(50, rate = 1)
H_test_1s_1d(x,
             CDF = function(t) pexp(t, rate = 1),
             CDFinverse = function(p) qexp(p, rate = 1),
             pdf = function(t) dexp(t, rate = 1))
```
Tests the null hypothesis $H_0: F(x) = G(x)$ for all $x$ against
the two-sided alternative $H_1: F(x) \neq G(x)$ for at least one
$x$, using the Hausdorff distance between the empirical cdfs of two
independent univariate samples. The p-value is obtained either by exact
enumeration of all $\binom{n+m}{n}$ splits of the pooled
sample or by Monte Carlo permutation. The test statistic is computed by
H_stat_2s_1d_tr.
H_test_2s_1d(x1, x2, nboots = 2000, Exact = FALSE)

| x1 | a numeric vector of data sample values. |
|---|---|
| x2 | a numeric vector of data sample values. |
| nboots | a positive integer giving the number of Monte Carlo permutation replications, used only when Exact = FALSE. Defaults to 2000. |
| Exact | a logical value indicating whether an exact permutation p-value should be computed by enumerating all $\binom{n+m}{n}$ splits of the pooled sample. Defaults to FALSE. |
Given two independent random samples
$X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$ from
unknown univariate cdfs $F$ and $G$ respectively, the two-sample
Hausdorff ($H$) test statistic is the Hausdorff distance between the
planar curves $\Gamma_{F_n}$ and $\Gamma_{G_m}$ of the two empirical cdfs,
computed by
H_stat_2s_1d_tr. By
Theorem 2.5 of Dimitrova, Jia, and Kaishev (2026b), this equals the side length
of the largest axis-aligned square that can be inscribed in the region between
the two empirical cdf staircase curves $\Gamma_{F_n}$ and $\Gamma_{G_m}$.
Permutation p-value. Because the distribution of $H$
depends on the unknown $F$ and $G$, p-values are computed via a
permutation argument (Proposition 3.5 of Dimitrova, Jia, and Kaishev 2026b). Let
$Z = (X_1, \ldots, X_n, Y_1, \ldots, Y_m)$ be the pooled
sample. The permutation Hausdorff statistic $H^*$
is defined on all $\binom{n+m}{n}$
splits of $Z$ into groups of
sizes $n$ and $m$, and its conditional distribution given
$Z$ places equal mass on the values of the statistic computed from
the empirical cdf planar curves of each pair of permuted sub-samples.
When Exact = TRUE all splits are enumerated. When
Exact = FALSE, the p-value is estimated by Monte Carlo: in each of
nboots replications the pooled sample is randomly split into groups of
sizes $n$ and $m$ and $H^*$ is recomputed; the p-value is the
proportion of replications for which $H^* \ge H$. If the estimated
p-value is zero it is replaced by a small positive value.
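The Monte Carlo variant can be sketched in base R as follows. As before, `stat_fun` is a stand-in statistic (the sup-distance between empirical cdfs), not the Hausdorff statistic, `mc_perm_pvalue` is a hypothetical helper, and the replacement of a zero estimate is not reproduced here.

```r
# Stand-in statistic (NOT the Hausdorff statistic): sup-distance between ecdfs
stat_fun <- function(a, b) {
  grid <- sort(c(a, b))
  max(abs(ecdf(a)(grid) - ecdf(b)(grid)))
}

mc_perm_pvalue <- function(x1, x2, nboots = 2000, stat = stat_fun) {
  pooled <- c(x1, x2)
  n <- length(x1)
  obs <- stat(x1, x2)
  perm_stats <- replicate(nboots, {
    idx <- sample.int(length(pooled), n)  # random split into sizes n and m
    stat(pooled[idx], pooled[-idx])
  })
  # proportion of replicates at least as large as the observed statistic
  mean(perm_stats >= obs - 1e-12)
}

set.seed(1)
p_null <- mc_perm_pvalue(rnorm(30), rnorm(30), nboots = 500)
p_alt  <- mc_perm_pvalue(rnorm(30), rnorm(30, mean = 2), nboots = 500)
c(null = p_null, shifted = p_alt)
```

The resolution of the estimate is one over `nboots`, so small p-values call for more replications.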
Asymptotic equivalence (Theorems 3.2–3.4). Under the null hypothesis,
or under fixed or contiguous alternatives (with bounded densities $f$ and
$g$), the permutation p-value is asymptotically
equivalent to the true p-value of $H$. The permutation
test controls the type I error at the nominal level,
and the powers of the two tests are asymptotically equal.
Scale tuning. The power of the $H$ test is not
scale-invariant. The optimal scale $\sigma$ can be estimated and
applied automatically by passing scale_psi to the unified wrapper
Hausdorff_test, which calls this function
internally on the already-scaled samples.
An object of class "htest" with the following components:

| statistic | the observed Hausdorff statistic $H$, named "H". |
|---|---|
| p.value | the exact or Monte Carlo permutation p-value, i.e. the proportion of permutation statistics $H^*$ that are greater than or equal to $H$. If the Monte Carlo estimate is zero it is replaced by a small positive value. |
| method | a character string indicating the method: "Two-sample Hausdorff Test (Exact)" when all permutations are enumerated, or "Two-sample Hausdorff Test (Monte Carlo)" otherwise. |
| alternative | the character string "two-sided". |
| data.name | a character string giving the names of the two data objects. |
Dimitrina S. Dimitrova, Yun Jia, Vladimir K. Kaishev (2026a). “On a One Sample Goodness-of-fit Test Based on the Hausdorff Metric”. Submitted.
Dimitrina S. Dimitrova, Yun Jia, Vladimir K. Kaishev (2026b). “On a Two-Sample Multivariate Goodness-of-Fit Test Based on the Hausdorff Metric”. Submitted to the Annals of Statistics.
Dimitrina S. Dimitrova, Vladimir K. Kaishev, Senren Tan (2020). “Computing the Kolmogorov-Smirnov Distribution When the Underlying CDF is Purely Discrete, Mixed or Continuous”. Journal of Statistical Software, 95(10): 1–42. doi:10.18637/jss.v095.i10.
H_stat_2s_1d_tr,
Hausdorff_test.
```r
## Two-sample H test: both samples from N(0,1) (null is true)
set.seed(1)
x1 <- rnorm(30)
x2 <- rnorm(30)
H_test_2s_1d(x1, x2, nboots = 1000)

## Two-sample H test: samples from N(0,1) and N(0.5,1) (null is false)
set.seed(2)
x3 <- rnorm(30)
x4 <- rnorm(30, mean = 0.5)
H_test_2s_1d(x3, x4, nboots = 1000)

## Exact permutation test for small samples
set.seed(3)
a <- rnorm(8)
b <- rnorm(8)
H_test_2s_1d(a, b, Exact = TRUE)

## Scale-tuned test via the unified wrapper (recommended)
set.seed(4)
x_exp <- rexp(60, rate = 2)
y_exp <- rexp(60, rate = 3)
Hausdorff_test(x_exp, y_exp, scale_psi = c(0.99, 0.95), scale_nperms = 500)
```
Tests the null hypothesis $H_0: F(x) = G(x)$ for all $x \in \mathbb{R}^2$
against the two-sided alternative
$H_1: F(x) \neq G(x)$ for at least one $x$,
using the Hausdorff distance between the bivariate empirical cdfs of two
independent samples. The p-value is obtained by exact or Monte Carlo permutation.
The test statistic is computed by H_stat_2s_2d.
H_test_2s_2d(x, y, nboots = 2000, Exact = FALSE, invariant = FALSE, tol = 1e-6)

| x | a two-column numeric matrix representing the first bivariate sample of size $n$. |
|---|---|
| y | a two-column numeric matrix representing the second bivariate sample of size $m$. |
| nboots | a positive integer giving the number of Monte Carlo permutation replications, used only when Exact = FALSE. Defaults to 2000. |
| Exact | a logical value indicating whether an exact permutation p-value should be computed by enumerating all $\binom{n+m}{n}$ splits of the pooled sample. Defaults to FALSE. |
| invariant | logical. When TRUE, the order-invariant statistic is used. Defaults to FALSE. |
| tol | a numeric value giving the tolerance for locating omnidirectional jump vertices in the bivariate empirical cdfs, passed to H_stat_2s_2d. Defaults to 1e-6. |
Given two independent random samples
and
from
unknown bivariate cdfs and respectively, the two-sample
Hausdorff () test statistic is
,
computed by H_stat_2s_2d. By Theorem 2.5 of Dimitrova, Jia,
and Kaishev (2026b), this equals the side length of the largest axis-aligned
hypercube that can be inscribed in the region between the two bivariate
empirical cdf surfaces and .
Permutation p-value. Because the null distribution of
depends on the unknown and ,
p-values are obtained by permutation (Proposition 3.5 of Dimitrova, Jia, and
Kaishev 2026b). When Exact = TRUE all
splits of the pooled sample are enumerated; if the count exceeds
the function falls back to Monte Carlo
automatically. When Exact = FALSE the pooled sample is randomly split
for nboots replications and the p-value is the proportion of
permutation statistics , floored at
. The asymptotic equivalence between the
permutation p-value and the true p-value of , and
the control of type I error, follow from Theorems 3.2–3.4 of Dimitrova, Jia,
and Kaishev (2026b); see H_test_2s_1d for the full statement.
Order-invariant statistic. When invariant = TRUE the
order-invariant statistic (Equation (17) of Dimitrova, Jia, and Kaishev
2026b) is used for both the observed value and every permutation replicate.
It maximises over the sign-flip orderings of the two coordinates, making the
test invariant to the labelling of coordinate axes.
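The idea of maximising over sign-flip orderings can be illustrated with a small base R sketch. It rests on assumptions: the orderings are taken to be the four sign patterns (+/-1, +/-1) applied to the two columns, invariant_stat is a hypothetical helper, and toy_stat (a gap between bivariate empirical cdfs at the pooled points) is only a stand-in for the Hausdorff statistic.

```r
# Stand-in statistic (NOT the Hausdorff statistic): largest absolute gap
# between the two bivariate empirical cdfs, evaluated at the pooled points.
ecdf2 <- function(s, pts) {
  apply(pts, 1, function(p) mean(s[, 1] <= p[1] & s[, 2] <= p[2]))
}
toy_stat <- function(x, y) {
  pts <- rbind(x, y)
  max(abs(ecdf2(x, pts) - ecdf2(y, pts)))
}

# Hypothetical helper: maximise a statistic over the sign patterns
# (+1, +1), (-1, +1), (+1, -1), (-1, -1) applied column-wise.
invariant_stat <- function(x, y, stat) {
  signs <- as.matrix(expand.grid(c(1, -1), c(1, -1)))
  vals <- apply(signs, 1, function(s) {
    stat(sweep(x, 2, s, "*"), sweep(y, 2, s, "*"))
  })
  max(vals)
}

set.seed(4)
x <- matrix(rnorm(60), ncol = 2)
y <- matrix(rnorm(60, mean = 0.5), ncol = 2)
h_inv <- invariant_stat(x, y, toy_stat)

# Reversing the direction of the first axis in both samples leaves the
# maximised value unchanged, because the same four configurations are visited.
flip   <- c(-1, 1)
h_flip <- invariant_stat(sweep(x, 2, flip, "*"), sweep(y, 2, flip, "*"), toy_stat)
```

Because the maximum runs over all four configurations, it is never smaller than the statistic computed under the standard ordering.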
Scale tuning. For scale-tuned testing, pass scale_psi to
Hausdorff_test, which estimates the column-wise optimal scale
vector $(\sigma^*_1, \sigma^*_2)$ and calls this function
internally on the already-scaled samples.
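As a rough illustration of column-wise scaling, the base R sketch below multiplies each column of both samples by its own scale factor. The sigma values are arbitrary placeholders rather than estimates produced by the package, and treating "scaled by" as multiplication is an assumption of this sketch.

```r
# Hypothetical scale vector; in practice it would be estimated from the data.
sigma <- c(2, 0.5)

set.seed(4)
xm <- matrix(rnorm(100), ncol = 2)
ym <- matrix(rnorm(100), ncol = 2)

# Multiply column j of each sample by sigma[j] (assumed convention).
xm_scaled <- sweep(xm, 2, sigma, "*")
ym_scaled <- sweep(ym, 2, sigma, "*")
```

Both samples are scaled with the same vector so that the comparison between them is made on a common scale.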
An object of class "htest" with the following components:
statistic: the observed Hausdorff statistic (or its order-invariant version
when invariant = TRUE), named "H".
p.value: the Monte Carlo permutation p-value, i.e. the proportion
of permutation statistics greater than or equal to the observed statistic.
If zero, it is replaced by a small positive value.
method: a character string indicating the procedure:
"Two-sample bivariate Hausdorff test (Exact)" or
"Two-sample bivariate Hausdorff test (Monte Carlo permutation)",
with the suffix ", order-invariant" appended when
invariant = TRUE.
alternative: the character string "two-sided".
data.name: a character string giving the names of the two data objects.
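For readers unfamiliar with the "htest" class, the following base R sketch assembles an object with the components listed above and prints it. The helper make_htest and the numeric values are hypothetical placeholders, not output of the package.

```r
# Hypothetical constructor: builds an "htest" object with the listed components.
make_htest <- function(stat_value, p_value, method, data_name) {
  structure(
    list(statistic   = c(H = stat_value),   # named "H", as described above
         p.value     = p_value,
         method      = method,
         alternative = "two-sided",
         data.name   = data_name),
    class = "htest"
  )
}

# Placeholder numbers, for illustration only.
res <- make_htest(0.21, 0.034,
                  "Two-sample bivariate Hausdorff test (Monte Carlo permutation)",
                  "xm and ym")
print(res)            # uses the standard print method for "htest" objects
res$statistic[["H"]]  # extract the statistic by name
```

Returning an "htest" object means the result prints and can be queried like the output of the classical tests in the stats package.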
Dimitrina S. Dimitrova, Yun Jia, Vladimir K. Kaishev (2026a). “On a One Sample Goodness-of-fit Test Based on the Hausdorff Metric”. Submitted.
Dimitrina S. Dimitrova, Yun Jia, Vladimir K. Kaishev (2026b). “On a Two-Sample Multivariate Goodness-of-Fit Test Based on the Hausdorff Metric”. Submitted to the Annals of Statistics.
H_stat_2s_2d,
Hausdorff_test,
H_test_2s_1d.
## Bivariate H test: both samples from N(0,I) (null is true)
set.seed(1)
xm <- matrix(rnorm(100), ncol = 2)
ym <- matrix(rnorm(100), ncol = 2)
H_test_2s_2d(xm, ym, nboots = 1000)

## Bivariate H test: samples from N(0,I) and N(0.5,I) (null is false)
set.seed(2)
xm2 <- matrix(rnorm(100), ncol = 2)
ym2 <- matrix(rnorm(100, mean = 0.5), ncol = 2)
H_test_2s_2d(xm2, ym2, nboots = 1000)

## Exact permutation test for small samples
set.seed(3)
H_test_2s_2d(matrix(rnorm(16), ncol = 2), matrix(rnorm(16), ncol = 2), Exact = TRUE)

## Order-invariant statistic
set.seed(4)
H_test_2s_2d(xm, ym, nboots = 1000, invariant = TRUE)

## Scale-tuned test via the unified wrapper (recommended)
set.seed(4)
H_test_2s_2d(xm, ym, nboots = 1000)
Hausdorff_test(xm, ym, nboots = 1000, scale_psi = c(0.99, 0.95), scale_nperms = 500)
Computes the exact complementary cdf of the one-sample Hausdorff
goodness-of-fit statistic, evaluated at q, for a sample of size n.
For a continuous null cdf $F$, the rectangle-probability
representation of Theorem 4 of Dimitrova, Jia, and Kaishev (2026a) expresses
this probability through the probability that the order statistics of an
i.i.d. sample all lie between a lower and an upper boundary.
The function evaluates this rectangle-probability representation numerically
via KSgeneral::ks_c_cdf_Rcpp() after constructing the boundary
vectors required by that routine.
This is the low-level p-value engine used by
H_test_1s_1d and is useful when
the observed Hausdorff statistic has already been computed.
H_test_c_cdf(q, n, CDF, CDFinverse = NULL, pdf = NULL, tol = 1e-10, max.init = 1000)
q |
a single numeric value giving the observed Hausdorff statistic. |
n |
a positive integer, the sample size. |
CDF |
a vectorised R function implementing the null cdf |
CDFinverse |
(optional) a vectorised R function implementing
where |
pdf |
(optional) a vectorised R function for the null density |
tol |
numerical tolerance used in the root-finding step. Defaults to 1e-10. |
max.init |
maximum number of Newton–Raphson iterations allowed at each boundary
point. Defaults to 1000. |
For $0 < q < 1$, the function constructs two boundary vectors
f_a and f_b. If CDFinverse is
available, these are computed directly from closed-form expressions.
Otherwise, the code solves the corresponding equations numerically and maps
the resulting roots to the entries of f_a and f_b,
respectively. Entries whose values are known in advance are assigned
directly, avoiding unnecessary root-finding.
The two boundary vectors are written to a temporary file
‘Boundary_Crossing_Time.txt’, then passed to
KSgeneral::ks_c_cdf_Rcpp(n), which evaluates the required
rectangle probability in $O(n^2 \log n)$ time. The file is deleted
after use.
The trivial boundary cases q >= 1 (returns 1) and q <= 0
(returns 0) are handled analytically before any computation.
A single numeric value equal to the exact complementary cdf of the
one-sample Hausdorff statistic evaluated at q.
Dimitrina S. Dimitrova, Yun Jia, Vladimir K. Kaishev (2026a). “On a One Sample Goodness-of-fit Test Based on the Hausdorff Metric”. Submitted.
Dimitrina S. Dimitrova, Vladimir K. Kaishev, Senren Tan (2020). “Computing the Kolmogorov-Smirnov Distribution When the Underlying CDF is Purely Discrete, Mixed or Continuous”. Journal of Statistical Software, 95(10), 1–42. doi:10.18637/jss.v095.i10.
H_test_1s_1d,
H_stat_1s_1d,
Hausdorff_test.
set.seed(1)
x <- rexp(50, rate = 1)
q <- H_stat_1s_1d(x = x,
                  CDF = function(t) pexp(t, rate = 1),
                  pdf = function(t) dexp(t, rate = 1))
H_test_c_cdf(q = q, n = length(x),
             CDF = function(t) pexp(t, rate = 1),
             CDFinverse = function(p) qexp(p, rate = 1),
             pdf = function(t) dexp(t, rate = 1))
An S3 generic that computes the Hausdorff ($H$) test statistic
without a p-value, dispatching on the class of y in the same way as
Hausdorff_test:
y numeric: Two-sample statistic via H_stat_2s_1d_tr.
y NullDist: One-sample statistic via H_stat_1s_1d.
y function: One-sample shorthand; the bare cdf is passed directly to
H_stat_1s_1d with pdf = NULL.
y matrix or list: Two-sample bivariate statistic via
H_stat_2s_2d. Both x
and y must be two-column matrices (or lists of two numeric vectors)
of sizes $m$ and $n$. When invariant = TRUE the
order-invariant statistic
is returned by maximising over the sign-flip orderings of
the two coordinates (Equation (17) of Dimitrova, Jia, and Kaishev 2026b).
List input is converted to a two-column matrix and the same path is followed.
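The list-to-matrix conversion mentioned for the bivariate path might look like the base R sketch below. The helper as_two_col_matrix and its validation checks are hypothetical; the package may validate its inputs differently.

```r
# Hypothetical helper: accept a two-column numeric matrix, or a list of two
# equal-length numeric vectors, and always return a two-column matrix.
as_two_col_matrix <- function(obj) {
  if (is.list(obj)) {
    stopifnot(length(obj) == 2, length(obj[[1]]) == length(obj[[2]]))
    obj <- cbind(obj[[1]], obj[[2]])
  }
  stopifnot(is.matrix(obj), is.numeric(obj), ncol(obj) == 2)
  obj
}

set.seed(5)
x3  <- list(rnorm(30), rnorm(30))
xm3 <- as_two_col_matrix(x3)
dim(xm3)  # 30 rows, 2 columns
```

Converting once at the entry point lets the list and matrix methods share a single computational path.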
Hausdorff_stat(x, y, tol = 1e-10, ...)

## S3 method for class 'numeric'
Hausdorff_stat(x, y, tol = 1e-10, ...)

## S3 method for class 'NullDist'
Hausdorff_stat(x, y, tol = 1e-10, ...)

## S3 method for class 'function'
Hausdorff_stat(x, y, tol = 1e-10, ...)

## S3 method for class 'matrix'
Hausdorff_stat(x, y, tol = 1e-6, invariant = FALSE, ...)

## S3 method for class 'list'
Hausdorff_stat(x, y, tol = 1e-6, invariant = FALSE, ...)
x |
a numeric vector (univariate case) or a two-column numeric matrix or list of two numeric vectors (bivariate case). |
y |
one of: (i) a numeric vector (two-sample univariate), (ii) a
|
tol |
a numeric value giving the tolerance for the root-finding step (one-sample
path, default |
invariant |
logical; bivariate path only. When |
... |
further arguments (currently unused). |
A single numeric value: the observed Hausdorff statistic $H$.
distribution,
Hausdorff_test,
H_stat_1s_1d,
H_stat_2s_1d_tr,
H_stat_2s_2d.
## --- Univariate two-sample statistic ----------------------------------------
set.seed(1)
Hausdorff_stat(rnorm(40), rnorm(40))

## --- One-sample statistic: full distribution object -------------------------
set.seed(2)
x <- rexp(50, rate = 1)
null_exp <- distribution(CDF = function(t) pexp(t, rate = 1),
                         pdf = function(t) dexp(t, rate = 1))
Hausdorff_stat(x, null_exp)

## --- One-sample statistic: bare CDF shorthand -------------------------------
set.seed(3)
Hausdorff_stat(rnorm(60), pnorm)

## --- Bivariate two-sample statistic (standard ordering) ---------------------
set.seed(4)
x <- matrix(rnorm(100), ncol = 2)
y <- matrix(rnorm(100), ncol = 2)
Hausdorff_stat(x, y)

## --- Bivariate two-sample statistic (order-invariant) -----------------------
Hausdorff_stat(x, y, invariant = TRUE)

## --- Bivariate two-sample statistic: list input -----------------------------
set.seed(5)
x3 <- list(rnorm(30), rnorm(30))
y3 <- list(rnorm(30), rnorm(30))
Hausdorff_stat(x3, y3)
An S3 generic that performs the Hausdorff ($H$) goodness-of-fit test
and returns an object of class "htest", dispatching on the class of
y.
Dispatch paths.
y NullDist: One-sample test.
H_test_1s_1d is called when
method is "default" or "exact", giving the exact
rectangle-probability p-value. When method = "mc", a Monte Carlo
bootstrap p-value is computed instead: nboots samples of size $n$
are drawn from the null cdf $F$ via the quantile transform and the
proportion of bootstrap statistics exceeding the observed value
is returned.
y function: One-sample shorthand. The bare CDF is promoted to a NullDist with
CDFinverse = NULL and pdf = NULL, then dispatched as above.
Activating scale_psi with a bare function is not supported because
$\sigma^*$ requires a quantile function; an informative error
directs the user to distribution.
y numeric: Two-sample univariate test via
H_test_2s_1d.
"default" and "mc" use Monte Carlo permutation;
"exact" enumerates all permutations.
y matrix or list: Two-sample bivariate test via
H_test_2s_2d. Monte Carlo permutation is used by default;
method = "exact" enumerates all splits
(automatically falling back to Monte Carlo if the count exceeds
an internal limit). List input is converted to a two-column matrix
before proceeding.
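Dispatching on the class of the second argument, as described in the paths above, can be sketched with a toy S3 generic. The generic toy_test, its return strings, and the bare-bones NullDist object are hypothetical and only mimic the routing; they are not the package's methods.

```r
# Toy generic that dispatches on `y` rather than on the first argument.
toy_test <- function(x, y, ...) UseMethod("toy_test", y)

toy_test.NullDist <- function(x, y, ...) "one-sample (NullDist)"
toy_test.function <- function(x, y, ...) "one-sample (bare cdf)"
toy_test.numeric  <- function(x, y, ...) "two-sample univariate"
toy_test.matrix   <- function(x, y, ...) "two-sample bivariate"
# Lists are converted to two-column matrices and re-dispatched.
toy_test.list <- function(x, y, ...) {
  toy_test(do.call(cbind, x), do.call(cbind, y), ...)
}

set.seed(1)
null_obj <- structure(list(CDF = pnorm), class = "NullDist")  # minimal stand-in
r1 <- toy_test(rnorm(5), null_obj)
r2 <- toy_test(rnorm(5), pnorm)
r3 <- toy_test(rnorm(5), rnorm(5))
r4 <- toy_test(matrix(rnorm(10), ncol = 2), matrix(rnorm(10), ncol = 2))
r5 <- toy_test(list(rnorm(5), rnorm(5)), list(rnorm(5), rnorm(5)))
```

A numeric matrix carries the implicit class "matrix" ahead of "numeric", which is why the matrix method is selected before the numeric one.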
Resolution of method by path.
"default": One-sample: exact rectangle-probability p-value. Two-sample (univariate and bivariate): Monte Carlo permutation.
"exact": One-sample: exact rectangle-probability p-value (same as "default").
Two-sample univariate: full enumeration of all permutations.
Two-sample bivariate: full enumeration of all permutations (falls back to
Monte Carlo automatically if sample sizes are too large).
"mc": One-sample: Monte Carlo bootstrap p-value. Bootstrap samples are drawn
from the null cdf $F$ via CDFinverse if supplied in the NullDist
object, otherwise by uniroot inversion of CDF.
Two-sample (univariate and bivariate): Monte Carlo permutation.
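The one-sample "mc" path can be sketched in base R as follows. This is a sketch under assumptions: draw_null and mc_pvalue are hypothetical helpers, the uniroot search interval is an arbitrary choice, the comparison uses "greater than or equal to", and the Kolmogorov-Smirnov distance stands in for the Hausdorff statistic.

```r
# Stand-in statistic (NOT the Hausdorff statistic): Kolmogorov-Smirnov distance.
ks_stat <- function(x, CDF) {
  n <- length(x)
  u <- CDF(sort(x))
  max(pmax(seq_len(n) / n - u, u - (seq_len(n) - 1) / n))
}

# Hypothetical helper: draw from the null by the quantile transform, using
# CDFinverse when available and uniroot inversion of CDF otherwise.
draw_null <- function(n, CDF, CDFinverse = NULL, interval = c(-1e3, 1e3)) {
  u <- runif(n)
  if (!is.null(CDFinverse)) return(CDFinverse(u))
  vapply(u, function(p) {
    uniroot(function(t) CDF(t) - p, interval, tol = 1e-10)$root
  }, numeric(1))
}

# Hypothetical helper: Monte Carlo bootstrap p-value.
mc_pvalue <- function(x, CDF, CDFinverse = NULL, nboots = 1000) {
  observed <- ks_stat(x, CDF)
  boot <- replicate(nboots,
                    ks_stat(draw_null(length(x), CDF, CDFinverse), CDF))
  mean(boot >= observed)
}

set.seed(1)
x_null <- rexp(50, rate = 1)   # data drawn from the null
x_alt  <- rexp(50, rate = 3)   # data drawn from a different distribution
p_null <- mc_pvalue(x_null, pexp, qexp, nboots = 500)
p_alt  <- mc_pvalue(x_alt,  pexp, qexp, nboots = 500)
```

Supplying a closed-form quantile function avoids one root search per simulated observation, which is why it takes priority when present.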
Scale tuning (scale_psi).
When scale_psi is supplied, the optimal scale $\sigma^*$ is computed before
the test, the data are rescaled, and the test is run on the scaled inputs.
One-sample (y is NullDist): $\sigma^*$ is computed using
the closed-form formula of Proposition 13 of Dimitrova, Jia, and Kaishev
(2026a).
The required quantiles are evaluated by priority: CDFinverse
(if supplied in the NullDist object), then Newton–Raphson (if
pdf is supplied), then uniroot applied to CDF.
The Newton–Raphson iteration is bounded by max.init steps. The
sample and the null distribution are both rescaled by $\sigma^*$.
Only functions that exist in
the original NullDist object are scaled; absent ones remain
NULL.
Two-sample univariate (y is numeric): $\sigma^*$ is
estimated from the pooled sample by averaging over scale_nperms
random splits (Equation (49) of Dimitrova, Jia, and Kaishev 2026b).
Both samples are scaled by the scalar $\sigma^*$.
Two-sample bivariate (y is matrix or list): the univariate
formula is applied column-wise, yielding a length-2 vector
$(\sigma^*_1, \sigma^*_2)$. Column $j$ of each matrix
is scaled by $\sigma^*_j$.
In all cases, scale_psi and the computed scale are attached to the
returned "htest" object as $scale_psi and $sigma,
and the string "(scale-tuned)" is appended to $method.
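The priority order for evaluating a quantile (closed-form inverse, then Newton–Raphson, then uniroot) can be sketched in base R. The helper quantile_by_priority, its start and interval arguments, and the fallback to uniroot when the Newton–Raphson iteration fails are assumptions of this sketch, not documented behaviour of the package.

```r
# Hypothetical helper: solve CDF(t) = p using the best available tool.
quantile_by_priority <- function(p, CDF, CDFinverse = NULL, pdf = NULL,
                                 tol = 1e-10, max.init = 1000,
                                 start = 1, interval = c(-1e3, 1e3)) {
  # 1. Closed-form quantile function, if supplied.
  if (!is.null(CDFinverse)) return(CDFinverse(p))
  # 2. Newton-Raphson on CDF(t) - p = 0, with the density as derivative.
  if (!is.null(pdf)) {
    t <- start
    for (i in seq_len(max.init)) {
      step <- (CDF(t) - p) / pdf(t)
      t <- t - step
      if (!is.finite(t)) break            # iteration left the support
      if (abs(step) < tol) return(t)      # converged
    }
  }
  # 3. Bracketing root search (also reached if Newton-Raphson did not converge).
  uniroot(function(t) CDF(t) - p, interval, tol = tol)$root
}

q_inv    <- quantile_by_priority(0.95, pexp, CDFinverse = qexp)
q_newton <- quantile_by_priority(0.95, pexp, pdf = dexp)
q_root   <- quantile_by_priority(0.95, pexp)
```

All three routes target the same quantile, so they should agree up to the numerical tolerance; the ordering simply prefers the cheapest and most accurate route available.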
Hausdorff_test(x, y, method = "default", nboots = 2000, tol = 1e-10,
               scale_psi = NULL, scale_nperms = 1000, max.init = 1000, ...)

## S3 method for class 'NullDist'
Hausdorff_test(x, y, method = "default", nboots = 2000, tol = 1e-10,
               scale_psi = NULL, scale_nperms = 1000, max.init = 1000, ...)

## S3 method for class 'function'
Hausdorff_test(x, y, method = "default", nboots = 2000, tol = 1e-10,
               scale_psi = NULL, scale_nperms = 1000, max.init = 1000, ...)

## S3 method for class 'numeric'
Hausdorff_test(x, y, method = "default", nboots = 2000, tol = 1e-10,
               scale_psi = NULL, scale_nperms = 1000, ...)

## S3 method for class 'matrix'
Hausdorff_test(x, y, method = "default", nboots = 2000, tol = 1e-6,
               scale_psi = NULL, scale_nperms = 1000, max.init = 1000,
               invariant = FALSE, ...)

## S3 method for class 'list'
Hausdorff_test(x, y, method = "default", nboots = 2000, tol = 1e-6,
               scale_psi = NULL, scale_nperms = 1000, max.init = 1000,
               invariant = FALSE, ...)
x |
a numeric vector (one-sample or two-sample univariate), or a two-column numeric matrix or list of two numeric vectors (bivariate). |
y |
one of: (i) a |
method |
|
nboots |
a positive integer: number of Monte Carlo replications for the test
p-value (one-sample bootstrap or two-sample permutation).
Defaults to 2000. |
tol |
numeric tolerance for root-finding in the one-sample statistic and p-value
computation (default |
scale_psi |
|
scale_nperms |
a positive integer: number of Monte Carlo splits used to estimate
|
max.init |
a positive integer: maximum number of Newton–Raphson iterations when
inverting |
invariant |
logical; bivariate path only. When |
... |
further arguments (currently unused). |
An object of class "htest" with components:
statistic: the observed Hausdorff statistic (computed on the scaled
data when scale_psi is supplied), named "H".
p.value: the p-value computed according to the active path and method.
method: a character string identifying the procedure. The suffix
"(scale-tuned)" is appended when scale_psi is supplied.
alternative: the character string "two-sided".
data.name: a character string giving the names of the data objects.
sigma: (only when scale_psi is supplied) the computed optimal scale:
a scalar for the one-sample and two-sample univariate paths; a length-2
vector for the bivariate path.
scale_psi: (only when scale_psi is supplied) the validated sorted
c(psi_low, psi_high) vector used to compute the optimal scale.
Dimitrina S. Dimitrova, Yun Jia, Vladimir K. Kaishev (2026a). “On a One Sample Goodness-of-fit Test Based on the Hausdorff Metric”. Submitted.
Dimitrina S. Dimitrova, Yun Jia, Vladimir K. Kaishev (2026b). “On a Two-Sample Multivariate Goodness-of-Fit Test Based on the Hausdorff Metric”. Submitted.
Dimitrina S. Dimitrova, Vladimir K. Kaishev, Senren Tan (2020). “Computing the Kolmogorov-Smirnov Distribution When the Underlying CDF is Purely Discrete, Mixed or Continuous”. Journal of Statistical Software, 95(10), 1–42. doi:10.18637/jss.v095.i10.
distribution,
Hausdorff_stat,
H_test_1s_1d,
H_test_c_cdf,
H_test_2s_1d,
H_stat_2s_2d.
## ---- One-sample, no scaling ------------------------------------------------
set.seed(1)
x <- rexp(50, rate = 1)
null_e <- distribution(CDF = function(t) pexp(t, rate = 1),
                       CDFinverse = function(p) qexp(p, rate = 1),
                       pdf = function(t) dexp(t, rate = 1))
Hausdorff_test(x, null_e)

## ---- One-sample, scale-tuned: right tail -----------------------------------
res <- Hausdorff_test(x, null_e, scale_psi = c(0.99, 0.95))
res$method  # "... (scale-tuned)"
res$sigma
res$scale_psi

## ---- One-sample, Monte Carlo p-value, scale-tuned: body --------------------
Hausdorff_test(x, null_e, method = "mc", nboots = 1000, scale_psi = c(0.70, 0.40))

## ---- One-sample, only CDF and pdf supplied (Newton-Raphson sigma*) ---------
null_cdf_only <- distribution(CDF = function(t) pexp(t, rate = 1),
                              pdf = function(t) dexp(t, rate = 1))
Hausdorff_test(x, null_cdf_only, scale_psi = c(0.99, 0.95))

## ---- Two-sample univariate, no scaling -------------------------------------
set.seed(2)
x1 <- rnorm(40); x2 <- rnorm(40)
Hausdorff_test(x1, x2)

## ---- Two-sample univariate, scale-tuned ------------------------------------
res2 <- Hausdorff_test(x1, x2, scale_psi = c(0.99, 0.95), scale_nperms = 500)
res2$sigma

## ---- Two-sample univariate, exact permutation, small samples ---------------
set.seed(3)
Hausdorff_test(rnorm(8), rnorm(8), method = "exact")

## ---- Two-sample bivariate, no scaling --------------------------------------
set.seed(4)
xm <- matrix(rnorm(100), ncol = 2)
ym <- matrix(rnorm(100, mean = 0.5), ncol = 2)
Hausdorff_test(xm, ym, nboots = 1000)

## ---- Two-sample bivariate, scale-tuned (column-wise sigma*) ----------------
res3 <- Hausdorff_test(xm, ym, nboots = 1000, scale_psi = c(0.70, 0.40),
                       scale_nperms = 500)
res3$sigma  # length-2 vector, one sigma* per coordinate

## ---- Two-sample bivariate, order-invariant statistic -----------------------
Hausdorff_test(xm, ym, nboots = 1000, invariant = TRUE)