| Title: | Fast and Light-Weight Energy Statistics |
|---|---|
| Description: | Fast and memory-less computation of the energy statistics related quantities for vectors and matrices. References include: Szekely G. J. and Rizzo M. L. (2014), <doi:10.1214/14-AOS1255>. Szekely G. J. and Rizzo M. L. (2023), <ISBN:9781482242744>. Tsagris M. and Papadakis M. (2025). <doi:10.48550/arXiv.2501.02849>. |
| Authors: | Michail Tsagris [aut, cre], Manos Papadakis [aut] |
| Maintainer: | Michail Tsagris <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 1.1 |
| Built: | 2026-05-26 10:35:44 UTC |
| Source: | https://github.com/cran/estats |
Description: Fast and memory-less computation of the energy statistics related quantities for vectors and matrices.
| Package: | estats |
| Type: | Package |
| Version: | 1.1 |
| Date: | 2026-03-26 |
| License: | GPL-2 |
Michail Tsagris [email protected].
Michail Tsagris [email protected] and Manos Papadakis [email protected].
Approximate distance covariance.
adcov(x, y, bc = FALSE, K = 100)
| x | A numerical matrix. |
| y | A numerical matrix. |
| bc | If you want the bias-corrected distance correlation set this equal to TRUE. |
| K | The number of projections to perform. |
The approximate distance covariance of Huang and Huo (2022) is computed.
The approximate distance covariance.
Michail Tsagris and Manos Papadakis.
R implementation and documentation: Michail Tsagris <[email protected]>.
Szekely G.J., Rizzo M.L. and Bakirov N.K.(2007). Measuring and Testing Independence by Correlation of Distances. Annals of Statistics, 35(6):2769–2794.
Szekely G. J. and Rizzo M. L. (2023). The Energy of Data and Distance Correlation. Chapman and Hall/CRC.
Huang C. and Huo X. (2022). A statistically and numerically efficient independence test based on random projections and distance covariance. Frontiers in Applied Mathematics and Statistics, 7: 779841.
Tsagris M. and Papadakis M. (2025). Fast and light-weight energy statistics using the R package Rfast. https://arxiv.org/abs/2501.02849
x <- as.matrix(iris[1:50, 1:4])
y <- as.matrix(iris[51:100, 1:4])
res <- dvar(x[, 1])
dcor(x, y)
Distance correlation matrix.
dcorm(x, bc = FALSE)
| x | A numerical matrix. |
| bc | If you want the bias-corrected distance correlation set this equal to TRUE. |
The squared distance correlation matrix is computed.
A matrix with the pairwise squared distance correlations between all variables in x.
Michail Tsagris.
R implementation and documentation: Michail Tsagris [email protected].
G.J. Szekely, M.L. Rizzo and N. K. Bakirov (2007). Measuring and Testing Independence by Correlation of Distances. Annals of Statistics, 35(6):2769-2794.
x <- as.matrix(iris[1:50, 1:4])
res <- dcorm(x)
Distance variance, covariance and correlation.
dvar(x, bc = FALSE)
dcov(x, y, bc = FALSE)
dcor(x, y, bc = FALSE)
| x | A numerical matrix or a vector. |
| y | A numerical matrix or a vector. |
| bc | If you want the bias-corrected distance correlation set this equal to TRUE. |
The distance variance of a matrix/vector, or the distance covariance or distance correlation of two matrices, is calculated. For dcov() and dcor(), if x and y are matrices, they must have the same dimensions. We have optimized the code, using the formulas provided in Szekely and Rizzo (2023), but only for the case that both matrices are of the same dimensionality.
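For orientation, the quantities involved can be written in a few lines of base R. The sketch below is the textbook double-centering (V-statistic) estimator of Szekely, Rizzo and Bakirov (2007), not the package's optimized code: it stores full n x n distance matrices, which is the memory cost the package is designed to avoid. The names double_center, dcov2_naive and dcor2_naive are hypothetical, and the values are on the squared scale without bias correction, so they may differ from the output of dcov() and dcor().

```r
# Double-centre a distance matrix: subtract row and column means, add the grand mean.
double_center <- function(d) {
  sweep(sweep(d, 1, rowMeans(d)), 2, colMeans(d)) + mean(d)
}

# Squared sample distance covariance (V-statistic form).
dcov2_naive <- function(x, y) {
  A <- double_center(as.matrix(dist(x)))
  B <- double_center(as.matrix(dist(y)))
  mean(A * B)
}

# Squared sample distance correlation; dcov2_naive(x, x) is the squared distance variance.
dcor2_naive <- function(x, y) {
  dcov2_naive(x, y) / sqrt(dcov2_naive(x, x) * dcov2_naive(y, y))
}

x <- as.matrix(iris[1:50, 1:4])
y <- as.matrix(iris[51:100, 1:4])
dcov2_naive(x, y)
dcor2_naive(x, y)
```

Both inputs need the same number of rows, since the two centred distance matrices are multiplied element by element.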
The distance covariance or the distance variance.
For the distance correlation, a vector with the distance covariance, the distance variance of x, the distance variance of y and the distance correlation.
Michail Tsagris and Manos Papadakis.
R implementation and documentation: Michail Tsagris <[email protected]> and Manos Papadakis <[email protected]>.
Szekely G.J., Rizzo M.L. and Bakirov N.K.(2007). Measuring and Testing Independence by Correlation of Distances. Annals of Statistics, 35(6):2769–2794.
Szekely G. J. and Rizzo M. L. (2023). The Energy of Data and Distance Correlation. Chapman and Hall/CRC.
Tsagris M. and Papadakis M. (2025). Fast and light-weight energy statistics using the R package Rfast. https://arxiv.org/abs/2501.02849
x <- as.matrix(iris[1:50, 1:4])
y <- as.matrix(iris[51:100, 1:4])
res <- dvar(x[, 1])
dcor(x, y)
Energy based normality test.
normal.etest(x, R = 999)
| x | A numerical vector. |
| R | The number of Monte Carlo samples to generate. |
The energy based normality test is performed where the p-value is computed via parametric bootstrap. The function is faster than the original implementation in the R package "energy".
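A minimal base-R sketch of a parametric bootstrap of this kind is given below. It uses the closed-form univariate energy statistic for normality from Szekely and Rizzo (2005) on the standardized sample. The function names normal_estat and normal_etest_naive are hypothetical, and the "+ 1" p-value convention is an assumption, not necessarily what normal.etest() does internally.

```r
# Energy statistic for univariate normality, computed on the standardized sample.
normal_estat <- function(x) {
  n <- length(x)
  y <- sort((x - mean(x)) / sd(x))
  # Mean of E|y_i - Z| for Z ~ N(0, 1), using the closed form 2*y*pnorm(y) + 2*dnorm(y) - y.
  e_yz <- mean(2 * y * pnorm(y) + 2 * dnorm(y) - y)
  # Mean of |y_i - y_j| over all n^2 pairs, obtained from the sorted sample.
  e_yy <- 2 * sum((2 * seq_len(n) - n - 1) * y) / n^2
  # E|Z - Z'| = 2 / sqrt(pi) for independent standard normals.
  n * (2 * e_yz - 2 / sqrt(pi) - e_yy)
}

# Parametric bootstrap: simulate R normal samples of the same size and compare statistics.
normal_etest_naive <- function(x, R = 299) {
  stat <- normal_estat(x)
  boot <- replicate(R, normal_estat(rnorm(length(x))))
  c(statistic = stat, p.value = (sum(boot >= stat) + 1) / (R + 1))
}

set.seed(1)
normal_etest_naive(rnorm(100), R = 299)
```

Because the sample is standardized first, simulating from N(0, 1) is enough; no mean or variance needs to be plugged into the simulation.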
A vector with two values, the test statistic value and the Monte Carlo (parametric bootstrap) based p-value.
Michail Tsagris
R implementation and documentation: Michail Tsagris <[email protected]>.
Szekely G. J. and Rizzo M.L. (2005) A New Test for Multivariate Normality. Journal of Multivariate Analysis, 93(1): 58–80.
x <- rnorm(100)
normal.etest(x, R = 299)
Energy distance between matrices.
edist(x, y = NULL)
| x | A matrix with numbers or a list with matrices. |
| y | A second matrix with data. The number of columns of x and y must match. The number of rows can be different. |
This calculates the energy distance between two matrices. It works even for tens of thousands of rows, although the computation will take some time. See the references for more information. If you have many matrices and want to calculate the distance matrix, put them in a list and pass the list as x.
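As a point of reference, the energy distance between two samples is twice the mean between-sample distance minus the two mean within-sample distances. The base-R sketch below computes that quantity directly. The name energy_dist_naive is hypothetical, the within-sample means are taken over all n^2 pairs (V-statistic form), and no sample-size scaling factor is applied, so the value may differ from the output of edist() by a constant. Unlike the package, it builds the full pooled distance matrix in memory.

```r
energy_dist_naive <- function(x, y) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  n <- nrow(x)
  m <- nrow(y)
  # Pairwise Euclidean distances of the pooled sample.
  d <- as.matrix(dist(rbind(x, y)))
  dxy <- d[1:n, n + 1:m]        # between-sample distances
  dxx <- d[1:n, 1:n]            # within x
  dyy <- d[n + 1:m, n + 1:m]    # within y
  2 * mean(dxy) - mean(dxx) - mean(dyy)
}

x <- as.matrix(iris[1:50, 1:4])
y <- as.matrix(iris[51:100, 1:4])
energy_dist_naive(x, y)
```

The two matrices may have different numbers of rows, but they must have the same number of columns so that they can be stacked.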
If "x" is a matrix, a numerical value, the energy distance. If "x" is a list, a matrix with all pairwise distances of the matrices.
Manos Papadakis
R implementation and documentation: Manos Papadakis <[email protected]>.
Szekely G. J. and Rizzo M. L. (2004) Testing for Equal Distributions in High Dimension, InterStat, November (5).
Szekely G. J. (2000) Technical Report 03-05, E-statistics: Energy of Statistical Samples, Department of Mathematics and Statistics, Bowling Green State University.
Sejdinovic D., Sriperumbudur B., Gretton A. and Fukumizu, K. (2013). Equivalence of distance-based and RKHS-based statistics in hypothesis testing. The Annals of Statistics, 41(5): 2263–2291.
Szekely G. J. and Rizzo M. L. (2023). The Energy of Data and Distance Correlation. Chapman and Hall/CRC.
Tsagris M. and Papadakis M. (2025). Fast and light-weight energy statistics using the R package Rfast. https://arxiv.org/abs/2501.02849
x <- as.matrix(iris[1:50, 1:4])
y <- as.matrix(iris[51:100, 1:4])
res <- edist(x, y)
z <- as.matrix(iris[101:150, 1:4])
a <- list()
a[[ 1 ]] <- x
a[[ 2 ]] <- y
a[[ 3 ]] <- z
res <- edist(a)
x <- y <- z <- a <- NULL
Energy test of equal univariate distributions.
eqdist.etest(y, x, R = 999)
| y | A numerical vector or a numerical matrix. |
| x | A numerical vector or a numerical matrix. |
| R | The number of permutations to perform. |
The energy test of equal distributions is performed and the p-value is computed via permutations. Both the univariate and multivariate cases are memory-saving; the univariate case is fast, but the multivariate case is comparatively slow.
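The permutation scheme behind a test of this type can be sketched in base R for the univariate case. Everything below is illustrative: estat_uni and perm_pvalue are hypothetical names, the statistic is the unscaled energy distance, and the "+ 1" p-value convention is an assumption. The package's own implementation avoids the n x n matrices created here by outer().

```r
# Unscaled univariate energy statistic between two numeric vectors.
estat_uni <- function(y, x) {
  2 * mean(abs(outer(y, x, "-"))) -
    mean(abs(outer(y, y, "-"))) -
    mean(abs(outer(x, x, "-")))
}

# Permutation p-value: reshuffle the pooled sample and recompute the statistic.
perm_pvalue <- function(y, x, stat = estat_uni, R = 99) {
  obs <- stat(y, x)
  pooled <- c(y, x)
  n <- length(y)
  perm <- replicate(R, {
    idx <- sample(length(pooled))
    stat(pooled[idx[1:n]], pooled[idx[-(1:n)]])
  })
  (sum(perm >= obs) + 1) / (R + 1)
}

set.seed(1)
y <- rnorm(30)
x <- rnorm(40)
perm_pvalue(y, x, R = 99)
```

Under the null hypothesis the group labels are exchangeable, which is why reshuffling the pooled sample gives a valid reference distribution.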
The permutation based p-value.
Michail Tsagris.
R implementation and documentation: Michail Tsagris <[email protected]>.
Szekely G. J. and Rizzo M. L. (2004) Testing for Equal Distributions in High Dimension, InterStat, November (5).
Szekely G. J. (2000) Technical Report 03-05, E-statistics: Energy of Statistical Samples, Department of Mathematics and Statistics, Bowling Green State University.
Sejdinovic D., Sriperumbudur B., Gretton A. and Fukumizu, K. (2013). Equivalence of distance-based and RKHS-based statistics in hypothesis testing. The Annals of Statistics, 41(5): 2263–2291.
Szekely G. J. and Rizzo M. L. (2023). The Energy of Data and Distance Correlation. Chapman and Hall/CRC.
Tsagris M. and Papadakis M. (2025). Fast and light-weight energy statistics using the R package Rfast. https://www.researchgate.net/publication/387583091_Fast_and_light-weight_energy_statistics_using_the_R_package_Rfast
y <- rnorm(30)
x <- rnorm(40)
eqdist.etest(y, x, R = 99)
Hypothesis test for the distance correlation with high dimensional matrices.
dcor.ttest(x, y, logged = FALSE)
| x | A numerical matrix. |
| y | A numerical matrix (of the same dimensions). |
| logged | Do you want the logarithm of the p-value to be returned? If yes, set this to TRUE. |
The bias corrected distance correlation is used. The hypothesis test is whether the two matrices are independent or not. Note that this test attains the correct size as both the sample size and the dimensionality go to infinity. It will not have the correct type I error for univariate data or for matrices with just a couple of variables.
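To make the four returned quantities concrete, the sketch below shows the usual form of the distance correlation t-test, given an already computed bias corrected distance correlation and the sample size n. The function name dcor_t_naive is hypothetical and the formula is the standard one from the energy statistics literature (see Szekely and Rizzo, 2023); it is assumed, not confirmed, that dcor.ttest() uses exactly this form.

```r
dcor_t_naive <- function(bcdcor, n) {
  v <- n * (n - 3) / 2
  df <- v - 1
  # Statistic that is approximately Student t with df degrees of freedom under independence.
  stat <- sqrt(df) * bcdcor / sqrt(1 - bcdcor^2)
  c(bcdcor = bcdcor, df = df, statistic = stat,
    p.value = pt(stat, df = df, lower.tail = FALSE))
}

dcor_t_naive(0.1, n = 50)
```

The test is one-sided: only large positive values of the bias corrected distance correlation count as evidence against independence.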
A vector with 4 elements, the bias corrected distance correlation, the degrees of freedom, the test statistic and its associated p-value.
Manos Papadakis
R implementation and documentation: Michail Tsagris <[email protected]> and Manos Papadakis <[email protected]>.
G.J. Szekely, M.L. Rizzo and N. K. Bakirov (2007). Measuring and Testing Independence by Correlation of Distances. Annals of Statistics, 35(6): 2769–2794.
Szekely G. J. and Rizzo M. L. (2023). The Energy of Data and Distance Correlation. Chapman and Hall/CRC.
x <- as.matrix(iris[1:50, 1:4])
y <- as.matrix(iris[51:100, 1:4])
dcor.ttest(x, y)
Hypothesis testing for many partial distance correlations.
mpdcor.test(y, x, z, R = 500)
| y | A numerical vector. |
| x | A numerical matrix. |
| z | A numerical vector. |
| R | The number of permutations to implement. If R = 1, only the asymptotic p-value is returned. |
Hypothesis testing between y and each column of x, conditional on z is performed.
A matrix with three columns: the unbiased partial distance correlation, the permutation based p-value and the asymptotic p-value as proposed by Shen, Panda and Vogelstein (2022).
Michail Tsagris.
R implementation and documentation: Michail Tsagris [email protected].
Szekely G. J. and Rizzo M. L. (2014). Partial Distance Correlation with Methods for Dissimilarities. The Annals of Statistics, 42(6): 2382–2412.
Shen C., Panda S. and Vogelstein J. T. (2022). The Chi-Square Test of Distance Correlation. Journal of Computational and Graphical Statistics, 31(1): 254–262.
Szekely G. J. and Rizzo M. L. (2023). The Energy of Data and Distance Correlation. Chapman and Hall/CRC.
Tsagris M. and Papadakis M. (2025). Fast and light-weight energy statistics using the R package Rfast. https://arxiv.org/abs/2501.02849
Kontemeniotis N., Vargiakakis R. and Tsagris M. (2025). On independence testing using the (partial) distance correlation. https://arxiv.org/abs/2506.15659v1
y <- iris[, 1]
x <- matrix( rnorm(150 * 10), ncol = 10 )
z <- iris[, 2]
mpdcor.test(y, x, z)
Hypothesis testing for the partial distance correlation.
pdcor.test(x, y, z, type = 1, R = 500)
| x | A numerical vector or matrix. |
| y | A numerical vector or matrix. |
| z | A numerical vector or matrix. |
| type | If x, y, and z are all vectors, the user may select type = 2, which is even faster, but at the expense of requiring more memory. |
| R | The number of permutations to implement. If R = 1, only the asymptotic p-value is returned. |
Hypothesis testing using the unbiased partial distance correlation between x and y conditioning on z is performed. Note: currently, only two cases are supported: x, y, and z are all vectors, or they are all matrices with the same dimensions.
A vector with the unbiased partial distance correlation, the permutation based p-value and the asymptotic p-value as proposed by Shen, Panda and Vogelstein (2022).
Michail Tsagris and Nikolaos Kontemeniotis.
R implementation and documentation: Michail Tsagris [email protected] and Nikolaos Kontemeniotis [email protected].
Szekely G. J. and Rizzo M. L. (2014). Partial Distance Correlation with Methods for Dissimilarities. The Annals of Statistics, 42(6): 2382–2412.
Shen C., Panda S. and Vogelstein J. T. (2022). The Chi-Square Test of Distance Correlation. Journal of Computational and Graphical Statistics, 31(1): 254–262.
Szekely G. J. and Rizzo M. L. (2023). The Energy of Data and Distance Correlation. Chapman and Hall/CRC.
Tsagris M. and Papadakis M. (2025). Fast and light-weight energy statistics using the R package Rfast. https://arxiv.org/abs/2501.02849
Kontemeniotis N., Vargiakakis R. and Tsagris M. (2025). On independence testing using the (partial) distance correlation. https://arxiv.org/abs/2506.15659v1
x <- iris[, 1]
y <- iris[, 2]
z <- iris[, 3]
pdcor.test(x, y, z)
Many partial distance correlations.
mpdcor(y, x, z)
| y | A numerical vector. |
| x | A numerical matrix. |
| z | A numerical vector. |
This computes the unbiased pdcor between y and each column of x, conditional on the vector z.
A vector with many unbiased partial distance correlations.
Michail Tsagris.
R implementation and documentation: Michail Tsagris [email protected].
Szekely G. J. and Rizzo M. L. (2014). Partial Distance Correlation with Methods for Dissimilarities. The Annals of Statistics, 42(6): 2382–2412.
Szekely G. J. and Rizzo M. L. (2023). The Energy of Data and Distance Correlation. Chapman and Hall/CRC.
Tsagris M. and Papadakis M. (2025). Fast and light-weight energy statistics using the R package Rfast. https://arxiv.org/abs/2501.02849
Kontemeniotis N., Vargiakakis R. and Tsagris M. (2025). On independence testing using the (partial) distance correlation. https://arxiv.org/abs/2506.15659v1
y <- iris[, 1]
x <- matrix( rnorm(150 * 10), ncol = 10 )
z <- iris[, 2]
mpdcor(y, x, z)
pdcor(y, x[, 1], z)
Partial distance correlation.
pdcor(x, y, z)
| x | A numerical vector or matrix. |
| y | A numerical vector or matrix. |
| z | A numerical vector or matrix. |
The unbiased partial distance correlation between x and y conditioning on z is computed. Note: currently, only two cases are supported: x, y, and z are all vectors, or they are all matrices with the same dimensions.
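The definition of Szekely and Rizzo (2014) can be sketched in base R through U-centered distance matrices. This is a memory-hungry illustration of the definition and not the package's algorithm: it stores three n x n matrices, requires n > 3, and all function names (u_center, u_inner, bcdcor_naive, pdcor_naive) are hypothetical.

```r
# U-centering of a symmetric distance matrix (the diagonal is set to zero).
u_center <- function(d) {
  n <- nrow(d)
  r <- rowSums(d)
  A <- d - outer(r, r, "+") / (n - 2) + sum(d) / ((n - 1) * (n - 2))
  diag(A) <- 0
  A
}

# Inner product of two U-centered matrices.
u_inner <- function(A, B) {
  n <- nrow(A)
  sum(A * B) / (n * (n - 3))
}

# Bias corrected (squared) distance correlation from two U-centered matrices.
bcdcor_naive <- function(A, B) {
  u_inner(A, B) / sqrt(u_inner(A, A) * u_inner(B, B))
}

# Partial distance correlation of x and y, removing z.
pdcor_naive <- function(x, y, z) {
  A <- u_center(as.matrix(dist(x)))
  B <- u_center(as.matrix(dist(y)))
  C <- u_center(as.matrix(dist(z)))
  rxy <- bcdcor_naive(A, B)
  rxz <- bcdcor_naive(A, C)
  ryz <- bcdcor_naive(B, C)
  (rxy - rxz * ryz) / sqrt((1 - rxz^2) * (1 - ryz^2))
}

pdcor_naive(iris[, 1], iris[, 2], iris[, 3])
```

The last line of pdcor_naive has the same shape as the classical partial correlation formula, with bias corrected distance correlations in place of Pearson correlations.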
The unbiased partial distance correlation.
Michail Tsagris.
R implementation and documentation: Michail Tsagris [email protected].
Szekely G. J. and Rizzo M. L. (2014). Partial Distance Correlation with Methods for Dissimilarities. The Annals of Statistics, 42(6): 2382–2412.
Szekely G. J. and Rizzo M. L. (2023). The Energy of Data and Distance Correlation. Chapman and Hall/CRC.
Tsagris M. and Papadakis M. (2025). Fast and light-weight energy statistics using the R package Rfast. https://arxiv.org/abs/2501.02849
Kontemeniotis N., Vargiakakis R. and Tsagris M. (2025). On independence testing using the (partial) distance correlation. https://arxiv.org/abs/2506.15659v1
x <- iris[, 1]
y <- iris[, 2]
z <- iris[, 3]
pdcor(x, y, z)
Permutation based and asymptotic (approximate) distance covariance hypothesis test.
dcov.test(x, y, R = 1)
adcov.test(x, y, R = 499)
| x | A numerical matrix or a vector. For the approximate distance covariance test (adcov.test()) this can only be a matrix. |
| y | A numerical matrix (of the same dimensions) or a vector. For the approximate distance covariance test (adcov.test()) this can only be a matrix (the number of variables need not be the same). |
| R | For dcov.test(), if R = 1, the asymptotic p-value of Shen, Panda and Vogelstein (2022) is returned. If R > 1, the permutation based p-value is computed. For adcov.test(), this must be a large number because only the permutation based p-value is returned. |
The bias corrected distance correlation is used. The hypothesis test is whether the two matrices are independent or not. If R = 1, the test is based on the distance correlation. If R > 1, the test is based upon the distance covariance. For adcov.test(), the approximate distance covariance test of Huang and Huo (2022), which is based upon permutations, is performed.
A vector with 2 elements, the bias corrected distance correlation or covariance, and the associated permutation or asymptotic based p-value.
Manos Papadakis
R implementation and documentation: Michail Tsagris <[email protected]>.
Shen C., Panda S. and Vogelstein J. T. (2022). The Chi-Square Test of Distance Correlation. Journal of Computational and Graphical Statistics, 31(1): 254–262.
G.J. Szekely, M.L. Rizzo and N. K. Bakirov (2007). Measuring and Testing Independence by Correlation of Distances. Annals of Statistics, 35(6): 2769–2794.
Szekely G. J. and Rizzo M. L. (2023). The Energy of Data and Distance Correlation. Chapman and Hall/CRC.
Huang C. and Huo X. (2022). A statistically and numerically efficient independence test based on random projections and distance covariance. Frontiers in Applied Mathematics and Statistics, 7: 779841.
x <- as.matrix(iris[1:50, 1:4])
y <- as.matrix(iris[51:100, 1:4])
res <- dcov.test(x, y)