| Title: | Distance-Correlation Based Methods for Blind Source Separation and Dependence Analysis |
|---|---|
| Description: | Independent component analysis based on distance correlation, including a robust variant using the bowl transformation. The package provides user-facing implementations of distance covariance and distance correlation, including memory-efficient blockwise computations for large data sets. It includes a sequential ICA estimator based on minimizing distance correlation, as well as tools for analyzing serial dependence via distance autocorrelation, dependograms, and permutation-based tests. In addition, it provides functions for testing serial dependence based on distance correlation and the Hilbert–Schmidt independence criterion. The methodology is related to Matteson and Tsay (2017) <doi:10.1080/01621459.2016.1150851> and to the robust framework of Leyder et al. (2026) <doi:10.1007/s11634-026-00674-9>. |
| Authors: | Sarah Leyder [aut] (ORCID: <https://orcid.org/0000-0002-0828-0399>), Klaus Nordhausen [aut, cre] (ORCID: <https://orcid.org/0000-0002-3758-8501>) |
| Maintainer: | Klaus Nordhausen <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 1.0-0 |
| Built: | 2026-06-10 11:44:14 UTC |
| Source: | https://github.com/cran/dcorBSS |
Apply the biloop transformation to a numeric vector or matrix.
biloop_transform( X, c1 = 4, c2 = 4, do_scale = FALSE, location = NULL, scale = NULL )biloop_transform( X, c1 = 4, c2 = 4, do_scale = FALSE, location = NULL, scale = NULL )
X |
A numeric vector or matrix. Rows correspond to observations and columns to variables. |
c1 |
Positive scaling constant used before the hyperbolic tangent transformation. See details. |
c2 |
Positive scaling constant applied to the first transformed coordinate. See details. |
do_scale |
Logical; if |
location |
Optional centering vector used when |
scale |
Optional scaling vector used when |
The transformation maps each univariate variable to a two-dimensional
nonlinear embedding based on two tangential circles.
The data may first be robustly standardized (advised).
For multivariate input, the transformation is applied columnwise and the
transformed columns are concatenated.
and are defined as:
By default, no scaling is performed. If do_scale = TRUE and neither
location nor scale is supplied, the columns are centered using
stats::median() and scaled using stats::mad(). Alternatively,
user-supplied location and scale vectors can be provided.
The biloop transformation can be useful when computing dependence measures on heavy-tailed or contaminated data.
A numeric matrix with nrow(X) rows and 2 * ncol(X)
columns. If X is a vector, the result has two columns.
Leyder, S., Raymaekers, J., and Rousseeuw, P. J. (2026). Robust Distance Covariance. International Statistical Review, 94, 1–25. https://doi.org/10.1111/insr.70005
x <- rnorm(10) biloop_transform(x) X <- cbind(rnorm(10), rt(10, df = 3)) colnames(X) <- c("A","B") biloop_transform(X, do_scale = TRUE) # Robust scaling using median and Qn from robustbase if (requireNamespace("robustbase", quietly = TRUE)) { loc <- apply(X, 2, median) scl <- apply(X, 2, robustbase::Qn) biloop_transform(X, do_scale = TRUE, location = loc, scale = scl) } # Robust distance correlation after the biloop transformation if (requireNamespace("robustbase", quietly = TRUE)) { x <- rt(100, df = 3) y <- x^2 + rnorm(100, sd = 0.2) loc_x <- median(x) scl_x <- robustbase::Qn(x) loc_y <- median(y) scl_y <- robustbase::Qn(y) xb <- biloop_transform(x, do_scale = TRUE, location = loc_x, scale = scl_x) yb <- biloop_transform(y, do_scale = TRUE, location = loc_y, scale = scl_y) energy::dcor(xb, yb) }x <- rnorm(10) biloop_transform(x) X <- cbind(rnorm(10), rt(10, df = 3)) colnames(X) <- c("A","B") biloop_transform(X, do_scale = TRUE) # Robust scaling using median and Qn from robustbase if (requireNamespace("robustbase", quietly = TRUE)) { loc <- apply(X, 2, median) scl <- apply(X, 2, robustbase::Qn) biloop_transform(X, do_scale = TRUE, location = loc, scale = scl) } # Robust distance correlation after the biloop transformation if (requireNamespace("robustbase", quietly = TRUE)) { x <- rt(100, df = 3) y <- x^2 + rnorm(100, sd = 0.2) loc_x <- median(x) scl_x <- robustbase::Qn(x) loc_y <- median(y) scl_y <- robustbase::Qn(y) xb <- biloop_transform(x, do_scale = TRUE, location = loc_x, scale = scl_x) yb <- biloop_transform(y, do_scale = TRUE, location = loc_y, scale = scl_y) energy::dcor(xb, yb) }
Apply the bowl transformation proposed for robust distance-correlation based independent component analysis. The transformation is bounded, one-to-one, and continuous, and maps far outlying observations close to the origin while preserving the key implication that zero distance correlation implies independence after transformation.
bowl_transform( X, alpha = 0.9975, do_scale = FALSE, location = NULL, scale = NULL )bowl_transform( X, alpha = 0.9975, do_scale = FALSE, location = NULL, scale = NULL )
X |
A numeric data matrix or an object coercible to a matrix. Rows are observations and columns are variables. |
alpha |
A probability level used to define the cutoff
|
do_scale |
Logical; if |
location |
Optional centering vector used when |
scale |
Optional scaling vector used when |
By default, the input is not standardized before the transformation is
applied. Optional centering and scaling can be requested through
do_scale = TRUE.
Let be an observation, its Euclidean norm, and
. Define
The transformed observation is then
A numeric matrix with nrow(X) rows and ncol(X) + 1 columns.
The first ncol(X) columns contain the transformed coordinates and the
final column contains the additional radial component introduced by the
bowl transformation.
Leyder, S., Raymaekers, J., Rousseeuw, P. J., Van Deuren, T., and Verdonck, T. (2026). Independent component analysis by robust distance correlation. Advances in Data Analysis and Classification. https://doi.org/10.1007/s11634-026-00674-9
X <- cbind(rnorm(10), rt(10, df = 4)) bowl_transform(X) bowl_transform(X, do_scale = TRUE) bowl_transform(X, alpha = 0.99)X <- cbind(rnorm(10), rt(10, df = 4)) bowl_transform(X) bowl_transform(X, do_scale = TRUE) bowl_transform(X, alpha = 0.99)
Computes lagwise distance covariance or distance correlation values for a
univariate or multivariate time series and returns an object that can be
plotted as a dependogram using plot().
dacf_curve( x, transform = c("none", "bowl", "biloop"), measure = c("dcov", "dcor"), lags = NULL, block_size = NULL, ... )dacf_curve( x, transform = c("none", "bowl", "biloop"), measure = c("dcov", "dcor"), lags = NULL, block_size = NULL, ... )
x |
A numeric vector or numeric matrix containing a time series. If
|
transform |
Character string indicating whether the data should be
transformed before computing the lagwise dependence measure. One of
|
measure |
Character string specifying the lagwise dependence measure.
Either |
lags |
Integer vector of lags to compute. If |
block_size |
Either |
... |
Additional arguments passed to |
The function computes either or
for each lag . The choice is
controlled by measure.
If block_size = NULL, the standard full pairwise-distance computation
is used at each lag. If block_size is positive, a blockwise
computation is used. The block size does not need to divide n - k;
the final block may be smaller.
If transform = "bowl" or transform = "biloop", the
transformation is applied before computing the lagwise distance
autocorrelations. For multivariate input, "biloop" is allowed but is
generally not recommended.
The returned object is compatible with plot.sdt(), so the lagwise values
can be shown as a barplot or line plot in the same style as the dependograms
produced by dcor_serial_test() and hsic_serial_test().
An object of class "sdt" containing the lagwise dependence values in
estimate. It can be plotted with plot().
x <- as.numeric(arima.sim(list(ar = 0.6), n = 200)) fit <- dacf_curve(x, lags = 1:10) plot(fit) y <- as.numeric(arima.sim(list(ar = 0.6), n = 200)) fit2 <- dacf_curve(y, lags = 1:10, block_size = 64L) plot(fit2, type = "line") # Multivariate example set.seed(1) X <- cbind( as.numeric(arima.sim(list(ar = 0.5), n = 200)), as.numeric(arima.sim(list(ar = -0.3), n = 200)) ) fit3 <- dacf_curve(X, lags = 1:8) plot(fit3)x <- as.numeric(arima.sim(list(ar = 0.6), n = 200)) fit <- dacf_curve(x, lags = 1:10) plot(fit) y <- as.numeric(arima.sim(list(ar = 0.6), n = 200)) fit2 <- dacf_curve(y, lags = 1:10, block_size = 64L) plot(fit2, type = "line") # Multivariate example set.seed(1) X <- cbind( as.numeric(arima.sim(list(ar = 0.5), n = 200)), as.numeric(arima.sim(list(ar = -0.3), n = 200)) ) fit3 <- dacf_curve(X, lags = 1:8) plot(fit3)
Computes lag-lag distance autocorrelation for the rows of x. If
block_size = NULL, the standard full pairwise-distance computation is
used. If block_size is a positive integer, a blockwise algorithm is
used that avoids allocating the full distance
matrices.
dacor_large(x, lag = 1L, block_size = NULL)dacor_large(x, lag = 1L, block_size = NULL)
x |
A numeric vector or numeric matrix. Rows are observations ordered in time. |
lag |
A positive integer smaller than |
block_size |
Either |
Vectors are treated as one-column matrices.
Distance autocorrelation is computed as the distance correlation between
x[1:(n-lag), ] and x[(1+lag):n, ].
If block_size = NULL, the required pairwise distances are computed
using the standard full computation.
If block_size is a positive integer, the effective sample of size
n - lag is partitioned into consecutive blocks of size at most
block_size, and distances are accumulated block by block. Only one
block pair is stored at a time, which can substantially reduce memory usage.
The block size does not need to divide n - lag. If n - lag is
not a multiple of block_size, the final block is simply smaller.
The blockwise algorithm mainly reduces memory usage; the computational
complexity remains of order .
A numeric scalar giving the distance autocorrelation at lag
lag.
Székely, G. J., Rizzo, M. L., and Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics, 35(6), 2769–2794. https://doi.org/10.1214/009053607000000505.
Székely, G. J. and Rizzo, M. L. (2009). Brownian distance covariance. Annals of Applied Statistics, 3(4), 1236–1265. https://doi.org/10.1214/09-AOAS312.
Fokianos, K. and Pitsillou, M. (2017). Consistent testing for pairwise dependence in time series. Technometrics, 59(2), 262–270. https://doi.org/10.1080/00401706.2016.1156024.
set.seed(1) n <- 100 x <- matrix(rnorm(n * 2), n, 2) r1 <- dacor_large(x, lag = 1L) r2 <- dacor_large(x, lag = 1L, block_size = 32L) r1 r2 all.equal(r1, r2, tolerance = 1e-10)set.seed(1) n <- 100 x <- matrix(rnorm(n * 2), n, 2) r1 <- dacor_large(x, lag = 1L) r2 <- dacor_large(x, lag = 1L, block_size = 32L) r1 r2 all.equal(r1, r2, tolerance = 1e-10)
Computes lag-lag distance autocovariance for the rows of x. If
block_size = NULL, the standard full pairwise-distance computation is
used. If block_size is a positive integer, a blockwise algorithm is
used that avoids allocating the full distance
matrices.
dacov_large(x, lag = 1L, block_size = NULL) dacov2_large(x, lag = 1L, block_size = NULL)dacov_large(x, lag = 1L, block_size = NULL) dacov2_large(x, lag = 1L, block_size = NULL)
x |
A numeric vector or numeric matrix. Rows are observations ordered in time. |
lag |
A positive integer smaller than |
block_size |
Either |
dacov_large() returns the distance autocovariance and
dacov2_large() returns the corresponding squared distance
autocovariance.
Vectors are treated as one-column matrices.
The functions compare the segment x[1:(n-lag), ] with the lagged
segment x[(1+lag):n, ] using distance covariance.
If block_size = NULL, all pairwise distances for these two segments
are computed in the usual way.
If block_size is a positive integer, the same quantity is computed
block by block. The effective sample size is n - lag, and the
corresponding rows are split into consecutive blocks of size at most
block_size. Distances are accumulated over all block pairs, so only
one block pair needs to be stored at a time.
The block size does not need to divide n - lag. If n - lag is
not a multiple of block_size, the final block is simply smaller.
The blockwise algorithm mainly reduces memory usage; the computational
complexity remains of order .
For dacov_large(), a numeric scalar giving the distance
autocovariance at lag lag.
For dacov2_large(), a numeric scalar giving the squared distance
autocovariance at lag lag.
Székely, G. J., Rizzo, M. L., and Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics, 35(6), 2769–2794. https://doi.org/10.1214/009053607000000505.
Székely, G. J. and Rizzo, M. L. (2009). Brownian distance covariance. Annals of Applied Statistics, 3(4), 1236–1265. https://doi.org/10.1214/09-AOAS312.
Fokianos, K. and Pitsillou, M. (2017). Consistent testing for pairwise dependence in time series. Technometrics, 59(2), 262–270. https://doi.org/10.1080/00401706.2016.1156024.
set.seed(1) n <- 100 x <- matrix(rnorm(n * 2), n, 2) v1 <- dacov2_large(x, lag = 1L) v2 <- dacov2_large(x, lag = 1L, block_size = 32L) v1 v2 all.equal(v1, v2, tolerance = 1e-10) a1 <- dacov_large(x, lag = 2L) a2 <- dacov_large(x, lag = 2L, block_size = 32L) a1 a2 all.equal(a1, a2, tolerance = 1e-10)set.seed(1) n <- 100 x <- matrix(rnorm(n * 2), n, 2) v1 <- dacov2_large(x, lag = 1L) v2 <- dacov2_large(x, lag = 1L, block_size = 32L) v1 v2 all.equal(v1, v2, tolerance = 1e-10) a1 <- dacov_large(x, lag = 2L) a2 <- dacov_large(x, lag = 2L, block_size = 32L) a1 a2 all.equal(a1, a2, tolerance = 1e-10)
Computes the distance correlation between x and y. If
block_size = NULL, the standard full pairwise-distance computation is
used. If block_size is a positive integer, a blockwise algorithm is
used that avoids allocating the full distance matrices.
dcor_large(x, y, block_size = NULL)dcor_large(x, y, block_size = NULL)
x |
A numeric vector or numeric matrix. Rows are observations. |
y |
A numeric vector or numeric matrix. Rows are observations. |
block_size |
Either |
If block_size = NULL and both inputs are univariate, the computation
is delegated to the optimized dccpp implementation. In this special
case, the computational complexity is of order , whereas
the standard and blockwise multivariate implementations both have quadratic
complexity.
Vectors are treated as one-column matrices.
Distance correlation is obtained by combining squared distance covariance
terms for (x, y), (x, x), and (y, y).
If block_size is a positive integer, the same quantities are computed
block by block. The rows of x and y are partitioned into
consecutive blocks of size at most block_size. Distances are then
computed only for one block pair at a time, and the contributions needed for
distance covariance are accumulated across all block pairs. This can greatly
reduce memory requirements for large n.
The block size does not need to divide the sample size. If the sample size
is not a multiple of block_size, the final block is simply smaller.
The blockwise computation is intended for larger data sets where forming the
full pairwise distance matrices may be memory intensive. It computes the same
sample quantity as the standard version, up to possible small differences
from floating-point arithmetic. It mainly reduces memory usage; the computational
complexity remains of order .
Smaller block sizes use less memory but may be somewhat slower. Larger block sizes use more memory and approach the standard full computation.
A numeric scalar, the distance correlation.
Székely, G. J., Rizzo, M. L., and Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics, 35(6), 2769–2794. https://doi.org/10.1214/009053607000000505.
Székely, G. J. and Rizzo, M. L. (2009). Brownian distance covariance. Annals of Applied Statistics, 3(4), 1236–1265. https://doi.org/10.1214/09-AOAS312.
Chaudhuri, A. and Hu, W. (2019). A fast algorithm for computing distance correlation. Computational Statistics & Data Analysis, 135, 15–24. https://doi.org/10.1016/j.csda.2019.01.016.
n <- 100 x <- matrix(rnorm(n * 3), n, 3) y <- matrix(rnorm(n * 2), n, 2) r1 <- dcor_large(x, y) r2 <- dcor_large(x, y, block_size = 32L) r1 r2 all.equal(r1, r2, tolerance = 1e-10)n <- 100 x <- matrix(rnorm(n * 3), n, 3) y <- matrix(rnorm(n * 2), n, 2) r1 <- dcor_large(x, y) r2 <- dcor_large(x, y, block_size = 32L) r1 r2 all.equal(r1, r2, tolerance = 1e-10)
Test whether a univariate time series exhibits serial dependence using a portmanteau-type test based on lagwise distance covariance or distance correlation statistics.
dcor_serial_test( x, transform = c("none", "bowl", "biloop"), measure = c("dcov", "dcor"), type = c("FP", "BP"), kernel = "bartlett", B = 499L, lags = NULL, seed = NULL, block_size = NULL, ... )dcor_serial_test( x, transform = c("none", "bowl", "biloop"), measure = c("dcov", "dcor"), type = c("FP", "BP"), kernel = "bartlett", B = 499L, lags = NULL, seed = NULL, block_size = NULL, ... )
x |
A numeric vector, or a one-column numeric matrix, containing a univariate time series. |
transform |
Character string indicating whether the data should be
transformed before computing the lagwise dependence measure. One of
|
measure |
Character string specifying the lagwise dependence measure.
Either |
type |
Character string specifying the portmanteau statistic. Either
|
kernel |
Character string specifying the kernel used when
|
B |
Number of permutation resamples used to approximate the null distribution. |
lags |
Integer vector of lags to include in the test statistic. If
|
seed |
Optional random seed. |
block_size |
Either |
... |
Additional arguments passed to |
For each lag , the procedure computes a dependence measure
between and . The measure is either
distance covariance or distance correlation.
If type = "BP", the test statistic is
If type = "FP", the test statistic is
where is a kernel function and is the number of selected
lags. Currently, the Bartlett kernel is used:
The null distribution is approximated by random permutations of the observed series. This targets the null hypothesis of serial independence. The procedure is simple and broadly applicable, but it is computationally more intensive than classical autocorrelation-based tests.
The procedure assumes that the time series is completely observed. Missing observations should therefore be handled before applying the test.
In practical applications, a larger number of permutation resamples is
recommended to obtain stable and reliable p-values. The default
B = 499 is intended for exploratory use, whereas values such as
B = 2000 or larger are recommended for reporting final results.
If transform = "bowl" or transform = "biloop", the series is
first transformed and the distance correlation is then computed on the
transformed data. By default, scaling is enabled for these transformations,
unless the user explicitly supplies do_scale = FALSE.
Robust scaling can be supplied through the location and scale
arguments. For example, for univariate data one may use the sample median
together with robustbase::Qn.
An object of classes "sdt" and "htest".
Fokianos, K. and Pitsillou, M (2017). Consistent testing for pairwise dependence in time series. Technometrics, 59(2), 262–270 https://doi.org/10.1080/00401706.2016.1156024
Székely, G. J., Rizzo, M. L., and Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics, 35(6), 2769–2794. https://doi.org/10.1214/009053607000000505.
Székely, G. J. and Rizzo, M. L. (2009). Brownian distance covariance. Annals of Applied Statistics, 3(4), 1236–1265. https://doi.org/10.1214/09-AOAS312.
x <- rnorm(200) dcor_serial_test(x, B = 99) y <- as.numeric(arima.sim(list(ar = 0.6), n = 200)) dcor_serial_test(y, B = 99) dcor_serial_test(y, B = 99, measure = "dcor") dcor_serial_test(y, B = 99, type = "FP") dcor_serial_test(y, B = 99, transform = "biloop") # robust scaling example ymat <- matrix(y, ncol = 1) loc <- apply(ymat, 2, stats::median) sc <- apply(ymat, 2, robustbase::Qn) dcor_serial_test( y, transform = "bowl", B = 99, location = loc, scale = sc ) dcor_serial_test(y, B = 99, block_size = 64L)x <- rnorm(200) dcor_serial_test(x, B = 99) y <- as.numeric(arima.sim(list(ar = 0.6), n = 200)) dcor_serial_test(y, B = 99) dcor_serial_test(y, B = 99, measure = "dcor") dcor_serial_test(y, B = 99, type = "FP") dcor_serial_test(y, B = 99, transform = "biloop") # robust scaling example ymat <- matrix(y, ncol = 1) loc <- apply(ymat, 2, stats::median) sc <- apply(ymat, 2, robustbase::Qn) dcor_serial_test( y, transform = "bowl", B = 99, location = loc, scale = sc ) dcor_serial_test(y, B = 99, block_size = 64L)
Estimate an unmixing matrix by sequentially minimizing a distance-correlation criterion on whitened data. The method can be run either with the ordinary distance correlation criterion or with a robustified criterion based on the bowl transformation.
dcorICA( X, mu = NULL, scatter = NULL, transform = c("none", "bowl"), alpha = 0.9975, sweeps = ncol(X) + 1, seed = NULL, block_size = NULL, optimizer = c("cobyla", "bobyqa"), control = list() )dcorICA( X, mu = NULL, scatter = NULL, transform = c("none", "bowl"), alpha = 0.9975, sweeps = ncol(X) + 1, seed = NULL, block_size = NULL, optimizer = c("cobyla", "bobyqa"), control = list() )
X |
A numeric data matrix or an object coercible to a matrix. Rows are observations and columns are variables. |
mu |
Optional location vector used for whitening and centering. If
|
scatter |
Optional positive definite scatter matrix used for whitening.
If |
transform |
Character string specifying the dependence criterion.
Possible values are |
alpha |
Probability level passed to |
sweeps |
Number of sweeps through the sequential optimization scheme. |
seed |
Optional integer random seed for the random sweep permutations. |
block_size |
Either |
optimizer |
Character string specifying the optimizer to use. Either
|
control |
A named list of control parameters passed to
|
The returned object has class "bss" so that common extractor methods
from package JADE, such as JADE::bss.components() and stats::coef()
can be used directly.
The algorithm first whitens the observations using the supplied location and scatter estimates. It then searches for an orthogonal rotation that minimizes the distance correlation between one component and the remaining components, proceeding sequentially until all components have been extracted.
The dependence criterion is evaluated using dcor_large(). If
block_size = NULL, the standard full pairwise-distance computation is
used. If block_size is a positive integer, the same criterion is
computed blockwise in order to reduce memory usage. The block size does not
need to divide the sample size; if it does not, the final block is simply
smaller.
When the sample mean and sample covariance matrix are used for whitening and the ordinary, non-transformed distance-correlation criterion is minimized, the resulting procedure is closely related in spirit to the distance covariance ICA method of Matteson and Tsay (2017). That paper develops ICA based on distance covariance and distance correlation under classical whitening. When robust location and scatter estimates such as MCD are used for whitening and the bowl-transformed distance-correlation criterion is minimized, the procedure corresponds to the robust approach referred to as PICARD (Leyder et al. 2026).
An object of class "bss", implemented as a list with components:
Estimated unmixing matrix in the convention
.
Estimated independent components.
Location vector used for centering.
Selected dependence criterion.
Bowl-transform tuning parameter.
Block size used in the distance-correlation criterion,
or NULL if the standard full computation was used.
Optimization diagnostics from the sequential optimization steps.
Matteson, D. S. and Tsay, R. S. (2017). Independent component analysis via distance covariance. Journal of the American Statistical Association, 112(518), 623–637. https://doi.org/10.1080/01621459.2016.1150851
Leyder, S., Raymaekers, J., Rousseeuw, P. J., Van Deuren, T., and Verdonck, T. (2026). Independent component analysis by robust distance correlation. Advances in Data Analysis and Classification. https://doi.org/10.1007/s11634-026-00674-9
bowl_transform(), dcor_large(), JADE::bss.components(),
stats::coef()
n <- 400 S <- cbind(runif(n), rnorm(n), rchisq(n, df = 3)) A <- matrix(rnorm(ncol(S)^2), ncol = ncol(S)) X <- tcrossprod(S, A) fit_default <- dcorICA(X, seed = 1) fit_block <- dcorICA(X, seed = 1, block_size = 128L) fit_bobyqa <- dcorICA(X, seed = 1, optimizer = "bobyqa") fit_default$W fit_block$W fit_bobyqa$W head(fit_default$S) if (requireNamespace("JADE", quietly = TRUE)) { coef(fit_default) plot(fit_default) } # PICARD: Robust whitening + bowl criterion if (requireNamespace("robustbase", quietly = TRUE)) { mcd <- robustbase::covMcd(X) fit_picard <- dcorICA( X, mu = mcd$center, scatter = mcd$cov, transform = "bowl", seed = 1 ) if (requireNamespace("JADE", quietly = TRUE)) { JADE::MD(fit_picard$W, A) } }n <- 400 S <- cbind(runif(n), rnorm(n), rchisq(n, df = 3)) A <- matrix(rnorm(ncol(S)^2), ncol = ncol(S)) X <- tcrossprod(S, A) fit_default <- dcorICA(X, seed = 1) fit_block <- dcorICA(X, seed = 1, block_size = 128L) fit_bobyqa <- dcorICA(X, seed = 1, optimizer = "bobyqa") fit_default$W fit_block$W fit_bobyqa$W head(fit_default$S) if (requireNamespace("JADE", quietly = TRUE)) { coef(fit_default) plot(fit_default) } # PICARD: Robust whitening + bowl criterion if (requireNamespace("robustbase", quietly = TRUE)) { mcd <- robustbase::covMcd(X) fit_picard <- dcorICA( X, mu = mcd$center, scatter = mcd$cov, transform = "bowl", seed = 1 ) if (requireNamespace("JADE", quietly = TRUE)) { JADE::MD(fit_picard$W, A) } }
Computes distance covariance between x and y. If
block_size = NULL, the standard full pairwise-distance computation is
used. If block_size is a positive integer, a blockwise algorithm is
used that avoids allocating the full distance matrices.
dcov_large(x, y, block_size = NULL) dcov2_large(x, y, block_size = NULL)dcov_large(x, y, block_size = NULL) dcov2_large(x, y, block_size = NULL)
x |
A numeric vector or numeric matrix. Rows are observations. |
y |
A numeric vector or numeric matrix. Rows are observations. |
block_size |
Either |
dcov_large() returns the distance covariance and thus behaves like
dcov. dcov2_large() returns the corresponding squared
distance covariance.
Vectors are treated as one-column matrices.
The functions compute the biased V-statistic version of distance covariance
based on pairwise Euclidean distances between the rows of x and
y.
If block_size = NULL, all pairwise distances are computed in the
usual way. In the special case where both x and y are
univariate, the optimized dccpp implementation is used instead. This
implementation has computational complexity of order ,
making it preferable to the blockwise approach in the univariate setting.
If block_size is a positive integer, the same quantity is computed
block by block. The observations are split into consecutive blocks of size
at most block_size, and pairwise distances are accumulated over all
block pairs. This reduces memory usage because only distances for one block
pair are stored at a time rather than the full distance matrices.
The block size does not need to divide the sample size. If n is not
a multiple of block_size, the final block simply contains the
remaining observations and is processed in the same way as the other blocks.
The blockwise result is intended to agree with the standard computation up
to numerical differences due to floating-point arithmetic. It mainly reduces memory usage; the computational
complexity remains of order .
Smaller block sizes reduce peak memory usage but may increase computation time due to additional block overhead. Larger block sizes are closer to the standard full computation in memory behavior.
For dcov_large(), a numeric scalar giving the distance covariance.
For dcov2_large(), a numeric scalar giving the squared distance
covariance.
Székely, G. J., Rizzo, M. L., and Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics, 35(6), 2769–2794. https://doi.org/10.1214/009053607000000505.
Székely, G. J. and Rizzo, M. L. (2009). Brownian distance covariance. Annals of Applied Statistics, 3(4), 1236–1265. https://doi.org/10.1214/09-AOAS312.
Chaudhuri, A. and Hu, W. (2019). A fast algorithm for computing distance correlation. Computational Statistics & Data Analysis, 135, 15–24. https://doi.org/10.1016/j.csda.2019.01.016.
n <- 100 x <- matrix(rnorm(n * 3), n, 3) y <- matrix(rnorm(n * 2), n, 2) v1 <- dcov2_large(x, y) v2 <- dcov2_large(x, y, block_size = 32L) v1 v2 all.equal(v1, v2, tolerance = 1e-10) d1 <- dcov_large(x, y) d2 <- dcov_large(x, y, block_size = 32L) d1 d2 all.equal(d1, d2, tolerance = 1e-10) # Compare with energy::dcov() if (requireNamespace("energy", quietly = TRUE)) { d_energy <- energy::dcov(x, y) d_large <- dcov_large(x, y) d_energy d_large all.equal(d_energy, d_large, tolerance = 1e-10) }n <- 100 x <- matrix(rnorm(n * 3), n, 3) y <- matrix(rnorm(n * 2), n, 2) v1 <- dcov2_large(x, y) v2 <- dcov2_large(x, y, block_size = 32L) v1 v2 all.equal(v1, v2, tolerance = 1e-10) d1 <- dcov_large(x, y) d2 <- dcov_large(x, y, block_size = 32L) d1 d2 all.equal(d1, d2, tolerance = 1e-10) # Compare with energy::dcov() if (requireNamespace("energy", quietly = TRUE)) { d_energy <- energy::dcov(x, y) d_large <- dcov_large(x, y) d_energy d_large all.equal(d_energy, d_large, tolerance = 1e-10) }
Test whether a univariate time series exhibits serial dependence using a portmanteau-type test based on Hilbert-Schmidt Independence Criterion (HSIC) statistics over a set of lags.
hsic_serial_test( x, B = 499L, lags = NULL, normalize = TRUE, type = c("H96", "BP"), kernel = "bartlett", seed = NULL )hsic_serial_test( x, B = 499L, lags = NULL, normalize = TRUE, type = c("H96", "BP"), kernel = "bartlett", seed = NULL )
x |
A numeric vector, or a one-column numeric matrix, containing a univariate time series. |
B |
Number of permutation resamples used to approximate the null distribution. |
lags |
Integer vector of lags to include in the test statistic. If
|
normalize |
Logical; if |
type |
Character string specifying the portmanteau statistic. Either
|
kernel |
Character string specifying the kernel used when
|
seed |
Optional random seed. |
For each lag , the procedure computes a dependence measure
between and . The measure is
either HSIC or normalized HSIC, depending on the value of
normalize. By default, normalize = TRUE.
If type = "BP", the test statistic is the Box–Pierce type statistic
where is the set of selected lags.
If type = "H96", the test statistic is the kernel-weighted
Hong-type statistic
where is a kernel function and is the number of selected
lags. Currently, the Bartlett kernel is used:
The null distribution is approximated by random permutations of the observed series. This targets the null hypothesis of serial independence.
The procedure assumes that the time series is completely observed. Missing observations should therefore be handled before applying the test.
In practical applications, a larger number of permutation resamples is
recommended to obtain stable and reliable p-values. The default
B = 499 is intended for exploratory use, whereas values such as
B = 2000 or larger are recommended for reporting final results.
An object of classes "sdt" and "htest" with components:
Observed test statistic.
Permutation p-value.
Description of the procedure.
Name of the supplied data object.
HSIC or normalized HSIC values at the selected lags.
Number of resamples, number of lags used, whether normalization was used, and the statistic type.
Null hypothesis value.
The alternative hypothesis.
Integer vector of lags used in the test.
Name of the dependence measure.
Type of portmanteau statistic.
Hong, Y. (1996). Consistent testing for serial correlation of unknown form. Econometrica, 64(4), 837–864. https://doi.org/10.2307/2171847.
Gretton, A., Bousquet, O., Smola, A., and Schölkopf, B. (2005). Measuring statistical dependence with Hilbert-Schmidt norms. Algorithmic Learning Theory, 63–77. https://doi.org/10.1007/11564089_7.
x <- rnorm(80) hsic_serial_test(x, B = 99) y <- as.numeric(arima.sim(list(ar = 0.6), n = 80)) # Default: normalized HSIC with BP-type statistic hsic_serial_test(y, B = 99, lags = 1:3) # Unnormalized HSIC with BP-type statistic hsic_serial_test(y, B = 99, normalize = FALSE) # Normalized HSIC with kernel-weighted H96-type statistic hsic_serial_test(y, B = 99, type = "H96") # Compare both statistic types hsic_serial_test(y, B = 99, type = "BP") hsic_serial_test(y, B = 99, type = "H96")x <- rnorm(80) hsic_serial_test(x, B = 99) y <- as.numeric(arima.sim(list(ar = 0.6), n = 80)) # Default: normalized HSIC with BP-type statistic hsic_serial_test(y, B = 99, lags = 1:3) # Unnormalized HSIC with BP-type statistic hsic_serial_test(y, B = 99, normalize = FALSE) # Normalized HSIC with kernel-weighted H96-type statistic hsic_serial_test(y, B = 99, type = "H96") # Compare both statistic types hsic_serial_test(y, B = 99, type = "BP") hsic_serial_test(y, B = 99, type = "H96")
Computes a normalized version of the Hilbert-Schmidt Independence Criterion (HSIC) between two random vectors.
nHSIC(x, y)nHSIC(x, y)
x |
A numeric vector or numeric matrix. Rows correspond to observations. |
y |
A numeric vector or numeric matrix. Rows correspond to observations. |
The normalization rescales HSIC to reduce the influence of marginal scale effects and yields a quantity analogous in spirit to a correlation measure.
The normalized HSIC is defined as
Here, denotes the empirical HSIC value computed
using dhsic.
Vectors are treated as one-column matrices.
The function computes empirical HSIC values for (x, y),
(x, x), and (y, y) and combines them according to the
normalization formula above.
If the normalization denominator is numerically non-positive, the function
returns 0.
Small negative values caused by floating-point arithmetic are truncated to zero before taking the square root.
A numeric scalar giving the normalized HSIC value.
Gretton, A., Bousquet, O., Smola, A., and Schölkopf, B. (2005). Measuring statistical dependence with Hilbert-Schmidt norms. Algorithmic Learning Theory, 63–77. https://doi.org/10.1007/11564089_7.
set.seed(1) n <- 200 x <- matrix(rnorm(n), ncol = 1) y <- x^2 + 0.3 * rnorm(n) # Standard HSIC dHSIC::dhsic(x, y)$dHSIC # Normalized HSIC nHSIC(x, y) # Independent variables z <- matrix(rnorm(n), ncol = 1) dHSIC::dhsic(x, z)$dHSIC nHSIC(x, z)set.seed(1) n <- 200 x <- matrix(rnorm(n), ncol = 1) y <- x^2 + 0.3 * rnorm(n) # Standard HSIC dHSIC::dhsic(x, y)$dHSIC # Normalized HSIC nHSIC(x, y) # Independent variables z <- matrix(rnorm(n), ncol = 1) dHSIC::dhsic(x, z)$dHSIC nHSIC(x, z)
Objects of class "sdt" are returned by dcor_serial_test() and
hsic_serial_test(). They inherit from class "htest" and contain
additional components for lagwise dependence measures, which can be plotted
as dependograms using plot().
## S3 method for class 'sdt' plot( x, type = c("bar", "line"), xlab = "Lag", ylab = NULL, main = NULL, add_h0 = TRUE, pch = 19, lty = 1, ... )## S3 method for class 'sdt' plot( x, type = c("bar", "line"), xlab = "Lag", ylab = NULL, main = NULL, add_h0 = TRUE, pch = 19, lty = 1, ... )
x |
An object of class |
type |
Type of dependogram. Either |
xlab |
Label for the x-axis. |
ylab |
Label for the y-axis. If omitted, a default based on the stored dependence measure is used. |
main |
Main title. If omitted, a default title is used. |
add_h0 |
Logical; if |
pch |
Plotting character used for |
lty |
Line type used for |
... |
Further arguments passed to |
An object of class "sdt" contains the usual "htest" components,
together with:
Named vector of lagwise dependence values.
Integer vector of lags used in the test.
Character string naming the dependence measure, for
example "dCor" or "HSIC".
The plot method displays the lagwise dependence values either as a barplot
or as a line plot. For kernel-weighted tests such as type = "H96",
the plot shows the unweighted lagwise dependence values, not the weighted
contributions to the test statistic.
The plot method is called for its side effect of producing a plot and returns the object invisibly.
x <- as.numeric(arima.sim(list(ar = 0.6), n = 100)) fit <- dcor_serial_test(x, B = 99, lags = 1:10) plot(fit) fit2 <- hsic_serial_test(x, B = 99, lags = 1:10) plot(fit2, type = "line")x <- as.numeric(arima.sim(list(ar = 0.6), n = 100)) fit <- dcor_serial_test(x, B = 99, lags = 1:10) plot(fit) fit2 <- hsic_serial_test(x, B = 99, lags = 1:10) plot(fit2, type = "line")