| Title: | Coverage Correlation Coefficient and Testing for Independence |
|---|---|
| Description: | Computes the coverage correlation coefficient introduced in <doi:10.48550/arXiv.2508.06402>, a statistical measure that quantifies dependence between two random vectors by computing the union volume of data-centered hypercubes in a uniform space. |
| Authors: | Tengyao Wang [aut, cre], Mona Azadkia [aut, ctb], Xuzhi Yang [aut, ctb] |
| Maintainer: | Tengyao Wang <[email protected]> |
| License: | GPL-3 |
| Version: | 1.0.0 |
| Built: | 2026-06-06 07:24:05 UTC |
| Source: | https://github.com/cran/covercorr |
The CD8T dataset provides the gene expression data of fetal CD8+ T cells obtained in a single-cell RNA-seq experiment.
data(CD8T)
A data frame with 9369 rows (cells) and 1000 columns (genes).
Suo et al., Science (2022).
Suo, C., Dann, E., Goh, I., Jardine, L., Kleshchevnikov, V., Park, J.-E., Botting, R. A., et al. "Mapping the developing human immune system across organs." Science 376(6597), eabo0510 (2022).
Computes the coverage correlation coefficient between inputs x and y, as introduced in the arXiv preprint. This coefficient measures the dependence between two random variables or vectors.
coverage_correlation( x, y, visualise = FALSE, method = c("auto", "exact", "approx"), M = NULL, na.rm = TRUE )
| Argument | Description |
|---|---|
| x | Numeric vector or matrix. |
| y | Numeric vector or matrix with the same number of rows as x. |
| visualise | Logical; if |
| method | Character string specifying the computation method. Options are "auto", "exact" and "approx". |
| M | Integer; number of Monte Carlo integration sample points (used when method = "approx"). |
| na.rm | Logical; if |
The procedure is as follows:
Calculate the rank transformations of the inputs x and y.
Construct a small cube (in 2D, a square) of a common, fixed volume centered at each rank-transformed point.
Compute the total volume (in 2D, the area) of the union of these cubes, intersected with the unit cube [0,1]^d, where d is the total number of columns of x and y.
The coverage correlation coefficient is then calculated from this union volume.
For more details, see the arXiv preprint.
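The steps above can be sketched in base R for the two-dimensional case. This is a hypothetical illustration, not the package's implementation: it assumes ranks are scaled to (0, 1] via rank()/n, that each square has area 1/n, and that squares are simply clipped to the unit square, and it estimates the union area by Monte Carlo. It stops at the union area, because the exact mapping from union volume to the coefficient is given in the preprint.

```r
# Hypothetical sketch of the union-area step in 2D (not the package's code).
# Assumptions: ranks scaled to (0, 1] via rank / n; every square has area 1/n
# (side 1 / sqrt(n)); squares are clipped to the unit square [0, 1]^2.
union_area_mc <- function(x, y, M = 1e5) {
  n <- length(x)
  u <- rank(x, ties.method = "random") / n   # rank transform of x
  v <- rank(y, ties.method = "random") / n   # rank transform of y
  h <- 0.5 / sqrt(n)                         # assumed half side length
  px <- runif(M)                             # uniform sample points in [0, 1]^2
  py <- runif(M)
  covered <- logical(M)
  for (i in seq_len(n)) {
    covered <- covered | (abs(px - u[i]) <= h & abs(py - v[i]) <= h)
  }
  mean(covered)                              # Monte Carlo estimate of the union area
}

set.seed(1)
n <- 100
x <- runif(n)
y_dep <- sin(3 * x) + runif(n) * 0.01        # strongly dependent, as in the example
y_ind <- runif(n)                            # independent of x
a_dep <- union_area_mc(x, y_dep)
a_ind <- union_area_mc(x, y_ind)
c(dependent = a_dep, independent = a_ind)
```

Under dependence the rank-transformed points concentrate near a curve, so the squares overlap heavily and the covered area is much smaller than under independence; the coefficient exploits exactly this contrast.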
The method argument controls how the computation is performed:
"exact": Computes the exact value.
"approx": Uses a Monte Carlo approximation with M sample points.
"auto": Automatically selects a method based on the total number of columns in x and y: if more than 6, "approx" is used (with M = nrow(x)^{1.5} if M is not provided); otherwise, "exact" is used.
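The "auto" rule can be expressed as a small helper. The name choose_method is hypothetical, and rounding M up with ceiling() is an assumption, since nrow(x)^1.5 need not be an integer.

```r
# Hypothetical helper mirroring the documented "auto" rule; not the package's code.
choose_method <- function(x, y, M = NULL) {
  d <- NCOL(x) + NCOL(y)          # total number of columns (a vector counts as 1)
  if (d > 6) {
    # Assumption: round up, since NROW(x)^1.5 need not be an integer
    if (is.null(M)) M <- ceiling(NROW(x)^1.5)
    list(method = "approx", M = M)
  } else {
    list(method = "exact", M = NULL)
  }
}

choose_method(runif(100), runif(100))$method   # "exact" (d = 2)
r <- choose_method(matrix(runif(400), 100), matrix(runif(300), 100))  # d = 7
r$method
r$M
```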
A list with four elements:
stat – The numeric value of the coverage correlation coefficient.
pval – The p-value, calculated using the exact variance under the null hypothesis of independence between x and y.
method – A character string indicating the computation method used.
mc_se – A numeric value. If method "approx" was used, mc_se is the standard error of the Monte Carlo approximation; otherwise it is 0.
set.seed(1)
n <- 100
x <- runif(n)
y <- sin(3*x) + runif(n) * 0.01
coverage_correlation(x, y, visualise = TRUE)
Total volume of union of rectangles
covered_volume(zmin, zmax)
| Argument | Description |
|---|---|
| zmin | n x d matrix of bottom-left coordinates, one row per rectangle |
| zmax | n x d matrix of top-right coordinates, one row per rectangle |
This is a wrapper around the C function C_covered_volume_partitioned.
A numeric value giving the volume of the union.
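For intuition about what covered_volume() computes, the exact union area of axis-aligned rectangles in 2D can be obtained by coordinate compression: the distinct edge coordinates cut the plane into a grid of cells, and each cell is either fully covered or fully uncovered, so testing its midpoint suffices. This is a simple O(n^3) sketch with a hypothetical name, not the partitioned algorithm the package wraps.

```r
# Hypothetical exact union area in 2D by coordinate compression (not the package's code).
union_area_exact_2d <- function(zmin, zmax) {
  xs <- sort(unique(c(zmin[, 1], zmax[, 1])))   # all distinct x edges
  ys <- sort(unique(c(zmin[, 2], zmax[, 2])))   # all distinct y edges
  area <- 0
  for (i in seq_len(length(xs) - 1)) {
    for (j in seq_len(length(ys) - 1)) {
      cx <- (xs[i] + xs[i + 1]) / 2             # cell midpoint
      cy <- (ys[j] + ys[j + 1]) / 2
      inside <- any(zmin[, 1] <= cx & cx <= zmax[, 1] &
                    zmin[, 2] <= cy & cy <= zmax[, 2])
      if (inside) area <- area + (xs[i + 1] - xs[i]) * (ys[j + 1] - ys[j])
    }
  }
  area
}

# Two unit squares overlapping in a 0.5 x 0.5 corner
zmin <- rbind(c(0, 0), c(0.5, 0.5))
zmax <- rbind(c(1, 1), c(1.5, 1.5))
union_area_exact_2d(zmin, zmax)   # 1 + 1 - 0.25 = 1.75
```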
Total volume of union of rectangles using Monte Carlo integration
covered_volume_mc(zmin_s, zmax_s, M)
| Argument | Description |
|---|---|
| zmin_s | n x d matrix of bottom-left coordinates, one row per rectangle |
| zmax_s | n x d matrix of top-right coordinates, one row per rectangle |
| M | Number of Monte Carlo integration sample points |
This is a wrapper around the C function C_covered_volume_mc.
A list containing the estimated volume of the union and its standard error.
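A Monte Carlo estimate of the union volume, together with its standard error, can be sketched as follows. The name union_volume_mc is hypothetical; the sketch assumes the rectangles lie inside [0,1]^d (so the covered fraction of uniform points is the volume itself) and uses the binomial standard error sqrt(p(1 - p)/M), which may differ from the exact estimator in C_covered_volume_mc.

```r
# Hypothetical Monte Carlo union volume with standard error (not the package's code).
# Assumption: all rectangles lie inside the unit cube [0, 1]^d.
union_volume_mc <- function(zmin, zmax, M) {
  d <- ncol(zmin)
  pts <- matrix(runif(M * d), ncol = d)          # M uniform points in [0, 1]^d
  covered <- logical(M)
  for (i in seq_len(nrow(zmin))) {
    inside <- rep(TRUE, M)
    for (k in seq_len(d)) {
      inside <- inside & pts[, k] >= zmin[i, k] & pts[, k] <= zmax[i, k]
    }
    covered <- covered | inside                  # point lies in at least one rectangle
  }
  p <- mean(covered)
  list(volume = p, se = sqrt(p * (1 - p) / M))   # binomial standard error
}

set.seed(1)
r <- union_volume_mc(matrix(c(0, 0), nrow = 1), matrix(c(0.5, 0.5), nrow = 1), M = 1e5)
r   # volume close to the true value 0.25
```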
Total volume of union of rectangles using volume hashing
covered_volume_partitioned(zmin, zmax)
| Argument | Description |
|---|---|
| zmin | n x d matrix of bottom-left coordinates, one row per rectangle |
| zmax | n x d matrix of top-right coordinates, one row per rectangle |
This is a wrapper around the C function C_covered_volume_partitioned.
A numeric value giving the volume of the union.
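The package's volume-hashing scheme is not described here, so the following is only a related illustration of why partitioning helps: the unit square is cut into a g x g grid, each cell examines only the rectangles that overlap it, and the clipped union areas are summed. Function names and the grid size g are hypothetical, and the sketch assumes 2D rectangles inside [0,1]^2.

```r
# Hypothetical grid-partitioned union area in 2D; NOT the package's hashing algorithm.
union_area_cell <- function(zmin, zmax) {
  # exact union area by coordinate compression
  xs <- sort(unique(c(zmin[, 1], zmax[, 1])))
  ys <- sort(unique(c(zmin[, 2], zmax[, 2])))
  area <- 0
  for (i in seq_len(length(xs) - 1)) for (j in seq_len(length(ys) - 1)) {
    cx <- (xs[i] + xs[i + 1]) / 2
    cy <- (ys[j] + ys[j + 1]) / 2
    if (any(zmin[, 1] <= cx & cx <= zmax[, 1] & zmin[, 2] <= cy & cy <= zmax[, 2]))
      area <- area + (xs[i + 1] - xs[i]) * (ys[j + 1] - ys[j])
  }
  area
}

union_area_partitioned <- function(zmin, zmax, g = 4) {
  edges <- seq(0, 1, length.out = g + 1)   # g x g grid of cells over [0, 1]^2
  total <- 0
  for (a in seq_len(g)) for (b in seq_len(g)) {
    lo <- c(edges[a], edges[b])
    hi <- c(edges[a + 1], edges[b + 1])
    # only rectangles with positive overlap with this cell are examined
    hit <- zmin[, 1] < hi[1] & zmax[, 1] > lo[1] &
           zmin[, 2] < hi[2] & zmax[, 2] > lo[2]
    if (!any(hit)) next
    k <- sum(hit)
    cmin <- pmax(zmin[hit, , drop = FALSE], matrix(lo, k, 2, byrow = TRUE))  # clip to cell
    cmax <- pmin(zmax[hit, , drop = FALSE], matrix(hi, k, 2, byrow = TRUE))
    total <- total + union_area_cell(cmin, cmax)
  }
  total
}

zmin <- rbind(c(0.1, 0.1), c(0.3, 0.3))
zmax <- rbind(c(0.5, 0.5), c(0.7, 0.7))
union_area_partitioned(zmin, zmax)   # 0.16 + 0.16 - 0.04 = 0.28
```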
Computes the optimal matching that maps each observation in X to a
reference point in U using uniform weights and squared Euclidean cost.
Internally uses transport::transport(method = "networkflow", p = 2).
In 1D, this reduces to a rank-based matching
sort(U)[rank(X, ties.method = "random")].
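The 1D reduction can be checked on a tiny example: by the rearrangement inequality, matching sorted X to sorted U minimizes the total squared cost, so sort(U)[rank(X)] should attain the same cost as a brute-force search over all permutations. The helper perms is hypothetical and only practical for very small n.

```r
# Hypothetical check of the 1D rank-based matching against brute force.
perms <- function(v) {
  # all permutations of v, as a list of vectors
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

set.seed(1)
n <- 5
x <- rnorm(n)
u <- seq(0, 1, length.out = n)
matched <- sort(u)[rank(x, ties.method = "random")]   # documented 1D rule
cost <- function(target) sum((x - target)^2)          # squared Euclidean cost
best <- min(sapply(perms(seq_len(n)), function(p) cost(u[p])))
c(rank_matching = cost(matched), brute_force = best)
```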
MK_rank(X, U)
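| Argument | Description |
|---|---|
| X | Numeric vector of length n, or an n x d numeric matrix. |
| U | Numeric vector of length n, or an n x d numeric matrix of reference points. |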
Rows must match: nrow(X) == nrow(U) (otherwise an error is thrown).
Columns must match: ncol(X) == ncol(U) (otherwise an error is thrown).
Weights are uniform (1/n for each point) and the cost matrix is the sum of squared
coordinate differences across columns.
In 1D, ties in X are broken at random via
ties.method = "random"; use set.seed() for reproducibility.
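Because the weights are uniform and X and U have the same number of rows, an optimal transport plan can be taken to be a permutation (Birkhoff–von Neumann theorem). For tiny n, a brute-force search over permutations of the squared-distance cost matrix therefore finds an optimal matching of the kind the network-flow solver computes. The function below is a hypothetical stand-in that does not use the transport package; it is exponential in n and meant only for checking small cases.

```r
# Hypothetical brute-force optimal matching (not MK_rank itself).
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

mk_match_bruteforce <- function(X, U) {
  stopifnot(nrow(X) == nrow(U), ncol(X) == ncol(U))
  n <- nrow(X)
  C <- matrix(0, n, n)   # C[i, j] = squared Euclidean distance between X[i, ] and U[j, ]
  for (i in seq_len(n)) for (j in seq_len(n)) C[i, j] <- sum((X[i, ] - U[j, ])^2)
  best <- NULL
  best_cost <- Inf
  for (p in perms(seq_len(n))) {
    total <- sum(C[cbind(seq_len(n), p)])   # row i of X matched to row p[i] of U
    if (total < best_cost) {
      best_cost <- total
      best <- p
    }
  }
  U[best, , drop = FALSE]   # i-th row: the row of U matched to X[i, ]
}

set.seed(42)
X <- matrix(rnorm(10), ncol = 2)   # 5 x 2
U <- X[c(3, 1, 5, 2, 4), ]         # same points, shuffled
R <- mk_match_bruteforce(X, U)     # zero-cost matching recovers X
```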
If ncol(X) == 1, a numeric vector of length n
containing the entries of U reordered to match the ranks of
X. Otherwise, a numeric matrix whose i-th row
is the matched row of U corresponding to the i-th row of
X.
Requires the transport package.
# 1D example (set seed for reproducible tie-breaking)
set.seed(1)
x <- rnorm(10)
u <- seq(0, 1, length.out = 10)
MK_rank(x, u)
# 2D example
set.seed(42)
X <- matrix(rnorm(200), ncol = 2) # 100 x 2
U <- matrix(runif(200), ncol = 2) # 100 x 2
R <- MK_rank(X, U)
dim(R) # 100 2
Draws rectangles specified by their xmin, xmax, ymin,
and ymax, optionally adding them to an existing plot. When
add = FALSE, a fresh plot with a grid and
equal aspect ratio is created.
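A base-R sketch of the same behaviour uses graphics::rect(). Note that rect() takes its arguments in the order (left, bottom, right, top), unlike the (xmin, xmax, ymin, ymax) order of plot_rectangles. The name draw_rects and the styling choices are hypothetical.

```r
# Hypothetical base-R analogue of plot_rectangles (not the package's code).
draw_rects <- function(xmin, xmax, ymin, ymax, add = FALSE) {
  stopifnot(length(xmax) == length(xmin), length(ymin) == length(xmin),
            length(ymax) == length(xmin))
  if (!add) {
    # fresh empty plot with equal aspect ratio and a grid
    plot(NA, xlim = range(c(xmin, xmax)), ylim = range(c(ymin, ymax)),
         asp = 1, xlab = "", ylab = "")
    grid()
  }
  # graphics::rect takes (left, bottom, right, top)
  rect(xmin, ymin, xmax, ymax, border = "black")
  invisible(NULL)
}

pdf(NULL)   # null device: draw without writing a file
res <- draw_rects(c(0, 0.5), c(0.4, 0.9), c(0, 0.3), c(0.4, 0.7))
invisible(dev.off())
```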
plot_rectangles(xmin, xmax, ymin, ymax, add = FALSE)
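| Argument | Description |
|---|---|
| xmin | Numeric vector of left x-coordinates. |
| xmax | Numeric vector of right x-coordinates (same length as xmin). |
| ymin | Numeric vector of bottom y-coordinates (same length as xmin). |
| ymax | Numeric vector of top y-coordinates (same length as xmin). |
| add | Logical; if TRUE, the rectangles are added to the existing plot; if FALSE, a new plot is created. |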
Invisibly returns NULL. Use this function for its plotting output, not for a returned value.
Split rectangles by wrapping them around the edges of the unit cube [0,1]^d.
split_rectangles(zmin, zmax)
| Argument | Description |
|---|---|
| zmin | n x d matrix of bottom-left coordinates, one row per rectangle |
| zmax | n x d matrix of top-right coordinates, one row per rectangle |
This is a wrapper around the C function C_split_rectangles.
A list with elements zmin and zmax giving the bottom-left and top-right coordinates of the split rectangles.
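Wrapping can be illustrated under the assumption that coordinates live on [0,1]^d with opposite faces identified (a torus) and that every side is shorter than 1: a side that crosses 0 or 1 is cut into two intervals, and the output boxes are all combinations of the per-dimension pieces (up to 2^d per rectangle). The name wrap_split is hypothetical and this is not the C_split_rectangles implementation.

```r
# Hypothetical periodic splitting of boxes onto [0, 1]^d (not the package's code).
# Assumption: each side is shorter than 1, so it crosses at most one face.
wrap_split <- function(zmin, zmax) {
  out_min <- list()
  out_max <- list()
  for (i in seq_len(nrow(zmin))) {
    # per-dimension list of intervals after wrapping onto [0, 1]
    pieces <- lapply(seq_len(ncol(zmin)), function(k) {
      a <- zmin[i, k]
      b <- zmax[i, k]
      if (a < 0) list(c(a + 1, 1), c(0, b))
      else if (b > 1) list(c(a, 1), c(0, b - 1))
      else list(c(a, b))
    })
    # every combination of per-dimension pieces is one output box
    combos <- expand.grid(lapply(pieces, seq_along))
    for (r in seq_len(nrow(combos))) {
      lo <- sapply(seq_along(pieces), function(k) pieces[[k]][[combos[r, k]]][1])
      hi <- sapply(seq_along(pieces), function(k) pieces[[k]][[combos[r, k]]][2])
      out_min[[length(out_min) + 1]] <- lo
      out_max[[length(out_max) + 1]] <- hi
    }
  }
  list(zmin = do.call(rbind, out_min), zmax = do.call(rbind, out_max))
}

s1 <- wrap_split(matrix(c(0.9, 0.4), 1), matrix(c(1.1, 0.6), 1))    # crosses x = 1
s2 <- wrap_split(matrix(c(-0.1, 0.95), 1), matrix(c(0.1, 1.15), 1)) # crosses a corner
```

Total volume is preserved by construction, since the pieces of each side partition an interval of the original length.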
Exact formula for times the variance of the excess vacancy.
For independent x and y, the variance of the coverage correlation
coefficient is obtained by dividing the returned value by .
Check the arXiv preprint for more details.
variance_formula(n, d)
| Argument | Description |
|---|---|
| n | Sample size |
| d | Dimension |
A numeric value given by the variance formula in the paper.