| Title: | Multivariate Depth Functions for General Dimension |
|---|---|
| Description: | Efficient computation of multivariate statistical depth functions in arbitrary dimension d. Implements Mahalanobis depth, Tukey (halfspace) depth, Liu simplicial depth (via adaptive Monte Carlo), projection depth, and spatial depth. Provides depth-based medians, central regions, outlier detection, and depth-depth plots. 'C++' backends via 'Rcpp' and 'RcppEigen' ensure performance at large n and d. References: Liu (1990) <doi:10.1214/aos/1176347507>, Zuo and Serfling (2000) <doi:10.1214/aos/1016218226>, Vardi and Zhang (2000) <doi:10.1073/pnas.97.4.1423>. |
| Authors: | Jason Parker [aut, cre] |
| Maintainer: | Jason Parker <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.1.8 |
| Built: | 2026-06-26 18:39:03 UTC |
| Source: | https://github.com/cran/depthR |
Returns the set of observations whose depth is at or above the
alpha-th quantile of the sample depth values. This is the multivariate
analog of a quantile interval.
central_region(x, alpha = 0.5, ...)
x |
A |
alpha |
Numeric scalar in (0, 1). The central region contains the
deepest |
... |
Ignored. |
A named list:
Row indices of observations in the central region.
Matrix of observations in the central region.
Depth values of those observations.
The depth cutoff used.
The alpha level used.
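The quantile rule described above can be sketched in base R. This is only an illustration under stated assumptions: `central_region_sketch` and the element names of its return list are hypothetical, the depth values are taken as precomputed, and the package's own handling of ties and quantile type may differ.

```r
# Hypothetical helper, not the package's implementation: keep the rows whose
# depth is at or above the alpha-th quantile of the depth values.
central_region_sketch <- function(data, depths, alpha = 0.5) {
  stopifnot(nrow(data) == length(depths), alpha > 0, alpha < 1)
  cutoff <- unname(stats::quantile(depths, probs = alpha))
  keep <- which(depths >= cutoff)
  list(
    indices = keep,                       # row indices in the central region
    points  = data[keep, , drop = FALSE], # the observations themselves
    depths  = depths[keep],
    cutoff  = cutoff,
    alpha   = alpha
  )
}

set.seed(1)
data <- matrix(rnorm(200), nrow = 100, ncol = 2)
# A Mahalanobis-type depth is used here purely as a stand-in depth function.
depths <- 1 / (1 + stats::mahalanobis(data, colMeans(data), stats::cov(data)))
cr <- central_region_sketch(data, depths, alpha = 0.5)
length(cr$indices)  # 50: the deeper half of 100 distinct depth values
```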
Computes the statistical depth of every row of data with respect to
the empirical distribution of data, returning a depth object
from which medians, outliers, ranks, and other derived quantities can be
extracted cheaply without recomputing depth.
compute_depth(data, depth_fn = mahalanobis_depth, ...)
data |
Numeric matrix (n x d) or data frame. Rows are observations, columns are variables. |
depth_fn |
Depth function to use. Must have signature
|
... |
Additional arguments forwarded to |
An object of class "depth" with components:
Numeric vector of length n — depth of each observation.
The original data matrix.
The depth function used.
Number of observations.
Dimension.
The matched call.
set.seed(42)
data <- matrix(rnorm(500), nrow = 100, ncol = 5)
dd <- compute_depth(data, depth_fn = mahalanobis_depth)
median(dd)
rank(dd)
outliers(dd)
summary(dd)
plot(dd)
Computes and plots the depth-depth (DD) plot for two samples. Each
observation from both samples is assigned two depth values: its depth
with respect to the empirical distribution of x and its depth
with respect to the empirical distribution of y. When both samples come
from the same distribution, the points cluster near the main diagonal.
dd_plot(
  x,
  y,
  depth_fn = simplicial_depth,
  plot = TRUE,
  xlab = "Depth wrt X",
  ylab = "Depth wrt Y",
  main = "DD-Plot",
  col_x = "steelblue",
  col_y = "firebrick",
  pch_x = 19L,
  pch_y = 17L,
  legend = TRUE,
  ...
)
x |
Numeric matrix (n1 x d) — first sample. |
y |
Numeric matrix (n2 x d) — second sample. Must have the same
number of columns as |
depth_fn |
Depth function to use. Must have signature
|
plot |
Logical. If |
xlab |
Label for the x-axis. Defaults to "Depth wrt X". |
ylab |
Label for the y-axis. Defaults to "Depth wrt Y". |
main |
Plot title. Defaults to "DD-Plot". |
col_x |
Color for points from |
col_y |
Color for points from |
pch_x |
Plot character for points from |
pch_y |
Plot character for points from |
legend |
Logical. If |
... |
Additional arguments passed to |
The DD-plot was introduced by Liu, Parelius & Singh (1999) as a nonparametric graphical tool for two-sample comparison. It is the multivariate analog of the QQ-plot, using depth in place of quantiles.
If the two distributions are identical, all points should fall near the diagonal. Systematic deviations indicate location shifts (points above or below the diagonal) or scale/shape differences (spread of points away from the diagonal).
Invisibly returns a data frame with columns:
Depth of each observation with respect to x.
Depth of each observation with respect to y.
Factor indicating which sample the observation came from.
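To make the returned coordinates concrete, here is a minimal base R sketch of how DD-plot coordinates can be computed for the pooled sample. The names `maha_depth_sketch`, `dd_coords_sketch`, and the column names are hypothetical, and a Mahalanobis-type depth stands in for the package's depth functions.

```r
# Stand-in depth using the (x, data) argument order seen on this page.
# Not the package's implementation.
maha_depth_sketch <- function(x, data) {
  1 / (1 + stats::mahalanobis(x, colMeans(data), stats::cov(data)))
}

# Hypothetical helper: depth of every pooled observation w.r.t. each sample.
dd_coords_sketch <- function(x, y, depth_fn = maha_depth_sketch) {
  stopifnot(ncol(x) == ncol(y))
  pooled <- rbind(x, y)
  data.frame(
    depth_x = depth_fn(pooled, x),  # depth with respect to x
    depth_y = depth_fn(pooled, y),  # depth with respect to y
    sample  = factor(rep(c("x", "y"), times = c(nrow(x), nrow(y))))
  )
}

set.seed(42)
x <- matrix(rnorm(200), nrow = 100, ncol = 2)
y <- matrix(rnorm(200, mean = 1), nrow = 100, ncol = 2)
dd <- dd_coords_sketch(x, y)

# Under a location shift, most x-points are deeper in x than in y,
# so they fall on one side of the diagonal.
mean(dd$depth_x[dd$sample == "x"] > dd$depth_y[dd$sample == "x"])
```

Plotting `depth_x` against `depth_y`, colored by `sample`, gives the DD-plot itself.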
Liu, R. Y., Parelius, J. M. & Singh, K. (1999). Multivariate analysis by data depth: descriptive statistics, graphics and inference. Annals of Statistics, 27(3), 783–858.
set.seed(42)

# Same distribution: points near diagonal
x <- matrix(rnorm(200), nrow = 100, ncol = 2)
y <- matrix(rnorm(200), nrow = 100, ncol = 2)
dd_plot(x, y, depth_fn = simplicial_depth)

# Location shift: points systematically off diagonal
y_shift <- matrix(rnorm(200, mean = 1), nrow = 100, ncol = 2)
dd_plot(x, y_shift, depth_fn = tukey_depth)

# Store results without plotting
result <- dd_plot(x, y, plot = FALSE)
head(result)
Converts depth values to outlyingness scores via O(x) = 1/D(x) - 1, so that depth 1 maps to outlyingness 0 and depth approaching 0 maps to outlyingness approaching infinity.
depth_outlyingness(depths)
depths |
Numeric vector of depth values in (0, 1]. |
Numeric vector of outlyingness values in [0, inf).
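The transformation O(x) = 1/D(x) - 1 is a one-liner in base R. The function name below is hypothetical and the input checks are an assumption. The inverse map D = 1/(1 + O) recovers the depth.

```r
# Hypothetical re-implementation of the stated formula O = 1/D - 1.
depth_outlyingness_sketch <- function(depths) {
  stopifnot(is.numeric(depths), all(depths > 0), all(depths <= 1))
  1 / depths - 1
}

o <- depth_outlyingness_sketch(c(1, 0.5, 0.25, 0.1))
o            # 0 1 3 9
1 / (1 + o)  # the inverse map returns the original depths
```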
Computes the Mahalanobis depth of one or more query points with respect
to a reference distribution estimated from data.
mahalanobis_depth(x, data, mu = NULL, sigma = NULL)
x |
Numeric matrix of query points (m x d), or a numeric vector of length d for a single query point. |
data |
Numeric matrix of reference data (n x d). Used to estimate the mean and covariance. |
mu |
Optional numeric vector of length d. If supplied, overrides the
mean estimated from |
sigma |
Optional numeric matrix (d x d). If supplied, overrides the
covariance estimated from |
Numeric vector of depth values in (0, 1], one per query point.
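For orientation, the usual definition of Mahalanobis depth is 1 / (1 + d^2), where d^2 is the squared Mahalanobis distance of the query point from the center. The sketch below assumes that definition and uses `stats::mahalanobis`. The function name is hypothetical and the package's C++ backend may differ in detail.

```r
# Hypothetical base R sketch, assuming depth = 1 / (1 + squared Mahalanobis distance).
mahalanobis_depth_sketch <- function(x, data, mu = NULL, sigma = NULL) {
  if (is.null(dim(x))) x <- matrix(x, nrow = 1)  # single query point
  if (is.null(mu)) mu <- colMeans(data)          # estimate the mean from data
  if (is.null(sigma)) sigma <- stats::cov(data)  # estimate the covariance from data
  1 / (1 + stats::mahalanobis(x, center = mu, cov = sigma))
}

set.seed(42)
data <- matrix(rnorm(500), nrow = 100, ncol = 5)
depths <- mahalanobis_depth_sketch(data, data)

# The sample mean has distance 0, hence depth 1.
at_mean <- mahalanobis_depth_sketch(colMeans(data), data)
at_mean
```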
Generic function for computing the median. For depth objects,
returns the deepest observation. For all other objects, delegates to
stats::median.
median(x, ...)
x |
An object. For |
... |
Additional arguments passed to methods. |
For depth objects, a named list with elements point,
depth, and index. For other objects, see
median.
Returns the observation with the highest depth — the multivariate analog of the median.
## S3 method for class 'depth'
median(x, ...)
x |
A |
... |
Ignored. |
A named list:
Numeric vector of length d — the deepest observation.
Depth value at the median.
Row index of the deepest observation in the data.
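Conceptually, the depth median is just the row with the largest depth value. A minimal sketch, assuming depths are already computed and using a hypothetical function name, with the element names point, depth, and index taken from the description above:

```r
# Hypothetical helper: pick the deepest observation from precomputed depths.
median_depth_sketch <- function(data, depths) {
  stopifnot(nrow(data) == length(depths))
  i <- which.max(depths)  # first index attaining the maximum depth
  list(point = data[i, ], depth = depths[i], index = i)
}

data <- rbind(c(0, 0), c(1, 1), c(5, 5))
depths <- c(0.2, 0.9, 0.1)
m <- median_depth_sketch(data, depths)
m$index  # 2
m$point  # 1 1
```

If several observations tie for the maximum, `which.max` returns the first one. How the package breaks such ties is not stated here.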
Flags observations whose depth falls below a threshold as outliers. The threshold can be specified as a quantile of the depth distribution (default) or as an absolute depth cutoff.
outliers(x, threshold = 0.05, absolute = FALSE, ...)
x |
A |
threshold |
Numeric scalar in (0, 1). Interpreted as a quantile of
the depth distribution when |
absolute |
Logical. If |
... |
Ignored. |
A named list:
Logical vector of length n — TRUE for outliers.
Integer vector of row indices of outlying observations.
Matrix of outlying observations.
Depth values of outlying observations.
The actual depth cutoff used.
set.seed(42)
data <- matrix(rnorm(500), nrow = 100, ncol = 5)
dd <- compute_depth(data)

# Flag bottom 5% by depth (default)
outliers(dd)

# Flag bottom 10%
outliers(dd, threshold = 0.10)

# Absolute depth cutoff
outliers(dd, threshold = 0.05, absolute = TRUE)
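The difference between the quantile and absolute thresholds can be illustrated on a plain vector of depth values. This is a sketch only: `flag_outliers_sketch` and its return element names are hypothetical, and the strict inequality follows the phrase "falls below a threshold" rather than any knowledge of the package internals.

```r
# Hypothetical helper working on precomputed depth values.
flag_outliers_sketch <- function(depths, threshold = 0.05, absolute = FALSE) {
  cutoff <- if (absolute) {
    threshold                                          # used as a depth value
  } else {
    unname(stats::quantile(depths, probs = threshold)) # used as a quantile level
  }
  is_outlier <- depths < cutoff
  list(
    is_outlier = is_outlier,
    indices    = which(is_outlier),
    depths     = depths[is_outlier],
    cutoff     = cutoff
  )
}

depths <- (1:100) / 100
q <- flag_outliers_sketch(depths, threshold = 0.05)                    # bottom 5%
a <- flag_outliers_sketch(depths, threshold = 0.045, absolute = TRUE)  # depth < 0.045
length(q$indices)  # 5
length(a$indices)  # 4
```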
For d = 2, plots the data with point size proportional to depth and outliers flagged in red. For d > 2, plots a depth profile (observation index vs depth value).
## S3 method for class 'depth'
plot(x, outlier_threshold = 0.05, main = NULL, ...)
x |
A |
outlier_threshold |
Quantile threshold for flagging outliers. Default 0.05. |
main |
Plot title. If |
... |
Additional arguments passed to |
Invisibly returns x, the original depth object.
Called primarily for its side effect of producing a plot.
Computes the projection depth of one or more query points with respect
to a reference distribution estimated from data, using an adaptive
random projection approximation with parallel computation.
projection_depth(
  x,
  data,
  tol = 0.01,
  batch_size = 100L,
  min_batches = 5L,
  patience = 3L,
  seed = 42L
)
x |
Numeric matrix of query points (m x d), or a numeric vector of length d for a single point. |
data |
Numeric matrix of reference data (n x d). |
tol |
Convergence tolerance for the adaptive stopping rule. Default 0.01. |
batch_size |
Number of random projections per batch. Default 100. |
min_batches |
Minimum batches before checking convergence. Default 5. |
patience |
Consecutive stable batches to declare convergence. Default 3. |
seed |
Integer random seed for reproducibility. Default 42. |
Projection depth is defined via the Stahel-Donoho outlyingness: the supremum over all directions of the robust univariate Z-score of the projected point, with the median as location and the MAD as scale. Because both are robust statistics, the resulting depth has a high breakdown point, and it is affine invariant.
The deepest point under projection depth is a robust estimator of multivariate location.
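The definition above can be approximated by replacing the supremum over all directions with a maximum over a finite set of random unit vectors. The sketch below uses a fixed number of directions rather than the adaptive stopping rule described for this function. The function name and the `n_dir` parameter are hypothetical, and depth is taken as 1 / (1 + outlyingness), which matches the stated (0, 1] range.

```r
# Hypothetical base R sketch of a random-direction projection depth.
projection_depth_sketch <- function(x, data, n_dir = 500L, seed = 42L) {
  if (is.null(dim(x))) x <- matrix(x, nrow = 1)
  set.seed(seed)  # note: this resets the global RNG state
  d <- ncol(data)
  u <- matrix(rnorm(n_dir * d), nrow = d)     # d x n_dir random directions
  u <- sweep(u, 2, sqrt(colSums(u^2)), "/")   # normalise to unit length
  proj_data <- data %*% u                     # n x n_dir projected data
  proj_x <- x %*% u                           # m x n_dir projected queries
  med <- apply(proj_data, 2, stats::median)
  mad <- apply(proj_data, 2, stats::mad)      # default constant 1.4826
  z <- sweep(abs(sweep(proj_x, 2, med, "-")), 2, mad, "/")
  outlyingness <- apply(z, 1, max)            # max over sampled directions
  1 / (1 + outlyingness)
}

set.seed(1)
data <- matrix(rnorm(500), nrow = 100, ncol = 5)
x <- rbind(rep(0, 5), rep(10, 5))
pd <- projection_depth_sketch(x, data)
pd  # the central point is much deeper than the distant one
```

Because only finitely many directions are sampled, the outlyingness is underestimated and the resulting depth is an upper bound on the exact value.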
Numeric vector of depth values in (0, 1], one per query point.
Zuo, Y. & Serfling, R. (2000). General notions of statistical depth function. Annals of Statistics, 28(2), 461–482.
set.seed(42)
data <- matrix(rnorm(500), nrow = 100, ncol = 5)
x <- matrix(rnorm(25), nrow = 5, ncol = 5)
projection_depth(x, data)

dd <- compute_depth(data, depth_fn = projection_depth)
median(dd)
outliers(dd)
Generic function for ranking. For depth objects, returns
depth-based ranks with rank 1 assigned to the deepest observation.
For all other objects, delegates to base::rank.
rank(x, ...)
x |
An object. For |
... |
Additional arguments passed to methods. |
For depth objects, an integer vector of length n where
rank 1 is the deepest observation. For other objects, see
rank.
Ranks observations by depth. Rank 1 is assigned to the deepest (most central) observation; rank n to the shallowest (most outlying).
## S3 method for class 'depth'
rank(x, ...)
x |
A |
... |
Ignored. |
Integer vector of length n. Rank 1 = deepest.
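Depth-based ranking amounts to ranking the negated depth values. A minimal sketch with a hypothetical function name follows. The choice of `ties.method = "min"` is an assumption, since the tie rule used by the package is not stated here.

```r
# Hypothetical helper: rank 1 = deepest, rank n = shallowest.
depth_rank_sketch <- function(depths) {
  base::rank(-depths, ties.method = "min")  # "min" returns integer ranks
}

r <- depth_rank_sketch(c(0.2, 0.9, 0.5, 0.1))
r  # 3 1 2 4
```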
Computes the simplicial depth of one or more query points with respect
to a reference distribution estimated from data, using an adaptive
Monte Carlo approximation with parallel computation.
simplicial_depth(
  x,
  data,
  tol = 0.05,
  batch_size = 200L,
  min_batches = 3L,
  max_batches = 20L,
  seed = 42L
)
x |
Numeric matrix of query points (m x d), or a numeric vector of length d for a single point. |
data |
Numeric matrix of reference data (n x d). Must have at least d+1 rows. |
tol |
Relative standard error tolerance for the adaptive stopping
rule. Sampling stops when the standard error of the depth estimate
drops below |
batch_size |
Number of random simplices sampled per batch. Default 200. |
min_batches |
Minimum number of batches before checking convergence. Default 3. |
max_batches |
Maximum number of batches regardless of convergence. Acts as a hard cap on computation time. Default 20. |
seed |
Integer random seed for reproducibility. Default 42. |
Simplicial depth is the probability that a random simplex formed by d+1 points drawn from the data contains the query point. The definition is purely geometric and makes no distributional assumptions.
The deepest point, the simplicial median, is a robust estimator of location that reduces to the univariate median when d=1.
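The definition above suggests a direct Monte Carlo estimate: sample random simplices and count how often they contain the query point. The sketch below uses a fixed number of simplices instead of the adaptive stopping rule, tests containment via barycentric coordinates, and assumes the sampled vertices are in general position (true with probability one for continuous data). The function name and `n_simplices` are hypothetical.

```r
# Hypothetical base R sketch of Monte Carlo simplicial depth.
simplicial_depth_sketch <- function(x, data, n_simplices = 2000L, seed = 42L) {
  if (is.null(dim(x))) x <- matrix(x, nrow = 1)
  set.seed(seed)  # note: this resets the global RNG state
  n <- nrow(data)
  d <- ncol(data)
  stopifnot(n >= d + 1, ncol(x) == d)
  rhs <- rbind(1, t(x))  # (d+1) x m right-hand sides, one column per query
  hits <- numeric(nrow(x))
  for (s in seq_len(n_simplices)) {
    v <- data[sample.int(n, d + 1), , drop = FALSE]  # d+1 random vertices
    # Barycentric coordinates lambda solve: sum(lambda) = 1, t(v) %*% lambda = x
    lambda <- solve(rbind(1, t(v)), rhs)
    # A point is inside the simplex when all its coordinates are non-negative
    hits <- hits + (colSums(lambda >= 0) == d + 1)
  }
  hits / n_simplices
}

set.seed(1)
data <- matrix(rnorm(400), nrow = 200, ncol = 2)
x <- rbind(c(0, 0), c(10, 10))
sd_est <- simplicial_depth_sketch(x, data)
sd_est  # the distant point lies outside every sampled simplex, so its estimate is 0
```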
Numeric vector of depth values in [0, 1], one per query point. Higher values indicate greater centrality.
Liu, R. Y. (1990). On a notion of data depth based on random simplices. Annals of Statistics, 18(1), 405–414.
Zuo, Y. & Serfling, R. (2000). General notions of statistical depth function. Annals of Statistics, 28(2), 461–482.
set.seed(42)
data <- matrix(rnorm(500), nrow = 100, ncol = 5)
x <- matrix(rnorm(25), nrow = 5, ncol = 5)

# Basic usage
simplicial_depth(x, data)

# Via compute_depth for full depth object
dd <- compute_depth(data, depth_fn = simplicial_depth)
median(dd)
outliers(dd)
plot(dd)
Computes the spatial depth of one or more query points with respect
to a reference distribution estimated from data.
spatial_depth(x, data)
x |
Numeric matrix of query points (m x d), or a numeric vector of length d for a single point. |
data |
Numeric matrix of reference data (n x d). |
Spatial depth is defined as 1 minus the norm of the mean unit vector pointing from the data toward the query point. Unlike the approximate depth functions in this package (tukey_depth, projection_depth, simplicial_depth), it has a closed-form sample estimate with no Monte Carlo approximation required, which keeps it fast and suitable for very large n and d.
Spatial depth is orthogonally invariant but not affine invariant.
For affine invariant depth use projection_depth or
tukey_depth.
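The closed-form sample estimate described above is short enough to write out in base R. The function name is hypothetical, and the convention of treating a data point that coincides with the query as contributing a zero vector is an assumption.

```r
# Hypothetical base R sketch: 1 - || mean of unit vectors from data to query ||.
spatial_depth_sketch <- function(x, data) {
  if (is.null(dim(x))) x <- matrix(x, nrow = 1)
  stopifnot(ncol(x) == ncol(data))
  n <- nrow(data)
  apply(x, 1, function(q) {
    diffs <- matrix(q, nrow = n, ncol = length(q), byrow = TRUE) - data
    norms <- sqrt(rowSums(diffs^2))
    keep <- norms > 0                                   # skip points equal to q
    units <- diffs[keep, , drop = FALSE] / norms[keep]  # row-wise unit vectors
    1 - sqrt(sum((colSums(units) / n)^2))
  })
}

# Four points placed symmetrically around the origin
data <- rbind(c(1, 0), c(-1, 0), c(0, 1), c(0, -1))
x <- rbind(c(0, 0), c(100, 0))
sp <- spatial_depth_sketch(x, data)
sp  # depth 1 at the centre of symmetry, close to 0 far away
```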
Numeric vector of depth values in [0, 1], one per query point.
Vardi, Y. & Zhang, C.-H. (2000). The multivariate L1-median and associated data depth. Proceedings of the National Academy of Sciences, 97(4), 1423–1426.
set.seed(42)
data <- matrix(rnorm(500), nrow = 100, ncol = 5)
x <- matrix(rnorm(25), nrow = 5, ncol = 5)
spatial_depth(x, data)

dd <- compute_depth(data, depth_fn = spatial_depth)
median(dd)
outliers(dd)
Computes the Tukey halfspace depth of one or more query points with respect
to a reference distribution estimated from data.
tukey_depth(
  x,
  data,
  tol = 0.01,
  batch_size = 100L,
  min_batches = 5L,
  patience = 3L,
  seed = 42L
)
x |
Numeric matrix of query points (m x d), or a numeric vector of length d for a single point. |
data |
Numeric matrix of reference data (n x d). |
tol |
Convergence tolerance for the adaptive stopping rule. Default 0.01 (1% relative change). |
batch_size |
Number of random projections per batch. Default 100. |
min_batches |
Minimum number of batches before checking convergence. Default 5. |
patience |
Number of consecutive stable batches to declare convergence. Default 3. |
seed |
Integer random seed for reproducibility. Default 42. |
Tukey depth is the canonical multivariate depth function. The deepest point, the Tukey median, is a robust generalization of the univariate median, with a breakdown point of at least 1/(d+1) for data in general position and as high as 1/3 for centrally symmetric distributions. Depth is defined purely geometrically via halfspaces with no distributional assumptions.
Exact computation is O(n^(d-1)) and infeasible for d > 3. This implementation uses an adaptive random projection approximation: depth is estimated as the minimum, over random unit directions, of the smaller of the two fractions of data points lying on either side of the query point's projection. Because only finitely many directions are examined, the estimate is an upper bound on the exact depth. The stopping rule automatically determines when the estimate has stabilised.
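The random projection idea can be sketched in a few lines of base R. This version uses a fixed number of directions instead of the adaptive stopping rule. The function name and `n_dir` are hypothetical, and it is an illustration rather than the package's C++ implementation.

```r
# Hypothetical base R sketch of random-projection halfspace depth.
tukey_depth_sketch <- function(x, data, n_dir = 1000L, seed = 42L) {
  if (is.null(dim(x))) x <- matrix(x, nrow = 1)
  set.seed(seed)  # note: this resets the global RNG state
  d <- ncol(data)
  u <- matrix(rnorm(n_dir * d), nrow = d)     # d x n_dir random directions
  u <- sweep(u, 2, sqrt(colSums(u^2)), "/")   # normalise to unit length
  proj_data <- data %*% u                     # n x n_dir
  proj_x <- x %*% u                           # m x n_dir
  vapply(seq_len(nrow(x)), function(i) {
    # Fraction of data on each side of the query's projection, per direction
    below <- colMeans(sweep(proj_data, 2, proj_x[i, ], "<="))
    above <- colMeans(sweep(proj_data, 2, proj_x[i, ], ">="))
    min(pmin(below, above))                   # smallest halfspace mass found
  }, numeric(1))
}

set.seed(1)
data <- matrix(rnorm(400), nrow = 200, ncol = 2)
x <- rbind(c(0, 0), c(10, 10))
td <- tukey_depth_sketch(x, data)
td  # central point has depth near 0.5; the distant point has depth 0
```

More directions tighten the estimate toward the exact depth at a cost that grows linearly in `n_dir`.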
Numeric vector of depth values in [0, 0.5], one per query point.
Tukey, J. W. (1975). Mathematics and the picturing of data. Proceedings of the International Congress of Mathematicians, 2, 523–531.
Zuo, Y. & Serfling, R. (2000). General notions of statistical depth function. Annals of Statistics, 28(2), 461–482.
set.seed(42)
data <- matrix(rnorm(500), nrow = 100, ncol = 5)
x <- matrix(rnorm(25), nrow = 5, ncol = 5)

# Basic usage
tukey_depth(x, data)

# Via compute_depth for full depth object
dd <- compute_depth(data, depth_fn = tukey_depth)
median(dd)
outliers(dd)