Title: | Box-Plots and Outlier Detection for Probability Density Functions |
---|---|
Description: | Orders a data-set consisting of an ensemble of probability density functions on the same x-grid. Visualizes a box-plot of these functions based on the notion of distance determined by the user. Reports outliers based on the distance chosen and the scaling factor for an interquartile range rule. For further details, see: Alexander C. Murph et al. (2023). "Visualization and Outlier Detection for Probability Density Function Ensembles." <https://sirmurphalot.github.io/publications>. |
Authors: | Alexander C. Murph [aut, cre] , Justin D. Strait [aut] |
Maintainer: | Alexander C. Murph <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0 |
Built: | 2024-12-14 06:35:18 UTC |
Source: | CRAN |
Orders a data-set consisting of probability density functions on the same x-grid. Visualizes a box-plot of these functions based on the notion of distance determined by the user. Reports outliers based on the distance chosen and the IQR scaling factor k.
deboinr(
  x_grid,
  densities_matrix,
  distance = c("hellinger", "nLQD", "fisher_rao", "TV_dist", "CLR",
               "wasserstein", "BD_fboxplot", "MBD_fboxplot", "user_defined"),
  median_type = c("cross", "geometric"),
  center_PDFs = FALSE,
  user_dist = NULL,
  k = 1.5,
  num_cores = 1
)
x_grid |
Vector. The x values at which the PDFs are evaluated. |
densities_matrix |
Matrix. An n x p matrix where rows are individual PDFs and p matches the length of x_grid. |
distance |
Character. The distance metric to use for the pairwise distances, or one of the two band depth options. |
median_type |
Character. Whether the cross-median or the geometric median should be used. |
center_PDFs |
Logical. Whether or not the modes of all the PDFs should be aligned prior to performing any calculations. |
user_dist |
R Function. User-defined function that takes in two PDFs as vectors and returns a non-negative float corresponding to a distance between them. |
k |
Float. The factor by which to expand the IQR when calculating outliers. |
num_cores |
Integer. The number of cores to use if parallelizing the distance matrix calculations. |
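As an illustration of the kind of function the user_dist argument describes (two PDFs in as vectors, one non-negative number out), here is a minimal sketch in base R. The grid, the spacing dx, and the name my_hellinger are hypothetical and introduced only for this example; the Hellinger formula is approximated with a simple Riemann sum and is not necessarily how the package computes its built-in "hellinger" option.

```r
# Hypothetical equally spaced grid; its spacing is captured by the closure below.
grid <- seq(-5, 5, length.out = 201)
dx <- grid[2] - grid[1]

# Hellinger distance between two discretized PDFs (Riemann-sum approximation).
# Takes two numeric vectors and returns a single non-negative number.
my_hellinger <- function(f, g) {
  sqrt(0.5 * sum((sqrt(f) - sqrt(g))^2) * dx)
}

# Two example PDFs evaluated on the same grid.
f <- dnorm(grid, mean = 0, sd = 1)
g <- dnorm(grid, mean = 1, sd = 1)

d_same <- my_hellinger(f, f)  # identical inputs give exactly 0
d_diff <- my_hellinger(f, g)  # strictly positive for different PDFs
```

Based on the usage signature, such a function would presumably be supplied as user_dist = my_hellinger together with distance = "user_defined". Because the function receives only the two vectors, any grid information it needs has to be captured from the enclosing environment, as done here with dx.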
A deboinr object containing the following:
density_order. Vector of indices corresponding to rows of densities_matrix that sort from closest to furthest from the median PDF.
outliers. Vector of indices corresponding to rows of densities_matrix that are determined to be outliers.
box_plot. ggplot object of graphic output by calling this method.
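To make the density_order and outliers entries more concrete, the following base R sketch shows a generic ordering by distance and an upper-fence interquartile range rule with scaling factor k. The distance values are made up for illustration, and the rule shown is the textbook IQR fence; the exact procedure used inside deboinr may differ.

```r
# Hypothetical distances of 8 PDFs from a median PDF; the third is far away.
d <- c(0.30, 0.12, 2.00, 0.25, 0.18, 0.40, 0.22, 0.35)

# Scaling factor for the IQR rule (same default as the k argument).
k <- 1.5

# Row indices sorted from closest to furthest from the median.
ord <- order(d)

# Upper fence: third quartile plus k times the interquartile range.
q <- quantile(d, probs = c(0.25, 0.75), names = FALSE)
upper_fence <- q[2] + k * (q[2] - q[1])

# Indices whose distance exceeds the fence are flagged as outliers.
flagged <- which(d > upper_fence)
```

Increasing k widens the fence and flags fewer PDFs; decreasing it flags more.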
example_data = DeBoinR::pdf_data[1:100,]
xx = deboinr(DeBoinR::x_grid, as.matrix(example_data),
             distance = "hellinger", median_type = 'cross',
             center_PDFs = TRUE, num_cores = 1)
print("about to print DeBoinR object...")
print(xx)
Data simulated using the dfnWorks suite.
pdf_data x_grid
'pdf_data' is an n x p matrix, where n is the number of PDFs and p matches the length of x_grid. x_grid contains the points at which the PDFs are evaluated (assumed equally spaced apart).
'pdf_data' is a data frame with 1,000 rows and 5 columns. 'x_grid' is a timestamp of each of 'full_data''s 1,000 rows.
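The layout described here (one PDF per row, evaluated on a shared, equally spaced grid) can be sanity-checked before analysis. The sketch below uses synthetic stand-in data rather than the packaged pdf_data, so the names grid and pdfs are hypothetical; it checks the column count, the grid spacing, and that each row integrates to roughly one under the trapezoid rule.

```r
# Hypothetical stand-in data: 3 normal PDFs evaluated on a shared grid.
grid <- seq(-6, 6, length.out = 241)
pdfs <- rbind(
  dnorm(grid, mean = 0,   sd = 1),
  dnorm(grid, mean = 0.5, sd = 1.2),
  dnorm(grid, mean = -1,  sd = 0.8)
)

# The number of columns p must match the grid length.
dims_ok <- ncol(pdfs) == length(grid)

# The grid is assumed equally spaced; compare all gaps to the first one.
dx <- grid[2] - grid[1]
equally_spaced <- isTRUE(all.equal(diff(grid), rep(dx, length(grid) - 1)))

# Trapezoid-rule area under each row; a valid PDF should be close to 1.
areas <- apply(pdfs, 1, function(f) sum((f[-1] + f[-length(f)]) / 2) * dx)
```

The same checks could be run on real inputs after converting them with as.matrix(), as the deboinr example does.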
pdf_data x_grid
Print function for a DeBoinR object. Prints ggplot graphs and other output values.
x |
deboinr object. Fit from DeBoinR main method. |
... |
Additional plotting arguments. |