Package: DataSimilarity 0.1.1

Marieke Stolte

DataSimilarity: Quantifying Similarity of Datasets and Multivariate Two- And k-Sample Testing

A collection of methods for quantifying the similarity of two or more datasets, many of which can be used for two- or k-sample testing. It provides newly implemented methods as well as wrapper functions for existing methods that enable calling many different methods in a unified framework. The methods were selected from the review and comparison of Stolte et al. (2024) <doi:10.1214/24-SS149>.

Authors: Marieke Stolte [aut, cre, cph], Luca Sauer [aut], David Alvarez-Melis [ctb], Nabarun Deb [ctb], Bodhisattva Sen [ctb]

DataSimilarity_0.1.1.tar.gz
DataSimilarity_0.1.1.tar.gz (r-4.5-noble) | DataSimilarity_0.1.1.tar.gz (r-4.4-noble)
DataSimilarity_0.1.1.tgz (r-4.4-emscripten) | DataSimilarity_0.1.1.tgz (r-4.3-emscripten)
DataSimilarity.pdf | DataSimilarity.html
DataSimilarity/json (API)

# Install 'DataSimilarity' in R:
install.packages('DataSimilarity', repos = c('https://cran.r-universe.dev', 'https://cloud.r-project.org'))
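
Once installed, the package can be explored with standard R tools. The following is a minimal sketch that uses only base R and `utils` functions and makes no assumption about the arguments of any DataSimilarity function; the variable names `has_pkg` and `exports` are illustrative.

```r
# Check whether the package is installed without attaching it
has_pkg <- requireNamespace("DataSimilarity", quietly = TRUE)

if (has_pkg) {
  # List the exported objects of the package namespace
  exports <- sort(getNamespaceExports("DataSimilarity"))
  print(exports)

  # List the vignettes shipped with the package
  print(utils::vignette(package = "DataSimilarity"))
} else {
  message("DataSimilarity is not installed; run install.packages() first.")
}
```

The argument list of an individual function can then be checked with `args()` or its help page before calling it.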

This package does not link to any GitHub/GitLab/R-Forge repository. No issue tracker or development information is available.

Score: 2.00 | 66 exports | 1 dependency

Last updated 7 days ago from: e8fa0c3cbd. Checks: 3 OK. Indexed: yes.

Target | Result | Latest binary
Doc / Vignettes | OK | Mar 18 2025
R-4.5-linux | OK | Mar 18 2025
R-4.4-linux | OK | Mar 18 2025

Exports: AUC, Bahr, BallDivergence, BF, BG, BG2, BMG, BQS, C2ST, CCS, CCS_cat, CF, CF_cat, CMDistance, Cramer, DiProPerm, DISCOB, DISCOF, DS, dwdProj, Energy, engineerMetric, f.a, f.aCat, f.s, f.sCat, findSigma, FR, FR_cat, FStest, GGRL, GGRLCat, GPK, gTests, gTests_cat, gTestsMulti, HamiltonPath, hammingDist, HMN, Jeffreys, kerTests, KMD, knn, knn.bf, knn.fast, LHZ, LHZStatistic, MD, MMCM, MMD, MST, MW, NKT, OTDD, Petrie, rectPartition, RItest, Rosenbaum, SC, SH, svmProj, tStat, Wasserstein, YMRZL, ZC, ZC_cat

Dependencies: boot

Using DataSimilarity

Rendered from vignette.Rnw using utils::Sweave on Mar 18 2025.

Last update: 2025-03-18
Started: 2025-03-18

Readme and manuals

Help Manual

Help page | Topics
Quantifying Similarity of Datasets and Multivariate Two- And k-Sample Testing | DataSimilarity-package DataSimilarity
Bahr (1996) multivariate two-sample test | Bahr
Ball Divergence based two- or k-sample test | BallDivergence
Baringhaus and Franz (2010) rigid motion invariant multivariate two-sample test | BF
Biau and Gyorfi (2005) two-sample homogeneity test | BG
Biswas and Ghosh (2014) Two-Sample Test | BG2
Biswas et al. (2014) two-sample run test | BMG
Barakat et al. (1996) Two-Sample Test | BQS
Classifier Two-Sample Test | C2ST
Weighted Edge-Count Two-Sample Test | CCS
Weighted Edge-Count Two-Sample Test for Discrete Data | CCS_cat
Generalized Edge-Count Test | CF
Generalized Edge-Count Test for Discrete Data | CF_cat
Constrained Minimum Distance | CMDistance
Cramér Two-Sample Test | Cramer
Direction-Projection Functions for DiProPerm Test | dipro.fun dwdProj svmProj
Direction-Projection-Permutation (DiProPerm) Test | DiProPerm
Distance Components (DISCO) Tests | DISCOB
Distance Components (DISCO) Tests | DISCOF
Rank-Based Energy Test (Deb and Sen, 2021) | DS
Energy Statistic and Test | Energy
Engineer Metric | engineerMetric
Friedman-Rafsky Test | FR
Friedman-Rafsky Test for Discrete Data | FR_cat
Multisample FS Test | FStest
Decision-Tree Based Measure of Dataset Distance and Two-Sample Test | f.a f.aCat f.s f.sCat GGRL GGRLCat
Generalized Permutation-Based Kernel (GPK) Two-Sample Test | findSigma GPK
Graph-Based Tests | gTests
Graph-Based Tests for Discrete Data | gTests_cat
Graph-Based Multi-Sample Test | gTestsMulti
Shortest Hamilton path | HamiltonPath
Random Forest Based Two-Sample Test | HMN
Jeffreys divergence | Jeffreys
Generalized Permutation-Based Kernel (GPK) Two-Sample Test | kerTests
Kernel Measure of Multi-Sample Dissimilarity (KMD) | KMD
K-Nearest Neighbor Graph | knn knn.bf knn.fast
Li et al. (2022) empirical characteristic distance | LHZ
Calculation of the Li et al. (2022) empirical characteristic distance | LHZStatistic
Multisample Mahalanobis Crossmatch (MMCM) Test | MMCM
Maximum Mean Discrepancy (MMD) Test | MMD
Minimum Spanning Tree (MST) | MST
Nonparametric Graph-Based LP (GLP) Test | MW
Decision-Tree Based Measure of Dataset Similarity (Ntoutsi et al., 2008) | NKT
Optimal Transport Dataset Distance | hammingDist OTDD
Multisample Crossmatch (MCM) Test | Petrie
Calculate a rectangular partition | rectPartition
Multisample RI Test | RItest
Rosenbaum Crossmatch Test | Rosenbaum
Graph-Based Multi-Sample Test | SC
Schilling-Henze Nearest Neighbor Test | SH
Univariate Two-Sample Statistics for DiProPerm Test | AUC MD stat.fun tStat
Wasserstein Distance based Test | Wasserstein
Yu et al. (2007) Two-Sample Test | YMRZL
Maxtype Edge-Count Test | ZC
Maxtype Edge-Count Test for Discrete Data | ZC_cat
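
To illustrate the kind of method indexed above, here is a minimal base-R sketch of a permutation two-sample test built on the energy distance. It is not the package's `Energy` implementation and does not use any DataSimilarity function; the names `energy_stat` and `energy_perm_test` and the parameters `n_perm` and `seed` are hypothetical and chosen for this example only.

```r
# Energy distance (V-statistic form) between two samples; rows are observations.
# Hypothetical helper, not part of DataSimilarity.
energy_stat <- function(x, y) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  n <- nrow(x)
  m <- nrow(y)
  # Pairwise Euclidean distances within the pooled sample
  d <- as.matrix(stats::dist(rbind(x, y)))
  ix <- seq_len(n)
  iy <- n + seq_len(m)
  # 2 * mean between-sample distance minus the two mean within-sample distances
  2 * mean(d[ix, iy]) - mean(d[ix, ix]) - mean(d[iy, iy])
}

# Permutation test: reassign group labels at random and recompute the statistic.
# Note: set.seed() resets the global random number generator.
energy_perm_test <- function(x, y, n_perm = 199, seed = 1) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  set.seed(seed)
  pooled <- rbind(x, y)
  n <- nrow(x)
  N <- nrow(pooled)
  observed <- energy_stat(x, y)
  perm_stats <- replicate(n_perm, {
    idx <- sample.int(N, n)
    energy_stat(pooled[idx, , drop = FALSE], pooled[-idx, , drop = FALSE])
  })
  # The +1 terms count the observed statistic as one of the permutations
  p_value <- (1 + sum(perm_stats >= observed)) / (n_perm + 1)
  list(statistic = observed, p.value = p_value)
}

# Usage: two bivariate normal samples with clearly different means
set.seed(42)
x <- matrix(rnorm(100), ncol = 2)
y <- matrix(rnorm(100, mean = 3), ncol = 2)
res <- energy_perm_test(x, y)
print(res)
```

A large statistic with a small permutation p-value indicates that the two datasets are unlikely to come from the same distribution. The sketch computes the full pooled distance matrix, so it is only practical for small samples.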