Package: chomper 0.1.3
chomper: A Comprehensive Hit or Miss Probabilistic Entity Resolution Model
Provides Bayesian probabilistic methods for record linkage and entity resolution across multiple datasets using the Comprehensive Hit Or Miss Probabilistic Entity Resolution (CHOMPER) model. The package implements three main inference approaches: (1) Evolutionary Variational Inference for record Linkage (EVIL), (2) Coordinate Ascent Variational Inference (CAVI), and (3) Markov Chain Monte Carlo (MCMC) with a split-and-merge process. The model supports both discrete and continuous fields, and it applies a locally varying hit mechanism to attributes with multiple truths. It also provides tools for performance evaluation based on either the approximated variational factors or the posterior samples. EVIL supports multi-threaded parallel computing to estimate the linkage structure faster.
Authors:
chomper_0.1.3.tar.gz
chomper_0.1.3.tar.gz (r-4.7-arm64), chomper_0.1.3.tar.gz (r-4.7-x86_64), chomper_0.1.3.tar.gz (r-4.6-arm64), chomper_0.1.3.tar.gz (r-4.6-x86_64)

    # Install 'chomper' in R:
    install.packages('chomper', repos = c('https://cran.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/hjkim8987/chomper/issues
Datasets:
- italy - Italian Survey on Household Income and Wealth (ISHIW) data from 2020 and 2022
- simulation.high - Synthetic data with high overlap ratio (70%)
- simulation.low - Synthetic data with low overlap ratio (30%)
- simulation.medium - Synthetic data with medium overlap ratio (50%)
Last updated from: 77097e0f34. Checks: 5 OK, 1 FAIL. Indexed: no.
| Target | Result | Time |
|---|---|---|
| linux-devel-arm64 | OK | 178 |
| linux-devel-x86_64 | OK | 165 |
| source / vignettes | OK | 310 |
| linux-release-arm64 | OK | 200 |
| linux-release-x86_64 | OK | 163 |
| wasm-release | FAIL | 130 |
Exports: chomperCAVI, chomperEVIL, chomperMCMC, flatten_posterior_samples, generate_sample_data, performance, psm_mcmc, psm_vi
Dependencies: Rcpp, RcppArmadillo, RcppThread
Readme and manuals
Help Manual
| Help page | Topics |
|---|---|
| chomper: A Comprehensive Hit or Miss Probabilistic Entity Resolution Model | chomper-package chomper |
| CHOMPER with a single Coordinate Ascent Variational Inference | chomperCAVI |
| CHOMPER with Evolutionary Variational Inference for Record Linkage | chomperEVIL |
| CHOMPER with Markov chain Monte Carlo with Split and Merge Process | chomperMCMC |
| Flatten the posterior samples, lambda, into a matrix | flatten_posterior_samples |
| Generate synthetic data for record linkage | generate_sample_data |
| Italian Survey on Household Income and Wealth (ISHIW) data from 2020 and 2022 | italy |
| Evaluate the performance of the linkage structure estimation | performance |
| Calculate the posterior similarity matrix | psm_mcmc |
| Calculate the posterior similarity matrix | psm_vi |
| Synthetic data with high overlap ratio (70%) | simulation.high |
| Synthetic data with low overlap ratio (30%) | simulation.low |
| Synthetic data with medium overlap ratio (50%) | simulation.medium |
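
The help topics above mention a posterior similarity matrix and a performance evaluation of the estimated linkage structure. As a rough illustration of what those two ideas usually mean, here is a minimal base-R sketch. It does not use or reproduce the package's own `psm_mcmc`, `psm_vi`, or `performance` functions, whose arguments are not shown on this page. The helper names `psm_from_labels` and `pairwise_scores`, the input layout (one row of entity labels per posterior draw), and the toy values are all assumptions made for the example.

```r
# Hypothetical helper, NOT part of chomper: posterior similarity matrix
# from an S x N matrix of sampled entity labels (one row per posterior draw).
# Entry [i, j] is the share of draws in which records i and j share a label.
psm_from_labels <- function(labels) {
  n <- ncol(labels)
  psm <- matrix(0, n, n)
  for (s in seq_len(nrow(labels))) {
    # outer() flags every pair of records with equal labels in draw s
    psm <- psm + outer(labels[s, ], labels[s, ], "==")
  }
  psm / nrow(labels)
}

# Hypothetical helper, NOT part of chomper: pairwise precision, recall and F1
# of an estimated labelling against a known true labelling.
pairwise_scores <- function(estimated, truth) {
  n <- length(truth)
  upper <- upper.tri(matrix(0, n, n))  # count each record pair once
  est_link <- outer(estimated, estimated, "==")[upper]
  true_link <- outer(truth, truth, "==")[upper]
  tp <- sum(est_link & true_link)
  precision <- if (sum(est_link) > 0) tp / sum(est_link) else NA_real_
  recall <- if (sum(true_link) > 0) tp / sum(true_link) else NA_real_
  f1 <- 2 * precision * recall / (precision + recall)
  c(precision = precision, recall = recall, f1 = f1)
}

# Toy input (made-up values): 3 posterior draws of labels for 4 records.
draws <- rbind(c(1, 1, 2, 3),
               c(1, 1, 2, 2),
               c(1, 2, 3, 3))
psm <- psm_from_labels(draws)

# Score the first draw against an assumed true labelling.
truth <- c(1, 1, 2, 2)
scores <- pairwise_scores(draws[1, ], truth)
print(round(psm, 3))
print(round(scores, 3))
```

Working with record pairs rather than raw labels keeps both quantities invariant to how the entities happen to be numbered in each draw, which is why label matrices are compared through `outer(..., "==")` instead of directly.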
