Package: blocking 1.0.3

Maciej Beręsewicz

blocking: Various Blocking Methods for Entity Resolution

The goal of 'blocking' is to provide blocking methods for record linkage and deduplication using approximate nearest neighbour (ANN) algorithms and graph techniques. It supports multiple ANN implementations via 'rnndescent', 'RcppHNSW', 'RcppAnnoy', and 'mlpack' packages, and provides integration with the 'reclin2' package. The package generates shingles from character strings and similarity vectors for record comparison, and includes evaluation metrics for assessing blocking performance including false positive rate (FPR) and false negative rate (FNR) estimates. For details see: Papadakis et al. (2020) <doi:10.1145/3377455>, Steorts et al. (2014) <doi:10.1007/978-3-319-11257-2_20>, Dasylva and Goussanou (2021) <https://www150.statcan.gc.ca/n1/en/catalogue/12-001-X202100200002>, Dasylva and Goussanou (2022) <doi:10.1007/s42081-022-00153-3>.
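
To illustrate what "shingles from character strings" means in the description above, here is a minimal base-R sketch. It assumes character bigrams (n = 2) and uses Jaccard similarity as a simple stand-in for the comparison step. The helpers `shingles()` and `jaccard()` are hypothetical names written for this illustration and are not part of the package's API; the package itself searches for neighbours with ANN algorithms rather than exhaustive pairwise comparison.

```r
# Hypothetical helper (not from 'blocking'): split a string into
# overlapping character n-grams ("shingles"), ignoring case and whitespace.
shingles <- function(s, n = 2) {
  s <- tolower(gsub("\\s+", "", s))
  if (nchar(s) < n) return(s)
  starts <- seq_len(nchar(s) - n + 1)
  unique(substring(s, starts, starts + n - 1))
}

# Hypothetical helper: Jaccard similarity between two shingle sets.
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

s1 <- shingles("montypython")
s2 <- shingles("montypyth0n")  # same record with a typo
s3 <- shingles("johnsmith")    # unrelated record

# The typo variant shares most shingles with the original,
# so it scores higher than the unrelated record.
jaccard(s1, s2)
jaccard(s1, s3)
```

Records whose shingle sets overlap heavily are the ones a blocking step would aim to place in the same block, which is why near-duplicates with typos can still be grouped together.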

Authors: Maciej Beręsewicz [aut, cre], Adam Struzik [aut, ctr]

# Install 'blocking' in R:
install.packages('blocking', repos = c('https://cran.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/ncn-foreigners/blocking/issues

Pkgdown/docs site: https://ncn-foreigners.ue.poznan.pl

Datasets:
  • census - Fictional census data
  • cis - Fictional customer data
  • foreigners - Fictional 2024 population of foreigners in Poland
  • RLdata500 - RLdata500 dataset from the RecordLinkage package

4.26 score, 1 packages, 20 scripts, 515 downloads, 10 exports, 49 dependencies

Last updated from: ef098b1eb2. Checks: 4 OK. Indexed: no.

Target                Result  Time
linux-devel-x86_64    OK      226
source / vignettes    OK      409
linux-release-x86_64  OK      214
wasm-release          OK      179

Exports: blocking, control_annoy, control_hnsw, control_kd, control_lsh, control_nnd, controls_ann, controls_txt, est_block_error, pair_ann

Dependencies: BH, bit, bit64, cli, clipr, cpp11, crayon, data.table, digest, dqrng, float, glue, hms, igraph, lattice, lgr, lifecycle, magrittr, Matrix, MatrixExtra, mlapi, mlpack, pillar, pkgconfig, prettyunits, progress, R6, Rcpp, RcppAnnoy, RcppArmadillo, RcppEnsmallen, RcppHNSW, readr, RhpcBLASctl, rlang, rnndescent, rsparse, sitmo, SnowballC, stringi, text2vec, tibble, tidyselect, tokenizers, tzdb, utf8, vctrs, vroom, withr

Blocking records for deduplication
Setup | Blocking for deduplication

Last update: 2026-06-30
Started: 2025-06-13

Integration with existing packages
Setup | Data | Integration with the reclin2 package | Usage with fastLink package | Usage with RecordLinkage package

Last update: 2026-06-30
Started: 2025-06-13

Blocking records for record linkage
Setup | Data | Linking datasets | Using basic functionalities of blocking package | Assessing the quality | Compare results

Last update: 2026-03-11
Started: 2025-06-13