Package: llmclean 0.1.1

Sadikul Islam
llmclean: LLM-Assisted Data Cleaning with Multi-Provider Support
Detects and suggests fixes for semantic inconsistencies in data frames by calling large language models (LLMs) through a unified, provider-agnostic interface. Supported providers include 'OpenAI' ('GPT-4o', 'GPT-4o-mini') <https://platform.openai.com>, 'Anthropic' ('Claude') <https://www.anthropic.com>, 'Google' ('Gemini') <https://ai.google.dev>, 'Groq' (free-tier 'LLaMA' and 'Mixtral') <https://groq.com>, and local 'Ollama' models <https://ollama.com>. The package identifies issues that rule-based tools cannot detect: abbreviation variants, typographic errors, case inconsistencies, and malformed values. Results are returned as tidy data frames with column, row index, detected value, issue type, suggested fix, and confidence score. An offline fallback using statistical and fuzzy-matching methods is provided for use without any application programming interface (API) key. Interactive fix application with human review is supported via 'apply_fixes()'. Methods follow de Jonge and van der Loo (2013) <https://cran.r-project.org/doc/contrib/de_Jonge+van_der_Loo-Introduction_to_data_cleaning_with_R.pdf> and Chaudhuri et al. (2003) <doi:10.1145/872757.872796>.
- llmclean_0.1.1.tar.gz
- llmclean_0.1.1.tar.gz (r-4.7-any)
- llmclean_0.1.1.tar.gz (r-4.6-any)
- llmclean_0.1.1.tgz (r-4.6-emscripten)

    # Install 'llmclean' in R:
    install.packages('llmclean', repos = c('https://cran.r-universe.dev', 'https://cloud.r-project.org'))

Datasets:
- messy_employees - Hypothetical Messy Employee Records Dataset
- messy_survey - Hypothetical Messy Survey Response Dataset
This package does not link to any GitHub, GitLab, or R-Forge repository. No issue tracker or development information is available.
Last updated from: 03c9478b9f. Checks: 4 OK. Indexed: yes.
| Target | Result | Time |
|---|---|---|
| linux-devel-x86_64 | OK | 131 |
| source / vignettes | OK | 229 |
| linux-release-x86_64 | OK | 137 |
| wasm-release | OK | 109 |
Exports: apply_fixes, detect_issues, get_llm_provider, llmclean_report, offline_detect, set_llm_provider, suggest_fixes
Dependencies: cli, dplyr, generics, glue, lifecycle, magrittr, pillar, pkgconfig, R6, rlang, tibble, tidyselect, utf8, vctrs, withr
Readme and manuals
Help Manual
| Help page | Topics |
|---|---|
| llmclean: LLM-Assisted Data Cleaning with Multi-Provider Support | llmclean-package llmclean |
| Apply Suggested Fixes to a Data Frame | apply_fixes |
| Detect Semantic Inconsistencies in a Data Frame Using an LLM | detect_issues |
| Get Current LLM Provider Configuration | get_llm_provider |
| Generate a Summary Report of LLM-Assisted Data Cleaning | llmclean_report |
| Hypothetical Messy Employee Records Dataset | messy_employees |
| Hypothetical Messy Survey Response Dataset | messy_survey |
| Offline Detection of Data Inconsistencies Without an LLM | offline_detect |
| Configure the LLM Provider for Data Cleaning | set_llm_provider |
| Request Enriched Fix Suggestions for Detected Issues | suggest_fixes |
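
The package description mentions an offline fallback based on statistical and fuzzy-matching methods, with results returned as a tidy data frame (column, row index, detected value, issue type, suggested fix, confidence score). The following is only a rough base-R sketch of that general idea, not the package's implementation: the helper name `sketch_fuzzy_detect`, its `max_dist` parameter, the output column names, the confidence heuristic, and the toy data are all assumptions made for illustration. It relies solely on `utils::adist()` and does not call any 'llmclean' function.

```r
# Hypothetical helper, NOT part of llmclean: flags rare values in a character
# column that lie within a small edit distance of a more frequent value.
sketch_fuzzy_detect <- function(df, column, max_dist = 2) {
  values   <- as.character(df[[column]])
  counts   <- table(values)            # frequency of each distinct value
  distinct <- names(counts)
  freq     <- as.integer(counts)

  # Case-insensitive Levenshtein distances between all distinct values
  d <- utils::adist(distinct, distinct, ignore.case = TRUE)

  issues <- list()
  for (i in seq_along(distinct)) {
    # Candidate fixes: strictly more frequent values within max_dist edits
    cand <- which(freq > freq[i] & d[i, ] <= max_dist)
    if (length(cand) == 0) next

    best <- cand[which.min(d[i, cand])]
    dist <- d[i, best]
    longest <- max(nchar(distinct[i]), nchar(distinct[best]))

    issues[[length(issues) + 1]] <- data.frame(
      column        = column,
      row           = which(values == distinct[i]),
      value         = distinct[i],
      # Distance 0 under ignore.case means the values differ only by case
      issue_type    = if (dist == 0) "case" else "typo",
      suggested_fix = distinct[best],
      # Assumed heuristic: share of characters left unchanged
      confidence    = round(1 - dist / longest, 2),
      stringsAsFactors = FALSE
    )
  }

  if (length(issues) == 0) {
    # No issues found: return an empty frame with the same columns
    return(data.frame(
      column = character(0), row = integer(0), value = character(0),
      issue_type = character(0), suggested_fix = character(0),
      confidence = numeric(0), stringsAsFactors = FALSE
    ))
  }
  do.call(rbind, issues)
}

# Toy data (made up for this example)
toy <- data.frame(
  dept = c("Sales", "Sales", "Sales", "sales", "Slaes",
           "Finance", "Finance", "Finanse"),
  stringsAsFactors = FALSE
)
result <- sketch_fuzzy_detect(toy, "dept")
print(result)
```

Treating the more frequent spelling as the canonical one is a simplifying assumption; it breaks down when an error is more common than the correct value, which is one reason a human review step before applying fixes is useful.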