Package: miceFast 0.8.5

Maciej Nasinski

miceFast: Fast Imputations Using 'Rcpp' and 'Armadillo'

Fast imputations under the object-oriented programming paradigm. The package also offers a few functions built to work with popular R packages such as 'data.table' or 'dplyr'. The biggest improvement in time performance can be achieved for calculations where a grouping variable has to be used. A single evaluation of a quantitative model for multiple imputations is another major enhancement. A further major improvement is one of the fastest predictive mean matching implementations in the R world, thanks to presorting and binary search.

Authors: Maciej Nasinski [aut, cre]


# Install 'miceFast' in R:
install.packages('miceFast', repos = 'https://cloud.r-project.org')

Bug tracker: https://github.com/polkas/micefast/issues (2 issues)

Uses libs:
  • openblas: Optimized BLAS
  • c++: GNU Standard C++ Library v3
  • openmp: GCC OpenMP (GOMP) support library
Datasets:
  • air_miss - Airquality dataset with additional variables


2.70 score, 682 downloads, 9 exports, 3 dependencies

Last updated 2 months ago from: f57885a1b2. Checks: 3 OK. Indexed: no.

Target | Result | Latest binary
Doc / Vignettes | OK | Mar 06 2025
R-4.5-linux-x86_64 | OK | Mar 06 2025
R-4.4-linux-x86_64 | OK | Mar 06 2025

Exports: compare_imp, corrData, fill_NA, fill_NA_N, miceFast, naive_fill_NA, neibo, upset_NA, VIF

Dependencies: data.table, Rcpp, RcppArmadillo

miceFast - Introduction

Rendered from miceFast-intro.Rmd using knitr::rmarkdown on Mar 06 2025.

Last update: 2025-02-03
Started: 2018-03-19

Citation

To cite package ‘miceFast’ in publications use:

Nasinski M (2025). miceFast: Fast Imputations Using 'Rcpp' and 'Armadillo'. R package version 0.8.5, https://CRAN.R-project.org/package=miceFast.

Corresponding BibTeX entry:

  @Manual{,
    title = {miceFast: Fast Imputations Using 'Rcpp' and 'Armadillo'},
    author = {Maciej Nasinski},
    year = {2025},
    note = {R package version 0.8.5},
    url = {https://CRAN.R-project.org/package=miceFast},
  }

Readme and manuals

miceFast

Author: Maciej Nasinski

Check the miceFast website for more details


Overview

miceFast provides fast methods for imputing missing data, leveraging an object-oriented programming paradigm and optimized linear algebra routines.
The package includes convenient helper functions compatible with data.table, dplyr, and other popular R packages.

Major speed improvements occur when:

  • Using a grouping variable, where the data is automatically sorted by group, significantly reducing computation time.
  • Performing multiple imputations, by evaluating the underlying quantitative model only once for multiple draws.
  • Running Predictive Mean Matching (PMM), thanks to presorting and binary search.
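
To make the second point concrete, here is a minimal base-R sketch of drawing several imputations from a single model fit. It is an illustration only, not the package's internal code: the helper name impute_noise_draws and the simulated data are assumptions, and the approach (ordinary least squares plus Gaussian noise) is a simplified stand-in for a noise-based linear model method.

```r
# Hypothetical helper (not part of miceFast): OLS is fitted once,
# and only the noise term is redrawn for each imputation.
impute_noise_draws <- function(y, X, n_draws = 10) {
  X <- as.matrix(X)
  obs <- !is.na(y)
  X_obs <- X[obs, , drop = FALSE]
  # The model is estimated a single time on the complete cases.
  beta <- solve(crossprod(X_obs), crossprod(X_obs, y[obs]))
  resid <- y[obs] - drop(X_obs %*% beta)
  sigma <- sqrt(sum(resid^2) / (sum(obs) - ncol(X)))
  # Predictions for the missing rows are also computed once.
  pred <- drop(X[!obs, , drop = FALSE] %*% beta)
  # Only the noise changes between draws (pred is recycled down each column).
  noise <- matrix(rnorm(length(pred) * n_draws, sd = sigma), ncol = n_draws)
  pred + noise
}

set.seed(42)
n <- 100
X <- cbind(intercept = 1, x1 = rnorm(n))
y <- drop(X %*% c(1, 2)) + rnorm(n)
y[sample(n, 20)] <- NA

draws <- impute_noise_draws(y, X, n_draws = 5)
dim(draws)  # rows = missing cases, columns = draws
```

The cost of the fit is paid once regardless of n_draws, which is why the saving grows with the number of imputations requested.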

For performance details, see performance_validity.R in the extdata folder.

It is recommended to read the Advanced Usage Vignette.

Installation

You can install miceFast from CRAN:

install.packages("miceFast")

Or install the development version from GitHub:

# install.packages("devtools")
devtools::install_github("polkas/miceFast")

Quick Example

Below is a short demonstration. See the vignette for advanced usage and best practices.

library(miceFast)

set.seed(1234)
data(air_miss)

# Visualize the NA structure
upset_NA(air_miss, 6)

# Simple and naive fill
imputed_data <- naive_fill_NA(air_miss)

# Compare with other packages:
# Hmisc
library(Hmisc)
data.frame(Map(function(x) Hmisc::impute(x, "random"), air_miss))

# mice
library(mice)
mice::complete(mice::mice(air_miss, printFlag = FALSE))

Key Features

  • Object-Oriented Interface via miceFast objects (Rcpp modules).
  • Convenient Helpers:
    • fill_NA(): Single imputation (lda, lm_pred, lm_bayes, lm_noise).
    • fill_NA_N(): Multiple imputations (pmm, lm_bayes, lm_noise).
    • VIF(): Variance Inflation Factor calculations.
    • naive_fill_NA(): Automatic naive imputations.
    • compare_imp(): Compare original vs. imputed values.
    • upset_NA(): Visualize NA structure using UpSetR.

Quick Reference Table:

Function | Description
new(miceFast) | Creates an OOP instance with numerous imputation methods (see the vignette).
fill_NA() | Single imputation: lda, lm_pred, lm_bayes, lm_noise.
fill_NA_N() | Multiple imputations (N repeats): pmm, lm_bayes, lm_noise.
VIF() | Computes Variance Inflation Factors.
naive_fill_NA() | Performs automatic, naive imputations.
compare_imp() | Compares imputations vs. original data.
upset_NA() | Visualizes NA structure using an UpSet plot.
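
As a rough illustration of the two main helpers, the sketch below imputes Ozone in air_miss with fill_NA() and fill_NA_N(). The choice of predictors (Temp and Intercept) and the new column names are assumptions made for this example; check the function help pages for the full argument list before relying on it.

```r
library(miceFast)

data(air_miss)
df <- as.data.frame(air_miss)

# Single imputation from a linear model's predictions.
df$Ozone_pred <- fill_NA(
  x = df, model = "lm_pred",
  posit_y = "Ozone", posit_x = c("Temp", "Intercept")
)

# Predictive mean matching; for "pmm", k is the number of closest points
# from which one value is drawn at random.
set.seed(1234)
df$Ozone_pmm <- fill_NA_N(
  x = df, model = "pmm",
  posit_y = "Ozone", posit_x = c("Temp", "Intercept"),
  k = 20
)

# Count remaining NAs before and after imputation.
colSums(is.na(df[, c("Ozone", "Ozone_pred", "Ozone_pmm")]))
```

Both helpers return a vector of the same length as the input rows, so the result can be assigned directly to a new column.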

Performance Highlights

Benchmark testing (on R 4.2, macOS M1) shows miceFast can significantly reduce computation time, especially in these scenarios:

  • Linear Discriminant Analysis (LDA): ~5x faster.
  • Grouping Variable Imputations: ~10x faster (and can exceed 100x in some edge cases).
  • Multiple Imputations: the speedup scales roughly with the number of imputations, since the model is computed only once.
  • Variance Inflation Factors (VIF): ~5x faster, because we only compute the inverse of X'X.
  • Predictive Mean Matching (PMM): ~3x faster, thanks to presorting and binary search.
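
The VIF point rests on a standard identity: once the predictors are centred and scaled, the diagonal of the inverse of X'X / (n - 1), which is the inverse correlation matrix, gives every VIF at once, so no per-variable regression is needed. The sketch below demonstrates the identity in base R; the helper vif_from_inverse is hypothetical and is not claimed to match how VIF() is implemented.

```r
# Hypothetical helper (not the package's VIF()): all VIFs from one inverse.
vif_from_inverse <- function(X) {
  Z <- scale(as.matrix(X))            # centre and scale each column
  R <- crossprod(Z) / (nrow(Z) - 1)   # Z'Z / (n - 1) is the correlation matrix
  vif <- diag(solve(R))               # j-th diagonal entry is VIF_j
  names(vif) <- colnames(Z)
  vif
}

X <- mtcars[, c("disp", "hp", "wt")]
vif <- vif_from_inverse(X)

# Cross-check one value against the textbook definition 1 / (1 - R^2),
# which needs a separate regression for each predictor.
r2_disp <- summary(lm(disp ~ hp + wt, data = mtcars))$r.squared
all.equal(unname(vif["disp"]), 1 / (1 - r2_disp))
```

The regression-based route fits one model per predictor, while the inverse-based route needs a single matrix inversion.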

For performance details, see performance_validity.R in the extdata folder.
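
The presorting and binary search idea credited with the PMM speedup can be sketched in base R as follows. This is a simplified, non-Bayesian PMM written for illustration: the helper nearest_k_donor is hypothetical, and findInterval() stands in for the binary search step.

```r
# Hypothetical helper: for each value in `miss`, pick at random one of the
# k closest values in `obs` and return its index into `obs`.
nearest_k_donor <- function(obs, miss, k = 5) {
  n <- length(obs)
  k <- as.integer(min(k, n))
  ord <- order(obs)                   # presort once
  sorted <- obs[ord]
  pos <- findInterval(miss, sorted)   # binary search for each query value
  vapply(seq_along(miss), function(i) {
    # The k nearest values must lie in a window of at most 2k sorted
    # neighbours around the insertion point.
    lo <- max(1L, pos[i] - k + 1L)
    hi <- min(n, pos[i] + k)
    cand <- lo:hi
    d <- abs(sorted[cand] - miss[i])
    closest <- cand[order(d)[seq_len(k)]]
    ord[closest[sample.int(k, 1L)]]
  }, integer(1))
}

set.seed(1)
x <- rnorm(200)
y <- 2 * x + rnorm(200)
y[sample(200, 40)] <- NA
obs <- !is.na(y)

# Predicted means for all rows from a model fitted on the observed cases.
fit <- lm(y ~ x, subset = obs)
yhat <- predict(fit, newdata = data.frame(x = x))

# Match each missing case to a donor and borrow the donor's observed value.
donors <- nearest_k_donor(yhat[obs], yhat[!obs], k = 5)
y_imp <- y
y_imp[!obs] <- y[obs][donors]
```

Sorting the observed predictions once means each lookup inspects only a small window instead of scanning every observed value.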

Help Manual

Help page | Topics
miceFast package for fast multiple imputations. | miceFast-package
airquality dataset with additional variables | air_miss
Comparing imputations and original data distributions | compare_imp
'fill_NA' function for the imputations purpose. | fill_NA, fill_NA.data.frame, fill_NA.data.table, fill_NA.matrix
'fill_NA_N' function for the multiple imputations purpose | fill_NA_N, fill_NA_N.data.frame, fill_NA_N.data.table, fill_NA_N.matrix
'naive_fill_NA' function for the simple and automatic imputation | naive_fill_NA, naive_fill_NA.data.frame, naive_fill_NA.data.table, naive_fill_NA.matrix
Finding in random manner one of the k closest points in a certain vector for each value in a second vector | neibo
Class '"Rcpp_corrData"' | corrData, Rcpp_corrData-class
Class '"Rcpp_miceFast"' | miceFast, Rcpp_miceFast-class
upset plot for NA values | upset_NA
'VIF' function for assessing VIF. | VIF, VIF.data.frame, VIF.data.table, VIF.matrix