knn_imp() no longer supports Ball Tree-based imputation due to lower
accuracy compared to brute-force K-NN.pca_imp() gained a .progress argument to display imputation progress.Fixed pca_imp() tests that broke due to upstream changes in
missMDA::imputePCA() and FactoMineR::svd.triplet().
Implemented an internal .crate() function, a minimal version of
carrier::crate().
DESCRIPTION now describes the LOBPCG eigensolver and its warm-start
behavior across EM iterations in pca_imp().
Minor wording improvements.
Added citation information.
knn_imp() no longer caches pair-wise distances. The use_cache argument
has been removed.pca_imp() gained a solver argument to select the LOBPCG eigensolver,
configured via the new lobpcg_control() helper.
pca_imp() gained a clamp argument to bound imputed values, e.g.
c(0, 1) for methylation beta values. Also threaded through slide_imp()
and group_imp().
knn_imp() now uses {RcppThread} instead of OpenMP, enabling parallelism
on macOS.
tune_imp() now infers the subset from na_loc, speeding up tuning for
knn_imp() and slide_imp().
prep_groups() is now an S3 generic, replacing the previous register-on-load
pattern with {slideimp.extra}.
New article describing how to speed up PCA imputation by choosing solver
and threshold.
Fixed numerical tolerance check failure on CRAN's ATLAS configuration.
group_imp() enforces stricter data validation. The requested feature subset
must be a subset the object's column names which must be a subset of the mapping
data.frame. Set allow_unmapped = TRUE to bypass errors when intersections are
incomplete.
group_imp() and tune_imp() now error when arguments are supplied that do
not apply to the chosen imputation method, rather than silently ignoring them.
knn_imp() now uses a logical tree argument to toggle between Ball tree
(TRUE) and brute force (FALSE). KD tree is no longer supported.
knn_imp() and pca_imp() gain more early errors and early exits.
pca_imp() gains the same colmax and post_imp arguments as knn_imp().
prep_groups() (formerly group_features()) is the new name for the grouping
function. It now accepts a column name vector instead of a full matrix.
sample_na_loc() (formerly inject_na()) is now exported. The original
remains accessible via slideimp:::inject_na() for legacy code.
sim_mat() now returns a matrix in sample-by-column format for immediate
compatibility with other package functions. perc_NA is renamed to
perc_total_na, and dimensions are now specified via n (rows) and p (columns).
tune_imp() gains a unified method argument that applies to both
pca_imp() and knn_imp(), replacing pca_method and knn_method.
The rep argument is renamed to n_reps.
tune_imp() results from v0.5.4 are no longer reproducible because internal
NA generation now uses sample_na_loc().
The khanmiss1 dataset has been removed.
compute_metrics() now supports data frames with a result list column
containing truth and estimate columns, similar to {yardstick}.
group_imp() and prep_groups() automatically look up Illumina manifests
using the register-on-load pattern for {slideimp.extra}.
knn_imp() gains max_cache to control the internal cache size
(defaults to 4GB).
sim_mat() gains a rho argument to support compound symmetry correlation
structures in simulated matrices.
sim_mat() and tune_imp() gain dedicated print methods that provide concise
summaries instead of dumping raw data to the console.
slide_imp() gains location, flank, and dry_run arguments for
fixed-window imputation, "flank mode" for features surrounding a subset, and
pre-computation inspection of window statistics.
tune_imp() gains granular control over NA injection via n_cols, n_rows,
num_na, and na_col_subset. Pre-calculated locations can also be passed to
na_loc to compare methods using identical NA patterns.
col_vars() and mean_imp_col() have been overhauled to use the faster
{RcppArmadillo} backend and now support parallel computation with OpenMP.
Dependencies are streamlined. {tibble} and {purrr} are removed as hard
dependencies, {cli} is added for more informative messaging, and {carrier}
is added as an explicit dependency.
Documentation is thoroughly overhauled with numerous consistency improvements and bug fixes.
{RhpcBLASctl} is added as a suggested package to allow pinning BLAS cores
and avoid thrashing during parallel runs.
group_imp() and tune_imp() prioritize process-level parallelization via
{mirai}. knn_imp() supports OpenMP-controlled parallelization via the cores
argument when {mirai} daemons are not active.
knn_imp() and pca_imp() use optimized internal Rcpp functions for better
performance.
CRAN resubmission.
group_features() is added to help with creating the group tibble needed for
group_imp().
pca_imp() now allows row.w = "n_miss" to scale row weights by the number
of missing values per row.