Incorporated comments from peer review in JAMIA Open (Weberpals et al. 2024, doi:10.1093/jamiaopen/ooae008)
Added tune parameter to smdi_rf to allow users to perform 5-fold cross-validation and optimized random search for mtry
Changes to smdi_outcome: the model parameter no longer accepts logistic for logistic regressions. Use glm instead, along with the new corresponding parameter glm_family, which lets users take advantage of all possible glm families as an outcome regression model (CAVE: no backwards compatibility)
Variables are now one-hot encoded before running naniar::mcar_test() in smdi_little to address potential issues with categorical variables and to be consistent with smdi_hotelling. Results may differ slightly from those of previous versions, and we suggest re-running analyses.
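As an illustration of what one-hot encoding does to a categorical covariate, here is a minimal sketch in base R. It uses stats::model.matrix() on a made-up data frame; the column names are hypothetical, missing values are left out of the toy data to keep the focus on the encoding, and this is not necessarily how smdi_little performs the encoding internally.

```r
# Toy data with one numeric and one categorical covariate (hypothetical names).
df <- data.frame(
  age   = c(54, 61, 47, 70),
  stage = factor(c("I", "II", "III", "II"))
)

# Dropping the intercept ("- 1") gives every factor level its own 0/1 column,
# so the categorical variable becomes a set of numeric indicators.
encoded <- model.matrix(~ age + stage - 1, data = df)

colnames(encoded)
```

Each row has exactly one indicator set to 1 among the stage columns, so the encoded matrix can be passed to methods that expect numeric input only.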
Changed n_cores from a warning to a message notifying the user
Improvement to smdi_style_gt to show correct formatting in gt exports of any supported type
General maintenance and dependency management
CRAN release
Formally implemented unit tests
Added unit test coverage report to pkgdown website
Implemented an automated GitLab CI/CD pipeline to run checks on a daily basis
Minor fixes and improvements in documentation of functions
Included re-exports of naniar's gg_miss_upset and mice's md.pattern functions to explore missing data patterns.
New function smdi_style_gt() to make publication-ready tables based on objects of class smdi in combination with the gt package.
Added more details to Routine structural missing data diagnostics vignette.
Updated README with more details and guidance on how to interpret the three group diagnostics and apply those to a real-world study.
Some improved documentation here and there.
smdi_asmd(), and consequently also smdi_diagnose(), now also output the minimum (min) and maximum (max) absolute standardized mean difference (asmd) in addition to the mean/median to provide more comprehensive information about the asmd range without having to look at each asmd plot individually.
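To make the min/max/mean/median summary concrete, the following base R sketch computes absolute standardized mean differences on simulated data and summarizes their range. It assumes the common pooled-standard-deviation definition for continuous covariates; the helper asmd_one() and the data are hypothetical, and the calculation inside smdi_asmd() may differ.

```r
set.seed(1)

# Simulated data: 'missing' marks whether a partially observed covariate
# is missing (1) or observed (0) for each patient.
df <- data.frame(
  missing = rep(c(0, 1), each = 50),
  age     = rnorm(100, mean = 60, sd = 10),
  bmi     = rnorm(100, mean = 27, sd = 4)
)

# Absolute standardized mean difference for one continuous covariate,
# using the pooled standard deviation of the two groups.
asmd_one <- function(x, g) {
  pooled_sd <- sqrt((var(x[g == 1]) + var(x[g == 0])) / 2)
  abs(mean(x[g == 1]) - mean(x[g == 0])) / pooled_sd
}

asmd <- sapply(df[c("age", "bmi")], asmd_one, g = df$missing)

# Summarize the range across covariates instead of inspecting each one.
asmd_summary <- c(
  min    = min(asmd),
  median = median(asmd),
  mean   = mean(asmd),
  max    = max(asmd)
)
asmd_summary
```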
In case of monotone missing data patterns, we observed unreasonably high AUC values for the Group 2 diagnostic, caused by other partially observed covariates being almost perfect linear predictors of missingness. The new version has a built-in mechanism that prints a message if AUCs are very high (> 0.9). The message also gives additional details about the covariate for which this behavior was observed and the strongest predictor based on the mean decrease in accuracy. In case of monotonicity, this is typically another partially observed covariate, which is then flagged with a “_NA” suffix. Based on the message, the analyst can decide whether this variable should be dropped for the smdi diagnostics.
To address issues and lessons learned around multivariate missing data and the handling of monotone missing data patterns in smdi, an additional vignette on Multivariate missingness and monotonicity was added.
Change of colors in plots produced by smdi_rf() to address color-blindness
Some improved documentation for smdi_diagnose, smdi_asmd and smdi_rf
Internal release of version 0.1.0 for beta testing
First draft of all smdi_xxx() functions.
Implementation of parallel processing to increase computational speed using mclapply (UNIX machines only)
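For readers unfamiliar with mclapply, a minimal sketch of the pattern follows. It uses parallel::mclapply() from base R and falls back to a single core on non-UNIX systems, where forking is unavailable. The variable names and the choice of two cores are assumptions for illustration and do not reflect how the smdi functions set up their workers.

```r
library(parallel)

# Forking only works on UNIX-alikes; elsewhere mc.cores must be 1,
# in which case mclapply() runs sequentially.
n_cores <- if (.Platform$OS.type == "unix") 2L else 1L

# Apply a function to each element, distributing the work across cores.
squares <- mclapply(1:4, function(i) i^2, mc.cores = n_cores)

unlist(squares)
```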
Initial build of website using pkgdown.
Added a NEWS.md file to track changes to the package.
Created three vignettes to learn more about the smdi package