| Title: | Multimodal Late Fusion with 'caret' |
|---|---|
| Description: | Extends the 'caret' framework to support late fusion workflows, enabling users to train models independently across multiple data modalities and combine their predictions into a single meta-model. Designed for developers, data scientists, and biomedical researchers alike, 'caretMultimodal' aims to make late fusion ensemble modelling as accessible and flexible as single-dataset workflows in 'caret'. Late fusion methods are based on Wolpert (1992) <doi:10.1016/S0893-6080(05)80023-1>. |
| Authors: | Josh Dyce [aut, cre], Amrit Singh [aut] |
| Maintainer: | Josh Dyce <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.0 |
| Built: | 2026-06-30 16:49:18 UTC |
| Source: | https://github.com/cran/caretMultimodal |
caret_list object
Builds a list of caret::train objects, where each model corresponds to a dataset in data_list.
The resulting list is used as input to caret_stack() to construct a meta model.
    caret_list(
      target,
      data_list,
      method,
      identifier_column_name = NULL,
      trControl = NULL,
      metric = NULL,
      trim = TRUE,
      do_parallel = TRUE,
      ...
    )

| Argument | Description |
|---|---|
| target | Target vector, either numeric for regression or a factor/character for classification. |
| data_list | A named list of matrix-like objects, where each element is a dataset to train a separate model on. Names are preserved in the returned |
| method | The method to train the models with. Can be a custom method or one found in |
| identifier_column_name | A string giving the name of a column that links rows across datasets (e.g. a participant ID). If provided, this column must be present in all datasets in Note: Providing |
| trControl | Control for use with the |
| metric | Metric for use with |
| trim | Logical, whether the trained models should be trimmed to save memory. Default is TRUE. |
| do_parallel | Logical, whether to parallelize model training across datasets. Default is TRUE. |
| ... | Additional arguments to pass to the |
A caret_list object (a named list of trained caret::train models corresponding to data_list).
    set.seed(42)
    data(heart_failure_datasets)
    data_list <- heart_failure_datasets[c("cells", "holter", "mrna", "proteins")]

    # Define hyperparameters to tune (optional)
    tuneGrid <- expand.grid(alpha = 0.5, lambda = c(0.01, 0.1))

    # Construct caret_list object
    base_models <- caret_list(
      target = heart_failure_datasets$demo$hospitalizations,
      data_list = data_list,
      method = "glmnet",
      tuneGrid = tuneGrid
    )

    class(base_models)
caret_stack object
Train an ensemble (stacked) model from the base learners in a
caret_list. The ensemble is itself a caret::train model that learns to
combine the predictions of the base models. By default, the meta-learner is
trained on out-of-fold predictions from the resampling process, ensuring that
the ensemble does not overfit to in-sample predictions. Alternatively, new
datasets can be supplied via data_list and target for transfer-learning
style ensembling.
    caret_stack(
      caret_list,
      method,
      data_list = NULL,
      target = NULL,
      trControl = NULL,
      metric = NULL,
      ...
    )

| Argument | Description |
|---|---|
| caret_list | a |
| method | The method to train the ensemble model. Can be a custom method or one found in |
| data_list | A list of datasets to predict on, with each dataset matching the corresponding model in |
| target | Target parameter vector that must be provided if predicting on a new data list. If |
| trControl | Control for use with the |
| metric | Metric for use with |
| ... | Additional arguments to pass to |
A caret_stack object.
    set.seed(42)
    data(heart_failure_datasets)
    data_list <- heart_failure_datasets[c("cells", "holter", "mrna", "proteins")]

    # Define hyperparameters to tune (optional)
    tuneGrid <- expand.grid(alpha = 0.5, lambda = c(0.01, 0.1))

    # Construct caret_list object
    base_models <- caret_list(
      target = heart_failure_datasets$demo$hospitalizations,
      data_list = data_list,
      method = "glmnet",
      tuneGrid = tuneGrid
    )

    # Train a Random Forest stacked model on the out-of-fold predictions from the base models
    stacked_model <- caret_stack(
      caret_list = base_models,
      method = "rf"
    )

    class(stacked_model)
caret_stack
This function performs an ablation analysis on a caret_stack ensemble to evaluate
each base learner's contribution to predictive performance.
Starting from the full ensemble, the procedure iteratively removes one base learner per step. At each step:
1. The ensemble meta-learner is retrained on the remaining base learners, using the same method, tuneGrid, and trControl as the original stack.
2. Variable importance scores are extracted from the retrained meta-learner to estimate each remaining learner's relative contribution.
3. Out-of-fold predictions are generated and scored with metric_function.
4. The learner with the lowest importance score (or highest, if reverse = TRUE) is removed before the next iteration.
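The loop described above can be sketched in base R. This is a simplified illustration, not the package's implementation: it assumes a linear meta-learner fitted with lm() in place of caret::train, uses the absolute t-statistic of each coefficient as a stand-in for caret::varImp, and scores in-sample fitted values rather than true out-of-fold predictions. The function name ablate, the helper rmse, and the simulated data are all hypothetical.

```r
# Hypothetical sketch of backward ablation over base learners.
# Assumptions: lm() stands in for the meta-learner, |t| stands in for varImp,
# and in-sample fitted values stand in for out-of-fold predictions.
set.seed(1)
n <- 200
target <- rnorm(n)

# Simulated base-model predictions, one column per modality
preds <- data.frame(
  cells    = target + rnorm(n, sd = 0.5),
  holter   = target + rnorm(n, sd = 1),
  mrna     = target + rnorm(n, sd = 2),
  proteins = rnorm(n)
)

# Metric function: takes predictions and ground truth, returns one number
rmse <- function(preds, target) sqrt(mean((preds - target)^2))

ablate <- function(preds, target, metric_function, reverse = FALSE) {
  remaining <- names(preds)
  steps <- list()
  while (length(remaining) >= 1) {
    # Step 1: retrain the meta-learner on the remaining base learners
    dat <- cbind(preds[remaining], target = target)
    fit <- lm(target ~ ., data = dat)

    # Step 3: score the meta-learner's predictions
    steps[[length(steps) + 1]] <- data.frame(
      n_learners = length(remaining),
      learners   = paste(remaining, collapse = "+"),
      score      = metric_function(fitted(fit), target)
    )
    if (length(remaining) == 1) break

    # Step 2: importance of each remaining learner (absolute t-statistic)
    importance <- abs(summary(fit)$coefficients[remaining, "t value"])

    # Step 4: drop the least (or most, if reverse = TRUE) important learner
    drop <- if (reverse) names(which.max(importance)) else names(which.min(importance))
    remaining <- setdiff(remaining, drop)
  }
  do.call(rbind, steps)
}

result <- ablate(preds, target, rmse)
print(result)
```

Each row of result corresponds to one ablation step, so the score column shows how the metric changes as learners are removed.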
    ## S3 method for class 'caret_stack'
    compute_ablation(object, metric_function, metric_name, reverse = FALSE, ...)

| Argument | Description |
|---|---|
| object | A |
| metric_function | A function that takes two arguments |
| metric_name | The name of the metric. Used as a row label in the returned |
| reverse | Logical, controls the direction to ablate in. If |
| ... | Not used. Included for S3 compatibility. |
A data.table
This function does not support multiclass classifiers.
    # Load pre-trained example caret_stack object
    data(heart_failure_stack)

    # Since the example stack is a binary classifier,
    # this metric function needs to take in predictions (floats) and
    # ground truth (binary vector), and produce a single number.
    metric_fun <- function(preds, target) {
      pROC::roc(response = target, predictor = preds, quiet = TRUE)$auc
    }

    compute_ablation(heart_failure_stack, metric_fun, "AUC")
caret_stack
Computes the contribution of each individual feature to the ensemble's predictions using a two-stage application of caret::varImp:
1. Dataset-level weights: varImp is applied to the ensemble meta-learner, treating each base model's predictions as a feature. This yields a relative importance weight for each dataset.
2. Feature-level importance: varImp is applied to each base model individually, yielding feature importance scores within each dataset.
The final contribution of a feature is the product of its dataset-level weight and its within-dataset feature importance score. All scores are normalized to sum to 100.
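The arithmetic of the two-stage calculation can be illustrated with made-up numbers. The weights, feature names, and importance scores below are hypothetical, and whether the package rescales the within-dataset scores before multiplying is not stated here, so this sketch simply takes the product and normalizes at the end.

```r
# Stage 1 (hypothetical values): dataset-level weights from the meta-learner
dataset_weight <- c(cells = 50, holter = 30, mrna = 20)

# Stage 2 (hypothetical values): feature importance within each base model
feature_importance <- list(
  cells  = c(cd4 = 80, cd8 = 20),
  holter = c(hr_mean = 60, hr_sd = 40),
  mrna   = c(gene_a = 100)
)

# Product of dataset-level weight and within-dataset importance
raw <- unlist(lapply(names(feature_importance), function(d) {
  scores <- dataset_weight[[d]] * feature_importance[[d]]
  names(scores) <- paste(d, names(scores), sep = ":")
  scores
}))

# Normalize so that all contributions sum to 100
contributions <- sort(100 * raw / sum(raw), decreasing = TRUE)
print(contributions)
```

With these numbers, a dominant feature in a heavily weighted dataset (cells:cd4) ranks first, while the only feature of a lightly weighted dataset (mrna:gene_a) still outranks minor features of stronger datasets.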
    ## S3 method for class 'caret_stack'
    compute_feature_contributions(object, n_features = 20, ...)

| Argument | Description |
|---|---|
| object | A |
| n_features | The maximum number of features to include. Setting to a very large value will include all features. Default is 20. |
| ... | Not used. Included for S3 compatibility. |
A data.table
    # Load pre-trained example caret_stack object
    data(heart_failure_stack)

    compute_feature_contributions(heart_failure_stack)
The metric_function is applied to the out-of-fold predictions for the caret_stack.
    ## S3 method for class 'caret_stack'
    compute_metric(object, metric_function, metric_name, descending = TRUE, ...)

| Argument | Description |
|---|---|
| object | A |
| metric_function | A function that takes two arguments |
| metric_name | The name of the metric |
| descending | Whether to sort in descending order. If |
| ... | Not used. Included for S3 compatibility. |
A data.table of metrics
    # Load pre-trained example caret_stack object
    data(heart_failure_stack)

    # Since the example stack is a binary classifier,
    # this metric function needs to take in predictions (floats) and
    # ground truth (binary vector), and produce a single number.
    metric_fun <- function(preds, target) {
      pROC::roc(response = target, predictor = preds, quiet = TRUE)$auc
    }

    compute_metric(heart_failure_stack, metric_fun, "AUC")
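The metric_function contract (two arguments, predictions and ground truth, returning a single number) can be illustrated without package dependencies. The helpers rmse and auc_rank below are hypothetical and not part of caretMultimodal; auc_rank computes the AUC from the Mann-Whitney rank-sum identity and assumes the positive class is coded as 1.

```r
# Regression-style metric: root mean squared error
rmse <- function(preds, target) {
  sqrt(mean((preds - target)^2))
}

# Binary-classification metric: AUC via the rank-sum (Mann-Whitney) identity.
# Assumes target is coded 0/1 with 1 as the positive class.
auc_rank <- function(preds, target) {
  pos <- target == 1
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r <- rank(preds)  # ties receive average ranks
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy predictions: one positive case (0.35) is ranked below one negative (0.6)
preds  <- c(0.9, 0.8, 0.35, 0.6, 0.2, 0.1)
target <- c(1, 1, 1, 0, 0, 0)

auc_value  <- auc_rank(preds, target)  # 8 of 9 positive/negative pairs are ordered correctly
rmse_value <- rmse(c(1, 2, 3), c(1, 2, 5))
```

Any function with this shape should be usable wherever a metric_function argument is expected, provided it matches the type of predictions the stack produces.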
The relative contributions are calculated using the caret::varImp function on the ensemble model.
A scaling factor is applied to make the contributions sum to 100%.
    ## S3 method for class 'caret_stack'
    compute_model_contributions(object, descending = TRUE, ...)

| Argument | Description |
|---|---|
| object | A |
| descending | Whether to sort in descending order. If |
| ... | Not used. Included for S3 compatibility. |
A data.table
    # Load pre-trained example caret_stack object
    data(heart_failure_stack)

    compute_model_contributions(heart_failure_stack)
A multimodal dataset from Singh et al. (2019) containing demographic, cellular, electrophysiological, and molecular features for predicting cardiac-related hospitalizations. Used in examples throughout the caretMultimodal package.
    heart_failure_datasets
A named list with 5 elements:
demo: A data.frame of demographic features
cells: A data.frame of cell count features
holter: A data.frame of Holter monitor (ECG) features
mrna: A data.frame of mRNA expression features
proteins: A data.frame of protein abundance features
Singh et al. Ensembling Electrical and Proteogenomics Biomarkers for Improved Prediction of Cardiac-Related 3-Month Hospitalizations: A Pilot Study. Can J Cardiol. 2019 Apr. doi:10.1016/j.cjca.2018.11.021
caret_stack on Heart Failure Datasets
A caret_stack object pre-trained on heart_failure_datasets.
Used in examples throughout the caretMultimodal package.
    heart_failure_stack
A caret_stack object
caret_list
Retrieve the out-of-fold predictions corresponding to the best hyperparameter setting of the trained caret models. These predictions come from the resampling process (not the final refit) and can optionally be aggregated across resamples to produce a single prediction per training instance.
    ## S3 method for class 'caret_list'
    oof_predictions(
      object,
      drop_redundant_class = TRUE,
      aggregate_resamples = TRUE,
      intersection_only = TRUE,
      ...
    )

| Argument | Description |
|---|---|
| object | A |
| drop_redundant_class | Logical, whether to exclude the first class level from prediction output. Default is TRUE. |
| aggregate_resamples | Logical, whether to aggregate resamples across folds. Default is TRUE. |
| intersection_only | Logical, whether to trim down the out-of-fold predictions to only the intersection of samples that are present across all models in the list (i.e., the intersection of training indices used during resampling). Default is TRUE. |
| ... | Not used. Included for S3 compatibility. |
A data.table::data.table of out-of-fold predictions, with samples as rows and predictions as columns.
    # Load pre-trained example caret_stack object
    data(heart_failure_stack)

    # Extract the caret_list object from the caret_stack
    base_models <- heart_failure_stack$caret_list

    oof_predictions(base_models)
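Aggregating across resamples can be pictured with a small base R sketch. The column names rowIndex, Resample, and pred follow the layout caret uses for saved resampling predictions; the values are made up, and averaging is an assumption about how the aggregation is done rather than a statement about the package internals.

```r
# Toy out-of-fold predictions from a repeated cross-validation:
# each training row (rowIndex) is held out once per repeat.
oof <- data.frame(
  rowIndex = c(1, 2, 3, 1, 2, 3),
  Resample = c("Fold1.Rep1", "Fold2.Rep1", "Fold3.Rep1",
               "Fold1.Rep2", "Fold2.Rep2", "Fold3.Rep2"),
  pred     = c(0.2, 0.7, 0.5, 0.4, 0.9, 0.3)
)

# Collapse to one prediction per training instance by averaging (assumed)
aggregated <- aggregate(pred ~ rowIndex, data = oof, FUN = mean)
print(aggregated)
```

Without aggregation, a repeated resampling scheme yields several rows per training instance, which is why the aggregated form is the more convenient input for a meta-learner.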
Retrieve the out-of-fold predictions corresponding to the best hyperparameter setting of a trained ensemble model. These predictions come from the resampling process (not the final refit) and can optionally be aggregated across resamples to produce a single prediction per training instance.
The base model predictions returned here are the training data for the ensemble; depending on model setup, these may be true out-of-fold predictions or simply fitted values. For classification models, the predictions always exclude the first class index.
    ## S3 method for class 'caret_stack'
    oof_predictions(
      object,
      drop_redundant_class = TRUE,
      aggregate_resamples = TRUE,
      ...
    )

| Argument | Description |
|---|---|
| object | A |
| drop_redundant_class | A boolean controlling whether to exclude the first class level from prediction output. Default is TRUE. |
| aggregate_resamples | Logical, whether to aggregate resamples across folds. Default is TRUE. |
| ... | Not used. Included for S3 compatibility. |
A data.table::data.table of OOF predictions
    # Load pre-trained example caret_stack object
    data(heart_failure_stack)

    oof_predictions(heart_failure_stack)
caret_stack
Makes a bar plot from compute_ablation.caret_stack output.

    ## S3 method for class 'caret_stack'
    plot_ablation(object, metric_function, metric_name, reverse = FALSE, ...)

| Argument | Description |
|---|---|
| object | A |
| metric_function | A function that takes two arguments |
| metric_name | The name of the metric. Used as a row label in the returned |
| reverse | Logical, controls the direction to ablate in. If |
| ... | Not used. Included for S3 compatibility. |
A ggplot2 bar plot
    # Load pre-trained example caret_stack object
    data(heart_failure_stack)

    # Since the example stack is a binary classifier,
    # this metric function needs to take in predictions (floats) and
    # ground truth (binary vector), and produce a single number.
    metric_fun <- function(preds, target) {
      pROC::roc(response = target, predictor = preds, quiet = TRUE)$auc
    }

    plot_ablation(heart_failure_stack, metric_fun, "AUC")
caret_stack
Constructs a bar plot with the output of compute_feature_contributions.caret_stack.

    ## S3 method for class 'caret_stack'
    plot_feature_contributions(object, n_features = 20, ...)

| Argument | Description |
|---|---|
| object | A |
| n_features | The maximum number of features to include. Setting to a very large value will include all features. Default is 20. |
| ... | Not used. Included for S3 compatibility. |
A ggplot2 bar plot.
    # Load pre-trained example caret_stack object
    data(heart_failure_stack)

    plot_feature_contributions(heart_failure_stack)
This function constructs a bar plot with the output of the compute metric method. The bars are ordered by increasing value.
    ## S3 method for class 'caret_stack'
    plot_metric(object, metric_function, metric_name, descending = TRUE, ...)

| Argument | Description |
|---|---|
| object | A |
| metric_function | A function that takes two arguments |
| metric_name | The name of the metric |
| descending | Whether to sort in descending order. If |
| ... | Not used. Included for S3 compatibility. |
A ggplot2 bar chart
    # Load pre-trained example caret_stack object
    data(heart_failure_stack)

    # Since the example stack is a binary classifier,
    # this metric function needs to take in predictions (floats) and
    # ground truth (binary vector), and produce a single number.
    metric_fun <- function(preds, target) {
      pROC::roc(response = target, predictor = preds, quiet = TRUE)$auc
    }

    plot_metric(heart_failure_stack, metric_fun, "AUC")
The relative contributions are calculated using the caret::varImp function on the ensemble model.
A scaling factor is applied to make the contributions sum to 100%.
    ## S3 method for class 'caret_stack'
    plot_model_contributions(object, descending = TRUE, ...)

| Argument | Description |
|---|---|
| object | A |
| descending | Whether to sort in descending order. If |
| ... | Not used. Included for S3 compatibility. |
A ggplot2 bar chart
    # Load pre-trained example caret_stack object
    data(heart_failure_stack)

    plot_model_contributions(heart_failure_stack)
This function calculates ROC curves for all base models and the ensemble model
using the out-of-fold predictions from a caret_stack object.
The pROC package is used to compute the ROC curves. ROC curves can only be constructed for binary classifiers.
    ## S3 method for class 'caret_stack'
    plot_roc(object, include_auc = TRUE, ...)

| Argument | Description |
|---|---|
| object | A |
| include_auc | Whether to include AUC values in the legend. Default is TRUE. |
| ... | Not used. Included for S3 compatibility. |
A ggplot2 object
    # Load pre-trained example caret_stack object
    data(heart_failure_stack)

    plot_roc(heart_failure_stack)
caret_list
Generate a matrix of predictions from each model in a caret_list.
For classification models, probabilities are always returned, with the option to drop
one class to avoid multicollinearity in downstream stacking models.
    ## S3 method for class 'caret_list'
    predict(object, data_list, drop_redundant_class = TRUE, ...)

| Argument | Description |
|---|---|
| object | A |
| data_list | A list of datasets to predict on, with each dataset matching the corresponding model in |
| drop_redundant_class | Logical, whether to exclude the first class level from prediction output. Default is TRUE. |
| ... | Additional arguments to pass to |
A data.table::data.table of predictions
    # Load example data and pre-trained caret_stack object
    data(heart_failure_datasets)
    data(heart_failure_stack)

    # Extract the caret_list object from the caret_stack
    base_models <- heart_failure_stack$caret_list

    # List of datasets to predict on
    data_list <- heart_failure_datasets[c("cells", "holter", "mrna", "proteins")]

    predict(base_models, data_list)
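Why dropping one class helps can be shown with a small base R sketch. Class probabilities sum to one for every sample, so the full set of probability columns is linearly dependent once an intercept is present. The class names and probabilities below are made up for illustration.

```r
# Toy class probabilities from a binary classifier (each row sums to 1)
probs <- data.frame(
  no  = c(0.9, 0.4, 0.25),
  yes = c(0.1, 0.6, 0.75)
)

# Drop the first class level, mirroring drop_redundant_class = TRUE
reduced <- probs[, -1, drop = FALSE]

# With an intercept, the full probability matrix is rank deficient:
# intercept = no + yes, so only 2 of the 3 columns carry information.
x_full <- cbind(intercept = 1, as.matrix(probs))
x_reduced <- cbind(intercept = 1, as.matrix(reduced))

rank_full <- qr(x_full)$rank        # fewer than ncol(x_full)
rank_reduced <- qr(x_reduced)$rank  # equal to ncol(x_reduced)
```

The reduced matrix keeps all of the information in the predictions while giving a downstream linear meta-learner a full-rank design matrix.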
caret_stack object
Create a matrix of predictions for a caret_stack object.

    ## S3 method for class 'caret_stack'
    predict(object, data_list, drop_redundant_class = TRUE, ...)

| Argument | Description |
|---|---|
| object | A |
| data_list | A list of datasets to predict on, with each dataset matching the corresponding model in |
| drop_redundant_class | A boolean controlling whether to exclude the first class from prediction output. Default is TRUE. |
| ... | Additional arguments to pass to |
A data.table::data.table of predictions for base and ensemble models.
    # Load example data and pre-trained caret_stack object
    data(heart_failure_datasets)
    data(heart_failure_stack)

    # List of datasets to predict on
    data_list <- heart_failure_datasets[c("cells", "holter", "mrna", "proteins")]

    predict(heart_failure_stack, data_list)
Converts a MultiAssayExperiment object from the MultiAssayExperiment package to a
simple list of datasets to pass into caret_list.
    prepare_mae(mae, transpose = FALSE, ...)

| Argument | Description |
|---|---|
| mae | The MultiAssayExperiment object. |
| transpose | Whether to transpose the individual matrices. Samples must correspond to rows for caret_list. Default is FALSE. |
| ... | Not used. Included for S3 compatibility. |
A named list of matrices.
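The orientation requirement can be illustrated with a plain matrix. Assay matrices are commonly stored with features in rows and samples in columns, in which case they need to be transposed before being passed to caret_list. The matrix below is a made-up stand-in for a single assay and does not use the MultiAssayExperiment package.

```r
# Toy assay stored as features x samples (hypothetical names and values)
assay <- matrix(
  1:6,
  nrow = 2,
  dimnames = list(c("gene_a", "gene_b"), c("s1", "s2", "s3"))
)

# Transpose so that samples are rows and features are columns
samples_by_features <- t(assay)

dim(samples_by_features)       # 3 samples, 2 features
rownames(samples_by_features)  # sample identifiers
```

If the matrices already have one row per sample, no transposition is needed and the default transpose = FALSE applies.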
caret_list models
Provide a summary of the best tuning parameters and resampling metrics for all the caret_list models.

    ## S3 method for class 'caret_list'
    summary(object, ...)

| Argument | Description |
|---|---|
| object | a |
| ... | Not used. Included for S3 compatibility. |
A data.table with tunes and metrics from each model.
    # Load pre-trained example caret_stack object
    data(heart_failure_stack)

    # Extract the caret_list object from the caret_stack
    base_models <- heart_failure_stack$caret_list

    summary(base_models)
caret_stack object
Get a summary of a caret_stack object.

    ## S3 method for class 'caret_stack'
    summary(object, ...)

| Argument | Description |
|---|---|
| object | A |
| ... | Not used. Included for S3 compatibility. |
A data.table of methods, tuning parameters and performance metrics for the base and ensemble model
    # Load pre-trained example caret_stack object
    data(heart_failure_stack)

    summary(heart_failure_stack)