Title: Fast Machine Learning Model Training and Evaluation
Description: Streamlines the training, evaluation, and comparison of multiple machine learning models with minimal code by providing comprehensive data preprocessing and support for a wide range of algorithms with hyperparameter tuning. It offers performance metrics and visualization tools to facilitate efficient and effective machine learning workflows.
Authors: Selcuk Korkmaz [aut, cre], Dincer Goksuluk [aut]
Maintainer: Selcuk Korkmaz <[email protected]>
License: GPL (>= 2)
Version: 0.1.0
Built: 2024-11-25 15:02:49 UTC
Source: CRAN
Evaluates the trained models on the test data and computes performance metrics.
evaluate_models(models, test_data, label, metric = "Accuracy")
models: A list of trained model objects.
test_data: Preprocessed test data frame.
label: Name of the target variable.
metric: The performance metric to report (e.g., "Accuracy", "ROC"). Default is "Accuracy".
A list of performance metrics for each model.
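A minimal sketch of calling evaluate_models directly; it is normally invoked for you by fastml(), so the `models` list and preprocessed `test` data frame assumed here are hypothetical placeholders, not objects the package creates under these names:

```r
library(fastml)

# Assumes `models` is a named list of trained model objects and `test`
# is a preprocessed test data frame containing the "Species" column.
metrics <- evaluate_models(
  models    = models,
  test_data = test,
  label     = "Species",
  metric    = "Accuracy"
)

# One set of performance metrics per model
names(metrics)
```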
Trains and evaluates multiple classification models.
fastml(
  data,
  label,
  algorithms = c("xgboost", "random_forest", "svm_radial"),
  test_size = 0.2,
  resampling_method = "cv",
  folds = 5,
  tune_params = NULL,
  metric = "Accuracy",
  n_cores = 1,
  stratify = TRUE,
  impute_method = NULL,
  encode_categoricals = TRUE,
  scaling_methods = c("center", "scale"),
  summaryFunction = NULL,
  seed = 123
)
data: A data frame containing the features and target variable.
label: A string specifying the name of the target variable.
algorithms: A character vector of algorithm names to use. Default is c("xgboost", "random_forest", "svm_radial").
test_size: A numeric value between 0 and 1 indicating the proportion of the data to use for testing. Default is 0.2.
resampling_method: A string specifying the resampling method for cross-validation. Default is "cv".
folds: An integer specifying the number of folds for cross-validation. Default is 5.
tune_params: A list specifying hyperparameter tuning ranges. Default is NULL.
metric: The performance metric to optimize during training. Default is "Accuracy".
n_cores: An integer specifying the number of CPU cores to use for parallel processing. Default is 1.
stratify: Logical indicating whether to use stratified sampling when splitting the data. Default is TRUE.
impute_method: Method for missing value imputation. Default is NULL.
encode_categoricals: Logical indicating whether to encode categorical variables. Default is TRUE.
scaling_methods: Vector of scaling methods to apply. Default is c("center", "scale").
summaryFunction: A custom summary function for model evaluation. Default is NULL.
seed: An integer value specifying the random seed for reproducibility. Default is 123.
An object of class fastml_model containing the best model, performance metrics, and other information.
# Example 1: Using the iris dataset for binary classification (excluding 'setosa')
data(iris)
iris <- iris[iris$Species != "setosa", ]  # Binary classification
iris$Species <- factor(iris$Species)

# Train models
model <- fastml(
  data = iris,
  label = "Species"
)

# View model summary
summary(model)

# Example 2: Using the mtcars dataset for binary classification
data(mtcars)
mtcars$am <- factor(mtcars$am)  # Convert transmission (0 = automatic, 1 = manual) to a factor

# Train models with a different resampling method and specific algorithms
model2 <- fastml(
  data = mtcars,
  label = "am",
  algorithms = c("random_forest", "svm_radial"),
  resampling_method = "repeatedcv",
  folds = 3,
  test_size = 0.25
)

# View model performance
summary(model2)

# Example 3: Using the airquality dataset with missing values
data(airquality)
airquality <- na.omit(airquality)  # Remove missing values for demonstration
airquality$Month <- factor(airquality$Month)

# Train models with categorical encoding and scaling
model3 <- fastml(
  data = airquality,
  label = "Month",
  encode_categoricals = TRUE,
  scaling_methods = c("center", "scale")
)

# Evaluate and compare models
summary(model3)

# Example 4: Custom hyperparameter tuning for a random forest
data(iris)
iris <- iris[iris$Species != "setosa", ]  # Filter out 'setosa' for binary classification
iris$Species <- factor(iris$Species)

custom_tuning <- list(
  random_forest = expand.grid(mtry = 1:10)
)

model4 <- fastml(
  data = iris,
  label = "Species",
  algorithms = c("random_forest"),
  tune_params = custom_tuning,
  metric = "Accuracy"
)

# View the results
summary(model4)
Loads a trained model object from a file.
load_model(filepath)
filepath: A string specifying the file path to load the model from.
An object of class fastml_model.
Generates plots to compare the performance of different models.
## S3 method for class 'fastml_model'
plot(x, ...)
x: An object of class fastml_model.
...: Additional arguments (not used).
Displays comparison plots of model performances.
Makes predictions on new data using the trained model.
## S3 method for class 'fastml_model'
predict(object, newdata, ...)
object: An object of class fastml_model.
newdata: A data frame containing new data for prediction.
...: Additional arguments (not used).
A vector of predictions.
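A short sketch of predicting on new observations; it assumes `model` is a fastml_model fitted on the iris data as in the fastml() examples above, and that new data carries the same feature columns as the training data:

```r
library(fastml)

# Assumes `model` was returned by fastml(data = iris, label = "Species").
# New observations must have the same feature columns used in training.
new_obs <- iris[1:5, setdiff(names(iris), "Species")]

preds <- predict(model, newdata = new_obs)
preds  # a vector of predicted classes
```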
Saves the trained model object to a file.
save_model(model, filepath)
model: An object of class fastml_model.
filepath: A string specifying the file path to save the model.
No return value, called for its side effect of saving the model object to a file.
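A round-trip sketch combining save_model() and load_model(); `model` is assumed to be a fitted fastml_model, and the .rds file extension is an assumption, not a requirement stated by the package:

```r
library(fastml)

# Assumes `model` is a fastml_model object returned by fastml().
path <- tempfile(fileext = ".rds")

save_model(model, path)       # called for its side effect of writing the file
restored <- load_model(path)  # returns the fastml_model object

summary(restored)
```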
Provides a detailed summary of the models' performances.
## S3 method for class 'fastml_model'
summary(object, sort_metric = NULL, ...)
object: An object of class fastml_model.
sort_metric: A string specifying which metric to sort the models by. Default is NULL.
...: Additional arguments (not used).
Prints a summary of the models' performances and displays comparison plots.
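A brief sketch of sorting the comparison by a chosen metric; `model` is assumed to be a fastml_model trained with several algorithms:

```r
library(fastml)

# Assumes `model` was returned by fastml() with multiple algorithms.
# Sort the performance comparison by accuracy.
summary(model, sort_metric = "Accuracy")
```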
Trains specified machine learning algorithms on the preprocessed training data.
train_models(
  train_data,
  label,
  algorithms,
  resampling_method,
  folds,
  repeats = NULL,
  tune_params,
  metric,
  summaryFunction = NULL,
  seed = 123
)
train_data: Preprocessed training data frame.
label: Name of the target variable.
algorithms: Vector of algorithm names to train.
resampling_method: Resampling method for cross-validation (e.g., "cv", "repeatedcv").
folds: Number of folds for cross-validation.
repeats: Number of times to repeat cross-validation (only applicable for methods like "repeatedcv"). Default is NULL.
tune_params: List of hyperparameter tuning ranges.
metric: The performance metric to optimize.
summaryFunction: A custom summary function for model evaluation. Default is NULL.
seed: An integer value specifying the random seed for reproducibility. Default is 123.
A list of trained model objects.
Ensures that the tuneGrid includes all required hyperparameters and adjusts it based on cross-validation.
validate_tuneGrid(tuneGrid, default_params, required_params, resampling_method)
tuneGrid: User-provided tuning grid.
default_params: Default hyperparameter ranges.
required_params: Required hyperparameters for the algorithm.
resampling_method: Logical indicating whether cross-validation is enabled.
A validated and possibly modified tuneGrid.