Title: | Data Pre-Processing Extensions |
---|---|
Description: | Data management support for artificial intelligence is an important aspect of data analytics, which includes preparing data correctly. This package provides extensions to support data preparation in terms of both data sampling and data engineering. Overall, the package provides researchers with a comprehensive set of functionalities for data science based on experiment lines, promoting ease of use, extensibility, and integration with various tools and libraries. Information on Experiment Line is based on Ogasawara et al. (2009) <doi:10.1007/978-3-642-02279-1_20>. |
Authors: | Eduardo Ogasawara [aut, ths, cre], Federal Center for Technological Education of Rio de Janeiro (CEFET/RJ) [cph] |
Maintainer: | Eduardo Ogasawara <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.767 |
Built: | 2024-11-23 06:36:12 UTC |
Source: | CRAN |
Oversampling balances the class distribution of a dataset by increasing the representation of the minority class. It wraps the smotefamily library.
bal_oversampling(attribute)
attribute | The class attribute to balance by oversampling. |
A bal_oversampling object.
data(iris)
# Build an imbalanced subset: 50 setosa, 21 versicolor, 11 virginica
mod_iris <- iris[c(1:50, 51:71, 101:111), ]
bal <- bal_oversampling('Species')
bal <- daltoolbox::fit(bal, mod_iris)
adjust_iris <- daltoolbox::transform(bal, mod_iris)
# Class counts after balancing
table(adjust_iris$Species)
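To make the idea of oversampling concrete, here is a minimal base-R sketch that balances classes by duplicating randomly chosen minority rows. It is not the package's implementation: bal_oversampling wraps smotefamily, whereas this sketch uses plain random duplication. The helper name random_oversample is hypothetical.

```r
# Illustrative only: random oversampling by duplicating minority-class rows.
# This is NOT what bal_oversampling does internally (it wraps smotefamily).
random_oversample <- function(data, attribute) {
  # Row indices grouped by class, ignoring unused factor levels
  groups <- split(seq_len(nrow(data)), data[[attribute]], drop = TRUE)
  target <- max(lengths(groups))
  idx <- unlist(lapply(groups, function(rows) {
    extra <- target - length(rows)
    # Keep every original row, then draw the shortfall with replacement
    c(rows, rows[sample.int(length(rows), extra, replace = TRUE)])
  }), use.names = FALSE)
  data[idx, , drop = FALSE]
}

set.seed(1)
data(iris)
mod_iris <- iris[c(1:50, 51:71, 101:111), ]
oversampled <- random_oversample(mod_iris, "Species")
table(oversampled$Species)
```

Every class ends up with as many rows as the largest class, at the cost of exact duplicates among the minority rows.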
Subsampling balances the class distribution of a dataset by reducing the representation of the majority class.
bal_subsampling(attribute)
attribute | The class attribute to balance by subsampling. |
A bal_subsampling object.
data(iris)
# Build an imbalanced subset: 50 setosa, 21 versicolor, 11 virginica
mod_iris <- iris[c(1:50, 51:71, 101:111), ]
bal <- bal_subsampling('Species')
bal <- daltoolbox::fit(bal, mod_iris)
adjust_iris <- daltoolbox::transform(bal, mod_iris)
# Class counts after balancing
table(adjust_iris$Species)
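The subsampling idea can be sketched in base R as random undersampling: every class is cut down to the size of the smallest one. This is an illustration under that assumption, not necessarily how bal_subsampling chooses rows; the helper name random_subsample is hypothetical.

```r
# Illustrative only: random undersampling down to the smallest class size.
random_subsample <- function(data, attribute) {
  # Row indices grouped by class, ignoring unused factor levels
  groups <- split(seq_len(nrow(data)), data[[attribute]], drop = TRUE)
  target <- min(lengths(groups))
  idx <- unlist(lapply(groups, function(rows) {
    # Draw `target` rows without replacement from each class
    rows[sample.int(length(rows), target)]
  }), use.names = FALSE)
  data[idx, , drop = FALSE]
}

set.seed(1)
data(iris)
mod_iris <- iris[c(1:50, 51:71, 101:111), ]
subsampled <- random_subsample(mod_iris, "Species")
table(subsampled$Species)
```

Unlike oversampling, this discards data: the result here keeps 11 rows per class because the smallest class in mod_iris has 11 rows.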
Feature selection is the process of choosing a subset of relevant features from a dataset for use in model training. The FeatureSelection class in R provides a framework for performing feature selection.
fs(attribute)
attribute | The target variable. |
An instance of the FeatureSelection class.
# See ?fs_fss for an example of feature selection
Forward stepwise selection is a technique for feature selection in which attributes are added to a model one at a time based on their ability to improve the model's performance. It stops when no candidate attribute significantly improves the model fit. It wraps the leaps library.
fs_fss(attribute)
attribute | The target variable. |
An fs_fss object.
data(iris)
# Fit the selector on iris, then apply it to the data
myfeature <- daltoolbox::fit(fs_fss("Species"), iris)
data <- daltoolbox::transform(myfeature, iris)
head(data)
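The add-one-attribute-at-a-time procedure described above can be sketched with base R's stats::step, which grows a linear model from an intercept-only start and stops when no addition lowers the AIC. This is a stand-in for illustration: fs_fss wraps leaps, whose search and stopping criterion may differ, and the mtcars data with a numeric response is an assumption chosen to keep the example self-contained.

```r
# Illustrative only: forward stepwise selection with stats::step (AIC-based).
# fs_fss wraps the leaps library instead; this is not its implementation.
data(mtcars)

# Start from the intercept-only model
null_model <- lm(mpg ~ 1, data = mtcars)

# Candidate attributes that may be added, one at a time
full_scope <- mpg ~ cyl + disp + hp + drat + wt + qsec

fss_model <- step(null_model,
                  scope = list(lower = ~ 1, upper = full_scope),
                  direction = "forward", trace = 0)

# Attributes retained by the forward search
fss_selected <- attr(terms(fss_model), "term.labels")
fss_selected
```

Because the search is greedy, an attribute added early is never removed, even if later additions make it redundant.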
Information Gain is a feature selection technique based on information theory. It measures the information obtained for the target variable by knowing the presence or absence of a feature. It wraps the FSelector library.
fs_ig(attribute)
attribute | The target variable. |
An fs_ig object.
data(iris)
# Fit the selector on iris, then apply it to the data
myfeature <- daltoolbox::fit(fs_ig("Species"), iris)
data <- daltoolbox::transform(myfeature, iris)
head(data)
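As a rough illustration of the measure, information gain can be computed as the entropy of the target minus its expected entropy after splitting on a feature. The base-R sketch below assumes equal-width binning into three intervals for numeric features and base-2 logarithms; FSelector may discretise and scale differently, so the numbers are not expected to match fs_ig. The helper names entropy and info_gain are hypothetical.

```r
# Illustrative only: information gain H(Y) - H(Y | X) for a binned feature.
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]                      # skip empty classes to avoid log2(0)
  -sum(p * log2(p))
}

info_gain <- function(feature, target, bins = 3) {
  # Equal-width discretisation (an assumption of this sketch)
  binned <- cut(feature, breaks = bins)
  parts <- split(target, binned, drop = TRUE)
  # Entropy of the target within each bin, weighted by bin size
  conditional <- sum(sapply(parts, function(part) {
    length(part) / length(target) * entropy(part)
  }))
  entropy(target) - conditional
}

data(iris)
ig_scores <- sapply(iris[, 1:4], info_gain, target = iris$Species)
sort(ig_scores, decreasing = TRUE)
```

Features with higher scores reduce uncertainty about the target more; a selector would typically keep the top-ranked ones.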
Feature selection using Lasso regression selects a subset of relevant features: the L1 penalty shrinks the coefficients of less relevant features to zero. It wraps the glmnet library.
fs_lasso(attribute)
attribute | The target variable. |
An fs_lasso object.
data(iris)
# Fit the selector on iris, then apply it to the data
myfeature <- daltoolbox::fit(fs_lasso("Species"), iris)
data <- daltoolbox::transform(myfeature, iris)
head(data)
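To show why Lasso acts as a feature selector, the sketch below fits a Lasso model by cyclic coordinate descent with soft-thresholding in base R and reports which coefficients stay nonzero. It is a teaching sketch, not glmnet's algorithm or the way fs_lasso chooses the penalty; the fixed lambda values, the mtcars data, and the helper names soft_threshold and lasso_cd are all assumptions.

```r
# Illustrative only: Lasso via cyclic coordinate descent, minimising
# (1 / (2n)) * ||y - X b||^2 + lambda * ||b||_1 on standardised columns.
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

lasso_cd <- function(x, y, lambda, iterations = 200) {
  n <- nrow(x)
  # Standardise so each column has mean 0 and mean square 1
  x <- scale(x) * sqrt(n / (n - 1))
  y <- y - mean(y)
  beta <- setNames(numeric(ncol(x)), colnames(x))
  for (it in seq_len(iterations)) {
    for (j in seq_along(beta)) {
      # Residual with feature j left out of the fit
      partial <- y - x[, -j, drop = FALSE] %*% beta[-j]
      beta[j] <- soft_threshold(sum(x[, j] * partial) / n, lambda)
    }
  }
  beta
}

data(mtcars)
predictors <- as.matrix(mtcars[, c("cyl", "disp", "hp", "drat", "wt", "qsec")])
lasso_coefs <- lasso_cd(predictors, mtcars$mpg, lambda = 1)

# Features whose coefficients survived the penalty
names(lasso_coefs)[lasso_coefs != 0]
```

Raising lambda drives more coefficients to exactly zero; with a large enough penalty no feature is kept at all.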
Feature selection using Relief is a technique for selecting a subset of relevant features. It calculates the relevance of a feature by considering the difference in feature values between nearest neighbors of the same and different classes. It wraps the FSelector library.
fs_relief(attribute)
attribute | The target variable. |
An fs_relief object.
data(iris)
# Fit the selector on iris, then apply it to the data
myfeature <- daltoolbox::fit(fs_relief("Species"), iris)
data <- daltoolbox::transform(myfeature, iris)
head(data)
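The nearest-neighbor comparison described above can be sketched in base R as a basic Relief loop: for each sampled instance, a feature's weight is reduced by its difference to the nearest instance of the same class (the hit) and increased by its difference to the nearest instance of another class (the miss). This is a simplified variant for illustration, assuming numeric features, Manhattan distance, and a single nearest miss from any other class; it is not FSelector's implementation, and the helper name relief_weights is hypothetical.

```r
# Illustrative only: a basic Relief weight update for numeric features.
relief_weights <- function(x, classes, samples = 50) {
  x <- as.matrix(x)
  # Rescale each feature to [0, 1] so differences are comparable
  ranges <- apply(x, 2, function(col) diff(range(col)))
  x <- sweep(sweep(x, 2, apply(x, 2, min)), 2, ranges, "/")
  weights <- setNames(numeric(ncol(x)), colnames(x))
  for (i in sample.int(nrow(x), samples)) {
    # Manhattan distance from instance i to every row
    dist <- rowSums(abs(sweep(x, 2, x[i, ])))
    dist[i] <- Inf                                  # never match itself
    same <- classes == classes[i]
    hit <- which.min(ifelse(same, dist, Inf))       # nearest same-class row
    miss <- which.min(ifelse(same, Inf, dist))      # nearest other-class row
    weights <- weights -
      abs(x[i, ] - x[hit, ]) / samples +
      abs(x[i, ] - x[miss, ]) / samples
  }
  weights
}

set.seed(1)
data(iris)
relief_scores <- relief_weights(iris[, 1:4], iris$Species)
sort(relief_scores, decreasing = TRUE)
```

Weights fall between -1 and 1; features that separate classes well accumulate larger positive weights.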