Title: | Leveraging Experiment Lines to Data Analytics |
---|---|
Description: | The natural increase in the complexity of current research experiments and data demands better tools to enhance productivity in Data Analytics. The package is a framework designed to address the modern challenges in data analytics workflows. The package is inspired by Experiment Line concepts. It aims to provide seamless support for users in developing their data mining workflows by offering a uniform data model and method API. It enables the integration of various data mining activities, including data preprocessing, classification, regression, clustering, and time series prediction. It also offers options for hyper-parameter tuning and supports integration with existing libraries and languages. Overall, the package provides researchers with a comprehensive set of functionalities for data science, promoting ease of use, extensibility, and integration with various tools and libraries. Information on Experiment Line is based on Ogasawara et al. (2009) <doi:10.1007/978-3-642-02279-1_20>. |
Authors: | Eduardo Ogasawara [aut, ths, cre], Antonio Castro [aut, ctb], Heraldo Borges [aut, ths], Janio Lima [aut, ths], Lucas Tavares [aut, ths], Diego Carvalho [aut, ths], Eduardo Bezerra [aut, ths], Joel Santos [aut, ths], Rafaelli Coutinho [aut, ths], Federal Center for Technological Education of Rio de Janeiro (CEFET/RJ) [cph] |
Maintainer: | Eduardo Ogasawara <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.1.727 |
Built: | 2024-11-25 19:30:58 UTC |
Source: | CRAN |
Extracts a subset of a time series object based on specified rows and columns. The function allows for flexible indexing and subsetting of time series data.

## S3 method for class 'ts_data'
x[i, j, ...]

x | a ts_data object
i | row i
j | column j
... | optional arguments

returns a new ts_data object

data(sin_data)
data10 <- ts_data(sin_data$y, 10)
ts_head(data10)
# single line
data10[12,]
# range of lines
data10[12:13,]
# single column
data10[,1]
# range of columns
data10[,1:2]
# range of rows and columns
data10[12:13,1:2]
# single line and a range of columns
data10[12,1:2]
# range of lines and a single column
data10[12:13,1]
# single observation
data10[12,1]
Creates a deep learning adversarial autoencoder to encode a sequence of observations. It wraps the pytorch library.

aae_encode(input_size, encoding_size, batch_size = 350, num_epochs = 1000, learning_rate = 0.001)

input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate

returns an aae_encode object.

# See an example of using `aae_encode` at this link
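Since the example link above is incomplete, here is a minimal usage sketch, not taken from the package documentation: it assumes a configured pytorch backend (via reticulate) and follows the fit/transform pattern used by the other transforms in this reference.

# a minimal sketch, assuming a configured pytorch backend via reticulate
data(sin_data)
xts <- ts_data(sin_data$y, 10)                        # sliding windows of width 10
ae <- aae_encode(input_size = 10, encoding_size = 3)
ae <- fit(ae, xts)                                    # train the adversarial autoencoder
encoded <- transform(ae, xts)                         # encoded representation with 3 columns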
Creates a deep learning adversarial autoencoder to encode-decode a sequence of observations. It wraps the pytorch library.

aae_encode_decode(input_size, encoding_size, batch_size = 32, num_epochs = 1000, learning_rate = 0.001)

input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate

returns an aae_encode_decode object.

# See an example of using `aae_encode_decode` at this [link](https://github.com/cefet-rj-dal/daltoolbox/blob/main/transf/aae_enc_decode.ipynb)
Executes the action of the model on the provided data.

action(obj, ...)

obj | a dal_base object to apply the transformation on the input dataset
... | optional arguments

returns the result of the action of the model applied to the provided data

data(iris)
# an example is minmax normalization
trans <- minmax()
trans <- fit(trans, iris)
tiris <- action(trans, iris)
A default method that proxies the action call to the transform method.

## S3 method for class 'dal_transform'
action(obj, ...)

obj | object
... | optional arguments

returns the transformed data

# See ?minmax for an example of transformation
Converts a vector into a categorical mapping, where each category is represented by a specific value. By default, the values represent binary categories (true/false).

adjust_class_label(x, valTrue = 1, valFalse = 0)

x | vector to be categorized
valTrue | value to represent true
valFalse | value to represent false

returns an adjusted categorical mapping
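No example accompanies this entry; the following sketch mirrors how adjust_class_label is used in the classification examples later in this reference:

data(iris)
# one binary column per class level, with valTrue for the observed class and valFalse otherwise
y <- adjust_class_label(iris$Species)
head(y)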
Converts a dataset to a data.frame if it is not already in that format.

adjust_data.frame(data)

data | dataset

returns a data.frame

data(iris)
df <- adjust_data.frame(iris)
Converts a vector into a factor with specified levels and labels.

adjust_factor(value, ilevels, slevels)

value | vector to be converted into factor
ilevels | order for categorical values
slevels | labels for categorical values

returns an adjusted factor
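This entry ships without an example; the call below is a hypothetical sketch, assuming ilevels gives the ordering of the raw codes and slevels the corresponding labels:

# assuming raw 0/1 codes should become the labels "no"/"yes"
value <- c(0, 1, 1, 0)
f <- adjust_factor(value, ilevels = c(0, 1), slevels = c("no", "yes"))
f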
Converts a dataset to a matrix format if it is not already in that format.

adjust_matrix(data)

data | dataset

returns an adjusted matrix

data(iris)
mat <- adjust_matrix(iris)
Converts a dataset to a ts_data object.

adjust_ts_data(data)

data | dataset

returns an adjusted ts_data object
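No example accompanies this entry; a hypothetical sketch, assuming a plain matrix of sliding windows as input:

data(sin_data)
mat <- as.matrix(ts_data(sin_data$y, 10))  # plain matrix of sliding windows
xts <- adjust_ts_data(mat)                 # ensures the result is a ts_data object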
Creates a deep learning autoencoder to encode a sequence of observations. It wraps the pytorch and reticulate libraries.

autoenc_encode(input_size, encoding_size, batch_size = 32, num_epochs = 1000, learning_rate = 0.001)

input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate

returns an autoenc_encode object.

# See an example of using `autoenc_encode` at this [link](https://github.com/cefet-rj-dal/daltoolbox/blob/main/transf/van_encode.ipynb)
Creates a deep learning autoencoder to encode-decode a sequence of observations. It wraps the pytorch and reticulate libraries.

autoenc_encode_decode(input_size, encoding_size, batch_size = 32, num_epochs = 1000, learning_rate = 0.001)

input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate

returns an autoenc_encode_decode object.

# See an example of using `autoenc_encode_decode` at this [link](https://github.com/cefet-rj-dal/daltoolbox/blob/main/transf/van_enc_decode.ipynb)
Housing values in suburbs of Boston.

crim: per capita crime rate by town
zn: proportion of residential land zoned for lots over 25,000 sq.ft.
indus: proportion of non-retail business acres per town
chas: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
nox: nitric oxides concentration (parts per 10 million)
rm: average number of rooms per dwelling
age: proportion of owner-occupied units built prior to 1940
dis: weighted distances to five Boston employment centres
rad: index of accessibility to radial highways
tax: full-value property-tax rate per $10,000
ptratio: pupil-teacher ratio by town
black: 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
lstat: percentage of lower status of the population
medv: median value of owner-occupied homes in $1000's

data(Boston)

Regression dataset obtained from the MASS library.

Creator: Harrison, D. and Rubinfeld, D.L. Hedonic prices and the demand for clean air, J. Environ. Economics & Management, vol. 5, 81-102, 1978.

data(Boston)
head(Boston)
Creates a deep learning convolutional autoencoder to encode a sequence of observations. It wraps the pytorch library.

cae_encode(input_size, encoding_size, batch_size = 32, num_epochs = 1000, learning_rate = 0.001)

input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate

returns a cae_encode object.

# See an example of using `cae_encode` at this [link](https://github.com/cefet-rj-dal/daltoolbox/blob/main/transf/cae_encode.ipynb)
Creates a deep learning convolutional autoencoder to encode-decode a sequence of observations. It wraps the pytorch library.

cae_encode_decode(input_size, encoding_size, batch_size = 32, num_epochs = 1000, learning_rate = 0.001)

input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate

returns a cae_encode_decode object.

# See an example of using `cae_encode_decode` at this [link](https://github.com/cefet-rj-dal/daltoolbox/blob/main/transf/cae_enc_decode.ipynb)
Creates a deep learning convolutional autoencoder to encode-decode a sequence of observations. It wraps the pytorch library.

cae2d_encode_decode(input_size, encoding_size, batch_size = 32, num_epochs = 50, learning_rate = 0.001)

input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate

returns a cae2d_encode_decode object.

# See an example of using `cae2d_encode_decode` at this [link](https://github.com/cefet-rj-dal/daltoolbox/blob/main/transf/cae2d_enc_decode.ipynb)
Creates a deep learning convolutional denoising autoencoder to encode a sequence of observations. It wraps the pytorch library.

cae2den_encode(input_size, encoding_size, batch_size = 32, num_epochs = 50, learning_rate = 0.001)

input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate

returns a cae2den_encode object.

# See an example of using `cae2den_encode` at this [link](https://github.com/cefet-rj-dal/daltoolbox/blob/main/transf/cae2den_encode.ipynb)
Creates a deep learning convolutional denoising autoencoder to encode-decode a sequence of observations. It wraps the pytorch library.

cae2den_encode_decode(input_size, encoding_size, batch_size = 32, num_epochs = 50, learning_rate = 0.001)

input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate

returns a cae2den_encode_decode object.

# See an example of using `cae2den_encode_decode` at this [link](https://github.com/cefet-rj-dal/daltoolbox/blob/main/transf/cae2den_enc_decode.ipynb)
Categorical mapping provides a way to map the levels of a categorical variable to new values. Each possible value is converted to a binary attribute.

categ_mapping(attribute)

attribute | attribute to be categorized

returns a data frame with binary attributes, one for each possible category

cm <- categ_mapping("Species")
iris_cm <- transform(cm, iris)
# can be made in a single column
species <- iris[,"Species", drop=FALSE]
iris_cm <- transform(cm, species)
Creates a classification object that uses the Decision Tree algorithm for classification. It wraps the tree library.

cla_dtree(attribute, slevels)

attribute | attribute target to model building
slevels | the possible values for the target classification

returns a classification object

data(iris)
slevels <- levels(iris$Species)
model <- cla_dtree("Species", slevels)
# preparing dataset for random sampling
sr <- sample_random()
sr <- train_test(sr, iris)
train <- sr$train
test <- sr$test
model <- fit(model, train)
prediction <- predict(model, test)
predictand <- adjust_class_label(test[,"Species"])
test_eval <- evaluate(model, predictand, prediction)
test_eval$metrics
Classifies using the K-Nearest Neighbor algorithm. It wraps the class library.

cla_knn(attribute, slevels, k = 1)

attribute | attribute target to model building
slevels | possible values for the target classification
k | a vector of integers indicating the number of neighbors to be considered

returns a knn object.

data(iris)
slevels <- levels(iris$Species)
model <- cla_knn("Species", slevels, k=3)
# preparing dataset for random sampling
sr <- sample_random()
sr <- train_test(sr, iris)
train <- sr$train
test <- sr$test
model <- fit(model, train)
prediction <- predict(model, test)
predictand <- adjust_class_label(test[,"Species"])
test_eval <- evaluate(model, predictand, prediction)
test_eval$metrics
This function creates a classification object that uses the majority vote strategy to predict the target attribute. Given a target attribute, the function counts the number of occurrences of each value in the dataset and selects the one that appears most often.

cla_majority(attribute, slevels)

attribute | attribute target to model building
slevels | possible values for the target classification

returns a classification object.

data(iris)
slevels <- levels(iris$Species)
model <- cla_majority("Species", slevels)
# preparing dataset for random sampling
sr <- sample_random()
sr <- train_test(sr, iris)
train <- sr$train
test <- sr$test
model <- fit(model, train)
prediction <- predict(model, test)
predictand <- adjust_class_label(test[,"Species"])
test_eval <- evaluate(model, predictand, prediction)
test_eval$metrics
Creates a classification object that uses the Multi-Layer Perceptron (MLP) method. It wraps the nnet library.

cla_mlp(attribute, slevels, size = NULL, decay = 0.1, maxit = 1000)

attribute | attribute target to model building
slevels | possible values for the target classification
size | number of nodes that will be used in the hidden layer
decay | weight decay parameter that regularizes gradient descent
maxit | maximum iterations

returns a classification object

data(iris)
slevels <- levels(iris$Species)
model <- cla_mlp("Species", slevels, size=3, decay=0.03)
# preparing dataset for random sampling
sr <- sample_random()
sr <- train_test(sr, iris)
train <- sr$train
test <- sr$test
model <- fit(model, train)
prediction <- predict(model, test)
predictand <- adjust_class_label(test[,"Species"])
test_eval <- evaluate(model, predictand, prediction)
test_eval$metrics
Classification using the Naive Bayes algorithm. It wraps the e1071 library.

cla_nb(attribute, slevels)

attribute | attribute target to model building
slevels | possible values for the target classification

returns a classification object.

data(iris)
slevels <- levels(iris$Species)
model <- cla_nb("Species", slevels)
# preparing dataset for random sampling
sr <- sample_random()
sr <- train_test(sr, iris)
train <- sr$train
test <- sr$test
model <- fit(model, train)
prediction <- predict(model, test)
predictand <- adjust_class_label(test[,"Species"])
test_eval <- evaluate(model, predictand, prediction)
test_eval$metrics
Creates a classification object that uses the Random Forest method. It wraps the randomForest library.

cla_rf(attribute, slevels, nodesize = 5, ntree = 10, mtry = NULL)

attribute | attribute target to model building
slevels | possible values for the target classification
nodesize | node size
ntree | number of trees
mtry | number of attributes randomly sampled as candidates at each split

returns a classification object

data(iris)
slevels <- levels(iris$Species)
model <- cla_rf("Species", slevels, ntree=5)
# preparing dataset for random sampling
sr <- sample_random()
sr <- train_test(sr, iris)
train <- sr$train
test <- sr$test
model <- fit(model, train)
prediction <- predict(model, test)
predictand <- adjust_class_label(test[,"Species"])
test_eval <- evaluate(model, predictand, prediction)
test_eval$metrics
Creates a classification object that uses the Support Vector Machine (SVM) method for classification. It wraps the svm function from the e1071 library.

cla_svm(attribute, slevels, epsilon = 0.1, cost = 10, kernel = "radial")

attribute | attribute target to model building
slevels | possible values for the target classification
epsilon | parameter that controls the width of the margin around the separating hyperplane
cost | parameter that controls the trade-off between having a wide margin and correctly classifying training data points
kernel | the type of kernel function to be used in the SVM algorithm (linear, radial, polynomial, sigmoid)

returns a SVM classification object

data(iris)
slevels <- levels(iris$Species)
model <- cla_svm("Species", slevels, epsilon=0.0, cost=20.000)
# preparing dataset for random sampling
sr <- sample_random()
sr <- train_test(sr, iris)
train <- sr$train
test <- sr$test
model <- fit(model, train)
prediction <- predict(model, test)
predictand <- adjust_class_label(test[,"Species"])
test_eval <- evaluate(model, predictand, prediction)
test_eval$metrics
This function performs a grid search or random search over specified hyperparameter values to optimize a base classification model.

cla_tune(base_model, folds = 10, metric = "accuracy")

base_model | base model for tuning
folds | number of folds for cross-validation
metric | metric used to optimize

returns a cla_tune object

# preparing dataset for random sampling
sr <- sample_random()
sr <- train_test(sr, iris)
train <- sr$train
test <- sr$test
# hyper parameter setup
tune <- cla_tune(cla_mlp("Species", levels(iris$Species)))
ranges <- list(size=c(3:5), decay=c(0.1))
# hyper parameter optimization
model <- fit(tune, train, ranges)
# testing optimization
test_prediction <- predict(model, test)
test_predictand <- adjust_class_label(test[,"Species"])
test_eval <- evaluate(model, test_predictand, test_prediction)
test_eval$metrics
Ancestor class for classification problems. It wraps the MLmetrics and nnet libraries.

classification(attribute, slevels)

attribute | attribute target to model building
slevels | possible values for the target classification

returns a classification object

# See ?cla_dtree for a classification example using a decision tree
Creates an object for tuning clustering models. This object can be used to fit and optimize clustering algorithms by specifying hyperparameter ranges.

clu_tune(base_model)

base_model | base model for tuning

returns a clu_tune object.

data(iris)
# fit model
model <- clu_tune(cluster_kmeans(k = 0))
ranges <- list(k = 1:10)
model <- fit(model, iris[,1:4], ranges)
model$k
Defines a cluster method.

cluster(obj, ...)

obj | a clusterer object
... | optional arguments

returns clustered data

# See ?cluster_kmeans for an example of transformation
Creates a clusterer object that uses the DBSCAN method. It wraps the dbscan library.

cluster_dbscan(minPts = 3, eps = NULL)

minPts | minimum number of points
eps | distance value

returns a dbscan object

# setup clustering
model <- cluster_dbscan(minPts = 3)
# load dataset
data(iris)
# build model
model <- fit(model, iris[,1:4])
clu <- cluster(model, iris[,1:4])
table(clu)
# evaluate model using external metric
eval <- evaluate(model, clu, iris$Species)
eval
Creates a clusterer object that uses the k-means method. It wraps the stats library.

cluster_kmeans(k = 1)

k | the number of clusters to form

returns a k-means object.

# setup clustering
model <- cluster_kmeans(k=3)
# load dataset
data(iris)
# build model
model <- fit(model, iris[,1:4])
clu <- cluster(model, iris[,1:4])
table(clu)
# evaluate model using external metric
eval <- evaluate(model, clu, iris$Species)
eval
Creates a clusterer object that uses the Partitioning Around Medoids (PAM) method. It wraps the cluster library.

cluster_pam(k = 1)

k | the number of clusters to generate

returns a PAM object.

# setup clustering
model <- cluster_pam(k = 3)
# load dataset
data(iris)
# build model
model <- fit(model, iris[,1:4])
clu <- cluster(model, iris[,1:4])
table(clu)
# evaluate model using external metric
eval <- evaluate(model, clu, iris$Species)
eval
Ancestor class for clustering problems.

clusterer()

returns a clusterer object

# See ?cluster_kmeans for an example of transformation
The dal_base class is an abstract class for all DAL descendant classes. It provides both fit() and action() functions.

dal_base()

returns a dal_base object

trans <- dal_base()
An ancestor class for clustering, classification, regression, and time series regression. It also provides the basis for specialized evaluation of learning performance. An example of a learner is a decision tree (cla_dtree).

dal_learner()

returns a learner

# See ?cla_dtree for a classification example using a decision tree
A transformation method applied to a dataset. If needed, fit can be called to adjust the transform.

dal_transform()

returns a dal_transform object.

# See ?minmax for an example of transformation
Creates an ancestor class for hyperparameter optimization, allowing the tuning of a base model using cross-validation.

dal_tune(base_model, folds = 10)

base_model | base model for tuning
folds | number of folds for cross-validation

returns a dal_tune object

# See ?cla_tune for classification tuning
# See ?reg_tune for regression tuning
# See ?ts_tune for time series tuning
The data_sample function is used to randomly sample data from a given data frame. It can be used to obtain a subset of data for further analysis or modeling. Two basic specializations of data_sample are sample_random and sample_stratified, which provide random sampling and stratified sampling, respectively. Data sample provides both training and testing partitioning (train_test) and k-fold partitioning (k_fold) of data.

data_sample()

returns an object of class data_sample

# using random sampling
sample <- sample_random()
tt <- train_test(sample, iris)
# distribution of train
table(tt$train$Species)
# preparing dataset into four folds
folds <- k_fold(sample, iris, 4)
# distribution of folds
tbl <- NULL
for (f in folds) {
  tbl <- rbind(tbl, table(f$Species))
}
head(tbl)
Creates a deep learning denoising autoencoder to encode a sequence of observations. It wraps the pytorch library.

dns_encode(input_size, encoding_size, batch_size = 32, num_epochs = 1000, learning_rate = 0.001, noise_factor = 0.3)

input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate
noise_factor | level of noise to be added to the data

returns a dns_encode object.

# See example at https://nbviewer.org/github/cefet-rj-dal/daltoolbox-examples
Creates a deep learning denoising autoencoder to encode-decode a sequence of observations. It wraps the pytorch library.

dns_encode_decode(input_size, encoding_size, batch_size = 32, num_epochs = 1000, learning_rate = 0.001, noise_factor = 0.3)

input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate
noise_factor | level of noise to be added to the data

returns a dns_encode_decode object.

# See example at https://nbviewer.org/github/cefet-rj-dal/daltoolbox-examples
The actual time series model fitting. This method should be overridden by descendants.

do_fit(obj, x, y = NULL)

obj | an object representing the model or algorithm to be fitted
x | a matrix or data.frame containing the input features for training the model
y | a vector or matrix containing the output values to be predicted by the model

returns a fitted object
The actual time series model prediction. This method should be overridden by descendants.

do_predict(obj, x)

obj | an object representing the fitted model or algorithm
x | a matrix or data.frame containing the input features for making predictions

returns the predicted values
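As an illustration only (not part of the package), a descendant class might override these two generics along the following lines; my_model and its fields are hypothetical:

# hypothetical descendant overriding do_fit/do_predict (illustrative sketch)
do_fit.my_model <- function(obj, x, y = NULL) {
  df <- cbind(as.data.frame(x), .y = y)
  obj$model <- stats::lm(.y ~ ., data = df)   # fit any internal model here
  obj
}
do_predict.my_model <- function(obj, x) {
  stats::predict(obj$model, newdata = as.data.frame(x))
}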
PCA (Principal Component Analysis) is an unsupervised dimensionality reduction technique used in data analysis and machine learning. It transforms a dataset of possibly correlated variables into a new set of uncorrelated variables called principal components.

dt_pca(attribute = NULL, components = NULL)

attribute | target attribute to model building
components | number of components for PCA

returns an object of class dt_pca

mypca <- dt_pca("Species")
# Automatically fitting number of components
mypca <- fit(mypca, iris)
iris.pca <- transform(mypca, iris)
head(iris.pca)
head(mypca$pca.transf)
# Manual establishment of number of components
mypca <- dt_pca("Species", 3)
mypca <- fit(mypca, datasets::iris)
iris.pca <- transform(mypca, iris)
head(iris.pca)
head(mypca$pca.transf)
Evaluates learner performance. The actual evaluation varies according to the type of learner (clustering, classification, regression, time series regression).

evaluate(obj, ...)

obj | object
... | optional arguments

returns the evaluation

data(iris)
slevels <- levels(iris$Species)
model <- cla_dtree("Species", slevels)
model <- fit(model, iris)
prediction <- predict(model, iris)
predictand <- adjust_class_label(iris[,"Species"])
test_eval <- evaluate(model, predictand, prediction)
test_eval$metrics
Applies the fit method to a model object to train or configure it using the provided data and optional arguments.

fit(obj, ...)

obj | object
... | optional arguments

returns the object after fitting

data(iris)
# an example is minmax normalization
trans <- minmax()
trans <- fit(trans, iris)
tiris <- action(trans, iris)
Fits a curvature model to a sequence of observations and extracts the maximum curvature computed.

fit_curvature_max()

returns an object of class fit_curvature_max, which inherits from the fit_curvature and dal_transform classes. The object contains a list with the following elements:

x: the position in which the maximum curvature is reached
y: the value where the maximum curvature occurs
yfit: the value of the maximum curvature

x <- seq(from=1, to=10, by=0.5)
dat <- data.frame(x = x, value = -log(x), variable = "log")
myfit <- fit_curvature_max()
res <- transform(myfit, dat$value)
head(res)
Fits a curvature model to a sequence of observations and extracts the minimum curvature computed.

fit_curvature_min()

returns an object of class fit_curvature_min, which inherits from the fit_curvature and dal_transform classes. The object contains a list with the following elements:

x: the position in which the minimum curvature is reached
y: the value where the minimum curvature occurs
yfit: the value of the minimum curvature

x <- seq(from=1, to=10, by=0.5)
dat <- data.frame(x = x, value = log(x), variable = "log")
myfit <- fit_curvature_min()
res <- transform(myfit, dat$value)
head(res)
Tunes the hyperparameters of a machine learning model for classification.

## S3 method for class 'cla_tune'
fit(obj, data, ranges, ...)

obj | an object containing the model and tuning configuration
data | the dataset used for training and evaluation
ranges | a list of hyperparameter ranges to explore
... | optional arguments

returns a fitted object
Fits a DBSCAN clustering model by setting the eps parameter. If eps is not provided, it is estimated based on the k-nearest neighbor distances. It wraps the dbscan library.

## S3 method for class 'cluster_dbscan'
fit(obj, data, ...)

obj | an object containing the DBSCAN model configuration, including minPts and (optionally) eps
data | the dataset to use for fitting the model
... | optional arguments

returns a fitted object with the eps parameter set
Reverses the transformation applied to data.

inverse_transform(obj, ...)

obj | a dal_transform object
... | optional arguments

returns the inverse-transformed dataset

# See ?minmax for an example of transformation
k-fold partitioning of a dataset using a sampling method.

k_fold(obj, data, k)

obj | an object representing the sampling method
data | dataset to be partitioned
k | number of folds

returns a list of k data frames

# using random sampling
sample <- sample_random()
# preparing dataset into four folds
folds <- k_fold(sample, iris, 4)
# distribution of folds
tbl <- NULL
for (f in folds) {
  tbl <- rbind(tbl, table(f$Species))
}
head(tbl)
Creates a deep learning LSTM autoencoder to encode a sequence of observations. It wraps the pytorch library.

lae_encode(input_size, encoding_size, batch_size = 32, num_epochs = 50, learning_rate = 0.001)

input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate

returns a lae_encode object.

# See an example of using `lae_encode` at this [link](https://github.com/cefet-rj-dal/daltoolbox/blob/main/transf/lae_encode.ipynb)
Creates a deep learning LSTM autoencoder to encode-decode a sequence of observations. It wraps the pytorch library.

lae_encode_decode(input_size, encoding_size, batch_size = 32, num_epochs = 50, learning_rate = 0.001)

input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate

returns a lae_encode_decode object.

# See an example of using `lae_encode_decode` at this [link](https://github.com/cefet-rj-dal/daltoolbox/blob/main/transf/lae_enc_decode.ipynb)
The minmax transformation scales data to the range [0, 1].

minmax()

returns an object of class minmax

data(iris)
head(iris)
trans <- minmax()
trans <- fit(trans, iris)
tiris <- transform(trans, iris)
head(tiris)
itiris <- inverse_transform(trans, tiris)
head(itiris)
Computes the mean squared error (MSE) between actual values and forecasts of a time series.

MSE.ts(actual, prediction)

actual | real observations
prediction | predicted observations

returns a number, which is the calculated MSE
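No example accompanies this entry; a small worked sketch with hypothetical vectors:

actual <- c(1.0, 2.0, 3.0)
prediction <- c(1.1, 1.9, 3.2)
# mean of squared differences: (0.1^2 + 0.1^2 + 0.2^2) / 3 = 0.02
MSE.ts(actual, prediction)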
The outliers class uses the box-plot definition for outliers. An outlier is a value below Q1 - alpha * IQR or above Q3 + alpha * IQR, where Q1 and Q3 are the first and third quartiles and IQR = Q3 - Q1. The class removes outliers for numeric attributes. Users can set alpha to 3 to remove only extreme values.

outliers(alpha = 1.5)

alpha | boxplot outlier threshold (default 1.5, but can be 3.0 to remove extreme values)

returns an outlier object

# code for outlier removal
out_obj <- outliers() # class for outlier analysis
out_obj <- fit(out_obj, iris) # computing boundaries
iris.clean <- transform(out_obj, iris) # returning cleaned dataset
# inspection of cleaned dataset
nrow(iris.clean)
idx <- attr(iris.clean, "idx")
table(idx)
iris.outliers <- iris[idx,]
iris.outliers
This function displays a bar graph from a data frame containing x-axis categories using ggplot2.

plot_bar(data, label_x = "", label_y = "", colors = NULL, alpha = 1)

data | data.frame containing x, value, and variable
label_x | x-axis label
label_y | y-axis label
colors | color vector
alpha | level of transparency

returns a ggplot graphic

# summarizing iris dataset
data <- iris |> dplyr::group_by(Species) |> dplyr::summarize(Sepal.Length=mean(Sepal.Length))
head(data)
# plotting data
grf <- plot_bar(data, colors="blue")
plot(grf)
This function displays a boxplot graph from a data frame containing x-axis categories and numeric values using ggplot2.

plot_boxplot(data, label_x = "", label_y = "", colors = NULL, barwidth = 0.25)

data | data.frame containing x, value, and variable
label_x | x-axis label
label_y | y-axis label
colors | color vector
barwidth | width of bar

returns a ggplot graphic

grf <- plot_boxplot(iris, colors="white")
plot(grf)
This function generates boxplots grouped by a specified class label from a data frame containing numeric values using ggplot2.

plot_boxplot_class(data, class_label, label_x = "", label_y = "", colors = NULL)

data | data.frame containing x, value, and variable
class_label | name of attribute for class label
label_x | x-axis label
label_y | y-axis label
colors | color vector

returns a ggplot graphic

grf <- plot_boxplot_class(iris |> dplyr::select(Sepal.Width, Species),
                          class_label = "Species", colors=c("red", "green", "blue"))
plot(grf)
This function generates a density plot from a data frame containing numeric values using ggplot2. If the data frame has multiple columns, densities can be grouped and plotted.

plot_density(data, label_x = "", label_y = "", colors = NULL, bin = NULL, alpha = 0.25)

data | data.frame containing x, value, and variable
label_x | x-axis label
label_y | y-axis label
colors | color vector
bin | bin width for density estimation
alpha | level of transparency

returns a ggplot graphic

grf <- plot_density(iris |> dplyr::select(Sepal.Width), colors="blue")
plot(grf)
This function generates density plots using ggplot2 grouped by a specified class label from a data frame containing numeric values.

plot_density_class(data, class_label, label_x = "", label_y = "", colors = NULL, bin = NULL, alpha = 0.5)

data | data.frame containing x, value, and variable
class_label | name of attribute for class label
label_x | x-axis label
label_y | y-axis label
colors | color vector
bin | bin width for density estimation
alpha | level of transparency

returns a ggplot graphic

grf <- plot_density_class(iris |> dplyr::select(Sepal.Width, Species),
                          class_label = "Species", colors=c("red", "green", "blue"))
plot(grf)
This function generates a grouped bar plot from a given data frame using ggplot2.

plot_groupedbar(data, label_x = "", label_y = "", colors = NULL, alpha = 1)

data | data.frame containing x, value, and variable
label_x | x-axis label
label_y | y-axis label
colors | color vector
alpha | level of transparency

returns a ggplot graphic

# summarizing iris dataset
data <- iris |> dplyr::group_by(Species) |>
  dplyr::summarize(Sepal.Length=mean(Sepal.Length), Sepal.Width=mean(Sepal.Width))
head(data)
# plotting data
grf <- plot_groupedbar(data, colors=c("blue", "red"))
plot(grf)
This function generates a histogram from a specified data frame using ggplot2.

plot_hist(data, label_x = "", label_y = "", color = "white", alpha = 0.25)

data | data.frame containing x, value, and variable
label_x | x-axis label
label_y | y-axis label
color | color vector
alpha | transparency level

returns a ggplot graphic

grf <- plot_hist(iris |> dplyr::select(Sepal.Width), color=c("blue"))
plot(grf)
This function creates a lollipop chart using ggplot2.

plot_lollipop(data, label_x = "", label_y = "", colors = NULL, color_text = "black", size_text = 3, size_ball = 8, alpha_ball = 0.2, min_value = 0, max_value_gap = 1)

data | data.frame containing x, value, and variable
label_x | x-axis label
label_y | y-axis label
colors | color vector
color_text | color of text inside ball
size_text | size of text inside ball
size_ball | size of ball
alpha_ball | transparency of ball
min_value | minimum value
max_value_gap | maximum value gap

returns a ggplot graphic

# summarizing iris dataset
data <- iris |> dplyr::group_by(Species) |> dplyr::summarize(Sepal.Length=mean(Sepal.Length))
head(data)
# plotting data
grf <- plot_lollipop(data, colors="blue", max_value_gap=0.2)
plot(grf)
This function creates a pie chart using ggplot2.

plot_pieplot(data, label_x = "", label_y = "", colors = NULL, textcolor = "white", bordercolor = "black")

data | data.frame containing x, value, and variable
label_x | x-axis label
label_y | y-axis label
colors | color vector
textcolor | text color
bordercolor | border color

returns a ggplot graphic

# summarizing iris dataset
data <- iris |> dplyr::group_by(Species) |> dplyr::summarize(Sepal.Length=mean(Sepal.Length))
head(data)
# plotting data
grf <- plot_pieplot(data, colors=c("red", "green", "blue"))
plot(grf)
This function creates a scatter plot using ggplot2.

plot_points(data, label_x = "", label_y = "", colors = NULL)

data | data.frame containing x, value, and variable
label_x | x-axis label
label_y | y-axis label
colors | color vector

returns a ggplot graphic

x <- seq(0, 10, 0.25)
data <- data.frame(x, sin=sin(x), cosine=cos(x)+5)
head(data)
grf <- plot_points(data, colors=c("red", "green"))
plot(grf)
This function creates a radar chart using ggplot2.

plot_radar(data, label_x = "", label_y = "", colors = NULL)

data | data.frame containing x, value, and variable
label_x | x-axis label
label_y | y-axis label
colors | color vector

returns a ggplot graphic

data <- data.frame(name = "Petal.Length", value = mean(iris$Petal.Length))
data <- rbind(data, data.frame(name = "Petal.Width", value = mean(iris$Petal.Width)))
data <- rbind(data, data.frame(name = "Sepal.Length", value = mean(iris$Sepal.Length)))
data <- rbind(data, data.frame(name = "Sepal.Width", value = mean(iris$Sepal.Width)))
grf <- plot_radar(data, colors="red") + ggplot2::ylim(0, NA)
plot(grf)
This function creates a scatter plot using ggplot2.

plot_scatter(data, label_x = "", label_y = "", colors = NULL)

data | data.frame containing x, value, and variable
label_x | x-axis label
label_y | y-axis label
colors | color vector

returns a ggplot graphic

grf <- plot_scatter(iris |> dplyr::select(x = Sepal.Length, value = Sepal.Width, variable = Species),
                    label_x = "Sepal.Length", label_y = "Sepal.Width",
                    colors=c("red", "green", "blue"))
plot(grf)
This function creates a time series plot using ggplot2.

plot_series(data, label_x = "", label_y = "", colors = NULL)

data | data.frame containing x, value, and variable
label_x | x-axis label
label_y | y-axis label
colors | color vector

returns a ggplot graphic

x <- seq(0, 10, 0.25)
data <- data.frame(x, sin=sin(x))
head(data)
grf <- plot_series(data, colors=c("red"))
plot(grf)
This function creates a stacked bar chart using ggplot2.

plot_stackedbar(data, label_x = "", label_y = "", colors = NULL, alpha = 1)

data | data.frame containing x, value, and variable
label_x | x-axis label
label_y | y-axis label
colors | color vector
alpha | level of transparency

returns a ggplot graphic

# summarizing iris dataset
data <- iris |> dplyr::group_by(Species) |>
  dplyr::summarize(Sepal.Length=mean(Sepal.Length), Sepal.Width=mean(Sepal.Width))
# plotting data
grf <- plot_stackedbar(data, colors=c("blue", "red"))
plot(grf)
This function plots a time series chart with points and a line using ggplot2.

plot_ts(x = NULL, y, label_x = "", label_y = "", color = "black")

x | input variable
y | output variable
label_x | x-axis label
label_y | y-axis label
color | color for time series

returns a ggplot graphic

x <- seq(0, 10, 0.25)
data <- data.frame(x, sin=sin(x))
head(data)
grf <- plot_ts(x = data$x, y = data$sin, color=c("red"))
plot(grf)
This function plots a time series chart with three lines: the original series, the adjusted series, and the predicted series using ggplot2.

plot_ts_pred(x = NULL, y, yadj, ypred = NULL, label_x = "", label_y = "", color = "black", color_adjust = "blue", color_prediction = "green")

x | time index
y | time series
yadj | adjustment of time series
ypred | prediction of the time series
label_x | x-axis title
label_y | y-axis title
color | color for the time series
color_adjust | color for the adjusted values
color_prediction | color for the predictions

returns a ggplot graphic

data(sin_data)
ts <- ts_data(sin_data$y, 0)
ts_head(ts, 3)
samp <- ts_sample(ts, test_size = 5)
io_train <- ts_projection(samp$train)
io_test <- ts_projection(samp$test)
model <- ts_arima()
model <- fit(model, x=io_train$input, y=io_train$output)
adjust <- predict(model, io_train$input)
prediction <- predict(model, x=io_test$input, steps_ahead=5)
prediction <- as.vector(prediction)
yvalues <- c(io_train$output, io_test$output)
grf <- plot_ts_pred(y=yvalues, yadj=adjust, ypred=prediction)
plot(grf)
Ancestor class for regression and classification. It provides the basis for fit and predict methods; the action method proxies to predict. An example of a learner is a decision tree (cla_dtree).

predictor()

returns a predictor object

# See ?cla_dtree for a classification example using a decision tree
Computes the R-squared (R2) between actual values and forecasts of a time series.

R2.ts(actual, prediction)

actual | real observations
prediction | predicted observations

returns a number, which is the calculated R2
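No example accompanies this entry; a small worked sketch with hypothetical vectors, assuming the standard definition R2 = 1 - SSE/SST:

actual <- c(1.0, 2.0, 3.0)
prediction <- c(1.1, 1.9, 3.2)
# SSE = 0.06 and SST = 2, so R2 = 1 - 0.06/2 = 0.97
R2.ts(actual, prediction)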
Creates a regression object that uses the Decision Tree method for regression. It wraps the tree library.

reg_dtree(attribute)

attribute | attribute target to model building

returns a decision tree regression object

data(Boston)
model <- reg_dtree("medv")
# preparing dataset for random sampling
sr <- sample_random()
sr <- train_test(sr, Boston)
train <- sr$train
test <- sr$test
model <- fit(model, train)
test_prediction <- predict(model, test)
test_predictand <- test[,"medv"]
test_eval <- evaluate(model, test_predictand, test_prediction)
test_eval$metrics
Creates a regression object that uses the K-Nearest Neighbors (KNN) method for regression.
reg_knn(attribute, k)
attribute | target attribute for model building
k | number of neighbors
returns a knn regression object
data(Boston)
model <- reg_knn("medv", k = 3)
# preparing dataset for random sampling
sr <- sample_random()
sr <- train_test(sr, Boston)
train <- sr$train
test <- sr$test
model <- fit(model, train)
test_prediction <- predict(model, test)
test_predictand <- test[,"medv"]
test_eval <- evaluate(model, test_predictand, test_prediction)
test_eval$metrics
Creates a regression object that uses the Multi-Layer Perceptron (MLP) method. It wraps the nnet library.
reg_mlp(attribute, size = NULL, decay = 0.05, maxit = 1000)
attribute | target attribute for model building
size | number of neurons in the hidden layer
decay | weight decay parameter
maxit | maximum number of iterations for training
returns an object of class reg_mlp
data(Boston)
model <- reg_mlp("medv", size = 5, decay = 0.54)
# preparing dataset for random sampling
sr <- sample_random()
sr <- train_test(sr, Boston)
train <- sr$train
test <- sr$test
model <- fit(model, train)
test_prediction <- predict(model, test)
test_predictand <- test[,"medv"]
test_eval <- evaluate(model, test_predictand, test_prediction)
test_eval$metrics
Creates a regression object that uses the Random Forest method. It wraps the randomForest library.
reg_rf(attribute, nodesize = 1, ntree = 10, mtry = NULL)
attribute | target attribute for model building
nodesize | node size
ntree | number of trees
mtry | number of attributes used to build each tree
returns an object of class reg_rf
data(Boston)
model <- reg_rf("medv", ntree = 10)
# preparing dataset for random sampling
sr <- sample_random()
sr <- train_test(sr, Boston)
train <- sr$train
test <- sr$test
model <- fit(model, train)
test_prediction <- predict(model, test)
test_predictand <- test[,"medv"]
test_eval <- evaluate(model, test_predictand, test_prediction)
test_eval$metrics
Creates a regression object that uses the Support Vector Machine (SVM) method for regression. It wraps the svm function from the e1071 library.
reg_svm(attribute, epsilon = 0.1, cost = 10, kernel = "radial")
attribute | target attribute for model building
epsilon | parameter that controls the width of the epsilon-insensitive margin around the regression function
cost | parameter that controls the trade-off between model simplicity and tolerance of errors on the training data
kernel | the type of kernel function used by the SVM algorithm (linear, radial, polynomial, sigmoid)
returns a SVM regression object
data(Boston)
model <- reg_svm("medv", epsilon = 0.2, cost = 40)
# preparing dataset for random sampling
sr <- sample_random()
sr <- train_test(sr, Boston)
train <- sr$train
test <- sr$test
model <- fit(model, train)
test_prediction <- predict(model, test)
test_predictand <- test[,"medv"]
test_eval <- evaluate(model, test_predictand, test_prediction)
test_eval$metrics
Creates an object for tuning regression models
reg_tune(base_model, folds = 10)
base_model | base model for tuning
folds | number of folds for cross-validation
returns a reg_tune object.
# preparing dataset for random sampling
data(Boston)
sr <- sample_random()
sr <- train_test(sr, Boston)
train <- sr$train
test <- sr$test
# hyperparameter setup
tune <- reg_tune(reg_mlp("medv"))
ranges <- list(size = c(3), decay = c(0.1, 0.5))
# hyperparameter optimization
model <- fit(tune, train, ranges)
test_prediction <- predict(model, test)
test_predictand <- test[,"medv"]
test_eval <- evaluate(model, test_predictand, test_prediction)
test_eval$metrics
Ancestor class for regression problems. This ancestor class is used to define and manage the target attribute for regression tasks.
regression(attribute)
attribute | target attribute for model building
returns a regression object
#See ?reg_dtree for a regression example using a decision tree
Creates a deep learning stacked autoencoder to encode a sequence of observations. The autoencoder layers are based on the DAL Toolbox vanilla autoencoder. It wraps the pytorch library.
sae_encode(
  input_size,
  encoding_size,
  batch_size = 32,
  num_epochs = 1000,
  learning_rate = 0.001,
  k = 3
)
input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate
k | number of AE layers in the stack
returns a sae_encode object.
#See example at https://nbviewer.org/github/cefet-rj-dal/daltoolbox-examples
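For a quick local sketch in addition to the notebook above: the following assumes a working pytorch backend and the fit/transform pattern used by the other transforms in this manual; num_epochs is reduced purely for speed.
data(sin_data)
# sliding windows of size 5 are the encoder input
dat <- as.data.frame(ts_data(sin_data$y, 5))
enc <- sae_encode(input_size = 5, encoding_size = 3, num_epochs = 100)
enc <- fit(enc, dat)
# the encoded representation has encoding_size (3) columns
encoded <- transform(enc, dat)
head(encoded)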
Creates a deep learning stacked autoencoder to encode and decode a sequence of observations. The autoencoder layers are based on the DAL Toolbox vanilla autoencoder. It wraps the pytorch library.
sae_encode_decode(
  input_size,
  encoding_size,
  batch_size = 32,
  num_epochs = 1000,
  learning_rate = 0.001,
  k = 3
)
input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate
k | number of AE layers in the stack
returns a sae_encode_decode object.
#See example at https://nbviewer.org/github/cefet-rj-dal/daltoolbox-examples
The sample_random function in R is used to generate a random sample of specified size from a given data set.
sample_random()
returns an object of class sample_random
# using random sampling
sample <- sample_random()
tt <- train_test(sample, iris)
# distribution of train
table(tt$train$Species)
# preparing dataset into four folds
folds <- k_fold(sample, iris, 4)
# distribution of folds
tbl <- NULL
for (f in folds) {
  tbl <- rbind(tbl, table(f$Species))
}
head(tbl)
The sample_stratified function generates a stratified random sample from a given dataset. Stratified sampling is used when the population is divided into non-overlapping subgroups (strata) and a sample is selected from each stratum, so that the overall sample is representative of the entire population while the variability within each stratum is minimized.
sample_stratified(attribute)
attribute | attribute used to define the strata
returns an object of class sample_stratified
# using stratified sampling
sample <- sample_stratified("Species")
tt <- train_test(sample, iris)
# distribution of train
table(tt$train$Species)
# preparing dataset into four folds
folds <- k_fold(sample, iris, 4)
# distribution of folds
tbl <- NULL
for (f in folds) {
  tbl <- rbind(tbl, table(f$Species))
}
head(tbl)
Selects the optimal hyperparameters from a dataset resulting from k-fold cross-validation
select_hyper(obj, hyperparameters)
obj | the object or model used for hyperparameter selection
hyperparameters | dataset with the hyperparameter configurations and the quality measure from their execution
returns the index of the selected hyperparameter configuration
Selects the optimal hyperparameter by maximizing the average classification metric. It wraps the dplyr library.
## S3 method for class 'cla_tune' select_hyper(obj, hyperparameters)
obj | an object representing the model or tuning process
hyperparameters | a dataframe with the evaluated hyperparameter configurations and their classification metric
returns the key (index) of the optimal hyperparameter configuration
Identifies the optimal hyperparameters by minimizing the error from a dataset of hyperparameters. The function selects the hyperparameter configuration that results in the lowest average error. It wraps the dplyr library.
## S3 method for class 'ts_tune' select_hyper(obj, hyperparameters)
obj | a ts_tune object
hyperparameters | hyperparameters dataset
returns the key (index) of the optimal hyperparameter configuration
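To picture the selection logic with a toy table (the column names here are illustrative, not the package's internal schema), the winning configuration is simply the row with the lowest cross-validated error:
# one row per evaluated configuration, with its average error
hyperparameters <- data.frame(key = 1:3,
                              nhid = c(1, 3, 5),
                              error = c(0.42, 0.31, 0.38))
# the selected key is the one minimizing the error (key 2 here)
hyperparameters$key[which.min(hyperparameters$error)]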
The set_params function assigns the given parameters to the corresponding attributes of the object.
set_params(obj, params)
obj | object of class dal_base
params | parameters to set in obj
returns an object with parameters set
obj <- set_params(dal_base(), list(x = 0))
Default method for set_params, which returns the object unchanged.
## Default S3 method: set_params(obj, params)
obj | object
params | parameters
returns the object unchanged
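For instance, an object without a specific set_params method falls through to this default and comes back untouched (a minimal sketch):
# a plain list has no dedicated set_params method, so it is returned as-is
obj <- set_params(list(a = 1), list(x = 0))
obj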
Synthetic dataset of sine function.
x: time values from 0 to 10.
y: dependent variable for time series modeling.
data(sin_data)
A data.frame.
This dataset was generated for examples.
data(sin_data)
head(sin_data)
Computes the symmetric mean absolute percent error (sMAPE).
sMAPE.ts(actual, prediction)
actual | real observations
prediction | predicted observations
returns the sMAPE between the actual and prediction vectors
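Analogous to R2.ts, it can be applied directly to two numeric vectors (toy values invented for the example):
actual <- c(1.2, 2.3, 3.1, 4.8, 5.4)
prediction <- c(1.0, 2.5, 3.0, 5.0, 5.2)
# values close to 0 indicate small symmetric percentage errors
sMAPE.ts(actual, prediction)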
Smoothing is a statistical technique used to reduce the noise in a signal or a dataset by removing the high-frequency components. The smoothing level is associated with the number of bins used. There are alternative methods to establish the smoothing: equal interval, equal frequency, and clustering.
smoothing(n)
n | number of bins
returns an object of class smoothing
data(iris)
obj <- smoothing_inter(n = 2)
obj <- fit(obj, iris$Sepal.Length)
sl.bi <- transform(obj, iris$Sepal.Length)
table(sl.bi)
obj$interval
entro <- evaluate(obj, as.factor(names(sl.bi)), iris$Species)
entro$entropy
Uses clustering method to perform data smoothing. The input vector is divided into clusters using the k-means algorithm. The mean of each cluster is then calculated and used as the smoothed value for all observations within that cluster.
smoothing_cluster(n)
n | number of bins
returns an object of class smoothing_cluster
data(iris)
obj <- smoothing_cluster(n = 2)
obj <- fit(obj, iris$Sepal.Length)
sl.bi <- transform(obj, iris$Sepal.Length)
table(sl.bi)
obj$interval
entro <- evaluate(obj, as.factor(names(sl.bi)), iris$Species)
entro$entropy
The smoothing_freq function smooths data using equal-frequency binning: observations are partitioned into bins containing approximately the same number of values, and each observation is replaced by a summary of its bin.
smoothing_freq(n)
n | number of bins
returns an object of class smoothing_freq
data(iris)
obj <- smoothing_freq(n = 2)
obj <- fit(obj, iris$Sepal.Length)
sl.bi <- transform(obj, iris$Sepal.Length)
table(sl.bi)
obj$interval
entro <- evaluate(obj, as.factor(names(sl.bi)), iris$Species)
entro$entropy
The smoothing_inter function smooths data using equal-interval binning: the value range is divided into bins of equal width, and each observation is replaced by a summary of its bin.
smoothing_inter(n)
n | number of bins
returns an object of class smoothing_inter
data(iris)
obj <- smoothing_inter(n = 2)
obj <- fit(obj, iris$Sepal.Length)
sl.bi <- transform(obj, iris$Sepal.Length)
table(sl.bi)
obj$interval
entro <- evaluate(obj, as.factor(names(sl.bi)), iris$Species)
entro$entropy
Partitions a dataset into training and test sets using a specified sampling method.
train_test(obj, data, perc = 0.8, ...)
obj | an object of a class that supports the train_test method
data | dataset to be partitioned
perc | a numeric value between 0 and 1 specifying the proportion of data used for training
... | additional optional arguments passed to specific methods
returns a list with two elements:
train: A data frame containing the training set
test: A data frame containing the test set
# using random sampling
sample <- sample_random()
tt <- train_test(sample, iris)
# distribution of train
table(tt$train$Species)
Splits a dataset into training and test sets based on k-fold cross-validation. The function takes a list of data partitions (folds) and a specified fold index k. It returns the data corresponding to the k-th fold as the test set, and combines all other folds to form the training set.
train_test_from_folds(folds, k)
folds | data partitioned into folds
k | index of the fold used as the test set; all remaining folds form the training set
returns a list with two elements:
train: A data frame containing the combined data from all folds except the k-th fold, used as the training set.
test: A data frame corresponding to the k-th fold, used as the test set.
# Create k-fold partitions of a dataset (e.g., iris)
folds <- k_fold(sample_random(), iris, k = 5)
# Use the first fold as the test set and combine the remaining folds for the training set
train_test_split <- train_test_from_folds(folds, k = 1)
# Display the training set
head(train_test_split$train)
# Display the test set
head(train_test_split$test)
Defines a transformation method.
transform(obj, ...)
obj | a transformation object
... | optional arguments
returns the transformed data
#See ?minmax for an example of transformation
Creates a time series prediction object that uses the AutoRegressive Integrated Moving Average (ARIMA). It wraps the forecast library.
ts_arima()
returns a ts_arima object.
data(sin_data)
ts <- ts_data(sin_data$y, 0)
ts_head(ts, 3)
samp <- ts_sample(ts, test_size = 5)
io_train <- ts_projection(samp$train)
io_test <- ts_projection(samp$test)
model <- ts_arima()
model <- fit(model, x = io_train$input, y = io_train$output)
prediction <- predict(model, x = io_test$input[1,], steps_ahead = 5)
prediction <- as.vector(prediction)
output <- as.vector(io_test$output)
ev_test <- evaluate(model, output, prediction)
ev_test
Creates a time series prediction object that uses a 1D convolutional neural network (Conv1D). It wraps the pytorch library.
ts_conv1d(preprocess = NA, input_size = NA, epochs = 10000L)
preprocess | normalization
input_size | input size for the machine learning model
epochs | maximum number of epochs
returns a ts_conv1d object.
#See an example of using ts_conv1d at https://github.com/cefet-rj-dal/daltoolbox/blob/main/timeseries/ts_conv1d.ipynb
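For readers who want a runnable sketch in addition to the notebook, the following mirrors the train/predict pattern of the other ts_* models in this manual; it assumes a working pytorch backend, and epochs is lowered from its default purely for speed.
data(sin_data)
ts <- ts_data(sin_data$y, 10)
samp <- ts_sample(ts, test_size = 5)
io_train <- ts_projection(samp$train)
io_test <- ts_projection(samp$test)
model <- ts_conv1d(ts_norm_gminmax(), input_size = 4, epochs = 300L)
model <- fit(model, x = io_train$input, y = io_train$output)
prediction <- as.vector(predict(model, x = io_test$input[1,], steps_ahead = 5))
output <- as.vector(io_test$output)
evaluate(model, output, prediction)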
Time series data structure used in DAL Toolbox. It receives a vector (representing a time series) or a matrix y (representing sliding windows). Internally, a ts_data object is a matrix of sliding windows of size sw. If sw equals zero, the time series is stored as a single matrix column.
ts_data(y, sw = 1)
y | output variable
sw | integer: sliding window size
returns a ts_data object.
data(sin_data)
head(sin_data)
data <- ts_data(sin_data$y)
ts_head(data)
data10 <- ts_data(sin_data$y, 10)
ts_head(data10)
Creates a time series prediction object that uses the Extreme Learning Machine (ELM). It wraps the elmNNRcpp library.
ts_elm(preprocess = NA, input_size = NA, nhid = NA, actfun = "purelin")
preprocess | normalization
input_size | input size for the machine learning model
nhid | number of hidden neurons
actfun | activation function; possible values: 'sig', 'radbas', 'tribas', 'relu', 'purelin' (default)
returns a ts_elm object.
data(sin_data)
ts <- ts_data(sin_data$y, 10)
ts_head(ts, 3)
samp <- ts_sample(ts, test_size = 5)
io_train <- ts_projection(samp$train)
io_test <- ts_projection(samp$test)
model <- ts_elm(ts_norm_gminmax(), input_size = 4, nhid = 3, actfun = "purelin")
model <- fit(model, x = io_train$input, y = io_train$output)
prediction <- predict(model, x = io_test$input[1,], steps_ahead = 5)
prediction <- as.vector(prediction)
output <- as.vector(io_test$output)
ev_test <- evaluate(model, output, prediction)
ev_test
Returns the first n observations from a ts_data object.
ts_head(x, n = 6L, ...)
x | a ts_data object
n | number of rows to return
... | optional arguments
returns the first n observations of a ts_data object
data(sin_data)
data10 <- ts_data(sin_data$y, 10)
ts_head(data10)
Creates a prediction object that uses the K-Nearest Neighbors (KNN) method for time series regression.
ts_knn(preprocess = NA, input_size = NA, k = NA)
preprocess | normalization
input_size | input size for the machine learning model
k | number of neighbors
returns a ts_knn object.
data(sin_data)
ts <- ts_data(sin_data$y, 10)
ts_head(ts, 3)
samp <- ts_sample(ts, test_size = 5)
io_train <- ts_projection(samp$train)
io_test <- ts_projection(samp$test)
model <- ts_knn(ts_norm_gminmax(), input_size = 4, k = 3)
model <- fit(model, x = io_train$input, y = io_train$output)
prediction <- predict(model, x = io_test$input[1,], steps_ahead = 5)
prediction <- as.vector(prediction)
output <- as.vector(io_test$output)
ev_test <- evaluate(model, output, prediction)
ev_test
Creates a time series prediction object that uses a Long Short-Term Memory (LSTM) network. It wraps the pytorch library.
ts_lstm(preprocess = NA, input_size = NA, epochs = 10000L)
preprocess | normalization
input_size | input size for the machine learning model
epochs | maximum number of epochs
returns a ts_lstm object.
#See an example of using ts_lstm at https://github.com/cefet-rj-dal/daltoolbox/blob/main/timeseries/ts_lstm.ipynb
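The same hedged sketch as for ts_conv1d applies here, swapping in the LSTM constructor (pytorch backend required; epochs reduced for speed).
data(sin_data)
ts <- ts_data(sin_data$y, 10)
samp <- ts_sample(ts, test_size = 5)
io_train <- ts_projection(samp$train)
io_test <- ts_projection(samp$test)
model <- ts_lstm(ts_norm_gminmax(), input_size = 4, epochs = 300L)
model <- fit(model, x = io_train$input, y = io_train$output)
prediction <- as.vector(predict(model, x = io_test$input[1,], steps_ahead = 5))
output <- as.vector(io_test$output)
evaluate(model, output, prediction)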
Creates a time series prediction object that uses the Multilayer Perceptron (MLP). It wraps the nnet library.
ts_mlp(preprocess = NA, input_size = NA, size = NA, decay = 0.01, maxit = 1000)
preprocess | normalization
input_size | input size for the machine learning model
size | number of neurons in the hidden layer
decay | decay parameter for the MLP
maxit | maximum number of iterations
returns a ts_mlp object.
data(sin_data)
ts <- ts_data(sin_data$y, 10)
ts_head(ts, 3)
samp <- ts_sample(ts, test_size = 5)
io_train <- ts_projection(samp$train)
io_test <- ts_projection(samp$test)
model <- ts_mlp(ts_norm_gminmax(), input_size = 4, size = 4, decay = 0)
model <- fit(model, x = io_train$input, y = io_train$output)
prediction <- predict(model, x = io_test$input[1,], steps_ahead = 5)
prediction <- as.vector(prediction)
output <- as.vector(io_test$output)
ev_test <- evaluate(model, output, prediction)
ev_test
Transforms data to a common scale while taking into account changes in the statistical properties of the data over time.
ts_norm_an(remove_outliers = TRUE, nw = 0)
remove_outliers | logical: if TRUE (default), outliers are removed
nw | integer: window size
returns a ts_norm_an object.
# time series to normalize
data(sin_data)
# convert to sliding windows
ts <- ts_data(sin_data$y, 10)
ts_head(ts, 3)
summary(ts[,10])
# normalization
preproc <- ts_norm_an()
preproc <- fit(preproc, ts)
tst <- transform(preproc, ts)
ts_head(tst, 3)
summary(tst[,10])
This function calculates the differences between consecutive values of a time series.
ts_norm_diff(remove_outliers = TRUE)
remove_outliers | logical: if TRUE (default), outliers are removed
returns a ts_norm_diff object.
# time series to normalize
data(sin_data)
# convert to sliding windows
ts <- ts_data(sin_data$y, 10)
ts_head(ts, 3)
summary(ts[,10])
# normalization
preproc <- ts_norm_diff()
preproc <- fit(preproc, ts)
tst <- transform(preproc, ts)
ts_head(tst, 3)
summary(tst[,9])
Creates a normalization object for time series data using an Exponential Moving Average (EMA) method. This normalization approach adapts to changes in the time series and optionally removes outliers.
ts_norm_ean(remove_outliers = TRUE, nw = 0)
remove_outliers | logical: if TRUE (default), outliers are removed
nw | window size
returns a ts_norm_ean object.
# time series to normalize
data(sin_data)
# convert to sliding windows
ts <- ts_data(sin_data$y, 10)
ts_head(ts, 3)
summary(ts[,10])
# normalization
preproc <- ts_norm_ean()
preproc <- fit(preproc, ts)
tst <- transform(preproc, ts)
ts_head(tst, 3)
summary(tst[,10])
Rescales data, so the minimum value is mapped to 0 and the maximum value is mapped to 1.
ts_norm_gminmax(remove_outliers = TRUE)
remove_outliers | logical: if TRUE (default), outliers are removed
returns a ts_norm_gminmax object.
# time series to normalize
data(sin_data)
# convert to sliding windows
ts <- ts_data(sin_data$y, 10)
ts_head(ts, 3)
summary(ts[,10])
# normalization
preproc <- ts_norm_gminmax()
preproc <- fit(preproc, ts)
tst <- transform(preproc, ts)
ts_head(tst, 3)
summary(tst[,10])
The ts_norm_swminmax function creates an object for normalizing a time series based on the "sliding window min-max scaling" method.
ts_norm_swminmax(remove_outliers = TRUE)
remove_outliers | logical: if TRUE (default), outliers are removed
returns a ts_norm_swminmax object.
# time series to normalize
data(sin_data)
# convert to sliding windows
ts <- ts_data(sin_data$y, 10)
ts_head(ts, 3)
summary(ts[,10])
# normalization
preproc <- ts_norm_swminmax()
preproc <- fit(preproc, ts)
tst <- transform(preproc, ts)
ts_head(tst, 3)
summary(tst[,10])
Separates a ts_data object into input and output components for time series analysis. This function is useful for preparing data for modeling, where the input and output variables are extracted from a time series dataset.
ts_projection(ts)
ts | matrix or data.frame containing the time series
returns a ts_projection object.
# setting up a ts_data
data(sin_data)
ts <- ts_data(sin_data$y, 10)
io <- ts_projection(ts)
# input data
ts_head(io$input)
# output data
ts_head(io$output)
Time series regression directly from a time series. Ancestor class for non-sliding-window implementations.
ts_reg()
returns a ts_reg object
#See ?ts_arima for an example using Auto-regressive Integrated Moving Average
Time series regression from sliding windows. Ancestor class for machine learning implementations.
ts_regsw(preprocess = NA, input_size = NA)
preprocess | normalization
input_size | input size for the machine learning model
returns a ts_regsw object
#See ?ts_elm for an example using Extreme Learning Machine
Creates a time series prediction object that uses the Random Forest. It wraps the randomForest library.
ts_rf(preprocess = NA, input_size = NA, nodesize = 1, ntree = 10, mtry = NULL)
preprocess | normalization
input_size | input size for the machine learning model
nodesize | node size
ntree | number of trees
mtry | number of attributes used to build each tree
returns a ts_rf object.
data(sin_data)
ts <- ts_data(sin_data$y, 10)
ts_head(ts, 3)
samp <- ts_sample(ts, test_size = 5)
io_train <- ts_projection(samp$train)
io_test <- ts_projection(samp$test)
model <- ts_rf(ts_norm_gminmax(), input_size = 4, nodesize = 3, ntree = 50)
model <- fit(model, x = io_train$input, y = io_train$output)
prediction <- predict(model, x = io_test$input[1,], steps_ahead = 5)
prediction <- as.vector(prediction)
output <- as.vector(io_test$output)
ev_test <- evaluate(model, output, prediction)
ev_test
Separates a ts_data object into training and test sets. The test set is taken from the last observations, shifted back by an offset; the offset is important to allow replication under different recent origins. The training set uses the remaining rows: the number of rows of the ts_data minus the test size and the offset.
ts_sample(ts, test_size = 1, offset = 0)
ts | time series
test_size | integer: size of the test data (default = 1)
offset | integer: starting point (default = 0)
returns a list with the two samples
# setting up a ts_data
data(sin_data)
ts <- ts_data(sin_data$y, 10)
# separating into train and test
test_size <- 3
samp <- ts_sample(ts, test_size)
# first five rows from training data
ts_head(samp$train, 5)
# last five rows from training data
ts_head(samp$train[-c(1:(nrow(samp$train)-5)),])
# testing data
ts_head(samp$test)
Creates a time series prediction object that uses the Support Vector Machine (SVM). It wraps the e1071 library.
ts_svm(
  preprocess = NA,
  input_size = NA,
  kernel = "radial",
  epsilon = 0,
  cost = 10
)
preprocess | normalization
input_size | input size for the machine learning model
kernel | SVM kernel (linear, radial, polynomial, sigmoid)
epsilon | error threshold
cost | parameter that controls the trade-off between achieving a low error on the training data and minimizing model complexity
returns a ts_svm object.
data(sin_data)
ts <- ts_data(sin_data$y, 10)
ts_head(ts, 3)
samp <- ts_sample(ts, test_size = 5)
io_train <- ts_projection(samp$train)
io_test <- ts_projection(samp$test)
model <- ts_svm(ts_norm_gminmax(), input_size = 4)
model <- fit(model, x = io_train$input, y = io_train$output)
prediction <- predict(model, x = io_test$input[1,], steps_ahead = 5)
prediction <- as.vector(prediction)
output <- as.vector(io_test$output)
ev_test <- evaluate(model, output, prediction)
ev_test
Creates a ts_tune object for tuning the hyperparameters of a time series model. This function sets up a tuning process for the specified base model by exploring different hyperparameter configurations using cross-validation.
ts_tune(input_size, base_model, folds = 10)
input_size | input size for the machine learning model
base_model | base model for tuning
folds | number of folds for cross-validation
returns a ts_tune object
data(sin_data)
ts <- ts_data(sin_data$y, 10)
ts_head(ts, 3)
samp <- ts_sample(ts, test_size = 5)
io_train <- ts_projection(samp$train)
io_test <- ts_projection(samp$test)
tune <- ts_tune(input_size = c(3:5), base_model = ts_elm(ts_norm_gminmax()))
ranges <- list(nhid = 1:5, actfun = c('purelin'))
# generic model tuning
model <- fit(tune, x = io_train$input, y = io_train$output, ranges)
prediction <- predict(model, x = io_test$input[1,], steps_ahead = 5)
prediction <- as.vector(prediction)
output <- as.vector(io_test$output)
ev_test <- evaluate(model, output, prediction)
ev_test
Creates a deep learning variational autoencoder to encode a sequence of observations. It wraps the pytorch library.
varae_encode(
  input_size,
  encoding_size,
  batch_size = 32,
  num_epochs = 1000,
  learning_rate = 0.001
)
input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate
returns a varae_encode object.
#See an example of using varae_encode at https://github.com/cefet-rj-dal/daltoolbox/blob/main/transf/varae_encode.ipynb
Creates a deep learning variational autoencoder to encode and decode a sequence of observations. It wraps the pytorch library.
varae_encode_decode(
  input_size,
  encoding_size,
  batch_size = 32,
  num_epochs = 1000,
  learning_rate = 0.001
)
input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate
returns a varae_encode_decode object.
#See an example of using varae_encode_decode at https://github.com/cefet-rj-dal/daltoolbox/blob/main/transf/varae_enc_decode.ipynb
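A minimal encode-decode sketch under the same assumptions as the stacked autoencoder example earlier (pytorch backend, fit/transform pattern, reduced num_epochs):
data(sin_data)
dat <- as.data.frame(ts_data(sin_data$y, 5))
vae <- varae_encode_decode(input_size = 5, encoding_size = 3, num_epochs = 100)
vae <- fit(vae, dat)
# transform returns the reconstruction, with the same width as the input
reconstruction <- transform(vae, dat)
head(reconstruction)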
Scale data using z-score normalization.
zscore(nmean = 0, nsd = 1)
nmean | new mean for normalized data
nsd | new standard deviation for normalized data
returns the z-score transformation object
data(iris)
head(iris)
trans <- zscore()
trans <- fit(trans, iris)
tiris <- transform(trans, iris)
head(tiris)
itiris <- inverse_transform(trans, tiris)
head(itiris)