Title: | Leveraging Experiment Lines to Data Analytics |
---|---|
Description: | The natural increase in the complexity of current research experiments and data demands better tools to enhance productivity in Data Analytics. The package is a framework designed to address the modern challenges in data analytics workflows. The package is inspired by Experiment Line concepts. It aims to provide seamless support for users in developing their data mining workflows by offering a uniform data model and method API. It enables the integration of various data mining activities, including data preprocessing, classification, regression, clustering, and time series prediction. It also offers options for hyper-parameter tuning and supports integration with existing libraries and languages. Overall, the package provides researchers with a comprehensive set of functionalities for data science, promoting ease of use, extensibility, and integration with various tools and libraries. Information on Experiment Line is based on Ogasawara et al. (2009) <doi:10.1007/978-3-642-02279-1_20>. |
Authors: | Eduardo Ogasawara [aut, ths, cre], Antonio Castro [aut, ctb], Heraldo Borges [aut, ths], Janio Lima [aut, ths], Lucas Tavares [aut, ths], Diego Carvalho [aut, ths], Eduardo Bezerra [aut, ths], Joel Santos [aut, ths], Rafaelli Coutinho [aut, ths], Federal Center for Technological Education of Rio de Janeiro (CEFET/RJ) [cph] |
Maintainer: | Eduardo Ogasawara <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.1.727 |
Built: | 2024-11-25 19:30:58 UTC |
Source: | CRAN |
Extracts a subset of a time series object based on specified rows and columns. The function allows for flexible indexing and subsetting of time series data.

## S3 method for class 'ts_data'
x[i, j, ...]

x | a ts_data object
i | row i
j | column j
... | optional arguments

returns a new ts_data object

data(sin_data)
data10 <- ts_data(sin_data$y, 10)
ts_head(data10)
# single line
data10[12,]
# range of lines
data10[12:13,]
# single column
data10[,1]
# range of columns
data10[,1:2]
# range of rows and columns
data10[12:13,1:2]
# single line and a range of columns
data10[12,1:2]
# range of lines and a single column
data10[12:13,1]
# single observation
data10[12,1]
Creates a deep learning adversarial autoencoder to encode a sequence of observations. It wraps the pytorch library.

aae_encode(input_size, encoding_size, batch_size = 350, num_epochs = 1000, learning_rate = 0.001)

input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate

returns an aae_encode object.

# See an example of using `aae_encode` at this link
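Since the example link above is incomplete, here is a minimal usage sketch, not taken from the package documentation: it assumes a configured pytorch backend (via reticulate) and follows the fit/transform pattern used by the other transforms in this reference.

# a minimal sketch, assuming a configured pytorch backend via reticulate
data(sin_data)
xts <- ts_data(sin_data$y, 10)                        # sliding windows of width 10
ae <- aae_encode(input_size = 10, encoding_size = 3)
ae <- fit(ae, xts)                                    # train the adversarial autoencoder
encoded <- transform(ae, xts)                         # encoded representation with 3 columns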
Creates a deep learning adversarial autoencoder to encode-decode a sequence of observations. It wraps the pytorch library.

aae_encode_decode(input_size, encoding_size, batch_size = 32, num_epochs = 1000, learning_rate = 0.001)

input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate

returns an aae_encode_decode object.

# See an example of using `aae_encode_decode` at this [link](https://github.com/cefet-rj-dal/daltoolbox/blob/main/transf/aae_enc_decode.ipynb)
Executes the action of the model on the provided data.

action(obj, ...)

obj | a dal_base object to apply the transformation on the input dataset
... | optional arguments

returns the result of the action of the model applied to the provided data

data(iris)
# an example is minmax normalization
trans <- minmax()
trans <- fit(trans, iris)
tiris <- action(trans, iris)
A default method that proxies the action call to the transform method.

## S3 method for class 'dal_transform'
action(obj, ...)

obj | object
... | optional arguments

returns the transformed data

# See ?minmax for an example of transformation
Converts a vector into a categorical mapping, where each category is represented by a specific value. By default, the values represent binary categories (true/false).

adjust_class_label(x, valTrue = 1, valFalse = 0)

x | vector to be categorized
valTrue | value to represent true
valFalse | value to represent false

returns an adjusted categorical mapping
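No example accompanies this entry; the following sketch mirrors how adjust_class_label is used in the classification examples later in this reference:

data(iris)
# one binary column per class level, with valTrue for the observed class and valFalse otherwise
y <- adjust_class_label(iris$Species)
head(y)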
Converts a dataset to a data.frame if it is not already in that format.

adjust_data.frame(data)

data | dataset

returns a data.frame

data(iris)
df <- adjust_data.frame(iris)
Converts a vector into a factor with specified levels and labels.

adjust_factor(value, ilevels, slevels)

value | vector to be converted into factor
ilevels | order for categorical values
slevels | labels for categorical values

returns an adjusted factor
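This entry ships without an example; the call below is a hypothetical sketch, assuming ilevels gives the ordering of the raw codes and slevels the corresponding labels:

# assuming raw 0/1 codes should become the labels "no"/"yes"
value <- c(0, 1, 1, 0)
f <- adjust_factor(value, ilevels = c(0, 1), slevels = c("no", "yes"))
f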
Converts a dataset to a matrix format if it is not already in that format.

adjust_matrix(data)

data | dataset

returns an adjusted matrix

data(iris)
mat <- adjust_matrix(iris)
Converts a dataset to a ts_data object.

adjust_ts_data(data)

data | dataset

returns an adjusted ts_data object
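No example accompanies this entry; a hypothetical sketch, assuming a plain matrix of sliding windows as input:

data(sin_data)
mat <- as.matrix(ts_data(sin_data$y, 10))  # plain matrix of sliding windows
xts <- adjust_ts_data(mat)                 # ensures the result is a ts_data object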
Creates a deep learning autoencoder to encode a sequence of observations. It wraps the pytorch and reticulate libraries.

autoenc_encode(input_size, encoding_size, batch_size = 32, num_epochs = 1000, learning_rate = 0.001)

input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate

returns an autoenc_encode object.

# See an example of using `autoenc_encode` at this [link](https://github.com/cefet-rj-dal/daltoolbox/blob/main/transf/van_encode.ipynb)
Creates a deep learning autoencoder to encode-decode a sequence of observations. It wraps the pytorch and reticulate libraries.

autoenc_encode_decode(input_size, encoding_size, batch_size = 32, num_epochs = 1000, learning_rate = 0.001)

input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate

returns an autoenc_encode_decode object.

# See an example of using `autoenc_encode_decode` at this [link](https://github.com/cefet-rj-dal/daltoolbox/blob/main/transf/van_enc_decode.ipynb)
Housing values in suburbs of Boston.

crim: per capita crime rate by town
zn: proportion of residential land zoned for lots over 25,000 sq.ft.
indus: proportion of non-retail business acres per town
chas: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
nox: nitric oxides concentration (parts per 10 million)
rm: average number of rooms per dwelling
age: proportion of owner-occupied units built prior to 1940
dis: weighted distances to five Boston employment centres
rad: index of accessibility to radial highways
tax: full-value property-tax rate per $10,000
ptratio: pupil-teacher ratio by town
black: 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
lstat: percentage of lower status of the population
medv: median value of owner-occupied homes in $1000's

data(Boston)

Regression dataset obtained from the MASS library.

Creator: Harrison, D. and Rubinfeld, D.L. Hedonic prices and the demand for clean air, J. Environ. Economics & Management, vol. 5, 81-102, 1978.

data(Boston)
head(Boston)
Creates a deep learning convolutional autoencoder to encode a sequence of observations. It wraps the pytorch library.

cae_encode(input_size, encoding_size, batch_size = 32, num_epochs = 1000, learning_rate = 0.001)

input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate

returns a cae_encode object.

# See an example of using `cae_encode` at this [link](https://github.com/cefet-rj-dal/daltoolbox/blob/main/transf/cae_encode.ipynb)
Creates a deep learning convolutional autoencoder to encode-decode a sequence of observations. It wraps the pytorch library.

cae_encode_decode(input_size, encoding_size, batch_size = 32, num_epochs = 1000, learning_rate = 0.001)

input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate

returns a cae_encode_decode object.

# See an example of using `cae_encode_decode` at this [link](https://github.com/cefet-rj-dal/daltoolbox/blob/main/transf/cae_enc_decode.ipynb)
Creates a deep learning convolutional autoencoder to encode-decode a sequence of observations. It wraps the pytorch library.

cae2d_encode_decode(input_size, encoding_size, batch_size = 32, num_epochs = 50, learning_rate = 0.001)

input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate

returns a cae2d_encode_decode object.

# See an example of using `cae2d_encode_decode` at this [link](https://github.com/cefet-rj-dal/daltoolbox/blob/main/transf/cae2d_enc_decode.ipynb)
Creates a deep learning convolutional denoising autoencoder to encode a sequence of observations. It wraps the pytorch library.

cae2den_encode(input_size, encoding_size, batch_size = 32, num_epochs = 50, learning_rate = 0.001)

input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate

returns a cae2den_encode object.

# See an example of using `cae2den_encode` at this [link](https://github.com/cefet-rj-dal/daltoolbox/blob/main/transf/cae2den_encode.ipynb)
Creates a deep learning convolutional denoising autoencoder to encode-decode a sequence of observations. It wraps the pytorch library.

cae2den_encode_decode(input_size, encoding_size, batch_size = 32, num_epochs = 50, learning_rate = 0.001)

input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate

returns a cae2den_encode_decode object.

# See an example of using `cae2den_encode_decode` at this [link](https://github.com/cefet-rj-dal/daltoolbox/blob/main/transf/cae2den_enc_decode.ipynb)
Categorical mapping provides a way to map the levels of a categorical variable to new values. Each possible value is converted to a binary attribute.

categ_mapping(attribute)

attribute | attribute to be categorized

returns a data frame with binary attributes, one for each possible category

cm <- categ_mapping("Species")
iris_cm <- transform(cm, iris)
# can be made in a single column
species <- iris[,"Species", drop=FALSE]
iris_cm <- transform(cm, species)
Creates a classification object that uses the Decision Tree algorithm for classification. It wraps the tree library.

cla_dtree(attribute, slevels)

attribute | attribute target to model building
slevels | the possible values for the target classification

returns a classification object

data(iris)
slevels <- levels(iris$Species)
model <- cla_dtree("Species", slevels)
# preparing dataset for random sampling
sr <- sample_random()
sr <- train_test(sr, iris)
train <- sr$train
test <- sr$test
model <- fit(model, train)
prediction <- predict(model, test)
predictand <- adjust_class_label(test[,"Species"])
test_eval <- evaluate(model, predictand, prediction)
test_eval$metrics
Classifies using the K-Nearest Neighbor algorithm. It wraps the class library.

cla_knn(attribute, slevels, k = 1)

attribute | attribute target to model building
slevels | possible values for the target classification
k | a vector of integers indicating the number of neighbors to be considered

returns a knn object.

data(iris)
slevels <- levels(iris$Species)
model <- cla_knn("Species", slevels, k=3)
# preparing dataset for random sampling
sr <- sample_random()
sr <- train_test(sr, iris)
train <- sr$train
test <- sr$test
model <- fit(model, train)
prediction <- predict(model, test)
predictand <- adjust_class_label(test[,"Species"])
test_eval <- evaluate(model, predictand, prediction)
test_eval$metrics
This function creates a classification object that uses the majority vote strategy to predict the target attribute. Given a target attribute, the function counts the number of occurrences of each value in the dataset and selects the one that appears most often.

cla_majority(attribute, slevels)

attribute | attribute target to model building
slevels | possible values for the target classification

returns a classification object.

data(iris)
slevels <- levels(iris$Species)
model <- cla_majority("Species", slevels)
# preparing dataset for random sampling
sr <- sample_random()
sr <- train_test(sr, iris)
train <- sr$train
test <- sr$test
model <- fit(model, train)
prediction <- predict(model, test)
predictand <- adjust_class_label(test[,"Species"])
test_eval <- evaluate(model, predictand, prediction)
test_eval$metrics
Creates a classification object that uses the Multi-Layer Perceptron (MLP) method. It wraps the nnet library.

cla_mlp(attribute, slevels, size = NULL, decay = 0.1, maxit = 1000)

attribute | attribute target to model building
slevels | possible values for the target classification
size | number of nodes that will be used in the hidden layer
decay | weight decay parameter that regularizes gradient descent
maxit | maximum iterations

returns a classification object

data(iris)
slevels <- levels(iris$Species)
model <- cla_mlp("Species", slevels, size=3, decay=0.03)
# preparing dataset for random sampling
sr <- sample_random()
sr <- train_test(sr, iris)
train <- sr$train
test <- sr$test
model <- fit(model, train)
prediction <- predict(model, test)
predictand <- adjust_class_label(test[,"Species"])
test_eval <- evaluate(model, predictand, prediction)
test_eval$metrics
Classification using the Naive Bayes algorithm. It wraps the e1071 library.

cla_nb(attribute, slevels)

attribute | attribute target to model building
slevels | possible values for the target classification

returns a classification object.

data(iris)
slevels <- levels(iris$Species)
model <- cla_nb("Species", slevels)
# preparing dataset for random sampling
sr <- sample_random()
sr <- train_test(sr, iris)
train <- sr$train
test <- sr$test
model <- fit(model, train)
prediction <- predict(model, test)
predictand <- adjust_class_label(test[,"Species"])
test_eval <- evaluate(model, predictand, prediction)
test_eval$metrics
Creates a classification object that uses the Random Forest method. It wraps the randomForest library.

cla_rf(attribute, slevels, nodesize = 5, ntree = 10, mtry = NULL)

attribute | attribute target to model building
slevels | possible values for the target classification
nodesize | node size
ntree | number of trees
mtry | number of attributes randomly sampled as candidates at each split

returns a classification object

data(iris)
slevels <- levels(iris$Species)
model <- cla_rf("Species", slevels, ntree=5)
# preparing dataset for random sampling
sr <- sample_random()
sr <- train_test(sr, iris)
train <- sr$train
test <- sr$test
model <- fit(model, train)
prediction <- predict(model, test)
predictand <- adjust_class_label(test[,"Species"])
test_eval <- evaluate(model, predictand, prediction)
test_eval$metrics
Creates a classification object that uses the Support Vector Machine (SVM) method for classification. It wraps the svm function from the e1071 library.

cla_svm(attribute, slevels, epsilon = 0.1, cost = 10, kernel = "radial")

attribute | attribute target to model building
slevels | possible values for the target classification
epsilon | parameter that controls the width of the margin around the separating hyperplane
cost | parameter that controls the trade-off between having a wide margin and correctly classifying training data points
kernel | the type of kernel function to be used in the SVM algorithm (linear, radial, polynomial, sigmoid)

returns a SVM classification object

data(iris)
slevels <- levels(iris$Species)
model <- cla_svm("Species", slevels, epsilon=0.0, cost=20.000)
# preparing dataset for random sampling
sr <- sample_random()
sr <- train_test(sr, iris)
train <- sr$train
test <- sr$test
model <- fit(model, train)
prediction <- predict(model, test)
predictand <- adjust_class_label(test[,"Species"])
test_eval <- evaluate(model, predictand, prediction)
test_eval$metrics
This function performs a grid search or random search over specified hyperparameter values to optimize a base classification model.

cla_tune(base_model, folds = 10, metric = "accuracy")

base_model | base model for tuning
folds | number of folds for cross-validation
metric | metric used to optimize

returns a cla_tune object

# preparing dataset for random sampling
sr <- sample_random()
sr <- train_test(sr, iris)
train <- sr$train
test <- sr$test
# hyper parameter setup
tune <- cla_tune(cla_mlp("Species", levels(iris$Species)))
ranges <- list(size=c(3:5), decay=c(0.1))
# hyper parameter optimization
model <- fit(tune, train, ranges)
# testing optimization
test_prediction <- predict(model, test)
test_predictand <- adjust_class_label(test[,"Species"])
test_eval <- evaluate(model, test_predictand, test_prediction)
test_eval$metrics
Ancestor class for classification problems. It wraps the MLmetrics and nnet libraries.

classification(attribute, slevels)

attribute | attribute target to model building
slevels | possible values for the target classification

returns a classification object

# See ?cla_dtree for a classification example using a decision tree
Creates an object for tuning clustering models. This object can be used to fit and optimize clustering algorithms by specifying hyperparameter ranges.

clu_tune(base_model)

base_model | base model for tuning

returns a clu_tune object.

data(iris)
# fit model
model <- clu_tune(cluster_kmeans(k = 0))
ranges <- list(k = 1:10)
model <- fit(model, iris[,1:4], ranges)
model$k
Defines a cluster method.

cluster(obj, ...)

obj | a clusterer object
... | optional arguments

returns clustered data

# See ?cluster_kmeans for an example of transformation
Creates a clusterer object that uses the DBSCAN method. It wraps the dbscan library.

cluster_dbscan(minPts = 3, eps = NULL)

minPts | minimum number of points
eps | distance value

returns a dbscan object

# setup clustering
model <- cluster_dbscan(minPts = 3)
# load dataset
data(iris)
# build model
model <- fit(model, iris[,1:4])
clu <- cluster(model, iris[,1:4])
table(clu)
# evaluate model using external metric
eval <- evaluate(model, clu, iris$Species)
eval
Creates a clusterer object that uses the k-means method. It wraps the stats library.

cluster_kmeans(k = 1)

k | the number of clusters to form

returns a k-means object.

# setup clustering
model <- cluster_kmeans(k=3)
# load dataset
data(iris)
# build model
model <- fit(model, iris[,1:4])
clu <- cluster(model, iris[,1:4])
table(clu)
# evaluate model using external metric
eval <- evaluate(model, clu, iris$Species)
eval
Creates a clusterer object that uses the Partitioning Around Medoids (PAM) method. It wraps the cluster library.

cluster_pam(k = 1)

k | the number of clusters to generate

returns a PAM object.

# setup clustering
model <- cluster_pam(k = 3)
# load dataset
data(iris)
# build model
model <- fit(model, iris[,1:4])
clu <- cluster(model, iris[,1:4])
table(clu)
# evaluate model using external metric
eval <- evaluate(model, clu, iris$Species)
eval
Ancestor class for clustering problems.

clusterer()

returns a clusterer object

# See ?cluster_kmeans for an example of transformation
The dal_base class is an abstract class for all DAL descendant classes. It provides both fit() and action() functions.

dal_base()

returns a dal_base object

trans <- dal_base()
An ancestor class for clustering, classification, regression, and time series regression. It also provides the basis for specialized evaluation of learning performance. An example of a learner is a decision tree (cla_dtree).

dal_learner()

returns a learner

# See ?cla_dtree for a classification example using a decision tree
A transformation method applied to a dataset. If needed, fit can be called to adjust the transform.

dal_transform()

returns a dal_transform object.

# See ?minmax for an example of transformation
Creates an ancestor class for hyperparameter optimization, allowing the tuning of a base model using cross-validation.

dal_tune(base_model, folds = 10)

base_model | base model for tuning
folds | number of folds for cross-validation

returns a dal_tune object

# See ?cla_tune for classification tuning
# See ?reg_tune for regression tuning
# See ?ts_tune for time series tuning
The data_sample function is used to randomly sample data from a given data frame. It can be used to obtain a subset of data for further analysis or modeling. Two basic specializations of data_sample are sample_random and sample_stratified, which provide random sampling and stratified sampling, respectively. Data sample provides both training and testing partitioning (train_test) and k-fold partitioning (k_fold) of data.

data_sample()

returns an object of class data_sample

# using random sampling
sample <- sample_random()
tt <- train_test(sample, iris)
# distribution of train
table(tt$train$Species)
# preparing dataset into four folds
folds <- k_fold(sample, iris, 4)
# distribution of folds
tbl <- NULL
for (f in folds) {
  tbl <- rbind(tbl, table(f$Species))
}
head(tbl)
Creates a deep learning denoising autoencoder to encode a sequence of observations. It wraps the pytorch library.

dns_encode(input_size, encoding_size, batch_size = 32, num_epochs = 1000, learning_rate = 0.001, noise_factor = 0.3)

input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate
noise_factor | level of noise to be added to the data

returns a dns_encode object.

# See example at https://nbviewer.org/github/cefet-rj-dal/daltoolbox-examples
Creates a deep learning denoising autoencoder to encode-decode a sequence of observations. It wraps the pytorch library.

dns_encode_decode(input_size, encoding_size, batch_size = 32, num_epochs = 1000, learning_rate = 0.001, noise_factor = 0.3)

input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate
noise_factor | level of noise to be added to the data

returns a dns_encode_decode object.

# See example at https://nbviewer.org/github/cefet-rj-dal/daltoolbox-examples
The actual time series model fitting. This method should be overridden by descendants.

do_fit(obj, x, y = NULL)

obj | an object representing the model or algorithm to be fitted
x | a matrix or data.frame containing the input features for training the model
y | a vector or matrix containing the output values to be predicted by the model

returns a fitted object
The actual time series model prediction. This method should be overridden by descendants.

do_predict(obj, x)

obj | an object representing the fitted model or algorithm
x | a matrix or data.frame containing the input features for making predictions

returns the predicted values
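As an illustration only (not part of the package), a descendant class might override these two generics along the following lines; my_model and its fields are hypothetical:

# hypothetical descendant overriding do_fit/do_predict (illustrative sketch)
do_fit.my_model <- function(obj, x, y = NULL) {
  df <- cbind(as.data.frame(x), .y = y)
  obj$model <- stats::lm(.y ~ ., data = df)   # fit any internal model here
  obj
}
do_predict.my_model <- function(obj, x) {
  stats::predict(obj$model, newdata = as.data.frame(x))
}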
PCA (Principal Component Analysis) is an unsupervised dimensionality reduction technique used in data analysis and machine learning. It transforms a dataset of possibly correlated variables into a new set of uncorrelated variables called principal components.

dt_pca(attribute = NULL, components = NULL)

attribute | target attribute to model building
components | number of components for PCA

returns an object of class dt_pca

mypca <- dt_pca("Species")
# Automatically fitting number of components
mypca <- fit(mypca, iris)
iris.pca <- transform(mypca, iris)
head(iris.pca)
head(mypca$pca.transf)
# Manual establishment of number of components
mypca <- dt_pca("Species", 3)
mypca <- fit(mypca, datasets::iris)
iris.pca <- transform(mypca, iris)
head(iris.pca)
head(mypca$pca.transf)
Evaluates learner performance. The actual evaluation varies according to the type of learner (clustering, classification, regression, time series regression).

evaluate(obj, ...)

obj | object
... | optional arguments

returns the evaluation

data(iris)
slevels <- levels(iris$Species)
model <- cla_dtree("Species", slevels)
model <- fit(model, iris)
prediction <- predict(model, iris)
predictand <- adjust_class_label(iris[,"Species"])
test_eval <- evaluate(model, predictand, prediction)
test_eval$metrics
Applies the fit method to a model object to train or configure it using the provided data and optional arguments.

fit(obj, ...)

obj | object
... | optional arguments

returns the object after fitting

data(iris)
# an example is minmax normalization
trans <- minmax()
trans <- fit(trans, iris)
tiris <- action(trans, iris)
Fits a curvature model to a sequence of observations and extracts the maximum curvature computed.

fit_curvature_max()

returns an object of class fit_curvature_max, which inherits from the fit_curvature and dal_transform classes. The object contains a list with the following elements:

x: the position in which the maximum curvature is reached
y: the value where the maximum curvature occurs
yfit: the value of the maximum curvature

x <- seq(from=1, to=10, by=0.5)
dat <- data.frame(x = x, value = -log(x), variable = "log")
myfit <- fit_curvature_max()
res <- transform(myfit, dat$value)
head(res)
Fits a curvature model to a sequence of observations and extracts the minimum curvature computed.

fit_curvature_min()

returns an object of class fit_curvature_min, which inherits from the fit_curvature and dal_transform classes. The object contains a list with the following elements:

x: the position in which the minimum curvature is reached
y: the value where the minimum curvature occurs
yfit: the value of the minimum curvature

x <- seq(from=1, to=10, by=0.5)
dat <- data.frame(x = x, value = log(x), variable = "log")
myfit <- fit_curvature_min()
res <- transform(myfit, dat$value)
head(res)
Tunes the hyperparameters of a machine learning model for classification.

## S3 method for class 'cla_tune'
fit(obj, data, ranges, ...)

obj | an object containing the model and tuning configuration
data | the dataset used for training and evaluation
ranges | a list of hyperparameter ranges to explore
... | optional arguments

returns a fitted object
Fits a DBSCAN clustering model by setting the eps parameter. If eps is not provided, it is estimated based on the k-nearest neighbor distances. It wraps the dbscan library.

## S3 method for class 'cluster_dbscan'
fit(obj, data, ...)

obj | an object containing the DBSCAN model configuration, including minPts and (optionally) eps
data | the dataset to use for fitting the model
... | optional arguments

returns a fitted object with the eps parameter set
Reverses the transformation applied to data.

inverse_transform(obj, ...)

obj | a dal_transform object
... | optional arguments

returns the inverse-transformed dataset

# See ?minmax for an example of transformation
k-fold partitioning of a dataset using a sampling method.

k_fold(obj, data, k)

obj | an object representing the sampling method
data | dataset to be partitioned
k | number of folds

returns a list of k data frames

# using random sampling
sample <- sample_random()
# preparing dataset into four folds
folds <- k_fold(sample, iris, 4)
# distribution of folds
tbl <- NULL
for (f in folds) {
  tbl <- rbind(tbl, table(f$Species))
}
head(tbl)
Creates a deep learning LSTM autoencoder to encode a sequence of observations. It wraps the pytorch library.

lae_encode(input_size, encoding_size, batch_size = 32, num_epochs = 50, learning_rate = 0.001)

input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate

returns a lae_encode object.

# See an example of using `lae_encode` at this [link](https://github.com/cefet-rj-dal/daltoolbox/blob/main/transf/lae_encode.ipynb)
Creates a deep learning LSTM autoencoder to encode-decode a sequence of observations. It wraps the pytorch library.

lae_encode_decode(input_size, encoding_size, batch_size = 32, num_epochs = 50, learning_rate = 0.001)

input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate

returns a lae_encode_decode object.

# See an example of using `lae_encode_decode` at this [link](https://github.com/cefet-rj-dal/daltoolbox/blob/main/transf/lae_enc_decode.ipynb)
The minmax transformation scales data to the range [0, 1].

minmax()

returns an object of class minmax

data(iris)
head(iris)
trans <- minmax()
trans <- fit(trans, iris)
tiris <- transform(trans, iris)
head(tiris)
itiris <- inverse_transform(trans, tiris)
head(itiris)
Computes the mean squared error (MSE) between actual values and forecasts of a time series.

MSE.ts(actual, prediction)

actual | real observations
prediction | predicted observations

returns a number, which is the calculated MSE
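No example accompanies this entry; a small worked sketch with hypothetical vectors:

actual <- c(1.0, 2.0, 3.0)
prediction <- c(1.1, 1.9, 3.2)
# mean of squared differences: (0.1^2 + 0.1^2 + 0.2^2) / 3 = 0.02
MSE.ts(actual, prediction)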
The outliers class uses the box-plot definition for outliers. An outlier is a value below Q1 - alpha * IQR or above Q3 + alpha * IQR, where Q1 and Q3 are the first and third quartiles and IQR = Q3 - Q1. The class removes outliers for numeric attributes. Users can set alpha to 3 to remove only extreme values.

outliers(alpha = 1.5)

alpha | boxplot outlier threshold (default 1.5, but can be 3.0 to remove extreme values)

returns an outlier object

# code for outlier removal
out_obj <- outliers() # class for outlier analysis
out_obj <- fit(out_obj, iris) # computing boundaries
iris.clean <- transform(out_obj, iris) # returning cleaned dataset
# inspection of cleaned dataset
nrow(iris.clean)
idx <- attr(iris.clean, "idx")
table(idx)
iris.outliers <- iris[idx,]
iris.outliers
This function displays a bar graph from a data frame containing x-axis categories using ggplot2.

plot_bar(data, label_x = "", label_y = "", colors = NULL, alpha = 1)

data | data.frame containing x, value, and variable
label_x | x-axis label
label_y | y-axis label
colors | color vector
alpha | level of transparency

returns a ggplot graphic

# summarizing iris dataset
data <- iris |> dplyr::group_by(Species) |> dplyr::summarize(Sepal.Length=mean(Sepal.Length))
head(data)
# plotting data
grf <- plot_bar(data, colors="blue")
plot(grf)
This function displays a boxplot graph from a data frame containing x-axis categories and numeric values using ggplot2.

plot_boxplot(data, label_x = "", label_y = "", colors = NULL, barwidth = 0.25)

data | data.frame containing x, value, and variable
label_x | x-axis label
label_y | y-axis label
colors | color vector
barwidth | width of bar

returns a ggplot graphic

grf <- plot_boxplot(iris, colors="white")
plot(grf)
This function generates boxplots grouped by a specified class label from a data frame containing numeric values using ggplot2.

plot_boxplot_class(data, class_label, label_x = "", label_y = "", colors = NULL)

data | data.frame containing x, value, and variable
class_label | name of attribute for class label
label_x | x-axis label
label_y | y-axis label
colors | color vector

returns a ggplot graphic

grf <- plot_boxplot_class(iris |> dplyr::select(Sepal.Width, Species),
                          class_label = "Species", colors=c("red", "green", "blue"))
plot(grf)
This function generates a density plot from a data frame containing numeric values using ggplot2. If the data frame has multiple columns, densities can be grouped and plotted.

plot_density(data, label_x = "", label_y = "", colors = NULL, bin = NULL, alpha = 0.25)

data | data.frame containing x, value, and variable
label_x | x-axis label
label_y | y-axis label
colors | color vector
bin | bin width for density estimation
alpha | level of transparency

returns a ggplot graphic

grf <- plot_density(iris |> dplyr::select(Sepal.Width), colors="blue")
plot(grf)
This function generates density plots using ggplot2 grouped by a specified class label from a data frame containing numeric values.

plot_density_class(data, class_label, label_x = "", label_y = "", colors = NULL, bin = NULL, alpha = 0.5)

data | data.frame containing x, value, and variable
class_label | name of attribute for class label
label_x | x-axis label
label_y | y-axis label
colors | color vector
bin | bin width for density estimation
alpha | level of transparency

returns a ggplot graphic

grf <- plot_density_class(iris |> dplyr::select(Sepal.Width, Species),
                          class_label = "Species", colors=c("red", "green", "blue"))
plot(grf)
This function generates a grouped bar plot from a given data frame using ggplot2.

plot_groupedbar(data, label_x = "", label_y = "", colors = NULL, alpha = 1)

data | data.frame containing x, value, and variable
label_x | x-axis label
label_y | y-axis label
colors | color vector
alpha | level of transparency

returns a ggplot graphic

# summarizing iris dataset
data <- iris |> dplyr::group_by(Species) |>
  dplyr::summarize(Sepal.Length=mean(Sepal.Length), Sepal.Width=mean(Sepal.Width))
head(data)
# plotting data
grf <- plot_groupedbar(data, colors=c("blue", "red"))
plot(grf)
This function generates a histogram from a specified data frame using ggplot2.

plot_hist(data, label_x = "", label_y = "", color = "white", alpha = 0.25)

data | data.frame containing x, value, and variable
label_x | x-axis label
label_y | y-axis label
color | color vector
alpha | transparency level

returns a ggplot graphic

grf <- plot_hist(iris |> dplyr::select(Sepal.Width), color=c("blue"))
plot(grf)
This function creates a lollipop chart using ggplot2.

plot_lollipop(data, label_x = "", label_y = "", colors = NULL, color_text = "black", size_text = 3, size_ball = 8, alpha_ball = 0.2, min_value = 0, max_value_gap = 1)

data | data.frame containing x, value, and variable
label_x | x-axis label
label_y | y-axis label
colors | color vector
color_text | color of text inside ball
size_text | size of text inside ball
size_ball | size of ball
alpha_ball | transparency of ball
min_value | minimum value
max_value_gap | maximum value gap

returns a ggplot graphic

# summarizing iris dataset
data <- iris |> dplyr::group_by(Species) |> dplyr::summarize(Sepal.Length=mean(Sepal.Length))
head(data)
# plotting data
grf <- plot_lollipop(data, colors="blue", max_value_gap=0.2)
plot(grf)
This function creates a pie chart using ggplot2.

plot_pieplot(data, label_x = "", label_y = "", colors = NULL, textcolor = "white", bordercolor = "black")

data | data.frame containing x, value, and variable
label_x | x-axis label
label_y | y-axis label
colors | color vector
textcolor | text color
bordercolor | border color

returns a ggplot graphic

# summarizing iris dataset
data <- iris |> dplyr::group_by(Species) |> dplyr::summarize(Sepal.Length=mean(Sepal.Length))
head(data)
# plotting data
grf <- plot_pieplot(data, colors=c("red", "green", "blue"))
plot(grf)
This function creates a scatter plot using ggplot2.

plot_points(data, label_x = "", label_y = "", colors = NULL)

data | data.frame containing x, value, and variable
label_x | x-axis label
label_y | y-axis label
colors | color vector

returns a ggplot graphic

x <- seq(0, 10, 0.25)
data <- data.frame(x, sin=sin(x), cosine=cos(x)+5)
head(data)
grf <- plot_points(data, colors=c("red", "green"))
plot(grf)
This function creates a radar chart using ggplot2.

plot_radar(data, label_x = "", label_y = "", colors = NULL)

data | data.frame containing x, value, and variable
label_x | x-axis label
label_y | y-axis label
colors | color vector

returns a ggplot graphic

data <- data.frame(name = "Petal.Length", value = mean(iris$Petal.Length))
data <- rbind(data, data.frame(name = "Petal.Width", value = mean(iris$Petal.Width)))
data <- rbind(data, data.frame(name = "Sepal.Length", value = mean(iris$Sepal.Length)))
data <- rbind(data, data.frame(name = "Sepal.Width", value = mean(iris$Sepal.Width)))
grf <- plot_radar(data, colors="red") + ggplot2::ylim(0, NA)
plot(grf)
This function creates a scatter plot using ggplot2.

plot_scatter(data, label_x = "", label_y = "", colors = NULL)

data | data.frame containing x, value, and variable
label_x | x-axis label
label_y | y-axis label
colors | color vector

returns a ggplot graphic

grf <- plot_scatter(iris |> dplyr::select(x = Sepal.Length, value = Sepal.Width, variable = Species),
                    label_x = "Sepal.Length", label_y = "Sepal.Width",
                    colors=c("red", "green", "blue"))
plot(grf)
This function creates a time series plot using ggplot2.

plot_series(data, label_x = "", label_y = "", colors = NULL)

data | data.frame containing x, value, and variable
label_x | x-axis label
label_y | y-axis label
colors | color vector

returns a ggplot graphic

x <- seq(0, 10, 0.25)
data <- data.frame(x, sin=sin(x))
head(data)
grf <- plot_series(data, colors=c("red"))
plot(grf)
This function creates a stacked bar chart using ggplot2.

plot_stackedbar(data, label_x = "", label_y = "", colors = NULL, alpha = 1)

data | data.frame containing x, value, and variable
label_x | x-axis label
label_y | y-axis label
colors | color vector
alpha | level of transparency

returns a ggplot graphic

# summarizing iris dataset
data <- iris |> dplyr::group_by(Species) |>
  dplyr::summarize(Sepal.Length=mean(Sepal.Length), Sepal.Width=mean(Sepal.Width))
# plotting data
grf <- plot_stackedbar(data, colors=c("blue", "red"))
plot(grf)
This function plots a time series chart with points and a line using ggplot2.

plot_ts(x = NULL, y, label_x = "", label_y = "", color = "black")

x | input variable
y | output variable
label_x | x-axis label
label_y | y-axis label
color | color for time series

returns a ggplot graphic

x <- seq(0, 10, 0.25)
data <- data.frame(x, sin=sin(x))
head(data)
grf <- plot_ts(x = data$x, y = data$sin, color=c("red"))
plot(grf)
This function plots a time series chart with three lines: the original series, the adjusted series, and the predicted series using ggplot2.

plot_ts_pred(x = NULL, y, yadj, ypred = NULL, label_x = "", label_y = "", color = "black", color_adjust = "blue", color_prediction = "green")

x | time index
y | time series
yadj | adjustment of time series
ypred | prediction of the time series
label_x | x-axis title
label_y | y-axis title
color | color for the time series
color_adjust | color for the adjusted values
color_prediction | color for the predictions

returns a ggplot graphic

data(sin_data)
ts <- ts_data(sin_data$y, 0)
ts_head(ts, 3)
samp <- ts_sample(ts, test_size = 5)
io_train <- ts_projection(samp$train)
io_test <- ts_projection(samp$test)
model <- ts_arima()
model <- fit(model, x=io_train$input, y=io_train$output)
adjust <- predict(model, io_train$input)
prediction <- predict(model, x=io_test$input, steps_ahead=5)
prediction <- as.vector(prediction)
yvalues <- c(io_train$output, io_test$output)
grf <- plot_ts_pred(y=yvalues, yadj=adjust, ypred=prediction)
plot(grf)
Ancestor class for regression and classification. It provides the basis for fit and predict methods; the action method proxies to predict. An example of a learner is a decision tree (cla_dtree).

predictor()

returns a predictor object

# See ?cla_dtree for a classification example using a decision tree
Computes the R-squared (R2) between actual values and forecasts of a time series.

R2.ts(actual, prediction)

actual | real observations
prediction | predicted observations

returns a number, which is the calculated R2
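No example accompanies this entry; a small worked sketch with hypothetical vectors, assuming the standard definition R2 = 1 - SSE/SST:

actual <- c(1.0, 2.0, 3.0)
prediction <- c(1.1, 1.9, 3.2)
# SSE = 0.06 and SST = 2, so R2 = 1 - 0.06/2 = 0.97
R2.ts(actual, prediction)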
Creates a regression object that uses the Decision Tree method for regression. It wraps the tree library.

reg_dtree(attribute)

attribute | attribute target to model building

returns a decision tree regression object

data(Boston)
model <- reg_dtree("medv")
# preparing dataset for random sampling
sr <- sample_random()
sr <- train_test(sr, Boston)
train <- sr$train
test <- sr$test
model <- fit(model, train)
test_prediction <- predict(model, test)
test_predictand <- test[,"medv"]
test_eval <- evaluate(model, test_predictand, test_prediction)
test_eval$metrics
Creates a regression object that uses the K-Nearest Neighbors (KNN) method for regression.
reg_knn(attribute, k)
attribute | target attribute for model building
k | number of neighbors
returns a knn regression object
data(Boston)
model <- reg_knn("medv", k = 3)
# preparing dataset for random sampling
sr <- sample_random()
sr <- train_test(sr, Boston)
train <- sr$train
test <- sr$test
model <- fit(model, train)
test_prediction <- predict(model, test)
test_predictand <- test[,"medv"]
test_eval <- evaluate(model, test_predictand, test_prediction)
test_eval$metrics
Creates a regression object that uses the Multi-Layer Perceptron (MLP) method. It wraps the nnet library.
reg_mlp(attribute, size = NULL, decay = 0.05, maxit = 1000)
attribute | target attribute for model building
size | number of neurons in the hidden layer
decay | weight decay parameter
maxit | maximum number of iterations for training
returns an object of class reg_mlp
data(Boston)
model <- reg_mlp("medv", size = 5, decay = 0.54)
# preparing dataset for random sampling
sr <- sample_random()
sr <- train_test(sr, Boston)
train <- sr$train
test <- sr$test
model <- fit(model, train)
test_prediction <- predict(model, test)
test_predictand <- test[,"medv"]
test_eval <- evaluate(model, test_predictand, test_prediction)
test_eval$metrics
Creates a regression object that uses the Random Forest method. It wraps the randomForest library.
reg_rf(attribute, nodesize = 1, ntree = 10, mtry = NULL)
attribute | target attribute for model building
nodesize | node size
ntree | number of trees
mtry | number of attributes used to build each tree
returns an object of class reg_rf
data(Boston)
model <- reg_rf("medv", ntree = 10)
# preparing dataset for random sampling
sr <- sample_random()
sr <- train_test(sr, Boston)
train <- sr$train
test <- sr$test
model <- fit(model, train)
test_prediction <- predict(model, test)
test_predictand <- test[,"medv"]
test_eval <- evaluate(model, test_predictand, test_prediction)
test_eval$metrics
Creates a regression object that uses the Support Vector Machine (SVM) method for regression. It wraps the svm function from the e1071 library.
reg_svm(attribute, epsilon = 0.1, cost = 10, kernel = "radial")
attribute | target attribute for model building
epsilon | parameter that controls the width of the epsilon-insensitive margin around the regression function
cost | parameter that controls the trade-off between model simplicity and tolerance of errors on the training data
kernel | the type of kernel function used by the SVM algorithm (linear, radial, polynomial, sigmoid)
returns a SVM regression object
data(Boston)
model <- reg_svm("medv", epsilon = 0.2, cost = 40)
# preparing dataset for random sampling
sr <- sample_random()
sr <- train_test(sr, Boston)
train <- sr$train
test <- sr$test
model <- fit(model, train)
test_prediction <- predict(model, test)
test_predictand <- test[,"medv"]
test_eval <- evaluate(model, test_predictand, test_prediction)
test_eval$metrics
Creates an object for tuning regression models
reg_tune(base_model, folds = 10)
base_model | base model for tuning
folds | number of folds for cross-validation
returns a reg_tune object.
# preparing dataset for random sampling
data(Boston)
sr <- sample_random()
sr <- train_test(sr, Boston)
train <- sr$train
test <- sr$test
# hyperparameter setup
tune <- reg_tune(reg_mlp("medv"))
ranges <- list(size = c(3), decay = c(0.1, 0.5))
# hyperparameter optimization
model <- fit(tune, train, ranges)
test_prediction <- predict(model, test)
test_predictand <- test[,"medv"]
test_eval <- evaluate(model, test_predictand, test_prediction)
test_eval$metrics
Ancestor class for regression problems. This ancestor class is used to define and manage the target attribute for regression tasks.
regression(attribute)
attribute | target attribute for model building
returns a regression object
#See ?reg_dtree for a regression example using a decision tree
Creates a deep learning stacked autoencoder to encode a sequence of observations. The autoencoder layers are based on the DAL Toolbox vanilla autoencoder. It wraps the pytorch library.
sae_encode(
  input_size,
  encoding_size,
  batch_size = 32,
  num_epochs = 1000,
  learning_rate = 0.001,
  k = 3
)
input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate
k | number of AE layers in the stack
returns a sae_encode object.
#See example at https://nbviewer.org/github/cefet-rj-dal/daltoolbox-examples
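For a quick local sketch in addition to the notebook above: the following assumes a working pytorch backend and the fit/transform pattern used by the other transforms in this manual; num_epochs is reduced purely for speed.
data(sin_data)
# sliding windows of size 5 are the encoder input
dat <- as.data.frame(ts_data(sin_data$y, 5))
enc <- sae_encode(input_size = 5, encoding_size = 3, num_epochs = 100)
enc <- fit(enc, dat)
# the encoded representation has encoding_size (3) columns
encoded <- transform(enc, dat)
head(encoded)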
Creates a deep learning stacked autoencoder to encode and decode a sequence of observations. The autoencoder layers are based on the DAL Toolbox vanilla autoencoder. It wraps the pytorch library.
sae_encode_decode(
  input_size,
  encoding_size,
  batch_size = 32,
  num_epochs = 1000,
  learning_rate = 0.001,
  k = 3
)
input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate
k | number of AE layers in the stack
returns a sae_encode_decode object.
#See example at https://nbviewer.org/github/cefet-rj-dal/daltoolbox-examples
The sample_random function in R is used to generate a random sample of specified size from a given data set.
sample_random()
returns an object of class sample_random
# using random sampling
sample <- sample_random()
tt <- train_test(sample, iris)
# distribution of train
table(tt$train$Species)
# preparing dataset into four folds
folds <- k_fold(sample, iris, 4)
# distribution of folds
tbl <- NULL
for (f in folds) {
  tbl <- rbind(tbl, table(f$Species))
}
head(tbl)
The sample_stratified function generates a stratified random sample from a given dataset. Stratified sampling is used when the population is divided into non-overlapping subgroups (strata) and a sample is selected from each stratum, so that the overall sample is representative of the entire population while the variability within each stratum is minimized.
sample_stratified(attribute)
attribute | attribute used to define the strata
returns an object of class sample_stratified
# using stratified sampling
sample <- sample_stratified("Species")
tt <- train_test(sample, iris)
# distribution of train
table(tt$train$Species)
# preparing dataset into four folds
folds <- k_fold(sample, iris, 4)
# distribution of folds
tbl <- NULL
for (f in folds) {
  tbl <- rbind(tbl, table(f$Species))
}
head(tbl)
Selects the optimal hyperparameters from a dataset resulting from k-fold cross-validation
select_hyper(obj, hyperparameters)
obj | the object or model used for hyperparameter selection
hyperparameters | dataset with the hyperparameter configurations and the quality measure from their execution
returns the index of the selected hyperparameter configuration
Selects the optimal hyperparameter by maximizing the average classification metric. It wraps the dplyr library.
## S3 method for class 'cla_tune' select_hyper(obj, hyperparameters)
obj | an object representing the model or tuning process
hyperparameters | a dataframe with the evaluated hyperparameter configurations and their classification metric
returns the key (index) of the optimal hyperparameter configuration
Identifies the optimal hyperparameters by minimizing the error from a dataset of hyperparameters. The function selects the hyperparameter configuration that results in the lowest average error. It wraps the dplyr library.
## S3 method for class 'ts_tune' select_hyper(obj, hyperparameters)
obj | a ts_tune object
hyperparameters | hyperparameters dataset
returns the key (index) of the optimal hyperparameter configuration
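To picture the selection logic with a toy table (the column names here are illustrative, not the package's internal schema), the winning configuration is simply the row with the lowest cross-validated error:
# one row per evaluated configuration, with its average error
hyperparameters <- data.frame(key = 1:3,
                              nhid = c(1, 3, 5),
                              error = c(0.42, 0.31, 0.38))
# the selected key is the one minimizing the error (key 2 here)
hyperparameters$key[which.min(hyperparameters$error)]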
The set_params function assigns the given parameters to the corresponding attributes of the object.
set_params(obj, params)
obj | object of class dal_base
params | parameters to set in obj
returns an object with parameters set
obj <- set_params(dal_base(), list(x = 0))
Default method for set_params, which returns the object unchanged.
## Default S3 method: set_params(obj, params)
obj | object
params | parameters
returns the object unchanged
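For instance, an object without a specific set_params method falls through to this default and comes back untouched (a minimal sketch):
# a plain list has no dedicated set_params method, so it is returned as-is
obj <- set_params(list(a = 1), list(x = 0))
obj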
Synthetic dataset of sine function.
x: time values from 0 to 10.
y: dependent variable for time series modeling.
data(sin_data)
A data.frame.
This dataset was generated for examples.
data(sin_data)
head(sin_data)
Computes the symmetric mean absolute percent error (sMAPE).
sMAPE.ts(actual, prediction)
actual | real observations
prediction | predicted observations
returns the sMAPE between the actual and prediction vectors
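Analogous to R2.ts, it can be applied directly to two numeric vectors (toy values invented for the example):
actual <- c(1.2, 2.3, 3.1, 4.8, 5.4)
prediction <- c(1.0, 2.5, 3.0, 5.0, 5.2)
# values close to 0 indicate small symmetric percentage errors
sMAPE.ts(actual, prediction)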
Smoothing is a statistical technique used to reduce the noise in a signal or a dataset by removing the high-frequency components. The smoothing level is associated with the number of bins used. There are alternative methods to establish the smoothing: equal interval, equal frequency, and clustering.
smoothing(n)
n | number of bins
returns an object of class smoothing
data(iris)
obj <- smoothing_inter(n = 2)
obj <- fit(obj, iris$Sepal.Length)
sl.bi <- transform(obj, iris$Sepal.Length)
table(sl.bi)
obj$interval
entro <- evaluate(obj, as.factor(names(sl.bi)), iris$Species)
entro$entropy
Uses clustering method to perform data smoothing. The input vector is divided into clusters using the k-means algorithm. The mean of each cluster is then calculated and used as the smoothed value for all observations within that cluster.
smoothing_cluster(n)
n | number of bins
returns an object of class smoothing_cluster
data(iris)
obj <- smoothing_cluster(n = 2)
obj <- fit(obj, iris$Sepal.Length)
sl.bi <- transform(obj, iris$Sepal.Length)
table(sl.bi)
obj$interval
entro <- evaluate(obj, as.factor(names(sl.bi)), iris$Species)
entro$entropy
The smoothing_freq function smooths data using equal-frequency binning: observations are partitioned into bins containing approximately the same number of values, and each observation is replaced by a summary of its bin.
smoothing_freq(n)
n | number of bins
returns an object of class smoothing_freq
data(iris)
obj <- smoothing_freq(n = 2)
obj <- fit(obj, iris$Sepal.Length)
sl.bi <- transform(obj, iris$Sepal.Length)
table(sl.bi)
obj$interval
entro <- evaluate(obj, as.factor(names(sl.bi)), iris$Species)
entro$entropy
The smoothing_inter function smooths data using equal-interval binning: the value range is divided into bins of equal width, and each observation is replaced by a summary of its bin.
smoothing_inter(n)
n | number of bins
returns an object of class smoothing_inter
data(iris)
obj <- smoothing_inter(n = 2)
obj <- fit(obj, iris$Sepal.Length)
sl.bi <- transform(obj, iris$Sepal.Length)
table(sl.bi)
obj$interval
entro <- evaluate(obj, as.factor(names(sl.bi)), iris$Species)
entro$entropy
Partitions a dataset into training and test sets using a specified sampling method.
train_test(obj, data, perc = 0.8, ...)
obj | an object of a class that supports the train_test method
data | dataset to be partitioned
perc | a numeric value between 0 and 1 specifying the proportion of data used for training
... | additional optional arguments passed to specific methods
returns a list with two elements:
train: A data frame containing the training set
test: A data frame containing the test set
# using random sampling
sample <- sample_random()
tt <- train_test(sample, iris)
# distribution of train
table(tt$train$Species)
Splits a dataset into training and test sets based on k-fold cross-validation. The function takes a list of data partitions (folds) and a specified fold index k. It returns the data corresponding to the k-th fold as the test set, and combines all other folds to form the training set.
train_test_from_folds(folds, k)
folds | data partitioned into folds
k | index of the fold used as the test set; all remaining folds form the training set
returns a list with two elements:
train: A data frame containing the combined data from all folds except the k-th fold, used as the training set.
test: A data frame corresponding to the k-th fold, used as the test set.
# Create k-fold partitions of a dataset (e.g., iris)
folds <- k_fold(sample_random(), iris, k = 5)
# Use the first fold as the test set and combine the remaining folds for the training set
train_test_split <- train_test_from_folds(folds, k = 1)
# Display the training set
head(train_test_split$train)
# Display the test set
head(train_test_split$test)
Defines a transformation method.
transform(obj, ...)
obj | a transformation object
... | optional arguments
returns the transformed data
#See ?minmax for an example of transformation
Creates a time series prediction object that uses the AutoRegressive Integrated Moving Average (ARIMA). It wraps the forecast library.
ts_arima()
returns a ts_arima object.
data(sin_data)
ts <- ts_data(sin_data$y, 0)
ts_head(ts, 3)
samp <- ts_sample(ts, test_size = 5)
io_train <- ts_projection(samp$train)
io_test <- ts_projection(samp$test)
model <- ts_arima()
model <- fit(model, x = io_train$input, y = io_train$output)
prediction <- predict(model, x = io_test$input[1,], steps_ahead = 5)
prediction <- as.vector(prediction)
output <- as.vector(io_test$output)
ev_test <- evaluate(model, output, prediction)
ev_test
Creates a time series prediction object that uses a 1D convolutional neural network (Conv1D). It wraps the pytorch library.
ts_conv1d(preprocess = NA, input_size = NA, epochs = 10000L)
preprocess | normalization
input_size | input size for the machine learning model
epochs | maximum number of epochs
returns a ts_conv1d object.
#See an example of using ts_conv1d at https://github.com/cefet-rj-dal/daltoolbox/blob/main/timeseries/ts_conv1d.ipynb
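For readers who want a runnable sketch in addition to the notebook, the following mirrors the train/predict pattern of the other ts_* models in this manual; it assumes a working pytorch backend, and epochs is lowered from its default purely for speed.
data(sin_data)
ts <- ts_data(sin_data$y, 10)
samp <- ts_sample(ts, test_size = 5)
io_train <- ts_projection(samp$train)
io_test <- ts_projection(samp$test)
model <- ts_conv1d(ts_norm_gminmax(), input_size = 4, epochs = 300L)
model <- fit(model, x = io_train$input, y = io_train$output)
prediction <- as.vector(predict(model, x = io_test$input[1,], steps_ahead = 5))
output <- as.vector(io_test$output)
evaluate(model, output, prediction)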
Time series data structure used in DAL Toolbox. It receives a vector (representing a time series) or a matrix y (representing sliding windows). Internally, a ts_data object is a matrix of sliding windows of size sw. If sw equals zero, the time series is stored as a single matrix column.
ts_data(y, sw = 1)
y | output variable
sw | integer: sliding window size
returns a ts_data object.
data(sin_data)
head(sin_data)
data <- ts_data(sin_data$y)
ts_head(data)
data10 <- ts_data(sin_data$y, 10)
ts_head(data10)
Creates a time series prediction object that uses the Extreme Learning Machine (ELM). It wraps the elmNNRcpp library.
ts_elm(preprocess = NA, input_size = NA, nhid = NA, actfun = "purelin")
preprocess | normalization
input_size | input size for the machine learning model
nhid | number of hidden neurons
actfun | activation function; possible values: 'sig', 'radbas', 'tribas', 'relu', 'purelin' (default)
returns a ts_elm object.
data(sin_data)
ts <- ts_data(sin_data$y, 10)
ts_head(ts, 3)
samp <- ts_sample(ts, test_size = 5)
io_train <- ts_projection(samp$train)
io_test <- ts_projection(samp$test)
model <- ts_elm(ts_norm_gminmax(), input_size = 4, nhid = 3, actfun = "purelin")
model <- fit(model, x = io_train$input, y = io_train$output)
prediction <- predict(model, x = io_test$input[1,], steps_ahead = 5)
prediction <- as.vector(prediction)
output <- as.vector(io_test$output)
ev_test <- evaluate(model, output, prediction)
ev_test
Returns the first n observations from a ts_data object.
ts_head(x, n = 6L, ...)
x | a ts_data object
n | number of rows to return
... | optional arguments
returns the first n observations of a ts_data object
data(sin_data)
data10 <- ts_data(sin_data$y, 10)
ts_head(data10)
Creates a prediction object that uses the K-Nearest Neighbors (KNN) method for time series regression.
ts_knn(preprocess = NA, input_size = NA, k = NA)
preprocess | normalization
input_size | input size for the machine learning model
k | number of neighbors
returns a ts_knn object.
data(sin_data)
ts <- ts_data(sin_data$y, 10)
ts_head(ts, 3)
samp <- ts_sample(ts, test_size = 5)
io_train <- ts_projection(samp$train)
io_test <- ts_projection(samp$test)
model <- ts_knn(ts_norm_gminmax(), input_size = 4, k = 3)
model <- fit(model, x = io_train$input, y = io_train$output)
prediction <- predict(model, x = io_test$input[1,], steps_ahead = 5)
prediction <- as.vector(prediction)
output <- as.vector(io_test$output)
ev_test <- evaluate(model, output, prediction)
ev_test
Creates a time series prediction object that uses a Long Short-Term Memory (LSTM) network. It wraps the pytorch library.
ts_lstm(preprocess = NA, input_size = NA, epochs = 10000L)
preprocess | normalization
input_size | input size for the machine learning model
epochs | maximum number of epochs
returns a ts_lstm object.
#See an example of using ts_lstm at https://github.com/cefet-rj-dal/daltoolbox/blob/main/timeseries/ts_lstm.ipynb
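The same hedged sketch as for ts_conv1d applies here, swapping in the LSTM constructor (pytorch backend required; epochs reduced for speed).
data(sin_data)
ts <- ts_data(sin_data$y, 10)
samp <- ts_sample(ts, test_size = 5)
io_train <- ts_projection(samp$train)
io_test <- ts_projection(samp$test)
model <- ts_lstm(ts_norm_gminmax(), input_size = 4, epochs = 300L)
model <- fit(model, x = io_train$input, y = io_train$output)
prediction <- as.vector(predict(model, x = io_test$input[1,], steps_ahead = 5))
output <- as.vector(io_test$output)
evaluate(model, output, prediction)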
Creates a time series prediction object that uses the Multilayer Perceptron (MLP). It wraps the nnet library.
ts_mlp(preprocess = NA, input_size = NA, size = NA, decay = 0.01, maxit = 1000)
preprocess | normalization
input_size | input size for the machine learning model
size | number of neurons in the hidden layer
decay | decay parameter for the MLP
maxit | maximum number of iterations
returns a ts_mlp object.
data(sin_data)
ts <- ts_data(sin_data$y, 10)
ts_head(ts, 3)
samp <- ts_sample(ts, test_size = 5)
io_train <- ts_projection(samp$train)
io_test <- ts_projection(samp$test)
model <- ts_mlp(ts_norm_gminmax(), input_size = 4, size = 4, decay = 0)
model <- fit(model, x = io_train$input, y = io_train$output)
prediction <- predict(model, x = io_test$input[1,], steps_ahead = 5)
prediction <- as.vector(prediction)
output <- as.vector(io_test$output)
ev_test <- evaluate(model, output, prediction)
ev_test
Transforms data to a common scale while taking into account changes in the statistical properties of the data over time.
ts_norm_an(remove_outliers = TRUE, nw = 0)
remove_outliers | logical: if TRUE (default), outliers are removed
nw | integer: window size
returns a ts_norm_an object.
# time series to normalize
data(sin_data)
# convert to sliding windows
ts <- ts_data(sin_data$y, 10)
ts_head(ts, 3)
summary(ts[,10])
# normalization
preproc <- ts_norm_an()
preproc <- fit(preproc, ts)
tst <- transform(preproc, ts)
ts_head(tst, 3)
summary(tst[,10])
This function calculates the differences between consecutive values of a time series.
ts_norm_diff(remove_outliers = TRUE)
remove_outliers | logical: if TRUE (default), outliers are removed
returns a ts_norm_diff object.
# time series to normalize
data(sin_data)
# convert to sliding windows
ts <- ts_data(sin_data$y, 10)
ts_head(ts, 3)
summary(ts[,10])
# normalization
preproc <- ts_norm_diff()
preproc <- fit(preproc, ts)
tst <- transform(preproc, ts)
ts_head(tst, 3)
summary(tst[,9])
Creates a normalization object for time series data using an Exponential Moving Average (EMA) method. This normalization approach adapts to changes in the time series and optionally removes outliers.
ts_norm_ean(remove_outliers = TRUE, nw = 0)
remove_outliers | logical: if TRUE (default), outliers are removed
nw | window size
returns a ts_norm_ean object.
# time series to normalize
data(sin_data)
# convert to sliding windows
ts <- ts_data(sin_data$y, 10)
ts_head(ts, 3)
summary(ts[,10])
# normalization
preproc <- ts_norm_ean()
preproc <- fit(preproc, ts)
tst <- transform(preproc, ts)
ts_head(tst, 3)
summary(tst[,10])
Rescales data, so the minimum value is mapped to 0 and the maximum value is mapped to 1.
ts_norm_gminmax(remove_outliers = TRUE)
remove_outliers | logical: if TRUE (default), outliers are removed
returns a ts_norm_gminmax object.
# time series to normalize
data(sin_data)
# convert to sliding windows
ts <- ts_data(sin_data$y, 10)
ts_head(ts, 3)
summary(ts[,10])
# normalization
preproc <- ts_norm_gminmax()
preproc <- fit(preproc, ts)
tst <- transform(preproc, ts)
ts_head(tst, 3)
summary(tst[,10])
The ts_norm_swminmax function creates an object for normalizing a time series based on the "sliding window min-max scaling" method.
ts_norm_swminmax(remove_outliers = TRUE)
remove_outliers | logical: if TRUE (default), outliers are removed
returns a ts_norm_swminmax object.
# time series to normalize
data(sin_data)
# convert to sliding windows
ts <- ts_data(sin_data$y, 10)
ts_head(ts, 3)
summary(ts[,10])
# normalization
preproc <- ts_norm_swminmax()
preproc <- fit(preproc, ts)
tst <- transform(preproc, ts)
ts_head(tst, 3)
summary(tst[,10])
Separates a ts_data object into input and output components for time series analysis. This function is useful for preparing data for modeling, where the input and output variables are extracted from a time series dataset.
ts_projection(ts)
ts | matrix or data.frame containing the time series
returns a ts_projection object.
# setting up a ts_data
data(sin_data)
ts <- ts_data(sin_data$y, 10)
io <- ts_projection(ts)
# input data
ts_head(io$input)
# output data
ts_head(io$output)
Time series regression directly from a time series. Ancestor class for non-sliding-window implementations.
ts_reg()
returns a ts_reg object
#See ?ts_arima for an example using Auto-regressive Integrated Moving Average
Time series regression from sliding windows. Ancestor class for machine learning implementations.
ts_regsw(preprocess = NA, input_size = NA)
preprocess | normalization
input_size | input size for the machine learning model
returns a ts_regsw object
#See ?ts_elm for an example using Extreme Learning Machine
Creates a time series prediction object that uses the Random Forest. It wraps the randomForest library.
ts_rf(preprocess = NA, input_size = NA, nodesize = 1, ntree = 10, mtry = NULL)
preprocess | normalization
input_size | input size for the machine learning model
nodesize | node size
ntree | number of trees
mtry | number of attributes used to build each tree
returns a ts_rf object.
data(sin_data)
ts <- ts_data(sin_data$y, 10)
ts_head(ts, 3)
samp <- ts_sample(ts, test_size = 5)
io_train <- ts_projection(samp$train)
io_test <- ts_projection(samp$test)
model <- ts_rf(ts_norm_gminmax(), input_size = 4, nodesize = 3, ntree = 50)
model <- fit(model, x = io_train$input, y = io_train$output)
prediction <- predict(model, x = io_test$input[1,], steps_ahead = 5)
prediction <- as.vector(prediction)
output <- as.vector(io_test$output)
ev_test <- evaluate(model, output, prediction)
ev_test
Separates a ts_data object into training and test sets. The test set is taken from the last observations, shifted back by an offset; the offset is important to allow replication under different recent origins. The training set uses the remaining rows: the number of rows of the ts_data minus the test size and the offset.
ts_sample(ts, test_size = 1, offset = 0)
ts | time series
test_size | integer: size of the test data (default = 1)
offset | integer: starting point (default = 0)
returns a list with the two samples
# setting up a ts_data
data(sin_data)
ts <- ts_data(sin_data$y, 10)
# separating into train and test
test_size <- 3
samp <- ts_sample(ts, test_size)
# first five rows from training data
ts_head(samp$train, 5)
# last five rows from training data
ts_head(samp$train[-c(1:(nrow(samp$train)-5)),])
# testing data
ts_head(samp$test)
Creates a time series prediction object that uses the Support Vector Machine (SVM). It wraps the e1071 library.
ts_svm(
  preprocess = NA,
  input_size = NA,
  kernel = "radial",
  epsilon = 0,
  cost = 10
)
preprocess | normalization
input_size | input size for the machine learning model
kernel | SVM kernel (linear, radial, polynomial, sigmoid)
epsilon | error threshold
cost | parameter that controls the trade-off between achieving a low error on the training data and minimizing model complexity
returns a ts_svm object.
data(sin_data)
ts <- ts_data(sin_data$y, 10)
ts_head(ts, 3)
samp <- ts_sample(ts, test_size = 5)
io_train <- ts_projection(samp$train)
io_test <- ts_projection(samp$test)
model <- ts_svm(ts_norm_gminmax(), input_size = 4)
model <- fit(model, x = io_train$input, y = io_train$output)
prediction <- predict(model, x = io_test$input[1,], steps_ahead = 5)
prediction <- as.vector(prediction)
output <- as.vector(io_test$output)
ev_test <- evaluate(model, output, prediction)
ev_test
Creates a ts_tune object for tuning the hyperparameters of a time series model. This function sets up a tuning process for the specified base model by exploring different hyperparameter configurations using cross-validation.
ts_tune(input_size, base_model, folds = 10)
input_size | input size for the machine learning model
base_model | base model for tuning
folds | number of folds for cross-validation
returns a ts_tune object
data(sin_data)
ts <- ts_data(sin_data$y, 10)
ts_head(ts, 3)
samp <- ts_sample(ts, test_size = 5)
io_train <- ts_projection(samp$train)
io_test <- ts_projection(samp$test)
tune <- ts_tune(input_size = c(3:5), base_model = ts_elm(ts_norm_gminmax()))
ranges <- list(nhid = 1:5, actfun = c('purelin'))
# generic model tuning
model <- fit(tune, x = io_train$input, y = io_train$output, ranges)
prediction <- predict(model, x = io_test$input[1,], steps_ahead = 5)
prediction <- as.vector(prediction)
output <- as.vector(io_test$output)
ev_test <- evaluate(model, output, prediction)
ev_test
Creates a deep learning variational autoencoder to encode a sequence of observations. It wraps the pytorch library.
varae_encode(
  input_size,
  encoding_size,
  batch_size = 32,
  num_epochs = 1000,
  learning_rate = 0.001
)
input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate
returns a varae_encode object.
#See an example of using varae_encode at https://github.com/cefet-rj-dal/daltoolbox/blob/main/transf/varae_encode.ipynb
Creates a deep learning variational autoencoder to encode and decode a sequence of observations. It wraps the pytorch library.
varae_encode_decode(
  input_size,
  encoding_size,
  batch_size = 32,
  num_epochs = 1000,
  learning_rate = 0.001
)
input_size | input size
encoding_size | encoding size
batch_size | size for batch learning
num_epochs | number of epochs for training
learning_rate | learning rate
returns a varae_encode_decode object.
#See an example of using varae_encode_decode at https://github.com/cefet-rj-dal/daltoolbox/blob/main/transf/varae_enc_decode.ipynb
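A minimal encode-decode sketch under the same assumptions as the stacked autoencoder example earlier (pytorch backend, fit/transform pattern, reduced num_epochs):
data(sin_data)
dat <- as.data.frame(ts_data(sin_data$y, 5))
vae <- varae_encode_decode(input_size = 5, encoding_size = 3, num_epochs = 100)
vae <- fit(vae, dat)
# transform returns the reconstruction, with the same width as the input
reconstruction <- transform(vae, dat)
head(reconstruction)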
Scale data using z-score normalization.
zscore(nmean = 0, nsd = 1)
nmean | new mean for normalized data
nsd | new standard deviation for normalized data
returns the z-score transformation object
data(iris)
head(iris)
trans <- zscore()
trans <- fit(trans, iris)
tiris <- transform(trans, iris)
head(tiris)
itiris <- inverse_transform(trans, tiris)
head(itiris)