Title: Learn Supervised Classification Methods Through Examples and Code
Description: Supervised classification methods, which (if asked) can provide step-by-step explanations of the algorithms used, as described in PK Josephine et al. (2021) <doi:10.59176/kjcs.v1i1.1259>; and datasets to test them on, which highlight the strengths and weaknesses of each technique.
Authors: Víctor Amador Padilla [aut], Juan Jose Cuadrado Gallego [ctb]
Maintainer: Andriy Protsak Protsak <[email protected]>
License: MIT + file LICENSE
Version: 1.0.0
Built: 2025-02-17 13:32:01 UTC
Source: CRAN
Given an input value, calculates the output of the selected activation function.
act_method(method, x)
method: Activation function to be used. It must be one of the supported activation names; the examples on this page use "step", "gelu" and "swish".
x: Input value to be used in the activation function.
Formulae used:
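For reference, the standard definitions of the activations exercised in the examples below are (assuming the package follows the usual conventions):

$$step(x) = \begin{cases} 1 & x \ge 0 \\ 0 & x < 0 \end{cases} \qquad swish(x) = \frac{x}{1 + e^{-x}}$$

$$gelu(x) \approx 0.5\,x \left(1 + \tanh\!\left(\sqrt{2/\pi}\,\bigl(x + 0.044715\,x^{3}\bigr)\right)\right)$$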
The computed output of the selected activation function for the given input.
Víctor Amador Padilla, [email protected]
# example code
act_method("step", 0.3)
act_method("gelu", 0.7)
Test Database 5
db_flowers
## 'db_flowers' A data frame representing features of flowers. It has 4 independent variables (first 4 columns) and one dependent variable (last column).
Test Database 2
db_per_and
## 'db_per_and' A data frame with 3 independent variables (first 3 columns) and one dependent variable (last column). It represents a 3-input "AND" logic gate.
Test Database 3
db_per_or
## 'db_per_or' A data frame with 3 independent variables (first 3 columns) and one dependent variable (last column). It represents a 3-input "OR" logic gate.
Test Database 4
db_per_xor
## 'db_per_xor' A data frame with 3 independent variables (first 3 columns) and one dependent variable (last column). It represents a 3-input "XOR" logic gate.
Test Database 8
db_tree_struct
## 'db_tree_struct' Decision tree structure, the output of the call decision_tree(db2, "VehicleType", 4, "gini").
Test Database 1
db1rl
## 'db1rl' A data frame with 4 independent variables (first 4 columns, representing different line types). The last column is the dependent variable.
Test Database 6
db2
## 'db2' A data frame with 3 independent variables (first 3 columns) and one dependent variable (last column). It has information about vehicles.
Test Database 7
db3
## 'db3' A data frame with 3 independent variables (first 3 columns) and one dependent variable (last column). It has information about vehicles. Similar to db2 but slightly more complex.
This function creates a decision tree based on an example dataset, calculating the best possible classifier at each step. It only creates perfect divisions: if a rule does not produce a fully classified group, it is not considered. It is specifically designed for categorical values. Continuous values are not recommended, as they will be treated as categorical ones.
decision_tree(data, classy, m, method = "entropy", learn = FALSE, waiting = TRUE)
data: A data frame with already classified observations. Each column represents a parameter of the value; each row is a different observation. The column names must not contain the character sequence " or ". As this is intended as a binary decision rules generator and not a binary decision tree generator, no tree structures are used, except for the information gain formulas.
classy: Name of the column we want the data to be classified by. The set of rules obtained will be calculated according to this.
m: Maximum number of child nodes each node can have.
method: The definition of gain. It must be one of "entropy", "gini" or "error".
learn: Boolean value. If set to TRUE, multiple clarifications and explanations are printed throughout the execution.
waiting: If TRUE while learn = TRUE, the execution pauses between explanations until the user presses a key.
If data is not perfectly classifiable, the code will not finish.
Available information gain methods are:

The formula to calculate the entropy works as follows:
$$Entropy = -\sum_{i} p_i \log_2 (p_i)$$

The formula to calculate gini works as follows:
$$Gini = 1 - \sum_{i} p_i^2$$

The formula to calculate error works as follows:
$$Error = 1 - \max_i (p_i)$$

where $p_i$ is the proportion of observations of class $i$ in the node.

Once the impurity is calculated, the information gain is calculated as follows:
$$Gain = I(parent) - \sum_{j} \frac{N_j}{N}\, I(child_j)$$

where $I$ is the chosen impurity measure, $N$ is the number of observations in the parent node and $N_j$ is the number of observations in child node $j$.
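To make these formulas concrete, here is a minimal R sketch of the three impurity measures (illustrative only, not the package's internal code; the impurity helper is a hypothetical name):

# Impurity of a vector of class labels, matching the formulas above.
impurity <- function(labels, method = "entropy") {
  p <- as.numeric(table(labels)) / length(labels)  # class proportions p_i
  switch(method,
         entropy = -sum(p * log2(p)),   # -sum p_i log2(p_i)
         gini    = 1 - sum(p^2),        # 1 - sum p_i^2
         error   = 1 - max(p))          # 1 - max p_i
}
impurity(c("car", "car", "truck"), "gini")  # 1 - (2/3)^2 - (1/3)^2 = 0.444...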
Structure of the tree: a list with one element per tree level. Each level is a list with one element per node at that level, and each node is a list containing the node's filtered data, the node's id, the parent node's id, the height the node is at, the variable it filters by, the value that variable is filtered by, and the information gain of the division.
Víctor Amador Padilla, [email protected]
# example code
decision_tree(db3, "VehicleType", 5, "entropy", learn = TRUE, waiting = FALSE)
decision_tree(db2, "VehicleType", 4, "gini")
This function applies the k-nearest neighbours (knn) algorithm to classify data.
knn(data, ClassLabel, p1, d_method = "euclidean", k, p = 3, learn = FALSE, waiting = TRUE)
data: Data frame with already classified observations. Each column represents a parameter of the values. The last column contains the output, that is, the expected output for the values in the other columns. Each row is a different observation.
ClassLabel: String containing the name of the column of the classes we want to classify.
p1: Vector containing the parameters of the new value that we want to classify.
d_method: String with the name of the distance method that will be used. It must be one of the supported distance names; the examples on this page use "euclidean", "chebyshev" and "hamming".
k: Number of closest values that will be considered in order to classify the new value ("p1").
p: Exponent used in the Minkowski distance. Defaults to 3.
learn: Boolean value. If set to TRUE, multiple clarifications and explanations are printed throughout the execution.
waiting: If TRUE while learn = TRUE, the execution pauses between explanations until the user presses a key.
The classification value assigned to the new example.
Víctor Amador Padilla, [email protected]
# example code
knn(db_flowers, "ClassLabel", c(4.7, 1.2, 5.3, 2.1), "chebyshev", 4)
knn(db_flowers, "ClassLabel", c(4.7, 1.5, 5.3, 2.1), "chebyshev", 5)
knn(db_flowers, "ClassLabel", c(6.7, 1.5, 5.3, 2.1), "Euclidean", 2, learn = TRUE, waiting = FALSE)
knn(db_per_or, "y", c(1, 1, 1), "Hamming", 3, learn = TRUE, waiting = FALSE)
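For intuition, here is a minimal, self-contained k-NN sketch assuming Euclidean distance and a data frame whose last column holds the class (knn_sketch is a hypothetical name, not the package's implementation):

# Classify one new point by majority vote among its k nearest neighbours.
knn_sketch <- function(train, new_point, k = 3) {
  features <- as.matrix(train[, -ncol(train)])
  diffs <- sweep(features, 2, new_point)   # subtract new_point from each row
  d <- sqrt(rowSums(diffs^2))              # Euclidean distances
  nearest <- train[order(d)[seq_len(k)], ncol(train)]
  names(which.max(table(nearest)))         # most frequent class wins
}
knn_sketch(db_flowers, c(4.7, 1.2, 5.3, 2.1), k = 4)  # mirrors the first example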
Calculates and plots the linear regression of a given set of values, where all columns but one are independent variables and the remaining one is the dependent variable. It provides information about the process and the intermediate values used to calculate the line equation.
multivariate_linear_regression(data, learn = FALSE, waiting = TRUE)
data: Data frame with already classified observations. Each column but the last represents an independent variable; the last column is the classification (dependent) variable. Each row is a different observation.
learn: Boolean value. If set to TRUE, multiple clarifications and explanations are printed throughout the execution.
waiting: If TRUE while learn = TRUE, the execution pauses between explanations until the user presses a key.
List containing a list for each independent variable; each one contains the variable name, the intercept and the slope.
Víctor Amador Padilla, [email protected]
# example code
multivariate_linear_regression(db1rl)
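The return value suggests that each independent variable is fitted against the dependent variable separately. Under that assumption, here is a minimal sketch of the per-variable computation (simple_ols is a hypothetical helper, not package code):

# Ordinary least squares for a single predictor:
# slope = cov(x, y) / var(x); intercept = mean(y) - slope * mean(x)
simple_ols <- function(x, y) {
  slope     <- cov(x, y) / var(x)
  intercept <- mean(y) - slope * mean(x)
  c(intercept = intercept, slope = slope)
}
y <- db1rl[[ncol(db1rl)]]                       # dependent variable (last column)
lapply(db1rl[-ncol(db1rl)], simple_ols, y = y)  # one fit per independent column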
Binary classification algorithm that learns to separate two classes of data points by finding an optimal decision boundary (hyperplane) in the feature space.
perceptron(training_data, to_clasify, activation_method, max_iter, learning_rate, learn = FALSE, waiting = TRUE)
training_data: Data frame with already classified observations. Each column represents a parameter of the values. The last column contains the output, that is, the expected output for the values in the other columns. Each row is a different observation. It works as training data.
to_clasify: Vector containing the parameters of the new value that we want to classify.
activation_method: Activation function to be used. It must be one of the activations supported by act_method(); the examples on this page use "gelu" and "swish".
max_iter: Maximum number of epochs during the training phase.
learning_rate: Rate at which the perceptron adjusts its weights based on previous epochs' mistakes.
learn: Boolean value. If set to TRUE, multiple clarifications and explanations are printed throughout the execution.
waiting: If TRUE while learn = TRUE, the execution pauses between explanations until the user presses a key.
Functioning (a minimal sketch of this loop follows the list):
1. Generate a random weight for each independent variable.
2. Check whether the weights classify the training data correctly. If they do, go to step 4.
3. Adjust the weights based on the error between the expected output and the real output. If max_iter is reached, go to step 4; if not, go to step 2.
4. Return the weights and use them to classify the new value.
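A minimal sketch of that loop, assuming a step activation and a data frame whose last column is the expected output (perceptron_sketch is a hypothetical name, not the package's internal code):

perceptron_sketch <- function(train, max_iter = 1000, learning_rate = 0.1) {
  X <- as.matrix(train[, -ncol(train)])
  y <- train[[ncol(train)]]
  w <- runif(ncol(X))                            # step 1: random initial weights
  b <- runif(1)                                  # bias term
  for (epoch in seq_len(max_iter)) {
    out   <- as.numeric(X %*% w + b >= 0)        # step activation
    error <- y - out
    if (all(error == 0)) break                   # step 2: all classified, stop
    w <- w + learning_rate * colSums(error * X)  # step 3: adjust the weights
    b <- b + learning_rate * sum(error)
  }
  list(weights = w, bias = b)                    # step 4: return the weights
}
perceptron_sketch(db_per_or)  # learns the 3-input OR gate from above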
List with the weights of the inputs.
Víctor Amador Padilla, [email protected]
# example code
perceptron(db_per_or, c(1, 1, 1), "gelu", 1000, 0.1)
perceptron(db_per_and, c(0, 0, 1), "swish", 1000, 0.1, TRUE, FALSE)
Calculates and plots the polynomial regression of a given set of values, where all columns but one are independent variables and the remaining one is the dependent variable. It provides (if asked) information about the process and the intermediate values used to calculate the curve equation. The quality of the approximation depends entirely on the degree of the equations.
polynomial_regression(data, degree, learn = FALSE, waiting = TRUE)
data: Data frame with already classified observations. Each column but the last represents an independent variable; the last column is the classification (dependent) variable. Each row is a different observation.
degree: Degree of the approximating polynomial.
learn: Boolean value. If set to TRUE, multiple clarifications and explanations are printed throughout the execution.
waiting: If TRUE while learn = TRUE, the execution pauses between explanations until the user presses a key.
List containing a list for each independent variable; each one contains the equation coefficients.
Víctor Amador Padilla, [email protected]
# example code
polynomial_regression(db1rl, 4, TRUE, FALSE)
polynomial_regression(db1rl, 6)
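As a point of reference, a degree-d polynomial fit for a single independent variable can be sketched with base R's lm() and poly(); this mirrors the idea, not the package's internals (poly_fit is a hypothetical name):

# Polynomial least squares for one predictor; returns the intercept
# followed by the coefficients of x, x^2, ..., x^degree.
poly_fit <- function(x, y, degree) {
  coef(lm(y ~ poly(x, degree, raw = TRUE)))
}
y <- db1rl[[ncol(db1rl)]]                        # dependent variable (last column)
lapply(db1rl[-ncol(db1rl)], poly_fit, y = y, degree = 4)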
This function prints the structure of a tree generated by the decision_tree function.
## S3 method for class 'tree_struct'
print(x, ...)
x: The tree structure.
...: Further arguments; they are ignored.
The function must receive a tree_struct data type.
No return value; called for its side effect of printing the tree.
Víctor Amador Padilla, [email protected]
# example code
print(db_tree_struct)