Title: | Asymptotically Bias-Corrected Regularized Linear Discriminant Analysis |
---|---|
Description: | Offers methods to perform asymptotically bias-corrected regularized linear discriminant analysis (ABC_RLDA) for cost-sensitive binary classification. The bias correction is an estimate of the bias term added to regularized linear discriminant analysis (RLDA) that minimizes the overall risk. By default the two misclassification costs are equal, each set to 0.5; the package can also fix them at predetermined values or treat them as hyperparameters to tune. A. Zollanvari, M. Abdirash, A. Dadlani and B. Abibullaev (2019) <doi:10.1109/LSP.2019.2918485>. |
Authors: | Dmitriy Fedorov [aut, cre], Amin Zollanvari [aut], Aresh Dadlani [aut], Berdakh Abibullaev [aut] |
Maintainer: | Dmitriy Fedorov <[email protected]> |
License: | GPL-3 |
Version: | 1.0.3 |
Built: | 2025-01-30 07:35:25 UTC |
Source: | CRAN |
Constructs an asymptotically bias-corrected regularized linear discriminant analysis (ABC-RLDA) classifier.
abcrlda(x, y, gamma = 1, cost = c(0.5, 0.5), bias_correction = TRUE)
x |
Input matrix or data.frame of dimension |
y |
A numeric vector or factor of class labels. Factor should have either two levels or be
a vector with two distinct values.
If |
gamma |
Regularization parameter
Formulas and derivations for parameters used in above equation can be found in the article under reference section. |
cost |
Parameter that controls the overall misclassification costs.
This is a vector of length 1 or 2. If a single value is provided, it should lie strictly between 0 and 1; as the examples show, cost = 0.75 is then equivalent to cost = c(0.75, 0.25). |
bias_correction |
Takes in a boolean value.
If |
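The examples for this function treat cost = 0.75, cost = c(0.75, 0.25) and cost = c(3, 1) as equivalent. A minimal sketch of a normalization that would produce this behaviour is given below. The helper name normalize_cost is hypothetical and not part of the package, and dividing by the sum is an assumption that is merely consistent with those examples.

```r
# Hypothetical helper (not part of the package): map a cost argument
# of length 1 or 2 to a pair of costs that sums to 1.
normalize_cost <- function(cost) {
  if (length(cost) == 1) {
    # A single value must lie strictly between 0 and 1;
    # the second cost is taken to be its complement.
    stopifnot(cost > 0, cost < 1)
    cost <- c(cost, 1 - cost)
  }
  stopifnot(length(cost) == 2, all(cost > 0))
  # Assumption: only the ratio of the two costs matters.
  cost / sum(cost)
}

normalize_cost(0.75)
normalize_cost(c(0.75, 0.25))
normalize_cost(c(3, 1))
```

Under this assumption all three calls return the same pair, which matches the claim in the examples that the three models give identical predictions.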
An object of class "abcrlda" is returned which can be used for class prediction (see predict()).
a |
Coefficient vector of a discriminant hyperplane: W(x) = a' x + m. |
m |
Intercept of discriminant hyperplane: W(x) = a'x + m. |
cost |
Vector of cost values that are used to construct ABC-RLDA. |
ncost |
Normalized cost such that |
gamma |
Regularization parameter value used in ABC_RLDA discriminant function. |
lev |
Levels corresponding to the labels in y. |
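Because a and m define the linear score W(x) = a'x + m, the score can be evaluated by hand. The sketch below uses made-up values for a, m and lev (they are not output of abcrlda()), and it assumes that a positive score maps to the first level; the sign convention actually used by predict() is not stated here.

```r
# Made-up coefficients for illustration; with a fitted model these
# would be read from model$a, model$m and model$lev.
a   <- c(1, -2)
m   <- 0.5
lev <- c("class0", "class1")

# Two new observations, one per row.
newx <- matrix(c(3, 1,
                 0, 2), ncol = 2, byrow = TRUE)

# Linear discriminant score W(x) = a'x + m for each row.
w <- as.vector(newx %*% a + m)

# Assumption: W(x) > 0 is assigned to the first level.
pred <- factor(ifelse(w > 0, lev[1], lev[2]), levels = lev)
pred
```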
A. Zollanvari, M. Abdirash, A. Dadlani and B. Abibullaev, "Asymptotically Bias-Corrected Regularized Linear Discriminant Analysis for Cost-Sensitive Binary Classification," in IEEE Signal Processing Letters, vol. 26, no. 9, pp. 1300-1304, Sept. 2019. doi: 10.1109/LSP.2019.2918485 URL: https://ieeexplore.ieee.org/document/8720003
Other functions in the package: cross_validation(), da_risk_estimator(), grid_search(), predict.abcrlda(), risk_calculate()
data(iris)
train_data <- iris[which(iris[, ncol(iris)] == "virginica" |
                         iris[, ncol(iris)] == "versicolor"), 1:4]
train_label <- factor(iris[which(iris[, ncol(iris)] == "virginica" |
                                 iris[, ncol(iris)] == "versicolor"), 5])
model <- abcrlda(train_data, train_label, gamma = 0.5, cost = 0.75)
a <- predict(model, train_data)
# same params but more explicit
model <- abcrlda(train_data, train_label, gamma = 0.5, cost = c(0.75, 0.25))
b <- predict(model, train_data)
# same class costs ratio
model <- abcrlda(train_data, train_label, gamma = 0.5, cost = c(3, 1))
c <- predict(model, train_data)
# all three models give the same predictions
all(a == b & a == c & b == c)
# [1] TRUE
This function implements Cross Validation for separate sampling adjusted for cost.
cross_validation(x, y, gamma = 1, cost = c(0.5, 0.5), nfolds = 10, bias_correction = TRUE)
x |
Input matrix or data.frame of dimension |
y |
A numeric vector or factor of class labels. Factor should have either two levels or be
a vector with two distinct values.
If |
gamma |
Regularization parameter
Formulas and derivations for parameters used in above equation can be found in the article under reference section. |
cost |
Parameter that controls the overall misclassification costs.
This is a vector of length 1 or 2. If a single value is provided, it should lie strictly between 0 and 1; a single value such as 0.75 is then treated like the pair c(0.75, 0.25), as in the abcrlda() examples. |
nfolds |
Number of folds to use with cross-validation. Default is 10.
In case of imbalanced data, |
bias_correction |
Takes in a boolean value.
If |
Returns a list of parameters.
risk_cross |
Returns risk estimation where |
e_0 |
Error estimate for class 0. |
e_1 |
Error estimate for class 1. |
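Since the function reports a separate error estimate for each class, one plausible reading of cross-validation under separate sampling is that folds are formed within each class rather than over the pooled data. The sketch below only illustrates such a per-class fold assignment in base R; the label vector is made up and the procedure is an assumption, not a description of the package internals.

```r
set.seed(1)  # make the random fold assignment reproducible

# Made-up imbalanced labels: 50 observations in one class, 20 in the other.
y <- factor(rep(c("versicolor", "virginica"), times = c(50, 20)))
nfolds <- 10

# Assign fold ids separately within each class so that every fold
# contains observations from both classes.
fold_id <- integer(length(y))
for (lv in levels(y)) {
  idx <- which(y == lv)
  fold_id[idx] <- sample(rep(seq_len(nfolds), length.out = length(idx)))
}

# Rows are classes, columns are folds.
table(y, fold_id)
```

With this layout the number of folds cannot usefully exceed the size of the smaller class, since some folds would then contain no observation from it.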
Braga-Neto, Ulisses & Zollanvari, Amin & Dougherty, Edward. (2014). Cross-Validation Under Separate Sampling: Strong Bias and How to Correct It. Bioinformatics (Oxford, England). 30. 10.1093/bioinformatics/btu527. URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4296143/pdf/btu527.pdf
Other functions in the package: abcrlda(), da_risk_estimator(), grid_search(), predict.abcrlda(), risk_calculate()
data(iris)
train_data <- iris[which(iris[, ncol(iris)] == "virginica" |
                         iris[, ncol(iris)] == "versicolor"), 1:4]
train_label <- factor(iris[which(iris[, ncol(iris)] == "virginica" |
                                 iris[, ncol(iris)] == "versicolor"), 5])
cross_validation(train_data, train_label, gamma = 10)
This function implements the generalized (double asymptotic) consistent estimator of risk.
da_risk_estimator(object)
object |
An object of class "abcrlda". |
Calculates risk based on estimated class error rates and misclassification costs
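As a rough illustration of combining class error rates with misclassification costs, the sketch below computes a cost-weighted sum of two error rates. The function name weighted_risk is hypothetical, the pairing of the first cost with the class 0 error is an assumption, and the package's own formula (which may, for example, also involve class priors) is given in the referenced article rather than here.

```r
# Hypothetical helper (not part of the package): cost-weighted sum of
# the two class-wise error rates.
weighted_risk <- function(e0, e1, cost) {
  stopifnot(length(cost) == 2, all(cost > 0))
  # Assumption: cost[1] weights the class 0 error, cost[2] the class 1 error.
  cost[1] * e0 + cost[2] * e1
}

# Example with made-up error rates of 10% and 20%.
weighted_risk(0.10, 0.20, c(0.75, 0.25))
# 0.75 * 0.10 + 0.25 * 0.20 = 0.125
```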
A. Zollanvari, M. Abdirash, A. Dadlani and B. Abibullaev, "Asymptotically Bias-Corrected Regularized Linear Discriminant Analysis for Cost-Sensitive Binary Classification," in IEEE Signal Processing Letters, vol. 26, no. 9, pp. 1300-1304, Sept. 2019. doi: 10.1109/LSP.2019.2918485 URL: https://ieeexplore.ieee.org/document/8720003
Other functions in the package: abcrlda(), cross_validation(), grid_search(), predict.abcrlda(), risk_calculate()
data(iris)
train_data <- iris[which(iris[, ncol(iris)] == "virginica" |
                         iris[, ncol(iris)] == "versicolor"), 1:4]
train_label <- factor(iris[which(iris[, ncol(iris)] == "virginica" |
                                 iris[, ncol(iris)] == "versicolor"), 5])
model <- abcrlda(train_data, train_label, gamma = 0.5, cost = 0.75)
da_risk_estimator(model)
Performs grid search to estimate the optimal hyperparameters (gamma and cost) within a specified space, based on double asymptotic risk estimation or cross-validation.
Double asymptotic risk estimation is more efficient to compute because it uses closed form for risk estimation.
For further details, refer to the article in the reference section.
Separate sampling cross-validation (see cross-validation function) was adapted to work with cost-based risk estimation.
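The search itself amounts to evaluating a risk estimate at every (gamma, cost) pair and keeping the pair with the lowest value. The sketch below shows that pattern in base R with a toy risk function; risk_fn is a hypothetical stand-in for either risk estimator and has no statistical meaning.

```r
# Toy stand-in for a risk estimate (hypothetical): smallest at
# gamma = 10 and cost = 0.3.
risk_fn <- function(gamma, cost) (log10(gamma) - 1)^2 + (cost - 0.3)^2

gamma_range <- c(0.1, 1, 10, 100, 1000)
cost_range  <- seq(0.1, 0.9, by = 0.2)

# Every combination of the candidate values.
grid <- expand.grid(gamma = gamma_range, cost = cost_range)

# Evaluate the risk at each grid point and keep the minimizer.
grid$risk <- mapply(risk_fn, grid$gamma, grid$cost)
best <- grid[which.min(grid$risk), ]
best
```

A closed-form estimator makes each evaluation cheap, whereas a cross-validated estimate refits the classifier once per fold at every grid point.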
grid_search(x, y, range_gamma, range_cost, method = "estimator", nfolds = 10, bias_correction = TRUE)
x |
Input matrix or data.frame of dimension |
y |
A numeric vector or factor of class labels. Factor should have either two levels or be
a vector with two distinct values.
If |
range_gamma |
Vector of |
range_cost |
nobs x 1 vector (values should be between 0 and 1) or
nobs x 2 matrix (each row is cost pair value c( |
method |
Selects the method used to evaluate risk: "estimator" or "cross". |
nfolds |
Number of folds to use with cross-validation. Default is 10.
In case of imbalanced data, |
bias_correction |
Takes in a boolean value.
If |
List of estimated parameters.
cost |
Cost value for which risk estimates are lowest during the search. |
gamma |
Gamma regularization parameter for which risk estimates are lowest during the search. |
risk |
Lowest risk value estimated during grid search. |
A. Zollanvari, M. Abdirash, A. Dadlani and B. Abibullaev, "Asymptotically Bias-Corrected Regularized Linear Discriminant Analysis for Cost-Sensitive Binary Classification," in IEEE Signal Processing Letters, vol. 26, no. 9, pp. 1300-1304, Sept. 2019. doi: 10.1109/LSP.2019.2918485 URL: https://ieeexplore.ieee.org/document/8720003
Braga-Neto, Ulisses & Zollanvari, Amin & Dougherty, Edward. (2014). Cross-Validation Under Separate Sampling: Strong Bias and How to Correct It. Bioinformatics (Oxford, England). 30. 10.1093/bioinformatics/btu527. URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4296143/pdf/btu527.pdf
Other functions in the package: abcrlda(), cross_validation(), da_risk_estimator(), predict.abcrlda(), risk_calculate()
data(iris)
train_data <- iris[which(iris[, ncol(iris)] == "virginica" |
                         iris[, ncol(iris)] == "versicolor"), 1:4]
train_label <- factor(iris[which(iris[, ncol(iris)] == "virginica" |
                                 iris[, ncol(iris)] == "versicolor"), 5])
# search over single cost values with the closed-form estimator
cost_range <- seq(0.1, 0.9, by = 0.2)
gamma_range <- c(0.1, 1, 10, 100, 1000)
gs <- grid_search(train_data, train_label,
                  range_gamma = gamma_range,
                  range_cost = cost_range,
                  method = "estimator")
model <- abcrlda(train_data, train_label, gamma = gs$gamma, cost = gs$cost)
predict(model, train_data)
# search over cost pairs (one pair per row) with cross-validation
cost_range <- matrix(1:10, ncol = 2)
gamma_range <- c(0.1, 1, 10, 100, 1000)
gs <- grid_search(train_data, train_label,
                  range_gamma = gamma_range,
                  range_cost = cost_range,
                  method = "cross")
model <- abcrlda(train_data, train_label, gamma = gs$gamma, cost = gs$cost)
predict(model, train_data)
Classifies observations based on a given abcrlda object.
## S3 method for class 'abcrlda'
predict(object, newx, ...)
object |
An object of class "abcrlda". |
newx |
Matrix of new values for x at which predictions are to be made. |
... |
Argument used by generic function predict(object, x, ...). |
Returns a factor vector with predictions (i.e., assigned labels) for each observation. Factor levels are inherited from the object argument.
A. Zollanvari, M. Abdirash, A. Dadlani and B. Abibullaev, "Asymptotically Bias-Corrected Regularized Linear Discriminant Analysis for Cost-Sensitive Binary Classification," in IEEE Signal Processing Letters, vol. 26, no. 9, pp. 1300-1304, Sept. 2019. doi: 10.1109/LSP.2019.2918485 URL: https://ieeexplore.ieee.org/document/8720003
Other functions in the package: abcrlda(), cross_validation(), da_risk_estimator(), grid_search(), risk_calculate()
data(iris)
train_data <- iris[which(iris[, ncol(iris)] == "virginica" |
                         iris[, ncol(iris)] == "versicolor"), 1:4]
train_label <- factor(iris[which(iris[, ncol(iris)] == "virginica" |
                                 iris[, ncol(iris)] == "versicolor"), 5])
model <- abcrlda(train_data, train_label, gamma = 0.5, cost = 0.75)
a <- predict(model, train_data)
# same params but more explicit
model <- abcrlda(train_data, train_label, gamma = 0.5, cost = c(0.75, 0.25))
b <- predict(model, train_data)
# same class costs ratio
model <- abcrlda(train_data, train_label, gamma = 0.5, cost = c(3, 1))
c <- predict(model, train_data)
# all three models give the same predictions
all(a == b & a == c & b == c)
# [1] TRUE
Estimates risk and error by applying a constructed classifier (an object of class abcrlda) to a given set of observations.
risk_calculate(object, x_true, y_true)
object |
An object of class "abcrlda". |
x_true |
Matrix of values for x for which true class labels are known. |
y_true |
A numeric vector or factor of true class labels. Factor should have either two levels or be a vector with two distinct values.
If |
A list of parameters where
actual_err0 |
Error rate for class 0. |
actual_err1 |
Error rate for class 1. |
actual_errTotal |
Error rate overall. |
actual_normrisk |
Risk value normalized to be between 0 and 1. |
actual_risk |
Risk value without normalization. |
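The class-wise and overall error rates in this list can be computed from true and predicted labels alone. The sketch below does so in base R on made-up label vectors; the variable names are illustrative, and treating the first factor level as class 0 is an assumption.

```r
# Made-up labels: 4 observations of class "a", 6 of class "b".
y_true <- factor(c("a", "a", "a", "a", "b", "b", "b", "b", "b", "b"))
y_pred <- factor(c("a", "a", "a", "b", "b", "b", "b", "a", "a", "b"),
                 levels = levels(y_true))
lev <- levels(y_true)

# Error rate within each class: share of that class's observations
# that received the other label (first level treated as class 0).
err0 <- mean(y_pred[y_true == lev[1]] != lev[1])
err1 <- mean(y_pred[y_true == lev[2]] != lev[2])

# Overall error rate across all observations.
err_total <- mean(y_pred != y_true)

c(err0 = err0, err1 = err1, err_total = err_total)
```

Here one of four class "a" observations and two of six class "b" observations are misclassified, so the overall rate (3 of 10) differs from both class-wise rates when the classes are imbalanced.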
Other functions in the package: abcrlda(), cross_validation(), da_risk_estimator(), grid_search(), predict.abcrlda()
data(iris)
train_data <- iris[which(iris[, ncol(iris)] == "virginica" |
                         iris[, ncol(iris)] == "versicolor"), 1:4]
train_label <- factor(iris[which(iris[, ncol(iris)] == "virginica" |
                                 iris[, ncol(iris)] == "versicolor"), 5])
model <- abcrlda(train_data, train_label, gamma = 0.5, cost = 0.75)
risk_calculate(model, train_data, train_label)