Package 'wconf' reference manual

Title:	Weighted Confusion Matrix
Description:	Allows users to create weighted confusion matrices and accuracy metrics that support model selection in classification problems where distance from the correct category is important. The package includes several weighting schemes which can be parameterized, as well as custom configuration options. Users can also decide whether the weights applied to the confusion matrix affect the accuracy score positively or negatively. Functions are included to calculate accuracy metrics for imbalanced data. Finally, 'wconf' integrates well with the 'caret' package, but it can also work standalone when provided data in matrix form. References: Kuhn, M. (2008) "Building Predictive Models in R Using the caret Package" <doi:10.18637/jss.v028.i05>; Monahov, A. (2021) "Model Evaluation with Weighted Threshold Optimization (and the mewto R package)" <doi:10.2139/ssrn.3805911>; Monahov, A. (2024) "Improved Accuracy Metrics for Classification with Imbalanced Data and Where Distance from the Truth Matters, with the wconf R Package" <doi:10.2139/ssrn.4802336>; Starovoitov, V., Golub, Y. (2020) "New Function for Estimating Imbalanced Data Classification Results", Pattern Recognition and Image Analysis, 295–302; Van de Velden, M., Iodice D'Enza, A., Markos, A., Cavicchia, C. (2023) "A general framework for implementing distances for categorical variables" <doi:10.48550/arXiv.2301.02190>.
Authors:	Alexandru Monahov [aut, cre, cph]
Maintainer:	Alexandru Monahov
License:	CC BY-SA 4.0
Version:	1.2.0
Built:	2025-01-30 07:41:36 UTC
Source:	CRAN

Starovoitov-Golub Sine-Accuracy Function for Imbalanced Classification Data

Description

This function calculates classification accuracy scores using the sine-based formulas proposed by Starovoitov and Golub (2020). Compared with the standard balanced accuracy function, the method is intended to produce improved results by taking the class distribution of errors into account.

Usage

balancedaccuracy(m, print.scores = TRUE)

Arguments

`m`	a caret confusion matrix object or a simple matrix.
`print.scores`	logical; if TRUE (the default), the accuracy metrics are printed.
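When no caret object is at hand, a simple matrix can be built from label vectors in base R. The sketch below is illustrative only: the label vectors and the name `m_from_labels` are hypothetical, and the orientation (predictions in rows, true classes in columns) is an assumption that follows caret's convention rather than anything stated in this manual.

```r
# Hypothetical label vectors; the levels are fixed so that every class
# gets both a row and a column, even if it is never predicted.
classes   <- c("low", "mid", "high")
actual    <- factor(c("low", "low", "mid", "mid", "high", "high"), levels = classes)
predicted <- factor(c("low", "low", "low", "mid", "mid", "high"), levels = classes)

# Assumed orientation: rows = predicted class, columns = actual class.
tab <- table(Predicted = predicted, Actual = actual)

# Drop the "table" class so that a plain square matrix of counts remains.
m_from_labels <- matrix(as.vector(tab), nrow = nrow(tab), dimnames = dimnames(tab))
m_from_labels
```

The resulting object is an ordinary 3x3 matrix of counts, which is the kind of input the "simple matrix" option refers to.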

Details

The input object "m" should be a square matrix of at least size 2x2.
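For orientation, the standard (non-sine) baseline that this function is compared against can be computed by hand. The following is a minimal sketch in base R, not the package's implementation: it checks the shape requirement stated above and computes plain accuracy and the usual balanced accuracy (mean per-class recall). It assumes that columns hold the true classes and rows the predictions; the variable names are illustrative.

```r
# Example confusion matrix used throughout this manual (filled column by column).
m <- matrix(c(70, 0, 0, 10, 10, 0, 5, 3, 2), ncol = 3, nrow = 3)

# Guard mirroring the requirement above: a square matrix of at least 2x2.
stopifnot(is.matrix(m), nrow(m) == ncol(m), nrow(m) >= 2)

# Assumption: columns are true classes, so column sums are class sizes.
recall_per_class <- diag(m) / colSums(m)

# Plain accuracy: correctly classified observations over all observations.
standard_accuracy <- sum(diag(m)) / sum(m)            # 0.82

# Standard balanced accuracy: unweighted mean of the per-class recalls.
balanced_accuracy_baseline <- mean(recall_per_class)  # about 0.567

standard_accuracy
balanced_accuracy_baseline
```

The gap between the two numbers shows how the dominant first class inflates plain accuracy, which is the situation the sine-based scores are designed for.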

Value

a list containing 5 elements: 3 overall and 2 class accuracy scores

Author(s)

Alexandru Monahov, <https://www.alexandrumonahov.eu.org/>

Examples

# 3x3 confusion matrix, filled column by column
m <- matrix(c(70, 0, 0, 10, 10, 0, 5, 3, 2), ncol = 3, nrow = 3)
balancedaccuracy(m, print.scores = TRUE)

Redistributed confusion matrix

Description

This function calculates the redistributed confusion matrix from a caret confusionMatrix object or a simple matrix and optionally prints the redistributed standard accuracy score. The redistributed confusion matrix places significance on observations close to the diagonal by applying a custom weighting scheme that transfers a proportion of the non-diagonal observations to the diagonal.

Usage

rconfusionmatrix(m, custom.weights = NA,
                        print.weighted.accuracy = FALSE)

Arguments

`m`	the caret confusion matrix object or simple matrix.
`custom.weights`	the vector of custom weights to be applied. Its length should be equal to "n"; it can be longer, in which case the excess values are ignored. The first element is also ignored, because it represents the weighting applied to the diagonal: under redistribution, a proportion of the non-diagonal observations is shifted towards the diagonal, so the weighting applied to the diagonal depends on the weights assigned to the non-diagonal elements and is not configurable by the user.
`print.weighted.accuracy`	logical; if TRUE, prints the standard accuracy metric for the redistributed matrix, that is, the sum of the correctly classified observations (the diagonal elements of the matrix) divided by the total number of observations.

Details

The number of categories "n" should be greater than or equal to 2.
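The idea of redistribution can be sketched in base R. This is one plausible reading of the description, not the package's code: the helper `redistribute_sketch` is hypothetical, and it assumes that the share of an off-diagonal cell given by the weight at its distance from the diagonal is moved to the diagonal cell of the same column. The actual rule used by rconfusionmatrix may differ.

```r
# Hypothetical helper: move a weighted share of each off-diagonal cell
# to the diagonal cell of the same column (an assumed redistribution rule).
redistribute_sketch <- function(m, w) {
  n <- nrow(m)
  out <- m
  for (j in seq_len(n)) {
    for (i in seq_len(n)) {
      if (i != j) {
        # w[1] belongs to the diagonal and is never used, as noted above.
        moved <- m[i, j] * w[abs(i - j) + 1]
        out[i, j] <- out[i, j] - moved
        out[j, j] <- out[j, j] + moved
      }
    }
  }
  out
}

m <- matrix(c(70, 0, 0, 10, 10, 0, 5, 3, 2), ncol = 3, nrow = 3)
r <- redistribute_sketch(m, w = c(0, 0.5, 0.25))

# Standard accuracy of the redistributed matrix: diagonal over total.
redistributed_accuracy <- sum(diag(r)) / sum(r)  # 0.8975 under this rule
r
redistributed_accuracy
```

Under this rule the total number of observations is unchanged; only their position relative to the diagonal shifts, which is why the standard accuracy formula still applies.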

Value

an nxn redistributed confusion matrix

Author(s)

Alexandru Monahov, <https://www.alexandrumonahov.eu.org/>

Examples

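# 3x3 confusion matrix, filled column by column
m <- matrix(c(70, 0, 0, 10, 10, 0, 5, 3, 2), ncol = 3, nrow = 3)
# The first weight (diagonal) is ignored; 0.5 and 0.25 apply at distances 1 and 2
rconfusionmatrix(m, custom.weights = c(0, 0.5, 0.25),
                 print.weighted.accuracy = TRUE)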

Weighted confusion matrix

Description

This function calculates the weighted confusion matrix from a caret confusionMatrix object or a simple matrix, according to one of several weighting schemes, and optionally prints the weighted accuracy score.

Usage

wconfusionmatrix(m, weight.type = "arithmetic",
                        weight.penalty = FALSE,
                        standard.deviation = 2,
                        geometric.multiplier = 2,
                        interval.high=1, interval.low = -1,
                        sin.high=1.5*pi, sin.low = 0.5*pi,
                        tanh.decay = 3,
                        custom.weights = NA,
                        print.weighted.accuracy = FALSE)

Arguments

`m`	the caret confusion matrix object or simple matrix.
`weight.type`	the weighting scheme to be used. Can be one of: "arithmetic" - a decreasing arithmetic progression weighting scheme, "geometric" - a decreasing geometric progression weighting scheme, "normal" - weights drawn from the right tail of a normal distribution, "interval" - weights contained on a user-defined interval, "sin" - a weighting scheme based on a sine function, "tanh" - a weighting scheme based on a hyperbolic tangent function, "custom" - custom weight vector defined by the user.
`weight.penalty`	determines whether the weights associated with non-diagonal elements generated by the "normal", "arithmetic" and "geometric" weight types are positive or negative values. By default, the value is set to FALSE, which means that generated weights will be positive values.
`standard.deviation`	standard deviation of the normal distribution, if the normal distribution weighting schema is used.
`geometric.multiplier`	the multiplier used to construct the geometric progression series, if the geometric progression weighting scheme is used.
`interval.high`	the upper bound of the weight interval, if the interval weighting scheme is used.
`interval.low`	the lower bound of the weight interval, if the interval weighting scheme is used.
`sin.high`	the upper bound of the segment of the sine function to be used in the weighting scheme.
`sin.low`	the lower bound of the segment of the sine function to be used in the weighting scheme.
`tanh.decay`	the decay factor of the hyperbolic tangent weighting function. Higher values increase the rate of decay and place less weight on observations farther away from the correctly predicted category.
`custom.weights`	the vector of custom weights to be applied, if the custom weighting scheme was selected. The length of the vector should be equal to "n"; it can be longer, in which case the excess values are ignored.
`print.weighted.accuracy`	logical; if TRUE, prints the weighted accuracy metric, which represents the sum of all weighted confusion matrix cells divided by the total number of observations.

Details

The number of categories "n" should be greater than or equal to 2.
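The weighted accuracy described under `print.weighted.accuracy` can be traced by hand for a custom weight vector. The sketch below uses base R only and is not the package's implementation; it assumes that the weight of a cell depends only on its distance from the diagonal and that the weight matrix is multiplied element-wise with the confusion matrix. The variable names are illustrative.

```r
m <- matrix(c(70, 0, 0, 10, 10, 0, 5, 3, 2), ncol = 3, nrow = 3)
w <- c(1, 0.1, 0)  # weights for distance 0 (diagonal), 1 and 2
n <- nrow(m)

# Distance of every cell from the diagonal: 0 on it, 1 next to it, and so on.
distance <- abs(outer(seq_len(n), seq_len(n), "-"))

# Assumed weight matrix: the weight of a cell is looked up by its distance.
W <- matrix(w[distance + 1], nrow = n)

# Element-wise product, then the sum of all weighted cells over the total count.
weighted_m <- m * W
weighted_accuracy <- sum(weighted_m) / sum(m)  # 0.833 with these weights

weighted_m
weighted_accuracy
```

With these weights the 82 correct observations count in full and the 13 observations one category away count at 10%, which gives (82 + 1.3) / 100.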

Value

an nxn weighted confusion matrix

Author(s)

Alexandru Monahov, <https://www.alexandrumonahov.eu.org/>

Examples

# 3x3 confusion matrix, filled column by column
m <- matrix(c(70, 0, 0, 10, 10, 0, 5, 3, 2), ncol = 3, nrow = 3)
# One call per built-in weighting scheme, using the default parameters
wconfusionmatrix(m, weight.type = "arithmetic", print.weighted.accuracy = TRUE)
wconfusionmatrix(m, weight.type = "geometric", print.weighted.accuracy = TRUE)
wconfusionmatrix(m, weight.type = "interval", print.weighted.accuracy = TRUE)
wconfusionmatrix(m, weight.type = "normal", print.weighted.accuracy = TRUE)
wconfusionmatrix(m, weight.type = "sin", print.weighted.accuracy = TRUE)
wconfusionmatrix(m, weight.type = "tanh", print.weighted.accuracy = TRUE)
# User-defined weights for distances 0, 1 and 2 from the diagonal
wconfusionmatrix(m, weight.type = "custom", custom.weights = c(1, 0.1, 0),
                 print.weighted.accuracy = TRUE)

Weight matrix

Description

This function compiles a weight matrix according to one of several weighting schemas and allows users to visualize the impact of the weight matrix on each element of the confusion matrix.

Usage

weightmatrix(n, weight.type = "arithmetic", weight.penalty = FALSE,
                    standard.deviation = 2,
                    geometric.multiplier = 2,
                    interval.high = 1, interval.low = -1,
                    sin.high = 1.5 * pi, sin.low = 0.5 * pi,
                    tanh.decay = 3,
                    custom.weights = NA,
                    plot.weights = FALSE)

Arguments

`n`	the number of classes contained in the confusion matrix.
`weight.type`	the weighting scheme to be used. Can be one of: "arithmetic" - a decreasing arithmetic progression weighting scheme, "geometric" - a decreasing geometric progression weighting scheme, "normal" - weights drawn from the right tail of a normal distribution, "interval" - weights contained on a user-defined interval, "sin" - a weighting scheme based on a sine function, "tanh" - a weighting scheme based on a hyperbolic tangent function, "custom" - custom weight vector defined by the user.
`weight.penalty`	determines whether the weights associated with non-diagonal elements generated by the "normal", "arithmetic" and "geometric" weight types are positive or negative values. By default, the value is set to FALSE, which means that generated weights will be positive values.
`standard.deviation`	standard deviation of the normal distribution, if the normal distribution weighting schema is used.
`geometric.multiplier`	the multiplier used to construct the geometric progression series, if the geometric progression weighting scheme is used.
`interval.high`	the upper bound of the weight interval, if the interval weighting scheme is used.
`interval.low`	the lower bound of the weight interval, if the interval weighting scheme is used.
`sin.high`	the upper bound of the segment of the sine function to be used in the weighting scheme.
`sin.low`	the lower bound of the segment of the sine function to be used in the weighting scheme.
`tanh.decay`	the decay factor of the hyperbolic tangent weighting function. Higher values increase the rate of decay and place less weight on observations farther away from the correctly predicted category.
`custom.weights`	the vector of custom weights to be applied, if the custom weighting scheme was selected. The length of the vector should be equal to "n"; it can be longer, in which case the excess values are ignored.
`plot.weights`	logical; if TRUE, plots the weight vector, which corresponds to the first column of the weight matrix.

Details

The number of categories "n" should be greater than or equal to 2.
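The relation between a weight vector and the weight matrix can be illustrated in base R. The sketch below is not the package's code. It relies on the statement that the weight vector corresponds to the first column of the weight matrix, and it additionally assumes that the matrix is symmetric with constant diagonals and that the "arithmetic" and "interval" schemes use equally spaced weights (from 1 to 0 and from `interval.high` to `interval.low`, respectively). The exact values produced by weightmatrix may differ.

```r
n <- 4

# Assumed arithmetic scheme: equally spaced weights decreasing from 1 to 0.
arithmetic_weights <- seq(1, 0, length.out = n)

# Assumed interval scheme: equally spaced weights from the upper to the lower bound.
interval_weights <- seq(1, -0.5, length.out = n)

# toeplitz() builds a symmetric matrix whose first column is the given vector,
# so each cell's weight depends only on its distance from the diagonal.
W_arithmetic <- toeplitz(arithmetic_weights)
W_interval   <- toeplitz(interval_weights)

W_arithmetic
W_interval
```

A matrix built this way has the diagonal weight in every diagonal cell and the smallest weight in the two corners farthest from the diagonal; with a negative lower bound, as in `W_interval`, distant misclassifications reduce the weighted accuracy.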

Value

an nxn matrix, containing the weights to be multiplied with the confusion matrix.

Author(s)

Alexandru Monahov, <https://www.alexandrumonahov.eu.org/>

Examples

weightmatrix(n=4, weight.type="arithmetic", plot.weights = TRUE)
weightmatrix(n=4, weight.type="normal", standard.deviation = 1,
             plot.weights = TRUE)
weightmatrix(n=4, weight.type="interval", interval.high = 1,
             interval.low = -0.5, plot.weights = TRUE)
weightmatrix(n=4, weight.type="geometric", geometric.multiplier = 0.6)
weightmatrix(n=10, weight.type="sin", sin.low = 0, sin.high = pi,
             plot.weights = TRUE)
weightmatrix(n=10, weight.type="tanh", tanh.decay = 5, plot.weights = TRUE)
weightmatrix(n=4, weight.type="custom", custom.weights = c(1,0.2,0.1,0),
             plot.weights = TRUE)
