Title: | Performance Measures for 'mlr3' |
---|---|
Description: | Implements multiple performance measures for supervised learning. Includes over 40 measures for regression and classification. Additionally, meta information about the performance measures can be queried, e.g. what the best and worst possible performance scores are. |
Authors: | Michel Lang [aut] , Martin Binder [ctb], Marc Becker [cre, aut] , Lona Koers [aut] |
Maintainer: | Marc Becker <[email protected]> |
License: | LGPL-3 |
Version: | 1.0.0 |
Built: | 2024-11-11 07:23:50 UTC |
Source: | CRAN |
Maintainer: Marc Becker [email protected] (ORCID)
Authors:
Michel Lang [email protected] (ORCID)
Lona Koers
Other contributors:
Martin Binder [email protected] [contributor]
Useful links:
Report bugs at https://github.com/mlr-org/mlr3measures/issues
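As a quick orientation, a minimal usage sketch (all measure functions operate directly on vectors; acc() is shown here purely for illustration):

library(mlr3measures)
# compare observed and predicted labels; returns a single numeric value
truth = factor(c("a", "b", "a", "b"))
response = factor(c("a", "a", "a", "b"))
acc(truth, response)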
Measure to compare true observed labels with predicted labels in multiclass classification tasks.
Usage: acc(truth, response, sample_weights = NULL, ...)

Arguments:
truth (factor()): true (observed) labels; must have the same levels and length as response.
response (factor()): predicted response labels; must have the same levels and length as truth.
sample_weights (numeric()): non-negative sample weights, automatically normalized to sum to 1; defaults to equal weights.
... (any): additional arguments, currently ignored.

The Classification Accuracy is defined as
$$\mathrm{ACC} = \sum_{i=1}^n w_i \cdot \mathbb{1}(t_i = r_i),$$
where $w_i$ are normalized weights for all observations $x_i$.

Value: performance value as numeric(1).

Type: "classif"
Range: $[0, 1]$
Minimize: FALSE
Required prediction: response
Other Classification Measures: bacc(), ce(), logloss(), mauc_aunu(), mbrier(), mcc(), zero_one()
Examples:
set.seed(1)
lvls = c("a", "b", "c")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
acc(truth, response)
Measure to compare true observed response with predicted response in regression tasks.
Note that this is an unaggregated measure, returning the losses per observation.
Usage: ae(truth, response, ...)

Arguments:
truth (numeric()): true (observed) response values; must have the same length as response.
response (numeric()): predicted response values; must have the same length as truth.
... (any): additional arguments, currently ignored.

Calculates the per-observation absolute error as
$$\mathrm{AE}_i = \left| t_i - r_i \right|.$$

Value: performance value as numeric(length(truth)).

Type: "regr"
Range (per observation): $[0, \infty)$
Minimize (per observation): TRUE
Required prediction: response
Other Regression Measures: ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
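This entry ships without a usage example on this page; a minimal sketch in the style of the other measures:

Examples:
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
ae(truth, response)  # returns one absolute error per observation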
Measure to compare true observed response with predicted response in regression tasks.
Note that this is an unaggregated measure, returning the losses per observation.
Usage: ape(truth, response, ...)

Arguments:
truth (numeric()): true (observed) response values; must have the same length as response.
response (numeric()): predicted response values; must have the same length as truth.
... (any): additional arguments, currently ignored.

Calculates the per-observation absolute percentage error as
$$\mathrm{APE}_i = \left| \frac{t_i - r_i}{t_i} \right|.$$

Value: performance value as numeric(length(truth)).

Type: "regr"
Range (per observation): $[0, \infty)$
Minimize (per observation): TRUE
Required prediction: response
Other Regression Measures: ae(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
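This entry also ships without a usage example; a minimal sketch:

Examples:
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
ape(truth, response)  # returns one absolute percentage error per observation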
Measure to compare true observed labels with predicted probabilities in binary classification tasks.
Usage: auc(truth, prob, positive, na_value = NaN, ...)

Arguments:
truth (factor()): true (observed) labels; must have exactly two levels and the same length as prob.
prob (numeric()): predicted probability for the positive class; must have the same length as truth.
positive (character(1)): name of the positive class.
na_value (numeric(1)): value returned if the measure is undefined for the input; default is NaN.
... (any): additional arguments, currently ignored.

Computes the area under the Receiver Operator Characteristic (ROC) curve. The AUC can be interpreted as the probability that a randomly chosen positive observation has a higher predicted probability than a randomly chosen negative observation.

This measure is undefined if the true values are either all positive or all negative.

Value: performance value as numeric(1).

Type: "binary"
Range: $[0, 1]$
Minimize: FALSE
Required prediction: prob
Youden WJ (1950). “Index for rating diagnostic tests.” Cancer, 3(1), 32–35. doi:10.1002/1097-0142(1950)3:1<32::aid-cncr2820030106>3.0.co;2-3.
Other Binary Classification Measures: bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()
Examples:
truth = factor(c("a", "a", "a", "b"))
prob = c(.6, .7, .1, .4)
auc(truth, prob, "a")
Measure to compare true observed labels with predicted labels in multiclass classification tasks.
Usage: bacc(truth, response, sample_weights = NULL, ...)

Arguments:
truth (factor()): true (observed) labels; must have the same levels and length as response.
response (factor()): predicted response labels; must have the same levels and length as truth.
sample_weights (numeric()): non-negative sample weights, automatically normalized to sum to 1; defaults to equal weights.
... (any): additional arguments, currently ignored.

The Balanced Accuracy computes the weighted balanced accuracy, suitable for imbalanced data sets. It is defined analogously to the definition in sklearn.

First, all sample weights $w_i$ are normalized per class so that each class has the same influence:
$$\hat{w}_i = \frac{w_i}{\sum_{j=1}^n w_j \cdot \mathbb{1}(t_j = t_i)}.$$
The Balanced Accuracy is then calculated as
$$\mathrm{BACC} = \frac{1}{\sum_{i=1}^n \hat{w}_i} \sum_{i=1}^n \hat{w}_i \cdot \mathbb{1}(r_i = t_i).$$
This definition is equivalent to acc() with class-balanced sample weights.

Value: performance value as numeric(1).

Type: "classif"
Range: $[0, 1]$
Minimize: FALSE
Required prediction: response
Brodersen KH, Ong CS, Stephan KE, Buhmann JM (2010). “The Balanced Accuracy and Its Posterior Distribution.” In 2010 20th International Conference on Pattern Recognition. doi:10.1109/icpr.2010.764.
Guyon I, Bennett K, Cawley G, Escalante HJ, Escalera S, Ho TK, Macia N, Ray B, Saeed M, Statnikov A, Viegas E (2015). “Design of the 2015 ChaLearn AutoML challenge.” In 2015 International Joint Conference on Neural Networks (IJCNN). doi:10.1109/ijcnn.2015.7280767.
Other Classification Measures: acc(), ce(), logloss(), mauc_aunu(), mbrier(), mcc(), zero_one()
Examples:
set.seed(1)
lvls = c("a", "b", "c")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
bacc(truth, response)
Measure to compare true observed labels with predicted probabilities in binary classification tasks.
Usage: bbrier(truth, prob, positive, sample_weights = NULL, ...)

Arguments:
truth (factor()): true (observed) labels; must have exactly two levels and the same length as prob.
prob (numeric()): predicted probability for the positive class; must have the same length as truth.
positive (character(1)): name of the positive class.
sample_weights (numeric()): non-negative sample weights, automatically normalized to sum to 1; defaults to equal weights.
... (any): additional arguments, currently ignored.

The Binary Brier Score is defined as
$$\mathrm{BBS} = \sum_{i=1}^n w_i \left( I_i - p_i \right)^2,$$
where $w_i$ are the normalized sample weights, and $I_i$ is 1 if observation $x_i$ belongs to the positive class, and 0 otherwise.

Note that this (more common) definition of the Brier score is equivalent to the original definition of the multi-class Brier score (see mbrier()) divided by 2.

Value: performance value as numeric(1).

Type: "binary"
Range: $[0, 1]$
Minimize: TRUE
Required prediction: prob
https://en.wikipedia.org/wiki/Brier_score
Brier GW (1950). “Verification of forecasts expressed in terms of probability.” Monthly Weather Review, 78(1), 1–3. doi:10.1175/1520-0493(1950)078<0001:vofeit>2.0.co;2.
Other Binary Classification Measures: auc(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()
Examples:
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
prob = runif(10)
bbrier(truth, prob, positive = "a")
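The relationship to the multi-class Brier score noted above can be checked numerically; a small sketch (the two-column probability matrix is constructed here only for illustration):

set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
p = runif(10)
prob = cbind(a = p, b = 1 - p)  # class probabilities, rows sum to 1
bbrier(truth, p, positive = "a")
mbrier(truth, prob) / 2  # identical: mbrier() is twice the binary Brier score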
Measure to compare true observed response with predicted response in regression tasks.
Usage: bias(truth, response, sample_weights = NULL, ...)

Arguments:
truth (numeric()): true (observed) response values; must have the same length as response.
response (numeric()): predicted response values; must have the same length as truth.
sample_weights (numeric()): non-negative sample weights, automatically normalized to sum to 1; defaults to equal weights.
... (any): additional arguments, currently ignored.

The Bias is defined as
$$\mathrm{BIAS} = \sum_{i=1}^n w_i \left( t_i - r_i \right),$$
where $w_i$ are normalized sample weights. Good predictions score close to 0.

Value: performance value as numeric(1).

Type: "regr"
Range: $(-\infty, \infty)$
Minimize: NA
Required prediction: response
Other Regression Measures: ae(), ape(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
Examples:
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
bias(truth, response)
Measure to compare true observed labels with predicted labels in multiclass classification tasks.
Usage: ce(truth, response, sample_weights = NULL, ...)

Arguments:
truth (factor()): true (observed) labels; must have the same levels and length as response.
response (factor()): predicted response labels; must have the same levels and length as truth.
sample_weights (numeric()): non-negative sample weights, automatically normalized to sum to 1; defaults to equal weights.
... (any): additional arguments, currently ignored.

The Classification Error is defined as
$$\mathrm{CE} = \sum_{i=1}^n w_i \cdot \mathbb{1}(t_i \neq r_i),$$
where $w_i$ are normalized weights for each observation $x_i$.

Value: performance value as numeric(1).

Type: "classif"
Range: $[0, 1]$
Minimize: TRUE
Required prediction: response
Other Classification Measures: acc(), bacc(), logloss(), mauc_aunu(), mbrier(), mcc(), zero_one()
Examples:
set.seed(1)
lvls = c("a", "b", "c")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
ce(truth, response)
Calculates the confusion matrix for a binary classification problem once and then calculates all binary confusion measures of this package.
Usage: confusion_matrix(truth, response, positive, na_value = NaN, relative = FALSE)

Arguments:
truth (factor()): true (observed) labels; must have exactly two levels and the same length as response.
response (factor()): predicted response labels; must have exactly two levels and the same length as truth.
positive (character(1)): name of the positive class.
na_value (numeric(1)): value returned for measures that are undefined for the input; default is NaN.
relative (logical(1)): if TRUE, the confusion matrix entries are returned as relative frequencies.

The binary confusion matrix is defined as
$$\begin{pmatrix} \mathrm{TP} & \mathrm{FP} \\ \mathrm{FN} & \mathrm{TN} \end{pmatrix}.$$
If relative = TRUE, all values are divided by $n$.

Value: list with two elements:
matrix: stores the calculated confusion matrix.
measures: stores the metrics as named numeric vector.
Examples:
set.seed(123)
lvls = c("a", "b")
truth = factor(sample(lvls, 20, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 20, replace = TRUE), levels = lvls)
confusion_matrix(truth, response, positive = "a")
confusion_matrix(truth, response, positive = "a", relative = TRUE)
confusion_matrix(truth, response, positive = "b")
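The two list elements can also be used programmatically; a minimal sketch (the names in the measures vector are assumed to match the binary measure ids of this package):

set.seed(123)
lvls = c("a", "b")
truth = factor(sample(lvls, 20, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 20, replace = TRUE), levels = lvls)
cm = confusion_matrix(truth, response, positive = "a")
cm$matrix                            # the 2x2 confusion matrix itself
cm$measures[c("tpr", "fpr", "ppv")]  # pick out individual metrics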
Measure to compare true observed labels with predicted labels in binary classification tasks.
Usage: dor(truth, response, positive, na_value = NaN, ...)

Arguments:
truth (factor()): true (observed) labels; must have exactly two levels and the same length as response.
response (factor()): predicted response labels; must have exactly two levels and the same length as truth.
positive (character(1)): name of the positive class.
na_value (numeric(1)): value returned if the measure is undefined for the input; default is NaN.
... (any): additional arguments, currently ignored.

The Diagnostic Odds Ratio is defined as
$$\mathrm{DOR} = \frac{\mathrm{TP} / \mathrm{FN}}{\mathrm{FP} / \mathrm{TN}}.$$
This measure is undefined if FP = 0 or FN = 0.

Value: performance value as numeric(1).

Type: "binary"
Range: $[0, \infty)$
Minimize: FALSE
Required prediction: response
https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram
Other Binary Classification Measures: auc(), bbrier(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()
Examples:
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
dor(truth, response, positive = "a")
Measure to compare true observed labels with predicted labels in binary classification tasks.
Usage: fbeta(truth, response, positive, beta = 1, na_value = NaN, ...)

Arguments:
truth (factor()): true (observed) labels; must have exactly two levels and the same length as response.
response (factor()): predicted response labels; must have exactly two levels and the same length as truth.
positive (character(1)): name of the positive class.
beta (numeric(1)): parameter giving recall beta times as much importance as precision; default is 1, resulting in balanced weights.
na_value (numeric(1)): value returned if the measure is undefined for the input; default is NaN.
... (any): additional arguments, currently ignored.

With $P$ as precision() and $R$ as recall(), the F-beta Score is defined as
$$F_\beta = (1 + \beta^2) \cdot \frac{P \cdot R}{(\beta^2 \cdot P) + R}.$$
It measures the effectiveness of retrieval with respect to a user who attaches $\beta$ times as much importance to recall as precision. For $\beta = 1$, this measure is called "F1" score.

This measure is undefined if precision or recall is undefined, i.e. TP + FP = 0 or TP + FN = 0.

Value: performance value as numeric(1).

Type: "binary"
Range: $[0, 1]$
Minimize: FALSE
Required prediction: response
Rijsbergen, Van CJ (1979). Information Retrieval, 2nd edition. Butterworth-Heinemann, Newton, MA, USA. ISBN 408709294.
Goutte C, Gaussier E (2005). “A Probabilistic Interpretation of Precision, Recall and F-Score, with Implication for Evaluation.” In Lecture Notes in Computer Science, 345–359. doi:10.1007/978-3-540-31865-1_25.
Other Binary Classification Measures: auc(), bbrier(), dor(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()
Examples:
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
fbeta(truth, response, positive = "a")
Measure to compare true observed labels with predicted labels in binary classification tasks.
Usage: fdr(truth, response, positive, na_value = NaN, ...)

Arguments:
truth (factor()): true (observed) labels; must have exactly two levels and the same length as response.
response (factor()): predicted response labels; must have exactly two levels and the same length as truth.
positive (character(1)): name of the positive class.
na_value (numeric(1)): value returned if the measure is undefined for the input; default is NaN.
... (any): additional arguments, currently ignored.

The False Discovery Rate is defined as
$$\mathrm{FDR} = \frac{\mathrm{FP}}{\mathrm{TP} + \mathrm{FP}}.$$
This measure is undefined if TP + FP = 0.

Value: performance value as numeric(1).

Type: "binary"
Range: $[0, 1]$
Minimize: TRUE
Required prediction: response
https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram
Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()
Examples:
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
fdr(truth, response, positive = "a")
Measure to compare true observed labels with predicted labels in binary classification tasks.
Usage: fn(truth, response, positive, ...)

Arguments:
truth (factor()): true (observed) labels; must have exactly two levels and the same length as response.
response (factor()): predicted response labels; must have exactly two levels and the same length as truth.
positive (character(1)): name of the positive class.
... (any): additional arguments, currently ignored.

This measure counts the false negatives (type 2 error), i.e. the number of predictions indicating a negative class label while in fact it is positive. This is sometimes also called a "miss" or an "underestimation".

Value: performance value as numeric(1).

Type: "binary"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response
https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram
Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()
Examples:
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
fn(truth, response, positive = "a")
Measure to compare true observed labels with predicted labels in binary classification tasks.
Usage: fnr(truth, response, positive, na_value = NaN, ...)

Arguments:
truth (factor()): true (observed) labels; must have exactly two levels and the same length as response.
response (factor()): predicted response labels; must have exactly two levels and the same length as truth.
positive (character(1)): name of the positive class.
na_value (numeric(1)): value returned if the measure is undefined for the input; default is NaN.
... (any): additional arguments, currently ignored.

The False Negative Rate is defined as
$$\mathrm{FNR} = \frac{\mathrm{FN}}{\mathrm{TP} + \mathrm{FN}}.$$
Also known as "miss rate".

This measure is undefined if TP + FN = 0.

Value: performance value as numeric(1).

Type: "binary"
Range: $[0, 1]$
Minimize: TRUE
Required prediction: response
https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram
Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()
Examples:
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
fnr(truth, response, positive = "a")
Measure to compare true observed labels with predicted labels in binary classification tasks.
Usage: fomr(truth, response, positive, na_value = NaN, ...)

Arguments:
truth (factor()): true (observed) labels; must have exactly two levels and the same length as response.
response (factor()): predicted response labels; must have exactly two levels and the same length as truth.
positive (character(1)): name of the positive class.
na_value (numeric(1)): value returned if the measure is undefined for the input; default is NaN.
... (any): additional arguments, currently ignored.

The False Omission Rate is defined as
$$\mathrm{FOMR} = \frac{\mathrm{FN}}{\mathrm{FN} + \mathrm{TN}}.$$
This measure is undefined if FN + TN = 0.

Value: performance value as numeric(1).

Type: "binary"
Range: $[0, 1]$
Minimize: TRUE
Required prediction: response
https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram
Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()
Examples:
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
fomr(truth, response, positive = "a")
Measure to compare true observed labels with predicted labels in binary classification tasks.
Usage: fp(truth, response, positive, ...)

Arguments:
truth (factor()): true (observed) labels; must have exactly two levels and the same length as response.
response (factor()): predicted response labels; must have exactly two levels and the same length as truth.
positive (character(1)): name of the positive class.
... (any): additional arguments, currently ignored.

This measure counts the false positives (type 1 error), i.e. the number of predictions indicating a positive class label while in fact it is negative. This is sometimes also called a "false alarm".

Value: performance value as numeric(1).

Type: "binary"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response
https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram
Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()
Examples:
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
fp(truth, response, positive = "a")
Measure to compare true observed labels with predicted labels in binary classification tasks.
Usage: fpr(truth, response, positive, na_value = NaN, ...)

Arguments:
truth (factor()): true (observed) labels; must have exactly two levels and the same length as response.
response (factor()): predicted response labels; must have exactly two levels and the same length as truth.
positive (character(1)): name of the positive class.
na_value (numeric(1)): value returned if the measure is undefined for the input; default is NaN.
... (any): additional arguments, currently ignored.

The False Positive Rate is defined as
$$\mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}}.$$
Also known as fall-out or probability of false alarm.

This measure is undefined if FP + TN = 0.

Value: performance value as numeric(1).

Type: "binary"
Range: $[0, 1]$
Minimize: TRUE
Required prediction: response
https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram
Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()
Examples:
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
fpr(truth, response, positive = "a")
Measure to compare true observed labels with predicted labels in binary classification tasks.
Usage: gmean(truth, response, positive, na_value = NaN, ...)

Arguments:
truth (factor()): true (observed) labels; must have exactly two levels and the same length as response.
response (factor()): predicted response labels; must have exactly two levels and the same length as truth.
positive (character(1)): name of the positive class.
na_value (numeric(1)): value returned if the measure is undefined for the input; default is NaN.
... (any): additional arguments, currently ignored.

Calculates the geometric mean of recall() $R$ and specificity() $S$ as
$$\mathrm{G} = \sqrt{R \cdot S}.$$
This measure is undefined if recall or specificity is undefined, i.e. if TP + FN = 0 or if FP + TN = 0.

Value: performance value as numeric(1).

Type: "binary"
Range: $[0, 1]$
Minimize: FALSE
Required prediction: response
He H, Garcia EA (2009). “Learning from Imbalanced Data.” IEEE Transactions on knowledge and data engineering, 21(9), 1263–1284. doi:10.1109/TKDE.2008.239.
Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()
Examples:
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
gmean(truth, response, positive = "a")
Measure to compare true observed labels with predicted labels in binary classification tasks.
Usage: gpr(truth, response, positive, na_value = NaN, ...)

Arguments:
truth (factor()): true (observed) labels; must have exactly two levels and the same length as response.
response (factor()): predicted response labels; must have exactly two levels and the same length as truth.
positive (character(1)): name of the positive class.
na_value (numeric(1)): value returned if the measure is undefined for the input; default is NaN.
... (any): additional arguments, currently ignored.

Calculates the geometric mean of precision() $P$ and recall() $R$ as
$$\mathrm{G} = \sqrt{P \cdot R}.$$
This measure is undefined if precision or recall is undefined, i.e. if TP + FP = 0 or if TP + FN = 0.

Value: performance value as numeric(1).

Type: "binary"
Range: $[0, 1]$
Minimize: FALSE
Required prediction: response
He H, Garcia EA (2009). “Learning from Imbalanced Data.” IEEE Transactions on knowledge and data engineering, 21(9), 1263–1284. doi:10.1109/TKDE.2008.239.
Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()
Examples:
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
gpr(truth, response, positive = "a")
Measure to compare two or more sets w.r.t. their similarity.
Usage: jaccard(sets, na_value = NaN, ...)

Arguments:
sets (list()): list of character or integer vectors; must contain at least two sets.
na_value (numeric(1)): value returned if the measure is undefined for the input; default is NaN.
... (any): additional arguments, currently ignored.

For two sets $A$ and $B$, the Jaccard Index is defined as
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}.$$
If more than two sets are provided, the mean of all pairwise scores is calculated.

This measure is undefined if two or more sets are empty.

Value: performance value as numeric(1).

Type: "similarity"
Range: $[0, 1]$
Minimize: FALSE
Jaccard, Paul (1901). “Étude comparative de la distribution florale dans une portion des Alpes et du Jura.” Bulletin de la Société Vaudoise des Sciences Naturelles, 37, 547-579. doi:10.5169/SEALS-266450.
Bommert A, Rahnenführer J, Lang M (2017). “A Multicriteria Approach to Find Predictive and Sparse Models with Stable Feature Selection for High-Dimensional Data.” Computational and Mathematical Methods in Medicine, 2017, 1–18. doi:10.1155/2017/7907163.
Bommert A, Lang M (2021). “stabm: Stability Measures for Feature Selection.” Journal of Open Source Software, 6(59), 3010. doi:10.21105/joss.03010.
Package stabm which implements many more stability measures with included correction for chance.
Other Similarity Measures: phi()
Examples:
set.seed(1)
sets = list(
  sample(letters[1:3], 1),
  sample(letters[1:3], 2)
)
jaccard(sets)
Measure to compare true observed response with predicted response in regression tasks.
Usage: ktau(truth, response, ...)

Arguments:
truth (numeric()): true (observed) response values; must have the same length as response.
response (numeric()): predicted response values; must have the same length as truth.
... (any): additional arguments, currently ignored.

Kendall's tau is defined as Kendall's rank correlation coefficient between truth and response. Calls stats::cor() with method set to "kendall".

Value: performance value as numeric(1).

Type: "regr"
Range: $[-1, 1]$
Minimize: FALSE
Required prediction: response
Rosset S, Perlich C, Zadrozny B (2006). “Ranking-based evaluation of regression models.” Knowledge and Information Systems, 12(3), 331–353. doi:10.1007/s10115-006-0037-3.
Other Regression Measures: ae(), ape(), bias(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
Examples:
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
ktau(truth, response)
Measure to compare true observed response with predicted response in regression tasks.
Note that this is an unaggregated measure, returning the losses per observation.
Usage: linex(truth, response, a = -1, b = 1, ...)

Arguments:
truth (numeric()): true (observed) response values; must have the same length as response.
response (numeric()): predicted response values; must have the same length as truth.
a (numeric(1)): shape parameter of the loss; must be non-zero. Default is -1.
b (numeric(1)): positive scale parameter of the loss; must be greater than 0. Default is 1.
... (any): additional arguments, currently ignored.

The Linear-Exponential Loss is defined as
$$\mathrm{LINEX}_i = b \left( \exp\!\big(a (t_i - r_i)\big) - a (t_i - r_i) - 1 \right),$$
where $a \neq 0$ and $b > 0$.

Value: performance value as numeric(length(truth)).

Type: "regr"
Range (per observation): $[0, \infty)$
Minimize (per observation): TRUE
Required prediction: response
Varian, R. H (1975). “A Bayesian Approach to Real Estate Assessment.” In Fienberg SE, Zellner A (eds.), Studies in Bayesian Econometrics and Statistics: In Honor of Leonard J. Savage, 195–208. North-Holland, Amsterdam.
Other Regression Measures: ae(), ape(), bias(), ktau(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
Examples:
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
linex(truth, response)
Measure to compare true observed labels with predicted probabilities in multiclass classification tasks.
Usage: logloss(truth, prob, sample_weights = NULL, eps = 1e-15, ...)

Arguments:
truth (factor()): true (observed) labels.
prob (matrix()): matrix of predicted probabilities, one column per class; column names must equal the levels of truth.
sample_weights (numeric()): non-negative sample weights, automatically normalized to sum to 1; defaults to equal weights.
eps (numeric(1)): probabilities are clipped to the interval [eps, 1 - eps] so that the measure stays defined for probabilities of 0 and 1.
... (any): additional arguments, currently ignored.

The Log Loss (a.k.a. Bernoulli Loss, Logistic Loss, Cross-Entropy Loss) is defined as
$$\mathrm{LL} = -\sum_{i=1}^n w_i \log p_i,$$
where $p_i$ is the probability for the true class of observation $x_i$ and $w_i$ are normalized weights for each observation $x_i$.

Value: performance value as numeric(1).

Type: "classif"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: prob
Other Classification Measures: acc(), bacc(), ce(), mauc_aunu(), mbrier(), mcc(), zero_one()
Examples:
set.seed(1)
lvls = c("a", "b", "c")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
prob = matrix(runif(3 * 10), ncol = 3, dimnames = list(NULL, lvls))
prob = t(apply(prob, 1, function(x) x / sum(x)))
logloss(truth, prob)
Measure to compare true observed response with predicted response in regression tasks.
Usage: mae(truth, response, sample_weights = NULL, ...)

Arguments:
truth (numeric()): true (observed) response values; must have the same length as response.
response (numeric()): predicted response values; must have the same length as truth.
sample_weights (numeric()): non-negative sample weights, automatically normalized to sum to 1; defaults to equal weights.
... (any): additional arguments, currently ignored.

The Mean Absolute Error is defined as
$$\mathrm{MAE} = \sum_{i=1}^n w_i \left| t_i - r_i \right|,$$
where $w_i$ are normalized sample weights.

Value: performance value as numeric(1).

Type: "regr"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
Examples:
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
mae(truth, response)
Measure to compare true observed response with predicted response in regression tasks.
Usage: mape(truth, response, sample_weights = NULL, na_value = NaN, ...)

Arguments:
truth (numeric()): true (observed) response values; must have the same length as response.
response (numeric()): predicted response values; must have the same length as truth.
sample_weights (numeric()): non-negative sample weights, automatically normalized to sum to 1; defaults to equal weights.
na_value (numeric(1)): value returned if the measure is undefined for the input; default is NaN.
... (any): additional arguments, currently ignored.

The Mean Absolute Percent Error is defined as
$$\mathrm{MAPE} = \sum_{i=1}^n w_i \left| \frac{t_i - r_i}{t_i} \right|,$$
where $w_i$ are normalized sample weights.

This measure is undefined if any element of $t$ is $0$.

Value: performance value as numeric(1).

Type: "regr"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response
de Myttenaere, Arnaud, Golden, Boris, Le Grand, Bénédicte, Rossi, Fabrice (2016). “Mean Absolute Percentage Error for regression models.” Neurocomputing, 192, 38-48. ISSN 0925-2312, doi:10.1016/j.neucom.2015.12.114.
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
Examples:
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
mape(truth, response)
Measure to compare true observed labels with predicted probabilities in multiclass classification tasks.
Usage:
mauc_aunu(truth, prob, na_value = NaN, ...)
mauc_aunp(truth, prob, na_value = NaN, ...)
mauc_au1u(truth, prob, na_value = NaN, ...)
mauc_au1p(truth, prob, na_value = NaN, ...)
mauc_mu(truth, prob, na_value = NaN, ...)

Arguments:
truth (factor()): true (observed) labels; length must equal the number of rows of prob.
prob (matrix()): matrix of predicted probabilities, one column per class; column names must equal the levels of truth.
na_value (numeric(1)): value returned if the measure is undefined for the input; default is NaN.
... (any): additional arguments, currently ignored.

Multiclass AUC measures.

AUNU: AUC of each class against the rest, using the uniform class distribution. Computes the AUC treating a c-dimensional classifier as c two-dimensional 1-vs-rest classifiers, where classes are assumed to have uniform distribution, in order to have a measure which is independent of class distribution change (Fawcett 2001).

AUNP: AUC of each class against the rest, using the a-priori class distribution. Computes the AUC treating a c-dimensional classifier as c two-dimensional 1-vs-rest classifiers, taking into account the prior probability of each class (Fawcett 2001).

AU1U: AUC of each class against each other, using the uniform class distribution. Computes something like the AUC of c(c - 1) binary classifiers (all possible pairwise combinations). See Hand (2001) for details.

AU1P: AUC of each class against each other, using the a-priori class distribution. Computes something like the AUC of c(c - 1) binary classifiers while considering the a-priori distribution of the classes as suggested in Ferri (2009). Note we deviate from the definition in Ferri (2009) by a factor of c.

MU: Multiclass AUC as defined in Kleiman and Page (2019). This measure is an average of the pairwise AUCs between all classes. The measure was tested against the Python implementation by Ross Kleiman.

Value: performance value as numeric(1).

Type: "classif"
Range: $[0, 1]$
Minimize: FALSE
Required prediction: prob
Fawcett, Tom (2001). “Using rule sets to maximize ROC performance.” In Proceedings 2001 IEEE international conference on data mining, 131–138. IEEE.
Ferri, César, Hernández-Orallo, José, Modroiu, R (2009). “An experimental comparison of performance measures for classification.” Pattern Recognition Letters, 30(1), 27–38. doi:10.1016/j.patrec.2008.08.010.
Hand, J D, Till, J R (2001). “A simple generalisation of the area under the ROC curve for multiple class classification problems.” Machine learning, 45(2), 171–186.
Kleiman R, Page D (2019). “AUC mu: A Performance Metric for Multi-Class Machine Learning Models.” In Chaudhuri, Kamalika, Salakhutdinov, Ruslan (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 series Proceedings of Machine Learning Research, 3439–3447. PMLR.
Other Classification Measures: acc(), bacc(), ce(), logloss(), mbrier(), mcc(), zero_one()
Examples:
set.seed(1)
lvls = c("a", "b", "c")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
prob = matrix(runif(3 * 10), ncol = 3)
colnames(prob) = levels(truth)
mauc_aunu(truth, prob)
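All five variants share the same interface, so they can be compared directly; continuing the example above:

mauc_aunp(truth, prob)
mauc_au1u(truth, prob)
mauc_au1p(truth, prob)
mauc_mu(truth, prob)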
Measure to compare true observed response with predicted response in regression tasks.
Usage: maxae(truth, response, ...)

Arguments:
truth (numeric()): true (observed) response values; must have the same length as response.
response (numeric()): predicted response values; must have the same length as truth.
... (any): additional arguments, currently ignored.

The Max Absolute Error is defined as
$$\mathrm{MAXAE} = \max_{i} \left| t_i - r_i \right|.$$

Value: performance value as numeric(1).

Type: "regr"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
Examples:
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
maxae(truth, response)
Measure to compare true observed response with predicted response in regression tasks.
Usage: maxse(truth, response, ...)

Arguments:
truth (numeric()): true (observed) response values; must have the same length as response.
response (numeric()): predicted response values; must have the same length as truth.
... (any): additional arguments, currently ignored.

The Max Squared Error is defined as
$$\mathrm{MAXSE} = \max_{i} \left( t_i - r_i \right)^2.$$

Value: performance value as numeric(1).

Type: "regr"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
Examples:
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
maxse(truth, response)
Measure to compare true observed labels with predicted probabilities in multiclass classification tasks.
Usage: mbrier(truth, prob, ...)

Arguments:
truth (factor()): true (observed) labels.
prob (matrix()): matrix of predicted probabilities, one column per class; column names must equal the levels of truth.
... (any): additional arguments, currently ignored.

Brier score for multi-class classification problems with $K$ labels, defined as
$$\mathrm{MBS} = \frac{1}{n} \sum_{i=1}^n \sum_{k=1}^K \left( I_{ik} - p_{ik} \right)^2,$$
where $I_{ik}$ is 1 if observation $x_i$ has true label $k$, and 0 otherwise, and $p_{ik}$ is the probability that observation $x_i$ belongs to class $k$.

Note that there also is the more common definition of the Brier score for binary classification problems in bbrier().

Value: performance value as numeric(1).

Type: "classif"
Range: $[0, 2]$
Minimize: TRUE
Required prediction: prob
Brier GW (1950). “Verification of forecasts expressed in terms of probability.” Monthly Weather Review, 78(1), 1–3. doi:10.1175/1520-0493(1950)078<0001:vofeit>2.0.co;2.
Other Classification Measures: acc(), bacc(), ce(), logloss(), mauc_aunu(), mcc(), zero_one()
Examples:
set.seed(1)
lvls = c("a", "b", "c")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
prob = matrix(runif(3 * 10), ncol = 3)
colnames(prob) = levels(truth)
mbrier(truth, prob)
Measure to compare true observed labels with predicted labels in multiclass classification tasks.
Usage: mcc(truth, response, positive = NULL, ...)

Arguments:
truth (factor()): true (observed) labels; must have the same levels and length as response.
response (factor()): predicted response labels; must have the same levels and length as truth.
positive (character(1)): name of the positive class in case of binary classification; default is NULL.
... (any): additional arguments, currently ignored.

In the binary case, the Matthews Correlation Coefficient is defined as
$$\mathrm{MCC} = \frac{\mathrm{TP} \cdot \mathrm{TN} - \mathrm{FP} \cdot \mathrm{FN}}{\sqrt{(\mathrm{TP} + \mathrm{FP})(\mathrm{TP} + \mathrm{FN})(\mathrm{TN} + \mathrm{FP})(\mathrm{TN} + \mathrm{FN})}},$$
where TP, FP, TN, FN are the number of true positives, false positives, true negatives, and false negatives, respectively.

In the multi-class case, the Matthews Correlation Coefficient is defined for a multi-class confusion matrix $C$ with $K$ classes:
$$\mathrm{MCC} = \frac{c \cdot s - \sum_{k=1}^K p_k \cdot t_k}{\sqrt{\left(s^2 - \sum_{k=1}^K p_k^2\right) \cdot \left(s^2 - \sum_{k=1}^K t_k^2\right)}},$$
where
$s$: total number of samples,
$c$: total number of correctly predicted samples,
$p_k$: number of predictions for each class $k$,
$t_k$: number of true occurrences for each class $k$.

The above formula is undefined if any of the four sums in the denominator is 0 in the binary case, and more generally if either $s^2 - \sum_k p_k^2$ or $s^2 - \sum_k t_k^2$ is equal to 0. The denominator is then set to 1.

When there are more than two classes, the MCC will no longer range between -1 and +1. Instead, the minimum value will be between -1 and 0 depending on the true distribution. The maximum value is always +1.

Value: performance value as numeric(1).

Type: "classif"
Range: $[-1, 1]$
Minimize: FALSE
Required prediction: response
https://en.wikipedia.org/wiki/Phi_coefficient
Matthews BW (1975). “Comparison of the predicted and observed secondary structure of T4 phage lysozyme.” Biochimica et Biophysica Acta (BBA) - Protein Structure, 405(2), 442–451. doi:10.1016/0005-2795(75)90109-9.
Other Classification Measures: acc(), bacc(), ce(), logloss(), mauc_aunu(), mbrier(), zero_one()
Examples:
set.seed(1)
lvls = c("a", "b", "c")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
mcc(truth, response)
The environment() measures keeps track of all measures in this package. It stores meta information such as minimum, maximum, or whether the measure must be minimized or maximized.

The following information is available for each measure:

id: name of the measure.
title: short descriptive title.
type: "binary" for binary classification, "classif" for binary or multi-class classification, "regr" for regression, and "similarity" for similarity measures.
lower: lower bound.
upper: upper bound.
predict_type: prediction type the measure operates on. "response" corresponds to class labels for classification and the numeric response for regression. "prob" corresponds to class probabilities, provided as a matrix with class labels as column names. "se" corresponds to the vector of predicted standard errors for regression.
minimize: if TRUE or FALSE, the objective is to minimize or maximize the measure, respectively. Can also be NA.
obs_loss: name of the function which is called to calculate the (unaggregated) loss per observation.
trafo: optional list() of length 2, containing a transformation "fn" and its derivative "deriv".
aggregated: if TRUE, this function aggregates the losses to a single numeric value. Otherwise, a vector of losses is returned.
sample_weights: if TRUE, it is possible to calculate a weighted measure.

measures is an object of class environment of length 65.
Examples:
names(measures)
measures$tpr
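The registry makes the meta information listed above available programmatically; a small sketch (field access as documented above):

m = measures$rmse
c(lower = m$lower, upper = m$upper)  # bounds of the measure
m$minimize                           # TRUE: smaller values are better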
Measure to compare true observed response with predicted response in regression tasks.
Usage: medae(truth, response, ...)

Arguments:
truth (numeric()): true (observed) response values; must have the same length as response.
response (numeric()): predicted response values; must have the same length as truth.
... (any): additional arguments, currently ignored.

The Median Absolute Error is defined as
$$\mathrm{MEDAE} = \operatorname{median}_{i} \left| t_i - r_i \right|.$$

Value: performance value as numeric(1).

Type: "regr"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
Examples:
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
medae(truth, response)
Measure to compare true observed response with predicted response in regression tasks.
Usage: medse(truth, response, ...)

Arguments:
truth (numeric()): true (observed) response values; must have the same length as response.
response (numeric()): predicted response values; must have the same length as truth.
... (any): additional arguments, currently ignored.

The Median Squared Error is defined as
$$\mathrm{MEDSE} = \operatorname{median}_{i} \left( t_i - r_i \right)^2.$$

Value: performance value as numeric(1).

Type: "regr"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
Examples:
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
medse(truth, response)
Measure to compare true observed response with predicted response in regression tasks.
Usage: mse(truth, response, sample_weights = NULL, ...)

Arguments:
truth (numeric()): true (observed) response values; must have the same length as response.
response (numeric()): predicted response values; must have the same length as truth.
sample_weights (numeric()): non-negative sample weights, automatically normalized to sum to 1; defaults to equal weights.
... (any): additional arguments, currently ignored.

The Mean Squared Error is defined as
$$\mathrm{MSE} = \sum_{i=1}^n w_i \left( t_i - r_i \right)^2,$$
where $w_i$ are normalized sample weights.

Value: performance value as numeric(1).

Type: "regr"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
Examples:
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
mse(truth, response)
Measure to compare true observed response with predicted response in regression tasks.
Usage: msle(truth, response, sample_weights = NULL, na_value = NaN, ...)

Arguments:
truth (numeric()): true (observed) response values; must have the same length as response.
response (numeric()): predicted response values; must have the same length as truth.
sample_weights (numeric()): non-negative sample weights, automatically normalized to sum to 1; defaults to equal weights.
na_value (numeric(1)): value returned if the measure is undefined for the input; default is NaN.
... (any): additional arguments, currently ignored.

The Mean Squared Log Error is defined as
$$\mathrm{MSLE} = \sum_{i=1}^n w_i \left( \ln(1 + t_i) - \ln(1 + r_i) \right)^2,$$
where $w_i$ are normalized sample weights.

This measure is undefined if any element of $t$ or $r$ is less than or equal to $-1$.

Value: performance value as numeric(1).

Type: "regr"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
Examples:
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
msle(truth, response)
Measure to compare true observed labels with predicted labels in binary classification tasks.
Usage: npv(truth, response, positive, na_value = NaN, ...)

Arguments:
truth (factor()): true (observed) labels; must have exactly two levels and the same length as response.
response (factor()): predicted response labels; must have exactly two levels and the same length as truth.
positive (character(1)): name of the positive class.
na_value (numeric(1)): value returned if the measure is undefined for the input; default is NaN.
... (any): additional arguments, currently ignored.

The Negative Predictive Value is defined as
$$\mathrm{NPV} = \frac{\mathrm{TN}}{\mathrm{FN} + \mathrm{TN}}.$$
This measure is undefined if FN + TN = 0.

Value: performance value as numeric(1).

Type: "binary"
Range: $[0, 1]$
Minimize: FALSE
Required prediction: response
https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram
Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), ppv(), prauc(), tn(), tnr(), tp(), tpr()
Examples:
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
npv(truth, response, positive = "a")
Measure to compare true observed response with predicted response in regression tasks.
Usage: pbias(truth, response, sample_weights = NULL, na_value = NaN, ...)

Arguments:
truth (numeric()): true (observed) response values; must have the same length as response.
response (numeric()): predicted response values; must have the same length as truth.
sample_weights (numeric()): non-negative sample weights, automatically normalized to sum to 1; defaults to equal weights.
na_value (numeric(1)): value returned if the measure is undefined for the input; default is NaN.
... (any): additional arguments, currently ignored.

The Percent Bias is defined as
$$\mathrm{PBIAS} = \sum_{i=1}^n w_i \cdot \frac{t_i - r_i}{\left| t_i \right|},$$
where $w_i$ are normalized sample weights. Good predictions score close to 0.

Value: performance value as numeric(1).

Type: "regr"
Range: $(-\infty, \infty)$
Minimize: NA
Required prediction: response
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
Examples:
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
pbias(truth, response)
Measure to compare two or more sets w.r.t. their similarity.
Usage: phi(sets, p, na_value = NaN, ...)

Arguments:
sets (list()): list of character or integer vectors; must contain at least two sets.
p (integer(1)): total number of possible elements.
na_value (numeric(1)): value returned if the measure is undefined for the input; default is NaN.
... (any): additional arguments, currently ignored.

The Phi Coefficient is defined as the Pearson correlation between the binary representation of two sets $A$ and $B$. The binary representation of $A$ is a logical vector of length $p$, with the i-th element being 1 if the corresponding element is in $A$, and 0 otherwise.

If more than two sets are provided, the mean of all pairwise scores is calculated.

This measure is undefined if one set contains none or all possible elements.

Value: performance value as numeric(1).

Type: "similarity"
Range: $[-1, 1]$
Minimize: FALSE
Nogueira S, Brown G (2016). “Measuring the Stability of Feature Selection.” In Machine Learning and Knowledge Discovery in Databases, 442–457. Springer International Publishing. doi:10.1007/978-3-319-46227-1_28.
Bommert A, Rahnenführer J, Lang M (2017). “A Multicriteria Approach to Find Predictive and Sparse Models with Stable Feature Selection for High-Dimensional Data.” Computational and Mathematical Methods in Medicine, 2017, 1–18. doi:10.1155/2017/7907163.
Bommert A, Lang M (2021). “stabm: Stability Measures for Feature Selection.” Journal of Open Source Software, 6(59), 3010. doi:10.21105/joss.03010.
Package stabm which implements many more stability measures with included correction for chance.
Other Similarity Measures: jaccard()
Examples:
set.seed(1)
sets = list(
  sample(letters[1:3], 1),
  sample(letters[1:3], 2)
)
phi(sets, p = 3)
Measure to compare true observed response with predicted response in regression tasks.
Usage: pinball(truth, response, sample_weights = NULL, alpha = 0.5, ...)

Arguments:
truth (numeric()): true (observed) response values; must have the same length as response.
response (numeric()): predicted response values; must have the same length as truth.
sample_weights (numeric()): non-negative sample weights, automatically normalized to sum to 1; defaults to equal weights.
alpha (numeric(1)): quantile to compute the pinball loss for; default is 0.5 (the median).
... (any): additional arguments, currently ignored.

The pinball loss for quantile regression is defined as
$$\mathrm{pinball} = \frac{1}{n} \sum_{i=1}^n w_i \cdot \begin{cases} \alpha \, (t_i - r_i) & \text{if } t_i \ge r_i \\ (1 - \alpha) \, (r_i - t_i) & \text{if } t_i < r_i \end{cases}$$
where $\alpha$ is the quantile and $w_i$ are normalized sample weights.

Value: performance value as numeric(1).

Type: "regr"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
Examples:
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
pinball(truth, response)
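Values of alpha other than 0.5 weight over- and underestimation asymmetrically; continuing the example above (assuming the quantile-loss convention given in the formula):

pinball(truth, response, alpha = 0.9)  # penalizes underestimation more heavily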
Measure to compare true observed labels with predicted labels in binary classification tasks.
Usage:
ppv(truth, response, positive, na_value = NaN, ...)
precision(truth, response, positive, na_value = NaN, ...)

Arguments:
truth (factor()): true (observed) labels; must have exactly two levels and the same length as response.
response (factor()): predicted response labels; must have exactly two levels and the same length as truth.
positive (character(1)): name of the positive class.
na_value (numeric(1)): value returned if the measure is undefined for the input; default is NaN.
... (any): additional arguments, currently ignored.

The Positive Predictive Value is defined as
$$\mathrm{PPV} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}.$$
Also known as "precision".

This measure is undefined if TP + FP = 0.

Value: performance value as numeric(1).

Type: "binary"
Range: $[0, 1]$
Minimize: FALSE
Required prediction: response
https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram
Goutte C, Gaussier E (2005). “A Probabilistic Interpretation of Precision, Recall and F-Score, with Implication for Evaluation.” In Lecture Notes in Computer Science, 345–359. doi:10.1007/978-3-540-31865-1_25.
Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), prauc(), tn(), tnr(), tp(), tpr()
Examples:
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
ppv(truth, response, positive = "a")
Measure to compare true observed labels with predicted probabilities in binary classification tasks.
Usage: prauc(truth, prob, positive, na_value = NaN, ...)

Arguments:
truth (factor()): true (observed) labels; must have exactly two levels and the same length as prob.
prob (numeric()): predicted probability for the positive class; must have the same length as truth.
positive (character(1)): name of the positive class.
na_value (numeric(1)): value returned if the measure is undefined for the input; default is NaN.
... (any): additional arguments, currently ignored.

Computes the area under the Precision-Recall curve (PRC). The PRC can be interpreted as the relationship between precision and recall (sensitivity), and is considered to be a more appropriate measure for unbalanced datasets than the ROC curve. The AUC-PRC is computed by integration of the piecewise function.

This measure is undefined if the true values are either all positive or all negative.

Value: performance value as numeric(1).

Type: "binary"
Range: $[0, 1]$
Minimize: FALSE
Required prediction: prob
Davis J, Goadrich M (2006). “The relationship between precision-recall and ROC curves.” In Proceedings of the 23rd International Conference on Machine Learning. ISBN 9781595933836.
Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), tn(), tnr(), tp(), tpr()
Examples:
truth = factor(c("a", "a", "a", "b"))
prob = c(.6, .7, .1, .4)
prauc(truth, prob, "a")
Measure to compare true observed response with predicted response in regression tasks.
Usage: rae(truth, response, na_value = NaN, ...)

Arguments:
truth (numeric()): true (observed) response values; must have the same length as response.
response (numeric()): predicted response values; must have the same length as truth.
na_value (numeric(1)): value returned if the measure is undefined for the input; default is NaN.
... (any): additional arguments, currently ignored.

The Relative Absolute Error is defined as
$$\mathrm{RAE} = \frac{\sum_{i=1}^n \left| t_i - r_i \right|}{\sum_{i=1}^n \left| t_i - \bar{t} \right|},$$
where $\bar{t} = \frac{1}{n} \sum_{i=1}^n t_i$.

This measure is undefined for constant $t$. Can be interpreted as absolute error of the predictions relative to a naive model predicting the mean.

Value: performance value as numeric(1).

Type: "regr"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
Examples:
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
rae(truth, response)
Measure to compare true observed response with predicted response in regression tasks.
Usage: rmse(truth, response, sample_weights = NULL, ...)

Arguments:
truth (numeric()): true (observed) response values; must have the same length as response.
response (numeric()): predicted response values; must have the same length as truth.
sample_weights (numeric()): non-negative sample weights, automatically normalized to sum to 1; defaults to equal weights.
... (any): additional arguments, currently ignored.

The Root Mean Squared Error is defined as
$$\mathrm{RMSE} = \sqrt{\sum_{i=1}^n w_i \left( t_i - r_i \right)^2},$$
where $w_i$ are normalized sample weights.

Value: performance value as numeric(1).

Type: "regr"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
Examples:
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
rmse(truth, response)
Measure to compare true observed response with predicted response in regression tasks.
Usage: rmsle(truth, response, sample_weights = NULL, na_value = NaN, ...)

Arguments:
truth (numeric()): true (observed) response values; must have the same length as response.
response (numeric()): predicted response values; must have the same length as truth.
sample_weights (numeric()): non-negative sample weights, automatically normalized to sum to 1; defaults to equal weights.
na_value (numeric(1)): value returned if the measure is undefined for the input; default is NaN.
... (any): additional arguments, currently ignored.

The Root Mean Squared Log Error is defined as
$$\mathrm{RMSLE} = \sqrt{\sum_{i=1}^n w_i \left( \ln(1 + t_i) - \ln(1 + r_i) \right)^2},$$
where $w_i$ are normalized sample weights.

This measure is undefined if any element of $t$ or $r$ is less than or equal to $-1$.

Value: performance value as numeric(1).

Type: "regr"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
Examples:
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
rmsle(truth, response)
Measure to compare true observed response with predicted response in regression tasks.
Usage: rrse(truth, response, na_value = NaN, ...)

Arguments:
truth (numeric()): true (observed) response values; must have the same length as response.
response (numeric()): predicted response values; must have the same length as truth.
na_value (numeric(1)): value returned if the measure is undefined for the input; default is NaN.
... (any): additional arguments, currently ignored.

The Root Relative Squared Error is defined as
$$\mathrm{RRSE} = \sqrt{\frac{\sum_{i=1}^n \left( t_i - r_i \right)^2}{\sum_{i=1}^n \left( t_i - \bar{t} \right)^2}},$$
where $\bar{t} = \frac{1}{n} \sum_{i=1}^n t_i$.

Can be interpreted as root of the squared error of the predictions relative to a naive model predicting the mean. This measure is undefined for constant $t$.

Value: performance value as numeric(1).

Type: "regr"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
Examples:
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
rrse(truth, response)
Measure to compare true observed response with predicted response in regression tasks.
Usage: rse(truth, response, na_value = NaN, ...)

Arguments:
truth (numeric()): true (observed) response values; must have the same length as response.
response (numeric()): predicted response values; must have the same length as truth.
na_value (numeric(1)): value returned if the measure is undefined for the input; default is NaN.
... (any): additional arguments, currently ignored.

The Relative Squared Error is defined as
$$\mathrm{RSE} = \frac{\sum_{i=1}^n \left( t_i - r_i \right)^2}{\sum_{i=1}^n \left( t_i - \bar{t} \right)^2},$$
where $\bar{t} = \frac{1}{n} \sum_{i=1}^n t_i$.

Can be interpreted as squared error of the predictions relative to a naive model predicting the mean. This measure is undefined for constant $t$.

Value: performance value as numeric(1).

Type: "regr"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
Examples:
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
rse(truth, response)
Measure to compare true observed response with predicted response in regression tasks.
Usage: rsq(truth, response, na_value = NaN, ...)

Arguments:
truth (numeric()): true (observed) response values; must have the same length as response.
response (numeric()): predicted response values; must have the same length as truth.
na_value (numeric(1)): value returned if the measure is undefined for the input; default is NaN.
... (any): additional arguments, currently ignored.

R Squared is defined as
$$R^2 = 1 - \frac{\sum_{i=1}^n \left( t_i - r_i \right)^2}{\sum_{i=1}^n \left( t_i - \bar{t} \right)^2},$$
where $\bar{t} = \frac{1}{n} \sum_{i=1}^n t_i$.

Also known as coefficient of determination or explained variation. Subtracts the rse() from 1, hence it compares the squared error of the predictions relative to a naive model predicting the mean.

This measure is undefined for constant $t$.

Value: performance value as numeric(1).

Type: "regr"
Range: $(-\infty, 1]$
Minimize: FALSE
Required prediction: response
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), sae(), se(), sle(), smape(), srho(), sse()
Examples:
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
rsq(truth, response)
Measure to compare true observed response with predicted response in regression tasks.
Usage: sae(truth, response, ...)

Arguments:
truth (numeric()): true (observed) response values; must have the same length as response.
response (numeric()): predicted response values; must have the same length as truth.
... (any): additional arguments, currently ignored.

The Sum of Absolute Errors is defined as
$$\mathrm{SAE} = \sum_{i=1}^n \left| t_i - r_i \right|.$$

Value: performance value as numeric(1).

Type: "regr"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), se(), sle(), smape(), srho(), sse()
Examples:
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
sae(truth, response)
Measure to compare true observed response with predicted response in regression tasks.
Note that this is an unaggregated measure, returning the losses per observation.
Usage: se(truth, response, ...)

Arguments:
truth (numeric()): true (observed) response values; must have the same length as response.
response (numeric()): predicted response values; must have the same length as truth.
... (any): additional arguments, currently ignored.

Calculates the per-observation squared error as
$$\mathrm{SE}_i = \left( t_i - r_i \right)^2.$$

Value: performance value as numeric(length(truth)).

Type: "regr"
Range (per observation): $[0, \infty)$
Minimize (per observation): TRUE
Required prediction: response
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), sle(), smape(), srho(), sse()
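This entry ships without a usage example; a minimal sketch as with the other unaggregated measures:

Examples:
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
se(truth, response)  # one squared error per observation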
Measure to compare true observed response with predicted response in regression tasks.
Note that this is an unaggregated measure, returning the losses per observation.
Usage: sle(truth, response, ...)

Arguments:
truth (numeric()): true (observed) response values; must have the same length as response.
response (numeric()): predicted response values; must have the same length as truth.
... (any): additional arguments, currently ignored.

Calculates the per-observation squared log error as
$$\mathrm{SLE}_i = \left( \ln(1 + t_i) - \ln(1 + r_i) \right)^2.$$

Value: performance value as numeric(length(truth)).

Type: "regr"
Range (per observation): $[0, \infty)$
Minimize (per observation): TRUE
Required prediction: response
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), smape(), srho(), sse()
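The source entry ships no example; a minimal sketch (added for illustration) compares the measure with a direct evaluation of the definition:

set.seed(1)
truth = 1:10
response = truth + rnorm(10)
sle(truth, response)                # per-observation squared logarithmic errors
(log1p(truth) - log1p(response))^2  # identical by the definition above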
Measure to compare true observed response with predicted response in regression tasks.
smape(truth, response, na_value = NaN, ...)
truth | (numeric()) True (observed) values. Must have the same length as response.
response | (numeric()) Predicted response values. Must have the same length as truth.
na_value | (numeric(1)) Value that should be returned if the measure is not defined for the input. Default is NaN.
... | (any) Additional arguments. Currently ignored.
The Symmetric Mean Absolute Percent Error is defined as $\frac{2}{n} \sum_{i=1}^n \frac{\left| t_i - r_i \right|}{\left| t_i \right| + \left| r_i \right|}$.
This measure is undefined if any $\left| t_i \right| + \left| r_i \right|$ is equal to $0$.
Performance value as numeric(1).
Type: "regr"
Range: $[0, 2]$
Minimize: TRUE
Required prediction: response
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), srho(), sse()
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
smape(truth, response)
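As an illustrative cross-check (not part of the shipped example), SMAPE can be reproduced from its definition:

set.seed(1)
truth = 1:10
response = truth + rnorm(10)
# hand-computed symmetric mean absolute percent error
2 * mean(abs(truth - response) / (abs(truth) + abs(response)))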
Measure to compare true observed response with predicted response in regression tasks.
srho(truth, response, ...)
truth | (numeric()) True (observed) values. Must have the same length as response.
response | (numeric()) Predicted response values. Must have the same length as truth.
... | (any) Additional arguments. Currently ignored.
Spearman's rho is defined as Spearman's rank correlation coefficient between truth and response.
Calls stats::cor() with method set to "spearman".
Performance value as numeric(1).
Type: "regr"
Range: $[-1, 1]$
Minimize: FALSE
Required prediction: response
Rosset S, Perlich C, Zadrozny B (2006). “Ranking-based evaluation of regression models.” Knowledge and Information Systems, 12(3), 331–353. doi:10.1007/s10115-006-0037-3.
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), sse()
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
srho(truth, response)
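Since the details above state that srho() calls stats::cor() with method = "spearman", the base-R call should return the same value (illustrative check):

set.seed(1)
truth = 1:10
response = truth + rnorm(10)
# equivalent base-R computation
stats::cor(truth, response, method = "spearman")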
Measure to compare true observed response with predicted response in regression tasks.
sse(truth, response, ...)
truth | (numeric()) True (observed) values. Must have the same length as response.
response | (numeric()) Predicted response values. Must have the same length as truth.
... | (any) Additional arguments. Currently ignored.
The Sum of Squared Errors is defined as $\sum_{i=1}^n \left( t_i - r_i \right)^2$.
Performance value as numeric(1).
Type: "regr"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho()
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
sse(truth, response)
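As a sanity check (illustrative, not part of the shipped example), the result can be reproduced directly from the definition:

set.seed(1)
truth = 1:10
response = truth + rnorm(10)
# hand-computed sum of squared errors
sum((truth - response)^2)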
Measure to compare true observed labels with predicted labels in binary classification tasks.
tn(truth, response, positive, ...)
truth | (factor()) True (observed) labels. Must have the exact same two levels and the same length as response.
response | (factor()) Predicted response labels. Must have the exact same two levels and the same length as truth.
positive | (character(1)) Name of the positive class.
... | (any) Additional arguments. Currently ignored.
This measure counts the true negatives, i.e. the number of predictions correctly indicating a negative class label. This is sometimes also called a "correct rejection".
Performance value as numeric(1).
Type: "binary"
Range: $[0, \infty)$
Minimize: FALSE
Required prediction: response
https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram
Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tnr(), tp(), tpr()
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
tn(truth, response, positive = "a")
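The count can be reproduced by hand (illustrative sketch): with positive class "a", a true negative is an observation where truth and prediction are both "b":

set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
# with positive = "a", true negatives are correct "b" predictions
sum(truth == "b" & response == "b")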
Measure to compare true observed labels with predicted labels in binary classification tasks.
tnr(truth, response, positive, na_value = NaN, ...)
specificity(truth, response, positive, na_value = NaN, ...)
truth | (factor()) True (observed) labels. Must have the exact same two levels and the same length as response.
response | (factor()) Predicted response labels. Must have the exact same two levels and the same length as truth.
positive | (character(1)) Name of the positive class.
na_value | (numeric(1)) Value that should be returned if the measure is not defined for the input. Default is NaN.
... | (any) Additional arguments. Currently ignored.
The True Negative Rate is defined as $\frac{\mathrm{TN}}{\mathrm{FP} + \mathrm{TN}}$.
Also known as "specificity" or "selectivity".
This measure is undefined if FP + TN = 0.
Performance value as numeric(1).
Type: "binary"
Range: $[0, 1]$
Minimize: FALSE
Required prediction: response
https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram
Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tp(), tpr()
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
tnr(truth, response, positive = "a")
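The rate can be rebuilt from a confusion table (an illustrative check of the definition above, not part of the shipped example):

set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
tab = table(truth, response)
# TN / (FP + TN) with positive = "a": both counts sit in the "b" truth row
tab["b", "b"] / (tab["b", "a"] + tab["b", "b"])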
Measure to compare true observed labels with predicted labels in binary classification tasks.
tp(truth, response, positive, ...)
truth | (factor()) True (observed) labels. Must have the exact same two levels and the same length as response.
response | (factor()) Predicted response labels. Must have the exact same two levels and the same length as truth.
positive | (character(1)) Name of the positive class.
... | (any) Additional arguments. Currently ignored.
This measure counts the true positives, i.e. the number of predictions correctly indicating a positive class label. This is sometimes also called a "hit".
Performance value as numeric(1).
Type: "binary"
Range: $[0, \infty)$
Minimize: FALSE
Required prediction: response
https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram
Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tpr()
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
tp(truth, response, positive = "a")
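As with tn(), the count can be verified by hand (illustrative sketch): with positive class "a", a true positive is an observation where truth and prediction are both "a":

set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
# with positive = "a", true positives are correct "a" predictions
sum(truth == "a" & response == "a")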
Measure to compare true observed labels with predicted labels in binary classification tasks.
tpr(truth, response, positive, na_value = NaN, ...)
recall(truth, response, positive, na_value = NaN, ...)
sensitivity(truth, response, positive, na_value = NaN, ...)
truth | (factor()) True (observed) labels. Must have the exact same two levels and the same length as response.
response | (factor()) Predicted response labels. Must have the exact same two levels and the same length as truth.
positive | (character(1)) Name of the positive class.
na_value | (numeric(1)) Value that should be returned if the measure is not defined for the input. Default is NaN.
... | (any) Additional arguments. Currently ignored.
The True Positive Rate is defined as $\frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}$.
This is also known as "recall", "sensitivity", or "probability of detection".
This measure is undefined if TP + FN = 0.
Performance value as numeric(1).
Type: "binary"
Range: $[0, 1]$
Minimize: FALSE
Required prediction: response
https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram
Goutte C, Gaussier E (2005). “A Probabilistic Interpretation of Precision, Recall and F-Score, with Implication for Evaluation.” In Lecture Notes in Computer Science, 345–359. doi:10.1007/978-3-540-31865-1_25.
Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp()
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
tpr(truth, response, positive = "a")
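Since recall() and sensitivity() are aliases for tpr() (see the usage above), all three calls should return the same value (illustrative check):

set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
# the aliases share one implementation, so the results coincide
c(tpr(truth, response, positive = "a"),
  recall(truth, response, positive = "a"),
  sensitivity(truth, response, positive = "a"))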
Calculates the per-observation 0/1 (zero-one) loss as $\mathbf{1}\left( t_i \neq r_i \right)$, i.e. $1$ for a misclassified observation and $0$ otherwise.
The 1/0 (one-zero) loss is equal to 1 - zero-one and calculated as $\mathbf{1}\left( t_i = r_i \right)$.
Measure to compare true observed labels with predicted labels in multiclass classification tasks.
Note that this is an unaggregated measure, returning the losses per observation.
zero_one(truth, response, ...)
one_zero(truth, response, ...)
truth | (factor()) True (observed) labels. Must have the same levels and length as response.
response | (factor()) Predicted response labels. Must have the same levels and length as truth.
... | (any) Additional arguments. Currently ignored.
Performance value as numeric(length(truth)).
Type: "classif"
Range (per observation): $[0, 1]$
Minimize (per observation): TRUE
Required prediction: response
Other Classification Measures: acc(), bacc(), ce(), logloss(), mauc_aunu(), mbrier(), mcc()
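The source entry ships no example; a minimal sketch (added for illustration) in the style of the other classification measures:

set.seed(1)
lvls = c("a", "b", "c")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
zero_one(truth, response)  # 1 for each misclassified observation, 0 otherwise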