Package 'mlr3measures'

Title: Performance Measures for 'mlr3'
Description: Implements multiple performance measures for supervised learning. Includes over 40 measures for regression and classification. Additionally, meta information about the performance measures can be queried, e.g. what the best and worst possible performance scores are.
Authors: Michel Lang [aut], Martin Binder [ctb], Marc Becker [cre, aut], Lona Koers [aut]
Maintainer: Marc Becker <[email protected]>
License: LGPL-3
Version: 1.0.0
Built: 2025-01-10 07:17:28 UTC
Source: CRAN

Help Index


mlr3measures: Performance Measures for 'mlr3'

Description

Implements multiple performance measures for supervised learning. Includes over 40 measures for regression and classification. Additionally, meta information about the performance measures can be queried, e.g. what the best and worst possible performance scores are.

Author(s)

Maintainer: Marc Becker [email protected]

Authors:

  • Michel Lang

  • Lona Koers

Other contributors:

  • Martin Binder [contributor]



Classification Accuracy

Description

Measure to compare true observed labels with predicted labels in multiclass classification tasks.

Usage

acc(truth, response, sample_weights = NULL, ...)

Arguments

truth

(factor())
True (observed) labels. Must have the same levels and length as response.

response

(factor())
Predicted response labels. Must have the same levels and length as truth.

sample_weights

(numeric())
Vector of non-negative and finite sample weights. Must have the same length as truth. The vector gets automatically normalized to sum to one. Defaults to equal sample weights.

...

(any)
Additional arguments. Currently ignored.

Details

The Classification Accuracy is defined as

$$\sum_{i=1}^n w_i \mathbf{1} \left( t_i = r_i \right),$$

where $t_i$ and $r_i$ are the true and predicted labels of observation $x_i$, and $w_i$ are the sample weights, normalized to sum to one (equal weights $w_i = 1/n$ by default).
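As a minimal base-R sketch of this formula, assuming the weights are normalized to sum to one as described under sample_weights, the result is a weighted mean of the indicator $\mathbf{1}(t_i = r_i)$. The helper `acc_sketch()` is hypothetical and not part of mlr3measures:

```r
# Hypothetical helper illustrating the accuracy formula; not the package code.
acc_sketch = function(truth, response, sample_weights = NULL) {
  # Equal weights by default, then normalize them to sum to one
  w = if (is.null(sample_weights)) rep(1, length(truth)) else sample_weights
  w = w / sum(w)
  sum(w * (truth == response))
}

lvls = c("a", "b", "c")
truth = factor(c("a", "a", "b", "c"), levels = lvls)
response = factor(c("a", "b", "b", "c"), levels = lvls)
acc_sketch(truth, response)                  # 3 of 4 labels match: 0.75
acc_sketch(truth, response, c(1, 3, 1, 1))   # the single miss carries weight 3/6: 0.5
```

With mlr3measures installed, acc() on the same inputs should return the same values.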

Value

Performance value as numeric(1).

Meta Information

  • Type: "classif"

  • Range: $[0, 1]$

  • Minimize: FALSE

  • Required prediction: response

See Also

Other Classification Measures: bacc(), ce(), logloss(), mauc_aunu(), mbrier(), mcc(), zero_one()

Examples

set.seed(1)
lvls = c("a", "b", "c")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
acc(truth, response)

Absolute Error (per observation)

Description

Measure to compare true observed response with predicted response in regression tasks.

Note that this is an unaggregated measure, returning the losses per observation.

Usage

ae(truth, response, ...)

Arguments

truth

(numeric())
True (observed) values. Must have the same length as response.

response

(numeric())
Predicted response values. Must have the same length as truth.

...

(any)
Additional arguments. Currently ignored.

Details

Calculates the per-observation absolute error as

$$\left| t_i - r_i \right|.$$

Value

Performance value as numeric(length(truth)).

Meta Information

  • Type: "regr"

  • Range (per observation): $[0, \infty)$

  • Minimize (per observation): TRUE

  • Required prediction: response

See Also

Other Regression Measures: ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()


Absolute Percentage Error (per observation)

Description

Measure to compare true observed response with predicted response in regression tasks.

Note that this is an unaggregated measure, returning the losses per observation.

Usage

ape(truth, response, ...)

Arguments

truth

(numeric())
True (observed) values. Must have the same length as response.

response

(numeric())
Predicted response values. Must have the same length as truth.

...

(any)
Additional arguments. Currently ignored.

Details

Calculates the per-observation absolute percentage error as

$$\left| \frac{t_i - r_i}{t_i} \right|.$$

Value

Performance value as numeric(length(truth)).

Meta Information

  • Type: "regr"

  • Range (per observation): $[0, \infty)$

  • Minimize (per observation): TRUE

  • Required prediction: response

See Also

Other Regression Measures: ae(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()


Area Under the ROC Curve

Description

Measure to compare true observed labels with predicted probabilities in binary classification tasks.

Usage

auc(truth, prob, positive, na_value = NaN, ...)

Arguments

truth

(factor())
True (observed) labels. Must have exactly two levels and the same length as prob.

prob

(numeric())
Predicted probability for the positive class. Must have the same length as truth.

positive

(character(1))
Name of the positive class.

na_value

(numeric(1))
Value that should be returned if the measure is not defined for the input (see Details). Default is NaN.

...

(any)
Additional arguments. Currently ignored.

Details

Computes the area under the Receiver Operating Characteristic (ROC) curve. The AUC can be interpreted as the probability that a randomly chosen positive observation has a higher predicted probability than a randomly chosen negative observation.

This measure is undefined if the true values are either all positive or all negative.
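The probabilistic interpretation above can be computed directly with the rank-sum (Mann-Whitney) identity. The following base-R sketch is an illustration under that assumption, not the package's implementation; the helper `auc_sketch()` is hypothetical, and tied scores contribute one half through average ranks:

```r
# Hypothetical helper: AUC via the rank-sum identity; not the package code.
auc_sketch = function(truth, prob, positive, na_value = NaN) {
  is_pos = truth == positive
  n_pos = sum(is_pos)
  n_neg = sum(!is_pos)
  if (n_pos == 0 || n_neg == 0) return(na_value)  # all positive or all negative
  r = rank(prob)                                  # ties receive average ranks
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

truth = factor(c("a", "a", "a", "b"))
prob = c(.6, .7, .1, .4)
auc_sketch(truth, prob, "a")   # 2 of 3 positive/negative pairs ordered correctly
```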

Value

Performance value as numeric(1).

Meta Information

  • Type: "binary"

  • Range: $[0, 1]$

  • Minimize: FALSE

  • Required prediction: prob

References

Youden WJ (1950). “Index for rating diagnostic tests.” Cancer, 3(1), 32–35. doi:10.1002/1097-0142(1950)3:1<32::aid-cncr2820030106>3.0.co;2-3.

See Also

Other Binary Classification Measures: bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()

Examples

truth = factor(c("a", "a", "a", "b"))
prob = c(.6, .7, .1, .4)
auc(truth, prob, "a")

Balanced Accuracy

Description

Measure to compare true observed labels with predicted labels in multiclass classification tasks.

Usage

bacc(truth, response, sample_weights = NULL, ...)

Arguments

truth

(factor())
True (observed) labels. Must have the same levels and length as response.

response

(factor())
Predicted response labels. Must have the same levels and length as truth.

sample_weights

(numeric())
Vector of non-negative and finite sample weights. Must have the same length as truth. The vector gets automatically normalized to sum to one. Defaults to equal sample weights.

...

(any)
Additional arguments. Currently ignored.

Details

The Balanced Accuracy computes the weighted balanced accuracy, suitable for imbalanced data sets. It is defined analogously to the definition in sklearn.

First, all sample weights $w_i$ are normalized per class so that each class has the same influence:

$$\hat{w}_i = \frac{w_i}{\sum_{j=1}^n w_j \cdot \mathbf{1}(t_j = t_i)}.$$

The Balanced Accuracy is then calculated as

$$\frac{1}{\sum_{i=1}^n \hat{w}_i} \sum_{i=1}^n \hat{w}_i \cdot \mathbf{1}(r_i = t_i).$$

This definition is equivalent to acc() with class-balanced sample weights.
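To make the equivalence with acc() concrete, here is a base-R sketch of the two formulas above; the helper `bacc_sketch()` is hypothetical. When one class dominates, always predicting the majority class yields a high plain accuracy but a balanced accuracy of only 0.5:

```r
# Hypothetical helper following the per-class weight normalization above.
bacc_sketch = function(truth, response, sample_weights = NULL) {
  w = if (is.null(sample_weights)) rep(1, length(truth)) else sample_weights
  class_totals = tapply(w, truth, sum)                        # total weight per true class
  w_hat = w / as.numeric(class_totals[as.character(truth)])  # class-balanced weights
  sum(w_hat * (truth == response)) / sum(w_hat)
}

truth = factor(c("a", "a", "a", "b"), levels = c("a", "b"))
response = factor(c("a", "a", "a", "a"), levels = c("a", "b"))
mean(truth == response)        # plain accuracy: 0.75
bacc_sketch(truth, response)   # mean of the per-class recalls 1 and 0: 0.5
```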

Value

Performance value as numeric(1).

Meta Information

  • Type: "classif"

  • Range: $[0, 1]$

  • Minimize: FALSE

  • Required prediction: response

References

Brodersen KH, Ong CS, Stephan KE, Buhmann JM (2010). “The Balanced Accuracy and Its Posterior Distribution.” In 2010 20th International Conference on Pattern Recognition. doi:10.1109/icpr.2010.764.

Guyon I, Bennett K, Cawley G, Escalante HJ, Escalera S, Ho TK, Macia N, Ray B, Saeed M, Statnikov A, Viegas E (2015). “Design of the 2015 ChaLearn AutoML challenge.” In 2015 International Joint Conference on Neural Networks (IJCNN). doi:10.1109/ijcnn.2015.7280767.

See Also

Other Classification Measures: acc(), ce(), logloss(), mauc_aunu(), mbrier(), mcc(), zero_one()

Examples

set.seed(1)
lvls = c("a", "b", "c")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
bacc(truth, response)

Binary Brier Score

Description

Measure to compare true observed labels with predicted probabilities in binary classification tasks.

Usage

bbrier(truth, prob, positive, sample_weights = NULL, ...)

Arguments

truth

(factor())
True (observed) labels. Must have exactly two levels and the same length as prob.

prob

(numeric())
Predicted probability for the positive class. Must have the same length as truth.

positive

(character(1))
Name of the positive class.

sample_weights

(numeric())
Vector of non-negative and finite sample weights. Must have the same length as truth. The vector gets automatically normalized to sum to one. Defaults to equal sample weights.

...

(any)
Additional arguments. Currently ignored.

Details

The Binary Brier Score is defined as

$$\sum_{i=1}^n w_i (I_i - p_i)^2,$$

where $w_i$ are the sample weights (normalized to sum to one), $p_i$ is the predicted probability of the positive class for observation $x_i$, and $I_i$ is 1 if $x_i$ belongs to the positive class, and 0 otherwise.

Note that this (more common) definition of the Brier score is equivalent to the original definition of the multi-class Brier score (see mbrier()) divided by 2.
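A base-R sketch of the binary formula with weights normalized to sum to one; the helper `bbrier_sketch()` is hypothetical. The two-column multi-class Brier score is computed by hand as the squared error summed over both class columns and averaged over observations; we assume this matches the definition used by mbrier(). The last line illustrates the factor of 2:

```r
# Hypothetical helper for the binary Brier score; not the package code.
bbrier_sketch = function(truth, prob, positive, sample_weights = NULL) {
  w = if (is.null(sample_weights)) rep(1, length(truth)) else sample_weights
  w = w / sum(w)                        # normalize weights to sum to one
  I = as.numeric(truth == positive)     # 1 for the positive class, else 0
  sum(w * (I - prob)^2)
}

truth = factor(c("a", "b", "a", "b"))
prob = c(0.9, 0.2, 0.6, 0.5)            # predicted probability of "a"
bbrier_sketch(truth, prob, positive = "a")   # (0.01 + 0.04 + 0.16 + 0.25) / 4 = 0.115

# Multi-class Brier score for the same predictions, computed by hand:
# squared errors summed over both class columns, averaged over observations.
I = as.numeric(truth == "a")
mbrier_manual = mean(rowSums((cbind(I, 1 - I) - cbind(prob, 1 - prob))^2))
mbrier_manual / 2                        # equals the binary score
```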

Value

Performance value as numeric(1).

Meta Information

  • Type: "binary"

  • Range: $[0, 1]$

  • Minimize: TRUE

  • Required prediction: prob

References

https://en.wikipedia.org/wiki/Brier_score

Brier GW (1950). “Verification of forecasts expressed in terms of probability.” Monthly Weather Review, 78(1), 1–3. doi:10.1175/1520-0493(1950)078<0001:vofeit>2.0.co;2.

See Also

Other Binary Classification Measures: auc(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()

Examples

set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
prob = runif(10)
bbrier(truth, prob, positive = "a")

Bias

Description

Measure to compare true observed response with predicted response in regression tasks.

Usage

bias(truth, response, sample_weights = NULL, ...)

Arguments

truth

(numeric())
True (observed) values. Must have the same length as response.

response

(numeric())
Predicted response values. Must have the same length as truth.

sample_weights

(numeric())
Vector of non-negative and finite sample weights. Must have the same length as truth. The vector gets automatically normalized to sum to one. Defaults to equal sample weights.

...

(any)
Additional arguments. Currently ignored.

Details

The Bias is defined as

$$\sum_{i=1}^n w_i \left( t_i - r_i \right),$$

where $t_i$ and $r_i$ are the true and predicted values, and $w_i$ are the sample weights, normalized to sum to one. Good predictions score close to 0.
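A base-R sketch of the weighted bias; the helper `bias_sketch()` is hypothetical. Because the errors are signed, over- and underestimation can cancel out, so a bias near 0 alone does not guarantee accurate predictions; comparing it with an absolute error such as the mean absolute error is one way to check this:

```r
# Hypothetical helper for the weighted bias; not the package code.
bias_sketch = function(truth, response, sample_weights = NULL) {
  w = if (is.null(sample_weights)) rep(1, length(truth)) else sample_weights
  w = w / sum(w)                   # normalize weights to sum to one
  sum(w * (truth - response))      # signed errors t_i - r_i
}

truth = c(1, 2, 3, 4)
response = truth + c(1, -1, 1, -1)
bias_sketch(truth, response)       # 0: the signed errors cancel
mean(abs(truth - response))        # 1: yet every prediction is off by 1
```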

Value

Performance value as numeric(1).

Meta Information

  • Type: "regr"

  • Range: $(-\infty, \infty)$

  • Minimize: NA

  • Required prediction: response

See Also

Other Regression Measures: ae(), ape(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()

Examples

set.seed(1)
truth = 1:10
response = truth + rnorm(10)
bias(truth, response)

Classification Error

Description

Measure to compare true observed labels with predicted labels in multiclass classification tasks.

Usage

ce(truth, response, sample_weights = NULL, ...)

Arguments

truth

(factor())
True (observed) labels. Must have the same levels and length as response.

response

(factor())
Predicted response labels. Must have the same levels and length as truth.

sample_weights

(numeric())
Vector of non-negative and finite sample weights. Must have the same length as truth. The vector gets automatically normalized to sum to one. Defaults to equal sample weights.

...

(any)
Additional arguments. Currently ignored.

Details

The Classification Error is defined as

$$\sum_{i=1}^n w_i \mathbf{1} \left( t_i \neq r_i \right),$$

where $t_i$ and $r_i$ are the true and predicted labels of observation $x_i$, and $w_i$ are the sample weights, normalized to sum to one.
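Since each observation is either classified correctly or not, the classification error equals one minus the accuracy under the same weights. A minimal base-R check; the helper `ce_sketch()` is hypothetical:

```r
# Hypothetical helper for the weighted classification error; not the package code.
ce_sketch = function(truth, response, sample_weights = NULL) {
  w = if (is.null(sample_weights)) rep(1, length(truth)) else sample_weights
  w = w / sum(w)                    # normalize weights to sum to one
  sum(w * (truth != response))
}

lvls = c("a", "b", "c")
truth = factor(c("a", "a", "b", "c"), levels = lvls)
response = factor(c("a", "b", "b", "a"), levels = lvls)
ce_sketch(truth, response)          # 2 of 4 labels differ: 0.5
1 - mean(truth == response)         # same value via the accuracy
```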

Value

Performance value as numeric(1).

Meta Information

  • Type: "classif"

  • Range: $[0, 1]$

  • Minimize: TRUE

  • Required prediction: response

See Also

Other Classification Measures: acc(), bacc(), logloss(), mauc_aunu(), mbrier(), mcc(), zero_one()

Examples

set.seed(1)
lvls = c("a", "b", "c")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
ce(truth, response)

Calculate Binary Confusion Matrix

Description

Calculates the confusion matrix for a binary classification problem once and then calculates all binary confusion measures of this package.

Usage

confusion_matrix(truth, response, positive, na_value = NaN, relative = FALSE)

Arguments

truth

(factor())
True (observed) labels. Must have exactly the same two levels and the same length as response.

response

(factor())
Predicted response labels. Must have exactly the same two levels and the same length as truth.

positive

(character(1))
Name of the positive class.

na_value

(numeric(1))
Value that should be returned for any measure that is not defined for the input (see the Details of the respective measure). Default is NaN.

relative

(logical(1))
If TRUE, the returned confusion matrix contains relative frequencies instead of absolute frequencies.

Details

The binary confusion matrix is defined as

$$\begin{pmatrix} TP & FP \\ FN & TN \end{pmatrix}.$$

If relative = TRUE, all values are divided by $n$, the number of observations.
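Reading the layout above, predictions are in the rows and the truth is in the columns, with the positive class first. A base-R sketch using table() under that assumption; the helper `confusion_sketch()` is hypothetical and returns only the matrix, not the measures:

```r
# Hypothetical helper: 2x2 confusion matrix with the positive class first.
confusion_sketch = function(truth, response, positive, relative = FALSE) {
  lvls = c(positive, setdiff(levels(truth), positive))
  m = table(response = factor(response, levels = lvls),
            truth = factor(truth, levels = lvls))
  if (relative) m = m / length(truth)   # relative frequencies
  m
}

truth = factor(c("a", "a", "b", "b", "a"), levels = c("a", "b"))
response = factor(c("a", "b", "b", "a", "a"), levels = c("a", "b"))
cm = confusion_sketch(truth, response, positive = "a")
cm   # first row: TP = 2, FP = 1; second row: FN = 1, TN = 1
```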

Value

List with two elements:

  • matrix stores the calculated confusion matrix.

  • measures stores the metrics as named numeric vector.

Examples

set.seed(123)
lvls = c("a", "b")
truth = factor(sample(lvls, 20, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 20, replace = TRUE), levels = lvls)

confusion_matrix(truth, response, positive = "a")
confusion_matrix(truth, response, positive = "a", relative = TRUE)
confusion_matrix(truth, response, positive = "b")

Diagnostic Odds Ratio

Description

Measure to compare true observed labels with predicted labels in binary classification tasks.

Usage

dor(truth, response, positive, na_value = NaN, ...)

Arguments

truth

(factor())
True (observed) labels. Must have exactly the same two levels and the same length as response.

response

(factor())
Predicted response labels. Must have exactly the same two levels and the same length as truth.

positive

(character(1))
Name of the positive class.

na_value

(numeric(1))
Value that should be returned if the measure is not defined for the input (see Details). Default is NaN.

...

(any)
Additional arguments. Currently ignored.

Details

The Diagnostic Odds Ratio is defined as

$$\frac{\mathrm{TP}/\mathrm{FP}}{\mathrm{FN}/\mathrm{TN}}.$$

This measure is undefined if FP = 0 or FN = 0.
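From the four counts, the formula above reduces to $(\mathrm{TP} \cdot \mathrm{TN}) / (\mathrm{FP} \cdot \mathrm{FN})$. A base-R sketch working directly on counts; the helper `dor_sketch()` and its count arguments are hypothetical, whereas the package function takes factors:

```r
# Hypothetical helper computing the DOR from confusion matrix counts.
dor_sketch = function(tp, fp, fn, tn, na_value = NaN) {
  if (fp == 0 || fn == 0) return(na_value)   # undefined case from Details
  (tp / fp) / (fn / tn)
}

dor_sketch(tp = 8, fp = 2, fn = 4, tn = 6)   # (8 / 2) / (4 / 6) = 6
dor_sketch(tp = 8, fp = 0, fn = 4, tn = 6)   # FP = 0: NaN
```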

Value

Performance value as numeric(1).

Meta Information

  • Type: "binary"

  • Range: $[0, \infty)$

  • Minimize: FALSE

  • Required prediction: response

References

https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram

See Also

Other Binary Classification Measures: auc(), bbrier(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()

Examples

set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
dor(truth, response, positive = "a")

F-beta Score

Description

Measure to compare true observed labels with predicted labels in binary classification tasks.

Usage

fbeta(truth, response, positive, beta = 1, na_value = NaN, ...)

Arguments

truth

(factor())
True (observed) labels. Must have exactly the same two levels and the same length as response.

response

(factor())
Predicted response labels. Must have exactly the same two levels and the same length as truth.

positive

(character(1))
Name of the positive class.

beta

(numeric(1))
Parameter to give either precision or recall more weight. Default is 1, resulting in balanced weights.

na_value

(numeric(1))
Value that should be returned if the measure is not defined for the input (see Details). Default is NaN.

...

(any)
Additional arguments. Currently ignored.

Details

With $P$ as precision() and $R$ as recall(), the F-beta Score is defined as

$$(1 + \beta^2) \frac{P \cdot R}{(\beta^2 P) + R}.$$

It measures the effectiveness of retrieval with respect to a user who attaches $\beta$ times as much importance to recall as to precision. For $\beta = 1$, this measure is called the "F1" score.

This measure is undefined if precision or recall is undefined, i.e. TP + FP = 0 or TP + FN = 0.
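A count-based base-R sketch of the formula; the helper `fbeta_sketch()` and its arguments are hypothetical. For $\beta = 1$ it reduces to the harmonic mean of precision and recall, and a larger $\beta$ pulls the score towards recall:

```r
# Hypothetical helper computing F-beta from confusion matrix counts.
fbeta_sketch = function(tp, fp, fn, beta = 1, na_value = NaN) {
  # Undefined if precision or recall is undefined
  if (tp + fp == 0 || tp + fn == 0) return(na_value)
  P = tp / (tp + fp)   # precision
  R = tp / (tp + fn)   # recall
  (1 + beta^2) * P * R / (beta^2 * P + R)
}

fbeta_sketch(tp = 6, fp = 2, fn = 4)             # P = 0.75, R = 0.6: F1 = 2/3
fbeta_sketch(tp = 6, fp = 2, fn = 4, beta = 2)   # closer to recall: 0.625
```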

Value

Performance value as numeric(1).

Meta Information

  • Type: "binary"

  • Range: $[0, 1]$

  • Minimize: FALSE

  • Required prediction: response

References

Rijsbergen, Van CJ (1979). Information Retrieval, 2nd edition. Butterworth-Heinemann, Newton, MA, USA. ISBN 408709294.

Goutte C, Gaussier E (2005). “A Probabilistic Interpretation of Precision, Recall and F-Score, with Implication for Evaluation.” In Lecture Notes in Computer Science, 345–359. doi:10.1007/978-3-540-31865-1_25.

See Also

Other Binary Classification Measures: auc(), bbrier(), dor(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()

Examples

set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
fbeta(truth, response, positive = "a")

False Discovery Rate

Description

Measure to compare true observed labels with predicted labels in binary classification tasks.

Usage

fdr(truth, response, positive, na_value = NaN, ...)

Arguments

truth

(factor())
True (observed) labels. Must have exactly the same two levels and the same length as response.

response

(factor())
Predicted response labels. Must have exactly the same two levels and the same length as truth.

positive

(character(1))
Name of the positive class.

na_value

(numeric(1))
Value that should be returned if the measure is not defined for the input (see Details). Default is NaN.

...

(any)
Additional arguments. Currently ignored.

Details

The False Discovery Rate is defined as

$$\frac{\mathrm{FP}}{\mathrm{TP} + \mathrm{FP}}.$$

This measure is undefined if TP + FP = 0.
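A base-R sketch computing the rate directly from label vectors; the helper `fdr_sketch()` is hypothetical. Among the observations predicted as positive, it takes the share whose true label is negative:

```r
# Hypothetical helper for the false discovery rate; not the package code.
fdr_sketch = function(truth, response, positive, na_value = NaN) {
  pred_pos = response == positive             # the TP + FP observations
  if (sum(pred_pos) == 0) return(na_value)     # undefined: TP + FP = 0
  sum(pred_pos & truth != positive) / sum(pred_pos)
}

lvls = c("a", "b")
truth = factor(c("a", "a", "b", "b", "a"), levels = lvls)
response = factor(c("a", "b", "b", "a", "a"), levels = lvls)
fdr_sketch(truth, response, positive = "a")   # 1 of 3 predicted "a" is wrong
```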

Value

Performance value as numeric(1).

Meta Information

  • Type: "binary"

  • Range: $[0, 1]$

  • Minimize: TRUE

  • Required prediction: response

References

https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram

See Also

Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()

Examples

set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
fdr(truth, response, positive = "a")

False Negatives

Description

Measure to compare true observed labels with predicted labels in binary classification tasks.

Usage

fn(truth, response, positive, ...)

Arguments

truth

(factor())
True (observed) labels. Must have exactly the same two levels and the same length as response.

response

(factor())
Predicted response labels. Must have exactly the same two levels and the same length as truth.

positive

(character(1))
Name of the positive class.

...

(any)
Additional arguments. Currently ignored.

Details

This measure counts the false negatives (type 2 error), i.e. the number of predictions indicating a negative class label while in fact it is positive. This is sometimes also called a "miss" or an "underestimation".

Value

Performance value as numeric(1).

Meta Information

  • Type: "binary"

  • Range: $[0, \infty)$

  • Minimize: TRUE

  • Required prediction: response

References

https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram

See Also

Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()

Examples

set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
fn(truth, response, positive = "a")

False Negative Rate

Description

Measure to compare true observed labels with predicted labels in binary classification tasks.

Usage

fnr(truth, response, positive, na_value = NaN, ...)

Arguments

truth

(factor())
True (observed) labels. Must have exactly the same two levels and the same length as response.

response

(factor())
Predicted response labels. Must have exactly the same two levels and the same length as truth.

positive

(character(1))
Name of the positive class.

na_value

(numeric(1))
Value that should be returned if the measure is not defined for the input (see Details). Default is NaN.

...

(any)
Additional arguments. Currently ignored.

Details

The False Negative Rate is defined as

$$\frac{\mathrm{FN}}{\mathrm{TP} + \mathrm{FN}}.$$

Also known as "miss rate".

This measure is undefined if TP + FN = 0.

Value

Performance value as numeric(1).

Meta Information

  • Type: "binary"

  • Range: $[0, 1]$

  • Minimize: TRUE

  • Required prediction: response

References

https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram

See Also

Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()

Examples

set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
fnr(truth, response, positive = "a")

False Omission Rate

Description

Measure to compare true observed labels with predicted labels in binary classification tasks.

Usage

fomr(truth, response, positive, na_value = NaN, ...)

Arguments

truth

(factor())
True (observed) labels. Must have exactly the same two levels and the same length as response.

response

(factor())
Predicted response labels. Must have exactly the same two levels and the same length as truth.

positive

(character(1))
Name of the positive class.

na_value

(numeric(1))
Value that should be returned if the measure is not defined for the input (see Details). Default is NaN.

...

(any)
Additional arguments. Currently ignored.

Details

The False Omission Rate is defined as

$$\frac{\mathrm{FN}}{\mathrm{FN} + \mathrm{TN}}.$$

This measure is undefined if FN + TN = 0.

Value

Performance value as numeric(1).

Meta Information

  • Type: "binary"

  • Range: $[0, 1]$

  • Minimize: TRUE

  • Required prediction: response

References

https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram

See Also

Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()

Examples

set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
fomr(truth, response, positive = "a")

False Positives

Description

Measure to compare true observed labels with predicted labels in binary classification tasks.

Usage

fp(truth, response, positive, ...)

Arguments

truth

(factor())
True (observed) labels. Must have exactly the same two levels and the same length as response.

response

(factor())
Predicted response labels. Must have exactly the same two levels and the same length as truth.

positive

(character(1))
Name of the positive class.

...

(any)
Additional arguments. Currently ignored.

Details

This measure counts the false positives (type 1 error), i.e. the number of predictions indicating a positive class label while in fact it is negative. This is sometimes also called a "false alarm".

Value

Performance value as numeric(1).

Meta Information

  • Type: "binary"

  • Range: $[0, \infty)$

  • Minimize: TRUE

  • Required prediction: response

References

https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram

See Also

Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()

Examples

set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
fp(truth, response, positive = "a")

False Positive Rate

Description

Measure to compare true observed labels with predicted labels in binary classification tasks.

Usage

fpr(truth, response, positive, na_value = NaN, ...)

Arguments

truth

(factor())
True (observed) labels. Must have exactly the same two levels and the same length as response.

response

(factor())
Predicted response labels. Must have exactly the same two levels and the same length as truth.

positive

(character(1))
Name of the positive class.

na_value

(numeric(1))
Value that should be returned if the measure is not defined for the input (see Details). Default is NaN.

...

(any)
Additional arguments. Currently ignored.

Details

The False Positive Rate is defined as

$$\frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}}.$$

Also known as fall-out or probability of false alarm.

This measure is undefined if FP + TN = 0.
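A base-R sketch of the False Positive Rate computed from label vectors; the helper `fpr_sketch()` is hypothetical. The denominator FP + TN is simply the number of truly negative observations:

```r
# Hypothetical helper for the false positive rate; not the package code.
fpr_sketch = function(truth, response, positive, na_value = NaN) {
  actual_neg = truth != positive               # the FP + TN observations
  if (sum(actual_neg) == 0) return(na_value)    # undefined: FP + TN = 0
  sum(actual_neg & response == positive) / sum(actual_neg)
}

lvls = c("a", "b")
truth = factor(c("a", "a", "b", "b", "a"), levels = lvls)
response = factor(c("a", "b", "b", "a", "a"), levels = lvls)
fpr_sketch(truth, response, positive = "a")   # 1 of 2 true "b" predicted as "a"
```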

Value

Performance value as numeric(1).

Meta Information

  • Type: "binary"

  • Range: $[0, 1]$

  • Minimize: TRUE

  • Required prediction: response

References

https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram

See Also

Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()

Examples

set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
fpr(truth, response, positive = "a")

Geometric Mean of Recall and Specificity

Description

Measure to compare true observed labels with predicted labels in binary classification tasks.

Usage

gmean(truth, response, positive, na_value = NaN, ...)

Arguments

truth

(factor())
True (observed) labels. Must have exactly the same two levels and the same length as response.

response

(factor())
Predicted response labels. Must have exactly the same two levels and the same length as truth.

positive

(character(1))
Name of the positive class.

na_value

(numeric(1))
Value that should be returned if the measure is not defined for the input (see Details). Default is NaN.

...

(any)
Additional arguments. Currently ignored.

Details

Calculates the geometric mean of recall() $R$ and specificity() $S$ as

$$\sqrt{R \cdot S}.$$

This measure is undefined if recall or specificity is undefined, i.e. if TP + FN = 0 or if FP + TN = 0.
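A count-based base-R sketch; the helper `gmean_sketch()` and its arguments are hypothetical. It shows why the measure suits imbalanced data: a classifier that labels everything positive reaches a recall of 1 but a specificity of 0, and therefore a score of 0:

```r
# Hypothetical helper computing the geometric mean of recall and specificity.
gmean_sketch = function(tp, fp, fn, tn, na_value = NaN) {
  if (tp + fn == 0 || fp + tn == 0) return(na_value)  # recall or specificity undefined
  recall = tp / (tp + fn)
  specificity = tn / (fp + tn)
  sqrt(recall * specificity)
}

gmean_sketch(tp = 8, fp = 1, fn = 2, tn = 9)    # sqrt(0.8 * 0.9), about 0.849
gmean_sketch(tp = 10, fp = 5, fn = 0, tn = 0)   # everything predicted positive: 0
```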

Value

Performance value as numeric(1).

Meta Information

  • Type: "binary"

  • Range: $[0, 1]$

  • Minimize: FALSE

  • Required prediction: response

References

He H, Garcia EA (2009). “Learning from Imbalanced Data.” IEEE Transactions on Knowledge and Data Engineering, 21(9), 1263–1284. doi:10.1109/TKDE.2008.239.

See Also

Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()

Examples

set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
gmean(truth, response, positive = "a")

Geometric Mean of Precision and Recall

Description

Measure to compare true observed labels with predicted labels in binary classification tasks.

Usage

gpr(truth, response, positive, na_value = NaN, ...)

Arguments

truth

(factor())
True (observed) labels. Must have exactly the same two levels and the same length as response.

response

(factor())
Predicted response labels. Must have exactly the same two levels and the same length as truth.

positive

(character(1))
Name of the positive class.

na_value

(numeric(1))
Value that should be returned if the measure is not defined for the input (see Details). Default is NaN.

...

(any)
Additional arguments. Currently ignored.

Details

Calculates the geometric mean of precision() $P$ and recall() $R$ as

$$\sqrt{P \cdot R}.$$

This measure is undefined if precision or recall is undefined, i.e. if TP + FP = 0 or if TP + FN = 0.

Value

Performance value as numeric(1).

Meta Information

  • Type: "binary"

  • Range: $[0, 1]$

  • Minimize: FALSE

  • Required prediction: response

References

He H, Garcia EA (2009). “Learning from Imbalanced Data.” IEEE Transactions on Knowledge and Data Engineering, 21(9), 1263–1284. doi:10.1109/TKDE.2008.239.

See Also

Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()

Examples

set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
gpr(truth, response, positive = "a")

Jaccard Similarity Index

Description

Measure to compare two or more sets w.r.t. their similarity.

Usage

jaccard(sets, na_value = NaN, ...)

Arguments

sets

(list())
List of character or integer vectors. sets must have at least 2 elements.

na_value

(numeric(1))
Value that should be returned if the measure is not defined for the input (see Details). Default is NaN.

...

(any)
Additional arguments. Currently ignored.

Details

For two sets $A$ and $B$, the Jaccard Index is defined as

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}.$$

If more than two sets are provided, the mean of all pairwise scores is calculated.

This measure is undefined if two or more sets are empty.
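A base-R sketch of the pairwise averaging described above; the helper `jaccard_sketch()` is hypothetical and uses combn() to enumerate all pairs of sets:

```r
# Hypothetical helper: mean pairwise Jaccard index; not the package code.
jaccard_sketch = function(sets, na_value = NaN) {
  if (sum(lengths(sets) == 0) >= 2) return(na_value)   # undefined case
  pairs = combn(length(sets), 2)                        # all index pairs, one per column
  scores = apply(pairs, 2, function(p) {
    a = sets[[p[1]]]
    b = sets[[p[2]]]
    length(intersect(a, b)) / length(union(a, b))
  })
  mean(scores)
}

sets = list(c("a", "b"), c("b", "c"), c("a", "b", "c"))
jaccard_sketch(sets)   # mean of 1/3, 2/3 and 2/3 = 5/9
```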

Value

Performance value as numeric(1).

Meta Information

  • Type: "similarity"

  • Range: $[0, 1]$

  • Minimize: FALSE

References

Jaccard, Paul (1901). “Étude comparative de la distribution florale dans une portion des Alpes et du Jura.” Bulletin de la Société Vaudoise des Sciences Naturelles, 37, 547-579. doi:10.5169/SEALS-266450.

Bommert A, Rahnenführer J, Lang M (2017). “A Multicriteria Approach to Find Predictive and Sparse Models with Stable Feature Selection for High-Dimensional Data.” Computational and Mathematical Methods in Medicine, 2017, 1–18. doi:10.1155/2017/7907163.

Bommert A, Lang M (2021). “stabm: Stability Measures for Feature Selection.” Journal of Open Source Software, 6(59), 3010. doi:10.21105/joss.03010.

See Also

Package stabm, which implements many more stability measures, including measures corrected for chance.

Other Similarity Measures: phi()

Examples

set.seed(1)
sets = list(
  sample(letters[1:3], 1),
  sample(letters[1:3], 2)
)
jaccard(sets)

Kendall's tau

Description

Measure to compare true observed response with predicted response in regression tasks.

Usage

ktau(truth, response, ...)

Arguments

truth

(numeric())
True (observed) values. Must have the same length as response.

response

(numeric())
Predicted response values. Must have the same length as truth.

...

(any)
Additional arguments. Currently ignored.

Details

Kendall's tau is Kendall's rank correlation coefficient between truth and response, defined as

$$\tau = \frac{(\text{number of concordant pairs}) - (\text{number of discordant pairs})}{(\text{number of pairs})}.$$

Calls stats::cor() with method set to "kendall".
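The following base-R sketch makes the pair-counting definition concrete by counting concordant and discordant pairs directly. It assumes there are no ties in truth or response. Under that assumption, the count-based value agrees with `stats::cor(method = "kendall")`. The helper `kendall_tau()` is hypothetical.

```r
# Kendall's tau from concordant/discordant pair counts (assumes no ties)
kendall_tau = function(truth, response) {
  n = length(truth)
  concordant = 0
  discordant = 0
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      # positive product: both vectors order the pair the same way
      s = sign(truth[i] - truth[j]) * sign(response[i] - response[j])
      if (s > 0) concordant = concordant + 1
      if (s < 0) discordant = discordant + 1
    }
  }
  (concordant - discordant) / choose(n, 2)
}

set.seed(1)
truth = 1:10
response = truth + rnorm(10)
kendall_tau(truth, response)
cor(truth, response, method = "kendall")  # same value, since there are no ties
```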

Value

Performance value as numeric(1).

Meta Information

  • Type: "regr"

  • Range: [-1, 1]

  • Minimize: FALSE

  • Required prediction: response

References

Rosset S, Perlich C, Zadrozny B (2006). “Ranking-based evaluation of regression models.” Knowledge and Information Systems, 12(3), 331–353. doi:10.1007/s10115-006-0037-3.

See Also

Other Regression Measures: ae(), ape(), bias(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()

Examples

set.seed(1)
truth = 1:10
response = truth + rnorm(10)
ktau(truth, response)

Linear-Exponential Loss (per observation)

Description

Measure to compare true observed response with predicted response in regression tasks.

Note that this is an unaggregated measure, returning the losses per observation.

Usage

linex(truth, response, a = -1, b = 1, ...)

Arguments

truth

(numeric())
True (observed) values. Must have the same length as response.

response

(numeric())
Predicted response values. Must have the same length as truth.

a

(numeric(1))
Shape parameter controlling asymmetry. Negative values penalize overestimation more, positive values penalize underestimation more. As a approaches 0, the loss resembles squared error loss. Default is -1.

b

(numeric(1))
Positive scaling factor for the loss. Larger values increase the loss magnitude. Default is 1.

...

(any)
Additional arguments. Currently ignored.

Details

The Linear-Exponential Loss is defined as

$$b \left( \exp \left( a (t_i - r_i) \right) - a (t_i - r_i) - 1 \right),$$

where $a \neq 0$ and $b > 0$.
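Below is a per-observation sketch of the standard LINEX form b(exp(a e) - a e - 1), with e = t_i - r_i (Varian 1975), written in base R. The helper `linex_loss()` is hypothetical. It is assumed, not verified, to agree with linex(). The example shows the asymmetry controlled by a.

```r
# Per-observation LINEX loss with e = truth - response (hypothetical helper)
linex_loss = function(truth, response, a = -1, b = 1) {
  stopifnot(a != 0, b > 0)
  e = truth - response
  b * (exp(a * e) - a * e - 1)
}

# With a = -1, over-estimating by 1 costs more than under-estimating by 1
linex_loss(truth = 0, response = 1)   # exp(1) - 1 - 1, about 0.718
linex_loss(truth = 0, response = -1)  # exp(-1) + 1 - 1, about 0.368
```

For small |a|, a second-order Taylor expansion gives approximately b a^2 e^2 / 2. This is why the loss resembles squared error loss as a approaches 0.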

Value

Performance value as numeric(length(truth)).

Meta Information

  • Type: "regr"

  • Range (per observation): [0, ∞)

  • Minimize (per observation): TRUE

  • Required prediction: response

References

Varian, H. R. (1975). “A Bayesian Approach to Real Estate Assessment.” In Fienberg SE, Zellner A (eds.), Studies in Bayesian Econometrics and Statistics: In Honor of Leonard J. Savage, 195–208. North-Holland, Amsterdam.

See Also

Other Regression Measures: ae(), ape(), bias(), ktau(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()

Examples

set.seed(1)
truth = 1:10
response = truth + rnorm(10)
linex(truth, response)

Log Loss

Description

Measure to compare true observed labels with predicted probabilities in multiclass classification tasks.

Usage

logloss(truth, prob, sample_weights = NULL, eps = 1e-15, ...)

Arguments

truth

(factor())
True (observed) labels. Must have the same length as the number of rows of prob.

prob

(matrix())
Matrix of predicted probabilities, each column is a vector of probabilities for a specific class label. Columns must be named with levels of truth.

sample_weights

(numeric())
Vector of non-negative and finite sample weights. Must have the same length as truth. The vector gets automatically normalized to sum to one. Defaults to equal sample weights.

eps

(numeric(1))
Probabilities are clipped to max(eps, min(1 - eps, p)). Without clipping, the loss would be infinite whenever the predicted probability of the true class is 0.

...

(any)
Additional arguments. Currently ignored.

Details

The Log Loss (a.k.a. Bernoulli Loss, Logistic Loss, Cross-Entropy Loss) is defined as

$$-\frac{1}{n} \sum_{i=1}^n w_i \log \left( p_i \right),$$

where $p_i$ is the predicted probability of the true class of observation $i$ and $w_i$ are normalized weights for each observation $x_i$.
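The sketch below spells out the formula in base R. It picks each observation's probability for its true class, applies the eps clipping described for the eps argument, and averages the negative logs. It assumes the sample weights enter as a weighted mean (`stats::weighted.mean()`), which reduces to the plain mean for equal weights. `log_loss()` is a hypothetical helper, not the package's implementation.

```r
# Weighted log loss with probability clipping (hypothetical helper)
log_loss = function(truth, prob, sample_weights = NULL, eps = 1e-15) {
  # probability assigned to the true class of each observation
  p = prob[cbind(seq_along(truth), match(as.character(truth), colnames(prob)))]
  p = pmax(eps, pmin(1 - eps, p))  # clip to [eps, 1 - eps]
  w = if (is.null(sample_weights)) rep(1, length(truth)) else sample_weights
  -weighted.mean(log(p), w)
}

lvls = c("a", "b", "c")
truth = factor(c("a", "b", "c", "a"), levels = lvls)
prob = matrix(c(0.7, 0.2, 0.1,
                0.1, 0.8, 0.1,
                0.2, 0.2, 0.6,
                0.5, 0.3, 0.2),
  ncol = 3, byrow = TRUE, dimnames = list(NULL, lvls))
log_loss(truth, prob)  # -mean(log(c(0.7, 0.8, 0.6, 0.5)))
```

Without the clipping step, an observation whose true class received probability 0 would contribute -log(0) = Inf.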

Value

Performance value as numeric(1).

Meta Information

  • Type: "classif"

  • Range: [0, ∞)

  • Minimize: TRUE

  • Required prediction: prob

See Also

Other Classification Measures: acc(), bacc(), ce(), mauc_aunu(), mbrier(), mcc(), zero_one()

Examples

set.seed(1)
lvls = c("a", "b", "c")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
prob = matrix(runif(3 * 10), ncol = 3, dimnames = list(NULL, lvls))
prob = t(apply(prob, 1, function(x) x / sum(x)))
logloss(truth, prob)

Mean Absolute Error

Description

Measure to compare true observed response with predicted response in regression tasks.

Usage

mae(truth, response, sample_weights = NULL, ...)

Arguments

truth

(numeric())
True (observed) values. Must have the same length as response.

response

(numeric())
Predicted response values. Must have the same length as truth.

sample_weights

(numeric())
Vector of non-negative and finite sample weights. Must have the same length as truth. The vector gets automatically normalized to sum to one. Defaults to equal sample weights.

...

(any)
Additional arguments. Currently ignored.

Details

The Mean Absolute Error is defined as

$$\frac{1}{n} \sum_{i=1}^n w_i \left| t_i - r_i \right|,$$

where $w_i$ are normalized sample weights.
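Here is a base-R sketch of the weighted Mean Absolute Error. It assumes the normalized weights enter as a weighted mean, so equal weights give the ordinary mean. `mean_abs_error()` is a hypothetical helper.

```r
# Mean absolute error, optionally weighted (weights enter as a weighted mean)
mean_abs_error = function(truth, response, sample_weights = NULL) {
  w = if (is.null(sample_weights)) rep(1, length(truth)) else sample_weights
  weighted.mean(abs(truth - response), w)
}

truth = c(1, 2, 3, 4)
response = c(1.5, 2, 2, 4)
mean_abs_error(truth, response)                # (0.5 + 0 + 1 + 0) / 4 = 0.375
mean_abs_error(truth, response, c(1, 1, 2, 0)) # (0.5 + 0 + 2 + 0) / 4 = 0.625
```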

Value

Performance value as numeric(1).

Meta Information

  • Type: "regr"

  • Range: [0, ∞)

  • Minimize: TRUE

  • Required prediction: response

See Also

Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()

Examples

set.seed(1)
truth = 1:10
response = truth + rnorm(10)
mae(truth, response)

Mean Absolute Percent Error

Description

Measure to compare true observed response with predicted response in regression tasks.

Usage

mape(truth, response, sample_weights = NULL, na_value = NaN, ...)

Arguments

truth

(numeric())
True (observed) values. Must have the same length as response.

response

(numeric())
Predicted response values. Must have the same length as truth.

sample_weights

(numeric())
Vector of non-negative and finite sample weights. Must have the same length as truth. The vector gets automatically normalized to sum to one. Defaults to equal sample weights.

na_value

(numeric(1))
Value that should be returned if the measure is not defined for the input (as described in Details). Default is NaN.

...

(any)
Additional arguments. Currently ignored.

Details

The Mean Absolute Percent Error is defined as

$$\frac{1}{n} \sum_{i=1}^n w_i \left| \frac{ t_i - r_i}{t_i} \right|,$$

where $w_i$ are normalized sample weights.

This measure is undefined if any element of $t$ is 0.
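Here is a base-R sketch of MAPE that returns `na_value` when any true value is 0, mirroring the undefined case above. `mean_abs_pct_error()` is hypothetical and unweighted for brevity. As in the formula, the result is a fraction (0.1), not a percentage (10).

```r
# Mean absolute percent error; returns na_value when any truth value is 0
mean_abs_pct_error = function(truth, response, na_value = NaN) {
  if (any(truth == 0)) return(na_value)
  mean(abs((truth - response) / truth))
}

mean_abs_pct_error(c(100, 200), c(110, 180))  # (0.10 + 0.10) / 2 = 0.1
mean_abs_pct_error(c(0, 200), c(1, 180))      # NaN: a truth value is 0
```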

Value

Performance value as numeric(1).

Meta Information

  • Type: "regr"

  • Range: [0, ∞)

  • Minimize: TRUE

  • Required prediction: response

References

de Myttenaere, Arnaud, Golden, Boris, Le Grand, Bénédicte, Rossi, Fabrice (2016). “Mean Absolute Percentage Error for regression models.” Neurocomputing, 192, 38-48. ISSN 0925-2312, doi:10.1016/j.neucom.2015.12.114.

See Also

Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()

Examples

set.seed(1)
truth = 1:10
response = truth + rnorm(10)
mape(truth, response)

Multiclass AUC Scores

Description

Measure to compare true observed labels with predicted probabilities in multiclass classification tasks.

Usage

mauc_aunu(truth, prob, na_value = NaN, ...)

mauc_aunp(truth, prob, na_value = NaN, ...)

mauc_au1u(truth, prob, na_value = NaN, ...)

mauc_au1p(truth, prob, na_value = NaN, ...)

mauc_mu(truth, prob, na_value = NaN, ...)

Arguments

truth

(factor())
True (observed) labels. Must have the same length as the number of rows of prob.

prob

(matrix())
Matrix of predicted probabilities, each column is a vector of probabilities for a specific class label. Columns must be named with levels of truth.

na_value

(numeric(1))
Value that should be returned if the measure is not defined for the input. Default is NaN.

...

(any)
Additional arguments. Currently ignored.

Details

Multiclass AUC measures.

  • AUNU: AUC of each class against the rest, using the uniform class distribution. Computes the AUC by treating a c-dimensional classifier as c binary one-vs-rest classifiers, where the classes are assumed to be uniformly distributed, so that the measure is independent of changes in the class distribution (Fawcett 2001).

  • AUNP: AUC of each class against the rest, using the a-priori class distribution. Computes the AUC by treating a c-dimensional classifier as c binary one-vs-rest classifiers, taking into account the prior probability of each class (Fawcett 2001).

  • AU1U: AUC of each class against each other, using the uniform class distribution. Averages the AUCs of the c(c - 1) binary classifiers obtained from all pairwise combinations of classes. See Hand and Till (2001) for details.

  • AU1P: AUC of each class against each other, using the a-priori class distribution. Computes the AUC of the c(c - 1) pairwise binary classifiers while taking the a-priori class distribution into account, as suggested in Ferri (2009). Note that we deviate from the definition in Ferri (2009) by a factor of c.

  • MU: Multiclass AUC as defined in Kleiman and Page (2019). This measure is an average of the pairwise AUCs between all classes. The measure was tested against the Python implementation by Ross Kleiman.
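As an illustration of the first variant only, the following base-R sketch computes AUNU. It calculates a one-vs-rest AUC for each class, each obtained from the Mann-Whitney rank statistic, and then takes an unweighted mean over classes. It is not the mlr3measures implementation. `binary_auc()` and `aunu()` are hypothetical helpers, and the AUNP, AU1U, AU1P and MU variants weight or pair the classes differently.

```r
# Binary AUC via the Mann-Whitney rank statistic (ties get average ranks)
binary_auc = function(is_pos, score) {
  r = rank(score)
  n_pos = sum(is_pos)
  n_neg = sum(!is_pos)
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# AUNU sketch: unweighted mean of the one-vs-rest AUCs of all classes
aunu = function(truth, prob) {
  aucs = vapply(levels(truth), function(cl) binary_auc(truth == cl, prob[, cl]), numeric(1))
  mean(aucs)
}

lvls = c("a", "b", "c")
truth = factor(c("a", "a", "b", "b", "c", "c"), levels = lvls)
prob = matrix(c(0.8, 0.1, 0.1,
                0.6, 0.3, 0.1,
                0.2, 0.7, 0.1,
                0.2, 0.3, 0.5,
                0.1, 0.2, 0.7,
                0.3, 0.1, 0.6),
  ncol = 3, byrow = TRUE, dimnames = list(NULL, lvls))
aunu(truth, prob)  # (1 + 0.9375 + 1) / 3: class "b" has one tied pair
```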

Value

Performance value as numeric(1).

Meta Information

  • Type: "classif"

  • Range: [0, 1]

  • Minimize: FALSE

  • Required prediction: prob

References

Fawcett, Tom (2001). “Using rule sets to maximize ROC performance.” In Proceedings 2001 IEEE international conference on data mining, 131–138. IEEE.

Ferri, César, Hernández-Orallo, José, Modroiu, R (2009). “An experimental comparison of performance measures for classification.” Pattern Recognition Letters, 30(1), 27–38. doi:10.1016/j.patrec.2008.08.010.

Hand, D J, Till, R J (2001). “A simple generalisation of the area under the ROC curve for multiple class classification problems.” Machine Learning, 45(2), 171–186.

Kleiman R, Page D (2019). “AUC mu: A Performance Metric for Multi-Class Machine Learning Models.” In Chaudhuri, Kamalika, Salakhutdinov, Ruslan (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 series Proceedings of Machine Learning Research, 3439–3447. PMLR.

See Also

Other Classification Measures: acc(), bacc(), ce(), logloss(), mbrier(), mcc(), zero_one()

Examples

set.seed(1)
lvls = c("a", "b", "c")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
prob = matrix(runif(3 * 10), ncol = 3)
colnames(prob) = levels(truth)
mauc_aunu(truth, prob)

Max Absolute Error

Description

Measure to compare true observed response with predicted response in regression tasks.

Usage

maxae(truth, response, ...)

Arguments

truth

(numeric())
True (observed) values. Must have the same length as response.

response

(numeric())
Predicted response values. Must have the same length as truth.

...

(any)
Additional arguments. Currently ignored.

Details

The Max Absolute Error is defined as

$$\max_i \left| t_i - r_i \right|.$$

Value

Performance value as numeric(1).

Meta Information

  • Type: "regr"

  • Range: [0, ∞)

  • Minimize: TRUE

  • Required prediction: response

See Also

Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()

Examples

set.seed(1)
truth = 1:10
response = truth + rnorm(10)
maxae(truth, response)

Max Squared Error

Description

Measure to compare true observed response with predicted response in regression tasks.

Usage

maxse(truth, response, ...)

Arguments

truth

(numeric())
True (observed) values. Must have the same length as response.

response

(numeric())
Predicted response values. Must have the same length as truth.

...

(any)
Additional arguments. Currently ignored.

Details

The Max Squared Error is defined as

$$\max_i \left( t_i - r_i \right)^2.$$
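Because squaring is monotone on non-negative numbers, the Max Squared Error equals the square of the Max Absolute Error. The base-R lines below check this on the example data used throughout this manual. They do not call mlr3measures.

```r
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
max_abs = max(abs(truth - response))  # Max Absolute Error
max_sq = max((truth - response)^2)    # Max Squared Error
all.equal(max_sq, max_abs^2)          # TRUE
```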

Value

Performance value as numeric(1).

Meta Information

  • Type: "regr"

  • Range: [0, ∞)

  • Minimize: TRUE

  • Required prediction: response

See Also

Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()

Examples

set.seed(1)
truth = 1:10
response = truth + rnorm(10)
maxse(truth, response)

Multiclass Brier Score

Description

Measure to compare true observed labels with predicted probabilities in multiclass classification tasks.

Usage

mbrier(truth, prob, ...)

Arguments

truth

(factor())
True (observed) labels. Must have the same length as the number of rows of prob.

prob

(matrix())
Matrix of predicted probabilities, each column is a vector of probabilities for a specific class label. Columns must be named with levels of truth.

...

(any)
Additional arguments. Currently ignored.

Details

Brier score for multi-class classification problems with $k$ labels, defined as

$$\frac{1}{n} \sum_{i=1}^n \sum_{j=1}^k (I_{ij} - p_{ij})^2.$$

$I_{ij}$ is 1 if observation $x_i$ has true label $j$, and 0 otherwise. $p_{ij}$ is the predicted probability that observation $x_i$ belongs to class $j$.

Note that the more common definition of the Brier score for binary classification problems is implemented in bbrier().
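Here is a base-R sketch of the formula. It builds the one-hot indicator matrix I from truth, then averages the per-row sums of squared differences to prob. `multiclass_brier()` is a hypothetical helper. The second observation in the example is confidently wrong and contributes the maximum of 2, which is where the upper end of the range comes from.

```r
# Multiclass Brier score: mean over observations of the squared distance
# between the one-hot truth vector and the predicted probability vector
multiclass_brier = function(truth, prob) {
  onehot = outer(as.character(truth), colnames(prob), "==") * 1  # n x k indicator matrix I
  mean(rowSums((onehot - prob)^2))
}

lvls = c("a", "b", "c")
truth = factor(c("a", "b"), levels = lvls)
prob = matrix(c(0.7, 0.2, 0.1,
                0.0, 0.0, 1.0),
  ncol = 3, byrow = TRUE, dimnames = list(NULL, lvls))
multiclass_brier(truth, prob)  # (0.14 + 2) / 2 = 1.07
```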

Value

Performance value as numeric(1).

Meta Information

  • Type: "classif"

  • Range: [0, 2]

  • Minimize: TRUE

  • Required prediction: prob

References

Brier GW (1950). “Verification of forecasts expressed in terms of probability.” Monthly Weather Review, 78(1), 1–3. doi:10.1175/1520-0493(1950)078<0001:vofeit>2.0.co;2.

See Also

Other Classification Measures: acc(), bacc(), ce(), logloss(), mauc_aunu(), mcc(), zero_one()

Examples

set.seed(1)
lvls = c("a", "b", "c")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
prob = matrix(runif(3 * 10), ncol = 3)
colnames(prob) = levels(truth)
mbrier(truth, prob)

Matthews Correlation Coefficient

Description

Measure to compare true observed labels with predicted labels in multiclass classification tasks.

Usage

mcc(truth, response, positive = NULL, ...)

Arguments

truth

(factor())
True (observed) labels. Must have the same levels and length as response.

response

(factor())
Predicted response labels. Must have the same levels and length as truth.

positive

(character(1))
Name of the positive class in case of binary classification.

...

(any)
Additional arguments. Currently ignored.

Details

In the binary case, the Matthews Correlation Coefficient is defined as

$$\frac{\mathrm{TP} \cdot \mathrm{TN} - \mathrm{FP} \cdot \mathrm{FN}}{\sqrt{(\mathrm{TP} + \mathrm{FP}) (\mathrm{TP} + \mathrm{FN}) (\mathrm{TN} + \mathrm{FP}) (\mathrm{TN} + \mathrm{FN})}},$$

where TP, FP, TN, and FN are the numbers of true positives, false positives, true negatives, and false negatives, respectively.

In the multi-class case, the Matthews Correlation Coefficient is defined for a multi-class confusion matrix $C$ with $K$ classes as

$$\frac{c \cdot s - \sum_k^K p_k \cdot t_k}{\sqrt{(s^2 - \sum_k^K p_k^2) \cdot (s^2 - \sum_k^K t_k^2)}},$$

where

  • $s = \sum_i^K \sum_j^K C_{ij}$: total number of samples

  • $c = \sum_k^K C_{kk}$: total number of correctly predicted samples

  • $t_k = \sum_i^K C_{ik}$: number of predictions for each class $k$

  • $p_k = \sum_j^K C_{kj}$: number of true occurrences for each class $k$.

The above formula is undefined if any of the four sums in the denominator is 0 in the binary case, and more generally if either $s^2 - \sum_k^K p_k^2$ or $s^2 - \sum_k^K t_k^2$ is equal to 0. In that case, the denominator is set to 1.

When there are more than two classes, the MCC will no longer range between -1 and +1. Instead, the minimum value will be between -1 and 0 depending on the true distribution. The maximum value is always +1.
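The multi-class formula can be sketched in base R from a confusion matrix. The orientation of the table used here (rows = response) is an assumption, but the formula is symmetric in t_k and p_k, so the orientation does not change the value. `mcc_multiclass()` is hypothetical. The example also checks that, for two classes, the result equals the Pearson correlation of the 0/1 indicators, that is, the phi coefficient.

```r
# Multi-class MCC from a confusion matrix (rows = response, columns = truth)
mcc_multiclass = function(truth, response) {
  C = table(response, truth)
  s = sum(C)                  # total number of samples
  n_correct = sum(diag(C))    # correctly predicted samples (c in the formula)
  t_k = colSums(C)            # per-class totals along one margin
  p_k = rowSums(C)            # per-class totals along the other margin
  denom = sqrt((s^2 - sum(p_k^2)) * (s^2 - sum(t_k^2)))
  if (denom == 0) denom = 1   # convention from the Details section
  (n_correct * s - sum(p_k * t_k)) / denom
}

lvls = c("a", "b")
truth = factor(c("a", "a", "b", "b", "a"), levels = lvls)
response = factor(c("a", "b", "b", "b", "a"), levels = lvls)
mcc_multiclass(truth, response)                               # 8 / 12 = 2/3
cor(as.numeric(truth == "a"), as.numeric(response == "a"))    # same value
```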

Value

Performance value as numeric(1).

Meta Information

  • Type: "classif"

  • Range: [-1, 1]

  • Minimize: FALSE

  • Required prediction: response

References

https://en.wikipedia.org/wiki/Phi_coefficient

Matthews BW (1975). “Comparison of the predicted and observed secondary structure of T4 phage lysozyme.” Biochimica et Biophysica Acta (BBA) - Protein Structure, 405(2), 442–451. doi:10.1016/0005-2795(75)90109-9.

See Also

Other Classification Measures: acc(), bacc(), ce(), logloss(), mauc_aunu(), mbrier(), zero_one()

Examples

set.seed(1)
lvls = c("a", "b", "c")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
mcc(truth, response)

Measure Registry

Description

The environment measures keeps track of all measures in this package. It stores meta information such as the lower and upper bounds and whether the measure should be minimized or maximized. The following information is available for each measure:

  • id: Name of the measure.

  • title: Short descriptive title.

  • type: "binary" for binary classification, "classif" for binary or multi-class classification, "regr" for regression and "similarity" for similarity measures.

  • lower: lower bound.

  • upper: upper bound.

  • predict_type: prediction type the measure operates on. "response" corresponds to class labels for classification and the numeric response for regression. "prob" corresponds to class probabilities, provided as a matrix with class labels as column names. "se" corresponds to the vector of predicted standard errors for regression.

  • minimize: If TRUE or FALSE, the objective is to minimize or maximize the measure, respectively. Can also be NA.

  • obs_loss: Name of the function which is called to calculate the (unaggregated) loss per observation.

  • trafo: Optional list() of length 2, containing a transformation "fn" and its derivative "deriv".

  • aggregated: If TRUE, this function aggregates the losses to a single numeric value. Otherwise, a vector of losses is returned.

  • sample_weights: If TRUE, it is possible to calculate a weighted measure.

Usage

measures

Format

An object of class environment of length 65.

Examples

names(measures)
measures$tpr

Median Absolute Error

Description

Measure to compare true observed response with predicted response in regression tasks.

Usage

medae(truth, response, ...)

Arguments

truth

(numeric())
True (observed) values. Must have the same length as response.

response

(numeric())
Predicted response values. Must have the same length as truth.

...

(any)
Additional arguments. Currently ignored.

Details

The Median Absolute Error is defined as

$$\mathop{\mathrm{median}} \left| t_i - r_i \right|.$$
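The short base-R comparison below, using made-up data, shows why the median can be preferable. A single badly wrong prediction dominates the mean absolute error but barely moves the median.

```r
# Median vs mean absolute error when one prediction is far off
truth = c(1, 2, 3, 4, 5)
response = c(1.1, 2.2, 2.9, 4.1, 25)
abs_err = abs(truth - response)  # about 0.1, 0.2, 0.1, 0.1, 20
median(abs_err)                  # about 0.1
mean(abs_err)                    # about 4.1
```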

Value

Performance value as numeric(1).

Meta Information

  • Type: "regr"

  • Range: [0, ∞)

  • Minimize: TRUE

  • Required prediction: response

See Also

Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()

Examples

set.seed(1)
truth = 1:10
response = truth + rnorm(10)
medae(truth, response)

Median Squared Error

Description

Measure to compare true observed response with predicted response in regression tasks.

Usage

medse(truth, response, ...)

Arguments

truth

(numeric())
True (observed) values. Must have the same length as response.

response

(numeric())
Predicted response values. Must have the same length as truth.

...

(any)
Additional arguments. Currently ignored.

Details

The Median Squared Error is defined as

$$\mathop{\mathrm{median}} \left[ \left( t_i - r_i \right)^2 \right].$$

Value

Performance value as numeric(1).

Meta Information

  • Type: "regr"

  • Range: [0, ∞)

  • Minimize: TRUE

  • Required prediction: response

See Also

Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()

Examples

set.seed(1)
truth = 1:10
response = truth + rnorm(10)
medse(truth, response)

Mean Squared Error

Description

Measure to compare true observed response with predicted response in regression tasks.

Usage

mse(truth, response, sample_weights = NULL, ...)

Arguments

truth

(numeric())
True (observed) values. Must have the same length as response.

response

(numeric())
Predicted response values. Must have the same length as truth.

sample_weights

(numeric())
Vector of non-negative and finite sample weights. Must have the same length as truth. The vector gets automatically normalized to sum to one. Defaults to equal sample weights.

...

(any)
Additional arguments. Currently ignored.

Details

The Mean Squared Error is defined as

$$\frac{1}{n} \sum_{i=1}^n w_i \left( t_i - r_i \right)^2,$$

where $w_i$ are normalized sample weights.
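Here is a base-R sketch of the weighted MSE. It normalizes the weights to sum to one and then sums the weighted squared errors; with equal weights this is the ordinary mean. `mean_sq_error()` is a hypothetical helper, and this reading of how the weights are applied is an assumption.

```r
# Weighted mean squared error with weights normalized to sum to one
mean_sq_error = function(truth, response, sample_weights = NULL) {
  w = if (is.null(sample_weights)) rep(1, length(truth)) else sample_weights
  w = w / sum(w)  # normalize the weights to sum to one
  sum(w * (truth - response)^2)
}

truth = c(1, 2, 3)
response = c(2, 2, 5)
mean_sq_error(truth, response)              # (1 + 0 + 4) / 3
mean_sq_error(truth, response, c(2, 1, 1))  # 0.5 * 1 + 0.25 * 0 + 0.25 * 4 = 1.5
```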

Value

Performance value as numeric(1).

Meta Information

  • Type: "regr"

  • Range: [0, ∞)

  • Minimize: TRUE

  • Required prediction: response

See Also

Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()

Examples

set.seed(1)
truth = 1:10
response = truth + rnorm(10)
mse(truth, response)

Mean Squared Log Error

Description

Measure to compare true observed response with predicted response in regression tasks.

Usage

msle(truth, response, sample_weights = NULL, na_value = NaN, ...)

Arguments

truth

(numeric())
True (observed) values. Must have the same length as response.

response

(numeric())
Predicted response values. Must have the same length as truth.

sample_weights

(numeric())
Vector of non-negative and finite sample weights. Must have the same length as truth. The vector gets automatically normalized to sum to one. Defaults to equal sample weights.

na_value

(numeric(1))
Value that should be returned if the measure is not defined for the input (as described in Details). Default is NaN.

...

(any)
Additional arguments. Currently ignored.

Details

The Mean Squared Log Error is defined as

$$\frac{1}{n} \sum_{i=1}^n w_i \left( \ln (1 + t_i) - \ln (1 + r_i) \right)^2,$$

where $w_i$ are normalized sample weights. This measure is undefined if any element of $t$ or $r$ is less than or equal to -1.
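Here is a base-R sketch using `log1p(x)`, which computes log(1 + x) accurately for small x. It returns `na_value` when an input is at or below -1. `mean_sq_log_error()` is hypothetical and unweighted for brevity.

```r
# Mean squared log error via log1p(x) = log(1 + x)
mean_sq_log_error = function(truth, response, na_value = NaN) {
  if (any(truth <= -1) || any(response <= -1)) return(na_value)  # log(1 + x) undefined
  mean((log1p(truth) - log1p(response))^2)
}

mean_sq_log_error(c(0, 1, 3), c(0, 3, 1))  # (0 + log(2)^2 + log(2)^2) / 3
mean_sq_log_error(c(0, 1), c(-1, 1))       # NaN
```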

Value

Performance value as numeric(1).

Meta Information

  • Type: "regr"

  • Range: [0, ∞)

  • Minimize: TRUE

  • Required prediction: response

See Also

Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()

Examples

set.seed(1)
truth = 1:10
response = truth + rnorm(10)
msle(truth, response)

Negative Predictive Value

Description

Measure to compare true observed labels with predicted labels in binary classification tasks.

Usage

npv(truth, response, positive, na_value = NaN, ...)

Arguments

truth

(factor())
True (observed) labels. Must have exactly the same two levels and the same length as response.

response

(factor())
Predicted response labels. Must have exactly the same two levels and the same length as truth.

positive

(character(1))
Name of the positive class.

na_value

(numeric(1))
Value that should be returned if the measure is not defined for the input (as described in Details). Default is NaN.

...

(any)
Additional arguments. Currently ignored.

Details

The Negative Predictive Value is defined as

$$\frac{\mathrm{TN}}{\mathrm{FN} + \mathrm{TN}}.$$

This measure is undefined if FN + TN = 0.
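Here is a base-R sketch that derives TN and FN from the two label vectors for a given positive class. `neg_pred_value()` is a hypothetical helper, and the toy labels are made up.

```r
# Negative predictive value from the confusion counts
neg_pred_value = function(truth, response, positive, na_value = NaN) {
  tn = sum(truth != positive & response != positive)  # true negatives
  fn = sum(truth == positive & response != positive)  # false negatives
  if (fn + tn == 0) return(na_value)
  tn / (fn + tn)
}

truth    = factor(c("a", "a", "b", "b", "b"), levels = c("a", "b"))
response = factor(c("a", "b", "b", "b", "a"), levels = c("a", "b"))
neg_pred_value(truth, response, positive = "a")  # TN = 2, FN = 1, so 2 / 3
```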

Value

Performance value as numeric(1).

Meta Information

  • Type: "binary"

  • Range: [0, 1]

  • Minimize: FALSE

  • Required prediction: response

References

https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram

See Also

Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), ppv(), prauc(), tn(), tnr(), tp(), tpr()

Examples

set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
npv(truth, response, positive = "a")

Percent Bias

Description

Measure to compare true observed response with predicted response in regression tasks.

Usage

pbias(truth, response, sample_weights = NULL, na_value = NaN, ...)

Arguments

truth

(numeric())
True (observed) values. Must have the same length as response.

response

(numeric())
Predicted response values. Must have the same length as truth.

sample_weights

(numeric())
Vector of non-negative and finite sample weights. Must have the same length as truth. The vector gets automatically normalized to sum to one. Defaults to equal sample weights.

na_value

(numeric(1))
Value that should be returned if the measure is not defined for the input. Default is NaN.

...

(any)
Additional arguments. Currently ignored.

Details

The Percent Bias is defined as

$$\frac{1}{n} \sum_{i=1}^n w_i \frac{\left( t_i - r_i \right)}{\left| t_i \right|},$$

where $w_i$ are normalized sample weights. Good predictions score close to 0.
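Because the numerator keeps its sign, over- and under-predictions can cancel, which is why the measure has no fixed direction to optimize (Minimize: NA). The following small, unweighted base-R sketch illustrates this with a hypothetical `percent_bias()` helper.

```r
# Percent bias: signed relative error averaged over observations
percent_bias = function(truth, response) {
  mean((truth - response) / abs(truth))
}

percent_bias(c(10, 10), c(8, 8))   # 0.2: predictions too low give a positive value
percent_bias(c(10, 10), c(12, 8))  # 0: errors of opposite sign cancel
```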

Value

Performance value as numeric(1).

Meta Information

  • Type: "regr"

  • Range: (-∞, ∞)

  • Minimize: NA

  • Required prediction: response

See Also

Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()

Examples

set.seed(1)
truth = 1:10
response = truth + rnorm(10)
pbias(truth, response)

Phi Coefficient Similarity

Description

Measure to compare two or more sets w.r.t. their similarity.

Usage

phi(sets, p, na_value = NaN, ...)

Arguments

sets

(list())
List of character or integer vectors. sets must have at least 2 elements.

p

(integer(1))
Total number of possible elements.

na_value

(numeric(1))
Value that should be returned if the measure is not defined for the input (as described in Details). Default is NaN.

...

(any)
Additional arguments. Currently ignored.

Details

The Phi Coefficient is defined as the Pearson correlation between the binary representations of two sets A and B. The binary representation of A is a logical vector of length p whose i-th element is 1 if the corresponding element is in A, and 0 otherwise.

If more than two sets are provided, the mean of all pairwise scores is calculated.

This measure is undefined if one set contains none or all possible elements.
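Here is a base-R sketch for two sets. It assumes the p possible elements are given explicitly as a character vector `universe`; this argument is hypothetical, since phi() itself only takes the count p. `phi_pair()` is likewise a hypothetical helper.

```r
# Phi coefficient of two sets: Pearson correlation of their membership vectors
phi_pair = function(a, b, universe) {
  x = as.numeric(universe %in% a)  # 1 if the element is in set a
  y = as.numeric(universe %in% b)  # 1 if the element is in set b
  cor(x, y)
}

universe = letters[1:4]                       # p = 4 possible elements
phi_pair(c("a", "b"), c("a", "b"), universe)  # 1: identical sets
phi_pair(c("a", "b"), c("c", "d"), universe)  # -1: complementary sets
phi_pair(c("a", "b"), c("a", "c"), universe)  # 0
```

If one set is empty or contains every element of `universe`, its membership vector is constant. In that case cor() returns NA with a warning, which corresponds to the undefined case above.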

Value

Performance value as numeric(1).

Meta Information

  • Type: "similarity"

  • Range: [-1, 1]

  • Minimize: FALSE

References

Nogueira S, Brown G (2016). “Measuring the Stability of Feature Selection.” In Machine Learning and Knowledge Discovery in Databases, 442–457. Springer International Publishing. doi:10.1007/978-3-319-46227-1_28.

Bommert A, Rahnenführer J, Lang M (2017). “A Multicriteria Approach to Find Predictive and Sparse Models with Stable Feature Selection for High-Dimensional Data.” Computational and Mathematical Methods in Medicine, 2017, 1–18. doi:10.1155/2017/7907163.

Bommert A, Lang M (2021). “stabm: Stability Measures for Feature Selection.” Journal of Open Source Software, 6(59), 3010. doi:10.21105/joss.03010.

See Also

Package stabm which implements many more stability measures with included correction for chance.

Other Similarity Measures: jaccard()

Examples

set.seed(1)
sets = list(
  sample(letters[1:3], 1),
  sample(letters[1:3], 2)
)
phi(sets, p = 3)

Average Pinball Loss

Description

Measure to compare true observed response with predicted response in regression tasks.

Usage

pinball(truth, response, sample_weights = NULL, alpha = 0.5, ...)

Arguments

truth

(numeric())
True (observed) values. Must have the same length as response.

response

(numeric())
Predicted response values. Must have the same length as truth.

sample_weights

(numeric())
Vector of non-negative and finite sample weights. Must have the same length as truth. The vector gets automatically normalized to sum to one. Defaults to equal sample weights.

alpha

(numeric(1))
The quantile level at which to compute the pinball loss.

...

(any)
Additional arguments. Currently ignored.

Details

The pinball loss for quantile regression is defined as

$$\text{Average Pinball Loss} = \frac{1}{n} \sum_{i=1}^{n} w_{i} \begin{cases} q \cdot (t_i - r_i) & \text{if } t_i \geq r_i \\ (1 - q) \cdot (r_i - t_i) & \text{if } t_i < r_i \end{cases}$$

where $q$ is the quantile level (argument alpha) and $w_i$ are normalized sample weights.
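Here is a base-R sketch of the piecewise definition for a single quantile level q (the alpha argument), unweighted for brevity. `pinball_loss()` is a hypothetical helper. The example shows the asymmetry at q = 0.9: under-predicting by 2 costs nine times as much as over-predicting by 2.

```r
# Average pinball (quantile) loss at quantile level q
pinball_loss = function(truth, response, q = 0.5) {
  e = truth - response
  # (q - 1) * e equals (1 - q) * (response - truth) for the case truth < response
  mean(ifelse(e >= 0, q * e, (q - 1) * e))
}

truth = c(10, 10)
pinball_loss(truth, c(8, 8), q = 0.9)    # 1.8: under-prediction is penalized heavily
pinball_loss(truth, c(12, 12), q = 0.9)  # 0.2: over-prediction is penalized lightly
```

At q = 0.5 the loss is half the mean absolute error.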

Value

Performance value as numeric(1).

Meta Information

  • Type: "regr"

  • Range: (-∞, ∞)

  • Minimize: TRUE

  • Required prediction: response

See Also

Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()

Examples

set.seed(1)
truth = 1:10
response = truth + rnorm(10)
pinball(truth, response)

Positive Predictive Value

Description

Measure to compare true observed labels with predicted labels in binary classification tasks.

Usage

ppv(truth, response, positive, na_value = NaN, ...)

precision(truth, response, positive, na_value = NaN, ...)

Arguments

truth

(factor())
True (observed) labels. Must have exactly the same two levels and the same length as response.

response

(factor())
Predicted response labels. Must have exactly the same two levels and the same length as truth.

positive

(character(1))
Name of the positive class.

na_value

(numeric(1))
Value that should be returned if the measure is not defined for the input (as described in Details). Default is NaN.

...

(any)
Additional arguments. Currently ignored.

Details

The Positive Predictive Value is defined as

$$\frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}.$$

Also known as "precision".

This measure is undefined if TP + FP = 0.
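Here is a base-R sketch that computes TP and FP from the label vectors, using a hypothetical `positive_pred_value()` helper and made-up labels. It returns `na_value` when no observation is predicted positive.

```r
# Positive predictive value (precision) from the confusion counts
positive_pred_value = function(truth, response, positive, na_value = NaN) {
  tp = sum(truth == positive & response == positive)  # true positives
  fp = sum(truth != positive & response == positive)  # false positives
  if (tp + fp == 0) return(na_value)
  tp / (tp + fp)
}

truth    = factor(c("a", "a", "b", "b", "b"), levels = c("a", "b"))
response = factor(c("a", "b", "b", "b", "a"), levels = c("a", "b"))
positive_pred_value(truth, response, positive = "a")  # TP = 1, FP = 1, so 0.5
```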

Value

Performance value as numeric(1).

Meta Information

  • Type: "binary"

  • Range: [0, 1]

  • Minimize: FALSE

  • Required prediction: response

References

https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram

Goutte C, Gaussier E (2005). “A Probabilistic Interpretation of Precision, Recall and F-Score, with Implication for Evaluation.” In Lecture Notes in Computer Science, 345–359. doi:10.1007/978-3-540-31865-1_25.

See Also

Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), prauc(), tn(), tnr(), tp(), tpr()

Examples

set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
ppv(truth, response, positive = "a")

Area Under the Precision-Recall Curve

Description

Measure to compare true observed labels with predicted probabilities in binary classification tasks.

Usage

prauc(truth, prob, positive, na_value = NaN, ...)

Arguments

truth

(factor())
True (observed) labels. Must have exactly two levels and the same length as prob.

prob

(numeric())
Predicted probability for the positive class. Must have exactly the same length as truth.

positive

(character(1))
Name of the positive class.

na_value

(numeric(1))
Value that should be returned if the measure is not defined for the input (as described in the note). Default is NaN.

...

(any)
Additional arguments. Currently ignored.

Details

Computes the area under the Precision-Recall curve (PRC). The PRC describes the relationship between precision and recall (sensitivity) across classification thresholds and is often considered more informative than the ROC curve for imbalanced datasets. The AUC-PRC is computed by integrating the piecewise function.

This measure is undefined if the true values are either all positive or all negative.
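The exact piecewise interpolation used by prauc() is not reproduced here. As a rough illustration only, the sketch below swaps in average precision, a common step-wise approximation of the area under the PR curve. It ignores tied probabilities and can differ from the value prauc() returns. The helper name ap_sketch() is hypothetical.

```r
# Hypothetical helper (not part of mlr3measures): average precision, a
# step-wise approximation of the PR-curve area, not the package's method.
ap_sketch = function(truth, prob, positive, na_value = NaN) {
  is_pos = truth == positive
  if (all(is_pos) || !any(is_pos)) return(na_value)  # undefined case
  is_pos = is_pos[order(prob, decreasing = TRUE)]    # rank by predicted prob
  precision = cumsum(is_pos) / seq_along(is_pos)     # precision at each cutoff
  mean(precision[is_pos])                            # average over positives
}

truth = factor(c("a", "a", "a", "b"))
prob = c(.6, .7, .1, .4)
ap_sketch(truth, prob, "a")  # (1 + 1 + 3/4) / 3 = 11/12
```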

Value

Performance value as numeric(1).

Meta Information

  • Type: "binary"

  • Range: [0, 1]

  • Minimize: FALSE

  • Required prediction: prob

References

Davis J, Goadrich M (2006). “The relationship between precision-recall and ROC curves.” In Proceedings of the 23rd International Conference on Machine Learning. ISBN 9781595933836.

See Also

Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), tn(), tnr(), tp(), tpr()

Examples

truth = factor(c("a", "a", "a", "b"))
prob = c(.6, .7, .1, .4)
prauc(truth, prob, "a")

Relative Absolute Error

Description

Measure to compare true observed response with predicted response in regression tasks.

Usage

rae(truth, response, na_value = NaN, ...)

Arguments

truth

(numeric())
True (observed) values. Must have the same length as response.

response

(numeric())
Predicted response values. Must have the same length as truth.

na_value

(numeric(1))
Value that should be returned if the measure is not defined for the input (see Details). Default is NaN.

...

(any)
Additional arguments. Currently ignored.

Details

The Relative Absolute Error is defined as

\frac{\sum_{i=1}^n \left| t_i - r_i \right|}{\sum_{i=1}^n \left| t_i - \bar{t} \right|},

where \bar{t} = \frac{1}{n} \sum_{i=1}^n t_i is the mean of the observed values. This measure is undefined for constant t.

Can be interpreted as the absolute error of the predictions relative to that of a naive model that always predicts the mean.
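The baseline interpretation can be made concrete with a small base-R sketch. This is not the package's implementation, and the helper name rae_sketch() is hypothetical. A model that always predicts the mean scores exactly 1.

```r
# Hypothetical helper (not part of mlr3measures): base-R sketch of RAE.
rae_sketch = function(truth, response, na_value = NaN) {
  baseline = sum(abs(truth - mean(truth)))  # error of always predicting the mean
  if (baseline == 0) return(na_value)       # constant truth: undefined
  sum(abs(truth - response)) / baseline
}

truth = c(1, 2, 3)
rae_sketch(truth, c(1, 2, 4))           # 1 / 2 = 0.5
rae_sketch(truth, rep(mean(truth), 3))  # the mean predictor scores 1
```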

Value

Performance value as numeric(1).

Meta Information

  • Type: "regr"

  • Range: [0, \infty)

  • Minimize: TRUE

  • Required prediction: response

See Also

Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()

Examples

set.seed(1)
truth = 1:10
response = truth + rnorm(10)
rae(truth, response)

Root Mean Squared Error

Description

Measure to compare true observed response with predicted response in regression tasks.

Usage

rmse(truth, response, sample_weights = NULL, ...)

Arguments

truth

(numeric())
True (observed) values. Must have the same length as response.

response

(numeric())
Predicted response values. Must have the same length as truth.

sample_weights

(numeric())
Vector of non-negative and finite sample weights. Must have the same length as truth. The vector gets automatically normalized to sum to one. Defaults to equal sample weights.

...

(any)
Additional arguments. Currently ignored.

Details

The Root Mean Squared Error is defined as

\sqrt{\sum_{i=1}^n w_i \left( t_i - r_i \right)^2},

where w_i are the sample weights normalized to sum to one (w_i = 1/n for equal weights).
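A minimal base-R sketch of the weighted formula, assuming weights are normalized to sum to one as described under sample_weights. It is not the package's implementation; the helper name rmse_sketch() is hypothetical.

```r
# Hypothetical helper (not part of mlr3measures): weighted RMSE in base R.
rmse_sketch = function(truth, response, sample_weights = NULL) {
  if (is.null(sample_weights)) sample_weights = rep(1, length(truth))
  w = sample_weights / sum(sample_weights)  # normalize weights to sum to one
  sqrt(sum(w * (truth - response)^2))
}

truth = c(1, 2, 3)
response = c(2, 2, 5)
rmse_sketch(truth, response)              # sqrt((1 + 0 + 4) / 3)
rmse_sketch(truth, response, c(1, 1, 2))  # sqrt(0.25 * 1 + 0.5 * 4) = 1.5
```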

Value

Performance value as numeric(1).

Meta Information

  • Type: "regr"

  • Range: [0, \infty)

  • Minimize: TRUE

  • Required prediction: response

See Also

Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()

Examples

set.seed(1)
truth = 1:10
response = truth + rnorm(10)
rmse(truth, response)

Root Mean Squared Log Error

Description

Measure to compare true observed response with predicted response in regression tasks.

Usage

rmsle(truth, response, sample_weights = NULL, na_value = NaN, ...)

Arguments

truth

(numeric())
True (observed) values. Must have the same length as response.

response

(numeric())
Predicted response values. Must have the same length as truth.

sample_weights

(numeric())
Vector of non-negative and finite sample weights. Must have the same length as truth. The vector gets automatically normalized to sum to one. Defaults to equal sample weights.

na_value

(numeric(1))
Value that should be returned if the measure is not defined for the input (see Details). Default is NaN.

...

(any)
Additional arguments. Currently ignored.

Details

The Root Mean Squared Log Error is defined as

\sqrt{\sum_{i=1}^n w_i \left( \ln (1 + t_i) - \ln (1 + r_i) \right)^2},

where w_i are the sample weights normalized to sum to one (w_i = 1/n for equal weights).

This measure is undefined if any element of t or r is less than or equal to -1.
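A base-R sketch of the formula and its undefined case. It uses log1p(x), which computes ln(1 + x). This is not the package's implementation, and the helper name rmsle_sketch() is hypothetical.

```r
# Hypothetical helper (not part of mlr3measures): weighted RMSLE in base R.
rmsle_sketch = function(truth, response, sample_weights = NULL, na_value = NaN) {
  if (any(truth <= -1) || any(response <= -1)) return(na_value)  # ln(1 + x) undefined
  if (is.null(sample_weights)) sample_weights = rep(1, length(truth))
  w = sample_weights / sum(sample_weights)                        # sum to one
  sqrt(sum(w * (log1p(truth) - log1p(response))^2))
}

rmsle_sketch(c(0, exp(1) - 1), c(0, 0))  # sqrt((0 + 1) / 2)
rmsle_sketch(c(-1, 1), c(0, 0))          # NaN: truth contains -1
```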

Value

Performance value as numeric(1).

Meta Information

  • Type: "regr"

  • Range: [0, \infty)

  • Minimize: TRUE

  • Required prediction: response

See Also

Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()

Examples

set.seed(1)
truth = 1:10
response = truth + rnorm(10)
rmsle(truth, response)

Root Relative Squared Error

Description

Measure to compare true observed response with predicted response in regression tasks.

Usage

rrse(truth, response, na_value = NaN, ...)

Arguments

truth

(numeric())
True (observed) values. Must have the same length as response.

response

(numeric())
Predicted response values. Must have the same length as truth.

na_value

(numeric(1))
Value that should be returned if the measure is not defined for the input (see Details). Default is NaN.

...

(any)
Additional arguments. Currently ignored.

Details

The Root Relative Squared Error is defined as

\sqrt{\frac{\sum_{i=1}^n \left( t_i - r_i \right)^2}{\sum_{i=1}^n \left( t_i - \bar{t} \right)^2}},

where \bar{t} = \frac{1}{n} \sum_{i=1}^n t_i is the mean of the observed values.

Can be interpreted as the root of the squared error of the predictions relative to that of a naive model that always predicts the mean.

This measure is undefined for constant t.
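A minimal base-R sketch of the definition, not the package's implementation. The helper name rrse_sketch() is hypothetical.

```r
# Hypothetical helper (not part of mlr3measures): base-R sketch of RRSE.
rrse_sketch = function(truth, response, na_value = NaN) {
  baseline = sum((truth - mean(truth))^2)  # squared error of the mean predictor
  if (baseline == 0) return(na_value)      # constant truth: undefined
  sqrt(sum((truth - response)^2) / baseline)
}

rrse_sketch(c(1, 2, 3), c(1, 2, 4))  # sqrt(1 / 2)
rrse_sketch(c(5, 5, 5), c(1, 2, 3))  # NaN: constant truth
```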

Value

Performance value as numeric(1).

Meta Information

  • Type: "regr"

  • Range: [0, \infty)

  • Minimize: TRUE

  • Required prediction: response

See Also

Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()

Examples

set.seed(1)
truth = 1:10
response = truth + rnorm(10)
rrse(truth, response)

Relative Squared Error

Description

Measure to compare true observed response with predicted response in regression tasks.

Usage

rse(truth, response, na_value = NaN, ...)

Arguments

truth

(numeric())
True (observed) values. Must have the same length as response.

response

(numeric())
Predicted response values. Must have the same length as truth.

na_value

(numeric(1))
Value that should be returned if the measure is not defined for the input (see Details). Default is NaN.

...

(any)
Additional arguments. Currently ignored.

Details

The Relative Squared Error is defined as

\frac{\sum_{i=1}^n \left( t_i - r_i \right)^2}{\sum_{i=1}^n \left( t_i - \bar{t} \right)^2},

where \bar{t} = \frac{1}{n} \sum_{i=1}^n t_i is the mean of the observed values.

Can be interpreted as the squared error of the predictions relative to that of a naive model that always predicts the mean.

This measure is undefined for constant t.
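The comparison with the mean predictor can be shown with a base-R sketch. It is not the package's implementation; the helper name rse_sketch() is hypothetical. Values below 1 mean the predictions beat the mean predictor.

```r
# Hypothetical helper (not part of mlr3measures): base-R sketch of RSE.
rse_sketch = function(truth, response, na_value = NaN) {
  baseline = sum((truth - mean(truth))^2)  # squared error of the mean predictor
  if (baseline == 0) return(na_value)      # constant truth: undefined
  sum((truth - response)^2) / baseline
}

truth = c(1, 2, 3)
rse_sketch(truth, c(1, 2, 4))           # 1 / 2 = 0.5
rse_sketch(truth, rep(mean(truth), 3))  # 1: no better than the mean
```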

Value

Performance value as numeric(1).

Meta Information

  • Type: "regr"

  • Range: [0, \infty)

  • Minimize: TRUE

  • Required prediction: response

See Also

Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()

Examples

set.seed(1)
truth = 1:10
response = truth + rnorm(10)
rse(truth, response)

R Squared

Description

Measure to compare true observed response with predicted response in regression tasks.

Usage

rsq(truth, response, na_value = NaN, ...)

Arguments

truth

(numeric())
True (observed) values. Must have the same length as response.

response

(numeric())
Predicted response values. Must have the same length as truth.

na_value

(numeric(1))
Value that should be returned if the measure is not defined for the input (see Details). Default is NaN.

...

(any)
Additional arguments. Currently ignored.

Details

R Squared is defined as

1 - \frac{\sum_{i=1}^n \left( t_i - r_i \right)^2}{\sum_{i=1}^n \left( t_i - \bar{t} \right)^2},

where \bar{t} = \frac{1}{n} \sum_{i=1}^n t_i is the mean of the observed values.

Also known as the coefficient of determination or explained variation. It equals 1 - rse(), so it compares the squared error of the predictions with that of a naive model that always predicts the mean.

This measure is undefined for constant t.
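A base-R sketch, not the package's implementation, with a hypothetical helper name rsq_sketch(). For in-sample fitted values of an ordinary least-squares fit with an intercept, it should agree with the R squared reported by summary.lm(). Predictions worse than the mean give negative values, which is why the range is unbounded below.

```r
# Hypothetical helper (not part of mlr3measures): R squared as 1 - RSE.
rsq_sketch = function(truth, response, na_value = NaN) {
  baseline = sum((truth - mean(truth))^2)  # squared error of the mean predictor
  if (baseline == 0) return(na_value)      # constant truth: undefined
  1 - sum((truth - response)^2) / baseline
}

x = 1:10
y = x + sin(x)
fit = lm(y ~ x)
rsq_sketch(y, fitted(fit))   # in-sample R squared of the linear fit
summary(fit)$r.squared       # same value reported by summary.lm()
rsq_sketch(y, rep(100, 10))  # far worse than the mean: negative
```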

Value

Performance value as numeric(1).

Meta Information

  • Type: "regr"

  • Range: (-\infty, 1]

  • Minimize: FALSE

  • Required prediction: response

See Also

Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), sae(), se(), sle(), smape(), srho(), sse()

Examples

set.seed(1)
truth = 1:10
response = truth + rnorm(10)
rsq(truth, response)

Sum of Absolute Errors

Description

Measure to compare true observed response with predicted response in regression tasks.

Usage

sae(truth, response, ...)

Arguments

truth

(numeric())
True (observed) values. Must have the same length as response.

response

(numeric())
Predicted response values. Must have the same length as truth.

...

(any)
Additional arguments. Currently ignored.

Details

The Sum of Absolute Errors is defined as

\sum_{i=1}^n \left| t_i - r_i \right|.
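In base R the sum is a one-liner. The sketch below, which does not call the package, also shows that the sum equals n times the unweighted mean absolute error.

```r
truth = c(1, 2, 3, 4)
response = c(2, 2, 1, 4)
sae_value = sum(abs(truth - response))   # 1 + 0 + 2 + 0 = 3
mae_value = mean(abs(truth - response))  # 3 / 4
sae_value == length(truth) * mae_value   # TRUE: SAE = n * MAE
```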

Value

Performance value as numeric(1).

Meta Information

  • Type: "regr"

  • Range: [0, \infty)

  • Minimize: TRUE

  • Required prediction: response

See Also

Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), se(), sle(), smape(), srho(), sse()

Examples

set.seed(1)
truth = 1:10
response = truth + rnorm(10)
sae(truth, response)

Squared Error (per observation)

Description

Measure to compare true observed response with predicted response in regression tasks.

Note that this is an unaggregated measure, returning the losses per observation.

Usage

se(truth, response, ...)

Arguments

truth

(numeric())
True (observed) values. Must have the same length as response.

response

(numeric())
Predicted response values. Must have the same length as truth.

...

(any)
Additional arguments. Currently ignored.

Details

Calculates the per-observation squared error as

\left( t_i - r_i \right)^2.
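Because the loss is returned per observation, aggregating it is left to the caller. The base-R sketch below (not calling the package) shows that the mean and the sum of the per-observation losses give the mean squared error and the sum of squared errors under equal weights.

```r
truth = c(1, 2, 3)
response = c(2, 2, 5)
se_values = (truth - response)^2  # one loss per observation: 1, 0, 4
mean(se_values)                   # aggregated by the mean: 5 / 3
sum(se_values)                    # aggregated by the sum: 5
```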

Value

Performance value as numeric(length(truth)).

Meta Information

  • Type: "regr"

  • Range (per observation): [0, \infty)

  • Minimize (per observation): TRUE

  • Required prediction: response

See Also

Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), sle(), smape(), srho(), sse()


Squared Log Error (per observation)

Description

Calculates the per-observation squared log error as

\left( \ln (1 + t_i) - \ln (1 + r_i) \right)^2.

Measure to compare true observed response with predicted response in regression tasks.

Note that this is an unaggregated measure, returning the losses per observation.
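The per-observation losses can be reproduced in base R with log1p(x), which computes ln(1 + x) and is finite only for x > -1. This sketch does not call the package. Taking the root of the mean gives the unweighted root mean squared log error.

```r
truth = c(0, exp(1) - 1, 3)
response = c(0, 0, 3)
sle_values = (log1p(truth) - log1p(response))^2  # per-observation losses: 0, 1, 0
sqrt(mean(sle_values))                           # root of the mean, cf. rmsle()
```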

Usage

sle(truth, response, ...)

Arguments

truth

(numeric())
True (observed) values. Must have the same length as response.

response

(numeric())
Predicted response values. Must have the same length as truth.

...

(any)
Additional arguments. Currently ignored.

Value

Performance value as numeric(length(truth)).

Meta Information

  • Type: "regr"

  • Range (per observation): [0, \infty)

  • Minimize (per observation): TRUE

  • Required prediction: response

See Also

Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), smape(), srho(), sse()


Symmetric Mean Absolute Percent Error

Description

Measure to compare true observed response with predicted response in regression tasks.

Usage

smape(truth, response, na_value = NaN, ...)

Arguments

truth

(numeric())
True (observed) values. Must have the same length as response.

response

(numeric())
Predicted response values. Must have the same length as truth.

na_value

(numeric(1))
Value that should be returned if the measure is not defined for the input (see Details). Default is NaN.

...

(any)
Additional arguments. Currently ignored.

Details

The Symmetric Mean Absolute Percent Error is defined as

\frac{2}{n} \sum_{i=1}^n \frac{\left| t_i - r_i \right|}{\left| t_i \right| + \left| r_i \right|}.

This measure is undefined if |t_i| + |r_i| = 0 for any i.
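A base-R sketch of the formula, including the undefined case and the upper bound of 2. It is not the package's implementation, and the helper name smape_sketch() is hypothetical.

```r
# Hypothetical helper (not part of mlr3measures): base-R sketch of SMAPE.
smape_sketch = function(truth, response, na_value = NaN) {
  denom = abs(truth) + abs(response)
  if (any(denom == 0)) return(na_value)  # some |t_i| + |r_i| = 0: undefined
  2 * mean(abs(truth - response) / denom)
}

smape_sketch(c(1, 2), c(3, 2))  # 2 * mean(c(0.5, 0)) = 0.5
smape_sketch(1, -1)             # opposite signs reach the upper bound 2
smape_sketch(c(0, 1), c(0, 1))  # NaN
```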

Value

Performance value as numeric(1).

Meta Information

  • Type: "regr"

  • Range: [0, 2]

  • Minimize: TRUE

  • Required prediction: response

See Also

Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), srho(), sse()

Examples

set.seed(1)
truth = 1:10
response = truth + rnorm(10)
smape(truth, response)

Spearman's rho

Description

Measure to compare true observed response with predicted response in regression tasks.

Usage

srho(truth, response, ...)

Arguments

truth

(numeric())
True (observed) values. Must have the same length as response.

response

(numeric())
Predicted response values. Must have the same length as truth.

...

(any)
Additional arguments. Currently ignored.

Details

Spearman's rho is defined as Spearman's rank correlation coefficient between truth and response. Calls stats::cor() with method set to "spearman".
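Since only ranks matter, Spearman's rho equals the Pearson correlation of the ranks and is unchanged by strictly increasing transformations of the predictions. The base-R sketch below illustrates this with stats::cor() directly rather than srho().

```r
truth = c(1, 2, 3, 4, 5)
response = c(2, 1, 4, 3, 5)
rho = cor(truth, response, method = "spearman")  # 0.8
cor(rank(truth), rank(response))                 # Pearson correlation of ranks
cor(truth, exp(response), method = "spearman")   # unchanged by exp()
```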

Value

Performance value as numeric(1).

Meta Information

  • Type: "regr"

  • Range: [-1, 1]

  • Minimize: FALSE

  • Required prediction: response

References

Rosset S, Perlich C, Zadrozny B (2006). “Ranking-based evaluation of regression models.” Knowledge and Information Systems, 12(3), 331–353. doi:10.1007/s10115-006-0037-3.

See Also

Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), sse()

Examples

set.seed(1)
truth = 1:10
response = truth + rnorm(10)
srho(truth, response)

Sum of Squared Errors

Description

Measure to compare true observed response with predicted response in regression tasks.

Usage

sse(truth, response, ...)

Arguments

truth

(numeric())
True (observed) values. Must have the same length as response.

response

(numeric())
Predicted response values. Must have the same length as truth.

...

(any)
Additional arguments. Currently ignored.

Details

The Sum of Squared Errors is defined as

\sum_{i=1}^n \left( t_i - r_i \right)^2.
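For a fitted linear model, the sum of squared errors of the in-sample fitted values is the residual sum of squares that stats::deviance() reports for lm objects. The base-R sketch below illustrates this without calling the package.

```r
x = 1:10
y = x + sin(x)
fit = lm(y ~ x)
sse_value = sum((y - fitted(fit))^2)  # sum of squared errors of the fitted values
deviance(fit)                         # residual sum of squares reported for lm
```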

Value

Performance value as numeric(1).

Meta Information

  • Type: "regr"

  • Range: [0, \infty)

  • Minimize: TRUE

  • Required prediction: response

See Also

Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho()

Examples

set.seed(1)
truth = 1:10
response = truth + rnorm(10)
sse(truth, response)

True Negatives

Description

Measure to compare true observed labels with predicted labels in binary classification tasks.

Usage

tn(truth, response, positive, ...)

Arguments

truth

(factor())
True (observed) labels. Must have exactly the same two levels and the same length as response.

response

(factor())
Predicted response labels. Must have exactly the same two levels and the same length as truth.

positive

(character(1))
Name of the positive class.

...

(any)
Additional arguments. Currently ignored.

Details

This measure counts the true negatives, i.e. the number of predictions correctly indicating a negative class label. This is sometimes also called a "correct rejection".
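The count can be read off a 2 x 2 confusion matrix built with base R's table(). This is a sketch that does not call tn(). It assumes the negative class is simply the level other than positive.

```r
lvls = c("a", "b")
truth = factor(c("a", "b", "b", "a", "b"), levels = lvls)
response = factor(c("a", "b", "a", "b", "b"), levels = lvls)
positive = "a"
negative = setdiff(lvls, positive)               # the other level is negative
cm = table(response = response, truth = truth)  # 2 x 2 confusion matrix
cm[negative, negative]                          # true negatives: 2
sum(truth == negative & response == negative)   # same count without table()
```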

Value

Performance value as numeric(1).

Meta Information

  • Type: "binary"

  • Range: [0, \infty)

  • Minimize: FALSE

  • Required prediction: response

References

https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram

See Also

Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tnr(), tp(), tpr()

Examples

set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
tn(truth, response, positive = "a")

True Negative Rate

Description

Measure to compare true observed labels with predicted labels in binary classification tasks.

Usage

tnr(truth, response, positive, na_value = NaN, ...)

specificity(truth, response, positive, na_value = NaN, ...)

Arguments

truth

(factor())
True (observed) labels. Must have exactly the same two levels and the same length as response.

response

(factor())
Predicted response labels. Must have exactly the same two levels and the same length as truth.

positive

(character(1))
Name of the positive class.

na_value

(numeric(1))
Value that should be returned if the measure is not defined for the input (see Details). Default is NaN.

...

(any)
Additional arguments. Currently ignored.

Details

The True Negative Rate is defined as

\frac{\mathrm{TN}}{\mathrm{FP} + \mathrm{TN}}.

Also known as "specificity" or "selectivity".

This measure is undefined if FP + TN = 0.
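A minimal base-R sketch of the definition, not the package's implementation. The helper name tnr_sketch() is hypothetical. It returns na_value when truth contains no negatives.

```r
# Hypothetical helper (not part of mlr3measures): base-R sketch of the TNR.
tnr_sketch = function(truth, response, positive, na_value = NaN) {
  tn = sum(truth != positive & response != positive)  # true negatives
  fp = sum(truth != positive & response == positive)  # false positives
  if (fp + tn == 0) return(na_value)                  # no negatives in truth
  tn / (fp + tn)
}

lvls = c("a", "b")
truth = factor(c("a", "b", "b", "a", "b"), levels = lvls)
response = factor(c("a", "b", "a", "b", "b"), levels = lvls)
tnr_sketch(truth, response, positive = "a")  # 2 TN and 1 FP give 2/3
```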

Value

Performance value as numeric(1).

Meta Information

  • Type: "binary"

  • Range: [0, 1]

  • Minimize: FALSE

  • Required prediction: response

References

https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram

See Also

Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tp(), tpr()

Examples

set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
tnr(truth, response, positive = "a")

True Positives

Description

Measure to compare true observed labels with predicted labels in binary classification tasks.

Usage

tp(truth, response, positive, ...)

Arguments

truth

(factor())
True (observed) labels. Must have exactly the same two levels and the same length as response.

response

(factor())
Predicted response labels. Must have exactly the same two levels and the same length as truth.

positive

(character(1))
Name of the positive class.

...

(any)
Additional arguments. Currently ignored.

Details

This measure counts the true positives, i.e. the number of predictions correctly indicating a positive class label. This is sometimes also called a "hit".
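As with the other counts, the value can be read off a confusion matrix built with base R's table(). This sketch does not call tp().

```r
lvls = c("a", "b")
truth = factor(c("a", "b", "b", "a", "b"), levels = lvls)
response = factor(c("a", "b", "a", "b", "b"), levels = lvls)
cm = table(response = response, truth = truth)  # 2 x 2 confusion matrix
cm["a", "a"]                                    # true positives for "a": 1
sum(truth == "a" & response == "a")             # same count without table()
```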

Value

Performance value as numeric(1).

Meta Information

  • Type: "binary"

  • Range: [0, \infty)

  • Minimize: FALSE

  • Required prediction: response

References

https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram

See Also

Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tpr()

Examples

set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
tp(truth, response, positive = "a")

True Positive Rate

Description

Measure to compare true observed labels with predicted labels in binary classification tasks.

Usage

tpr(truth, response, positive, na_value = NaN, ...)

recall(truth, response, positive, na_value = NaN, ...)

sensitivity(truth, response, positive, na_value = NaN, ...)

Arguments

truth

(factor())
True (observed) labels. Must have exactly the same two levels and the same length as response.

response

(factor())
Predicted response labels. Must have exactly the same two levels and the same length as truth.

positive

(character(1))
Name of the positive class.

na_value

(numeric(1))
Value that should be returned if the measure is not defined for the input (see Details). Default is NaN.

...

(any)
Additional arguments. Currently ignored.

Details

The True Positive Rate is defined as

\frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}.

Also known as "recall", "sensitivity", or "probability of detection".

This measure is undefined if TP + FN = 0.
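A minimal base-R sketch of the definition, not the package's implementation. The helper name tpr_sketch() is hypothetical. Predicting every observation as positive trivially yields a rate of 1, which is why this measure is usually reported together with a precision-type measure.

```r
# Hypothetical helper (not part of mlr3measures): base-R sketch of the TPR.
tpr_sketch = function(truth, response, positive, na_value = NaN) {
  tp = sum(truth == positive & response == positive)  # true positives
  fn = sum(truth == positive & response != positive)  # false negatives
  if (tp + fn == 0) return(na_value)                  # no positives in truth
  tp / (tp + fn)
}

lvls = c("a", "b")
truth = factor(c("a", "b", "b", "a", "b"), levels = lvls)
response = factor(c("a", "b", "a", "b", "b"), levels = lvls)
tpr_sketch(truth, response, positive = "a")                             # 1 TP, 1 FN: 1/2
tpr_sketch(truth, factor(rep("a", 5), levels = lvls), positive = "a")  # all positive: 1
```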

Value

Performance value as numeric(1).

Meta Information

  • Type: "binary"

  • Range: [0, 1]

  • Minimize: FALSE

  • Required prediction: response

References

https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram

Goutte C, Gaussier E (2005). “A Probabilistic Interpretation of Precision, Recall and F-Score, with Implication for Evaluation.” In Lecture Notes in Computer Science, 345–359. doi:10.1007/978-3-540-31865-1_25.

See Also

Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp()

Examples

set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
tpr(truth, response, positive = "a")

Zero-One Classification Loss (per observation)

Description

Calculates the per-observation 0/1 (zero-one) loss as

\mathbf{1} (t_i \neq r_i).

The 1/0 (one-zero) loss is equal to 1 minus the zero-one loss and is calculated as

\mathbf{1} (t_i = r_i).

Measure to compare true observed labels with predicted labels in multiclass classification tasks.

Note that this is an unaggregated measure, returning the losses per observation.
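Both losses reduce to indicator vectors in base R. The sketch below does not call zero_one() or one_zero(). Under equal weights, their means correspond to the classification error (cf. ce()) and the accuracy (cf. acc()).

```r
lvls = c("a", "b", "c")
truth = factor(c("a", "b", "c", "a"), levels = lvls)
response = factor(c("a", "c", "c", "a"), levels = lvls)
zo = as.numeric(truth != response)  # zero-one loss per observation: 0, 1, 0, 0
oz = 1 - zo                         # one-zero loss per observation: 1, 0, 1, 1
mean(zo)                            # classification error: 0.25
mean(oz)                            # accuracy: 0.75
```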

Usage

zero_one(truth, response, ...)

one_zero(truth, response, ...)

Arguments

truth

(factor())
True (observed) labels. Must have the same levels and length as response.

response

(factor())
Predicted response labels. Must have the same levels and length as truth.

...

(any)
Additional arguments. Currently ignored.

Value

Performance value as numeric(length(truth)).

Meta Information

  • Type: "classif"

  • Range (per observation): [0, 1]

  • Minimize (per observation): TRUE

  • Required prediction: response

See Also

Other Classification Measures: acc(), bacc(), ce(), logloss(), mauc_aunu(), mbrier(), mcc()