Title: | Performance Measures for 'mlr3' |
---|---|
Description: | Implements multiple performance measures for supervised learning. Includes over 40 measures for regression and classification. Additionally, meta information about the performance measures can be queried, e.g. what the best and worst possible performance scores are. |
Authors: | Michel Lang [aut], Martin Binder [ctb], Marc Becker [cre, aut], Lona Koers [aut] |
Maintainer: | Marc Becker <[email protected]> |
License: | LGPL-3 |
Version: | 1.0.0 |
Built: | 2025-01-10 07:17:28 UTC |
Source: | CRAN |
Maintainer: Marc Becker [email protected] (ORCID)
Authors:
Michel Lang [email protected] (ORCID)
Lona Koers
Other contributors:
Martin Binder [email protected] [contributor]
Useful links:
Report bugs at https://github.com/mlr-org/mlr3measures/issues
Measure to compare true observed labels with predicted labels in multiclass classification tasks.
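acc(truth, response, sample_weights = NULL, ...)
truth | (factor()) True (observed) labels. Must have the same levels and length as response. |
response | (factor()) Predicted response labels. Must have the same levels and length as truth. |
sample_weights | (numeric()) Non-negative sample weights of the same length as truth, normalized to sum to one. Defaults to equal weights. |
... | (any) Additional arguments. Currently ignored. |
The Classification Accuracy is defined as
$$ \sum_{i=1}^n w_i \mathbf{1}(t_i = r_i), $$
where $w_i$ are normalized weights for all observations $x_i$.
Performance value as numeric(1).
Type: "classif"
Range: $[0, 1]$
Minimize: FALSE
Required prediction: response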
Other Classification Measures: bacc(), ce(), logloss(), mauc_aunu(), mbrier(), mcc(), zero_one()
set.seed(1)
lvls = c("a", "b", "c")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
acc(truth, response)
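The weighted accuracy definition for acc() can be checked by hand with a few lines of base R. The helper `acc_sketch()` below is a hypothetical name introduced only for illustration; it is a minimal sketch of the formula, not the package's implementation.

```r
# Base-R sketch of weighted classification accuracy (illustrative only).
acc_sketch = function(truth, response, sample_weights = NULL) {
  stopifnot(length(truth) == length(response))
  n = length(truth)
  # Equal weights by default; normalize so the weights sum to one.
  w = if (is.null(sample_weights)) rep(1, n) else sample_weights
  w = w / sum(w)
  # Sum the weights of all correctly predicted observations.
  sum(w * (as.character(truth) == as.character(response)))
}

lvls = c("a", "b", "c")
truth = factor(c("a", "b", "c", "a"), levels = lvls)
response = factor(c("a", "b", "a", "a"), levels = lvls)

acc_unweighted = acc_sketch(truth, response)              # 3 of 4 correct
acc_weighted = acc_sketch(truth, response, c(1, 1, 2, 1)) # the miss has weight 2/5
```

With equal weights the result is the plain share of correct predictions; giving the misclassified observation a larger weight lowers the score.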
Measure to compare true observed response with predicted response in regression tasks.
Note that this is an unaggregated measure, returning the losses per observation.
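ae(truth, response, ...)
truth | (numeric()) True (observed) values. Must have the same length as response. |
response | (numeric()) Predicted response values. Must have the same length as truth. |
... | (any) Additional arguments. Currently ignored. |
Calculates the per-observation absolute error as
$$ \left| t_i - r_i \right|. $$
Performance value as numeric(length(truth)).
Type: "regr"
Range (per observation): $[0, \infty)$
Minimize (per observation): TRUE
Required prediction: response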
Other Regression Measures: ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
Measure to compare true observed response with predicted response in regression tasks.
Note that this is an unaggregated measure, returning the losses per observation.
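ape(truth, response, ...)
truth | (numeric()) True (observed) values. Must have the same length as response. |
response | (numeric()) Predicted response values. Must have the same length as truth. |
... | (any) Additional arguments. Currently ignored. |
Calculates the per-observation absolute percentage error as
$$ \left| \frac{t_i - r_i}{t_i} \right|. $$
Performance value as numeric(length(truth)).
Type: "regr"
Range (per observation): $[0, \infty)$
Minimize (per observation): TRUE
Required prediction: response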
Other Regression Measures: ae(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
Measure to compare true observed labels with predicted probabilities in binary classification tasks.
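auc(truth, prob, positive, na_value = NaN, ...)
truth | (factor()) True (observed) labels. Must have exactly two levels and the same length as prob. |
prob | (numeric()) Predicted probability for the positive class. Must have the same length as truth. |
positive | (character(1)) Name of the positive class. |
na_value | (numeric(1)) Value returned if the measure is undefined for the input. Default is NaN. |
... | (any) Additional arguments. Currently ignored. |
Computes the area under the Receiver Operating Characteristic (ROC) curve. The AUC can be interpreted as the probability that a randomly chosen positive observation has a higher predicted probability than a randomly chosen negative observation.
This measure is undefined if the true values are either all positive or all negative.
Performance value as numeric(1).
Type: "binary"
Range: $[0, 1]$
Minimize: FALSE
Required prediction: prob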
Youden WJ (1950). “Index for rating diagnostic tests.” Cancer, 3(1), 32–35. doi:10.1002/1097-0142(1950)3:1<32::aid-cncr2820030106>3.0.co;2-3.
Other Binary Classification Measures: bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()
truth = factor(c("a", "a", "a", "b"))
prob = c(.6, .7, .1, .4)
auc(truth, prob, "a")
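The probabilistic interpretation of the AUC given for auc() can be illustrated by comparing every positive observation with every negative one. The function `auc_sketch()` is a hypothetical helper written for this illustration, and counting ties as one half is an assumption of the sketch; the package may compute the value differently (for example via ranks).

```r
# Base-R sketch: AUC as the share of (positive, negative) pairs in which
# the positive observation receives the higher predicted probability.
auc_sketch = function(truth, prob, positive) {
  pos = prob[truth == positive]
  neg = prob[truth != positive]
  # Undefined if only one class is present.
  if (length(pos) == 0 || length(neg) == 0) return(NaN)
  # Pairwise comparison; ties are counted as one half (assumption).
  cmp = outer(pos, neg, function(p, q) (p > q) + 0.5 * (p == q))
  mean(cmp)
}

truth = factor(c("a", "a", "a", "b"))
prob = c(.6, .7, .1, .4)
auc_pairs = auc_sketch(truth, prob, "a")  # 2 of 3 positives beat the negative
```

The quadratic pairwise comparison is only meant to make the definition tangible; it is not efficient for large inputs.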
Measure to compare true observed labels with predicted labels in multiclass classification tasks.
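bacc(truth, response, sample_weights = NULL, ...)
truth | (factor()) True (observed) labels. Must have the same levels and length as response. |
response | (factor()) Predicted response labels. Must have the same levels and length as truth. |
sample_weights | (numeric()) Non-negative sample weights of the same length as truth. Defaults to equal weights. |
... | (any) Additional arguments. Currently ignored. |
The Balanced Accuracy computes the weighted balanced accuracy, suitable for imbalanced data sets. It is defined analogously to the definition in sklearn.
First, all sample weights $w_i$ are normalized per class so that each class has the same influence:
$$ \hat{w}_i = \frac{w_i}{\sum_{j=1}^n w_j \cdot \mathbf{1}(t_j = t_i)}. $$
The Balanced Accuracy is then calculated as
$$ \frac{1}{\sum_{i=1}^n \hat{w}_i} \sum_{i=1}^n \hat{w}_i \cdot \mathbf{1}(r_i = t_i). $$
This definition is equivalent to acc() with class-balanced sample weights.
Performance value as numeric(1).
Type: "classif"
Range: $[0, 1]$
Minimize: FALSE
Required prediction: response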
Brodersen KH, Ong CS, Stephan KE, Buhmann JM (2010). “The Balanced Accuracy and Its Posterior Distribution.” In 2010 20th International Conference on Pattern Recognition. doi:10.1109/icpr.2010.764.
Guyon I, Bennett K, Cawley G, Escalante HJ, Escalera S, Ho TK, Macia N, Ray B, Saeed M, Statnikov A, Viegas E (2015). “Design of the 2015 ChaLearn AutoML challenge.” In 2015 International Joint Conference on Neural Networks (IJCNN). doi:10.1109/ijcnn.2015.7280767.
Other Classification Measures: acc(), ce(), logloss(), mauc_aunu(), mbrier(), mcc(), zero_one()
set.seed(1)
lvls = c("a", "b", "c")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
bacc(truth, response)
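The statement that bacc() equals accuracy with class-balanced weights can be made concrete for the unweighted case. The sketch below uses a hypothetical helper `bacc_sketch()` and assumes equal initial sample weights; it is not the package's implementation.

```r
# Base-R sketch of balanced accuracy with equal initial sample weights.
bacc_sketch = function(truth, response) {
  truth = as.character(truth)
  response = as.character(response)
  # Class-balanced weights: each observation gets 1 / (size of its true class).
  w = 1 / as.numeric(table(truth)[truth])
  sum(w * (truth == response)) / sum(w)
}

truth = c("a", "a", "a", "b")
response = c("a", "a", "b", "b")
bacc_value = bacc_sketch(truth, response)

# Same number via the mean of per-class recalls: class "a" 2/3, class "b" 1.
recall_mean = mean(c(2 / 3, 1))
```

Plain accuracy on this toy data is 3/4, while the balanced version gives the minority class "b" the same influence as class "a".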
Measure to compare true observed labels with predicted probabilities in binary classification tasks.
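bbrier(truth, prob, positive, sample_weights = NULL, ...)
truth | (factor()) True (observed) labels. Must have exactly two levels and the same length as prob. |
prob | (numeric()) Predicted probability for the positive class. Must have the same length as truth. |
positive | (character(1)) Name of the positive class. |
sample_weights | (numeric()) Non-negative sample weights of the same length as truth. Defaults to equal weights. |
... | (any) Additional arguments. Currently ignored. |
The Binary Brier Score is defined as
$$ \frac{1}{n} \sum_{i=1}^n w_i (I_i - p_i)^2, $$
where $w_i$ are the sample weights, and $I_i$ is 1 if observation $x_i$ belongs to the positive class, and 0 otherwise.
Note that this (more common) definition of the Brier score is equivalent to the original definition of the multi-class Brier score (see mbrier()) divided by 2.
Performance value as numeric(1).
Type: "binary"
Range: $[0, 1]$
Minimize: TRUE
Required prediction: prob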
https://en.wikipedia.org/wiki/Brier_score
Brier GW (1950). “Verification of forecasts expressed in terms of probability.” Monthly Weather Review, 78(1), 1–3. doi:10.1175/1520-0493(1950)078<0001:vofeit>2.0.co;2.
Other Binary Classification Measures: auc(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
prob = runif(10)
bbrier(truth, prob, positive = "a")
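The relation between the binary and the multi-class Brier score mentioned for bbrier() can be verified numerically. The sketch below assumes equal sample weights; `bbrier_sketch()` and the helper matrices are hypothetical names used only here.

```r
# Base-R sketch of the unweighted binary Brier score.
bbrier_sketch = function(truth, prob, positive) {
  ind = as.numeric(truth == positive)  # 1 for the positive class, else 0
  mean((ind - prob)^2)
}

truth = factor(c("a", "b", "a", "b"))
prob = c(0.9, 0.2, 0.6, 0.5)  # predicted probability of class "a"
brier_binary = bbrier_sketch(truth, prob, "a")

# Original multi-class definition on the two-column probability matrix:
# squared errors are summed over both classes, then averaged.
ind = as.numeric(truth == "a")
p_mat = cbind(a = prob, b = 1 - prob)
i_mat = cbind(a = ind, b = 1 - ind)
brier_multi_half = mean(rowSums((i_mat - p_mat)^2)) / 2
```

In the two-class case both columns contribute the same squared error, which is where the factor of 2 comes from.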
Measure to compare true observed response with predicted response in regression tasks.
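bias(truth, response, sample_weights = NULL, ...)
truth | (numeric()) True (observed) values. Must have the same length as response. |
response | (numeric()) Predicted response values. Must have the same length as truth. |
sample_weights | (numeric()) Non-negative sample weights of the same length as truth, normalized to sum to one. Defaults to equal weights. |
... | (any) Additional arguments. Currently ignored. |
The Bias is defined as
$$ \sum_{i=1}^n w_i \left( t_i - r_i \right), $$
where $w_i$ are normalized sample weights.
Good predictions score close to 0.
Performance value as numeric(1).
Type: "regr"
Range: $(-\infty, \infty)$
Minimize: NA
Required prediction: response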
Other Regression Measures: ae(), ape(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
bias(truth, response)
Measure to compare true observed labels with predicted labels in multiclass classification tasks.
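ce(truth, response, sample_weights = NULL, ...)
truth | (factor()) True (observed) labels. Must have the same levels and length as response. |
response | (factor()) Predicted response labels. Must have the same levels and length as truth. |
sample_weights | (numeric()) Non-negative sample weights of the same length as truth, normalized to sum to one. Defaults to equal weights. |
... | (any) Additional arguments. Currently ignored. |
The Classification Error is defined as
$$ \sum_{i=1}^n w_i \mathbf{1}(t_i \neq r_i), $$
where $w_i$ are normalized weights for each observation $x_i$.
Performance value as numeric(1).
Type: "classif"
Range: $[0, 1]$
Minimize: TRUE
Required prediction: response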
Other Classification Measures: acc(), bacc(), logloss(), mauc_aunu(), mbrier(), mcc(), zero_one()
set.seed(1)
lvls = c("a", "b", "c")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
ce(truth, response)
Calculates the confusion matrix for a binary classification problem once and then calculates all binary confusion measures of this package.
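confusion_matrix(truth, response, positive, na_value = NaN, relative = FALSE)
truth | (factor()) True (observed) labels. Must have exactly the same two levels and the same length as response. |
response | (factor()) Predicted response labels. Must have exactly the same two levels and the same length as truth. |
positive | (character(1)) Name of the positive class. |
na_value | (numeric(1)) Value returned if a measure is undefined for the input. Default is NaN. |
relative | (logical(1)) If TRUE, the matrix holds relative frequencies instead of absolute counts. |
The binary confusion matrix is defined as
$$ \begin{pmatrix} TP & FP \\ FN & TN \end{pmatrix}. $$
If relative = TRUE, all values are divided by $n$.
List with two elements:
matrix: stores the calculated confusion matrix.
measures: stores the metrics as a named numeric vector.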
set.seed(123)
lvls = c("a", "b")
truth = factor(sample(lvls, 20, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 20, replace = TRUE), levels = lvls)
confusion_matrix(truth, response, positive = "a")
confusion_matrix(truth, response, positive = "a", relative = TRUE)
confusion_matrix(truth, response, positive = "b")
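To show how the four cells of the binary confusion matrix feed the rate-based measures of this package, the following base-R sketch builds the matrix with table() and derives a few ratios from it. The helper `cm_sketch()` is hypothetical, and the layout (rows are predictions, columns are true labels, positive class first) is an assumption chosen to match the TP/FP/FN/TN arrangement above.

```r
# Base-R sketch: confusion matrix with the positive class in the first
# row and column (rows = response, columns = truth).
cm_sketch = function(truth, response, positive, relative = FALSE) {
  lv = c(positive, setdiff(levels(truth), positive))
  m = table(response = factor(response, levels = lv),
            truth = factor(truth, levels = lv))
  if (relative) m = m / sum(m)
  m
}

lvls = c("a", "b")
truth = factor(c("a", "a", "a", "b", "b"), levels = lvls)
response = factor(c("a", "a", "b", "a", "b"), levels = lvls)

m = cm_sketch(truth, response, positive = "a")
tp = m[1, 1]; fp = m[1, 2]; fn = m[2, 1]; tn = m[2, 2]

# A few measures derived from the four counts.
rates = c(
  tpr = tp / (tp + fn),       # recall / sensitivity
  fpr = fp / (fp + tn),       # fall-out
  fdr = fp / (tp + fp),       # false discovery rate
  fomr = fn / (fn + tn),      # false omission rate
  dor = (tp / fp) / (fn / tn) # diagnostic odds ratio
)
```

Computing the counts once and reusing them for every ratio mirrors the idea described for confusion_matrix(), although the real function returns many more measures.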
Measure to compare true observed labels with predicted labels in binary classification tasks.
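dor(truth, response, positive, na_value = NaN, ...)
truth | (factor()) True (observed) labels. Must have exactly the same two levels and the same length as response. |
response | (factor()) Predicted response labels. Must have exactly the same two levels and the same length as truth. |
positive | (character(1)) Name of the positive class. |
na_value | (numeric(1)) Value returned if the measure is undefined for the input. Default is NaN. |
... | (any) Additional arguments. Currently ignored. |
The Diagnostic Odds Ratio is defined as
$$ \frac{\mathrm{TP}/\mathrm{FP}}{\mathrm{FN}/\mathrm{TN}}. $$
This measure is undefined if FP = 0 or FN = 0.
Performance value as numeric(1).
Type: "binary"
Range: $[0, \infty)$
Minimize: FALSE
Required prediction: response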
https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram
Other Binary Classification Measures: auc(), bbrier(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
dor(truth, response, positive = "a")
Measure to compare true observed labels with predicted labels in binary classification tasks.
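fbeta(truth, response, positive, beta = 1, na_value = NaN, ...)
truth | (factor()) True (observed) labels. Must have exactly the same two levels and the same length as response. |
response | (factor()) Predicted response labels. Must have exactly the same two levels and the same length as truth. |
positive | (character(1)) Name of the positive class. |
beta | (numeric(1)) Parameter giving the weight of recall relative to precision. Default is 1. |
na_value | (numeric(1)) Value returned if the measure is undefined for the input. Default is NaN. |
... | (any) Additional arguments. Currently ignored. |
With $P$ as precision() and $R$ as recall(), the F-beta Score is defined as
$$ (1 + \beta^2) \frac{P \cdot R}{(\beta^2 P) + R}. $$
It measures the effectiveness of retrieval with respect to a user who attaches $\beta$ times as much importance to recall as precision.
For $\beta = 1$, this measure is called "F1" score.
This measure is undefined if precision or recall is undefined, i.e. TP + FP = 0 or TP + FN = 0.
Performance value as numeric(1).
Type: "binary"
Range: $[0, 1]$
Minimize: FALSE
Required prediction: response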
Rijsbergen, Van CJ (1979). Information Retrieval, 2nd edition. Butterworth-Heinemann, Newton, MA, USA. ISBN 408709294.
Goutte C, Gaussier E (2005). “A Probabilistic Interpretation of Precision, Recall and F-Score, with Implication for Evaluation.” In Lecture Notes in Computer Science, 345–359. doi:10.1007/978-3-540-31865-1_25.
Other Binary Classification Measures: auc(), bbrier(), dor(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
fbeta(truth, response, positive = "a")
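The F-beta definition for fbeta() can be evaluated directly from confusion counts. The function `fbeta_sketch()` is a hypothetical helper that takes the counts TP, FP and FN as inputs (unlike the package function, which takes label vectors); it is a sketch of the formula only.

```r
# Base-R sketch of the F-beta score computed from confusion counts.
fbeta_sketch = function(tp, fp, fn, beta = 1) {
  # Undefined if precision or recall is undefined.
  if (tp + fp == 0 || tp + fn == 0) return(NaN)
  p = tp / (tp + fp)  # precision
  r = tp / (tp + fn)  # recall
  (1 + beta^2) * p * r / (beta^2 * p + r)
}

f1 = fbeta_sketch(tp = 2, fp = 1, fn = 1)            # P = R = 2/3
f2 = fbeta_sketch(tp = 2, fp = 1, fn = 3, beta = 2)  # recall weighted higher

# Equivalent count-based form for comparison:
# (1 + beta^2) TP / ((1 + beta^2) TP + beta^2 FN + FP)
f2_counts = 5 * 2 / (5 * 2 + 4 * 3 + 1)
```

With beta = 2 the low recall (2/5) pulls the score below what the F1 weighting would give, since recall is treated as the more important of the two.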
Measure to compare true observed labels with predicted labels in binary classification tasks.
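fdr(truth, response, positive, na_value = NaN, ...)
truth | (factor()) True (observed) labels. Must have exactly the same two levels and the same length as response. |
response | (factor()) Predicted response labels. Must have exactly the same two levels and the same length as truth. |
positive | (character(1)) Name of the positive class. |
na_value | (numeric(1)) Value returned if the measure is undefined for the input. Default is NaN. |
... | (any) Additional arguments. Currently ignored. |
The False Discovery Rate is defined as
$$ \frac{\mathrm{FP}}{\mathrm{TP} + \mathrm{FP}}. $$
This measure is undefined if TP + FP = 0.
Performance value as numeric(1).
Type: "binary"
Range: $[0, 1]$
Minimize: TRUE
Required prediction: response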
https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram
Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
fdr(truth, response, positive = "a")
Measure to compare true observed labels with predicted labels in binary classification tasks.
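fn(truth, response, positive, ...)
truth | (factor()) True (observed) labels. Must have exactly the same two levels and the same length as response. |
response | (factor()) Predicted response labels. Must have exactly the same two levels and the same length as truth. |
positive | (character(1)) Name of the positive class. |
... | (any) Additional arguments. Currently ignored. |
This measure counts the false negatives (type 2 error), i.e. the number of predictions indicating a negative class label while in fact it is positive. This is sometimes also called a "miss" or an "underestimation".
Performance value as numeric(1).
Type: "binary"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response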
https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram
Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
fn(truth, response, positive = "a")
Measure to compare true observed labels with predicted labels in binary classification tasks.
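fnr(truth, response, positive, na_value = NaN, ...)
truth | (factor()) True (observed) labels. Must have exactly the same two levels and the same length as response. |
response | (factor()) Predicted response labels. Must have exactly the same two levels and the same length as truth. |
positive | (character(1)) Name of the positive class. |
na_value | (numeric(1)) Value returned if the measure is undefined for the input. Default is NaN. |
... | (any) Additional arguments. Currently ignored. |
The False Negative Rate is defined as
$$ \frac{\mathrm{FN}}{\mathrm{TP} + \mathrm{FN}}. $$
Also known as "miss rate".
This measure is undefined if TP + FN = 0.
Performance value as numeric(1).
Type: "binary"
Range: $[0, 1]$
Minimize: TRUE
Required prediction: response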
https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram
Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
fnr(truth, response, positive = "a")
Measure to compare true observed labels with predicted labels in binary classification tasks.
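fomr(truth, response, positive, na_value = NaN, ...)
truth | (factor()) True (observed) labels. Must have exactly the same two levels and the same length as response. |
response | (factor()) Predicted response labels. Must have exactly the same two levels and the same length as truth. |
positive | (character(1)) Name of the positive class. |
na_value | (numeric(1)) Value returned if the measure is undefined for the input. Default is NaN. |
... | (any) Additional arguments. Currently ignored. |
The False Omission Rate is defined as
$$ \frac{\mathrm{FN}}{\mathrm{FN} + \mathrm{TN}}. $$
This measure is undefined if FN + TN = 0.
Performance value as numeric(1).
Type: "binary"
Range: $[0, 1]$
Minimize: TRUE
Required prediction: response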
https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram
Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
fomr(truth, response, positive = "a")
Measure to compare true observed labels with predicted labels in binary classification tasks.
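fp(truth, response, positive, ...)
truth | (factor()) True (observed) labels. Must have exactly the same two levels and the same length as response. |
response | (factor()) Predicted response labels. Must have exactly the same two levels and the same length as truth. |
positive | (character(1)) Name of the positive class. |
... | (any) Additional arguments. Currently ignored. |
This measure counts the false positives (type 1 error), i.e. the number of predictions indicating a positive class label while in fact it is negative. This is sometimes also called a "false alarm".
Performance value as numeric(1).
Type: "binary"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response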
https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram
Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
fp(truth, response, positive = "a")
Measure to compare true observed labels with predicted labels in binary classification tasks.
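fpr(truth, response, positive, na_value = NaN, ...)
truth | (factor()) True (observed) labels. Must have exactly the same two levels and the same length as response. |
response | (factor()) Predicted response labels. Must have exactly the same two levels and the same length as truth. |
positive | (character(1)) Name of the positive class. |
na_value | (numeric(1)) Value returned if the measure is undefined for the input. Default is NaN. |
... | (any) Additional arguments. Currently ignored. |
The False Positive Rate is defined as
$$ \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}}. $$
Also known as fall-out or probability of false alarm.
This measure is undefined if FP + TN = 0.
Performance value as numeric(1).
Type: "binary"
Range: $[0, 1]$
Minimize: TRUE
Required prediction: response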
https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram
Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
fpr(truth, response, positive = "a")
Measure to compare true observed labels with predicted labels in binary classification tasks.
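gmean(truth, response, positive, na_value = NaN, ...)
truth | (factor()) True (observed) labels. Must have exactly the same two levels and the same length as response. |
response | (factor()) Predicted response labels. Must have exactly the same two levels and the same length as truth. |
positive | (character(1)) Name of the positive class. |
na_value | (numeric(1)) Value returned if the measure is undefined for the input. Default is NaN. |
... | (any) Additional arguments. Currently ignored. |
Calculates the geometric mean of recall() $R$ and specificity() $S$ as
$$ \sqrt{R \cdot S}. $$
This measure is undefined if recall or specificity is undefined, i.e. if TP + FN = 0 or if FP + TN = 0.
Performance value as numeric(1).
Type: "binary"
Range: $[0, 1]$
Minimize: FALSE
Required prediction: response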
He H, Garcia EA (2009). “Learning from Imbalanced Data.” IEEE Transactions on knowledge and data engineering, 21(9), 1263–1284. doi:10.1109/TKDE.2008.239.
Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
gmean(truth, response, positive = "a")
Measure to compare true observed labels with predicted labels in binary classification tasks.
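gpr(truth, response, positive, na_value = NaN, ...)
truth | (factor()) True (observed) labels. Must have exactly the same two levels and the same length as response. |
response | (factor()) Predicted response labels. Must have exactly the same two levels and the same length as truth. |
positive | (character(1)) Name of the positive class. |
na_value | (numeric(1)) Value returned if the measure is undefined for the input. Default is NaN. |
... | (any) Additional arguments. Currently ignored. |
Calculates the geometric mean of precision() $P$ and recall() $R$ as
$$ \sqrt{P \cdot R}. $$
This measure is undefined if precision or recall is undefined, i.e. if TP + FP = 0 or if TP + FN = 0.
Performance value as numeric(1).
Type: "binary"
Range: $[0, 1]$
Minimize: FALSE
Required prediction: response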
He H, Garcia EA (2009). “Learning from Imbalanced Data.” IEEE Transactions on knowledge and data engineering, 21(9), 1263–1284. doi:10.1109/TKDE.2008.239.
Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), npv(), ppv(), prauc(), tn(), tnr(), tp(), tpr()
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
gpr(truth, response, positive = "a")
Measure to compare two or more sets w.r.t. their similarity.
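jaccard(sets, na_value = NaN, ...)
sets | (list()) List of character or integer vectors, each representing one set. |
na_value | (numeric(1)) Value returned if the measure is undefined for the input. Default is NaN. |
... | (any) Additional arguments. Currently ignored. |
For two sets $A$ and $B$, the Jaccard Index is defined as
$$ J(A, B) = \frac{|A \cap B|}{|A \cup B|}. $$
If more than two sets are provided, the mean of all pairwise scores is calculated.
This measure is undefined if two or more sets are empty.
Performance value as numeric(1).
Type: "similarity"
Range: $[0, 1]$
Minimize: FALSE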
Jaccard, Paul (1901). “Étude comparative de la distribution florale dans une portion des Alpes et du Jura.” Bulletin de la Société Vaudoise des Sciences Naturelles, 37, 547-579. doi:10.5169/SEALS-266450.
Bommert A, Rahnenführer J, Lang M (2017). “A Multicriteria Approach to Find Predictive and Sparse Models with Stable Feature Selection for High-Dimensional Data.” Computational and Mathematical Methods in Medicine, 2017, 1–18. doi:10.1155/2017/7907163.
Bommert A, Lang M (2021). “stabm: Stability Measures for Feature Selection.” Journal of Open Source Software, 6(59), 3010. doi:10.21105/joss.03010.
Package stabm which implements many more stability measures with included correction for chance.
Other Similarity Measures: phi()
set.seed(1)
sets = list(
  sample(letters[1:3], 1),
  sample(letters[1:3], 2)
)
jaccard(sets)
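The pairwise averaging described for jaccard() can be spelled out with base R set operations. The function `jaccard_sketch()` is a hypothetical helper for illustration; it does not handle the `na_value` argument and is not the package's implementation.

```r
# Base-R sketch: mean Jaccard index over all pairs of sets.
jaccard_sketch = function(sets) {
  stopifnot(length(sets) >= 2)
  pairs = combn(length(sets), 2)  # one column per pair of set indices
  scores = apply(pairs, 2, function(ij) {
    a = sets[[ij[1]]]
    b = sets[[ij[2]]]
    length(intersect(a, b)) / length(union(a, b))
  })
  mean(scores)
}

two_sets = jaccard_sketch(list(c("a", "b"), c("b", "c")))
three_sets = jaccard_sketch(list(c("a", "b"), c("b", "c"), c("a", "b", "c")))
```

For the two-set case one of three distinct elements is shared; for three sets the pairwise scores 1/3, 2/3 and 2/3 are averaged.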
Measure to compare true observed response with predicted response in regression tasks.
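ktau(truth, response, ...)
truth | (numeric()) True (observed) values. Must have the same length as response. |
response | (numeric()) Predicted response values. Must have the same length as truth. |
... | (any) Additional arguments. Currently ignored. |
Kendall's tau is Kendall's rank correlation coefficient between truth and response, defined as
$$ \tau = \frac{(\text{number of concordant pairs}) - (\text{number of discordant pairs})}{(\text{number of pairs})}. $$
Calls stats::cor() with method set to "kendall".
Performance value as numeric(1).
Type: "regr"
Range: $[-1, 1]$
Minimize: FALSE
Required prediction: response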
Rosset S, Perlich C, Zadrozny B (2006). “Ranking-based evaluation of regression models.” Knowledge and Information Systems, 12(3), 331–353. doi:10.1007/s10115-006-0037-3.
Other Regression Measures: ae(), ape(), bias(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
ktau(truth, response)
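The pair-counting definition of Kendall's tau used by ktau() can be compared against stats::cor() on data without ties. The helper `ktau_sketch()` is hypothetical; it implements the simple concordant-minus-discordant ratio, which only agrees with stats::cor(method = "kendall") when neither vector contains ties.

```r
# Base-R sketch: Kendall's tau by counting concordant and discordant pairs.
ktau_sketch = function(truth, response) {
  idx = combn(length(truth), 2)  # all pairs of observation indices
  s = sign(truth[idx[1, ]] - truth[idx[2, ]]) *
    sign(response[idx[1, ]] - response[idx[2, ]])
  # s > 0: concordant pair, s < 0: discordant pair
  (sum(s > 0) - sum(s < 0)) / ncol(idx)
}

truth = 1:5
response = c(1.2, 1.9, 3.5, 3.1, 5.4)  # observations 3 and 4 are swapped

tau_pairs = ktau_sketch(truth, response)
tau_cor = stats::cor(truth, response, method = "kendall")
```

Of the ten pairs, nine are ordered consistently and one is reversed, so both computations agree on this toy example.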
Measure to compare true observed response with predicted response in regression tasks.
Note that this is an unaggregated measure, returning the losses per observation.
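linex(truth, response, a = -1, b = 1, ...)
truth | (numeric()) True (observed) values. Must have the same length as response. |
response | (numeric()) Predicted response values. Must have the same length as truth. |
a | (numeric(1)) Shape parameter controlling the asymmetry of the loss. Must be nonzero. |
b | (numeric(1)) Positive scaling factor for the loss. |
... | (any) Additional arguments. Currently ignored. |
The Linear-Exponential Loss is defined as
$$ b \left( \exp\left( a (t_i - r_i) \right) - a (t_i - r_i) - 1 \right), $$
where $a \neq 0$ and $b > 0$.
Performance value as numeric(length(truth)).
Type: "regr"
Range (per observation): $[0, \infty)$
Minimize (per observation): TRUE
Required prediction: response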
Varian, R. H (1975). “A Bayesian Approach to Real Estate Assessment.” In Fienberg SE, Zellner A (eds.), Studies in Bayesian Econometrics and Statistics: In Honor of Leonard J. Savage, 195–208. North-Holland, Amsterdam.
Other Regression Measures: ae(), ape(), bias(), ktau(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
linex(truth, response)
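The asymmetry of the Linear-Exponential loss is easiest to see on a tiny example. The sketch below uses the standard LINEX form b * (exp(a * e) - a * e - 1) with e = truth - response; that this matches the parameterization used by linex() is an assumption, and `linex_sketch()` is a hypothetical helper.

```r
# Base-R sketch of the standard LINEX loss (parameterization assumed).
linex_sketch = function(truth, response, a = -1, b = 1) {
  stopifnot(a != 0, b > 0)
  e = truth - response
  b * (exp(a * e) - a * e - 1)
}

truth = c(1, 1, 1)
response = c(1, 2, 0)  # exact, over-prediction by 1, under-prediction by 1
loss = linex_sketch(truth, response)
```

Under this form and a = -1, the over-prediction costs exp(1) - 2 (about 0.72) while the under-prediction of the same size costs exp(-1) (about 0.37); flipping the sign of `a` reverses which direction is penalized more.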
Measure to compare true observed labels with predicted probabilities in multiclass classification tasks.
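logloss(truth, prob, sample_weights = NULL, eps = 1e-15, ...)
truth | (factor()) True (observed) labels. Its levels must match the column names of prob. |
prob | (matrix()) Matrix of predicted probabilities, one column per class label; column names must match the levels of truth. |
sample_weights | (numeric()) Non-negative sample weights of the same length as truth, normalized to sum to one. Defaults to equal weights. |
eps | (numeric(1)) Probabilities are clipped to the interval [eps, 1 - eps], since the measure would otherwise be undefined for probabilities of 0 and 1. |
... | (any) Additional arguments. Currently ignored. |
The Log Loss (a.k.a. Bernoulli Loss, Logistic Loss, Cross-Entropy Loss) is defined as
$$ -\sum_{i=1}^n w_i \log \left( p_i \right), $$
where $p_i$ is the probability for the true class of observation $i$ and $w_i$ are normalized weights for each observation $x_i$.
Performance value as numeric(1).
Type: "classif"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: prob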
Other Classification Measures: acc(), bacc(), ce(), mauc_aunu(), mbrier(), mcc(), zero_one()
set.seed(1)
lvls = c("a", "b", "c")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
prob = matrix(runif(3 * 10), ncol = 3, dimnames = list(NULL, lvls))
prob = t(apply(prob, 1, function(x) x / sum(x)))
logloss(truth, prob)
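The Log Loss of logloss() only looks at the probability assigned to the true class of each observation. The sketch below assumes equal sample weights and assumes that `eps` clips probabilities to [eps, 1 - eps]; `logloss_sketch()` is a hypothetical helper, not the package's code.

```r
# Base-R sketch of the unweighted log loss.
logloss_sketch = function(truth, prob, eps = 1e-15) {
  # Pick, per row, the probability of the true class via matrix indexing.
  col = match(as.character(truth), colnames(prob))
  p = prob[cbind(seq_along(truth), col)]
  # Clip to avoid log(0) (assumed role of eps).
  p = pmax(eps, pmin(1 - eps, p))
  -mean(log(p))
}

lvls = c("a", "b", "c")
truth = factor(c("a", "b", "c"), levels = lvls)
prob = matrix(
  c(0.7, 0.2, 0.1,
    0.1, 0.8, 0.1,
    0.2, 0.3, 0.5),
  ncol = 3, byrow = TRUE, dimnames = list(NULL, lvls)
)
ll = logloss_sketch(truth, prob)  # uses 0.7, 0.8 and 0.5
```

Only the three true-class probabilities enter the result, so the score equals -log(0.7 * 0.8 * 0.5) / 3.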
Measure to compare true observed response with predicted response in regression tasks.
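mae(truth, response, sample_weights = NULL, ...)
truth | (numeric()) True (observed) values. Must have the same length as response. |
response | (numeric()) Predicted response values. Must have the same length as truth. |
sample_weights | (numeric()) Non-negative sample weights of the same length as truth, normalized to sum to one. Defaults to equal weights. |
... | (any) Additional arguments. Currently ignored. |
The Mean Absolute Error is defined as
$$ \sum_{i=1}^n w_i \left| t_i - r_i \right|, $$
where $w_i$ are normalized sample weights.
Performance value as numeric(1).
Type: "regr"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response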
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
mae(truth, response)
Measure to compare true observed response with predicted response in regression tasks.
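mape(truth, response, sample_weights = NULL, na_value = NaN, ...)
truth | (numeric()) True (observed) values. Must have the same length as response. |
response | (numeric()) Predicted response values. Must have the same length as truth. |
sample_weights | (numeric()) Non-negative sample weights of the same length as truth, normalized to sum to one. Defaults to equal weights. |
na_value | (numeric(1)) Value returned if the measure is undefined for the input. Default is NaN. |
... | (any) Additional arguments. Currently ignored. |
The Mean Absolute Percent Error is defined as
$$ \sum_{i=1}^n w_i \left| \frac{t_i - r_i}{t_i} \right|, $$
where $w_i$ are normalized sample weights.
This measure is undefined if any element of $t$ is $0$.
Performance value as numeric(1).
Type: "regr"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response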
de Myttenaere, Arnaud, Golden, Boris, Le Grand, Bénédicte, Rossi, Fabrice (2016). “Mean Absolute Percentage Error for regression models.” Neurocomputing, 192, 38-48. ISSN 0925-2312, doi:10.1016/j.neucom.2015.12.114.
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
mape(truth, response)
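The Mean Absolute Percent Error of mape() is the average of the per-observation errors returned by ape(). The sketch below assumes equal sample weights and returns NaN when a true value is zero; `mape_sketch()` is a hypothetical helper for illustration.

```r
# Base-R sketch of the unweighted mean absolute percent error.
mape_sketch = function(truth, response) {
  # Undefined if any true value is zero (division by zero).
  if (any(truth == 0)) return(NaN)
  mean(abs((truth - response) / truth))
}

truth = c(100, 200, 50)
response = c(110, 180, 50)
mape_value = mape_sketch(truth, response)  # errors of 10%, 10% and 0%
```

Because each error is divided by the true value, the 10-unit miss at 100 and the 20-unit miss at 200 contribute equally.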
Measure to compare true observed labels with predicted probabilities in multiclass classification tasks.
mauc_aunu(truth, prob, na_value = NaN, ...)
mauc_aunp(truth, prob, na_value = NaN, ...)
mauc_au1u(truth, prob, na_value = NaN, ...)
mauc_au1p(truth, prob, na_value = NaN, ...)
mauc_mu(truth, prob, na_value = NaN, ...)
truth | (factor()) True (observed) labels. Its levels must match the column names of prob. |
prob | (matrix()) Matrix of predicted probabilities, one column per class label; column names must match the levels of truth. |
na_value | (numeric(1)) Value returned if the measure is undefined for the input. Default is NaN. |
... | (any) Additional arguments. Currently ignored. |
Multiclass AUC measures.
AUNU: AUC of each class against the rest, using the uniform class distribution. Computes the AUC treating a c-dimensional classifier as c two-dimensional 1-vs-rest classifiers, where classes are assumed to have uniform distribution, in order to have a measure which is independent of class distribution change (Fawcett 2001).
AUNP: AUC of each class against the rest, using the a-priori class distribution. Computes the AUC treating a c-dimensional classifier as c two-dimensional 1-vs-rest classifiers, taking into account the prior probability of each class (Fawcett 2001).
AU1U: AUC of each class against each other, using the uniform class distribution. Computes something like the AUC of c(c - 1) binary classifiers (all possible pairwise combinations). See Hand (2001) for details.
AU1P: AUC of each class against each other, using the a-priori class distribution. Computes something like the AUC of c(c - 1) binary classifiers while considering the a-priori distribution of the classes as suggested in Ferri (2009). Note we deviate from the definition in Ferri (2009) by a factor of c.
MU: Multiclass AUC as defined in Kleiman and Page (2019). This measure is an average of the pairwise AUCs between all classes. The measure was tested against the Python implementation by Ross Kleiman.
Performance value as numeric(1).
Type: "classif"
Range: $[0, 1]$
Minimize: FALSE
Required prediction: prob
Fawcett, Tom (2001). “Using rule sets to maximize ROC performance.” In Proceedings 2001 IEEE international conference on data mining, 131–138. IEEE.
Ferri, César, Hernández-Orallo, José, Modroiu, R (2009). “An experimental comparison of performance measures for classification.” Pattern Recognition Letters, 30(1), 27–38. doi:10.1016/j.patrec.2008.08.010.
Hand, J D, Till, J R (2001). “A simple generalisation of the area under the ROC curve for multiple class classification problems.” Machine learning, 45(2), 171–186.
Kleiman R, Page D (2019). “AUC mu: A Performance Metric for Multi-Class Machine Learning Models.” In Chaudhuri, Kamalika, Salakhutdinov, Ruslan (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 series Proceedings of Machine Learning Research, 3439–3447. PMLR.
Other Classification Measures: acc(), bacc(), ce(), logloss(), mbrier(), mcc(), zero_one()
set.seed(1)
lvls = c("a", "b", "c")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
prob = matrix(runif(3 * 10), ncol = 3)
colnames(prob) = levels(truth)
mauc_aunu(truth, prob)
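The two one-vs-rest variants (AUNU and AUNP) can be illustrated in base R. This is only a sketch under stated assumptions, not the package's implementation: `auc_binary()` and `ovr_auc()` are hypothetical helper names, the binary AUC uses the rank-based Mann-Whitney form, and the class prior is estimated from the observed class frequencies.

```r
# Rank-based (Mann-Whitney) AUC for one binary problem.
# is_pos: logical vector, score: numeric vector of the same length.
auc_binary = function(is_pos, score) {
  n_pos = sum(is_pos)
  n_neg = sum(!is_pos)
  r = rank(score)  # average ranks, so ties count as 1/2
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# One-vs-rest AUC per class, averaged uniformly (AUNU)
# or weighted by the observed class frequencies (AUNP).
ovr_auc = function(truth, prob) {
  per_class = vapply(levels(truth), function(k) {
    auc_binary(truth == k, prob[, k])
  }, numeric(1))
  prior = as.numeric(table(truth)) / length(truth)
  c(aunu = mean(per_class), aunp = sum(prior * per_class))
}

# Toy data: every observation gets probability 0.9 for its true class.
truth = factor(c("a", "a", "b", "b", "c", "c"), levels = c("a", "b", "c"))
prob = diag(3)[as.integer(truth), ] * 0.8 + 0.1
colnames(prob) = levels(truth)
res = ovr_auc(truth, prob)
res
```

With perfectly separating probabilities both averages reach the upper bound; the two values differ only when the classes are imbalanced and the per-class AUCs are unequal.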
Measure to compare true observed response with predicted response in regression tasks.
maxae(truth, response, ...)
truth |
( |
response |
( |
... |
( |
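The Max Absolute Error is defined as
$$\max_{i = 1, \ldots, n} \left| t_i - r_i \right|,$$
where $t_i$ is the true and $r_i$ the predicted value of observation $i$.
Performance value as numeric(1).
Type: "regr"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response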
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
maxae(truth, response)
Measure to compare true observed response with predicted response in regression tasks.
maxse(truth, response, ...)
truth |
( |
response |
( |
... |
( |
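The Max Squared Error is defined as
$$\max_{i = 1, \ldots, n} \left( t_i - r_i \right)^2,$$
where $t_i$ is the true and $r_i$ the predicted value of observation $i$.
Performance value as numeric(1).
Type: "regr"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response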
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
maxse(truth, response)
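As a quick cross-check of the two worst-case errors, both can be written in one line of base R. The helper names `max_abs_err()` and `max_sq_err()` are hypothetical and only serve this sketch.

```r
# Largest absolute and largest squared deviation over all observations.
max_abs_err = function(truth, response) max(abs(truth - response))
max_sq_err = function(truth, response) max((truth - response)^2)

truth = c(1, 2, 3, 4)
response = c(1.5, 2, 1, 4.25)
max_abs_err(truth, response)  # driven by the third observation only
max_sq_err(truth, response)   # square of the value above
```

Because both measures depend on a single observation, one outlier fully determines the score.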
Measure to compare true observed labels with predicted probabilities in multiclass classification tasks.
mbrier(truth, prob, ...)
truth |
( |
prob |
( |
... |
( |
Brier score for multi-class classification problems with $r$ labels, defined as
$$\frac{1}{n} \sum_{i=1}^n \sum_{j=1}^r \left( I_{ij} - p_{ij} \right)^2.$$
$I_{ij}$ is 1 if observation $x_i$ has true label $j$, and 0 otherwise. $p_{ij}$ is the probability that observation $x_i$ belongs to class $j$.
Note that there also is the more common definition of the Brier score for binary classification problems in bbrier().
Performance value as numeric(1).
Type: "classif"
Range: $[0, 2]$
Minimize: TRUE
Required prediction: prob
Brier GW (1950). “Verification of forecasts expressed in terms of probability.” Monthly Weather Review, 78(1), 1–3. doi:10.1175/1520-0493(1950)078<0001:vofeit>2.0.co;2.
Other Classification Measures: acc(), bacc(), ce(), logloss(), mauc_aunu(), mcc(), zero_one()
set.seed(1)
lvls = c("a", "b", "c")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
prob = matrix(runif(3 * 10), ncol = 3)
colnames(prob) = levels(truth)
mbrier(truth, prob)
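The multi-class Brier score can be sketched in base R by comparing a one-hot indicator matrix with the probability matrix. `multiclass_brier()` is a hypothetical helper for illustration; it assumes the columns of `prob` are named after the class labels.

```r
multiclass_brier = function(truth, prob) {
  # Indicator matrix: entry [i, j] is 1 if observation i has true label j.
  ind = outer(as.character(truth), colnames(prob), "==") * 1
  # Sum of squared differences per observation, averaged over observations.
  mean(rowSums((ind - prob)^2))
}

truth = factor(c("a", "b", "c"), levels = c("a", "b", "c"))
perfect = diag(3)
colnames(perfect) = levels(truth)
worst = perfect[c(2, 3, 1), ]  # full probability on a wrong class

multiclass_brier(truth, perfect)
multiclass_brier(truth, worst)
```

The two calls illustrate the extremes of the score: a perfect prediction and a fully confident wrong prediction.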
Measure to compare true observed labels with predicted labels in multiclass classification tasks.
mcc(truth, response, positive = NULL, ...)
truth |
( |
response |
( |
positive |
( |
... |
( |
In the binary case, the Matthews Correlation Coefficient is defined as
$$\frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}},$$
where $TP$, $FP$, $TN$, $FN$ are the number of true positives, false positives, true negatives, and false negatives respectively.
In the multi-class case, the Matthews Correlation Coefficient is defined for a multi-class confusion matrix with $k$ classes:
$$\frac{c \cdot s - \sum_{i=1}^k p_i \cdot t_i}{\sqrt{\left( s^2 - \sum_{i=1}^k p_i^2 \right) \left( s^2 - \sum_{i=1}^k t_i^2 \right)}},$$
where
$s$: total number of samples
$c$: total number of correctly predicted samples
$p_i$: number of predictions for each class $i$
$t_i$: number of true occurrences for each class $i$.
The above formula is undefined if any of the four sums in the denominator is 0 in the binary case and more generally if either $s^2 - \sum_{i=1}^k p_i^2$ or $s^2 - \sum_{i=1}^k t_i^2$ is equal to 0.
The denominator is then set to 1.
When there are more than two classes, the MCC will no longer range between -1 and +1. Instead, the minimum value will be between -1 and 0 depending on the true distribution. The maximum value is always +1.
Performance value as numeric(1).
Type: "classif"
Range: $[-1, 1]$
Minimize: FALSE
Required prediction: response
https://en.wikipedia.org/wiki/Phi_coefficient
Matthews BW (1975). “Comparison of the predicted and observed secondary structure of T4 phage lysozyme.” Biochimica et Biophysica Acta (BBA) - Protein Structure, 405(2), 442–451. doi:10.1016/0005-2795(75)90109-9.
Other Classification Measures: acc(), bacc(), ce(), logloss(), mauc_aunu(), mbrier(), zero_one()
set.seed(1)
lvls = c("a", "b", "c")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
mcc(truth, response)
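To see that the binary and the multi-class definition agree for two classes, both can be sketched in base R. The helpers `mcc_binary()` and `mcc_multiclass()` are hypothetical names for this illustration, and the zero-denominator convention follows the description above.

```r
# Binary MCC from the four confusion counts.
mcc_binary = function(tp, fp, tn, fn) {
  denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  if (denom == 0) denom = 1  # convention: undefined case uses denominator 1
  (tp * tn - fp * fn) / denom
}

# Multi-class MCC from two factors with identical levels.
mcc_multiclass = function(truth, response) {
  conf = table(response, truth)  # rows: predicted, columns: true
  s = sum(conf)                  # total number of samples
  n_ok = sum(diag(conf))         # correctly predicted samples
  n_pred = rowSums(conf)         # predictions per class
  n_true = colSums(conf)         # true occurrences per class
  denom = sqrt((s^2 - sum(n_pred^2)) * (s^2 - sum(n_true^2)))
  if (denom == 0) denom = 1
  (n_ok * s - sum(n_pred * n_true)) / denom
}

lvls = c("a", "b")
truth = factor(c("a", "a", "a", "b", "b", "b"), levels = lvls)
response = factor(c("a", "a", "b", "b", "b", "a"), levels = lvls)

# With "a" as positive class: TP = 2, FP = 1, TN = 2, FN = 1.
mcc_binary(tp = 2, fp = 1, tn = 2, fn = 1)
mcc_multiclass(truth, response)
```

Both calls return the same value, since the multi-class formula reduces to the binary one when there are two classes.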
The environment() measures keeps track of all measures in this package. It stores meta information such as minimum, maximum or if the measure must be minimized or maximized.
The following information is available for each measure:
id: Name of the measure.
title: Short descriptive title.
type: "binary" for binary classification, "classif" for binary or multi-class classification, "regr" for regression and "similarity" for similarity measures.
lower: Lower bound.
upper: Upper bound.
predict_type: Prediction type the measure operates on. "response" corresponds to class labels for classification and the numeric response for regression. "prob" corresponds to class probabilities, provided as a matrix with class labels as column names. "se" corresponds to the vector of predicted standard errors for regression.
minimize: If TRUE or FALSE, the objective is to minimize or maximize the measure, respectively. Can also be NA.
obs_loss: Name of the function which is called to calculate the (unaggregated) loss per observation.
trafo: Optional list() of length 2, containing a transformation "fn" and its derivative "deriv".
aggregated: If TRUE, this function aggregates the losses to a single numeric value. Otherwise, a vector of losses is returned.
sample_weights: If TRUE, it is possible to calculate a weighted measure.
measures
An object of class environment of length 65.
names(measures)
measures$tpr
Measure to compare true observed response with predicted response in regression tasks.
medae(truth, response, ...)
truth |
( |
response |
( |
... |
( |
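The Median Absolute Error is defined as
$$\mathop{\mathrm{median}}_{i = 1, \ldots, n} \left| t_i - r_i \right|,$$
where $t_i$ is the true and $r_i$ the predicted value of observation $i$.
Performance value as numeric(1).
Type: "regr"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response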
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
medae(truth, response)
Measure to compare true observed response with predicted response in regression tasks.
medse(truth, response, ...)
truth |
( |
response |
( |
... |
( |
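The Median Squared Error is defined as
$$\mathop{\mathrm{median}}_{i = 1, \ldots, n} \left( t_i - r_i \right)^2,$$
where $t_i$ is the true and $r_i$ the predicted value of observation $i$.
Performance value as numeric(1).
Type: "regr"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response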
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
medse(truth, response)
Measure to compare true observed response with predicted response in regression tasks.
mse(truth, response, sample_weights = NULL, ...)
truth |
( |
response |
( |
sample_weights |
( |
... |
( |
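The Mean Squared Error is defined as
$$\sum_{i=1}^n w_i \left( t_i - r_i \right)^2,$$
where $t_i$ is the true value, $r_i$ the predicted value and $w_i$ are normalized sample weights, i.e. weights scaled to sum to one ($w_i = 1/n$ without sample weights).
Performance value as numeric(1).
Type: "regr"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response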
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
mse(truth, response)
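The effect of sample weights can be sketched in base R. This is an illustration only: `weighted_mse()` is a hypothetical helper, and it assumes that "normalized" means the weights are rescaled to sum to one, so that the result is a weighted mean of the squared errors.

```r
weighted_mse = function(truth, response, sample_weights = NULL) {
  # Without weights every observation counts equally.
  if (is.null(sample_weights)) sample_weights = rep(1, length(truth))
  w = sample_weights / sum(sample_weights)  # normalize to sum to one
  sum(w * (truth - response)^2)
}

truth = c(1, 2, 3)
response = c(1, 2, 5)  # only the third prediction is off (by 2)
unweighted = weighted_mse(truth, response)
upweighted = weighted_mse(truth, response, sample_weights = c(1, 1, 2))
c(unweighted, upweighted)
```

Doubling the weight of the only erroneous observation raises the score, because that observation then carries half of the total weight instead of a third.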
Measure to compare true observed response with predicted response in regression tasks.
msle(truth, response, sample_weights = NULL, na_value = NaN, ...)
truth |
( |
response |
( |
sample_weights |
( |
na_value |
( |
... |
( |
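The Mean Squared Log Error is defined as
$$\sum_{i=1}^n w_i \left( \ln (1 + t_i) - \ln (1 + r_i) \right)^2,$$
where $t_i$ is the true value, $r_i$ the predicted value and $w_i$ are normalized sample weights, i.e. weights scaled to sum to one ($w_i = 1/n$ without sample weights).
This measure is undefined if any element of $t$ or $r$ is less than or equal to $-1$.
Performance value as numeric(1).
Type: "regr"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response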
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
msle(truth, response)
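The undefined case and the role of `na_value` can be sketched in base R. `msle_sketch()` is a hypothetical, unweighted helper for illustration; it returns `na_value` whenever a logarithm would not be defined.

```r
msle_sketch = function(truth, response, na_value = NaN) {
  # log(1 + x) is only defined for x > -1.
  if (any(truth <= -1) || any(response <= -1)) return(na_value)
  mean((log1p(truth) - log1p(response))^2)
}

ok = msle_sketch(c(0, 1), c(0, 3))       # second term: (log(2) - log(4))^2
undefined = msle_sketch(c(0, -1), c(0, 3))  # truth contains -1
c(ok, undefined)
```

Passing a different `na_value` (for example `Inf`) lets callers decide how an undefined score should be treated downstream.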
Measure to compare true observed labels with predicted labels in binary classification tasks.
npv(truth, response, positive, na_value = NaN, ...)
truth |
( |
response |
( |
positive |
( |
na_value |
( |
... |
( |
The Negative Predictive Value is defined as
$$\frac{TN}{FN + TN}.$$
This measure is undefined if FN + TN = 0.
Performance value as numeric(1).
Type: "binary"
Range: $[0, 1]$
Minimize: FALSE
Required prediction: response
https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram
Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), ppv(), prauc(), tn(), tnr(), tp(), tpr()
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
npv(truth, response, positive = "a")
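The confusion counts behind this measure, and the undefined case FN + TN = 0, can be sketched in base R. `confusion_counts()` and `npv_sketch()` are hypothetical helpers for illustration only.

```r
confusion_counts = function(truth, response, positive) {
  c(
    tp = sum(truth == positive & response == positive),
    fp = sum(truth != positive & response == positive),
    tn = sum(truth != positive & response != positive),
    fn = sum(truth == positive & response != positive)
  )
}

npv_sketch = function(truth, response, positive, na_value = NaN) {
  m = confusion_counts(truth, response, positive)
  # No negative predictions at all: the ratio is undefined.
  if (m[["fn"]] + m[["tn"]] == 0) return(na_value)
  m[["tn"]] / (m[["fn"]] + m[["tn"]])
}

lvls = c("a", "b")
truth = factor(c("a", "a", "b", "b", "b"), levels = lvls)
response = factor(c("a", "b", "b", "b", "a"), levels = lvls)
all_positive = factor(rep("a", 5), levels = lvls)

npv_sketch(truth, response, positive = "a")      # TN = 2, FN = 1
npv_sketch(truth, all_positive, positive = "a")  # undefined case
```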
Measure to compare true observed response with predicted response in regression tasks.
pbias(truth, response, sample_weights = NULL, na_value = NaN, ...)
truth |
( |
response |
( |
sample_weights |
( |
na_value |
( |
... |
( |
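The Percent Bias is defined as
$$\sum_{i=1}^n w_i \frac{t_i - r_i}{\left| t_i \right|},$$
where $t_i$ is the true value, $r_i$ the predicted value and $w_i$ are normalized sample weights, i.e. weights scaled to sum to one ($w_i = 1/n$ without sample weights).
Good predictions score close to 0.
Performance value as numeric(1).
Type: "regr"
Range: $(-\infty, \infty)$
Minimize: NA
Required prediction: response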
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
pbias(truth, response)
Measure to compare two or more sets w.r.t. their similarity.
phi(sets, p, na_value = NaN, ...)
sets |
( |
p |
( |
na_value |
( |
... |
( |
The Phi Coefficient is defined as the Pearson correlation between the binary representation of two sets $A$ and $B$.
The binary representation for $A$ is a logical vector of length $p$ with the i-th element being 1 if the corresponding element is in $A$, and 0 otherwise.
If more than two sets are provided, the mean of all pairwise scores is calculated.
This measure is undefined if one set contains none or all possible elements.
Performance value as numeric(1).
Type: "similarity"
Range: $[-1, 1]$
Minimize: FALSE
Nogueira S, Brown G (2016). “Measuring the Stability of Feature Selection.” In Machine Learning and Knowledge Discovery in Databases, 442–457. Springer International Publishing. doi:10.1007/978-3-319-46227-1_28.
Bommert A, Rahnenführer J, Lang M (2017). “A Multicriteria Approach to Find Predictive and Sparse Models with Stable Feature Selection for High-Dimensional Data.” Computational and Mathematical Methods in Medicine, 2017, 1–18. doi:10.1155/2017/7907163.
Bommert A, Lang M (2021). “stabm: Stability Measures for Feature Selection.” Journal of Open Source Software, 6(59), 3010. doi:10.21105/joss.03010.
See also the package stabm, which implements many more stability measures, including corrections for chance.
Other Similarity Measures: jaccard()
set.seed(1)
sets = list(
  sample(letters[1:3], 1),
  sample(letters[1:3], 2)
)
phi(sets, p = 3)
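The binary representation and the pairwise averaging can be sketched in base R. Note the assumptions: `phi_sketch()` is a hypothetical helper, and unlike `phi()`, which takes the number of possible elements `p`, it takes the full set of possible elements explicitly as `universe`.

```r
phi_sketch = function(sets, universe) {
  # One 0/1 column per set: is each element of the universe contained?
  bin = vapply(sets, function(s) universe %in% s, logical(length(universe))) * 1
  # Pearson correlation for every pair of sets, then the mean over pairs.
  pairs = combn(length(sets), 2)
  scores = apply(pairs, 2, function(ij) cor(bin[, ij[1]], bin[, ij[2]]))
  mean(scores)
}

universe = letters[1:4]
same = phi_sketch(list(c("a", "b"), c("a", "b")), universe)
disjoint = phi_sketch(list(c("a", "b"), c("c", "d")), universe)
c(same, disjoint)
```

Identical sets and complementary sets mark the two ends of the range. An empty or complete set has a constant binary representation, for which the correlation is not defined.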
Measure to compare true observed response with predicted response in regression tasks.
pinball(truth, response, sample_weights = NULL, alpha = 0.5, ...)
truth |
( |
response |
( |
sample_weights |
( |
alpha |
|
... |
( |
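The pinball loss for quantile regression is defined as
$$\sum_{i=1}^n w_i \begin{cases} q \cdot (t_i - r_i) & \text{if } t_i \geq r_i \\ (1 - q) \cdot (r_i - t_i) & \text{if } t_i < r_i \end{cases}$$
where $q$ is the quantile (argument alpha), $t_i$ is the true value, $r_i$ the predicted value and $w_i$ are normalized sample weights, i.e. weights scaled to sum to one ($w_i = 1/n$ without sample weights).
Performance value as numeric(1).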
Type: "regr"
Range:
Minimize: TRUE
Required prediction: response
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
pinball(truth, response)
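The asymmetry of the pinball loss can be sketched in base R. `pinball_sketch()` is a hypothetical, unweighted helper that implements the standard textbook definition; it is meant as an illustration, not as the package's code.

```r
pinball_sketch = function(truth, response, alpha = 0.5) {
  diff = truth - response
  # Under-prediction (diff >= 0) costs alpha per unit,
  # over-prediction costs (1 - alpha) per unit.
  mean(ifelse(diff >= 0, alpha * diff, (1 - alpha) * (-diff)))
}

truth = c(10, 10)
response = c(8, 11)  # one under-prediction by 2, one over-prediction by 1
median_loss = pinball_sketch(truth, response, alpha = 0.5)
upper_loss = pinball_sketch(truth, response, alpha = 0.9)
c(median_loss, upper_loss)
```

For `alpha = 0.5` the loss is half the mean absolute error; for a high quantile such as 0.9, under-predictions are penalized much more strongly than over-predictions.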
Measure to compare true observed labels with predicted labels in binary classification tasks.
ppv(truth, response, positive, na_value = NaN, ...)
precision(truth, response, positive, na_value = NaN, ...)
truth |
( |
response |
( |
positive |
( |
na_value |
( |
... |
( |
The Positive Predictive Value is defined as
$$\frac{TP}{TP + FP}.$$
Also known as "precision".
This measure is undefined if TP + FP = 0.
Performance value as numeric(1).
Type: "binary"
Range: $[0, 1]$
Minimize: FALSE
Required prediction: response
https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram
Goutte C, Gaussier E (2005). “A Probabilistic Interpretation of Precision, Recall and F-Score, with Implication for Evaluation.” In Lecture Notes in Computer Science, 345–359. doi:10.1007/978-3-540-31865-1_25.
Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), prauc(), tn(), tnr(), tp(), tpr()
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
ppv(truth, response, positive = "a")
Measure to compare true observed labels with predicted probabilities in binary classification tasks.
prauc(truth, prob, positive, na_value = NaN, ...)
truth |
( |
prob |
( |
positive |
( |
na_value |
( |
... |
( |
Computes the area under the Precision-Recall curve (PRC). The PRC can be interpreted as the relationship between precision and recall (sensitivity), and is considered to be a more appropriate measure for unbalanced datasets than the ROC curve. The AUC-PRC is computed by integration of the piecewise function.
This measure is undefined if the true values are either all positive or all negative.
Performance value as numeric(1).
Type: "binary"
Range: $[0, 1]$
Minimize: FALSE
Required prediction: prob
Davis J, Goadrich M (2006). “The relationship between precision-recall and ROC curves.” In Proceedings of the 23rd International Conference on Machine Learning. ISBN 9781595933836.
Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), tn(), tnr(), tp(), tpr()
truth = factor(c("a", "a", "a", "b"))
prob = c(.6, .7, .1, .4)
prauc(truth, prob, "a")
Measure to compare true observed response with predicted response in regression tasks.
rae(truth, response, na_value = NaN, ...)
truth |
( |
response |
( |
na_value |
( |
... |
( |
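The Relative Absolute Error is defined as
$$\frac{\sum_{i=1}^n \left| t_i - r_i \right|}{\sum_{i=1}^n \left| t_i - \bar{t} \right|},$$
where $\bar{t} = \frac{1}{n} \sum_{i=1}^n t_i$, $t_i$ is the true and $r_i$ the predicted value of observation $i$.
This measure is undefined for constant $t$.
Can be interpreted as absolute error of the predictions relative to a naive model predicting the mean.
Performance value as numeric(1).
Type: "regr"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response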
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
rae(truth, response)
Measure to compare true observed response with predicted response in regression tasks.
rmse(truth, response, sample_weights = NULL, ...)
truth |
( |
response |
( |
sample_weights |
( |
... |
( |
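The Root Mean Squared Error is defined as
$$\sqrt{\sum_{i=1}^n w_i \left( t_i - r_i \right)^2},$$
where $t_i$ is the true value, $r_i$ the predicted value and $w_i$ are normalized sample weights, i.e. weights scaled to sum to one ($w_i = 1/n$ without sample weights).
Performance value as numeric(1).
Type: "regr"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response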
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
rmse(truth, response)
Measure to compare true observed response with predicted response in regression tasks.
rmsle(truth, response, sample_weights = NULL, na_value = NaN, ...)
truth |
( |
response |
( |
sample_weights |
( |
na_value |
( |
... |
( |
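The Root Mean Squared Log Error is defined as
$$\sqrt{\sum_{i=1}^n w_i \left( \ln (1 + t_i) - \ln (1 + r_i) \right)^2},$$
where $t_i$ is the true value, $r_i$ the predicted value and $w_i$ are normalized sample weights, i.e. weights scaled to sum to one ($w_i = 1/n$ without sample weights).
This measure is undefined if any element of $t$ or $r$ is less than or equal to $-1$.
Performance value as numeric(1).
Type: "regr"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response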
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
rmsle(truth, response)
Measure to compare true observed response with predicted response in regression tasks.
rrse(truth, response, na_value = NaN, ...)
truth |
( |
response |
( |
na_value |
( |
... |
( |
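The Root Relative Squared Error is defined as
$$\sqrt{\frac{\sum_{i=1}^n \left( t_i - r_i \right)^2}{\sum_{i=1}^n \left( t_i - \bar{t} \right)^2}},$$
where $\bar{t} = \frac{1}{n} \sum_{i=1}^n t_i$, $t_i$ is the true and $r_i$ the predicted value of observation $i$.
Can be interpreted as root of the squared error of the predictions relative to a naive model predicting the mean.
This measure is undefined for constant $t$.
Performance value as numeric(1).
Type: "regr"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response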
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
rrse(truth, response)
Measure to compare true observed response with predicted response in regression tasks.
rse(truth, response, na_value = NaN, ...)
truth |
( |
response |
( |
na_value |
( |
... |
( |
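The Relative Squared Error is defined as
$$\frac{\sum_{i=1}^n \left( t_i - r_i \right)^2}{\sum_{i=1}^n \left( t_i - \bar{t} \right)^2},$$
where $\bar{t} = \frac{1}{n} \sum_{i=1}^n t_i$, $t_i$ is the true and $r_i$ the predicted value of observation $i$.
Can be interpreted as squared error of the predictions relative to a naive model predicting the mean.
This measure is undefined for constant $t$.
Performance value as numeric(1).
Type: "regr"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response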
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rsq(), sae(), se(), sle(), smape(), srho(), sse()
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
rse(truth, response)
Measure to compare true observed response with predicted response in regression tasks.
rsq(truth, response, na_value = NaN, ...)
truth |
( |
response |
( |
na_value |
( |
... |
( |
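R Squared is defined as
$$1 - \frac{\sum_{i=1}^n \left( t_i - r_i \right)^2}{\sum_{i=1}^n \left( t_i - \bar{t} \right)^2},$$
where $\bar{t} = \frac{1}{n} \sum_{i=1}^n t_i$, $t_i$ is the true and $r_i$ the predicted value of observation $i$.
Also known as coefficient of determination or explained variation.
Subtracts the rse() from 1, hence it compares the squared error of the predictions relative to a naive model predicting the mean.
This measure is undefined for constant $t$.
Performance value as numeric(1).
Type: "regr"
Range: $(-\infty, 1]$
Minimize: FALSE
Required prediction: response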
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), sae(), se(), sle(), smape(), srho(), sse()
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
rsq(truth, response)
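The relationship between the relative squared error, its root and R squared can be sketched in base R. The helper `rse_sketch()` is a hypothetical name used only for this illustration.

```r
# Squared error of the predictions relative to always predicting the mean.
rse_sketch = function(truth, response) {
  sum((truth - response)^2) / sum((truth - mean(truth))^2)
}

truth = c(1, 2, 3, 4)
response = c(1, 2, 3, 6)
naive = rep(mean(truth), length(truth))  # the naive mean model

rel = rse_sketch(truth, response)
c(rse = rel, rrse = sqrt(rel), rsq = 1 - rel)

# The naive model has relative squared error 1, hence R squared 0.
1 - rse_sketch(truth, naive)
```

A model worse than the naive mean model has a relative squared error above 1 and therefore a negative R squared.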
Measure to compare true observed response with predicted response in regression tasks.
sae(truth, response, ...)
truth |
( |
response |
( |
... |
( |
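The Sum of Absolute Errors is defined as
$$\sum_{i=1}^n \left| t_i - r_i \right|,$$
where $t_i$ is the true and $r_i$ the predicted value of observation $i$.
Performance value as numeric(1).
Type: "regr"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response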
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), se(), sle(), smape(), srho(), sse()
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
sae(truth, response)
Measure to compare true observed response with predicted response in regression tasks.
Note that this is an unaggregated measure, returning the losses per observation.
se(truth, response, ...)
truth |
( |
response |
( |
... |
( |
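Calculates the per-observation squared error as
$$\left( t_i - r_i \right)^2,$$
where $t_i$ is the true and $r_i$ the predicted value of observation $i$.
Performance value as numeric(length(truth)).
Type: "regr"
Range (per observation): $[0, \infty)$
Minimize (per observation): TRUE
Required prediction: response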
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), sle(), smape(), srho(), sse()
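Measure to compare true observed response with predicted response in regression tasks.
Note that this is an unaggregated measure, returning the losses per observation.
Calculates the per-observation squared log error as
$$\left( \ln (1 + t_i) - \ln (1 + r_i) \right)^2,$$
where $t_i$ is the true and $r_i$ the predicted value of observation $i$.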
sle(truth, response, ...)
truth |
( |
response |
( |
... |
( |
Performance value as numeric(length(truth)).
Type: "regr"
Range (per observation): $[0, \infty)$
Minimize (per observation): TRUE
Required prediction: response
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), smape(), srho(), sse()
Measure to compare true observed response with predicted response in regression tasks.
smape(truth, response, na_value = NaN, ...)
truth |
( |
response |
( |
na_value |
( |
... |
( |
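The Symmetric Mean Absolute Percent Error is defined as
$$\frac{2}{n} \sum_{i=1}^n \frac{\left| t_i - r_i \right|}{\left| t_i \right| + \left| r_i \right|},$$
where $t_i$ is the true and $r_i$ the predicted value of observation $i$.
This measure is undefined if any $|t_i| + |r_i|$ is equal to $0$.
Performance value as numeric(1).
Type: "regr"
Range: $[0, 2]$
Minimize: TRUE
Required prediction: response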
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), srho(), sse()
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
smape(truth, response)
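The bounds and the undefined case of the symmetric percentage error can be sketched in base R. `smape_sketch()` is a hypothetical helper for illustration; it returns `na_value` when a denominator is zero.

```r
smape_sketch = function(truth, response, na_value = NaN) {
  denom = abs(truth) + abs(response)
  # Both truth and response equal to 0 for some observation: undefined.
  if (any(denom == 0)) return(na_value)
  mean(2 * abs(truth - response) / denom)
}

partly_off = smape_sketch(c(1, 2), c(3, 2))        # one error, one exact hit
opposite_sign = smape_sketch(c(1, -1), c(-1, 1))   # predictions with wrong sign
undefined = smape_sketch(c(0, 1), c(0, 1))         # zero denominator
c(partly_off, opposite_sign, undefined)
```

Predictions with the opposite sign of the truth reach the upper bound of the measure regardless of their magnitude.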
Measure to compare true observed response with predicted response in regression tasks.
srho(truth, response, ...)
truth |
( |
response |
( |
... |
( |
Spearman's rho is defined as Spearman's rank correlation coefficient between truth and response.
Calls stats::cor() with method set to "spearman".
Performance value as numeric(1).
Type: "regr"
Range: $[-1, 1]$
Minimize: FALSE
Required prediction: response
Rosset S, Perlich C, Zadrozny B (2006). “Ranking-based evaluation of regression models.” Knowledge and Information Systems, 12(3), 331–353. doi:10.1007/s10115-006-0037-3.
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), sse()
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
srho(truth, response)
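Since the measure only depends on ranks, any monotone transformation of the predictions leaves it unchanged. A small base R sketch using stats::cor() directly (the variable names are arbitrary):

```r
truth = 1:5
response = c(2, 4, 8, 16, 32)  # nonlinear, but strictly increasing in truth

# Spearman's rho: rank correlation between truth and response.
rho = cor(truth, response, method = "spearman")

# Equivalent: Pearson correlation of the ranks.
rho_ranks = cor(rank(truth), rank(response))

# Reversing the order of the predictions flips the sign.
rho_reversed = cor(truth, rev(response), method = "spearman")
c(rho, rho_ranks, rho_reversed)
```

This makes the measure suitable when only the ordering of the predictions matters, not their scale.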
Measure to compare true observed response with predicted response in regression tasks.
sse(truth, response, ...)
truth |
( |
response |
( |
... |
( |
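The Sum of Squared Errors is defined as
$$\sum_{i=1}^n \left( t_i - r_i \right)^2,$$
where $t_i$ is the true and $r_i$ the predicted value of observation $i$.
Performance value as numeric(1).
Type: "regr"
Range: $[0, \infty)$
Minimize: TRUE
Required prediction: response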
Other Regression Measures: ae(), ape(), bias(), ktau(), linex(), mae(), mape(), maxae(), maxse(), medae(), medse(), mse(), msle(), pbias(), pinball(), rae(), rmse(), rmsle(), rrse(), rse(), rsq(), sae(), se(), sle(), smape(), srho()
set.seed(1)
truth = 1:10
response = truth + rnorm(10)
sse(truth, response)
Measure to compare true observed labels with predicted labels in binary classification tasks.
tn(truth, response, positive, ...)
truth |
( |
response |
( |
positive |
( |
... |
( |
This measure counts the true negatives, i.e. the number of predictions correctly indicating a negative class label. This is sometimes also called a "correct rejection".
Performance value as numeric(1).
Type: "binary"
Range: $[0, \infty)$
Minimize: FALSE
Required prediction: response
https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram
Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tnr(), tp(), tpr()
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
tn(truth, response, positive = "a")
Measure to compare true observed labels with predicted labels in binary classification tasks.
tnr(truth, response, positive, na_value = NaN, ...)
specificity(truth, response, positive, na_value = NaN, ...)
truth |
( |
response |
( |
positive |
( |
na_value |
( |
... |
( |
The True Negative Rate is defined as
$$\frac{TN}{FP + TN}.$$
Also known as "specificity" or "selectivity".
This measure is undefined if FP + TN = 0.
Performance value as numeric(1).
Type: "binary"
Range: $[0, 1]$
Minimize: FALSE
Required prediction: response
https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram
Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tp(), tpr()
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
tnr(truth, response, positive = "a")
Measure to compare true observed labels with predicted labels in binary classification tasks.
tp(truth, response, positive, ...)
truth |
( |
response |
( |
positive |
( |
... |
( |
This measure counts the true positives, i.e. the number of predictions correctly indicating a positive class label. This is sometimes also called a "hit".
Performance value as numeric(1).
Type: "binary"
Range: $[0, \infty)$
Minimize: FALSE
Required prediction: response
https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram
Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tpr()
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
tp(truth, response, positive = "a")
Measure to compare true observed labels with predicted labels in binary classification tasks.
tpr(truth, response, positive, na_value = NaN, ...)
recall(truth, response, positive, na_value = NaN, ...)
sensitivity(truth, response, positive, na_value = NaN, ...)
truth |
( |
response |
( |
positive |
( |
na_value |
( |
... |
( |
The True Positive Rate is defined as
$$\frac{TP}{TP + FN}.$$
This is also known as "recall", "sensitivity", or "probability of detection".
This measure is undefined if TP + FN = 0.
Performance value as numeric(1).
Type: "binary"
Range: $[0, 1]$
Minimize: FALSE
Required prediction: response
https://en.wikipedia.org/wiki/Template:DiagnosticTesting_Diagram
Goutte C, Gaussier E (2005). “A Probabilistic Interpretation of Precision, Recall and F-Score, with Implication for Evaluation.” In Lecture Notes in Computer Science, 345–359. doi:10.1007/978-3-540-31865-1_25.
Other Binary Classification Measures: auc(), bbrier(), dor(), fbeta(), fdr(), fn(), fnr(), fomr(), fp(), fpr(), gmean(), gpr(), npv(), ppv(), prauc(), tn(), tnr(), tp()
set.seed(1)
lvls = c("a", "b")
truth = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
response = factor(sample(lvls, 10, replace = TRUE), levels = lvls)
tpr(truth, response, positive = "a")
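How the rate-type binary measures derive from the same four counts can be sketched in base R. `binary_rates()` is a hypothetical helper for illustration; it relies on R returning NaN for 0 / 0, which mirrors the default `na_value = NaN`.

```r
binary_rates = function(truth, response, positive) {
  is_pos = truth == positive
  pred_pos = response == positive
  tp = sum(is_pos & pred_pos)
  fn = sum(is_pos & !pred_pos)
  tn = sum(!is_pos & !pred_pos)
  fp = sum(!is_pos & pred_pos)
  c(
    tpr = tp / (tp + fn),  # recall / sensitivity
    tnr = tn / (fp + tn),  # specificity
    ppv = tp / (tp + fp)   # precision
  )
}

lvls = c("a", "b")
truth = factor(c("a", "a", "a", "a", "b", "b"), levels = lvls)
response = factor(c("a", "a", "a", "b", "a", "b"), levels = lvls)

# With "a" as positive class: TP = 3, FN = 1, FP = 1, TN = 1.
rates = binary_rates(truth, response, positive = "a")
rates
```

Swapping the positive class exchanges the roles of the true positive rate and the true negative rate.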
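Measure to compare true observed labels with predicted labels in multiclass classification tasks.
Note that this is an unaggregated measure, returning the losses per observation.
Calculates the per-observation 0/1 (zero-one) loss as
$$\mathbf{1} \left( t_i \neq r_i \right).$$
The 1/0 (one-zero) loss is equal to 1 - zero-one and calculated as
$$\mathbf{1} \left( t_i = r_i \right),$$
where $t_i$ is the true and $r_i$ the predicted label of observation $i$.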
zero_one(truth, response, ...)
one_zero(truth, response, ...)
truth |
( |
response |
( |
... |
( |
Performance value as numeric(length(truth)).
Type: "classif"
Range (per observation): $[0, 1]$
Minimize (per observation): TRUE
Required prediction: response
Other Classification Measures: acc(), bacc(), ce(), logloss(), mauc_aunu(), mbrier(), mcc()