All kinds of predictor evaluations are performed using this function.
performance(prediction.obj, measure, x.measure = "cutoff", ...)
prediction.obj 
An object of class 
measure 
Performance measure to use for the evaluation. A complete list
of the performance measures that are available for 
x.measure 
A second performance measure. If different from the default,
a twodimensional curve, with 
... 
Optional arguments (specific to individual performance measures). 
Here is the list of available performance measures. Let Y and
$\hat{Y}$
be random variables representing the class and the prediction for
a randomly drawn sample, respectively. We denote by
$\oplus$
and $\ominus$
the positive and
negative class, respectively. Further, we use the following
abbreviations for empirical quantities: P (\# positive
samples), N (\# negative samples), TP (\# true positives), TN (\# true
negatives), FP (\# false positives), FN (\# false negatives).
acc
:Accuracy. $P(\hat{Y}=Y)$
. Estimated
as: $\frac{TP+TN}{P+N}$
.
err
:Error rate. $P(\hat{Y}\ne Y)$
. Estimated as: $\frac{FP+FN}{P+N}$
.
fpr
:False positive rate. $P(\hat{Y}=\oplus  Y =
\ominus)$
. Estimated as:
$\frac{FP}{N}$
.
fall
:Fallout. Same as fpr
.
tpr
:True positive
rate. $P(\hat{Y}=\oplusY=\oplus)$
. Estimated
as: $\frac{TP}{P}$
.
rec
:Recall. Same as tpr
.
sens
:Sensitivity. Same as tpr
.
fnr
:False negative
rate. $P(\hat{Y}=\ominusY=\oplus)$
. Estimated as: $\frac{FN}{P}$
.
miss
:Miss. Same as fnr
.
tnr
:True negative rate. $P(\hat{Y} =
\ominusY=\ominus)$
.
spec
:Specificity. Same as tnr
.
ppv
:Positive predictive
value. $P(Y=\oplus\hat{Y}=\oplus)$
. Estimated as: $\frac{TP}{TP+FP}$
.
prec
:Precision. Same as ppv
.
npv
:Negative predictive
value. $P(Y=\ominus\hat{Y}=\ominus)$
. Estimated as: $\frac{TN}{TN+FN}$
.
pcfall
:Predictionconditioned
fallout. $P(Y=\ominus\hat{Y}=\oplus)$
. Estimated as: $\frac{FP}{TP+FP}$
.
pcmiss
:Predictionconditioned
miss. $P(Y=\oplus\hat{Y}=\ominus)$
. Estimated as: $\frac{FN}{TN+FN}$
.
rpp
:Rate of positive predictions. $P( \hat{Y} =
\oplus)$
. Estimated as: (TP+FP)/(TP+FP+TN+FN).
rnp
:Rate of negative predictions. $P( \hat{Y} =
\ominus)$
. Estimated as: (TN+FN)/(TP+FP+TN+FN).
phi
:Phi correlation coefficient. $\frac{TP \cdot
TN  FP \cdot FN}{\sqrt{ (TP+FN) \cdot (TN+FP) \cdot (TP+FP)
\cdot (TN+FN)}}$
. Yields a
number between 1 and 1, with 1 indicating a perfect
prediction, 0 indicating a random prediction. Values below 0
indicate a worse than random prediction.
mat
:Matthews correlation coefficient. Same as phi
.
mi
:Mutual information. $I(\hat{Y},Y) := H(Y) 
H(Y\hat{Y})$
, where H is the
(conditional) entropy. Entropies are estimated naively (no bias
correction).
chisq
:Chi square test statistic. ?chisq.test
for details. Note that R might raise a warning if the sample size
is too small.
odds
:Odds ratio. $\frac{TP \cdot TN}{FN \cdot
FP}$
. Note that odds ratio produces
Inf or NA values for all cutoffs corresponding to FN=0 or
FP=0. This can substantially decrease the plotted cutoff region.
lift
:Lift
value. $\frac{P(\hat{Y}=\oplusY=\oplus)}{P(\hat{Y}=\oplus)}$
.
f
:Precisionrecall F measure (van Rijsbergen, 1979). Weighted
harmonic mean of precision (P) and recall (R). $F =
\frac{1}{\alpha \frac{1}{P} + (1\alpha)\frac{1}{R}}$
. If
$\alpha=\frac{1}{2}$
, the mean is balanced. A
frequent equivalent formulation is
$F = \frac{(\beta^2+1) \cdot P \cdot R}{R + \beta^2 \cdot
P}$
. In this formulation, the
mean is balanced if $\beta=1$
. Currently, ROCR only accepts
the alpha version as input (e.g. $\alpha=0.5$
). If no
value for alpha is given, the mean will be balanced by default.
rch
:ROC convex hull. A ROC (=tpr
vs fpr
) curve
with concavities (which represent suboptimal choices of cutoff) removed
(Fawcett 2001). Since the result is already a parametric performance
curve, it cannot be used in combination with other measures.
auc
:Area under the ROC curve. This is equal to the value of the
WilcoxonMannWhitney test statistic and also the probability that the
classifier will score are randomly drawn positive sample higher than a
randomly drawn negative sample. Since the output of
auc
is cutoffindependent, this
measure cannot be combined with other measures into a parametric
curve. The partial area under the ROC curve up to a given false
positive rate can be calculated by passing the optional parameter
fpr.stop=0.5
(or any other value between 0 and 1) to
performance
.
aucpr
:Area under the Precision/Recall curve. Since the output
of aucpr
is cutoffindependent, this measure cannot be combined
with other measures into a parametric curve.
prbe
:Precisionrecall breakeven point. The cutoff(s) where
precision and recall are equal. At this point, positive and negative
predictions are made at the same rate as their prevalence in the
data. Since the output of
prbe
is just a cutoffindependent scalar, this
measure cannot be combined with other measures into a parametric curve.
cal
:Calibration error. The calibration error is the
absolute difference between predicted confidence and actual reliability. This
error is estimated at all cutoffs by sliding a window across the
range of possible cutoffs. The default window size of 100 can be
adjusted by passing the optional parameter window.size=200
to performance
. E.g., if for several
positive samples the output of the classifier is around 0.75, you might
expect from a wellcalibrated classifier that the fraction of them
which is correctly predicted as positive is also around 0.75. In a
wellcalibrated classifier, the probabilistic confidence estimates
are realistic. Only for use with
probabilistic output (i.e. scores between 0 and 1).
mxe
:Mean crossentropy. Only for use with
probabilistic output. $MXE :=\frac{1}{P+N}( \sum_{y_i=\oplus}
ln(\hat{y}_i) + \sum_{y_i=\ominus} ln(1\hat{y}_i))$
. Since the output of
mxe
is just a cutoffindependent scalar, this
measure cannot be combined with other measures into a parametric curve.
rmse
:Rootmeansquared error. Only for use with
numerical class labels. $RMSE:=\sqrt{\frac{1}{P+N}\sum_i (y_i
 \hat{y}_i)^2}$
. Since the output of
rmse
is just a cutoffindependent scalar, this
measure cannot be combined with other measures into a parametric curve.
sar
:Score combinining performance measures of different characteristics, in the attempt of creating a more "robust" measure (cf. Caruana R., ROCAI2004): SAR = 1/3 * ( Accuracy + Area under the ROC curve + Root meansquared error ).
ecost
:Expected cost. For details on cost curves,
cf. Drummond&Holte 2000,2004. ecost
has an obligatory x
axis, the socalled 'probabilitycost function'; thus it cannot be
combined with other measures. While using ecost
one is
interested in the lower envelope of a set of lines, it might be
instructive to plot the whole set of lines in addition to the lower
envelope. An example is given in demo(ROCR)
.
cost
:Cost of a classifier when
classconditional misclassification costs are explicitly given.
Accepts the optional parameters cost.fp
and
cost.fn
, by which the costs for false positives and
negatives can be adjusted, respectively. By default, both are set
to 1.
An S4 object of class performance
.
Here is how to call performance()
to create some standard
evaluation plots:
measure="tpr", x.measure="fpr".
measure="prec", x.measure="rec".
measure="sens", x.measure="spec".
measure="lift", x.measure="rpp".
Tobias Sing tobias.sing@gmail.com, Oliver Sander osander@gmail.com
A detailed list of references can be found on the ROCR homepage at http://rocr.bioinf.mpisb.mpg.de.
prediction
,
predictionclass
,
performanceclass
,
plot.performance
# computing a simple ROC curve (xaxis: fpr, yaxis: tpr)
library(ROCR)
data(ROCR.simple)
pred < prediction( ROCR.simple$predictions, ROCR.simple$labels)
pred
perf < performance(pred,"tpr","fpr")
perf
plot(perf)
# precision/recall curve (xaxis: recall, yaxis: precision)
perf < performance(pred, "prec", "rec")
perf
plot(perf)
# sensitivity/specificity curve (xaxis: specificity,
# yaxis: sensitivity)
perf < performance(pred, "sens", "spec")
perf
plot(perf)
performance
Object to capture the result of a performance evaluation, optionally collecting evaluations from several crossvalidation or bootstrapping runs.
A performance
object can capture information from four
different evaluation scenarios:
The behaviour of a cutoffdependent performance measure across
the range of all cutoffs (e.g. performance( predObj, 'acc' )
). Here,
x.values
contains the cutoffs, y.values
the
corresponding values of the performance measure, and
alpha.values
is empty.
The tradeoff between two performance measures across the
range of all cutoffs (e.g. performance( predObj,
'tpr', 'fpr' )
). In this case, the cutoffs are stored in
alpha.values
, while x.values
and y.values
contain the corresponding values of the two performance measures.
A performance measure that comes along with an obligatory
second axis (e.g. performance( predObj, 'ecost' )
). Here, the measure values are
stored in y.values
, while the corresponding values of the
obligatory axis are stored in x.values
, and alpha.values
is empty.
A performance measure whose value is just a scalar
(e.g. performance( predObj, 'auc' )
). The value is then stored in
y.values
, while x.values
and alpha.values
are
empty.
x.name
Performance measure used for the x axis.
y.name
Performance measure used for the y axis.
alpha.name
Name of the unit that is used to create the parametrized
curve. Currently, curves can only be parametrized by cutoff, so
alpha.name
is either none
or cutoff
.
x.values
A list in which each entry contains the x values of the curve
of this particular crossvalidation run. x.values[[i]]
,
y.values[[i]]
, and alpha.values[[i]]
correspond to each
other.
y.values
A list in which each entry contains the y values of the curve of this particular crossvalidation run.
alpha.values
A list in which each entry contains the cutoff values of the curve of this particular crossvalidation run.
Objects can be created by using the performance
function.
Tobias Sing tobias.sing@gmail.com, Oliver Sander osander@gmail.com
A detailed list of references can be found on the ROCR homepage at http://rocr.bioinf.mpisb.mpg.de.
prediction
performance
,
predictionclass
,
plot.performance
This is the method to plot all objects of class performance.
## S4 method for signature 'performance,missing'
plot(
x,
y,
...,
avg = "none",
spread.estimate = "none",
spread.scale = 1,
show.spread.at = c(),
colorize = FALSE,
colorize.palette = rev(rainbow(256, start = 0, end = 4/6)),
colorkey = colorize,
colorkey.relwidth = 0.25,
colorkey.pos = "right",
print.cutoffs.at = c(),
cutoff.label.function = function(x) { round(x, 2) },
downsampling = 0,
add = FALSE
)
## S3 method for class 'performance'
plot(...)
x 
an object of class 
y 
not used 
... 
Optional graphical parameters to adjust different components of
the performance plot. Parameters are directed to their target component by
prefixing them with the name of the component ( 
avg 
If the performance object describes several curves (from
crossvalidation runs or bootstrap evaluations of one particular method),
the curves from each of the runs can be averaged. Allowed values are

spread.estimate 
When curve averaging is enabled, the variation around
the average curve can be visualized as standard error bars
( 
spread.scale 
For 
show.spread.at 
For vertical averaging, this vector determines the x positions for which the spread estimates should be visualized. In contrast, for horizontal and threshold averaging, the y positions and cutoffs are determined, respectively. By default, spread estimates are shown at 11 equally spaced positions. 
colorize 
This logical determines whether the curve(s) should be colorized according to cutoff. 
colorize.palette 
If curve colorizing is enabled, this determines the color palette onto which the cutoff range is mapped. 
colorkey 
If true, a color key is drawn into the 4% border
region (default of 
colorkey.relwidth 
Scalar between 0 and 1 that determines the fraction of the 4% border region that is occupied by the colorkey. 
colorkey.pos 
Determines if the colorkey is drawn vertically at
the 
print.cutoffs.at 
This vector specifies the cutoffs which should be printed as text along the curve at the corresponding curve positions. 
cutoff.label.function 
By default, cutoff annotations along the curve
or at the color key are rounded to two decimal places before printing.
Using a custom 
downsampling 
ROCR can efficiently compute most performance measures even for data sets with millions of elements. However, plotting of large data sets can be slow and lead to PS/PDF documents of considerable size. In that case, performance curves that are indistinguishable from the original can be obtained by using only a fraction of the computed performance values. Values for downsampling between 0 and 1 indicate the fraction of the original data set size to which the performance object should be downsampled, integers above 1 are interpreted as the actual number of performance values to which the curve(s) should be downsampled. 
add 
If 
Tobias Sing tobias.sing@gmail.com, Oliver Sander osander@gmail.com
A detailed list of references can be found on the ROCR homepage at http://rocr.bioinf.mpisb.mpg.de.
prediction
,
performance
,
predictionclass
,
performanceclass
# plotting a ROC curve:
library(ROCR)
data(ROCR.simple)
pred < prediction( ROCR.simple$predictions, ROCR.simple$labels )
pred
perf < performance( pred, "tpr", "fpr" )
perf
plot( perf )
# To entertain your children, make your plots nicer
# using ROCR's flexible parameter passing mechanisms
# (much cheaper than a finger painting set)
par(bg="lightblue", mai=c(1.2,1.5,1,1))
plot(perf, main="ROCR fingerpainting toolkit", colorize=TRUE,
xlab="Mary's axis", ylab="", box.lty=7, box.lwd=5,
box.col="gold", lwd=17, colorkey.relwidth=0.5, xaxis.cex.axis=2,
xaxis.col='blue', xaxis.col.axis="blue", yaxis.col='green', yaxis.cex.axis=2,
yaxis.at=c(0,0.5,0.8,0.85,0.9,1), yaxis.las=1, xaxis.lwd=2, yaxis.lwd=3,
yaxis.col.axis="orange", cex.lab=2, cex.main=2)
Every classifier evaluation using ROCR starts with creating a
prediction
object. This function is used to transform the input data
(which can be in vector, matrix, data frame, or list form) into a
standardized format.
prediction(predictions, labels, label.ordering = NULL)
predictions 
A vector, matrix, list, or data frame containing the predictions. 
labels 
A vector, matrix, list, or data frame containing the true class
labels. Must have the same dimensions as 
label.ordering 
The default ordering (cf.details) of the classes can be changed by supplying a vector containing the negative and the positive class label. 
predictions
and labels
can simply be vectors of the same
length. However, in the case of crossvalidation data, different
crossvalidation runs can be provided as the *columns* of a matrix or
data frame, or as the entries of a list. In the case of a matrix or
data frame, all crossvalidation runs must have the same length, whereas
in the case of a list, the lengths can vary across the crossvalidation
runs. Internally, as described in section 'Value', all of these input
formats are converted to list representation.
Since scoring classifiers give relative tendencies towards a negative
(low scores) or positive (high scores) class, it has to be declared
which class label denotes the negative, and which the positive class.
Ideally, labels should be supplied as ordered factor(s), the lower
level corresponding to the negative class, the upper level to the
positive class. If the labels are factors (unordered), numeric,
logical or characters, ordering of the labels is inferred from
R's builtin <
relation (e.g. 0 < 1, 1 < 1, 'a' < 'b',
FALSE < TRUE). Use label.ordering
to override this default
ordering. Please note that the ordering can be localedependent
e.g. for character labels '1' and '1'.
Currently, ROCR supports only binary classification (extensions toward multiclass classification are scheduled for the next release, however). If there are more than two distinct label symbols, execution stops with an error message. If all predictions use the same two symbols that are used for the labels, categorical predictions are assumed. If there are more than two predicted values, but all numeric, continuous predictions are assumed (i.e. a scoring classifier). Otherwise, if more than two symbols occur in the predictions, and not all of them are numeric, execution stops with an error message.
An S4 object of class prediction
.
Tobias Sing tobias.sing@gmail.com, Oliver Sander osander@gmail.com
A detailed list of references can be found on the ROCR homepage at http://rocr.bioinf.mpisb.mpg.de.
predictionclass
,
performance
,
performanceclass
,
plot.performance
# create a simple prediction object
library(ROCR)
data(ROCR.simple)
pred < prediction(ROCR.simple$predictions,ROCR.simple$labels)
pred
prediction
Object to encapsulate numerical predictions together with the corresponding true class labels, optionally collecting predictions and labels for several crossvalidation or bootstrapping runs.
predictions
A list, in which each element is a vector of predictions (the list has length > 1 for xvalidation data.
labels
Analogously, a list in which each element is a vector of true class labels.
cutoffs
A list in which each element is a vector of all necessary cutoffs. Each cutoff vector consists of the predicted scores (duplicates removed), in descending order.
fp
A list in which each element is a vector of the number (not the rate!) of false positives induced by the cutoffs given in the corresponding 'cutoffs' list entry.
tp
As fp, but for true positives.
tn
As fp, but for true negatives.
fn
As fp, but for false negatives.
n.pos
A list in which each element contains the number of positive samples in the given xvalidation run.
n.neg
As n.pos, but for negative samples.
n.pos.pred
A list in which each element is a vector of the number of samples predicted as positive at the cutoffs given in the corresponding 'cutoffs' entry.
n.neg.pred
As n.pos.pred, but for negatively predicted samples.
Objects can be created by using the prediction
function.
Every prediction
object contains information about the 2x2
contingency table consisting of tp,tn,fp, and fn, along with the
marginal sums n.pos,n.neg,n.pos.pred,n.neg.pred, because these form
the basis for many derived performance measures.
Tobias Sing tobias.sing@gmail.com, Oliver Sander osander@gmail.com
A detailed list of references can be found on the ROCR homepage at http://rocr.bioinf.mpisb.mpg.de.
prediction
,
performance
,
performanceclass
,
plot.performance
Linear support vector machines (libsvm) and neural networks (R package nnet) were applied to predict usage of the coreceptors CCR5 and CXCR4 based on sequence data of the third variable loop of the HIV envelope protein.
data(ROCR.hiv)
A list consisting of the SVM (ROCR.hiv$hiv.svm
) and NN
(ROCR.hiv$hiv.nn
) classification data. Each of those is in turn a list
consisting of the two elements $predictions
and $labels
(10
element list representing crossvalidation data).
Sing, T. & Beerenwinkel, N. & Lengauer, T. "Learning mixtures of localized rules by maximizing the area under the ROC curve". 1st International Workshop on ROC Analysis in AI, 8996, 2004.
library(ROCR)
data(ROCR.hiv)
attach(ROCR.hiv)
pred.svm < prediction(hiv.svm$predictions, hiv.svm$labels)
pred.svm
perf.svm < performance(pred.svm, 'tpr', 'fpr')
perf.svm
pred.nn < prediction(hiv.nn$predictions, hiv.svm$labels)
pred.nn
perf.nn < performance(pred.nn, 'tpr', 'fpr')
perf.nn
plot(perf.svm, lty=3, col="red",main="SVMs and NNs for prediction of
HIV1 coreceptor usage")
plot(perf.nn, lty=3, col="blue",add=TRUE)
plot(perf.svm, avg="vertical", lwd=3, col="red",
spread.estimate="stderror",plotCI.lwd=2,add=TRUE)
plot(perf.nn, avg="vertical", lwd=3, col="blue",
spread.estimate="stderror",plotCI.lwd=2,add=TRUE)
legend(0.6,0.6,c('SVM','NN'),col=c('red','blue'),lwd=3)
A mock data set containing a simple set of predictions and corresponding class labels.
data(ROCR.simple)
A two element list. The first element, ROCR.simple$predictions
, is a
vector of numerical predictions. The second element,
ROCR.simple$labels
, is a vector of corresponding class labels.
# plot a ROC curve for a single prediction run
# and color the curve according to cutoff.
library(ROCR)
data(ROCR.simple)
pred < prediction(ROCR.simple$predictions, ROCR.simple$labels)
pred
perf < performance(pred,"tpr","fpr")
perf
plot(perf,colorize=TRUE)
A mock data set containing 10 sets of predictions and corresponding labels as would be obtained from 10fold crossvalidation.
data(ROCR.xval)
A two element list. The first element, ROCR.xval$predictions
, is
itself a 10 element list. Each of these 10 elements is a vector of numerical
predictions for each crossvalidation run. Likewise, the second list entry,
ROCR.xval$labels
is a 10 element list in which each element is a
vector of true class labels corresponding to the predictions.
# plot ROC curves for several crossvalidation runs (dotted
# in grey), overlaid by the vertical average curve and boxplots
# showing the vertical spread around the average.
library(ROCR)
data(ROCR.xval)
pred < prediction(ROCR.xval$predictions, ROCR.xval$labels)
pred
perf < performance(pred,"tpr","fpr")
perf
plot(perf,col="grey82",lty=3)
plot(perf,lwd=3,avg="vertical",spread.estimate="boxplot",add=TRUE)