Title: | Generic Functions for Cross Validation |
---|---|
Description: | Contains generic functions for performing cross validation and for computing diagnostic errors. |
Authors: | Korbinian Strimmer. |
Maintainer: | Korbinian Strimmer <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.0.5 |
Built: | 2024-11-15 06:37:56 UTC |
Source: | CRAN |
The "crossval" package implements generic functions for performing cross validation and for computing diagnostic errors.
Korbinian Strimmer (https://strimmerlab.github.io/)
Website: https://cran.r-project.org/package=crossval
The package provides the functions crossval, confusionMatrix, and diagnosticErrors.
confusionMatrix computes the confusion matrix, i.e. it counts the number of false positives (FP), true positives (TP), true negatives (TN), and false negatives (FN). Despite its name, the function returns a vector rather than an actual matrix, for easier use with the crossval function.
confusionMatrix(actual, predicted, negative="control")
actual: a vector containing the actual correct labels for each sample (e.g. "cancer" or "control").
predicted: a vector containing the predicted labels.
negative: the label of a negative "null" sample (default: "control").
confusionMatrix returns a vector of length 4 containing the counts for FP, TP, TN, and FN.
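As a quick sanity check, the four counts agree with an ordinary 2x2 cross-tabulation of the labels. A minimal sketch (the label vectors are made up, mirroring the example further below):

library("crossval")

# made-up labels, with "control" as the negative class
actual    = c("cancer", "cancer", "control", "control", "cancer", "control", "control")
predicted = c("cancer", "control", "control", "control", "cancer", "control", "cancer")

confusionMatrix(actual, predicted, negative="control")
# FP=1, TP=2, TN=3, FN=1

# the same counts appear as the cells of a standard cross-tabulation
table(actual, predicted)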
Korbinian Strimmer (https://strimmerlab.github.io).
# load crossval library
library("crossval")

# true labels
a = c("cancer", "cancer", "control", "control", "cancer", "control", "control")

# predicted labels
p = c("cancer", "control", "control", "control", "cancer", "control", "cancer")

# confusion matrix (a vector)
cm = confusionMatrix(a, p, negative="control")
cm
# FP TP TN FN
#  1  2  3  1
# attr(,"negative")
# [1] "control"

# corresponding accuracy, sensitivity etc.
diagnosticErrors(cm)
#       acc      sens      spec       ppv       npv       lor
# 0.7142857 0.6666667 0.7500000 0.6666667 0.7500000 1.7917595
# attr(,"negative")
# [1] "control"
crossval performs K-fold cross validation with B repetitions. If Y is a factor then balanced sampling is used (i.e. in each fold each category is represented in appropriate proportions).
crossval(predfun, X, Y, K=10, B=20, verbose=TRUE, ...)
predfun: prediction function (see details).
X: matrix of predictors (columns correspond to variables).
Y: univariate response variable.
K: number of folds.
B: number of repetitions.
verbose: if TRUE then progress messages are printed out during cross validation.
...: optional arguments passed on to predfun.
The argument predfun must be a function of the form predfun(Xtrain, Ytrain, Xtest, Ytest, ...); a minimal example of such a function is sketched below.
crossval returns a list with three entries:
stat.cv: the statistic returned by predfun for each cross validation run.
stat: the statistic returned by predfun averaged over all cross validation runs.
stat.se: the corresponding standard error.
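A minimal end-to-end sketch, assuming nothing beyond the interface described above (the prediction function, the toy data, and the chosen statistic are illustrative only, not part of the package):

library("crossval")

# illustrative prediction function with the required signature;
# it returns the test-set mean squared error of an intercept-only model
predfun.mean = function(Xtrain, Ytrain, Xtest, Ytest)
{
  yhat = mean(Ytrain)          # "fit" on the training data
  mean( (Ytest - yhat)^2 )     # evaluate on the test data
}

# simulated toy data
set.seed(1)
X = matrix(rnorm(100*3), ncol=3)
Y = rnorm(100)

cv = crossval(predfun.mean, X, Y, K=5, B=2, verbose=FALSE)
cv$stat      # statistic averaged over all 5*2 = 10 runs
cv$stat.se   # corresponding standard error
cv$stat.cv   # statistic from each individual run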
Korbinian Strimmer (https://strimmerlab.github.io).
# load "crossval" package library("crossval") # classification examples # set up lda prediction function predfun.lda = function(train.x, train.y, test.x, test.y, negative) { require("MASS") # for lda function lda.fit = lda(train.x, grouping=train.y) ynew = predict(lda.fit, test.x)$class # count TP, FP etc. out = confusionMatrix(test.y, ynew, negative=negative) return( out ) } # Student's Sleep Data data(sleep) X = as.matrix(sleep[,1, drop=FALSE]) # increase in hours of sleep Y = sleep[,2] # drug given plot(X ~ Y) levels(Y) # "1" "2" dim(X) # 20 1 set.seed(12345) cv.out = crossval(predfun.lda, X, Y, K=5, B=20, negative="1") cv.out$stat diagnosticErrors(cv.out$stat) # linear regression example data("attitude") y = attitude[,1] # rating variable x = attitude[,-1] # date frame with the remaining variables is.factor(y) # FALSE summary( lm(y ~ . , data=x) ) # set up lm prediction function predfun.lm = function(train.x, train.y, test.x, test.y) { lm.fit = lm(train.y ~ . , data=train.x) ynew = predict(lm.fit, test.x ) # compute squared error risk (MSE) out = mean( (ynew - test.y)^2 ) return( out ) } # prediction MSE using all variables set.seed(12345) cv.out = crossval(predfun.lm, x, y, K=5, B=20) c(cv.out$stat, cv.out$stat.se) # and only two variables cv.out = crossval(predfun.lm, x[,c(1,3)], y, K=5, B=20) c(cv.out$stat, cv.out$stat.se) # for more examples (e.g. using cross validation in a regression or classification context) # see the R packages "sda", "care", or "binda".
# load "crossval" package library("crossval") # classification examples # set up lda prediction function predfun.lda = function(train.x, train.y, test.x, test.y, negative) { require("MASS") # for lda function lda.fit = lda(train.x, grouping=train.y) ynew = predict(lda.fit, test.x)$class # count TP, FP etc. out = confusionMatrix(test.y, ynew, negative=negative) return( out ) } # Student's Sleep Data data(sleep) X = as.matrix(sleep[,1, drop=FALSE]) # increase in hours of sleep Y = sleep[,2] # drug given plot(X ~ Y) levels(Y) # "1" "2" dim(X) # 20 1 set.seed(12345) cv.out = crossval(predfun.lda, X, Y, K=5, B=20, negative="1") cv.out$stat diagnosticErrors(cv.out$stat) # linear regression example data("attitude") y = attitude[,1] # rating variable x = attitude[,-1] # date frame with the remaining variables is.factor(y) # FALSE summary( lm(y ~ . , data=x) ) # set up lm prediction function predfun.lm = function(train.x, train.y, test.x, test.y) { lm.fit = lm(train.y ~ . , data=train.x) ynew = predict(lm.fit, test.x ) # compute squared error risk (MSE) out = mean( (ynew - test.y)^2 ) return( out ) } # prediction MSE using all variables set.seed(12345) cv.out = crossval(predfun.lm, x, y, K=5, B=20) c(cv.out$stat, cv.out$stat.se) # and only two variables cv.out = crossval(predfun.lm, x[,c(1,3)], y, K=5, B=20) c(cv.out$stat, cv.out$stat.se) # for more examples (e.g. using cross validation in a regression or classification context) # see the R packages "sda", "care", or "binda".
diagnosticErrors computes various diagnostic errors useful for evaluating the performance of a diagnostic test or a classifier: accuracy (acc), sensitivity (sens), specificity (spec), positive predictive value (ppv), negative predictive value (npv), and log-odds ratio (lor).
diagnosticErrors(cm)
cm: a vector containing the counts of true positives, false positives etc., as computed by confusionMatrix.
The diagnostic errors are computed as follows:
acc = (TP+TN)/(FP+TN+TP+FN)
sens = TP/(TP+FN)
spec = TN/(FP+TN)
ppv = TP/(FP+TP)
npv = TN/(TN+FN)
lor = log(TP*TN/(FN*FP))
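Plugging the counts from the example below (FP=1, TP=2, TN=3, FN=1) into these formulas reproduces the values reported by diagnosticErrors:

# counts from the confusion matrix of the example below
FP = 1; TP = 2; TN = 3; FN = 1

(TP + TN) / (FP + TN + TP + FN)   # acc  = 5/7    = 0.7142857
TP / (TP + FN)                    # sens = 2/3    = 0.6666667
TN / (FP + TN)                    # spec = 3/4    = 0.75
TP / (FP + TP)                    # ppv  = 2/3    = 0.6666667
TN / (TN + FN)                    # npv  = 3/4    = 0.75
log(TP * TN / (FN * FP))          # lor  = log(6) = 1.7917595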
diagnosticErrors returns a vector containing the various diagnostic errors.
Korbinian Strimmer (https://strimmerlab.github.io).
# load crossval library
library("crossval")

# true labels
a = c("cancer", "cancer", "control", "control", "cancer", "control", "control")

# predicted labels
p = c("cancer", "control", "control", "control", "cancer", "control", "cancer")

# confusion matrix (a vector)
cm = confusionMatrix(a, p, negative="control")
cm
# FP TP TN FN
#  1  2  3  1
# attr(,"negative")
# [1] "control"

# corresponding accuracy, sensitivity etc.
diagnosticErrors(cm)
#       acc      sens      spec       ppv       npv       lor
# 0.7142857 0.6666667 0.7500000 0.6666667 0.7500000 1.7917595
# attr(,"negative")
# [1] "control"