Title: | Mapping ML Scores to Calibrated Predictions |
---|---|
Description: | Transforms your uncalibrated Machine Learning scores to well-calibrated prediction estimates that can be interpreted as probability estimates. The implemented BBQ (Bayes Binning in Quantiles) model is taken from Naeini (2015, ISBN:0-262-51129-0). Please cite this paper: Schwarz J and Heider D, Bioinformatics 2019, 35(14):2458-2465. |
Authors: | Johanna Schwarz, Dominik Heider |
Maintainer: | Dominik Heider <[email protected]> |
License: | LGPL-3 |
Version: | 0.1.2 |
Built: | 2024-11-20 06:39:49 UTC |
Source: | CRAN |
trains and evaluates the BBQ calibration model using folds-fold Cross-Validation (CV). The predicted values are partitioned into n subsets. A BBQ model is constructed on (n-1) subsets; the remaining subset is used for testing the model. All test set predictions are merged and used to compute error metrics for the model.
BBQ_CV(actual, predicted, method_for_prediction = 0, n_folds = 10, seed, input)
actual |
vector of observed class labels (0/1) |
predicted |
vector of uncalibrated predictions |
method_for_prediction |
0=selection, 1=averaging, Default: 0 |
n_folds |
number of folds in the cross-validation, Default: 10 |
seed |
random seed to alternate the split of data set partitions |
input |
specify if the input was scaled or transformed, scaled=1, transformed=2 |
list object containing the following components:
error |
list object that summarizes discrimination and calibration errors obtained during the CV |
pred_idx |
which BBQ prediction method was used during CV, 0=selection, 1=averaging |
type |
"BBQ" |
probs_CV |
vector of calibrated predictions that was used during the CV |
actual_CV |
respective vector of true values (0 or 1) that was used during the CV |
## Loading dataset in environment
data(example)
actual <- example$actual
predicted <- example$predicted
BBQ_model <- CalibratR:::BBQ_CV(actual, predicted, method_for_prediction=0, n_folds=4, 123, 1)
p-values from stats::binom.test for each bin; if a bin is empty, a p-value of 2 is returned
binom_for_histogram(n_x)
n_x |
numeric vector of two integers. The first is the number of cases in the bin; the second is the number of instances in the bin |
p-value from stats::binom.test method
This method builds a BBQ calibration model using the training set provided.
build_BBQ(actual, predicted)
actual |
vector of observed class labels (0/1) |
predicted |
vector of uncalibrated predictions |
Based on the paper (and MATLAB code) "Obtaining Well Calibrated Probabilities Using Bayesian Binning" by Naeini, Cooper and Hauskrecht: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4410090/
returns the BBQ model, which includes models for all evaluated binning schemes; the pruned model contains only a selection of BBQ models with the best Bayesian score
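A minimal usage sketch on the bundled example data. build_BBQ may be an internal function (hence the ':::' call, as in the BBQ_CV example above); depending on how the training scores were preprocessed, new instances may need the same scaling or transformation (see format_values).
## Build a BBQ model on the training scores
data(example)
BBQ_model <- CalibratR:::build_BBQ(example$actual, example$predicted)
## Calibrate new scores with the selection method (option = 0; see predict_BBQ below)
calibrated <- CalibratR:::predict_BBQ(BBQ_model, example$test_set, option = 0)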
This method builds a GUESS calibration model using the training set provided.
build_GUESS(actual, predicted)
actual |
vector of observed class labels (0/1) |
predicted |
vector of uncalibrated predictions |
returns the trained GUESS model that can be used to calibrate a test set using the predict_GUESS method
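A minimal usage sketch on the bundled example data; build_GUESS may be an internal function (hence ':::'), and new scores may need the same preprocessing as the training scores (see format_values).
## Build a GUESS model on the training scores
data(example)
GUESS_model <- CalibratR:::build_GUESS(example$actual, example$predicted)
## Calibrate new scores; see predict_GUESS below for the density_evaluation options
calibrated <- CalibratR:::predict_GUESS(GUESS_model, example$test_set, density_evaluation = 2)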
calculates the estimated probability per bin; takes the predicted and observed scores as numeric vectors and builds a histogram binning model which can be used to calibrate uncalibrated predictions using the predict_hist_binning method
build_hist_binning(actual, predicted, bins = NULL)
actual |
vector of observed class labels (0/1) |
predicted |
vector of uncalibrated predictions |
bins |
number of bins that should be used to build the binning model, Default: decide_on_break estimates optimal number of bins |
if the training set is smaller than the threshold (15 bins * 5 elements = 75), the number of bins is decreased
returns the trained histogram binning model that can be used to calibrate a test set using the predict_hist_binning method
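A minimal usage sketch on the bundled example data; build_hist_binning may be an internal function (hence ':::'). bins is left at its default so the number of bins is estimated automatically.
## Build a histogram binning model on the training scores
data(example)
hist_model <- CalibratR:::build_hist_binning(example$actual, example$predicted)
## Calibrate new scores with the trained histogram binning model
calibrated <- CalibratR:::predict_hist_binning(hist_model, example$test_set)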
Builds the selected calibration models on the supplied training values actual and predicted and returns them to the user. New test instances can be calibrated using the predict_calibratR function.
Returns cross-validated calibration and discrimination error values for the models if evaluate_CV_error is set to TRUE. Repeated cross-validation can be time-consuming.
calibrate(actual, predicted, model_idx = c(1, 2, 3, 4, 5), evaluate_no_CV_error = TRUE, evaluate_CV_error = TRUE, folds = 10, n_seeds = 30, nCores = 4)
actual |
vector of observed class labels (0/1) |
predicted |
vector of uncalibrated predictions |
model_idx |
which calibration models should be implemented, 1=hist_scaled, 2=hist_transformed, 3=BBQ_scaled, 4=BBQ_transformed, 5=GUESS, Default: c(1, 2, 3, 4, 5) |
evaluate_no_CV_error |
computes internal errors for calibration models that were trained on all available data, Default: TRUE |
evaluate_CV_error |
computes the cross-validation error, Default: TRUE |
folds |
number of folds in the cross-validation of the calibration model. If |
n_seeds |
number of times the CV is repeated with a different random partitioning of the data, Default: 30 |
nCores |
number of cores used for the parallelised execution of the CV repetitions, Default: 4 |
parallelised execution of the random data set splits for the cross-validation procedure over n_seeds
A list object with the following components:
calibration_models |
a list of all trained calibration models, which can be used in the predict_calibratR method |
summary_CV |
a list containing information on the CV errors of the implemented models |
summary_no_CV |
a list containing information on the internal errors of the implemented models |
predictions |
calibrated predictions for the original predicted values |
n_seeds |
number of random data set partitions into training and test set used for the repeated CV |
Johanna Schwarz
## Loading dataset in environment
data(example)
actual <- example$actual
predicted <- example$predicted
## Create calibration models
calibration_model <- calibrate(actual, predicted, model_idx = c(1,2), FALSE, FALSE, folds = 10, n_seeds = 1, nCores = 2)
trains calibration models on the training set of predicted/actual value pairs. model_idx specifies which models should be trained.
calibrate_me(actual, predicted, model_idx)
actual |
vector of observed class labels (0/1) |
predicted |
vector of uncalibrated predictions |
model_idx |
a single number from 1 to 5, indicating which calibration model should be implemented, 1=hist_scaled, 2=hist_transformed, 3=BBQ_scaled, 4=BBQ_transformed, 5=GUESS |
depending on the value of model_idx, the respective calibration model is built on the input from actual and predicted
trains and evaluates calibration models using n_seeds-times repeated folds-fold Cross-Validation (CV). model_idx specifies which models should be trained. Model training and evaluation is repeated n_seeds times, each time with a different training/test set partition scheme for the CV.
calibrate_me_CV_errors(actual, predicted, model_idx, folds = 10, n_seeds, nCores)
actual |
vector of observed class labels (0/1) |
predicted |
vector of uncalibrated predictions |
model_idx |
which calibration models should be implemented, 1=hist_scaled, 2=hist_transformed, 3=BBQ_scaled, 4=BBQ_transformed, 5=GUESS |
folds |
number of folds in the cross-validation, Default: 10 |
n_seeds |
number of times the CV is repeated with a different random partitioning of the data |
nCores |
number of cores used for the parallelised execution over n_seeds |
parallelised execution over n_seeds
returns all trained calibration models that were built during the n_seeds-times repeated folds-fold CV. Error values for each of the n_seeds CV runs are given.
visually compares the trained calibration models by plotting their calibrated predictions over a sequence of ML scores
compare_models_visual(models, seq = NULL)
models |
list of trained calibration models whose calibrated predictions should be compared |
seq |
sequence for which the calibrated predictions should be plotted, Default: NULL |
DETAILS
OUTPUT_DESCRIPTION
ggplot, geom_line, aes, ylim, theme, labs, scale_color_brewer; melt
computes various discrimination error values, namely: sensitivity, specificity, accuracy, positive predictive value (ppv), negative predictive value (npv) and AUC
evaluate_discrimination(actual, predicted, cutoff = NULL)
actual |
vector of observed class labels (0/1) |
predicted |
vector of uncalibrated predictions |
cutoff |
cut-off to be used for the computation of npv, ppv, sensitivity and specificity, Default: value that maximizes sensitivity and specificity (Youden-Index) |
list object with the following components:
sens |
sensitivity |
spec |
specificity |
acc |
accuracy |
ppv |
positive predictive value |
npv |
negative predictive value |
cutoff |
cut-off that was used to compute the error values |
auc |
AUC value |
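A usage sketch on the bundled example data; evaluate_discrimination may be an internal function (hence ':::'). cutoff = NULL uses the Youden-optimal cut-off, as described above.
data(example)
discrimination <- CalibratR:::evaluate_discrimination(example$actual, example$predicted)
discrimination$auc   ## AUC of the uncalibrated scores
discrimination$sens  ## sensitivity at the chosen cut-off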
list object containing (1) the simulated classifier scores for two classes: distributions are simulated from Gaussian distributions with Normal(mean=1.5, sd=0) for class 1 and Normal(mean=0, sd=0) for class 0 instances, each class consisting of 100 instances; and (2) a test set of 100 instances
data(example)
predicted = vector of 200 simulated classifier values; actual = their respective true class labels (0/1)
returns formatted input. If specified, the uncalibrated input is mapped to the [0;1] range using scaling (scale_me) or transforming (transform_me).
format_values(cases, control, input, min = NULL, max = NULL, mean = NULL)
cases |
instances from class 1 |
control |
instances from class 0 |
input |
single integer (0, 1 or 2); specifies whether the input should be formatted only (=0), formatted and scaled (=1), or formatted and transformed (=2) |
min |
min value of the original data set, default=calculated on input |
max |
max value of the original data set, default=calculated on input |
mean |
mean value of the original data set, default=calculated on input |
list object with the following components:
formated_values |
formatted input. If |
min |
minimum value among all instances |
max |
maximum value among all instances |
mean |
mean value among all instances |
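A usage sketch under the assumption that cases and control are the numeric score vectors of the class 1 and class 0 instances; format_values may be an internal function (hence ':::').
data(example)
cases   <- example$predicted[example$actual == 1]
control <- example$predicted[example$actual == 0]
## input = 1: format and scale the scores to [0, 1]
formatted <- CalibratR:::format_values(cases, control, input = 1)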
computes the Brier score of the predictions against the observed class labels (0/1)
get_Brier_score(actual, predicted)
actual |
vector of observed class labels (0/1) |
predicted |
vector of uncalibrated predictions |
DETAILS
OUTPUT_DESCRIPTION
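The Brier score itself is not spelled out above. As a reference point, the usual definition is the mean squared difference between predictions and observed 0/1 labels; the sketch below illustrates that definition and is not necessarily the exact (possibly class-wise) output of get_Brier_score.
## standard Brier score for 0/1 labels (illustrative only)
brier <- function(actual, predicted) {
  mean((predicted - actual)^2)
}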
calculates the class-specific classification error CLE in the test set.
The method computes the deviation of the calibrated predictions of class 1 instances from their true value 1.
For class 0 instances, get_CLE_class computes the deviation from 0.
Class 1 CLE is 0 when all class 1 instances have a calibrated prediction of 1 regardless of potential miscalibration of class 0 instances.
CLE calculation is helpful when miscalibration and misclassification are more cost-sensitive for one class than for the other.
get_CLE_class(actual, predicted, bins = 10)
actual |
vector of observed class labels (0/1) |
predicted |
vector of uncalibrated predictions |
bins |
number of bins for the equal-width binning model, default=10 |
object of class list containing the following components:
class_1 |
CLE of class 1 instances |
class_0 |
CLE of class 0 instances |
melt; ggplot, geom_line, aes, position_dodge, labs, scale_colour_manual
visualises how class 1 and class 0 classification error (CLE) differs in each trained calibration model.
Comparing class-specific CLE helps to choose a calibration model for applications where classification error is more cost-sensitive for one class. See get_CLE_class for details on the implementation.
get_CLE_comparison(list_models)
list_models |
list object that contains all error values for all trained calibration models. For the specific format, see the calling function |
ggplot2
Expected Calibration Error (ECE); the predictions are divided into 10 equal-width bins (default) and the absolute difference between the mean of the observed outcomes (0/1) and the mean of the predictions is calculated per bin, weighted by the empirical frequency of elements in bin i
get_ECE_equal_width(actual, predicted, bins = 10)
actual |
vector of observed class labels (0/1) |
predicted |
vector of uncalibrated predictions |
bins |
number of bins for the equal-width binning model |
equal-width ECE value
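The weighting scheme described above can be made concrete with a small sketch; it illustrates the metric under the stated definition and is not the package source (the package may choose bin boundaries differently).
## illustrative equal-width ECE over the observed score range
ece_equal_width_sketch <- function(actual, predicted, bins = 10) {
  breaks <- seq(min(predicted), max(predicted), length.out = bins + 1)
  idx  <- cut(predicted, breaks = breaks, include.lowest = TRUE)
  obs  <- tapply(actual, idx, mean)      ## observed event rate per bin
  pred <- tapply(predicted, idx, mean)   ## mean prediction per bin
  n    <- tapply(actual, idx, length)    ## number of elements per bin
  keep <- !is.na(n)                      ## drop empty bins
  sum((n[keep] / sum(n[keep])) * abs(obs[keep] - pred[keep]))
}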
Maximum Calibration Error (MCE), returns maximum calibration error for equal-width binning model
get_MCE_equal_width(actual, predicted, bins = 10)
actual |
vector of observed class labels (0/1) |
predicted |
vector of uncalibrated predictions |
bins |
number of bins for the binning model |
equal-width MCE value
Expected Calibration Error (ECE); the predictions are divided into 10 equal-frequency bins (default) and the absolute difference between the mean of the observed outcomes (0/1) and the mean of the predictions is calculated per bin, weighted by the empirical frequency of elements in bin i
getECE(actual, predicted, n_bins = 10)
actual |
vector of observed class labels (0/1) |
predicted |
vector of uncalibrated predictions |
n_bins |
number of bins of the underlying equal-frequency histogram, Default: 10 |
equal-frequency ECE value
Maximum Calibration Error (MCE), returns maximum calibration error for equal-frequency binning model
getMCE(actual, predicted, n_bins = 10)
actual |
vector of observed class labels (0/1) |
predicted |
vector of uncalibrated predictions |
n_bins |
number of bins of the underlying equal-frequency histogram, Default: 10 |
equal-frequency MCE value
calculates the root mean square error (RMSE) of the calibrated predictions in the test set
getRMSE(actual, predicted)
actual |
vector of observed class labels (0/1) |
predicted |
vector of uncalibrated predictions |
RMSE value
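As a reference point, RMSE as usually defined (a sketch of the formula, not the package source):
## root mean square error between 0/1 labels and predictions
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}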
trains and evaluates the GUESS calibration model using folds-fold Cross-Validation (CV). The predicted values are partitioned into n subsets. A GUESS model is constructed on (n-1) subsets; the remaining subset is used for testing the model. All test set predictions are merged and used to compute error metrics for the model.
GUESS_CV(actual, predicted, n_folds = 10, method_of_prediction = 2, seed, input)
actual |
vector of observed class labels (0/1) |
predicted |
vector of uncalibrated predictions |
n_folds |
number of folds for the cross-validation, Default: 10 |
method_of_prediction |
which prediction method of predict_GUESS is used during the CV (see density_evaluation in predict_GUESS), Default: 2 |
seed |
random seed to alternate the split of data set partitions |
input |
specify if the input was scaled or transformed, scaled=1, transformed=2 |
list object containing the following components:
error |
list object that summarizes discrimination and calibration errors obtained during the CV |
type |
"GUESS" |
pred_idx |
which prediction method was used during CV |
probs_CV |
vector of calibrated predictions that was used during the CV |
actual_CV |
respective vector of true values (0 or 1) that was used during the CV |
trains and evaluates the histogram binning calibration model using folds-fold Cross-Validation (CV). The predicted values are partitioned into n subsets. A histogram binning model is constructed on (n-1) subsets; the remaining subset is used for testing the model. All test set predictions are merged and used to compute error metrics for the model.
hist_binning_CV(actual, predicted, n_bins = 15, n_folds = 10, seed, input)
actual |
vector of observed class labels (0/1) |
predicted |
vector of uncalibrated predictions |
n_bins |
number of bins used in the histogram binning scheme, Default: 15 |
n_folds |
number of folds in the cross-validation, Default: 10 |
seed |
random seed to alternate the split of data set partitions |
input |
specify if the input was scaled or transformed, scaled=1, transformed=2 |
list object containing the following components:
error |
list object that summarizes discrimination and calibration errors obtained during the CV |
type |
"hist" |
probs_CV |
vector of calibrated predictions that was used during the CV |
actual_CV |
respective vector of true values (0 or 1) that was used during the CV |
plots the returned conditional class probabilities P(x|C) of the GUESS_1 or GUESS_2 model. Which GUESS model is plotted can be specified via pred_idx.
plot_class_distributions(build_guess_object, pred_idx)
build_guess_object |
output from build_GUESS() |
pred_idx |
if 1, the GUESS_1 model is plotted; if 2, the GUESS_2 model |
ggplot object that visualizes the returned calibrated prediction estimates of GUESS_1 or GUESS_2
melt; ggplot, geom_line, aes, scale_colour_manual, theme, labs, geom_vline, geom_text
this method visualizes all implemented calibration models as a mapping function between original ML scores (x-axis) and calibrated predictions (y-axis)
plot_model(calibration_model, seq = NULL)
calibration_model |
output from the |
seq |
sequence of ML scores over which the mapping function should be evaluated, Default: 100 scores from the minimum to the maximum of the original ML scores |
ggplot object
melt; ggplot, geom_line, aes, ylim, scale_colour_manual, theme, labs, geom_text, geom_vline
calibrates new uncalibrated instances using a trained BBQ calibration model
predict_BBQ(bbq, new, option)
bbq |
output from the build_BBQ method |
new |
vector of uncalibrated probabilities |
option |
either 1 or 0; averaging=1, selecting=0 |
Based on the paper (and MATLAB code) "Obtaining Well Calibrated Probabilities Using Bayesian Binning" by Naeini, Cooper and Hauskrecht: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4410090/
a list object containing the following components:
predictions |
contains a vector of calibrated predictions |
pred_idx |
which option was used (averaging or selecting) |
significance_test_set |
the percentage of |
pred_per_bin |
number of instances |
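A usage sketch on the bundled example data; predict_BBQ may be an internal function (hence ':::'). option = 0 selects the best-scoring binning model, option = 1 averages over the pruned models, as documented above.
data(example)
bbq <- CalibratR:::build_BBQ(example$actual, example$predicted)
calibrated <- CalibratR:::predict_BBQ(bbq, example$test_set, option = 0)
calibrated$predictions  ## vector of calibrated predictions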
maps the uncalibrated predictions new into calibrated predictions using the supplied calibration models
predict_calibratR(calibration_models, new = NULL, nCores = 4)
calibration_models |
list of trained calibration models that were constructed using the calibrate method |
new |
vector of new uncalibrated instances. Default: 100 scores from the minimum to the maximum of the original ML scores |
nCores |
number of cores used for parallel execution, Default: 4 |
if no new value is given, the function will evaluate a sequence of numbers ranging from the minimum to the maximum of the original values in the training set
list object with the following components:
predictions |
a list containing the calibrated predictions for each calibration model |
significance_test_set |
a list containing the percentage of |
pred_per_bin |
a list containing the number of instances in each bin for the binning models |
Johanna Schwarz
## Loading dataset in environment
data(example)
test_set <- example$test_set
calibration_model <- example$calibration_model
## Predict for test set
predictions <- predict_calibratR(calibration_model$calibration_models, new=test_set, nCores = 2)
returns calibrated predictions for the instances new using the trained GUESS calibration model build_guess_object. Two different evaluation methods are available. Method 1: returns the p-value for the score new under the distribution that is handed over in the build_guess_object. Method 2: returns the probability density value for the score new under the distribution that is handed over in the build_guess_object.
predict_GUESS(build_guess_object, new, density_evaluation = 2, return_class_density = FALSE)
build_guess_object |
output from the build_GUESS method |
new |
vector of uncalibrated probabilities |
density_evaluation |
which density evaluation method should be used to calculate the probabilities, Default: 2 |
return_class_density |
if set to TRUE, class densities p(x|class) are returned, Default: FALSE |
dens_case and dens_control are only returned when return_class_density is set to TRUE
a list object containing the following components:
predictions |
contains a vector of calibrated predictions |
pred_idx |
which density evaluation method was used |
significance_test_set |
the percentage of |
dens_case |
a vector containing the p(x|case) values |
dens_control |
a vector containing the p(x|control) values |
predict for a new element using histogram binning
predict_hist_binning(histogram, new)
histogram |
the output of build_hist_binning |
new |
vector of uncalibrated probabilities |
a list object containing the following components
predictions |
contains a vector of calibrated predictions |
significance_test_set |
the percentage of |
pred_per_bin |
a table containing the number of instances from |
calibrates the uncalibrated predictions new using calibration_model.
predict_model(new, calibration_model, min, max, mean, inputtype)
new |
vector of uncalibrated predictions |
calibration_model |
calibration model to be used for the calibration. Can be the output of |
min |
minimum value of the original data set |
max |
maximum value of the original data set |
mean |
mean value of the original data set |
inputtype |
specify if the model was built on original (=0), scaled (=1) or transformed (=2) data |
vector of calibrated predictions
This function plots all n reliability diagrams that were constructed during n-times repeated m-fold cross-validation (CV). During calibration model evaluation, CV is repeated n times, so that eventually n reliability diagrams are obtained.
rd_multiple_runs(list_models)
list_models |
list object that contains n-times the output from the |
a list object that contains a reliability diagram visualising all n reliability diagrams that were constructed during the n-times repeated m-fold cross-validation
melt; ggplot, geom_line, aes, geom_abline, ylab, xlab, xlim, ylim, coord_fixed, geom_text, scale_color_discrete, ggtitle
Reliability curves allow checking if the predicted probabilities of a binary classifier are well calibrated.
reliability_diagramm(actual, predicted, bins = 10, plot_rd = TRUE)
actual |
vector of observed class labels (0/1) |
predicted |
vector of uncalibrated predictions |
bins |
number of bins in the reliability diagram, Default: 10 |
plot_rd |
should the reliability diagram be plotted, Default: TRUE |
a list object containing the following elements
calibration_error |
|
discrimination_error |
|
rd_breaks |
|
histogram_plot |
|
diagram_plot |
|
mean_pred_per_bin |
|
accuracy_per_bin |
|
freq_per_bin |
|
sign |
ggplot, stat_bin, aes, scale_fill_manual, theme, labs, geom_point, xlim, ylim, geom_abline, geom_line, geom_text, geom_label, coord_fixed
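A usage sketch on the bundled example data; reliability_diagramm may be an internal function (hence ':::'). The returned components used below are those listed in the value table above.
data(example)
rd <- CalibratR:::reliability_diagramm(example$actual, example$predicted, bins = 10, plot_rd = TRUE)
rd$diagram_plot       ## the reliability diagram itself
rd$calibration_error  ## associated calibration error values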
maps all instances in x to the [0;1] range using the equation y = (x - min)/(max - min). If no values for min and max are given, they are calculated by default as min = min(x) and max = max(x).
scale_me(x, min = NULL, max = NULL)
x |
vector of predictions |
min |
minimum of x; Default: NULL (calculated from x) |
max |
maximum of x; Default: NULL (calculated from x) |
if x is greater (smaller) than max (min), its calibrated prediction is set to 1 (0) and a warning is triggered.
scaled values of x
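A sketch of the scaling described above (illustrative, not the package source); the handling of missing min/max and the out-of-range warning follow the details given above.
scale_sketch <- function(x, min = NULL, max = NULL) {
  if (is.null(min)) min <- base::min(x)
  if (is.null(max)) max <- base::max(x)
  y <- (x - min) / (max - min)
  ## out-of-range scores are set to 0 or 1 and a warning is issued, as described above
  if (any(y < 0 | y > 1)) warning("values outside [min, max] were clipped to [0, 1]")
  pmin(pmax(y, 0), 1)
}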
this method offers a variety of statistical evaluation methods for the output of the calibrate method. All returned error values represent mean error values over the n_seeds-times repeated 10-fold CV.
statistics_calibratR(calibrate_object, t.test_partitions = TRUE, significance_models = TRUE)
calibrate_object |
list that is returned from the calibrate function |
t.test_partitions |
Performs a paired two-sided t-test over the error values (ECE, CLE1, CLE0, MCE, AUC, sensitivity and specificity) from the random partition splits, testing for a significant difference in mean between the calibration models. All models and the original, scaled and transformed values are tested against each other. The p-value and the effect size of the t-test are returned to the user. Can only be performed if the |
significance_models |
returns important characteristics of the implemented calibration models, Default: TRUE |
DETAILS
An object of class list, with the following components:
mean_calibration |
mean of calibration error values (ECE_equal_width, MCE_equal_width, ECE_equal_freq, MCE_equal_freq, RMSE, Class 1 CLE, Class 0 CLE, Brier Score, Class 1 Brier Score, Class 0 Brier Score) over |
standard_deviation |
standard deviation of calibration error values over |
var_coeff_calibration |
variation coefficient of calibration error values over |
mean_discrimination |
mean of discrimination error (sensitivity, specificity, AUC, positive predictive value, negative predictive value, accuracy) values over |
sd_discrimination |
standard deviation of discrimination error values over |
var_coeff_discrimination |
variation coefficient of discrimination error values over |
t.test_calibration |
=list(p_value=t.test.calibration, effect_size=effect_size_calibration), only returned if t.test=TRUE |
t.test_discrimination |
=list(p_value=t.test.discrimination, effect_size=effect_size_discrimination), only returned if t.test=TRUE |
significance_models |
only returned if significance_models=TRUE |
n_seeds |
number of random data set partitions into training and test set for |
original_values |
list object that consists of the |
Johanna Schwarz
## Loading dataset in environment
data(example)
calibration_model <- example$calibration_model
statistics <- statistics_calibratR(calibration_model)
maps all instances in x_unscaled to the [0;1] range using the equation y = exp(x)/(1+exp(x))
transform_me(x_unscaled, mean)
x_unscaled |
vector of predictions |
mean |
mean of x_unscaled |
values greater than exp(700) or smaller than exp(-700) are returned as "Inf". To avoid NaN values, these "Inf" values are turned into min(y) or max(y).
transformed values of x_unscaled
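A sketch of the stated mapping (illustrative, not the package source); it is equivalent to stats::plogis. The role of the mean argument is not described above, so it is omitted here.
transform_sketch <- function(x_unscaled) {
  ## logistic transform to [0, 1]; equivalent to stats::plogis(x_unscaled)
  exp(x_unscaled) / (1 + exp(x_unscaled))
}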
performs n_folds-fold CV, but with only input-preprocessing of the test set. No calibration model is trained and evaluated in this method. The predicted values are partitioned into n subsets. The training set is constructed on (n-1) subsets; the remaining subset is used for testing. Since no calibration model is used in this method, the test set predictions are only input-preprocessed (either scaled or transformed, depending on input). All test set predictions are merged and used to compute error metrics for the input-preprocessing methods.
uncalibrated_CV(actual, predicted, n_folds = 10, seed, input)
actual |
vector of observed class labels (0/1) |
predicted |
vector of uncalibrated predictions |
n_folds |
number of folds for the cross-validation, Default: 10 |
seed |
random seed to alternate the split of data set partitions |
input |
specify if the input was scaled or transformed, scaled=1, transformed=2 |
list object containing the following components:
error |
list object that summarizes discrimination and calibration errors obtained during the CV |
type |
"uncalibrated" |
probs_CV |
vector of input-preprocessed predictions that was used during the CV |
actual_CV |
respective vector of true values (0 or 1) that was used during the CV |
plots a panel of all calibrated predictions from the respective calibration models, allowing visual comparison of the models' output and their optimal cut-offs
visualize_calibrated_test_set(actual, predicted_list, cutoffs)
actual |
vector of observed class labels (0/1) |
predicted_list |
predict_calibratR$predictions object (list of calibrated predictions from calibration models) |
cutoffs |
vector of optimal cut-off thresholds for each calibration model |
ggplot2 element for visual comparison of the evaluated calibration models
ggplot, geom_point, scale_colour_manual, xlab, ylab, geom_hline, ylim
this method offers a variety of visualisations to compare implemented calibration models
visualize_calibratR(calibrate_object, visualize_models = FALSE, plot_distributions = FALSE, rd_partitions = FALSE, training_set_calibrated = FALSE)
calibrate_object |
the list component |
visualize_models |
returns the list components |
plot_distributions |
returns a density distribution plot of the calibrated predictions after CV (External) or without CV (internal) |
rd_partitions |
returns a reliability diagram for each model |
training_set_calibrated |
returns a list of ggplots. Each plot represents the calibrated predictions by the respective calibration model of the training set. If the list object |
An object of class list, with the following components:
histogram_distribution |
returns a histogram of the original ML score distribution |
density_calibration_internal |
returns a list of density distribution plots for each calibration method, the original and the two input-preprocessing methods scaling and transforming. The plot visualises the density distribution of the calibrated predictions of the training set. In this case, training and test set values are identical, so be careful when evaluating the plots. |
density_calibration_external |
returns a list of density distribution plots for each calibration method, the original and the two input-preprocessing methods scaling and transforming. The plot visualises the density distribution of the calibrated predictions, that were returned during Cross Validation. If more than one repetition of CV was performed, run number 1 is evaluated |
plot_calibration_models |
maps the original ML scores to their calibrated prediction estimates for each model.
This enables easy model comparison over the range of ML scores See also |
plot_single_models |
returns a list of ggplots for each calibration model, also mapping the original ML scores to their calibrated prediction. Significance values are indicated.
See also |
rd_plot |
returns a list of reliability diagrams for each of the implemented calibration models and the two input-preprocessing methods "scaled" and "transformed". The returned plot visualises the calibrated predictions that were returned for the test set during each of the n runs of the n-times repeated CV. Each grey line represents one of the n runs. The blue line represents the median of all calibrated bin predictions. Insignificant bin estimates are indicated with "ns". If no CV was performed during calibration model building using the |
calibration_error |
returns a list of boxplots for the calibration error metrics ECE, MCE, CLE and RMSE. The n values for each model represent the error values obtained during the n-times repeated CV. If no CV was performed during calibration model building using the |
discrimination_error |
returns a list of boxplots for the discrimination errors AUC, sensitivity and specificity. The n values for each model represent the error values obtained during the n-times repeated CV. If no CV was performed during calibration model building using the |
cle_class_specific_error |
If no CV was performed during calibration model building using the |
training_set_calibrated |
returns a list of ggplots. Each plot represents the calibrated predictions by the respective calibration model of the training set. If the list object |
GUESS_1_final_model |
plots the returned conditional probability p(x|Class) values of the GUESS_1 model |
GUESS_2_final_model |
plots the returned conditional probability p(x|Class) values of the GUESS_2 model |
Johanna Schwarz
ggplot, geom_density, aes, scale_colour_manual, scale_fill_manual, labs, geom_point, geom_hline, theme, element_text; melt
## Loading dataset in environment
data(example)
calibration_model <- example$calibration_model
visualisation <- visualize_calibratR(calibration_model, plot_distributions=FALSE, rd_partitions=FALSE, training_set_calibrated=FALSE)
visualizes the distribution of the original ML scores for the two observed classes
visualize_distribution(actual, predicted)
actual |
vector of observed class labels (0/1) |
predicted |
vector of uncalibrated predictions |
list object containing the following components:
plot_distribution |
ggplot histogram that visualizes the observed class distributions |
parameter |
list object that summarizes all relevant parameters (mean, sd, number) of the observed class distributions |
ggplot, geom_histogram, aes, scale_colour_manual, scale_fill_manual, labs
compares error values among different calibration models. A boxplot is created from the n error values that were obtained during the n-times repeated cross-validation procedure.
Different error values are implemented and can be compared:
discrimination error = sensitivity, specificity, accuracy, AUC (when discrimination=TRUE)
calibration error = ECE, MCE, RMSE, class 0 CLE, class 1 CLE (when discrimination=FALSE)
For the calculation of the errors, see the respective methods listed in the "see also" section
visualize_error_boxplot(list_models, discrimination = TRUE)
list_models |
list object that contains all error values for all trained calibration models. For the specific format, see the calling function |
discrimination |
boolean (TRUE or FALSE). If TRUE, discrimination errors are compared between models; if FALSE calibration error is compared, Default: TRUE |
An object of class list, with the following components:
if discrimination=TRUE
sens |
ggplot2 boxplot that compares all evaluated calibration models with regard to sensitivity. |
spec |
ggplot2 boxplot that compares all evaluated calibration models with regard to specificity |
acc |
ggplot2 boxplot that compares all evaluated calibration models with regard to accuracy |
auc |
ggplot2 boxplot that compares all evaluated calibration models with regard to AUC |
list_errors |
list object that contains all discrimination error values that were used to construct the boxplots |
if discrimination=FALSE
ece |
ggplot2 boxplot that compares all evaluated calibration models with regard to expected calibration error |
mce |
ggplot2 boxplot that compares all evaluated calibration models with regard to maximum calibration error (MCE) |
rmse |
ggplot2 boxplot that compares all evaluated calibration models with regard to root mean square error (RMSE) |
cle_0 |
ggplot2 boxplot that compares all evaluated calibration models with regard to class 0 classification error (CLE) |
cle_1 |
ggplot2 boxplot that compares all evaluated calibration models with regard to class 1 classification error (CLE) |
list_errors |
list object that contains all calibration error values that were used to construct the boxplots |
ggplot, aes, ggtitle, scale_x_discrete, geom_boxplot, theme, element_text; melt, get_CLE_class, getECE, getMCE, getRMSE, evaluate_discrimination