Title: | The GiViTI Calibration Test and Belt |
---|---|
Description: | Functions to assess the calibration of logistic regression models with the GiViTI (Gruppo Italiano per la Valutazione degli interventi in Terapia Intensiva, Italian Group for the Evaluation of the Interventions in Intensive Care Units - see <http://www.giviti.marionegri.it/>) approach. The approach consists of a graphical tool, namely the GiViTI calibration belt, and the associated statistical test. These tools can be used both to evaluate the internal calibration (i.e. the goodness of fit) and to assess the validity of an externally developed model. |
Authors: | Giovanni Nattino [cre, aut], Stefano Finazzi [aut], Guido Bertolini [aut], Carlotta Rossi [aut], Greta Carrara [aut] |
Maintainer: | Giovanni Nattino <[email protected]> |
License: | GPL-3 |
Version: | 1.3 |
Built: | 2024-11-26 06:31:54 UTC |
Source: | CRAN |
calibrationBeltIntersections
returns the
intervals where the calibration belt significantly deviates
from the bisector.
calibrationBeltIntersections(cbBound, seqP, minMax)
cbBound |
A |
seqP |
The vector of the probabilities where the points of the calibration belt have been evaluated. |
minMax |
A list with two elements, named |
A list with two components, overBisector
and underBisector
.
Each component is a list containing all the intervals where the calibration
belt is significantly over/under the bisector.
givitiCalibrationBelt
and plot.givitiCalibrationBelt
to compute and plot the calibration belt, and
givitiCalibrationTest
to perform the
associated calibration test.
e <- runif(1000)
logite <- logit(e)
eMod <- logistic(logit(e) + (logit(e))^2)
o <- rbinom(1000, size = 1, prob = eMod)
data <- data.frame(e = e, o = o, logite = logite)
seqP <- seq(from = .01, to = .99, by = .01)
seqG <- logit(seqP)
minMax <- list(min = min(e), max = max(e))
fwLR <- polynomialLogRegrFw(data, .95, 4, 1)
cbBound <- calibrationBeltPoints(data, seqG, fwLR$m, fwLR$fit, .95, .90, "external")
calibrationBeltIntersections(cbBound, seqP, minMax)
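The idea behind calibrationBeltIntersections can be illustrated with a small base-R sketch. It is not the package's implementation: the helper name intervals_sketch and the toy boundary values are assumptions made for illustration. It takes a grid of probabilities and the "L" and "U" boundary columns, and reports the stretches where the lower boundary lies above the bisector (belt over) or the upper boundary lies below it (belt under).

```r
# Hypothetical helper: turn a logical flag along the grid into a list of
# c(start, end) intervals, one per consecutive run of TRUE values.
intervals_sketch <- function(seqP, flag) {
  r <- rle(flag)
  ends <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1
  keep <- which(r$values)
  lapply(keep, function(i) c(seqP[starts[i]], seqP[ends[i]]))
}

# Toy belt: it contains the bisector everywhere except on the first three
# grid points, where the whole belt sits above it.
seqP <- seq(0.1, 0.9, by = 0.1)
cbBound <- data.frame(L = seqP - 0.05, U = seqP + 0.05)
cbBound$L[1:3] <- seqP[1:3] + 0.02
cbBound$U[1:3] <- seqP[1:3] + 0.10

# The belt is over the bisector where its lower boundary exceeds p,
# and under the bisector where its upper boundary is below p.
over  <- intervals_sketch(seqP, cbBound$L > seqP)
under <- intervals_sketch(seqP, cbBound$U < seqP)
```

In this toy case over holds a single interval spanning the first three grid points and under is an empty list.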
calibrationBeltPoints
computes the points defining the boundary
of the confidence region.
calibrationBeltPoints(data, seqG, m, fit, thres, cLevel, devel)
data |
A |
seqG |
A vector containing the logit of the probabilities where the points of the calibration belt will be evaluated. |
m |
A scalar integer representing the degree of the polynomial at the end of the forward selection. |
fit |
An object of class |
thres |
A numeric scalar between 0 and 1 representing 1 - the significance level adopted in the forward selection. |
cLevel |
A numeric scalar between 0 and 1 representing the confidence level that will be used for the confidence region. |
devel |
A character string specifying if the model has been fit on
the same dataset under evaluation ( |
A data.frame
object with two columns, "U" and "L", containing
the points of the upper and lower boundary of the cLevel
*100%-level calibration belt evaluated
at values seqG
.
givitiCalibrationBelt
and plot.givitiCalibrationBelt
to compute and plot the calibration belt, and
givitiCalibrationTest
to perform the
associated calibration test.
e <- runif(100)
logite <- logit(e)
o <- rbinom(100, size = 1, prob = e)
data <- data.frame(e = e, o = o, logite = logite)
seqG <- logit(seq(from = .01, to = .99, by = .01))
fwLR <- polynomialLogRegrFw(data, .95, 4, 1)
calibrationBeltPoints(data, seqG, fwLR$m, fwLR$fit, .95, .90, "external")
givitiCalibrationBelt
implements the computations necessary
to plot the calibration belt.
givitiCalibrationBelt(o, e, devel, subset = NULL, confLevels = c(0.8, 0.95), thres = 0.95, maxDeg = 4, nPoints = 200)
o |
A numeric vector representing the binary outcomes.
The elements must assume only the values 0 or 1. The predictions
in |
e |
A numeric vector containing the predictions of the
model under evaluation. The elements must be numeric and between 0 and 1.
The length of the vector must be equal to the length of the vector |
devel |
A character string specifying if the model has been fit on
the same dataset under evaluation ( |
subset |
An optional boolean vector specifying the subset of observations to be considered. |
confLevels |
A numeric vector containing the confidence levels of the calibration belt. The default values are set to .80 and .95. |
thres |
A numeric scalar between 0 and 1 representing 1 - the significance level adopted in the forward selection. The default is 0.95. |
maxDeg |
The maximum degree considered in the forward selection. The default is 4. |
nPoints |
A numeric scalar indicating the number of points to be considered to plot the calibration belt. The default value is 200. |
The calibration belt and the associated test can be used to evaluate the calibration of the model both in external samples and in the development dataset. However, the two cases have different requirements. When a model is evaluated on independent samples, the calibration belt and the related test can be applied regardless of the method used to fit the model. Conversely, they can be used on the development set only if the model is fitted with logistic regression.
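The two cases can be sketched as follows. This is an illustration under stated assumptions rather than a recommended workflow: the simulated data, the coefficients of the external model and the object names are all hypothetical, and the package calls run only if givitiR is installed.

```r
set.seed(42)
n <- 500
x <- rnorm(n)
o <- rbinom(n, size = 1, prob = plogis(-0.5 + x))

# Internal case: the probabilities come from a logistic regression
# fitted on the very same data that are being evaluated.
fit <- glm(o ~ x, family = binomial)
e_internal <- as.numeric(fitted(fit))

# External case: the probabilities come from a model specified elsewhere
# (here a made-up fixed formula), so any fitting method is acceptable.
e_external <- plogis(-1 + 0.8 * x)

if (requireNamespace("givitiR", quietly = TRUE)) {
  cb_int <- givitiR::givitiCalibrationBelt(o, e_internal, devel = "internal")
  cb_ext <- givitiR::givitiCalibrationBelt(o, e_external, devel = "external")
  plot(cb_int)
  plot(cb_ext)
}
```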
An object of class givitiCalibrationBelt
.
After computing the calibration belt with the present function,
the plot
method can be used to plot
the calibration belt. The object returned is a list that contains the
following components:
The size of the sample evaluated in the analysis, after discarding
missing values from the vectors o
and e
.
Result of the check on the data. If the data are compatible with the
construction of the calibration belt, the value is the boolean
TRUE
. Otherwise, the element contains a character string
describing the problem found.
The degree of the polynomial at the end of the forward selection.
The value of the test's statistic.
The p-value of the test.
The vector of the probabilities where the points of the calibration belt have been evaluated.
A list with two elements named min
and max
representing the minimum and maximum probabilities in the model under evaluation.
The vector containing the confidence levels of the calibration belt.
A list whose elements report the intervals where the
calibration belt is significantly over/under the bisector
for each confidence level in confLevels
.
plot.givitiCalibrationBelt
to plot the calibration belt and
givitiCalibrationTest
to perform the
associated calibration test.
#Random by-construction well calibrated model
e <- runif(100)
o <- rbinom(100, size = 1, prob = e)
cb <- givitiCalibrationBelt(o, e, "external")
plot(cb)

#Random by-construction poorly calibrated model
e <- runif(100)
o <- rbinom(100, size = 1, prob = logistic(logit(e)+2))
cb <- givitiCalibrationBelt(o, e, "external")
plot(cb)
givitiCalibrationBeltTable
prints on the graphical area of the calibration
belt plot the table that summarizes the significant deviations from the
line of perfect calibration (i.e. the bisector of the I quadrant).
givitiCalibrationBeltTable(cb, tableStrings, grayLevels, xlim, ylim)
cb |
A |
tableStrings |
Optional. A list with four character elements named
|
grayLevels |
A vector containing the code of the gray levels used in the plot of the calibration belt. |
xlim , ylim
|
Numeric vectors of length 2, giving the
x and y coordinates ranges. Default values are |
The function prints the table on the graphical area.
givitiCalibrationTest
performs the calibration test associated to the
calibration belt.
givitiCalibrationTest(o, e, devel, subset = NULL, thres = 0.95, maxDeg = 4)
o |
A numeric vector representing the binary outcomes.
The elements must assume only the values 0 or 1. The predictions
in |
e |
A numeric vector containing the probabilities of the
model under evaluation. The elements must be numeric and between 0 and 1.
The length of the vector must be equal to the length of the vector |
devel |
A character string specifying if the model has been fit on
the same dataset under evaluation ( |
subset |
An optional boolean vector specifying the subset of observations to be considered. |
thres |
A numeric scalar between 0 and 1 representing 1 - the significance level adopted in the forward selection. The default is 0.95. |
maxDeg |
The maximum degree considered in the forward selection. The default is 4. |
The calibration belt and the associated test can be used to evaluate the calibration of the model both in external samples and in the development dataset. However, the two cases have different requirements. When a model is evaluated on independent samples, the calibration belt and the related test can be applied regardless of the method used to fit the model. Conversely, they can be used on the development set only if the model is fitted with logistic regression.
A list of class htest
containing the following components:
The value of the test's statistic.
The p-value of the test.
The vector of coefficients hypothesized under the null hypothesis, that is, the parameters corresponding to the bisector.
A character string describing the alternative hypothesis.
A character string indicating what type of calibration test (internal or external) was performed.
The estimate of the coefficients of the polynomial logistic regression.
A character string giving the name(s) of the data.
givitiCalibrationBelt
and plot.givitiCalibrationBelt
to compute and plot the calibration belt.
#Random by-construction well calibrated model
e <- runif(100)
o <- rbinom(100, size = 1, prob = e)
givitiCalibrationTest(o, e, "external")

#Random by-construction poorly calibrated model
e <- runif(100)
o <- rbinom(100, size = 1, prob = logistic(logit(e)+2))
givitiCalibrationTest(o, e, "external")
givitiCalibrationTestComp
implements the computations necessary to
perform the calibration test associated to the calibration belt.
givitiCalibrationTestComp(o, e, devel, thres, maxDeg)
o |
A numeric vector representing the binary outcomes.
The elements must assume only the values 0 or 1. The predictions
in |
e |
A numeric vector containing the probabilities of the
model under evaluation. The elements must be numeric and between 0 and 1.
The length of the vector must be equal to the length of the vector |
devel |
A character string specifying if the model has been fit on
the same dataset under evaluation ( |
thres |
A numeric scalar between 0 and 1 representing 1 - the significance level adopted in the forward selection. |
maxDeg |
The maximum degree considered in the forward selection. |
The calibration belt and the associated test can be used to evaluate the calibration of the model both in external samples and in the development dataset. However, the two cases have different requirements. When a model is evaluated on independent samples, the calibration belt and the related test can be applied regardless of the method used to fit the model. Conversely, they can be used on the development set only if the model is fitted with logistic regression.
A list containing the following components:
A data.frame
object with the numeric variables "o", "e" provided
in the input and the variable "logite", the logit of the probabilities.
The size of the original sample, i.e. the length of the
vectors e
and o
.
The value of the test's statistic.
The p-value of the test.
The degree of the polynomial at the end of the forward selection.
An object of class glm
containing the output of the fit
of the logistic regression model at the end of the iterative
forward selection.
givitiCalibrationBelt
and plot.givitiCalibrationBelt
to compute and plot the calibration belt, and
givitiCalibrationTest
to perform the
associated calibration test.
e <- runif(100)
o <- rbinom(100, size = 1, prob = e)
givitiCalibrationTestComp(o, e, "external", .95, 4)
Check of the coherence of the values passed to the functions
givitiCalibrationTest
and givitiCalibrationBelt
.
givitiCheckArgs(o, e, devel, thres, maxDeg)
o |
A numeric vector representing the binary outcomes.
The elements must assume only the values 0 or 1. The predictions
in |
e |
A numeric vector containing the probabilities of the
model under evaluation. The elements must be numeric and between 0 and 1.
The length of the vector must be equal to the length of the vector |
devel |
A character string specifying if the model has been fit on
the same dataset under evaluation ( |
thres |
A numeric scalar between 0 and 1 representing 1 - the significance level adopted in the forward selection. |
maxDeg |
The maximum degree considered in the forward selection. |
The function produces an error if the elements provided through the arguments do not meet the constraints reported.
The function verifies that the data are compatible with the construction of the calibration belt. In particular, the function checks that the predictions provided do not completely separate the outcomes and that at least two events and two non-events are present in the data.
givitiCheckData(o, e)
o |
A numeric vector representing the binary outcomes.
The elements must assume only the values 0 or 1. The predictions
in |
e |
A numeric vector containing the probabilities of the
model under evaluation. The elements must be numeric and between 0 and 1.
The length of the vector must be equal to the length of the vector |
The output is TRUE
if the data do not show any of the
reported problems. Otherwise, the function returns a string describing the
problem found.
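The two checks described for givitiCheckData can be sketched in base R as below. This is an assumption about how such checks might look, not the package's code: the function name check_data_sketch and the message strings are hypothetical.

```r
# Hypothetical re-implementation of the two data checks.
# Returns TRUE when no problem is found, otherwise a descriptive string.
check_data_sketch <- function(o, e) {
  # At least two events (o == 1) and two non-events (o == 0) are needed.
  if (sum(o == 1) < 2 || sum(o == 0) < 2) {
    return("fewer than two events or two non-events")
  }
  # Complete separation: all predictions of one outcome group lie strictly
  # below all predictions of the other group.
  e0 <- e[o == 0]
  e1 <- e[o == 1]
  if (max(e0) < min(e1) || max(e1) < min(e0)) {
    return("predictions completely separate the outcomes")
  }
  TRUE
}

check_data_sketch(c(0, 1, 0, 1), c(0.1, 0.2, 0.8, 0.9))  # overlapping groups
check_data_sketch(c(0, 0, 1, 1), c(0.1, 0.2, 0.8, 0.9))  # separated groups
check_data_sketch(c(0, 0, 0, 1), c(0.1, 0.2, 0.8, 0.9))  # only one event
```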
The package 'givitiR' provides the functions to plot the GiViTI calibration belt and to compute the associated statistical test.
The name of the approach derives from GiViTI (Gruppo Italiano per la Valutazione degli interventi in Terapia Intensiva, Italian Group for the Evaluation of the Interventions in Intensive Care Units), an international network of intensive care units (ICUs) established in Italy in 1992. The group counts more than 400 ICUs from 7 countries, with about half of the participating centers continuously collecting data on admitted patients through the PROSAFE project (PROmoting patient SAFEty and quality improvement in critical care). For further information, see the package vignette and the references therein.
The GiViTI calibration belt has been developed within the methodological research promoted by the GiViTI network, with the purposes of a) enhancing the quality of the logistic regression models built in the group's projects and b) providing the participating ICUs with detailed feedback about their quality of care. A description of the approach and examples of applications are reported in the package vignette.
The main functions of the package are listed below.
givitiCalibrationBelt
implements the computations necessary
to plot the calibration belt.
plot.givitiCalibrationBelt
plots the calibration belt.
givitiCalibrationTest
performs the calibration test associated to the
calibration belt.
givitiStatCdf
returns the cumulative distribution function of the
calibration statistic under the null hypothesis.
givitiStatCdf(t, m, devel, thres)
t |
The argument of the CDF. Must be a scalar value. |
m |
The scalar integer representing the degree of the polynomial at the end of the forward selection. |
devel |
A character string specifying if the model has been fit on
the same dataset under evaluation ( |
thres |
A numeric scalar between 0 and 1 representing 1 - the significance level adopted in the forward selection. |
A number representing the value of the CDF evaluated in t.
givitiCalibrationBelt
and plot.givitiCalibrationBelt
to compute and plot the calibration belt, and
givitiCalibrationTest
to perform the
associated calibration test.
givitiStatCdf(3, 1, "external", .95)
givitiStatCdf(3, 2, "internal", .95)
A dataset containing clinical information on 1,000 patients admitted to Italian Intensive Care Units joining the GiViTI network (Gruppo Italiano per la Valutazione degli interventi in Terapia Intensiva, Italian Group for the Evaluation of the Interventions in Intensive Care Units). The data were collected within the ProSAFE project, an Italian observational study based on the continuous collection of clinical data in more than 200 Italian ICUs. The purpose of the project is the continuous surveillance of the quality of care provided in the participating centres. The actual values of the variables have been modified to protect subject confidentiality.
icuData
A data frame with 1000 rows and 33 variables. The dataset contains, for each predictor of the SAPSII score, both the clinical information and the weight of that variable in the score (the variable with the suffix '_NUM').
hospital outcome, numeric binary variable with values 1 (deceased) and 0 (alive).
probability estimated by the SAPSII prognostic model.
SAPSII score.
age, factor variable with levels (in years): '<40', '40-59', '60-69', '70-74', '75-80', '>=80'.
type of admission, factor variable with 3 levels: 'unschSurg' (unscheduled surgery), 'med' (medical), 'schSurg' (scheduled surgery).
chronic diseases, factor variable with 4 levels: 'noChronDis' (no chronic disease), 'metCarc' (metastatic carcinoma), 'hemMalig' (hematologic malignancy), 'aids' (AIDS).
Glasgow Coma Scale, factor variable with 5 levels: '3-5', '6-8', '9-10', '11-13', '14-15'.
systolic blood pressure, factor variable with 4 levels (in mmHg): '<70', '70-99', '100-199', '>=200'.
heart rate, factor variable with 5 levels: '<40', '40-69', '70-119', '120-159', '>=160'
temperature, factor variable with 2 levels (in Celsius degree): '<39', '>=39'.
urine output, factor variable with 3 levels (in L/24h): '<0.5', '0.5-0.99', '>=1'.
serum urea, factor variable with 3 levels (in g/L): '<0.60', '0.60-1.79', '>=1.80'.
wbc, factor variable with 3 levels (in 1/mm3): '<1', '1-19', '>=20'.
potassium, factor variable with 3 levels (in mEq/L): '<3', '3-4.9', '>=5'.
sodium, factor variable with 3 levels (in mEq/L): '<125', '125-144', '>=145'.
HCO3, factor variable with 3 levels (in mEq/L): '<15', '15-19', '>=20'.
bilirubin, factor variable with 3 levels (in mg/dL): '<4', '4-5.9', '>=6'.
mechanical ventilation and CPAP PaO2/FIO2, factor variable with 4 levels (PaO2/FIO2 in mmHg): 'noVent' (not ventilated), 'vent_<100' (ventilated and PaO2/FIO2 < 100), 'vent_100-199' (ventilated and PaO2/FIO2 in 100-199), 'vent_>=200' (ventilated and PaO2/FIO2 >= 200).
The data contain the information to apply the SAPSII model, a prognostic model developed to predict hospital mortality (Le Gall et al., 1993). Both the computed SAPSII score and the associated probability of death are variables of the dataset. The score is an integer ranging from 0 to 163 describing the severity of the patient's condition (the higher the score, the more severe the condition). The probability is computed from the score through the formula reported in the original paper. The dataset also contains the hospital survival of the patients.
http://www.giviti.marionegri.it/Default.asp (in Italian only)
Le Gall, Jean-Roger, Stanley Lemeshow, and Fabienne Saulnier. "A new simplified acute physiology score (SAPS II) based on a European/North American multicenter study." JAMA 270, no. 24 (1993): 2957-2963.
The GiViTI Network, Prosafe Project - 2014 report. Sestante Edizioni: Bergamo, 2015. http://www.giviti.marionegri.it/Download/ReportPROSAFE_2014_EN_Polivalenti_ITALIA.pdf.
logit
and logistic
implement the logit and logistic transformations, respectively.
logit(p)
logistic(x)
p |
A numeric vector whose components are numbers between 0 and 1. |
x |
A numeric vector. |
The functions apply the logit and logistic transformation to each element of the vector passed as argument. In particular, logit(p)=ln(p/(1-p)) and logistic(x)=exp(x)/(1+exp(x)).
logit(0.1)
logit(0.5)
logistic(0)
logistic(logit(0.25))
logit(logistic(2))
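The two formulas given for logit and logistic can be checked with a short base-R sketch. The names logit_sketch and logistic_sketch are hypothetical stand-ins written directly from the formulas, not the package's own definitions.

```r
# Direct transcriptions of logit(p) = ln(p/(1-p)) and
# logistic(x) = exp(x)/(1+exp(x)); both are vectorized.
logit_sketch <- function(p) log(p / (1 - p))
logistic_sketch <- function(x) exp(x) / (1 + exp(x))

p <- c(0.1, 0.25, 0.5, 0.9)
x <- logit_sketch(p)

# The two transformations are inverses of each other.
all.equal(logistic_sketch(x), p)

# Base R offers the same quantities as qlogis() and plogis().
all.equal(logit_sketch(p), qlogis(p))
all.equal(logistic_sketch(x), plogis(x))
```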
The plot
method for calibration belt objects.
## S3 method for class 'givitiCalibrationBelt'
plot(x, xlim = c(0, 1), ylim = c(0, 1), colBis = "red",
     xlab = "e", ylab = "o", main = "GiViTI Calibration Belt",
     polynomialString = T, pvalueString = T, nString = T,
     table = T, tableStrings = NULL, unableToFitString = NULL, ...)
x |
A |
xlim , ylim
|
Numeric vectors of length 2, giving the
x and y coordinates ranges. Default values are |
colBis |
The color to be used for the bisector. The default value is red. |
xlab , ylab
|
Titles for the x and y axes. Default values are "e" and "o", respectively. |
main |
The main title of the plot. The default value is "GiViTI Calibration Belt". |
polynomialString |
If the value is FALSE, the degree of the polynomial is not printed on the graphical area. If the value is TRUE, the degree m is reported. If a string is passed to this argument, the string is reported instead of the text "Polynomial degree". The default value is TRUE. |
pvalueString |
If the value is FALSE, the p-value of the test is not printed on the graphical area. If the value is TRUE, the p-value is reported. If a string is passed to this argument, the string is reported instead of the text "p-value". The default value is TRUE. |
nString |
If the value is FALSE, the sample size is not printed on the graphical area. If the value is TRUE, the sample size is reported. If a string is passed to this argument, the string is reported instead of the text "n". The default value is TRUE. |
table |
A boolean value indicating whether the table reporting the intersections of the calibration belt with the bisector should be printed on the plot. |
tableStrings |
Optional. A list with four character elements named
|
unableToFitString |
Optional. If a string is passed to this argument, this string is reported in the plot area when the dataset is not compatible with the fit of the calibration belt (e.g. data separation or no positive events). By default, in such cases the text "Unable to fit the Calibration Belt" is reported. |
... |
Other graphical parameters passed to the generic |
The function generates the calibration belt plot. In addition, a list containing the following components is returned:
The p-value of the test.
The degree of the polynomial at the end of the forward selection.
givitiCalibrationBelt
to compute the calibration belt and
givitiCalibrationTest
to perform the
associated calibration test.
#Random by-construction well calibrated model
e <- runif(100)
o <- rbinom(100, size = 1, prob = e)
cb <- givitiCalibrationBelt(o, e, "external")
plot(cb)

#Random by-construction poorly calibrated model
e <- runif(100)
o <- rbinom(100, size = 1, prob = logistic(logit(e)+2))
cb <- givitiCalibrationBelt(o, e, "external")
plot(cb)
polynomialLogRegrFw
implements a forward selection in a
polynomial logistic regression model.
polynomialLogRegrFw(data, thres, maxDeg, startDeg)
data |
A |
thres |
A numeric scalar between 0 and 1 representing 1 - the significance level adopted in the forward selection. |
maxDeg |
The maximum degree considered in the forward selection. |
startDeg |
The starting degree in the forward selection. |
A list containing the following components:
An object of class glm
containing the output of the fit
of the logistic regression model at the end of the iterative
forward selection.
The degree of the polynomial at the end of the forward selection.
e <- runif(100)
logite <- logit(e)
o <- rbinom(100, size = 1, prob = e)
data <- data.frame(e = e, o = o, logite = logite)
polynomialLogRegrFw(data, .95, 4, 1)
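A forward selection over polynomial degree can be sketched in base R as follows. This is one plausible reading of the procedure, not the package's source: the function name forward_poly_sketch, the use of a likelihood-ratio test with one degree of freedom, and the stopping rule are assumptions. Only the argument names and the returned components fit and m mirror the documentation above.

```r
# Hypothetical forward selection: starting from degree startDeg, add one
# power of logite at a time while the likelihood-ratio test for the extra
# term is significant at level 1 - thres, up to degree maxDeg.
forward_poly_sketch <- function(data, thres = 0.95, maxDeg = 4, startDeg = 1) {
  m <- startDeg
  fit <- glm(o ~ poly(logite, m, raw = TRUE), family = binomial, data = data)
  while (m < maxDeg) {
    fit_next <- glm(o ~ poly(logite, m + 1, raw = TRUE),
                    family = binomial, data = data)
    # Drop in deviance from adding one term, compared with a chi-square(1).
    lr <- deviance(fit) - deviance(fit_next)
    if (lr <= qchisq(thres, df = 1)) break
    fit <- fit_next
    m <- m + 1
  }
  list(fit = fit, m = m)
}

set.seed(1)
e <- runif(200, min = 0.05, max = 0.95)
o <- rbinom(200, size = 1, prob = e)
data <- data.frame(e = e, o = o, logite = qlogis(e))
res <- forward_poly_sketch(data, thres = 0.95, maxDeg = 4, startDeg = 1)
res$m
```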