Title: | Assessment of Diagnostic and Prognostic Markers |
---|---|
Description: | Provides estimations of the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC) based on the two-stages mixed-subjects ROC curve estimator (Diaz-Coto et al. (2020) <doi:10.1515/ijb-2019-0097> and Diaz-Coto et al. (2020) <doi:10.1080/00949655.2020.1736071>). |
Authors: | Susana Diaz-Coto [aut, cre] |
Maintainer: | Susana Diaz-Coto <[email protected]> |
License: | GPL |
Version: | 0.1.2 |
Built: | 2024-12-02 06:54:31 UTC |
Source: | CRAN |
Computation of confidence intervals for the AUC based on Bootstrap Percentile.
auc_ci_boot(marker, outcome, status, observed.time, left, right, time, data_type, meth, grid, probs, ci.cl, ci.nboots, parallel, ncpus, all)
marker |
vector with the biomarker values. |
outcome |
vector with the condition of the subjects as positive, negative or unknown at the considered time |
status |
response vector. |
observed.time |
vector with the observed times for each subject. |
left |
vector with the lower edges of the observed intervals. |
right |
vector with the upper edges of the observed intervals. |
time |
point of time at which the sMS ROC curve estimator will be computed. |
data_type |
scenario handled. |
meth |
method for approximating the predictive model |
grid |
grid size. |
probs |
vector containing the probabilities estimated through the predictive model. |
ci.cl |
confidence level at which the confidence intervals will be computed. |
ci.nboots |
number of bootstrap samples. |
parallel |
indicates whether parallel computing will be performed or not. |
ncpus |
number of CPUs to use if parallel computing is performed. |
all |
indicates whether the probabilities from the predictive model will be considered for all individuals, or only for those whose outcome value (condition) is unknown. |
List with two components:
ic.l |
lower edge of the confidence interval. |
ic.u |
upper edge of the confidence interval. |
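The percentile bootstrap idea behind this function can be sketched in base R for the simplest case, in which every subject's condition is known. This is only an illustration under that assumption: `empirical_auc()` and `boot_auc_ci()` are hypothetical helpers, not functions of the package, and the package itself works with the sMS ROC curve estimator rather than the plain empirical AUC used here.

```r
# Hypothetical helpers for illustration; not part of the package.
empirical_auc <- function(marker, outcome) {
  pos <- marker[outcome == 1]
  neg <- marker[outcome == 0]
  # Mann-Whitney form: P(pos > neg) + 0.5 * P(pos == neg)
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}

boot_auc_ci <- function(marker, outcome, ci.cl = 0.95, ci.nboots = 500) {
  n <- length(marker)
  # Resample subjects with replacement and recompute the AUC each time
  aucs <- replicate(ci.nboots, {
    idx <- sample.int(n, n, replace = TRUE)
    empirical_auc(marker[idx], outcome[idx])
  })
  alpha <- 1 - ci.cl
  # Percentile interval: empirical quantiles of the bootstrap AUCs
  q <- quantile(aucs, c(alpha / 2, 1 - alpha / 2), na.rm = TRUE, names = FALSE)
  list(ic.l = q[1], ic.u = q[2])
}

set.seed(1)
outcome <- rbinom(200, 1, 0.4)        # simulated condition (1 = positive)
marker <- rnorm(200, mean = outcome)  # positives have higher marker values
ci <- boot_auc_ci(marker, outcome, ci.nboots = 200)
```

The returned list mirrors the `ic.l` and `ic.u` components described above.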
Computation of confidence intervals for the AUC by implementing the empirical procedure for estimating the variance of the AUC, as described in doi:10.1515/ijb-2019-0097.
auc_ci_empr(SE, SP, auc, probs, controls, cases, ci.cl)
SE |
vector containing the values of the sensitivity returned from the |
SP |
vector containing the values of the specificity. |
auc |
value with the AUC estimate. |
probs |
vector containing the probabilities estimated through the predictive model. |
controls |
number of negative individuals. |
cases |
number of positive individuals. |
ci.cl |
confidence level at which confidence intervals will be computed. |
List with two components:
ic.l |
lower edge of the confidence interval. |
ic.u |
upper edge of the confidence interval. |
Computation of confidence intervals for the AUC by implementing the theoretical procedure for estimating the variance of the AUC, as described in doi:10.1515/ijb-2019-0097.
auc_ci_nvar(marker, outcome, status, observed.time, left, right, time, meth, data_type, grid, probs, sd.probs, ci.cl, nboots, SE, SP, auc, parallel, ncpus, all)
marker |
vector with the biomarker values. |
outcome |
vector with the condition of the subjects as positive, negative or unknown at the considered time |
status |
response vector. |
observed.time |
vector with the observed times for each subject. |
left |
vector with the lower edges of the observed intervals. |
right |
vector with the upper edges of the observed intervals. |
time |
point of time at which the sMS ROC curve estimator will be computed. |
meth |
method for approximating the predictive model |
data_type |
scenario handled. |
grid |
grid size. |
probs |
vector containing the probabilities estimated through the predictive model. |
sd.probs |
vector containing the standard deviation of the probabilities of the predictive model. |
ci.cl |
confidence level at which the confidence intervals will be computed. |
nboots |
number of bootstrap samples. |
SE |
vector containing the values of the sensitivity returned from |
SP |
vector containing the values of the specificity. |
auc |
value with the AUC estimate. |
parallel |
indicates whether parallel computing will be performed or not. |
ncpus |
number of CPUs to use if parallel computing is performed. |
all |
parameter indicating whether all probabilities given by the predictive model should be considered (value “T”) or just those corresponding to individuals whose condition as positive or negative is unknown (“F”). The default value is (“T”). |
List with two components:
ic.l |
lower edge of the confidence interval. |
ic.u |
upper edge of the confidence interval. |
Checks the validity of the value entered as confidence level for computing the confidence intervals for the AUC.
check_ci_cl(ci.cl)
ci.cl |
confidence level at which the confidence intervals for the AUC will be computed. |
Verifies that the value entered as confidence level ranges between 0 and 1. The 0.95 confidence level is taken as default.
A list with two components:
ci.cl |
value entered as confidence level for the AUC. |
message |
table with the warning messages generated by the function. |
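A check of this kind could look roughly as follows. The sketch is a hypothetical re-implementation, not the package source: the name `check_ci_cl_sketch()`, the wording of the warning, and the choice to fall back to 0.95 when the value is invalid are all assumptions.

```r
# Hypothetical sketch; not the package source.
check_ci_cl_sketch <- function(ci.cl = 0.95) {
  message <- character(0)
  # A valid confidence level is a single number strictly between 0 and 1
  valid <- is.numeric(ci.cl) && length(ci.cl) == 1 && !is.na(ci.cl) &&
    ci.cl > 0 && ci.cl < 1
  if (!valid) {
    # Assumed behaviour: record a warning and fall back to the default
    message <- c(message, "Invalid confidence level; using the default 0.95.")
    ci.cl <- 0.95
  }
  list(ci.cl = ci.cl, message = message)
}

ok <- check_ci_cl_sketch(0.9)
bad <- check_ci_cl_sketch(1.5)
```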
Check of the consistency of the parameters indicated to compute the confidence intervals for the AUC.
check_conf_int(conf.int, ci.cl, ci.meth, ci.nboots, parallel, ncpus)
conf.int |
parameter indicating whether confidence intervals for the AUC will be computed (“T”) or not (“F”). The default is “F”. |
ci.cl |
confidence level at which the confidence intervals for the AUC will be calculated. The default value is 0.95. |
ci.meth |
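method for computing the confidence intervals according to doi:10.1515/ijb-2019-0097. There are three options: "E" (empirical variance estimation), "V" (theoretical variance estimation) and "B" (bootstrap percentile).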
The empirical method E is taken as default. This parameter is ignored if |
ci.nboots |
number of bootstrap samples to be generated when the chosen |
parallel |
indicates whether parallel computing will be done (“T”) or not (“F”), when computing the variance of the AUC through the methods “V” and “B”. |
ncpus |
number of CPUs that will be used when parallel computing is chosen. |
A list with the following components:
ci.cl |
value entered as confidence level for the AUC. |
ci.meth |
value entered as method for computing the confidence intervals. |
ci.nboots |
value entered as number of bootstrap samples. |
ci.ncpus |
value entered as number of CPUs chosen. |
message |
table with the warning messages generated by the function. |
Checking of the parameter grid.
check_grid(grid)
grid |
grid size for computing the ROC curve estimate. The default value is 1000. |
Verifies that the value entered as grid is a numerical value greater than 0.
A list with two components:
grid |
grid size entered. |
message |
table with the warning messages generated by the function. |
Checks the consistency of the parameters entered for prognosis scenarios under interval censorship.
check_marker_timeic(marker, left, right, time, probs, sd.probs)
marker |
vector with the biomarker values. It is a mandatory parameter. |
left |
vector containing the lower edges of the observed intervals. It is mandatory in prognosis scenarios and interval censorship and ignored in other situations. |
right |
vector with the upper edges of the observed intervals. It is mandatory as well in prognosis scenarios and interval censorship and ignored in other situations. |
time |
point of time at which the sMS ROC curve estimator will be computed. It is a mandatory parameter. The default value is 1. |
probs |
vector containing the probabilities corresponding to the predictive model when it has been externally computed. Only values in [0,1] are admissible. |
sd.probs |
vector with the standard deviations of the probabilities entered in probs. |
The function returns a list with the following components:
marker |
vector containing the biomarker values. |
left |
vector containing the lower edges of the observed intervals. |
right |
vector containing the upper edges of the observed intervals. |
probs |
vector with the probabilities corresponding to the predictive model. |
sd.probs |
vector containing the standard deviations of the predictive model if they have been manually entered. |
outcome |
vector with the condition of the subjects at the considered time. |
controls |
number of negative subjects. |
cases |
number of positive subjects. |
misout |
number of subjects whose condition is unknown. |
message |
table containing the warning messages generated during the execution of the function. |
See also: check_marker_binout and check_marker_timerc
Checks the consistency of the parameters entered for diagnosis scenarios.
check_marker_binout(marker, status, probs, sd.probs)
marker |
vector with the biomarker values. It is a mandatory parameter. |
status |
numeric response vector. Only two values will be taken into account. The highest one is assumed to stand for the subjects having the event under study. The lowest value, for those who do not. Any other value will not be considered. It is a mandatory parameter in diagnosis scenarios. |
probs |
vector containing the probabilities corresponding to the predictive model when it has been externally computed. Only values in [0,1] are admissible. |
sd.probs |
vector with the standard deviations of the probabilities entered in probs. |
The output is a list with the following components:
marker |
vector containing the biomarker values. |
outcome |
vector with the condition of the subjects as positive or negative. |
probs |
vector with the probabilities corresponding to the predictive model. |
sd.probs |
vector containing the standard deviations of the predictive model if they have been manually entered. |
controls |
number of negative subjects. |
cases |
number of positive subjects. |
misout |
number of subjects whose outcome value is not known. |
message |
table containing the warning messages generated during the execution of the function. |
See also: check_marker_timerc and check_marker_timeic
Checks the consistency of the parameters entered for prognosis scenarios under right censorship.
check_marker_timerc(marker, status, observed.time, time, probs, sd.probs)
marker |
vector with the biomarker values. It is a mandatory parameter. |
status |
numeric response vector. Only two values will be taken into account. The highest one is assumed to stand for the subjects having the event under study. The lowest value, for those who do not. Any other value will not be considered. It is mandatory in prognosis scenarios and right censorship. |
observed.time |
vector with the observed times for each subject. These values can be the event times or the censoring times. It is mandatory when dealing with time-dependent outcomes under right censorship. |
time |
point of time at which the sMS ROC curve estimator will be computed. It is a mandatory parameter. The default value is 1. |
probs |
vector containing the probabilities corresponding to the predictive model when it has been externally computed. Only values in [0,1] are admissible. |
sd.probs |
vector with the standard deviations of the probabilities entered in probs. |
The function returns a list with the following components:
marker |
vector containing the biomarker values. |
status |
response vector. |
observed.time |
vector containing the observed times (event or censoring times). |
probs |
vector with the probabilities corresponding to the predictive model. |
sd.probs |
vector containing the standard deviations of the predictive model if they have been manually entered. |
outcome |
vector with the condition of the subjects at the time given in time. |
controls |
number of negative subjects. |
cases |
number of positive subjects. |
misout |
number of subjects whose condition is unknown. |
message |
table containing the warning messages generated during the execution of the function. |
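How the outcome, controls, cases and misout components relate to the inputs can be illustrated with a small base R sketch. The classification rule below is an assumption, not quoted from the package: a subject with an event observed by `time` is positive, a subject still under observation after `time` is negative, and a subject censored before `time` is unknown (coded here as `NA`). The helper `outcome_at_time()` is hypothetical.

```r
# Hypothetical helper; the rule is an assumed, commonly used convention.
outcome_at_time <- function(status, observed.time, time = 1) {
  outcome <- rep(NA_integer_, length(status))           # NA encodes "unknown"
  outcome[observed.time > time] <- 0L                   # negative at `time`
  outcome[observed.time <= time & status == 1] <- 1L    # positive at `time`
  outcome
}

status <- c(1, 0, 1, 0)                  # 1 = event, 0 = censored
observed.time <- c(0.5, 0.8, 2.0, 3.0)
out <- outcome_at_time(status, observed.time, time = 1)
counts <- list(controls = sum(out == 0, na.rm = TRUE),
               cases = sum(out == 1, na.rm = TRUE),
               misout = sum(is.na(out)))
```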
See also: check_marker_binout and check_marker_timeic
When the predictive model is entered manually by the user, this function ensures that no other method by default is used to compute it.
check_meth(meth, probs)
meth |
method for approximating the predictive model |
probs |
vector containing the probabilities corresponding to the predictive model when it is entered manually by the user. |
If the predictive model has been manually indicated, this function sets the meth parameter to "M", ignoring other options. In this way, none of the functions computing the predictive model will be called.
A list with one component:
meth |
value set up for the method parameter. It can take either the value entered by the user or its default, if the predictive model was not manually indicated, or the value "M", when the predictive model was entered in the parameter probs. |
This function checks if the value entered as number of bootstrap samples is correct.
check_nboots(nboots)
nboots |
number of bootstrap samples to be run. The default value is 500. |
A list with two components:
nboots |
value entered as number of bootstrap samples. |
message |
table with the warning messages generated by the function. |
Checks the number of CPUs to be used when parallel computing is performed. The default value is 1 and the maximum is 2.
check_ncpus(ncpus)
ncpus |
number of CPUs to be used when performing parallel computing. |
A list with two components:
ncpus |
value entered as number of CPUs chosen. |
message |
table with the warning messages generated by the function. |
Checking of the parameter time.
check_tim(time)
time |
point of time at which the ROC curve estimate in prognosis scenarios will be computed. It is mandatory in this scenario. |
A list with two components:
time |
value entered as time. |
message |
table with the warning messages generated by the function. |
Determines the type of scenario handled: diagnosis or prognosis, under right or interval censorship, according to the parameters entered by the user.
check_type_outcome(status, observed.time, left, right)
status |
numeric response vector. Only two values will be taken into account. The highest one is assumed to stand for the subjects having the event under study. The lowest value, for those who do not. Any other value will not be considered. |
observed.time |
vector with the observed times for each subject, when dealing with time-dependent outcomes under right censorship. These values can be the event times or the censoring times. |
left |
vector containing the lower edges of the observed intervals. It is mandatory when dealing with prognosis scenarios and interval censorship, and will be ignored in other situations. |
right |
vector with the upper edges of the observed intervals. It is mandatory as well in prognosis scenarios and interval censorship and ignored in other situations. |
If both the status and observed.time vectors are provided, the function assumes a prognosis scenario under right censorship. When only the status vector is entered, a diagnosis scenario is set up. If neither of these parameters is provided but left and right are, a prognosis scenario under interval censorship is assumed. In any other case, the function is not able to determine the type of scenario.
A list with a single component:
type.outcome |
string of length 6 with the following values:
|
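The decision rule described above can be sketched as a small base R function. This is an illustration only: `detect_scenario()` is a hypothetical name, and the returned labels are descriptive placeholders rather than the values the package actually returns in type.outcome.

```r
# Hypothetical sketch; labels are illustrative, not the package's return values.
detect_scenario <- function(status = NULL, observed.time = NULL,
                            left = NULL, right = NULL) {
  has <- function(x) !is.null(x)
  if (has(status) && has(observed.time)) {
    "prognosis, right censorship"
  } else if (has(status)) {
    "diagnosis"
  } else if (!has(observed.time) && has(left) && has(right)) {
    "prognosis, interval censorship"
  } else {
    "undetermined"
  }
}

s1 <- detect_scenario(status = c(0, 1))
s2 <- detect_scenario(status = c(0, 1), observed.time = c(2, 3))
s3 <- detect_scenario(left = c(0, 1), right = c(2, Inf))
s4 <- detect_scenario(left = c(0, 1))
```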
Computes the weighted empirical ROC curve estimator associated to the input biomarker.
compute_ROC(marker, probs, grid)
marker |
vector with the biomarker values. |
probs |
vector containing the probabilities corresponding to the predictive model. |
grid |
grid size. |
This function computes the weighted empirical estimators for the sensitivity (SE) and specificity (SP) using as weights the probabilities given by the predictive model. Then, the ROC curve is approximated through linear interpolation of 1 - SP and SE and computed at a partition of the interval [0, 1] of size grid.
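A minimal base R sketch of a weighted empirical ROC estimator of this kind is given below. It assumes that larger marker values indicate the positive condition, that each subject is weighted by `probs` in the sensitivity and by `1 - probs` in the specificity, and that the AUC is obtained with the trapezoidal rule. The function `weighted_roc()` is hypothetical and the package's implementation may differ in its details.

```r
# Hypothetical sketch of a weighted empirical ROC curve; not the package source.
weighted_roc <- function(marker, probs, grid = 1000) {
  ord <- order(marker)
  marker <- marker[ord]
  probs <- probs[ord]
  cuts <- c(-Inf, marker)  # candidate thresholds, lowest to highest
  # Weighted sensitivity: share of "positive weight" above each threshold
  SE <- sapply(cuts, function(th) sum(probs[marker > th]) / sum(probs))
  # Weighted specificity: share of "negative weight" at or below each threshold
  SP <- sapply(cuts, function(th) sum((1 - probs)[marker <= th]) / sum(1 - probs))
  u <- seq(0, 1, length.out = grid)
  # Linear interpolation of the points (1 - SP, SE); at vertical jumps keep the upper value
  ROC <- approx(x = 1 - SP, y = SE, xout = u, ties = max)$y
  # Trapezoidal rule over the grid
  auc <- sum(diff(u) * (head(ROC, -1) + tail(ROC, -1)) / 2)
  list(SE = SE, SP = SP, u = u, ROC = ROC, auc = auc)
}

# With 0/1 weights and perfectly separated groups the AUC is 1
roc_demo <- weighted_roc(marker = 1:4, probs = c(0, 0, 1, 1), grid = 100)

set.seed(3)
roc_rand <- weighted_roc(marker = rnorm(50), probs = runif(50), grid = 100)
```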
The returned value is a list with the following components:
SE |
vector with the weighted empirical estimator of the sensitivity. |
SP |
vector with the weighted empirical estimator of the specificity. |
u |
vector containing the points between 0 and 1 at which the ROC curve estimator will be computed. Its size is determined by the grid parameter. |
ROC |
ROC curve approximated at each point of the vector u. |
auc |
area under the weighted empirical ROC curve estimator. |
marker |
vector with the ordered biomarker values. |
probs |
vector with the probabilities of the predictive model corresponding to each biomarker value. |
See also: sMSbinout, sMStimerc and sMStimeic
Prints the AUC estimate value and its confidence intervals computed by the sMSROC function.
conf_int_print(sMS)
sMS |
object of class sMSROC. |
This function reads the AUC, lower and upper edges of its confidence intervals and the confidence level at which they were computed and prints this information in a single line.
Printed string in the console containing the AUC, its confidence intervals and the confidence level at which they were computed.
See also: sMSROC
data(diabet)
roc <- sMSROC(marker = diabet$stab.glu, status = diabet$glyhb, conf.int = "T")
conf_int_print(roc)
This dataset contains part of the Diabetes Dataset (see References), courtesy of Dr John Schorling from the Department of Medicine, University of Virginia School of Medicine. This version contains 3 variables on 403 subjects interviewed to understand the prevalence of several cardiovascular risk factors in central Virginia for African Americans.
data("diabet")
A data frame with 403 observations on the following variables.
stab.glu
a numeric vector indicating the level of stabilized glucose.
glyhb
a numeric vector indicating the level of glycosylated hemoglobin.
age
age in years of the participants.
diab
a numeric vector indicating whether the subject is diagnosed as diabetic (value = 1) or not (value = 0).
The diab variable is not present in the original dataset. Here, values of glycosylated hemoglobin > 7.0 were taken as a positive diagnosis of diabetes (diab = 1) and those of glycosylated hemoglobin <= 7.0 as a negative diagnosis (diab = 0).
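The dichotomization rule can be written in one line of R. The values below are made up for illustration and are not taken from the dataset; missing values are simply propagated, which is an assumption about how they are handled.

```r
# Illustrative values only; not taken from the diabet dataset.
glyhb <- c(4.3, 7.0, 7.1, 12.2, NA)
# diab = 1 when glyhb > 7.0, diab = 0 when glyhb <= 7.0; NA stays NA
diab <- as.numeric(glyhb > 7.0)
```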
The full dataset can be downloaded from https://hbiostat.org/data.
Willems JP, Saunders JT, Hunt DE, Schorling JB. Prevalence of coronary heart disease risk factors among rural blacks: a community-based study. South Med J. 1997 Aug;90(8):814-20. PMID: 9258308.
data(diabet)
summary(diabet)
Plots, in prognosis scenarios, the areas under the ROC curves computed by the sMSROC estimator for a sequence of times.
evol_auc(marker, status, observed.time, left, right, time = 1, meth = c("L", "S", "E"), grid = 500)
marker |
vector with the biomarker values. |
status |
numeric response vector. Only two values will be taken into account. The highest one is assumed to stand for the subjects having the event under study. The lowest value, for those who do not. Any other value will not be considered. It is mandatory in prognosis scenarios and right censorship. |
observed.time |
vector with the observed times for each subject, for prognosis scenarios under right censorship. Notice that these values may be the event times or the censoring times. |
left |
vector containing the lower edges of the observed intervals. It is mandatory in prognosis scenarios under interval censorship and ignored in other situations. |
right |
vector with the upper edges of the observed intervals. It is mandatory in prognosis scenarios under interval censorship and ignored in other situations. Infinity is an admissible value (indicated as Inf). |
time |
vector of times at which the sMS ROC curve estimator will be computed. The default value is 1. |
meth |
method for approximating the predictive model
|
grid |
grid size for computing the AUC. Default value 500. |
This function calls the sMSROC function at each of the times indicated in the vector time, and the AUC is computed according to the parameters indicated.
A list with the following components:
evol.auc |
object of class |
time |
vector with the ordered values of the time parameter. |
auc |
vector with the values of the AUCs computed at the times indicated in the time parameter. |
See also: sMSROC
# Example of the use of the evol_auc function
data(ktfs)
DT <- ktfs
aucs <- evol_auc(marker = DT$score, status = DT$failure, observed.time = DT$time, time = seq(2:3), meth = "E")
aucs$evol.auc
Plots the kernel density estimations of the biomarker distributions on positive and negative individuals.
explore_plot(marker, status, observed.time, left, right, time)
marker |
vector with the biomarker values. |
status |
numeric response vector. The highest value is assumed to stand for the subjects having the event under study. The lowest value, for those who do not. Any other value will not be considered. |
observed.time |
vector with the observed times for each subject, for prognosis scenarios under right censorship. Notice that these values may be the event times or the censoring times. |
left |
vector containing the lower edges of the observed intervals. It is mandatory in prognosis scenarios under interval censorship and ignored in other situations. |
right |
vector with the upper edges of the observed intervals. It is mandatory in prognosis scenarios under interval censorship and ignored in other situations. Infinity is an admissible value (indicated as Inf). |
time |
point of time at which the sMS ROC curve estimator will be computed. The default value is 1. |
The output is a list with three components:
plot |
object of class |
neg |
vector with the biomarker values on negative individuals. |
pos |
vector with the marker values on positive individuals. |
See also: explore_table
data(diabet)
explore_plot(marker = diabet$stab.glu, status = diabet$diab)
This function provides descriptive statistics for the pooled sample and for the samples of positive individuals, negative individuals and individuals whose condition is unknown.
explore_table(marker, status, observed.time, left, right, time, d, ...)
marker |
vector with the biomarker values. It is a mandatory parameter. |
status |
numeric response vector. The highest value is assumed to stand for the subjects having the event under study. The lowest value, for those who do not. Any other value will not be considered. |
observed.time |
vector with the observed times for each subject, for prognosis scenarios under right censorship. Notice that these values may be the event times or the censoring times. |
left |
vector containing the lower edges of the observed intervals. It is mandatory in prognosis scenarios under interval censorship and ignored in other situations. |
right |
vector with the upper edges of the observed intervals. It is mandatory in prognosis scenarios under interval censorship and ignored in other situations. Infinity is an admissible value (indicated as Inf). |
time |
point of time at which the sMS ROC curve estimator will be computed. The default value is 1. |
d |
number of decimal figures to which the results will be rounded. |
... |
additional parameters of the |
The function computes the following descriptive statistics for the pooled sample and the samples of the different groups of individuals: minimum, maximum, mean, variance, standard deviation, and first, second and third quartiles.
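A summary matrix of this shape can be sketched in base R as follows. The helper `describe_groups()` is hypothetical, and it assumes the condition is coded as 1 (positive), 0 (negative) and `NA` (unknown); the package's own coding and output format may differ.

```r
# Hypothetical sketch; assumes outcome is coded 1 / 0 / NA.
describe_groups <- function(marker, outcome, d = 2) {
  stats <- function(x) {
    q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
    c(min = min(x), max = max(x), mean = mean(x), var = var(x), sd = sd(x),
      q1 = q[1], median = q[2], q3 = q[3])
  }
  groups <- list(pooled = marker,
                 positive = marker[which(outcome == 1)],
                 negative = marker[which(outcome == 0)],
                 unknown = marker[is.na(outcome)])
  # One row per sample, one column per statistic, rounded to d decimals
  round(t(sapply(groups, stats)), d)
}

marker <- c(1, 2, 3, 4, 5, 6, 7, 8)
outcome <- c(0, 0, 0, 1, 1, 1, NA, NA)
tab <- describe_groups(marker, outcome)
```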
The output is a list with two components:
summary |
matrix whose columns are the statistics described above and the rows show the corresponding results for each sample. |
table |
object of class |
See also: explore_plot
data(diabet)
explore_table(marker = diabet$stab.glu, status = diabet$diab)
Synthetic dataset generated to simulate data from a study that aimed to assess the predictive ability of a constructed score to determine the worsening in the fibrosis stage in individuals infected by the hepatitis C (HC) virus. When participants underwent a target revision, their fibrosis stage, certain polymorphisms, and other clinical variables were collected. Highest stages of fibrosis were considered a worsening in the disease. See References for more information about the study.
data("fibrosis")
A data frame with 722 observations and the following variables:
Id
Identification label for each participant.
Score
Score proposed to stratify participants infected by the HC virus according to the risk of a worsening in their fibrosis stage.
Start
Lower edge of the observable interval.
Stop
Upper edge of the observable interval. It can take the value infinity, represented as Inf.
Synthetic dataset.
Vidal-Castineira JR et al. Genetic contribution of endoplasmic reticulum aminopeptidase 1 polymorphisms to liver fibrosis progression in patients with HCV infection. Journal of Molecular Medicine, 98:1245-1254, 2020. doi:10.1007/s00109-020-01948-1
data(fibrosis)
summary(fibrosis)
Dataset originally delivered in the RISCA package. It contains data from kidney transplant recipients for whom the Kidney Transplant Failure Score (KTFS) was collected. The KTFS is a score proposed by Foucher et al. (2010) (see References) to assess the recipients according to their risk of returning to dialysis.
data("ktfs")
A data frame with 2169 observations and the following 3 variables:
time
a numeric vector depicting the follow-up time in years.
failure
a numeric vector indicating the graft failure at the end of the follow-up (1-Yes, 0-Censoring).
score
a numeric vector representing the KTFS value.
This dataset is available in the RISCA package. More information about the KTFS score can be found at https://www.divat.fr.
Foucher Y et al. A clinical scoring system highly predictive of long-term kidney graft survival. Kidney International, 78:1288-94, 2010. doi:10.1038/ki.2010.232.
data(ktfs)
summary(ktfs)
Estimation of the predictive models in diagnosis scenarios.
pred_model_binout(marker, status, meth)
marker |
vector with the biomarker values. |
status |
numeric response vector. The highest value is assumed to stand for the subjects having the event under study. The lowest value, for those who do not. Any other value will not be considered. |
meth |
method for approximating the predictive model
|
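If meth = “L”, the logit transformation of the predictive model is approximated by a linear logistic regression model:
$\operatorname{logit}\{P(D = 1 \mid X = x)\} = \beta_0 + \beta_1 x$, with $\beta_0, \beta_1 \in \mathbb{R}$.
If meth = “S”, the logit transformation of the predictive model is estimated by the smooth logistic regression,
$\operatorname{logit}\{P(D = 1 \mid X = x)\} = s(x)$, being $s(\cdot)$ the smooth function (splines, doi:10.1002/sim.4780080504).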
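Both options can be illustrated with `glm()` on simulated data. This is a sketch under stated assumptions, not the package's implementation: the simulated data, the use of a natural cubic spline basis (`splines::ns`) with 4 degrees of freedom for the smooth fit, and the component names `probs_l` and `probs_s` are all choices made for the example.

```r
# Simulated data for illustration only.
set.seed(42)
marker <- rnorm(300)
status <- rbinom(300, 1, plogis(-0.5 + 1.2 * marker))

# meth = "L": linear logistic regression on the marker
fit_l <- glm(status ~ marker, family = binomial)

# meth = "S": a smooth alternative; the spline basis and df are assumptions
fit_s <- glm(status ~ splines::ns(marker, df = 4), family = binomial)

# Ordered marker values with the estimated probabilities for each model
ord <- order(marker)
pred <- list(marker = marker[ord],
             probs_l = unname(fitted(fit_l)[ord]),
             probs_s = unname(fitted(fit_s)[ord]))
```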
The returned value is a list with two components:
marker |
vector containing the ordered marker values. |
probs |
vector with the probabilities corresponding to each marker value estimated through the predictive model. |
See also: sMS_binout and sMSROC
Naive estimation of the predictive model.
pred_model_emp(marker, status)
marker |
vector with the biomarker values. |
status |
numeric response vector. The highest value is assumed to stand for the subjects having the event under study. The lowest value, for those who do not. Any other value will not be considered. |
This method for estimating the predictive model is used in both diagnosis and prognosis scenarios. It assigns each individual their own observed condition, positive or negative. Individuals whose condition is unknown are discarded.
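In other words, the estimated probability is 1 for a positive subject and 0 for a negative one. A minimal sketch, assuming the condition is coded as 1 (positive), 0 (negative) and `NA` (unknown); `naive_model()` is a hypothetical helper, not the package function:

```r
# Hypothetical sketch; assumes outcome is coded 1 / 0 / NA.
naive_model <- function(marker, outcome) {
  keep <- !is.na(outcome)          # subjects with unknown condition are dropped
  ord <- order(marker[keep])
  list(marker = marker[keep][ord],
       probs = as.numeric(outcome[keep][ord] == 1))
}

nm <- naive_model(marker = c(3, 1, 2), outcome = c(1, 0, NA))
```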
The returned value is a list with two components:
marker |
vector containing the ordered marker values. |
probs |
vector with the probabilities corresponding to each marker value estimated through the predictive model. |
See also: sMS_binout, sMS_timerc, sMS_timeic and sMSROC
Estimation of the predictive model in prognosis scenarios under interval censorship.
pred_model_timeic(marker, left, right, outcome, time, meth)
marker |
vector with the biomarker values. |
left |
vector containing the lower edges of the observed intervals. It is mandatory in prognosis scenarios and interval censorship and ignored in other situations. |
right |
vector with the upper edges of the observed intervals. It is mandatory as well in prognosis scenarios and interval censorship and ignored in other situations. |
outcome |
vector with the condition of the subjects as positive, negative or unknown at the considered time |
time |
point of time at which the sMS ROC curve estimator will be computed. |
meth |
method for approximating the predictive model
|
If meth = “L”, the event times are assumed to come from a Cox proportional hazards regression model and the predictive model is estimated as indicated in doi:10.1080/00949655.2020.1736071.
where and
, being L and R the random variables that stand for the edges of the observable interval containing the event time.
If meth = “S”, the approximation is done by
being the survival function at time t given the marker value, estimated through a proportional hazard model for interval censored data according to doi:10.2307/2530698.
The returned value is a list with three components:
marker |
vector containing the ordered marker values. |
probs |
vector with the probabilities corresponding to each marker value estimated through the predictive model. |
outcome |
vector with the condition of the subjects as positive, negative or censored at the considered time |
See also: sMS_timeic and sMSROC
Estimation of the predictive model in prognosis scenarios under right censorship.
pred_model_timerc(marker, status, observed.time, outcome, time, meth)
marker |
vector with the biomarker values. |
status |
numeric response vector. The highest value is assumed to stand for the subjects having the event under study. The lowest value, for those who do not. Any other value will not be considered. |
observed.time |
vector with the observed times for each subject. Notice that these values may be the event times or the censoring times. |
outcome |
vector with the status of the subjects as positive, negative or censored (unknown) at the considered time |
time |
point of time at which the sMS ROC curve estimator will be computed. |
meth |
method for approximating the predictive model
|
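If meth = “L”, the event times are assumed to come from a Cox proportional hazards regression model:
$\lambda(t \mid x) = \lambda_0(t) \exp(\beta x)$, where $\lambda_0(t)$ is the baseline hazard function and $\beta \in \mathbb{R}$.
If meth = “S”, the approximation is done by
$\lambda(t \mid x) = \lambda_0(t) \exp(s(x))$, being $s(\cdot)$ the smooth function (penalized splines, doi:10.1111/1467-9868.00125).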
The returned value is a list with three components:
marker |
vector containing the ordered marker values. |
probs |
vector with the probabilities corresponding to each marker value estimated through the predictive model. |
outcome |
vector with the status of the subjects as positive, negative or censored at the considered time |
See also: sMS_timerc and sMSROC
Prints the estimated AUC and the probabilistic model used to compute the predictive model.
## S3 method for class 'sMSROC'
print(x, ...)
x |
object of class sMSROC. |
... |
Ignored. |
This function prints the estimated area under the ROC curve computed through the sMSROC estimator and the probabilistic model used to compute the predictive model.
Printed output in the console containing the information described above.
sMSROC
data(diabet)
roc <- sMSROC(marker = diabet$stab.glu, status = diabet$glyhb, conf.int = "T")
print(roc)
This function plots the predicted probabilities for each marker value computed through the predictive model, together with 95% pointwise confidence intervals.
probs_pred(sMS, var, nboots, parallel, ncpus)
sMS |
object of class sMSROC. |
var |
parameter indicating whether 95% pointwise confidence intervals for the predictive model will be plotted (value "T") or not (value "F"). The default value is "F". |
nboots |
number of bootstrap samples to be generated for computing the pointwise confidence intervals. The default value is 500. |
parallel |
parameter indicating whether parallel computing will be performed (value "T") or not (value "F"). The default is “F”. |
ncpus |
number of CPUs to be used when parallel computing is carried out. The default value is 1 and the maximum is 2. |
The function plots the estimated probability function of the predictive model against the biomarker. It also computes and plots 95% pointwise confidence intervals on the same graphic when the var parameter is set to "T".
The variance of the probability estimates obtained from the predictive model is computed via bootstrap with nboots samples.
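The bootstrap step can be illustrated with a minimal base R sketch. It is not the package's internal code: the data are simulated, a plain logistic regression stands in for the predictive model, and the normal-approximation bands at the end are an assumption about how pointwise intervals could be built from the bootstrap standard deviations.

```r
# Bootstrap estimate of the standard deviation of predicted probabilities.
# Simulated data and a logistic model are stand-ins (assumptions of this sketch).
set.seed(123)
n <- 150
marker <- rnorm(n)
status <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * marker))
dat <- data.frame(marker = marker, status = status)

# Ordered marker values at which the probabilities are evaluated
thres <- data.frame(marker = sort(marker))

# Point estimates from the model fitted to the full sample
fit <- glm(status ~ marker, family = binomial, data = dat)
probs <- predict(fit, newdata = thres, type = "response")

# Refit the model on each bootstrap sample and predict on the same grid
nboots <- 200
boot.probs <- replicate(nboots, {
  idx <- sample(n, replace = TRUE)
  boot.fit <- glm(status ~ marker, family = binomial, data = dat[idx, ])
  predict(boot.fit, newdata = thres, type = "response")
})

# One standard deviation per marker value (rows of the n x nboots matrix)
sd.probs <- apply(boot.probs, 1, sd)

# 95% pointwise bands by normal approximation, truncated to [0, 1]
lower <- pmax(0, probs - qnorm(0.975) * sd.probs)
upper <- pmin(1, probs + qnorm(0.975) * sd.probs)
```

A larger nboots gives more stable standard deviations at the cost of computing time, which is where the parallel and ncpus parameters become relevant.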
A list with these components:
plot |
object of class |
thres |
ordered biomarker values (x-axis coordinates). |
probs |
predicted probabilities (y-axis coordinates). |
sd.probs |
estimates of the standard deviation of the predicted probabilities. |
pred_model_binout, pred_model_timerc and pred_model_timeic
data(ktfs)
DT <- ktfs
roc <- sMSROC(marker = DT$score, status = DT$failure, observed.time = DT$time, time = 5, meth = "S")
probs <- probs_pred(sMS = roc)
probs$plot
Wrap function for computing the sMS estimator in diagnosis scenarios.
sMS_binout(marker, status, meth, grid, probs, all)
marker |
vector with the biomarker values. |
status |
numeric response vector. |
meth |
method for approximating the predictive model |
grid |
grid size. |
probs |
vector with the probabilities from the predictive model when it is manually entered. |
all |
parameter indicating whether all probabilities given by the predictive model should be considered (value “T”) or just those corresponding to individuals whose condition as positive or negative is unknown (“F”). The default value is (“T”). |
The function obtains the probabilities corresponding to the predictive model (first stage of the sMS ROC curve estimator). If they were not manually entered, the function pred_model_emp or pred_model_binout is called, depending on the chosen meth. Then, it calls the function computeROC to compute the weighted empirical ROC curve estimator (second stage).
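The second stage can be sketched in a few lines of base R. The helper below is hypothetical (weighted_roc is not a package function) and assumes that each subject contributes to the positive group with weight equal to its predicted probability and to the negative group with the complementary weight. It also assumes no ties in the marker.

```r
# Hypothetical helper: weighted empirical ROC curve from marker values and
# first-stage probabilities (assumed weighting scheme, not the package code).
weighted_roc <- function(marker, probs) {
  ord <- order(marker)
  p <- probs[ord]
  # Sensitivity at threshold x_k: weighted share of positives with marker > x_k
  SE <- 1 - cumsum(p) / sum(p)
  # Specificity at threshold x_k: weighted share of negatives with marker <= x_k
  SP <- cumsum(1 - p) / sum(1 - p)
  # ROC points (1 - SP, SE), completed with (1, 1) and sorted increasingly
  fpr <- rev(c(1, 1 - SP))
  tpr <- rev(c(1, SE))
  # Area under the curve by the trapezoidal rule
  auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
  list(thres = marker[ord], SE = SE, SP = SP, auc = auc)
}

# Probabilities equal to 0 or 1 reproduce the usual empirical ROC curve
res <- weighted_roc(marker = 1:6, probs = c(0, 0, 0, 1, 1, 1))
res$auc
```

With all probabilities equal to 0 or 1 the weights reduce to group indicators, so the sketch collapses to the ordinary empirical estimator. Intermediate probabilities let subjects with unknown condition contribute to both groups.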
The returned value is a list with the following components:
SE |
vector with the weighted empirical estimator of the sensitivity. |
SP |
vector with the weighted empirical estimator of the specificity. |
u |
vector containing the points between 0 and 1 at which the ROC curve estimator will be computed. Its size is determined by the grid parameter. |
ROC |
ROC curve approximated at each point of the vector u. |
auc |
area under the weighted empirical ROC curve estimator. |
marker |
vector with the ordered biomarker values. |
probs |
vector with the probabilities of the predictive model corresponding to each biomarker value. |
pred_model_emp, pred_model_binout, computeROC, sMS_timerc and sMS_timeic
Wrap function for computing the sMS estimator in prognosis scenarios under interval censorship.
sMS_timeic(marker, left, right, outcome, time, meth, grid, probs, all)
marker |
vector with the biomarker values. |
left |
vector containing the lower edges of the observed intervals. |
right |
vector with the upper edges of the observed intervals. Infinity is an admissible value (indicated as inf). |
outcome |
vector containing the condition of the individuals as positive, negative or censored at the considered time. |
time |
point of time at which the sMSROC curve estimator will be computed. |
meth |
method for approximating the predictive model. |
grid |
grid size. |
probs |
vector with the probabilities from the predictive model when it is manually entered. |
all |
parameter indicating whether all probabilities given by the predictive model should be considered (value “T”) or just those corresponding to individuals whose condition as positive or negative is unknown (“F”). The default value is (“T”). |
This function gets the probabilities corresponding to the predictive model (first stage of the sMS ROC curve estimator). If they were not manually entered, the function pred_model_emp or pred_model_timeic is called, depending on the chosen meth. Then, it calls the function computeROC to compute the weighted empirical ROC curve estimator (second stage).
The returned value is a list with the following components:
SE |
vector with the weighted empirical estimator of the sensitivity. |
SP |
vector with the weighted empirical estimator of the specificity. |
u |
vector containing the points between 0 and 1 at which the ROC curve estimator will be computed. Its size is determined by the grid parameter. |
ROC |
ROC curve approximated at each point of the vector u. |
auc |
area under the weighted empirical ROC curve estimator. |
marker |
vector with the ordered biomarker values. |
outcome |
vector with the condition of the individuals at the considered time. |
probs |
vector with the probabilities of the predictive model corresponding to each biomarker value. |
pred_model_emp, pred_model_binout, computeROC, sMS_binout and sMS_timerc
Wrap function for computing the sMS estimator in prognosis scenarios under right censorship.
sMS_timerc(marker, status, observed.time, outcome, time, meth, grid, probs, all)
marker |
vector with the biomarker values. |
status |
numeric response vector. |
observed.time |
vector with the observed times. These values may be the event times or the censoring times. |
outcome |
vector containing the condition of the individuals as positive, negative or censored (unknown) at the considered time. |
time |
point of time at which the sMSROC curve estimator will be computed. |
meth |
method for approximating the predictive model |
grid |
grid size. |
probs |
vector with the probabilities from the predictive model when it is manually entered. |
all |
parameter indicating whether all probabilities given by the predictive model should be considered (value “T”) or just those corresponding to individuals whose condition as positive or negative is unknown (“F”). The default value is (“T”). |
This function gets the probabilities corresponding to the predictive model (first stage of the sMS ROC curve estimator). If they were not manually entered, the function pred_model_emp or pred_model_timerc is called, depending on the chosen meth. Then, it calls the function computeROC to compute the weighted empirical ROC curve estimator (second stage).
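How the outcome vector and the all parameter interact with the first-stage probabilities can be illustrated as follows. This is a sketch under assumptions: outcome_at_time is a hypothetical helper, the character labels are chosen here for readability, status is coded 1 for the event and 0 for censoring, and the last step shows one reading of all = “F”, where model probabilities are used only for subjects whose condition is unknown.

```r
# Hypothetical helper: condition of each subject at a given time under
# right censorship (labels and coding are assumptions of this sketch).
outcome_at_time <- function(observed.time, status, time) {
  ifelse(observed.time <= time & status == 1, "positive",  # event seen by `time`
         ifelse(observed.time > time, "negative",          # still event-free
                "unknown"))                                # censored before `time`
}

observed.time <- c(2, 3, 7, 4, 9)
status <- c(1, 0, 1, 0, 0)
outcome <- outcome_at_time(observed.time, status, time = 5)

# Probabilities from some predictive model (made-up values for illustration)
model.probs <- c(0.9, 0.4, 0.2, 0.6, 0.1)

# Known positives get 1, known negatives get 0, and only subjects with
# unknown condition keep their model probability
probs.mixed <- ifelse(outcome == "positive", 1,
                      ifelse(outcome == "negative", 0, model.probs))
```

Under all = “T” the vector model.probs would be used for every subject instead of probs.mixed.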
The returned value is a list with the following components:
SE |
vector with the weighted empirical estimator of the sensitivity. |
SP |
vector with the weighted empirical estimator of the specificity. |
u |
vector containing the points between 0 and 1 at which the ROC curve estimator will be computed. Its size is determined by the grid parameter. |
ROC |
ROC curve approximated at each point of the vector u. |
auc |
area under the weighted empirical ROC curve estimator. |
marker |
vector with the ordered biomarker values. |
outcome |
vector with the condition of the individuals at the considered time. |
probs |
vector with the probabilities of the predictive model corresponding to each biomarker value. |
pred_model_emp, pred_model_binout, computeROC, sMS_timerc and sMS_timeic
Core function for computing the sMS ROC estimator which fits the estimation of the ROC curve when the outcome of interest is time-dependent (prognosis scenarios) and when it is not (diagnosis scenarios).
sMSROC(marker, status, observed.time, left, right, time, meth, grid, probs, sd.probs, conf.int, ci.cl, ci.meth, ci.nboots, parallel, ncpus, all)
marker |
vector with the biomarker values. |
status |
numeric response vector. The highest value is assumed to stand for the subjects having the event under study. The lowest one, for those who do not. Any other value will not be considered. It is a mandatory parameter in diagnosis scenarios. |
observed.time |
vector with the observed times for each subject, for prognosis scenarios under right censorship. Notice that these values may be the event times or the censoring times. |
left |
vector containing the lower edges of the observed intervals. It is mandatory in prognosis scenarios under interval censorship and ignored in other situations. |
right |
vector with the upper edges of the observed intervals. It is mandatory in prognosis scenarios under interval censorship and ignored in other situations. Infinity is an admissible value (indicated as inf). |
time |
point of time at which the sMS ROC curve estimator will be computed. The default value is 1. |
meth |
method for approximating the predictive model. |
probs |
vector containing the probabilities corresponding to the predictive model when it has been externally computed. Only values within [0,1] are admissible. |
sd.probs |
vector with the standard deviations of the probabilities entered in probs. |
grid |
grid size for computing the AUC. Default value 1000. |
conf.int |
indicates whether a confidence interval for the AUC will be computed (“T”) or not (“F”). The default value is “F”. |
ci.cl |
confidence level at which the confidence interval for the AUC will be provided. The default value is 95%. This parameter is ignored when conf.int is “F”. |
ci.meth |
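method for computing the confidence interval for the AUC. There are three options: “E” (empirical variance estimate), “V” (theoretical variance approximation) and “B” (bootstrap percentile). |
ci.nboots |
number of bootstrap samples to be run when the bootstrap method is selected in ci.meth. |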
parallel |
indicates whether parallel computing will be done (“T”) or not (“F”) when computing the variance of the AUC through the methods “V” and “B”. |
ncpus |
number of CPUs that will be used when parallel computing is chosen. The default value is 1 and the maximum is 2. |
all |
parameter indicating whether all probabilities given by the predictive model should be considered (value “T”) or just those corresponding to individuals whose condition as positive or negative is unknown (“F”). The default value is (“T”). |
The two-stage mixed-subjects (sMS) ROC curve estimator links diagnosis and prognosis scenarios through a general predictive model (first stage) and the weighted empirical estimator of the cumulative distribution function of the biomarker (second stage).
The predictive model depicts the relationship between the biomarker and the binary response variable. It is approximated through the most suitable probabilistic model.
For diagnosis scenarios:
If meth = “L”, the logit transformation of the predictive model is approximated by a linear logistic regression model in the marker.
If meth = “S”, the logit transformation of the predictive model is estimated by a smooth logistic regression, in which a smooth function of the marker (splines, doi:10.1002/sim.4780080504) takes the place of the linear term.
Note that the predictive model makes it possible to compute the probability of being positive or negative even when the actual group membership is unknown.
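A minimal sketch of the diagnosis case with meth = “L”, using base R only. It is an illustration under assumptions rather than the package's internal code: the data are simulated, and an unknown condition is coded as NA here, which may differ from the coding the package expects.

```r
# Linear logistic regression as predictive model for a binary condition.
# Simulated data; NA marks subjects whose condition is unknown (assumption).
set.seed(42)
n <- 200
marker <- rnorm(n, mean = 5, sd = 2)
status <- rbinom(n, size = 1, prob = plogis(-5 + marker))
status[sample(n, 40)] <- NA

# Subjects with unknown condition are dropped when fitting (default na.action)
fit <- glm(status ~ marker, family = binomial)

# The fitted model still yields a probability for every subject,
# including those whose condition is unknown
probs <- predict(fit, newdata = data.frame(marker = marker), type = "response")
summary(probs[is.na(status)])
```

The vector probs plays the role of the first-stage probabilities: every subject receives a value in [0, 1] regardless of whether its condition was observed.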
For prognosis scenarios and right censorship:
If meth = “L”, the event times are assumed to come from a Cox proportional hazards regression model, specified through a baseline hazard function and a regression term in the marker.
If meth = “S”, the approximation is done with a smooth function of the marker (penalized splines, doi:10.1111/1467-9868.00125).
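For the right-censored case with meth = “L”, the idea can be sketched with the survival package, which provides coxph(), Surv() and survfit(). The data are simulated and the way the probabilities are extracted is an assumption of this sketch, not necessarily what the package does internally.

```r
library(survival)

# Simulated right-censored data (illustrative only)
set.seed(7)
n <- 200
marker <- rnorm(n)
event.time <- rexp(n, rate = 0.2 * exp(0.8 * marker))
cens.time <- rexp(n, rate = 0.1)
dat <- data.frame(marker = marker,
                  observed.time = pmin(event.time, cens.time),
                  status = as.numeric(event.time <= cens.time))

# Cox proportional hazards model with a linear effect of the marker
fit <- coxph(Surv(observed.time, status) ~ marker, data = dat)

# Predicted survival curves, one per ordered marker value
grid <- data.frame(marker = sort(dat$marker))
sf <- survfit(fit, newdata = grid)

# Probability of having the event by time t = 5 given the marker value
surv.t <- summary(sf, times = 5, extend = TRUE)$surv
probs <- 1 - as.numeric(surv.t)
```

Because the model is proportional hazards with a single covariate, probs is monotone in the marker, increasing when the estimated coefficient is positive.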
Finally, for prognosis scenarios and interval censorship:
If meth = “L”, the event times are assumed to come from a Cox proportional hazards regression model and the predictive model is estimated as indicated in doi:10.1080/00949655.2020.1736071, where L and R are the random variables that stand for the edges of the observable interval containing the event time.
If meth = “S”, the approximation is based on the survival function at time t given the marker value, estimated through a proportional hazards model for interval-censored data according to doi:10.2307/2530698.
The confidence intervals for the AUC can be computed in three different ways according to the parameter ci.meth. When it is set to "E", the variance of the AUC is estimated by the empirical procedure; when the chosen option is "V", the theoretical approximation is used (see doi:10.1515/ijb-2019-0097). The third option is the bootstrap percentile.
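The bootstrap percentile option can be illustrated with base R. The sketch below is not the package's implementation: it resamples subjects with replacement and, for simplicity, recomputes the ordinary empirical AUC (Mann-Whitney form) instead of the sMS estimator. The helper emp_auc and the simulated data are assumptions; the names ci.cl and ci.nboots mirror the parameters described above.

```r
# Hypothetical helper: empirical AUC, counting ties as one half
emp_auc <- function(marker, status) {
  pos <- marker[status == 1]
  neg <- marker[status == 0]
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}

# Simulated diagnosis data (illustrative only)
set.seed(2024)
n <- 120
status <- rbinom(n, size = 1, prob = 0.4)
marker <- rnorm(n, mean = status)

ci.cl <- 0.95
ci.nboots <- 500

# AUC recomputed on each bootstrap sample of subjects
boot.auc <- replicate(ci.nboots, {
  idx <- sample(n, replace = TRUE)
  emp_auc(marker[idx], status[idx])
})

# Percentile interval: empirical quantiles of the bootstrap AUC values
alpha <- 1 - ci.cl
auc.ci <- quantile(boot.auc, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
auc.ci
```

The percentile interval needs no variance estimate, which is why it is a natural fallback when neither the empirical nor the theoretical variance is convenient.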
The output is an object of class sMSROC with the following components:
thres |
vector containing the biomarker values for which sensitivity and specificity were computed. |
SE |
vector with the estimates of the sensitivity. |
SP |
vector with the estimates of the specificity. |
probs |
vector with the probabilities corresponding to the predictive model. |
u |
vector containing the points between 0 and 1 at which the ROC curve estimator will be computed. Its size is determined by the grid parameter. |
ROC |
ROC curve approximated at each point of the vector u. |
auc |
area under the sMS ROC curve estimator. |
auc.ci.l |
lower edge of the confidence interval for the AUC. |
auc.ci.u |
upper edge of the confidence interval for the AUC. |
ci.cl |
confidence level at which the confidence interval for the AUC was computed. |
ci.meth |
method chosen for computing the confidence interval for the AUC. |
time |
point of time at which the sMS ROC curve estimator was computed in prognosis scenarios. |
data |
list containing several parameters used in the internal functions, when applicable. |
message |
table containing the warning messages generated during the execution of the function. |
S. Díaz-Coto, P. Martínez-Camblor, and N. O. Corral-Blanco. Cumulative/dynamic ROC curve estimation under interval censorship. Journal of Statistical Computation and Simulation, 90(9):1570–1590, 2020. doi:10.1080/00949655.2020.1736071.
S. Díaz-Coto, N. O. Corral-Blanco, and P. Martínez-Camblor. Two-stage receiver operating-characteristic curve estimator for cohort studies. The International Journal of Biostatistics, 17:117–137, 2021. doi:10.1515/ijb-2019-0097.
D. M. Finkelstein. A proportional hazards model for interval-censored failure time data. Biometrics, 42(4):845–854, 1986. doi:10.2307/2530698.
S. Durrleman and R. Simon. Flexible regression models with cubic splines. Statistics in Medicine, 8(5):551–561, 1989. doi:10.1002/sim.4780080504.
C. M. Hurvich, J. S. Simonoff, and C.-L. Tsai. Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. Journal of the Royal Statistical Society: Series B, 60(2):271–293, 1998. doi:10.1111/1467-9868.00125.
B. Efron and R. J. Tibshirani. An Introduction to the Bootstrap. CRC press, 1994.
data(ktfs)
DT <- ktfs
sROC <- sMSROC(marker = DT$score, status = DT$failure, observed.time = DT$time, time = 5, meth = "L", conf.int = "T", ci.cl = 0.90, ci.meth = "E")
Provides informative plots of the sMS ROC curve estimates.
sMSROC_plot(sMS, m.value)
sMS |
object of class sMSROC. |
m.value |
marker value. It is an optional parameter that, when indicated, adds over the graphic of the ROC curve, the point which corresponds to that marker value. |
The function provides two types of graphics:
A basic plot approximating the ROC curve by the pairs given by the sequences 1 - SP and SE from the sMSROC object. The layers geom_roc() and roc_style() from the plotROC package are added to this plot, which makes it possible to take advantage of the functionality of that package.
A customized graphic of the ROC curve, of class ggplot, obtained by approximating the sequences 1 - SP and SE. When the parameter m.value is indicated, the final plot displays over the ROC curve estimate the point that corresponds to the entered value.
A list with the following elements:
basic.plot |
object that can be used and customized by the tools from the plotROC package. |
roc.plot |
object of class ggplot. |
sMSROC
# Example of the use of the sMSROC_plot function
data(ktfs)
DT <- ktfs
ROC <- sMSROC(marker = DT$score, status = DT$failure, observed.time = DT$time, time = 5, meth = "S")
plot <- sMSROC_plot(sMS = ROC, m.value = 4.2)
plot$basicplot; plot$rocplot
Estimation of the variance of the predictive model by bootstrap.
variance_probs(marker, outcome, status, observed.time, left, right, time, meth, data_type, grid, probs, ci.nboots, parallel, ncpus, all)
marker |
vector with the biomarker values. |
outcome |
vector with the condition of the subjects as positive, negative or unknown at the considered time |
status |
response vector with the outcome values. The highest one is assumed to stand for the subjects having the event under study. |
observed.time |
vector with the observed times for each subject. |
left |
vector with the lower edges of the observed intervals. |
right |
vector with the upper edges of the observed intervals. |
time |
point of time at which the sMS ROC curve estimator will be computed. |
meth |
method for approximating the predictive model |
data_type |
scenario handled. |
grid |
grid size. |
probs |
vector containing the probabilities estimated through the predictive model. |
ci.nboots |
number of bootstrap samples. |
parallel |
indicates whether parallel computing will be done or not. |
ncpus |
number of CPUs to use if parallel computing is performed. |
all |
indicates whether the probabilities from the predictive model will be considered for all individuals, or only for those whose outcome value (condition) is unknown. |
List with a single component:
sd.probs |
vector containing the standard deviation of the probabilities of the predictive model. |