When evaluating the performance of a diagnostic test, it is important to control both the True Positive Rate (TPR) and the False Positive Rate (FPR). Most of the literature assesses a binary classification system with the partial area under the ROC curve (pAUC) restricted on FPR, referred to here as FPR pAUC. FPR pAUC can be artificially adapted to measure the area controlled by TPR, but the result is often misleading both conceptually and practically. Yang et al. (2016) provide a new and intuitive measure, named two-way pAUC, which focuses directly on the partial area under the ROC curve with both horizontal and vertical restrictions. This package estimates two-way pAUC with the non-parametric method of Yang et al. (2016). It also includes estimation and inference for FPR partial AUC and FNR partial ODC, using the algorithms proposed in Yang et al. (2017) (see Methodology for details).
The ROC curve is a well-established graphical tool for evaluating how accurately a classifier discriminates between subjects from different populations (e.g., diseased and healthy individuals). Let $F$ and $G$ be the distribution functions of random variables $X$ and $Y$ corresponding to independent populations. Let $G^{-1}(t) = \inf\{y : G(y) \ge t\}$ be the quantile function of $G$, $0 < t < 1$. Let $S_F(t) = 1 - F(t)$ and $S_G(t) = 1 - G(t)$ be the corresponding survival functions. For $t \in (0, 1)$, the ROC curve is defined as $ROC(t) = 1 - F\{G^{-1}(1 - t)\}$ or, equivalently, $ROC(t) = S_F\{S_G^{-1}(t)\}$, where $t$ is the value of FPR and $S_G^{-1}(t) = G^{-1}(1 - t)$. The ROC curve is not a convenient tool for comparisons, in particular when two ROC curves cross. A summary measure can be obtained by integrating the ROC curve over the range of FPR values, which gives the area under the ROC curve, $AUC = \int_{0}^{1} ROC(t)\,dt = \int_{\infty}^{-\infty} S_F(u)\,dS_G(u)$. For economic and practical purposes, it is common to hold the FPR at a low level. When interest is restricted to a sub-region of the ROC space, the partial area under the ROC curve, $pAUC(P_0) = \int_{0}^{P_0} ROC(p)\,dp$ for a threshold value of FPR $P_0 \in (0, 1)$, provides a useful summary measure.
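As a quick numerical check of these definitions, the following base R sketch evaluates an empirical version of $ROC(t) = S_F\{S_G^{-1}(t)\}$ on toy data and compares its integral with the Mann-Whitney form of the AUC. The helper names (`surv_emp`, `surv_inv`, `roc_emp`) and the data are illustrative assumptions and are not part of tpAUC; the empirical survival function is assumed to be the proportion of observations strictly greater than the argument.

```r
# Toy samples (hypothetical): x from the diseased population, y from the healthy one
x <- c(3, 5, 7, 9)
y <- c(1, 2, 4, 6, 8)

# Empirical survival function of `obs`, evaluated at each point of `at`
surv_emp <- function(obs, at) vapply(at, function(v) mean(obs > v), numeric(1))

# Generalized inverse inf{x : t >= S(x)}; for 0 < t < 1 it is attained
# at an order statistic of the sample
surv_inv <- function(obs, t) {
  s <- sort(obs)
  s[which(surv_emp(obs, s) <= t)[1]]
}

# Empirical ROC curve: ROC(t) = S_F{S_G^{-1}(t)}
roc_emp <- function(x, y, t) {
  vapply(t, function(tt) mean(x > surv_inv(y, tt)), numeric(1))
}

# The empirical ROC is a step function that is constant between consecutive
# multiples of 1/n, so averaging its values at the interval midpoints
# integrates it exactly over (0, 1)
n <- length(y)
auc_step <- mean(roc_emp(x, y, (seq_len(n) - 0.5) / n))

# Mann-Whitney form: proportion of pairs with X > Y (no ties in this toy data)
auc_mw <- mean(outer(x, y, ">"))

print(c(step = auc_step, mann_whitney = auc_mw))
```

With untied data the two numbers coincide, which is the sample analogue of the identity between the integral of the ROC curve and $P(X > Y)$.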
Let $X = \{X_i, i = 1, \ldots, m\}$ and $Y = \{Y_j, j = 1, \ldots, n\}$ be random samples from the distribution functions $F(x)$ and $G(y)$, respectively. A Mann-Whitney nonparametric method for pAUC is (method = 'WM')
$$
\widehat{pAUC}(P_0 ) = \frac{1}{mn}\sum^{m}_{i=1}\sum^{n}_{j=1}I(X_i\ge
Y_j)I\{Y_j\ge S^{-1}_{G,n}(P_0)\},
$$ where $S^{-1}_{G,n}(t) = \inf\{x \in \mathbb{R} : t \ge S_{G,n}(x)\}$, and $S_{F,m}(\cdot)$ and $S_{G,n}(\cdot)$ are estimators of $S_F$ and $S_G$ based on the empirical distributions.
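The Mann-Whitney estimator above can be written in a few lines of base R. This is a minimal sketch under stated assumptions, not the package's internal code: `surv_emp`, `surv_inv` and `pauc_wm` are hypothetical names, and $S_{G,n}(x)$ is taken to be the proportion of $Y_j$ strictly greater than $x$, matching the leave-one-out definition used for the jackknife method.

```r
# Empirical survival function of `obs`, evaluated at each point of `at`
surv_emp <- function(obs, at) vapply(at, function(v) mean(obs > v), numeric(1))

# S^{-1}_{G,n}(t) = inf{x : t >= S_{G,n}(x)}, attained at an order statistic
surv_inv <- function(obs, t) {
  s <- sort(obs)
  s[which(surv_emp(obs, s) <= t)[1]]
}

# Double sum of I(X_i >= Y_j) * I{Y_j >= S^{-1}_{G,n}(P0)}, divided by m * n
pauc_wm <- function(x, y, p0) {
  y_cut <- surv_inv(y, p0)
  mean(outer(x, y, function(xi, yj) (xi >= yj) & (yj >= y_cut)))
}

# Toy data (hypothetical): x diseased, y healthy
x <- c(3, 5, 7, 9)
y <- c(1, 2, 4, 6, 8)
print(pauc_wm(x, y, p0 = 0.5))
```

Here `outer` builds the full $m \times n$ matrix of indicators, which is fine for moderate sample sizes; a sorted-merge approach would scale better for large ones.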
Wang and Chang (2011) propose the following method (method = 'expect'), $$\widetilde{pAUC}(P_0) = P_{0} - \frac{1}{m} \sum_{i=1}^{m} \min \{ S_{G, n}(X_{i}), P_{0}\}.$$
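The formula above translates directly into vectorised base R. The sketch below is illustrative only: `surv_emp` and `pauc_expect` are hypothetical helper names, and $S_{G,n}(x)$ is assumed to be the proportion of $Y_j$ strictly greater than $x$.

```r
# Empirical survival function of `obs`, evaluated at each point of `at`
surv_emp <- function(obs, at) vapply(at, function(v) mean(obs > v), numeric(1))

# P0 - (1/m) * sum_i min{S_{G,n}(X_i), P0}
pauc_expect <- function(x, y, p0) {
  p0 - mean(pmin(surv_emp(y, x), p0))
}

# Toy data (hypothetical): x diseased, y healthy
x <- c(3, 5, 7, 9)
y <- c(1, 2, 4, 6, 8)
print(pauc_expect(x, y, p0 = 0.5))
```

Because the estimator only needs $S_{G,n}$ evaluated at the $m$ points $X_i$, it avoids the $m \times n$ indicator matrix of the Mann-Whitney form.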
Yang et al. (2017) propose a jackknife method (method = 'jackknife') based on $\widetilde{pAUC}(P_0)$. In particular, $$
\widetilde{pAUC}_{jack}(P_0)=\frac{1}{n+m}\sum^{n+m}_{h=1}{V}_h(P_0),
$$ where $$
{V}_h(P_0)=(n+m)\widetilde{pAUC}(P_0 )-(n+m-1)\widetilde{pAUC}_h(P_0),
$$ and $$
\widetilde{pAUC}_h(P_0)=\left\{\begin{array}{ll}
P_{0} - \frac{1}{m-1} \sum \limits_{ i \ne h }^m \min \{ S_{G,
n}(X_{i}), P_{0}\} & ~\text{ $1 \leq h \leq m$}\\
P_{0} - \frac{1}{m} \sum \limits_{ i=1 }^m \min \{ S_{G,n-1,
h-m}(X_{i}), P_{0}\} & ~ \text{$m+1 \leq h \leq m+n,$}
\end{array} \right.
$$ where $$S_{G,n-1,
h-m}(X_{i})=\frac{1}{n-1} \overset{n}{\underset{j=1, j \neq
h-m}{\sum}}I(Y_j>X_{i}). $$
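The pseudo-values $V_h(P_0)$ can be computed by deleting one observation at a time, first from $X$ and then from $Y$. The following base R sketch follows the formulas above; the function names are hypothetical and the code is not taken from the package.

```r
# Empirical survival function of `obs`, evaluated at each point of `at`
surv_emp <- function(obs, at) vapply(at, function(v) mean(obs > v), numeric(1))

# 'expect' estimator: P0 - (1/m) * sum_i min{S_{G,n}(X_i), P0}
pauc_expect <- function(x, y, p0) p0 - mean(pmin(surv_emp(y, x), p0))

pauc_jack <- function(x, y, p0) {
  m <- length(x)
  n <- length(y)
  N <- m + n
  full <- pauc_expect(x, y, p0)
  # Leave-one-out estimates: h = 1..m drops X_h, h = m+1..m+n drops Y_{h-m}
  loo <- c(
    vapply(seq_len(m), function(h) pauc_expect(x[-h], y, p0), numeric(1)),
    vapply(seq_len(n), function(k) pauc_expect(x, y[-k], p0), numeric(1))
  )
  pseudo <- N * full - (N - 1) * loo   # V_h(P0)
  list(estimate = mean(pseudo), pseudo = pseudo)
}

# Toy data (hypothetical): x diseased, y healthy
x <- c(3, 5, 7, 9)
y <- c(1, 2, 4, 6, 8)
res <- pauc_jack(x, y, p0 = 0.5)
print(res$estimate)
```

Returning the pseudo-values alongside the estimate is a design choice of this sketch: the same vector is what a jackknife variance estimator would be built from.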
The ordinal dominance curve (ODC), introduced by Bamber (1973), describes the association between the true negative rate (TNR) and the false negative rate (FNR): $ODC(t) = G\{F^{-1}(t)\}$, where $t \in (0, 1)$. The area under the ODC, $\int_{0}^{1} ODC(t)\,dt = \int_{-\infty}^{\infty} G(u)\,dF(u)$, is a commonly used summary measure. A partial area under the ODC (pODC) from $0$ to $P_0$ is taken as $pODC(P_0) = \int_{0}^{P_0} ODC(t)\,dt$.
A Mann-Whitney nonparametric method for pODC is (method = 'WM') $$\widehat{pODC}(P_0)=\frac{1}{mn}\sum^{m}_{i=1}\sum^{n}_{j=1}I(Y_j\le X_i)I\{X_i\le F^{-1}_{m}(P_0)\},$$ where $F^{-1}_{m}(P_0)$ is an empirical quantile estimate at $P_0$, and $F_m(\cdot)$ and $G_n(\cdot)$ are the empirical distribution functions corresponding to $F(\cdot)$ and $G(\cdot)$.
Yang et al. (2017) propose the following method (method = 'expect'), $$\widetilde{pODC}(P_0) = P_{0} - \frac{1}{n} \sum_{j=1}^{n} \min \{ F_m(Y_{j}), P_{0}\}.$$
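Both pODC estimators can be sketched with base R's `ecdf` and `quantile`. The function names below are hypothetical, and one assumption is worth flagging: the empirical quantile $F^{-1}_{m}(P_0)$ is taken to be the inverse of the empirical distribution function (`type = 1` in `quantile`), by analogy with the quantile function defined for $G$; the package may use a different convention.

```r
# Mann-Whitney form: double sum of I(Y_j <= X_i) * I{X_i <= F_m^{-1}(P0)} / (m * n)
podc_wm <- function(x, y, p0) {
  # Assumption: inverse-ECDF quantile, inf{x : F_m(x) >= P0}
  x_cut <- quantile(x, probs = p0, type = 1, names = FALSE)
  mean(outer(x, y, function(xi, yj) (yj <= xi) & (xi <= x_cut)))
}

# 'expect' form: P0 - (1/n) * sum_j min{F_m(Y_j), P0}
podc_expect <- function(x, y, p0) {
  p0 - mean(pmin(ecdf(x)(y), p0))
}

# Toy data (hypothetical): x diseased, y healthy
x <- c(3, 5, 7, 9)
y <- c(1, 2, 4, 6, 8)
print(c(WM = podc_wm(x, y, 0.5), expect = podc_expect(x, y, 0.5)))
```

`ecdf(x)` returns a function, so `ecdf(x)(y)` evaluates $F_m$ at every $Y_j$ in one call.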
Yang et al. (2017) propose a jackknife method (method = 'jackknife') based on $\widetilde{pODC}(P_0)$. In particular, $$
\widetilde{pODC}_{jack}(P_0)=\frac{1}{n+m}\sum^{n+m}_{h=1}\check{U}_h(P_0),
$$ where $$
\check{U}_h(P_0)=(n+m)\widetilde{pODC}(P_0)-(n+m-1)\widetilde{pODC}_h(P_0)
$$ and $$
\widetilde{pODC}_h(P_0)=\left\{\begin{array}{ll}
P_{0} - \frac{1}{n-1} \sum \limits_{j \ne h}^n \min \{ F_{m}(Y_{j}),
P_{0}\} & ~\text{ $1 \leq h \leq n$}\\
P_{0} - \frac{1}{n} \sum \limits_{j = 1 }^n \min \{F_{m-1, h-n}(Y_{j}),
P_{0}\} & ~ \text{$n+1 \leq h \leq m+n,$}
\end{array} \right.
$$ where $$F_{m-1,
h-n}(Y_{j})=\frac{1}{m-1} \overset{m}{\underset{i=1, i \neq
h-n}{\sum}}I(X_{i} \leq Y_j). $$
The definition and estimation of two-way pAUC are intuitive. Given bounds $p_0$ and $q_0$, two-way pAUC is formulated as $$U(p_0, q_0) = \int_{S_G\{S_F^{-1}(q_0)\}}^{p_0} S_F\{S_G^{-1}(u)\}\,du - [p_0 - S_G\{S_F^{-1}(q_0)\}]\,q_0.$$ Alternatively, from a probability perspective, $U(p_0, q_0)$ can be written as $P\{Y < X, X \le S_F^{-1}(q_0), Y \ge S_G^{-1}(p_0)\}$. A trimmed Mann-Whitney U-statistic estimator that follows directly from this expression is $$ \hat{U}(p_0, q_0) = \frac{1}{mn}\sum^{m}_{i=1}\sum^{n}_{j=1}V_{i,j} (p_0 , q_0), $$ where $V_{i,j}(p_0, q_0) = I\{Y_j \le X_i, X_i \le S^{-1}_{F,m}(q_0), Y_j \ge S^{-1}_{G,n}(p_0)\}$.
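The trimmed Mann-Whitney estimator can be sketched in base R as follows. The helper names are hypothetical, the code is not the implementation behind `tproc.est`, and both empirical survival functions are assumed to use a strict inequality (the proportion of observations greater than the argument).

```r
# Empirical survival function of `obs`, evaluated at each point of `at`
surv_emp <- function(obs, at) vapply(at, function(v) mean(obs > v), numeric(1))

# Generalized inverse inf{x : t >= S(x)}, attained at an order statistic
surv_inv <- function(obs, t) {
  s <- sort(obs)
  s[which(surv_emp(obs, s) <= t)[1]]
}

# Average of V_ij = I{Y_j <= X_i, X_i <= S_{F,m}^{-1}(q0), Y_j >= S_{G,n}^{-1}(p0)}
twoway_pauc <- function(x, y, p0, q0) {
  x_cut <- surv_inv(x, q0)
  y_cut <- surv_inv(y, p0)
  mean(outer(x, y, function(xi, yj) (yj <= xi) & (xi <= x_cut) & (yj >= y_cut)))
}

# Toy data (hypothetical): x diseased, y healthy
x <- c(3, 5, 7, 9)
y <- c(1, 2, 4, 6, 8)
print(twoway_pauc(x, y, p0 = 0.5, q0 = 0.6))
```

Only pairs that survive both trimming conditions contribute, so the estimate shrinks quickly as $p_0$ decreases or $q_0$ increases.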
Yang et al. (2017) prove that, under certain conditions, $$
\sqrt{m+n}\{\widehat{pAUC}(P_0)-pAUC(P_0)\}\stackrel{d}{\to}N\left\{0,\frac{\sigma^{2}_{1}(P_0)}{\lambda}+\frac{\sigma^{2}_{2}(P_0)}{1-\lambda}\right\}, \quad m,n\to \infty,
$$ where $\frac{m}{m+n}\to \lambda$, $$
\sigma^{2}_{1}(P_0) = \int_{+\infty}^{S_G^{-1}(P_0)} \{P_0 - S_G(t)\}^2\,dS_F(t) - \left\{\int_{+\infty}^{S_G^{-1}(P_0)} S_F(t)\,dS_G(t)\right\}^2,
$$ and $$
\sigma^{2}_{2}(P_0) = \int_{+\infty}^{S_G^{-1}(P_0)} [S_F(t) - S_F\{S_G^{-1}(P_0)\}]^2\,dS_G(t) - \left(\int_{+\infty}^{S_G^{-1}(P_0)} [S_F(t) - S_F\{S_G^{-1}(P_0)\}]\,dS_G(t)\right)^2.
$$
Moreover, under the same conditions, $$\sqrt{m+n}\{\widetilde{pAUC}(P_0)-pAUC(P_0)\}\stackrel{d}{\to}N\left\{0,\frac{\sigma^{2}_{1}(P_0)}{\lambda}+\frac{\sigma^{2}_{2}(P_0)}{1-\lambda}\right\}, \quad m,n\to \infty,$$ and $$
\sqrt{m+n}\{\widetilde{pAUC}_{jack}(P_0)-pAUC(P_0)\}\stackrel{d}{\to}N\left\{0,\frac{\sigma^{2}_{1}(P_0)}{\lambda}+\frac{\sigma^{2}_{2}(P_0)}{1-\lambda}\right\}, \quad m,n\to \infty.
$$
Consider the jackknife variance estimator $$S^2_{\widetilde{pAUC}}(P_0)={(m+n)}^{-1}\sum^{m+n}_{h=1}\{{V}_h(P_0)-\widetilde{pAUC}_{jack}(P_0)\}^2.$$
Yang et al. (2017) prove that $$
S^2_{\widetilde{pAUC}}(P_0)=\frac{\sigma^{2}_{1}(P_0)}{\lambda}+\frac{\sigma^{2}_{2}(P_0)}{1-\lambda}+o_p(1).
$$ Therefore, $$
\frac{\sqrt{m+n}\{\widetilde{pAUC}_{jack}(P_0)-pAUC(P_0)\}}{\sqrt{S^2_{\widetilde{pAUC}}(P_0)}}\stackrel{d}{\to}N(0,1).
$$
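The studentized limit above suggests a normal-approximation confidence interval of the form $\widetilde{pAUC}_{jack}(P_0) \pm z_{1-\alpha/2}\sqrt{S^2_{\widetilde{pAUC}}(P_0)/(m+n)}$. The base R sketch below builds such an interval from the jackknife pseudo-values. It is an illustration under the stated formulas, not the code behind `proc.ci`; the function names, the `level` argument and the simulated data are assumptions.

```r
# Empirical survival function of `obs`, evaluated at each point of `at`
surv_emp <- function(obs, at) vapply(at, function(v) mean(obs > v), numeric(1))

# 'expect' estimator: P0 - (1/m) * sum_i min{S_{G,n}(X_i), P0}
pauc_expect <- function(x, y, p0) p0 - mean(pmin(surv_emp(y, x), p0))

pauc_jack_ci <- function(x, y, p0, level = 0.95) {
  m <- length(x)
  n <- length(y)
  N <- m + n
  full <- pauc_expect(x, y, p0)
  # Leave-one-out estimates: first drop each X_h, then each Y_{h-m}
  loo <- c(
    vapply(seq_len(m), function(h) pauc_expect(x[-h], y, p0), numeric(1)),
    vapply(seq_len(n), function(k) pauc_expect(x, y[-k], p0), numeric(1))
  )
  pseudo <- N * full - (N - 1) * loo        # V_h(P0)
  est <- mean(pseudo)                       # jackknife estimate
  s2 <- mean((pseudo - est)^2)              # jackknife variance estimator S^2
  half <- qnorm(1 - (1 - level) / 2) * sqrt(s2 / N)
  c(estimate = est, lower = est - half, upper = est + half)
}

# Simulated data (hypothetical): diseased scores shifted upwards by 1
set.seed(1)
x <- rnorm(30, mean = 1)
y <- rnorm(40)
ci <- pauc_jack_ci(x, y, p0 = 0.3)
print(ci)
```

The interval is not truncated to $[0, P_0]$, so with small samples its end points can fall outside the range of attainable pAUC values.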
For the ODC, we have $$ \sqrt{m+n}\{\widehat{pODC}(P_0)-pODC(P_0)\}\stackrel{d}{\to}N\left\{0,\frac{\sigma^{2}_{3}(P_0)}{1-\lambda}+\frac{\sigma^{2}_{4}(P_0)}{\lambda}\right\}, \quad m,n\to \infty, $$ where $$
\sigma^{2}_{3}(P_0) = \int_{-\infty}^{F^{-1}(P_0)} \{P_0 - F(t)\}^2\,dG(t) - \left\{\int_{-\infty}^{F^{-1}(P_0)} G(t)\,dF(t)\right\}^2,
$$ and $$
\sigma^{2}_{4}(P_0) = \int_{-\infty}^{F^{-1}(P_0)} [G(t) - G\{F^{-1}(P_0)\}]^2\,dF(t) - \left(\int_{-\infty}^{F^{-1}(P_0)} [G(t) - G\{F^{-1}(P_0)\}]\,dF(t)\right)^2.
$$ Similarly, $$ \sqrt{m+n}\{\widetilde{pODC}(P_0)-pODC(P_0)\}\stackrel{d}{\to}N\left\{0,\frac{\sigma^{2}_{3}(P_0)}{1- \lambda}+\frac{\sigma^{2}_{4}(P_0)}{\lambda}\right\}, $$ and $$\sqrt{m+n}\{\widetilde{pODC}_{jack}(P_0)-pODC(P_0)\}\stackrel{d}{\to}N\left\{0,\frac{\sigma^{2}_{3}(P_0)}{1- \lambda}+\frac{\sigma^{2}_{4}(P_0)}{\lambda}\right\}.$$ The jackknife variance estimator satisfies $$ \begin{align*} S^2_{\widetilde{pODC}}(P_0)& ={(m+n)}^{-1}\sum^{m+n}_{h=1}\{\check{U}_h(P_0)-\widetilde{pODC}_{jack}(P_0)\}^2\\ &=\frac{\sigma^{2}_{3}(P_0)}{1-\lambda}+\frac{\sigma^{2}_{4}(P_0)}{\lambda}+o_p(1), \end{align*} $$ and therefore $$ \frac{\sqrt{m+n}\{\widetilde{pODC}_{jack}(P_0)-pODC(P_0)\}}{\sqrt{S^2_{\widetilde{pODC}}(P_0)}}\stackrel{d}{\to}N(0,1). $$
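The pODC jackknife and its variance estimator can be sketched in the same way. Note the ordering of the pseudo-values $\check{U}_h(P_0)$: the first $n$ delete one $Y_j$ and the remaining $m$ delete one $X_i$. The code below is a hedged base R illustration, not the implementation behind `podc.ci`; the function names and the `level` argument are assumptions.

```r
# 'expect' estimator of pODC: P0 - (1/n) * sum_j min{F_m(Y_j), P0}
podc_expect <- function(x, y, p0) p0 - mean(pmin(ecdf(x)(y), p0))

podc_jack_ci <- function(x, y, p0, level = 0.95) {
  m <- length(x)
  n <- length(y)
  N <- m + n
  full <- podc_expect(x, y, p0)
  # Leave-one-out estimates: h = 1..n drops Y_h, h = n+1..n+m drops X_{h-n}
  loo <- c(
    vapply(seq_len(n), function(h) podc_expect(x, y[-h], p0), numeric(1)),
    vapply(seq_len(m), function(k) podc_expect(x[-k], y, p0), numeric(1))
  )
  pseudo <- N * full - (N - 1) * loo        # pseudo-values U_h(P0)
  est <- mean(pseudo)                       # jackknife estimate
  s2 <- mean((pseudo - est)^2)              # jackknife variance estimator S^2
  half <- qnorm(1 - (1 - level) / 2) * sqrt(s2 / N)
  c(estimate = est, lower = est - half, upper = est + half)
}

# Simulated data (hypothetical): diseased scores shifted upwards by 1
set.seed(2)
x <- rnorm(30, mean = 1)
y <- rnorm(40)
ci_odc <- podc_jack_ci(x, y, p0 = 0.3, level = 0.97)
print(ci_odc)
```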
From Yang et al. (2016), we have, under certain conditions, $$ \sqrt{m+n}\{\hat{U}(p_0,q_0)-U(p_0,q_0)\}\stackrel{d}{\to}N\left\{0,\frac{\sigma^2_5}{\lambda}+\frac{\sigma^2_6}{1-\lambda}\right\}, \quad \text{as } \;\; m,n\to \infty, $$ where $$ \begin{align} \sigma^2_5= &F\{G^{-1}(1-p_0)\}[G\{F^{-1}(1-q_0)\}-(1-p_0)]^2+\int^{F^{-1}(1-q_0)}_{G^{-1}(1-p_0)}[G\{F^{-1}(1-q_0)\}-G(t)]^2dF(t)\nonumber\\ &-\left\{\int^{F^{-1}(1-q_0)}_{G^{-1}(1-p_0)}F(t)dG(t)\right\}^2\nonumber, \end{align} $$ and $$ \begin{align} \sigma^2_6=&[1-q_0-F\{G^{-1}(1-p_0)\}]^2(1-p_0)\nonumber+ \int^{F^{-1}(1-q_0)}_{G^{-1}(1-p_0)}\{1-q_0-F(t)\}^2dG(t) \\ \nonumber & -\left\{\int^{F^{-1}(1-q_0)}_{G^{-1}(1-p_0)}G(t)dF(t)\right\}^2\nonumber. \end{align} $$
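This package contains the following functions:

- tproc.est: estimates two-way partial AUC.
- proc: estimates FPR partial AUC and constructs its confidence interval in one call; available methods are 'WM', 'expect' and 'jackknife'.
- proc.est: estimates FPR partial AUC; available methods are 'WM', 'expect' and 'jackknife'.
- proc.ci: constructs a confidence interval for FPR partial AUC; available methods are 'WM', 'expect' and 'jackknife'.
- podc: estimates FNR partial ODC and constructs its confidence interval in one call; available methods are 'WM', 'expect' and 'jackknife'.
- podc.est: estimates FNR partial ODC; available methods are 'WM', 'expect' and 'jackknife'.
- podc.ci: constructs a confidence interval for FNR partial ODC; available methods are 'WM', 'expect' and 'jackknife'.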
The purpose of this section is to show users the basic usage of this package. We briefly go through the main functions, see what they can do, and look at their outputs. A detailed example of the complete estimation and inference procedure is presented to give users a general sense of the package.
First, we load the tpAUC package:
library(tpAUC)
Then, we estimate two-way partial AUC with data from the pROC package.
library(pROC)
data(aSAH)
tproc.est(aSAH$outcome, aSAH$s100b, threshold = c(0.8, 0.2))
## Type 'citation("pROC")' for a citation.
##
## Attaching package: 'pROC'
## The following objects are masked from 'package:stats':
##
## cov, smooth, var
## Warning in tproc.est(aSAH$outcome, aSAH$s100b, threshold = c(0.8, 0.2)):
## response levels are not 0/1, the first level is defaultly regarded as negative.
## [1] 0.4186992
tproc.est returns an estimate of two-way partial AUC.
Then, we turn to FPR partial AUC.
proc.est(aSAH$outcome, aSAH$s100b, method = 'expect', threshold = 0.8)
## Warning in proc.est(aSAH$outcome, aSAH$s100b, method = "expect", threshold =
## 0.8): response levels are not 0/1, the first level is defaultly regarded as
## negative.
## [1] 0.548103
# use method 'expect' to estimate partial AUC
proc.ci(aSAH$outcome,aSAH$s100b, cp=0.95 ,threshold=0.8,method='expect')
## Warning in proc.ci(aSAH$outcome, aSAH$s100b, cp = 0.95, threshold = 0.8, :
## response levels are not 0/1, the first level is defaultly regarded as negative.
## Warning in proc.est(response, predictor, threshold = threshold, method =
## "expect", : response levels are not 0/1, the first level is defaultly regarded
## as negative.
## 2.5 % 97.5 %
## [1,] 0.4528877 0.6433182
Alternatively, we can use proc to do both estimation and inference simultaneously.
## Warning in proc.est(response, predictor, threshold = threshold, method =
## method, : response levels are not 0/1, the first level is defaultly regarded as
## negative.
## Warning in proc.ci(response, predictor, cp = cp, threshold = threshold, :
## response levels are not 0/1, the first level is defaultly regarded as negative.
## Warning in proc.est(response, predictor, threshold = threshold, method =
## "expect", : response levels are not 0/1, the first level is defaultly regarded
## as negative.
## $pauc
## [1] 0.548103
##
## $ci
## 2.5 % 97.5 %
## [1,] 0.4528877 0.6433182
The same procedure applies to FNR partial ODC.
podc.est(aSAH$outcome, aSAH$s100b, method = 'expect', threshold = 0.8)
## Warning in podc.est(aSAH$outcome, aSAH$s100b, method = "expect", threshold =
## 0.8): response levels are not 0/1, the first level is defaultly regarded as
## negative.
## [1] 0.5195122
# estimate FNR partial ODC with method 'expect'
podc.ci(aSAH$outcome, aSAH$s100b, method='expect',threshold=0.8, cp=0.97)
## Warning in podc.ci(aSAH$outcome, aSAH$s100b, method = "expect", threshold =
## 0.8, : response levels are not 0/1, the first level is defaultly regarded as
## negative.
## Warning in podc.est(response, predictor, threshold = threshold, method =
## "expect", : response levels are not 0/1, the first level is defaultly regarded
## as negative.
## 1.5 % 98.5 %
## [1,] 0.403401 0.6356234
podc combines the functions of podc.est and podc.ci.
## Warning in podc.est(response, predictor, threshold = threshold, method =
## method, : response levels are not 0/1, the first level is defaultly regarded as
## negative.
## Warning in podc.ci(response, predictor, cp = cp, threshold = threshold, :
## response levels are not 0/1, the first level is defaultly regarded as negative.
## Warning in podc.est(response, predictor, threshold = threshold, method =
## "expect", : response levels are not 0/1, the first level is defaultly regarded
## as negative.
## $podc
## [1] 0.5195122
##
## $ci
## 1.5 % 98.5 %
## [1,] 0.403401 0.6356234