Package “tpAUC”

Basic Info

Methodology

Functions

Quick Start

Reference

Basic Info

When evaluating the performance of a diagnostic test, it is important to control both the true positive rate (TPR) and the false positive rate (FPR). In the literature, most researchers assess a binary classification system with the partial area under the ROC curve (pAUC) restricted on FPR, referred to here as the FPR pAUC. This measure can be artificially adapted to the area constrained by TPR, but the result is often misleading, both conceptually and practically. Yang et al. (2016) provide a new and intuitive measure, the two-way pAUC, which focuses directly on the partial area under the ROC curve with both horizontal and vertical restrictions. This package estimates the two-way pAUC using the nonparametric method of Yang et al. (2016). In addition, it provides estimation and inference for the FPR partial AUC and the FNR partial ODC, using the algorithms proposed in Yang et al. (2017) (see Methodology for details).

Methodology

Estimation:

pAUC:

The ROC curve is a well-established graphical tool for evaluating how accurately a classifier discriminates between subjects from different populations (e.g., diseased and healthy individuals). Let F and G be the distribution functions of random variables X and Y corresponding to independent populations. Let G⁻¹(t) = inf {y : G(y) ≥ t} be the quantile function of G, 0 < t < 1. Let S_F(t) = 1 − F(t) and S_G(t) = 1 − G(t) be the corresponding survival functions. For t ∈ (0, 1), the ROC curve is defined as ROC(t) = 1 − F{G⁻¹(1 − t)}, or equivalently ROC(t) = S_F{S_G⁻¹(t)}, where t is the value of FPR and S_G⁻¹(t) = G⁻¹(1 − t). The ROC curve itself is not a convenient tool for comparisons, in particular when two ROC curves cross. A summary measure can be obtained by integrating the ROC curve over the range of FPR values, which gives the area under the ROC curve, AUC = ∫₀¹ROC(t)dt = ∫_{+∞}^{−∞}S_F(u)dS_G(u). For economic and practical reasons, it is common to hold the FPR to a low level. When interest is restricted to a sub-region of the ROC space, the partial area under the ROC curve, pAUC(P₀) = ∫₀^P₀ROC(p)dp for an FPR threshold P₀ ∈ (0, 1), provides a useful summary measure.

Let X = {X_i, i = 1, ..., m} and Y = {Y_j, j = 1, ..., n} be random samples from the distribution functions F(x) and G(y), respectively. A Mann-Whitney nonparametric estimator of pAUC (method = 'WM') is $\widehat{pAUC}(P_0) = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} I(X_i \geq Y_j) \, I\{Y_j \geq S_{G,n}^{-1}(P_0)\},$ where $S_{G,n}^{-1}(t) = \inf\{x \in \mathbb{R} : t \geq S_{G,n}(x)\}$, and $S_{F,m}(\cdot)$ and $S_{G,n}(\cdot)$ are the estimators of S_F and S_G based on the empirical distributions.
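As an illustration of the trimmed Mann-Whitney estimator above, the following is a minimal base-R sketch, not the package's implementation. The helper names `surv_emp`, `surv_inv` and `pauc_wm` are hypothetical, and the empirical survival function is assumed to use the strict inequality, S_{G,n}(x) = #{j : Y_j > x}/n.

```r
# Empirical survival function S_{G,n}(x) = #{j : y_j > x} / n, vectorised in x.
surv_emp <- function(y, x) {
  vapply(x, function(xi) sum(y > xi) / length(y), numeric(1))
}

# Empirical inverse survival function: inf{x : S_{G,n}(x) <= t}, for 0 < t < 1.
# S_{G,n} only jumps at sample points, so the infimum is attained at one of them.
surv_inv <- function(y, t) {
  cand <- sort(unique(y))
  min(cand[surv_emp(y, cand) <= t])
}

# Trimmed Mann-Whitney estimate of pAUC(p0).
# x: scores drawn from F (e.g. diseased), y: scores drawn from G (e.g. healthy).
pauc_wm <- function(x, y, p0) {
  cutoff <- surv_inv(y, p0)
  keep_y <- y >= cutoff            # I{Y_j >= S_{G,n}^{-1}(p0)}
  ind    <- outer(x, y, ">=")      # m x n matrix of I(X_i >= Y_j)
  sum(ind[, keep_y, drop = FALSE]) / (length(x) * length(y))
}

x <- c(3, 5, 7)
y <- c(1, 2, 4, 6, 8)
pauc_wm(x, y, p0 = 0.5)
```

In this toy sample the cutoff is 4, so only Y values 4, 6 and 8 enter the double sum; three of the fifteen pairs satisfy X_i ≥ Y_j, giving 0.2.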

Wang and Chang (2011) propose the following estimator (method = 'expect'): $\widetilde{pAUC}(P_0) = P_0 - \frac{1}{m} \sum_{i=1}^{m} \min\{S_{G,n}(X_i), P_0\}.$
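The 'expect' estimator can be read as the sample analogue of P₀ − E[min{S_G(X), P₀}]. A minimal base-R sketch under that reading is given below; `pauc_expect` is a hypothetical name, not a function exported by the package, and S_{G,n} is again assumed to use the strict inequality Y_j > x.

```r
# 'expect'-style estimate of pAUC(p0): p0 - mean(min{S_{G,n}(X_i), p0}).
# x: scores drawn from F, y: scores drawn from G.
pauc_expect <- function(x, y, p0) {
  # S_{G,n}(X_i): fraction of y values strictly above each x value
  s_g <- vapply(x, function(xi) sum(y > xi) / length(y), numeric(1))
  p0 - mean(pmin(s_g, p0))
}

x <- c(3, 5, 7)
y <- c(1, 2, 4, 6, 8)
pauc_expect(x, y, p0 = 0.5)   # partial area up to FPR = 0.5
pauc_expect(x, y, p0 = 1)     # no restriction: the full-AUC analogue
```

With p0 = 1 the truncation never binds, so the value coincides with the proportion of pairs with Y_j ≤ X_i, which is a quick sanity check on the implementation.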

Yang et al. (2017) propose a jackknife estimator (method = 'jackknife') based on $\widetilde{pAUC}(P_0)$: $\widetilde{pAUC}_{jack}(P_0) = \frac{1}{n+m} \sum_{h=1}^{n+m} V_h(P_0),$ where $V_h(P_0) = (n+m)\,\widetilde{pAUC}(P_0) - (n+m-1)\,\widetilde{pAUC}_h(P_0)$ and $\widetilde{pAUC}_h(P_0) = \begin{cases} P_0 - \frac{1}{m-1} \sum_{i=1, i \neq h}^{m} \min\{S_{G,n}(X_i), P_0\}, & 1 \leq h \leq m, \\ P_0 - \frac{1}{m} \sum_{i=1}^{m} \min\{S_{G,n-1,h-m}(X_i), P_0\}, & m+1 \leq h \leq m+n, \end{cases}$ with $S_{G,n-1,h-m}(X_i) = \frac{1}{n-1} \sum_{j=1, j \neq h-m}^{n} I(Y_j > X_i).$
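The pseudo-values V_h(P₀) amount to recomputing the 'expect' estimator with one observation removed at a time, first each X_h and then each Y_{h−m}. The sketch below follows that recipe in base R; the function names are hypothetical and this is not the package's code. It assumes m ≥ 2 and n ≥ 2.

```r
# 'expect'-style estimate, used as the building block for the jackknife.
pauc_expect <- function(x, y, p0) {
  s_g <- vapply(x, function(xi) sum(y > xi) / length(y), numeric(1))
  p0 - mean(pmin(s_g, p0))
}

# Jackknife estimate: average of the pseudo-values V_h(p0), h = 1, ..., m + n.
pauc_jackknife <- function(x, y, p0) {
  m <- length(x); n <- length(y); N <- m + n
  full <- pauc_expect(x, y, p0)
  # Leave-one-out estimates: drop X_h for h <= m, then Y_{h-m} for h > m.
  loo <- c(
    vapply(seq_len(m), function(h) pauc_expect(x[-h], y, p0), numeric(1)),
    vapply(seq_len(n), function(h) pauc_expect(x, y[-h], p0), numeric(1))
  )
  pseudo <- N * full - (N - 1) * loo     # V_h(p0)
  list(expect = full, jackknife = mean(pseudo), pseudo = pseudo)
}

x <- c(3, 5, 7)
y <- c(1, 2, 4, 6, 8)
res <- pauc_jackknife(x, y, p0 = 0.5)
res$jackknife
```

Returning the pseudo-values alongside the estimate is a design choice made here for convenience, since the same values are reused by the jackknife variance estimator in the Inference part.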

pODC:

The ordinal dominance curve (ODC), introduced by Bamber (1973), describes the association between the true negative rate (TNR) and the false negative rate (FNR): ODC(t) = G{F⁻¹(t)}, where t ∈ (0, 1). The area under the ODC, ∫₀¹ODC(t)dt = ∫_{−∞}^{∞}G(u)dF(u), is a commonly used summary measure. The partial area under the ODC (pODC) from 0 to P₀ is defined as pODC(P₀) = ∫₀^P₀ODC(t)dt.

A Mann-Whitney nonparametric estimator of pODC (method = 'WM') is $\widehat{pODC}(P_0) = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} I(Y_j \leq X_i) \, I\{X_i \leq F_m^{-1}(P_0)\},$ where $F_m^{-1}(P_0)$ is an empirical quantile estimate at P₀, and $F_m(\cdot)$ and $G_n(\cdot)$ are the empirical distributions of F(⋅) and G(⋅).
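A rough base-R sketch of this trimmed Mann-Whitney estimator for pODC follows. The text does not say which empirical quantile definition is used for F_m⁻¹(P₀); the sketch assumes the inverse of the empirical distribution function (`type = 1` in `stats::quantile`), and `podc_wm` is a hypothetical name.

```r
# Trimmed Mann-Whitney estimate of pODC(p0).
# x: scores drawn from F, y: scores drawn from G.
podc_wm <- function(x, y, p0) {
  # Assumption: F_m^{-1}(p0) is the inverse empirical CDF (quantile type 1).
  cutoff <- quantile(x, probs = p0, type = 1, names = FALSE)
  keep_x <- x <= cutoff            # I{X_i <= F_m^{-1}(p0)}
  ind    <- outer(x, y, ">=")      # m x n matrix of I(Y_j <= X_i)
  sum(ind[keep_x, , drop = FALSE]) / (length(x) * length(y))
}

x <- c(3, 5, 7)
y <- c(1, 2, 4, 6, 8)
podc_wm(x, y, p0 = 0.5)
```

Here the cutoff is 5, so only X values 3 and 5 contribute; five of the fifteen pairs satisfy Y_j ≤ X_i, giving 1/3.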

Yang et al. (2017) propose the following estimator (method = 'expect'): $\widetilde{pODC}(P_0) = P_0 - \frac{1}{n} \sum_{j=1}^{n} \min\{F_m(Y_j), P_0\}.$

Yang et al. (2017) also propose a jackknife estimator (method = 'jackknife') based on $\widetilde{pODC}(P_0)$: $\widetilde{pODC}_{jack}(P_0) = \frac{1}{n+m} \sum_{h=1}^{n+m} \check{U}_h(P_0),$ where $\check{U}_h(P_0) = (n+m)\,\widetilde{pODC}(P_0) - (n+m-1)\,\widetilde{pODC}_h(P_0)$ and $\widetilde{pODC}_h(P_0) = \begin{cases} P_0 - \frac{1}{n-1} \sum_{j=1, j \neq h}^{n} \min\{F_m(Y_j), P_0\}, & 1 \leq h \leq n, \\ P_0 - \frac{1}{n} \sum_{j=1}^{n} \min\{F_{m-1,h-n}(Y_j), P_0\}, & n+1 \leq h \leq m+n, \end{cases}$ with $F_{m-1,h-n}(Y_j) = \frac{1}{m-1} \sum_{i=1, i \neq h-n}^{m} I(X_i \leq Y_j).$
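The pODC versions mirror the pAUC ones with the roles of the two samples swapped: the empirical distribution F_m is evaluated at the Y values, and the leave-one-out loop runs over Y first and X second. A minimal base-R sketch, with hypothetical function names and assuming m ≥ 2 and n ≥ 2:

```r
# Empirical CDF of x evaluated at points t: F_m(t) = #{i : x_i <= t} / m.
ecdf_at <- function(x, t) {
  vapply(t, function(ti) sum(x <= ti) / length(x), numeric(1))
}

# 'expect'-style estimate of pODC(p0): p0 - mean(min{F_m(Y_j), p0}).
podc_expect <- function(x, y, p0) {
  p0 - mean(pmin(ecdf_at(x, y), p0))
}

# Jackknife estimate: drop Y_h for h <= n, then X_{h-n} for h > n.
podc_jackknife <- function(x, y, p0) {
  m <- length(x); n <- length(y); N <- m + n
  full <- podc_expect(x, y, p0)
  loo <- c(
    vapply(seq_len(n), function(h) podc_expect(x, y[-h], p0), numeric(1)),
    vapply(seq_len(m), function(h) podc_expect(x[-h], y, p0), numeric(1))
  )
  pseudo <- N * full - (N - 1) * loo     # pseudo-values for pODC
  list(expect = full, jackknife = mean(pseudo), pseudo = pseudo)
}

x <- c(3, 5, 7)
y <- c(1, 2, 4, 6, 8)
res <- podc_jackknife(x, y, p0 = 0.5)
res$expect
res$jackknife
```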

two-way pAUC:

The definition and estimation of the two-way pAUC are intuitive. Given bounds p₀ and q₀ (FPR at most p₀ and TPR at least q₀), the two-way pAUC is formulated as $U(p_0, q_0) = \int_{S_G\{S_F^{-1}(q_0)\}}^{p_0} S_F\{S_G^{-1}(u)\}\,du - [p_0 - S_G\{S_F^{-1}(q_0)\}]\,q_0.$ Alternatively, from a probability perspective, U(p₀, q₀) can be written as $P\{Y < X, X \leq S_F^{-1}(q_0), Y \geq S_G^{-1}(p_0)\}$. A trimmed Mann-Whitney U-statistic estimator that follows directly from this expression is $\hat{U}(p_0, q_0) = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} V_{i,j}(p_0, q_0),$ where $V_{i,j}(p_0, q_0) = I\{Y_j \leq X_i, X_i \leq S_{F,m}^{-1}(q_0), Y_j \geq S_{G,n}^{-1}(p_0)\}$.
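The trimmed U-statistic can be sketched in a few lines of base R by counting the pairs that survive both trimming rules. This is only an illustration under the assumption that both inverse survival functions use the inf-definition given for pAUC; `surv_inv` and `tpauc_mw` are hypothetical names, and the argument order here is not claimed to match the `threshold` argument of `tproc.est`.

```r
# Empirical inverse survival function: inf{c : #{z > c}/length(z) <= t}, 0 < t < 1.
surv_inv <- function(z, t) {
  cand <- sort(unique(z))
  s <- vapply(cand, function(c0) sum(z > c0) / length(z), numeric(1))
  min(cand[s <= t])
}

# Trimmed Mann-Whitney estimate of the two-way pAUC U(p0, q0).
# x: scores drawn from F, y: scores drawn from G.
tpauc_mw <- function(x, y, p0, q0) {
  keep_x <- x <= surv_inv(x, q0)   # X_i <= S_{F,m}^{-1}(q0)
  keep_y <- y >= surv_inv(y, p0)   # Y_j >= S_{G,n}^{-1}(p0)
  ind    <- outer(x, y, ">=")      # I(Y_j <= X_i)
  sum(ind[keep_x, keep_y, drop = FALSE]) / (length(x) * length(y))
}

x <- c(3, 5, 7)
y <- c(1, 2, 4, 6, 8)
tpauc_mw(x, y, p0 = 0.5, q0 = 0.5)
tpauc_mw(x, y, p0 = 0.5, q0 = 0.01)  # TPR bound barely binds
```

When q₀ is close to 0 no X value is trimmed, and the estimate reduces to the one-way trimmed Mann-Whitney estimate of pAUC(p₀), as the definition of U(p₀, q₀) suggests.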

Inference:

pAUC:

Yang et al. (2017) prove that, under certain conditions, $\sqrt{m+n}\,\{\widehat{pAUC}(P_0) - pAUC(P_0)\} \overset{d}{\to} N\left\{0, \frac{\sigma_1^2(P_0)}{\lambda} + \frac{\sigma_2^2(P_0)}{1-\lambda}\right\}, \quad m, n \to \infty,$ where $\frac{m}{m+n} \to \lambda$, $\sigma_1^2(P_0) = \int_{+\infty}^{S_G^{-1}(P_0)} \{P_0 - S_G(t)\}^2\,dS_F(t) - \left\{\int_{+\infty}^{S_G^{-1}(P_0)} S_F(t)\,dS_G(t)\right\}^2,$ and $\sigma_2^2(P_0) = \int_{+\infty}^{S_G^{-1}(P_0)} [S_F(t) - S_F\{S_G^{-1}(P_0)\}]^2\,dS_G(t) - \left(\int_{+\infty}^{S_G^{-1}(P_0)} [S_F(t) - S_F\{S_G^{-1}(P_0)\}]\,dS_G(t)\right)^2.$ Moreover, under the same conditions, $\sqrt{m+n}\,\{\widetilde{pAUC}(P_0) - pAUC(P_0)\} \overset{d}{\to} N\left\{0, \frac{\sigma_1^2(P_0)}{\lambda} + \frac{\sigma_2^2(P_0)}{1-\lambda}\right\}, \quad m, n \to \infty,$ and $\sqrt{m+n}\,\{\widetilde{pAUC}_{jack}(P_0) - pAUC(P_0)\} \overset{d}{\to} N\left\{0, \frac{\sigma_1^2(P_0)}{\lambda} + \frac{\sigma_2^2(P_0)}{1-\lambda}\right\}, \quad m, n \to \infty.$

Consider the jackknife variance estimator $S_{\widetilde{pAUC}}^2(P_0) = (m+n)^{-1} \sum_{h=1}^{m+n} \{V_h(P_0) - \widetilde{pAUC}_{jack}(P_0)\}^2.$ Yang et al. (2017) prove that $S_{\widetilde{pAUC}}^2(P_0) = \frac{\sigma_1^2(P_0)}{\lambda} + \frac{\sigma_2^2(P_0)}{1-\lambda} + o_p(1).$ Therefore, $\frac{\sqrt{m+n}\,\{\widetilde{pAUC}_{jack}(P_0) - pAUC(P_0)\}}{\sqrt{S_{\widetilde{pAUC}}^2(P_0)}} \overset{d}{\to} N(0, 1).$
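The studentized limit above suggests a normal-approximation confidence interval of the form estimate ± z·sqrt(S²/(m+n)). The base-R sketch below builds such an interval from the jackknife pseudo-values on simulated data. It is an illustration only: the function names are hypothetical, the interval is centered on the jackknife estimate as the pivot suggests, and the interval returned by the package's `proc.ci` may be constructed or centered differently.

```r
# 'expect'-style estimate of pAUC(p0), the building block for the pseudo-values.
pauc_expect <- function(x, y, p0) {
  s_g <- vapply(x, function(xi) sum(y > xi) / length(y), numeric(1))
  p0 - mean(pmin(s_g, p0))
}

# Jackknife estimate with a normal-approximation interval at coverage cp.
pauc_jack_ci <- function(x, y, p0, cp = 0.95) {
  m <- length(x); n <- length(y); N <- m + n
  full <- pauc_expect(x, y, p0)
  loo <- c(
    vapply(seq_len(m), function(h) pauc_expect(x[-h], y, p0), numeric(1)),
    vapply(seq_len(n), function(h) pauc_expect(x, y[-h], p0), numeric(1))
  )
  pseudo <- N * full - (N - 1) * loo          # V_h(p0)
  est    <- mean(pseudo)                      # jackknife estimate
  s2     <- mean((pseudo - est)^2)            # jackknife variance estimator S^2
  half   <- qnorm(1 - (1 - cp) / 2) * sqrt(s2 / N)
  c(estimate = est, lower = est - half, upper = est + half)
}

set.seed(1)
x <- rnorm(60, mean = 1)   # simulated scores from F
y <- rnorm(80)             # simulated scores from G
pauc_jack_ci(x, y, p0 = 0.3, cp = 0.95)
```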

pODC:

For the ODC, we have $\sqrt{m+n}\,\{\widehat{pODC}(P_0) - pODC(P_0)\} \overset{d}{\to} N\left\{0, \frac{\sigma_3^2(P_0)}{1-\lambda} + \frac{\sigma_4^2(P_0)}{\lambda}\right\}, \quad m, n \to \infty,$ where $\sigma_3^2(P_0) = \int_{-\infty}^{F^{-1}(P_0)} \{P_0 - F(t)\}^2\,dG(t) - \left\{\int_{-\infty}^{F^{-1}(P_0)} G(t)\,dF(t)\right\}^2,$ and $\sigma_4^2(P_0) = \int_{-\infty}^{F^{-1}(P_0)} [G(t) - G\{F^{-1}(P_0)\}]^2\,dF(t) - \left(\int_{-\infty}^{F^{-1}(P_0)} [G(t) - G\{F^{-1}(P_0)\}]\,dF(t)\right)^2.$ Similarly, $\sqrt{m+n}\,\{\widetilde{pODC}(P_0) - pODC(P_0)\} \overset{d}{\to} N\left\{0, \frac{\sigma_3^2(P_0)}{1-\lambda} + \frac{\sigma_4^2(P_0)}{\lambda}\right\},$ and $\sqrt{m+n}\,\{\widetilde{pODC}_{jack}(P_0) - pODC(P_0)\} \overset{d}{\to} N\left\{0, \frac{\sigma_3^2(P_0)}{1-\lambda} + \frac{\sigma_4^2(P_0)}{\lambda}\right\}.$ The jackknife variance estimator satisfies $\begin{aligned} S_{\widetilde{pODC}}^2(P_0) &= (m+n)^{-1} \sum_{h=1}^{m+n} \{\check{U}_h(P_0) - \widetilde{pODC}_{jack}(P_0)\}^2 \\ &= \frac{\sigma_3^2(P_0)}{1-\lambda} + \frac{\sigma_4^2(P_0)}{\lambda} + o_p(1), \end{aligned}$ and therefore $\frac{\sqrt{m+n}\,\{\widetilde{pODC}_{jack}(P_0) - pODC(P_0)\}}{\sqrt{S_{\widetilde{pODC}}^2(P_0)}} \overset{d}{\to} N(0, 1).$
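Inverting the standard normal limit for the studentized jackknife statistic gives an approximate confidence interval for pODC(P₀). The display below spells this out; the notation $z_{1-\alpha/2}$ for the standard normal quantile and $1-\alpha$ for the nominal coverage is introduced here for illustration and is not taken from the package documentation.

```latex
\widetilde{pODC}_{jack}(P_0) \;\pm\; z_{1-\alpha/2}\,
\sqrt{\frac{S_{\widetilde{pODC}}^{2}(P_0)}{m+n}}
```

The same construction applies to pAUC(P₀), with $V_h(P_0)$ in place of $\check{U}_h(P_0)$.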

two-way pAUC:

From Yang et al. (2016), we have, under certain conditions, $\sqrt{m+n}\,\{\hat{U}(p_0, q_0) - U(p_0, q_0)\} \overset{d}{\to} N\left\{0, \frac{\sigma_5^2}{\lambda} + \frac{\sigma_6^2}{1-\lambda}\right\}, \quad \text{as } m, n \to \infty,$ where $\begin{aligned} \sigma_5^2 = {} & F\{G^{-1}(1-p_0)\}\,[G\{F^{-1}(1-q_0)\} - (1-p_0)]^2 + \int_{G^{-1}(1-p_0)}^{F^{-1}(1-q_0)} [G\{F^{-1}(1-q_0)\} - G(t)]^2\,dF(t) \\ & - \left\{\int_{G^{-1}(1-p_0)}^{F^{-1}(1-q_0)} F(t)\,dG(t)\right\}^2, \end{aligned}$ and $\begin{aligned} \sigma_6^2 = {} & [1 - q_0 - F\{G^{-1}(1-p_0)\}]^2\,(1-p_0) + \int_{G^{-1}(1-p_0)}^{F^{-1}(1-q_0)} \{1 - q_0 - F(t)\}^2\,dG(t) \\ & - \left\{\int_{G^{-1}(1-p_0)}^{F^{-1}(1-q_0)} G(t)\,dF(t)\right\}^2. \end{aligned}$

Functions

This package contains the following functions:

tproc.est

This function estimates the two-way partial AUC given the response, the predictor and pre-specified FPR/TPR constraints, via the method in Yang et al. (2016).

proc

This function performs both estimation and inference for the FPR partial AUC given the response, the predictor and a pre-specified FPR constraint, via the 'WM', 'expect' or 'jackknife' method.

proc.est

This function estimates the FPR partial AUC given the response, the predictor and a pre-specified FPR constraint, via the 'WM', 'expect' or 'jackknife' method.

proc.ci

This function performs inference (confidence intervals) for the FPR partial AUC given the response, the predictor and a pre-specified FPR constraint, via the 'WM', 'expect' or 'jackknife' method.

podc

This function performs both estimation and inference for the FNR partial ODC given the response, the predictor and a pre-specified FNR constraint, via the 'WM', 'expect' or 'jackknife' method.

podc.est

This function estimates the FNR partial ODC given the response, the predictor and a pre-specified FNR constraint, via the 'WM', 'expect' or 'jackknife' method.

podc.ci

This function performs inference (confidence intervals) for the FNR partial ODC given the response, the predictor and a pre-specified FNR constraint, via the 'WM', 'expect' or 'jackknife' method.

Quick Start

The purpose of this section is to show users the basic usage of this package. We will briefly go through the main functions, see what they can do and have a look at their outputs. A detailed example of the complete estimation and inference procedure is presented to give users a general sense of the package.

First, we load the tpAUC package:

library(tpAUC)

Then, we estimate the two-way partial AUC with data from the pROC package.

library('pROC')

## Type 'citation("pROC")' for a citation.

## 
## Attaching package: 'pROC'

## The following objects are masked from 'package:stats':
## 
##     cov, smooth, var

data(aSAH)
tproc.est(aSAH$outcome, aSAH$s100b, threshold=c(0.8,0.2) )

## Warning in tproc.est(aSAH$outcome, aSAH$s100b, threshold = c(0.8, 0.2)):
## response levels are not 0/1, the first level is defaultly regarded as negative.

## [1] 0.4186992

#estimate two-way partial AUC

tproc.est returns an estimate of two-way partial AUC.
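The warning in the output above is raised because the response is a two-level factor rather than a 0/1 vector, so the first level is taken as negative. One way to make the coding explicit is to recode the response before calling the estimation functions. The sketch below uses a small toy factor as a stand-in for aSAH$outcome; that the tpAUC functions accept such a 0/1 response without the warning is an assumption inferred from the warning text, not something verified here.

```r
# Toy stand-in for a two-level outcome such as aSAH$outcome ("Good"/"Poor").
outcome <- factor(c("Good", "Poor", "Good", "Poor", "Poor"),
                  levels = c("Good", "Poor"))

# Explicit 0/1 coding: 1 marks the level treated as positive ("Poor" here).
response01 <- as.integer(outcome == "Poor")

# Cross-tabulate to confirm that the recoding matches the intended levels.
table(outcome, response01)
```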

Then, we turn to FPR partial AUC.

proc.est(aSAH$outcome, aSAH$s100b, method='expect',threshold=0.8 )

## Warning in proc.est(aSAH$outcome, aSAH$s100b, method = "expect", threshold =
## 0.8): response levels are not 0/1, the first level is defaultly regarded as
## negative.

## [1] 0.548103

# use method 'expect' to estimate partial AUC 
proc.ci(aSAH$outcome,aSAH$s100b, cp=0.95 ,threshold=0.8,method='expect')

## Warning in proc.ci(aSAH$outcome, aSAH$s100b, cp = 0.95, threshold = 0.8, :
## response levels are not 0/1, the first level is defaultly regarded as negative.

## Warning in proc.est(response, predictor, threshold = threshold, method =
## "expect", : response levels are not 0/1, the first level is defaultly regarded
## as negative.

##          2.5 %    97.5 %
## [1,] 0.4528877 0.6433182

# use method 'expect' to infer partial AUC

Alternatively, we can use proc to do both estimation and inference simultaneously.

proc(aSAH$outcome,aSAH$s100b,threshold=0.8, method='expect',ci=TRUE, cp=0.95 )

## Warning in proc.est(response, predictor, threshold = threshold, method =
## method, : response levels are not 0/1, the first level is defaultly regarded as
## negative.

## Warning in proc.ci(response, predictor, cp = cp, threshold = threshold, :
## response levels are not 0/1, the first level is defaultly regarded as negative.

## Warning in proc.est(response, predictor, threshold = threshold, method =
## "expect", : response levels are not 0/1, the first level is defaultly regarded
## as negative.

## $pauc
## [1] 0.548103
## 
## $ci
##          2.5 %    97.5 %
## [1,] 0.4528877 0.6433182

# set ci=TRUE to get confidence interval

Similar procedures on FNR partial ODC are as follows.

podc.est(aSAH$outcome, aSAH$s100b, method='expect',threshold=0.8)

## Warning in podc.est(aSAH$outcome, aSAH$s100b, method = "expect", threshold =
## 0.8): response levels are not 0/1, the first level is defaultly regarded as
## negative.

## [1] 0.5195122

# estimate FNR partial ODC with method 'expect'
podc.ci(aSAH$outcome, aSAH$s100b, method='expect',threshold=0.8, cp=0.97)

## Warning in podc.ci(aSAH$outcome, aSAH$s100b, method = "expect", threshold =
## 0.8, : response levels are not 0/1, the first level is defaultly regarded as
## negative.

## Warning in podc.est(response, predictor, threshold = threshold, method =
## "expect", : response levels are not 0/1, the first level is defaultly regarded
## as negative.

##         1.5 %    98.5 %
## [1,] 0.403401 0.6356234

# infer FNR partial ODC with method 'expect'

podc aggregates the functions of podc.est and podc.ci.

podc(aSAH$outcome, aSAH$s100b,threshold=0.8, method='expect',ci=TRUE, cp=0.97)

## Warning in podc.est(response, predictor, threshold = threshold, method =
## method, : response levels are not 0/1, the first level is defaultly regarded as
## negative.

## Warning in podc.ci(response, predictor, cp = cp, threshold = threshold, :
## response levels are not 0/1, the first level is defaultly regarded as negative.

## Warning in podc.est(response, predictor, threshold = threshold, method =
## "expect", : response levels are not 0/1, the first level is defaultly regarded
## as negative.

## $podc
## [1] 0.5195122
## 
## $ci
##         1.5 %    98.5 %
## [1,] 0.403401 0.6356234

# inference and estimation

Reference

Wang Z, Chang Y C I. Marker selection via maximizing the partial area under the ROC curve of linear risk scores. Biostatistics, 2011, 12(2): 369-385.
Yang H, Lu K, Lyu X, et al. Two-Way Partial AUC and Its Properties. arXiv:1508.00298, 2016.
Yang H, Lu K, Zhao Y. A nonparametric approach for partial areas under ROC curves and ordinal dominance curves. Statistica Sinica, 2017, 27: 357-371.