Title: | Estimate a Log-Concave Probability Density from Iid Observations |
---|---|
Description: | Given independent and identically distributed observations X(1), ..., X(n), compute the maximum likelihood estimator (MLE) of a density as well as a smoothed version of it under the assumption that the density is log-concave, see Rufibach (2007) and Duembgen and Rufibach (2009). The main function of the package is 'logConDens' that allows computation of the log-concave MLE and its smoothed version. In addition, we provide functions to compute (1) the value of the density and distribution function estimates (MLE and smoothed) at a given point (2) the characterizing functions of the estimator, (3) to sample from the estimated distribution, (5) to compute a two-sample permutation test based on log-concave densities, (6) the ROC curve based on log-concave estimates within cases and controls, including confidence intervals for given values of false positive fractions (7) computation of a confidence interval for the value of the true density at a fixed point. Finally, three datasets that have been used to illustrate log-concave density estimation are made available. |
Authors: | Kaspar Rufibach <[email protected]> and Lutz Duembgen <[email protected]> |
Maintainer: | Kaspar Rufibach <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.1.8 |
Built: | 2024-11-14 06:43:09 UTC |
Source: | CRAN |
The main function of this package is logConDens
: compute the maximum likelihood estimator (MLE) of a log-concave density from one-dimensional i.i.d. observations as well as the kernel smoothed version derived from it. A list of additional functions that can be used to compute quantities relevant in that context can be found below.
Two algorithms are offered to compute the estimate: An active set (logConDens
) and an iterative algorithm based on the pool-adjacent-violaters algorithm (icmaLogCon
). The latter of these functions is only part of this package for historical reasons: it was the first method that was proposed to estimate a log-concave density, see Rufibach (2007). The more efficient way of computing the estimate is via an active set algorithm. The smoothed versions of the log-concave density and distribution function estimates discussed in Section 3 of Duembgen and Rufibach (2009) are available in the function evaluateLogConDens
.
To compute a log-concave density, CDF, and survival function from interval- or right-censored observations use the package logconcens. A log-concave probability mass function can be computed using the package logcondiscr, see Balabdaoui et al (2012) for details. For the computation of a log-concave estimate in any dimension the package LogConDEAD can be used.
Package: | logcondens |
Type: | Package |
Version: | 2.1.8 |
Date: | 2023-08-21 |
License: | GPL (>=2) |
The following additional functions and datasets are provided in the package:
evaluateLogConDens
Computes the value of the estimated log-density, density, and distribution function
of the MLE and the smoothed estimator at a given x0
.
quantilesLogConDens
Computes quantiles of the estimated distribution function at a given x0
.
intECDF
Compute the integrated empirical distribution function of the observations at a given x0
.
intF
Compute the integral of the distribution function corresponding to the log-concave density
estimator at a given x0
.
logconTwoSample
Compute a permutation test for the difference between two distribution functions.
logConROC
Compute ROC curve based on log-concave density estimates within cases and controls.
confIntBootLogConROC_t0
Compute bootstrap confidence intervals at given false positive fractions (= 1 - specificity) for the ROC curve based on log-concave density estimates.
logConCI
Compute a confidence interval for the value of the true density at a fixed point, based on the theory developed in Balabdaoui et al (2009). This function was contributed by Mahdis Azadbakhsh and Hanna Jankowski, see Azadbakhsh et al (2012).
isoMean
Compute the weighted least squares regression under a monotonicity constraint (the Grenander estimator).
This function is used as part of icmaLogCon
but is also of independent interest.
The following datasets have been used in several publications to illustrate log-concave density estimation and are therefore included in this package:
reliability
Dataset that contains the data analyzed in Duembgen and Rufibach (2009, Figure 2).
See the help file for logConDens
for the analysis of this data.
brightstar
Dataset that contains the data analyzed in Mizera and Koenker (2009, Section 5). The
sample consists of measurements of radial and rotational velocities for the stars from the Bright Star Catalog, see
Hoffleit and Warren (1991).
pancreas
Data from pancreatic cancer serum biomarker study, first published by Wieand et al (1989).
Contains data on serum measurements for a cancer antigen (CA-125) and
a carbohydrate antigen (CA19.9) both of which are measured on a continuous non-negative scale. The measurements were taken within a case-control
study on 90 cases with pancreatic cancer and 51 controls who did not have cancer but pancreatitis.
A print
and summary
for objects of class dlc
, generated by logConDens
, are available as well.
Kaspar Rufibach (maintainer), [email protected],
http://www.kasparrufibach.ch
Lutz Duembgen, [email protected],
https://www.imsv.unibe.ch/about_us/staff/prof_dr_duembgen_lutz/index_eng.html
Kaspar Rufibach gratefully acknowledges support by the Swiss National Science Foundation SNF, http://www.snf.ch.
MatLab code with an implementation of the active set algorithm is available on request from Lutz Duembgen.
Azadbakhsh, M., Jankowski, H. and Gao, X. (2014). Computing confidence intervals for log-concave densities. Comput. Statist. Data Anal., 75, 248–264.
Balabdaoui, F., Jankowski, H., Rufibach, K. and Pavlides, M. (2012). Asymptotics of the discrete log-concave maximum likelihood estimator and related applications. J. R. Stat. Soc. Ser. B Stat. Methodol., 75(4), 769–790.
Baladbaoui, F., Rufibach, K. and Wellner, J. (2009). Limit distribution theory for maximum likelihood estimation of a log-concave density. Ann. Statist., 37(3), 1299–1331.
Duembgen, L., Huesler, A. and Rufibach, K. (2010). Active set and EM algorithms for log-concave densities based on complete and censored data. Technical report 61, IMSV, Univ. of Bern, available at https://arxiv.org/abs/0707.4643.
Duembgen, L. and Rufibach, K. (2009). Maximum likelihood estimation of a log–concave density and its distribution function: basic properties and uniform consistency. Bernoulli, 15(1), 40–68.
Duembgen, L. and Rufibach, K. (2011). logcondens: Computations Related to Univariate Log-Concave Density Estimation. Journal of Statistical Software, 39(6), 1–28. doi:10.18637/jss.v039.i06
Hoffleit, D., Warren, W.H. (1991). The Bright Star Catalog. Yale University Observatory, New Heaven.
Mizera, I., Koenker, R. (2010). Quasi-concave density estimation. Ann. Statist., 38(5), 2998–3027.
Rufibach K. (2006).
Log-concave Density Estimation and Bump Hunting for i.i.d. Observations.
PhD Thesis, University of Bern, Switzerland and Georg-August University of Goettingen, Germany, 2006.
Available at https://slsp-ube.primo.exlibrisgroup.com/permalink/41SLSP_UBE/17e6d97/alma99116730175505511.
Rufibach, K. (2007). Computing maximum likelihood estimators of a log-concave density function. J. Stat. Comput. Simul. 77, 561–574.
Rufibach, K. (2012). A smooth ROC curve estimator based on log-concave density estimates. Int. J. Biostat., 8(1), 1–29.
## estimate gamma density set.seed(1977) x <- rgamma(100, 2, 1) res <- logConDens(x, smoothed = FALSE, print = TRUE) summary(res) ## compare performance to ICMA res2 <- icmaLogCon(x, T1 = 2000, robustif = TRUE, print = TRUE) res$L res2$L ## plot resulting functions par(mfrow = c(2, 2), mar = c(3, 2, 1, 2)) plot(res, which = "density") plot(res, which = "log-density") plot(res, which = "CDF") xli <- range(res$x) + 0.1 * c(-1, 1) * diff(range(res$x)) plot(res$x, res$H, type = 'l', xlim = xli, yli = c(min(res$H) * 1.1, 0)) segments(res$knots, 0, res$knots, min(res$H) * 1.1, lty = 2) rug(res$x); abline(h = 0, lty = 2) ## compute function values at an arbitrary point x0 <- (res$x[50] + res$x[51]) / 2 evaluateLogConDens(x0, res) ## compute 0.5 quantile of Fhat quantilesLogConDens(0.5, res)
## estimate gamma density set.seed(1977) x <- rgamma(100, 2, 1) res <- logConDens(x, smoothed = FALSE, print = TRUE) summary(res) ## compare performance to ICMA res2 <- icmaLogCon(x, T1 = 2000, robustif = TRUE, print = TRUE) res$L res2$L ## plot resulting functions par(mfrow = c(2, 2), mar = c(3, 2, 1, 2)) plot(res, which = "density") plot(res, which = "log-density") plot(res, which = "CDF") xli <- range(res$x) + 0.1 * c(-1, 1) * diff(range(res$x)) plot(res$x, res$H, type = 'l', xlim = xli, yli = c(min(res$H) * 1.1, 0)) segments(res$knots, 0, res$knots, min(res$H) * 1.1, lty = 2) rug(res$x); abline(h = 0, lty = 2) ## compute function values at an arbitrary point x0 <- (res$x[50] + res$x[51]) / 2 evaluateLogConDens(x0, res) ## compute 0.5 quantile of Fhat quantilesLogConDens(0.5, res)
Given a vector of observations
with not necessarily equal entries,
activeSetLogCon
first computes vectors
and
where
is the weight of each
s.t.
.
Then,
activeSetLogCon
computes a concave, piecewise
linear function on
with knots only in
such that
is maximal. To accomplish this, an active set algorithm is used.
activeSetLogCon(x, xgrid = NULL, print = FALSE, w = NA)
activeSetLogCon(x, xgrid = NULL, print = FALSE, w = NA)
x |
Vector of independent and identically distributed numbers, not necessarily unique. |
xgrid |
Governs the generation of weights for observations. See |
print |
|
w |
Optional vector of weights. If weights are provided, i.e. if |
xn |
Vector with initial observations |
x |
Vector of observations |
w |
The vector of weights that had been used. Depends on the chosen setting for |
phi |
Vector with entries |
IsKnot |
Vector with entries IsKnot |
L |
The value |
Fhat |
A vector
|
H |
Vector
at zero and |
n |
Number of initial observations. |
m |
Number of unique observations. |
knots |
Observations that correspond to the knots. |
mode |
Mode of the estimated density |
sig |
The standard deviation of the initial sample |
Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch
Lutz Duembgen, [email protected],
https://www.imsv.unibe.ch/about_us/staff/prof_dr_duembgen_lutz/index_eng.html
Duembgen, L, Huesler, A. and Rufibach, K. (2010) Active set and EM algorithms for log-concave densities based on complete and censored data. Technical report 61, IMSV, Univ. of Bern, available at https://arxiv.org/abs/0707.4643.
Duembgen, L. and Rufibach, K. (2009) Maximum likelihood estimation of a log–concave density and its distribution function: basic properties and uniform consistency. Bernoulli, 15(1), 40–68.
Duembgen, L. and Rufibach, K. (2011) logcondens: Computations Related to Univariate Log-Concave Density Estimation. Journal of Statistical Software, 39(6), 1–28. doi:10.18637/jss.v039.i06
activeSetLogCon
can be used to estimate a log-concave density. However, to generate an object of
class dlc
that allows application of summary
and plot
we recommend to use logConDens
.
The following functions are used by activeSetLogCon
:
J00
, J10
, J11
, J20
,
Local_LL
, Local_LL_all
, LocalCoarsen
,
LocalConvexity
, LocalExtend
, LocalF
, LocalMLE
,
LocalNormalize
, MLE
Log concave density estimation via an iterative convex minorant algorithm can be performed using
icmaLogCon
.
## estimate gamma density set.seed(1977) n <- 200 x <- rgamma(n, 2, 1) res <- activeSetLogCon(x, w = rep(1 / n, n), print = FALSE) ## plot resulting functions par(mfrow = c(2, 2), mar = c(3, 2, 1, 2)) plot(res$x, exp(res$phi), type = 'l'); rug(x) plot(res$x, res$phi, type = 'l'); rug(x) plot(res$x, res$Fhat, type = 'l'); rug(x) plot(res$x, res$H, type = 'l'); rug(x) ## compute and plot function values at an arbitrary point x0 <- (res$x[100] + res$x[101]) / 2 Fx0 <- evaluateLogConDens(x0, res, which = 3)[, "CDF"] plot(res$x, res$Fhat, type = 'l'); rug(res$x) abline(v = x0, lty = 3); abline(h = Fx0, lty = 3) ## compute and plot 0.9-quantile of Fhat q <- quantilesLogConDens(0.9, res)[2] plot(res$x, res$Fhat, type = 'l'); rug(res$x) abline(h = 0.9, lty = 3); abline(v = q, lty = 3)
## estimate gamma density set.seed(1977) n <- 200 x <- rgamma(n, 2, 1) res <- activeSetLogCon(x, w = rep(1 / n, n), print = FALSE) ## plot resulting functions par(mfrow = c(2, 2), mar = c(3, 2, 1, 2)) plot(res$x, exp(res$phi), type = 'l'); rug(x) plot(res$x, res$phi, type = 'l'); rug(x) plot(res$x, res$Fhat, type = 'l'); rug(x) plot(res$x, res$H, type = 'l'); rug(x) ## compute and plot function values at an arbitrary point x0 <- (res$x[100] + res$x[101]) / 2 Fx0 <- evaluateLogConDens(x0, res, which = 3)[, "CDF"] plot(res$x, res$Fhat, type = 'l'); rug(res$x) abline(v = x0, lty = 3); abline(h = Fx0, lty = 3) ## compute and plot 0.9-quantile of Fhat q <- quantilesLogConDens(0.9, res)[2] plot(res$x, res$Fhat, type = 'l'); rug(res$x) abline(h = 0.9, lty = 3); abline(v = q, lty = 3)
Functions that are used by activeSetLogCon.
LocalCoarsen(x, w, IsKnot) LocalConvexity(x, phi) LocalExtend(x, IsKnot, x2, phi2) LocalF(x, phi) LocalNormalize(x, phi) LocalMLE(x, w, IsKnot, phi_o, prec) LocalVariance(x, w = NULL, phi)
LocalCoarsen(x, w, IsKnot) LocalConvexity(x, phi) LocalExtend(x, IsKnot, x2, phi2) LocalF(x, phi) LocalNormalize(x, phi) LocalMLE(x, w, IsKnot, phi_o, prec) LocalVariance(x, w = NULL, phi)
x |
Vector of independent and identically distributed numbers, with strictly increasing entries. |
w |
Optional vector of nonnegative weights corresponding to |
IsKnot |
Vector with entries IsKnot |
phi |
Vector with entries |
x2 |
Vector of same type as |
phi2 |
Vector of same type as |
phi_o |
Optional starting vector. |
prec |
Threshold for the directional derivative during Newton-Raphson procedure. |
Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch
Lutz Duembgen, [email protected],
https://www.imsv.unibe.ch/about_us/staff/prof_dr_duembgen_lutz/index_eng.html
All the above functions are used by activeSetLogCon
to estimate a log-concave
probability density.
Log concave density estimation via an iterative convex minorant algorithm can be performed using
icmaLogCon
.
Dataset that contains the data analyzed in Mizera and Koenker (2009, Section 5). The sample consists of measurements of radial and rotational velocities for the stars from the Bright Star Catalog, see Hoffleit and Warren (1991).
data(brightstar)
data(brightstar)
A data frame with 9092 rows on the following 2 variables.
nr
Location of measurements.
rad
Measurements of radial velocities.
rot
Measurements of rotational velocities.
Duembgen, L. and Rufibach, K. (2009) Maximum likelihood estimation of a log–concave density and its distribution function: basic properties and uniform consistency. Bernoulli, 15(1), 40–68.
Hoffleit, D., Warren, W.H. (1991). The Bright Star Catalog. Yale University Observatory, New Heaven.
Mizera, I., Koenker, R. (2010). Quasi-concave density estimation. Ann. Statist., 38(5), 2998–3027.
# ---- load rotational velocity data ---- data(brightstar) # ---- compute and plot log-concave estimate ---- # See also Figure 3 in Koenker & Mizera (2009) x0 <- sort(brightstar[, 3]) res <- logConDens(x0, print = FALSE, smoothed = FALSE) plot(res, which = "density")
# ---- load rotational velocity data ---- data(brightstar) # ---- compute and plot log-concave estimate ---- # See also Figure 3 in Koenker & Mizera (2009) x0 <- sort(brightstar[, 3]) res <- logConDens(x0, print = FALSE, smoothed = FALSE) plot(res, which = "density")
This function computes a bootstrap confidence interval for the ROC curve at a given value false negative fraction (1 - specificity) . The ROC curve estimate is based on log-concave densities, as discussed in Rufibach (2011).
confIntBootLogConROC_t0(controls, cases, grid = c(0.2, 0.8), conf.level = 0.95, M = 1000, smooth = TRUE, output = TRUE)
confIntBootLogConROC_t0(controls, cases, grid = c(0.2, 0.8), conf.level = 0.95, M = 1000, smooth = TRUE, output = TRUE)
cases |
Values of the continuous variable for the cases. |
controls |
Values of the continuous variable for the controls. |
grid |
Values of 1 - specificity where confidence intervals should be computed at (may be a vector). |
conf.level |
Confidence level of confidence interval. |
M |
Number of bootstrap replicates. |
smooth |
|
output |
|
A list containing the following elements:
qs |
|
boot.mat |
Bootstrap samples for the ROC curve based on the log-concave density estimate. |
qs.smooth |
If |
boot.mat.smooth |
If |
The confidence intervals are only valid if observations are independent, i.e. eacht patient only contributes one measurement, e.g.
Kaspar Rufibach (maintainer)
[email protected]
http://www.kasparrufibach.ch.
The reference for computation of these bootstrap confidence intervals is:
Rufibach, K. (2012). A smooth ROC curve estimator based on log-concave density estimates. Int. J. Biostat., 8(1), 1–29.
The bootstrap competitor based on the empirical ROC curve is described in:
Zhou, X.H. and Qin, G. (2005). Improved confidence intervals for the sensitivity at a fixed level of specificity of a continuous-scale diagnostic test. Statist. Med., 24, 465–477.
The ROC curve based on log-concave density estimates can be computed using logConROC
. In the example below we analyze the pancreas
data.
## Not run: ## ROC curve for pancreas data data(pancreas) status <- factor(pancreas[, "status"], levels = 0:1, labels = c("healthy", "diseased")) var <- log(pancreas[, "ca199"]) cases <- var[status == "diseased"] controls <- var[status == "healthy"] ## compute confidence intervals res <- confIntBootLogConROC_t0(controls, cases, grid = c(0.2, 0.8), conf.level = 0.95, M = 1000, smooth = TRUE, output = TRUE) res ## End(Not run)
## Not run: ## ROC curve for pancreas data data(pancreas) status <- factor(pancreas[, "status"], levels = 0:1, labels = c("healthy", "diseased")) var <- log(pancreas[, "ca199"]) cases <- var[status == "diseased"] controls <- var[status == "healthy"] ## compute confidence intervals res <- confIntBootLogConROC_t0(controls, cases, grid = c(0.2, 0.8), conf.level = 0.95, M = 1000, smooth = TRUE, output = TRUE) res ## End(Not run)
Based on a "dlc"
object generated by logConDens
, this function computes the values of
at all real number in
xs
. The exact formula for and
is
for the function introduced in
Jfunctions
. Closed formulas can also be given for
and
.
evaluateLogConDens(xs, res, which = 1:5, gam = NULL, print = FALSE)
evaluateLogConDens(xs, res, which = 1:5, gam = NULL, print = FALSE)
xs |
Vector of real numbers where the functions should be evaluated at. |
res |
An object of class |
which |
A (sub-)vector of |
gam |
Only necessary if |
print |
Progress in computation of smooth estimates is shown. |
Matrix with rows
where
is the
-th entry of
xs
.
Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch
Lutz Duembgen, [email protected],
https://www.imsv.unibe.ch/about_us/staff/prof_dr_duembgen_lutz/index_eng.html
## estimate gamma density set.seed(1977) x <- rgamma(200, 2, 1) res <- logConDens(x, smoothed = TRUE, print = FALSE) ## compute function values at an arbitrary point xs <- (res$x[100] + res$x[101]) / 2 evaluateLogConDens(xs, res) ## only compute function values for non-smooth estimates evaluateLogConDens(xs, res, which = 1:3)
## estimate gamma density set.seed(1977) x <- rgamma(200, 2, 1) res <- logConDens(x, smoothed = TRUE, print = FALSE) ## compute function values at an arbitrary point xs <- (res$x[100] + res$x[101]) / 2 evaluateLogConDens(xs, res) ## only compute function values for non-smooth estimates evaluateLogConDens(xs, res, which = 1:3)
Given a vector of observations
with not necessarily equal entries,
activeSetLogCon
first computes vectors and
where
is the weight of each
s.t.
.
Then,
activeSetLogCon
computes a concave, piecewise
linear function on
with knots only in
such that
is maximal. In order to be able to apply the pool - adjacent - violaters algorithm, computations are performed in the parametrization
To find the maximum of , a variant of the iterative convex minorant using the pool - adjacent - violaters
algorithm is used.
icmaLogCon(x, xgrid = NULL, eps = 10^-8, T1 = 2000, robustif = TRUE, print = FALSE)
icmaLogCon(x, xgrid = NULL, eps = 10^-8, T1 = 2000, robustif = TRUE, print = FALSE)
x |
Vector of independent and identically distributed numbers, not necessarily equal. |
xgrid |
Governs the generation of weights for observations. See |
eps |
An arbitrary real number, typically small. Iterations are halted if the directional derivative of |
T1 |
Maximal number of iterations to perform. |
robustif |
|
print |
|
x |
Vector of observations |
w |
The vector of weights that had been used. Depends on the chosen setting for |
f |
Vector with entries |
xn |
Vector with initial observations |
Loglik |
The value |
Iterations |
Number of iterations performed. |
sig |
The standard deviation of the initial sample |
Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch
Lutz Duembgen, [email protected],
https://www.imsv.unibe.ch/about_us/staff/prof_dr_duembgen_lutz/index_eng.html
Rufibach K. (2006) Log-concave Density Estimation and Bump Hunting for i.i.d. Observations.
PhD Thesis, University of Bern, Switzerland and Georg-August University of Goettingen, Germany, 2006.
Available at https://slsp-ube.primo.exlibrisgroup.com/permalink/41SLSP_UBE/17e6d97/alma99116730175505511.
Rufibach, K. (2007) Computing maximum likelihood estimators of a log-concave density function. J. Stat. Comput. Simul. 77, 561–574.
icmaLogCon
can be used to estimate a log-concave density. However, to generate an object of
class dlc
that allows application of summary
and plot
one has to
use logConDens
.
The following functions are used by icmaLogCon
:
phieta
, etaphi
, Lhat_eta
, quadDeriv
,
robust
, isoMean
.
set.seed(1977) x <- rgamma(200, 2, 1) ## Not run: res <- icmaLogCon(x, T1 = 2000, robustif = TRUE, print = TRUE) ## plot resulting functions par(mfrow = c(2, 1), mar = c(3, 2, 1, 2)) plot(x, exp(res$phi), type = 'l'); rug(x) plot(x, res$phi, type = 'l'); rug(x) ## End(Not run)
set.seed(1977) x <- rgamma(200, 2, 1) ## Not run: res <- icmaLogCon(x, T1 = 2000, robustif = TRUE, print = TRUE) ## plot resulting functions par(mfrow = c(2, 1), mar = c(3, 2, 1, 2)) plot(x, exp(res$phi), type = 'l'); rug(x) plot(x, res$phi, type = 'l'); rug(x) ## End(Not run)
Computes the value of
where is the empirical distribution function of
, at all real numbers
in the
vector
. Note that
(so all elements in
) must lie in
.
The exact formula for
is
where .
intECDF(s, x)
intECDF(s, x)
s |
Vector of real numbers in |
x |
Vector |
Vector of the same length as , containing the values of
at the elements of
.
Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch
Lutz Duembgen, [email protected],
https://www.imsv.unibe.ch/about_us/staff/prof_dr_duembgen_lutz/index_eng.html
Duembgen, L. and Rufibach, K. (2009) Maximum likelihood estimation of a log–concave density and its distribution function: basic properties and uniform consistency. Bernoulli, 15(1), 40–68.
Duembgen, L. and Rufibach, K. (2011) logcondens: Computations Related to Univariate Log-Concave Density Estimation. Journal of Statistical Software, 39(6), 1–28. doi:10.18637/jss.v039.i06
Rufibach K. (2006) Log-concave Density Estimation and Bump Hunting for i.i.d. Observations.
PhD Thesis, University of Bern, Switzerland and Georg-August University of Goettingen, Germany, 2006.
Available at https://slsp-ube.primo.exlibrisgroup.com/permalink/41SLSP_UBE/17e6d97/alma99116730175505511.
This function together with intF
can be used to check the characterization of the log-concave density
estimator in terms of distribution functions, see Rufibach (2006) and Duembgen and Rufibach (2009).
# for an example see the function intF.
# for an example see the function intF.
Based on an object of class dlc
as output by the function logConDens
,
this function gives values of
at all numbers in . Note that
(so all elements in
) must lie in
. The exact formula for
is
where min
and
for ,
for any vector
and the function
introduced in
Jfunctions
.
intF(s, res)
intF(s, res)
s |
Vector of real numbers where the functions should be evaluated at. |
res |
An object of class |
Vector of the same length as , containing the values of
at the elements of
.
Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch
Lutz Duembgen, [email protected],
https://www.imsv.unibe.ch/about_us/staff/prof_dr_duembgen_lutz/index_eng.html
Duembgen, L. and Rufibach, K. (2009) Maximum likelihood estimation of a log–concave density and its distribution function: basic properties and uniform consistency. Bernoulli, 15(1), 40–68.
Duembgen, L. and Rufibach, K. (2011) logcondens: Computations Related to Univariate Log-Concave Density Estimation. Journal of Statistical Software, 39(6), 1–28. doi:10.18637/jss.v039.i06
Rufibach K. (2006) Log-concave Density Estimation and Bump Hunting for i.i.d. Observations.
PhD Thesis, University of Bern, Switzerland and Georg-August University of Goettingen, Germany, 2006.
Available at https://slsp-ube.primo.exlibrisgroup.com/permalink/41SLSP_UBE/17e6d97/alma99116730175505511.
This function uses the output of activeSetLogCon
. The function intECDF
is similar, but based
on the empirical distribution function.
## estimate gamma density set.seed(1977) x <- rgamma(200, 2, 1) res <- logConDens(x, smoothed = FALSE, print = FALSE) ## compute and plot the process D(t) in Duembgen and Rufibach (2009) s <- seq(min(res$x), max(res$x), by = 10 ^ -3) D1 <- intF(s, res) D2 <- intECDF(s, res$xn) par(mfrow = c(2, 1)) plot(res$x, res$phi, type = 'l'); rug(res$x) plot(s, D1 - D2, type = 'l'); abline(h = 0, lty = 2)
## estimate gamma density set.seed(1977) x <- rgamma(200, 2, 1) res <- logConDens(x, smoothed = FALSE, print = FALSE) ## compute and plot the process D(t) in Duembgen and Rufibach (2009) s <- seq(min(res$x), max(res$x), by = 10 ^ -3) D1 <- intF(s, res) D2 <- intECDF(s, res$xn) par(mfrow = c(2, 1)) plot(res$x, res$phi, type = 'l'); rug(res$x) plot(s, D1 - D2, type = 'l'); abline(h = 0, lty = 2)
Fits a vector with nondecreasing components to the data vector
such that
is minimal (pool - adjacent - violators algorithm). In case a weight vector with positive entries (and the same size as ) is provided, the function produces an isotonic vector minimizing
isoMean(y, w)
isoMean(y, w)
y |
Vector |
w |
Arbitrary vector |
Returns vector .
Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch
Lutz Duembgen, [email protected],
https://www.imsv.unibe.ch/about_us/staff/prof_dr_duembgen_lutz/index_eng.html
## simple regression model n <- 50 x <- sort(runif(n, 0, 1)) y <- x ^ 2 + rnorm(n, 0, 0.2) s <- seq(0, 1, by = 0.01) plot(s, s ^ 2, col = 2, type = 'l', xlim = range(c(0, 1, x)), ylim = range(c(0, 1 , y))); rug(x) ## plot pava result lines(x, isoMean(y, rep(1 / n, n)), type = 's')
## simple regression model n <- 50 x <- sort(runif(n, 0, 1)) y <- x ^ 2 + rnorm(n, 0, 0.2) s <- seq(0, 1, by = 0.01) plot(s, s ^ 2, col = 2, type = 'l', xlim = range(c(0, 1, x)), ylim = range(c(0, 1 , y))); rug(x) ## plot pava result lines(x, isoMean(y, rep(1 / n, n)), type = 's')
J00 represents the function where for real numbers
and
The functions Jab give the respective derivatives for
, i.e.
Specifically,
J00(x, y, v) J10(x, y) J11(x, y) J20(x, y)
J00(x, y, v) J10(x, y) J11(x, y) J20(x, y)
x |
Vector of length |
y |
Vector of length |
v |
Number in |
Value of the respective function.
Taylor approximations are used if is small. We refer to Duembgen et al (2011, Section 6) for
details.
These functions are not intended to be invoked by the end user.
Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch
Lutz Duembgen, [email protected],
https://www.imsv.unibe.ch/about_us/staff/prof_dr_duembgen_lutz/index_eng.html
Duembgen, L, Huesler, A. and Rufibach, K. (2010) Active set and EM algorithms for log-concave densities based on complete and censored data. Technical report 61, IMSV, Univ. of Bern, available at https://arxiv.org/abs/0707.4643.
Duembgen, L. and Rufibach, K. (2011) logcondens: Computations Related to Univariate Log-Concave Density Estimation. Journal of Statistical Software, 39(6), 1–28. doi:10.18637/jss.v039.i06
Gives the value of
where is parametrized via
Lhat_eta(x, w, eta)
Lhat_eta(x, w, eta)
x |
Vector of independent and identically distributed numbers, with strictly increasing entries. |
w |
Optional vector of nonnegative weights corresponding to |
eta |
Some vector |
Value of the log-likelihood function is returned.
This function is not intended to be invoked by the end user.
Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch
Lutz Duembgen, [email protected],
https://www.imsv.unibe.ch/about_us/staff/prof_dr_duembgen_lutz/index_eng.html
Gives the value of
Local_LL(x, w, phi)
Local_LL(x, w, phi)
x |
Vector of independent and identically distributed numbers, with strictly increasing entries. |
w |
Optional vector of nonnegative weights corresponding to |
phi |
Some vector |
Value of the log-likelihood function is returned.
This function is not intended to be invoked by the end user.
Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch
Lutz Duembgen, [email protected],
https://www.imsv.unibe.ch/about_us/staff/prof_dr_duembgen_lutz/index_eng.html
Computes the value of the log-likelihood function
a new candidate for via the Newton method as well as the directional derivative of
into that direction.
Local_LL_all(x, w, phi)
Local_LL_all(x, w, phi)
x |
Vector of independent and identically distributed numbers, with strictly increasing entries. |
w |
Optional vector of nonnegative weights corresponding to |
phi |
Some vector |
ll |
Value |
phi_new |
New candidate for |
dirderiv |
Directional derivative of |
This function is not intended to be invoked by the end user.
Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch
Lutz Duembgen, [email protected],
https://www.imsv.unibe.ch/about_us/staff/prof_dr_duembgen_lutz/index_eng.html
Compute approximate confidence interval for the true log-concave density, on a grid of points. Two main approaches are implemented: In the first, the confidence interval at a fixed point is based on the pointwise asymptotic theory for the log-concave maximum likelihood estimator (MLE) developed in Balabdaoui, Rufibach, and Wellner (2009). In the second, the confidence interval is estimated via the boostrap.
logConCI(res, xx0, conf.level = c(0.8, 0.9, 0.95, 0.99)[3], type = c("DR", "ks", "nrd", "ECDFboot", "NPMLboot")[2], htype = c("hscv", "hlscv", "hpi", "hns")[4], BB = 500)
logConCI(res, xx0, conf.level = c(0.8, 0.9, 0.95, 0.99)[3], type = c("DR", "ks", "nrd", "ECDFboot", "NPMLboot")[2], htype = c("hscv", "hlscv", "hpi", "hns")[4], BB = 500)
res |
An object of class |
xx0 |
Vector of grid points at which to calculate the confidence interval. |
conf.level |
Confidence level for the confidence interval(s). The default is 95%. |
type |
Vector of strings indicating type of confidence interval to compute. When |
htype |
Vector of strings indicating bandwidth selection method if |
BB |
number of iterations in the bootstrap if |
In Balabdaoui et al. (2009) it is shown that (if the true density is strictly log-concave) the limiting distribution of the MLE of a log-concave
density at a point
is
The nuisance parameter depends on the true density
and the second derivative of its logarithm. The limiting process
is found as the second derivative at zero of a particular operator (called the "envelope") of an integrated Brownian motion plus
.
Three of the confidence intervals are based on inverting the above limit using estimated quantiles of , and estimating the nuisance
parameter
. The options for the function
logConCI
provide different ways to estimate this nuisance parameter. If type = "DR"
,
is estimated using derivatives of the smoothed MLE as calculated by the function
logConDens
(this method does not perform well in
simulations and is therefore not recommended). If type="ks"
, is estimated using kernel density estimates of the true density and its
first and second derivatives. This is done using the
R
package ks, and, with this option, a bandwidth selection method htype
must also
be chosen. The choices in htype
correspond to the various options for bandwidth selection available in ks. If type = "nrd"
, the second
derivative of the logarithm of the true density in is estimated assuming a normal reference distribution.
Two of the confidence intervals are based on the bootstrap. For type = "ECDFboot"
confidence intervals based on re-sampling from the empirical
cumulative distribution function are computed. For type = "NPMLboot"
confidence intervals based on re-sampling from the nonparametric maximum
likelihood estimate of log-concave density are computed. Bootstrap confidence intervals take a few minutes to compute!
The default option is type = "ks"
with htype = "hns"
. Currently available confidence levels are 80%, 90%, 95% and 99%, with a default
of 95%.
Azadbakhsh et al. (2014) provides an empirical study of the relative performance of the various approaches available in this function.
The function returns a list containing the following elements:
fhat |
MLE evaluated at grid points. |
up_DR |
Upper confidence interval limit when |
lo_DR |
Lower confidence interval limit when |
up_ks_hscv |
Upper confidence interval limit when |
lo_ks_hscv |
Lower confidence interval limit when |
up_ks_hlscv |
Upper confidence interval limit when |
lo_ks_hlscv |
Lower confidence interval limit when |
up_ks_hpi |
Upper confidence interval limit when |
lo_ks_hpi |
Lower confidence interval limit when |
up_ks_hns |
Upper confidence interval limit when |
lo_ks_hns |
Lower confidence interval limit when |
up_nrd |
Upper confidence interval limit when |
lo_nrd |
Lower confidence interval limit when |
up_npml |
Upper confidence interval limit when |
lo_npml |
Lower confidence interval limit when |
up_ecdf |
Upper confidence interval limit when |
lo_ecdf |
Lower confidence interval limit when |
Mahdis Azadbakhsh
Hanna Jankowski, [email protected]
Azadbakhsh, M., Jankowski, H. and Gao, X. (2014). Computing confidence intervals for log-concave densities. Comput. Statist. Data Anal., 75, 248–264.
Baladbaoui, F., Rufibach, K. and Wellner, J. (2009) Limit distribution theory for maximum likelihood estimation of a log-concave density. Ann. Statist., 37(3), 1299–1331.
Tarn Duong (2012). ks: Kernel smoothing. R package version 1.8.10. https://CRAN.R-project.org/package=ks
## Not run: ## =================================================== ## Confidence intervals at a fixed point for the density ## =================================================== data(reliability) x.rel <- sort(reliability) # calculate 95 grid <- seq(min(x.rel), max(x.rel), length.out = 200) res <- logConDens(x.rel) ci <- logConCI(res, grid, type = c("nrd", "ECDFboot")) par(las = 1, mar = c(2.5, 3.5, 0.5, 0.5)) hist(x.rel, n = 25, col = gray(0.9), main = "", freq = FALSE, xlab = "", ylab = "", ylim = c(0, 0.0065), border = gray(0.5)) lines(grid, ci$fhat, col = "black", lwd = 2) lines(grid, ci$lo_nrd, col = "red", lwd = 2, lty = 2) lines(grid, ci$up_nrd, col = "red", lwd = 2, lty = 2) lines(grid, ci$lo_ecdf, col = "blue", lwd = 2, lty = 2) lines(grid, ci$up_ecdf, col = "blue", lwd = 2, lty = 2) legend("topleft", col = c("black", "blue", "red"), lwd = 2, lty = c(1, 2, 2), legend = c("log-concave NPMLE", "CI for type = nrd", "CI for type = ECDFboot"), bty = "n") ## End(Not run)
## Not run: ## =================================================== ## Confidence intervals at a fixed point for the density ## =================================================== data(reliability) x.rel <- sort(reliability) # calculate 95 grid <- seq(min(x.rel), max(x.rel), length.out = 200) res <- logConDens(x.rel) ci <- logConCI(res, grid, type = c("nrd", "ECDFboot")) par(las = 1, mar = c(2.5, 3.5, 0.5, 0.5)) hist(x.rel, n = 25, col = gray(0.9), main = "", freq = FALSE, xlab = "", ylab = "", ylim = c(0, 0.0065), border = gray(0.5)) lines(grid, ci$fhat, col = "black", lwd = 2) lines(grid, ci$lo_nrd, col = "red", lwd = 2, lty = 2) lines(grid, ci$up_nrd, col = "red", lwd = 2, lty = 2) lines(grid, ci$lo_ecdf, col = "blue", lwd = 2, lty = 2) lines(grid, ci$up_ecdf, col = "blue", lwd = 2, lty = 2) legend("topleft", col = c("black", "blue", "red"), lwd = 2, lty = c(1, 2, 2), legend = c("log-concave NPMLE", "CI for type = nrd", "CI for type = ECDFboot"), bty = "n") ## End(Not run)
Functions that are used by logConCI
and are not intended to be called by the user.
Mahdis Azadbakhsh
Hanna Jankowski, [email protected]
Azadbakhsh, M., Jankowski, H. and Gao, X. (2014). Computing confidence intervals for log-concave densities. Comput. Statist. Data Anal., 75, 248–264.
Baladbaoui, F., Rufibach, K. and Wellner, J. (2009). Limit distribution theory for maximum likelihood estimation of a log-concave density. Ann. Statist., 37(3), 1299–1331.
Tarn Duong (2012). ks: Kernel smoothing. R package version 1.8.10. https://CRAN.R-project.org/package=ks
Compute the log-concave and smoothed log-concave density estimator.
logConDens(x, xgrid = NULL, smoothed = TRUE, print = FALSE, gam = NULL, xs = NULL)
logConDens(x, xgrid = NULL, smoothed = TRUE, print = FALSE, gam = NULL, xs = NULL)
x |
Vector of independent and identically distributed numbers, not necessarily unique. |
xgrid |
Governs the generation of weights for observations. See |
smoothed |
If |
print |
|
gam |
Only necessary if |
xs |
Only necessary if |
See activeSetLogCon
for details on the computations.
logConDens
returns an object of class "dlc"
, a list containing the
following components:
xn
, x
, w
, phi
, IsKnot
, L
, Fhat
, H
,
n
, m
, knots
, mode
, and sig
as generated
by activeSetLogCon
. If smoothed = TRUE
, then the returned object additionally contains
f.smoothed
, F.smoothed
, gam
, and xs
as generated by evaluateLogConDens
. Finally, the
entry smoothed
of type "logical"
returnes the value of smoothed
.
The methods summary.dlc
and plot.dlc
are used to obtain a summary and generate plots of the estimated
density.
Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch
Lutz Duembgen, [email protected],
https://www.imsv.unibe.ch/about_us/staff/prof_dr_duembgen_lutz/index_eng.html
Duembgen, L, Huesler, A. and Rufibach, K. (2010). Active set and EM algorithms for log-concave densities based on complete and censored data. Technical report 61, IMSV, Univ. of Bern, available at https://arxiv.org/abs/0707.4643.
Duembgen, L. and Rufibach, K. (2009). Maximum likelihood estimation of a log–concave density and its distribution function: basic properties and uniform consistency. Bernoulli, 15(1), 40–68.
Duembgen, L. and Rufibach, K. (2011). logcondens: Computations Related to Univariate Log-Concave Density Estimation. Journal of Statistical Software, 39(6), 1–28. doi:10.18637/jss.v039.i06
## =================================================== ## Illustrate on simulated data ## =================================================== ## Set parameters n <- 50 x <- rnorm(n) res <- logConDens(x, smoothed = TRUE, print = FALSE, gam = NULL, xs = NULL) summary(res) plot(res, which = "density", legend.pos = "topright") plot(res, which = "log-density") plot(res, which = "CDF") ## Compute slopes and intercepts of the linear functions that ## compose phi slopes <- diff(res$phi) / diff(res$x) intercepts <- -slopes * res$x[-n] + res$phi[-n] ## =================================================== ## Illustrate method on reliability data ## Reproduce Fig. 2 in Duembgen & Rufibach (2009) ## =================================================== ## Set parameters data(reliability) x <- reliability n <- length(x) res <- logConDens(x, smooth = TRUE, print = TRUE) phi <- res$phi f <- exp(phi) ## smoothed log-concave PDF f.smoothed <- res$f.smoothed xs <- res$xs ## compute kernel density sig <- sd(x) h <- sig / sqrt(n) f.kernel <- rep(NA, length(xs)) for (i in 1:length(xs)){ xi <- xs[i] f.kernel[i] <- mean(dnorm(xi, mean = x, sd = h)) } ## compute normal density mu <- mean(x) f.normal <- dnorm(xs, mean = mu, sd = sig) ## =================================================== ## Plot resulting densities, i.e. reproduce Fig. 2 ## in Duembgen and Rufibach (2009) ## =================================================== plot(0, 0, type = 'n', xlim = range(xs), ylim = c(0, 6.5 * 10^-3)) rug(res$x) lines(res$x, f, col = 2) lines(xs, f.normal, col = 3) lines(xs, f.kernel, col = 4) lines(xs, f.smoothed, lwd = 3, col = 5) legend("topleft", c("log-concave", "normal", "kernel", "log-concave smoothed"), lty = 1, col = 2:5, bty = "n") ## =================================================== ## Plot log-densities ## =================================================== plot(0, 0, type = 'n', xlim = range(xs), ylim = c(-20, -5)) legend("bottomright", c("log-concave", "normal", "kernel", "log-concave smoothed"), lty = 1, col = 2:5, bty = "n") rug(res$x) lines(res$x, phi, col = 2) lines(xs, log(f.normal), col = 3) lines(xs, log(f.kernel), col = 4) lines(xs, log(f.smoothed), lwd = 3, col = 5) ## =================================================== ## Confidence intervals at a fixed point for the density ## see help file for logConCI() ## ===================================================
## =================================================== ## Illustrate on simulated data ## =================================================== ## Set parameters n <- 50 x <- rnorm(n) res <- logConDens(x, smoothed = TRUE, print = FALSE, gam = NULL, xs = NULL) summary(res) plot(res, which = "density", legend.pos = "topright") plot(res, which = "log-density") plot(res, which = "CDF") ## Compute slopes and intercepts of the linear functions that ## compose phi slopes <- diff(res$phi) / diff(res$x) intercepts <- -slopes * res$x[-n] + res$phi[-n] ## =================================================== ## Illustrate method on reliability data ## Reproduce Fig. 2 in Duembgen & Rufibach (2009) ## =================================================== ## Set parameters data(reliability) x <- reliability n <- length(x) res <- logConDens(x, smooth = TRUE, print = TRUE) phi <- res$phi f <- exp(phi) ## smoothed log-concave PDF f.smoothed <- res$f.smoothed xs <- res$xs ## compute kernel density sig <- sd(x) h <- sig / sqrt(n) f.kernel <- rep(NA, length(xs)) for (i in 1:length(xs)){ xi <- xs[i] f.kernel[i] <- mean(dnorm(xi, mean = x, sd = h)) } ## compute normal density mu <- mean(x) f.normal <- dnorm(xs, mean = mu, sd = sig) ## =================================================== ## Plot resulting densities, i.e. reproduce Fig. 2 ## in Duembgen and Rufibach (2009) ## =================================================== plot(0, 0, type = 'n', xlim = range(xs), ylim = c(0, 6.5 * 10^-3)) rug(res$x) lines(res$x, f, col = 2) lines(xs, f.normal, col = 3) lines(xs, f.kernel, col = 4) lines(xs, f.smoothed, lwd = 3, col = 5) legend("topleft", c("log-concave", "normal", "kernel", "log-concave smoothed"), lty = 1, col = 2:5, bty = "n") ## =================================================== ## Plot log-densities ## =================================================== plot(0, 0, type = 'n', xlim = range(xs), ylim = c(-20, -5)) legend("bottomright", c("log-concave", "normal", "kernel", "log-concave smoothed"), lty = 1, col = 2:5, bty = "n") rug(res$x) lines(res$x, phi, col = 2) lines(xs, log(f.normal), col = 3) lines(xs, log(f.kernel), col = 4) lines(xs, log(f.smoothed), lwd = 3, col = 5) ## =================================================== ## Confidence intervals at a fixed point for the density ## see help file for logConCI() ## ===================================================
The receiver operating characteristic (ROC) curve for two constituent distributions and
is defined as
for . It is typically used to assess the performance of a diagnostic test used to discriminate between healthy and diseased
individuals based on a continuous variable.
logConROC(cases, controls, grid, smooth = TRUE)
logConROC(cases, controls, grid, smooth = TRUE)
cases |
A vector of measurements for the cases. |
controls |
A vector of measurements for the controls. |
grid |
A vector specifying the grid where the ROC curve is computed on. |
smooth |
Logical, indicating whether ROC curve and AUC should also be computed based on the smoothed log-concave density estimator. |
In Rufibach (2011) it was shown that the ROC curve based on log-concave density estimates exhibit nice properties for finite sample sizes as well as asymptotically. Its performance is typically much better than that of the empirical ROC curve and only, if at all, sligthly worse compared to the binormal model when in fact the underlying densities are normal. However, log-concavity encompasses many parametric densities, so this new model is much more flexible than the binormal one, at little efficiency sacrifice.
m |
Number of control measurements. |
n |
Number of case measurements. |
fROC |
Estimated ROC curve based on the log-concave density estimate. |
fROC.smooth |
Estimated ROC curve based on the smoothed log-concave density estimate. |
res0 |
|
res1 |
|
Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch
Duembgen, L. and Rufibach, K. (2009). Maximum likelihood estimation of a log–concave density and its distribution function: basic properties and uniform consistency. Bernoulli, 15(1), 40–68.
Duembgen, L. and Rufibach, K. (2011). logcondens: Computations Related to Univariate Log-Concave Density Estimation. Journal of Statistical Software, 39(6), 1–28. doi:10.18637/jss.v039.i06
Rufibach, K. (2012). A smooth ROC curve estimator based on log-concave density estimates. Int. J. Biostat., 8(1), 1–29.
Confidence intervals at given false-positive fractions for the ROC curve based on log-concave densities can be computed using confIntBootLogConROC_t0
. For the computation of the AUC the function ROCx
is used. In the example below we analyze the
pancreas
data.
## ROC curve for pancreas data data(pancreas) status <- factor(pancreas[, "status"], levels = 0:1, labels = c("healthy", "diseased")) var <- log(pancreas[, "ca199"]) cases <- var[status == "diseased"] controls <- var[status == "healthy"] ## compute and plot empirical ROC curve ## code modified from https://stat.ethz.ch/pipermail/r-help/2008-October/178531.html xx <- c(-Inf, sort(unique(c(cases, controls))), Inf) sens <- sapply(xx, function(x){mean(cases >= x)}) spec <- sapply(xx, function(x){mean(controls < x)}) ## compute log-concave ROC curve grid <- seq(0, 1, by = 1 / 500) roc.logcon <- logConROC(cases, controls, grid) ## plot plot(0, 0, xlim = c(0, 1), ylim = c(0, 1), type = 'l', main = "ROC curves for pancreas data", xlab = "1 - specificity", ylab = "sensitivity", pty = 's') legend("bottomright", c("empirical ROC", "log-concave ROC", "smooth log-concave ROC"), lty = c(1, 1, 2), lwd = 2, col = 2:4, bty = "n") segments(0, 0, 1, 1, col = 1) lines(1 - spec, sens, type = 'l', col = 2, lwd = 2) lines(grid, roc.logcon$fROC, col = 3, lwd = 2) lines(grid, roc.logcon$fROC.smooth, col = 4, lwd = 2, lty = 2) ## Not run: ## bootstrap confidence intervals at 1 - specificity = 0.2 and 0.8: res <- confIntBootLogConROC_t0(controls, cases, grid = c(0.2, 0.8), conf.level = 0.95, M = 1000, smooth = TRUE, output = TRUE) res ## End(Not run)
## ROC curve for pancreas data data(pancreas) status <- factor(pancreas[, "status"], levels = 0:1, labels = c("healthy", "diseased")) var <- log(pancreas[, "ca199"]) cases <- var[status == "diseased"] controls <- var[status == "healthy"] ## compute and plot empirical ROC curve ## code modified from https://stat.ethz.ch/pipermail/r-help/2008-October/178531.html xx <- c(-Inf, sort(unique(c(cases, controls))), Inf) sens <- sapply(xx, function(x){mean(cases >= x)}) spec <- sapply(xx, function(x){mean(controls < x)}) ## compute log-concave ROC curve grid <- seq(0, 1, by = 1 / 500) roc.logcon <- logConROC(cases, controls, grid) ## plot plot(0, 0, xlim = c(0, 1), ylim = c(0, 1), type = 'l', main = "ROC curves for pancreas data", xlab = "1 - specificity", ylab = "sensitivity", pty = 's') legend("bottomright", c("empirical ROC", "log-concave ROC", "smooth log-concave ROC"), lty = c(1, 1, 2), lwd = 2, col = 2:4, bty = "n") segments(0, 0, 1, 1, col = 1) lines(1 - spec, sens, type = 'l', col = 2, lwd = 2) lines(grid, roc.logcon$fROC, col = 3, lwd = 2) lines(grid, roc.logcon$fROC.smooth, col = 4, lwd = 2, lty = 2) ## Not run: ## bootstrap confidence intervals at 1 - specificity = 0.2 and 0.8: res <- confIntBootLogConROC_t0(controls, cases, grid = c(0.2, 0.8), conf.level = 0.95, M = 1000, smooth = TRUE, output = TRUE) res ## End(Not run)
Compute -values for a test for the null hypothesis of equal CDFs of two samples. The test
statistic is reminiscient of Kolmogorv-Smirnov's, but instead of computing it for the empirical CDFs, this function
computes it based on log-concave estimates for the CDFs.
logconTwoSample(x, y, which = c("MLE", "smooth"), M = 999, n.grid = 500, display = TRUE, seed0 = 1977)
logconTwoSample(x, y, which = c("MLE", "smooth"), M = 999, n.grid = 500, display = TRUE, seed0 = 1977)
x |
First data sample. |
y |
Second data sample. |
which |
Indicate for which type of estimate the test statistic should be computed. |
M |
Number of permutations. |
n.grid |
Number of grid points in computation of maximal difference between smoothed log-concave CDFs. See |
display |
If |
seed0 |
Set seed to reproduce results. |
Given two i.i.d. samples and
this function computes a permutation
test
-value that provides evidence against the null hypothesis
where are the CDFs of the samples, respectively. A test either based on the log-concave MLE or on its
smoothed version (see Duembgen and Rufibach, 2009, Section 3) are provided. Note that computation of the smoothed
version takes considerably more time.
p.value |
A two dimensional vector containing the |
test.stat.orig |
The test statistics for the original samples. |
test.stats |
A |
Note that the algorithm that finds the maximal difference for the smoothed estimate is of approximative nature only. It may fail for very large sample sizes.
Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch
Lutz Duembgen, [email protected],
https://www.imsv.unibe.ch/about_us/staff/prof_dr_duembgen_lutz/index_eng.html
Duembgen, L. and Rufibach, K. (2009) Maximum likelihood estimation of a log–concave density and its distribution function: basic properties and uniform consistency. Bernoulli, 15(1), 40–68.
Duembgen, L. and Rufibach, K. (2011) logcondens: Computations Related to Univariate Log-Concave Density Estimation. Journal of Statistical Software, 39(6), 1–28. doi:10.18637/jss.v039.i06
## Not run: n1 <- 30 n2 <- 25 x <- rgamma(n1, 2, 1) y <- rgamma(n2, 2, 1) + 1 twosample <- logconTwoSample(x, y, which = c("MLE", "smooth")[1], M = 999) ## End(Not run)
## Not run: n1 <- 30 n2 <- 25 x <- rgamma(n1, 2, 1) y <- rgamma(n2, 2, 1) + 1 twosample <- logconTwoSample(x, y, which = c("MLE", "smooth")[1], M = 999) ## End(Not run)
Compute the maximal difference between two estimated log-concave distribution functions, either the MLEs or the smoothed versions. This function is used to set up a two-sample permutation test that assesses the null hypothesis of equality of distribution functions.
maxDiffCDF(res1, res2, which = c("MLE", "smooth"), n.grid = 500)
maxDiffCDF(res1, res2, which = c("MLE", "smooth"), n.grid = 500)
res1 |
An object of class |
res2 |
An object of class |
which |
Indicate for which type of estimate the maximal difference should be computed. |
n.grid |
Number of grid points used to find zeros of |
Given two i.i.d. samples and
this function computes the
maxima of
and
test.stat |
A two-dimensional vector containing the above maxima. |
location |
A two-dimensional vector where the maxima occur. |
Note that the algorithm that finds the maximal difference for the smoothed estimate is of approximative nature only. It may fail for very large sample sizes.
Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch
Lutz Duembgen, [email protected],
https://www.imsv.unibe.ch/about_us/staff/prof_dr_duembgen_lutz/index_eng.html
Duembgen, L. and Rufibach, K. (2009) Maximum likelihood estimation of a log–concave density and its distribution function: basic properties and uniform consistency. Bernoulli, 15(1), 40–68.
Duembgen, L. and Rufibach, K. (2011) logcondens: Computations Related to Univariate Log-Concave Density Estimation. Journal of Statistical Software, 39(6), 1–28. doi:10.18637/jss.v039.i06
n1 <- 100 n2 <- 120 x <- sort(rgamma(n1, 2, 1)) y <- sort(rgamma(n2, 2, 1)) res1 <- logConDens(x, smoothed = TRUE) res2 <- logConDens(y, smoothed = TRUE) d <- maxDiffCDF(res1, res2, n.grid = 200) ## log-concave estimate xs <- seq(min(res1$xs, res2$xs), max(res1$xs, res2$xs), length = 200) F1 <- matrix(NA, nrow = length(xs), ncol = 1); F2 <- F1 for (i in 1:length(xs)){ F1[i] <- evaluateLogConDens(xs[i], res1, which = 3)[, "CDF"] F2[i] <- evaluateLogConDens(xs[i], res2, which = 3)[, "CDF"] } par(mfrow = c(1, 2)) plot(xs, abs(F1 - F2), type = "l") abline(v = d$location[1], lty = 2, col = 3) abline(h = d$test.stat[1], lty = 2, col = 3) ## smooth estimate xs <- seq(min(res1$xs, res2$xs), max(res1$xs, res2$xs), length = 200) F1smooth <- matrix(NA, nrow = length(xs), ncol = 2); F2smooth <- F1smooth for (i in 1:length(xs)){ F1smooth[i, ] <- evaluateLogConDens(xs[i], res1, which = 4:5)[, c("smooth.density", "smooth.CDF")] F2smooth[i, ] <- evaluateLogConDens(xs[i], res2, which = 4:5)[, c("smooth.density", "smooth.CDF")] } plot(xs, abs(F1smooth[, 2] - F2smooth[, 2]), type = "l") abline(h = 0) abline(v = d$location[2], lty = 2, col = c(3, 4)) abline(h = d$test.stat[2], lty = 2, col = c(3, 4))
n1 <- 100 n2 <- 120 x <- sort(rgamma(n1, 2, 1)) y <- sort(rgamma(n2, 2, 1)) res1 <- logConDens(x, smoothed = TRUE) res2 <- logConDens(y, smoothed = TRUE) d <- maxDiffCDF(res1, res2, n.grid = 200) ## log-concave estimate xs <- seq(min(res1$xs, res2$xs), max(res1$xs, res2$xs), length = 200) F1 <- matrix(NA, nrow = length(xs), ncol = 1); F2 <- F1 for (i in 1:length(xs)){ F1[i] <- evaluateLogConDens(xs[i], res1, which = 3)[, "CDF"] F2[i] <- evaluateLogConDens(xs[i], res2, which = 3)[, "CDF"] } par(mfrow = c(1, 2)) plot(xs, abs(F1 - F2), type = "l") abline(v = d$location[1], lty = 2, col = 3) abline(h = d$test.stat[1], lty = 2, col = 3) ## smooth estimate xs <- seq(min(res1$xs, res2$xs), max(res1$xs, res2$xs), length = 200) F1smooth <- matrix(NA, nrow = length(xs), ncol = 2); F2smooth <- F1smooth for (i in 1:length(xs)){ F1smooth[i, ] <- evaluateLogConDens(xs[i], res1, which = 4:5)[, c("smooth.density", "smooth.CDF")] F2smooth[i, ] <- evaluateLogConDens(xs[i], res2, which = 4:5)[, c("smooth.density", "smooth.CDF")] } plot(xs, abs(F1smooth[, 2] - F2smooth[, 2]), type = "l") abline(h = 0) abline(v = d$location[2], lty = 2, col = c(3, 4)) abline(h = d$test.stat[2], lty = 2, col = c(3, 4))
Given a vector of observations with pairwise distinct entries and
a vector of weights
s.t.
, this function computes a function
(represented by the vector
) supported by
such that
is maximal over all continuous, piecewise linear functions with knots in
MLE(x, w = NA, phi_o = NA, prec = 1e-7, print = FALSE)
MLE(x, w = NA, phi_o = NA, prec = 1e-7, print = FALSE)
x |
Vector of independent and identically distributed numbers, with strictly increasing entries. |
w |
Optional vector of nonnegative weights corresponding to |
phi_o |
Optional starting vector. |
prec |
Threshold for the directional derivative during the Newton-Raphson procedure. |
print |
print = TRUE outputs log-likelihood in every loop, print = FALSE does not. Make sure to tell R to output (press CTRL+W). |
phi |
Resulting column vector |
L |
Value |
Fhat |
Vector of the same length as
for |
This function is not intended to be invoked by the end user.
Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch
Lutz Duembgen, [email protected],
https://www.imsv.unibe.ch/about_us/staff/prof_dr_duembgen_lutz/index_eng.html
First published by Wieand et al (1989), this dataset contains data on serum measurements for a cancer antigen (CA-125) and a carbohydrate antigen (CA19.9) both of which are measured on a continuous non-negative scale. The measurements were taken within a case-control study on 90 cases with pancreatic cancer and 51 controls who did not have cancer but pancreatitis. The primary question of the study was which one of the two biomarkers best distinguishes cases from controls.
data(pancreas)
data(pancreas)
A data frame with 141 observations on the following 3 variables.
ca199
CA19.9 measurements.
ca125
CA125 measurements.
status
Patient status, 0 for controls and 1 for cases.
The data was downloaded from http://labs.fhcrc.org/pepe/book/ on February 2, 2011.
Wieand, S., Gail, M. H., James, B. R., and James, K.L. (1989). A family of nonparametric statistics for comparing diagnostic markers with paired or unpaired data. Biometrika, 76, 585–592.
Pepe, M.S. (2003) The statistical evaluation of medical tests for classification and prediction. Oxford: Oxford University Press (Section 1.3.3).
This data is analyzed in the help file for the function logConROC
.
plot
method for class "dlc"
.
Three plots (selectable by which
) are currently available: a plot of the estimated density,
the estimated log-density, or the distribution function corresponding to the estimated log-concave density.
By default, a plot of the density estimate is provided. If smoothed = TRUE
, the smoothed version of
the log-concave density estimate (saved in x
) is added to the density and log-density plot.
For the CDF, the smoothed version is not contained by default in a dlc
object and needs to be computed
when asked to be plotted.
## S3 method for class 'dlc' plot(x, which = c("density", "log-density", "CDF"), add.title = TRUE, legend.pos = "topright", ...)
## S3 method for class 'dlc' plot(x, which = c("density", "log-density", "CDF"), add.title = TRUE, legend.pos = "topright", ...)
x |
An object of class |
which |
One of |
add.title |
Logical, if |
legend.pos |
Placement of the legend. One of |
... |
Further arguments. |
See activeSetLogCon
and evaluateLogConDens
for details on the computations.
Chosen plot is generated.
Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch
Lutz Duembgen, [email protected],
https://www.imsv.unibe.ch/about_us/staff/prof_dr_duembgen_lutz/index_eng.html
Duembgen, L, Huesler, A. and Rufibach, K. (2010). Active set and EM algorithms for log-concave densities based on complete and censored data. Technical report 61, IMSV, Univ. of Bern, available at https://arxiv.org/abs/0707.4643.
Duembgen, L. and Rufibach, K. (2009). Maximum likelihood estimation of a log–concave density and its distribution function: basic properties and uniform consistency. Bernoulli, 15(1), 40–68.
Duembgen, L. and Rufibach, K. (2011) logcondens: Computations Related to Univariate Log-Concave Density Estimation. Journal of Statistical Software, 39(6), 1–28. doi:10.18637/jss.v039.i06
## See help file of function "logConDens".
## See help file of function "logConDens".
Generates weights from initial sample.
preProcess(x, xgrid = NULL)
preProcess(x, xgrid = NULL)
x |
Vector of independent and identically distributed numbers, not necessarily unique. |
xgrid |
Parameter that governs the generation of weights: If |
x |
Vector of unique and sorted observations deduced from the input |
w |
Vector of corresponding weights, normalized to sum to one. |
sig |
Standard deviation of the inputed observations. This quantity is needed when computing the smoothed
log-concave density estimator via |
n |
Number of initial observations. |
This function is not intended to be invoked by the end user.
Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch
Lutz Duembgen, [email protected],
https://www.imsv.unibe.ch/about_us/staff/prof_dr_duembgen_lutz/index_eng.html
This function is used in the computation of and
.
Q00(x, a, u, v, gamma, QFhat = FALSE)
Q00(x, a, u, v, gamma, QFhat = FALSE)
x |
Number at which to compute |
a |
Vector of length |
u |
Vector of length |
v |
Vector of length |
gamma |
Real number. Standard deviation to be used. |
QFhat |
Logical. Should |
The vector(s) and/or
.
Taylor approximation is used if is small. In addition, as described in Duembgen and Rufibach (2011) at
the end of Appendix C, in extreme situations, e.g. when data sets contain extreme spacings, numerical problems may
occur in the computation of the function
(eq. (7) in Duembgen and Rufibach, 2011).
For it may happen that the exponent is rather large while the difference of Gaussian CDFs is very
small. To moderate these problems, we are using the following bounds:
for arbitrary numbers and
,
.
However, the function Q00
is not intended to be invoked by the end user.
Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch
Lutz Duembgen, [email protected],
https://www.imsv.unibe.ch/about_us/staff/prof_dr_duembgen_lutz/index_eng.html
Duembgen, L. and Rufibach, K. (2011) logcondens: Computations Related to Univariate Log-Concave Density Estimation. Journal of Statistical Software, 39(6), 1–28. https://www.jstatsoft.org/v39/i06
Suppose the random variable has density function
for an arbitrary real number and
. The function
qloglin
is simply the
quantile function
in this model, for . This quantile function is used for the computation of quantiles of
in
quantilesLogConDens
.
qloglin(u, t)
qloglin(u, t)
u |
Vector in |
t |
Parameter |
z |
Vector containing the quantiles |
Taylor approximation is used if is small.
This function is not intended to be called by the end user.
Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch
Lutz Duembgen, [email protected],
https://www.imsv.unibe.ch/about_us/staff/prof_dr_duembgen_lutz/index_eng.html
Computes gradient and diagonal of the Hesse matrix w.r.t. to of a quadratic approximation to the
reparametrized original log-likelihood function
where is parametrized via
: vector
representing concave, piecewise linear function
,
: vector representing successive slopes of
quadDeriv(dx, w, eta)
quadDeriv(dx, w, eta)
dx |
Vector |
w |
Vector of weights as in |
eta |
Vector |
matrix. First column contains gradient and second column diagonal of Hesse matrix.
This function is not intended to be invoked by the end user.
Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch
Lutz Duembgen, [email protected],
https://www.imsv.unibe.ch/about_us/staff/prof_dr_duembgen_lutz/index_eng.html
quadDeriv
is used by the function icmaLogCon
.
Function to compute -quantile of
where is the log-concave density estimator, typically computed via
logConDens
and runs through the vector
ps
.
The formula to compute a quantile at for
is:
where is described in
qloglin
.
quantilesLogConDens(ps, res)
quantilesLogConDens(ps, res)
ps |
Vector of real numbers where quantiles should be computed. |
res |
An object of class |
Returns a data.frame with row where
and
runs through
ps
.
Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch
Lutz Duembgen, [email protected],
https://www.imsv.unibe.ch/about_us/staff/prof_dr_duembgen_lutz/index_eng.html
## estimate gamma density set.seed(1977) x <- rgamma(200, 2, 1) res <- logConDens(x, smoothed = FALSE, print = FALSE) ## compute 0.95 quantile of Fhat q <- quantilesLogConDens(0.95, res)[, "quantile"] plot(res, which = "CDF", legend.pos = "none") abline(h = 0.95, lty = 3); abline(v = q, lty = 3)
## estimate gamma density set.seed(1977) x <- rgamma(200, 2, 1) res <- logConDens(x, smoothed = FALSE, print = FALSE) ## compute 0.95 quantile of Fhat q <- quantilesLogConDens(0.95, res)[, "quantile"] plot(res, which = "CDF", legend.pos = "none") abline(h = 0.95, lty = 3); abline(v = q, lty = 3)
Dataset that contains the data analyzed in Duembgen and Rufibach (2009, Figure 2).
data(reliability)
data(reliability)
A vector containing the 786 observations analyzed in Duembgen and Rufibach (2009, Figure 2).
The data was taken from Duembgen and Rufibach (2009).
Duembgen, L. and Rufibach, K. (2009) Maximum likelihood estimation of a log–concave density and its distribution function: basic properties and uniform consistency. Bernoulli, 15(1), 40–68.
Duembgen, L. and Rufibach, K. (2011) logcondens: Computations Related to Univariate Log-Concave Density Estimation. Journal of Statistical Software, 39(6), 1–28. doi:10.18637/jss.v039.i06
See the example in logConDens
for the analysis of this data.
Given a vector representing the values of a piecewise linear concave function at
etaphi
returns a column vector with the entries
The function phieta
returns a vector with the entries
etaphi(x, eta) phieta(x, phi)
etaphi(x, eta) phieta(x, phi)
x |
Vector of independent and identically distributed numbers, with strictly increasing entries. |
eta |
Vector with entries |
phi |
Vector with entries |
These functions are not intended to be invoked by the end user.
Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch
Lutz Duembgen, [email protected],
https://www.imsv.unibe.ch/about_us/staff/prof_dr_duembgen_lutz/index_eng.html
Generate a random sample from a distribution with density and
,
as described in Duembgen and Rufibach (2009, Section 3).
rlogcon(n, x0)
rlogcon(n, x0)
n |
Size of random sample to be generated. |
x0 |
Sorted vector of independent and identically distributed numbers, not necessarily unique. |
X |
Random sample from |
X_star |
Random sample from |
U |
Uniform random sample of size |
Z |
Normal random sample of size |
f |
Computed log-concave density estimator. |
f.smoothed |
List containing smoothed log-concave density estimator, as output by |
x |
Vector of distinct observations generated from |
w |
Weights corresponding to |
Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch
Lutz Duembgen, [email protected],
https://www.imsv.unibe.ch/about_us/staff/prof_dr_duembgen_lutz/index_eng.html
Duembgen, L. and Rufibach, K. (2009) Maximum likelihood estimation of a log–concave density and its distribution function: basic properties and uniform consistency. Bernoulli, 15(1), 40–68.
Duembgen, L. and Rufibach, K. (2011) logcondens: Computations Related to Univariate Log-Concave Density Estimation. Journal of Statistical Software, 39(6), 1–28. doi:10.18637/jss.v039.i06
## =================================================== ## Generate random samples as described in Section 3 of ## Duembgen and Rufibach (2009) ## =================================================== x0 <- rnorm(111) n <- 22 random <- rlogcon(n, x0) ## sample of size n from the log-concave density estimator random$X ## sample of size n from the smoothed log-concave density estimator random$X_star
## =================================================== ## Generate random samples as described in Section 3 of ## Duembgen and Rufibach (2009) ## =================================================== x0 <- rnorm(111) n <- 22 random <- rlogcon(n, x0) ## sample of size n from the log-concave density estimator random$X ## sample of size n from the smoothed log-concave density estimator random$X_star
Performs robustification and Hermite interpolation in the iterative convex minorant algorithm as described in Rufibach (2006, 2007).
robust(x, w, eta, etanew, grad)
robust(x, w, eta, etanew, grad)
x |
Vector of independent and identically distributed numbers, with strictly increasing entries. |
w |
Optional vector of nonnegative weights corresponding to |
eta |
Current candidate vector. |
etanew |
New candidate vector. |
grad |
Gradient of L at current candidate vector |
Returns a (possibly) new vector on the segment
such that the log-likelihood of this new is strictly greater than that of the initial
and
is chosen
according to the Hermite interpolation procedure described in Rufibach (2006, 2007).
This function is not intended to be invoked by the end user.
Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch
Lutz Duembgen, [email protected],
https://www.imsv.unibe.ch/about_us/staff/prof_dr_duembgen_lutz/index_eng.html
Rufibach K. (2006) Log-concave Density Estimation and Bump Hunting for i.i.d. Observations.
PhD Thesis, University of Bern, Switzerland and Georg-August University of Goettingen, Germany, 2006.
Available at https://slsp-ube.primo.exlibrisgroup.com/permalink/41SLSP_UBE/17e6d97/alma99116730175505511.
Rufibach, K. (2007) Computing maximum likelihood estimators of a log-concave density function. J. Stat. Comput. Simul. 77, 561–574.
Computes the value of the ROC curve at (which may be a vector) based on log-concave density estimates of the constituent distributions.
ROCx(x, res0, res1, smooth = FALSE)
ROCx(x, res0, res1, smooth = FALSE)
x |
Vector of numbers in |
res0 |
|
res1 |
|
smooth |
Logical. If |
A real number or a vector of dimension the same as , the value of the ROC curve at
x
.
Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch
Rufibach, K. (2012). A smooth ROC curve estimator based on log-concave density estimates. Int. J. Biostat., 8(1), 1–29.
Used for the computation of AUC in logConROC
.
summary
method for class "dlc"
.
## S3 method for class 'dlc' summary(object, ...)
## S3 method for class 'dlc' summary(object, ...)
object |
An object of class |
... |
Further arguments. |
See activeSetLogCon
and evaluateLogConDens
for details on the computations.
The function summary.dlc
returns a list of summary statistics of the estimated
log-concave density as well as of its smoothed version (depending on the value of smoothed
when calling
logConDens
).
Note that the numbering of knots in the output relies on the vector of unique observations.
Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch
Lutz Duembgen, [email protected],
https://www.imsv.unibe.ch/about_us/staff/prof_dr_duembgen_lutz/index_eng.html
Duembgen, L, Huesler, A. and Rufibach, K. (2010). Active set and EM algorithms for log-concave densities based on complete and censored data. Technical report 61, IMSV, Univ. of Bern, available at https://arxiv.org/abs/0707.4643.
Duembgen, L. and Rufibach, K. (2009). Maximum likelihood estimation of a log–concave density and its distribution function: basic properties and uniform consistency. Bernoulli, 15(1), 40–68.
Duembgen, L. and Rufibach, K. (2011) logcondens: Computations Related to Univariate Log-Concave Density Estimation. Journal of Statistical Software, 39(6), 1–28. doi:10.18637/jss.v039.i06
## See help file of function "logConDens".
## See help file of function "logConDens".