Title: Estimation of (Local) False Discovery Rates and Higher Criticism
Description: Estimates both tail area-based false discovery rates (Fdr) as well as local false discovery rates (fdr) for a variety of null models (p-values, z-scores, correlation coefficients, t-scores). The proportion of null values and the parameters of the null distribution are adaptively estimated from the data. In addition, the package contains functions for non-parametric density estimation (Grenander estimator), for monotone regression (isotonic regression and antitonic regression with weights), for computing the greatest convex minorant (GCM) and the least concave majorant (LCM), for the half-normal and correlation distributions, and for computing empirical higher criticism (HC) scores and the corresponding decision threshold.
Authors: Bernd Klaus [aut], Korbinian Strimmer [aut, cre]
Maintainer: Korbinian Strimmer <[email protected]>
License: GPL (>= 3)
Version: 1.2.18
Built: 2024-11-19 06:45:16 UTC
Source: CRAN
censored.fit fits a null distribution to censored data.
fndr.cutoff finds a suitable cutoff point based on the (approximate) false non-discovery rate (FNDR).
censored.fit(x, cutoff, statistic=c("normal", "correlation", "pvalue", "studentt"))
fndr.cutoff(x, statistic=c("normal", "correlation", "pvalue", "studentt"))
x: vector of test statistics.
cutoff: truncation point (this may be a single value or a vector).
statistic: type of statistic - "normal", "correlation", "pvalue", or "studentt".
As the null model, a truncated normal, truncated Student t, or truncated correlation density is assumed. The truncation point is specified by the cutoff parameter. All data points whose absolute value is larger than the cutoff point are ignored when fitting the truncated null model via maximum likelihood. The total number of data points is used only to estimate the fraction of null values eta0.
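To illustrate the idea, here is a minimal sketch that fits the scale parameter of a censored normal null model by maximizing the truncated-normal likelihood of the data points within the cutoff. This is a simplified illustration, not the package's exact implementation:

# simulated z-scores: mostly null plus some signal
set.seed(1)
z = c(rnorm(950, sd=2), runif(50, 5, 10))
cutoff = 3

# negative log-likelihood of the null scale sd0, using only |z| <= cutoff
# and renormalizing the normal density to the interval [-cutoff, cutoff]
z0 = z[abs(z) <= cutoff]
negloglik = function(sd0) {
  -sum(dnorm(z0, sd=sd0, log=TRUE)) +
    length(z0) * log(pnorm(cutoff, sd=sd0) - pnorm(-cutoff, sd=sd0))
}
sd.hat = optimize(negloglik, interval=c(0.01, 10))$minimum
sd.hat  # should be close to the true null scale of 2

# compare with the package's censored fit
library("fdrtool")
censored.fit(z, cutoff=3, statistic="normal")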
censored.fit returns a matrix whose rows contain the estimated parameters and corresponding errors for each cutoff point.
fndr.cutoff returns a tentative cutoff point.
# load "fdrtool" library library("fdrtool") # simulate normal data sd.true = 2.232 n = 5000 z = rnorm(n, sd=sd.true) censored.fit(z, c(2,3,5), statistic="normal") # simulate contaminated mixture of correlation distribution r = rcor0(700, kappa=10) u1 = runif(200, min=-1, max=-0.7) u2 = runif(200, min=0.7, max=1) rc = c(r, u1, u2) censored.fit(r, 0.7, statistic="correlation") censored.fit(rc, 0.7, statistic="correlation") # pvalue example data(pvalues) co = fndr.cutoff(pvalues, statistic="pvalue") co censored.fit(pvalues, cutoff=co, statistic="pvalue")
# load "fdrtool" library library("fdrtool") # simulate normal data sd.true = 2.232 n = 5000 z = rnorm(n, sd=sd.true) censored.fit(z, c(2,3,5), statistic="normal") # simulate contaminated mixture of correlation distribution r = rcor0(700, kappa=10) u1 = runif(200, min=-1, max=-0.7) u2 = runif(200, min=0.7, max=1) rc = c(r, u1, u2) censored.fit(r, 0.7, statistic="correlation") censored.fit(rc, 0.7, statistic="correlation") # pvalue example data(pvalues) co = fndr.cutoff(pvalues, statistic="pvalue") co censored.fit(pvalues, cutoff=co, statistic="pvalue")
These functions describe the distribution of the Pearson correlation coefficient r assuming that there is no correlation present (rho = 0).

Note that the distribution has only a single parameter: the degree of freedom kappa, which is equal to the inverse of the variance of the distribution.

The theoretical value of kappa depends both on the sample size n and the number p of considered variables. If a simple correlation coefficient between two variables (p=2) is considered, the degree of freedom equals kappa = n-1. However, if a partial correlation coefficient is considered (conditioned on p-2 remaining variables), the degree of freedom is kappa = n-1-(p-2) = n-p+1.
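As a small illustration of these relations (the helper kappa.cor0 below is hypothetical and not part of fdrtool):

library("fdrtool")

# degree of freedom from sample size n and number of variables p
kappa.cor0 = function(n, p=2) n - p + 1  # hypothetical helper

kappa.cor0(n=20)        # simple correlation (p=2): kappa = 19
kappa.cor0(n=20, p=5)   # partial correlation given p-2=3 variables: kappa = 16

# kappa equals the inverse variance of the null distribution of r
r = rcor0(50000, kappa=19)
1/var(r)  # should be close to 19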
dcor0(x, kappa, log=FALSE)
pcor0(q, kappa, lower.tail=TRUE, log.p=FALSE)
qcor0(p, kappa, lower.tail=TRUE, log.p=FALSE)
rcor0(n, kappa)
x, q: vector of sample correlations.
p: vector of probabilities.
kappa: the degree of freedom of the distribution (= inverse variance).
n: number of values to generate. If n is a vector, length(n) values will be generated.
log, log.p: logical vector; if TRUE, probabilities p are given as log(p).
lower.tail: logical vector; if TRUE (default), probabilities are P[R <= r], otherwise P[R > r].
For density and distribution functions as well as a corresponding random number generator of the correlation coefficient for arbitrary non-vanishing correlation rho, please refer to the SuppDists package by Bob Wheeler (available on CRAN). Note that the parameter N in his dPearson function corresponds to N = kappa+1.
dcor0 gives the density, pcor0 gives the distribution function, qcor0 gives the quantile function, and rcor0 generates random deviates.
Korbinian Strimmer (https://strimmerlab.github.io).
See also cor.
# load fdrtool library
library("fdrtool")

# distribution of r for various degrees of freedom
x = seq(-1,1,0.01)
y1 = dcor0(x, kappa=7)
y2 = dcor0(x, kappa=15)
plot(x,y2,type="l", xlab="r", ylab="pdf", xlim=c(-1,1), ylim=c(0,2))
lines(x,y1)

# simulated data
r = rcor0(1000, kappa=7)
hist(r, freq=FALSE, xlim=c(-1,1), ylim=c(0,5))
lines(x,y1,type="l")

# distribution function
pcor0(-0.2, kappa=15)
fdrtool takes a vector of z-scores (or of correlations, p-values, or t-statistics), and estimates for each case both the tail area-based Fdr as well as the density-based fdr (= q-value resp. local false discovery rate). The parameters of the null distribution are estimated adaptively from the data (except for the case of p-values where this is not necessary).
fdrtool(x, statistic=c("normal", "correlation", "pvalue"),
        plot=TRUE, color.figure=TRUE, verbose=TRUE,
        cutoff.method=c("fndr", "pct0", "locfdr"), pct0=0.75)
x: vector of the observed test statistics.
statistic: one of "normal" (default), "correlation", "pvalue". This specifies the null model.
plot: plot a figure with estimated densities, distribution functions, and (local) false discovery rates.
verbose: print out status messages.
cutoff.method: one of "fndr" (default), "pct0", "locfdr".
pct0: fraction of data used for fitting the null model - only used if cutoff.method="pct0".
color.figure: determines whether a color figure or a black and white figure is produced (defaults to TRUE, i.e. to a color figure).
The algorithm implemented in this function proceeds as follows:

1. A suitable cutoff point is determined. If cutoff.method is "fndr" then first an approximate null model is fitted and subsequently a cutoff point is sought with false nondiscovery rate as small as possible (see fndr.cutoff). If cutoff.method is "pct0" then a specified quantile (default value: 0.75) of the data is used as the cutoff point. If cutoff.method equals "locfdr" then the heuristic of the "locfdr" package (version 1.1-6) is employed to find the cutoff (z-scores and correlations only).

2. The parameters of the null model are estimated from the data using censored.fit. This results in estimates for the scale parameter and the proportion of null values (eta0).

3. Subsequently the corresponding p-values are computed, and a modified grenander algorithm is employed to obtain the overall density and distribution function (note that this respects the estimated eta0).

4. Finally, q-values and local fdr values are computed for each case.

The assumed null models all have (except for p-values) one free scale parameter. Note that the z-scores and the correlations are assumed to have zero mean.
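For illustration, the following sketch walks through these steps manually for z-scores using the package's building blocks. It is not the internal code of fdrtool, and the column names used to extract the censored.fit results ("sd", "eta0") are assumptions:

library("fdrtool")

set.seed(1)
z = c(rnorm(500, sd=2), runif(30, 5, 10))  # null z-scores plus some signal

# step 1: determine a cutoff via the approximate FNDR criterion
co = fndr.cutoff(z, statistic="normal")

# step 2: fit the truncated null model to obtain scale and eta0
cf = censored.fit(z, cutoff=co, statistic="normal")
sd0  = cf[1, "sd"]    # assumed column name for the estimated scale
eta0 = cf[1, "eta0"]  # assumed column name for the null proportion

# step 3: compute p-values under the fitted null and estimate fdr/Fdr
pv = 2 - 2*pnorm(abs(z), sd=sd0)
fdr = fdrtool(pv, statistic="pvalue", plot=FALSE)
head(fdr$lfdr)  # local fdr values

# compare with the all-in-one call
fdr2 = fdrtool(z, statistic="normal", plot=FALSE)
head(fdr2$lfdr)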
A list with the following components:
pval: a vector with p-values for each case.
qval: a vector with q-values (Fdr) for each case.
lfdr: a vector with local fdr values for each case.
statistic: the specified type of null model.
param: a vector containing the estimated parameters (the null proportion eta0 and the free parameter of the null model).
Korbinian Strimmer (https://strimmerlab.github.io).
Strimmer, K. (2008a). A unified approach to false discovery rate estimation. BMC Bioinformatics 9: 303. <DOI:10.1186/1471-2105-9-303>
Strimmer, K. (2008b). fdrtool: a versatile R package for estimating local and tail area- based false discovery rates. Bioinformatics 24: 1461-1462. <DOI:10.1093/bioinformatics/btn209>
See also pval.estimate.eta0, censored.fit.
# load "fdrtool" library and p-values library("fdrtool") data(pvalues) # estimate fdr and Fdr from p-values data(pvalues) fdr = fdrtool(pvalues, statistic="pvalue") fdr$qval # estimated Fdr values fdr$lfdr # estimated local fdr # the same but with black and white figure fdr = fdrtool(pvalues, statistic="pvalue", color.figure=FALSE) # estimate fdr and Fdr from z-scores sd.true = 2.232 n = 500 z = rnorm(n, sd=sd.true) z = c(z, runif(30, 5, 10)) # add some contamination fdr = fdrtool(z) # you may change some parameters of the underlying functions fdr = fdrtool(z, cutoff.method="pct0", pct0=0.9)
# load "fdrtool" library and p-values library("fdrtool") data(pvalues) # estimate fdr and Fdr from p-values data(pvalues) fdr = fdrtool(pvalues, statistic="pvalue") fdr$qval # estimated Fdr values fdr$lfdr # estimated local fdr # the same but with black and white figure fdr = fdrtool(pvalues, statistic="pvalue", color.figure=FALSE) # estimate fdr and Fdr from z-scores sd.true = 2.232 n = 500 z = rnorm(n, sd=sd.true) z = c(z, runif(30, 5, 10)) # add some contamination fdr = fdrtool(z) # you may change some parameters of the underlying functions fdr = fdrtool(z, cutoff.method="pct0", pct0=0.9)
gcmlcm computes the greatest convex minorant (GCM) or the least concave majorant (LCM) of a piece-wise linear function.
gcmlcm(x, y, type=c("gcm", "lcm"))
x, y: coordinate vectors of the piece-wise linear function. Note that the x values need to be unique and be arranged in sorted order.
type: specifies whether to compute the greatest convex minorant (type="gcm", the default) or the least concave majorant (type="lcm").
The GCM is obtained by isotonic regression of the raw slopes, whereas the LCM is obtained by antitonic regression. See Robertson et al. (1988).
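This relationship can be checked directly: the slopes of the GCM are (up to merging of equal-slope segments) the isotonic regression of the raw segment slopes, weighted by the x-increments. A minimal sketch, for illustration only:

library("fdrtool")

set.seed(7)
x = 1:20
y = cumsum(rexp(20))  # a piece-wise linear function

# raw slopes of the segments and the segment lengths as weights
dx = diff(x)
raw.slopes = diff(y)/dx

# GCM slopes via weighted isotonic regression of the raw slopes
gcm.slopes = monoreg(raw.slopes, w=dx, type="isotonic")$yf
unique(round(gcm.slopes, 10))

# compare with gcmlcm
gg = gcmlcm(x, y, type="gcm")
gg$slope.knots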
A list with the following entries:
x.knots: the x values belonging to the knots of the LCM/GCM curve
y.knots: the corresponding y values
slope.knots: the slopes of the corresponding line segments
Korbinian Strimmer (https://strimmerlab.github.io).
Robertson, T., F. T. Wright, and R. L. Dykstra. 1988. Order restricted statistical inference. John Wiley and Sons.
# load "fdrtool" library library("fdrtool") # generate some data x = 1:20 y = rexp(20) plot(x, y, type="l", lty=3, main="GCM (red) and LCM (blue)") points(x, y) # greatest convex minorant (red) gg = gcmlcm(x,y) lines(gg$x.knots, gg$y.knots, col=2, lwd=2) # least concave majorant (blue) ll = gcmlcm(x,y, type="lcm") lines(ll$x.knots, ll$y.knots, col=4, lwd=2)
# load "fdrtool" library library("fdrtool") # generate some data x = 1:20 y = rexp(20) plot(x, y, type="l", lty=3, main="GCM (red) and LCM (blue)") points(x, y) # greatest convex minorant (red) gg = gcmlcm(x,y) lines(gg$x.knots, gg$y.knots, col=2, lwd=2) # least concave majorant (blue) ll = gcmlcm(x,y, type="lcm") lines(ll$x.knots, ll$y.knots, col=4, lwd=2)
The function grenander computes the Grenander estimator of a one-dimensional decreasing or increasing density.
grenander(F, type=c("decreasing", "increasing"))
F: an ecdf object (empirical cumulative distribution function), as e.g. returned by ecdf.
type: specifies whether the distribution is decreasing (the default) or increasing.
The Grenander (1956) density estimator is given by the slopes of the least concave majorant (LCM) of the empirical distribution function (ECDF). It is a decreasing piecewise-constant function and can be shown to be the non-parametric maximum likelihood estimate (NPMLE) under the assumption of a decreasing density (note that the ECDF is the NPMLE without this assumption). Similarly, an increasing density function is obtained by using the greatest convex minorant (GCM) of the ECDF.
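The construction can be illustrated with gcmlcm applied to the knots of the ECDF; anchoring the curve at the origin is an assumption of this sketch and may differ in detail from grenander's internal handling:

library("fdrtool")

set.seed(3)
z = rexp(100, 1)
e = ecdf(z)
g = grenander(e)

# decreasing case: the density estimate is piecewise constant,
# non-increasing, and integrates to (approximately) one
all(diff(g$f.knots) <= 0)
sum(g$f.knots[-length(g$f.knots)] * diff(g$x.knots))

# the same construction by hand: slopes of the LCM of the ECDF
x = knots(e)
ll = gcmlcm(c(0, x), c(0, e(x)), type="lcm")
ll$slope.knots  # piecewise-constant density values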
A list of class grenander with the following components:
F: the empirical distribution function specified as input.
x.knots: x locations of the knots of the least concave majorant of the ECDF.
F.knots: the corresponding y locations of the least concave majorant of the ECDF.
f.knots: the corresponding slopes (= density).
Korbinian Strimmer (https://strimmerlab.github.io).
Grenander, U. (1956). On the theory of mortality measurement, Part II. Skand. Aktuarietidskr. 39, 125–153.
# load "fdrtool" library library("fdrtool") # samples from random exponential variable z = rexp(30,1) e = ecdf(z) g = grenander(e) g # plot ecdf, concave cdf, and Grenander density estimator (on log scale) plot(g, log="y") # for comparison the kernel density estimate plot(density(z)) # area under the Grenander density estimator sum( g$f.knots[-length(g$f.knots)]*diff(g$x.knots) )
# load "fdrtool" library library("fdrtool") # samples from random exponential variable z = rexp(30,1) e = ecdf(z) g = grenander(e) g # plot ecdf, concave cdf, and Grenander density estimator (on log scale) plot(g, log="y") # for comparison the kernel density estimate plot(density(z)) # area under the Grenander density estimator sum( g$f.knots[-length(g$f.knots)]*diff(g$x.knots) )
Density, distribution function, quantile function and random generation for the half-normal distribution with parameter theta.
dhalfnorm(x, theta=sqrt(pi/2), log = FALSE)
phalfnorm(q, theta=sqrt(pi/2), lower.tail = TRUE, log.p = FALSE)
qhalfnorm(p, theta=sqrt(pi/2), lower.tail = TRUE, log.p = FALSE)
rhalfnorm(n, theta=sqrt(pi/2))
sd2theta(sd)
theta2sd(theta)
x, q: vector of quantiles.
p: vector of probabilities.
n: number of observations. If length(n) > 1, the length is taken to be the number required.
theta: parameter of the half-normal distribution.
log, log.p: logical; if TRUE, probabilities p are given as log(p).
lower.tail: logical; if TRUE (default), probabilities are P[X <= x], otherwise P[X > x].
sd: standard deviation of the zero-mean normal distribution that corresponds to the half-normal with parameter theta.
x = abs(z) follows a half-normal distribution with parameter theta if z is a normal variate with zero mean.

The half-normal distribution has density
f(x) = (2*theta/pi) * exp(-x^2*theta^2/pi), for x >= 0.

It has mean E(x) = 1/theta and variance Var(x) = (pi-2)/(2*theta^2).

The parameter theta is related to the standard deviation sd of the corresponding zero-mean normal distribution by the equation theta = sqrt(pi/2)/sd.

If theta is not specified in the above functions it assumes the default value theta = sqrt(pi/2), corresponding to sd = 1.
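A quick numerical check of these relations, using only functions documented above (this illustration is not part of the package documentation):

library("fdrtool")

theta = 0.72
sd = theta2sd(theta)          # should equal sqrt(pi/2)/theta
all.equal(sd, sqrt(pi/2)/theta)
all.equal(sd2theta(sd), theta)

# for x >= 0 the half-normal density is twice the density of the
# corresponding zero-mean normal distribution
x = seq(0, 4, by=0.5)
all.equal(dhalfnorm(x, theta), 2*dnorm(x, sd=sd))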
dhalfnorm gives the density, phalfnorm gives the distribution function, qhalfnorm gives the quantile function, and rhalfnorm generates random deviates.
sd2theta converts an sd parameter into the corresponding theta parameter.
theta2sd converts a theta parameter into the corresponding sd parameter.
# load "fdrtool" library library("fdrtool") ## density of half-normal compared with a corresponding normal par(mfrow=c(1,2)) sd.norm = 0.64 x = seq(0, 5, 0.01) x2 = seq(-5, 5, 0.01) plot(x, dhalfnorm(x, sd2theta(sd.norm)), type="l", xlim=c(-5, 5), lwd=2, main="Probability Density", ylab="pdf(x)") lines(x2, dnorm(x2, sd=sd.norm), col=8 ) plot(x, phalfnorm(x, sd2theta(sd.norm)), type="l", xlim=c(-5, 5), lwd=2, main="Distribution Function", ylab="cdf(x)") lines(x2, pnorm(x2, sd=sd.norm), col=8 ) legend("topleft", c("half-normal", "normal"), lwd=c(2,1), col=c(1, 8), bty="n", cex=1.0) par(mfrow=c(1,1)) ## distribution function integrate(dhalfnorm, 0, 1.4, theta = 1.234) phalfnorm(1.4, theta = 1.234) ## quantile function qhalfnorm(-1) # NaN qhalfnorm(0) qhalfnorm(.5) qhalfnorm(1) qhalfnorm(2) # NaN ## random numbers theta = 0.72 hz = rhalfnorm(10000, theta) hist(hz, freq=FALSE) lines(x, dhalfnorm(x, theta)) mean(hz) 1/theta # theoretical mean var(hz) (pi-2)/(2*theta*theta) # theoretical variance ## relationship with two-sided normal p-values z = rnorm(1000) # two-sided p-values pvl = 1- phalfnorm(abs(z)) pvl2 = 2 - 2*pnorm(abs(z)) sum(pvl-pvl2)^2 # equivalent hist(pvl2, freq=FALSE) # uniform distribution # back to half-normal scores hz = qhalfnorm(1-pvl) hist(hz, freq=FALSE) lines(x, dhalfnorm(x))
# load "fdrtool" library library("fdrtool") ## density of half-normal compared with a corresponding normal par(mfrow=c(1,2)) sd.norm = 0.64 x = seq(0, 5, 0.01) x2 = seq(-5, 5, 0.01) plot(x, dhalfnorm(x, sd2theta(sd.norm)), type="l", xlim=c(-5, 5), lwd=2, main="Probability Density", ylab="pdf(x)") lines(x2, dnorm(x2, sd=sd.norm), col=8 ) plot(x, phalfnorm(x, sd2theta(sd.norm)), type="l", xlim=c(-5, 5), lwd=2, main="Distribution Function", ylab="cdf(x)") lines(x2, pnorm(x2, sd=sd.norm), col=8 ) legend("topleft", c("half-normal", "normal"), lwd=c(2,1), col=c(1, 8), bty="n", cex=1.0) par(mfrow=c(1,1)) ## distribution function integrate(dhalfnorm, 0, 1.4, theta = 1.234) phalfnorm(1.4, theta = 1.234) ## quantile function qhalfnorm(-1) # NaN qhalfnorm(0) qhalfnorm(.5) qhalfnorm(1) qhalfnorm(2) # NaN ## random numbers theta = 0.72 hz = rhalfnorm(10000, theta) hist(hz, freq=FALSE) lines(x, dhalfnorm(x, theta)) mean(hz) 1/theta # theoretical mean var(hz) (pi-2)/(2*theta*theta) # theoretical variance ## relationship with two-sided normal p-values z = rnorm(1000) # two-sided p-values pvl = 1- phalfnorm(abs(z)) pvl2 = 2 - 2*pnorm(abs(z)) sum(pvl-pvl2)^2 # equivalent hist(pvl2, freq=FALSE) # uniform distribution # back to half-normal scores hz = qhalfnorm(1-pvl) hist(hz, freq=FALSE) lines(x, dhalfnorm(x))
hc.score computes the empirical higher criticism (HC) scores from p-values.
hc.thresh determines the HC decision threshold by searching for the p-value with the maximum HC score.
hc.score(pval)
hc.thresh(pval, alpha0=1, plot=TRUE)
pval: vector of p-values.
alpha0: look only at a fraction alpha0 of the p-values (default: 1, i.e. all p-values).
plot: show plot with HC decision threshold.
Higher Criticism (HC) provides an alternative means to determine decision thresholds for signal identification, especially if the signal is rare and weak.
See Donoho and Jin (2008) for details of this approach and Klaus and Strimmer (2013) for a review and connections with FDR methodology.
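The empirical HC score of the i-th ordered p-value p(i) out of d tests is commonly defined as HC(i) = sqrt(d) * (i/d - p(i)) / sqrt(p(i)*(1-p(i))). The sketch below implements this textbook definition for comparison; it is an assumption that hc.score uses exactly this form:

library("fdrtool")
data(pvalues)

# textbook empirical higher criticism scores (hypothetical re-implementation)
hc.naive = function(pval) {
  d = length(pval)
  ps = sort(pval)
  i = seq_len(d)
  hc = sqrt(d) * (i/d - ps) / sqrt(ps * (1 - ps))
  hc[order(order(pval))]  # return in the original order of pval
}

head(hc.naive(pvalues))
head(hc.score(pvalues))  # compare with the package function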
hc.score returns a vector with the HC score corresponding to each p-value.
hc.thresh returns the p-value corresponding to the maximum HC score.
Bernd Klaus and Korbinian Strimmer (https://strimmerlab.github.io).
Donoho, D. and J. Jin. (2008). Higher criticism thresholding: optimal feature selection when useful features are rare and weak. Proc. Natl. Acad. Sci. USA 105: 14790-14795.
Klaus, B., and K. Strimmer (2013). Signal identification for rare and weak features: higher criticism or false discovery rates? Biostatistics 14: 129-143. <DOI:10.1093/biostatistics/kxs030>
# load "fdrtool" library library("fdrtool") # some p-values data(pvalues) # compute HC scores hc.score(pvalues) # determine HC threshold hc.thresh(pvalues)
# load "fdrtool" library library("fdrtool") # some p-values data(pvalues) # compute HC scores hc.score(pvalues) # determine HC threshold hc.thresh(pvalues)
monoreg performs monotone regression (either isotonic or antitonic) with weights.
monoreg(x, y=NULL, w=rep(1, length(x)), type=c("isotonic", "antitonic"))
x, y: coordinate vectors of the regression points. Alternatively a single plotting structure can be specified: see xy.coords.
w: data weights (default values: 1).
type: fit a monotonely increasing ("isotonic") or monotonely decreasing ("antitonic") function.
monoreg is similar to isoreg, with the addition that monoreg accepts weights.

If several identical x values are given as input, the corresponding y values and the weights w are automatically merged, and a warning is issued.

The plot.monoreg function optionally plots the cumulative sum diagram with the greatest convex minorant (isotonic regression) or the least concave majorant (antitonic regression), see the examples below.
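For background, weighted isotonic regression can be computed with the classical pool-adjacent-violators algorithm (PAVA). The following is a minimal, unoptimized sketch of PAVA for comparison with monoreg; it is not the package's C implementation:

library("fdrtool")

# pool-adjacent-violators for weighted isotonic regression
# (y must already be ordered by x; w are the weights)
pava = function(y, w = rep(1, length(y))) {
  val = y; wt = w; len = rep(1, length(y))
  i = 1
  while (i < length(val)) {
    if (val[i] > val[i+1]) {
      # pool adjacent violators into their weighted mean
      val[i] = (wt[i]*val[i] + wt[i+1]*val[i+1]) / (wt[i] + wt[i+1])
      wt[i]  = wt[i] + wt[i+1]
      len[i] = len[i] + len[i+1]
      val = val[-(i+1)]; wt = wt[-(i+1)]; len = len[-(i+1)]
      if (i > 1) i = i - 1  # a merge may create a new violation to the left
    } else {
      i = i + 1
    }
  }
  rep(val, len)  # expand block values back to the original length
}

# dental study data (Robertson et al. 1988), merged as in the monoreg example
age  = c(14, 14, 8, 8, 8, 10, 10, 10, 12, 12, 12)
size = c(23.5, 25, 21, 23.5, 23, 24, 21, 25, 21.5, 22, 19)
mr = monoreg(age, size)
cbind(monoreg = mr$yf, pava = pava(mr$y, mr$w))  # should agree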
A list with the following entries:
x: the sorted and unique x values
y: the corresponding (merged) y values
w: the corresponding (merged) weights
yf: the fitted y values
type: the type of monotone regression ("isotonic" or "antitonic")
call: the function call
Korbinian Strimmer (https://strimmerlab.github.io).
Part of this function is C code that has been ported from R code originally written by Kaspar Rufibach.
Robertson, T., F. T. Wright, and R. L. Dykstra. 1988. Order restricted statistical inference. John Wiley and Sons.
# load "fdrtool" library library("fdrtool") # an example with weights # Example 1.1.1. (dental study) from Robertson, Wright and Dykstra (1988) age = c(14, 14, 8, 8, 8, 10, 10, 10, 12, 12, 12) size = c(23.5, 25, 21, 23.5, 23, 24, 21, 25, 21.5, 22, 19) mr = monoreg(age, size) # sorted x values mr$x # 8 10 12 14 # weights and merged y values mr$w # 3 3 3 2 mr$y # 22.50000 23.33333 20.83333 24.25000 # fitted y values mr$yf # 22.22222 22.22222 22.22222 24.25000 fitted(mr) residuals(mr) plot(mr, ylim=c(18, 26)) # this shows the averaged data points points(age, size, pch=2) # add original data points ### y = c(1,0,1,0,0,1,0,1,1,0,1,0) x = 1:length(y) mr = monoreg(y) # plot with greatest convex minorant plot(mr, plot.type="row.wise") # this is the same mr = monoreg(x,y) plot(mr) # antitonic regression and least concave majorant mr = monoreg(-y, type="a") plot(mr, plot.type="row.wise") # the fit yf is independent of the location of x and y plot(monoreg(x + runif(1, -1000, 1000), y +runif(1, -1000, 1000)) ) ### y = c(0,0,2/4,1/5,2/4,1/2,4/5,5/8,7/11,10/11) x = c(5,9,13,18,22,24,29,109,120,131) mr = monoreg(x,y) plot(mr, plot.type="row.wise") # the fit (yf) only depends on the ordering of x monoreg(1:length(y), y)$yf monoreg(x, y)$yf
# load "fdrtool" library library("fdrtool") # an example with weights # Example 1.1.1. (dental study) from Robertson, Wright and Dykstra (1988) age = c(14, 14, 8, 8, 8, 10, 10, 10, 12, 12, 12) size = c(23.5, 25, 21, 23.5, 23, 24, 21, 25, 21.5, 22, 19) mr = monoreg(age, size) # sorted x values mr$x # 8 10 12 14 # weights and merged y values mr$w # 3 3 3 2 mr$y # 22.50000 23.33333 20.83333 24.25000 # fitted y values mr$yf # 22.22222 22.22222 22.22222 24.25000 fitted(mr) residuals(mr) plot(mr, ylim=c(18, 26)) # this shows the averaged data points points(age, size, pch=2) # add original data points ### y = c(1,0,1,0,0,1,0,1,1,0,1,0) x = 1:length(y) mr = monoreg(y) # plot with greatest convex minorant plot(mr, plot.type="row.wise") # this is the same mr = monoreg(x,y) plot(mr) # antitonic regression and least concave majorant mr = monoreg(-y, type="a") plot(mr, plot.type="row.wise") # the fit yf is independent of the location of x and y plot(monoreg(x + runif(1, -1000, 1000), y +runif(1, -1000, 1000)) ) ### y = c(0,0,2/4,1/5,2/4,1/2,4/5,5/8,7/11,10/11) x = c(5,9,13,18,22,24,29,109,120,131) mr = monoreg(x,y) plot(mr, plot.type="row.wise") # the fit (yf) only depends on the ordering of x monoreg(1:length(y), y)$yf monoreg(x, y)$yf
pval.estimate.eta0 estimates the proportion eta0 of null p-values in a given vector of p-values.
pval.estimate.eta0(p, method=c("smoother", "bootstrap", "conservative", "adaptive", "quantile"),
                   lambda=seq(0,0.9,0.05), diagnostic.plot=TRUE, q=0.1)
p: vector of p-values.
method: algorithm used to estimate the proportion of null p-values. Available options are "conservative", "adaptive", "bootstrap", "quantile", and "smoother" (default).
lambda: optional tuning parameter vector needed for the "bootstrap" and "smoother" methods (defaults to seq(0,0.9,0.05)).
diagnostic.plot: if TRUE (the default), a diagnostic plot of the p-values together with the estimate of eta0 is shown.
q: quantile used for estimating eta0 - only used if method="quantile".
The quantity eta0, i.e. the proportion of null p-values in a given vector of p-values, is an important parameter when controlling the false discovery rate (FDR). A conservative choice is eta0 = 1, but a choice closer to the true value will increase efficiency and power - see Benjamini and Hochberg (1995, 2000) and Storey (2002) for details.
The function pval.estimate.eta0 provides five algorithms: the "conservative" method always returns eta0 = 1 (Benjamini and Hochberg, 1995), "adaptive" uses the approach suggested in Benjamini and Hochberg (2000), "bootstrap" employs the method from Storey (2002), "smoother" uses the smoothing spline approach in Storey and Tibshirani (2003), and "quantile" is a simplified version of "smoother".
The estimated proportion eta0 of null p-values.
Korbinian Strimmer (https://strimmerlab.github.io).
Adapted in part from code by Y. Benjamini and J.D. Storey.
"conservative" procedure: Benjamini, Y., and Y. Hochberg (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Statist. Soc. B, 57, 289–300.
"adaptive" procedure: Benjamini, Y., and Y. Hochberg (2000) The adaptive control of the false discovery rate in multiple hypotheses testing with independent statistics. J. Behav. Educ. Statist., 25, 60–83.
"bootstrap" procedure: Storey, J. D. (2002) A direct approach to false discovery rates. J. Roy. Statist. Soc. B., 64, 479–498.
"smoother" procedure: Storey, J. D., and R. Tibshirani (2003) Statistical significance for genome-wide experiments. Proc. Nat. Acad. Sci. USA, 100, 9440-9445.
"quantile" procedure: similar to smoother, except that the lower q quantile of all eta0 computed in dependence of lambda is taken as overall estimate of eta0.
# load "fdrtool" library and p-values library("fdrtool") data(pvalues) # Proportion of null p-values for different methods pval.estimate.eta0(pvalues, method="conservative") pval.estimate.eta0(pvalues, method="adaptive") pval.estimate.eta0(pvalues, method="bootstrap") pval.estimate.eta0(pvalues, method="smoother") pval.estimate.eta0(pvalues, method="quantile")
# load "fdrtool" library and p-values library("fdrtool") data(pvalues) # Proportion of null p-values for different methods pval.estimate.eta0(pvalues, method="conservative") pval.estimate.eta0(pvalues, method="adaptive") pval.estimate.eta0(pvalues, method="bootstrap") pval.estimate.eta0(pvalues, method="smoother") pval.estimate.eta0(pvalues, method="quantile")
This data set contains 4,289 p-values. These data are used to illustrate the functionality of the functions fdrtool and pval.estimate.eta0.
data(pvalues)
pvalues is a vector with 4,289 p-values.
# load fdrtool library
library("fdrtool")

# load data set
data(pvalues)

# estimate density and distribution function,
# and compute corresponding (local) false discovery rates
fdrtool(pvalues, statistic="pvalue")