Title: Visualization and Analysis of Statistical Measures of Confidence
Description: Enables: (1) plotting two-dimensional confidence regions, (2) coverage analysis of confidence region simulations, (3) calculating confidence intervals and the associated actual coverage for binomial proportions, (4) calculating the support values and the probability mass function of the Kaplan-Meier product-limit estimator, and (5) plotting the actual coverage function associated with a confidence interval for the survivor function from a randomly right-censored data set. Each is given in greater detail next.

(1) Plots the two-dimensional confidence region for probability distribution parameters (supported distribution suffixes: cauchy, gamma, invgauss, logis, llogis, lnorm, norm, unif, weibull) corresponding to a user-given complete or right-censored dataset and level of significance. The crplot() algorithm plots more points in areas of greater curvature to ensure a smooth appearance throughout the confidence region boundary. An alternative heuristic plots a specified number of points at roughly uniform intervals along the boundary. Both heuristics build upon the radial profile log-likelihood ratio technique for plotting confidence regions given by Jaeger (2016) <doi:10.1080/00031305.2016.1182946>, and are detailed in a publication by Weld et al. (2019) <doi:10.1080/00031305.2018.1564696>.

(2) Performs confidence region coverage simulations with coversim(), either for a random sample drawn from a user-specified parametric population distribution or for a user-specified dataset and point of interest.

(3) Calculates confidence interval bounds for a binomial proportion with binomTest(), calculates the actual coverage with binomTestCoverage(), and plots the actual coverage with binomTestCoveragePlot(). Calculates confidence interval bounds for the binomial proportion using an ensemble of constituent confidence intervals with binomTestEnsemble(). Calculates confidence interval bounds for the binomial proportion with binomTestMSE(), using a complete enumeration of all possible transitions from one actual coverage acceptance curve to another, which minimizes the root mean square error for n <= 15 and follows the transitions for well-known confidence intervals for n > 15.

(4) The km.support() function calculates the support values of the Kaplan-Meier product-limit estimator for a given sample size n using an induction algorithm described in Qin et al. (2023) <doi:10.1080/00031305.2022.2070279>. The km.outcomes() function generates a matrix containing all possible outcomes (all possible sequences of failure times and right-censoring times) of the value of the Kaplan-Meier product-limit estimator for a particular sample size n. The km.pmf() function generates the probability mass function for the support values of the Kaplan-Meier product-limit estimator for a particular sample size n and probability h of observing a failure at the time of interest, expressed as the cumulative probability percentile associated with X = min(T, C), where T is the failure time and C is the censoring time under a random-censoring scheme. The km.surv() function generates multiple probability mass functions of the Kaplan-Meier product-limit estimator for the same arguments as those given for km.pmf().

(5) The km.coverage() function plots the actual coverage function associated with a confidence interval for the survivor function from a randomly right-censored data set for one or more of the following confidence intervals: Greenwood, log-minus-log, Peto, arcsine, and exponential Greenwood. The actual coverage function is plotted for a small number of items on test, stated coverage, failure rate, and censoring rate. The km.coverage() function can print an optional table containing all possible failure/censoring orderings, along with their contribution to the actual coverage function.
Authors: Christopher Weld [aut, cre], Kexin Feng [aut], Hayeon Park [aut], Yuxin Qin [aut], Xingyu Wang [aut], Heather Sasinowska [aut], Lawrence Leemis [aut], Yuan Chang [ctb], Brock Crook [ctb], Chris Kuebler [ctb], Andrew Loh [ctb], Xin Zhang [ctb]
Maintainer: Christopher Weld <[email protected]>
License: GPL (<= 2)
Version: 1.9.1
Built: 2024-12-02 06:37:03 UTC
Source: CRAN
Generates lower and upper confidence interval limits for a binomial proportion using different types of confidence intervals.
binomTest(n, x, alpha = 0.05, intervalType = "Clopper-Pearson")
n | sample size
x | number of successes
alpha | significance level for confidence interval
intervalType | type of confidence interval used; one of "Clopper-Pearson", "Wald", "Wilson-Score", "Jeffreys", "Agresti-Coull", "Arcsine", or "Blaker"
Generates a lower and upper confidence interval limit for a binomial proportion using
various types of confidence intervals,
various sample sizes, and
various numbers of successes.
When the binomTest function is called, it returns a two-element vector in which
the first element is the lower bound of the confidence interval, and
the second element is the upper bound of the confidence interval.
This confidence interval is constructed by calculating lower and upper bounds associated with the confidence interval procedure specified by the intervalType argument. Lower bounds that are negative are set to 0 and upper bounds that are greater than 1 are set to 1.
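A minimal sketch (an illustration, not the package source) of the default Clopper-Pearson bounds in terms of beta quantiles, with the same clamping behavior at x = 0 and x = n; the helper name is hypothetical:

clopper.pearson <- function(n, x, alpha = 0.05) {
  lo <- if (x == 0) 0 else qbeta(alpha / 2, x, n - x + 1)        # lower bound
  hi <- if (x == n) 1 else qbeta(1 - alpha / 2, x + 1, n - x)    # upper bound
  c(lo, hi)
}
clopper.pearson(10, 6)    # compare with binomTest(10, 6)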
Hayeon Park ([email protected]), Larry Leemis ([email protected])
binomTest(10, 6)
binomTest(100, 30, intervalType = "Agresti-Coull")
Calculates the actual coverage of a confidence interval for a binomial proportion for a particular sample size and a particular value of the probability of success, for several confidence interval procedures.
binomTestCoverage(n, p, alpha = 0.05, intervalType = "Clopper-Pearson")
n | sample size
p | population probability of success
alpha | significance level for confidence interval
intervalType | type of confidence interval used; one of "Clopper-Pearson", "Wald", "Wilson-Score", "Jeffreys", "Agresti-Coull", "Arcsine", or "Blaker"
Calculates the actual coverage of a confidence interval procedure at a particular value of $p$ for
various types of confidence intervals,
various probabilities of success $p$, and
various sample sizes $n$.

The actual coverage for a particular value of $p$, the probability of success of interest, is
$$c(p) = \sum_{x=0}^{n} I(x, p) \binom{n}{x} p^x (1 - p)^{n - x},$$
where $I(x, p)$ is an indicator function that determines whether a confidence interval covers $p$ when $X = x$ (see Vollset, 1993).

The binomial distribution with arguments size $= n$ and prob $= p$ has probability mass function
$$f(x) = \binom{n}{x} p^x (1 - p)^{n - x}, \qquad x = 0, 1, \ldots, n.$$
The algorithm for computing the actual coverage at a particular probability of success $p$ begins by calculating all possible lower and upper bounds associated with the confidence interval procedure specified by the intervalType argument. The appropriate binomial probabilities are summed to determine the actual coverage at $p$.
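A brute-force sketch of this computation (an illustration, not the package's implementation) uses binomTest() to supply the bounds for each possible number of successes and sums the corresponding binomial probabilities:

library(conf)
actual.coverage <- function(n, p, alpha = 0.05, intervalType = "Clopper-Pearson") {
  covered <- sapply(0:n, function(x) {
    ci <- binomTest(n, x, alpha, intervalType)   # bounds for X = x
    ci[1] <= p && p <= ci[2]                     # indicator I(x, p)
  })
  sum(dbinom(0:n, n, p)[covered])                # sum the binomial probabilities
}
actual.coverage(6, 0.4)    # compare with binomTestCoverage(6, 0.4)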
Hayeon Park ([email protected]), Larry Leemis ([email protected])
Vollset, S.E. (1993). Confidence Intervals for a Binomial Proportion. Statistics in Medicine, 12, 809-824.
binomTestCoverage(6, 0.4)
binomTestCoverage(n = 10, p = 0.3, alpha = 0.01, intervalType = "Wilson-Score")
Generates plots for the actual coverage of a confidence interval for a binomial proportion using various types of confidence intervals. The actual coverage is plotted for a given sample size and stated coverage 1 - alpha.
binomTestCoveragePlot(n, alpha = 0.05, intervalType = "Clopper-Pearson",
                      plo = 0, phi = 1, clo = 1 - 2 * alpha, chi = 1,
                      points = 5 + floor(250 / n), showTrueCoverage = TRUE,
                      gridCurves = FALSE)
n | sample size
alpha | significance level for confidence interval
intervalType | type of confidence interval used; one of "Clopper-Pearson", "Wald", "Wilson-Score", "Jeffreys", "Agresti-Coull", "Arcsine", or "Blaker"
plo | lower limit for percentile (horizontal axis)
phi | upper limit for percentile (horizontal axis)
clo | lower limit for coverage (vertical axis)
chi | upper limit for coverage (vertical axis)
points | number of points plotted in each segment of the plot; if default, varies with 'n' (see above)
showTrueCoverage | logical; if TRUE (default), a solid horizontal red line is drawn at the stated coverage 1 - alpha
gridCurves | logical; if TRUE, the acceptance curves giving all possible actual coverage values are displayed as gray curves
Generates an actual coverage plot for binomial proportions using
various types of confidence intervals, and
various sample sizes.
When the function is called with default arguments,
the horizontal axis is the percentile at which the coverage is evaluated,
the vertical axis is the actual coverage at each percentile, that is, the probability that the true value at a percentile is contained in the corresponding confidence interval, and
the solid red line is drawn at the stated coverage $1 - \alpha$.

The actual coverage for a particular value of $p$, the percentile of interest, is
$$c(p) = \sum_{x=0}^{n} I(x, p) \binom{n}{x} p^x (1 - p)^{n - x},$$
where $I(x, p)$ is an indicator function that determines whether a confidence interval covers $p$ when $X = x$ (see Vollset, 1993).

The binomial distribution with arguments size $= n$ and prob $= p$ has probability mass function
$$f(x) = \binom{n}{x} p^x (1 - p)^{n - x}, \qquad x = 0, 1, \ldots, n.$$
The algorithm for plotting the actual coverage begins by calculating all possible lower and upper bounds associated with the confidence interval procedure specified by the intervalType argument. These values are concatenated into a vector, which is sorted; negative values and values that exceed 1 are removed. The resulting values are the breakpoints of the actual coverage function. The points argument gives the number of points plotted on each segment of the graph of the actual coverage.
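These breakpoints can be sketched directly (an illustration, not the package's implementation; note that binomTest() already clamps its bounds to [0, 1], so the final filter simply mirrors the description above):

library(conf)
breakpoints <- function(n, alpha = 0.05, intervalType = "Clopper-Pearson") {
  bounds <- sapply(0:n, function(x) binomTest(n, x, alpha, intervalType))
  b <- sort(unique(as.vector(bounds)))    # concatenate and sort all bounds
  b[b >= 0 & b <= 1]                      # remove values outside [0, 1]
}
breakpoints(6)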
The plo and phi arguments can be used to expand or compress the plots horizontally. The clo and chi arguments can be used to expand or compress the plots vertically.

By default, the showTrueCoverage argument plots a solid horizontal red line at the height of the stated coverage. The actual coverage is plotted with solid black lines for each segment of the actual coverage function.

The gridCurves argument is assigned a logical value which indicates whether the acceptance curves giving all possible actual coverage values should be displayed as gray curves.
Hayeon Park ([email protected]), Larry Leemis ([email protected])
Vollset, S.E. (1993). Confidence Intervals for a Binomial Proportion. Statistics in Medicine, 12, 809–824.
binomTestCoveragePlot(6)
binomTestCoveragePlot(10, intervalType = "Wilson-Score", clo = 0.8)
binomTestCoveragePlot(n = 100, intervalType = "Wald", clo = 0, chi = 1, points = 30)
Generates lower and upper confidence interval limits for a binomial proportion using an ensemble of confidence intervals.
binomTestEnsemble(n, x, alpha = 0.05, CP = TRUE, WS = TRUE, JF = TRUE, AC = TRUE, AR = TRUE)
n | sample size
x | number of successes
alpha | significance level for confidence interval
CP | logical; if TRUE (default), the Clopper-Pearson confidence interval is a constituent of the ensemble
WS | logical; if TRUE (default), the Wilson-Score confidence interval is a constituent of the ensemble
JF | logical; if TRUE (default), the Jeffreys confidence interval is a constituent of the ensemble
AC | logical; if TRUE (default), the Agresti-Coull confidence interval is a constituent of the ensemble
AR | logical; if TRUE (default), the Arcsine confidence interval is a constituent of the ensemble
Generates lower and upper confidence interval limits for a binomial proportion using
various sample sizes,
various numbers of successes, and
various combinations of confidence intervals.
When the binomTestEnsemble function is called, it returns a two-element vector in which
the first element is the lower bound of the Ensemble confidence interval, and
the second element is the upper bound of the Ensemble confidence interval.
To construct an Ensemble confidence interval that attains an actual coverage close to the stated coverage, the five constituent confidence interval procedures can be combined. Since these intervals vary in width, the lower limits and the actual coverage of the constituent confidence intervals at the maximum likelihood estimator are calculated. Likewise, the upper limits and the actual coverage of the constituent confidence intervals at the maximum likelihood estimator are calculated. The centroids of the lower and upper constituent confidence interval points falling below and above the stated coverage are connected with a line segment. The point of intersection of these line segments with the stated coverage gives the lower and upper bounds of the Ensemble confidence interval. Special cases of this approach arise when (a) the actual coverages all fall above or below the stated coverage, and (b) the slope of the line connecting the centroids is infinite.
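The interpolation step can be sketched as follows (a minimal illustration assuming at least one constituent point falls on each side of the stated coverage, and ignoring the special cases (a) and (b) above; the helper name is hypothetical):

## lo:  lower limits of the constituent intervals at the MLE
## cov: actual coverages of the constituent intervals at the MLE
ensemble.lower <- function(lo, cov, stated = 0.95) {
  below <- cov < stated
  c1 <- c(mean(lo[below]),  mean(cov[below]))    # centroid below stated coverage
  c2 <- c(mean(lo[!below]), mean(cov[!below]))   # centroid above stated coverage
  ## intersect the segment joining the centroids with the stated coverage
  c1[1] + (stated - c1[2]) * (c2[1] - c1[1]) / (c2[2] - c1[2])
}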
If only one of the logical arguments is TRUE, the code returns a simple confidence interval of that one procedure. The Wald confidence interval is omitted because its actual coverage degenerates at $x = 0$ and $x = n$.
Hayeon Park ([email protected]), Larry Leemis ([email protected])
Park, H., Leemis, L. (2019), "Ensemble Confidence Intervals for Binomial Proportions", Statistics in Medicine, 38 (18), 3460-3475.
binomTestEnsemble(10, 3)
binomTestEnsemble(100, 82, CP = FALSE, AR = FALSE)
binomTestEnsemble(33, 1, CP = FALSE, JF = FALSE, AC = FALSE, AR = FALSE)
Generates lower and upper confidence interval limits for a binomial proportion that minimizes the root mean square error (RMSE) of the actual coverage function.
binomTestMSE(n, x, alpha = 0.05, smooth = 1, showRMSE = TRUE, showAll = FALSE)
n | sample size
x | number of successes
alpha | significance level for confidence interval
smooth | smoothness index
showRMSE | a logical variable indicating whether to show the value of the RMSE
showAll | a logical variable indicating whether to show confidence intervals for all possible numbers of successes
Generates lower and upper confidence interval limits for a binomial proportion for
various sample sizes and
various numbers of successes.
When the binomTestMSE function is called, it returns a two-element vector in which
the first element is the lower bound of the RMSE-minimizing confidence interval, and
the second element is the upper bound of the RMSE-minimizing confidence interval.
An RMSE-minimizing two-sided 100 * (1 - alpha) percent confidence interval for p is constructed from a random sample of size n from a Bernoulli(p) population. The parameter x gives the number of successes in the n mutually independent Bernoulli trials. For n <= 15, all possible jumps between acceptance curves associated with the actual coverage function are enumerated based on their one-to-one relationship with the symmetric Dyck paths. For each sequence of jumps between acceptance curves, the confidence interval bounds that are returned are associated with discontinuities in the actual coverage function that together result in the lowest possible RMSE. A set of smoothness constraints that build on four existing non-conservative confidence intervals (Wilson-Score, Jeffreys, Arcsine, and Agresti-Coull) is used if the smoothness index smooth is set to one. These constraints ensure that the RMSE-minimizing confidence interval achieves smoothness, a preferable property of a binomial confidence interval that is related to lower bound differences for adjacent values of x. There is a trade-off between the RMSE and the smoothness. For n > 100, smoothness is required. The RMSE usually increases if the smoothness constraints are used. For n > 15, only the symmetric Dyck paths associated with the Wilson-Score, Jeffreys, Arcsine, and Agresti-Coull confidence interval procedures are used instead of a complete enumeration, because the computation time increases in a factorial fashion in n. The minimal RMSE is not guaranteed for n > 15, because a symmetric Dyck path other than those associated with the four existing confidence interval procedures might prove to be optimal. However, this procedure does ensure a lower RMSE than any of the four existing confidence intervals for all n.
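The quantity being minimized is the RMSE of the actual coverage function about the stated coverage, $\sqrt{\int_0^1 \left( c(p) - (1 - \alpha) \right)^2 \, dp}$. A grid-based sketch of this objective (an approximation for illustration, not the package's internal computation) is:

library(conf)
rmse.coverage <- function(n, alpha = 0.05, intervalType = "Wilson-Score",
                          grid = seq(0.001, 0.999, by = 0.001)) {
  cp <- sapply(grid, function(p)
    binomTestCoverage(n, p, alpha = alpha, intervalType = intervalType))
  sqrt(mean((cp - (1 - alpha)) ^ 2))    # grid approximation of the integral
}
rmse.coverage(10)    # compare with the RMSE reported by binomTestMSE(10, 3)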
Kexin Feng ([email protected]), Larry Leemis ([email protected]), Heather Sasinowska ([email protected])
Feng, K., Sasinowska, H., Leemis, L. (2022), "RMSE-Minimizing Confidence Interval for the Binomial Parameter", Computational Statistics, 37 (4), 1855-1885.
binomTestMSE(10, 3)
Enables:
confidence region plots in two dimensions corresponding to a user-given dataset, level of significance, and parametric probability distribution (supported distribution suffixes: cauchy, gamma, invgauss, lnorm, llogis, logis, norm, unif, weibull),
coverage simulations (determining if a point of interest is within or outside of a confidence region boundary) for either random samples drawn from a user-specified parametric distribution or for a user-specified dataset and point of interest,
calculating confidence intervals and the associated actual coverage for binomial proportions,
calculating the support values and the probability mass function of the Kaplan-Meier product-limit estimator, and
plotting the actual coverage function for a randomly right-censored data set with exponential failure times and exponential censoring times.
Request from authors: Please properly cite any use of this package and/or its algorithms, which are detailed in the corresponding publications by Weld et al. (2019) <doi:10.1080/00031305.2018.1564696>, Park and Leemis (2019) <doi:10.1002/sim.8189>, Feng et al. (2022) <doi:10.1007/s00180-021-01183-3>, and Qin et al. (2023) <doi:10.1080/00031305.2022.2070279>. Additionally, we welcome and appreciate your feedback and insights as to how this resource is being leveraged to improve whatever it is you do. Please include your name and academic and/or business affiliation in your correspondence.
This package includes the functions:
confidence intervals for binomial proportions: binomTest,
actual coverage calculation for binomial proportions: binomTestCoverage,
actual coverage plots for binomial proportions: binomTestCoveragePlot,
ensemble confidence intervals for binomial proportions: binomTestEnsemble,
minimum root mean square error confidence intervals for binomial proportions: binomTestMSE,
confidence region coverage analysis: coversim,
confidence region plots: crplot,
actual coverage plot and table: km.coverage,
enumeration of Kaplan-Meier product-limit estimator outcomes: km.outcomes,
probability mass function of the Kaplan-Meier product-limit estimator: km.pmf,
Kaplan-Meier product-limit estimator support values: km.support, and
probability mass functions of the Kaplan-Meier product-limit estimator: km.surv.
The CRAN website https://CRAN.R-project.org/package=conf contains links for vignettes on the crplot, coversim, km.outcomes, km.pmf, km.support, and km.surv functions.
The lead author thanks The Omar Bradley Fellowship for Research in Mathematics for funding that partially supported this work.
Christopher Weld, Kexin Feng, Hayeon Park, Yuxin Qin, Xingyu Wang, Heather Sasinowska, Larry Leemis
Maintainer: Christopher Weld <[email protected]>
Creates a confidence region and determines coverage results for a corresponding point of interest. Iterates through a user-specified number of trials. Each trial uses a random dataset with user-specified parameters (default) or a user-specified dataset matrix ('n' samples per column, 'iter' columns) and returns the corresponding actual coverage results. See the CRAN website https://CRAN.R-project.org/package=conf for a link to a coversim vignette.
coversim(alpha, distn, n = NULL, iter = NULL, dataset = NULL, point = NULL,
         seed = NULL, a = NULL, b = NULL, kappa = NULL, lambda = NULL,
         mu = NULL, s = NULL, sigma = NULL, theta = NULL, heuristic = 1,
         maxdeg = 5, ellipse_n = 4, pts = FALSE, mlelab = TRUE, sf = c(5, 5),
         mar = c(4, 4.5, 2, 1.5), xlab = "", ylab = "", main = "", xlas = 0,
         ylas = 0, origin = FALSE, xlim = NULL, ylim = NULL,
         tol = .Machine$double.eps ^ 1, info = FALSE, returnsamp = FALSE,
         returnquant = FALSE, repair = TRUE, exact = FALSE, showplot = FALSE,
         delay = 0)
alpha | significance level; scalar or vector; the resulting confidence region(s) illustrate 100(1 - alpha)% confidence
distn | distribution to fit the dataset to; accepted values: "cauchy", "gamma", "invgauss", "llogis", "lnorm", "logis", "norm", "unif", "weibull"
n | trial sample size (producing each confidence region); scalar or vector; needed if a dataset is not given
iter | iterations (or replications) of individual trials per parameterization; needed if a dataset is not given
dataset | an 'n'-row, 'iter'-column matrix of dataset values ('n' samples per column, one column per trial)
point | coverage is assessed relative to this point
seed | random number generator seed
a | distribution parameter (when applicable)
b | distribution parameter (when applicable)
kappa | distribution parameter (when applicable)
lambda | distribution parameter (when applicable)
mu | distribution parameter (when applicable)
s | distribution parameter (when applicable)
sigma | distribution parameter (when applicable)
theta | distribution parameter (when applicable)
heuristic | numeric value selecting the plotting method: 0 for elliptic-oriented point distribution, and 1 for the smoothing boundary search heuristic
maxdeg | maximum angle tolerance between consecutive plot segments, in degrees
ellipse_n | number of roughly equidistant confidence region points to plot using the elliptic-oriented point distribution (must be a multiple of four because its algorithm exploits symmetry in the quadrants of an ellipse)
pts | displays confidence region boundary points if TRUE
mlelab | logical argument to include the maximum likelihood estimate coordinate point (default is TRUE)
sf | significant figures in axes labels specified using sf = c(x, y), where x and y represent the optional digits argument in the R function round()
mar | specifies margin values for par(mar = c(...)) (see par() for details)
xlab | string specifying the horizontal axis label (applies to confidence region plots when showplot = TRUE)
ylab | string specifying the vertical axis label (applies to confidence region plots when showplot = TRUE)
main | string specifying the plot title (applies to confidence region plots when showplot = TRUE)
xlas | numeric value of 0, 1, 2, or 3 specifying the style of axis labels (see par() for details)
ylas | numeric value of 0, 1, 2, or 3 specifying the style of axis labels (see par() for details)
origin | logical argument to include the plot origin (applies to confidence region plots when showplot = TRUE)
xlim | two-element vector containing horizontal axis minimum and maximum values (applies to confidence region plots when showplot = TRUE)
ylim | two-element vector containing vertical axis minimum and maximum values (applies to confidence region plots when showplot = TRUE)
tol | the uniroot() parameter specifying its required accuracy
info | logical argument to return coverage information in a list; includes alpha value(s), n value(s), and coverage and error results per iteration
returnsamp | logical argument; if TRUE, returns the sample values in an 'n'-row, 'iter'-column matrix
returnquant | logical argument; if TRUE, returns the sample cdf values in an 'n'-row, 'iter'-column matrix
repair | logical argument to repair regions inaccessible using a radial angle from its MLE (multiple root azimuths)
exact | logical argument specifying if the alpha value is adjusted to compensate for negative coverage bias in order to achieve (1 - alpha) coverage probability using previously recorded Monte Carlo simulation results; available for limited values of alpha (roughly <= 0.2-0.3), n (typically n = 4, 5, ..., 50), and distributions (distn suffixes: weibull, llogis, norm)
showplot | logical argument specifying if each coverage trial produces a plot
delay | numeric value of delay (in seconds) between trials so each plot can be seen (applies when showplot = TRUE)
Parameterizations for supported distributions are given following the default axes conventions in use by crplot and coversim, which are:

Distribution | Horizontal Axis | Vertical Axis
Cauchy | $a$ | $s$
gamma | $\theta$ | $\kappa$
inverse Gaussian | $\mu$ | $\lambda$
log logistic | $\lambda$ | $\kappa$
log normal | $\mu$ | $\sigma$
logistic | $\mu$ | $\sigma$
normal | $\mu$ | $\sigma$
uniform | $a$ | $b$
Weibull | $\kappa$ | $\lambda$
Each respective distribution is defined below.

The Cauchy distribution, for the real-numbered location parameter $a$, scale parameter $s > 0$, and $x$ a real number, has probability density function
$$f(x) = \frac{1}{s \pi \left[ 1 + \left( (x - a)/s \right)^2 \right]}.$$

The gamma distribution, for shape parameter $\kappa > 0$, scale parameter $\theta > 0$, and $x > 0$, has probability density function
$$f(x) = \frac{1}{\Gamma(\kappa)\, \theta^{\kappa}}\, x^{\kappa - 1} e^{-x / \theta}.$$

The inverse Gaussian distribution, for mean $\mu > 0$, shape parameter $\lambda > 0$, and $x > 0$, has probability density function
$$f(x) = \sqrt{\frac{\lambda}{2 \pi x^3}}\, e^{-\lambda (x - \mu)^2 / (2 \mu^2 x)}.$$

The log logistic distribution, for scale parameter $\lambda > 0$, shape parameter $\kappa > 0$, and $x > 0$, has probability density function
$$f(x) = \frac{\kappa \lambda (\lambda x)^{\kappa - 1}}{\left( 1 + (\lambda x)^{\kappa} \right)^2}.$$

The log normal distribution, for the real-numbered mean $\mu$ of the logarithm, standard deviation $\sigma > 0$ of the logarithm, and $x > 0$, has probability density function
$$f(x) = \frac{1}{x \sigma \sqrt{2 \pi}}\, e^{-(\ln x - \mu)^2 / (2 \sigma^2)}.$$

The logistic distribution, for the real-numbered location parameter $\mu$, scale parameter $\sigma > 0$, and $x$ a real number, has probability density function
$$f(x) = \frac{e^{-(x - \mu)/\sigma}}{\sigma \left( 1 + e^{-(x - \mu)/\sigma} \right)^2}.$$

The normal distribution, for the real-numbered mean $\mu$, standard deviation $\sigma > 0$, and $x$ a real number, has probability density function
$$f(x) = \frac{1}{\sigma \sqrt{2 \pi}}\, e^{-(x - \mu)^2 / (2 \sigma^2)}.$$

The uniform distribution, for real-valued parameters $a$ and $b$ where $a < b$ and $a \le x \le b$, has probability density function
$$f(x) = \frac{1}{b - a}.$$

The Weibull distribution, for scale parameter $\lambda > 0$, shape parameter $\kappa > 0$, and $x > 0$, has probability density function
$$f(x) = \kappa \lambda^{\kappa} x^{\kappa - 1} e^{-(\lambda x)^{\kappa}}.$$
If the optional argument info = TRUE is included, then a list of coverage results is returned. That list includes alpha value(s), n value(s), and coverage and error results per iteration. Additionally, returnsamp = TRUE and/or returnquant = TRUE will result in an n-row, iter-column matrix of sample and/or sample cdf values.
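When coverage results are needed programmatically, the returned list (whose element names are best discovered by inspection rather than assumed here) can be examined with str(); a minimal sketch:

library(conf)
res <- coversim(alpha = 0.1, "norm", n = 20, iter = 5,
                mu = 0, sigma = 1, info = TRUE)
str(res)   # alpha and n values, plus coverage and error results per iteration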
Christopher Weld ([email protected])
Lawrence Leemis ([email protected])
C. Weld, A. Loh, L. Leemis (2019), "Plotting Two-Dimensional Confidence Regions", The American Statistician, Volume 73, Number 2, 156-168.
## assess actual coverage at alpha values {0.5, 0.1} given n = 30 samples, completing
## 10 trials per parameterization (iter) for a normal(mu = 2, sigma = 3) rv
coversim(alpha = c(0.5, 0.1), "norm", n = 30, iter = 10, mu = 2, sigma = 3)

## show plots for 5 iterations of 30 samples each from a
## Weibull(kappa = 0.5, lambda = 1.5) rv
coversim(0.5, "weibull", n = 30, iter = 5, lambda = 1.5, kappa = 0.5,
         showplot = TRUE, origin = TRUE)
Plots a two-dimensional confidence region for probability distribution parameters (supported distribution suffixes: cauchy, gamma, invgauss, lnorm, llogis, logis, norm, unif, weibull) corresponding to a user-given complete or right-censored dataset and level of significance. See the CRAN website https://CRAN.R-project.org/package=conf for links to two crplot vignettes.
crplot(dataset, alpha, distn, cen = rep(1, length(dataset)), heuristic = 1,
       maxdeg = 5, ellipse_n = 4, pts = TRUE, mlelab = TRUE, sf = NULL,
       mar = c(4, 4.5, 2, 1.5), xyswap = FALSE, xlab = "", ylab = "",
       main = "", xlas = 0, ylas = 0, origin = FALSE, xlim = NULL,
       ylim = NULL, tol = .Machine$double.eps ^ 1, info = FALSE,
       maxcount = 30, repair = TRUE, jumpshift = 0.5,
       jumpuphill = min(alpha, 0.01), jumpinfo = FALSE, showjump = FALSE,
       showplot = TRUE, animate = FALSE, delay = 0.5, exact = FALSE,
       silent = FALSE)
dataset | a vector of n data values
alpha | significance level; the resulting plot illustrates a 100(1 - alpha)% confidence region
distn | distribution to fit the dataset to; accepted values: "cauchy", "gamma", "invgauss", "llogis", "lnorm", "logis", "norm", "unif", "weibull"
cen | a vector of binary values specifying if the corresponding data values are right-censored (0) or observed (1, default); its length must match length(dataset)
heuristic | numeric value selecting the plotting method: 0 for elliptic-oriented point distribution, and 1 for the smoothing boundary search heuristic
maxdeg | maximum angle tolerance between consecutive plot segments, in degrees
ellipse_n | number of roughly equidistant confidence region points to plot using the elliptic-oriented point distribution (must be a multiple of four because its algorithm exploits symmetry in the quadrants of an ellipse)
pts | displays the confidence region boundary points identified if TRUE
mlelab | logical argument to include the maximum likelihood estimate coordinate point (default is TRUE)
sf | significant figures in axes labels specified using sf = c(x, y), where x and y represent the optional digits argument in the R function round()
mar | specifies margin values for par(mar = c(...)) (see par() for details)
xyswap | logical argument to swap the axes on which the distribution parameters are shown
xlab | string specifying the x axis label
ylab | string specifying the y axis label
main | string specifying the plot title
xlas | numeric value of 0, 1, 2, or 3 specifying the style of axis labels (see par() for details)
ylas | numeric value of 0, 1, 2, or 3 specifying the style of axis labels (see par() for details)
origin | logical argument to include the plot origin (default is FALSE)
xlim | two-element vector containing horizontal axis minimum and maximum values
ylim | two-element vector containing vertical axis minimum and maximum values
tol | the uniroot() parameter specifying its required accuracy
info | logical argument to return plot information: the MLE is returned as a list; (x, y) plot point coordinates and corresponding phi angles (with respect to the MLE) are returned as a list
maxcount | integer value specifying the number of smoothing search iterations before the search terminates
repair | logical argument to repair regions inaccessible using a radial angle from the MLE due to multiple roots at select azimuth angles
jumpshift | see the vignette "conf Advanced Options" for details; location (as a fractional value between 0 and 1) along the vertical or horizontal "gap" (near an uncharted region) to locate a jump-center toward; can be either a scalar value (uniformly applied to all jump-centers) or a vector of length four (with unique values for the respective quadrants, relative to the MLE)
jumpuphill | see the vignette "conf Advanced Options" for details; the significance level increase applied to alpha when locating a jump-center
jumpinfo | logical argument to return plot information (see info) together with jump-center information
showjump | logical argument specifying if jump-center repair reference points appear on the confidence region plot
showplot | logical argument specifying if a plot is output; setting it to FALSE suppresses the plot (useful when only the returned information is of interest)
animate | logical argument specifying if an animated plot build will display; the animation sequence is given in successive plots
delay | numeric value of delay (in seconds) between successive plots when animate = TRUE
exact | logical argument specifying if the alpha value is adjusted to compensate for negative coverage bias to achieve (1 - alpha) coverage probability using previously recorded Monte Carlo simulation results; available for limited values of alpha (roughly <= 0.2-0.3), n (typically n = 4, 5, ..., 50), and distributions (distn suffixes: weibull, llogis, norm)
silent | logical argument specifying if console output should be suppressed
This function plots a confidence region for a variety of two-parameter distributions. It requires:
a vector of dataset values,
the level of significance (alpha), and
a population distribution to fit the data to.
Plots display according to the probability density function parameterizations given later in this section. Two heuristics (and their combination) are available to plot confidence regions; they are:
Smoothing Boundary Search Heuristic (default). This heuristic plots more points in areas of greater curvature to ensure a smooth appearance throughout the confidence region boundary. Its maxdeg parameter specifies the maximum tolerable angle between three successive points. Lower values of maxdeg result in smoother plots, and its default value of 5 degrees provides adequate smoothing in most circumstances. Values of maxdeg below 3 are not recommended due to their complicating implications for trigonometric numerical approximations near 0 and 1; their use may result in plot errors.
Elliptic-Oriented Point Distribution. This heuristic allows the user to specify a number of points to plot along the confidence region boundary at roughly uniform intervals. Its name is derived from the technique it uses to choose these points: an extension of the Steiner generation of a non-degenerate conic section, also known as the parallelogram method, which identifies points along an ellipse that are approximately equidistant. To exploit the computational benefits of ellipse symmetry over its four quadrants, the ellipse_n value must be divisible by four.
By default, crplot implements the smoothing boundary search heuristic. Alternatively, the user can plot using the elliptic-oriented point distribution algorithm, or a combination of them both. Combining the two techniques initializes the plot using the elliptic-oriented point distribution algorithm, and then subsequently populates additional points in areas of high curvature (those outside of the maximum angle tolerance parameterization) in accordance with the smoothing boundary search heuristic. This combination results when the smoothing boundary search heuristic is specified in conjunction with an ellipse_n value greater than four.

Both of the aforementioned heuristics use a radial profile log-likelihood function to identify points along the confidence region boundary. It cuts the log-likelihood function in a directional azimuth from its MLE, and locates the associated confidence region boundary point using the asymptotic distribution of the likelihood ratio test statistic
$$-2 \left[ \ln L(\theta) - \ln L(\hat{\theta}) \right],$$
which converges in distribution to the chi-square distribution with two degrees of freedom (for a two-parameter distribution).
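A compact sketch of this root-finding step (an illustration under simplifying assumptions, not the crplot implementation) for the Weibull parameterization used by conf, in which dweibull's scale equals $1/\lambda$; the sample data, starting values, and the uniroot() search interval are arbitrary choices:

samp <- c(17.88, 28.92, 33.00, 41.52, 42.12, 45.60)
loglik <- function(kappa, lambda) {
  if (kappa <= 0 || lambda <= 0) return(-1e10)    # keep the search in bounds
  sum(dweibull(samp, shape = kappa, scale = 1 / lambda, log = TRUE))
}
mle <- optim(c(1, 0.02), function(p) -loglik(p[1], p[2]))$par
lmax <- loglik(mle[1], mle[2])
boundary.pt <- function(phi, alpha = 0.05) {
  ## walk radially from the MLE at azimuth phi until the likelihood ratio
  ## statistic reaches the chi-square(2) critical value
  g <- function(r) {
    theta <- mle + r * c(cos(phi), sin(phi))
    -2 * (loglik(theta[1], theta[2]) - lmax) - qchisq(1 - alpha, df = 2)
  }
  r <- uniroot(g, c(1e-9, 5))$root
  mle + r * c(cos(phi), sin(phi))
}
boundary.pt(pi / 4)    # one point on the confidence region boundary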
The default axes conventions in use by crplot are:

Distribution | Horizontal Axis | Vertical Axis
Cauchy | $a$ | $s$
gamma | $\theta$ | $\kappa$
inverse Gaussian | $\mu$ | $\lambda$
log logistic | $\lambda$ | $\kappa$
log normal | $\mu$ | $\sigma$
logistic | $\mu$ | $\sigma$
normal | $\mu$ | $\sigma$
uniform | $a$ | $b$
Weibull | $\kappa$ | $\lambda$
where each respective distribution is defined below.

The Cauchy distribution, for the real-numbered location parameter $a$, scale parameter $s > 0$, and $x$ a real number, has probability density function
$$f(x) = \frac{1}{s \pi \left[ 1 + \left( (x - a)/s \right)^2 \right]}.$$

The gamma distribution, for shape parameter $\kappa > 0$, scale parameter $\theta > 0$, and $x > 0$, has probability density function
$$f(x) = \frac{1}{\Gamma(\kappa)\, \theta^{\kappa}}\, x^{\kappa - 1} e^{-x / \theta}.$$

The inverse Gaussian distribution, for mean $\mu > 0$, shape parameter $\lambda > 0$, and $x > 0$, has probability density function
$$f(x) = \sqrt{\frac{\lambda}{2 \pi x^3}}\, e^{-\lambda (x - \mu)^2 / (2 \mu^2 x)}.$$

The log logistic distribution, for scale parameter $\lambda > 0$, shape parameter $\kappa > 0$, and $x > 0$, has probability density function
$$f(x) = \frac{\kappa \lambda (\lambda x)^{\kappa - 1}}{\left( 1 + (\lambda x)^{\kappa} \right)^2}.$$

The log normal distribution, for the real-numbered mean $\mu$ of the logarithm, standard deviation $\sigma > 0$ of the logarithm, and $x > 0$, has probability density function
$$f(x) = \frac{1}{x \sigma \sqrt{2 \pi}}\, e^{-(\ln x - \mu)^2 / (2 \sigma^2)}.$$

The logistic distribution, for the real-numbered location parameter $\mu$, scale parameter $\sigma > 0$, and $x$ a real number, has probability density function
$$f(x) = \frac{e^{-(x - \mu)/\sigma}}{\sigma \left( 1 + e^{-(x - \mu)/\sigma} \right)^2}.$$

The normal distribution, for the real-numbered mean $\mu$, standard deviation $\sigma > 0$, and $x$ a real number, has probability density function
$$f(x) = \frac{1}{\sigma \sqrt{2 \pi}}\, e^{-(x - \mu)^2 / (2 \sigma^2)}.$$

The uniform distribution, for real-valued parameters $a$ and $b$ where $a < b$ and $a \le x \le b$, has probability density function
$$f(x) = \frac{1}{b - a}.$$

The Weibull distribution, for scale parameter $\lambda > 0$, shape parameter $\kappa > 0$, and $x > 0$, has probability density function
$$f(x) = \kappa \lambda^{\kappa} x^{\kappa - 1} e^{-(\lambda x)^{\kappa}}.$$
If the optional argument info = TRUE is included, then a list is returned with:

parm1*: a vector containing the confidence region boundary values for parameter 1
parm2*: a vector containing the confidence region boundary values for parameter 2
phi: a vector containing the angles used
parm1hat*: the MLE for parameter 1
parm2hat*: the MLE for parameter 2

*Note: "parm1" and "parm2" are placeholders that are replaced with the appropriate parameter names based on the probability distribution.
Christopher Weld ([email protected])
Lawrence Leemis ([email protected])
A. Jaeger (2016), "Computation of Two- and Three-Dimensional Confidence Regions with the Likelihood Ratio", The American Statistician, 70 (4), 395-398.
C. Weld, A. Loh, L. Leemis (2019), "Plotting Two-Dimensional Confidence Regions", The American Statistician, Volume 73, Number 2, 156-168.
## plot the 95% confidence region for Weibull shape and scale parameters
## corresponding to the given ballbearing dataset
ballbearing <- c(17.88, 28.92, 33.00, 41.52, 42.12, 45.60, 48.48, 51.84,
                 51.96, 54.12, 55.56, 67.80, 68.64, 68.64, 68.88, 84.12,
                 93.12, 98.64, 105.12, 105.84, 127.92, 128.04, 173.40)
crplot(dataset = ballbearing, distn = "weibull", alpha = 0.05)

## repeat this plot using the elliptic-oriented point distribution heuristic
crplot(dataset = ballbearing, distn = "weibull", alpha = 0.05,
       heuristic = 0, ellipse_n = 80)

## combine the two heuristics, compensating any elliptic-oriented point vertices whose
## apparent angles > 6 degrees with additional points, and expand the plot area to
## include the origin
crplot(dataset = ballbearing, distn = "weibull", alpha = 0.05,
       maxdeg = 6, ellipse_n = 80, origin = TRUE)

## next use the inverse Gaussian distribution and show no plot points
crplot(dataset = ballbearing, distn = "invgauss", alpha = 0.05, pts = FALSE)
Density, distribution function, quantile function, and random generation for the inverse Gaussian distribution. The corresponding code for these functions as well as the manual information included here is attributed to Christophe Pouzat's STAR Package (archived 2022-05-23).
dinvgauss(x, mu = 1, sigma2 = 1, boundary = NULL, log = FALSE)
pinvgauss(q, mu = 1, sigma2 = 1, boundary = NULL, lower.tail = TRUE, log.p = FALSE)
qinvgauss(p, mu = 1, sigma2 = 1, boundary = NULL)
rinvgauss(n = 1, mu = 1, sigma2 = 1, boundary = NULL)
x, q | vector of quantiles
p | vector of probabilities
n | number of observations; if length(n) > 1, the length is taken to be the number required
mu | mean value of the distribution in the default ("sigma2") parameterization
sigma2 | variance of the latent Brownian motion; when this parameterization is used (the default), the distance between the "starting" point and the boundary ("absorbing barrier") is set to 1
boundary | distance between the starting point and the "absorbing barrier" of the latent Brownian motion; when this parameterization is used, the Brownian motion variance is set to 1
lower.tail | logical; if TRUE (default), probabilities are P[X <= x], otherwise P[X > x]
log, log.p | logical; if TRUE, probabilities/densities p are given as log(p)
With the default, "sigma2"
, parameterization (mu = m,
sigma2 = s^2
) the inverse
Gaussian distribution has density:
with .
The theoretical mean is:
and the theoretical variance is:
.
With the default,
"boundary"
, parameterization (mu = m,
boundary = b
), the inverse
Gaussian distribution has density:
with .
The theoretical mean is:
and the theoretical variance is:
.
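A quick numerical check (a sketch using only the functions documented here) of the stated moments under the default "sigma2" parameterization:

set.seed(1)
m <- 0.075
s2 <- 3
samp <- rinvgauss(100000, mu = m, sigma2 = s2)
c(mean = mean(samp), var = var(samp))    # simulated moments
c(mean = m, var = m^3 * s2)              # theoretical moments given above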
The latent Brownian motion is described in Lindsey (2004) pp 209-213, Whitmore and Seshadri (1987), Aalen and Gjessing (2001), and Gerstein and Mandelbrot (1964).

The expression for the distribution function is given in Eq. 4 of Whitmore and Seshadri (1987). Initial guesses for the inversion of the distribution function used in qinvgauss are obtained with the transformation of Whitmore and Yalovsky (1978). Random variates are obtained with the method of Michael et al. (1976), which is also described by Devroye (1986, p 148) and Gentle (2003, p 193).
dinvgauss gives the density, pinvgauss gives the distribution function, qinvgauss gives the quantile function, and rinvgauss generates random deviates.
Christophe Pouzat [email protected]
Gerstein, George L. and Mandelbrot, Benoit (1964) Random Walk Models for the Spike Activity of a Single Neuron. Biophys J. 4: 41–68.
Whitmore, G. A. and Yalovsky, M. (1978) A normalizing logarithmic transformation for inverse Gaussian random variables. Technometrics 20: 207–208.
Whitmore, G. A. and Seshadri, V. (1987) A Heuristic Derivation of the Inverse Gaussian Distribution. The American Statistician 41: 280–281.
Aalen, Odd O. and Gjessing, Hakon K. (2001) Understanding the Shape of the Hazard Rate: A Process Point of View. Statistical Science 16: 1–14.
Lindsey, J.K. (2004) Introduction to Applied Statistics: A Modelling Approach. OUP.
Michael, J. R., Schucany, W. R. and Haas, R. W. (1976) Generating random variates using transformations with multiple roots. The American Statistician 30: 88–90.
Devroye, L. (1986) Non-Uniform Random Variate Generation. Springer-Verlag.
Gentle, J. E. (2003) Random Number Generation and Monte Carlo Methods. Springer.
## Not run:
## Start with the inverse Gauss
## Define standard mu and sigma
mu.true <- 0.075    ## a mean ISI of 75 ms
sigma2.true <- 3
## Define a sequence of points on the time axis
X <- seq(0.001, 0.3, 0.001)
## look at the density
plot(X, dinvgauss(X, mu.true, sigma2.true), type = "l",
     xlab = "ISI (s)", ylab = "Density")

## Generate a sample of 100 ISI from this distribution
sampleSize <- 100
sampIG <- rinvgauss(sampleSize, mu = mu.true, sigma2 = sigma2.true)
## check out the empirical survival function (obtained with the Kaplan-Meier
## estimator) against the true one
library(survival)
sampIG.KMfit <- survfit(Surv(sampIG, 1 + numeric(length(sampIG))) ~ 1)
plot(sampIG.KMfit, log = TRUE)
lines(X, pinvgauss(X, mu.true, sigma2.true, lower.tail = FALSE), col = 2)

## Get a ML fit
sampIGmleIG <- invgaussMLE(sampIG)
## compare true and estimated parameters
rbind(est = sampIGmleIG$estimate, se = sampIGmleIG$se, true = c(mu.true, sigma2.true))

## plot contours of the log relative likelihood function
Mu <- seq(sampIGmleIG$estimate[1] - 3 * sampIGmleIG$se[1],
          sampIGmleIG$estimate[1] + 3 * sampIGmleIG$se[1],
          sampIGmleIG$se[1] / 10)
Sigma2 <- seq(sampIGmleIG$estimate[2] - 7 * sampIGmleIG$se[2],
              sampIGmleIG$estimate[2] + 7 * sampIGmleIG$se[2],
              sampIGmleIG$se[2] / 10)
sampIGmleIGcontour <- sapply(Mu, function(mu) sapply(Sigma2, function(s2) sampIGmleIG$r(mu, s2)))
contour(Mu, Sigma2, t(sampIGmleIGcontour),
        levels = c(log(c(0.5, 0.1)), -0.5 * qchisq(c(0.95, 0.99), df = 2)),
        labels = c("log(0.5)", "log(0.1)",
                   "-1/2 * P(Chi2 = 0.95)", "-1/2 * P(Chi2 = 0.99)"),
        xlab = expression(mu), ylab = expression(sigma^2))
points(mu.true, sigma2.true, pch = 16, col = 2)
## We can see that the contours are more parabola like on a log scale
contour(log(Mu), log(Sigma2), t(sampIGmleIGcontour),
        levels = c(log(c(0.5, 0.1)), -0.5 * qchisq(c(0.95, 0.99), df = 2)),
        labels = c("log(0.5)", "log(0.1)",
                   "-1/2 * P(Chi2 = 0.95)", "-1/2 * P(Chi2 = 0.99)"),
        xlab = expression(log(mu)), ylab = expression(log(sigma^2)))
points(log(mu.true), log(sigma2.true), pch = 16, col = 2)
## make a deviance test for the true parameters
pchisq(-2 * sampIGmleIG$r(mu.true, sigma2.true), df = 2)
## check fit with a QQ plot
qqDuration(sampIGmleIG, log = "xy")

## Generate a censored sample using an exponential distribution
sampEXP <- rexp(sampleSize, 1 / (2 * mu.true))
sampIGtime <- pmin(sampIG, sampEXP)
sampIGstatus <- as.numeric(sampIG <= sampEXP)
## fit the censored sample
sampIG2mleIG <- invgaussMLE(sampIGtime, sampIGstatus)
## look at the results
rbind(est = sampIG2mleIG$estimate, se = sampIG2mleIG$se, true = c(mu.true, sigma2.true))
pchisq(-2 * sampIG2mleIG$r(mu.true, sigma2.true), df = 2)
## repeat the survival function estimation
sampIG2.KMfit <- survfit(Surv(sampIGtime, sampIGstatus) ~ 1)
plot(sampIG2.KMfit, log = TRUE)
lines(X, pinvgauss(X, sampIG2mleIG$estimate[1], sampIG2mleIG$estimate[2],
                   lower.tail = FALSE), col = 2)
## End(Not run)
Density, distribution function, quantile function, and random generation for the log logistic distribution. The corresponding code for these functions as well as the manual information included here is attributed to Christophe Pouzat's STAR Package (archived 2022-05-23).
dllogis(x, location = 0, scale = 1, log = FALSE)
pllogis(q, location = 0, scale = 1, lower.tail = TRUE, log.p = FALSE)
qllogis(p, location = 0, scale = 1, lower.tail = TRUE, log.p = FALSE)
rllogis(n, location = 0, scale = 1)
x, q | vector of quantiles
p | vector of probabilities
n | number of observations; if length(n) > 1, the length is taken to be the number required
location, scale | location (real) and scale (positive) parameters
lower.tail | logical; if TRUE (default), probabilities are P[X <= x], otherwise P[X > x]
log, log.p | logical; if TRUE, probabilities p are given as log(p)
If location or scale are omitted, they assume the default values of 0 and 1, respectively.

The log-logistic distribution with location = m and scale = s has distribution function
$$F(x) = \frac{1}{1 + e^{-(\ln x - m)/s}}$$
and density
$$f(x) = \frac{1}{s x}\, \frac{e^{-(\ln x - m)/s}}{\left( 1 + e^{-(\ln x - m)/s} \right)^2}.$$
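Because the distribution function above is the logistic cdf evaluated at $\ln x$, pllogis() should agree with plogis() applied on the log scale; a quick sketch of this check:

q <- c(0.05, 0.1, 0.2)
all.equal(pllogis(q, location = -2.7, scale = 0.025),
          plogis(log(q), location = -2.7, scale = 0.025))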
dllogis gives the density, pllogis gives the distribution function, qllogis gives the quantile function, and rllogis generates random deviates.
Christophe Pouzat [email protected]
Lindsey, J.K. (2004) Introduction to Applied Statistics: A Modelling Approach. OUP.
Lindsey, J.K. (2004) The Statistical Analysis of Stochastic Processes in Time. CUP.
## Not run:
tSeq <- seq(0.001, 0.6, 0.001)
location.true <- -2.7
scale.true <- 0.025
Yd <- dllogis(tSeq, location.true, scale.true)
Yh <- hllogis(tSeq, location.true, scale.true)
max.Yd <- max(Yd)
max.Yh <- max(Yh)
Yd <- Yd / max.Yd
Yh <- Yh / max.Yh
oldpar <- par(mar = c(5, 4, 4, 4))
plot(tSeq, Yd, type = "n", axes = FALSE, ann = FALSE,
     xlim = c(0, 0.6), ylim = c(0, 1))
axis(2, at = seq(0, 1, 0.2), labels = round(seq(0, 1, 0.2) * max.Yd, digits = 2))
mtext("Density (1/s)", side = 2, line = 3)
axis(1, at = pretty(c(0, 0.6)))
mtext("Time (s)", side = 1, line = 3)
axis(4, at = seq(0, 1, 0.2), labels = round(seq(0, 1, 0.2) * max.Yh, digits = 2))
mtext("Hazard (1/s)", side = 4, line = 3, col = 2)
mtext("Log Logistic Density and Hazard Functions", side = 3, line = 2, cex = 1.5)
lines(tSeq, Yd)
lines(tSeq, Yh, col = 2)
par(oldpar)
## End(Not run)
Estimate gamma model parameters by the maximum likelihood method using possibly censored data. Two different parameterizations of the gamma distribution can be used. The corresponding code for this function as well as the manual information included here is attributed to Christophe Pouzat's STAR Package (archived 2022-05-23).
gammaMLE(yi, ni = numeric(length(yi)) + 1, si = numeric(length(yi)) + 1, scale = TRUE)
yi | vector of (possibly binned) observations, or a spikeTrain object
ni | vector of counts for each value of yi; default: numeric(length(yi)) + 1
si | vector of counts of uncensored observations for each value of yi; default: numeric(length(yi)) + 1
scale | logical; should the scale (TRUE, default) or the rate (FALSE) parameterization be used?
There is no closed form expression for the MLE of a gamma distribution. The numerical method implemented here uses the profile likelihood described by Monahan (2001) pp 210-216.
In order to ensure good behavior of the numerical optimization routines, optimization is performed on the log of the parameters (shape and scale, or rate).
Standard errors are obtained from the inverse of the observed information matrix at the MLE. They are transformed to go from the log scale used by the optimization routine to the parameterization requested.
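The back-transformation is the delta method: with $u = \log\theta$, $d e^u / du = e^u$, so the standard error on the original scale is approximately $\theta \cdot \mathrm{se}(\log\theta)$. A one-line sketch with illustrative values only (the helper name is hypothetical):

se.from.log <- function(estimate, se.log) estimate * se.log    # delta method
se.from.log(estimate = 6, se.log = 0.1)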
A list of class durationFit with the following components:

estimate | the estimated parameters, a named vector
se | the standard errors, a named vector
logLik | the log likelihood at maximum
r | a function returning the log of the relative likelihood function
mll | a function returning the opposite of the log likelihood function using the log of the parameters
call | the matched call
The returned standard errors (component se) are valid in the asymptotic limit. You should plot contours using function r in the returned list and check that the contours are reasonably close to ellipses.
Christophe Pouzat [email protected]
Monahan, J. F. (2001) Numerical Methods of Statistics. CUP.
Lindsey, J.K. (2004) Introduction to Applied Statistics: A Modelling Approach. OUP.
## Not run:
## Simulate sample of size 100 from a gamma distribution
set.seed(1102006, "Mersenne-Twister")
sampleSize <- 100
shape.true <- 6
scale.true <- 0.012
sampGA <- rgamma(sampleSize, shape = shape.true, scale = scale.true)
sampGAmleGA <- gammaMLE(sampGA)
rbind(est = sampGAmleGA$estimate, se = sampGAmleGA$se, true = c(shape.true, scale.true))

## Estimate the log relative likelihood on a grid to plot contours
Shape <- seq(sampGAmleGA$estimate[1] - 4 * sampGAmleGA$se[1],
             sampGAmleGA$estimate[1] + 4 * sampGAmleGA$se[1],
             sampGAmleGA$se[1] / 10)
Scale <- seq(sampGAmleGA$estimate[2] - 4 * sampGAmleGA$se[2],
             sampGAmleGA$estimate[2] + 4 * sampGAmleGA$se[2],
             sampGAmleGA$se[2] / 10)
sampGAmleGAcontour <- sapply(Shape, function(sh) sapply(Scale, function(sc) sampGAmleGA$r(sh, sc)))
## plot contours using a linear scale for the parameters
## draw four contours corresponding to the following likelihood ratios:
## 0.5, 0.1, Chi2 with 2 df and p values of 0.95 and 0.99
X11(width = 12, height = 6)
layout(matrix(1:2, ncol = 2))
contour(Shape, Scale, t(sampGAmleGAcontour),
        levels = c(log(c(0.5, 0.1)), -0.5 * qchisq(c(0.95, 0.99), df = 2)),
        labels = c("log(0.5)", "log(0.1)",
                   "-1/2*P(Chi2=0.95)", "-1/2*P(Chi2=0.99)"),
        xlab = "shape", ylab = "scale",
        main = "Log Relative Likelihood Contours")
points(sampGAmleGA$estimate[1], sampGAmleGA$estimate[2], pch = 3)
points(shape.true, scale.true, pch = 16, col = 2)
## The contours are not really symmetrical about the MLE we can try to
## replot them using a log scale for the parameters to see if that improves
## the situation
contour(log(Shape), log(Scale), t(sampGAmleGAcontour),
        levels = c(log(c(0.5, 0.1)), -0.5 * qchisq(c(0.95, 0.99), df = 2)),
        labels = "",
        xlab = "log(shape)", ylab = "log(scale)",
        main = "Log Relative Likelihood Contours",
        sub = "log scale for the parameters")
points(log(sampGAmleGA$estimate[1]), log(sampGAmleGA$estimate[2]), pch = 3)
points(log(shape.true), log(scale.true), pch = 16, col = 2)

## make a parametric bootstrap to check the distribution of the deviance
nbReplicate <- 10000
sampleSize <- 100
system.time(
  devianceGA100 <- replicate(nbReplicate, {
    sampGA <- rgamma(sampleSize, shape = shape.true, scale = scale.true)
    sampGAmleGA <- gammaMLE(sampGA)
    -2 * sampGAmleGA$r(shape.true, scale.true)
  })
)[3]
## Get 95 and 99% confidence intervals for the QQ plot
ci <- sapply(1:nbReplicate,
             function(idx) qchisq(qbeta(c(0.005, 0.025, 0.975, 0.995),
                                        idx, nbReplicate - idx + 1), df = 2))
## make QQ plot
X <- qchisq(ppoints(nbReplicate), df = 2)
Y <- sort(devianceGA100)
X11()
plot(X, Y, type = "n",
     xlab = expression(paste(chi[2]^2, " quantiles")),
     ylab = "MC quantiles",
     main = "Deviance with true parameters after ML fit of gamma data",
     sub = paste("sample size:", sampleSize, "MC replicates:", nbReplicate))
abline(a = 0, b = 1)
lines(X, ci[1, ], lty = 2)
lines(X, ci[2, ], lty = 2)
lines(X, ci[3, ], lty = 2)
lines(X, ci[4, ], lty = 2)
lines(X, Y, col = 2)
## End(Not run)
Estimate inverse Gaussian model parameters by the maximum likelihood method using possibly censored data. Two different parameterizations of the inverse Gaussian distribution can be used. The corresponding code for this function as well as the manual information included here is attributed to Christophe Pouzat's STAR Package (archived 2022-05-23).
invgaussMLE(yi, ni = numeric(length(yi)) + 1, si = numeric(length(yi)) + 1, parameterization = "sigma2")
yi | vector of (possibly binned) observations or a spikeTrain object. |
ni | vector of counts for each value of yi; default numeric(length(yi)) + 1. |
si | vector of counts of uncensored observations for each value of yi; default numeric(length(yi)) + 1. |
parameterization | parameterization used, "sigma2" (default) or "boundary". |
The two different parameterizations of the inverse Gaussian distribution are discussed in the manual of dinvgauss.
In the absence of censored data the ML estimates are available in closed form (Lindsey, 2004, p 212), together with the Hessian matrix at the MLE. In the presence of censored data, an initial guess for the parameters is obtained using the uncensored data before maximizing the likelihood function for the full data set using optim with the BFGS method. ML estimation is always performed with the "sigma2" parameterization. The parameters and variance-covariance matrix are transformed at the end if the "boundary" parameterization is requested.
In order to ensure good behavior of the numerical optimization routines, optimization is performed on the log of the parameters (mu and sigma2).
Standard errors are obtained from the inverse of the observed information matrix at the MLE. When the numerical optimization routine is used (i.e., for censored data), they are transformed from the log scale used by the routine to the requested parameterization.
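As a minimal sketch of the censored-data interface just described (the data values below are hypothetical, chosen only to illustrate the encoding), a right-censored value is one whose entry of si is smaller than its entry of ni:
yi <- c(0.5, 1.2, 2.0)          # distinct observed values
ni <- c(2, 3, 1)                # number of observations at each value
si <- c(2, 3, 0)                # number of uncensored observations at each value
fit <- invgaussMLE(yi, ni, si)  # the value 2.0 is right-censored (si = 0)
fit$estimate                    # MLEs of mu and sigma2
fit$se                          # asymptotic standard errors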
A list of class durationFit with the following components:
estimate | the estimated parameters, a named vector. |
se | the standard errors, a named vector. |
logLik | the log likelihood at maximum. |
r | a function returning the log of the relative likelihood function. |
mll | a function returning the opposite of the log likelihood function using the log of the parameters. |
call | the matched call. |
The returned standard errors (component se) are valid in the asymptotic limit. You should plot contours using the function r in the returned list and check that the contours are reasonably close to ellipses.
Christophe Pouzat [email protected]
Lindsey, J.K. (2004) Introduction to Applied Statistics: A Modelling Approach. OUP.
dinvgauss, gammaMLE, llogisMLE
## Simulate sample of size 100 from an inverse Gaussian
## distribution
set.seed(1102006,"Mersenne-Twister")
sampleSize <- 100
mu.true <- 0.075
sigma2.true <- 3
sampIG <- rinvgauss(sampleSize,mu=mu.true,sigma2=sigma2.true)
## Make a maximum likelihood fit
sampIGmleIG <- invgaussMLE(sampIG)
## Compare estimates with actual values
rbind(est = coef(sampIGmleIG),se = sampIGmleIG$se,true = c(mu.true,sigma2.true))
## In the absence of censoring the MLE of the inverse Gaussian is available in
## closed form together with its variance (i.e., the observed information
## matrix); we can check that we did not screw up at that stage by comparing
## the observed information matrix obtained numerically with the analytical
## one. To do that we use the MINUS log likelihood function returned by
## invgaussMLE to get a numerical estimate
detailedFit <- optim(par=as.vector(log(sampIGmleIG$estimate)),
                     fn=sampIGmleIG$mll,
                     method="BFGS",
                     hessian=TRUE)
## We should not forget that the "mll" function uses the log of the parameters
## while the "se" component of the sampIGmleIG list is expressed on the linear
## scale; we must therefore transform one into the other as follows
## (Kalbfleisch, 1985, p 71): if x = exp(u) and y = exp(v) and if we have the
## information matrix in terms of u and v (that's the hessian component of
## list detailedFit above), we create the matrix:
##         du/dx  du/dy
##  Q =
##         dv/dx  dv/dy
## and we get I in terms of x and y by the following matrix product:
## I(x,y) <- t(Q) %*% I(u,v) %*% Q
## In the present case:
## du/dx = 1/exp(u), du/dy = 0, dv/dx = 0, dv/dy = 1/exp(v)
## Therefore:
Q <- diag(1/exp(detailedFit$par))
numericalI <- t(Q) %*% detailedFit$hessian %*% Q
seComp <- rbind(sampIGmleIG$se, sqrt(diag(solve(numericalI))))
colnames(seComp) <- c("mu","sigma2")
rownames(seComp) <- c("analytical", "numerical")
seComp
## We can check the relative differences between the two
apply(seComp,2,function(x) abs(diff(x))/x[1])

## Not run:
## Estimate the log relative likelihood on a grid to plot contours
Mu <- seq(coef(sampIGmleIG)[1]-4*sampIGmleIG$se[1],
          coef(sampIGmleIG)[1]+4*sampIGmleIG$se[1],
          sampIGmleIG$se[1]/10)
Sigma2 <- seq(coef(sampIGmleIG)[2]-4*sampIGmleIG$se[2],
              coef(sampIGmleIG)[2]+4*sampIGmleIG$se[2],
              sampIGmleIG$se[2]/10)
sampIGmleIGcontour <- sapply(Mu, function(mu) sapply(Sigma2, function(s2) sampIGmleIG$r(mu,s2)))
## plot contours using a linear scale for the parameters
## draw four contours corresponding to the following likelihood ratios:
## 0.5, 0.1, Chi2 with 2 df and p values of 0.95 and 0.99
X11(width=12,height=6)
layout(matrix(1:2,ncol=2))
contour(Mu,Sigma2,t(sampIGmleIGcontour),
        levels=c(log(c(0.5,0.1)),-0.5*qchisq(c(0.95,0.99),df=2)),
        labels=c("log(0.5)","log(0.1)","-1/2*P(Chi2=0.95)","-1/2*P(Chi2=0.99)"),
        xlab=expression(mu),ylab=expression(sigma^2),
        main="Log Relative Likelihood Contours")
points(coef(sampIGmleIG)[1],coef(sampIGmleIG)[2],pch=3)
points(mu.true,sigma2.true,pch=16,col=2)
## The contours are not really symmetrical about the MLE; we can try to
## replot them using a log scale for the parameters to see if that improves
## the situation
contour(log(Mu),log(Sigma2),t(sampIGmleIGcontour),
        levels=c(log(c(0.5,0.1)),-0.5*qchisq(c(0.95,0.99),df=2)),
        labels="",
        xlab=expression(log(mu)),ylab=expression(log(sigma^2)),
        main="Log Relative Likelihood Contours",
        sub="log scale for the parameters")
points(log(coef(sampIGmleIG)[1]),log(coef(sampIGmleIG)[2]),pch=3)
points(log(mu.true),log(sigma2.true),pch=16,col=2)
## Even with the log scale the contours are not ellipsoidal, so let us
## compute profiles
## For that we are going to use the returned MINUS log likelihood function
logMuProfFct <- function(logMu,...) {
  myOpt <- optimise(function(x) sampIGmleIG$mll(c(logMu,x))+logLik(sampIGmleIG),...)
  as.vector(unlist(myOpt[c("objective","minimum")]))
}
logMuProfCI <- function(logMu, CI, a=logS2Seq[1], b=logS2Seq[length(logS2Seq)])
  logMuProfFct(logMu,c(a,b))[1] - qchisq(CI,1)/2
logS2ProfFct <- function(logS2,...) {
  myOpt <- optimise(function(x) sampIGmleIG$mll(c(x,logS2))+logLik(sampIGmleIG),...)
  as.vector(unlist(myOpt[c("objective","minimum")]))
}
logS2ProfCI <- function(logS2, CI, a=logMuSeq[1], b=logMuSeq[length(logMuSeq)])
  logS2ProfFct(logS2,c(a,b))[1] - qchisq(CI,1)/2
## We compute profiles (on the log scale) exploring +/- 3 times
## the se about the MLE
logMuSE <- sqrt(diag(solve(detailedFit$hessian)))[1]
logMuSeq <- seq(log(coef(sampIGmleIG)[1])-3*logMuSE,
                log(coef(sampIGmleIG)[1])+3*logMuSE,
                logMuSE/10)
logS2SE <- sqrt(diag(solve(detailedFit$hessian)))[2]
logS2Seq <- seq(log(coef(sampIGmleIG)[2])-3*logS2SE,
                log(coef(sampIGmleIG)[2])+3*logS2SE,
                logS2SE/10)
logMuProf <- sapply(logMuSeq,logMuProfFct,
                    lower=logS2Seq[1],
                    upper=logS2Seq[length(logS2Seq)])
## Get 95% and 99% confidence intervals from the profile
logMuCI95 <- c(uniroot(logMuProfCI,
                       interval=c(logMuSeq[1],log(coef(sampIGmleIG)[1])),
                       CI=0.95)$root,
               uniroot(logMuProfCI,
                       interval=c(log(coef(sampIGmleIG)[1]),logMuSeq[length(logMuSeq)]),
                       CI=0.95)$root)
logMuCI99 <- c(uniroot(logMuProfCI,
                       interval=c(logMuSeq[1],log(coef(sampIGmleIG)[1])),
                       CI=0.99)$root,
               uniroot(logMuProfCI,
                       interval=c(log(coef(sampIGmleIG)[1]),logMuSeq[length(logMuSeq)]),
                       CI=0.99)$root)
logS2Prof <- sapply(logS2Seq,logS2ProfFct,
                    lower=logMuSeq[1],
                    upper=logMuSeq[length(logMuSeq)])
## Get 95% and 99% confidence intervals from the profile
logS2CI95 <- c(uniroot(logS2ProfCI,
                       interval=c(logS2Seq[1],log(coef(sampIGmleIG)[2])),
                       CI=0.95)$root,
               uniroot(logS2ProfCI,
                       interval=c(log(coef(sampIGmleIG)[2]),logS2Seq[length(logS2Seq)]),
                       CI=0.95)$root)
logS2CI99 <- c(uniroot(logS2ProfCI,
                       interval=c(logS2Seq[1],log(coef(sampIGmleIG)[2])),
                       CI=0.99)$root,
               uniroot(logS2ProfCI,
                       interval=c(log(coef(sampIGmleIG)[2]),logS2Seq[length(logS2Seq)]),
                       CI=0.99)$root)
## Add profiles to the previous plot
lines(logMuSeq,logMuProf[2,],col=2,lty=2)
lines(logS2Prof[2,],logS2Seq,col=2,lty=2)
## We can now check the deviations of the (profiled) deviances
## from the asymptotic parabolic curves
X11()
layout(matrix(1:4,nrow=2))
oldpar <- par(mar=c(4,4,2,1))
logMuSeqOffset <- logMuSeq-log(coef(sampIGmleIG)[1])
logMuVar <- diag(solve(detailedFit$hessian))[1]
plot(logMuSeq,2*logMuProf[1,],type="l",xlab=expression(log(mu)),ylab="Deviance")
lines(logMuSeq,logMuSeqOffset^2/logMuVar,col=2)
points(log(coef(sampIGmleIG)[1]),0,pch=3)
abline(h=0)
abline(h=qchisq(0.95,1),lty=2)
abline(h=qchisq(0.99,1),lty=2)
lines(rep(logMuCI95[1],2),c(0,qchisq(0.95,1)),lty=2)
lines(rep(logMuCI95[2],2),c(0,qchisq(0.95,1)),lty=2)
lines(rep(logMuCI99[1],2),c(0,qchisq(0.99,1)),lty=2)
lines(rep(logMuCI99[2],2),c(0,qchisq(0.99,1)),lty=2)
## We can also "linearize" this last graph
plot(logMuSeq,
     sqrt(2*logMuProf[1,])*sign(logMuSeqOffset),
     type="l",
     xlab=expression(log(mu)),
     ylab=expression(paste("signed ",sqrt(Deviance))))
lines(logMuSeq,
      sqrt(logMuSeqOffset^2/logMuVar)*sign(logMuSeqOffset),
      col=2)
points(log(coef(sampIGmleIG)[1]),0,pch=3)
logS2SeqOffset <- logS2Seq-log(coef(sampIGmleIG)[2])
logS2Var <- diag(solve(detailedFit$hessian))[2]
plot(logS2Seq,2*logS2Prof[1,],type="l",xlab=expression(log(sigma^2)),ylab="Deviance")
lines(logS2Seq,logS2SeqOffset^2/logS2Var,col=2)
points(log(coef(sampIGmleIG)[2]),0,pch=3)
abline(h=0)
abline(h=qchisq(0.95,1),lty=2)
abline(h=qchisq(0.99,1),lty=2)
lines(rep(logS2CI95[1],2),c(0,qchisq(0.95,1)),lty=2)
lines(rep(logS2CI95[2],2),c(0,qchisq(0.95,1)),lty=2)
lines(rep(logS2CI99[1],2),c(0,qchisq(0.99,1)),lty=2)
lines(rep(logS2CI99[2],2),c(0,qchisq(0.99,1)),lty=2)
## We can also "linearize" this last graph
plot(logS2Seq,
     sqrt(2*logS2Prof[1,])*sign(logS2SeqOffset),
     type="l",
     xlab=expression(log(sigma^2)),
     ylab=expression(paste("signed ",sqrt(Deviance))))
lines(logS2Seq,
      sqrt(logS2SeqOffset^2/logS2Var)*sign(logS2SeqOffset),
      col=2)
points(log(coef(sampIGmleIG)[2]),0,pch=3)
par(oldpar)
## make a parametric bootstrap to check the distribution of the deviance
nbReplicate <- 1000 # 10000
sampleSize <- 100
system.time(
  devianceIG100 <- lapply(1:nbReplicate,
    function(idx) {
      if ((idx %% 100) == 0) cat(idx,"\n") ## progress report
      sampIG <- rinvgauss(sampleSize,mu=mu.true,sigma2=sigma2.true)
      sampIGmleIG <- invgaussMLE(sampIG)
      Deviance <- -2*sampIGmleIG$r(mu.true,sigma2.true)
      logPara <- log(coef(sampIGmleIG))
      logParaSE <- sampIGmleIG$se/coef(sampIGmleIG)
      intervalMu <- function(n) c(-n,n)*logParaSE[1]+logPara[1]
      intervalS2 <- function(n) c(-n,n)*logParaSE[2]+logPara[2]
      logMuProfFct <- function(logMu,...) {
        optimise(function(x) sampIGmleIG$mll(c(logMu,x))+logLik(sampIGmleIG),...)$objective
      }
      logMuProfCI <- function(logMu, CI, a=intervalS2(4)[1], b=intervalS2(4)[2])
        logMuProfFct(logMu,c(a,b)) - qchisq(CI,1)/2
      logS2ProfFct <- function(logS2,...) {
        optimise(function(x) sampIGmleIG$mll(c(x,logS2))+logLik(sampIGmleIG),...)$objective
      }
      logS2ProfCI <- function(logS2, CI, a=intervalMu(4)[1], b=intervalMu(4)[2])
        logS2ProfFct(logS2,c(a,b)) - qchisq(CI,1)/2
      factor <- 4
      while((logMuProfCI(intervalMu(factor)[2],0.99) * logMuProfCI(logPara[1],0.99) >= 0) ||
            (logMuProfCI(intervalMu(factor)[1],0.99) * logMuProfCI(logPara[1],0.99) >= 0)
            ) factor <- factor+1
      logMuCI95 <- c(uniroot(logMuProfCI,
                             interval=c(intervalMu(factor)[1],logPara[1]),
                             CI=0.95)$root,
                     uniroot(logMuProfCI,
                             interval=c(logPara[1],intervalMu(factor)[2]),
                             CI=0.95)$root)
      logMuCI99 <- c(uniroot(logMuProfCI,
                             interval=c(intervalMu(factor)[1],logPara[1]),
                             CI=0.99)$root,
                     uniroot(logMuProfCI,
                             interval=c(logPara[1],intervalMu(factor)[2]),
                             CI=0.99)$root)
      factor <- 4
      while((logS2ProfCI(intervalS2(factor)[2],0.99) * logS2ProfCI(logPara[2],0.99) >= 0) ||
            (logS2ProfCI(intervalS2(factor)[1],0.99) * logS2ProfCI(logPara[2],0.99) >= 0)
            ) factor <- factor+1
      logS2CI95 <- c(uniroot(logS2ProfCI,
                             interval=c(intervalS2(factor)[1],logPara[2]),
                             CI=0.95)$root,
                     uniroot(logS2ProfCI,
                             interval=c(logPara[2],intervalS2(factor)[2]),
                             CI=0.95)$root)
      logS2CI99 <- c(uniroot(logS2ProfCI,
                             interval=c(intervalS2(factor)[1],logPara[2]),
                             CI=0.99)$root,
                     uniroot(logS2ProfCI,
                             interval=c(logPara[2],intervalS2(factor)[2]),
                             CI=0.99)$root)
      list(deviance=Deviance,
           logMuCI95=logMuCI95,
           logMuNorm95=qnorm(c(0.025,0.975),logPara[1],logParaSE[1]),
           logMuCI99=logMuCI99,
           logMuNorm99=qnorm(c(0.005,0.995),logPara[1],logParaSE[1]),
           logS2CI95=logS2CI95,
           logS2Norm95=qnorm(c(0.025,0.975),logPara[2],logParaSE[2]),
           logS2CI99=logS2CI99,
           logS2Norm99=qnorm(c(0.005,0.995),logPara[2],logParaSE[2]))
    })
)[3]
## Find out how many times the true parameters were within the computed CIs
nLogMuCI95 <- sum(sapply(devianceIG100,
                         function(l) l$logMuCI95[1] <= log(mu.true) &&
                                     log(mu.true) <= l$logMuCI95[2]))
nLogMuNorm95 <- sum(sapply(devianceIG100,
                           function(l) l$logMuNorm95[1] <= log(mu.true) &&
                                       log(mu.true) <= l$logMuNorm95[2]))
nLogMuCI99 <- sum(sapply(devianceIG100,
                         function(l) l$logMuCI99[1] <= log(mu.true) &&
                                     log(mu.true) <= l$logMuCI99[2]))
nLogMuNorm99 <- sum(sapply(devianceIG100,
                           function(l) l$logMuNorm99[1] <= log(mu.true) &&
                                       log(mu.true) <= l$logMuNorm99[2]))
## Check if these counts are compatible with the nominal CIs
c(prof95Mu=nLogMuCI95,norm95Mu=nLogMuNorm95)
qbinom(c(0.005,0.995),nbReplicate,0.95)
c(prof99Mu=nLogMuCI99,norm99Mu=nLogMuNorm99)
qbinom(c(0.005,0.995),nbReplicate,0.99)
nLogS2CI95 <- sum(sapply(devianceIG100,
                         function(l) l$logS2CI95[1] <= log(sigma2.true) &&
                                     log(sigma2.true) <= l$logS2CI95[2]))
nLogS2Norm95 <- sum(sapply(devianceIG100,
                           function(l) l$logS2Norm95[1] <= log(sigma2.true) &&
                                       log(sigma2.true) <= l$logS2Norm95[2]))
nLogS2CI99 <- sum(sapply(devianceIG100,
                         function(l) l$logS2CI99[1] <= log(sigma2.true) &&
                                     log(sigma2.true) <= l$logS2CI99[2]))
nLogS2Norm99 <- sum(sapply(devianceIG100,
                           function(l) l$logS2Norm99[1] <= log(sigma2.true) &&
                                       log(sigma2.true) <= l$logS2Norm99[2]))
## Check if these counts are compatible with the nominal CIs
c(prof95S2=nLogS2CI95,norm95S2=nLogS2Norm95)
qbinom(c(0.005,0.995),nbReplicate,0.95)
c(prof99S2=nLogS2CI99,norm99S2=nLogS2Norm99)
qbinom(c(0.005,0.995),nbReplicate,0.99)
## Get 95 and 99% confidence intervals for the QQ plot
ci <- sapply(1:nbReplicate,
             function(idx) qchisq(qbeta(c(0.005,0.025,0.975,0.995),
                                        idx,
                                        nbReplicate-idx+1),
                                  df=2))
## make QQ plot
X <- qchisq(ppoints(nbReplicate),df=2)
Y <- sort(sapply(devianceIG100,function(l) l$deviance))
X11()
plot(X,Y,type="n",
     xlab=expression(paste(chi[2]^2," quantiles")),
     ylab="MC quantiles",
     main="Deviance with true parameters after ML fit of IG data",
     sub=paste("sample size:", sampleSize,"MC replicates:", nbReplicate))
abline(a=0,b=1)
lines(X,ci[1,],lty=2)
lines(X,ci[2,],lty=2)
lines(X,ci[3,],lty=2)
lines(X,ci[4,],lty=2)
lines(X,Y,col=2)
## End(Not run)
This function calculates and plots the actual coverage functions of five confidence interval procedures for the survivor function based on the Kaplan-Meier product-limit estimator (KMPLE) for a randomly right-censored data set with exponential failure times and exponential censoring times.
km.coverage(n, lambdaT, lambdaC, alpha = 0.1, interval = c("Greenwood"), show = TRUE, table = FALSE, value = 0)
n | Sample size; must be between 1 and 15. |
lambdaT | Lambda (rate) for the exponential failure time distribution. |
lambdaC | Lambda (rate) for the exponential censoring time distribution. |
alpha | Significance level for the confidence interval; alpha = 0.1, for example, indicates a 90 percent confidence interval. |
interval | Type of confidence interval used; supported types are "Greenwood", "Exp-Greenwood", "Log-Log", "Arcsine", and "Peto". This parameter can be a vector containing one or more interval types. |
show | A logical value indicating whether to display additional potential probability mass function (PMF) lines in gray; when set to TRUE, these gray lines are displayed. |
table | Logical value. If TRUE, a table of outcomes is printed; the table aids in understanding how the plot is generated. |
value | A specific value of p for generating the table. |
The Kaplan-Meier product-limit estimator provides a point estimator for a data set of randomly right-censored observations. The km.coverage function computes the actual coverage function of up to five different types of confidence interval procedures for the survivor function, for various values of its arguments, when failure times are IID exponential random variables with failure rate lambdaT and censoring times are IID exponential random variables with censoring rate lambdaC. The km.coverage function provides options for plotting and tabular display of the outcomes.
Depending on the interval type, km.coverage calls internal plotting functions: plot_Greenwood, plot_ExpGreenwood, plot_LogLog, plot_Arcsine, and plot_Peto.
When km.coverage is called, it generates an actual coverage graph (and, optionally, a table) for the requested confidence interval types, sample size, and exponential failure and censoring distributions. If the table argument is set to TRUE, it shows the corresponding contribution table at the specified value of p.
The interval argument can be one of the character strings "Greenwood", "Exp-Greenwood", "Log-Log", "Arcsine", and "Peto", or it can be a vector of the confidence interval procedure names, or it can be the character string "all", which will plot all five actual coverage functions. The order in which the actual coverage functions fall in the vector controls the order in which they are plotted. The order is significant because one actual coverage function often obscures part of another on the plot: the first listed interval in the vector is plotted first and may therefore be covered up by the plots of subsequent intervals in the vector.
This actual coverage plot is constructed by calculating lower and upper bounds associated with the confidence interval procedure at various permutations of the failure/censoring ordering for a particular n, the number of objects on test. We calculate the contribution to the actual coverage of each permutation case based on the probability mass function of the KMPLE. At each value of p corresponding to S(t), we sum up the contributions which correspond to lower bounds and upper bounds covering p. These sums give the actual coverage curves.
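The summation just described can be sketched in a few lines of R. The vectors lo, hi, and pr below are hypothetical stand-ins for the per-ordering lower bounds, upper bounds, and KMPLE outcome probabilities; they are not produced by km.coverage itself:
## actual coverage at p: total probability of the orderings whose
## confidence interval [lo, hi] covers p (hypothetical inputs)
actual.coverage <- function(p, lo, hi, pr) sum(pr[lo <= p & p <= hi])
## tracing p over a grid reproduces one actual coverage curve:
## sapply(seq(0.01, 0.99, by = 0.01), actual.coverage, lo = lo, hi = hi, pr = pr)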
This function primarily generates plots illustrating the coverage function for various Kaplan-Meier product-limit confidence intervals. If the table argument is set to TRUE, it additionally prints a detailed table of outcomes corresponding to the plot. The table includes calculated coverage values for a specified value of p, providing a complete explanation of how the plots are generated. For example, for a value of p along the horizontal axis, the rows of the table show all possible failure/censoring orderings, along with the associated contributions. Adding up the contributions from each row in which the interval covers p gives the point on the plot at p.
When there is more than one interval, the table option is disabled.
Xingyu Wang ([email protected]), Larry Leemis ([email protected]), Heather Sasinowska ([email protected])
km.coverage(5, 0.5, 1, 0.1, "Greenwood")
km.coverage(5, 0.5, 1, 0.1, "Greenwood", FALSE)
km.coverage(5, 0.5, 1, 0.1, "Arcsine", FALSE, TRUE, 0.3)
km.coverage(5, 0.5, 1, 0.1, "all")
km.coverage(5, 0.5, 1, 0.1, c("Arcsine", "Greenwood", "Exp-Greenwood"))
Generates a matrix containing all possible outcomes (all possible sequences of failure times and right-censoring times) of the value of the Kaplan-Meier product-limit estimator for a particular sample size n.
km.outcomes(n)
n | sample size |
The Kaplan-Meier product-limit estimator is used to estimate the survivor function for a data set of positive values in the presence of right censoring. The km.outcomes function generates a matrix with all possible combinations of observed failures and right-censored values and the resulting support values for the Kaplan-Meier product-limit estimator for a sample of size n.
The n argument must be a positive integer denoting the sample size. Allowable limits are from 1 to 24. Larger values of n are not allowed because of CPU and memory limitations.
In order to keep the support values as exact fractions, the numerators and denominators are stored separately in the returned matrix in the columns named num and den. The support values are stored as numeric values in the column named S(t).
The km.outcomes function returns a matrix with 2^(n+1) - 1 rows and n + 4 columns. The location l indicates the position where the time of interest falls within the observed events. The meaning of the columns is as follows.
l | number of observed events (failure times or censoring times) between time 0 and the observation time; |
d1, d2, ..., dn | equals 0 if the event corresponds to a censored observation, equals 1 if the event corresponds to a failure; |
S(t) | numeric value of the associated support value; |
num | numerator of the support value as a fraction; |
den | denominator of the support value as a fraction. |
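For example (a small sketch; the subsetting below simply uses the column layout documented above), the rows of the returned matrix can be inspected directly:
a <- km.outcomes(3)
head(a)                      # outcomes for l = 0, 1, ... observed events
a[a[, "l"] == 3, ]           # the 8 orderings in which all 3 events are observed
a[nrow(a), c("num", "den")]  # exact fraction behind the last support value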
Yuxin Qin ([email protected]), Heather Sasinowska ([email protected]), Larry Leemis ([email protected])
Qin, Y., Sasinowska, H., Leemis, L. (2023), "The Probability Mass Function of the Kaplan-Meier Product-Limit Estimator", The American Statistician, Volume 77, Number 1, 102-110.
km.outcomes(3)
Generates the probability mass function for the support values of the Kaplan-Meier product-limit estimator for a particular sample size n, probability of observing a failure h at the time of interest expressed as the cumulative probability perc associated with X = min(T, C), where T is the failure time and C is the censoring time under a random-censoring scheme.
km.pmf(n, h, perc, plot, sep, xfrac, cex.lollipop)
km.pmf(n, h, perc, plot, sep, xfrac, cex.lollipop)
n | sample size |
h | probability of observing a failure, in other words, the probability that an observation is uncensored, P(T < C) |
perc | cumulative probability associated with X = min(T, C) |
plot | option to plot the probability mass function (default is TRUE) |
sep | option to show the breakdown of the probability for each support value (see function km.outcomes; default is TRUE) |
xfrac | option to label support values on the x-axis as exact fractions (default is TRUE) |
cex.lollipop | size of the dots atop the spikes |
The Kaplan-Meier product-limit estimator is used to estimate the survivor function for a data set of positive values in the presence of right censoring. The km.pmf function generates the probability mass function for the support values of the Kaplan-Meier product-limit estimator for a particular sample size n, probability of observing a failure h at the time of interest expressed as the cumulative probability perc associated with X = min(T, C), where T is the failure time and C is the censoring time under a random-censoring scheme.
The n argument must be a positive integer denoting the sample size. Allowable limits are from 1 to 23. Larger values of n are not allowed because of CPU and memory limitations.
For larger sample sizes n, it is recommended to set sep = FALSE, xfrac = FALSE, and cex.lollipop = 0.01 for a better visual effect.
The km.pmf function returns a data frame with 2 columns. The column named S stores all the support values for the Kaplan-Meier product-limit estimator with sample size n, including NA. The column named P stores the associated probabilities.
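As a short sketch of how the returned data frame can be used (assuming plotting is suppressed via the plot argument), the expected value of the estimator conditioned on it being defined follows by dropping the NA row and renormalizing:
pmf <- km.pmf(4, 1/3, 0.75, plot = FALSE)
ok <- !is.na(pmf$S)                          # drop the NA (undefined) support value
sum(pmf$S[ok] * pmf$P[ok]) / sum(pmf$P[ok])  # conditional expected value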
Yuxin Qin ([email protected]), Heather Sasinowska ([email protected]), Larry Leemis ([email protected])
Qin, Y., Sasinowska, H., Leemis, L. (2023), "The Probability Mass Function of the Kaplan-Meier Product-Limit Estimator", The American Statistician, Volume 77, Number 1, 102-110.
km.pmf(4, 1/3, 0.75)
km.pmf(8, 1/2, 0.75, sep = FALSE, xfrac = FALSE, cex.lollipop = 0.01)
Calculate the support values for the Kaplan-Meier product-limit estimator for a particular sample size n using an induction algorithm.
km.support(n)
n | sample size |
The Kaplan-Meier product-limit estimator is used to estimate the survivor function for a data set of positive values in the presence of right censoring. The km.support function calculates the support values for the Kaplan-Meier product-limit estimator for a sample of size n using an induction algorithm described in Qin et al. (2023).
The n argument must be a positive integer denoting the sample size. Allowable limits are from 1 to 35. Larger values of n are not allowed because of CPU and memory limitations.
The numerators and denominators are temporarily converted to complex numbers within the km.support function in order to eliminate duplicate support values using the unique function.
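The device can be illustrated in isolation (a toy sketch, not the internal code): packing each numerator and denominator into one complex number lets a single call to unique remove repeated (numerator, denominator) pairs:
num <- c(1, 1, 2, 3)                        # toy numerators: 1/2 appears twice
den <- c(2, 2, 3, 4)                        # toy denominators
z <- unique(complex(real = num, imaginary = den))
cbind(num = Re(z), den = Im(z))             # the repeated (1, 2) pair is kept once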
The km.support function returns a list with two components:
num | a vector of integers containing the numerators of the support values |
den | a vector of integers containing the associated denominators of the support values |
The support values are not returned in sorted order. Zero and one, which are always a part of the support, are given as 0 / 1 and 1 / 1.
Yuxin Qin ([email protected]), Heather Sasinowska ([email protected]), Larry Leemis ([email protected])
Qin, Y., Sasinowska, H., Leemis, L. (2023), "The Probability Mass Function of the Kaplan-Meier Product-Limit Estimator", The American Statistician, Volume 77, Number 1, 102-110.
# display unsorted numerators and denominators of support values for n = 4
km.support(4)
# display sorted support values for n = 4 as exact fractions
n <- 4
s <- km.support(n)
i <- order(s$num / s$den)
m <- length(s$num)
f <- ""
for (j in i[2:(m - 1)]) f <- paste(f, s$num[j], "/", s$den[j], ", ", sep = "")
cat(paste("The ", m, " support values for n = ", n, " are: 0, ", f, "1.\n", sep = ""))
# print sorted support values for n = 4 as numerics
print(s$num[i] / s$den[i])
Plot the probability mass functions for the support values of the Kaplan-Meier product-limit estimator for a given sample size n with a probability of observing a failure h at various times of interest expressed as the cumulative probability perc associated with X = min(T, C), where T is the failure time and C is the censoring time, under a random-censoring scheme.
km.surv(n, h, lambda, ev, line, graydots, gray.cex, gray.outline, xfrac)
n | sample size |
h | probability of observing a failure |
lambda | plotting frequency of the probability mass functions (default is 10) |
ev | option to plot the expected values of the support values (default is FALSE) |
line | option to connect the expected values with lines (default is FALSE) |
graydots | option to express the weight of the support values using grayscale (default is FALSE) |
gray.cex | option to change the size of the gray dots (default is 1) |
gray.outline | option to display outlines of the gray dots (default is TRUE) |
xfrac | option to label support values on the y-axis as exact fractions (default is TRUE) |
The Kaplan-Meier product-limit estimator is used to estimate the survivor function for a data set of positive values in the presence of right censoring. The km.surv function plots the probability mass functions for the support values of the Kaplan-Meier product-limit estimator for a given sample size n with a probability of observing a failure h at various times of interest expressed as the cumulative probability perc associated with X = min(T, C), where T is the failure time and C is the censoring time, under a random-censoring scheme.
The n argument must be a positive integer denoting the sample size. Allowable limits are from 1 to 23. Larger values of n are not allowed because of CPU and memory limitations.
The default method to plot the probability mass functions uses the area of a dot to indicate the relative probability of a support value. An alternative is to plot the probability mass functions using grayscale (by setting graydots = TRUE). One of the two approaches might work better in different scenarios.
The expected values are calculated by removing the probability of NA and normalizing the remaining probabilities.
The km.surv function does not return a value.
Yuxin Qin ([email protected]), Heather Sasinowska ([email protected]), Larry Leemis ([email protected])
Qin, Y., Sasinowska, H., Leemis, L. (2023), "The Probability Mass Function of the Kaplan-Meier Product-Limit Estimator", The American Statistician, Volume 77, Number 1, 102-110.
km.surv(n = 4, h = 2/3, lambda = 100, ev = TRUE, line = TRUE)
km.surv(n = 5, h = 3/4, lambda = 50, graydots = TRUE, gray.cex = 0.6, gray.outline = FALSE)
km.surv(n = 7, h = 1/5, lambda = 30, graydots = TRUE, gray.cex = 0.6, xfrac = FALSE)
Estimate log logistic model parameters by the maximum likelihood method using possibly censored data. The corresponding code for this function as well as the manual information included here is attributed to Christophe Pouzat's STAR Package (archived 2022-05-23).
llogisMLE(yi, ni = numeric(length(yi)) + 1, si = numeric(length(yi)) + 1)
yi | vector of (possibly binned) observations or a spikeTrain object. |
ni | vector of counts for each value of yi; default numeric(length(yi)) + 1. |
si | vector of counts of uncensored observations for each value of yi; default numeric(length(yi)) + 1. |
The MLE for the log logistic is not available in closed form and is therefore obtained numerically by calling optim with the BFGS method.
In order to ensure good behavior of the numerical optimization routines, optimization is performed on the log of the parameter scale.
Standard errors are obtained from the inverse of the observed information matrix at the MLE. They are transformed to go from the log scale used by the optimization routine to the requested parameterization.
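The back-transformation of the standard error is the usual first-order delta method; a minimal sketch with hypothetical numbers:
## if v = log(scale) has standard error se.v, then on the natural scale
## se(scale) is approximately scale * se.v (first-order delta method)
scale.hat <- 0.025            # hypothetical MLE of scale
se.v <- 0.07                  # hypothetical standard error of log(scale)
se.scale <- scale.hat * se.v  # approximate standard error of scale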
A list of class durationFit with the following components:
estimate | the estimated parameters, a named vector. |
se | the standard errors, a named vector. |
logLik | the log likelihood at maximum. |
r | a function returning the log of the relative likelihood function. |
mll | a function returning the opposite of the log likelihood function using the log of the parameter scale. |
call | the matched call. |
The returned standard errors (component se) are valid in the asymptotic limit. You should plot contours using the function r in the returned list and check that the contours are reasonably close to ellipses.
Christophe Pouzat [email protected]
Lindsey, J.K. (2004) Introduction to Applied Statistics: A Modelling Approach. OUP.
Lindsey, J.K. (2004) The Statistical Analysis of Stochastic Processes in Time. CUP.
dllogis, invgaussMLE, gammaMLE
## Not run:
## Simulate sample of size 100 from a log logistic
## distribution
set.seed(1102006,"Mersenne-Twister")
sampleSize <- 100
location.true <- -2.7
scale.true <- 0.025
sampLL <- rllogis(sampleSize,location=location.true,scale=scale.true)
sampLLmleLL <- llogisMLE(sampLL)
rbind(est = sampLLmleLL$estimate,se = sampLLmleLL$se,true = c(location.true,scale.true))
## Estimate the log relative likelihood on a grid to plot contours
Loc <- seq(sampLLmleLL$estimate[1]-4*sampLLmleLL$se[1],
           sampLLmleLL$estimate[1]+4*sampLLmleLL$se[1],
           sampLLmleLL$se[1]/10)
Scale <- seq(sampLLmleLL$estimate[2]-4*sampLLmleLL$se[2],
             sampLLmleLL$estimate[2]+4*sampLLmleLL$se[2],
             sampLLmleLL$se[2]/10)
sampLLmleLLcontour <- sapply(Loc, function(m) sapply(Scale, function(s) sampLLmleLL$r(m,s)))
## plot contours using a linear scale for the parameters
## draw four contours corresponding to the following likelihood ratios:
## 0.5, 0.1, Chi2 with 2 df and p values of 0.95 and 0.99
X11(width=12,height=6)
layout(matrix(1:2,ncol=2))
contour(Loc,Scale,t(sampLLmleLLcontour),
        levels=c(log(c(0.5,0.1)),-0.5*qchisq(c(0.95,0.99),df=2)),
        labels=c("log(0.5)","log(0.1)","-1/2*P(Chi2=0.95)","-1/2*P(Chi2=0.99)"),
        xlab="Location",ylab="Scale",
        main="Log Relative Likelihood Contours")
points(sampLLmleLL$estimate[1],sampLLmleLL$estimate[2],pch=3)
points(location.true,scale.true,pch=16,col=2)
## The contours are not really symmetrical about the MLE; we can try to
## replot them using a log scale for the scale parameter to see if that
## improves the situation
contour(Loc,log(Scale),t(sampLLmleLLcontour),
        levels=c(log(c(0.5,0.1)),-0.5*qchisq(c(0.95,0.99),df=2)),
        labels="",
        xlab="Location",ylab="log(Scale)",
        main="Log Relative Likelihood Contours",
        sub="log scale for parameter: scale")
points(sampLLmleLL$estimate[1],log(sampLLmleLL$estimate[2]),pch=3)
points(location.true,log(scale.true),pch=16,col=2)
## make a parametric bootstrap to check the distribution of the deviance
nbReplicate <- 10000
sampleSize <- 100
system.time(
  devianceLL100 <- replicate(nbReplicate,{
    sampLL <- rllogis(sampleSize,location=location.true,scale=scale.true)
    sampLLmleLL <- llogisMLE(sampLL)
    -2*sampLLmleLL$r(location.true,scale.true)
  })
)[3]
## Get 95 and 99% confidence intervals for the QQ plot
ci <- sapply(1:nbReplicate,
             function(idx) qchisq(qbeta(c(0.005,0.025,0.975,0.995),
                                        idx,
                                        nbReplicate-idx+1),
                                  df=2))
## make QQ plot
X <- qchisq(ppoints(nbReplicate),df=2)
Y <- sort(devianceLL100)
X11()
plot(X,Y,type="n",
     xlab=expression(paste(chi[2]^2," quantiles")),
     ylab="MC quantiles",
     main="Deviance with true parameters after ML fit of log logistic data",
     sub=paste("sample size:", sampleSize,"MC replicates:", nbReplicate))
abline(a=0,b=1)
lines(X,ci[1,],lty=2)
lines(X,ci[2,],lty=2)
lines(X,ci[3,],lty=2)
lines(X,ci[4,],lty=2)
lines(X,Y,col=2)
## End(Not run)