| Title: | Goodness-of-Fit Tests and Diagrams Based on Mellin Log-Cumulants |
|---|---|
| Description: | A family of three complementary goodness-of-fit tests based on an adaptation of Hotelling's T-squared statistic applied to vectors of sample log-cumulants (Mellin statistics) for positive-support reliability data. The package provides the asymptotic chi-squared reference and parametric bootstrap p-values for reliable finite-sample inference, covering the Weibull, Frechet, Gamma, Inverse-Gamma, Log-Normal, and Log-Logistic families. It also provides three diagnostic diagrams (log-cumulant, kurtosis-skewness, and coefficient-of-variation) with bootstrap concentration ellipses, in the spirit of moment-ratio diagrams. Methods are described in Santos, Ospina, Espinheira and Oliveira (2025). |
| Authors: | Raydonal Ospina [aut, cre] (ORCID: <https://orcid.org/0000-0002-9884-9090>) |
| Maintainer: | Raydonal Ospina <[email protected]> |
| License: | GPL-3 |
| Version: | 0.1.0 |
| Built: | 2026-06-12 15:04:10 UTC |
| Source: | https://github.com/cran/logcumulant |

Computes the Anderson-Darling (AD) and Cramer-von Mises (CvM) statistics and their p-values for a fitted distribution, based on the probability integral transform.

ad_cvm_test(x, dist, theta)

| Argument | Description |
|---|---|
| x | Numeric vector of positive observations. |
| dist | Character; distribution name. |
| theta | Numeric length-2 parameter vector (typically MLE). |

A list with AD, AD_p, CvM, CvM_p.

set.seed(1); x <- rdist(100, "Weibull", c(2, 1))
ad_cvm_test(x, "Weibull", c(2, 1))
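As a rough illustration of what "based on the probability integral transform" means, the sketch below computes both statistics by hand in base R for a Weibull null with known parameters. It uses stats::pweibull instead of the package's pdist, the helper name pit_ad_cvm is hypothetical, and p-values are left out because the way the package obtains them is not described here.

```r
# Hypothetical helper (not part of the package): AD and CvM statistics
# computed from the probability integral transform u = F(x).
pit_ad_cvm <- function(x, cdf, ...) {
  n <- length(x)
  u <- sort(cdf(x, ...))   # ordered PIT values
  i <- seq_len(n)
  # Anderson-Darling:
  #   A^2 = -n - (1/n) * sum (2i - 1) * [log u_(i) + log(1 - u_(n+1-i))]
  ad <- -n - mean((2 * i - 1) * (log(u) + log(1 - rev(u))))
  # Cramer-von Mises:
  #   W^2 = 1/(12n) + sum (u_(i) - (2i - 1)/(2n))^2
  cvm <- 1 / (12 * n) + sum((u - (2 * i - 1) / (2 * n))^2)
  list(AD = ad, CvM = cvm)
}

set.seed(1)
x <- rweibull(100, shape = 2, scale = 1)
res <- pit_ad_cvm(x, pweibull, shape = 2, scale = 1)
res
```

Large values of either statistic indicate that the transformed sample departs from uniformity on (0, 1), which is evidence against the fitted family.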

Akaike information criterion for a fitted family.

aic_value(x, dist, fit)

| Argument | Description |
|---|---|
| x | Numeric vector of positive observations. |
| dist | Character; distribution name. |
| fit | A |

Numeric AIC value.

set.seed(1); x <- rdist(100, "Gamma", c(3, 0.5))
aic_value(x, "Gamma", mle_fit(x, "Gamma"))
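For orientation, the usual definition is AIC = 2k - 2 logLik, with k = 2 for every two-parameter family listed in this manual. The sketch below evaluates it for a Gamma model with base R only; aic_gamma is a hypothetical helper, and it assumes theta = c(shape, scale) as stated for the distribution interface.

```r
# Hypothetical helper (not part of the package): AIC = 2k - 2 * logLik.
# theta = c(shape, scale); the density comes from stats::dgamma.
aic_gamma <- function(x, theta) {
  loglik <- sum(dgamma(x, shape = theta[1], scale = theta[2], log = TRUE))
  2 * length(theta) - 2 * loglik
}

set.seed(1)
x <- rgamma(100, shape = 3, scale = 0.5)
aic_true <- aic_gamma(x, c(3, 0.5))    # parameters used to simulate
aic_off  <- aic_gamma(x, c(30, 0.5))   # badly misspecified shape
c(aic_true = aic_true, aic_off = aic_off)
```

Lower values are preferred, so the misspecified shape should give a visibly larger AIC.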

Draws the coefficient-of-variation diagnostic diagram on the original scale
(CV versus skewness) with theoretical loci, bootstrap cloud, and 95%
concentration ellipse.

cv_diagram(
  data,
  data_name = "Dataset",
  B = NULL,
  seed = 42,
  level = 0.95,
  xlim = c(0, 2.2),
  ylim = c(-0.2, 5)
)

| Argument | Description |
|---|---|
| data | Numeric vector of positive observations. |
| data_name | Character; label used in the title. |
| B | Integer; bootstrap replicates (default chosen adaptively from |
| seed | Integer random seed. |
| level | Numeric; ellipse confidence level (default 0.95). |
| xlim, ylim | Numeric length-2 axis limits. |

A ggplot object.

data(reliability_datasets)
cv_diagram(reliability_datasets$Yarn, "Yarn", B = 200)

Unified density, random-number, and distribution-function interfaces for the
six positive-support families supported by the package: Weibull, Frechet,
Gamma, Inverse-Gamma, Log-Normal, and Log-Logistic. The two-parameter vector
theta = c(par1, par2) is interpreted as (shape, scale) for all
families except Log-Normal, where it is (meanlog, sdlog).

ldist(x, dist, theta, log = TRUE)
rdist(n, dist, theta)
pdist(q, dist, theta)

| Argument | Description |
|---|---|
| x, q | Numeric vector of quantiles (positive support). |
| dist | Character; one of |
| theta | Numeric length-2 parameter vector (see Details). |
| log | Logical; if |
| n | Integer; number of random values to draw. |

ldist returns the (log-)density, rdist a random sample
of length n, and pdist the cumulative distribution function,
each evaluated at the supplied points.

set.seed(1)
x <- rdist(100, "Weibull", c(2, 1))
head(ldist(x, "Weibull", c(2, 1)))
pdist(1, "Gamma", c(3, 0.5))
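Base R has no Frechet functions, so the following sketch shows one way the three interfaces could look for that family under the (shape, scale) convention above. The names dfrechet, pfrechet and rfrechet are hypothetical stand-ins, not the package's internals; the sampler relies on the fact that the reciprocal of a unit-scale Weibull variate is Frechet.

```r
# Hypothetical base-R stand-ins for the Frechet family; theta = c(shape, scale).
# CDF: F(q) = exp(-(q / scale)^(-shape)) for q > 0.
pfrechet <- function(q, theta) exp(-(q / theta[2])^(-theta[1]))

# Density, optionally on the log scale.
dfrechet <- function(x, theta, log = FALSE) {
  a <- theta[1]; s <- theta[2]
  ld <- log(a) - log(s) - (a + 1) * log(x / s) - (x / s)^(-a)
  if (log) ld else exp(ld)
}

# If W ~ Weibull(shape = a, scale = 1), then s / W ~ Frechet(a, s).
rfrechet <- function(n, theta) {
  theta[2] / rweibull(n, shape = theta[1], scale = 1)
}

set.seed(1)
theta <- c(2, 1)
y <- rfrechet(10000, theta)
c(empirical = mean(y <= 1), theoretical = pfrechet(1, theta))
```

The empirical proportion below 1 should be close to exp(-1), the value of the CDF at the scale parameter.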

Per-sample (unit) Fisher information matrix for the supported families. The Weibull/Frechet matrix uses the corrected positive-definite form derived in the methodology.

fisher_closed(dist, theta)

| Argument | Description |
|---|---|
| dist | Character; distribution name. |
| theta | Numeric length-2 parameter vector. |

A 2 by 2 Fisher information matrix.

fisher_closed("Weibull", c(2, 1))

Fits a single family and returns the three T-squared statistics (with
asymptotic and, optionally, bootstrap p-values), the AD and CvM tests, and
the AIC, in a single row.

gof_analyze(x, dist, use_bootstrap = FALSE, B = NULL, seed = NULL)

| Argument | Description |
|---|---|
| x | Numeric vector of positive observations. |
| dist | Character; distribution name. |
| use_bootstrap | Logical; compute bootstrap p-values. |
| B | Integer; bootstrap replicates. |
| seed | Optional integer random seed. |

A one-row data.frame of statistics and p-values.

set.seed(1); x <- rdist(100, "Weibull", c(2, 1))
gof_analyze(x, "Weibull")

Runs gof_analyze across all six (or a chosen subset of)
families and returns a comparison table, the natural entry point for model
selection.

gof_compare_all(
  x,
  dists = .LC_DISTS,
  use_bootstrap = FALSE,
  B = NULL,
  seed = NULL
)

| Argument | Description |
|---|---|
| x | Numeric vector of positive observations. |
| dists | Character vector of distribution names to compare. |
| use_bootstrap | Logical; compute bootstrap p-values. |
| B | Integer; bootstrap replicates. |
| seed | Optional integer random seed. |

A data.frame with one row per distribution.

set.seed(1); x <- rdist(100, "Weibull", c(2, 1))
gof_compare_all(x)

Returns the analytic Jacobian for the selected set of
cumulant orders, used in the construction of the statistics.

jacobian_J(dist, theta, V)

| Argument | Description |
|---|---|
| dist | Character; distribution name. |
| theta | Numeric length-2 parameter vector. |
| V | Integer vector of cumulant orders (e.g. |

A length(V) by 2 numeric matrix.

jacobian_J("Weibull", c(2, 1), V = c(2, 3))

Draws the kurtosis-skewness diagnostic diagram on the original scale
(skewness versus excess kurtosis), including the feasible-region boundary,
theoretical loci, bootstrap cloud, and 95% concentration ellipse.

kurtosis_diagram(
  data,
  data_name = "Dataset",
  B = NULL,
  seed = 42,
  level = 0.95,
  xlim = c(-1.5, 4),
  ylim = c(-3, 16)
)

| Argument | Description |
|---|---|
| data | Numeric vector of positive observations. |
| data_name | Character; label used in the title. |
| B | Integer; bootstrap replicates (default chosen adaptively from |
| seed | Integer random seed. |
| level | Numeric; ellipse confidence level (default 0.95). |
| xlim, ylim | Numeric length-2 axis limits. |

A ggplot object.

data(reliability_datasets)
kurtosis_diagram(reliability_datasets$Yarn, "Yarn", B = 200)

Draws the log-cumulant diagnostic diagram with the theoretical loci of the
six reference distributions, a bootstrap cloud of the sample estimate, and a
95% concentration ellipse.

log_cumulant_diagram(
  data,
  data_name = "Dataset",
  B = NULL,
  seed = 42,
  level = 0.95,
  xlim = c(-2, 2),
  ylim = c(0, 2)
)

| Argument | Description |
|---|---|
| data | Numeric vector of positive observations. |
| data_name | Character; label used in the title. |
| B | Integer; bootstrap replicates (default chosen adaptively from |
| seed | Integer random seed. |
| level | Numeric; ellipse confidence level (default 0.95). |
| xlim, ylim | Numeric length-2 axis limits. |

A ggplot object.

data(reliability_datasets)
log_cumulant_diagram(reliability_datasets$Yarn, "Yarn", B = 200)
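To make the "bootstrap cloud" and "concentration ellipse" concrete, here is a minimal base-R sketch. Several choices are assumptions rather than documented behaviour: the cloud is built by nonparametric resampling, the plotted pair is taken to be the third- and second-order sample log-cumulants, and the ellipse is the Gaussian one obtained from the cloud's mean and covariance with a chi-squared radius. The helper log_cumulants_23 is hypothetical.

```r
# Hypothetical helper: moment-based sample log-cumulants of orders 3 and 2.
log_cumulants_23 <- function(x) {
  lx <- log(x)
  m <- mean(lx)
  c(k3 = mean((lx - m)^3), k2 = mean((lx - m)^2))
}

set.seed(42)
x <- rweibull(100, shape = 2, scale = 1)

# Bootstrap cloud: B resamples of the data, one (k3, k2) point per resample.
B <- 200
cloud <- t(replicate(B, log_cumulants_23(sample(x, replace = TRUE))))

# Concentration ellipse at the requested level (assumed Gaussian shape).
level <- 0.95
center <- colMeans(cloud)
S <- cov(cloud)
radius <- sqrt(qchisq(level, df = 2))
angle <- seq(0, 2 * pi, length.out = 200)
circle <- rbind(cos(angle), sin(angle))
# t(chol(S)) is the lower-triangular factor L with L %*% t(L) = S.
ellipse <- t(center + radius * t(chol(S)) %*% circle)

# Share of bootstrap points that fall inside the ellipse.
inside <- mean(mahalanobis(cloud, center, S) <= qchisq(level, df = 2))
inside
```

The matrices cloud and ellipse can then be passed to any plotting layer, for example as points and a closed path.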

Fits one of the six supported families by maximum likelihood, optimizing on the log-scale of the parameters for numerical stability, and returns the estimates together with the observed-information-based covariance.

mle_fit(x, dist, init = NULL)

| Argument | Description |
|---|---|
| x | Numeric vector of positive observations. |
| dist | Character; distribution name. |
| init | Optional numeric length-2 vector of starting values. |

A list with elements theta (estimates), Sigma
(covariance of the estimates), loglik,
and conv (convergence flag).

set.seed(1)
x <- rdist(200, "Gamma", c(3, 0.5))
fit <- mle_fit(x, "Gamma")
fit$theta
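The idea of optimizing on the log-scale can be sketched in base R for the Gamma case. This is an illustration under stated assumptions, not the package's code: fit_gamma_logscale is a hypothetical name, starting values come from the method of moments, and the covariance is mapped back to the original scale with the delta method.

```r
# Hypothetical sketch: Gamma MLE optimised over eta = log(c(shape, scale)).
fit_gamma_logscale <- function(x, init = NULL) {
  # Method-of-moments starting values when none are supplied.
  if (is.null(init)) init <- c(mean(x)^2 / var(x), var(x) / mean(x))
  negll <- function(eta) {
    -sum(dgamma(x, shape = exp(eta[1]), scale = exp(eta[2]), log = TRUE))
  }
  opt <- optim(log(init), negll, method = "BFGS", hessian = TRUE)
  theta <- exp(opt$par)
  # Delta method: d theta / d eta = diag(theta), so
  # Cov(theta) is approximately D %*% solve(H) %*% D with D = diag(theta),
  # where H is the observed information on the log scale.
  D <- diag(theta)
  Sigma <- D %*% solve(opt$hessian) %*% D
  list(theta = theta, Sigma = Sigma,
       loglik = -opt$value, conv = opt$convergence)
}

set.seed(1)
x <- rgamma(200, shape = 3, scale = 0.5)
fit <- fit_gamma_logscale(x)
fit$theta
sqrt(diag(fit$Sigma))   # approximate standard errors
```

Working with log-parameters keeps shape and scale positive without box constraints, which is the stability benefit the description refers to.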

Overlays bootstrap clouds for several datasets on the log-cumulant diagram, distinguishing datasets by colour and plotting symbol (the “empirical data” legend) while the theoretical loci keep the “theoretical curve” legend.

multi_lc_diagram(
  datasets_list,
  dataset_names = NULL,
  B = 1000,
  seed = 42,
  xlim = c(-2, 2),
  ylim = c(0, 2),
  alpha_points = 0.35,
  point_size = 2.6
)

| Argument | Description |
|---|---|
| datasets_list | Named list of numeric vectors. |
| dataset_names | Optional character vector of names to use. |
| B | Integer; bootstrap replicates per dataset. |
| seed | Integer random seed. |
| xlim, ylim | Numeric length-2 axis limits. |
| alpha_points | Numeric; point transparency. |
| point_size | Numeric; point size. |

A ggplot object.

data(reliability_datasets)
multi_lc_diagram(reliability_datasets[c("Airplane", "BallBearing", "Yarn")], B = 300)

Convenience wrapper around log_cumulant_diagram providing the
compact plot_lc(data = x, B = 100) interface for quick
diagnostics. plot.lc is kept as an alias for backward compatibility.

plot_lc(data, B = 100, data_name = "Sample", seed = 42, ...)

| Argument | Description |
|---|---|
| data | Numeric vector of positive observations. |
| B | Integer; bootstrap replicates. |
| data_name | Character; label used in the title. |
| seed | Integer random seed. |
| ... | Further arguments passed to |

A ggplot object.

data(reliability_datasets)
plot_lc(reliability_datasets$BallBearing, B = 100)

Monte Carlo study of the power of the three T-squared tests and the AD/CvM
tests against a set of alternative distributions, with optional
size-correction.

power_study(
  n = 100,
  Nsim = 1000,
  eta = 0.05,
  alternatives = names(.ALT_CONFIGS),
  use_bootstrap = FALSE,
  B = NULL,
  seed = 2025,
  verbose = TRUE
)

| Argument | Description |
|---|---|
| n | Integer; sample size. |
| Nsim | Integer; number of Monte Carlo replications. |
| eta | Numeric; nominal significance level. |
| alternatives | Character vector of alternative names to evaluate. |
| use_bootstrap | Logical; use bootstrap calibration. |
| B | Integer; bootstrap replicates. |
| seed | Integer random seed. |
| verbose | Logical; print progress. |

A data.frame of empirical power by test and alternative.

power_study(n = 100, Nsim = 100)

A named list with nine positive-valued reliability datasets analyzed in the paper: Kevlar, Resistors, Tensile, Airplane, BallBearing, Airborne, Failure, Yarn, AirCon.

data(reliability_datasets)

A named list of numeric vectors.

Monte Carlo study of the empirical size of the three T-squared tests
(asymptotic and, optionally, bootstrap) and the AD/CvM tests under a true
null model, across several sample sizes.

size_study(
  sample_sizes = c(30, 50, 100, 200),
  Nsim = 1000,
  eta = 0.05,
  use_bootstrap = FALSE,
  B = NULL,
  seed = 2025,
  verbose = TRUE
)

| Argument | Description |
|---|---|
| sample_sizes | Integer vector of sample sizes. |
| Nsim | Integer; number of Monte Carlo replications. |
| eta | Numeric; nominal significance level. |
| use_bootstrap | Logical; include bootstrap calibration. |
| B | Integer; bootstrap replicates. |
| seed | Integer random seed. |
| verbose | Logical; print progress. |

A data.frame of empirical rejection rates.

size_study(sample_sizes = c(30, 50), Nsim = 100)
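The logic of an empirical-size study is simple enough to sketch in base R: simulate from the null, apply the test, and record how often it rejects at level eta. The example below uses the Shapiro-Wilk test on log-data as a stand-in for a Log-Normal goodness-of-fit test; it does not use the package's statistics, and empirical_size is a hypothetical helper.

```r
# Hypothetical helper: empirical rejection rate under a true Log-Normal null.
# Stand-in test: Shapiro-Wilk applied to log(x), NOT the package's tests.
empirical_size <- function(n, Nsim = 1000, eta = 0.05, seed = 2025) {
  set.seed(seed)
  rejections <- replicate(Nsim, {
    x <- rlnorm(n, meanlog = 0, sdlog = 1)
    shapiro.test(log(x))$p.value < eta
  })
  mean(rejections)
}

sample_sizes <- c(30, 50)
size_table <- data.frame(
  n = sample_sizes,
  size = vapply(sample_sizes, empirical_size, numeric(1), Nsim = 200)
)
size_table
```

A well-calibrated test gives rejection rates near eta, up to Monte Carlo error of roughly sqrt(eta * (1 - eta) / Nsim).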

Convenience wrapper returning the three nested versions T2_23, T2_123, and
T2_123456 of the T-squared statistic for a single fitted model.

T2_all(x, dist, fit = NULL)

| Argument | Description |
|---|---|
| x | Numeric vector of positive observations. |
| dist | Character; null distribution name. |
| fit | Optional precomputed |

A named list with components T2_23, T2_123,
T2_123456, each as returned by T2_one.

set.seed(1); x <- rdist(100, "Weibull", c(2, 1))
T2_all(x, "Weibull")

Computes parametric-bootstrap p-values for the three nested T-squared
statistics. The bootstrap calibrates the ill-conditioned reference
distribution and is the recommended mode of inference in finite samples.

T2_bootstrap(x, dist, B = NULL, fit = NULL, seed = NULL)

| Argument | Description |
|---|---|
| x | Numeric vector of positive observations. |
| dist | Character; null distribution name. |
| B | Integer; number of bootstrap replicates (default chosen adaptively). |
| fit | Optional precomputed |
| seed | Optional integer random seed for reproducibility. |

A list with the observed statistics and the bootstrap p-values
p_boot for the three versions.

set.seed(1); x <- rdist(80, "Weibull", c(2, 1))
T2_bootstrap(x, "Weibull", B = 199, seed = 1)
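The parametric-bootstrap recipe itself is generic: fit the null model, simulate B samples from the fitted model, recompute the statistic on each, and compare with the observed value. The sketch below follows that recipe for a Log-Normal null with a deliberately simple stand-in statistic (scaled squared skewness of the log-data), which is not the package's T-squared; stat_logskew and boot_pvalue are hypothetical names.

```r
# Stand-in statistic (NOT the package's T-squared): scaled squared skewness
# of log(x), which should be near zero under a Log-Normal null.
stat_logskew <- function(x) {
  lx <- log(x)
  n <- length(lx)
  m <- mean(lx)
  k2 <- mean((lx - m)^2)
  k3 <- mean((lx - m)^3)
  n * (k3 / k2^1.5)^2 / 6
}

# Hypothetical helper: parametric-bootstrap p-value under a Log-Normal null.
boot_pvalue <- function(x, B = 199, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- length(x)
  lx <- log(x)
  mu <- mean(lx)
  sdl <- sqrt(mean((lx - mu)^2))   # Log-Normal MLE of sdlog
  t_obs <- stat_logskew(x)
  # Simulate from the fitted null and recompute the statistic each time.
  t_star <- replicate(B, stat_logskew(rlnorm(n, meanlog = mu, sdlog = sdl)))
  list(T_obs = t_obs, p_boot = (1 + sum(t_star >= t_obs)) / (B + 1))
}

set.seed(1)
x <- rlnorm(80, meanlog = 0, sdlog = 0.5)
res_null <- boot_pvalue(x, B = 199, seed = 1)   # null is true
set.seed(3)
y <- rexp(500)
res_alt <- boot_pvalue(y, B = 199, seed = 2)    # null is false
c(null = res_null$p_boot, alt = res_alt$p_boot)
```

With B = 199 the smallest attainable p-value is 1/200, which is why values of B of the form 100k - 1 are convenient.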

Computes the log-cumulant goodness-of-fit statistic for a single
choice of cumulant orders V, with the asymptotic chi-squared
reference using the corrected (full-rank) degrees of freedom.

T2_one(x, dist, V, fit = NULL)

| Argument | Description |
|---|---|
| x | Numeric vector of positive observations. |
| dist | Character; null distribution name. |
| V | Integer vector of cumulant orders (e.g. |
| fit | Optional precomputed |

A list with the statistic T2, degrees of freedom df,
and asymptotic p-value p_asym.

set.seed(1); x <- rdist(100, "Weibull", c(2, 1))
T2_one(x, "Weibull", V = c(2, 3))

Closed-form theoretical log-cumulants
(Mellin cumulants of the second kind) for a given family and parameter
vector, as tabulated in the methodology.

theoretical_lc(dist, theta, order = 6)

| Argument | Description |
|---|---|
| dist | Character; distribution name (see |
| theta | Numeric length-2 parameter vector. |
| order | Integer; highest cumulant order to return (default 6). |

Numeric vector of length order with the log-cumulants.

theoretical_lc("Weibull", c(2, 1))
theoretical_lc("Gamma", c(3, 0.5), order = 4)
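For the Weibull case the closed form is short enough to reproduce with base R. If X is Weibull with shape k and scale lambda, then log X = log(lambda) + log(E)/k with E standard exponential, and the cumulants of log(E) are polygamma values at 1. The sketch below encodes this and compares it with simple moment-based sample estimates; weibull_lc and sample_lc3 are hypothetical helpers, and the package's own tabulated formulas or estimators may be organized differently.

```r
# Hypothetical helper: theoretical Weibull log-cumulants, theta = c(shape, scale).
# kappa_1 = log(scale) + digamma(1) / shape
# kappa_r = psigamma(1, r - 1) / shape^r   for r >= 2
weibull_lc <- function(theta, order = 3) {
  k <- theta[1]
  lambda <- theta[2]
  r <- seq_len(order)
  poly <- vapply(r, function(j) psigamma(1, deriv = j - 1), numeric(1))
  kappa <- poly / k^r
  kappa[1] <- kappa[1] + log(lambda)
  kappa
}

# Hypothetical helper: moment-based sample log-cumulants of orders 1 to 3.
sample_lc3 <- function(x) {
  lx <- log(x)
  m <- mean(lx)
  c(m, mean((lx - m)^2), mean((lx - m)^3))
}

set.seed(1)
x <- rweibull(5000, shape = 2, scale = 1)
rbind(theoretical = weibull_lc(c(2, 1)), sample = sample_lc3(x))
```

Only the first log-cumulant depends on the scale; orders two and above are functions of the shape alone, which is what makes them useful for identifying a family.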

Arranges the log-cumulant, kurtosis-skewness, and coefficient-of-variation diagrams side by side for a single dataset.

three_diagrams(data, data_name = "Dataset", B = NULL, seed = 42)

| Argument | Description |
|---|---|
| data | Numeric vector of positive observations. |
| data_name | Character; label used in the title. |
| B | Integer; bootstrap replicates (default chosen adaptively from |
| seed | Integer random seed. |

A gtable drawn via gridExtra::grid.arrange.

data(reliability_datasets)
three_diagrams(reliability_datasets$Yarn, "Yarn", B = 200)