| Title: | Bootstrap-Calibrated Goodness-of-Fit Test for Simplex Regression |
|---|---|
| Description: | Implements the bootstrap-calibrated local-influence goodness-of-fit test for simplex regression models with constant or varying dispersion, following the local influence approach of Zhu and Zhang (2004) <doi:10.1093/biomet/91.3.579> and the simplex regression model of Barndorff-Nielsen and Jorgensen (1991) <doi:10.1016/0047-259X(91)90008-P>. The test statistic aggregates individual local-influence measures under case-weight perturbation. Because the first-order asymptotic normal calibration is severely liberal in finite samples, a parametric bootstrap calibration is provided that restores accurate size control and delivers high power against omitted covariates, neglected dispersion, and distributional misspecification. Plotting functions reproduce the figures and tables of the companion methodological paper. Computational kernels are implemented in 'C++' via 'Rcpp' and 'RcppArmadillo' for speed, and two real datasets are bundled. |
| Authors: | Raydonal Ospina [aut, cre] (ORCID: <https://orcid.org/0000-0002-9884-9090>) |
| Maintainer: | Raydonal Ospina <[email protected]> |
| License: | GPL-3 |
| Version: | 0.1.0 |
| Built: | 2026-06-19 16:45:38 UTC |
| Source: | https://github.com/cran/simplexgof |
Data from an industrial ammonia oxidation process recorded over 21 days (Brownlee, 1965, p. 45). The response is the proportion of ammonia not converted to nitric acid; three process variables are included.
ammoniaammonia
A data frame with 21 rows and 4 columns:
perdaProportion of ammonia loss (response), in (0, 1).
corr_arAir flow rate (arbitrary units).
temp_aguaCooling water inlet temperature (degrees Celsius).
conc_acidoNitric acid concentration (percent).
The model selected in Espinheira et al. (2026) is:
where = corr_ar, = temp_agua.
The variable conc_acido is included in the dataset for completeness
but was not used in the selected model.
Brownlee, K.A. (1965). Statistical Theory and Methodology in Science and Engineering, 2nd ed. Wiley, New York, p. 45.
Espinheira P.L., Silva F.C., Barros M., Ospina R. (2026). A Bootstrap-Calibrated Local Influence Goodness-of-Fit Procedure for Simplex Regression Models.
data(ammonia) str(ammonia) with(ammonia, plot(corr_ar, perda, xlab = "Air flow", ylab = "Ammonia loss"))data(ammonia) str(ammonia) with(ammonia, plot(corr_ar, perda, xlab = "Air flow", ylab = "Ammonia loss"))
Fits the simplex regression model to the Brownlee (1965) ammonia-oxidation
data, runs the bootstrap test, and optionally produces the
influence index plot and the half-normal envelope plot, reproducing
Section 7.1 and Tables 5–6 of Ospina et al. (2026).
paper_ammonia(B = 1000, seed = 123, plot = TRUE, verbose = TRUE)paper_ammonia(B = 1000, seed = 123, plot = TRUE, verbose = TRUE)
B |
Integer; bootstrap replicates. Default 1000. |
seed |
Integer; random seed for reproducibility. Default 123. |
plot |
Logical; whether to produce the two diagnostic plots.
Default |
verbose |
Logical; whether to print progress. Default |
A list (invisibly) with components:
fitThe "simplexfit" object.
gofThe "simplexgof" object.
diagThe simplex_diag() output.
table_paramsData frame of parameter estimates (Table 5).
table_gofData frame of GoF test results (Table 6).
res <- paper_ammonia(B = 200, seed = 123) # B = 1000 in the paper print(res$table_params) print(res$table_gof)res <- paper_ammonia(B = 200, seed = 123) # B = 1000 in the paper print(res$table_params) print(res$table_gof)
Reproduces Figure 1 of Ospina et al. (2026): QQ-plots and histograms
of the asymptotic statistic against the standard normal, for
three ranges of and two dispersion levels
(). Also returns the table of
characteristic measures (mean, variance, kurtosis, skewness).
paper_fig1(n = 40, R = 1000, sigma2 = c(0.5, 16), seed = 185, plot = TRUE)paper_fig1(n = 40, R = 1000, sigma2 = c(0.5, 16), seed = 185, plot = TRUE)
n |
Sample size. Default 40. |
R |
Number of Monte Carlo replications. Default 1000. |
sigma2 |
Dispersion values to study. Default |
seed |
Random seed for the (fixed) covariate design and the MC loop.
Default 185 (chosen to match the |
plot |
Logical; produce the QQ and histogram panels. Default
|
The true vectors are those of Table 1 of the paper, chosen
so that the fitted means fall in
, and .
Covariates are , generated once and held fixed.
Invisibly, a list with Un (named list of
vectors) and measures (data frame of characteristic measures).
simplex_Un_asymptotic, paper_ammonia
res <- paper_fig1(n = 40, R = 200) # R = 1000 in the paper print(res$measures)res <- paper_fig1(n = 40, R = 200) # R = 1000 in the paper print(res$measures)
Fits the simplex regression model to the PBSC transplant dataset
(Edmonton Hematopoietic Institute), runs the bootstrap test,
and optionally produces diagnostic plots, reproducing Section 7.2 and
Tables 7–8 of Ospina et al. (2026).
paper_pbsc(B = 1000, seed = 456, plot = TRUE, verbose = TRUE)paper_pbsc(B = 1000, seed = 456, plot = TRUE, verbose = TRUE)
B |
Integer; bootstrap replicates. Default 1000. |
seed |
Integer; random seed. Default 456. |
plot |
Logical; whether to produce diagnostic plots. Default |
verbose |
Logical; print progress. Default |
A list (invisibly) with components fit, gof,
diag, table_params, table_gof.
res <- paper_pbsc(B = 200, seed = 456) print(res$table_params)res <- paper_pbsc(B = 200, seed = 456) print(res$table_params)
Data from a study of 242 patients at the Edmonton Hematopoietic Institute - Alberta Health Services (Espinheira et al., 2026). The response is the CD34+ cell recovery rate (ratio of viable CD34+ cells post-thaw to pre-freeze), a continuous proportion in (0, 1).
pbscpbsc
A data frame with 242 rows and 3 columns:
recoveryCD34+ recovery rate (response), in (0, 1).
adj_ageAdjusted patient-age variable.
chemoChemotherapy protocol indicator: 0 = one-day protocol, 1 = three-day protocol.
The model selected in Espinheira et al. (2026) is:
where = chemo and = =
adj_age.
Edmonton Hematopoietic Institute - Alberta Health Services. Used under the data-sharing policy of that institution.
Espinheira P.L., Silva F.C., Barros M., Ospina R. (2026). A Bootstrap-Calibrated Local Influence Goodness-of-Fit Procedure for Simplex Regression Models.
data(pbsc) str(pbsc) hist(pbsc$recovery, breaks = 30, main = "CD34+ Recovery Rate")data(pbsc) str(pbsc) hist(pbsc$recovery, breaks = 30, main = "CD34+ Recovery Rate")
Produces a half-normal plot (Atkinson, 1985) with a simulated envelope of absolute deviance residuals with a bootstrap envelope, used to assess the overall fit of a simplex regression model. This replicates the diagnostic plots in Ospina et al. (2026).
plot_envelope( fit, B = 99, conf = 0.95, col.obs = "#2c7bb6", col.env = "#abd9e9", main = "Half-normal plot with bootstrap envelope", xlab = "Half-normal quantile", ylab = "Absolute deviance residual", ... )plot_envelope( fit, B = 99, conf = 0.95, col.obs = "#2c7bb6", col.env = "#abd9e9", main = "Half-normal plot with bootstrap envelope", xlab = "Half-normal quantile", ylab = "Absolute deviance residual", ... )
fit |
A |
B |
Integer; number of simulations for the envelope. Default 99. |
conf |
Numeric in (0,1); confidence level for the envelope. Default 0.95. |
col.obs |
Colour for the observed residuals. |
col.env |
Colour for the envelope band. |
main, xlab, ylab
|
Plot labels. |
... |
Further arguments to |
Invisibly returns a list with $residuals (observed) and
$envelope (matrix of simulated residuals).
Atkinson A.C. (1985). Plots, Transformations, and Regression. Oxford University Press.
data(ammonia) X <- cbind(1, ammonia$corr_ar, ammonia$temp_agua, ammonia$corr_ar * ammonia$temp_agua) Z <- cbind(1, ammonia$temp_agua, ammonia$corr_ar * ammonia$temp_agua) fit <- simplex_fit(ammonia$perda, X, Z) plot_envelope(fit, B = 99)data(ammonia) X <- cbind(1, ammonia$corr_ar, ammonia$temp_agua, ammonia$corr_ar * ammonia$temp_agua) Z <- cbind(1, ammonia$temp_agua, ammonia$corr_ar * ammonia$temp_agua) fit <- simplex_fit(ammonia$perda, X, Z) plot_envelope(fit, B = 99)
StatisticDisplays the empirical bootstrap distribution of together
with the observed value and the bootstrap critical values at
each significance level, as shown in Ospina et al. (2026).
plot_gof_boot( x, col.hist = "#abd9e9", col.obs = "#d7191c", col.cv = "#2c7bb6", main = NULL, xlab = expression(U[n]^"*"), ylab = "Density", ... )plot_gof_boot( x, col.hist = "#abd9e9", col.obs = "#d7191c", col.cv = "#2c7bb6", main = NULL, xlab = expression(U[n]^"*"), ylab = "Density", ... )
x |
A |
col.hist |
Fill colour for the histogram. |
col.obs |
Colour for the observed |
col.cv |
Colour for the critical-value lines. |
main, xlab, ylab
|
Plot labels. |
... |
Further arguments passed to |
Invisibly returns the "simplexgof" object.
simplex_gof, plot_influence,
plot.simplexgof
data(ammonia) X <- cbind(1, ammonia$corr_ar, ammonia$temp_agua, ammonia$corr_ar * ammonia$temp_agua) Z <- cbind(1, ammonia$temp_agua, ammonia$corr_ar * ammonia$temp_agua) set.seed(123) gof <- simplex_gof(ammonia$perda, X, Z, B = 200, verbose = FALSE) plot_gof_boot(gof)data(ammonia) X <- cbind(1, ammonia$corr_ar, ammonia$temp_agua, ammonia$corr_ar * ammonia$temp_agua) Z <- cbind(1, ammonia$temp_agua, ammonia$corr_ar * ammonia$temp_agua) set.seed(123) gof <- simplex_gof(ammonia$perda, X, Z, B = 200, verbose = FALSE) plot_gof_boot(gof)
Plots the total local-influence measure for each observation
under case-weight perturbation, as used in the companion article
(Ospina et al., 2026, Figure 2). A horizontal reference line at
flags potentially influential observations.
plot_influence( x, threshold = 2, col.bar = "#2c7bb6", col.line = "#d7191c", label = TRUE, main = "Local influence -- case-weight perturbation", xlab = "Observation index", ylab = expression(C[It]), ... )plot_influence( x, threshold = 2, col.bar = "#2c7bb6", col.line = "#d7191c", label = TRUE, main = "Local influence -- case-weight perturbation", xlab = "Observation index", ylab = expression(C[It]), ... )
x |
A |
threshold |
Numeric scalar; multiplier for the mean of |
col.bar |
Colour for the bars. Default |
col.line |
Colour for the reference line. Default |
label |
Logical; whether to label bars above the threshold with their
index. Default |
main, xlab, ylab
|
Plot title and axis labels. |
... |
Additional arguments passed to |
Invisibly returns the numeric vector .
simplex_diag, simplex_fit,
plot.simplexfit
data(ammonia) X <- cbind(1, ammonia$corr_ar, ammonia$temp_agua, ammonia$corr_ar * ammonia$temp_agua) Z <- cbind(1, ammonia$temp_agua, ammonia$corr_ar * ammonia$temp_agua) fit <- simplex_fit(ammonia$perda, X, Z) dg <- simplex_diag(fit) plot_influence(dg)data(ammonia) X <- cbind(1, ammonia$corr_ar, ammonia$temp_agua, ammonia$corr_ar * ammonia$temp_agua) Z <- cbind(1, ammonia$temp_agua, ammonia$corr_ar * ammonia$temp_agua) fit <- simplex_fit(ammonia$perda, X, Z) dg <- simplex_diag(fit) plot_influence(dg)
plot method for objects of class "simplexfit". Produces
influence index plots and/or a half-normal plot with simulated envelope.
## S3 method for class 'simplexfit' plot( x, which = c("influence", "envelope"), ask = length(which) > 1 && dev.interactive(), ... )## S3 method for class 'simplexfit' plot( x, which = c("influence", "envelope"), ask = length(which) > 1 && dev.interactive(), ... )
x |
An object of class |
which |
Character vector indicating which plots to produce:
|
ask |
Logical; if |
... |
Further arguments passed to |
The object x, invisibly.
simplex_fit, plot_influence,
plot_envelope
data(ammonia) X <- cbind(1, ammonia$corr_ar, ammonia$temp_agua, ammonia$corr_ar * ammonia$temp_agua) Z <- cbind(1, ammonia$temp_agua, ammonia$corr_ar * ammonia$temp_agua) fit <- simplex_fit(ammonia$perda, X, Z) plot(fit, which = "influence")data(ammonia) X <- cbind(1, ammonia$corr_ar, ammonia$temp_agua, ammonia$corr_ar * ammonia$temp_agua) Z <- cbind(1, ammonia$temp_agua, ammonia$corr_ar * ammonia$temp_agua) fit <- simplex_fit(ammonia$perda, X, Z) plot(fit, which = "influence")
plot method for objects of class "simplexgof". Produces the
bootstrap distribution of and/or an influence index plot.
## S3 method for class 'simplexgof' plot( x, which = c("boot", "influence"), ask = length(which) > 1 && dev.interactive(), ... )## S3 method for class 'simplexgof' plot( x, which = c("boot", "influence"), ask = length(which) > 1 && dev.interactive(), ... )
x |
An object of class |
which |
Character vector indicating which plots to produce:
|
ask |
Logical; if |
... |
Further arguments passed to |
The object x, invisibly.
simplex_gof, plot_gof_boot,
plot_influence
data(ammonia) X <- cbind(1, ammonia$corr_ar, ammonia$temp_agua, ammonia$corr_ar * ammonia$temp_agua) Z <- cbind(1, ammonia$temp_agua, ammonia$corr_ar * ammonia$temp_agua) set.seed(42) gof <- simplex_gof(ammonia$perda, X, Z, B = 50, verbose = FALSE) plot(gof, which = "boot")data(ammonia) X <- cbind(1, ammonia$corr_ar, ammonia$temp_agua, ammonia$corr_ar * ammonia$temp_agua) Z <- cbind(1, ammonia$temp_agua, ammonia$corr_ar * ammonia$temp_agua) set.seed(42) gof <- simplex_gof(ammonia$perda, X, Z, B = 50, verbose = FALSE) plot(gof, which = "boot")
Generates independent observations from a simplex distribution
using the representation via inverse-Gaussian
and chi-squared random variables (Michael, Schucany & Haas, 1976).
rsimplex(n, mu, sigma2)rsimplex(n, mu, sigma2)
n |
Integer; number of observations. |
mu |
Numeric scalar or vector of means in (0, 1). |
sigma2 |
Numeric scalar or vector of dispersion parameters (> 0). |
Uses the reparametrisation (odds) and
to generate from an inverse-Gaussian
mixture. Identical algorithm to the Ox reference implementation.
Numeric vector of length n with values in (0, 1).
Barndorff-Nielsen O.E., Jorgensen B. (1991). Some parametric models on the simplex. Journal of Multivariate Analysis, 39(1), 106–116.
Michael J.R., Schucany W.R., Haas R.W. (1976). Generating random variates using transformations with multiple roots. The American Statistician, 30(2), 88–90.
set.seed(1) y <- rsimplex(200, mu = 0.3, sigma2 = 2) hist(y, breaks = 20, main = "Simplex(0.3, 2)")set.seed(1) y <- rsimplex(200, mu = 0.3, sigma2 = 2) hist(y, breaks = 20, main = "Simplex(0.3, 2)")
Replicates the Monte Carlo study of Espinheira et al. (2026), computing
empirical rejection rates of the bootstrap test under correct
specification (size) or misspecification (power).
sim_table1( n = 40, beta = c(-3, 2, 1, -1, 0.5), sigma2 = 0.5, R = 5000, B = 1000, alpha = c(0.01, 0.05, 0.1), mu_range = c("low", "mid", "high"), ncores = 1, seed = NULL )sim_table1( n = 40, beta = c(-3, 2, 1, -1, 0.5), sigma2 = 0.5, R = 5000, B = 1000, alpha = c(0.01, 0.05, 0.1), mu_range = c("low", "mid", "high"), ncores = 1, seed = NULL )
n |
Sample size. |
beta |
Numeric vector of mean-model coefficients. |
sigma2 |
Dispersion parameter (constant model). |
R |
Number of Monte Carlo replications. |
B |
Number of bootstrap replicates per replication. |
alpha |
Significance levels. |
mu_range |
One of |
ncores |
Number of parallel workers for the outer MC loop. Default 1. |
seed |
Random seed. Default |
A data frame with columns alpha and rej_rate.
res <- sim_table1(n = 40, beta = c(-3, 2, 1, -1, 0.5), sigma2 = 0.5, R = 100, B = 100, mu_range = "mid", ncores = 1) print(res)res <- sim_table1(n = 40, beta = c(-3, 2, 1, -1, 0.5), sigma2 = 0.5, R = 100, B = 100, mu_range = "mid", ncores = 1) print(res)
Given a fitted simplex regression model, computes all quantities needed
for the goodness-of-fit statistic: the influence
measures, , the J gradient vector, and the individual
terms that estimate the asymptotic variance of .
simplex_diag(fit, J.method = c("analytic", "finitediff"))simplex_diag(fit, J.method = c("analytic", "finitediff"))
fit |
An object of class |
J.method |
Method used to compute the J gradient vector: either
|
The J vector is computed using the analytic closed-form expressions derived in the article (Ox-compatible implementation). These analytic expressions are faster than numerical differentiation and produce the same test decisions as the original Ox reference implementation.
A list with components:
TnThe numerator of : .
UnThe test statistic .
CeiNumeric vector of length n: individual influence values.
J_vecGradient vector of w.r.t. .
k_vecNumeric vector of length n: the terms.
A_star, B_star, A_star_invEstimated matrices.
DeltaPerturbation matrix (k x n).
Hessiana, inv_obs, inv_fisherMatrices from the fit.
Fits a simplex regression model with logit link for the mean and log link for the dispersion parameter using maximum likelihood via BFGS.
simplex_fit( y, X, Z = NULL, start = NULL, control = list(maxit = 500, reltol = 1e-10) )simplex_fit( y, X, Z = NULL, start = NULL, control = list(maxit = 500, reltol = 1e-10) )
y |
Numeric vector of responses in (0, 1). |
X |
Numeric matrix of covariates for the mean sub-model (including intercept column). Dimension n x p. |
Z |
Numeric matrix of covariates for the dispersion sub-model
(including intercept column). Dimension n x q. If |
start |
Optional numeric vector of starting values of length |
control |
A list passed to |
A list of class "simplexfit" with components:
coefficientsNamed numeric vector of MLE estimates (beta, gamma).
loglikLog-likelihood at the MLE.
fitted.muFitted means.
fitted.sigma2Fitted dispersion values.
vcov.fisherVariance-covariance matrix from Fisher information.
seStandard errors.
convergedLogical; whether BFGS converged.
XDesign matrix for mean sub-model.
ZDesign matrix for dispersion sub-model.
yResponse vector.
n, p, qSample size and number of mean/dispersion parameters.
data(ammonia) X <- cbind(1, ammonia$corr_ar, ammonia$temp_agua, ammonia$corr_ar * ammonia$temp_agua) Z <- cbind(1, ammonia$temp_agua, ammonia$corr_ar * ammonia$temp_agua) fit <- simplex_fit(ammonia$perda, X, Z) print(fit)data(ammonia) X <- cbind(1, ammonia$corr_ar, ammonia$temp_agua, ammonia$corr_ar * ammonia$temp_agua) Z <- cbind(1, ammonia$temp_agua, ammonia$corr_ar * ammonia$temp_agua) fit <- simplex_fit(ammonia$perda, X, Z) print(fit)
Performs the local-influence goodness-of-fit test for a simplex
regression model. The null distribution is calibrated via a parametric
bootstrap, which provides accurate size control even in finite samples.
simplex_gof( y, X, Z = NULL, B = 1000, alpha = c(0.01, 0.05, 0.1), ncores = 1, seed = NULL, verbose = TRUE )simplex_gof( y, X, Z = NULL, B = 1000, alpha = c(0.01, 0.05, 0.1), ncores = 1, seed = NULL, verbose = TRUE )
y |
Numeric vector of responses in (0, 1). |
X |
Design matrix for the mean sub-model (n x p, including intercept). |
Z |
Design matrix for the dispersion sub-model (n x q, including
intercept). If |
B |
Integer; number of bootstrap replicates. Default 1000. |
alpha |
Numeric vector of significance levels. Default
|
ncores |
Integer; number of parallel workers. Default 1 (sequential).
Set to |
seed |
Integer random seed for reproducibility. Default |
verbose |
Logical; whether to print progress. Default |
The test statistic is
where is the
total local influence of observation under case-weight perturbation,
and is the sample variance of .
Because the normal approximation is severely liberal for the simplex class
(empirical size 3–7x nominal even at ), critical values from
the parametric bootstrap are preferred. Asymptotic critical
values are also reported for comparison.
An object of class "simplexgof" with components:
fitThe "simplexfit" object for the original data.
diagThe simplex_diag() output for the original data.
UnObserved test statistic.
TnObserved .
Un_bootNumeric vector of B bootstrap values.
resultsData frame summarising decisions at each level.
B, alphaAs input.
Espinheira P.L., Silva F.C., Barros M., Ospina R. (2026). A Bootstrap-Calibrated Local Influence Goodness-of-Fit Procedure for Simplex Regression Models.
Zhu H., Zhang H. (2004). A diagnostic procedure based on local influence. Biometrika, 91(3), 579–589.
simplex_fit, simplex_diag,
rsimplex
data(ammonia) X <- cbind(1, ammonia$corr_ar, ammonia$temp_agua, ammonia$corr_ar * ammonia$temp_agua) Z <- cbind(1, ammonia$temp_agua, ammonia$corr_ar * ammonia$temp_agua) set.seed(42) gof <- simplex_gof(ammonia$perda, X, Z, B = 200) print(gof)data(ammonia) X <- cbind(1, ammonia$corr_ar, ammonia$temp_agua, ammonia$corr_ar * ammonia$temp_agua) Z <- cbind(1, ammonia$temp_agua, ammonia$corr_ar * ammonia$temp_agua) set.seed(42) gof <- simplex_gof(ammonia$perda, X, Z, B = 200) print(gof)
Statistic (Finite-Difference Calibration)Computes the statistic using the finite-difference gradient
, which gives the correct asymptotic variance for the simplex
class. This is the version whose null distribution is studied in the
simulation section of the companion paper (Figure 1).
simplex_Un_asymptotic(y, X, Z = NULL, eps = 1e-04)simplex_Un_asymptotic(y, X, Z = NULL, eps = 1e-04)
y |
Response vector in (0, 1). |
X |
Design matrix for the mean sub-model. |
Z |
Design matrix for the dispersion sub-model (or |
eps |
Finite-difference step. Default |
For the bootstrap test, use simplex_gof instead — the
analytic gradient is bootstrap-invariant and faster.
Scalar (asymptotic calibration), or NA if the
model fails to converge.
set.seed(1) n <- 40 X <- cbind(1, matrix(runif(n * 4), n, 4)) mu <- plogis(drop(X %*% c(2, -0.5, -1.4, 1.25, -2.35))) y <- rsimplex(n, mu, 0.5) Un <- simplex_Un_asymptotic(y, X) Unset.seed(1) n <- 40 X <- cbind(1, matrix(runif(n * 4), n, 4)) mu <- plogis(drop(X %*% c(2, -0.5, -1.4, 1.25, -2.35))) y <- rsimplex(n, mu, 0.5) Un <- simplex_Un_asymptotic(y, X) Un