---
title: "Introduction to simplexgof"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to simplexgof}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6,
  fig.height = 5
)
```

```{r setup}
library(simplexgof)
```

## Overview

**simplexgof** implements a bootstrap-calibrated local-influence goodness-of-fit (GoF) test for simplex regression models with constant or varying dispersion. The package provides:

- `simplex_fit()`: fit a simplex regression model via maximum likelihood, with logit link for the mean and log link for the dispersion.
- `simplex_diag()`: compute local-influence diagnostic quantities (the $T_n$ and $U_n$ statistics, individual influence measures $C_{I_t}$).
- `simplex_gof()`: run the full parametric-bootstrap GoF test.
- Plotting functions to visualise influence diagnostics, half-normal envelopes, and the bootstrap distribution of $U_n$.

This vignette walks through a complete analysis using the `ammonia` dataset bundled with the package.

## The data

The `ammonia` dataset (Brownlee, 1965) has 21 observations on the proportion of ammonia lost during an industrial oxidation process, together with three covariates.

```{r}
data(ammonia)
head(ammonia)
```

The response `perda` is a proportion in $(0, 1)$, making it a natural candidate for simplex regression.

## Fitting a simplex regression model

We model the mean $\mu_t$ with covariates `corr_ar`, `temp_agua`, and their interaction, and allow the dispersion $\sigma^2_t$ to depend on `temp_agua` and the same interaction term.

```{r}
X <- cbind(1, ammonia$corr_ar, ammonia$temp_agua,
           ammonia$corr_ar * ammonia$temp_agua)
Z <- cbind(1, ammonia$temp_agua,
           ammonia$corr_ar * ammonia$temp_agua)
fit <- simplex_fit(ammonia$perda, X, Z)
fit
```

The fitted object has class `"simplexfit"`, with `print`, `coef`, and `fitted` methods.
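
For reference, the model class described above can be written out explicitly. The display below is a sketch based on the standard simplex density with the logit and log links named in the overview. The symbols $\beta$, $\gamma$, $x_t$, and $z_t$ are notation assumed here for illustration (with $x_t^\top$ and $z_t^\top$ the $t$-th rows of `X` and `Z`); the package's internal parameterisation may differ in detail.

$$
f(y_t; \mu_t, \sigma^2_t) = \left[ 2\pi\sigma^2_t \{ y_t (1 - y_t) \}^3 \right]^{-1/2} \exp\left\{ -\frac{d(y_t; \mu_t)}{2\sigma^2_t} \right\}, \qquad d(y; \mu) = \frac{(y - \mu)^2}{y (1 - y)\, \mu^2 (1 - \mu)^2},
$$

$$
\log\frac{\mu_t}{1 - \mu_t} = x_t^\top \beta, \qquad \log \sigma^2_t = z_t^\top \gamma .
$$

Under this notation, constant dispersion corresponds to `Z` being a single column of ones, so that $\sigma^2_t = \exp(\gamma_0)$ for every $t$.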

```{r}
coef(fit)
```

## Influence diagnostics

`simplex_diag()` computes the case-weight local-influence measures $C_{I_t}$ and the test statistics $T_n$ and $U_n$ that aggregate them.

```{r}
dg <- simplex_diag(fit)
dg$Tn
dg$Un
```

These quantities can be visualised with `plot_influence()`, which produces an index plot of the individual influence values $C_{I_t}$:

```{r, fig.alt = "Influence index plot for the ammonia model"}
plot_influence(dg)
```

## The bootstrap goodness-of-fit test

Because the first-order asymptotic normal calibration of $U_n$ is known to be liberal in small samples, `simplex_gof()` provides a parametric bootstrap calibration. With `B = 50` replicates (for speed in this vignette; use a larger `B`, e.g. 1000, in practice):

```{r}
set.seed(42)
gof <- simplex_gof(ammonia$perda, X, Z, B = 50, alpha = 0.01,
                   verbose = FALSE)
gof
```

The bootstrap distribution of $U_n$ under $H_0$ can be visualised with `plot_gof_boot()`:

```{r, fig.alt = "Bootstrap distribution of Un for the ammonia model"}
plot_gof_boot(gof)
```

## Half-normal plot with simulated envelope

`plot_envelope()` produces a half-normal plot of the influence measures with a simulated envelope, useful for spotting individual observations that drive the lack of fit:

```{r, eval = FALSE}
plot_envelope(fit, B = 99)
```

## Convenience `plot` methods

Both `"simplexfit"` and `"simplexgof"` objects have `plot()` methods that wrap the functions above:

```{r, eval = FALSE}
plot(fit, which = "influence")
plot(gof, which = "boot")
```

## Next steps

For full reproductions of the figures and tables in the companion methodological paper (Ospina, Espinheira, Silva and Barros, 2026), see the *"Paper: ammonia application"* and *"Paper: PBSC application"* articles.