| Title: | Cumulative Standardized Binomial EWMA for Multiple Stream Processes |
|---|---|
| Description: | Implements the Cumulative Standardized Binomial Exponentially Weighted Moving Average (CSB-EWMA) control chart for monitoring multiple independent streams with binomial outcomes. Provides exact variance calculations, adaptive control limits, post-hoc identification with multiple testing corrections (Bonferroni, Holm, Benjamini-Hochberg), and visualization tools. The method is described in Muritala et al. (2026) <doi:10.48550/arXiv.2601.09968>. |
| Authors: | Faruk Muritala [aut, cre], Austin Brown [aut], Dhrubajyoti Ghosh [aut], Sherry Ni [aut] |
| Maintainer: | Faruk Muritala <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.1 |
| Built: | 2026-06-05 19:55:44 UTC |
| Source: | https://github.com/cran/csbewma |
Applies Bonferroni, Holm, or Benjamini-Hochberg corrections.
apply_multiple_testing(pvals, method = "BH", alpha = 0.05)apply_multiple_testing(pvals, method = "BH", alpha = 0.05)
pvals |
Numeric vector of raw p-values |
method |
Correction method: "BH", "bonferroni", "holm", or "raw" |
alpha |
Significance level (default = 0.05) |
List with adjusted_pvals and flags
pvals <- c(0.001, 0.01, 0.03, 0.10, 0.50) result <- apply_multiple_testing(pvals, method = "BH", alpha = 0.05)pvals <- c(0.001, 0.01, 0.03, 0.10, 0.50) result <- apply_multiple_testing(pvals, method = "BH", alpha = 0.05)
Computes exact binomial p-values for each stream based on successes.
calculate_pvalues(bin_matrix, p0 = 0.5)calculate_pvalues(bin_matrix, p0 = 0.5)
bin_matrix |
Matrix of binary indicators (streams as rows, time as columns) |
p0 |
In-control proportion(s). Either a single number or a vector of length = nrow(bin_matrix) |
Numeric vector of p-values
bin_mat <- matrix(rbinom(500, 1, 0.5), nrow = 10, ncol = 50) pvals <- calculate_pvalues(bin_mat, p0 = 0.5)bin_mat <- matrix(rbinom(500, 1, 0.5), nrow = 10, ncol = 50) pvals <- calculate_pvalues(bin_mat, p0 = 0.5)
Runs the Cumulative Standardized Binomial EWMA control chart on multiple stream data.
csb_ewma( data, lambda, L, p0 = 0.5, max_time = NULL, posthoc_method = "BH", alpha = 0.05, verbose = TRUE )csb_ewma( data, lambda, L, p0 = 0.5, max_time = NULL, posthoc_method = "BH", alpha = 0.05, verbose = TRUE )
data |
A matrix of binary indicators (0/1) with streams as rows and time as columns |
lambda |
Smoothing parameter for EWMA (0 < lambda <= 1) |
L |
Control limit multiplier |
p0 |
In-control proportion(s). Either a single number (same for all streams) or a numeric vector of length equal to number of rows in data. |
max_time |
Maximum time points to monitor (default = NULL uses all) |
posthoc_method |
Method for post-hoc identification (default = "BH") |
alpha |
Significance level for post-hoc (default = 0.05) |
verbose |
If TRUE, print informational messages (default = TRUE) |
A list of class "csb_ewma" containing chart results and flagged streams
{ set.seed(123) bin_data <- matrix(rbinom(10*100, 1, 0.5), nrow = 10, ncol = 100) for(i in 1:3) bin_data[i, ] <- rbinom(100, 1, 0.8) result <- csb_ewma(bin_data, lambda = 0.175, L = 1.375) message(result) plot(result) }{ set.seed(123) bin_data <- matrix(rbinom(10*100, 1, 0.5), nrow = 10, ncol = 100) for(i in 1:3) bin_data[i, ] <- rbinom(100, 1, 0.8) result <- csb_ewma(bin_data, lambda = 0.175, L = 1.375) message(result) plot(result) }
Converts continuous observations to binary indicators based on whether each observation exceeds the in-control median or specified quantile.
dichotomize_data(data, distribution, p0 = 0.5)dichotomize_data(data, distribution, p0 = 0.5)
data |
Numeric vector of continuous observations |
distribution |
Character string specifying the distribution type |
p0 |
In-control proportion (default = 0.5) |
Integer vector of binary indicators (0 or 1)
x <- rnorm(100) binary <- dichotomize_data(x, distribution = "normal", p0 = 0.5)x <- rnorm(100) binary <- dichotomize_data(x, distribution = "normal", p0 = 0.5)
Creates a summary of which streams were flagged.
flagged_streams_summary(flagged_results, verbose = TRUE)flagged_streams_summary(flagged_results, verbose = TRUE)
flagged_results |
Output from identify_ooc() |
verbose |
If TRUE, print informational messages (default = TRUE) |
List with flagged streams and summary table
bin_mat <- matrix(rbinom(10*100, 1, 0.5), nrow = 10, ncol = 100) for(i in 1:3) bin_mat[i, ] <- rbinom(100, 1, 0.8) result <- identify_ooc(bin_mat) summary <- flagged_streams_summary(result) print(summary$flagged_streams)bin_mat <- matrix(rbinom(10*100, 1, 0.5), nrow = 10, ncol = 100) for(i in 1:3) bin_mat[i, ] <- rbinom(100, 1, 0.8) result <- identify_ooc(bin_mat) summary <- flagged_streams_summary(result) print(summary$flagged_streams)
Generates continuous observations from normal, Laplace, uniform, or exponential distributions with optional shift parameter for out-of-control simulation.
generate_continuous_data(distribution, n, shift = 0, p0 = 0.5)generate_continuous_data(distribution, n, shift = 0, p0 = 0.5)
distribution |
Character string: "normal", "laplace", "uniform", or "exponential" |
n |
Number of observations to generate |
shift |
Amount to shift the proportion (default = 0) |
p0 |
In-control proportion (default = 0.5) |
A numeric vector of length n containing the generated data
data <- generate_continuous_data("normal", n = 100, shift = 0.2)data <- generate_continuous_data("normal", n = 100, shift = 0.2)
Identify out-of-control streams Main function for post-hoc identification using multiple testing corrections.
identify_ooc(bin_matrix, alpha = 0.05, method = "BH", p0 = 0.5, verbose = TRUE)identify_ooc(bin_matrix, alpha = 0.05, method = "BH", p0 = 0.5, verbose = TRUE)
bin_matrix |
Matrix of binary indicators (streams as rows, time as columns) |
alpha |
Significance level (default = 0.05) |
method |
Correction method (default = "BH") |
p0 |
In-control proportion(s). Either a single number or a vector of length = nrow(bin_matrix) |
verbose |
If TRUE, print informational messages (default = TRUE) |
Data frame with p-values and flags for each stream
set.seed(123) bin_mat <- matrix(rbinom(10*100, 1, 0.5), nrow = 10, ncol = 100) for(i in 1:3) bin_mat[i, ] <- rbinom(100, 1, 0.8) result <- identify_ooc(bin_mat, p0 = 0.5) print(result[result$flagged, ])set.seed(123) bin_mat <- matrix(rbinom(10*100, 1, 0.5), nrow = 10, ncol = 100) for(i in 1:3) bin_mat[i, ] <- rbinom(100, 1, 0.8) result <- identify_ooc(bin_mat, p0 = 0.5) print(result[result$flagged, ])
Creates a combined plot showing both the CSB-EWMA control chart and the flagged streams bar plot side by side or stacked.
plot_chart_with_flagged(chart_result, flagged_results, layout = "side")plot_chart_with_flagged(chart_result, flagged_results, layout = "side")
chart_result |
csb_ewma object from csb_ewma() function |
flagged_results |
Output from identify_ooc() function |
layout |
Either "side" for side-by-side or "stacked" for vertical |
A combined ggplot object (invisibly) and displays the plot
Creates a professional control chart showing EWMA statistic and limits. This function can be called directly without S3 dispatch.
plot_csb_ewma_direct( result, title = "CSB-EWMA Control Chart", show_signal = TRUE )plot_csb_ewma_direct( result, title = "CSB-EWMA Control Chart", show_signal = TRUE )
result |
csb_ewma object from csb_ewma() or run_csb_ewma() |
title |
Plot title (default = "CSB-EWMA Control Chart") |
show_signal |
Whether to highlight signal point (default = TRUE) |
A ggplot object (invisibly) and displays the plot
Creates a bar plot showing -log10(p-values) for each stream. Flagged streams appear in red, others in gray.
plot_flagged_streams(flagged_results, alpha = 0.05, title = "Stream P-values")plot_flagged_streams(flagged_results, alpha = 0.05, title = "Stream P-values")
flagged_results |
Output data frame from identify_ooc() function |
alpha |
Significance level for reference line (default = 0.05) |
title |
Plot title (default = "Stream P-values") |
A ggplot object (invisibly) and displays the plot
Creates a professional control chart showing EWMA statistic and limits.
## S3 method for class 'csb_ewma' plot(x, title = "CSB-EWMA Control Chart", show_signal = TRUE, ...)## S3 method for class 'csb_ewma' plot(x, title = "CSB-EWMA Control Chart", show_signal = TRUE, ...)
x |
csb_ewma object from csb_ewma() function |
title |
Plot title (default = "CSB-EWMA Control Chart") |
show_signal |
Whether to highlight signal point (default = TRUE) |
... |
Additional arguments passed to ggplot |
A ggplot object (invisibly) and displays the plot
# See csb_ewma() for examples# See csb_ewma() for examples
This function precomputes the exact variance for all time points from 1 to max_t. For efficiency, it computes exact variances up to a convergence threshold (converge_t) and then sets remaining values to 1 (asymptotic). Based on the derivation, variance reaches 99% of asymptotic value by t=227, so converge_t = 500 is safe and efficient.
precompute_variance(lambda, max_t, converge_t = 500)precompute_variance(lambda, max_t, converge_t = 500)
lambda |
Smoothing parameter for EWMA |
max_t |
Maximum time point to precompute variance for |
converge_t |
Time point after which variance is set to 1 (asymptotic) |
A numeric vector of length max_t containing variance at each time point
# Precompute variance for lambda = 0.175, up to t = 1000 var_cache <- precompute_variance(lambda = 0.175, max_t = 1000, converge_t = 500)# Precompute variance for lambda = 0.175, up to t = 1000 var_cache <- precompute_variance(lambda = 0.175, max_t = 1000, converge_t = 500)
Prints a formatted summary of CSB-EWMA chart results.
## S3 method for class 'csb_ewma' print(x, ...)## S3 method for class 'csb_ewma' print(x, ...)
x |
csb_ewma object from csb_ewma() or run_csb_ewma() function |
... |
Additional arguments (not used) |
Invisibly returns the object
Generates random numbers from the Laplace distribution using inverse transform sampling.
rlaplace(n, location = 0, scale = 1)rlaplace(n, location = 0, scale = 1)
n |
Number of observations to generate |
location |
Location parameter (median) of the distribution (default = 0) |
scale |
Scale parameter (spread) of the distribution (default = 1) |
A numeric vector of length n containing Laplace random variables
{ x <- rlaplace(100, location = 0, scale = 1) }{ x <- rlaplace(100, location = 0, scale = 1) }
This function implements the core CSB-EWMA monitoring algorithm exactly as implemented in the original simulation code.
run_csb_ewma(bin_matrix, lambda, L, var_cache, max_time = NULL, p0 = 0.5)run_csb_ewma(bin_matrix, lambda, L, var_cache, max_time = NULL, p0 = 0.5)
bin_matrix |
A matrix of binary indicators (streams as rows, time as columns) |
lambda |
Smoothing parameter for EWMA (0 < lambda <= 1) |
L |
Control limit multiplier |
var_cache |
Precomputed variance vector from precompute_variance() |
max_time |
Maximum time points to monitor (default = NULL uses all) |
p0 |
In-control proportion(s). Either a single number (same for all streams) or a numeric vector of length equal to number of rows in bin_matrix. Example p0 = c(0.5, 0.6, 0.7) # different p0 for stream1, stream2, stream3. |
The algorithm works as follows:
Initialize cumulative sum (cum_sum = 0) and EWMA (r_prev = 0)
For each time point t = 1, 2, ..., max_time:
Get binary vector for current time point
Calculate C_t = sum of binary indicators
Update cumulative sum: cum_sum = cum_sum + C_t
Compute standardized statistic: W_t = (cum_sum - mu0t) / sqrt(tsigma2_0)
Update EWMA: r_t = lambda * W_t + (1 - lambda) * r_prev
Get exact variance from precomputed cache: v_t = var_cachet
Compute control limits: UCL_t = L * sqrt(v_t), LCL_t = -L * sqrt(v_t)
If r_t > UCL_t or r_t < LCL_t, signal at time t and break
Otherwise, update r_prev = r_t and continue
A list of class "csb_ewma" containing chart results
bin_matrix <- matrix(rbinom(10*200, 1, 0.5), nrow = 10, ncol = 200) for(i in 1:3) bin_matrix[i, ] <- rbinom(100, 1, 0.8) var_cache <- precompute_variance(0.175, max_t = 200) result <- run_csb_ewma(bin_matrix, lambda = 0.175, L = 1.375, var_cache) message(paste("Signal at time:", result$signal_time))bin_matrix <- matrix(rbinom(10*200, 1, 0.5), nrow = 10, ncol = 200) for(i in 1:3) bin_matrix[i, ] <- rbinom(100, 1, 0.8) var_cache <- precompute_variance(0.175, max_t = 200) result <- run_csb_ewma(bin_matrix, lambda = 0.175, L = 1.375, var_cache) message(paste("Signal at time:", result$signal_time))
This function implements the exact variance formula derived in Theorem 2. The variance is computed using double summation over the covariance structure of the standardized statistics. For a given smoothing parameter lambda and time point t, this returns Var(r_t) as defined in Equation (19) of the supplementary material.
var_rt_exact_single(lambda, t)var_rt_exact_single(lambda, t)
lambda |
Smoothing parameter for EWMA. Must be between 0 and 1. Typical values range from 0.05 to 0.5. |
t |
Time point (positive integer). The variance is computed for this specific time point. |
The exact variance Var(r_t) at time t. For t = 0, returns 0.
# Compute variance at time 10 for lambda = 0.175 var_rt_exact_single(lambda = 0.175, t = 10) # Compute variance at time 50 for lambda = 0.15 var_rt_exact_single(lambda = 0.15, t = 50)# Compute variance at time 10 for lambda = 0.175 var_rt_exact_single(lambda = 0.175, t = 10) # Compute variance at time 50 for lambda = 0.15 var_rt_exact_single(lambda = 0.15, t = 50)