| Title: | Optimal Transport Based Testing in Factorial Design |
|---|---|
| Description: | Perform optimal transport based tests in factorial designs as introduced in Groppe et al. (2025) <doi:10.48550/arXiv.2509.13970> via the FDOTT() function. These tests are inspired by ANOVA and its nonparametric counterparts. They allow for testing linear relationships in factorial designs between finitely supported probability measures on a metric space. Such relationships include equality of all measures (no treatment effect), interaction effects between a number of factors, as well as main and simple factor effects. |
| Authors: | Michel Groppe [aut, cre], Linus Niemöller [aut] |
| Maintainer: | Michel Groppe <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.1.0 |
| Built: | 2026-05-29 09:50:06 UTC |
| Source: | https://github.com/cran/FDOTT |
-Form.Compute cost matrices of -form.
cost_matrix_lp(x, y = NULL, p = 2, q = 1)cost_matrix_lp(x, y = NULL, p = 2, q = 1)
x |
matrix of size |
y |
matrix of size |
p |
number |
q |
number |
A matrix with entry at being equal to
For p = Inf, this is to be understood as the maximum norm to the power of .
n <- 3 m <- 4 d <- 5 x <- runif(n * d) |> matrix(n, d) y <- runif(m * d) |> matrix(m, d) costm <- cost_matrix_lp(x, y) print(costm)n <- 3 m <- 4 d <- 5 x <- runif(n * d) |> matrix(n, d) y <- runif(m * d) |> matrix(m, d) costm <- cost_matrix_lp(x, y) print(costm)
Perform FDOTT, an optimal transport (OT) based test in factorial designs, to test linear relationships between probability vectors, based on samples from them.
FDOTT( samples, costm, H0 = "*", fac.names = NULL, method = c("plug-in", "bootstrap-deriv", "bootstrap-m", "permutation"), num.sim = 1000, null.mu = NULL, m.p = 0.5, is.metric = is_metric_cost_mat(costm, tol.ti = Inf), verbose = FALSE )FDOTT( samples, costm, H0 = "*", fac.names = NULL, method = c("plug-in", "bootstrap-deriv", "bootstrap-m", "permutation"), num.sim = 1000, null.mu = NULL, m.p = 0.5, is.metric = is_metric_cost_mat(costm, tol.ti = Inf), verbose = FALSE )
samples |
nested list of depth |
costm |
semi-metric cost matrix |
H0 |
null hypothesis, see details. |
fac.names |
names of the |
method |
the method to use to simulate from the null distribution, see details. |
num.sim |
number of samples to draw from the limiting null distribution. |
null.mu |
probability vectors |
m.p |
exponent |
is.metric |
value indicating whether |
verbose |
logical value indicating whether additional information should be printed. |
Denote with the matrix (row-wise) of the probability vectors (in lexicographical order of the factor combinations)
that underlie samples. FDOTT deals with null hypotheses of the form
where is a suitable matrix with row sums all equal to . The FDOTT statistic is defined as
where and are scaling factors, is the -th row-vector of
and the extended OT functional, see ot_cost_sgn.
The test is based on the asymptotic distribution of under under the null, for more details see Groppe et al. (2025).
The form of allows for testing hypotheses like interaction effects in classical ANOVA, obtained by formally substituting means by measures.
The following values are allowed for H0:
H0 = "*" (the default). Test all interaction (including main effects) of the factors. A specific interaction or main effect can be tested by
including the corresponding indices of the factors in a list, e.g., H0 = list("*", c(1, 3)) corresponds to the interaction effect between factor 1 and 3.
Note that in a one-way layout, H0 = "*" reduces to H0 = "=".
H0 = "|". Test all simple factor effects. A specific simple factor effect can be tested by by including the
corresponding indices of the factors in a list, e.g., H0 = list("|", c(1, 3)) corresponds to the simple factor effect of factor 1 and 3 within the
other remaining factors.
H0 = "=". Test for treatment effect, i.e., whether all underlying probability vectors are the same. Note that each pairwise comparison can be tested
simultaneously via FDOTT_HSD.
H0 = L. Test for the directly supplied matrix. The name of the tested effect (useful for printing)
and the scaling (by default nrow(L)) can be supplied by setting the "effect" and "scaling" attribute of L, respectively.
H0 = list(...). Test a combined null hypothesis. Each element of the list represents a null hypothesis and can be given by one of the options above.
This is useful in combination with FDOTT_HSD, which allows to test all the given null hypotheses simultaneously.
To simulate from the limiting null distribution, there are four different methods:
"plug-in": uses the limiting distribution where is substituted by its empirical version (or null.mu, when specified).
"bootstrap-deriv": uses the so-called derivative bootstrap.
"bootstrap-m": uses -out-of- bootstrap with .
"permutation": uses a permutation approach, only works for H0 = "=".
These simulations can be done in parallel via future::plan and the progress can be shown with progressr::with_progress.
A FDOTT object containing:
fac.lvls |
vector of levels of the factors |
mu |
matrix, empirical version of that is based on samples |
n |
vector of sample sizes |
L |
matrix for the null hypothesis |
p.value |
the -value |
statistic |
the value of the test statistic |
null.samples |
samples drawn from the null distribution |
M. Groppe, L. Niemöller, S. Hundrieser, D. Ventzke, A. Blob, S. Köster and A. Munk (2025). Optimal Transport Based Testing in Factorial Design. arXiv preprint. doi:10.48550/arXiv.2509.13970.
# enable txt progressbar progressr::handlers("txtprogressbar") # enable parallel computation if (requireNamespace("future")) { future::plan(future::multisession) } # use higher number to better approximate null distribution and get more accurate p-value num.sim <- 10 ### one-way layout N <- 2 costm <- cost_matrix_lp(1:N) K <- 3 n <- c(300, 360, 200) # underlying probability vectors, all measures are equal mu <- matrix(1 / N, K, N, TRUE) set.seed(123) samples <- tab_sample(n, mu) # show progress progressr::with_progress({ # default in one-way layout is H0 = "=" res <- FDOTT(samples, costm, num.sim = num.sim) }) print(res) # measures are not equal mu[2, ] <- c(0.1, 0.9) set.seed(123) samples <- tab_sample(n, mu) res2 <- FDOTT(samples, costm, num.sim = num.sim) print(res2) # find out which measures are not equal via HSD res3 <- FDOTT_HSD(res2) print(res3) ### two-way layout K1 <- K2 <- 2 N <- 3 costm <- cost_matrix_lp(1:N) n <- list(list(300, 360), list(280, 200)) # underlying probability vectors (two-way layout) # no interaction effect, only factor 2 has main effect mu <- list( list(c(0, 0.5, 0.5), c(0.25, 0.25, 0.5)), list(c(0, 0.5, 0.5), c(0.25, 0.25, 0.5)) ) # test interaction effect and main effects, equivalent to H0 <- "*" H0 <- list(list("*", 1:2), list("*", 1), list("*", 2)) set.seed(123) samples <- tab_sample(n, mu) res <- FDOTT(samples, costm, H0 = H0, num.sim = num.sim) print(res) # find out exactly which effect gets rejected via HSD res1 <- FDOTT_HSD(res) print(res1) # now with interaction effect mu[[1]][[1]] <- c(0.3, 0.3, 0.4) # only test for interaction effect H0 <- list("*", 1:2) set.seed(123) samples <- tab_sample(n, mu) res2 <- FDOTT(samples, costm, H0 = H0, num.sim = num.sim, method = "bootstrap-deriv") print(res2) ### custom effect K <- 2 N <- 2 costm <- cost_matrix_lp(1:N) num.sim <- 100 # null hypothesis H0: mu^1 - 0.5 * mu^2 - 0.5 * mu^3 = 0 L <- matrix(c(1, -0.5, -0.5), 1, 3) # give custom name attr(L, "effect") <- "mu^1 = 0.5 * (mu^2 + mu^3)" # underlying probability vectors mu <- matrix(c(0.4, 0.6, 0.6, 0.4, 0.2, 0.8), 3, 2, TRUE) print(L %*% mu) n <- c(250, 280, 230) # test L, as well as mu^1 = mu^2 = mu^3 H0 <- list(L, "=") set.seed(123) samples <- tab_sample(n, mu) res <- FDOTT(samples, costm, H0 = H0, num.sim = num.sim) print(res) # find out which effect is responsible for rejection res2 <- FDOTT_HSD(res) print(res2) # L %*% mu = 0 not satisfied anymore mu[2, ] <- c(1, 0) print(L %*% mu) # only test for L %*% mu = 0 H0 <- L set.seed(123) samples <- tab_sample(n, mu) res3 <- FDOTT(samples, costm, H0 = H0, num.sim = num.sim) print(res3)# enable txt progressbar progressr::handlers("txtprogressbar") # enable parallel computation if (requireNamespace("future")) { future::plan(future::multisession) } # use higher number to better approximate null distribution and get more accurate p-value num.sim <- 10 ### one-way layout N <- 2 costm <- cost_matrix_lp(1:N) K <- 3 n <- c(300, 360, 200) # underlying probability vectors, all measures are equal mu <- matrix(1 / N, K, N, TRUE) set.seed(123) samples <- tab_sample(n, mu) # show progress progressr::with_progress({ # default in one-way layout is H0 = "=" res <- FDOTT(samples, costm, num.sim = num.sim) }) print(res) # measures are not equal mu[2, ] <- c(0.1, 0.9) set.seed(123) samples <- tab_sample(n, mu) res2 <- FDOTT(samples, costm, num.sim = num.sim) print(res2) # find out which measures are not equal via HSD res3 <- FDOTT_HSD(res2) print(res3) ### two-way layout K1 <- K2 <- 2 N <- 3 costm <- cost_matrix_lp(1:N) n <- list(list(300, 360), list(280, 200)) # underlying probability vectors (two-way layout) # no interaction effect, only factor 2 has main effect mu <- list( list(c(0, 0.5, 0.5), c(0.25, 0.25, 0.5)), list(c(0, 0.5, 0.5), c(0.25, 0.25, 0.5)) ) # test interaction effect and main effects, equivalent to H0 <- "*" H0 <- list(list("*", 1:2), list("*", 1), list("*", 2)) set.seed(123) samples <- tab_sample(n, mu) res <- FDOTT(samples, costm, H0 = H0, num.sim = num.sim) print(res) # find out exactly which effect gets rejected via HSD res1 <- FDOTT_HSD(res) print(res1) # now with interaction effect mu[[1]][[1]] <- c(0.3, 0.3, 0.4) # only test for interaction effect H0 <- list("*", 1:2) set.seed(123) samples <- tab_sample(n, mu) res2 <- FDOTT(samples, costm, H0 = H0, num.sim = num.sim, method = "bootstrap-deriv") print(res2) ### custom effect K <- 2 N <- 2 costm <- cost_matrix_lp(1:N) num.sim <- 100 # null hypothesis H0: mu^1 - 0.5 * mu^2 - 0.5 * mu^3 = 0 L <- matrix(c(1, -0.5, -0.5), 1, 3) # give custom name attr(L, "effect") <- "mu^1 = 0.5 * (mu^2 + mu^3)" # underlying probability vectors mu <- matrix(c(0.4, 0.6, 0.6, 0.4, 0.2, 0.8), 3, 2, TRUE) print(L %*% mu) n <- c(250, 280, 230) # test L, as well as mu^1 = mu^2 = mu^3 H0 <- list(L, "=") set.seed(123) samples <- tab_sample(n, mu) res <- FDOTT(samples, costm, H0 = H0, num.sim = num.sim) print(res) # find out which effect is responsible for rejection res2 <- FDOTT_HSD(res) print(res2) # L %*% mu = 0 not satisfied anymore mu[2, ] <- c(1, 0) print(L %*% mu) # only test for L %*% mu = 0 H0 <- L set.seed(123) samples <- tab_sample(n, mu) res3 <- FDOTT(samples, costm, H0 = H0, num.sim = num.sim) print(res3)
Perform an optimal transport based HSD test to deal with multiple comparisons simultaneously.
FDOTT_HSD(test, weights = NULL, group.sizes = TRUE)FDOTT_HSD(test, weights = NULL, group.sizes = TRUE)
test |
a |
weights |
weight vector of length |
group.sizes |
integer vector summing to the number of comparisons |
Let be the null hypothesis of test. In the case of rejection, it is of interest to
find out exactly which row-equations are not satisfied with statistical significance. To this end, can be split into
a number of sub-hypotheses which are tested simultaneously via an approach inspired by Tukey's HSD test, see Groppe et al. (2025) for more details.
A FDOTT_HSD object containing:
p.value |
the -values |
statistic |
the values of the test statistics |
null.samples |
samples drawn from the null distribution |
M. Groppe, L. Niemöller, S. Hundrieser, D. Ventzke, A. Blob, S. Köster and A. Munk (2025). Optimal Transport Based Testing in Factorial Design. arXiv preprint. doi:10.48550/arXiv.2509.13970.
# see FDOTT for more examples # enable parallel computation if (requireNamespace("future")) { future::plan(future::multisession) } K <- 3 N <- 2 costm <- cost_matrix_lp(1:N) # use higher number to better approximate null distribution and get more accurate p-value num.sim <- 10 # underlying probability vectors (one-way layout) # only mu^1 and mu^3 are equal mu <- matrix(0.5, K, N, TRUE) mu[2, ] <- c(0.2, 0.8) n <- c(300, 360, 200) set.seed(123) samples <- tab_sample(n, mu) res <- FDOTT(samples, costm, num.sim = num.sim) |> FDOTT_HSD() # significant differences for mu^1 = mu^2 and mu^2 = mu^3 print(res)# see FDOTT for more examples # enable parallel computation if (requireNamespace("future")) { future::plan(future::multisession) } K <- 3 N <- 2 costm <- cost_matrix_lp(1:N) # use higher number to better approximate null distribution and get more accurate p-value num.sim <- 10 # underlying probability vectors (one-way layout) # only mu^1 and mu^3 are equal mu <- matrix(0.5, K, N, TRUE) mu[2, ] <- c(0.2, 0.8) n <- c(300, 360, 200) set.seed(123) samples <- tab_sample(n, mu) res <- FDOTT(samples, costm, num.sim = num.sim) |> FDOTT_HSD() # significant differences for mu^1 = mu^2 and mu^2 = mu^3 print(res)
Check if a cost matrix satisfies symmetry, positive definiteness and the triangle inequality.
is_metric_cost_mat(x, tol.sym = 1e-08, tol.pd = 0, tol.ti = 1e-08)is_metric_cost_mat(x, tol.sym = 1e-08, tol.pd = 0, tol.ti = 1e-08)
x |
numeric square matrix. |
tol.sym |
tolerance used to check symmetry. |
tol.pd |
tolerance used to check positive definiteness. |
tol.ti |
tolerance used to check the triangle inequality. |
The following three properties of a square matrix
are checked:
symmetry; if ,
positive definiteness; if and for all ,
triangle inequality; if .
If symmetry and positive definiteness are satisfied, then is called a semi-metric
cost matrix. If additionally also the triangle inequality holds, then is a
metric cost matrix.
A list containing logical entries metric, semi.metric, sym, pos.def
and tri.ineq that indicate whether the corresponding property is satisfied.
x <- cost_matrix_lp(1:5) res <- is_metric_cost_mat(x) res2 <- is_metric_cost_mat(x^2) # x is a metric cost matrix print(res$metric) # x^2 is only a semi-metric cost matrix, # because the triangle inequality is not satisfied print(res2$semi.metric) print(res2$tri.ineq)x <- cost_matrix_lp(1:5) res <- is_metric_cost_mat(x) res2 <- is_metric_cost_mat(x^2) # x is a metric cost matrix print(res$metric) # x^2 is only a semi-metric cost matrix, # because the triangle inequality is not satisfied print(res2$semi.metric) print(res2$tri.ineq)
Compute the optimal transport (OT) barycenter of multiple probability vectors via linear programming.
ot_barycenter( mu, costm, w = NULL, solver = ot_test_lp_solver(), constr_mat = NULL )ot_barycenter( mu, costm, w = NULL, solver = ot_test_lp_solver(), constr_mat = NULL )
mu |
matrix (row-wise) or list containing |
costm |
cost matrix |
w |
weight vector |
solver |
the LP solver to use, see |
constr_mat |
the constraint matrix for the underlying LP. |
The OT barycenter is defined as the minimizer of the cost functional,
where the minimum is taken over all probability vectors .
The OT barycenter is solved via linear programming (LP) and the underlying solver can be controlled
via the parameter solver.
A list containing the entries cost and barycenter.
K <- 3 N <- 2 costm <- cost_matrix_lp(1:N) w <- rep(1 / K, K) # all measures are equal mu <- matrix(1 / N, K, N, TRUE) # to run this, a LP solver must be available for ROI (ROI.plugin.glpk by default) if (requireNamespace("ROI.plugin.glpk")) { solver <- ot_test_lp_solver("glpk") print(ot_barycenter(mu, costm, w = w, solver = solver)) } # not all measures are equal mu[2, ] <- 1:N / sum(1:N) if (requireNamespace("ROI.plugin.glpk")) { solver <- ot_test_lp_solver("glpk") print(ot_barycenter(mu, costm, w = w, solver = solver)) }K <- 3 N <- 2 costm <- cost_matrix_lp(1:N) w <- rep(1 / K, K) # all measures are equal mu <- matrix(1 / N, K, N, TRUE) # to run this, a LP solver must be available for ROI (ROI.plugin.glpk by default) if (requireNamespace("ROI.plugin.glpk")) { solver <- ot_test_lp_solver("glpk") print(ot_barycenter(mu, costm, w = w, solver = solver)) } # not all measures are equal mu[2, ] <- 1:N / sum(1:N) if (requireNamespace("ROI.plugin.glpk")) { solver <- ot_test_lp_solver("glpk") print(ot_barycenter(mu, costm, w = w, solver = solver)) }
Perform optimal transport (OT) barycenter based tests for equality of probability vectors in a one-way layout.
ot_barycenter_test( samples, costm, null.mu = NULL, w = NULL, num.sim = 1000, solver = ot_test_lp_solver(), is.metric = is_metric_cost_mat(costm, tol.ti = Inf), verbose = FALSE )ot_barycenter_test( samples, costm, null.mu = NULL, w = NULL, num.sim = 1000, solver = ot_test_lp_solver(), is.metric = is_metric_cost_mat(costm, tol.ti = Inf), verbose = FALSE )
samples |
matrix (row-wise) or nested list containing |
costm |
semi-metric cost matrix |
null.mu |
probability measures |
w |
weight vector |
num.sim |
number of samples to draw from the limiting null distribution. |
solver |
the LP solver to use, see |
is.metric |
value indicating whether |
verbose |
logical value indicating whether additional information should be printed. |
Denote with the probability measures that underlie the samples contained in samples. To test for
the one-way null hypothesis , this test employs the OT barycenter statistic
which is defined as
where is a scaling factor and is the OT barycenter functional, see ot_barycenter.
The test is based on the asymptotic distribution of under under the null, for more details see the reference.
These simulations can be done in parallel via future::plan and the progress can be shown with progressr::with_progress.
Especially for large and , simulating a sufficient number of samples from the limiting null distribution might take a while.
Consider using FDOTT instead.
An object of class "ot_barycenter_test" containing:
mu |
empirical version of that is based on samples |
n |
the sample sizes |
p.value |
the -value |
statistic |
the value of the test statistic |
null.samples |
samples drawn from the null distribution |
TODO
FDOTT with H0 = "=".
# enable txt progressbar progressr::handlers("txtprogressbar") # enable parallel computation if (requireNamespace("future")) { future::plan(future::multisession) } K <- 3 N <- 2 costm <- cost_matrix_lp(1:N) # use higher number to better approximate null distribution and get more accurate p-value num.sim <- 10 n <- c(300, 360, 200) # underlying probability vectors mu <- matrix(1 / N, K, N, TRUE) # to run this, a LP solver must be available for ROI (ROI.plugin.glpk by default) if (requireNamespace("ROI.plugin.glpk")) { solver <- ot_test_lp_solver("glpk") set.seed(123) samples <- tab_sample(n, mu) progressr::with_progress({ res <- ot_barycenter_test(samples, costm, num.sim = num.sim, solver = solver) }) print(res) } # measures are not equal anymore mu[2, ] <- 1:N / sum(1:N) if (requireNamespace("ROI.plugin.glpk")) { solver <- ot_test_lp_solver("glpk") set.seed(123) samples <- tab_sample(n, mu) progressr::with_progress({ res2 <- ot_barycenter_test(samples, costm, num.sim = num.sim, solver = solver) }) print(res2) }# enable txt progressbar progressr::handlers("txtprogressbar") # enable parallel computation if (requireNamespace("future")) { future::plan(future::multisession) } K <- 3 N <- 2 costm <- cost_matrix_lp(1:N) # use higher number to better approximate null distribution and get more accurate p-value num.sim <- 10 n <- c(300, 360, 200) # underlying probability vectors mu <- matrix(1 / N, K, N, TRUE) # to run this, a LP solver must be available for ROI (ROI.plugin.glpk by default) if (requireNamespace("ROI.plugin.glpk")) { solver <- ot_test_lp_solver("glpk") set.seed(123) samples <- tab_sample(n, mu) progressr::with_progress({ res <- ot_barycenter_test(samples, costm, num.sim = num.sim, solver = solver) }) print(res) } # measures are not equal anymore mu[2, ] <- 1:N / sum(1:N) if (requireNamespace("ROI.plugin.glpk")) { solver <- ot_test_lp_solver("glpk") set.seed(123) samples <- tab_sample(n, mu) progressr::with_progress({ res2 <- ot_barycenter_test(samples, costm, num.sim = num.sim, solver = solver) }) print(res2) }
Compute the optimal transport (OT) cost between signed measures that have the same total mass.
ot_cost_sgn(mu, nu, costm, mode = c("all", "diag"))ot_cost_sgn(mu, nu, costm, mode = c("all", "diag"))
mu |
matrix (row-wise) or list containing |
nu |
matrix (row-wise) or list containing |
costm |
cost matrix |
mode |
controls which of the pairwise OT costs are computed. |
The extended OT functional for vectors with is defined as
where and denote the positive and negative part of , and
is the standard OT functional. To compute the standard OT, the function transport::transport is used.
The values may be computed in parallel via future::plan.
The OT cost between the vectors in mu and nu.
For mode = "all" the whole matrix of size is returned. If mu or nu is a vector, then this matrix is also returned as a vector.
nu = NULL means that nu = mu and only the lower triangular part is actually computed and then reflected.
If mode = "diag", then only the diagonal is returned (requiring ).
# enable parallel computation if (requireNamespace("future")) { future::plan(future::multisession) } # generate random signed measures with total mass 0 (row-wise) rsum0 <- \(K, N) { x <- runif(K * N) |> matrix(K, N) x <- sweep(x, 1, rowSums(x) / N, "-") x[, 1] <- x[, 1] - rowSums(x) x } K1 <- 3 K2 <- 2 N <- 4 costm <- cost_matrix_lp(1:N) set.seed(123) mu <- rsum0(K1, N) nu <- rsum0(K2, N) print(ot_cost_sgn(mu[2, ], nu[2, ], costm)) # mode = "diag" requires K1 = K2 print(ot_cost_sgn(mu[1:2, ], nu, costm, mode = "diag")) print(ot_cost_sgn(mu, nu, costm)) # only works properly if costm is semi-metric print(ot_cost_sgn(mu, NULL, costm)) # but it requires less computations than print(ot_cost_sgn(mu, mu, costm))# enable parallel computation if (requireNamespace("future")) { future::plan(future::multisession) } # generate random signed measures with total mass 0 (row-wise) rsum0 <- \(K, N) { x <- runif(K * N) |> matrix(K, N) x <- sweep(x, 1, rowSums(x) / N, "-") x[, 1] <- x[, 1] - rowSums(x) x } K1 <- 3 K2 <- 2 N <- 4 costm <- cost_matrix_lp(1:N) set.seed(123) mu <- rsum0(K1, N) nu <- rsum0(K2, N) print(ot_cost_sgn(mu[2, ], nu[2, ], costm)) # mode = "diag" requires K1 = K2 print(ot_cost_sgn(mu[1:2, ], nu, costm, mode = "diag")) print(ot_cost_sgn(mu, nu, costm)) # only works properly if costm is semi-metric print(ot_cost_sgn(mu, NULL, costm)) # but it requires less computations than print(ot_cost_sgn(mu, mu, costm))
FDOTT linear programming solverCreate an object that controls the linear programming (LP) solver to use.
ot_test_lp_solver(name = NULL, ...)ot_test_lp_solver(name = NULL, ...)
name |
name of the LP solver. |
... |
optional control arguments passed to the corresponding LP solver. |
name can be any LP solver that is compatible with the ROI package infrastructure.
In particular, the corresponding plugin package ROI.plugin.name must be installed.
The default value corresponding to name = NULL can be set via options(FDOTT.lp_solver = name)
(the default is "glpk").
A ot_test_lp_solver_control object containing:
name |
the name of the LP solver |
control |
list of control arguments passed to the LP solver |
## Not run: # glpk is already the default options(FDOTT.lp_solver = "glpk") ## End(Not run) # plugin needs to be installed, else we get error if (requireNamespace("ROI.plugin.glpk")) { # add control parameter (specific to glpk) sol <- ot_test_lp_solver("glpk", verbose = TRUE) print(sol) } else { cat("'ROI.plugin.glpk' needs to be installed!\n") }## Not run: # glpk is already the default options(FDOTT.lp_solver = "glpk") ## End(Not run) # plugin needs to be installed, else we get error if (requireNamespace("ROI.plugin.glpk")) { # add control parameter (specific to glpk) sol <- ot_test_lp_solver("glpk", verbose = TRUE) print(sol) } else { cat("'ROI.plugin.glpk' needs to be installed!\n") }
FDOTT
Perform simulations for the test statistic used in FDOTT.
simulate_finite_FDOTT(mu, costm, n, H0 = "*", num.sim = 1000) simulate_limit_FDOTT_null( mu, costm, n = NULL, delta = NULL, H0 = "*", num.sim = 1000, method = c("plug-in", "bootstrap-deriv", "bootstrap-m"), m.p = 0.5, mean = NULL ) simulate_limit_FDOTT_alt(mu, costm, delta, H0 = "*", num.sim = 1000)simulate_finite_FDOTT(mu, costm, n, H0 = "*", num.sim = 1000) simulate_limit_FDOTT_null( mu, costm, n = NULL, delta = NULL, H0 = "*", num.sim = 1000, method = c("plug-in", "bootstrap-deriv", "bootstrap-m"), m.p = 0.5, mean = NULL ) simulate_limit_FDOTT_alt(mu, costm, delta, H0 = "*", num.sim = 1000)
mu |
matrix (row-wise) or nested list containing |
costm |
semi-metric cost matrix |
n |
samples sizes. Must be of the same structure as |
H0 |
null hypothesis, see |
num.sim |
number of samples to draw from the limiting null distribution. |
delta |
asymptotic sample size ratios. Must be of the same structure as |
method |
the method to use to simulate from the null distribution. |
m.p |
exponent |
mean |
mean of the Gaussians appearing in the limiting distribution. Must be of the same structure as |
See FDOTT for the definition of the test statistic and more details.
simulate_finite_FDOTT simulates from the finite sample distribution.
simulate_limit_FDOTT_null and simulate_limit_FDOTT_alt simulate from the limiting distribution under the null or alternative, respectively.
All these simulations can be done in parallel via future::plan and the progress can be shown with progressr::with_progress.
A vector containing the simulated samples.
# enable txt progressbar progressr::handlers("txtprogressbar") # enable parallel computation if (requireNamespace("future")) { future::plan(future::multisession) } K <- 3 N <- 2 costm <- cost_matrix_lp(1:N) # use higher values for better approximation num.sim <- 10 n <- rep(300, K) delta <- rep(1 / K, K) # under one-way null mu <- matrix(1 / N, K, N, TRUE) set.seed(123) lhs <- simulate_finite_FDOTT(mu, costm, n, num.sim = num.sim) rhs <- simulate_limit_FDOTT_null(mu, costm, delta = delta, num.sim = num.sim) h1 <- density(lhs) h2 <- density(rhs) plot(h1$x, h1$y, xlim = range(h1$x, h2$x), ylim = range(h1$y, h2$y), type = "l", col = "red", xlab = "x", ylab = "density", main = "KDEs") lines(h2$x, h2$y, col = "blue") legend("topright", c("Finite", "Limit"), col = c("red", "blue"), pch = 15) # under one-way alternative mu[2, ] <- 1:N / sum(1:N) set.seed(123) lhs <- simulate_finite_FDOTT(mu, costm, n, num.sim = num.sim) rhs <- simulate_limit_FDOTT_alt(mu, costm, delta, num.sim = num.sim) h1 <- density(lhs) h2 <- density(rhs) plot(h1$x, h1$y, xlim = range(h1$x, h2$x), ylim = range(h1$y, h2$y), type = "l", col = "red", xlab = "x", ylab = "density", main = "KDEs") lines(h2$x, h2$y, col = "blue") legend("topright", c("Finite", "Limit"), col = c("red", "blue"), pch = 15)# enable txt progressbar progressr::handlers("txtprogressbar") # enable parallel computation if (requireNamespace("future")) { future::plan(future::multisession) } K <- 3 N <- 2 costm <- cost_matrix_lp(1:N) # use higher values for better approximation num.sim <- 10 n <- rep(300, K) delta <- rep(1 / K, K) # under one-way null mu <- matrix(1 / N, K, N, TRUE) set.seed(123) lhs <- simulate_finite_FDOTT(mu, costm, n, num.sim = num.sim) rhs <- simulate_limit_FDOTT_null(mu, costm, delta = delta, num.sim = num.sim) h1 <- density(lhs) h2 <- density(rhs) plot(h1$x, h1$y, xlim = range(h1$x, h2$x), ylim = range(h1$y, h2$y), type = "l", col = "red", xlab = "x", ylab = "density", main = "KDEs") lines(h2$x, h2$y, col = "blue") legend("topright", c("Finite", "Limit"), col = c("red", "blue"), pch = 15) # under one-way alternative mu[2, ] <- 1:N / sum(1:N) set.seed(123) lhs <- simulate_finite_FDOTT(mu, costm, n, num.sim = num.sim) rhs <- simulate_limit_FDOTT_alt(mu, costm, delta, num.sim = num.sim) h1 <- density(lhs) h2 <- density(rhs) plot(h1$x, h1$y, xlim = range(h1$x, h2$x), ylim = range(h1$y, h2$y), type = "l", col = "red", xlab = "x", ylab = "density", main = "KDEs") lines(h2$x, h2$y, col = "blue") legend("topright", c("Finite", "Limit"), col = c("red", "blue"), pch = 15)
ot_barycenter_test
Perform simulations for the test statistic used in ot_barycenter_test.
simulate_finite_ot_barycenter_test( mu, costm, n, w = NULL, num.sim = 1000, solver = ot_test_lp_solver() ) simulate_limit_ot_barycenter_test_null( mu, costm, n = NULL, delta = NULL, w = NULL, num.sim = 1000, solver = ot_test_lp_solver(), mean = NULL ) simulate_limit_ot_barycenter_test_alt( mu, costm, delta, w = NULL, num.sim = 1000, solver = ot_test_lp_solver() )simulate_finite_ot_barycenter_test( mu, costm, n, w = NULL, num.sim = 1000, solver = ot_test_lp_solver() ) simulate_limit_ot_barycenter_test_null( mu, costm, n = NULL, delta = NULL, w = NULL, num.sim = 1000, solver = ot_test_lp_solver(), mean = NULL ) simulate_limit_ot_barycenter_test_alt( mu, costm, delta, w = NULL, num.sim = 1000, solver = ot_test_lp_solver() )
mu |
matrix (row-wise) or list containing |
costm |
semi-metric cost matrix |
n |
vector of samples sizes. |
w |
weight vector |
num.sim |
number of samples to draw from the limiting null distribution. |
solver |
the LP solver to use, see |
delta |
vector of asymptotic sample size ratios. |
mean |
mean of the Gaussians appearing in the limiting distribution. Must be of the same structure as |
See ot_barycenter_test for the definition of the test statistic and more details.
simulate_finite_ot_barycenter_test simulates from the finite sample distribution.
simulate_limit_ot_barycenter_test_null and simulate_limit_ot_barycenter_test_alt simulate from the limiting distribution
under the null or alternative, respectively.
All these simulations can be done in parallel via future::plan and the progress can be shown with progressr::with_progress.
A vector containing the simulated samples.
# enable txt progressbar progressr::handlers("txtprogressbar") # enable parallel computation if (requireNamespace("future")) { future::plan(future::multisession) } K <- 3 N <- 2 costm <- cost_matrix_lp(1:N) # use higher values for better approximation num.sim <- 10 n <- rep(300, K) delta <- rep(1 / K, K) # under null mu <- matrix(1 / N, K, N, TRUE) # to run this, a LP solver must be available for ROI (ROI.plugin.glpk by default) if (requireNamespace("ROI.plugin.glpk")) { solver <- ot_test_lp_solver("glpk") set.seed(123) # show progress bar progressr::with_progress({ lhs <- simulate_finite_ot_barycenter_test(mu, costm, n, num.sim = num.sim, solver = solver) rhs <- simulate_limit_ot_barycenter_test_null(mu, costm, delta = delta, num.sim = num.sim, solver = solver) }) h1 <- density(lhs) h2 <- density(rhs) plot(h1$x, h1$y, xlim = range(h1$x, h2$x), ylim = range(h1$y, h2$y), type = "l", col = "red", xlab = "x", ylab = "density", main = "KDEs") lines(h2$x, h2$y, col = "blue") legend("topright", c("Finite", "Limit"), col = c("red", "blue"), pch = 15) } # under alternative mu[2, ] <- 1:N / sum(1:N) if (requireNamespace("ROI.plugin.glpk")) { solver <- ot_test_lp_solver("glpk") set.seed(123) lhs <- simulate_finite_ot_barycenter_test(mu, costm, n, num.sim = num.sim, solver = solver) rhs <- simulate_limit_ot_barycenter_test_alt(mu, costm, delta, num.sim = num.sim, solver = solver) h1 <- density(lhs) h2 <- density(rhs) plot(h1$x, h1$y, xlim = range(h1$x, h2$x), ylim = range(h1$y, h2$y), type = "l", col = "red", xlab = "x", ylab = "density", main = "KDEs") lines(h2$x, h2$y, col = "blue") legend("topright", c("Finite", "Limit"), col = c("red", "blue"), pch = 15) }# enable txt progressbar progressr::handlers("txtprogressbar") # enable parallel computation if (requireNamespace("future")) { future::plan(future::multisession) } K <- 3 N <- 2 costm <- cost_matrix_lp(1:N) # use higher values for better approximation num.sim <- 10 n <- rep(300, K) delta <- rep(1 / K, K) # under null mu <- matrix(1 / N, K, N, TRUE) # to run this, a LP solver must be available for ROI (ROI.plugin.glpk by default) if (requireNamespace("ROI.plugin.glpk")) { solver <- ot_test_lp_solver("glpk") set.seed(123) # show progress bar progressr::with_progress({ lhs <- simulate_finite_ot_barycenter_test(mu, costm, n, num.sim = num.sim, solver = solver) rhs <- simulate_limit_ot_barycenter_test_null(mu, costm, delta = delta, num.sim = num.sim, solver = solver) }) h1 <- density(lhs) h2 <- density(rhs) plot(h1$x, h1$y, xlim = range(h1$x, h2$x), ylim = range(h1$y, h2$y), type = "l", col = "red", xlab = "x", ylab = "density", main = "KDEs") lines(h2$x, h2$y, col = "blue") legend("topright", c("Finite", "Limit"), col = c("red", "blue"), pch = 15) } # under alternative mu[2, ] <- 1:N / sum(1:N) if (requireNamespace("ROI.plugin.glpk")) { solver <- ot_test_lp_solver("glpk") set.seed(123) lhs <- simulate_finite_ot_barycenter_test(mu, costm, n, num.sim = num.sim, solver = solver) rhs <- simulate_limit_ot_barycenter_test_alt(mu, costm, delta, num.sim = num.sim, solver = solver) h1 <- density(lhs) h2 <- density(rhs) plot(h1$x, h1$y, xlim = range(h1$x, h2$x), ylim = range(h1$y, h2$y), type = "l", col = "red", xlab = "x", ylab = "density", main = "KDEs") lines(h2$x, h2$y, col = "blue") legend("topright", c("Finite", "Limit"), col = c("red", "blue"), pch = 15) }
Generate count vectors instead of samples, i.e., vectors giving the number of times a sample was observed at the respective points.
tab_sample(n, mu, prob = FALSE)tab_sample(n, mu, prob = FALSE)
n |
vector or nested list of sample sizes. |
mu |
matrix (row-wise) or nested list containing probability vectors to sample from. The structure of |
prob |
logical value indicating whether probabilities (instead of counts) should be returned. |
The count vectors corresponding to the generated samples. Has the same structure as mu.
## matrix example mu <- matrix(c(0.01, 0.99, 0.5, 0.5), 2, 2, TRUE) n <- c(80, 20) set.seed(123) cv <- tab_sample(n, mu) print(cv) # sample sizes are rowsums print(rowSums(cv)) # empirical probability vectors print(sweep(cv, 1, n, "/")) set.seed(123) # same result print(tab_sample(n, mu, prob = TRUE)) ## list example mu <- list( list(c(0.3, 0.7), c(0.25, 0.75)), list(c(0, 1), c(0.5, 0.5)) ) n <- list(list(100, 120), list(80, 90)) set.seed(123) cv <- tab_sample(n, mu) print(cv) # empirical probability vectors print(rapply(cv, \(x) x / sum(x), how = "replace")) set.seed(123) print(tab_sample(n, mu, prob = TRUE))## matrix example mu <- matrix(c(0.01, 0.99, 0.5, 0.5), 2, 2, TRUE) n <- c(80, 20) set.seed(123) cv <- tab_sample(n, mu) print(cv) # sample sizes are rowsums print(rowSums(cv)) # empirical probability vectors print(sweep(cv, 1, n, "/")) set.seed(123) # same result print(tab_sample(n, mu, prob = TRUE)) ## list example mu <- list( list(c(0.3, 0.7), c(0.25, 0.75)), list(c(0, 1), c(0.5, 0.5)) ) n <- list(list(100, 120), list(80, 90)) set.seed(123) cv <- tab_sample(n, mu) print(cv) # empirical probability vectors print(rapply(cv, \(x) x / sum(x), how = "replace")) set.seed(123) print(tab_sample(n, mu, prob = TRUE))