vennDiagramLab reports five pairwise metrics for every
set pair plus a multiple-testing correction. This vignette explains what
each metric means, when to prefer it, and how to reproduce the values
that appear in the web tool’s significance coloring.
library(vennDiagramLab)
result <- analyze(load_sample("dataset_real_cancer_drivers_4"))
stats <- statistics(result)

For two sets A and B of sizes |A|, |B| with intersection |A ∩ B| drawn from a universe of N items:
Jaccard index: |A ∩ B| / |A ∪ B|. Range [0, 1]. Symmetric.
Dice coefficient: 2 |A ∩ B| / (|A| + |B|). Range [0, 1]. Symmetric. Always >= Jaccard; the two relate by Dice = 2J / (1 + J).
Overlap coefficient: |A ∩ B| / min(|A|, |B|). Range [0, 1]. Equal to 1 when one set is contained in the other.
Hypergeometric p-value: the probability of observing |A ∩ B| or more shared items by chance, given |A|, |B|, and N. Tests over-representation.
Fold enrichment: (|A ∩ B| * N) / (|A| * |B|). The ratio of observed to expected intersection size under independence. A value > 1 indicates over-representation.

The helpers are exported and stateless:
jaccard(size_a = 138, size_b = 581, intersection = 100)
dice(size_a = 138, size_b = 581, intersection = 100)
overlap_coefficient(size_a = 138, size_b = 581, intersection = 100)
hypergeometric_p_value(N = 20000, K = 138, n = 581, k = 100)
fold_enrichment(N = 20000, K = 138, n = 581, k = 100)

The hypergeometric p-value is essentially zero: 100 shared genes against an expected (138 * 581) / 20000 ≈ 4 is a 25× enrichment.
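As a cross-check, the five definitions can be re-derived in base R for the same pair of sets. This is a minimal sketch that assumes the textbook formulas listed above and the usual upper-tail hypergeometric test via stats::phyper(); the package's own helpers may handle edge cases differently, and the variable names here are illustrative only.

```r
# Base-R re-derivation of the five metrics for one set pair.
# Assumption: textbook formulas; not necessarily the package's internals.
size_a   <- 138     # |A|
size_b   <- 581     # |B|
shared   <- 100     # |A ∩ B|
universe <- 20000   # N

union_size  <- size_a + size_b - shared
jaccard_val <- shared / union_size
dice_val    <- 2 * shared / (size_a + size_b)
overlap_val <- shared / min(size_a, size_b)

# Expected intersection size if A and B were drawn independently.
expected_shared <- size_a * size_b / universe
fold_val        <- shared / expected_shared

# Upper tail P(X >= shared). With lower.tail = FALSE, phyper() returns
# P(X > q), so q is set to shared - 1.
p_val <- phyper(shared - 1, m = size_a, n = universe - size_a,
                k = size_b, lower.tail = FALSE)

# The identity Dice = 2J / (1 + J) holds for these values.
stopifnot(isTRUE(all.equal(dice_val, 2 * jaccard_val / (1 + jaccard_val))))

round(c(jaccard = jaccard_val, dice = dice_val,
        overlap = overlap_val, fold = fold_val), 3)
```

Passing shared - 1 rather than shared is the detail that is easiest to get wrong: without it the test would compute the probability of strictly more than 100 shared items.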
statistics(result) returns five tables: four square matrices (one row and one column per set) for the ratio metrics, and a long-form data.frame for the hypergeometric test.
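To make the two layouts concrete, the following base-R sketch builds a square pairwise matrix and a long-form pair table from three toy gene sets. The sets, the jaccard_pair() helper, and the column names set_a, set_b, and jaccard are hypothetical and chosen for illustration; they are not the package's slot or column names.

```r
# Toy sets (hypothetical data, for illustration only).
sets <- list(
  A = c("TP53", "KRAS", "EGFR", "BRAF"),
  B = c("TP53", "KRAS", "PTEN"),
  C = c("EGFR", "MYC")
)

# Hypothetical helper: Jaccard index computed from the raw members.
jaccard_pair <- function(x, y) {
  length(intersect(x, y)) / length(union(x, y))
}

# Square layout: one row and one column per set, diagonal equal to 1.
jaccard_matrix <- sapply(sets, function(y) {
  sapply(sets, function(x) jaccard_pair(x, y))
})

# Long layout: one row per unordered pair, looked up by set name.
pair_names <- t(combn(names(sets), 2))
long <- data.frame(
  set_a   = pair_names[, 1],
  set_b   = pair_names[, 2],
  jaccard = jaccard_matrix[pair_names],
  stringsAsFactors = FALSE
)

jaccard_matrix
long
```

The square form suits heatmaps, while the long form has one row per test, which is the shape a multiple-testing correction needs.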
stats@hypergeometric already carries the BH-FDR-adjusted q-value (p_adjusted) and two boolean flags, significant (q < 0.05) and highly_significant (q < 0.001). The adjustment uses stats::p.adjust(method = "BH"):
raw_p <- stats@hypergeometric$p_value
adjusted <- bh_fdr(raw_p)
all.equal(adjusted, stats@hypergeometric$p_adjusted)

For unrelated p-values, BH-FDR is more permissive than Bonferroni and more conservative than no correction.
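The ordering of the three approaches can be illustrated with stats::p.adjust() alone. The p-values below are made up for the example and do not come from the sample dataset.

```r
# Illustrative p-values (hypothetical, not taken from the package data).
p <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205)

bonferroni <- p.adjust(p, method = "bonferroni")
bh         <- p.adjust(p, method = "BH")

comparison <- data.frame(raw = p, bh = bh, bonferroni = bonferroni)
comparison

# Number of tests called significant at 0.05 under each approach.
n_raw        <- sum(p < 0.05)            # no correction: 5
n_bh         <- sum(bh < 0.05)           # BH-FDR: 2
n_bonferroni <- sum(bonferroni < 0.05)   # Bonferroni: 1

# Each BH-adjusted value sits between the raw and Bonferroni values.
stopifnot(all(p <= bh), all(bh <= bonferroni))
```

For these eight values, the uncorrected threshold keeps five tests, BH keeps two, and Bonferroni keeps one.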
broom::tidy() produces a tibble that’s pipeline-friendly
(one row per pair, all metrics in one frame, sorted by adjusted
p-value):
The web tool colors significant pairs via the same
p_adjusted thresholds:
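A minimal sketch of how the two thresholds stated earlier (q < 0.05 and q < 0.001) partition adjusted p-values into tiers. The q-values and tier labels are hypothetical, and the actual colors used by the web tool are not reproduced here.

```r
# Hypothetical adjusted p-values, including values exactly on a threshold.
q_values <- c(0.0002, 0.001, 0.02, 0.05, 0.3)

# Flags defined with strict inequalities, as in the text above.
significant        <- q_values < 0.05
highly_significant <- q_values < 0.001

# right = FALSE makes the intervals closed on the left, so a q-value
# exactly equal to a threshold falls into the less significant tier.
tier <- cut(
  q_values,
  breaks = c(-Inf, 0.001, 0.05, Inf),
  right  = FALSE,
  labels = c("highly significant", "significant", "not significant")
)

data.frame(q_values, significant, highly_significant, tier)
```

Every highly significant pair is also significant, so the tiers are nested rather than independent.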
vignette("v02_real_cancer_drivers"): see these stats in the context of a real biological analysis.
vignette("v06_pipeline_integration"): feed broom::tidy() into a downstream tidyverse pipeline.