| Title: | Hypergraph-Based Sizing Function for Generalised Linear Step-Up Method |
|---|---|
| Description: | Calculates a sizing function based on the number of independent sets in the rejected hypergraph (Organ, Kenney & Gu, 2026, <doi:10.48550/arXiv.2606.20514>). The sizing function is designed to be used with the 'GLSUP' package. |
| Authors: | Toby Kenney [aut, cre] |
| Maintainer: | Toby Kenney <[email protected]> |
| License: | GPL-3 |
| Version: | 1.0.0 |
| Built: | 2026-06-24 13:25:03 UTC |
| Source: | https://github.com/cran/hypergraph.sizing |
Constructs a complete hypergraph containing all edges of order up to a given value.
complete.hypergraph(vertices,max.order)complete.hypergraph(vertices,max.order)
vertices |
The number of vertices |
max.order |
The largest number of vertices in an edge |
Constructs a hypergraph containing all possible edges of order up to the parameter value max.order. Uses the combn function in the utils package, which lists edges in lexicographic order.
The hypergraph, represented as a list with two components:
The number of vertices in the hypergraph
A list of edges. Each edge is a vector of vertex numbers.
The hypergraph includes all edges with at most max.order vertices. The edges are in increasing order of size. Within each size, the edges are produced by the combn function in the utils package, which currently produces them in lexicographic order.
Toby Kenney
H<-complete.hypergraph(10,3) HH<-complete.hypergraph(10,3) H
Counts the number of independent sets (or equivalently vertex covers) in all initial subhypergraphs of the given hypergraph.
count.indep.hyper(H,samp.size=1000) count.vc.hyper(H,samp.size=1000)count.indep.hyper(H,samp.size=1000) count.vc.hyper(H,samp.size=1000)
H |
The hypergraph, represented as a list with two components:
|
samp.size |
The number of samples used to estimate the number of independent sets for each subhypergraph. Larger values take proportionately longer, but produce more accurate values. Because samples from subgraphs are reused for efficiency, this is not the exact number of samples used. |
An independent set for a hypergraph is a set that contains no edges. A vertex cover is a set that has non-empty intersection with every edge. The complement of an independent set is a vertex cover, so there are the same number of independent sets as vertex covers. If the edges of a hypergraph are totally ordered, then we have a sequence of nested hypergraphs. These functions count the number of independent sets for each hypergraph in this nested sequence. Since counting the number of independent sets in a hypergraph is NP-hard, this function uses a random importance sampling scheme that sequentially samples independent sets starting from the empty set and adding one element each time. The samp.size parameter is the number of such sequences generated.
A vector of length the number of edges. The ith entry in this vector is the approximate number of independent sets in the subhypergraph consisting of the first i edges of the hypergraph.
Toby Kenney
Hypergraph Variable Selection with False Discovery Rate Control
Sarah Organ, Toby Kenney, Hong Gu
<doi:10.48550/arXiv.2606.20514>
set.seed(1) edge.orders<-rpois(15,1.6)+1 H<-list("vertices"=10,lapply(edge.orders,function(s){sample(seq_len(10),s)})) ind.set<-count.indep.hyper(H)set.seed(1) edge.orders<-rpois(15,1.6)+1 H<-list("vertices"=10,lapply(edge.orders,function(s){sample(seq_len(10),s)})) ind.set<-count.indep.hyper(H)
Performs Hypothesis Tests for the Null Hypothesis that a Set of Predictors Contains no True Predictor.
get.p.vals.hypergraph(x,y,H,test)get.p.vals.hypergraph(x,y,H,test)
x |
The matrix of predictors. |
y |
A vector containing the response variable. |
H |
A hypergraph represented as a list with two components:
|
test |
either one of the character strings "gaussian", "binomial" or "poisson", or else a list with two components:
The make.test function creates a standard test of this form for "gaussian", "binomial" or "poisson" GLMs. |
Each edge of the hypergraph corresponds to a collection of subsets of the predictor variables. This function, for each of these sets, computes the p-values for the hypotheses that the set contains no true predictors.
A vector of p-values in the same order as the edges of the hypergraph.
Toby Kenney
set.seed(1) X<-matrix(rnorm(200),20,10)%*%(diag(rep(1,10))-c(0.4,0.4,rep(0,8))%*%t(c(0.4,0.4,rep(0,8)))) Y<-rnorm(20)+X%*%c(0,2,0,2,2,0,0,0,0,0) H<-complete.hypergraph(10,2) get.p.vals.hypergraph(X,Y,H,"gaussian")set.seed(1) X<-matrix(rnorm(200),20,10)%*%(diag(rep(1,10))-c(0.4,0.4,rep(0,8))%*%t(c(0.4,0.4,rep(0,8)))) Y<-rnorm(20)+X%*%c(0,2,0,2,2,0,0,0,0,0) H<-complete.hypergraph(10,2) get.p.vals.hypergraph(X,Y,H,"gaussian")
Calculates a threshold to control gFDR at the specified level when using GLSUP for hypotheses corresponding to the edges of a hypergraph, using a conservative bound on the independent set sizing function.
get.threshold.hyper(H,level,assumption="PRDS")get.threshold.hyper(H,level,assumption="PRDS")
H |
The hypergraph, represented as a list with two components:
|
level |
The desired gFDR control level for GLSUP. |
assumption |
The assumptions for the dependency between null p-values. Options are "PRDS", "independent" or null. "PRDS" and "independent" give the same threshold, while null gives a more conservative threshold. |
The GLSUP rejects as many hypotheses as possible such that the sizing function applied to the rejected hypotheses exceeds a threshold multiplied by the p-value cut-off. This threshold is calculated based on the assumptions about dependency between null p-values, and the desired gFDR control level. This function performs these calculations under common cases.
The threshold for FDR control.
Toby Kenney
Hypergraph Variable Selection with False Discovery Rate Control
Sarah Organ, Toby Kenney, Hong Gu
<doi:10.48550/arXiv.2606.20514>
Setwise Hierarchical Variable Selection and the Generalized Linear Step-Up Procedure for False Discovery Rate Control
Sarah Organ, Toby Kenney, Hong Gu <doi:10.48550/arXiv.2603.02160>
set.seed(1) edge.orders<-rpois(15,1.6)+1 H<-list("vertices"=10,"edges"=lapply(edge.orders,function(s){sample(seq_len(10),s)})) get.threshold.hyper(H,0.05,assumption="PRDS")set.seed(1) edge.orders<-rpois(15,1.6)+1 H<-list("vertices"=10,"edges"=lapply(edge.orders,function(s){sample(seq_len(10),s)})) get.threshold.hyper(H,0.05,assumption="PRDS")
Produces a standard hypothesis test for a given null hypothesis that a certain set of variables are null. These tests can be input to the get.p.values.hypergraph functions
make.test(test.name)make.test(test.name)
test.name |
The name of the test being used. Currently, options are "gaussian", "binomial" and "poisson". |
Produces two-sided hypothesis tests for the null hypothesis that a set of predictors are all null, for the given GLM model. The models are fitted by the standard glm package, and p-values are calculated using ANOVA - a likelihood ratio test for "binomial" and "poisson", and an F-test for "gaussian".
A test object, which is a list with two components:
A function taking a formula as input and returning a fitted model.
A function taking two nested fitted models as input and returning a p-value for the hypothesis that the outer model does not fit the data any better than the inner model.
This test object can be used by the get.p.vals.hypergraph function to perform any collection of tests.
Toby Kenney
set.seed(1) X<-matrix(rnorm(200),20,10)%*%(diag(rep(1,10))-c(0.4,0.4,rep(0,8))%*%t(c(0.4,0.4,rep(0,8)))) Y<-rnorm(20)+X%*%c(0,2,0,2,2,0,0,0,0,0) tst<-make.test("gaussian") m1<-tst$model(Y~X) X_sub<-X[,-c(1,2)] m2<-tst$model(Y~X_sub) tst$p.val(m1,m2)set.seed(1) X<-matrix(rnorm(200),20,10)%*%(diag(rep(1,10))-c(0.4,0.4,rep(0,8))%*%t(c(0.4,0.4,rep(0,8)))) Y<-rnorm(20)+X%*%c(0,2,0,2,2,0,0,0,0,0) tst<-make.test("gaussian") m1<-tst$model(Y~X) X_sub<-X[,-c(1,2)] m2<-tst$model(Y~X_sub) tst$p.val(m1,m2)
Creates a hypergraph-based sizing function for the GLSUP package.
sizing.hyper(H,samp.size=1000)sizing.hyper(H,samp.size=1000)
H |
The hypergraph, represented as a list with two components:
|
samp.size |
The number of samples used to estimate the number of independent sets for each subhypergraph. Larger values take proportionately longer, but produce more accurate values. Because samples from subgraphs are reused for efficiency, this is not the exact number of samples used. |
An independent set for a hypergraph is a set that contains no edges. A vertex cover is a set that has non-empty intersection with every edge. The complement of an independent set is a vertex cover, so there are the same number of independent sets as vertex covers. If the edges of a hypergraph are totally ordered, then we have a sequence of nested hypergraphs. These functions count the number of independent sets for each hypergraph in this nested sequence. Since counting the number of independent sets in a hypergraph is NP-hard, this function uses a random importance sampling scheme that sequentially samples independent sets starting from the empty set and adding one element each time. The samp.size parameter is the number of such sequences generated.
For a hypergraph H of rejected hypotheses, the sizing function |V(H)|-log_2(|IS(H)|), where V(H) is the set of vertices of H and IS(H) is the set of independent sets, is suggested. This function returns a sizing function that calculates this for a nested sequence of hypergraphs from the full hypergraph.
A sizing function - that is a function that takes a permutation as input, then computes the function |V(H')|-log_2(|IS(H')|), where V(H') is the set of vertices of H' and IS(H') is the set of independent sets of H', for every H' obtained by taking the first i edges of H. This sizing function can be used with the GLSUP package to perform gFDR-controlled hypothesis testing for this sizing function.
Toby Kenney
Hypergraph Variable Selection with False Discovery Rate Control
Sarah Organ, Toby Kenney, Hong Gu
<doi:10.48550/arXiv.2606.20514>
set.seed(1) edge.orders<-rpois(15,1.6)+1 H<-list("vertices"=10,lapply(edge.orders,function(s){sample(seq_len(10),s)})) sizing<-sizing.hyper(H) sizing(seq_len(15))set.seed(1) edge.orders<-rpois(15,1.6)+1 H<-list("vertices"=10,lapply(edge.orders,function(s){sample(seq_len(10),s)})) sizing<-sizing.hyper(H) sizing(seq_len(15))