Title: | Multivariate Testing Through Joint Resampling-Based Tests |
---|---|
Description: | Runs resampling-based tests jointly, e.g., sign-flip score tests from Hemerik et al., (2020) <doi:10.1111/rssb.12369>, to allow for multivariate testing, i.e., weak and strong control of the Familywise Error Rate or True Discovery Proportion. |
Authors: | Livio Finos [cre, aut] , Angela Andreella [aut], Filippo Gambarota [ctb] |
Maintainer: | Livio Finos <[email protected]> |
License: | GPL-2 |
Version: | 1.0 |
Built: | 2024-12-16 23:41:25 UTC |
Source: | CRAN |
jointest
objectsMethods for combining jointest
objects.
combine
combines the tests derived from multiverse models.
combine_contrasts
combines the tests derived from the contrasts of a factor variable to get a
global test for the factor (i.e. categorical predictor).
It has strong analogies with ANOVA test.
combine(mods, comb_funct = "maxT", by = NULL, by_list=NULL, tail = 0) combine_contrasts(mods, comb_funct = "Mahalanobis", tail = 0)
combine(mods, comb_funct = "maxT", by = NULL, by_list=NULL, tail = 0) combine_contrasts(mods, comb_funct = "Mahalanobis", tail = 0)
mods |
a |
comb_funct |
combining function to be used.
Several functions are implemented: "mean", "median", "Fisher", "Liptak", (equal to) "Stoufer", "Tippet", (equal to) "minp", "maxT", "Mahalanobis".
Alternatively it can be a custom function that has a Tspace matrix as input.
For |
by |
if |
by_list |
NULL (default) or a list of vectors. For each vector of the list it combines test statistics with position given by the element of the vector. If the vectors in the list are characters, these refer to names(mods$Tspace). |
tail |
direction of the alternative hypothesis. It can be "two.sided" (or 0, the default), "less" (or -1) or "greater" (or +1). |
The function returns a jointest
-object.
#First example library(jointest) set.seed(123) #Simulate data n=20 D=data.frame(X=rnorm(n),Z1=rnorm(n),Z2=rnorm(n)) D$Y=D$Z1+D$X+rnorm(n) # Run four glms abd combine it in a list mod1=glm(Y~X+Z1+Z2,data=D) mod2=glm(Y~X+poly(Z1,2)+Z2,data=D) mod3=glm(Y~X+poly(Z1,2)+poly(Z2,2),data=D) mod4=glm(Y~X+Z1+poly(Z2,2),data=D) mods=list(mod1=mod1,mod2=mod2,mod3=mod3,mod4=mod4) # Let us analyze the tests related to coefficient "X" and combine them res=join_flipscores(mods,n_flips = 5000, seed = 1, tested_coeffs = "X") summary(combine(res)) # Second (continued) example # flipscores jointly on all models and all coefficients res=join_flipscores(mods,n_flips = 2000) summary(combine(res)) summary(combine(res, by="Model")) summary(combine(res, by="Coeff")) res2=combine_contrasts(res) summary(res2) #custom combinations: coeffs=c("(Intercept)","X","Z1","Z2") coeffs_ids=lapply(coeffs,grep,res2$summary_table$Coeff) names(coeffs_ids)=coeffs summary(combine(res2,by_list = coeffs_ids))
#First example library(jointest) set.seed(123) #Simulate data n=20 D=data.frame(X=rnorm(n),Z1=rnorm(n),Z2=rnorm(n)) D$Y=D$Z1+D$X+rnorm(n) # Run four glms abd combine it in a list mod1=glm(Y~X+Z1+Z2,data=D) mod2=glm(Y~X+poly(Z1,2)+Z2,data=D) mod3=glm(Y~X+poly(Z1,2)+poly(Z2,2),data=D) mod4=glm(Y~X+Z1+poly(Z2,2),data=D) mods=list(mod1=mod1,mod2=mod2,mod3=mod3,mod4=mod4) # Let us analyze the tests related to coefficient "X" and combine them res=join_flipscores(mods,n_flips = 5000, seed = 1, tested_coeffs = "X") summary(combine(res)) # Second (continued) example # flipscores jointly on all models and all coefficients res=join_flipscores(mods,n_flips = 2000) summary(combine(res)) summary(combine(res, by="Model")) summary(combine(res, by="Coeff")) res2=combine_contrasts(res) summary(res2) #custom combinations: coeffs=c("(Intercept)","X","Z1","Z2") coeffs_ids=lapply(coeffs,grep,res2$summary_table$Coeff) names(coeffs_ids)=coeffs summary(combine(res2,by_list = coeffs_ids))
This function fits a model based on the provided formula and data, accounting for clusters and summary statistics within the model.
flip2sss(formula = NULL, data = NULL, cluster = NULL, family = "gaussian", summstats_within=NULL, ...)
flip2sss(formula = NULL, data = NULL, cluster = NULL, family = "gaussian", summstats_within=NULL, ...)
formula |
A formula or a list of formulas. It can be a complete model as.formula or a list of formulas, one for each element produced by the function. |
data |
The dataset to be used for fitting the model. |
cluster |
A vector or a formula evaluated on the data that defines the clusters. |
family |
as in |
summstats_within |
A vector of summary statistics model within the data or a function with argument data. |
... |
Other arguments passed to the |
A jointest
object, i.e., a list containing the following objects:
data.frame
where rows represents the sign-flipping transformed (plus the identity one) test and columns the variables.
data.frame
containing for each second-step covariate the estimated parameter, score, std error, test , partial correlation and p-value.
List of glm
objects, i.e., first-step glm
objects
Livio Finos, Angela Andreella
library(jointest) set.seed(123) # Simulate data N=20 n=rpois(N,20) reff=rep(rnorm(N),n) D=data.frame(X1=rnorm(length(reff)), X2=rep(rnorm(N),n), Grp=factor(rep(rep(LETTERS[1:3],length.out=N),n)), Subj=rep(1:N,n)) D$Y=rbinom(n=nrow(D),prob=plogis( 2*D$X1 * (D$Grp=="B") + 2*D$X2+reff),size=1) # model of interest formula <- Y ~ Grp * X1 + X2 # clusters structure defined by cluster <- factor(D$Subj) # The 2-Stage Summary Statistics via flipscore: res <- flip2sss(Y ~ Grp * X1 + X2, data=D, cluster=D$Subj, family="binomial") summary(res) # This is an ANOVA-like overall test: summary(combine(res)) # This is an ANOVA-like test: summary(combine_contrasts(res)) # An alternative and more flexible definition of the model: # Define the summary statistics (here we propose the glm with firth correction # from the logistf package) summstats_within <- 'logistf::logistf(Y ~ X1, family = binomial(link = "logit"), control=logistf::logistf.control(maxit=100))' # however also the classic glm function can be used: #summstats_within <- 'glm(Y ~ X1, family = binomial(link = "logit"))' # Then, compute the 2-Stage Summary Statistics approach # specifying the summary statistics (within cluster/subject) res <- flip2sss(Y ~ Grp * X1 + X2, data=D, cluster=D$Subj, summstats_within=summstats_within) summary(res) # We can also combine the tests: # Overall: summary(combine(res)) # This is similar to an ANOVA test: summary(combine_contrasts(res))
library(jointest) set.seed(123) # Simulate data N=20 n=rpois(N,20) reff=rep(rnorm(N),n) D=data.frame(X1=rnorm(length(reff)), X2=rep(rnorm(N),n), Grp=factor(rep(rep(LETTERS[1:3],length.out=N),n)), Subj=rep(1:N,n)) D$Y=rbinom(n=nrow(D),prob=plogis( 2*D$X1 * (D$Grp=="B") + 2*D$X2+reff),size=1) # model of interest formula <- Y ~ Grp * X1 + X2 # clusters structure defined by cluster <- factor(D$Subj) # The 2-Stage Summary Statistics via flipscore: res <- flip2sss(Y ~ Grp * X1 + X2, data=D, cluster=D$Subj, family="binomial") summary(res) # This is an ANOVA-like overall test: summary(combine(res)) # This is an ANOVA-like test: summary(combine_contrasts(res)) # An alternative and more flexible definition of the model: # Define the summary statistics (here we propose the glm with firth correction # from the logistf package) summstats_within <- 'logistf::logistf(Y ~ X1, family = binomial(link = "logit"), control=logistf::logistf.control(maxit=100))' # however also the classic glm function can be used: #summstats_within <- 'glm(Y ~ X1, family = binomial(link = "logit"))' # Then, compute the 2-Stage Summary Statistics approach # specifying the summary statistics (within cluster/subject) res <- flip2sss(Y ~ Grp * X1 + X2, data=D, cluster=D$Subj, summstats_within=summstats_within) summary(res) # We can also combine the tests: # Overall: summary(combine(res)) # This is similar to an ANOVA test: summary(combine_contrasts(res))
The function allows hypothesis testing across all plausible multiverse models ensuring strong family-wise error rate control.
join_flipscores(mods, tested_coeffs = NULL, n_flips = 5000, score_type = "standardized", statistics = "t", seed=NULL, output_models =TRUE, ...)
join_flipscores(mods, tested_coeffs = NULL, n_flips = 5000, score_type = "standardized", statistics = "t", seed=NULL, output_models =TRUE, ...)
mods |
list of |
tested_coeffs |
list of the same length of |
n_flips |
number of flips, default 5000 |
score_type |
any valid type for |
statistics |
"t" is the only method implemented (yet). Any other value will not modify the score (a different statistic will only affect the multivariate inference, not the univariate one). |
seed |
|
output_models |
|
... |
any other further parameter. |
A jointest
object, i.e., a list containing the following objects:
data.frame
where rows represents the sign-flipping transformed (plus the identity one) test and columns the variables.
data.frame
containing for each model the estimated parameter(s), score(s), std error(s), test(s), partial correlation(s) and p-value(s).
List of glm
s or flipscores
objects.
library(jointest) set.seed(123) #EXAMPLE 1: Simulate data: n=20 D=data.frame(X=rnorm(n),Z1=rnorm(n),Z2=rnorm(n)) D$Y=D$Z1+D$X+rnorm(n) # Run four glms abd combine it in a list mod1=glm(Y~X+Z1+Z2,data=D) mod2=glm(Y~X+poly(Z1,2)+Z2,data=D) mod3=glm(Y~X+poly(Z1,2)+poly(Z2,2),data=D) mod4=glm(Y~X+Z1+poly(Z2,2),data=D) mods=list(mod1=mod1,mod2=mod2,mod3=mod3,mod4=mod4) # flipscores jointly on all models res=join_flipscores(mods,n_flips = 1000) summary(combine(res)) summary(combine(res, by="Model")) summary(combine_contrasts(res)) #Simulate multivariate (50) bionomial responses set.seed(123) n=30 D=data.frame(X=rnorm(n),Z=rnorm(n)) Y=replicate(50,rbinom(n,1,plogis(.5*D$Z+.5*D$X))) colnames(Y)=paste0("Y",1:50) D=cbind(D,Y) mods=lapply(1:50,function(i)eval(parse(text= paste(c("glm(formula(Y",i,"~X+Z),data=D,family='binomial')"),collapse="")))) # flipscores jointly on all models res=join_flipscores(mods,n_flips = 1000,tested_coeffs ="X") summary(res) res=p.adjust(res) summary(res) # Compute lower bound for the true discovery proportion. See packages pARI and sumSome # install.packages("sumSome") # install.packages("pARI") # library(sumSome) # library(pARI) # pARI returns a lower bound equals 0.24, i.e., at least 24% of the models # have a significant effect related to X # pARI::pARI(ix = c(1:50),pvalues = t(jointest:::.t2p(res$Tspace)),family = "simes",delta = 9)$TDP # sumSome returns a lower bound equals 0.42, i.e., at least 42% of the models # have a significant effect related to X # sumSome::tdp(sumSome::sumStats(G = as.matrix(res$Tspace)))
library(jointest) set.seed(123) #EXAMPLE 1: Simulate data: n=20 D=data.frame(X=rnorm(n),Z1=rnorm(n),Z2=rnorm(n)) D$Y=D$Z1+D$X+rnorm(n) # Run four glms abd combine it in a list mod1=glm(Y~X+Z1+Z2,data=D) mod2=glm(Y~X+poly(Z1,2)+Z2,data=D) mod3=glm(Y~X+poly(Z1,2)+poly(Z2,2),data=D) mod4=glm(Y~X+Z1+poly(Z2,2),data=D) mods=list(mod1=mod1,mod2=mod2,mod3=mod3,mod4=mod4) # flipscores jointly on all models res=join_flipscores(mods,n_flips = 1000) summary(combine(res)) summary(combine(res, by="Model")) summary(combine_contrasts(res)) #Simulate multivariate (50) bionomial responses set.seed(123) n=30 D=data.frame(X=rnorm(n),Z=rnorm(n)) Y=replicate(50,rbinom(n,1,plogis(.5*D$Z+.5*D$X))) colnames(Y)=paste0("Y",1:50) D=cbind(D,Y) mods=lapply(1:50,function(i)eval(parse(text= paste(c("glm(formula(Y",i,"~X+Z),data=D,family='binomial')"),collapse="")))) # flipscores jointly on all models res=join_flipscores(mods,n_flips = 1000,tested_coeffs ="X") summary(res) res=p.adjust(res) summary(res) # Compute lower bound for the true discovery proportion. See packages pARI and sumSome # install.packages("sumSome") # install.packages("pARI") # library(sumSome) # library(pARI) # pARI returns a lower bound equals 0.24, i.e., at least 24% of the models # have a significant effect related to X # pARI::pARI(ix = c(1:50),pvalues = t(jointest:::.t2p(res$Tspace)),family = "simes",delta = 9)$TDP # sumSome returns a lower bound equals 0.42, i.e., at least 42% of the models # have a significant effect related to X # sumSome::tdp(sumSome::sumStats(G = as.matrix(res$Tspace)))
jointest
objectsMethods to extract and manipulate relevant information from
a jointest
object.
print
method for class "jointest
".
summary
method for class "jointest
"
p.adjust
method for class "jointest
".
Add adjusted p-values into the jointest
object.
plot
method for class "jointest
"
This plot
function visualizes p-values from multiverse models, with
different markers to indicate statistical significance levels as defined by
the mark_signif
argument (default is 0.05). Points are plotted with
varying shapes based on whether the p-value is below the significance threshold,
and colors are used to distinguish between different coefficients.
## S3 method for class 'jointest' print(x, ...) ## S3 method for class 'jointest' summary(object, ...) p.adjust(object, method = "maxT", tail = 0, ...) ## S3 method for class 'jointest' plot(x, ...)
## S3 method for class 'jointest' print(x, ...) ## S3 method for class 'jointest' summary(object, ...) p.adjust(object, method = "maxT", tail = 0, ...) ## S3 method for class 'jointest' plot(x, ...)
x |
an object of class |
... |
additional arguments to be passed, i.e., |
object |
an object of class |
method |
any method implemented in |
tail |
argument: expresses the tail direction of the alternative hypothesis. It can be "two.sided" (or 0, the default), "less" (or -1) or "greater" (or +1). |
mark_signif
argument: numeric value representing the significance threshold
for marking p-values. Any p-value below this threshold will be marked
with a dot. The default is 0.05
.
p.values
argument: a character vector specifying which p-values to display.
It can be either "raw"
for raw p-values or "adjusted"
for
adjusted p-values. The default is "raw"
.
This dataset consists of a longitudinal collection of 150 subjects aged 60 to 96. Each subject was scanned on two or more visits, separated by at least one year for a total of 373 imaging sessions. For each subject, 3 or 4 individual T1-weighted MRI scans obtained in single scan sessions are included.
oasis
oasis
A data frame with 373 rows and 15 variables:
Subject identification
MRI Exam Identification
Class
Visit order
MR Delay Time (Contrast)
Gender
All subjects are right-handed
Age of the subject
Years of Education
Socioeconomic Status
Mini Mental State Examination
Clinical Dementia Rating
Estimated total intracranial volume
Normalize Whole Brain Volume
Atlas Scaling Factor
https://www.kaggle.com/datasets/jboysen/mri-and-alzheimers