Title: | Mixed Model Association Test for GEne-Environment Interaction |
---|---|
Description: | Use a 'glmmkin' class object (GMMAT package) from the null model to perform generalized linear mixed model-based single-variant and variant set main effect tests, gene-environment interaction tests, and joint tests for association, as proposed in Wang et al. (2020) <DOI:10.1002/gepi.22351>. |
Authors: | Xinyu Wang [aut], Han Chen [aut, cre], Duy Pham [aut], Kenneth Westerman [ctb], Cong Pan [aut], Eric Biggers [ctb, cph] (Author and copyright holder of included libdeflate library), Tino Reichardt [ctb, cph] (Author and copyright holder of threading code used in the included Zstandard (zstd) library), Meta Platforms, Inc. and affiliates [cph] (Copyright holder of included Zstandard (zstd) library) |
Maintainer: | Han Chen <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.4.2 |
Built: | 2024-11-25 06:38:26 UTC |
Source: | CRAN |
An R package for performing generalized linear mixed model-based single-variant and variant set main effect tests, gene-environment interaction tests, and joint tests for association, as proposed in Wang et al. (2020) <DOI:10.1002/gepi.22351>.
Package: | MAGEE |
Type: | Package |
Version: | 1.4.2 |
Date: | 2024-07-26 |
License: | GPL (>= 3) |
Xinyu Wang, Han Chen, Duy T. Pham
Maintainer: Han Chen <[email protected]>
Wang, X., Lim, E., Liu, C, Sung, Y.J., Rao, D.C., Morrison, A.C., Boerwinkle, E., Manning, A. K., and Chen, H. (2020) Efficient gene-environment interaction tests for large biobank-scale sequencing studies. Genetic Epidemiology, 44(8): 908-923.
Chen, H., Huffman, J.E., Brody, J.A., Wang, C., Lee, S., Li, Z., Gogarten, S.M., Sofer, T., Bielak, L.F., Bis, J.C., et al. (2019) Efficient variant set mixed model association tests for continuous and binary traits in large-scale whole-genome sequencing studies. The American Journal of Human Genetics, 104 (2): 260-274.
Wang, X., Pham, D.T., Westerman, K.E., Pan, C., Manning, A.K., Chen, H., (2022) Genomic summary statistics and meta-analysis for set-based gene-environment interaction tests in large-scale sequencing studies. medrxiv.
Example dataset for MAGEE.
Contains the following objects:
a data frame of 400 observations from a cross-sectional study with 5 variables: id, disease, trait, age and sex.
a data frame of 2,000 observations from a longitudinal study with 400 individuals and 5 variables: id, y.repeated, y.trend, time and sex.
a genetic relationship matrix for 400 observations.
Use a glmmkin class object from the null GLMM to perform single variant main effect score test, gene-environment interaction test, or joint test for association with genotypes in a GDS file .gds.
glmm.gei(null.obj, interaction, geno.file, outfile, bgen.samplefile = NULL, interaction.covariates = NULL, meta.output = FALSE, covar.center="interaction.covariates.only", geno.center=TRUE, cat.threshold = 20, MAF.range = c(1e-7, 0.5), MAC.cutoff = 1, miss.cutoff = 1, RSQ.cutoff = 0, missing.method = "impute2mean", nperbatch=100, is.dosage = FALSE, ncores = 1, verbose = FALSE)
glmm.gei(null.obj, interaction, geno.file, outfile, bgen.samplefile = NULL, interaction.covariates = NULL, meta.output = FALSE, covar.center="interaction.covariates.only", geno.center=TRUE, cat.threshold = 20, MAF.range = c(1e-7, 0.5), MAC.cutoff = 1, miss.cutoff = 1, RSQ.cutoff = 0, missing.method = "impute2mean", nperbatch=100, is.dosage = FALSE, ncores = 1, verbose = FALSE)
null.obj |
a class glmmkin object, returned by fitting the null GLMM using |
interaction |
a numeric or a character vector indicating the environmental factors. If a numeric vector, it represents which indices in the order of covariates are the environmental factors; if a character vector, it represents the variable names of the environmental factors. |
geno.file |
the full name of a GDS file (including the suffix .gds). |
outfile |
the output file name. |
bgen.samplefile |
path to the BGEN .sample file. Required when the BGEN file does not contain sample identifiers. |
interaction.covariates |
a numeric or a character vector indicating the interaction covariates. If a numeric vector, it represents which indices in the order of covariates are the interaction covariates; if a character vector, it represents the variable names of the interaction covariates. |
meta.output |
boolean value to modiy the output file.If TRUE, the GxE effect estimate and variance and covariance associated with the effect estimate are included in the output file. (default = FALSE) |
covar.center |
a character value for the centering option for covariates. Possible values are "none", "all", or "interaction.covariates.only". Generally, centering exposures and covariates to have mean 0 before creating interaction terms would make the genetic main effect easier to interpret. However, if a subsequent meta-analysis is expected, then the exposures of interest should not be centered because in that case the genetic main effect may have different interpretations across studies (default = "interaction.covariates.only"). |
geno.center |
a logical switch for centering genotypes before tests. If TRUE, genotypes will be centered to have mean 0 before tests, otherwise raw values will be directly used in tests (default = TRUE). |
cat.threshold |
a numeric cut-off to determine which interaction terms or interaction covariates should be automatically treated as categorical based on the number of levels (unique observations). They will be treated as categorical if the number of levels (unique observations) is less than or equal to this numeric cut-off (default = 20). |
MAF.range |
a numeric vector of length 2 defining the minimum and maximum minor allele frequencies of variants that should be included in the analysis (default = c(1e-7, 0.5)). |
MAC.cutoff |
the minimum minor allele count allowed for a variant to be included (default = 1, including all variants). |
miss.cutoff |
the maximum missing rate allowed for a variant to be included (default = 1, including all variants). |
RSQ.cutoff |
the minimum Rsq value, defined as the ratio of observed and expected genotypic variance under Hardy-Weinberg equilibrium, allowed for a variant to be included (default = 0, including all variants). |
missing.method |
method of handling missing genotypes.Either "impute2mean" or "omit" (default = "impute2mean"). |
nperbatch |
an integer for how many SNPs should be tested in a batch (default = 100). The computational time can increase dramatically if this value is either small or large. The optimal value for best performance depends on the user's system. |
is.dosage |
a logical switch for whether imputed dosage should be used from a GDS |
ncores |
a positive integer indicating the number of cores to be used in parallel computing (default = 1). |
verbose |
a logical switch for whether failed matrix inversions should be written to |
NULL
Xinyu Wang, Han Chen, Duy Pham
Chen, H., Wang, C., Conomos, M.P., Stilp, A.M., Li, Z., Sofer, T., Szpiro, A.A., Chen, W., Brehm, J.M., Celedón, J.C., Redline, S., Papanicolaou, G.J., Thornton, T.A., Laurie, C.C., Rice, K. and Lin, X. (2016) Control forpopulation structure and relatedness for binary traits in genetic association studies via logistic mixed models. The American Journal of Human Genetics 98, 653-666.
library(GMMAT) data(example) attach(example) model0 <- glmmkin(disease ~ age + sex, data = pheno, kins = GRM, id = "id", family = binomial(link = "logit")) if(requireNamespace("SeqArray", quietly = TRUE) && requireNamespace("SeqVarTools", quietly = TRUE)) { infile <- system.file("extdata", "geno.gds", package = "MAGEE") gds_outfile <- tempfile() glmm.gei(model0, interaction='sex', geno.file = infile, outfile = gds_outfile) unlink(gds_outfile) } infile <- system.file("extdata", "geno.bgen", package = "MAGEE") samplefile <- system.file("extdata", "geno.sample", package = "MAGEE") bgen_outfile <- tempfile() glmm.gei(model0, interaction='sex', geno.file = infile, outfile = bgen_outfile, bgen.samplefile = samplefile) unlink(bgen_outfile)
library(GMMAT) data(example) attach(example) model0 <- glmmkin(disease ~ age + sex, data = pheno, kins = GRM, id = "id", family = binomial(link = "logit")) if(requireNamespace("SeqArray", quietly = TRUE) && requireNamespace("SeqVarTools", quietly = TRUE)) { infile <- system.file("extdata", "geno.gds", package = "MAGEE") gds_outfile <- tempfile() glmm.gei(model0, interaction='sex', geno.file = infile, outfile = gds_outfile) unlink(gds_outfile) } infile <- system.file("extdata", "geno.bgen", package = "MAGEE") samplefile <- system.file("extdata", "geno.sample", package = "MAGEE") bgen_outfile <- tempfile() glmm.gei(model0, interaction='sex', geno.file = infile, outfile = bgen_outfile, bgen.samplefile = samplefile) unlink(bgen_outfile)
Use a glmmkin class object from the null GLMM to perform meta-analysis of single variant main effect score test, gene-environment interaction test, or joint test for association with genotypes in a GDS file .gds.
glmm.gei.meta(files, outfile, interaction, SNPID = rep("SNPID", length(files)), CHR = rep("CHR", length(files)), POS = rep("POS", length(files)), Non_Effect_Allele = rep("Non_Effect_Allele", length(files)), Effect_Allele = rep("Effect_Allele", length(files)))
glmm.gei.meta(files, outfile, interaction, SNPID = rep("SNPID", length(files)), CHR = rep("CHR", length(files)), POS = rep("POS", length(files)), Non_Effect_Allele = rep("Non_Effect_Allele", length(files)), Effect_Allele = rep("Effect_Allele", length(files)))
files |
tab or space delimited plain text files (or compressed files that can be recognized by the R function read.table) with at least the following columns: SNPID, CHR, POS, Non_Effect_Allele, Effect_Allele, N_Samples, AF, Beta_Marginal, SE_Beta_Marginal, P_Value_Marginal, Beta_G, Beta_G_sex, SE_Beta_G, SE_Beta_G_sex, Cov_Beta_G_G.sex, P_Value_Interaction, P_Value_Joint. Generally, if each study performs score tests using genotypes in PLINK binary PED format or GDS format, the score test output from glmm.score can be directly used as input files. |
outfile |
the output file name. |
interaction |
a numeric or a character vector indicating the environmental factors. If a numeric vector, it represents which indices in the order of covariates are the environmental factors; if a character vector, it represents the variable names of the environmental factors. |
SNPID |
a character vector of SNPID column names in each input file. The length and order must match the length and order of |
CHR |
a character vector of CHR column names in each input file. The length and order must match the length and order of |
POS |
a character vector of POS column names in each input file. The length and order must match the length and order of |
Non_Effect_Allele |
a character vector of Non_Effect_Allele column names in each input file. The length and order must match the length and order of |
Effect_Allele |
a character vector of Effect_Allele column names in each input file. The length and order must match the length and order of |
a data frame containing the following:
SNPID |
SNP name. |
CHR |
chromosome. |
POS |
physical position. |
Non_Effect_Allele |
non_effect allele frequency. |
Effect_Allele |
effect allele frequency. |
N_Samples |
number of samples. |
AF |
allele frequency. |
Beta_Marginal |
coefficient estimate for the marginal genetic effect. |
SE_Beta_Marginal |
standard error of the marginal genetic effect. |
P_Value_Marginal |
marginal effect score test p-value. |
Beta_G |
coefficient estimate for the genetic main effect. |
Beta_G-* |
coefficient estimate for the interaction terms. |
SE_Beta_G |
model-based standard error associated with the the genetic main effect. |
SE_Beta_G-* |
mdel-based standard error associated with any GxE or interaction covariate terms. |
Cov_Beta_G_G-* |
model-based covariance between the genetic main effect and any GxE or interaction covariate terms. |
P_Value_Interaction |
gene-environment interaction test p-value. |
P_Value_Joint |
joint test p-value. |
Xinyu Wang, Han Chen, Duy Pham
Chen, H., Wang, C., Conomos, M.P., Stilp, A.M., Li, Z., Sofer, T., Szpiro, A.A., Chen, W., Brehm, J.M., Celedón, J.C., Redline, S., Papanicolaou, G.J., Thornton, T.A., Laurie, C.C., Rice, K. and Lin, X. (2016) Control forpopulation structure and relatedness for binary traits in genetic association studies via logistic mixed models. The American Journal of Human Genetics 98, 653-666.
infile1 <- system.file("extdata", "meta1.txt", package = "MAGEE") infile2 <- system.file("extdata", "meta2.txt", package = "MAGEE") infile3 <- system.file("extdata", "meta3.txt", package = "MAGEE") infile4 <- system.file("extdata", "meta4.txt", package = "MAGEE") infile5 <- system.file("extdata", "meta5.txt", package = "MAGEE") outfile <- tempfile() glmm.gei.meta(files = c(infile1, infile2, infile3, infile4, infile5), outfile = outfile, interaction="sex") unlink(outfile)
infile1 <- system.file("extdata", "meta1.txt", package = "MAGEE") infile2 <- system.file("extdata", "meta2.txt", package = "MAGEE") infile3 <- system.file("extdata", "meta3.txt", package = "MAGEE") infile4 <- system.file("extdata", "meta4.txt", package = "MAGEE") infile5 <- system.file("extdata", "meta5.txt", package = "MAGEE") outfile <- tempfile() glmm.gei.meta(files = c(infile1, infile2, infile3, infile4, infile5), outfile = outfile, interaction="sex") unlink(outfile)
Use a glmmkin class object from the null GLMM to perform variant set-based main effect tests, gene-environment interaction tests, and joint tests for association with genotypes in a GDS file (.gds). 7 user-defined tests are included: Main effect variance component test (MV), Main effect hybrid test of burden and variance component test using Fisher's method (MF), Interaction variance component test (IV), Interaction hybrid test of burden and variance component test using Fisher's method (IF), Joint variance component test (JV), Joint hybrid test of burden and variance component test using Fisher's method (JF), and Joint hybrid test of burden and variance component test using double Fisher's procedures (JD).
MAGEE(null.obj, interaction, geno.file, group.file, group.file.sep = "\t", bgen.samplefile = NULL, interaction.covariates = NULL, meta.file.prefix = NULL, MAF.range = c(1e-7, 0.5), AF.strata.range = c(0, 1), MAF.weights.beta = c(1, 25), miss.cutoff = 1, missing.method = "impute2mean", method = "davies", tests = "JF", use.minor.allele = FALSE, auto.flip = FALSE, Garbage.Collection = FALSE, is.dosage = FALSE, ncores = 1) MAGEE.prep(null.obj, interaction, geno.file, group.file, interaction.covariates = NULL, group.file.sep = "\t", auto.flip = FALSE) MAGEE.lowmem(MAGEE.prep.obj, geno.file = NULL, meta.file.prefix = NULL, MAF.range = c(1e-7, 0.5), AF.strata.range = c(0, 1), MAF.weights.beta = c(1, 25), miss.cutoff = 1, missing.method = "impute2mean", method = "davies", tests = "JF", use.minor.allele = FALSE, Garbage.Collection = FALSE, is.dosage = FALSE, ncores = 1)
MAGEE(null.obj, interaction, geno.file, group.file, group.file.sep = "\t", bgen.samplefile = NULL, interaction.covariates = NULL, meta.file.prefix = NULL, MAF.range = c(1e-7, 0.5), AF.strata.range = c(0, 1), MAF.weights.beta = c(1, 25), miss.cutoff = 1, missing.method = "impute2mean", method = "davies", tests = "JF", use.minor.allele = FALSE, auto.flip = FALSE, Garbage.Collection = FALSE, is.dosage = FALSE, ncores = 1) MAGEE.prep(null.obj, interaction, geno.file, group.file, interaction.covariates = NULL, group.file.sep = "\t", auto.flip = FALSE) MAGEE.lowmem(MAGEE.prep.obj, geno.file = NULL, meta.file.prefix = NULL, MAF.range = c(1e-7, 0.5), AF.strata.range = c(0, 1), MAF.weights.beta = c(1, 25), miss.cutoff = 1, missing.method = "impute2mean", method = "davies", tests = "JF", use.minor.allele = FALSE, Garbage.Collection = FALSE, is.dosage = FALSE, ncores = 1)
null.obj |
a class glmmkin object, returned by fitting the null GLMM using |
interaction |
a numeric or a character vector indicating the environmental factors. If a numberic vector, it represents which indices in the order of covariates are the environmental factors; if a character vector, it represents the variable names of the environmental factors. |
geno.file |
a .gds file for the full genotypes. The |
group.file |
a plain text file with 6 columns defining the test units. There should be no headers in the file, and the columns are group name, chromosome, position, reference allele, alternative allele and weight, respectively. |
group.file.sep |
the delimiter in group.file (default = |
bgen.samplefile |
path to the BGEN .sample file. Required when the BGEN file does not contain sample identifiers. |
interaction.covariates |
a numeric or a character vector indicating the interaction covariates. If a numeric vector, it represents which indices in the order of covariates are the interaction covariates; if a character vector, it represents the variable names of the interaction covariates. |
meta.file.prefix |
the prefix for meta-analysis (default = |
MAF.range |
a numeric vector of length 2 defining the minimum and maximum minor allele frequencies of variants that should be included in the analysis (default = c(1e-7, 0.5)). |
AF.strata.range |
a numeric vector of length 2 defining the minimum and maximum coding allele frequencies of variants in each stratum that should be included in the analysis, if the environmental factor is categorical (default = c(0, 1)). |
MAF.weights.beta |
a numeric vector of length 2 defining the beta probability density function parameters on the minor allele frequencies. This internal minor allele frequency weight is multiplied by the external weight given by the group.file. To turn off internal minor allele frequency weight and only use the external weight given by the group.file, use c(1, 1) to assign flat weights (default = c(1, 25)). |
miss.cutoff |
the maximum missing rate allowed for a variant to be included (default = 1, including all variants). |
missing.method |
method of handling missing genotypes. Either "impute2mean" or "impute2zero" (default = "impute2mean"). |
method |
a method to compute p-values for the test statistics (default = "davies"). "davies" represents an exact method that computes a p-value by inverting the characteristic function of the mixture chisq distribution, with an accuracy of 1e-6. When "davies" p-value is less than 1e-5, it defaults to method "kuonen". "kuonen" represents a saddlepoint approximation method that computes the tail probabilities of the mixture chisq distribution. When "kuonen" fails to compute a p-value, it defaults to method "liu". "liu" is a moment-matching approximation method for the mixture chisq distribution. |
tests |
a character vector indicating which MAGEE tests should be performed ("MV" for the main effect variance component test, "MF" for the main effect combined test of the burden and variance component tests using Fisher's method, "IV" for the interaction variance component test, "IF" for the interaction combined test of the burden and variance component tests using Fisher's method, "JV" for the joint variance component test for main effect and interaction, "JF" for the joint combined test of the burden and variance component tests for main effect and interaction using Fisher's method, or "JD" for the joint combined test of the burden and variance component tests for main effect and interaction using double Fisher's method.). The "MV" and "IV" test are automatically included when performing "JV", and the "MF" and "IF" test are automatically included when performing "JF" or "JD" (default = "JF"). |
use.minor.allele |
a logical switch for whether to use the minor allele (instead of the alt allele) as the coding allele (default = FALSE). It does not change "MV", "IV", and "JV" results, but "MF", "IF", "JF", and "JD" results will be affected.Along with the MAF filter, this option is useful for combining rare mutations, assuming rare allele effects are in the same direction. |
auto.flip |
a logical switch for whether to enable automatic allele flipping if a variant with alleles ref/alt is not found at a position, but a variant at the same position with alleles alt/ref is found (default = FALSE). Use with caution for whole genome sequence data, as both ref/alt and alt/ref variants at the same position are not uncommon, and they are likely two different variants, rather than allele flipping. |
Garbage.Collection |
a logical switch for whether to enable garbage collection in each test (default = FALSE). Pay for memory efficiency with slower computation speed. |
is.dosage |
a logical switch (default = FALSE). |
ncores |
a positive integer indicating the number of cores to be used in parallel computing (default = 1). |
MAGEE.prep.obj |
a class MAGEE.prep object, returned by |
a data frame with the following components:
group |
name of the test unit group. |
n.variants |
number of variants in the test unit group that pass the missing rate and allele frequency filters. |
miss.min |
minimum missing rate for variants in the test unit group. |
miss.mean |
mean missing rate for variants in the test unit group. |
miss.max |
maximum missing rate for variants in the test unit group. |
freq.min |
minimum coding allele frequency for variants in the test unit group. |
freq.mean |
mean coding allele frequency for variants in the test unit group. |
freq.max |
maximum coding allele frequency for variants in the test unit group. |
freq.strata.min |
minimum coding allele frequency of each stratum if the environmental factor is categorical. |
freq.strata.max |
maximum coding allele frequency of each stratum if the environmental factor is categorical. |
MV.pval |
MV test p-value. |
MF.pval |
MF test p-value. |
IV.pval |
IV test p-value. |
IF.pval |
IF test p-value. |
JV.pval |
JV test p-value. |
JF.pval |
JF test p-value. |
JD.pval |
JD test p-value. |
Xinyu Wang, Han Chen, Duy Pham
Wang, X., Lim, E., Liu, C, Sung, Y.J., Rao, D.C., Morrison, A.C., Boerwinkle, E., Manning, A. K., and Chen, H. (2020) Efficient gene-environment interaction tests for large biobank-scale sequencing studies. Genetic Epidemiology, 44(8): 908-923. Chen, H., Huffman, J.E., Brody, J.A., Wang, C., Lee, S., Li, Z., Gogarten, S.M., Sofer, T., Bielak, L.F., Bis, J.C., et al. (2019) Efficient variant set mixed model association tests for continuous and binary traits in large-scale whole-genome sequencing studies. The American Journal of Human Genetics, 104 (2): 260-274.
if(requireNamespace("SeqArray", quietly = TRUE) && requireNamespace("SeqVarTools", quietly = TRUE)) { library(GMMAT) data(example) attach(example) model0 <- glmmkin(disease ~ age + sex, data = pheno, kins = GRM, id = "id", family = binomial(link = "logit")) geno.file <- system.file("extdata", "geno.gds", package = "MAGEE") group.file <- system.file("extdata", "SetID.withweights.txt", package = "MAGEE") out <- MAGEE(model0, interaction='sex', geno.file, group.file, group.file.sep = "\t", tests=c("JV", "JF", "JD")) print(out) }
if(requireNamespace("SeqArray", quietly = TRUE) && requireNamespace("SeqVarTools", quietly = TRUE)) { library(GMMAT) data(example) attach(example) model0 <- glmmkin(disease ~ age + sex, data = pheno, kins = GRM, id = "id", family = binomial(link = "logit")) geno.file <- system.file("extdata", "geno.gds", package = "MAGEE") group.file <- system.file("extdata", "SetID.withweights.txt", package = "MAGEE") out <- MAGEE(model0, interaction='sex', geno.file, group.file, group.file.sep = "\t", tests=c("JV", "JF", "JD")) print(out) }
Use a glmmkin class object from the null GLMM to perform meta-analysis of variant set-based main effect tests, gene-environment interaction tests, and joint tests for association with genotypes in a GDS file (.gds).
MAGEE.meta(meta.files.prefix, n.files = rep(1, length(meta.files.prefix)), cohort.group.idx = NULL, group.file, group.file.sep = "\t", E.match=NULL, MAF.range = c(1e-7, 0.5), MAF.weights.beta = c(1, 25), miss.cutoff = 1, method = "davies", tests = "JF", use.minor.allele = FALSE)
MAGEE.meta(meta.files.prefix, n.files = rep(1, length(meta.files.prefix)), cohort.group.idx = NULL, group.file, group.file.sep = "\t", E.match=NULL, MAF.range = c(1e-7, 0.5), MAF.weights.beta = c(1, 25), miss.cutoff = 1, method = "davies", tests = "JF", use.minor.allele = FALSE)
meta.files.prefix |
a character vector for prefix of intermediate files (.score.* and .var.*) required in a meta-analysis. Each element represents the prefix of .score.* and .var.* from one cohort. The length of vector should be equal to the number of cohorts. |
n.files |
an integer vector with the same length as meta.files.prefix, indicating how many sets of intermediate files (.score.* and .var.*) are expected from each cohort, usually as the result of multi-threading in creating the intermediate files (default = rep(1, length(meta.files.prefix))). |
cohort.group.idx |
a vector with the same length as meta.files.prefix, indicating which cohorts should be grouped together in the meta-analysis assuming homogeneous genetic effects. For example, c("a","b","a","a","b") means cohorts 1, 3, 4 are assumed to have homogeneous genetic effects, and cohorts 2, 5 are in another group with homogeneous genetic effects (but possibly heterogeneous with group "a"). If NULL, all cohorts are in the same group (default = NULL). |
group.file |
a plain text file with 6 columns defining the test units. There should be no headers in the file, and the columns are group name, chromosome, position, reference allele, alternative allele and weight, respectively. |
group.file.sep |
the delimiter in group.file (default = "\t"). |
E.match |
environmental factors that match the interactions (default = "\t"). |
MAF.range |
a numeric vector of length 2 defining the minimum and maximum minor allele frequencies of variants that should be included in the analysis (default = c(1e-7, 0.5)). Filter applied to the combined samples. |
MAF.weights.beta |
a numeric vector of length 2 defining the beta probability density function parameters on the minor allele frequencies. This internal minor allele frequency weight is multiplied by the external weight given by the group.file. To turn off internal minor allele frequency weight and only use the external weight given by the group.file, use c(1, 1) to assign flat weights (default = c(1, 25)). Applied to the combined samples. |
miss.cutoff |
the maximum missing rate allowed for a variant to be included (default = 1, including all variants). Filter applied to the combined samples. |
method |
a method to compute p-values for SKAT-type test statistics (default = "davies"). "davies" represents an exact method that computes a p-value by inverting the characteristic function of the mixture chisq distribution, with an accuracy of 1e-6. When "davies" p-value is less than 1e-5, it defaults to method "kuonen". "kuonen" represents a saddlepoint approximation method that computes the tail probabilities of the mixture chisq distribution. When "kuonen" fails to compute a p-value, it defaults to method "liu". "liu" is a moment-matching approximation method for the mixture chisq distribution. |
tests |
a character vector indicating which MAGEE tests should be performed ("MV" for the main effect variance component test, "MF" for the main effect combined test of the burden and variance component tests using Fisher's method, "IV" for the interaction variance component test, "IF" for the interaction combined test of the burden and variance component tests using Fisher's method, "JV" for the joint variance component test for main effect and interaction, "JF" for the joint combined test of the burden and variance component tests for main effect and interaction using Fisher's method, or "JD" for the joint combined test of the burden and variance component tests for main effect and interaction using double Fisher's method.). The "MV" and "IV" test are automatically included when performing "JV", and the "MF" and "IF" test are automatically included when performing "JF" or "JD" (default = "JF"). |
use.minor.allele |
a logical switch for whether to use the minor allele (instead of the alt allele) as the coding allele (default = FALSE). It does not change "MV", "IV", and "JV" results, but "MF", "IF", "JF", and "JD" results will be affected. Along with the MAF filter, this option is useful for combining rare mutations, assuming rare allele effects are in the same direction. |
group |
name of the test unit group. |
n.variants |
number of variants in the test unit group that pass the missing rate and allele frequency filters. |
MV.pval |
MV test p-value. |
MF.pval |
MF test p-value. |
IV.pval |
IV test p-value. |
IF.pval |
IF test p-value. |
JV.pval |
JV test p-value. |
JF.pval |
JF test p-value. |
JD.pval |
JD test p-value. |
Xinyu Wang, Han Chen, Duy Pham
Wang, X., Lim, E., Liu, C, Sung, Y.J., Rao, D.C., Morrison, A.C., Boerwinkle, E., Manning, A. K., and Chen, H. (2020) Efficient gene-environment interaction tests for large biobank-scale sequencing studies. Genetic Epidemiology, 44(8): 908-923. Chen, H., Huffman, J.E., Brody, J.A., Wang, C., Lee, S., Li, Z., Gogarten, S.M., Sofer, T., Bielak, L.F., Bis, J.C., et al. (2019) Efficient variant set mixed model association tests for continuous and binary traits in large-scale whole-genome sequencing studies. The American Journal of Human Genetics, 104 (2): 260-274. Wang, X., Pham, D.T., Westerman, K.E., Pan, C., Manning, A.K., Chen, H., (2022) Genomic summary statistics and meta-analysis for set-based gene-environment interaction tests in large-scale sequencing studies. medrxiv.
if(requireNamespace("SeqArray", quietly = TRUE) && requireNamespace("SeqVarTools", quietly = TRUE)) { library(GMMAT) data(example) attach(example) model0 <- glmmkin(disease ~ age + sex, data = pheno, kins = GRM, id = "id", family = binomial(link = "logit")) geno.file <- system.file("extdata", "geno.gds", package = "MAGEE") group.file <- system.file("extdata", "SetID.withweights.txt", package = "MAGEE") meta.file.prefix <- tempfile() out <- MAGEE(model0, interaction="sex",geno.file=geno.file, group.file=group.file, meta.file.prefix = meta.file.prefix, MAF.range = c(0, 0.5), miss.cutoff = 1, method = "davies", tests = c("JV","JF","JD")) print(out) out1 <- MAGEE.meta(meta.file.prefix, group.file = group.file, tests = c("JV","JF","JD")) print(out1) unlink(paste0(meta.file.prefix, c(".score", ".cov"), ".1")) }
if(requireNamespace("SeqArray", quietly = TRUE) && requireNamespace("SeqVarTools", quietly = TRUE)) { library(GMMAT) data(example) attach(example) model0 <- glmmkin(disease ~ age + sex, data = pheno, kins = GRM, id = "id", family = binomial(link = "logit")) geno.file <- system.file("extdata", "geno.gds", package = "MAGEE") group.file <- system.file("extdata", "SetID.withweights.txt", package = "MAGEE") meta.file.prefix <- tempfile() out <- MAGEE(model0, interaction="sex",geno.file=geno.file, group.file=group.file, meta.file.prefix = meta.file.prefix, MAF.range = c(0, 0.5), miss.cutoff = 1, method = "davies", tests = c("JV","JF","JD")) print(out) out1 <- MAGEE.meta(meta.file.prefix, group.file = group.file, tests = c("JV","JF","JD")) print(out1) unlink(paste0(meta.file.prefix, c(".score", ".cov"), ".1")) }