| Title: | Genome-Wide Robust Analysis for Biobank Data (GRAB) |
|---|---|
| Description: | Provides a comprehensive suite of genome-wide association study (GWAS) methods specifically designed for biobank-scale data, including but not limited to, robust approaches for time-to-event traits (Li et al., 2025 <doi:10.1038/s43588-025-00864-z>) and ordinal categorical traits (Bi et al., 2021 <doi:10.1016/j.ajhg.2021.03.019>). The package also offers general frameworks for GWAS of any trait type (Bi et al., 2020 <doi:10.1016/j.ajhg.2020.06.003>), while accounting for sample relatedness (Xu et al., 2025 <doi:10.1038/s41467-025-56669-1>) or population structure (Ma et al., 2025 <doi:10.1186/s13059-025-03827-9>). By accurately approximating score statistic distributions using saddlepoint approximation (SPA), these methods can effectively control type I error rates for rare variants and in the presence of unbalanced phenotype distributions. Additionally, the package includes functions for simulating genotype and phenotype data to support research and method development. |
| Authors: | Wenjian Bi [aut], Wei Zhou [aut], Rounak Dey [aut], Zhangchen Zhao [aut], Seunggeun Lee [aut], Woody Miao [cre] |
| Maintainer: | Woody Miao <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 0.2.4 |
| Built: | 2026-06-03 09:06:54 UTC |
| Source: | https://github.com/cran/GRAB |
Combines multiple p-values using the Cauchy distribution method, which provides analytical p-value calculation under arbitrary dependency structures.
CCT(pvals, weights = NULL)CCT(pvals, weights = NULL)
pvals |
Numeric vector of p-values to combine (each between 0 and 1). P-values equal to 1 are automatically adjusted to 0.999. P-values equal to 0 will cause an error. |
weights |
Numeric vector of non-negative weights for each p-value. If NULL, equal weights are used. Must have same length as pvals. |
The Cauchy Combination Test (CCT) transforms p-values using the inverse Cauchy distribution and combines them with specified weights. This method is particularly powerful because it:
Works under arbitrary dependency structures
Provides exact analytical p-values (no simulation needed)
Maintains good power properties across different scenarios
Special Cases:
If any p-value equals 0, returns 0 immediately
P-values equal to 1 are adjusted to 0.999 with a warning
Very small p-values (< 1e-16) receive special numerical treatment
Single aggregated p-value combining all input p-values.
Liu, Y., & Xie, J. (2020). Cauchy combination test: a powerful test with analytic p-value calculation under arbitrary dependency structures. Journal of the American Statistical Association, 115(529), 393-402. doi:10.1080/01621459.2018.1554485
# Basic usage with equal weights pvalues <- c(0.02, 0.0004, 0.2, 0.1, 0.8) CCT(pvals = pvalues) # Usage with custom weights weights <- c(2, 3, 1, 1, 1) CCT(pvals = pvalues, weights = weights)# Basic usage with equal weights pvalues <- c(0.02, 0.0004, 0.2, 0.1, 0.8) CCT(pvals = pvalues) # Usage with custom weights weights <- c(2, 3, 1, 1, 1) CCT(pvals = pvalues, weights = weights)
This function calculates pairwise IBD probabilities for related samples using PLINK genotype data and sparse GRM. It follows the getSparseGRM() function workflow.
getPairwiseIBD( PlinkPrefix, SparseGRMFile, PairwiseIBDOutput, frqFile = NULL, tempDir = NULL, maxSampleNums = 2500, minMafIBD = 0.01, rm.tempFile = FALSE )getPairwiseIBD( PlinkPrefix, SparseGRMFile, PairwiseIBDOutput, frqFile = NULL, tempDir = NULL, maxSampleNums = 2500, minMafIBD = 0.01, rm.tempFile = FALSE )
PlinkPrefix |
Character. Path to PLINK file (without file extensions .bed/.bim/.fam). |
SparseGRMFile |
Character. Path to sparse GRM file from getSparseGRM() function. |
PairwiseIBDOutput |
Character. Output path to save pairwise IBD results. |
frqFile |
Character. Path to frequency file corresponding to PLINK file. If NULL (default), uses PlinkPrefix.frq. |
tempDir |
Character. Directory to save temporary files. If NULL (default), uses tempdir(). |
maxSampleNums |
Integer. Maximum number of subjects' genotypes to read for analysis (default: 2500). |
minMafIBD |
Numeric. Minimum MAF cutoff to select markers (default: 0.01). |
rm.tempFile |
Logical. Whether to delete temporary files (default: FALSE). |
Character. Message indicating where the pairwise IBD results have been stored.
PlinkPrefix <- file.path(system.file(package = "GRAB"), "extdata", "simuPLINK") SparseGRMFile <- system.file("extdata", "SparseGRM.txt", package = "GRAB") PairwiseIBDOutput <- file.path(tempdir(), "PairwiseIBD.txt") getPairwiseIBD(PlinkPrefix, SparseGRMFile, PairwiseIBDOutput)PlinkPrefix <- file.path(system.file(package = "GRAB"), "extdata", "simuPLINK") SparseGRMFile <- system.file("extdata", "SparseGRM.txt", package = "GRAB") PairwiseIBDOutput <- file.path(tempdir(), "PairwiseIBD.txt") getPairwiseIBD(PlinkPrefix, SparseGRMFile, PairwiseIBDOutput)
SparseGRMFile for GRAB.NullModel.If the sample size in analysis is greater than 100,000, we recommend using sparse GRM
(instead of dense GRM) to adjust for sample relatedness.
This function is to use GCTA (link)
to make a SparseGRMFile to be passed to function GRAB.NullModel.
This function can only support Linux and PLINK files as required by
GCTA software. To make a SparseGRMFile, two steps are needed.
Please check Details section for more details.
getSparseGRM( PlinkPrefix, nPartsGRM, SparseGRMFile, tempDir = NULL, relatednessCutoff = 0.05, minMafGRM = 0.01, maxMissingGRM = 0.1, rm.tempFiles = FALSE )getSparseGRM( PlinkPrefix, nPartsGRM, SparseGRMFile, tempDir = NULL, relatednessCutoff = 0.05, minMafGRM = 0.01, maxMissingGRM = 0.1, rm.tempFiles = FALSE )
PlinkPrefix |
a path to PLINK binary files (without file extension).
Note that the current version (gcta_1.93.1beta) of |
nPartsGRM |
a numeric value (e.g. 250): |
SparseGRMFile |
a path to file of output to be passed to |
tempDir |
a path to store temp files from |
relatednessCutoff |
a cutoff for sparse GRM, only kinship coefficient greater than this cutoff will be retained in sparse GRM. (default=0.05) |
minMafGRM |
Minimal value of MAF cutoff to select markers (from PLINK files) to make sparse GRM. (default=0.01) |
maxMissingGRM |
Maximal value of missing rate to select markers (from PLINK files) to make sparse GRM. (default=0.1) |
rm.tempFiles |
a logical value indicating if the temp files generated in
|
Step 1: Run getTempFilesFullGRM to save temporary files
to tempDir.
Step 2: Run getSparseGRM to combine the temporary files to make
a SparseGRMFile to be passed to function GRAB.NullModel.
Users can customize parameters including (minMafGRM, maxMissingGRM, nPartsGRM),
but functions getTempFilesFullGRM and getSparseGRM should use the same ones.
Otherwise, package GRAB cannot accurately identify temporary files.
A character string containing a message with the path to the output file where the sparse Genetic Relationship Matrix (SparseGRM) has been stored.
# Input data (We recommend setting nPartsGRM=250 for UKBB with N=500K):
GenoFile = system.file("extdata", "simuPLINK.bed", package = "GRAB")
PlinkPrefix = tools::file_path_sans_ext(GenoFile)
nPartsGRM = 2
# For Linux, get the file path of gcta64 by which command:
gcta64File <- system("which gcta64", intern = TRUE)
# For Windows, set the file path directly:
gcta64File <- "C:\\path\\to\\gcta64.exe"
# The temp outputs (may be large) will be in tempdir() by default:
for(partParallel in 1:nPartsGRM) getTempFilesFullGRM(PlinkPrefix, nPartsGRM,
partParallel, gcta64File)
tempDir = tempdir()
SparseGRMFile = file.path(tempDir, "SparseGRM.txt")
getSparseGRM(PlinkPrefix, nPartsGRM, SparseGRMFile)
getSparseGRM.Make temporary files to be passed to function getSparseGRM.
We strongly suggest using parallel computing for different partParallel.
getTempFilesFullGRM( PlinkPrefix, nPartsGRM, partParallel, gcta64File, tempDir = NULL, subjData = NULL, minMafGRM = 0.01, maxMissingGRM = 0.1, threadNum = 8 )getTempFilesFullGRM( PlinkPrefix, nPartsGRM, partParallel, gcta64File, tempDir = NULL, subjData = NULL, minMafGRM = 0.01, maxMissingGRM = 0.1, threadNum = 8 )
PlinkPrefix |
a path to PLINK files (without file extensions of bed/bim/fam). Note that the current version (gcta_1.93.1beta) of gcta software does not support different prefix names for bim, bed, and fam files. |
nPartsGRM |
a numeric value (e.g. 250): |
partParallel |
a numeric value (from 1 to |
gcta64File |
a path to |
tempDir |
a path to store temp files to be passed to |
subjData |
a character vector to specify subject IDs to retain (i.e. IID).
Default is |
minMafGRM |
Minimal value of MAF cutoff to select markers (from PLINK files) to make sparse GRM. (default=0.01) |
maxMissingGRM |
Maximal value of missing rate to select markers (from PLINK files) to make sparse GRM. (default=0.1) |
threadNum |
Number of threads (CPUs) to use. |
Step 1: Run getTempFilesFullGRM to get temporary files.
Step 2: Run getSparseGRM to combine the temporary files
to make a SparseGRMFile to be passed to GRAB.NullModel.
A character string message indicating the completion status and location of the temporary files.
## Please check help(getSparseGRM) for an example.## Please check help(getSparseGRM) for an example.
This function shares input as in function GRAB.ReadGeno, please
check ?GRAB.ReadGeno for more details.
GRAB.getGenoInfo( GenoFile, GenoFileIndex = NULL, SampleIDs = NULL, control = NULL )GRAB.getGenoInfo( GenoFile, GenoFileIndex = NULL, SampleIDs = NULL, control = NULL )
GenoFile |
a character of genotype file. See |
GenoFileIndex |
additional index file(s) corresponding to
|
SampleIDs |
a character vector of sample IDs to extract. The default
is |
control |
List of control parameters with the following options:
|
A data frame containing marker information with allele frequencies and missing rates. The data frame includes columns from marker information (CHROM, POS, ID, REF, ALT, etc.) plus additional columns:
Alternative allele frequency (before genotype imputation)
Missing rate for each marker
Converts a numeric genotype matrix to PLINK text files (PED and MAP format) for use with PLINK software and other genetic analysis tools.
GRAB.makePlink( GenoMat, OutputPrefix, A1 = "G", A2 = "A", CHR = NULL, BP = NULL, Pheno = NULL, Sex = NULL )GRAB.makePlink( GenoMat, OutputPrefix, A1 = "G", A2 = "A", CHR = NULL, BP = NULL, Pheno = NULL, Sex = NULL )
GenoMat |
Numeric genotype matrix (n×m) with values 0, 1, 2, or -9. Rows = subjects, columns = markers. Row and column names are required. |
OutputPrefix |
Output file prefix including path (without extension). |
A1 |
Allele 1 character, usually minor/ALT allele (default: "G"). |
A2 |
Allele 2 character, usually major/REF allele (default: "A"). |
CHR |
Chromosome numbers for markers (default: all chromosome 1). |
BP |
Base positions for markers (default: 1:m). |
Pheno |
Phenotype values for subjects (default: all missing as -9). |
Sex |
Sex codes for subjects (default: all coded as 1). |
Genotype Encoding:
0, 1, 2 → copies of minor allele
-9 → missing genotype (coded as "00" in PED)
A1="G", A2="A": 0→"GG", 1→"AG", 2→"AA", -9→"00"
Output Files:
.ped: Pedigree file with genotype data
.map: Marker map file with positions
Downstream Processing:
# Convert to binary format plink --file prefix --make-bed --out prefix # Convert to raw format plink --bfile prefix --recode A --out prefix_raw # Convert to BGEN format plink2 --bfile prefix --export bgen-1.2 bits=8 ref-first --out prefix_bgen # Create BGEN index bgenix -g prefix_bgen.bgen --index
Character message confirming file creation location.
### Step 1: simulate a numeric genotype matrix n <- 1000 m <- 20 MAF <- 0.3 set.seed(123) GenoMat <- matrix(rbinom(n * m, 2, MAF), n, m) rownames(GenoMat) <- paste0("Subj-", 1:n) colnames(GenoMat) <- paste0("SNP-", 1:m) OutputDir <- tempdir() outputPrefix <- file.path(OutputDir, "simuPLINK") ### Step 2(a): make PLINK files without missing genotype GRAB.makePlink(GenoMat, outputPrefix) ### Step 2(b): make PLINK files with genotype missing rate of 0.1 indexMissing <- sample(n * m, 0.1 * n * m) GenoMat[indexMissing] <- -9 GRAB.makePlink(GenoMat, outputPrefix) ## The following are in shell environment # plink --file simuPLINK --make-bed --out simuPLINK # plink --bfile simuPLINK --recode A --out simuRAW # plink2 --bfile simuPLINK --export bgen-1.2 bits=8 ref-first --out simuBGEN # UK Biobank use 'ref-first' # bgenix -g simuBGEN.bgen --index### Step 1: simulate a numeric genotype matrix n <- 1000 m <- 20 MAF <- 0.3 set.seed(123) GenoMat <- matrix(rbinom(n * m, 2, MAF), n, m) rownames(GenoMat) <- paste0("Subj-", 1:n) colnames(GenoMat) <- paste0("SNP-", 1:m) OutputDir <- tempdir() outputPrefix <- file.path(OutputDir, "simuPLINK") ### Step 2(a): make PLINK files without missing genotype GRAB.makePlink(GenoMat, outputPrefix) ### Step 2(b): make PLINK files with genotype missing rate of 0.1 indexMissing <- sample(n * m, 0.1 * n * m) GenoMat[indexMissing] <- -9 GRAB.makePlink(GenoMat, outputPrefix) ## The following are in shell environment # plink --file simuPLINK --make-bed --out simuPLINK # plink --bfile simuPLINK --recode A --out simuRAW # plink2 --bfile simuPLINK --export bgen-1.2 bits=8 ref-first --out simuBGEN # UK Biobank use 'ref-first' # bgenix -g simuBGEN.bgen --index
Conducts single-marker association tests between genetic variants and phenotypes using various statistical methods supported by GRAB.
GRAB.Marker( objNull, GenoFile, OutputFile, GenoFileIndex = NULL, OutputFileIndex = NULL, control = NULL )GRAB.Marker( objNull, GenoFile, OutputFile, GenoFileIndex = NULL, OutputFileIndex = NULL, control = NULL )
objNull |
(S3 object) Null model object from
|
GenoFile |
Path to genotype file. Supported formats determined by extension:
|
OutputFile |
(character) Path for saving association test results. |
GenoFileIndex |
(character vector or NULL) Associated files for the genotype file (auto-detected if NULL):
|
OutputFileIndex |
(character or NULL) #' Path to the progress tracking file from a previous unfinished run.
Enables analysis to restart if interrupted. If |
control |
(list or NULL) List of control parameters with the following elements:
|
The function returns NULL invisibly. Results are written to OutputFile.
For method-specific examples and output columns and format, see:
POLMM method: GRAB.POLMM
SPACox method: GRAB.SPACox
SPAmix method: GRAB.SPAmix
WtCoxG method: GRAB.WtCoxG
SPAGRM method: GRAB.SPAGRM
SAGELD method: GRAB.SAGELD
GRAB performs two-step genetic association testing. This function
implements the first step: fitting a null model and preparing the dataset required for
downstream marker-level (GRAB.Marker) and region-level (GRAB.Region) analyses.
GRAB.NullModel( formula, data, subjIDcol = NULL, subjData = NULL, method, traitType, GenoFile = NULL, GenoFileIndex = NULL, SparseGRMFile = NULL, control = NULL, ... )GRAB.NullModel( formula, data, subjIDcol = NULL, subjData = NULL, method, traitType, GenoFile = NULL, GenoFileIndex = NULL, SparseGRMFile = NULL, control = NULL, ... )
formula |
(formula) Formula with response variable(s) on the left and covariates on the right. Do not include an intercept (added automatically). For SPAmix with traitType "Residual", multiple response variables (separated by "+") are supported. |
data |
(data.frame) Data frame containing response variables and covariates in the formula.
Parameter "subset" is deprecated. All subjects with phenotype data will be used. Missing values
should be coded as |
subjIDcol |
(character or NULL) Column name in |
subjData |
(character vector or NULL) Subject IDs aligned with rows of |
method |
(character) Supported methods:
|
traitType |
(character) Supported: "ordinal", "time-to-event", and "Residual". |
GenoFile |
(character or NULL) Path to genotype file. Supported formats determined by extension:
|
GenoFileIndex |
(character vector or NULL) Associated files for the genotype file (auto-detected if NULL):
|
SparseGRMFile |
(character or NULL) Path to a sparse GRM file. The file must be whitespace-delimited with three columns in the order:
See |
control |
(list or NULL) List of additional, less commonly used parameters. See the corresponding method documentation for available options and defaults. |
... |
Additional method-specific parameters. |
An S3 object with class "{method}_NULL_Model". All returned objects contain the following elements:
Sample size (integer).
Character vector of subject IDs included in analysis.
Original function call.
R session and package information.
Analysis completion timestamp (character).
List of control parameters used in fitting.
This object serves as input for GRAB.Marker or GRAB.Region.
See method-specific documentation for additional elements included in the returned object.
POLMM inplements single-variant association tests for ordinal categorical phenotypes, which
accounts for sample relatedness. It can control type I error rates at a stringent significance
level regardless of the phenotypic distribution, and is more powerful than alternative methods.
This instruction covers null model fitting and marker-level analysis using POLMM.
For region-based analysis with POLMM-GENE, see GRAB.POLMM.Region.
GRAB.POLMM()GRAB.POLMM()
Genotype file: GenoFile is mandatory for GRAB.NullModel() when using POLMM.
It is required for estimating the variance ratio parameter, which is essential for calibrating
the test statistics in subsequent association tests.
Genetic Relationship Matrix (GRM) Options: POLMM supports both sparse and dense GRM for modeling genetic relatedness:
If SparseGRMFile is provided to GRAB.NullModel(),
the sparse GRM will be used in model fitting.
If SparseGRMFile is not provided, GRAB.NullModel()
will calculate a dense GRM from GenoFile.
Additional Control Parameters for GRAB.NullModel():
memoryChunk (numeric, default: 2): Memory chunk size for computation.
seed (integer, default: -1): Random seed (-1 means no seed is set).
tracenrun (integer, default: 30): Number of runs for trace calculation.
maxiter (integer, default: 100): Maximum number of iterations for model fitting.
tolBeta (numeric, default: 0.001): Convergence tolerance for beta estimates.
tolTau (numeric, default: 0.002): Convergence tolerance for tau estimates.
tau (numeric, default: 0.2): Initial variance component value.
maxiterPCG (integer, default: 100): Maximum iterations for preconditioned conjugate gradient.
tolPCG (numeric, default: 1e-6): Tolerance for preconditioned conjugate gradient.
showInfo (logical, default: FALSE): Whether to print PCG iteration information for debugging.
maxiterEps (integer, default: 100): Maximum iterations for epsilon estimation.
tolEps (numeric, default: 1e-10): Tolerance for epsilon estimation.
minMafVarRatio (numeric, default: 0.1): Minimum MAF for variance ratio estimation.
maxMissingVarRatio (numeric, default: 0.1): Maximum missing rate for variance ratio estimation.
nSNPsVarRatio (integer, default: 20): Number of SNPs used for variance ratio estimation.
CVcutoff (numeric, default: 0.0025): Coefficient of variation cutoff.
grainSize (integer, default: 1): Grain size for parallel processing.
minMafGRM (numeric, default: 0.01): Minimum MAF for GRM construction.
maxMissingGRM (numeric, default: 0.1): Maximum missing rate for GRM construction.
Method-specific elements in the POLMM_NULL_Model object returned by GRAB.NullModel()::
M: Number of ordinal categories (integer).
iter: Number of iterations to convergence (numeric).
eta: Linear predictor (matrix).
yVec: Phenotype matrix (matrix).
Cova: Design matrix of covariates (matrix).
muMat: Fitted probabilities for each category (matrix).
YMat: Indicator matrix for ordinal categories (matrix).
beta: Estimated covariate coefficients (matrix).
bVec: Random effect estimates (matrix).
tau: Variance component estimate (numeric).
eps: Cutpoints for ordinal categories (matrix).
Additional Control Parameters for GRAB.Marker():
ifOutGroup (logical, default: FALSE): Whether to output group-specific statistics
(alternative allele frequency, counts, and sample size for each ordinal category).
When TRUE, adds columns AltFreqInGroup., AltCountsInGroup., and nSamplesInGroup.* to the output file.
Marker-level results (OutputFile) columns:
Marker identifier (rsID or CHR:POS:REF:ALT).
Marker information in format CHR:POS:REF:ALT.
Alternative allele frequency in the overall sample.
Total count of alternative alleles.
Proportion of missing genotypes.
P-value from the score test.
Effect size estimate (log-odds scale).
Standard error of beta.
Z-score from the score test.
(Only if ifOutGroup = TRUE)
Alternative allele frequency in each ordinal category.
(Only if ifOutGroup = TRUE)
Alternative allele counts in each ordinal category.
(Only if ifOutGroup = TRUE) Sample size in each ordinal category.
Bi et al. (2021). Efficient mixed model approach for large-scale genome-wide association studies of ordinal categorical phenotypes. doi:10.1016/j.ajhg.2021.03.019
GenoFile <- system.file("extdata", "simuPLINK.bed", package = "GRAB") SparseGRMFile <- system.file("extdata", "SparseGRM.txt", package = "GRAB") OutputFile <- file.path(tempdir(), "resultPOLMMmarker.txt") PhenoFile <- system.file("extdata", "simuPHENO.txt", package = "GRAB") PhenoData <- data.table::fread(PhenoFile, header = TRUE) PhenoData$OrdinalPheno <- factor(PhenoData$OrdinalPheno, levels = c(0, 1, 2)) # Step 1 obj.POLMM <- GRAB.NullModel( OrdinalPheno ~ AGE + GENDER, data = PhenoData, subjIDcol = "IID", method = "POLMM", traitType = "ordinal", GenoFile = GenoFile, SparseGRMFile = SparseGRMFile ) # Step 2 GRAB.Marker(obj.POLMM, GenoFile, OutputFile, control = list(ifOutGroup = TRUE)) head(data.table::fread(OutputFile))GenoFile <- system.file("extdata", "simuPLINK.bed", package = "GRAB") SparseGRMFile <- system.file("extdata", "SparseGRM.txt", package = "GRAB") OutputFile <- file.path(tempdir(), "resultPOLMMmarker.txt") PhenoFile <- system.file("extdata", "simuPHENO.txt", package = "GRAB") PhenoData <- data.table::fread(PhenoFile, header = TRUE) PhenoData$OrdinalPheno <- factor(PhenoData$OrdinalPheno, levels = c(0, 1, 2)) # Step 1 obj.POLMM <- GRAB.NullModel( OrdinalPheno ~ AGE + GENDER, data = PhenoData, subjIDcol = "IID", method = "POLMM", traitType = "ordinal", GenoFile = GenoFile, SparseGRMFile = SparseGRMFile ) # Step 2 GRAB.Marker(obj.POLMM, GenoFile, OutputFile, control = list(ifOutGroup = TRUE)) head(data.table::fread(OutputFile))
POLMM-GENE implements region-based association tests for ordinal categorical phenotypes, adjusting for sample relatedness. It is well-suited for analyzing rare variants in large-scale biobank data, and effectively controls type I error rates while maintaining statistical power.
GRAB.POLMM.Region()GRAB.POLMM.Region()
For single-variant tests, see GRAB.POLMM.
See GRAB.POLMM for details on step 1.
Additional Control Parameters for GRAB.Region() with POLMM:
showInfo (logical, default: FALSE): Whether to print PCG iteration information for debugging.
tolPCG (numeric, default: 0.001): Tolerance for PCG in region testing.
maxiterPCG (integer, default: 100): Maximum PCG iterations in region testing.
Results are saved to four files:
OutputFile: Region-based test results (SKAT-O, SKAT, Burden p-values).
paste0(OutputFile, ".markerInfo"): Marker-level results for rare variants
(MAC >= min_mac_region) included in region tests.
paste0(OutputFile, ".otherMarkerInfo"): Information for excluded markers
(ultra-rare variants or failed QC).
paste0(OutputFile, ".infoBurdenNoWeight"): Summary statistics for burden
tests without weights.
Region-level results (OutputFile) columns:
Region identifier from GroupFile.
Number of rare variants with MAF < cutoff and MAC >= min_mac_region.
Number of ultra-rare variants with MAC < min_mac_region.
Annotation type from GroupFile.
Maximum MAF cutoff used for variant selection.
SKAT-O test p-value.
SKAT test p-value.
Burden test p-value.
Marker-level results (paste0(OutputFile, ".markerInfo")) columns:
Region identifier.
Marker identifier.
Marker information in format CHR:POS:REF:ALT.
Annotation from GroupFile.
Alternative allele frequency.
Minor allele count.
Minor allele frequency.
Proportion of missing genotypes.
Marker status indicator (1 = rare variant included, 3 = ultra-rare variant included).
Score test statistic.
Effect size estimate.
Standard error of effect size estimate.
Unadjusted p-value.
SPA-adjusted p-value.
Position row index.
Other marker info (paste0(OutputFile, ".otherMarkerInfo")) columns:
Marker identifier.
Annotation from GroupFile.
Region identifier.
Marker information in format CHR:POS:REF:ALT.
Annotation category.
Alternative allele frequency.
Minor allele count.
Minor allele frequency.
Proportion of missing genotypes.
Status indicator (0 or 2 for excluded markers).
Burden test summary (paste0(OutputFile, ".infoBurdenNoWeight")) columns:
Region identifier.
Annotation type.
Maximum MAF cutoff.
Sum of genotypes.
Score test statistic.
Effect size estimate.
Standard error of effect size estimate.
P-value for burden test.
Bi et al. (2023). Scalable mixed model methods for set-based association studies on large-scale categorical data analysis and its application to exome-sequencing data in UK Biobank. doi:10.1016/j.ajhg.2023.03.010
GenoFileStep1 <- system.file("extdata", "simuPLINK.bed", package = "GRAB") GenoFileStep2 <- system.file("extdata", "simuPLINK_RV.bed", package = "GRAB") SparseGRMFile <- system.file("extdata", "SparseGRM.txt", package = "GRAB") GroupFile <- system.file("extdata", "simuPLINK_RV.group", package = "GRAB") OutputFile <- file.path(tempdir(), "resultPOLMMregion.txt") PhenoFile <- system.file("extdata", "simuPHENO.txt", package = "GRAB") PhenoData <- data.table::fread(PhenoFile, header = TRUE) PhenoData$OrdinalPheno <- factor(PhenoData$OrdinalPheno, levels = c(0, 1, 2)) # Step 1 obj.POLMM <- GRAB.NullModel( OrdinalPheno ~ AGE + GENDER, data = PhenoData, subjIDcol = "IID", method = "POLMM", traitType = "ordinal", GenoFile = GenoFileStep1, SparseGRMFile = SparseGRMFile, control = list(tolTau = 0.2, tolBeta = 0.1) ) # Step 2 GRAB.Region(obj.POLMM, GenoFileStep2, OutputFile, GroupFile = GroupFile, SparseGRMFile = SparseGRMFile, MaxMAFVec = "0.01,0.005" ) head(data.table::fread(OutputFile)) head(data.table::fread(paste0(OutputFile, ".markerInfo"))) head(data.table::fread(paste0(OutputFile, ".otherMarkerInfo"))) head(data.table::fread(paste0(OutputFile, ".infoBurdenNoWeight")))GenoFileStep1 <- system.file("extdata", "simuPLINK.bed", package = "GRAB") GenoFileStep2 <- system.file("extdata", "simuPLINK_RV.bed", package = "GRAB") SparseGRMFile <- system.file("extdata", "SparseGRM.txt", package = "GRAB") GroupFile <- system.file("extdata", "simuPLINK_RV.group", package = "GRAB") OutputFile <- file.path(tempdir(), "resultPOLMMregion.txt") PhenoFile <- system.file("extdata", "simuPHENO.txt", package = "GRAB") PhenoData <- data.table::fread(PhenoFile, header = TRUE) PhenoData$OrdinalPheno <- factor(PhenoData$OrdinalPheno, levels = c(0, 1, 2)) # Step 1 obj.POLMM <- GRAB.NullModel( OrdinalPheno ~ AGE + GENDER, data = PhenoData, subjIDcol = "IID", method = "POLMM", traitType = "ordinal", GenoFile = GenoFileStep1, SparseGRMFile = SparseGRMFile, control = list(tolTau = 0.2, tolBeta = 0.1) ) # Step 2 GRAB.Region(obj.POLMM, GenoFileStep2, OutputFile, GroupFile = GroupFile, SparseGRMFile = SparseGRMFile, MaxMAFVec = "0.01,0.005" ) head(data.table::fread(OutputFile)) head(data.table::fread(paste0(OutputFile, ".markerInfo"))) head(data.table::fread(paste0(OutputFile, ".otherMarkerInfo"))) head(data.table::fread(paste0(OutputFile, ".infoBurdenNoWeight")))
Reads genotype data from PLINK or BGEN format files with flexible filtering and processing options. Supports efficient memory usage and various imputation methods for missing genotypes.
GRAB.ReadGeno( GenoFile, GenoFileIndex = NULL, SampleIDs = NULL, control = NULL, sparse = FALSE )GRAB.ReadGeno( GenoFile, GenoFileIndex = NULL, SampleIDs = NULL, control = NULL, sparse = FALSE )
GenoFile |
Path to genotype file. Supported formats determined by extension:
|
GenoFileIndex |
Associated index files for the genotype file:
|
SampleIDs |
Character vector of sample IDs to extract. If NULL, extracts all samples. |
control |
List of control parameters with the following options:
|
sparse |
Logical indicating whether to return sparse genotype matrix (default: FALSE). |
File Format Support:
PLINK Format: Binary BED/BIM/FAM files. See https://www.cog-genomics.org/plink/2.0/ for specifications.
BGEN Format: Version 1.2 with 8-bit compression. See https://www.well.ox.ac.uk/~gav/bgen_format/spec/v1.2.html for details. Requires BGI index file created with bgenix tool.
List containing:
Genotype matrix (samples × markers) with values 0, 1, 2, or NA.
Data frame with columns CHROM, POS, ID, REF, ALT.
## Raw genotype data RawFile <- system.file("extdata", "simuRAW.raw.gz", package = "GRAB") GenoMat <- data.table::fread(RawFile) GenoMat[1:10, 1:10] ## PLINK files PLINKFile <- system.file("extdata", "simuPLINK.bed", package = "GRAB") # If include/exclude files are not specified, then control$AllMarker should be TRUE GenoList <- GRAB.ReadGeno(PLINKFile, control = list(AllMarkers = TRUE)) GenoMat <- GenoList$GenoMat markerInfo <- GenoList$markerInfo head(GenoMat[, 1:6]) head(markerInfo) ## BGEN files (Note the different REF/ALT order for BGEN and PLINK formats) BGENFile <- system.file("extdata", "simuBGEN.bgen", package = "GRAB") GenoList <- GRAB.ReadGeno(BGENFile, control = list(AllMarkers = TRUE)) GenoMat <- GenoList$GenoMat markerInfo <- GenoList$markerInfo head(GenoMat[, 1:6]) head(markerInfo) ## The below is to demonstrate parameters in control PLINKFile <- system.file("extdata", "simuPLINK.bed", package = "GRAB") IDsToIncludeFile <- system.file("extdata", "simuGENO.IDsToInclude", package = "GRAB") RangesToIncludeFile <- system.file("extdata", "RangesToInclude.txt", package = "GRAB") GenoList <- GRAB.ReadGeno(PLINKFile, control = list( IDsToIncludeFile = IDsToIncludeFile, RangesToIncludeFile = RangesToIncludeFile, AlleleOrder = "ref-first" ) ) GenoMat <- GenoList$GenoMat head(GenoMat) markerInfo <- GenoList$markerInfo head(markerInfo) ## The below is for PLINK/BGEN files with missing data PLINKFile <- system.file("extdata", "simuPLINK.bed", package = "GRAB") GenoList <- GRAB.ReadGeno(PLINKFile, control = list(AllMarkers = TRUE)) head(GenoList$GenoMat) GenoList <- GRAB.ReadGeno(PLINKFile, control = list(AllMarkers = TRUE, imputeMethod = "mean")) head(GenoList$GenoMat) BGENFile <- system.file("extdata", "simuBGEN.bgen", package = "GRAB") GenoList <- GRAB.ReadGeno(BGENFile, control = list(AllMarkers = TRUE)) head(GenoList$GenoMat)## Raw genotype data RawFile <- system.file("extdata", "simuRAW.raw.gz", package = "GRAB") GenoMat <- data.table::fread(RawFile) GenoMat[1:10, 1:10] ## PLINK files PLINKFile <- system.file("extdata", "simuPLINK.bed", package = "GRAB") # If include/exclude files are not specified, then control$AllMarker should be TRUE GenoList <- GRAB.ReadGeno(PLINKFile, control = list(AllMarkers = TRUE)) GenoMat <- GenoList$GenoMat markerInfo <- GenoList$markerInfo head(GenoMat[, 1:6]) head(markerInfo) ## BGEN files (Note the different REF/ALT order for BGEN and PLINK formats) BGENFile <- system.file("extdata", "simuBGEN.bgen", package = "GRAB") GenoList <- GRAB.ReadGeno(BGENFile, control = list(AllMarkers = TRUE)) GenoMat <- GenoList$GenoMat markerInfo <- GenoList$markerInfo head(GenoMat[, 1:6]) head(markerInfo) ## The below is to demonstrate parameters in control PLINKFile <- system.file("extdata", "simuPLINK.bed", package = "GRAB") IDsToIncludeFile <- system.file("extdata", "simuGENO.IDsToInclude", package = "GRAB") RangesToIncludeFile <- system.file("extdata", "RangesToInclude.txt", package = "GRAB") GenoList <- GRAB.ReadGeno(PLINKFile, control = list( IDsToIncludeFile = IDsToIncludeFile, RangesToIncludeFile = RangesToIncludeFile, AlleleOrder = "ref-first" ) ) GenoMat <- GenoList$GenoMat head(GenoMat) markerInfo <- GenoList$markerInfo head(markerInfo) ## The below is for PLINK/BGEN files with missing data PLINKFile <- system.file("extdata", "simuPLINK.bed", package = "GRAB") GenoList <- GRAB.ReadGeno(PLINKFile, control = list(AllMarkers = TRUE)) head(GenoList$GenoMat) GenoList <- GRAB.ReadGeno(PLINKFile, control = list(AllMarkers = TRUE, imputeMethod = "mean")) head(GenoList$GenoMat) BGENFile <- system.file("extdata", "simuBGEN.bgen", package = "GRAB") GenoList <- GRAB.ReadGeno(BGENFile, control = list(AllMarkers = TRUE)) head(GenoList$GenoMat)
Tests for association between phenotypes and genomic regions containing multiple genetic variants, primarily low-frequency and rare variants.
GRAB.Region( objNull, GenoFile, OutputFile, GenoFileIndex = NULL, OutputFileIndex = NULL, GroupFile, SparseGRMFile = NULL, MaxMAFVec = "0.01,0.001,0.0005", annoVec = "lof,lof:missense,lof:missense:synonymous", control = NULL )GRAB.Region( objNull, GenoFile, OutputFile, GenoFileIndex = NULL, OutputFileIndex = NULL, GroupFile, SparseGRMFile = NULL, MaxMAFVec = "0.01,0.001,0.0005", annoVec = "lof,lof:missense,lof:missense:synonymous", control = NULL )
objNull |
(S3 object) Null model object from |
GenoFile |
(character) Path to genotype file (PLINK or BGEN format). See
|
OutputFile |
(character) Path for saving region-based association results. |
GenoFileIndex |
(character or NULL) Index files for the genotype file. If
|
OutputFileIndex |
(character or NULL) Path for progress tracking file. If
|
GroupFile |
(character) Path to region definition file specifying region-marker mappings and annotation information. Tab-separated format with 2-3 columns per region. |
SparseGRMFile |
(character or NULL) Path to sparse GRM file (optional). |
MaxMAFVec |
(character) Comma-separated MAF cutoffs for including variants in analysis (default: "0.01,0.001,0.0005"). |
annoVec |
(character) Comma-separated annotation groups for analysis (default: "lof,lof:missense,lof:missense:synonymous"). |
control |
(list or NULL) List of the following parameters:
|
The function returns NULL invisibly. Results are saved to four files:
OutputFile: Region-based test results (SKAT-O, SKAT, Burden p-values).
paste0(OutputFile, ".markerInfo"): Marker-level results for rare variants
(MAC >= min_mac_region) included in region tests.
paste0(OutputFile, ".otherMarkerInfo"): Information for excluded markers
(ultra-rare variants or failed QC).
paste0(OutputFile, ".infoBurdenNoWeight"): Summary statistics for burden
tests without weights.
For method-specific examples and output columns and format, see:
POLMM method: GRAB.POLMM.Region
SAGELD method is Scalable and Accurate algorithm for Gene-Environment interaction analysis using Longitudinal Data for related samples in a large-scale biobank. SAGELD extended SPAGRM to support gene-environment interaction analysis.
GRAB.SAGELD()GRAB.SAGELD()
Additional list of control in SAGELD.NullModel() function.
Additional list of control in GRAB.Marker() function.
No return value, called for side effects (prints information about the SAGELD method to the console).
Generates random effect vectors (bVec) that account for family relationships through kinship-based correlation structures.
GRAB.SimubVec(nSub, nFam, FamMode, tau)GRAB.SimubVec(nSub, nFam, FamMode, tau)
nSub |
Number of unrelated subjects. If 0, all subjects are related. |
nFam |
Number of families. If 0, all subjects are unrelated. |
FamMode |
Family structure: "4-members", "10-members", or "20-members".
See |
tau |
Variance component for random effects. |
For related subjects, random effects follow a multivariate normal distribution with covariance proportional to kinship coefficients. For unrelated subjects, effects are independent normal random variables.
Data frame with columns:
Subject identifiers.
Random effect values following appropriate correlation structure.
Generates genotype data for association studies, supporting both unrelated subjects and family-based designs with various pedigree structures.
GRAB.SimuGMat( nSub, nFam, FamMode, nSNP, MaxMAF = 0.5, MinMAF = 0.05, MAF = NULL )GRAB.SimuGMat( nSub, nFam, FamMode, nSNP, MaxMAF = 0.5, MinMAF = 0.05, MAF = NULL )
nSub |
Number of unrelated subjects. If 0, all subjects are related. |
nFam |
Number of families. If 0, all subjects are unrelated. |
FamMode |
Family structure: "4-members", "10-members", or "20-members".
See |
nSNP |
Number of genetic markers to simulate. |
MaxMAF |
Maximum minor allele frequency for simulation (default: 0.5). |
MinMAF |
Minimum minor allele frequency for simulation (default: 0.05). |
MAF |
Optional vector of specific MAF values for each marker. If provided,
|
Genotypes are simulated under Hardy-Weinberg equilibrium with MAF ~ Uniform(MinMAF, MaxMAF).
Family Structures:
4-members: 1+2→3+4 (parents 1,2 → offspring 3,4)
10-members: 1+2→5+6, 3+5→7+8, 4+6→9+10
20-members: Complex multi-generational pedigree with 20 members
Total subjects: nSub + nFam × family_size
List containing:
Numeric genotype matrix (subjects × markers) with values 0, 1, 2.
Data frame with marker IDs and MAF values.
GRAB.makePlink for converting to PLINK format.
nSub <- 100 nFam <- 10 FamMode <- "10-members" nSNP <- 10000 OutList <- GRAB.SimuGMat(nSub, nFam, FamMode, nSNP) GenoMat <- OutList$GenoMat markerInfo <- OutList$markerInfo GenoMat[1:10, 1:10] head(markerInfo) ## The following is to calculate GRM MAF <- apply(GenoMat, 2, mean) / 2 GenoMatSD <- t((t(GenoMat) - 2 * MAF) / sqrt(2 * MAF * (1 - MAF))) GRM <- GenoMatSD %*% t(GenoMatSD) / ncol(GenoMat) GRM1 <- GRM[1:10, 1:10] GRM2 <- GRM[100 + 1:10, 100 + 1:10] GRM1 GRM2nSub <- 100 nFam <- 10 FamMode <- "10-members" nSNP <- 10000 OutList <- GRAB.SimuGMat(nSub, nFam, FamMode, nSNP) GenoMat <- OutList$GenoMat markerInfo <- OutList$markerInfo GenoMat[1:10, 1:10] head(markerInfo) ## The following is to calculate GRM MAF <- apply(GenoMat, 2, mean) / 2 GenoMatSD <- t((t(GenoMat) - 2 * MAF) / sqrt(2 * MAF * (1 - MAF))) GRM <- GenoMatSD %*% t(GenoMatSD) / ncol(GenoMat) GRM1 <- GRM[1:10, 1:10] GRM2 <- GRM[100 + 1:10, 100 + 1:10] GRM1 GRM2
Generates genotype matrices for families and unrelated subjects using haplotype data from existing genotype files. Primarily designed for rare variant analysis simulations.
GRAB.SimuGMatFromGenoFile( nFam, nSub, FamMode, GenoFile, GenoFileIndex = NULL, SampleIDs = NULL, control = NULL )GRAB.SimuGMatFromGenoFile( nFam, nSub, FamMode, GenoFile, GenoFileIndex = NULL, SampleIDs = NULL, control = NULL )
nFam |
Number of families to simulate. |
nSub |
Number of unrelated subjects to simulate. |
FamMode |
Family structure: "4-members", "10-members", or "20-members". See Details for pedigree structures. |
GenoFile |
Path to genotype file (passed to |
GenoFileIndex |
Index file for genotype data (optional, for BGEN files). |
SampleIDs |
Vector of sample IDs to include (optional). |
control |
List of control parameters passed to |
This function supports both unrelated and related subjects. Founder genotypes are sampled from the input genotype file, and offspring genotypes are generated through Mendelian inheritance.
Note: When simulating related subjects, alleles are randomly assigned to haplotypes during the phasing process.
4-members: Total subjects = nSub + 4×nFam. Structure: 1+2→3+4
10-members: Total subjects = nSub + 10×nFam. Structure: 1+2→5+6, 3+5→7+8, 4+6→9+10
20-members: Total subjects = nSub + 20×nFam. Structure: 1+2→9+10, 3+9→11+12, 4+10→13+14, 5+11→15+16, 6+12→17, 7+13→18, 8+14→19+20
List containing:
Simulated genotype matrix (subjects × variants).
Subject identifiers for simulated samples.
Variant information from input file.
nFam <- 50 nSub <- 500 FamMode <- "10-members" # PLINK data format. Currently, this function does not support BGEN data format. PLINKFile <- system.file("extdata", "example_n1000_m236.bed", package = "GRAB") IDsToIncludeFile <- system.file("extdata", "example_n1000_m236.IDsToInclude", package = "GRAB") GenoList <- GRAB.SimuGMatFromGenoFile(nFam, nSub, FamMode, PLINKFile, control = list(IDsToIncludeFile = IDsToIncludeFile) )nFam <- 50 nSub <- 500 FamMode <- "10-members" # PLINK data format. Currently, this function does not support BGEN data format. PLINKFile <- system.file("extdata", "example_n1000_m236.bed", package = "GRAB") IDsToIncludeFile <- system.file("extdata", "example_n1000_m236.IDsToInclude", package = "GRAB") GenoList <- GRAB.SimuGMatFromGenoFile(nFam, nSub, FamMode, PLINKFile, control = list(IDsToIncludeFile = IDsToIncludeFile) )
Generates various types of phenotypes (quantitative, binary, ordinal, time-to-event) from linear predictors using appropriate link functions.
GRAB.SimuPheno( eta, traitType = "binary", control = list(pCase = 0.1, sdError = 1, pEachGroup = c(1, 1, 1), eventRate = 0.1), seed = NULL )GRAB.SimuPheno( eta, traitType = "binary", control = list(pCase = 0.1, sdError = 1, pEachGroup = c(1, 1, 1), eventRate = 0.1), seed = NULL )
eta |
Vector of linear predictors (typically covariates×beta + genotypes×beta). |
traitType |
Type of phenotype: "quantitative", "binary", "ordinal", or "time-to-event". |
control |
List of simulation parameters specific to each trait type:
|
seed |
Random seed for reproducibility (optional). |
Trait Type Details:
Quantitative: Y = eta + error, where error ~ N(0, sdError²)
Binary: Logistic model with specified case proportion
Ordinal: Proportional odds model with specified group proportions
Time-to-event: Weibull survival model with specified event rate
For more details, see: https://wenjianbi.github.io/grab.github.io/docs/simulation_phenotype.html
Simulated phenotype vector or data frame:
Numeric vector of continuous values.
Numeric vector of 0/1 values.
Numeric vector of categorical levels.
Data frame with SurvTime and SurvEvent columns.
SPACox is primarily intended for time-to-event traits in unrelated samples from large-scale biobanks. It uses the empirical cumulant generating function (CGF) to perform SPA-based single-variant association tests, enabling analysis with residuals from any null model.
GRAB.SPACox()GRAB.SPACox()
Additional Control Parameters for GRAB.NullModel():
range (numeric vector, default: c(-100, 100)):
Range for saddlepoint approximation grid. Must be symmetric (range[2] = -range[1]).
length.out (integer, default: 10000): Number of grid points for saddlepoint approximation.
Method-specific elements in the SPACox_NULL_Model object returned by GRAB.NullModel()::
mresid: Martingale residuals (numeric or "Residual" class).
cumul: Cumulative sums for variance estimation (matrix).
tX: Transpose of design matrix (matrix).
yVec: Event indicator (numeric or "Residual" class).
X.invXX: Matrix for variance calculations (matrix).
Additional Control Parameters for GRAB.Marker():
pVal_covaAdj_Cutoff (numeric, default: 5e-05): P-value cutoff for covariate adjustment.
Output file columns:
Marker identifier (rsID or CHR:POS:REF:ALT).
Marker information in format CHR:POS:REF:ALT.
Alternative allele frequency in the sample.
Total count of alternative alleles.
Proportion of missing genotypes.
P-value from the score test.
Z-score from the score test.
Bi et al. (2020). Fast and accurate method for genome-wide time-to-event data analysis and its application to UK Biobank. doi:10.1016/j.ajhg.2020.06.003
PhenoFile <- system.file("extdata", "simuPHENO.txt", package = "GRAB") GenoFile <- system.file("extdata", "simuPLINK.bed", package = "GRAB") OutputFile <- file.path(tempdir(), "resultSPACox.txt") PhenoData <- data.table::fread(PhenoFile, header = TRUE) # Step 1 option 1 obj.SPACox <- GRAB.NullModel( survival::Surv(SurvTime, SurvEvent) ~ AGE + GENDER, data = PhenoData, subjIDcol = "IID", method = "SPACox", traitType = "time-to-event" ) # Step 1 option 2 residuals <- survival::coxph( survival::Surv(SurvTime, SurvEvent) ~ AGE + GENDER, data = PhenoData, x = TRUE )$residuals obj.SPACox <- GRAB.NullModel( residuals ~ AGE + GENDER, data = PhenoData, subjIDcol = "IID", method = "SPACox", traitType = "Residual" ) # Step 2 GRAB.Marker(obj.SPACox, GenoFile, OutputFile) head(data.table::fread(OutputFile))PhenoFile <- system.file("extdata", "simuPHENO.txt", package = "GRAB") GenoFile <- system.file("extdata", "simuPLINK.bed", package = "GRAB") OutputFile <- file.path(tempdir(), "resultSPACox.txt") PhenoData <- data.table::fread(PhenoFile, header = TRUE) # Step 1 option 1 obj.SPACox <- GRAB.NullModel( survival::Surv(SurvTime, SurvEvent) ~ AGE + GENDER, data = PhenoData, subjIDcol = "IID", method = "SPACox", traitType = "time-to-event" ) # Step 1 option 2 residuals <- survival::coxph( survival::Surv(SurvTime, SurvEvent) ~ AGE + GENDER, data = PhenoData, x = TRUE )$residuals obj.SPACox <- GRAB.NullModel( residuals ~ AGE + GENDER, data = PhenoData, subjIDcol = "IID", method = "SPACox", traitType = "Residual" ) # Step 2 GRAB.Marker(obj.SPACox, GenoFile, OutputFile) head(data.table::fread(OutputFile))
SPAGRM is a scalable and accurate framework for retrospective association tests. It treats genetic loci as random vectors and uses a precise approximation of their joint distribution. This approach enables SPAGRM to handle any type of complex trait, including longitudinal and unbalanced phenotypes. SPAGRM extends SPACox to support sample relatedness.
GRAB.SPAGRM()GRAB.SPAGRM()
See SPAGRM.NullModel for detailed instructions
on preparing a SPAGRM_NULL_Model object required for GRAB.Marker().
Additional Control Parameters for GRAB.Marker():
zeta (numeric, default: 0): SPA moment approximation parameter.
tol (numeric, default: 1e-5): Numerical tolerance for SPA convergence.
Marker-level results (OutputFile) columns:
Marker identifier (rsID or CHR:POS:REF:ALT).
Marker information in format CHR:POS:REF:ALT.
Alternative allele frequency in the sample.
Total count of alternative alleles.
Proportion of missing genotypes.
Z-score from the score test.
P-value from the score test.
Hardy-Weinberg equilibrium p-value.
Xu et al. (2025). SPAGRM: effectively controlling for sample relatedness in large-scale genome-wide association studies of longitudinal traits. doi:10.1038/s41467-025-56669-1
ResidMatFile <- system.file("extdata", "ResidMat.txt", package = "GRAB") SparseGRMFile <- system.file("extdata", "SparseGRM.txt", package = "GRAB") PairwiseIBDFile <- system.file("extdata", "PairwiseIBD.txt", package = "GRAB") GenoFile <- system.file("extdata", "simuPLINK.bed", package = "GRAB") OutputFile <- file.path(tempdir(), "resultSPAGRM.txt") # Step 2a: pre-calculate genotype distributions obj.SPAGRM <- SPAGRM.NullModel( ResidMatFile = ResidMatFile, SparseGRMFile = SparseGRMFile, PairwiseIBDFile = PairwiseIBDFile, control = list(ControlOutlier = FALSE) ) # Step 2b: perform association tests GRAB.Marker(obj.SPAGRM, GenoFile, OutputFile) head(data.table::fread(OutputFile))ResidMatFile <- system.file("extdata", "ResidMat.txt", package = "GRAB") SparseGRMFile <- system.file("extdata", "SparseGRM.txt", package = "GRAB") PairwiseIBDFile <- system.file("extdata", "PairwiseIBD.txt", package = "GRAB") GenoFile <- system.file("extdata", "simuPLINK.bed", package = "GRAB") OutputFile <- file.path(tempdir(), "resultSPAGRM.txt") # Step 2a: pre-calculate genotype distributions obj.SPAGRM <- SPAGRM.NullModel( ResidMatFile = ResidMatFile, SparseGRMFile = SparseGRMFile, PairwiseIBDFile = PairwiseIBDFile, control = list(ControlOutlier = FALSE) ) # Step 2b: perform association tests GRAB.Marker(obj.SPAGRM, GenoFile, OutputFile) head(data.table::fread(OutputFile))
SPAmix performs retrospective single-variant association tests using genotypes and residuals from null models of any complex trait in large-scale biobanks. It extends SPACox to support complex population structures, such as admixed ancestry and multiple populations, but does not account for sample relatedness.
GRAB.SPAmix()GRAB.SPAmix()
Additional Control Parameters for GRAB.NullModel():
PC_columns (character, required): Comma-separated column names
of principal components (e.g., "PC1,PC2").
OutlierRatio (numeric, default: 1.5): IQR multiplier for outlier detection.
Outliers are defined as values outside [Q1 - rIQR, Q3 + rIQR].
Method-specific elements in the SPAmix_NULL_Model object returned by GRAB.NullModel()::
resid: Residuals from mixed model (matrix or "Residual" class).
yVec: Phenotype vector (numeric or "Residual" class).
PCs: Principal components for dimension reduction (matrix).
nPheno: Number of phenotypes analyzed (integer).
outLierList: List identifying outlier subjects for SPA adjustment.
Additional Control Parameters for GRAB.Marker():
dosage_option (character, default: "rounding_first"):
Dosage handling option. Must be either "rounding_first" or "rounding_last".
Output file columns:
Phenotype identifier (for multi-trait analysis).
Marker identifier (rsID or CHR:POS:REF:ALT).
Marker information in format CHR:POS:REF:ALT.
Alternative allele frequency in the sample.
Total count of alternative alleles.
Proportion of missing genotypes.
P-value from the score test.
Z-score from the score test.
Ma et al. (2025). SPAmix: a scalable, accurate, and universal analysis framework for large‑scale genetic association studies in admixed populations. doi:10.1186/s13059-025-03827-9
PhenoFile <- system.file("extdata", "simuPHENO.txt", package = "GRAB") GenoFile <- system.file("extdata", "simuPLINK.bed", package = "GRAB") OutputFile <- file.path(tempdir(), "resultSPAmix.txt") PhenoData <- data.table::fread(PhenoFile, header = TRUE) # Step 1 option 1 obj.SPAmix <- GRAB.NullModel( survival::Surv(SurvTime, SurvEvent) ~ AGE + GENDER + PC1 + PC2, data = PhenoData, subjIDcol = "IID", method = "SPAmix", traitType = "time-to-event", control = list(PC_columns = "PC1,PC2") ) # Step 1 option 2 residuals <- survival::coxph( survival::Surv(SurvTime, SurvEvent) ~ AGE + GENDER + PC1 + PC2, data = PhenoData )$residuals obj.SPAmix <- GRAB.NullModel( residuals ~ AGE + GENDER + PC1 + PC2, data = PhenoData, subjIDcol = "IID", method = "SPAmix", traitType = "Residual", control = list(PC_columns = "PC1,PC2") ) # Step 1 option 2: analyze multiple traits at once res_cox <- survival::coxph( survival::Surv(SurvTime, SurvEvent) ~ AGE + GENDER + PC1 + PC2, data = PhenoData )$residuals res_lm <- lm(QuantPheno ~ AGE + GENDER + PC1 + PC2, data = PhenoData)$residuals obj.SPAmix <- GRAB.NullModel( res_cox + res_lm ~ AGE + GENDER + PC1 + PC2, data = PhenoData, subjIDcol = "IID", method = "SPAmix", traitType = "Residual", control = list(PC_columns = "PC1,PC2") ) # Step 2 GRAB.Marker(obj.SPAmix, GenoFile, OutputFile) head(data.table::fread(OutputFile))PhenoFile <- system.file("extdata", "simuPHENO.txt", package = "GRAB") GenoFile <- system.file("extdata", "simuPLINK.bed", package = "GRAB") OutputFile <- file.path(tempdir(), "resultSPAmix.txt") PhenoData <- data.table::fread(PhenoFile, header = TRUE) # Step 1 option 1 obj.SPAmix <- GRAB.NullModel( survival::Surv(SurvTime, SurvEvent) ~ AGE + GENDER + PC1 + PC2, data = PhenoData, subjIDcol = "IID", method = "SPAmix", traitType = "time-to-event", control = list(PC_columns = "PC1,PC2") ) # Step 1 option 2 residuals <- survival::coxph( survival::Surv(SurvTime, SurvEvent) ~ AGE + GENDER + PC1 + PC2, data = PhenoData )$residuals obj.SPAmix <- GRAB.NullModel( residuals ~ AGE + GENDER + PC1 + PC2, data = PhenoData, subjIDcol = "IID", method = "SPAmix", traitType = "Residual", control = list(PC_columns = "PC1,PC2") ) # Step 1 option 2: analyze multiple traits at once res_cox <- survival::coxph( survival::Surv(SurvTime, SurvEvent) ~ AGE + GENDER + PC1 + PC2, data = PhenoData )$residuals res_lm <- lm(QuantPheno ~ AGE + GENDER + PC1 + PC2, data = PhenoData)$residuals obj.SPAmix <- GRAB.NullModel( res_cox + res_lm ~ AGE + GENDER + PC1 + PC2, data = PhenoData, subjIDcol = "IID", method = "SPAmix", traitType = "Residual", control = list(PC_columns = "PC1,PC2") ) # Step 2 GRAB.Marker(obj.SPAmix, GenoFile, OutputFile) head(data.table::fread(OutputFile))
WtCoxG is a Cox-based association test method for time-to-event traits. It effectively addresses case ascertainment and rare variant analysis. By leveraging external minor allele frequencies from public resources, WtCoxG can further boost statistical power.
GRAB.WtCoxG()GRAB.WtCoxG()
Additional Parameters for GRAB.NullModel():
RefAfFile (character, required): Reference allele frequency file path.
File must contain columns: CHROM, POS, ID, REF, ALT, AF_ref, AN_ref
RefPrevalence (numeric, required): Population-level disease prevalence
for weighting. Must be in range (0, 0.5)
Additional Control Parameters for GRAB.NullModel():
OutlierRatio (numeric, default: 1.5): IQR multiplier for outlier detection
Method-specific elements in the WtCoxG_NULL_Model object returned by GRAB.NullModel()::
mresid: Martingale residuals from weighted Cox model (numeric).
Cova: Design matrix of covariates (matrix).
yVec: Event indicator (numeric).
weight: Observation weights based on reference prevalence (numeric).
RefPrevalence: Reference population prevalence used for weighting (numeric).
outLierList: List identifying outlier subjects for SPA adjustment.
mergeGenoInfo: Data frame with batch effect QC results and external reference data.
Additional Control Parameters for GRAB.Marker():
cutoff (numeric, default: 0.1): Cutoff of batch effect test p-value for
association testing. Variants with batch effect p-value below this cutoff
will be excluded from association testing.
Output file columns:
Phenotype identifier (for multi-trait analysis).
Marker identifier (rsID or CHR:POS:REF:ALT).
Marker information in format CHR:POS:REF:ALT.
Alternative allele frequency in the sample.
Total count of alternative alleles.
Proportion of missing genotypes.
P-value from the score test.
Z-score from the score test.
Li et al. (2025). Applying weighted Cox regression to genome-wide association studies of time-to-event phenotypes. doi:10.1038/s43588-025-00864-z
# Step 1: fit null model and test batch effect PhenoFile <- system.file("extdata", "simuPHENO.txt", package = "GRAB") PhenoData <- data.table::fread(PhenoFile, header = TRUE) SparseGRMFile <- system.file("extdata", "SparseGRM.txt", package = "GRAB") GenoFile <- system.file("extdata", "simuPLINK.bed", package = "GRAB") RefAfFile <- system.file("extdata", "simuRefAf.txt", package = "GRAB") OutputFile <- file.path(tempdir(), "resultWtCoxG.txt") obj.WtCoxG <- GRAB.NullModel( survival::Surv(SurvTime, SurvEvent) ~ AGE + GENDER, data = PhenoData, subjIDcol = "IID", method = "WtCoxG", traitType = "time-to-event", GenoFile = GenoFile, SparseGRMFile = SparseGRMFile, RefAfFile = RefAfFile, RefPrevalence = 0.1 ) # Step2 GRAB.Marker(obj.WtCoxG, GenoFile, OutputFile) head(data.table::fread(OutputFile))# Step 1: fit null model and test batch effect PhenoFile <- system.file("extdata", "simuPHENO.txt", package = "GRAB") PhenoData <- data.table::fread(PhenoFile, header = TRUE) SparseGRMFile <- system.file("extdata", "SparseGRM.txt", package = "GRAB") GenoFile <- system.file("extdata", "simuPLINK.bed", package = "GRAB") RefAfFile <- system.file("extdata", "simuRefAf.txt", package = "GRAB") OutputFile <- file.path(tempdir(), "resultWtCoxG.txt") obj.WtCoxG <- GRAB.NullModel( survival::Surv(SurvTime, SurvEvent) ~ AGE + GENDER, data = PhenoData, subjIDcol = "IID", method = "WtCoxG", traitType = "time-to-event", GenoFile = GenoFile, SparseGRMFile = SparseGRMFile, RefAfFile = RefAfFile, RefPrevalence = 0.1 ) # Step2 GRAB.Marker(obj.WtCoxG, GenoFile, OutputFile) head(data.table::fread(OutputFile))
Builds the SAGELD (or GALLOP) null model from a fitted mixed-effects model and relatedness inputs. Extracts variance components, forms the penalization matrix, derives residual summaries, and (for SAGELD) integrates sparse GRM and pairwise IBD to prepare graph-based components for marker testing.
SAGELD.NullModel( NullModel, UsedMethod = "SAGELD", PlinkFile, SparseGRMFile, PairwiseIBDFile, PvalueCutoff = 0.001, control = list() )SAGELD.NullModel( NullModel, UsedMethod = "SAGELD", PlinkFile, SparseGRMFile, PairwiseIBDFile, PvalueCutoff = 0.001, control = list() )
NullModel |
A fitted model from lme4 (class |
UsedMethod |
Character; either |
PlinkFile |
Character. PLINK prefix (without extension) used to sample common markers for estimating the lambda parameter. |
SparseGRMFile |
Character. Path to sparse GRM file produced by
|
PairwiseIBDFile |
Character. Path to pairwise IBD file produced by
|
PvalueCutoff |
Numeric p-value threshold for screening gene–environment
association when estimating |
control |
List of options (forwarded to internal checks; see
|
A list of class "SAGELD_NULL_Model" with elements:
Character vector of subject IDs.
Number of subjects.
Method label: "SAGELD" or "GALLOP".
Per-subject sums for crossprod(X, G) terms.
Per-subject Rot %*% Si matrices for random effects.
Per-subject cross-products used in variance assembly.
Fixed-effect precision matrix (p x p).
Block matrix linking random and fixed effects.
Per-subject sums for crossprod(G).
Per-subject sums for crossprod(G, y).
Fixed-effects solution vector.
Random-effects BLUPs per subject.
Scale parameter extracted from VarCorr.
Residuals used in SAGELD testing.
Genetic component residuals.
GxE component residuals.
Environmental component residuals.
Residuals for unrelated outlier subjects.
G residuals for unrelated outliers.
GxE residuals for unrelated outliers.
Quadratic form Resid' * GRM * Resid (all subjects).
Quadratic form for G residuals.
Quadratic form for GxE residuals.
Cross-term quadratic form between G and GxE.
Quadratic form for E residuals.
Contribution from two-subject outlier families.
Two-subject outlier contribution (G).
Two-subject outlier contribution (GxE).
Two-subject outlier cross-term (G,GxE).
Sum of residuals for non-outlier unrelated subjects.
Sum of G residuals for non-outlier unrelated subjects.
Sum of GxE residuals for non-outlier unrelated subjects.
Quadratic form for non-outlier unrelated subjects.
Quadratic form for G (non-outlier unrelated).
Quadratic form for GxE (non-outlier unrelated).
Cross-term for G/GxE (non-outlier unrelated).
Per-family lists for N=2 outlier families.
CLT and standardized scores for larger families.
MAF breakpoints used in CLT construction.
Z-score threshold for E used in screening.
Builds the SPAGRM null model object using subject residuals, sparse GRM, and pairwise IBD estimates, detecting residual outliers and constructing family-level graph structures for downstream saddlepoint marker tests.
SPAGRM.NullModel( ResidMatFile, SparseGRMFile, PairwiseIBDFile, control = list(MaxQuantile = 0.75, MinQuantile = 0.25, OutlierRatio = 1.5, ControlOutlier = TRUE, MaxNuminFam = 5, MAF_interval = c(1e-04, 5e-04, 0.001, 0.005, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5)) )SPAGRM.NullModel( ResidMatFile, SparseGRMFile, PairwiseIBDFile, control = list(MaxQuantile = 0.75, MinQuantile = 0.25, OutlierRatio = 1.5, ControlOutlier = TRUE, MaxNuminFam = 5, MAF_interval = c(1e-04, 5e-04, 0.001, 0.005, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5)) )
ResidMatFile |
Data frame or file path with columns |
SparseGRMFile |
File path to sparse GRM (tab-delimited: ID1, ID2, Value). |
PairwiseIBDFile |
File path to pairwise IBD table (ID1, ID2, pa, pb, pc). |
control |
List of options controlling outlier handling and family
decomposition (see |
A list of class "SPAGRM_NULL_Model" with elements:
Numeric vector of residuals used in analysis.
Character vector of subject IDs (length = N).
Number of subjects.
Residuals of unrelated outlier subjects.
Sum of quadratic form Resid' * GRM * Resid for all subjects.
Aggregate contribution from two-subject outlier families.
Sum of residuals for non-outlier unrelated subjects.
Quadratic form contribution for non-outlier unrelated subjects.
List with per two-member family residual/Rho info.
List with Chow–Liu tree structures and standardized scores for larger families.
Vector of MAF breakpoints used in tree construction.