Title: | Mixed Logistic Regression for Genome-Wide Analysis Studies (GWAS) |
---|---|
Description: | Fast approximate methods for mixed logistic regression in genome-wide analysis studies (GWAS). Two computationnally efficient methods are proposed for obtaining effect size estimates (beta) in Mixed Logistic Regression in GWAS: the Approximate Maximum Likelihood Estimate (AMLE), and the Offset method. The wald test obtained with AMLE is identical to the score test. Data can be genotype matrices in plink format, or dosage (VCF files). The methods are described in details in Milet et al (2020) <doi:10.1101/2020.01.17.910109>. |
Authors: | Hervé Perdry [aut, cre], Jacqueline Milet [aut] |
Maintainer: | Hervé Perdry <[email protected]> |
License: | GPL-3 |
Version: | 0.7 |
Built: | 2024-12-21 06:57:26 UTC |
Source: | CRAN |
Mixed logistic regression for GWAS
association.test.logistic( x, Y = x@ped$pheno, X = matrix(1, nrow(x)), K, beg = 1, end = ncol(x), algorithm = c("amle", "offset"), eigenK, p = 0, model = c("additive", "dominant", "recessive"), ... )
association.test.logistic( x, Y = x@ped$pheno, X = matrix(1, nrow(x)), K, beg = 1, end = ncol(x), algorithm = c("amle", "offset"), eigenK, p = 0, model = c("additive", "dominant", "recessive"), ... )
x |
a bedmatrix |
Y |
phenotype vector. Default is column |
X |
A matrix of covariates (defaults to a column of ones for the intercept) |
K |
A genetic relationship matrix (or a list of such matrices) |
beg |
Index of the first SNP tested for association |
end |
Index of the last SNP tested for association |
algorithm |
Algorithm to use |
eigenK |
eigen decomposition of K (only if |
p |
Number of principal components to include in the model |
model |
Model for the effect allele (allele A2) |
... |
Additional parameter for |
Tests the association between the phenotype and requested SNPs in x
.
The phenotype Y
is a binary trait. A Wald test is performed using an approximate
method defined by the parameter algorithm
.
Parameter model
allows to specify an additive model (genotypes A1 A1, A1 A2, and A2 A2
are recoded for analysis as 0, 1 and 2 respectively), a dominant model (genotypes recoded as 0, 1, and 1) or a recessive
model (recoded as 0, 0 and 1).
All other arguments are as in gaston::association.test
.
A data frame giving for each SNP the association statistics.
data(TTN) x <- as.bed.matrix(TTN.gen, TTN.fam, TTN.bim) ## Simulation data ## set.seed(1) # some covariables X <- cbind(1, runif(nrow(x))) # A random GRM ran <- random.pm( nrow(x)) # random effects (tau = 1) omega <- lmm.simu(1, 0, eigenK=ran$eigen)$omega # linear term of the model lin <- X %*% c(0.1,-0.2) + omega # vector of probabilitues pi <- 1/(1+exp( -lin )) # vector of binary phenotypes y <- rbinom(nrow(x), 1, pi) # testing association with 1) the score test, 2) the offset algorithm, 3) the 'amle' algorithm a1 <- association.test(x, y, X, K = ran$K, method = "lmm", response = "bin") a2 <- association.test.logistic(x, y, X, K = ran$K, algorithm = "offset") a3 <- association.test.logistic(x, y, X, K = ran$K, algorithm = "amle")
data(TTN) x <- as.bed.matrix(TTN.gen, TTN.fam, TTN.bim) ## Simulation data ## set.seed(1) # some covariables X <- cbind(1, runif(nrow(x))) # A random GRM ran <- random.pm( nrow(x)) # random effects (tau = 1) omega <- lmm.simu(1, 0, eigenK=ran$eigen)$omega # linear term of the model lin <- X %*% c(0.1,-0.2) + omega # vector of probabilitues pi <- 1/(1+exp( -lin )) # vector of binary phenotypes y <- rbinom(nrow(x), 1, pi) # testing association with 1) the score test, 2) the offset algorithm, 3) the 'amle' algorithm a1 <- association.test(x, y, X, K = ran$K, method = "lmm", response = "bin") a2 <- association.test.logistic(x, y, X, K = ran$K, algorithm = "offset") a3 <- association.test.logistic(x, y, X, K = ran$K, algorithm = "amle")
Mixed logistic regression for GWAS, using dosages
association.test.logistic.dosage( filename, Y, X, K, beg, end, algorithm = c("amle", "offset"), eigenK, p = 0, n.cores = 1L, ... )
association.test.logistic.dosage( filename, Y, X, K, beg, end, algorithm = c("amle", "offset"), eigenK, p = 0, n.cores = 1L, ... )
filename |
Name of a dosage file |
Y |
phenotype vector. Default is column |
X |
A matrix of covariates (defaults to a column of ones for the intercept) |
K |
A genetic relationship matrix (or a list of such matrices) |
beg |
Index of the first SNP tested for association |
end |
Index of the last SNP tested for association |
algorithm |
Algorithm to use |
eigenK |
eigen decomposition of K (only if |
p |
Number of principal components to include in the model |
n.cores |
number of cores to use |
... |
Additional parameter for |
Dosage files can be VCF files with 'DS' or 'GP' fields. It is also possible to use a file with columns 'id"', 'chr', 'pos', 'A1', 'A2', 'sample1', 'sample2', etc. These files should have a header with column names.
For more details refer to association.test.logistic
and association.test
.
A data frame giving for each SNP the association statistics.
association.test.logistic
, association.test
Draws a QQ plot of p-values
qqplot.pvalues( p, snp.cat, col.cat, col.abline = "red", CB = TRUE, col.CB = "gray80", CB.level = 0.95, thinning = TRUE, ... )
qqplot.pvalues( p, snp.cat, col.cat, col.abline = "red", CB = TRUE, col.CB = "gray80", CB.level = 0.95, thinning = TRUE, ... )
p |
vector of p-values, or a data.frame with a column named |
snp.cat |
(optional) A factor giving the SNP categories. |
col.cat |
(optional) A vector of colors used to plot the SNP categories. |
col.abline |
Color of the line of slope 1. Set to |
CB |
|
col.CB |
The color of the confidence band. |
CB.level |
The level of the confidence band. |
thinning |
|
... |
Graphical parameters to be passed to |
This function draws a QQ plot of -values, stratified by categories.
If the parameter
snp.cat
is missing, the function falls back on gaston::qqplot.pvalues
.
Returns a 'NULL'
SNP.category
, qqplot.pvalues
(in gaston)
# a random vector of categories ca <- sample(c("A","B","C"), 1e6, TRUE, c(0.05, 0.9, 0.05)) # a vector of p-values, with different distribution depending on the strata p <- runif(1e6)**ifelse(ca == "A", .8, ifelse(ca == "B", 1, 1.2)) qqplot.pvalues(p, ca)
# a random vector of categories ca <- sample(c("A","B","C"), 1e6, TRUE, c(0.05, 0.9, 0.05)) # a vector of p-values, with different distribution depending on the strata p <- runif(1e6)**ifelse(ca == "A", .8, ifelse(ca == "B", 1, 1.2)) qqplot.pvalues(p, ca)
SNP.category
SNP.category(bed, Z, threshold = 0.8)
SNP.category(bed, Z, threshold = 0.8)
bed |
A bed matrix |
Z |
A vector of length |
threshold |
Variance thresholds |
This function determines a SNP Category from a covariable Z
,
which can be for example an indicator variable for a population strata,
or the first genomic principal component.
A factor giving the category of each SNP
# a random vector of categories ca <- sample(c("A","B","C"), 1e6, TRUE, c(0.05, 0.9, 0.05)) # a vector of p-values, with different distribution depending on the strata p <- runif(1e6)**ifelse(ca == "A", .8, ifelse(ca == "B", 1, 1.2)) qqplot.pvalues(p, ca)
# a random vector of categories ca <- sample(c("A","B","C"), 1e6, TRUE, c(0.05, 0.9, 0.05)) # a vector of p-values, with different distribution depending on the strata p <- runif(1e6)**ifelse(ca == "A", .8, ifelse(ca == "B", 1, 1.2)) qqplot.pvalues(p, ca)