Title: | Set-Based Tests for Genetic Association in Longitudinal Studies |
---|---|
Description: | Functions for the longitudinal genetic random field method (He et al., 2015, <doi:10.1111/biom.12310>) to test the association between a longitudinally measured quantitative outcome and a set of genetic variants in a gene/region. |
Authors: | Zihuai He |
Maintainer: | Zihuai He <[email protected]> |
License: | GPL-3 |
Version: | 1.0 |
Built: | 2024-11-27 06:36:54 UTC |
Source: | CRAN |
If users want to calculate the IBS similarity, this function creates the IBS pseudo variables. This is in order to calculate the IBS similarity in an efficient way.
IBS_pseudo(x)
IBS_pseudo(x)
x |
An n by q matrix of genetic variants. |
It returns an n by 3p matrix of pseudo variables for efficiently calculating IBS similarity.
library(LGRF) # Load data example # Z: genotype matrix, n by q matrix data(LGRF.example) Z<-LGRF.example$Z A<-IBS_pseudo(Z) # Then the IBS matrix can be calculated by K.IBS<-AA^T.
library(LGRF) # Load data example # Z: genotype matrix, n by q matrix data(LGRF.example) Z<-LGRF.example$Z A<-IBS_pseudo(Z) # Then the IBS matrix can be calculated by K.IBS<-AA^T.
The dataset contains outcome variable Y, covariate X, time and genotype data Z. The first column in time is the subject ID and the second column is the measured exam. Y, X and time are all in long form. Z is a genotype matrix where each row corresponds to one subject.
data(LGRF.example)
data(LGRF.example)
data(LGRF.example)
data(LGRF.example)
Test the association between an outcome variable and multiple regions/genes using SSD format files.
LGRF.SSD.All(SSD.INFO, result.null, Gsub.id=NULL, interGXT=FALSE, similarity='GR', impute.method='fixed', MinP.compare=FALSE, ...)
LGRF.SSD.All(SSD.INFO, result.null, Gsub.id=NULL, interGXT=FALSE, similarity='GR', impute.method='fixed', MinP.compare=FALSE, ...)
SSD.INFO |
SSD format information file, output of function “Open_SSD". The sets are defined by this file. |
result.null |
Output of function “null.LGRF". |
Gsub.id |
The subject id corresponding to the genotype matrix, an m dimensional vector. This is in order to match the phenotype and genotype matrix. The default is NULL, where the order is assumed to be matched with Y, X and time. |
interGXT |
Whether to incorperate the gene-time interaction effect. Incorperating this effect can improve power if there is any gene-time interaction, but has slight power loss otherwise. The default is FALSE. *Please note that the second column of time should be included as a covairate when interGXT is TRUE. |
similarity |
Choose the similarity measurement for the genetic variants. Can be either "GR" for genetic relationship or "IBS" for identity by state. The default is "GR" for better computational efficiency. |
impute.method |
Choose the imputation method when there is missing genotype. Can be "random", "fixed" or "bestguess". Given the estimated allele frequency, "random" simulates the genotype from binomial distribution; "fixed" uses the genotype expectation; "Best guess" uses the genotype with highest probability. |
MinP.compare |
Whether to compare with the GEE based minimum p-value (MinP) test. The default is FALSE. Please note that implementing the GEE based MinP test is time consuming. |
... |
Other options of the GEE based MinP test. Defined same as in function “test.MinP". |
results |
First column contains the set ID; Second column contains the p-values; Third column contains the number of tested SNPs. |
# * Since the Plink data files used here are hard to be included in a R package, # The usage is marked by "#" to pass the package check. #library(LGRF) ############################################## # Plink data files: File.Bed, File.Bim, File.Fam # Files defining the sets: File.SetID, File.SSD, File.Info # For longitudinal data, outcome and covariates are saved in a separate file: Y, time, X. # Null model was fitted using function null.LGRF. # Create the MW File # File.Bed<-"./example.bed" # File.Bim<-"./example.bim" # File.Fam<-"./example.fam" # File.SetID<-"./example.SetID" # File.SSD<-"./example.SSD" # File.Info<-"./example.SSD.info" # Generate SSD file # To use binary ped files, you have to generate SSD file first. # If you already have a SSD file, you do not need to call this function. # Generate_SSD_SetID(File.Bed, File.Bim, File.Fam, File.SetID, File.SSD, File.Info) # SSD.INFO<-Open_SSD(File.SSD, File.Info) # Number of samples # SSD.INFO$nSample # Number of Sets # SSD.INFO$nSets ## Fit the null model # Y: outcomes, n by 1 matrix where n is the total number of observations # X: covariates, n by p matrix # time: describe longitudinal structure, n by 2 matrix # result.null<-null.LGRF(Y,time,X=cbind(X,time[,2])) # *Please note that the second column of time should be included as a covairate if # the gene by time interaction effect will be incorperated. ## Test all regions # out_all<-LGRF.SSD.All(SSD.INFO, result.null) # Example result # out.all$results # SetID P.value N.Marker # 1 GENE_01 0.6568851 94 # 2 GENE_02 0.1822183 84 # 3 GENE_03 0.3836986 108 # 4 GENE_04 0.1265337 101 # 5 GENE_05 0.3236089 103 # 6 GENE_06 0.9401741 94 # 7 GENE_07 0.1043820 104 # 8 GENE_08 0.6093275 96 # 9 GENE_09 0.6351147 100 # 10 GENE_10 0.5631549 100 ## Test all regions, and compare with GEE based MinP test # out_all<-LGRF.SSD.All(SSD.INFO, result.null,MinP.compare=T) # Example result # out.all$results # SetID P.value P.value.MinP N.Marker # 1 GENE_01 0.62842 1.0000 94 # 2 GENE_02 0.06558 0.2718 84 # 3 GENE_03 0.61795 1.0000 108 # 4 GENE_04 0.39667 0.7052 101 # 5 GENE_05 0.17371 0.5214 103 # 6 GENE_06 0.90104 1.0000 94 # 7 GENE_07 0.10143 0.1188 104 # 8 GENE_08 0.78082 0.3835 96 # 9 GENE_09 0.21966 0.5364 100 # 10 GENE_10 0.25468 0.3527 100
# * Since the Plink data files used here are hard to be included in a R package, # The usage is marked by "#" to pass the package check. #library(LGRF) ############################################## # Plink data files: File.Bed, File.Bim, File.Fam # Files defining the sets: File.SetID, File.SSD, File.Info # For longitudinal data, outcome and covariates are saved in a separate file: Y, time, X. # Null model was fitted using function null.LGRF. # Create the MW File # File.Bed<-"./example.bed" # File.Bim<-"./example.bim" # File.Fam<-"./example.fam" # File.SetID<-"./example.SetID" # File.SSD<-"./example.SSD" # File.Info<-"./example.SSD.info" # Generate SSD file # To use binary ped files, you have to generate SSD file first. # If you already have a SSD file, you do not need to call this function. # Generate_SSD_SetID(File.Bed, File.Bim, File.Fam, File.SetID, File.SSD, File.Info) # SSD.INFO<-Open_SSD(File.SSD, File.Info) # Number of samples # SSD.INFO$nSample # Number of Sets # SSD.INFO$nSets ## Fit the null model # Y: outcomes, n by 1 matrix where n is the total number of observations # X: covariates, n by p matrix # time: describe longitudinal structure, n by 2 matrix # result.null<-null.LGRF(Y,time,X=cbind(X,time[,2])) # *Please note that the second column of time should be included as a covairate if # the gene by time interaction effect will be incorperated. ## Test all regions # out_all<-LGRF.SSD.All(SSD.INFO, result.null) # Example result # out.all$results # SetID P.value N.Marker # 1 GENE_01 0.6568851 94 # 2 GENE_02 0.1822183 84 # 3 GENE_03 0.3836986 108 # 4 GENE_04 0.1265337 101 # 5 GENE_05 0.3236089 103 # 6 GENE_06 0.9401741 94 # 7 GENE_07 0.1043820 104 # 8 GENE_08 0.6093275 96 # 9 GENE_09 0.6351147 100 # 10 GENE_10 0.5631549 100 ## Test all regions, and compare with GEE based MinP test # out_all<-LGRF.SSD.All(SSD.INFO, result.null,MinP.compare=T) # Example result # out.all$results # SetID P.value P.value.MinP N.Marker # 1 GENE_01 0.62842 1.0000 94 # 2 GENE_02 0.06558 0.2718 84 # 3 GENE_03 0.61795 1.0000 108 # 4 GENE_04 0.39667 0.7052 101 # 5 GENE_05 0.17371 0.5214 103 # 6 GENE_06 0.90104 1.0000 94 # 7 GENE_07 0.10143 0.1188 104 # 8 GENE_08 0.78082 0.3835 96 # 9 GENE_09 0.21966 0.5364 100 # 10 GENE_10 0.25468 0.3527 100
Test the association between an outcome variable and one region/gene using SSD format files.
LGRF.SSD.OneSet_SetIndex(SSD.INFO, SetIndex, result.null, Gsub.id=NULL, interGXT=FALSE, similarity='GR', impute.method='fixed', MinP.compare=FALSE, ...)
LGRF.SSD.OneSet_SetIndex(SSD.INFO, SetIndex, result.null, Gsub.id=NULL, interGXT=FALSE, similarity='GR', impute.method='fixed', MinP.compare=FALSE, ...)
SSD.INFO |
SSD format information file, output of function “Open_SSD". The sets are defined by this file. |
SetIndex |
Set index. From 1 to the total number of sets. |
result.null |
Output of function “null.LGRF". |
Gsub.id |
The subject id corresponding to the genotype matrix, an m dimensional vector. This is in order to match the phenotype and genotype matrix. The default is NULL, where the order is assumed to be matched with Y, X and time. |
interGXT |
Whether to incorperate the gene-time interaction effect. Incorperating this effect can improve power if there is any gene-time interaction, but has slight power loss otherwise. The default is FALSE. *Please note that the second column of time should be included as a covairate when interGXT is TRUE. |
similarity |
Choose the similarity measurement for the genetic variants. Can be either "GR" for genetic relationship or "IBS" for identity by state. The default is "GR" for better computational efficiency. |
impute.method |
Choose the imputation method when there is missing genotype. Can be "random", "fixed" or "bestguess". Given the estimated allele frequency, "random" simulates the genotype from binomial distribution; "fixed" uses the genotype expectation; "Best guess" uses the genotype with highest probability. |
MinP.compare |
Whether to compare with the GEE based minimum p-value (MinP) test. The default is FALSE. Please note that implementing the GEE based MinP test is time consuming. |
... |
Other options of the GEE based MinP test. Defined same as in function “test.MinP". |
p.value |
p-value of the LGRF test. |
n.marker |
number of tested SNPs in the SNP set. |
# * Since the Plink data files used here are hard to be included in a R package, # The usage is marked by "#" to pass the package check. #library(LGRF) ############################################## # Plink data files: File.Bed, File.Bim, File.Fam # Files defining the sets: File.SetID, File.SSD, File.Info # For longitudinal data, outcome and covariates are saved in a separate file: Y, time, X. # Null model was fitted using function null.LGRF. # Create the MW File # File.Bed<-"./example.bed" # File.Bim<-"./example.bim" # File.Fam<-"./example.fam" # File.SetID<-"./example.SetID" # File.SSD<-"./example.SSD" # File.Info<-"./example.SSD.info" # Generate SSD file # To use binary ped files, you have to generate SSD file first. # If you already have a SSD file, you do not need to call this function. # Generate_SSD_SetID(File.Bed, File.Bim, File.Fam, File.SetID, File.SSD, File.Info) # SSD.INFO<-Open_SSD(File.SSD, File.Info) # Number of samples # SSD.INFO$nSample # Number of Sets # SSD.INFO$nSets ## Fit the null model # Y: outcomes, n by 1 matrix where n is the total number of observations # X: covariates, n by p matrix # time: describe longitudinal structure, n by 2 matrix # result.null<-null.LGRF(Y,time,X=cbind(X,time[,2])) # *Please note that the second column of time should be included as a covairate if # the gene by time interaction effect will be incorperated. ## Test a single region # out_single<-LGRF.SSD.OneSet_SetIndex(SSD.INFO=SSD.INFO, SetIndex=1, # result.null=result.null, MinP.compare=F) # Example result # $p.value # [1] 0.6284 # $n.marker # [1] 94 ## Test a single region, and compare with GEE based MinP test # out_single<-LGRF.SSD.OneSet_SetIndex(SSD.INFO=SSD.INFO, SetIndex=1, # result.null=result.null,MinP.compare=T) # $p.value # LGRF MinP # [1,] 0.6284 1 # $n.marker # [1] 94
# * Since the Plink data files used here are hard to be included in a R package, # The usage is marked by "#" to pass the package check. #library(LGRF) ############################################## # Plink data files: File.Bed, File.Bim, File.Fam # Files defining the sets: File.SetID, File.SSD, File.Info # For longitudinal data, outcome and covariates are saved in a separate file: Y, time, X. # Null model was fitted using function null.LGRF. # Create the MW File # File.Bed<-"./example.bed" # File.Bim<-"./example.bim" # File.Fam<-"./example.fam" # File.SetID<-"./example.SetID" # File.SSD<-"./example.SSD" # File.Info<-"./example.SSD.info" # Generate SSD file # To use binary ped files, you have to generate SSD file first. # If you already have a SSD file, you do not need to call this function. # Generate_SSD_SetID(File.Bed, File.Bim, File.Fam, File.SetID, File.SSD, File.Info) # SSD.INFO<-Open_SSD(File.SSD, File.Info) # Number of samples # SSD.INFO$nSample # Number of Sets # SSD.INFO$nSets ## Fit the null model # Y: outcomes, n by 1 matrix where n is the total number of observations # X: covariates, n by p matrix # time: describe longitudinal structure, n by 2 matrix # result.null<-null.LGRF(Y,time,X=cbind(X,time[,2])) # *Please note that the second column of time should be included as a covairate if # the gene by time interaction effect will be incorperated. ## Test a single region # out_single<-LGRF.SSD.OneSet_SetIndex(SSD.INFO=SSD.INFO, SetIndex=1, # result.null=result.null, MinP.compare=F) # Example result # $p.value # [1] 0.6284 # $n.marker # [1] 94 ## Test a single region, and compare with GEE based MinP test # out_single<-LGRF.SSD.OneSet_SetIndex(SSD.INFO=SSD.INFO, SetIndex=1, # result.null=result.null,MinP.compare=T) # $p.value # LGRF MinP # [1,] 0.6284 1 # $n.marker # [1] 94
Before testing a specific region using a score test, this function fits the longitudinal genetic random field model under the null hypothesis.
null.LGRF(Y, time, X = NULL)
null.LGRF(Y, time, X = NULL)
Y |
The outcome variable, an n*1 matrix where n is the total number of observations |
time |
An n*2 matrix describing how the observations are measured. The first column is the subject id. The second column is the measured exam (1,2,3,etc.). |
X |
An n*p covariates matrix where p is the total number of covariates. |
It returns a list used for function test.LGRF().
library(LGRF) # Load data example # Y: outcomes, n by 1 matrix where n is the total number of observations # X: covariates, n by p matrix # time: describe longitudinal structure, n by 2 matrix # Z: genotype matrix, m by q matrix where m is the total number of subjects data(LGRF.example) Y<-LGRF.example$Y;time<-LGRF.example$time;X<-LGRF.example$X;Z<-LGRF.example$Z # Fit the null model result.null<-null.LGRF(Y,time,X=cbind(X,time[,2])) # *Please note that the second column of time should be included as a covairate if # the gene by time interaction effect will be incorperated.
library(LGRF) # Load data example # Y: outcomes, n by 1 matrix where n is the total number of observations # X: covariates, n by p matrix # time: describe longitudinal structure, n by 2 matrix # Z: genotype matrix, m by q matrix where m is the total number of subjects data(LGRF.example) Y<-LGRF.example$Y;time<-LGRF.example$time;X<-LGRF.example$X;Z<-LGRF.example$Z # Fit the null model result.null<-null.LGRF(Y,time,X=cbind(X,time[,2])) # *Please note that the second column of time should be included as a covairate if # the gene by time interaction effect will be incorperated.
Once the model under the null model is fitted using "null.LGRF()", this function tests a specifc region/gene.
test.LGRF(Z, result.null, Gsub.id=NULL, interGXT = FALSE, similarity = "GR", impute.method="fixed")
test.LGRF(Z, result.null, Gsub.id=NULL, interGXT = FALSE, similarity = "GR", impute.method="fixed")
Z |
Genetic variants in the target region/gene, an m*q matrix where m is the subject ID and q is the total number of genetic variables. Note that the number of rows in Z should be same as the number of subjects. |
result.null |
The output of function "null.LGRF()" |
Gsub.id |
The subject id corresponding to the genotype matrix, an m dimensional vector. This is in order to match the phenotype and genotype matrix. The default is NULL, where the order is assumed to be matched with Y, X and time. |
interGXT |
Whether to incorperate the gene-time interaction effect. Incorperating this effect can improve power if there is any gene-time interaction, but has slight power loss otherwise. The default is FALSE. *Please note that the second column of time should be included as a covairate when interGXT is TRUE. |
similarity |
Choose the similarity measurement for the genetic variants. Can be either "GR" for genetic relationship or "IBS" for identity by state. The default is "GR" for better computational efficiency. |
impute.method |
Choose the imputation method when there is missing genotype. Can be "random", "fixed" or "bestguess". Given the estimated allele frequency, "random" simulates the genotype from binomial distribution; "fixed" uses the genotype expectation; "Best guess" uses the genotype with highest probability. |
p.value |
p-value of the LGRF test. |
n.marker |
number of tested SNPs in the SNP set. |
## null.LGRF fits the null model. # Input: Y, time, X (covariates) ## test.LGRF tests a region and give p-value. # Input: Z (genetic variants) and result of null.longGRF library(LGRF) # Load data example # Y: outcomes, n by 1 matrix where n is the total number of observations # X: covariates, n by p matrix # time: describe longitudinal structure, n by 2 matrix # Z: genotype matrix, m by q matrix where m is the total number of subjects data(LGRF.example) Y<-LGRF.example$Y;time<-LGRF.example$time;X<-LGRF.example$X;Z<-LGRF.example$Z # Fit the null model result.null<-null.LGRF(Y,time,X=cbind(X,time[,2])) # *Please note that the second column of time should be included as a covairate if # the gene by time interaction effect will be incorperated. # The LGRF-G test pLGRF_G<-test.LGRF(Z,result.null) # The LGRF-GT test pLGRF_GT<-test.LGRF(Z,result.null,interGXT=TRUE) # The LGRF-G test using the IBS similarity pLGRF_G_IBS<-test.LGRF(Z,result.null,similarity="IBS") # The LGRF-GT test, main effect is modeled using the IBS similarity pLGRF_GT_IBS<-test.LGRF(Z,result.null,interGXT=TRUE,similarity="IBS")
## null.LGRF fits the null model. # Input: Y, time, X (covariates) ## test.LGRF tests a region and give p-value. # Input: Z (genetic variants) and result of null.longGRF library(LGRF) # Load data example # Y: outcomes, n by 1 matrix where n is the total number of observations # X: covariates, n by p matrix # time: describe longitudinal structure, n by 2 matrix # Z: genotype matrix, m by q matrix where m is the total number of subjects data(LGRF.example) Y<-LGRF.example$Y;time<-LGRF.example$time;X<-LGRF.example$X;Z<-LGRF.example$Z # Fit the null model result.null<-null.LGRF(Y,time,X=cbind(X,time[,2])) # *Please note that the second column of time should be included as a covairate if # the gene by time interaction effect will be incorperated. # The LGRF-G test pLGRF_G<-test.LGRF(Z,result.null) # The LGRF-GT test pLGRF_GT<-test.LGRF(Z,result.null,interGXT=TRUE) # The LGRF-G test using the IBS similarity pLGRF_G_IBS<-test.LGRF(Z,result.null,similarity="IBS") # The LGRF-GT test, main effect is modeled using the IBS similarity pLGRF_GT_IBS<-test.LGRF(Z,result.null,interGXT=TRUE,similarity="IBS")
If users want to compare LGRF with the minimum p-value (MinP) test, this function tests a specifc region/gene by a GEE based minimum p-value test after fitting "null.LGRF()".
test.MinP(Z, result.null, Gsub.id=NULL, corstr="exchangeable", MinP.adjust=0.95, impute.method="fixed")
test.MinP(Z, result.null, Gsub.id=NULL, corstr="exchangeable", MinP.adjust=0.95, impute.method="fixed")
Z |
Genetic variants in the target region/gene, an m*q matrix where m is the subject ID and q is the total number of genetic variables. Note that the number of rows in Z should be same as the number of subject. |
result.null |
The output of function "null.LGRF()". |
Gsub.id |
The subject id corresponding to the genotype matrix, an m dimensional vector. This is in order to match the phenotype and genotype matrix. The default is NULL, where the order is assumed to be matched with Y, X and time. |
corstr |
The working correlation as specified in 'geeglm'. The following are permitted: '"independence"', '"exchangeable"', '"ar1"', '"unstructured"' and '"userdefined". |
MinP.adjust |
The minimum p-value is adjusted by the number of independent tests. Choose the adjustment thereshold as specified in Gao, et al. (2008) "A multiple testing correction method for genetic association studies using correlated single nucleotide polymorphisms". Values from 0 to 1 are permitted. |
impute.method |
Choose the imputation method when there is missing genotype. Can be "random", "fixed" or "bestguess". Given the estimated allele frequency, "random" simulates the genotype from binomial distribution; "fixed" uses the genotype expectation; "Best guess" uses the genotype with highest probability. |
p.value |
p-value of the MinP test. |
n.marker |
number of tested SNPs in the SNP set. |
## null.LGRF fits the null model. # Input: Y, time, X (covariates) ## test.MinP tests a region and give p-value. # Input: Z (genetic variants) and result of null.longGRF library(LGRF) # Load data example # Y: outcomes, n by 1 matrix where n is the total number of observations # X: covariates, n by p matrix # time: describe longitudinal structure, n by 2 matrix # Z: genotype matrix, m by q matrix where m is the total number of subjects data(LGRF.example) Y<-LGRF.example$Y;time<-LGRF.example$time;X<-LGRF.example$X;Z<-LGRF.example$Z # Fit the null model result.null<-null.LGRF(Y,time,X=X) # The minimum p-value test based on GEE pMinP<-test.MinP(Z,result.null,corstr="exchangeable",MinP.adjust=0.95)
## null.LGRF fits the null model. # Input: Y, time, X (covariates) ## test.MinP tests a region and give p-value. # Input: Z (genetic variants) and result of null.longGRF library(LGRF) # Load data example # Y: outcomes, n by 1 matrix where n is the total number of observations # X: covariates, n by p matrix # time: describe longitudinal structure, n by 2 matrix # Z: genotype matrix, m by q matrix where m is the total number of subjects data(LGRF.example) Y<-LGRF.example$Y;time<-LGRF.example$time;X<-LGRF.example$X;Z<-LGRF.example$Z # Fit the null model result.null<-null.LGRF(Y,time,X=X) # The minimum p-value test based on GEE pMinP<-test.MinP(Z,result.null,corstr="exchangeable",MinP.adjust=0.95)