Title: | Meta Analysis for SNP-Set (Sequence) Kernel Association Test |
---|---|
Description: | Functions for Meta-analysis Burden Test, Sequence Kernel Association Test (SKAT) and Optimal SKAT (SKAT-O) by Lee et al. (2013) <doi:10.1016/j.ajhg.2013.05.010>. These methods use summary-level score statistics to carry out gene-based meta-analysis for rare variants. |
Authors: | Seunggeun (Shawn) Lee [aut, cre] |
Maintainer: | Seunggeun (Shawn) Lee <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.90 |
Built: | 2024-12-08 07:16:03 UTC |
Source: | CRAN |
Example dataset
This example dataset has the following objects:
a list object of binary phenotypes. It has 3 elements for 3 study cohorts. Each element is a vector of binary phenotypes.
a list object of covariates. It has 3 elements for 3 study cohorts. Each element is a matrix of covariates. The first and last elements have two covariates (two columns), and the second element has one covariate (one column).
a numeric value of the number of cohorts (3).
a list object of genotypes of all samples. It has 10 elements for 10 genes. Each element is an nxp matrix with n being the total sample size (3000) and p being the number of SNPs.
Generate Meta SSD (MSSD) and Meta Info (MInfo) files. Both files are needed to run MetaSKAT with summary statistics.
Generate_Meta_Files(obj, File.Bed, File.Bim , File.SetID, File.MSSD, File.MInfo, N.Sample , File.Permu = NULL, data=NULL, impute.method="fixed") Generate_Meta_Files_FromDosage(obj, File.Dosage , File.SetID, File.MSSD, File.MInfo, N.Sample , File.Permu=NULL, data=NULL, impute.method="fixed")
Generate_Meta_Files(obj, File.Bed, File.Bim , File.SetID, File.MSSD, File.MInfo, N.Sample , File.Permu = NULL, data=NULL, impute.method="fixed") Generate_Meta_Files_FromDosage(obj, File.Dosage , File.SetID, File.MSSD, File.MInfo, N.Sample , File.Permu=NULL, data=NULL, impute.method="fixed")
obj |
returned object from SKAT_Null_Model. |
File.Bed |
name of the binary ped file (BED). |
File.Bim |
name of the binary bim file (BIM). |
File.SetID |
name of the SNP set ID file. The first column must be Set ID, and the second column must be SNP ID. There should be no header!! |
File.MSSD |
name of the MSSD file that will be generated. |
File.MInfo |
name of the MInfo file that will be generated. |
N.Sample |
number of samples. |
File.Permu |
name of a file that will have score statistics from permuted phenotypes (currently internal use only). |
data |
an optional data frame containing the variables in the model (default=NULL). If it is NULL, the variables are taken from environment(formula). |
impute.method |
a method to impute missing genotypes (default= "fixed"). "bestguess" imputes missing genotypes as the most likely values(0,1,2), "random" imputes missing genotypes by generating binomial(2,p) random variables (p = MAF), and "fixed" imputes missing genotypes by assigning the mean genotype values (2p). |
File.Dosage |
name of the dosage file. The dosage file must not have a header. |
These functions generate summary statistic files (MSSD and MInfo files) from plink binary files. To run meta analysis, each study should provide both MSSD and MInfo files. The MSSD is a binary file with between-SNP information matrices, and MInfo is a text file with information on study cohorts and SNPsets.
If users want to use dosages instead of hard call genotypes, Generate_Meta_Files_FromDosage should be used instead of Generate_Meta_Files. The dosage file should follow the plink dosage file format with a single dosage value per each SNP (Format=1 in plink). The first three columns should be SNP ID, allele type1 (a1) and allele type2 (a2). After the first three columns, there should be N.Sample columns of dosage data. Each column represents each sample, and the order of samples should be matched with the order in phenotypes and covariates used in SKAT_Null_Model.
ex)
rs0001 A T 0.1 0.2
rs0002 C G 1.2 0
Dosage value is the expected number of a2 copies, and 0 .. 2 scale. So the value 0.1 indicates that the expected number of copy of a2 is 0.1.
Seunggeun Lee
Compute model parameters and residuals under the null model (H0) of no associations. It can be used only when individual level data are available.
Meta_Null_Model(y.list, x.list, n.cohort, out_type="C", n.Resampling=0) Meta_Null_Model_EmmaX(y.list, x.list, n.cohort, K.list=NULL, Kin.File.list=NULL)
Meta_Null_Model(y.list, x.list, n.cohort, out_type="C", n.Resampling=0) Meta_Null_Model_EmmaX(y.list, x.list, n.cohort, K.list=NULL, Kin.File.list=NULL)
y.list |
a list object for phenotypes. Each element should be a vector of phenotypes. If you have 3 cohorts, it should have 3 elements. |
x.list |
a list object for covariates. Each element should be a vector or a matrix of covariates. If there are 3 cohorts, it should have 3 elements. If there are no covariates to adjust for, the element should be "intercept". See the examples. |
n.cohort |
a numeric value of the number of cohort. |
out_type |
an indicator for the outcome type. "C" for continuous outcomes and "D" for dichotomous outcomes. |
n.Resampling |
internal use only. |
K.list |
a list object of the kinship matrices. If K.list=NULL, the function reads files in Kin.File.list to obtain kinship matrices. |
Kin.File.list |
a list object of emmax-kin output file names. |
It returns an object that has model parameters and residuals. The returned object will be used to run MetaSKAT_wZ.
Seunggeun Lee
data(Example) attach(Example) ############################################################# # Compute a p-value of the first gene obj<-Meta_Null_Model(y.list, x.list, n.cohort=3, out_type="D") MetaSKAT_wZ(Z.list[[1]], obj)$p.value ############################################################# # If you want to use the intercept-only model for the 2nd cohort x.list[[2]]<-"intercept" obj<-Meta_Null_Model(y.list, x.list, n.cohort=3, out_type="D") MetaSKAT_wZ(Z.list[[1]], obj)$p.value
data(Example) attach(Example) ############################################################# # Compute a p-value of the first gene obj<-Meta_Null_Model(y.list, x.list, n.cohort=3, out_type="D") MetaSKAT_wZ(Z.list[[1]], obj)$p.value ############################################################# # If you want to use the intercept-only model for the 2nd cohort x.list[[2]]<-"intercept" obj<-Meta_Null_Model(y.list, x.list, n.cohort=3, out_type="D") MetaSKAT_wZ(Z.list[[1]], obj)$p.value
Meta analysis SKAT with Meta SSD (MSSD) and Info (MInfo) files. MetaSKAT_MSSD_OneSet computes a p-value for a given set, and MetaSKAT_MSSD_ALL computes p-values for all sets.
MetaSKAT_MSSD_OneSet(Cohort.Info, SetID, combined.weight=TRUE, weights.beta=c(1,25), method="davies", r.corr=0, is.separate = FALSE, Group_Idx=NULL, MAF.cutoff=1, missing_cutoff=0.15) MetaSKAT_MSSD_ALL(Cohort.Info, ...)
MetaSKAT_MSSD_OneSet(Cohort.Info, SetID, combined.weight=TRUE, weights.beta=c(1,25), method="davies", r.corr=0, is.separate = FALSE, Group_Idx=NULL, MAF.cutoff=1, missing_cutoff=0.15) MetaSKAT_MSSD_ALL(Cohort.Info, ...)
Cohort.Info |
output object from Open_MSSD_File_2Read. |
SetID |
a character value of set id to test. |
combined.weight |
a logical value (default=TRUE) for a type of weighting. See MetaSKAT_wZ page for details. |
weights.beta |
a numeric vector of parameters for the beta weights (default=c(1,25)) |
method |
a method to compute a p-value (default= "davies"). See MetaSKAT_wZ page for details. |
r.corr |
the |
is.separate |
a logical value (default=FALSE) for homogeneous(=FALSE) or heterogeneous(=TRUE) genetic effects of a SNP set across studies. See MetaSKAT_wZ page for details. |
Group_Idx |
a vector of group indicator (default=NULL). See MetaSKAT_wZ page for details. |
MAF.cutoff |
a cutoff of the MAFs of SNPs (default=1). Any SNPs with MAFs > MAF.cutoff will be excluded from the analysis. |
missing_cutoff |
a cutoff of the missing rates of SNPs (default=0.15). See MetaSKAT_wZ page for details. |
... |
the same parameters of MetaSKAT_MSSD_OneSet after SetID. |
Please see MetaSKAT_wZ for details.
MetaSKAT_MSSD_OneSet and MetaSKAT_wZ return the same object. See MetaSKAT_wZ for details. MetaSKAT_MSSD_ALL returns a dataframe with SetIDs (first column) and p-values (second column).
Seunggeun Lee
Lee, S., Teslovich, T., Boehnke, M., Lin, X. (2013) General framework for meta-analysis of rare variants in sequencing association studies. American Journal of Human Genetics, 93, 42-53.
Meta analysis SKAT with individual level genotype data.
MetaSKAT_wZ(Z, obj, combined.weight=TRUE, weights.beta=c(1,25), method="davies", r.corr=0, is.separate = FALSE, Group_Idx=NULL, impute.method="fixed",impute.estimate.maf=1, missing_cutoff=0.15)
MetaSKAT_wZ(Z, obj, combined.weight=TRUE, weights.beta=c(1,25), method="davies", r.corr=0, is.separate = FALSE, Group_Idx=NULL, impute.method="fixed",impute.estimate.maf=1, missing_cutoff=0.15)
Z |
a numeric genotype matrix with each row as a different individual and each column as a separate snp. Each genotype should be coded as 0, 1, 2, and 9 (or NA) for AA, Aa, aa, and missing, where A is a major allele and a is a minor allele. Missing genotypes will be imputed using observed MAFs. |
obj |
an output object from the Meta_Null_Model function. |
combined.weight |
a logical value (default=TRUE) for the type of weighting. If it is TRUE, a weight for each SNP is computed using MAFs that are common across studies. If it is FALSE, group specific weights will be used based on group specific MAFs. |
weights.beta |
a numeric vector of parameters of beta weights (default=c(1,25)) |
method |
a method to compute a p-value (default= "davies"). "davies" represents an exact method that computes a p-value by inverting the characteristic function of the mixture chisq dist., and "optimal" represents the optimal test (SKAT-O) that is based on an optimal linear combination of burden and SKAT statistics. "optimal" is equivalent to "optimal.adj" in the SKAT function. |
r.corr |
the |
is.separate |
a logical value (default=FALSE) for homogeneous(=FALSE) or heterogeneous(=TRUE) genetic effects of a SNP set across studies. When FALSE, it is assumed that all studies share the same causal variants with the same effect size. When TRUE, it is assumed that studies/groups may have different causal variants. |
Group_Idx |
a vector of group indicator (default=NULL). If a vector of integers are specified, it assumes causal variants are the same for studies with the same group index, and different for studies with different group indexes. When NULL, studies are assumed to be in different groups with different group indexes. When is.separate=FALSE, it will be ignored. |
impute.method |
a method to impute missing genotypes (default= "fixed"). "bestguess" imputes missing genotypes as the most likely values(0,1,2), "random" imputes missing genotypes by generating binomial(2,p) random variables (p = MAF), and "fixed" imputes missing genotypes by assigning the mean genotype value (2p). |
impute.estimate.maf |
a numeric value indicating how to estimate MAFs for the imputation. If impute.estimate.maf=1 (default), MetaSKAT uses study-specific MAFs, in which each study MAFs will be used for the imputation. If impute.estimate.maf=2, all samples in the Z matrix will be used to calculate MAFs for the imputation. If impute.estimate.maf=3, MetaSKAT uses group-specific MAFs. Previous versions (< ver 0.6) used impute.estimate.maf=2 as a default. |
missing_cutoff |
a cutoff of the missing rates of SNPs (default=0.15). If the first study has SNPs with missing rates higher than the cutoff, these SNPs in the study will be excluded from the analysis. However, the same SNPs in other studies will not be excluded, if their missing rates are lower than the cutoff. The missing rates are calculated study by study. |
The rows of Z should be matched with phenotypes and covariates. If there are 3 studies, and study 1,2, and 3 have n1, n2, and n3 samples, the first n1, n2, and n3 rows of Z should be genotypes of the first, second, and third studies, respectively.
Group_Idx is a vector of group index. Suppose the first two studies are European-based and the last study is African American-based. If you want to run MetaSKAT with assuming ancestry group specific heterogeneity, you can set Group_Idx=c(1,1,2), which indicates the first two studies belong to the same group.
The four methods in the MetaSKAT paper can be run with the following parameters:
Hom-Meta-SKAT: combined.weight=TRUE, is.separate=FALSE
Hom-Meta-SKAT-O: combined.weight=TRUE, is.separate=FALSE, method="optimal"
Het-Meta-SKAT: combined.weight=FALSE, is.separate=TRUE
Het-Meta-SKAT-O: combined.weight=FALSE, is.separate=TRUE, method="optimal"
p.value |
p-value. |
param |
estimated parameters of each method. |
param$Is_Converged |
(only with method="davies") an indicator for the convergence. 1 indicates convergence and 0 otherwise. When 0 (not converged), "liu" method will used to compute a p-value. |
Seunggeun Lee
data(Example) attach(Example) ############################################################# # Compute a p-value of the first gene obj<-Meta_Null_Model(y.list, x.list, n.cohort=3, out_type="D") # rho=0 MetaSKAT_wZ(Z.list[[1]], obj)$p.value # rho=1 (burden test) MetaSKAT_wZ(Z.list[[1]], obj, r.corr=1)$p.value # optimal test MetaSKAT_wZ(Z.list[[1]], obj, method="optimal")$p.value # cohort specific weights MetaSKAT_wZ(Z.list[[1]], obj, combined.weight=FALSE)$p.value # Seperate = TRUE # Assume heterogeneous genetic effect MetaSKAT_wZ(Z.list[[1]], obj, combined.weight=FALSE, is.separate = TRUE)$p.value # Group # the first two cohorts are in the same group. Group_Idx=c(1,1,2) MetaSKAT_wZ(Z.list[[1]], obj, combined.weight=FALSE, is.separate = TRUE,Group_Idx=Group_Idx)$p.value # all three cohorts are in different group. Group_Idx=c(1,2,3) MetaSKAT_wZ(Z.list[[1]], obj, combined.weight=FALSE, is.separate = TRUE,Group_Idx=Group_Idx)$p.value
data(Example) attach(Example) ############################################################# # Compute a p-value of the first gene obj<-Meta_Null_Model(y.list, x.list, n.cohort=3, out_type="D") # rho=0 MetaSKAT_wZ(Z.list[[1]], obj)$p.value # rho=1 (burden test) MetaSKAT_wZ(Z.list[[1]], obj, r.corr=1)$p.value # optimal test MetaSKAT_wZ(Z.list[[1]], obj, method="optimal")$p.value # cohort specific weights MetaSKAT_wZ(Z.list[[1]], obj, combined.weight=FALSE)$p.value # Seperate = TRUE # Assume heterogeneous genetic effect MetaSKAT_wZ(Z.list[[1]], obj, combined.weight=FALSE, is.separate = TRUE)$p.value # Group # the first two cohorts are in the same group. Group_Idx=c(1,1,2) MetaSKAT_wZ(Z.list[[1]], obj, combined.weight=FALSE, is.separate = TRUE,Group_Idx=Group_Idx)$p.value # all three cohorts are in different group. Group_Idx=c(1,2,3) MetaSKAT_wZ(Z.list[[1]], obj, combined.weight=FALSE, is.separate = TRUE,Group_Idx=Group_Idx)$p.value
Read Meta SSD (MSSD) and Meta Info (MInfo) files.
Open_MSSD_File_2Read(File.MSSD.vec, File.MInfo.vec)
Open_MSSD_File_2Read(File.MSSD.vec, File.MInfo.vec)
File.MSSD.vec |
a vector of MSSD files. Each element represents a MSSD file of each study. |
File.MInfo.vec |
a vector of MInfo files. Each element represents a MInfo file of each study. |
Users should open MSSD and MInfo files to run MetaSKAT_MSSD_OneSet or MetaSKAT_MSSD_ALL. If all individual level data are available, use MetaSKAT_wZ instead.
This function returns an data object. The returned object will be used to run MetaSKAT_MSSD_OneSet or MetaSKAT_MSSD_ALL.
Seunggeun Lee