Title: | Association Tests for Microbial Lineages on a Taxonomic Tree |
---|---|
Description: | A variety of association tests for microbiome data analysis, including the Quasi-Conditional Association Tests (QCAT) described in Tang Z.-Z. et al. (2017) <doi:10.1093/bioinformatics/btw804> and the Zero-Inflated Generalized Dirichlet Multinomial (ZIGDM) tests described in Tang Z.-Z. & Chen G. (2017, submitted). |
Authors: | Zheng-Zheng Tang |
Maintainer: | Zheng-Zheng Tang <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.1 |
Built: | 2024-12-06 06:45:43 UTC |
Source: | CRAN |
A variety of association tests for microbiome data analysis described in Tang Z.-Z. et al. (2017) <doi:10.1093/bioinformatics/btw804> and Tang Z.-Z. & Chen G. (2017, submitted). miLineage allows users to (a) perform tests on multivariate taxon counts; (b) localize the covariate-associated lineages on the taxonomic tree; and (c) assess the overall association of the microbial community with the covariate of interest.
Package: | miLineage |
Type: | Package |
Version: | 2.1 |
Date: | 2018-03-09 |
License: | GPL (>=2) |
QCAT, QCAT_GEE, ZIGDM
Zheng-Zheng Tang
Maintainer: Zheng-Zheng Tang <[email protected]>
1. Tang, Z.-Z., Chen, G., Alekseyenko, A. V., and Li, H. (2017). A General Framework for Association Analysis of Microbial Communities on a Taxonomic Tree. Bioinformatics, 33(9), 1278-1285.
2. Tang, Z.-Z. and Chen, G. (2017). Zero-Inflated Generalized Dirichlet Multinomial Regression Model for Microbiome Compositional Data Analysis. Submitted.
data(data.toy)
OTU.toy = data.toy$OTU.toy
Tax.toy = data.toy$Tax.toy
case = data.toy$covariate.toy
# the OTUs should be consistent between the OTU table and the taxonomy table
OTU.toy.reorder = OTU.toy[,match(rownames(Tax.toy), colnames(OTU.toy))]
# perform QCAT test for detecting differential mean
QCAT(OTU.toy.reorder, case, 1, Tax.toy, fdr.alpha=0.05)
# perform ZIGDM test for detecting differential dispersion
ZIGDM(OTU.toy.reorder, NULL, NULL, case, test.type = "Disp", 1, ZI.LB = 10, Tax.toy, fdr.alpha = 0.05)
The data are derived from a real gut microbiome study. The data contain an OTU table, a taxonomy table, and a covariate table. The data include 96 samples and 80 OTUs.
data("data.real")
data.real contains the following objects:
OTU.real: a matrix of OTU counts for 96 samples and 80 OTUs
Tax.real: a taxonomy matrix with ranks from Rank1 (kingdom level) to Rank6 (genus level)
covariate.real: a matrix of three variables
Wu, Gary D., et al. "Linking long-term dietary patterns with gut microbial enterotypes." Science 334.6052 (2011): 105-108.
data(data.real)
The data are simulated based on a mock taxonomic tree. The data contain an OTU table, a taxonomy table, and a covariate. The data include 50 samples (25 case-control pairs) and 7 OTUs.
data("data.toy")
data.toy contains the following objects:
OTU.toy: a matrix of OTU counts for 50 samples and 7 OTUs
Tax.toy: a taxonomy matrix with 3 ranks
covariate.toy: a matrix of one variable
This is a simulated data set based on a mock taxonomic tree.
data(data.toy)
This function allows users to (a) perform QCAT on multivariate taxon counts; (b) perform QCAT on the taxonomic tree to localize the covariate-associated lineages; and (c) assess the overall association of the microbial community with the covariate of interest.
QCAT(OTU, X, X.index, Tax = NULL, min.depth = 0, n.resample = NULL, fdr.alpha = 0.05)
OTU | a numeric matrix of counts in which each row corresponds to a sample and each column corresponds to an OTU or a taxon. Column names are mandatory. No missing values are allowed. |
X | a numeric matrix of covariates in which each column pertains to one variable. The samples in the OTU and X matrices should be identical and in the same order. No missing values are allowed. |
X.index | a vector indicating the columns in X for the covariate(s) of interest. The remaining columns in X will be treated as confounders. |
Tax | a matrix defining the taxonomic ranks, in which each row corresponds to an OTU or a taxon and each column corresponds to a rank (starting from the highest taxonomic level, e.g., from kingdom to genus). Row names are mandatory and should be consistent with the column names of the OTU table. Column names should be formatted as "Rank1", "Rank2", etc., representing the taxonomic levels from highest to lowest. |
min.depth | lower bound for sample read depth. Samples with read depth less than min.depth will be removed before the analysis. |
n.resample | number of resamplings/permutations. |
fdr.alpha | desired false discovery rate for multiple tests on the lineages. |
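The input requirements above can be checked before calling the test. The following is a minimal sketch in base R on synthetic data; the object names (OTU.aligned, confounder.index) and the toy values are illustrative assumptions, not part of miLineage.

```r
# Synthetic inputs; all names and values here are illustrative.
set.seed(1)
OTU <- matrix(rpois(5 * 3, lambda = 20), nrow = 5,
              dimnames = list(paste0("S", 1:5), c("otu3", "otu1", "otu2")))
Tax <- matrix(c("k1", "k1", "k1",    # Rank1 column
                "p1", "p1", "p2"),   # Rank2 column
              nrow = 3,
              dimnames = list(c("otu1", "otu2", "otu3"), c("Rank1", "Rank2")))
X <- cbind(case = c(0, 1, 0, 1, 0), age = c(30, 41, 25, 58, 37))
X.index <- 1

# Reorder the OTU columns to follow the row order of the taxonomy table.
OTU.aligned <- OTU[, match(rownames(Tax), colnames(OTU)), drop = FALSE]

# Checks mirroring the documented requirements.
stopifnot(
  is.numeric(OTU.aligned), !anyNA(OTU.aligned),
  identical(colnames(OTU.aligned), rownames(Tax)),
  identical(colnames(Tax), paste0("Rank", seq_len(ncol(Tax)))),
  is.numeric(X), !anyNA(X), nrow(X) == nrow(OTU.aligned)
)

# Columns of X not listed in X.index are the ones treated as confounders.
confounder.index <- setdiff(seq_len(ncol(X)), X.index)
```

With these inputs, column 1 (case) is the covariate of interest and column 2 (age) is the confounder.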
If Tax=NULL (Default), a test is performed using all the OTUs/taxa.
If Tax is provided, tests are performed for lineages derived from the taxonomic hierarchy. The output is a list that contains 3 components:
lineage.pval | p-values for all lineages. By default (n.resample=NULL), only the asymptotic test will be performed. |
sig.lineage | a vector of significant lineages |
global.pval | p-values of the global tests |
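To make "lineages derived from the taxonomic hierarchy" concrete, the sketch below aggregates OTU counts to a higher rank in base R. It assumes that a lineage is a parent taxon together with the counts of its direct child taxa, which this page does not spell out; it is not the package's internal code, and all object names are illustrative.

```r
OTU <- matrix(c(5, 0, 2,
                3, 1, 0,
                0, 4, 6),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("S", 1:3), c("otu1", "otu2", "otu3")))
Tax <- matrix(c("k1", "k1", "k1",    # Rank1 column
                "p1", "p1", "p2"),   # Rank2 column
              nrow = 3,
              dimnames = list(colnames(OTU), c("Rank1", "Rank2")))

# Sum OTU counts within each Rank2 taxon: rowsum() groups the rows of t(OTU).
rank2.counts <- t(rowsum(t(OTU), group = Tax[, "Rank2"]))

# Assumed lineage rooted at the Rank1 taxon "k1": its Rank2 children.
children <- unique(Tax[Tax[, "Rank1"] == "k1", "Rank2"])
lineage.k1 <- rank2.counts[, children, drop = FALSE]
```

Here lineage.k1 is a samples-by-children count matrix, the kind of multivariate taxon count table a per-lineage test would operate on under this assumption.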
Zheng-Zheng Tang
Tang, Z.-Z., Chen, G., Alekseyenko, A. V., and Li, H. (2017). A General Framework for Association Analysis of Microbial Communities on a Taxonomic Tree. Bioinformatics, 33(9), 1278-1285.
data(data.toy)
OTU.toy = data.toy$OTU.toy
Tax.toy = data.toy$Tax.toy
case = data.toy$covariate.toy
# the OTUs should be consistent between the OTU table and the taxonomy table
OTU.toy.reorder = OTU.toy[,match(rownames(Tax.toy), colnames(OTU.toy))]
# perform QCAT
QCAT(OTU.toy.reorder, case, 1, Tax.toy, fdr.alpha=0.05)
This function performs the Quasi-Conditional Association Test (QCAT) on the positive taxon counts and a Generalized Estimating Equation (GEE)-based association test on the zero counts, then combines the two into a two-part test. It allows users to (a) perform QCAT+GEE on multivariate taxon counts; (b) perform QCAT+GEE on the taxonomic tree to localize the covariate-associated lineages; and (c) assess the overall association of the microbial community with the covariate of interest.
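The split into a zero part and a positive part can be pictured on a small count matrix. This base R sketch only shows the two views of the data; it does not reproduce either test, and how the package combines the two p-values is not described here, so that step is omitted. All object names are illustrative.

```r
OTU <- matrix(c(5, 0, 5,
                0, 3, 1,
                2, 2, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("S", 1:3), paste0("otu", 1:3)))

# Zero part: 1 where a taxon is absent in a sample, 0 otherwise.
is.zero <- (OTU == 0) * 1

# Positive part: the nonzero counts only, with zeros masked as NA.
positive <- OTU
positive[OTU == 0] <- NA

# Proportion of zeros per taxon, the quantity modeled in the zero part.
zero.prop <- colMeans(is.zero)
```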
QCAT_GEE(OTU, X, X.index, Z, Z.index, Tax = NULL, min.depth = 0, n.resample = NULL, fdr.alpha = 0.05)
OTU | a numeric matrix of counts in which each row corresponds to a sample and each column corresponds to an OTU or a taxon. Column names are mandatory. No missing values are allowed. |
X | a numeric matrix of covariates for the positive-part test in which each column pertains to one variable. The samples in the OTU and X matrices should be identical and in the same order. No missing values are allowed. |
X.index | a vector indicating the columns in X for the covariate(s) of interest in the positive-part test. The remaining columns in X will be treated as confounders in modeling the positive abundance. |
Z | a numeric matrix of covariates for the zero-part test in which each column pertains to one variable. The samples in the OTU and Z matrices should be identical and in the same order. No missing values are allowed. |
Z.index | a vector indicating the columns in Z for the covariate(s) of interest in the zero-part test. The remaining columns in Z will be treated as confounders in modeling the proportion of zeros. |
Tax | a matrix defining the taxonomic ranks, in which each row corresponds to an OTU or a taxon and each column corresponds to a rank (starting from the highest taxonomic level, e.g., from kingdom to genus). Row names are mandatory and should be consistent with the column names of the OTU table. Column names should be formatted as "Rank1", "Rank2", etc., representing the taxonomic levels from highest to lowest. |
min.depth | lower bound for sample read depth. Samples with read depth less than min.depth will be removed before the analysis. |
n.resample | number of resamplings/permutations. |
fdr.alpha | desired false discovery rate for multiple tests on the lineages. |
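The effect of min.depth can be previewed before running a test. The sketch below applies the documented rule (samples with read depth less than min.depth are removed) in base R on synthetic counts; it is an illustration of the rule, not the package's own filtering code, and the object names are illustrative.

```r
OTU <- matrix(c(120, 30, 50,
                  2,  1,  0,
                 80, 60, 10),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("S", 1:3), paste0("otu", 1:3)))
X <- cbind(case = c(0, 1, 1))
min.depth <- 100

# Read depth is the total count per sample (row).
depth <- rowSums(OTU)

# Keep samples whose depth is at least min.depth.
keep <- depth >= min.depth

# Filter the count and covariate matrices together so rows stay matched.
OTU.kept <- OTU[keep, , drop = FALSE]
X.kept <- X[keep, , drop = FALSE]
```

Filtering the covariate matrices with the same logical index keeps the samples identical and in the same order across inputs.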
If Tax=NULL (Default), tests are performed using all the OTUs/taxa.
If Tax is provided, tests are performed for lineages derived from the taxonomic hierarchy. The output is a list that contains 3 components:
lineage.pval | a list of zero-part, positive-part and two-part (combined) p-values for all lineages. By default (n.resample=NULL), only the asymptotic test will be performed. |
sig.lineage | a list of significant lineages based on the zero-part, positive-part and two-part p-values |
global.pval | p-values of the global tests |
Zheng-Zheng Tang
Tang, Z.-Z., Chen, G., Alekseyenko, A. V., and Li, H. (2017). A General Framework for Association Analysis of Microbial Communities on a Taxonomic Tree. Bioinformatics, 33(9), 1278-1285.
data(data.toy)
OTU.toy = data.toy$OTU.toy
Tax.toy = data.toy$Tax.toy
case = data.toy$covariate.toy
# the OTUs should be consistent between the OTU table and the taxonomy table
OTU.toy.reorder = OTU.toy[,match(rownames(Tax.toy), colnames(OTU.toy))]
# perform QCAT+GEE test
QCAT_GEE(OTU.toy.reorder, case, 1, case, 1, Tax.toy, fdr.alpha=0.05)
Unlike the distribution-free QCAT and QCAT+GEE, the ZIGDM tests are based on a parametric model (ZIGDM) for multivariate taxon counts. The ZIGDM tests can detect not only differential mean but also differential dispersion level or presence-absence frequency in microbial compositions. This function allows users to (a) perform ZIGDM tests on multivariate taxon counts; (b) perform ZIGDM tests on the taxonomic tree to localize the covariate-associated lineages; and (c) assess the overall association of the microbial community with the covariate of interest.
ZIGDM(OTU, X4freq, X4mean, X4disp, test.type = "Mean", X.index, ZI.LB = 10, Tax = NULL, min.depth = 0, n.resample = NULL, fdr.alpha = 0.05)
OTU | a numeric matrix of counts in which each row corresponds to a sample and each column corresponds to an OTU or a taxon. Column names are mandatory. No missing values are allowed. |
X4freq | a numeric matrix of covariates linked to the presence-absence frequency in microbial compositions. Each column pertains to one variable. Set X4freq=NULL if only an intercept term is needed. The samples in the OTU and X4freq matrices should be identical and in the same order. No missing values are allowed. |
X4mean | a numeric matrix of covariates linked to the mean abundance in microbial compositions. Each column pertains to one variable. Set X4mean=NULL if only an intercept term is needed. The samples in the OTU and X4mean matrices should be identical and in the same order. No missing values are allowed. |
X4disp | a numeric matrix of covariates linked to the dispersion level in microbial compositions. Each column pertains to one variable. Set X4disp=NULL if only an intercept term is needed. The samples in the OTU and X4disp matrices should be identical and in the same order. No missing values are allowed. |
test.type | If test.type = "Mean" (default), the function tests for differential mean. If test.type = "Disp", the function tests for differential dispersion. If test.type = "Freq", the function tests for differential presence-absence frequency. |
X.index | If test.type = "Mean", X.index is a vector indicating the columns in X4mean for the covariate(s) of interest. The remaining columns in X4mean will be treated as confounders in modeling the abundance of the present taxa. If test.type = "Disp", X.index is a vector indicating the columns in X4disp for the covariate(s) of interest. The remaining columns in X4disp will be treated as confounders in modeling the dispersion level of the present taxa. If test.type = "Freq", X.index is a vector indicating the columns in X4freq for the covariate(s) of interest. The remaining columns in X4freq will be treated as confounders in modeling the presence-absence frequency. |
ZI.LB | lower bound on the number of zero counts above which a taxon is modeled as zero-inflated. The counts for a taxon are assumed to be zero-inflated if the number of zero observations for the taxon is greater than ZI.LB (default is 10). If ZI.LB=NULL, the GDM model (i.e., the non-zero-inflated version of ZIGDM) will be applied. |
Tax | a matrix defining the taxonomic ranks, in which each row corresponds to an OTU or a taxon and each column corresponds to a rank (starting from the highest taxonomic level, e.g., from kingdom to genus). Row names are mandatory and should be consistent with the column names of the OTU table. Column names should be formatted as "Rank1", "Rank2", etc., representing the taxonomic levels from highest to lowest. |
min.depth | lower bound for sample read depth. Samples with read depth less than min.depth will be removed before the analysis. |
n.resample | number of resamplings/permutations. |
fdr.alpha | desired false discovery rate for multiple tests on the lineages. |
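The ZI.LB rule can be checked by hand on a count matrix. The sketch below counts zero observations per taxon in base R and flags the taxa that exceed the bound, following the rule as documented above; it uses a toy threshold of 3 instead of the default 10 so the small example has a flagged taxon, and the object names are illustrative.

```r
OTU <- cbind(otuA = c(0, 0, 0, 4, 0, 0, 7, 0),
             otuB = c(3, 5, 0, 2, 8, 1, 0, 6),
             otuC = c(1, 1, 2, 3, 1, 4, 2, 2))
ZI.LB <- 3  # toy value; the documented default is 10

# Number of zero observations for each taxon (column).
n.zero <- colSums(OTU == 0)

# A taxon is treated as zero-inflated when its zero count is greater than ZI.LB.
zero.inflated <- n.zero > ZI.LB
```

Running the same two lines on real data with the intended ZI.LB shows which taxa would fall under the zero-inflated part of the model.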
If Tax=NULL (Default), a test is performed using all the OTUs/taxa.
If Tax is provided, tests are performed for lineages derived from the taxonomic hierarchy. The output is a list that contains 3 components:
lineage.pval | p-values for all lineages. By default (n.resample=NULL), only the asymptotic test will be performed. |
sig.lineage | a vector of significant lineages |
global.pval | p-values of the global tests |
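The relationship between per-lineage p-values, fdr.alpha, and a set of significant lineages can be illustrated as follows. This page does not state which false discovery rate procedure the package uses; the sketch assumes the Benjamini-Hochberg adjustment from base R's p.adjust, and the p-values and lineage names are made up for illustration.

```r
# Hypothetical per-lineage p-values (not output from miLineage).
lineage.pval <- c(A = 0.001, B = 0.02, C = 0.04, D = 0.5)
fdr.alpha <- 0.05

# Benjamini-Hochberg adjusted p-values (assumed procedure).
adjusted <- p.adjust(lineage.pval, method = "BH")

# Lineages whose adjusted p-value is at or below the target FDR.
sig.lineage <- names(adjusted)[adjusted <= fdr.alpha]
```

Under this assumption lineage C is not selected even though its raw p-value is below 0.05, because its adjusted value exceeds fdr.alpha.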
Zheng-Zheng Tang
Tang, Z.-Z. and Chen, G. (2017). Zero-Inflated Generalized Dirichlet Multinomial Regression Model for Microbiome Compositional Data Analysis. Submitted.
data(data.toy)
OTU.toy = data.toy$OTU.toy
Tax.toy = data.toy$Tax.toy
case = data.toy$covariate.toy
# the OTUs should be consistent between the OTU table and the taxonomy table
OTU.toy.reorder = OTU.toy[,match(rownames(Tax.toy), colnames(OTU.toy))]
# perform ZIGDM test for detecting differential dispersion
ZIGDM(OTU.toy.reorder, NULL, NULL, case, test.type = "Disp", 1, ZI.LB = 10, Tax.toy, fdr.alpha = 0.05)