Title: | Hypothesis Testing and Power Calculations for Comparing Metagenomic Samples from HMP |
---|---|
Description: | Using Dirichlet-Multinomial distribution to provide several functions for formal hypothesis testing, power and sample size calculations for human microbiome experiments. |
Authors: | Patricio S. La Rosa, Elena Deych, Sharina Carter, Berkley Shands, Dake Yang, William D. Shannon |
Maintainer: | Berkley Shands <[email protected]> |
License: | Apache License (== 2.0) |
Version: | 2.0.1 |
Built: | 2024-12-18 06:57:28 UTC |
Source: | CRAN |
This package provides tools for Generating data matrices following Multinomial and Dirichlet-Multinomial distributions, Computing the following test-statistics and their corresponding p-values, and Computing the power and size of the tests described above using Monte-Carlo simulations.
|
|
|
2+ Sample Means w/ Reference Vector | Xmc.sevsample |
MC.Xmc.statistics |
1 Sample Mean w/ Reference Vector | Xsc.onesample |
MC.Xsc.statistics |
2+ Sample Means w/o Reference Vector | Xmcupo.sevsample |
MC.Xmcupo.statistics |
2+ Sample Overdispersions | Xoc.sevsample |
MC.Xoc.statistics |
2+ Sample DM-Distribution | Xdc.sevsample |
MC.Xdc.statistics |
Multinomial vs DM | C.alpha.multinomial |
MC.ZT.statistics |
In addition to hypothesis testing and power calculations you can:
Perform basic data management to exclude samples with fewer than pre-specified number of reads,
collapse rare taxa and order the taxa by frequency. This is useful to exclude failed samples
(i.e. samples with very few reads) - Data.filter
Plot your data - Barchart.data
Generate random sample of Dirichlet Multinomial data with pre-specified parameters - Dirichlet.multinomial
Note: Thought the description of the functions refer its application to ranked abundance distributions (RAD) data, every function is also applicable to model species abundance data. See references for a discussion and application to both type of ecological data.
Patricio S. La Rosa, Elena Deych, Berkley Shands, Sharina Carter, Dake Yang, William D. Shannon
La Rosa PS, Brooks JP, Deych E, Boone EL, Edwards DJ, et al. (2012) Hypothesis Testing and Power Calculations for Taxonomic-Based Human Microbiome Data. PLoS ONE 7(12): e52078. doi:10.1371/journal.pone.0052078
Yang D, Johnson J, Zhou X, Deych E, et al. (2019) Microbiome Recursive Partitioning. Currently Under Review.
Creates a bar plot of taxonomic proportions.
Barchart.data(data, title = "Taxa Proportions")
Barchart.data(data, title = "Taxa Proportions")
data |
A matrix of taxonomic counts(columns) for each sample(rows). |
title |
A string to be used as the plots title. The default is "Taxa Proportions". |
A bar plot of taxonomic proportions for all samples at a given taxonomic level.
data(saliva) Barchart.data(saliva)
data(saliva) Barchart.data(saliva)
- Optimal Test for Assessing Multinomial Goodness of Fit Versus Dirichlet-Multinomial AlternativeA function to compute the -optimal test statistics of Kim and Margolin (1992)
for evaluating the Goodness-of-Fit of a Multinomial distribution (null hypothesis) versus a Dirichlet-Multinomial
distribution (alternative hypothesis).
C.alpha.multinomial(data)
C.alpha.multinomial(data)
data |
A matrix of taxonomic counts(columns) for each sample(rows). |
In order to test if a set of ranked-abundance distribution(RAD) from microbiome samples can be modeled better using a multinomial or Dirichlet-Multinomial
distribution, we test the hypothesis versus
,
where the null hypothesis implies a multinomial distribution and the alternative hypothesis implies a DM distribution.
Kim and Margolin (Kim and Margolin, 1992) proposed a
-optimal test- statistics given by,
Where is the number of taxa,
is the number of samples,
is the taxon
,
from sample
,
,
is the number of reads in sample
, and
is the total number of reads across samples.
As the number of reads increases, the distribution of the statistic converges to a Chi-square with degrees of freedom
equal to
, when the number of sequence reads is the same in all samples. When the number of reads is not the same in all samples,
the distribution becomes a weighted Chi-square with a modified degree of freedom (see (Kim and Margolin, 1992) for more details).
Note: Each taxa in data
should be present in at least 1 sample, a column with all 0's may result in errors and/or invalid results.
A list containing the -optimal test statistic and p-value.
Kim, B. S., and Margolin, B. H. (1992). Testing Goodness of Fit of a Multinomial Model Against Overdispersed Alternatives. Biometrics 48, 711-719.
data(saliva) calpha <- C.alpha.multinomial(saliva) calpha
data(saliva) calpha <- C.alpha.multinomial(saliva) calpha
This function creates a new dataset from an existing one by ordering taxa in order of decreasing abundance, collapsing less-abundant taxa into one category as specified by the user and excluding samples with a total number of reads fewer than the user-specified value.
Data.filter(data, order.type = "data", minReads = 0, numTaxa = NULL, perTaxa = NULL)
Data.filter(data, order.type = "data", minReads = 0, numTaxa = NULL, perTaxa = NULL)
data |
A matrix of taxonomic counts(columns) for each sample(rows). |
order.type |
If |
minReads |
Samples with a total number of reads less than read.crit value will be deleted. |
numTaxa |
The number of taxa to keep, while collapsing the other (less abundant) taxa. Only one argument, numTaxa or perTaxa should be specified. |
perTaxa |
The combined percentage of data to keep, while collapsing the remaining taxa. Only one argument, numTaxa or perTaxa should be specified. |
A data frame of taxa and samples with a total number of reads greater than the minimum value. The last taxon labeled 'Other' contains the sum of the least abundant taxa collapsed by setting 'numTaxa' or 'perTaxa'.
data(saliva) ### Excludes all samples with fewer than 1000 reads and collapses ### taxa with 11th or smaller abundance into one category filterDataNum <- Data.filter(saliva, "data", 1000, numTaxa=10) ### Excludes all samples with fewer than 1000 reads and collapses ### the least abundant taxa to keep as close to 95% of the data as ### possible filterDataPer <- Data.filter(saliva, "data", 1000, perTaxa=.95) dim(saliva) dim(filterDataNum) dim(filterDataPer)
data(saliva) ### Excludes all samples with fewer than 1000 reads and collapses ### taxa with 11th or smaller abundance into one category filterDataNum <- Data.filter(saliva, "data", 1000, numTaxa=10) ### Excludes all samples with fewer than 1000 reads and collapses ### the least abundant taxa to keep as close to 95% of the data as ### possible filterDataPer <- Data.filter(saliva, "data", 1000, perTaxa=.95) dim(saliva) dim(filterDataNum) dim(filterDataPer)
Random generation of Dirichlet-Multinomial samples.
Dirichlet.multinomial(Nrs, shape)
Dirichlet.multinomial(Nrs, shape)
Nrs |
A vector specifying the number of reads or sequence depth for each sample. |
shape |
A vector of Dirichlet parameters for each taxa. |
The Dirichlet-Multinomial distribution is given by (Mosimann, J. E. (1962); Tvedebrink, T. (2010)),
where is the random vector formed by K taxa (features) counts (RAD vector),
is the total number of reads (sequence depth),
are the mean of taxa-proportions (RAD-probability mean), and
is the overdispersion parameter.
Note: Though the test statistic supports an unequal number of reads across samples, the performance has not yet been fully tested.
A data matrix of taxa counts where the rows are samples and columns are the taxa.
Mosimann, J. E. (1962). On the compound multinomial distribution, the multivariate -distribution, and correlations among proportions. Biometrika 49, 65-82.
Tvedebrink, T. (2010). Overdispersion in allelic counts and theta-correction in forensic genetics. Theor Popul Biol 78, 200-210.
data(saliva) ### Generate a the number of reads per sample ### The first number is the number of reads and the second is the number of subjects nrs <- rep(15000, 20) ### Get gamma from the dirichlet-multinomial parameters shape <- dirmult(saliva)$gamma dmData <- Dirichlet.multinomial(nrs, shape) dmData[1:5, 1:5]
data(saliva) ### Generate a the number of reads per sample ### The first number is the number of reads and the second is the number of subjects nrs <- rep(15000, 20) ### Get gamma from the dirichlet-multinomial parameters shape <- dirmult(saliva)$gamma dmData <- Dirichlet.multinomial(nrs, shape) dmData[1:5, 1:5]
Method-of-Moments (MoM) estimators of the Dirichlet-multinomial parameters: taxa proportions and overdispersion.
DM.MoM(data)
DM.MoM(data)
data |
A matrix of taxonomic counts(columns) for each sample(rows). |
Given a set of taxa-count vectors , the methods of moments (MoM) estimator of the set of parameters
and
is given as follows (Mosimann, 1962; Tvedebrink, 2010):
and
where ,
and
,
and
with
.
A list providing the MoM estimator for overdispersion, the MoM estimator of the RAD-probability mean vector, and the corresponding loglikelihood value for the given data set and estimated parameters.
Mosimann, J. E. (1962). On the compound multinomial distribution, the multivariate -distribution, and correlations among proportions. Biometrika 49, 65-82.
Tvedebrink, T. (2010). Overdispersion in allelic counts and theta-correction in forensic genetics. Theor Popul Biol 78, 200-210.
data(throat) fit.throat <- DM.MoM(throat) fit.throat
data(throat) fit.throat <- DM.MoM(throat) fit.throat
This function combines recursive partitioning and the Dirichlet-Multinomial distribution to identify homogeneous subgroups of microbiome taxa count data.
DM.Rpart(data, covars, plot = TRUE, minsplit = 1, minbucket = 1, cp = 0, numCV = 10, numCon = 100, parallel = FALSE, cores = 3, use1SE = FALSE, lowerSE = TRUE)
DM.Rpart(data, covars, plot = TRUE, minsplit = 1, minbucket = 1, cp = 0, numCV = 10, numCon = 100, parallel = FALSE, cores = 3, use1SE = FALSE, lowerSE = TRUE)
data |
A matrix of taxonomic counts(columns) for each sample(rows). |
covars |
A matrix of covariates(columns) for each sample(rows). |
plot |
When 'TRUE' a tree plot of the results will be generated. |
minsplit |
The minimum number of observations to split on, see rpart.control. |
minbucket |
The minimum number of observations in any terminal node, see rpart.control. |
cp |
The complexity parameter, see rpart.control. |
numCV |
The number folds for a k-fold cross validation. A value less than 2 will return the rpart result without any cross validation. |
numCon |
The number of cross validations to repeat to achieve a consensus solution. |
parallel |
When this is 'TRUE' it allows for parallel calculation of consensus. Requires the package |
cores |
The number of parallel processes to run if parallel is 'TRUE'. |
use1SE |
See details. |
lowerSE |
See details. |
There are 3 ways to run this function. The first is setting numCV to less than 2, which will run rpart once using the DM distribution and the specified minsplit, minbucket and cp. This result will not have any kind of branch pruning and the objects returned 'fullTree' and 'bestTree' will be the same.
The second way is setting numCV to 2 or greater (we recommend 10) and setting numCon to less than 2. This will run rpart several times using a k-fold cross validation to prune the tree to its optimal size. This is the best method to use.
The third way is setting both numCV and numCon to 2 or greater (We recommend at least 100 for numCon). This will repeat the second way numCon times and build a consensus solution. This method is ONLY needed for low sample sizes.
When the argument 'use1SE' is 'FALSE', the returned object 'bestTree' is the pruned tree with the lowest MSE. When it is 'TRUE', 'bestTree' is either the biggest pruned tree (lowerSE = FALSE) or the smallest pruned tree (lowerSE = TRUE), that is within 1 standard error of the lowest MSE.
The 3 main things returned are:
fullTree |
An rpart object without any pruning. |
bestTree |
A pruned rpart object based on use1SE and lowerSE's settings. |
cpTable |
Information about the fullTree rpart object and how it splits. |
The other variables returned include surrogate/competing splits, error rates and a plot of the bestTree if plot is TRUE.
data(saliva) data(throat) data(tonsils) ### Create some covariates for our data set site <- c(rep("Saliva", nrow(saliva)), rep("Throat", nrow(throat)), rep("Tonsils", nrow(tonsils))) covars <- data.frame(Group=site) ### Combine our data into a single object data <- rbind(saliva, throat, tonsils) ### For a single rpart tree numCV <- 0 numCon <- 0 rpartRes <- DM.Rpart(data, covars, numCV=numCV, numCon=numCon) ## Not run: ### For a cross validated rpart tree numCon <- 0 rpartRes <- DM.Rpart(data, covars, numCon=numCon) ### For a cross validated rpart tree with consensus numCon <- 2 # Note this is set to 2 for speed and should be at least 100 rpartRes <- DM.Rpart(data, covars, numCon=numCon) ## End(Not run)
data(saliva) data(throat) data(tonsils) ### Create some covariates for our data set site <- c(rep("Saliva", nrow(saliva)), rep("Throat", nrow(throat)), rep("Tonsils", nrow(tonsils))) covars <- data.frame(Group=site) ### Combine our data into a single object data <- rbind(saliva, throat, tonsils) ### For a single rpart tree numCV <- 0 numCon <- 0 rpartRes <- DM.Rpart(data, covars, numCV=numCV, numCon=numCon) ## Not run: ### For a cross validated rpart tree numCon <- 0 rpartRes <- DM.Rpart(data, covars, numCon=numCon) ### For a cross validated rpart tree with consensus numCon <- 2 # Note this is set to 2 for speed and should be at least 100 rpartRes <- DM.Rpart(data, covars, numCon=numCon) ## End(Not run)
This data set is used in the paper Microbiome Recursive Partitioning 2019. It contains 128 subjects and 11 cytokines.
data(dmrp_covars)
data(dmrp_covars)
The format is a data frame of 128 rows by 11 columns, with the each row being a separate subject and each column being a different cytokine.
data(dmrp_covars)
data(dmrp_covars)
This data set is used in the paper Microbiome Recursive Partitioning 2019. It contains 128 subjects and 29 genus level taxa.
data(dmrp_data)
data(dmrp_data)
The format is a data frame of 128 rows by 29 columns, with the each row being a separate subject and each column being a different taxa.
data(dmrp_data)
data(dmrp_data)
Calculates Dirichlet-Multinomial parameters for every group using Maximum Likelihood and Method of Moments estimates: Taxa proportion estimates (PI vector) with standard errors and Confidence intervals, as well as theta values with standard errors.
Est.PI(group.data, conf = .95)
Est.PI(group.data, conf = .95)
group.data |
A list of matrices of taxonomic counts(columns) for each sample(rows). |
conf |
The desired confidence limits. The default is 95% |
A list containing the parameters: PI, SE and the upper/lower bounds of the confidence interval for every taxa, and the theta values with standard errors for both MLE and MOM.
## Not run: data(saliva) data(throat) data(tonsils) ### Combine the data sets into a single list group.data <- list(saliva, throat, tonsils) ### Get PI using MLE and MOM with CI piEsts <- Est.PI(group.data) mle <- piEsts$MLE mom <- piEsts$MOM ## End(Not run)
## Not run: data(saliva) data(throat) data(tonsils) ### Combine the data sets into a single list group.data <- list(saliva, throat, tonsils) ### Get PI using MLE and MOM with CI piEsts <- Est.PI(group.data) mle <- piEsts$MLE mom <- piEsts$MOM ## End(Not run)
For a list of datasets, this function finds the union of taxa across all datasets and transforms them such that they all have the same columns of taxa.
formatDataSets(group.data)
formatDataSets(group.data)
group.data |
A list where each element is a matrix of taxonomic counts(columns) for each sample(rows). Note that the row names should correspond to sample names |
This function will also sort all the columns into the same order for every dataset and remove any columns that have 0's for every sample.
E.g. For two datasets, any taxa present in dataset1 but not dataset2 will be added to dataset2 with a 0 count for all samples and vice versa.
The list given, but modified so every data set has the same ordering and number of columns
data(saliva) data(throat) ### Set each data set to have 10 different columns saliva2 <- saliva[,1:10] throat2 <- throat[,11:20] ### Combine the data sets into a single list group.data <- list(saliva2, throat2) formattedData <- formatDataSets(group.data) formattedData[[1]][1:5, 1:5]
data(saliva) data(throat) ### Set each data set to have 10 different columns saliva2 <- saliva[,1:10] throat2 <- throat[,11:20] ### Combine the data sets into a single list group.data <- list(saliva2, throat2) formattedData <- formatDataSets(group.data) formattedData[[1]][1:5, 1:5]
GA-Mantel is a fully multivariate method that uses a genetic algorithm to search over possible taxa subsets using the Mantel correlation as the scoring measure for assessing the quality of any given taxa subset.
Gen.Alg(data, covars, iters = 50, popSize = 200, earlyStop = 0, dataDist = "euclidean", covarDist = "gower", verbose = FALSE, plot = TRUE, minSolLen = NULL, maxSolLen = NULL, custCovDist = NULL, penalty = 0)
Gen.Alg(data, covars, iters = 50, popSize = 200, earlyStop = 0, dataDist = "euclidean", covarDist = "gower", verbose = FALSE, plot = TRUE, minSolLen = NULL, maxSolLen = NULL, custCovDist = NULL, penalty = 0)
data |
A matrix of taxonomic counts(columns) for each sample(rows). |
covars |
A matrix of covariates(columns) for each sample(rows). |
iters |
The number of times to run through the GA. |
popSize |
The number of solutions to test on each iteration. |
earlyStop |
The number of consecutive iterations without finding a better solution before stopping regardless of the number of iterations remaining. A value of '0' will prevent early stopping. |
dataDist |
The distance metric to use for the data. Either "euclidean" or "gower". |
covarDist |
The distance metric to use for the covariates. Either "euclidean" or "gower". |
verbose |
While 'TRUE' the current status of the GA will be printed periodically. |
plot |
A boolean to plot the progress of the scoring statistics by iteration. |
minSolLen |
The minimum number of columns to select. |
maxSolLen |
The maximum number of columns to select. |
custCovDist |
A custom covariate distance matrix to use in place of calculating one from covars. |
penalty |
A number between 0 and 1 used to penalize the solutions based on the number of selected taxa using the following formula: score - penalty * ((number of selected taxa)/(number of taxa)). |
Use a GA approach to find taxa that separate subjects based on group membership or set of covariates.
The data and covariates should be normalized BEFORE use with this function because of distance functions.
This function uses modified code from the rbga function in the genalg package. rbga
Because the GA looks at combinations and uses the raw data, taxa with a small difference in their PIs may be selected and large differences may not be.
The distance calculations use the vegdist package. vegdist
A list containing
scoreSumm |
A matrix summarizing the score of the population. This can be used to figure out if the ga has come to a final solution or not. This data is also plotted if plot is 'TRUE'. |
solutions |
The final set of solutions, sorted with the highest scoring first. |
scores |
The scores for the final set of solutions. |
time |
How long in seconds the ga took to run. |
selected |
The selected columns by name. |
nonSelected |
The columns that were NOT selected by name. |
selectedIndex |
The selected taxa by column number. |
## Not run: data(saliva) data(throat) ### Combine the data into a single data frame group.data <- list(saliva, throat) group.data <- formatDataSets(group.data) data <- do.call("rbind", group.data) ### Normalize the data by subject dataNorm <- t(apply(data, 1, function(x){x/sum(x)})) ### Set covars to just be group membership memb <- c(rep(0, nrow(saliva)), rep(1, nrow(throat))) covars <- matrix(memb, length(memb), 1) ### We use low numbers for speed. The exact numbers to use depend ### on the data being used, but generally the higher iters and popSize ### the longer it will take to run. earlyStop is then used to stop the ### run early if the results aren't improving. iters <- 500 popSize <- 200 earlyStop <- 250 gaRes <- Gen.Alg(dataNorm, covars, iters, popSize, earlyStop) ## End(Not run)
## Not run: data(saliva) data(throat) ### Combine the data into a single data frame group.data <- list(saliva, throat) group.data <- formatDataSets(group.data) data <- do.call("rbind", group.data) ### Normalize the data by subject dataNorm <- t(apply(data, 1, function(x){x/sum(x)})) ### Set covars to just be group membership memb <- c(rep(0, nrow(saliva)), rep(1, nrow(throat))) covars <- matrix(memb, length(memb), 1) ### We use low numbers for speed. The exact numbers to use depend ### on the data being used, but generally the higher iters and popSize ### the longer it will take to run. earlyStop is then used to stop the ### run early if the results aren't improving. iters <- 500 popSize <- 200 earlyStop <- 250 gaRes <- Gen.Alg(dataNorm, covars, iters, popSize, earlyStop) ## End(Not run)
GA-Mantel is a fully multivariate method that uses a genetic algorithm to search over possible taxa subsets using the Mantel correlation as the scoring measure for assessing the quality of any given taxa subset.
Gen.Alg.Consensus(data, covars, consensus = .5, numRuns = 10, parallel = FALSE, cores = 3, ...)
Gen.Alg.Consensus(data, covars, consensus = .5, numRuns = 10, parallel = FALSE, cores = 3, ...)
data |
A matrix of taxonomic counts(columns) for each sample(rows). |
covars |
A matrix of covariates(columns) for each sample(rows). |
consensus |
The required fraction (0, 1] of solutions containing an edge in order to keep it. |
numRuns |
Number of runs to do. In practice the number of runs needed varies based on data set size and the GA parameters set. |
parallel |
When this is 'TRUE' it allows for parallel calculation of the bootstraps. Requires the package |
cores |
The number of parallel processes to run if parallel is 'TRUE'. |
... |
Other arguments for the GA function see Gen.Alg |
Use a GA consensus approach to find taxa that separate subjects based on group membership or set of covariates if you cannot run the GA long enough to get a final solution.
A list containing
solutions |
The best solution from each run. |
consSol |
The consensus solution. |
selectedIndex |
The selected taxa by column number. |
## Not run: data(saliva) data(throat) ### Combine the data into a single data frame group.data <- list(saliva, throat) group.data <- formatDataSets(group.data) data <- do.call("rbind", group.data) ### Normalize the data by subject dataNorm <- t(apply(data, 1, function(x){x/sum(x)})) ### Set covars to just be group membership memb <- c(rep(0, nrow(saliva)), rep(1, nrow(throat))) covars <- matrix(memb, length(memb), 1) ### We use low numbers for speed. The exact numbers to use depend ### on the data being used, but generally the higher iters and popSize ### the longer it will take to run. earlyStop is then used to stop the ### run early if the results aren't improving. iters <- 500 popSize <- 200 earlyStop <- 250 numRuns <- 3 gaRes <- Gen.Alg.Consensus(dataNorm, covars, .5, numRuns, FALSE, 3, iters, popSize, earlyStop) ## End(Not run)
## Not run: data(saliva) data(throat) ### Combine the data into a single data frame group.data <- list(saliva, throat) group.data <- formatDataSets(group.data) data <- do.call("rbind", group.data) ### Normalize the data by subject dataNorm <- t(apply(data, 1, function(x){x/sum(x)})) ### Set covars to just be group membership memb <- c(rep(0, nrow(saliva)), rep(1, nrow(throat))) covars <- matrix(memb, length(memb), 1) ### We use low numbers for speed. The exact numbers to use depend ### on the data being used, but generally the higher iters and popSize ### the longer it will take to run. earlyStop is then used to stop the ### run early if the results aren't improving. iters <- 500 popSize <- 200 earlyStop <- 250 numRuns <- 3 gaRes <- Gen.Alg.Consensus(dataNorm, covars, .5, numRuns, FALSE, 3, iters, popSize, earlyStop) ## End(Not run)
Calculates Kullback Leibler divergence for all pairs of the datasets.
Kullback.Leibler(group.data, plot = TRUE, main="Kullback Leibler Divergences", parallel = FALSE, cores = 3)
Kullback.Leibler(group.data, plot = TRUE, main="Kullback Leibler Divergences", parallel = FALSE, cores = 3)
group.data |
A list where each element is a matrix of taxonomic counts(columns) for each sample(rows). |
plot |
When 'TRUE' a heatmap of the results will also be generated. |
main |
A string to be used as the plots title. |
parallel |
When this is 'TRUE' it allows for parallel calculation of the KL distances. Requires the package |
cores |
The number of parallel processes to run if parallel is 'TRUE'. |
A matrix of Kullback Leibler divergence values and a heatmap if plot is TRUE.
Kotz S, Johnson N.L (1981) Encyclopedia Of Statistical Sciences
data(saliva) data(throat) data(tonsils) ### Combine the data sets into a single list group.data <- list(saliva, throat, tonsils) ## Not run: kl <- Kullback.Leibler(group.data) kl ## End(Not run)
data(saliva) data(throat) data(tonsils) ### Combine the data sets into a single list group.data <- list(saliva, throat, tonsils) ## Not run: kl <- Kullback.Leibler(group.data) kl ## End(Not run)
This Monte-Carlo simulation procedure provides the power and size of the several sample Dirichlet-Multinomial parameter test comparison, using the likelihood-ratio-test statistics.
MC.Xdc.statistics(group.Nrs, numMC = 10, alphap, type = "ha", siglev = 0.05, est = "mom")
MC.Xdc.statistics(group.Nrs, numMC = 10, alphap, type = "ha", siglev = 0.05, est = "mom")
group.Nrs |
A list specifying the number of reads/sequence depth for each sample in a group with one group per list entry. |
numMC |
Number of Monte-Carlo experiments. In practice this should be at least 1,000. |
alphap |
If |
type |
If |
siglev |
Significance level for size of the test / power calculation. The default is 0.05. |
est |
The type of parameter estimator to be used with the Likelihood-ratio-test statistics, 'mle' or 'mom'. Default value is 'mom'. (See Note 2 in details) |
Note 1: Though the test statistic supports an unequal number of reads across samples, the performance has not yet been fully tested.
Note 2: 'mle' will take significantly longer time and may not be optimal for small sample sizes; 'mom' will provide a more conservative result in such a case.
Note 3: All components of alphap
should be non-zero or it may result in errors and/or invalid results.
Size of the test statistics (under "hnull"
) or power (under "ha"
) of the test.
data(saliva) data(throat) data(tonsils) ### Get a list of dirichlet-multinomial parameters for the data fit.saliva <- DM.MoM(saliva) fit.throat <- DM.MoM(throat) fit.tonsils <- DM.MoM(tonsils) ### Set up the number of Monte-Carlo experiments ### We use 1 for speed, should be at least 1,000 numMC <- 1 ### Generate the number of reads per sample ### The first number is the number of reads and the second is the number of subjects nrsGrp1 <- rep(12000, 9) nrsGrp2 <- rep(12000, 11) nrsGrp3 <- rep(12000, 12) group.Nrs <- list(nrsGrp1, nrsGrp2, nrsGrp3) ### Computing size of the test statistics (Type I error) alphap <- fit.saliva$gamma pval1 <- MC.Xdc.statistics(group.Nrs, numMC, alphap, "hnull") pval1 ### Computing Power of the test statistics (Type II error) alphap <- rbind(fit.saliva$gamma, fit.throat$gamma, fit.tonsils$gamma) pval2 <- MC.Xdc.statistics(group.Nrs, numMC, alphap) pval2
data(saliva) data(throat) data(tonsils) ### Get a list of dirichlet-multinomial parameters for the data fit.saliva <- DM.MoM(saliva) fit.throat <- DM.MoM(throat) fit.tonsils <- DM.MoM(tonsils) ### Set up the number of Monte-Carlo experiments ### We use 1 for speed, should be at least 1,000 numMC <- 1 ### Generate the number of reads per sample ### The first number is the number of reads and the second is the number of subjects nrsGrp1 <- rep(12000, 9) nrsGrp2 <- rep(12000, 11) nrsGrp3 <- rep(12000, 12) group.Nrs <- list(nrsGrp1, nrsGrp2, nrsGrp3) ### Computing size of the test statistics (Type I error) alphap <- fit.saliva$gamma pval1 <- MC.Xdc.statistics(group.Nrs, numMC, alphap, "hnull") pval1 ### Computing Power of the test statistics (Type II error) alphap <- rbind(fit.saliva$gamma, fit.throat$gamma, fit.tonsils$gamma) pval2 <- MC.Xdc.statistics(group.Nrs, numMC, alphap) pval2
This Monte-Carlo simulation procedure provides the power and size of the several sample RAD-probability mean test comparison with known reference vector of proportions, using the Generalized Wald-type statistics.
MC.Xmc.statistics(group.Nrs, numMC = 10, pi0, group.pi, group.theta, type = "ha", siglev = 0.05)
MC.Xmc.statistics(group.Nrs, numMC = 10, pi0, group.pi, group.theta, type = "ha", siglev = 0.05)
group.Nrs |
A list specifying the number of reads/sequence depth for each sample in a group with one group per list entry. |
numMC |
Number of Monte-Carlo experiments. In practice this should be at least 1,000. |
pi0 |
The RAD-probability mean vector. |
group.pi |
If |
group.theta |
A vector of overdispersion values for each group. |
type |
If |
siglev |
Significance level for size of the test / power calculation. The default is 0.05. |
Note: Though the test statistic supports an unequal number of reads across samples, the performance has not yet been fully tested.
Size of the test statistics (under "hnull"
) or power (under "ha"
) of the test.
data(saliva) data(throat) data(tonsils) ### Get a list of dirichlet-multinomial parameters for the data fit.saliva <- DM.MoM(saliva) fit.throat <- DM.MoM(throat) fit.tonsils <- DM.MoM(tonsils) ### Set up the number of Monte-Carlo experiments ### We use 1 for speed, should be at least 1,000 numMC <- 1 ### Generate the number of reads per sample ### The first number is the number of reads and the second is the number of subjects nrsGrp1 <- rep(12000, 9) nrsGrp2 <- rep(12000, 11) group.Nrs <- list(nrsGrp1, nrsGrp2) group.theta <- c(0.01, 0.05) pi0 <- fit.saliva$pi ### Computing size of the test statistics (Type I error) pval1 <- MC.Xmc.statistics(group.Nrs, numMC, pi0, group.theta=group.theta, type="hnull") pval1 ### Computing Power of the test statistics (Type II error) group.pi <- rbind(fit.throat$pi, fit.tonsils$pi) pval2 <- MC.Xmc.statistics(group.Nrs, numMC, pi0, group.pi, group.theta) pval2
data(saliva) data(throat) data(tonsils) ### Get a list of dirichlet-multinomial parameters for the data fit.saliva <- DM.MoM(saliva) fit.throat <- DM.MoM(throat) fit.tonsils <- DM.MoM(tonsils) ### Set up the number of Monte-Carlo experiments ### We use 1 for speed, should be at least 1,000 numMC <- 1 ### Generate the number of reads per sample ### The first number is the number of reads and the second is the number of subjects nrsGrp1 <- rep(12000, 9) nrsGrp2 <- rep(12000, 11) group.Nrs <- list(nrsGrp1, nrsGrp2) group.theta <- c(0.01, 0.05) pi0 <- fit.saliva$pi ### Computing size of the test statistics (Type I error) pval1 <- MC.Xmc.statistics(group.Nrs, numMC, pi0, group.theta=group.theta, type="hnull") pval1 ### Computing Power of the test statistics (Type II error) group.pi <- rbind(fit.throat$pi, fit.tonsils$pi) pval2 <- MC.Xmc.statistics(group.Nrs, numMC, pi0, group.pi, group.theta) pval2
This Monte-Carlo simulation procedure provides the power and size of the several sample RAD-probability mean test comparisons without reference vector of proportions, using the Generalized Wald-type statistics.
MC.Xmcupo.statistics(group.Nrs, numMC = 10, pi0, group.pi, group.theta, type = "ha", siglev = 0.05)
MC.Xmcupo.statistics(group.Nrs, numMC = 10, pi0, group.pi, group.theta, type = "ha", siglev = 0.05)
group.Nrs |
A list specifying the number of reads/sequence depth for each sample in a group with one group per list entry. |
numMC |
Number of Monte-Carlo experiments. In practice this should be at least 1,000. |
pi0 |
The RAD-probability mean vector. |
group.pi |
If |
group.theta |
A vector of overdispersion values for each group. |
type |
If |
siglev |
Significance level for size of the test / power calculation. The default is 0.05. |
Note: Though the test statistic supports an unequal number of reads across samples, the performance has not yet been fully tested.
Size of the test statistics (under "hnull"
) or power (under "ha"
) of the test.
data(saliva) data(throat) data(tonsils) ### Get a list of dirichlet-multinomial parameters for the data fit.saliva <- DM.MoM(saliva) fit.throat <- DM.MoM(throat) fit.tonsils <- DM.MoM(tonsils) ### Set up the number of Monte-Carlo experiments ### We use 1 for speed, should be at least 1,000 numMC <- 1 ### Generate the number of reads per sample ### The first number is the number of reads and the second is the number of subjects Nrs1 <- rep(12000, 10) Nrs2 <- rep(12000, 19) group.Nrs <- list(Nrs1, Nrs2) group.theta <- c(fit.throat$theta, fit.tonsils$theta) pi0 <- fit.saliva$pi ### Computing size of the test statistics (Type I error) pval1 <- MC.Xmcupo.statistics(group.Nrs, numMC, pi0, group.theta=group.theta, type="hnull") pval1 ### Computing Power of the test statistics (Type II error) group.pi <- rbind(fit.throat$pi, fit.tonsils$pi) pval2 <- MC.Xmcupo.statistics(group.Nrs, numMC, group.pi=group.pi, group.theta=group.theta) pval2
data(saliva) data(throat) data(tonsils) ### Get a list of dirichlet-multinomial parameters for the data fit.saliva <- DM.MoM(saliva) fit.throat <- DM.MoM(throat) fit.tonsils <- DM.MoM(tonsils) ### Set up the number of Monte-Carlo experiments ### We use 1 for speed, should be at least 1,000 numMC <- 1 ### Generate the number of reads per sample ### The first number is the number of reads and the second is the number of subjects Nrs1 <- rep(12000, 10) Nrs2 <- rep(12000, 19) group.Nrs <- list(Nrs1, Nrs2) group.theta <- c(fit.throat$theta, fit.tonsils$theta) pi0 <- fit.saliva$pi ### Computing size of the test statistics (Type I error) pval1 <- MC.Xmcupo.statistics(group.Nrs, numMC, pi0, group.theta=group.theta, type="hnull") pval1 ### Computing Power of the test statistics (Type II error) group.pi <- rbind(fit.throat$pi, fit.tonsils$pi) pval2 <- MC.Xmcupo.statistics(group.Nrs, numMC, group.pi=group.pi, group.theta=group.theta) pval2
This Monte-Carlo simulation procedure provides the power and size of the several sample-overdispersion test comparison, using the likelihood-ratio-test statistics.
MC.Xoc.statistics(group.Nrs, numMC = 10, group.alphap, type = "ha", siglev = 0.05)
MC.Xoc.statistics(group.Nrs, numMC = 10, group.alphap, type = "ha", siglev = 0.05)
group.Nrs |
A list specifying the number of reads/sequence depth for each sample in a group with one group per list entry. |
numMC |
Number of Monte-Carlo experiments. In practice this should be at least 1,000. |
group.alphap |
If |
type |
If |
siglev |
Significance level for size of the test / power calculation. The default is 0.05. |
Note 1: Though the test statistic supports an unequal number of reads across samples, the performance has not yet been fully tested.
Note 2: All components of group.alphap
should be non-zero or it may result in errors and/or invalid results.
Size of the test statistics (under "hnull"
) or power (under "ha"
) of the test.
data(saliva) data(throat) data(tonsils) ### Get a list of dirichlet-multinomial parameters for the data fit.saliva <- DM.MoM(saliva) fit.throat <- DM.MoM(throat) fit.tonsils <- DM.MoM(tonsils) ### Set up the number of Monte-Carlo experiments ### We use 1 for speed, should be at least 1,000 numMC <- 1 ### Generate the number of reads per sample ### The first number is the number of reads and the second is the number of subjects nrsGrp1 <- rep(12000, 9) nrsGrp2 <- rep(12000, 11) nrsGrp3 <- rep(12000, 12) group.Nrs <- list(nrsGrp1, nrsGrp2, nrsGrp3) ### Computing size of the test statistics (Type I error) alphap <- fit.tonsils$gamma pval1 <- MC.Xoc.statistics(group.Nrs, numMC, alphap, "hnull") pval1 ## Not run: ### Computing Power of the test statistics (Type II error) alphap <- rbind(fit.saliva$gamma, fit.throat$gamma, fit.tonsils$gamma) pval2 <- MC.Xoc.statistics(group.Nrs, numMC, alphap, "ha") pval2 ## End(Not run)
data(saliva) data(throat) data(tonsils) ### Get a list of dirichlet-multinomial parameters for the data fit.saliva <- DM.MoM(saliva) fit.throat <- DM.MoM(throat) fit.tonsils <- DM.MoM(tonsils) ### Set up the number of Monte-Carlo experiments ### We use 1 for speed, should be at least 1,000 numMC <- 1 ### Generate the number of reads per sample ### The first number is the number of reads and the second is the number of subjects nrsGrp1 <- rep(12000, 9) nrsGrp2 <- rep(12000, 11) nrsGrp3 <- rep(12000, 12) group.Nrs <- list(nrsGrp1, nrsGrp2, nrsGrp3) ### Computing size of the test statistics (Type I error) alphap <- fit.tonsils$gamma pval1 <- MC.Xoc.statistics(group.Nrs, numMC, alphap, "hnull") pval1 ## Not run: ### Computing Power of the test statistics (Type II error) alphap <- rbind(fit.saliva$gamma, fit.throat$gamma, fit.tonsils$gamma) pval2 <- MC.Xoc.statistics(group.Nrs, numMC, alphap, "ha") pval2 ## End(Not run)
This Monte-Carlo simulation procedure provides the power and size of the one sample RAD probability-mean test, using the Generalized Wald-type statistic.
MC.Xsc.statistics(Nrs, numMC = 10, fit, pi0 = NULL, type = "ha", siglev = 0.05)
MC.Xsc.statistics(Nrs, numMC = 10, fit, pi0 = NULL, type = "ha", siglev = 0.05)
Nrs |
A vector specifying the number of reads/sequence depth for each sample. |
numMC |
Number of Monte-Carlo experiments. In practice this should be at least 1,000. |
fit |
A list (in the format of the output of dirmult function) containing the data parameters for evaluating either the size or power of the test. |
pi0 |
The RAD-probability mean vector. If the type is set to |
type |
If |
siglev |
Significance level for size of the test / power calculation. The default is 0.05. |
Note: Though the test statistic supports an unequal number of reads across samples, the performance has not yet been fully tested.
Size of the test statistics (under "hnull"
) or power (under "ha"
) of the test.
data(saliva) data(throat) data(tonsils) ### Get a list of dirichlet-multinomial parameters for the data fit.saliva <- DM.MoM(saliva) fit.throat <- DM.MoM(throat) fit.tonsils <- DM.MoM(tonsils) ### Set up the number of Monte-Carlo experiments ### We use 1 for speed, should be at least 1,000 numMC <- 1 ### Generate the number of reads per sample ### The first number is the number of reads and the second is the number of subjects nrs <- rep(15000, 25) ### Computing size of the test statistics (Type I error) pval1 <- MC.Xsc.statistics(nrs, numMC, fit.tonsils, fit.saliva$pi, "hnull") pval1 ### Computing Power of the test statistics (Type II error) pval2 <- MC.Xsc.statistics(nrs, numMC, fit.throat, fit.tonsils$pi) pval2
data(saliva) data(throat) data(tonsils) ### Get a list of dirichlet-multinomial parameters for the data fit.saliva <- DM.MoM(saliva) fit.throat <- DM.MoM(throat) fit.tonsils <- DM.MoM(tonsils) ### Set up the number of Monte-Carlo experiments ### We use 1 for speed, should be at least 1,000 numMC <- 1 ### Generate the number of reads per sample ### The first number is the number of reads and the second is the number of subjects nrs <- rep(15000, 25) ### Computing size of the test statistics (Type I error) pval1 <- MC.Xsc.statistics(nrs, numMC, fit.tonsils, fit.saliva$pi, "hnull") pval1 ### Computing Power of the test statistics (Type II error) pval2 <- MC.Xsc.statistics(nrs, numMC, fit.throat, fit.tonsils$pi) pval2
This Monte-Carlo simulation procedure provides the power and size of the Multinomial vs. Dirichlet-Multinomial goodness of fit test, using the
C()-optimal test statistics of Kim and Margolin (1992) (t statistics) and the C(
)-optimal test statistics of (Paul et al., 1989).
MC.ZT.statistics(Nrs, numMC = 10, fit, type = "ha", siglev = 0.05)
MC.ZT.statistics(Nrs, numMC = 10, fit, type = "ha", siglev = 0.05)
Nrs |
A vector specifying the number of reads/sequence depth for each sample. |
numMC |
Number of Monte-Carlo experiments. In practice this should be at least 1,000. |
fit |
A list (in the format of the output of dirmult function) containing the data parameters for evaluating either the size or power of the test. |
type |
If |
siglev |
Significance level for size of the test / power calculation. The default is 0.05. |
Note: Though the test statistic supports an unequal number of reads across samples, the performance has not yet been fully tested.
A vector containing both the size of the test statistics (under "hnull"
) or power (under "ha"
) of the test for both the z and t statistics.
data(saliva) ### Get a list of dirichlet-multinomial parameters for the data fit.saliva <- DM.MoM(saliva) ### Set up the number of Monte-Carlo experiments ### We use 1 for speed, should be at least 1,000 numMC <- 1 ### Generate the number of reads per sample ### The first number is the number of reads and the second is the number of subjects nrs <- rep(15000, 25) ### Computing size of the test statistics (Type I error) pval1 <- MC.ZT.statistics(nrs, numMC, fit.saliva, "hnull") pval1 ### Computing Power of the test statistics (Type II error) pval2 <- MC.ZT.statistics(nrs, numMC, fit.saliva) pval2
data(saliva) ### Get a list of dirichlet-multinomial parameters for the data fit.saliva <- DM.MoM(saliva) ### Set up the number of Monte-Carlo experiments ### We use 1 for speed, should be at least 1,000 numMC <- 1 ### Generate the number of reads per sample ### The first number is the number of reads and the second is the number of subjects nrs <- rep(15000, 25) ### Computing size of the test statistics (Type I error) pval1 <- MC.ZT.statistics(nrs, numMC, fit.saliva, "hnull") pval1 ### Computing Power of the test statistics (Type II error) pval2 <- MC.ZT.statistics(nrs, numMC, fit.saliva) pval2
It generates a data matrix with random samples from a multinomial distribution where the rows are the samples and the columns are the taxa.
Multinomial(Nrs, probs)
Multinomial(Nrs, probs)
Nrs |
A vector specifying the number of reads or sequence depth for each sample. |
probs |
A vector specifying taxa probabilities. |
Note: Though the test statistic supports an unequal number of reads across samples, the performance has not yet been fully tested.
A data matrix of taxa counts where the rows are the samples and the columns are the taxa.
### Generate the number of reads per sample ### The first number is the number of reads and the second is the number of subjects nrs <- rep(15000, 25) ### Create a probability vector probs <- c(0.4, 0.3, 0.2, .05, 0.04, .01) mData <- Multinomial(nrs, probs) mData[1:5, 1:5]
### Generate the number of reads per sample ### The first number is the number of reads and the second is the number of subjects nrs <- rep(15000, 25) ### Create a probability vector probs <- c(0.4, 0.3, 0.2, .05, 0.04, .01) mData <- Multinomial(nrs, probs) mData[1:5, 1:5]
Plots any number of data sets on an MDS plot.
Plot.MDS(group.data, main = "Group MDS", retCords = FALSE)
Plot.MDS(group.data, main = "Group MDS", retCords = FALSE)
group.data |
A list of matrices of taxonomic counts(columns) for each sample(rows). |
main |
A string to be used as the plots title. |
retCords |
A boolean to return the mds coordinates or not. |
A MDS plot and possibly the x-y coordinates for every point.
data(saliva) data(throat) data(tonsils) ### Combine the data sets into a single list group.data <- list(saliva, throat, tonsils) Plot.MDS(group.data)
data(saliva) data(throat) data(tonsils) ### Combine the data sets into a single list group.data <- list(saliva, throat, tonsils) Plot.MDS(group.data)
Plots the taxa proportions for every group.
Plot.PI(estPi, errorBars = TRUE, logScale = FALSE, main = "PI Vector", ylab = "Fractional Abundance")
Plot.PI(estPi, errorBars = TRUE, logScale = FALSE, main = "PI Vector", ylab = "Fractional Abundance")
estPi |
The results for either MLE or MOM from the function 'Est.Pi'. |
errorBars |
A boolean to display the error bars or not. |
logScale |
A boolean to log the y scale or not. |
main |
A string to be used as the plots title. |
ylab |
A string to be used as the plots y-axis title. |
A plot of the pi vectors for every group.
## Not run: data(saliva) data(throat) data(tonsils) ### Combine the data sets into a single list group.data <- list(saliva, throat, tonsils) ### Get PI using MLE with CI mle <- Est.PI(group.data)$MLE ### Plot with Error Bars Plot.PI(mle) ### Plot without Error Bars Plot.PI(mle, FALSE) ### Plot with Error Bars and scaling Plot.PI(mle, TRUE, TRUE) ## End(Not run)
## Not run: data(saliva) data(throat) data(tonsils) ### Combine the data sets into a single list group.data <- list(saliva, throat, tonsils) ### Get PI using MLE with CI mle <- Est.PI(group.data)$MLE ### Plot with Error Bars Plot.PI(mle) ### Plot without Error Bars Plot.PI(mle, FALSE) ### Plot with Error Bars and scaling Plot.PI(mle, TRUE, TRUE) ## End(Not run)
Plots the taxa proportions for every group/time as a barchart.
Plot.RM.Barchart(group.data, groups, times, plotByGrp = TRUE, col = NULL, conf = .95)
Plot.RM.Barchart(group.data, groups, times, plotByGrp = TRUE, col = NULL, conf = .95)
group.data |
A list of matrices of taxonomic counts(columns) for each sample(rows). |
groups |
A vector indicating group membership. |
times |
A vector indicating time. |
plotByGrp |
When 'TRUE', the plot will be split by group rather than time. |
col |
A vector of colors to use to denote taxa. |
conf |
The desired confidence limits. The default is 95% |
A barchart of the pi vectors for every group/time.
## Not run: data(saliva) data(throat) ### Reduce the size of the data saliva <- Data.filter(saliva, numTaxa=9) throat <- Data.filter(throat, numTaxa=9) ### Get the gamma value for the data saliva.gamma <- DM.MoM(saliva)$gamma throat.gamma <- DM.MoM(throat)$gamma mid.gamma <- (saliva.gamma + throat.gamma)/2 ### Generate a the number of reads per sample ### The first number is the number of reads and the second is the number of subjects nrs <- rep(10000, 20) ### Create data sets to be our time series in a list group.data <- list( Dirichlet.multinomial(nrs, saliva.gamma), Dirichlet.multinomial(nrs, saliva.gamma), Dirichlet.multinomial(nrs, saliva.gamma), Dirichlet.multinomial(nrs, saliva.gamma), Dirichlet.multinomial(nrs, mid.gamma), Dirichlet.multinomial(nrs, throat.gamma) ) names(group.data) <- c( "Group 1, Time 1", "Group 2, Time 1", "Group 1, Time 2", "Group 2, Time 2", "Group 1, Time 3", "Group 2, Time 3" ) ### Set the group and time information for each list element groups <- c(1, 2, 1, 2, 1, 2) times <- c(1, 2, 3, 1, 2, 3) ### Plot the data by Group Plot.RM.Barchart(group.data, groups, times) ### Plot the data by Time Plot.RM.Barchart(group.data, groups, times, FALSE) ## End(Not run)
## Not run: data(saliva) data(throat) ### Reduce the size of the data saliva <- Data.filter(saliva, numTaxa=9) throat <- Data.filter(throat, numTaxa=9) ### Get the gamma value for the data saliva.gamma <- DM.MoM(saliva)$gamma throat.gamma <- DM.MoM(throat)$gamma mid.gamma <- (saliva.gamma + throat.gamma)/2 ### Generate a the number of reads per sample ### The first number is the number of reads and the second is the number of subjects nrs <- rep(10000, 20) ### Create data sets to be our time series in a list group.data <- list( Dirichlet.multinomial(nrs, saliva.gamma), Dirichlet.multinomial(nrs, saliva.gamma), Dirichlet.multinomial(nrs, saliva.gamma), Dirichlet.multinomial(nrs, saliva.gamma), Dirichlet.multinomial(nrs, mid.gamma), Dirichlet.multinomial(nrs, throat.gamma) ) names(group.data) <- c( "Group 1, Time 1", "Group 2, Time 1", "Group 1, Time 2", "Group 2, Time 2", "Group 1, Time 3", "Group 2, Time 3" ) ### Set the group and time information for each list element groups <- c(1, 2, 1, 2, 1, 2) times <- c(1, 2, 3, 1, 2, 3) ### Plot the data by Group Plot.RM.Barchart(group.data, groups, times) ### Plot the data by Time Plot.RM.Barchart(group.data, groups, times, FALSE) ## End(Not run)
Plots the taxa proportions for every group/time as a dot plot.
Plot.RM.Dotplot(group.data, groups, times, errorBars = TRUE, col = NULL, conf = .95, alpha = 1)
Plot.RM.Dotplot(group.data, groups, times, errorBars = TRUE, col = NULL, conf = .95, alpha = 1)
group.data |
A list of matrices of taxonomic counts(columns) for each sample(rows). |
groups |
A vector indicating group membership. |
times |
A vector indicating time. |
errorBars |
When 'TRUE', error bars will also be displayed. |
col |
A vector of colors to use to denote taxa. |
conf |
The desired confidence limits. The default is 95% |
alpha |
The desired alpha level for the colors. |
A plot of the pi vectors for every group/time.
## Not run: data(saliva) data(throat) ### Reduce the size of the data saliva <- Data.filter(saliva, numTaxa=9) throat <- Data.filter(throat, numTaxa=9) ### Get the gamma value for the data saliva.gamma <- DM.MoM(saliva)$gamma throat.gamma <- DM.MoM(throat)$gamma mid.gamma <- (saliva.gamma + throat.gamma)/2 ### Generate a the number of reads per sample ### The first number is the number of reads and the second is the number of subjects nrs <- rep(10000, 20) ### Create data sets to be our time series in a list group.data <- list( Dirichlet.multinomial(nrs, saliva.gamma), Dirichlet.multinomial(nrs, saliva.gamma), Dirichlet.multinomial(nrs, saliva.gamma), Dirichlet.multinomial(nrs, saliva.gamma), Dirichlet.multinomial(nrs, mid.gamma), Dirichlet.multinomial(nrs, throat.gamma) ) names(group.data) <- c( "Group 1, Time 1", "Group 2, Time 1", "Group 1, Time 2", "Group 2, Time 2", "Group 1, Time 3", "Group 2, Time 3" ) ### Set the group and time information for each list element groups <- c(1, 2, 1, 2, 1, 2) times <- c(1, 2, 3, 1, 2, 3) ### Plot the data with error bars Plot.RM.Dotplot(group.data, groups, times) ### Plot the data without error bars Plot.RM.Dotplot(group.data, groups, times, FALSE) ## End(Not run)
## Not run: data(saliva) data(throat) ### Reduce the size of the data saliva <- Data.filter(saliva, numTaxa=9) throat <- Data.filter(throat, numTaxa=9) ### Get the gamma value for the data saliva.gamma <- DM.MoM(saliva)$gamma throat.gamma <- DM.MoM(throat)$gamma mid.gamma <- (saliva.gamma + throat.gamma)/2 ### Generate a the number of reads per sample ### The first number is the number of reads and the second is the number of subjects nrs <- rep(10000, 20) ### Create data sets to be our time series in a list group.data <- list( Dirichlet.multinomial(nrs, saliva.gamma), Dirichlet.multinomial(nrs, saliva.gamma), Dirichlet.multinomial(nrs, saliva.gamma), Dirichlet.multinomial(nrs, saliva.gamma), Dirichlet.multinomial(nrs, mid.gamma), Dirichlet.multinomial(nrs, throat.gamma) ) names(group.data) <- c( "Group 1, Time 1", "Group 2, Time 1", "Group 1, Time 2", "Group 2, Time 2", "Group 1, Time 3", "Group 2, Time 3" ) ### Set the group and time information for each list element groups <- c(1, 2, 1, 2, 1, 2) times <- c(1, 2, 3, 1, 2, 3) ### Plot the data with error bars Plot.RM.Dotplot(group.data, groups, times) ### Plot the data without error bars Plot.RM.Dotplot(group.data, groups, times, FALSE) ## End(Not run)
The saliva data set formed by the Ranked-abundance distribution vectors of 24 subjects. The RAD vectors contains 21 elements formed by the 20 most abundant taxa at the genus level and additional taxa containing the sum of the remaining less abundant taxa per sample. Note that the incorporation of the additional taxon (taxon 21) in the analysis allows for estimating the RAD proportional-mean of taxa with respect to all the taxa within the sample.
data(saliva)
data(saliva)
The format is a matrix of 24 rows by 21 columns, with the each row being a separate subject and each column being a different taxa.
data(saliva)
data(saliva)
Tests two paired data sets for similarity.
Test.Paired(group.data, numPerms = 1000, parallel = FALSE, cores = 3)
Test.Paired(group.data, numPerms = 1000, parallel = FALSE, cores = 3)
group.data |
A list of 2 matrices of taxonomic counts(columns) for each sample(rows). |
numPerms |
Number of permutations. In practice this should be at least 1,000. |
parallel |
When this is 'TRUE' it allows for parallel calculation of the permutations. Requires the package |
cores |
The number of parallel processes to run if parallel is 'TRUE'. |
A pvalue.
data(saliva) data(throat) ### Since saliva and throat come from same subjects, the data is paired saliva1 <- saliva[-24,] # Make saliva 23 subjects to match throat group.data <- list(throat, saliva1) ### We use 1 for speed, should be at least 1,000 numPerms <- 1 pval <- Test.Paired(group.data, numPerms) pval
data(saliva) data(throat) ### Since saliva and throat come from same subjects, the data is paired saliva1 <- saliva[-24,] # Make saliva 23 subjects to match throat group.data <- list(throat, saliva1) ### We use 1 for speed, should be at least 1,000 numPerms <- 1 pval <- Test.Paired(group.data, numPerms) pval
The throat data set formed by the Ranked-abundance distribution vectors of 24 subjects. The RAD vectors contains 21 elements formed by the 20 most abundant taxa at the genus level and additional taxa containing the sum of the remaining less abundant taxa per sample. Note that the incorporation of the additional taxon (taxon 21) in the analysis allows for estimating the RAD proportional-mean of taxa with respect to all the taxa within the sample.
data(throat)
data(throat)
The format is a matrix of 24 rows by 21 columns, with the each row being a separate subject and each column being a different taxa.
data(throat)
data(throat)
The tongue data set formed by the Ranked-abundance distribution vectors of 24 subjects. The RAD vectors contains 21 elements formed by the 20 most abundant taxa at the genus level and additional taxa containing the sum of the remaining less abundant taxa per sample. Note that the incorporation of the additional taxon (taxon 21) in the analysis allows for estimating the RAD proportional-mean of taxa with respect to all the taxa within the sample.
data(tongue)
data(tongue)
The format is a matrix of 24 rows by 21 columns, with the each row being a separate subject and each column being a different taxa.
data(tongue)
data(tongue)
The palatine tonsil data set formed by the Ranked-abundance distribution vectors of 24 subjects. The RAD vectors contains 21 elements formed by the 20 most abundant taxa at the genus level and additional taxa containing the sum of the remaining less abundant taxa per sample. Note that the incorporation of the additional taxon (taxon 21) in the analysis allows for estimating the RAD proportional-mean of taxa with respect to all the taxa within the sample.
data(tonsils)
data(tonsils)
The format is a matrix of 24 rows by 21 columns, with the each row being a separate subject and each column being a different taxa.
data(tonsils)
data(tonsils)
This routine provides the value of the Likelihood-Ratio-Test Statistics and the corresponding p-value for evaluating the several sample Dirichlet-Multinomial parameter test comparison.
Xdc.sevsample(group.data, epsilon = 10^(-4), est = "mom")
Xdc.sevsample(group.data, epsilon = 10^(-4), est = "mom")
group.data |
A list where each element is a matrix of taxonomic counts(columns) for each sample(rows). (See Notes 1 and 2 in details) |
epsilon |
Convergence tolerance. To terminate, the difference between two succeeding log-likelihoods must be smaller than epsilon. Default value is 10^(-4). |
est |
The type of parameter estimator to be used with the Likelihood-ratio-test statistics, 'mle' or 'mom'. Default value is 'mom'. (See Note 3 in details) |
To assess whether the Dirichlet parameter vector, (a function of the RAD probability-mean vector and overdispersion), observed in
groups of microbiome samples are equal to each other, the following hypothesis
versus
can be tested. The null hypothesis implies that the HMP samples across groups have the same mean and overdispersion, indicating that the RAD models are identical. In particular, the likelihood-ratio test statistic is used, which is given by,
The asymptotic null distribution of follows a Chi-square with degrees of freedom equal to (J-1)*K, where K is the number of taxa (Wilks, 1938).
Note 1: The matrices in group.data
must contain the same taxa, in the same order.
Note 2: Each taxa should be present in at least 1 sample, a column with all 0's may result in errors and/or invalid results.
Note 3: 'mle' will take significantly longer time and may not be optimal for small sample sizes; 'mom' will provide more conservative results in such a case.
A list containing the Xdc statistics and p-value.
Wilks, S. S. (1938). The Large-Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses. The Annals of Mathematical Statistics 9, 60-62.
data(saliva) data(throat) ### Combine the data sets into a single list group.data <- list(saliva, throat) xdc <- Xdc.sevsample(group.data) xdc
data(saliva) data(throat) ### Combine the data sets into a single list group.data <- list(saliva, throat) xdc <- Xdc.sevsample(group.data) xdc
This function computes the Generalized Wald-type test statistic (Wilson and Koehler, 1984) and corresponding p-value to assess whether the sample RAD probability-means from multiple populations are the same or different. The statistics assumes that a common RAD probability-mean vector for comparison under the null hypothesis is known.
Xmc.sevsample(group.data, pi0)
Xmc.sevsample(group.data, pi0)
group.data |
A list where each element is a matrix of taxonomic counts(columns) for each sample(rows). |
pi0 |
The RAD-probability mean vector. |
Note: The matrices in group.data
must contain the same taxa, in the same order.
A list containing the Generalized Wald-type statistics and p-value.
Wilson, J. R., and Koehler, K. J. (1984). Testing of equality of vectors of proportions for several cluster samples. Proceedings of Joint Statistical Association Meetings. Survey Research Methods.
data(saliva) data(throat) data(tonsils) ### Get pi from the dirichlet-multinomial parameters pi0 <- dirmult(saliva)$pi ### Combine the data sets into a single list group.data <- list(throat, tonsils) xmc <- Xmc.sevsample(group.data, pi0) xmc
data(saliva) data(throat) data(tonsils) ### Get pi from the dirichlet-multinomial parameters pi0 <- dirmult(saliva)$pi ### Combine the data sets into a single list group.data <- list(throat, tonsils) xmc <- Xmc.sevsample(group.data, pi0) xmc
This function computes the Cramer's Phi and Modified Cramer's Phi Criterion for the test statistic Xmcupo.sevsample
.
Xmcupo.effectsize(group.data)
Xmcupo.effectsize(group.data)
group.data |
A list where each element is a matrix of taxonomic counts(columns) for each sample(rows). |
Note: The matrices in group.data
must contain the same taxa, in the same order.
A vector containing the Chi-Squared statistic value, the Cramer's Phi Criterion, and the modified Cramer's Phi Criterion.
data(saliva) data(throat) ### Combine the data sets into a single list group.data <- list(saliva, throat) effect <- Xmcupo.effectsize(group.data) effect
data(saliva) data(throat) ### Combine the data sets into a single list group.data <- list(saliva, throat) effect <- Xmcupo.effectsize(group.data) effect
This function computes the Generalized Wald-type test statistic (Wilson and Koehler, 1984) and corresponding p-value to assess whether the sample RAD probability-means from multiple populations are same or different. The statistics assumes that a common RAD probability-mean vector for comparison under the null hypothesis is unknown.
Xmcupo.sevsample(group.data)
Xmcupo.sevsample(group.data)
group.data |
A list where each element is a matrix of taxonomic counts(columns) for each sample(rows). |
Note: The matrices in group.data
must contain the same taxa, in the same order.
A list containing the Generalized Wald-type statistics and p-value.
Wilson, J. R., and Koehler, K. J. (1984). Testing of equality of vectors of proportions for several cluster samples. Proceedings of Joint Statistical Association Meetings. Survey Research Methods.
data(saliva) data(tonsils) data(throat) ### Combine the data sets into a single list group.data <- list(saliva, throat, tonsils) xmcupo <- Xmcupo.sevsample(group.data) xmcupo
data(saliva) data(tonsils) data(throat) ### Combine the data sets into a single list group.data <- list(saliva, throat, tonsils) xmcupo <- Xmcupo.sevsample(group.data) xmcupo
This routine provides the value of the likelihood-ratio-test statistic and the corresponding p-value to assess whether the overdispersion observed in multiple groups of microbiome samples are equal.
Xoc.sevsample(group.data, epsilon = 10^(-4))
Xoc.sevsample(group.data, epsilon = 10^(-4))
group.data |
A list where each element is a matrix of taxonomic counts(columns) for each sample(rows). (See Notes 1 and 2 in details) |
epsilon |
Convergence tolerance. To terminate, the difference between two succeeding log-likelihoods must be smaller than epsilon. Default value is 10^(-4). |
To assess whether the over dispersion parameter vectors observed in
groups of microbiome samples are equal to each other, the following hypothesis
versus
can be tested. In particular, the likelihood-ratio test statistic is used (Tvedebrink, 2010), which is given by,
The asymptotic null distribution of follows a Chi-square with degrees of freedom equal to (J-1) (Wilks, 1938).
Note 1: The matrices in group.data
must contain the same taxa, in the same order.
Note 2: Each taxa should be present in at least 1 sample, a column with all 0's may result in errors and/or invalid results.
A list containing the Xoc statistics and p-value.
Tvedebrink, T. (2010). Overdispersion in allelic counts and theta-correction in forensic genetics. Theor Popul Biol 78, 200-210.
Wilks, S. S. (1938). The Large-Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses. The Annals of Mathematical Statistics 9, 60-62.
data(saliva) data(tonsils) ### Combine the data sets into a single list group.data <- list(saliva, tonsils) ## Not run: xoc <- Xoc.sevsample(group.data) xoc ## End(Not run)
data(saliva) data(tonsils) ### Combine the data sets into a single list group.data <- list(saliva, tonsils) ## Not run: xoc <- Xoc.sevsample(group.data) xoc ## End(Not run)
This routine provides the value of the Generalized Wald-type statistic to assess whether the RAD probability-mean observed in one group of samples is equal to a known RAD probability-mean.
Xsc.onesample(data, pi0)
Xsc.onesample(data, pi0)
data |
A matrix of taxonomic counts(columns) for each sample(rows). |
pi0 |
The RAD-probability mean vector. |
A list containing Generalized Wald-type statistics and p-value.
data(saliva) data(throat) ### Get pi from the dirichlet-multinomial parameters pi0 <- dirmult(saliva)$pi xsc <- Xsc.onesample(throat, pi0) xsc
data(saliva) data(throat) ### Get pi from the dirichlet-multinomial parameters pi0 <- dirmult(saliva)$pi xsc <- Xsc.onesample(throat, pi0) xsc