Title: | Infer Community Assembly Mechanisms by Phylogenetic-Bin-Based Null Model Analysis |
---|---|
Description: | To implement a general framework to quantitatively infer Community Assembly Mechanisms by Phylogenetic-bin-based null model analysis, abbreviated as 'iCAMP' (Ning et al 2020) <doi:10.1038/s41467-020-18560-z>. It can quantitatively assess the relative importance of different community assembly processes, such as selection, dispersal, and drift, for both communities and each phylogenetic group ('bin'). Each bin usually consists of different taxa from a family or an order. The package also provides functions to implement some other published methods, including neutral taxa percentage (Burns et al 2016) <doi:10.1038/ismej.2015.142> based on neutral theory model and quantifying assembly processes based on entire-community null models ('QPEN', Stegen et al 2013) <doi:10.1038/ismej.2013.93>. It also includes some handy functions, particularly for big datasets, such as phylogenetic and taxonomic null model analysis at both community and bin levels, between-taxa niche difference and phylogenetic distance calculation, phylogenetic signal test within phylogenetic groups, midpoint root of big trees, etc. Version 1.3.x mainly improved the function for 'QPEN' and added function 'icamp.cate()' to summarize 'iCAMP' results for different categories of taxa (e.g. core versus rare taxa). |
Authors: | Daliang Ning |
Maintainer: | Daliang Ning <[email protected]> |
License: | GPL-2 |
Version: | 1.5.12 |
Built: | 2024-12-07 06:57:24 UTC |
Source: | CRAN |
This package is to implement a general framework to quantitatively infer Community Assembly Mechanisms by Phylogenetic-bin-based null model analysis, abbreviated as iCAMP (Ning et al 2020). It can quantitatively assess the relative importance of different community assembly processes, such as selection, dispersal, and drift, for both communities and each phylogenetic group ('bin'). Each bin usually consists of different taxa from a family or an order. The package also provides functions to implement some other published methods, including neutral taxa percentage (Burns et al 2016) based on neutral theory model (Sloan et al 2006) and quantifying assembly processes based on entire-community null models (Stegen et al 2013). It also includes quite a few handy functions, particularly for big datasets, such as phylogenetic and taxonomic null model analysis at both community and bin levels, between-taxa niche difference and phylogenetic distance calculation, phylogenetic signal test within phylogenetic groups, midpoint root of big trees, etc. URL: https://github.com/DaliangNing/iCAMP1
Version 1.2.4: the first formal version of iCAMP for CRAN. Version 1.2.5: correct typo in description and fix the error of memory.limit issue. Version 1.2.6: revise the help document of qpen to include an example for big datasets. Version 1.2.7: remove setwd in functions; add options to specify file names; change dontrun to donttest and revise save.wd in some help documents. Version 1.2.8: revise dniche to avoid unnecessary file. Version 1.2.9: update iCAMP paper newly published on Nature Communications and the GitHub link. Version 1.2.10: fix minor bug when output.wd is NULL in icamp.big. Version 1.2.11: fix minor bug in icamp.big when comm is data.frame. Version 1.3.1: add bNTI.big and bMNTD.big, and revise qpen to handle big datasets better. Version 1.3.2: revise icamp.bins to fix error when an input taxonomy name has unrecognizable character; revise icamp.boot to fix error when there is no outlier. Version 1.3.3: add icamp.cate to summary for each category of taxa, e.g. core versus rare taxa. Version 1.3.4: typo and format. Version 1.3.5: revise icamp.big to correct error when using strict bin IDs when omit small bins. Version 1.4.1: add function 'qpen.test' for bootstrapping test on 'qpen' results. Version 1.4.2: add options in icamp.big, RC.pc, and RC.bin.bigc to allow relative abundances (value < 1) in community matrix, community data transformation, and use of other taxonomic dissimilarity indexes. Version 1.4.3: debug to allow input community matrix only has two samples. Also provide a temporary solution for the failure of makeCluster in some OS. Version 1.4.4: debug ps.bin and icamp.cate to avoid error in special cases. Version 1.4.5: add option to taxa.binphy.big and icamp.big to handle trees with single edge from root. Version 1.4.6: debug icamp.big, fix 'differing number of rows' issue in version 1.4.2 to 1.4.5. Version 1.4.7: speed up qpen.test when there are numerous between-group comparisons. Version 1.4.8: fix a bug in function taxa.binphy.big. 
Version 1.4.9: internal version. Version 1.4.10: fix a potential bug in function maxbigm. Version 1.4.11: debug for function icamp.boot. Version 1.5.1: add functions qpen.cm, RC.cm, bNTI.cm, and bNTI.big.cm, to deal with samples from multiple metacommunities. Version 1.5.2: add functions icamp.cm, NTI.cm, NRI.cm, bNRI.cm, bNTI.bin.cm, bNRI.bin.cm, and RC.bin.cm, to deal with samples from multiple metacommunities. Version 1.5.3(20210924): add function pdist.p to calculate phylogenetic distance for relatively small datasets. Version 1.5.4(20211209): correct 'paste' error in functions bNRI.bin.big and bNRI.bin.cm. Version 1.5.5(20220210): add icamp.cm2 function to allow different metacommunity settings for taxonomic and phylogenetic null models. Version 1.5.6(20220410): fix error and warnings from package check. Version 1.5.7(20220410): fix notes from package check. Version 1.5.8(20220421): fix error when nworker=1 in several functions. Version 1.5.9(20220421): fix error in function bNTI.bin.cm. Version 1.5.10(20220425): correct parallel thread number in examples. Version 1.5.11(20220426): fix error in special cases in function bNTI.bin.cm. Version 1.5.12(20220529): fix warnings due to working directory issues of some examples in help documents.
Package: | iCAMP |
Type: | Package |
Version: | 1.5.12 |
Date: | 2022-5-29 |
License: | GPL-2 |
Daliang Ning <[email protected]>
Ning, D., Yuan, M., Wu, L., Zhang, Y., Guo, X., Zhou, X. et al. (2020). A quantitative framework reveals ecological drivers of grassland microbial community assembly in response to warming. Nature Communications, 11, 4717.
Burns, A.R., Stephens, W.Z., Stagaman, K., Wong, S., Rawls, J.F., Guillemin, K. et al. (2016). Contribution of neutral processes to the assembly of gut microbial communities in the zebrafish over host development. Isme Journal, 10, 655-664.
Sloan, W.T., Lunn, M., Woodcock, S., Head, I.M., Nee, S. & Curtis, T.P. (2006). Quantifying the roles of immigration and chance in shaping prokaryote community structure. Environmental Microbiology, 8, 732-740.
Stegen, J.C., Lin, X., Fredrickson, J.K., Chen, X., Kennedy, D.W., Murray, C.J. et al. (2013). Quantifying community assembly processes and identifying features that impose them. Isme Journal, 7, 2069-2079.
    data("example.data")
    comm=example.data$comm
    tree=example.data$tree
    # since need to save some outputs to a certain folder,
    # the following code is set as 'not test'.
    # but you may test the code on your computer
    # after change the path for 'save.wd'.
    wd0=getwd()
    # please change to the folder you want to save the pd.big output.
    save.wd=paste0(tempdir(),"/pdbig")
    nworker=2 # parallel computing thread number
    rand.time=20 # usually use 1000 for real data.
    bin.size.limit=5 # for real data, usually use a proper number
    # according to phylogenetic signal test or try some settings
    # then choose the reasonable stochasticity level.
    # our experience is 12, or 24, or 48.
    # but for this example dataset which is too small, have to use 5.
    icamp.out=icamp.big(comm=comm,tree=tree,pd.wd=save.wd,
                        rand=rand.time, nworker=nworker,
                        bin.size.limit=bin.size.limit)
    setwd(wd0)
Calculates beta MNTD (beta mean nearest taxon distance, Webb et al 2008) for taxa in each pair of communities in a given community matrix.
bmntd(comm, pd, abundance.weighted = TRUE, exclude.conspecifics = FALSE,time.output=FALSE, unit.sum=NULL, spname.check = TRUE, silent = TRUE)
comm |
matrix or data.frame, community data matrix, rownames are sample names, colnames are OTU ids. |
pd |
matrix, pairwise phylogenetic distance matrix. |
abundance.weighted |
logic, whether weighted by species abundance, default is TRUE, means weighted. |
exclude.conspecifics |
logic, whether conspecific taxa in different communities are excluded from beta MNTD calculations, default is FALSE. |
time.output |
logic, whether to count calculation time, default is FALSE. |
unit.sum |
NULL or a number or a numeric vector. When unit.sum is not NULL and a beta diversity index is calculated for a bin, the taxa abundances will be divided by unit.sum to calculate the relative abundances. Usually, unit.sum can be set as the sequencing depth in each sample. Default setting is NULL, means not to do this special transformation. |
spname.check |
logic, whether to check the species names in comm and pd. |
silent |
logic, if FALSE, some messages will be shown if there is any mismatch in species names. |
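The effect of unit.sum described above can be illustrated with a small base-R sketch. The toy matrix, the bin membership, and the variable names (comm_bin, unit_sum, rel_by_depth) are hypothetical and only mirror the wording of the argument description; this is not the package's internal code.

```r
# Toy community matrix: rows are samples, columns are taxa (hypothetical data).
comm <- matrix(c(3, 1, 6,
                 0, 5, 15),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("S1", "S2"), c("A", "B", "C")))

# Suppose taxa A and B form one bin (hypothetical bin membership).
bin_taxa <- c("A", "B")
comm_bin <- comm[, bin_taxa, drop = FALSE]

# unit.sum taken as the sequencing depth of each sample in the full matrix.
unit_sum <- rowSums(comm)

# A matrix divided by a vector of length nrow() recycles down columns,
# so row i is divided by unit_sum[i].
rel_by_depth <- comm_bin / unit_sum

# For contrast: relative abundances normalized within the bin only.
rel_within_bin <- comm_bin / rowSums(comm_bin)

print(rel_by_depth)
print(rel_within_bin)
```

With the depth-based version, a bin that is rare in a sample keeps small relative abundances, whereas within-bin normalization rescales every sample's bin total to 1.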
beta mean nearest taxon distance for taxa in each pair of communities. Modified from 'comdistnt' in package 'picante' (Kembel et al 2010), this function uses matrix multiplication to be efficient for medium-sized datasets.
result is a distance object of pairwise beta MNTD between samples.
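As a rough illustration of what abundance-weighted beta MNTD measures, the following base-R sketch computes it pair by pair on a toy dataset. It follows the usual 'comdistnt'-style definition (the mean of the two abundance-weighted nearest-taxon distances) and uses a plain loop rather than the matrix multiplication mentioned above; the function name bmntd_sketch and the data are hypothetical, and conspecifics are not excluded.

```r
# Pairwise abundance-weighted beta MNTD, written as a plain loop for clarity.
bmntd_sketch <- function(comm, pd) {
  comm <- as.matrix(comm)
  n <- nrow(comm)
  out <- matrix(0, n, n, dimnames = list(rownames(comm), rownames(comm)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      x <- comm[i, ]
      y <- comm[j, ]
      px <- x[x > 0] / sum(x)   # relative abundances of taxa present in sample i
      py <- y[y > 0] / sum(y)   # relative abundances of taxa present in sample j
      d <- pd[names(px), names(py), drop = FALSE]
      # nearest taxon in the other sample, weighted by relative abundance
      out[i, j] <- out[j, i] <-
        0.5 * (sum(px * apply(d, 1, min)) + sum(py * apply(d, 2, min)))
    }
  }
  as.dist(out)
}

taxa <- c("A", "B", "C")
pd <- matrix(c(0, 2, 6,
               2, 0, 6,
               6, 6, 0), 3, 3, dimnames = list(taxa, taxa))
comm <- matrix(c(3, 1, 0,
                 0, 2, 2,
                 0, 0, 4), nrow = 3, byrow = TRUE,
               dimnames = list(c("S1", "S2", "S3"), taxa))

res <- bmntd_sketch(comm, pd)
print(res)
```

For real data, bmntd() should be preferred; the loop above is only meant to make the definition concrete.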
Version 3: 2020.8.16, add examples. Version 2: 2018.10.15, add unit.sum option. if unit.sum!=NULL, will calculate relative abundance according to unit.sum. Version 1: 2015.9.23
Daliang Ning
Webb, C.O., Ackerly, D.D. & Kembel, S.W. (2008). Phylocom: software for the analysis of phylogenetic community structure and trait evolution. Bioinformatics, 24, 2098-2100.
Kembel, S.W., Cowan, P.D., Helmus, M.R., Cornwell, W.K., Morlon, H., Ackerly, D.D. et al. (2010). Picante: R tools for integrating phylogenies and ecology. Bioinformatics, 26, 1463-1464.
    data("example.data")
    comm=example.data$comm
    pd=example.data$pd
    bmntd.wt=bmntd(comm, pd, abundance.weighted = TRUE,
                   exclude.conspecifics = FALSE)
Calculates beta MNTD (beta mean nearest taxon distance, Webb et al 2008) for taxa in each pair of communities in a given community matrix, using bigmemory (Kane et al 2013) to deal with very large datasets.
bmntd.big(comm, pd.desc = "pd.desc", pd.spname, pd.wd, spname.check = FALSE, abundance.weighted = TRUE, exclude.conspecifics = FALSE, time.output = FALSE)
comm |
matrix or data.frame, community data matrix, rownames are sample names, colnames are taxa ids. |
pd.desc |
character, the name to describe bigmemory file of phylogenetic distance matrix, default is "pd.desc". |
pd.spname |
vector, the OTU ids (species names) in exactly the same order as the phylogenetic matrix rows or columns |
pd.wd |
the path of the folder saving the phylogenetic distance matrix. |
spname.check |
logic, whether to check that the OTU ids (species names) in community matrix and phylogenetic distance matrix are the same. |
abundance.weighted |
logic, whether weighted by species abundance, default is TRUE, means weighted. |
exclude.conspecifics |
logic, whether conspecific taxa in different communities are excluded from beta MNTD calculations, default is FALSE. |
time.output |
logic, whether to count calculation time, default is FALSE. |
beta mean nearest taxon distance for taxa in each pair of communities. Improved from 'comdistnt' in package 'picante' (Kembel et al 2010). This function uses bigmemory (Kane et al 2013) to deal with large datasets.
result is a distance object.
Version 4: 2020.12.5, copy from package NST to iCAMP to improve the function qpen. Version 3: 2020.9.9, remove setwd; change dontrun to donttest and revise save.wd in help doc. Version 2: 2020.8.22, add to NST package, update help document. Version 1: 2017.3.13
Daliang Ning ([email protected])
Webb, C.O., Ackerly, D.D. & Kembel, S.W. (2008). Phylocom: software for the analysis of phylogenetic community structure and trait evolution. Bioinformatics, 24, 2098-2100.
Kembel, S.W., Cowan, P.D., Helmus, M.R., Cornwell, W.K., Morlon, H., Ackerly, D.D. et al. (2010). Picante: R tools for integrating phylogenies and ecology. Bioinformatics, 26, 1463-1464.
Kane, M.J., Emerson, J., Weston, S. (2013). Scalable Strategies for Computing with Massive Data. Journal of Statistical Software, 55(14), 1-19. URL http://www.jstatsoft.org/v55/i14/.
    data("example.data")
    comm=example.data$comm
    tree=example.data$tree
    # since it needs to save some file to a certain folder,
    # the following code is set as 'not test'.
    # but you may test the code on your computer
    # after change the folder path for 'save.wd'.
    wd0=getwd()
    save.wd=paste0(tempdir(),"/pdbig.bmntd.big")
    # you may change save.wd to the folder you want to save the pd.big output.
    nworker=2 # parallel computing thread number
    pd.big=pdist.big(tree = tree, wd=save.wd, nworker = nworker)
    bmntd.wt=bmntd.big(comm=comm, pd.desc = pd.big$pd.file,
                       pd.spname = pd.big$tip.label, pd.wd = pd.big$pd.wd,
                       abundance.weighted = TRUE)
    setwd(wd0)
Calculates mean pairwise distance separating taxa in each pair of communities in a given community matrix.
bmpd(comm, pd, abundance.weighted = TRUE, na.zero = TRUE, time.output = FALSE, unit.sum = NULL)
comm |
matrix or data.frame, community data matrix, rownames are sample names, colnames are OTU ids. |
pd |
matrix, pairwise phylogenetic distance matrix. |
abundance.weighted |
logic, whether weighted by species abundance, default is TRUE, means weighted. |
na.zero |
logic. When the sum of a row (a sample) in the community data matrix is zero, the relative abundance will be NaN. To avoid problems in downstream calculations, such NaN values can be set to zero. Default is TRUE. |
time.output |
logic, whether to count calculation time, default is FALSE. |
unit.sum |
When a beta diversity index is calculated for a bin, the taxa abundances will be divided by unit.sum to calculate the relative abundances. Usually, unit.sum can be set as the sequencing depth in each sample. Default setting is NULL, means not to do this special transformation. |
beta mean pairwise distance.
Output is a distance object of pairwise betaMPD between samples.
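A minimal base-R sketch of abundance-weighted beta MPD is given below, assuming the common definition in which the index for two samples is the sum of p_i * p_j * d_ij over taxa i in one sample and j in the other. Under that assumption the whole pairwise table is a single matrix product. The function name bmpd_sketch and the toy data are hypothetical, and the NaN handling only imitates the idea behind na.zero.

```r
bmpd_sketch <- function(comm, pd) {
  comm <- as.matrix(comm)
  rel <- comm / rowSums(comm)        # relative abundance per sample
  rel[is.nan(rel)] <- 0              # empty samples give NaN; set to zero
  pd <- pd[colnames(comm), colnames(comm)]
  # entry [a, b] = sum_i sum_j rel[a, i] * pd[i, j] * rel[b, j]
  as.dist(rel %*% pd %*% t(rel))
}

taxa <- c("A", "B", "C")
pd <- matrix(c(0, 2, 6,
               2, 0, 6,
               6, 6, 0), 3, 3, dimnames = list(taxa, taxa))
comm <- matrix(c(3, 1, 0,
                 0, 2, 2,
                 0, 0, 4), nrow = 3, byrow = TRUE,
               dimnames = list(c("S1", "S2", "S3"), taxa))

res <- bmpd_sketch(comm, pd)
print(res)
```

Converting with as.dist() drops the diagonal, so only between-sample values remain, matching the kind of output described above.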
Version 3: 2020.8.16, add examples. Version 2: 2018.10.3, add unit.sum option. if unit.sum!=NULL, will calculate relative abundance according to unit.sum. Version 1: 2015.8.21.
Daliang Ning
Webb, C.O., Ackerly, D.D. & Kembel, S.W. (2008). Phylocom: software for the analysis of phylogenetic community structure and trait evolution. Bioinformatics, 24, 2098-2100.
    data("example.data")
    comm=example.data$comm
    pd=example.data$pd
    bmpd.wt=bmpd(comm, pd, abundance.weighted = TRUE)
Perform null model test based on a phylogenetic beta diversity index, beta mean pairwise distance (betaMPD), in each bin; calculate beta net relatedness index (betaNRI; Webb et al 2008), or modified Raup-Crick metric, or confidence level based on the comparison between observed and null betaMPD in each bin. The package bigmemory (Kane et al 2013) is used to handle very large phylogenetic distance matrices.
bNRI.bin.big(comm, pd.desc, pd.spname, pd.wd, pdid.bin, sp.bin, spname.check = FALSE, nworker = 4, memo.size.GB = 50, weighted = c(TRUE, FALSE), rand = 1000, output.bMPD = FALSE, sig.index=c("SES","Confidence","RC","bNRI"), unit.sum = NULL, correct.special = FALSE, detail.null=FALSE, special.method=c("MPD","MNTD","both"), ses.cut=1.96,rc.cut=0.95, conf.cut=0.975, dirichlet = FALSE)
comm |
matrix or data.frame, community data, each row is a sample or site, each colname is a species or OTU or gene, thus rownames should be sample IDs, colnames should be taxa IDs. |
pd.desc |
the name of the file to hold the backingfile description of the phylogenetic distance matrix, it is usually "pd.desc" if using default setting in pdist.big function. |
pd.spname |
character vector, taxa IDs in the same order as the rows/columns of the big matrix of phylogenetic distances. |
pd.wd |
folder path, where the bigmemory file of the phylogenetic distance matrix is saved. |
pdid.bin |
list, each element is a vector of integer, indicating which rows/columns in the big phylogenetic matrix represent the taxa in a bin. |
sp.bin |
one-column matrix, rownames are taxa IDs (i.e. OTU IDs), the only column shows the bin ID of each taxon. Bin IDs are integers in the same order as the elements in the list of pdid.bin. |
spname.check |
logic, whether to check the OTU ids (species names) in community matrix and phylogenetic distance matrix are the same. |
nworker |
for parallel computing. Either a character vector of host names on which to run the worker copies of R, or a positive integer (in which case that number of copies is run on localhost). default is 4, means 4 threads will be run. |
memo.size.GB |
numeric, to set the memory size as you need, so that calculation of large tree will not be limited by physical memory. unit is Gb. default is 50Gb. |
weighted |
Logic, consider abundances or not (just presence/absence). default is TRUE. |
rand |
integer, randomization times. default is 1000. |
output.bMPD |
logic, if TRUE, the output will include beta mean pairwise distance (betaMPD). |
sig.index |
character, the index for null model significance test. SES or bNRI, standard effect size, i.e. beta net relatedness index (betaNRI); Confidence, percentage of null values less extreme than the observed value, i.e. non-parametric one-side confidence level; RC, modified Raup-Crick index (RC) based on betaMPD, i.e. count the number of null betaMPD lower than observed betaMPD plus a half of the number of null betaMPD equal to observed betaMPD, to get alpha, then calculate betaMPD-based RC as (2 x alpha - 1). default is SES. If input a vector, only the first element will be used. |
unit.sum |
NULL or a number or a numeric vector. When a beta diversity index is calculated for a bin, the taxa abundances will be divided by unit.sum to calculate the relative abundances. Usually, unit.sum can be set as the sequencing depth in each sample. Default setting is NULL, means not to do this transformation. |
correct.special |
logic, whether to correct the special cases. Default is FALSE. |
detail.null |
logic, if TRUE, the output will include all the null values. Default is FALSE. |
special.method |
When correct.special is TRUE, which method will be used to check underestimation of deterministic pattern(s) in special cases. MPD, use null model test based on mean pairwise distance; MNTD, use null model test of mean nearest taxon distance; both, use null model test of both MPD and MNTD. Default is MPD. |
ses.cut |
numeric, the cutoff of significant standard effect size, default is 1.96. |
rc.cut |
numeric, the cutoff of significant modified Raup-Crick metric, default is 0.95. |
conf.cut |
numeric, the cutoff of significant one-side confidence level, default is 0.975. |
dirichlet |
Logic. If TRUE, the taxonomic null model for correcting special cases will use Dirichlet distribution to generate relative abundances in randomized community matrix. default is FALSE. |
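The relationship between pd.spname, sp.bin, and pdid.bin described in the arguments above can be made concrete with a small base-R sketch. The taxa IDs, the column name "bin.id", and the way pdid.bin is rebuilt here are hypothetical; in a real analysis these objects would normally come from the package's own binning step rather than being built by hand.

```r
# Taxa IDs in the order of the rows/columns of the phylogenetic distance matrix.
pd.spname <- c("OTU3", "OTU1", "OTU4", "OTU2", "OTU5")

# One-column matrix: rownames are taxa IDs, the column holds integer bin IDs.
sp.bin <- matrix(c(1, 1, 2, 2, 2), ncol = 1,
                 dimnames = list(c("OTU1", "OTU2", "OTU3", "OTU4", "OTU5"),
                                 "bin.id"))

# For each bin, find which rows/columns of the distance matrix hold its taxa.
bin_ids <- sort(unique(sp.bin[, 1]))
pdid.bin <- lapply(bin_ids, function(b) {
  match(rownames(sp.bin)[sp.bin[, 1] == b], pd.spname)
})

str(pdid.bin)
```

The list elements are in increasing bin-ID order, which is the ordering the sp.bin description asks for.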
The beta net relatedness index (betaNRI; Webb et al. 2008, Stegen et al 2012) is calculated for each phylogenetic bin. betaNRI is a standardized measure of the mean pairwise distance between samples/communities (betaMPD). Parallel computing is used to improve the speed.
The null model algorithm is "taxa shuffle" (Kembel 2009), i.e. shuffling taxa labels across the tips of the phylogenetic tree to randomize phylogenetic relationships among species. In this function, taxa will be randomized across all bins.
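The "taxa shuffle" idea can be sketched in base R by permuting the taxa labels of a phylogenetic distance matrix while leaving the distances in place. This is a simplified illustration on an in-memory matrix with a hypothetical helper name (taxa_shuffle); it is not the package's bigmemory-based implementation.

```r
# Shuffle taxa labels across the tips: the distance values stay where they are,
# only the names attached to rows and columns are permuted (same permutation).
taxa_shuffle <- function(pd) {
  stopifnot(identical(rownames(pd), colnames(pd)))
  shuffled <- sample(rownames(pd))
  dimnames(pd) <- list(shuffled, shuffled)
  pd
}

taxa <- c("A", "B", "C")
pd <- matrix(c(0, 2, 6,
               2, 0, 6,
               6, 6, 0), 3, 3, dimnames = list(taxa, taxa))

set.seed(1)
pd_null <- taxa_shuffle(pd)
print(pd_null)
```

A beta diversity index recomputed with pd_null, looking taxa up by name, gives one null value; repeating this rand times yields the null distribution.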
In the betaNRI of each bin, the diagonal is set as zero. If the randomized results are all the same, the standard deviation will be zero and betaNRI will be NaN. In this case, betaNRI will be set as zero, since the observed result is not differentiable from randomized results.
Modified RC (Chase et al 2011) and Confidence (Ning et al 2020) are alternative significance test indexes to evaluate how the observed beta diversity index deviates from null expectation, which could be a better metric than standardized effect size (betaNRI) in some cases, e.g. null values do not follow normal distribution.
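For a single pairwise comparison, the SES and modified RC definitions given in this help page can be sketched as follows. The helper name null_test_sketch is hypothetical, the Confidence index is left out, and the zero-variance rule follows the sentence above about setting betaNRI to zero.

```r
# obs: observed betaMPD for one pair of samples in one bin
# null: vector of null betaMPD values for the same pair
null_test_sketch <- function(obs, null) {
  s <- sd(null)
  # standard effect size; zero when all null values are identical
  ses <- if (s == 0) 0 else (obs - mean(null)) / s
  # alpha: null values lower than observed, plus half of the ties
  alpha <- (sum(null < obs) + 0.5 * sum(null == obs)) / length(null)
  c(SES = ses, RC = 2 * alpha - 1)
}

res1 <- null_test_sketch(obs = 4, null = c(1, 2, 3, 4, 5))
res2 <- null_test_sketch(obs = 2, null = rep(2, 5))
print(res1)
print(res2)
```

RC is bounded between -1 and 1 and does not rely on the null values being normally distributed, which is the motivation stated above for offering it alongside SES.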
Output is a list with following elements:
index |
list, each element is a square matrix of betaNRI (or RC or Confidence based on betaMPD) values of a bin. The elements (bins) are in the same order as in the input pdid.bin. |
sig.index |
character, indicates the index for null model significance test, SES (i.e. betaNRI), RC, or Confidence. |
betaMPD.obs |
Output only if output.bMPD is TRUE. A list, each element is a square matrix of observed beta MPD values of a bin. The elements (bins) are in the same order as in the input pdid.bin. |
rand |
Output only if detail.null is TRUE. A list, each element is a matrix with null values of beta MPD for each turnover of a bin. The elements (bins) are in the same order as in the input pdid.bin. |
special.crct |
Output only if detail.null is TRUE. NULL if correct.special is FALSE. A list with three elements, corresponding to three different null model significance testing indexes, i.e. SES, RC, and Confidence. Each element is a matrix, where the value is zero if the result for a turnover of a bin does not need to correct, otherwise there will be a corrected value. |
Version 8: 2021.12.9, previous 'paste' led to error; corrected to 'paste0'. Version 7: 2021.4.18, fix the bug when detail.null=TRUE and comm has only two samples. Version 6: 2020.9.1, remove setwd. change dontrun to donttest and revise save.wd in help doc. Version 5: 2020.8.18, update help document, add example. Version 4: 2020.8.1, change RC option to sig.index, add detail.null and conf.cut. Version 3: 2018.10.15, add unit.sum, correct.special. Version 2: 2016.3.26, add RC option. Version 1: 2015.12.16
Daliang Ning
Webb, C.O., Ackerly, D.D. & Kembel, S.W. (2008). Phylocom: software for the analysis of phylogenetic community structure and trait evolution. Bioinformatics, 24, 2098-2100.
Kembel, S.W. (2009). Disentangling niche and neutral influences on community assembly: assessing the performance of community phylogenetic structure tests. Ecol Lett, 12, 949-960.
Stegen, J.C., Lin, X., Konopka, A.E. & Fredrickson, J.K. (2012). Stochastic and deterministic assembly processes in subsurface microbial communities. Isme Journal, 6, 1653-1664.
Chase, J.M., Kraft, N.J.B., Smith, K.G., Vellend, M. & Inouye, B.D. (2011). Using null models to disentangle variation in community dissimilarity from variation in alpha-diversity. Ecosphere, 2, 1-11.
Ning, D., Yuan, M., Wu, L., Zhang, Y., Guo, X., Zhou, X. et al. (2020). A quantitative framework reveals ecological drivers of grassland microbial community assembly in response to warming. Nature Communications, 11, 4717.
Kane, M.J., Emerson, J., Weston, S. (2013). Scalable Strategies for Computing with Massive Data. Journal of Statistical Software, 55(14), 1-19. URL http://www.jstatsoft.org/v55/i14/.
    # this function is usually used in icamp.big when setting phylo.rand.scale="across",
    # means randomization across all bins in phylogenetic null model.
    data("example.data")
    comm=example.data$comm
    tree=example.data$tree
    pdid.bin=example.data$pdid.bin
    sp.bin=example.data$sp.bin
    # since pdist.big need to save output to a certain folder,
    # the following code is set as 'not test'.
    # but you may test the example on your computer after change the path for 'save.wd'.
    wd0=getwd()
    save.wd=paste0(tempdir(),"/pdbig.bNRI.bin.big")
    # you may change save.wd to the folder you want to save the pd.big output.
    nworker=2 # parallel computing thread number
    pd.big=pdist.big(tree = tree, wd=save.wd, nworker = nworker)
    rand.time=20 # usually use 1000 for real data.
    bNRIbins=bNRI.bin.big(comm=comm, pd.desc=pd.big$pd.file,
                          pd.spname=pd.big$tip.label, pd.wd=pd.big$pd.wd,
                          pdid.bin=pdid.bin, sp.bin=sp.bin,
                          spname.check = FALSE, nworker = nworker,
                          memo.size.GB = 50, weighted = TRUE,
                          rand = rand.time, output.bMPD = FALSE,
                          sig.index="SES",unit.sum = NULL,
                          correct.special = TRUE, detail.null=FALSE,
                          special.method="MPD")
    setwd(wd0)
Perform null model test based on a phylogenetic beta diversity index, beta mean pairwise distance (betaMPD), in each bin; calculate beta net relatedness index (betaNRI; Webb et al 2008), or modified Raup-Crick metric, or confidence level based on the comparison between observed and null betaMPD in each bin. The package bigmemory (Kane et al 2013) is used to handle very large phylogenetic distance matrices. This function can deal with local communities under different metacommunities (regional pools).
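To make the metacommunity inputs more tangible, the base-R sketch below builds a meta.group table for a toy dataset and derives per-metacommunity occurrence frequency and average relative abundance from it, in the spirit of the defaults described for meta.frequency and meta.ab in the arguments. The data and the names meta_freq_sketch and meta_ab_sketch are hypothetical, and treating "occurrence frequency" as the proportion of samples in which a taxon occurs is an assumption rather than something stated in this help page.

```r
# Toy community matrix: 4 samples, 3 taxa (hypothetical data).
comm <- matrix(c(2, 2, 0,
                 4, 0, 0,
                 0, 1, 3,
                 0, 0, 5), nrow = 4, byrow = TRUE,
               dimnames = list(paste0("S", 1:4), c("A", "B", "C")))

# One-column table: rownames are sample IDs, first column is the metacommunity.
meta.group <- data.frame(meta = c("M1", "M1", "M2", "M2"),
                         row.names = rownames(comm))

grp <- meta.group[rownames(comm), 1]
n_per_meta <- table(grp)

# Average relative abundance of each taxon within each metacommunity.
rel <- comm / rowSums(comm)
ab_sum <- rowsum(rel, grp)
meta_ab_sketch <- ab_sum / as.vector(n_per_meta[rownames(ab_sum)])

# Occurrence frequency (assumed here: proportion of samples with the taxon).
occ_sum <- rowsum((comm > 0) * 1, grp)
meta_freq_sketch <- occ_sum / as.vector(n_per_meta[rownames(occ_sum)])

print(meta_ab_sketch)
print(meta_freq_sketch)
```

Both results have one row per metacommunity and one column per taxon, which is the layout the meta.ab and meta.frequency arguments expect.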
bNRI.bin.cm(comm, meta.group = NULL, meta.spool = NULL, meta.frequency = NULL, meta.ab = NULL, pd.desc, pd.spname, pd.wd, pdid.bin, sp.bin, spname.check = FALSE, nworker = 4, memo.size.GB = 50, weighted = c(TRUE, FALSE), rand = 1000, output.bMPD = FALSE, sig.index = c("SES", "Confidence", "RC", "bNRI"), unit.sum = NULL, correct.special = FALSE, detail.null = FALSE, special.method = c("MPD", "MNTD", "both"), ses.cut = 1.96, rc.cut = 0.95, conf.cut = 0.975, dirichlet = FALSE)
comm |
matrix or data.frame, community data, each row is a sample or site, each colname is a species or OTU or gene, thus rownames should be sample IDs, colnames should be taxa IDs. |
meta.group |
matrix or data.frame, a one-column (n x 1) matrix indicating which metacommunity each sample belongs to, so that different samples can belong to different metacommunities. Rownames are sample IDs; the first column holds metacommunity names. If an n x m matrix is supplied, only the first column is used. Default is NULL, meaning all samples belong to the same metacommunity. |
meta.spool |
a list object, each element is a character vector listing all taxa IDs in a metacommunity. The names of the elements indicate metacommunity names, which should be the same as the metacommunity names in meta.group. Default is NULL, means to use the observed taxa in comm across samples within the same metacommunity that is defined by meta.group. |
meta.frequency |
matrix or data.frame, each column represents a taxon, each row represents a metacommunity (regional pool), to define the occurrence frequency of each taxon in each metacommunity. The rownames indicate metacommunity names, which should be the same as the metacommunity names in meta.group. Default setting is NULL, means to calculate meta.frequency as occurrence frequency of each taxon in comm across the samples within each metacommunity defined by meta.group. |
meta.ab |
matrix or data.frame, each column represents a taxon, each row represents a metacommunity (regional pool), to define the abundance (or relative abundance) of each taxon in each metacommunity. The rownames indicate metacommunity names, which should be the same as the metacommunity names in meta.group. Default setting is NULL, means to calculate meta.ab as average relative abundance of each taxon in comm across the samples within each metacommunity defined by meta.group. |
pd.desc |
the name of the file to hold the backingfile description of the phylogenetic distance matrix, it is usually "pd.desc" if using default setting in pdist.big function. |
pd.spname |
character vector, taxa IDs in the same order as the rows/columns of the big matrix of phylogenetic distances. |
pd.wd |
folder path, where the bigmemory file of the phylogenetic distance matrix is saved. |
pdid.bin |
list, each element is a vector of integer, indicating which rows/columns in the big phylogenetic matrix represent the taxa in a bin. |
sp.bin |
one-column matrix, rownames are taxa IDs (i.e. OTU IDs), the only column shows the bin ID of each taxon. Bin IDs are integers in the same order as the elements in the list of pdid.bin. |
spname.check |
logic, whether to check the OTU ids (species names) in community matrix and phylogenetic distance matrix are the same. |
nworker |
for parallel computing. Either a character vector of host names on which to run the worker copies of R, or a positive integer (in which case that number of copies is run on localhost). default is 4, means 4 threads will be run. |
memo.size.GB |
numeric, to set the memory size as you need, so that calculation of large tree will not be limited by physical memory. unit is Gb. default is 50Gb. |
weighted |
Logic, consider abundances or not (just presence/absence). default is TRUE. |
rand |
integer, randomization times. default is 1000. |
output.bMPD |
logic, if TRUE, the output will include beta mean pairwise distance (betaMPD). |
sig.index |
character, the index for the null model significance test. SES or bNRI: standardized effect size, i.e. beta net relatedness index (betaNRI). Confidence: percentage of null values less extreme than the observed value, i.e. non-parametric one-sided confidence level. RC: modified Raup-Crick index based on betaMPD; count the null betaMPD values lower than the observed betaMPD plus half of the null betaMPD values equal to the observed betaMPD to get alpha, then calculate the betaMPD-based RC as (2 x alpha - 1). Default is SES. If a vector is input, only the first element is used. |
unit.sum |
NULL, a number, or a numeric vector. When a beta diversity index is calculated for a bin, the taxa abundances will be divided by unit.sum to calculate the relative abundances. Usually, unit.sum can be set as the sequencing depth in each sample. The default, NULL, means this transformation is not done. |
correct.special |
logic, whether to correct the special cases. Default is FALSE. |
detail.null |
logic, if TRUE, the output will include all the null values. Default is FALSE. |
special.method |
When correct.special is TRUE, which method will be used to check underestimation of deterministic pattern(s) in special cases. MPD, use null model test based on mean pairwise distance; MNTD, use null model test of mean nearest taxon distance; both, use null model test of both MPD and MNTD. Default is MPD. |
ses.cut |
numeric, the cutoff of significant standard effect size, default is 1.96. |
rc.cut |
numeric, the cutoff of significant modified Raup-Crick metric, default is 0.95. |
conf.cut |
numeric, the cutoff of significant one-side confidence level, default is 0.975. |
dirichlet |
Logic. If TRUE, the taxonomic null model for correcting special cases will use Dirichlet distribution to generate relative abundances in randomized community matrix. default is FALSE. |
This function is particularly designed for samples from different metacommunities. The null model "taxa shuffle" will be done under different metacommunities, separately (and independently). All other details are the same as the function bNRI.bin.big.
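The idea of shuffling taxa separately under each metacommunity can be pictured with a small base-R sketch. This is only an illustration under assumptions, not the package's internal code: the toy distance matrix, the two species pools, and the helper `shuffle.in.pool` are all hypothetical.

```r
set.seed(42)
taxa <- paste0("OTU", 1:5)

# Toy symmetric phylogenetic distance matrix (hypothetical values).
dis <- as.matrix(dist(c(0.1, 0.4, 1.0, 1.7, 2.5)))
dimnames(dis) <- list(taxa, taxa)

# Hypothetical species pools of two metacommunities (they may overlap).
meta.spool <- list(meta1 = c("OTU1", "OTU2", "OTU3"),
                   meta2 = c("OTU3", "OTU4", "OTU5"))

# Hypothetical helper: shuffle taxa labels only among the taxa of one pool,
# then return the shuffled distances in the original taxa order.
shuffle.in.pool <- function(dis, pool) {
  sub <- dis[pool, pool, drop = FALSE]
  new.lab <- sample(pool)
  dimnames(sub) <- list(new.lab, new.lab)
  sub[pool, pool, drop = FALSE]
}

# One independent randomization per metacommunity.
null.dis <- lapply(meta.spool, function(pool) shuffle.in.pool(dis, pool))
```

Each element of `null.dis` holds the same set of distances as the corresponding pool, only reassigned among that pool's taxa, so taxa never exchange labels across metacommunities.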
Output is a list with following elements:
index |
list, each element is a square matrix of betaNRI (or RC or Confidence based on betaMPD) values of a bin. The elements (bins) are in the same order as in the input pdid.bin. |
sig.index |
character, indicates the index for null model significance test, SES (i.e. betaNRI), RC, or Confidence. |
betaMPD.obs |
Output only if output.bMPD is TRUE. A list, each element is a square matrix of observed beta MPD values of a bin. The elements (bins) are in the same order as in the input pdid.bin. |
rand |
Output only if detail.null is TRUE. A list, each element is a matrix with null values of beta MPD for each turnover of a bin. The elements (bins) are in the same order as in the input pdid.bin. |
special.crct |
Output only if detail.null is TRUE. NULL if correct.special is FALSE. A list with three elements, corresponding to three different null model significance testing indexes, i.e. SES, RC, and Confidence. Each element is a matrix, where the value is zero if the result for a turnover of a bin does not need to correct, otherwise there will be a corrected value. |
Version 2: 2021.12.9, previous 'paste' led to error; corrected to 'paste0'. Version 1: 2021.8.4
Daliang Ning
Webb, C.O., Ackerly, D.D. & Kembel, S.W. (2008). Phylocom: software for the analysis of phylogenetic community structure and trait evolution. Bioinformatics, 24, 2098-2100.
Kembel, S.W. (2009). Disentangling niche and neutral influences on community assembly: assessing the performance of community phylogenetic structure tests. Ecol Lett, 12, 949-960.
Stegen, J.C., Lin, X., Konopka, A.E. & Fredrickson, J.K. (2012). Stochastic and deterministic assembly processes in subsurface microbial communities. Isme Journal, 6, 1653-1664.
Chase, J.M., Kraft, N.J.B., Smith, K.G., Vellend, M. & Inouye, B.D. (2011). Using null models to disentangle variation in community dissimilarity from variation in alpha-diversity. Ecosphere, 2, 1-11.
Ning, D., Yuan, M., Wu, L., Zhang, Y., Guo, X., Zhou, X. et al. (2020). A quantitative framework reveals ecological drivers of grassland microbial community assembly in response to warming. Nature Communications, 11, 4717.
Kane, M.J., Emerson, J., Weston, S. (2013). Scalable Strategies for Computing with Massive Data. Journal of Statistical Software, 55(14), 1-19. URL http://www.jstatsoft.org/v55/i14/.
bNRI.bin.big, icamp.cm, bNRI.cm
library(iCAMP)
# this function is usually used in icamp.cm when setting phylo.rand.scale="across",
# which means randomization across all bins in the phylogenetic null model.
data("example.data")
comm=example.data$comm
tree=example.data$tree
pdid.bin=example.data$pdid.bin
sp.bin=example.data$sp.bin
# in this example, 10 samples from one metacommunity,
# the other 10 samples from another metacommunity.
meta.group=data.frame(meta.com=c(rep("meta1",10),rep("meta2",10)))
rownames(meta.group)=rownames(comm)
# since pdist.big needs to save output to a certain folder,
# the following code is set as 'not test'.
# but you may test the example on your computer after changing the path for 'save.wd'.
wd0=getwd()
save.wd=paste0(tempdir(),"/pdbig.bNRI.bin.cm")
# you may change save.wd to the folder where you want to save the pd.big output.
nworker=2 # parallel computing thread number
pd.big=pdist.big(tree = tree, wd=save.wd, nworker = nworker)
rand.time=20 # usually use 1000 for real data.
bNRIbins=bNRI.bin.cm(comm=comm, meta.group=meta.group,
                     pd.desc=pd.big$pd.file, pd.spname=pd.big$tip.label,
                     pd.wd=pd.big$pd.wd, pdid.bin=pdid.bin, sp.bin=sp.bin,
                     spname.check = FALSE, nworker = nworker,
                     memo.size.GB = 50, weighted = TRUE, rand = rand.time,
                     output.bMPD = FALSE, sig.index="SES", unit.sum = NULL,
                     correct.special = TRUE, detail.null=FALSE,
                     special.method="MPD")
setwd(wd0)
Perform null model test based on a phylogenetic beta diversity index, beta mean pairwise distance (betaMPD); calculate beta net relatedness index (betaNRI), or modified Raup-Crick metric, or confidence level based on the comparison between observed and null betaMPD. Run by parallel computing. This function can deal with local communities under different metacommunities (regional pools).
bNRI.cm(comm, dis, nworker = 4, memo.size.GB = 50,
        meta.group = NULL, meta.spool = NULL,
        meta.frequency = NULL, meta.ab = NULL,
        weighted = c(TRUE, FALSE), rand = 1000,
        output.bMPD = c(FALSE, TRUE),
        sig.index = c("SES", "Confidence", "RC", "bNRI"),
        unit.sum = NULL, correct.special = FALSE,
        detail.null = FALSE,
        special.method = c("MPD", "MNTD", "both"),
        ses.cut = 1.96, rc.cut = 0.95, conf.cut = 0.975,
        dirichlet = FALSE)
comm |
matrix or data.frame, community data, each row is a sample or site, each colname is a species or OTU or gene, thus rownames should be sample IDs, colnames should be taxa IDs. |
dis |
matrix, pairwise phylogenetic distance matrix. |
nworker |
integer, for parallel computing. Either a character vector of host names on which to run the worker copies of R, or a positive integer (in which case that number of copies is run on localhost). default is 4, means 4 threads will be run. |
memo.size.GB |
numeric, to set the memory size as you need, so that calculation of large tree will not be limited by physical memory. unit is Gb. default is 50Gb. |
meta.group |
matrix or data.frame, a one-column (n x 1) matrix indicating which metacommunity each sample belongs to, so that different samples can belong to different metacommunities. Rownames are sample IDs; the first column is metacommunity names. If an n x m matrix is input, only the first column is used. Default is NULL, which means all samples belong to the same metacommunity. |
meta.spool |
a list object, each element is a character vector listing all taxa IDs in a metacommunity. The names of the elements indicate metacommunity names, which should be the same as the metacommunity names in meta.group. Default is NULL, means to use the observed taxa in comm across samples within the same metacommunity that is defined by meta.group. |
meta.frequency |
matrix or data.frame, each column represents a taxon, each row represents a metacommunity (regional pool), to define the occurrence frequency of each taxon in each metacommunity. The rownames indicate metacommunity names, which should be the same as the metacommunity names in meta.group. Default setting is NULL, means to calculate meta.frequency as occurrence frequency of each taxon in comm across the samples within each metacommunity defined by meta.group. |
meta.ab |
matrix or data.frame, each column represents a taxon, each row represents a metacommunity (regional pool), to define the abundance (or relative abundance) of each taxon in each metacommunity. The rownames indicate metacommunity names, which should be the same as the metacommunity names in meta.group. The default, NULL, means meta.ab is calculated as the average relative abundance of each taxon in comm across the samples within each metacommunity defined by meta.group. |
weighted |
logic, whether to use abundance-weighted or unweighted metrics. Default is TRUE. |
rand |
integer, randomization times. default is 1000. |
output.bMPD |
logic, if TRUE, the output will include beta mean pairwise distance (betaMPD). |
sig.index |
character, the index for the null model significance test. SES or bNRI: standardized effect size, i.e. beta net relatedness index (betaNRI). Confidence: percentage of null values less extreme than the observed value, i.e. non-parametric one-sided confidence level. RC: modified Raup-Crick index based on betaMPD; count the null betaMPD values lower than the observed betaMPD plus half of the null betaMPD values equal to the observed betaMPD to get alpha, then calculate the betaMPD-based RC as (2 x alpha - 1). Default is SES. If a vector is input, only the first element is used. |
unit.sum |
NULL, a number, or a numeric vector. When a beta diversity index is calculated for a bin, the taxa abundances will be divided by unit.sum to calculate the relative abundances. Usually, unit.sum can be set as the sequencing depth in each sample. The default, NULL, means this transformation is not done. |
correct.special |
logic, whether to correct the special cases. Default is FALSE. |
detail.null |
logic, if TRUE, the output will include all the null values. Default is FALSE. |
special.method |
When correct.special is TRUE, which method will be used to check underestimation of deterministic pattern(s) in special cases. MPD, use null model test based on mean pairwise distance; MNTD, use null model test of mean nearest taxon distance; both, use null model test of both MPD and MNTD. Default is MPD. |
ses.cut |
numeric, the cutoff of significant standard effect size, default is 1.96. |
rc.cut |
numeric, the cutoff of significant modified Raup-Crick metric, default is 0.95. |
conf.cut |
numeric, the cutoff of significant one-side confidence level, default is 0.975. |
dirichlet |
Logic. If TRUE, the taxonomic null model for correcting special cases will use Dirichlet distribution to generate relative abundances in randomized community matrix. default is FALSE. |
This function is particularly designed for samples from different metacommunities. The null model "taxa shuffle" will be done under different metacommunities, separately (and independently). All other details are the same as the function bNRIn.p.
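The defaults described for meta.spool, meta.frequency and meta.ab can be approximated in base R as below. This is a hedged sketch on toy data, not the package's own code; in particular, "occurrence frequency" is interpreted here as the fraction of samples in a metacommunity where a taxon is present, which is an assumption.

```r
# Toy community matrix: 4 samples x 5 taxa (hypothetical data).
comm <- matrix(c(3, 0, 2, 0, 0,
                 1, 4, 0, 0, 0,
                 0, 0, 0, 2, 5,
                 0, 0, 1, 3, 1),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("S", 1:4), paste0("OTU", 1:5)))
meta.group <- data.frame(meta.com = c("meta1", "meta1", "meta2", "meta2"),
                         row.names = rownames(comm))

# Sample IDs of each metacommunity.
groups <- split(rownames(meta.group), meta.group[, 1])

# Species pool: taxa observed in at least one sample of the metacommunity.
meta.spool <- lapply(groups, function(sams) {
  colnames(comm)[colSums(comm[sams, , drop = FALSE]) > 0]
})

# Occurrence frequency (assumed: fraction of samples with the taxon present);
# rows are metacommunities, columns are taxa.
meta.frequency <- t(sapply(groups, function(sams) {
  colMeans(comm[sams, , drop = FALSE] > 0)
}))

# Average relative abundance of each taxon within each metacommunity.
rel <- comm / rowSums(comm)
meta.ab <- t(sapply(groups, function(sams) {
  colMeans(rel[sams, , drop = FALSE])
}))
```

Objects built this way have the layout the arguments expect (list names or rownames equal to the metacommunity names in meta.group), so user-defined pools can be prepared in the same shape.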
Output is a list with following elements:
index |
a square matrix of betaNRI (or RC or Confidence based on betaMPD) values. |
sig.index |
character, indicates the index for null model significance test, SES (i.e. betaNRI), RC, or Confidence. |
betaMPD.obs |
Output only if output.bMPD is TRUE. A square matrix of observed beta MPD values. |
rand |
Output only if detail.null is TRUE. A matrix with null values of beta MPD for each turnover. |
special.crct |
Output only if detail.null is TRUE. it will be NULL if correct.special is FALSE. Otherwise, it will be a list with three elements, corresponding to three different null model significance testing indexes, i.e. SES, RC, and Confidence. Each element is a square matrix, where the value is zero if the result for a turnover does not need to correct, otherwise there will be a corrected value. |
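A square index matrix of this kind is often easier to inspect as a pairwise table. The sketch below uses a made-up 3 x 3 betaNRI matrix and flags pairs whose absolute value exceeds ses.cut; treating the cutoff as two-sided is an assumption of this example, and the object names are hypothetical.

```r
sams <- c("S1", "S2", "S3")

# Made-up symmetric betaNRI matrix with a zero diagonal.
index <- matrix(c( 0.0,  2.4, -0.3,
                   2.4,  0.0, -2.1,
                  -0.3, -2.1,  0.0),
                nrow = 3, byrow = TRUE, dimnames = list(sams, sams))
ses.cut <- 1.96

# Each pairwise comparison appears once in the upper triangle.
pairs <- which(upper.tri(index), arr.ind = TRUE)
tab <- data.frame(name1 = rownames(index)[pairs[, "row"]],
                  name2 = colnames(index)[pairs[, "col"]],
                  betaNRI = index[upper.tri(index)],
                  stringsAsFactors = FALSE)

# Assumed two-sided use of the cutoff.
tab$significant <- abs(tab$betaNRI) > ses.cut
```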
Version 1: 2021.8.4
Daliang Ning
Webb, C.O., Ackerly, D.D. & Kembel, S.W. (2008). Phylocom: software for the analysis of phylogenetic community structure and trait evolution. Bioinformatics, 24, 2098-2100.
Kembel, S.W. (2009). Disentangling niche and neutral influences on community assembly: assessing the performance of community phylogenetic structure tests. Ecol Lett, 12, 949-960.
Stegen, J.C., Lin, X., Konopka, A.E. & Fredrickson, J.K. (2012). Stochastic and deterministic assembly processes in subsurface microbial communities. Isme Journal, 6, 1653-1664.
Chase, J.M., Kraft, N.J.B., Smith, K.G., Vellend, M. & Inouye, B.D. (2011). Using null models to disentangle variation in community dissimilarity from variation in alpha-diversity. Ecosphere, 2, 1-11.
Ning, D., Yuan, M., Wu, L., Zhang, Y., Guo, X., Zhou, X. et al. (2020). A quantitative framework reveals ecological drivers of grassland microbial community assembly in response to warming. Nature Communications, 11, 4717.
library(iCAMP)
data("example.data")
comm=example.data$comm
pd=example.data$pd
# in this example, 10 samples from one metacommunity,
# the other 10 samples from another metacommunity.
meta.group=data.frame(meta.com=c(rep("meta1",10),rep("meta2",10)))
rownames(meta.group)=rownames(comm)
nworker=2 # parallel computing thread number
rand.time=4 # usually use 1000 for real data.
bNRI=bNRI.cm(comm=comm, meta.group=meta.group, dis=pd,
             nworker = nworker, memo.size.GB = 50,
             weighted = TRUE, rand = rand.time,
             output.bMPD = FALSE, sig.index = "SES",
             unit.sum = NULL, correct.special = TRUE,
             detail.null = FALSE, special.method = "MPD")
Perform null model test based on a phylogenetic beta diversity index, beta mean pairwise distance (betaMPD); calculate beta net relatedness index (betaNRI), or modified Raup-Crick metric, or confidence level based on the comparison between observed and null betaMPD. Run by parallel computing.
bNRIn.p(comm, dis, nworker = 4, memo.size.GB = 50,
        weighted = c(TRUE, FALSE), rand = 1000,
        output.bMPD = c(FALSE, TRUE),
        sig.index = c("SES", "Confidence", "RC", "bNRI"),
        unit.sum = NULL, correct.special = FALSE,
        detail.null = FALSE,
        special.method = c("MPD", "MNTD", "both"),
        ses.cut = 1.96, rc.cut = 0.95, conf.cut = 0.975,
        dirichlet = FALSE)
comm |
matrix or data.frame, community data, each row is a sample or site, each colname is a species or OTU or gene, thus rownames should be sample IDs, colnames should be taxa IDs. |
dis |
matrix, pairwise phylogenetic distance matrix. |
nworker |
integer, for parallel computing. Either a character vector of host names on which to run the worker copies of R, or a positive integer (in which case that number of copies is run on localhost). default is 4, means 4 threads will be run. |
memo.size.GB |
numeric, to set the memory size as you need, so that calculation of large tree will not be limited by physical memory. unit is Gb. default is 50Gb. |
weighted |
logic, whether to use abundance-weighted or unweighted metrics. Default is TRUE. |
rand |
integer, randomization times. default is 1000. |
output.bMPD |
logic, if TRUE, the output will include beta mean pairwise distance (betaMPD). |
sig.index |
character, the index for the null model significance test. SES or bNRI: standardized effect size, i.e. beta net relatedness index (betaNRI). Confidence: percentage of null values less extreme than the observed value, i.e. non-parametric one-sided confidence level. RC: modified Raup-Crick index based on betaMPD; count the null betaMPD values lower than the observed betaMPD plus half of the null betaMPD values equal to the observed betaMPD to get alpha, then calculate the betaMPD-based RC as (2 x alpha - 1). Default is SES. If a vector is input, only the first element is used. |
unit.sum |
NULL, a number, or a numeric vector. When a beta diversity index is calculated for a bin, the taxa abundances will be divided by unit.sum to calculate the relative abundances. Usually, unit.sum can be set as the sequencing depth in each sample. The default, NULL, means this transformation is not done. |
correct.special |
logic, whether to correct the special cases. Default is FALSE. |
detail.null |
logic, if TRUE, the output will include all the null values. Default is FALSE. |
special.method |
When correct.special is TRUE, which method will be used to check underestimation of deterministic pattern(s) in special cases. MPD, use null model test based on mean pairwise distance; MNTD, use null model test of mean nearest taxon distance; both, use null model test of both MPD and MNTD. Default is MPD. |
ses.cut |
numeric, the cutoff of significant standard effect size, default is 1.96. |
rc.cut |
numeric, the cutoff of significant modified Raup-Crick metric, default is 0.95. |
conf.cut |
numeric, the cutoff of significant one-side confidence level, default is 0.975. |
dirichlet |
Logic. If TRUE, the taxonomic null model for correcting special cases will use Dirichlet distribution to generate relative abundances in randomized community matrix. default is FALSE. |
The beta net relatedness index (betaNRI; Webb et al. 2008, Stegen et al. 2012) is a standardized measure of the mean pairwise distance between samples/communities (betaMPD). Parallel computing is used to improve the speed.
The null model algorithm is "taxa shuffle" (Kembel 2009), i.e. shuffling taxa labels across the tips of the phylogenetic tree to randomize phylogenetic relationships among species.
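As a rough illustration of these two paragraphs, the base-R sketch below computes an abundance-weighted betaMPD on toy data and then generates null values by shuffling taxa labels of the distance matrix. It is a sketch under assumptions rather than the package's implementation: the data, the helper `bmpd.sketch`, and the weighting formula (sum of distances weighted by the relative abundances in the two samples) are chosen for illustration.

```r
set.seed(1)
taxa <- c("A", "B", "C", "D")

# Toy phylogenetic distances and community matrix (hypothetical values).
dis <- as.matrix(dist(c(0, 1, 3, 6)))
dimnames(dis) <- list(taxa, taxa)
comm <- matrix(c(1, 1, 0, 0,
                 0, 0, 2, 2,
                 1, 0, 0, 1),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("S1", "S2", "S3"), taxa))

# Hypothetical helper: betaMPD between every pair of samples.
bmpd.sketch <- function(comm, dis, weighted = TRUE) {
  dis <- dis[colnames(comm), colnames(comm)]   # align taxa order
  if (!weighted) comm <- (comm > 0) * 1        # presence/absence
  p <- comm / rowSums(comm)                    # relative abundances
  out <- p %*% dis %*% t(p)                    # weighted mean pairwise distance
  diag(out) <- 0                               # within-sample values are not used
  out
}
obs <- bmpd.sketch(comm, dis)

# Taxa shuffle: permute the taxa labels of the distance matrix, recompute.
rand <- 20
null.S1.S2 <- replicate(rand, {
  shuffled <- dis
  new.lab <- sample(taxa)
  dimnames(shuffled) <- list(new.lab, new.lab)
  bmpd.sketch(comm, shuffled)["S1", "S2"]
})
```

The vector `null.S1.S2` is the null distribution for one pairwise comparison; the community matrix itself is never changed, only the mapping between taxa and phylogenetic positions.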
In the output of betaNRI, the diagonal is set to zero. If the randomized results are all the same, the standard deviation will be zero and betaNRI will be NaN. In this case, betaNRI is set to zero, since the observed result cannot be differentiated from the randomized results.
Modified RC (Chase et al 2011) and Confidence (Ning et al 2020) are alternative significance test indexes to evaluate how the observed beta diversity index deviates from null expectation, which could be a better metric than standardized effect size (betaNRI) in some cases, e.g. null values do not follow normal distribution.
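The three significance indexes can be compared on a single observed value and its null distribution with a few lines of R. The helper `null.test.sketch` is hypothetical and only follows the descriptions in this help page; dividing the counts by the number of randomizations and reporting Confidence without a sign are assumptions of this sketch.

```r
# Hypothetical helper: summarize one observed value against its null values.
null.test.sketch <- function(obs, null) {
  n <- length(null)
  s <- stats::sd(null)

  # Standardized effect size; set to zero when the null values do not vary.
  ses <- if (is.na(s) || s == 0) 0 else (obs - mean(null)) / s

  # Modified Raup-Crick: share of null values below the observed value,
  # plus half of the ties, rescaled to the range [-1, 1].
  alpha <- (sum(null < obs) + 0.5 * sum(null == obs)) / n
  rc <- 2 * alpha - 1

  # One-sided confidence: share of null values less extreme than observed,
  # taken on whichever side is larger (unsigned in this sketch).
  conf <- max(sum(null < obs), sum(null > obs)) / n

  c(SES = ses, RC = rc, Confidence = conf)
}

# Observed value above all null values.
null.test.sketch(obs = 6, null = 1:5)
# Null values all identical to the observed value.
null.test.sketch(obs = 2, null = c(2, 2, 2))
```

Because RC and Confidence rely only on ranks, they remain usable when the null values are far from normally distributed, which is the case the paragraph above refers to.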
Output is a list with following elements:
index |
a square matrix of betaNRI (or RC or Confidence based on betaMPD) values. |
sig.index |
character, indicates the index for null model significance test, SES (i.e. betaNRI), RC, or Confidence. |
betaMPD.obs |
Output only if output.bMPD is TRUE. A square matrix of observed beta MPD values. |
rand |
Output only if detail.null is TRUE. A matrix with null values of beta MPD for each turnover. |
special.crct |
Output only if detail.null is TRUE. it will be NULL if correct.special is FALSE. Otherwise, it will be a list with three elements, corresponding to three different null model significance testing indexes, i.e. SES, RC, and Confidence. Each element is a square matrix, where the value is zero if the result for a turnover does not need to correct, otherwise there will be a corrected value. |
Version 5: 2021.4.18, fix the bug when detail.null=TRUE and comm has only two samples. Version 4: 2020.8.18, update help document, add example. Version 3: 2020.8.1, change RC option to sig.index, add detail.null and conf.cut. Version 2: 2018.10.3, correct special cases. Version 1: 2016.3.26.
Daliang Ning
Webb, C.O., Ackerly, D.D. & Kembel, S.W. (2008). Phylocom: software for the analysis of phylogenetic community structure and trait evolution. Bioinformatics, 24, 2098-2100.
Kembel, S.W. (2009). Disentangling niche and neutral influences on community assembly: assessing the performance of community phylogenetic structure tests. Ecol Lett, 12, 949-960.
Stegen, J.C., Lin, X., Konopka, A.E. & Fredrickson, J.K. (2012). Stochastic and deterministic assembly processes in subsurface microbial communities. Isme Journal, 6, 1653-1664.
Chase, J.M., Kraft, N.J.B., Smith, K.G., Vellend, M. & Inouye, B.D. (2011). Using null models to disentangle variation in community dissimilarity from variation in alpha-diversity. Ecosphere, 2, 1-11.
Ning, D., Yuan, M., Wu, L., Zhang, Y., Guo, X., Zhou, X. et al. (2020). A quantitative framework reveals ecological drivers of grassland microbial community assembly in response to warming. Nature Communications, 11, 4717.
library(iCAMP)
data("example.data")
comm=example.data$comm
pd=example.data$pd
nworker=2 # parallel computing thread number
rand.time=4 # usually use 1000 for real data.
bNRI=bNRIn.p(comm=comm, dis=pd, nworker = nworker,
             memo.size.GB = 50, weighted = TRUE, rand = rand.time,
             output.bMPD = FALSE, sig.index = "SES",
             unit.sum = NULL, correct.special = TRUE,
             detail.null = FALSE, special.method = "MPD")
To calculate pairwise beta nearest taxon index (betaNTI) by randomizing in the whole species pool or within each group. Package bigmemory (Kane et al 2013) is used to deal with large datasets.
bNTI.big(comm, meta.group = NULL, pd.desc = "pd.desc",
         pd.spname, pd.wd, spname.check = TRUE,
         nworker = 4, memo.size.GB = 50, weighted = TRUE,
         exclude.consp = FALSE, rand = 1000,
         output.dtail = FALSE, RC = FALSE, trace = TRUE)
comm |
matrix or data.frame, community data, each row is a sample or site, each colname is a species or OTU or gene, thus rownames should be sample IDs, colnames should be taxa IDs. |
meta.group |
matrix or data.frame, a one-column (n x 1) matrix indicating which metacommunity each sample belongs to, so that different samples can belong to different metacommunities. Rownames are sample IDs; the first column is metacommunity IDs. If an n x m matrix is input, only the first column is used. Default is NULL, which means all samples are under the same metacommunity (the same regional species pool). |
pd.desc |
the name of the file to hold the backingfile description of the phylogenetic distance matrix, it is usually "pd.desc" if using default setting in pdist.big function. |
pd.spname |
character vector, taxa IDs in the same order as the rows/columns of the big matrix of phylogenetic distances. |
pd.wd |
folder path, where the bigmemory file of the phylogenetic distance matrix is saved. |
spname.check |
logic, whether to check the OTU ids (species names) in community matrix and phylogenetic distance matrix are the same. |
nworker |
for parallel computing. Either a character vector of host names on which to run the worker copies of R, or a positive integer (in which case that number of copies is run on localhost). default is 4, means 4 threads will be run. |
memo.size.GB |
numeric, to set the memory size as you need, so that calculation of large tree will not be limited by physical memory. unit is Gb. default is 50Gb. |
weighted |
Logic, consider abundances or not (just presence/absence). default is TRUE. |
exclude.consp |
Logic, should conspecific taxa in different communities be excluded from MNTD calculations? default is FALSE. The same as in the function bmntd. |
rand |
integer, randomization times. default is 1000. |
output.dtail |
logic, if TRUE, the betaNTI, RC values, observed betaMNTD, and all null betaMNTD values will be output; if FALSE, only betaNTI or RC is output. |
RC |
logic, whether to use the modified RC metric to evaluate the significance of betaMNTD instead of betaNTI (standardized effect size). |
trace |
logic, whether to show the progress when the code is running. |
The beta nearest taxon index (betaNTI) is a standardized measure of the mean phylogenetic distance to the nearest taxon between samples/communities (betaMNTD) and quantifies the extent of terminal clustering, independent of deep-level clustering. Many null models are available for randomization, but this function only uses phylogeny shuffle (the same as taxa.labels in ses.mntd).
In the output of betaNTI, the diagonal is set to zero. If the randomized results are all the same, the standard deviation will be zero and betaNTI will be NaN. In this case, betaNTI is set to zero, since the observed result cannot be differentiated from the randomized results. If the observed betaMNTD has NA values, the corresponding betaNTI will remain NA. Modified RC (Chase et al 2011) is another metric to evaluate how the observed betaMNTD deviates from the null expectation, which could be a better metric than the standardized effect size (classic betaNTI) in some cases.
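To make the betaMNTD and betaNTI definitions concrete, here is a small base-R sketch for one pair of samples. It is illustrative only: the toy data and the helper `bmntd.pair` are hypothetical, the abundance weighting shown is one common formulation, and conspecific taxa are kept (as with exclude.consp = FALSE).

```r
set.seed(1)
taxa <- c("A", "B", "C", "D")

# Toy phylogenetic distances and community matrix (hypothetical values).
dis <- as.matrix(dist(c(0, 1, 3, 6)))
dimnames(dis) <- list(taxa, taxa)
comm <- matrix(c(1, 1, 0, 0,
                 0, 0, 2, 2),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("S1", "S2"), taxa))

# Hypothetical helper: abundance-weighted betaMNTD between two samples.
bmntd.pair <- function(a, b, dis) {
  ta <- names(a)[a > 0]
  tb <- names(b)[b > 0]
  d <- dis[ta, tb, drop = FALSE]
  pa <- a[ta] / sum(a[ta])
  pb <- b[tb] / sum(b[tb])
  # Nearest-taxon distances in both directions, averaged.
  0.5 * (sum(pa * apply(d, 1, min)) + sum(pb * apply(d, 2, min)))
}
obs <- bmntd.pair(comm["S1", ], comm["S2", ], dis)

# Null values by shuffling taxa labels of the distance matrix.
null <- replicate(50, {
  shuffled <- dis
  new.lab <- sample(taxa)
  dimnames(shuffled) <- list(new.lab, new.lab)
  bmntd.pair(comm["S1", ], comm["S2", ], shuffled)
})

# betaNTI as a standardized effect size, set to zero if the null is constant.
s <- sd(null)
bNTI <- if (s == 0) 0 else (obs - mean(null)) / s
```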
If output.dtail=FALSE (default), a matrix of betaNTI values (if RC=FALSE) or RC values (if RC=TRUE) is returned. If output.dtail=TRUE, a list with the following elements is returned.
bNTI |
a matrix of pairwise betaNTI values. |
RC.bMNTD |
a matrix of RC values based on the null model test of betaMNTD. Output when RC=TRUE. |
bMNTD |
observed betaMNTD values. |
bMNTD.rand |
a matrix of all null results. |
Version 2: 2020.12.5, included into iCAMP package to improve the function qpen. Version 1: 2017.7.12
Daliang Ning
Webb, C.O., Ackerly, D.D. & Kembel, S.W. (2008). Phylocom: software for the analysis of phylogenetic community structure and trait evolution. Bioinformatics, 24, 2098-2100.
Kembel, S.W. (2009). Disentangling niche and neutral influences on community assembly: assessing the performance of community phylogenetic structure tests. Ecol Lett, 12, 949-960.
Stegen, J.C., Lin, X., Konopka, A.E. & Fredrickson, J.K. (2012). Stochastic and deterministic assembly processes in subsurface microbial communities. Isme Journal, 6, 1653-1664.
Chase, J.M., Kraft, N.J.B., Smith, K.G., Vellend, M. & Inouye, B.D. (2011). Using null models to disentangle variation in community dissimilarity from variation in alpha-diversity. Ecosphere, 2, 1-11.
Kane, M.J., Emerson, J., Weston, S. (2013). Scalable Strategies for Computing with Massive Data. Journal of Statistical Software, 55(14), 1-19. URL http://www.jstatsoft.org/v55/i14/.
library(iCAMP)
data("example.data")
comm=example.data$comm
tree=example.data$tree
# since pdist.big needs to save output to a certain folder,
# the following code is set as 'not test'.
# but you may test the code on your computer after changing the path for 'save.wd'.
wd0=getwd()
save.wd=paste0(tempdir(),"/pdbig.bNTI.big")
# you may change save.wd to the folder where you want to save the pd.big output.
nworker=2 # parallel computing thread number
pd.big=pdist.big(tree = tree, wd=save.wd, nworker = nworker)
rand.time=20 # usually use 1000 for real data.
bNTI=bNTI.big(comm=comm, pd.desc=pd.big$pd.file,
              pd.spname=pd.big$tip.label, pd.wd=pd.big$pd.wd,
              spname.check=TRUE, nworker=nworker, memo.size.GB=50,
              weighted=TRUE, exclude.consp=FALSE, rand=rand.time,
              output.dtail=FALSE, RC=FALSE, trace=TRUE)
setwd(wd0)
To calculate pairwise beta nearest taxon index (betaNTI) by randomizing in the whole species pool or within each group. Package bigmemory (Kane et al 2013) is used to deal with large datasets. Besides, this function can deal with local communities under different metacommunities (regional pools).
bNTI.big.cm(comm, meta.group = NULL, meta.spool = NULL,
            pd.desc = "pd.desc", pd.spname, pd.wd,
            spname.check = TRUE, nworker = 4, memo.size.GB = 50,
            weighted = TRUE, exclude.consp = FALSE, rand = 1000,
            output.dtail = FALSE, RC = FALSE, trace = TRUE)
comm |
matrix or data.frame, community data, each row is a sample or site, each colname is a species or OTU or gene, thus rownames should be sample IDs, colnames should be taxa IDs. |
meta.group |
matrix or data.frame, a one-column (n x 1) matrix indicating which metacommunity each sample belongs to, so that different samples can belong to different metacommunities. Rownames are sample IDs; the first column gives metacommunity IDs. If an n x m matrix is supplied, only the first column is used. Default is NULL, which means all samples belong to the same metacommunity (the same regional species pool). |
meta.spool |
a list object, each element is a character vector listing all taxa IDs in a metacommunity. The names of the elements indicate metacommunity names, which should be the same as the metacommunity names in meta.group. Default is NULL, means to use the observed taxa in comm across samples within the same metacommunity that is defined by meta.group. |
pd.desc |
the name of the file to hold the backingfile description of the phylogenetic distance matrix, it is usually "pd.desc" if using default setting in pdist.big function. |
pd.spname |
character vector, taxa IDs in the same order as in the big matrix of phylogenetic distances. |
pd.wd |
folder path, where the bigmemory files of the phylogenetic distance matrix are saved. |
spname.check |
logic, whether to check the OTU ids (species names) in community matrix and phylogenetic distance matrix are the same. |
nworker |
for parallel computing. Either a character vector of host names on which to run the worker copies of R, or a positive integer (in which case that number of copies is run on localhost). default is 4, means 4 threads will be run. |
memo.size.GB |
numeric, to set the memory size as you need, so that calculation of large tree will not be limited by physical memory. unit is Gb. default is 50Gb. |
weighted |
Logic, consider abundances or not (just presence/absence). default is TRUE. |
exclude.consp |
Logic, should conspecific taxa in different communities be excluded from MNTD calculations? Default is FALSE. The same as in the function bmntd. |
rand |
integer, randomization times. default is 1000. |
output.dtail |
logic, if TRUE, the betaNTI, RC value, observed betaMNTD, all null betaMNTD values will all be output, if FALSE, only output betaNTI or RC. |
RC |
logic, whether to use the modified RC metric, instead of betaNTI (standardized effect size), to evaluate the significance of betaMNTD. |
trace |
logic, whether to show the progress when the code is running. |
This function is particularly designed for samples from different metacommunities. The null model "taxa shuffle" will be done under different metacommunities, separately (and independently). All other details are the same as the function bNTI.big.
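As a rough illustration of shuffling "under different metacommunities, separately", the base-R sketch below builds one species pool per metacommunity and permutes taxa labels only within each pool. It is not the package's internal code: the toy data and the helper name `shuffle_within_pools` are hypothetical, and the pools are assumed to be the taxa observed in each group, following the documented default for meta.spool = NULL.

```r
# Toy community matrix: 4 samples x 5 taxa (hypothetical data)
comm = matrix(c(3, 0, 1, 0, 0,
                2, 1, 0, 0, 0,
                0, 0, 0, 4, 1,
                0, 0, 2, 1, 5),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("S", 1:4), paste0("OTU", 1:5)))
meta.group = data.frame(meta.com = c("meta1", "meta1", "meta2", "meta2"),
                        row.names = rownames(comm))

# Default-style species pools: taxa observed in the samples of each metacommunity
groups = unique(as.character(meta.group[, 1]))
meta.spool = lapply(groups, function(g) {
  sub = comm[rownames(meta.group)[meta.group[, 1] == g], , drop = FALSE]
  colnames(sub)[colSums(sub) > 0]
})
names(meta.spool) = groups

# Hypothetical helper: permute taxa labels within each pool, independently.
# Each result maps an original taxon (name) to its shuffled label (value).
shuffle_within_pools = function(spool) {
  lapply(spool, function(taxa) {
    # sampling indices avoids the length-one pitfall of sample(x)
    stats::setNames(taxa[sample(length(taxa))], taxa)
  })
}

set.seed(1)
perm = shuffle_within_pools(meta.spool)
```

Because every permutation stays inside its own pool, a taxon that occurs only in meta2 can never receive the phylogenetic position of a taxon that occurs only in meta1.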
If output.dtail=FALSE (default), a matrix of betaNTI values (if RC=FALSE) or RC values (if RC=TRUE) is returned. If output.dtail=TRUE, a list is returned.
bNTI |
a matrix of pairwise betaNTI values. |
RC.bMNTD |
a matrix of RC values based on null model test of betaMNTD. Output when RC=TRUE. |
bMNTD |
observed betaMNTD values. |
bMNTD.rand |
a matrix of all null results. |
Version 1: 2021.8.2
Daliang Ning
Webb, C.O., Ackerly, D.D. & Kembel, S.W. (2008). Phylocom: software for the analysis of phylogenetic community structure and trait evolution. Bioinformatics, 24, 2098-2100.
Kembel, S.W. (2009). Disentangling niche and neutral influences on community assembly: assessing the performance of community phylogenetic structure tests. Ecol Lett, 12, 949-960.
Stegen, J.C., Lin, X., Konopka, A.E. & Fredrickson, J.K. (2012). Stochastic and deterministic assembly processes in subsurface microbial communities. Isme Journal, 6, 1653-1664.
Chase, J.M., Kraft, N.J.B., Smith, K.G., Vellend, M. & Inouye, B.D. (2011). Using null models to disentangle variation in community dissimilarity from variation in alpha-diversity. Ecosphere, 2, 1-11.
Kane, M.J., Emerson, J., Weston, S. (2013). Scalable Strategies for Computing with Massive Data. Journal of Statistical Software, 55(14), 1-19. URL http://www.jstatsoft.org/v55/i14/.
library(iCAMP)
data("example.data")
comm=example.data$comm
tree=example.data$tree
# In this example, 10 samples are from one metacommunity,
# and the other 10 samples are from another metacommunity.
meta.group=data.frame(meta.com=c(rep("meta1",10),rep("meta2",10)))
rownames(meta.group)=rownames(comm)
# since pdist.big needs to save output to a folder,
# the following code is set as 'not test'.
# but you may test the code on your computer after changing the path in 'save.wd'.
wd0=getwd()
save.wd=paste0(tempdir(),"/pdbig.bNTI.big.cm")
# you may change save.wd to the folder where you want to save the pd.big output.
nworker=2 # parallel computing thread number
pd.big=pdist.big(tree = tree, wd=save.wd, nworker = nworker)
rand.time=20 # usually use 1000 for real data.
bNTI=bNTI.big.cm(comm=comm, meta.group=meta.group,
                 pd.desc=pd.big$pd.file, pd.spname=pd.big$tip.label,
                 pd.wd=pd.big$pd.wd, spname.check=TRUE,
                 nworker=nworker, memo.size.GB=50, weighted=TRUE,
                 exclude.consp=FALSE,rand=rand.time,
                 output.dtail=FALSE, RC=FALSE, trace=TRUE)
setwd(wd0)
Perform null model test based on a phylogenetic beta diversity index, beta mean phylogenetic distance to the nearest taxon (betaMNTD), in each bin; calculate beta nearest taxon index (betaNTI), or modified Raup-Crick metric, or confidence level based on the comparison between observed and null betaMNTD in each bin. The package bigmemory (Kane et al 2013) is used to handle very large phylogenetic distance matrix.
bNTI.bin.big(comm, pd.desc, pd.spname, pd.wd, pdid.bin, sp.bin, spname.check = FALSE, nworker = 4, memo.size.GB = 50, weighted = c(TRUE, FALSE), rand = 1000, output.bMNTD = c(FALSE, TRUE), sig.index=c("SES","Confidence","RC","bNTI"), unit.sum = NULL, correct.special = FALSE, detail.null = FALSE, special.method = c("MNTD","MPD","both"), ses.cut = 1.96, rc.cut = 0.95, conf.cut = 0.975, exclude.conspecifics = FALSE, dirichlet = FALSE)
comm |
matrix or data.frame, community data, each row is a sample or site, each colname is a species or OTU or gene, thus rownames should be sample IDs, colnames should be taxa IDs. |
pd.desc |
the name of the file to hold the backingfile description of the phylogenetic distance matrix, it is usually "pd.desc" if using default setting in pdist.big function. |
pd.spname |
character vector, taxa IDs in the same order as in the big matrix of phylogenetic distances. |
pd.wd |
folder path, where the bigmemory files of the phylogenetic distance matrix are saved. |
pdid.bin |
list, each element is a vector of integer, indicating which rows/columns in the big phylogenetic matrix represent the taxa in a bin. |
sp.bin |
one-column matrix, rownames are taxa IDs (i.e. OTU IDs), the only column shows the bin ID of each taxon. Bin IDs are integers in the same order as the elements in the list of pdid.bin. |
spname.check |
logic, whether to check the OTU ids (species names) in community matrix and phylogenetic distance matrix are the same. |
nworker |
for parallel computing. Either a character vector of host names on which to run the worker copies of R, or a positive integer (in which case that number of copies is run on localhost). default is 4, means 4 threads will be run. |
memo.size.GB |
numeric, to set the memory size as you need, so that calculation of large tree will not be limited by physical memory. unit is Gb. default is 50Gb. |
weighted |
Logic, consider abundances or not (just presence/absence). default is TRUE. |
rand |
integer, randomization times. default is 1000. |
output.bMNTD |
logic, if TRUE, the output will include betaMNTD. |
sig.index |
character, the index for null model significance test. SES or bNTI, standard effect size, i.e. beta nearest taxon index (betaNTI); Confidence, percentage of null values less extreme than the observed value, i.e. non-parametric one-side confidence level; RC, modified Raup-Crick index (RC) based on betaMNTD, i.e. count the number of null betaMNTD lower than observed betaMNTD plus a half of the number of null betaMNTD equal to observed betaMNTD, to get alpha, then calculate betaMNTD-based RC as (2 x alpha - 1). default is SES. If input a vector, only the first element will be used. |
unit.sum |
NULL, a number, or a numeric vector. When a beta diversity index is calculated for a bin, the taxa abundances will be divided by unit.sum to calculate the relative abundances. Usually, unit.sum can be set as the sequencing depth in each sample. Default setting is NULL, which means this transformation is not applied. |
correct.special |
logic, whether to correct the special cases. Default is FALSE. |
detail.null |
logic, if TRUE, the output will include all the null values. Default is FALSE. |
special.method |
When correct.special is TRUE, which method will be used to check underestimation of deterministic pattern(s) in special cases. MNTD, use null model test of mean distance to the nearest taxon; MPD, use null model test based on mean pairwise distance; both, use null model test of both MPD and MNTD. Default is MNTD. |
ses.cut |
numeric, the cutoff of significant standard effect size, default is 1.96. |
rc.cut |
numeric, the cutoff of significant modified Raup-Crick metric, default is 0.95. |
conf.cut |
numeric, the cutoff of significant one-side confidence level, default is 0.975. |
exclude.conspecifics |
Logic, should conspecific taxa in different communities be excluded from MNTD calculations? Default is FALSE. The same as in the function bmntd. |
dirichlet |
Logic. If TRUE, the taxonomic null model for correcting special cases will use Dirichlet distribution to generate relative abundances in randomized community matrix. default is FALSE. |
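The arguments pd.spname, sp.bin, and pdid.bin have to agree with each other. A minimal base-R sketch of how they relate is given below; the taxa IDs and bin assignments are hypothetical toy values, and in practice these objects normally come from the package's own binning step rather than being built by hand.

```r
# Taxa order of the big phylogenetic distance matrix (hypothetical IDs)
pd.spname = c("OTU3", "OTU1", "OTU4", "OTU2", "OTU5")

# One-column matrix: rownames are taxa IDs, the column gives integer bin IDs
sp.bin = matrix(c(1L, 1L, 2L, 2L, 2L), ncol = 1,
                dimnames = list(paste0("OTU", 1:5), "bin.id"))

# For each bin ID (1, 2, ...), look up which rows/columns of the
# distance matrix hold the taxa of that bin
bin.ids = sort(unique(sp.bin[, 1]))
pdid.bin = lapply(bin.ids, function(b) {
  match(rownames(sp.bin)[sp.bin[, 1] == b], pd.spname)
})
```

Here the first element of pdid.bin holds the positions of OTU1 and OTU2 in pd.spname, and the second element holds the positions of OTU3, OTU4, and OTU5, so the list order follows the integer bin IDs.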
The beta nearest taxon index (betaNTI; Webb et al. 2008, Stegen et al. 2012) is calculated for each phylogenetic bin. betaNTI is a standardized measure of the mean phylogenetic distance to the nearest taxon between samples/communities (beta MNTD) and quantifies the extent of terminal clustering, independent of deep level clustering. Parallel computing is used to improve the speed.
The null model algorithm is "taxa shuffle" (Kembel 2009), i.e. shuffling taxa labels across the tips of the phylogenetic tree to randomize phylogenetic relationships among species. In this function, taxa will be randomized across all bins.
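The idea of one "taxa shuffle" randomization can be sketched in base R on a small, ordinary distance matrix. This is only an illustration with hypothetical values and a hypothetical helper name (`taxa_shuffle`); the package itself works on a bigmemory-backed matrix and runs many randomizations in parallel.

```r
set.seed(42)
# Toy phylogenetic distance matrix for 4 taxa (hypothetical values)
pd = matrix(c(0.0, 0.2, 0.8, 0.9,
              0.2, 0.0, 0.7, 0.8,
              0.8, 0.7, 0.0, 0.3,
              0.9, 0.8, 0.3, 0.0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("OTU", 1:4), paste0("OTU", 1:4)))

# One randomization: permute the tip labels while keeping
# the distance structure itself unchanged
taxa_shuffle = function(dis) {
  new.names = sample(rownames(dis))
  dimnames(dis) = list(new.names, new.names)
  dis
}

pd.rand = taxa_shuffle(pd)
```

The community matrix is left untouched; only the mapping between taxa IDs and positions on the tree changes, so the null betaMNTD is recomputed from comm and pd.rand.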
In the betaNTI matrix of each bin, the diagonal is set to zero. If the randomized results are all the same, the standard deviation is zero and betaNTI would be NaN. In this case, betaNTI is set to zero, since the observed result cannot be differentiated from the randomized results.
Modified RC (Chase et al 2011) and Confidence (Ning et al 2020) are alternative significance test indexes to evaluate how the observed beta diversity index deviates from null expectation, which could be a better metric than standardized effect size (betaNTI) in some cases, e.g. null values do not follow normal distribution.
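For a single pairwise comparison, the three indexes can be sketched in base R as follows. The helper name `null_sig` and the toy numbers are hypothetical. SES and RC follow the definitions given for sig.index; the Confidence line is an assumed reading of "percentage of null values less extreme than the observed value" and may differ from the package's exact implementation.

```r
# Hypothetical helper: compare one observed betaMNTD with its null values
null_sig = function(obs, null) {
  s = stats::sd(null)
  # SES (betaNTI); set to zero when all null values are identical (sd = 0)
  ses = if (is.na(s) || s == 0) 0 else (obs - mean(null)) / s
  # Modified Raup-Crick: alpha counts null values below the observed value
  # plus half of the ties, then RC = 2 x alpha - 1
  alpha = (sum(obs > null) + 0.5 * sum(null == obs)) / length(null)
  rc = 2 * alpha - 1
  # Confidence (assumed form): fraction of null values that the observed
  # value exceeds, or falls below, whichever is larger
  conf = max(mean(obs > null), mean(null > obs))
  c(SES = ses, RC = rc, Confidence = conf)
}

null = c(0.10, 0.12, 0.12, 0.15, 0.20)
res = null_sig(obs = 0.12, null = null)
```

Unlike SES, the RC and Confidence values depend only on the ranks of the null values, which is why they can be preferable when the null distribution is far from normal.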
Output is a list with following elements:
index |
list, each element is a square matrix of betaNTI values of a bin. The elements (bins) are in the same order as in the input pdid.bin. |
sig.index |
character, indicates the index for null model significance test, SES (i.e. betaNTI), RC, or Confidence. |
betaMNTD.obs |
Output only if output.bMNTD is TRUE. A list, each element is a square matrix of observed beta MNTD values of a bin. The elements (bins) are in the same order as in the input pdid.bin. |
rand |
Output only if detail.null is TRUE. A list, each element is a matrix with null values of beta MNTD for each turnover of a bin. The elements (bins) are in the same order as in the input pdid.bin. |
special.crct |
Output only if detail.null is TRUE. NULL if correct.special is FALSE. A list with three elements, corresponding to three different null model significance testing indexes, i.e. SES, RC, and Confidence. Each element is a matrix, where the value is zero if the result for a turnover of a bin does not need correction; otherwise it is the corrected value. |
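As a small usage sketch for the returned list, the code below summarizes a mock `index` element, i.e. one square betaNTI matrix per bin, by the fraction of pairwise comparisons whose absolute value exceeds ses.cut. The matrices are hypothetical values constructed by hand, not output from the package.

```r
# Mock of the 'index' element: one square matrix per bin (hypothetical values)
samp = paste0("S", 1:3)
mk = function(v) {
  m = matrix(0, 3, 3, dimnames = list(samp, samp))
  m[lower.tri(m)] = v
  m + t(m)  # symmetric matrix with zero diagonal
}
index = list(mk(c(2.5, -0.3, 1.1)), mk(c(-2.2, -2.4, 0.4)))

ses.cut = 1.96
# Fraction of pairwise comparisons per bin with |betaNTI| above the cutoff
frac.sig = sapply(index, function(m) {
  v = m[lower.tri(m)]
  mean(abs(v) > ses.cut)
})
```

Only the lower triangle is used, so each pair of samples is counted once and the zero diagonal is ignored.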
Version 8: 2021.4.18, fix the bug when detail.null=TRUE and comm has only two samples.
Version 7: 2020.9.1, remove setwd. change dontrun to donttest and revise save.wd in help doc.
Version 6: 2020.8.18, update help document, add example.
Version 5: 2020.8.1, change RC option to sig.index, add detail.null, rc.cut and conf.cut.
Version 4: 2019.11.6, add exclude.conspecifics.
Version 3: 2018.10.15, add unit.sum, correct.special.
Version 2: 2016.3.26, add RC option.
Version 1: 2015.12.16
Daliang Ning
Webb, C.O., Ackerly, D.D. & Kembel, S.W. (2008). Phylocom: software for the analysis of phylogenetic community structure and trait evolution. Bioinformatics, 24, 2098-2100.
Kembel, S.W. (2009). Disentangling niche and neutral influences on community assembly: assessing the performance of community phylogenetic structure tests. Ecol Lett, 12, 949-960.
Stegen, J.C., Lin, X., Konopka, A.E. & Fredrickson, J.K. (2012). Stochastic and deterministic assembly processes in subsurface microbial communities. Isme Journal, 6, 1653-1664.
Chase, J.M., Kraft, N.J.B., Smith, K.G., Vellend, M. & Inouye, B.D. (2011). Using null models to disentangle variation in community dissimilarity from variation in alpha-diversity. Ecosphere, 2, 1-11.
Ning, D., Yuan, M., Wu, L., Zhang, Y., Guo, X., Zhou, X. et al. (2020). A quantitative framework reveals ecological drivers of grassland microbial community assembly in response to warming. Nature Communications, 11, 4717.
Kane, M.J., Emerson, J., Weston, S. (2013). Scalable Strategies for Computing with Massive Data. Journal of Statistical Software, 55(14), 1-19. URL http://www.jstatsoft.org/v55/i14/.
library(iCAMP)
# function 'bNTI.bin.big' is usually used in the main function, 'icamp.big',
# when setting phylo.rand.scale="across",
# which means randomization across all bins in the phylogenetic null model.
data("example.data")
comm=example.data$comm
tree=example.data$tree
pdid.bin=example.data$pdid.bin
sp.bin=example.data$sp.bin
# since pdist.big needs to save output to a folder,
# the following code is set as 'not test'.
# but you may test the code on your computer after changing the path in 'save.wd'.
wd0=getwd()
save.wd=paste0(tempdir(),"/pdbig.bNTI.bin.big")
# you may change save.wd to the folder where you want to save the pd.big output.
nworker=2 # parallel computing thread number
pd.big=pdist.big(tree = tree, wd=save.wd, nworker = nworker)
rand.time=20 # usually use 1000 for real data.
bNTIbins=bNTI.bin.big(comm=comm, pd.desc=pd.big$pd.file,
                      pd.spname=pd.big$tip.label, pd.wd=pd.big$pd.wd,
                      pdid.bin=pdid.bin, sp.bin=sp.bin,
                      spname.check = TRUE, nworker = nworker, memo.size.GB = 50,
                      weighted = TRUE, rand = rand.time, output.bMNTD = FALSE,
                      sig.index="SES", unit.sum = NULL, correct.special = TRUE,
                      detail.null = FALSE, special.method = "MNTD",
                      exclude.conspecifics = FALSE)
setwd(wd0)
Perform null model test based on a phylogenetic beta diversity index, beta mean phylogenetic distance to the nearest taxon (betaMNTD), in each bin; calculate beta nearest taxon index (betaNTI), or modified Raup-Crick metric, or confidence level based on the comparison between observed and null betaMNTD in each bin. The package bigmemory (Kane et al 2013) is used to handle very large phylogenetic distance matrix. This function can deal with local communities under different metacommunities (regional pools).
bNTI.bin.cm(comm, meta.group = NULL, meta.spool = NULL, meta.frequency = NULL, meta.ab = NULL, pd.desc, pd.spname, pd.wd, pdid.bin, sp.bin, spname.check = FALSE, nworker = 4, memo.size.GB = 50, weighted = c(TRUE, FALSE), rand = 1000, output.bMNTD = c(FALSE, TRUE), sig.index = c("SES", "Confidence", "RC", "bNTI"), unit.sum = NULL, correct.special = FALSE, detail.null = FALSE, special.method = c("MNTD", "MPD", "both"), ses.cut = 1.96, rc.cut = 0.95, conf.cut = 0.975, exclude.conspecifics = FALSE, dirichlet = FALSE)
comm |
matrix or data.frame, community data, each row is a sample or site, each colname is a species or OTU or gene, thus rownames should be sample IDs, colnames should be taxa IDs. |
meta.group |
matrix or data.frame, a one-column (n x 1) matrix indicating which metacommunity each sample belongs to, so that different samples can belong to different metacommunities. Rownames are sample IDs; the first column gives metacommunity names. If an n x m matrix is supplied, only the first column is used. Default is NULL, which means all samples are from the same metacommunity. |
meta.spool |
a list object, each element is a character vector listing all taxa IDs in a metacommunity. The names of the elements indicate metacommunity names, which should be the same as the metacommunity names in meta.group. Default is NULL, means to use the observed taxa in comm across samples within the same metacommunity that is defined by meta.group. |
meta.frequency |
matrix or data.frame, each column represents a taxon, each row represents a metacommunity (regional pool), to define the occurrence frequency of each taxon in each metacommunity. The rownames indicate metacommunity names, which should be the same as the metacommunity names in meta.group. Default setting is NULL, means to calculate meta.frequency as occurrence frequency of each taxon in comm across the samples within each metacommunity defined by meta.group. |
meta.ab |
matrix or data.frame, each column represents a taxon, each row represents a metacommunity (regional pool), to define the abundance (or relative abundance) of each taxon in each metacommunity. The rownames indicate metacommunity names, which should be the same as the metacommunity names in meta.group. Default setting is NULL, which means meta.ab is calculated as the average relative abundance of each taxon in comm across the samples within each metacommunity defined by meta.group. |
pd.desc |
the name of the file to hold the backingfile description of the phylogenetic distance matrix, it is usually "pd.desc" if using default setting in pdist.big function. |
pd.spname |
character vector, taxa IDs in the same order as in the big matrix of phylogenetic distances. |
pd.wd |
folder path, where the bigmemory files of the phylogenetic distance matrix are saved. |
pdid.bin |
list, each element is a vector of integer, indicating which rows/columns in the big phylogenetic matrix represent the taxa in a bin. |
sp.bin |
one-column matrix, rownames are taxa IDs (i.e. OTU IDs), the only column shows the bin ID of each taxon. Bin IDs are integers in the same order as the elements in the list of pdid.bin. |
spname.check |
logic, whether to check the OTU ids (species names) in community matrix and phylogenetic distance matrix are the same. |
nworker |
for parallel computing. Either a character vector of host names on which to run the worker copies of R, or a positive integer (in which case that number of copies is run on localhost). default is 4, means 4 threads will be run. |
memo.size.GB |
numeric, to set the memory size as you need, so that calculation of large tree will not be limited by physical memory. unit is Gb. default is 50Gb. |
weighted |
Logic, consider abundances or not (just presence/absence). default is TRUE. |
rand |
integer, randomization times. default is 1000. |
output.bMNTD |
logic, if TRUE, the output will include betaMNTD. |
sig.index |
character, the index for null model significance test. SES or bNTI, standard effect size, i.e. beta nearest taxon index (betaNTI); Confidence, percentage of null values less extreme than the observed value, i.e. non-parametric one-side confidence level; RC, modified Raup-Crick index (RC) based on betaMNTD, i.e. count the number of null betaMNTD lower than observed betaMNTD plus a half of the number of null betaMNTD equal to observed betaMNTD, to get alpha, then calculate betaMNTD-based RC as (2 x alpha - 1). default is SES. If input a vector, only the first element will be used. |
unit.sum |
NULL, a number, or a numeric vector. When a beta diversity index is calculated for a bin, the taxa abundances will be divided by unit.sum to calculate the relative abundances. Usually, unit.sum can be set as the sequencing depth in each sample. Default setting is NULL, which means this transformation is not applied. |
correct.special |
logic, whether to correct the special cases. Default is FALSE. |
detail.null |
logic, if TRUE, the output will include all the null values. Default is FALSE. |
special.method |
When correct.special is TRUE, which method will be used to check underestimation of deterministic pattern(s) in special cases. MNTD, use null model test of mean distance to the nearest taxon; MPD, use null model test based on mean pairwise distance; both, use null model test of both MPD and MNTD. Default is MNTD. |
ses.cut |
numeric, the cutoff of significant standard effect size, default is 1.96. |
rc.cut |
numeric, the cutoff of significant modified Raup-Crick metric, default is 0.95. |
conf.cut |
numeric, the cutoff of significant one-side confidence level, default is 0.975. |
exclude.conspecifics |
Logic, should conspecific taxa in different communities be excluded from MNTD calculations? Default is FALSE. The same as in the function bmntd. |
dirichlet |
Logic. If TRUE, the taxonomic null model for correcting special cases will use Dirichlet distribution to generate relative abundances in randomized community matrix. default is FALSE. |
This function is particularly designed for samples from different metacommunities. The null model "taxa shuffle" will be done under different metacommunities, separately (and independently). All other details are the same as the function bNTI.bin.big.
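The documented defaults for meta.frequency and meta.ab (used when they are NULL) can be approximated in base R as below. This is a sketch on hypothetical toy data, not the package's code; in particular, treating "occurrence frequency" as the proportion of samples in which a taxon is present is an assumption.

```r
# Toy community matrix: 4 samples x 3 taxa (hypothetical data)
comm = matrix(c(3, 0, 1,
                1, 1, 0,
                0, 4, 4,
                0, 0, 2),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("S", 1:4), paste0("OTU", 1:3)))
meta.group = data.frame(meta.com = c("meta1", "meta1", "meta2", "meta2"),
                        row.names = rownames(comm))

grp = as.character(meta.group[rownames(comm), 1])
rel = comm / rowSums(comm)  # relative abundance within each sample
rows.by.group = split(seq_len(nrow(comm)), grp)

# Rows are metacommunities, columns are taxa
meta.frequency = t(sapply(rows.by.group, function(i) {
  colMeans(comm[i, , drop = FALSE] > 0)  # assumed: share of samples with the taxon
}))
meta.ab = t(sapply(rows.by.group, function(i) {
  colMeans(rel[i, , drop = FALSE])       # average relative abundance
}))
```

Both results have metacommunity names as rownames, matching the requirement that the rownames agree with the names used in meta.group.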
Output is a list with following elements:
index |
list, each element is a square matrix of betaNTI values of a bin. The elements (bins) are in the same order as in the input pdid.bin. |
sig.index |
character, indicates the index for null model significance test, SES (i.e. betaNTI), RC, or Confidence. |
betaMNTD.obs |
Output only if output.bMNTD is TRUE. A list, each element is a square matrix of observed beta MNTD values of a bin. The elements (bins) are in the same order as in the input pdid.bin. |
rand |
Output only if detail.null is TRUE. A list, each element is a matrix with null values of beta MNTD for each turnover of a bin. The elements (bins) are in the same order as in the input pdid.bin. |
special.crct |
Output only if detail.null is TRUE. NULL if correct.special is FALSE. A list with three elements, corresponding to three different null model significance testing indexes, i.e. SES, RC, and Confidence. Each element is a matrix, where the value is zero if the result for a turnover of a bin does not need correction; otherwise it is the corrected value. |
Version 2: 2022.4.26, fixed error when correcting special case.
Version 1: 2021.8.4
Daliang Ning
Webb, C.O., Ackerly, D.D. & Kembel, S.W. (2008). Phylocom: software for the analysis of phylogenetic community structure and trait evolution. Bioinformatics, 24, 2098-2100.
Kembel, S.W. (2009). Disentangling niche and neutral influences on community assembly: assessing the performance of community phylogenetic structure tests. Ecol Lett, 12, 949-960.
Stegen, J.C., Lin, X., Konopka, A.E. & Fredrickson, J.K. (2012). Stochastic and deterministic assembly processes in subsurface microbial communities. Isme Journal, 6, 1653-1664.
Chase, J.M., Kraft, N.J.B., Smith, K.G., Vellend, M. & Inouye, B.D. (2011). Using null models to disentangle variation in community dissimilarity from variation in alpha-diversity. Ecosphere, 2, 1-11.
Ning, D., Yuan, M., Wu, L., Zhang, Y., Guo, X., Zhou, X. et al. (2020). A quantitative framework reveals ecological drivers of grassland microbial community assembly in response to warming. Nature Communications, 11, 4717.
Kane, M.J., Emerson, J., Weston, S. (2013). Scalable Strategies for Computing with Massive Data. Journal of Statistical Software, 55(14), 1-19. URL http://www.jstatsoft.org/v55/i14/.
bNTI.bin.big, icamp.cm, bNTI.cm, bNTI.big.cm
library(iCAMP)
# function 'bNTI.bin.cm' is usually used in the main function, 'icamp.cm',
# when setting phylo.rand.scale="across",
# which means randomization across all bins in the phylogenetic null model.
data("example.data")
comm=example.data$comm
tree=example.data$tree
pdid.bin=example.data$pdid.bin
sp.bin=example.data$sp.bin
# in this example, 10 samples are from one metacommunity,
# and the other 10 samples are from another metacommunity.
meta.group=data.frame(meta.com=c(rep("meta1",10),rep("meta2",10)))
rownames(meta.group)=rownames(comm)
# since pdist.big needs to save output to a folder,
# the following code is set as 'not test'.
# but you may test the code on your computer after changing the path in 'save.wd'.
wd0=getwd()
save.wd=paste0(tempdir(),"/pdbig.bNTI.bin.cm")
# you may change save.wd to the folder where you want to save the pd.big output.
nworker=2 # parallel computing thread number
pd.big=pdist.big(tree = tree, wd=save.wd, nworker = nworker)
rand.time=20 # usually use 1000 for real data.
bNTIbins=bNTI.bin.cm(comm=comm,meta.group=meta.group,
                     pd.desc=pd.big$pd.file, pd.spname=pd.big$tip.label,
                     pd.wd=pd.big$pd.wd, pdid.bin=pdid.bin, sp.bin=sp.bin,
                     spname.check = TRUE, nworker = nworker, memo.size.GB = 50,
                     weighted = TRUE, rand = rand.time, output.bMNTD = FALSE,
                     sig.index="SES", unit.sum = NULL, correct.special = TRUE,
                     detail.null = FALSE, special.method = "MNTD",
                     exclude.conspecifics = FALSE)
setwd(wd0)
Perform null model test based on a phylogenetic beta diversity index, beta mean phylogenetic distance to the nearest taxon (betaMNTD); calculate beta nearest taxon index (betaNTI), or modified Raup-Crick metric, or confidence level, based on the comparison between observed and null betaMNTD. Run by parallel computing. This function can deal with local communities under different metacommunities (regional pools).
bNTI.cm(comm, dis, nworker = 4, memo.size.GB = 50, meta.group = NULL, meta.spool = NULL, meta.frequency = NULL, meta.ab = NULL, weighted = c(TRUE, FALSE), exclude.consp = FALSE, rand = 1000, output.bMNTD = c(FALSE, TRUE), sig.index = c("SES", "Confidence", "RC", "bNTI"), unit.sum = NULL, correct.special = FALSE, detail.null = FALSE, special.method = c("MNTD", "MPD", "both"), ses.cut = 1.96, rc.cut = 0.95, conf.cut = 0.975, dirichlet = FALSE)
comm |
community data matrix. rownames are sample names. colnames are species names |
dis |
Phylogenetic distance matrix. |
nworker |
for parallel computing. Either a character vector of host names on which to run the worker copies of R, or a positive integer (in which case that number of copies is run on localhost). default is 4, means 4 threads will be run. |
memo.size.GB |
numeric, to set the memory size as you need, so that calculation of large tree will not be limited by physical memory. unit is Gb. default is 50Gb. |
meta.group |
matrix or data.frame, a one-column (n x 1) matrix indicating which metacommunity each sample belongs to, so that different samples can belong to different metacommunities. Rownames are sample IDs; the first column gives metacommunity names. If an n x m matrix is supplied, only the first column is used. Default is NULL, which means all samples are from the same metacommunity. |
meta.spool |
a list object, each element is a character vector listing all taxa IDs in a metacommunity. The names of the elements indicate metacommunity names, which should be the same as the metacommunity names in meta.group. Default is NULL, means to use the observed taxa in comm across samples within the same metacommunity that is defined by meta.group. |
meta.frequency |
matrix or data.frame, each column represents a taxon, each row represents a metacommunity (regional pool), to define the occurrence frequency of each taxon in each metacommunity. The rownames indicate metacommunity names, which should be the same as the metacommunity names in meta.group. Default setting is NULL, means to calculate meta.frequency as occurrence frequency of each taxon in comm across the samples within each metacommunity defined by meta.group. |
meta.ab |
matrix or data.frame, each column represents a taxon, each row represents a metacommunity (regional pool), to define the abundance (or relative abundance) of each taxon in each metacommunity. The rownames indicate metacommunity names, which should be the same as the metacommunity names in meta.group. Default setting is NULL, which means to calculate meta.ab as the average relative abundance of each taxon in comm across the samples within each metacommunity defined by meta.group. |
weighted |
Logic, consider abundances or not (just presence/absence). default is TRUE. |
exclude.consp |
Logic, should conspecific taxa in different communities be excluded from MNTD calculations? default is FALSE. The same as in the function bmntd. |
rand |
integer, randomization times. default is 1000. |
output.bMNTD |
logic, if TRUE, the output will include the observed betaMNTD. |
sig.index |
character, the index for null model significance test. SES or bNTI, standard effect size, i.e. beta nearest taxon index (betaNTI); Confidence, percentage of null values less extreme than the observed value, i.e. non-parametric one-side confidence level; RC, modified Raup-Crick index (RC) based on betaMNTD, i.e. count the number of null betaMNTD lower than observed betaMNTD plus a half of the number of null betaMNTD equal to observed betaMNTD, to get alpha, then calculate betaMNTD-based RC as (2 x alpha - 1). default is SES. If input a vector, only the first element will be used. |
unit.sum |
NULL or a number or a numeric vector. When a beta diversity index is calculated for a bin, the taxa abundances will be divided by unit.sum to calculate the relative abundances. Usually, unit.sum can be set as the sequencing depth in each sample. Default setting is NULL, which means not to do this transformation. |
correct.special |
logic, whether to correct the special cases when calculating bNTI. Default is FALSE. |
detail.null |
logic, if TRUE, the output will include all the null values. Default is FALSE. |
special.method |
When correct.special is TRUE, which method will be used to check underestimation of deterministic pattern(s) in special cases. MNTD, use null model test of mean distance to the nearest taxon; MPD, use null model test based on mean pairwise distance; both, use null model test of both MPD and MNTD. Default is MNTD. |
ses.cut |
numeric, the cutoff of significant standard effect size, default is 1.96. |
rc.cut |
numeric, the cutoff of significant modified Raup-Crick metric, default is 0.95. |
conf.cut |
numeric, the cutoff of significant one-side confidence level, default is 0.975. |
dirichlet |
Logic. If TRUE, the taxonomic null model for correcting special cases will use Dirichlet distribution to generate relative abundances in randomized community matrix. default is FALSE. |
This function is particularly designed for samples from different metacommunities. The null model "taxa shuffle" will be done under different metacommunities, separately (and independently). All other details are the same as the function bNTIn.p.
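The documented defaults of meta.spool, meta.frequency, and meta.ab can be illustrated with a few lines of base R. This is a minimal sketch, not the package's internal code: the toy data and the object names (toy.comm, toy.group, spool, freq, ab) are hypothetical, and expressing occurrence frequency as a proportion of samples is an assumption.

```r
# Toy community matrix: 4 samples x 3 taxa (hypothetical data)
toy.comm <- matrix(c(5, 0, 2,
                     3, 1, 0,
                     0, 4, 4,
                     0, 2, 6),
                   nrow = 4, byrow = TRUE,
                   dimnames = list(paste0("s", 1:4), paste0("OTU", 1:3)))
# One-column meta.group: rownames are sample IDs
toy.group <- data.frame(meta.com = c("meta1", "meta1", "meta2", "meta2"),
                        row.names = rownames(toy.comm))

# Sample IDs belonging to each metacommunity
grp <- toy.group[rownames(toy.comm), 1]
sample.by.meta <- split(rownames(toy.comm), grp)

# meta.spool default: taxa observed in at least one sample of the metacommunity
spool <- lapply(sample.by.meta, function(ids) {
  colnames(toy.comm)[colSums(toy.comm[ids, , drop = FALSE]) > 0]
})

# meta.frequency default: occurrence frequency of each taxon per metacommunity
# (assumption: expressed as the proportion of samples where the taxon occurs)
freq <- t(sapply(sample.by.meta, function(ids) {
  colMeans(toy.comm[ids, , drop = FALSE] > 0)
}))

# meta.ab default: average relative abundance of each taxon per metacommunity
rel <- toy.comm / rowSums(toy.comm)
ab <- t(sapply(sample.by.meta, function(ids) {
  colMeans(rel[ids, , drop = FALSE])
}))
```

Objects built this way have the layout described for the arguments: spool is a named list of taxa IDs, while freq and ab have one row per metacommunity and one column per taxon.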
Output is a list with following elements:
index |
a square matrix of betaNTI (or RC or Confidence based on betaMNTD) values. |
sig.index |
character, indicates the index for null model significance test, SES (i.e. betaNTI), RC, or Confidence. |
betaMNTD.obs |
Output only if output.bMNTD is TRUE. A square matrix of observed beta MNTD values. |
rand |
Output only if detail.null is TRUE. A matrix with null values of beta MNTD for each turnover. |
special.crct |
Output only if detail.null is TRUE. It will be NULL if correct.special is FALSE. Otherwise, it will be a list with three elements, corresponding to three different null model significance testing indexes, i.e. SES, RC, and Confidence. Each element is a square matrix, where the value is zero if the result for a turnover does not need correction; otherwise it holds the corrected value. |
Version 1: 2021.8.4
Daliang Ning
Webb, C.O., Ackerly, D.D. & Kembel, S.W. (2008). Phylocom: software for the analysis of phylogenetic community structure and trait evolution. Bioinformatics, 24, 2098-2100.
Kembel, S.W. (2009). Disentangling niche and neutral influences on community assembly: assessing the performance of community phylogenetic structure tests. Ecol Lett, 12, 949-960.
Stegen, J.C., Lin, X., Konopka, A.E. & Fredrickson, J.K. (2012). Stochastic and deterministic assembly processes in subsurface microbial communities. Isme Journal, 6, 1653-1664.
Chase, J.M., Kraft, N.J.B., Smith, K.G., Vellend, M. & Inouye, B.D. (2011). Using null models to disentangle variation in community dissimilarity from variation in alpha-diversity. Ecosphere, 2, 1-11.
Ning, D., Yuan, M., Wu, L., Zhang, Y., Guo, X., Zhou, X. et al. (2020). A quantitative framework reveals ecological drivers of grassland microbial community assembly in response to warming. Nature Communications, 11, 4717.
data("example.data")
comm=example.data$comm
pd=example.data$pd
# in this example, 10 samples from one metacommunity,
# the other 10 samples from another metacommunity.
meta.group=data.frame(meta.com=c(rep("meta1",10),rep("meta2",10)))
rownames(meta.group)=rownames(comm)
nworker=2 # parallel computing thread number
rand.time=4 # usually use 1000 for real data.
bNTI=bNTI.cm(comm=comm, meta.group=meta.group, dis=pd, nworker = nworker,
             memo.size.GB = 50, weighted = TRUE, exclude.consp = FALSE,
             rand = rand.time, output.bMNTD = FALSE, sig.index = "SES",
             unit.sum = NULL, correct.special = TRUE, detail.null = FALSE,
             special.method = "MNTD")
Perform null model test based on a phylogenetic beta diversity index, beta mean phylogenetic distance to the nearest taxon (betaMNTD); calculate beta nearest taxon index (betaNTI), or modified Raup-Crick metric, or confidence level, based on the comparison between observed and null betaMNTD. Run by parallel computing.
bNTIn.p(comm, dis, nworker = 4, memo.size.GB = 50, weighted = c(TRUE, FALSE), exclude.consp = FALSE, rand = 1000, output.bMNTD = c(FALSE, TRUE), sig.index=c("SES","Confidence","RC","bNTI"), unit.sum = NULL, correct.special = FALSE, detail.null=FALSE, special.method=c("MNTD", "MPD", "both"), ses.cut=1.96,rc.cut=0.95,conf.cut=0.975, dirichlet = FALSE)
comm |
community data matrix. rownames are sample names. colnames are species names |
dis |
Phylogenetic distance matrix. |
nworker |
for parallel computing. Either a character vector of host names on which to run the worker copies of R, or a positive integer (in which case that number of copies is run on localhost). default is 4, means 4 threads will be run. |
memo.size.GB |
numeric, to set the memory size as you need, so that calculation of large tree will not be limited by physical memory. unit is Gb. default is 50Gb. |
weighted |
Logic, consider abundances or not (just presence/absence). default is TRUE. |
exclude.consp |
Logic, should conspecific taxa in different communities be excluded from MNTD calculations? default is FALSE. The same as in the function bmntd. |
rand |
integer, randomization times. default is 1000. |
output.bMNTD |
logic, if TRUE, the output will include the observed betaMNTD. |
sig.index |
character, the index for null model significance test. SES or bNTI, standard effect size, i.e. beta nearest taxon index (betaNTI); Confidence, percentage of null values less extreme than the observed value, i.e. non-parametric one-side confidence level; RC, modified Raup-Crick index (RC) based on betaMNTD, i.e. count the number of null betaMNTD lower than observed betaMNTD plus a half of the number of null betaMNTD equal to observed betaMNTD, to get alpha, then calculate betaMNTD-based RC as (2 x alpha - 1). default is SES. If input a vector, only the first element will be used. |
unit.sum |
NULL or a number or a numeric vector. When a beta diversity index is calculated for a bin, the taxa abundances will be divided by unit.sum to calculate the relative abundances. Usually, unit.sum can be set as the sequencing depth in each sample. Default setting is NULL, which means not to do this transformation. |
correct.special |
logic, whether to correct the special cases when calculating bNTI. Default is FALSE. |
detail.null |
logic, if TRUE, the output will include all the null values. Default is FALSE. |
special.method |
When correct.special is TRUE, which method will be used to check underestimation of deterministic pattern(s) in special cases. MNTD, use null model test of mean distance to the nearest taxon; MPD, use null model test based on mean pairwise distance; both, use null model test of both MPD and MNTD. Default is MNTD. |
ses.cut |
numeric, the cutoff of significant standard effect size, default is 1.96. |
rc.cut |
numeric, the cutoff of significant modified Raup-Crick metric, default is 0.95. |
conf.cut |
numeric, the cutoff of significant one-side confidence level, default is 0.975. |
dirichlet |
Logic. If TRUE, the taxonomic null model for correcting special cases will use Dirichlet distribution to generate relative abundances in randomized community matrix. default is FALSE. |
The beta nearest taxon index (beta NTI; Webb et al. 2008, Stegen et al. 2012) is a standardized measure of the mean phylogenetic distance to the nearest taxon between samples/communities (beta MNTD) and quantifies the extent of terminal clustering, independent of deep level clustering. Parallel computing is used to improve the speed.
The null model algorithm is "taxa shuffle" (Kembel 2009), i.e. shuffling taxa labels across the tips of the phylogenetic tree to randomize phylogenetic relationships among species.
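The two ingredients described above, betaMNTD for a pair of samples and the "taxa shuffle" randomization, can be sketched in base R. This is an illustration under stated assumptions and not the package's implementation: the toy data and the helper bmntd.pair are hypothetical, the weighting follows the common abundance-weighted definition, and conspecific taxa are not excluded.

```r
set.seed(1)  # only to make the toy randomization repeatable

# Hypothetical toy inputs: 2 samples x 5 taxa and a symmetric distance matrix
taxa <- paste0("OTU", 1:5)
toy.comm <- matrix(c(4, 1, 0, 0, 5,
                     0, 2, 3, 5, 0),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("s1", "s2"), taxa))
toy.dis <- as.matrix(dist(c(0, 1, 3, 6, 10)))
dimnames(toy.dis) <- list(taxa, taxa)

# Abundance-weighted betaMNTD between two samples:
# each taxon is matched to its nearest taxon in the other sample.
bmntd.pair <- function(a, b, dis) {
  ta <- names(a)[a > 0]
  tb <- names(b)[b > 0]
  sub <- dis[ta, tb, drop = FALSE]
  d.ab <- apply(sub, 1, min)   # nearest taxon in b for each taxon in a
  d.ba <- apply(sub, 2, min)   # nearest taxon in a for each taxon in b
  wa <- a[ta] / sum(a[ta])     # relative abundances within each sample
  wb <- b[tb] / sum(b[tb])
  0.5 * (sum(wa * d.ab) + sum(wb * d.ba))
}

obs <- bmntd.pair(toy.comm["s1", ], toy.comm["s2", ], toy.dis)

# "Taxa shuffle": permute the taxon labels of the distance matrix,
# leaving the community matrix untouched.
null.vals <- replicate(99, {
  shuffled <- toy.dis
  dimnames(shuffled) <- rep(list(sample(taxa)), 2)
  bmntd.pair(toy.comm["s1", ], toy.comm["s2", ], shuffled)
})
```

Relabeling the rows and columns of the distance matrix with the same permutation is equivalent to shuffling taxa labels across the tips of the tree, which is why the community matrix itself stays unchanged.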
In the output of betaNTI, the diagonal is set to zero. If the randomized results are all the same, the standard deviation will be zero and betaNTI will be NaN. In this case, betaNTI will be set to zero, since the observed result cannot be differentiated from the randomized results.
Modified RC (Chase et al 2011) and Confidence (Ning et al 2020) are alternative significance test indexes to evaluate how the observed beta diversity index deviates from null expectation, which could be a better metric than standardized effect size (betaNTI) in some cases, e.g. null values do not follow normal distribution.
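The three significance indexes can be compared on a single observed value and its null values. The sketch below only follows the definitions given in the sig.index argument; the numbers are made up, and how the package signs or combines the two one-sided proportions for Confidence is not stated here, so they are kept as two separate values.

```r
# Hypothetical observed betaMNTD and null values for one pair of samples
obs <- 2.5
null.vals <- c(1.8, 2.1, 2.5, 2.9, 3.4, 1.2, 2.5, 3.0, 2.2, 1.9)

# SES (betaNTI): (observed - mean of null) / sd of null,
# set to zero when all null values are identical (sd = 0)
null.sd <- sd(null.vals)
ses <- if (null.sd == 0) 0 else (obs - mean(null.vals)) / null.sd

# Modified Raup-Crick: alpha is the share of null values lower than the
# observed value plus half of the ties; RC = 2 * alpha - 1
alpha <- (sum(null.vals < obs) + 0.5 * sum(null.vals == obs)) / length(null.vals)
rc <- 2 * alpha - 1

# Confidence: share of null values strictly below / strictly above the
# observed value (one-sided, non-parametric)
conf.higher <- mean(null.vals < obs)  # observed is higher than these null values
conf.lower  <- mean(null.vals > obs)  # observed is lower than these null values
```

Unlike SES, neither RC nor the Confidence proportions rely on the mean and standard deviation of the null values, which is why they can be preferable when the null values are far from normally distributed.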
Output is a list with following elements:
index |
a square matrix of betaNTI (or RC or Confidence based on betaMNTD) values. |
sig.index |
character, indicates the index for null model significance test, SES (i.e. betaNTI), RC, or Confidence. |
betaMNTD.obs |
Output only if output.bMNTD is TRUE. A square matrix of observed beta MNTD values. |
rand |
Output only if detail.null is TRUE. A matrix with null values of beta MNTD for each turnover. |
special.crct |
Output only if detail.null is TRUE. It will be NULL if correct.special is FALSE. Otherwise, it will be a list with three elements, corresponding to three different null model significance testing indexes, i.e. SES, RC, and Confidence. Each element is a square matrix, where the value is zero if the result for a turnover does not need correction; otherwise it holds the corrected value. |
Version 7: 2021.4.18, fix the bug when detail.null=TRUE and comm has only two samples.
Version 6: 2020.8.18, update help document, add example.
Version 5: 2020.8.1, change RC option to sig.index, add detail.null, rc.cut and conf.cut.
Version 4: 2018.10.15, consider special cases.
Version 3: 2016.3.26, add RC option.
Version 2: 2015.9.23, set diag of bNTI = 0 and set 0/0 = 0 for bNTI.
Version 1: 2015.4.1
Daliang Ning
Webb, C.O., Ackerly, D.D. & Kembel, S.W. (2008). Phylocom: software for the analysis of phylogenetic community structure and trait evolution. Bioinformatics, 24, 2098-2100.
Kembel, S.W. (2009). Disentangling niche and neutral influences on community assembly: assessing the performance of community phylogenetic structure tests. Ecol Lett, 12, 949-960.
Stegen, J.C., Lin, X., Konopka, A.E. & Fredrickson, J.K. (2012). Stochastic and deterministic assembly processes in subsurface microbial communities. Isme Journal, 6, 1653-1664.
Chase, J.M., Kraft, N.J.B., Smith, K.G., Vellend, M. & Inouye, B.D. (2011). Using null models to disentangle variation in community dissimilarity from variation in alpha-diversity. Ecosphere, 2, 1-11.
Ning, D., Yuan, M., Wu, L., Zhang, Y., Guo, X., Zhou, X. et al. (2020). A quantitative framework reveals ecological drivers of grassland microbial community assembly in response to warming. Nature Communications, 11, 4717.
data("example.data")
comm=example.data$comm
pd=example.data$pd
nworker=2 # parallel computing thread number
rand.time=4 # usually use 1000 for real data.
bNTI=bNTIn.p(comm=comm, dis=pd, nworker = nworker, memo.size.GB = 50,
             weighted = TRUE, exclude.consp = FALSE, rand = rand.time,
             output.bMNTD = FALSE, sig.index = "SES", unit.sum = NULL,
             correct.special = TRUE, detail.null = FALSE,
             special.method = "MNTD")
This function is to change the method to calculate significance between null and observed dissimilarity and/or change the significance threshold values.
change.sigindex(icamp.output, sig.index = c("Confidence", "SES.RC", "SES", "RC"), detail.save = TRUE, detail.null = FALSE, ses.cut = 1.96, rc.cut = 0.95, conf.cut = 0.975)
icamp.output |
list, the exact output of the function icamp.big in which detail.null must be TRUE, to save all null values. |
sig.index |
character, Confidence means to directly count the percentage of null values higher/lower than observed value; SES.RC means to use Standard Effect Size (e.g. betaNRI, betaNTI) for phylogenetic beta diversity and use modified Raup-Crick for taxonomic beta diversity, which is typical practice in the previous method; SES means to use Standard Effect Size for both phylogenetic and taxonomic beta diversity; RC means to use modified Raup-Crick for both phylogenetic and taxonomic beta diversity. |
detail.save |
logic, whether to output the details, including binning information, significance indexes, bin abundances, and some key parameter settings for iCAMP analysis. Default is TRUE |
detail.null |
logic, whether to output all observed and null values of beta diversity indexes. Default is FALSE. |
ses.cut |
numeric, the cutoff of significant standard effect size, default is 1.96. |
rc.cut |
numeric, the cutoff of significant modified Raup-Crick index value, default is 0.95. |
conf.cut |
numeric, the cutoff of significant confidence level (one-tail), default is 0.975. |
This function is to re-calculate significance using another index or a different threshold value using previously saved null model values. Since the null values are directly extracted from previous icamp.big results, it can skip the most time-consuming step (randomization) and quickly complete calculation.
The default threshold values of Confidence (0.975), SES (1.96), and RC (0.95) are meant to capture the 0.95 two-tail confidence level (P=0.05). However, SES assumes that the null values follow a normal distribution. RC counts half of the special cases in which null values equal the observed value, which is good for obtaining a symmetric metric but in theory carries a (very slight) risk of misestimating the significance level. Thus, Confidence is preferred as long as the 1000-time randomization is representative.
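How the three default cutoffs correspond to one two-tailed significance level can be checked with simple arithmetic. This sketch assumes normally distributed null values for SES and uses RC = 2 x alpha - 1 as defined for the modified Raup-Crick index; the helper name cutoffs.for.p is hypothetical and not part of the package.

```r
# Cutoffs matching a two-tailed significance level p
cutoffs.for.p <- function(p) {
  conf.cut <- 1 - p / 2                 # one-tail confidence level
  c(conf.cut = conf.cut,
    ses.cut  = qnorm(conf.cut),         # normal quantile
    rc.cut   = 2 * conf.cut - 1)        # RC = 2 * alpha - 1 at alpha = conf.cut
}

default.cut <- cutoffs.for.p(0.05)      # reproduces 0.975, about 1.96, 0.95
strict.cut  <- cutoffs.for.p(0.01)      # a stricter alternative
```

The same helper can be used to derive a consistent set of ses.cut, rc.cut, and conf.cut values when a different significance level is wanted.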
The output will be the same as icamp.big.
Version 2: 2020.8.18, update help document, add example.
Version 1: 2020.8.1
Daliang Ning
Ning, D., Yuan, M., Wu, L., Zhang, Y., Guo, X., Zhou, X. et al. (2020). A quantitative framework reveals ecological drivers of grassland microbial community assembly in response to warming. Nature Communications, 11, 4717.
data("icamp.out")
icamp.out.new=change.sigindex(icamp.output=icamp.out, sig.index = "Confidence")
This function is to calculate the popular effect size index Cohen's d.
cohend(treat, control, paired = FALSE)
treat |
a numeric vector. treatment group. |
control |
a numeric vector. control group. |
paired |
logic. Whether the samples in treatment and control groups are paired. default is FALSE. |
This function computes the value of the Cohen's d statistic (Cohen 1988). The magnitude of the effect size is assessed using the thresholds proposed by Cohen (1992), i.e. |d|<0.2 "negligible", 0.2<=|d|<0.5 "small", 0.5<=|d|<0.8 "medium", |d|>=0.8 "large". The variance of d is calculated using the conversion formula reported on page 238 of Cooper et al. (2009): ((n1+n2)/(n1*n2) + .5*d^2/df) * ((n1+n2)/df). Its square root is output as the standard deviation of d.
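The calculation for the unpaired case can be written out step by step. This is a sketch only: it assumes that the standardizer is the pooled standard deviation and that df = n1 + n2 - 2, neither of which is stated above, and the helper name cohend.sketch is hypothetical.

```r
# Unpaired Cohen's d with the variance formula quoted above.
# Assumptions: pooled standard deviation, df = n1 + n2 - 2.
cohend.sketch <- function(treat, control) {
  n1 <- length(treat)
  n2 <- length(control)
  df <- n1 + n2 - 2
  sd.pooled <- sqrt(((n1 - 1) * var(treat) + (n2 - 1) * var(control)) / df)
  d <- (mean(treat) - mean(control)) / sd.pooled
  # variance of d, then its square root as the standard deviation of d
  var.d <- ((n1 + n2) / (n1 * n2) + 0.5 * d^2 / df) * ((n1 + n2) / df)
  # qualitative magnitude from Cohen's thresholds on |d|
  magnitude <- cut(abs(d), breaks = c(0, 0.2, 0.5, 0.8, Inf),
                   labels = c("negligible", "small", "medium", "large"),
                   right = FALSE)
  list(d = d, sd = sqrt(var.d), magnitude = as.character(magnitude))
}

res <- cohend.sketch(treat = c(1, 5, 8), control = c(2, 6, 10))
```

With only three values per group the standard deviation of d is large relative to d itself, which is a reminder that the magnitude label alone says little without the uncertainty.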
A list of values will be returned
d |
Cohen's d value, (mean(treat)-mean(control))/sd |
sd |
standard deviation of d |
magnitude |
a qualitative assessment of the magnitude of effect size |
paired |
whether the samples are paired |
version 1: 2016.2.12
Daliang Ning
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New York:Academic Press
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155-159.
Cooper, H., Hedges, L.V. & Valentine, J.C. (2009). The Handbook of Research Synthesis and Meta-Analysis (2nd ed.). New York: Russell Sage Foundation.
x=c(1,5,8)
y=c(2,6,10)
cohend(x,y)
cohend(x,y,paired=TRUE)
Transform a distance matrix to a 3-column matrix in which the first 2 columns indicate the names of the paired samples/species.
dist.3col(dist)
dist |
a square matrix or distance object with column names and row names. |
In many cases, a 3-column matrix is easier to use than a distance matrix.
name1 |
1st column, the first item of each pair |
name2 |
2nd column, the second item of each pair |
dis |
3rd column, the distance value between the two items of the pair |
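The transformation itself is easy to reproduce in base R, which may help when checking the result. The helper to.3col below is a hypothetical stand-in, not the package function: it returns a data.frame rather than a matrix, and the order of rows may differ from what dist.3col produces.

```r
# Turn a distance object (or square matrix) into a 3-column table,
# listing each pair once and skipping the diagonal.
to.3col <- function(d) {
  m <- as.matrix(d)
  idx <- which(lower.tri(m), arr.ind = TRUE)
  data.frame(name1 = colnames(m)[idx[, "col"]],
             name2 = rownames(m)[idx[, "row"]],
             dis = m[idx],
             stringsAsFactors = FALSE)
}

toy <- dist(c(a = 1, b = 4, c = 9))   # hypothetical toy distances
toy.3col <- to.3col(toy)
```

In the long format, pairs can be filtered, merged with sample metadata, or joined with other pairwise results by matching on name1 and name2.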
Version 1: 2015.5.17
Daliang Ning
# In this example, dist.3col transforms the distance object
# of Bray-Curtis dissimilarity to 3-column matrix.
data("example.data")
comm=example.data$comm
BC=vegan::vegdist(comm)
BC3c=dist.3col(BC)
Convert a list of distance matrixes (or square matrixes) with the same sample IDs into a matrix.
dist.bin.3col(dist.bin, obj.name = NULL)
dist.bin |
a list, each element is a distance matrix or square matrix. all elements have exactly the same sample IDs (rownames and colnames) which are in the same order. |
obj.name |
a character, as a prefix of the bin names. |
A tool to facilitate format transformation in iCAMP analysis.
Output is a matrix. The first two columns are sample IDs, and each of the following columns represents an element in the original list, which is usually a bin in iCAMP analysis.
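The layout of this output can be mimicked in base R for a quick sanity check. The helper bins.to.cols is hypothetical, and the way the prefix is joined to the bin names (here with a dot) is an assumption; the package may name columns differently.

```r
# Stack a list of square matrices (same sample IDs, same order) into one
# table: two ID columns, then one value column per list element.
bins.to.cols <- function(dist.bin, obj.name = NULL) {
  m1 <- as.matrix(dist.bin[[1]])
  idx <- which(lower.tri(m1), arr.ind = TRUE)
  vals <- do.call(cbind, lapply(dist.bin, function(x) as.matrix(x)[idx]))
  if (!is.null(obj.name)) {
    colnames(vals) <- paste0(obj.name, ".", colnames(vals))  # assumed naming
  }
  data.frame(name1 = colnames(m1)[idx[, "col"]],
             name2 = rownames(m1)[idx[, "row"]],
             vals, stringsAsFactors = FALSE)
}

bin.dist <- as.matrix(dist(1:10))
rownames(bin.dist) <- colnames(bin.dist) <- paste0("Sample", 1:10)
toy.bins <- list(bin1 = bin.dist, bin2 = bin.dist + 1, bin3 = bin.dist * 2)
toy.out <- bins.to.cols(toy.bins, obj.name = "test")
```

Because all elements share the same sample IDs in the same order, one set of pair indexes can be reused for every bin.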
Version 2: 2020.8.18, add example
Version 1: 2015.8.30
Daliang Ning
# let's see a very simple example
bin.dist=as.matrix(dist(1:10))
rownames(bin.dist)<-colnames(bin.dist)<-paste0("Sample",1:10)
dist.bins=list(bin1=bin.dist,bin2=bin.dist+1,bin3=bin.dist*2)
dis.3c=dist.bin.3col(dist.bins,obj.name="test")
Calculate niche difference between species based on each environmental variable, directly output the matrix or save the result matrix as big.matrix.
dniche(env, comm, method = c("ab.overlap", "niche.value", "prefer.overlap"), nworker = 4, memory.G = 50, out.dist = FALSE, bigmemo = TRUE, nd.wd = getwd(), nd.spname.file="nd.names.csv", detail.file="ND.res.rda")
env |
matrix or data.frame, each row is a sample, each column is an environmental factor which may be important to represent the niche, thus rownames are sample IDs, and colnames are environmental factor names. |
comm |
matrix or data.frame, each row is a sample, each column is a species (OTU or ASV), thus rownames are sample IDs, colnames are species/OTU/ASV IDs. |
method |
methods to calculate niche difference. ab.overlap means to calculate from the overlap of observed abundances along an environment gradient. niche.value means to calculate the difference from the abundance-weighted mean of each environment factor for each species. prefer.overlap is similar to ab.overlap, but the observed abundances of each species are divided by the total abundance of the species before calculating the overlap. If multiple methods are listed as a vector, only the first element will be used. |
nworker |
integer, for parallel computing. Either a character vector of host names on which to run the worker copies of R, or a positive integer (in which case that number of copies is run on localhost). default is 4, means 4 threads will be run. |
memory.G |
numeric, to set the memory size as you need, so that calculation of large tree will not be limited by physical memory. unit is Gb. default is 50Gb. |
out.dist |
logic, if TRUE, the output niche difference matrix of each environment factor will be a distance object, otherwise will be a matrix in the output list. |
bigmemo |
logic, if TRUE, big.matrix in R package bigmemory will be used to save each niche difference matrix as a big matrix on hard disk. |
nd.wd |
folder path, when bigmemo is TRUE, where the big matrixes are saved. |
nd.spname.file |
character, name of the file saving taxa IDs, which should be in exactly the same order as in the row names (and column names) of the big niche difference matrix, if bigmemo is TRUE. it should be a .csv file. |
detail.file |
character, name of the file saving all output information in R data format. it should be a .rda file. |
The method niche.value calculates niche difference as the absolute difference of niche values between each pair of species. The niche value of a species is calculated as the abundance-weighted mean of each environmental factor, as previously reported (Stegen et al 2012 ISME J). In the method ab.overlap, the abundance of each species along the gradient of an environment factor is estimated by a density function using a Gaussian kernel with 512 points. Then, the niche difference between two species is calculated as the sum of the absolute abundance difference at each point divided by the sum of the higher abundance at each point, like Ruzicka dissimilarity (weighted Jaccard). It is like 1 - niche overlap based on abundance profile overlap, thus called ab.overlap. The method prefer.overlap is very similar to ab.overlap, with one modification: the observed abundance of each species in each sample is divided by the total abundance of the species across all samples, to normalize the profile, before calculating niche difference.
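The two main calculations described above can be sketched for a single environmental factor in base R. This is an illustration, not the package's code: the toy data and helper names are hypothetical, and the bandwidth and the evaluation range of the kernel density are assumptions, since they are not specified above.

```r
# Hypothetical toy data: 6 samples along one environmental factor, 3 species
env.x <- c(3.4, 3.8, 4.2, 4.6, 5.0, 5.4)
toy.comm <- cbind(sp1 = c(9, 6, 2, 0, 0, 0),
                  sp2 = c(0, 1, 4, 6, 3, 0),
                  sp3 = c(0, 0, 0, 2, 5, 8))

# niche.value: abundance-weighted mean of the factor for each species,
# then the absolute difference between each pair of species
niche.val <- colSums(toy.comm * env.x) / colSums(toy.comm)
nd.value <- as.matrix(dist(niche.val))

# ab.overlap: abundance profile of each species along the gradient,
# estimated with a Gaussian kernel density on 512 points
bw <- bw.nrd0(env.x)                    # bandwidth choice is an assumption
profile <- function(ab) {
  dens <- density(env.x, bw = bw, weights = ab / sum(ab), kernel = "gaussian",
                  n = 512, from = min(env.x) - 3 * bw, to = max(env.x) + 3 * bw)
  dens$y * sum(ab)                      # rescale to the observed abundance
}
prof <- apply(toy.comm, 2, profile)     # 512 points x species

# Ruzicka-like difference: sum of |a - b| divided by sum of max(a, b)
ruzicka <- function(a, b) sum(abs(a - b)) / sum(pmax(a, b))
sp <- colnames(toy.comm)
nd.overlap <- outer(seq_along(sp), seq_along(sp),
                    Vectorize(function(i, j) ruzicka(prof[, i], prof[, j])))
dimnames(nd.overlap) <- list(sp, sp)
```

Dropping the rescaling by sum(ab) in profile leaves each species with a profile normalized by its own total abundance, which is the idea behind prefer.overlap.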
Bigmemory (Kane et al 2013) is used to deal with large datasets.
The output is a list object, with several elements.
bigmemo |
logic, to show whether big.matrix is used. |
nd |
if bigmemo is FALSE, this is a list of matrixes or distance objects showing the niche difference matrix based on each environment factor. if bigmemo is TRUE, this is a list of big matrix file names. |
nd.wd |
only appear when bigmemo is TRUE, shows the folder path where the big matrixes are saved. |
names |
only appear when bigmemo is TRUE, shows species (OTU or ASV) IDs, in the same order as rownames and colnames in the niche difference matrixes. |
method |
The method used. |
Version 4: 2022.5.29, if nd.wd does not exist, create a folder as nd.wd.
Version 3: 2020.9.1, add nd.spname.file and detail.file; remove setwd; change dontrun to donttest and revise save.wd in help doc.
Version 2: 2020.8.18, add example.
Version 1: 2020.5.15
Daliang Ning
Stegen, J.C., Lin, X., Konopka, A.E. & Fredrickson, J.K. (2012). Stochastic and deterministic assembly processes in subsurface microbial communities. ISME J, 6, 1653-1664.
Ning, D., Yuan, M., Wu, L., Zhang, Y., Guo, X., Zhou, X. et al. (2020). A quantitative framework reveals ecological drivers of grassland microbial community assembly in response to warming. Nature Communications, 11, 4717.
Kane, M.J., Emerson, J., Weston, S. (2013). Scalable Strategies for Computing with Massive Data. Journal of Statistical Software, 55(14), 1-19. URL http://www.jstatsoft.org/v55/i14/.
data("example.data")
comm=example.data$comm
env=example.data$env
# if data is small, you do not need to use big.memory
niche.dif=dniche(env = env, comm = comm, method = "niche.value",
                 nworker = 1,out.dist=FALSE,bigmemo=FALSE,nd.wd = NULL)
# if data is large, you need to use big.memory
# since big.memory need to specify a certain folder,
# it is set as 'not test'.
# but you may test the code on your computer after change the path for 'save.wd'.
wd0=getwd()
save.wd=paste0(tempdir(),"/dnichewd")
# please change to the folder you want to save the big niche difference matrix.
nworker=2 # parallel computing thread number
niche.dif=dniche(env = env, comm = comm, method = "niche.value",
                 nworker = nworker, out.dist=FALSE,bigmemo=TRUE,nd.wd = save.wd)
setwd(wd0)
A small dataset including a community matrix, a phylogenetic tree, treatment information, and environmental factors, intended only for testing.
data("example.data")
The format is:
List of 4
 $ comm : int [1:20, 1:30] 1 3 0 0 2 2 2 0 0 4 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:20] "s1" "s2" "s3" "s4" ...
  .. ..$ : chr [1:30] "OTU1" "OTU2" "OTU3" "OTU4" ...
 $ tree :List of 4
  ..$ edge : int [1:58, 1:2] 31 32 33 34 35 36 36 35 34 37 ...
  ..$ edge.length: num [1:58] 0.314 0.422 0.315 0.881 0.774 ...
  ..$ Nnode : int 29
  ..$ tip.label : chr [1:30] "OTU1" "OTU2" "OTU3" "OTU4" ...
  ..- attr(*, "class")= chr "phylo"
  ..- attr(*, "order")= chr "cladewise"
 $ treat:'data.frame': 20 obs. of 2 variables:
  ..$ Management: chr [1:20] "SF" "BF" "SF" "SF" ...
  ..$ Location : chr [1:20] "south" "south" "south" "south" ...
 $ env :'data.frame': 20 obs. of 2 variables:
  ..$ pH : num [1:20] 3.4 3.6 3.8 4 4.2 4.4 4.6 4.8 5 5.2 ...
  ..$ temperature: num [1:20] 12.9 4 13.7 6 8 2.7 10.4 3.2 7.6 5.8 ...
 $ pd :'data.frame': 30 obs. of 30 variables:
  ..$ OTU1 : num [1:30] 0 0.606 1.348 3.015 3.331 ...
  ..$ OTU2 : num [1:30] 0.606 0 1.19 2.856 3.173 ...
  ..$ OTU3 : num [1:30] 1.35 1.19 0 2.05 2.37 ...
  ..$ OTU4 : num [1:30] 3.015 2.856 2.051 0 0.479 ...
  ..$ OTU5 : num [1:30] 3.331 3.173 2.367 0.479 0 ...
  ..$ OTU6 : num [1:30] 2.96 2.8 2 1.9 2.22 ...
  ..$ OTU7 : num [1:30] 3.62 3.46 2.65 2.56 2.87 ...
  ..$ OTU8 : num [1:30] 4.97 4.82 4.01 3.92 4.23 ...
  ..$ OTU9 : num [1:30] 4.77 4.61 3.81 3.71 4.03 ...
  ..$ OTU10: num [1:30] 3.96 3.81 3 2.9 3.22 ...
  ..$ OTU11: num [1:30] 3.4 3.24 2.43 2.34 2.65 ...
  ..$ OTU12: num [1:30] 3.69 3.53 2.73 2.63 2.95 ...
  ..$ OTU13: num [1:30] 5.64 5.48 4.68 4.58 4.9 ...
  ..$ OTU14: num [1:30] 6.1 5.94 5.14 5.04 5.36 ...
  ..$ OTU15: num [1:30] 4.73 4.57 3.77 3.67 3.99 ...
  ..$ OTU16: num [1:30] 5.77 5.61 4.8 4.71 5.03 ...
  ..$ OTU17: num [1:30] 5.97 5.81 5 4.91 5.22 ...
  ..$ OTU18: num [1:30] 5.27 5.11 4.3 4.21 4.52 ...
  ..$ OTU19: num [1:30] 7.56 7.4 6.6 6.5 6.82 ...
  ..$ OTU20: num [1:30] 7.55 7.39 6.58 6.49 6.8 ...
  ..$ OTU21: num [1:30] 6.49 6.33 5.53 5.43 5.75 ...
  ..$ OTU22: num [1:30] 6.52 6.37 5.56 5.46 5.78 ...
  ..$ OTU23: num [1:30] 6.68 6.52 5.72 5.62 5.94 ...
  ..$ OTU24: num [1:30] 6.35 6.19 5.38 5.29 5.61 ...
  ..$ OTU25: num [1:30] 6.37 6.21 5.41 5.31 5.63 ...
  ..$ OTU26: num [1:30] 5.73 5.57 4.77 4.67 4.99 ...
  ..$ OTU27: num [1:30] 6.23 6.07 5.27 5.17 5.49 ...
  ..$ OTU28: num [1:30] 5.99 5.83 5.02 4.93 5.24 ...
  ..$ OTU29: num [1:30] 5.7 5.54 4.74 4.64 4.96 ...
  ..$ OTU30: num [1:30] 3.94 3.78 2.97 2.88 3.19 ...
 $ pdid.bin:List of 3
  ..$ : int [1:5] 1 2 3 4 5
  ..$ : int [1:7] 6 7 8 9 10 11 12
  ..$ : int [1:18] 13 14 15 16 17 18 19 20 21 22 ...
 $ sp.bin :'data.frame': 30 obs. of 1 variable:
  ..$ bin.id.new: num [1:30] 1 1 1 1 1 2 2 2 2 2 ...
 $ classification: chr [1:30, 1:6] "Archaea" "Archaea" "Archaea" "Archaea" ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:30] "OTU1" "OTU2" "OTU3" "OTU4" ...
  .. ..$ : chr [1:6] "Domain" "Phylum" "Class" "Order" ...
comm is a matrix; each row is a sample and each column is a species.
tree is the phylogenetic tree, an object of class "phylo".
treat is a data.frame of treatment information; each row is a sample and each column indicates a type of treatment.
env is a data.frame of environmental factors, i.e. pH and temperature in this case.
pd is a data.frame holding the pairwise phylogenetic distances between species.
pdid.bin is a list; each element is an integer vector indicating which rows/columns in the big phylogenetic distance matrix represent the taxa in a bin.
sp.bin is a one-column data.frame; rownames are taxa IDs (i.e. OTU IDs) and the only column shows the bin ID of each taxon. Bin IDs are integers in the same order as the elements of the list pdid.bin.
classification is a character matrix defining the lineage of each taxon.
This dataset is randomly generated and is intended only for testing.
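The link between sp.bin and pdid.bin described above can be illustrated with a small made-up table. This is only a sketch in base R with hypothetical taxa IDs and bin assignments; it is not the code the package uses to build these objects.

```r
# Toy bin assignment for six hypothetical taxa (not the package data).
sp.bin <- data.frame(bin.id.new = c(1, 1, 2, 2, 2, 3),
                     row.names = paste0("OTU", 1:6))

# Row/column positions of each bin's taxa in a distance matrix whose
# rows and columns follow the same order as rownames(sp.bin).
pdid.bin <- unname(split(seq_len(nrow(sp.bin)), sp.bin$bin.id.new))

pdid.bin[[2]]                     # positions of the taxa in bin 2
rownames(sp.bin)[pdid.bin[[2]]]   # the corresponding taxa IDs
```

The list elements follow the integer order of the bin IDs, which matches the statement that bin IDs are in the same order as the elements of pdid.bin.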
data(example.data)
comm=example.data$comm
tree=example.data$tree
treat=example.data$treat
env=example.data$env
pd=example.data$pd
pdid.bin=example.data$pdid.bin
sp.bin=example.data$sp.bin
The main function of iCAMP, which performs phylogenetic-bin-based null model analysis and quantifies the relative importance of different processes.
icamp.big(comm, tree, pd.desc = NULL, pd.spname = NULL, pd.wd = getwd(),
          rand = 1000, prefix = "iCAMP", ds = 0.2, pd.cut = NA, sp.check = TRUE,
          phylo.rand.scale = c("within.bin", "across.all", "both"),
          taxa.rand.scale = c("across.all", "within.bin", "both"),
          phylo.metric = c("bMPD", "bMNTD", "both", "bNRI", "bNTI"),
          sig.index=c("Confidence","SES.RC","SES","RC"),
          bin.size.limit = 24, nworker = 4, memory.G = 50,
          rtree.save = FALSE, detail.save = TRUE, qp.save = TRUE,
          detail.null=FALSE, ignore.zero = TRUE, output.wd = getwd(),
          correct.special = TRUE, unit.sum = rowSums(comm),
          special.method = c("depend","MPD","MNTD","both"),
          ses.cut = 1.96, rc.cut = 0.95, conf.cut=0.975,
          omit.option = c("no", "test", "omit"), meta.ab = NULL,
          treepath.file="path.rda", pd.spname.file="pd.taxon.name.csv",
          pd.backingfile="pd.bin", pd.desc.file="pd.desc",
          taxo.metric="bray", transform.method=NULL, logbase=2,
          dirichlet=FALSE, d.cut.method=c("maxpd","maxdroot"))
comm |
matrix or data.frame, community data; each row is a sample or site and each column is a taxon (a species, OTU, or ASV), thus rownames should be sample IDs and colnames should be taxa IDs. |
tree |
phylogenetic tree, an object of class "phylo". |
pd.desc |
the name of the file that holds the backingfile description of the phylogenetic distance matrix; it is usually "pd.desc" if using the default setting in the pdist.big function. If it is NULL, the function pd.big will be used to calculate the phylogenetic distance matrix from tree and save it in pd.wd as a big.memory file. |
pd.spname |
character vector, taxa IDs in the same order as the rows/columns of the big matrix of phylogenetic distances. |
pd.wd |
folder path, where the bigmemory file of the phylogenetic distance matrix is saved. |
rand |
integer, randomization times. default is 1000. |
prefix |
character string, the prefix of those output files. |
ds |
numeric, the general threshold of phylogenetic distance within which the phylogenetic signal is still significant. default is 0.2. |
pd.cut |
numeric, the distance to the tree root where the phylogenetic tree is truncated to get strict phylogenetic bins. If pd.cut is set, the distance threshold (ds) is disabled. default is NA. |
sp.check |
logic, whether to match the taxa ids in community data, phylogenetic distance matrix, and tree. default is TRUE. |
phylo.rand.scale |
character, the scale to randomize the taxa for phylogenetic null model. "within.bin" means randomization within each bin; "across.all" means randomization across all bins; "both" means to test both methods. Default setting is within.bin. |
taxa.rand.scale |
character, the scale to randomize the taxa for taxonomic null model. "within.bin" means randomization within each bin; "across.all" means randomization across all bins; "both" means to test both methods. Default setting is across.all. |
phylo.metric |
character, the metric for phylogenetic null model analysis. bMPD (or bNRI), null model analysis based on beta mean pairwise distance (betaMPD); if sig.index is SES, it is beta net relatedness index (betaNRI). bMNTD (or bNTI), null model analysis based on beta mean nearest taxon distance (betaMNTD); if sig.index is SES, it is beta nearest taxon index (betaNTI). both, use null model test based on both bMPD and bMNTD. Default setting is based on bMPD. |
sig.index |
character, the index for null model significance test. Confidence, percentage of null values less extreme than the observed value, i.e. non-parametric one-sided confidence level; if sig.index is set as Confidence, it will be applied to both phylogenetic and taxonomic metrics. If set as SES.RC, use standard effect size (SES) for phylogenetic metrics (i.e. betaNTI or betaNRI), and use modified Raup-Crick (RC) for taxonomic metrics (RCbray). If set as SES, use SES for both phylogenetic and taxonomic metrics. If set as RC, use RC for both phylogenetic and taxonomic metrics. default is Confidence. If a vector is input, only the first element will be used. |
bin.size.limit |
integer, the minimal requirement of bin size (number of taxa in a bin). Default setting is 24. |
nworker |
integer, for parallel computing. Either a character vector of host names on which to run the worker copies of R, or a positive integer (in which case that number of copies is run on localhost). default is 4, means 4 threads will be run. |
memory.G |
numeric, to set the memory size as you need, so that calculation of large tree will not be limited by physical memory. unit is Gb. default is 50Gb. |
rtree.save |
logic, whether to save the rooted tree as nwk file, if the input tree is not rooted. Default is FALSE. |
detail.save |
logic, whether to save the details, i.e. some key objects for iCAMP analysis, as rda file. Default is TRUE. |
qp.save |
logic, whether to save the relative importance of processes as csv file. Default is TRUE. |
detail.null |
logic, if TRUE, the output will include all the null values. Default is FALSE, but this needs to be TRUE if you want to change the significance testing index later using 'change.sigindex'. |
ignore.zero |
logic, in the community data matrix (comm), whether to remove the row(s)/column(s) of which the sum is zero. Default is TRUE. |
output.wd |
a folder path, where the files will be saved when rtree.save, detail.save, or qp.save is true. |
correct.special |
logic, whether to correct the special cases when calculating bNRI or bNTI. Default is TRUE. |
unit.sum |
NULL or a number or a numeric vector. When a beta diversity index is calculated for a bin, the taxa abundances will be divided by unit.sum to calculate the relative abundances, and the Bray-Curtis index in each bin will become the Manhattan index divided by 2. The default is the row sums of the community matrix, which are usually the sequencing depth of each sample. If set as NULL, this special transformation is not applied. |
special.method |
character, the method used to check underestimation of deterministic pattern(s) in special cases when correct.special is TRUE. MPD, use null model test based on mean pairwise distance; MNTD, use null model test based on mean nearest taxon distance; depend, use MPD when phylo.metric is bMPD or bNRI, and use MNTD when phylo.metric is bMNTD or bNTI; both, use both MPD and MNTD. Default is depend. |
ses.cut |
numeric, the cutoff of significant standard effect size, default is 1.96. |
rc.cut |
numeric, the cutoff of significant modified Raup-Crick index value, default is 0.95. |
conf.cut |
numeric, the cutoff of significant one-sided confidence level, default is 0.975. |
omit.option |
three options about omitting small bins. "no" means to merge small bins to their nearest relatives to meet the bin size requirement, rather than omitting them; "test" means to output the information of small strict bins with a size lower than requirement, iCAMP will not be performed; "omit" means to do iCAMP analysis with strict bins which have enough species (larger than bin size requirement). |
meta.ab |
a numeric vector, to define the relative abundance of each species in the regional pool. Default setting is NULL, which means meta.ab is calculated as the average relative abundance of each species across the samples. |
treepath.file |
character, name of the file saving the tree.path, which is a list of all the nodes and edge lengths from root to every tip and/or node. It should be a .rda filename. |
pd.spname.file |
character, name of the file saving the taxa IDs, which has exactly the same order as the row names (and column names) of the big phylogenetic distance matrix. it should be a .csv filename. |
pd.backingfile |
character, the root name for the file for the cache of the big phylogenetic distance matrix. it should be a .bin filename. |
pd.desc.file |
character, name of the file to hold the backingfile description for the big phylogenetic distance matrix. it should be a .desc filename. |
taxo.metric |
taxonomic beta diversity index, the same as 'method' in the function 'vegdist' in package 'vegan', including "manhattan", "euclidean", "canberra", "clark", "bray", "kulczynski", "jaccard", "gower", "altGower", "morisita", "horn", "mountford", "raup", "binomial", "chao", "cao" or "mahalanobis". If taxo.metric='bray' and transform.method=NULL, RC will be calculated based on Bray-Curtis dissimilarity as recommended in original iCAMP; otherwise, unit.sum setting will be ignored. |
transform.method |
character or a defined function, to specify how to transform the community matrix before calculating dissimilarity. If it is a character, it should be a method name as in the function 'decostand' in package 'vegan', including 'total','max','freq','normalize','range','standardize','pa','chi.square','cmdscale','hellinger','log'. |
logbase |
numeric, the logarithm base used when transform.method='log'. |
dirichlet |
Logic. If TRUE, the taxonomic null model will use Dirichlet distribution to generate relative abundances in randomized community matrix. If the input community matrix has all row sums no more than 1, the function will automatically set dirichlet=TRUE. default is FALSE. |
d.cut.method |
character, to specify the method to calculate pd.cut from ds. 'maxpd' means based on maximum phylogenetic distance, pd.cut = (maxpd - ds)/2. 'maxdroot' means based on maximum distance to root, pd.cut = maxdroot - (ds/2), which is preferred if the tree only has one edge from the root. |
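The two d.cut.method formulas can be compared with a few lines of arithmetic. The helper below is hypothetical (it is not an iCAMP function) and the tree depths are made-up numbers; it only restates the formulas given for 'maxpd' and 'maxdroot'.

```r
# Hypothetical helper (not part of iCAMP): pd.cut implied by ds.
pd.cut.from.ds <- function(ds, maxpd, maxdroot,
                           d.cut.method = c("maxpd", "maxdroot")) {
  d.cut.method <- match.arg(d.cut.method)
  if (d.cut.method == "maxpd") (maxpd - ds) / 2 else maxdroot - ds / 2
}

# Deepest tips are 1.0 from the root and are joined through the root,
# so the maximum pairwise distance is 2.0 and both methods agree.
pd.cut.from.ds(0.2, maxpd = 2.0, maxdroot = 1.0, "maxpd")     # 0.9
pd.cut.from.ds(0.2, maxpd = 2.0, maxdroot = 1.0, "maxdroot")  # 0.9

# Same tree with one extra edge of length 0.5 leading from the root:
# maxdroot grows to 1.5 while maxpd stays 2.0.
pd.cut.from.ds(0.2, maxpd = 2.0, maxdroot = 1.5, "maxpd")     # 0.9
pd.cut.from.ds(0.2, maxpd = 2.0, maxdroot = 1.5, "maxdroot")  # 1.4
```

In the second case only 'maxdroot' keeps the cut ds/2 below the deepest tips, which is consistent with the note that it is preferred when the tree has a single edge from the root.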
This is the main function of iCAMP (Ning et al 2020). Most parameters can use the default settings.
To quantify various ecological processes, the observed taxa are first divided into different groups ('bins') based on their phylogenetic relationships. Then, the process governing each bin is identified by null model analysis of phylogenetic diversity, using the beta Net Relatedness Index (betaNRI), and of taxonomic beta-diversity, using the modified Raup-Crick metric (RC); this corresponds to a typical setting of sig.index as SES.RC. For each bin, the fraction of pairwise comparisons with betaNRI < -1.96 is taken as the percentage of homogeneous selection, and the fraction with betaNRI > +1.96 as the percentage of heterogeneous selection, based on the threshold applied previously (Stegen et al 2015; Zhou and Ning 2017). Next, the taxonomic diversity metric RC is used to partition the remaining pairwise comparisons, i.e. those with abs(betaNRI) <= 1.96. The fraction of pairwise comparisons with RC < -0.95 is treated as the percentage of homogenizing dispersal, and the fraction with RC > 0.95 as dispersal limitation (Stegen et al 2013). The remainder, with abs(betaNRI) <= 1.96 and abs(RC) <= 0.95, represents the percentage of drift, diversification, weak selection and/or weak dispersal (Zhou and Ning 2017), simply designated as 'drift' (Stegen et al 2013) for convenience. The above analysis is repeated for every bin. Subsequently, the fractions of individual processes across all bins are weighted by the relative abundance of each bin and summed to estimate the relative importance of individual processes at the whole community level. Besides betaNRI and RC, null model significance can also be inferred by a direct test based on the null model distribution, which should be the preferred choice when the null model simulated values do not follow a normal distribution (Veech 2012). See the references for details.
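The decision rule and the abundance weighting described above can be sketched in base R for a single pair of samples. All numbers are made up, the helper assign.process is hypothetical (not an iCAMP function), and the process labels (HoS, HeS, HD, DL, DR) are shorthand assumed here for homogeneous selection, heterogeneous selection, homogenizing dispersal, dispersal limitation, and drift.

```r
# Toy values for one pair of samples and three bins (made-up numbers).
bNRI <- c(bin1 = -2.50, bin2 = 0.40, bin3 = 1.10)    # phylogenetic index
RC   <- c(bin1 =  0.10, bin2 = 0.97, bin3 = -0.20)   # taxonomic index
bin.weight <- c(bin1 = 0.5, bin2 = 0.3, bin3 = 0.2)  # bin relative abundance

# Hypothetical helper: selection is checked first with betaNRI,
# then the remaining comparisons are partitioned with RC.
assign.process <- function(bNRI, RC, ses.cut = 1.96, rc.cut = 0.95) {
  ifelse(bNRI < -ses.cut, "HoS",
  ifelse(bNRI >  ses.cut, "HeS",
  ifelse(RC   < -rc.cut,  "HD",
  ifelse(RC   >  rc.cut,  "DL", "DR"))))
}

proc <- assign.process(bNRI, RC)
proc

# Community-level importance for this turnover:
# sum of the bin weights under each process.
importance <- tapply(bin.weight, proc, sum)
importance
```

Here bin1 is assigned to homogeneous selection, bin2 to dispersal limitation, and bin3 to drift, so the three processes receive weights of 0.5, 0.3, and 0.2 for this turnover.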
Bigmemory (Kane et al 2013) is used to deal with large datasets.
If omit.option is test, the output will be a table summarizing the information of small bins.
Otherwise, the output is a list object, including one or more elements as below:
The first one or several (if 'both' is set for metrics and/or randomization scale) elements are matrixes of process importance at community level. In each matrix, the first two columns are the sample IDs of each turnover, and the third to last columns show the estimated relative importance of each process in shaping each turnover between communities (samples). The name(s) of the element(s) show the metrics and their randomization scale, e.g. bNRIiRCa means phylogenetic null model analysis using betaNRI (i.e. SES based on betaMPD) with randomization within each bin and taxonomic null model analysis using RC based on Bray-Curtis with randomization across bins. Other possible phylogenetic null-model-based metrics: bNTI, betaNTI (i.e. SES based on betaMNTD); RCbMPD, RC based on betaMPD; RCbMNTD, RC based on betaMNTD; CbMPD, confidence level based on betaMPD; CbMNTD, confidence level based on betaMNTD. Other possible taxonomic null-model-based metrics: SESbray, SES based on Bray-Curtis; CBray, confidence level based on Bray-Curtis. i, within-bin randomization; a, across-bin randomization.
detail |
an element in output only if detail.save is TRUE. A list with elements as below. |
taxabin |
an element in 'detail'. A list showing the phylogenetic binning results. The first element is a matrix named sp.bin, where each row is a taxon (OTU or ASV), the first column is the original strict bin ID, the second column is the original bin ID after small bins are merged into their nearest relative(s), and the third column is the final renewed bin ID. The second element, named bin.united.sp, is a list where each element shows the taxa IDs within each bin and the bins are in the order of the final renewed bin IDs. The third element, named bin.strict.sp, is a list where each element shows the taxa IDs within each strict bin and the bins are in the order of the original strict bin IDs. The fourth element, named state.strict, is a matrix where the 1st column is the original strict bin IDs, the 2nd column is the taxa number in each strict bin, and the 3rd to 5th columns show the maximum, mean, and standard deviation of phylogenetic distances within each strict bin. The fifth element, named state.united, is a matrix where the row numbering is the final bin ID, the 1st column is the original bin IDs, the 2nd column is the taxa number in each final bin, and the 3rd to 5th columns show the maximum, mean, and standard deviation of phylogenetic distances within each final bin. |
SigbMPDi, SigbMPDa, SigbMNTDi, SigbMNTDa, SigBCi, SigBCa |
elements in 'detail', matrixes showing the null model significance testing index for each turnover of each bin. In the name of the element(s), SigbMPD, SigbMNTD, or SigBC means the significance testing is based on betaMPD, betaMNTD, or taxonomic dissimilarity (default is Bray-Curtis); i, within-bin randomization; a, across-bin randomization. In each matrix, the first two columns are sample IDs for each turnover; the 3rd to the last columns represent different bins, with column names containing the significance testing index name, which can be bNRI, bNTI, RCbMPD, RCbMNTD, CbMPD, CbMNTD, SESbray, RCbray, or CBray as mentioned above. |
bin.weight |
an element in 'detail', a matrix showing relative abundance of each bin in each pair of samples. |
processes |
an element in 'detail', a list of process importance results at community level. |
setting |
an element in 'detail', a data.frame showing all basic settings of this function. |
comm |
an element in 'detail', the input community matrix. |
rand |
an element in output only if detail.null is TRUE. It is a list with each element showing the observed or null values of a beta diversity index (e.g. betaMPD, betaMNTD, Bray-Curtis). Each index is shown as a list where each element represents a bin. |
special.crct |
an element in output only if detail.null is TRUE. It shows the corrected values for special cases, where zero means no correction is needed. |
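As a rough illustration of the community-level matrixes described above (two sample ID columns followed by one column per process), the sketch below summarizes a made-up stand-in in base R. The column names and all values are assumptions for illustration, not output of icamp.big.

```r
# Made-up stand-in for one community-level element (e.g. bNRIiRCa);
# column names are assumed for illustration.
res <- data.frame(
  sample1 = c("s1", "s1", "s2"),
  sample2 = c("s2", "s3", "s3"),
  HeS = c(0.00, 0.10, 0.00),
  HoS = c(0.50, 0.30, 0.40),
  DL  = c(0.20, 0.10, 0.30),
  HD  = c(0.00, 0.00, 0.10),
  DR  = c(0.30, 0.50, 0.20)
)

# Process fractions of each turnover should add up to 1.
turnover.total <- rowSums(res[, -(1:2)])
turnover.total

# Average relative importance of each process across turnovers.
process.mean <- colMeans(res[, -(1:2)])
process.mean
```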
Version 12: 2021.6.4, debug, fix 'arguments imply differing number of rows' issue.
Version 11: 2021.6.4, add option d.cut.method to handle trees with only one edge from root.
Version 10: 2021.4.17, add taxo.metric, transform.method, logbase, and dirichlet, to allow community data transformation, dissimilarity indexes other than Bray-Curtis, and relative abundances (values < 1) in the input community matrix.
Version 9: 2021.4.1, revise 'sp.bin==i' to 'sp.bin==bin.lev[i]' to correct an error when omit.option='omit' and strict bin IDs are used. Thanks to adityabandla for finding this bug; see https://github.com/DaliangNing/iCAMP1/issues/9 for details.
Version 8: 2020.10.15, input comm as data.frame may return an error; now include as.matrix to solve it.
Version 7: 2020.9.21, fix minor bug when output.wd is NULL.
Version 6: 2020.9.1, remove setwd; add options to specify some file names; change dontrun to donttest and revise folder path in help doc.
Version 5: 2020.8.19, update help document, add example.
Version 4: 2020.5.31.
Version 3: 2019.9.30.
Daliang Ning
Ning, D., Yuan, M., Wu, L., Zhang, Y., Guo, X., Zhou, X. et al. (2020). A quantitative framework reveals ecological drivers of grassland microbial community assembly in response to warming. Nature Communications, 11, 4717.
Stegen, J.C., Lin, X., Fredrickson, J.K. & Konopka, A.E. (2015). Estimating and mapping ecological processes influencing microbial community assembly. Front Microbiol, 6, 370.
Stegen, J.C., Lin, X., Fredrickson, J.K., Chen, X., Kennedy, D.W., Murray, C.J. et al. (2013). Quantifying community assembly processes and identifying features that impose them. ISME J, 7, 2069.
Zhou, J. & Ning, D. (2017). Stochastic community assembly: Does it matter in microbial ecology? Microbiology and Molecular Biology Reviews, 81.
Veech, J.A. (2012). Significance testing in ecological null models. Theor Ecol, 5, 611-616.
Kane, M.J., Emerson, J., Weston, S. (2013). Scalable Strategies for Computing with Massive Data. Journal of Statistical Software, 55(14), 1-19. URL http://www.jstatsoft.org/v55/i14/.
data("example.data")
comm=example.data$comm
tree=example.data$tree
# since need to save some output to a certain folder,
# the following code is set as 'not test'.
# but you may test the code on your computer
# after change the folder path for 'pd.wd'.
wd0=getwd()
# please change to the folder you want to save the pd.big output.
pd.wd=paste0(tempdir(),"/pdbig.icampbig")
nworker=2 # parallel computing thread number
rand.time=20 # usually use 1000 for real data.
bin.size.limit=5 # for real data, usually use a proper number
# according to phylogenetic signal test or try some settings
# then choose the reasonable stochasticity level.
# our experience is 12, or 24, or 48.
# but for this example dataset which is too small, have to use 5.
icamp.out=icamp.big(comm=comm,tree=tree,pd.wd=pd.wd,
                    rand=rand.time, nworker=nworker,
                    bin.size.limit=bin.size.limit)
setwd(wd0)
This function calculates various statistical indexes to assess the relative importance of each process in each bin and each turnover, and each bin's contribution to each process.
icamp.bins(icamp.detail, treat = NULL, clas = NULL, silent = FALSE, boot = FALSE, rand.time = 1000, between.group = FALSE)
icamp.detail |
list object, the output or the "detail" element of the output from icamp.big. |
treat |
matrix or data.frame, indicating the group or treatment of each sample, rownames are sample IDs. Allow to input multi-column matrix, different columns represent different ways to group the samples. |
clas |
matrix or data.frame, the classification information of species (OTUs). |
silent |
Logic, whether to suppress messages. Default is FALSE, thus all messages will be shown. |
boot |
Logic, whether to do bootstrapping test to get significance of dominating process in each bin. |
rand.time |
integer, bootstrapping times. |
between.group |
Logic, whether to analyze between-treatment turnovers. |
Bin level analysis can provide insights into community assembly mechanisms. This function provides more detailed statistics with the output of the main function icamp.big.
Output is a list object.
Wtuvk |
The dominant process in each turnover of each bin. |
Ptuv |
Relative importance of each process in governing the turnovers between each pair of communities (samples). |
Ptk |
Relative importance of each process in governing the turnovers of each bin among a group of samples. |
Pt |
Relative importance of each process in governing the turnovers in a group of samples. |
BPtk |
Bin contribution to each process, measuring the contribution of each bin to the relative importance of each process in the assembly of a group of communities. |
BRPtk |
Bin relative contribution to each process, measuring the relative contribution of each bin to a certain process. |
Binwt |
Output if treat is given. Bin relative abundance in each group (treatment) of samples. |
Bin.TopClass |
Output if clas is given. A matrix showing the bin relative abundance; the top taxon ID, percentage in bin, and classification; the most abundant name at each phylogeny level in the bin. |
Class.Bin |
Output if clas is given. A matrix showing the bin ID and classification information for each taxon. |
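The relation between the bin contribution (BPtk) and the bin relative contribution (BRPtk) can be sketched with toy numbers. The definition used here is an assumption for illustration: the contribution of a bin to a process is taken as the bin's abundance weight averaged over turnovers in which that process dominates the bin, and the relative contribution rescales it by the process total. The data and process labels are made up, and this is not the code of icamp.bins.

```r
# Made-up data: 3 turnovers (rows) x 2 bins (columns).
dominant <- matrix(c("HoS", "DR", "HoS",
                     "DR",  "DR", "DL"), nrow = 3,
                   dimnames = list(c("s1_s2", "s1_s3", "s2_s3"),
                                   c("bin1", "bin2")))
weight <- matrix(c(0.6, 0.5, 0.7,
                   0.4, 0.5, 0.3), nrow = 3,
                 dimnames = dimnames(dominant))

processes <- c("HoS", "DL", "DR")

# Assumed bin contribution: mean over turnovers of the bin weight
# counted only where the process dominates the bin.
BP <- sapply(processes, function(p) colMeans(weight * (dominant == p)))

# Process importance in the group is the sum of bin contributions.
Pt <- colSums(BP)

# Relative contribution: share of each bin within a process.
BRP <- sweep(BP, 2, Pt, "/")

round(BP, 3)
round(BRP, 3)
```

Under this reading, the bin contributions of a process add up to its relative importance in the group, and the relative contributions of all bins to one process add up to 1.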
Version 3: 2021.1.5, fix the error when a taxonomy name has unrecognizable character. Version 2: 2020.8.19, update help document, add example. Version 1: 2019.12.11
Daliang Ning
Ning, D., Yuan, M., Wu, L., Zhang, Y., Guo, X., Zhou, X. et al. (2020). A quantitative framework reveals ecological drivers of grassland microbial community assembly in response to warming. Nature Communications, 11, 4717.
data("icamp.out")
data("example.data")
treatment=example.data$treat
classification=example.data$classification
rand.time=20 # usually use 1000 for real data.
icampbin=icamp.bins(icamp.detail = icamp.out, treat = treatment,
                    clas = classification, boot = TRUE,
                    rand.time = rand.time, between.group = TRUE)
Use bootstrapping to estimate the variation of relative importance of each process in each group, and compare the difference between groups.
icamp.boot(icamp.result, treat, rand.time = 1000, compare = TRUE, silent = FALSE, between.group = FALSE, ST.estimation = FALSE)
icamp.result |
data.frame object, from the output of icamp.big. |
treat |
matrix or data.frame, a one-column (n x 1) matrix indicating the group or treatment of each sample, rownames are sample IDs. if input a n x m matrix, only the first column is used. |
rand.time |
integer, bootstrapping times. default is 1000. |
compare |
logic, whether to compare icamp results between different groups. |
silent |
logic, if FALSE, some messages will show during calculation. |
between.group |
logic, whether to analyze between-treatment turnovers. |
ST.estimation |
logic, whether to estimate stochasticity as the total relative importance of dispersal and drift. |
Bootstrapping is implemented by randomly drawing samples with replacement, to estimate the variation of the relative importance of each process in each group and to calculate the relative difference, effect size, and significance of the difference between each two groups.
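The resampling idea can be sketched in base R on made-up numbers. This is a simplified illustration, not the procedure of icamp.boot: it resamples plain values rather than samples with their pairwise turnovers, the helper boot.mean is hypothetical, and the pooled-standard-deviation form of Cohen's d is one common definition assumed here.

```r
set.seed(1)  # reproducible toy example

# Made-up relative importance of one process in two groups.
groupA <- c(0.42, 0.55, 0.38, 0.61, 0.47, 0.50)
groupB <- c(0.20, 0.31, 0.26, 0.35, 0.18, 0.29)

# Hypothetical helper: draw with replacement, record the mean each time.
boot.mean <- function(x, rand.time = 1000) {
  replicate(rand.time, mean(sample(x, length(x), replace = TRUE)))
}
bootA <- boot.mean(groupA)
bootB <- boot.mean(groupB)

# Cohen's d with a pooled standard deviation (assumed definition).
cohen.d <- (mean(bootA) - mean(bootB)) /
  sqrt((var(bootA) + var(bootB)) / 2)

# One-sided bootstrap P value: share of draws contradicting A > B.
p.value <- mean(bootA <= bootB)

summary(bootA)
c(cohen.d = cohen.d, p.value = p.value)
```

For real data the number of draws would normally be 1000, as in the default of rand.time.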
Output is a list with three elements.
summary |
data.frame, summary of each group. Group: group name from the input "treat". Process: process name from the icamp.result. Observed: the mean relative importance of each process in each group. Mean, Stdev, Min, Quartile25, Median, Quartile75, and Max: mean, standard deviation, minimum, 25 percent-quantile, median, 75 percent-quantile, and maximum of bootstrapping results, respectively. Lower.whisker, Lower.hinge, Mediean.1, Higher.hinge, Higher.whisker, Outerlier1...: boxplot elements. |
compare |
data.frame, summary of comparison between each two groups. First two columns are group names. From the third column, different indexes for comparison are shown, including Cohen's d (Cohen.d), effect size magnitude according to Cohen's d (Effect.Size), and P value from bootstrapping test (P.value). |
boot.detail |
a list of matrixes, each matrix corresponds to a group, showing detailed bootstrapping results in each random draw. |
Version 4: 2021.7.1, fix a bug leading to zero cohen's d. Version 3: 2021.1.5, fix error when there is no outlier. Version 2: 2020.8.19, update help document, add example. Version 1: 2019.11.14
Daliang Ning
Ning, D., Yuan, M., Wu, L., Zhang, Y., Guo, X., Zhou, X. et al. (2020). A quantitative framework reveals ecological drivers of grassland microbial community assembly in response to warming. Nature Communications, 11, 4717.
data("icamp.out")
data("example.data")
treatment=example.data$treat
rand.time=20 # usually use 1000 for real data.
icampbt=icamp.boot(icamp.result = icamp.out$bNRIiRCa, treat = treatment,
                   rand.time = rand.time, compare = TRUE,
                   between.group = TRUE, ST.estimation = TRUE)
This function calculates various statistical indexes to assess the relative importance of each process for different categories of taxa. The categories can be defined in various ways, for example core, consistently rare, and occasionally rare taxa; different phyla; or particular functional groups.
icamp.cate(icamp.bins.result, comm, cate, treat = NULL, silent = FALSE, between.group = FALSE)
icamp.bins.result |
list object, the output from icamp.bins. |
comm |
matrix or data.frame, community data; each row is a sample or site and each column is a taxon (a species, OTU, or ASV), thus rownames should be sample IDs and colnames should be taxa IDs. |
cate |
matrix or data.frame, indicating the category of each taxon, rownames are taxa IDs. If the matrix has multiple columns, only the first column will be used. |
treat |
matrix or data.frame, indicating the group or treatment of each sample, rownames are sample IDs. Allow to input multi-column matrix, different columns represent different ways to group the samples. |
silent |
logic, if FALSE, some messages will show during calculation. |
between.group |
logic, whether to analyze between-treatment turnovers. |
This function simply sums up the relative abundance of taxa of a category in different bins governed by a process to summarize the relative importance of the process on the category.
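The summation described above can be sketched for a single turnover with made-up data. The category labels, process labels, and the rescaling of each category to a total of 1 are assumptions for illustration; this is not the code of icamp.cate.

```r
# Made-up data for one turnover: six taxa in three bins.
taxa <- data.frame(
  bin    = c(1, 1, 2, 2, 3, 3),
  cate   = c("core", "rare", "core", "rare", "rare", "rare"),
  rel.ab = c(0.40, 0.05, 0.30, 0.05, 0.12, 0.08),
  row.names = paste0("OTU", 1:6)
)

# Assumed dominant process of each bin in this turnover.
bin.process <- c("1" = "HoS", "2" = "DR", "3" = "DL")
taxa$process <- unname(bin.process[as.character(taxa$bin)])

# Sum the abundance of each category's taxa under each process ...
ab <- xtabs(rel.ab ~ cate + process, data = taxa)

# ... and rescale so that the processes add up to 1 per category.
importance <- prop.table(ab, margin = 1)
round(importance, 3)
```

In this toy case the core taxa are split between the processes of bins 1 and 2 only, while most of the rare taxa's abundance falls in bin 3.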
Output is a list object.
Ptuvx |
Relative importance of each process in governing each category's turnover between each pair of communities (samples). |
Ptx |
Relative importance of each process in governing each category's turnovers among a group of samples. |
Version 3: 2021.5.24, set NA if a cate has no taxon in a turnover; solve the problem that group means of different processes do not add up to 1. Version 2: 2021.1.7, add help document; fixed NAN error. Version 1: 2020.12.9.
Daliang Ning
Ning, D., Yuan, M., Wu, L., Zhang, Y., Guo, X., Zhou, X. et al. (2020). A quantitative framework reveals ecological drivers of grassland microbial community assembly in response to warming. Nature Communications, 11, 4717.
data("icamp.out")
data("example.data")
comm=example.data$comm
treatment=example.data$treat
classification=example.data$classification
rand.time=20 # usually use 1000 for real data.
# 1 # summarize each bin
icampbin=icamp.bins(icamp.detail = icamp.out, treat = treatment,
                    clas = classification, boot = TRUE,
                    rand.time = rand.time, between.group = TRUE)
# 2 # define category
cate=data.frame(type=rep("others",ncol(comm)),stringsAsFactors = FALSE)
rownames(cate)=colnames(comm)
tax.frequency=colSums(comm>0)/nrow(comm)
tax.relative.ab=colMeans(comm/rowSums(comm))
cate[which(tax.frequency>0.75 & tax.relative.ab>0.05),1]="core"
cate[which(tax.relative.ab<0.02),1]="rare"
# 3 # summarize each category
icampcate=icamp.cate(icamp.bins.result = icampbin, comm = comm, cate = cate,
                     treat = treatment, silent = FALSE, between.group = TRUE)
Perform phylogenetic-bin-based null model analysis and quantify the relative importance of different processes. This function can deal with local communities under different metacommunities (regional pools).
icamp.cm(comm, tree, meta.group = NULL, meta.com = NULL,
         meta.frequency = NULL, meta.ab = NULL,
         pd.desc = NULL, pd.spname = NULL, pd.wd = getwd(),
         rand = 1000, prefix = "iCAMP", ds = 0.2, pd.cut = NA,
         phylo.rand.scale = c("within.bin", "across.all", "both"),
         taxa.rand.scale = c("across.all", "within.bin", "both"),
         phylo.metric = c("bMPD", "bMNTD", "both", "bNRI", "bNTI"),
         sig.index = c("Confidence", "SES.RC", "SES", "RC"),
         bin.size.limit = 24, nworker = 4, memory.G = 50,
         rtree.save = FALSE, detail.save = TRUE, qp.save = TRUE,
         detail.null = FALSE, ignore.zero = TRUE, output.wd = getwd(),
         correct.special = TRUE, unit.sum = rowSums(comm),
         special.method = c("depend", "MPD", "MNTD", "both"),
         ses.cut = 1.96, rc.cut = 0.95, conf.cut = 0.975,
         omit.option = c("no", "test", "omit"),
         treepath.file = "path.rda", pd.spname.file = "pd.taxon.name.csv",
         pd.backingfile = "pd.bin", pd.desc.file = "pd.desc",
         taxo.metric = "bray", transform.method = NULL, logbase = 2,
         dirichlet = FALSE, d.cut.method = c("maxpd", "maxdroot"))
comm |
matrix or data.frame, community data. Each row is a sample or site and each column is a taxon (a species, OTU, or ASV), so rownames should be sample IDs and colnames should be taxa IDs. |
tree |
phylogenetic tree, an object of class "phylo". |
meta.group |
matrix or data.frame, a one-column (n x 1) matrix indicating which metacommunity each sample belongs to, so that different samples can belong to different metacommunities. Rownames are sample IDs. The first column is metacommunity names. If an n x m matrix is given, only the first column is used. Default is NULL, which means all samples belong to the same metacommunity. |
meta.com |
a list object, each element is a matrix or data.frame to define abundance (or relative abundance) of taxa in a metacommunity (regional pool). The element names indicate metacommunity names, which should be consistent with the metacommunity names defined in meta.group. If there is only one metacommunity, meta.com can be a matrix or data.frame to define taxa abundance (or relative abundance) in the metacommunity. Default is NULL, which means the metacommunity structure is calculated from comm according to the metacommunities defined in meta.group. |
meta.frequency |
matrix or data.frame, each column represents a taxon, each row represents a metacommunity (regional pool), to define the occurrence frequency of each taxon in each metacommunity. The rownames indicate metacommunity names, which should be the same as the metacommunity names in meta.group. Default is NULL, which means meta.frequency is calculated as the occurrence frequency of each taxon in comm across the samples within each metacommunity defined by meta.group. |
meta.ab |
matrix or data.frame, each column represents a taxon, each row represents a metacommunity (regional pool), to define the abundance (or relative abundance) of each taxon in each metacommunity. The rownames indicate metacommunity names, which should be the same as the metacommunity names in meta.group. Default is NULL, which means meta.ab is calculated as the average relative abundance of each taxon in comm across the samples within each metacommunity defined by meta.group. |
pd.desc |
the name of the file to hold the backingfile description of the phylogenetic distance matrix. It is usually "pd.desc" if using the default setting in the pdist.big function. If it is NULL, the function pd.big will be used to calculate the phylogenetic distance matrix from tree, and save it in pd.wd as a bigmemory file. |
pd.spname |
character vector, taxa IDs in the same order as the rows and columns of the big matrix of phylogenetic distances. |
pd.wd |
folder path, where the bigmemory file of the phylogenetic distance matrix is saved. |
rand |
integer, randomization times. default is 1000. |
prefix |
character string, the prefix of those output files. |
ds |
numeric, the general threshold of phylogenetic distance within which the phylogenetic signal is still significant. default is 0.2. |
pd.cut |
numeric, the distance to the tree root where the phylogenetic tree is truncated to get strict phylogenetic bins. If pd.cut is set, the distance threshold (ds) is disabled. Default is NA. |
phylo.rand.scale |
character, the scale to randomize the taxa for phylogenetic null model. "within.bin" means randomization within each bin; "across.all" means randomization across all bins; "both" means to test both methods. Default setting is within.bin. |
taxa.rand.scale |
character, the scale to randomize the taxa for taxonomic null model. "within.bin" means randomization within each bin; "across.all" means randomization across all bins; "both" means to test both methods. Default setting is across.all. |
phylo.metric |
character, the metric for phylogenetic null model analysis. bMPD (or bNRI), null model analysis based on beta mean pairwise distance (betaMPD); if sig.index is SES, it is beta net relatedness index (betaNRI). bMNTD (or bNTI), null model analysis based on beta mean nearest taxon distance (betaMNTD); if sig.index is SES, it is beta nearest taxon index (betaNTI). both, use null model test based on both bMPD and bMNTD. Default setting is based on bMPD. |
sig.index |
character, the index for null model significance test. Confidence, percentage of null values less extreme than the observed value, i.e. non-parametric one-sided confidence level; if sig.index is set as Confidence, it will be applied to both phylogenetic and taxonomic metrics. If set as SES.RC, use standardized effect size (SES) for phylogenetic metrics (i.e. betaNTI or betaNRI), and use modified Raup-Crick (RC) for taxonomic metrics (RCbray). If set as SES, use SES for both phylogenetic and taxonomic metrics. If set as RC, use RC for both phylogenetic and taxonomic metrics. Default is Confidence. If a vector is given, only the first element will be used. |
bin.size.limit |
integer, the minimal requirement of bin size (taxa number in a bin). Default setting is 24. |
nworker |
integer, for parallel computing. Either a character vector of host names on which to run the worker copies of R, or a positive integer (in which case that number of copies is run on localhost). default is 4, means 4 threads will be run. |
memory.G |
numeric, to set the memory size as you need, so that calculation of large tree will not be limited by physical memory. unit is Gb. default is 50Gb. |
rtree.save |
logic, whether to save the rooted tree as nwk file, if the input tree is not rooted. Default is FALSE. |
detail.save |
logic, whether to save the details, i.e. some key objects for iCAMP analysis, as rda file. Default is TRUE. |
qp.save |
logic, whether to save the relative importance of processes as csv file. Default is TRUE. |
detail.null |
logic, if TRUE, the output will include all the null values. Default is FALSE, but this needs to be TRUE if you want to change the significance testing index later using 'change.sigindex'. |
ignore.zero |
logic, in the community data matrix (comm), whether to remove the row(s)/column(s) of which the sum is zero. Default is TRUE. |
output.wd |
a folder path, where the files will be saved when rtree.save, detail.save, or qp.save is true. |
correct.special |
logic, whether to correct the special cases when calculating bNRI or bNTI. Default is TRUE. |
unit.sum |
NULL or a number or a numeric vector. When a beta diversity index is calculated for a bin, the taxa abundances will be divided by unit.sum to calculate the relative abundances, and the Bray-Curtis index in each bin will become the Manhattan index divided by 2. The default is the row sums of the community matrix, which are usually the sequencing depth of each sample. If set as NULL, this special transformation is not applied. |
special.method |
When correct.special is TRUE, which method will be used to check underestimation of deterministic pattern(s) in special cases. MPD, use null model test based on mean pairwise distance; MNTD, use null model test based on mean nearest taxon distance; depend, use MPD when phylo.metric is bMPD or bNRI, and use MNTD when phylo.metric is bMNTD or bNTI; both, use both MPD and MNTD. Default is depend |
ses.cut |
numeric, the cutoff of significant standard effect size, default is 1.96. |
rc.cut |
numeric, the cutoff of significant modified Raup-Crick index value, default is 0.95. |
conf.cut |
numeric, the cutoff of significant one-side confidence level, default is 0.975. |
omit.option |
three options about omitting small bins. "no" means to merge small bins to their nearest relatives to meet the bin size requirement, rather than omitting them; "test" means to output the information of small strict bins with a size lower than requirement, iCAMP will not be performed; "omit" means to do iCAMP analysis with strict bins which have enough species (larger than bin size requirement). |
treepath.file |
character, name of the file saving the tree.path, which is a list of all the nodes and edge lengths from root to every tip and/or node. It should be a .rda filename. |
pd.spname.file |
character, name of the file saving the taxa IDs, which has exactly the same order as the row names (and column names) of the big phylogenetic distance matrix. it should be a .csv filename. |
pd.backingfile |
character, the root name for the file for the cache of the big phylogenetic distance matrix. it should be a .bin filename. |
pd.desc.file |
character, name of the file to hold the backingfile description for the big phylogenetic distance matrix. it should be a .desc filename. |
taxo.metric |
taxonomic beta diversity index, the same as 'method' in the function 'vegdist' in package 'vegan', including "manhattan", "euclidean", "canberra", "clark", "bray", "kulczynski", "jaccard", "gower", "altGower", "morisita", "horn", "mountford", "raup", "binomial", "chao", "cao" or "mahalanobis". If taxo.metric='bray' and transform.method=NULL, RC will be calculated based on Bray-Curtis dissimilarity as recommended in original iCAMP; otherwise, unit.sum setting will be ignored. |
transform.method |
character or a defined function, to specify how to transform the community matrix before calculating dissimilarity. If it is a character, it should be a method name as in the function 'decostand' in package 'vegan', including 'total', 'max', 'freq', 'normalize', 'range', 'standardize', 'pa', 'chi.square', 'cmdscale', 'hellinger', 'log'. |
logbase |
numeric, the logarithm base used when transform.method='log'. |
dirichlet |
Logic. If TRUE, the taxonomic null model will use Dirichlet distribution to generate relative abundances in randomized community matrix. If the input community matrix has all row sums no more than 1, the function will automatically set dirichlet=TRUE. default is FALSE. |
d.cut.method |
character, to specify the method to calculate pd.cut from ds. 'maxpd' means based on maximum phylogenetic distance, pd.cut = (maxpd - ds)/2. 'maxdroot' means based on maximum distance to root, pd.cut = maxdroot - (ds/2), which is preferred if the tree only has one edge from the root. |
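The two d.cut.method formulas can be checked with a few lines of arithmetic. The following is a minimal sketch, not iCAMP code: the helper name pd.cut.from.ds and the toy values for maxpd and maxdroot are assumptions made for illustration only.

```r
# Hypothetical helper (not part of iCAMP): derive pd.cut from ds
# using the two formulas given for d.cut.method.
pd.cut.from.ds <- function(ds, maxpd = NULL, maxdroot = NULL,
                           d.cut.method = c("maxpd", "maxdroot")) {
  d.cut.method <- match.arg(d.cut.method)
  if (d.cut.method == "maxpd") {
    stopifnot(!is.null(maxpd))
    (maxpd - ds) / 2        # based on maximum pairwise phylogenetic distance
  } else {
    stopifnot(!is.null(maxdroot))
    maxdroot - ds / 2       # based on maximum distance to the root
  }
}

# Toy case (made-up numbers): every tip sits 1.0 from the root,
# so the largest tip-to-tip distance is 2.0.
cut.maxpd    <- pd.cut.from.ds(ds = 0.2, maxpd = 2.0, d.cut.method = "maxpd")
cut.maxdroot <- pd.cut.from.ds(ds = 0.2, maxdroot = 1.0, d.cut.method = "maxdroot")
print(c(maxpd = cut.maxpd, maxdroot = cut.maxdroot))
```

In this toy case the two methods agree because the maximum pairwise distance is exactly twice the maximum distance to the root. When that does not hold, for example when the tree has a single long edge leaving the root, the two values differ.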
This function is particularly designed for samples from different metacommunities. The null model will randomize the community matrix under different metacommunities, separately (and independently). All other details are the same as the function icamp.big.
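To make the per-metacommunity defaults concrete, the sketch below builds a meta.group table for a toy community matrix and computes occurrence frequency and mean relative abundance within each metacommunity, as the descriptions of meta.frequency and meta.ab state. It uses base R only. The toy data are made up, and whether iCAMP computes these tables in exactly this way internally is an assumption.

```r
# Toy community matrix: 4 samples (rows) x 3 taxa (columns); values are made up.
comm <- matrix(c(10, 0, 5,
                  0, 4, 6,
                  3, 3, 0,
                  0, 0, 8),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("S", 1:4), paste0("OTU", 1:3)))

# One-column table assigning each sample to a metacommunity.
meta.group <- data.frame(meta.com = c("meta1", "meta1", "meta2", "meta2"),
                         stringsAsFactors = FALSE)
rownames(meta.group) <- rownames(comm)

# Sample IDs grouped by metacommunity.
groups <- split(rownames(meta.group), meta.group[, 1])

# Occurrence frequency of each taxon within each metacommunity
# (rows = metacommunities, columns = taxa).
meta.frequency <- t(sapply(groups, function(ids) {
  colSums(comm[ids, , drop = FALSE] > 0) / length(ids)
}))

# Mean relative abundance of each taxon within each metacommunity.
rel.ab <- comm / rowSums(comm)
meta.ab <- t(sapply(groups, function(ids) {
  colMeans(rel.ab[ids, , drop = FALSE])
}))

print(meta.frequency)
print(meta.ab)
```

Both results have one row per metacommunity and one column per taxon, which matches the layout required for the meta.frequency and meta.ab arguments.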
If omit.option is test, the output will be a table summarizing the information of small bins.
Otherwise, the output is a list object, including one or more elements as below:
The first one or several (if set 'both' for metrics and/or randomization scale) elements are matrices of process importances at community level. In each matrix, the first two columns will be the sample IDs of each turnover, and the third to last column will show the estimated relative importance of each process in shaping each turnover between communities (samples). The name(s) of the element(s) shows the metrics and their randomization scale, e.g. bNRIiRCa means phylogenetic null model analysis using betaNRI (i.e. SES based on betaMPD) with randomization within each bin and taxonomic null model analysis using RC based on Bray-Curtis with randomization across bins. Other possible phylogenetic null-model-based metrics: bNTI, betaNTI (i.e. SES based on betaMNTD); RCbMPD, RC based on betaMPD; RCbMNTD, RC based on betaMNTD; CbMPD, confidence level based on betaMPD; CbMNTD, confidence level based on betaMNTD. Other possible taxonomic null-model-based metrics: SESbray, SES based on Bray-Curtis; CBray, confidence level based on Bray-Curtis. i, within-bin randomization; a, across-bin randomization.
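The naming convention can be unpacked mechanically. The helper below is a hypothetical sketch (decode.icamp.name is not an iCAMP function) that splits a name such as bNRIiRCa into its phylogenetic metric, taxonomic metric, and randomization scales. The code lists are taken from the description above; names produced by a real run may include variants not covered here.

```r
# Hypothetical helper (not part of iCAMP): split an element name such as
# "bNRIiRCa" into its metric codes and randomization scales.
decode.icamp.name <- function(x) {
  phylo.codes <- c("bNRI", "bNTI", "RCbMPD", "RCbMNTD", "CbMPD", "CbMNTD")
  taxo.codes  <- c("RCbray", "RC", "SESbray", "CBray")
  pattern <- paste0("^(", paste(phylo.codes, collapse = "|"), ")([ia])(",
                    paste(taxo.codes, collapse = "|"), ")([ia])$")
  m <- regmatches(x, regexec(pattern, x))[[1]]
  if (length(m) == 0) stop("unrecognized element name: ", x)
  # i = within-bin randomization, a = across-bin randomization
  scale.name <- c(i = "within.bin", a = "across.all")
  list(phylo.metric = m[2],
       phylo.scale  = unname(scale.name[m[3]]),
       taxo.metric  = m[4],
       taxo.scale   = unname(scale.name[m[5]]))
}

decoded <- decode.icamp.name("bNRIiRCa")
str(decoded)
```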
detail |
an element in output only if detail.save is TRUE. A list with elements as below. |
taxabin |
an element in 'detail'. A list showing phylogenetic binning results. The first element is a matrix named sp.bin, where each row is a taxon (OTU or ASV), the first column is the original strict bin ID, the second column is the original bin ID after small bins are merged into nearest relative(s), the third column is the final renewed bin ID. The second element named bin.united.sp is a list, where each element shows taxa IDs within each bin and the bins are in the order of the final renewed bin IDs. The third element named bin.strict.sp is a list, where each element shows taxa IDs within each strict bin and the bins are in the order of the original strict bin IDs. The fourth element named state.strict is a matrix, where the 1st column is original strict bin IDs, the 2nd column is the taxa number in each strict bin, the 3rd to 5th columns show the maximum, mean, and standard deviation of phylogenetic distances within each strict bin. The fifth element named state.united is a matrix, where the row numbering is the final bin ID, the 1st column is original bin IDs, the 2nd column is the taxa number in each final bin, the 3rd to 5th columns show the maximum, mean, and standard deviation of phylogenetic distances within each final bin. |
SigbMPDi, SigbMPDa, SigbMNTDi, SigbMNTDa, SigBCi, SigBCa |
elements in 'detail', matrices showing the null model significance testing index for each turnover of each bin. In the name of the element(s), SigbMPD, SigbMNTD, or SigBC means the significance testing is based on betaMPD, betaMNTD, or taxonomic dissimilarity (default is Bray-Curtis); i, within-bin randomization; a, across-bin randomization. In each matrix, the first two columns are sample IDs for each turnover; the 3rd to the last column represent different bins with column names containing the significance testing index name, which can be bNRI, bNTI, RCbMPD, RCbMNTD, CbMPD, CbMNTD, SESbray, RCbray, or CBray as mentioned above. |
bin.weight |
an element in 'detail', a matrix showing relative abundance of each bin in each pair of samples. |
processes |
an element in 'detail', a list of process importance results at community level. |
setting |
an element in 'detail', a data.frame showing all basic settings of this function. |
comm |
an element in 'detail', the input community matrix. |
rand |
an element in output only if detail.null is TRUE. It is a list with each element showing the observed or null values of a beta diversity index (e.g. betaMPD, betaMNTD, Bray-Curtis). Each index is shown as a list where each element represents a bin. |
special.crct |
an element in output only if detail.null is TRUE. It shows the corrected values for special cases, where zero means no correction is needed. |
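As a small illustration of the taxabin layout described above, the sketch below starts from a mock sp.bin matrix and groups taxa IDs by the final bin ID in the third column. The matrix values are invented, and the real object may carry column names or extra attributes not shown here.

```r
# Mock sp.bin laid out as described above (values are made up):
# column 1 = strict bin ID, column 2 = bin ID after merging small bins,
# column 3 = final renewed bin ID.
sp.bin <- matrix(c(1, 1, 1,
                   1, 1, 1,
                   2, 1, 1,
                   3, 3, 2,
                   3, 3, 2),
                 ncol = 3, byrow = TRUE,
                 dimnames = list(paste0("OTU", 1:5), NULL))

# Taxa IDs per final bin, similar in spirit to bin.united.sp.
taxa.by.bin <- split(rownames(sp.bin), sp.bin[, 3])

# Number of taxa in each final bin.
bin.size <- lengths(taxa.by.bin)
print(taxa.by.bin)
print(bin.size)
```

In the mock data, the single-taxon strict bin 2 has been merged into bin 1, which is the behavior described for omit.option = "no".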
Version 1: 2021.8.4
Daliang Ning
Ning, D., Yuan, M., Wu, L., Zhang, Y., Guo, X., Zhou, X. et al. (2020). A quantitative framework reveals ecological drivers of grassland microbial community assembly in response to warming. Nature Communications, 11, 4717.
Stegen, J.C., Lin, X., Fredrickson, J.K. & Konopka, A.E. (2015). Estimating and mapping ecological processes influencing microbial community assembly. Front Microbiol, 6, 370.
Stegen, J.C., Lin, X., Fredrickson, J.K., Chen, X., Kennedy, D.W., Murray, C.J. et al. (2013). Quantifying community assembly processes and identifying features that impose them. ISME J, 7, 2069.
Zhou, J. & Ning, D. (2017). Stochastic community assembly: Does it matter in microbial ecology? Microbiology and Molecular Biology Reviews, 81.
Veech, J.A. (2012). Significance testing in ecological null models. Theor Ecol, 5, 611-616.
Kane, M.J., Emerson, J., Weston, S. (2013). Scalable Strategies for Computing with Massive Data. Journal of Statistical Software, 55(14), 1-19. URL http://www.jstatsoft.org/v55/i14/.
data("example.data")
comm=example.data$comm
tree=example.data$tree
# in this example, 10 samples from one metacommunity,
# the other 10 samples from another metacommunity.
meta.group=data.frame(meta.com=c(rep("meta1",10),rep("meta2",10)))
rownames(meta.group)=rownames(comm)

# since need to save some output to a certain folder,
# the following code is set as 'not test'.
# but you may test the code on your computer
# after change the folder path for 'save.wd'.

wd0=getwd() # please change to the folder you want to save the pd.big output.
save.wd=paste0(tempdir(),"/pdbig.icampcm")
nworker=2 # parallel computing thread number
rand.time=20 # usually use 1000 for real data.

bin.size.limit=5 # for real data, usually use a proper number
# according to phylogenetic signal test or try some settings
# then choose the reasonable stochasticity level.
# our experience is 12, or 24, or 48.
# but for this example dataset which is too small, have to use 5.

icamp.out=icamp.cm(comm=comm, tree=tree, meta.group=meta.group,
                   pd.wd=save.wd, rand=rand.time, nworker=nworker,
                   bin.size.limit=bin.size.limit)
setwd(wd0)
Perform phylogenetic-bin-based null model analysis and quantify the relative importance of different processes. This function can deal with local communities under different metacommunities (regional pools), and with different metacommunity settings for the phylogenetic and taxonomic null models.
icamp.cm2(comm, tree, meta.group.phy = NULL, meta.com.phy = NULL, meta.frequency.phy = NULL, meta.ab.phy = NULL, meta.group.tax = NULL, meta.com.tax = NULL, meta.frequency.tax = NULL, meta.ab.tax = NULL, pd.desc = NULL, pd.spname = NULL, pd.wd = getwd(), rand = 1000, prefix = "iCAMP", ds = 0.2, pd.cut = NA, phylo.rand.scale = c("within.bin", "across.all", "both"), taxa.rand.scale = c("across.all", "within.bin", "both"), phylo.metric = c("bMPD", "bMNTD", "both", "bNRI", "bNTI"), sig.index = c("Confidence", "SES.RC", "SES", "RC"), bin.size.limit = 24, nworker = 4, memory.G = 50, rtree.save = FALSE, detail.save = TRUE, qp.save = TRUE, detail.null = FALSE, ignore.zero = TRUE, output.wd = getwd(), correct.special = TRUE, unit.sum = rowSums(comm), special.method = c("depend", "MPD", "MNTD", "both"), ses.cut = 1.96, rc.cut = 0.95, conf.cut = 0.975, omit.option = c("no", "test", "omit"), treepath.file = "path.rda", pd.spname.file = "pd.taxon.name.csv", pd.backingfile = "pd.bin", pd.desc.file = "pd.desc", taxo.metric = "bray", transform.method = NULL, logbase = 2, dirichlet = FALSE, d.cut.method = c("maxpd", "maxdroot"))
comm |
matrix or data.frame, community data. Each row is a sample or site and each column is a taxon (a species, OTU, or ASV), so rownames should be sample IDs and colnames should be taxa IDs. |
tree |
phylogenetic tree, an object of class "phylo". |
meta.group.phy |
matrix or data.frame, a one-column (n x 1) matrix indicating which metacommunity each sample belongs to in the null model for phylogenetic beta diversity, so that different samples can belong to different metacommunities. Rownames are sample IDs. The first column is metacommunity names. If an n x m matrix is given, only the first column is used. Default is NULL, which means all samples belong to the same metacommunity. |
meta.com.phy |
a list object, each element is a matrix or data.frame to define abundance (or relative abundance) of taxa in a metacommunity (regional pool) in the null model for phylogenetic beta diversity. The element names indicate metacommunity names, which should be consistent with the metacommunity names defined in meta.group.phy. If there is only one metacommunity, meta.com.phy can be a matrix or data.frame to define taxa abundance (or relative abundance) in the metacommunity. Default is NULL, which means the metacommunity structure is calculated from comm according to the metacommunities defined in meta.group.phy. |
meta.frequency.phy |
matrix or data.frame, each column represents a taxon, each row represents a metacommunity (regional pool), to define the occurrence frequency of each taxon in each metacommunity in the null model for phylogenetic beta diversity. The rownames indicate metacommunity names, which should be the same as the metacommunity names in meta.group.phy. Default is NULL, which means meta.frequency.phy is calculated as the occurrence frequency of each taxon in comm across the samples within each metacommunity defined by meta.group.phy. |
meta.ab.phy |
matrix or data.frame, each column represents a taxon, each row represents a metacommunity (regional pool), to define the abundance (or relative abundance) of each taxon in each metacommunity in the null model for phylogenetic beta diversity. The rownames indicate metacommunity names, which should be the same as the metacommunity names in meta.group.phy. Default is NULL, which means meta.ab.phy is calculated as the average relative abundance of each taxon in comm across the samples within each metacommunity defined by meta.group.phy. |
meta.group.tax |
the same format as meta.group.phy, but for taxonomic null model. |
meta.com.tax |
the same format as meta.com.phy, but for taxonomic null model. |
meta.frequency.tax |
the same format as meta.frequency.phy, but for taxonomic null model. |
meta.ab.tax |
the same format as meta.ab.phy, but for taxonomic null model. |
pd.desc |
the name of the file to hold the backingfile description of the phylogenetic distance matrix. It is usually "pd.desc" if using the default setting in the pdist.big function. If it is NULL, the function pd.big will be used to calculate the phylogenetic distance matrix from tree, and save it in pd.wd as a bigmemory file. |
pd.spname |
character vector, taxa IDs in the same order as the rows and columns of the big matrix of phylogenetic distances. |
pd.wd |
folder path, where the bigmemory file of the phylogenetic distance matrix is saved. |
rand |
integer, randomization times. default is 1000. |
prefix |
character string, the prefix of those output files. |
ds |
numeric, the general threshold of phylogenetic distance within which the phylogenetic signal is still significant. default is 0.2. |
pd.cut |
numeric, the distance to the tree root where the phylogenetic tree is truncated to get strict phylogenetic bins. If pd.cut is set, the distance threshold (ds) is disabled. Default is NA. |
phylo.rand.scale |
character, the scale to randomize the taxa for phylogenetic null model. "within.bin" means randomization within each bin; "across.all" means randomization across all bins; "both" means to test both methods. Default setting is within.bin. |
taxa.rand.scale |
character, the scale to randomize the taxa for taxonomic null model. "within.bin" means randomization within each bin; "across.all" means randomization across all bins; "both" means to test both methods. Default setting is across.all. |
phylo.metric |
character, the metric for phylogenetic null model analysis. bMPD (or bNRI), null model analysis based on beta mean pairwise distance (betaMPD); if sig.index is SES, it is beta net relatedness index (betaNRI). bMNTD (or bNTI), null model analysis based on beta mean nearest taxon distance (betaMNTD); if sig.index is SES, it is beta nearest taxon index (betaNTI). both, use null model test based on both bMPD and bMNTD. Default setting is based on bMPD. |
sig.index |
character, the index for null model significance test. Confidence, percentage of null values less extreme than the observed value, i.e. non-parametric one-sided confidence level; if sig.index is set as Confidence, it will be applied to both phylogenetic and taxonomic metrics. If set as SES.RC, use standardized effect size (SES) for phylogenetic metrics (i.e. betaNTI or betaNRI), and use modified Raup-Crick (RC) for taxonomic metrics (RCbray). If set as SES, use SES for both phylogenetic and taxonomic metrics. If set as RC, use RC for both phylogenetic and taxonomic metrics. Default is Confidence. If a vector is given, only the first element will be used. |
bin.size.limit |
integer, the minimal requirement of bin size (taxa number in a bin). Default setting is 24. |
nworker |
integer, for parallel computing. Either a character vector of host names on which to run the worker copies of R, or a positive integer (in which case that number of copies is run on localhost). default is 4, means 4 threads will be run. |
memory.G |
numeric, to set the memory size as you need, so that calculation of large tree will not be limited by physical memory. unit is Gb. default is 50Gb. |
rtree.save |
logic, whether to save the rooted tree as nwk file, if the input tree is not rooted. Default is FALSE. |
detail.save |
logic, whether to save the details, i.e. some key objects for iCAMP analysis, as rda file. Default is TRUE. |
qp.save |
logic, whether to save the relative importance of processes as csv file. Default is TRUE. |
detail.null |
logic, if TRUE, the output will include all the null values. Default is FALSE, but this needs to be TRUE if you want to change the significance testing index later using 'change.sigindex'. |
ignore.zero |
logic, in the community data matrix (comm), whether to remove the row(s)/column(s) of which the sum is zero. Default is TRUE. |
output.wd |
a folder path, where the files will be saved when rtree.save, detail.save, or qp.save is true. |
correct.special |
logic, whether to correct the special cases when calculating bNRI or bNTI. Default is TRUE. |
unit.sum |
NULL or a number or a numeric vector. When a beta diversity index is calculated for a bin, the taxa abundances will be divided by unit.sum to calculate the relative abundances, and the Bray-Curtis index in each bin will become the Manhattan index divided by 2. The default is the row sums of the community matrix, which are usually the sequencing depth of each sample. If set as NULL, this special transformation is not applied. |
special.method |
When correct.special is TRUE, which method will be used to check underestimation of deterministic pattern(s) in special cases. MPD, use null model test based on mean pairwise distance; MNTD, use null model test based on mean nearest taxon distance; depend, use MPD when phylo.metric is bMPD or bNRI, and use MNTD when phylo.metric is bMNTD or bNTI; both, use both MPD and MNTD. Default is depend |
ses.cut |
numeric, the cutoff of significant standard effect size, default is 1.96. |
rc.cut |
numeric, the cutoff of significant modified Raup-Crick index value, default is 0.95. |
conf.cut |
numeric, the cutoff of significant one-side confidence level, default is 0.975. |
omit.option |
three options about omitting small bins. "no" means to merge small bins to their nearest relatives to meet the bin size requirement, rather than omitting them; "test" means to output the information of small strict bins with a size lower than requirement, iCAMP will not be performed; "omit" means to do iCAMP analysis with strict bins which have enough species (larger than bin size requirement). |
treepath.file |
character, name of the file saving the tree.path, which is a list of all the nodes and edge lengths from root to every tip and/or node. It should be a .rda filename. |
pd.spname.file |
character, name of the file saving the taxa IDs, which has exactly the same order as the row names (and column names) of the big phylogenetic distance matrix. it should be a .csv filename. |
pd.backingfile |
character, the root name for the file for the cache of the big phylogenetic distance matrix. it should be a .bin filename. |
pd.desc.file |
character, name of the file to hold the backingfile description for the big phylogenetic distance matrix. it should be a .desc filename. |
taxo.metric |
taxonomic beta diversity index, the same as 'method' in the function 'vegdist' in package 'vegan', including "manhattan", "euclidean", "canberra", "clark", "bray", "kulczynski", "jaccard", "gower", "altGower", "morisita", "horn", "mountford", "raup", "binomial", "chao", "cao" or "mahalanobis". If taxo.metric='bray' and transform.method=NULL, RC will be calculated based on Bray-Curtis dissimilarity as recommended in original iCAMP; otherwise, unit.sum setting will be ignored. |
transform.method |
character or a defined function, to specify how to transform the community matrix before calculating dissimilarity. If it is a character, it should be a method name as in the function 'decostand' in package 'vegan', including 'total', 'max', 'freq', 'normalize', 'range', 'standardize', 'pa', 'chi.square', 'cmdscale', 'hellinger', 'log'. |
logbase |
numeric, the logarithm base used when transform.method='log'. |
dirichlet |
Logic. If TRUE, the taxonomic null model will use Dirichlet distribution to generate relative abundances in randomized community matrix. If the input community matrix has all row sums no more than 1, the function will automatically set dirichlet=TRUE. default is FALSE. |
d.cut.method |
character, to specify the method to calculate pd.cut from ds. 'maxpd' means based on maximum phylogenetic distance, pd.cut = (maxpd - ds)/2. 'maxdroot' means based on maximum distance to root, pd.cut = maxdroot - (ds/2), which is preferred if the tree only has one edge from the root. |
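The unit.sum argument described above states that, once abundances are divided by the sample totals, the Bray-Curtis index within a bin becomes the Manhattan distance divided by 2. The sketch below checks that arithmetic on toy data in base R. The data and the bin membership are made up, and this is an illustration of the stated relationship rather than iCAMP's internal code.

```r
# Toy community matrix: 2 samples x 4 taxa (values are made up).
comm <- matrix(c(10, 0, 5, 5,
                  2, 8, 6, 4),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("S1", "S2"), paste0("OTU", 1:4)))

unit.sum <- rowSums(comm)   # default: total abundance of each sample
rel <- comm / unit.sum      # relative abundances; each row sums to 1

# Whole community: Bray-Curtis equals Manhattan / 2 when each row sums to 1.
abs.diff <- abs(rel["S1", ] - rel["S2", ])
bc.whole <- sum(abs.diff) / sum(rel["S1", ] + rel["S2", ])
manhattan.half.whole <- sum(abs.diff) / 2

# Hypothetical bins: the same "Manhattan / 2" form is applied per bin.
bin1 <- c("OTU1", "OTU2")
bin2 <- c("OTU3", "OTU4")
manhattan.half.bin1 <- sum(abs.diff[bin1]) / 2
manhattan.half.bin2 <- sum(abs.diff[bin2]) / 2

print(c(whole = bc.whole, bin1 = manhattan.half.bin1, bin2 = manhattan.half.bin2))
```

With this scaling the per-bin values add up to the whole-community Bray-Curtis value, so each bin's dissimilarity is expressed on the scale of the whole community.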
This function is particularly designed for samples from different metacommunities, and allows the phylogenetic and taxonomic null models to have different metacommunity settings. The null model will randomize the community matrix under different metacommunities, separately (and independently). All other details are the same as the function icamp.big.
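As an illustration of giving the two null models different metacommunity settings, the sketch below prepares one grouping table per model. The sample IDs and the grouping choice (a single pool for the phylogenetic model, two pools for the taxonomic model) are assumptions made for the example. The call to icamp.cm2 is left as a comment because it needs a real community matrix and tree.

```r
# Hypothetical sample IDs; in practice use rownames(comm).
sample.ids <- paste0("S", 1:6)

# Phylogenetic null model: treat all samples as one regional pool.
meta.group.phy <- data.frame(meta.com = rep("all", length(sample.ids)),
                             stringsAsFactors = FALSE)
rownames(meta.group.phy) <- sample.ids

# Taxonomic null model: split the samples into two regional pools.
meta.group.tax <- data.frame(meta.com = rep(c("meta1", "meta2"), each = 3),
                             stringsAsFactors = FALSE)
rownames(meta.group.tax) <- sample.ids

# Both tables are indexed by the same sample IDs.
stopifnot(identical(rownames(meta.group.phy), rownames(meta.group.tax)))

# The tables would then be passed to the corresponding arguments, e.g.:
# icamp.cm2(comm = comm, tree = tree,
#           meta.group.phy = meta.group.phy,
#           meta.group.tax = meta.group.tax)
```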
If omit.option is 'test', the output will be a table summarizing the information of small bins.
Otherwise, the output is a list object, including one or more elements as below:
The first one or several elements (several if 'both' is set for metrics and/or randomization scale) are matrices of process importance at the community level. In each matrix, the first two columns are the sample IDs of each turnover, and the third to last columns show the estimated relative importance of each process in shaping each turnover between communities (samples). The name of each element shows the metrics and their randomization scales, e.g. bNRIiRCa means phylogenetic null model analysis using betaNRI (i.e. SES based on betaMPD) with randomization within each bin, and taxonomic null model analysis using RC based on Bray-Curtis with randomization across bins. Other possible phylogenetic null-model-based metrics: bNTI, betaNTI (i.e. SES based on betaMNTD); RCbMPD, RC based on betaMPD; RCbMNTD, RC based on betaMNTD; CbMPD, confidence level based on betaMPD; CbMNTD, confidence level based on betaMNTD. Other possible taxonomic null-model-based metrics: SESbray, SES based on Bray-Curtis; CBray, confidence level based on Bray-Curtis. i, within-bin randomization; a, across-bin randomization.
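A community-level matrix with this layout can be summarized with base R. The sketch below uses a toy data.frame whose three rows copy the first three turnovers of bNRIiRCa in the icamp.out example dataset; averaging across turnovers is only one simple way to summarize, not a method prescribed by the package.

```r
# Toy table with the column layout described above: two sample ID
# columns, then one column per process.
proc <- data.frame(
  sample1 = c("s2", "s3", "s4"),
  sample2 = c("s1", "s1", "s1"),
  Heterogeneous.Selection = c(0, 0, 0),
  Homogeneous.Selection   = c(0, 0, 0),
  Dispersal.Limitation    = c(0, 0, 0),
  Homogenizing.Dispersal  = c(0, 0.649, 0.628),
  Drift.and.Others        = c(1, 0.351, 0.372)
)

# Columns 3 to last hold the relative importance of each process.
proc.cols <- 3:ncol(proc)

# Mean relative importance of each process across turnovers.
proc.mean <- colMeans(proc[, proc.cols])

# Within each turnover the relative importances add up to 1.
proc.total <- rowSums(proc[, proc.cols])
```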
detail |
an element in output only if detail.save is TRUE. A list with elements as below. |
taxabin |
an element in 'detail'. A list showing the phylogenetic binning results. The first element is a matrix named sp.bin, where each row is a taxon (OTU or ASV), the first column is the original strict bin ID, the second column is the original bin ID after small bins are merged into nearest relative(s), and the third column is the final renewed bin ID. The second element, named bin.united.sp, is a list where each element shows the taxa IDs within each bin, with the bins in the order of the final renewed bin IDs. The third element, named bin.strict.sp, is a list where each element shows the taxa IDs within each strict bin, with the bins in the order of the original strict bin IDs. The fourth element, named state.strict, is a matrix where the 1st column is the original strict bin IDs, the 2nd column is the number of taxa in each strict bin, and the 3rd to 5th columns show the maximum, mean, and standard deviation of phylogenetic distances within each strict bin. The fifth element, named state.united, is a matrix where the row numbering is the final bin ID, the 1st column is the original bin IDs, the 2nd column is the number of taxa in each final bin, and the 3rd to 5th columns show the maximum, mean, and standard deviation of phylogenetic distances within each final bin. |
SigbMPDi, SigbMPDa, SigbMNTDi, SigbMNTDa, SigBCi, SigBCa |
elements in 'detail', matrices showing the null model significance testing index for each turnover of each bin. In the names of the elements, SigbMPD, SigbMNTD, or SigBC means the significance testing is based on betaMPD, betaMNTD, or taxonomic dissimilarity (default is Bray-Curtis); i, within-bin randomization; a, across-bin randomization. In each matrix, the first two columns are the sample IDs for each turnover; the 3rd to the last columns represent different bins, with column names containing the significance testing index name, which can be bNRI, bNTI, RCbMPD, RCbMNTD, CbMPD, CbMNTD, SESbray, RCbray, or CBray as mentioned above. |
bin.weight |
an element in 'detail', a matrix showing relative abundance of each bin in each pair of samples. |
processes |
an element in 'detail', a list of process importance results at community level. |
setting |
an element in 'detail', a data.frame showing all basic settings of this function. |
comm |
an element in 'detail', the input community matrix. |
rand |
an element in output only if detail.null is TRUE. It is a list with each element showing the observed or null values of a beta diversity index (e.g. betaMPD, betaMNTD, Bray-Curtis). Each index is shown as a list where each element represents a bin. |
special.crct |
an element in output only if detail.null is TRUE. It shows the corrected values for special cases, where zero means no correction is needed. |
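To show how the bin-level elements in 'detail' relate to the community-level process importance, here is a simplified sketch of the decision rule described in Ning et al. (2020), assuming sig.index "SES.RC" and the cutoffs ses.cut = 1.96 and rc.cut = 0.95 recorded in the settings of the icamp.out example dataset. The helper classify.bin is hypothetical and ignores the special-case corrections that the package can apply. The input values are those of the turnover between s3 and s1 in icamp.out (elements SigbMPDi, SigBCa, and bin.weight).

```r
# Hypothetical helper (not an iCAMP function): assign one process per bin
# from the phylogenetic index (bNRI) and the taxonomic index (RC).
classify.bin <- function(bNRI, RC, ses.cut = 1.96, rc.cut = 0.95) {
  ifelse(bNRI >  ses.cut, "Heterogeneous.Selection",
  ifelse(bNRI < -ses.cut, "Homogeneous.Selection",
  ifelse(RC   >  rc.cut,  "Dispersal.Limitation",
  ifelse(RC   < -rc.cut,  "Homogenizing.Dispersal",
                          "Drift.and.Others"))))
}

# Turnover s3 vs s1, values copied from the icamp.out example dataset.
bNRI   <- c(bin1 = 0.132,  bin2 = -0.935, bin3 = 0.865)
RC     <- c(bin1 = 0.41,   bin2 = 0.3,    bin3 = -0.98)
weight <- c(bin1 = 0.1653, bin2 = 0.186,  bin3 = 0.649)

# Process assigned to each bin.
bin.process <- classify.bin(bNRI, RC)

# Relative importance of a process = summed weight of the bins it governs.
importance <- tapply(weight, bin.process, sum)
```

For this turnover the sketch assigns bin3 to homogenizing dispersal and the other two bins to drift, which agrees with the values 0.649 and 0.351 stored for the same turnover in the community-level element bNRIiRCa.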
Version 1: 2022.2.10
Daliang Ning
Ning, D., Yuan, M., Wu, L., Zhang, Y., Guo, X., Zhou, X. et al. (2020). A quantitative framework reveals ecological drivers of grassland microbial community assembly in response to warming. Nature Communications, 11, 4717.
Stegen, J.C., Lin, X., Fredrickson, J.K. & Konopka, A.E. (2015). Estimating and mapping ecological processes influencing microbial community assembly. Front Microbiol, 6, 370.
Stegen, J.C., Lin, X., Fredrickson, J.K., Chen, X., Kennedy, D.W., Murray, C.J. et al. (2013). Quantifying community assembly processes and identifying features that impose them. ISME J, 7, 2069.
Zhou, J. & Ning, D. (2017). Stochastic community assembly: Does it matter in microbial ecology? Microbiology and Molecular Biology Reviews, 81.
Veech, J.A. (2012). Significance testing in ecological null models. Theor Ecol, 5, 611-616.
Kane, M.J., Emerson, J., Weston, S. (2013). Scalable Strategies for Computing with Massive Data. Journal of Statistical Software, 55(14), 1-19. URL http://www.jstatsoft.org/v55/i14/.
data("example.data")
comm=example.data$comm
tree=example.data$tree
# in this example, 10 samples from one metacommunity,
# the other 10 samples from another metacommunity.
meta.group=data.frame(meta.com=c(rep("meta1",10),rep("meta2",10)))
rownames(meta.group)=rownames(comm)
# since need to save some output to a certain folder,
# the following code is set as 'not test'.
# but you may test the code on your computer
# after change the folder path for 'save.wd'.
wd0=getwd()
# please change to the folder you want to save the pd.big output.
save.wd=paste0(tempdir(),"/pdbig.icampcm2")
nworker=2 # parallel computing thread number
rand.time=20 # usually use 1000 for real data.
bin.size.limit=5 # for real data, usually use a proper number
# according to phylogenetic signal test or try some settings
# then choose the reasonable stochasticity level.
# our experience is 12, or 24, or 48.
# but for this example dataset which is too small, have to use 5.
icamp.out=icamp.cm2(comm=comm, tree=tree, meta.group.phy=meta.group,
                    meta.group.tax=NULL, pd.wd=save.wd, rand=rand.time,
                    nworker=nworker, bin.size.limit=bin.size.limit)
setwd(wd0)
A typical output of icamp.big.
data("icamp.out")
The format is:
List of 4
 $ bNRIiRCa :'data.frame': 190 obs. of 7 variables:
  ..$ sample1 : Factor w/ 19 levels "s10","s11","s12",..: 11 13 14 15 16 17 18 19 1 2 ...
  ..$ sample2 : Factor w/ 19 levels "s1","s10","s11",..: 1 1 1 1 1 1 1 1 1 1 ...
  ..$ Heterogeneous.Selection: num [1:190] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ Homogeneous.Selection : num [1:190] 0 0 0 0 0 ...
  ..$ Dispersal.Limitation : num [1:190] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ Homogenizing.Dispersal : num [1:190] 0 0.649 0.628 0 0 ...
  ..$ Drift.and.Others : num [1:190] 1 0.351 0.372 1 1 ...
 $ detail :List of 7
  ..$ taxabin :List of 5
  .. ..$ sp.bin :'data.frame': 30 obs. of 3 variables:
  .. .. ..$ bin.id.strict: int [1:30] 1 2 3 4 5 6 7 8 9 10 ...
  .. .. ..$ bin.id.united: chr [1:30] "34" "34" "34" "34" ...
  .. .. ..$ bin.id.new : num [1:30] 1 1 1 1 1 2 2 2 2 2 ...
  .. ..$ bin.united.sp:List of 3
  .. .. ..$ : chr [1:5] "OTU1" "OTU2" "OTU3" "OTU4" ...
  .. .. ..$ : chr [1:7] "OTU6" "OTU7" "OTU8" "OTU9" ...
  .. .. ..$ : chr [1:18] "OTU13" "OTU14" "OTU15" "OTU16" ...
  .. ..$ bin.strict.sp:List of 30
  .. .. ..$ 1 : chr "OTU1"
  .. .. ..$ 2 : chr "OTU2"
  .. .. ..$ 3 : chr "OTU3"
  .. .. ..$ 4 : chr "OTU4"
  .. .. ..$ 5 : chr "OTU5"
  .. .. ..$ 6 : chr "OTU6"
  .. .. ..$ 7 : chr "OTU7"
  .. .. ..$ 8 : chr "OTU8"
  .. .. ..$ 9 : chr "OTU9"
  .. .. ..$ 10: chr "OTU10"
  .. .. ..$ 11: chr "OTU11"
  .. .. ..$ 12: chr "OTU12"
  .. .. ..$ 13: chr "OTU13"
  .. .. ..$ 14: chr "OTU14"
  .. .. ..$ 15: chr "OTU15"
  .. .. ..$ 16: chr "OTU16"
  .. .. ..$ 17: chr "OTU17"
  .. .. ..$ 18: chr "OTU18"
  .. .. ..$ 19: chr "OTU19"
  .. .. ..$ 20: chr "OTU20"
  .. .. ..$ 21: chr "OTU21"
  .. .. ..$ 22: chr "OTU22"
  .. .. ..$ 23: chr "OTU23"
  .. .. ..$ 24: chr "OTU24"
  .. .. ..$ 25: chr "OTU25"
  .. .. ..$ 26: chr "OTU26"
  .. .. ..$ 27: chr "OTU27"
  .. .. ..$ 28: chr "OTU28"
  .. .. ..$ 29: chr "OTU29"
  .. .. ..$ 30: chr "OTU30"
  .. ..$ state.strict :'data.frame': 30 obs. of 5 variables:
  .. .. ..$ bin.strict.id : chr [1:30] "1" "2" "3" "4" ...
  .. .. ..$ bin.strict.taxa.num: int [1:30, 1] 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr [1:30] "1" "2" "3" "4" ...
  .. .. .. .. ..$ : NULL
  .. .. ..$ bin.pd.max : num [1:30] 0 0 0 0 0 0 0 0 0 0 ...
  .. .. ..$ bin.pd.mean : num [1:30] 0 0 0 0 0 0 0 0 0 0 ...
  .. .. ..$ bin.pd.sd : num [1:30] NA NA NA NA NA NA NA NA NA NA ...
  .. ..$ state.united :'data.frame': 3 obs. of 5 variables:
  .. .. ..$ bin.united.id.old : chr [1:3] "34" "38" "51"
  .. .. ..$ bin.united.tax.num: int [1:3] 5 7 18
  .. .. ..$ bin.pd.max : num [1:3] 3.33 3.96 6.36
  .. .. ..$ bin.pd.mean : num [1:3] 2.04 2.6 3.5
  .. .. ..$ bin.pd.sd : num [1:3] 1.073 0.816 1.235
  ..$ SigbMPDi :'data.frame': 190 obs. of 5 variables:
  .. ..$ name1 : Factor w/ 19 levels "s10","s11","s12",..: 11 13 14 15 16 17 18 19 1 2 ...
  .. ..$ name2 : Factor w/ 19 levels "s1","s10","s11",..: 1 1 1 1 1 1 1 1 1 1 ...
  .. ..$ bNRIi.bin1: num [1:190] 0.919 0.132 -1.207 1.274 1.274 ...
  .. ..$ bNRIi.bin2: num [1:190] -0.572 -0.935 0.311 -0.681 0 ...
  .. ..$ bNRIi.bin3: num [1:190] 0.919 0.865 1.034 0.345 0.701 ...
  ..$ SigBCa :'data.frame': 190 obs. of 5 variables:
  .. ..$ name1 : Factor w/ 19 levels "s10","s11","s12",..: 11 13 14 15 16 17 18 19 1 2 ...
  .. ..$ name2 : Factor w/ 19 levels "s1","s10","s11",..: 1 1 1 1 1 1 1 1 1 1 ...
  .. ..$ RCbraya.bin1: num [1:190] -0.81 0.41 0.14 -0.83 -0.8 -0.96 0.3 -0.12 -0.66 -0.75 ...
  .. ..$ RCbraya.bin2: num [1:190] 0.79 0.3 0.83 0.45 0.62 0.64 0.83 -0.49 0.84 0.42 ...
  .. ..$ RCbraya.bin3: num [1:190] -0.94 -0.98 -1 -0.42 -0.04 -0.82 -0.87 -0.55 -0.92 -0.12 ...
  ..$ bin.weight:'data.frame': 190 obs. of 5 variables:
  .. ..$ samp1: Factor w/ 19 levels "s10","s11","s12",..: 11 13 14 15 16 17 18 19 1 2 ...
  .. ..$ samp2: Factor w/ 19 levels "s1","s10","s11",..: 1 1 1 1 1 1 1 1 1 1 ...
  .. ..$ bin1 : num [1:190] 0.0873 0.1653 0.1389 0.0975 0.0799 ...
  .. ..$ bin2 : num [1:190] 0.242 0.186 0.233 0.204 0.111 ...
  .. ..$ bin3 : num [1:190] 0.671 0.649 0.628 0.698 0.809 ...
  ..$ processes :List of 1
  .. ..$ bNRIiRCa:'data.frame': 190 obs. of 7 variables:
  .. .. ..$ sample1 : Factor w/ 19 levels "s10","s11","s12",..: 11 13 14 15 16 17 18 19 1 2 ...
  .. .. ..$ sample2 : Factor w/ 19 levels "s1","s10","s11",..: 1 1 1 1 1 1 1 1 1 1 ...
  .. .. ..$ Heterogeneous.Selection: num [1:190] 0 0 0 0 0 0 0 0 0 0 ...
  .. .. ..$ Homogeneous.Selection : num [1:190] 0 0 0 0 0 ...
  .. .. ..$ Dispersal.Limitation : num [1:190] 0 0 0 0 0 0 0 0 0 0 ...
  .. .. ..$ Homogenizing.Dispersal : num [1:190] 0 0.649 0.628 0 0 ...
  .. .. ..$ Drift.and.Others : num [1:190] 1 0.351 0.372 1 1 ...
  ..$ setting :'data.frame': 1 obs. of 24 variables:
  .. ..$ ds : num 0.2
  .. ..$ pd.cut : logi NA
  .. ..$ max.pd : num 7.83
  .. ..$ sp.check : logi TRUE
  .. ..$ phylo.rand.scale: Factor w/ 1 level "within.bin": 1
  .. ..$ taxa.rand.scale : Factor w/ 1 level "across.all": 1
  .. ..$ phylo.metric : Factor w/ 1 level "bMPD": 1
  .. ..$ sig.index : Factor w/ 1 level "SES.RC": 1
  .. ..$ bin.size.limit : num 5
  .. ..$ nworker : num 4
  .. ..$ memory.G : num 50
  .. ..$ rtree.save : logi FALSE
  .. ..$ detail.save : logi TRUE
  .. ..$ qp.save : logi FALSE
  .. ..$ detail.null : logi TRUE
  .. ..$ ignore.zero : logi TRUE
  .. ..$ output.wd : Factor w/ 1 level "E:/Dropbox/ToolDevelop/package/iCAMP/LatestVersion/Example/TestOutputs14": 1
  .. ..$ correct.special : logi TRUE
  .. ..$ unit.sum.mean : num 34.2
  .. ..$ special.method : Factor w/ 1 level "depend": 1
  .. ..$ ses.cut : num 1.96
  .. ..$ rc.cut : num 0.95
  .. ..$ conf.cut : num 0.975
  .. ..$ omit.option : Factor w/ 1 level "no": 1
  ..$ comm : int [1:20, 1:30] 1 3 0 0 2 2 2 0 0 4 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:20] "s1" "s2" "s3" "s4" ...
  .. .. ..$ : chr [1:30] "OTU1" "OTU2" "OTU3" "OTU4" ...
 $ rand :List of 4
  ..$ bMPD.obs :List of 3
  .. ..$ : num [1:20, 1:20] 0 0.00798 0.03268 0.01343 0.01722 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:20] "s1" "s2" "s3" "s4" ...
  .. .. .. ..$ : chr [1:20] "s1" "s2" "s3" "s4" ...
  ..
..$ : num [1:20, 1:20] 0 0.086 0.0216 0.0949 0.047 ... .. .. ..- attr(*, "dimnames")=List of 2 .. .. .. ..$ : chr [1:20] "s1" "s2" "s3" "s4" ... .. .. .. ..$ : chr [1:20] "s1" "s2" "s3" "s4" ... .. ..$ : num [1:20, 1:20] 0 1.48 1.29 1.27 1.64 ... .. .. ..- attr(*, "dimnames")=List of 2 .. .. .. ..$ : chr [1:20] "s1" "s2" "s3" "s4" ... .. .. .. ..$ : chr [1:20] "s1" "s2" "s3" "s4" ... ..$ bMPDi.rand:List of 3 .. ..$ :'data.frame': 190 obs. of 102 variables: .. .. ..$ name1 : Factor w/ 19 levels "s10","s11","s12",..: 11 13 14 15 16 17 18 19 1 2 ... .. .. ..$ name2 : Factor w/ 19 levels "s1","s10","s11",..: 1 1 1 1 1 1 1 1 1 1 ... .. .. ..$ rand1 : num [1:190] 0.00798 0.03197 0.01217 0.01476 0.00992 ... .. .. ..$ rand2 : num [1:190] 0.00543 0.02743 0.01838 0.01224 0.00822 ... .. .. ..$ rand3 : num [1:190] 0.00357 0.0245 0.02358 0.01224 0.00822 ... .. .. ..$ rand4 : num [1:190] 0.00839 0.04672 0.03604 0.00615 0.00413 ... .. .. ..$ rand5 : num [1:190] 0.0016 0.0226 0.0313 0.0172 0.0116 ... .. .. ..$ rand6 : num [1:190] 0.00626 0.03441 0.0261 0.00615 0.00413 ... .. .. ..$ rand7 : num [1:190] 0.00839 0.03746 0.01959 0.01476 0.00992 ... .. .. ..$ rand8 : num [1:190] 0.00839 0.03351 0.01257 0.01722 0.01157 ... .. .. ..$ rand9 : num [1:190] 0.00798 0.03268 0.01343 0.00697 0.00468 ... .. .. ..$ rand10 : num [1:190] 0.00357 0.02626 0.02671 0.00615 0.00413 ... .. .. ..$ rand11 : num [1:190] 0.00357 0.03161 0.03623 0.00313 0.0021 ... .. .. ..$ rand12 : num [1:190] 0.0016 0.0125 0.01325 0.01476 0.00992 ... .. .. ..$ rand13 : num [1:190] 0.00881 0.05001 0.03956 0.00248 0.00166 ... .. .. ..$ rand14 : num [1:190] 0.00127 0.02228 0.03252 0.01722 0.01157 ... .. .. ..$ rand15 : num [1:190] 0.00881 0.04554 0.03161 0.00248 0.00166 ... .. .. ..$ rand16 : num [1:190] 0.0016 0.02264 0.03127 0.00697 0.00468 ... .. .. ..$ rand17 : num [1:190] 0.00881 0.03988 0.02154 0.00313 0.0021 ... .. .. ..$ rand18 : num [1:190] 0.0016 0.02176 0.02971 0.00615 0.00413 ... .. .. 
..$ rand19 : num [1:190] 0.0016 0.0134 0.0148 0.0172 0.0116 ... .. .. ..$ rand20 : num [1:190] 0.00127 0.02053 0.02939 0.01558 0.01047 ... .. .. ..$ rand21 : num [1:190] 0.00543 0.03581 0.03327 0.00248 0.00166 ... .. .. ..$ rand22 : num [1:190] 0.00798 0.04518 0.03565 0.0106 0.00712 ... .. .. ..$ rand23 : num [1:190] 0.00798 0.03197 0.01217 0.0106 0.00712 ... .. .. ..$ rand24 : num [1:190] 0.00881 0.05001 0.03956 0.01224 0.00822 ... .. .. ..$ rand25 : num [1:190] 0.00543 0.03309 0.02845 0.00697 0.00468 ... .. .. ..$ rand26 : num [1:190] 0.00839 0.03421 0.01382 0.00615 0.00413 ... .. .. ..$ rand27 : num [1:190] 0.00357 0.02626 0.02671 0.0106 0.00712 ... .. .. ..$ rand28 : num [1:190] 0.00881 0.03575 0.01421 0.00697 0.00468 ... .. .. ..$ rand29 : num [1:190] 0.00626 0.03441 0.0261 0.00697 0.00468 ... .. .. ..$ rand30 : num [1:190] 0.00798 0.04782 0.04035 0.00697 0.00468 ... .. .. ..$ rand31 : num [1:190] 0.00626 0.04064 0.03718 0.00248 0.00166 ... .. .. ..$ rand32 : num [1:190] 0.00127 0.02316 0.03408 0.01224 0.00822 ... .. .. ..$ rand33 : num [1:190] 0.00626 0.04152 0.03875 0.00248 0.00166 ... .. .. ..$ rand34 : num [1:190] 0.00881 0.03575 0.01421 0.01558 0.01047 ... .. .. ..$ rand35 : num [1:190] 0.0016 0.0244 0.0344 0.0156 0.0105 ... .. .. ..$ rand36 : num [1:190] 0.00543 0.02655 0.01681 0.01224 0.00822 ... .. .. ..$ rand37 : num [1:190] 0.00127 0.01781 0.02457 0.0164 0.01102 ... .. .. ..$ rand38 : num [1:190] 0.00756 0.03916 0.02731 0.00248 0.00166 ... .. .. ..$ rand39 : num [1:190] 0.00756 0.03438 0.0188 0.00313 0.0021 ... .. .. ..$ rand40 : num [1:190] 0.00357 0.01647 0.00931 0.01722 0.01157 ... .. .. ..$ rand41 : num [1:190] 0.00127 0.02053 0.02939 0.0106 0.00712 ... .. .. ..$ rand42 : num [1:190] 0.0016 0.0125 0.0132 0.0164 0.011 ... .. .. ..$ rand43 : num [1:190] 0.0016 0.0218 0.0297 0.0164 0.011 ... .. .. ..$ rand44 : num [1:190] 0.00756 0.0454 0.03839 0.00313 0.0021 ... .. .. ..$ rand45 : num [1:190] 0.0016 0.02352 0.03283 0.00615 0.00413 ... .. .. 
..$ rand46 : num [1:190] 0.00626 0.02963 0.0176 0.00697 0.00468 ... .. .. ..$ rand47 : num [1:190] 0.00315 0.02472 0.02632 0.0106 0.00712 ... .. .. ..$ rand48 : num [1:190] 0.00798 0.04782 0.04035 0.00313 0.0021 ... .. .. ..$ rand49 : num [1:190] 0.00315 0.02744 0.03115 0.0164 0.01102 ... .. .. ..$ rand50 : num [1:190] 0.00839 0.03746 0.01959 0.00313 0.0021 ... .. .. ..$ rand51 : num [1:190] 0.00127 0.02141 0.03096 0.0106 0.00712 ... .. .. ..$ rand52 : num [1:190] 0.00839 0.03421 0.01382 0.01476 0.00992 ... .. .. ..$ rand53 : num [1:190] 0.00839 0.04935 0.04074 0.01224 0.00822 ... .. .. ..$ rand54 : num [1:190] 0.00315 0.02919 0.03427 0.00313 0.0021 ... .. .. ..$ rand55 : num [1:190] 0.00357 0.02986 0.0331 0.00313 0.0021 ... .. .. ..$ rand56 : num [1:190] 0.00357 0.01647 0.00931 0.01558 0.01047 ... .. .. ..$ rand57 : num [1:190] 0.00315 0.02296 0.02319 0.00697 0.00468 ... .. .. ..$ rand58 : num [1:190] 0.00756 0.0454 0.03839 0.00615 0.00413 ... .. .. ..$ rand59 : num [1:190] 0.00315 0.01493 0.00892 0.0164 0.01102 ... .. .. ..$ rand60 : num [1:190] 0.00315 0.02744 0.03115 0.00313 0.0021 ... .. .. ..$ rand61 : num [1:190] 0.00756 0.03438 0.0188 0.0164 0.01102 ... .. .. ..$ rand62 : num [1:190] 0.00839 0.044 0.03122 0.00248 0.00166 ... .. .. ..$ rand63 : num [1:190] 0.00798 0.0368 0.02076 0.00313 0.0021 ... .. .. ..$ rand64 : num [1:190] 0.00315 0.01493 0.00892 0.01476 0.00992 ... .. .. ..$ rand65 : num [1:190] 0.00798 0.04518 0.03565 0.00248 0.00166 ... .. .. ..$ rand66 : num [1:190] 0.00626 0.02963 0.0176 0.0106 0.00712 ... .. .. ..$ rand67 : num [1:190] 0.00881 0.03505 0.01296 0.0164 0.01102 ... .. .. ..$ rand68 : num [1:190] 0.00626 0.04152 0.03875 0.0164 0.01102 ... .. .. ..$ rand69 : num [1:190] 0.00626 0.02568 0.01058 0.0164 0.01102 ... .. .. ..$ rand70 : num [1:190] 0.00127 0.02141 0.03096 0.01476 0.00992 ... .. .. ..$ rand71 : num [1:190] 0.00798 0.0407 0.0277 0.01476 0.00992 ... .. .. ..$ rand72 : num [1:190] 0.00839 0.04935 0.04074 0.00248 0.00166 ... .. .. 
..$ rand73 : num [1:190] 0.00543 0.02743 0.01838 0.00615 0.00413 ... .. .. ..$ rand74 : num [1:190] 0.00881 0.03505 0.01296 0.01224 0.00822 ... .. .. ..$ rand75 : num [1:190] 0.00626 0.04064 0.03718 0.01722 0.01157 ... .. .. ..$ rand76 : num [1:190] 0.00881 0.03988 0.02154 0.01558 0.01047 ... .. .. ..$ rand77 : num [1:190] 0.00315 0.01906 0.01625 0.01224 0.00822 ... .. .. ..$ rand78 : num [1:190] 0.00756 0.03114 0.01304 0.00615 0.00413 ... .. .. ..$ rand79 : num [1:190] 0.0016 0.02352 0.03283 0.01476 0.00992 ... .. .. ..$ rand80 : num [1:190] 0.00127 0.01605 0.02144 0.01476 0.00992 ... .. .. ..$ rand81 : num [1:190] 0.00127 0.01605 0.02144 0.01558 0.01047 ... .. .. ..$ rand82 : num [1:190] 0.00543 0.03669 0.03484 0.01476 0.00992 ... .. .. ..$ rand83 : num [1:190] 0.00626 0.03051 0.01916 0.0106 0.00712 ... .. .. ..$ rand84 : num [1:190] 0.00798 0.0368 0.02076 0.01722 0.01157 ... .. .. ..$ rand85 : num [1:190] 0.00881 0.04914 0.038 0.00313 0.0021 ... .. .. ..$ rand86 : num [1:190] 0.00357 0.02986 0.0331 0.01722 0.01157 ... .. .. ..$ rand87 : num [1:190] 0.00756 0.03043 0.01178 0.0106 0.00712 ... .. .. ..$ rand88 : num [1:190] 0.00756 0.03114 0.01304 0.0164 0.01102 ... .. .. ..$ rand89 : num [1:190] 0.00543 0.0226 0.00979 0.01558 0.01047 ... .. .. ..$ rand90 : num [1:190] 0.00626 0.02568 0.01058 0.01722 0.01157 ... .. .. ..$ rand91 : num [1:190] 0.00127 0.02316 0.03408 0.0164 0.01102 ... .. .. ..$ rand92 : num [1:190] 0.00756 0.03916 0.02731 0.01558 0.01047 ... .. .. ..$ rand93 : num [1:190] 0.00881 0.04914 0.038 0.00697 0.00468 ... .. .. ..$ rand94 : num [1:190] 0.00756 0.04452 0.03683 0.00248 0.00166 ... .. .. ..$ rand95 : num [1:190] 0.0016 0.0134 0.0148 0.0156 0.0105 ... .. .. ..$ rand96 : num [1:190] 0.00357 0.01972 0.01508 0.0106 0.00712 ... .. .. ..$ rand97 : num [1:190] 0.00543 0.03309 0.02845 0.00615 0.00413 ... .. .. .. [list output truncated] .. ..$ :'data.frame': 190 obs. of 102 variables: .. .. 
..$ name1 : Factor w/ 19 levels "s10","s11","s12",..: 11 13 14 15 16 17 18 19 1 2 ... .. .. ..$ name2 : Factor w/ 19 levels "s1","s10","s11",..: 1 1 1 1 1 1 1 1 1 1 ... .. .. ..$ rand1 : num [1:190] 0.1008 0.0194 0.1046 0.0538 0 ... .. .. ..$ rand2 : num [1:190] 0.1009 0.0418 0.0792 0.059 0 ... .. .. ..$ rand3 : num [1:190] 0.093 0.0289 0.0847 0.0521 0 ... .. .. ..$ rand4 : num [1:190] 0.0695 0.0198 0.0566 0.0386 0 ... .. .. ..$ rand5 : num [1:190] 0.1207 0.0288 0.1134 0.0656 0 ... .. .. ..$ rand6 : num [1:190] 0.0985 0.0328 0.0897 0.0557 0 ... .. .. ..$ rand7 : num [1:190] 0.0769 0.0274 0.0749 0.0439 0 ... .. .. ..$ rand8 : num [1:190] 0.0659 0.0216 0.0713 0.0372 0 ... .. .. ..$ rand9 : num [1:190] 0.1118 0.0418 0.1078 0.0643 0 ... .. .. ..$ rand10 : num [1:190] 0.1054 0.0256 0.0966 0.0574 0 ... .. .. ..$ rand11 : num [1:190] 0.0823 0.0288 0.0721 0.0469 0 ... .. .. ..$ rand12 : num [1:190] 0.1124 0.0274 0.0929 0.0613 0 ... .. .. ..$ rand13 : num [1:190] 0.1193 0.044 0.1063 0.0685 0 ... .. .. ..$ rand14 : num [1:190] 0.1036 0.0384 0.094 0.0595 0 ... .. .. ..$ rand15 : num [1:190] 0.0938 0.0271 0.0728 0.0521 0 ... .. .. ..$ rand16 : num [1:190] 0.1036 0.0384 0.0924 0.0595 0 ... .. .. ..$ rand17 : num [1:190] 0.1112 0.0385 0.0964 0.0633 0 ... .. .. ..$ rand18 : num [1:190] 0.0881 0.0328 0.0837 0.0506 0 ... .. .. ..$ rand19 : num [1:190] 0.0876 0.0271 0.0684 0.0491 0 ... .. .. ..$ rand20 : num [1:190] 0.0846 0.0198 0.0862 0.0459 0 ... .. .. ..$ rand21 : num [1:190] 0.1051 0.0148 0.101 0.0548 0 ... .. .. ..$ rand22 : num [1:190] 0.1387 0.0384 0.1101 0.0766 0 ... .. .. ..$ rand23 : num [1:190] 0.08 0.0295 0.0768 0.0459 0 ... .. .. ..$ rand24 : num [1:190] 0.1174 0.0311 0.1051 0.0646 0 ... .. .. ..$ rand25 : num [1:190] 0.0883 0.0256 0.0691 0.0491 0 ... .. .. ..$ rand26 : num [1:190] 0.1071 0.0385 0.088 0.0613 0 ... .. .. ..$ rand27 : num [1:190] 0.1282 0.0311 0.1127 0.0699 0 ... .. .. ..$ rand28 : num [1:190] 0.1254 0.044 0.12 0.0715 0 ... .. .. 
..$ rand29 : num [1:190] 0.1009 0.0418 0.0917 0.059 0 ... .. .. ..$ rand30 : num [1:190] 0.1054 0.0194 0.0979 0.056 0 ... .. .. ..$ rand31 : num [1:190] 0.099 0.0328 0.0919 0.056 0 ... .. .. ..$ rand32 : num [1:190] 0.0938 0.0271 0.0728 0.0521 0 ... .. .. ..$ rand33 : num [1:190] 0.1008 0.0311 0.1012 0.0565 0 ... .. .. ..$ rand34 : num [1:190] 0.1147 0.0418 0.0949 0.0657 0 ... .. .. ..$ rand35 : num [1:190] 0.0825 0.0289 0.0677 0.047 0 ... .. .. ..$ rand36 : num [1:190] 0.1031 0.0328 0.0926 0.058 0 ... .. .. ..$ rand37 : num [1:190] 0.0899 0.0385 0.0949 0.0529 0 ... .. .. ..$ rand38 : num [1:190] 0.1104 0.0216 0.1006 0.059 0 ... .. .. ..$ rand39 : num [1:190] 0.0869 0.0194 0.0928 0.047 0 ... .. .. ..$ rand40 : num [1:190] 0.0501 0.0198 0.0605 0.0291 0 ... .. .. ..$ rand41 : num [1:190] 0.094 0.0296 0.0917 0.0528 0 ... .. .. ..$ rand42 : num [1:190] 0.0865 0.0407 0.0823 0.0517 0 ... .. .. ..$ rand43 : num [1:190] 0.1221 0.044 0.1069 0.0699 0 ... .. .. ..$ rand44 : num [1:190] 0.0541 0.0114 0.0642 0.0291 0 ... .. .. ..$ rand45 : num [1:190] 0.0805 0.0198 0.0782 0.0439 0 ... .. .. ..$ rand46 : num [1:190] 0.0934 0.0216 0.1001 0.0506 0 ... .. .. ..$ rand47 : num [1:190] 0.094 0.0274 0.0969 0.0522 0 ... .. .. ..$ rand48 : num [1:190] 0.1015 0.0296 0.085 0.0565 0 ... .. .. ..$ rand49 : num [1:190] 0.1015 0.0296 0.1019 0.0565 0 ... .. .. ..$ rand50 : num [1:190] 0.0908 0.0198 0.0857 0.049 0 ... .. .. ..$ rand51 : num [1:190] 0.0879 0.0418 0.0761 0.0527 0 ... .. .. ..$ rand52 : num [1:190] 0.1118 0.0271 0.0854 0.0609 0 ... .. .. ..$ rand53 : num [1:190] 0.1032 0.0361 0.0784 0.0588 0 ... .. .. ..$ rand54 : num [1:190] 0.0794 0.0296 0.0885 0.0456 0 ... .. .. ..$ rand55 : num [1:190] 0.1112 0.0384 0.0909 0.0632 0 ... .. .. ..$ rand56 : num [1:190] 0.133 0.0418 0.114 0.0746 0 ... .. .. ..$ rand57 : num [1:190] 0.0988 0.0148 0.0938 0.0517 0 ... .. .. ..$ rand58 : num [1:190] 0.1199 0.0296 0.1148 0.0655 0 ... .. .. ..$ rand59 : num [1:190] 0.1435 0.0418 0.1215 0.0798 0 ... .. .. 
..$ rand60 : num [1:190] 0.1249 0.0361 0.1007 0.0694 0 ... .. .. ..$ rand61 : num [1:190] 0.0917 0.0256 0.0855 0.0508 0 ... .. .. ..$ rand62 : num [1:190] 0.1032 0.0361 0.094 0.0588 0 ... .. .. ..$ rand63 : num [1:190] 0.1252 0.0289 0.1136 0.0679 0 ... .. .. ..$ rand64 : num [1:190] 0.0965 0.0194 0.1025 0.0517 0 ... .. .. ..$ rand65 : num [1:190] 0.0766 0.0271 0.0768 0.0437 0 ... .. .. ..$ rand66 : num [1:190] 0.0995 0.0184 0.1038 0.0529 0 ... .. .. ..$ rand67 : num [1:190] 0.1209 0.0289 0.1205 0.0657 0 ... .. .. ..$ rand68 : num [1:190] 0.1118 0.0418 0.1078 0.0643 0 ... .. .. ..$ rand69 : num [1:190] 0.068 0.0114 0.0691 0.0358 0 ... .. .. ..$ rand70 : num [1:190] 0.0674 0.0184 0.0708 0.0372 0 ... .. .. ..$ rand71 : num [1:190] 0.1124 0.0274 0.1128 0.0613 0 ... .. .. ..$ rand72 : num [1:190] 0.0944 0.0296 0.092 0.053 0 ... .. .. ..$ rand73 : num [1:190] 0.1237 0.0385 0.1194 0.0694 0 ... .. .. ..$ rand74 : num [1:190] 0.0801 0.0296 0.0771 0.046 0 ... .. .. ..$ rand75 : num [1:190] 0.0944 0.0296 0.0805 0.053 0 ... .. .. ..$ rand76 : num [1:190] 0.1032 0.0361 0.1024 0.0588 0 ... .. .. ..$ rand77 : num [1:190] 0.1009 0.0418 0.0917 0.059 0 ... .. .. ..$ rand78 : num [1:190] 0.0733 0.0361 0.0701 0.0442 0 ... .. .. ..$ rand79 : num [1:190] 0.1051 0.0148 0.0911 0.0548 0 ... .. .. ..$ rand80 : num [1:190] 0.0822 0.0311 0.0797 0.0474 0 ... .. .. ..$ rand81 : num [1:190] 0.0986 0.0295 0.0834 0.055 0 ... .. .. ..$ rand82 : num [1:190] 0.0865 0.0407 0.0823 0.0517 0 ... .. .. ..$ rand83 : num [1:190] 0.0659 0.0216 0.0713 0.0372 0 ... .. .. ..$ rand84 : num [1:190] 0.1038 0.0385 0.0946 0.0596 0 ... .. .. ..$ rand85 : num [1:190] 0.0999 0.044 0.0991 0.059 0 ... .. .. ..$ rand86 : num [1:190] 0.1174 0.0407 0.0949 0.0668 0 ... .. .. ..$ rand87 : num [1:190] 0.1326 0.0385 0.1113 0.0737 0 ... .. .. ..$ rand88 : num [1:190] 0.0846 0.0198 0.0811 0.0459 0 ... .. .. ..$ rand89 : num [1:190] 0.0769 0.0274 0.0749 0.0439 0 ... .. .. ..$ rand90 : num [1:190] 0.0755 0.0114 0.0772 0.0395 0 ... 
.. .. ..$ rand91 : num [1:190] 0.136 0.044 0.1166 0.0766 0 ... .. .. ..$ rand92 : num [1:190] 0.1036 0.0384 0.0924 0.0595 0 ... .. .. ..$ rand93 : num [1:190] 0.0501 0.0198 0.0491 0.0291 0 ... .. .. ..$ rand94 : num [1:190] 0.0562 0.0148 0.0739 0.0309 0 ... .. .. ..$ rand95 : num [1:190] 0.0902 0.0289 0.084 0.0508 0 ... .. .. ..$ rand96 : num [1:190] 0.086 0.0216 0.0869 0.047 0 ... .. .. ..$ rand97 : num [1:190] 0.093 0.0289 0.0847 0.0521 0 ... .. .. .. [list output truncated] .. ..$ :'data.frame': 190 obs. of 102 variables: .. .. ..$ name1 : Factor w/ 19 levels "s10","s11","s12",..: 11 13 14 15 16 17 18 19 1 2 ... .. .. ..$ name2 : Factor w/ 19 levels "s1","s10","s11",..: 1 1 1 1 1 1 1 1 1 1 ... .. .. ..$ rand1 : num [1:190] 1.36 1.25 1.23 1.66 2.19 ... .. .. ..$ rand2 : num [1:190] 1.46 1.25 1.27 1.79 2.31 ... .. .. ..$ rand3 : num [1:190] 1.34 1.15 1.06 1.52 1.96 ... .. .. ..$ rand4 : num [1:190] 1.26 1.13 1.03 1.43 1.94 ... .. .. ..$ rand5 : num [1:190] 1.349 1.137 0.997 1.545 1.841 ... .. .. ..$ rand6 : num [1:190] 1.43 1.27 1.23 1.67 2.1 ... .. .. ..$ rand7 : num [1:190] 1.12 1.02 1.04 1.48 2.07 ... .. .. ..$ rand8 : num [1:190] 1.38 1.17 1.17 1.8 2.34 ... .. .. ..$ rand9 : num [1:190] 1.15 0.934 0.867 1.5 1.997 ... .. .. ..$ rand10 : num [1:190] 1.48 1.26 1.22 1.75 2.26 ... .. .. ..$ rand11 : num [1:190] 1.39 1.25 1.22 1.48 1.98 ... .. .. ..$ rand12 : num [1:190] 1.28 1.11 1.17 1.87 2.41 ... .. .. ..$ rand13 : num [1:190] 1.328 1.109 0.973 1.653 2.036 ... .. .. ..$ rand14 : num [1:190] 1.5 1.3 1.28 1.62 2.06 ... .. .. ..$ rand15 : num [1:190] 1.35 1.24 1.27 1.74 2.27 ... .. .. ..$ rand16 : num [1:190] 0.99 0.865 0.923 1.242 1.662 ... .. .. ..$ rand17 : num [1:190] 1.26 1.15 1.08 1.52 1.98 ... .. .. ..$ rand18 : num [1:190] 1.259 1.08 0.976 1.326 1.75 ... .. .. ..$ rand19 : num [1:190] 1.72 1.53 1.44 2.03 2.62 ... .. .. ..$ rand20 : num [1:190] 1.085 0.868 0.893 1.23 1.578 ... .. .. ..$ rand21 : num [1:190] 0.969 0.938 0.883 1.278 1.695 ... .. .. 
..$ rand22 : num [1:190] 1.17 1.03 1 1.4 1.81 ... .. .. ..$ rand23 : num [1:190] 1.26 1.04 1.06 1.36 1.72 ... .. .. ..$ rand24 : num [1:190] 1.144 0.993 0.991 1.479 1.978 ... .. .. ..$ rand25 : num [1:190] 0.973 0.818 0.878 1.294 1.771 ... .. .. ..$ rand26 : num [1:190] 1.44 1.29 1.25 1.82 2.33 ... .. .. ..$ rand27 : num [1:190] 1.34 1.23 1.22 1.55 1.9 ... .. .. ..$ rand28 : num [1:190] 1.21 1.12 1.11 1.6 2.06 ... .. .. ..$ rand29 : num [1:190] 0.996 0.869 0.977 1.239 1.724 ... .. .. ..$ rand30 : num [1:190] 1.29 1.04 1.03 1.59 2 ... .. .. ..$ rand31 : num [1:190] 0.991 0.807 0.907 1.473 1.771 ... .. .. ..$ rand32 : num [1:190] 1.59 1.3 1.27 1.7 2.23 ... .. .. ..$ rand33 : num [1:190] 1.52 1.44 1.3 1.92 2.46 ... .. .. ..$ rand34 : num [1:190] 1.12 1.01 1 1.41 1.77 ... .. .. ..$ rand35 : num [1:190] 1.76 1.55 1.51 1.88 2.43 ... .. .. ..$ rand36 : num [1:190] 1.13 1.02 1.08 1.43 1.92 ... .. .. ..$ rand37 : num [1:190] 1.57 1.25 1.39 1.91 2.4 ... .. .. ..$ rand38 : num [1:190] 1.4 1.24 1.24 1.66 2.08 ... .. .. ..$ rand39 : num [1:190] 1.43 1.18 1.13 1.66 2.15 ... .. .. ..$ rand40 : num [1:190] 1.78 1.5 1.38 1.84 2.31 ... .. .. ..$ rand41 : num [1:190] 1.33 1.09 1.06 1.55 1.94 ... .. .. ..$ rand42 : num [1:190] 1.45 1.33 1.22 1.76 2.31 ... .. .. ..$ rand43 : num [1:190] 1.119 1.014 0.993 1.496 1.879 ... .. .. ..$ rand44 : num [1:190] 1.4 1.29 1.18 1.63 2.17 ... .. .. ..$ rand45 : num [1:190] 1.59 1.45 1.29 1.68 2.2 ... .. .. ..$ rand46 : num [1:190] 1.45 1.28 1.27 1.77 2.22 ... .. .. ..$ rand47 : num [1:190] 1.24 1.05 1.09 1.5 2.03 ... .. .. ..$ rand48 : num [1:190] 1.14 1.16 1.17 1.54 2.1 ... .. .. ..$ rand49 : num [1:190] 0.951 0.928 0.924 1.384 1.856 ... .. .. ..$ rand50 : num [1:190] 1.39 1.24 1.23 1.65 2.05 ... .. .. ..$ rand51 : num [1:190] 1.233 1.037 0.956 1.522 1.937 ... .. .. ..$ rand52 : num [1:190] 0.962 0.815 0.929 1.202 1.568 ... .. .. ..$ rand53 : num [1:190] 1.42 1.3 1.28 1.95 2.46 ... .. .. ..$ rand54 : num [1:190] 0.936 0.733 0.769 1.24 1.453 ... .. 
.. ..$ rand55 : num [1:190] 1.197 0.968 0.983 1.272 1.743 ... .. .. ..$ rand56 : num [1:190] 1.48 1.37 1.29 1.7 2.33 ... .. .. ..$ rand57 : num [1:190] 1.038 0.976 1.023 1.476 1.916 ... .. .. ..$ rand58 : num [1:190] 1.42 1.3 1.2 1.69 2.18 ... .. .. ..$ rand59 : num [1:190] 0.979 0.93 0.833 1.418 1.849 ... .. .. ..$ rand60 : num [1:190] 1.33 1.11 1.15 1.63 2.09 ... .. .. ..$ rand61 : num [1:190] 1.54 1.23 1.13 1.59 2.07 ... .. .. ..$ rand62 : num [1:190] 1.117 0.805 0.844 1.329 1.576 ... .. .. ..$ rand63 : num [1:190] 1.25 1.12 1.21 1.44 1.82 ... .. .. ..$ rand64 : num [1:190] 1.36 1.18 1.13 1.6 2.09 ... .. .. ..$ rand65 : num [1:190] 1.15 1.029 0.947 1.348 1.851 ... .. .. ..$ rand66 : num [1:190] 1.143 0.963 0.956 1.278 1.784 ... .. .. ..$ rand67 : num [1:190] 1.128 0.834 0.855 1.401 1.661 ... .. .. ..$ rand68 : num [1:190] 1.02 0.93 0.87 1.57 1.97 ... .. .. ..$ rand69 : num [1:190] 0.984 0.931 0.919 1.472 1.913 ... .. .. ..$ rand70 : num [1:190] 1.092 0.835 0.889 1.337 1.635 ... .. .. ..$ rand71 : num [1:190] 1.2 1.104 0.992 1.777 2.262 ... .. .. ..$ rand72 : num [1:190] 1.21 1.06 1.14 1.27 1.64 ... .. .. ..$ rand73 : num [1:190] 1.32 1.07 0.93 1.55 2.07 ... .. .. ..$ rand74 : num [1:190] 1.5 1.3 1.26 1.74 2.27 ... .. .. ..$ rand75 : num [1:190] 1.62 1.46 1.42 1.86 2.45 ... .. .. ..$ rand76 : num [1:190] 1.28 1.09 1.13 1.7 2.2 ... .. .. ..$ rand77 : num [1:190] 1.156 0.874 0.936 1.548 2.069 ... .. .. ..$ rand78 : num [1:190] 1.201 0.989 0.922 1.352 1.769 ... .. .. ..$ rand79 : num [1:190] 1.34 1.12 1.14 1.37 1.8 ... .. .. ..$ rand80 : num [1:190] 1.31 1.12 1.02 1.59 2.06 ... .. .. ..$ rand81 : num [1:190] 1.43 1.13 1.03 1.49 1.94 ... .. .. ..$ rand82 : num [1:190] 1.216 0.798 0.897 1.465 1.809 ... .. .. ..$ rand83 : num [1:190] 1.139 1.048 0.957 1.462 1.843 ... .. .. ..$ rand84 : num [1:190] 1.5 1.23 1.18 1.76 2.26 ... .. .. ..$ rand85 : num [1:190] 1.58 1.46 1.45 1.87 2.53 ... .. .. ..$ rand86 : num [1:190] 1.39 1.22 1.18 1.65 2.14 ... .. .. 
..$ rand87 : num [1:190] 0.994 0.731 0.766 1.279 1.735 ... .. .. ..$ rand88 : num [1:190] 1.15 1.04 1.06 1.54 2.01 ... .. .. ..$ rand89 : num [1:190] 1.19 1.02 1.05 1.46 1.84 ... .. .. ..$ rand90 : num [1:190] 1.65 1.39 1.35 1.93 2.29 ... .. .. ..$ rand91 : num [1:190] 1.26 1.01 1.04 1.39 1.81 ... .. .. ..$ rand92 : num [1:190] 1.57 1.35 1.3 1.76 2.16 ... .. .. ..$ rand93 : num [1:190] 1.22 1.09 1.11 1.31 1.74 ... .. .. ..$ rand94 : num [1:190] 1.4 1.31 1.24 1.78 2.35 ... .. .. ..$ rand95 : num [1:190] 1.71 1.48 1.36 1.92 2.48 ... .. .. ..$ rand96 : num [1:190] 1.4 1.23 1.04 1.61 2.02 ... .. .. ..$ rand97 : num [1:190] 1.47 1.23 1.18 1.5 1.96 ... .. .. .. [list output truncated] ..$ BC.obs :List of 3 .. ..$ : num [1:20, 1:20] 0 0.0317 0.1653 0.1389 0.051 ... .. .. ..- attr(*, "dimnames")=List of 2 .. .. .. ..$ : chr [1:20] "s1" "s2" "s3" "s4" ... .. .. .. ..$ : chr [1:20] "s1" "s2" "s3" "s4" ... .. ..$ : num [1:20, 1:20] 0 0.1468 0.0861 0.1444 0.1111 ... .. .. ..- attr(*, "dimnames")=List of 2 .. .. .. ..$ : chr [1:20] "s1" "s2" "s3" "s4" ... .. .. .. ..$ : chr [1:20] "s1" "s2" "s3" "s4" ... .. ..$ : num [1:20, 1:20] 0 0.345 0.263 0.317 0.494 ... .. .. ..- attr(*, "dimnames")=List of 2 .. .. .. ..$ : chr [1:20] "s1" "s2" "s3" "s4" ... .. .. .. ..$ : chr [1:20] "s1" "s2" "s3" "s4" ... ..$ BCa.rand :List of 3 .. ..$ :'data.frame': 190 obs. of 102 variables: .. .. ..$ name1 : Factor w/ 19 levels "s10","s11","s12",..: 11 13 14 15 16 17 18 19 1 2 ... .. .. ..$ name2 : Factor w/ 19 levels "s1","s10","s11",..: 1 1 1 1 1 1 1 1 1 1 ... .. .. ..$ rand1 : num [1:190] 0.127 0.1931 0.0333 0.137 0.0451 ... .. .. ..$ rand2 : num [1:190] 0.1429 0.0875 0.0778 0.0465 0.0208 ... .. .. ..$ rand3 : num [1:190] 0.1071 0.125 0.0667 0.0698 0.0521 ... .. .. ..$ rand4 : num [1:190] 0.19 0.225 0.206 0.262 0.281 ... .. .. ..$ rand5 : num [1:190] 0.107 0.296 0.117 0.095 0.104 ... .. .. ..$ rand6 : num [1:190] 0.123 0.169 0.194 0.148 0.163 ... .. .. 
..$ rand7 : num [1:190] 0 0.1125 0.0889 0.0698 0.0104 ... .. .. ..$ rand8 : num [1:190] 0.107 0.167 0.178 0.143 0.104 ... .. .. ..$ rand9 : num [1:190] 0.127 0.1431 0.0778 0.1253 0.0972 ... .. .. ..$ rand10 : num [1:190] 0.1587 0.1403 0.0278 0.1208 0.0903 ... .. .. ..$ rand11 : num [1:190] 0.1548 0.1958 0.0944 0.1415 0.0938 ... .. .. ..$ rand12 : num [1:190] 0.0833 0.0375 0.1111 0.093 0.1562 ... .. .. ..$ rand13 : num [1:190] 0.111 0.211 0.189 0.123 0.142 ... .. .. ..$ rand14 : num [1:190] 0.0992 0.1361 0.1556 0.1693 0.2049 ... .. .. ..$ rand15 : num [1:190] 0.127 0.156 0.1 0.16 0.16 ... .. .. ..$ rand16 : num [1:190] 0.07937 0.07778 0.00556 0.13243 0.07986 ... .. .. ..$ rand17 : num [1:190] 0.234 0.16 0.244 0.222 0.253 ... .. .. ..$ rand18 : num [1:190] 0.0754 0.1653 0.15 0.1441 0.0278 ... .. .. ..$ rand19 : num [1:190] 0.1905 0.1125 0.1556 0.093 0.0104 ... .. .. ..$ rand20 : num [1:190] 0.337 0.29 0.3 0.243 0.236 ... .. .. ..$ rand21 : num [1:190] 0.1706 0.1736 0.1889 0.0995 0.1944 ... .. .. ..$ rand22 : num [1:190] 0.0952 0.0375 0.1889 0.1395 0.0833 ... .. .. ..$ rand23 : num [1:190] 0.0595 0.1333 0.1722 0.1531 0.059 ... .. .. ..$ rand24 : num [1:190] 0.19 0.204 0.178 0.167 0.115 ... .. .. ..$ rand25 : num [1:190] 0.0952 0.05 0.1889 0.093 0.0417 ... .. .. ..$ rand26 : num [1:190] 0.246 0.21 0.267 0.257 0.264 ... .. .. ..$ rand27 : num [1:190] 0.1786 0.2125 0.0222 0.0698 0.125 ... .. .. ..$ rand28 : num [1:190] 0.2262 0.1292 0.0556 0.1085 0.1875 ... .. .. ..$ rand29 : num [1:190] 0.119 0.075 0.0778 0.0465 0.1042 ... .. .. ..$ rand30 : num [1:190] 0.0595 0.075 0.1 0.1163 0.125 ... .. .. ..$ rand31 : num [1:190] 0.119 0.1 0.167 0.128 0 ... .. .. ..$ rand32 : num [1:190] 0.131 0.0625 0.1667 0.0814 0.0833 ... .. .. ..$ rand33 : num [1:190] 0.0119 0.0958 0.1389 0.1996 0.1667 ... .. .. ..$ rand34 : num [1:190] 0.119 0 0.1444 0.0465 0.125 ... .. .. ..$ rand35 : num [1:190] 0.23 0.249 0.144 0.111 0.132 ... .. .. 
..$ rand36 : num [1:190] 0.0357 0.15 0.0333 0.0233 0.1146 ... .. .. ..$ rand37 : num [1:190] 0.0675 0.0681 0.0889 0.1486 0.1389 ... .. .. ..$ rand38 : num [1:190] 0.0119 0.1 0.1444 0.1279 0.125 ... .. .. ..$ rand39 : num [1:190] 0.0873 0.1653 0.1278 0.051 0.1528 ... .. .. ..$ rand40 : num [1:190] 0.246 0.222 0.233 0.257 0.243 ... .. .. ..$ rand41 : num [1:190] 0.119 0.125 0.0778 0.0814 0.0521 ... .. .. ..$ rand42 : num [1:190] 0.0714 0.125 0.1333 0.0698 0.1562 ... .. .. ..$ rand43 : num [1:190] 0.0595 0.1 0.0778 0.0698 0.0729 ... .. .. ..$ rand44 : num [1:190] 0.1429 0.15 0.0333 0.0814 0.1042 ... .. .. ..$ rand45 : num [1:190] 0.0476 0.1 0.1444 0.0814 0.0938 ... .. .. ..$ rand46 : num [1:190] 0.1667 0.075 0.0111 0.093 0.0521 ... .. .. ..$ rand47 : num [1:190] 0.0317 0.1306 0.1667 0.0904 0.0556 ... .. .. ..$ rand48 : num [1:190] 0.119 0.1125 0.0667 0.0349 0.1562 ... .. .. ..$ rand49 : num [1:190] 0.0476 0 0.0444 0.0465 0.1354 ... .. .. ..$ rand50 : num [1:190] 0.163 0.133 0.161 0.095 0.125 ... .. .. ..$ rand51 : num [1:190] 0.0595 0.125 0.1 0.0698 0.1042 ... .. .. ..$ rand52 : num [1:190] 0.1667 0.15 0.0889 0.1628 0 ... .. .. ..$ rand53 : num [1:190] 0.1667 0 0.0889 0.093 0 ... .. .. ..$ rand54 : num [1:190] 0.0357 0.125 0.1111 0.1163 0.1354 ... .. .. ..$ rand55 : num [1:190] 0.0873 0.1167 0.1667 0.1667 0.1562 ... .. .. ..$ rand56 : num [1:190] 0.1587 0.2736 0.1 0.0762 0.1632 ... .. .. ..$ rand57 : num [1:190] 0.0556 0.1181 0.0667 0.084 0.0278 ... .. .. ..$ rand58 : num [1:190] 0.0833 0.1875 0.1 0.1279 0.0938 ... .. .. ..$ rand59 : num [1:190] 0.1111 0.0736 0.1778 0.0879 0.1632 ... .. .. ..$ rand60 : num [1:190] 0.1746 0.1639 0.1833 0.0691 0.1424 ... .. .. ..$ rand61 : num [1:190] 0 0.1375 0.1222 0.093 0.0833 ... .. .. ..$ rand62 : num [1:190] 0.1151 0.1403 0.0833 0.0536 0.0799 ... .. .. ..$ rand63 : num [1:190] 0.135 0.178 0.106 0.121 0.17 ... .. .. ..$ rand64 : num [1:190] 0.0476 0.1 0.1556 0.1047 0.125 ... .. .. 
..$ rand65 : num [1:190] 0.131 0.0375 0.0778 0.0698 0.0208 ... .. .. ..$ rand66 : num [1:190] 0.25 0.25 0.139 0.18 0.271 ... .. .. ..$ rand67 : num [1:190] 0.0357 0.1375 0.0667 0.0698 0 ... .. .. ..$ rand68 : num [1:190] 0.1071 0.0833 0.1056 0.1298 0.1875 ... .. .. ..$ rand69 : num [1:190] 0.234 0.164 0.172 0.185 0.181 ... .. .. ..$ rand70 : num [1:190] 0.226 0.204 0.233 0.19 0.198 ... .. .. ..$ rand71 : num [1:190] 0.0714 0.1125 0.0444 0.093 0.1146 ... .. .. ..$ rand72 : num [1:190] 0.1508 0.0889 0.1611 0.1156 0.1181 ... .. .. ..$ rand73 : num [1:190] 0.175 0.226 0.106 0.127 0.191 ... .. .. ..$ rand74 : num [1:190] 0.0357 0.125 0.1556 0.0233 0.0729 ... .. .. ..$ rand75 : num [1:190] 0.159 0.14 0.139 0.206 0.309 ... .. .. ..$ rand76 : num [1:190] 0 0.158 0.106 0.165 0.146 ... .. .. ..$ rand77 : num [1:190] 0.123 0.161 0.122 0.169 0.163 ... .. .. ..$ rand78 : num [1:190] 0.0635 0.1611 0.1444 0.2506 0.2153 ... .. .. ..$ rand79 : num [1:190] 0.0238 0.075 0.0556 0.093 0.1042 ... .. .. ..$ rand80 : num [1:190] 0.155 0.246 0.106 0.095 0.198 ... .. .. ..$ rand81 : num [1:190] 0.1706 0.0861 0.1889 0.1227 0.1944 ... .. .. ..$ rand82 : num [1:190] 0.262 0.154 0.289 0.202 0.135 ... .. .. ..$ rand83 : num [1:190] 0.127 0.0403 0.1278 0.1208 0.0972 ... .. .. ..$ rand84 : num [1:190] 0.00397 0.05556 0.11111 0.07881 0.07639 ... .. .. ..$ rand85 : num [1:190] 0.313 0.278 0.211 0.15 0.267 ... .. .. ..$ rand86 : num [1:190] 0.23 0.136 0.133 0.204 0.111 ... .. .. ..$ rand87 : num [1:190] 0 0.1875 0.1444 0.1047 0.0833 ... .. .. ..$ rand88 : num [1:190] 0.2143 0.125 0.0333 0.1047 0.1771 ... .. .. ..$ rand89 : num [1:190] 0.333 0.358 0.267 0.403 0.312 ... .. .. ..$ rand90 : num [1:190] 0.23 0.194 0.261 0.218 0.174 ... .. .. ..$ rand91 : num [1:190] 0.179 0.167 0.256 0.202 0.167 ... .. .. ..$ rand92 : num [1:190] 0.0357 0.0375 0.0556 0.1512 0.0729 ... .. .. ..$ rand93 : num [1:190] 0.0238 0.1625 0.1667 0.1628 0.0833 ... .. .. ..$ rand94 : num [1:190] 0.131 0.1875 0.0444 0.093 0.0625 ... 
.. .. ..$ rand95 : num [1:190] 0.1032 0.1181 0.1222 0.0556 0.1076 ... .. .. ..$ rand96 : num [1:190] 0.0556 0.0931 0.0778 0.1602 0.1701 ... .. .. ..$ rand97 : num [1:190] 0.1825 0.1111 0.0778 0.0762 0.059 ... .. .. .. [list output truncated] .. ..$ :'data.frame': 190 obs. of 102 variables: .. .. ..$ name1 : Factor w/ 19 levels "s10","s11","s12",..: 11 13 14 15 16 17 18 19 1 2 ... .. .. ..$ name2 : Factor w/ 19 levels "s1","s10","s11",..: 1 1 1 1 1 1 1 1 1 1 ... .. .. ..$ rand1 : num [1:190] 0.1151 0.0306 0.0778 0.137 0.1076 ... .. .. ..$ rand2 : num [1:190] 0.0397 0.0903 0.0611 0.1324 0.0903 ... .. .. ..$ rand3 : num [1:190] 0.1032 0.1181 0.0778 0.1137 0.0868 ... .. .. ..$ rand4 : num [1:190] 0.0635 0.0403 0.0389 0.0627 0.0694 ... .. .. ..$ rand5 : num [1:190] 0.0833 0 0.0778 0.1047 0.0417 ... .. .. ..$ rand6 : num [1:190] 0.0238 0.05 0.1222 0.0465 0.0417 ... .. .. ..$ rand7 : num [1:190] 0.167 0.146 0.106 0.153 0.104 ... .. .. ..$ rand8 : num [1:190] 0.0476 0.0375 0.0556 0.0116 0.0312 ... .. .. ..$ rand9 : num [1:190] 0.0516 0.0556 0.1 0.0672 0.0868 ... .. .. ..$ rand10 : num [1:190] 0.0794 0.0556 0.0444 0.0672 0.066 ... .. .. ..$ rand11 : num [1:190] 0.0278 0.0278 0.1278 0.1092 0.059 ... .. .. ..$ rand12 : num [1:190] 0.0595 0.1958 0.1167 0.1415 0.0729 ... .. .. ..$ rand13 : num [1:190] 0.262 0.167 0.156 0.248 0.281 ... .. .. ..$ rand14 : num [1:190] 0.0992 0.0861 0.0889 0.0762 0.0694 ... .. .. ..$ rand15 : num [1:190] 0.1706 0.1361 0.0778 0.146 0.1111 ... .. .. ..$ rand16 : num [1:190] 0.1349 0.1153 0.05 0.0743 0.0486 ... .. .. ..$ rand17 : num [1:190] 0.0952 0.0833 0.0722 0.095 0.1042 ... .. .. ..$ rand18 : num [1:190] 0.0833 0.025 0 0.0698 0.0729 ... .. .. ..$ rand19 : num [1:190] 0.0119 0.075 0.0333 0.1279 0.0625 ... .. .. ..$ rand20 : num [1:190] 0.0595 0.0125 0.0889 0.0814 0.0312 ... .. .. ..$ rand21 : num [1:190] 0.0675 0.0806 0.1222 0.1486 0.0347 ... .. .. ..$ rand22 : num [1:190] 0.0278 0.0403 0.0278 0.1092 0.1111 ... .. .. 
..$ rand23 : num [1:190] 0.0714 0.0625 0.0333 0.0581 0.0417 ... .. .. ..$ rand24 : num [1:190] 0.0357 0.05 0.0222 0.0233 0.0625 ... .. .. ..$ rand25 : num [1:190] 0.0119 0.0375 0.0333 0.0698 0.0729 ... .. .. ..$ rand26 : num [1:190] 0.127 0.1181 0.1 0.0323 0.0972 ... .. .. ..$ rand27 : num [1:190] 0.1429 0.0958 0.1056 0.0975 0.1042 ... .. .. ..$ rand28 : num [1:190] 0.1071 0.0375 0.0444 0.0233 0.0417 ... .. .. ..$ rand29 : num [1:190] 0.23 0.219 0.139 0.206 0.122 ... .. .. ..$ rand30 : num [1:190] 0.183 0.186 0.189 0.146 0.111 ... .. .. ..$ rand31 : num [1:190] 0.0476 0.2333 0.0722 0.1415 0.1563 ... .. .. ..$ rand32 : num [1:190] 0.187 0.247 0.256 0.234 0.253 ... .. .. ..$ rand33 : num [1:190] 0.0595 0.0125 0.1 0.0233 0.0312 ... .. .. ..$ rand34 : num [1:190] 0.0238 0.05 0.0222 0.0465 0 ... .. .. ..$ rand35 : num [1:190] 0.0952 0.05 0.0667 0.0349 0.1042 ... .. .. ..$ rand36 : num [1:190] 0.0476 0.075 0.0667 0.1047 0.1146 ... .. .. ..$ rand37 : num [1:190] 0.0476 0.1 0.1222 0.0698 0.0938 ... .. .. ..$ rand38 : num [1:190] 0.19 0.179 0.144 0.134 0.188 ... .. .. ..$ rand39 : num [1:190] 0 0 0.0889 0.0233 0.0104 ... .. .. ..$ rand40 : num [1:190] 0.04365 0.06806 0.05556 0.09044 0.00347 ... .. .. ..$ rand41 : num [1:190] 0.0119 0.0375 0.0111 0.1047 0.0729 ... .. .. ..$ rand42 : num [1:190] 0.214 0.275 0.183 0.192 0.25 ... .. .. ..$ rand43 : num [1:190] 0.0913 0.0931 0.0556 0.137 0.1181 ... .. .. ..$ rand44 : num [1:190] 0.1587 0.0861 0.1111 0.0762 0.0694 ... .. .. ..$ rand45 : num [1:190] 0.143 0.167 0.189 0.178 0.26 ... .. .. ..$ rand46 : num [1:190] 0.0278 0.0403 0.0389 0.0627 0.0382 ... .. .. ..$ rand47 : num [1:190] 0.0357 0.0625 0.0222 0.1137 0.0208 ... .. .. ..$ rand48 : num [1:190] 0.131 0.0708 0.1611 0.095 0.1042 ... .. .. ..$ rand49 : num [1:190] 0 0.075 0.0111 0.0465 0 ... .. .. ..$ rand50 : num [1:190] 0.119 0 0.0222 0.093 0.0625 ... .. .. ..$ rand51 : num [1:190] 0.1071 0.0625 0.0222 0.0349 0 ... .. .. 
..$ rand52 : num [1:190] 0.00794 0.10278 0.08333 0.06266 0.10069 ... .. .. ..$ rand53 : num [1:190] 0.123 0.0861 0.1222 0.1227 0.2569 ... .. .. ..$ rand54 : num [1:190] 0.1071 0.0681 0.0778 0.0439 0.1042 ... .. .. ..$ rand55 : num [1:190] 0.0357 0.0875 0.0889 0.0116 0.0208 ... .. .. ..$ rand56 : num [1:190] 0.0833 0.0458 0.0833 0.1298 0.0833 ... .. .. ..$ rand57 : num [1:190] 0.0595 0.0625 0.0667 0.0698 0.0521 ... .. .. ..$ rand58 : num [1:190] 0.135 0.136 0.1 0.123 0.101 ... .. .. ..$ rand59 : num [1:190] 0.1071 0.0833 0.0944 0.095 0.0938 ... .. .. ..$ rand60 : num [1:190] 0.0119 0 0.0333 0.0581 0.0104 ... .. .. ..$ rand61 : num [1:190] 0.1984 0.1514 0.0944 0.1389 0.1285 ... .. .. ..$ rand62 : num [1:190] 0.0833 0.0528 0.0944 0.1182 0.184 ... .. .. ..$ rand63 : num [1:190] 0.119 0.1083 0.1944 0.0368 0.0729 ... .. .. ..$ rand64 : num [1:190] 0.0556 0.0931 0.0556 0.0788 0.0868 ... .. .. ..$ rand65 : num [1:190] 0.0357 0.0875 0.0333 0.0465 0.0312 ... .. .. ..$ rand66 : num [1:190] 0.123 0.1528 0.1278 0.051 0.0278 ... .. .. ..$ rand67 : num [1:190] 0.0992 0.0528 0.1056 0.0743 0.1111 ... .. .. ..$ rand68 : num [1:190] 0.0119 0.025 0.0556 0.0349 0.0729 ... .. .. ..$ rand69 : num [1:190] 0.0754 0.1111 0.1556 0.1344 0.1007 ... .. .. ..$ rand70 : num [1:190] 0.0595 0.0375 0.1 0.0698 0.0417 ... .. .. ..$ rand71 : num [1:190] 0.0238 0.025 0.1 0.0233 0.0833 ... .. .. ..$ rand72 : num [1:190] 0.1468 0.1111 0.0778 0.0995 0.0799 ... .. .. ..$ rand73 : num [1:190] 0.0119 0.05 0 0.0116 0.125 ... .. .. ..$ rand74 : num [1:190] 0.0873 0.0528 0.05 0.1001 0.0139 ... .. .. ..$ rand75 : num [1:190] 0.0952 0.0833 0.0611 0.1066 0.0521 ... .. .. ..$ rand76 : num [1:190] 0.0714 0 0.1333 0.0581 0.0521 ... .. .. ..$ rand77 : num [1:190] 0 0.0125 0 0.0581 0 ... .. .. ..$ rand78 : num [1:190] 0.0119 0.0125 0.0556 0 0.0208 ... .. .. ..$ rand79 : num [1:190] 0.119 0.075 0.0778 0.0698 0.0208 ... .. .. ..$ rand80 : num [1:190] 0.0992 0.0278 0.0389 0.0627 0.0174 ... .. .. 
..$ rand81 : num [1:190] 0 0.0625 0.0556 0.1163 0.0521 ... .. .. ..$ rand82 : num [1:190] 0.0873 0.1486 0.1333 0.1111 0.1528 ... .. .. ..$ rand83 : num [1:190] 0.0476 0.05 0.0556 0.0581 0.0417 ... .. .. ..$ rand84 : num [1:190] 0.0833 0.0625 0 0.0116 0 ... .. .. ..$ rand85 : num [1:190] 0.0992 0.0403 0.1056 0.051 0.059 ... .. .. ..$ rand86 : num [1:190] 0.0952 0.1208 0.1167 0.0601 0.125 ... .. .. ..$ rand87 : num [1:190] 0.0357 0.0375 0.0667 0 0.0417 ... .. .. ..$ rand88 : num [1:190] 0.0952 0.075 0.1111 0.0116 0.0312 ... .. .. ..$ rand89 : num [1:190] 0.0833 0.1125 0.0667 0.0581 0.0312 ... .. .. ..$ rand90 : num [1:190] 0.1071 0.1125 0.0667 0.0814 0 ... .. .. ..$ rand91 : num [1:190] 0.0556 0.1181 0.1333 0.0672 0.0972 ... .. .. ..$ rand92 : num [1:190] 0 0.025 0.0778 0.0233 0.0625 ... .. .. ..$ rand93 : num [1:190] 0 0.0625 0.0222 0.0581 0.1354 ... .. .. ..$ rand94 : num [1:190] 0 0.025 0.0111 0.0581 0 ... .. .. ..$ rand95 : num [1:190] 0.1429 0.1083 0.0611 0.1647 0.1146 ... .. .. ..$ rand96 : num [1:190] 0.119 0.075 0.0667 0.0233 0.0938 ... .. .. ..$ rand97 : num [1:190] 0.0833 0 0.0333 0.0349 0 ... .. .. .. [list output truncated] .. ..$ :'data.frame': 190 obs. of 102 variables: .. .. ..$ name1 : Factor w/ 19 levels "s10","s11","s12",..: 11 13 14 15 16 17 18 19 1 2 ... .. .. ..$ name2 : Factor w/ 19 levels "s1","s10","s11",..: 1 1 1 1 1 1 1 1 1 1 ... .. .. ..$ rand1 : num [1:190] 0.639 0.526 0.511 0.563 0.535 ... .. .. ..$ rand2 : num [1:190] 0.603 0.572 0.683 0.561 0.618 ... .. .. ..$ rand3 : num [1:190] 0.742 0.557 0.833 0.607 0.542 ... .. .. ..$ rand4 : num [1:190] 0.627 0.41 0.444 0.397 0.649 ... .. .. ..$ rand5 : num [1:190] 0.714 0.704 0.561 0.614 0.583 ... .. .. ..$ rand6 : num [1:190] 0.409 0.319 0.394 0.457 0.538 ... .. .. ..$ rand7 : num [1:190] 0.405 0.517 0.517 0.545 0.427 ... .. .. ..$ rand8 : num [1:190] 0.456 0.696 0.489 0.58 0.58 ... .. .. ..$ rand9 : num [1:190] 0.591 0.426 0.411 0.389 0.649 ... .. .. 
..$ rand10 : num [1:190] 0.667 0.629 0.461 0.533 0.427 ... .. .. ..$ rand11 : num [1:190] 0.365 0.626 0.367 0.307 0.368 ... .. .. ..$ rand12 : num [1:190] 0.762 0.642 0.683 0.626 0.708 ... .. .. ..$ rand13 : num [1:190] 0.413 0.497 0.478 0.513 0.556 ... .. .. ..$ rand14 : num [1:190] 0.635 0.553 0.533 0.452 0.58 ... .. .. ..$ rand15 : num [1:190] 0.488 0.458 0.622 0.554 0.604 ... .. .. ..$ rand16 : num [1:190] 0.587 0.632 0.644 0.561 0.726 ... .. .. ..$ rand17 : num [1:190] 0.337 0.532 0.483 0.45 0.455 ... .. .. ..$ rand18 : num [1:190] 0.484 0.585 0.583 0.647 0.649 ... .. .. ..$ rand19 : num [1:190] 0.631 0.662 0.5 0.453 0.802 ... .. .. ..$ rand20 : num [1:190] 0.603 0.406 0.389 0.513 0.649 ... .. .. ..$ rand21 : num [1:190] 0.762 0.596 0.6 0.612 0.729 ... .. .. ..$ rand22 : num [1:190] 0.544 0.45 0.628 0.495 0.597 ... .. .. ..$ rand23 : num [1:190] 0.393 0.479 0.394 0.417 0.406 ... .. .. ..$ rand24 : num [1:190] 0.679 0.696 0.644 0.554 0.49 ... .. .. ..$ rand25 : num [1:190] 0.79 0.487 0.6 0.558 0.677 ... .. .. ..$ rand26 : num [1:190] 0.484 0.547 0.411 0.594 0.389 ... .. .. ..$ rand27 : num [1:190] 0.44 0.417 0.583 0.48 0.521 ... .. .. ..$ rand28 : num [1:190] 0.27 0.522 0.322 0.455 0.333 ... .. .. ..$ rand29 : num [1:190] 0.389 0.606 0.472 0.492 0.42 ... .. .. ..$ rand30 : num [1:190] 0.702 0.683 0.567 0.627 0.479 ... .. .. ..$ rand31 : num [1:190] 0.476 0.442 0.361 0.364 0.594 ... .. .. ..$ rand32 : num [1:190] 0.516 0.59 0.489 0.499 0.497 ... .. .. ..$ rand33 : num [1:190] 0.643 0.586 0.339 0.591 0.427 ... .. .. ..$ rand34 : num [1:190] 0.5 0.492 0.433 0.563 0.639 ... .. .. ..$ rand35 : num [1:190] 0.317 0.551 0.678 0.528 0.326 ... .. .. ..$ rand36 : num [1:190] 0.694 0.669 0.344 0.5 0.604 ... .. .. ..$ rand37 : num [1:190] 0.623 0.632 0.567 0.596 0.663 ... .. .. ..$ rand38 : num [1:190] 0.393 0.621 0.489 0.45 0.583 ... .. .. ..$ rand39 : num [1:190] 0.389 0.413 0.494 0.657 0.58 ... .. .. ..$ rand40 : num [1:190] 0.425 0.535 0.467 0.513 0.462 ... .. .. 
..$ rand41 : num [1:190] 0.623 0.713 0.478 0.558 0.625 ... .. .. ..$ rand42 : num [1:190] 0.452 0.575 0.417 0.576 0.427 ... .. .. ..$ rand43 : num [1:190] 0.452 0.482 0.6 0.514 0.434 ... .. .. ..$ rand44 : num [1:190] 0.389 0.289 0.656 0.54 0.701 ... .. .. ..$ rand45 : num [1:190] 0.738 0.458 0.622 0.671 0.542 ... .. .. ..$ rand46 : num [1:190] 0.528 0.474 0.628 0.663 0.569 ... .. .. ..$ rand47 : num [1:190] 0.52 0.576 0.544 0.494 0.681 ... .. .. ..$ rand48 : num [1:190] 0.464 0.642 0.539 0.521 0.406 ... .. .. ..$ rand49 : num [1:190] 0.579 0.925 0.611 0.767 0.615 ... .. .. ..$ rand50 : num [1:190] 0.552 0.511 0.394 0.431 0.215 ... .. .. ..$ rand51 : num [1:190] 0.69 0.812 0.811 0.826 0.812 ... .. .. ..$ rand52 : num [1:190] 0.413 0.747 0.561 0.542 0.378 ... .. .. ..$ rand53 : num [1:190] 0.567 0.714 0.544 0.366 0.514 ... .. .. ..$ rand54 : num [1:190] 0.635 0.446 0.656 0.654 0.469 ... .. .. ..$ rand55 : num [1:190] 0.655 0.546 0.656 0.473 0.469 ... .. .. ..$ rand56 : num [1:190] 0.615 0.506 0.572 0.631 0.628 ... .. .. ..$ rand57 : num [1:190] 0.48 0.708 0.444 0.572 0.601 ... .. .. ..$ rand58 : num [1:190] 0.472 0.601 0.544 0.61 0.618 ... .. .. ..$ rand59 : num [1:190] 0.496 0.501 0.483 0.399 0.264 ... .. .. ..$ rand60 : num [1:190] 0.552 0.611 0.406 0.617 0.674 ... .. .. ..$ rand61 : num [1:190] 0.516 0.536 0.406 0.605 0.538 ... .. .. ..$ rand62 : num [1:190] 0.603 0.626 0.533 0.563 0.556 ... .. .. ..$ rand63 : num [1:190] 0.484 0.514 0.567 0.726 0.41 ... .. .. ..$ rand64 : num [1:190] 0.468 0.357 0.433 0.514 0.559 ... .. .. ..$ rand65 : num [1:190] 0.548 0.625 0.511 0.628 0.573 ... .. .. ..$ rand66 : num [1:190] 0.46 0.272 0.511 0.443 0.535 ... .. .. ..$ rand67 : num [1:190] 0.627 0.685 0.606 0.582 0.611 ... .. .. ..$ rand68 : num [1:190] 0.476 0.617 0.439 0.486 0.531 ... .. .. ..$ rand69 : num [1:190] 0.429 0.375 0.428 0.494 0.51 ... .. .. ..$ rand70 : num [1:190] 0.476 0.458 0.5 0.554 0.531 ... .. .. ..$ rand71 : num [1:190] 0.683 0.751 0.367 0.628 0.483 ... .. 
.. ..$ rand72 : num [1:190] 0.44 0.25 0.517 0.483 0.531 ... .. .. ..$ rand73 : num [1:190] 0.552 0.557 0.361 0.443 0.392 ... .. .. ..$ rand74 : num [1:190] 0.496 0.767 0.506 0.635 0.524 ... .. .. ..$ rand75 : num [1:190] 0.46 0.46 0.6 0.571 0.368 ... .. .. ..$ rand76 : num [1:190] 0.643 0.567 0.639 0.498 0.594 ... .. .. ..$ rand77 : num [1:190] 0.56 0.596 0.578 0.531 0.385 ... .. .. ..$ rand78 : num [1:190] 0.615 0.601 0.522 0.578 0.542 ... .. .. ..$ rand79 : num [1:190] 0.595 0.583 0.511 0.651 0.667 ... .. .. ..$ rand80 : num [1:190] 0.675 0.551 0.511 0.633 0.472 ... .. .. ..$ rand81 : num [1:190] 0.425 0.576 0.556 0.459 0.399 ... .. .. ..$ rand82 : num [1:190] 0.579 0.597 0.511 0.548 0.545 ... .. .. ..$ rand83 : num [1:190] 0.437 0.585 0.572 0.635 0.618 ... .. .. ..$ rand84 : num [1:190] 0.611 0.432 0.733 0.63 0.569 ... .. .. ..$ rand85 : num [1:190] 0.587 0.507 0.461 0.52 0.382 ... .. .. ..$ rand86 : num [1:190] 0.413 0.393 0.55 0.48 0.535 ... .. .. ..$ rand87 : num [1:190] 0.472 0.45 0.522 0.616 0.493 ... .. .. ..$ rand88 : num [1:190] 0.595 0.675 0.489 0.442 0.604 ... .. .. ..$ rand89 : num [1:190] 0.583 0.454 0.489 0.516 0.552 ... .. .. ..$ rand90 : num [1:190] 0.433 0.593 0.606 0.488 0.535 ... .. .. ..$ rand91 : num [1:190] 0.353 0.615 0.567 0.429 0.66 ... .. .. ..$ rand92 : num [1:190] 0.702 0.562 0.511 0.547 0.552 ... .. .. ..$ rand93 : num [1:190] 0.548 0.65 0.5 0.523 0.559 ... .. .. ..$ rand94 : num [1:190] 0.536 0.487 0.878 0.57 0.812 ... .. .. ..$ rand95 : num [1:190] 0.635 0.674 0.617 0.64 0.715 ... .. .. ..$ rand96 : num [1:190] 0.444 0.54 0.344 0.584 0.528 ... .. .. ..$ rand97 : num [1:190] 0.409 0.428 0.378 0.517 0.33 ... .. .. .. [list output truncated] $ special.crct:List of 1 ..$ SigbMPDi:List of 3 .. ..$ special.ses :'data.frame': 190 obs. of 5 variables: .. .. ..$ name1: Factor w/ 19 levels "s10","s11","s12",..: 11 13 14 15 16 17 18 19 1 2 ... .. .. ..$ name2: Factor w/ 19 levels "s1","s10","s11",..: 1 1 1 1 1 1 1 1 1 1 ... .. .. 
..$ bin1 : num [1:190] 0 0 0 0 0 0 0 0 0 0 ... .. .. ..$ bin2 : num [1:190] 0 0 0 0 0 0 0 -99 0 0 ... .. .. ..$ bin3 : num [1:190] 0 0 0 0 0 0 0 0 0 0 ... .. ..$ special.rc :'data.frame': 190 obs. of 5 variables: .. .. ..$ name1: Factor w/ 19 levels "s10","s11","s12",..: 11 13 14 15 16 17 18 19 1 2 ... .. .. ..$ name2: Factor w/ 19 levels "s1","s10","s11",..: 1 1 1 1 1 1 1 1 1 1 ... .. .. ..$ bin1 : num [1:190] 0 0 0 0 0 0 0 0 0 0 ... .. .. ..$ bin2 : num [1:190] 0 0 0 0 0 0 0 -1.1 0 0 ... .. .. ..$ bin3 : num [1:190] 0 0 0 0 0 0 0 0 0 0 ... .. ..$ special.conf:'data.frame': 190 obs. of 5 variables: .. .. ..$ name1: Factor w/ 19 levels "s10","s11","s12",..: 11 13 14 15 16 17 18 19 1 2 ... .. .. ..$ name2: Factor w/ 19 levels "s1","s10","s11",..: 1 1 1 1 1 1 1 1 1 1 ... .. .. ..$ bin1 : num [1:190] 0 0 0 0 0 0 0 0 0 0 ... .. .. ..$ bin2 : num [1:190] 0 0 0 0 0 0 0 -1.1 0 0 ... .. .. ..$ bin3 : num [1:190] 0 0 0 0 0 0 0 0 0 0 ...
See the help page of icamp.big for details.
The icamp.big result obtained from example.data.
Ning, D., Yuan, M., Wu, L., Zhang, Y., Guo, X., Zhou, X. et al. (2020). A quantitative framework reveals ecological drivers of grassland microbial community assembly in response to warming. Nature Communications, 11, 4717.
data(icamp.out)
This function is usually used to check the consistency of sample names across different pairwise comparison matrices.
match.2col(check.list, name.check = NULL, rerank = TRUE, silent = FALSE)
check.list |
List, each element is a matrix. It must be given in a format like "check.list=list(A=A,B=B)". The first two columns of the matrices will be compared and matched with each other. |
name.check |
Matrix, the first two columns will be used as the standard. Pairs that do not appear in this matrix will be removed from all matrices. |
rerank |
Logical, whether to put the first two columns of all matrices in the same order. Default is TRUE. |
silent |
Logical, whether to suppress messages. Default is FALSE, so all messages are shown. |
A tool to match IDs.
Returns a list of new matrices that share the same first two columns. A message is shown if some names are removed, or if all names match.
Version 2: 2020.8.19, add example. Version 1: 2018.10.20
Daliang Ning
# here two simple matrices are generated, and the pairwise comparison IDs
# that do not match are removed.
A=1:5
names(A)=paste0("S",1:5)
B=1:6
names(B)=paste0("S",1:6)
DA3c=dist.3col(dist(A))
DB3c=dist.3col(dist(B))
checkid=match.2col(check.list = list(DA3c=DA3c,DB3c=DB3c))
DA3cnew=checkid$DA3c
DB3cnew=checkid$DB3c
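The matching idea can also be sketched in base R without iCAMP. This is only an illustration under stated assumptions, not the package's implementation: the helpers `match_pairs` and `pairs_of` are hypothetical, and each table is assumed to hold the two sample IDs in its first two columns.

```r
# Hypothetical helper (not part of iCAMP): keep only the sample pairs shared by
# every three-column table, and put them in the same order.
match_pairs <- function(tables) {
  # Build an order-independent key for each pair, e.g. "S1|S2"
  pair_key <- function(tab) {
    a <- as.character(tab[, 1])
    b <- as.character(tab[, 2])
    paste(pmin(a, b), pmax(a, b), sep = "|")
  }
  keys <- lapply(tables, pair_key)
  shared <- Reduce(intersect, keys)
  Map(function(tab, key) {
    out <- tab[match(shared, key), , drop = FALSE]
    rownames(out) <- NULL
    out
  }, tables, keys)
}

# Hypothetical helper: all pairwise combinations of the given sample IDs
pairs_of <- function(ids) {
  idx <- t(combn(ids, 2))
  data.frame(name1 = idx[, 1], name2 = idx[, 2], dis = seq_len(nrow(idx)),
             stringsAsFactors = FALSE)
}

A <- pairs_of(paste0("S", 1:3))  # pairs among S1..S3
B <- pairs_of(paste0("S", 1:4))  # pairs among S1..S4

matched <- match_pairs(list(A = A, B = B))
nrow(matched$A)  # number of pairs kept from A
nrow(matched$B)  # pairs involving S4 are dropped from B
```

Using an order-independent key means a pair listed as (S2, S1) in one table still matches (S1, S2) in another; whether match.2col treats reversed pairs this way is not stated in this help page.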
This function is usually used to check the consistency of species or sample names across different data tables (e.g. OTU table and phylogenetic distance matrix). It can be used to check row names and/or column names of different matrices, names in vector(s) or list(s), and tip.label in tree(s).
match.name(name.check=integer(0), rn.list=list(integer(0)), cn.list=list(integer(0)), both.list=list(integer(0)), v.list=list(integer(0)), lf.list=list(integer(0)), tree.list=list(integer(0)), group=integer(0), rerank=TRUE, silent=FALSE)
name.check |
A character vector giving the reference name list, i.e. the names you would like to keep. If not provided, a union of all names is used as the reference name list. |
rn.list |
A list object containing the matrix(es) whose row names will be checked. rn.list must be given in a format like "rn.list=list(A=A,B=B)". Default is nothing. |
cn.list |
A list object containing the matrix(es) whose column names will be checked. cn.list must be given in a format like "cn.list=list(A=A,B=B)". Default is nothing. |
both.list |
A list object containing the matrix(es) whose column and row names will both be checked. both.list must be given in a format like "both.list=list(A=A,B=B)". Default is nothing. |
v.list |
A list object containing the vector(s) whose names will be checked. v.list must be given in a format like "v.list=list(A=A,B=B)". Default is nothing. |
lf.list |
A list object containing the list(s) whose names will be checked. lf.list must be given in a format like "lf.list=list(A=A,B=B)". Default is nothing. |
tree.list |
A list object containing the tree(s) whose tip.label names will be checked. tree.list must be given in a format like "tree.list=list(A=A,B=B)". Default is nothing. |
group |
A vector or one-column matrix/data.frame giving the grouping information of samples or species, whose sample/species names will be checked. |
rerank |
Logical, whether to put all names in the same order. Default is TRUE. |
silent |
Logical, whether to suppress messages. Default is FALSE, so all messages are shown. |
Many analyses and functions require species names and sample names to be checked and put in the same order. Sometimes a subset of samples or species also needs to be selected. This function helps with both.
Returns a list of new matrices with the same row/column names in the same order. A message is shown if some names are removed, or if all names match.
Version 3: 2017.3.13 Version 2: 2015.9.25
Daliang Ning
data("example.data")
comm=example.data$comm
treat=example.data$treat
tree=example.data$tree
pd=example.data$pd
clas=example.data$classification
env=example.data$env
# remove one sample on purpose to see how match.name works
env=env[-13,]
sampid.check=match.name(rn.list = list(comm=comm, treat=treat, env=env))
comm.ck=sampid.check$comm
comm.ck=comm.ck[,colSums(comm.ck)>0,drop=FALSE]
treat.ck=sampid.check$treat
env.ck=sampid.check$env
taxid.check=match.name(cn.list = list(comm.ck=comm.ck),
                       rn.list = list(clas=clas),
                       tree.list = list(tree=tree))
comm.ck=taxid.check$comm.ck
clas.ck=taxid.check$clas
tree.ck=taxid.check$tree
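For readers who want to see what name matching amounts to, the following base R sketch aligns two toy tables by their shared row names. It illustrates the idea only and is not how match.name is implemented; the toy objects `comm` and `env` are made up for this example.

```r
# Toy community table (samples x taxa) and environment table (samples x variables)
comm <- matrix(1:12, nrow = 4,
               dimnames = list(paste0("s", 1:4), paste0("OTU", 1:3)))
env <- data.frame(pH = c(6.5, 7.1, 6.8),
                  row.names = c("s3", "s1", "s4"))  # s2 missing, different order

# Keep only the sample IDs present in both tables, in one common order
shared <- intersect(rownames(comm), rownames(env))
comm.ck <- comm[shared, , drop = FALSE]
env.ck <- env[shared, , drop = FALSE]

# Taxa that no longer occur in any retained sample can then be dropped
comm.ck <- comm.ck[, colSums(comm.ck) > 0, drop = FALSE]

# Both tables now list the same samples in the same order
identical(rownames(comm.ck), rownames(env.ck))
```

The `drop = FALSE` arguments keep the results as two-dimensional tables even if only one sample or one taxon remains.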
Returns the maximum value and its location(s) in a big matrix.
maxbigm(m.desc, m.wd, nworker = 1, rm.na = TRUE, size.limit = 10000 * 10000)
m.desc |
the name of the file holding the backingfile description of the big matrix. |
m.wd |
the path of the folder holding the big matrix file. |
nworker |
integer, for parallel computing. Either a character vector of host names on which to run the worker copies of R, or a positive integer (in which case that number of copies is run on localhost). Default is 1, meaning a single thread is used. |
rm.na |
logical, whether to remove NA. Default is TRUE. |
size.limit |
the matrix size that your computer memory can easily handle at one time. Default is 10000 * 10000. |
A tool to figure out the maximum value in the big phylogenetic distance matrix.
Output is a list of two elements.
max.value |
Numeric, the maximum value. |
row.col |
Matrix, the row(s) and column(s), i.e. the location(s), of the maximum value in the big matrix. |
Version 3: 2020.9.1, remove setwd; change dontrun to donttest and revise save.wd in help doc. Version 2: 2020.8.19, add example. Version 1: 2015.12.16
Daliang Ning
Michael J. Kane, John Emerson, Stephen Weston (2013). Scalable Strategies for Computing with Massive Data. Journal of Statistical Software, 55(14), 1-19. URL http://www.jstatsoft.org/v55/i14/.
# this example shows how to find the maximum value
# in a big phylogenetic distance matrix.
data("example.data")
tree=example.data$tree
# since pdist.big needs to save output to a certain folder,
# the following code is set as 'not test'.
# but you may test the code on your computer
# after changing the folder path for 'save.wd'.
wd0=getwd()
save.wd=paste0(tempdir(),"/pdbig.maxbigm")
# please change to the folder where you want to save the pd.big output.
nworker=2 # parallel computing thread number
pd.big=pdist.big(tree = tree, wd=save.wd, nworker = nworker)
maxb=maxbigm(m.desc = pd.big$pd.file, m.wd = pd.big$pd.wd,
             nworker = nworker, rm.na = TRUE)
setwd(wd0)
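The role of size.limit can be illustrated with a block-wise search in base R: the matrix is scanned a few columns at a time, and the running maximum and its location(s) are updated after each block. This is a sketch under assumptions, not the package's code; `block_max` and its `block.ncol` argument are hypothetical, and an ordinary in-memory matrix stands in for the file-backed big matrix.

```r
# Hypothetical helper (not iCAMP code): scan a matrix in column blocks so that
# only `block.ncol` columns are examined at a time. NA values are ignored.
block_max <- function(m, block.ncol = 2) {
  best <- -Inf
  where <- matrix(integer(0), ncol = 2, dimnames = list(NULL, c("row", "col")))
  for (s in seq(1, ncol(m), by = block.ncol)) {
    cols <- s:min(s + block.ncol - 1, ncol(m))
    blk <- m[, cols, drop = FALSE]
    blk.max <- max(blk, na.rm = TRUE)
    if (blk.max < best) next
    hit <- which(blk == blk.max, arr.ind = TRUE)
    hit[, "col"] <- cols[hit[, "col"]]  # block column -> full-matrix column
    if (blk.max > best) {
      best <- blk.max
      where <- hit
    } else {
      where <- rbind(where, hit)        # tie with the current maximum
    }
  }
  rownames(where) <- NULL
  list(max.value = best, row.col = where)
}

# Toy symmetric distance matrix for five samples
x <- c(S1 = 1, S2 = 4, S3 = 9, S4 = 2, S5 = 7)
m <- as.matrix(dist(x))

res <- block_max(m, block.ncol = 2)
res$max.value  # largest pairwise distance
res$row.col    # every (row, col) position holding that value
```

Because a distance matrix is symmetric, each maximum appears at two mirrored positions, which is why row.col can hold more than one location.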
This is modified from the function "midpoint.root" in package "phytools". To deal with a large tree, the phylogenetic distances are calculated and saved in advance using bigmemory.
midpoint.root.big(tree, pd.desc, pd.spname, pd.wd, nworker = 4)
tree |
phylogenetic tree, an object of class "phylo". |
pd.desc |
the name of the file holding the backingfile description of the phylogenetic distance matrix. It is usually "pd.desc" if the default settings of the pdist.big function are used. |
pd.spname |
character vector, taxa IDs in the same order as in the big matrix of phylogenetic distances. |
pd.wd |
folder path, where the bigmemory file of the phylogenetic distance matrix is saved. |
nworker |
integer, for parallel computing. Either a character vector of host names on which to run the worker copies of R, or a positive integer (in which case that number of copies is run on localhost). Default is 4, meaning 4 threads will be run. |
iCAMP analysis needs a rooted tree. If it is difficult to figure out the root, midpoint rooting is recommended for iCAMP analysis. Modified from the function 'midpoint.root' in package 'phytools' (Revell 2012), this function uses bigmemory (Kane et al 2013) to deal with large datasets.
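The midpoint idea can be illustrated without a tree object: the root is placed halfway along the path between the two most distant tips. The following is a minimal base-R sketch, assuming a small in-memory distance matrix `pd` with hypothetical tip names and values; it is not the code of midpoint.root.big, which reads the matrix through bigmemory.

```r
# Toy symmetric phylogenetic distance matrix (hypothetical values).
tips <- c("t1", "t2", "t3", "t4")
pd <- matrix(c(0,   2,   6,   7,
               2,   0,   5.5, 6.5,
               6,   5.5, 0,   3,
               7,   6.5, 3,   0),
             nrow = 4, byrow = TRUE, dimnames = list(tips, tips))

# Maximum pairwise distance and the pair of tips that realizes it.
max.pd <- max(pd, na.rm = TRUE)
far.pair <- which(pd == max.pd, arr.ind = TRUE)[1, ]
far.tips <- rownames(pd)[far.pair]

# The midpoint root lies this far from each of the two tips.
root.depth <- max.pd / 2
```

Here `max.pd` plays the role of the max.pd element in the output; locating the edge that contains the midpoint still requires the tree itself.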
Output is a list with two elements.
tree |
The rooted tree. |
max.pd |
The maximum pairwise phylogenetic distance. |
Version 3: 2020.9.1, remove setwd; change dontrun to donttest and revise save.wd in help doc. Version 2: 2020.8.19, update help document, add example. Version 1: 2015.12.16
Daliang Ning
Farris, J. (1972) Estimating phylogenetic trees from distance matrices. American Naturalist, 106, 645-667.
Paradis, E., J. Claude, and K. Strimmer (2004) APE: Analyses of phylogenetics and evolution in R language. Bioinformatics, 20, 289-290.
Revell, L. J. (2012) phytools: An R package for phylogenetic comparative biology (and other things). Methods Ecol. Evol., 3, 217-223.
Michael J. Kane, John Emerson, Stephen Weston (2013). Scalable Strategies for Computing with Massive Data. Journal of Statistical Software, 55(14), 1-19. URL http://www.jstatsoft.org/v55/i14/.
data("example.data")
tree=example.data$tree
# since pdist.big need to save output to a certain folder,
# the following code is set as 'not test'.
# but you may test the code on your computer
# after change the folder path for 'save.wd'.
wd0=getwd()
save.wd=paste0(tempdir(),"/pdbig.midpointroot")
# please change to the folder you want to save the pd.big output.
nworker=2 # parallel computing thread number
pd.big=pdist.big(tree = tree, wd=save.wd, nworker = nworker)
mroot=midpoint.root.big(tree = tree, pd.desc = pd.big$pd.file, pd.spname = pd.big$tip.label, pd.wd = pd.big$pd.wd, nworker = nworker)
setwd(wd0)
Calculate mean nearest taxon distance (MNTD) in each community in a given community matrix.
mntdn(comm, pd, abundance.weighted = TRUE, check.name = TRUE, memory.G = 50, time.count = FALSE)
comm |
matrix or data.frame, community data matrix, rownames are sample names, colnames are OTU ids. |
pd |
matrix, pairwise phylogenetic distance matrix. |
abundance.weighted |
logic, whether weighted by species abundance, default is TRUE, means weighted. |
check.name |
logic, whether to check the OTU ids (species names) in community matrix and phylogenetic distance matrix are the same. |
memory.G |
numeric, to set the memory size as you need, so that calculation of large tree will not be limited by physical memory. unit is Gb. default is 50Gb |
time.count |
logic, whether to count calculation time, default is FALSE. |
Mean nearest taxon distance (MNTD) in each community, using the same algorithm as the function 'mntd' in package 'picante'.
result is a numeric vector with sample names
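As an illustration of the index itself, the sketch below computes MNTD in base R for a toy dataset: for each taxon present in a sample, take the distance to its nearest co-occurring taxon, then average (weighted by abundance if requested). The function name `mntd.sketch` and the data are hypothetical; this is not the internal code of mntdn.

```r
# Toy community matrix: rows are samples, columns are taxa (hypothetical data).
comm <- matrix(c(10, 5, 5,
                  0, 2, 8),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("S1", "S2"), c("a", "b", "c")))
# Toy pairwise phylogenetic distance matrix for the same taxa.
pd <- matrix(c(0, 1, 4,
               1, 0, 3,
               4, 3, 0),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

mntd.sketch <- function(comm, pd, abundance.weighted = TRUE) {
  sapply(rownames(comm), function(s) {
    present <- colnames(comm)[comm[s, ] > 0]
    if (length(present) < 2) return(NA_real_)   # MNTD needs at least two taxa
    d <- pd[present, present, drop = FALSE]
    diag(d) <- NA                               # ignore self-distances
    nearest <- apply(d, 1, min, na.rm = TRUE)   # nearest-taxon distance per taxon
    if (abundance.weighted) {
      weighted.mean(nearest, comm[s, present])
    } else {
      mean(nearest)
    }
  })
}

mntd.toy <- mntd.sketch(comm, pd, abundance.weighted = TRUE)
```

The result is a numeric vector named by sample, matching the shape described above.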
Version 2: 2020.8.19, update help document, add example. Version 1: 2017.3.13
Daliang Ning
Webb CO, Ackerly DD, and Kembel SW. (2008). Phylocom: software for the analysis of phylogenetic community structure and trait evolution. Bioinformatics 18:2098-2100
Kembel, S.W., Cowan, P.D., Helmus, M.R., Cornwell, W.K., Morlon, H., Ackerly, D.D. et al. (2010). Picante: R tools for integrating phylogenies and ecology. Bioinformatics, 26, 1463-1464.
data("example.data")
comm=example.data$comm
pd=example.data$pd
mntd=mntdn(comm=comm,pd=pd,abundance.weighted = TRUE)
Calculate mean pairwise distance (MPD) in each community in a given community matrix.
mpdn(comm, pd, abundance.weighted = TRUE, time.output = FALSE)
comm |
matrix or data.frame, community data matrix, rownames are sample names, colnames are OTU ids. |
pd |
matrix, pairwise phylogenetic distance matrix. |
abundance.weighted |
logic, whether weighted by species abundance, default is TRUE, means weighted. |
time.output |
logic, whether to count calculation time, default is FALSE. |
mean pairwise distance (MPD) in each community, which is the same index as 'mpd' in package 'picante', but calculated by matrix multiplication.
result is a numeric vector with sample names
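One way to express MPD as matrix multiplication is sketched below in base R. It assumes the abundance-weighted form in which every pair of taxa (self-pairs included, at distance zero) is weighted by the product of relative abundances, which is our reading of the 'picante' definition; `mpd.sketch` and the toy data are hypothetical and not taken from the package source.

```r
# Toy community matrix (samples x taxa) and distance matrix (hypothetical data).
comm <- matrix(c(10, 5, 5,
                  0, 2, 8),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("S1", "S2"), c("a", "b", "c")))
pd <- matrix(c(0, 1, 4,
               1, 0, 3,
               4, 3, 0),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

mpd.sketch <- function(comm, pd, abundance.weighted = TRUE) {
  comm <- as.matrix(comm)
  pd <- as.matrix(pd)[colnames(comm), colnames(comm)]
  if (abundance.weighted) {
    p <- comm / rowSums(comm)          # relative abundances per sample
    # Row-wise p %*% pd %*% t(p), i.e. only the diagonal of the full product.
    res <- rowSums((p %*% pd) * p)
  } else {
    b <- (comm > 0) * 1                # presence/absence
    n <- rowSums(b)
    # Sum over ordered pairs i != j, divided by the number of such pairs.
    res <- rowSums((b %*% pd) * b) / (n * (n - 1))
  }
  res
}

mpd.w  <- mpd.sketch(comm, pd, abundance.weighted = TRUE)
mpd.uw <- mpd.sketch(comm, pd, abundance.weighted = FALSE)
```

All samples are handled in one product, which avoids looping over samples; a sample with a single taxon gives NaN in the unweighted branch of this sketch.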
Version 2: 2020.8.19, update help document, add example. Version 1: 2017.3.13
Daliang Ning
Webb C, Ackerly D, McPeek M, and Donoghue M. (2002). Phylogenies and community ecology. Annual Review of Ecology and Systematics 33:475-505.
Kembel, S.W., Cowan, P.D., Helmus, M.R., Cornwell, W.K., Morlon, H., Ackerly, D.D. et al. (2010). Picante: R tools for integrating phylogenies and ecology. Bioinformatics, 26, 1463-1464.
data("example.data")
comm=example.data$comm
pd=example.data$pd
mpd=mpdn(comm = comm, pd = pd, abundance.weighted = TRUE)
Calculate net relatedness index (NRI) or other index of null model significance test based on mean pairwise distance (MPD) by parallel computing, for small and medium size dataset. This function can deal with local communities under different metacommunities (regional pools).
NRI.cm(comm, dis, meta.group = NULL, meta.spool = NULL, nworker = 4, memo.size.GB = 50, weighted = c(TRUE, FALSE), check.name = TRUE, rand = 1000, output.MPD = c(FALSE, TRUE), silent = FALSE, sig.index = c("SES", "NRI", "Confidence", "RC", "all"))
comm |
community data matrix. rownames are sample names. colnames are species names. |
dis |
Phylogenetic distance matrix |
meta.group |
matrix or data.frame, a one-column (n x 1) matrix indicating which metacommunity each sample belongs to. rownames are sample IDs. first column is metacommunity names. Such that different samples can belong to different metacommunities. If input a n x m matrix, only the first column is used. NULL means all samples belong to the same metacommunity. Default is NULL, means all samples from the same metacommunity. |
meta.spool |
a list object, each element is a character vector listing all taxa IDs in a metacommunity. The names of the elements indicate metacommunity names, which should be the same as the metacommunity names in meta.group. Default is NULL, means to use the observed taxa in comm across samples within the same metacommunity that is defined by meta.group. |
nworker |
for parallel computing. Either a character vector of host names on which to run the worker copies of R, or a positive integer (in which case that number of copies is run on localhost). default is 4, means 4 threads will be run. |
memo.size.GB |
numeric, to set the memory size as you need, so that calculation of large tree will not be limited by physical memory. unit is Gb. default is 50Gb. |
weighted |
Logic, consider abundances or not (just presence/absence). default is TRUE. |
check.name |
Logic, whether to check the species names in comm and dis. default is TRUE. |
rand |
integer, randomization times. default is 1000. |
output.MPD |
Logic, whether to output observed MPD, so that you do not need to calculate observed MPD separately. default is FALSE. |
silent |
Logic, if FALSE, some messages will be shown during calculation. Default is FALSE. |
sig.index |
character, the index for null model significance test. SES or NRI, standard effect size, i.e. net relatedness index (NRI); Confidence, percentage of null values less extreme than the observed value, i.e. non-parametric one-side confidence level; RC, modified Raup-Crick index (RC) based on MPD, i.e. count the number of null MPD lower than observed MPD plus a half of the number of null MPD equal to observed MPD, to get alpha, then calculate MPD-based RC as (2 x alpha - 1); all, output all the three indexes. default is SES. If input a vector, only the first element will be used. |
This function is particularly designed for samples from different metacommunities. The null model "taxa shuffle" will be done under different metacommunities, separately (and independently). All other details are the same as the function NRI.p.
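The idea of shuffling taxa separately within each metacommunity can be sketched in base R as follows. The pools, taxa IDs, and the helper `shuffle.in.pool` are hypothetical, and whether NRI.cm handles out-of-pool taxa exactly this way is an assumption.

```r
set.seed(1)
taxa <- paste0("OTU", 1:6)
# Hypothetical species pools: each metacommunity has its own list of taxa IDs.
meta.spool <- list(meta1 = taxa[1:4], meta2 = taxa[3:6])

# Shuffle taxa labels within one pool only; taxa outside the pool keep their labels.
shuffle.in.pool <- function(all.taxa, pool) {
  new.names <- all.taxa
  idx <- which(all.taxa %in% pool)
  new.names[idx] <- sample(all.taxa[idx])
  new.names
}

# One randomization per metacommunity, drawn independently of the others.
rand.labels <- lapply(meta.spool, function(pool) shuffle.in.pool(taxa, pool))
```

The shuffled labels for a metacommunity would then be applied to the distance matrix before recalculating MPD for the samples that belong to it.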
Output can be a data.frame with each row representing a sample and only one column of index values, or a list of several data.frame objects.
SES |
output if sig.index is SES, NRI, or all, a data.frame with NRI value for each sample. |
Confidence |
output if sig.index is Confidence or all, a data.frame showing confidence level based on MPD for each sample. |
RC |
output if sig.index is RC or all, a data.frame showing RC based on MPD for each sample. |
MPD.obs |
output if output.MPD is TRUE, a data.frame showing observed MPD for each sample. |
MPD.rand |
output if output.MPD is TRUE, a matrix showing all null MPD values. |
Version 1: 2021.8.4
Daliang Ning
Webb CO, Ackerly DD, and Kembel SW. 2008. Phylocom: software for the analysis of phylogenetic community structure and trait evolution. Bioinformatics 18:2098-2100
Kembel, S.W. (2009). Disentangling niche and neutral influences on community assembly: assessing the performance of community phylogenetic structure tests. Ecol Lett, 12, 949-960.
Chase, J.M., Kraft, N.J.B., Smith, K.G., Vellend, M. & Inouye, B.D. (2011). Using null models to disentangle variation in community dissimilarity from variation in alpha-diversity. Ecosphere, 2, 1-11.
Ning, D., Yuan, M., Wu, L., Zhang, Y., Guo, X., Zhou, X. et al. (2020). A quantitative framework reveals ecological drivers of grassland microbial community assembly in response to warming. Nature Communications, 11, 4717.
data("example.data")
comm=example.data$comm
pd=example.data$pd
# in this example, 10 samples from one metacommunity,
# the other 10 samples from another metacommunity.
meta.group=data.frame(meta.com=c(rep("meta1",10),rep("meta2",10)))
rownames(meta.group)=rownames(comm)
nworker=2 # parallel computing thread number.
rand.time=20 # usually use 1000 for real data.
sigmpd=NRI.cm(comm=comm, meta.group=meta.group, dis=pd, nworker=nworker, weighted=TRUE, rand=rand.time, sig.index="all")
NRI=sigmpd$SES
CMPD=sigmpd$Confidence
RCMPD=sigmpd$RC
Calculate net relatedness index (NRI) or other index of null model significance test based on mean pairwise distance (MPD) by parallel computing, for small and medium size dataset.
NRI.p(comm, dis, nworker = 4, memo.size.GB = 50, weighted = c(TRUE, FALSE), check.name = TRUE, rand = 1000, output.MPD = c(FALSE, TRUE), silent = FALSE, sig.index=c("SES","NRI","Confidence","RC","all"))
comm |
community data matrix. rownames are sample names. colnames are species names. |
dis |
Phylogenetic distance matrix |
nworker |
for parallel computing. Either a character vector of host names on which to run the worker copies of R, or a positive integer (in which case that number of copies is run on localhost). default is 4, means 4 threads will be run. |
memo.size.GB |
numeric, to set the memory size as you need, so that calculation of large tree will not be limited by physical memory. unit is Gb. default is 50Gb. |
weighted |
Logic, consider abundances or not (just presence/absence). default is TRUE. |
check.name |
Logic, whether to check the species names in comm and dis. default is TRUE. |
rand |
integer, randomization times. default is 1000. |
output.MPD |
Logic, whether to output observed MPD, so that you do not need to calculate observed MPD separately. default is FALSE. |
silent |
Logic, if FALSE, some messages will be shown during calculation. Default is FALSE. |
sig.index |
character, the index for null model significance test. SES or NRI, standard effect size, i.e. net relatedness index (NRI); Confidence, percentage of null values less extreme than the observed value, i.e. non-parametric one-side confidence level; RC, modified Raup-Crick index (RC) based on MPD, i.e. count the number of null MPD lower than observed MPD plus a half of the number of null MPD equal to observed MPD, to get alpha, then calculate MPD-based RC as (2 x alpha - 1); all, output all the three indexes. default is SES. If input a vector, only the first element will be used. |
The net relatedness index (NRI) is a standardized measure of the mean pairwise phylogenetic distance in each sample/community (MPD). Currently this function only performs one null model algorithm, "taxa.labels" ("taxa shuffle", Kembel 2009), which shuffles the distance matrix labels (across all taxa included in the distance matrix). If the randomized results are all the same, the standard deviation will be zero and NRI will be NaN. In this case, NRI is set to zero, since the observed result cannot be differentiated from the randomized results.
RC (Chase et al 2011) and Confidence (Ning et al 2020) are alternative significance test indexes to evaluate how the observed diversity index deviates from null expectation, which could be a better metric than standardized effect size (NRI) in some cases, e.g. null values do not follow normal distribution.
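The standardized effect size and the modified RC described for sig.index can be sketched in base R from one observed value and its null values. The helper `sig.sketch` and the numbers are hypothetical; the sketch covers SES and RC only, and the sign convention of the SES (some tools report NRI as the negative of this value) is not settled here.

```r
sig.sketch <- function(obs, null.values) {
  null.sd <- sd(null.values)
  # Identical null values give a zero standard deviation; report zero instead of NaN.
  ses <- if (null.sd == 0) 0 else (obs - mean(null.values)) / null.sd
  # alpha: share of null values below the observed one, ties counted as one half.
  alpha <- (sum(null.values < obs) + 0.5 * sum(null.values == obs)) /
    length(null.values)
  c(SES = ses, RC = 2 * alpha - 1)
}

null.mpd <- c(1.0, 1.2, 1.4, 1.6, 1.8)   # hypothetical null MPD values
res.obs  <- sig.sketch(obs = 1.6, null.values = null.mpd)
res.flat <- sig.sketch(obs = 2.0, null.values = rep(2, 5))
```

Unlike the SES, RC depends only on the rank of the observed value among the null values, so it does not rely on the null values being normally distributed.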
Output can be a data.frame with each row representing a sample and only one column of index values, or a list of several data.frame objects.
SES |
output if sig.index is SES, NRI, or all, a data.frame with NRI value for each sample. |
Confidence |
output if sig.index is Confidence or all, a data.frame showing confidence level based on MPD for each sample. |
RC |
output if sig.index is RC or all, a data.frame showing RC based on MPD for each sample. |
MPD.obs |
output if output.MPD is TRUE, a data.frame showing observed MPD for each sample. |
MPD.rand |
output if output.MPD is TRUE, a matrix showing all null MPD values. |
Version 2: 2020.8.19, update help document, add example Version 1: 2017.5.10
Daliang Ning
Webb CO, Ackerly DD, and Kembel SW. 2008. Phylocom: software for the analysis of phylogenetic community structure and trait evolution. Bioinformatics 18:2098-2100
Kembel, S.W. (2009). Disentangling niche and neutral influences on community assembly: assessing the performance of community phylogenetic structure tests. Ecol Lett, 12, 949-960.
Chase, J.M., Kraft, N.J.B., Smith, K.G., Vellend, M. & Inouye, B.D. (2011). Using null models to disentangle variation in community dissimilarity from variation in alpha-diversity. Ecosphere, 2, 1-11.
Ning, D., Yuan, M., Wu, L., Zhang, Y., Guo, X., Zhou, X. et al. (2020). A quantitative framework reveals ecological drivers of grassland microbial community assembly in response to warming. Nature Communications, 11, 4717.
data("example.data")
comm=example.data$comm
pd=example.data$pd
nworker=2 # parallel computing thread number.
rand.time=20 # usually use 1000 for real data.
sigmpd=NRI.p(comm=comm, dis=pd, nworker=nworker, weighted=TRUE, rand=rand.time, sig.index="all")
NRI=sigmpd$SES
CMPD=sigmpd$Confidence
RCMPD=sigmpd$RC
Calculate nearest taxon index (NTI) of each sample with parallel computing. This function can deal with local communities under different metacommunities (regional pools).
NTI.cm(comm, dis, meta.group = NULL, meta.spool = NULL, nworker = 4, memo.size.GB = 50, weighted = c(TRUE, FALSE), rand = 1000, check.name = TRUE, output.MNTD = c(FALSE, TRUE), sig.index = c("SES", "NTI", "Confidence", "RC", "all"), silent = FALSE)
comm |
community data matrix. rownames are sample names. colnames are species names. |
dis |
Phylogenetic distance matrix. |
meta.group |
matrix or data.frame, a one-column (n x 1) matrix indicating which metacommunity each sample belongs to. rownames are sample IDs. first column is metacommunity names. Such that different samples can belong to different metacommunities. If input a n x m matrix, only the first column is used. NULL means all samples belong to the same metacommunity. Default is NULL, means all samples from the same metacommunity. |
meta.spool |
a list object, each element is a character vector listing all taxa IDs in a metacommunity. The names of the elements indicate metacommunity names, which should be the same as the metacommunity names in meta.group. Default is NULL, means to use the observed taxa in comm across samples within the same metacommunity that is defined by meta.group. |
nworker |
for parallel computing. Either a character vector of host names on which to run the worker copies of R, or a positive integer (in which case that number of copies is run on localhost). default is 4, means 4 threads will be run. |
memo.size.GB |
numeric, to set the memory size as you need, so that calculation of large tree will not be limited by physical memory. unit is Gb. default is 50Gb. |
weighted |
Logic, consider abundances or not (just presence/absence). default is TRUE. |
rand |
integer, randomization times. default is 1000. |
check.name |
logic, whether to check the taxa names in comm and dis, which must be the same and in the same order; if not match, remove mismatched names and change to the same order. default is TRUE. |
output.MNTD |
logic, if TRUE, the NTI and MNTD will be output, if FALSE, only output NTI. |
sig.index |
character, the index for null model significance test. SES or NTI, standard effect size, i.e. nearest taxon index (NTI); Confidence, percentage of null values less extreme than the observed value, i.e. non-parametric one-side confidence level; RC, modified Raup-Crick index (RC) based on MNTD, i.e. count the number of null MNTD lower than observed MNTD plus a half of the number of null MNTD equal to observed MNTD, to get alpha, then calculate MNTD-based RC as (2 x alpha - 1); all, output all the three indexes. default is SES. If input a vector, only the first element will be used. |
silent |
logic, if FALSE, some messages will show during calculation. |
This function is particularly designed for samples from different metacommunities. The null model "taxa shuffle" will be done under different metacommunities, separately (and independently). All other details are the same as the function NTI.p.
If output.MNTD is FALSE, output is a one-column matrix where rownames are sample IDs and the only column shows NTI values. If output.MNTD is TRUE, output is a list of three elements.
NTI |
matrix, NTI values. |
MNTD |
matrix, observed MNTD. |
MNTD.rand |
array, null MNTD values, the third dimension represent randomization times. |
Version 1: 2021.8.4
Daliang Ning
Webb CO, Ackerly DD, and Kembel SW. 2008. Phylocom: software for the analysis of phylogenetic community structure and trait evolution. Bioinformatics 18:2098-2100
Kembel, S.W. (2009). Disentangling niche and neutral influences on community assembly: assessing the performance of community phylogenetic structure tests. Ecol Lett, 12, 949-960.
Chase, J.M., Kraft, N.J.B., Smith, K.G., Vellend, M. & Inouye, B.D. (2011). Using null models to disentangle variation in community dissimilarity from variation in alpha-diversity. Ecosphere, 2, 1-11.
Ning, D., Yuan, M., Wu, L., Zhang, Y., Guo, X., Zhou, X. et al. (2020). A quantitative framework reveals ecological drivers of grassland microbial community assembly in response to warming. Nature Communications, 11, 4717.
data("example.data")
comm=example.data$comm
pd=example.data$pd
# in this example, 10 samples from one metacommunity,
# the other 10 samples from another metacommunity.
meta.group=data.frame(meta.com=c(rep("meta1",10),rep("meta2",10)))
rownames(meta.group)=rownames(comm)
nworker=2 # parallel computing thread number.
rand.time=4 # usually use 1000 for real data.
sigmntd=NTI.cm(comm=comm, meta.group=meta.group, dis=pd, nworker = nworker, weighted = TRUE, rand = rand.time, sig.index="all")
NTI=sigmntd$SES
CMNTD=sigmntd$Confidence
RCMNTD=sigmntd$RC
Calculate nearest taxon index (NTI) of each sample with parallel computing.
NTI.p(comm, dis, nworker = 4, memo.size.GB = 50, weighted = c(TRUE, FALSE), rand = 1000, check.name = TRUE, output.MNTD = c(FALSE, TRUE), sig.index=c("SES","NTI","Confidence","RC","all"), silent=FALSE)
comm |
community data matrix. rownames are sample names. colnames are species names. |
dis |
Phylogenetic distance matrix. |
nworker |
for parallel computing. Either a character vector of host names on which to run the worker copies of R, or a positive integer (in which case that number of copies is run on localhost). default is 4, means 4 threads will be run. |
memo.size.GB |
numeric, to set the memory size as you need, so that calculation of large tree will not be limited by physical memory. unit is Gb. default is 50Gb. |
weighted |
Logic, consider abundances or not (just presence/absence). default is TRUE. |
rand |
integer, randomization times. default is 1000. |
check.name |
logic, whether to check the taxa names in comm and dis, which must be the same and in the same order; if not match, remove mismatched names and change to the same order. default is TRUE. |
output.MNTD |
logic, if TRUE, the NTI and MNTD will be output, if FALSE, only output NTI. |
sig.index |
character, the index for null model significance test. SES or NTI, standard effect size, i.e. nearest taxon index (NTI); Confidence, percentage of null values less extreme than the observed value, i.e. non-parametric one-side confidence level; RC, modified Raup-Crick index (RC) based on MNTD, i.e. count the number of null MNTD lower than observed MNTD plus a half of the number of null MNTD equal to observed MNTD, to get alpha, then calculate MNTD-based RC as (2 x alpha - 1); all, output all the three indexes. default is SES. If input a vector, only the first element will be used. |
silent |
logic, if FALSE, some messages will show during calculation. |
The nearest taxon index (NTI) is a standardized measure of the mean phylogenetic distance to the nearest taxon in each sample/community (MNTD). Currently this function only performs one null model algorithm, "taxa.labels" ("taxa shuffle", Kembel 2009), which shuffles the distance matrix labels (across all taxa included in the distance matrix). If the randomized results are all the same, the standard deviation will be zero and NTI will be NaN. In this case, NTI is set to zero, since the observed result cannot be differentiated from the randomized results.
RC (Chase et al 2011) and Confidence (Ning et al 2020) are alternative significance test indexes to evaluate how the observed diversity index deviates from null expectation, which could be a better metric than standardized effect size (NTI) in some cases, e.g. null values do not follow normal distribution.
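The "taxa.labels" null model described above can be sketched for a single sample in base R: shuffle the labels of the distance matrix, recompute MNTD, and compare the observed value with the null distribution. The data and the helper `mntd.one` are hypothetical; the sketch runs serially and leaves the sign of the index open (some tools report NTI as the negative of this SES).

```r
set.seed(42)
# Toy distance matrix built from hypothetical positions on a line.
pos <- c(a = 0, b = 1, c = 3, d = 7, e = 12)
pd <- abs(outer(pos, pos, "-"))
abund <- c(a = 5, b = 3, c = 0, d = 0, e = 2)   # one sample

# Abundance-weighted MNTD of one sample, looking taxa up by name.
mntd.one <- function(abund, pd) {
  present <- names(abund)[abund > 0]
  d <- pd[present, present, drop = FALSE]
  diag(d) <- NA
  weighted.mean(apply(d, 1, min, na.rm = TRUE), abund[present])
}

obs <- mntd.one(abund, pd)

rand <- 99   # small number for illustration only
null.mntd <- replicate(rand, {
  shuffled <- pd
  new.names <- sample(rownames(pd))              # shuffle labels across all taxa
  dimnames(shuffled) <- list(new.names, new.names)
  mntd.one(abund, shuffled)
})

null.sd <- sd(null.mntd)
ses <- if (null.sd == 0) 0 else (obs - mean(null.mntd)) / null.sd
```

Because only labels move, the community matrix and the set of distances stay fixed; what changes is which distance each taxon is paired with.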
If output.MNTD is FALSE, output is a one-column matrix where rownames are sample IDs and the only column shows NTI values. If output.MNTD is TRUE, output is a list of three elements.
NTI |
matrix, NTI values. |
MNTD |
matrix, observed MNTD. |
MNTD.rand |
array, null MNTD values, the third dimension represent randomization times. |
Version 2: 2020.8.19, update help document, add example. Version 1: 2018.10.19
Daliang Ning
Webb CO, Ackerly DD, and Kembel SW. 2008. Phylocom: software for the analysis of phylogenetic community structure and trait evolution. Bioinformatics 18:2098-2100
Kembel, S.W. (2009). Disentangling niche and neutral influences on community assembly: assessing the performance of community phylogenetic structure tests. Ecol Lett, 12, 949-960.
Chase, J.M., Kraft, N.J.B., Smith, K.G., Vellend, M. & Inouye, B.D. (2011). Using null models to disentangle variation in community dissimilarity from variation in alpha-diversity. Ecosphere, 2, 1-11.
Ning, D., Yuan, M., Wu, L., Zhang, Y., Guo, X., Zhou, X. et al. (2020). A quantitative framework reveals ecological drivers of grassland microbial community assembly in response to warming. Nature Communications, 11, 4717.
data("example.data")
comm=example.data$comm
pd=example.data$pd
nworker=2 # parallel computing thread number.
rand.time=4 # usually use 1000 for real data.
sigmntd=NTI.p(comm=comm, dis=pd, nworker = nworker, weighted = TRUE, rand = rand.time, sig.index="all")
NTI=sigmntd$SES
CMNTD=sigmntd$Confidence
RCMNTD=sigmntd$RC
Test whether the null values of each turnover in each bin follow a normal distribution.
null.norm(icamp.output = NULL, rand.list = NULL, index.name = "Test.Index", p.norm.cut = 0.05, detail.out = FALSE)
icamp.output |
list, the exact output of the function icamp.big in which detail.null must be TRUE, to save all null values. |
rand.list |
list, the null values of a certain dissimilarity index. Each element is a matrix that represents a bin. In each element matrix, the first two columns indicate sample IDs of the pairwise comparison (turnover), and each of the other columns shows the null values from one randomization. |
index.name |
character, when rand.list is given, to specify the name of the dissimilarity index. |
p.norm.cut |
numeric, the threshold of significant P value. A p value lower than this indicates significant difference from normal distribution. |
detail.out |
logic, if TRUE, the detailed statistics and P values for each turnover of each bin will be output; otherwise, only output a summary on non-normal percentage for each bin. |
Normal distribution of null values is a basic assumption when using Standard Effect Size (SES, e.g. betaNRI, betaNTI) to identify significant difference between null and observed values. This function uses five different methods to perform the normality test, including Anderson-Darling test (Anderson), Cramer-von Mises test (Cramer), Kolmogorov-Smirnov test (Kolmogorov, also known as Lilliefors test), Shapiro-Francia test (ShapiroF), and Shapiro-Wilk test (Shapiro). The function 'shapiro.test' in package 'stats', and various functions in package 'nortest' are used.
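The non-normal ratio for one bin can be sketched with the Shapiro-Wilk test alone, using only package 'stats'. The matrix `null.mat` below is hypothetical (rows are turnovers, columns are randomizations) and is built deterministically so that one row is close to normal and the other is strongly skewed; the other four tests from 'nortest' are not shown.

```r
# Hypothetical null values for one bin: rows are turnovers, columns are randomizations.
q <- qnorm(ppoints(100))
null.mat <- rbind(pair1 = q,            # close to normal by construction
                  pair2 = exp(2 * q))   # strongly right-skewed

p.norm.cut <- 0.05
# Shapiro-Wilk P value for each turnover.
p.shapiro <- apply(null.mat, 1, function(x) shapiro.test(x)$p.value)
# Share of turnovers whose null values differ significantly from normal.
nonnormal.ratio <- mean(p.shapiro < p.norm.cut)
```

A high ratio for a bin suggests that SES-based indexes may be unreliable there and that a rank-based index such as RC or Confidence deserves consideration.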
Output is a list object.
summary |
data.frame, each row represents a bin and a dissimilarity index. Seven columns. The first column indicates the dissimilarity index; the second column indicates Bin ID; each of the other columns indicates the non-normal ratio based on a method. The non-normal ratio is calculated as the percentage of turnovers where the null value distribution is significantly different from normal distribution. |
P.value.cut |
the value of p.norm.cut |
detail |
list, each first-level element represents a dissimilarity index; each second-level element is a matrix representing a bin; and the matrix has 14 columns, including the dissimilarity index (Index), bin ID (BinID), sample IDs (name1 and name2), and the statistics and P value based on different methods. |
Version 2: 2020.8.19, update help document, add example Version 1: 2020.8.1
Daliang Ning
Stephens, M.A. (1986): Tests based on EDF statistics. In: D'Agostino, R.B. and Stephens, M.A., eds.: Goodness-of-Fit Techniques. Marcel Dekker, New York.
Dallal, G.E. and Wilkinson, L. (1986): An analytic approximation to the distribution of Lilliefors' test for normality. The American Statistician, 40, 294-296.
Stephens, M.A. (1974): EDF statistics for goodness of fit and some comparisons. Journal of the American Statistical Association, 69, 730-737.
Royston, P. (1993): A pocket-calculator algorithm for the Shapiro-Francia test for non-normality: an application to medicine. Statistics in Medicine, 12, 181-184.
Thode Jr., H.C. (2002): Testing for Normality. Marcel Dekker, New York
Patrick Royston (1982). An extension of Shapiro and Wilk's W test for normality to large samples. Applied Statistics, 31, 115-124. doi: 10.2307/2347973.
Patrick Royston (1982). Algorithm AS 181: The W test for Normality. Applied Statistics, 31, 176-180. doi: 10.2307/2347986.
Patrick Royston (1995). Remark AS R94: A remark on Algorithm AS 181: The W test for normality. Applied Statistics, 44, 547-551. doi: 10.2307/2986146.
Juergen Gross and Uwe Ligges (2015). nortest: Tests for Normality. R package version 1.0-4. https://CRAN.R-project.org/package=nortest
data("icamp.out")
nntest=null.norm(icamp.output = icamp.out, detail.out = TRUE)
Calculates the between-species phylogenetic distance matrix from a tree, using bigmemory to deal with very large datasets.
pdist.big(tree, wd = getwd(), tree.asbig = FALSE, output = FALSE, nworker = 4, nworker.pd = nworker, memory.G = 50, time.count = FALSE, treepath.file="path.rda", pd.spname.file="pd.taxon.name.csv", pd.backingfile="pd.bin", pd.desc.file="pd.desc", tree.backingfile="treeinfo.bin", tree.desc.file="treeinfo.desc")
tree |
phylogenetic tree, an object of class "phylo". |
wd |
path of a folder to save the big phylogenetic distance matrix, default is the current working directory. |
tree.asbig |
logic, whether to treat tree attributes also as big data, default is FALSE, generally no need to set as TRUE. |
output |
logic, whether to output the big phylogenetic distance matrix, default is FALSE, generally do not output it, could be too large. |
nworker |
for parallel computing the tree paths. a positive integer (in which case that number of copies is run on localhost). default is 4, means 4 threads will be run. |
nworker.pd |
for parallel computing the phylogenetic distance matrix. default is set the same as nworker. may need to set lower than nworker if the matrix is too large. |
memory.G |
numeric, to set the memory size as you need, so that calculation of large tree will not be limited by physical memory. unit is Gb. default is 50Gb |
time.count |
logic, whether to count calculation time, default is FALSE. |
treepath.file |
character, name of the file saving the tree.path, which is a list of all the nodes and edge lengths from root to every tip and/or node. it should be a .rda filename. |
pd.spname.file |
character, name of the file saving the taxa IDs, which has exactly the same order as the row names (and column names) of the big phylogenetic distance matrix. it should be a .csv filename. |
pd.backingfile |
character, the root name for the file for the cache of the big phylogenetic distance matrix. it should be a .bin filename. |
pd.desc.file |
character, name of the file to hold the backingfile description for the big phylogenetic distance matrix. it should be a .desc filename. |
tree.backingfile |
character, the root name for the file for the cache of the 3-column matrix of the tree information, including edge and edge length. it should be a .bin filename. |
tree.desc.file |
character, name of the file to hold the backingfile description for the tree information matrix. it should be a .desc filename. |
The cophenetic distance between each pair of taxa is calculated (Sokal and Rohlf 1962). Modified from the function "cophenetic" in package "ape" (Paradis & Schliep 2018), this function can calculate pairwise distances from a large phylogenetic tree quickly by parallel computing. It uses bigmemory (Kane et al 2013) to handle the large phylogenetic distance matrix, which is saved directly to the hard disk rather than held in memory.
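To make the cophenetic distance concrete, the following base-R sketch computes it for a three-tip toy tree from root-to-tip paths. It is only an illustration, not the package's implementation: the toy tree, the helper path.from.root, and the double loop are assumptions made for this example, whereas pdist.big parallelizes the work and stores the result through bigmemory.

```r
# Toy tree ((A:1,B:2):1,C:3) in an ape-like layout (hypothetical example data):
# tips are nodes 1..3, the root is node 4, the internal node is 5.
edge <- matrix(c(4, 5,
                 5, 1,
                 5, 2,
                 4, 3), ncol = 2, byrow = TRUE)
edge.length <- c(1, 1, 2, 3)
tip.label <- c("A", "B", "C")
root <- 4

# Hypothetical helper: indices of the edges on the path from the root to a node.
path.from.root <- function(node) {
  idx <- integer(0)
  while (node != root) {
    e <- which(edge[, 2] == node)  # the single edge leading into this node
    idx <- c(e, idx)
    node <- edge[e, 1]             # move up to the parent
  }
  idx
}

paths <- lapply(seq_along(tip.label), path.from.root)

# Cophenetic distance = total length of the edges that are NOT shared
# by the two root-to-tip paths (i.e. the path through the common ancestor).
n <- length(tip.label)
pd <- matrix(0, n, n, dimnames = list(tip.label, tip.label))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    if (i < j) {
      shared <- intersect(paths[[i]], paths[[j]])
      d <- sum(edge.length[setdiff(paths[[i]], shared)]) +
           sum(edge.length[setdiff(paths[[j]], shared)])
      pd[i, j] <- pd[j, i] <- d
    }
  }
}
pd
```

For a tree of class "phylo" that fits in memory, the same matrix is what the "cophenetic" function in package "ape" returns; the sketch only spells out the path logic on a tree small enough to check by hand.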
Output is a list
tip.label |
OTU ids or species names, which is tip.label in tree file. |
pd.wd |
the folder saving the big phylogenetic distance matrix. |
pd.file |
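the name of the file holding the backingfile description of the big phylogenetic distance matrix (set by pd.desc.file, "pd.desc" by default); it is what downstream functions take as pd.desc. |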
pd.name.file |
the file saving the tip.label information. |
Version 4: 2020.9.1, remove setwd; add options to specify the file names; change dontrun to donttest and revise save.wd in help doc. Version 3: 2020.8.19, add example. Version 2: 2017.3.13 Version 1: 2015.7.24
Daliang Ning
Sokal, R. R. & Rohlf, F. J. (1962). The comparison of dendrograms by objective methods. Taxon, 11:33-40
Paradis, E. & Schliep, K. (2018). ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35: 526-528.
Kane, M.J., Emerson, J., & Weston, S. (2013). Scalable Strategies for Computing with Massive Data. Journal of Statistical Software, 55(14), 1-19. URL http://www.jstatsoft.org/v55/i14/.
data("example.data")
tree=example.data$tree
# since pdist.big needs to save output to a certain folder,
# the following code is set as 'not test'.
# but you may test the code on your computer
# after changing the folder path for 'save.wd'.
wd0=getwd()
save.wd=paste0(tempdir(),"/pdbig.pdist.big")
# please change to the folder you want to save the pd.big output.
nworker=2 # parallel computing thread number
pd.big=pdist.big(tree = tree, wd=save.wd, nworker = nworker)
setwd(wd0)
Calculates the between-species phylogenetic distance matrix from a tree. Only suitable for relatively small datasets.
pdist.p(tree, nworker = 4, memory.G = 50, silent = FALSE, time.count = FALSE)
tree |
phylogenetic tree, an object of class "phylo". |
nworker |
for parallel computing the tree paths. a positive integer (in which case that number of copies is run on localhost). default is 4, means 4 threads will be run. |
memory.G |
numeric, to set the memory size as you need, so that calculation of large tree will not be limited by physical memory. unit is Gb. default is 50Gb |
silent |
logic, whether to suppress messages. Default is FALSE, thus all messages will be shown. |
time.count |
logic, whether to count calculation time, default is FALSE. |
The cophenetic distance between each pair of taxa is calculated (Sokal and Rohlf 1962). Modified from the function "cophenetic" in package "ape" (Paradis & Schliep 2018), this function can calculate pairwise distance from phylogenetic tree quickly by parallel computing. If the tree has too many tips (taxa), please use another function pdist.big designed for large datasets.
Output is a data.frame object, a square matrix of pairwise phylogenetic distances. Row names are the same as column names, indicating taxa IDs.
Version 1: 2021.9.24
Daliang Ning
Sokal, R. R. & Rohlf, F. J. (1962). The comparison of dendrograms by objective methods. Taxon, 11:33-40
Paradis, E. & Schliep, K. (2018). ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35: 526-528.
data("example.data")
tree=example.data$tree
nworker=2 # parallel computing thread number
pd=pdist.p(tree = tree, nworker = nworker)
Use Mantel test to evaluate phylogenetic signal within each bin, i.e. correlation between phylogenetic distance and niche difference.
ps.bin(sp.bin, sp.ra, spname.use = NULL, pd.desc = "pd.desc", pd.spname, pd.wd, nd.list, nd.spname = NULL, ndbig.wd = NULL, cor.method = c("pearson", "spearman"), r.cut = 0.01, p.cut = 0.2, min.spn = 6)
sp.bin |
one-column matrix or data.frame, indicating the bin ID for each species (OTU or ASV), rownames are species IDs. usually use the third column of "sp.bin" in the output of |
sp.ra |
one-column matrix or data.frame, or a vector with name for each element, indicating mean relative abundance of each species. |
spname.use |
character vector, to specify which species will be used for phylogenetic signal test. Default is NULL, means to use all species. |
pd.desc |
the name of the file to hold the backingfile description of the phylogenetic distance matrix, it is usually "pd.desc" if using default setting in pdist.big function. |
pd.spname |
character vector, species id in the same order as the big matrix of phylogenetic distances. |
pd.wd |
folder path, where the bigmemory file of the phylogenetic distance matrix is saved. |
nd.list |
list object. if the niche difference matrixes are big.matrix, each element of this list is the big.matrix backingfile description, e.g. "pH.ND.desc"; otherwise, each element is a niche difference matrix based on an environment factor. usually this is the "nd" in the output of |
nd.spname |
character vector or NULL. If the niche difference matrixes are big.matrix, this is the species IDs in the same order as in each big matrix; otherwise, this should be set as NULL, the species IDs will be extracted from nd.list. |
ndbig.wd |
folder path or NULL. If the niche difference matrixes are big.matrix, this is where the big matrixes of niche differences are saved; otherwise, this is NULL. |
cor.method |
Correlation method, as accepted by cor: "pearson", "spearman" or "kendall". Multiple methods at a time are allowed. |
r.cut |
the cutoff of correlation coefficient to identify significant correlation. |
p.cut |
the cutoff of p value to identify significant correlation. |
min.spn |
the minimal species (or OTU or ASV) number required for phylogenetic signal test. |
This is simply a Mantel test between phylogenetic distance and niche difference (i.e. phylogenetic signal) within each bin. It then returns the overall relative abundance of bins with significant phylogenetic signal, the average correlation coefficient, and the detailed results in each bin, to evaluate the within-bin phylogenetic signal of the binning (input as sp.bin). Bigmemory (Kane et al 2013) is used to deal with large datasets.
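As a rough illustration of the within-bin test described above, the following base-R sketch runs a permutation Mantel test between two distance matrices for a single simulated bin. The function mantel.sketch, the one-sided p value, the permutation count, and the simulated data are all assumptions made for this example; ps.bin's internal implementation may differ.

```r
# Hypothetical helper: permutation Mantel test between two square distance matrices.
mantel.sketch <- function(d1, d2, nperm = 999, method = "pearson") {
  lt <- lower.tri(d1)                          # use each pair of taxa once
  r.obs <- cor(d1[lt], d2[lt], method = method)
  n <- nrow(d1)
  r.null <- replicate(nperm, {
    p <- sample(n)                             # shuffle taxa in one matrix
    cor(d1[p, p][lt], d2[lt], method = method)
  })
  # one-sided p value: how often a shuffled r is at least as large as observed
  p.value <- (sum(r.null >= r.obs) + 1) / (nperm + 1)
  list(r = r.obs, p = p.value)
}

# Simulated bin of 8 taxa: niche differences loosely track phylogenetic distances.
set.seed(1)
x <- runif(8)
pd.bin <- as.matrix(dist(x))                       # stand-in phylogenetic distance
nd.bin <- as.matrix(dist(x + rnorm(8, sd = 0.05))) # stand-in niche difference

res <- mantel.sketch(pd.bin, nd.bin)

# Apply cutoffs in the spirit of r.cut and p.cut (values chosen for illustration).
r.cut <- 0.1
p.cut <- 0.05
significant <- res$r >= r.cut && res$p <= p.cut
```

In ps.bin the same kind of test is repeated for every bin with at least min.spn taxa and for every niche difference matrix in nd.list.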
Output is a list object with two elements.
Index |
Summary of phylogenetic signal test. The indexes include relative abundance of bins with significant phylogenetic signal in all bins (RAsig) or in bins with species number larger than min.spn (RAsig.adj), average correlation coefficient in significant bins (MeanR.sig) or in all bins (MeanR). |
detail |
correlation coefficient (r) and p value in each bin. |
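The summary indexes can be read as simple aggregates of the per-bin results. The sketch below shows one plausible way to derive them from a hypothetical per-bin table; the column names, the toy values, the use of >= and <= at the cutoffs, and the unweighted means are assumptions for illustration, not the package's documented formulas.

```r
# Hypothetical per-bin results: Mantel r and p, species number, relative abundance.
detail <- data.frame(
  bin    = c("bin1", "bin2", "bin3", "bin4"),
  r      = c(0.35, 0.05, 0.22, NA),   # bin4 is too small to be tested
  p      = c(0.01, 0.60, 0.03, NA),
  sp.num = c(12, 9, 7, 3),
  ra     = c(0.4, 0.3, 0.2, 0.1)
)
r.cut <- 0.1
p.cut <- 0.05
min.spn <- 5

tested <- detail$sp.num >= min.spn
sig <- tested & !is.na(detail$r) & detail$r >= r.cut & detail$p <= p.cut

# Relative abundance of significant bins among all bins, or among tested bins only.
RAsig     <- sum(detail$ra[sig]) / sum(detail$ra)
RAsig.adj <- sum(detail$ra[sig]) / sum(detail$ra[tested])

# Average correlation coefficient in significant bins, or in all bins with a result.
MeanR.sig <- mean(detail$r[sig])
MeanR     <- mean(detail$r, na.rm = TRUE)
```

A higher RAsig.adj suggests that the chosen binning (e.g. ds and bin.size.limit) retains more within-bin phylogenetic signal, which is why these indexes are useful when comparing binning settings.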
Version 4: 2021.5.24, debug to avoid the dimnames issue. Version 3: 2020.9.1, remove setwd; change dontrun to donttest and revise save.wd in help doc. Version 2: 2020.8.18, update help document, add example. Version 1: 2020.5.15
Daliang Ning
Ning, D., Yuan, M., Wu, L., Zhang, Y., Guo, X., Zhou, X. et al. (2020). A quantitative framework reveals ecological drivers of grassland microbial community assembly in response to warming. Nature Communications, 11, 4717.
Kane, M.J., Emerson, J., & Weston, S. (2013). Scalable Strategies for Computing with Massive Data. Journal of Statistical Software, 55(14), 1-19. URL http://www.jstatsoft.org/v55/i14/.
data("example.data")
comm=example.data$comm
env=example.data$env
tree=example.data$tree
# since bigmemory needs a specified folder,
# the following code is set as 'not test'.
# but you may test the code on your computer
# after changing the folder path for 'save.wd'.
wd0=getwd()
save.wd=paste0(tempdir(),"/pdbig.ps.bin")
# please change to the folder you want to save the big niche difference matrix.
nworker=2 # parallel computing thread number
pd.big=pdist.big(tree = tree, wd=save.wd, nworker = nworker)
niche.dif=dniche(env = env, comm = comm, method = "niche.value",
                 nworker = nworker, out.dist=FALSE, bigmemo=TRUE,
                 nd.wd = save.wd, nd.spname.file="nd.names.csv")
ds = 0.2 # setting can be changed to explore the best choice
bin.size.limit = 5 # setting can be changed to explore the best choice.
# here, bin.size.limit is set as 5 just for the small example dataset.
# For real data, usually try 12 to 48.
phylobin=taxa.binphy.big(tree = tree, pd.desc = pd.big$pd.file,
                         pd.spname = pd.big$tip.label, pd.wd = pd.big$pd.wd,
                         ds = ds, bin.size.limit = bin.size.limit,
                         nworker = nworker)
sp.bin=phylobin$sp.bin[,3,drop=FALSE]
sp.ra=colMeans(comm/rowSums(comm))
abcut=3 # by abcut, you may remove some species,
# if they are too rare to perform a reliable correlation test.
commc=comm[,colSums(comm)>=abcut,drop=FALSE]
dim(commc)
spname.use=colnames(commc)
binps=ps.bin(sp.bin = sp.bin, sp.ra = sp.ra, spname.use = spname.use,
             pd.desc = pd.big$pd.file, pd.spname = pd.big$tip.label,
             pd.wd = pd.big$pd.wd, nd.list = niche.dif$nd,
             nd.spname = niche.dif$names, ndbig.wd = niche.dif$nd.wd,
             cor.method = "pearson", r.cut = 0.1, p.cut = 0.05, min.spn = 5)
setwd(wd0)
Identify the community assembly process governing each bin in each turnover (i.e. pairwise comparison between two samples/communities), then calculate the relative importance of community assembly processes in each turnover.
qp.bin.js(sig.phy.bin=NULL, sig.phy2.bin=NULL, sig.tax.bin=NULL, bin.weight, sig.phy.cut=1.96, sig.phy2.cut=1.96, sig.tax.cut=0.95, check.name=FALSE)
sig.phy.bin |
matrix, the first two columns are sample IDs, thus each row represents a turnover between two samples. From the third column, each column shows the significance testing index of phylogenetic null model analysis (e.g. betaNTI) of a bin. NULL means the data is not available. |
sig.phy2.bin |
matrix, the same as sig.phy.bin, serves as the second phylogenetic metric when the phylogenetic null model test is based on two different beta diversity indexes, e.g. both betaMPD and betaMNTD. NULL means the data is not available. |
sig.tax.bin |
matrix, the first two columns are sample IDs, thus each row represents a turnover between two samples. From the third column, each column shows the significance testing index of taxonomic null model analysis (e.g. RC.Bray) of a bin. NULL means the data is not available. |
bin.weight |
matrix, the first two columns are sample IDs, thus each row represents a turnover between two samples. From the third column, each column shows the abundance sum of a bin in each pair of samples. |
sig.phy.cut |
numeric, a cutoff for the null model significance testing index based on a phylogenetic beta diversity index, e.g. betaNRI based on betaMPD, default is 1.96. |
sig.phy2.cut |
numeric, a cutoff for the null model significance testing index based on the second phylogenetic beta diversity index, e.g. betaNTI based on betaMNTD, default is 1.96. |
sig.tax.cut |
numeric, a cutoff for the null model significance testing index based on a taxonomic beta diversity index, e.g. RC based on Bray-Curtis, default is 0.95. |
check.name |
logic, whether to check that the sample IDs in different input matrices are in the same order. |
The framework was proposed by James Stegen (2013 and 2015) to identify the governing ecological process based on phylogenetic (betaNTI) and taxonomic (RC.Bray) null model analysis. Among all pairwise comparisons between samples/communities, the non-random phylogenetic turnovers recognized by phylogeny shuffling are counted as the influence of environmental selection, and the non-random taxonomic turnovers in the remaining pairwise comparisons are counted as the influence of dispersal limitation or homogenizing dispersal. The remainder is called undominated.
This function applies the framework to each phylogenetic bin and allows the use of betaNRI and/or betaNTI. When both betaNTI and betaNRI are provided, a turnover is identified as controlled by selection when either betaNRI or betaNTI is significant. Alternatively, RC or confidence level based on betaMPD and/or betaMNTD can also be used (Ning et al. 2020).
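The decision rule and the abundance weighting can be sketched for a single turnover as follows. This is a minimal base-R illustration under stated assumptions: the helper classify.process, the process labels, and the toy values are hypothetical, and the split of selection into heterogeneous (index above the cutoff) and homogeneous (index below the negative cutoff) follows the usual convention of the cited framework rather than anything stated in this help page.

```r
# Hypothetical helper: assign a process to each bin in one turnover.
classify.process <- function(sig.phy, sig.tax,
                             sig.phy.cut = 1.96, sig.tax.cut = 0.95) {
  out <- rep("Undominated", length(sig.phy))
  # taxonomic null model result, used only where phylogenetic turnover is random
  out[sig.tax > sig.tax.cut]  <- "Dispersal.Limitation"
  out[sig.tax < -sig.tax.cut] <- "Homogenizing.Dispersal"
  # a significant phylogenetic result takes priority and overwrites the above
  out[sig.phy > sig.phy.cut]  <- "Heterogeneous.Selection"
  out[sig.phy < -sig.phy.cut] <- "Homogeneous.Selection"
  out
}

# Toy values for four bins in one pairwise comparison (illustrative only).
bNRI <- c(bin1 = 2.5,  bin2 = -0.3, bin3 = 0.8,  bin4 = -2.2)
RC   <- c(bin1 = 0.99, bin2 = 0.97, bin3 = 0.10, bin4 = -0.99)
wt   <- c(bin1 = 0.4,  bin2 = 0.3,  bin3 = 0.2,  bin4 = 0.1)  # bin abundance sums

proc <- classify.process(bNRI, RC)

# Relative importance of each process in this turnover:
# abundance-weighted share of the bins governed by that process.
importance <- tapply(wt, proc, sum) / sum(wt)
importance
```

Note that bin1 and bin4 have significant RC values but are still assigned to selection, because the taxonomic test is only consulted for bins whose phylogenetic turnover is not significant.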
Output is a matrix. The first two columns are sample IDs, and each row represents a turnover between two samples. From the third column, each column shows the relative importance of a community assembly process in each turnover (pairwise comparison between each pair of samples).
Version 5: 2020.8.19, update help document, add example. Version 4: 2020.7.28, change bNTI.bin=NULL,bNRI.bin=NULL,RC.bin=NULL to sig.phy.bin and sig.tax.bin. Version 3: 2018.10.20, add bNRI.bin as option; add check.name. Version 2: 2016.3.26, add RC.all option Version 1: 2015.12.16
Daliang Ning
Ning, D., Yuan, M., Wu, L., Zhang, Y., Guo, X., Zhou, X. et al. (2020). A quantitative framework reveals ecological drivers of grassland microbial community assembly in response to warming. Nature Communications, 11, 4717.
Stegen JC, Lin X, Fredrickson JK, Chen X, Kennedy DW, Murray CJ et al. 2013. Quantifying community assembly processes and identifying features that impose them. Isme Journal 7: 2069-2079.
Stegen JC, Lin X, Fredrickson JK, Konopka AE. 2015. Estimating and mapping ecological processes influencing microbial community assembly. Frontiers in Microbiology 6.
data("icamp.out")
bNRIbins=icamp.out$detail$SigbMPDi
RCbins=icamp.out$detail$SigBCa
binwt=icamp.out$detail$bin.weight
qpbin=qp.bin.js(sig.phy.bin = bNRIbins, sig.tax.bin = RCbins,
                bin.weight = binwt, sig.phy.cut = 1.96,
                sig.tax.cut = 0.95, check.name = TRUE)
The earlier framework for quantifying assembly processes based on entire-community null model analysis (Stegen et al 2013, 2015), with bigmemory support added to handle large datasets.
qpen(comm = NULL, pd = NULL, pd.big.wd = NULL, pd.big.spname = NULL, tree = NULL, bNTI = NULL, RC = NULL, ab.weight = TRUE, meta.ab = NULL, exclude.conspecifics = FALSE, rand.time = 1000, sig.bNTI = 1.96, sig.rc = 0.95, nworker = 4, memory.G = 50, project = NA, wd = getwd(), output.detail = FALSE, save.bNTIRC = FALSE, taxo.metric = "bray", transform.method = NULL, logbase = 2, dirichlet = FALSE)
comm |
matrix or data.frame, community data, each row is a sample or site, each colname is a taxon (species or OTU or ASV), thus rownames should be sample IDs, colnames should be taxa IDs. |
pd |
a character or a matrix or NULL. If it is a character, it specifies the name of the file to hold the backingfile description of the big phylogenetic distance matrix, it is usually "pd.desc" if using default setting in pdist.big function. If it is matrix, it is the phylogenetic distance matrix with taxa IDs as colnames and rownames. The function will check the consistency of species names between phylogenetic matrix and comm. if pd is NULL, the function will calculate phylogenetic distance matrix from tree. |
pd.big.wd |
a folder path, where the bigmemory file of the phylogenetic distance matrix is saved. If pd has given the phylogenetic matrix, pd.big.wd should be NULL. If both pd and pd.big.wd are NULL, a folder will be created to save the big phylogenetic distance matrix. |
pd.big.spname |
character vector, taxa IDs in the same order as the big matrix of phylogenetic distances. not necessary if pd has given the phylogenetic matrix. |
tree |
phylogenetic tree, an object of class "phylo". |
bNTI |
a matrix, if beta nearest taxon index (betaNTI) values are available, just directly input them as a square matrix here. |
RC |
a matrix, if modified Raup-Crick metric based on Bray-Curtis (RCbray) values are available, just directly input them as a square matrix here. |
ab.weight |
logic, abundance-weighted or binary. default is TRUE, means abundance-weighted. |
meta.ab |
a numeric vector, to define the relative abundance of each species in the regional pool. Default setting is NULL, means to calculate meta.ab as average relative abundance of each species across the samples. |
exclude.conspecifics |
Logic, should conspecific taxa in different communities be excluded from MNTD calculations? default is FALSE. The same as in the function bmntd. |
rand.time |
integer, randomization times. default is 1000. |
sig.bNTI |
numeric, the cutoff of significant betaNTI, default is 1.96. |
sig.rc |
numeric, the cutoff of significant modified Raup-Crick metric, default is 0.95. |
nworker |
for parallel computing. Either a character vector of host names on which to run the worker copies of R, or a positive integer (in which case that number of copies is run on localhost). default is 4, means 4 threads will be run. |
memory.G |
numeric, to set the memory size as you need, so that calculation of large tree will not be limited by physical memory. unit is Gb. default is 50Gb. |
project |
character string, the prefix of saved output files. |
wd |
a folder path, where the files will be saved. |
output.detail |
logic, if TRUE, some detailed results including all null values will be output. |
save.bNTIRC |
logic, if TRUE, a file will be saved in the folder specified by wd. |
taxo.metric |
taxonomic beta diversity index, the same as 'method' in the function 'vegdist' in package 'vegan', including "manhattan", "euclidean", "canberra", "clark", "bray", "kulczynski", "jaccard", "gower", "altGower", "morisita", "horn", "mountford", "raup", "binomial", "chao", "cao" or "mahalanobis". If taxo.metric='bray' and transform.method=NULL, RC will be calculated based on Bray-Curtis dissimilarity as recommended in original iCAMP; otherwise, unit.sum setting will be ignored. |
transform.method |
character or a defined function, to specify how to transform community matrix before calculating taxonomic dissimilarity. Due to the definition of the phylogenetic dissimilarity (bMNTD), it is not affected by data transformation. If transform.method is a character, it should be a method name as in the function 'decostand' in package 'vegan', including 'total','max','freq','normalize','range','standardize','pa','chi.square','cmdscale','hellinger','log'. |
logbase |
numeric, the logarithm base used when transform.method='log'. |
dirichlet |
Logic. If TRUE, the taxonomic null model will use Dirichlet distribution to generate relative abundances in randomized community matrix. If the input community matrix has all row sums no more than 1, the function will automatically set dirichlet=TRUE. default is FALSE. |
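Some of the defaults described in the argument list can be reproduced by hand, which may help when preparing custom inputs. The base-R sketch below uses a made-up community matrix to show the default regional-pool abundance (meta.ab as mean relative abundance across samples), a Hellinger transform written without 'vegan', and the row-sum condition that the dirichlet description refers to. The object names and data are assumptions for illustration; the function's internal code may differ.

```r
# Made-up community matrix: rows are samples, columns are taxa (counts).
comm <- matrix(c(10, 0, 5,  5,
                  2, 8, 0, 10,
                  0, 4, 4, 12),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("S1", "S2", "S3"),
                               c("OTU1", "OTU2", "OTU3", "OTU4")))

# Relative abundance within each sample (each row sums to 1).
rel <- comm / rowSums(comm)

# Default meta.ab as described: mean relative abundance of each taxon across samples.
meta.ab <- colMeans(rel)

# Hellinger transform: square root of relative abundance, i.e. what
# transform.method = 'hellinger' refers to in 'decostand'.
hell <- sqrt(rel)

# Condition mentioned for dirichlet: do all row sums stay at or below 1?
# For count data like this it is FALSE, so the default null model applies.
looks.relative <- all(rowSums(comm) <= 1)
```

If comm were already a relative abundance table, the last check would typically be TRUE, which is the situation in which the documentation says dirichlet is switched on automatically.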
This method was developed by Dr. James Stegen (2013 and 2015), and iCAMP was developed from it. Among all pairwise comparisons between samples/communities, the non-random phylogenetic turnovers recognized by phylogeny shuffling are counted as the influence of environmental selection (betaNTI>1.96 or <-1.96), and the non-random taxonomic turnovers in the remaining pairwise comparisons are counted as the influence of dispersal limitation (RCbray>0.95) or homogenizing dispersal (RCbray<-0.95). The remainder is called undominated. Please read the references for details.
betaNTI is the standardized effect size between the observed and null values of beta mean nearest taxon distance (betaMNTD). RCbray is the modified Raup-Crick metric based on the taxonomic Bray-Curtis dissimilarity index.
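The two indexes can be written out for a single pairwise comparison as below. This is a hedged base-R sketch: ses and rc.sketch are hypothetical helper names, the null values are simulated rather than produced by the package's randomization, and the RC formula follows the commonly cited definition from Stegen et al (2013), in which the fraction of null values below the observed value (plus half of the ties) is rescaled to range from -1 to 1.

```r
# Standardized effect size: how many null standard deviations the
# observed value lies from the null mean (the form of betaNTI).
ses <- function(obs, null) (obs - mean(null)) / sd(null)

# Modified Raup-Crick style index, rescaled to [-1, 1].
rc.sketch <- function(obs, null) {
  frac <- (sum(null < obs) + 0.5 * sum(null == obs)) / length(null)
  (frac - 0.5) * 2
}

# Simulated null distributions for one pair of samples (illustrative values).
set.seed(42)
bMNTD.null <- rnorm(1000, mean = 0.10, sd = 0.01)
bMNTD.obs  <- 0.13
BC.null    <- runif(1000, min = 0.2, max = 0.6)
BC.obs     <- 0.45

bNTI <- ses(bMNTD.obs, bMNTD.null)
RC   <- rc.sketch(BC.obs, BC.null)

# Compare with the default cutoffs sig.bNTI = 1.96 and sig.rc = 0.95.
phylo.significant <- abs(bNTI) > 1.96
taxo.significant  <- abs(RC) > 0.95
```

In this toy case the observed betaMNTD sits well above its null distribution, so the turnover would be attributed to selection and the RC value would not be consulted.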
Bigmemory (Kane et al 2013) is used to deal with large datasets.
Output is a list with the following elements.
ratio |
The overall percentage of turnovers governed by each process |
result |
a matrix, showing betaMNTD, Bray-Curtis dissimilarity, betaNTI, RC, and the identified process governing each turnover. Each row represents a turnover between two samples. |
pd |
phylogenetic distance matrix or the backingfile name if phylogenetic distance is saved by bigmemory function. |
bMNTD |
a square matrix of observed betaMNTD. |
BC |
a square matrix of observed Bray-Curtis index. |
bMNTD.rand |
a matrix showing all null values of betaMNTD. |
BC.rand |
a matrix showing all null values of Bray-Curtis index. |
setting |
a data.frame showing all settings for this function. |
Version 7: 2021.4.18, add taxo.metric, transform.method, logbase, and dirichlet, to allow community data transform, dissimilar index other than Bray-Curtis, and relative abundances (values < 1) in the input community matrix. Version 6: 2020.12.5, use function bNTI.big to calculate betaNTI, which use bigmemory better. Version 5: 2020.9.1, remove setwd; change dontrun to donttest and revise save.wd in help doc. Version 4: 2020.8.21, update help document, add example. Version 3: 2016.2.15, add options to use bigmemory to handle large phylogenetic distance matrixes. Version 2: 2015.11.22
Daliang Ning
Stegen, J.C., Lin, X., Fredrickson, J.K., Chen, X., Kennedy, D.W., Murray, C.J. et al. (2013). Quantifying community assembly processes and identifying features that impose them. ISME J, 7, 2069.
Stegen, J.C., Lin, X., Fredrickson, J.K. & Konopka, A.E. (2015). Estimating and mapping ecological processes influencing microbial community assembly. Front Microbiol, 6, 370.
Kane, M.J., Emerson, J., & Weston, S. (2013). Scalable Strategies for Computing with Massive Data. Journal of Statistical Software, 55(14), 1-19. URL http://www.jstatsoft.org/v55/i14/.
data("example.data")
comm=example.data$comm
tree=example.data$tree
# since pdist.big needs to save output to a certain folder,
# the following code is set as 'not test'.
# but you may test the code on your computer
# after changing the folder path for 'save.wd'.
wd0=getwd()
nworker=2 # parallel computing thread number
rand.time=5 # usually use 1000 for real data.
# for a small dataset, phylogenetic distance matrix can be directly used.
pd=example.data$pd
qp=qpen(comm=comm, pd=pd, rand.time=rand.time, nworker=nworker)
# for a big dataset, pdist.big may be used
save.wd=paste0(tempdir(),"/pdbig.qpen")
# please change to the folder you want to save the pd.big output.
pd.big=pdist.big(tree = tree, wd=save.wd, nworker = nworker)
qp2=qpen(comm=comm, pd=pd.big$pd.file, pd.big.wd=pd.big$pd.wd,
         pd.big.spname=pd.big$tip.label, tree=tree,
         rand.time=rand.time, nworker=nworker)
setwd(wd0)
The framework for quantifying assembly processes based on entire-community null model analysis (Stegen et al 2013, 2015), with bigmemory support added to handle large datasets. This function can deal with local communities under different metacommunities (regional pools).
qpen.cm(comm, pd = NULL, pd.big.wd = NULL, pd.big.spname = NULL, tree = NULL, meta.group = NULL, meta.com = NULL, meta.frequency = NULL, meta.ab = NULL, ab.weight = TRUE, exclude.conspecifics = FALSE, rand.time = 1000, sig.bNTI = 1.96, sig.rc = 0.95, nworker = 4, memory.G = 50, project = NA, wd = getwd(), output.detail = FALSE, save.bNTIRC = FALSE, taxo.metric = "bray", transform.method = NULL, logbase = 2, dirichlet = FALSE)
comm |
matrix or data.frame, community data, each row is a sample or site, each colname is a taxon (species or OTU or ASV), thus rownames should be sample IDs, colnames should be taxa IDs. |
pd |
a character or a matrix or NULL. If it is a character, it specifies the name of the file to hold the backingfile description of the big phylogenetic distance matrix, it is usually "pd.desc" if using default setting in pdist.big function. If it is matrix, it is the phylogenetic distance matrix with taxa IDs as colnames and rownames. The function will check the consistency of species names between phylogenetic matrix and comm. if pd is NULL, the function will calculate phylogenetic distance matrix from tree. |
pd.big.wd |
a folder path, where the bigmemory file of the phylogenetic distance matrix is saved. If pd has given the phylogenetic matrix, pd.big.wd should be NULL. If both pd and pd.big.wd are NULL, a folder will be created to save the big phylogenetic distance matrix. |
pd.big.spname |
character vector, taxa IDs in the same order as the big matrix of phylogenetic distances. not necessary if pd has given the phylogenetic matrix. |
tree |
phylogenetic tree, an object of class "phylo". |
meta.group |
matrix or data.frame, a one-column (n x 1) matrix indicating which metacommunity each sample belongs to, so that different samples can belong to different metacommunities. Rownames are sample IDs. The first column is metacommunity names. If an n x m matrix is input, only the first column is used. Default is NULL, which means all samples belong to the same metacommunity. |
meta.com |
a list object, each element is a matrix or data.frame to define abundance (or relative abundance) of taxa in a metacommunity (regional pool). The element names indicate metacommunity names, which should be consistent with the metacommunity names defined in meta.group. If there is only one metacommunity, meta.com can be a matrix or data.frame to define taxa abundance (or relative abundance) in the metacommunity. Default is NULL, means to calculate metacommunity structure from comm according to metacommunities defined in meta.group. |
meta.frequency |
matrix or data.frame, each column represents a taxon, each row represents a metacommunity (regional pool), to define the occurrence frequency of each taxon in each metacommunity. The rownames indicate metacommunity names, which should be the same as the metacommunity names in meta.group. Default setting is NULL, means to calculate meta.frequency as occurrence frequency of each taxon in comm across the samples within each metacommunity defined by meta.group. |
meta.ab |
matrix or data.frame, each column represents a taxon, each row represents a metacommunity (regional pool), to define the abundance (or relative abundance) of each taxon in each metacommunity. The rownames indicate metacommunity names, which should be the same as the metacommunity names in meta.group. Default setting is NULL, means to calculate meta.ab as average relative abundance of each taxon in comm across the samples within each metacommunity defined by meta.group. |
ab.weight |
logic, abundance-weighted or binary. default is TRUE, means abundance-weighted. |
exclude.conspecifics |
Logic, should conspecific taxa in different communities be excluded from MNTD calculations? Default is FALSE. The same as in the function bmntd. |
rand.time |
integer, randomization times. default is 1000. |
sig.bNTI |
numeric, the cutoff of significant betaNTI, default is 1.96. |
sig.rc |
numeric, the cutoff of significant modified Raup-Crick metric, default is 0.95. |
nworker |
for parallel computing. Either a character vector of host names on which to run the worker copies of R, or a positive integer (in which case that number of copies is run on localhost). default is 4, means 4 threads will be run. |
memory.G |
numeric, to set the memory size as you need, so that calculation of large tree will not be limited by physical memory. unit is Gb. default is 50Gb. |
project |
character string, the prefix of saved output files. |
wd |
a folder path, where the files will be saved. |
output.detail |
logic, if TRUE, some detailed results including all null values will be output. |
save.bNTIRC |
logic, if TRUE, a file will be saved in the folder specified by wd. |
taxo.metric |
taxonomic beta diversity index, the same as 'method' in the function 'vegdist' in package 'vegan', including "manhattan", "euclidean", "canberra", "clark", "bray", "kulczynski", "jaccard", "gower", "altGower", "morisita", "horn", "mountford", "raup", "binomial", "chao", "cao" or "mahalanobis". If taxo.metric='bray' and transform.method=NULL, RC will be calculated based on Bray-Curtis dissimilarity as recommended in original iCAMP; otherwise, unit.sum setting will be ignored. |
transform.method |
character or a defined function, to specify how to transform the community matrix before calculating taxonomic dissimilarity. Due to the definition of the phylogenetic dissimilarity (bMNTD), it is not affected by data transformation. If transform.method is a character, it should be a method name as in the function 'decostand' in package 'vegan', including 'total','max','freq','normalize','range','standardize','pa','chi.square','cmdscale','hellinger','log'. |
logbase |
numeric, the logarithm base used when transform.method='log'. |
dirichlet |
Logic. If TRUE, the taxonomic null model will use Dirichlet distribution to generate relative abundances in randomized community matrix. If the input community matrix has all row sums no more than 1, the function will automatically set dirichlet=TRUE. default is FALSE. |
This function is particularly designed for samples from different metacommunities. The null model will randomize the community matrix under different metacommunities, separately (and independently). All other details are the same as the function qpen.
Output is a list with the following elements.
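The default metacommunity summaries described for meta.frequency and meta.ab can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's internal code: the toy comm and meta.group objects are made up, and "occurrence frequency" is assumed here to mean the proportion of samples within a metacommunity in which a taxon is present.

```r
# Toy community matrix: 4 samples (rows) x 3 taxa (columns), values are counts.
comm <- matrix(c(10, 0, 5,
                  0, 4, 6,
                  3, 3, 0,
                  0, 0, 8),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("S", 1:4), paste0("OTU", 1:3)))

# One-column data.frame assigning each sample to a metacommunity.
meta.group <- data.frame(meta.com = c("meta1", "meta1", "meta2", "meta2"),
                         row.names = rownames(comm))

# Sample IDs grouped by metacommunity name.
groups <- split(rownames(meta.group), meta.group[, 1])

# Occurrence frequency per metacommunity
# (assumption: proportion of samples in which the taxon is present).
meta.frequency <- t(sapply(groups, function(ids) {
  colMeans(comm[ids, , drop = FALSE] > 0)
}))

# Average relative abundance of each taxon within each metacommunity.
rel.ab <- comm / rowSums(comm)  # each row divided by its own total
meta.ab <- t(sapply(groups, function(ids) {
  colMeans(rel.ab[ids, , drop = FALSE])
}))
```

Both results have one row per metacommunity and one column per taxon, which is the layout the meta.frequency and meta.ab arguments expect.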
ratio |
The overall percentage of turnovers governed by each process |
result |
a matrix, showing betaMNTD, Bray-Curtis dissimilarity, betaNTI, RC, and the identified process governing each turnover. Each row represents a turnover between two samples. |
pd |
phylogenetic distance matrix, or the backingfile name if the phylogenetic distance is saved by a bigmemory function. |
bMNTD |
a square matrix of observed betaMNTD. |
BC |
a square matrix of observed Bray-Curtis index. |
bMNTD.rand |
a matrix showing all null values of betaMNTD. |
BC.rand |
a matrix showing all null values of Bray-Curtis index. |
setting |
a data.frame showing all settings for this function. |
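How the cutoffs sig.bNTI and sig.rc translate into the process assigned to each turnover can be sketched as below, following the decision rules of the QPEN framework (Stegen et al 2013, 2015). The function name classify_turnover and the process labels are hypothetical and may not match the strings returned in the 'result' element.

```r
# Hypothetical helper (not part of iCAMP): assign a process to each turnover
# from its betaNTI and RC values, using the default cutoffs documented above.
classify_turnover <- function(bNTI, RC, sig.bNTI = 1.96, sig.rc = 0.95) {
  ifelse(bNTI < -sig.bNTI, "homogeneous selection",
  ifelse(bNTI >  sig.bNTI, "heterogeneous selection",
  # |betaNTI| is not significant: use RC to separate the remaining processes
  ifelse(RC >  sig.rc, "dispersal limitation",
  ifelse(RC < -sig.rc, "homogenizing dispersal",
         "undominated"))))
}

# Toy values for five pairwise turnovers.
bNTI <- c(-2.5, 3.0, 0.5, 0.2, 0.0)
RC   <- c( 0.0, 0.0, 0.99, -0.99, 0.1)
process <- classify_turnover(bNTI, RC)

# Overall percentage of turnovers assigned to each process,
# analogous in spirit to the 'ratio' element.
ratio <- prop.table(table(process))
```

In this toy input each of the five processes is assigned exactly once, so every entry of ratio is 0.2.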
Version 1: 2021.8.3
Daliang Ning
Stegen, J.C., Lin, X., Fredrickson, J.K., Chen, X., Kennedy, D.W., Murray, C.J. et al. (2013). Quantifying community assembly processes and identifying features that impose them. ISME J, 7, 2069.
Stegen, J.C., Lin, X., Fredrickson, J.K. & Konopka, A.E. (2015). Estimating and mapping ecological processes influencing microbial community assembly. Front Microbiol, 6, 370.
Kane, M.J., Emerson, J., & Weston, S. (2013). Scalable Strategies for Computing with Massive Data. Journal of Statistical Software, 55(14), 1-19. URL http://www.jstatsoft.org/v55/i14/.
qpen, RC.cm, bNTI.cm, bNTI.big.cm
data("example.data")
comm=example.data$comm
tree=example.data$tree
# in this example, 10 samples from one metacommunity,
# the other 10 samples from another metacommunity.
meta.group=data.frame(meta.com=c(rep("meta1",10),rep("meta2",10)))
rownames(meta.group)=rownames(comm)
# since pdist.big need to save output to a certain folder,
# the following code is set as 'not test'.
# but you may test the code on your computer
# after change the folder path for 'save.wd'.
wd0=getwd()
nworker=2 # parallel computing thread number
rand.time=5 # usually use 1000 for real data.
# for a small dataset, phylogenetic distance matrix can be directly used.
pd=example.data$pd
qp=qpen.cm(comm=comm, meta.group=meta.group, pd=pd,
           rand.time=rand.time, nworker=nworker)
# for a big dataset, pdist.big may be used
save.wd=paste0(tempdir(),"/pdbig.qpen.cm")
# please change to the folder you want to save the pd.big output.
pd.big=pdist.big(tree = tree, wd=save.wd, nworker = nworker)
qp2=qpen.cm(comm=comm, meta.group=meta.group,
            pd=pd.big$pd.file, pd.big.wd=pd.big$pd.wd,
            pd.big.spname=pd.big$tip.label, tree=tree,
            rand.time=rand.time, nworker=nworker)
setwd(wd0)
Bootstrapping analysis of the results from QPEN (quantifying assembly processes based on entire-community null model analysis, Stegen et al 2013, 2015), to estimate the mean and variation of each index and each process influence in each group, and calculate the significance of the difference between groups.
qpen.test(qpen.result, treat, rand.time = 1000, between.group = FALSE, out.detail = TRUE, silent = FALSE)
qpen.result |
the output of the function 'qpen', or the element 'result' (data.frame) in qpen output. |
treat |
matrix or data.frame, each column indicates the group or treatment of each sample, rownames are sample IDs. |
rand.time |
integer, bootstrapping times. default is 1000. |
between.group |
logic. If TRUE, the turnovers between each pair of treatments will also be calculated as a group. |
out.detail |
logic. If TRUE, the 'qpen' results and the bootstrapping results in each group will also be output. |
silent |
logic. if FALSE, some messages will show during calculation. |
Bootstrapping of samples is used to estimate the variation of each index's mean and of the relative importance of each process in each group, as well as the effect size and significance of the difference between groups.
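The idea of bootstrapping samples to estimate the variation of a group-level mean can be sketched in base R as follows. This is a simplified illustration, not the procedure implemented in qpen.test: the toy index matrix, the helper boot_group_mean, and the equal-weight pooled standard deviation used for Cohen's d are all assumptions made for the example.

```r
set.seed(1)

# Toy data: 6 samples in two groups and a symmetric pairwise index (e.g. bNTI).
samples <- paste0("S", 1:6)
treat <- data.frame(group = rep(c("A", "B"), each = 3), row.names = samples)
idx <- matrix(runif(36), 6, 6, dimnames = list(samples, samples))
idx <- (idx + t(idx)) / 2
diag(idx) <- 0

# Hypothetical helper: resample the samples of one group with replacement and
# average the index over pairs of distinct samples, rand.time times.
boot_group_mean <- function(idx, ids, rand.time = 200) {
  replicate(rand.time, {
    repeat {  # redraw if only one distinct sample was picked (no pairs)
      pick <- sample(ids, length(ids), replace = TRUE)
      if (length(unique(pick)) > 1) break
    }
    sub <- idx[pick, pick]
    keep <- outer(pick, pick, "!=") & upper.tri(sub)
    mean(sub[keep])
  })
}

ids.A <- rownames(treat)[treat$group == "A"]
ids.B <- rownames(treat)[treat$group == "B"]
boot.A <- boot_group_mean(idx, ids.A)
boot.B <- boot_group_mean(idx, ids.B)

# Summary of the bootstrapped means and an effect size between groups
# (Cohen's d with an equal-weight pooled standard deviation).
boot.summary <- c(mean.A = mean(boot.A), sd.A = sd(boot.A),
                  mean.B = mean(boot.B), sd.B = sd(boot.B))
cohen.d <- (mean(boot.A) - mean(boot.B)) / sqrt((var(boot.A) + var(boot.B)) / 2)
```

With real data, rand.time would usually be 1000, as in the function default.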
Output is a list.
obs.summary |
the mean, standard deviation, quartile, and boxplot elements for each observed index (e.g. bMNTD, bNTI, etc.) in each group. |
boot.summary |
the mean, standard deviation, quartile, and boxplot elements of the average level of each index and the estimated relative importance of each process in each group. |
compare |
The relative difference, Cohen's d, and P value of the difference of each index or each process importance between different groups. |
group.results.detail |
the qpen results in each group. |
boot.detail |
the average value of each index or estimated relative importance of each process in each group in each time of bootstrapping. |
Version 2: 2021.6.9, speed up transformation for a huge number of comparisons by using package data.table.
Version 1: 2021.4.15, include the function into package iCAMP.
Daliang Ning
Ning, D., Yuan, M., Wu, L., Zhang, Y., Guo, X., Zhou, X. et al. (2020). A quantitative framework reveals ecological drivers of grassland microbial community assembly in response to warming. Nature Communications, 11, 4717.
Stegen, J.C., Lin, X., Fredrickson, J.K., Chen, X., Kennedy, D.W., Murray, C.J. et al. (2013). Quantifying community assembly processes and identifying features that impose them. ISME J, 7, 2069.
Stegen, J.C., Lin, X., Fredrickson, J.K. & Konopka, A.E. (2015). Estimating and mapping ecological processes influencing microbial community assembly. Front Microbiol, 6, 370.
data("example.data")
comm=example.data$comm
tree=example.data$tree
treat=example.data$treat
# since pdist.big need to save output to a certain folder,
# the following code is set as 'not test'.
# but you may test the code on your computer
# after change the folder path for 'save.wd'.
wd0=getwd()
nworker=2 # parallel computing thread number
rand.time=5 # usually use 1000 for real data.
# for a big dataset, pdist.big may be used
save.wd=paste0(tempdir(),"/pdbig.qpen.test")
# please change to the folder you want to save the pd.big output.
pd.big=pdist.big(tree = tree, wd=save.wd, nworker = nworker)
qp2=qpen(comm=comm, pd=pd.big$pd.file, pd.big.wd=pd.big$pd.wd,
         pd.big.spname=pd.big$tip.label, tree=tree,
         rand.time=rand.time, nworker=nworker)
qptest=qpen.test(qpen.result=qp2, treat=treat)
setwd(wd0)
Calculate modified Raup-Crick index based on Bray-Curtis dissimilarity (RC.Bray) for each phylogenetic bin. The null model algorithm will randomize the whole community data matrix of all bins.
RC.bin.bigc(com, sp.bin, rand = 1000, na.zero = TRUE, nworker = 4, memory.G = 50, big.method = c("loop", "no"), weighted = TRUE, unit.sum = NULL, meta.ab = NULL, sig.index=c("RC","Confidence","SES"), detail.null=FALSE,output.bray=FALSE, taxo.metric= "bray", transform.method=NULL, logbase=2, dirichlet=FALSE)
com |
community data matrix. rownames are sample names. colnames are species names. |
sp.bin |
one-column matrix, rownames are taxa IDs (i.e. OTU IDs), the only column shows the bin ID of each taxon. Bin IDs are integers. |
rand |
integer, randomization times. default is 1000. |
na.zero |
logic. If the community data matrix has any zero-sum row (sample), the Bray-Curtis index will be NA. Sometimes this kind of NA needs to be set to zero to avoid format problems in the following calculations. Default is TRUE. |
nworker |
for parallel computing. Either a character vector of host names on which to run the worker copies of R, or a positive integer (in which case that number of copies is run on localhost). default is 4, means 4 threads will be run. |
memory.G |
numeric, to set the memory size as you need, so that calculation of big data will not be limited by physical memory. unit is Gb. default is 50Gb. |
big.method |
character, the method to handle big data. loop, randomization once after another; no, use parallel computing. |
weighted |
Logic, consider abundances or not (just presence/absence). default is TRUE. |
unit.sum |
If unit.sum is set as a number or a numeric vector, the taxa abundances will be divided by unit.sum to calculate the relative abundances, and the Bray-Curtis index in each bin will become the Manhattan index divided by 2. Usually, unit.sum can be set as the sequencing depth in each sample. Default setting is NULL, meaning this special transformation is not applied. |
meta.ab |
a numeric vector, to define the relative abundance of each species in the regional pool. Default setting is NULL, meaning meta.ab is calculated as the average relative abundance of each species across the samples. |
sig.index |
character, the index for null model significance test. RC, modified Raup-Crick index (RC) based on taxonomic dissimilarity (default is Bray-Curtis, BC), i.e. count the number of null BC lower than observed BC plus a half of the number of null BC equal to observed BC, to get alpha, then calculate RCbray as (2 x alpha - 1). SES, standard effect size; Confidence, percentage of null values less extreme than the observed value, i.e. non-parametric one-side confidence level. default is RC. If input a vector, only the first element will be used. |
detail.null |
logic, if TRUE, the output will include all the null values. Default is FALSE. |
output.bray |
logic, if TRUE, the output will include observed taxonomic dissimilarity (default is Bray-Curtis). |
taxo.metric |
taxonomic beta diversity index, the same as 'method' in the function 'vegdist' in package 'vegan', including "manhattan", "euclidean", "canberra", "clark", "bray", "kulczynski", "jaccard", "gower", "altGower", "morisita", "horn", "mountford", "raup", "binomial", "chao", "cao" or "mahalanobis". If taxo.metric='bray' and transform.method=NULL, RC will be calculated based on Bray-Curtis dissimilarity as recommended in original iCAMP; otherwise, unit.sum setting will be ignored. |
transform.method |
character or a defined function, to specify how to transform the community matrix before calculating dissimilarity. If it is a character, it should be a method name as in the function 'decostand' in package 'vegan', including 'total','max','freq','normalize','range','standardize','pa','chi.square','cmdscale','hellinger','log'. |
logbase |
numeric, the logarithm base used when transform.method='log'. |
dirichlet |
Logic. If TRUE, the taxonomic null model will use Dirichlet distribution to generate relative abundances in randomized community matrix. If the input community matrix has all row sums no more than 1, the function will automatically set dirichlet=TRUE. default is FALSE. |
This function calculates RC.Bray in each phylogenetic bin while randomizing across all bins (Ning et al 2020). The Raup-Crick metric based on a taxonomic dissimilarity index was proposed by Chase et al (2011), and then modified to consider species relative abundances by Stegen et al (2013). The non-random part recognized by RC can reflect the influence of niche selection and extreme dispersal. The original code used relatively time-consuming loops; this function improves the efficiency and adds some parameters to fit iCAMP analysis.
SES (Kraft et al 2011) and Confidence (Ning et al 2020) are alternative significance testing indexes to evaluate how the observed beta diversity index deviates from null expectation.
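The RC and SES calculations described for sig.index can be sketched for a single pairwise turnover as follows. The helper name sig_from_null is hypothetical, and alpha is assumed to be the described count divided by the number of null values, so that RC falls between -1 and 1. Confidence is left out because its exact computation is not spelled out here.

```r
# Hypothetical helper (not part of iCAMP): compare one observed dissimilarity
# with its null values.
sig_from_null <- function(obs, null.values) {
  # Null values lower than the observed, plus half of the ties,
  # divided by the number of randomizations (assumption).
  alpha <- (sum(null.values < obs) + 0.5 * sum(null.values == obs)) /
    length(null.values)
  rc <- 2 * alpha - 1
  # Standardized effect size: deviation from the null mean in null SD units.
  ses <- (obs - mean(null.values)) / sd(null.values)
  c(RC = rc, SES = ses)
}

# Toy example: 2 null values are lower and 1 ties with the observed value,
# so alpha = (2 + 0.5) / 4 = 0.625 and RC = 0.25.
res <- sig_from_null(obs = 0.5, null.values = c(0.1, 0.2, 0.5, 0.9))
```

In the functions documented here, the same comparison is made for every pair of samples, and for every bin in the bin-level functions.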
Output is a list.
index |
list, each element is a square matrix of RC (or SES or Confidence based on Bray-Curtis) values of a bin. The elements (bins) are in the same order as in the input pdid.bin. |
sig.index |
character, indicates the index for null model significance test, RC, Confidence, or SES. |
BC.obs |
Output only if output.bray is TRUE. A list, each element is a square matrix of observed taxonomic dissimilarity (default is Bray-Curtis) index values of a bin. The elements (bins) are in the same order as in the input pdid.bin. |
rand |
Output only if detail.null is TRUE. A list, each element is a matrix with null values of Bray-Curtis index for each turnover of a bin. The elements (bins) are in the same order as in the input pdid.bin. |
Version 7: 2021.4.18, fix the bug when detail.null=TRUE and comm has only two samples.
Version 6: 2021.4.17, add taxo.metric, transform.method, logbase, and dirichlet, to allow community data transform, dissimilarity index other than Bray-Curtis, and relative abundances (values < 1) in the input community matrix.
Version 5: 2020.8.19, update help document, add example.
Version 4: 2020.8.2, add sig.index, detail.null, and output.bray.
Version 3: 2020.6.14, add meta.ab.
Version 2: 2018.10.3, add unit.sum.
Version 1: 2015.3.17
Daliang Ning
Ning, D., Yuan, M., Wu, L., Zhang, Y., Guo, X., Zhou, X. et al. (2020). A quantitative framework reveals ecological drivers of grassland microbial community assembly in response to warming. Nature Communications, 11, 4717.
Chase, J.M., Kraft, N.J.B., Smith, K.G., Vellend, M. & Inouye, B.D. (2011). Using null models to disentangle variation in community dissimilarity from variation in alpha-diversity. Ecosphere, 2, 1-11.
Stegen, J.C., Lin, X., Fredrickson, J.K., Chen, X., Kennedy, D.W., Murray, C.J. et al. (2013). Quantifying community assembly processes and identifying features that impose them. ISME J, 7, 2069.
Kraft, N.J.B., Comita, L.S., Chase, J.M., Sanders, N.J., Swenson, N.G., Crist, T.O. et al. (2011). Disentangling the drivers of beta diversity along latitudinal and elevational gradients. Science, 333, 1755-1758.
data("example.data")
comm=example.data$comm
sp.bin=example.data$sp.bin
rand.time=20 # usually use 1000 for real data.
nworker=2 # parallel computing thread number
RCbin=RC.bin.bigc(com=comm, sp.bin=sp.bin, rand=rand.time,
                  nworker=nworker, weighted=TRUE, sig.index="RC")
Calculate modified Raup-Crick index based on Bray-Curtis dissimilarity (RC.Bray) for each phylogenetic bin. The null model algorithm will randomize the whole community data matrix of all bins. This function can deal with local communities under different metacommunities (regional pools).
RC.bin.cm(com, sp.bin, rand = 1000, na.zero = TRUE, meta.group = NULL, meta.frequency = NULL, meta.ab = NULL, nworker = 4, memory.G = 50, big.method = c("loop", "no"), weighted = TRUE, unit.sum = NULL, sig.index = c("RC", "Confidence", "SES"), detail.null = FALSE, output.bray = FALSE, taxo.metric = "bray", transform.method = NULL, logbase = 2, dirichlet = FALSE)
com |
community data matrix. rownames are sample names. colnames are species names. |
sp.bin |
one-column matrix, rownames are taxa IDs (i.e. OTU IDs), the only column shows the bin ID of each taxon. Bin IDs are integers. |
rand |
integer, randomization times. default is 1000. |
na.zero |
logic. If the community data matrix has any zero-sum row (sample), the Bray-Curtis index will be NA. Sometimes this kind of NA needs to be set to zero to avoid format problems in the following calculations. Default is TRUE. |
meta.group |
matrix or data.frame, a one-column (n x 1) matrix indicating which metacommunity each sample belongs to, so that different samples can belong to different metacommunities. Rownames are sample IDs; the first column gives the metacommunity names. If an n x m matrix is supplied, only the first column is used. Default is NULL, meaning all samples belong to the same metacommunity. |
meta.frequency |
matrix or data.frame, each column represents a taxon, each row represents a metacommunity (regional pool), to define the occurrence frequency of each taxon in each metacommunity. The rownames indicate metacommunity names, which should be the same as the metacommunity names in meta.group. Default setting is NULL, means to calculate meta.frequency as occurrence frequency of each taxon in comm across the samples within each metacommunity defined by meta.group. |
meta.ab |
matrix or data.frame, each column represents a taxon, each row represents a metacommunity (regional pool), to define the abundance (or relative abundance) of each taxon in each metacommunity. The rownames indicate metacommunity names, which should be the same as the metacommunity names in meta.group. Default setting is NULL, meaning meta.ab is calculated as the average relative abundance of each taxon in comm across the samples within each metacommunity defined by meta.group. |
nworker |
for parallel computing. Either a character vector of host names on which to run the worker copies of R, or a positive integer (in which case that number of copies is run on localhost). default is 4, means 4 threads will be run. |
memory.G |
numeric, to set the memory size as you need, so that calculation of big data will not be limited by physical memory. unit is Gb. default is 50Gb. |
big.method |
character, the method to handle big data. loop, randomization once after another; no, use parallel computing. |
weighted |
Logic, consider abundances or not (just presence/absence). default is TRUE. |
unit.sum |
If unit.sum is set as a number or a numeric vector, the taxa abundances will be divided by unit.sum to calculate the relative abundances, and the Bray-Curtis index in each bin will become the Manhattan index divided by 2. Usually, unit.sum can be set as the sequencing depth in each sample. Default setting is NULL, meaning this special transformation is not applied. |
sig.index |
character, the index for null model significance test. RC, modified Raup-Crick index (RC) based on taxonomic dissimilarity (default is Bray-Curtis, BC), i.e. count the number of null BC lower than observed BC plus a half of the number of null BC equal to observed BC, to get alpha, then calculate RCbray as (2 x alpha - 1). SES, standard effect size; Confidence, percentage of null values less extreme than the observed value, i.e. non-parametric one-side confidence level. default is RC. If input a vector, only the first element will be used. |
detail.null |
logic, if TRUE, the output will include all the null values. Default is FALSE. |
output.bray |
logic, if TRUE, the output will include observed taxonomic dissimilarity (default is Bray-Curtis). |
taxo.metric |
taxonomic beta diversity index, the same as 'method' in the function 'vegdist' in package 'vegan', including "manhattan", "euclidean", "canberra", "clark", "bray", "kulczynski", "jaccard", "gower", "altGower", "morisita", "horn", "mountford", "raup", "binomial", "chao", "cao" or "mahalanobis". If taxo.metric='bray' and transform.method=NULL, RC will be calculated based on Bray-Curtis dissimilarity as recommended in original iCAMP; otherwise, unit.sum setting will be ignored. |
transform.method |
character or a defined function, to specify how to transform the community matrix before calculating dissimilarity. If it is a character, it should be a method name as in the function 'decostand' in package 'vegan', including 'total','max','freq','normalize','range','standardize','pa','chi.square','cmdscale','hellinger','log'. |
logbase |
numeric, the logarithm base used when transform.method='log'. |
dirichlet |
Logic. If TRUE, the taxonomic null model will use Dirichlet distribution to generate relative abundances in randomized community matrix. If the input community matrix has all row sums no more than 1, the function will automatically set dirichlet=TRUE. default is FALSE. |
This function is particularly designed for samples from different metacommunities. The null model will randomize the community matrix under different metacommunities, separately (and independently). All other details are the same as the function RC.bin.bigc.
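The effect of unit.sum on the within-bin dissimilarity can be illustrated with a small base R sketch. The two toy samples and the bin membership below are made up, and the code only reproduces the arithmetic stated in the unit.sum description (Manhattan distance on relative abundances divided by 2); it is not the package's internal code.

```r
# Two toy samples with a sequencing depth of 100 reads each.
x <- c(OTU1 = 30, OTU2 = 10, OTU3 = 60)
y <- c(OTU1 = 10, OTU2 = 40, OTU3 = 50)
unit.sum <- c(sum(x), sum(y))  # sequencing depth of each sample

# Suppose bin 1 contains OTU1 and OTU2, and bin 2 contains OTU3.
bin1 <- c("OTU1", "OTU2")
bin2 <- "OTU3"

# Ordinary Bray-Curtis within bin 1, using only the abundances in that bin.
bray.bin1 <- sum(abs(x[bin1] - y[bin1])) / sum(x[bin1] + y[bin1])

# With unit.sum: abundances become relative to the whole sample, and the
# within-bin index is the Manhattan distance divided by 2.
half_manhattan <- function(taxa) {
  sum(abs(x[taxa] / unit.sum[1] - y[taxa] / unit.sum[2])) / 2
}
rc.bin1 <- half_manhattan(bin1)  # (0.2 + 0.3) / 2 = 0.25
rc.bin2 <- half_manhattan(bin2)  # 0.1 / 2 = 0.05

# Whole-community Bray-Curtis on relative abundances, for comparison.
bray.whole <- sum(abs(x / unit.sum[1] - y / unit.sum[2])) / 2
```

With this transformation the bin-level values add up to the whole-community Bray-Curtis dissimilarity on relative abundances (0.25 + 0.05 = 0.3 here), which the ordinary within-bin Bray-Curtis does not.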
Output is a list.
index |
list, each element is a square matrix of RC (or SES or Confidence based on Bray-Curtis) values of a bin. The elements (bins) are in the same order as in the input pdid.bin. |
sig.index |
character, indicates the index for null model significance test, RC, Confidence, or SES. |
BC.obs |
Output only if output.bray is TRUE. A list, each element is a square matrix of observed taxonomic dissimilarity (default is Bray-Curtis) index values of a bin. The elements (bins) are in the same order as in the input pdid.bin. |
rand |
Output only if detail.null is TRUE. A list, each element is a matrix with null values of Bray-Curtis index for each turnover of a bin. The elements (bins) are in the same order as in the input pdid.bin. |
Version 1: 2021.8.4
Daliang Ning
Ning, D., Yuan, M., Wu, L., Zhang, Y., Guo, X., Zhou, X. et al. (2020). A quantitative framework reveals ecological drivers of grassland microbial community assembly in response to warming. Nature Communications, 11, 4717.
Chase, J.M., Kraft, N.J.B., Smith, K.G., Vellend, M. & Inouye, B.D. (2011). Using null models to disentangle variation in community dissimilarity from variation in alpha-diversity. Ecosphere, 2, 1-11.
Stegen, J.C., Lin, X., Fredrickson, J.K., Chen, X., Kennedy, D.W., Murray, C.J. et al. (2013). Quantifying community assembly processes and identifying features that impose them. ISME J, 7, 2069.
Kraft, N.J.B., Comita, L.S., Chase, J.M., Sanders, N.J., Swenson, N.G., Crist, T.O. et al. (2011). Disentangling the drivers of beta diversity along latitudinal and elevational gradients. Science, 333, 1755-1758.
data("example.data")
comm=example.data$comm
sp.bin=example.data$sp.bin
# in this example, 10 samples from one metacommunity,
# the other 10 samples from another metacommunity.
meta.group=data.frame(meta.com=c(rep("meta1",10),rep("meta2",10)))
rownames(meta.group)=rownames(comm)
rand.time=20 # usually use 1000 for real data.
nworker=2 # parallel computing thread number
RCbin=RC.bin.cm(com=comm, meta.group=meta.group, sp.bin=sp.bin,
                rand=rand.time, nworker=nworker,
                weighted=TRUE, sig.index="RC")
The Raup-Crick metric based on a taxonomic dissimilarity index (i.e. Bray-Curtis) uses null models to disentangle variation in community dissimilarity from variation in alpha-diversity. This function can deal with local communities under different metacommunities (regional pools).
RC.cm(comm, rand = 1000, na.zero = TRUE, nworker = 4, meta.group = NULL, meta.frequency = NULL, meta.ab = NULL, memory.G = 50, weighted = TRUE, unit.sum = NULL, sig.index = c("RC", "Confidence", "SES"), detail.null = FALSE, output.bray = FALSE, silent = FALSE, taxo.metric = "bray", transform.method = NULL, logbase = 2, dirichlet = FALSE)
comm |
Community data matrix. rownames are sample names. colnames are species names. |
rand |
integer, randomization times, default is 1000. |
na.zero |
logic. If the community data matrix has any zero-sum row (sample), the Bray-Curtis index will be NA. Sometimes this kind of NA needs to be set to zero to avoid format problems in the following calculations. Default is TRUE. |
nworker |
for parallel computing. Either a character vector of host names on which to run the worker copies of R, or a positive integer (in which case that number of copies is run on localhost). default is 4, means 4 threads will be run. |
meta.group |
matrix or data.frame, a one-column (n x 1) matrix indicating which metacommunity each sample belongs to, so that different samples can belong to different metacommunities. Rownames are sample IDs; the first column gives the metacommunity names. If an n x m matrix is supplied, only the first column is used. Default is NULL, meaning all samples belong to the same metacommunity. |
meta.frequency |
matrix or data.frame, each column represents a taxon, each row represents a metacommunity (regional pool), to define the occurrence frequency of each taxon in each metacommunity. The rownames indicate metacommunity names, which should be the same as the metacommunity names in meta.group. Default setting is NULL, means to calculate meta.frequency as occurrence frequency of each taxon in comm across the samples within each metacommunity defined by meta.group. |
meta.ab |
matrix or data.frame, each column represents a taxon, each row represents a metacommunity (regional pool), to define the abundance (or relative abundance) of each taxon in each metacommunity. The rownames indicate metacommunity names, which should be the same as the metacommunity names in meta.group. Default setting is NULL, meaning meta.ab is calculated as the average relative abundance of each taxon in comm across the samples within each metacommunity defined by meta.group. |
memory.G |
numeric, to set the memory size as you need, so that calculation of big data will not be limited by physical memory. unit is Gb. default is 50Gb. |
weighted |
Logic, whether to use abundance-weighted metrics. Default is TRUE. |
unit.sum |
If unit.sum is set as a number or a numeric vector, the taxa abundances will be divided by unit.sum to calculate the relative abundances, and the Bray-Curtis index in each bin will become the Manhattan index divided by 2. Usually, unit.sum can be set as the sequencing depth in each sample. Default setting is NULL, meaning this special transformation is not applied. |
sig.index |
character, the index for null model significance test. RC, modified Raup-Crick index (RC) based on Bray-Curtis (BC), i.e. count the number of null BC lower than observed BC plus a half of the number of null BC equal to observed BC, to get alpha, then calculate RCbray as (2 x alpha - 1). SES, standard effect size; Confidence, percentage of null values less extreme than the observed value, i.e. non-parametric one-side confidence level. default is RC. If input a vector, only the first element will be used. |
detail.null |
logic, if TRUE, the output will include all the null values. Default is FALSE. |
output.bray |
logic, if TRUE, the output will include observed taxonomic dissimilarity (default is Bray-Curtis). |
silent |
logic, if FALSE, some messages will show during calculation. |
taxo.metric |
taxonomic beta diversity index, the same as 'method' in the function 'vegdist' in package 'vegan', including "manhattan", "euclidean", "canberra", "clark", "bray", "kulczynski", "jaccard", "gower", "altGower", "morisita", "horn", "mountford", "raup", "binomial", "chao", "cao" or "mahalanobis". If taxo.metric='bray' and transform.method=NULL, RC will be calculated based on Bray-Curtis dissimilarity as recommended in original iCAMP; otherwise, unit.sum setting will be ignored. |
transform.method |
character or a defined function, to specify how to transform the community matrix before calculating dissimilarity. If it is a character, it should be a method name as in the function 'decostand' in package 'vegan', including 'total','max','freq','normalize','range','standardize','pa','chi.square','cmdscale','hellinger','log'. |
logbase |
numeric, the logarithm base used when transform.method='log'. |
dirichlet |
Logic. If TRUE, the taxonomic null model will use Dirichlet distribution to generate relative abundances in randomized community matrix. If the input community matrix has all row sums no more than 1, the function will automatically set dirichlet=TRUE. default is FALSE. |
While all other details are the same as in the function RC.pc, this function is particularly designed for samples from different metacommunities. The null model randomizes the community matrix within each metacommunity separately (and independently).
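The default handling of meta.ab described above (average relative abundance of each taxon within each metacommunity) can be sketched in base R. This is only an illustration with made-up counts; the object names `rel.ab` and `meta.ab.sketch` are hypothetical and the package's internal code may differ.

```r
# Toy community matrix (assumed values): 4 samples (rows) x 3 taxa (columns).
comm <- matrix(c(10, 0, 5,
                  8, 2, 5,
                  0, 9, 1,
                  1, 7, 2),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("S", 1:4), paste0("OTU", 1:3)))

# One-column data.frame assigning each sample to a metacommunity.
meta.group <- data.frame(meta.com = c("meta1", "meta1", "meta2", "meta2"),
                         row.names = rownames(comm))

# Relative abundance of each taxon within each sample.
rel.ab <- comm / rowSums(comm)

# Average relative abundance of each taxon within each metacommunity:
# one row per metacommunity, one column per taxon.
meta.ab.sketch <- t(sapply(split(rownames(comm), meta.group$meta.com),
                           function(ids) colMeans(rel.ab[ids, , drop = FALSE])))
meta.ab.sketch
```

Each row of the result sums to 1, and the row names match the metacommunity names in meta.group, which is the layout the meta.ab argument expects.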
Output is a list.
index |
a square matrix of RC (or SES or Confidence based on Bray-Curtis) values. |
sig.index |
character, indicates the index for null model significance test, RC, Confidence, or SES. |
BC.obs |
Output only if output.bray is TRUE. A square matrix of observed taxonomic dissimilarity index (default is Bray-Curtis dissimilarity) values. |
rand |
Output only if detail.null is TRUE. A matrix with all null values of Bray-Curtis index for each turnover. |
Version 1: 2021.8.2
Daliang Ning
Ning, D., Yuan, M., Wu, L., Zhang, Y., Guo, X., Zhou, X. et al. (2020). A quantitative framework reveals ecological drivers of grassland microbial community assembly in response to warming. Nature Communications, 11, 4717.
Chase, J.M., Kraft, N.J.B., Smith, K.G., Vellend, M. & Inouye, B.D. (2011). Using null models to disentangle variation in community dissimilarity from variation in alpha-diversity. Ecosphere, 2, 1-11.
Stegen, J.C., Lin, X., Fredrickson, J.K., Chen, X., Kennedy, D.W., Murray, C.J. et al. (2013). Quantifying community assembly processes and identifying features that impose them. ISME J, 7, 2069.
Kraft, N.J.B., Comita, L.S., Chase, J.M., Sanders, N.J., Swenson, N.G., Crist, T.O. et al. (2011). Disentangling the drivers of beta diversity along latitudinal and elevational gradients. Science, 333, 1755-1758.
data("example.data")
comm=example.data$comm
rand.time=20 # usually use 1000 for real data.
# in this example, 10 samples from one metacommunity,
# the other 10 samples from another metacommunity.
meta.group=data.frame(meta.com=c(rep("meta1",10),rep("meta2",10)))
rownames(meta.group)=rownames(comm)
nworker=2 # parallel computing thread number
RC=RC.cm(comm=comm, rand = rand.time, nworker = nworker,
         meta.group=meta.group, weighted = TRUE, sig.index="RC")
The Raup-Crick index based on a taxonomic dissimilarity index (e.g. Bray-Curtis) uses null models to disentangle variation in community dissimilarity from variation in alpha-diversity.
RC.pc(comm, rand = 1000, na.zero = TRUE, nworker = 4,
      memory.G = 50, weighted = TRUE, unit.sum = NULL,
      meta.ab = NULL, sig.index = c("RC", "Confidence", "SES"),
      detail.null = FALSE, output.bray = FALSE, silent = FALSE,
      taxo.metric = "bray", transform.method = NULL,
      logbase = 2, dirichlet = FALSE)
comm |
Community data matrix. rownames are sample names. colnames are species names. |
rand |
integer, randomization times, default is 1000. |
na.zero |
logic. If the community data matrix has any zero-sum row (sample), the Bray-Curtis index will be NA. Sometimes, this kind of NA needs to be set to zero to avoid format problems in subsequent calculations. Default is TRUE. |
nworker |
for parallel computing. Either a character vector of host names on which to run the worker copies of R, or a positive integer (in which case that number of copies is run on localhost). default is 4, means 4 threads will be run. |
memory.G |
numeric, the memory size to allocate, in Gb, so that calculation of big data will not be limited by physical memory. Default is 50 Gb. |
weighted |
logic, whether to use abundance-weighted metrics. Default is TRUE. |
unit.sum |
If unit.sum is set as a number or a numeric vector, the taxa abundances will be divided by unit.sum to calculate the relative abundances, and the Bray-Curtis index in each bin will become the Manhattan index divided by 2. Usually, unit.sum can be set as the sequencing depth in each sample. Default is NULL, which means this special transformation is not applied. |
meta.ab |
a numeric vector, to define the relative abundance of each species in the regional pool. Default is NULL, which means meta.ab is calculated as the average relative abundance of each species across the samples. |
sig.index |
character, the index for null model significance test. RC, modified Raup-Crick index (RC) based on Bray-Curtis (BC), i.e. count the number of null BC values lower than the observed BC plus half of the number of null BC values equal to the observed BC, divide by the number of randomizations to get alpha, then calculate RCbray as (2 x alpha - 1). SES, standardized effect size. Confidence, percentage of null values less extreme than the observed value, i.e. non-parametric one-sided confidence level. Default is RC. If a vector is given, only the first element will be used. |
detail.null |
logic, if TRUE, the output will include all the null values. Default is FALSE. |
output.bray |
logic, if TRUE, the output will include observed taxonomic dissimilarity (default is Bray-Curtis). |
silent |
logic, if FALSE, some messages will show during calculation. |
taxo.metric |
taxonomic beta diversity index, the same as 'method' in the function 'vegdist' in package 'vegan', including "manhattan", "euclidean", "canberra", "clark", "bray", "kulczynski", "jaccard", "gower", "altGower", "morisita", "horn", "mountford", "raup", "binomial", "chao", "cao" or "mahalanobis". If taxo.metric='bray' and transform.method=NULL, RC will be calculated based on Bray-Curtis dissimilarity as recommended in original iCAMP; otherwise, unit.sum setting will be ignored. |
transform.method |
character or a defined function, to specify how to transform the community matrix before calculating dissimilarity. If it is a character, it should be a method name as in the function 'decostand' in package 'vegan', including 'total','max','freq','normalize','range','standardize','pa','chi.square','cmdscale','hellinger','log'. |
logbase |
numeric, the logarithm base used when transform.method='log'. |
dirichlet |
Logic. If TRUE, the taxonomic null model will use Dirichlet distribution to generate relative abundances in randomized community matrix. If the input community matrix has all row sums no more than 1, the function will automatically set dirichlet=TRUE. default is FALSE. |
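The effect of unit.sum described above can be checked with a little arithmetic. The sketch below uses made-up counts and an assumed common sequencing depth; it only illustrates why Bray-Curtis equals Manhattan/2 for relative abundances that sum to 1, and why the two differ within a subset of taxa (a 'bin'). It does not reproduce the package's code.

```r
# Two toy samples with the same sequencing depth (assumed values).
depth <- 100
a <- c(40, 30, 30, 0)
b <- c(10, 50, 20, 20)

# Relative abundances after dividing by the sequencing depth.
ra <- a / depth
rb <- b / depth

# Whole community: Bray-Curtis equals Manhattan / 2 because each sample sums to 1.
bray <- sum(abs(ra - rb)) / sum(ra + rb)
half.manhattan <- sum(abs(ra - rb)) / 2

# Within a bin (here taxa 1 and 2), abundances no longer sum to 1,
# so the ordinary Bray-Curtis and Manhattan / 2 give different values.
bin <- 1:2
bin.bray <- sum(abs(ra[bin] - rb[bin])) / sum(ra[bin] + rb[bin])
bin.half.manhattan <- sum(abs(ra[bin] - rb[bin])) / 2

c(bray = bray, half.manhattan = half.manhattan,
  bin.bray = bin.bray, bin.half.manhattan = bin.half.manhattan)
```

Using Manhattan/2 within bins keeps each bin's dissimilarity on the scale of the whole-community relative abundances, which appears to be the point of setting unit.sum to the sequencing depth.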
The Raup-Crick index based on taxonomic dissimilarity was proposed by Chase et al (2011), and then modified by Stegen et al (2013) to consider species relative abundances. The non-random part recognized by RC can reflect the influence of niche selection and extreme dispersal. The original code used relatively time-consuming loops. This function improves the efficiency and adds some parameters to fit iCAMP analysis.
SES (Kraft et al 2011) and Confidence (Ning et al 2020) are alternative significance testing indexes to evaluate how the observed beta diversity index deviates from null expectation.
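The three significance indexes can be written out for a single pair of samples. The sketch below uses invented null values; the RC line follows the definition given for sig.index, the SES line uses the usual (observed - null mean) / null SD form, and the direction handling for Confidence is an assumption about how "less extreme" is read, not the package's confirmed implementation.

```r
# Null Bray-Curtis values from 10 randomizations and one observed value (toy numbers).
null.bc <- c(0.2, 0.3, 0.3, 0.4, 0.5, 0.5, 0.6, 0.7, 0.8, 0.9)
obs.bc <- 0.7

# Modified Raup-Crick: null values below the observed value count fully, ties count half.
alpha <- (sum(null.bc < obs.bc) + 0.5 * sum(null.bc == obs.bc)) / length(null.bc)
rc <- 2 * alpha - 1

# Standardized effect size.
ses <- (obs.bc - mean(null.bc)) / sd(null.bc)

# Confidence (assumed reading): share of null values on the far side of the observed
# value, taken in the direction in which the observation deviates from the null mean.
conf <- if (obs.bc >= mean(null.bc)) mean(null.bc < obs.bc) else mean(null.bc > obs.bc)

c(RC = rc, SES = ses, Confidence = conf)
```

With these numbers, 7 null values are lower than the observed value and 1 is equal, so alpha is 0.75 and RC is 0.5.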
Output is a list.
index |
a square matrix of RC (or SES or Confidence based on Bray-Curtis) values. |
sig.index |
character, indicates the index for null model significance test, RC, Confidence, or SES. |
BC.obs |
Output only if output.bray is TRUE. A square matrix of observed taxonomic dissimilarity index (default is Bray-Curtis dissimilarity) values. |
rand |
Output only if detail.null is TRUE. A matrix with all null values of Bray-Curtis index for each turnover. |
Version 8: 2021.4.18, fix the bug when detail.null=TRUE and comm has only two samples.
Version 7: 2021.4.17, add taxo.metric, transform.method, logbase, and dirichlet, to allow community data transform, dissimilarity indexes other than Bray-Curtis, and relative abundances (values < 1) in the input community matrix.
Version 6: 2020.8.19, update help document, add example.
Version 5: 2020.8.2, add sig.index, detail.null, and output.bray.
Version 4: 2020.6.14, add meta.ab.
Version 3: 2018.10.3, add unit.sum.
Version 2: 2015.8.5, revise the randomization algorithm according to Stegen et al 2013.
Version 1: 2015.2.12
Daliang Ning
Ning, D., Yuan, M., Wu, L., Zhang, Y., Guo, X., Zhou, X. et al. (2020). A quantitative framework reveals ecological drivers of grassland microbial community assembly in response to warming. Nature Communications, 11, 4717.
Chase, J.M., Kraft, N.J.B., Smith, K.G., Vellend, M. & Inouye, B.D. (2011). Using null models to disentangle variation in community dissimilarity from variation in alpha-diversity. Ecosphere, 2, 1-11.
Stegen, J.C., Lin, X., Fredrickson, J.K., Chen, X., Kennedy, D.W., Murray, C.J. et al. (2013). Quantifying community assembly processes and identifying features that impose them. ISME J, 7, 2069.
Kraft, N.J.B., Comita, L.S., Chase, J.M., Sanders, N.J., Swenson, N.G., Crist, T.O. et al. (2011). Disentangling the drivers of beta diversity along latitudinal and elevational gradients. Science, 333, 1755-1758.
data("example.data")
comm=example.data$comm
rand.time=20 # usually use 1000 for real data.
nworker=2 # parallel computing thread number
RC=RC.pc(comm=comm, rand = rand.time, nworker = nworker,
         weighted = TRUE, sig.index="RC")
To calculate the abundance-weighted or unweighted percentage of taxa following Sloan's neutral theory model. The original R code is from Burns et al (2016). Bootstrapping test and different metacommunity settings are added.
snm(comm, meta.com = NULL, taxon = NULL, alpha = 0.05, simplify = FALSE)
snm.boot(comm, rand = 1000, meta.com = NULL, taxon = NULL, alpha = 0.05, detail = TRUE)
snm.comm(comm, treat = NULL, meta.coms = NULL, meta.com = NULL, meta.group = NULL,
         rand = 1000, taxon = NULL, alpha = 0.05, two.tail = TRUE, output.detail = TRUE)
comm |
matrix or data.frame, community data, each row is a sample or site, each column is a taxon (a species or OTU or ASV), thus rownames should be sample IDs, colnames should be taxa IDs. |
meta.com |
matrix or data.frame, metacommunity data, each column represents a taxon, can be one or multiple rows. If NULL, the comm will be used to estimate relative abundance of each taxon in the metacommunity. In function snm.comm, if this is given, meta.group will be ignored. |
taxon |
matrix or data.frame, classification information of each taxon. Each row represents a taxon. Rownames are taxa IDs. |
alpha |
numeric, the significance level threshold counted as alpha value, usually 0.05. |
simplify |
logic, if FALSE, the function snm will perform more model fitting tests and return detailed statistical information. |
rand |
integer, randomization times. default is 1000. |
detail |
logic, if TRUE, the detailed output from the function snm will be included into the output of snm.boot. |
treat |
matrix or data.frame, indicating the group or treatment of each sample, rownames are sample IDs. if input multiple columns, they will be analyzed one column after another. |
meta.coms |
a list, to specify the metacommunity data for each level of treatments in each of the columns of 'treat'. A basic element is a matrix, each column represents a taxon, can be one or multiple rows. If this is given, 'meta.group' and 'meta.com' will be ignored. |
meta.group |
a matrix, to specify the metacommunity ID that each sample belongs to. It should have the same number of columns as 'treat' if 'treat' is given. If meta.coms, meta.com, and meta.group are all NULL, the samples are deemed to be from the same metacommunity. |
two.tail |
logic, to specify the p value is calculated as two-tail (TRUE) or one-tail (FALSE). |
output.detail |
logic, if TRUE, the output of the function snm.comm will include all bootstrapping values. |
The method was developed by Burns et al (2016) based on Sloan's model (Sloan et al 2006, 2007), which is derived from Hubbell's unified neutral theory (Hubbell 2001). According to neutral theory, the regional relative abundance and occurrence frequency of each taxon should follow a certain model (Sloan et al 2006). Thus, a taxon can be counted as a 'neutral taxon' if it falls inside a certain confidence interval of the neutral expectation, and the percentage of such taxa in a sample may be used to reflect the importance of neutral processes (Burns et al 2016).
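The relationship between regional relative abundance and occurrence frequency can be sketched with a beta-distribution form that is commonly used for Sloan's model. Everything below is an illustration under stated assumptions: the values of N, m, p, and freq.obs are invented, the beta parameterization and the Wilson score interval are assumptions about a typical implementation, and snm itself estimates m from the data rather than taking it as given.

```r
# Assumed toy inputs (not from the package).
N <- 1000                        # individuals (reads) per local community
m <- 0.05                        # dispersal (migration) rate, fixed here for illustration
n.samples <- 20                  # number of local communities
p <- c(0.0005, 0.005, 0.05)      # regional relative abundance of three taxa
freq.obs <- c(0.05, 0.60, 1.00)  # observed occurrence frequency of the same taxa
d <- 1 / N                       # detection limit of relative abundance

# Occurrence frequency expected under the neutral model (assumed beta form):
# probability that a taxon's local relative abundance exceeds the detection limit.
freq.pred <- pbeta(d, N * m * p, N * m * (1 - p), lower.tail = FALSE)

# Wilson score interval around the predicted frequency (assumed choice of interval).
z <- qnorm(1 - 0.05 / 2)
denom <- 1 + z^2 / n.samples
centre <- (freq.pred + z^2 / (2 * n.samples)) / denom
half <- z * sqrt(freq.pred * (1 - freq.pred) / n.samples +
                 z^2 / (4 * n.samples^2)) / denom
lwr <- centre - half
upr <- centre + half

# Classify each taxon relative to the neutral confidence interval.
type <- ifelse(freq.obs < lwr, "Below",
               ifelse(freq.obs > upr, "Above", "Neutral"))
data.frame(p, freq.obs, freq.pred, lwr, upr, type)
```

The unweighted percentage of neutral taxa would then be the share of rows with type "Neutral", and the weighted percentage the sum of their relative abundances.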
Output of snm is a list.
stats |
output only if simplify is FALSE, showing statistics about model fitting and coefficients. m, dispersal rate estimated by non-linear least squares (NLS). m.ci.2.5 and m.ci.97.5, the confidence interval of m. m.mle, dispersal rate calculated by maximum likelihood estimation. maxLL, binoLL, and poisLL, maximum likelihood function value (L) for the neutral theory model, binomial model, and Poisson model, respectively. Rsqr, Rsqr.bino, and Rsqr.pois, R squared (coefficient of determination) of the neutral theory model, binomial model, and Poisson model, respectively. RMSE, RMSE.bino, and RMSE.pois, root-mean-square error. AIC, BIC, AIC.bino, BIC.bino, AIC.pois, and BIC.pois, AIC and BIC of the different models. N, mean individual number in each local community. Samples, sample number. Richness, total number of taxa. Detect, detection limit of relative abundance in each local community, i.e. 1/N. |
detail |
output only if simplify is FALSE. A matrix, showing detailed information of each taxon. Each row represents a taxon. Columns are as below. p, observed regional relative abundance of each taxon. freq, observed occurrence frequency of each taxon. freq.pred, occurrence frequency predicted by the neutral theory model. pred.lwr and pred.upr, the confidence interval of occurrence frequency estimated by the neutral theory model. bino.pred, bino.lwr, bino.upr, pois.pred, pois.lwr, and pois.upr, the expectation and confidence interval of occurrence frequency estimated by the binomial and Poisson models, respectively. type, the taxon is identified as 'Neutral', or 'Below' or 'Above' the confidence interval of the neutral expectation. |
type.uw |
the percentage (unweighted) of taxa within (Neutral), below, or above the confidence interval of the neutral theory expected frequency (given the regional relative abundance). |
type.wt |
the abundance weighted percentage (relative abundance sum) of taxa within (Neutral), below, or above the confidence interval of the neutral theory expectation. |
sp.names |
the taxa IDs for each type. |
Output of snm.boot is a list.
stats, detail, type.uw, type.wt |
output only if detail is TRUE. The same as the output of snm. |
summary |
a matrix, showing observed values, mean, standard deviation, quartiles, boxplot key points and outliers, for the unweighted and weighted percentage of taxa within (Neutral), below, and above the confidence interval of neutral theory expectation. |
rand |
a matrix, showing the bootstrapping values of the unweighted and weighted percentage of different types of taxa. Each row represents one bootstrap replicate. |
Output of snm.comm is a list.
stats |
treat.type, the treatment type, a column name of the input 'treat'. treatment.id, the treatment name. Others are the same as the output 'stats' of snm. The most commonly used information is the dispersal rate values under different treatments, to investigate the effect of treatment on species dispersal. |
plot.detail |
a matrix, showing the output 'detail' of snm for each treatment. It is often used to show the type of each taxon and to draw the figure of the neutral confidence interval and each taxon. |
ratio.summary |
a matrix, showing the output 'summary' of snm.boot for each treatment. It is often used to draw box plots. |
pvalues |
a matrix, showing significance of difference between different treatments. |
boot.detail |
output only if output.detail is TRUE. A matrix, showing the output 'rand' of snm.boot for each treatment. |
Version 3: 2020.8.21, update help document, add example.
Version 2: 2018.4.16, add meta.group, meta.com, meta.coms, to consider if the samples are from different metacommunities.
Version 1: 2017.7.21
Daliang Ning
Burns, A.R., Stephens, W.Z., Stagaman, K., Wong, S., Rawls, J.F., Guillemin, K. et al. (2016). Contribution of neutral processes to the assembly of gut microbial communities in the zebrafish over host development. ISME J, 10, 655-664.
Sloan, W.T., Lunn, M., Woodcock, S., Head, I.M., Nee, S. & Curtis, T.P. (2006). Quantifying the roles of immigration and chance in shaping prokaryote community structure. Environ Microbiol, 8, 732-740.
Sloan, W.T., Woodcock, S., Lunn, M., Head, I.M. & Curtis, T.P. (2007). Modeling taxa-abundance distributions in microbial communities using environmental sequence data. Microbial Ecology, 53, 443-455.
Hubbell, S.P. (2001). The unified neutral theory of biodiversity and biogeography. Princeton University Press, Princeton, New Jersey.
data("example.data")
comm=example.data$comm
treat=example.data$treat
rand.time=10 # usually use 1000 for real data.
snmtest=snm.comm(comm = comm, treat = treat, rand = rand.time)
Phylogenetic binning for iCAMP analysis. To handle a large phylogenetic tree, the phylogenetic distance matrix should be calculated and saved in advance using the package 'bigmemory'.
taxa.binphy.big(tree, pd.desc, pd.spname, pd.wd, outgroup.tip = NA,
                outgroup.rm = TRUE, d.cut = NULL, ds = 0.2,
                bin.size.limit = 24, nworker = 4,
                d.cut.method = c("maxpd", "maxdroot"))
tree |
phylogenetic tree, an object of class "phylo". |
pd.desc |
the name of the file to hold the backingfile description of the phylogenetic distance matrix, it is usually "pd.desc" if using default setting in pdist.big function. |
pd.spname |
character vector, taxa IDs in the same order as in the big matrix of phylogenetic distances. |
pd.wd |
folder path, where the bigmemory file of the phylogenetic distance matrix is saved. |
outgroup.tip |
a vector of tip names (i.e. OTU IDs) which belong to a totally different lineage from all other tips, and thus can be used as an outgroup to root the tree. For example, archaeal OTUs may be set as outgroup tips when analyzing bacterial OTUs. Default is NA, which means no outgroup tip needs to be set. |
outgroup.rm |
logic, whether to remove the outgroup.tip after the tree is rooted. Default is TRUE. |
d.cut |
numeric, the distance from root to the truncating point of the tree. |
ds |
numeric, the general threshold of phylogenetic distance within which the phylogenetic signal is significant. default is 0.2. |
bin.size.limit |
integer, the minimal requirement of bin size (taxa number in a bin). Default setting is 24. |
nworker |
integer, for parallel computing. Either a character vector of host names on which to run the worker copies of R, or a positive integer (in which case that number of copies is run on localhost). default is 4, means 4 threads will be run. |
d.cut.method |
character, to specify the method to calculate d.cut from ds. 'maxpd' means based on maximum phylogenetic distance, d.cut = (maxpd - ds)/2. 'maxdroot' means based on maximum distance to root, d.cut = maxdroot - (ds/2), which is preferred if the tree only has one edge from the root. |
The phylogenetic tree is truncated at a certain phylogenetic distance to the root (d.cut), chosen as short as necessary so that all the remaining connections between tips (taxa) are shorter than a threshold. Within the threshold, phylogenetic signal is generally significant. The taxa derived from the same ancestor after the truncating point are grouped into the same strict bin. Then, each small bin is merged into the bin with the nearest relatives. This procedure is repeated until all merged bins have enough taxa (>= bin.size.limit). Bigmemory (Kane et al 2013) is used to deal with large datasets.
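The two d.cut formulas given for d.cut.method, and the idea of grouping taxa that share an ancestor beyond the truncating point, can be illustrated on a four-tip toy tree. The distances below are invented, the object names are hypothetical, and the grouping step is only a sketch of the truncation idea; the merging of small bins is not shown and the package's actual algorithm may differ.

```r
ds <- 0.2

# Toy root-to-tip distances and pairwise phylogenetic distances (assumed values).
droot <- c(t1 = 0.50, t2 = 0.50, t3 = 0.45, t4 = 0.40)
pd <- matrix(c(0.00, 0.10, 0.95, 0.90,
               0.10, 0.00, 0.95, 0.90,
               0.95, 0.95, 0.00, 0.25,
               0.90, 0.90, 0.25, 0.00),
             nrow = 4, byrow = TRUE,
             dimnames = list(names(droot), names(droot)))

# d.cut under the two documented methods.
d.cut.maxpd <- (max(pd) - ds) / 2
d.cut.maxdroot <- max(droot) - ds / 2

# Distance from the root to the most recent common ancestor of each pair of tips.
mrca.depth <- (outer(droot, droot, "+") - pd) / 2

# Two tips fall into the same strict bin if their common ancestor lies beyond
# the truncating point (farther from the root than d.cut).
same.bin <- mrca.depth >= d.cut.maxpd
diag(same.bin) <- TRUE
strict.bin <- apply(same.bin, 1, function(r) min(which(r)))
strict.bin
```

Here t1 and t2 share an ancestor 0.45 from the root and end up in one strict bin, while t3 and t4 diverge 0.30 from the root, before the truncating point, and stay in separate strict bins until small bins are merged.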
Output is a list.
sp.bin |
matrix, rownames are taxa IDs; the first column is strict bin IDs; the second column indicates which strict bin the taxon is merged into; the third column is the final bin IDs. |
bin.united.sp |
list, each element is a vector of taxa IDs, indicating the taxa in a final bin (after small bins are merged into nearest large bins). |
bin.strict.sp |
list, each element is a vector of taxa ID(s), indicating the taxa in a strict bin (before small bins are merged into large bins). |
state.strict |
matrix, status of each strict bin. bin.strict.id, the strict bin ID; bin.strict.taxa.num, taxa number in each strict bin; bin.pd.max, bin.pd.mean, and bin.pd.sd, the maximum, mean, and standard deviation of the pairwise phylogenetic distances in each strict bin. |
state.united |
matrix, status of each final bin. bin.united.id.old, the ID of the largest strict bin in each final bin; bin.united.tax.num, taxa number in each final bin; bin.pd.max, bin.pd.mean, and bin.pd.sd, the maximum, mean, and standard deviation of the pairwise phylogenetic distances in each strict bin. |
Version 4: 2021.6.26, fix a bug which may mess up some large taxa ids.
Version 4: 2021.6.4, add option d.cut.method to handle trees with only one edge from root.
Version 3: 2020.9.1, remove setwd. change dontrun to donttest and revise save.wd in help doc.
Version 2: 2020.8.19, update help document, add example.
Version 1: 2015.12.16
Daliang Ning
Ning, D., Yuan, M., Wu, L., Zhang, Y., Guo, X., Zhou, X. et al. (2020). A quantitative framework reveals ecological drivers of grassland microbial community assembly in response to warming. Nature Communications, 11, 4717.
Kane, M.J., Emerson, J., & Weston, S. (2013). Scalable Strategies for Computing with Massive Data. Journal of Statistical Software, 55(14), 1-19. URL http://www.jstatsoft.org/v55/i14/.
data("example.data")
comm=example.data$comm
tree=example.data$tree
# since pd.big needs a specified folder,
# the following code is set as 'not test'.
# but you may test the code on your computer
# after changing the folder path for 'save.wd'.
wd0=getwd()
save.wd=paste0(tempdir(),"/pdbig.taxa.binphy")
# please change to the folder where you want to save the big phylogenetic distance matrix.
nworker=2 # parallel computing thread number
pd.big=pdist.big(tree = tree, wd=save.wd, nworker = nworker)
ds = 0.2 # setting can be changed to explore the best choice
bin.size.limit = 5 # setting can be changed to explore the best choice.
# here set as 5 just for the small example dataset.
# For real data, usually try 12 to 48.
phylobin=taxa.binphy.big(tree = tree, pd.desc = pd.big$pd.file,
                         pd.spname = pd.big$tip.label, pd.wd = pd.big$pd.wd,
                         ds = ds, bin.size.limit = bin.size.limit,
                         nworker = nworker)
setwd(wd0)
To calculate the distance from root to tip(s) and node(s) on phylogenetic tree
tree.droot(tree, range = NA, nworker = 4, output.path = FALSE)
tree |
Phylogenetic tree, an object of class "phylo". |
range |
NA or a vector of integer, to specify the numbering of the tips/nodes of which the distances to root will be calculated. The numbering corresponds to those in the element "edge" of the tree. Default is NA, means to calculate all tips and nodes. |
nworker |
integer, for parallel computing. Either a character vector of host names on which to run the worker copies of R, or a positive integer (in which case that number of copies is run on localhost). default is 4, means 4 threads will be run. |
output.path |
logic, this function will call the function tree.path, if output.path is TRUE, the result of tree.path will be included in the output. Default is FALSE. |
A tool to get distances to root, used in phylogenetic binning.
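What "distance to root" means for each tip and node can be shown on a tiny tree stored in the same layout as the edge and edge.length elements of a "phylo" object. The tree below is made up and the helper `dist.to.root` is hypothetical; this is a plain base R walk from each node up to the root, not the package's parallel implementation.

```r
# A small rooted tree (assumed toy values): tips are 1 to 3, node 4 is the root,
# node 5 is an internal node. Each row of 'edge' is (parent, child).
edge <- matrix(c(4, 5,
                 5, 1,
                 5, 2,
                 4, 3), ncol = 2, byrow = TRUE)
edge.length <- c(0.3, 0.1, 0.2, 0.6)
root <- 4

# Walk from a node up to the root, adding the length of every edge on the way.
dist.to.root <- function(node) {
  d <- 0
  while (node != root) {
    i <- which(edge[, 2] == node)  # the single edge leading into this node
    d <- d + edge.length[i]
    node <- edge[i, 1]             # move to the parent
  }
  d
}

# Two-column matrix: numbering of tips/nodes and their distance to the root.
droot <- cbind(node = 1:5, distance.to.root = sapply(1:5, dist.to.root))
droot
```

The maximum of the second column over the tips is the quantity the 'maxdroot' option of phylogenetic binning relies on.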
If output.path is FALSE, output is a matrix where the first column indicates the numbering of nodes/tips and the second column has the distance to root. If output.path is TRUE, output is a list with two elements.
droot |
matrix, the first column indicates the numbering of nodes/tips and the second column has the distance to root. |
path |
result of tree.path, list of nodes and edge lengths from root to each tip and/or node. |
Version 2: 2020.8.19, add example.
Version 1: 2015.8.19
Daliang Ning
tree=ape::rtree(4)
nworker=2 # parallel computing thread number
droot=tree.droot(tree = tree, nworker = nworker)
To list all the nodes and edge lengths from root to every tip and/or node.
tree.path(tree, nworker = 4, range = NA, cum = c("no", "from.root", "from.tip", "both"))
tree |
Phylogenetic tree, an object of class "phylo". |
nworker |
for parallel computing. Either a character vector of host names on which to run the worker copies of R, or a positive integer (in which case that number of copies is run on localhost). default is 4. |
range |
a numeric vector, to specify nodes and/or tips to which the path from root will be calculated. default is NA, means all tips. |
cum |
method to calculate cumulative branch length. "no" means not to calculate cumulative length; "from.root" means to cumulate from root to tip; "from.tip" means to cumulate from tip to root; "both" means to calculate in both ways and return both results. |
This function can be useful in phylogenetic diversity analysis, for example, phylogenetic distance, phylogenetic Hill number, phylogenetic binning, etc.
Output is a list. The first layer (the names of the list) is the end of each path, usually the names of tips and/or nodes. In the second layer, [[1]] is the order of nodes between the root and the tip/node specified in the first layer, and [[2]] is the edge lengths. If cum="both", [[3]] is the cumulative length from the root and [[4]] is the cumulative length from the tip; otherwise, [[3]] is the cumulative length specified by cum.
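The two directions of cumulative length can be illustrated on one path. The edge lengths below are invented and are assumed to be listed in root-to-tip order; how tree.path orders its own output is not confirmed here, so treat the ordering as an assumption.

```r
# Edge lengths along a single path, listed from the root towards the tip (toy values).
edge.len <- c(0.3, 0.1, 0.2)

# Cumulating from the root: distance from the root to the end of each edge.
cum.from.root <- cumsum(edge.len)

# Cumulating from the tip: distance from the tip back to the start of each edge,
# reported in the same root-to-tip order as edge.len.
cum.from.tip <- rev(cumsum(rev(edge.len)))

rbind(edge.len, cum.from.root, cum.from.tip)
```

The last value of cum.from.root and the first value of cum.from.tip both equal the total path length, which is the tip's distance to the root.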
Version 1: 2016.2.14
Daliang Ning
data("example.data")
tree=example.data$tree
nworker=2 # parallel computing thread number
treepath=tree.path(tree=tree, nworker=nworker)