Title: | Prioritize and Delete Erroneous Taxa in a Large Phylogenetic Tree |
---|---|
Description: | Finds, prioritizes, and deletes erroneous taxa in a phylogenetic tree. This package calculates a score for each taxon in a tree. A higher score means the taxon is more erroneous; a score of zero means the taxon is not erroneous. The package can also remove all erroneous taxa automatically by repeating score calculation and pruning the taxa with the highest score. |
Authors: | Satoshi Aoki [aut, cph, cre], Keita Fukasawa [ctb] |
Maintainer: | Satoshi Aoki <[email protected]> |
License: | MIT + file LICENSE |
Version: | 3.0.0 |
Built: | 2024-12-08 07:09:45 UTC |
Source: | CRAN |
Internal Apoderoides functions
These are not to be called by the user.
Different values, depending on the function.
Iterate calc.Score() and deleteAnomaly() until every tree tip has a score of 0 or the number of tips falls to three or fewer.
autoDeletion( tree,OTUrankData=NULL, show_progress=TRUE,num_threads=1, prior="MRCA",criteria="composite" )
tree | A phylogenetic tree to be checked, as loaded from a file by ape::read.tree(). |
OTUrankData | A list of two character vectors. The first vector holds the tip labels of tree; the second holds the upper rank of each tip. When this is NULL, the function assumes that all tree tips are written as Genus_species (for example, Homo_sapiens) and calculates scores for genera. When this is not NULL, the function calculates scores based on the upper ranks in this list. |
show_progress | If TRUE, calculation progress is shown on the R console. |
num_threads | A positive integer specifying the number of threads used for the calculation. |
prior | "MRCA" or "centroid". Used only when criteria is "both". Defines which score takes priority when the MRCA-based and centroid-based scores are equal. |
criteria | The criterion node used to calculate the scores: "composite", "both", "MRCA" or "centroid". "MRCA" and "centroid" use the corresponding node to calculate both intruder and outlier scores. "composite" calculates intruder scores using the MRCA and outlier scores using the centroid, which is empirically known to be the most effective. "both" calculates both MRCA-based and centroid-based scores and uses the highest one to select the taxa to be deleted. |
A list of length three or four. The first element is a list containing the phylogenetic tree from which the erroneous taxa have been deleted. The second is a character vector of the deleted taxa. The third and fourth are lists of lists showing how the scores changed over the iterations. When criteria is "both", the third and fourth elements correspond to the MRCA-based and centroid-based scores, respectively. See calc.Score for the contents of the third and fourth elements.
data(testTree)
data(testRankList)
# calculate scores for the rank in the list, and delete all the erroneous tips
# this takes tens of seconds for calculation
result <- autoDeletion(testTree, testRankList)
# tree without erroneous tips
result[[1]]
# deleted tips
result[[2]]
# scores during iteration of score calculation and tip deletion
result[[3]]
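The stopping rule described for autoDeletion() can be illustrated without the package. The following is a minimal sketch in base R, not the package's implementation: the "tree" is just a character vector of made-up tip labels, `toy_score()` is a hypothetical stand-in for calc.Score(), and pruning is modelled as dropping one label per iteration.

```r
# Toy stand-in for a tree: a character vector of tip labels (made-up data).
tips <- c("A_a", "A_b", "B_a", "B_b", "C_a", "C_b")

# Hypothetical scoring function, NOT calc.Score(): it flags two made-up
# tips as erroneous for as long as they are still present.
toy_score <- function(tips) {
  bad <- c(B_a = 2, C_b = 1)
  scores <- setNames(numeric(length(tips)), tips)
  present <- intersect(names(bad), tips)
  scores[present] <- bad[present]
  scores
}

deleted <- character(0)
repeat {
  scores <- toy_score(tips)
  # Stop when every tip scores 0 or only three (or fewer) tips remain.
  if (all(scores == 0) || length(tips) <= 3) break
  worst <- names(scores)[which.max(scores)]  # tip with the highest score
  tips <- setdiff(tips, worst)               # "prune" it
  deleted <- c(deleted, worst)
}

tips     # remaining tips
deleted  # removed tips, in order of removal
```

With these made-up scores the loop removes "B_a" first and "C_b" second, then stops because every remaining tip scores 0.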
Calculate scores of a phylogenetic tree to find and prioritize erroneous taxa to delete.
calc.Score(tree,OTUrankData=NULL, allRankNames=NULL,allCentroids=NULL,allMRCAs=NULL,dropIndex=NULL, sort=TRUE,show_progress=TRUE,num_threads=1,criteria="composite")
tree | A phylogenetic tree to be checked, as loaded from a file by ape::read.tree(). |
OTUrankData | A list of two character vectors. The first vector holds the tip labels of tree; the second holds the upper rank of each tip. When this is NULL, the function assumes that all tree tips are written as Genus_species (for example, Homo_sapiens) and calculates the scores for genera. When this is not NULL, the function returns scores based on the upper ranks in this list. |
allRankNames | Optional. A character vector of the unique upper ranks of the tree tips. If given, the calculation is slightly faster. |
allCentroids | Optional. A list of numeric vectors of the centroids of the ranks. If given, the calculation is slightly faster. |
allMRCAs | Optional. A list of numeric vectors of the MRCAs of the ranks. If given, the calculation is slightly faster. |
dropIndex | Optional. A numeric vector of tree tip indices. The tips indicated by dropIndex are excluded from the score calculation. |
sort | If TRUE, the result is sorted in descending order of the total score. |
show_progress | If TRUE, calculation progress is shown on the R console. |
num_threads | A positive integer specifying the number of threads used to calculate the scores. |
criteria | The criterion node used to calculate the scores: "composite", "both", "MRCA" or "centroid". "MRCA" and "centroid" use the corresponding node to calculate both intruder and outlier scores. "composite" calculates intruder scores using the MRCA and outlier scores using the centroid, which is empirically known to be the most effective. "both" calculates both MRCA-based and centroid-based scores. |
A list containing one or two character matrices showing the scores. There are two matrices only when criteria is "both"; in that case the first holds the centroid-based scores and the second the MRCA-based scores. The columns of each matrix are as follows.
OTU | The name of the tree tip. |
perCladeOTUScore | The final score, calculated as "sum" divided by the number of OTUs with the same "#clade". |
sum | The sum of "intruder" and "outlier" for the OTU. |
intruder | The intruder score, showing how many ranks the OTU intrudes into. |
outlier | The outlier score, showing how far the OTU is from the core clade of the rank it belongs to. |
#clade | The clade number. Monophyletic OTUs of the same rank share the same #clade. |
data(testTree)
# calculate scores for genus
calc.Score(testTree)
data(testRankList)
# calculate scores for the rank in the list
calc.Score(testTree, testRankList)
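The relation between the "sum", "#clade" and "perCladeOTUScore" columns can be checked by hand. The sketch below uses made-up numbers in a plain data frame (calc.Score() itself returns character matrices), and the column name `clade` stands in for "#clade"; it only illustrates the arithmetic stated in the column descriptions.

```r
# Made-up values for five OTUs; "sum" is intruder + outlier.
toy <- data.frame(
  OTU      = c("A_a", "A_b", "A_c", "B_a", "B_b"),
  intruder = c(1, 1, 0, 2, 0),
  outlier  = c(1, 1, 0, 1, 0),
  clade    = c(1, 1, 2, 3, 4)  # stands in for the "#clade" column
)
toy$sum <- toy$intruder + toy$outlier

# Number of OTUs sharing each clade number, repeated for every row.
n_in_clade <- ave(toy$sum, toy$clade, FUN = length)

# Final score: "sum" divided by the number of OTUs in the same clade.
toy$perCladeOTUScore <- toy$sum / n_in_clade

# Sort in descending order of the final score.
toy[order(-toy$perCladeOTUScore), ]
```

Here "A_a" and "A_b" share clade 1, so their sum of 2 is halved to 1, while "B_a" sits alone in its clade and keeps its full sum of 3.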
Delete tip(s) with the highest score from a tree.
deleteAnomaly(tree,scores,OTUrankData=NULL,drop=FALSE,prior="MRCA")
tree | A phylogenetic tree to be checked, as loaded from a file by ape::read.tree(). |
scores | A list of scores calculated by calc.Score(). |
OTUrankData | A list of two character vectors. The first vector holds the tip labels of tree; the second holds the upper rank of each tip. When this is NULL, the function assumes that all tree tips are written as Genus_species (for example, Homo_sapiens) and that the scores were calculated for genera. When this is not NULL, the function assumes the scores were calculated based on the upper ranks in this list. |
drop | Whether the dropped OTU(s) are included in the returned tree. |
prior | "MRCA" or "centroid". Used only when the length of scores is two. Defines which score takes priority when the MRCA-based and centroid-based scores are equal. |
A list of length two. The first element is a character vector of the deleted tip label(s). The second is a list containing the phylogenetic tree without the deleted tip(s).
data(testTree)
data(testRankList)
# calculate scores for the rank in the list
score <- calc.Score(testTree, testRankList)
# delete tip with the highest score from tree
deleteAnomaly(testTree, score, testRankList)
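The role of prior when two score sets are supplied can be pictured as a tie-break between the highest MRCA-based and the highest centroid-based score. The following base R sketch is one possible reading of the argument descriptions, not the package's code: `choose_criterion()` and the score vectors are hypothetical.

```r
# Hypothetical helper: decide which score set selects the tip to delete.
# mrca_scores and centroid_scores are made-up numeric vectors named by OTU.
choose_criterion <- function(mrca_scores, centroid_scores, prior = "MRCA") {
  prior <- match.arg(prior, c("MRCA", "centroid"))
  top_mrca <- max(mrca_scores)
  top_centroid <- max(centroid_scores)
  if (top_mrca > top_centroid) return("MRCA")
  if (top_centroid > top_mrca) return("centroid")
  prior  # tie: fall back to the prioritized criterion
}

mrca <- c(A_a = 3, B_a = 1)
cent <- c(A_a = 2, B_a = 3)

choose_criterion(mrca, cent)                      # tie at 3, prior decides
choose_criterion(mrca, cent, prior = "centroid")  # tie at 3, prior decides
choose_criterion(c(A_a = 1), c(A_a = 2))          # centroid score is higher
```

In this reading, prior matters only in the first two calls, where both criteria peak at the same value.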
Obtain the upper rank of the scientific names in data. When OTUrankData is not provided, this function returns genus names, assuming the elements of data are scientific names joined by underscores, like "Homo_sapiens". When OTUrankData is provided, this function looks up each element of data in OTUrankData[[1]] and returns the element of OTUrankData[[2]] at the corresponding index.
get.upperRank(data,OTUrankData=NULL)
data | A character vector. |
OTUrankData | A list of two character vectors. The first vector holds the tip labels of the tree; the second holds the upper rank of each tip. When this is NULL, the function assumes that all elements of data are written as Genus_species (for example, Homo_sapiens) and returns genus names. When this is not NULL, the function returns the upper ranks given in this list. |
A character vector of upper ranks.
# obtain genus name
get.upperRank(c("Oxalis_nipponica", "Homo_sapiens"))
data(testTree)
data(testRankList)
# obtain higher rank names
get.upperRank(testTree$tip.label[1:3], testRankList)
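The two behaviours described for get.upperRank() can be mimicked in a few lines of base R. This is a sketch of the documented behaviour under stated assumptions, not the package's implementation: `upper_rank_sketch()` is a hypothetical name, the genus is taken to be everything before the first underscore, and names missing from the list come back as NA here, which may differ from what the package does.

```r
# Hypothetical re-creation of the documented behaviour of get.upperRank().
upper_rank_sketch <- function(data, OTUrankData = NULL) {
  if (is.null(OTUrankData)) {
    # Genus = everything before the first underscore.
    return(sub("_.*$", "", data))
  }
  # Find each name in the first vector and return the entry at the
  # same index of the second vector.
  OTUrankData[[2]][match(data, OTUrankData[[1]])]
}

upper_rank_sketch(c("Oxalis_nipponica", "Homo_sapiens"))
# "Oxalis" "Homo"

# Made-up rank list in the two-vector layout used by OTUrankData.
ranks <- list(
  c("Oxalis_nipponica", "Homo_sapiens"),
  c("Oxalidaceae", "Hominidae")
)
upper_rank_sketch("Homo_sapiens", ranks)
# "Hominidae"
```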
Calculate all the centroids of ranks in the tree. The centroid of a rank is equivalent to S-centroid by Slater (1978).
getAllCentroids(tree,OTUrankData=NULL,show_progress=FALSE,num_threads=1)
tree | A phylogenetic tree to be checked, as loaded from a file by ape::read.tree(). |
OTUrankData | A list of two character vectors. The first vector holds the tip labels of tree; the second holds the upper rank of each tip. When this is NULL, the function assumes that all tree tips are written as Genus_species (for example, Homo_sapiens) and calculates the centroids for genera. When this is not NULL, the function returns centroids based on the upper ranks in this list. |
show_progress | If TRUE, calculation progress is shown on the R console. |
num_threads | A positive integer specifying the number of threads used for the calculation. |
A list containing vectors of integers of centroid node number(s).
Slater P. J. 1978. Centers to centroids in graphs. Journal of Graph Theory 2: 209–222.
data(testTree)
# calculate centroids for genus
getAllCentroids(testTree)
data(testRankList)
# calculate centroids for the rank in the list
getAllCentroids(testTree, testRankList)
Calculate the most recent common ancestors (MRCAs) of all ranks in the tree. Unlike getMRCA() in the ape package, this function returns a tip node number when the rank is monotypic.
getAllMRCAs(tree,OTUrankData=NULL)
tree | A phylogenetic tree to be checked, as loaded from a file by ape::read.tree(). |
OTUrankData | A list of two character vectors. The first vector holds the tip labels of tree; the second holds the upper rank of each tip. When this is NULL, the function assumes that all tree tips are written as Genus_species (for example, Homo_sapiens) and calculates the MRCAs for genera. When this is not NULL, the function returns MRCAs based on the upper ranks in this list. |
A list of vectors, each holding an MRCA node number.
data(testTree)
# calculate MRCAs for genus
getAllMRCAs(testTree)
data(testRankList)
# calculate MRCAs for the rank in the list
getAllMRCAs(testTree, testRankList)
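The monotypic case mentioned in the description of getAllMRCAs() can be shown on a hand-built edge matrix. The sketch below is base R only and is not the package's algorithm: `path_to_root()` and `mrca_sketch()` are hypothetical helpers, and the matrix follows the node numbering used by ape (tips numbered first, then the root, then the other internal nodes) for the four-tip tree ((t1,t2),(t3,t4)).

```r
# Edge matrix (parent, child) for ((t1,t2),(t3,t4)):
# tips are 1..4, the root is 5, internal nodes are 6 and 7.
edge <- rbind(c(5, 6), c(6, 1), c(6, 2), c(5, 7), c(7, 3), c(7, 4))

# Path from a node up to the root, starting with the node itself.
path_to_root <- function(node, edge) {
  path <- node
  repeat {
    parent <- edge[edge[, 2] == node, 1]
    if (length(parent) == 0) break  # reached the root
    path <- c(path, parent)
    node <- parent
  }
  path
}

# Hypothetical MRCA helper that returns the tip itself for a single tip.
mrca_sketch <- function(tips, edge) {
  if (length(tips) == 1) return(tips)  # monotypic rank
  paths <- lapply(tips, path_to_root, edge = edge)
  common <- Reduce(intersect, paths)   # nodes shared by every path
  common[1]                            # first shared node on the way up
}

mrca_sketch(c(1, 2), edge)  # 6: t1 and t2 are sisters
mrca_sketch(c(1, 3), edge)  # 5: the root
mrca_sketch(3, edge)        # 3: a single tip is its own "MRCA"
```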
Example data to test Apoderoides. testRankList is a list of two elements. The first element is the tip label of testTree, and the second element is corresponding family names of the tips.
data(testRankList)
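A rank list with the same two-element layout as testRankList can be assembled for other trees. The sketch below is a base R illustration with made-up tip labels; `family_of` is a hypothetical genus-to-family lookup that the user would have to supply.

```r
# Made-up tip labels in Genus_species form.
tip_labels <- c("Oxalis_nipponica", "Oxalis_corniculata", "Homo_sapiens")

# Hypothetical lookup from genus to family, supplied by the user.
family_of <- c(Oxalis = "Oxalidaceae", Homo = "Hominidae")

# Genus = everything before the first underscore.
genus <- sub("_.*$", "", tip_labels)

# First element: tip labels. Second element: the family of each tip.
rank_list <- list(tip_labels, unname(family_of[genus]))
str(rank_list)
```

The two vectors must have the same length and order, so that the entry at a given index of the second vector is the upper rank of the tip at that index of the first.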
Example data to test Apoderoides. testTree is a tree of land plants based on chlB gene.
data(testTree)