Package 'treebalance'

Title: Computation of Tree (Im)Balance Indices
Description: The aim of the 'R' package 'treebalance' is to provide functions for the computation of a large variety of (im)balance indices for rooted trees. The package accompanies the book ''Tree balance indices: a comprehensive survey'' by M. Fischer, L. Herbst, S. Kersting, L. Kuehn and K. Wicke (2023) <ISBN: 978-3-031-39799-8>, <doi:10.1007/978-3-031-39800-1>, which gives a precise definition for the terms 'balance index' and 'imbalance index' (Chapter 4) and provides an overview of the terminology in this manual (Chapter 2). For further information on (im)balance indices, see also Fischer et al. (2021) <https://treebalance.wordpress.com>. Considering both established and new (im)balance indices, 'treebalance' provides (among others) functions for calculating the following 18 established indices and index families: the average leaf depth, the B1 and B2 index, the Colijn-Plazzotta rank, the normal, corrected, quadratic and equal weights Colless index, the family of Colless-like indices, the family of I-based indices, the Rogers J index, the Furnas rank, the rooted quartet index, the s-shape statistic, the Sackin index, the symmetry nodes index, the total cophenetic index and the variance of leaf depths. Additionally, we include 9 tree shape statistics that satisfy the definition of an (im)balance index but have not been thoroughly analyzed in terms of tree balance in the literature yet. These are: the total internal path length, the total path length, the average vertex depth, the maximum width, the modified maximum difference in widths, the maximum depth, the maximum width over maximum depth, the stairs1 and the stairs2 index. As input, most functions of 'treebalance' require a rooted (phylogenetic) tree in 'phylo' format (as introduced in 'ape' 1.9 in November 2006). 'phylo' is used to store (phylogenetic) trees with no vertices of out-degree one. For further information on the format we kindly refer the reader to E. Paradis (2012) <http://ape-package.ird.fr/misc/FormatTreeR_24Oct2012.pdf>.
Authors: Mareike Fischer [aut], Lina Herbst [aut], Sophie Kersting [aut], Luise Kuehn [aut, cre], Kristina Wicke [aut]
Maintainer: Luise Kuehn <[email protected]>
License: GPL-3
Version: 1.2.0
Built: 2024-12-09 06:32:31 UTC
Source: CRAN

Help Index


Calculation of the area per pair index for rooted trees

Description

This function calculates the area per pair index APP(T)APP(T) for a given rooted tree TT. The tree must not necessarily be binary. APP(T)APP(T) is defined as

APP(T)=2n(n1)1i<jndT(i,j)APP(T)=\frac{2}{n\cdot(n-1)}\cdot\sum_{1\leq i<j\leq n} d_T(i,j)

in which nn denotes the number of leaves in TT, and dT(i,j)d_T(i,j) denotes the number of edges on the path between the two leaves ii and jj. Note that APP(T)APP(T) can also be computed from the Sackin index S(T)S(T) and the total cophenetic index TCI(T)TCI(T) of TT as APP(T)=2nS(T)4n(n1)TCI(T)APP(T)=\frac{2}{n}\cdot S(T)-\frac{4}{n(n-1)}\cdot TCI(T) enabling efficient computation.

The area per pair index does not fulfill the definition of an (im)balance index given in "Tree balance indices: a comprehensive survey" (Fischer et al., 2023).

For details on the area per pair index, see also Chapter 24 in "Tree balance indices: a comprehensive survey" (https://doi.org/10.1007/978-3-031-39800-1_24).

Usage

areaPerPairI(tree)

Arguments

tree

A rooted tree in phylo format.

Value

areaPerPairI returns the area per pair index of the given tree.

Author(s)

Luise Kuehn

References

T. Araujo Lima, F. M. D. Marquitti, and M. A. M. de Aguiar. Measuring Tree Balance with Normalized Tree Area. arXiv e-prints, art. arXiv:2008.12867, 2020.

Examples

tree <- ape::read.tree(text="((((,),),(,)),(((,),),(,)));")
areaPerPairI(tree)

Calculation of the average leaf depth index for rooted trees

Description

This function calculates the average leaf depth N(T)N(T) for a given rooted tree TT. The tree must not necessarily be binary. N(T)N(T) is defined as

N(T)=1nuVin(T)nuN(T)=\frac{1}{n}\cdot\sum_{u\in V_{in}(T)} n_u

in which nn denotes the number of leaves in TT, Vin(T)V_{in}(T) denotes the set of inner nodes of TT and nun_u denotes the number of leaves in the pending subtree that is rooted at the inner node uu. Note that N(T)N(T) can also be computed from the Sackin index S(T)S(T) as N(T)=1nS(T)N(T)=\frac{1}{n}\cdot S(T). The average leaf depth is an imbalance index.

For n=1n=1 the function returns N(T)=0N(T)=0 and a warning.

For details on the average leaf depth, see also Chapter 6 in "Tree balance indices: a comprehensive survey" (https://doi.org/10.1007/978-3-031-39800-1_6).

Usage

avgLeafDepI(tree)

Arguments

tree

A rooted tree in phylo format.

Value

avgLeafDepI returns the average leaf depth of the given tree.

Author(s)

Luise Kuehn

References

M. J. Sackin. "Good" and "Bad" Phenograms. Systematic Biology, 21(2):225-226, 1972. doi: 10.1093/sysbio/21.2.225.

K.-T. Shao and R. R. Sokal. Tree Balance. Systematic Zoology, 39(3):266, 1990.
doi: 10.2307/2992186.

Examples

tree <- ape::read.tree(text="((((,),),(,)),(((,),),(,)));")
avgLeafDepI(tree)

Calculation of the average vertex depth for rooted trees

Description

This function calculates the average vertex depth AVD(T)AVD(T) for a given rooted tree TT. The tree must not necessarily be binary. AVD(T)AVD(T) is defined as

AVD(T)=1V(T)xV(T)δ(x)AVD(T)=\frac{1}{|V(T)|}\cdot\sum_{x\in V(T)} \delta(x)

in which V(T)V(T) denotes the set of vertices of TT, and δ(x)\delta(x) denotes the depth of the vertex xx. The average vertex depth is a normalised version of the total path length and an imbalance index.

For n=1n=1 the function returns AVD(T)=0AVD(T)=0 and a warning.

For details on the average vertex depth, see also Chapter 23 in "Tree balance indices: a comprehensive survey" (https://doi.org/10.1007/978-3-031-39800-1_23).

Usage

avgVertDep(tree)

Arguments

tree

A rooted tree in phylo format.

Value

avgVertDep returns the average vertex depth of the given tree.

Author(s)

Luise Kuehn

References

A. Herrada et al. Scaling properties of protein family phylogenies. BMC Evolutionary Biology, 11(1), June 2011. doi: 10.1186/1471-2148-11-155.

Examples

tree <- ape::read.tree(text="((((,),),(,)),(((,,),),(,)));")
avgVertDep(tree)

Calculation of the B1 index for rooted trees

Description

This function calculates the B1B1 index B1(T)B1(T) for a given rooted tree TT. The tree must not necessarily be binary. B1(T)B1(T) is defined as

B1(T)=uVin(T){ρ}h(Tu)1B1(T)=\sum_{u\in V_{in}(T)\setminus\{\rho\}} h(T_u)^{-1}

in which Vin(T){ρ}V_{in}(T)\setminus\{\rho\} denotes the set of inner vertices of TT without the root, and h(Tu)h(T_u) denotes the height of the pending subtree rooted at uu. When restricted to binary trees, the B1B1 index is a balance index. For arbitrary trees it does not fulfill the definition of an (im)balance index.

For n=1n=1 the function returns B1(T)=0B1(T)=0 and a warning.

For details on the B1 index, see also Chapter 10 in "Tree balance indices: a comprehensive survey" (https://doi.org/10.1007/978-3-031-39800-1_10).

Usage

B1I(tree)

Arguments

tree

A rooted tree in phylo format.

Value

B1I returns the B1 index of the given tree.

Author(s)

Sophie Kersting

References

K.-T. Shao and R. R. Sokal. Tree Balance. Systematic Zoology, 39(3):266, 1990.
doi: 10.2307/2992186.

Examples

tree <- ape::read.tree(text="((((,),),(,)),(((,),),(,)));")
B1I(tree)

Calculation of the B2 index for rooted trees

Description

This function calculates the B2 index B2(T)B2(T) for a given rooted tree TT. The tree must not necessarily be binary. B2(T)B2(T) is defined as

B2(T)=xVL(T)pxlog(px)B2(T)=-\sum_{x\in V_L(T)} p_x\cdot log(p_x)

in which VL(T)V_L(T) denotes the leaf set of TT, and in which

px=vanc(x)1child(v)p_x=\prod_{v\in anc(x)} \frac{1}{|child(v)|}

denotes the probability of reaching leaf xx when starting at the root and assuming equiprobable branching at each vertex vanc(x)v\in anc(x) with anc(x)anc(x) denoting the set of ancestors of xx excluding xx. child(v)child(v) denotes the set of children of the inner vertex vv.
The B2B2 index is a balance index.

For n=1n=1 the function returns B2(T)=0B2(T)=0 and a warning.

For details on the B2 index, see also Chapter 11 in "Tree balance indices: a comprehensive survey" (https://doi.org/10.1007/978-3-031-39800-1_11).

Usage

B2I(tree, logbase = 2)

Arguments

tree

A rooted tree in phylo format.

logbase

The base that shall be used for the logarithm. For binary trees it is common to use base 2.

Value

B2I returns the B2 index of the given tree.

Author(s)

Sophie Kersting, Luise Kuehn

References

K.-T. Shao and R. R. Sokal. Tree Balance. Systematic Zoology, 39(3):266, 1990.
doi: 10.2307/2992186.

P.-M. Agapow and A. Purvis. Power of Eight Tree Shape Statistics to Detect Nonrandom Diversification: A Comparison by Simulation of Two Models of Cladogenesis. Systematic Biology, 51(6):866-872, 2002.doi: 10.1080/10635150290102564.
URL https://doi.org/10.1080/10635150290102564.

M. Hayati, B. Shadgar, and L. Chindelevitch. A new resolution function to evaluate tree shape statistics. PLOS ONE, 14(11):e0224197, 2019. doi: 10.1371/journal.pone.0224197.
URL https://doi.org/10.1371/journal.pone.0224197.

M. Kirkpatrick and M. Slatkin. Searching for evolutionary patterns in the shape of a phylogenetic tree. Evolution, 47(4):1171-1181, 1993. doi: 10.1111/j.1558-5646.1993.tb02144.x.

Examples

tree <- ape::read.tree(text="((((,),),(,)),(((,),),(,)));")
B2I(tree)

Calculation of the cherry index for rooted trees

Description

This function calculates the cherry index ChI(T)ChI(T) for a given rooted tree TT. The tree must not necessarily be binary. ChI(T)ChI(T) is defined as the number of cherries in the tree. A cherry is a pair of leaves that have the same direct ancestor. Note, if a vertex uu has xx leaves as direct descendants, the number of cherries induced by uu is binom(x,2)binom(x,2).

The cherry index does not fulfill the definition of an (im)balance index given in "Tree balance indices: a comprehensive survey" (Fischer et al., 2023).

For details on the cherry index, see also Chapter 24 in "Tree balance indices: a comprehensive survey" (https://doi.org/10.1007/978-3-031-39800-1_24).

Usage

cherryI(tree)

Arguments

tree

A rooted tree in phylo format.

Value

cherryI returns the cherry index of the given tree.

Author(s)

Sophie Kersting

References

A. McKenzie and M. Steel. Distributions of cherries for two models of trees. Mathematical Biosciences, 164(1):81-92, 2000. doi: 10.1016/s0025-5564(99)00060-7.

Examples

tree <- ape::read.tree(text="((((,),),(,)),(((,),),(,)));")
cherryI(tree)
tree <- ape::read.tree(text="((,),((((,),),),(,)));")
cherryI(tree)
tree <- ape::read.tree(text="((,,,),(,,));")
cherryI(tree)

Calculation of the Colless index for rooted binary trees

Description

This function calculates variants of the Colless index for a given rooted binary tree TT. All of them are imbalance indices.

The original Colless index C(T)C(T) is defined as

C(T)=uVin(T)nuanubC(T)=\sum_{u \in V_{in}(T)} |n_{u_a}-n_{u_b}|

in which Vin(T)V_{in}(T) denotes the set of all inner vertices of TT, and in which nuan_{u_a} and nubn_{u_b} denote the number of leaves in the two pending subtrees that are rooted at the direct descendants of uu.

The corrected Colless index IC(T)I_C(T) of TT is defined as IC(T)=0I_C(T)=0 for n=1n=1 and n=2n=2 and for n>2n>2 as

IC(T)=2C(T)(n1)(n2)I_C(T)=\frac{2\cdot C(T)}{(n-1)\cdot(n-2)}

in which nn denotes the total number of leaves in TT.

The quadratic Colless index QC(T)QC(T) of TT is defined as

QC(T)=uVin(T)nuanub2QC(T)=\sum_{u\in V_{in}(T)} |n_{u_a}-n_{u_b}|^2



Special cases: For n=1n=1 the function returns C(T)=IC(T)=QC(T)=0C(T)=I_C(T)=QC(T)=0 and a warning.

For details on the original, corrected and quadratic Colless indices, see also Chapters 12, 13 and 15 in "Tree balance indices: a comprehensive survey" (https://doi.org/10.1007/978-3-031-39800-1_12, https://doi.org/10.1007/978-3-031-39800-1_13, https://doi.org/10.1007/978-3-031-39800-1_15).

Usage

collessI(tree, method = "original")

Arguments

tree

A rooted binary tree in phylo format.

method

A character string specifying the version that shall be computed. It can be one of the following: "original", "corrected", "quadratic".

Value

collessI returns the Colless index of the given tree according to the chosen method.

Author(s)

Luise Kuehn and Sophie Kersting

References

D. Colless. Review of Phylogenetics: the theory and practice of phylogenetic systematics. Systematic Zoology, 1982. ISSN 00397989.

T. M. Coronado, M. Fischer, L. Herbst, F. Rossello, and K. Wicke. On the minimum value of the Colless index and the bifurcating trees that achieve it. Journal of Mathematical Biology, 2020.doi: 10.1007/s00285-020-01488-9.

S. B. Heard. Patterns in tree balance among cladistic, phenetic, and randomly generated phylogenetic trees. Evolution, 1992. doi: 10.1111/j.1558-5646.1992.tb01171.x.

K. Bartoszek, T. M. Coronado, A. Mir, and F. Rossello. Squaring within the Colless index yields a better balance index. Mathematical Biosciences, 331:108503, 2021. doi: 10.1016/j.mbs.2020.108503.

Examples

tree <- ape::read.tree(text="((((,),),(,)),(((,),),(,)));")
collessI(tree, method="original")
collessI(tree, method="corrected")
collessI(tree, method="quadratic")

Calculation of the Colless-like indices for rooted trees

Description

This function calculates the Colless-like index for a given rooted tree TT according to the chosen weight function ff and dissimilarity DD. The Colless-like index CL(T)CL(T) relative to DD and ff is the sum of the (D,f)(D,f)-balance values over all inner vertices of the tree. More precisely,

CL(T)=vVin(T)balD,f(v)CL(T)=\sum_{v\in V_{in}(T)} bal_{D,f}(v)

where Vin(T)V_{in}(T) is the set of inner vertices of TT. The (D,f)(D,f)-balance value of vv with children v1,...,vkv_1,...,v_k is computed as

balD,f(v)=D(fs(Tv1),...,fs(Tvk))bal_{D,f}(v)=D(fs(T_{v_1}),...,fs(T_{v_k}))

with DD denoting the dissimilarity and fsfs denoting the f.size.
The f.size fs(T)fs(T) of a tree TT uses the function ff, which maps any integer to a non-negative real number, to build a weighted sum of the out-degrees of all vertices in TT. More precisely,

fs(T)=vV(T)f(deg+(v))fs(T)=\sum_{v\in V(T)} f(deg+(v))

where V(T)V(T) is the set of all vertices of TT and deg+(v)deg+(v) denotes the out-degree (i.e. the number of children) of the vertex vv. The ff-functions that are already implemented are f(x)=exf(x)=e^x and f(x)=ln(x+e)f(x)=ln(x+e).
The dissimilarity D(x1,...,xk)D(x_1,...,x_k) of a vector x1,...,xkx_1,...,x_k assigns a non-negative value to the vector, is independent of the order of the vector entries and equals zero if and only if x1=...=xkx_1=...=x_k. In this implementation the following dissimilarity functions are already built-in: mean deviation from the median (mdmmdm), the sample variance (varvar) and the sample standard deviation (sdsd).
collesslikeI also allows the use of other functions for the weight function ff and the dissimilarity DD.

Special cases: For n=1n=1 the function returns CL(T)=0CL(T)=0 and a warning.

For details on the family of Colless-like indices, see also Chapter 16 in "Tree balance indices: a comprehensive survey" (https://doi.org/10.1007/978-3-031-39800-1_16).

Usage

collesslikeI(tree, f.size, dissim)

Arguments

tree

A rooted binary tree in phylo format.

f.size

A character string specifying the function ff that shall be used to compute the f.size. It can be one of the following: "exp", "ln" or the name of a function as a string.

dissim

A character string specifying the dissimilarity that shall be used. It can be one of the following: "mdm", "var", "sd" or the name of a function as a string.

Value

collesslikeI returns the Colless-like index of the given tree according to the chosen function and dissimilarity.

Author(s)

Luise Kuehn, Sophie Kersting

References

A. Mir, L. Rotger, and F. Rossello. Sound Colless-like balance indices for multifurcating trees. PLOSONE, 13(9):e0203401, 2018. doi: 10.1371/journal.pone.0203401

Examples

tree <- ape::read.tree(text="((((,),),(,)),(((,),),(,)));")
collesslikeI(tree, f.size="exp", dissim="mdm")
collesslikeI(tree, f.size="exp", dissim="var")
collesslikeI(tree, f.size="ln", dissim="sd")
myfsize <- function(x) return(x+1)
mydissim <- function(x) return (var(x))
collesslikeI(tree, f.size="myfsize",dissim = "mydissim")

Calculation of the Colijn-Plazzotta rank for rooted binary trees

Description

This function calculates the Colijn-Plazzotta rank CP(T)CP(T) for a given rooted binary tree TT.

For a binary tree TT, the Colijn-Plazzotta rank CP(T)CP(T) is recursively defined as CP(T)=1CP(T)=1 if TT consists of only one leaf and otherwise

CP(T)=12CP(T1)(CP(T1)1)+CP(T2)+1CP(T)=\frac{1}{2}\cdot CP(T_1)\cdot(CP(T_1)-1)+CP(T_2)+1

with CP(T1)CP(T2)CP(T_1) \geq CP(T_2) being the ranks of the two pending subtrees rooted at the children of the root of TT. This rank of TT corresponds to its position in the lexicographically sorted list of (i,ji,j): (1),(1,1),(2,1),(2,2),(3,1),... The Colijn-Plazzotta rank of binary trees has been shown to be an imbalance index.

For n=1n=1 the function returns CP(T)=1CP(T)=1 and a warning.

Note that problems can sometimes arise even for trees with small leaf numbers due to the limited range of computable values (ranks can reach INF quickly).

For details on the Colijn-Plazzotta rank, see also Chapter 21 in "Tree balance indices: a comprehensive survey" (https://doi.org/10.1007/978-3-031-39800-1_21).

Usage

colPlaLab(tree)

Arguments

tree

A rooted binary tree in phylo format.

Value

colPlaLab returns the Colijn-Plazotta rank of the given tree. Since the values can get quite large, the function returns them in big.z format (package gmp).

Author(s)

Sophie Kersting, Luise Kuehn

References

C. Colijn and G. Plazzotta. A Metric on Phylogenetic Tree Shapes. Systematic Biology, doi: 10.1093/sysbio/syx046.

N. A. Rosenberg. On the Colijn-Plazzotta numbering scheme for unlabeled binary rooted trees. Discrete Applied Mathematics, 2021. doi: 10.1016/j.dam.2020.11.021.

Examples

tree <- ape::read.tree(text="((((,),),(,)),(((,),),(,)));")
colPlaLab(tree)

Generation of the rooted binary tree corresponding to a given Colijn-Plazzotta rank

Description

This function generates the unique rooted binary tree TT (in phylo format) that corresponds to the given Colijn-Plazzotta rank CP(T)CP(T). It is the inverse function of colPlaLab().

colPlaLab(): For a given rooted binary tree TT, CP(T)CP(T) is recursively defined as CP(T)=1CP(T)=1 if TT consists of only one vertex and otherwise CP(T)=12CP(T1)(CP(T1)1)+CP(T2)+1CP(T)=\frac{1}{2}\cdot CP(T_1)\cdot(CP(T_1)-1)+CP(T_2)+1 with CP(T1)CP(T2)CP(T_1) \geq CP(T_2) being the ranks of the two pending subtrees rooted at the children of the root of TT. The rank CP(T)CP(T) of TT corresponds to its position in the lexicographically sorted list of (i,ji,j): (1),(1,1),(2,1),(2,2),(3,1),...

colPlaLab_inv(): For a given rank CPCP the corresponding tree TT can be reconstructed by starting from one vertex ρ\rho (labelled CPCP) and recursively splitting vertices whose labels hh are greater than 1 into two children with the labels:

i=1+8h721i=\left\lceil\frac{1+\sqrt{8\cdot h-7}}{2}\right\rceil-1

and

j=hi(i1)21j=h-\frac{i\cdot(i-1)}{2}-1

until there are no more vertices to split.
For CP=1CP=1 the function returns the smallest possible tree in the phylo format: the tree consisting of a single edge.

Note that problems can arise for extremely high input values (>10e+18).

For details on the Colijn-Plazzotta rank, see also Chapter 21 in "Tree balance indices: a comprehensive survey" (https://doi.org/10.1007/978-3-031-39800-1_21).

Usage

colPlaLab_inv(rank)

Arguments

rank

An integer denoting the Colijn-Plazzotta rank of the sought tree.

Value

colPlaLab_inv returns the unique rooted binary tree for the given rank.

Author(s)

Sophie Kersting

References

C. Colijn and G. Plazzotta. A Metric on Phylogenetic Tree Shapes. Systematic Biology, 67(1):113-126,2018. doi: 10.1093/sysbio/syx046.

N. A. Rosenberg. On the Colijn-Plazzotta numbering scheme for unlabeled binary rooted trees. Discrete Applied Mathematics, 291:88-98, 2021. doi: 10.1016/j.dam.2020.11.021.

Examples

colPlaLab_inv(22)

Calculation of the equal weights Colless index for rooted binary trees

Description

This function calculates the equal weights Colless index I2(T)I_2(T) for a given rooted binary tree TT. I2(T)I_2(T) is defined as

I2(T)=1n2uVin(T),nu>2nuanubnu2I_2(T)=\frac{1}{n-2}\cdot\sum_{u\in V_{in}(T), n_u>2} \frac{|n_{u_a}-n_{u_b}|}{n_u-2}

in which Vin(T)V_{in}(T) denotes the set of all inner vertices of TT, and in which nun_u, nuan_{u_a} and nubn_{u_b} denote the number of leaves in the pending subtrees that are rooted at uu and the two direct descendants of uu. The equal weights Colless index is an imbalance index.

For n=1n=1 and n=2n=2 the function returns I2(T)=0I_2(T)=0 and a warning.

For details on the equal weigths Colless index, see also Chapter 14 in "Tree balance indices: a comprehensive survey" (https://doi.org/10.1007/978-3-031-39800-1_14).

Usage

ewCollessI(tree)

Arguments

tree

A rooted binary tree in phylo format.

Value

ewCollessI returns the equal weights Colless index of the given tree.

Author(s)

Luise Kuehn

References

A. O. Mooers and S. B. Heard. Inferring Evolutionary Process from Phylogenetic Tree Shape. The Quarterly Review of Biology, 72(1), 1997. doi: 10.1086/419657.

Examples

tree <- ape::read.tree(text="((((,),),(,)),(((,),),(,)));")
ewCollessI(tree)

Calculation of the Furnas rank for rooted binary trees

Description

This function calculates the Furnas rank F(T)F(T) for a given rooted binary tree TT. F(T)F(T) is the unique rank of the tree TT among all rooted binary trees with nn leaves in the left-light rooted ordering. For details on the left-light rooted ordering as well as details on how the Furnas rank is computed, see "The generation of random, binary unordered trees" by G.W. Furnas (1984) or "Tree balance indices: a comprehensive survey" by Fischer et al. (2023). The Furnas rank is a balance index.

The concept of assigning each rooted binary tree a unique tuple (rank,n)(rank, n) allows to store many trees with minimal storage use.

For details on the Furnas rank, see also Chapter 22 in "Tree balance indices: a comprehensive survey" (https://doi.org/10.1007/978-3-031-39800-1_22).

Usage

furnasI(tree)

Arguments

tree

A rooted binary tree in phylo format.

Value

furnasI returns the unique Furnas rank of the given tree, i.e. the rank of the tree among all rooted binary trees with nn leaves in the left-light rooted ordering. Since the values can get quite large, the function returns them in big.zbig.z format (package gmpgmp).

Author(s)

Luise Kuehn, Lina Herbst

References

G. W. Furnas. The generation of random, binary unordered trees. Journal of Classification, 1984. doi: 10.1007/bf01890123. URL https://doi.org/10.1007/bf01890123.

M. Kirkpatrick and M. Slatkin. Searching for evolutionary patterns in the shape of a phylogenetic tree. Evolution, 1993. doi: 10.1111/j.1558-5646.1993.tb02144.x.

Examples

tree <- ape::read.tree(text="((((,),),(,)),(((,),),(,)));")
furnasI(tree)

Calculation of rooted binary tree for tuple (rank, leaf number)

Description

This function calculates the unique tree TT (in phylo format) for two given integer values rr and nn, with nn denoting the number of leaves of TT and rr denoting the rank of TT in the left-light rooted ordering of all rooted binary trees with nn leaves. It is the inverse function of furnasI(). For details on how to calculate TT (including algorithm) see "The generation of random, binary unordered trees" by G.W. Furnas (1984) or "Tree balance indices: a comprehensive survey" by Fischer et al. (2023).

furnasI_inv can be used e.g. to generate random rooted binary trees with a certain number of leaves. Also, the concept of assigning each rooted binary tree a unique tuple (rank,n)(rank, n) allows to store many trees with minimal storage use.

For details on the Furnas rank, see also Chapter 22 in "Tree balance indices: a comprehensive survey" (https://doi.org/10.1007/978-3-031-39800-1_22).

Usage

furnasI_inv(rank, n)

Arguments

rank

An integer denoting the rank of the sought tree among all rooted binary trees with nn leaves.

n

An integer denoting the number of leaves of the sought tree.

Value

furnasI_inv returns the unique tree (in phylo format) for the given leaf number and rank.

Author(s)

Sophie Kersting

References

G. W. Furnas. The generation of random, binary unordered trees. Journal of Classification, 1984. doi: 10.1007/bf01890123. URL https://doi.org/10.1007/bf01890123.

Examples

furnasI_inv(rank=6,n=8)

Auxiliary functions

Description

getDescMatrix - Creates a matrix that contains the descendants of node ii in row ii.

getAncVec - Creates a vector that contains the parent (direct ancestor) of node ii at position ii.

getNodesOfDepth - Creates a matrix that contains the nodes of depth ii in row ii.

symBucketLexicoSort - Sorts the pairs of numbers lexicographically and returns ranking. Uses bucket sort.

getAllAncestors - Returns all ancestors of vv including vv itself.

cPL_inv - Returns the binary tree that belongs to the input label in an incomplete Newick format.

maxDepthLeaf - Returns the maximumy< depth of a leaf in the subtree that is rooted at vv.

get.subtreesize - Creates a vector that contains at the ii-th position the number of leaves in the pending subtree rooted at ii.

getlca - Returns the name of the lowest common ancestor of the two input vertices vv and ww.

we_eth - Returns the Wedderburn-Etherington number we(n)we(n) for a given non-negative integer nn.

getfurranks - Returns for each vertex ii the Furnas rank of the subtree rooted at ii.

getsubtree - Returns the pending subtree (in phylo format) that is rooted at the input vertex. If the input vertex is a leaf, the function returns the standard tree for n=1n=1 (with 1 edge).

is_binary - Returns TRUE if the input tree is binary and FALSE otherwise.

is_phylo - Tests all requirements of the phylo format, and returns TRUE if the tree is correctly formatted, else FALSE with detailed feedback on the features that are not met.

tree_decomposition - Returns a list of length two, which contains the two pending subtrees that are rooted at the children of the root of the input tree. The smaller one (according to the number of leaves) is stated first.

tree_merge - Returns a rooted tree TT in phylo format, which contains the input trees tree1tree1 and tree2tree2 as "left" and "right" maximal pending subtrees.

treenumber - Returns the unique tree number tn(T)tn(T) of the given tree. tn(T)tn(T) is the rank of the tree TT among all rooted binary trees in the left-light rooted ordering. It can be calculated as follows:

tn(T)=F(T)+i=1n1we(i)tn(T)=F(T) + \sum_{i=1}^{n-1} we(i)

in which nn is the number of leaves in TT, F(T)F(T) is the Furnas rank of TT, i.e. the rank of TT in the left-light rooted ordering of all rooted binary trees with nn leaves, and we(i)we(i) is the Wedderburn-Etherington number of ii. The concept of assigning each rooted binary tree a unique tree number allows to store many trees with minimal storage use. For n=1n=1 the function returns tn(T)=1tn(T)=1 and a warning.

treenumber_inv - Returns the unique tree (in phylo format) for the given tree number.

auxE_l_X - Returns the sum of all products of l different values in X.

Usage

getDescMatrix(tree)

getAncVec(tree)

getNodesOfDepth(mat, root, n)

symBucketLexicoSort(workLabs)

getAllAncestors(tree, v)

cPL_inv(label)

maxDepthLeaf(tree, v = length(tree$tip.label) + 1)

get.subtreesize(tree)

getlca(tree, v, w)

we_eth(n)

getfurranks(tree)

getsubtree(tree, subroot)

is_binary(tree)

is_phylo(tree)

tree_decomposition(tree)

tree_merge(tree1, tree2)

treenumber(tree)

treenumber_inv(treenum)

auxE_l_X(subX, Xset)

Arguments

tree

A rooted tree in phylo format, >= 2 leaves

mat

Descendants matrix from getDescMatrix

root

Number (label) of the root of the tree

n

Number of leaves of the tree

workLabs

numeric matrix (2 columns)

v

A vertex of the tree.

label

A Colijn-Plazotta label of desired tree, a positive integer.

w

A vertex of the tree.

subroot

A vertex of the tree. It is not recommended to use leaves as subroots.

tree1

A rooted tree in phylo format.

tree2

A rooted tree in phylo format.

treenum

An integer denoting the tree number of the sought tree.

subX

integer >=1, size of the subsets of X.

Xset

Vector (multiset) of numeric values.

Value

desc_mat numeric matrix

anc_vec numeric vector

nodes_of_depth numeric matrix

ranking numeric vector

vectorWithAncs numeric vector

Author(s)

Sophie Kersting, Luise Kuehn and Lina Herbst

Examples

mat <- cbind(c(7,7,6,5,5,6),c(1,2,3,4,6,7))
tree <- list(edge=mat, tip.label=c("","","",""), Nnode=3)
getDescMatrix(tree)
mat <- cbind(c(5,5,5,5),c(1,2,3,4))
tree <- list(edge=mat, tip.label=c("","","",""), Nnode=1)
getDescMatrix(tree)
getAncVec(tree)
getNodesOfDepth(mat=getDescMatrix(tree),root=length(tree$tip.label)+1,
n=length(tree$tip.label))
myWorkLabs <- cbind(c(0,1,2,3,1,0),c(0,2,2,4,1,0))
symBucketLexicoSort(myWorkLabs)
getAllAncestors(tree,v=6)
cPL_inv(label=6)
maxDepthLeaf(tree,v=6)
get.subtreesize(tree)
getlca(tree,1,2)
we_eth(5)
getfurranks(tree)
getsubtree(tree,4)
is_binary(ape::read.tree(text="((((,),),(,)),(((,),),(,)));"))
is_phylo(ape::read.tree(text="((((,),),(,)),(((,),),(,)));"))
tree_decomposition(ape::read.tree(text="((((,),),(,)),(((,),),(,)));"))
treeA <- ape::read.tree(text="(((,),),(,));")
treeB <- ape::read.tree(text="((,),);")
tree_merge(treeA, treeB)
treenumber(ape::read.tree(text="((((,),),(,)),(((,),),(,)));"))
treenumber_inv(192)
auxE_l_X(subX=3,Xset=c(1,1,2,2))

Calculation of the I-based indices for rooted trees

Description

This function calculates II-based indices I(T)I(T) for a given rooted tree TT. Note that the leaves of the tree may represent single species or groups of more than one species. Thus, a vector is required that contains for each leaf the number of species that it represents. The tree may contain few polytomies, which are not allowed to concentrate in a particular region of the tree (see p. 238 in Fusco and Cronk (1995)).

Let vv be a vertex of TT that fulfills the following criteria: a) The number of descendant (terminal) species of vv is kv>3k_v>3 (note that if each leaf represents only one species kvk_v is simply the number of leaves in the pending subtree rooted at vv), and b) vv has exactly two children.

Then, we can calculate the IvI_v value as follows:

Iv=kvakv2kv1kv2I_v=\frac{k_{v_a}-\left\lceil\frac{k_v}{2}\right\rceil}{k_v-1-\left\lceil\frac{k_v}{2}\right\rceil}

in which kvak_{v_a} denotes the number of descendant (terminal) species in the bigger one of the two pending subtrees rooted at vv.

As the expected value of IvI_v under the Yule model depends on kvk_v, Purvis et al. (2002) suggested to take the corrected values IvI'_v or IvwI_v^w instead.
The IvI'_v value of vv is defined as follows: Iv=IvI'_v=I_v if kvk_v is odd and Iv=kv1kvIvI'_v=\frac{k_v-1}{k_v}\cdot I_v if kvk_v is even.
The IvwI_v^w value of vv is defined as follows:

Ivw=w(Iv)IvmeanV(T)w(Iv)I_v^w=\frac{w(I_v)\cdot I_v}{mean_{V'(T)} w(I_v)}

where V(T)V'(T) is the set of inner vertices of TT that have precisely two children and kv4k_v\geq 4, and w(Iv)w(I_v) is a weight function with w(Iv)=1w(I_v)=1 if kvk_v is odd and w(Iv)=kv1kvw(I_v)=\frac{k_v-1}{k_v} if kvk_v is even and Iv>0I_v>0, and w(Iv)=2(kv1)kvw(I_v)=\frac{2\cdot(k_v-1)}{k_v} if kvk_v is even and Iv=0I_v=0.

The II-based index of TT can now be calculated using different methods. Here, we only state the version for the II' correction method, but the non-corrected version or the IvwI_v^w corrected version works analoguously. 1) root: The II' index of TT equals the IvI'_v value of the root of TT, i.e. I(T)=IρI'(T)=I'_{\rho}, provided that the root fulfills the two criteria. Note that this method does not fulfil the definition of an (im)balance index. 2) median: The II' index of TT equals the median IvI'_v value of all vertices vv that fulfill the two criteria. 3) total: The II' index of TT equals the summarised IvI'_v values of all vertices vv that fulfill the two criteria. 4) mean: The II' index of TT equals the mean IvI'_v value of all vertices vv that fulfill the two criteria. 5) quartile deviation: The II' index of TT equals the quartile deviation (half the difference between third and first quartile) of the IvI'_v values of all vertices vv that fulfill the two criteria.

For details on the family of I-based indices, see also Chapter 17 in "Tree balance indices: a comprehensive survey" (https://doi.org/10.1007/978-3-031-39800-1_17).

Usage

IbasedI(
  tree,
  specnum = rep(1, length(tree$tip.label)),
  method = "mean",
  correction = "none",
  logs = TRUE
)

Arguments

tree

A rooted tree in phylo format (with possibly few polytomies).

specnum

A vector whose ii-th entry is the number of species that the ii-th leaf represents. (default is 1,...,1)

method

A character string specifying the method that shall be used to calculate I(T)I(T). It can be one of the following: "root", "median", "total", "mean", "quartdev".

correction

A character string specifying the correction method that shall be applied to the I values. It can be one of the following: "none", "prime", "w".

logs

Boolean value, (default true), determines if the number of suitable nodes (i.e. nodes that fulfill the criteria) and polytomies in the tree should be printed.

Value

IbasedI returns an II-based balance index of the given tree according to the chosen (correction and) method.

Author(s)

Luise Kuehn and Sophie Kersting

References

G. Fusco and Q. C. Cronk. A new method for evaluating the shape of large phylogenies. Journal of Theoretical Biology, 1995. doi: 10.1006/jtbi.1995.0136. URL https://doi.org/10.1006/jtbi.1995.0136.

A. Purvis, A. Katzourakis, and P.-M. Agapow. Evaluating Phylogenetic Tree Shape: Two Modifications to Fusco & Cronks Method. Journal of Theoretical Biology, 2002. doi: 10.1006/jtbi.2001.2443. URL https://doi.org/10.1006/jtbi.2001.2443.

Examples

tree <- ape::read.tree(text="(((((,),),),),);")
IbasedI(tree, method="mean")
IbasedI(tree, method="mean", correction="prime", specnum=c(1,1,2,1,1,1))

Calculation of the (modified) maximum difference in widths for a rooted tree

Description

This function calculates the maximum difference in widths delW(T)delW(T) and the modified maximum difference in width mdelW(T)mdelW(T) for a given rooted tree TT. The tree must not necessarily be binary. delW(T)delW(T) is defined as

delW(T)=maxi=0,...,h(T)1w(i+1)w(i)delW(T)=\max_{i=0,...,h(T)-1} |w(i+1)-w(i)|

and mdelW(T)mdelW(T) is defined as

mdelW(T)=maxi=0,...,h(T)1w(i+1)w(i)mdelW(T)=\max_{i=0,...,h(T)-1} w(i+1)-w(i)

in which h(T)h(T) denotes the height of the tree TT and w(i)w(i) denotes the number of vertices in TT that have depth ii. The modified maximum difference in widths is a balance index, while the maximum difference in widths is neither a balance nor imbalance index.

Note that there was a spelling error in the previous manual of this function - we wrote "maximum difference in widths" while the given definition and the R code corresponded to the "modified maximum difference in width".

For details on the maximum difference in widths and the modified maximum difference in widths, see also Chapters 24 and 23 in "Tree balance indices: a comprehensive survey" (https://doi.org/10.1007/978-3-031-39800-1_24, https://doi.org/10.1007/978-3-031-39800-1_23).

Usage

maxDelW(tree, method = "modified")

Arguments

tree

A rooted tree in phylo format.

method

A character string specifying whether the original maximum difference in widths or the modified maximum difference in widths shall be computed. Can be any of "original" or "modified" (default is modified).

Value

maxDelW returns the maximum difference in widths of a tree (if method is set to original) or the modified maximum difference in widths (if method is set to modified).

Author(s)

Sophie Kersting, Luise Kuehn

References

C. Colijn, J. Gardy. Phylogenetic tree shapes resolve disease transmission patterns. Evolution, Medicine, and Public Health, 2014(1):96-108, 2014. ISSN 2050-6201. doi: 10.1093/emph/eou018.

Examples

tree <- ape::read.tree(text="((((,),),(,)),(((,),),(,)));")
maxDelW(tree, method="original")
tree <- ape::read.tree(text="((,),((((,),),),(,)));")
maxDelW(tree, method="modified")

Calculation of the maximum depth of the tree

Description

This function calculates the maximum depth of any vertex in a rooted tree TT, which is at the same time its height h(T)h(T). The tree must not necessarily be binary. Formally, h(T)h(T) is defined as

h(T)=maxvV(T)δ(v)h(T)=\max_{v\in V(T)} \delta(v)

with δ(v)\delta(v) being the depth of the vertex vv. The maximum depth is an imbalance index.

For n=1n=1 the function returns h(T)=0h(T)=0 and a warning.

For details on the maximum depth, see also Chapter 23 in "Tree balance indices: a comprehensive survey" (https://doi.org/10.1007/978-3-031-39800-1_23).

Usage

maxDepth(tree)

Arguments

tree

A rooted tree in phylo format.

Value

maxDepth returns the maximum depth, i.e. height, of a tree.

Author(s)

Luise Kuehn, Sophie Kersting

References

C. Colijn and J. Gardy. Phylogenetic tree shapes resolve disease transmission patterns. Evolution, Medicine, and Public Health, 2014(1):96-108, 2014. ISSN 2050-6201. doi: 10.1093/emph/eou018.

Examples

tree <- ape::read.tree(text="((((,),),(,)),(((,),),(,)));")
maxDepth(tree)
tree <- ape::read.tree(text="((,),((((,),),),(,)));")
maxDepth(tree)

Calculation of the maximum width of the tree

Description

This function calculates the maximum width maxWidth(T)maxWidth(T) for a given rooted tree TT. The tree must not necessarily be binary. maxWidth(T)maxWidth(T) is defined as

maxWidth(T)=maxi=0,...,h(T)w(i)maxWidth(T)=\max_{i=0,...,h(T)} w(i)

in which h(T)h(T) denotes the height of the tree TT and w(i)w(i) denotes the number of vertices in TT that have depth ii. The maximum width is a balance index.

For details on the maximum width, see also Chapter 23 in "Tree balance indices: a comprehensive survey" (https://doi.org/10.1007/978-3-031-39800-1_23).

Usage

maxWidth(tree)

Arguments

tree

A rooted tree in phylo format.

Value

maxWidth returns the maximum width of a tree.

Author(s)

Sophie Kersting

References

C. Colijn and J. Gardy. Phylogenetic tree shapes resolve disease transmission patterns. Evolution, Medicine, and Public Health, 2014(1):96-108, 2014. ISSN 2050-6201. doi: 10.1093/emph/eou018.

Examples

tree <- ape::read.tree(text="((((,),),(,)),(((,),),(,)));")
maxWidth(tree)
tree <- ape::read.tree(text="((,),((((,),),),(,)));")
maxWidth(tree)

Calculation of the modified cherry index for rooted binary trees

Description

This function calculates the modified cherry index mChI(T)mChI(T) for a given rooted binary tree TT. Note that compared to the original cherry index ChI(T)ChI(T), the modified cherry index is defined for binary trees only. mChI(T)mChI(T) is defined as n2ChI(T)n-2\cdot ChI(T), i.e. it counts the number of leaves of the tree which are not in a cherry. A cherry is a pair of leaves that have the same direct ancestor.

The modified cherry index does not fulfill the definition of an (im)balance index given in "Tree balance indices: a comprehensive survey" (Fischer et al., 2023).

For details on the modified cherry index, see also Chapter 24 in "Tree balance indices: a comprehensive survey" (https://doi.org/10.1007/978-3-031-39800-1_24).

Usage

mCherryI(tree)

Arguments

tree

A rooted binary tree in phylo format.

Value

mCherryI returns the modified cherry index of the given tree.

Author(s)

Luise Kuehn

References

S. J. Kersting, M. Fischer. Measuring tree balance using symmetry nodes — A new balance index and its extremal properties. Mathematical Biosciences, 341:108690, 2021. doi: 10.1016/j.mbs.2021.108690.

Examples

tree <- ape::read.tree(text="((((,),),(,)),(((,),),(,)));")
mCherryI(tree)
tree <- ape::read.tree(text="((,),((((,),),),(,)));")
mCherryI(tree)

Calculation of the maximum width over maximum depth of the tree

Description

This function calculates the maximum width over maximum depth mWovermD(T)mWovermD(T) for a given rooted tree TT. The tree must not necessarily be binary. For n>1n>1, mWovermD(T)mWovermD(T) is defined as

mWovermD(T)=maxWidth(T)/h(T)mWovermD(T)=maxWidth(T) / h(T)

in which h(T)h(T) denotes the height of the tree TT, which is the same as the maximum depth of any leaf in the tree, and maxWidth(T)maxWidth(T) denotes the maximum width of the tree TT. The maximum width over maximum depth is a balance index.

For details on the maximum width over maximum depth, see also Chapter 23 in "Tree balance indices: a comprehensive survey" (https://doi.org/10.1007/978-3-031-39800-1_23).

Usage

mWovermD(tree)

Arguments

tree

A rooted tree in phylo format.

Value

mWovermD returns the maximum width over maximum depth of a tree.

Author(s)

Luise Kuehn

References

C. Colijn and J. Gardy. Phylogenetic tree shapes resolve disease transmission patterns. Evolution, Medicine, and Public Health, 2014(1):96–108, 2014. doi: 10.1093/emph/eou018.

Examples

tree <- ape::read.tree(text="((((,),),(,)),(((,),),(,)));")
mWovermD(tree)
tree <- ape::read.tree(text="((,),((((,),),),(,)));")
mWovermD(tree)

Calculation of the Rogers J index for rooted binary trees

Description

This function calculates the Rogers J index J(T)J(T) for a given rooted binary tree TT. It is defined as the number of inner vertices whose balance value is unequal to zero, more precisely

J(T)=uVin(T)(1I(nua=nub))J(T)=\sum_{u \in V_{in}(T)} (1-I(n_{u_a}=n_{u_b}))

in which Vin(T)V_{in}(T) denotes the set of all inner vertices of TT, and in which nuan_{u_a} and nubn_{u_b} denote the number of leaves in the two pending subtrees that are rooted at the direct descendants of uu.
Special cases: For n=1n=1, the function returns J(T)=0J(T)=0 and a warning.

For details on the Rogers J index, see also Chapter 19 in "Tree balance indices: a comprehensive survey" (https://doi.org/10.1007/978-3-031-39800-1_19).

Usage

rogersI(tree)

Arguments

tree

A rooted binary tree in phylo format.

Value

rogersI returns the Rogers J index of the given tree.

Author(s)

Sophie Kersting

References

J. S. Rogers. Central Moments and Probability Distributions of Three Measures of Phylogenetic Tree Imbalance. Systematic Biology, 45(1):99-110, 1996. doi: 10.1093/sysbio/45.1.99.

Examples

tree <- ape::read.tree(text="((((,),),(,)),(((,),),(,)));")
rogersI(tree)

Calculation of the rooted quartet index for rooted trees

Description

This function calculates the rooted quartet index rQI(T)rQI(T) for a given rooted tree TT. The tree must not necessarily be binary.

Let TT be a rooted tree, whose leaves are 1,...,n1,...,n. Let P4P_4 denote the set of all subsets of {1,...,n}\{1,...,n\} that have cardinality 4. Let T(Q)T(Q) denote the rooted quartet on QP4Q\in P_4 that is obtained by taking the subgraph of TT that is induced by QQ and supressing its outdegree-1 vertices. T(Q)T(Q) can have one of the five following shapes:

- Q0Q_0^*: This is the caterpillar tree shape on 4 leaves, i.e. "(,(,(,)));" in Newick format. It has 2 automorphisms.
- Q1Q_1^*: This is the tree shape on 4 leaves that has three pending subtrees rooted at the children of the root of TT, one of them being a cherry and the other two being single vertices, i.e. "((,),,);" in Newick format. It has 4 automorphisms.
- Q2Q_2^*: This is the tree shape on 4 leaves that has two pending subtrees rooted at the children of the root of TT, one of them being a star tree shape on 3 leaves and the other one being a single vertex, i.e. "((,,),);" in Newick format. It has 6 automorphisms.
- Q3Q_3^*: This is the fully balanced binary tree shape on 4 leaves, i.e. "((,),(,));" in Newick format. It has 8 automorphisms.
- Q4Q_4^*: This is the star tree shape on 4 leaves, i.e. "(,,,);" in Newick format. It has 24 automorphisms.

T(Q)T(Q) is assigned an rQI-value based on its shape, i.e. rQI(T(Q))=qirQI(T(Q))=q_i if T(Q)T(Q) has the shape QiQ_i^*. The values q0,...,q4q_0,...,q_4 are chosen in such a way that they increase with the symmetry of the shape as measured by means of its number of automorphisms. Coronado et al. (2019) suggested the values q0=0q_0=0 and qi=iq_i=i or qi=2iq_i=2^i for i=1,...,4i=1,...,4.
The rooted quartet index rQI(T)rQI(T) of the tree TT is then defined as the sum of the rQI-values of its rooted quartets:

rQI(T)=QP4rQI(T(Q))rQI(T)=\sum_{Q\in P_4} rQI(T(Q))

The rooted quartet index is a balance index.

For details on the rooted quartet index, see also Chapter 20 in "Tree balance indices: a comprehensive survey" (https://doi.org/10.1007/978-3-031-39800-1_20).

Usage

rQuartetI(tree, shapeVal = c(0, 1, 2, 3, 4))

Arguments

tree

A rooted tree in phylo format.

shapeVal

A vector of length 5 containing the shape values q0,...,q4q_0,...,q_4. Default is (q0,q1,q2,q3,q4)=(0,1,2,3,4)(q_0,q_1,q_2,q_3,q_4)=(0,1,2,3,4).

Value

rQuartetI returns the rooted quartet index of the given tree based on the chosen shape values (see description for details).

Author(s)

Sophie Kersting

References

T. M. Coronado, A. Mir, F. Rossello, and G. Valiente. A balance index for phylogenetic trees based on rooted quartets. Journal of Mathematical Biology, 79(3):1105-1148, 2019. doi: 10.1007/s00285-019-01377-w. URL https://doi.org/10.1007/s00285-019-01377-w.

Examples

tree <- ape::read.tree(text="((((,),),(,)),(((,),),(,)));")
rQuartetI(tree)

Calculation of the Sackin index for rooted trees

Description

This function calculates the Sackin index S(T)S(T) for a given rooted tree TT. The tree must not necessarily be binary. S(T)S(T) is defined as

S(T)=xVL(T)δ(x)=uVin(T)nuS(T)=\sum_{x\in V_L(T)} \delta(x)=\sum_{u\in V_{in}(T)} n_u

in which VL(T)V_L(T) denotes the leaf set of TT, δ(x)\delta(x) denotes the depth of the leaf xx, Vin(T)V_{in}(T) denotes the set of inner vertices in TT, and nun_u denotes the number of leaves in the pending subtree that is rooted at uu. The Sackin index is an imbalance index.

For n=1n=1 the function returns S(T)=0S(T)=0 and a warning.

For details on the Sackin index, see also Chapter 5 in "Tree balance indices: a comprehensive survey" (https://doi.org/10.1007/978-3-031-39800-1_5).

Usage

sackinI(tree)

Arguments

tree

A rooted tree in phylo format.

Value

sackinI returns the Sackin index of the given tree.

Author(s)

Luise Kuehn

References

M.J. Sackin. "Good" and "Bad" Phenograms. Systematic Biology, 21(2):225-226, 1972. doi: 10.1093/sysbio/21.2.225.

K.-T. Shao and R.R. Sokal. Tree Balance. Systematic Zoology, 39(3):266, 1990. doi: 10.2307/2992186.

Examples

tree <- ape::read.tree(text="((((,),),(,)),(((,),),(,)));")
sackinI(tree)

Calculation of the s-shape statistic for rooted trees

Description

This function calculates the s-shape statistic sShape(T)sShape(T) for a given rooted tree TT. The tree must not necessarily be binary, however sShapesShape only fulfils the definition of an imbalance index on the space of binary trees. sShape(T)sShape(T) is defined as

sShape(T)=uVin(T)log(nu1)sShape(T)=\sum_{u\in V_{in}(T)} log(n_u-1)

in which Vin(T)V_{in}(T) denotes the set of inner vertices of TT and nun_u denotes the number of leaves in the pending subtree that is rooted at uu. An arbitrary logarithm base can be used (for binary trees it is common to use base 2).

For n=1n=1 the function returns sShape(T)=0sShape(T)=0 and a warning.

For details on the s-shape statistic, see also Chapter 9 in "Tree balance indices: a comprehensive survey" (https://doi.org/10.1007/978-3-031-39800-1_9).

Usage

sShapeI(tree, logbase = 2)

Arguments

tree

A rooted tree in phylo format.

logbase

The logarithm base that shall be used.

Value

sShapeI returns the s-shape statistic of the given tree.

Author(s)

Luise Kuehn

References

M.G. Blum and O. Francois. Which random processes describe the tree of life? a large-scale study of phylogenetic tree imbalance. Systematic Biology, 2006.

Examples

tree <- ape::read.tree(text="((((,),),(,)),(((,),),(,)));")
sShapeI(tree)

Calculation of the stairs1 value for rooted binary trees

Description

This function calculates the stairs1 value st1(T)st1(T) for a given rooted binary tree TT. It is a modified version of the Rogers J index and is defined as the fraction of inner vertices whose balance value is unequal to zero, more precisely

st1(T)=1n1uVin(T)(1I(nua=nub))st1(T)=\frac{1}{n-1}\cdot\sum_{u \in V_{in}(T)} (1-I(n_{u_a}=n_{u_b}))

in which Vin(T)V_{in}(T) denotes the set of all inner vertices of TT, and in which nuan_{u_a} and nubn_{u_b} denote the number of leaves in the two pending subtrees that are rooted at the direct descendants of uu. The stairs1 value is an imbalance index.

Special cases: For n=1n=1, the function returns st1(T)=0st1(T)=0 and a warning.

For details on the stairs1 value, see also Chapter 23 in "Tree balance indices: a comprehensive survey" (https://doi.org/10.1007/978-3-031-39800-1_23).

Usage

stairs1(tree)

Arguments

tree

A rooted binary tree in phylo format.

Value

stairs1 returns the stairs1 value of the given tree.

Author(s)

Sophie Kersting

References

M. M. Norstrom, M. C. Prosperi, R. R. Gray, A. C. Karlsson, and M. Salemi. PhyloTempo: A Set of R Scripts for Assessing and Visualizing Temporal Clustering in Genealogies Inferred from Serially Sampled Viral Sequences. Evolutionary Bioinformatics, 8:EBO.S9738, 2012. ISSN 1176-9343, 1176-9343. doi:10.4137/EBO.S9738.

Examples

tree <- ape::read.tree(text="((((,),),(,)),(((,),),(,)));")
stairs1(tree)

Calculation of the stairs2 value for rooted binary trees

Description

This function calculates the stairs2 value st2(T)st2(T) for a given rooted binary tree TT. It is defined as the mean ratio between the leaf numbers of the smaller and larger pending subtree over all inner vertices, more precisely

st2(T)=1n1uVin(T)nuanubst2(T)=\frac{1}{n-1}\cdot\sum_{u \in V_{in}(T)} \frac{n_{u_a}}{n_{u_b}}

in which Vin(T)V_{in}(T) denotes the set of all inner vertices of TT, and in which nuanubn_{u_a}\geq n_{u_b} denote the number of leaves in the two pending subtrees that are rooted at the direct descendants of uu. The stairs2 value is an imbalance index.

Special cases: For n=1n=1, the function returns st2(T)=0st2(T)=0 and a warning.

For details on the stairs2 value, see also Chapter 23 in "Tree balance indices: a comprehensive survey" (https://doi.org/10.1007/978-3-031-39800-1_23).

Usage

stairs2(tree)

Arguments

tree

A rooted binary tree in phylo format.

Value

stairs2 returns the stairs2 value of the given tree.

Author(s)

Sophie Kersting

References

C. Colijn, J. Gardy. Phylogenetic tree shapes resolve disease transmission patterns. Evolution, Medicine, and Public Health, 2014(1):96-108, 2014. ISSN 2050-6201. doi: 10.1093/emph/eou018.

Examples

tree <- ape::read.tree(text="((((,),),(,)),(((,),),(,)));")
stairs2(tree)

Calculation of the symmetry nodes index for rooted binary trees

Description

This function calculates the symmetry nodes index SNI(T)SNI(T) for a given rooted binary tree TT. SNI(T)SNI(T) is defined as the number of inner vertices vv that are not symmetry nodes, i.e. the two pending subtrees rooted at the children of vv do not have the same tree shape.

For n=1n=1 the function returns SNI(T)=0SNI(T)=0 and a warning.

For details on the symmetry nodes index, see also Chapter 18 in "Tree balance indices: a comprehensive survey" (https://doi.org/10.1007/978-3-031-39800-1_18).

Usage

symNodesI(tree)

Arguments

tree

A rooted binary tree in phylo format.

Value

symNodesI returns the symmetry nodes index of the given tree.

Author(s)

Sophie Kersting

References

S. J. Kersting, M. Fischer. Measuring tree balance using symmetry nodes — A new balance index and its extremal properties. Mathematical Biosciences, page 108690, 2021. ISSN 0025-5564. doi:https://doi.org/10.1016/j.mbs.2021.108690

Examples

tree <- ape::read.tree(text="((((,),),(,)),(((,),),(,)));")
symNodesI(tree)

Calculation of the total cophenetic index for rooted trees

Description

This function calculates the total cophenetic index TCI(T)TCI(T) of a given rooted tree TT. The tree must not necessarily be binary. TCI(T)TCI(T) is defined as

TCI(T)=1i<jnδ(lca(i,j))=uVin(T){ρ}binom(nu,2)TCI(T)=\sum_{1\leq i<j\leq n} \delta(lca(i,j))=\sum_{u\in V_{in}(T)\setminus\{\rho\}} binom(n_u,2)

in which δ(lca(i,j))\delta(lca(i,j)) denotes the depth of the lowest common ancestor of the two leaves ii and jj and Vin(T){ρ}V_{in}(T)\setminus\{\rho\} denotes the set of all inner vertices exept the root and nun_u denotes the number of descendant leaves of uu. The second formula is useful for efficient computation of TCI(T)TCI(T). The total cophenetic index is an imbalance index.

For n=1n=1 the function returns TCI(T)=0TCI(T)=0.

For details on the total cophenetic index, see also Chapter 8 in "Tree balance indices: a comprehensive survey" (https://doi.org/10.1007/978-3-031-39800-1_8).

Usage

totCophI(tree)

Arguments

tree

A rooted tree in phylo format.

Value

totCophI returns the total cophenetic index of the given tree.

Author(s)

Sophie Kersting

References

A. Mir, F. Rossello, and L. Rotger. A new balance index for phylogenetic trees. Mathematical Bio-sciences, 241(1):125-136, 2013. doi: 10.1016/j.mbs.2012.10.005.

Examples

tree <- ape::read.tree(text="((((,),),(,)),(((,),),(,)));")
totCophI(tree)
tree <- ape::read.tree(text="((,),((((,),),),(,)));")
totCophI(tree)
tree <- ape::read.tree(text="((,,,),(,,));")
totCophI(tree)

Calculation of the total internal path length for rooted trees

Description

This function calculates the total internal path length TIP(T)TIP(T) for a given rooted tree TT. The tree must not necessarily be binary. TIP(T)TIP(T) is defined as

TIP(T)=xVin(T)δ(x)TIP(T)=\sum_{x\in V_{in}(T)} \delta(x)

in which Vin(T)V_{in}(T) denotes the set of inner vertices of TT, and δ(x)\delta(x) denotes the depth of the vertex xx. The total internal path length is an imbalance index.

For details on the total internal path length, see also Chapter 23 in "Tree balance indices: a comprehensive survey" (https://doi.org/10.1007/978-3-031-39800-1_23).

Usage

totIntPathLen(tree)

Arguments

tree

A rooted tree in phylo format.

Value

totIntPathLen returns the total internal path length of the given tree.

Author(s)

Luise Kuehn

References

D. E. Knuth. The art of computer programming: fundamental algorithms, volume 1. Addison-Wesley, Reading, Mass, 3rd edition, 1997. ISBN 9780201896831.

Examples

tree <- ape::read.tree(text="((((,),),(,)),(((,,),),(,)));")
totIntPathLen(tree)

Calculation of the total path length for rooted trees

Description

This function calculates the total path length TPL(T)TPL(T) for a given rooted tree TT. The tree must not necessarily be binary. TPL(T)TPL(T) is defined as

TPL(T)=xV(T)δ(x)TPL(T)=\sum_{x\in V(T)} \delta(x)

in which V(T)V(T) denotes the set of vertices of TT, and δ(x)\delta(x) denotes the depth of the vertex xx. The total path length is an imbalance index.

For n=1n=1 the function returns TPL(T)=0TPL(T)=0 and a warning.

For details on the total path length, see also Chapter 23 in "Tree balance indices: a comprehensive survey" (https://doi.org/10.1007/978-3-031-39800-1_23).

Usage

totPathLen(tree)

Arguments

tree

A rooted tree in phylo format.

Value

totPathLen returns the total path length of the given tree.

Author(s)

Luise Kuehn

References

see e.g. R. P. Dobrow, J. A. Fill. Total path length for random recursive trees. Combinatorics, Probability and Computing, 8(4):317–333, 1999. doi: 10.1017/S0963548399003855.

see e.g. L. Takacs. On the total heights of random rooted trees. Journal of Applied Probability, 29(3):543–556, 1992. doi: 10.2307/3214892.

see e.g. L. Takacs. On the total heights of random rooted binary trees. Journal of Combinatorial Theory, Series B, 61(2):155–166, 1994. ISSN 0095-8956. doi: 10.1006/jctb.1994.1041.

Examples

tree <- ape::read.tree(text="((((,),),(,)),(((,,),),(,)));")
totPathLen(tree)

Calculation of the variance of leaf depths index for rooted trees

Description

This function calculates the variance of leaf depths index VLD(T)VLD(T) for a given rooted tree TT. The tree must not necessarily be binary. VLD(T)VLD(T) is defined as

VLD(T)=1nxVL(T)(δ(x)N(T))2VLD(T)=\frac{1}{n}\cdot\sum_{x\in V_L(T)} (\delta(x)-N(T))^2

in which nn denotes the number of leaves of TT, VL(T)V_L(T) denotes the set of leaves of TT, δ(x)\delta(x) denotes the depth of the leaf xx and N(T)N(T) denotes the average leaf depth of TT.

For n=1n=1 the function returns VLD(T)=0VLD(T)=0 and a warning.

For details on the variance of leaf depths, see also Chapter 7 in "Tree balance indices: a comprehensive survey" (https://doi.org/10.1007/978-3-031-39800-1_7).

Usage

varLeafDepI(tree)

Arguments

tree

A rooted tree in phylo format.

Value

varLeafDepI returns the variance of leaf depths index of the given tree.

Author(s)

Sophie Kersting

References

T. M. Coronado, A. Mir, F. Rossello, and L. Rotger. On Sackin's original proposal: the variance of the leaves' depths as a phylogenetic balance index. BMC Bioinformatics, 21(1), 2020. doi: 10.1186/s12859-020-3405-1. URL https://doi.org/10.1186/s12859-020-3405-1.

M. J. Sackin. "Good" and "Bad" Phenograms. Systematic Biology, 21(2):225-226, 1972. doi: 10.1093/sysbio/21.2.225.

K.-T. Shao and R. R. Sokal. Tree Balance. Systematic Zoology, 39(3):266, 1990. doi: 10.2307/2992186.

Examples

tree <- ape::read.tree(text="((((,),),(,)),(((,),),(,)));")
varLeafDepI(tree)

Wedderburn Etherington numbers (from OEIS)

Description

Contains a vector of Wedderburn Etherington numbers for n=1n=1 to n=2545n=2545.

Usage

data(wedEth)

Format

numerical vector

Source

OEIS Sequence A001190 available at https://oeis.org/A001190

Examples

data(wedEth)
wedEth[5]

Calculation of weighted l1 distance index for rooted binary trees

Description

This function calculates the weighted l1 distance index Dl1(T)D_{l1}(T) for a given rooted binary tree TT. Dl1(T)D_{l1}(T) is defined as

Dl1(T)=z=2nzfn(z)pn(z)D_{l1}(T)=\sum_{z=2}^n z \cdot |f_n(z)-p_n(z)|

in which nn denotes the number of leaves of TT, fn(z)f_n(z) denotes the frequency of pending subtrees of size zz in TT and pn(z)p_n(z) is the expected number of pending subtrees of size zz under the Yule model, i.e. pn(z)=1n1p_n(z)=\frac{1}{n-1} if z=nz=n and otherwise nn12z(z+1)\frac{n}{n-1}\cdot\frac{2}{z\cdot(z+1)}.

For n=1n=1 the function returns Dl1(T)=0D_{l1}(T)=0.

For details on the weighted l1 distance index, see also Chapter 24 in "Tree balance indices: a comprehensive survey" (https://doi.org/10.1007/978-3-031-39800-1_24).

Usage

weighL1dist(tree)

Arguments

tree

A rooted binary tree in phylo format.

Value

weighL1distI returns the weighted l1 distance index of the given tree.

Author(s)

Sophie Kersting

References

M. G. Blum and O. Francois. On statistical tests of phylogenetic tree imbalance: The Sackin and other indices revisited. Mathematical Biosciences, 195(2):141-153, 2005. doi: 10.1016/j.mbs.2005.03.003.

Examples

tree <- ape::read.tree(text="((((,),),(,)),(((,),),(,)));")
weighL1dist(tree)