| Title: | Taxonomic Diversity Indices Using Deng Entropy |
|---|---|
| Description: | Calculates taxonomic diversity indices for ecological community data using Deng entropy framework and classical approaches (Shannon, Simpson, Clarke & Warwick). Provides functions for computing taxonomic distinctness, average taxonomic distinctness (AvTD/Delta+), variation in taxonomic distinctness (VarTD/Lambda+), and Deng entropy-based measures that incorporate taxonomic hierarchy information. Includes tools for constructing taxonomic trees and computing pairwise taxonomic distances. |
| Authors: | Muhammet Murat Gorgoz [aut, cre] (ORCID: <https://orcid.org/0000-0002-6398-0005>), Kursad Ozkan [aut] (ORCID: <https://orcid.org/0000-0002-8526-7243>), Mehmet Guvenc Negiz [aut] (ORCID: <https://orcid.org/0009-0008-1507-8712>) |
| Maintainer: | Muhammet Murat Gorgoz <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-05-31 07:07:05 UTC |
| Source: | https://github.com/cran/taxdiv |
A data frame containing 20 tree species from Anatolian forests, distributed across three sample plots with varying community compositions. Species abundances follow the Westhoff & van der Maarel (1973) scale (1–9). Taxonomic classification includes seven ranks from species to kingdom.
anatolian_trees
A data frame with 33 rows and 9 columns:
Site: Sample plot name (character)
Species: Binomial species name with underscore separator (character)
Genus: Genus (character)
Family: Family (character)
Order: Order (character)
Class: Class (character)
Phylum: Phylum / Division (character)
Kingdom: Kingdom (character)
Abundance: Westhoff abundance value, integer 1–9 (numeric)
The three sites represent different forest types:
Mixed forest – both conifers and broadleaves (12 species)
Broadleaf-dominated forest (13 species)
Conifer-dominated forest (8 species)
This dataset can be used directly with batch_analysis for multi-site analysis:
batch_analysis(anatolian_trees)
To extract a single community for use with ozkan_pto or compare_indices:
site1 <- anatolian_trees[anatolian_trees$Site == "Karisik_Orman", ]
community <- setNames(site1$Abundance, site1$Species)
tax_tree <- site1[, c("Species", "Genus", "Family", "Order",
"Class", "Phylum", "Kingdom")]
ozkan_pto(community, tax_tree)
Westhoff, V. & van der Maarel, E. (1973). The Braun-Blanquet approach. In: R.H. Whittaker (ed.), Ordination and classification of communities. Handbook of Vegetation Science 5, 617–726.
batch_analysis for multi-site analysis, gazi_comm and gazi_gytk for a single-community example.
    data(anatolian_trees)
    head(anatolian_trees)
    # Multi-site analysis
    batch_analysis(anatolian_trees)
    # Single site extraction
    site1 <- anatolian_trees[anatolian_trees$Site == "Karisik_Orman", ]
    comm <- setNames(site1$Abundance, site1$Species)
    tax <- site1[, c("Species", "Genus", "Family", "Order", "Class", "Phylum", "Kingdom")]
    ozkan_pto(comm, tax)
Calculates the average taxonomic distinctness (AvTD, Delta+) based on Clarke & Warwick (1998). This is a presence/absence-based measure of the average taxonomic distance between all pairs of species.
avtd(species, tax_tree, weights = NULL)
| Argument | Description |
|---|---|
| species | Character vector of species names present in the community (presence-only data). |
| tax_tree | A data frame representing the taxonomic hierarchy. |
| weights | Optional numeric vector of weights for taxonomic levels. |
A numeric value representing the average taxonomic distinctness (Delta+).
Clarke, K.R. & Warwick, R.M. (1998). A taxonomic distinctness index and its statistical properties. Journal of Applied Ecology, 35, 523-531.
    tax <- data.frame(
      Species = c("Quercus_robur", "Pinus_nigra", "Fagus_orientalis", "Abies_nordmanniana"),
      Genus = c("Quercus", "Pinus", "Fagus", "Abies"),
      Family = c("Fagaceae", "Pinaceae", "Fagaceae", "Pinaceae"),
      Order = c("Fagales", "Pinales", "Fagales", "Pinales"),
      stringsAsFactors = FALSE
    )
    spp <- c("Quercus_robur", "Pinus_nigra", "Fagus_orientalis")
    avtd(spp, tax)
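The averaging behind AvTD can be sketched in base R without the package. This is an illustrative sketch, not the package's implementation: `pair_distance()` and `avtd_sketch()` are hypothetical helpers, the taxonomy is made up, and the distance rule (index of the lowest shared rank, with `n_ranks + 1` for pairs that share no rank) is an assumption chosen to mirror the linear 1, 2, 3, ... scale.

```r
# Hypothetical taxonomy; ranks are ordered from lowest (Genus) to highest (Order).
tax <- data.frame(
  Species = c("sp1", "sp2", "sp3", "sp4"),
  Genus   = c("G1", "G1", "G2", "G3"),
  Family  = c("F1", "F1", "F1", "F2"),
  Order   = c("O1", "O1", "O1", "O1"),
  stringsAsFactors = FALSE
)

# Distance between two species = index of the lowest rank they share
# (1 = same genus, 2 = same family, ...).
# Assumption: pairs that share no rank get n_ranks + 1.
pair_distance <- function(a, b, tax) {
  ranks <- setdiff(names(tax), "Species")
  ra <- unlist(tax[tax$Species == a, ranks])
  rb <- unlist(tax[tax$Species == b, ranks])
  shared <- which(ra == rb)
  if (length(shared) == 0) length(ranks) + 1 else min(shared)
}

# Average the distance over all unordered species pairs,
# i.e. the sum of distances divided by S(S - 1) / 2.
avtd_sketch <- function(species, tax) {
  pairs <- combn(species, 2)
  d <- apply(pairs, 2, function(p) pair_distance(p[1], p[2], tax))
  mean(d)
}

avtd_sketch(tax$Species, tax)
```

Because only presence matters here, abundances never enter the calculation; adding or removing a species changes the result only through the set of pairs.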
Computes all diversity indices for one or more sample sites from a single data frame (e.g., imported from Excel). The function automatically detects the site column, taxonomic columns, and abundance column, splits the data by site, and returns a summary data frame with species count and 14 diversity indices per site.
    batch_analysis(
      data,
      site_column = NULL,
      tax_columns = NULL,
      abundance_column = "Abundance",
      correction = c("none", "miller_madow", "grassberger", "chao_shen"),
      parallel = FALSE,
      n_cores = NULL
    )
| Argument | Description |
|---|---|
| data | A data frame containing species data. Must include at minimum a species column, at least one taxonomic rank column, and an abundance column. Optionally includes a site/plot column for multi-site analysis. |
| site_column | Character string specifying the name of the site column. If NULL, the site column is detected automatically. |
| tax_columns | Character vector specifying the names of the taxonomic columns (from Species to highest rank). If NULL, the columns are detected automatically. |
| abundance_column | Character string specifying the name of the abundance column. Default is "Abundance". |
| correction | Bias correction for the Shannon index. One of "none", "miller_madow", "grassberger", or "chao_shen". |
| parallel | Logical. If TRUE, the analysis runs in parallel. Default is FALSE. |
| n_cores | Number of CPU cores to use when parallel = TRUE. |
When no site column is present (or all values are identical), the entire data set is treated as a single community.
The function calculates the following indices per site:
Shannon: Shannon-Wiener entropy (shannon)
Simpson: Gini-Simpson index (simpson)
Delta: Clarke & Warwick taxonomic diversity (delta)
Delta_star: Clarke & Warwick taxonomic distinctness (delta_star)
AvTD: Average taxonomic distinctness (avtd)
VarTD: Variation in taxonomic distinctness (vartd)
uTO: Unweighted taxonomic diversity (Ozkan pTO, all levels)
TO: Weighted taxonomic diversity (Ozkan pTO, all levels)
uTO_plus: Unweighted taxonomic distance (Ozkan pTO, all levels)
TO_plus: Weighted taxonomic distance (Ozkan pTO, all levels)
uTO_max: Unweighted taxonomic diversity (informative levels only)
TO_max: Weighted taxonomic diversity (informative levels only)
uTO_plus_max: Unweighted taxonomic distance (informative levels only)
TO_plus_max: Weighted taxonomic distance (informative levels only)
A data frame with one row per site and columns:
Site, N_Species, Shannon, Simpson, Delta,
Delta_star, AvTD, VarTD, uTO, TO,
uTO_plus, TO_plus, uTO_max, TO_max,
uTO_plus_max, TO_plus_max.
compare_indices for analysis with pre-built community
vectors, build_tax_tree for building taxonomic trees manually.
    # Single-site data (no Site column)
    df <- data.frame(
      Species = c("sp1", "sp2", "sp3", "sp4"),
      Genus = c("G1", "G1", "G2", "G2"),
      Family = c("F1", "F1", "F1", "F2"),
      Order = c("O1", "O1", "O1", "O1"),
      Abundance = c(10, 20, 15, 5),
      stringsAsFactors = FALSE
    )
    batch_analysis(df)
    # Multi-site data (with Site column)
    df2 <- data.frame(
      Site = c("A", "A", "A", "B", "B", "B"),
      Species = c("sp1", "sp2", "sp3", "sp1", "sp3", "sp4"),
      Genus = c("G1", "G1", "G2", "G1", "G2", "G2"),
      Family = c("F1", "F1", "F1", "F1", "F1", "F2"),
      Order = c("O1", "O1", "O1", "O1", "O1", "O1"),
      Abundance = c(10, 20, 15, 5, 25, 10),
      stringsAsFactors = FALSE
    )
    batch_analysis(df2)
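The split-by-site idea can be illustrated in base R for the two classical indices. This is a sketch under stated assumptions rather than the package's code: `site_summary()` is a hypothetical helper, the data are made up, and the natural logarithm is assumed for the Shannon index.

```r
# Hypothetical multi-site data
df <- data.frame(
  Site      = c("A", "A", "A", "B", "B", "B"),
  Species   = c("sp1", "sp2", "sp3", "sp1", "sp3", "sp4"),
  Abundance = c(10, 20, 15, 5, 25, 10),
  stringsAsFactors = FALSE
)

# Per-site summary: species count, Shannon entropy, Gini-Simpson index.
# Assumption: natural logarithm for Shannon.
site_summary <- function(abund) {
  p <- abund / sum(abund)
  c(N_Species = length(abund),
    Shannon   = -sum(p * log(p)),
    Simpson   = 1 - sum(p^2))
}

# Split abundances by site, summarise each site, bind into one data frame.
per_site <- lapply(split(df$Abundance, df$Site), site_summary)
result <- data.frame(Site = names(per_site), do.call(rbind, per_site))
result
```

Each row of `result` corresponds to one site, which matches the one-row-per-site shape described for the summary output.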
Creates a taxonomic hierarchy data frame from species classification
information. This is a convenience function for constructing the
tax_tree input required by other functions in the package.
build_tax_tree(species, ...)
| Argument | Description |
|---|---|
| species | Character vector of species names. |
| ... | Named character vectors for each taxonomic rank, in order from lowest to highest (e.g., Genus, Family, Order). |
A data frame with species as the first column and taxonomic ranks as subsequent columns.
    tree <- build_tax_tree(
      species = c("Quercus_robur", "Pinus_nigra", "Fagus_orientalis"),
      Genus = c("Quercus", "Pinus", "Fagus"),
      Family = c("Fagaceae", "Pinaceae", "Fagaceae"),
      Order = c("Fagales", "Pinales", "Fagales")
    )
Computes all available diversity indices for one or more communities and returns them in a single data frame. Optionally produces a grouped bar plot for visual comparison.
    compare_indices(
      communities,
      tax_tree,
      correction = c("none", "miller_madow", "grassberger", "chao_shen"),
      plot = FALSE
    )
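| Argument | Description |
|---|---|
| communities | A named list of community vectors (named numeric), or a single named numeric vector. When a single vector is provided, it is wrapped in a list with name "Community". |
| tax_tree | A data frame representing the taxonomic hierarchy, as produced by build_tax_tree(). |
| correction | Bias correction for the Shannon index. One of "none", "miller_madow", "grassberger", or "chao_shen". |
| plot | Logical. If TRUE, a grouped bar plot is returned alongside the data frame. Default is FALSE. |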
The function calculates the following indices:
Shannon: Shannon-Wiener entropy (shannon)
Simpson: Gini-Simpson index (simpson)
Delta: Clarke & Warwick taxonomic diversity (delta)
Delta_star: Clarke & Warwick taxonomic distinctness (delta_star)
AvTD: Average taxonomic distinctness (avtd)
VarTD: Variation in taxonomic distinctness (vartd)
uTO: Unweighted taxonomic diversity (Ozkan pTO, all levels)
TO: Weighted taxonomic diversity (Ozkan pTO, all levels)
uTO_plus: Unweighted taxonomic distance (Ozkan pTO, all levels)
TO_plus: Weighted taxonomic distance (Ozkan pTO, all levels)
uTO_max: Unweighted taxonomic diversity (informative levels)
TO_max: Weighted taxonomic diversity (informative levels)
uTO_plus_max: Unweighted taxonomic distance (informative levels)
TO_plus_max: Weighted taxonomic distance (informative levels)
If plot = FALSE, a data frame with communities as rows
and indices as columns. If plot = TRUE, a list with two elements:
The data frame of index values.
A ggplot object showing a grouped bar chart.
shannon, simpson,
delta, delta_star,
avtd, vartd,
ozkan_pto, pto_components
    tax <- build_tax_tree(
      species = c("sp1", "sp2", "sp3", "sp4"),
      Genus = c("G1", "G1", "G2", "G2"),
      Family = c("F1", "F1", "F1", "F2"),
      Order = c("O1", "O1", "O1", "O1")
    )
    # Single community
    comm <- c(sp1 = 10, sp2 = 20, sp3 = 15, sp4 = 5)
    compare_indices(comm, tax)
    # Multiple communities
    comm_list <- list(
      Site_A = c(sp1 = 10, sp2 = 20, sp3 = 15, sp4 = 5),
      Site_B = c(sp1 = 5, sp2 = 5, sp3 = 5, sp4 = 5)
    )
    compare_indices(comm_list, tax)
Calculates the taxonomic diversity index (Delta) from Warwick & Clarke (1995). This is the average weighted path length between every pair of individuals, including same-species pairs (weighted 0).
delta(community, tax_tree, weights = NULL)
| Argument | Description |
|---|---|
| community | A named numeric vector of species abundances. |
| tax_tree | A data frame with taxonomic hierarchy. |
| weights | Optional numeric vector of path weights for each taxonomic level. If NULL, a linear scale is used (1, 2, 3, ...). |
A numeric value representing taxonomic diversity (Delta).
Warwick, R.M. & Clarke, K.R. (1995). New 'biodiversity' measures reveal a decrease in taxonomic distinctness with increasing stress. Marine Ecology Progress Series, 129, 301-305.
delta_star() for taxonomic distinctness (excluding same-species),
avtd() for presence/absence-based AvTD,
ozkan_pto() for Deng entropy-based alternative.
    comm <- c(sp1 = 5, sp2 = 3, sp3 = 3, sp4 = 1, sp5 = 3)
    tax <- data.frame(
      Species = paste0("sp", 1:5),
      Genus = c("G1", "G1", "G2", "G2", "G2"),
      Family = c("F1", "F1", "F1", "F2", "F2"),
      stringsAsFactors = FALSE
    )
    delta(comm, tax)
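To make the "every pair of individuals" averaging concrete, the sketch below evaluates the Warwick & Clarke (1995) formula, the sum over species pairs of distance times the product of abundances, divided by n(n - 1)/2 with n the total number of individuals. The abundances and the distance matrix `omega` are hypothetical, and `delta_sketch()` is an illustrative helper, not the package function.

```r
# Hypothetical abundances and a hypothetical symmetric distance matrix
x <- c(sp1 = 5, sp2 = 3, sp3 = 2)
omega <- matrix(c(0, 1, 2,
                  1, 0, 2,
                  2, 2, 0),
                nrow = 3, byrow = TRUE,
                dimnames = list(names(x), names(x)))

delta_sketch <- function(x, omega) {
  n <- sum(x)                              # total number of individuals
  cross <- outer(x, x) * omega             # omega_ij * x_i * x_j for all i, j
  # Sum over species pairs i < j; same-species pairs contribute 0
  # but still count in the denominator n(n - 1) / 2.
  sum(cross[upper.tri(cross)]) / (n * (n - 1) / 2)
}

delta_sketch(x, omega)
```

With these numbers the numerator is 15 + 20 + 12 = 47 and the denominator is 45.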
Calculates the taxonomic distinctness (Delta*) from Warwick & Clarke (1995). This is the average weighted path length between individuals of different species only.
delta_star(community, tax_tree, weights = NULL)
| Argument | Description |
|---|---|
| community | A named numeric vector of species abundances. |
| tax_tree | A data frame with taxonomic hierarchy. |
| weights | Optional numeric vector of path weights for each taxonomic level. If NULL, a linear scale is used (1, 2, 3, ...). |
A numeric value representing taxonomic distinctness (Delta*).
Warwick, R.M. & Clarke, K.R. (1995). New 'biodiversity' measures reveal a decrease in taxonomic distinctness with increasing stress. Marine Ecology Progress Series, 129, 301-305.
delta() for taxonomic diversity (including same-species),
avtd() and vartd() for presence/absence measures.
    comm <- c(sp1 = 5, sp2 = 3, sp3 = 3, sp4 = 1, sp5 = 3)
    tax <- data.frame(
      Species = paste0("sp", 1:5),
      Genus = c("G1", "G1", "G2", "G2", "G2"),
      Family = c("F1", "F1", "F1", "F2", "F2"),
      stringsAsFactors = FALSE
    )
    delta_star(comm, tax)
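Restricting the average to individuals of different species changes only the denominator: the weighted sum over species pairs is divided by the sum of abundance products over those same pairs. A minimal base-R sketch follows, with hypothetical abundances and a hypothetical distance matrix; `delta_star_sketch()` is illustrative and not the package function.

```r
# Hypothetical abundances and a hypothetical symmetric distance matrix
x <- c(sp1 = 5, sp2 = 3, sp3 = 2)
omega <- matrix(c(0, 1, 2,
                  1, 0, 2,
                  2, 2, 0),
                nrow = 3, byrow = TRUE,
                dimnames = list(names(x), names(x)))

delta_star_sketch <- function(x, omega) {
  prod_ij <- outer(x, x)                   # x_i * x_j for all i, j
  upper <- upper.tri(prod_ij)              # species pairs with i < j
  # Weighted distances over between-species pairs of individuals only.
  sum((prod_ij * omega)[upper]) / sum(prod_ij[upper])
}

delta_star_sketch(x, omega)
```

Here the numerator is 47 and the denominator is 15 + 10 + 6 = 31, so the value exceeds the one obtained when same-species pairs are kept in the denominator.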
Computes the Deng entropy (Ed) for a given set of group proportions at a specific taxonomic level. This is the core entropy calculation from Deng (2016), which generalizes Shannon entropy through the Dempster-Shafer evidence theory framework.
    deng_entropy_level(
      abundances,
      group_sizes = NULL,
      correction = c("none", "miller_madow", "grassberger", "chao_shen")
    )
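| Argument | Description |
|---|---|
| abundances | A numeric vector of abundances for each group (node) at the given taxonomic level. |
| group_sizes | Optional integer vector of focal element sizes (number of species in each group). If NULL, every size is taken as 1 (species level). |
| correction | Bias correction method for Shannon entropy estimation. Only applied at species level. One of "none", "miller_madow", "grassberger", or "chao_shen". |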
The Deng entropy is calculated as:
$$E_d = -\sum_{i} m(F_i)\,\log\frac{m(F_i)}{2^{|F_i|} - 1}$$
At species level, each focal element has cardinality 1, so Deng entropy reduces to Shannon entropy:
$$H = -\sum_{i} p_i \log p_i$$
At higher levels (genus, family, etc.), $|F_i|$ equals the number of species within each group, and the mass function $m(F_i)$ is the normalized proportion of total abundance in each group.
Bias correction is only meaningful at the species level where Deng entropy equals Shannon entropy. At higher taxonomic levels the mass function has a different structure and bias-correction formulas do not apply.
A numeric value representing the Deng entropy at that level.
Deng, Y. (2016). Deng entropy. Chaos, Solitons & Fractals, 91, 549-553.
ozkan_pto() which uses this function internally,
shannon() for classical Shannon entropy and bias corrections.
    # Shannon entropy (species level, |Fi| = 1 for all)
    deng_entropy_level(c(4, 2, 3, 1, 2, 3, 2, 2))
    # With bias correction at species level
    deng_entropy_level(c(4, 2, 3, 1, 2), correction = "chao_shen")
    # Deng entropy at genus level with group sizes
    deng_entropy_level(c(9, 3, 7), group_sizes = c(3, 2, 3))
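The reduction to Shannon entropy at species level can be checked numerically. The sketch below implements the Deng (2016) formula directly in base R; `deng_sketch()` and `shannon_sketch()` are hypothetical helpers, no bias correction is applied, and the natural logarithm is an assumption (the package's log base is not stated here).

```r
# Deng entropy: -sum( m * log( m / (2^|F| - 1) ) )
# Assumption: natural logarithm; group sizes default to 1 (species level).
deng_sketch <- function(abundances, group_sizes = rep(1, length(abundances))) {
  m <- abundances / sum(abundances)        # mass function
  -sum(m * log(m / (2^group_sizes - 1)))
}

shannon_sketch <- function(abundances) {
  p <- abundances / sum(abundances)
  -sum(p * log(p))
}

# Species level: 2^1 - 1 = 1, so the two quantities coincide.
sp <- c(4, 2, 3, 1, 2, 3, 2, 2)
c(deng = deng_sketch(sp), shannon = shannon_sketch(sp))

# Genus level: groups holding 3, 2 and 3 species add the term
# sum( m * log(2^|F| - 1) ) on top of the Shannon part.
deng_sketch(c(9, 3, 7), group_sizes = c(3, 2, 3))
```

The extra term grows with group size, which is how the number of species inside each higher-level group enters the entropy.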
A named numeric vector of species abundances for a single forest community with 8 Anatolian tree species. Abundance values follow the Westhoff & van der Maarel (1973) scale (1–9). This vector mirrors the hypothetical example in Ozkan (2018).
gazi_comm
A named numeric vector with 8 elements. Names are species binomials (underscore-separated); values are integer abundances (1–4).
The species include 3 genera from Pinaceae, 2 from Fagaceae, 1 each from Cupressaceae and Betulaceae, spanning 2 orders (Pinales, Fagales).
Pair with gazi_gytk for analysis:
ozkan_pto(gazi_comm, gazi_gytk)
compare_indices(gazi_comm, gazi_gytk)
Ozkan, K. (2018). A new proposed measure for estimating taxonomic diversity. Turkish Journal of Forestry, 19(4), 336–346.
gazi_gytk for the matching taxonomy,
anatolian_trees for a multi-site dataset.
    data(gazi_comm)
    data(gazi_gytk)
    # Ozkan pTO
    ozkan_pto(gazi_comm, gazi_gytk)
    # All indices at once
    compare_indices(gazi_comm, gazi_gytk)
A data frame containing the taxonomic hierarchy for the 8 species
in gazi_comm, with 3 taxonomic ranks (Genus, Family,
Order). This compact taxonomy table is designed for quick
demonstrations and unit testing.
gazi_gytk
A data frame with 8 rows and 4 columns:
Binomial species name (character)
Genus (character)
Family (character)
Order (character)
The taxonomy represents:
Genera include Pinus, Cedrus, Quercus, Fagus, Juniperus, and Carpinus
4 families: Pinaceae (3 spp), Fagaceae (3 spp), Cupressaceae (1), Betulaceae (1)
2 orders: Pinales (4 spp), Fagales (4 spp)
Ozkan, K. (2018). A new proposed measure for estimating taxonomic diversity. Turkish Journal of Forestry, 19(4), 336–346.
gazi_comm for the matching community vector,
build_tax_tree for building custom taxonomies.
    data(gazi_gytk)
    gazi_gytk
    # Compute taxonomic distance matrix
    tax_distance_matrix(gazi_gytk)
Computes the four components of the Deng entropy-based taxonomic diversity measure proposed by Ozkan (2018): weighted/unweighted taxonomic diversity (TO, uTO) and weighted/unweighted taxonomic distance (TO+, uTO+).
ozkan_pto(community, tax_tree, max_level = NULL)
| Argument | Description |
|---|---|
| community | A named numeric vector of species abundances. Names must match the first column of tax_tree. |
| tax_tree | A data frame with taxonomic hierarchy. First column is species names, subsequent columns are taxonomic ranks from lowest to highest (e.g., Species, Genus, Family, Order, Class, Phylum, Kingdom). |
| max_level | Integer, "auto", or NULL. Default is NULL; "auto" enables automatic max-level detection. |
The method uses the slicing procedure from Ozkan (2018). At each slice (nk = 0, 1, ..., n_s), species with abundance <= nk are removed. The surviving species receive EQUAL weight (1/count) — abundance information enters indirectly through which species survive each slice.
Deng entropy at each taxonomic level is computed using these equal proportions, where the mass function m(Fi) = count_in_group / total_count and |Fi| = number of species in that taxonomic group.
The core product formula at each slice is:
where is the Deng entropy at species level and
is the Deng entropy at level i, computed using
presence/absence (equal weight) proportions.
pTO+ (taxonomic distance) uses only the nk=0 slice:
pTO (taxonomic diversity) aggregates across all slices:
A named list with components:
Unweighted taxonomic diversity (all levels)
Weighted taxonomic diversity (all levels)
Unweighted taxonomic distance (all levels)
Weighted taxonomic distance (all levels)
Unweighted taxonomic diversity (max informative level)
Weighted taxonomic diversity (max informative level)
Unweighted taxonomic distance (max informative level)
Weighted taxonomic distance (max informative level)
Deng entropy at each taxonomic level (nk=0 slice)
Integer: highest level with Ed > 0
Ozkan, K. (2018). A new proposed measure for estimating taxonomic diversity. Turkish Journal of Forestry, 19(4), 336-346. DOI: 10.18182/tjf.441061
Deng, Y. (2016). Deng entropy. Chaos, Solitons & Fractals, 91, 549-553.
deng_entropy_level() for the core Deng entropy calculation,
pto_components() for a convenience wrapper returning a named vector,
delta() and avtd() for Clarke & Warwick alternatives.
    # Simple example with 5 species
    comm <- c(sp1 = 4, sp2 = 2, sp3 = 3, sp4 = 1, sp5 = 2)
    tax <- data.frame(
      Species = paste0("sp", 1:5),
      Genus = c("G1", "G1", "G1", "G2", "G2"),
      Family = c("F1", "F1", "F1", "F1", "F1"),
      stringsAsFactors = FALSE
    )
    ozkan_pto(comm, tax)
    # With auto max-level detection
    ozkan_pto(comm, tax, max_level = "auto")
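The slicing step described for this method can be traced in base R. The sketch below lists, for each slice nk, which species survive, the equal weight each survivor receives, and the genus-level Deng entropy computed from counts. It is illustrative only: it stops at `max(comm) - 1` so that every slice keeps at least one species (an assumption about the upper slice limit), it uses the natural logarithm (also an assumption), and it does not compute the pTO product itself.

```r
comm  <- c(sp1 = 4, sp2 = 2, sp3 = 3, sp4 = 1, sp5 = 2)
genus <- c(sp1 = "G1", sp2 = "G1", sp3 = "G1", sp4 = "G2", sp5 = "G2")

slice_table <- do.call(rbind, lapply(0:(max(comm) - 1), function(nk) {
  alive  <- names(comm)[comm > nk]         # species with abundance <= nk are removed
  counts <- table(genus[alive])            # surviving species per genus
  m      <- as.numeric(counts) / length(alive)  # mass = count_in_group / total_count
  size   <- as.numeric(counts)             # |Fi| = number of species in the group
  data.frame(nk             = nk,
             n_survivors    = length(alive),
             species_weight = 1 / length(alive),  # equal weight per survivor
             Ed_genus       = -sum(m * log(m / (2^size - 1))))
}))

slice_table
```

Abundance never appears in the entropy itself; it only decides at which slice a species drops out, which is the indirect role described above.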
Runs the complete Ozkan taxonomic diversity analysis pipeline: jackknife (Islem 1), stochastic resampling (Islem 2), and sensitivity analysis (Islem 3), returning the maximum values across all three runs. This is equivalent to running all three steps in the Excel macro sequentially.
ozkan_pto_full(community, tax_tree, n_iter = 101L, seed = NULL)
| Argument | Description |
|---|---|
| community | A named numeric vector of species abundances. Names must match the first column of tax_tree. |
| tax_tree | A data frame with taxonomic hierarchy. First column is species names, subsequent columns are taxonomic ranks. |
| n_iter | Number of stochastic iterations for Run 2 and Run 3 (default: 101, minimum: 101). |
| seed | Optional random seed for reproducibility. If provided, it seeds the stochastic runs (Run 2 and Run 3). |
This function implements the full Excel macro pipeline in a single call:
Islem 1: Leave-one-out jackknife to identify contributing (happy) vs non-contributing (unhappy) species, plus deterministic pTO calculation.
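Islem 2: Stochastic resampling, in which unhappy species are always included and happy species are included with 50% probability.
Islem 3: Sensitivity analysis, in which unhappy species are included with probability $(S-1)/S$ (S = total species count) and happy species get a data-driven probability.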
Final result: Maximum values across all three runs.
A named list with components:
Final maximum uTO+ across all 3 runs
Final maximum TO+ across all 3 runs
Final maximum uTO across all 3 runs
Final maximum TO across all 3 runs
Deterministic pTO result (from ozkan_pto())
Full Run 2 result (from ozkan_pto_resample())
Full Run 3 result (from ozkan_pto_sensitivity())
Jackknife result with species classifications
Ozkan, K. (2018). A new proposed measure for estimating taxonomic diversity. Turkish Journal of Forestry, 19(4), 336-346. DOI: 10.18182/tjf.441061
ozkan_pto() for deterministic calculation only,
ozkan_pto_resample() for Run 2 only,
ozkan_pto_sensitivity() for Run 3 only,
ozkan_pto_jackknife() for jackknife only.
    comm <- c(sp1 = 4, sp2 = 2, sp3 = 3, sp4 = 1, sp5 = 2)
    tax <- data.frame(
      Species = paste0("sp", 1:5),
      Genus = c("G1", "G1", "G1", "G2", "G2"),
      Family = c("F1", "F1", "F1", "F1", "F1"),
      stringsAsFactors = FALSE
    )
    set.seed(42)
    result <- ozkan_pto_full(comm, tax, n_iter = 101)
    result$uTO_plus  # Final maximum uTO+
    result$TO_plus   # Final maximum TO+
Implements the leave-one-out jackknife procedure from the Ozkan Excel macro (Islem 1). Removes each species one at a time, recalculates pTO, and identifies "happy" (contributing) and "unhappy" (non-contributing) species. A species is "happy" if its removal decreases the pTO index, indicating it positively contributes to the community's taxonomic diversity.
ozkan_pto_jackknife(community, tax_tree, component = "uTO_plus")
| Argument | Description |
|---|---|
| community | A named numeric vector of species abundances. Names must match the first column of tax_tree. |
| tax_tree | A data frame with taxonomic hierarchy. First column is species names, subsequent columns are taxonomic ranks. |
| component | Character string specifying which pTO component to use for the happy/unhappy classification. Default is "uTO_plus". |
The jackknife procedure follows the Excel macro's Islem 1 logic:
Compute pTO for the full community.
For each species i, remove it and compute pTO for the remaining community (leave-one-out).
Compare each leave-one-out result against the full-community value.
If removing species i DECREASES the specified component (pTO becomes smaller), species i is classified as "happy" (contributing).
If removing species i does NOT decrease the component, species i is classified as "unhappy" (non-contributing).
The happy/unhappy classification is used by ozkan_pto_resample()
(Islem 2) and ozkan_pto_sensitivity() (Islem 3) to apply
different resampling probabilities to each species category.
A named list with components:
The ozkan_pto() result for the full community
Data frame with leave-one-out results per species
Named logical vector: TRUE = happy
(contributing), FALSE = unhappy (non-contributing)
Number of happy species
Number of unhappy species
Ozkan, K. (2018). A new proposed measure for estimating taxonomic diversity. Turkish Journal of Forestry, 19(4), 336-346. DOI: 10.18182/tjf.441061
ozkan_pto() for the core calculation,
ozkan_pto_resample() for Run 2,
ozkan_pto_full() for the full 3-run pipeline.
    comm <- c(sp1 = 4, sp2 = 2, sp3 = 3, sp4 = 1, sp5 = 2)
    tax <- data.frame(
      Species = paste0("sp", 1:5),
      Genus = c("G1", "G1", "G1", "G2", "G2"),
      Family = c("F1", "F1", "F1", "F1", "F1"),
      stringsAsFactors = FALSE
    )
    jk <- ozkan_pto_jackknife(comm, tax)
    jk$species_status  # Which species are happy (contributing)?
    jk$n_happy         # How many happy species?
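The leave-one-out classification can be sketched independently of pTO. In the base-R example below, Shannon entropy is used as a stand-in index purely for illustration (it is not the pTO component), the community is hypothetical, and `index_fn()` is a made-up helper name. The logic follows the steps listed above: compute the full value, recompute without each species, and mark a species as happy when its removal decreases the index.

```r
# Hypothetical community with one dominant species
comm <- c(sp1 = 20, sp2 = 1, sp3 = 1, sp4 = 1)

# Stand-in index (assumption): Shannon entropy with natural logarithm
index_fn <- function(x) {
  p <- x / sum(x)
  -sum(p * log(p))
}

full <- index_fn(comm)

# Leave-one-out values, one per removed species
loo <- vapply(names(comm),
              function(sp) index_fn(comm[names(comm) != sp]),
              numeric(1))

# Happy = removal decreases the index; everything else is unhappy
happy <- loo < full
data.frame(species = names(comm), loo = loo, happy = happy, row.names = NULL)
```

With this stand-in, removing the dominant `sp1` raises the index, so it is classified as unhappy, while the three rare species are happy.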
Implements the stochastic resampling procedure from Ozkan's Excel macro (Islem 2). First performs a jackknife (Islem 1) to identify "happy" (contributing) and "unhappy" (non-contributing) species, then runs stochastic resampling where unhappy species are always included and happy species are randomly included with 50% probability.
ozkan_pto_resample(community, tax_tree, n_iter = 101L, seed = NULL)
| Argument | Description |
|---|---|
| community | A named numeric vector of species abundances. Names must match the first column of tax_tree. |
| tax_tree | A data frame with taxonomic hierarchy. First column is species names, subsequent columns are taxonomic ranks. |
| n_iter | Number of stochastic iterations to run (default: 101). Must be >= 101. |
| seed | Optional random seed for reproducibility (default: NULL). |
The algorithm follows the Excel macro's Islem 1 + Islem 2 logic:
Run jackknife (ozkan_pto_jackknife()) to classify each
species as happy or unhappy.
Iteration 1: Use the original community (deterministic).
Iterations 2..n_iter: For each species:
Unhappy species (AA = 0): always included with original abundance.
Happy species (AA > 0): randomly included (50% probability) or excluded. Uses RANDBETWEEN(0,1) * abundance.
Return the maximum of each component across all iterations.
A named list with components:
Maximum unweighted taxonomic distance across iterations
Maximum weighted taxonomic distance across iterations
Maximum unweighted taxonomic diversity across iterations
Maximum weighted taxonomic diversity across iterations
Deterministic uTO+ (first iteration, same as
ozkan_pto())
Deterministic TO+ (first iteration)
Deterministic uTO (first iteration)
Deterministic TO (first iteration)
Number of iterations performed
Named logical vector from jackknife
(TRUE = happy)
Full jackknife result from
ozkan_pto_jackknife()
Number of iterations with positive uTO+
Data frame with all iteration results
Ozkan, K. (2018). A new proposed measure for estimating taxonomic diversity. Turkish Journal of Forestry, 19(4), 336-346. DOI: 10.18182/tjf.441061
ozkan_pto_jackknife() for the jackknife step,
ozkan_pto_sensitivity() for Run 3,
ozkan_pto_full() for the full pipeline.
    comm <- c(sp1 = 4, sp2 = 2, sp3 = 3, sp4 = 1, sp5 = 2)
    tax <- data.frame(
      Species = paste0("sp", 1:5),
      Genus = c("G1", "G1", "G1", "G2", "G2"),
      Family = c("F1", "F1", "F1", "F1", "F1"),
      stringsAsFactors = FALSE
    )
    set.seed(42)
    result <- ozkan_pto_resample(comm, tax, n_iter = 101)
    result$species_status  # Happy/unhappy classification
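The resampling loop itself is short enough to sketch in base R. The example below is a hedged illustration, not the package's implementation: `resample_sketch()` is a hypothetical helper, Shannon entropy stands in for the pTO components, and the happy/unhappy vector is supplied by hand instead of coming from a jackknife. It keeps the structure described above: a deterministic first iteration, a 0/1 coin flip for each happy species in later iterations, and the maximum taken across iterations.

```r
# Stand-in index (assumption): Shannon entropy with natural logarithm
shannon_sketch <- function(x) {
  p <- x / sum(x)
  -sum(p * log(p))
}

resample_sketch <- function(comm, happy, index_fn, n_iter = 101L, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  values <- numeric(n_iter)
  values[1] <- index_fn(comm)              # iteration 1: original community
  for (k in 2:n_iter) {
    coin <- sample(0:1, length(comm), replace = TRUE)  # RANDBETWEEN(0, 1)
    keep <- ifelse(happy, coin, 1)         # unhappy species are always kept
    x <- comm[keep == 1]
    values[k] <- if (length(x) > 0) index_fn(x) else 0
  }
  list(max_value = max(values), deterministic = values[1], iterations = values)
}

# Hypothetical community and hand-written classification
comm  <- c(sp1 = 20, sp2 = 1, sp3 = 1, sp4 = 1)
happy <- c(sp1 = FALSE, sp2 = TRUE, sp3 = TRUE, sp4 = TRUE)

res <- resample_sketch(comm, happy, shannon_sketch, n_iter = 101L, seed = 42)
res$max_value >= res$deterministic
```

Because the deterministic value is always among the candidates, the reported maximum can never fall below it.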
Implements the sensitivity analysis procedure from Ozkan's Excel macro (Islem 3). Uses the jackknife results from Run 2 to apply species-specific inclusion probabilities: unhappy species get inclusion probability $(S-1)/S$ (S = total species count), happy species get a data-driven probability derived from Run 2 iteration results.
    ozkan_pto_sensitivity(
      community,
      tax_tree,
      run2_result,
      n_iter = NULL,
      seed = NULL
    )
| Argument | Description |
|---|---|
| community | A named numeric vector of species abundances. |
| tax_tree | A data frame with taxonomic hierarchy. |
| run2_result | The result from ozkan_pto_resample(). |
| n_iter | Number of iterations (default: same as Run 2). |
| seed | Optional random seed for reproducibility. |
The algorithm follows the Excel macro's Islem 3 logic:
For each species, the inclusion probability depends on its jackknife classification from Islem 1:
Unhappy species (AA = 0, non-contributing): Included with probability $(S-1)/S$, where S is total species count. In the Excel formula: IF(RANDBETWEEN(1, S) > 1, H2, 0).
Happy species (AA > 0, contributing): Included with
probability derived from Run 2 results. In the Excel formula:
IF(L25 >= RANDBETWEEN(0, K22), H2, 0), where L25 is a
summary score from Run 2 and K22 is the iteration count.
The happy species probability is computed as:
where is the number of Run 2 iterations that produced
a positive uTO+ value and S is the species count.
The maximum pTO across all three runs (Run 1, 2, 3) is the final result.
A named list with components:
Maximum uTO+ across Run 1, 2, and 3
Maximum TO+ across all runs
Maximum uTO across all runs
Maximum TO across all runs
Maximum uTO+ from Run 3 only
Maximum TO+ from Run 3 only
Maximum uTO from Run 3 only
Maximum TO from Run 3 only
Number of iterations performed
Named numeric vector of inclusion probabilities
Probability used for happy species
Probability used for unhappy species
Data frame with all Run 3 iteration results
Ozkan, K. (2018). A new proposed measure for estimating taxonomic diversity. Turkish Journal of Forestry, 19(4), 336-346. DOI: 10.18182/tjf.441061
ozkan_pto_resample() for Run 2,
ozkan_pto_full() for the full pipeline.
comm <- c(sp1 = 4, sp2 = 2, sp3 = 3, sp4 = 1, sp5 = 2)
tax <- data.frame(
  Species = paste0("sp", 1:5),
  Genus = c("G1", "G1", "G1", "G2", "G2"),
  Family = c("F1", "F1", "F1", "F1", "F1"),
  stringsAsFactors = FALSE
)
set.seed(42)
run2 <- ozkan_pto_resample(comm, tax, n_iter = 101)
ozkan_pto_sensitivity(comm, tax, run2, n_iter = 101)
Creates a bubble chart showing each species' abundance (x-axis), average taxonomic distance to other species (y-axis), and relative contribution to the community (bubble size). Species that are both abundant and taxonomically distant from others contribute most to overall taxonomic diversity.
plot_bubble(community, tax_tree, color_by = NULL, title = NULL)
community |
Named numeric vector of species abundances. |
tax_tree |
A data frame representing the taxonomic hierarchy,
as produced by |
color_by |
Character string specifying which taxonomic rank to use
for coloring bubbles. Must match a column name in |
title |
Optional character string for the plot title. |
For each species $i$, the average taxonomic distance is calculated as:
$$\bar{\omega}_i = \frac{1}{S - 1} \sum_{j \neq i} \omega_{ij}$$
where $\omega_{ij}$ is the pairwise taxonomic distance and
$S$ is the number of species. Bubble size represents the product of
relative abundance and average distance, indicating each species'
contribution to overall taxonomic diversity.
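The quantities behind the bubble chart can be computed by hand from a distance matrix. The sketch below uses a hypothetical four-species matrix and assumes the average is taken over the other S - 1 species; it does not call the package.

```r
# Hypothetical pairwise taxonomic distances for four species
sp <- paste0("sp", 1:4)
d <- matrix(c(0, 1, 2, 3,
              1, 0, 2, 3,
              2, 2, 0, 3,
              3, 3, 3, 0),
            nrow = 4, byrow = TRUE, dimnames = list(sp, sp))
comm <- c(sp1 = 10, sp2 = 20, sp3 = 15, sp4 = 5)

S <- length(comm)
# The diagonal is zero, so each row sum covers only the other species
avg_dist <- rowSums(d) / (S - 1)
rel_abund <- comm / sum(comm)
# Bubble size: relative abundance times average distance
contribution <- rel_abund * avg_dist

data.frame(abundance = comm, avg_dist = avg_dist, contribution = contribution)
```

Here sp2 has the largest contribution because it is the most abundant species, even though sp4 is the most taxonomically isolated.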
A ggplot object.
tax_distance_matrix, compare_indices
comm <- c(sp1 = 25, sp2 = 18, sp3 = 30, sp4 = 12, sp5 = 8)
tax <- build_tax_tree(
  species = paste0("sp", 1:5),
  Genus = c("G1", "G1", "G2", "G2", "G3"),
  Family = c("F1", "F1", "F1", "F2", "F2"),
  Order = c("O1", "O1", "O1", "O1", "O1")
)
plot_bubble(comm, tax)
Produces a Clarke & Warwick style funnel plot showing expected confidence limits for Average Taxonomic Distinctness (AvTD) and/or Variation in Taxonomic Distinctness (VarTD) as a function of species richness. Observed site values can be overlaid to assess whether they fall within or outside the expected range.
plot_funnel(
  sim_result,
  observed = NULL,
  index = c("avtd", "vartd"),
  title = NULL,
  point_labels = TRUE,
  mean_color = "darkblue",
  ci_color = "steelblue"
)
sim_result |
A |
observed |
Optional data frame with columns |
index |
Which index to plot when |
title |
Optional plot title. If |
point_labels |
Logical; if |
mean_color |
Color of the mean line (default: |
ci_color |
Fill color of the confidence band (default:
|
The funnel shape arises because small samples (low S) have greater random variation in AvTD/VarTD, producing wider confidence bands. As S approaches the full species pool, the band narrows.
Observed points falling below the lower confidence limit suggest the community has lower taxonomic breadth than expected by chance, potentially indicating environmental stress or biotic homogenisation.
Requires the ggplot2 package.
A ggplot object.
Clarke, K.R. & Warwick, R.M. (1998). A taxonomic distinctness index and its statistical properties. Journal of Applied Ecology, 35, 523-531.
simulate_td() for generating the simulation,
avtd() and vartd() for the underlying calculations.
tax <- data.frame(
  Species = paste0("sp", 1:10),
  Genus = rep(c("G1", "G2", "G3", "G4", "G5"), each = 2),
  Family = rep(c("F1", "F1", "F2", "F2", "F3"), each = 2),
  Order = rep(c("O1", "O1", "O2", "O2", "O2"), each = 2),
  stringsAsFactors = FALSE
)
sim <- simulate_td(tax, n_sim = 99, seed = 42)

# Basic funnel plot
plot_funnel(sim)

# With observed sites
obs <- data.frame(
  site = c("Site_A", "Site_B"),
  s = c(5, 8),
  value = c(2.5, 1.8)
)
plot_funnel(sim, observed = obs)
Visualizes the pairwise taxonomic distance matrix as a colored heatmap using ggplot2. Species are ordered by hierarchical clustering so that taxonomically similar species appear adjacent.
plot_heatmap(
  tax_tree,
  label_size = 3,
  title = NULL,
  low_color = "white",
  high_color = "#B22222"
)
tax_tree |
A data frame representing the taxonomic hierarchy,
as produced by |
label_size |
Numeric value controlling the size of species labels. Default is 3. |
title |
Optional character string for the plot title. |
low_color |
Color for the lowest distance (most similar).
Default is |
high_color |
Color for the highest distance (most distant).
Default is |
The heatmap displays the full symmetric distance matrix computed by
tax_distance_matrix. The diagonal (self-distance = 0)
appears in the lowest color. Species are reordered using hierarchical
clustering (UPGMA) to reveal taxonomic groupings visually.
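The reordering step can be illustrated with base R alone. This sketch assumes a hypothetical four-species distance matrix and uses `hclust()` with `method = "average"`, which is the UPGMA linkage; how the package itself performs the reordering is not shown in the source.

```r
# Hypothetical symmetric taxonomic distance matrix
sp <- paste0("sp", 1:4)
d <- matrix(c(0, 3, 1, 3,
              3, 0, 3, 2,
              1, 3, 0, 3,
              3, 2, 3, 0),
            nrow = 4, byrow = TRUE, dimnames = list(sp, sp))

# Average linkage on the distance object is UPGMA
hc <- hclust(as.dist(d), method = "average")

# Reorder rows and columns so clustered species sit next to each other
ordered_species <- rownames(d)[hc$order]
d_ordered <- d[hc$order, hc$order]
d_ordered
```

In the original order sp1 and sp3 (distance 1) are separated by sp2; after reordering they become neighbours, which is what makes the blocks visible in a heatmap.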
A ggplot object.
tax_distance_matrix, plot_taxonomic_tree
tax <- build_tax_tree(
  species = c("sp1", "sp2", "sp3", "sp4"),
  Genus = c("G1", "G1", "G2", "G2"),
  Family = c("F1", "F1", "F1", "F2")
)
plot_heatmap(tax)
Visualizes the iteration-by-iteration pTO values from stochastic resampling (Run 2) or sensitivity analysis (Run 3). Each iteration's value is shown as a point, with the deterministic (Run 1) value displayed as a horizontal reference line.
plot_iteration(resample_result, component = "TO", title = NULL)
resample_result |
The list returned by
|
component |
Character string specifying which pTO component to plot.
One of |
title |
Optional character string for the plot title. |
The plot includes three visual elements:
Grey points: Individual iteration values
Red dashed line: Deterministic (Run 1) value
Blue dashed line: Maximum value across all iterations
This helps assess how stochastic species removal affects the pTO index and whether the maximum exceeds the deterministic value.
A ggplot object showing iteration values as points,
the deterministic value as a dashed red line, and the maximum
value as a dashed blue line.
ozkan_pto_resample, ozkan_pto_sensitivity
comm <- c(sp1 = 10, sp2 = 20, sp3 = 15, sp4 = 5)
tax <- build_tax_tree(
  species = paste0("sp", 1:4),
  Genus = c("G1", "G1", "G2", "G2"),
  Family = c("F1", "F1", "F1", "F2")
)
res <- ozkan_pto_resample(comm, tax, n_iter = 101, seed = 42)
plot_iteration(res, component = "TO")
Creates a radar chart comparing diversity indices across multiple communities. Each axis represents a different index, and each community is drawn as a colored polygon. Values are normalized to 0-1 scale so that indices with different magnitudes can be compared visually.
plot_radar(communities, tax_tree, indices = NULL, title = NULL)
communities |
A named list of community vectors (named numeric). |
tax_tree |
A data frame representing the taxonomic hierarchy,
as produced by |
indices |
Character vector specifying which indices to include.
Default is all 10 indices. Available: |
title |
Optional character string for the plot title. |
Each index value is normalized using min-max scaling across the communities being compared:
$$x_{\text{norm}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
If all communities have the same value for an index (i.e.,
$x_{\max} = x_{\min}$), the normalized value is set to 0.5.
The radar chart is built using polar coordinates in ggplot2. Each community appears as a colored polygon overlay, making it easy to spot which community scores higher on which indices.
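The normalization described above, including the constant-value case, can be sketched as a small helper. The function name `minmax_scale` and the index values are hypothetical and not part of the package.

```r
# Min-max scaling across communities; returns 0.5 when all values are equal
minmax_scale <- function(x) {
  rng <- range(x)
  if (rng[1] == rng[2]) {
    return(rep(0.5, length(x)))
  }
  (x - rng[1]) / (rng[2] - rng[1])
}

# Hypothetical values of one index for three communities
index_values <- c(Site_A = 1.2, Site_B = 0.8, Site_C = 1.0)
scaled <- minmax_scale(index_values)
scaled

# A constant index carries no contrast, so every site gets the midpoint
constant <- minmax_scale(c(2, 2, 2))
```

Because scaling is relative to the communities being plotted, adding or removing a community can change every normalized value on that axis.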
A ggplot object.
compare_indices for tabular comparison
tax <- build_tax_tree(
  species = c("sp1", "sp2", "sp3", "sp4"),
  Genus = c("G1", "G1", "G2", "G2"),
  Family = c("F1", "F1", "F1", "F2")
)
comms <- list(
  Site_A = c(sp1 = 10, sp2 = 20, sp3 = 15, sp4 = 5),
  Site_B = c(sp1 = 5, sp2 = 5, sp3 = 5, sp4 = 5)
)
plot_radar(comms, tax)
Visualises a rarefaction curve with confidence intervals using ggplot2.
Accepts output from rarefaction_taxonomic().
plot_rarefaction(
  rare_result,
  title = NULL,
  xlab = "Sample Size (individuals)",
  ylab = NULL,
  ci_color = "steelblue",
  line_color = "darkblue"
)
rare_result |
A data frame returned by |
title |
Optional plot title. If |
xlab |
Label for the x-axis (default: |
ylab |
Label for the y-axis. If |
ci_color |
Fill color for the confidence interval ribbon
(default: |
line_color |
Color of the mean line (default: |
The plot shows the mean diversity value at each sample size as a solid line, surrounded by a shaded ribbon representing the bootstrap confidence interval. A vertical dashed line marks the total sample size (full data).
Requires the ggplot2 package.
A ggplot object.
rarefaction_taxonomic() for computing the rarefaction curve.
comm <- c(sp1 = 10, sp2 = 5, sp3 = 8, sp4 = 2, sp5 = 3)
tax <- data.frame(
  Species = paste0("sp", 1:5),
  Genus = c("G1", "G1", "G2", "G2", "G3"),
  Family = c("F1", "F1", "F1", "F2", "F2"),
  stringsAsFactors = FALSE
)
rare <- rarefaction_taxonomic(comm, tax, index = "shannon", n_boot = 50)
plot_rarefaction(rare)
Visualizes a taxonomic hierarchy as a dendrogram (tree diagram) using ggplot2. The function converts the taxonomic distance matrix into a hierarchical clustering object and renders it as a horizontal dendrogram with species labels colored by a chosen taxonomic rank.
plot_taxonomic_tree(
  tax_tree,
  community = NULL,
  color_by = NULL,
  label_size = 3,
  title = NULL
)
tax_tree |
A data frame representing the taxonomic hierarchy,
as produced by |
community |
Optional named numeric vector of species abundances.
Names must match species in |
color_by |
Character string specifying which taxonomic rank to use
for coloring species labels. Must match a column name in |
label_size |
Numeric value controlling the size of species labels. Default is 3. |
title |
Optional character string for the plot title. If |
The dendrogram is constructed from the pairwise taxonomic distance matrix
(computed via tax_distance_matrix) using hierarchical
clustering (hclust with method = "average").
Branch heights reflect taxonomic distance: species in the same genus
merge at the lowest level, while species in different orders merge at
the highest level.
When a community vector is provided, species labels are annotated
with abundance values in parentheses, e.g., "Quercus_coccifera (25)".
This function requires the ggplot2 package. If ggplot2 is not installed, the function will stop with an informative error message.
The clustering method used is UPGMA (Unweighted Pair Group Method with Arithmetic Mean), which is standard for taxonomic classification trees where branch lengths represent average distances between groups.
A ggplot object that can be further customized with
ggplot2 functions (e.g., + theme(), + labs()).
Clarke, K.R. & Warwick, R.M. (1998). A taxonomic distinctness index and its statistical properties. Journal of Applied Ecology, 35, 523–531.
build_tax_tree for creating the taxonomy input,
tax_distance_matrix for the underlying distance calculation.
# Build a simple taxonomic tree
tax <- build_tax_tree(
  species = c("Quercus_robur", "Quercus_petraea", "Pinus_nigra",
              "Pinus_brutia", "Juniperus_excelsa"),
  Genus = c("Quercus", "Quercus", "Pinus", "Pinus", "Juniperus"),
  Family = c("Fagaceae", "Fagaceae", "Pinaceae", "Pinaceae", "Cupressaceae"),
  Order = c("Fagales", "Fagales", "Pinales", "Pinales", "Pinales")
)

# Basic dendrogram
plot_taxonomic_tree(tax)

# Color by Family, with abundance info
comm <- c(Quercus_robur = 25, Quercus_petraea = 18, Pinus_nigra = 30,
          Pinus_brutia = 12, Juniperus_excelsa = 8)
plot_taxonomic_tree(tax, community = comm, color_by = "Family")
Returns a named numeric vector with all eight Ozkan (2018) components: four using all taxonomic levels and four using only the informative levels (max version), matching the Excel macro's Run 1+2+3 output.
pto_components(community, tax_tree)
community |
A named numeric vector of species abundances.
Names must match the first column of |
tax_tree |
A data frame with taxonomic hierarchy. First column is species names, subsequent columns are taxonomic ranks from lowest to highest (e.g., Species, Genus, Family, Order, Class, Phylum, Kingdom). |
A named numeric vector with eight elements:
uTO, TO, uTO_plus, TO_plus,
uTO_max, TO_max, uTO_plus_max, TO_plus_max.
ozkan_pto() for the full result including per-level entropy.
comm <- c(sp1 = 4, sp2 = 2, sp3 = 3, sp4 = 1, sp5 = 2)
tax <- data.frame(
  Species = paste0("sp", 1:5),
  Genus = c("G1", "G1", "G1", "G2", "G2"),
  Family = c("F1", "F1", "F1", "F1", "F1"),
  stringsAsFactors = FALSE
)
pto_components(comm, tax)
Computes rarefaction curves for taxonomic diversity indices by subsampling individuals from the community at increasing sample sizes. Uses bootstrap resampling to estimate expected diversity and confidence intervals at each sample size.
rarefaction_taxonomic(
  community,
  tax_tree,
  index = c("shannon", "simpson", "species", "uTO", "TO", "uTO_plus", "TO_plus", "avtd"),
  steps = 20,
  n_boot = 100,
  ci = 0.95,
  seed = NULL,
  correction = c("none", "miller_madow", "grassberger", "chao_shen"),
  parallel = FALSE,
  n_cores = NULL
)
community |
A named numeric vector of species abundances (integers).
Names must match the first column of |
tax_tree |
A data frame with taxonomic hierarchy. First column is species names, subsequent columns are taxonomic ranks. |
index |
Which index to rarefy. One of |
steps |
Number of sample-size steps along the curve (default: 20). |
n_boot |
Number of bootstrap replicates per step (default: 100). |
ci |
Confidence interval width (default: 0.95). |
seed |
Optional random seed for reproducibility (default: NULL). |
correction |
Bias correction for the Shannon index. One of
|
parallel |
Logical. If |
n_cores |
Number of CPU cores to use when |
The algorithm works as follows:
Expand the abundance vector into an individual-level vector (e.g., c(sp1=3, sp2=2) becomes c("sp1","sp1","sp1","sp2","sp2")).
For each sample size (from min to total N), draw n_boot
random subsamples without replacement.
For each subsample, count species abundances and compute the chosen diversity index.
Return the mean and confidence interval at each step.
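The four steps above can be sketched in base R for the Shannon index. This is a simplified illustration rather than the package's implementation: the helper `shannon_mle`, the choice of five sample sizes, and the output column names are all assumptions.

```r
set.seed(42)
comm <- c(sp1 = 10, sp2 = 5, sp3 = 8, sp4 = 2, sp5 = 3)

# Step 1: expand abundances into one entry per individual
individuals <- rep(names(comm), times = comm)

# Naive Shannon index from a vector of positive counts
shannon_mle <- function(counts) {
  p <- counts / sum(counts)
  -sum(p * log(p))
}

n_boot <- 50
sizes <- unique(round(seq(2, length(individuals), length.out = 5)))

# Steps 2-4: subsample without replacement, recompute, summarise
curve <- do.call(rbind, lapply(sizes, function(n) {
  vals <- replicate(n_boot, {
    sub <- sample(individuals, n)      # without replacement
    shannon_mle(as.vector(table(sub))) # table() drops absent species
  })
  data.frame(
    n = n,
    mean = mean(vals),
    lower = unname(quantile(vals, 0.025)),
    upper = unname(quantile(vals, 0.975)),
    sd = sd(vals)
  )
}))
curve
```

At the largest sample size every subsample contains all individuals, so the standard deviation collapses to zero and the mean equals the index of the full community.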
A data frame with columns:
Number of individuals in the subsample
Mean index value across bootstrap replicates
Lower bound of the confidence interval
Upper bound of the confidence interval
Standard deviation across replicates
Gotelli, N.J. & Colwell, R.K. (2001). Quantifying biodiversity: procedures and pitfalls in the measurement and comparison of species richness. Ecology Letters, 4, 379-391.
Ozkan, K. (2018). A new proposed measure for estimating taxonomic diversity. Turkish Journal of Forestry, 19(4), 336-346.
plot_rarefaction() for visualising the rarefaction curve,
ozkan_pto() for full pTO calculation, shannon() and simpson()
for classical indices, avtd() for average taxonomic distinctness.
comm <- c(sp1 = 10, sp2 = 5, sp3 = 8, sp4 = 2, sp5 = 3)
tax <- data.frame(
  Species = paste0("sp", 1:5),
  Genus = c("G1", "G1", "G2", "G2", "G3"),
  Family = c("F1", "F1", "F1", "F2", "F2"),
  stringsAsFactors = FALSE
)
rarefaction_taxonomic(comm, tax, index = "shannon", n_boot = 50)
Calculates the Shannon-Wiener diversity index (H') for a community, optionally applying a bias correction for small samples.
shannon(
  community,
  base = exp(1),
  correction = c("none", "miller_madow", "grassberger", "chao_shen")
)
community |
A numeric vector of species abundances (counts). |
base |
The logarithm base. Default is |
correction |
Bias correction method. One of |
The naive (MLE) Shannon index is calculated as:
$$H' = -\sum_{i=1}^{S} p_i \log p_i$$
where $p_i$ is the proportion of species $i$,
$N$ is the total number of individuals, and $S$ is the
number of species observed.
The MLE estimator has a known negative bias that is significant for small samples. Three bias-correction methods are available:
Miller-Madow (1955): Adds a first-order bias correction term:
$$H_{MM} = H' + \frac{S - 1}{2N}$$
Grassberger (2003): Uses the digamma function instead of the logarithm:
where $\psi$ is the digamma function.
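Chao-Shen (2003): Applies a Good-Turing coverage correction with Horvitz-Thompson weighting:
$$H_{CS} = -\sum_{i=1}^{S} \frac{\hat{C} p_i \log(\hat{C} p_i)}{1 - (1 - \hat{C} p_i)^N}$$
where $\hat{C} = 1 - f_1 / N$ and $f_1$ is
the number of singletons.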
Bias corrections require integer abundance counts. A warning is
issued if non-integer values are detected with correction != "none".
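As a rough illustration of two of the corrections, the sketch below implements the naive estimator, the Miller-Madow term and the Chao-Shen estimator in their standard textbook forms with natural logarithms. The function names are hypothetical, and the package's `shannon()` may differ in details such as edge-case handling.

```r
# Naive (MLE) Shannon index from counts
shannon_mle <- function(n) {
  n <- n[n > 0]
  p <- n / sum(n)
  -sum(p * log(p))
}

# Miller-Madow: add (S - 1) / (2N) to the naive estimate
shannon_miller_madow <- function(n) {
  n <- n[n > 0]
  shannon_mle(n) + (length(n) - 1) / (2 * sum(n))
}

# Chao-Shen: coverage-adjusted proportions with Horvitz-Thompson weights
shannon_chao_shen <- function(n) {
  n <- n[n > 0]
  N <- sum(n)
  f1 <- sum(n == 1)          # number of singletons
  if (f1 == N) f1 <- N - 1   # guard: avoid zero estimated coverage
  coverage <- 1 - f1 / N
  pa <- coverage * n / N
  -sum(pa * log(pa) / (1 - (1 - pa)^N))
}

comm <- c(10, 5, 8, 3, 12)
c(mle = shannon_mle(comm),
  miller_madow = shannon_miller_madow(comm),
  chao_shen = shannon_chao_shen(comm))
```

Both corrected values are larger than the naive one for this community, consistent with the negative bias of the MLE estimator noted above.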
A numeric value representing the Shannon diversity index.
Miller, G.A. & Madow, W.G. (1954). On the maximum likelihood estimate of the Shannon-Wiener index of diversity. AFCRC-TR-54-75.
Grassberger, P. (2003). Entropy estimates from insufficient samplings. arXiv:physics/0307138.
Chao, A. & Shen, T.-J. (2003). Nonparametric estimation of Shannon's index of diversity when there are unseen species in sample. Environmental and Ecological Statistics, 10, 429-443.
simpson() for Simpson diversity, deng_entropy_level() for
Deng entropy (a generalization of Shannon).
comm <- c(10, 5, 8, 3, 12)
shannon(comm)
shannon(comm, correction = "miller_madow")
shannon(comm, correction = "grassberger")
shannon(comm, correction = "chao_shen")
Calculates the Simpson diversity index (1 - D) for a community.
simpson(community, type = c("gini_simpson", "inverse", "dominance"))
community |
A numeric vector of species abundances. |
type |
One of |
Simpson's dominance index D is calculated as:
$$D = \sum_{i=1}^{S} p_i^2$$
The Gini-Simpson index is $1 - D$ and the inverse Simpson is
$1 / D$.
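The three variants are simple transformations of the same quantity, which a few lines of R make explicit. This sketch assumes D is the sum of squared proportions; the helper name `simpson_all` is hypothetical.

```r
# All three Simpson variants from a vector of abundances
simpson_all <- function(n) {
  p <- n / sum(n)
  D <- sum(p^2)
  c(dominance = D, gini_simpson = 1 - D, inverse = 1 / D)
}

comm <- c(10, 5, 8, 3, 12)
res <- simpson_all(comm)
res

# A perfectly even community of four species has inverse Simpson = 4
even <- simpson_all(c(5, 5, 5, 5))
```

The inverse form reads as an effective number of species, which is why it equals the species count when all abundances are equal.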
A numeric value representing the Simpson index.
shannon() for Shannon diversity.
comm <- c(10, 5, 8, 3, 12)
simpson(comm)
simpson(comm, type = "inverse")
Generates the null distribution of Average Taxonomic Distinctness (AvTD) and/or Variation in Taxonomic Distinctness (VarTD) by randomly drawing species subsets from a regional species pool. Used to construct funnel plots for statistical testing (Clarke & Warwick 1998, 2001).
simulate_td(
  tax_tree,
  s_range = NULL,
  n_sim = 999L,
  index = c("both", "avtd", "vartd"),
  weights = NULL,
  ci = 0.95,
  seed = NULL,
  parallel = FALSE,
  n_cores = NULL
)
tax_tree |
A data frame representing the full regional species pool taxonomy. First column is species names, subsequent columns are taxonomic ranks from lowest to highest. |
s_range |
Integer vector of species richness values to
simulate. Default |
n_sim |
Number of random draws per species richness value (default 999). |
index |
Which index to simulate: |
weights |
Optional numeric vector of weights for taxonomic
levels. Passed to |
ci |
Confidence interval width (default 0.95). |
seed |
Optional random seed for reproducibility. |
parallel |
Logical. If |
n_cores |
Number of CPU cores to use when |
For each value of S in s_range, n_sim random subsets of S
species are drawn (without replacement) from the full species pool
in tax_tree. AvTD and/or VarTD are computed for each random
subset. The mean and percentile-based confidence limits are
recorded.
The resulting object can be passed to plot_funnel() to produce
the classic Clarke & Warwick funnel plot.
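The null-model procedure can be sketched without the package. The example below builds a hypothetical 10-species pool, assumes step distances of 1 (same genus), 2 (same family) and 3 (otherwise), and records the mean and percentile limits of AvTD for each richness value; the column names are illustrative only.

```r
set.seed(42)
# Hypothetical pool: 10 species in 5 genera and 3 families
genus  <- rep(1:5, each = 2)
family <- rep(c(1, 1, 2, 2, 3), each = 2)

# Assumed step distances between every pair of species
idx <- seq_along(genus)
d <- outer(idx, idx, function(i, j) {
  ifelse(i == j, 0,
         ifelse(genus[i] == genus[j], 1,
                ifelse(family[i] == family[j], 2, 3)))
})

# AvTD: mean pairwise distance among the chosen species
avtd_from <- function(chosen) {
  sub <- d[chosen, chosen]
  mean(sub[upper.tri(sub)])
}

n_sim <- 99
sim <- do.call(rbind, lapply(3:9, function(s) {
  # Draw s species without replacement, n_sim times
  vals <- replicate(n_sim, avtd_from(sample(nrow(d), s)))
  data.frame(
    s = s,
    mean = mean(vals),
    lower = unname(quantile(vals, 0.025)),
    upper = unname(quantile(vals, 0.975))
  )
}))
sim
```

The interval is widest at small richness values because few pairs enter each average, which is the funnel shape that `plot_funnel()` displays.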
A data frame with class "td_simulation" containing
columns:
Species richness (number of species drawn)
Mean simulated AvTD (if index includes avtd)
Lower CI bound for AvTD
Upper CI bound for AvTD
Mean simulated VarTD (if index includes vartd)
Lower CI bound for VarTD
Upper CI bound for VarTD
Attributes: ci, index, n_sim, pool_size.
Clarke, K.R. & Warwick, R.M. (1998). A taxonomic distinctness index and its statistical properties. Journal of Applied Ecology, 35, 523-531.
Clarke, K.R. & Warwick, R.M. (2001). A further biodiversity index applicable to species lists: variation in taxonomic distinctness. Marine Ecology Progress Series, 216, 265-278.
plot_funnel() for visualisation, avtd() and vartd()
for the underlying calculations.
tax <- data.frame(
  Species = paste0("sp", 1:10),
  Genus = rep(c("G1", "G2", "G3", "G4", "G5"), each = 2),
  Family = rep(c("F1", "F1", "F2", "F2", "F3"), each = 2),
  Order = rep(c("O1", "O1", "O2", "O2", "O2"), each = 2),
  stringsAsFactors = FALSE
)
sim <- simulate_td(tax, n_sim = 99, seed = 42)
sim
Calculates pairwise taxonomic distances between species based on their positions in a taxonomic hierarchy. Distance is computed as the weighted proportion of taxonomic levels at which two species diverge.
tax_distance_matrix(tax_tree, species = NULL, weights = NULL)
tax_tree |
A data frame representing the taxonomic hierarchy. First column must be species names, subsequent columns are taxonomic ranks from lowest to highest. |
species |
Optional character vector of species names to include.
If NULL, all species in |
weights |
Optional numeric vector of weights for each taxonomic level. If NULL, equal weights are assigned. |
A symmetric matrix of taxonomic distances between species. With default equal step weights (1, 2, 3, ...), values range from 0 (same species) to the number of taxonomic levels (maximum distance when no common ancestor is found at any level).
tax <- data.frame(
  Species = c("Quercus_robur", "Pinus_nigra", "Fagus_orientalis"),
  Genus = c("Quercus", "Pinus", "Fagus"),
  Family = c("Fagaceae", "Pinaceae", "Fagaceae"),
  Order = c("Fagales", "Pinales", "Fagales"),
  stringsAsFactors = FALSE
)
tax_distance_matrix(tax)
Calculates the variation in taxonomic distinctness (VarTD, Lambda+) based on Clarke & Warwick (2001).
vartd(species, tax_tree, weights = NULL)
species |
Character vector of species names present in the community. |
tax_tree |
A data frame representing the taxonomic hierarchy. |
weights |
Optional numeric vector of weights for taxonomic levels. |
A numeric value representing the variation in taxonomic distinctness (Lambda+).
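For orientation, Lambda+ in Clarke & Warwick (2001) is the variance of the pairwise taxonomic distances around their mean (Delta+). The sketch below computes both from a hypothetical distance matrix; it does not call `vartd()`, whose weighting options are not reproduced here.

```r
# Hypothetical pairwise taxonomic distances for four species
sp <- paste0("sp", 1:4)
d <- matrix(c(0, 1, 2, 3,
              1, 0, 2, 3,
              2, 2, 0, 3,
              3, 3, 3, 0),
            nrow = 4, byrow = TRUE, dimnames = list(sp, sp))

# Each unordered pair counted once
w <- d[upper.tri(d)]

delta_plus  <- mean(w)                    # AvTD (Delta+)
lambda_plus <- mean((w - delta_plus)^2)   # VarTD (Lambda+)

c(AvTD = delta_plus, VarTD = lambda_plus)
```

A value of zero would mean every pair of species is equally distant, so larger values indicate an uneven spread of species across the taxonomic tree.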
Clarke, K.R. & Warwick, R.M. (2001). A further biodiversity index applicable to species lists: variation in taxonomic distinctness. Marine Ecology Progress Series, 216, 265-278.
tax <- data.frame(
  Species = c("Quercus_robur", "Pinus_nigra", "Fagus_orientalis",
              "Abies_nordmanniana"),
  Genus = c("Quercus", "Pinus", "Fagus", "Abies"),
  Family = c("Fagaceae", "Pinaceae", "Fagaceae", "Pinaceae"),
  Order = c("Fagales", "Pinales", "Fagales", "Pinales"),
  stringsAsFactors = FALSE
)
spp <- c("Quercus_robur", "Pinus_nigra", "Fagus_orientalis")
vartd(spp, tax)