| Title: | Offline Taxonomic Name Matching Against Local Darwin Core Snapshots |
|---|---|
| Description: | Match taxonomic names against locally stored Darwin Core backbone databases ('WFO', 'COL', 'GBIF', 'ITIS', 'NCBI Taxonomy', 'Open Tree of Life', 'WoRMS', 'Species Fungorum', 'AlgaeBase'). Provides offline fuzzy and exact matching with synonym resolution, hybrid name detection, and a unified output schema across all sources. All heavy computation runs in the 'vectra' C11 columnar engine. |
| Authors: | Gilles Colling [aut, cre, cph] (ORCID: <https://orcid.org/0000-0003-3070-6066>) |
| Maintainer: | Gilles Colling <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.2.12 |
| Built: | 2026-06-30 16:51:42 UTC |
| Source: | https://github.com/cran/taxify |
Joins AlgaeTraits (Vranken et al. 2023) macroalgal functional traits to a
taxify() result by looking up accepted_name. AlgaeTraits provides
morphological, ecological, and life-history traits for European seaweeds.
add_algae_traits(x, verbose = TRUE)
x |
A data.frame returned by |
verbose |
Logical. Default |
Source: AlgaeTraits (Vranken et al. 2023, VLIZ Marine Data Archive, CC BY 4.0). Coverage: ~1,745 European macroalgae species.
The same data.frame with additional columns:
Maximum body size in centimetres.
Growth form / body shape (e.g., filamentous, foliose, crustose).
Calcification type (e.g., uncalcified, articulated, encrusting).
Life span category (annual, perennial, etc.).
Tidal zonation (e.g., supralittoral, eulittoral, sublittoral).
Wave exposure tolerance (sheltered, moderately exposed, exposed).
Habitat environment (marine, brackish, freshwater).
Environmental position / substrate type.
Vranken S et al. (2023) AlgaeTraits: a trait database for (European) seaweeds. Earth System Science Data 15:2711-2754. doi:10.5194/essd-15-2711-2023
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Fucus vesiculosus", backend = "gbif") |> add_algae_traits()
options(old)
Joins alien species first record data to a taxify() result, filtered
by country. Data from the Global Alien Species First Record Database
(Seebens et al. 2017).
add_alien_first_records(x, country, verbose = TRUE)
x |
A data.frame returned by |
country |
Character. ISO 3166-1 alpha-2 country code(s), or
|
verbose |
Logical. Default |
Source: Global Alien Species First Record Database v3.1 (Seebens et al. 2017, Nature Communications 8, 14435). CC BY 4.0. Coverage: ~77k species x country combinations across all taxa.
The same data.frame with additional column(s):
Year of the first record (integer), or NA
if not recorded for that country.
Database that contributed the record
(e.g., "GAVIA", "CABI ISC").
Original citation or reference for the record.
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Robinia pseudoacacia") |> add_alien_first_records(country = "AT")
taxify(c("Robinia pseudoacacia", "Ailanthus altissima")) |>
  add_alien_first_records(country = c("AT", "DE"))
options(old)
Joins AmphiBIO amphibian life-history and ecological traits to a
taxify() result by looking up accepted_name.
add_amphibio(x, verbose = TRUE)
x |
A data.frame returned by |
verbose |
Logical. Default |
Source: AmphiBIO (Oliveira et al. 2017, CC BY 4.0). Coverage: ~6,800 amphibian species. Amphibians only.
The same data.frame with additional columns:
Maximum body size in mm (snout-vent length).
Age at maturity in days.
Maximum longevity in days.
Clutch/litter size.
Reproductive output per year.
Offspring size in mm.
Direct development (0/1).
Has larval stage (0/1).
Aquatic habitat (0/1).
Fossorial habitat (0/1).
Arboreal habitat (0/1).
Diurnal activity (0/1).
Nocturnal activity (0/1). Named
nocturnal_amphibio to avoid collision with EltonTraits'
nocturnal column.
Oliveira BF, Sao-Pedro VA, Santos-Barrera G, Penone C, Costa GC (2017) AmphiBIO, a global database for amphibian ecological traits. Scientific Data 4:170123.
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Bufo bufo", backend = "gbif") |> add_amphibio()
options(old)
Joins AnAge (Animal Ageing and Longevity Database) traits to a
taxify() result by looking up accepted_name.
add_anage(x, verbose = TRUE)
x |
A data.frame returned by |
verbose |
Logical. Default |
Source: AnAge (de Magalhaes & Costa 2009, CC BY). Coverage: ~4.7k vertebrate species (mammals, birds, reptiles, amphibians, fish).
The same data.frame with additional columns:
Maximum longevity in years.
Adult body mass in grams.
Basal metabolic rate in watts.
Female age at sexual maturity in days.
Male age at sexual maturity in days.
Gestation or incubation length in days.
Litter or clutch size.
Mass at birth in grams.
Growth rate (1/days).
Body temperature in Kelvin.
de Magalhaes JP, Costa J (2009) A database of vertebrate longevity records and their relation to other life-history traits. Journal of Evolutionary Biology 22:1770-1774.
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Vulpes vulpes", backend = "gbif") |> add_anage()
options(old)
Joins AnimalTraits body mass and metabolic rate data to a taxify()
result by looking up accepted_name.
add_animaltraits(x, verbose = TRUE)
x |
A data.frame returned by |
verbose |
Logical. Default |
Source: AnimalTraits (Hebert et al. 2022, CC0). Coverage: ~2k species across arthropods, vertebrates, molluscs, and annelids. Individual-level observations aggregated to species medians.
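The aggregation from individual observations to species medians can be illustrated in base R. This is only a sketch of the idea, not the package's build code; the species labels, the column name `body_mass_kg`, and the values are invented for illustration.

```r
# Invented individual-level observations (several rows per species).
obs <- data.frame(
  species      = c("sp_a", "sp_a", "sp_a", "sp_b", "sp_b"),
  body_mass_kg = c(1.0, 2.0, 4.0, 0.5, 1.5),
  stringsAsFactors = FALSE
)

# Collapse to one row per species, keeping the median of each trait.
medians <- aggregate(body_mass_kg ~ species, data = obs, FUN = median)
medians
```

Medians are less sensitive than means to a few very large or very small individuals, which is a common reason for choosing them in trait summaries.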
The same data.frame with additional columns:
Median body mass in kg.
Median metabolic rate in watts.
Hebert K et al. (2022) AnimalTraits – a curated animal trait database for body mass, metabolic rate and brain size. Scientific Data 9:265.
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Drosophila melanogaster", backend = "gbif") |> add_animaltraits()
options(old)
Joins the Northwestern European Arthropod Life Histories dataset to a
taxify() result by looking up accepted_name.
add_arthropod_traits(x, verbose = TRUE)
x |
A data.frame returned by |
verbose |
Logical. Default |
Source: Logghe et al. (2025, CC BY-NC). Coverage: ~4.9k arthropod species from NW Europe across 10 orders (Coleoptera, Hemiptera, Orthoptera, Araneae, Diptera, Hymenoptera, Lepidoptera, etc.).
The same data.frame with additional columns:
Body size in mm.
Dispersal ability (0–1 ratio within order).
Mean number of generations per year.
Fecundity (number of eggs/offspring).
Development time in days.
Adult lifespan in days.
Mean thermal niche (degrees C).
Activity period (diurnal/nocturnal/both).
Feeding guild of adult.
Trophic range of adult (specialist/generalist).
Logghe A et al. (2025) An in-depth dataset of northwestern European arthropod life histories and ecological traits. Biodiversity Data Journal 13:e146785.
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Abax parallelepipedus", backend = "gbif") |> add_arthropod_traits()
options(old)
Joins AVONET species-level averages for bird morphology, ecology,
and migration to a taxify() result by looking up accepted_name.
add_avonet(x, verbose = TRUE)
x |
A data.frame returned by |
verbose |
Logical. Default |
Source: AVONET (Tobias et al. 2022, Figshare, CC BY 4.0). Coverage: ~11k bird species. Birds only.
The same data.frame with additional columns:
Beak length in mm (culmen, species mean).
Beak depth in mm (species mean).
Wing length in mm (species mean).
Tail length in mm (species mean).
Tarsus length in mm (species mean).
Body mass in grams (species mean).
Hand-wing index (pointedness, species mean).
Primary habitat classification.
Trophic level classification.
Trophic niche classification.
Migration strategy: "sedentary", "partial",
or "full".
Tobias JA et al. (2022) AVONET: morphological, ecological and geographical data for all birds. Ecology Letters 25:581-597.
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Parus major", backend = "gbif") |> add_avonet()
options(old)
Joins Baseflor (Julve, Programme Catminat) plant traits to a taxify()
result by looking up accepted_name. Baseflor covers the vascular flora of
France and neighbouring regions, providing flowering phenology, pollination
and breeding biology, dispersal mode, and floral and fruit morphology.
add_baseflor(x, verbose = TRUE)
x |
A data.frame returned by |
verbose |
Logical. Default |
Source: Baseflor, Programme Catminat (Julve 1998 ff.). Coverage: ~7,000 vascular plant taxa of France and neighbouring regions. Data are released under ODbL 1.0 / CC BY-SA 2.0.
For ecological indicator values on the light, temperature, moisture,
reaction, and nutrient axes, see add_eive() (European calibration). For
Raunkiaer life form and seed, leaf, and clonality traits of the Northwest
European flora, see add_leda().
The same data.frame with additional columns:
First month of flowering (1-12).
Last month of flowering (1-12). A value smaller
than flower_begin_month denotes a flowering period that wraps
across the new year (e.g. begin 10, end 6).
Pollination vector(s): insect, wind, water, self, apogamy. Comma-separated when more than one applies.
Diaspore dispersal mode(s): anemochory, barochory, epizoochory, endozoochory, myrmecochory, hydrochory, autochory, dyszoochory. Comma-separated when more than one applies.
Sexual system: hermaphroditic, monoecious, dioecious, gynodioecious, androdioecious, gynomonoecious, polygamous.
Flower colour(s): white, yellow, pink, green, blue, brown, black. Comma-separated when more than one applies.
Fruit type: achene, capsule, caryopsis, drupe, legume, silique, berry, follicle, cone, samara, pyxid.
Woody growth form for woody taxa: tree, small tree, large tree, shrub, bush, subshrub, liana, parasite. NA for non-woody (herbaceous) taxa.
Ellenberg-style continentality indicator value (1-9), the axis absent from EIVE.
Ellenberg-style salinity indicator value (0-9), the axis absent from EIVE.
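The wrap-around convention for the flowering months can be unpacked with a small helper. The sketch below uses base R only; `flowering_months()` is a hypothetical helper, not part of taxify, and it assumes both arguments are single whole months in 1-12.

```r
# Hypothetical helper (not part of taxify): expand a begin/end month pair
# into the full sequence of flowering months, handling year wrap-around.
flowering_months <- function(begin, end) {
  stopifnot(begin %in% 1:12, end %in% 1:12)
  if (end >= begin) {
    seq(begin, end)                  # period within one calendar year
  } else {
    c(seq(begin, 12), seq(1, end))   # period wraps across the new year
  }
}

flowering_months(4, 9)    # April through September
flowering_months(10, 6)   # October through June of the following year
```

Expanding the pair this way makes it straightforward to test whether a given month falls inside the flowering period with `%in%`.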
Julve, Ph. (1998 ff.) baseflor. Index botanique, ecologique et chorologique de la Flore de France. Programme Catminat.
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Bellis perennis") |> add_baseflor()
options(old)
Joins extra Catalogue of Life columns to a taxify() result by
looking up taxon_id in the COL backbone. Only enriches rows where
backend == "col".
add_col_info(x)
x |
A data.frame returned by |
The same data.frame with additional columns:
Hybrid type from COL: "generic", "specific",
"infrageneric", or "infraspecific".
Nomenclatural code ("ICN", "ICZN", etc.).
Nomenclatural status.
Original publication reference.
Kingdom classification.
Phylum classification.
Class classification (renamed to avoid conflict with
R's class function).
Order classification.
Infraspecific epithet.
Logical. Whether the species is extinct (from SpeciesProfile, if available).
Logical. Whether the species is marine.
Logical. Whether the species is freshwater.
Logical. Whether the species is terrestrial.
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Quercus robur", backend = "col") |> add_col_info()
options(old)
Joins vernacular names to a taxify() result by looking up
accepted_name, filtered by language.
add_common_names(x, lang = "en", verbose = TRUE)
x |
A data.frame returned by |
lang |
Character. ISO 639-1 language code (e.g., |
verbose |
Logical. Default |
Common names are merged from three sources:
- GBIF backbone vernacular names (CC0): multi-language via ISO 639-1 codes.
- NCBI Taxonomy common names (public domain): no language tag (lang = NA).
- Open Tree of Life common names (CC0): no language tag (lang = NA).
When multiple common names exist for a species in the requested language, the first (most commonly used) is returned.
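The "first name in the requested language" rule can be sketched in base R. The lookup table and the helper `first_common_name()` are made up for illustration, and dropping entries without a language tag is an assumption of this sketch; how taxify treats `lang = NA` rows is not stated above.

```r
# Toy vernacular table; row order stands in for "most commonly used first".
vern <- data.frame(
  accepted_name = c("Quercus robur", "Quercus robur", "Quercus robur"),
  lang          = c("en", "en", "de"),
  name          = c("pedunculate oak", "English oak", "Stieleiche"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first vernacular name per species in one language.
first_common_name <- function(accepted_name, lang, tbl = vern) {
  # Assumption: rows without a language tag are ignored here.
  tbl <- tbl[!is.na(tbl$lang) & tbl$lang == lang, , drop = FALSE]
  # match() returns the position of the first hit, or NA if there is none.
  tbl$name[match(accepted_name, tbl$accepted_name)]
}

first_common_name("Quercus robur", "en")
first_common_name("Quercus robur", "de")
first_common_name("Pinus sylvestris", "en")
```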
The same data.frame with an additional column:
The vernacular name in the requested language,
or NA if none is available.
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Quercus robur") |> add_common_names()
taxify("Quercus robur") |> add_common_names(lang = "de")
options(old)
Joins IUCN Red List conservation status to a taxify() result by
looking up accepted_name in the conservation status enrichment.
add_conservation_status(x, verbose = TRUE)
x |
A data.frame returned by |
verbose |
Logical. Show download progress if enrichment data needs
to be fetched. Default |
Conservation status values are compiled from publicly available sources including GBIF and the IUCN Red List API. Coverage is global across all taxonomic groups (~166k species).
The same data.frame with an additional column:
IUCN category: "LC" (Least Concern),
"NT" (Near Threatened), "VU" (Vulnerable), "EN" (Endangered),
"CR" (Critically Endangered), "EW" (Extinct in the Wild),
"EX" (Extinct), or NA if not assessed.
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Panthera tigris", backend = "gbif") |> add_conservation_status()
options(old)
Joins an external data source (CSV file or data.frame) to a taxify()
result. Species names in the external data are matched through the same
backbone(s) used in the original taxify() call, and the join is performed
on accepted_id — so synonyms in either dataset resolve to the same key.
add_data(
  x,
  data,
  species_col = NULL,
  table = NULL,
  sheet = NULL,
  start_row = NULL,
  cols = NULL,
  group_col = NULL,
  groups = "all",
  fuzzy = TRUE,
  fuzzy_threshold = 0.2,
  verbose = TRUE
)
x |
A data.frame returned by |
data |
One of:
|
species_col |
Character. Name of the column in |
table |
Character. Required when |
sheet |
Integer or character. Sheet to read when |
start_row |
Integer. Row where column headers begin in an |
cols |
Character vector of column names from |
group_col |
Character or |
groups |
Character vector or |
fuzzy |
Logical. Enable fuzzy matching for names in |
fuzzy_threshold |
Numeric. Maximum allowed distance for fuzzy matches.
Default |
verbose |
Logical. Default |
The workflow:
1. Read data (CSV or data.frame).
2. Identify the species column (explicit or auto-detected).
3. Match species names through the same backbone(s) as the original
   taxify() call, obtaining accepted_id for each row.
4. Check for conflicting duplicates: if multiple rows in data resolve
   to the same accepted_id with different values, an error is raised
   (unless group_col is set).
5. Exact duplicates produce a warning and are deduplicated.
6. Left-join on accepted_id.
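The duplicate handling and the final join in the workflow above can be sketched in base R. This is an illustration of the logic under stated assumptions, not the package's implementation: `join_on_accepted_id()` is a hypothetical helper, both tables are assumed to carry a resolved `accepted_id` already, and the ids and values are invented.

```r
# Toy taxify-like result and external data, both already resolved to accepted_id.
x <- data.frame(input = c("Quercus robur", "Pinus sylvestris"),
                accepted_id = c("id1", "id2"), stringsAsFactors = FALSE)
ext <- data.frame(accepted_id = c("id1", "id1", "id2"),
                  height = c(30, 30, 25), stringsAsFactors = FALSE)

join_on_accepted_id <- function(x, ext) {
  # Exact duplicates: warn and drop.
  dup <- duplicated(ext)
  if (any(dup)) {
    warning(sum(dup), " exact duplicate row(s) dropped.")
    ext <- ext[!dup, , drop = FALSE]
  }
  # Conflicting duplicates: same key, different values, so stop.
  if (anyDuplicated(ext$accepted_id) > 0) {
    stop("Conflicting rows share an accepted_id; consider group_col.")
  }
  # Left join that keeps every row of x in its original order.
  idx <- match(x$accepted_id, ext$accepted_id)
  extra <- ext[idx, setdiff(names(ext), "accepted_id"), drop = FALSE]
  rownames(extra) <- NULL
  cbind(x, extra)
}

res <- suppressWarnings(join_on_accepted_id(x, ext))
res
```

Checking for conflicts after removing exact duplicates means that harmless repeats only trigger a warning, while genuinely ambiguous keys stop the join.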
When your data has multiple rows per species (e.g., one row per species
per country), set group_col to produce wide output with suffixed
columns. This is the same format as the built-in grouped enrichments.
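A long-to-wide reshape of this kind can be sketched with base R's `reshape()`. The exact suffix format taxify produces is not stated here, so the `_AT` / `_DE` naming below is an assumption, and the ids and years are invented.

```r
# Invented long-format data: one row per species per country.
long <- data.frame(
  accepted_id = c("id1", "id1", "id2"),
  country     = c("AT", "DE", "AT"),
  first_year  = c(1850L, 1824L, 1900L),
  stringsAsFactors = FALSE
)

# One row per accepted_id, one value column per group (suffix = group label).
wide <- reshape(long, idvar = "accepted_id", timevar = "country",
                direction = "wide", sep = "_")
names(wide)
```

Species that lack a row for a given group receive `NA` in that group's column.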
When species_col is not specified, add_data() takes the first 10 rows
of each character column and runs them through taxify(). The column with
the highest match rate is selected. If no column achieves at least 50%
matches, an error is raised asking the user to specify species_col
explicitly.
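The auto-detection heuristic can be sketched as follows. This is a simplified stand-in, not the package's code: `detect_species_col()` is a hypothetical helper, and `match_fun` replaces the real backbone lookup with any function that returns `TRUE` for each matched name.

```r
# Sketch of the heuristic: test the first n values of every character column
# and pick the column with the highest match rate, subject to a minimum rate.
detect_species_col <- function(data, match_fun, n = 10, min_rate = 0.5) {
  is_chr <- vapply(data, is.character, logical(1))
  if (!any(is_chr)) stop("No character columns to test.")
  rates <- vapply(data[is_chr], function(col) {
    mean(match_fun(head(col, n)))   # share of the first n values that match
  }, numeric(1))
  if (max(rates) < min_rate) {
    stop("No column reached the match threshold; specify species_col.")
  }
  names(rates)[which.max(rates)]
}

# Toy stand-in for a backbone: a fixed set of known names.
known <- c("Quercus robur", "Pinus sylvestris", "Fagus sylvatica")
toy_match <- function(x) x %in% known

traits <- data.frame(
  site    = c("A", "B", "C"),
  species = c("Quercus robur", "Pinus sylvestris", "Abies alba"),
  height  = c(30, 25, 40),
  stringsAsFactors = FALSE
)
detect_species_col(traits, toy_match)
```

Sampling only the first rows keeps detection cheap on large tables, at the cost of misjudging columns whose leading rows are unrepresentative; passing `species_col` explicitly avoids that risk.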
The input data.frame with additional columns from data, joined
via backbone-resolved accepted_id. Columns from data that collide
with existing columns in x are prefixed with "data_".
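The collision rule can be illustrated with a few lines of base R. `prefix_collisions()` is a hypothetical helper written for this sketch, and treating `accepted_id` as the join key that is exempt from renaming is an assumption.

```r
# Hypothetical helper: prefix incoming column names that already exist in x.
prefix_collisions <- function(x_names, data_names, key = "accepted_id") {
  clash <- data_names %in% setdiff(x_names, key)   # the join key is not renamed
  data_names[clash] <- paste0("data_", data_names[clash])
  data_names
}

prefix_collisions(
  x_names    = c("input", "accepted_id", "family"),
  data_names = c("accepted_id", "family", "height")
)
```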
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
result <- taxify(c("Quercus robur", "Pinus sylvestris"))
traits <- data.frame(species = c("Quercus robur", "Pinus sylvestris"), height = c(30, 25))
result |> add_data(traits, species_col = "species")
options(old)
Joins species-level mean seed mass and plant height from Diaz et al.
(2022) to a taxify() result by looking up accepted_name.
add_diaz_traits(x, verbose = TRUE)
x |
A data.frame returned by |
verbose |
Logical. Default |
Source: Diaz et al. 2022, TRY File Archive (CC BY 3.0). Coverage: ~46k plant species. Plants only.
The same data.frame with additional columns:
Seed mass in milligrams (species-level mean).
Plant height in metres (species-level mean).
Diaz S et al. (2022) The global spectrum of plant form and function: enhanced species-level trait data. TRY File Archive.
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Quercus robur") |> add_diaz_traits()
options(old)
Joins traits from the Ecological Flora of the British Isles (Fitter & Peat
1994) to a taxify() result by looking up accepted_name. Ecoflora covers
the vascular flora of the British Isles, providing canopy height, leaf
traits, life form, flowering phenology, pollination and reproduction, seed
weight, and British-calibrated Ellenberg indicator values. Every column
carries a _uk suffix to mark the British-flora calibration and to avoid
collisions when chained with other plant-trait enrichments
(e.g. add_baseflor() for France, add_floraweb() for Germany).
add_ecoflora(x, verbose = TRUE)
x |
A data.frame returned by |
verbose |
Logical. Default |
Source: Ecoflora (Ecological Flora of the British Isles). Ecoflora has no
bulk download or API; the bundled dataset was collected one species at a
time and is redistributed under the source licence (CC BY-NC-SA 4.0). The
.vtr is downloaded from the taxify release on first use and cached.
For French-flora traits see add_baseflor(); for German-flora traits see
add_floraweb(); for European-calibration indicator values see add_eive().
The same data.frame with additional _uk columns:
Canopy height range (mm).
Leaf area class.
Leaf longevity (e.g. evergreen, deciduous).
Root system type.
Photosynthetic pathway (C3/C4/CAM).
Raunkiaer life form.
Reproduction method.
Flowering months (1-12).
Pollen vector(s).
Seed weight (mg).
Propagule / dispersule type.
Ellenberg indicator values calibrated for the British flora (light, moisture, reaction, nitrogen, salt).
Fitter AH, Peat HJ (1994) The Ecological Flora Database. Journal of Ecology 82:415-425.
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Bellis perennis") |> add_ecoflora()
options(old)
Joins EIVE 1.0 (Dengler et al. 2023) ecological indicator values to a
taxify() result by looking up accepted_name. EIVE provides
continuous indicator values for European vascular plants, superseding
the original ordinal Ellenberg values.
add_eive(x, verbose = TRUE)
x |
A data.frame returned by |
verbose |
Logical. Default |
Source: EIVE 1.0 (Dengler et al. 2023, Zenodo, CC BY 4.0). Coverage: ~14.5k European vascular plant species.
The same data.frame with additional columns:
Light indicator value (continuous).
Temperature indicator value (continuous).
Moisture indicator value (continuous).
Soil reaction (pH) indicator value (continuous).
Nutrient indicator value (continuous).
Dengler J et al. (2023) EIVE 1.0 – a standardized set of Ecological Indicator Values for Europe. Vegetation Classification and Survey 4:7-29. doi:10.3897/VCS.98324
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Arrhenatherum elatius") |> add_eive()
options(old)
Joins EltonTraits 1.0 diet composition, foraging strata, body mass,
and activity data to a taxify() result by looking up accepted_name.
add_elton_traits(x, verbose = TRUE)
x |
A data.frame returned by |
verbose |
Logical. Default |
Source: EltonTraits 1.0 (Wilman et al. 2014, Figshare, CC0). Coverage: ~15.4k species. Birds and mammals only.
The same data.frame with additional columns:
Percentage of diet: invertebrates.
Percentage of diet: endothermic vertebrates.
Percentage of diet: ectothermic vertebrates.
Percentage of diet: fish.
Percentage of diet: unknown vertebrates.
Percentage of diet: scavenging.
Percentage of diet: fruit.
Percentage of diet: nectar.
Percentage of diet: seeds and nuts.
Percentage of diet: other plant material.
Percentage of foraging: below water surface.
Percentage of foraging: on ground.
Percentage of foraging: in understory.
Percentage of foraging: in mid to high strata.
Percentage of foraging: in canopy.
Percentage of foraging: aerial.
Body mass in grams.
Nocturnal activity (0 = diurnal, 1 = nocturnal).
Wilman H et al. (2014) EltonTraits 1.0: Species-level foraging attributes of the world's birds and mammals. Ecology 95:2027.
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Parus major", backend = "gbif") |> add_elton_traits()
options(old)
Joins FISHMORPH morphological trait data to a taxify() result by
looking up accepted_name.
add_fish_traits(x, verbose = TRUE)
x |
A data.frame returned by |
verbose |
Logical. Default |
Source: FISHMORPH (Brosse et al. 2021, Figshare, CC BY 4.0). Coverage: ~8.3k freshwater fish species.
The same data.frame with additional columns:
Maximum body length (cm).
Body elongation (body length / body depth).
Vertical eye position (eye position / head depth).
Relative eye size (eye diameter / head length).
Oral gape position (mouth position: 0 = inferior, 0.5 = terminal, 1 = superior).
Relative maxillary length (maxillary length / head length).
Body lateral shape (body depth / caudal peduncle depth).
Pectoral fin vertical position (fin insertion depth / body depth).
Pectoral fin size (fin length / body length).
Caudal peduncle throttling (caudal peduncle depth / caudal fin depth).
Brosse S, Charpin N, Su G, Toussaint A, Herrera-R GA, Tedesco PA, Villéger S (2021) FISHMORPH: A global database on morphological traits of freshwater fishes. Global Ecology and Biogeography 30:2330-2336. doi:10.1111/geb.13395
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Salmo trutta", backend = "gbif") |> add_fish_traits()
options(old)
Joins FishBase morphological and ecological traits to a taxify() result
by looking up accepted_name.
add_fishbase(x, verbose = TRUE)
x |
A data.frame returned by |
verbose |
Logical. Default |
Source: FishBase via rfishbase (Froese & Pauly, CC BY-NC 3.0). Coverage: ~35k fish species. Fishes only.
The build-from-source fallback requires the rfishbase package
(available on CRAN). Pre-built .vtr files do not require rfishbase.
The same data.frame with additional columns:
Maximum body length in centimetres.
Body mass in grams (estimated from length-weight relationships where available).
Trophic level.
Minimum depth in metres.
Maximum depth in metres.
Vulnerability index (0–100).
Habitat type (e.g. demersal, pelagic).
Commercial importance category.
Froese R, Pauly D (eds.) (2024) FishBase. World Wide Web electronic publication, https://www.fishbase.org.
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Gadus morhua", backend = "gbif") |> add_fishbase()
options(old)
Joins traits from FloraWeb (Bundesamt fuer Naturschutz) to a taxify()
result by looking up accepted_name. FloraWeb is the live national portal
carrying the BiolFlor trait data (Klotz, Kuehn & Durka 2002) together with
Rothmaler morphology and Ellenberg indicator values. This enrichment covers
the full per-species trait profile scraped from the four FloraWeb trait
pages: morphology, reproductive biology, the nine Ellenberg indicator
values, ploidy and chromosome number, and chorological distribution. Every
column carries a _de suffix to mark the German-flora calibration and to
avoid collisions when chained with other plant-trait enrichments
(e.g. add_ecoflora() for Britain, add_baseflor() for France).
add_floraweb(x, verbose = TRUE)
x |
A data.frame returned by |
verbose |
Logical. Default |
Source: FloraWeb (https://www.floraweb.de/), Bundesamt fuer Naturschutz,
Bonn. FloraWeb has no bulk export or API; the bundled dataset was scraped
per species (accessed 2026-06-24) and that access date is the dataset
version. The trait data largely derive from BiolFlor, which per the BioFresh
metadata statement is publicly available and may be used without
restrictions provided it is acknowledged and cited correctly. The .vtr is
downloaded from the taxify release on first use and cached.
For British-flora traits see add_ecoflora(); for French-flora traits see
add_baseflor(); for European-calibration indicator values see add_eive().
The same data.frame with German trait columns (all suffixed _de),
grouped as:
height_de, life_form_de, leaf_shape_de,
leaf_anatomy_de, leaf_persistence_de, storage_organs_de,
flowering_months_de, flowering_months_biolflor_de,
flowering_phase_de, phenological_season_de, description_de.
pollination_vector_de, pollinator_de,
pollinator_reward_de, flower_type_de, flower_class_de,
dispersal_type_de, diaspore_type_de, germinule_type_de,
reproduction_type_de, vegetative_spread_de, fertilization_type_de,
apomixis_de, dicliny_de, dichogamy_de, self_incompatibility_de,
si_mechanism_de, ploidy_de, chromosome_number_de,
chromosome_freq_de, chromosomes_de.
the nine Ellenberg indicator values ell_light_de,
ell_temperature_de, ell_continentality_de, ell_moisture_de,
ell_moisture_variability_de, ell_reaction_de, ell_nitrogen_de,
ell_salt_de, heavy_metal_resistance_de, plus strategy_type_de
(Grime CSR), habitat_site_de, formation_de, plant_community_de,
biotope_type_de, forest_binding_de, hemeroby_de, urbanity_de.
floristic_zones_de, areal_formula_de,
areal_type_de, oceanity_de, range_centre_de, world_range_size_de,
world_range_frequency_de, world_range_position_de,
world_range_hazard_de, germany_range_share_de,
germany_responsibility_de.
Categorical traits with several applicable values are joined with "; ". Trait values are German (as published by FloraWeb / BiolFlor).
Klotz S, Kuehn I, Durka W (2002) BIOLFLOR - Eine Datenbank zu biologisch-oekologischen Merkmalen der Gefaesspflanzen in Deutschland. Schriftenreihe fuer Vegetationskunde 38. Bundesamt fuer Naturschutz, Bonn.
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Bellis perennis") |> add_floraweb()
options(old)
Joins FungalTraits (Polme et al. 2020) genus-level trait data to a
taxify() result by looking up genus. Unlike other enrichments that
join on species-level accepted_name, FungalTraits is a genus-level
database and joins on the genus column already present in taxify output.
add_fungal_traits(x, verbose = TRUE)
x |
A data.frame returned by |
verbose |
Logical. Default |
Source: FungalTraits (Polme et al. 2020, Fungal Diversity, CC BY 4.0). Coverage: ~10k fungal genera. Genus-level only (not species-level).
The same data.frame with additional columns:
Primary ecological role (e.g., saprotroph, mycorrhizal, pathogen, endophyte, lichenized, parasite).
Secondary ecological role, if any.
Morphological growth form (e.g., agaricoid, corticioid, polyporoid, yeast).
Fruiting body morphology (e.g., gasteroid, pileate, resupinate).
Substrate type for saprotrophic genera (e.g., wood, litter, dung, soil).
Capacity to cause plant disease (e.g., high, medium, low, none).
Capacity for animal biotrophy.
Capacity for endophytic interactions with plants.
Exploration type for ectomycorrhizal genera (e.g., contact, short, medium, long).
Polme S et al. (2020) FungalTraits: a user-friendly traits database of fungi and fungus-like stramenopiles. Fungal Diversity 105:1-16. doi:10.1007/s13225-020-00466-2
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Amanita muscaria", backend = "gbif") |> add_fungal_traits()
options(old)
Joins FUNGuild trophic mode, guild, growth morphology, and confidence
data to a taxify() result by looking up accepted_name. Species-level
matches take priority; genus-level guild assignments are used as fallback
for unmatched species.
add_funguild(x, verbose = TRUE)
x |
A data.frame returned by |
verbose |
Logical. Default |
Source: FUNGuild (Nguyen et al. 2016, CC BY 4.0). Coverage: ~13k taxa. Fungi only.
The enrichment first attempts species-level matching. For species without a direct match, it falls back to genus-level guild assignments from FUNGuild's genus-rank entries.
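The two-stage lookup described above can be sketched in base R. This is only an illustration under stated assumptions: the table `funguild`, its column names, and the helper `lookup_guild()` are hypothetical stand-ins, not the package's internal code or the real FUNGuild schema.

```r
# Hypothetical stand-in for a FUNGuild-like table; values are illustrative only.
funguild <- data.frame(
  taxon = c("Amanita muscaria", "Amanita", "Fusarium"),
  rank  = c("species", "genus", "genus"),
  guild = c("Ectomycorrhizal", "Ectomycorrhizal", "Plant Pathogen"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: species-level match first, genus-level fallback second.
lookup_guild <- function(accepted_name, tbl) {
  sp <- tbl[tbl$rank == "species", ]
  gn <- tbl[tbl$rank == "genus", ]
  # 1. Species-level match on the full accepted name.
  out <- sp$guild[match(accepted_name, sp$taxon)]
  # 2. Genus-level fallback, applied only to names still unmatched.
  genus <- sub(" .*$", "", accepted_name)
  miss <- is.na(out)
  out[miss] <- gn$guild[match(genus[miss], gn$taxon)]
  out
}

lookup_guild(
  c("Amanita muscaria", "Fusarium oxysporum", "Quercus robur"),
  funguild
)
```

In this sketch the first name is resolved at species level, the second only through its genus, and the third stays NA because neither level has an entry.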
The same data.frame with additional columns:
Trophic mode (e.g., Pathotroph, Saprotroph, Symbiotroph, or hyphenated combinations).
Functional guild (e.g., "Ectomycorrhizal", "Plant Pathogen", "Wood Saprotroph").
Growth morphology (e.g., "Agaricoid", "Microfungus"). Prefixed to avoid collision with FungalTraits.
Confidence of the guild assignment (Possible, Probable, Highly Probable).
Nguyen NH et al. (2016) FUNGuild: An open annotation tool for parsing fungal community datasets by ecological guild. Fungal Ecology 20:241-248.
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Amanita muscaria", backend = "gbif") |> add_funguild()
options(old)
Joins extra GBIF backbone columns to a taxify() result by
looking up taxon_id in the GBIF backbone. Only enriches rows where
backend == "gbif".
add_gbif_info(x)
x |
A data.frame returned by |
The same data.frame with additional columns:
Hybrid type: "GENERIC", "SPECIFIC", or
"INFRASPECIFIC".
Nomenclatural status (may contain multiple values).
Basionym author in parentheses.
Basionym author year.
Combining author year.
Publication citation.
How the name entered the backbone.
Infraspecific epithet.
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Quercus robur", backend = "gbif") |> add_gbif_info()
options(old)
Joins GloNAF (Global Naturalized Alien Flora) data to a taxify()
result, filtered by region.
add_glonaf(x, region, verbose = TRUE)
x |
A data.frame returned by |
region |
Character. GloNAF region identifier(s), or
|
verbose |
Logical. Default |
Source: GloNAF v2.0 (van Kleunen et al. 2019, Davis et al. 2025, CC BY 4.0). Coverage: ~16k alien plant taxa across ~1,300 regions. Plants only.
The same data.frame with additional column(s):
Integer 1 if the species is recorded as
naturalized in that region, NA otherwise.
van Kleunen M et al. (2019) The Global Naturalized Alien Flora (GloNAF) database. Ecology 100:e02542.
Davis K et al. (2025) The updated Global Naturalized Alien Flora (GloNAF 2.0) database. Ecology, e70245.
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Robinia pseudoacacia") |> add_glonaf(region = "EUR")
taxify("Robinia pseudoacacia") |> add_glonaf(region = c("EUR", "NAM"))
options(old)
Parses the input_name column from a taxify() result to extract
hybrid parent names and classify the hybrid type.
add_hybrid_info(x)
x |
A data.frame returned by |
The same data.frame with additional columns:
First parent (full binomial), NA if not a
hybrid formula.
Second parent (full binomial, abbreviated
genus expanded), NA if not a hybrid formula.
One of "nothogenus", "nothospecies",
"formula", or NA if not a hybrid.
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Quercus pyrenaica x Q. petraea") |> add_hybrid_info()
options(old)
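To make the parent extraction concrete, here is a minimal base R sketch of how a hybrid formula could be split and an abbreviated genus expanded. The helper `parse_hybrid_formula()` and the element names `parent_1` and `parent_2` are hypothetical; the sketch only recognizes a standalone lowercase "x" as the hybrid marker and does not reproduce the package's actual parser.

```r
# Hypothetical helper: split "A b x A. c" into two full binomials.
parse_hybrid_formula <- function(name) {
  # Split on a standalone "x" surrounded by whitespace.
  parts <- strsplit(name, "\\s+x\\s+")[[1]]
  if (length(parts) != 2) {
    # Not a two-parent formula: report both parents as missing.
    return(c(parent_1 = NA_character_, parent_2 = NA_character_))
  }
  # Genus of the first parent is the first whitespace-delimited token.
  genus_1 <- sub("\\s.*$", "", parts[1])
  # Expand an abbreviated genus such as "Q." using the first parent's genus;
  # a second parent that is already a full binomial is left unchanged.
  parent_2 <- sub("^[A-Z]\\.\\s*", paste0(genus_1, " "), parts[2])
  c(parent_1 = parts[1], parent_2 = parent_2)
}

parse_hybrid_formula("Quercus pyrenaica x Q. petraea")
parse_hybrid_formula("Quercus robur")
```

The first call yields "Quercus pyrenaica" and "Quercus petraea"; the second returns NA for both parents because the name is not a formula.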
Joins GRIIS (Global Register of Introduced and Invasive Species) data
to a taxify() result, filtered by country.
add_invasive_status(x, country, verbose = TRUE)
x |
A data.frame returned by |
country |
Character. ISO 3166-1 alpha-2 country code(s), or
|
verbose |
Logical. Default |
Source: GRIIS (Zenodo combined CSV, CC BY 4.0, 196 countries). Coverage: ~23k name x country combinations.
The same data.frame with additional column(s):
One of "native", "introduced", "invasive",
or NA if not recorded for that country.
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Robinia pseudoacacia") |> add_invasive_status(country = "AT")
taxify("Robinia pseudoacacia") |> add_invasive_status(country = c("AT", "DE"))
options(old)
Joins LEDA Traitbase (Kleyer et al. 2008) plant functional traits to a
taxify() result by looking up accepted_name. LEDA provides species-level
trait data for NW European plant species, covering life form, dispersal,
seed, leaf, and clonality traits.
add_leda(x, verbose = TRUE)
x |
A data.frame returned by |
verbose |
Logical. Default |
Source: LEDA Traitbase (Kleyer et al. 2008). Coverage: ~8,000 NW European plant species.
The Raunkiaer life form is a bud-position classification system: phanerophyte = buds >25 cm above soil, chamaephyte = buds near soil surface, hemicryptophyte = buds at soil surface, geophyte (cryptophyte) = buds below soil, therophyte = annual that survives as seed.
The same data.frame with additional columns:
Primary Raunkiaer life form classification (phanerophyte, chamaephyte, hemicryptophyte, geophyte, therophyte, helophyte, hydrophyte).
1 if species assigned to multiple Raunkiaer forms, 0 otherwise.
Primary dispersal type (anemochory, zoochory, hydrochory, autochory, barochory, dysochory).
Seed terminal velocity in m/s (species median).
Seed mass in mg (species median). Prefixed with
leda_ in the .vtr to avoid collision with Diaz traits.
Canopy height in meters (species median).
Leaf dry mass in mg (species median).
Specific leaf area in mm²/mg (species median).
Capable of clonal growth (1 = yes, 0 = no).
Seed buoyancy classification.
Kleyer M et al. (2008) The LEDA Traitbase: a database of life-history traits of the Northwest European flora. Journal of Ecology 96:1266-1274.
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Arrhenatherum elatius") |> add_leda()
options(old)
Joins LepTraits 1.0 butterfly life-history and ecological traits to a
taxify() result by looking up accepted_name.
add_leptraits(x, verbose = TRUE)
x |
A data.frame returned by |
verbose |
Logical. Default |
Source: LepTraits 1.0 (Shirey et al. 2022, CC0). Coverage: ~12.4k butterfly species globally (Papilionoidea).
The same data.frame with additional columns:
Wingspan in mm (midpoint of lower and upper bounds).
Number of generations per year.
Overwintering/diapause life stage.
Canopy association category.
Edge/gap affinity category.
Moisture affinity category.
Disturbance affinity category.
Number of host plant families used.
Number of months with adult flight activity.
Shirey V et al. (2022) LepTraits 1.0: A globally comprehensive dataset of butterfly traits. Scientific Data 9:398.
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Vanessa cardui", backend = "gbif") |> add_leptraits()
options(old)
Joins lizard trait data from Meiri (2018) to a taxify() result by
looking up accepted_name.
add_lizard_traits(x, verbose = TRUE)
x |
A data.frame returned by |
verbose |
Logical. Default |
Source: Meiri (2018, Global Ecology and Biogeography, CC BY 4.0). Coverage: ~6,600 lizard species. Lizards only.
The same data.frame with additional columns:
Body mass in grams.
Snout-vent length in mm.
Tail length in mm.
Clutch size.
Clutches per year.
Maximum longevity in years.
Diet category.
Habitat type.
Activity time (diurnal/nocturnal/crepuscular).
Foraging mode (sit-and-wait/active).
Meiri S (2018) Traits of lizards of the world: Variation around a successful evolutionary design. Global Ecology and Biogeography 27:1168-1172.
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Pogona vitticeps", backend = "gbif") |> add_lizard_traits()
options(old)
Joins PanTHERIA mammal life-history and ecological traits to a
taxify() result by looking up accepted_name.
add_pantheria(x, verbose = TRUE)
x |
A data.frame returned by |
verbose |
Logical. Default |
Source: PanTHERIA (Jones et al. 2009, Ecological Archives, CC0). Coverage: ~5.4k mammal species. Mammals only.
The same data.frame with additional columns:
Adult body mass in grams.
Maximum longevity in months.
Litter size (mean).
Gestation length in days.
Weaning age in days.
Home range size in km².
Diet breadth (number of diet categories).
Habitat breadth (number of habitat types).
Jones KE et al. (2009) PanTHERIA: a species-level database of life history, ecology, and geography of extant and recently extinct mammals. Ecology 90:2648.
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Vulpes vulpes", backend = "gbif") |> add_pantheria()
options(old)
Fetches Italian Ellenberg-type indicator values, life form, and chorotype
from Pignatti's Flora d'Italia (Pignatti, Menegoni & Pietrosanti 2005) for
the species in a taxify() result, using the TR8 package, and joins them by
accepted_name. TR8 ships these values bundled, so this works offline.
add_pignatti(x, verbose = TRUE)
x |
A data.frame returned by |
verbose |
Logical. Default |
These values originate in a copyrighted publication, so taxify does not
redistribute them. This function reads the copy bundled in the suggested
package TR8 (which redistributes it under TR8's GPL, with attribution);
taxify ships none of it and no internet access is required. For
European-calibration indicator values see add_eive().
The same data.frame with additional columns:
Ellenberg-type indicator values
calibrated for the Italian flora (codes; X = indifferent, 0 = not
applicable).
Life form for the Italian flora.
Chorological type (distribution).
Pignatti S, Menegoni P, Pietrosanti S (2005) Bioindicazione attraverso le piante vascolari. Braun-Blanquetia 39. Bocci G (2015) TR8: an R package for easily retrieving plant species traits. Methods in Ecology and Evolution 6:347-350.
old <- options(taxify.data_dir = taxify_example_data())
# add_pignatti() fetches Italian trait data on demand via the TR8 package.
taxify("Abies alba") |> add_pignatti()
options(old)
Parses the input_name column from a taxify() result to extract
taxonomic qualifiers (cf., aff., s.l., etc.) and their positions.
add_qualifier_info(x)
x |
A data.frame returned by |
The same data.frame with additional columns:
The qualifier found (e.g., "cf.", "aff."),
or NA if none.
Integer position (character index) of the
qualifier in the original name, or NA if none.
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Pinus cf. sylvestris") |> add_qualifier_info()
options(old)
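As a rough illustration of qualifier detection, the base R sketch below searches a name for a few literal qualifiers and reports the 1-based character index of the first hit. The helper `find_qualifier()`, its default qualifier list, and the element names are assumptions made for this example; the package may recognize more qualifiers and may define the position differently.

```r
# Hypothetical helper: find the first known qualifier and its character index.
find_qualifier <- function(name, qualifiers = c("cf.", "aff.", "s.l.")) {
  for (q in qualifiers) {
    # fixed = TRUE treats the dots literally instead of as regex wildcards.
    pos <- regexpr(q, name, fixed = TRUE)
    if (pos > 0) {
      return(list(qualifier = q, position = as.integer(pos)))
    }
  }
  # No qualifier found: both fields are NA.
  list(qualifier = NA_character_, position = NA_integer_)
}

find_qualifier("Pinus cf. sylvestris")
find_qualifier("Pinus sylvestris")
```

For "Pinus cf. sylvestris" the sketch reports the qualifier "cf." at position 7, counting from 1.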
Joins WCVP (World Checklist of Vascular Plants, Kew) native range
data to a taxify() result, filtered by TDWG botanical region.
add_wcvp(x, region, verbose = TRUE)
x |
A data.frame returned by |
region |
Character. TDWG Level 2 region code(s), or
|
verbose |
Logical. Default |
Source: WCVP (Kew, CC BY). Coverage: ~340k plant species. Plants only.
The same data.frame with additional column(s):
One of "native", "introduced", "extinct",
or NA if not recorded for that region.
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Quercus robur") |> add_wcvp(region = "EUR")
taxify("Quercus robur") |> add_wcvp(region = c("EUR", "NAM"))
options(old)
Joins extra World Flora Online columns to a taxify() result by
looking up taxon_id in the WFO backbone.
add_wfo_info(x)
x |
A data.frame returned by |
The same data.frame with additional columns:
WFO scientificNameID.
WFO parentNameUsageID.
Publication reference.
Higher classification string.
Taxonomic remarks.
Infraspecific epithet (for subspecies, varieties, forms).
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Quercus robur") |> add_wfo_info()
options(old)
Joins woodiness data from Zanne et al. (2014) to a taxify() result
by looking up accepted_name.
add_woodiness(x, verbose = TRUE)
x |
A data.frame returned by |
verbose |
Logical. Default |
Source: Zanne et al. 2014, Nature (Dryad, CC0). Coverage: ~50k plant species. Plants only.
The same data.frame with an additional column:
One of "woody", "herbaceous", "variable",
or NA if not in the dataset.
Zanne AE et al. (2014) Three keys to the radiation of angiosperms into freezing environments. Nature 506:89-92.
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Quercus robur") |> add_woodiness()
options(old)
Prints formatted citations for the taxonomic backbone(s), enrichment layers, and the taxify package itself. Optionally writes a BibTeX file.
cite(x, file = NULL)
x |
A |
file |
Optional file path. If provided, BibTeX entries are written
to this file (extension should be |
x, invisibly (pipe-friendly).
result <- taxify("Quercus robur", backend = "wfo")
result |> cite()
result |> cite(file = tempfile(fileext = ".bib"))
Writes a taxify() result (with any enrichments) to disk in one of several
formats. The default .vtr format preserves column types and is fast to
re-read with add_data().
export_data(x, path, overwrite = FALSE)
x |
A data.frame returned by |
path |
Character. Output file path. The format is inferred from the
extension: |
overwrite |
Logical. Overwrite an existing file? Default |
Invisibly returns path.
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
result <- taxify(c("Quercus robur", "Pinus sylvestris"))
result |> export_data(tempfile(fileext = ".vtr"))
result |> export_data(tempfile(fileext = ".csv"))
result |> export_data(tempfile(fileext = ".tsv"))
options(old)
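The idea of inferring the output format from the file extension, combined with an overwrite guard, can be sketched in base R as follows. The helper `write_table_by_ext()` is hypothetical and handles only the two text formats seen in the example above (.csv and .tsv); it does not write .vtr files and is not the package's implementation.

```r
# Hypothetical helper: pick a delimiter from the file extension and write text.
write_table_by_ext <- function(x, path, overwrite = FALSE) {
  # Refuse to clobber an existing file unless explicitly allowed.
  if (file.exists(path) && !overwrite) {
    stop("File exists; set overwrite = TRUE to replace it: ", path)
  }
  ext <- tolower(tools::file_ext(path))
  sep <- switch(ext,
    csv = ",",
    tsv = "\t",
    stop("Unsupported extension in this sketch: ", ext)
  )
  utils::write.table(x, path, sep = sep, row.names = FALSE)
  invisible(path)
}

df <- data.frame(name = c("Quercus robur", "Pinus sylvestris"), n = 1:2)
out <- write_table_by_ext(df, tempfile(fileext = ".tsv"))
utils::read.delim(out)
```

Returning the path invisibly mirrors the documented return value and keeps the helper usable at the end of a pipe.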
Returns a summary of all enrichment layers available in the taxify manifest, including version, row count, whether the dataset is static, and which trait columns are provided.
list_enrichments(verbose = TRUE)
verbose |
Logical. Default |
A data.frame with columns: name, version, nrow, static,
trait_cols (comma-separated), and source_url.
list_enrichments()
Returns the register row for the given genus, or NULL if not found.
Auto-loads the register on first call.
lookup_genus(genus)
genus |
Character scalar. The genus name to look up. |
A one-row data.frame, or NULL if the genus is not in the register.
Delegates to the standard data.frame print method.
## S3 method for class 'taxify_result'
print(x, ...)
x |
A |
... |
Passed to the next method. |
x, invisibly.
Prints a compact digest of match quality and life-form scope to the console.
Uses cat() so output is captured by capture.output() and rendered
correctly in knitr chunks.
## S3 method for class 'taxify_result'
summary(object, ...)
object |
A |
... |
Ignored. |
object, invisibly.
Matches a vector of taxonomic names against locally stored Darwin Core backbone databases. Returns a data.frame with one row per input name containing the matched name, accepted name, taxonomic hierarchy, and match quality information.
taxify(
  x,
  backend = "wfo",
  fuzzy = TRUE,
  fuzzy_threshold = 0.2,
  fuzzy_method = c("dl", "levenshtein", "jw"),
  verbose = TRUE
)
x |
Character vector of taxonomic names. |
backend |
Character vector of backend names (e.g., |
fuzzy |
Logical. Enable fuzzy matching for names that fail exact
match. Default |
fuzzy_threshold |
Numeric. Maximum allowed distance for fuzzy matches. Two modes depending on the value:
|
fuzzy_method |
Character. One of |
verbose |
Logical. Print progress messages. Default |
When multiple backends are specified, names are matched against each backend in order. Names matched by an earlier backend are not re-matched by later ones (fallback chain).
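The fallback chain can be pictured with a small base R sketch using exact matching only. The `backbones` list and the helper `match_with_fallback()` are hypothetical stand-ins for the real backbone files and matching engine.

```r
# Hypothetical backbones: each backend is just a vector of known names.
backbones <- list(
  wfo = c("Quercus robur", "Pinus sylvestris"),
  col = c("Quercus robur", "Panthera leo")
)

# Hypothetical helper: try each backend in order, skipping names already matched.
match_with_fallback <- function(x, backends, backbones) {
  used <- rep(NA_character_, length(x))
  for (b in backends) {
    todo <- is.na(used)          # names not yet matched by an earlier backend
    if (!any(todo)) break
    hit <- x[todo] %in% backbones[[b]]
    used[todo][hit] <- b         # record which backend resolved the name
  }
  data.frame(input_name = x, backend = used, stringsAsFactors = FALSE)
}

match_with_fallback(
  c("Quercus robur", "Panthera leo", "Foo bar"),
  backends = c("wfo", "col"),
  backbones = backbones
)
```

Here "Quercus robur" is resolved by the first backend and never retried against the second, "Panthera leo" falls through to the second, and the unknown name stays NA.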
A data.frame with one row per input name and the following columns:
The original name as provided.
Full name in the backbone that matched.
Resolved accepted name (equals matched_name
if not a synonym).
Backend-specific ID of the matched name.
ID of the accepted name.
Taxonomic rank (species, subspecies, genus, etc.).
Family name.
Genus name.
Specific epithet.
Authorship of the matched name.
Authorship of the accepted name. For a synonym
this is the author of the resolved accepted name, not the synonym's own
author, so accepted_name and accepted_authorship together form the
accepted name's full citation.
Logical. Was the match a synonym?
Logical. Was a hybrid marker detected in the input?
One of "exact", "exact_ci", "fuzzy", "abbrev"
(an abbreviated genus such as "Q. robur" resolved via genus initial
plus epithet), or "none".
Normalized string distance (0–1), NA if exact.
Logical. TRUE when the matched scientificName had
multiple synonym rows pointing to different accepted taxa at the same
priority tier (homonym ambiguity). Disambiguated via
nomenclaturalStatus = "Valid" when that column is in the backbone;
for irreducible ambiguity, the scalar columns hold one candidate.
Character. |-joined list of conflicting
accepted taxon IDs when is_ambiguous = TRUE; NA otherwise.
Which backend was used (e.g., "wfo", "col",
"gbif").
Backend name, version, and download date
(e.g., "wfo:2024-12 (2026-04-01)"). Useful for reproducibility.
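Because the conflicting accepted IDs are joined with `|`, which is a regular-expression metacharacter, splitting them back apart needs a literal match. A minimal sketch, where the vector `candidates` and the IDs in it are made up for illustration:

```r
# Hypothetical values of the |-joined candidate column; IDs are invented.
candidates <- c("wfo-0001|wfo-0002", NA)

# fixed = TRUE splits on the literal "|" rather than the regex alternation.
strsplit(candidates, "|", fixed = TRUE)
```

Rows without ambiguity stay NA after splitting, so they can be filtered out with `is.na()` before further processing.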
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
# Match a few names
taxify(c("Quercus robur", "Pinus sylvestris"))
# Disable fuzzy matching
taxify("Quercus robus", fuzzy = FALSE)
# Fallback chain: try WFO first, then COL for unmatched
taxify(c("Quercus robur", "Panthera leo"), backend = c("wfo", "col"))
options(old)
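To give a feel for what a normalized string distance looks like for a misspelling such as "Quercus robus", the sketch below divides the edit distance by the length of the longer string. Two assumptions are worth flagging: it uses plain Levenshtein distance from `utils::adist()` rather than the default "dl" method, and the normalization by the longer string is a guess at one common convention, not a statement of how taxify computes its score.

```r
# Hypothetical helper: Levenshtein distance scaled to the 0-1 range.
normalized_distance <- function(a, b) {
  # utils::adist() returns a matrix of generalized Levenshtein distances.
  d <- utils::adist(a, b)[1, 1]
  d / max(nchar(a), nchar(b))
}

# One substitution over 13 characters.
normalized_distance("Quercus robus", "Quercus robur")
```

Under these assumptions the distance is 1/13, roughly 0.077, which is below the default `fuzzy_threshold` of 0.2.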
Removes all loaded backbone handles from memory. The next call to
taxify() will re-load from disk.
taxify_clear_cache()
No return value, called for side effects.
Returns the directory where taxify stores downloaded backbone and
enrichment .vtr files. By default this is the platform-appropriate
per-user cache returned by tools::R_user_dir() (available since R 4.0).
taxify_data_dir()
The location can be overridden, in order of precedence, by the
taxify.data_dir option (getOption("taxify.data_dir")) or the
TAXIFY_DATA_DIR environment variable. This is useful to point taxify at
a shared cache, or at the small bundled example database returned by
taxify_example_data().
Character string. Path to the data directory.
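The precedence described above (option first, then environment variable, then the per-user default) can be sketched in base R. The helper `resolve_data_dir()` is hypothetical, and the use of `which = "cache"` in `tools::R_user_dir()` is an assumption based on the phrase "per-user cache".

```r
# Hypothetical helper mirroring the documented precedence order.
resolve_data_dir <- function() {
  # 1. The R option wins when it is set to a non-empty string.
  opt <- getOption("taxify.data_dir")
  if (!is.null(opt) && nzchar(opt)) return(opt)
  # 2. Otherwise fall back to the environment variable.
  env <- Sys.getenv("TAXIFY_DATA_DIR", unset = "")
  if (nzchar(env)) return(env)
  # 3. Otherwise use the platform-appropriate per-user directory (R >= 4.0).
  tools::R_user_dir("taxify", which = "cache")
}

old <- options(taxify.data_dir = file.path(tempdir(), "taxify-demo"))
resolve_data_dir()
options(old)
```

Setting the option inside a script and restoring it afterwards, as in the package examples, keeps the override scoped to that script.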
Downloads the latest Darwin Core snapshot for the specified backend and
converts it to vectra's .vtr format for fast repeated queries.
taxify_download(backend, dest = NULL, verbose = TRUE, ...)
backend |
A |
dest |
Character. Destination directory. Defaults to
|
verbose |
Logical. Print progress messages. |
... |
Additional arguments passed to methods. |
Always re-downloads the latest release, overwriting any existing backbone.
Use taxify() for day-to-day matching — it auto-downloads on first use
and reuses the local copy thereafter.
The path to the .vtr file (invisibly).
Downloads pre-built enrichment .vtr files from the taxify manifest.
taxify_download_enrichment(enrichment, version = "latest", verbose = TRUE)
enrichment |
Character. One or more enrichment names (e.g.,
|
version |
Character. |
verbose |
Logical. Default |
Available enrichments:
IUCN conservation status (LC/NT/VU/EN/CR/EW/EX)
GRIIS invasive species status by country
Zanne et al. 2014 woody/herbaceous classification
WCVP native range by TDWG botanical region
EIVE 1.0 ecological indicator values (European plants)
Diaz et al. 2022 seed mass and plant height
EltonTraits 1.0 diet and foraging (birds + mammals)
AVONET bird morphology and migration
PanTHERIA mammal life-history traits
GBIF vernacular names (multi-language)
AmphiBIO amphibian life-history and ecological traits
LEDA Traitbase NW European plant traits (Kleyer et al. 2008)
The path(s) to the downloaded .vtr file(s) (invisibly).
Downloads a pre-built .vtr backbone from Zenodo using the taxify manifest.
Progress is always shown. No prompts are shown — calling this function is
consent.
taxify_download_vtr(backend = "wfo", version = "latest", verbose = TRUE)
backend |
Character. One of |
version |
Character. |
verbose |
Logical. Default |
The path(s) to the downloaded .vtr file(s) (invisibly).
taxify ships a tiny example database (a handful of species per backbone plus matching enrichment tables) so that examples and quick experiments run offline, without downloading the full multi-million-row backbones.
taxify_example_data()
Point taxify at it for the current session by setting the
taxify.data_dir option:
old <- options(taxify.data_dir = taxify_example_data())
taxify("Quercus robur") |> add_woodiness()
options(old) # restore the real data directory
The example database is read-only and covers only the species used in the package examples; use the full downloaded backbones for real work.
Character string. Path to the bundled example database directory,
or "" if it is not installed.
Reads genus_register.vtr from disk and caches it as a data.frame in
.taxify_env$register. Subsequent calls reuse the cached version unless
force = TRUE.
taxify_load_register(force = FALSE, verbose = TRUE)
force |
Logical. If |
verbose |
Logical. Print progress messages. Default |
The register contains one row per genus with columns: genus, kingdom,
phylum, class, order, family, life_form.
The register data.frame (invisibly).
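The load-once-then-reuse behaviour can be illustrated with an environment used as a cache. In this sketch `.cache_env`, `load_register()`, and `fake_read()` are hypothetical names; `fake_read()` stands in for reading genus_register.vtr from disk and counts how often it is called.

```r
# Hypothetical cache environment, analogous to a package-private environment.
.cache_env <- new.env(parent = emptyenv())

# Hypothetical loader: read only when forced or when nothing is cached yet.
load_register <- function(read_fn, force = FALSE) {
  if (force || is.null(.cache_env$register)) {
    .cache_env$register <- read_fn()
  }
  invisible(.cache_env$register)
}

# Stand-in for the disk read; counts calls so the caching is observable.
n_reads <- 0
fake_read <- function() {
  n_reads <<- n_reads + 1
  data.frame(genus = "Quercus", family = "Fagaceae", stringsAsFactors = FALSE)
}

load_register(fake_read)                # first call reads
load_register(fake_read)                # second call reuses the cache
load_register(fake_read, force = TRUE)  # forced reload reads again
n_reads
```

After the three calls the counter is 2: one initial read and one forced reload.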
Converts wide-format columns produced by grouped enrichments (e.g.,
invasive_status_AT, invasive_status_DE) back to long format with
one row per species x group combination.
taxify_long(x, cols = NULL, group_col = NULL, drop_na = FALSE)
x |
A data.frame, typically a |
cols |
Character vector of base column names to reshape. These are
the column names without the group suffix (e.g., |
group_col |
Character. Name for the output group column.
If omitted, auto-detected from enrichment metadata (e.g.,
|
drop_na |
Logical. If |
When cols and group_col are omitted, taxify_long() reads the
reshape metadata attached by grouped enrichment functions
(add_invasive_status(), add_alien_first_records(), add_wcvp(),
add_common_names()). If multiple grouped enrichments were applied,
all are reshaped together (they must share the same group column).
Column matching uses the explicit base names in cols to avoid ambiguity.
For example, given cols = c("alien_first_record", "alien_first_record_source"), the column alien_first_record_source_AT
is correctly matched to base alien_first_record_source (not
alien_first_record with suffix source_AT), because longer base names
are matched first.
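The longest-base-first rule can be demonstrated with a short base R sketch. The helper `split_suffix()` is hypothetical and only shows the matching order; it is not the package's internal routine.

```r
# Hypothetical helper: assign each wide column to a base name and a group suffix.
split_suffix <- function(col_names, bases) {
  # Try longer base names first so "..._source_AT" is not claimed by the
  # shorter base with a bogus suffix "source_AT".
  bases <- bases[order(nchar(bases), decreasing = TRUE)]
  out <- data.frame(
    column = col_names,
    base = NA_character_,
    group = NA_character_,
    stringsAsFactors = FALSE
  )
  for (b in bases) {
    prefix <- paste0(b, "_")
    # Only columns not yet assigned to a longer base are considered.
    hit <- is.na(out$base) & startsWith(col_names, prefix)
    out$base[hit] <- b
    out$group[hit] <- substring(col_names[hit], nchar(prefix) + 1)
  }
  out
}

split_suffix(
  c("alien_first_record_AT", "alien_first_record_source_AT"),
  bases = c("alien_first_record", "alien_first_record_source")
)
```

Both columns end up with group "AT", and the second one is attributed to the base `alien_first_record_source`.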
If the columns in x exactly match cols (no suffixed variants), the
data is already in single-group format. In this case, the data.frame is
returned unchanged with group_col set to NA.
A data.frame in long format. All columns from x that are not
part of the reshape are preserved. The reshaped columns use their base
names (without suffix), and a new group_col column contains the
group code extracted from the suffix.
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
# Auto-detected: no cols or group_col needed
taxify("Robinia pseudoacacia") |>
  add_alien_first_records(country = c("AT", "DE")) |>
  taxify_long()
# Explicit: override auto-detection
taxify("Robinia pseudoacacia") |>
  add_invasive_status(country = c("AT", "DE")) |>
  taxify_long(cols = "invasive_status", group_col = "country")
options(old)
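For readers who want to see the shape of the output without the package, the wide-to-long step for a single base column can be approximated in base R. The data frame `wide` and its values are invented for illustration, and the final filter only mimics what `drop_na = TRUE` is described as doing.

```r
# Hypothetical wide result with one status column per country; values invented.
wide <- data.frame(
  accepted_name = "Example species",
  invasive_status_AT = "invasive",
  invasive_status_DE = NA_character_,
  stringsAsFactors = FALSE
)

groups <- c("AT", "DE")

# Stack one block of rows per group, moving the suffix into a "country" column.
long <- do.call(rbind, lapply(groups, function(g) {
  data.frame(
    accepted_name = wide$accepted_name,
    country = g,
    invasive_status = wide[[paste0("invasive_status_", g)]],
    stringsAsFactors = FALSE
  )
}))

long                                   # one row per species x country
long[!is.na(long$invasive_status), ]   # analogous to dropping all-NA groups
```

The long table has one row per species and country, which is usually easier to filter, summarize, or plot than the suffixed wide columns.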
Forces the next fetch_manifest() call to re-fetch from the network.
Useful when the maintainer has updated the manifest during your R session
and you want to pick up the change without restarting R.
taxify_refresh_manifest()
No return value, called for side effects.
Queries backend_coverage.vtr to determine which backends contain the
given genus, along with the backbone version at time of indexing.
taxify_register_coverage(genus)
genus |
Character scalar. The genus name to query. |
A data.frame with columns genus, backend, version,
date_added. Returns a zero-row data.frame if the genus is not found
in any backend.