The World Register of Marine Species (WoRMS) is a comprehensive
database providing authoritative lists of marine organism names. It is
managed and continuously updated by taxonomic experts. It combines data
from the Aphia database and other sources such as AlgaeBase and FishBase,
offering species names, higher classifications, and additional data.
In this tutorial, we use the R package worrms to access WoRMS data from
our functions. Please note that the authors of
SHARK4R are not affiliated with WoRMS.
Phytoplankton data, including scientific names and AphiaIDs, are downloaded from SHARK. To see more download options, please visit the Retrieve Data From SHARK tutorial.
Taxon names can be matched with the WoRMS API to retrieve Aphia IDs
and corresponding taxonomic information. The
match_worms_taxa() function incorporates retry logic to
handle temporary failures, so that a brief API outage does not
interrupt the processing of the remaining names.
# Find taxa without Aphia ID
no_aphia_id <- shark_data %>%
  filter(is.na(aphia_id))
# Randomly select taxa with missing aphia_id
taxa_names <- sample(unique(no_aphia_id$scientific_name),
                     size = 10,
                     replace = TRUE)
# Match taxa names with WoRMS
worms_records <- match_worms_taxa(unique(taxa_names),
                                  fuzzy = TRUE,
                                  best_match_only = TRUE,
                                  marine_only = TRUE,
                                  verbose = FALSE)
# Print result
print(worms_records)
## # A tibble: 4 × 29
##   name  AphiaID url   scientificname authority status unacceptreason taxonRankID
##   <chr>   <int> <chr> <chr>          <chr>     <chr>  <chr>                <int>
## 1 Cyli…  149004 http… Cylindrotheca… (Ehrenbe… accep… <NA>                   220
## 2 Unic…      NA <NA>  <NA>           <NA>      no co… <NA>                    NA
## 3 Scri…  109545 http… Scrippsiella   Balech e… accep… <NA>                   180
## 4 Dipl…  109515 http… Diplopsalis    R.S.Berg… accep… <NA>                   180
## # ℹ 21 more variables: rank <chr>, valid_AphiaID <int>, valid_name <chr>,
## # valid_authority <chr>, parentNameUsageID <int>, originalNameUsageID <int>,
## # kingdom <chr>, phylum <chr>, class <chr>, order <chr>, family <chr>,
## # genus <chr>, citation <chr>, lsid <chr>, isMarine <int>, isBrackish <int>,
## # isFreshwater <int>, isTerrestrial <int>, isExtinct <lgl>, match_type <chr>,
## # modified <chr>
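The retry idea mentioned above can be illustrated with a minimal base R sketch. The helper with_retry() and the simulated flaky_lookup() are hypothetical names invented for this illustration. This is not the code used inside match_worms_taxa(), only one common way to repeat a call that fails temporarily.

```r
# Hypothetical helper (not part of SHARK4R): call fn() until it succeeds
# or until max_retries attempts have been made.
with_retry <- function(fn, max_retries = 3, sleep_time = 0) {
  stopifnot(max_retries >= 1)
  for (attempt in seq_len(max_retries)) {
    # Capture an error as a value instead of aborting
    result <- tryCatch(fn(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    # Pause before the next attempt, but not after the last one
    if (attempt < max_retries) {
      Sys.sleep(sleep_time)
    }
  }
  stop("All ", max_retries, " attempts failed: ", conditionMessage(result))
}

# Simulated lookup that fails twice before succeeding
attempts <- 0
flaky_lookup <- function() {
  attempts <<- attempts + 1
  if (attempts < 3) stop("temporary failure")
  "record retrieved"
}

retry_result <- with_retry(flaky_lookup, max_retries = 5)
```

In real use, a non-zero sleep_time gives the remote service time to recover between attempts.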
Taxonomic records can also be retrieved by Aphia ID with
get_worms_records(), which uses the same retry and error-handling
logic as the match_worms_taxa() function.
# Collect the unique Aphia IDs, dropping missing values first
available_ids <- unique(shark_data$aphia_id)
available_ids <- available_ids[!is.na(available_ids)]
# Randomly select ten Aphia IDs
aphia_ids <- sample(available_ids,
                    size = 10)
# Retrieve records
worms_records <- get_worms_records(aphia_ids,
                                   verbose = FALSE)
# Print result
print(worms_records)
## # A tibble: 10 × 28
##    AphiaID url        scientificname authority status unacceptreason taxonRankID
##      <int> <chr>      <chr>          <chr>     <chr>  <lgl>                <int>
##  1  110223 https://w… Protoperidini… (Ostenfe… accep… NA                     220
##  2  149120 https://w… Chaetoceros d… Cleve, 1… unass… NA                     220
##  3  110156 https://w… Peridiniella … (Levande… accep… NA                     220
##  4  160552 https://w… Dinobryon bal… (Schütt)… accep… NA                     220
##  5   17657 https://w… Eutreptiella   A.M.Cunh… accep… NA                     180
##  6  149045 https://w… Nitzschia      A.H. Has… accep… NA                     180
##  7  160591 https://w… Monoraphidium… (Thuret)… accep… NA                     220
##  8  149308 https://w… Thalassiosira… van Goor… unass… NA                     220
##  9  109475 https://w… Gymnodinium    F.Stein,… accep… NA                     180
## 10  160553 https://w… Dinobryon fac… (Willén)… accep… NA                     220
## # ℹ 21 more variables: rank <chr>, valid_AphiaID <int>, valid_name <chr>,
## # valid_authority <chr>, parentNameUsageID <int>, originalNameUsageID <int>,
## # kingdom <chr>, phylum <chr>, class <chr>, order <chr>, family <chr>,
## # genus <chr>, citation <chr>, lsid <chr>, isMarine <int>, isBrackish <int>,
## # isFreshwater <int>, isTerrestrial <int>, isExtinct <int>, match_type <chr>,
## # modified <chr>
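When looking up records by Aphia ID, it often helps to clean the ID vector before sending anything to the API. The sketch below removes missing values and duplicates and then splits the IDs into batches. The helper prepare_id_batches() and the batch size are assumptions made for this illustration. get_worms_records() may already handle both steps internally.

```r
# Hypothetical helper (not part of SHARK4R): drop missing values and
# duplicates, then split the remaining IDs into batches of at most
# `batch_size` elements.
prepare_id_batches <- function(ids, batch_size = 50) {
  ids <- unique(ids[!is.na(ids)])
  if (length(ids) == 0) {
    return(list())
  }
  # ceiling(position / batch_size) gives the batch number of each ID
  unname(split(ids, ceiling(seq_along(ids) / batch_size)))
}

# Made-up input with one missing value and one duplicate
raw_ids <- c(110223, NA, 149120, 110156, 110223, 160552)
batches <- prepare_id_batches(raw_ids, batch_size = 2)
```

Each element of batches can then be passed to a lookup function in turn, which keeps individual requests small.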
SHARK sources its taxonomic information from Dyntaxa, which is reflected in
the columns whose names start with taxon_. Equivalent columns based on
WoRMS, prefixed with worms_, can be retrieved using the add_worms_taxonomy()
function and joined to the SHARK data by aphia_id.
# Retrieve taxonomic table
worms_taxonomy <- add_worms_taxonomy(aphia_ids,
                                     verbose = FALSE)
# Print result
print(worms_taxonomy)
# Enrich SHARK data with taxonomic data from WoRMS
shark_data_with_worms <- shark_data %>%
  left_join(worms_taxonomy, by = "aphia_id")
## # A tibble: 10 × 10
##    aphia_id worms_scientific_name   worms_kingdom worms_phylum     worms_class
##       <dbl> <chr>                   <chr>         <chr>            <chr>
##  1   110223 Protoperidinium granii  Chromista     Myzozoa          Dinophyceae
##  2   149120 Chaetoceros danicus     Chromista     Heterokontophyta Bacillarioph…
##  3   110156 Peridiniella catenata   Chromista     Myzozoa          Dinophyceae
##  4   160552 Dinobryon balticum      Chromista     Ochrophyta       Chrysophyceae
##  5    17657 Eutreptiella            Protozoa      Euglenozoa       Euglenophyce…
##  6   149045 Nitzschia               Chromista     Heterokontophyta Bacillarioph…
##  7   160591 Monoraphidium contortum Plantae       <NA>             Chlorophyceae
##  8   149308 Thalassiosira levanderi Chromista     Heterokontophyta Bacillarioph…
##  9   109475 Gymnodinium             Chromista     Myzozoa          Dinophyceae
## 10   160553 Dinobryon faculiferum   Chromista     Ochrophyta       Chrysophyceae
## # ℹ 5 more variables: worms_order <chr>, worms_family <chr>, worms_genus <chr>,
## # worms_species <chr>, worms_hierarchy <chr>
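To make the join step concrete, the sketch below reproduces a left join on aphia_id in base R, so it runs without any packages. The two small tables are toy data: the taxonomy values are copied from the output above, while the row "Unknown flagellate" is made up to show what happens when an Aphia ID is missing. The names shark_rows, worms_lookup, and enriched are hypothetical.

```r
# Toy sample rows; "Unknown flagellate" is a made-up entry without Aphia ID
shark_rows <- data.frame(
  scientific_name = c("Protoperidinium granii", "Unknown flagellate",
                      "Dinobryon balticum"),
  aphia_id = c(110223, NA, 160552),
  stringsAsFactors = FALSE
)

# Lookup table shaped like a subset of the taxonomy output shown above
worms_lookup <- data.frame(
  aphia_id = c(160552, 110223),
  worms_phylum = c("Ochrophyta", "Myzozoa"),
  worms_class = c("Chrysophyceae", "Dinophyceae"),
  stringsAsFactors = FALSE
)

# A left join multiplies rows if the key is duplicated in the lookup table
stopifnot(!anyDuplicated(worms_lookup$aphia_id))

# Base R left join: find each row's key in the lookup table, keep the
# original row order, and fill with NA where there is no match
idx <- match(shark_rows$aphia_id, worms_lookup$aphia_id)
enriched <- cbind(shark_rows,
                  worms_lookup[idx, c("worms_phylum", "worms_class")])
rownames(enriched) <- NULL
```

Rows without an Aphia ID are kept, as in a left join, but their WoRMS columns are NA.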
To explore the full hierarchical taxonomy records of your Aphia IDs,
you can use the get_worms_taxonomy_tree() function. This
function retrieves records for the entire taxonomic tree from WoRMS,
including parent-child relationships, and can optionally fetch all
descendants (e.g. the species under a genus) as well as known synonyms.
# Retrieve taxonomic tree
worms_tree <- get_worms_taxonomy_tree(
  aphia_ids[1],            # use first id only in this example
  add_descendants = FALSE, # only retrieve hierarchy for given AphiaIDs
  add_synonyms = FALSE,    # do not retrieve synonyms
  verbose = FALSE          # suppress progress messages
)
# Print result
print(worms_tree)
## # A tibble: 12 × 28
##    AphiaID url        scientificname authority status unacceptreason taxonRankID
##      <int> <chr>      <chr>          <chr>     <chr>  <lgl>                <int>
##  1       7 https://w… Chromista      <NA>      accep… NA                      10
##  2  582419 https://w… Harosa         <NA>      accep… NA                      20
##  3  536209 https://w… Alveolata      Cavalier… accep… NA                      25
##  4  450030 https://w… Myzozoa        Cavalier… accep… NA                      30
##  5  562620 https://w… Dinozoa        <NA>      accep… NA                      40
##  6  146203 https://w… Dinoflagellata <NA>      accep… NA                      45
##  7   19542 https://w… Dinophyceae    Fritsch,… accep… NA                      60
##  8  493845 https://w… Peridiniphyci… Fensome,… accep… NA                      70
##  9  109394 https://w… Peridiniales   Haeckel,… accep… NA                     100
## 10  109435 https://w… Protoperidini… J.P. Buj… accep… NA                     140
## 11  109553 https://w… Protoperidini… Bergh, 1… accep… NA                     180
## 12  110223 https://w… Protoperidini… (Ostenfe… accep… NA                     220
## # ℹ 21 more variables: rank <chr>, valid_AphiaID <int>, valid_name <chr>,
## # valid_authority <chr>, parentNameUsageID <int>, originalNameUsageID <int>,
## # kingdom <chr>, phylum <chr>, class <chr>, order <chr>, family <chr>,
## # genus <chr>, citation <chr>, lsid <chr>, isMarine <int>, isBrackish <int>,
## # isFreshwater <int>, isTerrestrial <int>, isExtinct <lgl>, match_type <chr>,
## # modified <chr>
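The parent-child relationships in such a tree can be followed programmatically. The sketch below walks from one taxon up to the root using a toy table built from the first seven rows printed above. The parentNameUsageID values are an assumption: each taxon is taken to be the child of the row printed before it, and the root's parent is set to NA for simplicity. The helper get_lineage() is hypothetical.

```r
# Toy tree; AphiaIDs and names come from the output above, the parent
# column is assumed (each row is the child of the previous row)
tree <- data.frame(
  AphiaID = c(7, 582419, 536209, 450030, 562620, 146203, 19542),
  scientificname = c("Chromista", "Harosa", "Alveolata", "Myzozoa",
                     "Dinozoa", "Dinoflagellata", "Dinophyceae"),
  parentNameUsageID = c(NA, 7, 582419, 536209, 450030, 562620, 146203),
  stringsAsFactors = FALSE
)

# Hypothetical helper: follow parent links from one taxon up to the root
get_lineage <- function(tree, aphia_id) {
  lineage <- character(0)
  current <- aphia_id
  while (!is.na(current)) {
    row <- tree[tree$AphiaID == current, ]
    if (nrow(row) != 1) break  # parent is not in the table, stop here
    lineage <- c(row$scientificname, lineage)
    current <- row$parentNameUsageID
  }
  lineage
}

dinophyceae_lineage <- get_lineage(tree, 19542)
lineage_string <- paste(dinophyceae_lineage, collapse = " > ")
```

The same walk works on any table that holds an ID column and a parent ID column, as long as every parent is present in the table.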
Phytoplankton data are often categorized into major groups such as Dinoflagellates, Diatoms, Cyanobacteria, and Other. Taxa can be assigned to these groups based on their taxonomic classification in WoRMS, as demonstrated in the example below.
# Subset data from one national monitoring station
nat_stations <- shark_data %>%
  filter(station_name %in% c("BY5 BORNHOLMSDJ"))
# Randomly select one sample from the nat_stations
sample_id <- sample(unique(nat_stations$shark_sample_id_md5), 1)
# Subset the random sample
shark_data_subset <- shark_data %>%
  filter(shark_sample_id_md5 == sample_id)
# Assign groups by providing both scientific name and Aphia ID
plankton_groups <- assign_phytoplankton_group(
  scientific_names = shark_data_subset$scientific_name,
  aphia_ids = shark_data_subset$aphia_id,
  verbose = FALSE)
# Print result
distinct(plankton_groups)
# Add plankton groups to data and summarize abundance results
plankton_group_sum <- shark_data_subset %>%
  mutate(plankton_group = plankton_groups$plankton_group) %>%
  filter(parameter == "Abundance") %>%
  group_by(plankton_group) %>%
  summarise(sum_plankton_groups = sum(value, na.rm = TRUE))
# Plot a pie chart
ggplot(plankton_group_sum,
       aes(x = "", y = sum_plankton_groups, fill = plankton_group)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  labs(
    title = "Phytoplankton Groups",
    subtitle = paste(unique(shark_data_subset$station_name),
                     unique(shark_data_subset$sample_date)),
    fill = "Plankton Group"
  ) +
  theme_void() +
  theme(plot.background = element_rect(fill = "white", color = NA))
## # A tibble: 23 × 2
##    scientific_name      plankton_group
##    <chr>                <chr>
##  1 Pauliella taeniata   Diatoms
##  2 Amylax triacantha    Dinoflagellates
##  3 Aphanocapsa          Cyanobacteria
##  4 Aphanothece          Cyanobacteria
##  5 Chaetoceros similis  Diatoms
##  6 Dinobryon balticum   Other
##  7 Dinophysis acuminata Dinoflagellates
##  8 Dinophysis norvegica Dinoflagellates
##  9 Gymnodinium          Dinoflagellates
## 10 Protodinium simplex  Other
## # ℹ 13 more rows
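The idea of grouping by classification can be sketched with a few explicit rules. The rules in classify_group() below are assumptions chosen for illustration. The criteria that assign_phytoplankton_group() actually applies may differ, for example by covering more diatom classes. The toy table is illustrative and not taken from a SHARK download.

```r
# Hypothetical rules (not the SHARK4R implementation): assign a group
# from phylum and class, falling back to "Other"
classify_group <- function(phylum, class) {
  group <- rep("Other", length(class))
  # %in% treats missing values as non-matching, so NA stays "Other"
  group[class %in% "Dinophyceae"] <- "Dinoflagellates"
  group[class %in% "Bacillariophyceae"] <- "Diatoms"
  group[phylum %in% "Cyanobacteria"] <- "Cyanobacteria"
  group
}

# Illustrative taxa with their phylum and class
toy_taxa <- data.frame(
  scientific_name = c("Gymnodinium", "Nitzschia", "Aphanocapsa",
                      "Dinobryon balticum"),
  phylum = c("Myzozoa", "Heterokontophyta", "Cyanobacteria", "Ochrophyta"),
  class = c("Dinophyceae", "Bacillariophyceae", "Cyanophyceae",
            "Chrysophyceae"),
  stringsAsFactors = FALSE
)
toy_taxa$plankton_group <- classify_group(toy_taxa$phylum, toy_taxa$class)
```

Any taxon that matches none of the rules, here Dinobryon balticum, ends up in the "Other" group.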
You can add custom plankton groups by using the
custom_groups parameter, which lets you categorize
plankton based on specific taxonomic criteria. Please note that the
order of the list matters: taxa are assigned to the last matching group.
For example, Mesodinium rubrum matches both the Ciliates group and its own
group in the example below. Because its own group appears later in the
list, it is assigned there and excluded from the Ciliates group.
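The "last matching group wins" rule can be illustrated independently of SHARK4R. In the sketch below, rules are applied in list order and later rules overwrite earlier assignments. The rule format (one field and its accepted values per group) and the helper assign_last_match() are simplifications invented for this illustration. They do not mirror the structure of custom_groups or the package internals.

```r
# Hypothetical rule list: each rule names one taxonomic field and the
# values that assign a taxon to the group
rules <- list(
  "Ciliates" = list(field = "phylum", values = "Ciliophora"),
  "Mesodinium rubrum" = list(field = "scientific_name",
                             values = "Mesodinium rubrum")
)

# Illustrative taxa
toy_ciliates <- data.frame(
  scientific_name = c("Mesodinium rubrum", "Strombidium",
                      "Dinophysis acuminata"),
  phylum = c("Ciliophora", "Ciliophora", "Myzozoa"),
  stringsAsFactors = FALSE
)

# Apply rules in list order; a later rule overwrites an earlier match
assign_last_match <- function(taxa, rules, default = "Other") {
  group <- rep(default, nrow(taxa))
  for (group_name in names(rules)) {
    rule <- rules[[group_name]]
    group[taxa[[rule$field]] %in% rule$values] <- group_name
  }
  group
}

groups_original <- assign_last_match(toy_ciliates, rules)
groups_reversed <- assign_last_match(toy_ciliates, rev(rules))
```

With the original order, Mesodinium rubrum gets its own group. With the order reversed, the broader Ciliates rule is applied last and absorbs it, which is why more specific groups should be listed after broader ones.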
# Define custom plankton groups using a named list
custom_groups <- list(
  "Cryptophytes" = list(class = "Cryptophyceae"),
  "Green Algae" = list(class = c("Trebouxiophyceae",
                                 "Chlorophyceae",
                                 "Pyramimonadophyceae"),
                       phylum = "Chlorophyta"),
  "Ciliates" = list(phylum = "Ciliophora"),
  "Mesodinium rubrum" = list(scientific_name = "Mesodinium rubrum"),
  "Dinophysis" = list(genus = "Dinophysis")
)
# Assign groups by providing scientific name only, and adding custom groups
plankton_groups <- assign_phytoplankton_group(
  scientific_names = shark_data_subset$scientific_name,
  custom_groups = custom_groups,
  verbose = FALSE)
# Add new plankton groups to data and summarize abundance results
plankton_custom_group_sum <- shark_data_subset %>%
  mutate(plankton_group = plankton_groups$plankton_group) %>%
  filter(parameter == "Abundance") %>%
  group_by(plankton_group) %>%
  summarise(sum_plankton_groups = sum(value, na.rm = TRUE))
# Plot a new pie chart, including the custom groups
ggplot(plankton_custom_group_sum,
       aes(x = "", y = sum_plankton_groups, fill = plankton_group)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  labs(
    title = "Phytoplankton Custom Groups",
    subtitle = paste(unique(shark_data_subset$station_name),
                     unique(shark_data_subset$sample_date)),
    fill = "Plankton Group"
  ) +
  theme_void() +
  theme(plot.background = element_rect(fill = "white", color = NA))
# Print citation
citation("SHARK4R")
## To cite package 'SHARK4R' in publications use:
##
## Lindh, M. and Torstensson, A. (2026). SHARK4R: Accessing and
## Validating Marine Environmental Data from 'SHARK' and Related
## Databases. R package version 1.2.0.
## https://CRAN.R-project.org/package=SHARK4R
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {SHARK4R: Accessing and Validating Marine Environmental Data from 'SHARK' and Related Databases},
## author = {Markus Lindh and Anders Torstensson},
## year = {2026},
## note = {R package version 1.2.0},
## url = {https://CRAN.R-project.org/package=SHARK4R},
## }