| Title: | Predict and Map Oyster Growth Suitability from Environmental Data |
|---|---|
| Description: | Predicts spatial suitability for oyster growth from environmental survey data using Analytic Hierarchy Process (AHP) weighted scoring. Users supply sensor data from Acoustic Doppler Current Profilers (ADCP), Conductivity-Temperature-Depth (CTD) sensors, bathymetric sonar, and sidescan sonar, specify a target species, and receive per-location suitability scores, a 'GeoTIFF' heatmap for 'QGIS', contour lines, and a formatted PDF or HTML report. Supports fourteen species across global aquaculture regions, including Ostrea edulis, Magallana gigas, Crassostrea virginica, Crassostrea hongkongensis, and ten further species; see list_species(). Includes season-aware scoring, tidal height correction, Bayesian tolerance parameter updating from field observations, spatial block cross-validation (Roberts et al., 2017, <doi:10.1111/ecog.02881>), permutation variable importance, wave exposure and sediment stability modules, Harmful Algal Bloom (HAB) risk and anthropogenic disturbance scoring with optional live International Council for the Exploration of the Sea (ICES) data integration, hybrid larval dispersal connectivity scoring (union-find Gaussian kernel plus optional 'OpenDrift' or Finite Volume Community Ocean Model ('FVCOM') connectivity matrix), and batch multi-species comparison. |
| Authors: | T Tucker [aut, cre] (ORCID: <https://orcid.org/0009-0008-5190-6423>) |
| Maintainer: | T Tucker <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 1.4.0 |
| Built: | 2026-05-15 22:05:00 UTC |
| Source: | https://github.com/cran/oystermapR |
Classifies each survey location as intertidal, subtidal, or supratidal based on its depth below Chart Datum (CD) and the tidal range at the nearest harmonic reference port. Chart Datum is approximately Lowest Astronomical Tide (LAT), so:
Subtidal: depth > 0 (always submerged)
Intertidal: -MHWS_above_CD <= depth <= 0
(exposed at some states of the tide; depth is at or above CD)
Supratidal: depth < -MHWS_above_CD
(above the highest astronomical tide; excluded from scoring)
MHWS above CD is approximated from the harmonic constituents of the
nearest port as: Z0 + 1.1 * (M2_H + S2_H) – the 10% factor accounts
for minor constituents not included in the 5-constituent model.
This flag is used internally by score_locations() to give Magallana
gigas (Pacific oyster) full depth scores for intertidal cells, reflecting
its strong intertidal ecology in NW European waters.
add_intertidal_flag( df, max_port_dist_km = 75, depth_col = "depth", verbose = TRUE )add_intertidal_flag( df, max_port_dist_km = 75, depth_col = "depth", verbose = TRUE )
df |
Dataframe with |
max_port_dist_km |
Numeric. Maximum distance to nearest harmonic port in km (default 75). If the survey centroid is further than this from all ports, the function falls back to a conservative intertidal window of 6 m above CD and issues a warning. |
depth_col |
Character. Name of the chart-datum depth column
(default |
verbose |
Logical. Print the nearest port name and derived MHWS
(default |
The input dataframe with an added intertidal_zone column:
"subtidal": depth > 0 m CD
"intertidal": 0 m CD >= depth >= -MHWS_above_CD
"supratidal": depth < -MHWS_above_CD
survey <- auto_tidal_correct(survey, datetime_col = "date") survey <- add_intertidal_flag(survey) # Check intertidal coverage table(survey$intertidal_zone)survey <- auto_tidal_correct(survey, datetime_col = "date") survey <- add_intertidal_flag(survey) # Check intertidal coverage table(survey$intertidal_zone)
Vectorised wrapper around detect_season() for use on dataframes.
add_season_column(df, date_col = NULL, lat_col = "lat")add_season_column(df, date_col = NULL, lat_col = "lat")
df |
A dataframe containing at minimum a date column and a latitude column. |
date_col |
Name of the date column (default |
lat_col |
Name of the latitude column (default |
The input dataframe with an additional column season.
Attaches a regulatory shellfish harvesting classification (A/B/C/Prohibited) to each row of a scored result dataframe. This classification affects the economic viability score for aquaculture applications: Prohibited sites receive viability = 0; Class C sites receive a 0.60 multiplier; Class B a 0.80 multiplier; Class A is unpenalised.
Classification can be supplied three ways (in priority order):
class_col – name of an existing column in result already
containing classification values ("A", "B", "C", "Prohibited", or NA).
classified_areas – a dataframe with lat, lon, and
shellfish_class columns, spatially matched to result rows within
match_radius_deg degrees.
fetch_live = TRUE – queries the FSA England/Wales open data API
(requires internet). Requires httr package. Works only for sites
within the FSA/DAERA coverage area (England, Wales, Northern Ireland).
Unclassified or unmatched sites receive shellfish_class = "Unclassified"
and a penalty of 0.70 (precautionary principle).
add_shellfish_classification( result, class_col = NULL, classified_areas = NULL, fetch_live = FALSE, match_radius_deg = 0.05, verbose = TRUE )add_shellfish_classification( result, class_col = NULL, classified_areas = NULL, fetch_live = FALSE, match_radius_deg = 0.05, verbose = TRUE )
result |
Dataframe from |
class_col |
Character. Name of an existing column in |
classified_areas |
Dataframe with |
fetch_live |
Logical. Query live FSA/DAERA API (default FALSE). Requires
internet access and the |
match_radius_deg |
Numeric. Spatial matching tolerance in decimal
degrees when using |
verbose |
Logical. Print matching summary (default TRUE). |
result with two additional columns:
shellfish_class: "A", "B", "C", "Prohibited", or "Unclassified"
shellfish_class_penalty: multiplier applied to economic viability
(1.0 for A, 0.80 for B, 0.60 for C, 0.0 for Prohibited, 0.70 for
Unclassified)
# Option 1: manual column already in data result <- add_shellfish_classification(result, class_col = "water_class") # Option 2: separate classified areas dataframe areas <- data.frame(lat = c(51.5), lon = c(-4.2), shellfish_class = c("B")) result <- add_shellfish_classification(result, classified_areas = areas) # Option 3: live fetch (internet required) result <- add_shellfish_classification(result, fetch_live = TRUE)# Option 1: manual column already in data result <- add_shellfish_classification(result, class_col = "water_class") # Option 2: separate classified areas dataframe areas <- data.frame(lat = c(51.5), lon = c(-4.2), shellfish_class = c("B")) result <- add_shellfish_classification(result, classified_areas = areas) # Option 3: live fetch (internet required) result <- add_shellfish_classification(result, fetch_live = TRUE)
Re-scores each row of a survey dataframe n_boot times, each time adding
Gaussian noise to every numeric variable proportional to its measurement
uncertainty, and returns the 5th and 95th percentile suitability scores as
suit_ci_lower and suit_ci_upper columns (90% interval).
Measurement uncertainty is specified via the uncertainty argument: a named
list where each name matches a variable name in tolerances$scored_factors
and the value is the 1-sigma absolute error (same units as the variable).
Any variable not listed defaults to 2% of its observed value.
add_suitability_ci( result, tolerances, n_boot = 200L, uncertainty = NULL, seed = 42L, verbose = FALSE )add_suitability_ci( result, tolerances, n_boot = 200L, uncertainty = NULL, seed = 42L, verbose = FALSE )
result |
Dataframe returned by |
tolerances |
Species tolerance list from |
n_boot |
Integer. Number of bootstrap resamples (default 200). |
uncertainty |
Named numeric vector. 1-sigma measurement errors by
variable name (e.g. |
seed |
Integer or NULL. Random seed for reproducibility (default 42). |
verbose |
Logical. Print progress (default FALSE). |
The input dataframe with additional columns:
suit_ci_lower, suit_ci_upper, suit_ci_width.
result <- predict_oyster(survey, "ostrea_edulis") tol <- get_species_tolerances("ostrea_edulis") result <- add_suitability_ci(result, tol, uncertainty = c(temperature = 0.3, salinity = 0.5)) # Columns suit_ci_lower, suit_ci_upper, suit_ci_width now present generate_report(result, "report.html") # map rings scale with CI widthresult <- predict_oyster(survey, "ostrea_edulis") tol <- get_species_tolerances("ostrea_edulis") result <- add_suitability_ci(result, tol, uncertainty = c(temperature = 0.3, salinity = 0.5)) # Columns suit_ci_lower, suit_ci_upper, suit_ci_width now present generate_report(result, "report.html") # map rings scale with CI width
Builds a spatial graph of cells above a suitability threshold and identifies connected components – contiguous patches of suitable habitat. Isolated cells or small patches are flagged because they are at higher risk of:
Recruitment failure – larvae from isolated populations may be flushed away before settling back in the patch.
Genetic bottlenecks – small isolated populations have lower allelic diversity and reduced adaptive capacity.
Local extinction without recolonisation – if a patch is lost, there are no connected source populations to recolonise.
Two cells are considered connected if they are within gap_m metres of
each other (default 500 m – approximately the scale of larval dispersal
during a tidal cycle at moderate current speeds). Connectivity is computed
using a fast union-find (disjoint set) algorithm – no external graph
packages required.
analyse_connectivity( result, min_suitability = 0.5, gap_m = 500, min_patch_cells = 3L, verbose = TRUE )analyse_connectivity( result, min_suitability = 0.5, gap_m = 500, min_patch_cells = 3L, verbose = TRUE )
result |
Dataframe from |
min_suitability |
Numeric. Minimum suitability score for a cell to be included in the connectivity analysis (default 0.5, i.e. Moderate+). |
gap_m |
Numeric. Maximum distance in metres between two suitable cells for them to be considered connected (default 500 m). |
min_patch_cells |
Integer. Patches smaller than this are flagged as isolated (default 3 cells). |
verbose |
Logical. Print connectivity summary (default TRUE). |
Input dataframe with additional columns:
patch_id (integer – unique patch identifier; NA for non-suitable cells),
patch_size (number of cells in the patch),
patch_area_km2 (approximate area based on cell density),
connectivity_class ("isolated", "small", "moderate", "large"),
is_hub (logical – TRUE for the highest-scoring cell in each patch,
useful as candidate introduction points).
result <- predict_oyster(survey, "ostrea_edulis") result <- analyse_connectivity(result, gap_m = 500) # Large well-connected patches are the best restoration targets good_patches <- subset(result, connectivity_class == "large" & suitability_class == "High") # Count patches table(result$connectivity_class) # Visualise -- patch_id maps to distinct colours in QGIS export_geotiff(result, "connectivity.tif")result <- predict_oyster(survey, "ostrea_edulis") result <- analyse_connectivity(result, gap_m = 500) # Large well-connected patches are the best restoration targets good_patches <- subset(result, connectivity_class == "large" & suitability_class == "High") # Count patches table(result$connectivity_class) # Visualise -- patch_id maps to distinct colours in QGIS export_geotiff(result, "connectivity.tif")
Evaluates which aquaculture and restoration gear types are physically viable at each survey location based on depth, current velocity, substrate, and (optionally) wave height. Returns a feasibility score and binary flag for each gear type.
This function is part of the planning module aimed at commercial
operators. For pure restoration work, the restoration_reef gear type is
always included and is the most relevant output.
assess_gear_feasibility( result, gear_types = names(.gear_specs), depth_col = "depth", current_col = "current_velocity", substrate_col = "substrate_hardness_class", wave_col = NULL, verbose = TRUE )assess_gear_feasibility( result, gear_types = names(.gear_specs), depth_col = "depth", current_col = "current_velocity", substrate_col = "substrate_hardness_class", wave_col = NULL, verbose = TRUE )
result |
Dataframe from |
gear_types |
Character vector of gear types to assess. Options:
|
depth_col |
Character. Column name for depth below CD in metres
(default |
current_col |
Character. Column name for current velocity in m/s
(default |
substrate_col |
Character. Column for substrate class
(default |
wave_col |
Character or NULL. Column for significant wave height Hs in metres (default NULL – wave check skipped if absent). |
verbose |
Logical. Print feasibility summary (default TRUE). |
Input dataframe with added columns per gear type:
gear_<type>_feasible (logical),
gear_<type>_score [0-1],
plus best_gear (most feasible option) and n_gear_options (count).
result <- predict_oyster(survey, "ostrea_edulis") result <- assess_gear_feasibility(result) # Sites where longline is feasible AND suitability is High longline_sites <- subset(result, gear_longline_feasible & suitability_class == "High") # Restoration reef sites (no farming licence needed) reef_sites <- subset(result, gear_restoration_reef_feasible)result <- predict_oyster(survey, "ostrea_edulis") result <- assess_gear_feasibility(result) # Sites where longline is feasible AND suitability is High longline_sites <- subset(result, gear_longline_feasible & suitability_class == "High") # Restoration reef sites (no farming licence needed) reef_sites <- subset(result, gear_restoration_reef_feasible)
Finds the nearest standard port to the survey area, checks whether it is
within a configurable distance threshold, predicts tidal heights for every
survey timestamp using embedded 5-constituent harmonic models
(M2, S2, N2, K1, O1), and applies correct_to_chart_datum() automatically.
No external data or internet connection is required. Harmonic constituent data (amplitude and Greenwich phase lag) for 31 UK and European ports are embedded in the package.
Accuracy: 5-constituent predictions are typically accurate to +/-0.15–0.35 m
at the standard port. Accuracy degrades with distance from the port and in
areas with complex tidal dynamics (e.g. the Bristol Channel, inner fjords).
For safety-critical work, use official UKHO EasyTide predictions and supply
them manually via correct_to_chart_datum().
Requirements: The dataframe must contain a datetime column. Values must
be parseable by as.POSIXct() (e.g. "2024-06-15 10:30:00" or
"2024-06-15"). Date-only strings default to noon UTC.
auto_tidal_correct( df, datetime_col = "date", depth_col = "depth", max_port_dist_km = 75, keep_raw = TRUE, verbose = TRUE )auto_tidal_correct( df, datetime_col = "date", depth_col = "depth", max_port_dist_km = 75, keep_raw = TRUE, verbose = TRUE )
df |
A dataframe with at minimum a depth and a datetime column.
Typically the output of |
datetime_col |
Character. Name of the datetime column (default
|
depth_col |
Character. Name of the depth column to correct
(default |
max_port_dist_km |
Numeric. Maximum distance in km from the survey
centroid to the nearest standard port. If the nearest port is further
than this, no correction is applied and a warning is issued
(default |
keep_raw |
Logical. Retain original depths in a |
verbose |
Logical. Print port selection, distance, and prediction
summary (default |
The input dataframe with depths corrected to Chart Datum, plus
a tidal_height_pred_m column containing the predicted tidal height
used for each row.
correct_to_chart_datum() for manual correction,
tidal_port_info() for port datum reference data.
# Automatic correction -- finds nearest port, predicts heights, corrects depths survey_corrected <- auto_tidal_correct(survey, datetime_col = "date") # Tighten the distance threshold (only trust ports within 40 km) survey_corrected <- auto_tidal_correct(survey, max_port_dist_km = 40) # See which port was selected and inspect predicted heights survey_corrected$tidal_height_pred_m# Automatic correction -- finds nearest port, predicts heights, corrects depths survey_corrected <- auto_tidal_correct(survey, datetime_col = "date") # Tighten the distance threshold (only trust ports within 40 km) survey_corrected <- auto_tidal_correct(survey, max_port_dist_km = 40) # See which port was selected and inspect predicted heights survey_corrected$tidal_height_pred_m
Applies hard-stop exclusion criteria for a given species tolerance profile. Any location that violates one or more exclusion rules receives a flag and is removed from the suitability scoring pipeline.
Exclusion factors checked (where data columns are present):
General temperature range
Season-specific temperature floors/ceilings (requires season column)
Salinity (with temperature-dependent minimum)
Dissolved oxygen (hypoxia threshold)
check_exclusions(df, tolerances)check_exclusions(df, tolerances)
df |
A dataframe. Must contain at minimum a subset of the standard column names (see Column Naming below). Any exclusion factor for which no column is present is silently skipped with a warning. |
tolerances |
A species tolerance list as returned by
|
The input dataframe with two additional columns:
excluded: logical – TRUE if the location fails any exclusion criterion.
exclusion_reason: character – semicolon-separated list of failed criteria,
or NA if not excluded.
The function recognises these column names (case-insensitive):
| Variable | Expected column name(s) |
| Temperature | temperature, temp |
| Salinity | salinity, sal |
| Dissolved oxygen | dissolved_oxygen, do, do_mgl |
| Season | season (auto-added by add_season_column())
|
tol <- get_species_tolerances("ostrea_edulis") df <- data.frame( lat = 51.5, lon = -2.5, temperature = 8, salinity = 28, dissolved_oxygen = 7 ) check_exclusions(df, tol)tol <- get_species_tolerances("ostrea_edulis") df <- data.frame( lat = 51.5, lon = -2.5, temperature = 8, salinity = 28, dissolved_oxygen = 7 ) check_exclusions(df, tol)
Uses the intensity of acoustic backscatter from the near-seabed bin(s) to classify substrate type. Hard substrates (gravel, rock, shell hash) scatter more sound energy than soft substrates (mud, fine sand). The function converts raw backscatter (dB re 1 m^-1 or instrument counts) to a categorical substrate class and a continuous hardness index [0, 1].
Classification thresholds (default, adjustable):
| Class | Hardness index | Typical substrate |
| Hard | 0.8 -- 1.0 | Rock, cobble, shell hash |
| Mixed | 0.5 -- 0.8 | Gravel/sand mix, maerl beds |
| Soft | 0.2 -- 0.5 | Sand, sandy mud |
| Very Soft | 0.0 -- 0.2 | Mud, fine silt |
The hardness index is a min-max normalisation of the mean near-seabed
backscatter across the bottom n_bottom_bins cells. If the column
contains calibrated volume backscatter (Sv in dB), set is_sv = TRUE
and a sign-flip is applied (less negative Sv = stronger return = harder).
classify_substrate_from_backscatter( df, backscatter_col, is_sv = FALSE, bs_min = NULL, bs_max = NULL, hard_thresh = 0.8, mixed_thresh = 0.5, soft_thresh = 0.2, output_col = "substrate_hardness", verbose = TRUE )classify_substrate_from_backscatter( df, backscatter_col, is_sv = FALSE, bs_min = NULL, bs_max = NULL, hard_thresh = 0.8, mixed_thresh = 0.5, soft_thresh = 0.2, output_col = "substrate_hardness", verbose = TRUE )
df |
Dataframe with at least one column of seabed backscatter values. |
backscatter_col |
Character. Name of the backscatter column. |
is_sv |
Logical. If TRUE, treats values as Sv (dB, typically negative); hardness is derived from abs(Sv) after sign convention correction. Default FALSE (raw instrument counts, higher = harder). |
bs_min |
Numeric. Backscatter value mapped to hardness = 0 (softest). Defaults to 5th percentile of observed data. |
bs_max |
Numeric. Backscatter value mapped to hardness = 1 (hardest). Defaults to 95th percentile of observed data. |
hard_thresh |
Numeric. Hardness index above which substrate is "Hard" (default 0.8). |
mixed_thresh |
Numeric. Hardness index above which substrate is "Mixed" (default 0.5). |
soft_thresh |
Numeric. Hardness index above which substrate is "Soft" (default 0.2); below is "Very Soft". |
output_col |
Character. Base name for output columns. Two columns are
added: |
verbose |
Logical. Print classification summary (default TRUE). |
Input dataframe with substrate_hardness_index and
substrate_hardness_class columns appended. These map directly to the
substrate_hardness scored variable in the tolerance spec.
# ADCP data with near-seabed backscatter column survey <- classify_substrate_from_backscatter(survey, backscatter_col = "bottom_backscatter_db", is_sv = TRUE) # The substrate_hardness_index column feeds directly into predict_oyster() # as the 'substrate_hardness' variable names(survey)[names(survey) == "substrate_hardness_index"] <- "substrate_hardness" result <- predict_oyster(survey, "ostrea_edulis")# ADCP data with near-seabed backscatter column survey <- classify_substrate_from_backscatter(survey, backscatter_col = "bottom_backscatter_db", is_sv = TRUE) # The substrate_hardness_index column feeds directly into predict_oyster() # as the 'substrate_hardness' variable names(survey)[names(survey) == "substrate_hardness_index"] <- "substrate_hardness" result <- predict_oyster(survey, "ostrea_edulis")
Runs predict_oyster() for every specified species against the same survey
dataset and returns a combined comparison table – one row per survey
location, one suitability column per species. An optional summary prints
which species is best-suited to each location.
This is the tool to use when you haven't yet decided which species to stock, or when you want to identify areas where multiple species overlap in suitability (robust site selection).
compare_species( data, species = NULL, output_dir = NULL, min_data_quality = "low", verbose = TRUE )compare_species( data, species = NULL, output_dir = NULL, min_data_quality = "low", verbose = TRUE )
data |
A dataframe or CSV path accepted by |
species |
Character vector of species keys to compare. Defaults to all
species currently in the oystermapR database (see |
output_dir |
Character or |
min_data_quality |
Character. Minimum data quality tier to include:
|
verbose |
Logical. Print a comparison summary per species (default |
A dataframe with all original survey columns plus:
suit_<species_key>: suitability score [0, 1] per species
class_<species_key>: suitability class per species
best_species: key of the species with the highest suitability at each
location (NA if all excluded)
best_suitability: the highest suitability score across all species
n_species_high: number of species scoring "High" at each location
(useful for identifying robust multi-species sites)
# Compare all supported species comparison <- compare_species(survey) # Compare only well-characterised species comparison <- compare_species(survey, min_data_quality = "high") # Compare specific species and export per-species heatmaps comparison <- compare_species( survey, species = c("ostrea_edulis", "magallana_gigas"), output_dir = "comparison_maps/" ) # Find sites suitable for both O. edulis AND M. gigas both_high <- subset(comparison, n_species_high >= 2)# Compare all supported species comparison <- compare_species(survey) # Compare only well-characterised species comparison <- compare_species(survey, min_data_quality = "high") # Compare specific species and export per-species heatmaps comparison <- compare_species( survey, species = c("ostrea_edulis", "magallana_gigas"), output_dir = "comparison_maps/" ) # Find sites suitable for both O. edulis AND M. gigas both_high <- subset(comparison, n_species_high >= 2)
Takes a named list of predict_oyster() result dataframes and produces a
single comparative summary suitable for trend monitoring, multi-site
selection, or regulatory reporting. Common spatial cells are matched across
surveys and a change analysis is run between consecutive surveys if ordered
chronologically.
The function returns a list with three elements:
summary – one row per survey with mean suitability, class breakdown,
and data completeness stats.
spatial – all surveys joined on shared lat/lon cells, with suitability
columns per survey and trend_slope (linear regression of suitability
over survey index for each cell).
change – pairwise change table (only if surveys are in order; each
consecutive pair shows mean score difference and fraction of cells
that improved, degraded, or were stable).
compare_surveys( surveys, cell_deg = 0.001, stable_threshold = 0.05, verbose = TRUE )compare_surveys( surveys, cell_deg = 0.001, stable_threshold = 0.05, verbose = TRUE )
surveys |
Named list of dataframes from |
cell_deg |
Numeric. Spatial rounding tolerance in decimal degrees for matching cells across surveys (default 0.001 approx. 111 m). |
stable_threshold |
Numeric. Absolute suitability change below which a cell is considered "stable" rather than improved/degraded (default 0.05). |
verbose |
Logical. Print comparison summary (default TRUE). |
Named list: summary (dataframe), spatial (dataframe),
change (dataframe or NULL if only one survey).
r2022 <- predict_oyster(survey_2022, "ostrea_edulis") r2023 <- predict_oyster(survey_2023, "ostrea_edulis") r2024 <- predict_oyster(survey_2024, "ostrea_edulis") comp <- compare_surveys( surveys = list("2022" = r2022, "2023" = r2023, "2024" = r2024) ) comp$summary # mean scores by year comp$change # year-on-year change # Pass to generate_report() for a full comparative HTML report generate_report(r2024, "monitoring_report.html", title = "Kames Bay 3-Year Monitoring")r2022 <- predict_oyster(survey_2022, "ostrea_edulis") r2023 <- predict_oyster(survey_2023, "ostrea_edulis") r2024 <- predict_oyster(survey_2024, "ostrea_edulis") comp <- compare_surveys( surveys = list("2022" = r2022, "2023" = r2023, "2024" = r2024) ) comp$summary # mean scores by year comp$change # year-on-year change # Pass to generate_report() for a full comparative HTML report generate_report(r2024, "monitoring_report.html", title = "Kames Bay 3-Year Monitoring")
Oysters are permanent residents; a location is only truly suitable if
conditions are adequate year-round. composite_seasonal() takes a named
list of predict_oyster() result dataframes (one per season or survey
visit) and produces a single composite dataframe with one row per spatial
cell, summarising suitability across all input surveys.
Composite methods:
"min" (default) – most conservative; a location must be suitable in
every season to score well. Best for regulatory site selection.
"mean" – average across seasons. Good for monitoring trend summaries.
"prob" – fraction of seasons with suitability >= prob_threshold.
Useful when surveys are irregularly spaced or some seasons are missing.
Spatial matching uses a grid cell tolerance of cell_deg decimal degrees
(default 0.001 approx. 111 m). Cells not present in all seasons are flagged in
n_seasons_present.
composite_seasonal( surveys, method = c("min", "mean", "prob"), prob_threshold = 0.5, cell_deg = 0.001, verbose = TRUE )composite_seasonal( surveys, method = c("min", "mean", "prob"), prob_threshold = 0.5, cell_deg = 0.001, verbose = TRUE )
surveys |
Named list of dataframes from |
method |
Character. One of |
prob_threshold |
Numeric [0,1]. Suitability threshold used when
|
cell_deg |
Numeric. Spatial tolerance in decimal degrees for matching cells across surveys (default 0.001). |
verbose |
Logical. Print summary (default TRUE). |
A dataframe with columns: lat, lon, suitability_composite,
suitability_class, n_seasons_present, one suit_<season> column per
input survey, and suit_range (max - min across seasons).
spring <- predict_oyster(survey_spring, "ostrea_edulis") summer <- predict_oyster(survey_summer, "ostrea_edulis") autumn <- predict_oyster(survey_autumn, "ostrea_edulis") composite <- composite_seasonal( surveys = list(spring = spring, summer = summer, autumn = autumn), method = "min" ) generate_report(composite, "composite_report.html", title = "Year-round Suitability Assessment")spring <- predict_oyster(survey_spring, "ostrea_edulis") summer <- predict_oyster(survey_summer, "ostrea_edulis") autumn <- predict_oyster(survey_autumn, "ostrea_edulis") composite <- composite_seasonal( surveys = list(spring = spring, summer = summer, autumn = autumn), method = "min" ) generate_report(composite, "composite_report.html", title = "Year-round Suitability Assessment")
Converts water depths recorded during a survey to depths below Chart Datum (approximately Lowest Astronomical Tide, LAT) by adding the tidal height above Chart Datum at the time of survey. This removes bias introduced by surveying at different states of the tide.
Formula: depth_CD = depth_surveyed + tidal_height_m
Finding your tidal height:
UK: UKHO EasyTide (easytide.ukho.gov.uk) – free 7-day predictions for any standard port. Download the predicted heights for your survey period.
Ireland: Marine Institute tide gauges (data.marine.ie)
France/EU: SHOM (shom.fr) or Copernicus Marine (marine.copernicus.eu)
Any port: apps such as PocketTides, Tides Near Me, or TidePod give height-above-CD predictions for named ports.
For a typical loch or sheltered bay survey, reading the tidal height from a tide table for the nearest standard port at the mid-point of the survey is sufficient. For multi-day surveys or those spanning a full tidal cycle, supply per-row heights matched to the survey timestamp.
correct_to_chart_datum( df, tidal_height_m = NULL, depth_col = "depth", datetime_col = NULL, keep_raw = TRUE, verbose = TRUE )correct_to_chart_datum( df, tidal_height_m = NULL, depth_col = "depth", datetime_col = NULL, keep_raw = TRUE, verbose = TRUE )
df |
A dataframe containing at minimum a depth column. Typically the
output of |
tidal_height_m |
Numeric scalar or vector. Tidal height above Chart
Datum at the time of survey, in metres. If a scalar, the same offset
is applied to all rows. If a vector, must be the same length as |
depth_col |
Character. Name of the depth column to correct
(default |
datetime_col |
Character or |
keep_raw |
Logical. If |
verbose |
Logical. Print correction summary (default |
The input dataframe with the depth column corrected to Chart Datum.
If keep_raw = TRUE, the original values are preserved as depth_raw_m.
# Survey conducted around high water at Oban -- tidal height ~3.1 m above CD survey_corrected <- correct_to_chart_datum(survey, tidal_height_m = 3.1) # Per-row heights from a tide gauge CSV matched to survey timestamps tide_series <- read.csv("oban_tide_gauge.csv") # (match to survey rows by timestamp -- user responsibility) survey_corrected <- correct_to_chart_datum( survey, tidal_height_m = tide_series$height_m )# Survey conducted around high water at Oban -- tidal height ~3.1 m above CD survey_corrected <- correct_to_chart_datum(survey, tidal_height_m = 3.1) # Per-row heights from a tide gauge CSV matched to survey timestamps tide_series <- read.csv("oban_tide_gauge.csv") # (match to survey rows by timestamp -- user responsibility) survey_corrected <- correct_to_chart_datum( survey, tidal_height_m = tide_series$height_m )
Returns the meteorological season ("winter", "spring", "summer", "autumn") for a given date and latitude. Hemisphere is inferred from the sign of latitude (positive = Northern Hemisphere, negative = Southern Hemisphere).
detect_season(date, lat)detect_season(date, lat)
date |
A |
lat |
Numeric. Latitude in decimal degrees. Positive = N hemisphere. |
A character string: one of "winter", "spring", "summer",
"autumn".
detect_season("2024-01-15", lat = 51.5) # "winter" (UK) detect_season("2024-07-15", lat = 51.5) # "summer" detect_season("2024-01-15", lat = -33.9) # "summer" (Sydney)detect_season("2024-01-15", lat = 51.5) # "winter" (UK) detect_season("2024-07-15", lat = 51.5) # "summer" detect_season("2024-01-15", lat = -33.9) # "summer" (Sydney)
Uses the empirical log-linear relationship between volume backscatter
strength (Sv) and chlorophyll-a concentration to add an estimated
chlorophyll_a column to a survey dataframe. Intended for surveys where
a fluorometer was not deployed and the ADCP backscatter intensity is
available.
The conversion is: Chl_a (ugg/L) = 10 ^ (a * Sv + b)
Default a and b constants are derived from Deines (1999) for three
common ADCP frequencies (300, 600, 1200 kHz). For best results, calibrate
against concurrent water samples from your survey using the calibration
argument.
estimate_chlorophyll_from_backscatter( df, backscatter_col, frequency_khz = 600, calibration = NULL, min_chl = 0.05, max_chl = 50, keep_raw = TRUE, verbose = TRUE )estimate_chlorophyll_from_backscatter( df, backscatter_col, frequency_khz = 600, calibration = NULL, min_chl = 0.05, max_chl = 50, keep_raw = TRUE, verbose = TRUE )
df |
Dataframe containing a backscatter column and optional temperature and depth columns. |
backscatter_col |
Character. Name of the column containing volume
backscatter strength values (Sv in dB re 1 |
frequency_khz |
Numeric. ADCP operating frequency in kHz. Used to
select default calibration constants if |
calibration |
Numeric vector of length 2: |
min_chl |
Numeric. Floor for estimated chlorophyll-a in ugg/L
(default |
max_chl |
Numeric. Ceiling for estimated chlorophyll-a in ugg/L
(default |
keep_raw |
Logical. If |
verbose |
Logical. Print conversion summary and warnings (default |
The input dataframe with an additional chlorophyll_a column
(ugg/L). If a chlorophyll_a column already exists, a warning is issued
and the existing column is overwritten.
Collect 5–15 grab/Niskin water samples distributed across your survey area and depth range, then:
# Fit calibration from samples
fit <- lm(log10(sample_chl) ~ backscatter, data = samples_df)
a <- coef(fit)[["backscatter"]]
b <- coef(fit)[["(Intercept)"]]
df <- estimate_chlorophyll_from_backscatter(
df, "amp_mean_dB", frequency_khz = 600,
calibration = c(a, b))
# Default calibration (300 kHz Nortek Signature) adcp <- read_nortek_adcp("survey.csv") adcp <- estimate_chlorophyll_from_backscatter( adcp, backscatter_col = "amp_mean_dB", frequency_khz = 300) # Custom calibration from water samples adcp <- estimate_chlorophyll_from_backscatter( adcp, "amp_mean_dB", frequency_khz = 300, calibration = c(a = 0.041, b = -0.52))# Default calibration (300 kHz Nortek Signature) adcp <- read_nortek_adcp("survey.csv") adcp <- estimate_chlorophyll_from_backscatter( adcp, backscatter_col = "amp_mean_dB", frequency_khz = 300) # Custom calibration from water samples adcp <- estimate_chlorophyll_from_backscatter( adcp, "amp_mean_dB", frequency_khz = 300, calibration = c(a = 0.041, b = -0.52))
Simple great-circle ray-tracing to estimate how far open water extends
from a point in a given direction, up to max_fetch_km. This is a
lightweight GIS-free approximation – for precise fetch, use QGIS or
the fetchR R package.
estimate_fetch( lat, lon, bearing_deg, land_polygons = NULL, max_fetch_km = 200, step_km = 1 )estimate_fetch( lat, lon, bearing_deg, land_polygons = NULL, max_fetch_km = 200, step_km = 1 )
lat |
Numeric. Site latitude (decimal degrees). |
lon |
Numeric. Site longitude (decimal degrees). |
bearing_deg |
Numeric. Prevailing wind direction to check (0 = North). |
land_polygons |
SpatialPolygons or sf POLYGON object representing land. If NULL, returns max_fetch_km (fully open ocean assumed). |
max_fetch_km |
Numeric. Maximum fetch to search (default 200 km). |
step_km |
Numeric. Ray step size (default 1 km). |
Numeric. Estimated fetch in km.
Derives contour lines from the IDW suitability raster and saves them as a
vector GeoPackage (.gpkg). Load alongside the heatmap in QGIS for the
depth-contour look shown in BioBase outputs.
export_contours(tif_path, interval = 0.1, gpkg_path = NULL)export_contours(tif_path, interval = 0.1, gpkg_path = NULL)
tif_path |
Character. Path to the GeoTIFF produced by |
interval |
Numeric. Contour interval in suitability units (default |
gpkg_path |
Character. Output |
The .gpkg file path (invisibly).
export_geotiff(result, "oyster_heatmap.tif") export_contours("oyster_heatmap.tif")export_geotiff(result, "oyster_heatmap.tif") export_contours("oyster_heatmap.tif")
Takes the scored point dataset from predict_oyster() and produces a
smooth, continuous raster heatmap using Inverse Distance Weighting (IDW)
interpolation – the same technique used by systems like Lowrance BioBase.
The result looks like a fluid heat surface over your survey area rather than
isolated point squares.
The output GeoTIFF contains three bands:
suitability – IDW-interpolated score [0, 1]. Use this for the heatmap.
excluded_mask – 1 where a survey point was hard-excluded, 0 otherwise.
n_observations – how many survey points fell within each raster cell (data density diagnostic).
export_geotiff( df, path, resolution = 2e-04, idw_power = 2, idw_max_dist = NULL, buffer = 0.001, contours = TRUE, crs = "EPSG:4326", overwrite = TRUE )export_geotiff( df, path, resolution = 2e-04, idw_power = 2, idw_max_dist = NULL, buffer = 0.001, contours = TRUE, crs = "EPSG:4326", overwrite = TRUE )
df |
A dataframe processed by |
path |
Character. Output |
resolution |
Numeric. Raster cell size in decimal degrees. Default
|
idw_power |
Numeric. IDW distance decay exponent (default |
idw_max_dist |
Numeric or |
buffer |
Numeric. Padding in degrees added around the survey extent
before building the raster grid (default |
contours |
Logical. If |
crs |
Character. CRS string (default |
overwrite |
Logical. Overwrite existing file (default |
The output file path (invisibly). Also writes a .qml QGIS style
file alongside the .tif automatically.
result <- predict_oyster("survey.csv", "ostrea_edulis") # Standard export -- smooth heatmap with auto radius and contours export_geotiff(result, "oyster_heatmap.tif") # Finer resolution for a small harbour survey export_geotiff(result, "oyster_heatmap.tif", resolution = 0.0001) # Override IDW radius explicitly (e.g. sparse transect data) export_geotiff(result, "oyster_heatmap.tif", idw_max_dist = 0.005)result <- predict_oyster("survey.csv", "ostrea_edulis") # Standard export -- smooth heatmap with auto radius and contours export_geotiff(result, "oyster_heatmap.tif") # Finer resolution for a small harbour survey export_geotiff(result, "oyster_heatmap.tif", resolution = 0.0001) # Override IDW radius explicitly (e.g. sparse transect data) export_geotiff(result, "oyster_heatmap.tif", idw_max_dist = 0.005)
Writes a QGIS Layer Style file (.qml) using a yellow -> orange -> red ->
dark red colour ramp, closely matching the BioBase/Lowrance heatmap style.
The file is automatically applied when its name matches the .tif – no
manual styling needed in QGIS.
export_qml_style(tif_path)export_qml_style(tif_path)
tif_path |
Character. Path to the |
The .qml file path (invisibly).
Convenience wrapper that fetches data from one or more live sources and returns a named list of dataframes that can be passed to the relevant scoring functions. Each source can be fetched independently or all at once.
This function requires internet access and registered API credentials where
applicable (see oystermapR_live_config()). If a source fails, it is
silently skipped and not included in the output - the function never aborts
due to a single failed source.
fetch_live_environmental_data( survey, sources = "all", date_range = NULL, verbose = TRUE )fetch_live_environmental_data( survey, sources = "all", date_range = NULL, verbose = TRUE )
survey |
Dataframe with |
sources |
Character vector. Which data sources to fetch. Options:
|
date_range |
Character vector of length 2: |
verbose |
Logical. Print progress (default TRUE). |
Named list. Each successfully fetched source produces one element:
$cmems: SST, salinity, chlorophyll grid
$ices_hab: HAB event records within extent
$ices_vms: Swept-area ratio per ICES c-square
$emodnet_predators: Predator occurrence records
$fsa_shellfish: Shellfish classification polygons / area descriptions
## Not run: oystermapR_live_config(cmems_user = "u", cmems_password = "p") live <- fetch_live_environmental_data(survey, sources = c("ices_hab", "ices_vms")) result <- score_hab_risk(result, hab_data = live$ices_hab, species = "ostrea_edulis") result <- score_anthropogenic_disturbance(result, trawling_data = live$ices_vms) ## End(Not run)## Not run: oystermapR_live_config(cmems_user = "u", cmems_password = "p") live <- fetch_live_environmental_data(survey, sources = c("ices_hab", "ices_vms")) result <- score_hab_risk(result, hab_data = live$ices_hab, species = "ostrea_edulis") result <- score_anthropogenic_disturbance(result, trawling_data = live$ices_vms) ## End(Not run)
Renders a structured assessment report from the output of predict_oyster().
The report includes an executive summary, suitability class breakdown,
per-variable scoring table, most common limiting factors, top introduction
sites, and a methodology section.
Output format is PDF by default (requires a LaTeX installation – see Note below). HTML is also supported and requires no additional system software.
generate_report( result, output = "oyster_report.html", title = "Oyster Suitability Assessment", author = "", species = "ostrea_edulis", top_n = 5L, validation = NULL, comparison = NULL, open = TRUE )generate_report( result, output = "oyster_report.html", title = "Oyster Suitability Assessment", author = "", species = "ostrea_edulis", top_n = 5L, validation = NULL, comparison = NULL, open = TRUE )
result |
A dataframe returned by |
output |
Character. Output file path. Extension determines format:
|
title |
Character. Report title. Default |
author |
Character. Author name(s) to appear on the title page.
Default |
species |
Character. Species key used in |
top_n |
Integer. Number of top introduction sites to include in the
report (default |
validation |
Named list or |
comparison |
Dataframe or |
open |
Logical. Open the report in your default viewer after rendering
(default |
The output file path (invisibly).
HTML reports (recommended) require plotly and scales:
install.packages(c("plotly","scales")).
PDF reports additionally require a LaTeX installation; see the
Note section below.
PDF output requires LaTeX. If you don't have LaTeX installed,
either use output = "report.html" (no extra software needed), or install
a minimal LaTeX distribution with:
install.packages("tinytex")
tinytex::install_tinytex()
result <- predict_oyster(survey, "ostrea_edulis", output_geotiff = "kames_bay.tif") # Standard HTML report (recommended -- Plotly charts, no LaTeX needed) generate_report(result, "kames_bay_report.html", title = "Kames Bay Oyster Habitat Assessment", author = "T. Tucker") # With validation results embedded val <- validate_against_records(result, nbn_records) generate_report(result, "kames_bay_validated.html", validation = val) # With competition adjustment (when comparing species) comp <- compare_species(survey, species = c("ostrea_edulis", "magallana_gigas")) generate_report(result, "kames_bay_competition.html", comparison = comp) # PDF report (requires LaTeX / tinytex) generate_report(result, "kames_bay_report.pdf", title = "Kames Bay Oyster Habitat Assessment")result <- predict_oyster(survey, "ostrea_edulis", output_geotiff = "kames_bay.tif") # Standard HTML report (recommended -- Plotly charts, no LaTeX needed) generate_report(result, "kames_bay_report.html", title = "Kames Bay Oyster Habitat Assessment", author = "T. Tucker") # With validation results embedded val <- validate_against_records(result, nbn_records) generate_report(result, "kames_bay_validated.html", validation = val) # With competition adjustment (when comparing species) comp <- compare_species(survey, species = c("ostrea_edulis", "magallana_gigas")) generate_report(result, "kames_bay_competition.html", comparison = comp) # PDF report (requires LaTeX / tinytex) generate_report(result, "kames_bay_report.pdf", title = "Kames Bay Oyster Habitat Assessment")
Produces a self-contained multi-page A4 PDF directly from the output of
predict_oyster() without requiring LaTeX, pandoc, or Rmd rendering.
The PDF is designed for printing and contains four pages:
Page 1 – Title header, executive summary table, class breakdown bar
Page 2 – Suitability heatmap (survey points, continuous colour ramp)
Page 3 – Per-variable mean score chart (horizontal bar)
Page 4 – Score distribution histogram + class count bar chart
generate_summary_pdf( result, output = "oyster_summary.pdf", species = "ostrea_edulis", title = "Oyster Suitability Summary", author = "", open = TRUE, verbose = TRUE )generate_summary_pdf( result, output = "oyster_summary.pdf", species = "ostrea_edulis", title = "Oyster Suitability Summary", author = "", open = TRUE, verbose = TRUE )
result |
A dataframe returned by |
output |
Character. Output |
species |
Character. Species key used in |
title |
Character. Report title shown in the header.
Default |
author |
Character. Author name (optional, shown in header). |
open |
Logical. Open the PDF after writing (default |
verbose |
Logical. Print progress messages (default |
The output file path (invisibly).
result <- predict_oyster(survey_df, "ostrea_edulis") generate_summary_pdf(result, "kames_bay_summary.pdf", title = "Kames Bay -- Spring Survey", author = "T. Tucker")result <- predict_oyster(survey_df, "ostrea_edulis") generate_summary_pdf(result, "kames_bay_summary.pdf", title = "Kames Bay -- Spring Survey", author = "T. Tucker")
Get species tolerance parameters
get_species_tolerances(species)get_species_tolerances(species)
species |
Character. Species key (e.g. |
Named list of tolerance parameters for the species.
tol <- get_species_tolerances("ostrea_edulis") tol$exclusions$temperaturetol <- get_species_tolerances("ostrea_edulis") tol$exclusions$temperature
Returns the tolerance parameter list currently held in the session cache,
after any update_species_tolerances() calls. If no update has been run,
returns the built-in prior parameters unchanged.
get_tolerance_posteriors(species, var = NULL)get_tolerance_posteriors(species, var = NULL)
species |
Character. Species key (e.g. |
var |
Character or NULL. If specified, returns parameters for that
variable only; otherwise returns the full |
Named list of tolerance parameters (same structure as
get_species_tolerances()$scored).
# After calling update_species_tolerances() post <- get_tolerance_posteriors("ostrea_edulis") post$temperature # updated optimal_min, optimal_max with CIs# After calling update_species_tolerances() post <- get_tolerance_posteriors("ostrea_edulis") post$temperature # updated optimal_min, optimal_max with CIs
From a project_suitability() output, identifies locations that maintain
at least min_class suitability across all projected scenarios. These are
the most robust sites for long-term restoration or aquaculture investment.
identify_resilient_sites(projections, baseline, min_class = "Moderate")identify_resilient_sites(projections, baseline, min_class = "Moderate")
projections |
Named list returned by |
baseline |
Dataframe from |
min_class |
Character. Minimum suitability class required in all
scenarios. One of |
Subset of baseline for locations that meet min_class in every
scenario, with an added n_scenarios_qualifying column.
proj <- project_suitability(result, tol) resilient <- identify_resilient_sites(proj, result, min_class = "Moderate") nrow(resilient) # sites that stay Moderate+ even under RCP8.5proj <- project_suitability(result, tol) resilient <- identify_resilient_sites(proj, result, min_class = "Moderate") nrow(resilient) # sites that stay Moderate+ even under RCP8.5
Converts sparse point observations (CTD casts, ADCP profiles) to a regular
grid by spatial interpolation. The interpolated grid is returned in the same
dataframe format as the raw survey, and can be passed directly to
predict_oyster().
Method selection (applied per variable independently):
Ordinary kriging is used when >= 50 non-NA observations are available
for a variable, the gstat and sp packages are installed, and the
variogram can be fitted without error. Kriging is the best linear
unbiased estimator (BLUE) under the assumption of spatial stationarity –
it minimises mean squared prediction error and returns a kriging variance
for each cell.
IDW (Inverse Distance Weighting, p = 2) is used as a fallback when kriging cannot be applied. It is fast, deterministic, and requires no model fitting, but produces no uncertainty estimate and can create bull's-eye artefacts around isolated observations.
Kriging variograms are fitted automatically using gstat::fit.variogram()
with a Matern model (default) – the most flexible and statistically
justified model for environmental data. If automatic fitting fails, a
spherical model is tried; if that also fails, IDW is used.
interpolate_survey( survey, vars = NULL, resolution_m = 100, method = c("auto", "kriging", "idw"), kriging_min_n = 50L, idw_power = 2, idw_max_radius_m = NULL, verbose = TRUE )interpolate_survey( survey, vars = NULL, resolution_m = 100, method = c("auto", "kriging", "idw"), kriging_min_n = 50L, idw_power = 2, idw_max_radius_m = NULL, verbose = TRUE )
survey |
Dataframe with |
vars |
Character vector of column names to interpolate. Default: all
numeric columns except |
resolution_m |
Numeric. Grid cell size in metres (default 100 m). Coarsen for sparse surveys (> 500 m) or refine for dense ADCP grids (50 m). |
method |
Character. |
kriging_min_n |
Integer. Minimum observations required to attempt kriging (default 50). |
idw_power |
Numeric. IDW distance decay exponent (default 2). |
idw_max_radius_m |
Numeric. Maximum search radius for IDW neighbours in metres (default 5 * resolution_m). |
verbose |
Logical. Print per-variable method used and diagnostics (default TRUE). |
A dataframe on a regular grid with interpolated values, a method
column per variable (interp_method_<var>), and kriging variance columns
(krige_var_<var>) where kriging was used.
# Auto method -- kriging for dense variables, IDW for sparse grid <- interpolate_survey(survey, resolution_m = 100) result <- predict_oyster(grid, "ostrea_edulis") # Force IDW (fast, no gstat needed) grid <- interpolate_survey(survey, resolution_m = 200, method = "idw") # Inspect which method was used per variable unique(grid$interp_method_temperature) # Kriging variance -- high values = uncertain interpolation hist(grid$krige_var_temperature)# Auto method -- kriging for dense variables, IDW for sparse grid <- interpolate_survey(survey, resolution_m = 100) result <- predict_oyster(grid, "ostrea_edulis") # Force IDW (fast, no gstat needed) grid <- interpolate_survey(survey, resolution_m = 200, method = "idw") # Inspect which method was used per variable unique(grid$interp_method_temperature) # Kriging variance -- high values = uncertain interpolation hist(grid$krige_var_temperature)
List all supported species
list_species()list_species()
Named character vector: names = latin names, values = species keys.
list_species()list_species()
Reads a previously saved .rds file (from save_tolerance_update()) and
restores it into the session cache. Subsequent calls to predict_oyster()
and score_locations() will automatically use the loaded parameters.
load_tolerance_update( species = "all", path = tools::R_user_dir("oystermapR", which = "data"), verbose = TRUE )load_tolerance_update( species = "all", path = tools::R_user_dir("oystermapR", which = "data"), verbose = TRUE )
species |
Character. Species key or |
path |
Character. Directory to load from (default: user-specific data
directory from |
verbose |
Logical. Default TRUE. |
Invisibly: the loaded tolerance list.
load_tolerance_update("ostrea_edulis") # Now predict_oyster() uses the updated tolerances automatically result <- predict_oyster(survey, "ostrea_edulis")load_tolerance_update("ostrea_edulis") # Now predict_oyster() uses the updated tolerances automatically result <- predict_oyster(survey, "ostrea_edulis")
Takes any number of sensor dataframes (from read_nortek_adcp(),
read_generic_csv(), or custom sources) and merges them spatially into a
single table ready for predict_oyster().
Each dataset is first rounded to a common spatial resolution grid. Datasets
are then joined using the grid cell key – cells that exist in multiple
datasets are merged; cells that exist in only one dataset carry NA for
variables from other sensors.
merge_sensor_data(..., spatial_res = 4L, date_source = NULL, verbose = TRUE)merge_sensor_data(..., spatial_res = 4L, date_source = NULL, verbose = TRUE)
... |
Two or more dataframes to merge. Each must contain Multiple runs of the same sensor: If you have several surveys from the
same sensor type, pass them as a list and they will be automatically
stacked via merge_sensor_data( adcp = list(adcp_run1, adcp_run2, adcp_run3), bathy = bathy_df, biobase = biobase_df ) |
spatial_res |
Integer. Decimal places for the common spatial grid
(default |
date_source |
Character. Name of the dataset to take the |
verbose |
Logical. Print a merge summary (default |
A single dataframe with all oystermapR-compatible columns populated
from their respective sensor sources. Columns that weren't available from
any sensor will be absent from the result – predict_oyster() handles
missing columns gracefully.
# Single run per sensor adcp <- read_nortek_adcp("adcp.csv") biobase <- read_generic_csv("biobase.csv") survey <- merge_sensor_data(adcp = adcp, biobase = biobase) # Multiple ADCP runs (different survey days) -- automatically stacked adcp1 <- read_nortek_adcp("survey_jan.csv") adcp2 <- read_nortek_adcp("survey_feb.csv") survey <- merge_sensor_data(adcp = list(adcp1, adcp2), biobase = biobase) result <- predict_oyster(survey, "ostrea_edulis", output_geotiff = "map.tif")# Single run per sensor adcp <- read_nortek_adcp("adcp.csv") biobase <- read_generic_csv("biobase.csv") survey <- merge_sensor_data(adcp = adcp, biobase = biobase) # Multiple ADCP runs (different survey days) -- automatically stacked adcp1 <- read_nortek_adcp("survey_jan.csv") adcp2 <- read_nortek_adcp("survey_feb.csv") survey <- merge_sensor_data(adcp = list(adcp1, adcp2), biobase = biobase) result <- predict_oyster(survey, "ostrea_edulis", output_geotiff = "map.tif")
Prints a step-by-step workflow guide to the console, covering sensor data ingestion, spatial merging, suitability prediction, and QGIS export. Run this any time you need a reminder of the available functions and typical usage patterns.
oystermapR_help(topic = NULL)oystermapR_help(topic = NULL)
topic |
Character. Optionally focus on a specific topic:
|
Called for its side-effect (console output). Returns invisible(NULL).
oystermapR_help() # full guide oystermapR_help("sensors") # sensor ingestion only oystermapR_help("qgis") # QGIS output onlyoystermapR_help() # full guide oystermapR_help("sensors") # sensor ingestion only oystermapR_help("qgis") # QGIS output only
Stores API credentials in R session options so live data fetching can authenticate with external services. Credentials are not written to disk and are lost when the R session ends.
All parameters are optional - only set credentials for the services you intend to use. Functions that require a missing credential will warn and fall back to manual data rather than aborting.
oystermapR_live_config( cmems_user = NULL, cmems_password = NULL, ices_key = NULL, show_config = FALSE )oystermapR_live_config( cmems_user = NULL, cmems_password = NULL, ices_key = NULL, show_config = FALSE )
cmems_user |
Character. CMEMS username. |
cmems_password |
Character. CMEMS password. |
ices_key |
Character. ICES API key (currently optional; ICES APIs are largely open). |
show_config |
Logical. Print current configuration (credentials masked). Default FALSE. |
Invisibly returns a named list of the options set.
CMEMS: Register at marine.copernicus.eu
ICES: API access is open (no key required for HAB/VMS endpoints)
EMODnet Biology: Open WFS, no key required
FSA/DAERA: Open REST API, no key required
## Not run: oystermapR_live_config( cmems_user = "myusername", cmems_password = "mypassword" ) # Then use fetch_live = TRUE in any supporting function score_hab_risk(result, species = "ostrea_edulis", fetch_live = TRUE) ## End(Not run)## Not run: oystermapR_live_config( cmems_user = "myusername", cmems_password = "mypassword" ) # Then use fetch_live = TRUE in any supporting function score_hab_risk(result, species = "ostrea_edulis", fetch_live = TRUE) ## End(Not run)
Converts the output of an OpenDrift settlement simulation (a CSV file with
particle origin and final position) into the connectivity matrix format
expected by score_larval_connectivity(connectivity_matrix = ...).
OpenDrift setup:
Run OpenDrift (OceanDrift or OpenOil adapted for larvae) seeding particles
at each source reef location. Export a CSV with at minimum:
origin_lat, origin_lon – seed position
final_lat, final_lon – settlement position (stranded or settled)
status – filter to "stranded" or "active" as appropriate
The function bins particles into a grid and computes the fraction of particles from each source bin that arrive in each destination bin.
parse_opendrift_connectivity( opendrift_csv, bin_deg = 0.05, status_col = NULL, status_keep = c("stranded", "settled"), min_weight = 0.005, verbose = TRUE )parse_opendrift_connectivity( opendrift_csv, bin_deg = 0.05, status_col = NULL, status_keep = c("stranded", "settled"), min_weight = 0.005, verbose = TRUE )
opendrift_csv |
Character. Path to OpenDrift settlement CSV file. Must
contain columns |
bin_deg |
Numeric. Grid cell size in decimal degrees for binning (default 0.05 approx. 5 km at mid-latitudes). |
status_col |
Character or NULL. Name of a status column to filter on. If NULL (default), all rows are used. |
status_keep |
Character. Value(s) of |
min_weight |
Numeric. Minimum connectivity weight to include in output (default 0.005 = 0.5 percent). Filters very weak connections. |
verbose |
Logical. Default TRUE. |
A dataframe with columns source_lat, source_lon, dest_lat,
dest_lon, weight – ready for passing to
score_larval_connectivity().
## Not run: # After running OpenDrift simulation and exporting CSV: cm <- parse_opendrift_connectivity("opendrift_output.csv", bin_deg = 0.05) result <- predict_oyster(survey, "ostrea_edulis") result <- score_larval_connectivity(result, species = "ostrea_edulis", connectivity_matrix = cm) ## End(Not run)## Not run: # After running OpenDrift simulation and exporting CSV: cm <- parse_opendrift_connectivity("opendrift_output.csv", bin_deg = 0.05) result <- predict_oyster(survey, "ostrea_edulis") result <- score_larval_connectivity(result, species = "ostrea_edulis", connectivity_matrix = cm) ## End(Not run)
Estimates the importance of each scored environmental variable by randomly permuting its values and measuring the resulting drop in AUC against known presence/absence records. A large drop indicates the model strongly depends on that variable; a small drop indicates it could be removed with minimal impact.
This approach (Breiman 2001) is model-agnostic, requires no model refit, and is directly interpretable: "how much does AUC fall if I destroy the information in variable X?"
permutation_importance( predicted, records, presence_col = "presence", n_permutations = 50L, match_radius_deg = 0.002, seed = 42L, verbose = TRUE )permutation_importance( predicted, records, presence_col = "presence", n_permutations = 50L, match_radius_deg = 0.002, seed = 42L, verbose = TRUE )
predicted |
Dataframe from |
records |
Dataframe of presence/absence records (same format as
|
presence_col |
Character. Name of presence/absence column (default
|
n_permutations |
Integer. Number of permutation replicates per variable (default 50). More = more stable importance estimates; >= 100 recommended for publication. |
match_radius_deg |
Numeric. Matching radius (default 0.002degrees). |
seed |
Integer. Random seed (default 42). |
verbose |
Logical. Default TRUE. |
A dataframe (sorted by importance, descending) with columns:
variable: scored variable name
baseline_auc: AUC of unperturbed model
mean_permuted_auc: mean AUC after permuting this variable
importance: baseline_auc - mean_permuted_auc (drop in AUC)
importance_sd: SD of AUC drop across permutation replicates
importance_pct: importance as percentage of baseline AUC
rank: rank by importance (1 = most important)
importance > 0.10 (> 10% AUC drop): high importance – variable is load- bearing for model discrimination.
importance 0.02–0.10: moderate importance.
importance < 0.02: low importance – consider whether this variable adds value relative to its data collection cost.
Breiman (2001) Machine Learning 45:5-32. Zurell et al. (2020) Ecography 43:1261-1277.
records <- read.csv("nbn_ostrea_edulis.csv") result <- predict_oyster(survey, "ostrea_edulis") imp <- permutation_importance(result, records, n_permutations = 100) print(imp) # Variable Importance Rank # temperature 0.183 1 # salinity 0.091 2 # depth 0.045 3 # ...records <- read.csv("nbn_ostrea_edulis.csv") result <- predict_oyster(survey, "ostrea_edulis") imp <- permutation_importance(result, records, n_permutations = 100) print(imp) # Variable Importance Rank # temperature 0.183 1 # salinity 0.091 2 # depth 0.045 3 # ...
The primary user-facing function of oystermapR. Accepts a tabular dataset
(CSV file path or dataframe) containing spatial coordinates and environmental
measurements, applies species-specific exclusion and weighted scoring rules,
and returns a scored dataframe alongside an optional GeoTIFF heatmap for
QGIS visualisation.
predict_oyster( data, species, output_geotiff = FALSE, resolution = 2e-04, contours = TRUE, verbose = FALSE )predict_oyster( data, species, output_geotiff = FALSE, resolution = 2e-04, contours = TRUE, verbose = FALSE )
data |
A dataframe or a file path to a CSV file. Must contain at minimum:
|
species |
Character string identifying the target oyster species.
Accepts the key ( |
output_geotiff |
Logical or character. If |
resolution |
Numeric. Raster cell size in degrees for GeoTIFF output
(default |
contours |
Logical. If |
verbose |
Logical. If |
A dataframe (invisibly) with all original columns plus:
season: detected season at each location/date.
excluded: logical, TRUE if the location fails a hard exclusion.
exclusion_reason: character, reason(s) for exclusion or NA.
suitability: numeric [0, 1], overall weighted suitability score.
suitability_class: factor, one of "High", "Moderate", "Low",
"Very Low", "Excluded".
score_<variable>: per-variable component scores.
If output_geotiff is set, a GeoTIFF is written to disk and its path is
printed to the console.
Column names are matched case-insensitively. Recognised synonyms:
| Variable | Recognised column names |
| Longitude | lon, lng, longitude, x |
| Latitude | lat, latitude, y |
| Date | date, datetime, timestamp |
| Temperature (degrees C) | temperature, temp, temp_c |
| Salinity (PSU) | salinity, sal, salinity_psu |
| Dissolved oxygen (mg/L) | dissolved_oxygen, do, do_mgl, oxygen |
| Depth (m) | depth, depth_m |
| Current velocity (m/s) | current_velocity, velocity, current, u_mean |
| Shear stress (N/m^2) | shear_stress, tau, bed_shear, shear |
| Chlorophyll-a (ugg/L) | chlorophyll_a, chla, chl_a, chlorophyll |
| Turbidity (NTU) | turbidity, ntu, turb |
| Slope (degrees) | slope, slope_deg |
| Roughness / rugosity | roughness, rugosity |
| Substrate hardness | substrate_hardness, hardness, bottom_hardness |
| Sediment type | sediment_type, sediment, substrate_type |
| Benthic communities | benthic_communities, benthic, community |
| Biotope | biotope, biotopes, habitat |
| Fishing intensity | fishing_intensity, fishing, fishing_observed
|
# Minimal runnable example using the bundled Example Bay survey data adcp_f <- system.file("extdata", "example_bay_adcp.csv", package = "oystermapR") bathy_f <- system.file("extdata", "example_bay_soundings.xyz", package = "oystermapR") ctd_f <- system.file("extdata", "example_bay_ctd.csv", package = "oystermapR") adcp <- read_nortek_adcp(adcp_f, verbose = FALSE) bathy <- read_soundings_xyz(bathy_f, verbose = FALSE) ctd <- read_generic_csv(ctd_f, verbose = FALSE) survey <- merge_sensor_data(adcp = adcp, bathy = bathy, ctd = ctd) result <- predict_oyster(survey, "ostrea_edulis", verbose = FALSE) table(result$suitability_class) # Using your own CSV file result <- predict_oyster( data = "my_survey.csv", species = "ostrea_edulis", output_geotiff = "oyster_suitability.tif" )# Minimal runnable example using the bundled Example Bay survey data adcp_f <- system.file("extdata", "example_bay_adcp.csv", package = "oystermapR") bathy_f <- system.file("extdata", "example_bay_soundings.xyz", package = "oystermapR") ctd_f <- system.file("extdata", "example_bay_ctd.csv", package = "oystermapR") adcp <- read_nortek_adcp(adcp_f, verbose = FALSE) bathy <- read_soundings_xyz(bathy_f, verbose = FALSE) ctd <- read_generic_csv(ctd_f, verbose = FALSE) survey <- merge_sensor_data(adcp = adcp, bathy = bathy, ctd = ctd) result <- predict_oyster(survey, "ostrea_edulis", verbose = FALSE) table(result$suitability_class) # Using your own CSV file result <- predict_oyster( data = "my_survey.csv", species = "ostrea_edulis", output_geotiff = "oyster_suitability.tif" )
Re-scores survey locations under user-specified climate perturbations to assess how habitat suitability may change under warming and related oceanographic shifts. Perturbations are applied as additive deltas to temperature and salinity, and multiplicative factors to other variables.
Built-in scenarios align with UKCP18 marine projections for NW European shelf seas:
| Scenario | SST delta | Salinity delta | Notes |
| RCP2.6/SSP1 | +0.5degrees C | 0.0 PSU | Low emissions, best case |
| RCP4.5/SSP2 | +1.2degrees C | -0.2 PSU | Intermediate emissions |
| RCP8.5/SSP5 | +2.5degrees C | -0.5 PSU | High emissions, worst case |
| Custom | user | user | Supply your own deltas |
The function runs score_locations() for each scenario and returns a list
with one result dataframe per scenario, plus a comparison summary and a
suitability_delta column (projected minus baseline) for each.
project_suitability( result, tolerances, scenarios = c("rcp26", "rcp45", "rcp85"), custom_deltas = NULL, horizon = "2050s", verbose = TRUE )project_suitability( result, tolerances, scenarios = c("rcp26", "rcp45", "rcp85"), custom_deltas = NULL, horizon = "2050s", verbose = TRUE )
result |
Dataframe from |
tolerances |
Species tolerance list from |
scenarios |
Character vector of built-in scenario names, or NULL to
use |
custom_deltas |
Named list of custom scenarios. Each element is a
named numeric vector of deltas keyed to column names, e.g.
|
horizon |
Character. Label for the time horizon, used in output names
only (default |
verbose |
Logical. Print scenario comparison table (default TRUE). |
Named list:
One dataframe per scenario (named by scenario key) with suitability,
suitability_class, and suitability_delta columns.
"summary" – dataframe comparing mean suitability, class breakdown,
and net change vs baseline across all scenarios.
result <- predict_oyster(survey, "ostrea_edulis") tol <- get_species_tolerances("ostrea_edulis") # Project under all three UKCP18-aligned scenarios proj <- project_suitability(result, tol) # Mean suitability change by scenario proj$summary # High-risk cells: currently High but drops to Low under RCP8.5 vulnerable <- subset(proj$rcp85, result$suitability_class == "High" & suitability_class == "Low") # Custom scenario (e.g. local downscaled projection) proj2 <- project_suitability(result, tol, scenarios = NULL, custom_deltas = list( ukcp18_p95 = c(temperature = 3.2, salinity = -0.8)))result <- predict_oyster(survey, "ostrea_edulis") tol <- get_species_tolerances("ostrea_edulis") # Project under all three UKCP18-aligned scenarios proj <- project_suitability(result, tol) # Mean suitability change by scenario proj$summary # High-risk cells: currently High but drops to Low under RCP8.5 vulnerable <- subset(proj$rcp85, result$suitability_class == "High" & suitability_class == "Low") # Custom scenario (e.g. local downscaled projection) proj2 <- project_suitability(result, tol, scenarios = NULL, custom_deltas = list( ukcp18_p95 = c(temperature = 3.2, salinity = -0.8)))
Detects and flags suspect sensor readings before they enter the suitability model. Three complementary checks are applied to each numeric column:
Range check – values outside the physically plausible range for that
variable (e.g. temperature above 35degrees C in temperate coastal water, salinity
above 42 PSU) are flagged as "range_fail".
Statistical outlier – values beyond iqr_k x IQR from the median
are flagged as "outlier" (default k = 3, Tukey's outer fence).
Temporal gradient – when a datetime column is present and data is
sorted chronologically, sequential differences exceeding the
max_gradient threshold are flagged as "gradient_fail" (instrument
spike or data entry error).
Additionally, cross-variable sanity checks flag physically implausible combinations:
Dissolved oxygen > 20 mg/L with temperature < 5degrees C (likely sensor error)
Salinity < 5 PSU with temperature > 25degrees C (likely freshwater intrusion instrument error, not a real coastal reading)
Chlorophyll_a > 50 ugg/L (extreme bloom or sensor fouling)
Flagged values are not removed – they are marked in a qc_flag_<col>
column so the user can decide whether to replace with NA, correct, or
accept them. A summary qc_n_flags column counts total flags per row.
qc_survey_data( df, datetime_col = "datetime", iqr_k = 3, max_gradients = NULL, apply_flags = FALSE, verbose = TRUE )qc_survey_data( df, datetime_col = "datetime", iqr_k = 3, max_gradients = NULL, apply_flags = FALSE, verbose = TRUE )
df |
Dataframe of raw survey measurements. |
datetime_col |
Character or NULL. Name of the datetime column for
temporal gradient checks (default |
iqr_k |
Numeric. IQR multiplier for outlier detection (default 3.0; use 1.5 for stricter QC of high-precision CTD data). |
max_gradients |
Named numeric vector. Maximum allowed absolute change per unit time (seconds) per variable. Defaults are conservative for typical towed or lowered deployments. |
apply_flags |
Logical. If TRUE, replace flagged values with NA in the returned dataframe. If FALSE (default), flags are added but values preserved for manual review. |
verbose |
Logical. Print QC summary (default TRUE). |
Input dataframe with additional columns:
qc_flag_<varname> for each checked variable ("ok", "range_fail",
"outlier", "gradient_fail", "cross_fail"),
qc_n_flags (integer count of flags per row),
qc_status ("pass", "review", "fail" – fail = 3+ flags on one row).
# Run QC before modelling survey_qc <- qc_survey_data(survey, datetime_col = "timestamp") # Inspect flagged rows flagged <- subset(survey_qc, qc_status %in% c("review","fail")) View(flagged) # Apply flags (replace flagged values with NA) before scoring survey_clean <- qc_survey_data(survey, apply_flags = TRUE) result <- predict_oyster(survey_clean, "ostrea_edulis")# Run QC before modelling survey_qc <- qc_survey_data(survey, datetime_col = "timestamp") # Inspect flagged rows flagged <- subset(survey_qc, qc_status %in% c("review","fail")) View(flagged) # Apply flags (replace flagged values with NA) before scoring survey_clean <- qc_survey_data(survey, apply_flags = TRUE) result <- predict_oyster(survey_clean, "ostrea_edulis")
Reads the CSV file exported by Aanderaa Data Studio from an Aanderaa Recording Current Meter (RCM Blue 5450, SeaGuard RCM, or compatible model). The export typically contains time, current speed, current direction, and optionally temperature, pressure/depth, salinity, and dissolved oxygen.
Because Aanderaa instruments are moored at a fixed location, latitude and
longitude must be supplied by the user via lat and lon.
Current speed is returned as current_speed (m/s). If the export uses
cm/s (common in older instrument firmware), set speed_unit = "cm/s" or
leave as "auto" and the function will detect units from the column header
or by range-checking the data.
read_aanderaa_csv( file, lat = NULL, lon = NULL, speed_unit = c("auto", "m/s", "cm/s"), depth_from_pressure = FALSE, verbose = TRUE )read_aanderaa_csv( file, lat = NULL, lon = NULL, speed_unit = c("auto", "m/s", "cm/s"), depth_from_pressure = FALSE, verbose = TRUE )
file |
Character. Path to the Aanderaa Data Studio CSV export. |
lat |
Numeric. Site latitude in decimal degrees (required). |
lon |
Numeric. Site longitude in decimal degrees (required). |
speed_unit |
Character. Units of the current speed column.
|
depth_from_pressure |
Logical. If |
verbose |
Logical. Print progress messages (default |
A one-row dataframe (site-averaged) with columns lat, lon,
current_speed, and any additional variables present in the file
(temperature, salinity, dissolved_oxygen, pressure_dbar, depth)
– ready for merge_sensor_data().
read_rdi_adcp(), read_nortek_aquadopp(), merge_sensor_data()
# Moored RCM Blue at Strangford Lough narrows rcm <- read_aanderaa_csv( "strangford_rcm_2024.csv", lat = 54.398, lon = -5.558 ) survey <- merge_sensor_data(adcp = rcm, bathy = bathy)# Moored RCM Blue at Strangford Lough narrows rcm <- read_aanderaa_csv( "strangford_rcm_2024.csv", lat = 54.398, lon = -5.558 ) survey <- merge_sensor_data(adcp = rcm, bathy = bathy)
Reads any sensor CSV (Lowrance BioBase, CTD/probe logger, bathymetric sonar, sidescan export, etc.) and maps its columns to the standard oystermapR variable names. Performs spatial averaging at a specified resolution.
read_generic_csv( file, col_map = NULL, spatial_res = 4L, min_obs = 3L, categorical_cols = c("sediment_type", "benthic_communities", "biotope", "fishing_intensity"), verbose = TRUE )read_generic_csv( file, col_map = NULL, spatial_res = 4L, min_obs = 3L, categorical_cols = c("sediment_type", "benthic_communities", "biotope", "fishing_intensity"), verbose = TRUE )
file |
Character. Path to the CSV file. |
col_map |
Named character vector mapping oystermapR variable names to column names in the CSV. Only variables you want to extract need to be listed. Example: col_map = c( lat = "Latitude", lon = "Longitude", depth = "Depth_m", substrate_hardness = "Hardness", temperature = "Temp_C" ) If |
spatial_res |
Integer. Decimal places for spatial binning (default |
min_obs |
Integer. Minimum observations per cell (default |
categorical_cols |
Character vector. Column names (after mapping) that
should be treated as categorical – mode (most common value) is returned
instead of mean. Default: |
verbose |
Logical. Print a summary after loading (default |
A spatially averaged dataframe using oystermapR standard column names.
# Lowrance BioBase export (automatic column matching) biobase <- read_generic_csv("biobase_export.csv") # With explicit column mapping probe <- read_generic_csv( "ctd_data.csv", col_map = c(lat="Lat", lon="Lon", temperature="Temp", salinity="Sal", dissolved_oxygen="DO_mgl") )# Lowrance BioBase export (automatic column matching) biobase <- read_generic_csv("biobase_export.csv") # With explicit column mapping probe <- read_generic_csv( "ctd_data.csv", col_map = c(lat="Lat", lon="Lon", temperature="Temp", salinity="Sal", dissolved_oxygen="DO_mgl") )
Reads a raw merged Nortek Signature 500 ADCP CSV, automatically detects
and excludes depth bins contaminated by acoustic sidelobe interference,
computes current velocity and estimated bed shear stress from the deepest
clean bin, and returns a spatially averaged dataframe ready to merge with
other sensor data via merge_sensor_data().
Sidelobe detection: Each bin is tested independently. A bin is flagged
as contaminated if more than sidelobe_threshold fraction of its speed
readings exceed max_plausible_speed. Once a bin is flagged, all deeper
bins are also flagged (contamination propagates downward).
Shear stress accuracy: Bed shear stress (tau = rho * Cd * U_bed^2)
is most accurate when computed from near-bed velocity. If near-bed bins are
contaminated and only near-surface bins remain, a warning is issued and the
shear_stress_quality column is set to "low". If two or more clean bins
exist, the deepest clean bin is used and quality is "moderate" or
"good".
read_nortek_adcp( file, spatial_res = 4L, min_obs = 5L, max_plausible_speed = 1.5, sidelobe_threshold = 0.1, rho = 1025, Cd = 0.002, verbose = TRUE )read_nortek_adcp( file, spatial_res = 4L, min_obs = 5L, max_plausible_speed = 1.5, sidelobe_threshold = 0.1, rho = 1025, Cd = 0.002, verbose = TRUE )
file |
Character. Path to the Nortek merged CSV file. |
spatial_res |
Integer. Decimal places for lat/lon binning (default |
min_obs |
Integer. Minimum number of raw ensembles per spatial cell
to include in output (default |
max_plausible_speed |
Numeric. Speed in m/s above which a reading is
considered a sidelobe spike (default |
sidelobe_threshold |
Numeric. Fraction of readings in a bin that must
exceed |
rho |
Numeric. Seawater density in kg/m^3 (default |
Cd |
Numeric. Drag coefficient for bed shear stress calculation
(default |
verbose |
Logical. Print processing summary (default |
A dataframe with one row per spatial cell containing:
lat, lon – cell centroid coordinates
date – earliest observation date in cell (character, "YYYY-MM-DD")
current_velocity – mean speed from deepest clean bin (m/s)
current_velocity_sd – standard deviation within cell (m/s)
current_velocity_p95 – 95th percentile speed (m/s)
shear_stress – estimated bed shear stress (N/m^2)
shear_stress_quality – "good", "moderate", or "low"
n_ensembles – raw ensembles averaged into this cell
bins_used – which velocity bins contributed (e.g. "bin1")
bins_excluded – which bins were removed as contaminated
adcp <- read_nortek_adcp("S104456A008_AWE_Melfort_merged.csv") print(adcp)adcp <- read_nortek_adcp("S104456A008_AWE_Melfort_merged.csv") print(adcp)
Reads the ASCII velocity export produced by Nortek's AquaPro software from an Aquadopp Profiler deployment. The function accepts two common export layouts:
ENU CSV – a single file with named columns for east, north, and up
velocity per depth bin (e.g. East_1, North_1, East_2, ...) plus
optional time, temperature, and pressure columns.
Separate beam files – the companion .sen (sensor) file providing
time, temperature, and pressure, and the .v1/.v2 files providing
beam velocities. Supply the .sen path to file and set
beam_files = c("survey.v1", "survey.v2", "survey.v3").
Because Aquadopp instruments are often moored at a single location rather
than vessel-mounted, GPS coordinates may not be embedded in the export. In
that case supply lat and lon directly; all output rows receive that
fixed position.
read_nortek_aquadopp( file, beam_files = NULL, lat = NULL, lon = NULL, spatial_res = 4L, min_obs = 5L, max_plausible_speed = 2.5, rho = 1025, Cd = 0.002, verbose = TRUE )read_nortek_aquadopp( file, beam_files = NULL, lat = NULL, lon = NULL, spatial_res = 4L, min_obs = 5L, max_plausible_speed = 2.5, rho = 1025, Cd = 0.002, verbose = TRUE )
file |
Character. Path to the AquaPro ENU CSV export, or path to the
|
beam_files |
Character vector. Paths to |
lat |
Numeric. Fixed latitude for moored deployments (decimal degrees). Required when GPS columns are absent from the export. |
lon |
Numeric. Fixed longitude for moored deployments (decimal degrees). Required when GPS columns are absent from the export. |
spatial_res |
Integer. Decimal places for lat/lon binning (default |
min_obs |
Integer. Minimum ensembles per grid cell (default |
max_plausible_speed |
Numeric. Upper speed threshold for outlier
removal in m/s (default |
rho |
Numeric. Water density in kg/m^3 (default |
Cd |
Numeric. Drag coefficient for bed shear stress (default |
verbose |
Logical. Print progress messages (default |
A dataframe with columns lat, lon, current_speed,
bed_shear_stress, shear_stress_quality, and optionally temperature,
pressure_dbar, and date – ready for merge_sensor_data().
# Moored deployment (fixed position) adcp <- read_nortek_aquadopp( "harbour_mouth_aqd.csv", lat = 56.512, lon = -5.403 ) # Vessel-mounted with GPS columns in the file adcp <- read_nortek_aquadopp("transect_enu.csv") survey <- merge_sensor_data(adcp = adcp, bathy = bathy)# Moored deployment (fixed position) adcp <- read_nortek_aquadopp( "harbour_mouth_aqd.csv", lat = 56.512, lon = -5.403 ) # Vessel-mounted with GPS columns in the file adcp <- read_nortek_aquadopp("transect_enu.csv") survey <- merge_sensor_data(adcp = adcp, bathy = bathy)
Reads the binary PD0 output file produced by Teledyne RDI instruments
(WorkHorse II, Sentinel V, StreamPro, Ocean Surveyor, Long Ranger) via the
the oce package's read.adp.rdi() parser. The function
extracts east/north velocity profiles, applies sidelobe contamination
detection, and derives current speed and bed shear stress – producing
output in the same format as read_nortek_adcp().
Optional dependency: reading RDI binary (PD0) files requires the oce
package (install.packages("oce")). If oce is absent the function will
stop with an informative message. ASCII WinRiver II exports (.txt/.csv)
do not require oce.
GPS: vessel-mounted deployments embed GPS in the binary file and are
handled automatically. For moored deployments supply lat and lon.
read_rdi_adcp( file, lat = NULL, lon = NULL, spatial_res = 4L, min_obs = 5L, max_plausible_speed = 2, sidelobe_threshold = 0.1, rho = 1025, Cd = 0.002, verbose = TRUE )read_rdi_adcp( file, lat = NULL, lon = NULL, spatial_res = 4L, min_obs = 5L, max_plausible_speed = 2, sidelobe_threshold = 0.1, rho = 1025, Cd = 0.002, verbose = TRUE )
file |
Character. Path to the RDI binary |
lat |
Numeric. Fixed latitude for moored deployments (decimal degrees). |
lon |
Numeric. Fixed longitude for moored deployments (decimal degrees). |
spatial_res |
Integer. Decimal places for lat/lon binning (default |
min_obs |
Integer. Minimum ensembles per grid cell (default |
max_plausible_speed |
Numeric. Upper speed threshold in m/s for
sidelobe / spike removal (default |
sidelobe_threshold |
Numeric. Fraction of ensembles exceeding
|
rho |
Numeric. Water density in kg/m^3 (default |
Cd |
Numeric. Drag coefficient for bed shear stress (default |
verbose |
Logical. Print progress messages (default |
A dataframe with columns lat, lon, current_speed,
bed_shear_stress, shear_stress_quality, and n_ensembles – ready
for merge_sensor_data().
read_nortek_adcp(), read_nortek_aquadopp(), merge_sensor_data()
# Vessel-mounted WorkHorse II (GPS embedded in binary) adcp <- read_rdi_adcp("survey_2024.000") # Moored Sentinel V (fixed position) adcp <- read_rdi_adcp("mooring_2024.000", lat = 52.41, lon = -9.20) survey <- merge_sensor_data(adcp = adcp, bathy = bathy)# Vessel-mounted WorkHorse II (GPS embedded in binary) adcp <- read_rdi_adcp("survey_2024.000") # Moored Sentinel V (fixed position) adcp <- read_rdi_adcp("mooring_2024.000", lat = 52.41, lon = -9.20) survey <- merge_sensor_data(adcp = adcp, bathy = bathy)
Reads a single-band GeoTIFF raster using terra and returns a spatially
averaged dataframe suitable for merge_sensor_data().
Two modes are supported, selected via the type argument:
"bathy" – Bathymetric DEM (e.g. from Ping 3DSS or post-processed
soundings). Extracts:
depth – raster cell value (metres)
slope – computed using terra::terrain()
roughness – Terrain Ruggedness Index via terra::terrain()
"sidescan" – Sidescan backscatter mosaic (normalised 0–1 or raw DN).
Extracts:
substrate_hardness – backscatter intensity normalised to [0, 1].
High backscatter -> hard/coarse substrate (rock, gravel, shell);
low backscatter -> soft substrate (mud, fine sand).
read_sonar_tif( file, type = c("bathy", "sidescan"), spatial_res = 4L, band = 1L, nodata_val = NA, depth_positive = TRUE, verbose = TRUE )read_sonar_tif( file, type = c("bathy", "sidescan"), spatial_res = 4L, band = 1L, nodata_val = NA, depth_positive = TRUE, verbose = TRUE )
file |
Character. Path to the GeoTIFF file ( |
type |
Character. Either |
spatial_res |
Integer. Decimal places for output lat/lon grid
(default |
band |
Integer. Raster band to read (default |
nodata_val |
Numeric. Value to treat as no-data (default |
depth_positive |
Logical. For |
verbose |
Logical (default |
A dataframe with lat, lon, and the derived variable columns
ready for merge_sensor_data().
bathy_df <- read_sonar_tif("kames_bay_bathy.tif", type = "bathy") sidescan_df <- read_sonar_tif("kames_bay_sidescan.tif", type = "sidescan") survey <- merge_sensor_data(bathy = bathy_df, sidescan = sidescan_df, adcp = adcp_df)bathy_df <- read_sonar_tif("kames_bay_bathy.tif", type = "bathy") sidescan_df <- read_sonar_tif("kames_bay_sidescan.tif", type = "sidescan") survey <- merge_sensor_data(bathy = bathy_df, sidescan = sidescan_df, adcp = adcp_df)
Reads a plain-text XYZ soundings file (comma or space delimited, optional
#-prefixed header), spatially averages onto a regular grid, and derives:
depth – mean depth per cell (metres, positive downward)
roughness – rugosity index per cell, derived from intra-cell depth
variance using the surface-area approximation
rugosity = sqrt(1 + (depth_sd / cell_size_m)^2). Values near 1.0 are
flat; higher values indicate more complex relief.
slope – maximum downslope gradient in degrees, computed by finite
differences between neighbouring grid cells (Horn 1981 method).
read_soundings_xyz( file, lon_col = NULL, lat_col = NULL, depth_col = NULL, spatial_res = 4L, min_soundings = 10L, depth_positive = TRUE, verbose = TRUE )read_soundings_xyz( file, lon_col = NULL, lat_col = NULL, depth_col = NULL, spatial_res = 4L, min_soundings = 10L, depth_positive = TRUE, verbose = TRUE )
file |
Character. Path to the |
lon_col, lat_col, depth_col
|
Character or integer. Column names or
positions for longitude, latitude, and depth. If |
spatial_res |
Integer. Decimal places for lat/lon binning (default |
min_soundings |
Integer. Minimum soundings per cell to include in output
(default |
depth_positive |
Logical. If |
verbose |
Logical. Print processing summary (default |
A dataframe with columns lat, lon, depth, slope,
roughness, and n_soundings – ready for merge_sensor_data().
bathy <- read_soundings_xyz("kames_bay_soundings.xyz") survey <- merge_sensor_data(adcp = adcp_data, bathy = bathy)bathy <- read_soundings_xyz("kames_bay_soundings.xyz") survey <- merge_sensor_data(adcp = adcp_data, bathy = bathy)
Clears the session cache for a species, reverting to the built-in prior parameters. Does not affect saved files (use file deletion for those).
reset_tolerance_update(species = "all", verbose = TRUE)reset_tolerance_update(species = "all", verbose = TRUE)
species |
Character. Species key, or |
verbose |
Logical. Default TRUE. |
Invisibly NULL.
reset_tolerance_update("ostrea_edulis") reset_tolerance_update("all")reset_tolerance_update("ostrea_edulis") reset_tolerance_update("all")
Serialises the current session cache for a species to an .rds file so
that updated parameters persist across R sessions. Load again with
load_tolerance_update().
save_tolerance_update( species = "all", path = tools::R_user_dir("oystermapR", which = "data"), verbose = TRUE )save_tolerance_update( species = "all", path = tools::R_user_dir("oystermapR", which = "data"), verbose = TRUE )
species |
Character. Species key or |
path |
Character. Directory to save into (default: user-specific data
directory from |
verbose |
Logical. Default TRUE. |
Invisibly: path(s) to saved file(s).
save_tolerance_update("ostrea_edulis") save_tolerance_update("all", path = "data/bayes_updates/")save_tolerance_update("ostrea_edulis") save_tolerance_update("all", path = "data/bayes_updates/")
Calculates a disturbance index [0,1] from bottom trawling intensity, dredging activity, and anchor damage. The primary data input is the ICES swept-area ratio (SAR), which can be supplied manually or fetched live from the ICES VMS data portal.
This output is primarily relevant to the gear feasibility and economic viability modules. Very high trawling intensity (SAR > 0.5/yr) constitutes a hard gate for bottom culture and restoration reef deployment – physical gear deployment is not viable on actively trawled ground regardless of ecological suitability.
Input options:
trawling_data – ICES VMS SAR data, manually downloaded as CSV/dataframe.
fetch_live = TRUE – queries ICES VMS GeoServer endpoint.
Neither – returns zero disturbance with a warning.
score_anthropogenic_disturbance( result, trawling_data = NULL, anchor_data = NULL, dredge_areas = NULL, fetch_live = FALSE, match_radius_m = 5000, verbose = TRUE )score_anthropogenic_disturbance( result, trawling_data = NULL, anchor_data = NULL, dredge_areas = NULL, fetch_live = FALSE, match_radius_m = 5000, verbose = TRUE )
result |
Dataframe from |
trawling_data |
Dataframe of ICES VMS SAR data (see Details), or NULL. |
anchor_data |
Dataframe of anchoring density data, or NULL. |
dredge_areas |
Dataframe of dredging/extraction area centroids, or NULL. |
fetch_live |
Logical. Query ICES VMS GeoServer (default FALSE). |
match_radius_m |
Numeric. Spatial matching radius in metres (default 5000 m, consistent with ICES c-square ~0.05degrees resolution). |
verbose |
Logical. Default TRUE. |
result with additional columns:
disturbance_sar: swept-area ratio at or near each site (NA if no data)
disturbance_score [0,1]: composite anthropogenic disturbance index
disturbance_class: "Negligible" / "Low" / "Moderate" / "High" / "Active"
disturbance_gear_gate: logical – TRUE if SAR exceeds gear deployment
threshold (used by assess_gear_feasibility())
disturbance_note: management flag
A dataframe with columns:
lat, lon – centroid of c-square or fishing location
sar – swept-area ratio (numeric; dimensionless ratio per year)
gear_type (optional) – "otter", "beam", "dredge", "seine" etc.
Dredge gear has higher impact per SAR unit.
anchor_data: dataframe with lat, lon, density (vessel anchoring
density index) for recreational/commercial anchorage areas.
dredge_areas: dataframe with lat, lon identifying licensed
aggregate extraction areas (all sites within the polygon receive score 1.0).
# Manual ICES VMS data (downloaded from ICES data portal) vms <- read.csv("ices_vms_sar_2022.csv") # columns: lat, lon, sar, gear_type result <- score_anthropogenic_disturbance(result, trawling_data = vms) # Live ICES VMS fetch result <- score_anthropogenic_disturbance(result, fetch_live = TRUE)# Manual ICES VMS data (downloaded from ICES data portal) vms <- read.csv("ices_vms_sar_2022.csv") # columns: lat, lon, sar, gear_type result <- score_anthropogenic_disturbance(result, trawling_data = vms) # Live ICES VMS fetch result <- score_anthropogenic_disturbance(result, fetch_live = TRUE)
Assesses the environmental risk of two key bivalve pathogens at each survey location:
Bonamia ostreae (Ostrea edulis only) – an intracellular haplosporidian
parasite that is the primary constraint on flat oyster restoration in northern
Europe. Risk is temperature-driven: transmission peaks 15-20degrees C and is
negligible below 8degrees C. The function also accepts a known_sites dataframe of
confirmed infected locations to apply a proximity multiplier.
OsHV-1 microvariant (Magallana gigas) – a herpesvirus causing Pacific Oyster Mortality Syndrome (POMS). Mortality events occur when seawater exceeds 16degrees C for sustained periods. High turbidity amplifies risk by increasing larval stress.
Risk scores are returned on a 0-1 scale (0 = negligible, 1 = highest environmental risk) and classified as Low / Moderate / High / Critical. They are deliberately kept separate from the suitability score – a High suitability site with High disease risk is a meaningful regulatory red flag.
score_disease_risk( result, species = "ostrea_edulis", known_sites = NULL, temp_col = "temperature", salinity_col = "salinity", turbidity_col = "turbidity", verbose = TRUE )score_disease_risk( result, species = "ostrea_edulis", known_sites = NULL, temp_col = "temperature", salinity_col = "salinity", turbidity_col = "turbidity", verbose = TRUE )
result |
Dataframe from |
species |
Character. |
known_sites |
Dataframe or NULL. Known infected/positive sites with
|
temp_col |
Character. Name of temperature column in |
salinity_col |
Character. Name of salinity column (default |
turbidity_col |
Character. Name of turbidity column (default
|
verbose |
Logical. Print risk summary (default TRUE). |
Input dataframe with additional columns:
disease_risk_score [0-1], disease_risk_class (Low/Moderate/High/Critical),
disease_agent (pathogen name), and disease_risk_note (plain-language
interpretation).
result <- predict_oyster(survey, "ostrea_edulis") # Basic risk scoring (temperature-driven only) result <- score_disease_risk(result, "ostrea_edulis") # With known Bonamia-positive site locations infected <- data.frame(lat = c(55.8, 56.1), lon = c(-5.2, -5.4)) result <- score_disease_risk(result, "ostrea_edulis", known_sites = infected) # Sites with high ecological suitability but also high disease risk flagged <- subset(result, suitability_class == "High" & disease_risk_class %in% c("High","Critical"))result <- predict_oyster(survey, "ostrea_edulis") # Basic risk scoring (temperature-driven only) result <- score_disease_risk(result, "ostrea_edulis") # With known Bonamia-positive site locations infected <- data.frame(lat = c(55.8, 56.1), lon = c(-5.2, -5.4)) result <- score_disease_risk(result, "ostrea_edulis", known_sites = infected) # Sites with high ecological suitability but also high disease risk flagged <- subset(result, suitability_class == "High" & disease_risk_class %in% c("High","Critical"))
Combines ecological suitability, gear feasibility, site accessibility, and patch size into a composite economic viability index [0-1]. This is a heuristic scoring – not a financial model – but it ranks sites consistently and can be used to shortlist candidates for detailed business case development.
The index weighs four components:
Ecological suitability (40%) – from predict_oyster()
Gear feasibility (25%) – best gear score from assess_gear_feasibility()
Accessible patch area (20%) – log-scaled area of connected suitable
habitat (from analyse_connectivity() if available)
Access score (15%) – distance to nearest harbour/port (if
harbours table supplied) or flat 0.5 if unknown
score_economic_viability( result, harbours = NULL, target = c("restoration", "aquaculture"), verbose = TRUE )score_economic_viability( result, harbours = NULL, target = c("restoration", "aquaculture"), verbose = TRUE )
result |
Dataframe from |
harbours |
Dataframe or NULL. Known harbour/landing locations with
|
target |
Character. |
verbose |
Logical. Print viability summary (default TRUE). |
Input dataframe with viability_score [0-1],
viability_class (Poor/Fair/Good/Excellent), and viability_notes.
result <- predict_oyster(survey, "ostrea_edulis") result <- assess_gear_feasibility(result) result <- analyse_connectivity(result) harbours <- data.frame( name = c("Tarbert","Portavadie"), lat = c(55.865, 55.875), lon = c(-5.425, -5.300) ) result <- score_economic_viability(result, harbours = harbours, target = "aquaculture") # Best aquaculture candidates subset(result, viability_class %in% c("Good","Excellent"))result <- predict_oyster(survey, "ostrea_edulis") result <- assess_gear_feasibility(result) result <- analyse_connectivity(result) harbours <- data.frame( name = c("Tarbert","Portavadie"), lat = c(55.865, 55.875), lon = c(-5.425, -5.300) ) result <- score_economic_viability(result, harbours = harbours, target = "aquaculture") # Best aquaculture candidates subset(result, viability_class %in% c("Good","Excellent"))
Produces a HAB risk index (0 = negligible, 1 = severe) based on historical bloom event frequency and, where available, nutrient and stratification data.
Input options (in priority order):
hab_data – manually supplied dataframe of historical HAB events
(see Details).
fetch_live = TRUE – queries the ICES HAB database for event records
within the survey extent and a specified date range. Requires internet
access and the httr package.
Neither – returns a fixed background-level risk (0.1) with a warning.
score_hab_risk( result, hab_data = NULL, fetch_live = FALSE, date_range = NULL, match_radius_m = 5000, species = "ostrea_edulis", verbose = TRUE )score_hab_risk( result, hab_data = NULL, fetch_live = FALSE, date_range = NULL, match_radius_m = 5000, species = "ostrea_edulis", verbose = TRUE )
result |
Dataframe from |
hab_data |
Dataframe of historical HAB events (see Details), or NULL. |
fetch_live |
Logical. Query ICES HAB database (default FALSE). |
date_range |
Character vector length 2 |
match_radius_m |
Numeric. Radius in metres for event aggregation (default 5000 m = 5 km, consistent with ICES monitoring station spacing). |
species |
Character. Target species – affects toxin severity weights
( |
verbose |
Logical. Default TRUE. |
result with additional columns:
hab_risk [0,1]: composite risk index
hab_risk_class: "Low" / "Moderate" / "High" / "Critical"
hab_closure_days_per_year: estimated days/year with shellfish closures
hab_dominant_toxin: most frequently recorded toxin class at the site
hab_risk_note: management flag
A dataframe with columns:
lat, lon – event coordinates (decimal degrees)
date – event date (Date or character "YYYY-MM-DD")
genus (optional) – bloom genus (e.g. "Alexandrium", "Pseudo-nitzschia")
toxin (optional) – toxin class (PSP, ASP, DSP, AZP) – affects severity weight
closure_days (optional) – number of days the area was closed; default 1
per event if absent
# Manual data hab <- data.frame(lat=52.1, lon=-4.5, date="2022-07-15", genus="Alexandrium", toxin="PSP", closure_days=14) result <- score_hab_risk(result, hab_data=hab, species="ostrea_edulis") # Live ICES fetch result <- score_hab_risk(result, fetch_live=TRUE, date_range=c("2015-01-01","2024-12-31"))# Manual data hab <- data.frame(lat=52.1, lon=-4.5, date="2022-07-15", genus="Alexandrium", toxin="PSP", closure_days=14) result <- score_hab_risk(result, hab_data=hab, species="ostrea_edulis") # Live ICES fetch result <- score_hab_risk(result, fetch_live=TRUE, date_range=c("2015-01-01","2024-12-31"))
Estimates the larval connectivity of each survey location using one or both of two approaches:
Route 1 – Built-in gap-threshold union-find (no external data needed):
Patches above min_suitability that lie within the species' effective
dispersal kernel distance are grouped into dispersal clusters using
union-find. Within each cluster, a connectivity score is derived from the
number, quality, and distance-weighted contribution of potential larval
sources (Gaussian decay kernel, sigma = dispersal_km / 2).
Route 2 – External connectivity matrix (particle tracking / literature):
When connectivity_matrix is supplied (a dataframe of source -> destination
pairs with weights), those weights replace the Gaussian kernel for matched
pairs. Rows not covered by the matrix fall back to Route 1. The matrix can
come from:
OpenDrift particle tracking simulations (export as source/destination settlement density CSV)
FVCOM / ROMS model connectivity matrices
Published empirical matrices (e.g. Robins et al. 2017)
score_larval_connectivity( result, species = NULL, dispersal_km = NULL, pld_days = NULL, tidal_excursion_km = 5, min_suitability = 0.4, connectivity_matrix = NULL, matrix_match_radius_deg = 0.1, verbose = TRUE )score_larval_connectivity( result, species = NULL, dispersal_km = NULL, pld_days = NULL, tidal_excursion_km = 5, min_suitability = 0.4, connectivity_matrix = NULL, matrix_match_radius_deg = 0.1, verbose = TRUE )
result |
Dataframe from |
species |
Character. Species key (e.g. |
dispersal_km |
Numeric. Override for the effective dispersal kernel
radius in km. When supplied, |
pld_days |
Numeric. Override for the mean pelagic larval duration in
days. Takes precedence over species lookup; ignored if |
tidal_excursion_km |
Numeric. Mean daily tidal dispersal distance in km
(default 5 km/day approx. mean current 0.06 m/s x 24 h). Increase for highly
tidal or estuarine sites (8–15 km/day). Ignored if |
min_suitability |
Numeric. Minimum suitability score for a patch to qualify as a potential larval source or sink (default 0.40). Unsuitable patches still receive a connectivity score reflecting their proximity to sources. |
connectivity_matrix |
Dataframe or NULL. External connectivity weights (Route 2). Must contain columns:
|
matrix_match_radius_deg |
Numeric. Spatial matching tolerance in decimal degrees for linking matrix entries to result rows (default 0.10 approx. 10 km). |
verbose |
Logical. Print dispersal parameters and connectivity summary (default TRUE). |
result with additional columns:
larval_dispersal_km: effective kernel distance used
larval_pld_days: PLD used (species default or override)
larval_type: "lecithotrophic" or "planktotrophic"
n_larval_sources: suitable patches within dispersal range
source_quality_score [0,1]: Gaussian-weighted quality of nearby sources
larval_cluster_id: dispersal network cluster ID (integer; NA if
below min_suitability threshold and no sources nearby)
larval_cluster_size: number of suitable patches in dispersal network
nearest_source_km: km to nearest suitable source patch
larval_connectivity_score [0,1]: composite dispersal connectivity score
larval_connectivity_class: "Highly connected" / "Connected" /
"Low connectivity" / "Isolated"
larval_connectivity_note: plain-language ecological interpretation
larval_source: "union_find", "matrix", or "matrix+union_find"
If dispersal_km is not supplied, the effective kernel distance is derived
from the species' mean pelagic larval duration (PLD) multiplied by
tidal_excursion_km (mean daily tidal dispersal distance):
dispersal_km = PLD_mean_days x tidal_excursion_km
PLD values are literature-derived:
| Species | PLD | Larval type | Default dispersal |
| O. edulis | 1--7 days | lecithotrophic | ~20 km |
| M. gigas | 14--28 days | planktotrophic | ~105 km |
| C. angulata | 14--21 days | planktotrophic | ~85 km |
| O. stentina | 2--7 days | lecithotrophic | ~20 km |
| O. lurida | 3--10 days | lecithotrophic | ~30 km |
Cowen R.K. & Sponaugle S. (2009) Annual Review of Marine Science 1:443–466. Robins P.E. et al. (2017) J Applied Ecology 54:1699–1710. doi:10.1111/1365-2664.12854 Siegel D.A. et al. (2008) PNAS 105:8974–8979. Woolmer A.P. et al. (2009) J Shellfish Research 28:107–116. Trimble A.C. et al. (2009) J Shellfish Research 28:43–53.
result <- predict_oyster(survey, "ostrea_edulis") # Route 1 only -- built-in dispersal kernel result <- score_larval_connectivity(result, species = "ostrea_edulis") # Route 1 with custom dispersal distance (e.g. strong tidal excursion) result <- score_larval_connectivity(result, species = "ostrea_edulis", tidal_excursion_km = 10) # Route 1 with manual dispersal override result <- score_larval_connectivity(result, dispersal_km = 8) # Route 2 -- external connectivity matrix from OpenDrift cm <- read.csv("opendrift_connectivity.csv") # Required columns: source_lat, source_lon, dest_lat, dest_lon, weight result <- score_larval_connectivity(result, species = "ostrea_edulis", connectivity_matrix = cm) # Inspect isolated patches -- need artificial seeding isolated <- subset(result, larval_connectivity_class == "Isolated" & suitability_class == "High") # Strong candidates: high suitability AND high connectivity targets <- subset(result, suitability_class == "High" & larval_connectivity_class == "Highly connected")result <- predict_oyster(survey, "ostrea_edulis") # Route 1 only -- built-in dispersal kernel result <- score_larval_connectivity(result, species = "ostrea_edulis") # Route 1 with custom dispersal distance (e.g. strong tidal excursion) result <- score_larval_connectivity(result, species = "ostrea_edulis", tidal_excursion_km = 10) # Route 1 with manual dispersal override result <- score_larval_connectivity(result, dispersal_km = 8) # Route 2 -- external connectivity matrix from OpenDrift cm <- read.csv("opendrift_connectivity.csv") # Required columns: source_lat, source_lon, dest_lat, dest_lon, weight result <- score_larval_connectivity(result, species = "ostrea_edulis", connectivity_matrix = cm) # Inspect isolated patches -- need artificial seeding isolated <- subset(result, larval_connectivity_class == "Isolated" & suitability_class == "High") # Strong candidates: high suitability AND high connectivity targets <- subset(result, suitability_class == "High" & larval_connectivity_class == "Highly connected")
Applies .score_row() across every non-excluded row. Excluded rows
receive suitability = 0. Adds per-variable score columns and a
limiting_factors column identifying the variables most constraining
the score at each location.
score_locations(df, tolerances, verbose = FALSE)score_locations(df, tolerances, verbose = FALSE)
df |
A dataframe processed by |
tolerances |
Species tolerance list from |
verbose |
Logical. Print per-variable scoring summary (default |
The input dataframe with additional columns:
suitability: weighted score [0, 1]
suitability_class: "High" / "Moderate" / "Low" / "Very Low" / "Excluded"
limiting_factors: comma-separated names of the variables most limiting
the score (NA if all variables score >= 0.65)
score_<variable>: one column per scored variable present in the data
Produces a predation risk index (0 = negligible, 1 = severe) based on predator occurrence data, substrate rugosity, and depth. The output is intended as an overlay on ecological suitability – high predation risk does not reduce the habitat suitability score directly but is flagged in a separate column for management planning (e.g. cage exclusion, deep subtidal placement to avoid intertidal drills).
Input options (in priority order):
predator_data – manually supplied dataframe with predator records
(see Details).
fetch_live = TRUE – queries EMODnet Biology for Asterias rubens and
Carcinus maenas occurrence records within the survey extent. Requires
internet access and the httr package.
Neither – returns a depth-proxy-only risk score with a warning.
score_predation_risk( result, predator_data = NULL, fetch_live = FALSE, match_radius_m = 1000, species = "ostrea_edulis", verbose = TRUE )score_predation_risk( result, predator_data = NULL, fetch_live = FALSE, match_radius_m = 1000, species = "ostrea_edulis", verbose = TRUE )
result |
Dataframe from |
predator_data |
Dataframe of predator records (see Details), or NULL. |
fetch_live |
Logical. Query EMODnet Biology API (default FALSE). |
match_radius_m |
Numeric. Radius in metres within which predator records are aggregated per survey cell (default 1000 m). |
species |
Character. Target oyster species – affects which predators
are most relevant ( |
verbose |
Logical. Default TRUE. |
result with additional columns:
predation_risk [0,1]: composite risk index
predation_risk_class: "Low" / "Moderate" / "High" / "Severe"
predation_risk_note: text flag for management planning
n_predator_records: number of predator records within match_radius_m
A dataframe with columns:
lat, lon – coordinates
species – species name or AphiaID (character)
density (optional) – relative abundance / density index; if absent,
each record is treated as presence-only (density = 1)
date (optional) – survey date (used to filter to relevant season)
# Manual data pred <- data.frame(lat=51.5, lon=-4.1, species="Asterias rubens", density=3) result <- score_predation_risk(result, predator_data=pred, species="ostrea_edulis") # Live EMODnet fetch result <- score_predation_risk(result, fetch_live=TRUE) # Depth-proxy only (no predator data) result <- score_predation_risk(result)# Manual data pred <- data.frame(lat=51.5, lon=-4.1, species="Asterias rubens", density=3) result <- score_predation_risk(result, predator_data=pred, species="ostrea_edulis") # Live EMODnet fetch result <- score_predation_risk(result, fetch_live=TRUE) # Depth-proxy only (no predator data) result <- score_predation_risk(result)
Calculates the Shields parameter and a sediment stability score [0,1] for each survey location. High sediment mobility (theta >> theta_cr) indicates that the seabed is frequently in motion, which is unfavourable for oyster settlement, juvenile survival, and restoration reef persistence.
score_sediment_stability( result, current_col = "current_ms", wave_hs_col = "wave_hs_m", depth_col = "depth_m", substrate_col = "substrate", d50_col = "d50_mm", wave_period_s = 8, drag_coef = 0.003, verbose = TRUE )score_sediment_stability( result, current_col = "current_ms", wave_hs_col = "wave_hs_m", depth_col = "depth_m", substrate_col = "substrate", d50_col = "d50_mm", wave_period_s = 8, drag_coef = 0.003, verbose = TRUE )
result |
Dataframe from
|
current_col |
Character. Column name for depth-averaged current speed
(m/s). Default |
wave_hs_col |
Character. Column name for significant wave height (m).
Default |
depth_col |
Character. Column name for depth (m). Default |
substrate_col |
Character. Column name for substrate type. Default
|
d50_col |
Character. Column name for measured median grain size (mm).
Default |
wave_period_s |
Numeric. Peak wave period in seconds for orbital velocity calculation. Default 8 s (typical NW European sea state). |
drag_coef |
Numeric. Quadratic bed drag coefficient C_f. Default 0.003 (smooth mixed seabed; increase to 0.005-0.010 for rough rock/reef). |
verbose |
Logical. Default TRUE. |
result with additional columns:
shields_parameter: Shields number theta at each site
shields_critical: critical Shields number theta_cr for the substrate d50
mobility_ratio: theta / theta_cr (> 1.0 = mobile; < 1.0 = stable)
d50_mm_estimated: grain size used (mm) – measured or substrate-derived
sediment_stability_score [0,1]: 1 = fully stable, 0 = highly mobile
sediment_mobility_class: "Stable" / "Marginally stable" / "Mobile" / "Highly mobile"
sediment_stability_note: management flag
At minimum, the function needs current velocity to estimate bed shear stress. Grain size improves accuracy but can be estimated from the substrate type column if not directly measured.
# With current velocity and substrate type columns result <- score_sediment_stability(result, current_col="current_ms", substrate_col="substrate") # After score_wave_exposure() -- wave_hs_m column already present result <- score_wave_exposure(result, fetch_km=15) result <- score_sediment_stability(result) # With measured grain size result$d50_mm <- c(0.25, 2.5, 15.0, 0.08) result <- score_sediment_stability(result)# With current velocity and substrate type columns result <- score_sediment_stability(result, current_col="current_ms", substrate_col="substrate") # After score_wave_exposure() -- wave_hs_m column already present result <- score_wave_exposure(result, fetch_km=15) result <- score_sediment_stability(result) # With measured grain size result$d50_mm <- c(0.25, 2.5, 15.0, 0.08) result <- score_sediment_stability(result)
Assesses whether survey locations meet the conditions required for bivalve larval settlement and metamorphosis. Settlement scoring uses a separate, tighter tolerance specification than adult habitat scoring – larvae are more sensitive to turbidity, require minimum current flow for delivery, and need hard substrate for attachment.
The output can be compared directly against the adult suitability score from
predict_oyster() to distinguish "good adult habitat" from "likely natural
recruitment site."
score_settlement(survey, species = "ostrea_edulis", verbose = TRUE)score_settlement(survey, species = "ostrea_edulis", verbose = TRUE)
survey |
Dataframe of survey measurements. Same format as |
species |
Character. One of |
verbose |
Logical. Print summary (default TRUE). |
Dataframe with settlement_suitability, settlement_class, and
per-variable settle_score_* columns. Combine with adult result via
cbind() or merge on lat/lon.
adult_result <- predict_oyster(survey, "ostrea_edulis") settlement_result <- score_settlement(survey, "ostrea_edulis") # Identify cells suitable for both adult survival AND natural recruitment combined <- merge(adult_result, settlement_result[, c("lat","lon", "settlement_suitability","settlement_class")], by = c("lat","lon")) combined$dual_suitable <- combined$suitability >= 0.6 & combined$settlement_suitability >= 0.6adult_result <- predict_oyster(survey, "ostrea_edulis") settlement_result <- score_settlement(survey, "ostrea_edulis") # Identify cells suitable for both adult survival AND natural recruitment combined <- merge(adult_result, settlement_result[, c("lat","lon", "settlement_suitability","settlement_class")], by = c("lat","lon")) combined$dual_suitable <- combined$suitability >= 0.6 & combined$settlement_suitability >= 0.6
Produces a wave exposure score [0,1] and significant wave height estimate for each survey location. High wave exposure reduces gear feasibility and can destabilise restoration reef structures and spat bags.
Input options:
fetch_km: effective fetch in kilometres (distance to nearest land or
obstruction in the prevailing wind direction). Most useful for estuarine,
sea loch, or enclosed bay sites. Can be measured from GIS (e.g. in QGIS
using the Fetch tool) or estimated from chart inspection.
wave_height_m: directly measured or modelled significant wave height
(Hs). If supplied, fetch_km is ignored for Hs computation.
Both absent: wave exposure is estimated from depth alone (coarse proxy; a warning is issued).
score_wave_exposure( result, fetch_col = NULL, wave_height_col = NULL, fetch_km = NULL, wave_height_m = NULL, wind_speed_ms = 12, depth_limit = TRUE, verbose = TRUE )score_wave_exposure( result, fetch_col = NULL, wave_height_col = NULL, fetch_km = NULL, wave_height_m = NULL, wind_speed_ms = 12, depth_limit = TRUE, verbose = TRUE )
result |
Dataframe from |
fetch_col |
Character. Name of a column in |
wave_height_col |
Character. Name of a column in |
fetch_km |
Numeric scalar. Uniform fetch in km applied to all rows
(overridden by |
wave_height_m |
Numeric scalar. Uniform Hs in metres applied to all
rows (overridden by |
wind_speed_ms |
Numeric. Design wind speed in m/s used for JONSWAP calculation (default 12 m/s = moderate gale, Beaufort 6, typical design condition for coastal aquaculture siting). |
depth_limit |
Logical. Apply depth-limited wave breaking cap (Hs_max = 0.6 * depth_m). Default TRUE. |
verbose |
Logical. Default TRUE. |
result with additional columns:
wave_hs_m: significant wave height (m) – computed or supplied
wave_exposure_score [0,1]: 0 = fully sheltered, 1 = severely exposed
wave_exposure_class: "Sheltered" / "Moderate" / "Exposed" / "Severe"
wave_exposure_note: gear implication flag
wave_source: method used ("measured", "jonswap_fetch", "depth_proxy")
# Fetch column in result result$fetch_km <- c(2.5, 8.0, 25.0, 45.0) result <- score_wave_exposure(result, fetch_col = "fetch_km") # Uniform fetch for a sheltered sea loch result <- score_wave_exposure(result, fetch_km = 3.5) # Measured wave height column result <- score_wave_exposure(result, wave_height_col = "hs_m") # Depth proxy only (coarse) result <- score_wave_exposure(result)# Fetch column in result result$fetch_km <- c(2.5, 8.0, 25.0, 45.0) result <- score_wave_exposure(result, fetch_col = "fetch_km") # Uniform fetch for a sheltered sea loch result <- score_wave_exposure(result, fetch_km = 3.5) # Measured wave height column result <- score_wave_exposure(result, wave_height_col = "hs_m") # Depth proxy only (coarse) result <- score_wave_exposure(result)
Computes the marginal response curve for a single environmental variable: suitability is predicted at a grid of values for the focal variable, while all other variables are held at their observed median (or a specified background value). This reveals the shape of the scoring function as actually applied to a real dataset.
Partial dependence is a standard diagnostic for SDM publication (Elith et al. 2008; Zurell et al. 2020 ODMAP protocol). Use it to verify that predicted responses match ecological expectations before submission.
sensitivity_analysis( predicted, species, variable, n_steps = 100L, background = NULL, season = NULL, verbose = TRUE )sensitivity_analysis( predicted, species, variable, n_steps = 100L, background = NULL, season = NULL, verbose = TRUE )
predicted |
Dataframe from |
species |
Character. Species key (e.g. |
variable |
Character. Name of the focal variable to vary (e.g.
|
n_steps |
Integer. Number of evenly spaced values across the variable's biological range (default 100). |
background |
Named list. Fixed values for all other variables. If NULL
(default), uses column medians from |
season |
Character or NULL. Season to apply for seasonal variables
(e.g. |
verbose |
Logical. Default TRUE. |
A dataframe with columns:
x: focal variable value
suitability: predicted suitability at that value
variable: focal variable name
species: species key
Suitable for plotting with ggplot2 or base R plot().
Elith et al. (2008) J Animal Ecology 77:802-813. Zurell et al. (2020) Ecography 43:1261-1277.
result <- predict_oyster(survey, "ostrea_edulis") # Temperature response curve temp_pd <- sensitivity_analysis(result, "ostrea_edulis", "temperature") plot(temp_pd$x, temp_pd$suitability, type="l", xlab="Temperature (degrees C)", ylab="Suitability", main="Partial dependence: temperature") # Salinity response (summer) sal_pd <- sensitivity_analysis(result, "ostrea_edulis", "salinity", season = "summer") # ggplot2 version library(ggplot2) ggplot(temp_pd, aes(x, suitability)) + geom_line(colour="steelblue", linewidth=1.2) + geom_ribbon(aes(ymin=0, ymax=suitability), alpha=0.2, fill="steelblue") + labs(x="Temperature (degrees C)", y="Suitability [0-1]") + theme_minimal()result <- predict_oyster(survey, "ostrea_edulis") # Temperature response curve temp_pd <- sensitivity_analysis(result, "ostrea_edulis", "temperature") plot(temp_pd$x, temp_pd$suitability, type="l", xlab="Temperature (degrees C)", ylab="Suitability", main="Partial dependence: temperature") # Salinity response (summer) sal_pd <- sensitivity_analysis(result, "ostrea_edulis", "salinity", season = "summer") # ggplot2 version library(ggplot2) ggplot(temp_pd, aes(x, suitability)) + geom_line(colour="steelblue", linewidth=1.2) + geom_ribbon(aes(ymin=0, ymax=suitability), alpha=0.2, fill="steelblue") + labs(x="Temperature (degrees C)", y="Suitability [0-1]") + theme_minimal()
Individual CTD or ADCP casts are point measurements subject to instrument
noise, micro-scale patchiness, and tidal state variability. smooth_suitability()
applies a Gaussian distance-weighted kernel to the suitability scores, replacing
each cell's score with a weighted mean of all cells within bandwidth_m metres.
The result is a spatially coherent surface that better reflects broad habitat
quality rather than single-cast noise.
The smoothed score is computed as:
where and is the
Gaussian bandwidth in metres. Excluded cells (suitability = 0, excluded flag)
are down-weighted but included so that unsuitable barriers are preserved.
smooth_suitability( result, bandwidth_m = 500, max_radius_m = NULL, smooth_excluded = FALSE, verbose = TRUE )smooth_suitability( result, bandwidth_m = 500, max_radius_m = NULL, smooth_excluded = FALSE, verbose = TRUE )
result |
Dataframe from |
bandwidth_m |
Numeric. Gaussian kernel bandwidth (1 sigma) in metres. Default 500 m. Increase for sparser surveys; decrease to preserve fine detail. |
max_radius_m |
Numeric. Maximum search radius for neighbours in metres.
Cells further than this are ignored (default |
smooth_excluded |
Logical. If FALSE (default), excluded cells do not contribute to neighbours' smoothed scores (hard barriers are preserved). |
verbose |
Logical. Print summary (default TRUE). |
Input dataframe with suitability_raw (original) and
suitability (smoothed, overwrites original) columns, plus a
suitability_class column reclassified from the smoothed score.
result <- predict_oyster(survey, "ostrea_edulis") # Smooth with 300 m bandwidth (good for dense ADCP surveys) result_smooth <- smooth_suitability(result, bandwidth_m = 300) # Compare raw vs smoothed plot(result$suitability_raw, result$suitability, xlab = "Raw", ylab = "Smoothed", pch = 20)result <- predict_oyster(survey, "ostrea_edulis") # Smooth with 300 m bandwidth (good for dense ADCP surveys) result_smooth <- smooth_suitability(result, bandwidth_m = 300) # Compare raw vs smoothed plot(result$suitability_raw, result$suitability, xlab = "Raw", ylab = "Smoothed", pch = 20)
Evaluates model performance using spatial block cross-validation (Roberts et al. 2017), which avoids inflated performance estimates that arise when nearby train and test records share similar environments.
Records are partitioned into n_blocks contiguous geographic blocks. In
each fold, one block is held out as test data and the remaining blocks
provide training data to re-score the suitability model. AUC, TSS, and
Brier score are computed per fold and summarised across folds.
This function is purely evaluative – it does not refit a new model. Instead,
it uses the spatial heterogeneity in the existing predicted suitability
surface to estimate out-of-block performance.
spatial_block_cv( predicted, records, presence_col = "presence", n_blocks = 5L, match_radius_deg = 0.002, seed = 42L, plot = TRUE, verbose = TRUE )spatial_block_cv( predicted, records, presence_col = "presence", n_blocks = 5L, match_radius_deg = 0.002, seed = 42L, plot = TRUE, verbose = TRUE )
predicted |
Dataframe from |
records |
Dataframe of known presence/absence records. Must contain
|
presence_col |
Character. Name of presence/absence column (default
|
n_blocks |
Integer. Number of spatial blocks (default 5). More blocks = finer spatial resolution but smaller test sets per fold. |
match_radius_deg |
Numeric. Matching radius in degrees (default 0.002 approx. 220 m). Passed to inner matching step. |
seed |
Integer. Random seed for reproducibility (default 42). |
plot |
Logical. Reserved for future visualisation output; currently unused (default TRUE). |
verbose |
Logical. Print fold-level diagnostics (default TRUE). |
A named list:
mean_auc: mean AUC across blocks (primary reporting metric)
auc_sd: standard deviation of per-block AUC
mean_tss, tss_sd: mean / SD of True Skill Statistic
brier_mean, brier_sd: mean / SD of Brier score
n_blocks_actual: number of blocks with >= 5 records (used in CV)
fold_results: dataframe with per-fold metrics
spatial_autocorr_range_deg: estimated autocorrelation range (degrees)
spatial_bias_warning: logical, TRUE if random CV would be misleading
Compare mean_auc from spatial CV with standard AUC from
validate_against_records(). A large drop (> 0.10) indicates strong
spatial autocorrelation and means the model is less transferable than
the standard validation suggested. This is common for coastal surveys
with spatially clustered records.
Roberts et al. (2017) Ecography 40:913-929. doi:10.1111/ecog.02881
records <- read.csv("nbn_ostrea_edulis.csv") result <- predict_oyster(survey, "ostrea_edulis") # Standard validation std_val <- validate_against_records(result, records) std_val$auc # may be optimistically high # Spatial block CV sp_cv <- spatial_block_cv(result, records, n_blocks = 5) sp_cv$mean_auc # more honest estimate of transferability # Compare -- if sp_cv$mean_auc << std_val$auc, model is spatially # autocorrelated and transfers less well than initially apparent.records <- read.csv("nbn_ostrea_edulis.csv") result <- predict_oyster(survey, "ostrea_edulis") # Standard validation std_val <- validate_against_records(result, records) std_val$auc # may be optimistically high # Spatial block CV sp_cv <- spatial_block_cv(result, records, n_blocks = 5) sp_cv$mean_auc # more honest estimate of transferability # Compare -- if sp_cv$mean_auc << std_val$auc, model is spatially # autocorrelated and transfers less well than initially apparent.
Combines two or more dataframes from the same sensor (e.g. an ADCP run on Monday and again on Thursday) by row-binding them and re-averaging overlapping cells onto a common spatial grid. Non-overlapping cells are kept as-is.
This is the right tool when you have repeated surveys of the same area – rather than picking one run, overlapping cells are averaged across all runs which improves noise reduction. Non-overlapping cells (different survey extents) are preserved.
After stacking, pass the result into merge_sensor_data() alongside your
other sensor datasets as usual.
stack_surveys(..., spatial_res = 4L, verbose = TRUE)stack_surveys(..., spatial_res = 4L, verbose = TRUE)
... |
Two or more dataframes of the same sensor type. Each must contain
|
spatial_res |
Integer. Decimal places for the common spatial grid
(default |
verbose |
Logical. Print a summary (default |
A single spatially averaged dataframe combining all input surveys.
# Stack two ADCP runs before merging with other sensors adcp_mon <- read_nortek_adcp("survey_monday.csv") adcp_thu <- read_nortek_adcp("survey_thursday.csv") adcp_all <- stack_surveys(adcp_mon, adcp_thu) survey <- merge_sensor_data(adcp = adcp_all, bathy = bathy_df)# Stack two ADCP runs before merging with other sensors adcp_mon <- read_nortek_adcp("survey_monday.csv") adcp_thu <- read_nortek_adcp("survey_thursday.csv") adcp_all <- stack_surveys(adcp_mon, adcp_thu) survey <- merge_sensor_data(adcp = adcp_all, bathy = bathy_df)
Returns the mean spring tidal range (MHWS - MLWS) and typical Chart Datum
offset from Mean Sea Level for a named port. Useful for a quick sanity check
on tidal height values before applying correct_to_chart_datum().
This is a reference table only – always use official tide table predictions for actual corrections.
tidal_port_info(port)tidal_port_info(port)
port |
Character. Port name (partial, case-insensitive match). |
A one-row dataframe with columns port, country,
mhws_m, mlws_m, spring_range_m, cd_below_msl_m.
tidal_port_info("oban") tidal_port_info("brest")tidal_port_info("oban") tidal_port_info("brest")
Fits a Bayesian logistic regression linking AHP suitability scores to observed presence/absence, then updates the optimal-range tolerance parameters for one or more scored variables.
Default method (MAP + Laplace): finds the Maximum A Posteriori estimate
via optim() (L-BFGS-B) and approximates the posterior as a multivariate
normal using the numerical Hessian. Fast; suitable for routine use.
MCMC method: Random-Walk Metropolis-Hastings sampler for full posterior samples. Recommended when n < 50 or when credible intervals are needed for publication. May take several minutes for large parameter sets.
Posteriors are stored in a package-level cache so that repeated calls
accumulate evidence (sequential Bayesian updating). Use
save_tolerance_update() to persist between sessions.
update_species_tolerances( records, species, update_vars = NULL, presence_col = "presence", method = c("map", "mcmc"), n_iter = 5000L, n_warmup = 1000L, prior_sd_fraction = 0.2, min_records = 20L, verbose = TRUE )update_species_tolerances( records, species, update_vars = NULL, presence_col = "presence", method = c("map", "mcmc"), n_iter = 5000L, n_warmup = 1000L, prior_sd_fraction = 0.2, min_records = 20L, verbose = TRUE )
records |
Dataframe of field observations. Must contain columns
|
species |
Character. Species key (e.g. |
update_vars |
Character vector. Variables to update parameters for.
Defaults to all numeric scored variables present in |
presence_col |
Character. Name of presence/absence column (default
|
method |
Character. |
n_iter |
Integer. MCMC iterations (default 5000; ignored for MAP). |
n_warmup |
Integer. MCMC warm-up / burn-in (default 1000; ignored for MAP). |
prior_sd_fraction |
Numeric. Prior SD as fraction of parameter range (default 0.20 = 20 percent). Wider = weaker / more data-driven prior. |
min_records |
Integer. Minimum matched records required (default 20). |
verbose |
Logical. Print progress and diagnostics (default TRUE). |
Invisibly: a named list with elements:
species: species key
method: "map" or "mcmc"
n_records: number of matched records used
loglik_null: log-likelihood of intercept-only model
loglik_fit: log-likelihood at MAP/posterior mean
mcfadden_r2: 1 - loglik_fit / loglik_null (pseudo-R^2)
updated_params: named list of updated parameter values per variable
posterior_sd: named list of posterior SD per parameter (from Laplace/MCMC)
prior_params: the parameter values before updating (for comparison)
convergence: optim() convergence code (0 = success; MAP only)
Side effect: updates the in-session cache accessed by score_locations().
For each variable in update_vars, the function estimates shifts in
optimal_min and optimal_max (the sweet-spot bounds), plus
acceptable_min / poor_min and acceptable_max / poor_max (shoulder
bounds, updated proportionally to maintain curve shape). Parameters are
constrained to biologically plausible ranges via box constraints in the
optimiser.
get_tolerance_posteriors(), reset_tolerance_update(),
save_tolerance_update(), load_tolerance_update()
# Field records with presence/absence + environmental measurements records <- data.frame( lat = c(51.5, 51.6, 51.7, 51.8, 51.4), lon = c(-4.1, -4.2, -4.0, -4.3, -4.0), presence = c(1, 1, 0, 0, 1), temperature = c(16.2, 17.1, 12.3, 11.0, 15.8), salinity = c(32, 33, 31, 30, 34), depth = c(5, 8, 15, 20, 3) ) fit <- update_species_tolerances( records = records, species = "ostrea_edulis", update_vars = c("temperature", "salinity", "depth") ) # See updated parameters get_tolerance_posteriors("ostrea_edulis") # Re-run predict_oyster -- it will automatically use updated parameters result <- predict_oyster(survey, "ostrea_edulis") # Save to disk for next session save_tolerance_update("ostrea_edulis")# Field records with presence/absence + environmental measurements records <- data.frame( lat = c(51.5, 51.6, 51.7, 51.8, 51.4), lon = c(-4.1, -4.2, -4.0, -4.3, -4.0), presence = c(1, 1, 0, 0, 1), temperature = c(16.2, 17.1, 12.3, 11.0, 15.8), salinity = c(32, 33, 31, 30, 34), depth = c(5, 8, 15, 20, 3) ) fit <- update_species_tolerances( records = records, species = "ostrea_edulis", update_vars = c("temperature", "salinity", "depth") ) # See updated parameters get_tolerance_posteriors("ostrea_edulis") # Re-run predict_oyster -- it will automatically use updated parameters result <- predict_oyster(survey, "ostrea_edulis") # Save to disk for next session save_tolerance_update("ostrea_edulis")
Compares the suitability scores from predict_oyster() against a
presence/absence dataset to quantify how well the model discriminates
between occupied and unoccupied habitat. Useful for assessing whether the
AHP-weighted scoring produces ecologically defensible outputs before using
them for site selection.
Metrics computed:
AUC (Area Under the ROC Curve): probability that a random presence location scores higher than a random absence location. 0.5 = random; 1.0 = perfect discrimination.
Optimal threshold: Youden's J statistic (maximises sensitivity + specificity - 1). Used for the confusion-matrix-derived metrics below.
Sensitivity (true positive rate): fraction of presences correctly predicted above threshold.
Specificity (true negative rate): fraction of absences correctly predicted below threshold.
F1 score: harmonic mean of precision and sensitivity.
Brier score: mean squared error between predicted probability and observed presence/absence; 0 = perfect, 0.25 = uninformative.
TSS (True Skill Statistic = sensitivity + specificity - 1): values above 0.4 generally indicate useful predictive skill.
validate_against_records( predicted, records, presence_col = "presence", match_radius_deg = 0.002, plot = TRUE, verbose = TRUE )validate_against_records( predicted, records, presence_col = "presence", match_radius_deg = 0.002, plot = TRUE, verbose = TRUE )
predicted |
Dataframe from |
records |
Dataframe of known presence/absence records. Must contain:
|
presence_col |
Character. Name of the presence/absence column in
|
match_radius_deg |
Numeric. Radius in decimal degrees within which a
prediction cell is matched to a record (default |
plot |
Logical. Print an ASCII ROC curve to the console (default |
verbose |
Logical. Print full metric summary (default |
A named list:
auc: numeric [0,1]
optimal_threshold: numeric [0,1] – Youden's J
sensitivity: numeric [0,1] at optimal threshold
specificity: numeric [0,1] at optimal threshold
f1: numeric [0,1] at optimal threshold
tss: numeric [-1,1] at optimal threshold
brier_score: numeric [0,1]
n_presences: integer
n_absences: integer
n_unmatched: integer – records dropped due to no nearby prediction
roc_df: dataframe with columns threshold, sensitivity,
specificity (for custom plotting)
# Load known flat oyster records (e.g. from NBN Atlas CSV export) records <- read.csv("nbn_ostrea_edulis.csv") # Run prediction on same area result <- predict_oyster(survey, "ostrea_edulis") # Validate val <- validate_against_records(result, records, presence_col = "presence") val$auc # overall discrimination power val$tss # skill score val$roc_df # data for custom ggplot# Load known flat oyster records (e.g. from NBN Atlas CSV export) records <- read.csv("nbn_ostrea_edulis.csv") # Run prediction on same area result <- predict_oyster(survey, "ostrea_edulis") # Validate val <- validate_against_records(result, records, presence_col = "presence") val$auc # overall discrimination power val$tss # skill score val$roc_df # data for custom ggplot