Conservation planning rarely asks how many species live in a region. It asks which fragments of that region carry species found nowhere else, how much richness is lost when continuous habitat is broken into patches, and how survey effort distorts the apparent richness of poorly sampled areas. These questions sit beside the standard species accumulation curve, and spacc collects the tools that answer them under one set of objects.
This vignette covers the model family aimed at spatial structure and
conservation value: endemism-area curves, the species-fragmented area
relationship (SFAR), the sampling-effort corrected species-area
relationship (SESARS), per-site accumulation metrics for spatial
prioritisation, Moran eigenvector maps (MEM) for decomposing spatial
autocorrelation, the wavefront expansion diagnostic, and grid-based
spatial subsampling. Each function takes either a site-by-species matrix
with coordinates or a fitted spacc object, and each returns
a typed S3 object with print(), summary(), and
plot() methods.
The worked examples use a single simulated study region with known ground truth. Range size varies across species on purpose: a subset of narrow-ranged endemics share the region with widespread generalists. That contrast drives every result below, from the shape of the endemism curve to the sign of the fragmentation exponent. Working from a simulation with controlled inputs makes it possible to read each output against the structure that produced it, rather than against an unknown truth.
The seven functions divide into three groups by what they consume.
Endemism curves, per-site metrics, and the wavefront diagnostic take the
raw site-by-species matrix and coordinates, and run their own
accumulation internally. SFAR and SESARS take a fitted
spacc object and a covariate (patch labels or effort),
since they reshape an existing curve rather than building a new one. The
MEM tools take coordinates to build a spatial basis, then accept any
per-site response for partitioning. Knowing which group a function
belongs to is the quickest way to see what it needs and what it
returns.
The species-area relationship counts how richness grows with area. The endemism-area relationship asks a sharper question: of the species present in an accumulated set of sites, how many occur only within that set. Let \(S(A)\) be the richness accumulated over area \(A\), and let a species be endemic to the accumulated region when every one of its occurrences falls inside it. At each accumulation step \(k\) over the visit order, a species with cumulative occurrence count \(a_k\) and global occurrence count \(t\) is endemic when \(a_k = t\) and \(a_k > 0\). The endemic count \(E(A)\) rises as narrow-ranged species are captured and approaches total richness only when every site is included, since at full extent every species is trivially contained.
The diagnostic value lies at intermediate area. A site set that holds many endemic species is irreplaceable: those species disappear from the protected network if the set is dropped. Endemism concentrated in small areas signals a setting where reserve placement matters more than reserve size.
The shape of \(E(A)\) carries information beyond its endpoints. A curve that climbs slowly and only catches up to richness near full extent describes a pool of widespread species with diffuse ranges, where almost no area is irreplaceable until nearly everything is included. A curve that tracks richness closely from the start describes a pool of narrow endemics, where even small areas already hold their full complement of locally restricted species. The endemism proportion \(E(A)/S(A)\) makes this readable on a single axis: it is the fraction of the accumulated species pool that is local to the area sampled, and its early behaviour separates landscapes built from widespread species from those built from endemics.
Habitat loss and habitat fragmentation are distinct processes. Removing area reduces richness through the classic power law. Splitting the remaining area into more, smaller pieces reduces it further, through edge effects, isolation, and minimum viable area. The SFAR (Hanski et al. 2013) separates the two with an additive log-linear model,
\[\log(S) = c + z \cdot \log(A) + f \cdot \log(n),\]
where \(S\) is species richness, \(A\) is total habitat area, \(n\) is the number of fragments, \(z\) is the ordinary area exponent of the SAR, \(f\) is the fragmentation exponent, and \(c\) is the intercept on the log scale (so \(e^{c}\) is the richness expected at unit area and a single fragment). A negative \(f\) means that, holding area fixed, more fragments support fewer species. spacc reports \(f\) with the sign convention that a positive value indicates fragmentation reduces richness beyond the area effect, so the model is written \(S = e^{c} \cdot A^{z} \cdot n^{-f}\) in the printed output.
Atlas and citizen-science data confound area with effort. A large region that is sparsely surveyed can look species-poor for the wrong reason. SESARS (Dennstadt et al. 2019) adds an effort term to the SAR,
\[\log(S) = c + z \cdot \log(A) + e \cdot \log(E),\]
where \(E\) is sampling effort (visits, hours, trap-nights), \(e\) is the effort exponent, and \(A\), \(z\), \(c\), \(S\) carry their SAR meanings. A positive \(e\) quantifies how much of the apparent richness gradient tracks survey intensity rather than area. With effort in the model, the area exponent \(z\) estimates the area effect net of unequal sampling.
Spatial autocorrelation is structure worth describing in its own right. Moran eigenvector maps build an orthogonal basis of spatial patterns directly from coordinates. The construction truncates the pairwise distance matrix at a threshold (by default the largest edge of the minimum spanning tree, the smallest distance that keeps all sites connected), double-centres the squared truncated matrix, and extracts its eigenvectors. Each eigenvector is a map: broad-scale patterns carry large eigenvalues and strong positive Moran’s I, fine-scale patterns carry small eigenvalues. Moran’s I for eigenvector \(v\) with connectivity matrix \(W\) is
\[I = \frac{n}{\sum_{ij} w_{ij}} \frac{\sum_{ij} w_{ij}\,(v_i - \bar v)(v_j - \bar v)} {\sum_i (v_i - \bar v)^2},\]
with \(n\) sites and weights \(w_{ij} = 1\) when sites \(i\) and \(j\) lie within the threshold. Regressing a per-site response on a selected set of these eigenvectors partitions its variance into a spatial component and a residual, and reveals at which scales the structure lives.
Two methods share the construction. PCNM retains every eigenvector with a positive eigenvalue, including those describing negative autocorrelation (checkerboard-like patterns where neighbours differ). dbMEM keeps only the eigenvectors with positive Moran’s I, the patterns where neighbours resemble each other, which is the usual target for modelling ecological gradients. The threshold matters as much as the method: a small threshold connects only the nearest sites and yields many fine-scale vectors, while a large threshold connects distant sites and emphasises broad gradients. The minimum-spanning-tree default is the smallest threshold that leaves no site isolated, a principled starting point that adapts to irregular sampling.
The simulated region holds 80 sites scattered over a 100 by 100 unit
square and 40 species. Each species occupies a Gaussian neighbourhood
around a random centre, so occurrence probability falls off with squared
distance from that centre. The decay constant spread sets
range size: a large value confines a species to a tight cluster near its
centre, a small value lets it spill across the whole extent.
library(spacc)
set.seed(42)
n_sites <- 80
n_species <- 40
coords <- data.frame(
x = runif(n_sites, 0, 100),
y = runif(n_sites, 0, 100)
)
# Species with varying range sizes (some endemic, some widespread)
species <- matrix(0L, n_sites, n_species)
for (sp in seq_len(n_species)) {
cx <- runif(1, 10, 90)
cy <- runif(1, 10, 90)
# First 10 species are narrow-ranged (endemics)
spread <- if (sp <= 10) 0.005 else 0.001
prob <- exp(-spread * ((coords$x - cx)^2 + (coords$y - cy)^2))
species[, sp] <- rbinom(n_sites, 1, prob)
}
colnames(species) <- paste0("sp", seq_len(n_species))The first 10 species use spread = 0.005, which gives a
Gaussian half-width of roughly 12 units, about an eighth of the map
edge. These are the endemics: tight clusters that vanish if their corner
of the map is dropped. The remaining 30 use spread = 0.001,
a half-width near 26 units, so they appear across much of the area. The
split is the ground truth that every downstream analysis recovers.
range_size <- colSums(species)
tapply(range_size, c(rep("endemic", 10), rep("widespread", 30)), mean)
#> endemic widespread
#> 4.50000 18.46667Mean occupancy is lower for the narrow-ranged group, as designed. The contrast is moderate rather than extreme, which keeps the simulated curves realistic: endemics are not single-site rarities, and widespread species do not saturate every site.
spaccEndemism() tracks endemic richness as the sampled
area expands. It runs a nearest-neighbour accumulation from each of
n_seeds random starts, and at each step counts species
whose every occurrence falls inside the accumulated sites. The print
method reports the final mean endemism, which equals total richness once
all sites are in.
end <- spaccEndemism(species, coords, n_seeds = 20, progress = FALSE, seed = 1)
end
#> spacc endemism: 80 sites, 40 species, 20 seeds
#> Final mean endemism: 40.0 species (100% of total)The summary() method returns the curve as a data frame,
with mean richness, mean endemism, a percentile confidence band, and the
endemism proportion at each accumulated extent. The proportion is the
share of accumulated species that are local to the area sampled so
far.
es <- summary(end)
head(es[, c("sites", "mean_richness", "mean_endemism", "endemism_proportion")], 4)
#> sites mean_richness mean_endemism endemism_proportion
#> 1 1 6.85 0.00 0.000000000
#> 2 2 10.65 0.05 0.004694836
#> 3 3 13.35 0.10 0.007490637
#> 4 4 15.10 0.10 0.006622517Total richness and endemic richness as area expands.
At small extents the endemism proportion is high: most species seen so far are confined to the few sites visited, because a handful of sites can only contain narrow-ranged occupants. As more sites enter, widespread species accumulate and the proportion falls, then both curves converge as the remaining endemics are absorbed. The early peak in proportion marks the spatial scale at which irreplaceability is concentrated. Sites contributing to that peak are the ones a reserve network cannot afford to lose, because the species they hold occur nowhere else in the dataset.
Setting map = TRUE runs the accumulation from every site
and records the endemic count reachable from each starting point, which
feeds the map plot and the as_sf() export covered alongside
the per-site metrics below.
SFAR needs two inputs: a fitted spacc object and a
vector assigning each site to a habitat fragment. Patches here come from
k-means clustering of the coordinates, a stand-in for mapped habitat
patches. kmeans() is base R, so no guard is required.
sac <- spacc(species, coords, n_seeds = 20, progress = FALSE, seed = 1)
set.seed(123)
patches <- kmeans(coords, centers = 5)$cluster
sfar_fit <- sfar(sac, patches)
sfar_fit
#> SFAR: Species-Fragmented Area Relationship
#> --------------------------------------------
#> Fragments: 5
#> R-squared: 0.977
#>
#> Model: S = 6.18 * A^0.336 * n^(--0.302)
#> Fragmentation effect (f): -0.302
#> No additional fragmentation penalty detectedThe fitted coefficients are stored in $coef as a named
vector: log_c is the log-scale intercept, z
the area exponent, and f the fragmentation exponent with
the sign convention that positive means fragmentation costs
richness.
SFAR fit: observed richness and the area-plus-fragmentation prediction.
The area exponent \(z\) takes a value in the usual SAR range, confirming that richness climbs with the number of accumulated sites. The fragmentation exponent \(f\) measures the residual penalty once area is accounted for. A negative printed \(f\) (reported as “no additional fragmentation penalty”) means the patch count adds nothing beyond area in this dataset, which is the expected result for a simulation where patches are an arbitrary partition of a continuous surface rather than isolated habitat islands. In a genuinely fragmented region, where each patch loses edge-sensitive and area-demanding species, \(f\) turns positive and the SFAR line falls below the pure-area expectation. The interpretation hinges on that gap: SFAR is read as the distance between the fitted curve and what area alone would predict.
The mechanism behind a positive \(f\) is the reason the term is added at all. Theory predicts that fragmentation matters most when total habitat is already low, because at that point the remaining area is divided into pieces too small to sustain populations that need large territories or interior conditions away from edges. A single block of a given area then holds more species than the same area split into many fragments. The SFAR captures this as a multiplicative penalty \(n^{-f}\) that grows with fragment count, so doubling the number of fragments at fixed area reduces predicted richness by a factor \(2^{-f}\). Estimating \(f\) therefore requires patches that are genuinely isolated and that vary in number across the dataset, conditions the practical guidance below makes explicit.
SESARS corrects the SAR for uneven survey intensity. Effort enters as a per-site vector with one entry per site. Here effort is drawn from a Poisson distribution, mimicking a survey where some sites received many visits and others few.
set.seed(7)
effort <- rpois(n_sites, 10) + 1
sesars_fit <- sesars(sac, effort)
sesars_fit
#> SESARS: Sampling Effort Species-Area Relationship
#> ------------------------------------------------
#> Model: power
#> R-squared: 0.996
#>
#> Coefficients:
#> log_c z w
#> 5.6224 2.8860 -2.1304
#>
#> Interpretation: S = 276.54 * A^2.886 * E^-2.130The coefficients live in $coef: log_c (or
c under the additive model), the area exponent
z, and the effort exponent (named w in the
object, the \(e\) of the equation
above). The R-squared in $r_squared reports how much of the
richness variation the joint area-and-effort model captures.
SESARS fit with the area-plus-effort prediction overlaid on observed richness.
A positive effort exponent means apparent richness rises with survey intensity independent of area, the bias SESARS exists to expose. When effort and area are correlated in field data, ignoring effort inflates the area exponent, because the SAR absorbs the effort signal into \(z\). Fitting both terms returns an area exponent net of effort, which is the quantity comparable across regions surveyed at different intensities.
The effort enters as a cumulative quantity aligned with the accumulation order, so the model relates accumulated richness to accumulated area and accumulated effort step by step. In this simulation effort is independent of the species data, so its exponent reflects only the Poisson draw rather than a real survey-intensity gradient. The instructive case is the field dataset where under-surveyed regions are also species-poor in the raw counts: there the naive SAR reads the low richness of sparsely visited sites as an area effect, and only the effort term separates true poverty from missing detections. SESARS is most useful precisely when effort and richness move together, since that is the configuration in which the uncorrected SAR is most misleading.
spaccMetrics() runs an accumulation curve starting from
each site and reduces each curve to summary statistics, giving one row
per site. Three metrics are requested here: slope_10 (the
initial slope over the first 10 percent of sites),
half_richness (the number of sites needed to reach half the
total species pool), and auc (the area under the
accumulation curve, normalised by length).
met <- spaccMetrics(species, coords,
metrics = c("slope_10", "half_richness", "auc"),
progress = FALSE)
summary(met)
#> Metric summary:
#> slope_10: mean=1.55, sd=0.65, range=[0.39, 3.40]
#> half_richness: mean=9.56, sd=4.39, range=[2.00, 19.00]
#> auc: mean=32.99, sd=1.58, range=[28.74, 36.17]The per-site values are stored in met$metrics, a data
frame carrying the coordinates and a column for each metric, which makes
it directly mappable and joinable.
head(met$metrics[, c("site_id", "x", "y", "slope_10", "half_richness", "auc")], 4)
#> site_id x y slope_10 half_richness auc
#> 1 1 91.48060 58.16040 0.9285714 10 34.2625
#> 2 2 93.70754 15.79052 1.1666667 8 33.6500
#> 3 3 28.61395 35.90283 1.0952381 11 34.5125
#> 4 4 83.04476 64.56319 1.0000000 15 31.0625Spatial heatmap of the initial accumulation slope per site.
A steep initial slope marks a site whose neighbourhood adds species
fast: many distinct species are packed nearby, the spatial signature of
a diversity hotspot. A low half_richness flags a site from
which half the species pool is reachable within a short accumulation,
another hotspot indicator. Reading the heatmap, the clusters of high
slope coincide with the corners where endemics were seeded, since those
tight ranges concentrate species turnover. These maps turn an abstract
curve into a spatial prioritisation layer: rank sites by slope or
inverse half-richness and the top entries are candidate priority
areas.
The three metrics capture different stages of the curve and need not agree. The slope reads the steepness at the origin, sensitive to immediate neighbours. Half-richness reads the middle, where the curve crosses the halfway mark and turnover slows. The AUC integrates the whole curve and rewards sites that accumulate quickly and reach a high plateau. A site can have a steep slope but a high half-richness if its rich neighbourhood is small and the curve then stalls. Comparing the maps of the three metrics shows whether the hotspots are local peaks or extended rich zones, a distinction that matters when a reserve must enclose a contiguous area rather than a single point.
The metrics object exports to sf for use in a GIS workflow or for spatial joins against habitat layers. The conversion is guarded because sf is a Suggests dependency.
spatialEigenvectors() builds the MEM basis from
coordinates alone. The print method reports the truncation threshold,
the number of eigenvectors retained, and the range of Moran’s I across
them.
mem <- spatialEigenvectors(coords)
mem
#> Spatial eigenvectors (PCNM): 80 sites
#> Threshold: 17.5122
#> Eigenvectors retained: 50
#> Moran's I range: [-0.168, 1.419]The summary() method tabulates each eigenvector’s
eigenvalue, Moran’s I, and share of total variance. The leading vectors
carry the broad-scale, strongly autocorrelated patterns; later vectors
describe progressively finer structure.
head(summary(mem), 5)
#> vector eigenvalue moran_i variance_explained
#> 1 MEM1 22573.08 1.4189143 0.09330907
#> 2 MEM2 18796.70 1.1530522 0.07769883
#> 3 MEM3 16042.37 0.9561648 0.06631344
#> 4 MEM4 14292.83 0.8373798 0.05908148
#> 5 MEM5 13568.98 0.7816404 0.05608932First spatial eigenvectors mapped over the landscape.
Mapping the first few eigenvectors shows the scales explicitly: MEM1
splits the region into broad halves, later vectors carve it into finer
tiles. To learn how much of a diversity pattern is spatially structured,
spatialPartition() regresses a per-site response on the
MEMs with forward selection, then reports the spatial and non-spatial
shares of variance. The response here is the per-site initial slope from
the metrics object.
slope <- met$metrics$slope_10
part <- spatialPartition(slope, mem)
part
#> Spatial Variance Partitioning
#> -----------------------------------
#> Spatial R-squared: 0.716
#> Non-spatial: 0.284
#> MEMs selected: 15
#> Selected: MEM15, MEM5, MEM2, MEM25, MEM8, MEM23, MEM36, MEM21, MEM14, MEM37, MEM39, MEM41, MEM10, MEM35, MEM33Variance in per-site slope split into spatial and non-spatial components.
The spatial R-squared is the fraction of slope variation explained by the selected eigenvectors, and the selected MEMs name the scales that carry it. A large spatial share with mostly low-numbered MEMs means the hotspots are organised at broad scales; a share spread across high-numbered MEMs means the structure is patchy and fine-grained. Partitioning this way separates the question “is the pattern spatial” from “is the pattern explained by a covariate”, and the two analyses can be combined to ask how much spatial structure survives after accounting for an environmental predictor.
Forward selection guards against overfitting. It adds eigenvectors
one at a time, keeping each only when it improves the model fit by
enough to justify the extra parameter, and stops when no remaining
vector helps. The result is a parsimonious set rather than the full
basis, so the reported spatial R-squared reflects structure the data
actually support. Passing forward = FALSE includes every
eigenvector and inflates the fit, which is appropriate only when the
goal is to remove all spatial signal rather than to describe it. The
selected MEMs are worth inspecting individually: their maps name the
spatial scales at which the response is organised, turning a single
R-squared into a scale-resolved account of where the pattern lives.
wavefront() accumulates species inside an expanding
radius from seed points, a model of spread from an introduction front.
As a diagnostic it answers a different question than the
nearest-neighbour curve: how richness grows with geographic reach rather
than with site count, which exposes how clumped the species pool is
around typical starting points. The method originates in invasion
biology, where it describes the species an organism encounters as its
range front sweeps outward, but it serves equally as a spatial
diagnostic for any dataset where distance from a point is the relevant
axis.
wf <- wavefront(species, coords, n_seeds = 20, n_steps = 40,
progress = FALSE, seed = 1)
wf
#> spacc wavefront: 80 sites, 40 species, 20 seeds
#> Radius: 0.00 to 124.00 (40 steps)Species captured against expanding radius from seed points.
The curve rises steeply at small radii when seeds land in
species-dense neighbourhoods, then flattens as the front reaches sparse
ground. Plotting against xaxis = "sites" instead recovers a
site-based accumulation, so the two axes give the geographic and the
sampling view of the same expansion. A front that saturates early
relative to its maximum radius indicates a pool concentrated near the
seeds, the spatial signature the endemism curve also detects.
The radius step dr defaults to the maximum pairwise
distance divided by n_steps, so the front sweeps the whole
region in the requested number of increments. The contrast between the
radius axis and the site axis is the diagnostic content. A curve that
climbs fast on the radius axis but slow on the site axis means each unit
of distance brings in many sites, the signature of a dense, evenly
covered network. The opposite pattern, slow on radius and fast on sites,
marks a network with large gaps the front must cross before reaching the
next cluster. Read this way, the wavefront measures how the spatial
layout of the sampling, not just the species, shapes the pace of
discovery.
Sites clustered in space inflate the apparent accumulation rate and
bias macroecological fits, because nearby sites share species.
subsample() thins the network to reduce that
autocorrelation. Grid thinning overlays a mesh of a chosen cell size and
keeps one random site per occupied cell.
keep <- subsample(coords, method = "grid", cell_size = 20, seed = 1)
length(keep)
#> [1] 24
sac_full <- spacc(species, coords, n_seeds = 20, progress = FALSE, seed = 1)
sac_thin <- spacc(species[keep, ], coords[keep, ], n_seeds = 20,
progress = FALSE, seed = 1)
combined <- c(full = sac_full, thinned = sac_thin)Accumulation curve before and after grid thinning.
The thinned curve sits above the full curve at equal site counts: with redundant near-neighbours removed, each retained site adds more new species, so the per-site accumulation rate climbs. The two curves converge at their right ends toward total richness, since thinning drops sites, not species. For SAR and SFAR fits, the thinned set gives a less biased area exponent because it breaks the spatial redundancy that otherwise flattens the curve. Thinning gives up sample size to gain independence, and the cell size sets the balance directly.
Grid thinning is one of three methods the function offers. Random
subsampling takes a simple sample of a target size and ignores
geography, useful only for matching sample sizes across datasets.
Distance thinning removes sites greedily until no two retained points
fall closer than min_dist, dropping the member of each
close pair that has more near neighbours, which gives finer control than
a grid when the sampling is clustered irregularly. The grid method is
the fastest and the most predictable: one site per occupied cell, with
the cell size read directly off the autocorrelation scale. Whichever
method is chosen, the return value is a vector of row indices, so the
same indices subset both the species matrix and the coordinates and keep
them aligned by position.
The spatial model family rewards attention to sample size and to whether the question matches the method. The rules below use the conventions of this package and the cited literature.
Rules of thumb.
cell_size near the spatial
scale of autocorrelation. A cell smaller than the nearest-neighbour
spacing keeps nearly every site and removes no redundancy; a cell
several times the spacing discards most of the data.Minimum sample sizes.
| Method | Minimum input | Comfortable input |
|---|---|---|
| Endemism curve | 20 sites | 40 or more sites |
| SFAR | 4 fragments | 8 to 10 fragments |
| SESARS | effort spanning 10x | effort spanning 30x+ |
| MEM partition | 20 sites | 50 or more sites |
| Per-site metrics | 30 sites | 50 or more sites |
| Wavefront | 30 sites | 50 or more sites |
Choosing a method.
| Question | Method |
|---|---|
| Which sites hold species found nowhere else | spaccEndemism |
| Does fragmentation cost richness beyond area | sfar |
| Is richness biased by uneven survey effort | sesars |
| Where are the accumulation hotspots | spaccMetrics |
| At what scale is the diversity pattern organised | spatialEigenvectors + spatialPartition |
| How does richness grow with geographic reach | wavefront |
| How to break spatial autocorrelation | subsample |
When not to use each.