ELAplus provides tools for Energy Landscape Analysis (ELA), a framework for analyzing dynamical systems represented as weighted networks. The package is designed primarily for the analysis of ecological community composition data, but the framework is applicable to other multistable systems.
The typical workflow consists of:
- Fitting community composition data to a (possibly extended) pairwise maximum entropy model
- Constructing an energy landscape from the fitted model
- Identifying stable states and transition structures
- Visualizing the resulting landscape and interactions
The theoretical background is described in Suzuki et al. (2021).
This vignette demonstrates a standard ELA workflow using test dataset.
baseabtable: samples -
species abundance table (data.frame or matrix).
basemetadata (optional): a samples -
factors metadata table (data.frame or matrix).
baseabtable)# load package
library(ELAplus)
# load test data set
# species abundance table
data("baseabtable")
head(baseabtable)
#> species.1 species.2 species.3 species.4 species.5 species.6 species.7
#> sample.1 2384 0 1313 8218 0 3529 0
#> sample.2 10343 0 2201 6814 0 2887 0
#> sample.3 4473 0 614 2046 4455 3429 0
#> sample.4 8786 0 5517 328 0 15 0
#> sample.5 10640 0 2321 6652 0 2642 0
#> sample.6 0 0 165 7794 1697 3816 0
#> species.8 species.9 species.10 species.11 species.12 species.13
#> sample.1 0 3938 120 4156 0 3223
#> sample.2 6944 4907 0 0 0 0
#> sample.3 1503 8319 0 0 0 10433
#> sample.4 5372 10032 1722 0 0 5001
#> sample.5 7305 4858 0 0 0 0
#> sample.6 0 7230 2130 2267 0 6102
#> species.14 species.15 species.16
#> sample.1 7150 3122 0
#> sample.2 7298 0 0
#> sample.3 5080 0 0
#> sample.4 6117 0 0
#> sample.5 7020 0 0
#> sample.6 9944 0 0
# metadata (environmental factors)
data("basemetadata")
head(basemetadata)
#> factor.1 factor.2
#> sample.1 -0.92 2.7969211
#> sample.2 0.34 2.1470125
#> sample.3 -0.92 1.6061136
#> sample.4 -0.24 0.5847647
#> sample.5 0.40 1.7316138
#> sample.6 -0.82 2.9597175The simulated community composition data are also available.
In this example, the data comprising 24 species and
512 samples were generated through 100
iterations of Gibbs sampling (heat-bath algorithm).
Both implicit/explicit environmental factors (h.act and
g.act) are considered. The number of explicit environmental
factors is specified by ne (2 factors in this
example).
j.act is the interaction matrix.
# load package
library(ELAplus)
# Generate random parameters
hb_params <- hb.paramgen(24, ne = 2) # the number of environmental factors
h.act <- hb_params[[1]]
j.act <- hb_params[[2]]
g.act <- hb_params[[3]]
# Simulate community composition data
rand_data <- HeatBath(100, 512, h.act, j.act, g = g.act)
rand_mat <- rand_data[[1]]
rand_enmat <- rand_data[[2]]ELAplus expects the community data in a standardized representation.
Formatting() converts raw input tables into aligned
occurrence (binary) matrix and (optionally) environmental factor
matrix.
baseabtable: samples -
species abundance table (data.frame or
matrix).
basemetadata (optional): a samples
- factors metadata table (data.frame or matrix).
normalize = 1, the abundances within each sample are
normalized so that each row sums to 1. This is useful when:
normalize = 0, the original scale is preserved.Formatting() creates the occurrence matrix by applying
the first element of parameters:parameters[1] (“0/1 threshold”): a numeric cutoff used
to define presence.
parameters[1] = 0.05 means “abundance >
0.05 is present; otherwise absent”.Species that are too rare or almost always present typically add little information or inflate the results in construction of energy landscape.
Formatting() therefore filters species based on
prevalence across samples using:
parameters[2] (“min occurrence threshold”)
parameters[3] (“max occurrence threshold”)
These are interpreted as proportions of samples in which the species is present after binarization.
Example:
parameters = c(0.01, 0.01, 0.99) means:formatted <- Formatting(baseabtable = baseabtable, basemetadata = basemetadata,
normalize = 1, parameters = c(0.01, 0.01, 0.99))
#> Processed 256 samples.
#> Relative abundance threshold = 0.01
#> Occurrence threshold (lower) = 0.01
#> Occurrence threshold (upper) = 0.99
#> Selected 16 out of 16 species.
ocvecs <- formatted[[1]] # community composition binary matrix
abvecs <- formatted[[2]] # community abundance matrix (not binarized)
envecs <- formatted[[3]] # environmental variables (optional)The energy \(E(\sigma^{(k)},\varepsilon)\) of each community composition \(\sigma^{(k)}\) is assigned by imposing the potential structure to the state space by introducing the extended pairwise maximum entropy model. Briefly, the energy is determined by three parameters. For details, please see Appendix.
h: Implicit (unobserved) environmental parameter for
each species
J: Species-species interaction matrix
g: Explicit environmental parameters for each
species
Fitting the parameters of the pairwise maximum entropy model is performed using a stochastic approximation algorithm.
We approximate the intractable model expectation in the likelihood gradient with a persistent (warm-started) Gibbs/heat-bath chain advanced by one sweep per parameter iteration, corresponding to persistent contrastive divergence with one sweep (PCD-1), also known as stochastic maximum likelihood (SML).
It repeatedly simulates data under the current parameters and then
updates h, J, and g to reduce the
discrepancies between observed vs simulated metrics.
Two observed co-occurrence statistics are used for the fitting.
Species–species co-occurrence: \(y = ocvecs^{T}ocvecs\)
Environment–species co-occurrence: \(y^{env} = envecs^{T}ocvecs\)
Then, the algorithm repeatedly updates the parameters by comparing
Each parameter independently updated by those metrics:
\[ \Delta{h} \propto diag(y_{diff}) \] \[ \Delta{J} \propto y_{diff} \]
\[ \Delta{g} \propto y^{env}_{diff} \]
For simplicity, the usage of ELA without explicit environmental
parameters (envecs) is first explained.
Fitting the pairwise maximum entropy model with runSA()
depends on several optimization configurations, and the best settings
can vary across datasets (number of species, sample size, sparsity).
ELAplus provides a dedicated grid-search routine to choose hyperparameters empirically.
Findbp() performs a grid search over three key
hyperparameters:
-lmd : a sparsity penalty to encourage
sparse matrix. Larger values typically lead to fewer nonzero
parameters.
-we : AdamW weight decay, a form of L2
regularization applied during optimization.
-maxlrs : maximum learning rate in
fitting algorithm (SML).
bpresult <- Findbp(ocvecs, rep = 2, threads = 2,
cvmode="10fold",nrepeat = 10, we=c(0.001),totalit=4000,
lmd=c(1e-05,1e-03,0.01),maxlrs = c(0.005),
tol=0.03, runadamW=TRUE,sparse=TRUE,fastfitting=TRUE)
#Start hyper-parameter search:
#There are 300 tasks
#Finished 300 tasks
#SA: elapsed time 15.08 sec
bp <- bpresult[[1]] # typically best bpresult
bpm <- as.numeric(unlist(strsplit(names(bp)[1], split = "_")))
allresults <- bpresult[[2]]1. Splits samples into training and test subsets
nrepeat times), and each fold is used as the test
set while the remaining folds are used for training.2. Fits the model on the training subset for each grid point
runSA().3. Evaluates predictive performance on held-out samples
fastfitting=TRUE, the function returns candidates
with the lowest iterations within tol of the best observed
error.plotSAtest.The best parameters found by Findbp() are used for real
analyses by runSA().
After fitting the (extended) pairwise maximum entropy model with
runSA(), the next step is to construct energy landscape
over community compositions using the fitted parameters.
In ELAplus, this is performed by ELA(), which
identifies:
1. Stable states Here, the base of the basin (with minimal energy) is regarded as a stable state, which is distinct from the equilibrium point of the dynamical system or the attractor. For details, see Suzuki et al. (2021),
2. Tipping points (energy barriers connecting pairs of basins/stable states). A connecting pair of basins have basin boundary and the point with the lowest energy barrier is regarded as tipping point here.
The fitted model defines an energy value for any community composition (typically a binary presence/absence vector of community members).
After stable states and tipping points are computed,
ELA() summarizes and standardizes the outputs. This
conversion step produces a compact representation of the energy
landscape structure:
bin2id())The output format is compatible to downstream steps such as pruning (ELPruning) and visualization (e.g., disconnectivity graphs).
ela[[1]] # stable state IDs
#> [1] "09x" "01t" "EWB" "EY8" "1uV" "1mL" "17s"
ela[[2]] # stable state energies
#> [1] -11.099418 -10.297931 -9.544050 -8.961634 -8.005040 -6.440385 -6.269140SSestimate)SS.itr
(default 20000).ELA()
evaluates transitions between them by searching for tipping points
between all pairs of stable states.FindingTip.itr. As with SS.itr, increasing
FindingTip.itr improves robustness at the cost of runtime.
This step can be computationally heavy because it is performed across
stable-state pairs.The energy landscape constructed by ELA() first
identified stable states as much as possible. Some of these stable
states correspond to very shallow basins.
Such stable states are often sensitive to noise and does not largely explain dominant multistability of the system.
ELPruning() addresses this issue by iteratively pruning
shallow basins and merging them into deeper, neighboring basins.
elap <- ELPruning(ela, th=0.1, type="relative")
#> Start pruning:
#> *...
#> ELPruning: elapsed time 0.05 secELPruning() returns a list with two elements:
[[1]] Pruned ELA object (same format as the output of
ELA()):
elap[[1]][[1]] # updated stable state IDs
#> [1] "09x" "EWB" "EY8" "1uV"
elap[[1]][[2]] # updated stable state energies
#> [1] -11.099418 -9.544050 -8.961634 -8.005040[[2]] Stable-state mapping table
For each stable state, a basin depth is defined as the energy difference between the energy of the stable state itself and the lowest tipping point energy connecting that stable state to any other stable state (\(\Delta E\)).
The basin depth measures how “robust” a stable state is against transitions: deeper basins correspond to more robust, while shallow basins correspond to more unstable states.
In ELPruning(), there are two methods
relative or bootstrap.
If type="relative" is set, th (relative
depth threshold) specifies the minimum acceptable energy barrier
relative to the deepest basin barrier. \[
\Delta E_{\mathrm{min}} < th * \Delta E_{\mathrm{max}}
\] For example, th = 0.2 means that basins shallower
than 20% of the deepest basin (\(\Delta
E_{\mathrm{max}}\)) are pruned. The results of pruning is usually
sensitive to this value.
Alternatively, users can perform pruning of stable states by calculating vulnerabilities in bootstrap samples in each stable state.
This method is explained in the next section:Pruning with bootstrap resampling.
A disconnectivity graph is a compact visualization of the global structure of an energy landscape.
The reliability of stable states can be also examined by the
bootstrap resampling of training data sets (the occurrence matrix). The
function, bootstrap_SA(), performs parameter fitting with
bootstrap resampling of the data sets.
bootstrap_ELA() uses the results of
bootstrap_SA() (i.e., estimated parameters: h, J) for
computing the variability of energy gaps between tipping points and
stable states arising from uncertainty due to finite sample size.
If the estimated stable state is statistically reliable, the estimated \(\Delta E\) is expected to be consistently smaller than 0 across bootstrap resamples of the training data.
Therefore, the probability of bootstrap samples with \(\Delta E < 0\) reflects the statistical significance of the estimated stable state. Here, this quantity is referred to as a “bootstrap-approximated p-value” and is used to assess statistical reliability.
As an example, the distribution of \(\Delta E\) between “01t” and “09/” is shown as a histogram. More than 25% of bootstrap samples yield \(\Delta E > 0\) (i.e., p-val > 0.25), and this case is regarded as statistically not reliable.
If bootstrap_ene is specified in showDG(),
the function displays an approximate p-value in each stable state. This
option is helpful when you want to see uncertainty in estimated energies
in ELA.
bs_params <- bootstrap_SA(ocvecs, rep = 16, threads=8, bootstrap.it = 1000,
lambda = bpm[[1]],we = bpm[[2]],
maxlr = bpm[[3]], totalit = bpm[[4]])
bootstrap <- bootstrap_ELA(bs_params,ocvecs,ela=ela,threads = 1)
ela_rep <- bootstrap[[1]]
bootstrap_ene <- bootstrap[[2]]
showDG(ela_rep,ocvecs,bootstrap_ene = bootstrap_ene, min_pval = 0.01,
scale_labels = c(0.05,0.1,0.2,0.5))The pruning stable states is also possible using the bootstrap results.
The ELPruning() with type="bootstrap"used
the distribution of \(\Delta E\) in
bootstrap samples and evaluated whether the predicted stable states are
statistically reliable.
If the p-value is above threshold (e.g., 0.05), the function prunes such unreliable and shallow basins (stable states) and merges them into deeper, neighboring basins. The pruned graph is follows.
elap_rep <- ELPruning(ela, type = "bootstrap",bootstrap_ene=bootstrap_ene,lower_qr = 0.25)
showDG(ela_rep,ocvecs,bootstrap_ene = bootstrap_ene, min_pval = 0.01,
scale_labels = c(0.05,0.1,0.2,0.5))If getGraohObj=TRUE, showDG() function
returns a data frame for plotting a disconnectivity graph. In the
following example, disconnectivity graph is drawn using
ggplot(). Tipping points and stable states are shown as
triangles and circles, respectively, and colored by energies.
grobj_to_plot <- showDG(ela_rep,ocvecs,bootstrap_ene = bootstrap_ene,getGraphObj = TRUE)
grobj_point <- grobj_to_plot[grobj_to_plot$point_str != "",]
g <- ggplot(
grobj_to_plot,
aes(x=.data$nodes2xposi, y=.data$energy, label=.data$point_str)) +
xlab("") + geom_text(hjust=0.75, vjust=2, aes(fontface=2)) + geom_path() +
geom_point(
data = grobj_point,
mapping = aes(
x = .data$nodes2xposi,
y = .data$energy,
color = .data$energy,
shape = .data$type,
),size = 4
) +
scale_color_viridis_c(option = "plasma",alpha=0.8,
direction=-1,na.value = "grey") +
coord_cartesian(xlim=c(0.75,7.5), ylim=c(-5.8,-11.5))
plot(g)showIntrGraph() visualizes the pairwise interaction
structure inferred by the fitted model in a 2D “interaction space”.
It extracts estimated parameters from the resulting interaction matrix (jest).
th are classified as positive,
while links with value < -th are classified as
negative.The plot draws these two link types separately and colors them differently (positive = red, negative = blue).
The stability structure of a community is not fixed, but changes continuously along an environmental gradient (e.g., temperature, nutrient availability, disturbance intensity).
GradELA() is designed to capture this dependence by
estimating energy landscapes repeatedly across a controlled range of an
environmental factor and summarizing how stable states and transition
barriers change.
When environmental variables (envecs) are included,
ELAplus supports Gradient Energy Landscape Analysis via
GradELA(). The parameter fitting process is same as that
for normal ELA except for specifying enmat.
bpresult <- Findbp(ocvecs, enmat = envecs, rep = 2, threads = 2,
cvmode="10fold",nrepeat = 10, we=c(0.001),totalit=4000,
lmd=c(1e-05,1e-03,0.01),maxlrs = c(0.005),
tol=0.03, runadamW=TRUE,sparse=TRUE,fastfitting=TRUE)
bp <- bpresult[[1]]
bpm <- as.numeric(unlist(strsplit(names(bp)[1], split = "_")))
sa <- runSA(
ocvecs,
enmat = envecs,
rep = 16,
getall = FALSE,
lambda = bpm[[1]],
we = bpm[[2]],
maxlr = bpm[[3]],
totalit = bpm[[4]]
)GradELA() proceeds in three conceptual stages:
eid is varied over a specified
range and steps.
eid is environmental factor for changing (one column in
the environmental matrix enmat).range is the minimum and maximum values of the
environmental factor.steps is how many discrete points are evaluated.ELA()
and ELPruning() are performed.env (it corresponds to a row
in enmat) as follows:
env is provided, it is used as the reference vector
for all environmental factors except eid.env is not provided, the mean vector of
enmat is used automatically.# This process takes few minutes with a small number of threads, and here pre-computed results are shown.
gela <- GradELA(
sa = sa,
eid = "factor.1",
enmat = envecs,
env = NULL,
range = c(-1, 1),
steps = 32,
th = 0.05,
threads = 8,
pruning_type = "relative",
bs_params = bs_params,
ocmat = ocvecs)Users can use bootstrap samples in ELPruning process by
pruning_type="bootstrap". Other options are equivalent to
ELPruning().
# This process takes few minutes with a small number of threads, and here pre-computed results are shown.
bs_params <- bootstrap_SA(ocvecs, enmat = envecs,rep = 16, threads=8,bootstrap.it = 1000,
lambda = bpm[[1]],we = bpm[[2]],
maxlr = bpm[[3]], totalit = bpm[[4]])
gela <- GradELA(
sa = sa,
eid = "factor.1",
enmat = envecs,
env = NULL,
range = c(-1, 1),
steps = 32,
lower_qr = 0.25,
threads = 8,
pruning_type = "bootstrap",
bs_params = bs_params,
ocmat = ocvecs)GradELA() returns a list with three elements:
1. Pruned ELA results for each step
ELA() output, after pruning).2. Environmental value sequence
eid).3. Reference environmental vector
The resulting object can then be plotted using
showSSD(), which visualize the regime shift: how the
number, identity, and energy of stable states change along the
environmental gradient.
If users want to modify the figure configuration of the diagram,
showSSD_ggplot() function returns a data frame for
plotting. Then users can the configuration manually using
ggplot().
ssd_df <- showSSD_ggplot(gela,plot = FALSE,getSSDobj = TRUE)
g <- ggplot(ssd_df, aes(x = env, y = Energy, color = SS_ID)) +
geom_point(shape = 19) +
geom_line(aes(group = SS_ID))showGELA3D() visualizes a smoothed energy landscape
surface produced by GELAObj() and overlays stable-state
trajectories.
GradELA().# This process takes few minutes, and here pre-computed result is drawn.
gelaobj <- GELAObj(gela, sa=sa,threads=2)
showGELA3D(gelaobj,theta = 30,phi = 30)Overview of extended maximum pairwise entropy model
The model determines the probability of the occurrence of community composition \(\sigma^{(k)}\) in an environmental condition \(\varepsilon = (\varepsilon_{1}, \varepsilon_{2}, ..., \varepsilon_{M})\), environmental factors such as pH or temperature.
\[ E(\sigma^{(k)},\varepsilon) = - \sum_{i=1}^{S} h_i \sigma_i^{(k)} - \sum_{j=1}^{S} \sum_{i=1}^{M} g_{ij} \varepsilon_i^{(k)} \sigma_j^{(k)} - \sum_{i=1}^{S} \sum_{j=1,i \neq j}^{S} J_{ij} \sigma_i^{(k)} \sigma_j^{(k)} \] where: