Title: | Vegetation Patterns |
---|---|
Description: | Find, visualize and explore patterns of differential taxa in vegetation data (namely in a phytosociological table), using the Differential Value (DiffVal). Patterns are searched through mathematical optimization algorithms. Ultimately, Total Differential Value (TDV) optimization aims at obtaining classifications of vegetation data based on differential taxa, as in the traditional geobotanical approach. The Gurobi optimizer, as well as the R package 'gurobi', can be installed from <https://www.gurobi.com/products/gurobi-optimizer/>. The useful vignette Gurobi Installation Guide, from package 'prioritizr', can be found here: <https://prioritizr.net/articles/gurobi_installation_guide.html>. |
Authors: | Tiago Monteiro-Henriques [aut, cre] , Jorge Orestes Cerdeira [aut] , Fundação para a Ciência e a Tecnologia, Portugal [fnd] (<https://www.fct.pt/>) |
Maintainer: | Tiago Monteiro-Henriques <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.1.0 |
Built: | 2024-12-13 06:34:01 UTC |
Source: | CRAN |
Given a big phytosociological data set represented as a list, and a partition of the relevés in that list, this function calculates the respective Total Differential Value (TDV).
bigdata_tdv( phyto_list, p, n_rel, output_type = "normal", parallel = FALSE, mc_cores = getOption("mc.cores", 2L) )
phyto_list |
A list. This is a very light representation of what could
be a usual phytosociological table, registering only taxa presences. Each
component should uniquely represent a taxon and should contain a vector (of
numeric values) with the relevé(s) id(s) where that taxon was observed.
Relevé's ids are expected to be represented by consecutive integers,
starting with 1. The components of the list might be named (e.g. using the
taxon name) or empty (decreasing further memory burden). However, for
|
p |
A vector of integer numbers with the partition of the relevés (i.e.,
a k-partition, consisting of a vector with values from 1 to k, with length
equal to the number of relevés in |
n_rel |
The number of relevés in the |
output_type |
A character determining the amount of information returned by the function and also the amount of pre-validations. Possible values are "normal" (the default) and "fast". |
parallel |
Logical. Should function |
mc_cores |
The number of cores to be passed to |
This function accepts a list (phyto_list) representing a phytosociological data set, as well as a k-partition of its relevés (p), returning the corresponding TDV (see tdv() for an explanation on TDV).
Partition p gives the group to which each relevé is ascribed, by increasing order of relevé id.
Big phytosociological tables can occupy a significant amount of computer
memory, mostly because the absences (usually more frequent than presences)
are also recorded in memory. Using a list that stores only the presences
significantly reduces both the memory needed to hold all the information
that a phytosociological table contains and the computation time of TDV,
allowing computations for big data sets.
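The presence-only representation can be illustrated with a small base R sketch. The toy matrix, the partition and the object name presences_by_group are assumptions made for illustration; the sketch only shows how per-group presence counts can be read from such a list, not how bigdata_tdv() is implemented.

```r
# Toy presence/absence table: 4 taxa (rows) x 6 relevés (columns).
m_bin <- matrix(
  c(1, 1, 0, 0, 0, 0,
    0, 0, 1, 1, 0, 0,
    0, 0, 0, 0, 1, 1,
    1, 0, 1, 0, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("taxon_", 1:4), NULL)
)

# Presence-only list: one component per taxon, holding the relevé ids
# (column indices) where the taxon occurs. lapply() always returns a list.
phyto_list <- lapply(
  seq_len(nrow(m_bin)),
  function(i) which(m_bin[i, ] == 1)
)
names(phyto_list) <- rownames(m_bin)

# A 3-partition of the six relevés, by increasing relevé id.
p <- c(1, 1, 2, 2, 3, 3)
k <- max(p)

# Number of presences of each taxon in each group, computed directly
# from the list (absences are never stored or visited).
presences_by_group <- t(sapply(
  phyto_list,
  function(ids) tabulate(p[ids], nbins = k)
))
presences_by_group
```

lapply() is used here instead of apply() because it returns a list even when all taxa happen to occur in the same number of relevés.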
If output_type = "normal" (the default), pre-validations are done (which can take some time) and a list is returned, with the following components (see tdv() for the mathematical notation):
A matrix with the values for each taxon in each group, for short called the 'inner frequency of presences'.
A matrix with the values for each taxon in each group, for short called the 'outer frequency of differentiating absences'.
A vector with the values for each taxon, i.e., the number of groups containing that taxon.
A matrix with the DiffVal for each taxon.
A numeric with the TDV of matrix m_bin, given the partition p.
If output_type = "fast", only TDV is returned and no pre-validations are done.
Tiago Monteiro-Henriques. E-mail: [email protected].
# Getting the Taxus baccata forests data set
data(taxus_bin)

# Creating a group partition, as the one presented in the original article of
# the data set
groups <- rep(c(1, 2, 3), c(3, 11, 19))

# Removing taxa occurring in only one relevé, in order to reproduce exactly
# the example in the original article of the data set
taxus_bin_wmt <- taxus_bin[rowSums(taxus_bin) > 1, ]

# Calculating TDV using tdv()
tdv(taxus_bin_wmt, groups)$tdv

# Converting from the phytosociologic matrix format to the list format
taxus_phyto_list <- apply(taxus_bin_wmt, 1, function(x) which(as.logical(x)))

# Getting the number of relevés in the list
n_rel <- length(unique(unlist(taxus_phyto_list)))

# Calculating TDV using bigdata_tdv(), even if this is not a big matrix
bigdata_tdv(
  phyto_list = taxus_phyto_list,
  p = groups,
  n_rel = n_rel,
  output_type = "normal"
)$tdv
This function plots an interactive image of a tabulation.
explore_tabulation(tab, palette = "Vik")
tab |
A list as returned by the |
palette |
A character with the name of the colour palette (one of
|
The function explore_tabulation() accepts an object returned by the tabulation() function and plots a condensed image of the respective tabulated matrix. The user can click on the coloured blocks to receive the respective list of taxa names on the console.
Returns invisibly, although it prints taxa names on the console whenever the user clicks on the figure.
Tiago Monteiro-Henriques. E-mail: [email protected].
# Getting the Taxus baccata forests data set
data(taxus_bin)

# Creating a group partition, as presented in the original article of
# the data set
groups <- rep(c(1, 2, 3), c(3, 11, 19))

# Removing taxa occurring in only one relevé in order to
# reproduce exactly the example in the original article of the data set
taxus_bin_wmt <- taxus_bin[rowSums(taxus_bin) > 1, ]

# Sorts the phytosociological table, putting exclusive taxa at the top and
# plots an image of it
tabul <- tabulation(
  m_bin = taxus_bin_wmt,
  p = groups,
  taxa_names = rownames(taxus_bin_wmt),
  plot_im = "normal",
  palette = "Zissou 1"
)

# This creates an interactive plot (where you can click)
if (interactive()) {
  explore_tabulation(tabul, palette = "Zissou 1")
}
Checks if two vectors represent the same k-partition.
identical_partition(p1, p2)
p1 |
A vector of integers representing a k-partition (taking values
from 1 to k), of the same length of |
p2 |
A vector of integers representing a k-partition (taking values
from 1 to k), of the same length of |
Parameters p1 and p2 are vectors indicating group membership. In the context of this package, these vectors have as many elements as the columns of a phytosociological table, indicating the membership of each relevé in one of k groups (i.e., a k-partition).
This function checks whether the two given vectors p1 and p2 correspond, in practice, to the same k-partition, i.e., whether the relevé groups are actually the same even if the group numbers are swapped.
TRUE if p1 and p2 represent the same k-partition; FALSE otherwise.
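One way to picture such a check, as a sketch only: relabel the groups of each vector by order of first appearance and compare the results. The helper same_partition() below is hypothetical and is not claimed to be how identical_partition() works internally.

```r
# Hypothetical helper (not the package's code): TRUE when p1 and p2 define
# the same groups of relevés, regardless of how the groups are numbered.
same_partition <- function(p1, p2) {
  stopifnot(length(p1) == length(p2))
  # Relabel groups by order of first appearance: match(p, unique(p)) gives
  # label 1 to the first group met, 2 to the second, and so on.
  canon1 <- match(p1, unique(p1))
  canon2 <- match(p2, unique(p2))
  identical(canon1, canon2)
}

par1 <- c(1, 1, 2, 2, 2)
par2 <- c(2, 2, 1, 1, 1)
par3 <- c(1, 1, 1, 2, 2)

same_partition(par1, par2)
same_partition(par1, par3)
```

Two vectors that group the relevés identically always produce the same relabelled vector, so comparing the relabelled vectors is enough.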
Tiago Monteiro-Henriques and Jorge Orestes Cerdeira. E-mail: [email protected].
# Creating three 2-partitions
par1 <- c(1, 1, 2, 2, 2)
par2 <- c(2, 2, 1, 1, 1)
par3 <- c(1, 1, 1, 2, 2)

# Is it the same partition?
identical_partition(par1, par2) # TRUE
identical_partition(par1, par3) # FALSE
identical_partition(par2, par3) # FALSE
Given a phytosociological matrix, this function finds a partition in two groups of the matrix columns, which maximizes the Total Differential Value (TDV).
optim_tdv_gurobi_k_2(m_bin, formulation = "t-dependent", time_limit = 5)
m_bin |
A matrix. A phytosociological table of 0s (absences) and 1s (presences), where rows correspond to taxa and columns correspond to relevés. |
formulation |
A character selecting which formulation to use. Possible values are "t-dependent" (the default) or "t-independent". See Details. |
time_limit |
A numeric ("double") with the time limit (in seconds) to be passed as a parameter to Gurobi. Defaults to 5 seconds, but see Details. |
Given a phytosociological table m_bin (rows corresponding to taxa and columns corresponding to relevés) this function finds a 2-partition (a partition in two groups) that maximizes TDV, using the Gurobi optimizer.
Gurobi is commercial software, for which a free academic license can be obtained if you are affiliated with a recognized educational institution. Package 'prioritizr' contains a comprehensive vignette (Gurobi Installation Guide), which can guide you through the process of obtaining a license, installing the Gurobi optimizer, activating the license and eventually installing the R package 'gurobi'.
optim_tdv_gurobi_k_2() returns, when the optimization is successful, a 2-partition which is a global maximum of TDV among all 2-partitions of the columns of m_bin.
See tdv() for an explanation on the Total Differential Value of a phytosociological table.
The function implements two different mixed-integer linear programming formulations of the problem. The formulations differ as one is independent of the size of the obtained groups (t-independent), while the other formulation fixes the size of the obtained groups (t-dependent). The t-dependent formulation is implemented to run Gurobi as many times as necessary to cover all possible group sizes; this approach can result in faster total computation time.
Even for medium-sized matrices the computation time might become prohibitive, thus the use of a time limit (time_limit) is advisable.
For formulation = "t-dependent", a list with the following components:
A character vector with Gurobi output status for all the runs.
A numeric with the maximum TDV found by Gurobi.
A vector with the 2-partition corresponding to the maximum TDV found by Gurobi.
For formulation = "t-independent", a list with the following components:
A character with Gurobi output status.
A numeric with the maximum TDV found by Gurobi.
A vector with the 2-partition corresponding to the maximum TDV found by Gurobi.
Jorge Orestes Cerdeira and Tiago Monteiro-Henriques. E-mail: [email protected].
# Getting the Taxus baccata forests data set
data(taxus_bin)

# Obtaining the 2-partition that maximizes TDV using the Gurobi solver, by
# mixed-integer linear programming
## Not run:
# Requires the suggested package 'gurobi'
optim_tdv_gurobi_k_2(taxus_bin)
## End(Not run)
This function searches for partitions of the columns of a given matrix, optimizing the Total Differential Value (TDV).
optim_tdv_hill_climb( m_bin, k, p_initial = "random", n_runs = 1, n_sol = 1, maxit = 10, min_g_size = 1, stoch_first = FALSE, stoch_neigh_size = 1, stoch_maxit = 100, full_output = FALSE, verbose = FALSE )
m_bin |
A matrix. A phytosociological table of 0s (absences) and 1s (presences), where rows correspond to taxa and columns correspond to relevés. |
k |
A numeric giving the number of desired groups. |
p_initial |
A vector or a character. A vector of integer numbers
with the initial partition of the relevés (i.e., a vector with values from
1 to |
n_runs |
A numeric giving the number of runs to perform. |
n_sol |
A numeric giving the number of best solutions to keep in the final output. Defaults to 1. |
maxit |
A numeric giving the number of iterations of the Hill-climbing optimization. |
min_g_size |
A numeric. The minimum number of relevés that a group can contain (must be 1 or higher). |
stoch_first |
A logical. |
stoch_neigh_size |
A numeric giving the size (n) of the
n-neighbours for the Stochastic Hill-climbing; only used if
|
stoch_maxit |
A numeric giving the number of iterations of the
Stochastic Hill-climbing optimization; only used if |
full_output |
A logical. If |
verbose |
A logical. If |
Given a phytosociological table (m_bin, rows corresponding to taxa and columns corresponding to relevés) this function searches for a k-partition (k defined by the user) optimizing TDV, i.e., searches, using a Hill-climbing algorithm, for patterns of differential taxa by rearranging the relevés into k groups.
Optimization can start from a random partition (p_initial = "random"), or from a given partition (p_initial, defined by the user or produced by any clustering method, or even a manual classification of the relevés).
Each iteration searches for a TDV improvement by screening all 1-neighbours, until the given maximum number of iterations (maxit) is reached. A 1-neighbour of a given partition is another partition obtained by moving 1 relevé (of the original partition) to a different group. An n-neighbour is obtained, equivalently, by ascribing n relevés to different groups.
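The 1-neighbour definition can be made concrete with a short base R sketch. The helper one_neighbours() is hypothetical; unlike the package function, it does not check group sizes (min_g_size) and may therefore list neighbours that empty a group.

```r
# Hypothetical helper: list every 1-neighbour of partition p with k groups.
one_neighbours <- function(p, k) {
  out <- list()
  for (i in seq_along(p)) {
    # Every group other than the current group of relevé i.
    for (g in setdiff(seq_len(k), p[i])) {
      q <- p
      q[i] <- g  # move relevé i to a different group
      out[[length(out) + 1]] <- q
    }
  }
  out
}

p <- c(1, 1, 2, 2, 3)
nb <- one_neighbours(p, k = 3)

# Each of the n relevés can move to (k - 1) other groups.
length(nb)
nb[[1]]
```

Since the number of 1-neighbours grows as n * (k - 1), screening all of them at every iteration becomes costly for large tables.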
Optionally, a faster search (Stochastic Hill-climbing) can be performed in a first step (stoch_first = TRUE). It searches for TDV improvements by randomly selecting, in each iteration, one n-neighbour (n defined by the user in the parameter stoch_neigh_size), accepting that n-neighbour partition as a better solution if it improves TDV. This is repeated until a given maximum number of iterations (stoch_maxit) is reached. Stochastic Hill-climbing might be helpful for big tables (where the screening of all 1-neighbours might be too time consuming).
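A minimal sketch of such a stochastic search is given below, under stated assumptions: the function stoch_hill_climb(), its arguments and its return components are hypothetical, and the toy objective is a stand-in for TDV, which is not computed here.

```r
# Hypothetical sketch of Stochastic Hill-climbing over k-partitions.
# `objective` is any function scoring a partition (higher is better).
stoch_hill_climb <- function(p, k, objective, neigh_size = 1, maxit = 100) {
  stopifnot(k >= 2, neigh_size <= length(p))
  best <- p
  best_val <- objective(best)
  for (it in seq_len(maxit)) {
    cand <- best
    # Pick `neigh_size` relevés at random and give each a different group.
    moved <- sample(length(cand), neigh_size)
    for (i in moved) {
      others <- setdiff(seq_len(k), cand[i])
      cand[i] <- others[sample.int(length(others), 1)]
    }
    # Skip candidates that would leave a group empty.
    if (length(unique(cand)) < k) next
    cand_val <- objective(cand)
    if (cand_val > best_val) {  # accept only improvements
      best <- cand
      best_val <- cand_val
    }
  }
  list(par = best, value = best_val)
}

# Toy objective (an assumption, unrelated to TDV): agreement with a target.
target <- rep(1:3, each = 4)
toy_objective <- function(p) mean(p == target)

set.seed(1)
start <- sample(rep(1:3, length.out = 12))
res <- stoch_hill_climb(start, k = 3, toy_objective, neigh_size = 1, maxit = 200)
res$value >= toy_objective(start)
```

Each iteration evaluates a single candidate, which is why this phase is cheaper per iteration than screening all 1-neighbours.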
Several runs of this function (i.e., multiple starts) should be tried out, as several local maxima are usually present and the Hill-climbing algorithm converges easily to local maxima.
Trimming your table by a 'constancy' range, or using the result of other clustering methodologies as input, might help in finding interesting partitions. Especially after trimming the table by a 'constancy' range, getting a random initial partition with TDV greater than zero might be unlikely; in such cases, using an initial partition from partition_tdv_grasp() or partition_tdv_grdtp() (or even the result of other clustering strategies) as an input partition might be useful.
If full_output = FALSE, a list with (at most) n_sol best solutions (equivalent solutions are removed). Each best solution is also a list with the following components:
A logical indicating if par is a 1-neighbour local maximum.
A vector with the partition of highest TDV obtained by the Hill-climbing algorithm(s).
A numeric with the TDV of par.
If full_output = TRUE, a list with just one component (one run only), containing also a list with the following components:
A matrix with the iteration number (of the Stochastic Hill-climbing phase), the maximum TDV found until that iteration, and the TDV of the randomly selected n-neighbour in that iteration.
A vector with the best partition found in the Stochastic Hill-climbing phase.
A numeric showing the maximum TDV found in the Stochastic Hill-climbing phase (if selected).
A matrix with the iteration number (of the Hill-climbing), the maximum TDV found until that iteration, and the highest TDV among all 1-neighbours.
A logical indicating if par is a 1-neighbour local maximum.
A vector with the partition of highest TDV obtained by the Hill-climbing algorithm(s).
A numeric with the TDV of par.
Tiago Monteiro-Henriques. E-mail: [email protected].
# Getting the Taxus baccata forests data set
data(taxus_bin)

# Removing taxa occurring in only one relevé in order to
# reproduce the example in the original article of the data set
taxus_bin_wmt <- taxus_bin[rowSums(taxus_bin) > 1, ]

# Obtaining a partition that maximizes TDV using the Stochastic Hill-climbing
# and the Hill-climbing algorithms
result <- optim_tdv_hill_climb(
  m_bin = taxus_bin_wmt,
  k = 3,
  n_runs = 7,
  n_sol = 2,
  min_g_size = 3,
  stoch_first = TRUE,
  stoch_maxit = 500,
  verbose = TRUE
)

# Inspect the result. The highest TDV found in the runs.
result[[1]]$tdv
# If result[[1]]$tdv is 0.1958471 you are probably reproducing the three
# groups (Estrela, Gerês and Galicia) from the original article. If not
# try again the optim_tdv_hill_climb function (maybe increasing n_runs).

# Plot the sorted (or tabulated) phytosociological table
tabul1 <- tabulation(
  m_bin = taxus_bin_wmt,
  p = result[[1]]$par,
  taxa_names = rownames(taxus_bin_wmt),
  plot_im = "normal"
)

# Plot the sorted (or tabulated) phytosociological table, also including
# taxa occurring just once in the matrix
tabul2 <- tabulation(
  m_bin = taxus_bin,
  p = result[[1]]$par,
  taxa_names = rownames(taxus_bin),
  plot_im = "normal"
)
This function searches for k-partitions of the columns of a given matrix (i.e., a partition of the columns in k groups), optimizing the Total Differential Value (TDV) using a stochastic global optimization method called the Simulated Annealing (SANN) algorithm. Optionally, a Greedy Randomized Adaptive Search Procedure (GRASP) can be used to find an initial partition (seed) to be passed to the SANN algorithm.
optim_tdv_simul_anne( m_bin, k, p_initial = NULL, n_runs = 10, n_sol = 1, t_inic = 0.3, t_final = 1e-06, alpha = 0.05, n_iter = 1000, use_grasp = TRUE, thr = 0.95, full_output = FALSE )
m_bin |
A matrix. A phytosociological table of 0s (absences) and 1s (presences), where rows correspond to taxa and columns correspond to relevés. |
k |
A numeric giving the number of desired groups. |
p_initial |
A vector of integer numbers with the partition of the
relevés (i.e., a |
n_runs |
A numeric giving the number of runs. Defaults to 10. |
n_sol |
A numeric giving the number of best solutions to keep in the
final output (only used if |
t_inic |
A numeric giving the initial temperature. Must be greater than 0 and maximum admitted value is 1. Defaults to 0.3. |
t_final |
A numeric giving the final temperature. Must be bounded between 0 and 1. Usually very low values are needed to ensure convergence. Defaults to 0.000001. |
alpha |
A numeric giving the fraction of temperature drop to be used in the temperature reduction scheme (see Details). Must be bounded between 0 and 1. Defaults to 0.05. |
n_iter |
A numeric giving the number of iterations. Defaults to 1000. |
use_grasp |
A logical. Defaults to |
thr |
A numeric giving a threshold value (from 0 to 1) with the
probability used to compute the sample quantile, in order to get the best
|
full_output |
A logical. Defaults to |
Given a phytosociological table (m_bin, with rows corresponding to taxa and columns corresponding to relevés) this function searches for a k-partition (k, defined by the user) optimizing the TDV, i.e., searches, using a SANN algorithm (optionally working upon GRASP solutions), for a global maximum of TDV (by rearranging the relevés into k groups).
This function uses two main algorithms:
An optional GRASP, which is used to obtain initial solutions (partitions of m_bin) using function partition_tdv_grasp(). Such initial solutions are then submitted to the SANN algorithm.
The (main) SANN algorithm, which is used to search for a global maximum of TDV. The initial partition for each run of SANN can be a partition obtained from GRASP (if use_grasp = TRUE) or, if use_grasp = FALSE, either a partition given by the user (using p_initial) or a random partition (using p_initial = "random").
The SANN algorithm decreases the temperature by multiplying the current temperature by 1 - alpha according to a predefined schedule, which is automatically calculated from the given values for t_inic, t_final, alpha and n_iter.
Specifically, the cooling schedule is obtained by calculating the number of times that the temperature has to be decreased in order to approximate t_final starting from t_inic. The number of times that the temperature decreases, say nt, is calculated by the expression:
floor(n_iter/((n_iter * log(1 - alpha)) / (log((1 - alpha) * t_final / t_inic))))
Finally, these decreasing stages are scattered homogeneously through the desired iterations (n_iter), by calculating the indices of the iterations that will experience a decrease in temperature using
floor(n_iter / nt * (1:nt))
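The schedule can be reproduced for the default argument values with the two expressions quoted above. In this sketch the names drop_at, n_drops and temperature are introduced for illustration, and the exact way the package applies each decrease within an iteration is an assumption.

```r
# Default values of the corresponding arguments.
t_inic <- 0.3
t_final <- 1e-06
alpha <- 0.05
n_iter <- 1000

# Number of temperature decreases (expression quoted in the text).
nt <- floor(
  n_iter / ((n_iter * log(1 - alpha)) / (log((1 - alpha) * t_final / t_inic)))
)

# Iterations at which the temperature is decreased.
drop_at <- floor(n_iter / nt * (1:nt))

# Temperature in force at each iteration: multiply by (1 - alpha) at every
# scheduled decrease (assumed to take effect at the indexed iteration).
n_drops <- cumsum(seq_len(n_iter) %in% drop_at)
temperature <- t_inic * (1 - alpha)^n_drops

nt
range(temperature)
```

Plotting temperature against the iteration number shows the stepwise geometric decay from t_inic towards t_final.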
SANN is often seen as an exploratory technique where the temperature settings are challenging and dependent on the problem. This function tries to restrict temperature values taking into account that TDV is always between 0 and 1. Even so, obtaining values of temperature that allow convergence can be challenging. full_output = TRUE allows the user to inspect the behaviour of current.tdv and check if convergence fails. Generally, convergence failure can be spotted when final SANN TDV values are similar to the initial current.tdv, especially when coming from random partitions. In such cases, as a rule of thumb, it is advisable to decrease t_final.
If full_output = FALSE (the default), a list with the following components (the GRASP component is only returned if use_grasp = TRUE):
A list with at most n_sol components, each one containing also a list with two components:
A vector with the partition of highest TDV obtained by GRASP;
A numeric with the TDV of par.
A list with at most n_sol components, each one containing also a list with two components:
A vector with the partition of highest TDV obtained by the (GRASP +) SANN algorithm(s);
A numeric with the TDV of par.
If full_output = TRUE, a list with the following components (the GRASP component is only returned if use_grasp = TRUE):
A list with n_runs components, each one containing also a list with two components:
A vector with the partition of highest TDV obtained by GRASP.
A numeric with the TDV of par.
A list with n_runs components, each one containing also a list with six components:
A vector of length n_iter with the current TDV of each SANN iteration.
A vector of length n_iter with the alternative TDV used in each SANN iteration.
A vector of length n_iter with the probability used in each SANN iteration.
A vector of length n_iter with the temperature of each SANN iteration.
A vector with the partition of highest TDV obtained by the (GRASP +) SANN algorithm(s).
A numeric with the TDV of par.
Jorge Orestes Cerdeira and Tiago Monteiro-Henriques. E-mail: [email protected].
# Getting the Taxus baccata forests data set
data(taxus_bin)

# Removing taxa occurring in only one relevé in order to
# reproduce the example in the original article of the data set
taxus_bin_wmt <- taxus_bin[rowSums(taxus_bin) > 1, ]

# Obtaining a partition that maximizes TDV using the Simulated Annealing
# algorithm
result <- optim_tdv_simul_anne(
  m_bin = taxus_bin_wmt,
  k = 3,
  p_initial = "random",
  n_runs = 5,
  n_sol = 5,
  use_grasp = FALSE,
  full_output = TRUE
)

# Inspect the result
# The TDV of each run
sapply(result[["SANN"]], function(x) x$tdv)

# The best partition that was found (i.e., with highest TDV)
result[["SANN"]][[1]]$par

# A TDV of 0.1958471 indicates you are probably reproducing the three
# groups (Estrela, Gerês and Galicia) from the original article. A solution
# with TDV = 0.2005789 might also occur, but note that one group has only two
# elements. For now, a minimum group size is not implemented in function
# optim_tdv_simul_anne() as it is in the function optim_tdv_hill_climb().

# Inspect how the optimization progressed (should increase towards the right)
plot(
  result[["SANN"]][[1]]$current.tdv,
  type = "l",
  xlab = "Iteration number",
  ylab = "TDV of the currently accepted solution"
)
for (run in 2:length(result[["SANN"]])) {
  lines(result[["SANN"]][[run]]$current.tdv)
}

# Plot the sorted (or tabulated) phytosociological table, using the best
# partition that was found
tabul <- tabulation(
  m_bin = taxus_bin_wmt,
  p = result[["SANN"]][[1]]$par,
  taxa_names = rownames(taxus_bin_wmt),
  plot_im = "normal"
)
This function obtains a partition of the columns of a given phytosociological matrix, aiming at high values of the Total Differential Value (TDV) using a GRASP algorithm.
partition_tdv_grasp(m_bin, k, thr = 0.95, verify = TRUE)
m_bin |
A matrix. A phytosociological table of 0s (absences) and 1s (presences), where rows correspond to taxa and columns correspond to relevés. |
k |
A numeric giving the number of desired groups. |
thr |
A numeric giving a threshold value (from 0 to 1) with the
probability used to compute the sample quantile, in order to get the best
|
verify |
A logical. If |
This function uses a Greedy Randomized Adaptive Search Procedure (GRASP) to obtain a partition of m_bin.
Given a phytosociological table (m_bin, with rows corresponding to taxa and columns corresponding to relevés) this function searches for a k-partition (k, defined by the user) aiming at high values of the TDV.
See tdv() for an explanation on the TDV of a phytosociological table.
With thr = 1, the algorithm corresponds to the Greedy algorithm.
A numeric vector, whose length is the same as the number of columns of m_bin, with numbers from 1 to k, representing the group to which the respective column was ascribed.
Jorge Orestes Cerdeira and Tiago Monteiro-Henriques. E-mail: [email protected].
# Getting the Taxus baccata forests data set
data(taxus_bin)

# Obtaining a partition based on the GRASP algorithm
partition_tdv_grasp(taxus_bin, 3)
This function obtains a partition of the columns of a given phytosociological matrix, aiming at high values of the Total Differential Value (TDV), implementing a Greedy-type algorithm.
partition_tdv_grdtp(m_bin, k, verify = TRUE)
m_bin |
A matrix. A phytosociological table of 0s (absences) and 1s (presences), where rows correspond to taxa and columns correspond to relevés. |
k |
A numeric giving the number of desired groups. |
verify |
A logical. If |
Given the phytosociological table m_bin (rows corresponding to taxa and columns corresponding to relevés), this function uses a Greedy-type algorithm (a simplified version of the Greedy algorithm) to obtain a k-partition (k, defined by the user) of the columns of m_bin, aiming at high values of TDV.
The algorithm operates in the following way: Firstly, k columns are selected randomly to work as seeds for each one of the desired k groups. Secondly, one of the remaining columns is selected randomly and added to the partition group which maximizes the upcoming TDV. This second step is repeated until all columns are placed in a group of the k-partition.
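The two steps can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: greedy_type_partition() is a hypothetical helper, and the toy objective (negative within-group sum of squares of a made-up numeric feature x) merely stands in for the "upcoming TDV".

```r
# Hypothetical sketch of the Greedy-type construction described above.
# `objective(p)` scores a partial partition, where NA marks unplaced columns.
greedy_type_partition <- function(n_col, k, objective) {
  p <- rep(NA_integer_, n_col)
  # Step 1: k random columns seed the k groups.
  seeds <- sample.int(n_col, k)
  p[seeds] <- seq_len(k)
  # Step 2: place the remaining columns one at a time, in random order,
  # each in the group giving the highest objective value.
  remaining <- setdiff(seq_len(n_col), seeds)
  for (j in remaining[sample.int(length(remaining))]) {
    scores <- vapply(seq_len(k), function(g) {
      q <- p
      q[j] <- g
      objective(q)
    }, numeric(1))
    p[j] <- which.max(scores)
  }
  p
}

# Toy stand-in for TDV (an assumption): compact groups score higher.
x <- c(0.1, 0.2, 0.15, 5.0, 5.1, 4.9, 10.0, 10.2)
toy_objective <- function(p) {
  placed <- !is.na(p)
  -sum(tapply(x[placed], p[placed], function(v) sum((v - mean(v))^2)))
}

set.seed(42)
p <- greedy_type_partition(length(x), k = 3, toy_objective)
p
```

Because each column is placed once and never revisited, the result depends on the random seeds and on the random placement order.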
This function is expected to perform faster than partition_tdv_grasp(), yet to return worse partitions in terms of TDV. For the (true) Greedy algorithm see partition_tdv_grasp().
See tdv() for an explanation on the TDV of a phytosociological table.
A numeric vector, whose length is the same as the number of columns of m_bin, with numbers from 1 to k, representing the group to which the respective column was ascribed.
Jorge Orestes Cerdeira and Tiago Monteiro-Henriques. E-mail: [email protected].
# Getting the Taxus baccata forests data set
data(taxus_bin)
# Obtaining a partition based on a Greedy-type algorithm
partition_tdv_grdtp(taxus_bin, 3)
This function reorders the rows of a phytosociological table using, firstly, the increasing number of groups in which a taxon occurs, and secondly, the decreasing sum of the inner frequency of presences of each taxon (see tdv()). The columns are also reordered, simply using the increasing number of the respective group membership.
tabulation(
  m_bin,
  p,
  taxa_names,
  plot_im = NULL,
  palette = "Vik",
  greyout = TRUE,
  greyout_colour = "grey"
)
m_bin |
A matrix. A phytosociological table of 0s (absences) and 1s (presences), where rows correspond to taxa and columns correspond to relevés. |
p |
A vector of integer numbers with the partition of the relevés (i.e.,
a k-partition, consisting in a vector with values from 1 to k, with length
equal to the number of columns of |
taxa_names |
A character vector (with length equal to the number of rows
of |
plot_im |
By default, |
palette |
A character with the name of the colour palette (one of
|
greyout |
A logical. If |
greyout_colour |
A character with the name of the colour to use for non-differential taxa. Defaults to "grey". |
The function accepts a phytosociological table (m_bin), a k-partition of its columns (p) and the names of the taxa (corresponding to the rows of m_bin), returning a rearranged/reordered matrix (and plotting optionally).
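The reordering criteria stated for this function can be illustrated with a small base R sketch. It is not the package's code: tabulate_sketch() and the toy table are hypothetical, and only the two row criteria and the column criterion mentioned in the text are applied (the real function may break ties differently).

```r
# Hypothetical sketch: reorder rows by (1) increasing number of groups in which
# the taxon occurs and (2) decreasing sum of its within-group relative
# frequencies; reorder columns by increasing group number.
tabulate_sketch <- function(m_bin, p) {
  k <- max(p)
  ind <- outer(p, seq_len(k), "==") * 1       # relevés x groups indicator
  a <- m_bin %*% ind                          # presences per taxon and group
  ifp <- sweep(a, 2, colSums(ind), "/")       # inner frequency of presences
  n_groups <- rowSums(a > 0)                  # groups containing each taxon
  row_ord <- order(n_groups, -rowSums(ifp))
  col_ord <- order(p)
  m_bin[row_ord, col_ord, drop = FALSE]
}

# Toy table: 4 taxa, 6 relevés, written already sorted by group (1,1,2,2,3,3)
m_sorted <- rbind(
  B = c(1, 1, 1, 1, 1, 1),   # present in all groups
  C = c(1, 1, 1, 0, 0, 0),   # present in groups 1 and 2
  D = c(0, 0, 0, 0, 1, 0),   # exclusive of group 3, low frequency
  A = c(1, 1, 0, 0, 0, 0)    # exclusive of group 1, full frequency
)
p_sorted <- c(1, 1, 2, 2, 3, 3)

# Shuffle the columns, then let the sketch restore a tabulated layout
perm <- c(5, 1, 3, 2, 6, 4)
res <- tabulate_sketch(m_sorted[, perm], p_sorted[perm])
rownames(res)
```

In this toy case the exclusive taxa (A, then D) come first, followed by C (two groups) and B (all groups).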
If plot_im = NULL, a list with the following components:
The given taxa_names.
A vector with the order of the rows/taxa.
The rearranged/reordered m_bin matrix.
The matrix used to create the "condensed" image.
If plot_im = "normal", it returns the above list and, additionally, plots an image of the tabulated matrix.
If plot_im = "condensed", it returns the above list and, additionally, plots an image of the tabulated matrix, but presenting the sets of differential taxa as solid coloured blocks of equal width.
Tiago Monteiro-Henriques.
# Getting the Taxus baccata forests data set
data(taxus_bin)
# Creating a group partition, as presented in the original article of the
# data set
groups <- rep(c(1, 2, 3), c(3, 11, 19))
# Removing taxa occurring in only one relevé in order to
# reproduce exactly the example in the original article of the data set
taxus_bin_wmt <- taxus_bin[rowSums(taxus_bin) > 1, ]
# Sorting the phytosociological table, putting exclusive taxa in the top and
# plotting an image of it
tabul <- tabulation(
  m_bin = taxus_bin_wmt,
  p = groups,
  taxa_names = rownames(taxus_bin_wmt),
  plot_im = "normal",
  palette = "Zissou 1"
)
# Inspect the first rows and columns of the reordered phytosociological table
head(tabul$tabulated, n = c(5, 5))
A binary phytosociological table containing relevés of Taxus baccata forests, from the northwest of the Iberian Peninsula.
taxus_bin
A matrix with 209 rows and 33 columns. Each column corresponds to a phytosociological relevé and each row corresponds to a taxon. Values in the matrix denote presences (1) and absences (0).
Portela-Pereira E., Monteiro-Henriques T., Casas C., Forner N., Garcia-Cabral I., Fonseca J.P. & Neto C. 2021. Teixedos no noroeste da Península Ibérica. Finisterra 56(117): 127-150. doi:10.18055/FINIS18102.
# Getting the Taxus baccata forests data set
data(taxus_bin)
# Inspect the first rows and columns of taxus_bin
head(taxus_bin, n = c(5, 5))
Given a phytosociological table and a partition of its columns, this function calculates the respective Total Differential Value (TDV).
tdv(m_bin, p, output_type = "normal")
m_bin |
A matrix. A phytosociological table of 0s (absences) and 1s (presences), where rows correspond to taxa and columns correspond to relevés. |
p |
A vector of integer numbers with the partition of the relevés (i.e.,
a k-partition, consisting in a vector with values from 1 to k, with length
equal to the number of columns of |
output_type |
A character determining the amount of information returned by the function and also the amount of pre-validations. Possible values are "normal" (the default), "fast" and "full". |
The function accepts a phytosociological table (m_bin) and a k-partition of its columns (p), returning the corresponding TDV.
TDV was proposed by Monteiro-Henriques and Bellu (2014).
Monteiro-Henriques (2016) proposed TDV1, modifying TDV slightly with the
objective of ensuring a value from 0 to 1. Yet, TDV is always within that
range. In practice, both TDV and TDV1 have 0 as possible minimum value
and 1 as possible maximum value, but TDV1 reduces further the contribution
of differential taxa present in more than one group. TDV is then
implemented here, for parsimony.
TDV is calculated using the $DiffVal$ index for each (and all) of the taxa present in a tabulated phytosociological table $M$ (also called sorted table). The $DiffVal$ index aims at characterizing how well a taxon works as a differential taxon in such a tabulated phytosociological table (for more information on differential taxa see Mueller-Dombois & Ellenberg, 1974).
An archetypal differential taxon of a certain group $g$ of the partition $p$ (a partition on the columns of $M$) is the one present in all relevés of group $g$, and absent from all the other groups of that partition. Therefore, $DiffVal$ has two components, an inner one ($\frac{a}{b}$), which measures the presence of the taxon inside each of the groups, and an outer one ($\frac{c}{d}$), which measures the relevant absences of the taxon outside of each of the groups. Specifically, given a partition $p$ with $k$ groups, $DiffVal$ is calculated for each taxon $s$ as:
$$DiffVal_{s,p} = \frac{1}{e}\sum_{g=1}^{k}\frac{a}{b}\,\frac{c}{d}$$
where:
$a$ is the total number of presences of taxon $s$ within group $g$.
$b$ is the total number of relevés of group $g$.
$c$ is the total number of differentiating absences of taxon $s$, i.e., absences coming from the groups other than $g$ from which the taxon $s$ is completely absent.
$d$ is the total number of relevés of all groups but $g$ (i.e., the total number of relevés in the table - $b$).
$e$ is the total number of groups in which the taxon $s$ occurs at least once.
Therefore, for each taxon $s$ and for each group $g$, the $DiffVal$ index evaluates:
$\frac{a}{b}$, i.e., the frequency of the presences of taxon $s$, relative to the size of group $g$; commonly called 'relative frequency.' $\frac{a}{b}$ is 1 if and only if taxon $s$ occurs in all the relevés of group $g$.
$\frac{c}{d}$, i.e., the frequency of the differentiating absences of taxon $s$ outside group $g$, relative to the sum of sizes of all groups but $g$. Nota bene: absences in $c$ are counted outside the group $g$ but only in the groups from which taxon $s$ is completely absent (these are the relevant absences, which produce differentiation among groups); in practice $c$ corresponds to the sum of the sizes of all groups other than $g$ that are empty. $\frac{c}{d}$ is 1 if and only if the taxon $s$ is absent from all groups but $g$.
Finally, $\frac{1}{e}$ ensures that $DiffVal$ is a value from 0 to 1.
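As a worked illustration with hypothetical numbers, write $a$, $b$, $c$, $d$ and $e$ for the five quantities defined above (in that order). Consider a table of 10 relevés partitioned into three groups of sizes 2, 3 and 5, and a taxon present in both relevés of group 1, in one relevé of group 2 and absent from group 3. The taxon occurs in two groups, so $e = 2$, and group 3 (size 5) is the only group from which it is completely absent:
$$
\begin{aligned}
g = 1:&\quad \frac{a}{b} = \frac{2}{2},\quad \frac{c}{d} = \frac{5}{8} \\
g = 2:&\quad \frac{a}{b} = \frac{1}{3},\quad \frac{c}{d} = \frac{5}{7} \\
g = 3:&\quad \frac{a}{b} = \frac{0}{5},\quad \frac{c}{d} = \frac{0}{5} \\
DiffVal &= \frac{1}{2}\left(1 \cdot \frac{5}{8} + \frac{1}{3} \cdot \frac{5}{7} + 0\right) = \frac{1}{2} \cdot \frac{145}{168} = \frac{145}{336} \approx 0.432
\end{aligned}
$$
Had the taxon been confined to group 1 (present in both of its relevés and absent elsewhere), the single non-zero term would be $1 \cdot \frac{8}{8}$ with $e = 1$, giving the maximum value of 1.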
The Total Differential Value (TDV or $TotDiffVal$) of a phytosociological table $M$ tabulated/sorted by the partition $p$ is:
$$TDV_{M,p} = \frac{1}{n}\sum_{i=1}^{n}DiffVal_{i,p}$$
where:
$n$ is the number of taxa in table $M$.
The division by the number of taxa present in $M$ ensures that TDV remains in the [0,1] interval (as $DiffVal$ is also in the same interval).
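The definitions above can be checked by hand with a few lines of base R. This is a minimal sketch, not the package's implementation: tdv_sketch() and the toy table are hypothetical, and the code assumes a binary matrix with taxa in rows, a partition with at least two non-empty groups, and no taxon absent from the whole table.

```r
# Hypothetical helper: per-taxon differential value and TDV computed directly
# from the definitions (rows = taxa, columns = relevés, p = group of each column).
tdv_sketch <- function(m_bin, p) {
  k <- max(p)
  ind <- outer(p, seq_len(k), "==") * 1       # relevés x groups indicator
  b <- colSums(ind)                           # size of each group
  a <- m_bin %*% ind                          # presences per taxon and group
  absent <- (a == 0) * 1                      # groups empty of each taxon
  # differentiating absences: total size of the OTHER groups that are empty
  cc <- drop(absent %*% b) - sweep(absent, 2, b, "*")
  d <- sum(b) - b                             # relevés outside each group
  e <- rowSums(a > 0)                         # groups containing each taxon
  inner <- sweep(a, 2, b, "/")                # inner component
  outer_part <- sweep(cc, 2, d, "/")          # outer component
  diffval <- rowSums(inner * outer_part) / e
  list(diffval = diffval, tdv = mean(diffval))
}

# Toy table: 3 taxa, 6 relevés, two groups of 3 relevés each
m_toy <- rbind(
  A = c(1, 1, 1, 0, 0, 0),   # archetypal differential taxon of group 1
  B = c(1, 1, 1, 1, 1, 1),   # present everywhere, no differentiation
  C = c(1, 1, 0, 0, 0, 0)    # exclusive of group 1, but not in all its relevés
)
p_toy <- c(1, 1, 1, 2, 2, 2)

res <- tdv_sketch(m_toy, p_toy)
res$diffval
res$tdv
```

In this toy case taxon A reaches the maximum value, taxon B the minimum, and taxon C sits in between because it is missing from one relevé of its group.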
If output_type = "normal" (the default) pre-validations are done and a list is returned, with the following components:
A matrix with the $\frac{a}{b}$ values for each taxon in each group, for short called the 'inner frequency of presences'.
A matrix with the $\frac{c}{d}$ values for each taxon in each group, for short called the 'outer frequency of differentiating absences'.
A vector with the $e$ values for each taxon, i.e., the number of groups containing that taxon.
A matrix with the $DiffVal$ for each taxon.
A numeric with the TDV of matrix m_bin, given the partition p.
If output_type = "full", some extra components are added to the output: afg, empty.size, gct (= $e$) and i.mul. These are intermediate matrices used in the computation of TDV.
If output_type = "fast", only TDV is returned and no pre-validations are done.
Tiago Monteiro-Henriques.
Monteiro-Henriques T. & Bellu A. 2014. An optimization approach to the production of differentiated tables based on new differentiability measures. 23rd EVS European Vegetation Survey. Presented orally. Ljubljana, Slovenia.
Monteiro-Henriques T. 2016. A bunch of R functions to assist phytosociological tabulation. 25th Meeting of European Vegetation Survey. Presented in poster. Rome. Italy.
Mueller-Dombois D. & Ellenberg H. 1974. Aims and Methods of Vegetation Ecology. New York: John Wiley & Sons.
# Getting the Taxus baccata forests data set
data(taxus_bin)
# Creating a group partition, as the one presented in the original article of
# the data set
groups <- rep(c(1, 2, 3), c(3, 11, 19))
# Removing taxa occurring in only one relevé, in order to reproduce exactly
# the example in the original article of the data set
taxus_bin_wmt <- taxus_bin[rowSums(taxus_bin) > 1, ]
# Calculating TDV
result <- tdv(taxus_bin_wmt, groups)
# This is the TDV
result$tdv
# This is TDV1, reproducing exactly the value from the original article
sum(result$diffval / result$e) / nrow(taxus_bin_wmt)