---
title: "Tutorial for bioregion"
author: "Maxime Lenormand, Boris Leroy and Pierre Denelle"
date: "`r Sys.Date()`"
output:
html_vignette:
number_sections: false
html_document:
toc: true
toc_float:
collapsed: false
smooth_scroll: false
toc_depth: 2
bibliography: '`r system.file("REFERENCES.bib", package="bioregion")`'
csl: journal-of-biogeography.csl
vignette: >
%\VignetteIndexEntry{Tutorial for bioregion}
%\VignetteEngine{knitr::rmarkdown}
\usepackage[utf8]{inputenc}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE,
fig.width = 8, fig.height = 8)
# Packages --------------------------------------------------------------------
suppressPackageStartupMessages({
suppressWarnings({
library(bioregion)
})
})
options(tinytex.verbose = TRUE)
```
## 0. Brief introduction
This tutorial aims at describing the different features of the R package
`bioregion`.
The main purpose of the `bioregion`'s package is to propose a transparent
methodological framework to compare bioregionalisation methods. Below is the
typical flow chart of bioregions' identification based on a site-species
bipartite network or co-occurrence matrix with `bioregion` (Figure 1). This
workflow can be divided into four main steps:
1. Preprocess the data (matrix or network formats)
2. Compute similarity/dissimilarity metrics between sites based on species
composition
3. Run the different algorithms to identify different set of bioregions
4. Evaluate and visualize the results
Figure 1: Workflow of the bioregion's package.
## 1. Install binary files
Some functions or at least part of them (listed below) require binary
files to run.
* [netclu_infomap](https://bioRgeo.github.io/bioregion/reference/netclu_infomap.html)
* [netclu_louvain](https://bioRgeo.github.io/bioregion/reference/netclu_louvain.html) (Cpp version)
* [netclu_oslom](https://bioRgeo.github.io/bioregion/reference/netclu_oslom.html)
Please check this
[tutorial page](https://bioRgeo.github.io/bioregion/articles/a1_install_binary_files.html)
to get instructions regarding the installation of the binary files.
## 2. Matrix or Network formats
The `bioregion`'s package takes as input site-species information stored in a
bipartite network or a co-occurrence matrix. Relying on the function
[mat_to_net](https://bioRgeo.github.io/bioregion/reference/mat_to_net.html)
and [net_to_mat](https://bioRgeo.github.io/bioregion/reference/net_to_mat.html)
, it handles both the matrix and network formats throughout the workflow.
Please have a look at this
[tutorial page](https://bioRgeo.github.io/bioregion/articles/a2_matrix_and_network_formats.html)
to better understand how these two functions work.
## 3. Pairwise similarity/dissimilarity metrics
The functions [similarity](https://bioRgeo.github.io/bioregion/reference/similarity.html)
and [dissimilarity](https://bioRgeo.github.io/bioregion/reference/dissimilarity.html)
compute respectively pairwise similarity and dissimilarity metrics based on a
(site-species) co-occurrence matrix. The resulting `data.frame` is stored in a
`bioregion.pairwise.metric` object containing all requested metrics between each
pair of sites.
The functions [dissimilarity_to_similarity](https://bioRgeo.github.io/bioregion/reference/dissimilarity_to_similarity.html) and [similarity_to_dissimilarity](https://bioRgeo.github.io/bioregion/reference/dissimilarity_to_similarity.html)
can be used to transform a similarity object into a dissimilarity object
and vice versa.
Please have a look at this [tutorial page](https://bioRgeo.github.io/bioregion/articles/a3_pairwise_metrics.html)
to better understand how these functions work.
## 4. Clustering algorithms
The `bioregion` R package gathers several methods allowing to group sites
and species into similar entities called bioregions. All these methods can lead
to several partitions of sites and species, i.e. to different
bioregionalisations.
Bioregionalisation methods can be based on hierarchical clustering
algorithms, non-hierarchical clustering algorithms or network algorithms.
The functions in the package are related to each of these three families and
produce output that have a specific class, namely the `bioregion.clusters`
class.
### 4.1 Hierarchical clustering
The functions relying on hierarchical clustering start with the prefix `hclu_`.
With these algorithms, the bioregions are placed into a dendrogram that ranges
from two extremes: all sites belong to the same bioregion (top of the tree) or
all sites belong to a different bioregion (bottom of the tree).
See the following
[tutorial page](https://biorgeo.github.io/bioregion/articles/a4_1_hierarchical_clustering.html)
for more details.
### 4.2 Non-hierarchical clustering
The functions relying on hierarchical clustering start with the prefix
`nhclu_`. For most of these algorithms, the user needs to predefine the number
of clusters, although this number can be determined by estimating the optimal
partition.
See this
[tutorial page](https://biorgeo.github.io/bioregion/articles/a4_2_non_hierarchical_clustering.html)
for more details.
### 4.3 Network clustering
The functions relying on network clustering start with the prefix `netclu_`.
Site-species matrices can be seen as (bipartite) networks where the nodes are
either the sites or the species and the links between them are the occurrences
of species within sites.
With networks, modularity algorithms can be applied, leading to
bioregionalisation.
The following
[tutorial page](https://bioRgeo.github.io/bioregion/articles/a4_3_network_clustering.html)
details more each clustering functions relying on a network algorithm.
### 4.4 Microbenchmark
The different bioregionalisation methods listed in the package rely on more or
less computationally intensive algorithms.
The following [page](https://biorgeo.github.io/bioregion/articles/a4_4_microbenchmark.html)
estimates the time required to run each method on data sets of different sizes.
## 5. Visualization and evaluation of the results
## 5.1 Visualization
If sites have geographic coordinates, then each bioregionalisation can be
visualized with the function `map_clusters()`.
This [tutorial page](https://biorgeo.github.io/bioregion/articles/a5_1_visualization.html)
details different ways to plot your bioregionalisation.
## 5.2 Compare partitions
In this section, we look at how sites are assigned to bioregions within a
single bioregionalization and also compare this assignment across different
bioregionalizations. The following
[page](https://biorgeo.github.io/bioregion/articles/a5_2_compare_partitions.html)
illustrates this.