LifemapR is a package for visualising data on a Lifemap basemap. Its distinguishing feature is interactive visualisation of data across a very large taxonomy (the NCBI taxonomy). LifemapR can process large datasets of more than 300,000 rows, and it offers a wide range of customisation options so that users can adapt the visualisation to their own needs.
LifemapR can be installed from GitHub.
The very first step in using this package is to have a dataframe
containing at least a taxid column with NCBI-format TaxIds.
This dataframe can also contain other data that you might want
to visualise.
## taxid sci_name zoom lat lon
## 6 2836 Bacillariophyta 13 -8.307441 -11.45947
## 7 2849 Phaeodactylum 22 -8.352870 -11.49588
## 8 2850 Phaeodactylum tricornutum 24 -8.352800 -11.49580
## 9 2857 Nitzschia 19 -8.344973 -11.49516
## 10 2864 Dinophyceae 12 -6.953183 -12.25743
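As a minimal sketch of what such an input could look like, the base R snippet below builds a small dataframe with a taxid column and one extra column. The TaxIds are the ones shown in the output above; the abundance column and its values are invented for illustration, and the uniqueness check is an assumption of this sketch rather than a documented requirement of the package.

```r
# Hypothetical input table: only the taxid column is required;
# "abundance" is an invented example of extra data to visualise.
df <- data.frame(
  taxid = c(2836L, 2849L, 2850L, 2857L, 2864L),
  abundance = c(12.5, 3.1, 0.8, 7.4, 20.0)
)

# Basic sanity checks before handing the table to the package.
stopifnot("taxid" %in% names(df))
stopifnot(!anyNA(df$taxid))
stopifnot(!anyDuplicated(df$taxid))  # assumption: one row per TaxId

str(df)
```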
You can then transform this dataframe into a format suitable for the
visualisation functions of the package with the build_Lifemap() function.
After that, you get a lifemap object, which takes the form of a list
containing the following information:
- df: the augmented dataframe obtained after the first step.
- basemap: the basemap that was used to fetch the data.
You can then visualise your data with lifemap()
combined with one or more of the following functions:
- lm_markers(), to represent data as circles.
- lm_branches(), to represent the data's subtree.
- lm_piecharts(), to represent discrete data as pie charts.
Each of these three functions adds a layer to the visualisation.
These layers are combined with the + symbol.
# Example with default representation
# one layer
lifemap(LM) + lm_markers()
# two different layers
lifemap(LM) + lm_markers() + lm_branches()
These functions also allow the user to customise how the data are represented, as we'll see in the examples below.
The output is a shiny interface where the user can move and zoom freely.
Please note that the following examples were produced in May 2023; if you run them, you may not get the same values because of database updates.
This dataset is the result of a classification by Kraken (Derrick E. Wood, J. Lu, 2019) on a metagenomic sample coming from a controlled set of 12 known bacterial species (V. Sevim, J. Lee, 2019).
First of all, we load the data and transform it into the LifemapR format.
Then we can begin to visualise our data.
We can, for example, represent the number of reads assigned to each TaxID by the color of the markers.
With this representation, a marker is displayed only if the associated node is close enough.
It is possible to change the way markers are displayed with the display argument.
# All the nodes that were requested by the user.
lifemap(LM_kraken) +
lm_markers(var_fillColor = "coverage_percent", fillColor = "PiYG", display = "requested")
# Only the nodes that have no descendants.
lifemap(LM_kraken) +
lm_markers(var_fillColor = "coverage_percent", fillColor = "PiYG", display = "leaves")
Information can also be displayed to the user with the popup or label arguments.
# When clicking on a node, display the desired information.
lifemap(LM_kraken) +
lm_markers(var_fillColor = "coverage_percent", fillColor = "PiYG", popup = "name")
Finally, it is also possible to represent data with a subtree, either by the size of the branches or their color.
# Information encoded in the branches' color.
lifemap(LM_kraken) +
lm_branches(var_color = "coverage_percent", color = "PiYG")
# Information encoded in the branches' size.
lifemap(LM_kraken) +
lm_branches(size = "coverage_percent")
This dataset contains information about the genome size and the Transposable Elements content for molluscs, insects and vertebrates.
First of all, we load the data and transform it into the LifemapR format.
Then we can begin to visualise our data.
Here we have two characteristics to visualise; we can do so with the command below.
However, unlike the previous dataset, we don't have information
for all the nodes. Here we only have data for the leaves, so it is
necessary to infer values for the nodes where the information is missing
with the FUN argument.
# Visualisation of the Genome size on the fillColor and the TEcontent on the size of markers.
lifemap(LM_gen) +
lm_markers(var_fillColor = "Genome_size", fillColor = "PiYG", radius = "TEcontent_bp", FUN = mean)
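To make the idea behind FUN more concrete, here is a self-contained base R sketch of inferring a value for an internal node from the leaves below it. It illustrates the general principle only: the tree, node names and helper functions are hypothetical, and whether LifemapR aggregates over all descendant leaves or proceeds differently is not stated here, so this should not be read as the package's implementation.

```r
# Toy tree given as a child -> parent table (hypothetical node names).
edges <- data.frame(
  child  = c("leafA", "leafB", "leafC", "clade1"),
  parent = c("clade1", "clade1", "root", "root"),
  stringsAsFactors = FALSE
)

# Values are known for the leaves only.
leaf_values <- c(leafA = 2, leafB = 4, leafC = 9)

# Collect all leaves below a node by walking the edge table recursively.
leaves_under <- function(node, edges) {
  kids <- edges$child[edges$parent == node]
  if (length(kids) == 0) return(node)  # a node without children is a leaf
  unlist(lapply(kids, leaves_under, edges = edges))
}

# Infer a value for a node by applying FUN to its descendant leaf values.
infer_value <- function(node, edges, leaf_values, FUN = mean) {
  FUN(leaf_values[leaves_under(node, edges)])
}

infer_value("clade1", edges, leaf_values, FUN = mean)  # 3
infer_value("root", edges, leaf_values, FUN = mean)    # 5
infer_value("root", edges, leaf_values, FUN = max)     # 9
```

Swapping mean for another summary function such as max or sum changes what the internal nodes show, which is why the choice of FUN should match the meaning of the variable.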
We can also represent markers and subtree at the same time.
# Subtree and markers combined: Genome size on the fillColor and TEcontent on the size of markers.
lifemap(LM_gen) +
lm_branches() +
lm_markers(var_fillColor = "Genome_size", fillColor = "PiYG", radius = "TEcontent_bp", FUN = mean)
This dataset contains information about roughly 1,000 eukaryotes randomly fetched from the NCBI database.
First of all, we load the data and transform it into the LifemapR format.
Then we can begin to visualise our data.
We can also choose to visualise only a part of our data. To do this,
we can either filter our data in advance or use the data
argument to do so.
# Visualisation of Plants.
lifemap(LM_eukaryotes) +
lm_markers(data = LM_eukaryotes$df[LM_eukaryotes$df$Group %in% "Plants", ])
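The subsetting above uses %in% rather than ==. The base R sketch below, which uses a small hypothetical stand-in for LM_eukaryotes$df (the rows and the missing Group value are invented for illustration), shows why that choice matters when the grouping column may contain missing values.

```r
# Hypothetical stand-in for LM_eukaryotes$df, with one missing Group value.
df <- data.frame(
  taxid = c(3702L, 9606L, 4932L, 4577L),
  Group = c("Plants", "Animals", NA, "Plants"),
  stringsAsFactors = FALSE
)

# %in% treats NA as "no match", so only the two Plants rows are kept.
plants_in <- df[df$Group %in% "Plants", ]

# == returns NA for the missing value, which adds an all-NA row to the result.
plants_eq <- df[df$Group == "Plants", ]

nrow(plants_in)  # 2
nrow(plants_eq)  # 3
```

Filtering with %in% therefore avoids passing spurious NA rows to the data argument.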
Finally, we can visualise discrete variables with the lm_piecharts() function as follows.
# Visualisation of the Group variable as pie charts.
lifemap(LM_eukaryotes) +
lm_piecharts(param = "Group")