Title: | Creating and Representing Functional Trait Spaces |
---|---|
Description: | Estimation of functional spaces based on traits of organisms. The package includes functions to impute missing trait values (with or without considering phylogenetic information), and to create, represent and analyse two dimensional functional spaces based on principal components analysis, other ordination methods, or raw traits. It also allows for mapping a third variable onto the functional space. See 'Carmona et al. (2021)' <doi:10.1038/s41586-021-03871-y>, 'Puglielli et al. (2021)' <doi:10.1111/nph.16952>, 'Carmona et al. (2021)' <doi:10.1126/sciadv.abf2675>, 'Carmona et al. (2019)' <doi:10.1002/ecy.2876> for more information. |
Authors: | Carlos P. Carmona [cre] , Nicola Pavanetto [aut] , Giacomo Puglielli [aut] |
Maintainer: | Carlos P. Carmona <[email protected]> |
License: | GPL-3 |
Version: | 0.2.2 |
Built: | 2025-01-14 06:18:43 UTC |
Source: | CRAN |
Defines the functional structure of a set of species
funspace( x, PCs = c(1, 2), group.vec = NULL, fixed.bw = TRUE, n_divisions = 100, trait_ranges = NULL, threshold = 0.999 )
x |
Data to create the functional space. It can be either a PCA object obtained using the |
PCs |
A vector specifying the Principal Components to be considered (e.g. choosing |
group.vec |
An object of class factor specifying the levels of the grouping variable. |
fixed.bw |
Logical indicating whether the same bandwidth that is used in the kde estimation for the whole dataset should also be used for the kde estimation of individual groups of observations ( |
n_divisions |
The number of equal-length parts into which each principal component is divided to build the grid on which calculations are based. Higher values of n_divisions result in longer computation times, but also smoother graphics. Defaults to 100. |
trait_ranges |
A list indicating the range of values that will be considered in the calculations for each of the considered PCA components. The list should contain the range (minimum and maximum) of values that will be considered. Each element of the list corresponds with one PCA component. The order of the components must be the same as the order provided in |
threshold |
The probability threshold to consider to estimate the TPD function. TPD functions are positive across the whole trait space; |
The functional structure of a set of organisms refers to how these organisms are distributed within a functional space (a space defined by traits). Functional structure can be expressed in probabilistic terms using trait probability density functions (TPD). TPD functions reflect how densely the organisms occupy the different parts of the functional space, and are implemented in the package TPD
(Carmona et al. 2019).
funspace
allows the user to define functional structure in a two-dimensional functional space created using a PCA, other ordination methods, or raw traits. The function automatically estimates the probability of occurrence of trait combinations within the space by kernel density estimation with unconstrained bandwidth, relying on functions from the ks
R package (Duong, 2007). Contour lines can be drawn at any quantile of the probability distribution. Colored areas, corresponding to the target quantiles, visually summarize the probability of occurrence of certain trait combinations within the trait space.
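As a rough illustration of how a probability threshold turns a kernel density estimate into a bounded TPD, the sketch below uses MASS::kde2d on simulated data. It is a stand-in under stated assumptions, not the package's implementation: funspace relies on the ks package with unconstrained bandwidth, and the helper trim_tpd and the simulated traits are hypothetical.

```r
library(MASS)  # kde2d() and mvrnorm(); funspace itself relies on ks, not MASS

set.seed(1)
# Two hypothetical, correlated traits standing in for PCA scores
traits <- MASS::mvrnorm(500, mu = c(0, 0),
                        Sigma = matrix(c(1, 0.5, 0.5, 1), nrow = 2))

n_div <- 100  # plays the role of n_divisions
dens <- MASS::kde2d(traits[, 1], traits[, 2], n = n_div)
cell_area <- diff(dens$x[1:2]) * diff(dens$y[1:2])

# Rescale densities so that the cell probabilities sum to 1
prob <- dens$z * cell_area
prob <- prob / sum(prob)

# Keep the densest cells until their cumulative probability reaches the
# threshold, set the remaining cells to zero, and rescale to sum to 1
trim_tpd <- function(prob, threshold) {
  ord <- order(prob, decreasing = TRUE)
  keep <- ord[cumsum(prob[ord]) <= threshold]
  out <- matrix(0, nrow(prob), ncol(prob))
  out[keep] <- prob[keep]
  out / sum(out)
}

tpd_95 <- trim_tpd(prob, 0.95)
tpd_999 <- trim_tpd(prob, 0.999)

# Occupied area (a simple proxy for functional richness) under each threshold
richness_95 <- sum(tpd_95 > 0) * cell_area
richness_999 <- sum(tpd_999 > 0) * cell_area
```

A lower threshold discards more of the low-density tails, so the occupied area shrinks; this is why the examples in this manual use threshold = 0.95 to obtain tighter spaces than the default 0.999.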
funspace
The function returns an object of class funspace
containing characteristics of the functional space and the trait probability distributions. The object includes estimations of functional richness and functional divergence for all observations taken together (global) and for each individual group (if groups are provided). The funspace
class has specific methods for the generic functions plot
and summary
.
CP Carmona, F de Bello, NWH Mason, J Leps (2019). Trait Probability Density (TPD): measuring functional diversity across scales based on trait probability density with R. Ecology e02876.
T Duong (2007). ks: Kernel Density Estimation and Kernel Discriminant Analysis for Multivariate Data in R. J. Stat. Softw. 21(7), 1-16.
# 1. Plotting a space based on a PCA
x <- princomp(GSPFF)
funtest <- funspace(x = x, PCs = c(1, 2), threshold = 0.95)
summary(funtest)
plot(funtest, type = "global")

# 2. To include groups, let's consider two major families.
# We will use two raw traits, ph and sla:
selFam <- c("Pinaceae", "Fabaceae")
selRows <- which(GSPFF_tax$family %in% selFam)
GSPFF_subset <- GSPFF[selRows, c("ph", "sla")]
tax_subset <- droplevels(GSPFF_tax[selRows, ])
funtest <- funspace(x = GSPFF_subset, threshold = 0.95, group.vec = tax_subset$family)
summary(funtest)
plot(funtest, type = "global")
plot(funtest, type = "groups",
     axis.title.x = "Plant height",
     axis.title.y = "Specific leaf area",
     quant.plot = TRUE, pnt = TRUE, pnt.cex = 0.5,
     pnt.col = rgb(0, 1, 1, alpha = 0.2))
Calculating the dimensionality of a functional space based on PCA
funspaceDim(data)
data |
A |
funspaceDim
allows the user to identify the number of dimensions that are needed to build a trait space. The identified dimensions are those that minimize redundancy while maximizing the information contained in the trait data. The number of significant PCA axes to be retained is determined by using the paran()
function of the R package paran
(Dinno, 2018). paran()
is based on the method proposed by Horn (1965), which contrasts the eigenvalues of the observed data with those produced by PCAs run on (30 * (number of variables)) random datasets with the same number of variables and observations as the input dataset. Components whose adjusted eigenvalues are greater than 1 are retained.
funspaceDim
returns the number of dimensions to be retained. The output is stored and printed out in the R console as well.
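The logic of Horn's parallel analysis can be sketched in base R as follows. This is a minimal illustration, assuming that random-data eigenvalues are summarised by their mean and that components with adjusted eigenvalues above 1 are kept; it does not call paran() and is not the code used by funspaceDim. The simulated matrix X and the helper eigenvalues are hypothetical.

```r
set.seed(42)
n_obs <- 300

# Hypothetical trait data: six variables driven by two latent axes plus noise
latent <- matrix(rnorm(n_obs * 2), ncol = 2)
loadings <- rbind(c(1, 0), c(1, 0), c(1, 0), c(0, 1), c(0, 1), c(0, 1))
X <- latent %*% t(loadings) + matrix(rnorm(n_obs * 6, sd = 0.5), ncol = 6)
n_var <- ncol(X)

# Eigenvalues of the correlation matrix (PCA on standardised variables)
eigenvalues <- function(m) {
  eigen(cor(m), symmetric = TRUE, only.values = TRUE)$values
}

obs_eig <- eigenvalues(X)

# 30 * (number of variables) random datasets with the same dimensions
n_iter <- 30 * n_var
rand_eig <- replicate(
  n_iter,
  eigenvalues(matrix(rnorm(n_obs * n_var), ncol = n_var))
)

# Sampling bias per component: mean random eigenvalue minus its expected value of 1
bias <- rowMeans(rand_eig) - 1

# Adjusted eigenvalues; components above 1 are retained
adj_eig <- obs_eig - bias
n_retained <- sum(adj_eig > 1)
n_retained
```

Because the simulated data are built from two latent axes, the sketch should retain two components, which mirrors the single number that funspaceDim reports.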
Horn, J.L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika 30: 179-185.
Dinno, A. (2018). paran: Horn's test of principal components/factors. R package version 1.5.2.
# Dimensionality of the GSPFF
funspaceDim(GSPFF)
Mapping response variables in a functional space
funspaceGAM(y, funspace, family = "gaussian", minObs = 30)
y |
vector including the variable to be mapped inside the functional space. There must be a correspondence between the elements of y and the observations used to make the PCA (contained in 'pca.object'), both in the number of elements and in their order. |
funspace |
An object of class |
family |
A family object specifying the distribution and link to use in the gam model. Defaults to "gaussian". See package |
minObs |
minimum number of observations needed in a group to make a model (defaults to 30). |
Different response variables can be mapped onto a functional space. In funspace
, we follow the approach by Carmona et al. (2021), in which a generalized additive model is estimated across the bidimensional functional space. The resulting models show the predicted values of the response variable at each position of the portion of the functional space that is defined in the TPD of the global set of observations or of individual groups.
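The idea of mapping a response onto a two-dimensional space with a generalized additive model can be sketched with mgcv. The model formula below (a single bivariate smooth, s(PC1, PC2)), the simulated scores, and the prediction grid are assumptions made for illustration; the model actually fitted inside funspaceGAM may be specified differently.

```r
library(mgcv)  # gam() and s()

set.seed(3)
# Hypothetical positions of 400 observations in a two-dimensional space
scores <- data.frame(PC1 = rnorm(400), PC2 = rnorm(400))
# Hypothetical response that varies smoothly across the space
scores$y <- abs(scores$PC1 * scores$PC2) + rnorm(400, sd = 0.5)

# Fit a GAM with one smooth surface over both axes
fit <- mgcv::gam(y ~ s(PC1, PC2), data = scores, family = gaussian())

# Predict the response on a regular grid covering part of the space
n_div <- 50
grid <- expand.grid(
  PC1 = seq(-2, 2, length.out = n_div),
  PC2 = seq(-2, 2, length.out = n_div)
)
grid$pred <- as.numeric(predict(fit, newdata = grid, type = "response"))

# Arrange predictions as a matrix, e.g. for image() or contour()
pred_mat <- matrix(grid$pred, nrow = n_div, ncol = n_div)
```

In the package, predictions are restricted to the portion of the space covered by the TPD; the sketch predicts over a fixed square only to keep the example short.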
The function returns an object of class funspace
containing the functional space, trait probability distributions, and the fitted gam models. The funspace
class has specific methods for the generic functions plot
and summary
.
CP Carmona, et al. (2021). Erosion of global functional diversity across the tree of life. Science Advances eabf2675
# 1. GAM on a space based on a PCA
x <- princomp(GSPFF)
funtest <- funspace(x = x, PCs = c(1, 2), threshold = 0.95)
y <- abs(x$scores[, 1] * x$scores[, 2]) + rnorm(nrow(GSPFF), mean = 0, sd = 1)
funtestGAM <- funspaceGAM(y = y, funspace = funtest)
plot(funtestGAM, quant.plot = TRUE, quant.col = "grey90")
summary(funtestGAM)
Comparing the amount of occupied functional space against null models
funspaceNull( funspace, nrep = 100, alter = "greater", null.distribution = "multnorm", verbose = TRUE )
funspace |
An object of class |
nrep |
|
alter |
|
null.distribution |
|
verbose |
|
funspaceNull
The function tests for the statistical difference between the size (functional richness) of the considered TPD, obtained using the funspace
function, against a vector of functional richness values generated using null models (see below) across a user-defined number of iterations. Two null models are currently available for testing. One generates data with a multivariate normal distribution, creating a dataset with normally distributed variables having the same mean and covariance as the observations used to build the functional space (see Carmona et al. 2021). This null model returns a theoretical TPD where some trait combinations (those around the mean of the trait space axes, thus towards the center of the null trait space) are more likely than others (i.e., this null model resembles an ellipse). The other null model generates a dataset with variables following a uniform distribution (see null model 1 in Diaz et al. 2016), creating a distribution where all trait combinations within the range of the original observations are equally possible (i.e., the approximate shape of this null model is a rectangle).
Note that the function does not work for funspace objects that are based on a TPDs object created using the package TPD
funspaceNull
The function returns a list containing all the simulated datasets, the area of the observed trait space, the mean value of the area for the null model (calculated across iterations), the p-value of the difference between observed and simulated trait space, as well as a standardized effect size of the difference between observed trait space and mean null model areas. This output is reported together with the output of funspace
.
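The comparison against a multivariate normal null model can be sketched as below. This is an illustrative approximation, not the code of funspaceNull: it uses MASS::kde2d instead of ks, and it assumes a one-sided p-value computed as the proportion of null areas at least as large as the observed one and a standardized effect size computed as (observed - mean(null)) / sd(null). The helper richness, the grid limits, and the simulated observations are hypothetical.

```r
library(MASS)  # kde2d() and mvrnorm()

set.seed(7)
# Hypothetical observations: two clusters in a two-dimensional trait space
obs <- rbind(
  MASS::mvrnorm(150, mu = c(-1.5, 0), Sigma = 0.3 * diag(2)),
  MASS::mvrnorm(150, mu = c(1.5, 0), Sigma = 0.3 * diag(2))
)

# Area occupied by the densest cells holding `threshold` of the probability
richness <- function(xy, lims, n = 50, threshold = 0.95) {
  d <- MASS::kde2d(xy[, 1], xy[, 2], n = n, lims = lims)
  cell_area <- diff(d$x[1:2]) * diff(d$y[1:2])
  p <- d$z / sum(d$z)
  ord <- order(p, decreasing = TRUE)
  n_keep <- sum(cumsum(p[ord]) <= threshold)
  n_keep * cell_area
}

# Common grid limits so that observed and null areas are comparable
lims <- c(-8, 8, -8, 8)
obs_rich <- richness(obs, lims)

# Null datasets: multivariate normal with the observed mean and covariance
nrep <- 99
null_rich <- replicate(
  nrep,
  richness(MASS::mvrnorm(nrow(obs), mu = colMeans(obs), Sigma = cov(obs)), lims)
)

# One-sided test ("greater") and standardized effect size
p_greater <- mean(null_rich >= obs_rich)
ses <- (obs_rich - mean(null_rich)) / sd(null_rich)
```

Swapping the mvrnorm() call for independent runif() draws within the observed range of each axis would give a rough analogue of the uniform null model.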
CP Carmona, et al. (2021). Fine-root traits in the global spectrum of plant form and function. Nature 597, 683–687.
S Diaz, et al. (2016). The global spectrum of plant form and function. Nature 529, 167–171.
# 1. PCA space, multivariate model (see Carmona et al. 2021, Nature)
x <- princomp(GSPFF)
funtest <- funspace(x = x, PCs = c(1, 2), threshold = 0.95)
funtestNull <- funspaceNull(funtest, null.distribution = 'multnorm', nrep = 1000)
summary(funtestNull)

# 2. Two raw traits and uniform distribution (see Diaz et al. 2016, Nature)
x <- GSPFF[, c("ph", "sla")]
funtest <- funspace(x = x, threshold = 0.95)
funtestNull <- funspaceNull(funtest, null.distribution = 'uniform', nrep = 1000)
summary(funtestNull)
Data on six aboveground traits for 2,630 species with complete trait information. Data was processed from the TRY database (https://www.try-db.org/TryWeb/Home.php) and used in the paper "Fine-root traits in the global spectrum of plant form and function" (Carmona et al. 2021, Nature). The data is available at https://doi.org/10.6084/m9.figshare.13140146. All traits are log10 transformed and scaled.
GSPFF
## 'GSPFF' A data frame with 2,630 rows and 6 columns:
leaf area
leaf nitrogen content
plant height
specific leaf area
specific stem density
seed mass
...
<https://doi.org/10.6084/m9.figshare.13140146>
Data on six aboveground traits for 10,746 species with incomplete trait information. Data was processed from the TRY database (https://www.try-db.org/TryWeb/Home.php) and used in the paper "Fine-root traits in the global spectrum of plant form and function" (Carmona et al. 2021, Nature). The data is available at https://doi.org/10.6084/m9.figshare.13140146. Only species with information for at least three traits are included. All traits are log10 transformed and scaled.
GSPFF_missing
## 'GSPFF_missing' A data frame with 10,746 rows and 6 columns:
leaf area
leaf nitrogen content
plant height
specific leaf area
specific stem density
seed mass
...
<https://doi.org/10.6084/m9.figshare.13140146>
Taxonomic data for 10,746 species with incomplete trait information (species with at least three traits).
GSPFF_missing_tax
## 'GSPFF_missing_tax' A data frame with 10,746 rows and 3 columns:
genus to which the species belongs
family to which the species belongs
order to which the species belongs
...
<https://doi.org/10.6084/m9.figshare.13140146>
Taxonomic data for 2,630 species with complete trait information.
GSPFF_tax
## 'GSPFF_tax' A data frame with 2,630 rows and 3 columns:
genus to which the species belongs
family to which the species belongs
order to which the species belongs
...
<https://doi.org/10.6084/m9.figshare.13140146>
Imputing incomplete trait information, with the possibility of using phylogenetic information
impute( traits, phylo = NULL, addingSpecies = FALSE, nEigen = 10, messages = TRUE )
traits |
A matrix or data.frame containing trait information with missing values. The rows correspond to observations (generally species) and the columns to the variables (generally traits). Traits can be continuous and/or categorical. Row names of the |
phylo |
(optional) A phylogenetic tree (an object of class "phylo") containing the evolutionary relationships between species. |
addingSpecies |
Logical, defaults to FALSE. Should species present in the trait matrix but not in the phylogeny be added to the phylogeny? If TRUE, the |
nEigen |
The number of phylogenetic eigenvectors to be considered. Defaults to 10. |
messages |
Logical, defaults to TRUE. Should the function return messages? |
impute
imputes trait values in trait matrices with incomplete trait information. It uses the Random Forest approach implemented in the missForest
package. Phylogenetic information can be incorporated in the imputation in the form of a phylogenetic tree, from which a number of phylogenetic eigenvectors are added to the trait matrix.
The function returns a list containing both the original trait data (incomplete) and the imputed trait data.
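The step in which phylogenetic eigenvectors are appended to the trait matrix can be pictured with base R alone. The sketch below is only an analogy under explicit assumptions: a hierarchical clustering stands in for a real phylogeny, and classical scaling (cmdscale) of its cophenetic distances stands in for the phylogenetic eigenvector decomposition. Neither the decomposition used by impute nor the missForest imputation itself is reproduced, and all object names are hypothetical.

```r
set.seed(11)
n_sp <- 40

# Hypothetical stand-in for a phylogeny: a tree from hierarchical clustering
tree <- hclust(dist(matrix(rnorm(n_sp * 3), ncol = 3)), method = "average")

# Pairwise distances between "species" along the tree
phylo_dist <- cophenetic(tree)

# First n_eigen axes of a principal coordinates analysis of those distances
n_eigen <- 5  # plays the role of nEigen
eigvec <- cmdscale(phylo_dist, k = n_eigen)
colnames(eigvec) <- paste0("eig", seq_len(n_eigen))

# Hypothetical trait matrix with missing values in one trait
traits <- data.frame(t1 = rnorm(n_sp), t2 = rnorm(n_sp))
traits$t1[sample(n_sp, 8)] <- NA

# Trait matrix augmented with the eigenvectors; an imputation method would
# then use the complete eigenvector columns as additional predictors
augmented <- cbind(traits, eigvec)
```

Because the eigenvector columns have no missing values, closely related species end up with similar predictor values, which is how phylogenetic information can inform the imputed traits.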
# GSPFF_missing dataset includes >10,000 species.
# Preparing and imputing this data takes very long time.
# Let's select a small random subset:
selectSPS <- 200
set.seed(2)
subset_traits <- GSPFF_missing[sample(1:nrow(GSPFF_missing), selectSPS), ]
deleteTips <- setdiff(phylo$tip.label, rownames(subset_traits))
subset_phylo <- ape::drop.tip(phylo, tip = deleteTips)
GSPFF_subset <- impute(traits = subset_traits, phylo = subset_phylo, addingSpecies = TRUE)
pca <- princomp(GSPFF_subset$imputed)
funtest <- funspace(pca)
plot(funtest, pnt = TRUE, pnt.cex = 0.2, arrows = TRUE)
summary(funtest)
Phylogenetic tree including information for 10,746 species with incomplete trait information (species with at least three traits), contained in GSPFF_missing
.
phylo
## 'phylo'
An object of class "phylo"
Takes a funspace
object produced by funspace()
or funspaceGAM()
and plots the trait probability distribution (TPD) or the map of the response variable (depending on which kind of funspace object is provided) in a functional space.
## S3 method for class 'funspace' plot( x = NULL, type = "global", which.group = NULL, quant.plot = FALSE, quant = NULL, quant.lty = 1, quant.col = "grey30", quant.lwd = 1, quant.labels = TRUE, colors = NULL, ncolors = 100, pnt = FALSE, pnt.pch = 19, pnt.cex = 0.5, pnt.col = "grey80", arrows = FALSE, arrows.length = 1, arrows.head = 0.08, arrows.col = "black", arrows.label.col = "black", arrows.label.pos = 1.1, arrows.label.cex = 1, axis.title = TRUE, axis.title.x = NULL, axis.title.y = NULL, axis.title.cex = 1, axis.title.line = 2, axis.cex = 1, globalContour = TRUE, globalContour.quant = NULL, globalContour.lwd = 3, globalContour.lty = 1, globalContour.col = "grey50", xlim = NULL, ylim = NULL, ... )
x |
A |
type |
character indicating whether the plots should represent the global distribution of observations ( |
which.group |
when plotting groups, either a character or a number indicating the name (character) or position (number) of a single group to be plotted individually. |
quant.plot |
Logical, Default is |
quant |
A vector specifying the quantiles to be plotted (in case |
quant.lty |
type of line to be used to represent quantiles. See |
quant.col |
Color to be used in the quantile lines. Defaults to |
quant.lwd |
Line width to be used in the quantile lines. Defaults to 1. |
quant.labels |
Logical, Default is |
colors |
A vector defining the colors of plotted quantiles in the TPD case. Only two colors need to be specified. The first color is automatically assigned to the highest quantile in |
ncolors |
number of colors to include in the color gradients set by |
pnt |
Logical, defaults to |
pnt.pch |
Numerical. Graphical parameter to select the type of point to be drawn. Default is set to 19. See |
pnt.cex |
Numerical. Graphical parameter to set the size of the points. Default is 0.5. See |
pnt.col |
Graphical parameter to set the points color. Default is |
arrows |
Logical, defaults to |
arrows.length |
Numerical. Graphical parameter to set the length of the arrow (see |
arrows.head |
Numerical. Graphical parameter to set the length of the arrow head (see |
arrows.col |
Graphical parameter to set the arrows color (see |
arrows.label.col |
Graphical parameter to set the color of the arrow labels. Default is |
arrows.label.pos |
Numerical. Graphical parameter to set the position of the arrow labels with respect to the arrow heads. Default is 1.1, which draws arrow labels slightly beyond the arrow heads. A value of 1 means drawing labels on top of arrow heads. |
arrows.label.cex |
Numerical. Graphical parameter to set the size of arrow labels. Defaults to 1. |
axis.title |
Logical. Default is |
axis.title.x |
Character. The title to be plotted in the x axis if |
axis.title.y |
Character. The title to be plotted in the y axis if |
axis.title.cex |
Numerical. Graphical parameter to set the size of the axes titles. Default is 1. |
axis.title.line |
Numerical. Graphical parameter to set the margin line on which axes titles are plotted. Default is 2. |
axis.cex |
Numerical. Graphical parameter to set the size of the axes annotation. Default is 1. |
globalContour |
Logical, Default is |
globalContour.quant |
A vector specifying the quantiles to be plotted (in case |
globalContour.lwd |
Line width to be used in the global contour lines. Defaults to 3. |
globalContour.lty |
type of line to be used to represent the global contour lines. See |
globalContour.col |
Graphical parameter to set the color of the global contour lines. Default is |
xlim |
the x limits (x1, x2) of the plot. |
ylim |
the y limits (y1, y2) of the plot. |
... |
Other arguments |
Produces default plots. If the input object was generated with funspace()
, the plot shows a bivariate functional trait space displaying trait probability densities (for single or multiple groups). If the input object was generated with funspaceGAM
, the plot shows a heatmap depicting how a target variable is distributed within the functional trait space (for single or multiple groups).
No return value. This function is called for its side effect: generating plots.
x <- princomp(GSPFF)
funtest <- funspace(x = x, PCs = c(1, 2), threshold = 0.95)
plot(funtest, type = "global",
     quant.plot = TRUE, quant.lwd = 2,
     pnt = TRUE, pnt.cex = 0.1,
     pnt.col = rgb(0.1, 0.8, 0.2, alpha = 0.2),
     arrows = TRUE, arrows.length = 0.7)
summary method for class "funspace"
## S3 method for class 'funspace' summary(object, ...)
object |
A |
... |
Other arguments |
Produces default summary. If the input object was generated with funspace()
, the summary includes information about the characteristics of the functional space (particularly if it derives from a PCA), along with functional diversity indicators (functional richness and functional divergence) for the whole set of observations and for each group (in case groups are specified). If the input object was generated with funspaceGAM()
, the function returns the summary for the GAM models for the whole set of observations and individual groups. In the case of funspace objects based on a TPD object created with the TPD
package, only information about groups is provided (since there is no global distribution). If the input was generated with funspaceNull()
, the function returns tests exploring the difference between the observed functional richness and the null model functional richness.
No return value. This function is called for its side effect: summarizing objects of class "funspace"
.
x <- princomp(GSPFF)
funtest <- funspace(x = x, PCs = c(1, 2), threshold = 0.95)
summary(funtest)