Title: | Multivariate Multiscale Spatial Analysis |
---|---|
Description: | Tools for the multiscale spatial analysis of multivariate data. Several methods are based on the use of a spatial weighting matrix and its eigenvector decomposition (Moran's Eigenvectors Maps, MEM). Several approaches are described in the review Dray et al (2012) <doi:10.1890/11-1183.1>. |
Authors: | Stéphane Dray [aut], David Bauman [ctb], Guillaume Blanchet [ctb], Daniel Borcard [ctb], Sylvie Clappe [ctb], Guillaume Guenard [ctb], Thibaut Jombart [ctb], Guillaume Larocque [ctb], Pierre Legendre [ctb], Naima Madi [ctb], Hélène H Wagner [ctb], Aurélie Siberchicot [ctb, cre] |
Maintainer: | Aurélie Siberchicot <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.3-24 |
Built: | 2024-12-07 06:35:22 UTC |
Source: | CRAN |
This function constructs eigenvectors of a site-by-link matrix. Weights can be applied to the links.
aem( aem.build.binary, binary.mat, weight, rm.link0 = FALSE, print.binary.mat = FALSE )
aem.build.binary |
Object created by function |
binary.mat |
Site (n rows) by link (k columns) matrix. The 1s in the matrix represent the presence of a link influencing a site, directly or indirectly; otherwise the values are 0s. |
weight |
Vector of weights of length k, to be applied to the links. |
rm.link0 |
Logical ( |
print.binary.mat |
Logical ( |
If only an object of class aem.build.binary is given to this function, the argument binary.mat is not considered. binary.mat is only considered when the argument aem.build.binary is missing.
If weights are applied to the links, the length of vector weight has to take into account whether the links connecting real sites to the origin (the fictitious site 0) have been kept or removed.
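The rule on the length of weight can be illustrated with a small base-R sketch. The edge list below is hypothetical (it is not produced by the package); it only mimics the two-column (from, to) layout in which links from the fictitious site 0 come first.

```r
# Hypothetical edge list: two columns (from, to); links that start at the
# fictitious site 0 are listed first.
edges <- rbind(c(0, 1), c(0, 2), c(1, 3), c(2, 3), c(3, 4))

# Number of weights needed when the links to site 0 are kept
n.keep <- nrow(edges)

# Number of weights needed when the links to site 0 are removed
n.drop <- sum(edges[, 1] != 0)

c(kept = n.keep, removed = n.drop)
```

Under this assumption, a weight vector built for the case where the site-0 links are removed is shorter by the number of sites connected to the origin.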
values |
A vector of singular values associated with the AEM. |
vectors |
A matrix of eigenvectors. Each column is an AEM eigenfunction (or variable). |
mod.binary.mat |
A site-by-link matrix modified through the function. |
It sometimes happens that AEM eigenfunctions have equal singular values. In that case, different sets of AEM eigenfunctions may be produced on different platforms.
Eigenvectors associated with an eigenvalue smaller than a small numerical tolerance are considered negligible. They have been removed from the created AEM eigenfunctions.
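As a rough illustration of how eigenfunctions and singular values can be derived from a site-by-link matrix, the base-R sketch below weights the links, centres the columns, and applies a singular value decomposition. The toy matrix, the weights, and the tolerance `tol` are assumptions made for the example; this is not the package's own code.

```r
# Toy site-by-link matrix (4 sites, 3 links); values are illustrative only
E <- matrix(c(1, 0, 0,
              1, 1, 0,
              1, 1, 1,
              1, 0, 1), nrow = 4, byrow = TRUE)
w <- c(1, 0.5, 0.25)               # hypothetical link weights (length k = 3)

Ew  <- sweep(E, 2, w, `*`)         # apply one weight per link (column)
Ec  <- scale(Ew, center = TRUE, scale = FALSE)  # centre each column
dec <- svd(Ec)

tol  <- 1e-12                      # assumed tolerance for negligible values
keep <- dec$d > tol
values  <- dec$d[keep]             # retained singular values
vectors <- dec$u[, keep, drop = FALSE]  # one eigenfunction per column
```

In this toy case the first link influences every site, so its column is constant and contributes nothing after centring; only two eigenfunctions are retained.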
F. Guillaume Blanchet
Blanchet F.G., P. Legendre and Borcard D. (2008) Modelling directional spatial processes in ecological data. Ecological Modelling, 215, 325-336.
    # Construction of object of class nb (spdep)
    if (require("spdep", quietly = TRUE)) {
      nb <- cell2nb(5, 5, "queen")

      # Create fictitious geographical coordinates
      xy <- cbind(1:25, expand.grid(1:5, 1:5))

      # Build binary site-by-link matrix
      bin.mat <- aem.build.binary(nb, xy)

      # Construct AEM eigenfunctions from an object of class aem.build.binary
      res <- aem(aem.build.binary = bin.mat, rm.link0 = FALSE)
      res$values

      # Illustrate 4 AEM eigenfunctions using bubble plots
      opal <- palette()
      palette(c("black", "white"))
      oldpar <- par(mfrow = c(2, 2))
      for (k in 1:4) {
        symbols(x = xy[, 2:3], circles = abs(res$vectors[, k]),
                inches = FALSE, asp = 1,
                fg = ifelse(sign(-res$vectors[, k]) + 1 > 0, 1, 0),
                bg = ifelse(sign(res$vectors[, k]) + 1 > 0, 1, 0),
                xlab = "x", ylab = "y")
        title(paste("AEM", k))
      }

      # Construct AEM eigenfunctions using only a site-by-link matrix
      res2 <- aem(binary.mat = bin.mat[[1]])
      res2$values

      # Illustrate 4 AEM eigenfunctions using bubble plots
      par(mfrow = c(2, 2))
      for (k in 1:4) {
        symbols(x = xy[, 2:3], circles = abs(res2$vectors[, k]),
                inches = FALSE, asp = 1,
                fg = ifelse(sign(-res2$vectors[, k]) + 1 > 0, 1, 0),
                bg = ifelse(sign(res2$vectors[, k]) + 1 > 0, 1, 0),
                xlab = "x", ylab = "y")
        title(paste("AEM", k))
      }
      palette(opal)
      par(oldpar)

      # Construct AEM eigenfunctions with a function of the distance
      # as weights to put on the links

      # Construction of object of class nb (spdep)
      nb <- cell2nb(5, 5, "queen")

      # Create fictitious geographical coordinates
      xy <- cbind(1:25, expand.grid(1:5, 1:5))

      # Build binary site-by-link matrix
      bin.mat <- aem.build.binary(nb, xy)

      # Construct a matrix of distances
      long.lien.mat <- as.matrix(dist(xy))

      # Extract the edges, remove the ones directly linked to site 0
      lien.b <- bin.mat$edges[-1:-5, ]

      # Construct a vector giving the length of each edge
      long.lien <- vector(length = nrow(lien.b))
      for (i in 1:nrow(lien.b)) {
        long.lien[i] <- long.lien.mat[lien.b[i, 1], lien.b[i, 2]]
      }

      # Construct a vector of weights based on distance
      weight.vec <- 1 - (long.lien / max(long.lien))^2

      # Construct AEM eigenfunctions from an object of class aem.build.binary
      res <- aem(aem.build.binary = bin.mat, weight = weight.vec, rm.link0 = TRUE)
      res

      # Computing Moran's I for AEMs
      # Building AEMs
      xy <- cbind(1:25, expand.grid(1:5, 1:5))
      Wdist <- 1 / as.matrix(dist(xy[, 2:3]))
      nb <- cell2nb(5, 5, "queen")
      bin.mat <- aem.build.binary(nb, xy)
      linkBase <- bin.mat[[2]]
      link <- linkBase[-which(linkBase[, 1] == 0), ]
      weight <- numeric()
      for (i in 1:nrow(link)) {
        weight[i] <- Wdist[link[i, 1], link[i, 2]]
      }
      AEM <- aem(bin.mat, weight = weight, rm.link0 = TRUE)

      # Constructing asymmetric matrix
      matasym <- matrix(0, ncol = 25, nrow = 25)
      for (i in 1:nrow(link)) {
        matasym[link[i, 1], link[i, 2]] <- weight[i]
      }

      # Build a listw object from the asymmetric matrix
      listwAsym <- mat2listw(matasym, style = "B", zero.policy = TRUE)

      # Calculate Moran's I for AEM
      MoranIAEM <- moran.randtest(AEM$vectors, listwAsym)
    }
This function constructs a site-by-edge binary matrix. It uses a set of site coordinates and a connexion diagram (object of class nb from the spdep package). The 1s in the matrix represent the presence of a link influencing a site, directly or indirectly; otherwise the values are 0s.
Graphically, the function is implemented such that the directional process is
considered to be going from the bottom to the top of the screen in the
graphical output of R. As such, the origin is underneath the set of points
representing the sites. Prepare the table of site coordinates accordingly.
aem.build.binary( nb.object = NULL, coords, link = NULL, unit.angle = "degrees", rot.angle = 0, rm.same.y = TRUE, plot.connexions = TRUE )
nb.object |
Object of class |
coords |
A three-column matrix or data frame. Column 1: identifiers of the points (must be numeric). Columns 2 and 3: the X and Y coordinates of the points. |
link |
A two-column matrix. Each row defines an edge. Column 1: the site from which the edge starts. Column 2: the site at which the edge ends. All values in |
unit.angle |
Character. The measurement units in which the angle is defined: either "degrees" (default) or "radians". |
rot.angle |
Numeric. Angle of the vector describing the process influencing the sites. This argument generates a rotation of the site coordinates. The set of coordinates is rotated counterclockwise. Negative values will produce a clockwise rotation. |
rm.same.y |
Logical ( |
plot.connexions |
Logical ( |
The lowest site in the gradient is the one that will connect to the fictitious site 0 to consider direction. Note that if there are multiple lowest sites, they will all be connected to the fictitious site 0 to consider direction.
The site-by-edge matrix created by this function and the list of edges include the links associated with a fictitious site upstream of all others; see Blanchet et al. (2008) for details. The decision regarding whether the origin and the edges associated with it should be kept or removed is left to the user. Removal of this site and of its associated edges can be done manually after the construction of the site-by-edge matrix and of the list of edges. It can also be done when running the function aem.
If the connexion diagram was modified so that the links connecting sites that are exactly perpendicular to the gradient have been removed, or if no sites lie at exactly the same level in the gradient, setting rm.same.y to TRUE will generate an error.
If all the sites have the same y coordinates after rotation, e.g. a horizontal transect perpendicular to the defined spatial asymmetry, this analysis should not be used.
The argument plot.connexions will plot the sites (coords) in black, after rotation, if any, and the connexion diagram (nb.object) in red. The site labels are also plotted on the graph. To show the direction of the spatial asymmetry considered by the function, a fictive site (in blue) is added upstream. This fictive site is linked (blue edges) to the site(s) that are the most upstream ones. Since this graph is generic, it might sometimes look odd; however, the information given will remain accurate.
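The rotation controlled by unit.angle and rot.angle can be pictured with the following base-R sketch. The helper `rotate_xy` is hypothetical and rotates about the origin, which is an assumption; the package may handle the centre of rotation differently.

```r
# Hypothetical helper: rotate X/Y coordinates counterclockwise about the
# origin. Negative angles give a clockwise rotation.
rotate_xy <- function(xy, angle, unit = c("degrees", "radians")) {
  unit <- match.arg(unit)
  theta <- if (unit == "degrees") angle * pi / 180 else angle
  # Standard 2 x 2 counterclockwise rotation matrix (filled column-wise)
  R <- matrix(c(cos(theta), sin(theta),
                -sin(theta), cos(theta)), nrow = 2)
  as.matrix(xy) %*% t(R)
}

p <- cbind(1, 0)                     # a single point on the X axis
rotate_xy(p, 90)                     # moves onto the positive Y axis
rotate_xy(p, -90)                    # moves onto the negative Y axis
rotate_xy(p, pi / 2, unit = "radians")
```

A point on the positive X axis ends up on the positive Y axis after a 90-degree rotation, which matches the counterclockwise convention stated for rot.angle.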
se.mat |
A binary (n x k) matrix of site (n rows) by link edges (k columns). |
edges |
A matrix describing the link edges. It has 2 columns (from, to) and as many rows as there are edges. The edges linked to the fictitious site of origin are found at the beginning of the list. |
F. Guillaume Blanchet
Blanchet F.G., P. Legendre and Borcard D. (2008) Modelling directional spatial processes in ecological data. Ecological Modelling, 215, 325-336.
    ### Create an object of class nb (spdep)
    if (require("spdep", quietly = TRUE)) {
      nb <- cell2nb(5, 5, "queen")

      ### Create fictitious geographical coordinates
      xy <- cbind(1:25, expand.grid(1:5, 1:5))

      ### Build a binary site-by-link matrix; remove the site which have identical Y coordinate
      ### (by default argument: rm.same.y = TRUE)
      bin.mat <- aem.build.binary(nb, xy)
      str(bin.mat)

      ### Build a binary site-by-link matrix using the argument link: remove the site which
      ### have identical Y coordinate (by default argument: rm.same.y = TRUE)
      edges <- expand.grid(1, 2:25)
      bin.mat <- aem.build.binary(coords = xy, link = edges)
      str(bin.mat)

      ### Build a binary site-by-link matrix, making the process affect the points at
      ### an angle of 45 degrees
      bin.mat.45 <- aem.build.binary(nb, xy, rot.angle = 45)
      str(bin.mat.45)

      ### Build a binary site-by-link matrix, making the process affect the points at
      ### an angle of pi/3 radians
      bin.mat.pi3 <- aem.build.binary(nb, xy, unit.angle = "radians", rot.angle = pi / 3)
      str(bin.mat.pi3)
    }
This function constructs AEM eigenfunctions for multi-scale analysis of a regular time series or spatial transect of univariate or multivariate data.
aem.time(n, w = NULL, moran = FALSE)
n |
Numeric. Number of points in the series. |
w |
A vector of weights to be applied to the edges (columns of matrix
E). Equal weights are used if no vector |
moran |
Logical. If |
Time series represent a form of directional stochastic process. To emphasize the directional nature of the process influencing the data, AEM analysis, which was designed to take trends into account, should be applied to the non-detrended series. MEM analysis (see scores.listw) can be applied to data series that were detrended to remove the directional component, as recommended by Blanchet et al. (2008, 2011) and Legendre & Legendre (2012, Subsection 14.1.2). Detrended palaeoecological sediment core data, for example, could be studied by MEM analysis.
No data file needs to be provided to this function. The AEM eigenvectors are constructed from a matrix E generated from the regular sequence of points along the series.
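One plausible layout for the matrix E of a regular series is sketched below in base R: n points, n - 1 edges between consecutive points, and a 1 wherever an edge lies upstream of a point. This layout, the example distances, and the tolerance are assumptions made for illustration, not a description of the package internals.

```r
n <- 6                               # number of points in the series

# Nodes-by-edges matrix: point i is influenced by edges 1..(i - 1)
E <- matrix(0, n, n - 1)
E[row(E) > col(E)] <- 1

# Hypothetical inter-point distances and the matching weights 1 / (d / dmax)
d <- c(1, 2, 1, 3, 2)
w <- 1 / (d / max(d))

Ew  <- sweep(E, 2, w, `*`)           # weight each edge (column)
dec <- svd(scale(Ew, center = TRUE, scale = FALSE))

# Keep the eigenfunctions with non-negligible singular values (assumed cutoff)
aem.vec <- dec$u[, dec$d > 1e-12, drop = FALSE]
dim(aem.vec)
```

With this construction the first point is influenced by no edge and the last point by all of them, which encodes the direction of the series.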
A vector of weights w can be provided, representing the ease of communication of matter, energy or information among the points. The simplest form would be the inverse of (d/dmax), where d is the distance between adjacent nodes and dmax is the maximum distance between adjacent nodes in the spatial or time series. More general forms of weights may represent the inverse of landscape resistance to the movement of organisms, propagules, genes, etc.
If the calculation of Moran's I is requested, the point coordinates are generated from the point positions along the series.
E |
Nodes-by-edges matrix E. |
values |
Eigenvalues of the principal component analysis of E. |
aem |
Matrix of AEM eigenfunctions normalized to unit length. |
Moran |
Moran's I statistics tested by a bilateral test with 999 permutations |
listw |
An object of class |
Pierre Legendre and F. Guillaume Blanchet
Blanchet F.G., P. Legendre and Borcard D. (2008) Modelling directional spatial processes in ecological data. Ecological Modelling, 215, 325-336.
Blanchet F.G., P. Legendre, R. Maranger, D. Monti, and P. Pepin. (2011) Modelling the effect of directional spatial ecological processes at different scales. Oecologia, 166, 357-368.
Legendre, P. and L. Legendre (2012) Numerical Ecology, 3rd English edition. Elsevier Science BV, Amsterdam.
Legendre, P. and O. Gauthier (2014) Statistical methods for temporal and space-time analysis of community composition data. Proceedings of the Royal Society B - Biological Sciences, 281, 20132728.
aem, scores.listw
    # Time series containing 20 equispaced observations
    out <- aem.time(20, moran = TRUE)

    # Time series containing 20 observations with unequal spacing
    # Generate (n-1) random interpoint distances
    distances <- runif(19, 1, 5)

    # Compute weights representing the ease of communication among points
    w <- 1 / (distances / max(distances))

    # Compute the AEM eigenfunctions
    out <- aem.time(20, w = w, moran = TRUE)
These functions construct a vector of weights that can be associated with the edges of the connexion diagram used as a basis to build AEM eigenfunctions. aem.weight.edges is general and can be used for 1- or 2-dimensional problems. aem.weight.time is meant to be used only for time series. It is a wrapper for aem.weight.edges.
aem.weight.edges( nb.object, coords, distmat = NULL, alpha = 2, beta = NULL, max.d = NULL, unit.angle = "degrees", rot.angle = 0, rm.same.y = TRUE, plot.connexions = TRUE )
aem.weight.time( dates, distmat = NULL, alpha = 2, beta = NULL, max.d = NULL, unit.angle = "degrees", rot.angle = 0, rm.same.y = TRUE, plot.connexions = TRUE )
nb.object |
Object with class 'nb', computed by the spdep package, containing a list of neighbours for each sampling unit (site or time). |
coords |
A three-column matrix or data frame. Column 1: identifiers of the points (must be numeric). Columns 2 and 3: the X and Y coordinates of the points. |
distmat |
Class 'matrix' or 'dist' object containing a dissimilarity or distance matrix. (See details). |
alpha |
Numeric. Exponent of the first weighting function. (See details). |
beta |
Numeric. Exponent of the second weighting function. (See details). |
max.d |
Numeric. Maximum distance for weighting. Default value if
max.d=NULL: the maximum distance among a set of sites divided by 2 or the
full span of a time series divided by 2 (not recommended in most problems,
see details). A warning is given if |
unit.angle |
Character. The measurement units in which the angle is defined: either "degrees" (default) or "radians". |
rot.angle |
Numeric. Angle of the vector describing the process influencing the sites. This argument generates a rotation of the site coordinates. The set of coordinates is rotated counterclockwise. Negative values will produce a clockwise rotation. |
rm.same.y |
Logical ( |
plot.connexions |
Logical ( |
dates |
A vector of dates, class 'numeric' or 'Date'. |
These functions should be used in close relationship with aem.build.binary; consequently, many of the arguments of these functions and of aem.build.binary are the same.
The argument distmat may contain general forms of dissimilarity, for example the difficulty of transfer of individuals, matter or energy among the sampling units through space or time.
In aem.weight.edges, two weighting functions, described in Legendre and Legendre (2012, eqs. 14.3 and 14.4), have been implemented, where $d_{ij}$ is the distance between sites $i$ and $j$:
Weighting function 1: | $1 - (d_{ij} / d_{max})^{\alpha}$ |
Weighting function 2: | $1 / d_{ij}^{\beta}$ |
Also note that if a value is provided for beta (that is, if it is not NULL), weighting function 2 is used regardless of whether alpha is defined or not.
In most applications, the default value of max.d
is not optimal. A more
meaningful solution in many applications is to compute a Moran's I correlogram
(for univariate data) or a Mantel correlogram (for multivariate data), and
provide the distance where the correlation becomes 0 as the value for max.d.
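The two weighting functions can be sketched in base R as follows. The forms used, 1 - (d/max.d)^alpha and 1/d^beta, are the commonly cited equations from Legendre and Legendre (2012) and are taken here as an assumption; the helper names `weight_f1` and `weight_f2` are hypothetical and not part of the package.

```r
# Hypothetical helpers for the two weighting functions (assumed forms)
weight_f1 <- function(d, max.d, alpha = 2) 1 - (d / max.d)^alpha
weight_f2 <- function(d, beta = 1) 1 / d^beta

d <- c(1, 2, 4)                      # illustrative edge lengths

w1 <- weight_f1(d, max.d = 4, alpha = 2)  # decreases to 0 at d = max.d
w2 <- weight_f2(d, beta = 1)              # inverse-distance weights

rbind(w1, w2)
```

With function 1, an edge whose length equals max.d receives a weight of 0, which is why the choice of max.d matters so much in practice.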
A vector of weights associating a value to each edge of the graph.
aem.weight.time()
:
Olivier Gauthier, Pierre Legendre and F. Guillaume Blanchet
Legendre, P. and L. Legendre (2012) Numerical Ecology, 3rd English edition. Elsevier Science BV, Amsterdam.
Legendre, P. and O. Gauthier (2014) Statistical methods for temporal and space-time analysis of community composition data. Proceedings of the Royal Society B - Biological Sciences, 281, 20132728.
aem.build.binary, sp.correlogram, mantel.correlog
    ### Time series example
    ### Example - 12 dates (days from January 1st of year 1)
    ### in a 6-year study starting September 5, 2000
    if (require("spdep", quietly = TRUE)) {
      dates <- as.Date(c(129, 269, 500, 631, 864, 976, 1228, 1352, 1606, 1730, 1957, 2087),
                       origin = "2000/1/1")
      autocor.limit <- 522  # Limit of autocorrelation in the correlogram

      ### Using aem.weight.time()
      (wtime <- aem.weight.time(dates, alpha = 2, max.d = autocor.limit))

      ### Using aem.weight.edges()
      n <- length(dates)
      nb <- cell2nb(1, n)
      xy.dates <- cbind(1:n, rep(1, n), dates)
      (wtime <- aem.weight.edges(nb, xy.dates, alpha = 2, max.d = autocor.limit))

      n <- length(dates)
      nb <- cell2nb(1, n)
      xy.dates <- cbind(1:n, dates, rep(1, n))  ## Note the inversion of 'dates' and 'rep(1,n)'
      wtime <- aem.weight.edges(nb, xy.dates, alpha = 2,
                                max.d = autocor.limit, rot.angle = 90)  # Note that 'rot.angle=90' was used

      ### Spatial example using default max.d (notice the warning)
      nb <- cell2nb(5, 5, "queen")
      xy <- cbind(1:25, expand.grid(1:5, 1:5))
      (wspace <- aem.weight.edges(nb, xy))
    }
Longitude and latitude of the 25 locations where bacterial production was sampled in Lake St. Pierre (Québec, Canada). Sampling was carried out on August 18, 2005.
data(bacProdxy)
A data frame with the coordinates of the 25 sampled locations
Blanchet F.G., P. Legendre, R. Maranger, D. Monti, and P. Pepin. (2011) Modelling the effect of directional spatial ecological processes at different scales. Oecologia, 166, 357-368.
Compute estimates of total beta diversity as the total variance in a community data matrix Y, as well as derived SCBD and LCBD statistics, for 19 dissimilarity coefficients or the raw data table. Computing beta diversity as Var(Y) for raw, untransformed community composition data is not recommended. Tests of significance of the LCBD indices are also produced.
beta.div( Y, method = "hellinger", sqrt.D = FALSE, samp = TRUE, nperm = 999, adj = TRUE, save.D = FALSE, clock = FALSE )
Y |
Community composition data. The object class can be either
|
method |
One of the 19 dissimilarity coefficients available in the
function: |
sqrt.D |
If
|
samp |
If |
nperm |
Number of permutations for the tests of significance of LCBD indices. |
adj |
Compute adjusted p-values using the Holm method. Default: |
save.D |
If |
clock |
If |
Calculations may be carried out in two ways, depending on the selected method.
For untransformed or transformed raw data, the total sum of squares (SStotal) is first computed, then the total beta diversity (BDtotal), which is SStotal divided by (n - 1), is calculated. This algorithm is used for methods "euclidean", "profiles", "hellinger", "chord", "log.chord", "chisquare". No transformation of the data is computed when the method is "euclidean". For methods "profiles", "hellinger", "chord", "log.chord", "chisquare", the algorithm begins with computation of the same-name transformation of the community data (Legendre and Gallagher 2001; Legendre and Legendre 2012, Section 7.7; Legendre and Borcard 2018); SStotal and BDtotal are then computed for the transformed data, followed by calculation of the SCBD and LCBD indices.
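The raw-data route can be sketched in a few lines of base R, here with the Hellinger transformation. The community matrix is invented for the example, and the code is only an illustration of the arithmetic, not the function's own implementation.

```r
# Invented community matrix: 4 sites (rows) by 3 species (columns)
Y <- matrix(c(10, 0, 5,
               2, 8, 0,
               0, 3, 7,
               4, 4, 4), nrow = 4, byrow = TRUE,
            dimnames = list(paste0("site", 1:4), paste0("sp", 1:3)))

Y.hel <- sqrt(Y / rowSums(Y))        # Hellinger transformation
S <- scale(Y.hel, center = TRUE, scale = FALSE)^2  # squared deviations

SStotal <- sum(S)                    # total sum of squares
BDtotal <- SStotal / (nrow(Y) - 1)   # total beta diversity
SCBD <- colSums(S) / SStotal         # species contributions
LCBD <- rowSums(S) / SStotal         # local (site) contributions
```

By construction, the SCBD values sum to 1 across species and the LCBD values sum to 1 across sites, and BDtotal equals the sum of the column variances of the transformed data.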
Calculations of BDtotal can also be conducted from a dissimilarity matrix. SStotal is computed by summing the squared dissimilarities in the lower triangular dissimilarity matrix and dividing by n. Then, total beta diversity (BDtotal) is obtained by dividing SStotal by (n-1). With option sqrt.D = TRUE, the computation of SStotal is equivalent to summing the distances instead of the squared distances. Choices are: "whittaker", "divergence", "canberra", "percentdiff", "ruzicka", "wishart", "kulczynski", "ab.jaccard", "ab.sorensen", "ab.ochiai", "ab.simpson", "jaccard", "sorensen", "ochiai". Equations for these dissimilarities are presented in Table 1 of Legendre and De Cáceres (2013). The Ružička index is described in Legendre (2014); this coefficient is suitable for beta diversity studies. See Chao et al. (2006) for details about the abundance-based (ab) coefficients.
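The dissimilarity route can be checked against the raw-data route with a small base-R sketch. It uses the Hellinger distance on an invented matrix, and it derives the LCBD values from the diagonal of the Gower-centred matrix, which is the approach described by Legendre and De Cáceres (2013); treating it as what the function does internally is an assumption.

```r
# Invented community matrix: 4 sites by 3 species
Y <- matrix(c(10, 0, 5,
               2, 8, 0,
               0, 3, 7,
               4, 4, 4), nrow = 4, byrow = TRUE)
Y.hel <- sqrt(Y / rowSums(Y))

D <- as.matrix(dist(Y.hel))          # Hellinger distance matrix
n <- nrow(D)

# Sum of squared dissimilarities in the lower triangle, divided by n
SStotal.D <- sum(D[lower.tri(D)]^2) / n
BDtotal.D <- SStotal.D / (n - 1)

# LCBD from the diagonal of the Gower-centred matrix
J <- diag(n) - matrix(1 / n, n, n)
G <- J %*% (-0.5 * D^2) %*% J
LCBD.D <- diag(G) / SStotal.D

# Raw-data route on the same transformed data, for comparison
S <- scale(Y.hel, center = TRUE, scale = FALSE)^2
c(from.D = SStotal.D, from.Y = sum(S))
```

For a Euclidean distance such as Hellinger, both routes give the same SStotal and the same LCBD values.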
Community composition data can be log-transformed prior to analysis with the chord distance; see Legendre and Borcard (2018). The log(y+1) transformation (log1p function of base) reduces the asymmetry of the species distributions. The chord-log distance, readily available among the methods of the beta.div function, is the chord distance computed on log(y+1)-transformed data. This combined transformation is meaningful for community composition data because the log is one of the transformations in the Box-Cox series, corresponding to exponent 0; see Legendre and Legendre (2012, Section 1.5.6). Exponent 1 (no transformation of the data) followed by the chord transformation and calculation of the Euclidean distance would simply produce the chord distance. Exponent 0.5 (square root) followed by the chord transformation and the Euclidean distance would produce the Hellinger distance. The chord, Hellinger and log-chord distances represent a series where the data are increasingly transformed to reduce the asymmetry of the distributions. Note that it is meaningless to subject log-transformed community composition data to the "profiles", "hellinger", or "chisquare" distances available in this function.
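The relationship between the chord, Hellinger and log-chord transformations can be verified with a short base-R sketch. The helper `chord` and the data are hypothetical and serve only to illustrate the series of transformations.

```r
# Hypothetical helper: chord transformation (rows scaled to unit length)
chord <- function(Y) Y / sqrt(rowSums(Y^2))

# Invented community matrix: 3 sites by 3 species
Y <- matrix(c(10, 0, 5,
               2, 8, 1,
               0, 3, 7), nrow = 3, byrow = TRUE)

hel.direct    <- sqrt(Y / rowSums(Y))   # Hellinger transformation
hel.via.chord <- chord(sqrt(Y))         # exponent 0.5, then chord
log.chord     <- chord(log1p(Y))        # log(y + 1), then chord

all.equal(hel.direct, hel.via.chord)
```

Taking the Euclidean distance on any of these transformed tables, for example with dist(), gives the corresponding chord, Hellinger or log-chord distance.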
The Jaccard, Sørensen and Ochiai coefficients are the binary forms of 10 of the 12 dissimilarity coefficients (including the Ružička index) that are suitable for beta diversity assessment. The equivalences are described in Legendre and De Cáceres (2013, Table 1). These popular coefficients can be computed directly using function beta.div without going to the trouble of applying the quantitative forms of these coefficients to data reduced to presence-absence form. beta.div produces the dissimilarity matrix in the form sqrt(D), which is Euclidean. Hence for these three coefficients, function beta.div should be used with option sqrt.D=FALSE.
Species contributions to beta diversity (SCBD indices for the species) are computed for untransformed or transformed raw data, but they cannot be computed from dissimilarity matrices.
Local contributions to beta diversity (LCBD indices) represent the degree of uniqueness of the sites in terms of their species compositions. They can be computed in all cases: raw (not recommended) or transformed data, as well as dissimilarity matrices. See Legendre and De Cáceres (2013) for details. LCBD indices are tested for significance by random, independent permutations within the columns of Y. This permutation method tests H0 that the species are distributed at random, independently of one another, among the sites, while preserving the species abundance distributions in the observed data. See Legendre and De Cáceres (2013) for discussion.
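The permutation scheme for the LCBD test can be sketched in base R as below. The data, the helper `lcbd_hellinger`, the number of permutations, and the way the p-value is computed (counting the observed value among the permutations) are assumptions for illustration; the package runs this step in compiled code.

```r
set.seed(1)

# Hypothetical helper: LCBD indices after a Hellinger transformation
lcbd_hellinger <- function(Y) {
  S <- scale(sqrt(Y / rowSums(Y)), center = TRUE, scale = FALSE)^2
  rowSums(S) / sum(S)
}

# Invented data; the third species is present everywhere, so no permuted
# row can be empty
Y <- matrix(c(10, 0, 5,
               2, 8, 1,
               0, 3, 7,
               4, 4, 4), nrow = 4, byrow = TRUE)

obs   <- lcbd_hellinger(Y)
nperm <- 199
ge    <- rep(1, nrow(Y))             # the observed value counts once

for (b in seq_len(nperm)) {
  Yp <- apply(Y, 2, sample)          # permute each column independently
  # Small epsilon guards against rounding when a permutation ties obs
  ge <- ge + (lcbd_hellinger(Yp) >= obs - 1e-12)
}
p.LCBD <- ge / (nperm + 1)
```

Permuting within columns keeps each species' abundance distribution intact while breaking the association between species and sites.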
This version of beta.div calls computer code written in C to speed up computation, especially for the permutation tests of the LCBD indices.
A list containing the following results:
beta: Total sum of squares and total beta diversity [= Var(Y)] of the data matrix. BDtotal statistics computed with the same D index are comparable among data sets having the same or different numbers of sampling units (n), provided that they are of the same size or represent the same sampling effort.
SCBD: Vector of Species contributions to beta diversity (SCBD), if computed.
LCBD: Vector of Local contributions to beta diversity (LCBD) for the sites.
p.LCBD: P-values associated with the LCBD indices.
p.adj: Corrected P-values for the LCBD indices, Holm correction.
method: Method selected.
note: Notes indicate whether the selected coefficient is Euclidean or not.
D: The distance matrix if save.D=TRUE.
When all sites contain a different set of species with no species in common, the maximum value that BDtotal can take depends on the method used in the calculation.
With methods "hellinger", "chord", "profiles", which have maximum values of sqrt(2), BDtotal produces an index in the range [0, 1] with a maximum value of 1.
For dissimilarity indices with maximum values of 1, BDtotal has a maximum value of 0.5.
Dissimilarity indices that do not have maximum values of 1 or sqrt(2) produce BDtotal values that do not have an upper bound; hence they cannot be compared across taxonomic groups or among study sites. This group includes the chi-square distance.
See Legendre & De Cáceres (2013, p. 957-958), Table 2 and section Maximum value of BD.
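The two bounded cases can be checked by hand on a small example. The sketch below uses base R only and a made-up matrix in which no two sites share a species; the object names (Y.hel, BD.hel, BD.max1) are hypothetical and the calculation simply applies BDtotal = SStotal/(n-1), it does not call beta.div.

```r
# Four sites, each holding a single species found nowhere else
Y <- diag(c(5, 3, 8, 2))
n <- nrow(Y)

# Hellinger-transformed data: all pairwise distances equal sqrt(2)
Y.hel <- sqrt(Y / rowSums(Y))
SStotal <- sum(scale(Y.hel, center = TRUE, scale = FALSE)^2)
BD.hel <- SStotal / (n - 1)

# A dissimilarity with maximum value 1: every pair of sites has D = 1.
# SStotal is computed from sqrt(D), so squaring gives back D itself.
D <- as.dist(1 - diag(n))
BD.max1 <- sum(D) / (n * (n - 1))

c(hellinger = BD.hel, Dmax1 = BD.max1)
```

With these inputs the first value is 1 and the second is 0.5, matching the maxima stated above.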
For two sites only, the LCBD results are not interesting. With all
coefficients, the two LCBD indices are equal to 0.5. The two associated
p-values are 1 because LCBD is 0.5 for all columnwise permutations of the
data.
The calculation is aborted when Y only contains two identical
rows of data. In that case, SStotal and BDtotal are 0 and the LCBD indices
cannot be computed (value NaN).
Pierre Legendre
Chao, A., R. L. Chazdon, R. K. Colwell and T. J. Shen. 2006. Abundance-based similarity indices and their estimation when there are unseen species in samples. Biometrics 62: 361-371.
Legendre, P. 2014. Interpreting the replacement and richness difference components of beta diversity. Global Ecology and Biogeography 23: 1324-1334.
Legendre, P. and D. Borcard. 2018. Box-Cox-chord transformations for community composition data prior to beta diversity analysis. Ecography 41: 1820-1824.
Legendre, P. and M. De Cáceres. 2013. Beta diversity as the variance of community data: dissimilarity coefficients and partitioning. Ecology Letters 16: 951-963.
Legendre, P. and E. D. Gallagher. 2001. Ecologically meaningful transformations for ordination of species data. Oecologia 129: 271-280.
Legendre, P. and Legendre, L. 2012. Numerical Ecology. 3rd English edition. Elsevier Science BV, Amsterdam.
if(require("vegan", quietly = TRUE) && require("adegraphics", quietly = TRUE)){
data(mite)
res = beta.div(mite, "hellinger", nperm=999)

# Plot a map of the LCBD indices using the Cartesian coordinates
data(mite.xy)
s.value(mite.xy, res$LCBD, symbol = "circle", col = c("white", "brown"),
        main="Map of mite LCBD")

### Example using the mite abundance data and the percentage difference dissimilarity
res = beta.div(mite, "percentdiff", nperm=999, clock=TRUE)

# Plot a map of the LCBD indices
signif = which(res$p.LCBD <= 0.05)    # Which are the significant LCBD indices?
nonsignif = which(res$p.LCBD > 0.05)  # Which are the non-significant LCBD indices?
g1 <- s.value(mite.xy[signif,], res$LCBD[signif], ppoint.alpha = 0.5,
              plegend.drawKey = FALSE, symbol = "circle", col = c("white", "red"),
              main="Map of mite LCBD (red = significant indices)")
g2 <- s.value(mite.xy[nonsignif,], res$LCBD[nonsignif], ppoint.alpha = 0.5,
              symbol = "circle", col = c("white", "blue"))
g2+g1
}
Podani-family and Baselga-family decompositions of the Jaccard and Sørensen dissimilarity coefficients and their quantitative forms (Ruzicka and percentage difference) into replacement and richness difference components, for species presence-absence or abundance data, as described in Legendre (2014).
beta.div.comp(mat, coef = "J", quant = FALSE, save.abc = FALSE)
mat |
Community composition data ( |
coef |
Family of coefficients to be computed.
The quantitative form of the Sørensen dissimilarity is the percentage difference index. The quantitative form of the Jaccard dissimilarity is the Ruzicka index. |
quant |
If |
save.abc |
If |
For species presence-absence data, the dissimilarity coefficients are
Jaccard = (b+c)/(a+b+c) and Sørensen = (b+c)/(2*a+b+c) with the usual a,b,c
notation. For species abundance data, the dissimilarity coefficients are the
Ruzicka index = (B+C)/(A+B+C) and Odum's percentage difference =
(B+C)/(2A+B+C) (aka Bray-Curtis in some packages), where
A = sum of the intersections (or minima) of species abundances at two sites,
B = sum of abundances at site 1 minus A,
C = sum of abundances at site 2 minus A.
The binary
(quant=FALSE
) and quantitative (quant=TRUE
) forms of the S and
J indices return the same values when computed for presence-absence data.
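A worked two-site example makes the a,b,c and A,B,C notation concrete. It is written in base R with made-up abundances; the vectors y1 and y2 and the object names are hypothetical and the code does not call beta.div.comp.

```r
y1 <- c(10, 0, 5, 3)   # species abundances at site 1
y2 <- c( 4, 6, 5, 0)   # species abundances at site 2

# Quantitative components
A <- sum(pmin(y1, y2))   # sum of the minima (intersection)
B <- sum(y1) - A         # abundance unique to site 1
C <- sum(y2) - A         # abundance unique to site 2
ruzicka  <- (B + C) / (A + B + C)
percdiff <- (B + C) / (2 * A + B + C)

# Presence-absence components (a, b, c), suffixed .pa to avoid masking c()
p1 <- as.numeric(y1 > 0)
p2 <- as.numeric(y2 > 0)
a.pa <- sum(pmin(p1, p2))
b.pa <- sum(p1) - a.pa
c.pa <- sum(p2) - a.pa
jaccard  <- (b.pa + c.pa) / (a.pa + b.pa + c.pa)
sorensen <- (b.pa + c.pa) / (2 * a.pa + b.pa + c.pa)

c(ruzicka = ruzicka, percdiff = percdiff, jaccard = jaccard, sorensen = sorensen)
```

Here A = 9, B = 9 and C = 6, so the Ruzicka index is 15/24 and the percentage difference is 15/33. Since a.pa, b.pa and c.pa are obtained by applying the quantitative recipe to the 0/1 vectors, the example also shows why the binary and quantitative forms coincide on presence-absence data.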
A list containing the following results:
repl: Replacement matrix (class dist).
rich: Richness/abundance difference or nestedness matrix (class dist). With options "BJ", "BS" and "N", rich contains nestedness indices. With option "N", the repl[i,j] and rich[i,j] values do not add up to D[i,j].
D: Dissimilarity matrix (class dist).
part: Beta diversity partitioning vector:
BDtotal (total beta diversity) = sum(D.ij)/(n*(n-1)) (Legendre & De Cáceres 2013). This is equal to sum(d.ij^2)/(n*(n-1)) where d.ij = sqrt(D.ij). The dissimilarities are square-rooted because the Jaccard, Sørensen, Ruzicka and percentage difference indices are not Euclidean.
Repl = Total replacement diversity.
RichDiff|Nes = Total richness difference diversity (or nestedness).
Repl/BDtotal = Total replacement diversity/Total beta diversity.
RichDiff/BDtotal = Total richness difference diversity (or nestedness)/Total beta diversity.
note: Name of the dissimilarity coefficient.
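The elements of part can be reproduced by hand for a small presence-absence matrix. The sketch below is base R only and uses a made-up 3 x 4 matrix; it assumes the Podani-family Jaccard components Repl = 2*min(b,c)/(a+b+c) and RichDiff = |b-c|/(a+b+c), which should be checked against Legendre (2014), and it takes the sums over the pairs of sites stored in a dist object. It is an illustration, not the package code.

```r
# Toy presence-absence data: 3 sites (rows) by 4 species (columns)
Y <- matrix(c(1, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 1, 1), nrow = 3, byrow = TRUE)
n <- nrow(Y)

a.mat <- Y %*% t(Y)          # species shared by sites i and j
b.mat <- Y %*% t(1 - Y)      # species present at site i only
c.mat <- (1 - Y) %*% t(Y)    # species present at site j only
den <- a.mat + b.mat + c.mat

D    <- as.dist((b.mat + c.mat) / den)            # Jaccard dissimilarity
repl <- as.dist(2 * pmin(b.mat, c.mat) / den)     # replacement (assumed formula)
rich <- as.dist(abs(b.mat - c.mat) / den)         # richness difference (assumed formula)

part <- c(BDtotal  = sum(D)    / (n * (n - 1)),
          Repl     = sum(repl) / (n * (n - 1)),
          RichDiff = sum(rich) / (n * (n - 1)))
part <- c(part,
          part["Repl"] / part["BDtotal"],
          part["RichDiff"] / part["BDtotal"])
names(part)[4:5] <- c("Repl/BDtotal", "RichDiff/BDtotal")
part
```

In this example BDtotal is 13/36, and the Repl and RichDiff totals add up to it, so the last two elements sum to 1.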
The Jaccard and Sørensen dissimilarity coefficients and their quantitative forms, the Ruzicka and percentage difference indices, all have upper bounds (Dmax) of 1. Hence, when all sites contain a different set of species with no species in common, the maximum value that BDtotal can take is 0.5. See Legendre & De Cáceres (2013, p. 958), section Maximum value of BD. This differs from the values produced by function beta.div(): with methods "hellinger", "chord" and "profiles", which have maximum values of sqrt(2), BDtotal has a maximum value of 1 for these dissimilarities.
Pierre Legendre
Baselga, A. (2010) Partitioning the turnover and nestedness components of beta diversity. Global Ecology and Biogeography, 19, 134–143.
Baselga, A. (2012) The relationship between species replacement, dissimilarity derived from nestedness, and nestedness. Global Ecology and Biogeography, 21, 1223-1232.
Baselga, A. (2013) Separating the two components of abundance-based dissimilarity: balanced changes in abundance vs. abundance gradients. Methods in Ecology and Evolution, 4, 552-557.
Carvalho, J.C., Cardoso, P., Borges, P.A.V., Schmera, D. & Podani, J. (2013) Measuring fractions of beta diversity and their relationships to nestedness: a theoretical and empirical comparison of novel approaches. Oikos, 122, 825-834.
Legendre, P. 2014. Interpreting the replacement and richness difference components of beta diversity. Global Ecology and Biogeography, 23, 1324-1334.
Legendre, P. and M. De Cáceres. 2013. Beta diversity as the variance of community data: dissimilarity coefficients and partitioning. Ecology Letters 16: 951-963.
Podani, J., Ricotta, C. & Schmera, D. (2013) A general framework for analyzing beta diversity, nestedness and related community-level phenomena based on abundance data. Ecological Complexity, 15, 52-61.
Podani, J. & Schmera, D. 2011. A new conceptual and methodological framework for exploring and explaining pattern in presence-absence data. Oikos, 120, 1625-1638.
if(require(ade4, quietly = TRUE)){
data(doubs)
fish.sp = doubs$fish[-8,]   # Fish data; site 8 is removed because no fish were caught

# Compute and partition a matrix of Jaccard indices (presence-absence data)
out1 = beta.div.comp(fish.sp, coef="J", quant=FALSE)
out1$part

# Compute and partition a matrix of percentage difference indices
# (quantitative form of Sorensen index)
out2 = beta.div.comp(fish.sp, coef="S", quant=TRUE)
out2$part
# In paragraph Value, see the description of the 5 elements of vector part.
# Is the fish beta diversity dominated by replacement or richness/abundance difference?
}
The function chooseCN is a simple interface to build a connection network (CN) from xy coordinates. The user chooses from 6 types of graph and one additional weighting scheme. chooseCN calls functions from appropriate packages, handles non-unique coordinates and returns a connection network of class nb or listw. For graph types 1-4, duplicated locations are not accepted and raise an error.
chooseCN( xy, ask = TRUE, type = NULL, result.type = "nb", d1 = NULL, d2 = NULL, k = NULL, a = NULL, dmin = NULL, plot.nb = TRUE, edit.nb = FALSE )
xy |
a matrix or data.frame with two columns for x and y coordinates. |
ask |
a logical stating whether graph should be chosen interactively
(TRUE,default) or not (FALSE). Set to FALSE if |
type |
an integer giving the type of graph (see details). |
result.type |
a character giving the class of the returned object.
Either "nb" (default) or "listw", both from |
d1 |
the minimum distance between any two neighbours. Used if
|
d2 |
the maximum distance between any two neighbours. Used if
|
k |
the number of neighbours per point. Used if |
a |
the exponent of the inverse distance matrix. Used if |
dmin |
the minimum distance between any two distinct points. Used to
avoid infinite spatial proximities (defined as the inverse of the spatial
distances). Used if |
plot.nb |
a logical stating whether the resulting graph should be plotted (TRUE, default) or not (FALSE). |
edit.nb |
a logical stating whether the resulting graph should be edited manually for corrections (TRUE) or not (FALSE, default). |
There are 7 kinds of graphs proposed:
Delaunay triangulation (type 1)
Gabriel graph (type 2)
Relative neighbours (type 3)
Minimum spanning tree (type 4)
Neighbourhood by distance (type 5)
K nearest neighbours (type 6)
Inverse distances (type 7)
The last option (type=7) is not a true neighbouring graph: all sites are neighbours, but the spatial weights are directly proportional to the inverse spatial distances.
Also note that in this case, the output of the function is always a listw object, even if nb was requested.
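The idea behind the inverse-distance weighting can be illustrated in base R. This is a sketch under stated assumptions, not the code run by chooseCN: the coordinates are made up, a and dmin are used in the sense given for the arguments of the same names, and both the way small distances are floored at dmin and the row standardisation are assumptions of the example.

```r
# Made-up coordinates for 4 sites; sites 1 and 2 are very close to each other
xy <- cbind(x = c(0, 0.1, 1, 4), y = c(0, 0, 1, 5))
a <- 2        # exponent of the inverse distance
dmin <- 0.5   # floor applied to distances to avoid huge or infinite weights

d <- as.matrix(dist(xy))
d[d < dmin] <- dmin          # assumption: distances below dmin are raised to dmin
w <- 1 / d^a                 # spatial proximity = inverse distance to the power a
diag(w) <- 0                 # a site is not its own neighbour
w <- w / rowSums(w)          # assumption: weights standardised by row

round(w, 3)
```

Every off-diagonal weight is positive, which is why this option is described as a weighting scheme in which all sites are neighbours rather than as a neighbouring graph.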
The choice of the connection network has been discussed on the adegenet forum. Please search the archives of the adegenet website (section 'contact') using 'graph' as keyword.
Returns a connection network having the class nb or listw. The xy coordinates are passed as an attribute to the created object.
Thibaut Jombart
if(require("ade4", quietly = TRUE)){
data(mafragh)

oldpar <- par(mfrow=c(2,2))
cn1 <- chooseCN(mafragh$xy,ask=FALSE,type=1)
cn2 <- chooseCN(mafragh$xy,ask=FALSE,type=2)
cn3 <- chooseCN(mafragh$xy,ask=FALSE,type=3)
cn4 <- chooseCN(mafragh$xy,ask=FALSE,type=4)
par(oldpar)
}
Function constr.hclust
carries out space-constrained or
time-constrained agglomerative clustering from a multivariate dissimilarity
matrix.
constr.hclust( d, method = "ward.D2", links, coords, beta = -0.25, chron = FALSE, members = NULL )
d |
A |
method |
The agglomeration method to be used (default: "ward.D2"; see details). |
links |
A list of edges (or links) connecting the points. May be omitted in some cases; see details and examples |
coords |
Coordinates of the observations (data rows) in the dissimilarity matrix
|
beta |
The beta parameter for beta-flexible clustering (default:
|
chron |
Logical (TRUE or FALSE) indicating whether a chronological (i.e.
time-constrained or spatial transect) clustering should be calculated (default:
|
members |
NULL or a vector with length size of |
The agglomeration method to be used should be (an unambiguous abbreviation of) one of "ward.D", "ward.D2", "single", "complete", "average" (UPGMA), "mcquitty" (WPGMA), "centroid" (UPGMC), "median" (WPGMC), or "flexible". Method "ward.D2" (default) implements the Ward (1963) clustering criterion; method "ward.D" does not (Murtagh and Legendre, 2014).
Agglomerative clustering can be carried out with a constraint of spatial or
temporal contiguity. This means that only the objects that are linked in
links
are considered to be candidates for clustering: the next pair of
objects to cluster will be the pair that has the lowest dissimilarity value
among the pairs that are linked.
The same rule applies during the subsequent clustering steps, which involve groups of objects: the list of links is updated after each agglomeration step. All objects that are neighbours of one of the components that have fused are now neighbours of the newly formed cluster.
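The effect of the contiguity constraint on the first agglomeration step, and the update of the link list that follows, can be sketched in base R. The values and edges are those of the 6-point example used in the Examples section; the object names (pairs.all, best.free, best.link, E.new) are hypothetical and the code only mimics the selection rule, it is not the C implementation.

```r
var <- c(1.5, 0.2, 5.1, 3.0, 2.1, 1.4)
D <- as.matrix(dist(var))

# Edges (links) between the 6 points
E <- cbind(from = c(1, 1, 2, 3, 4, 3, 4),
           to   = c(2, 3, 3, 4, 5, 6, 6))

# Unconstrained: the closest pair among all pairs of points
pairs.all <- t(combn(6, 2))
best.free <- pairs.all[which.min(D[pairs.all]), ]

# Constrained: the closest pair among the linked pairs only
best.link <- E[which.min(D[E]), ]

# Update of the links after fusing points 4 and 5 into one cluster (labelled 4):
# neighbours of either member become neighbours of the new cluster
E.new <- E
E.new[E.new == 5] <- 4
E.new <- unique(E.new[E.new[, 1] != E.new[, 2], , drop = FALSE])

best.free   # points 1 and 6 are the most similar but are not linked
best.link   # points 4 and 5 are the most similar linked pair
E.new
```

Points 1 and 6 have the smallest dissimilarity overall (0.1) but share no link, so the constrained procedure fuses points 4 and 5 (0.9) first.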
The edges (links) are specified using argument links, which can be an object of class nb (see, e.g., tri2nb), an object of class listw (see, e.g., nb2listw), a two-element list or an object coercible as such (e.g., a two-column dataframe), or a two-column matrix with each row representing an edge and the columns representing the two ends of the edges. For lists with more than two elements, as well as dataframes or matrices with more than two columns, only the first two elements or columns are used for the analysis.
The edges are interpreted as being non directional; there is no need to
specify an edge going from point a to point b and one going from point b to
point a. While doing so is generally inconsequential for the analysis, it
carries some penalty in terms of computation time. It is a good practice to
place the nodes in increasing order of numbers from the top to the bottom and
from the left to the right of the list but this is not mandatory. A word of
caution: in cases where clusters with identical minimum distances occur, the
order of the edges in the list may have an influence on the result.
Alternative results would be statistically equivalent.
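Since edges are non-directional, a link list that contains both a-b and b-a can be tidied before it is passed on. The following base R sketch uses a made-up edge matrix E and hypothetical object names; it simply puts the smaller node number first, drops repeated edges, and sorts the rows.

```r
# Made-up edge list in which each edge appears in both directions
E <- cbind(from = c(1, 2, 3, 2), to = c(2, 1, 2, 3))

# Put the smaller node number first so that a-b and b-a become identical rows
E.sorted <- cbind(from = pmin(E[, 1], E[, 2]), to = pmax(E[, 1], E[, 2]))

# Drop repeated edges and order the rows by node number
E.unique <- unique(E.sorted)
E.unique <- E.unique[order(E.unique[, 1], E.unique[, 2]), , drop = FALSE]
E.unique
```

The four rows reduce to the two edges 1-2 and 2-3.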
When argument links is omitted, regular (unconstrained) clustering is performed and a hclust-class object is returned unless argument chron = TRUE. When argument chron = TRUE, chronological clustering is performed, taking the order of observations as their positions in the sequence. Argument links is not used when chron = TRUE. Argument chron allows one to perform a chronological clustering in the case where observations are ordered chronologically. Here, the term "chronologically" should not be taken restrictively: the method remains applicable to other sequential data sets such as spatial series made of observations along a transect.
When the graph described by links is not entirely connected, a warning message is issued to warn the user about the presence and number of disjoint clusters and a procedure is suggested to identify the disjoint clusters. The disjoint clusters (or singletons) are merged in the order of their indices (i.e. the two clusters with smallest indices are merged first) and so on until all disjoint clusters have been merged. The dissimilarity at which these clusters are merged is a missing value (NA) in vector height (i.e., unconnected clusters have undefined dissimilarities in constrained clustering).
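Whether a link list describes a connected graph can be checked before the analysis. The sketch below is a generic base R label-propagation routine, not the procedure suggested by the warning message; it uses the edges of the disjoint-clusters example from the Examples section and hypothetical object names.

```r
n <- 6
from <- c(1, 1, 2, 4, 4)
to   <- c(2, 3, 3, 5, 6)

# Start with one label per point, then repeatedly give both ends of every
# edge the smaller of their two labels until nothing changes
comp <- seq_len(n)
repeat {
  changed <- FALSE
  for (e in seq_along(from)) {
    lab <- min(comp[from[e]], comp[to[e]])
    if (comp[from[e]] != lab || comp[to[e]] != lab) {
      comp[c(from[e], to[e])] <- lab
      changed <- TRUE
    }
  }
  if (!changed) break
}

split(seq_len(n), comp)   # members of each disjoint cluster
length(unique(comp))      # number of disjoint clusters
```

For these edges, points 1 to 3 and points 4 to 6 form two disjoint clusters, so the final merge would be reported with a height of NA.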
If members != NULL, then d is taken to be a dissimilarity matrix between clusters instead of dissimilarities between individual objects. Then, members must be a vector giving the number of observations per cluster. In this way, the hierarchical clustering algorithm can be ‘started in the middle of the dendrogram’, e.g., in order to reconstruct the part of the tree above a cut. See examples in hclust for details on that functionality.
Memory storage and time to compute constrained clustering for N objects. The Lance and Williams algorithm for agglomerative clustering uses dissimilarity matrices. The amount of memory needed to store the dissimilarities among N observations as 64-bit double precision floating point variables (IEEE 754) is 8*N*(N-1)/2 bytes. For example, a dissimilarity matrix among 22 500 observations would require 2 024 910 000 bytes (1.89 GiB) of storage whereas one among 100 000 observations would take up 39 999 600 000 bytes (37.25 GiB). The implementation in this function needs to cache a copy of the dissimilarity matrix as its elements are modified following each merging of the closest clusters or singletons, thereby doubling the amounts of required memory shown above. Memory needed to store the other information associated with the clustering is much smaller. Users should make sure to have the necessary memory space (and system stability) before attempting to analyze large data sets. What counts as a large amount of memory has increased over time as computer hardware has evolved. We let users apply contemporary common sense as to what sample sizes represent manageable clustering problems. Computation time grows with N at roughly the same rate as the memory needed to store the dissimilarity matrices. See the Benchmarking example below.
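The memory figures quoted above follow directly from the formula 8*N*(N-1)/2. A small base R helper, given here only as an illustration (the function name dist.bytes is hypothetical), reproduces them and adds the doubled requirement caused by the cached copy.

```r
# Bytes needed to store the N*(N-1)/2 dissimilarities as 64-bit doubles
dist.bytes <- function(N) 8 * N * (N - 1) / 2

N <- c(22500, 100000)
mem <- data.frame(
  N = N,
  bytes = dist.bytes(N),
  GiB = round(dist.bytes(N) / 2^30, 2),
  GiB.with.cache = round(2 * dist.bytes(N) / 2^30, 2)   # matrix plus cached copy
)
mem
```

Evaluating the helper for a planned sample size before running the clustering gives a quick lower bound on the memory that must be available.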
With large data sets, a manageable output describing the classification of the sites is obtained with function cutree(x, k) where k is the number of groups. A dendrogram would be unreadable.
A constr.hclust-class object.
Pierre Legendre (preliminary version coded in R) and Guillaume Guénard (present version mostly coded in C)
Guénard, G. and P. Legendre. 2022. Hierarchical clustering with contiguity constraint in R. Journal of Statistical Software 103(7): 1-26 doi:10.18637/jss.v103.i07
Langfelder, P. and S. Horvath. 2012. Fast R functions for robust correlations and hierarchical clustering. Journal of Statistical Software 46(11): 1-17. doi:10.18637/jss.v046.i11
Legendre, P. and L. Legendre. 2012. Numerical ecology, 3rd English edition. Elsevier Science BV, Amsterdam. doi:10.1016/S0304-3800(00)00291-X
Murtagh, F. and P. Legendre. 2014. Ward’s hierarchical agglomerative clustering method: which algorithms implement Ward’s criterion? Journal of Classification 31: 274-295. doi:10.1007/s00357-014-9161-z
Ward, J. H. 1963. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association 58: 236-244. doi:10.1080/01621459.1963.10500845
plot.constr.hclust, hclust, cutree, and ScotchWhiskey
## First example: Artificial map data from Legendre & Legendre
## (2012, Fig. 13.26): n = 16

dat <- c(41,42,25,38,50,30,41,43,43,41,30,50,38,25,42,41)
coord.dat <- matrix(c(1,3,5,7,2,4,6,8,1,3,5,7,2,4,6,8,
                      4.4,4.4,4.4,4.4,3.3,3.3,3.3,3.3,
                      2.2,2.2,2.2,2.2,1.1,1.1,1.1,1.1),16,2)

## Obtaining a list of neighbours:
library(spdep)
listW <- nb2listw(tri2nb(coord.dat), style="B")
links.mat.dat <- listw2mat(listW)
neighbors <- listw2sn(listW)[,1:2]

## Calculating the (Euclidean) distance between points:
D.dat <- dist(dat)

## Display the points:
plot(coord.dat, type='n',asp=1)
title("Delaunay triangulation")
text(coord.dat, labels=as.character(as.matrix(dat)), pos=3)
for(i in 1:nrow(neighbors))
    lines(rbind(coord.dat[neighbors[i,1],],
                coord.dat[neighbors[i,2],]))

## Unconstrained clustering by hclust:
grpWD2_hclust <- hclust(D.dat, method="ward.D2")
plot(grpWD2_hclust, hang=-1)

## Clustering without a contiguity constraint;
## the result is represented as a dendrogram:
grpWD2_constr_hclust <- constr.hclust(D.dat, method="ward.D2")
plot(grpWD2_constr_hclust, hang=-1)

## Clustering with a contiguity constraint described by a list of
## links:
grpWD2cst_constr_hclust <-
    constr.hclust(
        D.dat, method="ward.D2",
        neighbors, coord.dat)

## To visualize using hclust's plotting method:
## stats:::plot.hclust(grpWD2cst_constr_hclust, hang=-1)

## Plot the results on a map with k=3 clusters:
plot(grpWD2cst_constr_hclust, k=3, links=TRUE, las=1, xlab="Eastings",
     ylab="Northings", cex=3, lwd=3)

## Generic functions from hclust can be used, for instance to obtain
## a list of members of each cluster:
cutree(grpWD2cst_constr_hclust, k=3)

## Now with k=5 clusters:
plot(grpWD2cst_constr_hclust, k=5, links=TRUE, las=1, xlab="Eastings",
     ylab="Northings", cex=3, lwd=3)
cutree(grpWD2cst_constr_hclust, k=5)

## End of the artificial map example

## Second example: Scotch Whiskey distilleries clustered using tasting
## scores (nose, body, palate, finish, and the four distances combined)
## constrained with respect to the distillery locations in Scotland.

## Documentation file about the Scotch Whiskey data: ?ScotchWhiskey

data(ScotchWhiskey)

## Cluster analyses for the nose, body, palate, and finish D
## matrices:
grpWD2cst_ScotchWhiskey <-
    lapply(
        ScotchWhiskey$dist,    ## A list of distance matrices
        constr.hclust,         ## The function called by function lapply
        links=ScotchWhiskey$neighbors@data,   ## The list of links
        coords=ScotchWhiskey$geo@coords/1000
    )

## The four D matrices (nose, body, palate, finish), represented as
## vectors in the ScotchWhiskey data file, are combined as follows to
## produce a single distance matrix integrating all four types of
## tastes:
Dmat <- ScotchWhiskey$dist
ScotchWhiskey[["norm"]] <-
    sqrt(Dmat$nose^2 + Dmat$body^2 + Dmat$palate^2 + Dmat$finish^2)

## This example shows how to apply constr.hclust to a single D matrix when
## the data file contains several matrices.
grpWD2cst_ScotchWhiskey[["norm"]] <-
    constr.hclust(
        d=ScotchWhiskey[["norm"]],method="ward.D2",
        ScotchWhiskey$neighbors@data,
        coords=ScotchWhiskey$geo@coords/1000
    )

## A function to plot the Whiskey clustering results:
plotWhiskey <- function(wh, k) {
    par(fig=c(0,1,0,1))
    plot(grpWD2cst_ScotchWhiskey[[wh]], k=k, links=TRUE, las=1,
         xlab="Eastings (km)", ylab="Northings (km)", cex=0.1, lwd=3,
         main=sprintf("Feature: %s",wh))
    text(ScotchWhiskey$geo@coords/1000,labels=1:length(ScotchWhiskey$geo))
    legend(x=375, y=700, lty=1L, lwd=3, col=rainbow(1.2*k)[1L:k],
           legend=sprintf("Group %d",1:k), cex=1.25)
    SpeyZoom <- list(xlim=c(314.7,342.2), ylim=c(834.3,860.0))
    rect(xleft=SpeyZoom$xlim[1L], ybottom=SpeyZoom$ylim[1L],col="#E6E6E680",
         xright=SpeyZoom$xlim[2L], ytop=SpeyZoom$ylim[2L], lwd=2, lty=1L)
    par(fig=c(0.01,0.50,0.46,0.99), new=TRUE)
    plot(grpWD2cst_ScotchWhiskey[[wh]], xlim=SpeyZoom$xlim,
         ylim=SpeyZoom$ylim, k=k, links=TRUE, las=1, xlab="", ylab="",
         cex=0.1, lwd=3, axes=FALSE)
    text(ScotchWhiskey$geo@coords/1000,labels=1:length(ScotchWhiskey$geo))
    rect(xleft=SpeyZoom$xlim[1L], ybottom=SpeyZoom$ylim[1L],
         xright=SpeyZoom$xlim[2L], ytop=SpeyZoom$ylim[2L], lwd=2, lty=1L)
}

## Plot the clustering results on the map of Scotland for 5 groups.
## The inset map shows the Speyside distilleries in detail:
plotWhiskey("nose", 5L)
plotWhiskey("body", 5L)
plotWhiskey("palate", 5L)
plotWhiskey("finish", 5L)
plotWhiskey("norm", 5L)

## End of the Scotch Whiskey tasting data example

## Not run:

## Third example: Fish community composition along the Doubs River,
## France. The sequence is analyzed as a case of chronological
## clustering, substituting space for time.

if(require("ade4", quietly = TRUE)){
data(doubs, package="ade4")
Doubs.D <- dist.ldc(doubs$fish, method="hellinger")
grpWD2cst_fish <- constr.hclust(Doubs.D, method="ward.D2", chron=TRUE,
                                coords=as.matrix(doubs$xy))
plot(grpWD2cst_fish, k=5, las=1, xlab="Eastings (km)",
     ylab="Northings (km)", cex=3, lwd=3)

## Repeat the plot with other values of k (number of groups)

## End of the Doubs River fish assemblages example

## Example with 6 connected points, shown in Fig. 2 of Guénard & Legendre paper

var = c(1.5, 0.2, 5.1, 3.0, 2.1, 1.4)
ex.Y = data.frame(var)

## Site coordinates, matrix xy
x.coo = c(-1, -2, -0.5, 0.5, 2, 1)
y.coo = c(-2, -1, 0, 0, 1, 2)
ex.xy = data.frame(x.coo, y.coo)

## Matrix of connecting edges E
from = c(1,1,2,3,4,3,4)
to = c(2,3,3,4,5,6,6)
ex.E = data.frame(from, to)

## Carry out constrained clustering analysis
test.out <-
    constr.hclust(
        dist(ex.Y),       # Response dissimilarity matrix
        method="ward.D2", # Clustering method
        links=ex.E,       # File of link edges (constraint) E
        coords=ex.xy      # File of geographic coordinates
    )

par(mfrow=c(1,2))
## Plot the map of the results for k = 3
plot(test.out, k=3)
## Plot the dendrogram
stats:::plot.hclust(test.out, hang=-1)
}

## Same example modified: disjoint clusters
## Same ex.Y and ex.xy as in the previous example
var = c(1.5, 0.2, 5.1, 3.0, 2.1, 1.4)
ex.Y = data.frame(var)

## Site coordinates, matrix xy
x.coo = c(-1, -2, -0.5, 0.5, 2, 1)
y.coo = c(-2, -1, 0, 0, 1, 2)
ex.xy = data.frame(x.coo, y.coo)

## Matrix of connecting edges E2
from = c(1,1,2,4,4)
to = c(2,3,3,5,6)
ex.E2 = data.frame(from, to)

## Carry out constrained clustering analysis
test.out2 <-
    constr.hclust(
        dist(ex.Y),       # Response dissimilarity matrix
        method="ward.D2", # Clustering method
        links=ex.E2,      # File of link edges (constraint) E
        coords=ex.xy      # File of geographic coordinates
    )
cutree(test.out2, k=2)

par(mfrow=c(1,2))
## Plot the map of the results for k = 3
plot(test.out2, k=3)
## Plot the dendrogram showing the disconnected groups
stats:::plot.hclust(test.out2, hang=-1)
axis(2,at=0:ceiling(max(test.out2$height,na.rm=TRUE)))

## End of the disjoint clusters example

## End(Not run)

## End of examples
Objects belonging to this class hold information about the constrained agglomerative clustering and allow one to display results graphically.
An object belonging to this class is a list with elements:
merge: A (n-1) by 2 matrix. Row i of matrix "merge" describes
the merging of clusters at step i of the clustering. If an element j in
the row is negative, it means that observation -j was merged at this
stage. If j is positive, it means that the merge was with the cluster
formed at the (earlier) stage j of the algorithm. Thus negative entries
in matrix "merge" indicate agglomerations of singletons, and
positive entries indicate agglomerations of non-singletons.
height: A set of (n-1) non-decreasing real values. The clustering height is the value of the criterion associated with the clustering method for the particular agglomeration.
order: A vector giving the permutation of the original observations suitable for plotting, in the sense that a cluster plot using this ordering and matrix merge will not have crossing branches.
labels: Labels for the clustered objects.
method: The agglomerative clustering method that has been used.
call: The call that produced the result.
dist.method: The distance that has been used to create dissimilarity
matrix "d" (only returned if the dissimilarity matrix object has a
"method" attribute attached to it).
links: A copy of the list of edges (if a matrix of edges was provided to the function).
coords: A copy of the coordinates (if coordinates were provided to the function).
The class inherits from hclust-class and describes the
tree produced by the constrained clustering procedure.
All class members except links and coords are identical to
those in hclust-class. Several methods designed to process
hclust-class objects are expected to also work with
constr.hclust-class objects.
hclust-class
Function to compute a contingency periodogram for a univariate series of qualitative data
Cperiodogram(x, T1 = 2, T2 = NULL, nperm = NULL, alpha = 0.05, graph = TRUE)
x |
a qualitative variable ( |
T1 |
first period included in the calculations (default: T1 = 2) |
T2 |
last period included in the calculations (default: T2 = n/2) |
nperm |
Number of permutations for the chi-square test. For chi-square
tests using the chi-square distribution, use the default |
alpha |
significance level for computation of the confidence limits |
graph |
a logical indicating if a graph is requested, by default
|
The contingency periodogram of Legendre et al. (1981) identifies periodic components in qualitative data vectors. The vector may contain classes of a qualitative variable or the classes obtained by hierarchical clustering or partitioning of a multivariate data table. The method is also described in Legendre & Legendre (2012). The optional graph produced by the function shows the following information:
In red: the B statistics (information in common).
In blue: Confidence limits for B without correction.
In green: Bonferroni-corrected limits of the confidence intervals.
In black: Confidence limits with progressive Bonferroni correction.
A table with the statistics for the selected periods:
Wilks' chi-square statistic (Wilks.chisq)
information in common (B),
degrees of freedom (df),
p-value (prob)
Confidence interval limits:
critical value of B without correction (B.crit),
critical value of B with Bonferroni correction based on the number of periods studied in the periodogram (B.crit.Bonf),
critical value of B with progressive Bonferroni correction (B.prog.Bonf).
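The statistics listed above can be illustrated with a minimal base-R sketch. It assumes that, for a given period, B is the information shared between the observed states and the positions within the period (computed with natural logarithms), and that Wilks' chi-square equals 2nB. The helper `info.in.common` is hypothetical and is not part of adespatial; `Cperiodogram` may handle details such as incomplete periods differently.

```r
# Hypothetical helper (not part of adespatial): information in common B
# between a qualitative series x and the positions within a period.
info.in.common <- function(x, period) {
  n <- length(x)
  pos <- (seq_len(n) - 1) %% period     # position of each observation within the period
  tab <- table(pos, x)                  # contingency table: positions by states
  entropy <- function(f) {              # Shannon entropy, natural logarithms
    p <- f[f > 0] / sum(f)
    -sum(p * log(p))
  }
  B <- entropy(rowSums(tab)) + entropy(colSums(tab)) - entropy(tab)
  c(B = B, Wilks.chisq = 2 * n * B)     # assumed relation: Wilks' statistic = 2nB
}

# Series from the Cperiodogram example, periods 2 to 8
test.vec <- c(1,1,2,3,3,2,1,2,3,2,1,1,2,3,3,1)
sapply(2:8, function(p) info.in.common(test.vec, p))
```

For a perfectly periodic series, B at the true period equals the entropy of the states, which is a quick sanity check for the helper.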
Pierre Legendre
Legendre, L., M. Fréchette & P. Legendre. 1981. The contingency periodogram: a method of identifying rhythms in series on nonmetric ecological data. Journal of Ecology 69: 965-979.
Legendre, P. and Legendre, L. 2012. Numerical Ecology. 3rd English ed. Elsevier, Amsterdam
# Data from the numerical example of Subsection 12.4.2 of Legendre and Legendre (2012).
test.vec <- c(1,1,2,3,3,2,1,2,3,2,1,1,2,3,3,1)

# Periodogram with tests using the chi-square distribution
res <- Cperiodogram(test.vec)

# Periodogram with permutation tests
res <- Cperiodogram(test.vec, nperm=2000, graph=FALSE)
This function reads a file containing the Cartesian coordinates of
sites forming different groups on the map, and constructs a combined staggered matrix
of dbMEM spatial eigenvectors, ready for use in RDA.
The method was first described and used in Declerck et al. (2011) and summarized in
the Borcard et al. (2011) book, section 7.4.3.5. These publications provided
preliminary versions of the present function. The present version is more completely
documented. Furthermore, it uses the dbmem function of the adespatial
package for computation of the eigenfunctions.
create.dbMEM.model(coord = NULL, D.mat = NULL, nsites)
coord |
Optional file containing the Cartesian coordinates of the sites. |
D.mat |
Optional distance matrix provided by user, class |
nsites |
A vector containing the number of sites per group. |
The geographic positions of the sites are provided either in a file of geographic
coordinates coord or as a geographic distance matrix D.mat.
The sites must, of course, be in the same order in file coord (or in file
D.mat) and in the response data file used in the RDA. All sites of a group must
be together in these two files, i.e. not interspersed. The numbers of sites in the
groups are provided in vector nsites. See example.
File coord, if provided, must contain Cartesian coordinates of the sites,
not coordinates in degrees. The Euclidean distance computed from the geographic
coordinates is a meaningful representation of the geographic relationships only if the
coordinates are Cartesian. Geodetic Cartesian coordinates can be derived from Lat-Lon
data in degrees using the function geoXY of the SoDA package. Beware of
UTM coordinates if the sites are not all located in the same UTM zone; UTM coordinates
are Cartesian only within a UTM zone. See
https://en.wikipedia.org/wiki/Universal_Transverse_Mercator_coordinate_system.
A matrix with n rows containing a set of k staggered matrices of
dbMEM eigenfunctions in its diagonal portion; n is the total number of sites
in the study and k is the number of groups. Each small matrix contains
the dbMEM functions, modelling positive spatial correlation, describing the spatial
relationships among the sites of a group. The remainder of the matrix is filled with
zeros. Zero is the mean value of all eigenfunctions describing within-group
relationships. This means that during the calculation of RDA, the sites of a focus
group will have, with each other, relationships described by the dbMEM eigenfunctions
of that group, whereas the sites outside that group will have weights of 0 in the
regressions that concern these eigenfunctions.
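The staggered layout described above can be sketched in base R. This is only an illustration of the block-diagonal structure: the helper `stagger` is hypothetical, and centred random columns stand in for the per-group dbMEM eigenfunctions that create.dbMEM.model would compute with dbmem.

```r
# Hypothetical helper: place one block per group along the diagonal,
# filling the remainder of the matrix with zeros.
stagger <- function(blocks) {
  out <- matrix(0, sum(sapply(blocks, nrow)), sum(sapply(blocks, ncol)))
  r0 <- 0
  c0 <- 0
  for (b in blocks) {
    out[r0 + seq_len(nrow(b)), c0 + seq_len(ncol(b))] <- b
    r0 <- r0 + nrow(b)
    c0 <- c0 + ncol(b)
  }
  out
}

set.seed(42)
nsites <- c(4, 3)   # two groups of sites (illustrative sizes)
# Stand-ins for the dbMEM eigenfunctions of each group: centred columns
blocks <- lapply(nsites, function(n) scale(matrix(rnorm(n * 2), n, 2), scale = FALSE))
staggered <- stagger(blocks)
dim(staggered)        # 7 rows (sites), 4 columns
colMeans(staggered)   # numerically zero: centred blocks padded with zeros
```

Because each block is centred and the padding is zero, every column of the combined matrix keeps a mean of zero, which is the property the text relies on.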
Pierre Legendre, 2010. Adaptation to adespatial: Daniel Borcard and Pierre Legendre, 2016
Borcard, D., F. Gillet and P. Legendre. 2011. Numerical ecology with R. Use R! series, Springer Science, New York.
Declerck, S. A. J., J. S. Coronel, P. Legendre & L. Brendonck. 2011. Scale dependency of processes structuring metacommunities of cladocerans in temporary pools of High-Andes wetlands. Ecography 34: 296-305.
{
# Generate random coordinates for 35 sites forming 6 distinct groups on the map
Easting <- runif(35)+c(rep(0,6),rep(1.5,7),rep(3,6), rep(0,5),rep(1.5,5),rep(3,6))
Northing<- runif(35)+c(rep(2.8,6),rep(2.3,7),rep(2.8,6), rep(0,5),rep(0.5,5),rep(0,6))
cartesian <- cbind(Easting,Northing)
rownames(cartesian) <- paste("S",1:nrow(cartesian),sep='')
nsites.per.group <- c(6,7,6,5,5,6)

result <- create.dbMEM.model(coord=cartesian, nsites=nsites.per.group)

# Draw a map to check the coding of the sites into the groups
site.codes <- unlist(apply(cbind(1:6),1,n=nsites.per.group,function(a,n) rep(a,n[a])))
col.vec <- c("green3","gray99","orange2","gold1","brown3","gray70")
plot(cartesian, pch=22, col="black", bg=col.vec[site.codes], cex=2, ylim=c(0,4),asp=1)
text(cartesian,labels=rownames(cartesian), cex=0.5, pos=3)

# Examine the staggered matrix of dbMEM eigenfunctions
# Not run: result
}
Compute distance-based Moran's eigenvector maps (dbMEM, also called dbMEM spatial eigenfunctions) from a geographic distance matrix, in view of spatial eigenfunction analysis.
dbmem( xyORdist, thresh = NULL, MEM.autocor = c("positive", "non-null", "all", "negative"), store.listw = TRUE, silent = TRUE )
xyORdist |
Either a matrix of spatial coordinates or a distance matrix
(class |
thresh |
A threshold value for truncation of the geographic distance
matrix. If |
MEM.autocor |
A string indicating if all MEMs must be returned or only
those corresponding to non-null, positive or negative autocorrelation. The
difference between options |
store.listw |
A logical indicating if the spatial weighting matrix should
be stored in the attribute |
silent |
A logical indicating if some information should be printed during computation: truncation level and time to compute the dbmem |
dbMEM eigenfunctions were called PCNM in early papers (Borcard and Legendre 2002, Borcard et al. 2004). There is a small difference in the computation: to construct PCNMs, the distance matrix subjected to PCoA contained zeros on the diagonal. In dbMEM, the matrix contains 4*thresh values on the diagonal. The result is that the dbMEM eigenvalues are smaller than the PCNM eigenvalues by a constant (equal to (n.sites * (4*thresh)^2)/2). The dbMEM eigenvalues are proportional to Moran's I coefficient of spatial correlation (Dray et al. 2006; Legendre and Legendre 2012). The dbMEM eigenvectors only differ from the PCNM eigenvectors by a multiplicative constant; this has no impact on the use of MEMs as explanatory variables in linear models. In this implementation, dbMEM eigenvectors have a norm equal to 1 (using the uniform weights 1/n.sites).
If a truncation value is not provided, the largest distance in a minimum
spanning tree linking all sites on the map is computed (returned by the
function give.thresh). That value is used as the truncation threshold
value (thresh).
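The default threshold can be reproduced approximately in base R. The sketch below relies on a known equivalence: the merge heights of single-linkage clustering are the edge lengths of a minimum spanning tree, so the largest height is the longest edge of that tree. It is an illustration only and makes no claim about how give.thresh is implemented; the coordinates are made up.

```r
# Four sites on a transect, with gaps of 1, 2 and 3 units (illustrative data)
xy <- cbind(x = c(0, 1, 3, 6), y = 0)
d <- dist(xy)

# Longest edge of the minimum spanning tree, obtained from single linkage
thresh <- max(hclust(d, method = "single")$height)
thresh   # 3, the largest gap between neighbouring sites
```

With this value as the truncation distance, every site keeps at least one neighbour within the threshold.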
A square regular grid produces multiple eigenvalues (i.e. eigenvalues that are equal) and multiple eigenvalues have an infinity of eigenvector solutions. Hence, different eigenvectors may be produced by this function on computers with different operating systems or implementations of R. In addition, the eigenvectors found by the dbmem function from the site coordinates may differ from the eigenvectors computed from the geographic distance matrix among the sites. Nonetheless, the different complete sets of eigenvectors will have the exact same explanatory power (R-square) for a given response vector or matrix, despite the fact that they are not fully correlated on a one-to-one basis. This is, however, not the case for subsets of eigenvectors selected using stepwise procedures.
An object of class orthobasisSp, subclass orthobasis.
The dbMEM eigenfunctions (principal coordinates of the truncated distance
matrix) are stored as a data.frame. It contains several attributes
(see ?attributes) including:
values: The dbMEM eigenvalues.
listw: The associated spatial weighting matrix (if store.listw = TRUE).
Stéphane Dray, Pierre Legendre, Daniel Borcard and F. Guillaume Blanchet
Borcard, D. and P. Legendre. 2002. All-scale spatial analysis of ecological data by means of principal coordinates of neighbour matrices. Ecological Modelling 153: 51-68.
Borcard, D., P. Legendre, C. Avois-Jacquet and H. Tuomisto. 2004. Dissecting the spatial structure of ecological data at multiple scales. Ecology 85: 1826-1832.
Dray, S., P. Legendre and P. R. Peres-Neto. 2006. Spatial modelling: a comprehensive framework for principal coordinate analysis of neighbour matrices (PCNM). Ecological Modelling 196: 483-493.
Legendre, P. and L. Legendre. 2012. Numerical ecology, 3rd English edition. Elsevier Science BV, Amsterdam.
if(require("ade4", quietly = TRUE) && require("adegraphics", quietly = TRUE)){

data(oribatid)
mite <- oribatid$fau      # 70 peat cores, 35 species
mite.xy <- oribatid$xy    # Geographic coordinates of the 70 cores

# Example 1: Compute the MEMs corresponding to all non-null eigenvalues
# thresh=1.012 is the value used in Borcard and Legendre (2002)
mite.dbmem1 <- dbmem(mite.xy, thresh=1.012, MEM.autocor = "non-null", silent = FALSE)
mite.dbmem1

# Print the (n-1) non-null eigenvalues
attributes(mite.dbmem1)$values
# or: attr(mite.dbmem1, "values")

# Plot the associated spatial weighting matrix
s.label(mite.xy, nb = attr(mite.dbmem1, "listw"))

# Plot maps of the first 3 dbMEM eigenfunctions
s.value(mite.xy, mite.dbmem1[,1:3])

# Compute and test associated Moran's I values
# Eigenvalues are proportional to Moran's I
test <- moran.randtest(mite.dbmem1, nrepet = 99)
plot(test$obs, attr(mite.dbmem1, "values"), xlab = "Moran's I", ylab = "Eigenvalues")

# Decreasing values of Moran's I for the successive MEM.
# The red line is the expected value of Moran's I under H0.
plot(test$obs, xlab="MEM rank", ylab="Moran's I")
abline(h=-1/(nrow(mite.xy) - 1), col="red")

# Example 2: Compute only the MEMs with positive eigenvalues (and positive Moran's I)
mite.dbmem2 <- dbmem(mite.xy, thresh=1.012)
# or: mite.dbmem2 <- dbmem(dist(mite.xy), thresh=1.012, silent=FALSE)
mite.dbmem2

# Examine the eigenvalues
attributes(mite.dbmem2)$values
# or: attr(mite.dbmem2, "values")

# Examine (any portion of) the dbmem spatial eigenvectors
tmp <- as.matrix(mite.dbmem2)
tmp[1:10,1:6]
}
Compute directional indices of community change along coenoclines or time.
directional.response(mat, method = "overlap", relativize = NULL)
mat |
A community composition data matrix with sites in rows and species
in columns. The direction of the physical process is indicated by the order of the
sampling units in |
method |
One of the 11 calculation methods available in the function:
|
relativize |
Compute relativized indices: |
The output matrix is non-symmetric, meaning that its upper triangle is not the mirror
image of the lower triangle. Given the direction of the physical process through space
or time indicated by the order of the sampling units, the output matrix mat.out
reflects that direction in its non-symmetric presentation, From (rows of the
matrix) To (columns of the matrix). Users of the function can extract one or the
other of these triangular portions and analyse them separately. See Appendix xx for
examples.
A list containing the following results:
mat.out: A square matrix with the chosen index computed among the sites.
Depending on the method chosen, this matrix may be symmetric or non-symmetric.
total.t: For methods #4 to 6, a matrix with total turnover (b+c); else NA.
total.n: For methods #7 to 9, a matrix with total nestedness a+abs(b-c)
if a>0; else NA.
total.strict.n: For methods #10 and 11, a matrix with total strict nestedness
a+abs(b-c) if a>0 and b!=c; else NA.
den: For calculation results with Jaccard or Sorensen denominators:
a square matrix of denominators. If relativize=NULL, den=NA.
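The quantities a, b and c used in these results can be sketched in base R for presence-absence data. The sketch assumes the usual convention: for a pair of sites (row site, column site), a counts the shared species, b the species found only in the row site, and c the species found only in the column site. Whether this orientation matches the From/To convention of directional.response is an assumption; the object names are illustrative.

```r
# Artificial presence-absence data from the example: 3 sites, 6 species
art.data <- matrix(c(1,1,1,0,0,0,
                     0,0,0,1,1,0,
                     0,0,0,0,0,1), nrow = 3, ncol = 6, byrow = TRUE)

a <- art.data %*% t(art.data)       # species shared by the row site and the column site
b <- art.data %*% t(1 - art.data)   # species present in the row site only
c.mat <- t(b)                       # species present in the column site only

turnover <- b + c.mat               # total turnover (b+c) for every pair of sites
turnover
```

Sites 1 and 2 share no species, so their turnover is the sum of their richness values (3 + 2 = 5).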
Dénes Schmera and Pierre Legendre
Schmera, D., P. Legendre, T. Eros, M. Toth, E. K. Magyari, B. Baur and J. Podani. 2022. New measures for quantifying directional changes in presence-absence community data. Ecological Indicators 136: 108618. https://doi.org/10.1016/j.ecolind.2022.108618
Verneaux, J. (1973) Cours d'eau de Franche-Comté (Massif du Jura). Recherches écologiques sur le réseau hydrographique du Doubs. Essai de biotypologie. Thèse d'État, Besançon. 1–257.
# Artificial Example
art <- c(1,1,1,0,0,0,
         0,0,0,1,1,0,
         0,0,0,0,0,1)
art.data <- matrix(art, nrow=3, ncol=6, byrow=TRUE)
art.out <- directional.response(art.data, method="overlap",relativize=NULL)

# Real data example: the Doubs River fish data (Verneaux 1973), available in ade4.
# 30 sites, 27 species. No fish had been caught at site 8; remove that site
if(require("ade4", quietly = TRUE)) {
data(doubs)
dim(doubs$fish)
fish <- doubs$fish[-8,]
dim(fish)
doubs.out <- directional.response(fish, method="gain", relativize="S")
}
Compute dissimilarity indices for ecological data matrices. The dissimilarity
indices computed by this function are those described in Legendre and De
Cáceres (2013). In the name of the function, 'ldc' stands for the authors'
names. Twelve of these 21 indices are not readily available in other R
package functions; four of them can, however, be computed in two computation
steps in vegan.
dist.ldc(Y, method = "hellinger", binary = FALSE, samp = TRUE, silent = FALSE)
Y |
Community composition data. The object class can be either
|
method |
One of the 21 dissimilarity coefficients available in the
function: |
binary |
If |
samp |
If |
silent |
If |
The dissimilarities computed by this function are the following. Indices i and k designate two rows (sites) of matrix Y, j designates a column (species). D[ik] is the dissimilarity between rows i and k. p is the number of columns (species) in Y; pp is the number of species present in one or the other site, or in both. y[i+] is the sum of values in row i; same for y[k+]. y[+j] is the sum of values in column j. y[++] is the total sum of values in Y. The indices are computed by functions written in C for greater computation speed with large data matrices.
Group 1 - D computed by transformation of Y followed by Euclidean distance
Hellinger D, D[ik] = sqrt(sum((sqrt(y[ij]/y[i+])-sqrt(y[kj]/y[k+]))^2))
chord D, D[ik] = sqrt(sum((y[ij]/sqrt(sum(y[ij]^2))-y[kj]/sqrt(sum(y[kj]^2)))^2))
log-chord D, D[ik] = chord D[ik] computed on log(y[ij]+1)-transformed data (Legendre and Borcard 2018)
chi-square D, D[ik] = sqrt(y[++] sum((1/y[+j])(y[ij]/y[i+]-y[kj]/y[k+])^2))
species profiles D, D[ik] = sqrt(sum((y[ij]/y[i+]-y[kj]/y[k+])^2))
Group 2 - Other D functions appropriate for beta diversity studies where A = sum(min(y[ij],y[kj])), B = y[i+]-A, C = y[k+]-A
percentage difference D (aka Bray-Curtis), D[ik] = (sum(abs(y[ij]-y[kj])))/(y[i+]+y[k+]) or else, D[ik] = (B+C)/(2A+B+C)
Ružička D, D[ik] = 1-(sum(min(y[ij],y[kj]))/sum(max(y[ij],y[kj]))) or else, D[ik] = (B+C)/(A+B+C)
coeff. of divergence D, D[ik] = sqrt((1/pp)sum(((y[ij]-y[kj])/(y[ij]+y[kj]))^2))
Canberra metric D, D[ik] = (1/pp)sum(abs(y[ij]-y[kj])/(y[ij]+y[kj]))
Whittaker D, D[ik] = 0.5*sum(abs(y[ij]/y[i+]-y[kj]/y[k+]))
Wishart D, D[ik] = 1-sum(y[ij]y[kj])/(sum(y[ij]^2)+sum(y[kj]^2)-sum(y[ij]y[kj]))
Kulczynski D, D[ik] = 1-0.5(sum(min(y[ij],y[kj]))/y[i+]+sum(min(y[ij],y[kj]))/y[k+])
Group 3 - Classical indices for binary data; they are appropriate for beta diversity studies. Value a is the number of species found in both i and k, b is the number of species in site i not found in k, and c is the number of species found in site k but not in i. The D matrices are square-root transformed, as in dist.binary of ade4; the user-oriented reason for this transformation is explained below.
Jaccard D, D[ik] = sqrt((b+c)/(a+b+c))
Sørensen D, D[ik] = sqrt((b+c)/(2a+b+c))
Ochiai D, D[ik] = sqrt(1 - a/sqrt((a+b)(a+c)))
Group 4 - Abundance-based indices of Chao et al. (2006) for
quantitative abundance data. These functions correct the index for species
that have not been observed due to sampling errors. For the meaning of the
U and V notations, see Chao et al. (2006, section 3). When samp=TRUE,
the abundance-based distances (ab.jaccard, ab.sorensen, ab.ochiai,
ab.simpson) are computed for sample data. If samp=FALSE, indices are
computed for true population data. Do not use indices of group 4 with
samp=TRUE on presence-absence data; the indices are not meant to
accommodate this type of data. If samp=FALSE is used with
presence-absence data, the indices are the regular Jaccard, Sørensen,
Ochiai and Simpson indices. On output, however, the D matrices are not
square-rooted, contrary to the Jaccard, Sørensen and Ochiai indices of
group 3, which are square-rooted.
abundance-based Jaccard D, D[ik] = 1-(UV/(U+V-UV))
abundance-based Sørensen D, D[ik] = 1-(2UV/(U+V))
abundance-based Ochiai D, D[ik] = 1-sqrt(UV)
abundance-based Simpson D, D[ik] = 1-(UV/(UV+min((U-UV),(V-UV))))
Group 5 - General-purpose dissimilarities that do not have an upper bound (maximum D value). They are inappropriate for beta diversity studies.
Euclidean D, D[ik] = sqrt(sum((y[ij]-y[kj])^2))
Manhattan D, D[ik] = sum(abs(y[ij] - y[kj]))
modified mean character difference, D[ik] = (1/pp) sum(abs(y[ij] - y[kj]))
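As a check on the notation of group 2, the two equivalent forms given for the percentage difference and Ružička dissimilarities can be verified numerically in base R. The two abundance vectors are made up for illustration; A, B and C follow the definitions given for group 2.

```r
# Abundances of four species at two sites (illustrative values)
y.i <- c(10, 0, 3, 5)
y.k <- c( 4, 2, 0, 5)

A <- sum(pmin(y.i, y.k))   # sum of the minimum abundances
B <- sum(y.i) - A          # y[i+] - A
C <- sum(y.k) - A          # y[k+] - A

# Percentage difference (aka Bray-Curtis): both forms
percentdiff.1 <- sum(abs(y.i - y.k)) / (sum(y.i) + sum(y.k))
percentdiff.2 <- (B + C) / (2 * A + B + C)

# Ruzicka: both forms
ruzicka.1 <- 1 - sum(pmin(y.i, y.k)) / sum(pmax(y.i, y.k))
ruzicka.2 <- (B + C) / (A + B + C)

c(percentdiff.1, percentdiff.2, ruzicka.1, ruzicka.2)
```

Here A = 9, B = 9 and C = 2, so the percentage difference is 11/29 and the Ružička dissimilarity is 11/20 under either form.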
The properties of
all dissimilarities available in this function (except Ružička D) were
described and compared in Legendre & De Cáceres (2013), who showed that
most of these dissimilarities are appropriate for beta diversity studies.
Inappropriate are the Euclidean, Manhattan, modified mean character
difference, species profile and chi-square distances. Most of these
dissimilarities have a maximum value of either 1 or sqrt(2). Three
dissimilarities (Euclidean, Manhattan, Modified mean character difference)
do not have an upper bound and are thus inappropriate for beta diversity
studies. The chi-square distance has an upper bound of
sqrt(2*(sum(Y))).
The Euclidean, Hellinger, chord, chi-square and
species profiles dissimilarities have the property of being Euclidean,
meaning that they never produce negative eigenvalues in principal
coordinate analysis. The Canberra, Whittaker, percentage difference,
Wishart and Manhattan coefficients are Euclidean when they are square-root
transformed (Legendre & De Cáceres 2013, Table 2). The distance forms (1-S)
of the Jaccard, Sørensen and Ochiai similarity (S) coefficients are
Euclidean after taking the square root of (1-S) (Legendre & Legendre 2012,
Table 7.2). The D matrices resulting from these three coefficients are
output in the form sqrt(1-S), as in function dist.binary of ade4,
because that form is Euclidean and will thus produce no negative
eigenvalues in principal coordinate analysis.
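The Euclidean property can be inspected directly by looking at the eigenvalues of the Gower-centred matrix used in principal coordinate analysis. The base-R sketch below does this for the Jaccard coefficient on a small made-up presence-absence table; `pcoa.eig` is a hypothetical helper, not a function of adespatial or ade4.

```r
# Made-up presence-absence data: 6 sites, 5 species
Y <- rbind(c(1,1,0,0,0),
           c(0,1,1,0,0),
           c(0,0,1,1,0),
           c(0,0,0,1,1),
           c(1,0,0,0,1),
           c(1,1,1,0,0))

a <- Y %*% t(Y)                       # species shared by two sites
b <- Y %*% t(1 - Y)                   # species in the row site only; t(b) plays the role of c
jac <- (b + t(b)) / (a + b + t(b))    # 1 - S for the Jaccard coefficient

# Hypothetical helper: eigenvalues of the Gower-centred matrix of -0.5 * D^2
pcoa.eig <- function(D) {
  n <- nrow(D)
  cen <- diag(n) - matrix(1 / n, n, n)
  eigen(cen %*% (-0.5 * D^2) %*% cen, symmetric = TRUE, only.values = TRUE)$values
}

min(pcoa.eig(jac))         # may be negative: (1-S) is not guaranteed to be Euclidean
min(pcoa.eig(sqrt(jac)))   # non-negative up to rounding error: sqrt(1-S) is Euclidean
```

A smallest eigenvalue at or above zero (within numerical tolerance) indicates that the dissimilarity matrix can be fully represented in Euclidean space.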
The Hellinger, chord,
chi-square and species profile dissimilarities are computed using the
two-step procedure developed by Legendre & Gallagher (2001). The data are
first transformed using either the row marginals, or the row and column
marginals in the case of the chi-square distance. The dissimilarities are
then computed from the transformed data using the Euclidean distance
formula. As a consequence, these four dissimilarities are necessarily
Euclidean. D matrices for other binary coefficients can be computed in two
ways: either by using function dist.binary of ade4, or by choosing
option binary=TRUE, which transforms the abundance data to binary
form, and using one of the quantitative indices of the present function.
Table 1 of Legendre & De Cáceres (2013) shows the incidence-based
(presence-absence-based) indices to which the various quantitative indices
reduce when they are computed on binary data.
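The two-step procedure can be sketched in base R for the Hellinger distance: the data are first transformed using the row totals, then the Euclidean distance is computed on the transformed values. This is an illustration on made-up abundances; dist.ldc itself performs the computation in C.

```r
# Abundances of four species at three sites (illustrative values)
Y <- rbind(c(10, 0, 3, 5),
           c( 4, 2, 0, 5),
           c( 0, 8, 1, 1))

# Step 1: Hellinger transformation (square root of the relative abundances)
Y.hel <- sqrt(Y / rowSums(Y))

# Step 2: Euclidean distance on the transformed data
D.two.step <- dist(Y.hel)

# Direct application of the Hellinger formula to sites 1 and 2
D.direct <- sqrt(sum((sqrt(Y[1, ] / sum(Y[1, ])) - sqrt(Y[2, ] / sum(Y[2, ])))^2))

c(two.step = as.matrix(D.two.step)[1, 2], direct = D.direct)
```

Both routes give the same value, and because the result is a Euclidean distance on transformed data, the dissimilarity matrix is necessarily Euclidean.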
The Euclidean distance computed on untransformed
presence-absence or abundance data produces non-informative and incorrect
ordinations, as shown in Legendre & Legendre (2012, p. 300) and in Legendre
& De Cáceres (2013). However, the Euclidean distance computed on
log-transformed abundance data produces meaningful ordinations in principal
coordinate analysis (PCoA). Nonetheless, it is easier to compute a PCA of
log-transformed abundance data instead of a PCoA; the resulting ordination
with scaling 1 will be meaningful. Messages are printed to the R console
indicating the Euclidean status of the computed dissimilarity matrices.
Note that for the chi-square distance, the columns that sum to zero are
eliminated before calculation of the distances, thus preventing divisions
by zero in the calculation of the chi-square transformation.
A dissimilarity matrix, with class dist.
Pierre Legendre and Naima Madi
Chao, A., R. L. Chazdon, R. K. Colwell and T. J. Shen. 2006. Abundance-based similarity indices and their estimation when there are unseen species in samples. Biometrics 62: 361-371.
Legendre, P. and D. Borcard. 2018. Box-Cox-chord transformations for community composition data prior to beta diversity analysis. Ecography 41: 1820-1824.
Legendre, P. and M. De Cáceres. 2013. Beta diversity as the variance of community data: dissimilarity coefficients and partitioning. Ecology Letters 16: 951-963.
Legendre, P. and E. D. Gallagher. 2001. Ecologically meaningful transformations for ordination of species data. Oecologia 129: 271-280.
Legendre, P. and Legendre, L. 2012. Numerical Ecology. 3rd English edition. Elsevier Science BV, Amsterdam.
if(require("vegan", quietly = TRUE)) {
data(mite)
mat1 = as.matrix(mite[1:10, 1:15])   # No column has a sum of 0
mat2 = as.matrix(mite[61:70, 1:15])  # 7 of the 15 columns have a sum of 0

# Example 1: compute Hellinger distance for mat1
D.out = dist.ldc(mat1,"hellinger")

# Example 2: compute chi-square distance for mat2
D.out = dist.ldc(mat2,"chisquare")

# Example 3: compute percentage difference dissimilarity for mat2
D.out = dist.ldc(mat2,"percentdiff")
}
The function uses two different spatially-constrained null models to test the shared space-environment fraction (SSEF, or fraction [b]) of a variation partitioning of two explanatory components.
envspace.test( spe, env, coord, MEM.spe, listw.env, MEM.autocor = c("positive", "negative", "all"), regular = FALSE, nperm = 999, MSR.method = "singleton", alpha = 0.05 )
spe |
Vector, matrix, or dataframe of response variable(s) (e.g. species abundances) |
env |
Vector, matrix, or dataframe of environmental variables (rows = sites, columns = variables) |
coord |
Matrix or dataframe of spatial coordinates of the sampled sites |
MEM.spe |
Matrix or dataframe of spatial predictors (MEM variables) selected for
|
listw.env |
An object of class |
MEM.autocor |
A string indicating the type of spatial structure of interest for
|
regular |
Logical argument indicating whether a torus-translation test will be
performed, in addition to the MSR. Set to |
nperm |
Number of permutations performed; Default is 999 |
MSR.method |
Algorithm of |
alpha |
Threshold value of null hypothesis rejection for the test of a spatial structure in the environment, and for the shared environment-space fraction of the variation partitioning; Default is 0.05 |
The function tests the SSEF (also known as fraction [b]) of a variation
partitioning of a response variable or matrix (y) between an environmental and a
spatial component (env and MEM.spe, respectively). The SSEF is the
explained variation of y shared by env and MEM.spe.
The adjusted R-squared (R2adj; Peres-Neto et al. 2006) of the SSEF is not an
actual R2, as it is computed by subtracting the R2adj of other fractions and
therefore has zero degrees of freedom (Legendre and Legendre 2012).
The SSEF can therefore not be tested in the classical way (residuals permutation;
Anderson and Legendre 1999, Legendre and Legendre 2012).
The function envspace.test provides two ways of testing this fraction, that is,
spatially-constrained null models based either on a torus-translation test (TT) (for
regular sampling designs only), or on Moran spectral randomizations (MSR) (for any type
of sampling design). The test of the SSEF should only be performed if both the global
models of y against all the environmental variables and against all spatial variables
are significant (see Bauman et al. 2018c).
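To make the "computed by subtraction" point concrete, the following base R sketch derives the fractions of a variation partitioning from three adjusted R2 values. It uses simulated data and ordinary lm fits for a single response; all object names are illustrative and this is not the code used by envspace.test or varpart.

```r
# Simulated data: one response, one environmental variable and two spatial
# predictors. All object names here are illustrative.
set.seed(42)
n <- 50
space <- cbind(s1 = sin(seq(0, 2 * pi, length.out = n)),
               s2 = cos(seq(0, 2 * pi, length.out = n)))
env <- 0.8 * space[, "s1"] + rnorm(n, sd = 0.5)   # spatially structured environment
y <- 1.5 * env + 0.5 * space[, "s2"] + rnorm(n)

# Adjusted R2 of the three models needed for the partitioning
r2adj <- function(fit) summary(fit)$adj.r.squared
r2.env   <- r2adj(lm(y ~ env))           # fractions [a + b]
r2.space <- r2adj(lm(y ~ space))         # fractions [b + c]
r2.all   <- r2adj(lm(y ~ env + space))   # fractions [a + b + c]

# Individual fractions are obtained by subtraction only
frac.a <- r2.all - r2.space              # pure environment
frac.c <- r2.all - r2.env                # pure space
frac.b <- r2.env + r2.space - r2.all     # shared fraction (SSEF)
c(a = frac.a, b = frac.b, c = frac.c)
```

Because frac.b is never the R2 of a fitted model, no residuals exist that could be permuted to test it, which is why a dedicated null model is needed.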
The function first checks whether the environment displays significant spatial structures,
and then proceeds to the test of the SSEF if this condition is fulfilled (details in
Bauman et al. 2018c).
spe can be a vector or a multicolumn matrix or dataframe (multivariate
response data). If multivariate, it is strongly advised to transform spe prior
to performing the variation partitioning and testing the SSEF (e.g., Hellinger
transformation; see Legendre and Gallagher 2001).
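As a reminder of what the Hellinger transformation does, here is a minimal base R sketch on a made-up abundance table. The helper name hellinger is hypothetical; in practice, decostand(x, method = "hellinger") from vegan is the usual route and should give the same values.

```r
# Toy abundance table (3 sites x 4 species); values are made up for illustration.
abund <- matrix(c(10, 0, 5, 5,
                   0, 8, 2, 0,
                   3, 3, 3, 3),
                nrow = 3, byrow = TRUE,
                dimnames = list(paste0("site", 1:3), paste0("sp", 1:4)))

# Hellinger transformation: square root of the relative abundances per site
# (the division recycles rowSums(x) down the columns, i.e. row by row)
hellinger <- function(x) sqrt(x / rowSums(x))

Y <- hellinger(abund)
rowSums(Y^2)   # each site has unit norm after the transformation
```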
MEM.spe is a set of spatial predictors (MEM variables). It is recommended to be
a well-defined subset of MEM variables selected among the complete set generated from
the spatial weighting matrix (SWM) (see review about spatial eigenvector selection in
Bauman et al. 2018a).
Optimising the selection of a subset of forward-selected MEM variables
among a set of candidate SWMs has been shown to increase statistical power as well as
R2-estimation accuracy (Bauman et al. 2018b). To do so, MEM.spe can be generated
using listw.candidates followed by listw.select. If a SWM has
already been selected in another way, then mem.select can be used to
generate the MEM variables and to select an optimal subset among them, which can then
be used as MEM.spe in envspace.test (see Details of function mem.select).
listw.env corresponds to the SWM that will be used to test for a spatial structure
in env, and to build the MEM variables for the MSR test.
The choice of the SWM for env can also be optimised with listw.select.
The SWMs selected for spe and env should be optimised separately to
best model the spatial structure of both spe and env (see example).
To verify that env displays a significant spatial pattern, prior to performing the
test of the SSEF, a residuals permutation test is performed on the global set of MEM
variables (generated internally from listw.env) associated with the type of
spatial structure of interest (see argument MEM.autocor). This test is performed
with mem.select. The choice of MEM.autocor should be made according to
the MEM.autocor argument used to build MEM.spe.
env is a dataset of environmental variables chosen by the user. We recommend dealing
with collinearity issues prior to performing the variation partitioning and the test of
the SSEF (see Dormann et al. 2013 for a review of methods to cope with collinearity).
regular is a logical argument indicating whether a TT test should
be performed instead of the MSR to test the SSEF. Since the TT can only
be performed on regular sampling designs, regular should only be set to
TRUE if the sampling design is either a transect, or a grid displaying the
same number of sites for all lines and columns (although the number of sites per column
can differ from the number of sites per line).
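The requirement of a complete grid follows from how a torus translation works: values are shifted across the grid and wrap around its edges, which only makes sense if every line and column is full. The base R sketch below illustrates a single translation on an arbitrary 4 x 5 grid. The helper torus_shift is hypothetical, and the test implemented in envspace.test may combine translations with other operations that are not reproduced here.

```r
# Environmental values on a regular 4 x 5 grid (rows = lines, columns = columns
# of the sampling design); the values are arbitrary.
env.grid <- matrix(1:20, nrow = 4, ncol = 5)

# Shift the grid by 'dr' rows and 'dc' columns, wrapping around the edges
torus_shift <- function(m, dr, dc) {
  rows <- ((seq_len(nrow(m)) - 1 - dr) %% nrow(m)) + 1
  cols <- ((seq_len(ncol(m)) - 1 - dc) %% ncol(m)) + 1
  m[rows, cols, drop = FALSE]
}

shifted <- torus_shift(env.grid, dr = 1, dc = 2)
shifted
```

The translated grid keeps exactly the same values and the same neighbourhood relationships (hence the spatial structure), while its association with the response is broken.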
listw.env is the SWM used by the MSR to generate spatially-constrained null
environmental variables. It should ideally be a SWM optimised on the basis of env
using the function listw.select, with the argument method = "global" (see
Details of function mem.select for an explanation).
This will allow detecting the spatial structures of env as accurately as possible,
hence allowing MSR to generate null environmental variables that are as spatially
faithful to the original ones as possible.
It is also on the basis of listw.env that MEM variables will be generated to test
whether env is spatially structured (i.e. global test) prior to performing the test of
the SSEF.
It is worth mentioning that, although a significant SSEF may provide evidence of an induced spatial dependence (Bauman et al. 2018c), a non-significant SSEF only indicates that no induced spatial dependence could be detected in relation with the chosen environmental variables. This does not exclude that this effect may exist with respect to some unmeasured variables.
If the condition of env being spatially structured is fulfilled, the test
is performed and the function returns an object of class randtest
containing the results of the test.
David Bauman and Jason Vleminckx
Anderson M. and Legendre P. (1999) An empirical comparison of permutation methods for tests of partial regression coefficients in a linear model. Journal of Statistical Computation and Simulation, 62(3), 271–303
Bauman D., Drouet T., Dray S. and Vleminckx J. (2018a) Disentangling good from bad practices in the selection of spatial or phylogenetic eigenvectors. Ecography, 41, 1–12
Bauman D., Fortin M-J, Drouet T. and Dray S. (2018b) Optimizing the choice of a spatial weighting matrix in eigenvector-based methods. Ecology
Bauman D., Vleminckx J., Hardy O., Drouet T. (2018c) Testing and interpreting the shared space-environment fraction in variation partitioning analyses of ecological data. Oikos
Blanchet G., Legendre P. and Borcard D. (2008) Forward selection of explanatory variables. Ecology, 89(9), 2623–2632
Legendre P., Gallagher E.D. (2001) Ecologically meaningful transformations for ordination of species data. Oecologia, 129(2), 271–280
Legendre P. and Legendre L. (2012) Numerical Ecology, Elsevier, Amsterdam
Peres-Neto P., Legendre P., Dray S., Borcard D. (2006) Variation partitioning of species data matrices: estimation and comparison of fractions. Ecology, 87(10), 2614–2625
Peres-Neto P. and Legendre P. (2010) Estimating and controlling for spatial structure in the study of ecological communities. Global Ecology and Biogeography, 19, 174–184
varpart, listw.select, listw.candidates, mem.select
if(require(vegan)) {
  # Illustration of the test of the SSEF on the oribatid mite data
  # (Borcard et al. 1992, 1994 for details on the dataset):
  # Community data (response matrix):
  data(mite)
  # Hellinger-transformation of the community data (Legendre and Gallagher 2001):
  Y <- decostand(mite, method = "hellinger")
  # Environmental explanatory dataset:
  data(mite.env)
  # We only use two numerical explanatory variables:
  env <- mite.env[, 1:2]
  dim(Y)
  dim(env)
  # Coordinates of the 70 sites:
  data(mite.xy)
  coord <- mite.xy

  ### Building a list of candidate spatial weighting matrices (SWMs) for the
  ### optimisation of the SWM selection, separately for 'Y' and 'env':
  # We create five candidate SWMs: a connectivity matrix based on a Gabriel graph and
  # one based on a minimum spanning tree (i.e., two contrasted graph-based SWMs), each
  # either not weighted or weighted by a linear function decreasing with the distance,
  # and a distance-based SWM corresponding to the connectivity and weighting
  # criteria of the original PCNM method:
  candidates <- listw.candidates(coord, nb = c("gab", "mst", "pcnm"),
                                 weights = c("binary", "flin"))

  ### Optimisation of the selection of a SWM:
  # SWM for 'Y' (based on the best forward-selected subset of MEM variables):
  modsel.Y <- listw.select(Y, candidates, method = "FWD", MEM.autocor = "positive",
                           p.adjust = TRUE)
  names(candidates)[modsel.Y$best.id]                  # Best SWM selected
  modsel.Y$candidates$Pvalue[modsel.Y$best.id]         # Adjusted p-value of the global model
  modsel.Y$candidates$N.var[modsel.Y$best.id]          # Nb of forward-selected MEM variables
  modsel.Y$candidates$R2Adj.select[modsel.Y$best.id]   # Adjusted R2 of the selected MEM var.

  # SWM for 'env' (method = "global" for the optimisation, as all MEM variables are
  # required to use MSR):
  modsel.env <- listw.select(env, candidates, method = "global", MEM.autocor = "positive",
                             p.adjust = TRUE)
  names(candidates)[modsel.env$best.id]                    # Best SWM selected
  modsel.env$candidates$Pvalue[modsel.env$best.id]         # Adjusted p-value of the global model
  modsel.env$candidates$N.var[modsel.env$best.id]          # Nb of forward-selected MEM variables
  modsel.env$candidates$R2Adj.select[modsel.env$best.id]   # Adjusted R2 of the selected MEM var.

  ### We perform the variation partitioning:
  # Subset of selected MEM variables within the best SWM:
  MEM.spe <- modsel.Y$best$MEM.select
  VP <- varpart(Y, env, MEM.spe)
  plot(VP)

  # Test of the shared space-environment fraction (fraction [b]):
  SSEF.test <- envspace.test(Y, env, coord, MEM.spe,
                             listw.env = candidates[[modsel.env$best.id]],
                             regular = FALSE, nperm = 999)
  SSEF.test
  # The SSEF is highly significant, indicating a potential induced spatial dependence.
}
Performs a forward selection by permutation of residuals under reduced model. Y can be multivariate.
forward.sel( Y, X, K = nrow(X) - 1, R2thresh = 0.99, adjR2thresh = 0.99, nperm = 999, R2more = 0.001, alpha = 0.05, Xscale = TRUE, Ycenter = TRUE, Yscale = FALSE, verbose = TRUE )
Y |
Response data matrix with n rows and m columns containing quantitative variables |
X |
Explanatory data matrix with n rows and p columns containing quantitative variables |
K |
Maximum number of variables to be selected. The default is one minus the number of rows |
R2thresh |
Stop the forward selection procedure if the R-square of the model exceeds the stated value. This parameter can vary from 0.001 to 1 |
adjR2thresh |
Stop the forward selection procedure if the adjusted R-square of the model exceeds the stated value. This parameter can take any value (positive or negative) smaller than 1 |
nperm |
The number of permutations to be used. The default setting is 999 permutations. |
R2more |
Stop the forward selection procedure if the difference in model R-square with the previous step is lower than R2more. The default setting is 0.001 |
alpha |
Significance level. Stop the forward selection procedure if the p-value of a variable is higher than alpha. The default is 0.05 |
Xscale |
Standardize the variables in table X to variance 1. The default setting is TRUE |
Ycenter |
Center the variables in table Y. The default setting is TRUE |
Yscale |
Standardize the variables in table Y to variance 1. The default setting is FALSE. |
verbose |
If 'TRUE' more diagnostics are printed. The default setting is TRUE |
The forward selection stops as soon as any one of the criteria K, R2thresh, adjR2thresh, alpha or R2more is reached.
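The principle of a test by permutation of residuals under the reduced model can be sketched in base R for a single candidate variable. The example below uses simulated data and a univariate response; the object names and the helper partial_F are illustrative, and the code is not taken from forward.sel.

```r
set.seed(1)
n  <- 40
x1 <- rnorm(n)                      # variable already in the model
x2 <- rnorm(n)                      # candidate variable to be tested
y  <- 1 + 0.5 * x1 + 2 * x2 + rnorm(n)

# Partial F statistic for adding x2 to a model that already contains x1
partial_F <- function(resp) {
  anova(lm(resp ~ x1), lm(resp ~ x1 + x2))$F[2]
}

F.obs <- partial_F(y)

# Reduced model (without the candidate): its residuals are permuted and
# added back to its fitted values to build responses under the null hypothesis
reduced <- lm(y ~ x1)
nperm <- 99
F.perm <- replicate(nperm, {
  y.star <- fitted(reduced) + sample(residuals(reduced))
  partial_F(y.star)
})

# Permutation p-value (the observed value is counted among the permutations)
p.value <- (sum(F.perm >= F.obs) + 1) / (nperm + 1)
p.value
```

With nperm permutations the smallest attainable p-value is 1 / (nperm + 1), which is worth keeping in mind when choosing nperm relative to alpha.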
A dataframe with:
variables |
The names of the variables |
order |
The order of the selection of the variables |
R2 |
The R2 of the variable selected |
R2Cum |
The cumulative R2 of the variables selected |
AdjR2Cum |
The cumulative adjusted R2 of the variables selected |
F |
The F statistic |
pval |
The P-value statistic |
Not yet implemented for CCA (weighted regression) and with covariables.
Stéphane Dray
Canoco manual p.49
x <- matrix(rnorm(30), 10, 3)
y <- matrix(rnorm(50), 10, 5)
forward.sel(y, x, nperm = 99, alpha = 0.5)
If Y is univariate, this function implements forward selection (FS) in regression. If Y is multivariate, this function implements FS using the F-test described by Miller and Farr (1971). This test requires that (i) the Y variables be standardized, and (ii) the error in the response variables be normally distributed (to be verified by the user).
forward.sel.par( Y, X, alpha = 0.05, K = nrow(X) - 1, R2thresh = 0.99, R2more = 0.001, adjR2thresh = 0.99, Yscale = FALSE, verbose = TRUE )
Y |
Response data matrix with n rows and m columns containing quantitative variables |
X |
Explanatory data matrix with n rows and p columns containing quantitative variables |
alpha |
Significance level. Stop the forward selection procedure if the p-value of a variable is higher than alpha. The default is 0.05 |
K |
Maximum number of variables to be selected. The default is one minus the number of rows |
R2thresh |
Stop the forward selection procedure if the R-square of the model exceeds the stated value. This parameter can vary from 0.001 to 1 |
R2more |
Stop the forward selection procedure if the difference in model R-square with the previous step is lower than R2more. The default setting is 0.001 |
adjR2thresh |
Stop the forward selection procedure if the adjusted R-square of the model exceeds the stated value. This parameter can take any value (positive or negative) smaller than 1 |
Yscale |
Standardize the variables in table Y to variance 1. The default setting is FALSE. The setting is automatically changed to TRUE if Y contains more than one variable. This is a validity condition for the parametric test of significance (Miller and Farr 1971) |
verbose |
If 'TRUE' more diagnostics are printed. The default setting is TRUE |
The forward selection stops as soon as any one of the criteria K, R2thresh, adjR2thresh, alpha or R2more is reached.
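How the stopping criteria interact can be illustrated with a simplified forward selection for a univariate response, using the parametric partial F-test from lm. The function forward_sketch is hypothetical: its argument names mirror those documented above, but it is not the code of forward.sel.par, and the choice to reject the variable that triggers a stopping rule is an assumption of this sketch.

```r
# Simplified forward selection for a single response variable (illustration only).
forward_sketch <- function(y, X, alpha = 0.05, K = nrow(X) - 1,
                           R2thresh = 0.99, R2more = 0.001, adjR2thresh = 0.99) {
  dat <- data.frame(y = y, X)
  # Fit y against an intercept plus the named variables
  fit_with <- function(vars) {
    lm(reformulate(c("1", vars), response = "y"), data = dat)
  }
  selected <- character(0)
  R2.prev <- 0
  repeat {
    remaining <- setdiff(colnames(X), selected)
    if (length(remaining) == 0 || length(selected) >= K) break
    # R2 of each candidate model (current selection + one more variable)
    R2.cand <- sapply(remaining, function(v) {
      summary(fit_with(c(selected, v)))$r.squared
    })
    best <- names(which.max(R2.cand))
    fit.new <- fit_with(c(selected, best))
    p.val <- anova(fit_with(selected), fit.new)[2, "Pr(>F)"]
    R2.new <- summary(fit.new)$r.squared
    adjR2.new <- summary(fit.new)$adj.r.squared
    # Stopping rules: any one of them ends the procedure
    # (assumption: the variable that triggers a rule is not retained)
    if (p.val > alpha || R2.new - R2.prev < R2more ||
        R2.new > R2thresh || adjR2.new > adjR2thresh) break
    selected <- c(selected, best)
    R2.prev <- R2.new
  }
  selected
}

# Simulated example: only x1 truly influences y
set.seed(123)
n <- 60
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- 2 * X$x1 + rnorm(n)
sel <- forward_sketch(y, X)
sel
```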
A dataframe with:
variables |
The names of the variables |
order |
The order of the selection of the variables |
R2 |
The R2 of the variable selected |
R2Cum |
The cumulative R2 of the variables selected |
AdjR2Cum |
The cumulative adjusted R2 of the variables selected |
F |
The F statistic |
pval |
The P-value statistic |
Pierre Legendre and Guillaume Blanchet
Miller, J. K. & S. D. Farr. 1971. Bimultivariate redundancy: a
comprehensive measure of interbattery relationship. Multivariate
Behavioral Research, 6, 313–324.
x <- matrix(rnorm(30), 10, 3)
y <- matrix(rnorm(50), 10, 5)
forward.sel.par(y, x, alpha = 0.5)
This function is used to select a truncation value for the dbMEM approach. It returns the minimum distance that keeps all samples connected.
give.thresh(matdist)
matdist |
A distance matrix (class |
The maximum distance in the minimum spanning tree.
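The returned quantity can be reproduced on a small example with a base R version of Prim's algorithm. The helper largest_mst_edge is hypothetical and written only for this illustration; it is not how give.thresh is implemented.

```r
# Largest edge of the minimum spanning tree, computed with Prim's algorithm.
largest_mst_edge <- function(d) {
  m <- as.matrix(d)
  n <- nrow(m)
  in.tree <- c(TRUE, rep(FALSE, n - 1))   # start the tree from the first site
  edges <- numeric(0)
  while (!all(in.tree)) {
    # For each site outside the tree, shortest link to a site inside it
    cross <- m[in.tree, !in.tree, drop = FALSE]
    col.min <- apply(cross, 2, min)
    # Add the closest outside site and record the length of the new edge
    new.site <- which(!in.tree)[which.min(col.min)]
    edges <- c(edges, min(col.min))
    in.tree[new.site] <- TRUE
  }
  max(edges)
}

# Four sites on a line at positions 0, 1, 3 and 7: the MST edges are 1, 2 and 4
xy <- cbind(c(0, 1, 3, 7), 0)
largest_mst_edge(dist(xy))
```

Linking every pair of sites whose distance does not exceed this value yields a connected graph, whereas any smaller threshold would isolate at least one site or group of sites.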
Stéphane Dray
xy <- matrix(rnorm(60), 30, 2)
dxy <- dist(xy)
th <- give.thresh(dxy)
These two Monte Carlo tests are used to assess the existence of 'global' and
'local' spatial structures, corresponding respectively to positive and
negative Moran's I.
global.rtest(X, listw, k = 1, nperm = 499)
X |
a data matrix, with variables in columns |
listw |
a list of weights of class |
k |
integer: the number of highest |
nperm |
integer: the number of randomisations to be performed. |
They rely on the decomposition of a data matrix X into global and local
components using multiple regression on Moran's Eigenvector Maps (MEMs). They
require a data matrix (X) and a list of weights derived from a connection
network. X is regressed onto global MEMs (U+) in the global test and on local
ones (U-) in the local test. One mean R2 is obtained for each
MEM, the k highest being summed to form the test statistic.
The reference distributions of these statistics are obtained by randomly permuting the rows of X.
These tests were originally part of the adegenet package for R.
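The logic of the global test can be sketched in base R on a simulated transect. The sketch assumes a binary neighbour matrix, MEMs obtained as eigenvectors of the doubly centred weighting matrix, and a mean R2 taken over the variables of X for each MEM. All names are illustrative and the code is not the implementation of global.rtest.

```r
set.seed(10)
n <- 30
# Binary connectivity of a transect: each site is linked to its neighbours
W <- matrix(0, n, n)
W[abs(row(W) - col(W)) == 1] <- 1

# MEMs: eigenvectors of the doubly centred weighting matrix;
# positive eigenvalues correspond to positive Moran's I (global structures)
C <- diag(n) - matrix(1 / n, n, n)
eig <- eigen(C %*% W %*% C, symmetric = TRUE)
U.plus <- eig$vectors[, eig$values > 1e-8, drop = FALSE]

# Data: 3 variables, the first one following a smooth gradient
X <- cbind(seq_len(n) / n + rnorm(n, sd = 0.1), rnorm(n), rnorm(n))

# Test statistic: mean R2 (over variables) for each MEM, summing the k highest
stat <- function(X, U, k = 1) {
  r2 <- cor(X, U)^2               # R2 of each variable on each single MEM
  sum(sort(colMeans(r2), decreasing = TRUE)[seq_len(k)])
}

obs <- stat(X, U.plus)

# Reference distribution: the rows of X are randomly permuted
nperm <- 199
sim <- replicate(nperm, stat(X[sample(n), , drop = FALSE], U.plus))
p.value <- (sum(sim >= obs) + 1) / (nperm + 1)
c(obs = obs, p.value = p.value)
```

The local test follows the same scheme with the eigenvectors associated with negative eigenvalues (U-).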
An object of class randtest.
Thibaut Jombart
Jombart, T., Devillard, S., Dufour, A.-B. and Pontier, D. 2008. Revealing cryptic spatial patterns in genetic variability by a new multivariate method. Heredity, 101, 92–103. doi: 10.1038/hdy.2008.34.
# wait for a generic dataset
Compute LCBD indices (Legendre & De Cáceres 2013) from a symmetric dissimilarity matrix (D) or from a beta component matrix (Repl, RichDiff or AbDiff, or Nes) (Legendre 2014).
LCBD.comp(D, sqrt.D = TRUE, save.D = FALSE)
D |
A dissimilarity or beta diversity component matrix, class
|
sqrt.D |
Take the square root of the dissimilarities in matrix D before computing the LCBD indices. |
save.D |
If |
Use sqrt.D = TRUE when computing LCBD indices for most of the
replacement and richness/abundance difference indices computed by function
beta.div.comp, as well as for the corresponding D matrices. See
Table S1.4 in Appendix S1 of Legendre (2014) to identify the matrices that
are Euclidean without taking the square root of the individual values. Only
the RichDiffS (for presence-absence data) and AbDiff (for
abundance data) of the Sørensen group in the Podani family have that
property. In all other cases, use sqrt.D = TRUE.
When computing LCBD from a D matrix, use sqrt.D = TRUE if the D matrix
is not Euclidean. The Euclidean property can be checked with function
is.euclid of ade4.
BDtotal statistics are comparable among data sets having the same or different numbers of sampling units (n), provided that the sampling units are of the same size or represent the same sampling effort and that BDtotal is computed with the same D index.
Function LCBD.comp produces the same (SStotal, BDtotal, LCBD)
results as function beta.div. Note, however, that the latter
produces other interesting results (p.LCBD, SCBD). Function
LCBD.comp should then only be used to compute LCBD indices from
dissimilarity matrices that cannot be computed by function beta.div,
e.g. genetic D matrices, or from replacement and richness difference
matrices produced by function beta.div.comp. Significance of the
LCBD indices cannot be tested when their calculation starts from a D matrix
because the testing procedure involves permutation of the columns of raw
data.
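The computation of LCBD indices from a dissimilarity matrix can be sketched in base R through Gower-centring of the squared dissimilarities, following the general approach described by Legendre and De Cáceres (2013). The helper lcbd_sketch is hypothetical and only meant to show the arithmetic; it is not the code of LCBD.comp.

```r
# LCBD from a dissimilarity matrix via Gower-centring (illustration only).
lcbd_sketch <- function(D, sqrt.D = FALSE) {
  D <- as.matrix(D)
  if (sqrt.D) D <- sqrt(D)
  n <- nrow(D)
  A <- -0.5 * D^2
  # Gower-centred matrix: its diagonal holds the squared distances
  # of the sites to the multivariate centroid
  C <- diag(n) - matrix(1 / n, n, n)
  G <- C %*% A %*% C
  SStotal <- sum(diag(G))
  list(beta = c(SStotal = SStotal, BDtotal = SStotal / (n - 1)),
       LCBD = diag(G) / SStotal)
}

# Small made-up data table; the Euclidean distance needs no square root
X <- matrix(c(1, 0, 2,
              0, 3, 1,
              4, 1, 0,
              2, 2, 2), nrow = 4, byrow = TRUE)
res <- lcbd_sketch(dist(X))
res$beta
res$LCBD   # relative contributions of the sites; they sum to 1
```

For a Euclidean D computed from a data table, SStotal equals the total sum of squared deviations from the column means, which gives a quick way to check the result.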
A list containing the following results:
beta: Total sum of squares and total beta diversity [= Var(Y)] of the data matrix.
LCBD: Vector of local contributions to beta diversity (LCBD) for the sites.
D: The input dissimilarity matrix, class dist; only if save.D=TRUE.
Pierre Legendre
Legendre, P. 2014. Interpreting the replacement and richness difference components of beta diversity. Global Ecology and Biogeography 23: 1324-1334.
Legendre, P. & M. De Cáceres. 2013. Beta diversity as the variance of community data: dissimilarity coefficients and partitioning. Ecology Letters 16: 951-963.
### Example 1
### Compute the Hellinger distance, then the LCBD indices.
if(require("vegan", quietly = TRUE)){
  data(mite)
  mite.hel = decostand(mite, "hellinger")
  mite.D = dist(mite.hel)
  out.mite.D = LCBD.comp(mite.D, sqrt.D=FALSE)
}

### Example 2
if(require("ade4", quietly = TRUE) && require("adegraphics", quietly = TRUE)){
  data(doubs)
  fish.sp = doubs$fish[-8,]   # Fish data; site 8 is removed because no fish were caught

  out.comp = beta.div.comp(fish.sp, coef="S", quant=TRUE)

  out.fish.D = LCBD.comp(out.comp$D, sqrt.D=TRUE)           # out.comp$D is not Euclidean
  out.fish.D$beta
  out.fish.Repl = LCBD.comp(out.comp$repl, sqrt.D=TRUE)     # out.comp$repl is not Euclidean
  out.fish.Repl$beta
  out.fish.AbDiff = LCBD.comp(out.comp$rich, sqrt.D=FALSE)  # out.comp$rich is Euclidean
  out.fish.AbDiff$beta

  ### Plot maps of the LCBD indices
  fish.xy = doubs$xy[-8,]   # Geographic coordinates; site 8 removed because no fish were caught

  # Map of LCBD indices for %difference dissimilarity
  s.value(fish.xy, out.fish.D$LCBD, method="size", symbol = "circle",
          col = c("white", "brown"), main = "Doubs fish LCBD, %difference D")

  # Map of LCBD indices for replacement component of %difference dissimilarity
  s.value(fish.xy, out.fish.Repl$LCBD, method="size", symbol = "circle",
          col = c("white", "brown"), main = "Doubs fish replacement LCBD")

  # Map of LCBD indices for abundance difference component of %difference dissimilarity
  s.value(fish.xy, out.fish.AbDiff$LCBD, method="size", symbol = "circle",
          col = c("white", "brown"), main = "Doubs fish abundance diff. LCBD")
}

if(require("ade4", quietly = TRUE) && require("betapart", quietly = TRUE)){
  ### Example 3
  ### This example requires packages "betapart" and "ade4" for data.
  ### For the Baselga-family indices, the same partitioning results are obtained using
  ### (1) beta.div.comp or (2) beta.pair.abund() of "betapart" and LCBD.comp()
  data(doubs)                 # Data available in "ade4"
  fish.sp = doubs$fish[-8,]   # Fish data; site 8 is removed because no fish were caught
  # We use abundance data in this example, not presence-absence data

  # Partition into Baselga-family replacement and nestedness components
  # using beta.div.comp with the percentage difference index (aka Bray-Curtis)
  out.comp = beta.div.comp(fish.sp, coef="BS", quant=TRUE)
  out.comp$part

  # Compute the D and component matrices using beta.pair.abund
  out3 = beta.pair.abund(fish.sp, index.family = "bray")
  summary(out3)

  is.euclid(out3$beta.bray)   # D matrix out3$beta.bray is not Euclidean
  out3.D = LCBD.comp(out3$beta.bray, sqrt.D=TRUE)
  out3.D$beta        # Compare BDtotal here to BDtotal in out.comp$part (above)

  out3.Repl = LCBD.comp(out3$beta.bray.bal, sqrt.D=TRUE)
  out3.Repl$beta     # Compare BDtotal here to Repl in out.comp$part (above)

  out3.AbDiff = LCBD.comp(out3$beta.bray.gra, sqrt.D=TRUE)
  out3.AbDiff$beta   # Compare BDtotal here to RichDiff/Nes in out.comp$part (above)
}
This function is a user-friendly way to create a list of one or several spatial weighting matrices (SWM) by selecting a set of predefined connectivity and weighting matrices (B and A matrices, respectively).
listw.candidates( coord, style = "B", nb = c("del", "gab", "rel", "mst", "pcnm", "dnear"), d1 = 0, d2, weights = c("binary", "flin", "fup", "fdown"), y_fdown = 5, y_fup = 0.5 )
coord |
Vector, matrix, or dataframe of point coordinates |
style |
Coding scheme style (see |
nb |
Defines how the B matrix (connectivity) is built:
|
d1 |
Only considered if |
d2 |
Only considered if |
weights |
Defines how the A matrix (weights) is built:
|
y_fdown |
Single value or vector of values of the |
y_fup |
Single value or vector of values of the |
The function allows constructing SWMs based on any combination
of B and A matrices. The B matrices are either graph-based or
distance-based. The function proposes the Delaunay triangulation, Gabriel
graph, relative neighbourhood graph, and the minimum spanning tree criteria
to build a graph-based B matrix. Distance-based SWMs can be built
with the principal coordinates of neighbour matrices (PCNM; Borcard and
Legendre 2002) criteria (see details below), or using another threshold
distance to define the connected site pairs. The A matrix can be based on a
binary, linear, concave-down, or concave-up function. The linear,
concave-down, and concave-up weighting functions are defined by $1 - D/dmax$,
$1 - (D/dmax)^y$, and $1/D^y$, respectively, where D is the Euclidean distance
between the two sites considered, dmax is the maximum Euclidean distance between
two sites, and y is a user-defined parameter that can either be a single value or a
vector of values. The choice nb = "pcnm" consists in constructing a
distance-based SWM based on the largest edge of the minimum spanning
tree as a connectivity distance threshold, and then in weighting the links
by the function $1 - (D/(4t))^2$, where D is the Euclidean
distance between the sites, and t is the distance threshold below
which two sites are considered connected (Dray et al. 2006). As optimizing
the choice of a SWM has to be done with a p-value correction depending
on the number of candidate SWMs tested (see function
listw.select), Bauman et al. (2018) strongly encouraged plotting the
concave-down and concave-up weighting functions with several parameter
values in order to only choose the realistic ones to build the candidate W
matrices (e.g., ranging between 0.1 and 1 for the concave-up function, as
values over 1 would make no ecological sense). First visualizing the
connectivity schemes with the listw.explore function may also help
choosing the B matrices to select for the listw.candidates function.
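To compare candidate values of y before building the SWMs, the weighting functions can be tabulated (or plotted) in base R. The forms used below, 1 - D/dmax, 1 - (D/dmax)^y and 1/D^y, are assumptions of this sketch, chosen to follow the forms commonly given for these functions (Dray et al. 2006); the helper names f_lin, f_down and f_up are hypothetical.

```r
# Weighting functions, with D the distance between two sites, dmax the largest
# distance and y a user-chosen exponent (forms assumed for this illustration).
f_lin  <- function(D, dmax)    1 - D / dmax        # linear
f_down <- function(D, dmax, y) 1 - (D / dmax)^y    # concave-down
f_up   <- function(D, y)       1 / D^y             # concave-up

D <- seq(1, 10, by = 1)
dmax <- max(D)

# Weights obtained for a few candidate values of y
weights <- data.frame(
  D = D,
  linear = f_lin(D, dmax),
  down_y2 = f_down(D, dmax, y = 2),
  down_y5 = f_down(D, dmax, y = 5),
  up_y0.2 = f_up(D, y = 0.2),
  up_y1 = f_up(D, y = 1)
)
round(weights, 3)
```

Passing the weight columns to matplot(weights$D, weights[, -1], type = "l") gives a quick visual check of how fast each function decays with distance.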
Spatial eigenvectors can be generated from any candidate SWM obtained by
listw.candidates using scores.listw, or can be generated
and tested (recommended option for real data analysis) using
mem.select. If several SWMs were created, the selection of an
optimized SWM can be made using listw.select.
A list of SWMs. Each element of the list was built by
nb2listw (package spdep) and therefore is of class
listw and nb. The name of each element of the list (SWM)
is composed of the corresponding B and A matrices, followed (if any) by the
y parameter value of the weighting function.
David Bauman and Stéphane Dray
Bauman D., Fortin M-J., Drouet T. and Dray S. (2018) Optimizing the choice of a spatial weighting matrix in eigenvector-based methods. Ecology
Borcard D. and Legendre P. (2002) All-scale spatial analysis of ecological data by means of principal coordinates of neighbour matrices. Ecological Modelling, 153, 51–68
Dray S., Legendre P. and Peres-Neto P. R. (2006) Spatial modeling: a comprehensive framework for principal coordinate analysis of neighbor matrices (PCNM). Ecological Modelling, 196, 483–493
listw.explore, scores.listw, mem.select, listw.select
### Create 100 random sampling locations in a square grid of 120 x 120:
xy <- matrix(nrow = 100, ncol = 2)
xy[, 1] <- sample(c(1:120), 100, replace = FALSE)
xy[, 2] <- sample(c(1:120), 100, replace = FALSE)

### The function listw.candidates is used to build the spatial weighting matrices that
### we want to test and compare (with the listw.select function). We test a Gabriel
### graph, a minimum spanning tree, and a distance-based connectivity defined by a
### threshold distance corresponding to the smallest distance keeping all sites
### connected (i.e., the default value of d2). These connectivity matrices are then
### either not weighted (binary weighting), or weighted by the linearly decreasing
### function:
candidates <- listw.candidates(coord = xy, nb = c("gab", "mst", "dnear"),
                               weights = c("binary", "flin"))
names(candidates)
plot(candidates[[1]], xy)
plot(candidates[[3]], xy)

### Construction of a different list of spatial weighting matrices. This time, the
### connections are defined by a distance-based criterion based on the same threshold
### value, but the connections are weighted by the concave-down function with a y
### parameter varying between 1 and 5, and a concave-up function with a y parameter
### of 0.2.
candidates2 <- listw.candidates(coord = xy, nb = "dnear", weights = c("fdown", "fup"),
                                y_fdown = 1:5, y_fup = 0.2)

### Number of spatial weighting matrices generated:
length(candidates2)

### A single SWM can also easily be generated with listw.candidates:
lw <- listw.candidates(xy, nb = "gab", weights = "bin")
plot(lw[[1]], xy)

### Generating MEM variables from an object of listw.candidates with scores.listw:
MEM <- scores.listw(lw[[1]])

### See functions mem.select and listw.select for examples of how to use an object
### created by listw.candidates with these functions.
Interactive tool to generate R code that creates a spatial weighting matrix
listw.explore()
No return value
Stéphane Dray
if(interactive()){
  ## a matrix or an object of class 'Spatial*' should be in the global environment
  xy <- matrix(rnorm(50), 25)
  listw.explore()
}
listw.select computes MEM variables (i.e., eigenvectors of a doubly centered spatial weighting matrix) for various definitions of spatial weighting matrices (SWM) and optimizes the selection of the SWM and of a subset of MEM variables. The optimization is done by maximizing the adjusted R-squared (R2) or by minimizing the residual spatial autocorrelation. The function controls the type I error rate by accounting for the number of tests performed. This function combines calls to the functions scores.listw and mem.select. The list of candidate SWMs can easily be generated using listw.candidates.
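To make the phrase "eigenvectors of a doubly centered spatial weighting matrix" concrete, here is a minimal base-R sketch for a toy chain of five sites. The matrix `W` and all object names are illustrative assumptions; in practice scores.listw performs this computation, and its scaling and handling of special cases are not reproduced here.

```r
# Toy binary SWM for 5 sites on a line (neighbours = adjacent sites); illustrative only
n <- 5
W <- matrix(0, n, n)
W[cbind(1:(n - 1), 2:n)] <- 1
W <- W + t(W)                      # make the matrix symmetric

# Double centering: H W H, with H = I - 11'/n
H <- diag(n) - matrix(1 / n, n, n)
W_centered <- H %*% W %*% H

# Eigen-decomposition of the symmetric doubly centered matrix
eig <- eigen(W_centered, symmetric = TRUE)

# Drop the (near-)zero eigenvalue associated with the constant vector
keep    <- abs(eig$values) > 1e-8
values  <- eig$values[keep]
vectors <- eig$vectors[, keep, drop = FALSE]

# Eigenvectors with positive eigenvalues describe positive spatial autocorrelation
mem_positive <- vectors[, values > 0, drop = FALSE]

ncol(vectors)                  # number of eigenvectors kept
round(colSums(vectors), 10)    # each kept eigenvector sums to zero (centred)
```

Splitting the eigenvectors by the sign of their eigenvalue mirrors the "positive" and "negative" options of MEM.autocor.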
listw.select( x, candidates, MEM.autocor = c("positive", "negative", "all"), method = c("FWD", "MIR", "global"), MEM.all = FALSE, nperm = 999, nperm.global = 9999, alpha = 0.05, p.adjust = TRUE, verbose = FALSE )
x |
Vector, matrix, or dataframe of the response variable(s) |
candidates |
A list of SWMs of the class |
MEM.autocor |
Sign of the spatial eigenvectors to generate; |
method |
Criterion to select the best subset of MEM variables. Either
|
MEM.all |
A logical indicating if the complete set of MEM variables for the best model should be returned |
nperm |
Number of permutations to perform the tests in the selection procedure; Default is 999 |
nperm.global |
Number of permutations to perform the tests in the global test; Default is 9999 |
alpha |
Significance threshold value for the tests; Default is 0.05 |
p.adjust |
A logical indicating whether the p-value of the global test performed on each SWM
should be corrected for multiple tests (TRUE) or not (FALSE); default is
|
verbose |
If 'TRUE' more diagnostics are printed. The default setting is FALSE |
While the selection of the SWM is the most critical step of the spatial eigenvector-based methods (Dray et al. 2006), Bauman et al. (2018) showed that optimizing the choice of the SWM led to inflated type I error rates if an explicit control of the number of SWMs tested was not applied. The function listw.select therefore applies a Sidak correction (Sidak 1967) for multiple tests to the p-value of the global test of each SWM (i.e., the model integrating the whole set of spatial predictors). The Sidak correction is computed as: $P_{corrected} = 1 - (1 - P)^n$, where $n$ is the number of tests performed, $P$ is the observed p-value, and $P_{corrected}$ is the new p-value after the correction. The p-value is first computed using nperm permutations and then corrected according to the total number of SWMs tested (if p.adjust = TRUE). Although the function can be run without this correction, using the default value is strongly recommended to avoid inflated type I error rates (Bauman et al. 2018).
As a consequence of the p-value correction, the significance threshold decreases as the number of SWMs increases, hence leading to a trade-off between the gain of accuracy and the power loss.
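The trade-off described above can be illustrated with a few lines of base R. This is only a sketch: the helper `sidak()` and the numeric values are hypothetical and are not part of the package.

```r
# Sidak correction of a p-value for n_tests tests: 1 - (1 - p)^n_tests
sidak <- function(p, n_tests) 1 - (1 - p)^n_tests

p_obs <- 0.01   # hypothetical observed p-value of one global test
n_swm <- 6      # hypothetical number of candidate SWMs

p_corr <- sidak(p_obs, n_swm)
p_corr          # corrected p-value
p_corr < 0.05   # is the SWM still significant at alpha = 0.05?

# Equivalent view: the per-test significance threshold shrinks
# as the number of candidate SWMs grows
alpha <- 0.05
n_candidates <- c(1, 2, 6, 20)
thresholds <- 1 - (1 - alpha)^(1 / n_candidates)
round(thresholds, 4)
```

Comparing the uncorrected p-value with the shrunken threshold is equivalent to comparing the corrected p-value with `alpha`, which is why adding many candidate SWMs costs statistical power.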
The optimization criterion of the SWM performed by listw.select is either based on the maximization of the significant adjusted R2 of all the generated spatial eigenvectors (also referred to as spatial predictors or MEM variables) (method = "global"), or is based on an optimized subset of eigenvectors (method = "FWD" and "MIR").
If the objective is only to optimize the selection of the SWM, without the intervention of the selection of a subset of predictors within each SWM (method = "global"), then the best SWM is the one maximizing the significant adjusted global R2, that is, the R2 of the model of x against the whole set of generated MEM variables, which must be significant for the global test.
The optimization of the SWM depends on the chosen method. See mem.select for a description of the situations in which method = "FWD", "MIR", and "global" should be preferred.
If a subset of MEM variables is needed, then the optimization of the subset of spatial predictors guides the optimization of the selection of SWM (method = "FWD" or "MIR").
If method = "FWD", listw.select performs the forward selection on the significant SWMs and selects among these the SWM for which the forward-selected subset of spatial eigenvectors yields the highest adjusted R2. If method = "MIR", listw.select performs the MIR selection on all the significant candidate SWMs, and selects the best SWM as the one with the smallest number of MIR-selected spatial eigenvectors. If two or more SWMs present the same smallest number of predictors, then the selection is made among them on the basis of the residual Moran's I.
If MEM.autocor = "all", the optimization criteria described above are applied on the sum of the adjusted R2 or number of selected spatial eigenvectors, for method = "FWD" and "MIR", respectively.
If no subset of MEM variables is required, then the optimization of the SWM is based on the maximization of the adjusted R2 of all the generated MEM variables (method = "global").
If MEM.autocor = "all", n-1 MEM variables are generated. In this case, if method = "global" or method = "FWD", the adjusted R2 is computed separately on the MEM associated with positive and negative eigenvalues (hereafter positive and negative MEM variables, respectively), and the SWM yielding the highest sum of the two significant R2 values is selected. If method = "MIR", the MIR selection is performed separately on the positive and negative MEM variables, and the SWM is selected based on the sum of the number of positive and negative spatial predictors.
listw.select returns a list that contains:
candidates |
A data.frame that summarizes the results on all SWMs |
best.id |
The index and name of the best SWM |
best |
The results for the best SWM as returned by mem.select |
David Bauman and Stéphane Dray
Bauman D., Fortin M-J, Drouet T. and Dray S. (2018) Optimizing the choice of a spatial weighting matrix in eigenvector-based methods. Ecology, 99, 2159-2166
Blanchet G., Legendre P. and Borcard D. (2008) Forward selection of explanatory variables. Ecology, 89(9), 2623–2632
Dray S., Legendre P. and Peres-Neto P. R. (2006) Spatial modeling: a comprehensive framework for principal coordinate analysis of neighbor matrices (PCNM). Ecological Modelling, 196, 483–493
Sidak Z. (1967) Rectangular confidence regions for the means of multivariate normal distributions. Journal of the American Statistical Association, 62(318), 626–633
listw.candidates, mem.select, scores.listw
if(require(spdep)) {
  ### Create a grid of 15 x 15:
  grid <- expand.grid(x = seq(1, 15, 1), y = seq(1, 15, 1))

  ### Generate a response variable Y structured at broad scale by linear combination of
  ### the first three MEM variables to which a normal noise is added:
  nb <- cell2nb(nrow = 15, ncol = 15, "queen")
  lw <- nb2listw(nb, style = "B")
  MEM <- scores.listw(lw, MEM.autocor = "positive")
  # Degree of spatial autocorrelation:
  intensity <- 0.8
  Y_space <- scale(MEM[, 1] + MEM[, 2] + MEM[, 3]) * intensity
  Y_noise <- scale(rnorm(n = nrow(MEM), mean = 0, sd = 1)) * (1 - intensity)
  Y <- Y_space + Y_noise

  ### Y is sampled in 100 randomly-chosen sites of the grid:
  idx.sample <- sample(c(1:nrow(grid)), 100, replace = FALSE)
  xy <- grid[idx.sample, ]
  Y_sampled <- Y[idx.sample]

  ### The function listw.candidates is used to build the spatial weighting matrices that
  ### we want to test and compare (with the listw.select function). We test a Gabriel's graph
  ### and a minimum spanning tree. These connectivity matrices are then either not weighted
  ### (binary weighting), or weighted by the linearly decreasing function (see help of the
  ### function listw.candidates):
  candidates <- listw.candidates(coord = xy, nb = c("gab", "mst"),
                                 weights = c("binary", "flin"))

  ### Number of candidate W matrices generated:
  nbw <- length(candidates)

  ### Significance threshold value after p-value correction (Sidak correction):
  1 - (1 - 0.05)^(1/nbw)

  ### Optimization of the selection of the SWM among the candidates generated above,
  ### using the corrected significance threshold calculated above for the global tests:
  W_sel <- listw.select(Y_sampled, candidates, MEM.autocor = "positive", method = "FWD",
                        p.adjust = TRUE, nperm = 299)

  ### Some characteristics of the best spatial model:
  # Best SWM:
  W_sel$best.id
  # Selected subset of spatial predictors within the best SWM:
  W_sel$best$MEM.select
  nrow(W_sel$best$summary)
  # Corrected p-value of the global test of the best SWM:
  W_sel$best$global.test$Pvalue
  # Adjusted R2 of the subset of spatial predictors selected within the chosen SWM:
  max(W_sel$best$summary$R2Adj)
  # p-values of all the tested W matrices:
  W_sel$candidates$Pvalue
  # Adjusted R2 of the subset of spatial predictors selected for all the significant
  # W matrices:
  W_sel$candidates$R2Adj.select

  # See Appendix S3 of Bauman et al. 2018 for more extensive examples and illustrations.
}
A list containing the XY coordinates of the lakes and a site-by-edge matrix describing how 42 lakes influence each other. The influence is defined by 66 edges.
data(mastigouche)
A list that includes the centred coordinates of 42 lakes in the Mastigouche reserve and a site-by-edge matrix describing how the 42 lakes influence each other. The influence is defined by 66 edges.
Magnan P., Rodriguez M.A., Legendre P. and Lacasse S. (1994) Dietary variation in a freshwater fish species: relative contribution of biotic interactions, abiotic factors, and spatial structure. Canadian Journal of Fisheries and Aquatic Sciences, 51, 2856-2865
Blanchet F.G., Legendre P. and Borcard D. (2008) Modelling directional spatial processes in ecological data. Ecological Modelling, 215, 325-336
mem.select computes the spatial eigenvectors (MEM) of the spatial weighting matrix (SWM) provided (listw) and optimizes the selection of a subset of MEM variables relative to response variable(s) stored in x. The optimization is done either by maximizing the adjusted R-squared (R2) of all (method = "global") or a subset (method = "FWD") of MEM variables or by minimizing the residual spatial autocorrelation (method = "MIR") (see details in Bauman et al. 2018a).
mem.select( x, listw, MEM.autocor = c("positive", "negative", "all"), method = c("FWD", "MIR", "global"), MEM.all = FALSE, nperm = 999, nperm.global = 9999, alpha = 0.05, verbose = FALSE, ... )
x |
A vector, matrix, or dataframe of response variable(s). The
|
listw |
A spatial weighting matrix of class |
MEM.autocor |
Sign of the spatial eigenvectors to generate; "positive", "negative", or "all", for positively, negatively autocorrelated eigenvectors, or both, respectively; default is "positive" |
method |
Criterion to select the best subset of MEM variables. Either
|
MEM.all |
A logical indicating if the complete set of MEM variables should be returned |
nperm |
Number of permutations to perform the tests in the selection procedure; Default is 999 |
nperm.global |
Number of permutations to perform the tests in the global test; Default is 9999 |
alpha |
Significance threshold value for the tests; Default is 0.05 |
verbose |
If 'TRUE' more diagnostics are printed. The default setting is FALSE |
... |
Other parameters (for internal use with |
The function provides three different methods to select a subset of MEM variables. For all methods, a global test is performed first. If MEM.autocor = "all", two global tests are performed and p-values are corrected for multiple comparisons (Sidak correction).
If the MEM variables are to be further used in a model including actual predictors (e.g. environmental), then a subset of spatial eigenvectors needs to be selected before proceeding to further analyses to avoid model overfitting and/or a loss of statistical power to detect the contribution of the environment to the variability of the response data (Griffith 2003, Dray et al. 2006, Blanchet et al. 2008, Peres-Neto and Legendre 2010, Diniz-Filho et al. 2012). Although several eigenvector selection approaches have been proposed to select a best subset of eigenvectors, Bauman et al. (2018b) showed that two main procedures should be preferred, depending on the underlying objective: the forward selection with double stopping criterion (Blanchet et al. 2008; method = "FWD") or the minimization of the residual spatial autocorrelation (Griffith and Peres-Neto 2006; MIR selection in Bauman et al. 2018a,b, method = "MIR").
The most powerful and accurate selection method, in terms of R2 estimation, is the forward selection. This method should be preferred when the objective is to capture as accurately as possible the spatial patterns of x. If the objective is to optimize the detection of the spatial patterns in the residuals of a model of the response variable(s) against a set of environmental predictors, for instance, then x should be the model residuals, and method = "FWD". This allows optimizing the detection of residual spatial patterns once the effect of the environmental predictors has been removed.
If, however, the objective is only to remove the spatial autocorrelation from the residuals of a model of x against a set of actual predictors (e.g. environmental) with a small number of spatial predictors, then accuracy is not as important and one should focus mainly on the number of spatial predictors (Bauman et al. 2018b). In this case, method = "MIR" is better suited, as it keeps the standard errors of the actual predictor coefficients as low as possible. Note that method = "MIR" can only be used for a univariate x, as Moran's I is a univariate index. If x is multivariate, then the best criterion is the forward selection (see Bauman et al. 2018b).
A third option is to not perform any selection of MEM variables (method = "global"). This option may be interesting when the complete set of MEM variables will be used, like in Moran spectral randomizations (Wagner and Dray 2015, Bauman et al. 2018c) or when using smoothed MEM (Munoz 2009).
For method = "MIR", the global test consists in computing the Moran's I of x (e.g. residuals of the model of the response variable against environmental variables) and testing it by permutation (results stored in global.test). If the Moran's I is significant, the function performs a selection procedure that searches among the set of generated spatial predictors for the one that best minimizes the value of the Moran's I. A model of x against the selected eigenvector is built, and the significance of the Moran's I of the model residuals is tested again. The procedure goes on until the Moran's I of the model residuals is no longer significant, hence the name Minimization of Moran's I in the Residuals (MIR).
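The MIR loop can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's implementation: the helpers `moran_i()` and `moran_p()`, the toy chain-shaped SWM, the simulated data, and the one-sided permutation test are all hypothetical choices made for the example.

```r
set.seed(1)

# Moran's I of a numeric vector x for a weight matrix W
moran_i <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * as.numeric(t(z) %*% W %*% z) / sum(z^2)
}

# One-sided permutation p-value (alternative: positive autocorrelation)
moran_p <- function(x, W, nperm = 199) {
  obs  <- moran_i(x, W)
  perm <- replicate(nperm, moran_i(sample(x), W))
  (sum(perm >= obs) + 1) / (nperm + 1)
}

# Toy data: 30 sites on a transect, binary chain SWM, positive MEM as candidates
n <- 30
W <- matrix(0, n, n)
W[cbind(1:(n - 1), 2:n)] <- 1
W <- W + t(W)
H <- diag(n) - 1 / n                      # centering matrix I - 11'/n
eig <- eigen(H %*% W %*% H, symmetric = TRUE)
mem <- eig$vectors[, eig$values > 1e-8, drop = FALSE]
x <- 2 * mem[, 1] + rnorm(n, sd = 0.2)    # spatially structured values (e.g. residuals)

# MIR loop: add, one at a time, the eigenvector that most reduces the
# Moran's I of the residuals, until that Moran's I is no longer significant
alpha <- 0.05
selected <- integer(0)
res <- x
while (moran_p(res, W) < alpha && length(selected) < ncol(mem)) {
  remaining <- setdiff(seq_len(ncol(mem)), selected)
  i_after <- sapply(remaining, function(j) {
    moran_i(residuals(lm(x ~ mem[, c(selected, j)])), W)
  })
  selected <- c(selected, remaining[which.min(i_after)])
  res <- residuals(lm(x ~ mem[, selected]))
}
selected   # indices of the selected eigenvectors
```

The loop stops as soon as the residual Moran's I is not significant, which is why MIR tends to retain few spatial predictors.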
For method = "global" and method = "FWD", the global test consists in computing the adjusted global R2, that is, the R2 of the model of x against the whole set of generated MEM variables and tests it by permutation (results stored in global.test).
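A permutation test of an adjusted R2 can be sketched in base R as below. The helper `adj_r2()`, the random predictors standing in for MEM variables, and the number of permutations are assumptions for illustration; the package's own test may differ in its details.

```r
set.seed(42)

# Adjusted R2 of the linear model of x against all columns of X
adj_r2 <- function(x, X) summary(lm(x ~ X))$adj.r.squared

# Toy data (hypothetical): 40 sites, 5 random predictors standing in for MEM variables
n <- 40
X <- scale(matrix(rnorm(n * 5), n, 5))
x <- X[, 1] + rnorm(n)

obs   <- adj_r2(x, X)
nperm <- 499

# Null distribution: permuting the response breaks its link with the predictors
perm <- replicate(nperm, adj_r2(sample(x), X))

# Proportion of permuted statistics at least as large as the observed one
p_value <- (sum(perm >= obs) + 1) / (nperm + 1)
c(adjR2 = obs, p = p_value)
```

Counting the observed value among the permutations (the `+ 1` terms) keeps the p-value strictly positive, so its smallest possible value is 1 / (nperm + 1).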
For method = "global", if the adjusted global R2 is significant, the function returns the whole set of generated MEM variables in MEM.select.
For method = "FWD", if the adjusted global R2 is significant, the function performs a forward selection with double stopping criterion that searches among the set of generated spatial predictors for the one that best maximizes the R2 of the model. The procedure is repeated until one of the two stopping criteria is reached (see Blanchet et al. 2008). Note that in a few cases, the forward selection does not select any variable even though the global model is significant. This can happen for example when a single variable has a strong relation with the response variable(s), because the integration of the variable alone yields an adjusted R2 slightly higher than the global adjusted R2. In this case, we recommend checking that this is indeed the reason why the first selected variable was rejected, and rerunning the analysis with a second stopping criterion equal to the global adjusted R2 plus a small amount that avoids this issue (e.g. 5 …). This can be done through the argument adjR2thresh of function forward.sel, until the solution is implemented in mem.select.
For method = "FWD" and method = "MIR", the MEM selected by the procedure are returned in MEM.select and a summary of the results is provided in summary. If no MEM are selected, then MEM.select and summary are not returned.
The function returns a list with:
global.test |
An object of class randtest containing the result of the global test associated to all MEM (adjusted R2 and p-value). If MEM.autocor = "all", a list with two elements (positive and negative) corresponding to the results of the global tests performed on positive and negative MEM respectively. |
MEM.all |
An object of class orthobasisSp containing the complete set of generated MEM variables (generated by scores.listw). Only returned if MEM.all = TRUE. |
summary |
A dataframe summarizing the results of the selection procedure |
MEM.select |
An object of class orthobasisSp containing the subset of significant MEM variables. |
David Bauman and Stéphane Dray
Bauman D., Fortin M-J, Drouet T. and Dray S. (2018a) Optimizing the choice of a spatial weighting matrix in eigenvector-based methods. Ecology, 99, 2159-2166
Bauman D., Drouet T., Dray S. and Vleminckx J. (2018b) Disentangling good from bad practices in the selection of spatial or phylogenetic eigenvectors. Ecography, 41, 1–12
Bauman D., Vleminckx J., Hardy O., Drouet T. (2018c) Testing and interpreting the shared space-environment fraction in variation partitioning analyses of ecological data. Oikos
Blanchet G., Legendre P. and Borcard D. (2008) Forward selection of explanatory variables. Ecology, 89(9), 2623–2632
Diniz-Filho J.A.F., Bini L.M., Rangel T.F., Morales-Castilla I. et al. (2012) On the selection of phylogenetic eigenvectors for ecological analyses. Ecography, 35, 239–249
Dray S., Legendre P. and Peres-Neto P. R. (2006) Spatial modeling: a comprehensive framework for principal coordinate analysis of neighbor matrices (PCNM). Ecological Modelling, 196, 483–493
Griffith D. (2003) Spatial autocorrelation and spatial filtering: gaining understanding through theory and scientific visualization. Springer, Berlin
Griffith D. and Peres-Neto P. (2006) Spatial modeling in Ecology: the flexibility of eigenfunction spatial analyses. Ecology, 87, 2603–2613
Munoz, F. 2009. Distance-based eigenvector maps (DBEM) to analyse metapopulation structure with irregular sampling. Ecological Modelling, 220, 2683–2689
Peres-Neto P. and Legendre P. (2010) Estimating and controlling for spatial structure in the study of ecological communities. Global Ecology and Biogeography, 19, 174–184
Wagner H., Dray S. (2015). Generating spatially constrained null models for irregularly spaced data using Moran spectral randomization methods. Methods in Ecology and Evolution, 6, 1169–1178
listw.candidates, listw.select, scores.listw
if(require(vegan)){
  # Illustration of the MIR selection on the oribatid mite data
  # (Borcard et al. 1992, 1994 for details on the dataset):
  # *******************************************************
  # Community data (response matrix):
  data(mite)
  # We will compute the example on a single species:
  spe <- mite[, 2]
  # Environmental explanatory dataset:
  data(mite.env)
  # We only use two numerical explanatory variables:
  env <- mite.env[, 1:2]
  dim(env)
  # Coordinates of the 70 sites:
  data(mite.xy)
  coord <- mite.xy

  # We build the model we are interested in:
  mod <- lm(spe ~ ., data = env)

  # In order to avoid possible type I error rate inflation issues, we check
  # whether the model residuals are independent, and if they are spatially
  # autocorrelated, we select a small subset of MEM variables to add to the
  # model as covariables with the MIR selection:

  # 1) We build a spatial weighting matrix based on Gabriel graph with a
  # weighting function decreasing linearly with the distance:
  w <- listw.candidates(coord, nb = "gab", weights = "flin")

  # 2) We test the spatial autocorrelation of the model residuals and, if
  # necessary, select a subset of spatial predictors:
  y <- residuals(mod)
  MEM <- mem.select(x = y, listw = w[[1]], method = "MIR", MEM.autocor = "positive",
                    nperm = 999, alpha = 0.05)
  dim(MEM$MEM.select)
  # The residuals of the model presented spatial autocorrelation. The selection
  # of MEM variables is thus performed to remove residual autocorrelation.

  # 3) We can reconstruct our model adding the selected MEM variables as covariables:
  env2 <- cbind(env, MEM$MEM.select)
  mod_complete <- lm(spe ~ ., data = env2)
  summary(mod_complete)$coefficients[, 1]   # Coefficient estimates
  summary(mod_complete)$coefficients[, 2]   # Standard errors
}
This function performs multi-frequential periodogram analysis for univariate temporal or spatial data series collected with equal intervals. Compared with the traditional periodogram used in spectral analysis, this method can detect overlapping signals with fractional frequencies. Fitting a joint polynomial-trigonometric model is achieved by Ordinary Least Squares (OLS) regression. The function also performs autocorrelation analysis of OLS residuals up to a number of lags determined by the user.
mfpa( y, MaxNFreq = 2, MinFreq = 3, MaxFreq = NA, ntrend = 0, nlags = 0, alpha = 0.05 ) ## S3 method for class 'mfpa' plot(x, xlab = "", ylab = "Values", ...) ## S3 method for class 'mfpa' print(x, ...)
y |
Vector of n observations (vector of integer or real numbers, or one-column matrix). |
MaxNFreq |
Maximum number of frequencies to be estimated in the stepwise procedure (e.g. 2). |
MinFreq |
Minimum value for frequency estimates (e.g. 3.0). |
MaxFreq |
Maximum value for frequency estimates (e.g. 10). Must be larger
than MinFreq and smaller than half of the number of observations in the
series. If unspecified by the user |
ntrend |
Number (0 to 3) of orthogonal polynomial components estimating the broad-scale trend, to be included in the joint polynomial-trigonometric model. Use 0 to estimate no trend component, only an intercept. |
nlags |
Number of lags to be used for autocorrelation analysis of OLS residuals. Use 0 to bypass this analysis. |
alpha |
Significance threshold for including frequencies. |
x |
An object of class |
xlab , ylab
|
Labels for x and y axes |
... |
Further arguments passed to or from other methods |
The fitting of a joint polynomial-trigonometric model is limited to ordinary least squares (OLS), with autocorrelation analysis of OLS residuals up to a certain lag. Orthogonal polynomials are used to model broad-scale trends, whereas cosines and sines model the periodic structures at intermediate scales. See Dutilleul (2011, section 6.5) and Legendre & Legendre (2012, section 12.4.4) for details. OLS regression could be replaced by an estimated generalized least squares (EGLS) procedure, as described in Dutilleul (2011).
In spectral analysis in general and in mfpa in particular, the cosines and sines are considered jointly in the search for the dominant frequency components since they are both required to fully account for a frequency component in a linear model. So, when either the cosine or the sine is significant, this is sufficient indication that a significant frequency component has been found. But see the first paragraph of the 'Recommendations to users' below.
The periodic phenomenon corresponding to each identified frequency is modelled by a cosine and a sine. The first pair ('cos 1', 'sin 1') corresponds to the first frequency, the second pair to the second frequency, and so on. An intercept is also computed, as well as a polynomial broad-scale trend if argument ntrend > 0. The coefficients shown for each periodic component ('cos' and 'sin') are the OLS regression coefficients. The tests of significance producing the p-values (called 'prob' in the output file) are 2-tailed parametric t-tests, as in standard OLS regression.
A global R-square statistic for the periodogram is computed as the variance of the fitted values divided by the variance of the data series. An R-squared corresponding to each frequency is also returned.
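The definition of the global R-square can be checked on a toy series with base R. In this sketch the frequency is assumed to be known (mfpa estimates it), and the parametrisation `cos(2 * pi * freq * tt / n)` as well as all numeric values are illustrative assumptions.

```r
set.seed(7)

# Toy series: 60 equally spaced observations with 3.5 cycles over the series
n    <- 60
tt   <- seq_len(n)
freq <- 3.5
y <- 2 * cos(2 * pi * freq * tt / n) + sin(2 * pi * freq * tt / n) +
  rnorm(n, sd = 0.5)

# OLS fit of the cosine/sine pair for the (here known) frequency, with an intercept
fit <- lm(y ~ cos(2 * pi * freq * tt / n) + sin(2 * pi * freq * tt / n))

# R-square as the variance of the fitted values over the variance of the series
r2 <- var(fitted(fit)) / var(y)

# For an OLS model with an intercept this equals the usual R-squared
all.equal(r2, summary(fit)$r.squared)
```

The cosine and the sine enter the model together, which reflects the point made above that both are needed to account for one frequency component.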
In the Dutilleul periodogram, the time unit is the length of the data series (in time units: seconds, hours, days, etc.). Hence, the frequency identified by a Dutilleul periodogram is the number of cycles of the periodic signal (how many full or partial cycles) along the time series. That number is an integer when the series contains an integer number of cycles; it may also be a real number when the number of cycles is fractional. The periodogram can identify several periodic phenomena with different frequencies. The estimated frequencies could be divided by an appropriate constant to produce numbers of cycles per second or day, or per meter or km, depending on the study.
To find the period (number of days, hours, etc.) of the process generating a periodic signal in the data, divide the length of the series (in days, hours, etc.) by the frequency identified by Dutilleul's periodogram.
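As a worked illustration of this conversion, the short sketch below uses the two frequencies simulated in Example 1 (2.3 and 6.1 cycles in a series of 100 observations). The one-minute sampling interval is an assumption made for the illustration only.

```r
# Frequencies are numbers of cycles in the whole series (values from Example 1)
series.length <- 100          # length of the series, in sampling intervals
freq <- c(2.3, 6.1)           # cycles in the whole series

# Period = series length / frequency, in the same units as series.length
periods <- series.length / freq

# If observations were taken one minute apart (an assumption),
# the frequencies expressed in cycles per minute would be:
freq.per.min <- freq / series.length
```

Here the periods are about 43.5 and 16.4 sampling intervals.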
Recommendations to users
The mfpa code estimates the periodic frequencies to be included in the model through a combination of a stepwise procedure and non-linear optimisation. Following that, the contributions of the 'cos' and 'sin' components of all frequencies in the model are estimated by multiple linear regression in the presence of the intercept and trends (if any). Because the mfpa method estimates fractional frequencies, the cos-sin combinations are not orthogonal among the identified frequencies, and unnecessary frequencies may be selected as 'significant'.
1. It is important that users of this periodogram have hypotheses in mind about the frequencies of the processes that may be operating on the system under study and the number of periodic components they expect to find. If one asks for more components than the number of periodic phenomena at work on the system, the 'real' frequency usually has a strong or fairly strong R-squared and is followed by other components with very small R-squared values. Selection of frequencies of interest should thus be based more on examination of the R-squared values of the components than on the p-values. For short series in particular, the adjusted R-squared is an unbiased estimate of the variance of the data explained by the model. Even series of random numbers can produce 'significant' frequencies for periodic components; the associated (adjusted) R-squared values will, however, be very small.
2. Function mfpa cannot detect frequencies < 1 (smaller than one cycle in the series) or larger than (n-1) where n is the number of observations in the series, the latter case corresponding to periods smaller than the interval between successive observations. When a periodic component with such a period is present in the data, Dutilleul's periodogram can detect harmonics of that frequency. Recommendation: when a frequency is detected that does not seem to correspond to a hypothesized process, one could check, using simulated data, if it could be produced by a process operating at a temporal scale (period) smaller than the interval between successive observations. An example is shown in Example 2.
3. When analysing a time series with unknown periodic structure, it is recommended to try first with more than one frequency, say 2 or 3, and also with a trend. Eliminate the non-significant components, step by step, in successive runs, starting with the trend(s), then eliminate the weakly significant periodic components, until there are only highly significant components left in the model.
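Recommendation 2 can be explored without running mfpa. The base-R sketch below reuses the tide signal of Example 2 and shows that the hourly signal, once sampled at noon every day, is numerically identical to a much slower cosine. The aliasing arithmetic is standard sampling theory rather than something stated in the mfpa documentation, and the object names are illustrative.

```r
T.tide <- 12.42                       # tide period, in hours
h.noon <- seq(12, 8760, 24)           # sampling hours: noon of each day
n <- length(h.noon)                   # 365 observations
tide.data <- cos((2 * pi / T.tide) * h.noon)

# Cycles of the tide between two successive observations (24 h apart)
cycles.per.obs <- 24 / T.tide         # about 1.93: more than one cycle per interval

# Whole cycles between observations are invisible; only the remainder is seen
alias.per.obs <- cycles.per.obs - round(cycles.per.obs)   # about -0.068

# Slow cosine with the aliased frequency and the phase of the first observation
k <- 0:(n - 1)
alias.series <- cos(2 * pi * alias.per.obs * k + (2 * pi / T.tide) * h.noon[1])

max.diff <- max(abs(tide.data - alias.series))   # zero up to rounding error
alias.freq <- abs(alias.per.obs) * n             # cycles over the 365 observations
alias.period <- 1 / abs(alias.per.obs)           # in days
```

A frequency close to alias.freq (about 24.7 cycles, i.e. a period of about 14.8 days) would therefore be a plausible signature of the undersampled tide; whether mfpa returns exactly this value is not asserted here.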
A list containing the following elements:
- frequencies: Vector of estimated frequencies of the model periodic components and associated R-squared. The frequencies are numbers of cycles in the whole (temporal or spatial) series under study.
- coefficients: Data frame containing OLS slope estimates, starting with the intercept, then the orthogonal polynomials modelling trend in increasing order, followed by the cosine and sine coefficients (alternating) in the order of the estimated frequencies. Columns: (1) coefficient: the OLS intercept or slope estimates; (2) prob: the associated probabilities.
- predicted: A vector (length n) of predicted response values (fitted values), including the trend if any. The data and predicted values can be plotted together using function plot.mfpa; type plot(name.of.output.object). The data values are represented by red circles and the fitted values by a black line.
- auto_coeff: If nlags > 0: data frame containing the following columns. (1) lag: lags at which autocorrelation analysis of the OLS residuals is performed; (2) auto_r: vector of sample autocorrelation coefficients calculated from OLS residuals for each lag; (3) prob: vector of probabilities associated with the tests of significance of the sample autocorrelation coefficients.
- y: the original data series (one-column matrix).
- X: the matrix of explanatory variables; it contains a column of "1" to estimate the intercept, a column for each of the trend components (if any), and two columns for each frequency component, each frequency being represented by a cosine and a sine.
- r.squared.global: The global R-squared of the model and the adjusted R-squared.
Guillaume Larocque <[email protected]> and Pierre Legendre.
Dutilleul, P. 1990. Apport en analyse spectrale d'un périodogramme modifié et modélisation des séries chronologiques avec répétitions en vue de leur comparaison en fréquence. Doctoral Dissertation, Université Catholique de Louvain, Louvain-la-Neuve, Belgium.
Dutilleul, P. 1998. Incorporating scale in ecological experiments: data analysis. Pp. 387-425 in: D. L. Peterson & V. T. Parker [eds.] Ecological scale - Theory and applications. Columbia University Press, New York.
Dutilleul, P. 2001. Multi-frequential periodogram analysis and the detection of periodic components in time series. Commun. Stat. - Theory Methods 30, 1063-1098.
Dutilleul, P. R. L. 2011. Spatio-temporal heterogeneity - Concepts and analyses. Cambridge University Press, Cambridge.
Dutilleul, P. and C. Till. 1992. Evidence of periodicities related to climate and planetary behaviors in ring-width chronologies of Atlas cedar (Cedrus atlantica) in Morocco. Can. J. For. Res. 22: 1469-1482.
Legendre, P. and P. Dutilleul. 1992. Introduction to the analysis of periodic phenomena. 11-25 in: M. A. Ali [ed.] Rhythms in fishes. NATO ASI Series, Vol. A-236. Plenum, New York.
Legendre, P. and L. Legendre. 2012. Numerical Ecology. 3rd English edition. Elsevier, Amsterdam.
### Example 1

# Simulate data with frequencies 2.3 and 6.1 and a random component, n = 100.
# No trend, no autocorrelated residuals.

y <- as.matrix(0.4*(sin(2.3*2*pi*(1:100)/100)) +
  0.4*(sin(6.1*2*pi*(1:100)/100)) + 0.2*rnorm(100))

res <- mfpa(y, MaxNFreq = 2, MinFreq = 2, ntrend = 0, nlags = 0)

# Compute the periods associated with the two periodic components. Each
# frequency in element $frequencies is a number of cycles in the whole series.
# The periods are expressed in numbers of time intervals of the data series. In
# this example, if the data are measured every min, the periods are in min.

periods <- 100/res$frequencies$frequency

# Draw the data series and the fitted (or predicted) values

plot(res)

### Example 2

# Generate hourly periodic data with tide signal (tide period T = 12.42 h)
# during 1 year, hence 24*365 = 8760 hourly data. See
# https://en.wikipedia.org/wiki/Tide.

# In this simulation, constant (c = 0) puts the maximum value of the cosine at
# midnight on the first day of the series.

periodic.component <- function(x, T, c) cos((2*pi/T)*(x+c))

tide.h <- periodic.component(1:8760, 12.42, 0)

# The number of tides in the series is: 8760/12.42 = 705.314 tidal cycles
# during one year.

# Sample the hourly data series once a day at 12:00 noon every day. The
# periodic signal to be detected has a period smaller than the interval between
# consecutive observations and its frequency is larger than (n-1). The sequence
# of sampling hours for the tide.h data is:

h.noon <- seq(12, 8760, 24)
tide.data <- tide.h[h.noon]
length(tide.data)
# The series contains 365 sampling units

# Compute Dutilleul's multi-frequential periodogram

res.noon <- mfpa(tide.data, MaxNFreq = 1, MinFreq = 2, ntrend = 1, nlags = 2)

# Examine the frequency detected by the periodogram, element
# res.noon$frequencies. This is a harmonic of the tide signal in the original
# data series tide.h.

# Compute the period of the signal in the data series sampled hourly:

period <- 365/res.noon$frequencies$frequency

# Draw the data series and the adjusted values

plot(res.noon)

# Repeat this analysis after addition of random noise to the tide data

tide.noise <- tide.data + rnorm(365, 0, 0.25)

res.noise <- mfpa(tide.noise, MaxNFreq = 1, MinFreq = 2, ntrend = 1, nlags = 2)

plot(res.noise)
This function computes the upper and lower bounds of Moran's I for a given spatial weighting matrix (stored in a listw object). These values are obtained by the eigendecomposition of the spatial weighting matrix.
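The eigendecomposition can be sketched in base R for a small grid. This is an illustration of the result of de Jong et al. (1984), not the source code of moran.bounds: the grid, the binary rook weights and the helper moran.I are hypothetical, and the bounds are taken as the extreme eigenvalues of the doubly centred weighting matrix rescaled by n divided by the sum of the weights.

```r
# Illustrative sketch, not the moran.bounds() source code.
# Rook-neighbour binary weights on a 4 x 4 grid (hypothetical example data)
xy <- expand.grid(x = 1:4, y = 1:4)
W <- (as.matrix(dist(xy, method = "manhattan")) == 1) * 1
n <- nrow(W)
S0 <- sum(W)                              # sum of all weights

# Doubly centred, symmetrised weighting matrix
H <- diag(n) - matrix(1 / n, n, n)
Ws <- (W + t(W)) / 2
eig <- eigen(H %*% Ws %*% H, symmetric = TRUE)

# Bounds of Moran's I: extreme eigenvalues rescaled by n / S0
bounds <- (n / S0) * range(eig$values)    # c(Imin, Imax)

# Moran's I of an arbitrary vector, for comparison
moran.I <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(z * (W %*% z)) / sum(z^2)
}
set.seed(42)
I.random <- moran.I(rnorm(n), W)          # lies between the two bounds
I.first <- moran.I(eig$vectors[, 1], W)   # first eigenvector reaches the upper bound
```

Note that this sketch returns the bounds in increasing order, which may differ from the ordering used by moran.bounds.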
moran.bounds(listw)
listw |
an object of class |
A vector containing the maximum and minimum of Moran's I for a given spatial weighting matrix.
Stéphane Dray [email protected]
de Jong, P., Sprenger, C., & van Veen, F. (1984). On extreme values of Moran's I and Geary's C. Geographical Analysis, 16(1), 17-24.
if(require("ade4", quietly = TRUE)){
  if(require("spdep", quietly = TRUE)){
    data(oribatid)
    nbtri <- tri2nb(as.matrix(oribatid$xy))
    lwB <- nb2listw(nbtri, style = "B")
    lwW <- nb2listw(nbtri, style = "W")
    scB <- mem(lwB)
    scW <- mem(lwW)
    moran.bounds(lwB)
    moran.mc(scB[,1], lwB, 9)
    moran.mc(scB[,69], lwB, 9)
    moran.bounds(lwW)
    moran.mc(scW[,1], lwW, 9)
    moran.mc(scW[,69], lwW, 9)
  }
}
This function computes Moran's I statistic and provides a testing procedure using random permutations. It is based on the moran.mc function of the spdep package. The moran.randtest function is slightly different as it can handle several variables (x can have more than one column) and its outputs are objects of class randtest (one variable) or krandtest (several variables).
moran.randtest(x, listw, nrepet = 999, ...)
x |
a |
listw |
an object of class |
nrepet |
an integer indicating the number of permutations used in the randomization procedure |
... |
other arguments to be passed to the |
An object of class randtest (one variable) or krandtest (several variables).
Stéphane Dray [email protected]
Moran, P. A. P. (1950). Notes on continuous stochastic phenomena. Biometrika, 37, 17-23.
if(require("ade4", quietly = TRUE) && require("spdep", quietly = TRUE)){
  data(mafragh)
  tests <- moran.randtest(mafragh$env, nb2listw(mafragh$nb))
  tests
  plot(tests)
}
This function computes the positive and negative parts of Moran's I statistic and provides a testing procedure using random permutations. The function computes the Moran's eigenvector maps (MEM) and eigenvalues for the listw object. If alter = "greater", the statistic 'I+' is computed as the sum of the products between positive eigenvalues and squared correlations between x and associated MEMs. If alter = "less", the statistic 'I-' is computed as the sum of the products between negative eigenvalues and squared correlations between x and associated MEMs. If alter = "two-sided", both statistics are computed.
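The decomposition can be illustrated in base R. The sketch below is not the source code of moranNP.randtest: the grid and weights are hypothetical, the eigenvectors of the doubly centred weighting matrix stand in for the MEMs, and the eigenvalues are rescaled by n divided by the sum of the weights (an assumption made so that the two parts add up to Moran's I).

```r
# Illustrative sketch, not the moranNP.randtest() source code.
# Rook-neighbour binary weights on a 4 x 4 grid (hypothetical example data)
xy <- expand.grid(x = 1:4, y = 1:4)
W <- (as.matrix(dist(xy, method = "manhattan")) == 1) * 1
n <- nrow(W)
H <- diag(n) - matrix(1 / n, n, n)
eig <- eigen(H %*% ((W + t(W)) / 2) %*% H, symmetric = TRUE)
lambda <- (n / sum(W)) * eig$values       # eigenvalues rescaled to Moran's I units

set.seed(7)
x <- rnorm(n)

# Keep eigenvectors with non-null eigenvalues, split by sign
tol <- 1e-8
pos <- lambda > tol
neg <- lambda < -tol

# Squared correlations between x and a subset of eigenvectors
r2 <- function(idx) as.vector(cor(x, eig$vectors[, idx, drop = FALSE]))^2

I.pos <- sum(lambda[pos] * r2(pos))       # 'I+'
I.neg <- sum(lambda[neg] * r2(neg))       # 'I-'

# Moran's I computed directly, for comparison
z <- x - mean(x)
I.obs <- (n / sum(W)) * sum(z * (W %*% z)) / sum(z^2)
```

In this sketch I.pos + I.neg equals I.obs, which mirrors the check made at the end of the example for this function.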
moranNP.randtest( x, listw, nrepet = 999, alter = c("greater", "less", "two-sided"), ... )
x |
a |
listw |
an object of class |
nrepet |
an integer indicating the number of permutations used in the randomization procedure |
alter |
a character string specifying the alternative hypothesis, must be one of "greater" (default), "less" or "two-sided" |
... |
other arguments (e.g., |
An object of class randtest (for unilateral test) or krandtest (for bilateral test).
Stéphane Dray [email protected]
Dray, S. (2011). A new perspective about Moran's coefficient: spatial autocorrelation as a linear regression problem. Geographical Analysis, 43, 127-141.
if(require("ade4", quietly = TRUE) && require("spdep", quietly = TRUE)){
  data(mafragh)
  tests <- moranNP.randtest(mafragh$env[,1], nb2listw(mafragh$nb),
    alter = "two-sided", p.adjust.method = "holm")
  tests
  moran.randtest(mafragh$env[,1], nb2listw(mafragh$nb))$obs
  sum(tests$obs)
}
The multi-scale pattern analysis (MSPA, Jombart et al. 2009) investigates the main scales of spatial variation in a multivariate dataset. This implementation allows one to perform a MSPA using any multivariate analysis (stored as a dudi object), and a list of spatial weights (class listw) or an object of class orthobasisSp.
mspa(
  dudi,
  lwORorthobasisSp,
  nblocks,
  scannf = TRUE,
  nf = 2,
  centring = c("param", "sim"),
  nperm = 999
)

## S3 method for class 'mspa'
scatter(
  x,
  xax = 1,
  yax = 2,
  posieig = "topleft",
  bary = TRUE,
  plot = TRUE,
  storeData = TRUE,
  pos = -1,
  ...
)

## S3 method for class 'mspa'
print(x, ...)
dudi |
a duality diagram (i.e. a reduced space ordination) obtained by a
|
lwORorthobasisSp |
either a list of weights (class |
nblocks |
an integer indicating the number of blocks into which MEMs are divided. |
scannf |
logical, indicating whether the screeplot should be displayed to choose the number of retained factors. |
nf |
the number of retained factors |
centring |
a character string indicating if parametric ("param") or non-parametric ("sim") centring should be used |
nperm |
an integer giving the number of permutations used to compute the theoretical coefficients of determination (999 by default); used if centring="sim". |
x |
a mspa object. |
xax |
an integer indicating the x axis to be displayed. |
yax |
an integer indicating the y axis to be displayed. |
posieig |
a character string indicating the position of the screeplot (any of the four combinations of "top" or "bottom" with "left" or "right"). |
bary |
a logical indicating whether the barycenter of the variables should be displayed. |
plot |
a logical indicating whether the graphic is displayed |
storeData |
a logical indicating if the data should be stored in the
returned object. If |
pos |
an integer indicating the position of the environment where the
data are stored, relative to the environment where the function is called.
Useful only if |
... |
additional graphical parameters (see |
The scatter method is used for plotting the results. Compared to the original version of the method, this new implementation allows the user to specify a number of blocks (nblocks). In this case, the multiscale decomposition is performed by dividing MEMs into several blocks and summing R2 values. This could facilitate the interpretation of results.
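The block-wise summation can be sketched in base R. This is not the source code of mspa: the transect, the simulated variables and the rule used to form the blocks (contiguous blocks of similar size, in the order of the eigenvalues) are assumptions made for the illustration.

```r
# Illustrative sketch, not the mspa() source code.
# Transect of 12 sites with neighbours at distance 1 (hypothetical example data)
n <- 12
W <- (abs(outer(1:n, 1:n, "-")) == 1) * 1
H <- diag(n) - matrix(1 / n, n, n)
eig <- eigen(H %*% W %*% H, symmetric = TRUE)

# Drop the eigenvector(s) with a null eigenvalue; the others act as MEMs here
keep <- abs(eig$values) > 1e-8
mem <- eig$vectors[, keep, drop = FALSE]

# Three simulated variables and their R2 with every MEM (variables x MEMs)
set.seed(3)
X <- matrix(rnorm(n * 3), nrow = n, dimnames = list(NULL, c("v1", "v2", "v3")))
R2 <- cor(X, mem)^2

# Assumption: MEMs are split into contiguous blocks of similar size
nblocks <- 3
blocks <- cut(seq_len(ncol(mem)), breaks = nblocks, labels = FALSE)

# Sum R2 values within each block: one column per block
R2.blocks <- t(rowsum(t(R2), group = blocks))
```

Each row of R2.blocks is a coarser scale profile of one variable: the total R2 of a variable is unchanged, but it is spread over three blocks instead of individual MEMs.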
An object having the classes mspa and dudi: mspa objects are dudi objects with the following extra slots:
- ls: principal components of the MSPA. These are the coordinates of variables onto principal axes, to be used for plotting. Corresponds to matrix B in Appendix A of Jombart et al. (2009).
- R2: matrix of R2 between variables and MEMs. Corresponds to S in Jombart et al. (2009).
- meanPoint: coordinates of the 'mean variable' onto principal axes. The 'mean variable' is a hypothetical variable whose scale profile is the average of those of all variables of the analysed dataset.
- varweights: the weights of variables. Corresponds to d in Jombart et al. (2009).
Thibaut Jombart [email protected]
Jombart T, Dray S, and Dufour, A-B. (2009) Finding essential scales of spatial variation in ecological data: a multivariate approach. Ecography 32: 161-168.
chooseCN to obtain a list of spatial weights.
####################################
### using oribatid mites dataset ###
####################################

if(require("ade4", quietly = TRUE)){
  ## load data
  data(oribatid)

  ## get the list of spatial weights
  cn <- chooseCN(oribatid$xy, res = "listw", ask = FALSE, type = 1)

  ## Hellinger transformation
  hellTrans <- function(X){
    if (!( is.matrix(X) || is.data.frame(X) )) stop("Object is not a matrix or a data frame.")
    if (any(is.na(X))) stop("na entries in table.")

    sumRow <- apply(X,1,sum)
    Y <- X/sumRow
    Y <- sqrt(Y)

    return(Y)
  }

  ## ENVIRONMENTAL VARIABLES ##
  ## Hill and Smith analysis for environmental variables
  ## (for a mixture of quantitative / qualitative variables)
  hsEnv <- dudi.hillsmith(oribatid$envir,scannf=FALSE)

  ## detrending of the analysis (residuals of regression onto xy coordinates)
  hsEnv.detr <- pcaivortho(hsEnv,oribatid$xy,scannf=FALSE)

  ## MSPA of the detrended analysis
  mspaEnv <- mspa(hsEnv.detr,cn,scannf=FALSE,nf=2)
  scatter(mspaEnv)

  ## SPECIES DATA ##
  ## PCA of species abundances, after Hellinger transformation
  pcaFau <- dudi.pca(hellTrans(oribatid$fau),scale=FALSE,scannf=FALSE)

  ## detrending of this PCA
  pcaFau.detr <- pcaivortho(pcaFau,oribatid$xy,scannf=FALSE)

  ## MSPA of the detrended analysis
  mspaFau <- mspa(pcaFau.detr,cn,scannf=FALSE,nf=2)
  scatter(mspaFau)

  ## CANONICAL MSPA ##
  ## RDA species ~ envir
  ## (species abundances predicted by environment)
  ## note: RDA = 'PCAIV' (PCA with Instrumental Variables)
  rda1 <- pcaiv(dudi=pcaFau.detr, df=oribatid$envir,scannf=FALSE,nf=2)

  ## canonical MSPA (species predicted by environment)
  mspaCan1 <- mspa(dudi=rda1, lw=cn, scannf=FALSE, nf=2)
  scatter(mspaCan1)

  ## same analysis, using a non-parametric centring
  mspaCan1NP <- mspa(dudi=rda1, lw=cn, scannf=FALSE, nf=2,cent="sim",nper=999)
  scatter(mspaCan1NP) # basically no change

  ## PARTIAL CANONICAL MSPA ##
  ## partial RDA species ~ envir
  ## (species abundances not predicted by environment)
  rda2 <- pcaivortho(dudi=pcaFau.detr,df=oribatid$envir,scannf=FALSE,nf=2)

  ## partial canonical MSPA
  mspaCan2 <- mspa(dudi=rda2, lw=cn, scannf=FALSE, nf=2)
  scatter(mspaCan2) # nothing left
}
This function generates spatially-constrained random variables preserving the global autocorrelation (Moran's I) and the spatial structures at multiple scales. The multiscale property is defined by the power spectrum (i.e. decomposition of the variance of the original variables) on a basis of orthonormal eigenvectors (Moran's Eigenvector Maps, MEM). The function provides methods for univariate randomization, joint randomization of a group of variables while keeping within-group correlations fixed, and univariate randomization with a fixed correlation between original data and randomized replicates.
msr(x, ...)

## Default S3 method:
msr(
  x,
  listwORorthobasis,
  nrepet = 99,
  method = c("pair", "triplet", "singleton"),
  cor.fixed,
  nmax = 100,
  simplify = TRUE,
  ...
)
x |
For |
... |
further arguments passed to or from other methods |
listwORorthobasis |
an object of the class |
nrepet |
an |
method |
a character string specifying which algorithm should be used to produce spatial replicates (see Details). |
cor.fixed |
if not missing, the level of correlation between the original variable and its randomized replicates |
nmax |
the number of trials used in the "triplet" procedure. |
simplify |
A logical value. If |
Three procedures are implemented in the function. The "pair" procedure is the most general as it can be applied in the three cases (univariate, univariate with fixed correlation and multivariate). This procedure preserves the power spectrum by pair of MEMs but not strictly the global autocorrelation level (Moran's I). The "singleton" procedure can be used for univariate and multivariate cases. It strictly preserves the global level of autocorrelation and the power spectrum. The "triplet" procedure can only be applied in the univariate case. It preserves the power spectrum by triplet of MEMs and strictly preserves the global autocorrelation level.
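To make the idea of preserving both the power spectrum and Moran's I concrete, the base-R sketch below builds one replicate by randomly flipping the sign of each coefficient of the variable on a basis of centred, orthonormal eigenvectors. This is an assumption about how such a replicate can be obtained, in the spirit of the "singleton" procedure; it is not the source code of msr, and Wagner and Dray (2015) should be consulted for the actual algorithms. The grid, weights and helper names are hypothetical.

```r
# Illustrative sketch of a sign-flipping randomization; not the msr() source code.
# Rook-neighbour binary weights on a 4 x 4 grid (hypothetical example data)
xy <- expand.grid(x = 1:4, y = 1:4)
W <- (as.matrix(dist(xy, method = "manhattan")) == 1) * 1
n <- nrow(W)

# Orthonormal basis Q of the centred subspace, then eigenvectors of W within it
Q <- qr.Q(qr(contr.helmert(n)))           # n x (n - 1), columns orthogonal to 1
eig <- eigen(t(Q) %*% W %*% Q, symmetric = TRUE)
U <- Q %*% eig$vectors                    # centred, orthonormal eigenvector maps

moran.I <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(z * (W %*% z)) / sum(z^2)
}

set.seed(11)
x <- rnorm(n)

# Coefficients of the centred variable on each eigenvector (power spectrum = a^2)
a <- as.vector(crossprod(U, x - mean(x)))

# Flip the sign of each coefficient at random and rebuild a replicate
s <- sample(c(-1, 1), length(a), replace = TRUE)
x.star <- mean(x) + as.vector(U %*% (s * a))
```

The replicate x.star has the same mean, variance, squared coefficients and Moran's I as x, while its spatial arrangement differs.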
Either a matrix (if simplify is TRUE) or a list with randomized replicates.
Stephane Dray [email protected] and Helene H Wagner [email protected]
Wagner, H.H. and Dray S. (2015) Generating spatially-constrained null models for irregularly spaced data using Moran spectral randomization methods. Methods in Ecology and Evolution, 6: 1169-1178. doi:10.1111/2041-210X.12407
library(spdep)

x1 <- matrix(rnorm(81*5), nrow = 81)
lw1 <- nb2listw(cell2nb(9, 9))

moran.mc(x1[,1], lw1, 2)$statistic

## singleton
x1.1 <- msr(x1[,1], lw1, nrepet = 9, method = "singleton")
apply(x1.1, 2, function(x) moran.mc(x, listw = lw1, nsim = 2)$statistic)

## triplet
x1.2 <- msr(x1[,1], lw1, nrepet = 9, method = "triplet")
apply(x1.2, 2, function(x) moran.mc(x, listw = lw1, nsim = 2)$statistic)

## pair
x1.3 <- msr(x1[,1], lw1, nrepet = 9, method = "pair")
apply(x1.3, 2, function(x) moran.mc(x, listw = lw1, nsim = 2)$statistic)

## pair with cor.fixed
x1.4 <- msr(x1[,1], lw1, nrepet = 9, cor.fixed = 0.5)
apply(x1.4, 2, function(x) moran.mc(x, listw = lw1, nsim = 2)$statistic)
cor(x1[,1], x1.4)

## pair preserving correlations for multivariate data
x1.5 <- msr(x1, lw1, nrepet = 9, cor.fixed = 0.5)
cor(x1)
lapply(x1.5, cor)

apply(x1, 2, function(x) moran.mc(x, listw = lw1, nsim = 2)$statistic)
apply(x1.5[[1]], 2, function(x) moran.mc(x, listw = lw1, nsim = 2)$statistic)

## singleton preserving correlations for multivariate data
x1.6 <- msr(x1, lw1, nrepet = 9, method = "singleton")
cor(x1)
lapply(x1.6, cor)

apply(x1, 2, function(x) moran.mc(x, listw = lw1, nsim = 2)$statistic)
apply(x1.6[[1]], 2, function(x) moran.mc(x, listw = lw1, nsim = 2)$statistic)
This function tests fourth-corner statistics using constrained null models (for traits and/or environmental variables). If the argument phyloORorthobasis is specified, random traits are phylogenetically-constrained to preserve the global autocorrelation (Moran's I) and the phylogenetic structures at multiple scales. If not, standard permutations are used. If the argument listwORorthobasis is specified, random environmental variables are spatially-constrained to preserve the global autocorrelation (Moran's I) and the spatial structures at multiple scales. If not, standard permutations are used. The multiscale property is defined by the power spectrum (i.e. decomposition of the variance of the original variables) on a basis of orthonormal eigenvectors (Moran's Eigenvector Maps, MEM).
## S3 method for class '4thcorner'
msr(
  x,
  listwORorthobasis,
  phyloORorthobasis,
  nrepet = x$npermut,
  method = c("pair", "triplet", "singleton"),
  ...
)
x |
An object generated by the |
listwORorthobasis |
an object of the class |
phyloORorthobasis |
an object of the class |
nrepet |
an |
method |
a character string specifying which algorithm should be used to
produce spatial replicates (see |
... |
further arguments of the |
An object of class 4thcorner, with tests based on the randomized replicates.
Stephane Dray [email protected]
Braga, J., Thuiller, W., ter Braak, C.J.F. and Dray, S. (2018) Integrating spatial and phylogenetic information in the fourth-corner analysis to test trait-environment relationships. Ecology, 99:2667-2674.
if(require("ade4", quietly = TRUE) && require("adephylo", quietly = TRUE)
  && require("spdep", quietly = TRUE) && require("ape", quietly = TRUE)){
  data(mafragh, package = "ade4")
  fr1 <- fourthcorner(mafragh$env, mafragh$flo, mafragh$traits$tabQuantitative, nrepet = 49)
  phy <- read.tree(text = mafragh$tre)
  lw <- nb2listw(mafragh$nb)
  fr1.msr <- msr(fr1, listwORorthobasis = lw, phyloORorthobasis = phy)
  fr1
  fr1.msr
}
This function tests the Mantel statistic using constrained null models in the presence of spatial autocorrelation. Random replicates of the second distance matrix are produced. They are spatially-constrained to preserve the global autocorrelation (Moran's I) and the spatial structures at multiple scales. The multiscale property is defined by the power spectrum (i.e. decomposition of the variance of the original variables) on a basis of orthonormal eigenvectors (Moran's Eigenvector Maps, MEM).
## S3 method for class 'mantelrtest'
msr(
  x,
  listwORorthobasis,
  nrepet = x$rep,
  method = c("pair", "triplet", "singleton"),
  ...
)
x |
An object generated by the |
listwORorthobasis |
an object of the class |
nrepet |
an |
method |
a character string specifying which algorithm should be used to
produce spatial replicates (see |
... |
further arguments of the |
An object of class randtest.
Sylvie Clappe, Stephane Dray [email protected]
Crabot, J., Clappe, S., Dray, S. and Datry, T. (2019) Testing the Mantel statistic with a spatially-constrained permutation procedure. Methods in Ecology and Evolution. in press.
if(require("ade4", quietly = TRUE) && require("spdep", quietly = TRUE)){
  data(mafragh, package = "ade4")
  d1 <- dist(mafragh$env[,1:3])
  d2 <- dist(mafragh$env[,7])
  t1 <- mantel.randtest(d1,d2)
  t1
  lw <- nb2listw(mafragh$nb)
  t2 <- msr(t1, listwORorthobasis = lw)
  t2
}
This function evaluates the significance and estimates the parts in variation partitioning using Moran Spectral Randomization (MSR) as a spatially-constrained null model to account for spatial autocorrelation in table X. Hence, this function provides a variation partitioning adjusted for spurious correlation due to spatial autocorrelation in both the response and one explanatory matrix.
## S3 method for class 'varipart'
msr(x, listwORorthobasis, nrepet = x$test$rep[1], method = c("pair", "triplet", "singleton"), ...)
x |
An object generated by the |
listwORorthobasis |
an object of the class |
nrepet |
an |
method |
a character string specifying which algorithm should be used to
produce spatial replicates (see |
... |
further arguments of the |
The function corrects the biases due to spatial autocorrelation by using the MSR procedure to produce environmental predictors that preserve the spatial autocorrelation and the correlation structures of the original environmental variables while being generated independently of species distribution.
An object of class varipart
randomized replicates.
Stephane Dray [email protected] and Sylvie Clappe [email protected]
Clappe, S., Dray S. and P.R. Peres-Neto (2018) Beyond neutrality: disentangling the effects of species sorting and spurious correlations in community analysis. Ecology 99:1737-1747.
Wagner, H. H., and S. Dray (2015). Generating spatially constrained null models for irregularly spaced data using Moran spectral randomization methods. Methods in Ecology and Evolution 6:1169-1178.
library(ade4)
library(spdep)
data(mafragh)
## Performing standard variation partitioning
dudiY <- dudi.pca(mafragh$flo, scannf = FALSE, scale = FALSE)
mafragh.lw <- nb2listw(mafragh$nb)
me <- mem(mafragh.lw, MEM.autocor = "positive")
vprda <- varipart(dudiY, mafragh$env, me, type = "parametric")
## Adjust estimation and compute p-value by msr methods
vprda.msr <- msr(vprda, mafragh.lw, nrepet=99)
vprda.msr
Computes a minimum spanning tree (MST) and returns it as a nb
object
mst.nb(dxy)
dxy |
A distance matrix based on spatial coordinates of samples |
An object of class nb
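For readers unfamiliar with minimum spanning trees, the following base-R sketch builds one with Prim's algorithm from a distance matrix. The helper `prim_mst` is hypothetical and only illustrates the concept; it is not claimed to be the algorithm used internally by `mst.nb`, and it returns a plain edge list rather than an `nb` object.

```r
# Hypothetical helper: Prim's algorithm on a 'dist' object or distance matrix.
# Returns an (n - 1) x 2 integer matrix of edges (from, to).
prim_mst <- function(d) {
  d <- as.matrix(d)
  n <- nrow(d)
  in_tree <- c(TRUE, rep(FALSE, n - 1))   # start the tree at site 1
  edges <- matrix(NA_integer_, n - 1, 2)
  for (k in seq_len(n - 1)) {
    # distances between sites already in the tree and sites not yet in it
    sub <- d[in_tree, !in_tree, drop = FALSE]
    idx <- which(sub == min(sub), arr.ind = TRUE)[1, ]
    from <- which(in_tree)[idx[1]]
    to <- which(!in_tree)[idx[2]]
    edges[k, ] <- c(from, to)
    in_tree[to] <- TRUE
  }
  edges
}

# Four sites on a line at positions 0, 1, 3 and 7
pts <- c(0, 1, 3, 7)
d <- dist(pts)
edges <- prim_mst(d)
edge_len <- as.matrix(d)[edges]   # length of each tree edge
```

The longest edge of the tree (`max(edge_len)`) is the smallest distance threshold that keeps all sites connected.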
Stéphane Dray [email protected]
xy <- matrix(rnorm(60),30,2)
dxy <- dist(xy)
th <- give.thresh(dxy)
nb1 <- mst.nb(dxy)
nb1
wh1 <- which(as.matrix(dxy)==th,arr.ind=TRUE)
plot(nb1,xy,pch=20,cex=2,lty=3)
lines(xy[wh1[1,],1],xy[wh1[1,],2],lwd=2)
title(main="Maximum distance of the minimum spanning tree in bold")
This function provides a multivariate extension of the univariate method of spatial autocorrelation analysis. It provides a spatial ordination by maximizing the product of variance by spatial autocorrelation.
multispati(dudi, listw, scannf = TRUE, nfposi = 2, nfnega = 0)

## S3 method for class 'multispati'
summary(object, ...)

## S3 method for class 'multispati'
print(x, ...)

## S3 method for class 'multispati'
plot(x, xax = 1, yax = 2, pos = -1, storeData = TRUE, plot = TRUE, ...)
dudi |
an object of class |
listw |
an object of class |
scannf |
a logical value indicating whether the eigenvalues barplot should be displayed |
nfposi |
an integer indicating the number of axes with positive autocorrelation |
nfnega |
an integer indicating the number of axes with negative autocorrelation |
... |
further arguments passed to or from other methods |
x , object
|
an object of class |
xax , yax
|
the numbers of the x-axis and the y-axis |
pos |
an integer indicating the position of the environment where the
data are stored, relative to the environment where the function is called.
Useful only if |
storeData |
a logical indicating if the data should be stored in the
returned object. If |
plot |
a logical indicating whether the graphic is displayed |
This analysis generalizes Wartenberg's multivariate spatial correlation
analysis to various duality diagrams created by the functions
(dudi.pca
, dudi.coa
, dudi.acm
, dudi.mix
...) If
dudi is a duality diagram created by the function dudi.pca
and
listw gives spatial weights created by a row normalized coding
scheme, the analysis is equivalent to Wartenberg's analysis.
We note X the data frame with the variables, Q the column weights matrix and
D the row weights matrix associated to the duality diagram dudi. We
note L the neighbouring weights matrix associated to listw. Then, the
'multispati'
analysis gives principal axes v that maximize the
product of spatial autocorrelation and inertia of row scores :
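In the notation just introduced, the maximized quantity can be sketched as follows. This is a reconstruction from the definitions of X, Q, D and L above, assuming that $I(\cdot)$ denotes Moran's coefficient and that the inertia of the row scores is the D-weighted squared norm; it follows the presentation in Dray et al. (2008) as far as can be inferred here.

```latex
I(XQv)\,\lVert XQv \rVert_{D}^{2} \;=\; v^{\top} Q^{\top} X^{\top} D\, L\, X\, Q\, v
```

The left-hand side is the product of the spatial autocorrelation and the inertia of the row scores XQv; the right-hand side is the quadratic form whose eigenanalysis yields the axes v.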
Returns an object of class multispati
, which contains the
following elements :
eig |
a numeric vector containing the eigenvalues |
nfposi |
integer, number of kept axes associated to positive eigenvalues |
nfnega |
integer, number of kept axes associated to negative eigenvalues |
c1 |
principal axes (v), data frame with p rows and (nfposi + nfnega) columns |
li |
principal components (XQv), data frame with n rows and (nfposi + nfnega) columns |
ls |
lag vector onto the principal axes (LXQv), data frame with n rows and (nfposi + nfnega) columns |
as |
principal axes of the dudi analysis (u) onto principal axes of multispati (t(u)Qv), data frame with nf rows and (nfposi + nfnega) columns |
Stéphane Dray [email protected] with contributions by Daniel Chessel, Sebastien Ollier and Thibaut Jombart
Dray, S., Said, S. and Debias, F. (2008) Spatial ordination of vegetation data using a generalization of Wartenberg's multivariate spatial correlation. Journal of vegetation science, 19, 45–56.
Grunsky, E. C. and Agterberg, F. P. (1988) Spatial and multivariate analysis of geochemical data from metavolcanic rocks in the Ben Nevis area, Ontario. Mathematical Geology, 20, 825–861.
Switzer, P. and Green, A.A. (1984) Min/max autocorrelation factors for multivariate spatial imagery. Tech. rep. 6, Stanford University.
Thioulouse, J., Chessel, D. and Champely, S. (1995) Multivariate analysis of spatial patterns: a unified approach to local and global structures. Environmental and Ecological Statistics, 2, 1–14.
Wartenberg, D. E. (1985) Multivariate spatial correlation: a method for exploratory geographical analysis. Geographical Analysis, 17, 263–283.
if (require(spdep, quiet = TRUE) & require(ade4, quiet = TRUE)) {
    data(mafragh)
    maf.xy <- mafragh$xy
    maf.flo <- mafragh$flo
    maf.listw <- nb2listw(mafragh$nb)
    if(adegraphicsLoaded()) {
        g1 <- s.label(maf.xy, nb = mafragh$nb, plab.cex = 0.75)
    } else {
        s.label(maf.xy, neig = mafragh$neig, clab = 0.75)
    }
    maf.coa <- dudi.coa(maf.flo,scannf = FALSE)
    maf.coa.ms <- multispati(maf.coa, maf.listw, scannf = FALSE, nfposi = 2, nfnega = 2)
    maf.coa.ms

    ### detail eigenvalues components
    fgraph <- function(obj){
        # use multispati summary
        sum.obj <- summary(obj)
        # compute Imin and Imax
        Ibounds <- moran.bounds(eval(as.list(obj$call)$listw))
        Imin <- Ibounds[1]
        Imax <- Ibounds[2]
        I0 <- -1/(nrow(obj$li)-1)
        # create labels
        labels <- lapply(1:length(obj$eig),function(i) bquote(lambda[.(i)]))
        # draw the plot
        xmax <- eval(as.list(obj$call)$dudi)$eig[1]*1.1
        oldpar <- par(las=1)
        var <- sum.obj[,2]
        moran <- sum.obj[,3]
        plot(x=var,y=moran,type='n',xlab='Inertia',ylab="Spatial autocorrelation (I)",
             xlim=c(0,xmax),ylim=c(Imin*1.1,Imax*1.1),yaxt='n')
        text(x=var,y=moran,do.call(expression,labels))
        ytick <- c(I0,round(seq(Imin,Imax,le=5),1))
        ytlab <- as.character(round(seq(Imin,Imax,le=5),1))
        ytlab <- c(as.character(round(I0,1)),as.character(round(Imin,1)),
                   ytlab[2:4],as.character(round(Imax,1)))
        axis(side=2,at=ytick,labels=ytlab)
        rect(0,Imin,xmax,Imax,lty=2)
        segments(0,I0,xmax,I0,lty=2)
        abline(v=0)
        title("Spatial and inertia components of the eigenvalues")
        par(oldpar)
    }
    fgraph(maf.coa.ms)
    ## end eigenvalues details

    if(adegraphicsLoaded()) {
        g2 <- s1d.barchart(maf.coa$eig, p1d.hori = FALSE, plot = FALSE)
        g3 <- s1d.barchart(maf.coa.ms$eig, p1d.hori = FALSE, plot = FALSE)
        g4 <- s.corcircle(maf.coa.ms$as, plot = FALSE)
        G1 <- ADEgS(list(g2, g3, g4), layout = c(1, 3))
    } else {
        oldpar <- par(mfrow = c(1, 3))
        barplot(maf.coa$eig)
        barplot(maf.coa.ms$eig)
        s.corcircle(maf.coa.ms$as)
        par(oldpar)
    }

    if(adegraphicsLoaded()) {
        g5 <- s.value(maf.xy, -maf.coa$li[, 1], plot = FALSE)
        g6 <- s.value(maf.xy, -maf.coa$li[, 2], plot = FALSE)
        g7 <- s.value(maf.xy, maf.coa.ms$li[, 1], plot = FALSE)
        g8 <- s.value(maf.xy, maf.coa.ms$li[, 2], plot = FALSE)
        G2 <- ADEgS(list(g5, g6, g7, g8), layout = c(2, 2))
    } else {
        oldpar <- par(mfrow = c(2, 2))
        s.value(maf.xy, -maf.coa$li[, 1])
        s.value(maf.xy, -maf.coa$li[, 2])
        s.value(maf.xy, maf.coa.ms$li[, 1])
        s.value(maf.xy, maf.coa.ms$li[, 2])
        par(oldpar)
    }

    w1 <- -maf.coa$li[, 1:2]
    w1m <- apply(w1, 2, lag.listw, x = maf.listw)
    w1.ms <- maf.coa.ms$li[, 1:2]
    w1.msm <- apply(w1.ms, 2, lag.listw, x = maf.listw)

    if(adegraphicsLoaded()) {
        g9 <- s.match(w1, w1m, plab.cex = 0.75, plot = FALSE)
        g10 <- s.match(w1.ms, w1.msm, plab.cex = 0.75, plot = FALSE)
        G3 <- cbindADEg(g9, g10, plot = TRUE)
    } else {
        oldpar <- par(mfrow = c(1,2))
        s.match(w1, w1m, clab = 0.75)
        s.match(w1.ms, w1.msm, clab = 0.75)
        par(oldpar)
    }

    maf.pca <- dudi.pca(mafragh$env, scannf = FALSE)
    maf.pca.ms <- multispati(maf.pca, maf.listw, scannf=FALSE)
    plot(maf.pca.ms)
}
This function is now deprecated. Please try the new mem.select
function.
ortho.AIC(Y, X, ord.var = FALSE)
Y |
A matrix with response variables (univariate or multivariate response) |
X |
A set of orthonormal and centered vectors |
ord.var |
A logical value indicating if the order of variables and cumulative R2 must be returned |
This function computes the corrected AIC (AICc) for models with orthonormal and centered explanatory variables such as MEM spatial eigenfunctions. Variables are sorted by their contribution to R2.
It ensures that a model with k variables is the best one that can be obtained. By default, response variables are centered (model with intercept).
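The logic can be sketched in base R. With orthonormal, centred predictors the R2 contributions are additive, so sorting them gives the best model of each size directly. The sketch below assumes the usual small-sample correction AICc = AIC + 2K(K+1)/(n-K-1), with K the number of predictors plus one for the intercept, and an AIC computed as in `extractAIC`; the variable names are illustrative, and this is not claimed to reproduce the internals of `ortho.AIC`.

```r
set.seed(1)
n <- 50
y <- rnorm(n)

# Orthonormal, centred predictors: left singular vectors of a column-centred matrix
X <- svd(scale(matrix(rnorm(n * 5), n, 5), scale = FALSE))$u

yc <- y - mean(y)                      # model with intercept
tss <- sum(yc^2)                       # total sum of squares
r2 <- as.vector(crossprod(X, yc))^2 / tss   # additive R2 contribution of each vector

ord <- order(r2, decreasing = TRUE)    # enter variables by decreasing contribution
R2cum <- cumsum(r2[ord])               # cumulative R2 of the nested models
K <- seq_along(ord) + 1                # number of parameters, intercept included
rss <- tss * (1 - R2cum)

aic <- n * log(rss / n) + 2 * K
aicc <- aic + 2 * K * (K + 1) / (n - K - 1)
best <- which.min(aicc)                # size of the model with the lowest AICc
```

Because the contributions are additive, no stepwise search is needed: the first k sorted vectors always form the best k-variable model.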
A vector with corrected AIC if ord.var=FALSE
. A list if
ord.var=TRUE
with:
AICc |
Values of corrected AIC. |
AICc0 |
Values of corrected AIC for the null model (only intercept). |
ord |
Order in which variables enter the model |
R2 |
Cumulative R2 |
Stéphane Dray [email protected]
Godinez-Dominguez E. and Freire J. (2003) Information-theoretic approach for selection of spatial and temporal models of community organization. Marine Ecology - Progress Series. 253, 17–24
y <- matrix(rnorm(50),50,1)
x <- svd(scale(y %*% c(0.1,0.5,2,0,0.7)+matrix(rnorm(250),50,5)))$u
res <- ortho.AIC(y,x,ord.var=TRUE)
minAIC <- which.min(res$AICc)
nvar <- length(1:minAIC)+1 # number of orthogonal vectors + 1 for intercept
lm1 <- lm(y~x[,res$ord[1:minAIC]])
summary(lm1)$r.squared # R2
res$R2[minAIC] # the same
min(res$AICc) # corrected AIC
extractAIC(lm1) # classical AIC
min(res$AICc)-2*(nvar*(nvar+1))/(nrow(x)-nvar-1) # the same
lm2 <- lm(y~1)
res$AICc0 # corrected AIC for the null model
extractAIC(lm2) # classical AIC
res$AICc0-2*(1*(1+1))/(nrow(x)-1-1) # the same
This function computes orthogonal polynomials of geographical coordinates. Polynomial
functions are orthogonal and centred for the weights defined in wt
(i.e., orthogonal to wt
). It is the classical approach to perform
trend surface analysis.
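As an illustration of the idea, the base-R sketch below orthogonalizes the monomials of a second-degree trend surface with a QR decomposition. It assumes uniform weights only, uses hypothetical random coordinates, and is not claimed to match the scaling or column order returned by `orthobasis.poly`.

```r
set.seed(3)
# Hypothetical coordinates for 25 sites
coords <- cbind(x = runif(25), y = runif(25))
x <- coords[, "x"]
y <- coords[, "y"]

# Monomials of a degree-2 trend surface: x, y, x^2, xy, y^2
M <- cbind(x, y, x^2, x * y, y^2)

# Gram-Schmidt via QR: putting the constant first makes every
# remaining column orthogonal to it, i.e. centred
Q <- qr.Q(qr(cbind(1, M)))[, -1]
```

Each column of `Q` can then be used as an uncorrelated explanatory variable describing a broad-scale spatial trend.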
orthobasis.poly(coords, degree = 2, wt = rep(1/nrow(coords), nrow(coords)))
coords |
either a |
degree |
the degree of the polynomial |
wt |
a vector of weights. It is used to orthogonalize the polynomial functions |
an object of class orthobasisSp
, subclass orthobasis
Stéphane Dray [email protected]
Dray S., Pélissier R., Couteron P., Fortin M.J., Legendre P., Peres-Neto P.R., Bellier E., Bivand R., Blanchet F.G., De Caceres M., Dufour A.B., Heegaard E., Jombart T., Munoz F., Oksanen J., Thioulouse J., Wagner H.H. (2012). Community ecology in the age of multivariate multiscale spatial analysis. Ecological Monographs 82, 257–275.
if(require("ade4", quietly = TRUE)){
    data(mafragh, package = "ade4")
    pol2 <- orthobasis.poly(mafragh$Spatial)
    if(require("adegraphics", quietly = TRUE)){
        plot(pol2, mafragh$Spatial)
    }
}
Method plot.constr.hclust
displays the results of space-constrained or
time-constrained agglomerative cluster analyses obtained from multivariate
dissimilarity matrices.
## S3 method for class 'constr.hclust'
plot(x, k, xlim, ylim, xlab, ylab, bg, col, lty, lwd, col.links,
     links=FALSE, points=TRUE, pch=21L, hybrids=c("change","single","none"),
     lty.hyb=1L, lwd.hyb=1, col.hyb="black", plot=TRUE, axes=TRUE, cex=1,
     lwd.pt=1, invert.axes=FALSE, ...)
x |
A |
k |
The number of clusters to delineate. |
xlim |
Optional: limits, in abscissa, of the zone to be plotted. |
ylim |
Optional: limits, in ordinate, of the zone to be plotted. |
xlab |
Optional: labels for x axis annotation. |
ylab |
Optional: labels for y axis annotation. |
bg |
Optional: a colour or set of colours to be used for the point backgrounds (one for each of the k clusters), see Details. |
col |
Optional: a colour or set of colours to be used for the point
outlines (defaults to |
lty |
Optional: reference line type (see graphical parameters for details). |
lwd |
Optional: reference line width (see graphical parameters for details). |
col.links |
Optional: a colour or set of colours to be used for the links within a cluster (defaults to the point background colours when omitted). |
links |
Should segments be drawn to represent the edges (links) (default: FALSE). |
points |
Should observation points be drawn (default: TRUE). |
pch |
Point character to display observations (default: 21, a circle with a background colour). |
hybrids |
How should hybrid segments be drawn (default: "change"). |
lty.hyb |
Line type to use for hybrid segments (default: lty). |
lwd.hyb |
Width of hybrid segments with respect to lwd (default: 1). |
col.hyb |
Colour of hybrid segments, when applicable (default: "black"). |
plot |
Should a new plotting window be opened first (default: TRUE). |
axes |
Should the axes be displayed (default: TRUE). |
cex |
Text and symbol magnification (see graphical parameters) (default: 1). |
lwd.pt |
Line width around points with respect to lwd (default: 1). |
invert.axes |
Should axes be inverted on the plot (default: FALSE). |
... |
Other graphical parameters. |
The plotting method uses the coordinates provided by the user of
constr.hclust
to display the observations. It cuts the tree
(see cutree) into k
clusters and displays each cluster using
the indices returned by cutree
. The point background colours
can be provided using argument bg
. When they are omitted, default
colours are provided automatically by the function as either a set of
mutually contrasting colours (when k <= 10
) or rainbow colours (when
k > 10
). When links = TRUE
, each edge is displayed as a segment
with a colour corresponding to the identity of the clusters located at its
ends. A special treatment is applied to hybrid edges: those whose ends lie in
different clusters; it is controlled by argument hybrids
. When
argument hybrids="change"
(the default), hybrid links are represented
as segments whose colours change halfway. When hybrids="single"
,
hybrid edges are shown as single-colour lines, whose colour is given by
argument col.hyb
, whereas hybrids="none"
suppresses the drawing
of hybrid edges. Whenever hybrid edges are displayed, their width with
respect to the lwd value is controlled by argument lwd.hyb
.
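The notion of a hybrid edge can be illustrated with base R alone. The sketch below uses an unconstrained `hclust` as a stand-in for `constr.hclust`, together with a hypothetical chain of links between six observations; only `hclust`, `cutree` and `rainbow` are real functions here, and the other names are illustrative.

```r
# Six observations along a transect, forming two obvious groups
x <- c(1, 2, 3, 10, 11, 12)
hc <- hclust(dist(x), method = "ward.D2")

# Cut the tree into k clusters, as the plotting method does with cutree
k <- 2
grp <- cutree(hc, k = k)

# One background colour per cluster
bg <- rainbow(k)[grp]

# Hypothetical links: each observation is connected to the next one
edges <- cbind(from = 1:5, to = 2:6)

# An edge is hybrid when its two ends fall in different clusters
hybrid <- grp[edges[, "from"]] != grp[edges[, "to"]]
edges[hybrid, , drop = FALSE]
```

Here only the link between observations 3 and 4 is hybrid; it is the kind of edge whose drawing is governed by the `hybrids`, `col.hyb` and `lwd.hyb` arguments.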
When argument plot=FALSE
, no plot
command is issued and the
points (and segments when links = TRUE
) are drawn over an existing
plotting window. This functionality is to allow one to plot the result of a
constrained clustering over an existing map. In that case, arguments
xlim
, ylim
, axes
, and all other
graphical parameters to which the method plot would respond
are ignored.
When disjoint clusters are present (i.e., when the graph provided to
constr.hclust
is not entirely connected), the function does not
allow one to plot fewer clusters than the number of disjoint subsets; a
warning message is issued to notify the user.
Guillaume Guénard [email protected] and Pierre Legendre [email protected]
## Artificial map data from Legendre & Legendre (2012, Fig. 13.26)
## n = 16
dat <- c(41,42,25,38,50,30,41,43,43,41,30,50,38,25,42,41)
coord.dat <- matrix(c(1,3,5,7,2,4,6,8,1,3,5,7,2,4,6,8,
                      4.4,4.4,4.4,4.4,3.3,3.3,3.3,3.3,
                      2.2,2.2,2.2,2.2,1.1,1.1,1.1,1.1),16,2)

## Obtaining a list of neighbours:
library(spdep)
listW <- nb2listw(tri2nb(coord.dat), style="B")
links.mat.dat <- listw2mat(listW)
neighbors <- listw2sn(listW)[,1:2]

## Calculating the (Euclidean) distance between points:
D.dat <- dist(dat)

## Display the points:
plot(coord.dat, type='n',asp=1)
title("Delaunay triangulation")
text(coord.dat, labels=as.character(as.matrix(dat)), pos=3)
for(i in 1:nrow(neighbors))
    lines(rbind(coord.dat[neighbors[i,1],],
                coord.dat[neighbors[i,2],]))

## Clustering with a contiguity constraint described by a list of
## links:
grpWD2cst_constr_hclust <-
    constr.hclust(
        D.dat, method="ward.D2",
        neighbors, coord.dat)

## Plot the results with k=5 clusters on a map:
plot(grpWD2cst_constr_hclust, k=5, links=TRUE, las=1,
     xlab="Eastings", ylab="Northings", cex=3, lwd=3)

## Repeat the plot with other values of k (number of groups)
This function plots or maps orthogonal bases.
## S3 method for class 'orthobasisSp'
plot(x, SpORcoords, pos = -1, plot = TRUE, match.ID = FALSE, ...)
x |
an object of class |
SpORcoords |
either a |
pos |
an integer indicating the position of the environment where the
data are stored, relative to the environment where the function is called.
Useful only if |
plot |
a logical indicating whether the graphic is displayed |
match.ID |
a logical indicating if names of geographic entities match
rownames of the |
... |
additional graphical parameters (see |
an object of class ADEgS
, generated by the s.Spatial
function of the adegraphics
package
Stéphane Dray [email protected]
if(require("ade4", quietly = TRUE) & require("spdep", quietly = TRUE)){
    data(mafragh)
    me <- mem(nb2listw(mafragh$nb))
    if(require("adegraphics", quietly = TRUE)){
        plot(me[,1:6], mafragh$xy)
        plot(me[,1:6], mafragh$Spatial)
    }
}
B-C plots are an important step in temporal beta diversity analysis. This function draws B-C plots from the output of function TBI. Different graphic options are available.
## S3 method for class 'TBI'
plot(x, type = "BC", s.names = NULL, pch.loss = 21, pch.gain = 22,
     cex.names = 1, col.rim = "black", col.bg = "gold1", cex.symb = 3,
     diam = TRUE, main = "B-C plot", cex.main = 1, cex.lab = 1,
     xlim = NULL, ylim = NULL, silent = TRUE, ...)
x |
Output of a temporal beta diversity analysis with function TBI. The matrix BCD.mat will be extracted from that object. This matrix contains the B/den statistics in column 1 and the C/den statistics in column 2, where "den" is the denominator used in the TBI analysis. |
type |
Specifies which outputs are plotted. At this time, only B-C plots are implemented. |
s.names |
a vector of names: Site names will be printed on the BC plot. Examples: s.names=1:25; s.names=paste("Site",1:25,sep="."); s.names=rownames(res1$BCD.mat). Else, s.names=NULL (default): no site names will be printed. |
pch.loss |
Symbol used for sites where losses > gains. Default: pch=21, circles. Symbols 21 to 25 have a black rim and can be filled with different colours (argument col.bg); see documentation of function points. Symbols 0 to 20 only have a rim. |
pch.gain |
Symbol used for sites where gains >= losses. Default: pch=22, squares. |
cex.names |
Multiplier for the font size of the site names. |
col.rim |
Colour of symbol rims in the plot. The following colours have been used
in BC plots: |
col.bg |
Colour filling symbols 21 to 25 in the plot. |
cex.symb |
Multiplier for size of the symbols representing the TBI values of the sites in the plot. With cex.symb=NULL, symbols have small and uniform sizes. |
diam |
If |
main |
Main title above the plot. Change the title and adapt it to your study. |
cex.main |
Multiplier for the font size of the main title. |
cex.lab |
Multiplier for the font size of the labels. |
xlim |
The x limits of the plot, e.g. c(0,1). |
ylim |
The y limits of the plot, e.g. c(0,1). |
silent |
If |
... |
Other arguments to be passed to the function |
B-C plots are an informative output of temporal beta diversity analysis. The species losses (B statistics) form the abscissa and the gains (C statistics) are on the ordinate of the plot. The objective is to illustrate whether the temporal changes at the various sites are dominated by gains or by losses. Distinctive symbols are used for the sites dominated by gains (default: squares) and by losses (default: circles). The symbols are drawn to sizes representing the values of the D = (B+C) statistics.
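A small worked example may help to fix the meaning of the B, C and D statistics. The sketch below assumes the percentage difference (Bray-Curtis) form of the decomposition, in which A is the abundance shared by the two surveys, B the summed losses, C the summed gains, and the denominator is 2A + B + C. The abundances are hypothetical, and the code is not claimed to reproduce the internals of TBI.

```r
# Hypothetical abundances of three species at one site, at times T1 and T2
t1 <- c(sp1 = 5, sp2 = 3, sp3 = 0)
t2 <- c(sp1 = 2, sp2 = 3, sp3 = 4)

A <- sum(pmin(t1, t2))        # abundance common to both surveys
B <- sum(pmax(t1 - t2, 0))    # losses between T1 and T2
C <- sum(pmax(t2 - t1, 0))    # gains between T1 and T2
den <- 2 * A + B + C          # denominator of the percentage difference

# Coordinates of the site in a B-C plot, and its dissimilarity D = (B + C) / den
bcd <- c(B = B / den, C = C / den, D = (B + C) / den)

# Which symbol would represent the site
dominated_by <- if (C >= B) "gains" else "losses"
```

For this site B/den = 3/17 and C/den = 4/17, so the point lies above the diagonal and would be drawn with the symbol for gains.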
A graph in the R graphic window, with the same scale along the 2 axes (asp=1).
Pierre Legendre [email protected]
Legendre, P. 2019. A temporal beta-diversity index to identify sites that have changed in exceptional ways in space-time surveys. Ecology and Evolution (in press).
van den Brink, P. J. & C. J. F. ter Braak. 1999. Principal response curves: analysis of time-dependent multivariate responses of biological community to stress. Environmental Toxicology and Chemistry 18: 138-148.
if(require("vegan", quietly = TRUE)) {

    ## Example 1 -

    ## Invertebrate communities subjected to insecticide treatment.

    ## As an example in their paper on Principal Response Curves (PRC method), van den
    ## Brink & ter Braak (1999) used observations on the abundances of 178 invertebrate
    ## species (macroinvertebrates and zooplankton) subjected to treatments in 12 mesocosms
    ## by the insecticide chlorpyrifos. The mesocosms were sampled on 11 occasions. The
    ## data, available in the {vegan} package, are log-transformed species abundances,
    ## ytransformed = loge(10*y+1).

    ## The data of survey #4 will be compared to those of survey #11 in this example.
    ## Survey #4 was carried out one week after the insecticide treatment, whereas the
    ## fauna of the mesocosms was considered by the authors to have fully recovered from
    ## the insecticide treatment at survey #11.

    data(pyrifos)

    ## The mesocosms had originally been attributed at random to the treatments. However,
    ## to facilitate presentation of the results, they will be listed here in order of
    ## increased insecticide doses: {0, 0, 0, 0, 0.1, 0.1, 0.9, 0.9, 6, 6, 44, 44}
    ## micro g/L.

    ## Select the 12 data rows of surveys 4 and 11 from the data file and reorder them
    ord4 <- c(38,39,41,47,37,44,40,46,43,48,42,45)
    ord11 <- c(122,123,125,131,121,128,124,130,127,132,126,129)

    ## Run the TBI function
    res1 <- TBI(pyrifos[ord4,], pyrifos[ord11,], method = "%diff", nperm = 0, test.t.perm = FALSE)
    res1$BCD.mat

    ## Draw BC plots
    oldpar <- par(mfrow=c(1,2))
    s.names <- paste("Surv",1:12,sep=".")

    ## In the 1st plot, the symbols have diameters proportional to the site TBI statistics
    plot(res1, s.names=s.names, col.bg="red", pch.loss=21, pch.gain=22,
         main="B-C plot, Pyrifos, surveys 4 & 11")

    ## In the 2nd plot, control the axes limit values by specifying xlim and ylim
    plot(res1, s.names=1:12, col.bg="green", pch.loss=23, pch.gain=24,
         main="B-C plot, Pyrifos, surveys 4 & 11", xlim=c(0,0.5), ylim=c(0.1,0.6))

    ## In the 3rd plot, draw all symbols small and of the same size, using cex.symb=NULL
    par(oldpar)
    plot(res1, s.names=1:12, col.bg="gold", pch.loss=23, pch.gain=24,
         main="B-C plot, Pyrifos, surveys 4 & 11", cex.symb=NULL)

    ## Example 2 -

    ## This example uses the mite data available in vegan. Let us pretend that sites 1-20
    ## represent a survey at time 1 (T1) and sites 21-40 a survey at time 2 (T2).

    data(mite)

    ## Run the TBI function
    res2 <- TBI(mite[1:20,],mite[21:40,],method="%diff",nperm=0,test.t.perm=FALSE)
    res2$BCD.mat

    ## Draw BC plots
    oldpar <- par(mfrow=c(1,2))
    s.names=rownames(res2$BCD.mat)

    ## In the 1st plot, the symbols have diameters proportional to the site TBI statistics
    plot(res2, s.names=s.names, col.bg="cadetblue2", pch.loss=21, pch.gain=22,
         main="B-C plot, Mite data")

    ## In the 2nd plot, control the axes limit values by specifying xlim and ylim
    plot(res2, s.names=1:20, col.rim="coral2", pch.loss=19, pch.gain=15,
         main="B-C plot, Mite data", xlim=c(0,0.6), ylim=c(0,0.6))
    par(oldpar)
}
Rotate a set of XY coordinates by an angle (in radians)
rotation(xy, angle)
xy |
A 2-column matrix or data frame containing a set of X and Y coordinates. |
angle |
Numeric. A scalar giving the angle by which the points should be rotated. The angle is in radians. |
A 2-column matrix of the same size as xy giving the rotated coordinates.
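As a rough illustration of what rotating XY coordinates by an angle in radians involves, the sketch below multiplies the coordinates by a standard 2 x 2 rotation matrix in base R. The helper name rotate_xy is hypothetical, and the counter-clockwise convention is an assumption: the package's rotation function may use a different sign convention or implementation.

```r
# Hypothetical helper (not part of the package): rotate XY coordinates
# counter-clockwise about the origin by `angle` radians.
rotate_xy <- function(xy, angle) {
  xy <- as.matrix(xy)
  stopifnot(ncol(xy) == 2, length(angle) == 1)
  # Standard 2 x 2 rotation matrix [[cos, -sin], [sin, cos]]
  R <- matrix(c(cos(angle), sin(angle),
                -sin(angle), cos(angle)), nrow = 2)
  # Each row (x, y) is mapped to (x cos a - y sin a, x sin a + y cos a)
  xy %*% t(R)
}

pts <- cbind(c(1, 0), c(0, 1))   # the points (1, 0) and (0, 1)
rot <- rotate_xy(pts, pi / 2)    # quarter turn
rot
```

A rotation of this kind leaves the distances between points unchanged, which is a quick way to check any implementation.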
F. Guillaume Blanchet
### Create a set of coordinates
coords <- cbind(runif(20), runif(20))

### Create a series of angles
rad <- seq(0, pi, length.out = 20)

for(i in rad){
  coords.rot <- rotation(coords, i)
  plot(coords.rot)
}

### Rotate the coordinates by an angle of 90 degrees
coords.90 <- rotation(coords, 90*pi/180)
coords.90

plot(coords, xlim = range(rbind(coords.90, coords)[,1]),
     ylim = range(rbind(coords.90, coords)[,2]), asp = 1)
points(coords.90, pch = 19)
The function decomposes the variance of a variable x on a basis of orthogonal vectors. The significance of the associated R-squared values is tested by a randomization procedure. A smoothed scalogram is obtained by summing the R-squared values into nblocks.
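The decomposition described above can be sketched in base R. This is an illustration under stated assumptions, not the package's code: the orthonormal basis is a random stand-in built by QR decomposition (real analyses would use MEMs), and the names basis, r2.smooth and p.val are illustrative.

```r
set.seed(1)
n <- 30

# Stand-in orthonormal basis (an assumption, not real MEMs): n - 1 centred,
# mutually orthogonal unit vectors obtained by QR decomposition.
basis <- qr.Q(qr(scale(matrix(rnorm(n * (n - 1)), n), scale = FALSE)))

x <- rnorm(n)

# Standardise x to zero mean and unit norm
z <- x - mean(x)
z <- z / sqrt(sum(z^2))

# R-squared of x on each basis vector; they sum to 1 over a complete basis
r2 <- as.vector(crossprod(basis, z))^2

# Smoothed scalogram: sum consecutive R-squared values into nblocks groups
nblocks <- 5
block <- cut(seq_along(r2), breaks = nblocks, labels = FALSE)
r2.smooth <- tapply(r2, block, sum)

# Randomization test: permute x, recompute the R-squared values, and count
# how often a permuted value reaches the observed one
nrepet <- 199
r2.perm <- replicate(nrepet, as.vector(crossprod(basis, sample(z)))^2)
p.val <- (rowSums(r2.perm >= r2) + 1) / (nrepet + 1)
```

Because the basis is complete and orthonormal, the R-squared values partition the total variance of x, so the smoothed blocks also sum to 1.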
scalogram(
  x,
  orthobasisSp,
  nblocks = ncol(orthobasisSp),
  nrepet = 999,
  p.adjust.method = "none"
)

## S3 method for class 'scalogram'
plot(x, pos = -1, plot = TRUE, ...)
x |
a numeric vector for univariate data or an object of class |
orthobasisSp |
an object of class |
nblocks |
an integer indicating the number of blocks in the smoothed scalogram |
nrepet |
an integer indicating the number of permutations used in the randomization procedure |
p.adjust.method |
a string indicating a method for multiple adjustment,
see |
pos |
an integer indicating the position of the environment where the
data are stored, relative to the environment where the function is called.
Useful only if |
plot |
a logical indicating whether the graphic is displayed |
... |
additional graphical parameters (see |
On the plot, observed R-squared values are represented by bars. A black line indicates the 0.95 quantile of the values obtained by permutations. Significant values are indicated by a '*'.
The function scalogram returns an object of class scalogram, subclass krandtest. The plot function returns an object of class ADEgS, generated by the functions of the adegraphics package.
Stéphane Dray [email protected]
Dray S., Pélissier R., Couteron P., Fortin M.J., Legendre P., Peres-Neto P.R., Bellier E., Bivand R., Blanchet F.G., De Caceres M., Dufour A.B., Heegaard E., Jombart T., Munoz F., Oksanen J., Thioulouse J., Wagner H.H. (2012). Community ecology in the age of multivariate multiscale spatial analysis. Ecological Monographs 82, 257–275.
if(require("ade4", quietly = TRUE) && require("spdep", quietly = TRUE)){
  data(mafragh)
  me <- mem(nb2listw(mafragh$nb))

  if(require("adegraphics", quietly = TRUE)){
    sc1 <- scalogram(mafragh$env$Conduc, me, nblocks = 10)
    plot(sc1)
  }
}
These functions compute MEM (i.e., eigenvectors of a doubly centered spatial weighting matrix). Corresponding eigenvalues are linearly related to Moran's index of spatial autocorrelation.
scores.listw(
  listw,
  wt = rep(1, length(listw$neighbours)),
  MEM.autocor = c("non-null", "all", "positive", "negative"),
  store.listw = FALSE
)

mem(
  listw,
  wt = rep(1, length(listw$neighbours)),
  MEM.autocor = c("non-null", "all", "positive", "negative"),
  store.listw = FALSE
)

orthobasis.listw(
  listw,
  wt = rep(1, length(listw$neighbours)),
  MEM.autocor = c("non-null", "all", "positive", "negative"),
  store.listw = FALSE
)

## S3 method for class 'orthobasisSp'
x[i, j, drop = TRUE]
listw |
An object of the class |
wt |
A vector of weights. It is used to orthogonalize the eigenvectors. It could be useful if MEM are used in weighted regression or canonical correspondence analysis |
MEM.autocor |
A string indicating if all MEMs must be returned or only
those corresponding to non-null, positive or negative autocorrelation. The
difference between options |
store.listw |
A logical indicating if the spatial weighting matrix
should be stored in the attribute |
x |
An object of class |
i, j |
Elements to extract (integer or empty): index of rows (i) and columns (j). |
drop |
A logical. If TRUE, an object containing only one column is converted to a vector. |
Testing the nullity of eigenvalues is based on E(i)/E(1), where E(i) is the i-th eigenvalue and E(1) is the maximum absolute value of the eigenvalues.
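The following base R sketch illustrates the idea under simple assumptions: a binary weighting matrix for 10 sites on a line, uniform site weights, and an arbitrary 1e-8 threshold for the E(i)/E(1) ratio. None of these choices is taken from the package, whose internal computations may differ. The sketch also checks the linear relation between eigenvalues and Moran's index.

```r
# Binary, symmetric spatial weighting matrix for 10 sites on a line
# (each site linked to its immediate neighbours); an illustrative choice.
n <- 10
W <- matrix(0, n, n)
W[abs(row(W) - col(W)) == 1] <- 1

# Double centring: H W H with H = I - 11'/n
H <- diag(n) - matrix(1 / n, n, n)
eig <- eigen(H %*% W %*% H, symmetric = TRUE)

# Keep eigenvalues judged non-null relative to the largest absolute value.
# The 1e-8 threshold is an assumption, not the package's value.
keep <- abs(eig$values / max(abs(eig$values))) > 1e-8
values <- eig$values[keep]
vectors <- eig$vectors[, keep, drop = FALSE]

# Moran's I of each retained eigenvector, computed from its definition
moran <- function(v, W) {
  v <- v - mean(v)
  (length(v) / sum(W)) * sum(W * outer(v, v)) / sum(v^2)
}
I.obs <- apply(vectors, 2, moran, W = W)

# Linear relation: I = (n / sum(W)) * eigenvalue
I.from.values <- (n / sum(W)) * values
```

Comparing I.obs with I.from.values shows that positive eigenvalues correspond to positive autocorrelation and negative eigenvalues to negative autocorrelation.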
An object of class orthobasisSp, subclass orthobasis. The MEMs are stored as a data.frame. It contains several attributes (see ?attributes) including:
values: The associated eigenvalues.
listw: The associated spatial weighting matrix (if store.listw = TRUE).
Stéphane Dray [email protected]
Dray, S., Legendre, P., and Peres-Neto, P. R. (2006). Spatial modeling: a comprehensive framework for principal coordinate analysis of neighbor matrices (PCNM). Ecological Modelling 196, 483–493.
Griffith D. A. (1996) Spatial autocorrelation and eigenfunctions of the geographic weights matrix accompanying geo-referenced data. Canadian Geographer 40, 351–367.
if(require("ade4", quietly = TRUE) && require("spdep", quietly = TRUE)){
  data(oribatid)
  nbtri <- tri2nb(as.matrix(oribatid$xy))
  sc.tri <- scores.listw(nb2listw(nbtri, style = "B"))
  summary(sc.tri)
}

if(require("adegraphics", quietly = TRUE)){
  s.value(oribatid$xy, sc.tri[,1:9])
  plot(sc.tri[,1:6], oribatid$xy, pSp.cex = 5, pSp.alpha = 0.5,
       pbackground.col = 'lightblue')
}
Single Malt Scotch whiskeys from 109 distilleries
data(ScotchWhiskey)
A list with 12 members:
A SpatialPointsDataFrame-class object containing the geographic coordinates and other information about the distilleries.
The whiskey colour coded as a 14-level factor.
A set of 12 nasal notes (boolean).
A set of 8 body notes (boolean).
A set of 15 palatine notes (boolean).
A set of 19 finish (or after-taste) notes (boolean).
Number of characteristics attributed to each distillery for each of the four sets of boolean features: nose, body, palate, finish.
A listw object (see nb2listw) containing information about the spatial edges (neighbour links) between the distilleries.
A binary square matrix of the spatial connexions between the distilleries (contiguity matrix).
A SpatialLinesDataFrame-class object containing geographic information about the spatial links between the distilleries.
A list of distance matrices obtained for each of the four sets of boolean features.
There are 5 data sets: color, nose, body, palate, and finish. The binary (0,1) descriptors are in the same order as on p. 239 of the whisky paper.
There are two whiskies in the classification from the Springbank distillery. One pertains to the Islay group, the other to the Western group.
Please let us know of the analyses you have performed with the whiskey data, especially if you intend to publish them.
The distance matrices were calculated separately as follows for each tasting data set:
D = (1 - S4)^0.5,
where S4 is the simple matching coefficient of Sokal & Michener (1958). This coefficient was called S4 in the Gower & Legendre (1986) paper and S1 in the Legendre & Legendre (2012) book. In package ade4, coefficient D = sqrt(1 - S4) is computed by function dist.binary using argument "method=2".
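For readers who want to see the formula at work, here is a minimal base R sketch of D = sqrt(1 - S4) on a small made-up binary table. The helper name simple_matching_dist and the toy data are hypothetical; this is not the ade4 implementation.

```r
# Hypothetical helper: D = sqrt(1 - S4) between the rows of a binary table,
# where S4 = (number of matching descriptors) / (number of descriptors).
simple_matching_dist <- function(X) {
  X <- 1 * as.matrix(X)              # also converts TRUE/FALSE to 1/0
  stopifnot(all(X %in% c(0, 1)))
  p <- ncol(X)
  # For 0/1 data the squared Euclidean distance counts the mismatches,
  # so 1 - S4 = mismatches / p
  sqrt(dist(X)^2 / p)
}

# Toy data: three objects described by four binary notes
notes <- rbind(A = c(1, 0, 1, 0),
               B = c(1, 1, 0, 0),
               C = c(1, 0, 1, 0))
D <- as.matrix(simple_matching_dist(notes))
D
```

Objects A and B differ on two of four descriptors, so their distance is sqrt(2/4); A and C are identical, so their distance is 0.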
Pierre Legendre <[email protected]> and François-Joseph Lapointe <[email protected]>, Département de sciences biologiques, Université de Montréal, Montréal, Québec, Canada.
Lapointe, F.-J. and P. Legendre. 1994. A classification of pure malt Scotch whiskies. Applied Statistics 43: 237-257 doi:10.2307/2986124
Gower, J.C. and Legendre, P. 1986. Metric and Euclidean properties of dissimilarity coefficients. Journal of Classification, 3, 5-48. doi:10.1007/BF01896809
Legendre, P. and Legendre, L. 2012. Numerical Ecology. 3rd English edition. Elsevier Science BV, Amsterdam. doi:10.1016/S0304-3800(00)00291-X
data(ScotchWhiskey)
lapply(ScotchWhiskey,ncol)
ScotchWhiskey$nbChar
ScotchWhiskey$listW  ## attr(ScotchWhiskey$listW,"class")
names(ScotchWhiskey)
names(ScotchWhiskey$dist)

plotWhiskey <- function(main) {
  plot(x=ScotchWhiskey$geo@coords[,1L]/1000,
       xlab="Eastings (km)",
       y=ScotchWhiskey$geo@coords[,2L]/1000,
       ylab="Northings (km)",
       main=main,
       type="n",asp=1)
  apply(
    ScotchWhiskey$neighbor@data,1L,
    function(X,coords) {
      segments(
        coords[X[1L],1L]/1000,
        coords[X[1L],2L]/1000,
        coords[X[2L],1L]/1000,
        coords[X[2L],2L]/1000
      )
    },
    coords=ScotchWhiskey$geo@coords
  )
  invisible(NULL)
}

plotWhiskey("Scotch whiskey: peat nose")
cols <- c("blue","orange")
points(ScotchWhiskey$geo@coords/1000,pch=21L,
       bg=cols[ScotchWhiskey$nose[,"peat"]+1L])
legend(x=50,y=1000,legend=c("Has a peat nose","Has no peat nose"),
       pch=21L,pt.bg=rev(cols))

plotWhiskey("Scotch whiskey: soft body")
cols <- c("red","green")
points(ScotchWhiskey$geo@coords/1000,pch=21L,
       bg=cols[ScotchWhiskey$body[,"soft"]+1L])
legend(x=50,y=1000,legend=c("Has a soft body","Has no soft body"),
       pch=21L,pt.bg=rev(cols))

plotWhiskey("Scotch whiskey: spicy palate")
cols <- c("red","green")
points(ScotchWhiskey$geo@coords/1000,pch=21L,
       bg=cols[ScotchWhiskey$palate[,"spice"]+1L])
legend(x=50,y=1000,legend=c("Has a spicy palate","Has no spicy palate"),
       pch=21L,pt.bg=rev(cols))

plotWhiskey("Scotch whiskey: sweet finish")
cols <- c("red","green")
points(ScotchWhiskey$geo@coords/1000,pch=21L,
       bg=cols[ScotchWhiskey$finish[,"sweet"]+1L])
legend(x=50,y=1000,legend=c("Has a sweet finish","Has no sweet finish"),
       pch=21L,pt.bg=rev(cols))

## To visualize (part of) the distance matrices:
as.matrix(ScotchWhiskey$dist$nose)[1:5,1:5]
as.matrix(ScotchWhiskey$dist$body)[1:5,1:5]
as.matrix(ScotchWhiskey$dist$palate)[1:5,1:5]
as.matrix(ScotchWhiskey$dist$finish)[1:5,1:5]

## The data tables:
ScotchWhiskey$colour
head(ScotchWhiskey$nose)
head(ScotchWhiskey$body)
head(ScotchWhiskey$palate)
head(ScotchWhiskey$finish)
Function stimodels performs two-way ANOVA to test space-time interaction (STI) without replicates using one among a set of possible models described in Legendre et al. (2010). Function quicksti allows performing space-time ANOVA in a simplified way. In many models, degrees of freedom are saved by coding space and/or time parsimoniously using distance-based Moran Eigenvector Maps (dbMEM; Borcard & Legendre 2002; Dray et al. 2006).
stimodels(
  Y,
  S,
  Ti,
  model = "5",
  nperm = 999,
  nS = -1,
  nT = -1,
  Sfixed = TRUE,
  Tfixed = TRUE,
  COD.S = NULL,
  COD.T = NULL,
  save.X = FALSE,
  print.res = TRUE
)

quicksti(
  Y,
  S,
  Ti,
  nperm = 999,
  alpha = 0.05,
  COD.S = NULL,
  COD.T = NULL,
  save.X = FALSE,
  print.res = TRUE
)
Y |
Site-by-species response data table. Assumes row blocks corresponding to times, i.e. within each block all sites are provided, always in the same order. |
S |
Number of spatial points (when they are aligned on a transect or a time series and equispaced) or a matrix of spatial coordinates (when the sites are on a two-dimensional surface or on a line but very irregularly spaced). |
Ti |
Number of time campaigns (when equispaced) or a matrix (a vector) of temporal coordinates (when the time campaigns are very irregularly spaced). |
model |
Linear space-time model to be used (can be either "2", "3a", "3b", "4", "5", "6a", "6b", or "7"). |
nperm |
Number of permutations in the significance tests. |
nS |
Number of space dbMEMs to use (by default, -1, all dbMEMs with positive autocorrelation are used). |
nT |
Number of time dbMEMs to use (by default, -1, all dbMEMs with positive autocorrelation are used). |
Sfixed |
Logical: is factor Space fixed, or not (if FALSE, it is considered a random factor). |
Tfixed |
Logical: is factor Time fixed, or not (if FALSE, it is considered a random factor). |
COD.S |
Spatial coding functions to be used instead of dbMEM. The
number of columns must be lower than |
COD.T |
Temporal coding functions to be used instead of dbMEM. The
number of columns must be lower than |
save.X |
If TRUE, the explanatory block-diagonal matrix constructed for model 6a or 6b is saved in the output list with the label X. |
print.res |
If TRUE, displays the results and additional information onscreen (recommended). If FALSE, only R2, F and P are printed onscreen. |
alpha |
In |
The 'stimodels' and 'quicksti' functions should only be used (1) when each site has been sampled during each survey, with no missing data, and (2) when there are no replicate observations of the space-time points. When there is replication, use a regular permutational Manova function such as adonis2.
When the sites form a one-dimensional spatial transect, or a meandering line such as the course of a river, with regular sampling intervals, and the time series has fairly equal time intervals, one can use the S and Ti arguments to indicate the number of sites in space and the number of surveys along time. The order of the sites in each temporal block of the input data file will be taken to correspond to the spatial positions of the sites along the transect (from 1 to S), and the order of the time blocks in the data file will be taken to indicate the temporal order of the surveys (from 1 to Ti). The function will then compute dbMEM eigenfunctions corresponding to the spatial and temporal positions of the data rows in the input data file as if they were on straight lines.
When the sites do not form a regular, one-dimensional spatial transect, one must provide a file of spatial coordinates of the sites to argument S. Similarly, when the time series has unequal time intervals, one must provide a file of temporal coordinates of the surveys to argument Ti.
Alternatively, one can use arguments COD.S or COD.T to provide a matrix of Helmert contrasts to be used in the analysis in place of dbMEM eigenfunctions. One can do that, for example, when there are only a few surveys along time and it would not be useful to represent these few surveys by dbMEM eigenfunctions. That matrix can have the class "matrix" or "numeric"; they both work in functions stimodels and quicksti. Arguments COD.S and COD.T can also be used to provide matrices containing other types of eigenfunctions, for example AEM eigenfunctions, to be used instead of dbMEM matrices computed by stimodels or quicksti. However, do not code both the space and time factors as Helmert contrasts; that would not leave residual degrees of freedom for the test of the interaction.
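As a hedged sketch of how such a coding matrix might be built, the base R code below constructs Helmert contrasts for the time factor and repeats them so that the rows follow the data arrangement described for Y (sites nested within times). The values of s and tt are illustrative, and the resulting matrix is only a candidate for argument COD.T; it is not passed to stimodels here.

```r
s  <- 6   # number of sites (illustrative)
tt <- 4   # number of surveys (illustrative)

# Helmert contrasts for a factor with tt levels: tt rows, tt - 1 columns
helm <- contr.helmert(tt)

# Repeat each survey's row for the s sites of that survey, so that sites
# are nested within times, matching the row order required for Y
COD.T <- helm[rep(seq_len(tt), each = s), , drop = FALSE]

dim(COD.T)
```

The matrix has s * tt rows and tt - 1 columns, and each column sums to zero because the design is balanced.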
In stimodels, tests for space-time interaction and space or time main effects are conducted using one of the following models:
Model 2 - Space and Time are coded using Helmert contrasts for the main effects. The interaction cannot be tested.
Model 3a - Space is coded using dbMEM variables whereas Time is coded using Helmert contrasts.
Model 3b - Space is coded using Helmert contrasts whereas Time is coded using dbMEM variables.
Model 4 - Both Space and Time are coded using dbMEM variables; the interaction is coded as the Hadamard (or elementwise) product of the space-coding by the time-coding dbMEMs.
Model 5 - Space and Time are coded using Helmert contrasts for the main factor effects; the interaction is coded as the Hadamard product of the space-coding by the time-coding dbMEM variables.
Model 6a - Nested model. Testing for the existence of spatial structure (common or separate) using dbMEM (or other) variables coding for Space.
Model 6b - Nested model. Testing for the existence of temporal structure (common or separate) using dbMEM (or other) variables coding for Time.
Model 7 - Space and Time are coded using dbMEM variables for the main factor effects, but they are coded using Helmert contrasts for the interaction term (not recommended).
With Models 2, 6a and 6b, the interaction test is not available.
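The Hadamard-product coding of the interaction mentioned for Models 4 and 5 can be illustrated in base R. In this sketch the space and time codes are stand-ins (two Helmert contrast columns each, chosen only so the code runs without the package); in the models above they would be dbMEM eigenfunctions. The object names are illustrative, and the package may construct the interaction differently.

```r
s <- 5; tt <- 4   # illustrative numbers of sites and surveys

# Stand-in coding variables (Helmert contrasts used purely for illustration;
# in Model 4 these would be dbMEM eigenfunctions). Rows: sites nested in times.
space.code <- contr.helmert(s)[rep(seq_len(s), times = tt), 1:2]
time.code  <- contr.helmert(tt)[rep(seq_len(tt), each = s), 1:2]

# Interaction coding: elementwise (Hadamard) product of every space column
# with every time column
pairs <- expand.grid(sp = seq_len(ncol(space.code)),
                     ti = seq_len(ncol(time.code)))
inter.code <- sapply(seq_len(nrow(pairs)), function(k)
  space.code[, pairs$sp[k]] * time.code[, pairs$ti[k]])

dim(inter.code)
```

With p space codes and q time codes the interaction is represented by p * q columns, which is why parsimonious coding of space and time leaves residual degrees of freedom for the test.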
When using quicksti, space-time interaction is first tested using Model 5. Depending on the outcome of this test, the main factors are tested using different strategies. If the interaction is not significant, then the test of main factors is also done following Model 5. If the interaction is significant, then a nested model (6a) is used to know whether separate spatial structures exist and another (6b) to know whether separate temporal structures exist. In function quicksti, space and time are always considered fixed factors (F ratios are constructed using residual MS in the denominator).
For the interaction the permutations are unrestricted, whereas for the main factors the permutations are restricted within time blocks (for the test of factor Space) or space blocks (for the test of factor Time). By default, the function computes dbMEM for space and time coding, but other space and/or time descriptors can be provided by the user, through COD.S and COD.T.
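The difference between unrestricted and restricted permutations can be sketched with row indices alone. The code below is an illustration in base R, assuming the row order described for Y (sites nested within times); the helper permute_within is hypothetical and does not reproduce the package's permutation code.

```r
set.seed(42)
s <- 4; tt <- 3; n <- s * tt   # illustrative design

# Row order assumed for Y: sites nested within times
time.block <- rep(seq_len(tt), each = s)
site.block <- rep(seq_len(s), times = tt)

# Unrestricted permutation of all rows (as for the interaction test)
perm.free <- sample(n)

# Hypothetical helper: permute row indices separately inside each block
permute_within <- function(block) {
  idx <- seq_along(block)
  for (b in unique(block)) {
    rows <- which(block == b)
    idx[rows] <- rows[sample.int(length(rows))]
  }
  idx
}

perm.space <- permute_within(time.block)  # test of Space: within time blocks
perm.time  <- permute_within(site.block)  # test of Time: within space blocks
```

In perm.space every row stays in its own survey, and in perm.time every row stays at its own site, whereas perm.free can move a row anywhere.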
A list containing the following results:
testS
An object with the result of the space effect test, including the mean squares for the F numerator (MS.num), the mean squares for the F denominator (MS.den), the proportion of explained variance (R2), the adjusted proportion of explained variance (R2.adj), the F statistic (F) and its p-value computed from a permutation test (Prob).
testT
An object with the result of the time effect test, like testS.
teststi
An object with the result of the space-time interaction test, like testS.
X.matrix
The block-diagonal explanatory matrix used in the test of model 6a or 6b.
Pierre Legendre [email protected], Miquel De Caceres and Daniel Borcard
Borcard, D. & P. Legendre. 2002. All-scale spatial analysis of ecological data by means of principal coordinates of neighbour matrices. Ecological Modelling 153: 51-68. https://doi.org/10.1016/S0304-3800(01)00501-4.
Dray, S., P. Legendre & P. R. Peres-Neto. 2006. Spatial modelling: a comprehensive framework for principal coordinate analysis of neighbour matrices (PCNM). Ecological Modelling 196: 483-493. https://doi.org/10.1016/j.ecolmodel.2006.02.015.
Legendre, P. & D. Borcard. 2018. Box-Cox-chord transformations for community composition data prior to beta diversity analysis. Ecography 41: 1820–1824. https://doi.org/10.1111/ecog.03498.
Legendre, P., M. De Caceres & D. Borcard. 2010. Community surveys through space and time: testing the space-time interaction in the absence of replication. Ecology 91: 262-272. https://doi.org/10.1890/09-0199.1.
# The "trichoptera" data set contains the abundances of 56 Trichoptera species captured
# during 100 consecutive days in 22 emergence traps along a stream. The 22 traps
# (sites) form a regular transect, with geographic positions 1 to 22. The original
# daily data collected at each site were pooled into 10 survey periods for the study
# of Legendre et al. (2010) in order to reduce the very high proportion of zeros in the
# original data matrix. Order of the observations in the data set: the 22 traps (sites)
# are nested within the survey weeks, as required by the 'stimodels' and 'quicksti'
# functions.

data(trichoptera)

# log-transform the species data, excluding the Site and Date columns
trich.log <- log1p(trichoptera[,-c(1,2)])

# A log-chord transformation (Legendre & Borcard 2018) would also be appropriate for
# these data: trich.tr <- decostand(log1p(trichoptera[,-c(1,2)]), method="norm")

# Example 1. Compute the space-time interaction test using model 5. By specifying the
# number of sites (traps), the software assumes that they form a regular transect with
# equispaced sites. Important note to users: in real analyses, use more than 99
# permutations.
out.1 <- stimodels(trich.log, S=22, Ti=10, model="5", nperm=99)

# The interaction is significant. Because of that, test results for the main effects,
# space and time, obtained with model 5, cannot be interpreted. Tests of spatial
# variation can be done for individual times using simple RDA against dbMEM.S
# variables. Likewise, tests of temporal variation can be done for individual sites
# using simple RDA against dbMEM.T variables. A global test of the hypothesis that none
# of the times shows a significant spatial structure can be done with model 6a. For a
# global test of temporal structure at the various sites, use model 6b.

# Code not run during CRAN software tests

# Example 2. Run space-time analysis with global tests for main effects after testing
# the interaction, which is significant in this example
out.2 <- quicksti(trich.log, S=22, Ti=10, nperm=999)

# Since the interaction is significant, function 'quicksti' will carry out the
# tests of existence of a spatial (at least in one of the time periods) and temporal
# (at least at one of the sites) structures using models 6a and 6b, respectively.

# 3. Run space-time analysis for two time blocks only, i.e. time periods 6 and 7,
# then time periods 8 and 9.

# Example 3.1. Time periods 6 and 7. The interaction is not significant. In that case,
# function 'quicksti' carries out the tests of the main effects using model 5.
out.3 <- quicksti(trich.log[111:154,], S=22, Ti=2, nperm=999)

# Example 3.2. Time periods 8 and 9. The interaction is significant. In that case,
# 'quicksti' carries out the tests of the spatial effects using model 6a. Model 6b
# cannot proceed with the test of the temporal effect because Ti=2. An explanation is
# printed in the output list.
out.4 <- quicksti(trich.log[155:198,], S=22, Ti=2, nperm=999)

# 4. Illustrations of the use of 'COD.S' and 'COD.T' in STI analysis

# The following examples illustrate how to use other representations of the spatial or
# temporal relationships among observations, through arguments 'COD.S' and
# 'COD.T' of functions 'stimodels' and 'quicksti'. The trichoptera data
# are used again.

# Example 4.1. Explicitly compute dbMEMs for the spatial structure along the regular
# transect, using function 'dbmem' of adespatial, and provide it to 'stimodels'
# or 'quicksti' as argument 'COD.S'. The dbMEMs must first be computed on the
# transect, then repeated (Ti-1) times to provide Ti repeats in total.
dbMEM.S1 <- as.matrix(dbmem(1:22))
dbMEM.S10 <- dbMEM.S1
for(j in 2:10) dbMEM.S10 <- rbind(dbMEM.S10, dbMEM.S1)
out.5 <- stimodels(trich.log, S=22, Ti=10, model="5", COD.S=dbMEM.S10, nperm=999)

# Results should be identical to those in output file out.1 of Example 1, except for
# P-values which can vary slightly.

# Example 4.2. Assume now that the sampling sites have irregular positions, as
# described by the following matrix of geographic coordinates 'xy.trich'. Provide
# this matrix to argument S of 'stimodels'
xy.trich <- matrix(c(1:5,11:15,21:25,31:35,41,42,rep(c(1,2),11)),22,2)
plot(xy.trich, asp=1)   # Plot a quick map of the site positions
out.6 <- stimodels(trich.log, S=xy.trich, Ti=10, model="5", nperm=999)

# Example 4.3. Compute a matrix of dbMEMs for the sites. The coding matrix provided to
# argument 'COD.S' must contain repeated dbMEM.S codes because that matrix must have
# the same number of rows as matrix Y. Construct coding matrix dbMEM.S10 containing the
# dbMEM.S codes repeated 10 times.
dbMEM.S1 <- as.matrix(dbmem(xy.trich))
dbMEM.S10 <- dbMEM.S1
for(i in 1:9) dbMEM.S10 <- rbind(dbMEM.S10, dbMEM.S1)
out.7 <- stimodels(trich.log, S=22, Ti=10, model="5", COD.S=dbMEM.S10, nperm=999)

# Compare the results with those obtained in the output file out.6, example 4.2.

# Careful: If an analysis requires a dbMEM coding matrix for 'COD.T', the dbMEM.T
# codes must follow the required data arrangement: sites must be nested within times.
# The following function can be used to construct a dbMEM.T matrix.

MEM.T <- function(s, tt, coord.T=NULL)
  # Documentation of function MEM.T:
  # Generate a matrix of temporal eigenfunctions for input into stimodels,
  # with sites nested within times.
  # Arguments:
  # s : number of space points (sites)
  # tt : number of time points
  # coord.T : optional matrix or vector giving the time point coordinates
{
  n <- s*tt
  if(is.null(coord.T)) coord.T <- as.matrix(1:tt)
  MEM.TT <- as.matrix(dbmem(coord.T))
  dbMEM.T <- matrix(0,n,ncol(MEM.TT))   # Empty matrix to house dbMEM.T
  beg.x <- seq(1, n, by=s)
  for(i in 1:tt) {   # Fill tt blocks of rows with identical MEM.TT values
    for(j in 1:s) dbMEM.T[(beg.x[i]+j-1),] <- MEM.TT[i,]
  }
  dbMEM.T
}

# Example of use of function MEM.T
dbMEM.T <- MEM.T(s=6, tt=5)
# Check the size of the dbMEM.T output matrix
dim(dbMEM.T)

# End of code not run during CRAN software tests
# The "trichoptera" data set contains the abundances of 56 Trichoptera species captured # during 100 consecutive days in 22 emergence traps along a stream. The 22 traps # (sites) form a regular transect, with geographic positions 1 to 22. The original # daily data collected at each site were pooled into 10 survey periods for the study # of Legendre et al. (2010) in order to reduce the very high proportion of zeros in the # original data matrix. Order of the observations in the data set: the 22 traps (sites) # are nested within the survey weeks, as required by the 'stimodels' and 'quicksti' # functions. data(trichoptera) # log-transform the species data, excluding the Site and Date colums trich.log <- log1p(trichoptera[,-c(1,2)]) # A log-chord transformation (Legendre & Borcard 2018) would also be appropriate for # these data: trich.tr <- decostand(log1p(trichoptera[,-c(1,2)]), method="norm") # Example 1. Compute the space-time interaction test using model 5. By specifying the # number of sites (traps), the sofware assumes that they form a regular transect with # equispaced sites. Important note to users – In real analyses, use more than 99 # permutations. out.1 <- stimodels(trich.log, S=22, Ti=10, model="5", nperm=99) # The interaction is significant. Because of that, test results for the main effects, # space and time, obtained with model 5, cannot be interpreted. Tests of spatial # variation can be done for individual times using simple RDA against dbMEM.S # variables. Likewise, tests of temporal variation can be done for individual sites # using simple RDA against dbMEM.T variables. A global test of the hypothesis that none # of the times shows a significant spatial structure can be done with model 6a. For a # global test of temporal structure at the various sites, use model 6b. # Code not run during CRAN software tests # Example 2. 
Run space-time analysis with global tests for main effects after testing
# the interaction, which is significant in this example
out.2 <- quicksti(trich.log, S=22, Ti=10, nperm=999)

# Since the interaction is significant, function 'quicksti' will carry out the
# tests of existence of a spatial (at least in one of the time periods) and temporal
# (at least at one of the sites) structures using models 6a and 6b, respectively.

# 3. Run space-time analysis for two time blocks only, i.e. time periods 6 and 7,
# then time periods 8 and 9.

# Example 3.1. Time periods 6 and 7. The interaction is not significant. In that case,
# function 'quicksti' carries out the tests of the main effects using model 5.
out.3 <- quicksti(trich.log[111:154,], S=22, Ti=2, nperm=999)

# Example 3.2. Time periods 8 and 9. The interaction is significant. In that case,
# 'quicksti' carries out the tests of the spatial effects using model 6a. Model 6b
# cannot proceed with the test of the temporal effect because Ti=2. An explanation is
# printed in the output list.
out.4 <- quicksti(trich.log[155:198,], S=22, Ti=2, nperm=999)

# 4. Illustrations of the use of 'COD.S' and 'COD.T' in STI analysis

# The following examples illustrate how to use other representations of the spatial or
# temporal relationships among observations, through arguments 'COD.S' and
# 'COD.T' of functions 'stimodels' and 'quicksti'. The trichoptera data
# are used again.

# Example 4.1. Explicitly compute dbMEMs for the spatial structure along the regular
# transect, using function 'dbmem' of adespatial, and provide it to 'stimodels'
# or 'quicksti' as argument 'COD.S'. The dbMEMs must first be computed on the
# transect, then repeated (Ti-1) times to provide Ti repeats in total.
dbMEM.S1 <- as.matrix(dbmem(1:22))
dbMEM.S10 <- dbMEM.S1
for(j in 2:10) dbMEM.S10 <- rbind(dbMEM.S10, dbMEM.S1)
out.5 <- stimodels(trich.log, S=22, Ti=10, model="5", COD.S=dbMEM.S10, nperm=999)

# Results should be identical to those in output file out.1 of Example 1, except for
# P-values which can vary slightly.

# Example 4.2. Assume now that the sampling sites have irregular positions, as
# described by the following matrix of geographic coordinates 'xy.trich'. Provide
# this matrix to argument S of 'stimodels'
xy.trich = matrix(c(1:5,11:15,21:25,31:35,41,42,rep(c(1,2),11)),22,2)
plot(xy.trich, asp=1) # Plot a quick map of the site positions
out.6 <- stimodels(trich.log, S=xy.trich, Ti=10, model="5", nperm=999)

# Example 4.3. Compute a matrix of dbMEMs for the sites. The coding matrix provided to
# argument 'COD.S' must contain repeated dbMEM.S codes because that matrix must have
# the same number of rows as matrix Y. Construct coding matrix dbMEM.S10 containing the
# dbMEM.S codes repeated 10 times.
dbMEM.S1 <- as.matrix(dbmem(xy.trich))
dbMEM.S10 = dbMEM.S1
for(i in 1:9) dbMEM.S10 <- rbind(dbMEM.S10, dbMEM.S1)
out.7 <- stimodels(trich.log, S=22, Ti=10, model="5", COD.S=dbMEM.S10, nperm=999)

# Compare the results with those obtained in the output file out.6, example 4.2.

# Careful: If an analysis requires a dbMEM coding matrix for 'COD.T', the dbMEM.T
# codes must follow the required data arrangement: sites must be nested within times.
# The following function can be used to construct a dbMEM.T matrix.
MEM.T <- function(s, tt, coord.T=NULL)
# Documentation of function MEM.T –
# Generate a matrix of temporal eigenfunctions for input into stimodels,
# with sites nested within times.
# Arguments –
# s : number of space points (sites)
# tt : number of time points
# coord.T : optional matrix or vector giving the time point coordinates
{
  n <- s*tt
  if(is.null(coord.T)) coord.T <- as.matrix(1:tt)
  MEM.TT <- as.matrix(dbmem(coord.T))
  dbMEM.T <- matrix(0,n,ncol(MEM.TT)) # Empty matrix to house dbMEM.T
  beg.x <- seq(1, n, by=s)
  for(i in 1:tt) { # Fill tt blocks of rows with identical MEM.TT values
    for(j in 1:s) dbMEM.T[(beg.x[i]+j-1),] <- MEM.TT[i,]
  }
  dbMEM.T
}

# Example of use of function MEM.T
dbMEM.T <- MEM.T(s=6, tt=5)
# Check the size of the dbMEM.T output matrix
dim(dbMEM.T)

# End of code not run during CRAN software tests
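The row stacking done with the rbind() loops in Examples 4.1 and 4.3, and the row filling done inside MEM.T, can also be written with index replication. The following is a minimal sketch in base R, not code from the package: the small matrices codes.S and codes.T are hypothetical stand-ins for real dbmem() output, and the names COD.S and COD.T are used here only as ordinary variables.

```r
# Stand-in coding matrices (hypothetical values, not real dbMEM eigenfunctions)
s  <- 3   # number of sites
tt <- 2   # number of time points
codes.S <- matrix(1:6, nrow = s, ncol = 2)          # one row per site
codes.T <- matrix(c(10, 20), nrow = tt, ncol = 1)   # one row per time point

# Spatial codes: the whole block of s site rows is repeated tt times,
# which gives the same result as stacking the block with rbind() in a loop
COD.S <- codes.S[rep(seq_len(s), times = tt), , drop = FALSE]

# Temporal codes: each time row is repeated s times consecutively,
# so that sites are nested within times
COD.T <- codes.T[rep(seq_len(tt), each = s), , drop = FALSE]

dim(COD.S)
dim(COD.T)
```

Both matrices end up with s*tt rows, which is the requirement stated above for coding matrices passed alongside the response matrix.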
The function computes and tests Temporal Beta-diversity Indices (TBI) between multivariate observations (frequency or presence-absence data) forming pairs observed at time 1 (T1) and time 2 (T2). The data matrices may contain abundance or presence-absence data, or other types of frequency-like data (e.g. biomasses). TBI are dissimilarity indices that measure beta differentiation through time. The indices are computed between T1 and T2 for each site. The difference between species (or abundances-per-species) gains (C/den) and losses (B/den) can be printed out and tested for significance.
TBI( mat1, mat2, method = "%difference", pa.tr = FALSE, nperm = 99, BCD = TRUE, replace = FALSE, test.BC = TRUE, test.t.perm = FALSE, save.BC = FALSE, seed. = NULL, clock = FALSE )
mat1 , mat2
|
Two multivariate community composition or gene frequency data matrices (class data.frame or matrix) with the same number of rows and columns. The rows must correspond to the same objects (e.g. sites) and the columns to the same variables, e.g. species or alleles. – The input files are checked for having equal numbers of rows and columns, for rows that are empty in both mat1 and mat2, and for the presence of negative values, which cannot be frequencies. |
method |
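One of the following dissimilarity coefficients: "%difference", "ruzicka", "chord", "hellinger", "log.chord", "jaccard", "sorensen", "ochiai", "euclidean" (all described below).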
|
pa.tr |
If |
nperm |
Number of permutations for the tests of significance of the
temporal beta indices and the permutation test of the B-C difference. Use
|
BCD |
If If |
replace |
If |
test.BC |
If |
test.t.perm |
If |
save.BC |
If |
seed. |
Seed for random number generator. If |
clock |
If |
For each object, the function tests the hypothesis (H0) that a species assemblage is not exceptionally different between T1 and T2, compared to assemblages that could have been observed at this site at T1 and T2 under conditions corresponding to H0. If H0 is rejected, the object is recognized as exceptionally different from the other objects for its difference between T1 and T2.
To fix ideas, consider an example in palaeoecology. A researcher is studying ancient and modern diatom communities in sediment cores. If a site displays an exceptional difference between T1 and T2, the researcher is justified in examining the reason for that difference. It could, for example, be caused by a change in land use at that site, which has caused the difference to be larger than at the other sites, compared to the differences caused by climate change at all sites.
The temporal beta diversity indices available in this function belong to four groups, computed in different ways.
Method "%difference" computes the percentage difference index, erroneously called the Bray-Curtis index in some software packages; it is the quantitative form of the Sorensen index. Method "ruzicka" computes the Ruzicka dissimilarity; this is one of the quantitative coefficients corresponding to the Jaccard dissimilarity for binary data. When these indices are used to compute ordinations by principal coordinate analysis, it is recommended to take the square root of the dissimilarities before the ordination analysis because these indices do not have the property of being Euclidean. However, that precaution is not important here; the results of permutation tests will be the same for these dissimilarities, square-rooted or not. If pa.tr=TRUE, these two methods produce the Sorensen and the Jaccard coefficients, respectively.
Methods "chord" (chord distance), "hellinger" (Hellinger distance) and "log.chord" (log.chord distance) are obtained by transformation of the species data, as described by Legendre & Borcard (2018), followed by calculation of the Euclidean distance. For the Hellinger distance, the data are square-rooted, then subjected to the chord transformation and the Euclidean distance. For the log.chord distance, the data are transformed by y' = log(y+1) using function log1p() of R, then subjected to the chord transformation and the Euclidean distance. These three distances have the Euclidean property (Legendre & Legendre 2012, Legendre & De Caceres 2013). If pa.tr=TRUE, the Ochiai distance for binary data, sqrt(2)*sqrt(1-Ochiai similarity), is obtained from these three coefficients.
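The three transformation-based distances can be sketched in a few lines of base R. This is an illustration of the recipe given in the text, not the code used inside TBI; the helper names chord.tr and euc and the abundance vectors y1 and y2 are hypothetical.

```r
# Chord transformation: scale a vector to unit Euclidean length
chord.tr <- function(y) y / sqrt(sum(y^2))

# Euclidean distance between two vectors
euc <- function(a, b) sqrt(sum((a - b)^2))

# Hypothetical abundances of four species at one site, at T1 and T2
y1 <- c(10, 0, 3, 5)
y2 <- c(2, 4, 3, 0)

# Chord distance: chord transformation, then Euclidean distance
d.chord <- euc(chord.tr(y1), chord.tr(y2))

# Hellinger distance: square root, chord transformation, Euclidean distance
d.hellinger <- euc(chord.tr(sqrt(y1)), chord.tr(sqrt(y2)))

# log.chord distance: log1p, chord transformation, Euclidean distance
d.log.chord <- euc(chord.tr(log1p(y1)), chord.tr(log1p(y2)))

c(chord = d.chord, hellinger = d.hellinger, log.chord = d.log.chord)
```

Because both vectors are scaled to unit length before the Euclidean distance is taken, each of the three distances lies between 0 and sqrt(2).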
Methods "jaccard", "sorensen" and "ochiai" implement the Jaccard, Sorensen and Ochiai dissimilarities. For these coefficients, the data are first transformed to presence-absence (pa.tr receives the value TRUE), then the dissimilarities are computed using the corresponding quantitative coefficients (Ruzicka, percentage difference, and chord).
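The correspondence between the quantitative coefficients and their binary forms can be checked numerically. The sketch below uses hypothetical presence-absence vectors and assumes the usual definitions: A is the number of species present at both times, B the number present at T1 only, and C the number present at T2 only.

```r
# Hypothetical presence-absence data for six species at T1 and T2
y1 <- c(1, 1, 1, 0, 0, 1)
y2 <- c(1, 0, 1, 1, 0, 0)

A <- sum(pmin(y1, y2))       # species present at both times
B <- sum(pmax(y1 - y2, 0))   # species present at T1 only
C <- sum(pmax(y2 - y1, 0))   # species present at T2 only

# Quantitative coefficients applied to the 0/1 data
ruzicka  <- (B + C) / (A + B + C)
pct.diff <- (B + C) / (2*A + B + C)
chord    <- sqrt(sum((y1 / sqrt(sum(y1^2)) - y2 / sqrt(sum(y2^2)))^2))

# Binary coefficients written from their own definitions
jaccard    <- 1 - A / (A + B + C)
sorensen   <- 1 - 2*A / (2*A + B + C)
ochiai.sim <- A / sqrt((A + B) * (A + C))

all.equal(ruzicka, jaccard)
all.equal(pct.diff, sorensen)
all.equal(chord, sqrt(2) * sqrt(1 - ochiai.sim))
```

The last comparison reproduces the expression sqrt(2)*sqrt(1-Ochiai similarity) quoted above for the binary form of the chord family.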
The Euclidean distance is also available in this function. It is not recommended for community composition or allele frequency data. One can compute it for log-transformed abundance data that do not contain zeros, or very few zeros (short gradients).
The temporal beta indices are tested for significance using permutation tests. The hypotheses are the following:
H0: the site under study (e.g. a species assemblage) is not exceptionally different between T1 and T2, compared to assemblages that could have been observed at this site at T1 and T2 under conditions corresponding to H0. The differences between T1 and T2 all belong to the same statistical population of differences.
H1: the site under study is exceptionally different between times T1 and T2.
In the decomposition of the Ruzicka and percentage difference dissimilarities or their presence-absence forms (Jaccard, Sorensen), the components B and C are computed as follows:
bj is the part of the abundance of species j that is higher at time 1 than at time 2: bj = (y1j - y2j) if y1j > y2j ; bj = 0 otherwise. B is the sum of the bj values for all species in the group of species under study. It is the unscaled sum of species losses between time 1 and time 2. In the BCD output table BCD.mat, column 1 contains B/den where den is the denominator of the index, i.e. (2A+B+C) for the percentage difference index and (A+B+C) for the Ruzicka index.
cj is the part of the abundance of species j that is higher at time 2 than at time 1: cj = (y2j - y1j) if y2j > y1j ; cj = 0 otherwise. C is the sum of the cj values for all species in the group of species under study. It is the unscaled sum of species gains between time 1 and time 2. In the BCD output table BCD.mat, column 2 contains C/den where den is the denominator of the index, i.e. (2A+B+C) for the percentage difference index and (A+B+C) for the Ruzicka index.
The original values of B and C for each site, without denominator, are also available in the output table BC.
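The decomposition into losses and gains can be illustrated with a small base R sketch. It is not the code of function TBI: the matrices mat1 and mat2 are made-up data, and A is assumed here to be the abundance shared between T1 and T2 (the sum over species of the smaller of the two abundances), since the text only shows A inside the denominators.

```r
# Hypothetical abundances: 3 sites (rows) by 4 species (columns)
mat1 <- matrix(c(5, 0, 2, 1,
                 3, 3, 0, 0,
                 0, 1, 4, 2), nrow = 3, byrow = TRUE)   # time 1
mat2 <- matrix(c(2, 1, 2, 4,
                 3, 0, 0, 1,
                 1, 1, 6, 2), nrow = 3, byrow = TRUE)   # time 2

A <- rowSums(pmin(mat1, mat2))       # shared abundance (assumed definition of A)
B <- rowSums(pmax(mat1 - mat2, 0))   # unscaled losses: sum of the bj values
C <- rowSums(pmax(mat2 - mat1, 0))   # unscaled gains: sum of the cj values

den <- 2*A + B + C   # percentage difference; use A + B + C for Ruzicka

# Table laid out like the BCD.mat output described in this page
BCD <- data.frame("B/den" = B / den, "C/den" = C / den, "D" = (B + C) / den,
                  Change = ifelse(B > C, "-", ifelse(B < C, "+", "0")),
                  check.names = FALSE)
BCD
```

With the percentage difference denominator, column D equals the sum of absolute differences divided by the sum of all abundances at the two times, which is a convenient check on the decomposition.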
Warning - In real ecological studies, when the TBI test is applied to data where some sites are highly impoverished due to pollution or other extreme environmental situations, this situation may produce sites with very few species (i.e. very low richness) and no species in common for the T1-T2 comparisons due to sampling variation at these impoverished sites. The TBI dissimilarity will be high and the test may indicate a significant T1-T2 difference if most other sites have higher species richness. This would be a correct statistical outcome for the test. When users of the method identify sites showing significant TBI tests in data, they should check the species richness of these sites at T1 and T2. Interpretation of the test results should be done with caution when high and significant TBI indices are associated with very low richness and no species in common between T1 and T2.
Function TBI returns a list containing the following results:
TBI
The vector of Temporal Beta-diversity Indices
(TBI) between observations at times T1 and T2 for each object.
p.TBI
A corresponding vector of p-values. Significant p-values
(e.g. p.TBI <= 0.05) indicate exceptional objects for the difference of
their species composition.
p.adj
The p-values are corrected for multiple testing using function p.adjust of stats. The adjustment is done using method="holm", which is the default option of the p.adjust function.
BCD.mat
An output table with four columns: B/den, C/den, D=(B+C)/den, and Change. The value den is the denominator of the index, i.e. (2A+B+C) for the percentage difference index and (A+B+C) for the Ruzicka index. The decomposition is such that D = B/den + C/den. Columns B and C indicate which of the D values are associated with large B (losses) or large C values (gains), before proceeding to the analysis and interpretation of the D values, using environmental or spatial explanatory variables, through regression or classification tree analysis. When B > C, the site has lost species or abundances-per-species between time 1 and time 2; this is indicated by a "-" sign in column Change. On the contrary, if B < C, the site has gained species or abundances-per-species between time 1 and time 2; this is indicated by a "+" sign in that column. Sites with equal amounts of losses and gains are marked with a "0". The B/den and C/den values can be plotted in B-C plots, which are informative about the changes that occurred in the data set between the two surveys under study. If pa.tr is TRUE, the B and C components are the numbers of species losses and gains, and D is either the Sorensen or the Jaccard dissimilarity. If BCD=FALSE, that table is not produced. No table is (or can be) computed for indices other than the Ruzicka and percentage difference indices or their binary forms.
BCD.summary
An output table with six columns: mean(B/den); mean(C/den); mean(D); B/(B+C) (which is mean(B/den) divided by mean(D)); C/(B+C) (which is mean(C/den) divided by mean(D)); and Change. These values indicate, over all sites, which of the processes dominated (loss or gain of species or abundances-per-species) when site compositions changed between time 1 and time 2. Change has the same meaning as in table BCD.mat; the sign indicates the direction of the mean change over all sites.
t.test_B.C
The results of a paired t-test (parametric) of significance of the difference between columns C/den and B/den of the BCD.mat table. If test.t.perm=TRUE, the difference between species gains (C/den) and losses (B/den) is also tested in a permutational paired t-test and the permutational p-value is shown in the output table. This result provides an overall test of the direction of change over all sites. It helps confirm the asymmetry between species (or abundances-per-species) gains (C/den) and species (or abundances-per-species) losses (B/den). A star in column p<=0.05 indicates a significant result of the parametric test at the 0.05 level.
BC
An output table with two columns: B and C. In this table, the B and C statistics are not divided by a denominator, contrary to the values B/den and C/den found in the output table BCD.mat.
Pierre Legendre [email protected]
Legendre, P. 2019. A temporal beta-diversity index to identify sites that have changed in exceptional ways in space-time surveys. Ecology and Evolution (in press).
Legendre, P. & M. De Caceres. 2013. Beta diversity as the variance of community data: dissimilarity coefficients and partitioning. Ecology Letters 16: 951-963.
Legendre, P. & D. Borcard. 2018. Box-Cox-chord transformations for community composition data prior to beta diversity analysis. Ecography 41: 1820-1824.
Legendre, P. & L. Legendre. 2012. Numerical Ecology. 3rd English edition. Elsevier Science BV, Amsterdam.
van den Brink, P. J. & C. J. F. ter Braak. 1999. Principal response curves: analysis of time-dependent multivariate responses of biological community to stress. Environmental Toxicology and Chemistry 18: 138-148.
if(require("vegan", quietly = TRUE)) {

## Example 1 -

## Invertebrate communities subjected to insecticide treatment.

## As an example in their paper on Principal Response Curves (PRC method), van den
## Brink & ter Braak (1999) used observations on the abundances of 178 invertebrate
## species (macroinvertebrates and zooplankton) subjected to treatments in 12 mesocosms by
## the insecticide chlorpyrifos. The mesocosms were sampled on 11 occasions. The data,
## available in the {vegan} package, are log-transformed species abundances, y.transformed =
## log(10*y+1).

## The data of survey #4 will be compared to those of survey #11 in this example.
## Survey #4 was carried out one week after the insecticide treatment, whereas the fauna
## of the mesocosms was considered by the authors to have fully recovered from the
## insecticide treatment at survey #11.

data(pyrifos)

## The mesocosms had originally been attributed at random to the treatments. However,
## to facilitate presentation of the results, they will be listed here in order of
## increased insecticide doses: {0, 0, 0, 0, 0.1, 0.1, 0.9, 0.9, 6, 6, 44, 44} micro g/L.

## Select the 12 data rows of surveys 4 and 11 from the data file and reorder them

ord4 = c(38,39,41,47,37,44,40,46,43,48,42,45)
ord11 = c(122,123,125,131,121,128,124,130,127,132,126,129)

## Run the TBI function

res1 <- TBI(pyrifos[ord4,], pyrifos[ord11,], method = "%diff", nperm = 0, test.t.perm = FALSE)

res1$BCD.mat

## Example 2 -

## This example uses the mite data available in vegan. Let us pretend that sites 1-20
## represent T1 and sites 21-40 represent T2.

data(mite)

# Run the TBI function

res2 <- TBI(mite[1:20,], mite[21:40,], method = "%diff", nperm = 0, test.t.perm = FALSE)

summary(res2)

res2$BCD.mat

}
This function is now deprecated. Please use the new listw.candidates and listw.select functions instead.
test.W( Y, nb, xy, MEM.autocor = c("all", "positive", "negative"), f = NULL, ... )
Y |
A matrix with response variables (univariate or multivariate response) |
nb |
An object of the class |
xy |
Coordinates of the samples, this argument is optional and is
required only if the argument |
MEM.autocor |
A string indicating if all MEM must be returned or only those corresponding to positive or negative autocorrelation |
f |
A function of the distance that can be used as a weighting spatial function. This argument is optional |
... |
Other arguments for the function |
This function is a user-friendly way to compute and test eigenvectors for various definitions of spatial weighting matrices. It combines calls to the functions scores.listw and ortho.AIC. It tests various definitions of the spatial weighting matrix and returns the results of scores.listw for the best one.
This function can test a single binary spatial weighting matrix (if only Y and nb are provided). It can also test a weighting function based on distances (if f is provided), and a weighting function with different values of its parameters (if other arguments of f are provided).
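To make the idea of a distance-based weighting function concrete, the sketch below evaluates two such functions on a few distances in base R. The functions have the same form as f1 and f2 in the example for this function; the distance vector d and the names f.lin and f.pow are hypothetical, and no call to test.W is made here.

```r
# Hypothetical distances between neighbouring sites
d <- c(0.5, 1, 2, 4)
dmax <- max(d)

# Linear decrease of the weight with distance
f.lin <- function(x, dmax) 1 - x / dmax

# Concave-down family controlled by the exponent y
f.pow <- function(x, dmax, y) 1 - (x^y) / (dmax^y)

# One column of weights per candidate value of the exponent
w <- sapply(2:4, function(y) f.pow(d, dmax = dmax, y = y))
colnames(w) <- paste0("y=", 2:4)

cbind(linear = f.lin(d, dmax), w)
```

Each column corresponds to one candidate spatial weighting scheme; supplying several parameter values, as described above, amounts to comparing such columns and keeping the best one.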
A list with the following elements:
all |
A data.frame where each row corresponds to one spatial weighting matrix tested. It contains the values of the parameters tested, the corrected AIC, and the number of orthogonal vectors for the best model. |
best |
A list containing results of scores.listw and ortho.AIC of the best spatial weighting matrix according to corrected AIC. |
Stéphane Dray [email protected]
Dray S., Legendre P. and Peres-Neto P. R. (2006) Spatial modeling: a comprehensive framework for principal coordinate analysis of neighbor matrices (PCNM). Ecological Modelling, 196, 483–493
if(require(ade4) && require(spdep)){

data(oribatid)
# Hellinger transformation
fau <- sqrt(oribatid$fau / outer(apply(oribatid$fau, 1, sum), rep(1, ncol(oribatid$fau)), "*"))
# remove gradient effect
faudt <- resid(lm(as.matrix(fau) ~ as.matrix(oribatid$xy)))

# test a binary spatial weighting matrix
nbtri <- tri2nb(as.matrix(oribatid$xy))
tri.res <- test.W(faudt, nbtri)

maxi <- max(unlist(nbdists(nbtri, as.matrix(oribatid$xy))))

# test a simple spatial weighting function of the distance
f1 <- function(x) {1-(x)/(maxi)}
tri.f1 <- test.W(faudt, nbtri, f = f1, xy = as.matrix(oribatid$xy))

# test a spatial weighting function with various values of parameters
f2 <- function(x,dmax,y) {1-(x^y)/(dmax)^y}
tri.f2 <- test.W(faudt,nbtri, f = f2, y = 2:10, dmax = maxi, xy = as.matrix(oribatid$xy))
}
Fish community composition, ecological traits, and environmental characteristics observed along a 1-km coral reef transect.
data(Tiahura)
A list with five elements:
fish: A data frame with 22 rows (sites) and 280 columns (fish species) describing the presence (value: 1) or absence (value: 0) of the fish species at the sites.
species: A character vector with 280 elements containing the binomial (latin) names of the 280 fish species.
trait: A data frame with 280 rows (species) and five columns (traits) containing the species ecological traits (see details).
habitat: A data frame with 22 rows (sites) and 10 columns (characteristics) containing the environmental characteristics of the sites (see details).
reef: A data frame with 6 rows (sections) and 3 columns describing the different sections of the transect (see details).
The Tiahura fish transect data was described and analyzed by Galzin & Legendre (1987). It consists of presence/absence data for 280 fish species observed at 22 sites along a 1.02 km long coast-to-sea cross-reef transect located near the northwestern corner of the high volcanic island of Moorea (French Polynesia; WGS84: -17.4934, -149.8680). The survey sites were 50 m long. Species presence/absence data were recorded by a diver trained in underwater fish identification. The transect began on a coral sand beach, followed by a zone of detritic sediments, then a dying reef flat, followed by a zone of coral patches. That relatively flat area ended at a 100-m wide, 9-m deep channel, followed by a 490-m wide barrier reef, which ended in a slightly elevated reef ridge, followed by the outer slope into the Pacific Ocean. The survey was terminated at a depth of approximately 25 m; this is the maximum depth allowing scuba diving for any length of time without having to perform decompression stops. For further details about the transect and survey method, see Galzin & Legendre (1987).
Ecological traits contained in data frame trait are:
A seven-level categorical variable describing the feeding habits of the fish species
A seven-level categorical variable describing the general behavior of the fish species in their habitats
A six-level ordered variable describing the adult sizes of the fish species
A three-level categorical variable describing the types of eggs laid by the fish species
A three-level categorical variable describing the activity rhythm of the fish species
Environmental characteristics contained in data frame habitat are the distance from the shore (in m), the water depth at the site (in cm), and substrate composition. Substrate composition (environmental characteristics 3 - 10) is described by percentage coverage indices of the reef bottom by different materials, based on 50 observation points. These observation points were positioned at 1-m intervals along a 50-m rope. The variables indicate what proportion of the 50 readings pertained to each category of substrate. Several of these categories represent biological materials lying on top of, intermingled with, or attached to the mineral substrate. When the 22 stations are considered globally, these eight substrate categories
respectively represent 2.5
0.7
Data frame reef, which contains the boundaries of the different sections of the transect, comes in handy for displaying data graphically and interpreting results.
This data set was made available to the students in the European Advanced Course "Numerical Analysis in Marine Ecology" given at Observatoire océanologique, Université Paris VI, Villefranche-sur-Mer, on 3 - 20 July 1996.
Pierre Legendre <[email protected]>, René Galzin <[email protected]>, Mireille Harmelin-Vivien <[email protected]>, and Guillaume Guenard <[email protected]>
Galzin, R. & P. Legendre. 1987. The fish communities of a coral reef transect. Pacific Science 41: 158-165.
Legendre, P., R. Galzin & M. Harmelin-Vivien. 1997. Relating behavior to habitat: Solutions to the fourth-corner problem. Ecology 78: 547-562
data(Tiahura)

## Compute dissimilarity matrix from Jaccard's similarity coefficient:
tiah.jac <- dist.ldc(Tiahura$fish,method = "jaccard")

## Constrained clustering of the fish species:
tiah.chclust <- constr.hclust(tiah.jac, coords=Tiahura$habitat[,"distance"],
                              chron=TRUE)

## Plotting the results
par(mfrow=c(3,1))

## First graph: constrained clusters
par(mar=c(3,6.5,2,2))
dst <- Tiahura$habitat[,"distance"]
plot(NA, xlim=range(dst), ylim=c(0.5,5.5), yaxt="n",
     ylab="Partitions\n\n", xlab="")
parts <- c(2,3,5,7,12)
cols <- c("turquoise", "orange", "chartreuse", "aquamarine", "blue",
          "violet", "pink", "cyan", "green", "red", "cornsilk", "purple")
for(i in 1L:length(parts)) {
  tiah.chclust$coords[,"y"] <- i
  plot(tiah.chclust, parts[i], link=TRUE, lwd=3, hybrids="none",
       lwd.pt=0.5, cex=3, pch=21, plot=FALSE,
       col=cols[round(seq(1,length(cols), length.out=parts[i]))])
}
axis(2, at=1:length(parts), labels=paste(parts,"groups"), las=1)

## Second graph: transect profile
par(mar=c(4,6.5,1,2))
plot(x=dst, y=Tiahura$habitat[,"depth"],
     ylim=c(max(range(Tiahura$habitat[,"depth"])),-300),
     las=1, ylab="Depth\n(cm)\n", xlab="", type="l", lwd=2)
for(i in 1:nrow(Tiahura$reef)) {
  abline(v=Tiahura$reef[i,2], lty=3)
  abline(v=Tiahura$reef[i,3], lty=3)
  if((Tiahura$reef[i,3] - Tiahura$reef[i,2])<100) {
    text(x=(Tiahura$reef[i,2] + Tiahura$reef[i,3])/2, y=2350,
         labels=toupper(Tiahura$reef[i,1]),srt=90,adj=0)
  } else {
    text(x=(Tiahura$reef[i,2] + Tiahura$reef[i,3])/2, y=-150,
         labels=toupper(Tiahura$reef[i,1]))
  }
}

## Third graph: bottom composition
par(mar=c(5,6.5,0,2))
plot(NA,xlim=range(dst), ylim=c(0,1), las=1,
     ylab="Bottom composition\n(proportions)\n", xlab="Distance (m)")
bot <- cbind(0, Tiahura$habitat[,3:10])
for(i in 2:9) bot[,i] <- bot[,i] + bot[,i-1]
cols <- c("", "grey75", "brown", "grey25", "green", "purple",
          "lightgreen", "yellow", "white")
for(i in 2:9)
  polygon(x=c(dst, rev(dst)),y=c(bot[,i], rev(bot[,i-1]))/50,
          col=cols[i])
text(x=c(44, 365, 707, 538, 957, 111, 965),
     y=c(0.05, 0.47, 0.37, 0.58, 0.42, 0.80, 0.88),
     labels=colnames(bot)[2:8], xpd=TRUE)

## Species presence graph set:
plot_slice <- function(sl,split) {
  size <- ceiling(length(Tiahura$species)/split)
  sp_slice <- size*(sl - 1L) + (1L:size)
  image(z=t(as.matrix(Tiahura$fish[,sp_slice])),y=1:nrow(Tiahura$fish),
        x=1:length(sp_slice),zlim=c(0,1),col=c("white","black"),axes=FALSE,
        ylab="",xlab="")
  axis(1L,at=1:length(sp_slice),labels=Tiahura$species[sp_slice],las=2L)
  axis(2L,at=1:nrow(Tiahura$fish),label=rownames(Tiahura$fish),las=1L)
  invisible(NULL)
}
par(mar=c(15,5,2,2))
plot_slice(1L,5L)
## plot_slice(2L,5L)
## plot_slice(3L,5L)
## plot_slice(4L,5L)
## plot_slice(5L,5L)
This function computes paired t-tests for each species, for abundances observed at time 1 (T1) and time 2 (T2). The test is one-tailed in the direction of the sign (+ or -) of the t statistic.
tpaired.krandtest(mat1, mat2, nperm = 99, list.all = FALSE)
mat1 |
site-by-species data at time T1 (data.frame or matrix). |
mat2 |
site-by-species data at time T2 (data.frame or matrix). |
nperm |
Number of permutations. Use 999, 9999, or more, to allow for correction of p-values for multiple tests. |
list.all |
If |
The species that do not vary in either data set are discarded before calculation of the paired t-tests begins.
p-values should be corrected for multiple testing. Use function p.adjust of stats: p.adjust(res$t.test$p.param) or p.adjust(res$t.test$p.perm). Correction methods "holm" (default) and "hochberg" are fine for this type of analysis.
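The workflow of testing each species and then adjusting the p-values can be sketched with base R only. This is a simplified illustration, not the code of tpaired.krandtest: the data are simulated, the tests are the two-tailed parametric tests of t.test (no permutation test, no one-tailed option), and the filter used to discard species (constant T1-T2 differences) is an assumption made so that every retained species has a computable t statistic.

```r
set.seed(1)

# Hypothetical site-by-species data: 8 sites, 4 species, at T1 and T2
mat1 <- matrix(rpois(32, lambda = 5), nrow = 8,
               dimnames = list(NULL, paste0("sp", 1:4)))
mat2 <- mat1 + matrix(rpois(32, lambda = 2), nrow = 8)
mat2[, 4] <- mat1[, 4]   # species 4 does not change between T1 and T2

# Keep only the species whose T1-T2 differences are not constant
keep <- apply(mat1 - mat2, 2, function(d) var(d) > 0)

# Parametric paired t-test for each retained species
p.param <- sapply(which(keep), function(j)
  t.test(mat1[, j], mat2[, j], paired = TRUE)$p.value)

# Correct the p-values for multiple testing
p.holm     <- p.adjust(p.param, method = "holm")
p.hochberg <- p.adjust(p.param, method = "hochberg")

cbind(p.param, p.holm, p.hochberg)
```

The Hochberg adjustment never gives larger values than the Holm adjustment, so the two columns can be compared to see how much the choice of method matters for a given data set.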
A table with species in rows and 6 columns: "mean(T1-T2)","t.stat","p.param","p.perm","p<=0.05","Sign(T1-T2)". The parametric and permutational p-values are not corrected for multiple tests. A star is shown in column "p<=0.05" if the parametric p-value is <= 0.05.
A list of names of the species tested; their t statistics were not 0.
A list of names of the species not tested because their t-statistics were 0.
Pierre Legendre [email protected]
Legendre, P. 2019. A temporal beta-diversity index to identify sites that have changed in exceptional ways in space-time surveys. Ecology and Evolution (in press).
van den Brink, P. J. & C. J. F. ter Braak. 1999. Principal response curves: analysis of time-dependent multivariate responses of biological community to stress. Environmental Toxicology and Chemistry 18: 138-148.
tpaired.randtest
if(require("vegan", quietly = TRUE)) {

## Invertebrate communities subjected to insecticide treatment.

## As an example in their paper on Principal Response Curves (PRC), van den Brink & ter
## Braak (1999) used observations on the abundances of 178 invertebrate species
## (macroinvertebrates and zooplankton) subjected to treatments in 12 mesocosms by the
## insecticide chlorpyrifos. The mesocosms were sampled on 11 occasions. The data,
## available in the {vegan} package, are log-transformed species abundances,
## y.transformed = loge(10*y+1).

## The data of survey #4 will be compared to those of survey #11 in this example.
## Survey #4 was carried out one week after the insecticide treatment, whereas the
## fauna of the mesocosms was considered to have fully recovered from the treatment
## at the time of survey #11.

data(pyrifos)

## The mesocosms had originally been attributed at random to the treatments. However,
## to facilitate presentation of the results, they will be listed here in order of
## increased insecticide doses: {0, 0, 0, 0, 0.1, 0.1, 0.9, 0.9, 6, 6, 44, 44}
## micro g/L.

survey4.order = c(38,39,41,47,37,44,40,46,43,48,42,45)

survey11.order = c(122,123,125,131,121,128,124,130,127,132,126,129)

## Paired t-tests of differences between survey.4 and survey.11 for the p species

res <- tpaired.krandtest(pyrifos[survey4.order,],pyrifos[survey11.order,])

}
This function computes a permutation test of comparison of the means of two paired vectors (related samples). For each object, permutations are restricted to the two related observations.
tpaired.randtest( vec1, vec2, nperm = 99, alternative = "two.sided", silent = FALSE )
vec1, vec2
|
The two data vectors to be compared. |
nperm |
Number of permutations. Use 999, 9999, or more. |
alternative |
c("two.sided", "less", "greater"). Default value: "two.sided". |
silent |
If |
A list containing the following results:
- estim: mean of the differences
- t.ref: reference value of the t-statistic
- p.param: parametric p-value
- p.perm: permutational p-value
- nperm: number of permutations
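The restriction of permutations to the two related observations can be illustrated with a short sketch. It is not the package's own code: the helper name `paired_perm_sketch` is hypothetical, and it assumes that swapping the two observations of a pair is implemented by flipping the sign of their difference and that the observed statistic is counted among the permuted values. Only the two-sided case is shown.

```r
# Sketch of a paired permutation t-test (illustrative only; the function
# name and the p-value convention are assumptions, not adespatial code).
paired_perm_sketch <- function(vec1, vec2, nperm = 999) {
  d <- vec1 - vec2                      # within-pair differences
  n <- length(d)
  t_stat <- function(d) mean(d) / (sd(d) / sqrt(length(d)))
  t_ref <- t_stat(d)                    # reference t-statistic
  t_perm <- replicate(nperm, {
    # Swapping the two members of a pair flips the sign of its difference
    signs <- sample(c(-1, 1), n, replace = TRUE)
    t_stat(d * signs)
  })
  # Two-sided permutational p-value, observed value included in the count
  p_perm <- (sum(abs(t_perm) >= abs(t_ref)) + 1) / (nperm + 1)
  list(estim = mean(d), t.ref = t_ref, p.perm = p_perm, nperm = nperm)
}

# Deer leg length data from Zar (1999, p. 162)
deer <- matrix(c(142,140,144,144,142,146,149,150,142,148,
                 138,136,147,139,143,141,143,145,136,146), 10, 2)
set.seed(1)
out <- paired_perm_sketch(deer[, 1], deer[, 2])
out
```

Because each pair is handled separately, the permuted data sets keep the pairing structure intact, which is what distinguishes this test from an unrestricted permutation of all 2n values.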
- Pierre Legendre. Permutation code improved by Guillaume Blanchet.
Zar, J. H. 1999. Biostatistical analysis. 4th edition. Prentice Hall, New Jersey.
## Deer leg length, data from Zar (1999, p. 162).

deer <- matrix(c(142,140,144,144,142,146,149,150,142,148,138,136,147,139,143,141,143,
145,136,146),10,2)
rownames(deer) <- paste("Deer",1:10,sep=".")
colnames(deer) <- c('Hind.leg', 'Fore.leg')

res <- tpaired.randtest(deer[,1], deer[,2]) # Two-tailed test by default

## Compare the results to:
res2 = t.test(deer[,1], deer[,2], paired=TRUE)
A dataset containing the abundances of 56 Trichoptera species captured in 1984 during 100 consecutive days in 22 emergence traps (sites) along the outflow stream of Lake Cromwell (45°59'13" N, 73°59'47" W) on the territory of Station de Biologie des Laurentides (Université de Montréal).
data(trichoptera)
A data frame with (10 x 22) = 220 rows and 58 variables (Site, Date, and 56 species).
Row order in the data file: the 22 traps (sites) are nested within the 10 survey periods.
The original daily data collected at each site were pooled into 10 survey periods for the study of Legendre et al. (2010) in order to reduce the very high proportion of zeros in the original data matrix. The 22 traps form a regular transect, with geographic positions 1 to 22.
Data provided by Prof. Pierre-Paul Harper, Département de sciences biologiques, Université de Montréal.
Legendre, P., M. De Caceres & D. Borcard. 2010. Community surveys through space and time to assess environmental changes: testing space-time interaction in the absence of replication. Ecology 91: 262-272.
Computes a multivariate empirical variogram. It is strictly equivalent to summing the univariate variograms of the individual variables.
variogmultiv(Y, xy, dmin = 0, dmax = max(dist(xy)), nclass = 20)
Y |
A matrix with numeric data |
xy |
A matrix with coordinates of samples |
dmin |
The minimum distance value at which the variogram is computed (i.e. lower bound of the first class) |
dmax |
The maximum distance value at which the variogram is computed (i.e. upper bound of the last class) |
nclass |
Number of classes of distances |
A list:
d |
Distances (i.e. centers of distance classes). |
var |
Empirical semi-variances. |
n.w |
Number of connections between samples for a given distance. |
n.c |
Number of samples used for the computation of the variogram. |
dclass |
Character vector with the names of the distance classes. |
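The equivalence between the multivariate variogram and the sum of univariate variograms can be checked numerically. The sketch below does not call `variogmultiv`; the helper `semivar`, the simulated data, and the five equal-width distance classes are assumptions made for illustration, and the package may define class bounds or normalisation differently.

```r
# Sketch: a multivariate empirical semivariance (based on squared Euclidean
# distances among rows of Y) equals the sum of the univariate semivariances.
# Helper name, simulated data and distance classes are illustrative only.
set.seed(42)
n  <- 30
xy <- cbind(runif(n), runif(n))          # sample coordinates
Y  <- matrix(rnorm(n * 3), n, 3)         # three response variables

d_geo  <- as.vector(dist(xy))            # geographic distances between pairs
breaks <- seq(0, max(d_geo), length.out = 6)
cls    <- cut(d_geo, breaks, include.lowest = TRUE)

# Semivariance per distance class: sum of squared differences / (2 * pairs)
semivar <- function(Ymat) {
  sq <- as.vector(dist(Ymat))^2
  tapply(sq, cls, function(v) sum(v) / (2 * length(v)))
}

multi   <- semivar(Y)
uni_sum <- Reduce(`+`, lapply(seq_len(ncol(Y)),
                              function(j) semivar(Y[, j, drop = FALSE])))
all.equal(multi, uni_sum)
```

The identity holds because the squared Euclidean distance between two rows of `Y` is the sum, over variables, of the squared differences for each variable.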
Stéphane Dray
Wagner H. H. (2003) Spatial covariance in plant communities: integrating ordination, geostatistics, and variance testing. Ecology, 84, 1045–1057
if(require(ade4)){
  data(oribatid)
  # Hellinger transformation
  fau <- sqrt(oribatid$fau / outer(apply(oribatid$fau, 1, sum),
    rep(1, ncol(oribatid$fau)), "*"))
  # Removing linear effect
  faudt <- resid(lm(as.matrix(fau) ~ as.matrix(oribatid$xy)))
  mvspec <- variogmultiv(faudt, oribatid$xy, nclass = 20)
  mvspec
  plot(mvspec$d, mvspec$var,type = 'b', pch = 20, xlab = "Distance", ylab = "C(distance)")
}
Whittaker-Robinson periodogram for univariate series of quantitative data.
WRperiodogram( x, T1 = 2, T2, nperm = 999, nopermute, mult = c("sidak", "bonferroni"), print.time = FALSE )

## S3 method for class 'WRperio'
plot(x, prog = 1, alpha = 0.05, line.col = "red", main = NULL, ...)
x |
A vector of quantitative values, with class |
T1 |
First period included in the calculation (default: |
T2 |
Last period included in the calculation (default: |
nperm |
Number of permutations for the tests of significance. |
nopermute |
List of item numbers that should not be permuted; see Details (default: no items should be excluded from the permutations). |
mult |
Correction method for multiple testing. Choices are "bonferroni"
and "sidak" (default: |
print.time |
Print the computation time. Useful when planning the
analysis of a long data series with a high number of permutations. Default:
|
prog |
|
alpha |
Significance level for the plot; p-values smaller than or equal
to alpha are represented by black symbols. Default: |
line.col |
Colour of the lines between symbols in the graph (default:
|
main |
Main title of the plot. Users can write a custom title, in quotes
(default: |
... |
Other graphical arguments passed to this function. |
The Whittaker-Robinson periodogram (Whittaker and Robinson, 1924) identifies periodic components in a vector of quantitative data. The data series must contain equally-spaced observations (i.e. constant lag) along a transect in space or through time. The vector may contain missing observations, represented by NA, in reasonable amount, e.g. up to a few percent of the total number of observations. The periodogram statistic used in this function is the standard deviation of the means of the columns of the Buys-Ballot table (Enright, 1965). The method is also described in Legendre & Legendre (2012, Section 12.4.1). Missing values (NA) are handled by skipping the NA values when computing the column means of the Buys-Ballot table.
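The statistic described above can be sketched for a single candidate period. This is an illustration rather than the package's implementation: the helper name `wr_stat_sketch` is hypothetical, the last incomplete row of the Buys-Ballot table is padded with NA, and the standard deviation uses the divisor n (whether the package uses n or n - 1 is not stated here).

```r
# Sketch of the Whittaker-Robinson statistic for one period (illustrative;
# padding and the n divisor are assumptions, not confirmed package behaviour).
wr_stat_sketch <- function(x, period) {
  n <- length(x)
  n_rows <- ceiling(n / period)
  # Pad the series so that it fills a table with 'period' columns
  padded <- c(x, rep(NA, n_rows * period - n))
  # Buys-Ballot table: the series is written row by row
  bb <- matrix(padded, nrow = n_rows, ncol = period, byrow = TRUE)
  # Column means, skipping missing values
  col_means <- colMeans(bb, na.rm = TRUE)
  # Standard deviation of the column means (divisor n)
  sqrt(mean((col_means - mean(col_means))^2))
}

# Numerical example from Legendre and Legendre (2012, Subsection 12.4.1)
test.vec <- c(2,2,4,7,10,5,2,5,8,4,1,2,5,9,6,3)
stats <- sapply(2:8, function(p) wr_stat_sketch(test.vec, p))
names(stats) <- 2:8
stats
```

When the series contains a periodic signal of period T, the columns of the table with T columns line up with the phases of the cycle, so their means differ strongly and the statistic is large.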
The data must be stationary before computation of the periodogram. Stationarity is violated when there is a trend in the data or when they were obtained under contrasting environmental or experimental conditions. Users should at least test for the presence of a significant linear trend in the data (using linear regression); if a significant trend is identified, it can be removed by computing regression residuals.
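The trend check suggested above could look like the following sketch. The simulated series, the variable names, and the 0.05 threshold are assumptions chosen for illustration.

```r
# Sketch: test for a linear trend and, if significant, replace the series
# by its regression residuals (simulated data; names are illustrative).
set.seed(123)
tt <- 1:200
x  <- 0.02 * tt + cos(2 * pi * tt / 10) + rnorm(200, 0, 0.3)

fit     <- lm(x ~ tt)
p_slope <- summary(fit)$coefficients["tt", "Pr(>|t|)"]

# Keep the residuals only when the slope is significant
x_detr <- if (p_slope < 0.05) residuals(fit) else x
```

The detrended vector `x_detr` is then the series that would be passed to the periodogram function.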
The nopermute option allows users to include a list of item numbers that should not be permuted, whether the observations are NA or zero values. This option should not be used in routine work. It is intended for special situations where observations could not be made at some points along the space or time series because that was impossible. For example, in a spatial data series along a river, if points fall on emerging rocks or on islands, no observation of phytoplankton could have been made at those points. For the permutation test, values at these positions (NA or 0) should not be permuted with values at points where observations were possible.
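A permutation that leaves selected positions untouched can be sketched as follows. The helper `permute_except` is hypothetical and only illustrates the idea; it is not the code used inside the package.

```r
# Sketch: permute a series while keeping some positions fixed
# (hypothetical helper, for illustration only).
permute_except <- function(x, nopermute = integer(0)) {
  free <- setdiff(seq_along(x), nopermute)   # positions allowed to move
  x_perm <- x
  # Shuffle the values among the free positions only
  x_perm[free] <- x[free][sample.int(length(free))]
  x_perm
}

set.seed(7)
x <- c(5, 3, NA, 8, 0, 6, 2)   # positions 3 and 5: no observation possible
x_perm <- permute_except(x, nopermute = c(3, 5))
x_perm
```

Positions 3 and 5 keep their original values (NA and 0) in every permuted series, while the remaining values are exchanged among the other positions.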
The graph produced by the plot function shows the periodogram statistics and their significance following a permutation test, with periods in the abscissa. The p-values may be corrected for multiple testing using either the Bonferroni or the Sidak correction, which can be applied to all values in the periodogram uniformly, or following a progressive correction.
A progressive correction means that for the first periodogram statistic, the p-value is tested against the alpha significance level without any correction; for the second statistic, the p-value is corrected for 2 simultaneous tests; and so forth until the k-th statistic, where the p-value is corrected for k simultaneous tests. This approach solves the problem of "where to stop interpreting a periodogram"; one goes on as long as significant values emerge, considering the fact that the tests become progressively more conservative.
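The progressive correction can be written in a few lines. The function below is a sketch under the usual definitions of the two corrections (Bonferroni: p multiplied by the number of tests, capped at 1; Sidak: 1 - (1 - p)^k); its name is hypothetical and it is not taken from the package.

```r
# Sketch of a progressive correction: the k-th p-value is corrected for
# k simultaneous tests (hypothetical helper; standard Bonferroni and
# Sidak formulas are assumed).
progressive_correction <- function(p, method = c("sidak", "bonferroni")) {
  method <- match.arg(method)
  k <- seq_along(p)                 # number of tests at each position
  if (method == "bonferroni") {
    pmin(p * k, 1)
  } else {
    1 - (1 - p)^k
  }
}

p_raw <- c(0.01, 0.01, 0.01, 0.01)
p_bonf  <- progressive_correction(p_raw, "bonferroni")  # 0.01 0.02 0.03 0.04
p_sidak <- progressive_correction(p_raw, "sidak")
```

With identical raw p-values, the corrected values increase with the position in the periodogram, which shows how the tests become progressively more conservative.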
In the Whittaker-Robinson periodogram, harmonics of a basic period are often found to be also significant.
The permutation tests, which can take a bit of time in very large jobs, can be interrupted by issuing an Escape command. One can also click the STOP button at the top of the R console.
The function produces an object of class WRperio containing a table with the following columns:
Period |
period number; |
WR.stat |
periodogram statistic; |
p-value |
p-value after permutation test; |
p.corrected |
p-value corrected for multiple tests, using the Bonferroni or Sidak method; |
p.corr.prog |
p-value after progressive correction. |
When the p-values cannot be computed because of a very high proportion of missing values in the data, values of 99 are posted in the last three columns of the output table.
Pierre Legendre and Guillaume Guenard
Enright, J. T. 1965. The search for rhythmicity in biological time-series. Journal of Theoretical Biology 8: 426-468.
Legand, M. 1958. Variations diurnes du zooplancton autour de la Nouvelle-Calédonie. O.R.S.T.O.M., Inst. Fr. Océanie Sect. Océanogr. Rapp. Sci. (6): 1-42.
Legendre, P. and L. Legendre. 2012. Numerical ecology, 3rd English edition. Elsevier Science BV, Amsterdam.
Sarrazin, J., D. Cuvelier, L. Peton, P. Legendre and P. M. Sarradin. 2014. High-resolution dynamics of a deep-sea hydrothermal mussel assemblage monitored by the EMSO-Açores MoMAR observatory. Deep-Sea Research I 90: 62-75. (Recent application in oceanography)
Whittaker, E. T. and G. Robinson. 1924. The calculus of observations - A treatise on numerical mathematics. Blackie & Son, London.
### 1. Numerical example from Subsection 12.4.1 of Legendre and Legendre (2012)

test.vec <- c(2,2,4,7,10,5,2,5,8,4,1,2,5,9,6,3)

# Periodogram with permutation tests of significance
res <- WRperiodogram(test.vec)
plot(res) # Plot the periodogram

#####

### 2. Simulated data

# Generate a data series with periodic component using Legand's (1958) equation.
# Ref. Legendre and Legendre (2012, eq. 12.8, p. 753)
# x = time points, T = generated period, c = shift of curve, left (+) or right (-)
periodic.component <- function(x,T,c){cos((2*pi/T)*(x+c))}

n <- 500 # corresponds to 125 days with 4 observations per day

# Generate a lunar cycle, 29.5 days (T=118)
moon <- periodic.component(1:n, 118, 59)

# Generate a circadian cycle (T=4)
daily <- periodic.component(1:n, 4, 0)

# Generate an approximate tidal cycle (T=2)
# A real tidal signal would have a period of 12.42 h
tide <- periodic.component(1:n, 2, 0)

# Periodogram of the lunar component only
res.moon.250 <- WRperiodogram(moon, nperm=0) # T1=2, T2=n/2=250; no test
res.moon.130 <- WRperiodogram(moon, T2=130, nperm=499)

oldpar <- par(mfrow=c(1,2))
# Plot 2 moon cycles, n = 118*2 = 236 points
plot(moon[1:236], xlab="One time unit = 6 hours")
plot(res.moon.130, prog=1) # Plot the periodogram

#####

# Add the daily and tidal components, plus a random normal error. With daily (T=4) and
# tide (T=2), period 4 and its harmonics should have a higher W statistic than period 2
var1 <- daily + tide + rnorm(n, 0, 0.5)

# Plot a portion (40 points) of the data series
# Two periodic components identifiable. Tide (T=2) reinforces the daily signal (T=4)
par(mfrow=c(1,2))
plot(var1[1:40], pch=".", cex=1, xlab="One time unit = 6 hours")
lines(var1[1:40])

# Periodogram of 'var1'
res.var1 <- WRperiodogram(var1, T2=40, nperm=499)
plot(res.var1, prog=3, line.col="blue") # Plot the periodogram
# The progressive correction for multiple tests (prog=3) was used in the periodogram.

#####

# Add the three components, plus a random normal error term
# to show that the WRperiodogram can test several periodic components at the same time.
# (5*moon) makes the lunar periods stronger than the daily and tidal periods
var2 <- 5*moon + daily + tide + rnorm(n, 0, 0.5)

# Plot a portion (150 points) of the data series
# The three periodic components are identifiable
par(mfrow=c(1,2))
plot(var2[1:150], pch=".", cex=1, xlab="One time unit = 6 hours")
lines(var2[1:150])

# Periodogram of 'var2'
res.var2 <- WRperiodogram(var2, T2=130, nperm=499)
plot(res.var2, prog=1, line.col="blue") # Plot the periodogram

# Find the position of the maximum W statistic value in this periodogram
(which(res.var2[,2] == max(res.var2[,2])) -1)
# "-1" correction at the end of the previous line: the first computed period is T=2,
# so period #118 is on line #117 of file res.var2

#####

# Illustration that the WR periodogram can handle missing values:
# Replace 10% of the 500 data by NA
select <- sort(sample(1:500)[1:50])
var.na <- var2
var.na[select] <- NA
res.var.na <- WRperiodogram(var.na, T2=130, nperm=499)

# Plot the periodogram with no correction for multiple tests
plot(res.var.na, prog=1)

# Plot periodogram again with progressive correction for multiple tests
plot(res.var.na, prog=3)

#####

### 3. Data used in the examples of the documentation file of function acf() of {stats}

# Data file "ldeaths"; time series, 6 years x 12 months of deaths in UK hospitals
# First, examine the data file ldeaths. Then:
ld.res.perio <- WRperiodogram(ldeaths, nperm=499)

# Plot the periodogram with two types of corrections for multiple tests
par(mfrow=c(1,2))
plot(ld.res.perio, prog=1) # No correction for multiple testing
plot(ld.res.perio, prog=3) # Progressive correction for multiple tests
# The yearly cycle and harmonics are significant

# Compare the results of acf() to those of WRperiodogram above
acf(ldeaths) # lag=1.0 is one year; see ?acf
par(oldpar)