Title: | Palaeoecological and Palaeogeographical Analysis Tools |
---|---|
Description: | A set of analytical tools useful in analysing ecological and geographical data sets, both ancient and modern. The package includes functions for estimating species richness (Chao 1 and 2, ACE, ICE, Jackknife), shared species/beta diversity, species area curves and geographic distances and areas. |
Authors: | Matthew J. Vavrek <[email protected]> |
Maintainer: | Matthew J. Vavrek <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.4.0 |
Built: | 2024-12-18 06:38:18 UTC |
Source: | CRAN |
A set of analytical tools useful in analysing ecological and geographical data sets, both ancient and modern. The package includes functions for estimating species richness (Chao 1 and 2, ACE, ICE, Jackknife), shared species/beta diversity, species area curves and geographic distances.
The fossil package is designed for palaeoecologists and palaeobiogeographers. It provides species similarity indices, species richness estimators, minimum spanning tree/forest functions, and an assortment of other tools.
Matthew Vavrek <[email protected]>
Computes the extrapolated species richness of a population using the Abundance- and Incidence-based Coverage Estimators
ACE(x, taxa.row = TRUE)
ICE(x, taxa.row = TRUE)
x |
a vector, matrix or data frame of positive integers or zero of any size |
taxa.row |
whether each row of the matrix is a different taxon; if so, set to TRUE |
These functions compute the ACE and ICE richness estimators, respectively. Both functions accept a vector, matrix or data frame of any size made up of positive integers and zeros. By default, matrices are treated such that each row is a different taxon and each column is a sample or locality; if they are arranged with the taxa as columns, set the argument taxa.row to FALSE.
Note that ACE is intended only for use with abundance data, not presence-absence data. While ICE will accept abundance matrices, it internally converts the matrix to presence-absence data. If ACE returns NaN or Inf, Chao1 is used in its place, as per the recommendation made by Colwell in EstimateS.
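As a rough illustration of what an abundance-based coverage estimator computes, the following base R sketch applies the commonly published ACE formula to a vector of counts. It is not the package's own code: the name ace_sketch and the rare-taxon cutoff argument rare_max (set here to the conventional value of 10) are assumptions made for this example, and the Chao1 fallback described above is not implemented.

```r
# Sketch of the abundance-based coverage estimator (ACE) for a vector of counts.
# ace_sketch and rare_max are illustrative names, not part of the fossil package.
ace_sketch <- function(x, rare_max = 10) {
  x <- x[x > 0]                        # drop taxa that were not observed
  rare <- x[x <= rare_max]             # taxa with at most rare_max individuals
  s_abund <- sum(x > rare_max)         # number of abundant taxa
  s_rare <- length(rare)               # number of rare taxa
  n_rare <- sum(rare)                  # individuals belonging to rare taxa
  f1 <- sum(rare == 1)                 # number of singletons
  c_ace <- 1 - f1 / n_rare             # estimated sample coverage of rare taxa
  i <- seq_len(rare_max)
  f_i <- tabulate(rare, nbins = rare_max)  # number of taxa with exactly i individuals
  # squared coefficient of variation of the rare taxa, floored at zero
  gamma2 <- max(
    (s_rare / c_ace) * sum(i * (i - 1) * f_i) / (n_rare * (n_rare - 1)) - 1,
    0
  )
  s_abund + s_rare / c_ace + (f1 / c_ace) * gamma2
}

a <- c(0, 5, 1, 1, 2, 0, 0, 1, 0, 0, 8, 45)
ace_sketch(a)
```

When every rare taxon is a singleton, the coverage term is zero and the sketch returns NaN, which is the situation in which the documentation says Chao1 is substituted.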
A value representing a minimum number of species present in the assemblage if the entire population were to be censused.
Matthew Vavrek, with recommendations from the EstimateS reference manual by R.K. Colwell
Chao, A., M.-C. Ma, & M. C. K. Yang. 1993. Stopping rules and estimation for recapture debugging with unequal failure rates. Biometrika 80, 193-201.
Chao, A., W.-H. Hwang, Y.-C. Chen, and C.-Y. Kuo. 2000. Estimating the number of shared species in two communities. Statistica Sinica 10:227-246.
Chazdon, R. L., R. K. Colwell, J. S. Denslow, & M. R. Guariguata. 1998. Statistical methods for estimating species richness of woody regeneration in primary and secondary rain forests of NE Costa Rica. Pp. 285-309 in F. Dallmeier and J. A. Comiskey, eds. Forest biodiversity research, monitoring and modeling: Conceptual background and Old World case studies. Parthenon Publishing, Paris.
For related species estimators, see chao1, bootstrap and jack1, and spp.est to calculate multiple indices at once.
## sample vector
a<-c(0,5,1,1,2,0,0,1,0,0,8,45)
ACE(a)
## matrix format
a<-matrix(c(0,5,1,1,2,0,0,1,0,0,8,45),4,3)
ACE(a)
ICE(a)
## presence absence matrix
a<-matrix(c(0,1,1,1,1,0,0,1,0,0,1,1),4,3)
ACE(a)
ICE(a)
Test if two empirical samples are drawn from the same or different communities
aic.nest(comm1, comm2, base=exp(1))
comm1, comm2 |
lists of abundance data from two empirical samples |
base |
base of the log used in the calculation of Shannon's diversity |
This function tests if two empirical samples are drawn from the same community, based on the AIC scores.
Returns two AIC scores: the first assumes the two empirical samples are drawn from the same distribution, the second assumes they are drawn from different distributions.
Matthew Vavrek
#for example, two different communities
a<-c(12,4,12,1,4,0,6,5,0,0,0)
b<-c(0,11,4,3,6,7,7,2,23,5,8)
#if the aic score is lower, it is the better model
aic.nest(a,b)
#from the same community
a<-c(5,6,5,6,5,6,5,6,5,2,1,1)
b<-c(2,3,2,3,2,3,2,3,2,1,0,0)
aic.nest(a,b)
Computes the bootstrap species richness estimator for abundance or presence-absence data
bootstrap(x, taxa.row = TRUE, abund = TRUE, samples = NA)
x |
a vector, matrix or data frame of positive integers or zero of any size |
taxa.row |
whether each row of the matrix is a different taxon |
abund |
whether the input is abundance (or presence/absence) based |
samples |
if the input is a vector, the number of samples must be supplied |
The bootstrap estimator
Returns a single value for the Bootstrap Species Estimator
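For orientation, the bootstrap richness estimator of Smith and van Belle (1984) adds to the observed richness, for each taxon, the probability that the taxon would be missed in a resample of the samples. The base R sketch below shows that calculation for a taxa-by-samples matrix. The function name bootstrap_sketch is hypothetical, and the code is an illustration of the published formula rather than the package's implementation.

```r
# Sketch of the incidence-based bootstrap richness estimator.
# bootstrap_sketch is an illustrative name, not a function from the package.
bootstrap_sketch <- function(m) {
  inc <- (m > 0) * 1                            # abundances to presence/absence
  inc <- inc[rowSums(inc) > 0, , drop = FALSE]  # keep only observed taxa
  n_samples <- ncol(inc)
  p <- rowSums(inc) / n_samples                 # share of samples containing each taxon
  nrow(inc) + sum((1 - p)^n_samples)            # observed richness plus correction
}

# taxa in rows, samples in columns
a <- matrix(c(0, 5, 1, 1, 2, 0, 0, 1, 0, 0, 8, 45), 4, 3)
bootstrap_sketch(a)
```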
Matthew Vavrek
Smith, E.P. & van Belle, G. 1984. Nonparametric estimation of species richness. Biometrics 40, 119-129.
## sample vector
a<-c(0,5,1,1,2,0,0,1,0,0,8,45)
bootstrap(a,samples=45)
## matrix format
a<-matrix(c(0,5,1,1,2,0,0,1,0,0,8,45),4,3)
bootstrap(a)
bootstrap(a,,FALSE)
## presence absence matrix
a<-matrix(c(0,1,1,1,1,0,0,1,0,0,1,1),4,3)
bootstrap(a,,FALSE)
Computes the standard error for chao1 or chao2
chao.sd(x)
x |
a vector of abundances or frequencies of occurrences |
Primarily designed to be used internally by spp.est to calculate the errors for the Chao estimators.
Returns a value for the standard deviation for chao1 or chao2.
Matthew Vavrek
Colwell, R.K. 2010. EstimateS: Statistical estimation of species richness and shared species from samples. Version 8.2. User's Guide and application published at: http://purl.oclc.org/estimates.
## sample vector
a<-c(0,5,1,1,2,0,0,1,0,0,8,45)
chao.sd(a)
Chao's Jaccard and Sorenson shared species estimators for use with incomplete datasets
chao.sorenson(x, y)
chao.jaccard(x, y)
x |
species from group A |
y |
species from group B |
You must provide two separate vectors, with species arranged in the same order, from area A and B. If species are present in one site but not the other, these must be recorded for both sites; the site where they are not found should be coded as a zero. Species not present at either site are ignored.
Returns the Chao-Jaccard or Chao-Sorenson similarity index for the two sites in question.
Matthew Vavrek
Chao, A., R. L. Chazdon, et al. 2005. A new statistical approach for assessing similarity of species composition with incidence and abundance data. Ecology Letters 8: 148-159.
##Species counts from two different locations
a <- c(1,0,4,3,5,0,0,7)
b <- c(2,1,3,0,0,1,0,6)
chao.sorenson(a,b)
chao.jaccard(a,b)
Computes the Chao species estimator for abundance or presence-absence data
chao1(x, taxa.row = TRUE)
chao2(x, taxa.row = TRUE)
x |
a vector, matrix or data frame with species by samples |
taxa.row |
a logical argument if the species are the rows or columns |
chao1 returns an estimate of species richness based on a vector or matrix of abundance data, while chao2 returns an estimate of species richness based on incidence data. Note that the chao1 estimator is for abundance data only. The chao2 estimator can be given abundance data and will automatically convert it to incidence data, but due to the nature of the estimator, the data must contain more than one sample (i.e. the data must be arranged in a minimum 2 by 2 matrix).
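The conversion from abundance to incidence data can be pictured with a short base R sketch. It turns a taxa-by-samples abundance matrix into presence/absence form and then counts how many taxa occur in exactly one sample and in exactly two samples, which are the quantities an incidence-based Chao estimator works from. The function name incidence_counts and the names of the returned list elements are assumptions for this example only.

```r
# Sketch: summarise an abundance matrix as incidence frequency counts.
# incidence_counts and its list element names are illustrative only.
incidence_counts <- function(m) {
  inc <- (m > 0) * 1          # any positive abundance becomes a presence
  occ <- rowSums(inc)         # number of samples each taxon occurs in
  occ <- occ[occ > 0]         # ignore taxa never observed
  list(
    s_obs = length(occ),      # observed richness
    q1 = sum(occ == 1),       # taxa found in exactly one sample (uniques)
    q2 = sum(occ == 2),       # taxa found in exactly two samples (duplicates)
    samples = ncol(inc)       # number of samples
  )
}

a <- matrix(c(0, 5, 1, 1, 2, 0, 0, 1, 0, 0, 8, 45), 4, 3)
incidence_counts(a)
```

With a single sample every observed taxon is a unique and there can be no duplicates, which is one way to see why more than one sample is required.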
Returns a value for the Chao species estimator for the given data.
If all the species abundances in the input to chao1 are equal to 1, the function still returns a value, but a warning is raised and the value returned equals the number of species observed.
Matthew Vavrek
Chao, A. 1984. Non-parametric estimation of the number of classes in a population. Scandinavian Journal of Statistics 11: 265-270.
Chao, A. 1987. Estimating the Population Size for Capture-Recapture Data with Unequal Catchability. Biometrics 43: 783-791.
## sample vector
a<-c(0,5,1,1,2,0,0,1,0,0,8,45)
chao1(a)
## matrix format
a<-matrix(c(0,5,1,1,2,0,0,1,0,0,8,45),4,3)
chao1(a)
chao2(a)
## presence absence matrix
a<-matrix(c(0,1,1,1,1,0,0,1,0,0,1,1),4,3)
chao1(a)
chao2(a)
Computes Cohesiveness Index for a Cluster Analysis
coi(mst, groups)
mst |
A minimum spanning tree matrix (binary) |
groups |
A vector with the group/cluster assignments for each sample |
Returns the cohesiveness index for the supplied clustering.
Matthew Vavrek
Create a matrix of locations with a column of latitudes and longitudes
create.lats(x, loc="locality", long="longitude", lat="latitude")
x |
a table arranged in columnar format, with one column indicating the locations, another the latitude and another the longitude |
loc |
the name or number of the column giving the names of the locations to be used |
long |
the name or number of the column giving the longitude of the locations |
lat |
the name or number of the column giving the latitude of the locations |
This function creates a location table with longitude (X) and latitude (Y) or their equivalents for every location. The resulting matrix is in the format needed for most of the geographic functions found in the fossil package.
A matrix with a column of longitude and a column of latitude, respectively, with rownames corresponding to each location.
Matthew Vavrek
#to reproduce the fdata.lats dataset
data(fdata.list)
create.lats(fdata.list, loc="locality", long="longitude", lat="latitude")
Create a matrix with taxa as rows and occurrences or samples as columns
create.matrix( x, tax.name="genus", locality="locality", time.col=NULL, time=NULL, abund=FALSE, abund.col="abundance" )
x |
a table arranged in columnar format, with at least one column indicating name of taxa and another giving location or sample |
tax.name |
the name or number of the column giving the taxonomic names to be used (the rows of the matrix to be created) |
locality |
the name or number of the column giving the locations of the samples (the columns of the matrix to be created) |
time.col |
the name or number of the column containing the time periods; if left NULL, filtering by time will be ignored |
time |
the time periods to keep for the matrix; if left NULL, filtering by time will be ignored |
abund |
whether to record abundances of taxa; if left FALSE, a binary (presence/absence) matrix is created |
abund.col |
column name or number containing abundance values |
This is a helper function to convert large lists of data into matrices of species (rows) and locations (columns). The parameters can be adjusted to create either a binary (presence/absence) or abundance matrix. The setup of the table is largely flexible; simply input the column names or numbers containing the pertinent information. To filter data according to time, both the time column and the time period must be specified. For abundance, the default title for the abundance column is simply "abundance"; the function will not work if you have chosen to include abundances (abund = TRUE) but the name of the abundance column is incorrect.
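The kind of reshaping described here can be sketched in base R with tapply. The small occurrence table below is invented for illustration, and the object names occ, abund_mat and pa_mat are assumptions. The sketch only mirrors the idea of turning a long table into a taxa-by-localities matrix and does not reproduce the function's time filtering or its handling of missing taxon names.

```r
# Sketch: long occurrence table to a taxa (rows) by localities (columns) matrix.
# The data and object names are made up for this example.
occ <- data.frame(
  locality  = c("A", "A", "B", "B", "C"),
  species   = c("sp1", "sp2", "sp1", "sp3", "sp2"),
  abundance = c(3, 1, 2, 5, 4),
  stringsAsFactors = FALSE
)

# abundance matrix: sum the abundances for each species/locality combination
abund_mat <- tapply(occ$abundance, list(occ$species, occ$locality), sum)
abund_mat[is.na(abund_mat)] <- 0   # combinations never recorded become 0

# binary (presence/absence) matrix derived from the abundance matrix
pa_mat <- (abund_mat > 0) * 1

abund_mat
pa_mat
```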
A matrix of taxa (rows) by localities (columns).
At present, the function ignores rows where the taxon name is NA, NULL, '' (empty character value) or ' ' (single space), as these labels typically represent unknown taxa, which would be inappropriate to include in most analyses.
Matthew Vavrek
#converting the fdata.list dataset into a matrix of species (rows)
#by samples (columns) with abundance data
data(fdata.list)
create.matrix(fdata.list, tax.name = "species", abund=TRUE)
#same data set, but now for an occurrence matrix
create.matrix(fdata.list, tax.name = "species", locality = "locality")
Haversine formula to calculate distances between points on the earth
deg.dist( long1, lat1, long2, lat2 )
long1 |
longitude of location 1 |
lat1 |
latitude of location 1 |
long2 |
longitude of location 2 |
lat2 |
latitude of location 2 |
This function will calculate the shortest distance (portion of a Great Circle) in kilometers between two points on the Earth given their latitude and longitude.
Arc distance between two points on the Earth's surface in kilometers.
The calculated distance may be off by up to 0.2%, because this function treats the Earth as a sphere with a circumference of 40041.47 km (the mean circumference) rather than as an ellipsoid.
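A minimal base R sketch of the Haversine calculation on a sphere is given below. The function name haversine_km and its circumference argument are assumptions for this example (the default reuses the 40041.47 km figure quoted above). The sketch shows the standard formula and is not a copy of the package source.

```r
# Sketch of the Haversine great-circle distance on a spherical Earth.
# haversine_km is an illustrative name; inputs are decimal degrees.
haversine_km <- function(long1, lat1, long2, lat2, circumference = 40041.47) {
  radius <- circumference / (2 * pi)   # sphere radius in km
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlong <- (long2 - long1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlong / 2)^2
  a <- pmin(1, a)                      # guard against rounding slightly above 1
  2 * radius * atan2(sqrt(a), sqrt(1 - a))
}

# a quarter of the way around the equator
haversine_km(0, 0, 90, 0)
```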
Matthew Vavrek
The formulas for the Haversine distance function were taken from the Dr. Math website at http://mathforum.org/library/drmath/view/55417.html
To calculate pairwise distances between a list of points see earth.dist, or to calculate an area enclosed by three points on the Earth's surface, see earth.tri
##distance between 54 degrees N 23 degrees E and 67 degrees N 32 degrees W
deg.dist(23,54,-32,67)
Methods for calculating a minimum spanning tree or network between a number of points given a distance matrix.
dino.mst(x, random.start = TRUE, random.search = TRUE)
dino.msn(x)
x |
a distance matrix for any number of points |
random.start |
If the minimum spanning tree is to start at a random point and not the first given site (default is TRUE) |
random.search |
If there is more than one shortest possible branch, should one be chosen randomly |
Ensure that a distance matrix is used, not a similarity matrix; otherwise the result will be incorrect.
Returns a binary matrix where connections between points are denoted by a 1.
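To make the output format concrete, the sketch below builds a minimum spanning tree with Prim's algorithm and returns it as a binary connection matrix. It is a generic textbook construction written for this example: the name prim_mst is hypothetical, it always starts from the first point, and it takes the first candidate when branches tie, so it does not reproduce the random.start or random.search behaviour.

```r
# Sketch: minimum spanning tree as a binary adjacency matrix (Prim's algorithm).
# prim_mst is an illustrative name; d is a distance matrix or dist object.
prim_mst <- function(d) {
  d <- as.matrix(d)
  n <- nrow(d)
  adj <- matrix(0, n, n, dimnames = dimnames(d))
  in_tree <- c(TRUE, rep(FALSE, n - 1))          # start from the first point
  while (!all(in_tree)) {
    # distances from points already in the tree to points not yet in it
    cand <- d[in_tree, !in_tree, drop = FALSE]
    k <- which(cand == min(cand), arr.ind = TRUE)[1, ]  # first shortest branch
    from <- which(in_tree)[k[1]]
    to <- which(!in_tree)[k[2]]
    adj[from, to] <- 1                           # record the connection both ways
    adj[to, from] <- 1
    in_tree[to] <- TRUE
  }
  adj
}

d <- matrix(c(0, 1, 5,
              1, 0, 1,
              5, 1, 0), 3, 3, byrow = TRUE)
prim_mst(d)
```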
Yvonnick Noel, Julien Claude and Emmanuel Paradis with modifications from Matthew Vavrek
#minimum spanning tree for the fdata set
data(fdata.mat)
fdata.dist<-dino.dist(fdata.mat)
#store the result under a new name so the dino.mst function is not masked
fdata.mst<-dino.mst(fdata.dist)
Calculate the bearing in degrees clockwise from True North between any two points on the globe.
earth.bear(long1, lat1, long2, lat2)
long1 |
Longitude of site 1 |
lat1 |
Latitude of site 1 |
long2 |
Longitude of site 2 |
lat2 |
Latitude of site 2 |
Calculate the bearing in degrees clockwise from True North between any two points on the globe. Primarily designed to be used with other included geographic tools.
Returns a value in degrees from True North between two geographic points.
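One common way to obtain such a bearing is the initial great-circle bearing formula, sketched here in base R. The function name bearing_deg is an assumption, and the package may compute the bearing differently (its references point to a Haversine-based derivation), so treat this as an illustration of the quantity rather than of the implementation.

```r
# Sketch: initial great-circle bearing, in degrees clockwise from north.
# bearing_deg is an illustrative name; inputs are decimal degrees.
bearing_deg <- function(long1, lat1, long2, lat2) {
  to_rad <- pi / 180
  phi1 <- lat1 * to_rad
  phi2 <- lat2 * to_rad
  dlambda <- (long2 - long1) * to_rad
  y <- sin(dlambda) * cos(phi2)
  x <- cos(phi1) * sin(phi2) - sin(phi1) * cos(phi2) * cos(dlambda)
  (atan2(y, x) / to_rad + 360) %% 360   # map (-180, 180] onto [0, 360)
}

bearing_deg(0, 0, 10, 0)   # due east along the equator
bearing_deg(0, 0, 0, -10)  # due south
```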
Matthew Vavrek
Haversine formula from Math Forums: Ask Dr. Math at http://mathforum.org/dr.math/
earth.bear(-100, 30, 20, -40)
Create a distance matrix (lower triangle) between a list of points
earth.dist(lats, dist = TRUE)
lats |
a table with a longitude and latitude column respectively as the first two columns |
dist |
A logical argument whether to create a distance matrix (lower triangle) or full matrix |
This function will calculate the pairwise distances between all points given and return either a distance or full matrix as specified. All coordinates must be in decimal degrees.
Returns a matrix of distances in kilometers between a list of longitudes and latitudes.
Large datasets may take some time to process, as the number of pairwise distances to calculate grows quadratically with the number of points (n(n-1)/2 distances for n points).
Matthew Vavrek, with suggestions from Anton Korobeynikovs
data(fdata.lats)
earth.dist(fdata.lats)
Calculate a minimum convex polygon for a collection of points without knowing what points form the vertices.
earth.poly(lats)
lats |
a table with a longitude and latitude column respectively as the first two columns, or a SpatialPoints object with longitude/latitude |
This function will calculate the area of a minimum convex polygon/convex hull for a spherical surface (ie points on a globe).
The function returns a list consisting of the area in km^2 ($area) and a vector with the row numbers of the vertices ($vertices).
Matthew Vavrek
#1/8th the surface area of the earth
a <- matrix(c(0, 0, 0, 90, 90, 0, 25, 25), 4, 2, byrow = TRUE)
earth.poly(a)
Calculate the true area on a sphere enclosed by three points on the earth's surface
earth.tri(long1, lat1, long2, lat2, long3, lat3)
long1 |
Longitude of site 1 |
lat1 |
Latitude of site 1 |
long2 |
Longitude of site 2 |
lat2 |
Latitude of site 2 |
long3 |
Longitude of site 3 |
lat3 |
Latitude of site 3 |
A function to find the area enclosed by three points on the surface of the earth, given their latitudes and longitudes. This function is primarily designed to be a component of earth.poly, which is likely a more useful function for most applications.
Returns a value in kilometers squared of the area enclosed by the three points.
The underlying distance calculations may be off by up to 0.2%, because this function treats the Earth as a sphere with a circumference of 40041.47 km (the mean circumference) rather than as an ellipsoid.
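The area of a spherical triangle can be obtained from its spherical excess. The base R sketch below computes the three side lengths as central angles and then applies L'Huilier's formula for the excess. This is one standard route to the result and is offered as an assumption about how such an area can be computed, not as the package's actual method. The name sphere_tri_area and the circumference argument are hypothetical.

```r
# Sketch: area of a spherical triangle via L'Huilier's formula.
# sphere_tri_area is an illustrative name; inputs are decimal degrees.
sphere_tri_area <- function(long1, lat1, long2, lat2, long3, lat3,
                            circumference = 40041.47) {
  radius <- circumference / (2 * pi)
  to_rad <- pi / 180
  # central angle (radians) between two points, from the Haversine formula
  central_angle <- function(lo1, la1, lo2, la2) {
    h <- sin((la2 - la1) * to_rad / 2)^2 +
      cos(la1 * to_rad) * cos(la2 * to_rad) * sin((lo2 - lo1) * to_rad / 2)^2
    2 * asin(sqrt(pmin(1, h)))
  }
  side_a <- central_angle(long2, lat2, long3, lat3)
  side_b <- central_angle(long1, lat1, long3, lat3)
  side_c <- central_angle(long1, lat1, long2, lat2)
  s <- (side_a + side_b + side_c) / 2            # semi-perimeter
  prod_tan <- tan(s / 2) * tan((s - side_a) / 2) *
    tan((s - side_b) / 2) * tan((s - side_c) / 2)
  excess <- 4 * atan(sqrt(max(prod_tan, 0)))     # spherical excess in radians
  excess * radius^2                              # area in square kilometers
}

# one octant of the sphere
sphere_tri_area(0, 0, 0, 90, 90, 0)
```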
Matthew Vavrek
Wolfram Mathworld, http://mathworld.wolfram.com/SphericalTriangle.html
#1/8th the surface area of the earth
earth.tri(0, 0, 0, 90, 90, 0)
Create a distance matrix between any number of locations
ecol.dist(x, method = sorenson, type = "dis")
dino.dist(x, method = sorenson, type = "dis")
x |
matrix of taxa (or equivalent data) in rows by columns of localities (or equivalent) |
method |
the distance/similarity index to compute |
type |
if the matrix is to be a distance ('dis') or similarity ('sim') matrix |
This will create a distance (or similarity) matrix using any of the provided indices: sorenson, simpson, bray.curtis, jaccard, morisita.horn, chao.jaccard and chao.sorenson. A distance matrix gives a value of 1 for the most distantly related sites, while a similarity matrix gives a value of 1 for the most similar sites.
dino.dist is an old name for the function, and is in the process of being deprecated.
A distance matrix (lower triangle) giving the pairwise distance indices between all points.
To use a user-defined distance index, pass the name of your function as the method argument and it will be used instead of the built-in indices. Note that two equal-length vectors at a time are passed internally to the distance calculation function.
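A user-defined index therefore only needs to accept two equal-length vectors and return a single number. The base R sketch below defines one such function and a small stand-in for the pairwise loop so the example runs without the package. Both names, my_sorensen and pairwise_index, are hypothetical, and the stand-in assumes that a distance is obtained as 1 minus the similarity.

```r
# Hypothetical user-supplied index: Sorensen similarity of two equal-length vectors
my_sorensen <- function(x, y) {
  shared <- sum(x > 0 & y > 0)                 # taxa present at both sites
  2 * shared / (sum(x > 0) + sum(y > 0))
}

# Illustrative stand-in for a pairwise loop over localities (columns of m)
pairwise_index <- function(m, index, type = "dis") {
  n <- ncol(m)
  out <- matrix(0, n, n, dimnames = list(colnames(m), colnames(m)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      sim <- index(m[, i], m[, j])             # two equal-length vectors at a time
      out[i, j] <- if (type == "dis") 1 - sim else sim
    }
  }
  as.dist(out)                                 # keep the lower triangle only
}

m <- cbind(A = c(1, 1, 0, 0), B = c(1, 1, 0, 0), C = c(0, 0, 1, 1))
pairwise_index(m, my_sorensen)
pairwise_index(m, my_sorensen, type = "sim")
```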
Matthew Vavrek
sorenson, simpson, bray.curtis, jaccard, morisita.horn, chao.jaccard and chao.sorenson
##example using fdata.mat
data(fdata.mat)
ecol.dist(fdata.mat)
ecol.dist(fdata.mat,simpson,"sim")
Calculate the rotation of a point on the Earth for a given Euler pole. The rotation assumes a spherical earth.
euler.rot(lat1, long1, rotdeg, lat2, long2)
lat1 |
Euler-pole latitude |
long1 |
Euler-pole longitude |
rotdeg |
Rotation about Euler-pole |
lat2 |
Latitude of point to be converted |
long2 |
longitude of point to be converted |
Locations of the Euler pole and the point to be rotated must be given in decimal degrees.
Rotated latitude and longitude of the provided point in decimal degrees.
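A rotation about an Euler pole on a sphere can be sketched with Rodrigues' rotation formula applied to unit vectors. In the base R example below, the function name euler_rotate, its argument names and the right-handed (counter-clockwise when viewed from above the pole) sign convention are all assumptions made for illustration, and the package may use a different convention.

```r
# Sketch: rotate a point about an Euler pole using Rodrigues' formula.
# euler_rotate and its sign convention are illustrative assumptions.
euler_rotate <- function(pole_lat, pole_long, rot_deg, lat, long) {
  to_rad <- pi / 180
  # latitude/longitude in degrees to a unit vector
  to_xyz <- function(la, lo) {
    c(cos(la * to_rad) * cos(lo * to_rad),
      cos(la * to_rad) * sin(lo * to_rad),
      sin(la * to_rad))
  }
  cross <- function(u, v) {
    c(u[2] * v[3] - u[3] * v[2],
      u[3] * v[1] - u[1] * v[3],
      u[1] * v[2] - u[2] * v[1])
  }
  k <- to_xyz(pole_lat, pole_long)   # rotation axis
  v <- to_xyz(lat, long)             # point to rotate
  theta <- rot_deg * to_rad
  r <- v * cos(theta) + cross(k, v) * sin(theta) +
    k * sum(k * v) * (1 - cos(theta))
  c(lat = asin(max(-1, min(1, r[3]))) / to_rad,
    long = atan2(r[2], r[1]) / to_rad)
}

# rotating 90 degrees about the north pole moves (0, 0) to longitude 90
euler_rotate(90, 0, 90, 0, 0)
```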
Matthew Vavrek
A simple hypothetical data set used in many of the examples.
There are 3 datasets; however, 2 of them (fdata.mat and fdata.lats) derive from the first (fdata.list). fdata.list is a table with 5 columns describing the sample site, species name, abundance, and location in latitude/longitude. fdata.mat is a 12 by 12 species abundance matrix (12 unique species and 12 unique samples/localities) that can be recreated from the original table of occurrences using the create.matrix() function; likewise, fdata.lats contains the locations of each of the samples, and can be created using the create.lats() function.
Matthew Vavrek
data(fdata.list)
Computes the Chao species estimator for both the chao1 and chao2 estimators
int.chao(x)
x |
a vector of positive integers or zero of any length |
This function is typically only called internally by the functions chao1 and chao2. The function has a built-in bias correction, such that it will not return values of infinity or non-numbers.
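The effect of a bias correction can be seen in a short base R sketch. It assumes the widely used bias-corrected form of the Chao estimator, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of taxa observed exactly once and exactly twice. Whether the package uses exactly this expression is not stated here, so the function name chao_bc and the formula choice should be read as assumptions.

```r
# Sketch of a bias-corrected Chao estimator for a vector of counts.
# chao_bc is an illustrative name; the formula is the common bias-corrected form.
chao_bc <- function(x) {
  x <- x[x > 0]          # keep observed taxa
  f1 <- sum(x == 1)      # singletons
  f2 <- sum(x == 2)      # doubletons
  # the +1 in the denominator keeps the result finite when f2 is zero
  length(x) + f1 * (f1 - 1) / (2 * (f2 + 1))
}

chao_bc(c(4, 5, 1, 1, 2, 0, 0, 1, 3, 0, 8, 45, 23))
chao_bc(c(1, 1, 1, 5, 9))   # no doubletons, still finite
```

Because the denominator is 2(F2 + 1) rather than 2 F2, the estimate stays defined even when no taxon is observed exactly twice.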
Estimated number of species using the Chao estimator.
Matthew Vavrek
Chao, A. 1984. Nonparametric estimation of the number of classes in a population. Scandinavian Journal of Statistics 11: 265-270.
For the more useful implementations of the Chao estimator, see chao1 for the abundance-based estimator or chao2 for the incidence-based estimator
## create example data set
a<-c(4,5,1,1,2,0,0,1,3,0,8,45,23)
int.chao(a)
## a data set which would give NaN using classic (ie not bias corrected) version
a<-c(4,5,0,0,2,0,0,0,3,0,8,45,23)
int.chao(a)
Computes the extrapolated species richness of a population using first- or second-order jackknife estimators
jack1(x, taxa.row = TRUE, abund = TRUE)
jack2(x, taxa.row = TRUE, abund = TRUE)
x |
a vector, matrix or data frame of positive integers or zero of any size |
taxa.row |
whether each row of the matrix is a different taxon; if so, value is set to TRUE |
abund |
If true, data is assumed to be abundance, if false, presence absence is assumed |
These functions compute the first- and second-order jackknife species richness estimators, respectively. Both functions accept a vector, matrix or data frame of any size made up of positive integers and zeros. By default, matrices are treated such that each row is a different taxon and each column is a sample or locality; if they are arranged with the taxa as columns, set the argument taxa.row to FALSE. If the data is abundance based, abund should be set to TRUE. If abund is set to FALSE, the data will be converted to presence/absence if not already in that format. For single vectors/columns, taxa.row and abund are ignored.
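For presence/absence data, the first- and second-order jackknife estimators are usually written in terms of the number of samples and the numbers of taxa found in exactly one and exactly two samples. The base R sketch below uses those standard incidence-based formulas. The name jack_sketch is hypothetical, and the package's handling of abundance data (abund = TRUE) is not reproduced.

```r
# Sketch: incidence-based first- and second-order jackknife estimators.
# jack_sketch is an illustrative name; m has taxa in rows, samples in columns.
jack_sketch <- function(m) {
  inc <- (m > 0) * 1              # convert to presence/absence
  occ <- rowSums(inc)
  occ <- occ[occ > 0]             # observed taxa only
  n <- ncol(inc)                  # number of samples (needs at least 2)
  s_obs <- length(occ)
  q1 <- sum(occ == 1)             # taxa found in exactly one sample
  q2 <- sum(occ == 2)             # taxa found in exactly two samples
  c(
    jack1 = s_obs + q1 * (n - 1) / n,
    jack2 = s_obs + q1 * (2 * n - 3) / n - q2 * (n - 2)^2 / (n * (n - 1))
  )
}

a <- matrix(c(0, 5, 1, 1, 2, 0, 0, 1, 0, 0, 8, 45), 4, 3)
jack_sketch(a)
```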
The value returned is the Jackknife estimated species diversity of the dataset in question.
Matthew Vavrek
Burnham, K.P. & W.S. Overton. 1978. Estimation of the size of a closed population when capture probabilities vary among animals. Biometrika 65, 623-633.
Burnham, K.P. & W.S. Overton. 1979. Robust estimation of population size when capture probabilities vary among animals. Ecology 60, 927-936.
Heltshe, J. & Forrester, N.E. 1983. Estimating species richness using the jackknife procedure. Biometrics 39, 1-11.
## sample vector
a<-c(0,5,1,1,2,0,0,1,0,0,8,45)
jack1(a)
## matrix format
a<-matrix(c(0,5,1,1,2,0,0,1,0,0,8,45),4,3)
jack1(a)
jack2(a)
jack2(a,abund = FALSE)
## presence absence matrix of the above abundance matrix
a<-matrix(c(0,1,1,1,1,0,0,1,0,0,1,1),4,3)
jack1(a)
jack2(a)
jack2(a, abund = FALSE)
A helper function to convert a table of latitudes and longitudes (and associated attributes, if applicable) into a shapefile
lats2Shape(lats)
lats |
a table with a latitude and longitude column respectively with associated attributes |
The table to be converted must contain as its first two columns the latitude (or Y) and longitude (or X) values to be converted. Any number of additional columns in any format can also be attached, and will be included in the attribute table.
A shapefile object which can be written to file using write.shapefile
Matthew Vavrek
## Not run:
#use fdata.lats as dataset
data(fdata.lats)
shape.lats<-lats2Shape(fdata.lats)
write.shapefile(shape.lats, file='/path/to/write/lats')
## End(Not run)
A function to plot any number of points given their latitude and longitude respectively on a map of the world.
loc.map(x, ...)
x |
a table with a longitude and latitude column respectively with optional associated attributes |
... |
arguments to be passed to the |
This is a helper function, which automatically zooms in and centers the map view on the input points. The ... argument allows the user to adjust the usual parameters for a scatterplot outlined by par.
Plots a map of the world focused on the locations provided.
Matthew Vavrek
#plotting the fdata sample set
data(fdata.lats)
loc.map(fdata.lats)
A function meant to be used internally by the relational.clustering function
localoptima(dist, group)
dist |
Distance matrix to be used |
group |
group designations |
The function takes a distance matrix and a vector with the group identifications for each sample locality (or equivalent). It is mainly meant to be used internally by the relational.clustering function to optimize the initial clustering and find the local (which hopefully is also the global) optimal organization, such that each member of a group is more similar to the other members in its group (on average) than to any other groups.
Matthew Vavrek
Creates a quick and focused map, using a world map, for geographically referenced visualization within R of a minimum spanning tree or network.
msn.map(msn, lat, ...)
msn |
minimum spanning tree or network to be used |
lat |
the lats |
... |
arguments to be passed to |
This is a helper function for quick visualization of georeferenced minimum spanning trees, and is not meant for creating figure-quality images due to the lack of fine control over many functions.
Returns a map of the globe, focused in on any set of georeferenced localities.
Matthew Vavrek
A helper function to convert a minimum spanning tree or network into shapefile format.
msn2Shape(msn, lats, dist = NULL)
msn |
a minimum spanning tree or network (binary matrix) |
lats |
a matrix or data frame with the latitude and longitude of the sites as the first two columns respectively |
dist |
Optional argument to include distance values in the final output; if wanted, a distance matrix (lower triangle) with the localities in the same order as in the MSN is required |
This function takes a minimum spanning tree or network object, along with the georeferenced locations of the sites, and converts it into a shapefile for use with GIS. The msn argument requires a minimum spanning tree or network object, and the lats argument requires some form of location for each of the points, typically a matrix with latitude and longitude columns respectively.
A shapefile which can be output using the write.shapefile function for use with a GIS program.
Matthew Vavrek
lats2Shape for a function to convert a lat/long table to a shapefile
## Not run:
#import both fdata.lats and fdata.mat
data(fdata.lats)
data(fdata.mat)
fdata.dist<-dino.dist(fdata.mat)
fdata.mst<-dino.mst(fdata.dist)
shape.mst<-msn2Shape(fdata.mst, fdata.lats)
write.shapefile(shape.mst, file='/path/to/write/mst')
## End(Not run)
A method of displaying a Minimum Spanning Tree/Network over a given set of points
mstlines(mst, x, y = NULL, pts.names = NULL, ...)
mst |
a minimum spanning tree or network object |
x |
either a table with the first two columns that of the x and y coordinates respectively, or simply that of the x coordinate |
y |
an optional argument if the y coordinates were not given in argument x |
pts.names |
If there is more than one shortest possible branch, should one be chosen randomly |
... |
arguments to be passed to lines() |
A function to plot the lines of a minimum spanning tree/forest on a plot; works as a frontend for lines.
Matthew Vavrek
#plot with overlain MST for fdata dataset
data(fdata.lats)
data(fdata.mat)
fdata.dist<-dino.dist(fdata.mat)
fdata.mst<-dino.mst(fdata.dist)
plot(coordinates(fdata.lats))
mstlines(fdata.mst, coordinates(fdata.lats))
Find a new location using an original location (latitude and longitude) along with a bearing and distance
new.lat.long(long, lat, bearing, distance)
long |
original longitude |
lat |
original latitude |
bearing |
bearing from original point to new location, degrees from North |
distance |
distance to location |
a vector of length 2 with the new latitude and longitude respectively
Matthew Vavrek
#Travel from 0,0 to a new location at a bearing of 45 degrees
#from North (clockwise) and 1000 km away
new.lat.long(long = 0, lat = 0, bearing = 45, distance = 1000)
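The calculation described above can be sketched in base R using the standard spherical destination-point formula. This is only an illustration: dest_point is a hypothetical helper, the 6371 km mean Earth radius is an assumption, and the package's own new.lat.long may use a different radius or formula.

```r
# Hypothetical helper (not part of the package): destination point on a sphere.
# Angles are in degrees, distance in kilometres; radius is an assumed mean value.
dest_point <- function(long, lat, bearing, distance, radius = 6371) {
  to_rad <- pi / 180
  phi1 <- lat * to_rad          # starting latitude in radians
  lam1 <- long * to_rad         # starting longitude in radians
  theta <- bearing * to_rad     # bearing, clockwise from North
  delta <- distance / radius    # angular distance travelled
  phi2 <- asin(sin(phi1) * cos(delta) + cos(phi1) * sin(delta) * cos(theta))
  lam2 <- lam1 + atan2(sin(theta) * sin(delta) * cos(phi1),
                       cos(delta) - sin(phi1) * sin(phi2))
  c(lat = phi2 / to_rad, long = lam2 / to_rad)
}

# Same journey as the example above: 1000 km at 45 degrees from 0,0
p <- dest_point(long = 0, lat = 0, bearing = 45, distance = 1000)
# A quarter of the way around the sphere due North ends at the pole
north <- dest_point(long = 0, lat = 0, bearing = 0, distance = 6371 * pi / 2)
```

The result is returned as latitude then longitude, matching the order stated for new.lat.long.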
This is a helper function which will plot an NMDS with an overlain MST
nmds.mst(nmds, mst, ...)
nmds |
an NMDS created using the ecodist package |
mst |
a minimum spanning tree or network (binary matrix) |
... |
arguments to be passed to the |
At the moment, the function requires an NMDS created using the ecodist package; however, the minimum spanning tree can be any one that produces a binary matrix showing connections (i.e. dino.mst).
Plots a non-metric multidimensional scaling plot with an overlain minimum spanning tree showing connections between the points.
Matthew Vavrek
## Not run:
#use fdata.mat as dataset, and use the ecodist package for the nmds() function
data(fdata.mat)
z <- ecol.dist(fdata.mat)
a <- dino.msn(z)
b <- nmds(z)
nmds.mst(b, a)
## End(Not run)
Measures to compare the similarity of two clustering outcomes
rand.index(group1, group2)
adj.rand.index(group1, group2)
group1 |
first cluster identity matrix |
group2 |
second cluster identity matrix |
This function calculates the Rand Index for two different clustering outcomes. The Rand Index gives a value between 0 and 1, where 1 means the two clustering outcomes match identically.
The Adjusted Rand Index rescales the index, taking into account that random chance will cause some objects to occupy the same clusters, so the Rand Index will never actually be zero.
a single value between 0 and 1
Matthew Vavrek
Rand, W.M. 1971. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association 66: 846–850.
Hubert, L. and Arabie, P. 1985. Comparing partitions. Journal of Classification. 2: 193–218.
To cluster the data, use the function rclust
#create two hypothetical clustering outcomes, with up to 2 and up to 3 clusters
g1 <- sample(1:2, size=10, replace=TRUE)
g2 <- sample(1:3, size=10, replace=TRUE)
rand.index(g1, g2)
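For readers who want to see what the index measures, the following base R sketch computes the unadjusted Rand Index as the proportion of sample pairs on which two clusterings agree (both place the pair together, or both place it apart). rand_index_sketch is a hypothetical name and is not the package's implementation.

```r
# Hypothetical helper: Rand Index as the share of sample pairs that the two
# clusterings treat the same way (together in both, or apart in both).
rand_index_sketch <- function(g1, g2) {
  stopifnot(length(g1) == length(g2))
  same1 <- outer(g1, g1, "==")   # TRUE where a pair shares a cluster in g1
  same2 <- outer(g2, g2, "==")   # TRUE where a pair shares a cluster in g2
  pairs <- upper.tri(same1)      # count each unordered pair once
  mean(same1[pairs] == same2[pairs])
}

# Identical partitions with swapped labels agree on every pair
r_same <- rand_index_sketch(c(1, 1, 2, 2), c(2, 2, 1, 1))
# Crossed partitions agree on only 2 of the 6 pairs
r_cross <- rand_index_sketch(c(1, 1, 2, 2), c(1, 2, 1, 2))
```

Cluster labels themselves do not matter, only which samples are grouped together.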
A method to cluster a number of samples using a relational (dissimila
rclust(dist, clusters = 2, runs = 10, counter = FALSE)
dist |
pairwise distance matrix |
clusters |
final number of clusters |
runs |
number of randomizations to run |
counter |
use counter to display current run number |
This function will return a cluster identity vector. The number of clusters requested must be 2 or greater, but no more than half the total number of samples, as the function requires at least 2 samples per cluster.
The vector returned will be the same length as the number of samples provided in the original dist matrix. The samples will have been clustered into the number of clusters defined in clusters.
Matthew Vavrek
rclust.dist, rclust.null, rclust.weights and coi
#cluster the fdata example data set into 2 groups
data(fdata.mat)
fd.dist <- dino.dist(fdata.mat)
rclust(fd.dist, clusters = 2, runs = 10)
Provides a distance matrix of intra- and inter-group average distances based on a clustering solution and a dissimilarity matrix.
rclust.dist(groups, dist)
groups |
cluster identity vector |
dist |
original pairwise distance matrix |
This function calculates a distance matrix for each cluster, giving the average within group pairwise distance and the average between group pairwise distance.
A matrix c by c in size, where c is the number of clusters
Matthew Vavrek
To cluster the data, use the function rclust; see as well rclust.weights, rclust.null
#average within- and between-group distances for the fdata example data set
data(fdata.mat)
fd.dist <- dino.dist(fdata.mat)
fd.clust <- rclust(fd.dist, 2)
rclust.dist(fd.clust, fd.dist)
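The averaging described above can be illustrated in base R. The sketch below is an assumption about how such a matrix could be built, not the package's code: group_mean_dist is a hypothetical helper that puts the mean within-group pairwise distance on the diagonal and the mean between-group distance off the diagonal.

```r
# Hypothetical helper: mean within-group (diagonal) and between-group
# (off-diagonal) pairwise distances for a given cluster identity vector.
group_mean_dist <- function(groups, d) {
  m <- as.matrix(d)
  ids <- sort(unique(groups))
  out <- matrix(NA_real_, length(ids), length(ids),
                dimnames = list(as.character(ids), as.character(ids)))
  for (a in seq_along(ids)) {
    for (b in seq_along(ids)) {
      block <- m[groups == ids[a], groups == ids[b], drop = FALSE]
      if (a == b) {
        # within a group, use each unordered pair once and skip the diagonal
        if (nrow(block) > 1) out[a, b] <- mean(block[upper.tri(block)])
      } else {
        out[a, b] <- mean(block)
      }
    }
  }
  out
}

# Four points on a line, forming two tight pairs that are far apart
d <- dist(c(0, 1, 10, 11))
res <- group_mean_dist(c(1, 1, 2, 2), d)
```

A group with a single member has no within-group pairs, so its diagonal entry is left as NA in this sketch.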
A Monte Carlo method for calculating a null/random clustering solution based on the type and arrangement of a known clustering solution.
rclust.null(groups, dist)
groups |
cluster identity vector |
dist |
original pairwise distance matrix |
This function calculates a random/null clustering solution based on a given solution. It resamples the data and reassigns samples to groups, keeping the same group sizes.
A matrix equal in rows to the number of clusters originally given, with two columns for mean within group distance and standard deviation respectively.
Matthew Vavrek
To cluster the data, use the function rclust; see as well rclust.weights, rclust.dist
#a null solution for the fdata example data set
data(fdata.mat)
fd.dist <- dino.dist(fdata.mat)
fd.clust <- rclust(fd.dist, 2)
rclust.null(fd.clust, fd.dist)
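The resampling idea can be sketched in base R as a label permutation: shuffling the cluster identity vector keeps the group sizes fixed while reassigning samples at random. null_within_dist and its runs argument are hypothetical; the number of resamples and the exact statistics used by rclust.null are not stated here, so treat this as an illustration only.

```r
# Hypothetical helper: permutation null for mean within-group distance.
# Shuffling the labels preserves group sizes but randomises membership.
null_within_dist <- function(groups, d, runs = 100) {
  m <- as.matrix(d)
  ids <- sort(unique(groups))
  within_mean <- function(g) {
    sapply(ids, function(id) {
      block <- m[g == id, g == id, drop = FALSE]
      mean(block[upper.tri(block)])   # each unordered pair once
    })
  }
  sims <- replicate(runs, within_mean(sample(groups)))
  sims <- matrix(sims, nrow = length(ids))   # one row per cluster
  cbind(mean = rowMeans(sims), sd = apply(sims, 1, sd))
}

set.seed(1)
d <- dist(c(0, 1, 10, 11))
null_res <- null_within_dist(c(1, 1, 2, 2), d, runs = 50)
```

The output has one row per cluster and two columns, mirroring the mean and standard deviation layout described above.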
A method to cluster a number of samples using a relational (dissimila
rclust.weights(groups, dist)
groups |
cluster identity vector |
dist |
original pairwise distance matrix |
This function creates an n by c sized matrix, where n is the number of samples and c is the number of groups, of the average distances from each sample to all the members of each group.
A matrix of size n (samples) by c (groups).
Matthew Vavrek
To cluster the data, use the function rclust; see as well rclust.dist, rclust.null
data(fdata.mat)
fd.dist <- dino.dist(fdata.mat)
fd.clust <- rclust(fd.dist, 2)
rclust.weights(fd.clust, fd.dist)
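As an illustration of the n by c layout, the base R sketch below averages the distances from each sample to the members of each group. sample_group_dist is a hypothetical helper, and excluding a sample from the average for its own group is an assumption rather than documented behaviour of rclust.weights.

```r
# Hypothetical helper: n (samples) by c (groups) matrix of the mean distance
# from each sample to the members of each group, leaving out the sample itself.
sample_group_dist <- function(groups, d) {
  m <- as.matrix(d)
  ids <- sort(unique(groups))
  out <- sapply(ids, function(id) {
    members <- groups == id
    sapply(seq_len(nrow(m)), function(i) {
      others <- members
      others[i] <- FALSE   # assumption: a sample is not compared with itself
      if (any(others)) mean(m[i, others]) else NA_real_
    })
  })
  colnames(out) <- as.character(ids)
  out
}

d <- dist(c(0, 1, 10, 11))
w <- sample_group_dist(c(1, 1, 2, 2), d)
```

Each row belongs to one sample, so a small value in a column suggests the sample sits close to that group.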
A method to cluster a number of samples using a relational (dissimila
relational.clustering(dist, clusters = 2)
dist |
pairwise distance matrix |
clusters |
number of clusters required |
This function will calculate the shortest distance (portion of a Great Circle) in kilometers between two points on the Earth given their latitude and longitude.
Arc distance between two points on the Earth's surface in kilometers.
The distance calculated may be up to 0.2% inaccurate, as this function treats the Earth as a sphere with a circumference of 40003 km, rather than as the ellipsoid it actually is.
Matthew Vavrek
The formulas for the Haversine distance function were taken from the Dr. Math website at http://mathforum.org/library/drmath/view/55417.html
To calculate pairwise distances between a list of points see earth.dist, or to calculate an area enclosed by three points on the Earth's surface, see earth.tri
##distance between 23 degrees N 54 degrees E and 32 degrees S 67 degrees E
deg.dist(23,54,-32,67)
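The great-circle calculation described in the details above can be sketched with the haversine formula on a sphere of circumference 40003 km. haversine_km is a hypothetical helper with its own named arguments; it does not reproduce the argument order or internals of deg.dist.

```r
# Hypothetical helper: haversine great-circle distance in kilometres on a
# sphere whose circumference is 40003 km (the value quoted in the note above).
haversine_km <- function(lat1, long1, lat2, long2, circumference = 40003) {
  to_rad <- pi / 180
  radius <- circumference / (2 * pi)
  dlat <- (lat2 - lat1) * to_rad
  dlong <- (long2 - long1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlong / 2)^2
  2 * radius * asin(min(1, sqrt(a)))   # min() guards against rounding above 1
}

# Equator to the North Pole is a quarter of the circumference
quarter <- haversine_km(lat1 = 0, long1 = 0, lat2 = 90, long2 = 0)
# A point is at zero distance from itself
zero <- haversine_km(23, 54, 23, 54)
```

Because the sphere is only an approximation of the Earth, results carry the small error mentioned in the note above.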
Calculating a species area curve for a set of georeferenced localities
sac(lats, spp)
lats |
a table with a longitude and latitude column respectively as the first two columns, or a SpatialPoints object with longitude/latitude |
spp |
A matrix/data frame of species (rows) by samples/localities (columns) |
This will take a set of geographic coordinates along with a table of species by localities and return a list consisting of a matrix ($areavsspp) with a column of total area and of total species present, and a vector (ranks) with the order the samples were added in. The area is calculated by starting with the most central point and adding the points closest to it, calculating a minimum spanning polygon as each new site is added, until all points are used.
Returns a list of a matrix with columns of total area and total species recorded respectively and a vector of sample orders.
Matthew Vavrek
#fdata species/area relationship
data(fdata.lats)
data(fdata.mat)
a <- sac(fdata.lats, fdata.mat)
plot(log(a$areavsspp))
A function to simulate a species occurrence data set
sim.occ(total.species = 100, endemics = 0.1, regions = 3, locs = 30, avg.abund = 1)
total.species |
The total number of species in the region (i.e. the number of rows in the result matrix) |
endemics |
The proportion of endemic species for the entire region |
regions |
The number of areas of endemicity |
locs |
The number of samples/localities per region of endemicity |
avg.abund |
The 'average' abundance of a species for any given sample |
The function creates a matrix of $c$ rows of species (given by total.species) with $n$ sample columns (where $n$ equals $regions*locs$). The abundance of any species at a given sample is determined by a log normal distribution, with each species being randomly assigned a value from rnorm(). The number of endemics for any given region is equal to $total.species*endemics/regions$. An endemic is considered to occur only within a given region, and all other non-endemic species are considered to be 'cosmopolitan' and can occur in any region. The avg.abund value affects how many species are recovered at a given site, and for any given run there are typically species that are not present in the sample but are present in the region.
Returns a matrix of simulated species abundances per locality.
Matthew Vavrek
## create a dataset with 2 regions and 5 samples per region
sim.occ(regions=2, locs=5)
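The sizes implied by the arguments can be worked out directly from the formulas in the details above. sim_occ_layout is a hypothetical helper that only does this arithmetic; it does not simulate any abundances.

```r
# Hypothetical helper: matrix dimensions and endemics per region implied by
# the sim.occ() arguments, following the formulas given in the details.
sim_occ_layout <- function(total.species = 100, endemics = 0.1,
                           regions = 3, locs = 30) {
  list(
    rows = total.species,                                  # one row per species
    columns = regions * locs,                              # one column per sample
    endemics.per.region = total.species * endemics / regions
  )
}

# Same settings as the example above: 2 regions with 5 samples each
layout <- sim_occ_layout(regions = 2, locs = 5)
```

With the defaults for total.species and endemics, this gives a 100 row by 10 column matrix with 5 endemic species per region.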
Functions to calculate the ecological distance between two groups
braun.blanquet(x, y)
bray.curtis(x, y)
euclidean(x, y)
kulczynski(x, y)
jaccard(x, y)
manhattan(x, y)
morisita.horn(x, y)
ochiai(x, y)
simpson(x, y)
sorenson(x, y)
x |
species from group A |
y |
species from group B |
You must provide two separate vectors, with species arranged in the same order, from area A and B. If species are present in one site but not the other, these must be recorded for both sites; the site where they are not found should be coded as a zero. For details on each index, please consult the references.
Returns the similarity index for the two sites in question.
Matthew Vavrek
Shi, G. R. 1993. Multivariate data analysis in palaeoecology and palaeobiogeography – a review. Palaeogeography, Palaeoclimatology, Palaeoecology 105: 199–234.
Magurran, A. E. 2004. Measuring Biological Diversity. Oxford, Blackwell.
##Species counts from two different locations
a <- c(1,0,4,3,5,0,0,7)
b <- c(2,1,3,0,0,1,0,6)
bray.curtis(a,b)
jaccard(a,b)
simpson(a,b)
sorenson(a,b)
morisita.horn(a,b)
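To show how two of these indices work on paired vectors, the base R sketch below uses common textbook forms of the Bray-Curtis and Jaccard similarities. The function names are hypothetical, and whether the package functions use exactly these forms (or return a similarity rather than a dissimilarity) should be checked against the references.

```r
# Hypothetical helpers using common textbook definitions (assumed forms).
# Bray-Curtis similarity: shared abundance relative to total abundance.
bray_curtis_sim <- function(x, y) {
  2 * sum(pmin(x, y)) / (sum(x) + sum(y))
}

# Jaccard similarity: shared species relative to all species seen at either site.
jaccard_sim <- function(x, y) {
  shared <- sum(x > 0 & y > 0)
  either <- sum(x > 0 | y > 0)
  shared / either
}

a <- c(1, 0, 4, 3, 5, 0, 0, 7)
b <- c(2, 1, 3, 0, 0, 1, 0, 6)
bc <- bray_curtis_sim(a, b)   # 2 * 10 / (20 + 13)
jc <- jaccard_sim(a, b)       # 3 shared species out of 7 present overall
```

Both vectors must list the same species in the same order, with zeros for absences, exactly as required above.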
Estimate the diversity of a sample(s) using a number of species diversity estimators.
spp.est(x, rand = 10, abund = TRUE, counter = FALSE, max.est = 'all')
x |
A vector, matrix or data frame with species as rows and locations/samples as columns |
rand |
The number of times to run the internal randomizations; default is set to 10 |
abund |
Whether the data are abundance (TRUE) or presence/absence (FALSE); default is set to TRUE for abundance |
counter |
Whether or not to provide a running total of progress of randomizations |
max.est |
The value to go up to for the analysis; default is set to the same as the total number of samples |
This function will accept a vector, matrix or data frame of species by samples and return a large matrix with various species estimation values.
Returns a table with the following column names if abund=TRUE:
N.obs |
Total sample size |
S.obs |
Number of observed species |
S.obs(+95%) |
95% upper confidence interval |
S.obs(-95%) |
95% lower confidence interval |
Chao1 |
Chao Species Estimation |
Chao1(upper) |
95% upper confidence interval |
Chao1(lower) |
95% lower confidence interval |
ACE |
Abundance-based Coverage Estimator |
ACE(upper) |
95% upper confidence interval |
ACE(lower) |
95% lower confidence interval |
Jack1 |
First Order Jackknife Estimator |
Jack1(upper) |
95% upper confidence interval |
Jack1(lower) |
95% lower confidence interval |
Returns a table with the following column names if abund=FALSE:
N.obs |
Total sample size |
S.obs |
Number of observed species |
S.obs(+95%) |
95% upper confidence interval |
S.obs(-95%) |
95% lower confidence interval |
Chao2 |
Chao Species Estimation |
Chao2(upper) |
95% upper confidence interval |
Chao2(lower) |
95% lower confidence interval |
ICE |
Incidence-based Coverage Estimator |
ICE(upper) |
95% upper confidence interval |
ICE(lower) |
95% lower confidence interval |
Jack1 |
First Order Jackknife Estimator |
Jack1(upper) |
95% upper confidence interval |
Jack1(lower) |
95% lower confidence interval |
This function can take a long time to run due to its iterative nature. The randomizations are initially set to 10 so the process will run relatively quickly, but a low value for randomizations will not give smooth curves.
Also, in some cases due to the nature of some of the functions, they provide no answer, such as is common with the Chao standard deviation. In this case, the Chao upper and lower bounds are simply 95% confidence intervals based on the actual Chao estimator.
Matthew Vavrek
The original idea for a program similar to this came from the extremely useful EstimateS program by Robert K. Colwell
Colwell, R.K. 2010. EstimateS: Statistical estimation of species richness and shared species from samples. Version 8.2. User's Guide and application published at: http://purl.oclc.org/estimates.
#abundance example with sample data set
data(fdata.mat)
spp.est(fdata.mat, abund = TRUE, counter = FALSE)
#occurrence example with sample data set
data(fdata.mat)
spp.est(fdata.mat, abund = FALSE, counter = FALSE)
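As a reminder of what one of the reported columns represents, the sketch below computes the bias-corrected Chao1 estimator from a single vector of abundances. chao1_sketch is a hypothetical helper; it is not claimed to match the exact variant, the randomizations, or the confidence intervals produced by spp.est.

```r
# Hypothetical helper: bias-corrected Chao1 richness estimate from abundances.
# f1 and f2 are the numbers of species seen exactly once and exactly twice.
chao1_sketch <- function(counts) {
  counts <- counts[counts > 0]   # ignore species that were not observed
  s_obs <- length(counts)
  f1 <- sum(counts == 1)
  f2 <- sum(counts == 2)
  s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
}

# 6 observed species, 3 singletons and 2 doubletons
est <- chao1_sketch(c(1, 1, 1, 2, 2, 5, 0))
# With no singletons the estimate equals the observed richness
no_rare <- chao1_sketch(c(4, 6, 9))
```

The estimate rises above the observed richness only when rare species (singletons) are present, which is the intuition behind the Chao columns in the table above.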
Determines if a distance matrix obeys the triangle inequality
tri.ineq(dist)
dist |
A distance matrix |
Tests if a distance matrix respects the triangle inequality. Often with non-monotonic distance measures and complex data a situation can arise where the triangle inequality (where no single side of a triangle is greater in length than the sum of the other two sides) is not respected.
Returns TRUE if the inequality is respected, and FALSE if there is any situation where the triangle inequality is not respected.
Matthew Vavrek
## sample distance matrix with an impossible triangle
a <- matrix(0.2, 4, 4)
a[4,2] <- 0.8
a <- as.dist(a)
tri.ineq(a)
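The test itself is simple to express in base R: for every triple of points, the direct distance must not exceed the distance via the third point. obeys_triangle is a hypothetical helper written for illustration and may differ from how tri.ineq is implemented.

```r
# Hypothetical helper: TRUE if d[i, j] <= d[i, k] + d[k, j] for all i, j, k.
obeys_triangle <- function(d, tol = 1e-12) {
  m <- as.matrix(d)
  for (k in seq_len(nrow(m))) {
    # via_k[i, j] is the length of the route from i to j through point k
    via_k <- outer(m[, k], m[k, ], "+")
    if (any(m > via_k + tol)) return(FALSE)
  }
  TRUE
}

# Same impossible triangle as the example above: one side of 0.8, the rest 0.2
bad <- matrix(0.2, 4, 4)
bad[4, 2] <- 0.8
bad_result <- obeys_triangle(as.dist(bad))

# A matrix where every distance is 0.2 respects the inequality
good_result <- obeys_triangle(as.dist(matrix(0.2, 4, 4)))
```

The small tolerance avoids false failures caused by floating point rounding when a side exactly equals the sum of the other two.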