Title: | Species-Richness Prediction and Diversity Estimation with R |
---|---|
Description: | Estimation of various biodiversity indices and related (dis)similarity measures based on individual-based (abundance) data or sampling-unit-based (incidence) data taken from one or multiple communities/assemblages. |
Authors: | Anne Chao, K. H. Ma, T. C. Hsieh and Chun-Huo Chiu |
Maintainer: | Anne Chao <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.1.1 |
Built: | 2024-10-31 21:18:22 UTC |
Source: | CRAN |
Provides simple functions to compute various biodiversity indices and related (dis)similarity measures based on individual-based (abundance) data or sampling-unit-based (incidence) data taken from one or multiple communities/assemblages.
This package contains six main functions:
1. ChaoSpecies
(estimating species richness for one community).
2. Diversity
(estimating a continuous diversity profile and various diversity indices in one community including species richness, Shannon
diversity and Simpson diversity). This function also features plots of empirical and estimated continuous diversity profiles.
3. ChaoShared
(estimating the number of shared species between two communities).
4. SimilarityPair
(estimating various similarity indices between two assemblages). Both richness- and abundance-based two-community similarity indices are included.
5. SimilarityMult
(estimating various similarity indices among N communities). Both richness- and abundance-based N-community similarity indices are included.
6. Genetics
(estimating allelic dissimilarity/differentiation among sub-populations based on multiple-subpopulation genetics data).
Except for the Genetics function, each function supports at least three types of data.
Data are generally classified as abundance data or incidence data, and there are five data input formats (datatype = "abundance", "abundance_freq_count", "incidence_freq", "incidence_freq_count", "incidence_raw").
Individual-based abundance data arise when a sample of individuals is taken from each community.
Type (1) abundance data (datatype = "abundance"): Input data consist of a species (in rows) by community (in columns) matrix. The entries of each row are the observed abundances of a species in the communities.
Type (1A) abundance-frequency counts data only for a single community (datatype = "abundance_freq_count"): input data are arranged as $(1\ f_1\ 2\ f_2\ \ldots\ r\ f_r)$ (each number needs to be separated by at least one blank space or separated by rows), where $r$ denotes the maximum frequency and $f_k$ denotes the number of species represented by exactly $k$ individuals/times in the sample. Here the data $(f_1, f_2, \ldots, f_r)$ are referred to as "abundance-frequency counts".
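The abundance-frequency counts layout (each frequency followed by the number of species observed with that frequency) can be derived from a plain abundance vector. The following is a minimal base-R sketch; the vector `abundance` and the name `abu_freq_count` are hypothetical and not part of SpadeR.

```r
# Hypothetical abundance vector for one community (one entry per species).
abundance <- c(5, 3, 3, 1, 1, 1, 2)

# Tabulate how many species occur exactly k times (k = 1, 2, ...).
counts <- table(abundance[abundance > 0])
k   <- as.integer(names(counts))
f_k <- as.integer(counts)

# Interleave k and f_k as (k f_k k f_k ...); frequencies with no species are skipped.
abu_freq_count <- as.vector(rbind(k, f_k))
abu_freq_count
```

Under these assumptions the resulting vector could be passed as the data for datatype = "abundance_freq_count".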
Sampling-unit-based incidence data arise when a number of sampling units are randomly taken from each community. Only the incidence (detection/non-detection) of species is recorded in each sampling unit. There are three data format options.
Type (2) incidence-frequency data (datatype="incidence_freq"): The first row of the input data must be the number of sampling units in each community. Beginning with the second row, input data consist of a species (in rows) by community (in columns) matrix. The entries of each row are the observed incidence frequencies (the number of detections, i.e., the number of sampling units in which a species is detected) of a species in the communities.
Type (2A) incidence-frequency counts data only for a single community (datatype="incidence_freq_count"): input data are arranged as $(T\ 1\ Q_1\ 2\ Q_2\ \ldots\ r\ Q_r)$ (each number needs to be separated by at least one blank space or separated by rows), where $Q_k$ denotes the number of species that were detected in exactly $k$ sampling units, while $r$ denotes the number of sampling units in which the most frequent species were found. The first entry must be the total number of sampling units, $T$. The data $(Q_1, Q_2, \ldots, Q_r)$ are referred to as "incidence frequency counts".
Type (2B) incidence-raw data (datatype="incidence_raw"): Data consist of a species-by-sampling-unit incidence (detection/non-detection) matrix; typically "1" means a detection and "0" means a non-detection. Each row refers to the detection/non-detection record of a species in sampling units. Users must specify the number of sampling units in the function argument "units". The first $T_1$ columns of the input matrix denote species detection/non-detection data based on the $T_1$ sampling units from Community 1, the next $T_2$ columns denote the detection/non-detection data based on the $T_2$ sampling units from Community 2, and so on; the last $T_N$ columns denote the detection/non-detection data based on the $T_N$ sampling units from Community $N$, where $T_1 + T_2 + \cdots + T_N = T$.
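To make the relationship between the incidence-raw and incidence-frequency layouts concrete, the sketch below collapses a small raw matrix into the Type (2) format in base R. The matrix `inci_raw`, the vector `units` and all other names are hypothetical illustrations; this is not how SpadeR processes the data internally.

```r
# Hypothetical raw incidence matrix: 4 species (rows) by 5 sampling units (columns).
# The first 3 columns belong to Community 1 and the last 2 to Community 2.
inci_raw <- matrix(c(1, 0, 1, 1, 0,
                     0, 0, 1, 0, 0,
                     1, 1, 1, 0, 1,
                     0, 0, 0, 1, 1),
                   nrow = 4, byrow = TRUE)
units <- c(3, 2)
stopifnot(sum(units) == ncol(inci_raw))  # column counts must add up

# Label each column with the community it belongs to.
community <- rep(seq_along(units), times = units)

# Incidence frequency = number of sampling units in which a species was detected.
freq <- sapply(seq_along(units), function(j) {
  rowSums(inci_raw[, community == j, drop = FALSE])
})

# Type (2) layout: the first row holds the number of sampling units per community.
inci_freq <- rbind(units, freq)
inci_freq
```

The check on `sum(units)` mirrors the requirement that the column blocks of the raw matrix account for every sampling unit.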
An Online version of SpadeR is also available for users without an R background:
http://chao.stat.nthu.edu.tw/wordpress/software_download/softwarespader_online/.
In the detailed Online SpadeR User's Guide, we illustrate all the running procedures in an easily
accessible way through numerical examples with proper interpretations of portions of the output.
All the data of those illustrative examples are included in this package.
functions: ChaoSpecies, Diversity, ChaoShared, SimilarityPair, SimilarityMult, Genetics
ChaoSpecies
: Estimation of species richness in a single community based on five types of data: Type (1) abundance data (datatype="abundance"), Type (1A) abundance-frequency counts (datatype="abundance_freq_count"), Type (2) incidence-frequency data (datatype="incidence_freq"), Type (2A) incidence-frequency counts (datatype="incidence_freq_count"), and Type (2B) incidence-raw data (datatype="incidence_raw"); see SpadeR-package details for data input formats.
ChaoSpecies(data, datatype = c("abundance", "abundance_freq_count", "incidence_freq", "incidence_freq_count", "incidence_raw"), k = 10, conf = 0.95)
data |
a matrix/data.frame of species abundances/incidences. |
datatype |
type of input data, "abundance", "abundance_freq_count", "incidence_freq", "incidence_freq_count" or "incidence_raw". |
k |
the cut-off point (default = 10), which separates species into "abundant" and "rare" groups for abundance data for the estimator ACE; it separates species into "frequent" and "infrequent" groups for incidence data for the estimator ICE. |
conf |
a positive number |
a list of three objects:
$Basic_data_information and $Rare_species_group/$Infreq_species_group for summarizing data information.
$Species_table for showing a table of various species richness estimates, standard errors, and the associated confidence intervals.
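The role of the cut-off `k` can be illustrated with a few lines of base R. This sketch assumes that "rare" means an abundance of at most `k` and "abundant" means an abundance above `k`; the vector `abundance` is hypothetical, and the code only mimics the grouping, not any estimator.

```r
# Hypothetical abundance vector (one entry per observed species).
abundance <- c(40, 25, 12, 10, 6, 3, 2, 1, 1)
k <- 10  # default cut-off in ChaoSpecies

# Assumption: "rare" means abundance <= k; "abundant" means abundance > k.
rare     <- abundance[abundance <= k]
abundant <- abundance[abundance > k]

group_sizes <- c(n_rare = length(rare), n_abundant = length(abundant))
group_sizes
```

For incidence data the same split would apply to incidence frequencies, giving the "infrequent" and "frequent" groups.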
Chao, A., and Chiu, C. H. (2012). Estimation of species richness and shared species richness. In N. Balakrishnan (ed). Methods and Applications of Statistics in the Atmospheric and Earth Sciences. p.76-111, Wiley, New York.
Chao, A., and Chiu, C. H. (2016). Nonparametric estimation and comparison of species richness. Wiley Online Reference in the Life Science. In: eLS. John Wiley and Sons, Ltd: Chichester. DOI: 10.1002/9780470015902.a0026329.
Chao, A., and Chiu, C. H. (2016). Species richness: estimation and comparison. Wiley StatsRef: Statistics Reference Online. 1-26.
Chiu, C. H., Wang Y. T., Walther B. A. and Chao A. (2014). An improved non-parametric lower bound of species richness via the Good-Turing frequency formulas. Biometrics, 70, 671-682.
Gotelli, N. G. and Chao, A. (2013). Measuring and estimating species richness, species diversity, and biotic similarity from sampling data. Encyclopedia of Biodiversity, 2nd Edition, Vol. 5, 195-211, Waltham, MA.
data(ChaoSpeciesData)
# Type (1) abundance data
ChaoSpecies(ChaoSpeciesData$Abu, "abundance", k = 10, conf = 0.95)
# Type (1A) abundance-frequency counts data
ChaoSpecies(ChaoSpeciesData$Abu_count, "abundance_freq_count", k = 10, conf = 0.95)
# Type (2) incidence-frequency data
ChaoSpecies(ChaoSpeciesData$Inci, "incidence_freq", k = 10, conf = 0.95)
# Type (2A) incidence-frequency counts data
ChaoSpecies(ChaoSpeciesData$Inci_count, "incidence_freq_count", k = 10, conf = 0.95)
# Type (2B) incidence-raw data
ChaoSpecies(ChaoSpeciesData$Inci_raw, "incidence_raw", k = 10, conf = 0.95)
There are five data sets:
1. Type (1) abundance data (ChaoSpeciesData$Abu)
The data consist of the abundances/frequencies of 25 bird species in a sample (Magurran, 1988, p.152). Their observed frequencies are, respectively, 752, 276, 194, 126, 121, 97, 95, 83, 72, 44, 39, 16, 15, 13, 9, 9, 9, 8, 7, 4, 2, 2, 1, 1, 1.
2. Type (1A) abundance-frequency counts data (ChaoSpeciesData$Abu_count)
The data consist of the observed species abundance distribution of endangered and rare vascular plant species in the central portion of the southern Appalachian region (Miller and Wiegert, 1989). A total of 188 species were recorded out of 1008 individuals compiled over a span of 150 years of field observations. The data are read as: (1 61 2 35 3 18 4 12 ... 67 1); each number needs to be separated by at least one blank space or separated by rows. Here the first pair (1, 61) indicates that there are 61 singletons, the second pair (2, 35) indicates that there are 35 doubletons, and so on, with the last pair (67, 1) indicating that there is one species that is represented by 67 individuals.
3. Type (2) incidence-frequency data (ChaoSpeciesData$Inci)
The data include seed-bank records taken from Butler and Chazdon (1998). There were 121 soil samples (each soil sample is regarded as a sampling unit) and species of seedlings that germinated from each soil sample were recorded. A total of 34 species of seedlings were found in the 121 soil samples. In the input data, the entry in the first row denotes the number of sampling units. Then, beginning with the second row, each row records the species incidence frequency (i.e., the number of soil samples in which the seedlings were found) of a given species in all 121 soil samples. The ordering of data entries does not affect the analysis.
4. Type (2A) incidence-frequency counts data (ChaoSpeciesData$Inci_count)
The data consist of cottontail capture-recapture data provided in Edwards and Eberhardt (1967) to illustrate that species richness estimation can be applied to estimate the size of a population. An "individual" animal in capture-recapture studies corresponds to a "species" in the richness estimation. A total of 142 captures were recorded for 76 distinct rabbits in 18 trapping nights. For these data, the incidence frequency counts ($Q_1$ to $Q_7$) were 43, 16, 8, 6, 0, 2, 1. The input data are read as follows: (18 1 43 2 16 3 8 4 6 6 2 7 1); each number needs to be separated by at least one blank space or separated by rows. Here the pair (1, 43) indicates that there are 43 unique species, the next pair (2, 16) indicates that there are 16 duplicate species, and so on.
5. Type (2B) incidence-raw data (ChaoSpeciesData$Inci_raw)
In the cottontail capture-recapture experiments conducted by Edwards and Eberhardt (1967), a total of 76 distinct individuals (regarded as 76 "species") were found in 18 trapping nights. The incidence-raw data consist of a capture/non-capture matrix (where "1" means a capture and "0" means a non-capture) with 76 rows and 18 columns. If we regard this capture-recapture matrix as a species-by-sampling-unit matrix, then species richness estimation can be applied to estimate the size of the cottontail population.
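The cottontail incidence-frequency counts quoted above, (18 1 43 2 16 3 8 4 6 6 2 7 1), can be unpacked in base R to confirm the stated totals of 76 distinct rabbits and 142 captures. As a further illustration, the sketch evaluates the classic Chao2 lower bound, S_obs + Q_1^2 / (2 Q_2), without any small-sample correction; this formula is included only as an example of a richness estimator and is not claimed to reproduce the values reported by ChaoSpecies. All object names are hypothetical.

```r
# Incidence-frequency counts as quoted in the data description:
# first entry = number of sampling units, then (k, Q_k) pairs.
inci_count <- c(18, 1, 43, 2, 16, 3, 8, 4, 6, 6, 2, 7, 1)

n_units <- inci_count[1]
pairs <- matrix(inci_count[-1], ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("k", "Q_k")))
k <- pairs[, "k"]
Q <- pairs[, "Q_k"]

n_observed <- sum(Q)      # distinct rabbits ("species"): 76
n_captures <- sum(k * Q)  # total captures (detections): 142

# Classic Chao2 lower bound, shown for illustration only (assumption:
# no small-sample correction factor is applied here).
Q1 <- Q[k == 1]
Q2 <- Q[k == 2]
chao2_classic <- n_observed + Q1^2 / (2 * Q2)
chao2_classic
```

Note that the pair for k = 5 is absent from the input because no rabbit was captured exactly five times.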
data(ChaoSpeciesData)
Magurran, A. E. (1988). Ecological Diversity and Its Measurement. Princeton University Press, Princeton, New Jersey.
Miller, R. I. and Wiegert, R. G. (1989). Documenting completeness, species-area relations, and the species-abundance distribution of a regional flora. Ecology, 70, 16-22.
Butler, B. J., and Chazdon, R. L. (1998). Species richness, spatial variation, and abundance of the soil seed bank of a secondary tropical rain forest. Biotropica, 30, 214-222.
Edwards, W. R. and Eberhardt, L. (1967). Estimating cottontail abundance from live trapping data. The Journal of Wildlife Management, 31, 87-96.
Diversity
: Estimating a continuous diversity profile in one community (including species richness, Shannon diversity and Simpson diversity). This function also supplies plots of empirical and estimated continuous diversity profiles. Various estimates for Shannon entropy and the Gini-Simpson index are also computed. All five types of data are supported: Type (1) abundance data (datatype="abundance"), Type (1A) abundance-frequency counts (datatype="abundance_freq_count"), Type (2) incidence-frequency data (datatype="incidence_freq"), Type (2A) incidence-frequency counts (datatype="incidence_freq_count"), and Type (2B) incidence-raw data (datatype="incidence_raw"); see SpadeR-package details for data input formats.
Diversity(data, datatype = c("abundance", "abundance_freq_count", "incidence_freq", "incidence_freq_count", "incidence_raw"), q = NULL)
data |
a matrix/data.frame of species abundances/incidences. |
datatype |
type of input data, "abundance", "abundance_freq_count", "incidence_freq", "incidence_freq_count" or "incidence_raw". |
q |
a vector of nonnegative numbers specifying the diversity orders for which Hill numbers will be estimated. If |
a list of seven objects:
$Basic_data for summarizing data information.
$Species_richness for showing various species richness estimates along with related statistics.
$Shannon_index and $Shannon_diversity for showing various Shannon index/diversity estimates.
$Simpson_index and $Simpson_diversity for showing two Simpson index/diversity estimates.
$Hill_numbers for showing Hill number (diversity) estimates of the diversity orders specified in the argument q.
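For orientation, the empirical (plug-in) Hill number of order q is (sum of p_i^q)^(1/(1-q)), with the limit exp(Shannon entropy) at q = 1. The sketch below evaluates this definition in base R; `hill_number` is a hypothetical helper, and it computes only the observed value, not the bias-corrected estimates that Diversity reports.

```r
# Empirical Hill number of order q for a vector of abundances.
# hill_number() is a hypothetical helper, not a SpadeR function.
hill_number <- function(x, q) {
  p <- x[x > 0] / sum(x)            # relative abundances of observed species
  if (isTRUE(all.equal(q, 1))) {
    exp(-sum(p * log(p)))           # limit q -> 1: exponential of Shannon entropy
  } else {
    sum(p^q)^(1 / (1 - q))
  }
}

# A perfectly even community of 4 species has Hill number 4 at every order.
even <- c(10, 10, 10, 10)
even_profile <- sapply(c(0, 1, 2), function(q) hill_number(even, q))

# An uneven community: the profile decreases as q increases.
uneven <- c(8, 1, 1)
uneven_profile <- sapply(c(0, 1, 2), function(q) hill_number(uneven, q))
uneven_profile
```

Order q = 0 gives observed species richness, q = 1 Shannon diversity and q = 2 Simpson diversity, which is why a profile over q summarizes all three.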
Chao, A., and Chiu, C. H. (2012). Estimation of species richness and shared species richness. In N. Balakrishnan (ed). Methods and Applications of Statistics in the Atmospheric and Earth Sciences. p.76-111, Wiley, New York.
Chao, A. and Jost, L. (2015). Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution, 6, 873-882.
Chao, A., Wang, Y. T. and Jost, L. (2013). Entropy and the species accumulation curve: a novel estimator of entropy via discovery rates of new species. Methods in Ecology and Evolution 4, 1091-1110.
## Not run:
data(DiversityData)
# Type (1) abundance data
Diversity(DiversityData$Abu, "abundance", q = c(0, 0.5, 1, 1.5, 2))
# Type (1A) abundance-frequency counts data
Diversity(DiversityData$Abu_count, "abundance_freq_count", q = seq(0, 3, by = 0.5))
# Type (2) incidence-frequency data
Diversity(DiversityData$Inci, "incidence_freq", q = NULL)
# Type (2A) incidence-frequency counts data
Diversity(DiversityData$Inci_freq_count, "incidence_freq_count", q = NULL)
# Type (2B) incidence-raw data
Diversity(DiversityData$Inci_raw, "incidence_raw", q = NULL)
## End(Not run)
There are five data sets:
1. Type (1) abundance data (DiversityData$Abu)
The data include a column of the observed tree abundances/frequencies from an old-growth rain forest in Costa Rica (Chao et al. 2005, 2008). There were 69 tree species among 557 individuals.
2. Type (1A) abundance-frequency counts data (DiversityData$Abu_count)
The data consist of the observed beetle species abundance-frequency counts collected from the Osa old-growth forest site in Costa Rica (Janzen, 1973). There were 112 species among 237 individuals. The input abundance-frequency counts data are arranged as (1 84 2 10 3 4 4 3 ... 42 1); each number needs to be separated by at least one blank space or separated by rows. Here the first pair (1, 84) indicates that there are 84 singletons, the second pair (2, 10) indicates that there are 10 doubletons, and so on, with the last pair (42, 1) indicating that there is one species that is represented by 42 individuals.
3. Type (2) incidence-frequency data (DiversityData$Inci)
The single-column data include the observed incidence-based frequencies of tropical rainforest ants collected by Berlese extraction of soil samples (217 sampling units) in Costa Rica (Longino et al. 2002). In the input data, the entry in the first row denotes the number of sampling units (217); the subsequent 117 rows denote species incidence frequencies of the observed species.
4. Type (2A) incidence-frequency counts data (DiversityData$Inci_freq_count)
The seed-bank data consist of the observed species incidence-based frequency counts of seedlings that germinated from soil samples (Butler and Chazdon, 1998); here each soil sample is regarded as a sampling unit. A total of 34 species of seedlings were found in the 121 soil samples. The incidence frequency counts are read as (121 1 3 2 2 3 3 ... 61 1); each number needs to be separated by at least one blank space or separated by rows. The first entry, indicating that there are 121 soil samples, is followed by the 18 pairs (1, 3), (2, 2), (3, 3), (4, 3), (5, 1), (6, 5), and so on, up to (61, 1). Here (1, 3) indicates that there are 3 unique species, (2, 2) indicates that there are 2 duplicate species, and so on, with (61, 1) indicating that there is one species found in 61 soil samples.
5. Type (2B) incidence-raw data (DiversityData$Inci_raw)
The data consist of raw incidence data of the seed-bank records, described above for the incidence frequency counts data. The input data include a 34 x 121 (species-by-sampling-unit) matrix. For each element of the matrix, "1" means a detection and "0" means a non-detection.
data(DiversityData)
Chao, A., Chazdon, R. L., Colwell, R. K. and Shen, T.-J. (2005). A new statistical approach for assessing similarity of species composition with incidence and abundance data. Ecology Letters, 8, 148-159.
Chao, A., Jost, L., Chiang, S.-C., Jiang, Y.-H. and Chazdon, R. L. (2008). A Two-stage probabilistic approach to multiple-community similarity indices. Biometrics, 64, 1178-1186.
Janzen, D. H. (1973). Sweep samples of tropical foliage insects: description of study sites, with data on species abundances and size distributions. Ecology, 54, 659-686.
Longino, J. T., Coddington, J. A. and Colwell, R. K. (2002). The ant fauna of a tropical rain forest: estimating species richness three different ways. Ecology, 83, 689-702.
Butler, B. J., and Chazdon, R. L. (1998). Species richness, spatial variation, and abundance of the soil seed bank of a secondary tropical rain forest. Biotropica, 30, 214-222.
Genetics
: Estimation of allelic differentiation among subpopulations based on multiple-subpopulation genetics data. The richness-based indices include the classic Jaccard and Sorensen dissimilarity indices; the abundance-based indices include the conventional Gst measure, Horn, Morisita-Horn and regional species-differentiation indices.
Only Type (1) abundance data (datatype="abundance") are supported; input data for each subpopulation include sample frequencies in an empirical sample of individuals. When there are multiple subpopulations, input data consist of an allele-by-subpopulation frequency matrix.
Genetics(X, q = 2, nboot = 200)
X |
a matrix, or a data.frame of allele frequencies. |
q |
a specified order to use to compute pairwise dissimilarity measures. If |
nboot |
an integer specifying the number of bootstrap replications. |
a list of ten objects:
$info for summarizing data information.
$Empirical_richness for showing the observed values of the richness-based dissimilarity indices, including the classic Jaccard and Sorensen indices.
$Empirical_relative for showing the observed values of the equal-weighted dissimilarity indices for comparing allele relative abundances, including Gst, Horn, Morisita-Horn and regional differentiation measures.
$Empirical_WtRelative for showing the observed value of the dissimilarity index for comparing size-weighted allele relative abundances, i.e., the Horn size-weighted measure based on Shannon entropy under equal-effort sampling.
The corresponding three objects for showing the estimated dissimilarity indices are $estimated_richness, $estimated_relative and $estimated_WtRelative.
$pairwise and $dissimilarity.matrix for showing, respectively, the pairwise dissimilarity estimates (with related statistics) and the dissimilarity matrix for various measures, depending on the diversity order q specified in the function argument.
$q for showing which diversity order q was used to compute pairwise dissimilarity.
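As a rough illustration of the richness-based measures, the sketch below computes the observed classic Jaccard and Sorensen dissimilarities between two subpopulations from an allele-by-subpopulation matrix. It assumes that dissimilarity is taken as one minus the corresponding similarity; the matrix `X` is hypothetical, and the code yields plug-in values only, not the estimates produced by Genetics.

```r
# Hypothetical allele-by-subpopulation frequency matrix (4 alleles, 2 subpopulations).
X <- cbind(pop1 = c(5, 0, 3, 2),
           pop2 = c(4, 1, 0, 2))

present <- X > 0
s1     <- sum(present[, "pop1"])                     # alleles seen in pop1
s2     <- sum(present[, "pop2"])                     # alleles seen in pop2
shared <- sum(present[, "pop1"] & present[, "pop2"]) # alleles seen in both

# Classic similarity indices based on presence/absence only.
jaccard_sim  <- shared / (s1 + s2 - shared)
sorensen_sim <- 2 * shared / (s1 + s2)

# Assumption: dissimilarity = 1 - similarity.
jaccard_dis  <- 1 - jaccard_sim
sorensen_dis <- 1 - sorensen_sim
c(jaccard = jaccard_dis, sorensen = sorensen_dis)
```

Because these indices ignore frequencies, two subpopulations sharing the same alleles at very different frequencies would still have zero richness-based dissimilarity, which is what motivates the abundance-based measures.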
Chao, A., and Chiu, C. H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7, 919-928.
Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B. and Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. Plos One, 10:e0125471.
Jost, L. (2008). $G_{ST}$ and its relatives do not measure differentiation. Molecular Ecology, 17, 4015-4026.
## Not run:
# Type (1) abundance data
data(GeneticsDataAbu)
Genetics(GeneticsDataAbu, q = 2, nboot = 200)
## End(Not run)
The data taken from Rosenberg et al. (2002) consist of allele frequencies from four human subpopulations (BiakaPyg, Palestin, Bedouin and Druze). The data are formatted as an allele (row) by subpopulation (column) matrix file. Entries in each row denote the frequencies of each allele found in the four subpopulations. The data include an observed allele frequency table with 27 rows and 4 columns.
data(GeneticsDataAbu)
Rosenberg, N. A., Pritchard, J. K., Weber, J. L., Cann, H. M., Kidd, K. K., Zhivotovsky, L. A. and Feldman, M. W. (2002). Genetic structure of human populations. Science, 298, 2381-2385.
SimilarityMult
: Estimation of various N-community similarity indices. The richness-based indices include the classic N-community Jaccard and Sorensen indices; the abundance-based indices include the Horn, Morisita-Horn, regional species-overlap, and the N-community Bray-Curtis indices.
Three types of data are supported: Type (1) abundance data (datatype="abundance"), Type (2) incidence-frequency data (datatype="incidence_freq"), and Type (2B) incidence-raw data (datatype="incidence_raw"); see SpadeR-package details for data input formats.
SimilarityMult(X, datatype = c("abundance", "incidence_freq", "incidence_raw"), units, q = 2, nboot = 200, goal = "relative")
X |
a matrix/data.frame of species abundances/incidences. |
datatype |
type of input data, "abundance", "incidence_freq" or "incidence_raw". |
units |
number of sampling units in each community. For |
q |
a specified order to use to compute pairwise similarity measures. If |
nboot |
an integer specifying the number of bootstrap replications. |
goal |
a specified estimating goal to use to compute pairwise similarity measures:comparing species relative abundances ( |
a list of fourteen objects:
$datatype for showing the specified data type (abundance or incidence).
$info for summarizing data information.
$Empirical_richness for showing the observed values of the richness-based similarity indices, including the classic N-community Jaccard and Sorensen indices.
$Empirical_relative for showing the observed values of the equal-weighted similarity indices for comparing species relative abundances, including Horn, Morisita-Horn and regional overlap measures.
$Empirical_WtRelative for showing the observed value of the Horn similarity index for comparing size-weighted species relative abundances based on Shannon entropy under equal-effort sampling.
$Empirical_absolute for showing the observed values of the similarity indices for comparing absolute abundances. These measures include the Shannon-entropy-based measure, Morisita-Horn and the regional species-overlap measures based on species absolute abundances, as well as the N-community Bray-Curtis index. All measures are valid only under equal-effort sampling.
The corresponding four objects for showing the estimated similarity indices are $estimated_richness, $estimated_relative, $estimated_WtRelative and $estimated_absolute.
$pairwise and $similarity.matrix for showing, respectively, the pairwise similarity estimates (with related statistics) and the similarity matrix for various measures, depending on the diversity order q and the goal specified in the function arguments.
$goal for showing the goal specified in the argument goal (absolute or relative) used to compute pairwise similarity.
$q for showing which diversity order q was specified to compute pairwise similarity.
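To give a feel for what a pairwise similarity matrix contains, the sketch below builds the observed classic Sorensen similarity for every pair of communities from a species-by-community abundance matrix. The matrix `X` is hypothetical, and this plug-in calculation is only an analogy for the $similarity.matrix output; it does not reproduce the estimators or the order-q measures used by SimilarityMult.

```r
# Hypothetical species-by-community abundance matrix (5 species, 3 communities).
X <- cbind(A = c(3, 0, 1, 2, 0),
           B = c(1, 1, 0, 4, 0),
           C = c(0, 2, 2, 1, 5))

# Presence/absence coded as 0/1 so that cross-products count shared species.
present <- (X > 0) * 1

# shared[i, j] = number of species present in both community i and community j;
# the diagonal holds each community's observed richness.
shared   <- crossprod(present)
richness <- diag(shared)

# Classic pairwise Sorensen similarity: 2 * shared / (S_i + S_j).
sorensen <- 2 * shared / outer(richness, richness, "+")
round(sorensen, 3)
```

The matrix is symmetric with ones on the diagonal, so only the upper or lower triangle carries information.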
Chao, A., and Chiu, C. H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7, 919-928.
Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B. and Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. Plos One, 10:e0125471.
Chiu, C. H., Jost, L. and Chao, A. (2014). Phylogenetic beta diversity, similarity, and differentiation measures based on Hill numbers. Ecological Monographs, 84, 21-44.
Gotelli, N. G. and Chao, A. (2013). Measuring and estimating species richness, species diversity, and biotic similarity from sampling data. Encyclopedia of Biodiversity, 2nd Edition, Vol. 5, 195-211, Waltham, MA.
## Not run:
data(SimilarityMultData)
# Type (1) abundance data
# goal is passed by name; an unnamed "relative" would be matched to the units argument
SimilarityMult(SimilarityMultData$Abu, "abundance", q = 2, nboot = 200, goal = "relative")
# Type (2) incidence-frequency data
SimilarityMult(SimilarityMultData$Inci, "incidence_freq", q = 2, nboot = 200, goal = "relative")
# Type (2B) incidence-raw data
SimilarityMult(SimilarityMultData$Inci_raw, "incidence_raw", units = c(19, 17, 15), q = 2, nboot = 200, goal = "relative")
## End(Not run)
There are three data sets:
1. Type (1) abundance data (SimilarityMultData$Abu)
The data include the observed species frequencies of three communities: seedlings (column 1), saplings (column 2) and trees (column 3) collected from an old-growth rain forest; see Chao et al. (2005, 2008). The three entries in each row are the observed frequency (or abundance) of each species from the three communities.
2. Type (2) incidence-frequency data (SimilarityMultData$Inci)
The data include the observed incidence frequencies of tropical rainforest ants using three sampling techniques: (a) Berlese extraction of soil samples (217 samples), (b) fogging samples from canopy fogging (459 samples), and (c) Malaise trap samples for flying and crawling insects (62 samples). The data were collected in Costa Rica (Longino et al. 2002). The three entries in the first row of the input data denote the number of sampling units (217, 459 and 62). Beginning with the second row, the three numbers in each row denote incidence frequencies (the total number of detections) in the samples based on the three sampling techniques.
3. Type (2B) incidence-raw data (SimilarityMultData$Inci_raw)
The data include the observed soil ciliate species detection/non-detection data for a total of 51 soil samples from three areas of Namibia, Africa: Etosha Pan (19 samples), Central Namib Desert (17 samples) and Southern Namib Desert (15 samples). The raw detection/non-detection data form a 365 x 51 matrix of 0's and 1's (0 denotes a non-detection and 1 denotes a detection).
data(SimilarityMultData)
Chao, A., Chazdon, R. L., Colwell, R. K. and Shen, T.-J. (2005). A new statistical approach for assessing similarity of species composition with incidence and abundance data. Ecology Letters, 8, 148-159.
Chao, A., Jost, L., Chiang, S.-C., Jiang, Y.-H. and Chazdon, R. L. (2008). A Two-stage probabilistic approach to multiple-community similarity indices. Biometrics, 64, 1178-1186.
Longino, J. T., Coddington, J. A. and Colwell, R. K. (2002). The ant fauna of a tropical rain forest: estimating species richness three different ways. Ecology, 83, 689-702.
Foissner, W., Agatha, S. and Berger, H. (2002). Soil Ciliates (Protozoa, Ciliophora) from Namibia (Southwest Africa), with emphasis on two contrasting environments, the Etosha Region and the Namib Desert. Denisia, 5, 1-1459.
SimilarityPair
: Estimation of various similarity indices for two assemblages. The richness-based indices include the classic two-community Jaccard and Sorensen indices; the abundance-based indices include the Horn, Morisita-Horn, regional species-overlap, two-community Bray-Curtis and the abundance-based Jaccard and Sorensen indices. Three types of data are supported: Type (1) abundance data (datatype="abundance"), Type (2) incidence-frequency data (datatype="incidence_freq"), and Type (2B) incidence-raw data (datatype="incidence_raw"); see SpadeR-package details for data input formats.
SimilarityPair(X, datatype = c("abundance", "incidence_freq", "incidence_raw"), units, nboot = 200)
X |
a matrix/data.frame of species abundances/incidences. |
datatype |
type of input data, "abundance", "incidence_freq" or "incidence_raw". |
units |
number of sampling units in each community. For |
nboot |
an integer specifying the number of bootstrap replications. |
a list of ten objects:
$datatype for showing the specified data types (abundance or incidence).
$info for summarizing data information.
$Empirical_richness for showing the observed values of the richness-based similarity indices, including the classic two-community Jaccard and Sorensen indices.
$Empirical_relative for showing the observed values of the equal-weighted similarity indices for comparing species relative abundances, including the Horn, Morisita-Horn, regional overlap, Chao-Jaccard and Chao-Sorensen abundance (or incidence) measures.
$Empirical_WtRelative for showing the observed value of the Horn similarity index for comparing size-weighted species relative abundances based on Shannon entropy under equal-effort sampling.
$Empirical_absolute for showing the observed values of the similarity indices for comparing absolute abundances. These measures include the Shannon-entropy-based measure, the Morisita-Horn and regional overlap measures based on species absolute abundances, and the Bray-Curtis index. All measures are valid only under equal-effort sampling.
The corresponding four objects for showing the estimated similarity indices are: $estimated_richness, $estimated_relative, $estimated_WtRelative and $estimated_Absolute.
Chao, A., Chazdon, R. L., Colwell, R. K. and Shen, T.-J. (2005). A new statistical approach for assessing similarity of species composition with incidence and abundance data. Ecology Letters, 8, 148-159.
Chao, A., and Chiu, C. H. (2016). Bridging the variance and diversity decomposition approaches to beta diversity via similarity and differentiation measures. Methods in Ecology and Evolution, 7, 919-928.
Chao, A., Jost, L., Hsieh, T. C., Ma, K. H., Sherwin, W. B. and Rollins, L. A. (2015). Expected Shannon entropy and Shannon differentiation between subpopulations for neutral genes under the finite island model. PLoS ONE, 10:e0125471.
Chiu, C. H., Jost, L. and Chao, A. (2014). Phylogenetic beta diversity, similarity, and differentiation measures based on Hill numbers. Ecological Monographs, 84, 21-44.
## Not run: 
data(SimilarityPairData)
# Type (1) abundance data
SimilarityPair(SimilarityPairData$Abu, "abundance", nboot = 200)
# Type (2) incidence-frequency data
SimilarityPair(SimilarityPairData$Inci, "incidence_freq", nboot = 200)
# Type (2B) incidence-raw data
SimilarityPair(SimilarityPairData$Inci_raw, "incidence_raw", units = c(19, 17), nboot = 200)
## End(Not run)
There are three data sets:
1. Type (1) abundance data (SimilarityPairData$Abu)
The data include the observed species frequencies of two communities: seedlings (column 1) and trees (column 2) collected from an old-growth rain forest; see Chao et al. (2005, 2008). The two entries in each row are the observed frequencies (or abundances) of a species in the two communities. (These data are a subset of SimilarityMultData$Abu used in the function SimilarityMult.)
2. Type (2) incidence-frequency data (SimilarityPairData$Inci)
The data include the observed incidence frequencies of tropical rainforest ants based on two sampling techniques: (a) Berlese extraction of soil samples (217 samples), and (b) Malaise trap samples for flying and crawling insects (62 samples); see Longino et al. (2002). The two entries in the first row of the input data denote the number of sampling units (217 and 62). Beginning with the second row, the two numbers in each row denote the incidence frequencies (the total number of detections) of a species in the samples from the two sampling techniques. (These data are a subset of SimilarityMultData$Inci used in the function SimilarityMult.)
3. Type (2B) incidence-raw data (SimilarityPairData$Inci_raw)
The data include the observed soil ciliate species detection/non-detection data for a total of 36 soil samples from two areas of Namibia, Africa: Etosha Pan (19 samples) and Central Namib Desert (17 samples). The raw detection/non-detection data form a 365 x 36 matrix of 0's and 1's (0 denotes a non-detection and 1 denotes a detection). (These data are a subset of SimilarityMultData$Inci_raw used in the function SimilarityMult.)
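The relationship between the incidence-raw and incidence-frequency formats described above can be sketched in base R. The example below collapses a small made-up 0/1 matrix into the incidence-frequency layout (first row: number of sampling units; following rows: detections per species). It assumes that the columns of the raw matrix are ordered by community, so that the first `units[1]` columns belong to the first community; the matrix `raw` and the object names are hypothetical.

```r
# Toy incidence-raw data: 4 species (rows) x 5 sampling units (columns)
# Assumed column order: 3 units from community 1, then 2 units from community 2
raw <- matrix(c(1, 0, 1,  0, 1,
                0, 0, 0,  1, 1,
                1, 1, 1,  0, 0,
                0, 1, 0,  0, 0),
              nrow = 4, byrow = TRUE)
units <- c(3, 2)

# The unit counts must account for every column of the raw matrix
stopifnot(sum(units) == ncol(raw))

# Label each column with the community it belongs to
community <- rep(seq_along(units), times = units)

# Incidence frequency = number of sampling units in which a species was detected
freq <- sapply(seq_along(units), function(k) {
  rowSums(raw[, community == k, drop = FALSE])
})

# Incidence-frequency layout: first row holds the number of sampling units
inci_freq <- rbind(units, freq)
print(inci_freq)
```

For the Namibia data, the same check corresponds to 19 + 17 = 36 columns in the raw matrix.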
data(SimilarityPairData)
Chao, A., Chazdon, R. L., Colwell, R. K. and Shen, T.-J. (2005). A new statistical approach for assessing similarity of species composition with incidence and abundance data. Ecology Letters, 8, 148-159.
Chao, A., Jost, L., Chiang, S.-C., Jiang, Y.-H. and Chazdon, R. L. (2008). A two-stage probabilistic approach to multiple-community similarity indices. Biometrics, 64, 1178-1186.
Longino, J. T., Coddington, J. A. and Colwell, R. K. (2002). The ant fauna of a tropical rain forest: estimating species richness three different ways. Ecology, 83, 689-702.
Foissner, W., Agatha, S. and Berger, H. (2002). Soil ciliates (Protozoa, Ciliophora) from Namibia (Southwest Africa), with emphasis on two contrasting environments, the Etosha Region and the Namib Desert. Denisia, 5, 1-1459.