Title: | Statistical Package for Species Richness Estimation |
---|---|
Description: | Implementation of various methods for estimating species richness or diversity, described in Wang (2011) <doi:10.18637/jss.v040.i09>. |
Authors: | Ji-Ping Wang [aut, cre] |
Maintainer: | Ji-Ping Wang <[email protected]> |
License: | GPL-2 |
Version: | 1.2.0 |
Built: | 2024-12-23 06:16:37 UTC |
Source: | CRAN |
SPECIES provides multiple functions to compute popular estimators of species richness. These estimators include:
(1) the jackknife estimator of Burnham and Overton 1978, 1979; (2) the lower-bound estimator of Chao 1984; (3) the coverage-based estimators ACE and ACE-1 of Chao and Lee 1992; (4) the coverage-duplication estimator from the Poisson-Gamma model of Chao and Bunge 2002; (5) the unconditional nonparametric maximum likelihood estimator of Norris and Pollock 1996, 1998; (6) the penalized nonparametric maximum likelihood estimator of Wang and Lindsay 2005; and (7) the Poisson-compound Gamma model with smooth nonparametric maximum likelihood estimation of Wang 2010.
Functions: chao1984, ChaoBunge, ChaoLee1992, jackknife, pcg, pnpmle, unpmle. Data: butterfly, cottontail, EST, insects, microbial, traffic.
Ji-Ping Wang, Department of Statistics, Northwestern University
Maintainer: [email protected]
Acinas, S., Klepac-Ceraj, V., Hunt, D., Pharino, C., Ceraj, I., Distel, D., and Polz, M. (2004), Fine-scale phylogenetic architecture of a complex bacterial community. Nature, 430, 551-554.
Bohning, D., and Schon, D. (2005), Nonparametric maximum likelihood estimation of population size based on the counting distribution, Journal of the Royal Statistical Society, Series C: Applied Statistics, 54, 721-737.
Burnham, K. P., and Overton, W. S. (1978), Estimation of the Size of a Closed Population When Capture Probabilities Vary Among Animals, Biometrika, 65, 625-633.
Burnham, K. P., and Overton, W. S. (1979), Robust Estimation of Population Size When Capture Probabilities Vary Among Animals, Ecology, 60, 927-936.
Chao, A. (1984), Nonparametric Estimation of the Number of Classes in a Population, Scandinavian Journal of Statistics, 11, 265-270.
Chao, A. (1987), Estimating the population size for capture-recapture data with unequal catchability, Biometrics, 43, 783-791.
Chao, A., and Lee, S.-M. (1992), Estimating the Number of Classes via Sample Coverage, Journal of the American Statistical Association, 87, 210-217.
Chao, A., and Bunge, J. (2002), Estimating the Number of Species in a Stochastic Abundance Model, Biometrics, 58, 531-539.
Fisher, R. A., Corbet, A. S., and Williams, C. B. (1943), The Relation Between the Number of Species and the Number of Individuals in a Random Sample of an Animal Population, Journal of Animal Ecology, 12, 42-58.
Hong, S. H., Bunge, J., Jeon, S. O., and Epstein, S. (2006), Predicting microbial species richness, Proc. Natl. Acad. Sci., 103, 117-122.
Norris, J. L. I., and Pollock, K. H. (1996), Nonparametric MLE Under Two Closed Capture-Recapture Models With Heterogeneity, Biometrics, 52, 639-649.
Norris, J. L. I., and Pollock, K. H. (1998), Non-Parametric MLE for Poisson Species Abundance Models Allowing for Heterogeneity Between Species, Environmental and Ecological Statistics, 5, 391-402.
Simar, L. (1976), Maximum likelihood estimation of a compound Poisson process, Annals of Statistics, 4, 1200-1209.
Wang, J.-P. Z., and Lindsay, B. G. (2005), A penalized nonparametric maximum likelihood approach to species richness estimation, Journal of the American Statistical Association, 100(471), 942-959.
Wang, J.-P., and Lindsay, B. G. (2008), An exponential partial prior for improving NPML estimation for mixtures, Statistical Methodology, 5, 30-45.
Wang, J.-P. (2010), Estimating the species richness by a Poisson-Compound Gamma model, Biometrika, 97(3), 727-740.
Wang, J.-P. (2011), SPECIES: An R Package for Species Richness Estimation, Journal of Statistical Software, 40(9), 1-15, URL: http://www.jstatsoft.org/v40/i09/.

    ## load the library
    library(SPECIES)
    ## "butterfly" is the well-known butterfly data of Fisher 1943
    data(butterfly)
    ## jackknife method
    jackknife(butterfly, k = 5)
    ## ACE coverage method only
    ChaoLee1992(butterfly, t = 10, method = "ACE")
    ## Chao 1984 lower-bound estimator
    chao1984(butterfly)
    ## Chao and Bunge coverage-duplication method
    ChaoBunge(butterfly, t = 10)
    ## remove the leading # to run the computationally heavier methods below
    ## penalized NPMLE method
    # pnpmle(butterfly, t = 15, C = 1, b = 200)
    ## unconditional NPMLE method
    # unpmle(butterfly, t = 10, C = 1, b = 200)
    ## Poisson-compound Gamma method
    # pcg(butterfly, t = 20, C = 1, b = 200)
Fisher's well-known butterfly data originally appeared in Fisher 1943 and have since been re-analyzed in many publications.
Fisher, R. A., Corbet, A. S., and Williams, C. B. (1943), The Relation Between the Number of Species and the Number of Individuals in a Random Sample of an Animal Population, Journal of Animal Ecology, 12, 42-58.

    ## load the library
    library(SPECIES)
    ## load the data that come with the package
    data(butterfly)
This function calculates the lower-bound estimator by Chao 1984.
chao1984(n,conf=0.95)
n |
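a matrix or a numerical data frame of two columns, also called the “frequency of frequencies” data in the literature. The first column is the frequency j = 1, 2, 3, ..., and the second column is n_j, the number of species observed with j individuals in the sample. |
conf |
a positive number less than 1 specifying the confidence level of the confidence interval. The default is 0.95. |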
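To use raw data as the n argument, first tabulate the per-species counts into frequencies of frequencies. The base-R sketch below does this. freq_of_freq() is a hypothetical helper, not a SPECIES function, and the abundance vector is made up for illustration.

```r
## Hypothetical helper (not part of SPECIES): tabulate per-species
## abundances into a two-column "frequency of frequencies" matrix.
freq_of_freq <- function(abundance) {
  abundance <- abundance[abundance > 0]   # species never observed carry no count
  tab <- table(abundance)                 # number of species at each distinct frequency
  cbind(j = as.integer(names(tab)),       # column 1: frequency j
        n_j = as.vector(tab))             # column 2: species observed exactly j times
}

## Made-up abundances for 8 observed species
counts <- c(1, 1, 1, 2, 2, 3, 5, 5)
n <- freq_of_freq(counts)
n
```

Frequencies that no species attains (here j = 4) are omitted from the table rather than stored as zero rows.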
The function chao1984 returns a list of Nhat, SE and CI.
Nhat |
point estimate. |
SE |
standard error of the point estimate. |
CI |
confidence interval using a log transformation explained in Chao 1987. |
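The textbook form of the estimator is Nhat = S + f1^2 / (2 f2). Here S is the number of observed species, and f1 and f2 are the numbers of species seen once and twice. The base-R sketch below computes this estimate with the Chao 1987 variance and log-transformed interval. Keep in mind the following:
- chao1984_sketch() is a hypothetical name, not the package's code.
- It assumes f1 > 0 and f2 > 0.
- It may differ from chao1984() in how edge cases are handled.

```r
## Sketch of the Chao (1984) lower-bound estimator with the Chao (1987)
## variance and log-transformed confidence interval.
## Hypothetical function, not the SPECIES implementation; assumes f1 > 0 and f2 > 0.
chao1984_sketch <- function(n, conf = 0.95) {
  j  <- n[, 1]                       # frequencies
  nj <- n[, 2]                       # number of species observed j times
  S  <- sum(nj)                      # number of observed species
  f1 <- sum(nj[j == 1])              # singletons
  f2 <- sum(nj[j == 2])              # doubletons
  if (f1 == 0 || f2 == 0) stop("this sketch requires f1 > 0 and f2 > 0")
  Nhat <- S + f1^2 / (2 * f2)
  r    <- f1 / f2
  varN <- f2 * (r^4 / 4 + r^3 + r^2 / 2)   # variance estimate from Chao (1987)
  z    <- qnorm(1 - (1 - conf) / 2)
  f0   <- Nhat - S                         # estimated number of unseen species
  K    <- exp(z * sqrt(log(1 + varN / f0^2)))
  list(Nhat = Nhat, SE = sqrt(varN), CI = c(S + f0 / K, S + f0 * K))
}

## Toy data: 10 singletons, 5 doubletons, 3 tripletons (S = 18)
n <- cbind(c(1, 2, 3), c(10, 5, 3))
chao1984_sketch(n)$Nhat   # 18 + 10^2 / (2 * 5) = 28
```

The log transformation keeps the lower confidence bound above the observed count S. A symmetric interval around Nhat does not guarantee this.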
Ji-Ping Wang, Department of Statistics, Northwestern University
Chao, A. (1984), Nonparametric Estimation of the Number of Classes in a Population, Scandinavian Journal of Statistics, 11, 265-270.
Chao, A. (1987), Estimating the population size for capture-recapture data with unequal catchability, Biometrics, 43, 783-791.

    library(SPECIES)
    ## load data from the package;
    ## "butterfly" is the well-known butterfly data of Fisher 1943
    data(butterfly)
    chao1984(butterfly)
This function calculates the coverage-duplication based estimator from a Poisson-Gamma model of Chao and Bunge 2002.
ChaoBunge(n, t = 10,conf = 0.95)
n |
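a matrix or a numerical data frame of two columns, also called the “frequency of frequencies” data in the literature. The first column is the frequency j = 1, 2, 3, ..., and the second column is n_j, the number of species observed with j individuals in the sample. |
t |
a positive integer: the cutoff value that defines the relatively less abundant species (frequencies up to t) used in the estimation. The default is 10. |
conf |
a positive number less than 1 specifying the confidence level of the confidence interval. The default is 0.95. |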
The function ChaoBunge returns a list of Nhat, SE and CI.
Nhat |
point estimate. |
SE |
standard error(s) of the point estimate. |
CI |
confidence interval using a log transformation explained in Chao 1987. |
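The point estimate can be sketched using the coverage-duplication form as we understand it from Chao and Bunge 2002:
- Species with frequencies above t are counted as observed.
- The rare species seen at least twice are divided by theta = 1 - f1 * sum(j^2 n_j) / (sum(j n_j))^2, with both sums over j ≤ t.

chao_bunge_sketch() is hypothetical and omits the standard error. Check it against the paper and ChaoBunge() before relying on it.

```r
## Sketch of the Chao-Bunge (2002) coverage-duplication point estimate.
## Hypothetical function, not the SPECIES implementation; no standard error.
chao_bunge_sketch <- function(n, t = 10) {
  j  <- n[, 1]
  nj <- n[, 2]
  rare    <- j <= t                          # relatively less abundant species
  S_abund <- sum(nj[!rare])                  # abundant species are taken as observed
  S_rare  <- sum(nj[rare])
  f1      <- sum(nj[j == 1])
  theta   <- 1 - f1 * sum(j[rare]^2 * nj[rare]) / sum(j[rare] * nj[rare])^2
  if (theta <= 0) stop("estimated theta is not positive; the estimator is not defined")
  S_abund + (S_rare - f1) / theta
}

## Toy data: frequencies 1, 2, 3 and 12 with 10, 5, 3 and 2 species
n <- cbind(c(1, 2, 3, 12), c(10, 5, 3, 2))
chao_bunge_sketch(n, t = 10)
```

theta can fall to zero or below when the data are far from the Poisson-Gamma model. The sketch then stops rather than return a meaningless value.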
Ji-Ping Wang, Department of Statistics, Northwestern University
Chao, A. (1984), Nonparametric Estimation of the Number of Classes in a Population, Scandinavian Journal of Statistics, 11, 265-270.
Chao, A., and Bunge, J. (2002), Estimating the Number of Species in a Stochastic Abundance Model, Biometrics, 58, 531-539.

    library(SPECIES)
    ## load data from the package;
    ## "butterfly" is the well-known butterfly data of Fisher 1943
    data(butterfly)
    ## coverage-duplication estimate using cutoff t = 10
    ChaoBunge(butterfly, t = 10)
This function calculates the ACE and ACE-1 estimators of Chao and Lee 1992 (ACE-1 provides a further bias correction based on ACE).
ChaoLee1992(n, t = 10, method = "all",conf = 0.95)
n |
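a matrix or a numerical data frame of two columns, also called the “frequency of frequencies” data in the literature. The first column is the frequency j = 1, 2, 3, ..., and the second column is n_j, the number of species observed with j individuals in the sample. |
t |
a positive integer: the cutoff value that defines the relatively less abundant species (frequencies up to t) used in the estimation. The default is 10. |
method |
a string. It can be any one of “ACE”, “ACE-1”, or “all”. The default is “all”. |
conf |
a positive number less than 1 specifying the confidence level of the confidence interval. The default is 0.95. |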
The function ChaoLee1992 returns a list of Nhat, SE and CI.
Nhat |
point estimate(s) of the specified method. If the default method = “all” is used, estimates from both ACE and ACE-1 are returned. |
SE |
standard error(s) of the point estimate(s). |
CI |
confidence interval using a log transformation explained in Chao 1987. |
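The ACE point estimate splits species at the cutoff t:
- Abundant species (frequency above t) are counted as observed.
- Rare species are scaled by the estimated sample coverage C = 1 - f1 / n_rare.
- The rare-species term is then adjusted by an estimated squared coefficient of variation, gamma^2.

The sketch below follows the ACE formula of Chao and Lee 1992 as it is commonly stated. ace_sketch() is hypothetical and is not the package's code. It does not implement ACE-1 or the standard error.

```r
## Sketch of the ACE point estimate (Chao and Lee 1992).
## Hypothetical function, not the SPECIES implementation.
ace_sketch <- function(n, t = 10) {
  j  <- n[, 1]
  nj <- n[, 2]
  rare    <- j <= t
  S_abund <- sum(nj[!rare])                   # abundant species, taken as observed
  S_rare  <- sum(nj[rare])                    # rare species
  n_rare  <- sum(j[rare] * nj[rare])          # individuals among rare species
  f1      <- sum(nj[j == 1])
  C_hat   <- 1 - f1 / n_rare                  # estimated sample coverage of rare species
  if (C_hat <= 0) stop("all rare species are singletons; ACE is not defined")
  gamma2  <- max(S_rare / C_hat *
                   sum(j[rare] * (j[rare] - 1) * nj[rare]) / (n_rare * (n_rare - 1)) - 1, 0)
  S_abund + S_rare / C_hat + f1 / C_hat * gamma2
}

## Toy data: frequencies 1, 2, 3 and 12 with 10, 5, 3 and 2 species
n <- cbind(c(1, 2, 3, 12), c(10, 5, 3, 2))
ace_sketch(n, t = 10)
```

With this toy table, gamma^2 is truncated at zero, so ACE reduces to S_abund + S_rare / C_hat.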
Ji-Ping Wang, Department of Statistics, Northwestern University
Chao, A. (1987), Estimating the population size for capture-recapture data with unequal catchability, Biometrics, 43, 783-791.
Chao, A., and Lee, S.-M. (1992), Estimating the Number of Classes via Sample Coverage, Journal of the American Statistical Association, 87, 210-217.

    library(SPECIES)
    ## load data from the package;
    ## "butterfly" is the well-known butterfly data of Fisher 1943
    data(butterfly)
    ## estimates from both ACE and ACE-1 using cutoff t = 10
    ChaoLee1992(butterfly, t = 10, method = "all")
    ## estimate from the ACE method only using cutoff t = 10
    ChaoLee1992(butterfly, t = 10, method = "ACE")
The cottontail data were analyzed in Chao 1987.
Chao, A. (1987), Estimating the population size for capture-recapture data with unequal catchability, Biometrics, 43, 783-791.

    ## load the library
    library(SPECIES)
    ## load the data that come with the package
    data(cottontail)
The Arabidopsis thaliana expressed sequence tag (EST) data originally appeared in Wang and Lindsay 2005 and were reanalyzed in Wang 2010. For convenience, the count at the largest listed frequency denotes the total number of species with that frequency or higher.
Wang, J.-P. Z., and Lindsay, B. G. (2005), A penalized nonparametric maximum likelihood approach to species richness estimation, Journal of the American Statistical Association, 100(471), 942-959.

    ## load the library
    library(SPECIES)
    ## load the data that come with the package
    data(EST)
The insects data were analyzed in Burnham and Overton 1979. The count at the largest listed frequency denotes the total number of species with that frequency or higher.
Burnham, K. P., and Overton, W. S. (1979), Robust Estimation of Population Size When Capture Probabilities Vary Among Animals, Ecology, 60, 927-936.

    ## load the library
    library(SPECIES)
    ## load the data that come with the package
    data(insects)
A function implementing the jackknife estimator of the species number by Burnham and Overton 1978 and 1979.
jackknife(n, k = 5, conf = 0.95)
n |
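a matrix or a numerical data frame of two columns, also called the “frequency of frequencies” data in the literature. The first column is the frequency j = 1, 2, 3, ..., and the second column is n_j, the number of species observed with j individuals in the sample. |
k |
a positive integer. The default is 5. |
conf |
a positive number less than 1 specifying the confidence level of the confidence interval. The default is 0.95. |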
The function jackknife returns a list of JackknifeOrder, Nhat, SE and CI.
JackknifeOrder |
the order of the jackknife estimator, either specified by the user or determined by the testing procedure. |
Nhat |
jackknife estimate. |
SE |
standard error of the jackknife estimate. |
CI |
confidence interval of the jackknife estimate. |
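For frequency-count data, the k-th order jackknife is often written in its limiting form for many sampling occasions: Nhat_k = S + sum over i = 1..k of (-1)^(i+1) choose(k, i) f_i. This gives S + f1 for k = 1 and S + 2 f1 - f2 for k = 2. The sketch below assumes that form. jackknife_sketch() is hypothetical and computes a single fixed order. It does not reproduce the order-selection test, standard error, or confidence interval of jackknife().

```r
## Sketch of the k-th order jackknife estimate in its limiting form
## (assumed here; not taken from the SPECIES source).
jackknife_sketch <- function(n, k = 1) {
  j  <- n[, 1]
  nj <- n[, 2]
  S  <- sum(nj)                                           # observed species
  f  <- sapply(seq_len(k), function(i) sum(nj[j == i]))  # f_1, ..., f_k (0 if absent)
  coef <- (-1)^(seq_len(k) + 1) * choose(k, seq_len(k))  # k, -choose(k, 2), ...
  S + sum(coef * f)
}

## Toy data: frequencies 1, 2, 3 and 12 with 10, 5, 3 and 2 species (S = 20)
n <- cbind(c(1, 2, 3, 12), c(10, 5, 3, 2))
sapply(1:3, function(k) jackknife_sketch(n, k))   # 30 35 38
```

Higher orders reduce bias but increase variance. This trade-off is why the package selects the order by a testing procedure rather than always using the largest k.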
Ji-Ping Wang, Department of Statistics, Northwestern University
Burnham, K. P., and Overton, W. S. (1978), Estimation of the Size of a Closed Population When Capture Probabilities Vary Among Animals, Biometrika, 65, 625-633.
Burnham, K. P., and Overton, W. S. (1979), Robust Estimation of Population Size When Capture Probabilities Vary Among Animals, Ecology, 60, 927-936.

    library(SPECIES)
    ## load data from the package;
    ## "butterfly" is the well-known butterfly data of Fisher 1943
    data(butterfly)
    jackknife(butterfly, k = 5)
The microbial species data originally appeared in Acinas et al. 2004 and were re-analyzed by Bohning and Schon 2005 and by Wang 2009.
Acinas, S., Klepac-Ceraj, V., Hunt, D., Pharino, C., Ceraj, I., Distel, D., and Polz, M. (2004), Fine-scale phylogenetic architecture of a complex bacterial community. Nature, 430, 551-554.
Hong, S. H., Bunge, J., Jeon, S. O., and Epstein, S. (2006), Predicting microbial species richness, Proc. Natl. Acad. Sci., 103, 117-122.

    ## load the library
    library(SPECIES)
    ## load the data that come with the package
    data(microbial)
This function calculates the Poisson-compound Gamma estimator of the species number of Wang 2010. The method is essentially a conditional NPMLE method in which species abundance is assumed to follow a compound Gamma model. The confidence interval is obtained from a bootstrap procedure. The computation is done by a Fortran routine, so a Fortran compiler must be installed.
pcg(n,t=35,C=0,alpha=c(1:10),b=200,seed=NULL,conf=0.95,dis=1)
n |
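a matrix or a numerical data frame of two columns, also called the “frequency of frequencies” data in the literature. The first column is the frequency j = 1, 2, 3, ..., and the second column is n_j, the number of species observed with j individuals in the sample. |
t |
a positive integer: the cutoff value that defines the relatively less abundant species (frequencies up to t) used in the estimation. The default is 35. |
C |
integer, either 0 or 1, specifying whether the bootstrap confidence interval should be calculated: “C = 1” computes it and “C = 0” does not. The default is 0. |
b |
integer: the number of bootstrap samples used to compute the confidence interval. The default is 200. |
alpha |
a grid of positive values for the Gamma shape parameter. The default is 1:10. |
conf |
a positive number less than 1 specifying the confidence level of the confidence interval. The default is 0.95. |
seed |
a single value, interpreted as an integer, used as the seed for random number generation. The default is NULL. |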
dis |
0 or 1. 1 for on-screen display of the mixture output, and 0 for none. |
The pcg estimator is computationally intensive; computing the bootstrap confidence interval may take up to a few hours.
The function pcg returns a list of Nhat, CI (if “C = 1”) and AlphaModel.
Nhat |
point estimate of N. |
CI |
bootstrap confidence interval. |
AlphaModel |
unified shape parameter of compound Gamma selected from cross-validation. |
Ji-Ping Wang, Department of Statistics, Northwestern University
Wang, J.-P. (2010), Estimating the species richness by a Poisson-Compound Gamma model, Biometrika, 97(3), 727-740.

    library(SPECIES)
    ## load data from the package;
    ## "butterfly" is the well-known butterfly data of Fisher 1943
    data(butterfly)
    ## pcg() is computationally intensive; remove the leading # to run
    ## estimate without confidence interval, using cutoff t = 20
    # pcg(butterfly, t = 20, C = 0, alpha = c(1:10))
    ## estimate with bootstrap confidence interval, using cutoff t = 20
    # pcg(butterfly, t = 20, C = 1, alpha = c(1:10), b = 200)
This function calculates the penalized conditional NPML estimator of the species number of Wang and Lindsay 2005. The estimator is based on the conditional likelihood of a Poisson mixture model, with a penalty term introduced to prevent the boundary problem discussed in Wang and Lindsay 2008. The confidence interval is calculated with a bootstrap procedure. The computation is done by a Fortran routine.
pnpmle(n,t=15,C=0,b=200,seed=NULL,conf=0.95,dis=1)
n |
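a matrix or a numerical data frame of two columns, also called the “frequency of frequencies” data in the literature. The first column is the frequency j = 1, 2, 3, ..., and the second column is n_j, the number of species observed with j individuals in the sample. |
t |
a positive integer: the cutoff value that defines the relatively less abundant species (frequencies up to t) used in the estimation. The default is 15. |
C |
integer, either 0 or 1, specifying whether the bootstrap confidence interval should be calculated: “C = 1” computes it and “C = 0” does not. The default is 0. |
b |
integer: the number of bootstrap samples used to compute the confidence interval. The default is 200. |
conf |
a positive number less than 1 specifying the confidence level of the confidence interval. The default is 0.95. |
seed |
a single value, interpreted as an integer, used as the seed for random number generation. The default is NULL. |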
dis |
0 or 1. 1 for on-screen display of the mixture output, and 0 for none. |
The function pnpmle returns a list of Nhat and CI (if “C = 1”).
Nhat |
point estimate of N. |
CI |
bootstrap confidence interval |
Ji-Ping Wang, Department of Statistics, Northwestern University
Wang, J.-P. Z., and Lindsay, B. G. (2005), A penalized nonparametric maximum likelihood approach to species richness estimation, Journal of the American Statistical Association, 100(471), 942-959.
Wang, J.-P., and Lindsay, B. G. (2008), An exponential partial prior for improving NPML estimation for mixtures, Statistical Methodology, 5, 30-45.

    library(SPECIES)
    ## load data from the package;
    ## "butterfly" is the well-known butterfly data of Fisher 1943
    data(butterfly)
    ## remove the leading # to run the calls below
    ## estimate without confidence interval, using cutoff t = 15
    # pnpmle(butterfly, t = 15, C = 0)
    ## estimate with bootstrap confidence interval, using cutoff t = 15
    # pnpmle(butterfly, t = 15, C = 1, b = 200)
The traffic data originally appeared in Simar 1976, where the total number N is known to be 9461. They were re-analyzed by Bohning and Schon 2005.
Simar, L. (1976), Maximum likelihood estimation of a compound Poisson process, Annals of Statistics, 4, 1200-1209.
Bohning, D., and Schon, D. (2005), Nonparametric maximum likelihood estimation of population size based on the counting distribution, Journal of the Royal Statistical Society, Series C: Applied Statistics, 54, 721-737.

    ## load the library
    library(SPECIES)
    ## load the data that come with the package
    data(traffic)
    chao1984(traffic)
This function calculates the unconditional NPML estimator of the species number of Norris and Pollock 1996, 1998. The estimator is obtained from the full likelihood of a Poisson mixture model. The confidence interval is calculated with a bootstrap procedure.
unpmle(n,t=15,C=0,method="W-L",b=200,conf=.95,seed=NULL,dis=1)
n |
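a matrix or a numerical data frame of two columns, also called the “frequency of frequencies” data in the literature. The first column is the frequency j = 1, 2, 3, ..., and the second column is n_j, the number of species observed with j individuals in the sample. |
t |
a positive integer: the cutoff value that defines the relatively less abundant species (frequencies up to t) used in the estimation. The default is 15. |
C |
integer, either 0 or 1, specifying whether the bootstrap confidence interval should be calculated: “C = 1” computes it and “C = 0” does not. The default is 0. |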
method |
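a string, either “N-P” or “W-L” (the default). If method = “N-P”, the estimate is computed with the algorithm of Norris and Pollock 1996, 1998; if method = “W-L”, with the algorithm of Wang and Lindsay 2005. |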
b |
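integer: the number of bootstrap samples used to compute the confidence interval. The default is 200. |
conf |
a positive number less than 1 specifying the confidence level of the confidence interval. The default is 0.95. |
seed |
a single value, interpreted as an integer, used as the seed for random number generation. The default is NULL. |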
dis |
0 or 1. 1 for on-screen display of the mixture output, and 0 for none. |
Computation is intensive when method = “N-P” is used, particularly when the extrapolation is large, and computing the bootstrap confidence interval may take hours. With method = “W-L”, computation is usually much faster. The estimates from the two methods are often identical.
The function unpmle returns a list of Nhat and CI (if “C = 1”).
Nhat |
point estimate of N |
CI |
bootstrap confidence interval. |
The unconditional NPML estimator is unstable with either method = 'N-P' or method = 'W-L', and extremely large estimates may occur. This instability also shows in the upper confidence bound, which often varies greatly between runs of the bootstrap procedure. In contrast, the penalized NPMLE computed by the pnpmle function is much more stable.
Ji-Ping Wang, Department of Statistics, Northwestern University
Norris, J. L. I., and Pollock, K. H. (1996), Nonparametric MLE Under Two Closed Capture-Recapture Models With Heterogeneity, Biometrics, 52, 639-649.
Norris, J. L. I., and Pollock, K. H. (1998), Non-Parametric MLE for Poisson Species Abundance Models Allowing for Heterogeneity Between Species, Environmental and Ecological Statistics, 5, 391-402.
Bohning, D., and Schon, D. (2005), Nonparametric maximum likelihood estimation of population size based on the counting distribution, Journal of the Royal Statistical Society, Series C: Applied Statistics, 54, 721-737.
Wang, J.-P. Z., and Lindsay, B. G. (2005), A penalized nonparametric maximum likelihood approach to species richness estimation, Journal of the American Statistical Association, 100(471), 942-959.

    library(SPECIES)
    ## load data from the package;
    ## "butterfly" is the well-known butterfly data of Fisher 1943
    data(butterfly)
    ## remove the leading # to run the calls below
    ## estimate without confidence interval, using cutoff t = 15
    # unpmle(butterfly, t = 15, C = 0)
    ## estimate with bootstrap confidence interval, using cutoff t = 15
    # unpmle(butterfly, t = 15, C = 1, b = 200)