Title: Estimation of Entropy and Related Quantities
Description: Contains methods for the estimation of Shannon's entropy, variants of Renyi's entropy, mutual information, Kullback-Leibler divergence, and generalized Simpson's indices. The estimators used have a bias that decays exponentially fast.
Authors: Lijuan Cao [aut], Michael Grabchak [aut, cre]
Maintainer: Michael Grabchak <[email protected]>
License: GPL (>= 3)
Version: 1.2.1
Built: 2024-12-14 06:22:36 UTC
Source: CRAN
Contains methods for the estimation of Shannon's entropy, variants of Renyi's entropy, mutual information, Kullback-Leibler divergence, and generalized Simpson's indices. These estimators have a bias that decays exponentially fast. For more information see Z. Zhang and J. Zhou (2010), Zhang (2012), Zhang (2013), Zhang and Grabchak (2013), Zhang and Grabchak (2014a), Zhang and Grabchak (2014b), and Zhang and Zheng (2014).
Package: EntropyEstimation
Type: Package
Version: 1.2.1
Date: 2024-09-14
License: GPL (>= 3)
Lijuan Cao <[email protected]> and Michael Grabchak <[email protected]>
Z. Zhang (2012). Entropy estimation in Turing's perspective. Neural Computation 24(5), 1368–1389.
Z. Zhang (2013). Asymptotic normality of an entropy estimator with asymptotically decaying bias. IEEE Transactions on Information Theory 59(1), 504–508.
Z. Zhang and M. Grabchak (2013). Bias Adjustment for a Nonparametric Entropy Estimator. Entropy, 15(6), 1999-2011.
Z. Zhang and M. Grabchak (2014a). Entropic representation and estimation of diversity indices. http://arxiv.org/abs/1403.3031.
Z. Zhang and M. Grabchak (2014b). Nonparametric Estimation of Kullback-Leibler Divergence. Neural Computation, 26(11): 2570-2593.
Z. Zhang and L. Zheng (2014). A Mutual Information Estimator with Exponentially Decaying Bias.
Z. Zhang and J. Zhou (2010). Re-parameterization of multinomial distributions and diversity indices. Journal of Statistical Planning and Inference 140(7), 1731-1738.
Returns the estimated asymptotic standard deviation for the Z estimator of Shannon's Entropy. Note that this is also the asymptotic standard deviation of the plug-in estimator. See Zhang and Grabchak (2014a) for details.
Entropy.sd(x)
x: Vector of counts. Must be integer valued. Each entry represents the number of observations of a distinct letter.
Lijuan Cao and Michael Grabchak
Z. Zhang and M. Grabchak (2014a). Entropic representation and estimation of diversity indices. http://arxiv.org/abs/1403.3031.
x = c(1,3,7,4,8)  # vector of counts
Entropy.sd(x)     # Estimated standard deviation
data = rbinom(10,20,.5)
counts = tabulate(as.factor(data))
Entropy.sd(counts)
Returns the Z estimator of Shannon's Entropy. This estimator has exponentially decaying bias. See Zhang (2012), Zhang (2013), and Zhang and Grabchak (2014a) for details.
Entropy.z(x)
x: Vector of counts. Must be integer valued. Each entry represents the number of observations of a distinct letter.
Lijuan Cao and Michael Grabchak
Z. Zhang (2012). Entropy estimation in Turing's perspective. Neural Computation 24(5), 1368–1389.
Z. Zhang (2013). Asymptotic normality of an entropy estimator with asymptotically decaying bias. IEEE Transactions on Information Theory 59(1), 504–508.
Z. Zhang and M. Grabchak (2014a). Entropic representation and estimation of diversity indices. http://arxiv.org/abs/1403.3031.
x = c(1,3,7,4,8)
Entropy.z(x)
data = rbinom(10,20,.5)
counts = tabulate(as.factor(data))
Entropy.z(counts)
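To make the formula concrete, the following is a plain-Python sketch of the estimator as stated in Zhang (2012), H_z = sum_{v=1}^{n-1} (1/v) sum_k (y_k/n) prod_{j=1}^{v} (1 - (y_k-1)/(n-j)), alongside the ordinary plug-in estimate for comparison. The function names are illustrative; this is a reading aid, not the package's implementation:

```python
import math

def entropy_z(counts):
    """Sketch of Zhang's (2012) entropy estimator:
    H_z = sum_{v=1}^{n-1} (1/v) sum_k (y_k/n) prod_{j=1}^{v} (1 - (y_k-1)/(n-j))."""
    counts = [y for y in counts if y > 0]
    n = sum(counts)
    total = 0.0
    for y in counts:
        prod = 1.0
        for v in range(1, n):                  # v = 1, ..., n-1
            prod *= 1.0 - (y - 1) / (n - v)    # running product over j = 1..v
            if prod == 0.0:                    # later terms all vanish
                break
            total += prod * (y / n) / v
    return total

def entropy_plugin(counts):
    """Ordinary plug-in (maximum-likelihood) entropy estimate, for comparison."""
    n = sum(counts)
    return -sum((y / n) * math.log(y / n) for y in counts if y > 0)
```

On a single-letter sample the estimator is exactly 0, and on nearly uniform counts both estimates sit close to log(K), with the Z form correcting the plug-in's downward bias.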
Returns the estimated asymptotic standard deviation of the Z estimator of the generalized Simpson's index of order r, i.e. of the index sum_k p_k(1-p_k)^r. This estimate of the standard deviation is based on the formula in Zhang and Grabchak (2014a) and not the one in Zhang and Zhou (2010).
GenSimp.sd(x, r)
x: Vector of counts. Must be integer valued. Each entry represents the number of observations of a distinct letter.
r: Positive integer representing the order of the generalized Simpson's index. If a noninteger value is given then the integer part is taken. Must be strictly less than sum(x).
Lijuan Cao and Michael Grabchak
Z. Zhang and M. Grabchak (2014a). Entropic representation and estimation of diversity indices. http://arxiv.org/abs/1403.3031.
Z. Zhang and J. Zhou (2010). Re-parameterization of multinomial distributions and diversity indices. Journal of Statistical Planning and Inference 140(7), 1731-1738.
x = c(1,3,7,4,8)
GenSimp.sd(x,2)
data = rbinom(10,20,.5)
counts = tabulate(as.factor(data))
GenSimp.sd(counts,2)
Returns the Z estimator of the generalized Simpson's index of order r, i.e. of the index sum_k p_k(1-p_k)^r. See Zhang and Zhou (2010) and Zhang and Grabchak (2014a) for details.
GenSimp.z(x, r)
x: Vector of counts. Must be integer valued. Each entry represents the number of observations of a distinct letter.
r: Positive integer representing the order of the generalized Simpson's index. If a noninteger value is given then the integer part is taken. Must be strictly less than sum(x).
Lijuan Cao and Michael Grabchak
Z. Zhang and M. Grabchak (2014a). Entropic representation and estimation of diversity indices. http://arxiv.org/abs/1403.3031.
Z. Zhang and J. Zhou (2010). Re-parameterization of multinomial distributions and diversity indices. Journal of Statistical Planning and Inference 140(7), 1731-1738.
x = c(1,3,7,4,8)
GenSimp.z(x,2)
data = rbinom(10,20,.5)
counts = tabulate(as.factor(data))
GenSimp.z(counts,2)
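The index being estimated is zeta_r = sum_k p_k (1 - p_k)^r. Below is a plain-Python sketch of the plug-in estimate and of the product-form Z estimator as we read it from Zhang and Zhou (2010); treat the exact form as an assumption and defer to the package for real use:

```python
def gensimp_plugin(counts, r):
    """Plug-in estimate of zeta_r = sum_k p_k (1 - p_k)^r."""
    n = sum(counts)
    return sum((y / n) * (1.0 - y / n) ** r for y in counts)

def gensimp_z(counts, r):
    """Sketch of the product-form Z estimator of zeta_r; requires r < sum(counts)."""
    n = sum(counts)
    assert 0 < r < n
    total = 0.0
    for y in counts:
        prod = 1.0
        for j in range(1, r + 1):
            prod *= 1.0 - (y - 1) / (n - j)
        total += (y / n) * prod
    return total
```

For a single observed letter both estimates are 0; otherwise the index lies strictly between 0 and 1.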
Returns the estimated asymptotic standard deviation for the Z estimator of Hill's diversity number. Note that this is also the asymptotic standard deviation of the plug-in estimator. See Zhang and Grabchak (2014a) for details.
Hill.sd(x, r)
x: Vector of counts. Must be integer valued. Each entry represents the number of observations of a distinct letter.
r: Order of Hill's diversity number. Must be a strictly positive real number.
Lijuan Cao and Michael Grabchak
Z. Zhang and M. Grabchak (2014a). Entropic representation and estimation of diversity indices. http://arxiv.org/abs/1403.3031.
x = c(1,3,7,4,8)
Hill.sd(x,2)
data = rbinom(10,20,.5)
counts = tabulate(as.factor(data))
Hill.sd(counts,2)
Returns the Z estimator of Hill's diversity number. This is based on raising the Z estimator of Renyi's equivalent entropy to the 1/(1-r) power. When r=1 it returns exp(H), where H is the Z estimator of Shannon's entropy. See Zhang and Grabchak (2014a) for details.
Hill.z(x, r)
x: Vector of counts. Must be integer valued. Each entry represents the number of observations of a distinct letter.
r: Order of Renyi's equivalent entropy this index is based on. Must be a strictly positive real number.
Lijuan Cao and Michael Grabchak
Z. Zhang and M. Grabchak (2014a). Entropic representation and estimation of diversity indices. http://arxiv.org/abs/1403.3031.
x = c(1,3,7,4,8)
Hill.z(x,2)
data = rbinom(10,20,.5)
counts = tabulate(as.factor(data))
Hill.z(counts,2)
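Since Hill's diversity number of order r is (sum_k p_k^r)^(1/(1-r)), with exp(Shannon entropy) as the r=1 limit, a plug-in version is easy to write down. The sketch below (illustrative Python, not the Z estimator) shows the quantity this page estimates:

```python
import math

def hill_plugin(counts, r):
    """Plug-in Hill diversity number: (sum_k p_k^r)^(1/(1-r)); exp(Shannon) at r = 1."""
    n = sum(counts)
    p = [y / n for y in counts if y > 0]
    if r == 1:
        return math.exp(-sum(q * math.log(q) for q in p))
    return sum(q ** r for q in p) ** (1.0 / (1.0 - r))
```

For a uniform distribution over K letters, every order r gives the Hill number K, which is the "effective number of species" interpretation of the index.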
Returns the augmented plug-in estimator of Kullback-Leibler divergence. See Zhang and Grabchak (2014b) for details.
KL.Plugin(x, y)
x: Vector of counts from the first distribution. Must be integer valued. Each entry represents the number of observations of a distinct letter.
y: Vector of counts from the second distribution. Must be integer valued. Each entry represents the number of observations of a distinct letter.
Lijuan Cao and Michael Grabchak
Z. Zhang and M. Grabchak (2014b). Nonparametric Estimation of Kullback-Leibler Divergence. Neural Computation, 26(11): 2570-2593.
x = c(1,3,7,4,8)
y = c(2,5,1,3,6)
KL.Plugin(x,y)
KL.Plugin(y,x)
Returns the estimated asymptotic standard deviation for the Z estimator of Kullback-Leibler divergence. Note that this is also the asymptotic standard deviation of the plug-in estimator. See Zhang and Grabchak (2014b) for details.
KL.sd(x, y)
x: Vector of counts from the first distribution. Must be integer valued. Each entry represents the number of observations of a distinct letter.
y: Vector of counts from the second distribution. Must be integer valued. Each entry represents the number of observations of a distinct letter.
Lijuan Cao and Michael Grabchak
Z. Zhang and M. Grabchak (2014b). Nonparametric Estimation of Kullback-Leibler Divergence. Neural Computation, 26(11): 2570-2593.
x = c(1,3,7,4,8)  # first vector of counts
y = c(2,5,1,3,6)  # second vector of counts
KL.sd(x,y)        # Estimated standard deviation
KL.sd(y,x)        # Estimated standard deviation
Returns the Z estimator of Kullback-Leibler Divergence, which has exponentially decaying bias. See Zhang and Grabchak (2014b) for details.
KL.z(x, y)
x: Vector of counts from the first distribution. Must be integer valued. Each entry represents the number of observations of a distinct letter.
y: Vector of counts from the second distribution. Must be integer valued. Each entry represents the number of observations of a distinct letter.
Lijuan Cao and Michael Grabchak
Z. Zhang and M. Grabchak (2014b). Nonparametric Estimation of Kullback-Leibler Divergence. Neural Computation, 26(11): 2570-2593.
x = c(1,3,7,4,8)
y = c(2,5,1,3,6)
KL.z(x,y)
KL.z(y,x)
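For orientation, the quantity being estimated is KL(p || q) = sum_k p_k log(p_k / q_k). The sketch below is the ordinary plug-in estimate in illustrative Python; note that KL.Plugin uses an augmented plug-in and KL.z a bias-corrected estimator, so neither matches this simple form exactly:

```python
import math

def kl_plugin_ordinary(x, y):
    """Ordinary plug-in estimate of KL(p || q) from two count vectors over
    the same alphabet (NOT the package's augmented plug-in)."""
    nx, ny = sum(x), sum(y)
    total = 0.0
    for xi, yi in zip(x, y):
        if xi > 0:
            if yi == 0:
                return math.inf  # undefined when q_k = 0 but p_k > 0
            total += (xi / nx) * math.log((xi / nx) / (yi / ny))
    return total
```

The divergence is 0 exactly when the empirical distributions agree, strictly positive otherwise, and infinite when x puts mass where y saw nothing, which is why the augmented plug-in modifies the second distribution.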
Returns the estimated asymptotic standard deviation for the Z estimator of mutual information. Note that this is also the asymptotic standard deviation of the plug-in estimator. See Zhang and Zheng (2014) for details.
MI.sd(y)
y: Matrix of counts. Must be integer valued. Each entry represents the number of observations of a distinct combination of letters from the two alphabets.
Lijuan Cao and Michael Grabchak
Z. Zhang and L. Zheng (2014). A Mutual Information Estimator with Exponentially Decaying Bias.
x = matrix(c(0, 0, 0, 1, 1, 0, 0, 0, 0, 0,
             0, 0, 0, 1, 0, 0, 1, 1, 0, 1,
             0, 0, 0, 2, 1, 0, 1, 0, 0, 1,
             0, 0, 0, 1, 1, 2, 0, 0, 0, 0,
             0, 0, 0, 3, 6, 2, 2, 0, 0, 0,
             2, 0, 2, 5, 6, 5, 1, 0, 0, 0,
             0, 0, 4, 6, 11, 5, 1, 1, 0, 1,
             0, 0, 5, 10, 21, 7, 5, 1, 0, 1,
             0, 0, 7, 11, 9, 6, 3, 0, 0, 1,
             0, 0, 4, 10, 6, 5, 1, 0, 0, 0), 10, 10, byrow=TRUE)
MI.sd(x)
x = rbinom(100,20,.5)
y = rbinom(100,20,.5)
MI.sd(table(x,y))
Returns the Z estimator of Mutual Information. This estimator has exponentially decaying bias. See Zhang and Zheng (2014) for details.
MI.z(x)
x: Matrix of counts. Must be integer valued. Each entry represents the number of observations of a distinct combination of letters from the two alphabets.
Lijuan Cao and Michael Grabchak
Z. Zhang and L. Zheng (2014). A Mutual Information Estimator with Exponentially Decaying Bias.
x = matrix(c(0, 0, 0, 1, 1, 0, 0, 0, 0, 0,
             0, 0, 0, 1, 0, 0, 1, 1, 0, 1,
             0, 0, 0, 2, 1, 0, 1, 0, 0, 1,
             0, 0, 0, 1, 1, 2, 0, 0, 0, 0,
             0, 0, 0, 3, 6, 2, 2, 0, 0, 0,
             2, 0, 2, 5, 6, 5, 1, 0, 0, 0,
             0, 0, 4, 6, 11, 5, 1, 1, 0, 1,
             0, 0, 5, 10, 21, 7, 5, 1, 0, 1,
             0, 0, 7, 11, 9, 6, 3, 0, 0, 1,
             0, 0, 4, 10, 6, 5, 1, 0, 0, 0), 10, 10, byrow=TRUE)
MI.z(x)
x = rbinom(100,20,.5)
y = rbinom(100,20,.5)
MI.z(table(x,y))
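Mutual information of a joint count matrix compares each cell probability with the product of its marginals: MI = sum_{ij} p_ij log(p_ij / (p_i. p_.j)). A plug-in sketch in illustrative Python (MI.z applies the bias correction this simple version lacks):

```python
import math

def mi_plugin(table):
    """Plug-in mutual information (in nats) from a matrix of joint counts."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, c in enumerate(row):
            if c > 0:
                # p_ij / (p_i. * p_.j) simplifies to c * n / (row_tot * col_tot)
                mi += (c / n) * math.log(c * n / (row_tot[i] * col_tot[j]))
    return mi
```

A table whose cells are proportional to the product of its margins has MI = 0 (independence), while a diagonal table over two equally likely letters attains the maximum log(2).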
Returns the estimated asymptotic standard deviation for the Z estimator of Renyi's Entropy. Note that this is also the asymptotic standard deviation of the plug-in estimator. See Zhang and Grabchak (2014a) for details.
Renyi.sd(x, r)
x: Vector of counts. Must be integer valued. Each entry represents the number of observations of a distinct letter.
r: Order of Renyi's entropy. Must be a strictly positive real number other than 1; for r = 1 use Entropy.sd instead.
Lijuan Cao and Michael Grabchak
Z. Zhang and M. Grabchak (2014a). Entropic representation and estimation of diversity indices. http://arxiv.org/abs/1403.3031.
x = c(1,3,7,4,8)
Renyi.sd(x,2)
data = rbinom(10,20,.5)
counts = tabulate(as.factor(data))
Renyi.sd(counts,2)
Returns the Z estimator of Renyi's entropy. This is based on taking the log of the Z estimator of Renyi's equivalent entropy and dividing by (1-r). When r=1 returns the Z estimator of Shannon's entropy. See Zhang and Grabchak (2014a) for details.
Renyi.z(x, r)
x: Vector of counts. Must be integer valued. Each entry represents the number of observations of a distinct letter.
r: Order of Renyi's equivalent entropy this index is based on. Must be a strictly positive real number.
Lijuan Cao and Michael Grabchak
Z. Zhang and M. Grabchak (2014a). Entropic representation and estimation of diversity indices. http://arxiv.org/abs/1403.3031.
x = c(1,3,7,4,8)
Renyi.z(x,2)
data = rbinom(10,20,.5)
counts = tabulate(as.factor(data))
Renyi.z(counts,2)
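Renyi's entropy of order r != 1 is log(sum_k p_k^r) / (1 - r). A plug-in sketch in illustrative Python, showing the quantity Renyi.z estimates (without the Z bias correction):

```python
import math

def renyi_plugin(counts, r):
    """Plug-in Renyi entropy of order r != 1: log(sum_k p_k^r) / (1 - r)."""
    n = sum(counts)
    power_sum = sum((y / n) ** r for y in counts if y > 0)
    return math.log(power_sum) / (1.0 - r)
```

For a uniform distribution over K letters this equals log(K) at every order r, matching the Shannon value recovered in the r -> 1 limit.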
Returns the estimated asymptotic standard deviation for the Z estimator of Renyi Equivalent Entropy. Note that this is also the asymptotic standard deviation of the plug-in estimator. When r=1, returns 0. See Zhang and Grabchak (2014a) for details.
RenyiEq.sd(x, r)
x: Vector of counts. Must be integer valued. Each entry represents the number of observations of a distinct letter.
r: Order of Renyi's equivalent entropy. Must be a strictly positive real number.
Lijuan Cao and Michael Grabchak
Z. Zhang and M. Grabchak (2014a). Entropic representation and estimation of diversity indices. http://arxiv.org/abs/1403.3031.
x = c(1,3,7,4,8)
RenyiEq.sd(x,2)
data = rbinom(10,20,.5)
counts = tabulate(as.factor(data))
RenyiEq.sd(counts,2)
Returns the Z estimator of Renyi's equivalent entropy. This estimator has exponentially decaying bias. When r=1 returns 1. See Zhang and Grabchak (2014a) for details.
RenyiEq.z(x, r)
x: Vector of counts. Must be integer valued. Each entry represents the number of observations of a distinct letter.
r: Order of Renyi's equivalent entropy. Must be a strictly positive real number.
Lijuan Cao and Michael Grabchak
Z. Zhang and M. Grabchak (2014a). Entropic representation and estimation of diversity indices. http://arxiv.org/abs/1403.3031.
x = c(1,3,7,4,8)
RenyiEq.z(x,2)
data = rbinom(10,20,.5)
counts = tabulate(as.factor(data))
RenyiEq.z(counts,2)
Returns the augmented plug-in estimator of symmetrized Kullback-Leibler divergence. See Zhang and Grabchak (2014b) for details.
SymKL.Plugin(x, y)
x: Vector of counts from the first distribution. Must be integer valued. Each entry represents the number of observations of a distinct letter.
y: Vector of counts from the second distribution. Must be integer valued. Each entry represents the number of observations of a distinct letter.
Lijuan Cao and Michael Grabchak
Z. Zhang and M. Grabchak (2014b). Nonparametric Estimation of Kullback-Leibler Divergence. Neural Computation, 26(11): 2570-2593.
x = c(1,3,7,4,8)  # first vector of counts
y = c(2,5,1,3,6)  # second vector of counts
SymKL.Plugin(x,y) # Estimated symmetrized KL divergence
Returns the estimated asymptotic standard deviation for the Z estimator of symmetrized Kullback-Leibler divergence. Note that this is also the asymptotic standard deviation of the plug-in estimator. See Zhang and Grabchak (2014b) for details.
SymKL.sd(x, y)
x: Vector of counts from the first distribution. Must be integer valued. Each entry represents the number of observations of a distinct letter.
y: Vector of counts from the second distribution. Must be integer valued. Each entry represents the number of observations of a distinct letter.
Lijuan Cao and Michael Grabchak
Z. Zhang and M. Grabchak (2014b). Nonparametric Estimation of Kullback-Leibler Divergence. Neural Computation, 26(11): 2570-2593.
x = c(1,3,7,4,8)  # first vector of counts
y = c(2,5,1,3,6)  # second vector of counts
SymKL.sd(x,y)     # Estimated standard deviation
Returns the Z estimator of symmetrized Kullback-Leibler divergence, which has exponentially decaying bias. See Zhang and Grabchak (2014b) for details.
SymKL.z(x, y)
x: Vector of counts from the first distribution. Must be integer valued. Each entry represents the number of observations of a distinct letter.
y: Vector of counts from the second distribution. Must be integer valued. Each entry represents the number of observations of a distinct letter.
Lijuan Cao and Michael Grabchak
Z. Zhang and M. Grabchak (2014b). Nonparametric Estimation of Kullback-Leibler Divergence. Neural Computation, 26(11): 2570-2593.
x = c(1,3,7,4,8)
y = c(2,5,1,3,6)
SymKL.z(x,y)
Returns the estimated asymptotic standard deviation for the Z estimator of Tsallis Entropy. Note that this is also the asymptotic standard deviation of the plug-in estimator. See Zhang and Grabchak (2014a) for details.
Tsallis.sd(x, r)
x: Vector of counts. Must be integer valued. Each entry represents the number of observations of a distinct letter.
r: Order of Tsallis entropy. Must be a strictly positive real number other than 1; for r = 1 use Entropy.sd instead.
Lijuan Cao and Michael Grabchak
Z. Zhang and M. Grabchak (2014a). Entropic representation and estimation of diversity indices. http://arxiv.org/abs/1403.3031.
x = c(1,3,7,4,8)
Tsallis.sd(x,2)
data = rbinom(10,20,.5)
counts = tabulate(as.factor(data))
Tsallis.sd(counts,2)
Returns the Z estimator of Tsallis entropy. This is based on scaling and shifting the Z estimator of Renyi's equivalent entropy. When r=1 returns the Z estimator of Shannon's entropy. See Zhang and Grabchak (2014a) for details.
Tsallis.z(x, r)
x: Vector of counts. Must be integer valued. Each entry represents the number of observations of a distinct letter.
r: Order of Renyi's equivalent entropy this index is based on. Must be a strictly positive real number.
Lijuan Cao and Michael Grabchak
Z. Zhang and M. Grabchak (2014a). Entropic representation and estimation of diversity indices. http://arxiv.org/abs/1403.3031.
x = c(1,3,7,4,8)
Tsallis.z(x,2)
data = rbinom(10,20,.5)
counts = tabulate(as.factor(data))
Tsallis.z(counts,2)
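Tsallis entropy of order r != 1 is (1 - sum_k p_k^r) / (r - 1). A plug-in sketch in illustrative Python of the target quantity (Tsallis.z adds the bias correction this version lacks):

```python
def tsallis_plugin(counts, r):
    """Plug-in Tsallis entropy of order r != 1: (1 - sum_k p_k^r) / (r - 1)."""
    n = sum(counts)
    power_sum = sum((y / n) ** r for y in counts if y > 0)
    return (1.0 - power_sum) / (r - 1.0)
```

At r = 2 this reduces to 1 - sum_k p_k^2, the Gini-Simpson index; for a uniform distribution over 4 letters that is 1 - 1/4 = 0.75.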