| Title: | Ewens Distribution |
|---|---|
| Description: | Implements the probability mass function of, and random draws from, the Ewens distribution, a probability distribution over partitions of an integer, as described in Ewens (1972) <doi:10.1016/0040-5809(72)90035-4>. |
| Authors: | Chris Hanretty [aut, cre] (ORCID: <https://orcid.org/0000-0002-8932-9405>) |
| Maintainer: | Chris Hanretty <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-05-19 14:04:54 UTC |
| Source: | https://github.com/cran/ewens |
Gives the probability mass function for the Ewens distribution, as described in Ewens, Warren (1972). "The sampling theory of selectively neutral alleles". Theoretical Population Biology. 3: 87–112. doi:10.1016/0040-5809(72)90035-4.
dewens(x, theta = 1, log = FALSE)
| Argument | Description |
|---|---|
| x | A vector giving the class membership of each observation in the sample |
| theta | A non-negative parameter governing the expected sample diversity. |
| log | If TRUE, probabilities are given as log(p). Default is FALSE. |
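The probability of a vector of counts $(a_1, \ldots, a_n)$, where $a_j$ is the number of classes containing exactly $j$ observations and $n$ is the sample size, is given by the expression

$$P(a_1, \ldots, a_n; \theta) = \frac{n!}{\theta(\theta+1)\cdots(\theta+n-1)} \prod_{j=1}^{n} \frac{\theta^{a_j}}{j^{a_j}\, a_j!}$$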
A numeric vector giving a probability (or if log = TRUE, a log probability)
x <- sample(LETTERS, 120, replace = TRUE)
dewens(x, theta = 1)
dewens(x, theta = 0) ## returns NaN since vector incompatible with zero diversity
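As an illustration of the formula that `dewens` is documented to evaluate, the following base R sketch computes the log of the Ewens sampling formula directly from a membership vector. It assumes the standard form of the formula, with $a_j$ the number of classes of size $j$. The helper name `ewens_logpmf_sketch` is hypothetical and not part of the package, and the package's own implementation may differ.

```r
# Hypothetical helper (not part of the package): log Ewens sampling formula.
ewens_logpmf_sketch <- function(x, theta = 1) {
  stopifnot(theta > 0)               # theta = 0 needs special handling, omitted here
  n <- length(x)
  sizes <- as.integer(table(x))      # number of observations in each class
  a <- tabulate(sizes, nbins = n)    # a[j] = number of classes of size j
  j <- seq_len(n)
  # log n! - log rising factorial + sum_j [a_j log(theta) - a_j log(j) - log(a_j!)]
  lfactorial(n) - sum(log(theta + j - 1)) +
    sum(a * log(theta) - a * log(j) - lfactorial(a))
}

# Three observations split into classes of size 2 and 1
exp(ewens_logpmf_sketch(c("a", "a", "b"), theta = 1))
```

With `theta = 1` the three possible partitions of three observations (sizes 1+1+1, 2+1, and 3) have probabilities 1/6, 1/2 and 1/3, so the call above should give 0.5 up to floating-point error.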
Probability mass function for the number of classes from a Ewens distribution
dewens_k(k, n, theta)
| Argument | Description |
|---|---|
| k | An integer number of classes at which to evaluate the PMF |
| n | A sample size not less than k |
| theta | A non-negative parameter governing the expected sample diversity. |
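The distribution of the number of classes $K$ in a sample of size $n$ from a Ewens distribution with parameter $\theta$ is given by the expression

$$P(K = k) = \frac{|s(n, k)|\, \theta^{k}}{\theta(\theta+1)\cdots(\theta+n-1)}$$

where $|s(n, k)|$ is the absolute value of a Stirling number of the first kind.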
The probability of observing k classes
x <- sample(LETTERS, 120, replace = TRUE)
dewens_k(1, 20, theta = 1) ## Pretty unlikely we just see one class
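To make the role of the Stirling numbers concrete, here is a minimal base R sketch of the whole PMF over k = 1, ..., n. It builds the unsigned Stirling numbers of the first kind with the usual recurrence |s(m+1, k)| = m |s(m, k)| + |s(m, k-1)|. The function names `stirling1_abs_sketch` and `k_pmf_sketch` are hypothetical, and the approach works in plain doubles, so it is only suitable for moderate n; it is not a description of how `dewens_k` is implemented.

```r
# Hypothetical helper: |s(n, k)| for k = 1..n via the standard recurrence.
stirling1_abs_sketch <- function(n) {
  s <- 1                                  # |s(1, 1)| = 1
  for (m in seq_len(n - 1)) {
    s <- c(m * s, 0) + c(0, s)            # row m + 1 from row m
  }
  s
}

# Hypothetical helper: P(K = k) for k = 1..n, assuming theta > 0.
k_pmf_sketch <- function(n, theta) {
  k <- seq_len(n)
  stirling1_abs_sketch(n) * theta^k / prod(theta + 0:(n - 1))
}

p <- k_pmf_sketch(20, theta = 1)
p[1]      # probability of a single class in a sample of 20
sum(p)    # the PMF should sum to one
```

For n = 20 and theta = 1 the first entry reduces to 19!/20! = 0.05.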
The expected number of classes from the Ewens distribution is given by $E[K] = \sum_{i=0}^{n-1} \frac{\theta}{\theta + i}$. This is often more convenient than summing over the PMF given by dewens_k.
ewens_k_exact(n, theta)
| Argument | Description |
|---|---|
| n | The sample size |
| theta | The non-negative parameter governing expected sample diversity |
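The expected number of classes can be checked by hand in base R. The sketch below uses the closed-form sum of theta / (theta + i) for i = 0, ..., n - 1, together with the equivalent digamma expression, which holds for theta > 0. The function name `expected_k_sketch` is hypothetical; this is not taken from the package source.

```r
# Hypothetical helper: expected number of classes under the Ewens distribution.
expected_k_sketch <- function(n, theta) {
  sum(theta / (theta + 0:(n - 1)))
}

expected_k_sketch(20, theta = 1)               # the 20th harmonic number, roughly 3.6

# Equivalent form for theta > 0, using the digamma function
theta <- 2.5
n <- 50
theta * (digamma(theta + n) - digamma(theta))
expected_k_sketch(n, theta)
```

With theta = 1 the sum is the harmonic number 1 + 1/2 + ... + 1/n, which is a convenient sanity check.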
Maximum likelihood estimate of theta given a sample vector of class memberships
ewens_mle(x)
| Argument | Description |
|---|---|
| x | A vector containing class memberships; sample size n and number of classes k are calculated from this |
A scalar giving the estimate of theta
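One common way to obtain this estimate, sketched below in base R, relies on the fact that the number of classes k is sufficient for theta, so the maximum likelihood estimate solves "expected number of classes equals observed k". Whether `ewens_mle` uses this route is an assumption; the function name `theta_mle_sketch` and the search interval are hypothetical choices for illustration.

```r
# Hypothetical helper: solve sum_{i=0}^{n-1} theta / (theta + i) = k for theta.
theta_mle_sketch <- function(x) {
  n <- length(x)
  k <- length(unique(x))
  # With k = 1 the estimate is 0, and with k = n it is unbounded, so both
  # boundary cases are excluded from this sketch.
  stopifnot(k > 1, k < n)
  gap <- function(theta) sum(theta / (theta + 0:(n - 1))) - k
  # The search interval is an arbitrary wide bracket chosen for illustration.
  uniroot(gap, interval = c(1e-8, 1e8), tol = 1e-10)$root
}

x <- c(rep(1, 5), rep(2, 3), 3, 4)   # n = 10 observations in k = 4 classes
theta_hat <- theta_mle_sketch(x)
theta_hat
sum(theta_hat / (theta_hat + 0:9))   # should be close to 4
```

The last line plugs the estimate back into the expected-classes formula as a check.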
Draw from a generalized Chinese Restaurant Process
gcrp(n, alpha = 0, theta = 1)
| Argument | Description |
|---|---|
| n | The sample size. |
| alpha | A parameter between zero and one inclusive governing the expected sample diversity |
| theta | A non-negative parameter governing the expected sample diversity. |
A vector of length n consisting of numeric class labels.
rewens(100, 1)
rewens(120, 0.5)
rewens(10, 0)
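For reference, the two-parameter (Pitman-Yor) Chinese Restaurant Process is usually written with the seating probabilities below. This is the textbook form, stated on the assumption that `gcrp` follows it; the package source should be consulted for the exact parameterisation. Here $K$ is the number of occupied tables and $c_i$ the number of customers at table $i$ just before the $j$th customer arrives:

$$
P(\text{new table}) = \frac{\theta + K\alpha}{\theta + j - 1}, \qquad
P(\text{table } i) = \frac{c_i - \alpha}{\theta + j - 1}.
$$

The two expressions sum to one because $\sum_{i=1}^{K} c_i = j - 1$. Setting $\alpha = 0$ gives $\theta / (\theta + j - 1)$ for a new table and $c_i / (\theta + j - 1)$ for an occupied one, which is the one-parameter process.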
Returns a vector with class membership
rewens(n, theta = 1)
| Argument | Description |
|---|---|
| n | The sample size. |
| theta | A non-negative parameter governing the expected sample diversity. |
Although this command is described as sampling from the Ewens distribution, it is easier to think of it as a particular instantiation of the Chinese Restaurant Process, run for n "customers". The $j$th customer
sits at a new table with probability $\theta / (\theta + j - 1)$, or
sits at an occupied table with probability $c / (\theta + j - 1)$,
where $c$ is the number of customers already at that table.
A vector of length n consisting of numeric class labels.
rewens(100, 1)
rewens(120, 0.5)
rewens(10, 0) ## equal to rep(1, 10)
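The seating scheme described above can be sketched in a few lines of base R. This is a minimal illustration of the one-parameter Chinese Restaurant Process, assuming the standard weights (table occupancy for an occupied table, theta for a new one). The function name `crp_sketch` is hypothetical and the code is not taken from the package.

```r
# Hypothetical helper: one-parameter Chinese Restaurant Process.
crp_sketch <- function(n, theta = 1) {
  stopifnot(n >= 1, theta >= 0)
  labels <- integer(n)
  counts <- integer(0)              # customers currently at each occupied table
  for (j in seq_len(n)) {
    k <- length(counts)
    if (k == 0) {
      pick <- 1L                    # the first customer always opens table 1
    } else {
      # Unnormalised weights: occupancy for each existing table, theta for a new one
      pick <- sample.int(k + 1L, size = 1L, prob = c(counts, theta))
    }
    if (pick > k) {
      counts <- c(counts, 1L)       # open a new table
    } else {
      counts[pick] <- counts[pick] + 1L
    }
    labels[j] <- pick
  }
  labels
}

set.seed(1)
table(crp_sketch(100, theta = 1))   # table sizes for one draw
crp_sketch(10, theta = 0)           # everyone joins the first table
```

With `theta = 0` the new-table weight is zero after the first customer, which matches the documented behaviour that `rewens(10, 0)` equals `rep(1, 10)`.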
Draw from the Griffiths-Engen-McCloskey distribution
rgem(alpha = 0, theta = 1, trunc_at = 500)
| Argument | Description |
|---|---|
| alpha | A parameter between zero and one |
| theta | A parameter which must be greater than -alpha |
| trunc_at | An integer which specifies the maximum number of components to return |
The Griffiths-Engen-McCloskey distribution is the infinite-dimensional counterpart to the Ewens sampling distribution. This function does not return an infinite-dimensional vector(!), but a vector of shares created by a "stick-breaking" construction. The vector of shares is returned after trunc_at sticks are broken; this can mean that a non-negligible residual amount remains unallocated.
A vector of shares of length trunc_at which may sum to less than one
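The stick-breaking construction can be sketched in base R as follows. The sketch assumes the standard GEM(alpha, theta) recipe, in which the i-th break is a Beta(1 - alpha, theta + i * alpha) fraction of whatever stick remains, and it requires alpha < 1 so that the first shape parameter is positive. The name `rgem_sketch` is hypothetical, and the package's `rgem` may be implemented differently.

```r
# Hypothetical helper: truncated stick-breaking draw from GEM(alpha, theta).
rgem_sketch <- function(alpha = 0, theta = 1, trunc_at = 500) {
  stopifnot(alpha >= 0, alpha < 1, theta > -alpha, trunc_at >= 1)
  i <- seq_len(trunc_at)
  v <- rbeta(trunc_at, 1 - alpha, theta + i * alpha)   # fraction broken off at step i
  remaining <- cumprod(c(1, 1 - v[-trunc_at]))         # stick left before step i
  v * remaining
}

set.seed(42)
shares <- rgem_sketch(alpha = 0, theta = 1, trunc_at = 500)
sum(shares)                  # close to one for this many breaks
1 - sum(rgem_sketch(alpha = 0.5, theta = 10, trunc_at = 5))   # residual after only 5 breaks
```

The second call illustrates the caveat in the text: with a small `trunc_at` the returned shares can leave a visible part of the stick unallocated.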