Title: | The Poisson-Multinomial Distribution |
---|---|
Description: | Implementation of the exact, normal approximation, and simulation-based methods for computing the probability mass function (pmf) and cumulative distribution function (cdf) of the Poisson-Multinomial distribution, together with a random number generator for the distribution. The exact method is based on the multi-dimensional fast Fourier transform (FFT) of the characteristic function of the Poisson-Multinomial distribution. The normal approximation method uses a multivariate normal distribution to approximate the pmf of the distribution based on the central limit theorem. The simulation method is based on the law of large numbers. Details about the methods are available in Lin, Wang, and Hong (2022) <DOI:10.1007/s00180-022-01299-0>. |
Authors: | Yili Hong [aut, cre], Zhengzhi Lin [aut, ctb], Yueyao Wang [aut, ctb], Florian Junge [aut, ctb] |
Maintainer: | Yili Hong <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.1 |
Built: | 2024-11-25 06:44:18 UTC |
Source: | CRAN |
Computes the pmf of the Poisson-Multinomial distribution (PMD), specified by the success probability matrix, using one of several methods. The function can compute the pmf at all probability mass points or only at user-specified point(s).
dpmd(pmat, xmat = NULL, method = "DFT-CF", B = 1000)
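pmat | An $n \times m$ success probability matrix, where $n$ is the number of independent trials and $m$ is the number of categories. Each row holds the success probabilities for one trial and should sum to 1. |
xmat | A matrix with $m$ columns; each row specifies a point $(x_1, \ldots, x_m)$ at which the pmf is computed. A vector of length $m$ is also accepted. If `xmat` is `NULL`, the pmf is computed at all probability mass points. |
method | Character string naming the method used to compute the pmf. It must be one of the following three: "DFT-CF", "NA", "SIM". |
B | Number of repeats used in the simulation method. It is ignored for methods other than the "SIM" method. |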
Consider $n$ independent trials, each of which results in a success for exactly one of $m$ categories. The success probability of each category may vary from trial to trial. The Poisson multinomial distribution (PMD) gives the probability of any particular combination of numbers of successes for the $m$ categories. The success probabilities form an $n \times m$ matrix, called the success probability matrix and denoted by `pmat`.
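
In symbols, the definition above can be written as follows. The notation is assumed here for illustration: $n$ is the number of trials, $m$ the number of categories, $p_{ij}$ the entry of `pmat` in row $i$ and column $j$, $I_{ij}$ the indicator that trial $i$ falls in category $j$, and $C(x)$ the set of assignments $c = (c_1, \ldots, c_n) \in \{1, \ldots, m\}^n$ that place exactly $x_j$ trials in category $j$.

```latex
\[
(I_{i1}, \ldots, I_{im}) \sim \mathrm{Multinomial}(1;\, p_{i1}, \ldots, p_{im}),
\quad i = 1, \ldots, n, \ \text{independently},
\]
\[
X_j = \sum_{i=1}^{n} I_{ij}, \qquad
P(X_1 = x_1, \ldots, X_m = x_m) = \sum_{c \in C(x)} \prod_{i=1}^{n} p_{i c_i},
\qquad \sum_{j=1}^{m} x_j = n .
\]
```
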
Among the methods available in `dpmd`, "DFT-CF" is an exact method that computes all probability mass points of the distribution using a multi-dimensional FFT algorithm. As the dimension of `pmat` increases, the computational burden of "DFT-CF" may exceed the capability of a computer, because the method always computes all probability mass points regardless of the input `xmat`.
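
A minimal base-R sketch of the idea behind "DFT-CF" follows. It is an illustration under stated assumptions, not the package's implementation: it evaluates the characteristic function of the first $m-1$ counts on a grid of $n+1$ frequencies per dimension and inverts it with `fft()`. The names `N`, `grid`, `omega`, `phi`, and `res` are chosen here for illustration.

```r
# Example success probability matrix: 3 trials (rows), 4 categories (columns)
pp <- matrix(c(.1, .1, .1, .7,
               .1, .3, .3, .3,
               .5, .2, .1, .2), nrow = 3, byrow = TRUE)
n <- nrow(pp)
m <- ncol(pp)
N <- n + 1  # each count takes values 0, ..., n

# Frequency grid: one row per (k_1, ..., k_{m-1}); the first index varies
# fastest, matching R's column-major array layout
grid <- as.matrix(expand.grid(rep(list(0:n), m - 1)))
omega <- 2 * pi * grid / N

# Characteristic function of (X_1, ..., X_{m-1}):
# prod over trials of (p_im + sum_{j < m} p_ij * exp(1i * t_j))
phi <- apply(omega, 1, function(w) {
  prod(pp[, m] + pp[, -m, drop = FALSE] %*% exp(1i * w))
})
phi <- array(phi, dim = rep(N, m - 1))

# fft() on an array is a multi-dimensional transform; dividing by the
# number of grid points recovers the pmf
res <- Re(fft(phi)) / N^(m - 1)

# res[x_1 + 1, ..., x_{m-1} + 1] holds P(X_1 = x_1, ..., X_m = x_m)
res[1, 2, 1]
```

The array has $(n+1)^{m-1}$ cells, which is why the cost grows quickly with the dimension of `pmat`.
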
"SIM"
is a simulation method that generates random samples from the distribution and uses relative frequencies to estimate the pmf. Its accuracy and running time depend on the user's choice of `B`. Usually `B` = 1e5 or 1e6 is accurate enough. Increasing `B` beyond 1e8 will greatly increase the computational burden.
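
The relative-frequency idea can be sketched in base R as below. This is a hedged illustration, not the package's code: `draw_once` is a hypothetical helper introduced here, and the seed and the value of `B` are arbitrary.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

pp <- matrix(c(.1, .1, .1, .7,
               .1, .3, .3, .3,
               .5, .2, .1, .2), nrow = 3, byrow = TRUE)
n <- nrow(pp)
m <- ncol(pp)
B <- 1e4  # number of simulated samples

# One draw from the PMD: each trial picks one category according to its
# row of probabilities; the picks are then counted per category
draw_once <- function(pmat) {
  picks <- apply(pmat, 1, function(p) sample.int(ncol(pmat), size = 1, prob = p))
  tabulate(picks, nbins = ncol(pmat))
}

draws <- t(replicate(B, draw_once(pp)))  # B x m matrix of counts

# Relative frequency of the point x estimates the pmf at x
x <- c(0, 0, 1, 2)
est <- mean(apply(draws, 1, function(d) all(d == x)))
est
```

For a point with true probability $p$, the standard error of such an estimate is $\sqrt{p(1-p)/B}$, so each extra digit of accuracy costs roughly a hundredfold increase in `B`.
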
"NA"
is an approximation method that uses a multivariate normal distribution to approximate the pmf at the points specified in `xmat`. This method requires `xmat` to be supplied.
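
The moments that such a normal approximation relies on can be sketched in base R. This is only one simple variant, assumed here for illustration: it uses the normal density at the point as a crude stand-in for the pmf, whereas the package may instead integrate the density over a region around the point. With only three trials the approximation is not expected to be accurate.

```r
pp <- matrix(c(.1, .1, .1, .7,
               .1, .3, .3, .3,
               .5, .2, .1, .2), nrow = 3, byrow = TRUE)
m <- ncol(pp)

# Mean and covariance of the count vector X:
# E[X_j] = sum_i p_ij,  Cov(X) = sum_i (diag(p_i) - p_i p_i^T)
mu_full <- colSums(pp)
Sigma_full <- diag(colSums(pp)) - crossprod(pp)

# X_1 + ... + X_m = n makes Sigma_full singular, so the last
# coordinate is dropped before evaluating the density
mu <- mu_full[-m]
Sigma <- Sigma_full[-m, -m]

x <- c(0, 0, 1, 2)
d <- x[-m] - mu
k <- m - 1

# Multivariate normal density at the first m - 1 coordinates of x
approx_pmf <- exp(-0.5 * sum(d * solve(Sigma, d))) / sqrt((2 * pi)^k * det(Sigma))
approx_pmf
```
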
If `xmat` is not specified, it is set to `NULL`. In this case, `dpmd` computes the entire pmf if the chosen method is "DFT-CF" or "SIM". If `xmat` is provided, only the pmf at the points specified by `xmat` is returned.
For a given `xmat`, `dpmd` returns the pmf at the points specified by `xmat`.

If `xmat` is `NULL`, all probability mass points for the distribution specified by the success probability matrix `pmat` are computed, and the results are stored and returned in a multi-dimensional array, denoted by `res`. Since `pmat` is an $n \times m$ matrix ($n$ trials, $m$ categories), `res` is an $(n+1)^{(m-1)}$ array, that is, an array with $m-1$ dimensions, each of length $n+1$. The value of the pmf $P(X_1 = x_1, \ldots, X_m = x_m)$ can then be extracted as `res[x_1 + 1, ..., x_{m-1} + 1]`.

For example, for the `pmat` matrix in the example section, the array element `res[1,2,1]=0.09` gives the value of the pmf $P(X_1 = 0, X_2 = 1, X_3 = 0, X_4 = 2)$.
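
The array layout described above can be illustrated with a brute-force sketch in base R. It does not call the package: it fills an array of the same shape by enumerating every assignment of trials to categories, then reads several points at once with matrix indexing. The names `alloc` and `idx` are introduced here for illustration.

```r
pp <- matrix(c(.1, .1, .1, .7,
               .1, .3, .3, .3,
               .5, .2, .1, .2), nrow = 3, byrow = TRUE)
n <- nrow(pp)
m <- ncol(pp)

# Array with m - 1 dimensions, each of length n + 1
res <- array(0, dim = rep(n + 1, m - 1))

# Every way to assign each of the n trials to one of the m categories
alloc <- as.matrix(expand.grid(rep(list(seq_len(m)), n)))

for (r in seq_len(nrow(alloc))) {
  a <- alloc[r, ]
  prob <- prod(pp[cbind(seq_len(n), a)])   # probability of this assignment
  counts <- tabulate(a, nbins = m)         # resulting (x_1, ..., x_m)
  idx <- matrix(counts[-m] + 1, nrow = 1)  # position x_j + 1 in dimension j
  res[idx] <- res[idx] + prob
}

# res[1, 2, 1] is P(X_1 = 0, X_2 = 1, X_3 = 0, X_4 = 2)
res[1, 2, 1]

# Several points at once: drop the last column and add 1 to index the array
x1 <- matrix(c(0, 0, 1, 2,
               2, 1, 0, 0), nrow = 2, byrow = TRUE)
res[x1[, -m, drop = FALSE] + 1]
```

The last coordinate $x_m$ does not need its own dimension because it is determined by $x_m = n - x_1 - \cdots - x_{m-1}$.
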
Lin, Z., Wang, Y., and Hong, Y. (2023). The computing of the Poisson multinomial distribution and applications in ecological inference and machine learning, Computational Statistics, Vol. 38, pp. 1851-1877.
pp <- matrix(c(.1, .1, .1, .7, .1, .3, .3, .3, .5, .2, .1, .2), nrow = 3, byrow = TRUE)
x <- c(0,0,1,2)
x1 <- matrix(c(0,0,1,2,2,1,0,0),nrow=2,byrow=TRUE)
dpmd(pmat = pp)
dpmd(pmat = pp, xmat = x1)
dpmd(pmat = pp, xmat = x)
dpmd(pmat = pp, xmat = x, method = "NA" )
dpmd(pmat = pp, xmat = x1, method = "NA" )
dpmd(pmat = pp, method = "SIM", B = 1e3)
dpmd(pmat = pp, xmat = x, method = "SIM", B = 1e3)
dpmd(pmat = pp, xmat = x1, method = "SIM", B = 1e3)
Computes the cdf of the Poisson-Multinomial distribution specified by the success probability matrix, using one of several methods.
ppmd(pmat, xmat, method = "DFT-CF", B = 1000)
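pmat | An $n \times m$ success probability matrix, where $n$ is the number of independent trials and $m$ is the number of categories. Each row holds the success probabilities for one trial and should sum to 1. |
xmat | A matrix with $m$ columns; each row specifies a point $(x_1, \ldots, x_m)$ at which the cdf $P(X_1 \leq x_1, \ldots, X_m \leq x_m)$ is computed. A vector of length $m$ is also accepted. |
method | Character string naming the method used to compute the cdf. It must be one of the following three: "DFT-CF", "NA", "SIM". |
B | Number of repeats used in the simulation method. It is ignored for methods other than the "SIM" method. |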
See Details in `dpmd` for the definition of the PMD, the notation, and a description of the three methods ("DFT-CF", "NA", and "SIM").

`ppmd` computes the cdf at `x` by summing the pmf over all probability mass points that are bounded above by `x` in every coordinate.
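
The summation can be sketched by brute force in base R. `pmd_cdf_bruteforce` is a hypothetical helper written for this illustration, not a function of the package, and it is only practical for small `pmat` because it enumerates all $m^n$ assignments.

```r
pp <- matrix(c(.1, .1, .1, .7,
               .1, .3, .3, .3,
               .5, .2, .1, .2), nrow = 3, byrow = TRUE)

pmd_cdf_bruteforce <- function(pmat, x) {
  n <- nrow(pmat)
  m <- ncol(pmat)
  # Every assignment of the n trials to the m categories
  alloc <- as.matrix(expand.grid(rep(list(seq_len(m)), n)))
  # Probability of each assignment
  prob <- apply(alloc, 1, function(a) prod(pmat[cbind(seq_len(n), a)]))
  # Category counts produced by each assignment (one row per assignment)
  counts <- t(apply(alloc, 1, tabulate, nbins = m))
  # Keep assignments whose counts are at most x in every coordinate
  inside <- apply(counts, 1, function(ct) all(ct <= x))
  sum(prob[inside])
}

cdf_val <- pmd_cdf_bruteforce(pp, c(3, 2, 1, 3))
cdf_val
```
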
The value of the cdf $P(X_1 \leq x_1, \ldots, X_m \leq x_m)$ at $x = (x_1, \ldots, x_m)$.
pp <- matrix(c(.1, .1, .1, .7, .1, .3, .3, .3, .5, .2, .1, .2), nrow = 3, byrow = TRUE)
x <- c(3,2,1,3)
x1 <- matrix(c(0,0,1,2,2,1,0,0),nrow=2,byrow=TRUE)
ppmd(pmat = pp, xmat = x)
ppmd(pmat = pp, xmat = x1)
ppmd(pmat = pp, xmat = x, method = "NA")
ppmd(pmat = pp, xmat = x1, method = "NA")
ppmd(pmat = pp, xmat = x, method = "SIM", B = 1e3)
ppmd(pmat = pp, xmat = x1, method = "SIM", B = 1e3)
Generates random samples from the PMD specified by the success probability matrix.
rpmd(pmat, s = 1)
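pmat | An $n \times m$ success probability matrix, where $n$ is the number of independent trials and $m$ is the number of categories. Each row holds the success probabilities for one trial and should sum to 1. |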
s |
The number of samples to be generated. |
An $s \times m$ matrix of samples; each row is one sample from the PMD with success probability matrix `pmat`.
pp <- matrix(c(.1, .1, .1, .7, .1, .3, .3, .3, .5, .2, .1, .2), nrow = 3, byrow = TRUE)
rpmd(pmat = pp, s = 5)