Title: | Rounded Mixture Package |
---|---|
Description: | Performs univariate probability mass function estimation via Bayesian nonparametric mixtures of rounded kernels as in Canale and Dunson (2011) <doi:10.1198/jasa.2011.tm10552>. |
Authors: | Antonio Canale [aut, cre], Nicola Lunardon [ctb] |
Maintainer: | Antonio Canale <[email protected]> |
License: | GPL-2 |
Version: | 2.2 |
Built: | 2024-12-08 06:51:20 UTC |
Source: | CRAN |
Performs Bayesian probability mass function estimation under DP prior with Poisson base measure.
dp.post.est(x, y, alpha, lambda)
dp.post.est(x, y, alpha, lambda)
x |
Values on which to compute the pmf. |
y |
Vector of observed data. |
alpha |
DP precision parameter |
lambda |
Mean parameter for the Poisson base measure. |
Performs probability mass function estimation under th following model
where is Poisson with mean
lambda
.
A vector of size length(x)
containing the probability masses
Antonio Canale
Carota, C., and Parmigiani, G. (2002), “Semiparametric Regression for Count Data,” Biometrika, 89, 265–281.
data(ethylene) y <- tapply(ethylene$impl,FUN=mean,INDEX=ethylene$id) z <- tapply(ethylene$dose,FUN=mean,INDEX=ethylene$id) # Estimate the pmf of the number of implants in the control group y0 <- y[z==0] pmf.control = dp.post.est(0:30, y0, alpha = 1)
data(ethylene) y <- tapply(ethylene$impl,FUN=mean,INDEX=ethylene$id) z <- tapply(ethylene$dose,FUN=mean,INDEX=ethylene$id) # Estimate the pmf of the number of implants in the control group y0 <- y[z==0] pmf.control = dp.post.est(0:30, y0, alpha = 1)
Data from the developmental toxicity study of ethylene glycol in mice conducted by the National Toxicology Program. Pregnant mice were assigned to dose groups of 0, 750, 1,500 or 3,000 mg/kg per day, with the number of implants measured for each mouse at the end of the experiment. Group sizes are 25, 24, 23 and 23, respectively.
data(ethylene)
data(ethylene)
A data frame with 1192 observations on the following 7 variables.
id
identifyer for pregnant mouse
dose
dose of ethylene glycol
weight
weight of the fetus
sex
sex of the fetus
impl
number of implants in the pregnant mouce
litsz
size of the relative litter
malf
presence of malformation in the fetus
Price, C. J., Kimmel, C. A., Tyl, R. W., and Marr, M. C. (1985) "The developemental toxicity of ethylene glycol in rats and mice" Toxicological and Applied Pharmacology 81, 113-127.
data(ethylene) implants <- tapply(ethylene$impl, FUN=mean, INDEX=ethylene$id) summary(implants) m <- mean(implants) v <- var(implants) hist(implants, main=paste("Histogram of the number of implants (Mean = ", round(m,2), ", Var = ", round(v,2),")"))
data(ethylene) implants <- tapply(ethylene$impl, FUN=mean, INDEX=ethylene$id) summary(implants) m <- mean(implants) v <- var(implants) hist(implants, main=paste("Histogram of the number of implants (Mean = ", round(m,2), ", Var = ", round(v,2),")"))
Performs probability mass function estimation under nonparametric mixture of Poisson kernels.
npmp(y, k, nrep, nb, alpha=1, theta=alpha, sigma=0, mixing_hyperprior= FALSE, basemeasure_hyperprior = FALSE, mixing_type="DP", algo="slice", prior="gamma", a, b, a_a=1, b_a=1, lb=NULL, ub=NULL, print = 1, ndisplay = nrep/4, plot.it = FALSE, pdfwrite = FALSE, ... ) dpmpoiss(y, k, nrep, nb, alpha = 1, a, b, lb = NULL, ub = NULL, print = 1, ndisplay = nrep/4, plot.it = FALSE, pdfwrite = FALSE, ...)
npmp(y, k, nrep, nb, alpha=1, theta=alpha, sigma=0, mixing_hyperprior= FALSE, basemeasure_hyperprior = FALSE, mixing_type="DP", algo="slice", prior="gamma", a, b, a_a=1, b_a=1, lb=NULL, ub=NULL, print = 1, ndisplay = nrep/4, plot.it = FALSE, pdfwrite = FALSE, ... ) dpmpoiss(y, k, nrep, nb, alpha = 1, a, b, lb = NULL, ub = NULL, print = 1, ndisplay = nrep/4, plot.it = FALSE, pdfwrite = FALSE, ...)
y |
Vector of count data |
k |
Truncation level for the number of cluster in the mixture. Default is |
nrep |
Number of MCMC iterations |
nb |
Number of burn-in iteration in the MCMC to discard |
alpha |
Value of the precision parameter of the Dirichlet process prior |
theta |
Value of the strength parameter of the Two-parameters-Poisson-Dirichlet process prior |
sigma |
Value of the discount parameter of the Two-parameters-Poisson-Dirichlet process prior |
mixing_hyperprior |
Logical. If TRUE |
basemeasure_hyperprior |
Logical. If TRUE also the parameters of the base measure are random, see details below. |
mixing_type |
Type of mixing distribution. Default is "DP" for Dirichlet process but also "2PD" for Two-parameters-Poisson-Dirichlet process is allowed. |
algo |
Type of algorithm. Current choices are: slice sampler ( |
prior |
String for the base measure prior. Default is "gamma" for lambda ~ Gamma(a,b). The other choice is "normal" for exp(lambda) ~ N(a,b) |
a |
Shape (mean) hyperparameter for the gamma (normal) base measure |
b |
Scale (sd) hyperparameter for the gamma (normal) prior |
a_a |
Shape hyperparameter for |
b_a |
Scale hyperparameter for |
lb |
Scalar integer. Lower bound for the argument of the pmf. Default is |
ub |
Scalar integer. Upper bound for the argument of the pmf. Default is |
print |
Vector of integers (from 1 to 5) indicating whether to print each step of the Gibbs sampler. Specifically, 1 for current iteration, 2 for the DP cluster allocation, 3 for the posterior parameters of the mixture components, 4 for the precision of the DP, 5 for the posterior pmf. |
ndisplay |
Scalar integer. It gives the number of iterations to be displayed on screen (the function reports on the screen when every |
plot.it |
Logical, default FALSE. If TRUE a plot with empirical and estimated posterior probability mass functions is plotted. |
pdfwrite |
Logical, default FALSE. If TRUE a pdf file is written in the current working directory. Traceplots and other posterior quantities are drown. |
... |
Additional arguments (for future implemetantions). |
The function npmp
performs probability mass function estimation under nonparametric mixture of Poisson kernels, i.e.
where is a nonparametric prior (Dirichlet process or Two-parameters-Poisson-Dirichlet process) with base measure
. The function
dppoiss
is a wrapper to npmp
with mixing_type="DP"
for back portability with version 1.0 of the package. The main part of the code is written in C language to gain computational speed. Plots and posterior summaries are in plain R code. From version 2.0 on, the blocked Gibbs sampler has been removed in place of slice samper (Kalli et al., 2011) and polya-urn sampler. Two different base measures are implemented:
prior="gamma"
and prior="normal"
for
respectively.
name |
Name of the model |
mixing_type |
Name of the mixing prior |
mcmc |
Quantities about MCMC sampling |
mcmc.chains |
MCMC chains of the parameters |
pmf |
A list containing several quantities related to the probability mass function (emprical pmf, posterior mean pmf and pointwise 95% credible intervals) computed for the values from |
parameters |
A list containing the posterior mean of the cluster specific parameters (be careful of label-switching problems) |
clustering |
A list containing posterior quantities related to the clustering structure of the data |
R code and porting by A. Canale, C code by A. Canale with minor contributions by N. Lunardon.
Canale, A. and Dunson, D. B. (2011), "Bayesian Kernel Mixtures for Counts", Journal of American Statistical Association, 106, 1528-1539.
Kalli, M., Griffin, J., and Walker, S. (2011), "Slice sampling mixture models," Statistics and Computing, 21, 93-105.
rmg
data(ethylene) y <- tapply(ethylene$impl,FUN=mean,INDEX=ethylene$id) z <- tapply(ethylene$dose,FUN=mean,INDEX=ethylene$id) # Estimate the pmf of the number of implants in the control group y0 <- y[z==0] pmf.control = dpmpoiss(y0, k=20, nrep=11000, nb=1000, alpha=1, a=1, b=1, lb=5, ub=24, plot.it=TRUE)
data(ethylene) y <- tapply(ethylene$impl,FUN=mean,INDEX=ethylene$id) z <- tapply(ethylene$dose,FUN=mean,INDEX=ethylene$id) # Estimate the pmf of the number of implants in the control group y0 <- y[z==0] pmf.control = dpmpoiss(y0, k=20, nrep=11000, nb=1000, alpha=1, a=1, b=1, lb=5, ub=24, plot.it=TRUE)
Performs Bayesian probability mass function estimation under nonparametric mixture of rounded of Gaussian kernels.
rmg(ydis, k=length(ydis), nrep, nb, alpha=1, theta=alpha, sigma=0, mixing_hyperprior= FALSE, basemeasure_hyperprior = FALSE, mixing_type="DP", algo="slice", mu0=mean(ydis), kap=var(ydis), atau=1, btau=2, a_a=1, b_a=1, lb=NULL, ub=NULL, print = 1, ndisplay = nrep/4, plot.it = FALSE, pdfwrite = FALSE, ...) dpmrg(ydis, k, nrep, nb, alpha, alpha_r = FALSE, mu0 = mean(ydis), kap = var(ydis), atau, btau, a_a = 1, b_a = 1, lb = NULL, ub = NULL, print = 1, ndisplay = nrep/4, plot.it = FALSE, pdfwrite = FALSE, ...)
rmg(ydis, k=length(ydis), nrep, nb, alpha=1, theta=alpha, sigma=0, mixing_hyperprior= FALSE, basemeasure_hyperprior = FALSE, mixing_type="DP", algo="slice", mu0=mean(ydis), kap=var(ydis), atau=1, btau=2, a_a=1, b_a=1, lb=NULL, ub=NULL, print = 1, ndisplay = nrep/4, plot.it = FALSE, pdfwrite = FALSE, ...) dpmrg(ydis, k, nrep, nb, alpha, alpha_r = FALSE, mu0 = mean(ydis), kap = var(ydis), atau, btau, a_a = 1, b_a = 1, lb = NULL, ub = NULL, print = 1, ndisplay = nrep/4, plot.it = FALSE, pdfwrite = FALSE, ...)
ydis |
Vector of count data |
k |
Truncation level for the number of cluster in the mixture. Default is |
nrep |
Number of MCMC iterations |
nb |
Number of burn-in iteration in the MCMC to discard |
alpha |
Value of the precision parameter of the Dirichlet process prior |
theta |
Value of the strength parameter of the Two-parameters-Poisson-Dirichlet process prior |
sigma |
Value of the discount parameter of the Two-parameters-Poisson-Dirichlet process prior |
mixing_hyperprior |
Logical. If TRUE |
alpha_r |
Logical. If TRUE |
basemeasure_hyperprior |
Logical. If TRUE also the parameters of the base measure are random, see details below. |
mixing_type |
Type of mixing distribution. Default is "DP" for Dirichlet process but also "2PD" for Two-parameters-Poisson-Dirichlet process is allowed. |
algo |
Type of algorithm. Current choices are: slice sampler ( |
mu0 |
Location hyperparameter for the latent rounded Gaussian base measure |
kap |
Precision hyperparameter for the latent rounded Gaussian base measure |
atau |
Shape hyperparameter for the Gamma distribution |
btau |
Scale hyperparameter for the Gamma distribution |
a_a |
Shape hyperparameter for the Gamma distribution for |
b_a |
Scale hyperparameter for the Gamma distribution for |
lb |
Scalar integer. Lower bound for the argument of the pmf. Default is |
ub |
Scalar integer. Upper bound for the argument of the pmf. Default is |
print |
Vector of integers (from 1 to 6) indicating whether to print each step of the Gibbs sampler. Specifically, 1 for current iteration, 2 for the data augmentation step simulating the latent continuous variables, 3 for the DP cluster allocation, 4 for the posterior parameters of the mixture components, 5 for the precision of the DP, 6 for the posterior pmf. |
ndisplay |
Scalar integer. It gives the number of iterations to be displayed on screen (the function reports on the screen when every |
plot.it |
Logical, default FALSE. If TRUE a plot with empirical and estimated posterior probability mass functions is plotted. |
pdfwrite |
Logical, default FALSE. If TRUE a pdf file is written in the current working directory. Traceplots and other posterior quantities are drown. |
... |
Additional arguments (for future implemetantions). |
The rmg
function performs Bayesian probability mass function estimation under the mixture model of Canale and Dunson (2011) with Dirichlet process or Two-parameters-Poisson-Dirichlet process as prior for the mixing measure. The model is
where is the Dirichlet process or the Two-parameters-Poisson-Dirichlet process with base measure
and
is a rounded Gaussian kernel with location
and precision
and tresholds
.
The function
dpmrg
is a wrapper to rmg
with mixing_type="DP"
for back portability with version 1.0 of the package.
The main part of the code is written in C language to gain computational speed. Plots and posterior summaries are in plain R code. From version 2.0 on, the blocked gibbs sampler has been removed in place of slice samper (Kalli et al., 2011) and polya-urn sampler.
name |
Name of the model |
mixing_type |
Name of the mixing prior |
mcmc |
Quantities about MCMC sampling |
mcmc.chains |
MCMC chains of the parameters |
pmf |
A list containing several quantities related to the probability mass function (emprical pmf, posterior mean pmf and pointwise 95% credible intervals) computed for the values from |
parameters |
A list containing the posterior mean of the cluster specific parameters (be careful of label-switching problems) |
clustering |
A list containing posterior quantities related to the clustering structure of the data |
R code and porting by A. Canale, C code by A. Canale with minor contributions by N. Lunardon.
Canale, A. and Dunson, D. B. (2011), "Bayesian Kernel Mixtures for Counts", Journal of American Statistical Association, 106, 1528-1539.
Kalli, M., Griffin, J., and Walker, S. (2011), "Slice sampling mixture models," Statistics and Computing, 21, 93-105.
dpmpoiss
data(ethylene) y <- tapply(ethylene$impl,FUN=mean,INDEX=ethylene$id) z <- tapply(ethylene$dose,FUN=mean,INDEX=ethylene$id) # Estimate the pmf of the number of implants in the control group y0 <- y[z==0] pmf.control = rmg(y0, k=20, nrep=11000, nb=1000, alpha=1, atau=1, btau=1, lb=5, ub=24, plot.it= TRUE)
data(ethylene) y <- tapply(ethylene$impl,FUN=mean,INDEX=ethylene$id) z <- tapply(ethylene$dose,FUN=mean,INDEX=ethylene$id) # Estimate the pmf of the number of implants in the control group y0 <- y[z==0] pmf.control = rmg(y0, k=20, nrep=11000, nb=1000, alpha=1, atau=1, btau=1, lb=5, ub=24, plot.it= TRUE)