Title: | Multiscale Bernstein Polynomials for Densities |
---|---|
Description: | Performs Bayesian nonparametric multiscale density estimation and multiscale testing of group differences with multiscale Bernstein polynomials (msBP) mixtures as in Canale and Dunson (2016). |
Authors: | Antonio Canale |
Maintainer: | Antonio Canale <[email protected]> |
License: | GPL-2 |
Version: | 1.4-1 |
Built: | 2024-12-16 06:48:57 UTC |
Source: | CRAN |
Performs Bayesian nonparametric multiscale density estimation and multiscale testing of group differences with multiscale Bernstein polynomials (msBP) mixtures as in Canale and Dunson (2016).
Antonio Canale <[email protected]>
Canale, A. and Dunson, D. B. (2016), "Multiscale Bernstein polynomials for densities", Statistica Sinica, 26(3), 1175-1195.
Canale, A. (2017), "msBP: An R Package to Perform Bayesian Nonparametric Inference Using Multiscale Bernstein Polynomials Mixtures". Journal of Statistical Software, 78(6), 1-19.
Dataset with the velocities of the 82 galaxies reported by Roeder (1990)
data(galaxy)
data(galaxy)
A data frame with 82 observations and a single variable reporting the speed of galaxies (km/second)
Roeder, K. (1990) Density estimation with confidence sets exemplified by superclusters and voids in the galaxies, Journal of the American Statistical Association, 85: 617-624.
Escobar, M.D. and West, M. (1995) Bayesian Density Estimation and Inference Using Mixtures. Journal of the American Statistical Association, 90: 577-588.
data(galaxy) str(galaxy)
data(galaxy) str(galaxy)
Compute the binary tree of probabilities using the multiscale stick-breaking process of Canale and Dunson (2016).
msBP.compute.prob(msBPtree, root = TRUE)
msBP.compute.prob(msBPtree, root = TRUE)
msBPtree |
An object of the class msBPTree |
root |
logical. if the root needs to be considered (default) or it should be cut (fixing |
Compute a binary tree of weights. The general weights for node of scale
, is
where and
if
is the right daughter of node
, or
if
is the left daughter of
.
An object of the msBPTree class is basically a list containing two objects of the class
binaryTree
: the tree (representing the stoping probabilities) and the
tree (representing the proceed-right probabilities).
An object of the class msbpTree
.
Canale, A. and Dunson, D. B. (2016), "Multiscale Bernstein polynomials for densities", Statistica Sinica, 26(3), 1175-1195.
Canale, A. (2017), "msBP: An R Package to Perform Bayesian Nonparametric Inference Using Multiscale Bernstein Polynomials Mixtures". Journal of Statistical Software, 78(6), 1-19.
S <-structure(list( T = list(1/8,c(1/3,1/3), c(1/4,1/4,1/4,1/4), rep(1,8)), max.s=3), class = "binaryTree") R <-structure(list( T = list(1/2,c(1/2,1/2), c(1/4,1/2,1/2,1/2), rep(1,8)), max.s=3), class = "binaryTree") RS <-structure(list(S = S, R = R), class = "msbpTree") probabilities <- msBP.compute.prob(RS)
S <-structure(list( T = list(1/8,c(1/3,1/3), c(1/4,1/4,1/4,1/4), rep(1,8)), max.s=3), class = "binaryTree") R <-structure(list( T = list(1/2,c(1/2,1/2), c(1/4,1/2,1/2,1/2), rep(1,8)), max.s=3), class = "binaryTree") RS <-structure(list(S = S, R = R), class = "msbpTree") probabilities <- msBP.compute.prob(RS)
Gibbs sampling for Markov Chain Motecarlo sampling from the posterior distribution of an msBP model.
msBP.Gibbs(x, a, b, g0 = "normal", g0par=c(0,1), mcmc, grid = list(n.points=40, low=0.001, upp=0.999), state=NULL, hyper, printing=0, maxScale=5, ...)
msBP.Gibbs(x, a, b, g0 = "normal", g0par=c(0,1), mcmc, grid = list(n.points=40, low=0.001, upp=0.999), state=NULL, hyper, printing=0, maxScale=5, ...)
x |
the observed sample |
a |
scalar a parameter |
b |
scalar b parameter |
g0 |
prior guess for the density of |
g0par |
additional scalar parameters for |
mcmc |
a list giving the MCMC parameters. It must include the
following integers: |
grid |
a list giving the parameters for plotting the posterior mean density over a finite grid. It must include the following values: |
state |
a list giving the current value of the parameters. This list is used if the current analysis is the continuation of a previous analysis or if we want to start the MCMC algorithm from some particular value of the parameters. |
hyper |
a list containing the values of the hyperparameters for |
printing |
Vector of integers if the internal C++ function need to print what is doing |
maxScale |
maximum scale of the binary trees. |
... |
additional arguments. |
Before calling the proper C++ subrouting the function center the sample on an initial guess for the density of the data. If g0 = 'empirical'
the data are transformed so that the expctation of the msBP prior is centered on the kernel density estimate of x
.
The algorithm consists of two primary steps: (i) allocate each observation
to a multiscale cluster, conditionally on the values of the weights (see also msBP.postCluster
);
(ii) update the weights, conditionally on the cluster allocations.
All the procedure is written in C++ and additional R scripts are used to pre- and post-process the data and the output.
If hyper$hyperpriors$a
or hyper$hyperpriors$b
is true, additional hyperpriors for a
and b
are assumed. Specifically the algorithm implements and
.
For the former parameter the full conditional posterior distribution is available in closed form, i.e.
while for the latter its full conditional posterior is proportional to
where is the maximum occupied scale and
is the Beta function. To sample
from the latter distribution, a griddy Gibbs approach over the grid defined by
hyper$hyperpar$gridB
is used. See Ritter and Tanner (1992).
From Version 1.1, if hyper$hyperpriors$base=TRUE
and g0="normal"
additional hyperpriors for the parameter of the centering normal density are assumed. Specifically the model is
and an addtional step simulating the values of and
from their conditional posterior distribution is added to the Gibbs sampler of Canale and Dunson (2016). Specifically, a Metropolis-Hastings step with proposal equal to the prior is implemented.
A list containing
density |
A list containing |
mcmc |
A list containing the MCMC chains: |
postmean |
A list containing posterior means over the MCMC samples of |
fit |
A list containing the LPML, mean and median of the log CPO. |
Canale, A. and Dunson, D. B. (2016), "Multiscale Bernstein polynomials for densities", Statistica Sinica, 26(3), 1175-1195.
Canale, A. (2017), "msBP: An R Package to Perform Bayesian Nonparametric Inference Using Multiscale Bernstein Polynomials Mixtures". Journal of Statistical Software, 78(6), 1-19.
Ritter C., Tanner M. (1992). "Facilitating the Gibbs Sampler: the Gibbs Stopper and the Griddy-Gibbs Sampler." Journal of the American Statistical Association, 87, 861-868.
## Not run: data(galaxy) galaxy <- data.frame(galaxy) speeds <- galaxy$speed/1000 set.seed(1) #with fixed g0 and random a, b fit.msbp.1 <- msBP.Gibbs(speeds, a = 10, b = 5, g0 = "empirical", mcmc=list(nrep = 10000, nb = 5000, ndisplay = 1000), hyper=list(hyperprior=list(a = TRUE, b = TRUE, g0 = FALSE), hyperpar=list(beta=5,gamma = 1,delta = 1,lambda = 1)), printing = 0, maxS = 7, grid = list(n.points = 150, low = 5, upp = 38)) #with random a, b and hyperparameters of g0 fit.msbp.2 <- msBP.Gibbs(speeds, a = 10, b=5, g0 = "normal", mcmc=list(nrep = 10000, nb = 5000, ndisplay = 1000), hyper=list(hyperprior = list(a = TRUE, b = TRUE, g0 = TRUE), hyperpar = list(beta = 50, gamma = 5, delta = 10, lambda = 1, gridB = seq(0, 20, length = 30), mu0 = 21, kappa0 = 0.1, alpha0 = 1, beta0 = 20)), printing = 0, maxS = 7, grid = list(n.points = 150, lo w= 5, upp = 38)) hist(speeds, prob=TRUE,br=10, ylim=c(0,0.23), main="", col='grey') points(fit.msbp.1$density$postMeanDens~fit.msbp.1$density$xDens, ty='l', lwd=2) points(fit.msbp.1$density$postUppDens~fit.msbp.1$density$xDens, ty='l',lty=2, lwd=2) points(fit.msbp.1$density$postLowDens~fit.msbp.1$density$xDens, ty='l',lty=2, lwd=2) hist(speeds, prob=TRUE,br=10, ylim=c(0,0.23), main="", col='grey') points(fit.msbp.2$density$postMeanDens~fit.msbp.2$density$xDens, ty='l', lwd=2) points(fit.msbp.2$density$postUppDens~fit.msbp.2$density$xDens, ty='l',lty=2, lwd=2) points(fit.msbp.2$density$postLowDens~fit.msbp.2$density$xDens, ty='l',lty=2, lwd=2) ## End(Not run)
## Not run: data(galaxy) galaxy <- data.frame(galaxy) speeds <- galaxy$speed/1000 set.seed(1) #with fixed g0 and random a, b fit.msbp.1 <- msBP.Gibbs(speeds, a = 10, b = 5, g0 = "empirical", mcmc=list(nrep = 10000, nb = 5000, ndisplay = 1000), hyper=list(hyperprior=list(a = TRUE, b = TRUE, g0 = FALSE), hyperpar=list(beta=5,gamma = 1,delta = 1,lambda = 1)), printing = 0, maxS = 7, grid = list(n.points = 150, low = 5, upp = 38)) #with random a, b and hyperparameters of g0 fit.msbp.2 <- msBP.Gibbs(speeds, a = 10, b=5, g0 = "normal", mcmc=list(nrep = 10000, nb = 5000, ndisplay = 1000), hyper=list(hyperprior = list(a = TRUE, b = TRUE, g0 = TRUE), hyperpar = list(beta = 50, gamma = 5, delta = 10, lambda = 1, gridB = seq(0, 20, length = 30), mu0 = 21, kappa0 = 0.1, alpha0 = 1, beta0 = 20)), printing = 0, maxS = 7, grid = list(n.points = 150, lo w= 5, upp = 38)) hist(speeds, prob=TRUE,br=10, ylim=c(0,0.23), main="", col='grey') points(fit.msbp.1$density$postMeanDens~fit.msbp.1$density$xDens, ty='l', lwd=2) points(fit.msbp.1$density$postUppDens~fit.msbp.1$density$xDens, ty='l',lty=2, lwd=2) points(fit.msbp.1$density$postLowDens~fit.msbp.1$density$xDens, ty='l',lty=2, lwd=2) hist(speeds, prob=TRUE,br=10, ylim=c(0,0.23), main="", col='grey') points(fit.msbp.2$density$postMeanDens~fit.msbp.2$density$xDens, ty='l', lwd=2) points(fit.msbp.2$density$postUppDens~fit.msbp.2$density$xDens, ty='l',lty=2, lwd=2) points(fit.msbp.2$density$postLowDens~fit.msbp.2$density$xDens, ty='l',lty=2, lwd=2) ## End(Not run)
Compute the path of each subject in the binary tree of weights and returns 3 tree: the n
tree, the r
tree, and the v
tree (see values
msBP.nrvTrees(sh, maxS = max(sh[,1]))
msBP.nrvTrees(sh, maxS = max(sh[,1]))
sh |
A matrix with 2 columns and a number of rows equal to the sample size denoting the scale and node labels of each unit |
maxS |
Upper bound for the scale |
A list containing tree objects of the class binaryTree
. n
is the tree containing at each node the number of subjects allocated to that node, r
is the tree containing at each node the number of subjects that went right at that node, and v
is the tree containing at each node the number of subjects that passed through that node.
Canale, A. and Dunson, D. B. (2016), "Multiscale Bernstein polynomials for densities", Statistica Sinica, 26(3), 1175-1195.
Canale, A. (2017), "msBP: An R Package to Perform Bayesian Nonparametric Inference Using Multiscale Bernstein Polynomials Mixtures". Journal of Statistical Software, 78(6), 1-19.
sh <- cbind(c(2,2,2,3,3,3,3,3,3,3), c(1,2,2,1,2,3,4,5,6,7)) nrv.trees <- msBP.nrvTrees(sh) plot(nrv.trees$n)
sh <- cbind(c(2,2,2,3,3,3,3,3,3,3), c(1,2,2,1,2,3,4,5,6,7)) nrv.trees <- msBP.nrvTrees(sh) plot(nrv.trees$n)
Compute the density and the cumulative distribution functions of a random density drawn from an msBP(a,b) process
msBP.pdf(weights, n.points, y) msBP.cdf(weights, n.points, log, y)
msBP.pdf(weights, n.points, y) msBP.cdf(weights, n.points, log, y)
weights |
An object of the class binaryTree containing probability weights |
n.points |
Length of the grid over (0,1) in which calculate the value of the random density |
log |
Logical. TRUE for computing the log-cdf |
y |
Vector of values in which the random density is evaluated. If used, |
Vector of size n.points
or length(y)
Canale, A. and Dunson, D. B. (2016), "Multiscale Bernstein polynomials for densities", Statistica Sinica, 26(3), 1175-1195.
Canale, A. (2017), "msBP: An R Package to Perform Bayesian Nonparametric Inference Using Multiscale Bernstein Polynomials Mixtures". Journal of Statistical Software, 78(6), 1-19.
prob <-structure(list( T = list(0.15, c(0.05,0.05), c(0.05,0.2,0.1,0.1), c(0,0,0.3,0,0,0,0,0) ), max.s=3), class = "binaryTree") density <- msBP.pdf(prob, 100) probability <- msBP.cdf(prob, 100) par(mfrow=c(1,2)) plot(density$dens~density$y, ty='l', main = "pdf") plot(probability$prob~density$y, ty='l', main = "cdf")
prob <-structure(list( T = list(0.15, c(0.05,0.05), c(0.05,0.2,0.1,0.1), c(0,0,0.3,0,0,0,0,0) ), max.s=3), class = "binaryTree") density <- msBP.pdf(prob, 100) probability <- msBP.cdf(prob, 100) par(mfrow=c(1,2)) plot(density$dens~density$y, ty='l', main = "pdf") plot(probability$prob~density$y, ty='l', main = "cdf")
Perform the posterior multiscale cluster allocation conditionally on a tree of weights. See Algorithm 1 in Canale and Dunson (2016).
msBP.postCluster(y, weights)
msBP.postCluster(y, weights)
y |
the sample of individials to be allocated to binary tree structure |
weights |
the binary tree of weights (summing to one). An object of the class msBPTree |
conditionally on the weights contained in weights
, each subject in y
is allocated to a multiscale cluster using Algorithm 1 of Canale and Dunson (2016). It relies on a multiscale modification of the slice sampler of Kalli et al. (2011).
a matrix with length(y)
row and two columns, denoting the scale and node within the scale, respectively.
Canale, A. and Dunson, D. B. (2016), "Multiscale Bernstein polynomials for densities", Statistica Sinica, 26(3), 1175-1195.
Canale, A. (2017), "msBP: An R Package to Perform Bayesian Nonparametric Inference Using Multiscale Bernstein Polynomials Mixtures". Journal of Statistical Software, 78(6), 1-19.
Kalli, M., Griffin, J., and Walker, S. (2011), "Slice sampling mixture models," Statistics and Computing, 21, 93-105.
set.seed(1) y <- rbeta(30, 5, 1) weights <-structure(list( T = list(0, c(0,0.10), c(0.0,0,0.3,0.6)), max.s=2), class = 'binaryTree') sh <- msBP.postCluster(y, weights) clus.size <- msBP.nrvTrees(sh)$n plot(clus.size)
set.seed(1) y <- rbeta(30, 5, 1) weights <-structure(list( T = list(0, c(0,0.10), c(0.0,0,0.3,0.6)), max.s=2), class = 'binaryTree') sh <- msBP.postCluster(y, weights) clus.size <- msBP.nrvTrees(sh)$n plot(clus.size)
Random numbers generation from a random density drawn from a msBP process.
msBP.rsample(n, msBPtree)
msBP.rsample(n, msBPtree)
n |
Size of the sample to be generated |
msBPtree |
An object of the class msBPtree |
A vector containing the random sample
Canale, A. and Dunson, D. B. (2016), "Multiscale Bernstein polynomials for densities", Statistica Sinica, 26(3), 1175-1195.
Canale, A. (2017), "msBP: An R Package to Perform Bayesian Nonparametric Inference Using Multiscale Bernstein Polynomials Mixtures". Journal of Statistical Software, 78(6), 1-19.
rand.tree <- msBP.rtree(50,2, 4) rand.samp <- msBP.rsample(50, rand.tree) hist(rand.samp, prob=TRUE) prob <- msBP.compute.prob(rand.tree) density <- msBP.pdf(prob, 100) points(density$dens~density$y, ty='l', col=4)
rand.tree <- msBP.rtree(50,2, 4) rand.samp <- msBP.rsample(50, rand.tree) hist(rand.samp, prob=TRUE) prob <- msBP.compute.prob(rand.tree) density <- msBP.pdf(prob, 100) points(density$dens~density$y, ty='l', col=4)
Draw a random tree from the msBP process
msBP.rtree(a, b, max.s = 10)
msBP.rtree(a, b, max.s = 10)
a |
Scalar parameter |
b |
Scalar parameter |
max.s |
Maximum depth of the random trees |
An object of the class msBPTree
Canale, A. and Dunson, D. B. (2016), "Multiscale Bernstein polynomials for densities", Statistica Sinica, 26(3), 1175-1195.
Canale, A. (2017), "msBP: An R Package to Perform Bayesian Nonparametric Inference Using Multiscale Bernstein Polynomials Mixtures". Journal of Statistical Software, 78(6), 1-19.
msBP.rsample
, msBP.compute.prob
msBP.rtree(2, 2, 4)
msBP.rtree(2, 2, 4)
Performs multiscale hypothesis testing of difference in the distribution of two groups using msBP prior.
msBP.test(y, a, b, group, priorH0 = 0.5, mcmc, maxScale = 5, plot.it = FALSE, ...)
msBP.test(y, a, b, group, priorH0 = 0.5, mcmc, maxScale = 5, plot.it = FALSE, ...)
y |
The pooled sample of observations |
a , b
|
Parameters of the msBP prior |
group |
Vector of size |
priorH0 |
Prior gues for the probability of H0 |
mcmc |
a list giving the MCMC parameters. It must include the
following integers: |
maxScale |
maximum scale of the binary trees. |
plot.it |
logical. If TRUE a plot of the posterior mean probability of H0 is produced |
... |
additional arguments to be passed. |
a list containing
Ps |
a matrix with |
Ps |
the posterior mean probabilities of H0 for each scale. |
Canale, A. and Dunson, D. B. (2016), "Multiscale Bernstein polynomials for densities", Statistica Sinica, 26(3), 1175-1195.
Canale, A. (2017), "msBP: An R Package to Perform Bayesian Nonparametric Inference Using Multiscale Bernstein Polynomials Mixtures". Journal of Statistical Software, 78(6), 1-19.
set.seed(1) y <- runif(100) g <- c(rep(0,50), rep(1,50)) mcmc <- list(nrep = 5000, nb = 1000, ndisplay = 500) ## Not run: test.res <- msBP.test(y, 5, 1, g, mcmc=mcmc, plot.it = TRUE) ## End(Not run)
set.seed(1) y <- runif(100) g <- c(rep(0,50), rep(1,50)) mcmc <- list(nrep = 5000, nb = 1000, ndisplay = 500) ## Not run: test.res <- msBP.test(y, 5, 1, g, mcmc=mcmc, plot.it = TRUE) ## End(Not run)
msBPTree
Create an object of the class msBPTree
msBP.tree(max.s = 10)
msBP.tree(max.s = 10)
max.s |
Maximum depth of the binary tree |
An object of the class msbpTree
is a list of 5 elements that represent a draw from a msBP(a,b) prior as introduced by Canale and Dunson (2016). The first two elements are the trees of the stopping and descending-to-the-right probabilities, respectively. Both are object of the class binaryTree
. The third and fourth argument are the hyperparameters of the msBP prior, namely a
and b
. The last value is an integer with the maximum depth of both the trees.
An object of the class msBPTree
with zero at all nodes and a=b=NULL
.
Canale, A. and Dunson, D. B. (2016), "Multiscale Bernstein polynomials for densities", Statistica Sinica, 26(3), 1175-1195.
Canale, A. (2017), "msBP: An R Package to Perform Bayesian Nonparametric Inference Using Multiscale Bernstein Polynomials Mixtures". Journal of Statistical Software, 78(6), 1-19.
msBP.rtree
tree <- msBP.tree(2)
tree <- msBP.tree(2)
Convert a binary tree object into a vector and _vice versa_
tree2vec(tree) vec2tree(vec)
tree2vec(tree) vec2tree(vec)
tree |
An object of the class binaryTree |
vec |
A vector of numbers. It must have size 2^s - 1, with s an integer. |
An object of the class binaryTree is a binary tree containing at each node a value.
A vector of size , where
is the depth of the binary tree, or a binary tree with depth
length(vec) + 1
.
tree <- vec2tree(1:(2^5 - 1)) vector <- tree2vec(tree)
tree <- vec2tree(1:(2^5 - 1)) vector <- tree2vec(tree)