Title: | Bayesian Mode Inference |
---|---|
Description: | A two-step Bayesian approach for mode inference following Cross, Hoogerheide, Labonne and van Dijk (2024) <doi:10.1016/j.econlet.2024.111579>). First, a mixture distribution is fitted on the data using a sparse finite mixture (SFM) Markov chain Monte Carlo (MCMC) algorithm. The number of mixture components does not have to be known; the size of the mixture is estimated endogenously through the SFM approach. Second, the modes of the estimated mixture at each MCMC draw are retrieved using algorithms specifically tailored for mode detection. These estimates are then used to construct posterior probabilities for the number of modes, their locations and uncertainties, providing a powerful tool for mode inference. |
Authors: | Nalan Baştürk [aut], Jamie Cross [aut], Peter de Knijff [aut], Lennart Hoogerheide [aut], Paul Labonne [aut, cre], Herman van Dijk [aut] |
Maintainer: | Paul Labonne <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.7.2 |
Built: | 2024-10-26 03:30:53 UTC |
Source: | CRAN |
Estimation of a univariate mixture with unknown number of components using a sparse finite mixture Markov chain Monte Carlo (SFM MCMC) algorithm.
bayes_fit( data, K, dist, priors = list(), nb_iter = 2000, burnin = nb_iter/2, print = TRUE )
bayes_fit( data, K, dist, priors = list(), nb_iter = 2000, burnin = nb_iter/2, print = TRUE )
data |
Vector of observations. |
K |
Maximum number of mixture components. |
dist |
String indicating the distribution of the mixture components;
currently supports |
priors |
List of priors; default is an empty list which implies the following priors: |
nb_iter |
Number of MCMC iterations; default is |
burnin |
Number of MCMC iterations used as burnin; default is |
print |
Showing MCMC progression ? Default is |
Let ,
denote observations.
A general mixture of
distributions from the same
parametric family is given by:
with and
,
.
The exact number of components does not have to be known a priori
when using an SFM MCMC approach. Rather, an upper bound is specified for the
number of components and the weights of superfluous components are shrunk
towards zero during estimation. Following Malsiner-Walli et al. (2016)
a symmetric Dirichlet prior is used for the mixture weights:
where a Gamma hyperprior is used on the concentration parameter :
Mixture of Normal distributions
Normal components take the form:
Independent conjugate priors are used for and
(see for instance Malsiner-Walli et al. 2016):
Mixture of skew-Normal distributions
We use the skew-Normal of Azzalini (1985) which takes the form:
where is a location parameter,
a scale parameter and
the shape parameter introducing skewness. For Bayesian estimation, we adopt the approach of
Frühwirth-Schnatter and Pyne (2010) and use the following reparameterised random-effect model:
where the parameters of the skew-Normal are recovered with
By defining a regressor , the skew-Normal mixture can be seen as
random effect model and sampled using standard techniques. Thus we use priors similar to
the Normal mixture model:
We set
and
with D_xi = D_psi = 1.
Mixture of Poisson distributions
Poisson components take the form:
The prior for follows from Viallefont et al. (2002):
Mixture of shifted-Poisson distributions
Shifted-Poisson components take the form
where is a location or shift parameter with uniform prior, see Cross et al. (2024).
A list of class bayes_mixture
containing:
data |
Same as argument. |
mcmc |
Matrix of MCMC draws where the rows corresponding to burnin have been discarded; |
mcmc_all |
Matrix of MCMC draws. |
loglik |
Log likelihood at each MCMC draw. |
K |
Number of components. |
dist |
Same as argument. |
pdf_func |
The pdf/pmf of the mixture components. |
dist_type |
Type of the distribution, i.e. continuous or discrete. |
pars_names |
Names of the mixture components' parameters. |
loc |
Name of the location parameter of the mixture components. |
nb_var |
Number of variables/parameters in the mixture distribution. |
Azzalini A (1985).
“A Class of Distributions Which Includes the Normal Ones.”
Scandinavian Journal of Statistics, 12(2), 171–178.
ISSN 0303-6898, Publisher: [Board of the Foundation of the Scandinavian Journal of Statistics, Wiley].
Cross JL, Hoogerheide L, Labonne P, van Dijk HK (2024).
“Bayesian mode inference for discrete distributions in economics and finance.”
Economics Letters, 235, 111579.
ISSN 0165-1765, doi:10.1016/j.econlet.2024.111579.
Frühwirth-Schnatter S, Pyne S (2010).
“Bayesian inference for finite mixtures of univariate and multivariate skew-normal and skew-t distributions.”
Biostatistics, 11(2), 317–336.
ISSN 1465-4644, doi:10.1093/biostatistics/kxp062.
Malsiner-Walli G, Fruhwirth-Schnatter S, Grun B (2016).
“Model-based clustering based on sparse finite Gaussian mixtures.”
Statistics and Computing, 26(1), 303–324.
ISSN 1573-1375, doi:10.1007/s11222-014-9500-2.
Viallefont V, Richardson S, Peter
J (2002).
“Bayesian analysis of Poisson mixtures.”
Journal of Nonparametric Statistics, 14(1-2), 181–202.
# Example with galaxy data ================================================ set.seed(123) # retrieve galaxy data y <- galaxy # estimation bayesmix <- bayes_fit( data = y, K = 5, # not many to run the example rapidly dist = "normal", nb_iter = 500, # not many to run the example rapidly burnin = 100 ) # plot estimated mixture # plot(bayesmix, max_size = 200) # Changing priors ================================================ set.seed(123) # retrieve galaxy data y <- galaxy # estimation K <- 5 bayesmix <- bayes_fit( data = y, K = K, # not many to run the example rapidly dist = "normal", priors = list( a0 = 10, A0 = 10 * K ), nb_iter = 500, # not many to run the example rapidly burnin = 100 ) # plot estimated mixture # plot(bayesmix, max_size = 200) # Example with DNA data ===================================================== set.seed(123) # retrieve DNA data y <- d4z4 # estimation bayesmix <- bayes_fit( data = y, K = 5, # not many to run the example rapidly dist = "shifted_poisson", nb_iter = 500, # not many to run the example rapidly burnin = 100 ) # plot estimated mixture # plot(bayesmix, max_size = 200)
# Example with galaxy data ================================================ set.seed(123) # retrieve galaxy data y <- galaxy # estimation bayesmix <- bayes_fit( data = y, K = 5, # not many to run the example rapidly dist = "normal", nb_iter = 500, # not many to run the example rapidly burnin = 100 ) # plot estimated mixture # plot(bayesmix, max_size = 200) # Changing priors ================================================ set.seed(123) # retrieve galaxy data y <- galaxy # estimation K <- 5 bayesmix <- bayes_fit( data = y, K = K, # not many to run the example rapidly dist = "normal", priors = list( a0 = 10, A0 = 10 * K ), nb_iter = 500, # not many to run the example rapidly burnin = 100 ) # plot estimated mixture # plot(bayesmix, max_size = 200) # Example with DNA data ===================================================== set.seed(123) # retrieve DNA data y <- d4z4 # estimation bayesmix <- bayes_fit( data = y, K = 5, # not many to run the example rapidly dist = "shifted_poisson", nb_iter = 500, # not many to run the example rapidly burnin = 100 ) # plot estimated mixture # plot(bayesmix, max_size = 200)
bayes_mixture
Creates an object of class bayes_mixture
which can subsequently be used as argument in bayes_mode()
.
This function is useful for users who want to use the mode inference capabilities of BayesMultiMode
with mixture
estimated using external software.
bayes_mixture( mcmc, data, burnin = 0, dist = NA_character_, pdf_func = NULL, dist_type = NA_character_, loglik = NULL, vars_to_keep = NA_character_, vars_to_rename = NA_character_, loc = NA_character_ )
bayes_mixture( mcmc, data, burnin = 0, dist = NA_character_, pdf_func = NULL, dist_type = NA_character_, loglik = NULL, vars_to_keep = NA_character_, vars_to_rename = NA_character_, loc = NA_character_ )
mcmc |
A matrix of MCMC draws with one column per variable, e.g. eta1, eta2, ..., mu1, mu2, etc... |
data |
Vector of observation used for estimating the model. |
burnin |
Number of draws to discard as burnin; default is |
dist |
Distribution family of the mixture components supported by
the package (i.e. |
pdf_func |
(function) Pdf or pmf of the mixture components;
this input is used only if |
dist_type |
Either |
loglik |
Vector showing the log likelihood at each MCMC draw. |
vars_to_keep |
(optional) Character vector containing the names
of the variables to keep in |
vars_to_rename |
(optional) Use for renaming variables/parameters in |
loc |
(for continuous mixtures other than Normal mixtures) String indicating the location parameter of the distribution; the latter is used to initialise the MEM algorithm. |
A list of class bayes_mixture
containing:
data |
Same as argument. |
mcmc |
Matrix of MCMC draws where the rows corresponding to burnin have been discarded; |
mcmc_all |
Matrix of MCMC draws. |
loglik |
Log likelihood at each MCMC draw. |
K |
Number of components. |
dist |
Same as argument. |
pdf_func |
The pdf/pmf of the mixture components. |
dist_type |
Type of the distribution, i.e. continuous or discrete. |
pars_names |
Names of the mixture components' parameters. |
loc |
Name of the location parameter of the mixture components. |
nb_var |
Number of parameters in the mixture distribution. |
# Example with a Student t ================================================ # Constructing synthetic mcmc output mu <- c(0.5, 6) mu_mat <- matrix(rep(mu, 100) + rnorm(200, 0, 0.1), ncol = 2, byrow = TRUE ) omega <- c(1, 2) sigma_mat <- matrix(rep(omega, 100) + rnorm(200, 0, 0.1), ncol = 2, byrow = TRUE ) nu <- c(5, 5) nu_mat <- matrix(rep(nu, 100) + rnorm(200, 0, 0.1), ncol = 2, byrow = TRUE ) eta <- c(0.8, 0.2) eta_mat <- matrix(rep(eta[1], 100) + rnorm(100, 0, 0.05), ncol = 1 ) eta_mat <- cbind(eta_mat, 1 - eta_mat) xi_mat <- matrix(0, 100, 2) fit <- cbind(eta_mat, mu_mat, sigma_mat, nu_mat, xi_mat) colnames(fit) <- c( "eta1", "eta2", "mu1", "mu2", "omega1", "omega2", "nu1", "nu2", "xi1", "xi2" ) # sampling observations data <- c( sn::rst(eta[1] * 1000, mu[1], omega[1], nu = nu[1]), sn::rst(eta[2] * 1000, mu[2], omega[2], nu = nu[2]) ) pdf_func <- function(x, pars) { sn::dst(x, pars["mu"], pars["sigma"], pars["xi"], pars["nu"]) } dist_type <- "continuous" BM <- bayes_mixture(fit, data, burnin = 50, pdf_func = pdf_func, dist_type = dist_type, vars_to_rename = c("sigma" = "omega"), loc = "xi" ) # plot(BM)
# Example with a Student t ================================================ # Constructing synthetic mcmc output mu <- c(0.5, 6) mu_mat <- matrix(rep(mu, 100) + rnorm(200, 0, 0.1), ncol = 2, byrow = TRUE ) omega <- c(1, 2) sigma_mat <- matrix(rep(omega, 100) + rnorm(200, 0, 0.1), ncol = 2, byrow = TRUE ) nu <- c(5, 5) nu_mat <- matrix(rep(nu, 100) + rnorm(200, 0, 0.1), ncol = 2, byrow = TRUE ) eta <- c(0.8, 0.2) eta_mat <- matrix(rep(eta[1], 100) + rnorm(100, 0, 0.05), ncol = 1 ) eta_mat <- cbind(eta_mat, 1 - eta_mat) xi_mat <- matrix(0, 100, 2) fit <- cbind(eta_mat, mu_mat, sigma_mat, nu_mat, xi_mat) colnames(fit) <- c( "eta1", "eta2", "mu1", "mu2", "omega1", "omega2", "nu1", "nu2", "xi1", "xi2" ) # sampling observations data <- c( sn::rst(eta[1] * 1000, mu[1], omega[1], nu = nu[1]), sn::rst(eta[2] * 1000, mu[2], omega[2], nu = nu[2]) ) pdf_func <- function(x, pars) { sn::dst(x, pars["mu"], pars["sigma"], pars["xi"], pars["nu"]) } dist_type <- "continuous" BM <- bayes_mixture(fit, data, burnin = 50, pdf_func = pdf_func, dist_type = dist_type, vars_to_rename = c("sigma" = "omega"), loc = "xi" ) # plot(BM)
Bayesian inference on the modes in a univariate mixture estimated with MCMC methods, see Cross et al. (2024).
Provides posterior probabilities of the number of modes and their locations.
Under the hood it calls the function mix_mode()
to find the modes in each MCMC draw.
bayes_mode( BayesMix, rd = 1, tol_mixp = 0, tol_x = sd(BayesMix$data)/10, tol_conv = 1e-08, inside_range = TRUE, range = c(min(BayesMix$data), max(BayesMix$data)), conditional_nb_modes = NULL )
bayes_mode( BayesMix, rd = 1, tol_mixp = 0, tol_x = sd(BayesMix$data)/10, tol_conv = 1e-08, inside_range = TRUE, range = c(min(BayesMix$data), max(BayesMix$data)), conditional_nb_modes = NULL )
BayesMix |
An object of class |
rd |
(for continuous mixtures) Integer indicating the number of decimal places when rounding the distribution's support. It is necessary to compute posterior probabilities of mode locations. |
tol_mixp |
Components with a mixture proportion below |
tol_x |
(for continuous mixtures) Tolerance parameter for distance in-between modes; default is |
tol_conv |
(for continuous mixtures) Tolerance parameter for convergence of the algorithm; default is |
inside_range |
Should modes outside of |
range |
limits of the support where modes are saved (if |
conditional_nb_modes |
Mcmc draws are filtered to include those with only |
Each draw from the MCMC output after burnin, , leads to a posterior predictive probability
density/mass function:
Using this function, the mode in draw
,
,
where
is the number of modes, are estimated using the algorithm mentioned
in the description above.
After running this procedure across all retained posterior draws,
we compute the posterior probability for the number of modes being as:
Similarly, posterior probabilities for locations of the modes are given by:
for each location in the range
. Obviously,
continuous data are not defined on a discrete support;
it is therefore necessary to choose a rounding decimal to discretize their support (with the
rd
argument).
A list of class bayes_mode
containing:
data |
From |
dist |
From |
dist_type |
From |
pars_names |
From |
modes |
Matrix with a row for each draw and columns showing modes. |
p1 |
Posterior probability of unimodality. |
p_nb_modes |
Matrix showing posterior probabilities for the number of modes. |
p_mode_loc |
Matrix showing posterior probabilities for mode locations. |
mix_density |
Mixture density at all mode locations in each draw. |
algo |
Algorithm used for mode estimation. |
range |
Range outside which modes are discarded if |
conditional_nb_modes |
From |
BayesMix |
|
Cross JL, Hoogerheide L, Labonne P, van Dijk HK (2024). “Bayesian mode inference for discrete distributions in economics and finance.” Economics Letters, 235, 111579. ISSN 0165-1765, doi:10.1016/j.econlet.2024.111579.
# Example with galaxy data ================================================ set.seed(123) # retrieve galaxy data y <- galaxy # estimation bayesmix <- bayes_fit( data = y, K = 5, # not many to run the example rapidly dist = "normal", nb_iter = 500, # not many to run the example rapidly burnin = 100 ) # mode estimation BayesMode <- bayes_mode(bayesmix) # plot # plot(BayesMode, max_size = 200) # summary # summary(BayesMode) # Example with DNA data ================================================ set.seed(123) # retrieve DNA data y <- d4z4 # estimation bayesmix <- bayes_fit( data = y, K = 5, # not many to run the example rapidly dist = "shifted_poisson", nb_iter = 500, # not many to run the example rapidly burnin = 100 ) # mode estimation BayesMode <- bayes_mode(bayesmix) # plot # plot(BayesMode, max_size = 200) # summary # summary(BayesMode) # Example with a Student t ================================================ mu <- c(0.5, 6) sigma <- c(1, 2) nu <- c(5, 5) p <- c(0.8, 0.2) #' data <- c( sn::rst(p[1] * 1000, mu[1], sigma[1], nu = nu[1]), sn::rst(p[2] * 1000, mu[2], sigma[2], nu = nu[2]) ) fit <- c(eta = p, mu = mu, sigma = sigma, nu = nu, xi = c(0, 0)) fit <- rbind(fit, fit) pdf_func <- function(x, pars) { sn::dst(x, pars["mu"], pars["sigma"], pars["xi"], pars["nu"]) } dist_type <- "continuous" bayesmix <- bayes_mixture(fit, data, burnin = 1, pdf_func = pdf_func, dist_type = dist_type, loc = "mu" ) BayesMode <- bayes_mode(bayesmix) # plot # plot(BayesMode, max_size = 200) # summary # summary(BayesMode)
# Example with galaxy data ================================================ set.seed(123) # retrieve galaxy data y <- galaxy # estimation bayesmix <- bayes_fit( data = y, K = 5, # not many to run the example rapidly dist = "normal", nb_iter = 500, # not many to run the example rapidly burnin = 100 ) # mode estimation BayesMode <- bayes_mode(bayesmix) # plot # plot(BayesMode, max_size = 200) # summary # summary(BayesMode) # Example with DNA data ================================================ set.seed(123) # retrieve DNA data y <- d4z4 # estimation bayesmix <- bayes_fit( data = y, K = 5, # not many to run the example rapidly dist = "shifted_poisson", nb_iter = 500, # not many to run the example rapidly burnin = 100 ) # mode estimation BayesMode <- bayes_mode(bayesmix) # plot # plot(BayesMode, max_size = 200) # summary # summary(BayesMode) # Example with a Student t ================================================ mu <- c(0.5, 6) sigma <- c(1, 2) nu <- c(5, 5) p <- c(0.8, 0.2) #' data <- c( sn::rst(p[1] * 1000, mu[1], sigma[1], nu = nu[1]), sn::rst(p[2] * 1000, mu[2], sigma[2], nu = nu[2]) ) fit <- c(eta = p, mu = mu, sigma = sigma, nu = nu, xi = c(0, 0)) fit <- rbind(fit, fit) pdf_func <- function(x, pars) { sn::dst(x, pars["mu"], pars["sigma"], pars["xi"], pars["nu"]) } dist_type <- "continuous" bayesmix <- bayes_mixture(fit, data, burnin = 1, pdf_func = pdf_func, dist_type = dist_type, loc = "mu" ) BayesMode <- bayes_mode(bayesmix) # plot # plot(BayesMode, max_size = 200) # summary # summary(BayesMode)
This is wrapper around the bayesplot::mcmc_trace()
function from package bayesplot
.
bayes_trace(BayesMix, mcmc_vars = NULL, with_burnin = FALSE, ...)
bayes_trace(BayesMix, mcmc_vars = NULL, with_burnin = FALSE, ...)
BayesMix |
An object of class |
mcmc_vars |
Variables to plot; default is all the variable in the MCMC output. |
with_burnin |
Plot all draws ? |
... |
Additional arguments passed to function |
A trace plot.
# Example with galaxy data ================================================ set.seed(123) # retrieve galaxy data y <- galaxy # estimation bayesmix <- bayes_fit( data = y, K = 5, # not many to run the example rapidly dist = "normal", nb_iter = 500, # not many to run the example rapidly burnin = 100 ) # trace plot # bayes_trace(bayesmix)
# Example with galaxy data ================================================ set.seed(123) # retrieve galaxy data y <- galaxy # estimation bayesmix <- bayes_fit( data = y, K = 5, # not many to run the example rapidly dist = "normal", nb_iter = 500, # not many to run the example rapidly burnin = 100 ) # trace plot # bayes_trace(bayesmix)
Repeat units that encode for a cancer testis antigen.
Locus (hg18): Xq24
Unit (kb): 4.8
Restriction enzyme: EcoRI
Encoded product : cancer testis antigen 47
ct47
ct47
A vector of counts with 410 elements.
Schaap M, Lemmers RJ, Maassen R, van der Vliet PJ, Hoogerheide LF, van Dijk HK, Basturk N, de Knijff P, van der Maarel SM (2013). “Genome-wide analysis of macrosatellite repeat copy number variation in worldwide populations: evidence for differences and commonalities in size distributions and size restrictions.” BMC Genomics, 14(1), 143. ISSN 1471-2164, doi:10.1186/1471-2164-14-143.
Dataset constructed using the International Best Track Archive for Climate Stewardship (IBTrACS). The distribution of tropical cyclones lifetime maximum intensity across the globe is known to be bimodal which has important implications for climate modelling.
cyclone
cyclone
A dataset with three columns showing the identification of the cyclone, its year of occurrence and its lifetime maximum intensity (LMI). LMI is calculated as the maximum wind speed for each cyclone with unit ks.
https://www.ncei.noaa.gov/products/international-best-track-archive
Knapp KR, Kruk MC, Levinson DH, Diamond HJ, Neumann CJ (2010).
“The International Best Track Archive for Climate Stewardship (IBTrACS): Unifying Tropical Cyclone Data.”
Bulletin of the American Meteorological Society, 91(3), 363–376.
ISSN 0003-0007, 1520-0477, doi:10.1175/2009BAMS2755.1, Publisher: American Meteorological Society Section: Bulletin of the American Meteorological Society.
Knapp KR, Diamond HJ, J.P. K, Kruk MC, Schreck CJ (2018).
“International Best Track Archive for Climate Stewardship (IBTrACS) Project, Version 4.”
NOAA National Centers for Environmental Information.
doi:10.1175/2009BAMS2755.1.
Macrosatellite repeats D4Z4 in the subtelomere of chromosome 4q.
Locus (hg18): 4q35.2
Unit (kb): 3.3
Restriction enzyme: EcoRI + HindIII/EcoRI + BlnI/XapI
Encoded product : DUX4
d4z4
d4z4
A vector of counts with 410 elements.
Schaap M, Lemmers RJ, Maassen R, van der Vliet PJ, Hoogerheide LF, van Dijk HK, Basturk N, de Knijff P, van der Maarel SM (2013). “Genome-wide analysis of macrosatellite repeat copy number variation in worldwide populations: evidence for differences and commonalities in size distributions and size restrictions.” BMC Genomics, 14(1), 143. ISSN 1471-2164, doi:10.1186/1471-2164-14-143.
Velocity at which 82 galaxies in the Corona Borealis region are moving away from our galaxy, scaled by 1000.
galaxy
galaxy
An object of class numeric
of length 82.
https://people.maths.bris.ac.uk/~mapjg/mixdata
Richardson S, Green PJ (1997). “On Bayesian Analysis of Mixtures with an Unknown Number of Components.” Journal of the Royal Statistical Society. Series B (Methodological), 59(4), pp. 731–792. ISSN 00359246.
Mode estimation in univariate mixture distributions. The fixed-point algorithm of Carreira-Perpinan (2000) is used for Gaussian mixtures. The Modal EM algorithm of Li et al. (2007) is used for other continuous mixtures. A basic algorithm is used for discrete mixtures, see Cross et al. (2024).
mix_mode( mixture, tol_mixp = 0, tol_x = 1e-06, tol_conv = 1e-08, type = "all", inside_range = TRUE )
mix_mode( mixture, tol_mixp = 0, tol_x = 1e-06, tol_conv = 1e-08, type = "all", inside_range = TRUE )
mixture |
An object of class |
tol_mixp |
Components with a mixture proportion below |
tol_x |
(for continuous mixtures) Tolerance parameter for distance in-between modes; default is |
tol_conv |
(for continuous mixtures) Tolerance parameter for convergence of the algorithm; default is |
type |
(for discrete mixtures) Type of modes, either |
inside_range |
Should modes outside of |
This function finds modes in a univariate mixture defined as:
where is a density or probability mass/density function.
Fixed-point algorithm
Following Carreira-Perpinan (2000), a mode is found by iterating the two steps:
with
until convergence, that is, until ,
where
is an argument with default value
.
Following Carreira-perpinan (2000), the algorithm is started at each component location.
Separately, it is necessary to identify identical modes which diverge only up to
a small value; this tolerance value can be controlled with the argument
tol_x
.
MEM algorithm
Following Li et al. (2007), a mode is found by iterating the two steps:
until convergence, that is, until ,
where
is an argument with default value
.
The algorithm is started at each component location.
Separately, it is necessary to identify identical modes which diverge only up to
a small value. Modes which are closer then
tol_x
are merged.
Discrete method By definition, modes must satisfy either:
The algorithm evaluate each location point with these two conditions.
A list of class mix_mode
containing:
mode_estimates |
estimates of the mixture modes. |
algo |
algorithm used for mode estimation. |
dist |
from |
dist_type |
type of mixture distribution, i.e. continuous or discrete. |
pars |
from |
pdf_func |
from |
K |
from |
nb_var |
from |
Cross JL, Hoogerheide L, Labonne P, van Dijk HK (2024). “Bayesian mode inference for discrete distributions in economics and finance.” Economics Letters, 235, 111579. ISSN 0165-1765, doi:10.1016/j.econlet.2024.111579.
Carreira-Perpinan MA (2000).
“Mode-finding for mixtures of Gaussian distributions.”
IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11), 1318–1323.
ISSN 1939-3539, doi:10.1109/34.888716, Conference Name: IEEE Transactions on Pattern Analysis and Machine Intelligence.
Cross JL, Hoogerheide L, Labonne P, van Dijk HK (2024).
“Bayesian mode inference for discrete distributions in economics and finance.”
Economics Letters, 235, 111579.
ISSN 0165-1765, doi:10.1016/j.econlet.2024.111579.
Li J, Ray S, Lindsay BG (2007).
“A Nonparametric Statistical Approach to Clustering via Mode Identification.”
Journal of Machine Learning Research, 8, 1687-1723.
# Example with a normal distribution ==================================== mu <- c(0, 5) sigma <- c(1, 2) p <- c(0.5, 0.5) params <- c(eta = p, mu = mu, sigma = sigma) mix <- mixture(params, dist = "normal", range = c(-5, 15)) modes <- mix_mode(mix) # summary(modes) # plot(modes) # Example with a skew normal ============================================= xi <- c(0, 6) omega <- c(1, 2) alpha <- c(0, 0) p <- c(0.8, 0.2) params <- c(eta = p, xi = xi, omega = omega, alpha = alpha) dist <- "skew_normal" mix <- mixture(params, dist = dist, range = c(-5, 15)) modes <- mix_mode(mix) # summary(modes) # plot(modes) # Example with an arbitrary continuous distribution ====================== xi <- c(0, 6) omega <- c(1, 2) alpha <- c(0, 0) nu <- c(3, 100) p <- c(0.8, 0.2) params <- c(eta = p, mu = xi, sigma = omega, xi = alpha, nu = nu) pdf_func <- function(x, pars) { sn::dst(x, pars["mu"], pars["sigma"], pars["xi"], pars["nu"]) } mix <- mixture(params, pdf_func = pdf_func, dist_type = "continuous", loc = "mu", range = c(-5, 15) ) modes <- mix_mode(mix) # summary(modes) # plot(modes, from = -4, to = 4) # Example with a poisson distribution ==================================== lambda <- c(0.1, 10) p <- c(0.5, 0.5) params <- c(eta = p, lambda = lambda) dist <- "poisson" mix <- mixture(params, range = c(0, 50), dist = dist) modes <- mix_mode(mix) # summary(modes) # plot(modes) # Example with an arbitrary discrete distribution ======================= mu <- c(20, 5) size <- c(20, 0.5) p <- c(0.5, 0.5) params <- c(eta = p, mu = mu, size = size) pmf_func <- function(x, pars) { dnbinom(x, mu = pars["mu"], size = pars["size"]) } mix <- mixture(params, range = c(0, 50), pdf_func = pmf_func, dist_type = "discrete" ) modes <- mix_mode(mix) # summary(modes) # plot(modes)
# Example with a normal distribution ==================================== mu <- c(0, 5) sigma <- c(1, 2) p <- c(0.5, 0.5) params <- c(eta = p, mu = mu, sigma = sigma) mix <- mixture(params, dist = "normal", range = c(-5, 15)) modes <- mix_mode(mix) # summary(modes) # plot(modes) # Example with a skew normal ============================================= xi <- c(0, 6) omega <- c(1, 2) alpha <- c(0, 0) p <- c(0.8, 0.2) params <- c(eta = p, xi = xi, omega = omega, alpha = alpha) dist <- "skew_normal" mix <- mixture(params, dist = dist, range = c(-5, 15)) modes <- mix_mode(mix) # summary(modes) # plot(modes) # Example with an arbitrary continuous distribution ====================== xi <- c(0, 6) omega <- c(1, 2) alpha <- c(0, 0) nu <- c(3, 100) p <- c(0.8, 0.2) params <- c(eta = p, mu = xi, sigma = omega, xi = alpha, nu = nu) pdf_func <- function(x, pars) { sn::dst(x, pars["mu"], pars["sigma"], pars["xi"], pars["nu"]) } mix <- mixture(params, pdf_func = pdf_func, dist_type = "continuous", loc = "mu", range = c(-5, 15) ) modes <- mix_mode(mix) # summary(modes) # plot(modes, from = -4, to = 4) # Example with a poisson distribution ==================================== lambda <- c(0.1, 10) p <- c(0.5, 0.5) params <- c(eta = p, lambda = lambda) dist <- "poisson" mix <- mixture(params, range = c(0, 50), dist = dist) modes <- mix_mode(mix) # summary(modes) # plot(modes) # Example with an arbitrary discrete distribution ======================= mu <- c(20, 5) size <- c(20, 0.5) p <- c(0.5, 0.5) params <- c(eta = p, mu = mu, size = size) pmf_func <- function(x, pars) { dnbinom(x, mu = pars["mu"], size = pars["size"]) } mix <- mixture(params, range = c(0, 50), pdf_func = pmf_func, dist_type = "discrete" ) modes <- mix_mode(mix) # summary(modes) # plot(modes)
mixture
Creates an object of class mixture
which can subsequently be used as argument in mix_mode()
for mode estimation.
mixture( pars, dist = NA_character_, pdf_func = NULL, dist_type = NA_character_, range, loc = NA_character_ )
mixture( pars, dist = NA_character_, pdf_func = NULL, dist_type = NA_character_, range, loc = NA_character_ )
pars |
Named vector of mixture parameters. |
dist |
Distribution family of the mixture components supported by
the package (i.e. |
pdf_func |
(function) Pdf or pmf of the mixture components;
this input is used only if |
dist_type |
Type of the distribution, either |
range |
upper and lower limit of the range where the mixture should be evaluated. |
loc |
(for continuous mixtures other than Normal mixtures) String indicating the location parameter of the distribution; the latter is used to initialise the MEM algorithm. |
A list of class mixture
containing:
pars |
Same as argument. |
pars_names |
Names of the parameters of the components' distribution. |
dist |
Same as argument. |
pdf_func |
Pdf (or pmf) of the mixture components. |
dist_type |
Same as argument. |
loc |
Type of the distribution, either |
nb_var |
Number of parameters in the mixture distribution. |
K |
Number of mixture components. |
range |
Same as argument. |
# Example with the skew normal ============================================= xi <- c(0, 6) omega <- c(1, 2) alpha <- c(0, 0) p <- c(0.8, 0.2) params <- c(eta = p, xi = xi, omega = omega, alpha = alpha) dist <- "skew_normal" mix <- mixture(params, dist = dist, range = c(-2, 10)) # summary(mix) # plot(mix) # Example with an arbitrary distribution =================================== mu <- c(0, 6) omega <- c(1, 2) xi <- c(0, 0) nu <- c(3, 100) p <- c(0.8, 0.2) params <- c(eta = p, mu = mu, sigma = omega, xi = xi, nu = nu) pdf_func <- function(x, pars) { sn::dst(x, pars["mu"], pars["sigma"], pars["xi"], pars["nu"]) } mix <- mixture(params, pdf_func = pdf_func, dist_type = "continuous", loc = "mu", range = c(-2, 10) ) # summary(mix) # plot(mix, from = -4, to = 4)
# Example with the skew normal ============================================= xi <- c(0, 6) omega <- c(1, 2) alpha <- c(0, 0) p <- c(0.8, 0.2) params <- c(eta = p, xi = xi, omega = omega, alpha = alpha) dist <- "skew_normal" mix <- mixture(params, dist = dist, range = c(-2, 10)) # summary(mix) # plot(mix) # Example with an arbitrary distribution =================================== mu <- c(0, 6) omega <- c(1, 2) xi <- c(0, 0) nu <- c(3, 100) p <- c(0.8, 0.2) params <- c(eta = p, mu = mu, sigma = omega, xi = xi, nu = nu) pdf_func <- function(x, pars) { sn::dst(x, pars["mu"], pars["sigma"], pars["xi"], pars["nu"]) } mix <- mixture(params, pdf_func = pdf_func, dist_type = "continuous", loc = "mu", range = c(-2, 10) ) # summary(mix) # plot(mix, from = -4, to = 4)
bayes_mixture
objectsPlot an estimated mixture for a given number of draws with a frequency distribution of the data.
## S3 method for class 'bayes_mixture' plot(x, draws = 250, draw = NULL, bins = 30, alpha = 0.1, ...)
## S3 method for class 'bayes_mixture' plot(x, draws = 250, draw = NULL, bins = 30, alpha = 0.1, ...)
x |
An object of class |
draws |
The number of MCMC draws to plot. |
draw |
Plot estimated mixture in draw |
bins |
(for continuous mixtures) Number of bins for the histogram of
the data. Passed to |
alpha |
transparency of the density lines. Default is 0.1. Should be greater than 0 and below or equal to 1. |
... |
Not used. |
bayes_mode
objectsPlot method for bayes_mode
objects
## S3 method for class 'bayes_mode' plot(x, graphs = c("p1", "number", "loc"), draw = NULL, ...)
## S3 method for class 'bayes_mode' plot(x, graphs = c("p1", "number", "loc"), draw = NULL, ...)
x |
An object of class |
graphs |
which plot to show ? Default is all three c("p1", "number", "loc"). |
draw |
Plot modes in a given mcmc draw; note that |
... |
Not used. |
mix_mode
objectsPlot method for mix_mode
objects
## S3 method for class 'mix_mode' plot(x, from = x$range[1], to = x$range[2], ...)
## S3 method for class 'mix_mode' plot(x, from = x$range[1], to = x$range[2], ...)
x |
An object of class |
from |
the lower limit of the range over which the function will be plotted.
Default is |
to |
the upper limit of the range over which the function will be plotted.
Default is |
... |
Not used. |
mixture
objectsPlot method for mixture
objects
## S3 method for class 'mixture' plot(x, from = x$range[1], to = x$range[2], ...)
## S3 method for class 'mixture' plot(x, from = x$range[1], to = x$range[2], ...)
x |
An object of class |
from |
the lower limit of the range over which the function will be plotted.
Default is |
to |
the upper limit of the range over which the function will be plotted.
Default is |
... |
Not used. |
bayes_mixture
objectsPrint method for bayes_mixture
objects
## S3 method for class 'bayes_mixture' print(x, max_length = 6L, max_width = 6L, print_all = F, ...)
## S3 method for class 'bayes_mixture' print(x, max_length = 6L, max_width = 6L, print_all = F, ...)
x |
An object of class |
max_length |
maximum number of elements (for vector) or rows (for matrices) to show. Default is |
max_width |
maximum number of columns to show (for matrices). Default is |
print_all |
override max_length and max_width to print everything? Default is FALSE. |
... |
Not used. |
bayes_mode
objectsPrint method for bayes_mode
objects
## S3 method for class 'bayes_mode' print(x, max_length = 6L, max_width = 6L, print_all = F, ...)
## S3 method for class 'bayes_mode' print(x, max_length = 6L, max_width = 6L, print_all = F, ...)
x |
An object of class |
max_length |
maximum number of elements (for vector) or rows (for matrices) to show. Default is |
max_width |
maximum number of columns to show (for matrices). Default is |
print_all |
override max_length and max_width to print everything? Default is FALSE. |
... |
Not used. |
mix_mode
objectsPrint method for mix_mode
objects
## S3 method for class 'mix_mode' print(x, max_length = 6L, max_width = 6L, print_all = F, ...)
## S3 method for class 'mix_mode' print(x, max_length = 6L, max_width = 6L, print_all = F, ...)
x |
An object of class |
max_length |
maximum number of elements (for vector) or rows (for matrices) to show. Default is |
max_width |
maximum number of columns to show (for matrices). Default is |
print_all |
override max_length and max_width to print everything? Default is FALSE. |
... |
Not used. |
mixture
objectsPrint method for mixture
objects
## S3 method for class 'mixture' print(x, max_length = 6L, max_width = 6L, print_all = F, ...)
## S3 method for class 'mixture' print(x, max_length = 6L, max_width = 6L, print_all = F, ...)
x |
An object of class |
max_length |
maximum number of elements (for vector) or rows (for matrices) to show. Default is |
max_width |
maximum number of columns to show (for matrices). Default is |
print_all |
override max_length and max_width to print everything? Default is FALSE. |
... |
Not used. |
bayes_mixture
objects
The summary of MCMC draws is given by the function
summarise_draws
from package posterior.Summary method for bayes_mixture
objects
The summary of MCMC draws is given by the function
summarise_draws
from package posterior.
## S3 method for class 'bayes_mixture' summary(object, ...)
## S3 method for class 'bayes_mixture' summary(object, ...)
object |
An object of class |
... |
Not used. |
bayes_mode
objectsSummary method for bayes_mode
objects
## S3 method for class 'bayes_mode' summary(object, ...)
## S3 method for class 'bayes_mode' summary(object, ...)
object |
An object of class |
... |
Not used. |
mix_mode
objectsSummary method for mix_mode
objects
## S3 method for class 'mix_mode' summary(object, ...)
## S3 method for class 'mix_mode' summary(object, ...)
object |
An object of class |
... |
Not used. |
mixture
objectsSummary method for mixture
objects
## S3 method for class 'mixture' summary(object, ...)
## S3 method for class 'mixture' summary(object, ...)
object |
An object of class |
... |
Not used. |