Title: | Addons for the 'mclust' Package |
---|---|
Description: | Extend the functionality of the 'mclust' package for Gaussian finite mixture modeling by including: density estimation for data with bounded support (Scrucca, 2019 <doi:10.1002/bimj.201800174>); modal clustering using MEM (Modal EM) algorithm for Gaussian mixtures (Scrucca, 2021 <doi:10.1002/sam.11527>); entropy estimation via Gaussian mixture modeling (Robin & Scrucca, 2023 <doi:10.1016/j.csda.2022.107582>); Gaussian mixtures modeling of financial log-returns (Scrucca, 2024 <doi:10.3390/e26110907>). |
Authors: | Luca Scrucca [aut, cre, cph] |
Maintainer: | Luca Scrucca <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.9.1 |
Built: | 2024-11-27 06:56:59 UTC |
Source: | CRAN |
Compute the cumulative density function (cdf) or quantiles of a
one-dimensional density for bounded data estimated via the
transformation-based approach for Gaussian mixtures in
densityMclustBounded()
.
Diagnostic plots for density estimation of bounded data via transformation-based approach of Gaussian mixtures. Only available for the one-dimensional case.
The two diagnostic plots for density estimation in the one-dimensional case are discussed in Loader (1999, pp- 87-90).
cdfDensityBounded(object, data, ngrid = 100, ...) quantileDensityBounded(object, p, ...) densityMclustBounded.diagnostic( object, type = c("cdf", "qq"), col = c("black", "black"), lwd = c(2, 1), lty = c(1, 1), legend = TRUE, grid = TRUE, ... )
cdfDensityBounded(object, data, ngrid = 100, ...) quantileDensityBounded(object, p, ...) densityMclustBounded.diagnostic( object, type = c("cdf", "qq"), col = c("black", "black"), lwd = c(2, 1), lty = c(1, 1), legend = TRUE, grid = TRUE, ... )
object |
An object of class |
data |
A numeric vector of evaluation points. |
ngrid |
The number of points in a regular grid to be used as evaluation
points if no |
... |
Additional arguments. |
p |
A numeric vector of probabilities corresponding to quantiles. |
type |
The type of graph requested:
|
col |
A pair of values for the color to be used for plotting, respectively, the estimated CDF and the empirical cdf. |
lwd |
A pair of values for the line width to be used for plotting, respectively, the estimated CDF and the empirical cdf. |
lty |
A pair of values for the line type to be used for plotting, respectively, the estimated CDF and the empirical cdf. |
legend |
A logical indicating if a legend must be added to the plot of fitted CDF vs the empirical CDF. |
grid |
A logical indicating if a |
The cdf is evaluated at points given by the optional argument data
.
If not provided, a regular grid of length ngrid
for the evaluation
points is used.
The quantiles are computed using bisection linear search algorithm.
cdfDensityBounded()
returns a list of x
and y
values providing,
respectively, the evaluation points and the estimated cdf.
quantileDensityBounded()
returns a vector of quantiles.
No return value, called for side effects.
Luca Scrucca
Loader C. (1999), Local Regression and Likelihood. New York, Springer.
densityMclustBounded()
, plot.densityMclustBounded()
.
densityMclustBounded()
, plot.densityMclustBounded()
.
# univariate case with lower bound x <- rchisq(200, 3) dens <- densityMclustBounded(x, lbound = 0) xgrid <- seq(-2, max(x), length=1000) cdf <- cdfDensityBounded(dens, xgrid) str(cdf) plot(xgrid, pchisq(xgrid, df = 3), type = "l", xlab = "x", ylab = "CDF") lines(cdf, col = 4, lwd = 2) q <- quantileDensityBounded(dens, p = c(0.01, 0.1, 0.5, 0.9, 0.99)) cbind(quantile = q, cdf = cdfDensityBounded(dens, q)$y) plot(cdf, type = "l", col = 4, xlab = "x", ylab = "CDF") points(q, cdfDensityBounded(dens, q)$y, pch = 19, col = 4) # univariate case with lower & upper bounds x <- rbeta(200, 5, 1.5) dens <- densityMclustBounded(x, lbound = 0, ubound = 1) xgrid <- seq(-0.1, 1.1, length=1000) cdf <- cdfDensityBounded(dens, xgrid) str(cdf) plot(xgrid, pbeta(xgrid, 5, 1.5), type = "l", xlab = "x", ylab = "CDF") lines(cdf, col = 4, lwd = 2) q <- quantileDensityBounded(dens, p = c(0.01, 0.1, 0.5, 0.9, 0.99)) cbind(quantile = q, cdf = cdfDensityBounded(dens, q)$y) plot(cdf, type = "l", col = 4, xlab = "x", ylab = "CDF") points(q, cdfDensityBounded(dens, q)$y, pch = 19, col = 4) # univariate case with lower bound x <- rchisq(200, 3) dens <- densityMclustBounded(x, lbound = 0) plot(dens, x, what = "diagnostic") # or densityMclustBounded.diagnostic(dens, type = "cdf") densityMclustBounded.diagnostic(dens, type = "qq") # univariate case with lower & upper bounds x <- rbeta(200, 5, 1.5) dens <- densityMclustBounded(x, lbound = 0, ubound = 1) plot(dens, x, what = "diagnostic") # or densityMclustBounded.diagnostic(dens, type = "cdf") densityMclustBounded.diagnostic(dens, type = "qq")
# univariate case with lower bound x <- rchisq(200, 3) dens <- densityMclustBounded(x, lbound = 0) xgrid <- seq(-2, max(x), length=1000) cdf <- cdfDensityBounded(dens, xgrid) str(cdf) plot(xgrid, pchisq(xgrid, df = 3), type = "l", xlab = "x", ylab = "CDF") lines(cdf, col = 4, lwd = 2) q <- quantileDensityBounded(dens, p = c(0.01, 0.1, 0.5, 0.9, 0.99)) cbind(quantile = q, cdf = cdfDensityBounded(dens, q)$y) plot(cdf, type = "l", col = 4, xlab = "x", ylab = "CDF") points(q, cdfDensityBounded(dens, q)$y, pch = 19, col = 4) # univariate case with lower & upper bounds x <- rbeta(200, 5, 1.5) dens <- densityMclustBounded(x, lbound = 0, ubound = 1) xgrid <- seq(-0.1, 1.1, length=1000) cdf <- cdfDensityBounded(dens, xgrid) str(cdf) plot(xgrid, pbeta(xgrid, 5, 1.5), type = "l", xlab = "x", ylab = "CDF") lines(cdf, col = 4, lwd = 2) q <- quantileDensityBounded(dens, p = c(0.01, 0.1, 0.5, 0.9, 0.99)) cbind(quantile = q, cdf = cdfDensityBounded(dens, q)$y) plot(cdf, type = "l", col = 4, xlab = "x", ylab = "CDF") points(q, cdfDensityBounded(dens, q)$y, pch = 19, col = 4) # univariate case with lower bound x <- rchisq(200, 3) dens <- densityMclustBounded(x, lbound = 0) plot(dens, x, what = "diagnostic") # or densityMclustBounded.diagnostic(dens, type = "cdf") densityMclustBounded.diagnostic(dens, type = "qq") # univariate case with lower & upper bounds x <- rbeta(200, 5, 1.5) dens <- densityMclustBounded(x, lbound = 0, ubound = 1) plot(dens, x, what = "diagnostic") # or densityMclustBounded.diagnostic(dens, type = "cdf") densityMclustBounded.diagnostic(dens, type = "qq")
Density estimation for bounded data via transformation-based approach for Gaussian mixtures.
densityMclustBounded( data, G = NULL, modelNames = NULL, criterion = c("BIC", "ICL"), lbound = NULL, ubound = NULL, lambda = c(-3, 3), prior = NULL, initialization = NULL, nstart = 25, parallel = FALSE, seed = NULL, ... ) ## S3 method for class 'densityMclustBounded' summary(object, parameters = FALSE, ...)
densityMclustBounded( data, G = NULL, modelNames = NULL, criterion = c("BIC", "ICL"), lbound = NULL, ubound = NULL, lambda = c(-3, 3), prior = NULL, initialization = NULL, nstart = 25, parallel = FALSE, seed = NULL, ... ) ## S3 method for class 'densityMclustBounded' summary(object, parameters = FALSE, ...)
data |
A numeric vector, matrix, or data frame of observations. If a matrix or data frame, rows correspond to observations and columns correspond to variables. |
G |
An integer vector specifying the numbers of mixture components. By
default |
modelNames |
A vector of character strings indicating the Gaussian
mixture models to be fitted on the transformed-data space. See
|
criterion |
A character string specifying the information criterion for
model selection. Possible values are |
lbound |
Numeric vector proving lower bounds for variables. |
ubound |
Numeric vector proving upper bounds for variables. |
lambda |
A numeric vector providing the range (min and max) of searched values for the transformation parameter(s). If a matrix is provided, then for each variable a row should be provided containing the range of lambda values for the transformation parameter. If a variable must have a fixed lambda value, the provided min and max values should be equal. See examples below. |
prior |
A function specifying a prior for Bayesian regularization of
Gaussian mixtures. See |
initialization |
A list containing one or more of the following components:
|
nstart |
An integer value specifying the number of replications of
k-means clustering to be used for initializing the EM algorithm. See
|
parallel |
An optional argument which allows to specify if the search over all possible models should be run sequentially (default) or in parallel. For a single machine with multiple cores, possible values are:
In all the cases described above, at the end of the search the cluster is automatically stopped by shutting down the workers. If a cluster of multiple machines is available, evaluation of the fitness
function can be executed in parallel using all, or a subset of, the cores
available to the machines belonging to the cluster. However, this option
requires more work from the user, who needs to set up and register a
parallel back end. In this case the cluster must be explicitely stopped
with |
seed |
An integer value containing the random number generator state. This argument can be used to replicate the result of k-means initialisation strategy. Note that if parallel computing is required, the doRNG package must be installed. |
... |
Further arguments passed to or from other methods. |
object |
An object of class |
parameters |
A logical, if |
For more details see
vignette("mclustAddons")
Returns an object of class 'densityMclustBounded'
.
Luca Scrucca
Scrucca L. (2019) A transformation-based approach to Gaussian mixture density estimation for bounded data. Biometrical Journal, 61:4, 873–888. doi:10.1002/bimj.201800174
predict.densityMclustBounded()
, plot.densityMclustBounded()
.
# univariate case with lower bound x <- rchisq(200, 3) xgrid <- seq(-2, max(x), length=1000) f <- dchisq(xgrid, 3) # true density dens <- densityMclustBounded(x, lbound = 0) summary(dens) summary(dens, parameters = TRUE) plot(dens, what = "BIC") plot(dens, what = "density") lines(xgrid, f, lty = 2) plot(dens, what = "density", data = x, breaks = 15) # univariate case with lower & upper bounds x <- rbeta(200, 5, 1.5) xgrid <- seq(-0.1, 1.1, length=1000) f <- dbeta(xgrid, 5, 1.5) # true density dens <- densityMclustBounded(x, lbound = 0, ubound = 1) summary(dens) plot(dens, what = "BIC") plot(dens, what = "density") plot(dens, what = "density", data = x, breaks = 9) # bivariate case with lower bounds x1 <- rchisq(200, 3) x2 <- 0.5*x1 + sqrt(1-0.5^2)*rchisq(200, 5) x <- cbind(x1, x2) plot(x) dens <- densityMclustBounded(x, lbound = c(0,0)) summary(dens, parameters = TRUE) plot(dens, what = "BIC") plot(dens, what = "density") plot(dens, what = "density", type = "hdr") plot(dens, what = "density", type = "persp") # specify different ranges for the lambda values of each variable dens1 <- densityMclustBounded(x, lbound = c(0,0), lambda = matrix(c(-2,2,0,1), 2, 2, byrow=TRUE)) # set lambda = 0 fixed for the second variable dens2 <- densityMclustBounded(x, lbound = c(0,0), lambda = matrix(c(0,1,0,0), 2, 2, byrow=TRUE)) dens[c("lambdaRange", "lambda", "loglik", "df")] dens1[c("lambdaRange", "lambda", "loglik", "df")] dens2[c("lambdaRange", "lambda", "loglik", "df")]
# univariate case with lower bound x <- rchisq(200, 3) xgrid <- seq(-2, max(x), length=1000) f <- dchisq(xgrid, 3) # true density dens <- densityMclustBounded(x, lbound = 0) summary(dens) summary(dens, parameters = TRUE) plot(dens, what = "BIC") plot(dens, what = "density") lines(xgrid, f, lty = 2) plot(dens, what = "density", data = x, breaks = 15) # univariate case with lower & upper bounds x <- rbeta(200, 5, 1.5) xgrid <- seq(-0.1, 1.1, length=1000) f <- dbeta(xgrid, 5, 1.5) # true density dens <- densityMclustBounded(x, lbound = 0, ubound = 1) summary(dens) plot(dens, what = "BIC") plot(dens, what = "density") plot(dens, what = "density", data = x, breaks = 9) # bivariate case with lower bounds x1 <- rchisq(200, 3) x2 <- 0.5*x1 + sqrt(1-0.5^2)*rchisq(200, 5) x <- cbind(x1, x2) plot(x) dens <- densityMclustBounded(x, lbound = c(0,0)) summary(dens, parameters = TRUE) plot(dens, what = "BIC") plot(dens, what = "density") plot(dens, what = "density", type = "hdr") plot(dens, what = "density", type = "persp") # specify different ranges for the lambda values of each variable dens1 <- densityMclustBounded(x, lbound = c(0,0), lambda = matrix(c(-2,2,0,1), 2, 2, byrow=TRUE)) # set lambda = 0 fixed for the second variable dens2 <- densityMclustBounded(x, lbound = c(0,0), lambda = matrix(c(0,1,0,0), 2, 2, byrow=TRUE)) dens[c("lambdaRange", "lambda", "loglik", "df")] dens1[c("lambdaRange", "lambda", "loglik", "df")] dens2[c("lambdaRange", "lambda", "loglik", "df")]
Compute an estimate of the (differential) entropy from a Gaussian Mixture Model (GMM) fitted using the mclust package.
EntropyGMM(object, ...) ## S3 method for class 'densityMclust' EntropyGMM(object, ...) ## S3 method for class 'densityMclustBounded' EntropyGMM(object, ...) ## S3 method for class 'Mclust' EntropyGMM(object, ...) ## S3 method for class 'data.frame' EntropyGMM(object, ...) ## S3 method for class 'matrix' EntropyGMM(object, ...) EntropyGauss(sigma) nats2bits(x) bits2nats(x)
EntropyGMM(object, ...) ## S3 method for class 'densityMclust' EntropyGMM(object, ...) ## S3 method for class 'densityMclustBounded' EntropyGMM(object, ...) ## S3 method for class 'Mclust' EntropyGMM(object, ...) ## S3 method for class 'data.frame' EntropyGMM(object, ...) ## S3 method for class 'matrix' EntropyGMM(object, ...) EntropyGauss(sigma) nats2bits(x) bits2nats(x)
object |
An object of class If a |
... |
Further arguments passed to or from other methods. |
sigma |
A symmetric covariance matrix. |
x |
A vector of values. |
For more details see
vignette("mclustAddons")
EntropyGMM()
returns an estimate of the entropy based on a
estimated Gaussian mixture model (GMM) fitted using the mclust
package. If a matrix of data values is provided, a GMM is preliminary
fitted to the data and then the entropy computed.
EntropyGauss()
returns the entropy for a multivariate Gaussian
distribution with covariance matrix sigma
.
nats2bits()
and bits2nats()
convert input values in nats to
bits, and viceversa. Information-theoretic quantities have different
units depending on the base of the logarithm used: nats are expressed
in base-2 logarithms, whereas bits in natural logarithms.
Luca Scrucca
Robin S. and Scrucca L. (2023) Mixture-based estimation of entropy. Computational Statistics & Data Analysis, 177, 107582. doi:10.1016/j.csda.2022.107582
mclust::Mclust()
, mclust::densityMclust()
.
X = iris[,1:4] mod = densityMclust(X, plot = FALSE) h = EntropyGMM(mod) h bits2nats(h) EntropyGMM(X)
X = iris[,1:4] mod = densityMclust(X, plot = FALSE) h = EntropyGMM(mod) h bits2nats(h) EntropyGMM(X)
A function implementing a fast and efficient Modal EM algorithm for Gaussian mixtures.
GaussianMixtureMEM( data, pro, mu, sigma, control = list(eps = 1e-05, maxiter = 1000, stepsize = function(t) 1 - exp(-0.1 * t), denoise = TRUE, alpha = 0.01, keep.path = FALSE), ... )
GaussianMixtureMEM( data, pro, mu, sigma, control = list(eps = 1e-05, maxiter = 1000, stepsize = function(t) 1 - exp(-0.1 * t), denoise = TRUE, alpha = 0.01, keep.path = FALSE), ... )
data |
A numeric vector, matrix, or data frame of observations.
Categorical variables are not allowed. If a matrix or data frame, rows
correspond to observations ( |
pro |
A |
mu |
A |
sigma |
A |
control |
A list of control parameters:
|
... |
Further arguments passed to or from other methods. |
Returns a list containing the following elements:
n
The number of input data points.
d
The number of variables/features.
parameters
The Gaussian mixture parameters.
iter
The number of iterations of MEM algorithm.
nmodes
The number of modes estimated by the MEM algorithm.
modes
The coordinates of modes estimated by MEM algorithm.
path
If requested, the coordinates of full paths to modes for each data point.
logdens
The log-density at the estimated modes.
logvol
The log-volume used for denoising (if requested).
classification
The modal clustering classification of input data points.
Luca Scrucca
Scrucca L. (2021) A fast and efficient Modal EM algorithm for Gaussian mixtures. Statistical Analysis and Data Mining, 14:4, 305–314. doi: 10.1002/sam.11527
Gaussian mixtures for modeling the distribution of financial log-returns.
GMMlogreturn(y, ...) ## S3 method for class 'GMMlogreturn' summary(object, ...)
GMMlogreturn(y, ...) ## S3 method for class 'GMMlogreturn' summary(object, ...)
y |
A numeric vector providing the log-returns of a financial stock. |
... |
Further arguments passed to |
object |
An object of class |
Let be the price of a financial stock for the current time frame
(day for instance), and
the price of the previous time frame.
The log-return at time
is defined as:
A univariate heteroscedastic GMM using Bayesian regularization
(as described in mclust::priorControl()
) is fitted to the observed
log-returns. The number of mixture components is automatically selected
by BIC, unless specified with the optional G
argument.
Returns an object of class 'GMMlogreturn'
.
Luca Scrucca
Scrucca L. (2024) Entropy-based volatility analysis of financial log-returns using Gaussian mixture models. Entropy, 26(11), 907. doi:10.3390/e26110907
VaR.GMMlogreturn()
, ES.GMMlogreturn()
.
data(gold) head(gold) mod = GMMlogreturn(gold$log.returns) summary(mod) plot(mod, what = "density", data = gold$log.returns, xlab = "log-returns", col = 4, lwd = 2)
data(gold) head(gold) mod = GMMlogreturn(gold$log.returns) summary(mod) plot(mod, what = "density", data = gold$log.returns, xlab = "log-returns", col = 4, lwd = 2)
Gold price log-returns for the year 2023 obtained from Yahoo Finance using
the quantmod
R package.
Code used to download, format, and save the data:
gold = quantmod::getSymbols("GC=F", src = "yahoo", auto.assign = FALSE) gold = quantmod::dailyReturn(gold, type = "log") gold = data.frame("date" = as.Date(zoo::index(gold)), "log.returns" = as.vector(gold$daily.returns), row.names = NULL)
A data frame with the following variables:
Date (format: yyyy-mmm-dd).
Daily log-return.
Internal functions not intended to be called directly by users.
as.MclustBounded(x, ...) ## Default S3 method: as.MclustBounded(x, ...) ## S3 method for class 'densityMclustBounded' as.MclustBounded(x, ...) as.densityMclustBounded(x, ...) ## Default S3 method: as.densityMclustBounded(x, ...) ## S3 method for class 'MclustBounded' as.densityMclustBounded(x, ...)
as.MclustBounded(x, ...) ## Default S3 method: as.MclustBounded(x, ...) ## S3 method for class 'densityMclustBounded' as.MclustBounded(x, ...) as.densityMclustBounded(x, ...) ## Default S3 method: as.densityMclustBounded(x, ...) ## S3 method for class 'MclustBounded' as.densityMclustBounded(x, ...)
x |
An object of class specific for the method. |
... |
Further arguments passed to or from other methods. |
Clustering of bounded data via transformation-based approach for Gaussian mixtures.
MclustBounded(data, ...) ## S3 method for class 'MclustBounded' summary(object, classification = TRUE, parameters = FALSE, ...)
MclustBounded(data, ...) ## S3 method for class 'MclustBounded' summary(object, classification = TRUE, parameters = FALSE, ...)
data |
A numeric vector, matrix, or data frame of observations. If a matrix or data frame, rows correspond to observations and columns correspond to variables. |
... |
Further arguments passed to |
object |
An object of class |
classification |
A logical, if |
parameters |
A logical, if |
For more details see
vignette("mclustAddons")
Returns an object of class 'MclustBounded'
.
Luca Scrucca
Scrucca L. (2019) A transformation-based approach to Gaussian mixture density estimation for bounded data. Biometrical Journal, 61:4, 873–888. doi:10.1002/bimj.201800174
densityMclustBounded()
, predict.MclustBounded()
, plot.MclustBounded()
.
Given a GMM for bounded data, computes the means and variances in the original scale from the estimated mixture components parameters dataset using simulations.
MclustBoundedParameters(object, nsim = 1e+06, ...)
MclustBoundedParameters(object, nsim = 1e+06, ...)
object |
An object of class |
nsim |
An integer specifying the number of simulations to employ. |
... |
Further arguments passed to or from other methods. |
x = rlnorm(1000, 0, 1) mod = densityMclustBounded(x, lbound = 0, lambda = 0) summary(mod, parameters = TRUE) plot(mod, what = "density") # transformed parameters (from log-normal distribution) # mean with(mod$parameters, exp(mean + 0.5*variance$sigmasq)) # var with(mod$parameters, (exp(variance$sigmasq) - 1)*exp(2*mean + variance$sigmasq)) # using simulations MclustBoundedParameters(mod)
x = rlnorm(1000, 0, 1) mod = densityMclustBounded(x, lbound = 0, lambda = 0) summary(mod, parameters = TRUE) plot(mod, what = "density") # transformed parameters (from log-normal distribution) # mean with(mod$parameters, exp(mean + 0.5*variance$sigmasq)) # var with(mod$parameters, (exp(variance$sigmasq) - 1)*exp(2*mean + variance$sigmasq)) # using simulations MclustBoundedParameters(mod)
Function to compute the marginal parameters from a fitted Gaussian mixture models.
mclustMarginalParams(object, ...) gmm2margParams(pro, mu, sigma, ...)
mclustMarginalParams(object, ...) gmm2margParams(pro, mu, sigma, ...)
object |
An object of class |
... |
Further arguments passed to or from other methods. |
pro |
A vector of mixing proportions for each mixture component. |
mu |
A matrix of mean vectors for each mixture component. For
a |
sigma |
An array of covariance matrices for each mixture component.
For a |
Given a -component GMM with estimated mixture weight
,
mean vector
, and covariance matrix
, for
mixture component
, then the marginal distribution has:
mean vector
covariance matrix
Returns a list of two components for the mean and covariance of the marginal distribution.
Luca Scrucca
Frühwirth-Schnatter S. (2006) Finite Mixture and Markov Switching Models, Springer, Sec. 6.1.1
mclust::Mclust()
, mclust::densityMclust()
.
x = iris[,1:4] mod = Mclust(x, G = 3) mod$parameters$pro mod$parameters$mean mod$parameters$variance$sigma mclustMarginalParams(mod)
x = iris[,1:4] mod = Mclust(x, G = 3) mod$parameters$pro mod$parameters$mean mod$parameters$variance$sigma mclustMarginalParams(mod)
Modal-clustering estimation by applying the Modal EM algorithm to Gaussian mixtures fitted using the mclust package.
MclustMEM(object, data = NULL, ...) ## S3 method for class 'MclustMEM' summary(object, ...)
MclustMEM(object, data = NULL, ...) ## S3 method for class 'MclustMEM' summary(object, ...)
object |
An object of class |
data |
If provided, a numeric vector, matrix, or data frame of
observations. If a matrix or data frame, rows correspond to observations
( |
... |
Further arguments passed to or from other methods. |
For more details see
vignette("mclustAddons")
Returns an object of class 'MclustMEM'
with elements described in
GaussianMixtureMEM()
.
Luca Scrucca
Scrucca L. (2021) A fast and efficient Modal EM algorithm for Gaussian mixtures. Statistical Analysis and Data Mining, 14:4, 305–314. doi:10.1002/sam.11527
GaussianMixtureMEM()
, plot.MclustMEM()
.
data(Baudry_etal_2010_JCGS_examples, package = "mclust") plot(ex4.1) GMM <- Mclust(ex4.1) plot(GMM, what = "classification") MEM <- MclustMEM(GMM) MEM summary(MEM) plot(MEM) plot(ex4.4.2) GMM <- Mclust(ex4.4.2) plot(GMM, what = "classification") MEM <- MclustMEM(GMM) MEM summary(MEM) plot(MEM, addDensity = FALSE)
data(Baudry_etal_2010_JCGS_examples, package = "mclust") plot(ex4.1) GMM <- Mclust(ex4.1) plot(GMM, what = "classification") MEM <- MclustMEM(GMM) MEM summary(MEM) plot(MEM) plot(ex4.4.2) GMM <- Mclust(ex4.4.2) plot(GMM, what = "classification") MEM <- MclustMEM(GMM) MEM summary(MEM) plot(MEM, addDensity = FALSE)
Plots for mclustDensityBounded
objects.
## S3 method for class 'densityMclustBounded' plot(x, what = c("BIC", "density", "diagnostic"), data = NULL, ...)
## S3 method for class 'densityMclustBounded' plot(x, what = c("BIC", "density", "diagnostic"), data = NULL, ...)
x |
An object of class |
what |
The type of graph requested:
|
data |
Optional data points. |
... |
Further available arguments:
|
No return value, called for side effects.
Luca Scrucca
Scrucca L. (2019) A transformation-based approach to Gaussian mixture density estimation for bounded data. Biometrical Journal, 61:4, 873–888. doi:10.1002/bimj.201800174
densityMclustBounded()
, predict.densityMclustBounded()
.
# univariate case with lower bound x <- rchisq(200, 3) dens <- densityMclustBounded(x, lbound = 0) plot(dens, what = "BIC") plot(dens, what = "density", data = x, breaks = 15) # univariate case with lower & upper bound x <- rbeta(200, 5, 1.5) dens <- densityMclustBounded(x, lbound = 0, ubound = 1) plot(dens, what = "BIC") plot(dens, what = "density", data = x, breaks = 9) # bivariate case with lower bounds x1 <- rchisq(200, 3) x2 <- 0.5*x1 + sqrt(1-0.5^2)*rchisq(200, 5) x <- cbind(x1, x2) dens <- densityMclustBounded(x, lbound = c(0,0)) plot(dens, what = "density") plot(dens, what = "density", data = x) plot(dens, what = "density", type = "hdr") plot(dens, what = "density", type = "persp")
# univariate case with lower bound x <- rchisq(200, 3) dens <- densityMclustBounded(x, lbound = 0) plot(dens, what = "BIC") plot(dens, what = "density", data = x, breaks = 15) # univariate case with lower & upper bound x <- rbeta(200, 5, 1.5) dens <- densityMclustBounded(x, lbound = 0, ubound = 1) plot(dens, what = "BIC") plot(dens, what = "density", data = x, breaks = 9) # bivariate case with lower bounds x1 <- rchisq(200, 3) x2 <- 0.5*x1 + sqrt(1-0.5^2)*rchisq(200, 5) x <- cbind(x1, x2) dens <- densityMclustBounded(x, lbound = c(0,0)) plot(dens, what = "density") plot(dens, what = "density", data = x) plot(dens, what = "density", type = "hdr") plot(dens, what = "density", type = "persp")
Plotting method for model-based clustering of bounded data
## S3 method for class 'MclustBounded' plot(x, what = c("BIC", "classification", "uncertainty"), dimens = NULL, ...)
## S3 method for class 'MclustBounded' plot(x, what = c("BIC", "classification", "uncertainty"), dimens = NULL, ...)
x |
An object of class |
what |
A string specifying the type of graph requested. Available choices are:
|
dimens |
A vector of integers specifying the dimensions of the coordinate projections. |
... |
Further arguments passed to or from other methods. |
No return value, called for side effects.
Plots for MclustMEM
objects.
## S3 method for class 'MclustMEM' plot( x, dimens = NULL, addDensity = TRUE, addPoints = TRUE, symbols = NULL, colors = NULL, cex = NULL, labels = NULL, cex.labels = NULL, gap = 0.2, ... )
## S3 method for class 'MclustMEM' plot( x, dimens = NULL, addDensity = TRUE, addPoints = TRUE, symbols = NULL, colors = NULL, cex = NULL, labels = NULL, cex.labels = NULL, gap = 0.2, ... )
x |
An object of class |
dimens |
A vector of integers specifying the dimensions of the coordinate projections. |
addDensity |
A logical indicating whether or not to add density estimates to the plot. |
addPoints |
A logical indicating whether or not to add data points to the plot. |
symbols |
Either an integer or character vector assigning a plotting
symbol to each unique class in |
colors |
Either an integer or character vector assigning a color to
each unique class in |
cex |
A vector of numerical values specifying the size of the plotting
symbol for each unique class in |
labels |
A vector of character strings for labelling the variables. The
default is to use the column dimension names of |
cex.labels |
A numerical value specifying the size of the text labels. |
gap |
A numerical argument specifying the distance between subplots
(see |
... |
Further arguments passed to or from other methods. |
No return value, called for side effects.
Luca Scrucca
Scrucca L. (2021) A fast and efficient Modal EM algorithm for Gaussian mixtures. Statistical Analysis and Data Mining, 14:4, 305–314. doi: 10.1002/sam.11527
# 1-d example GMM <- Mclust(iris$Petal.Length) MEM <- MclustMEM(GMM) plot(MEM) # 2-d example data(Baudry_etal_2010_JCGS_examples) GMM <- Mclust(ex4.1) MEM <- MclustMEM(GMM) plot(MEM) plot(MEM, addPoints = FALSE) plot(MEM, addDensity = FALSE) # 3-d example GMM <- Mclust(ex4.4.2) MEM <- MclustMEM(GMM) plot(MEM) plot(MEM, addPoints = FALSE) plot(MEM, addDensity = FALSE)
# 1-d example GMM <- Mclust(iris$Petal.Length) MEM <- MclustMEM(GMM) plot(MEM) # 2-d example data(Baudry_etal_2010_JCGS_examples) GMM <- Mclust(ex4.1) MEM <- MclustMEM(GMM) plot(MEM) plot(MEM, addPoints = FALSE) plot(MEM, addDensity = FALSE) # 3-d example GMM <- Mclust(ex4.4.2) MEM <- MclustMEM(GMM) plot(MEM) plot(MEM, addPoints = FALSE) plot(MEM, addDensity = FALSE)
Predict density estimates for univariate and multivariate bounded data
based on Gaussian finite mixture models estimated by
densityMclustBounded()
.
## S3 method for class 'densityMclustBounded' predict( object, newdata, what = c("dens", "cdens", "z"), logarithm = FALSE, ... )
## S3 method for class 'densityMclustBounded' predict( object, newdata, what = c("dens", "cdens", "z"), logarithm = FALSE, ... )
object |
An object of class |
newdata |
A numeric vector, matrix, or data frame of observations. If
missing the density is computed for the input data obtained from the call to
|
what |
A character string specifying what to retrieve: |
logarithm |
A logical value indicating whether or not the logarithm of the densities/probabilities should be returned. |
... |
Further arguments passed to or from other methods. |
Returns a vector or a matrix of values evaluated at newdata
depending
on the argument what
(see above).
Luca Scrucca
Scrucca L. (2019) A transformation-based approach to Gaussian mixture density estimation for bounded data. Biometrical Journal, 61:4, 873–888. doi:10.1002/bimj.201800174
densityMclustBounded()
, plot.densityMclustBounded()
.
y <- sample(0:1, size = 200, replace = TRUE, prob = c(0.6, 0.4)) x <- y*rchisq(200, 3) + (1-y)*rchisq(200, 10) dens <- densityMclustBounded(x, lbound = 0) summary(dens) plot(dens, what = "density", data = x, breaks = 11) xgrid <- seq(0, max(x), length = 201) densx <- predict(dens, newdata = xgrid, what = "dens") cdensx <- predict(dens, newdata = xgrid, what = "cdens") cdensx <- sweep(cdensx, MARGIN = 2, FUN = "*", dens$parameters$pro) plot(xgrid, densx, type = "l", lwd = 2) matplot(xgrid, cdensx, type = "l", col = 3:4, lty = 2:3, lwd = 2, add = TRUE) z <- predict(dens, newdata = xgrid, what = "z") matplot(xgrid, z, col = 3:4, lty = 2:3, lwd = 2, ylab = "Posterior probabilities")
y <- sample(0:1, size = 200, replace = TRUE, prob = c(0.6, 0.4)) x <- y*rchisq(200, 3) + (1-y)*rchisq(200, 10) dens <- densityMclustBounded(x, lbound = 0) summary(dens) plot(dens, what = "density", data = x, breaks = 11) xgrid <- seq(0, max(x), length = 201) densx <- predict(dens, newdata = xgrid, what = "dens") cdensx <- predict(dens, newdata = xgrid, what = "cdens") cdensx <- sweep(cdensx, MARGIN = 2, FUN = "*", dens$parameters$pro) plot(xgrid, densx, type = "l", lwd = 2) matplot(xgrid, cdensx, type = "l", col = 3:4, lty = 2:3, lwd = 2, add = TRUE) z <- predict(dens, newdata = xgrid, what = "z") matplot(xgrid, z, col = 3:4, lty = 2:3, lwd = 2, ylab = "Posterior probabilities")
Predict clustering for univariate and multivariate bounded data based on
Gaussian finite mixture models estimated by MclustBounded()
.
## S3 method for class 'MclustBounded' predict(object, newdata, ...)
## S3 method for class 'MclustBounded' predict(object, newdata, ...)
object |
An object of class |
newdata |
A numeric vector, matrix, or data frame of observations. If
missing the density is computed for the input data obtained from the call to
|
... |
Further arguments passed to or from other methods. |
Returns a list of with the following components:
classification
A factor of predicted cluster labels for newdata.
z
A matrix whose th entry is the probability that
th
observation in
newdata
belongs to the th cluster.
Luca Scrucca
Scrucca L. (2019) A transformation-based approach to Gaussian mixture density estimation for bounded data. Biometrical Journal, 61:4, 873–888. doi:10.1002/bimj.201800174
MclustBounded()
, plot.MclustBounded()
.
Proportion of white student enrollment in 56 school districts in Nassau County (Long Island, New York), for the 1992-1993 school year.
A data frame with the following variables:
School district.
Proportion of white student enrolled.
Simonoff, S.J. (1996) Smoothing Methods in Statistics, Springer-Verlag, New York, p. 52
Functions to compute univariate range–power transformation and its back-transform.
rangepowerTransform(x, lbound = -Inf, ubound = +Inf, lambda = 1) rangepowerBackTransform(y, lbound = -Inf, ubound = +Inf, lambda = 1)
rangepowerTransform(x, lbound = -Inf, ubound = +Inf, lambda = 1) rangepowerBackTransform(y, lbound = -Inf, ubound = +Inf, lambda = 1)
x |
A numeric vector of data values. |
lbound |
A numerical value of variable lower bound. |
ubound |
A numerical value of variable upper bound. |
lambda |
A numerical value for the power transformation. |
y |
A numeric vector of transformed data values. |
The range-power transformation can be applied to variables with bounded support.
Lower bound case
Suppose is a univariate random variable with lower bounded support
, where
.
Consider a preliminary range transformation defined as
, which maps
.
The range-power transformation is a continuous monotonic transformation
defined as
with back-transformation function
Lower and upper bound case
Suppose is a univariate random variable with bounded support
, where
. Consider a preliminary range transformation defined as
, which maps
.
In this case, the range-power transformation is a continuous monotonic
transformation defined as
with back-transformation function
Returns a vector of transformed or back-transformed values.
Luca Scrucca
Scrucca L. (2019) A transformation-based approach to Gaussian mixture density estimation for bounded data. Biometrical Journal, 61:4, 873–888. doi:10.1002/bimj.201800174
# Lower bound case x = rchisq(1000, 5) y = rangepowerTransform(x, lbound = 0, lambda = 1/3) par(mfrow=c(2,2)) hist(x, main = NULL, breaks = 21); rug(x) hist(y, xlab = "y = t(x)", main = NULL, breaks = 21); rug(y) xx = rangepowerBackTransform(y, lbound = 0, lambda = 1/3) hist(xx, xlab = "t^-1(y) = x", main = NULL, breaks = 21); rug(xx) plot(x, xx, ylab = "t^-1(y)"); abline(0,1) # Lower and upper bound case x = rbeta(1000, 2, 1) y = rangepowerTransform(x, lbound = 0, ubound = 1, lambda = 0) par(mfrow=c(2,2)) hist(x, main = NULL, breaks = 21); rug(x) hist(y, xlab = "y = t(x)", main = NULL, breaks = 21); rug(y) xx = rangepowerBackTransform(y, lbound = 0, ubound = 1, lambda = 0) hist(xx, xlab = "t^-1(y) = x", main = NULL, breaks = 21); rug(xx) plot(x, xx, ylab = "t^-1(y)"); abline(0,1)
# Lower bound case x = rchisq(1000, 5) y = rangepowerTransform(x, lbound = 0, lambda = 1/3) par(mfrow=c(2,2)) hist(x, main = NULL, breaks = 21); rug(x) hist(y, xlab = "y = t(x)", main = NULL, breaks = 21); rug(y) xx = rangepowerBackTransform(y, lbound = 0, lambda = 1/3) hist(xx, xlab = "t^-1(y) = x", main = NULL, breaks = 21); rug(xx) plot(x, xx, ylab = "t^-1(y)"); abline(0,1) # Lower and upper bound case x = rbeta(1000, 2, 1) y = rangepowerTransform(x, lbound = 0, ubound = 1, lambda = 0) par(mfrow=c(2,2)) hist(x, main = NULL, breaks = 21); rug(x) hist(y, xlab = "y = t(x)", main = NULL, breaks = 21); rug(y) xx = rangepowerBackTransform(y, lbound = 0, ubound = 1, lambda = 0) hist(xx, xlab = "t^-1(y) = x", main = NULL, breaks = 21); rug(xx) plot(x, xx, ylab = "t^-1(y)"); abline(0,1)
Lengths of treatment spells (in days) of control patients in suicide study.
A vector of containing the lengths (days) of 86 spells of psychiatric treatment undergone by patients used as controls in a study of suicide risks.
Silverman, B. W. (1986) Density Estimation, Chapman & Hall, Tab 2.1.
Generic functions for computing Value-at-Risk (VaR) and Expected Shortfall (ES).
VaR(object, ...) ES(object, ...)
VaR(object, ...) ES(object, ...)
object |
An object of class specific for the method. |
... |
Further arguments passed to or from other methods. |
Value-at-Risk (VaR) and Expected Shortfall (ES) from the fit of
Gaussian mixtures provided by GMMlogreturn()
function.
## S3 method for class 'GMMlogreturn' VaR(object, alpha, ...) ## S3 method for class 'GMMlogreturn' ES(object, alpha, ...)
## S3 method for class 'GMMlogreturn' VaR(object, alpha, ...) ## S3 method for class 'GMMlogreturn' ES(object, alpha, ...)
object |
An object of class |
alpha |
A vector of values in the interval |
... |
Further arguments passed to or from other methods. |
VaR() is the maximum potential loss over a specified time
horizon with probability equal to the confidence level
.
ES() is the expected loss given that the loss exceeds the
VaR(
) level.
Returns a numerical value corresponding to VaR or ES at given level(s).
References:
Ruppert Matteson (2015) Statistics and Data Analysis for Financial Engineering, Springer, Chapter 19.
Cizek Hardle Weron (2011) Statistical Tools for Finance and Insurance, 2nd ed., Springer, Chapter 2.
z = sample(1:2, size = 250, replace = TRUE, prob = c(0.8, 0.2)) y = double(length(z)) y[z == 1] = rnorm(sum(z == 1), 0, 1) y[z == 2] = rnorm(sum(z == 2), -0.5, 2) GMM = GMMlogreturn(y) alpha = seq(0.01, 0.1, by = 0.001) matplot(alpha, data.frame(VaR = VaR(GMM, alpha), ES = ES(GMM, alpha)), type = "l", col = c(2,4), lty = 1, lwd = 2, xlab = expression(alpha), ylab = "Loss") legend("topright", col = c(2,4), lty = 1, lwd = 2, legend = c("VaR", "ES"), inset = 0.02)
z = sample(1:2, size = 250, replace = TRUE, prob = c(0.8, 0.2)) y = double(length(z)) y[z == 1] = rnorm(sum(z == 1), 0, 1) y[z == 2] = rnorm(sum(z == 2), -0.5, 2) GMM = GMMlogreturn(y) alpha = seq(0.01, 0.1, by = 0.001) matplot(alpha, data.frame(VaR = VaR(GMM, alpha), ES = ES(GMM, alpha)), type = "l", col = c(2,4), lty = 1, lwd = 2, xlab = expression(alpha), ylab = "Loss") legend("topright", col = c(2,4), lty = 1, lwd = 2, legend = c("VaR", "ES"), inset = 0.02)