Title: | Parsimonious Mixtures of MSEN and MTIN Distributions |
---|---|
Description: | Implements parsimonious mixtures of MSEN (multivariate shifted exponential normal) and MTIN (multivariate tail-inflated normal) distributions via expectation-maximization (EM) based algorithms for model-based clustering. For each mixture component, parsimony is achieved through the eigen-decomposition of the scale matrix and by imposing a constraint on the tailedness parameter. This produces a family of 28 parsimonious mixture models for each distribution. |
Authors: | Salvatore D. Tomarchio [aut, cre], Luca Bagnato [aut], Antonio Punzo [aut] |
Maintainer: | Salvatore D. Tomarchio |
License: | GPL (>= 3) |
Version: | 1.0.0 |
Built: | 2024-12-25 06:54:05 UTC |
Source: | CRAN |
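The figure of 28 models follows from crossing the 14 scale-matrix structures accepted by the cov.model argument of Mixt.fit() with the two tailedness constraints accepted by its theta.model argument ("E" and "V", which under the usual naming convention denote a tailedness parameter that is equal across components or allowed to vary). The base-R sketch below only enumerates that family; the hyphenated labels such as "EEE-E" are an illustrative naming choice and may not match how the package labels its models.

```r
# The 14 eigen-decomposition structures for the scale matrices (cov.model)
cov.models <- c("EII", "VII", "EEI", "VEI", "EVI", "VVI", "EEE",
                "VEE", "EVE", "EEV", "VVE", "VEV", "EVV", "VVV")
# The 2 constraints on the tailedness parameters (theta.model)
theta.models <- c("E", "V")

# Every combination is one parsimonious model for a given density
family <- expand.grid(cov.model = cov.models, theta.model = theta.models,
                      stringsAsFactors = FALSE)
# Illustrative labels only (hypothetical naming, not taken from the package)
family$label <- paste(family$cov.model, family$theta.model, sep = "-")
nrow(family)  # 28
```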
A dataset containing biometrical measurements for two categories of athletes collected at the Australian Institute of Sport.
data(AIS)
A matrix with 202 observations on the following variables:
0 = Male or 1 = Female.
Height (in cm).
Lean body mass (in kg).
Red cell count.
Hematocrit.
Hemoglobin.
Sum of skin folds.
Body fat percentage.
This dataset is a subset of the ais dataset contained in the alr4 R package.
Weisberg S. (2018). alr4: Data to Accompany Applied Linear Regression 4th Edition. https://CRAN.R-project.org/package=alr4.
Density of a MSEN distribution
dmsen(x, mu = rep(0, d), Sigma, theta = Inf, formula = "direct")
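Argument | Description |
---|---|
x | A data matrix with n rows and d columns, where n is the number of data points and d is the dimension. |
mu | A vector of length d representing the mean vector. |
Sigma | A symmetric positive-definite matrix representing the scale matrix of the distribution. |
theta | A number greater than 0 indicating the tailedness parameter. |
formula | Method used to calculate the density: one of "direct", "indirect" or "series". |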
The value(s) of the density at x.
Punzo A., and Bagnato L. (2020). Allometric analysis using the multivariate shifted exponential normal distribution. Biometrical Journal, 62(6), 1525-1543.
d <- 3                           # dimension
x <- matrix(rnorm(d * 2), 2, d)  # two d-dimensional points, one per row
dmsen(x, mu = rep(0, d), Sigma = diag(d), theta = 0.4, formula = "direct")
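To make the role of theta concrete, here is a hedged base-R sketch of the MSEN density. It assumes the normal scale-mixture representation described by Punzo and Bagnato (2020): X given W = w is N(mu, Sigma / w), and W - 1 is exponential with rate theta. Integrating w out gives a closed form in the upper incomplete gamma function, f(x) = theta e^theta a^(-s) Gamma(s, a) / ((2 pi)^(d/2) |Sigma|^(1/2)) with s = d/2 + 1 and a = delta(x)/2 + theta, where delta(x) is the squared Mahalanobis distance. The helpers dmsen_sketch() and dmsen_numeric() are hypothetical; this is not the package's dmsen() code and does not reproduce its "indirect" or "series" formulas.

```r
# Hypothetical helper, not the package's dmsen(). Assumes
#   X | W = w ~ N(mu, Sigma / w),  W - 1 ~ Exponential(rate = theta).
dmsen_sketch <- function(x, mu, Sigma, theta) {
  x <- matrix(x, ncol = length(mu))  # one point per row
  d <- length(mu)
  s <- d / 2 + 1
  a <- mahalanobis(x, center = mu, cov = Sigma) / 2 + theta
  # Work on the log scale; the upper-tail log pgamma is log(Gamma(s, a) / Gamma(s))
  log_dens <- log(theta) + theta - s * log(a) + lgamma(s) +
    pgamma(a, shape = s, lower.tail = FALSE, log.p = TRUE) -
    (d / 2) * log(2 * pi) -
    0.5 * as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  exp(log_dens)
}

# Brute-force check for a single point: integrate the normal density over w
dmsen_numeric <- function(x1, mu, Sigma, theta) {
  d <- length(mu)
  delta <- mahalanobis(x1, center = mu, cov = Sigma)
  integrand <- function(w) {
    w^(d / 2) * exp(-w * delta / 2) * theta * exp(-theta * (w - 1))
  }
  integrate(integrand, lower = 1, upper = Inf, rel.tol = 1e-8)$value /
    ((2 * pi)^(d / 2) * sqrt(det(Sigma)))
}

x1 <- c(0.3, -0.5, 1)
dmsen_sketch(x1, mu = rep(0, 3), Sigma = diag(3), theta = 0.4)
dmsen_numeric(x1, mu = rep(0, 3), Sigma = diag(3), theta = 0.4)
```

In this representation a very large theta concentrates W at 1, so the sketch approaches the N(mu, Sigma) density. That is consistent with the default theta = Inf in the usage line, although this reading of the default is an inference rather than something the documentation states.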
Density of a MTIN distribution
dmtin(x, mu = rep(0, d), Sigma, theta = 0.01, formula = "direct")
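Argument | Description |
---|---|
x | A data matrix with n rows and d columns, where n is the number of data points and d is the dimension. |
mu | A vector of length d representing the mean vector. |
Sigma | A symmetric positive-definite matrix representing the scale matrix of the distribution. |
theta | A number between 0 and 1 indicating the tailedness parameter. |
formula | Method used to calculate the density: one of "direct", "indirect" or "series". |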
The value(s) of the density at x.
Punzo A., and Bagnato L. (2021). The multivariate tail-inflated normal distribution and its application in finance. Journal of Statistical Computation and Simulation, 91(1), 1-36.
d <- 3                           # dimension
x <- matrix(rnorm(d * 2), 2, d)  # two d-dimensional points, one per row
dmtin(x, mu = rep(0, d), Sigma = diag(d), theta = 0.9, formula = "direct")
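A similar hedged sketch for the MTIN density. It assumes the representation in Punzo and Bagnato (2021): X given W = w is N(mu, Sigma / w) with W uniform on (1 - theta, 1) and 0 < theta < 1. The integral over w then reduces to a difference of lower incomplete gamma functions, evaluated here with pgamma(). The helpers dmtin_sketch() and dmtin_numeric() are hypothetical and are not the package's dmtin() code.

```r
# Hypothetical helper, not the package's dmtin(). Assumes
#   X | W = w ~ N(mu, Sigma / w),  W ~ Uniform(1 - theta, 1),  0 < theta < 1.
dmtin_sketch <- function(x, mu, Sigma, theta) {
  stopifnot(theta > 0, theta < 1)
  x <- matrix(x, ncol = length(mu))  # one point per row
  d <- length(mu)
  s <- d / 2 + 1
  b <- mahalanobis(x, center = mu, cov = Sigma) / 2
  # Integral of w^(d/2) * exp(-b * w) over (1 - theta, 1);
  # the b == 0 case (x equal to mu) is handled separately
  integral <- ifelse(
    b > 0,
    gamma(s) * (pgamma(b, shape = s) - pgamma((1 - theta) * b, shape = s)) / b^s,
    (1 - (1 - theta)^s) / s
  )
  integral / (theta * (2 * pi)^(d / 2) * sqrt(det(Sigma)))
}

# Brute-force check for a single point: average the normal density over w
dmtin_numeric <- function(x1, mu, Sigma, theta) {
  d <- length(mu)
  b <- mahalanobis(x1, center = mu, cov = Sigma) / 2
  integrand <- function(w) w^(d / 2) * exp(-w * b) / theta
  integrate(integrand, lower = 1 - theta, upper = 1, rel.tol = 1e-8)$value /
    ((2 * pi)^(d / 2) * sqrt(det(Sigma)))
}

x1 <- c(0.3, -0.5, 1)
dmtin_sketch(x1, mu = rep(0, 3), Sigma = diag(3), theta = 0.9)
dmtin_numeric(x1, mu = rep(0, 3), Sigma = diag(3), theta = 0.9)
```

Under this assumption, theta close to 0 makes W nearly constant at 1 and the density close to N(mu, Sigma), which fits the small default theta = 0.01.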
A dataset containing size-related measurements for two different hawk species. Each species is further categorized by sex.
data(Hawks)
A matrix with 323 observations on the following variables:
1 = male CH (Cooper's hawk), 2 = male SS (sharp-shinned hawk), 3 = female CH or 4 = female SS.
Length (in mm) of the primary wing feather, from its tip to the wrist it attaches to.
Body weight (in g).
Measurement (in mm) related to the length of the tail.
This dataset is a subset of the Hawks dataset contained in the Stat2Data R package.
Cannon et al. (2019). Stat2Data: Datasets for Stat2. https://CRAN.R-project.org/package=Stat2Data.
Fits parsimonious mixtures of MSEN or MTIN distributions to the given data by using EM-based algorithms. Parallel computing is implemented and recommended for faster model fitting. The Bayesian information criterion (BIC) and the integrated completed likelihood (ICL) are used to select the best-fitting model under each criterion.
Mixt.fit( X, k = 1:3, init.par = NULL, cov.model = "all", theta.model = "all", density, ncores = 1, verbose = FALSE, ret.all = FALSE )
Argument | Description |
---|---|
X | A data matrix with n rows and d columns, where n is the number of data points and d is the dimension. |
k | An integer or a vector indicating the number(s) of groups of the models to be estimated. |
init.par | The initial values for starting the algorithms, as produced by the Mixt.fit.init() function. |
cov.model | A character vector indicating the parsimonious structure of the scale matrices. Possible values are "EII", "VII", "EEI", "VEI", "EVI", "VVI", "EEE", "VEE", "EVE", "EEV", "VVE", "VEV", "EVV", "VVV" or "all". When "all" is used, all 14 parsimonious structures are considered. |
theta.model | A character vector indicating the parsimonious structure of the tailedness parameters. Possible values are "E", "V" or "all". When "all" is used, both parsimonious structures are considered. |
density | A character string indicating the density of the mixture components. Possible values are "MSEN" or "MTIN". |
ncores | A positive integer indicating the number of cores used for running in parallel. |
verbose | A logical indicating whether the running output should be displayed. |
ret.all | A logical indicating whether to report the results of all the fitted models or only those of the best models according to BIC and ICL. |
A list with the following elements:
Element | Description |
---|---|
all.models | The results for all the fitted models (only when ret.all = TRUE). |
BicWin | The best-fitting model according to the BIC. |
IclWin | The best-fitting model according to the ICL. |
Summary | A quick table summarizing the results for the best-fitting models according to BIC and ICL. |
set.seed(1234)
# Simulate a two-group bivariate MSEN mixture with an "EEE" scale structure
n <- 50
k <- 2
Pi <- c(0.5, 0.5)
mu <- matrix(c(0, 0, 4, 5), 2, 2)
cov.model <- "EEE"
lambda <- c(0.5, 0.5)
delta <- c(0.7, 0.7)
gamma <- c(2.62, 2.62)
theta <- c(0.1, 0.1)
density <- "MSEN"
data <- rMixt(n, k, Pi, mu, cov.model, lambda, delta, gamma, theta, density)
X <- data$X
# Initialize with a single random start, then fit the model
nstartR <- 1
init.par <- Mixt.fit.init(X, k, density, nstartR)
theta.model <- "E"
res <- Mixt.fit(X, k, init.par, cov.model, theta.model, density)
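Mixt.fit() ranks the fitted models by BIC and ICL. The base-R sketch below shows one common formulation of the two criteria, written so that larger values are better: BIC = 2 loglik - m log(n), where m is the number of free parameters, and ICL adds to BIC twice the sum of the log posterior probabilities of each observation's maximum a posteriori group. Whether the package uses exactly this sign convention and this form of ICL is an assumption; bic_sketch() and icl_sketch() are hypothetical helpers and the numbers are purely illustrative.

```r
# Hypothetical helpers; the "larger is better" sign convention is assumed.
bic_sketch <- function(loglik, npar, n) {
  2 * loglik - npar * log(n)
}

# z: n x k matrix of posterior probabilities of group membership
icl_sketch <- function(loglik, npar, z) {
  n <- nrow(z)
  map <- max.col(z, ties.method = "first")  # MAP group of each observation
  z_map <- z[cbind(seq_len(n), map)]        # posterior probability of that group
  bic_sketch(loglik, npar, n) + 2 * sum(log(z_map))
}

# Illustrative numbers only (not output from Mixt.fit())
z <- rbind(c(0.9, 0.1), c(0.2, 0.8), c(1, 0))
bic_sketch(-100, npar = 5, n = 3)
icl_sketch(-100, npar = 5, z = z)
```

Because each log posterior probability is at most 0, ICL never exceeds BIC under this formulation, and the gap widens as the groups overlap; this is why ICL tends to favour well-separated clusters.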
Runs the initialization of the EM-based algorithms used for fitting parsimonious mixtures of MSEN or MTIN distributions. Parallel computing is implemented and recommended for faster computation.
Mixt.fit.init(X, k = 1:3, density, nstartR = 100, ncores = 1, verbose = FALSE)
Argument | Description |
---|---|
X | A data matrix with n rows and d columns, where n is the number of data points and d is the dimension. |
k | An integer or a vector indicating the number(s) of groups of the models. |
density | A character string indicating the density of the mixture components. Possible values are "MSEN" or "MTIN". |
nstartR | An integer specifying the number of random starts to be considered. |
ncores | A positive integer indicating the number of cores used for running in parallel. |
verbose | A logical indicating whether the running output should be displayed. |
Value | Description |
---|---|
init | A list of objects to be used by the Mixt.fit() function, where it is supplied as the init.par argument. |
set.seed(1234)
# Simulate a two-group bivariate MSEN mixture with an "EEE" scale structure
n <- 50
k <- 2
Pi <- c(0.5, 0.5)
mu <- matrix(c(0, 0, 4, 5), 2, 2)
cov.model <- "EEE"
lambda <- c(0.5, 0.5)
delta <- c(0.7, 0.7)
gamma <- c(2.62, 2.62)
theta <- c(0.1, 0.1)
density <- "MSEN"
data <- rMixt(n, k, Pi, mu, cov.model, lambda, delta, gamma, theta, density)
X <- data$X
# Initialize with a single random start
nstartR <- 1
init.par <- Mixt.fit.init(X, k, density, nstartR)
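Mixt.fit.init() draws nstartR random starts. The documentation does not say how a start is generated, so the base-R sketch below shows just one common option, stated as an assumption: each start is a random hard partition of the rows of X into k non-empty groups, from which starting mixing weights and means are computed. An EM run (not shown) would then be launched from each start, typically keeping the one with the highest log-likelihood. The function random_start_sketch() is hypothetical and is not the package's Mixt.fit.init().

```r
# Hypothetical sketch; not the package's Mixt.fit.init().
random_start_sketch <- function(X, k, nstartR) {
  n <- nrow(X)
  stopifnot(n >= k)
  lapply(seq_len(nstartR), function(r) {
    # Including seq_len(k) guarantees no empty group; sample() shuffles the labels
    labels <- sample(c(seq_len(k), sample.int(k, n - k, replace = TRUE)))
    z <- diag(k)[labels, , drop = FALSE]  # n x k indicator matrix
    list(
      z  = z,
      Pi = colMeans(z),                   # starting mixing weights
      mu = crossprod(z, X) / colSums(z)   # k x d matrix of starting means
    )
  })
}

set.seed(1234)
X <- matrix(rnorm(100), 50, 2)
starts <- random_start_sketch(X, k = 2, nstartR = 3)
starts[[1]]$Pi
```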
Random number generation for bidimensional parsimonious mixtures of MSEN or MTIN distributions
rMixt(n, k, Pi, mu, cov.model, lambda, delta, gamma, theta, density)
Argument | Description |
---|---|
n | An integer specifying the number of data points to be simulated. |
k | An integer indicating the number of groups in the data. |
Pi | A vector of length k containing the mixing weights of the groups. |
mu | A matrix of means with 2 rows and k columns. |
cov.model | A character string indicating the parsimonious structure of the scale matrices. Possible values are "EII", "VII", "EEI", "VEI", "EVI", "VVI", "EEE", "VEE", "EVE", "EEV", "VVE", "VEV", "EVV" or "VVV". |
lambda | A numeric vector of length k, one value per group. |
delta | A numeric vector of length k, one value per group. |
gamma | A numeric vector of length k, one value per group. |
theta | A vector of length k containing the tailedness parameters of the groups. |
density | A character string indicating the density of the mixture components. Possible values are "MSEN" or "MTIN". |
A list with the following elements:
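Element | Description |
---|---|
X | A data matrix with n rows and 2 columns. |
Sigma | An array of dimension 2 x 2 x k containing the scale matrices of the groups. |
Size | The size of each generated group. |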
Punzo A., Browne R. and McNicholas P.D. (2016). Hypothesis Testing for Mixture Model Selection. Journal of Statistical Computation and Simulation, 86(14), 2797-2818.
n <- 50
k <- 2
Pi <- c(0.5, 0.5)
mu <- matrix(c(0, 0, 4, 5), 2, 2)
cov.model <- "EEE"
lambda <- c(0.5, 0.5)
delta <- c(0.7, 0.7)
gamma <- c(2.62, 2.62)
theta <- c(0.1, 0.1)
density <- "MSEN"
data <- rMixt(n, k, Pi, mu, cov.model, lambda, delta, gamma, theta, density)
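rMixt() takes lambda, delta and gamma rather than full scale matrices. The base-R sketch below shows how such values can be turned into a 2 x 2 scale matrix through the eigen-decomposition Sigma = lambda * Gamma %*% Delta %*% t(Gamma) mentioned in the package description. Two assumptions are not confirmed by this documentation: gamma is a rotation angle in radians defining the orientation matrix Gamma, and Delta = diag(delta, 1 / delta) so that det(Delta) = 1. The helper scale_matrix_sketch() is hypothetical; the parameter values are those of the rMixt() example above.

```r
# Hypothetical helper. Assumes gamma is an angle in radians and
# Delta = diag(delta, 1 / delta); neither is stated in the documentation.
scale_matrix_sketch <- function(lambda, delta, gamma) {
  Gamma <- matrix(c(cos(gamma), sin(gamma), -sin(gamma), cos(gamma)), 2, 2)  # orientation
  Delta <- diag(c(delta, 1 / delta))                                          # shape, det = 1
  lambda * Gamma %*% Delta %*% t(Gamma)                                       # lambda scales the volume
}

# Values from the rMixt() example; under the usual reading of the three-letter
# codes, "EEE" means all groups share lambda, delta and gamma.
lambda <- c(0.5, 0.5)
delta <- c(0.7, 0.7)
gamma <- c(2.62, 2.62)
k <- 2
Sigma <- array(NA_real_, dim = c(2, 2, k))
for (g in seq_len(k)) Sigma[, , g] <- scale_matrix_sketch(lambda[g], delta[g], gamma[g])
Sigma
```

With d = 2 the determinant of each scale matrix is lambda^2, so lambda alone controls the volume while delta and gamma only change shape and orientation.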
Random number generation for the MSEN distribution
rmsen(n, mu = rep(0, d), Sigma, theta = Inf)
Argument | Description |
---|---|
n | An integer specifying the number of data points to be simulated. |
mu | A vector of length d representing the mean vector. |
Sigma | A symmetric positive-definite matrix representing the scale matrix of the distribution. |
theta | A number greater than 0 indicating the tailedness parameter. |
A list with the following elements:
Element | Description |
---|---|
X | A data matrix with n rows and d columns. |
w | A vector of weights of length n. |
Punzo A., and Bagnato L. (2020). Allometric analysis using the multivariate shifted exponential normal distribution. Biometrical Journal, 62(6), 1525-1543.
d <- 3
rmsen(10, mu = rep(0, d), Sigma = diag(d), theta = 0.3)
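The returned weights w suggest that the generator works through the normal scale-mixture representation. A minimal base-R sketch under that assumption (Punzo and Bagnato, 2020): draw W = 1 + Exponential(theta), draw Z from N(0, Sigma) via a Cholesky factor, and set X = mu + Z / sqrt(W). The function rmsen_sketch() is hypothetical and its output mirrors, but is not guaranteed to match, the X and w elements of rmsen().

```r
# Hypothetical sampler, not the package's rmsen(). Assumes
#   X = mu + Z / sqrt(W),  Z ~ N(0, Sigma),  W = 1 + Exponential(rate = theta).
rmsen_sketch <- function(n, mu, Sigma, theta) {
  d <- length(mu)
  w <- 1 + rexp(n, rate = theta)                   # mixing weights, all > 1
  Z <- matrix(rnorm(n * d), n, d) %*% chol(Sigma)  # rows ~ N(0, Sigma)
  X <- sweep(Z / sqrt(w), 2, mu, "+")              # row i scaled by 1 / sqrt(w[i])
  list(X = X, w = w)
}

set.seed(1)
out <- rmsen_sketch(1000, mu = rep(0, 3), Sigma = diag(3), theta = 0.3)
mean(out$w)  # should be near 1 + 1 / 0.3 for large n
```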
Random number generation for the MTIN distribution
rmtin(n, mu = rep(0, d), Sigma, theta = 0.01)
Argument | Description |
---|---|
n | An integer specifying the number of data points to be simulated. |
mu | A vector of length d representing the mean vector. |
Sigma | A symmetric positive-definite matrix representing the scale matrix of the distribution. |
theta | A number between 0 and 1 indicating the tailedness parameter. |
A list with the following elements:
Element | Description |
---|---|
X | A data matrix with n rows and d columns. |
w | A vector of weights of length n. |
Punzo A., and Bagnato L. (2021). The multivariate tail-inflated normal distribution and its application in finance. Journal of Statistical Computation and Simulation, 91(1), 1-36.
d <- 3
rmtin(10, mu = rep(0, d), Sigma = diag(d), theta = 0.9)
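The MTIN generator can be sketched the same way, assuming the representation in Punzo and Bagnato (2021) with W uniform on (1 - theta, 1). The function rmtin_sketch() is hypothetical and only mimics the X and w elements returned by rmtin().

```r
# Hypothetical sampler, not the package's rmtin(). Assumes
#   X = mu + Z / sqrt(W),  Z ~ N(0, Sigma),  W ~ Uniform(1 - theta, 1).
rmtin_sketch <- function(n, mu, Sigma, theta) {
  stopifnot(theta > 0, theta < 1)
  d <- length(mu)
  w <- runif(n, min = 1 - theta, max = 1)          # mixing weights in (1 - theta, 1)
  Z <- matrix(rnorm(n * d), n, d) %*% chol(Sigma)  # rows ~ N(0, Sigma)
  X <- sweep(Z / sqrt(w), 2, mu, "+")              # row i scaled by 1 / sqrt(w[i])
  list(X = X, w = w)
}

set.seed(1)
out <- rmtin_sketch(1000, mu = c(1, 2, 3), Sigma = diag(3), theta = 0.9)
range(out$w)  # within (0.1, 1)
```

Because W is bounded in (1 - theta, 1) under this assumption, each observation's scale is inflated by at most a factor 1 / sqrt(1 - theta) relative to N(mu, Sigma), and theta close to 0 brings the sampler back to the normal case.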