Package 'leptokurticMixture' reference manual

Title:	Implements Parsimonious Finite Mixtures of Multivariate Elliptical Leptokurtic-Normals
Description:	A way to fit Parsimonious Finite Mixtures of Multivariate Elliptical Leptokurtic-Normals. Two methods of estimation are implemented.
Authors:	Ryan Browne [aut, cre] (0000-0003-4543-0218), Luca Bagnato [ctb], Antonio Punzo [ctb]
Maintainer:	Ryan Browne <[email protected]>
License:	GPL (>= 2)
Version:	1.1
Built:	2025-01-30 07:35:23 UTC
Source:	CRAN

Compare the two methods of estimation

Description

Compares the two estimation methods, fixed-point iterations and the MM algorithm, for fitting a finite mixture of multivariate elliptical leptokurtic-normal distributions.

Usage

compareEstimation(
  mod = NULL,
  data = NULL,
  G = NULL,
  n = 10^4,
  tol = 1e-06,
  wt = NULL,
  n0 = 25,
  lab = NULL
)

Arguments

`mod`	A character string of length 4, such as "VVVV", indicating the model: the covariance structure and the beta parameter.
`data`	An n x p matrix of observations.
`G`	The number of components to fit.
`n`	The maximum number of EM iterations.
`tol`	The tolerance for the lack-of-progress stopping rule. The default is 1e-6, but a suitable value depends on the dataset.
`wt`	An (n x d) matrix of weights for initialization. If NULL, a random weight matrix is generated.
`n0`	Given wt, the number of iterations used to obtain the initial parameters.
`lab`	Labels to use as starting values.

Value

A vector of times, number of iterations and log-likelihood values.

EM for the finite mixtures of MLN

Description

Runs the EM algorithm for a finite mixture of multivariate elliptical leptokurtic-normal (MLN) distributions until either the lack-of-progress tolerance or the maximum number of iterations is reached. The function implements parsimonious clustering models via the eigen-decomposition of the scatter matrix and allows the concentration parameter to be varying, equal or fixed across components.

Usage

EM(
  data = NULL,
  G = 2,
  model = NULL,
  kml = c(1, 0, 1),
  n = 10,
  epsilon = 0.01,
  gpar0 = NULL,
  estimation = 1,
  label = NULL
)

Arguments

`data`	An n x p matrix of observations.
`G`	An integer giving the number of components of the mixture model.
`model`	A character string of length 4, such as "VVVV", indicating the model: the covariance structure and the beta parameter. The 1st position controls lambda, the volume: "V" varying across components or "E" equal across components. The 2nd position controls the eigenvalues: "V" varying across components, "E" equal across components or "I" the identity matrix. The 3rd position controls the orientation: "V" varying across components, "E" equal across components or "I" the identity matrix. The 4th position controls the concentration, beta: "V" varying across components, "E" equal across components or "F" fixed at the maximum value.
`kml`	A vector of length 3 giving the number of k-means starts, the number of random starts and the number of EM iterations used for each start.
`n`	The maximum number of EM iterations.
`epsilon`	The tolerance for the lack-of-progress stopping rule. The default is 0.01, but a suitable value depends on the dataset.
`gpar0`	A list of model parameters.
`estimation`	If 1 (default), use the fixed-point iterations; if 2, use the MM algorithm.
`label`	If `NULL` then the data has no known groups. If `is.integer` then some of the observations have known groups. If `label[i]=k` then observation `i` belongs to group `k`. If `label[i]=0` then observation `i` has no known group.
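
The four positions of the `model` string can be checked before fitting. The following is a minimal base-R sketch; `decodeModel` is a hypothetical helper written for illustration and is not part of the package. It only encodes the letters listed for each position above.

```r
# Hypothetical helper (not part of leptokurticMixture): split a
# four-character model string into its documented positions and
# validate each letter against the codes listed in the Arguments.
decodeModel <- function(model) {
  stopifnot(is.character(model), length(model) == 1, nchar(model) == 4)
  parts <- strsplit(model, "")[[1]]
  allowed <- list(
    volume        = c("V", "E"),
    eigenvalues   = c("V", "E", "I"),
    orientation   = c("V", "E", "I"),
    concentration = c("V", "E", "F")
  )
  names(parts) <- names(allowed)
  for (nm in names(allowed)) {
    if (!parts[[nm]] %in% allowed[[nm]]) {
      stop(sprintf("'%s' is not a valid code for %s", parts[[nm]], nm))
    }
  }
  parts
}

decoded <- decodeModel("VVVF")
print(decoded)
```

Here "VVVF" reads as varying volume, varying eigenvalues, varying orientation and a concentration parameter fixed at its maximum value.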

Value

A list with the following items

loglik - A vector of the log-likelihood values
gpar - A list containing the parameter values
z - An n x G matrix of the posterior probabilities
map - A vector of the maximum a posteriori classifications derived from z
label - The input provided.
numpar - The number of free parameters in the fitted model.
maxLoglik - The largest value from loglik.
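
The relationship between `z` and `map` can be illustrated without fitting a model. The sketch below uses a small made-up posterior matrix and assumes that `map` is the row-wise index of the largest posterior probability, which is the usual meaning of a maximum a posteriori classification; the package's internal computation is not shown in this manual.

```r
# Toy posterior matrix: 4 observations (rows), G = 2 components (columns).
# The values are invented for illustration.
z <- matrix(c(0.90, 0.10,
              0.20, 0.80,
              0.60, 0.40,
              0.05, 0.95),
            ncol = 2, byrow = TRUE)

# Each row of a posterior matrix should sum to one.
stopifnot(all(abs(rowSums(z) - 1) < 1e-12))

# MAP classification: index of the largest posterior probability per row.
map <- apply(z, 1, which.max)
print(map)
```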

Examples

x1 = rmln(n=100, d=4, mu=rep(5,4), Sigma=diag(4), beta=2)
x2 = rmln(n=100, d=4, mu=rep(-5,4), Sigma=diag(4), beta=2)
x = rbind(x1, x2)
mlnFit = EM(data=x, G=2, model="VVVF")

Parsimonious model-based clustering with the multivariate elliptical leptokurtic-normal

Description

Performs parsimonious clustering with the multivariate elliptical leptokurtic-normal (MLN) distribution. There are 14 possible scale matrix structures and 2 structures for the kurtosis parameter, for a total of 28 models.
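
The count of 14 x 2 = 28 can be reproduced in base R. This sketch assumes the 14 scale structures are the usual eigen-decomposition family, in which an identity eigenvalue matrix ("I" in the 2nd position) makes the orientation irrelevant, so only "I" is kept in the 3rd position in that case. The names and ordering the package uses internally may differ.

```r
# All combinations of the three covariance positions.
codes <- expand.grid(
  volume      = c("E", "V"),
  eigenvalues = c("E", "V", "I"),
  orientation = c("E", "V", "I"),
  stringsAsFactors = FALSE
)

# Assumption: with identity eigenvalues the orientation plays no role,
# so "I" in the 2nd position is only paired with "I" in the 3rd.
keep <- codes$eigenvalues != "I" | codes$orientation == "I"
covModels <- with(codes[keep, ], paste0(volume, eigenvalues, orientation))

# Two structures for the kurtosis (beta) parameter: varying or equal.
betaCodes <- c("V", "E")
allModels <- as.vector(outer(covModels, betaCodes, paste0))

print(covModels)
print(length(allModels))
```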

Usage

pmln(
  data = NULL,
  G = 1:3,
  covModels = NULL,
  betaModels = "B",
  kml = c(1, 0, 1),
  label = NULL,
  scale.data = TRUE,
  veo = FALSE,
  iterMax = 1000,
  tol = 1e-08,
  pprogress = FALSE,
  method = "FP"
)

Arguments

`data`	An n x p matrix of observations.
`G`	An integer vector giving the numbers of mixture components to fit. The default is 1:3.
`covModels`	If NULL, all 14 possible scale matrix structures are fitted. Otherwise, a character vector in which each element has length 3, e.g. c("VVV", "EEE"). The 1st position controls lambda, the volume: "V" varying across components or "E" equal across components. The 2nd position controls the eigenvalues: "V" varying across components, "E" equal across components or "I" the identity matrix. The 3rd position controls the orientation: "V" varying across components, "E" equal across components or "I" the identity matrix.
`betaModels`	One of "V", "E", "B" or "F": "V" varying across components, "E" equal across components, "B" consider both "V" and "E", "F" fixed at the maximum value.
`kml`	A vector of length 3 giving the number of k-means starts, the number of random starts and the number of EM iterations used for each start.
`label`	If `NULL` then the data has no known groups. If `is.integer` then some of the observations have known groups. If `label[i]=k` then observation `i` belongs to group `k`. If `label[i]=0` then observation `i` has no known group.
`scale.data`	Should the data be scaled before clustering? The default is TRUE.
`veo`	"Variables exceed observations". If TRUE, fit the model even though the number of variables in the model exceeds the number of observations.
`iterMax`	The maximum number of EM iterations for each model fitted.
`tol`	The tolerance for the lack-of-progress stopping rule. The default is 1e-8, but a suitable value depends on the data set.
`pprogress`	If TRUE, print the progress of the function.
`method`	If "FP", use the fixed-point iteration method; if "MM", use the MM algorithm.
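
A `label` vector for partially labelled data can be built as follows. This is a base-R sketch with made-up group memberships; the names `truth` and `known` are illustrative only. It follows the convention described for `label`: an integer vector with 0 for observations of unknown group.

```r
n <- 200

# Hypothetical ground truth: first 100 observations in group 1,
# the remaining 100 in group 2.
truth <- rep(1:2, each = 100)

# Start with every observation unlabelled (0), stored as integer.
label <- integer(n)

# Reveal the group of observations 1-10 and 101-110 only.
known <- c(1:10, 101:110)
label[known] <- truth[known]

print(table(label))
```

A vector built this way could then be supplied through the `label` argument, for example alongside `G = 2` so that the number of components covers the labelled groups.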

Value

A list with the following items

startobject - A statement on how the models were initialized
gpar - A list of parameter values for the model chosen by the BIC
loglik - A vector of the log-likelihood values
z - An n x G matrix of the posterior probabilities from the model chosen by the BIC
map - A vector of the maximum a posteriori classifications derived from z
BIC - An array with dimensions (G, number of fitted models, 3). The last dimension indexes the loglik, the number of free parameters and the BIC for each fitted model.
bicModel - Information, as a list, on the model chosen by the BIC.
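
The layout of the `BIC` array can be sketched with made-up numbers. The block below builds a toy array with the documented dimensions and locates the best entry. It assumes the larger-is-better convention BIC = 2 * loglik - numpar * log(n); this manual does not state which sign convention the package uses. The model labels and the sample size are hypothetical.

```r
n <- 200  # hypothetical sample size

# Invented log-likelihoods and parameter counts: rows are G = 1, 2,
# columns are three fitted models.
loglik <- matrix(c(-900, -850, -880,
                   -700, -690, -695), nrow = 2, byrow = TRUE)
numpar <- matrix(c( 9, 15, 12,
                   19, 31, 25), nrow = 2, byrow = TRUE)

# Assumed convention (larger is better).
bic <- 2 * loglik - numpar * log(n)

# Array with dimensions (G, number of fitted models, 3).
bicArray <- array(
  c(loglik, numpar, bic),
  dim = c(2, 3, 3),
  dimnames = list(G     = c("1", "2"),
                  model = c("EEEE", "VVVE", "VVVV"),
                  stat  = c("loglik", "numpar", "BIC"))
)

# Row (G) and column (model) index of the largest BIC.
bicSlice <- bicArray[, , "BIC"]
best <- which(bicSlice == max(bicSlice), arr.ind = TRUE)
bestG <- rownames(bicSlice)[best[1, 1]]
bestModel <- colnames(bicSlice)[best[1, 2]]
print(c(G = bestG, model = bestModel))
```

If the package instead reports a smaller-is-better BIC, `max` would need to be replaced by `min`.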

Examples

x1 = rmln(n=100, d=4, mu=rep(5,4), Sigma=diag(4), beta=2)
x2 = rmln(n=100, d=4, mu=rep(-5,4), Sigma=diag(4), beta=2)
x = rbind(x1, x2)
mlnFit = pmln(data=x, G=2, covModels=c("VVV", "EEE"), betaModels="B")

Generate realizations from the multivariate elliptical leptokurtic-normal distribution

Description

Generates n realizations from the d-dimensional multivariate elliptical leptokurtic-normal (MLN) distribution with location parameter mu, scatter matrix Sigma and concentration parameter beta.

Usage

rmln(n = NULL, d = NULL, mu = NULL, Sigma = NULL, beta = NULL)

Arguments

`n`	number of observations
`d`	the dimension of the observations
`mu`	location parameter of length d
`Sigma`	(d x d) scatter matrix
`beta`	the concentration parameter

Value

A (n x d) matrix of realizations
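
The shape of the returned matrix can be checked directly. This sketch calls `rmln` with the arguments documented above and runs only if the package is installed. It assumes that `rmln` draws from R's random number generator, so that `set.seed` makes the draw reproducible, and that the column means fall near `mu`; neither point is stated in this manual.

```r
d <- 4
mu <- rep(5, d)

# Guard so the sketch also runs where the package is not installed.
has_pkg <- requireNamespace("leptokurticMixture", quietly = TRUE)

if (has_pkg) {
  set.seed(1)
  x <- leptokurticMixture::rmln(n = 500, d = d, mu = mu,
                                Sigma = diag(d), beta = 2)
  # One row per realization, one column per dimension.
  stopifnot(is.matrix(x), nrow(x) == 500, ncol(x) == d)
  # Assumption: the sample column means should be close to mu.
  print(round(colMeans(x), 2))
} else {
  message("leptokurticMixture is not installed; skipping the draw.")
}
```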

Examples

x = rmln(n=10, d=4, mu=rep(0,4), Sigma=diag(4), beta=2)
