Title: | Clustering with Matrix Gaussian and Matrix Transformation Mixture Models |
---|---|
Description: | Provides matrix Gaussian mixture models, matrix transformation mixture models and their model-based clustering results. The parsimonious models of the mean matrices and variance covariance matrices are implemented with a total of 196 variations. For more information, please check: Xuwen Zhu, Shuchismita Sarkar, and Volodymyr Melnykov (2021), "MatTransMix: an R package for matrix model-based clustering and parsimonious mixture modeling", <doi:10.1007/s00357-021-09401-9>. |
Authors: | Xuwen Zhu [aut, cre], Volodymyr Melnykov [aut], Shuchismita Sarkar [ctb], Michael Hutt [ctb, cph], Stephen Moshier [ctb, cph], Rouben Rostamian [ctb, cph], Carl Edward Rasmussen [ctb, cph], Dianne Cook [ctb, cph] |
Maintainer: | Xuwen Zhu <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1.18 |
Built: | 2025-01-29 20:31:19 UTC |
Source: | CRAN |
This package clusters random matrices. It employs finite mixture modeling and model-based clustering based on matrix Gaussian mixtures and matrix transformation mixtures.
Package: | MatTransMix |
Type: | Package |
Version: | 0.1.1 |
Date: | 2017-02-09 |
License: | GPL (>= 2) |
LazyLoad: | no |
Function 'MatTrans.init' runs the initialization for the EM algorithm.
Function 'MatTrans.EM' runs the EM algorithm for matrix-variate mixtures to cluster matrices.
Xuwen Zhu and Volodymyr Melnykov.
Maintainer: Xuwen Zhu <[email protected]>
Data collected by the FBI's Uniform Crime Reports on violent and property crimes in 236 cities.
data(crime)
A list of 3 objects: Y, department and state. Y is the crime data array for 236 cities, department holds the police department names, and state gives the state in which each city is located. Y has dimensionality 10 x 13 x 236: for each of the 236 cities, it records the following 10 variables for the years 2000 through 2012.
Population of each city;
Total number of violent crimes;
Number of murders;
Number of rape crimes;
Number of robberies;
Number of assaults;
Total number of property crimes;
Number of burglary crimes;
Number of theft crimes;
Number of vehicle theft crimes.
The data have been made publicly available by the FBI's Uniform Crime Reports.
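The p x T x n layout described above can be explored without loading the package. The following is a minimal sketch, assuming a placeholder array of random numbers with the same 10 x 13 x 236 shape as Y (it is not the real crime data, and the names Y.demo, Y.sub, T.len and first.city are illustrative only). It keeps two of the ten variables for all years and all cities.

```r
# Placeholder array with the same shape as crime$Y:
# 10 variables x 13 years x 236 cities. Values are random stand-ins.
set.seed(1)
Y.demo <- array(runif(10 * 13 * 236), dim = c(10, 13, 236))

# Keep variable 2 (total violent crimes) and variable 7 (total property
# crimes) for all years and all cities.
Y.sub <- Y.demo[c(2, 7), , ]

p     <- dim(Y.sub)[1]  # number of row variables
T.len <- dim(Y.sub)[2]  # number of time points
n     <- dim(Y.sub)[3]  # number of matrices (cities)

# Each observation is a single p x T matrix, e.g. the first city:
first.city <- Y.sub[, , 1]
```

Indexing the third dimension returns one matrix per city, which is the unit that the clustering functions treat as an observation.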
data(crime)
Data collected from IMDb.com on the ratings of 105 popular comedy movies.
data(IMDb)
A list of 2 objects: Y and name, where Y is the data array of ratings and name holds the comedy movie names. Y has dimensionality 2 x 4 x 105, with ratings of 105 movies from females and males in the age groups 0-18, 18-29, 30-44 and 45+.
The data are publicly available on www.IMDb.com.
data(IMDb)
Runs the EM algorithm for matrix clustering
MatTrans.EM(Y, initial = NULL, la = NULL, nu = NULL, model = NULL, trans = "None", la.type = 0, row.skew = TRUE, col.skew = TRUE, tol = 1e-05, short.iter = NULL, long.iter = 1000, all.models = TRUE, size.control = 0, silent = TRUE)
Y |
dataset of random matrices (p x T x n), n random matrices of dimensionality (p x T) |
initial |
initialization parameters provided by function MatTrans.init() |
la |
initial skewness for rows (K x p) |
nu |
initial skewness for columns (K x T) |
model |
parsimonious model type, if null, then all 210 models are run |
trans |
transformation method: None (Gaussian models), Power, Manly |
la.type |
lambda type 0 or 1, 0: unrestricted, 1: same lambda across all variables |
row.skew |
whether skewness for rows is fitted: TRUE or FALSE |
col.skew |
whether skewness for columns is fitted: TRUE or FALSE |
tol |
tolerance level |
short.iter |
number of short EM iterations; if not specified, just run long EM |
long.iter |
number of long EM iterations |
all.models |
if true, run long EM for all models; otherwise just the best model returned by short EM in terms of BIC |
size.control |
minimum size of clusters allowed for controlling spurious solutions |
silent |
whether to produce output of steps or not |
Runs the EM algorithm for modeling and clustering matrices for a provided dataset. Matrix Gaussian mixtures, matrix Power transformation mixtures and matrix Manly transformation mixtures can all be employed. The user should use the MatTrans.init() function to obtain initial parameters and pass them as 'initial'. When transformation parameters are not provided but 'trans' is specified as 'Power' or 'Manly', 'la' and 'nu' take the value 0.5.
'model' can be specified as 'X-XXX-XX'. The first character 'X' stands for the mean structure. It is either 'G' (general mean) or 'A' (additive mean). The second part 'XXX' specifies the variance-covariance matrix Sigma. There are 14 options, EII, VII, EEI, VEI, EVI, VVI, EEE, EVE, VEE, VVE, EEV, VEV, EVV and VVV, explained as follows:
"EII": spherical, equal volume
"VII": spherical, unequal volume
"EEI": diagonal, equal volume and shape
"VEI": diagonal, varying volume, equal shape
"EVI": diagonal, equal volume, varying shape
"VVI": diagonal, varying volume and shape
"EEE": ellipsoidal, equal volume, shape, and orientation
"EVE": ellipsoidal, equal volume and orientation (*)
"VEE": ellipsoidal, equal shape and orientation (*)
"VVE": ellipsoidal, equal orientation (*)
"EEV": ellipsoidal, equal volume and equal shape
"VEV": ellipsoidal, equal shape
"EVV": ellipsoidal, equal volume (*)
"VVV": ellipsoidal, varying volume, shape, and orientation
The last two characters 'XX' specify the variance-covariance matrix Psi. There are 8 options: II, EI, VI, EE, VE, EV, VV and AR.
The user can specify 'model' as, for example, 'X-VVV-EV'. In that case both the 'G' and 'A' mean structures are fitted while Sigma and Psi are fixed at 'VVV' and 'EV', respectively. Similarly, 'model' can be specified as 'G-XXX-EV' or 'G-VVV-XX' to select among the Sigma or Psi structures.
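The naming convention described above can be sketched in base R. The helper expand.model() below is hypothetical and is not part of the package. It only enumerates the labels listed in the text, treats a part made up entirely of 'X' characters as a wildcard, and makes no claim about which combinations MatTrans.EM actually fits.

```r
# Labels as listed in the description above.
mean.types  <- c("G", "A")
sigma.types <- c("EII", "VII", "EEI", "VEI", "EVI", "VVI", "EEE",
                 "EVE", "VEE", "VVE", "EEV", "VEV", "EVV", "VVV")
psi.types   <- c("II", "EI", "VI", "EE", "VE", "EV", "VV", "AR")

# Hypothetical helper: expand a pattern such as "X-VVV-EV" into the
# concrete model names it stands for.
expand.model <- function(pattern) {
  parts <- strsplit(pattern, "-", fixed = TRUE)[[1]]
  stopifnot(length(parts) == 3)
  pick <- function(part, options) {
    if (grepl("^X+$", part)) return(options)  # wildcard: all options
    if (!part %in% options) stop("unknown label: ", part)
    part
  }
  # expand.grid varies its first argument fastest, so the mean structure
  # is listed last to keep names grouped by mean, then Sigma, then Psi.
  grid <- expand.grid(psi   = pick(parts[3], psi.types),
                      sigma = pick(parts[2], sigma.types),
                      mean  = pick(parts[1], mean.types),
                      stringsAsFactors = FALSE)
  paste(grid$mean, grid$sigma, grid$psi, sep = "-")
}

expand.model("X-VVV-EV")  # both mean structures, Sigma and Psi fixed
expand.model("G-VVV-XX")  # general mean, Sigma fixed, every listed Psi label
```

A fully specified name such as "G-VVV-VV" is returned unchanged, so the same helper can be used to validate a single label.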
No return value, called for side effects
Runs the initialization for the EM algorithm for matrix clustering
MatTrans.init(Y, K, n.start = 10, scale = 1)
Y |
dataset of random matrices (p x T x n), n random matrices of dimensionality (p x T) |
K |
number of clusters |
n.start |
initial random starts |
scale |
scaling parameter |
Random starts are used to obtain different starting values. The number of clusters, the skewness parameters, and number of random starts need to be specified. In the case when transformation parameters are not provided, the function runs the EM algorithm without any transformations, i.e., it is equivalent to the EM algorithm for a matrix Gaussian mixture. Notation: n - sample size, p x T - dimensionality of the random matrices, K - number of mixture components.
scale |
scale parameter set by the user |
result |
parsimonious models |
model |
model types |
loglik |
log likelihood values |
bic |
BIC values |
best.result |
best parsimonious model |
best.model |
best model type |
best.loglik |
best log likelihood |
best.bic |
best BIC |
trans |
transformation type |
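How a best model can be picked from a set of log likelihood values is sketched below. This is an illustration only: the numbers are invented toy values rather than package output, the names n.par and best.model.demo are hypothetical, and the sketch assumes the common convention BIC = -2 logL + m log(n), where a smaller value is better. The package may compute or sign its criterion differently.

```r
# Toy inputs, invented for illustration (not produced by the package).
models <- c("G-EII-II", "G-VVV-VV", "A-VVV-EV")
loglik <- c(-1520.4, -1388.9, -1401.2)  # maximised log likelihoods
n.par  <- c(30, 75, 52)                 # assumed numbers of free parameters
n      <- 236                           # sample size

# BIC under the assumed "-2 logL + m log(n)" convention: smaller is better.
bic <- -2 * loglik + n.par * log(n)
names(bic) <- models

# The selected model is the one with the smallest BIC.
best.model.demo <- names(bic)[which.min(bic)]
```

The penalty term n.par * log(n) is what allows a parsimonious structure to win over a more flexible one that has a slightly higher log likelihood.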
set.seed(123)
data(crime)
Y <- crime$Y[c(2,7),,] / 1000
p <- dim(Y)[1]
T <- dim(Y)[2]
n <- dim(Y)[3]
K <- 2
init <- MatTrans.init(Y, K = K, n.start = 2)
Provides a mean coordinate plot for the best fitted model returned by MatTrans.EM.
MatTrans.plot(X, model = NULL, xlab = "", ylab = "", rownames = NULL, colnames = NULL, lwd.obs = 0.8, lwd.mean = 2, line.cols = NULL, ...)
X |
dataset of random matrices (p x T x n), n random matrices of dimensionality (p x T) |
model |
fitted MatTrans mixture model returned by function MatTrans.EM() |
xlab |
label on the X-axis |
ylab |
label on the Y-axis |
rownames |
input row variable names |
colnames |
input column variable names |
lwd.obs |
line width of observations |
lwd.mean |
line width of the mean profile |
line.cols |
line colors of the mean and observations |
... |
Provides the mean profile plot for the data fitted by the MatTrans.EM model.
No return value, called for side effects
EM classes for printing and summarizing objects.
## S3 method for class 'EM'
print(x, ...)
## S3 method for class 'EM'
summary(object, ...)
x |
an object with the 'EM' class attributes. |
object |
an object with the 'EM' class attributes. |
... |
other possible options. |
Some useful functions for printing and summarizing results.
No return value, called for side effects
See also MatTrans.EM.
Data collected from the Chronicle of Higher Education web site reporting the average faculty salaries at 696 universities, presented in the form of a 2 by 3 by 13 tensor.
data(Salary)
A list of 2 objects: Y and uni_info. Y is the salary data array for 696 universities, and uni_info holds the university names, states and categories. Y has dimensionality 2 by 3 by 13, categorized by the following factors: gender (Male, Female), professor rank (Assistant, Associate, Full), and academic year (2003-2004 through 2015-2016).
The data have been made publicly available by the Chronicle of Higher Education web site.
data(Salary)