Title: | Flexible Cluster-Weighted Modeling |
---|---|
Description: | Allows maximum likelihood fitting of cluster-weighted models, a class of mixtures of regression models with random covariates. Methods are described in Angelo Mazza, Antonio Punzo, Salvatore Ingrassia (2018) <doi:10.18637/jss.v086.i02>. |
Authors: | Mazza A., Punzo A., Ingrassia S. |
Maintainer: | Angelo Mazza <[email protected]> |
License: | GPL-2 |
Version: | 1.92 |
Built: | 2024-12-12 07:11:33 UTC |
Source: | CRAN |
Allows for maximum likelihood fitting of cluster-weighted models, a class of mixtures of regression models with random covariates.
Package: | CWM |
Type: | Package |
Version: | 1.7 |
Date: | 2017-02-14 |
License: | GPL-2 |
Mazza, A., Ingrassia, S., and Punzo, A. (2018). flexCWM: A Flexible Framework for Cluster-Weighted Models. Journal of Statistical Software, 86(2), 1-30.
Ingrassia, S., Minotti, S. C., and Vittadini, G. (2012). Local Statistical Modeling via the Cluster-Weighted Approach with Elliptical Distributions. Journal of Classification, 29(3), 363-401.
Ingrassia, S., Minotti, S. C., and Punzo, A. (2014). Model-based clustering via linear cluster-weighted models. Computational Statistics and Data Analysis, 71, 159-182.
Ingrassia, S., Punzo, A., and Vittadini, G. (2015). The Generalized Linear Mixed Cluster-Weighted Model. Journal of Classification, 32(forthcoming)
Punzo, A. (2014). Flexible Mixture Modeling with the Polynomial Gaussian Cluster-Weighted Model. Statistical Modelling, 14(3), 257-291.
Maximum likelihood fitting of the cluster-weighted model by the EM algorithm.
    cwm(formulaY = NULL, familyY = gaussian, data,
        Xnorm = NULL, Xbin = NULL, Xpois = NULL, Xmult = NULL,
        modelXnorm = NULL, Xbtrials = NULL, k = 1:3,
        initialization = c("random.soft", "random.hard", "kmeans", "mclust", "manual"),
        start.z = NULL, seed = NULL, maxR = 1, iter.max = 1000,
        threshold = 1.0e-04, eps = 1e-100, parallel = FALSE, pwarning = FALSE)
formulaY |
an optional object of class " |
familyY |
a description of the error distribution and link function to be used for the conditional distribution of
Default value is |
data |
an optional |
Xnorm, Xbin, Xpois, Xmult |
an optional matrix containing variables to be used for marginalization having normal, binomial, Poisson and multinomial distributions. |
modelXnorm |
an optional vector of character strings indicating the parsimonious models to be fitted for variables in |
Xbtrials |
an optional vector containing the number of trials for each column in |
k |
an optional vector containing the numbers of mixture components to be tried. Default value is |
initialization |
an optional character string. It sets the initialization strategy for the EM-algorithm. It can be:
Default value is |
start.z |
matrix of soft or hard classification: it is used only if |
seed |
an optional scalar. It sets the seed for the random number generator, when random initializations are used; if |
maxR |
number of initializations to be tried. Default value is 1. |
iter.max |
an optional scalar. It sets the maximum number of iterations in the EM-algorithm. Default value is 1000. |
threshold |
an optional scalar. It sets the threshold for the Aitken acceleration procedure. Default value is 1.0e-04. |
eps |
an optional scalar. It sets the smallest value for eigenvalues of covariance matrices for |
parallel |
When |
pwarning |
When |
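The threshold argument above refers to an Aitken acceleration procedure. The following is a minimal base-R sketch of one common form of an Aitken-based stopping rule, not the package's own code: the function name aitken_converged and the exact variant of the criterion are assumptions made for illustration.

```r
# Hypothetical helper: Aitken-based convergence check on a log-likelihood trace.
# 'loglik' holds the log-likelihood values of successive EM iterations.
aitken_converged <- function(loglik, threshold = 1e-4) {
  n <- length(loglik)
  if (n < 3) return(FALSE)              # three values are needed
  l0 <- loglik[n - 2]
  l1 <- loglik[n - 1]
  l2 <- loglik[n]
  if (l1 == l0) return(TRUE)            # log-likelihood stopped changing
  a <- (l2 - l1) / (l1 - l0)            # Aitken acceleration factor
  if (!is.finite(a) || a >= 1) return(FALSE)
  l_inf <- l1 + (l2 - l1) / (1 - a)     # estimated asymptotic log-likelihood
  abs(l_inf - l2) < threshold
}

# A geometric trace converging to -10: far from the limit early, close later.
early <- -10 - 8 * 0.5^(0:2)
late  <- -10 - 8 * 0.5^(0:20)
aitken_converged(early)
aitken_converged(late)
```

The rule compares the current log-likelihood with an extrapolated limit rather than with the previous iterate, which is why it can stop later than a plain difference test when the EM algorithm is converging slowly.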
When familyY = binomial, the response variable must be a matrix with two columns, where the first column is the number of "successes" and the second column is the number of "failures".
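The two-column response layout described above is the same convention used by base R's glm. A small sketch with simulated data (all variable names and parameter values here are hypothetical) shows how such a response can be built:

```r
set.seed(42)
n <- 50
trials <- 10                            # hypothetical number of trials per row
x <- rnorm(n)
successes <- rbinom(n, size = trials, prob = plogis(0.5 + x))
failures  <- trials - successes

# First column: successes; second column: failures
y <- cbind(successes, failures)

# Base R accepts this layout directly for a binomial GLM
fit <- glm(y ~ x, family = binomial)
coef(fit)

# Assumption: the same response would be supplied to cwm() through formulaY,
# e.g. cwm(formulaY = y ~ x, familyY = binomial, ...)
```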
When several models have been estimated, methods summary and print consider the best model according to the information criterion in criterion, among the estimated models having a number of components among those in k, an error distribution among those in familyY, and a parsimonious model among those in modelXnorm.
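To make the EM fitting performed by this function more concrete, here is a minimal base-R sketch for the simplest case only: one Gaussian covariate, a Gaussian linear response, and a hard k-means start. It is an illustration under these assumptions, not the package's implementation; the function em_cwm, its arguments, and the plain log-likelihood difference used as stopping rule are all hypothetical.

```r
# Hypothetical sketch: EM for a k-component Gaussian linear CWM, one covariate.
em_cwm <- function(x, y, k = 2, iter.max = 200, tol = 1e-8) {
  n <- length(y)
  # Hard k-means start on (x, y), stored as a 0/1 posterior matrix
  start <- kmeans(cbind(x, y), centers = k, nstart = 5)$cluster
  z <- diag(k)[start, , drop = FALSE]
  loglik <- -Inf
  for (it in seq_len(iter.max)) {
    # M-step: weights, weighted regression of y on x, weighted moments of x
    prior <- colMeans(z)
    dens <- matrix(0, n, k)
    for (g in seq_len(k)) {
      w <- z[, g]
      b <- unname(coef(lm(y ~ x, weights = w)))
      mu_y <- b[1] + b[2] * x
      sigma <- sqrt(sum(w * (y - mu_y)^2) / sum(w))
      mu_x <- sum(w * x) / sum(w)
      sd_x <- sqrt(sum(w * (x - mu_x)^2) / sum(w))
      # Joint density of (x, y) in component g: p(y | x) * p(x) * weight
      dens[, g] <- prior[g] * dnorm(y, mu_y, sigma) * dnorm(x, mu_x, sd_x)
    }
    # E-step: posterior probabilities of component membership
    new_loglik <- sum(log(rowSums(dens)))
    z <- dens / rowSums(dens)
    if (abs(new_loglik - loglik) < tol) break
    loglik <- new_loglik
  }
  list(posterior = z, cluster = max.col(z, ties.method = "first"),
       prior = colMeans(z), loglik = new_loglik, iter = it)
}

# Simulated data from two well-separated components (hypothetical parameters)
set.seed(1)
n <- 300
g <- rep(1:2, times = c(100, 200))
x <- rnorm(n, mean = c(-3, 3)[g])
y <- c(1, -1)[g] + c(2, -2)[g] * x + rnorm(n, sd = 0.5)
res <- em_cwm(x, y)
table(truth = g, fitted = res$cluster)
```

The key difference from an ordinary mixture of regressions is the extra factor for the covariate density in each component, which is what "random covariates" refers to.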
This function returns a class cwm object, which is a list of values related to the model selected. It contains:
call |
an object of class |
formulaY |
an object of class |
familyY |
the distribution used for the conditional distribution of |
data |
a |
concomitant |
a list containing |
Xbtrials |
number of trials used for |
models |
a list; each element is related to one of the models fitted. Each element is a list and contains: |
posterior
posterior probabilities
iter
number of iterations performed in EM algorithm
k
number of (fitted) mixture components.
size
estimated size of the groups.
cluster
classification vector
loglik
final log-likelihood value
df
overall number of estimated parameters
prior
weights for the mixture components
IC
list containing values of the information criteria
converged
logical; TRUE if the EM algorithm converged
GLModels
a list; each element is related to a mixture component and contains:
model
a "glm" class object.
sigma
estimated local scale parameters of the conditional distribution of the response, when familyY is gaussian or student.t
t_df
estimated degrees of freedom of the t distribution, when familyY is student.t
nuY
estimated shape parameter, when familyY is Gamma. The gamma distribution is parameterized according to McCullagh & Nelder (1989, p. 30)
concomitant
a list with estimated concomitant variables parameters for each mixture component
normal.d, multinomial.d, poisson.d, binomial.d
marginal distribution of concomitant variables
normal.mu
mixture component means for Xnorm
normal.Sigma
mixture component covariance matrices for Xnorm
normal.model
models fitted for Xnorm
multinomial.probs
multinomial distribution probabilities for Xmult
poisson.lambda
lambda parameters for Xpois
binomial.p
binomial probabilities for Xbin
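Several of the returned values listed above are related to the posterior matrix in a simple way. The sketch below uses a toy posterior matrix (made-up numbers) to show the usual relationships: maximum a posteriori classification, group sizes, and mixture weights as column means. That the package computes these quantities in exactly this way is an assumption; the relationships themselves are the standard ones for EM-fitted mixtures.

```r
# Toy posterior matrix: 5 observations (rows), 2 mixture components (columns)
posterior <- rbind(c(0.9, 0.1),
                   c(0.2, 0.8),
                   c(0.6, 0.4),
                   c(0.3, 0.7),
                   c(0.7, 0.3))

# Classification vector: component with the highest posterior probability
cluster <- max.col(posterior, ties.method = "first")

# Group sizes: number of observations assigned to each component
size <- tabulate(cluster, nbins = ncol(posterior))

# Mixture weights: average posterior probability per component
prior <- colMeans(posterior)

cluster
size
prior
```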
Mazza, A., Ingrassia, S., and Punzo, A. (2018). flexCWM: A Flexible Framework for Cluster-Weighted Models. Journal of Statistical Software, 86(2), 1-30.
Ingrassia, S., Minotti, S. C., and Vittadini, G. (2012). Local Statistical Modeling via the Cluster-Weighted Approach with Elliptical Distributions. Journal of Classification, 29(3), 363-401.
Ingrassia, S., Minotti, S. C., and Punzo, A. (2014). Model-based clustering via linear cluster-weighted models. Computational Statistics and Data Analysis, 71, 159-182.
Ingrassia, S., Punzo, A., and Vittadini, G. (2015). The Generalized Linear Mixed Cluster-Weighted Model. Journal of Classification, 32(forthcoming)
McCullagh, P. and Nelder, J. (1989). Generalized Linear Models. Chapman & Hall, Boca Raton, 2nd edition
Punzo, A. (2014). Flexible Mixture Modeling with the Polynomial Gaussian Cluster-Weighted Model. Statistical Modelling, 14(3), 257-291.
    ## an example with artificial data
    data("ExCWM")
    attach(ExCWM)
    str(ExCWM)

    # mixtures of binomial distributions
    resXbin <- cwm(Xbin = Xbin, k = 1:2, initialization = "kmeans")
    getParXbin(resXbin)

    # mixtures of Poisson distributions
    resXpois <- cwm(Xpois = Xpois, k = 1:2, initialization = "kmeans")
    getParXpois(resXpois)

    # parsimonious mixtures of multivariate normal distributions
    resXnorm <- cwm(Xnorm = cbind(Xnorm1, Xnorm2), k = 1:2, initialization = "kmeans")
    getParXnorm(resXnorm)

    ## an example with real data
    data("students")
    attach(students)
    str(students)

    # CWM
    fit2 <- cwm(WEIGHT ~ HEIGHT + HEIGHT.F,
                Xnorm = cbind(HEIGHT, HEIGHT.F),
                k = 2, initialization = "kmeans", modelXnorm = "EEE")
    summary(fit2, concomitant = TRUE)
    plot(fit2)
An artificial data set with 200 observations, generated by a CWM with 2 mixture components of different sizes, one binomial response variable, and four covariates: two with a bivariate Gaussian distribution, one with a Poisson distribution, and one with a binomial distribution.
data(ExCWM)
A dataset
    data("ExCWM")
    attach(ExCWM)
    str(ExCWM)

    # mixtures of binomial distributions
    resXbin <- cwm(Xbin = Xbin, k = 1:2, initialization = "kmeans")
    getParXbin(resXbin)

    # mixtures of Poisson distributions
    resXpois <- cwm(Xpois = Xpois, k = 1:2, initialization = "kmeans")
    getParXpois(resXpois)

    # parsimonious mixtures of multivariate normal distributions
    resXnorm <- cwm(Xnorm = cbind(Xnorm1, Xnorm2), k = 1:2, initialization = "kmeans")
    getParXnorm(resXnorm)
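The data set description above mentions a generating process with two components, a binomial response, and Gaussian, Poisson and binomial covariates. A rough base-R simulation of a process of that kind could look as follows. Column names mirror those used in the examples (Xnorm1, Xnorm2, Xpois, Xbin), but every parameter value, the independence of the two Gaussian covariates, and the form of the regression are assumptions chosen for illustration, not the settings actually used to build ExCWM.

```r
set.seed(123)
n <- 200

# Latent component labels with unequal (hypothetical) mixing weights
g <- sample(1:2, n, replace = TRUE, prob = c(0.3, 0.7))

# Covariates drawn from component-specific distributions
Xnorm1 <- rnorm(n, mean = c(0, 4)[g])
Xnorm2 <- rnorm(n, mean = c(0, -4)[g])
Xpois  <- rpois(n, lambda = c(2, 8)[g])
Xbin   <- rbinom(n, size = 10, prob = c(0.2, 0.7)[g])

# Binomial response with component-specific logistic regression on Xnorm1
trials <- 20
eta <- c(-1, 1)[g] + c(0.4, -0.4)[g] * Xnorm1
successes <- rbinom(n, size = trials, prob = plogis(eta))

sim <- data.frame(group = g, Xnorm1, Xnorm2, Xpois, Xbin,
                  successes, failures = trials - successes)
str(sim)
```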
These functions extract values from cwm class objects.
    getBestModel(object, criterion = "BIC", k = NULL, modelXnorm = NULL, familyY = NULL)
    getPosterior(object, ...)
    getSize(object, ...)
    getCluster(object, ...)
    getParGLM(object, ...)
    getParConcomitant(object, name = NULL, ...)
    getPar(object, ...)
    getParPrior(object, ...)
    getParXnorm(object, ...)
    getParXbin(object, ...)
    getParXpois(object, ...)
    getParXmult(object, ...)
    getIC(object, criteria)
    whichBest(object, criteria = NULL, k = NULL, modelXnorm = NULL, familyY = NULL)

    ## S3 method for class 'cwm'
    summary(object, criterion = "BIC", concomitant = FALSE,
            digits = getOption("digits") - 2, ...)

    ## S3 method for class 'cwm'
    print(x, ...)
object, x |
a class |
criterion |
a string with the information criterion to consider; supported values are: |
criteria |
a vector of strings with the names of information criteria to consider. If |
k |
an optional vector containing the numbers of mixture components to consider. If not specified, all the estimated models are considered. |
modelXnorm |
an optional vector of character strings indicating the parsimonious models to consider for |
familyY |
an optional vector of character strings indicating the conditional distribution of |
name |
an optional vector of strings specifying the names of distribution families of concomitant variables; if |
concomitant |
When |
digits |
integer used for number formatting. |
... |
additional arguments to be passed to |
When several models have been estimated, these functions consider the best model according to the information criterion in criterion, among the estimated models having a number of components among those in k, an error distribution among those in familyY, and a parsimonious model among those in modelXnorm.
getIC provides values for the information criteria in criteria.
The getBestModel method returns a cwm object containing the best model only, selected as described above.
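As an illustration of the selection logic described above, the following sketch fits models with one and two components on the students data and then extracts the best one by BIC. It only uses functions and arguments that appear in the usage sections of this manual. Passing data = students together with a matrix for Xnorm, instead of attach(), is an assumption about how the arguments combine, and the whole block is skipped if the package is not installed.

```r
if (requireNamespace("flexCWM", quietly = TRUE)) {
  library(flexCWM)
  data("students", package = "flexCWM")

  # Fit CWMs with 1 and 2 components; both fits are stored in the result
  fit <- cwm(WEIGHT ~ HEIGHT + HEIGHT.F, data = students,
             Xnorm = as.matrix(students[, c("HEIGHT", "HEIGHT.F")]),
             k = 1:2, initialization = "kmeans", modelXnorm = "EEE")

  # BIC values for the fitted models
  print(getIC(fit, criteria = "BIC"))

  # Keep only the model preferred by BIC and inspect its group sizes
  best <- getBestModel(fit, criterion = "BIC")
  print(getSize(best))
}
```

Working with the object returned by getBestModel avoids repeating the criterion, k, modelXnorm and familyY arguments in every later extractor call.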
    #res <- cwm(Y=Y,Xcont=X,k=1:4,seed=1)
    #summary(res)
    #plot(res)
Plot method for cwm class objects.
    ## S3 method for class 'cwm'
    plot(x, regr = TRUE, ctype = c("Xnorm", "Xbin", "Xpois", "Xmult"),
         which = NULL, criterion = "BIC", k = NULL, modelXnorm = NULL,
         familyY = NULL, histargs = list(breaks = 31), ...)
x |
An object of class |
regr |
boolean, allows for bivariate regression plot. |
ctype |
a vector with concomitant variables types to plot. |
which |
a vector with the numbers of the columns to plot, or "all" for all the columns |
criterion |
a string with the information criterion to consider; supported values are: |
k |
an optional vector containing the numbers of mixture components to consider. If not specified, all the estimated models are considered. |
modelXnorm |
an optional vector of character strings indicating the parsimonious models to consider for |
familyY |
an optional vector of character strings indicating the conditional distribution of |
histargs |
an optional list with |
... |
further arguments for |
    data("students")
    attach(students)
    str(students)
    fit2 <- cwm(WEIGHT ~ HEIGHT + HEIGHT.F,
                Xnorm = cbind(HEIGHT, HEIGHT.F),
                k = 2, initialization = "kmeans", modelXnorm = "EEE")
    summary(fit2, concomitant = TRUE)
    plot(fit2)
A dataframe with data from a survey of 270 students attending a statistics course at the Department of Economics and Business of the University of Catania in the academic year 2011/2012. It contains the following variables:
GENDER
gender of the respondent;
HEIGHT
height of the respondent, measured in centimeters;
WEIGHT
weight of the respondent, measured in kilograms;
HEIGHT.F
height of respondent's father, measured in centimeters.
data(students)
A dataset
http://www.economia.unict.it/punzo/
Ingrassia, S., Minotti, S. C., and Punzo, A. (2014). Model-based clustering via linear cluster-weighted models. Computational Statistics and Data Analysis, 71, 159-182.
    data("students")
    attach(students)
    str(students)
    fit2 <- cwm(WEIGHT ~ HEIGHT + HEIGHT.F,
                Xnorm = cbind(HEIGHT, HEIGHT.F),
                k = 2, initialization = "kmeans", modelXnorm = "EEE")
    summary(fit2, concomitant = TRUE)
    plot(fit2)