Title: | Flexible Mixture Modeling |
---|---|
Description: | A general framework for finite mixtures of regression models using the EM algorithm is implemented. The E-step and all data handling are provided, while the M-step can be supplied by the user to easily define new models. Existing drivers implement mixtures of standard linear models, generalized linear models and model-based clustering. |
Authors: | Bettina Gruen [aut, cre] , Friedrich Leisch [aut] , Deepayan Sarkar [ctb] , Frederic Mortier [ctb], Nicolas Picard [ctb] |
Maintainer: | Bettina Gruen <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.3-19 |
Built: | 2024-11-11 07:26:13 UTC |
Source: | CRAN |
Compute the Akaike Information Criterion.
Compute the AIC of a flexmix
object
Compute the AIC of all
models contained in the stepFlexmix
object.
22-centre clinical trial of beta-blockers for reducing mortality after myocardial infarction.
data("betablocker")
data("betablocker")
A data frame with 44 observations on the following 4 variables.
Number of deaths.
Total number of patients.
Number of clinical centre.
A factor with levels Control
and Treated
.
G. McLachlan and D. Peel. Finite Mixture Models, 2000. John Wiley and Sons Inc. http://www.maths.uq.edu.au/~gjm/DATA/mmdata.html
M. Aitkin. Meta-analysis by random effect modelling in generalized linear models. Statistics in Medicine, 18, 2343–2351, 1999.
S. Yusuf, R. Peto, J. Lewis, R. Collins and P. Sleight. Beta blockade during and after myocardial infarction: an overview of the randomized trials. Progress in Cardiovascular Diseases, 27, 335–371, 1985.
data("betablocker", package = "flexmix") betaMix <- initFlexmix(cbind(Deaths, Total - Deaths) ~ 1 | Center, data = betablocker, k = 3, nrep = 5, model = FLXMRglmfix(family = "binomial", fixed = ~Treatment))
data("betablocker", package = "flexmix") betaMix <- initFlexmix(cbind(Deaths, Total - Deaths) ~ 1 | Center, data = betablocker, k = 3, nrep = 5, model = FLXMRglmfix(family = "binomial", fixed = ~Treatment))
Compute the Bayesian Information Criterion.
Compute the BIC of a flexmix
object
Compute the BIC of all
models contained in the stepFlexmix
object.
A sample of 915 biochemistry graduate students.
data("bioChemists")
data("bioChemists")
count of articles produced during last 3 years of Ph.D.
factor indicating gender of student, with levels Men and Women
factor indicating marital status of student, with levels Single and Married
number of children aged 5 or younger
prestige of Ph.D. department
count of articles produced by Ph.D. mentor during last 3 years
This data set is taken from package pscl provided by Simon Jackman.
found in Stata format at https://jslsoc.sitehost.iu.edu/stata/spex_data/couart2.dta
Long, J. Scott. The origins of sex difference in science. Social Forces, 68, 1297–1315, 1990.
Long, J. Scott. Regression Models for Categorical and Limited Dependent Variables, 1997. Thousand Oaks, California: Sage.
Given a flexmix
object perform parametric or empirical bootstrap.
boot(object, ...) ## S4 method for signature 'flexmix' boot(object, R, sim = c("ordinary", "empirical", "parametric"), initialize_solution = FALSE, keep_weights = FALSE, keep_groups = TRUE, verbose = 0, control, k, model = FALSE, ...) LR_test(object, ...) ## S4 method for signature 'flexmix' LR_test(object, R, alternative = c("greater", "less"), control, ...)
boot(object, ...) ## S4 method for signature 'flexmix' boot(object, R, sim = c("ordinary", "empirical", "parametric"), initialize_solution = FALSE, keep_weights = FALSE, keep_groups = TRUE, verbose = 0, control, k, model = FALSE, ...) LR_test(object, ...) ## S4 method for signature 'flexmix' LR_test(object, R, alternative = c("greater", "less"), control, ...)
object |
A fitted finite mixture model of class |
R |
The number of bootstrap replicates. |
sim |
A character string indicating the type of simulation
required. Possible values are |
initialize_solution |
A logical. If |
keep_weights |
A logical. If |
keep_groups |
A logical. If |
verbose |
If a positive integer, then progress information
is reported every |
control |
Object of class |
k |
Vector of integers specifying for which number of components
finite mixtures are fitted to the bootstrap samples. If missing the
number of components of the fitted |
alternative |
A character string specifying the alternative
hypothesis, must be either |
model |
A logical. If |
... |
Further arguments to be passed to or from methods. |
boot
returns an object of class FLXboot
which
contains the fitted parameters, the fitted priors, the log
likelihoods, the number of components of the fitted mixtures and the
information if the EM algorithm has converged.
LR_test
returns an object of class htest
containing the
number of valid bootstrap replicates, the p-value, the - twice log
likelihood ratio test statistics for the original data and the
bootstrap replicates.
Bettina Gruen
data("NPreg", package = "flexmix") fitted <- initFlexmix(yn ~ x + I(x^2) | id2, data = NPreg, k = 2) ## Not run: lrtest <- LR_test(fitted, alternative = "greater", R = 20, verbose = 1) ## End(Not run)
data("NPreg", package = "flexmix") fitted <- initFlexmix(yn ~ x + I(x^2) | id2, data = NPreg, k = 2) ## Not run: lrtest <- LR_test(fitted, alternative = "greater", R = 20, verbose = 1) ## End(Not run)
A simple artificial regression example data set with 3 latent
classes, one independent variable x
and a concomitant
variable w
.
data("BregFix")
data("BregFix")
A data frame with 200 observations on the following 5 variables.
yes
number of successes
no
number of failures
x
independent variable
w
concomitant variable, a factor with levels 0
1
class
latent class memberships
data("BregFix", package = "flexmix") Model <- FLXMRglmfix(family="binomial", nested = list(formula = c(~x, ~0), k = c(2, 1))) Conc <- FLXPmultinom(~w) FittedBin <- initFlexmix(cbind(yes, no) ~ 1, data = BregFix, k = 3, model = Model, concomitant = Conc) summary(FittedBin)
data("BregFix", package = "flexmix") Model <- FLXMRglmfix(family="binomial", nested = list(formula = c(~x, ~0), k = c(2, 1))) Conc <- FLXPmultinom(~w) FittedBin <- initFlexmix(cbind(yes, no) ~ 1, data = BregFix, k = 3, model = Model, concomitant = Conc) summary(FittedBin)
The data is from a new product and concept test where the number of individual packs of hard candy purchased within the past 7 days is recorded.
data("candy")
data("candy")
A data frame with 21 observations on the following 2 variables.
Packages
a numeric vector
Freq
a numeric vector
D. Boehning, E. Dietz and P. Schlattmann. Recent Developments in Computer-Assisted Analysis of Mixtures. Biometrics 54(2), 525–536, 1998.
J. Magidson and J. K. Vermunt. Latent Class Models. In D. W. Kaplan (ed.), The Sage Handbook of Quantitative Methodology for the Social Sciences, 175–198, 2004. Thousand Oakes: Sage Publications.
D. Boehning, E. Dietz and P. Schlattmann. Recent Developments in Computer-Assisted Analysis of Mixtures. Biometrics, 54(2), 525–536, 1998.
W. R. Dillon and A. Kumar. Latent structure and other mixture models in marketing: An integrative survey and overview. In R. P. Bagozzi (ed.), Advanced methods of marketing research, 352–388, 1994. Cambridge, UK: Blackwell.
Count data from a dental epidemiological study for evaluation of various programs for reducing caries collected among school children from an urban area of Belo Horizonte (Brazil).
data("dmft")
data("dmft")
A data frame with 797 observations on the following 5 variables.
Number of decayed, missing or filled teeth at the end of the study.
Number of decayed, missing or filled teeth at the beginning of the study.
A factor with levels male
and female
.
A factor with levels brown
, white
and black
.
A factor with levels control
, educ
, enrich
,
rinse
, hygiene
and all
.
The aim of the caries prevention study was to compare four methods to prevent dental caries. Interventions were carried out according to the following scheme:
Control group
Oral health education
Enrichment of the school diet with rice bran
Mouthwash with 0.2% sodium floride (NaF) solution
Oral hygiene
All four methods together
D. Boehning, E. Dietz, P. Schlattmann, L. Mendonca and U. Kirchner. The zero-inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology. Journal of the Royal Statistical Society A, 162(2), 195–209, 1999.
data("dmft", package = "flexmix") dmft_flx <- initFlexmix(End ~ 1, data = dmft, k = 2, model = FLXMRglmfix(family = "poisson", fixed = ~ Gender + Ethnic + Treatment))
data("dmft", package = "flexmix") dmft_flx <- initFlexmix(End ~ 1, data = dmft, k = 2, model = FLXMRglmfix(family = "poisson", fixed = ~ Gender + Ethnic + Treatment))
Compute the entropic measure information criterion for model selection.
## S4 method for signature 'flexmix' EIC(object, ...) ## S4 method for signature 'stepFlexmix' EIC(object, ...)
## S4 method for signature 'flexmix' EIC(object, ...) ## S4 method for signature 'stepFlexmix' EIC(object, ...)
object |
See Methods section below |
... |
Some methods for this generic function may take additional, optional arguments. At present none do. |
Returns a numeric vector with the corresponding EIC value(s).
Compute the EIC of a flexmix
object.
Compute the EIC of all
models contained in the stepFlexmix
object.
Bettina Gruen
V. Ramaswamy, W. S. DeSarbo, D. J. Reibstein, and W. T. Robinson. An empirical pooling approach for estimating marketing mix elasticities with PIMS data. Marketing Science, 12(1), 103–124, 1993.
data("NPreg", package = "flexmix") ex1 <- flexmix(yn ~ x + I(x^2), data = NPreg, k = 2) EIC(ex1)
data("NPreg", package = "flexmix") ex1 <- flexmix(yn ~ x + I(x^2), data = NPreg, k = 2) EIC(ex1)
Generate random data mixed from k generalized linear regressions (GLMs).
ExLinear(beta, n, xdist = "runif", xdist.args = NULL, family = c("gaussian","poisson"), sd = 1, ...)
ExLinear(beta, n, xdist = "runif", xdist.args = NULL, family = c("gaussian","poisson"), sd = 1, ...)
beta |
A matrix of regression coefficients. Each row corresponds to a variable, each column to a mixture component. The first row is used as intercept. |
n |
Integer, the number of observations per component. |
xdist |
Character, name of a random number function for the explanatory variables. |
xdist.args |
List, arguments for the random number functions. |
family |
A character string naming a GLM
family.Only |
sd |
Numeric, the error standard deviation for each component for Gaussian responses. |
... |
Used as default for |
First, arguments n
(and sd
for Gaussian response) are
recycled to the number of mixture components ncol(beta)
, and
arguments xdist
and xdist.args
are recycled to the
number of explanatory variables nrow(beta)-1
. Then a design
matrix is created for each mixture component by drawing random numbers
from xdist
. For each component, the design matrix is multiplied
by the regression coefficients to form the linear predictor. For
Gaussian responses the identity link is used, for Poisson responses
the log link.
The true cluster memberships are returned as attribute "clusters"
.
## simple example in 2d beta <- matrix(c(1, 2, 3, -1), ncol = 2) sigma <- c(.5, 1) df1 <- ExLinear(beta, 100, sd = sigma, min = -1, max = 2) plot(y~x1, df1, col = attr(df1, "clusters")) ## add a second explanatory variable with exponential distribution beta2 <- rbind(beta, c(-2, 2)) df2 <- ExLinear(beta2, 100, sd = c(.5, 1), xdist = c("runif", "rexp"), xdist.args = list(list(min = -1, max = 2), list(rate = 3))) summary(df2) opar = par("mfrow") par(mfrow = 1:2) hist(df2$x1) hist(df2$x2) par(opar) f1 <- flexmix(y ~ ., data = df2, k = 2) ## sort components on Intercept f1 <- relabel(f1, "model", "Intercept") ## the parameters should be close to the true beta and sigma round(parameters(f1), 3) rbind(beta2, sigma) ### A simple Poisson GLM df3 <- ExLinear(beta/2, 100, min = -1, max = 2, family = "poisson") plot(y ~ x1, df3, col = attr(df3, "clusters")) f3 <- flexmix(y ~ ., data = df3, k = 2, model = FLXMRglm(family = "poisson")) round(parameters(relabel(f3, "model", "Intercept")), 3) beta/2
## simple example in 2d beta <- matrix(c(1, 2, 3, -1), ncol = 2) sigma <- c(.5, 1) df1 <- ExLinear(beta, 100, sd = sigma, min = -1, max = 2) plot(y~x1, df1, col = attr(df1, "clusters")) ## add a second explanatory variable with exponential distribution beta2 <- rbind(beta, c(-2, 2)) df2 <- ExLinear(beta2, 100, sd = c(.5, 1), xdist = c("runif", "rexp"), xdist.args = list(list(min = -1, max = 2), list(rate = 3))) summary(df2) opar = par("mfrow") par(mfrow = 1:2) hist(df2$x1) hist(df2$x2) par(opar) f1 <- flexmix(y ~ ., data = df2, k = 2) ## sort components on Intercept f1 <- relabel(f1, "model", "Intercept") ## the parameters should be close to the true beta and sigma round(parameters(f1), 3) rbind(beta2, sigma) ### A simple Poisson GLM df3 <- ExLinear(beta/2, 100, min = -1, max = 2, family = "poisson") plot(y ~ x1, df3, col = attr(df3, "clusters")) f3 <- flexmix(y ~ ., data = df3, k = 2, model = FLXMRglm(family = "poisson")) round(parameters(relabel(f3, "model", "Intercept")), 3) beta/2
A simple artificial example for normal clustering with 4 latent classes, all of them having a Gaussian distribution. See the function definition for true means and covariances.
ExNclus(n) data("Nclus")
ExNclus(n) data("Nclus")
n |
Number of observations in the two small latent classes. |
The Nclus
data set can be re-created by ExNclus(100)
using set.seed(2602)
, it has been saved as a data set for
simplicity of examples only.
data("Nclus", package = "flexmix") require("MASS") eqscplot(Nclus, col = rep(1:4, c(100, 100, 150, 200)))
data("Nclus", package = "flexmix") require("MASS") eqscplot(Nclus, col = rep(1:4, c(100, 100, 150, 200)))
A simple artificial regression example with 2 latent classes, one
independent variable
(uniform on ), and three dependent variables with
Gaussian, Poisson and Binomial distribution, respectively.
ExNPreg(n) data("NPreg")
ExNPreg(n) data("NPreg")
n |
Number of observations per latent class. |
The NPreg
data frame can be re-created by ExNPreg(100)
using set.seed(2602)
, it has been saved as a data set for
simplicity of examples only.
data("NPreg", package = "flexmix") plot(yn ~ x, data = NPreg, col = class) plot(yp ~ x, data = NPreg, col = class) plot(yb ~ x, data = NPreg, col = class)
data("NPreg", package = "flexmix") plot(yn ~ x, data = NPreg, col = class) plot(yp ~ x, data = NPreg, col = class) plot(yb ~ x, data = NPreg, col = class)
Number of faults in rolls of a textile fabric.
data("fabricfault")
data("fabricfault")
A data frame with 32 observations on the following 2 variables.
Length of role (m).
Number of faults.
G. McLachlan and D. Peel. Finite Mixture Models, 2000, John Wiley and Sons Inc. http://www.maths.uq.edu.au/~gjm/DATA/mmdata.html
A. F. Bissell. A Negative Binomial Model with Varying Element Sizes Biometrika, 59, 435–441, 1972.
M. Aitkin. A general maximum likelihood analysis of overdispersion in generalized linear models. Statistics and Computing, 6, 251–262, 1996.
data("fabricfault", package = "flexmix") fabricMix <- initFlexmix(Faults ~ 1, data = fabricfault, k = 2, model = FLXMRglmfix(family = "poisson", fixed = ~ log(Length)), nrep = 5)
data("fabricfault", package = "flexmix") fabricMix <- initFlexmix(Faults ~ 1, data = fabricfault, k = 2, model = FLXMRglmfix(family = "poisson", fixed = ~ log(Length)), nrep = 5)
Extract fitted values for each component from a flexmix object.
## S4 method for signature 'flexmix' fitted(object, drop = TRUE, aggregate = FALSE, ...)
## S4 method for signature 'flexmix' fitted(object, drop = TRUE, aggregate = FALSE, ...)
object |
an object of class |
drop |
logical, if |
aggregate |
logical, if |
... |
currently not used |
Friedrich Leisch and Bettina Gruen
data("NPreg", package = "flexmix") ex1 <- flexmix(yn ~ x + I(x^2), data = NPreg, k = 2) matplot(NPreg$x, fitted(ex1), pch = 16, type = "p") points(NPreg$x, NPreg$yn)
data("NPreg", package = "flexmix") ex1 <- flexmix(yn ~ x + I(x^2), data = NPreg, k = 2) matplot(NPreg$x, fitted(ex1), pch = 16, type = "p") points(NPreg$x, NPreg$yn)
FlexMix implements a general framework for finite
mixtures of regression models. Parameter estimation is performed using
the EM algorithm: the E-step is implemented by flexmix
, while
the user can specify the M-step.
flexmix(formula, data = list(), k = NULL, cluster = NULL, model = NULL, concomitant = NULL, control = NULL, weights = NULL) ## S4 method for signature 'flexmix' summary(object, eps = 1e-4, ...)
flexmix(formula, data = list(), k = NULL, cluster = NULL, model = NULL, concomitant = NULL, control = NULL, weights = NULL) ## S4 method for signature 'flexmix' summary(object, eps = 1e-4, ...)
formula |
A symbolic description of the model to be fit. The
general form is |
data |
An optional data frame containing the variables in the model. |
k |
Number of clusters (not needed if |
cluster |
Either a matrix with |
weights |
An optional vector of replication weights to be used in
the fitting process. Should be |
model |
Object of class |
concomitant |
Object of class |
control |
Object of class |
object |
Object of class |
eps |
Probabilities below this threshold are treated as zero in the summary method. |
... |
Currently not used. |
FlexMix models are described by objects of class FLXM
,
which in turn are created by driver functions like
FLXMRglm
or FLXMCmvnorm
. Multivariate
responses with independent components can be specified using a
list of FLXM
objects.
The summary
method lists for each component the prior
probability, the number of observations assigned to the corresponding
cluster, the number of observations with a posterior probability
larger than eps
and the ratio of the latter two numbers (which
indicates how separated the cluster is from the others).
Returns an object of class flexmix
.
Friedrich Leisch and Bettina Gruen
Friedrich Leisch. FlexMix: A general framework for finite mixture models and latent class regression in R. Journal of Statistical Software, 11(8), 2004. doi:10.18637/jss.v011.i08
Bettina Gruen and Friedrich Leisch. Fitting finite mixtures of generalized linear regressions in R. Computational Statistics & Data Analysis, 51(11), 5247-5252, 2007. doi:10.1016/j.csda.2006.08.014
Bettina Gruen and Friedrich Leisch. FlexMix Version 2: Finite mixtures with concomitant variables and varying and constant parameters Journal of Statistical Software, 28(4), 1-35, 2008. doi:10.18637/jss.v028.i04
data("NPreg", package = "flexmix") ## mixture of two linear regression models. Note that control parameters ## can be specified as named list and abbreviated if unique. ex1 <- flexmix(yn ~ x + I(x^2), data = NPreg, k = 2, control = list(verb = 5, iter = 100)) ex1 summary(ex1) plot(ex1) ## now we fit a model with one Gaussian response and one Poisson ## response. Note that the formulas inside the call to FLXMRglm are ## relative to the overall model formula. ex2 <- flexmix(yn ~ x, data = NPreg, k = 2, model = list(FLXMRglm(yn ~ . + I(x^2)), FLXMRglm(yp ~ ., family = "poisson"))) plot(ex2) ex2 table(ex2@cluster, NPreg$class) ## for Gaussian responses we get coefficients and standard deviation parameters(ex2, component = 1, model = 1) ## for Poisson response we get only coefficients parameters(ex2, component = 1, model = 2) ## fitting a model only to the Poisson response is of course ## done like this ex3 <- flexmix(yp ~ x, data = NPreg, k = 2, model = FLXMRglm(family = "poisson")) ## if observations are grouped, i.e., we have several observations per ## individual, fitting is usually much faster: ex4 <- flexmix(yp~x|id1, data = NPreg, k = 2, model = FLXMRglm(family = "poisson")) ## And now a binomial example. Mixtures of binomials are not generically ## identified, here the grouping variable is necessary: set.seed(1234) ex5 <- initFlexmix(cbind(yb,1 - yb) ~ x, data = NPreg, k = 2, model = FLXMRglm(family = "binomial"), nrep = 5) table(NPreg$class, clusters(ex5)) ex6 <- initFlexmix(cbind(yb, 1 - yb) ~ x | id2, data = NPreg, k = 2, model = FLXMRglm(family = "binomial"), nrep = 5) table(NPreg$class, clusters(ex6))
data("NPreg", package = "flexmix") ## mixture of two linear regression models. Note that control parameters ## can be specified as named list and abbreviated if unique. ex1 <- flexmix(yn ~ x + I(x^2), data = NPreg, k = 2, control = list(verb = 5, iter = 100)) ex1 summary(ex1) plot(ex1) ## now we fit a model with one Gaussian response and one Poisson ## response. Note that the formulas inside the call to FLXMRglm are ## relative to the overall model formula. ex2 <- flexmix(yn ~ x, data = NPreg, k = 2, model = list(FLXMRglm(yn ~ . + I(x^2)), FLXMRglm(yp ~ ., family = "poisson"))) plot(ex2) ex2 table(ex2@cluster, NPreg$class) ## for Gaussian responses we get coefficients and standard deviation parameters(ex2, component = 1, model = 1) ## for Poisson response we get only coefficients parameters(ex2, component = 1, model = 2) ## fitting a model only to the Poisson response is of course ## done like this ex3 <- flexmix(yp ~ x, data = NPreg, k = 2, model = FLXMRglm(family = "poisson")) ## if observations are grouped, i.e., we have several observations per ## individual, fitting is usually much faster: ex4 <- flexmix(yp~x|id1, data = NPreg, k = 2, model = FLXMRglm(family = "poisson")) ## And now a binomial example. Mixtures of binomials are not generically ## identified, here the grouping variable is necessary: set.seed(1234) ex5 <- initFlexmix(cbind(yb,1 - yb) ~ x, data = NPreg, k = 2, model = FLXMRglm(family = "binomial"), nrep = 5) table(NPreg$class, clusters(ex5)) ex6 <- initFlexmix(cbind(yb, 1 - yb) ~ x | id2, data = NPreg, k = 2, model = FLXMRglm(family = "binomial"), nrep = 5) table(NPreg$class, clusters(ex6))
A fitted flexmix
model.
model
:List of FLXM
objects.
prior
:Numeric vector with prior probabilities of clusters.
posterior
:Named list with elements scaled
and unscaled
, both matrices with one row per observation
and one column per cluster.
iter
:Number of EM iterations.
k
:Number of clusters after EM.
k0
:Number of clusters at start of EM.
cluster
:Cluster assignments of observations.
size
:Cluster sizes.
logLik
:Log-likelihood at EM convergence.
df
:Total number of parameters of the model.
components
:List describing
the fitted components using FLXcomponent
objects.
formula
:Object of class "formula"
.
control
:Object of class "FLXcontrol"
.
call
:The function call used to create the object.
group
:Object of class "factor"
.
converged
:Logical, TRUE
if EM algorithm converged.
concomitant
:Object of class "FLXP"
..
weights
:Optional weights of the observations.
Class FLXdist
, directly.
The following functions should be used for accessing the corresponding slots:
cluster
:Cluster assignments of observations.
posterior
:A matrix of posterior probabilities for each observation.
Friedrich Leisch and Bettina Gruen
A fitted component of a flexmix
model.
Objects can be created by calls of the form new("FLXcomponent", ...)
.
df
:Number of parameters used by the component.
logLik
:Function computing the log-likelihood of observations.
parameters
:List with model parameters.
predict
:Function predicting response for new data.
Friedrich Leisch and Bettina Gruen
Hyperparameters for the EM algorithm.
Objects can be created by calls of the form new("FLXcontrol",
...)
. In addition, named lists can be coerced to FLXcontrol
objects, names are completed if unique (see examples).
iter.max
:Maximum number of iterations.
minprior
:Minimum prior probability of clusters, components falling below this threshold are removed during the iteration.
tolerance
:The EM algorithm is stopped when the
(relative) change of log-likelihood is smaller than tolerance
.
verbose
:If a positive integer, then the
log-likelihood is reported every verbose
iterations. If 0,
no output is generated during model fitting.
classify
:Character string, one of "auto"
,
"weighted"
, "hard"
(or "CEM"
),
"random"
or ("SEM"
).
nrep
:Reports the number of random initializations
used in stepFlexmix()
to determine the mixture.
Run new("FLXcontrol")
to see the default settings of all slots.
Friedrich Leisch and Bettina Gruen
## have a look at the defaults new("FLXcontrol") ## corce a list mycont <- list(iter = 200, tol = 0.001, class = "r") as(mycont, "FLXcontrol")
## have a look at the defaults new("FLXcontrol") ## corce a list mycont <- list(iter = 200, tol = 0.001, class = "r") as(mycont, "FLXcontrol")
Constructs objects of class FLXdist
which represent unfitted finite mixture
models.
FLXdist(formula, k = NULL, model = FLXMRglm(), components, concomitant = FLXPconstant())
FLXdist(formula, k = NULL, model = FLXMRglm(), components, concomitant = FLXPconstant())
formula |
A symbolic description of the model to be fit. The
general form is |
k |
Integer specifying the number of cluster or a numeric vector of length equal to the length of components, specifying the prior probabilities of clusters. |
model |
Object of class |
components |
A list of length equal to the number of components containing a list of length equal to the number of models which again contains a list of named elements for defining the parameters of the component-specific model. |
concomitant |
Object of class |
Returns an object of class FLXdist
.
Bettina Gruen
FLXdist-class
Objects of class FLXdist
represent unfitted finite mixture
models.
## S4 method for signature 'FLXdist' parameters(object, component = NULL, model = NULL, which = c("model", "concomitant"), simplify = TRUE, drop = TRUE) ## S4 method for signature 'FLXdist' predict(object, newdata = list(), aggregate = FALSE, ...)
## S4 method for signature 'FLXdist' parameters(object, component = NULL, model = NULL, which = c("model", "concomitant"), simplify = TRUE, drop = TRUE) ## S4 method for signature 'FLXdist' predict(object, newdata = list(), aggregate = FALSE, ...)
object |
An object of class "FLXdist". |
component |
Number of component(s), if |
model |
Number of model(s), if |
which |
Specifies if the parameters of the component specific model or the concomitant variable model are returned. |
simplify |
Logical, if |
drop |
Logical, if |
newdata |
Dataframe containing new data. |
aggregate |
Logical, if |
... |
Passed to the method of the model class. |
List of FLXM
objects.
Numeric vector with prior probabilities of clusters.
List describing the components using
FLXcomponent
objects.
concomitant
:Object of class "FLXP"
.
Object of class "formula"
.
The function call used to create the object.
Number of clusters.
The following functions should be used for accessing the corresponding slots:
parameters
:The parameters for each model and component, return value depends on the model.
prior
:Numeric vector of prior class probabilities/component weights
Friedrich Leisch and Bettina Gruen
FLXdist
This is the basic computing engine called by flexmix
, it
should usually not be used directly.
FLXfit(model, concomitant, control, postunscaled = NULL, groups, weights)
FLXfit(model, concomitant, control, postunscaled = NULL, groups, weights)
model |
List of |
concomitant |
Object of class |
control |
Object of class |
weights |
A numeric vector of weights to be used in the fitting process. |
postunscaled |
Initial a-posteriori probabilities of the observations at the start of the EM algorithm. |
groups |
List with components |
Returns an object of class flexmix
.
Friedrich Leisch and Bettina Gruen
FlexMix model specification.
The most generic class is the virtual class FLXM
. The classes
FLXMC
for model-based clustering and FLXMR
for
clusterwise regression extend the virtual class. Both have further
more specific model classes which inherit from them.
Model class FLXMCsparse
allows for model-based clustering with
a sparse matrix as data input.
Objects can be created by calls of the form new("FLXM", ...)
,
typically inside driver functions like FLXMRglm
or
FLXMCmvnorm
.
fit
:Function returning an FLXcomponent
object.
defineComponent
:Function or expression to determine the
FLXcomponent
object given the parameters.
weighted
:Logical indicating whether fit
can do
weighted likelihood maximization.
name
:Character string used in print methods.
formula
:Formula describing the model.
fullformula
:Resulting formula from updating the model
formula with the formula specified in the call to flexmix
.
x
:Model matrix.
y
:Model response.
terms
, xlevels
, contrasts
:Additional information for model matrix.
preproc.x
:Function for preprocessing matrix x
before the EM algorithm starts, by default the identity function.
preproc.y
:Function for preprocessing matrix y
before the EM algorithm starts, by default the identity function.
Friedrich Leisch and Bettina Gruen
These are drivers for flexmix
implementing model-based
clustering of univariate data using different distributions for the
component-specific models.
FLXMCdist1(formula = . ~ ., dist, ...)
FLXMCdist1(formula = . ~ ., dist, ...)
formula |
A formula which is interpreted relative to the formula
specified in the call to |
dist |
Character string indicating the component-specific univariate distribution. |
... |
Arguments for the specific model drivers. |
Currently drivers for the following distributions are available:
Lognormal ("lnorm"
)
inverse Gaussian ("invGauss"
using dinvGauss
)
gamma ("gamma"
)
exponential ("exp"
)
Weibull ("weibull"
)
Burr ("burr"
using dburr
)
Inverse Burr ("invburr"
using dinvburr
)
FLXMCdist1
returns an object of class FLXMC
.
Friedrich Leisch and Bettina Gruen
Tatjana Miljkovic and Bettina Gruen. Modeling loss data using mixtures of distributions. Insurance: Mathematics and Economics, 70, 387-396, 2016. doi:10.1016/j.insmatheco.2016.06.019
if (require("actuar")) { set.seed(123) y <- c(rexp(100, 10), rexp(100, 1)) ex <- flexmix(y ~ 1, cluster = rep(1:2, each = 100), model = FLXMCdist1(dist = "exp")) parameters(ex) }
if (require("actuar")) { set.seed(123) y <- c(rexp(100, 10), rexp(100, 1)) ex <- flexmix(y ~ 1, cluster = rep(1:2, each = 100), model = FLXMCdist1(dist = "exp")) parameters(ex) }
This driver for flexmix
implements estimation of mixtures of
factor analyzers using ML estimation of factor analysis implemented in
factanal
in each M-step.
FLXMCfactanal(formula = . ~ ., factors = 1, ...)
FLXMCfactanal(formula = . ~ ., factors = 1, ...)
formula |
A formula which is interpreted relative to the formula
specified in the call to |
factors |
Integer specifying the number of factors in each component. |
... |
Passed to |
FLXMCfactanal
returns an object of class FLXM
.
This does not implement the AECM framework presented in McLachlan and Peel (2000, p.245), but uses the available functionality in R for ML estimation of factor analyzers. The implementation therefore is only experimental and has not been well tested.
Please note that in general a good initialization is crucial for the EM algorithm to converge to a suitable solution for this model class.
Bettina Gruen
G. McLachlan and D. Peel. Finite Mixture Models, 2000. John Wiley and Sons Inc.
## Reproduce (partly) Table 8.1. p.255 (McLachlan and Peel, 2000) if (require("gclus")) { data("wine", package = "gclus") wine_data <- as.matrix(wine[,-1]) set.seed(123) wine_fl_diag <- initFlexmix(wine_data ~ 1, k = 3, nrep = 10, model = FLXMCmvnorm(diagonal = TRUE)) wine_fl_fact <- lapply(1:4, function(q) flexmix(wine_data ~ 1, model = FLXMCfactanal(factors = q, nstart = 3), cluster = posterior(wine_fl_diag))) sapply(wine_fl_fact, logLik) ## FULL set.seed(123) wine_full <- initFlexmix(wine_data ~ 1, k = 3, nrep = 10, model = FLXMCmvnorm(diagonal = FALSE)) logLik(wine_full) ## TRUE wine_true <- flexmix(wine_data ~ 1, cluster = wine$Class, model = FLXMCmvnorm(diagonal = FALSE)) logLik(wine_true) }
## Reproduce (partly) Table 8.1. p.255 (McLachlan and Peel, 2000) if (require("gclus")) { data("wine", package = "gclus") wine_data <- as.matrix(wine[,-1]) set.seed(123) wine_fl_diag <- initFlexmix(wine_data ~ 1, k = 3, nrep = 10, model = FLXMCmvnorm(diagonal = TRUE)) wine_fl_fact <- lapply(1:4, function(q) flexmix(wine_data ~ 1, model = FLXMCfactanal(factors = q, nstart = 3), cluster = posterior(wine_fl_diag))) sapply(wine_fl_fact, logLik) ## FULL set.seed(123) wine_full <- initFlexmix(wine_data ~ 1, k = 3, nrep = 10, model = FLXMCmvnorm(diagonal = FALSE)) logLik(wine_full) ## TRUE wine_true <- flexmix(wine_data ~ 1, cluster = wine$Class, model = FLXMCmvnorm(diagonal = FALSE)) logLik(wine_true) }
This is a model driver for flexmix
implementing
model-based clustering of binary data.
FLXMCmvbinary(formula = . ~ ., truncated = FALSE)
FLXMCmvbinary(formula = . ~ ., truncated = FALSE)
formula |
A formula which is interpreted relative to the formula
specified in the call to |
truncated |
logical, if |
This model driver can be used to cluster binary data. The only parameter is the column-wise mean of the data, which equals the probability of observing a 1.
FLXMCmvbinary
returns an object of class FLXMC
.
Friedrich Leisch and Bettina Gruen
This is a model driver for flexmix
implementing
model-based clustering of a combination of binary and Gaussian data.
FLXMCmvcombi(formula = . ~ .)
FLXMCmvcombi(formula = . ~ .)
formula |
A formula which is interpreted relative to the formula
specified in the call to |
This model driver can be used to cluster mixed-mode binary and
Gaussian data. It checks which columns of a matrix contain only zero
and ones, and does the same as FLXMCmvbinary
for them. For
the remaining columns of the data matrix independent Gaussian
distributions are used (same as FLXMCmvnorm
with
diagonal = FALSE
.
The same could be obtained by creating a corresponding list of two
models for the respective columns, but FLXMCmvcombi
does a
better job in reporting parameters.
FLXMCmvcombi
returns an object of class FLXMC
.
Friedrich Leisch
flexmix
, FLXMCmvbinary
, FLXMCmvnorm
## create some artificial data x1 <- cbind(rnorm(300), sample(0:1, 300, replace = TRUE, prob = c(0.25, 0.75))) x2 <- cbind(rnorm(300, mean = 2, sd = 0.5), sample(0:1, 300, replace = TRUE, prob = c(0.75, 0.25))) x <- rbind(x1, x2) ## fit the model f1 <- flexmix(x ~ 1, k = 2, model = FLXMCmvcombi()) ## should be similar to the original parameters parameters(f1) table(clusters(f1), rep(1:2, c(300,300))) ## a column with noise should not hurt too much x <- cbind(x, rnorm(600)) f2 <- flexmix(x ~ 1, k = 2, model = FLXMCmvcombi()) parameters(f2) table(clusters(f2), rep(1:2, c(300,300)))
## create some artificial data x1 <- cbind(rnorm(300), sample(0:1, 300, replace = TRUE, prob = c(0.25, 0.75))) x2 <- cbind(rnorm(300, mean = 2, sd = 0.5), sample(0:1, 300, replace = TRUE, prob = c(0.75, 0.25))) x <- rbind(x1, x2) ## fit the model f1 <- flexmix(x ~ 1, k = 2, model = FLXMCmvcombi()) ## should be similar to the original parameters parameters(f1) table(clusters(f1), rep(1:2, c(300,300))) ## a column with noise should not hurt too much x <- cbind(x, rnorm(600)) f2 <- flexmix(x ~ 1, k = 2, model = FLXMCmvcombi()) parameters(f2) table(clusters(f2), rep(1:2, c(300,300)))
These are demo drivers for flexmix
implementing
model-based clustering of Gaussian data.
FLXMCmvnorm(formula = . ~ ., diagonal = TRUE) FLXMCnorm1(formula = . ~ .)
FLXMCmvnorm(formula = . ~ ., diagonal = TRUE) FLXMCnorm1(formula = . ~ .)
formula |
A formula which is interpreted relative to the formula
specified in the call to |
diagonal |
If |
This is mostly meant as a demo for FlexMix driver programming, you
should also look at package mclust for real
applications. FLXMCmvnorm
clusters multivariate data,
FLXMCnorm1
univariate data. In the latter case smart
initialization is important, see the example below.
FLXMCmvnorm
returns an object of class FLXMC
.
Friedrich Leisch and Bettina Gruen
Friedrich Leisch. FlexMix: A general framework for finite mixture models and latent class regression in R. Journal of Statistical Software, 11(8), 2004. doi:10.18637/jss.v011.i08
data("Nclus", package = "flexmix") require("MASS") eqscplot(Nclus) ## This model is wrong (one component has a non-diagonal cov matrix) ex1 <- flexmix(Nclus ~ 1, k = 4, model = FLXMCmvnorm()) print(ex1) plotEll(ex1, Nclus) ## True model, wrong number of components ex2 <- flexmix(Nclus ~ 1, k = 6, model = FLXMCmvnorm(diagonal = FALSE)) print(ex2) plotEll(ex2, Nclus) ## Get parameters of first component parameters(ex2, component = 1) ## Have a look at the posterior probabilies of 10 random observations ok <- sample(1:nrow(Nclus), 10) p <- posterior(ex2)[ok, ] p ## The following two should be the same max.col(p) clusters(ex2)[ok] ## Now try the univariate case plot(density(Nclus[, 1])) ex3 <- flexmix(Nclus[, 1] ~ 1, cluster = cut(Nclus[, 1], 3), model = FLXMCnorm1()) ex3 parameters(ex3)
data("Nclus", package = "flexmix") require("MASS") eqscplot(Nclus) ## This model is wrong (one component has a non-diagonal cov matrix) ex1 <- flexmix(Nclus ~ 1, k = 4, model = FLXMCmvnorm()) print(ex1) plotEll(ex1, Nclus) ## True model, wrong number of components ex2 <- flexmix(Nclus ~ 1, k = 6, model = FLXMCmvnorm(diagonal = FALSE)) print(ex2) plotEll(ex2, Nclus) ## Get parameters of first component parameters(ex2, component = 1) ## Have a look at the posterior probabilies of 10 random observations ok <- sample(1:nrow(Nclus), 10) p <- posterior(ex2)[ok, ] p ## The following two should be the same max.col(p) clusters(ex2)[ok] ## Now try the univariate case plot(density(Nclus[, 1])) ex3 <- flexmix(Nclus[, 1] ~ 1, cluster = cut(Nclus[, 1], 3), model = FLXMCnorm1()) ex3 parameters(ex3)
This is a model driver for flexmix
implementing
model-based clustering of Poisson distributed data.
FLXMCmvpois(formula = . ~ .)
FLXMCmvpois(formula = . ~ .)
formula |
A formula which is interpreted relative to the formula
specified in the call to |
This can be used to cluster Poisson distributed data where given the component membership the variables are mutually independent.
FLXMCmvpois
returns an object of class FLXMC
.
Friedrich Leisch and Bettina Gruen
Model driver for fitting mixtures of conditional logit models.
FLXMRcondlogit(formula = . ~ ., strata)
FLXMRcondlogit(formula = . ~ ., strata)
formula |
A formula which is interpreted relative to the formula
specified in the call to |
strata |
A formula which is interpreted such that no intercept is fitted and which allows to determine the variable indicating which observations are from the same stratum. |
The M-step is performed using coxph.fit
.
Returns an object of class FLXMRcondlogit
inheriting from FLXMRglm
.
To ensure identifiability repeated measurements are necessary. Sufficient conditions are given in Gruen and Leisch (2008).
Bettina Gruen
Bettina Gruen and Friedrich Leisch. Identifiability of finite mixtures of multinomial logit models with varying and fixed effects. Journal of Classification, 25, 225–247. 2008.
if (require("Ecdat")) { data("Catsup", package = "Ecdat") ## To reduce the time needed for the example only a subset is used Catsup <- subset(Catsup, id %in% 1:100) Catsup$experiment <- seq_len(nrow(Catsup)) vnames <- c("display", "feature", "price") Catsup_long <- reshape(Catsup, idvar = c("id", "experiment"), times = c(paste("heinz", c(41, 32, 28), sep = ""), "hunts32"), timevar = "brand", varying = matrix(colnames(Catsup)[2:13], nrow = 3, byrow = TRUE), v.names = vnames, direction = "long") Catsup_long$selected <- with(Catsup_long, choice == brand) Catsup_long <- Catsup_long[, c("id", "selected", "experiment", vnames, "brand")] Catsup_long$brand <- relevel(factor(Catsup_long$brand), "hunts32") set.seed(0808) flx1 <- flexmix(selected ~ display + feature + price + brand | id, model = FLXMRcondlogit(strata = ~ experiment), data = Catsup_long, k = 1) }
if (require("Ecdat")) { data("Catsup", package = "Ecdat") ## To reduce the time needed for the example only a subset is used Catsup <- subset(Catsup, id %in% 1:100) Catsup$experiment <- seq_len(nrow(Catsup)) vnames <- c("display", "feature", "price") Catsup_long <- reshape(Catsup, idvar = c("id", "experiment"), times = c(paste("heinz", c(41, 32, 28), sep = ""), "hunts32"), timevar = "brand", varying = matrix(colnames(Catsup)[2:13], nrow = 3, byrow = TRUE), v.names = vnames, direction = "long") Catsup_long$selected <- with(Catsup_long, choice == brand) Catsup_long <- Catsup_long[, c("id", "selected", "experiment", vnames, "brand")] Catsup_long$brand <- relevel(factor(Catsup_long$brand), "hunts32") set.seed(0808) flx1 <- flexmix(selected ~ display + feature + price + brand | id, model = FLXMRcondlogit(strata = ~ experiment), data = Catsup_long, k = 1) }
This is the main driver for FlexMix interfacing the glm
family of models.
FLXMRglm(formula = . ~ ., family = c("gaussian", "binomial", "poisson", "Gamma"), offset = NULL)
FLXMRglm(formula = . ~ ., family = c("gaussian", "binomial", "poisson", "Gamma"), offset = NULL)
formula |
A formula which is interpreted relative to the formula
specified in the call to |
family |
A character string naming a |
offset |
This can be used to specify an a priori known component to be included in the linear predictor during fitting. |
See flexmix
for examples.
Returns an object of class FLXMRglm
inheriting from FLXMR
.
Friedrich Leisch and Bettina Gruen
Friedrich Leisch. FlexMix: A general framework for finite mixture models and latent class regression in R. Journal of Statistical Software, 11(8), 2004. doi:10.18637/jss.v011.i08
This implements a driver for FlexMix which interfaces the glm
family of models and where it is possible to specify fixed (constant) or
nested varying coefficients or to ensure that in the Gaussian case the
variance estimate is equal for all components.
FLXMRglmfix(formula = . ~ ., fixed = ~0, varFix = FALSE, nested = NULL, family = c("gaussian", "binomial", "poisson", "Gamma"), offset = NULL)
FLXMRglmfix(formula = . ~ ., fixed = ~0, varFix = FALSE, nested = NULL, family = c("gaussian", "binomial", "poisson", "Gamma"), offset = NULL)
formula |
A formula which is interpreted relative to the formula
specified in the call to |
fixed |
A formula which specifies the additional regressors for the fixed (constant) coefficients. |
varFix |
A logical indicating if the variance estimate for Gaussian components should be constrained to be equal for all components. It can be also a vector specifying the number of components with equal variance. |
nested |
An object of class |
family |
A character string naming a |
offset |
This can be used to specify an a priori known component to be included in the linear predictor during fitting. |
Returns an object of class FLXMRglmfix
inheriting from FLXMRglm
and FLXMRfix
.
Friedrich Leisch and Bettina Gruen
FLXMRglm
data("NPreg", package = "flexmix") ex <- flexmix(yn ~ x | id2, data = NPreg, k = 2, cluster = NPreg$class, model = FLXMRglm(yn ~ . + I(x^2))) ex.fix <- flexmix(yn ~ x | id2, data = NPreg, cluster = posterior(ex), model = FLXMRglmfix(nested = list(k = c(1, 1), formula = c(~0, ~I(x^2))))) summary(refit(ex)) ## Not run: summary(refit(ex.fix)) ## End(Not run)
data("NPreg", package = "flexmix") ex <- flexmix(yn ~ x | id2, data = NPreg, k = 2, cluster = NPreg$class, model = FLXMRglm(yn ~ . + I(x^2))) ex.fix <- flexmix(yn ~ x | id2, data = NPreg, cluster = posterior(ex), model = FLXMRglmfix(nested = list(k = c(1, 1), formula = c(~0, ~I(x^2))))) summary(refit(ex)) ## Not run: summary(refit(ex.fix)) ## End(Not run)
This is a driver which allows fitting of mixtures of GLMs where the coefficients are penalized using the (adaptive) lasso or the elastic net by reusing functionality from package glmnet.
FLXMRglmnet(formula = . ~ ., family = c("gaussian", "binomial", "poisson"), adaptive = TRUE, select = TRUE, offset = NULL, ...)
FLXMRglmnet(formula = . ~ ., family = c("gaussian", "binomial", "poisson"), adaptive = TRUE, select = TRUE, offset = NULL, ...)
formula |
A formula which is interpreted relative to the formula
specified in the call to |
family |
A character string naming a |
adaptive |
A logical indicating if the adaptive lasso should be
used. By default equal to |
select |
A logical vector indicating which variables in the
model matrix should be included in the penalized part. By default
equal to |
offset |
This can be used to specify an a priori known component to be included in the linear predictor during fitting. |
... |
Additional arguments to be passed to
|
Some care is needed to ensure convergence of the algorithm, which is
computationally more challenging than a standard EM. In the proposed
method, not only are cluster allocations identified and component
parameters estimated as commonly done in mixture models, but there is
also variable selection via penalized regression using $k$-fold
cross-validation to choose the penalty parameter. For the algorithm
to converge, it is necessary that the same cross-validation
partitioning be used across the EM iterations, i.e., the subsamples
for cross-validation must be defined at the beginning This is
accomplished using the foldid
option as an additional parameter
to be passed to cv.glmnet
(see
glmnet package documentation).
Returns an object of class FLXMRglm
.
Frederic Mortier and Nicolas Picard.
Frederic Mortier, Dakis-Yaoba Ouedraogo, Florian Claeys, Mahlet G. Tadesse, Guillaume Cornu, Fidele Baya, Fabrice Benedet, Vincent Freycon, Sylvie Gourlet-Fleury and Nicolas Picard. Mixture of inhomogeneous matrix models for species-rich ecosystems. Environmetrics, 26(1), 39-51, 2015. doi:10.1002/env.2320
set.seed(12) p <- 10 beta <- matrix(0, nrow = p + 1, ncol = 2) beta[1,] <- c(-1, 1) beta[cbind(c(5, 10), c(1, 2))] <- 1 nobs <- 100 X <- matrix(rnorm(nobs * p), nobs, p) mu <- cbind(1, X) %*% beta z <- sample(1:ncol(beta), nobs, replace = TRUE) y <- mu[cbind(1:nobs, z)] + rnorm(nobs) data <- data.frame(y, X) ## The maximum number of iterations is reduced to ## avoid a long running time. fo <- sample(rep(seq(10), length = nrow(data))) ex1 <- flexmix(y ~ ., data = data, k = 2, cluster = z, model = FLXMRglmnet(foldid = fo), control = list(iter.max = 2)) parameters(ex1)
set.seed(12) p <- 10 beta <- matrix(0, nrow = p + 1, ncol = 2) beta[1,] <- c(-1, 1) beta[cbind(c(5, 10), c(1, 2))] <- 1 nobs <- 100 X <- matrix(rnorm(nobs * p), nobs, p) mu <- cbind(1, X) %*% beta z <- sample(1:ncol(beta), nobs, replace = TRUE) y <- mu[cbind(1:nobs, z)] + rnorm(nobs) data <- data.frame(y, X) ## The maximum number of iterations is reduced to ## avoid a long running time. fo <- sample(rep(seq(10), length = nrow(data))) ex1 <- flexmix(y ~ ., data = data, k = 2, cluster = z, model = FLXMRglmnet(foldid = fo), control = list(iter.max = 2)) parameters(ex1)
This is a driver which allows fitting of mixtures of linear models with random effects.
FLXMRlmm(formula = . ~ ., random, lm.fit = c("lm.wfit", "smooth.spline"), varFix = c(Random = FALSE, Residual = FALSE), ...) FLXMRlmer(formula = . ~ ., random, weighted = TRUE, control = list(), eps = .Machine$double.eps)
FLXMRlmm(formula = . ~ ., random, lm.fit = c("lm.wfit", "smooth.spline"), varFix = c(Random = FALSE, Residual = FALSE), ...) FLXMRlmer(formula = . ~ ., random, weighted = TRUE, control = list(), eps = .Machine$double.eps)
formula |
A formula which is interpreted relative to the formula
specified in the call to |
random |
A formula for specifying the random effects. |
weighted |
A logical indicating if the model should be estimated with weighted ML. |
control |
A list of control parameters. See
|
eps |
Observations with a component-specific posterior smaller
than |
lm.fit |
A character string indicating if the coefficients should
be fitted using either a linear model or the function
|
varFix |
Named logical vector of length 2 indicating if the variance of the random effects and the residuals are fixed over the components. |
... |
Additional arguments to be passed to |
FLXMRlmm
allows only one random effect. FLXMRlmer
allows
an arbitrary number of random effects if weighted = FALSE
; a
certain structure of the model matrix of the random effects has to be
given for weighted ML estimation, i.e. where weighted = TRUE
.
Returns an object of class FLXMRlmer
and FLXMRlmm
inheriting from FLXMRglm
and FLXMR
, respectively.
For FLXMRlmer
the weighted ML estimation is only correct if the
covariate matrix of the random effects is the same for each
observation. By default weighted ML estimation is made and the
condition on the covariate matrix of the random effects is checked. If
this fails, only estimation with weighted = FALSE
is possible
which will maximize the classification likelihood.
Bettina Gruen
id <- rep(1:50, each = 10) x <- rep(1:10, 50) sample <- data.frame(y = rep(rnorm(unique(id)/2, 0, c(5, 2)), each = 10) + rnorm(length(id), rep(c(3, 8), each = 10)) + rep(c(0, 3), each = 10) * x, x = x, id = factor(id)) fitted <- flexmix(.~.|id, k = 2, model = FLXMRlmm(y ~ x, random = ~ 1), data = sample, control = list(tolerance = 10^-3), cluster = rep(rep(1:2, each = 10), 25)) parameters(fitted) fitted1 <- flexmix(.~.|id, k = 2, model = FLXMRlmer(y ~ x, random = ~ 1), data = sample, control = list(tolerance = 10^-3), cluster = rep(rep(1:2, each = 10), 25)) parameters(fitted1) fitted2 <- flexmix(.~.|id, k = 2, model = FLXMRlmm(y ~ 0 + x, random = ~ 1, lm.fit = "smooth.spline"), data = sample, control = list(tolerance = 10^-3), cluster = rep(rep(1:2, each = 10), 25)) parameters(fitted2)
id <- rep(1:50, each = 10) x <- rep(1:10, 50) sample <- data.frame(y = rep(rnorm(unique(id)/2, 0, c(5, 2)), each = 10) + rnorm(length(id), rep(c(3, 8), each = 10)) + rep(c(0, 3), each = 10) * x, x = x, id = factor(id)) fitted <- flexmix(.~.|id, k = 2, model = FLXMRlmm(y ~ x, random = ~ 1), data = sample, control = list(tolerance = 10^-3), cluster = rep(rep(1:2, each = 10), 25)) parameters(fitted) fitted1 <- flexmix(.~.|id, k = 2, model = FLXMRlmer(y ~ x, random = ~ 1), data = sample, control = list(tolerance = 10^-3), cluster = rep(rep(1:2, each = 10), 25)) parameters(fitted1) fitted2 <- flexmix(.~.|id, k = 2, model = FLXMRlmm(y ~ 0 + x, random = ~ 1, lm.fit = "smooth.spline"), data = sample, control = list(tolerance = 10^-3), cluster = rep(rep(1:2, each = 10), 25)) parameters(fitted2)
This is a driver which allows fitting of mixtures of linear models with random effects and left-censored observations.
FLXMRlmmc(formula = . ~ ., random, censored, varFix, eps = 10^-6, ...)
FLXMRlmmc(formula = . ~ ., random, censored, varFix, eps = 10^-6, ...)
formula |
A formula which is interpreted relative to the formula
specified in the call to |
random |
A formula for specifying the random effects. If missing no random effects are fitted. |
varFix |
If random effects are specified a named logical vector of length 2 indicating if the variance of the random effects and the residuals are fixed over the components. Otherwise a logical indicating if the variance of the residuals are fixed over the components. |
censored |
A formula for specifying the censoring variable. |
eps |
Observations with an a-posteriori probability smaller or
equal to |
... |
Additional arguments to be passed to |
Returns an object of class FLXMRlmmc
, FLXMRlmmcfix
,
FLXMRlmc
or FLXMRlmcfix
inheriting from FLXMR
.
Bettina Gruen
This is a driver which allows fitting of mixtures of GAMs.
FLXMRmgcv(formula = . ~ ., family = c("gaussian", "binomial", "poisson"), offset = NULL, control = NULL, optimizer = c("outer", "newton"), in.out = NULL, eps = .Machine$double.eps, ...)
FLXMRmgcv(formula = . ~ ., family = c("gaussian", "binomial", "poisson"), offset = NULL, control = NULL, optimizer = c("outer", "newton"), in.out = NULL, eps = .Machine$double.eps, ...)
formula |
A formula which is interpreted relative to the formula
specified in the call to |
family |
A character string naming a |
offset |
This can be used to specify an a priori known component to be included in the linear predictor during fitting. |
control |
A list of fit control parameters returned by
|
optimizer |
An array specifying the numerical optimization method
to use to optimize the smoothing parameter estimation criterion; for
more details see |
in.out |
Optional list for initializing outer iteration; for more
details see |
eps |
Observations with an a-posteriori probability smaller or
equal to |
... |
Additional arguments to be pased to the GAM fitter. |
Returns an object of class FLXMRmgcv
inheriting from FLXMRglm
.
Bettina Gruen
set.seed(2012) x <- seq(0, 1, length.out = 100) z <- sample(0:1, length(x), replace = TRUE) y <- rnorm(length(x), ifelse(z, 5 * sin(x * 2 * pi), 10 * x - 5)) fitted_model <- flexmix(y ~ s(x), model = FLXMRmgcv(), cluster = z + 1, control = list(tolerance = 10^-3)) plot(y ~ x, col = clusters(fitted_model)) matplot(x, fitted(fitted_model), type = "l", add = TRUE)
set.seed(2012) x <- seq(0, 1, length.out = 100) z <- sample(0:1, length(x), replace = TRUE) y <- rnorm(length(x), ifelse(z, 5 * sin(x * 2 * pi), 10 * x - 5)) fitted_model <- flexmix(y ~ s(x), model = FLXMRmgcv(), cluster = z + 1, control = list(tolerance = 10^-3)) plot(y ~ x, col = clusters(fitted_model)) matplot(x, fitted(fitted_model), type = "l", add = TRUE)
Model driver for fitting mixtures of multinomial logit models.
FLXMRmultinom(formula = . ~ ., ...)
FLXMRmultinom(formula = . ~ ., ...)
formula |
A formula which is interpreted relative to the formula
specified in the call to |
... |
Additional arguments to be passed to |
The M-step is performed using nnet.default
.
Returns an object of class FLXMRmultinom
inheriting from FLXMRglm
.
To ensure identifiability repeated measurements are necessary. Sufficient conditions are given in Gruen and Leisch (2008).
Bettina Gruen
Bettina Gruen and Friedrich Leisch. Identifiability of finite mixtures of multinomial logit models with varying and fixed effects. Journal of Classification, 25, 225–247. 2008.
This driver adds a noise component to the mixture model which can be used to model background noise in the data. See the Compstat paper Leisch (2008) cited below for details.
FLXMRrobglm(formula = . ~ ., family = c("gaussian", "poisson"), bgw = FALSE, ...)
FLXMRrobglm(formula = . ~ ., family = c("gaussian", "poisson"), bgw = FALSE, ...)
formula |
A formula which is interpreted relative to the formula
specified in the call to |
family |
A character string naming a |
bgw |
Logical, controls whether the parameters of the background
component are fixed to multiples of location and scale of the
complete data (the default), or estimated by EM with normal weights for the
background ( |
... |
passed to |
Returns an object of class FLXMRrobglm
inheriting from FLXMRglm
.
The implementation of this model class is currently under development,
and some methods like refit
are still missing.
Friedrich Leisch and Bettina Gruen
Friedrich Leisch. Modelling background noise in finite mixtures of
generalized linear regression models. In Paula Brito, editor, Compstat
2008–Proceedings in Computational Statistics, 385–396. Physica
Verlag, Heidelberg, Germany, 2008.
Preprint available at http://epub.ub.uni-muenchen.de/6332/.
## Example from Compstat paper, see paper for detailed explanation: data("NPreg", package = "flexmix") DATA <- NPreg[, 1:2] set.seed(3) DATA2 <- rbind(DATA, cbind(x = -runif(3), yn = 50 + runif(3))) ## Estimation without (f2) and with (f3) background component f2 <- flexmix(yn ~ x + I(x^2), data = DATA2, k = 2) f3 <- flexmix(yn ~ x + I(x^2), data = DATA2, k = 3, model = FLXMRrobglm(), control = list(minprior = 0)) ## Predict on new data for plots x <- seq(-5,15, by = .1) y2 <- predict(f2, newdata = data.frame(x = x)) y3 <- predict(f3, newdata = data.frame(x = x)) ## f2 was estimated without background component: plot(yn ~ x, data = DATA2, pch = clusters(f2), col = clusters(f2)) lines(x, y2$Comp.1, col = 1) lines(x, y2$Comp.2, col = 2) ## f3 is with background component: plot(yn ~ x, data = DATA2, pch = 4 - clusters(f3), col = 4 - clusters(f3)) lines(x, y3$Comp.2, col = 2) lines(x, y3$Comp.3, col = 1)
## Example from Compstat paper, see paper for detailed explanation: data("NPreg", package = "flexmix") DATA <- NPreg[, 1:2] set.seed(3) DATA2 <- rbind(DATA, cbind(x = -runif(3), yn = 50 + runif(3))) ## Estimation without (f2) and with (f3) background component f2 <- flexmix(yn ~ x + I(x^2), data = DATA2, k = 2) f3 <- flexmix(yn ~ x + I(x^2), data = DATA2, k = 3, model = FLXMRrobglm(), control = list(minprior = 0)) ## Predict on new data for plots x <- seq(-5,15, by = .1) y2 <- predict(f2, newdata = data.frame(x = x)) y3 <- predict(f3, newdata = data.frame(x = x)) ## f2 was estimated without background component: plot(yn ~ x, data = DATA2, pch = clusters(f2), col = clusters(f2)) lines(x, y2$Comp.1, col = 1) lines(x, y2$Comp.2, col = 2) ## f3 is with background component: plot(yn ~ x, data = DATA2, pch = 4 - clusters(f3), col = 4 - clusters(f3)) lines(x, y3$Comp.2, col = 2) lines(x, y3$Comp.3, col = 1)
This is a driver which allows fitting of zero inflated poisson and binomial models.
FLXMRziglm(formula = . ~ ., family = c("binomial", "poisson"), ...)
FLXMRziglm(formula = . ~ ., family = c("binomial", "poisson"), ...)
formula |
A formula which is interpreted relative to the formula
specified in the call to |
family |
A character string naming a |
... |
passed to |
Returns an object of class FLXMRziglm
inheriting from FLXMRglm
.
In fact this only approximates zero inflated models by fixing the coefficient of the intercept at -Inf and the other coefficients at zero for the first component.
Friedrich Leisch and Bettina Gruen
data("dmft", package = "flexmix") Model <- FLXMRziglm(family = "poisson") Fitted <- flexmix(End ~ log(Begin + 0.5) + Gender + Ethnic + Treatment, model = Model, k = 2 , data = dmft, control = list(minprior = 0.01)) summary(refit(Fitted))
data("dmft", package = "flexmix") Model <- FLXMRziglm(family = "poisson") Fitted <- flexmix(End ~ log(Begin + 0.5) + Gender + Ethnic + Treatment, model = Model, k = 2 , data = dmft, control = list(minprior = 0.01)) summary(refit(Fitted))
Specification of nesting structure for regression coefficients.
Objects can be created by calls of the form new("FLXnested",
formula, k, ...)
. In addition, named lists can be coerced to
FLXnested
objects, names are completed if unique.
formula
:Object of class "list"
containing the
formula for determining the model matrix for each nested parameter.
k
:Object of class "numeric"
specifying the
number of components in each group.
Friedrich Leisch and Bettina Gruen
Creator functions for the concomitant variable
model. FLXPconstant
specifies constant priors
and FLXPmultinom
multinomial logit models for the priors.
FLXPconstant() FLXPmultinom(formula = ~1)
FLXPconstant() FLXPmultinom(formula = ~1)
formula |
A formula for determining the model matrix of the concomitant variables. |
FLXPmultinom
uses nnet.default
from nnet to fit the
multinomial logit model.
Object of class FLXP
. FLXPmultinom
returns an
object of class FLXPmultinom
which extends class
FLXP
directly and is used for method dispatching.
Friedrich Leisch and Bettina Gruen
Concomitant model class.
Objects can be created by calls of the form new("FLXP", ...)
,
typically inside driver functions like FLXPconstant
or
FLXPmultinom
.
name
:Character string used in print methods.
formula
:Formula describing the model.
x
:Model matrix.
fit
:Function returning the fitted prior probabilities.
refit
:Function returning the fitted concomitant model.
coef
:Matrix containing the fitted parameters.
df
:Function for determining the number of degrees of freedom used.
Friedrich Leisch and Bettina Gruen
Extract grouping variable for all observations.
## S4 method for signature 'flexmix' group(object) ## S4 method for signature 'FLXM' group(object) ## S4 method for signature 'FLXMRglmfix' group(object)
## S4 method for signature 'flexmix' group(object) ## S4 method for signature 'FLXM' group(object) ## S4 method for signature 'FLXMRglmfix' group(object)
object |
an object of class |
Bettina Gruen
Compute the Integrated Completed Likelihood criterion for model selection.
## S4 method for signature 'flexmix' ICL(object, ...) ## S4 method for signature 'stepFlexmix' ICL(object, ...)
## S4 method for signature 'flexmix' ICL(object, ...) ## S4 method for signature 'stepFlexmix' ICL(object, ...)
object |
see Methods section below |
... |
Some methods for this generic function may take additional, optional arguments. At present none do. |
Returns a numeric vector with the corresponding ICL value(s).
Compute the ICL of a flexmix
object.
Compute the ICL of all
models contained in the stepFlexmix
object.
Friedrich Leisch and Bettina Gruen
C. Biernacki, G. Celeux, and G. Govaert. Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(7), 719–725, 2000.
data("NPreg", package = "flexmix") ex1 <- flexmix(yn ~ x + I(x^2), data = NPreg, k = 2) ICL(ex1)
data("NPreg", package = "flexmix") ex1 <- flexmix(yn ~ x + I(x^2), data = NPreg, k = 2) ICL(ex1)
Estimate the Kullback-Leibler divergence of several distributions.
## S4 method for signature 'matrix' KLdiv(object, eps = 10^-4, overlap = TRUE,...) ## S4 method for signature 'flexmix' KLdiv(object, method = c("continuous", "discrete"), ...)
## S4 method for signature 'matrix' KLdiv(object, eps = 10^-4, overlap = TRUE,...) ## S4 method for signature 'flexmix' KLdiv(object, method = c("continuous", "discrete"), ...)
object |
See Methods section below. |
method |
The method to be used; "continuous" determines the Kullback-Leibler divergence between the unweighted theoretical component distributions and the unweighted posterior probabilities at the observed points are used by "discrete". |
eps |
Probabilities below this threshold are replaced by this threshold for numerical stability. |
overlap |
Logical, do not determine the KL divergence for
those pairs where for each point at least one of the densities has a
value smaller than |
... |
Passed to the matrix method. |
Estimates
for distributions with densities and
.
A matrix of KL divergences where the rows correspond to using the
respective distribution as in the formula above.
Takes as input a matrix of density values with one row per observation and one column per distribution.
Returns the Kullback-Leibler divergence of the mixture components.
The density functions are modified to have equal support.
A weight of at least eps
is given to each
observation point for the modified densities.
Friedrich Leisch and Bettina Gruen
S. Kullback and R. A. Leibler. On information and sufficiency.The Annals of Mathematical Statistics, 22(1), 79–86, 1951.
Friedrich Leisch. Exploring the structure of mixture model components. In Jaromir Antoch, editor, Compstat 2004–Proceedings in Computational Statistics, 1405–1412. Physika Verlag, Heidelberg, Germany, 2004. ISBN 3-7908-1554-3.
## Gaussian and Student t are much closer to each other than ## to the uniform: x <- seq(-3, 3, length = 200) y <- cbind(u = dunif(x), n = dnorm(x), t = dt(x, df = 10)) matplot(x, y, type = "l") KLdiv(y) if (require("mlbench")) { set.seed(2606) x <- mlbench.smiley()$x model1 <- flexmix(x ~ 1, k = 9, model = FLXmclust(diag = FALSE), control = list(minprior = 0)) plotEll(model1, x) KLdiv(model1) }
## Gaussian and Student t are much closer to each other than ## to the uniform: x <- seq(-3, 3, length = 200) y <- cbind(u = dunif(x), n = dnorm(x), t = dt(x, df = 10)) matplot(x, y, type = "l") KLdiv(y) if (require("mlbench")) { set.seed(2606) x <- mlbench.smiley()$x model1 <- flexmix(x ~ 1, k = 9, model = FLXmclust(diag = FALSE), control = list(minprior = 0)) plotEll(model1, x) KLdiv(model1) }
Apply a function to each component of a finite mixture
## S4 method for signature 'FLXRmstep' Lapply(object, FUN, model = 1, component = TRUE, ...)
## S4 method for signature 'FLXRmstep' Lapply(object, FUN, model = 1, component = TRUE, ...)
object |
S4 class object. |
FUN |
The function to be applied. |
model |
The model (for a multivariate response) that shall be used. |
component |
Index vector for selecting the components. |
... |
Optional arguments to |
FUN
is found by a call to match.fun
and typically is specified
as a function or a symbol (e.g. a backquoted name) or a character
string specifying a function to be searched for from the
environment of the call to Lapply
.
A list of the length equal to the number of components specified is
returned, each element of which is the result of applying FUN
to the
specified component of the refitted mixture model.
Apply a function to each component of a
refitted flexmix
object using method = "mstep"
.
Friedrich Leisch and Bettina Gruen
data("NPreg", package = "flexmix") ex2 <- flexmix(yn ~ x, data = NPreg, k = 2, model = list(FLXMRglm(yn ~ . + I(x^2)), FLXMRglm(yp ~ ., family = "poisson"))) ex2r <- refit(ex2, method = "mstep") Lapply(ex2r, "vcov", 2)
data("NPreg", package = "flexmix") ex2 <- flexmix(yn ~ x, data = NPreg, k = 2, model = list(FLXMRglm(yn ~ . + I(x^2)), FLXMRglm(yp ~ ., family = "poisson"))) ex2r <- refit(ex2, method = "mstep") Lapply(ex2r, "vcov", 2)
Evaluate the log-likelihood. This function is defined as
an S4 generic in the stats4
package.
Evaluate the log-likelihood of an
flexmix
object
For a 22-centre trial the number of responses and the total number of patients is reported for the control group and the group receiving a new drug.
data("Mehta")
data("Mehta")
A data frame with 44 observations on the following 4 variables.
Number of responses.
Total number of observations.
A factor indicating treatment with levels New
and
Control
.
A factor indicating the site/centre.
M. Aitkin. Meta-analysis by random effect modelling in generalized linear models. Statistics in Medicine, 18, 2343–2351, 1999.
C.R. Mehta, N.R. Patel and P. Senchaudhuri. Importance sampling for estimating exact probabilities in permutational inference. Journal of the American Statistical Association, 83, 999–1005, 1988.
data("Mehta", package = "flexmix") mehtaMix <- initFlexmix(cbind(Response, Total-Response) ~ 1|Site, data = Mehta, nrep = 5, k = 3, model = FLXMRglmfix(family = "binomial", fixed = ~ Drug), control = list(minprior = 0.04))
data("Mehta", package = "flexmix") mehtaMix <- initFlexmix(cbind(Response, Total-Response) ~ 1|Site, data = Mehta, nrep = 5, k = 3, model = FLXMRglmfix(family = "binomial", fixed = ~ Drug), control = list(minprior = 0.04))
A simple artificial regression example with 3 latent classes, two independent variables, one concomitant variable and a dependent variable which follows a Gaussian distribution.
data("NregFix")
data("NregFix")
A data frame with 200 observations on the following 5 variables.
x1
Independent variable: numeric variable.
x2
Independent variable: a factor with two levels:
0
and 1
.
w
Concomitant variable: a factor with two levels:
0
and 1
.
y
Dependent variable.
class
Latent class memberships.
data("NregFix", package = "flexmix") library("lattice") xyplot(y ~ x1 | x2 * w, data = NregFix, groups = class) Model <- FLXMRglmfix(~ 1, fixed = ~ x2, nested = list(k = c(2, 1), formula = c(~x1, ~0))) fittedModel <- initFlexmix(y ~ 1, model = Model, data = NregFix, k = 3, concomitant = FLXPmultinom(~ w), nrep = 5) fittedModel
data("NregFix", package = "flexmix") library("lattice") xyplot(y ~ x1 | x2 * w, data = NregFix, groups = class) Model <- FLXMRglmfix(~ 1, fixed = ~ x2, nested = list(k = c(2, 1), formula = c(~x1, ~0))) fittedModel <- initFlexmix(y ~ 1, model = Model, data = NregFix, k = 3, concomitant = FLXPmultinom(~ w), nrep = 5) fittedModel
Number of patents, R&D spending and sales in millions of dollar for 70 pharmaceutical and biomedical companies in 1976.
data("patent")
data("patent")
A data frame with 70 observations on the following 4 variables.
Name of company.
Number of patents.
R&D spending per sales.
Logarithmized R&D spendings (in millions of dollars).
The data is taken from the National Bureau of Economic Research R&D Masterfile.
P. Wang, I.M. Cockburn and M.L. Puterman. Analysis of Patent Data – A Mixed-Poisson-Regression-Model Approach. Journal of Business & Economic Statistics, 16(1), 27–41, 1998.
B.H. Hall, C. Cummins, E. Laderman and J. Mundy. The R&D Master File Documentation. Technical Working Paper 72, National Bureau of Economic Research, 1988. Cambridge, MA.
data("patent", package = "flexmix") patentMix <- initFlexmix(Patents ~ lgRD, k = 3, model = FLXMRglm(family = "poisson"), concomitant = FLXPmultinom(~RDS), nrep = 5, data = patent) plot(Patents ~ lgRD, data = patent, pch = as.character(clusters(patentMix))) ordering <- order(patent$lgRD) apply(fitted(patentMix), 2, function(y) lines(sort(patent$lgRD), y[ordering]))
data("patent", package = "flexmix") patentMix <- initFlexmix(Patents ~ lgRD, k = 3, model = FLXMRglm(family = "poisson"), concomitant = FLXPmultinom(~RDS), nrep = 5, data = patent) plot(Patents ~ lgRD, data = patent, pch = as.character(clusters(patentMix))) ordering <- order(patent$lgRD) apply(fitted(patentMix), 2, function(y) lines(sort(patent$lgRD), y[ordering]))
The plot
method for flexmix-class
objects gives a
rootogram or histogram of the posterior probabilities.
## S4 method for signature 'flexmix,missing' plot(x, y, mark = NULL, markcol = NULL, col = NULL, eps = 1e-4, root = TRUE, ylim = TRUE, main = NULL, xlab = "", ylab = "", as.table = TRUE, endpoints = c(-0.04, 1.04), ...)
## S4 method for signature 'flexmix,missing' plot(x, y, mark = NULL, markcol = NULL, col = NULL, eps = 1e-4, root = TRUE, ylim = TRUE, main = NULL, xlab = "", ylab = "", as.table = TRUE, endpoints = c(-0.04, 1.04), ...)
x |
An object of class |
y |
Not used. |
mark |
Integer: mark posteriors of this component. |
markcol |
Color used for marking components. |
col |
Color used for the bars. |
eps |
Posteriors smaller than |
root |
If |
ylim |
A logical value or a numeric vector of length 2. If
|
main |
Main title of the plot. |
xlab |
Label of x-axis. |
ylab |
Label of y-axis. |
as.table |
Logical that controls the order in which panels
should be plotted: if |
endpoints |
Vector of length 2 indicating the range of x-values that is
to be covered by the histogram. This applies only when
|
... |
Further graphical parameters for the lattice function histogram. |
For each mixture component a rootogram or histogram of the posterior probabilities of all observations is drawn. Rootograms are very similar to histograms, the only difference is that the height of the bars correspond to square roots of counts rather than the counts themselves, hence low counts are more visible and peaks less emphasized. Please note that the y-axis denotes the number of observations in each bar in any case.
Usually in each component a lot of observations have posteriors
close to zero, resulting in a high count for the corresponding
bin in the rootogram which obscures the information in the other
bins. To avoid this problem, all probabilities with a posterior below
eps
are ignored.
A peak at probability one indicates that a mixture component is well seperated from the other components, while no peak at one and/or significant mass in the middle of the unit interval indicates overlap with other components.
Friedrich Leisch and Bettina Gruen
Friedrich Leisch. FlexMix: A general framework for finite mixture models and latent class regression in R. Journal of Statistical Software, 11(8), 2004. doi:10.18637/jss.v011.i08
Jeremy Tantrum, Alejandro Murua and Werner Stuetzle. Assessment and pruning of hierarchical model based clustering. Proceedings of the 9th ACM SIGKDD international conference on Knowledge Discovery and Data Mining, 197–205. ACM Press, New York, NY, USA, 2003.
Friedrich Leisch. Exploring the structure of mixture model components. In Jaromir Antoch, editor, Compstat 2004–Proceedings in Computational Statistics, 1405–1412. Physika Verlag, Heidelberg, Germany, 2004. ISBN 3-7908-1554-3.
Plot 50% and 95% confidence ellipses for mixtures of Gaussians fitted using
FLXMCmvnorm
.
plotEll(object, data, which = 1:2, model = 1, project = NULL, points = TRUE, eqscale = TRUE, col = NULL, number = TRUE, cex = 1.5, numcol = "black", pch = NULL, ...)
plotEll(object, data, which = 1:2, model = 1, project = NULL, points = TRUE, eqscale = TRUE, col = NULL, number = TRUE, cex = 1.5, numcol = "black", pch = NULL, ...)
object |
An object of class |
data |
The response variable in a data frame or as a matrix. |
which |
Index numbers of dimensions of (projected) input space to plot. |
model |
The model (for a multivariate response) that shall be plotted. |
project |
Projection object, currently only the result of
|
points |
Logical, shall data points be plotted? |
eqscale |
Logical, plot using |
number |
Logical, plot number labels at cluster centers? |
cex , numcol
|
Size and color of number labels. |
pch , col , ...
|
Graphical parameters. |
Friedrich Leisch and Bettina Gruen
Determine posterior probabilities or cluster memberships for a fitted
flexmix
or unfitted FLXdist
model.
## S4 method for signature 'flexmix,missing' posterior(object, newdata, unscaled = FALSE, ...) ## S4 method for signature 'FLXdist,listOrdata.frame' posterior(object, newdata, unscaled = FALSE, ...) ## S4 method for signature 'flexmix,missing' clusters(object, newdata, ...) ## S4 method for signature 'FLXdist,ANY' clusters(object, newdata, ...)
## S4 method for signature 'flexmix,missing' posterior(object, newdata, unscaled = FALSE, ...) ## S4 method for signature 'FLXdist,listOrdata.frame' posterior(object, newdata, unscaled = FALSE, ...) ## S4 method for signature 'flexmix,missing' clusters(object, newdata, ...) ## S4 method for signature 'FLXdist,ANY' clusters(object, newdata, ...)
object |
An object of class "flexmix" or "FLXdist". |
newdata |
Data frame or list containing new data. If missing the posteriors of the original observations are returned. |
unscaled |
Logical, if |
... |
Currently not used. |
Friedrich Leisch and Bettina Gruen
Refits an estimated flexmix model to obtain additional information like coefficient significance p-values for GLM regression.
## S4 method for signature 'flexmix' refit(object, newdata, method = c("optim", "mstep"), ...) ## S4 method for signature 'FLXRoptim' summary(object, model = 1, which = c("model", "concomitant"), ...) ## S4 method for signature 'FLXRmstep' summary(object, model = 1, which = c("model", "concomitant"), ...) ## S4 method for signature 'FLXRoptim,missing' plot(x, y, model = 1, which = c("model", "concomitant"), bycluster = TRUE, alpha = 0.05, components, labels = NULL, significance = FALSE, xlab = NULL, ylab = NULL, ci = TRUE, scales = list(), as.table = TRUE, horizontal = TRUE, ...)
## S4 method for signature 'flexmix' refit(object, newdata, method = c("optim", "mstep"), ...) ## S4 method for signature 'FLXRoptim' summary(object, model = 1, which = c("model", "concomitant"), ...) ## S4 method for signature 'FLXRmstep' summary(object, model = 1, which = c("model", "concomitant"), ...) ## S4 method for signature 'FLXRoptim,missing' plot(x, y, model = 1, which = c("model", "concomitant"), bycluster = TRUE, alpha = 0.05, components, labels = NULL, significance = FALSE, xlab = NULL, ylab = NULL, ci = TRUE, scales = list(), as.table = TRUE, horizontal = TRUE, ...)
object |
An object of class |
newdata |
Optional new data. |
method |
Specifies if the variance covariance matrix is
determined using |
model |
The model (for a multivariate response) that shall be used. |
which |
Specifies if a component specific model or the concomitant variable model is used. |
x |
An object of class |
y |
Missing object. |
bycluster |
A logical if the parameters should be group by cluster or by variable. |
alpha |
Numeric indicating the significance level. |
components |
Numeric vector specifying which components are plotted. The default is to plot all components. |
labels |
Character vector specifying the variable names used. |
significance |
A logical indicating if non-significant coefficients are shaded in a lighter grey. |
xlab |
String for the x-axis label. |
ylab |
String for the y-axis label. |
ci |
A logical indicating if significant and insignificant parameter estimates are shaded differently. |
scales |
See argument of the same name for
function |
as.table |
See arguments of the same name for
function |
horizontal |
See arguments of the same name for
function |
... |
Currently not used |
The refit
method for FLXMRglm
models in
combination with the summary
method can be
used to obtain the usual tests for significance of coefficients. Note
that the tests are valid only if flexmix
returned the maximum
likelihood estimator of the parameters. If refit
is used with
method = "mstep"
for these component specific models the
returned object contains a glm
object for each component where
the elements model
which is the model frame and data
which contains the original dataset are missing.
An object inheriting form class FLXR
is returned. For the
method using optim
the object has class FLXRoptim
and
for the M-step method it has class FLXRmstep
. Both classes give
similar results for their summary
methods.
Objects of class FLXRoptim
have their own plot
method.
Lapply
can be used to further analyse the refitted component
specific models of objects of class FLXRmstep
.
For method = "mstep"
the standard deviations are determined
separately for each of the components using the a-posteriori
probabilities as weights without accounting for the fact that the
components have been simultaneously estimated. The derived standard
deviations are hence approximative and should only be used in an
exploratory way, as they are underestimating the uncertainty given
that the missing information of the component memberships are replaced
by the expected values.
The newdata
argument can only be specified when using
method = "mstep"
for refitting FLXMRglm
components. A
variant of glm
for weighted ML estimation is used for fitting
the components and full glm
objects are returned. Please note
that in this case the data and the model frame are stored for each
component which can significantly increase the object size.
Friedrich Leisch and Bettina Gruen
Friedrich Leisch. FlexMix: A general framework for finite mixture models and latent class regression in R. Journal of Statistical Software, 11(8), 2004. doi:10.18637/jss.v011.i08
data("NPreg", package = "flexmix") ex1 <- flexmix(yn ~ x + I(x^2), data = NPreg, k = 2) ex1r <- refit(ex1) ## in one component all coefficients should be highly significant, ## in the other component only the linear term summary(ex1r)
data("NPreg", package = "flexmix") ex1 <- flexmix(yn ~ x + I(x^2), data = NPreg, k = 2) ex1r <- refit(ex1) ## in one component all coefficients should be highly significant, ## in the other component only the linear term summary(ex1r)
The components are sorted by the value of one of the parameters or according to an integer vector containing the permutation of the numbers from 1 to the number of components.
relabel(object, by, ...) ## S4 method for signature 'FLXdist,character' relabel(object, by, which = NULL, ...)
relabel(object, by, ...) ## S4 method for signature 'FLXdist,character' relabel(object, by, which = NULL, ...)
object |
An object of class |
by |
If a character vector, it needs to be one of |
which |
Name (or unique substring) of a parameter if |
... |
Currently not used. |
Friedrich Leisch and Bettina Gruen
set.seed(123) beta <- matrix(1:16, ncol = 4) beta df1 <- ExLinear(beta, n = 100, sd = .5) f1 <- flexmix(y~., data = df1, k = 4) ## There was label switching, parameters are not in the same order ## as in beta: round(parameters(f1)) betas <- rbind(beta, .5) betas ## This makes no sense: summary(abs(as.vector(betas-parameters(f1)))) ## We relabel the components by sorting the coefficients of x1: r1 <- relabel(f1, by = "model", which = "x1") round(parameters(r1)) ## Now we can easily compare the fit with the true parameters: summary(abs(as.vector(betas-parameters(r1))))
set.seed(123) beta <- matrix(1:16, ncol = 4) beta df1 <- ExLinear(beta, n = 100, sd = .5) f1 <- flexmix(y~., data = df1, k = 4) ## There was label switching, parameters are not in the same order ## as in beta: round(parameters(f1)) betas <- rbind(beta, .5) betas ## This makes no sense: summary(abs(as.vector(betas-parameters(f1)))) ## We relabel the components by sorting the coefficients of x1: r1 <- relabel(f1, by = "model", which = "x1") round(parameters(r1)) ## Now we can easily compare the fit with the true parameters: summary(abs(as.vector(betas-parameters(r1))))
Given a finite mixture model generate random numbers from it.
rflexmix(object, newdata, ...)
rflexmix(object, newdata, ...)
object |
A fitted finite mixture model of class |
newdata |
Optionally, a data frame in which to look for variables with which to predict or an integer specifying the number of random draws for model-based clustering. If omitted, the data to which the model was fitted is used. |
... |
Further arguments to be passed to or from methods. |
rflexmix
provides the creation of the model matrix for new data
and the sampling of the cluster memberships. The sampling of the
component distributions given the classification is done by calling
rFLXM
. This step has to be provided for the different model
classes.
A list with components
y |
Random sample |
group |
Grouping factor |
class |
Class membership |
Bettina Gruen
example(flexmix) sample <- rflexmix(ex1)
example(flexmix) sample <- rflexmix(ex1)
Data on Ames Salmonella reverse mutagenicity assay.
data("salmonellaTA98")
data("salmonellaTA98")
This data frame contains the following columns:
Dose levels of quinoline.
Numbers of revertant colonies of TA98 Salmonella observed on each of three replicate plates tested at each of six dose levels of quinoline diameter.
This data set is taken from package dispmod provided by Luca Scrucca.
Margolin, B.J., Kaplan, N. and Zeiger, E. Statistical analysis of the Ames Salmonella/microsome test, Proc. Natl. Acad. Sci. USA, 76, 3779–3783, 1981.
Breslow, N.E. Extra-Poisson variation in log-linear models, Applied Statistics, 33, 38–44, 1984.
Wang, P., Puterman, M.L., Cockburn, I.M., and Le, N.D. Mixed Poisson regression models with covariate dependent rates, Biometrics, 52, 381–400, 1996.
data("salmonellaTA98", package = "flexmix") salmonMix <- initFlexmix(y ~ 1, data = salmonellaTA98, model = FLXMRglmfix(family = "poisson", fixed = ~ x + log(x + 10)), k = 2, nrep = 5) salmonMix.pr <- predict(salmonMix, newdata = salmonellaTA98) plot(y ~ x, data = salmonellaTA98, pch = as.character(clusters(salmonMix)), ylim = range(c(salmonellaTA98$y, unlist(salmonMix.pr)))) for (i in 1:2) lines(salmonellaTA98$x, salmonMix.pr[[i]], lty = i)
data("salmonellaTA98", package = "flexmix") salmonMix <- initFlexmix(y ~ 1, data = salmonellaTA98, model = FLXMRglmfix(family = "poisson", fixed = ~ x + log(x + 10)), k = 2, nrep = 5) salmonMix.pr <- predict(salmonMix, newdata = salmonellaTA98) plot(y ~ x, data = salmonellaTA98, pch = as.character(clusters(salmonMix)), ylim = range(c(salmonellaTA98$y, unlist(salmonMix.pr)))) for (i in 1:2) lines(salmonellaTA98$x, salmonMix.pr[[i]], lty = i)
Data from a clinical trial where the effect of intravenous gamma-globulin on suppression of epileptic seizures is studied. Daily observations for a period of 140 days on one patient are given, where the first 27 days are a baseline period without treatment, the remaining 113 days are the treatment period.
data("seizure")
data("seizure")
A data frame with 140 observations on the following 4 variables.
A numeric vector, daily counts of epileptic seizures.
A numeric vector, hours of daily parental observation.
A factor with levels No
and Yes
.
A numeric vector.
P. Wang, M. Puterman, I. Cockburn, and N. Le. Mixed poisson regression models with covariate dependent rates. Biometrics, 52, 381–400, 1996.
B. Gruen and F. Leisch. Bootstrapping finite mixture models. In J. Antoch, editor, Compstat 2004–Proceedings in Computational Statistics, 1115–1122. Physika Verlag, Heidelberg, Germany, 2004. ISBN 3-7908-1554-3.
data("seizure", package = "flexmix") plot(Seizures/Hours ~ Day, col = as.integer(Treatment), pch = as.integer(Treatment), data = seizure) abline(v = 27.5, lty = 2, col = "grey") legend(140, 9, c("Baseline", "Treatment"), pch = 1:2, col = 1:2, xjust = 1, yjust = 1) set.seed(123) ## The model presented in the Wang et al paper: two components for ## "good" and "bad" days, respectively, each a Poisson GLM with hours of ## parental observation as offset seizMix <- flexmix(Seizures ~ Treatment * log(Day), data = seizure, k = 2, model = FLXMRglm(family = "poisson", offset = log(seizure$Hours))) summary(seizMix) summary(refit(seizMix)) matplot(seizure$Day, fitted(seizMix)/seizure$Hours, type = "l", add = TRUE, col = 3:4)
data("seizure", package = "flexmix") plot(Seizures/Hours ~ Day, col = as.integer(Treatment), pch = as.integer(Treatment), data = seizure) abline(v = 27.5, lty = 2, col = "grey") legend(140, 9, c("Baseline", "Treatment"), pch = 1:2, col = 1:2, xjust = 1, yjust = 1) set.seed(123) ## The model presented in the Wang et al paper: two components for ## "good" and "bad" days, respectively, each a Poisson GLM with hours of ## parental observation as offset seizMix <- flexmix(Seizures ~ Treatment * log(Day), data = seizure, k = 2, model = FLXMRglm(family = "poisson", offset = log(seizure$Hours))) summary(seizMix) summary(refit(seizMix)) matplot(seizure$Day, fitted(seizMix)/seizure$Hours, type = "l", add = TRUE, col = 3:4)
Runs flexmix repeatedly for different numbers of components and returns the maximum likelihood solution for each.
initFlexmix(..., k, init = list(), control = list(), nrep = 3L, verbose = TRUE, drop = TRUE, unique = FALSE) initMethod(name = c("tol.em", "cem.em", "sem.em"), step1 = list(tolerance = 10^-2), step2 = list(), control = list(), nrep = 3L) stepFlexmix(..., k = NULL, nrep = 3, verbose = TRUE, drop = TRUE, unique = FALSE) ## S4 method for signature 'stepFlexmix,missing' plot(x, y, what = c("AIC", "BIC", "ICL"), xlab = NULL, ylab = NULL, legend = "topright", ...) ## S4 method for signature 'stepFlexmix' getModel(object, which = "BIC") ## S4 method for signature 'stepFlexmix' unique(x, incomparables = FALSE, ...)
initFlexmix(..., k, init = list(), control = list(), nrep = 3L, verbose = TRUE, drop = TRUE, unique = FALSE) initMethod(name = c("tol.em", "cem.em", "sem.em"), step1 = list(tolerance = 10^-2), step2 = list(), control = list(), nrep = 3L) stepFlexmix(..., k = NULL, nrep = 3, verbose = TRUE, drop = TRUE, unique = FALSE) ## S4 method for signature 'stepFlexmix,missing' plot(x, y, what = c("AIC", "BIC", "ICL"), xlab = NULL, ylab = NULL, legend = "topright", ...) ## S4 method for signature 'stepFlexmix' getModel(object, which = "BIC") ## S4 method for signature 'stepFlexmix' unique(x, incomparables = FALSE, ...)
... |
|
k |
A vector of integers passed in turn to the |
init |
An object of class |
name |
A character string indication which initialization
strategy should be employed: short runs of EM followed by a long
( |
step1 |
A named list which combined with the |
step2 |
A named list which combined with the |
control |
A named list which combined with the |
nrep |
For each value of |
verbose |
If |
drop |
If |
unique |
If |
x , object
|
An object of class |
y |
Not used. |
what |
Character vector naming information criteria to
plot. Functions of the same name must exist, which take a
|
xlab , ylab
|
Graphical parameters. |
legend |
If not |
which |
Number of model to get. If character, interpreted as number of components or name of an information criterion. |
incomparables |
A vector of values that cannot be
compared. Currently, |
An object of class "stepFlexmix"
containing the best models
with respect to the log likelihood for the different number of
components in a slot if length(k)>1
, else directly an object of
class "flexmix"
.
If unique = FALSE
, then the resulting object contains one
model per element of k
(which is the number of clusters the EM
algorithm started with). If unique = TRUE
, then the result
is resorted according to the number of clusters contained in the
fitted models (which may be less than the number with which the EM
algorithm started), and only the maximum likelihood solution for each
number of fitted clusters is kept. This operation can also be done
manually by calling unique()
on objects of class
"stepFlexmix"
.
Friedrich Leisch and Bettina Gruen
Friedrich Leisch. FlexMix: A general framework for finite mixture models and latent class regression in R. Journal of Statistical Software, 11(8), 2004. doi:10.18637/jss.v011.i08
Christophe Biernacki, Gilles Celeux and Gerard Govaert. Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Computational Statistics & Data Analysis, 41(3–4), 561–575, 2003.
Theresa Scharl, Bettina Gruen and Friedrch Leisch. Mixtures of regression models for time-course gene expression data: Evaluation of initialization and random effects. Bioinformatics, 26(3), 370–377, 2010.
data("Nclus", package = "flexmix") ## try 2 times for k = 4 set.seed(511) ex1 <- initFlexmix(Nclus~1, k = 4, model = FLXMCmvnorm(diagonal = FALSE), nrep = 2) ex1 ## now 2 times each for k = 2:5, specify control parameter ex2 <- initFlexmix(Nclus~1, k = 2:5, model = FLXMCmvnorm(diagonal = FALSE), control = list(minprior = 0), nrep = 2) ex2 plot(ex2) ## get BIC values BIC(ex2) ## get smallest model getModel(ex2, which = 1) ## get model with 3 components getModel(ex2, which = "3") ## get model with smallest ICL (here same as for AIC and BIC: true k = 4) getModel(ex2, which = "ICL") ## now 1 time each for k = 2:5, with larger minimum prior ex3 <- initFlexmix(Nclus~1, k = 2:5, model = FLXMCmvnorm(diagonal = FALSE), control = list(minprior = 0.1), nrep = 1) ex3 ## keep only maximum likelihood solution for each unique number of ## fitted clusters: unique(ex3)
data("Nclus", package = "flexmix") ## try 2 times for k = 4 set.seed(511) ex1 <- initFlexmix(Nclus~1, k = 4, model = FLXMCmvnorm(diagonal = FALSE), nrep = 2) ex1 ## now 2 times each for k = 2:5, specify control parameter ex2 <- initFlexmix(Nclus~1, k = 2:5, model = FLXMCmvnorm(diagonal = FALSE), control = list(minprior = 0), nrep = 2) ex2 plot(ex2) ## get BIC values BIC(ex2) ## get smallest model getModel(ex2, which = 1) ## get model with 3 components getModel(ex2, which = "3") ## get model with smallest ICL (here same as for AIC and BIC: true k = 4) getModel(ex2, which = "ICL") ## now 1 time each for k = 2:5, with larger minimum prior ex3 <- initFlexmix(Nclus~1, k = 2:5, model = FLXMCmvnorm(diagonal = FALSE), control = list(minprior = 0.1), nrep = 1) ex3 ## keep only maximum likelihood solution for each unique number of ## fitted clusters: unique(ex3)
The data investigates whether the adult Tribolium species Castaneum has developed an evolutionary advantage to recognize and avoid eggs of their own species while foraging.
data("tribolium")
data("tribolium")
A data frame with 27 observations on the following 4 variables.
Remaining
A numeric vector.
Total
A numeric vector.
Replicate
A factor with levels 1
, 2
, 3
.
Species
A factor with levels Castaneum
Confusum
Madens
.
Beetles of the genus Tribolium are cannibalistic in the sense that adults eat the eggs of their own species as well as those of closely related species. The experiment isolated a number of adult beetles of the same species and presented them with a vial of 150 eggs (50 of each type), the eggs being thoroughly mixed to ensure uniformity throughout the vial.
The data gives the consumption data for adult Castaneum species. It reports the number of Castaneum, Confusum and Madens eggs, respectively, that remain uneaten after two day exposure to the adult beetles. Replicates 1, 2, and 3 correspond to different occasions on which the experiment was conducted.
P. Wang and M.L. Puterman. Mixed Logistic Regression Models. Journal of Agricultural, Biological, and Environmental Statistics, 3 (2), 175–200, 1998.
data("tribolium", package = "flexmix") tribMix <- initFlexmix(cbind(Remaining, Total - Remaining) ~ Species, k = 2, nrep = 5, data = tribolium, model = FLXMRglm(family = "binomial"))
data("tribolium", package = "flexmix") tribMix <- initFlexmix(cbind(Remaining, Total - Remaining) ~ Species, k = 2, nrep = 5, data = tribolium, model = FLXMRglm(family = "binomial"))
Trypanosome data from a dosage-response analysis to assess the proportion of organisms belonging to different populations. It is assumed that organisms belonging to different populations are indistinguishable other than in terms of their reaction to the stimulus.
data("trypanosome")
data("trypanosome")
A data frame with 426 observations on the following 2 variables.
Dead
A logical vector.
Dose
A numeric vector.
The experimental technique involved inspection under the microscope of a representative aliquot of a suspension, all organisms appearing within two fields of view being classified either alive or dead. Hence the total numbers of organisms present at each dose and the number showing the quantal response were both random variables.
R. Ashford and P.J. Walker. Quantal Response Analysis for a Mixture of Populations. Biometrics, 28, 981–988, 1972.
D.A. Follmann and D. Lambert. Generalizing Logistic Regression by Nonparametric Mixing. Journal of the American Statistical Association, 84(405), 195–300, 1989.
data("trypanosome", package = "flexmix") trypMix <- initFlexmix(cbind(Dead, 1-Dead) ~ 1, k = 2, nrep = 5, data = trypanosome, model = FLXMRglmfix(family = "binomial", fixed = ~log(Dose)))
data("trypanosome", package = "flexmix") trypMix <- initFlexmix(cbind(Dead, 1-Dead) ~ 1, k = 2, nrep = 5, data = trypanosome, model = FLXMRglmfix(family = "binomial", fixed = ~log(Dose)))
The data set is from Simmons Study of Media and Markets and contains the incidence matrix for scotch brands used in last year for those households who report consuming scotch.
data("whiskey")
data("whiskey")
A data frame whiskey
with 484 observations on the following 2
variables.
Freq
a numeric vector
Incidence
a matrix with 21 columns
Additional information on the brands is contained in the data frame
whiskey_brands
which is simultaneously loaded. This data frame
contains 21 observations on the following 3 variables.
Brand
a character vector
Type
a factor with levels Blend
Single Malt
Bottled
a factor with levels Domestic
Foreign
The dataset is taken from the bayesm package.
Peter Rossi and Rob McCulloch. bayesm: Bayesian Inference for Marketing/Micro-econometrics. R package version 2.0-8, 2006. http://gsbwww.uchicago.edu/fac/peter.rossi/research/bsm.html
Edwards, Y. and G. Allenby. Multivariate Analysis of Multiple Response Data, Journal of Marketing Research, 40, 321–334, 2003.