Title: | Generic EM Algorithm |
---|---|
Description: | Implements a generic function for running the Expectation-Maximization (EM) algorithm within a maximum likelihood framework, based on Dempster, Laird, and Rubin (1977) <doi:10.1111/j.2517-6161.1977.tb01600.x>. It can be applied after fitting a model with R's existing functions and packages. |
Authors: | Dongjie Wu [aut, cre, cph] |
Maintainer: | Dongjie Wu <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.0.0 |
Built: | 2024-12-12 07:15:07 UTC |
Source: | CRAN |
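Given the posterior probabilities, generate a matrix that assigns each individual to a class. Each individual is assigned to the class with the largest posterior probability.
cstep(postpr)
postpr |
the posterior probability matrix ('matrix()'), with one row per observation and one column per latent class. |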
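A minimal base-R sketch of the hard assignment described above, assuming 'postpr' is a numeric matrix with one row per observation. The helper name 'cstep_sketch' is hypothetical and this is not the package's code; ties are broken in favour of the first class, which the package may handle differently.

```r
# Hypothetical C-step: turn each posterior row into a 0/1 indicator row
# with a single 1 in the column holding the largest probability.
cstep_sketch <- function(postpr) {
  postpr <- as.matrix(postpr)
  # max.col() gives the column index of each row maximum;
  # ties.method = "first" makes the choice deterministic
  winner <- max.col(postpr, ties.method = "first")
  out <- matrix(0, nrow = nrow(postpr), ncol = ncol(postpr))
  out[cbind(seq_len(nrow(postpr)), winner)] <- 1
  out
}

post <- matrix(c(0.9, 0.1,
                 0.3, 0.7,
                 0.5, 0.5), ncol = 2, byrow = TRUE)
cstep_sketch(post)
```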
This is a generic EM algorithm that works on specific models/objects. Currently, it supports 'lm', 'glm', 'gnm' in package gnm, 'clogit' in package survival and 'multinom' in package nnet. Use '?em.default' to see the documentation for the default 'em' method.
em(object, ...)
object |
the model used, e.g. 'lm', 'glm', 'gnm', 'clogit', 'multinom' |
... |
arguments used in the 'model'. |
An object of class 'em' is a list containing at least the following components:
models
a list of models/objects whose class is determined by the model fitted in the previous step.
pi
the prior probabilities.
latent
number of the latent classes.
algorithm
the algorithm used (one of 'em', 'sem' or 'cem').
obs
the number of observations.
post_pr
the posterior probabilities.
concomitant
a list of the concomitant model. It is empty if no concomitant model is used.
init.method
the initialization method used.
call
the matched call.
terms
the 'terms' object used.
Dongjie Wu
library(em)  # provides em() and the simreg data
fit.lm <- lm(yn ~ x, data = simreg)
results <- em(fit.lm, latent = 2, verbose = FALSE)
fmm_fit <- predict(results)
fmm_fit_post <- predict(results, prob = "posterior")
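To show what 'em()' iterates, here is a self-contained base-R sketch of EM for a two-class mixture of linear regressions with normal errors, on data simulated inside the block. It is an illustration under those assumptions, not the package's implementation: 'em()' starts from your fitted model, offers other initialization methods, the 'cem'/'sem' variants and concomitant models, and its stopping rule may differ from the log-likelihood change checked here against a tolerance in the spirit of 'abs_tol'.

```r
# Simulated data: two hidden regression lines (assumed setup, not simreg)
set.seed(1)
n <- 400
x <- runif(n)
z <- rbinom(n, 1, 0.4)                           # hidden class labels
y <- ifelse(z == 1, 1 + 3 * x, 4 - 2 * x) + rnorm(n, sd = 0.3)

K <- 2
post <- matrix(runif(n * K), nrow = n, ncol = K) # random initial posteriors
post <- post / rowSums(post)
ll <- numeric(0)

for (iter in 1:200) {
  # M-step: weighted least squares per class, ML sigma, and class priors
  fits  <- lapply(seq_len(K), function(k) lm(y ~ x, weights = post[, k]))
  sigma <- sapply(seq_len(K), function(k)
    sqrt(sum(post[, k] * residuals(fits[[k]])^2) / sum(post[, k])))
  pi_k  <- colMeans(post)

  # E-step: prior-weighted class densities, normalised row by row
  dens <- sapply(seq_len(K), function(k)
    pi_k[k] * dnorm(y, mean = fitted(fits[[k]]), sd = sigma[k]))
  ll   <- c(ll, sum(log(rowSums(dens))))         # observed-data log-likelihood
  post <- dens / rowSums(dens)

  # stop when the log-likelihood change falls below a tolerance
  if (iter > 1 && abs(ll[iter] - ll[iter - 1]) < 1e-4) break
}
round(pi_k, 2)
```

Because every M-step here is an exact weighted maximum likelihood update, the recorded log-likelihood should never decrease, which is a useful sanity check for any EM implementation.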
The em function for 'survival::clogit'.
## S3 method for class 'clogit'
em(
  object,
  latent = 2,
  verbose = FALSE,
  init.method = c("random", "kmeans", "hc"),
  init.prob = NULL,
  algo = c("em", "cem", "sem"),
  cluster.by = NULL,
  max_iter = 500,
  abs_tol = 1e-04,
  concomitant = list(...),
  use.optim = FALSE,
  optim.start = c("random", "sample5"),
  ...
)
object |
a fitted 'clogit' model from package survival. |
latent |
the number of latent classes. |
verbose |
'TRUE' to print the progress of convergence. |
init.method |
the initialization method. The default is 'random'; 'kmeans' uses K-means clustering; 'hc' uses model-based agglomerative hierarchical clustering. |
init.prob |
the starting prior probabilities used in the classification-based methods. |
algo |
the algorithm to use: 'em', the standard EM algorithm (the default); 'cem', classification EM; or 'sem', stochastic EM. |
cluster.by |
a variable defining the level of clustering. |
max_iter |
the maximum number of iterations of the EM algorithm. |
abs_tol |
the absolute convergence tolerance. |
concomitant |
the formula defining the concomitant part of the model. The default is NULL. |
use.optim |
if 'TRUE', maximize the complete-data log-likelihood (MLE) using 'optim' and 'Rcpp' code. The default is 'FALSE'. |
optim.start |
the method used to generate the starting values for the MLE. |
... |
arguments used in the 'model'. |
An object of class 'em' is a list containing at least the following components:
models
a list of models/objects whose class is determined by the model fitted in the previous step.
pi
the prior probabilities.
latent
number of the latent classes.
algorithm
the algorithm used (one of 'em', 'sem' or 'cem').
obs
the number of observations.
post_pr
the posterior probabilities.
concomitant
a list of the concomitant model. It is empty if no concomitant model is used.
init.method
the initialization method used.
call
the matched call.
terms
the 'terms' object used.
The default em function
## Default S3 method:
em(
  object,
  latent = 2,
  verbose = FALSE,
  init.method = c("random", "kmeans", "hc"),
  init.prob = NULL,
  algo = c("em", "cem", "sem"),
  cluster.by = NULL,
  max_iter = 500,
  abs_tol = 1e-04,
  concomitant = list(...),
  use.optim = FALSE,
  optim.start = c("random", "sample5"),
  ...
)
object |
the model used, e.g. 'lm', 'glm', 'gnm'. |
latent |
the number of latent classes. |
verbose |
'TRUE' to print the progress of convergence. |
init.method |
the initialization method. The default is 'random'; 'kmeans' uses K-means clustering; 'hc' uses model-based agglomerative hierarchical clustering. |
init.prob |
the starting prior probabilities used in the classification-based methods. |
algo |
the algorithm to use: 'em', the standard EM algorithm (the default); 'cem', classification EM; or 'sem', stochastic EM. |
cluster.by |
a variable defining the level of clustering. |
max_iter |
the maximum number of iterations of the EM algorithm. |
abs_tol |
the absolute convergence tolerance. |
concomitant |
the formula defining the concomitant part of the model. The default is NULL. |
use.optim |
if 'TRUE', maximize the complete-data log-likelihood (MLE) using 'optim' and 'Rcpp' code. The default is 'FALSE'. |
optim.start |
the method used to generate the starting values for the MLE. |
... |
arguments used in the 'model'. |
An object of class 'em' is a list containing at least the following components:
models
a list of models/objects whose class is determined by the model fitted in the previous step.
pi
the prior probabilities.
latent
number of the latent classes.
algorithm
the algorithm used (one of 'em', 'sem' or 'cem').
obs
the number of observations.
post_pr
the posterior probabilities.
concomitant
a list of the concomitant model. It is empty if no concomitant model is used.
init.method
the initialization method used.
call
the matched call.
terms
the 'terms' object used.
The em function for 'fitdist' objects from package fitdistrplus.
## S3 method for class 'fitdist'
em(
  object,
  latent = 2,
  verbose = FALSE,
  init.method = c("random", "kmeans", "hc"),
  init.prob = NULL,
  algo = c("em", "cem", "sem"),
  max_iter = 500,
  ...
)
object |
a fitted 'fitdist' object from package fitdistrplus. |
latent |
the number of latent classes. |
verbose |
'TRUE' to print the progress of convergence. |
init.method |
the initialization method. The default is 'random'; 'kmeans' uses K-means clustering; 'hc' uses model-based agglomerative hierarchical clustering. |
init.prob |
the starting prior probabilities used in the classification-based methods. |
algo |
the algorithm to use: 'em', the standard EM algorithm (the default); 'cem', classification EM; or 'sem', stochastic EM. |
max_iter |
the maximum number of iterations of the EM algorithm. |
... |
arguments used in the 'model'. |
An object of class 'em' is a list containing at least the following components:
models
a list of models/objects whose class is determined by the model fitted in the previous step.
pi
the prior probabilities.
latent
number of the latent classes.
algorithm
the algorithm used (one of 'em', 'sem' or 'cem').
obs
the number of observations.
post_pr
the posterior probabilities.
concomitant
a list of the concomitant model. It is empty if no concomitant model is used.
init.method
the initialization method used.
call
the matched call.
terms
the 'terms' object used.
The em function for 'glmerMod' objects from package lme4.
## S3 method for class 'glmerMod'
em(
  object,
  latent = 2,
  verbose = FALSE,
  init.method = c("random", "kmeans", "hc"),
  algo = c("em", "cem", "sem"),
  max_iter = 500,
  concomitant = list(...),
  ...
)
object |
a fitted 'glmerMod' model from package lme4. |
latent |
the number of latent classes. |
verbose |
'TRUE' to print the progress of convergence. |
init.method |
the initialization method. The default is 'random'; 'kmeans' uses K-means clustering; 'hc' uses model-based agglomerative hierarchical clustering. |
algo |
the algorithm to use: 'em', the standard EM algorithm (the default); 'cem', classification EM; or 'sem', stochastic EM. |
max_iter |
the maximum number of iterations of the EM algorithm. |
concomitant |
the formula to define the concomitant part of the model. The default is NULL. |
... |
arguments used in the 'model'. |
An object of class 'em' is a list containing at least the following components:
models
a list of models/objects whose class is determined by the model fitted in the previous step.
pi
the prior probabilities.
latent
number of the latent classes.
algorithm
the algorithm used (one of 'em', 'sem' or 'cem').
obs
the number of observations.
post_pr
the posterior probabilities.
concomitant
a list of the concomitant model. It is empty if no concomitant model is used.
init.method
the initialization method used.
call
the matched call.
terms
the 'terms' object used.
The em function for 'panelmodel' such as 'plm'.
## S3 method for class 'panelmodel'
em(
  object,
  latent = 2,
  verbose = FALSE,
  init.method = c("random", "kmeans"),
  algo = c("em", "cem", "sem"),
  max_iter = 500,
  concomitant = list(...),
  ...
)
object |
a fitted 'panelmodel' object, e.g. one fitted by 'plm'. |
latent |
the number of latent classes. |
verbose |
'TRUE' to print the progress of convergence. |
init.method |
the initialization method used in the model. The default method is 'random'. |
algo |
the algorithm to use: 'em', the standard EM algorithm (the default); 'cem', classification EM; or 'sem', stochastic EM. |
max_iter |
the maximum number of iterations of the EM algorithm. |
concomitant |
the formula to define the concomitant part of the model. The default is NULL. |
... |
arguments used in the 'model'. |
An object of class 'em' is a list containing at least the following components:
models
a list of models/objects whose class is determined by the model fitted in the previous step.
pi
the prior probabilities.
latent
number of the latent classes.
algorithm
the algorithm used (one of 'em', 'sem' or 'cem').
obs
the number of observations.
post_pr
the posterior probabilities.
concomitant
a list of the concomitant model. It is empty if no concomitant model is used.
init.method
the initialization method used.
call
the matched call.
terms
the 'terms' object used.
This function performs the E-step of the EM algorithm.
estep(models, pi_matrix)
models |
the models used in the EM algorithm. |
pi_matrix |
the matrix of prior probabilities (pi). |
the fitting result for the model.
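The E-step computes, for each observation i and class k, the posterior probability pi_ik * f_k(y_i) / sum_j pi_ij * f_j(y_i), where f_k is the density of class k. A base-R sketch, assuming the class densities have already been evaluated into an n-by-K matrix (for example by a 'fit.den' method) and that 'pi_matrix' has the same shape; 'estep_sketch' is a hypothetical name, and a production version would usually work on the log scale to avoid underflow.

```r
# Hypothetical E-step: combine class densities with prior probabilities
estep_sketch <- function(dens, pi_matrix) {
  joint <- dens * pi_matrix   # elementwise: pi_ik * f_k(y_i)
  joint / rowSums(joint)      # normalise each row so it sums to 1
}

dens <- matrix(c(0.2, 0.6,
                 0.5, 0.5), ncol = 2, byrow = TRUE)
pi_matrix <- matrix(0.5, nrow = 2, ncol = 2)   # equal priors for both rows
estep_sketch(dens, pi_matrix)
```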
This function computes the probability density of the given models.
fit.den(object, ...)
object |
the fitted model such as 'lm'. |
... |
other arguments. |
the density function.
Fit the density for a 'survival::clogit' model (an object of class 'coxph').
## S3 method for class 'coxph'
fit.den(object, ...)
object |
the fitted model. |
... |
other arguments. |
the density function.
Fit the density function for a model fitted by 'fitdistrplus::fitdist()'.
## S3 method for class 'fitdist'
fit.den(object, ...)
object |
the fitted model. |
... |
other arguments. |
the density function.
Fit the density function for a generalized linear regression model.
## S3 method for class 'glm'
fit.den(object, ...)
object |
the fitted model. |
... |
other arguments. |
the density function.
Fit the density function for a generalized linear mixed-effects model.
## S3 method for class 'glmerMod'
fit.den(object, ...)
object |
the fitted model. |
... |
other arguments. |
the density function.
Fit the density function for a generalized nonlinear model.
## S3 method for class 'gnm'
fit.den(object, ...)
object |
the fitted model. |
... |
other arguments. |
the density function.
Fit the density function for a linear regression model.
## S3 method for class 'lm'
fit.den(object, ...)
object |
the fitted model. |
... |
other arguments. |
the density function.
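For a linear model with normal errors, the class density is the normal density of the response around the fitted values. A sketch of that idea for an 'lm' fit, assuming the maximum likelihood estimate of sigma (the package's 'lm' method may use a different variance estimate); 'fit_den_lm_sketch' is hypothetical. With this choice the summed log-densities equal 'logLik()' for an unweighted fit, which makes a convenient check.

```r
# Hypothetical density for an lm fit: normal density of y around fitted values
fit_den_lm_sketch <- function(object) {
  y     <- model.response(model.frame(object))
  mu    <- fitted(object)
  sigma <- sqrt(mean(residuals(object)^2))   # ML estimate of sigma (assumption)
  dnorm(y, mean = mu, sd = sigma)
}

fit <- lm(dist ~ speed, data = cars)
d <- fit_den_lm_sketch(fit)
sum(log(d))   # should match logLik(fit)
```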
Fit the density function for a multinomial regression model.
## S3 method for class 'multinom'
fit.den(object, ...)
object |
the fitted model. |
... |
other arguments. |
the density function.
Fit the density function for a 'nnet' model.
## S3 method for class 'nnet'
fit.den(object, ...)
object |
the fitted model. |
... |
other arguments. |
the density function.
Fit the density function for a panel regression model.
## S3 method for class 'plm'
fit.den(object, ...)
object |
the fitted model. |
... |
other arguments. |
the density function.
Flatten a data.frame or matrix into a named vector, by column or by row. Each element is named by combining its row (or column) number with its column (or row) name, separated by '.'.
flatten(x, by = c("col", "row"))
x |
a data.frame or matrix. |
by |
whether to flatten by column ('col', the default) or by row ('row'). |
a flattened vector with names
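A base-R sketch of the flattening and naming scheme. The description does not pin down the exact name order, so this sketch assumes '<row number>.<column name>' when flattening by column and '<column number>.<row name>' when flattening by row; 'flatten_sketch' is hypothetical and may not reproduce the package's names exactly.

```r
# Hypothetical flatten: matrix/data.frame -> named vector
flatten_sketch <- function(x, by = c("col", "row")) {
  by <- match.arg(by)
  x  <- as.matrix(x)
  if (is.null(colnames(x))) colnames(x) <- seq_len(ncol(x))
  if (is.null(rownames(x))) rownames(x) <- seq_len(nrow(x))
  if (by == "col") {
    vals <- as.vector(x)                      # column-major order
    nms  <- paste(rep(seq_len(nrow(x)), times = ncol(x)),
                  rep(colnames(x), each = nrow(x)), sep = ".")
  } else {
    vals <- as.vector(t(x))                   # row-major order
    nms  <- paste(rep(seq_len(ncol(x)), times = nrow(x)),
                  rep(rownames(x), each = ncol(x)), sep = ".")
  }
  setNames(vals, nms)
}

m <- matrix(1:4, nrow = 2, dimnames = list(c("r1", "r2"), c("a", "b")))
flatten_sketch(m, "col")
flatten_sketch(m, "row")
```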
Given a matrix whose number of rows equals the number of observations and whose number of columns equals the number of latent classes, 'init.em' generates the initial posterior probabilities from that matrix using the method chosen by the user.
init.em(object, ...)
object |
A matrix. |
... |
other arguments. |
The posterior probability matrix
Initialization by model-based agglomerative hierarchical clustering.
## S3 method for class 'hc'
init.em(object, ...)
object |
A matrix. |
... |
other arguments. |
The posterior probability matrix
K-means initialization.
## S3 method for class 'kmeans'
init.em(object, ...)
object |
A matrix. |
... |
other arguments. |
The posterior probability matrix
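A sketch of how a K-means result can seed EM: each observation's cluster label becomes a one-hot row of the starting posterior matrix. Which variables the package clusters on is not stated here, so this sketch simply clusters a supplied numeric matrix (the standardised 'faithful' data); 'init_kmeans_sketch' is hypothetical.

```r
# Hypothetical K-means initialization: hard cluster labels -> one-hot posteriors
init_kmeans_sketch <- function(data, latent) {
  cl <- kmeans(data, centers = latent, nstart = 5)$cluster  # hard labels
  post <- matrix(0, nrow = length(cl), ncol = latent)
  post[cbind(seq_along(cl), cl)] <- 1                       # one-hot rows
  post
}

set.seed(1)
p0 <- init_kmeans_sketch(scale(faithful), latent = 2)
colSums(p0)   # class sizes found by K-means
```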
Random initialization
## S3 method for class 'random'
init.em(object, ...)
object |
A matrix. |
... |
other arguments. |
The posterior probability matrix
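A minimal sketch of random initialization, assuming the input matrix only supplies the dimensions (observations by latent classes) and that each row of uniform draws is normalised to sum to 1; 'init_random_sketch' is hypothetical and the package may draw its random values differently.

```r
# Hypothetical random initialization: uniform draws normalised by row
init_random_sketch <- function(object) {
  p <- matrix(runif(length(object)), nrow = nrow(object), ncol = ncol(object))
  p / rowSums(p)
}

set.seed(42)
p0 <- init_random_sketch(matrix(0, nrow = 5, ncol = 2))
rowSums(p0)
```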
Random initialization with weights
## S3 method for class 'random.weights'
init.em(object, ...)
object |
A matrix. |
... |
other arguments. |
The posterior probability matrix
Initialization based on sampling five times.
## S3 method for class 'sample5'
init.em(object, ...)
object |
A matrix. |
... |
other arguments. |
The posterior probability matrix
This function computes the log-likelihood of a fitted 'em' object.
## S3 method for class 'em'
logLik(object, ...)
object |
an object of class 'em'. |
... |
other arguments. |
the log-likelihood value
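For a finite mixture the log-likelihood is sum_i log(sum_k pi_ik * f_k(y_i)). A base-R sketch of that formula, assuming the class densities and prior probabilities are available as n-by-K matrices; 'loglik_mixture_sketch' is hypothetical and not the package's 'logLik' method.

```r
# Hypothetical mixture log-likelihood from densities and priors
loglik_mixture_sketch <- function(dens, pi_matrix) {
  sum(log(rowSums(dens * pi_matrix)))
}

dens <- matrix(c(0.2, 0.6,
                 0.5, 0.5), ncol = 2, byrow = TRUE)
pi_matrix <- matrix(0.5, nrow = 2, ncol = 2)
ll <- loglik_mixture_sketch(dens, pi_matrix)   # log(0.4) + log(0.5)
ll
```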
This function performs the M-step of the EM algorithm.
mstep(models, post_pr = NULL)
models |
the models used in the EM algorithm. |
post_pr |
the posterior probability matrix. |
the fitting result for the model.
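In the M-step each class model is refitted with the posterior probabilities of that class as observation weights, and the prior probabilities are updated to the column means of the posterior matrix. A base-R sketch for linear models only; 'mstep_lm_sketch' and the temporary weight column '.w' are hypothetical, and the package's 'mstep' works on its own list of model objects instead.

```r
# Hypothetical M-step for lm: weighted refit per class plus prior update
mstep_lm_sketch <- function(formula, data, post_pr) {
  models <- lapply(seq_len(ncol(post_pr)), function(k) {
    data$.w <- post_pr[, k]               # posterior weights of class k
    lm(formula, data = data, weights = .w)
  })
  list(models = models, pi = colMeans(post_pr))
}

# Equal posteriors: both refits should reproduce the unweighted fit
post <- matrix(0.5, nrow = nrow(cars), ncol = 2)
ms <- mstep_lm_sketch(dist ~ speed, cars, post)
ms$pi
```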
The M-step for the concomitant model. This section was inspired by flexmix.
mstep.concomitant(formula, data, postpr)
formula |
the formula of the concomitant model. |
data |
the data or model.frame related to the concomitant model. |
postpr |
the posterior probability matrix. |
the function returns a fitted nnet object.
Refit of the concomitant model. This section was inspired by flexmix.
mstep.concomitant.refit(formula, data, postpr)
formula |
the formula of the concomitant model. |
data |
the data or model.frame related to the concomitant model. |
postpr |
the posterior probability matrix. |
the function returns a fitted multinom object.
Multiple runs of the EM algorithm.
multi.em(object, ...)
object |
the model to use in em, e.g. 'lm', 'glm', 'gnm' |
... |
arguments used in em. |
the 'em' object with the maximum log-likelihood.
Default method for multi.em.
## Default S3 method:
multi.em(
  object,
  iter = 10,
  parallel = FALSE,
  num.cores = 2,
  random.init = TRUE,
  ...
)
object |
the model to use in em, e.g. 'lm', 'glm', 'gnm'. |
iter |
the number of times the EM algorithm is run. |
parallel |
whether to use parallel computing. |
num.cores |
the number of cores used for parallel computing. |
random.init |
whether to use a random initialization. |
... |
arguments used in em. |
the 'em' object with the maximum log-likelihood.
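'multi.em' keeps the run with the largest log-likelihood because EM converges to a local maximum that depends on the starting values. The sketch below shows that selection rule with a small univariate two-component Gaussian mixture EM in base R; both function names are hypothetical, the Gaussian mixture stands in for the regression models the package handles, and the 'parallel'/'num.cores' options are not reproduced.

```r
# Hypothetical single EM run for a univariate Gaussian mixture
em_gauss_sketch <- function(y, K = 2, max_iter = 200, tol = 1e-6) {
  post <- matrix(runif(length(y) * K), ncol = K)   # random start
  post <- post / rowSums(post)
  ll_old <- -Inf
  for (iter in seq_len(max_iter)) {
    pi_k <- colMeans(post)                          # M-step
    mu   <- colSums(post * y) / colSums(post)
    sd_k <- sqrt(colSums(post * outer(y, mu, "-")^2) / colSums(post))
    dens <- sapply(seq_len(K), function(k) pi_k[k] * dnorm(y, mu[k], sd_k[k]))
    ll   <- sum(log(rowSums(dens)))
    post <- dens / rowSums(dens)                    # E-step
    if (abs(ll - ll_old) < tol) break
    ll_old <- ll
  }
  list(pi = pi_k, mu = mu, sd = sd_k, loglik = ll)
}

# Hypothetical multi-start wrapper: keep the run with the largest log-likelihood
multi_em_sketch <- function(y, iter = 10, ...) {
  runs <- lapply(seq_len(iter), function(i) em_gauss_sketch(y, ...))
  lls  <- vapply(runs, `[[`, numeric(1), "loglik")
  runs[[which.max(lls)]]
}

set.seed(123)
y <- c(rnorm(150, mean = 0, sd = 1), rnorm(100, mean = 4, sd = 1))
best <- multi_em_sketch(y, iter = 5)
round(best$mu, 2)
```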
This is the plot method for 'em' objects. It can produce two types of graphs: 1. a graph of the predicted value distribution for each component ('by = "component"'); 2. a histogram of the posterior probability distributions ('by = "prob"').
## S3 method for class 'em'
plot(
  x,
  by = c("component", "prob"),
  prior = FALSE,
  cols = rep(1, length(x$models)),
  lwds = rep(3, length(x$models)),
  ltys = c(seq_len(length(x$models))),
  ranges = NULL,
  main = NULL,
  lgd = list(),
  lgd.loc = "topleft",
  hist.args = list(main = "Histograms of posterior probabilities",
                   xlab = "Posterior Probabilities"),
  ...
)
x |
the 'em' object to plot. |
by |
the type of graph to produce. The default is 'component'. |
prior |
whether to fit the model using the prior probabilities. |
cols |
line colors. |
lwds |
line widths. |
ltys |
line types. |
ranges |
the x-axis and y-axis limits of the plots, as a vector of four numeric values: the first two are the x-axis limits and the last two are the y-axis limits. |
main |
the main title. |
lgd |
a list of legend-related arguments. |
lgd.loc |
the location of the legend. The default is "topleft". |
hist.args |
a list of arguments for the histogram. |
... |
other arguments. |
'NULL'
Predict the fitted finite mixture models
## S3 method for class 'em'
predict(object, prob = c("prior", "posterior"), ...)
object |
an object of class 'em', as returned by 'em()'. |
prob |
the probabilities used to compute the fitted value. It can be either prior probability ('prior') or posterior probability ('posterior'). The default value is 'prior'. |
... |
other arguments. |
An object of class 'predict.em' is a list containing at least the following components:
components
a list of fitted values by component; each element is a matrix/vector of fitted values.
mean
a matrix of predicted values computed as the weighted sum of the component fitted values. The weights are either the prior or the posterior probabilities, depending on the parameter 'prob'.
prob
the value used in the parameter 'prob'.
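The 'mean' component is a weighted sum of the class-wise fitted values. A base-R sketch, assuming the fitted values have been collected in an n-by-K matrix; prior weights are one value per class, while posterior weights are one row per observation. 'predict_mixture_sketch' is hypothetical.

```r
# Hypothetical weighted prediction from component fitted values
predict_mixture_sketch <- function(components, weights) {
  # components: n x K matrix of fitted values, one column per class
  # weights:    length-K prior vector or n x K posterior matrix
  if (is.null(dim(weights)))
    weights <- matrix(weights, nrow = nrow(components),
                      ncol = ncol(components), byrow = TRUE)
  rowSums(components * weights)
}

comp <- cbind(c(1, 2), c(3, 4))
predict_mixture_sketch(comp, c(0.25, 0.75))             # 2.5 3.5
predict_mixture_sketch(comp, rbind(c(1, 0), c(0, 1)))   # 1 4
```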
Print the 'em' object
## S3 method for class 'em'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
x |
the 'em' object. |
digits |
the number of digits to print. The default is 'max(3L, getOption("digits") - 3L)'. |
... |
other arguments. |
Prints the 'em' object to the console.
Print the 'summary.em' object
## S3 method for class 'summary.em'
print(
  x,
  digits = max(3L, getOption("digits") - 3L),
  signif.stars = getOption("show.signif.stars"),
  ...
)
x |
the 'summary.em' object. |
digits |
the number of digits to print. The default is 'max(3L, getOption("digits") - 3L)'. |
signif.stars |
logical; if 'TRUE', P-values are additionally encoded visually as 'significance stars' to help scan long coefficient tables. It defaults to the 'show.signif.stars' option. |
... |
other arguments passed to 'printCoefmat'. |
Prints the 'summary.em' object to the console.
A simulated data set from a mixture of logistic regressions.
simbinom
A data frame with 10000 rows and 2 variables:
A dependent variable generated from a mixture of logistic regressions with x
An independent variable
<https://www.github.com/wudongjie/em>
A simulated data set from a mixture of conditional logistic regressions.
simclogit
A data frame with 10000 rows and 4 variables:
A dummy variable showing whether x is equal to level 2
A dummy variable showing whether x is equal to level 3
Whether the alternative choice 2 is chosen
Interaction between a2 and x2
Interaction between a2 and x3
Whether the alternative choice 3 is chosen
Interaction between a3 and x2
Interaction between a3 and x3
Whether the observation-alternative combination is chosen (Generated by a one-class regression).
Whether the observation-alternative combination is chosen (Generated by a two-class mixed regression).
Family ID
Individual ID
Other variables
<https://www.github.com/wudongjie/em>
A data set with simulated data from mixture regression models.
simreg
A data frame with 1000 rows and 5 variables:
A dependent variable generated from a mixture of Poisson regressions with x
A dependent variable generated from a mixture of linear regressions with x
A dependent variable generated from a mixture of linear regressions with x and a concomitant variable z
An independent variable
A concomitant variable
<https://www.github.com/wudongjie/em>
Given the posterior probability, generate a matrix to assign each individual to a class. The assignment is randomly sampled based on the posterior probability.
sstep(postpr)
postpr |
the posterior probability matrix ('matrix()'), with one row per observation and one column per latent class. |
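A base-R sketch of the stochastic assignment: each individual's class is drawn at random with probabilities given by its posterior row. 'sstep_sketch' is hypothetical and assumes each row of 'postpr' is a valid probability vector; the package may sample differently (for example, vectorised).

```r
# Hypothetical S-step: sample one class per row from its posterior probabilities
sstep_sketch <- function(postpr) {
  postpr <- as.matrix(postpr)
  out <- matrix(0, nrow = nrow(postpr), ncol = ncol(postpr))
  for (i in seq_len(nrow(postpr))) {
    k <- sample.int(ncol(postpr), size = 1, prob = postpr[i, ])
    out[i, k] <- 1
  }
  out
}

set.seed(7)
sstep_sketch(matrix(c(0.9, 0.1,
                      0.3, 0.7), ncol = 2, byrow = TRUE))
```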
Summarize finite mixture models fitted with the EM algorithm.
## S3 method for class 'em'
summary(object, ...)
object |
an object of class 'em', as returned by 'em()'. |
... |
other arguments. |
An object of class 'summary.em' is a list containing at least the following components:
call
the matched call.
coefficients
pi
the prior probabilities.
latent
number of the latent classes.
ll
log-likelihood value.
sum.models
summaries of the model for each class, generated by 'summary()'.
df
the degrees of freedom.
obs
number of observations.
AIC
the Akaike information criterion.
BIC
the Bayesian information criterion.
concomitant
a list of the concomitant model. It is empty if no concomitant model is used.
concomitant.summary
summaries of the concomitant model generated by 'summary()'.
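The 'AIC' and 'BIC' components presumably follow the usual definitions, AIC = -2 ll + 2 df and BIC = -2 ll + log(obs) df. A short base-R check of those formulas on a one-class 'lm', where 'stats::AIC()' and 'stats::BIC()' give the reference values; 'info_criteria' is a hypothetical helper.

```r
# Hypothetical helper: information criteria from log-likelihood, df and n
info_criteria <- function(ll, df, obs) {
  c(AIC = -2 * ll + 2 * df, BIC = -2 * ll + log(obs) * df)
}

# Sanity check against stats::AIC/BIC for an ordinary (one-class) lm
fit <- lm(dist ~ speed, data = cars)
ll  <- logLik(fit)
ic  <- info_criteria(as.numeric(ll), attr(ll, "df"), nobs(fit))
ic
```

For a mixture, a common convention counts the parameters of every class plus latent - 1 free prior probabilities (for example 2 * 3 + 1 = 7 for two classes of a simple linear regression with its own sigma); whether 'summary.em' uses exactly this count is an assumption.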
Transform a factor variable to a matrix of dummy variables
vdummy(x)
x |
a factor vector |
a matrix of dummy variables
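A base-R sketch of the factor-to-dummy transformation, assuming one 0/1 column per level (no reference level dropped) and no missing values; 'vdummy_sketch' is hypothetical and the package's column naming may differ.

```r
# Hypothetical dummy coding: one integer 0/1 column per factor level
vdummy_sketch <- function(x) {
  x <- as.factor(x)
  m <- matrix(0L, nrow = length(x), ncol = nlevels(x),
              dimnames = list(NULL, levels(x)))
  m[cbind(seq_along(x), as.integer(x))] <- 1L   # mark each row's level
  m
}

vdummy_sketch(c("a", "b", "a"))
```

Base R's 'model.matrix(~ x - 1)' gives a similar full-rank-per-level coding, with column names prefixed by the variable name.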