Title: | Functional Latent Data Models for Clustering Heterogeneous Curves ('FLaMingos') |
---|---|
Description: | Provides a variety of original and flexible user-friendly statistical latent variable models for the simultaneous clustering and segmentation of heterogeneous functional data (i.e., time series, or more generally longitudinal data), fitted by unsupervised algorithms, including EM algorithms. Functional Latent Data Models for Clustering heterogeneous curves ('FLaMingos') were originally introduced and written in 'Matlab' by Faicel Chamroukhi <https://github.com/fchamroukhi?utf8=?&tab=repositories&q=mix&type=public&language=matlab>. The main references are the following. Chamroukhi F. (2010) <https://chamroukhi.com/FChamroukhi-PhD.pdf>. Chamroukhi F., Same A., Govaert, G. and Aknin P. (2010) <doi:10.1016/j.neucom.2009.12.023>. Chamroukhi F., Same A., Aknin P. and Govaert G. (2011) <doi:10.1109/IJCNN.2011.6033590>. Same A., Chamroukhi F., Govaert G. and Aknin, P. (2011) <doi:10.1007/s11634-011-0096-5>. Chamroukhi F. and Glotin H. (2012) <doi:10.1109/IJCNN.2012.6252818>. Chamroukhi F., Glotin H. and Same A. (2013) <doi:10.1016/j.neucom.2012.10.030>. Chamroukhi F. (2015) <https://chamroukhi.com/FChamroukhi-HDR.pdf>. Chamroukhi F. and Nguyen H-D. (2019) <doi:10.1002/widm.1298>. |
Authors: | Faicel Chamroukhi [aut], Florian Lecocq [aut, trl, cre] (R port), Marius Bartcus [aut, trl] (R port) |
Maintainer: | Florian Lecocq <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.1.0 |
Built: | 2024-11-04 06:43:37 UTC |
Source: | CRAN |
flamingos is an open-source toolbox for the simultaneous clustering (or classification) and segmentation of heterogeneous functional data (i.e., time series, or more generally longitudinal data), based on original and flexible functional latent variable models fitted by unsupervised algorithms, including EM algorithms.

flamingos contains the following time series clustering and segmentation models:
mixRHLP;
mixHMM;
mixHMMR.
For the advantages and differences of each of them, the user is referred to the paper references listed below.

To learn more about flamingos, start with the vignettes:
browseVignettes(package = "flamingos")
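As a quick start, the models can be fitted directly on the bundled toydataset. The sketch below is illustrative only and mirrors the examples further down this page; it fits a mixHMM on the 30 toy curves:

library(flamingos)
data(toydataset)

# curves as rows: 30 series observed at the 350 points of toydataset$x
Y <- t(toydataset[, 2:ncol(toydataset)])

# cluster the curves into K = 3 groups, each with R = 3 regimes
mixhmm <- emMixHMM(Y = Y, K = 3, R = 3)
mixhmm$summary()
mixhmm$plot()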
Maintainer: Florian Lecocq <[email protected]> (R port) [translator]
Authors:
Faicel Chamroukhi <[email protected]> (ORCID: 0000-0002-5894-3103)
Marius Bartcus <[email protected]> (R port) [translator]
Chamroukhi, Faicel, and Hien D. Nguyen. 2019. Model-Based Clustering and Classification of Functional Data. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. https://chamroukhi.com/papers/MBCC-FDA.pdf.
Chamroukhi, F. 2016. Unsupervised Learning of Regression Mixture Models with Unknown Number of Components. Journal of Statistical Computation and Simulation 86 (November): 2308–34. https://chamroukhi.com/papers/Chamroukhi-JSCS-2015.pdf.
Chamroukhi, Faicel. 2016. Piecewise Regression Mixture for Simultaneous Functional Data Clustering and Optimal Segmentation. Journal of Classification 33 (3): 374–411. https://chamroukhi.com/papers/Chamroukhi-PWRM-JournalClassif-2016.pdf.
Chamroukhi, F. 2015. Statistical Learning of Latent Data Models for Complex Data Analysis. Habilitation Thesis (HDR), Universite de Toulon. https://chamroukhi.com/Dossier/FChamroukhi-Habilitation.pdf.
Chamroukhi, F., H. Glotin, and A. Same. 2013. Model-Based Functional Mixture Discriminant Analysis with Hidden Process Regression for Curve Classification. Neurocomputing 112: 153–63. https://chamroukhi.com/papers/chamroukhi_et_al_neucomp2013a.pdf.
Chamroukhi, F., and H. Glotin. 2012. Mixture Model-Based Functional Discriminant Analysis for Curve Classification. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), IEEE, 1–8. Brisbane, Australia. https://chamroukhi.com/papers/Chamroukhi-ijcnn-2012.pdf.
Same, A., F. Chamroukhi, Gerard Govaert, and P. Aknin. 2011. Model-Based Clustering and Segmentation of Time Series with Changes in Regime. Advances in Data Analysis and Classification 5 (4): 301–21. https://chamroukhi.com/papers/adac-2011.pdf.
Chamroukhi, F., A. Same, P. Aknin, and G. Govaert. 2011. Model-Based Clustering with Hidden Markov Model Regression for Time Series with Regime Changes. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), IEEE, 2814–21. https://chamroukhi.com/papers/Chamroukhi-ijcnn-2011.pdf.
Chamroukhi, F., A. Same, G. Govaert, and P. Aknin. 2010. A Hidden Process Regression Model for Functional Data Description. Application to Curve Discrimination. Neurocomputing 73 (7-9): 1210–21. https://chamroukhi.com/papers/chamroukhi_neucomp_2010.pdf.
Chamroukhi, F. 2010. Hidden Process Regression for Curve Modeling, Classification and Tracking. Ph.D. Thesis, Universite de Technologie de Compiegne. https://chamroukhi.com/papers/FChamroukhi-Thesis.pdf.
Useful links:
Report bugs at https://github.com/fchamroukhi/FLaMingos/issues
cemMixRHLP implements the maximum complete-data likelihood parameter estimation of a mixture of RHLP models by the Classification Expectation-Maximization (CEM) algorithm.
cemMixRHLP(X, Y, K, R, p = 3, q = 1,
  variance_type = c("heteroskedastic", "homoskedastic"),
  init_kmeans = TRUE, n_tries = 1, max_iter = 100,
  threshold = 1e-05, verbose = FALSE, verbose_IRLS = FALSE)
Argument | Description
---|---
X | Numeric vector of length m representing the covariates/inputs x_1, ..., x_m.
Y | Matrix of size (n, m) representing the observed responses/outputs: n curves observed at the m points given by X.
K | The number of clusters (number of RHLP models).
R | The number of regimes (RHLP components) for each cluster.
p | Optional. The order of the polynomial regression. By default, p is set to 3.
q | Optional. The dimension of the logistic regression. For the purpose of segmentation, it must be set to 1 (which is the default value).
variance_type | Optional character indicating whether the model is "homoskedastic" or "heteroskedastic". By default the model is "heteroskedastic".
init_kmeans | Optional. A logical indicating whether or not the curve partition should be initialized by the K-means algorithm. Otherwise the curve partition is initialized randomly.
n_tries | Optional. Number of runs of the EM algorithm; the solution providing the highest log-likelihood is returned. By default, n_tries is set to 1.
max_iter | Optional. The maximum number of iterations for the EM algorithm. By default, max_iter is set to 100.
threshold | Optional. A numeric value specifying the threshold for the relative difference of log-likelihood between two steps of the EM algorithm, used as stopping criterion.
verbose | Optional. A logical value indicating whether or not values of the log-likelihood should be printed during EM iterations.
verbose_IRLS | Optional. A logical value indicating whether or not values of the criterion optimized by IRLS should be printed at each step of the EM algorithm.
The cemMixRHLP function implements the CEM algorithm. It starts with an initialization of the parameters done by the method initParam of the class ParamMixRHLP, then alternates between the E-Step, the C-Step (methods of the class StatMixRHLP), and the CM-Step (method of the class ParamMixRHLP) until convergence, i.e., until the relative variation of the log-likelihood between two steps of the algorithm is less than the threshold parameter.

cemMixRHLP returns an object of class ModelMixRHLP.

See also: ModelMixRHLP, ParamMixRHLP, StatMixRHLP
data(toydataset)

# Let's fit a mixRHLP model on a dataset containing 2 clusters:
data <- toydataset[1:190, 1:21]
x <- data$x
Y <- t(data[, 2:ncol(data)])

mixrhlp <- cemMixRHLP(X = x, Y = Y, K = 2, R = 2, p = 1, verbose = TRUE)

mixrhlp$summary()
mixrhlp$plot()
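The optional arguments can be combined freely. As an illustrative (not canonical) variant, the sketch below fits the same data with a homoskedastic noise model and keeps the best of three CEM runs:

data(toydataset)
data <- toydataset[1:190, 1:21]
x <- data$x
Y <- t(data[, 2:ncol(data)])

# homoskedastic variances, 3 random restarts; the run with the highest
# log-likelihood is returned
mixrhlp <- cemMixRHLP(X = x, Y = Y, K = 2, R = 2, p = 1,
                      variance_type = "homoskedastic", n_tries = 3)
mixrhlp$stat$loglik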
emMixHMM implements the maximum-likelihood parameter estimation of a mixture of HMM models by the Expectation-Maximization (EM) algorithm, known as the Baum-Welch algorithm in the context of mixHMM.
emMixHMM(Y, K, R,
  variance_type = c("heteroskedastic", "homoskedastic"),
  order_constraint = TRUE, init_kmeans = TRUE, n_tries = 1,
  max_iter = 1000, threshold = 1e-06, verbose = FALSE)
Argument | Description
---|---
Y | Matrix of size (n, m) representing the observed responses/outputs: n curves observed at m points.
K | The number of clusters (number of HMM models).
R | The number of regimes (HMM components) for each cluster.
variance_type | Optional character indicating whether the model is "homoskedastic" or "heteroskedastic". By default the model is "heteroskedastic".
order_constraint | Optional. A logical indicating whether or not a mask of order one should be applied to the transition matrix of the Markov chain to provide ordered states. For the purpose of segmentation, it must be set to TRUE (which is the default value).
init_kmeans | Optional. A logical indicating whether or not the curve partition should be initialized by the K-means algorithm. Otherwise the curve partition is initialized randomly.
n_tries | Optional. Number of runs of the EM algorithm; the solution providing the highest log-likelihood is returned. By default, n_tries is set to 1.
max_iter | Optional. The maximum number of iterations for the EM algorithm. By default, max_iter is set to 1000.
threshold | Optional. A numeric value specifying the threshold for the relative difference of log-likelihood between two steps of the EM algorithm, used as stopping criterion.
verbose | Optional. A logical value indicating whether or not values of the log-likelihood should be printed during EM iterations.
The emMixHMM function implements the EM algorithm. It starts with an initialization of the parameters done by the method initParam of the class ParamMixHMM, then alternates between the E-Step (method of the class StatMixHMM) and the M-Step (method of the class ParamMixHMM) until convergence, i.e., until the relative variation of the log-likelihood between two steps of the EM algorithm is less than the threshold parameter.

emMixHMM returns an object of class ModelMixHMM.

See also: ModelMixHMM, ParamMixHMM, StatMixHMM
data(toydataset)
Y <- t(toydataset[, 2:ncol(toydataset)])

mixhmm <- emMixHMM(Y = Y, K = 3, R = 3, verbose = TRUE)

mixhmm$summary()
mixhmm$plot()
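Beyond summary() and plot(), the fitted object exposes the statistics documented under StatMixHMM. For instance, as an illustrative continuation of the example above:

# hard cluster labels obtained by the MAP rule
mixhmm$stat$klas

# posterior cluster membership probabilities
mixhmm$stat$tau_ik

# model selection criteria
mixhmm$stat$BIC
mixhmm$stat$AIC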
emMixHMMR implements the maximum-likelihood parameter estimation of a mixture of HMMR models by the Expectation-Maximization (EM) algorithm.
emMixHMMR(X, Y, K, R, p = 3,
  variance_type = c("heteroskedastic", "homoskedastic"),
  order_constraint = TRUE, init_kmeans = TRUE, n_tries = 1,
  max_iter = 1000, threshold = 1e-06, verbose = FALSE)
Argument | Description
---|---
X | Numeric vector of length m representing the covariates/inputs x_1, ..., x_m.
Y | Matrix of size (n, m) representing the observed responses/outputs: n curves observed at the m points given by X.
K | The number of clusters (number of HMMR models).
R | The number of regimes (HMMR components) for each cluster.
p | Optional. The order of the polynomial regression. By default, p is set to 3.
variance_type | Optional character indicating whether the model is "homoskedastic" or "heteroskedastic". By default the model is "heteroskedastic".
order_constraint | Optional. A logical indicating whether or not a mask of order one should be applied to the transition matrix of the Markov chain to provide ordered states. For the purpose of segmentation, it must be set to TRUE (which is the default value).
init_kmeans | Optional. A logical indicating whether or not the curve partition should be initialized by the K-means algorithm. Otherwise the curve partition is initialized randomly.
n_tries | Optional. Number of runs of the EM algorithm; the solution providing the highest log-likelihood is returned. By default, n_tries is set to 1.
max_iter | Optional. The maximum number of iterations for the EM algorithm. By default, max_iter is set to 1000.
threshold | Optional. A numeric value specifying the threshold for the relative difference of log-likelihood between two steps of the EM algorithm, used as stopping criterion.
verbose | Optional. A logical value indicating whether or not values of the log-likelihood should be printed during EM iterations.
The emMixHMMR function implements the EM algorithm. It starts with an initialization of the parameters done by the method initParam of the class ParamMixHMMR, then alternates between the E-Step (method of the class StatMixHMMR) and the M-Step (method of the class ParamMixHMMR) until convergence, i.e., until the relative variation of the log-likelihood between two steps of the EM algorithm is less than the threshold parameter.

emMixHMMR returns an object of class ModelMixHMMR.

See also: ModelMixHMMR, ParamMixHMMR, StatMixHMMR
data(toydataset)
x <- toydataset$x
Y <- t(toydataset[, 2:ncol(toydataset)])

mixhmmr <- emMixHMMR(X = x, Y = Y, K = 3, R = 3, p = 1, verbose = TRUE)

mixhmmr$summary()
mixhmmr$plot()
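Since StatMixHMMR stores the BIC of the fitted model, a simple (illustrative) way to choose the number of clusters is to fit several values of K and compare the criterion:

data(toydataset)
x <- toydataset$x
Y <- t(toydataset[, 2:ncol(toydataset)])

# fit K = 2, 3, 4 clusters and compare BIC values
fits <- lapply(2:4, function(K) emMixHMMR(X = x, Y = Y, K = K, R = 3, p = 1))
sapply(fits, function(fit) fit$stat$BIC)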
emMixRHLP implements the maximum-likelihood parameter estimation of a mixture of RHLP models by the Expectation-Maximization (EM) algorithm.
emMixRHLP(X, Y, K, R, p = 3, q = 1,
  variance_type = c("heteroskedastic", "homoskedastic"),
  init_kmeans = TRUE, n_tries = 1, max_iter = 1000,
  threshold = 1e-05, verbose = FALSE, verbose_IRLS = FALSE)
Argument | Description
---|---
X | Numeric vector of length m representing the covariates/inputs x_1, ..., x_m.
Y | Matrix of size (n, m) representing the observed responses/outputs: n curves observed at the m points given by X.
K | The number of clusters (number of RHLP models).
R | The number of regimes (RHLP components) for each cluster.
p | Optional. The order of the polynomial regression. By default, p is set to 3.
q | Optional. The dimension of the logistic regression. For the purpose of segmentation, it must be set to 1 (which is the default value).
variance_type | Optional character indicating whether the model is "homoskedastic" or "heteroskedastic". By default the model is "heteroskedastic".
init_kmeans | Optional. A logical indicating whether or not the curve partition should be initialized by the K-means algorithm. Otherwise the curve partition is initialized randomly.
n_tries | Optional. Number of runs of the EM algorithm; the solution providing the highest log-likelihood is returned. By default, n_tries is set to 1.
max_iter | Optional. The maximum number of iterations for the EM algorithm. By default, max_iter is set to 1000.
threshold | Optional. A numeric value specifying the threshold for the relative difference of log-likelihood between two steps of the EM algorithm, used as stopping criterion.
verbose | Optional. A logical value indicating whether or not values of the log-likelihood should be printed during EM iterations.
verbose_IRLS | Optional. A logical value indicating whether or not values of the criterion optimized by IRLS should be printed at each step of the EM algorithm.
The emMixRHLP function implements the EM algorithm. It starts with an initialization of the parameters done by the method initParam of the class ParamMixRHLP, then alternates between the E-Step (method of the class StatMixRHLP) and the M-Step (method of the class ParamMixRHLP) until convergence, i.e., until the relative variation of the log-likelihood between two steps of the EM algorithm is less than the threshold parameter.

emMixRHLP returns an object of class ModelMixRHLP.

See also: ModelMixRHLP, ParamMixRHLP, StatMixRHLP
data(toydataset)

# Let's fit a mixRHLP model on a dataset containing 2 clusters:
data <- toydataset[1:190, 1:21]
x <- data$x
Y <- t(data[, 2:ncol(data)])

mixrhlp <- emMixRHLP(X = x, Y = Y, K = 2, R = 2, p = 1, verbose = TRUE)

mixrhlp$summary()
mixrhlp$plot()
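Because the EM algorithm is sensitive to its starting point, a common practice, sketched below with illustrative settings, is to run it several times and keep the best solution, which emMixRHLP does internally when n_tries > 1:

data(toydataset)
data <- toydataset[1:190, 1:21]
x <- data$x
Y <- t(data[, 2:ncol(data)])

# 5 runs with a tighter stopping threshold; the best run (highest
# log-likelihood) is returned
mixrhlp <- emMixRHLP(X = x, Y = Y, K = 2, R = 2, p = 1,
                     n_tries = 5, threshold = 1e-06)
mixrhlp$stat$loglik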
FData is a reference class which represents general independent and identically distributed (i.i.d.) functional objects. The data can be ordered by time (functional time series); in that case, the field X represents the time.

X: Numeric vector of length m representing the covariates/inputs.

Y: Matrix of size (n, m) representing the observed responses/outputs. Y consists of n functions of X observed at points 1, ..., m.
mkStochastic ensures that its argument is a stochastic vector, matrix or array.

mkStochastic(M)

Argument | Description
---|---
M | A vector, matrix or array to transform.

mkStochastic ensures that the given argument is a stochastic vector, matrix or array, i.e., that the sum over the last dimension is 1.

Returns a vector, matrix or array for which the sum over the last dimension is 1.
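For instance, assuming mkStochastic is called on a plain numeric matrix, normalization is performed over the last dimension (the columns of each row), as in this small sketch:

M <- matrix(runif(6), nrow = 2, ncol = 3)
S <- mkStochastic(M)

# each row of S now sums to 1
rowSums(S)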
ModelMixHMM represents an estimated mixture of HMM model.

param: A ParamMixHMM object. It contains the estimated values of the parameters.

stat: A StatMixHMM object. It contains all the statistics associated to the MixHMM model.

plot(what = c("clustered", "smoothed", "loglikelihood"), ...): Plot method. The what argument selects the type of graph: "clustered" plots the clustered curves (field klas of class StatMixHMM); "smoothed" plots the smoothed signal (field smoothed of class StatMixHMM); "loglikelihood" plots the value of the log-likelihood at each iteration (field stored_loglik of class StatMixHMM). Additional graphics parameters can be passed through the ... argument.

summary(digits = getOption("digits")): Summary method. digits is the number of significant digits to use when printing.
data(toydataset)
Y <- t(toydataset[, 2:ncol(toydataset)])

mixhmm <- emMixHMM(Y = Y, K = 3, R = 3, verbose = TRUE)

# mixhmm is a ModelMixHMM object. It contains some methods such as 'summary' and 'plot'
mixhmm$summary()
mixhmm$plot()

# mixhmm also has two fields, stat and param, which are reference classes as well
# Log-likelihood:
mixhmm$stat$loglik
# Means:
mixhmm$param$mu
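Each graph can also be requested individually through the what argument of the plot method. The calls below are illustrative, applied to the object fitted above:

mixhmm$plot(what = "clustered")      # clustered curves
mixhmm$plot(what = "smoothed")       # smoothed mean series per cluster
mixhmm$plot(what = "loglikelihood")  # convergence of the EM algorithm

mixhmm$summary(digits = 3)           # summary with 3 significant digits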
ModelMixHMMR represents an estimated mixture of HMMR model.

param: A ParamMixHMMR object. It contains the estimated values of the parameters.

stat: A StatMixHMMR object. It contains all the statistics associated to the MixHMMR model.

plot(what = c("clustered", "smoothed", "loglikelihood"), ...): Plot method. The what argument selects the type of graph: "clustered" plots the clustered curves (field klas of class StatMixHMMR); "smoothed" plots the smoothed signal (field smoothed of class StatMixHMMR); "loglikelihood" plots the value of the log-likelihood at each iteration (field stored_loglik of class StatMixHMMR). Additional graphics parameters can be passed through the ... argument.

summary(digits = getOption("digits")): Summary method. digits is the number of significant digits to use when printing.
data(toydataset)
x <- toydataset$x
Y <- t(toydataset[, 2:ncol(toydataset)])

mixhmmr <- emMixHMMR(X = x, Y = Y, K = 3, R = 3, p = 1, verbose = TRUE)

# mixhmmr is a ModelMixHMMR object. It contains some methods such as 'summary' and 'plot'
mixhmmr$summary()
mixhmmr$plot()

# mixhmmr also has two fields, stat and param, which are reference classes as well
# Log-likelihood:
mixhmmr$stat$loglik
# Parameters of the polynomial regressions:
mixhmmr$param$beta
ModelMixRHLP represents an estimated mixture of RHLP model.

param: A ParamMixRHLP object. It contains the estimated values of the parameters.

stat: A StatMixRHLP object. It contains all the statistics associated to the MixRHLP model.

plot(what = c("estimatedsignal", "regressors", "loglikelihood"), ...): Plot method. The what argument selects the type of graph: "estimatedsignal" plots the estimated signal (field Ey of class StatMixRHLP); "regressors" plots the polynomial regression components (fields polynomials and pi_jkr of class StatMixRHLP); "loglikelihood" plots the value of the log-likelihood at each iteration (field stored_loglik of class StatMixRHLP). Additional graphics parameters can be passed through the ... argument. By default, all the above graphs are produced.

summary(digits = getOption("digits")): Summary method. digits is the number of significant digits to use when printing.
data(toydataset)

# Let's fit a mixRHLP model on a dataset containing 2 clusters:
data <- toydataset[1:190, 1:21]
x <- data$x
Y <- t(data[, 2:ncol(data)])

mixrhlp <- cemMixRHLP(X = x, Y = Y, K = 2, R = 2, p = 1, verbose = TRUE)

# mixrhlp is a ModelMixRHLP object. It contains some methods such as 'summary' and 'plot'
mixrhlp$summary()
mixrhlp$plot()

# mixrhlp also has two fields, stat and param, which are reference classes as well
# Log-likelihood:
mixrhlp$stat$loglik
# Parameters of the polynomial regressions:
mixrhlp$param$beta
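As with the other model classes, individual graphs can be selected with the what argument. The calls below are illustrative, applied to the object fitted above:

mixrhlp$plot(what = "estimatedsignal")  # estimated mean curve per cluster
mixrhlp$plot(what = "regressors")       # polynomial components and proportions
mixrhlp$plot(what = "loglikelihood")    # convergence of the algorithm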
ParamMixHMM contains all the parameters of a mixture of HMM models.

fData: FData object representing the sample (covariates/inputs X and observed responses/outputs Y).

K: The number of clusters (number of HMM models).

R: The number of regimes (HMM components) for each cluster.

variance_type: Character indicating whether the model is homoskedastic (variance_type = "homoskedastic") or heteroskedastic (variance_type = "heteroskedastic"). By default the model is heteroskedastic.

order_constraint: A logical indicating whether or not a mask of order one should be applied to the transition matrix of the Markov chain to provide ordered states. For the purpose of segmentation, it must be set to TRUE (which is the default value).

alpha: Cluster weights. Matrix of dimension (K, 1).

prior: The prior probabilities of the Markov chains. prior is a matrix of dimension (R, K); the k-th column represents the prior distribution of the Markov chain associated to cluster k.

trans_mat: The transition matrices of the Markov chains. trans_mat is an array of dimension (R, R, K), i.e., one R-by-R transition matrix per cluster.

mask: Mask applied to the transition matrices trans_mat. By default, a mask of order one is applied.

mu: Means. Matrix of dimension (R, K); the k-th column represents the k-th cluster and gives the means of its R regimes.

sigma2: The variances for the K clusters. If the MixHMM model is heteroskedastic (variance_type = "heteroskedastic"), sigma2 is a matrix of size (R, K), with one variance per regime and per cluster; otherwise the MixHMM model is homoskedastic (variance_type = "homoskedastic") and sigma2 contains a single variance per cluster.

nu: The degrees of freedom of the MixHMM model, representing the complexity of the model.

initGaussParamHmm(Y, k, R, variance_type, try_algo): Initializes the means mu and the variances sigma2 for the cluster k.

initParam(init_kmeans = TRUE, try_algo = 1): Method to initialize the parameters alpha, prior, trans_mat, mu and sigma2. If init_kmeans = TRUE, the curve partition is initialized by the K-means algorithm; otherwise the curve partition is initialized randomly. If try_algo = 1, mu and sigma2 are initialized by segmenting the time series Y uniformly into R contiguous segments; otherwise, mu and sigma2 are initialized by segmenting the time series Y randomly into R segments.

MStep(statMixHMM): Method which implements the M-step of the EM algorithm to learn the parameters of the MixHMM model based on the statistics provided by the object statMixHMM of class StatMixHMM (which contains the E-step).
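ParamMixHMM objects are normally created and updated by the fitting functions rather than by hand; after a fit, the estimated parameters can simply be read from the param field of the returned model. The lines below are an illustrative continuation of the emMixHMM example:

mixhmm$param$alpha      # cluster weights
mixhmm$param$prior      # prior distributions of the Markov chains, one per cluster
mixhmm$param$trans_mat  # transition matrices, one per cluster
mixhmm$param$mu         # regime means per cluster
mixhmm$param$sigma2     # variances
mixhmm$param$nu         # degrees of freedom of the model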
ParamMixHMMR contains all the parameters of a mixture of HMMR models.

fData: FData object representing the sample (covariates/inputs X and observed responses/outputs Y).

K: The number of clusters (number of HMMR models).

R: The number of regimes (HMMR components) for each cluster.

p: The order of the polynomial regression.

variance_type: Character indicating whether the model is homoskedastic (variance_type = "homoskedastic") or heteroskedastic (variance_type = "heteroskedastic"). By default the model is heteroskedastic.

order_constraint: A logical indicating whether or not a mask of order one should be applied to the transition matrix of the Markov chain to provide ordered states. For the purpose of segmentation, it must be set to TRUE (which is the default value).

alpha: Cluster weights. Matrix of dimension (K, 1).

prior: The prior probabilities of the Markov chains. prior is a matrix of dimension (R, K); the k-th column represents the prior distribution of the Markov chain associated to cluster k.

trans_mat: The transition matrices of the Markov chains. trans_mat is an array of dimension (R, R, K), i.e., one R-by-R transition matrix per cluster.

mask: Mask applied to the transition matrices trans_mat. By default, a mask of order one is applied.

beta: Parameters of the polynomial regressions. beta is an array of dimension (p + 1, R, K), with p the order of the polynomial regression. p is fixed to 3 by default.

sigma2: The variances for the K clusters. If the MixHMMR model is heteroskedastic (variance_type = "heteroskedastic"), sigma2 is a matrix of size (R, K), with one variance per regime and per cluster; otherwise the MixHMMR model is homoskedastic (variance_type = "homoskedastic") and sigma2 contains a single variance per cluster.

nu: The degrees of freedom of the MixHMMR model, representing the complexity of the model.

phi: A list giving the regression design matrix for the polynomial regressions.

initParam(init_kmeans = TRUE, try_algo = 1): Method to initialize the parameters alpha, prior, trans_mat, beta and sigma2. If init_kmeans = TRUE, the curve partition is initialized by the K-means algorithm; otherwise the curve partition is initialized randomly. If try_algo = 1, beta and sigma2 are initialized by segmenting the time series Y uniformly into R contiguous segments; otherwise, beta and sigma2 are initialized by segmenting the time series Y randomly into R segments.

initRegressionParam(Y, k, R, phi, variance_type, try_algo): Initializes beta and sigma2 for the cluster k.

MStep(statMixHMMR): Method which implements the M-step of the EM algorithm to learn the parameters of the MixHMMR model based on the statistics provided by the object statMixHMMR of class StatMixHMMR (which contains the E-step).
ParamMixRHLP contains all the parameters of a mixture of RHLP models.

fData: FData object representing the sample (covariates/inputs X and observed responses/outputs Y).

K: The number of clusters (number of RHLP models).

R: The number of regimes (RHLP components) for each cluster.

p: The order of the polynomial regression.

q: The dimension of the logistic regression. For the purpose of segmentation, it must be set to 1.

variance_type: Character indicating whether the model is homoskedastic (variance_type = "homoskedastic") or heteroskedastic (variance_type = "heteroskedastic"). By default the model is heteroskedastic.

alpha: Cluster weights. Matrix of dimension (K, 1).

W: Parameters of the logistic processes. W is an array of dimension (q + 1, R - 1, K), with q the order of the logistic regression. q is fixed to 1 by default.

beta: Parameters of the polynomial regressions. beta is an array of dimension (p + 1, R, K), with p the order of the polynomial regression. p is fixed to 3 by default.

sigma2: The variances for the K clusters. If the MixRHLP model is heteroskedastic (variance_type = "heteroskedastic"), sigma2 is a matrix of size (R, K), with one variance per regime and per cluster; otherwise the MixRHLP model is homoskedastic (variance_type = "homoskedastic") and sigma2 contains a single variance per cluster.

nu: The degrees of freedom of the MixRHLP model, representing the complexity of the model.

phi: A list giving the regression design matrices for the polynomial and the logistic regressions.

CMStep(statMixRHLP, verbose_IRLS = FALSE): Method which implements the M-step of the CEM algorithm to learn the parameters of the MixRHLP model based on the statistics provided by the object statMixRHLP of class StatMixRHLP (which contains the E-step and the C-step).

initParam(init_kmeans = TRUE, try_algo = 1): Method to initialize the parameters alpha, W, beta and sigma2. If init_kmeans = TRUE, the curve partition is initialized by the K-means algorithm; otherwise the curve partition is initialized randomly. If try_algo = 1, beta and sigma2 are initialized by segmenting the time series Y uniformly into R contiguous segments; otherwise, W, beta and sigma2 are initialized by segmenting the time series Y randomly into R segments.

initRegressionParam(Yk, k, try_algo = 1): Initializes the matrix of polynomial regression coefficients beta_k for the cluster k.

MStep(statMixRHLP, verbose_IRLS = FALSE): Method which implements the M-step of the EM algorithm to learn the parameters of the MixRHLP model based on the statistics provided by the object statMixRHLP of class StatMixRHLP (which contains the E-step).
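As with the other parameter classes, these fields are most conveniently inspected on a fitted model. The lines below are an illustrative continuation of the mixRHLP examples:

mixrhlp$param$alpha   # cluster weights
mixrhlp$param$W       # logistic process parameters
mixrhlp$param$beta    # polynomial regression coefficients
mixrhlp$param$sigma2  # variances
mixrhlp$param$nu      # degrees of freedom of the model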
StatMixHMM contains all the statistics associated to a MixHMM model, in particular the E-Step of the EM algorithm.

tau_ik: Matrix of size (n, K) giving the posterior probabilities that the curve y_i originates from the k-th HMM model.

gamma_ikjr: Array giving the posterior probabilities that the observation y_i(t_j) originates from the r-th regime of the k-th HMM model.

loglik: Numeric. Log-likelihood of the MixHMM model.

stored_loglik: Numeric vector. Stored values of the log-likelihood at each iteration of the EM algorithm.

klas: Row matrix of the labels issued from tau_ik. Its elements are klas(i) in {1, ..., K}, for i = 1, ..., n.

z_ik: Hard segmentation logical matrix of dimension (n, K) obtained by the Maximum A Posteriori (MAP) rule: z_ik = 1 if k = arg max of tau_ik over k, and 0 otherwise.

smoothed: Matrix of size (m, K) giving the smoothed time series. The smoothed time series are computed by combining the time series with both the estimated posterior regime probabilities gamma_ikjr and the corresponding estimated posterior cluster probabilities tau_ik. The k-th column gives the estimated mean series of cluster k.

BIC: Numeric. Value of BIC (Bayesian Information Criterion).

AIC: Numeric. Value of AIC (Akaike Information Criterion).

ICL1: Numeric. Value of ICL (Integrated Completed Likelihood criterion).

log_alpha_k_fyi, exp_num_trans, exp_num_trans_from_l: Private fields, only defined for internal calculations.

computeStats(paramMixHMM): Method used in the EM algorithm to compute statistics based on the parameters provided by the object paramMixHMM of class ParamMixHMM.

EStep(paramMixHMM): Method used in the EM algorithm to update statistics based on the parameters provided by the object paramMixHMM of class ParamMixHMM (prior and posterior probabilities).

MAP(): Calculates the fields z_ik and klas by applying the Maximum A Posteriori (MAP) allocation rule: z_ik = 1 if k = arg max of tau_ik over k, and 0 otherwise.
StatMixHMMR contains all the statistics associated to a MixHMMR model, in particular the E-Step of the EM algorithm.

tau_ik: Matrix of size (n, K) giving the posterior probabilities that the curve y_i originates from the k-th HMMR model.

gamma_ikjr: Array giving the posterior probabilities that the observation y_i(t_j) originates from the r-th regime of the k-th HMMR model.

loglik: Numeric. Log-likelihood of the MixHMMR model.

stored_loglik: Numeric vector. Stored values of the log-likelihood at each iteration of the EM algorithm.

klas: Row matrix of the labels issued from tau_ik. Its elements are klas(i) in {1, ..., K}, for i = 1, ..., n.

z_ik: Hard segmentation logical matrix of dimension (n, K) obtained by the Maximum A Posteriori (MAP) rule: z_ik = 1 if k = arg max of tau_ik over k, and 0 otherwise.

smoothed: Matrix of size (m, K) giving the smoothed time series. The smoothed time series are computed by combining the polynomial regression components with both the estimated posterior regime probabilities gamma_ikjr and the corresponding estimated posterior cluster probabilities tau_ik. The k-th column gives the estimated mean series of cluster k.

BIC: Numeric. Value of BIC (Bayesian Information Criterion).

AIC: Numeric. Value of AIC (Akaike Information Criterion).

ICL1: Numeric. Value of ICL (Integrated Completed Likelihood Criterion).

log_alpha_k_fyi, exp_num_trans, exp_num_trans_from_l: Private fields, only defined for internal calculations.

computeStats(paramMixHMMR): Method used in the EM algorithm to compute statistics based on the parameters provided by the object paramMixHMMR of class ParamMixHMMR.

EStep(paramMixHMMR): Method used in the EM algorithm to update statistics based on the parameters provided by the object paramMixHMMR of class ParamMixHMMR (prior and posterior probabilities).

MAP(): Calculates the fields z_ik and klas by applying the Maximum A Posteriori (MAP) allocation rule: z_ik = 1 if k = arg max of tau_ik over k, and 0 otherwise.
StatMixRHLP contains all the statistics associated to a MixRHLP model, in particular the E-Step (and C-Step) of the (C)EM algorithm.

pi_jkr: Array of size (m, R, K) representing the logistic proportions of the R regimes for each cluster k.

tau_ik: Matrix of size (n, K) giving the posterior probabilities (fuzzy segmentation matrix) that the curve y_i originates from the k-th RHLP model.

z_ik: Hard segmentation logical matrix of dimension (n, K) obtained by the Maximum A Posteriori (MAP) rule: z_ik = 1 if k = arg max of tau_ik over k, and 0 otherwise.

klas: Column matrix of the labels issued from z_ik. Its elements are klas(i) in {1, ..., K}, for i = 1, ..., n.

gamma_ijkr: Array giving the posterior probabilities that the observation y_i(t_j) originates from the r-th regime of the k-th RHLP model.

polynomials: Array of size (m, R, K) giving the values of the estimated polynomial regression components.

weighted_polynomials: Array of size (m, R, K) giving the values of the estimated polynomial regression components weighted by the prior probabilities pi_jkr.

Ey: Matrix of size (m, K). Ey is the curve expectation (estimated signal): the sum of the polynomial components weighted by the logistic probabilities pi_jkr.

loglik: Numeric. Observed-data log-likelihood of the MixRHLP model.

com_loglik: Numeric. Complete-data log-likelihood of the MixRHLP model.

stored_loglik: Numeric vector. Stored values of the log-likelihood at each EM iteration.

stored_com_loglik: Numeric vector. Stored values of the complete-data log-likelihood at each EM iteration.

BIC: Numeric. Value of BIC (Bayesian Information Criterion).

ICL: Numeric. Value of ICL (Integrated Completed Likelihood).

AIC: Numeric. Value of AIC (Akaike Information Criterion).

log_fk_yij: Matrix giving the values of the logarithm of the probability density function of the observations under each of the K RHLP models.

log_alphak_fk_yij: Matrix giving the values of the logarithm of the joint probability density function, i.e., the cluster weight alpha_k times the density of the observations under the k-th RHLP model.

log_gamma_ijkr: Array giving the logarithm of gamma_ijkr.

computeStats(paramMixRHLP): Method used in the EM algorithm to compute statistics based on the parameters provided by the object paramMixRHLP of class ParamMixRHLP.

CStep(reg_irls): Method used in the CEM algorithm to update statistics.

EStep(paramMixRHLP): Method used in the EM algorithm to update statistics based on the parameters provided by the object paramMixRHLP of class ParamMixRHLP (prior and posterior probabilities).

MAP(): Calculates the fields z_ik and klas by applying the Maximum A Posteriori (MAP) allocation rule: z_ik = 1 if k = arg max of tau_ik over k, and 0 otherwise.
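These statistics are filled in during the (C)EM run and can be read from the stat field of the fitted model. For example, as an illustrative continuation of the mixRHLP examples:

mixrhlp$stat$tau_ik  # fuzzy cluster memberships
mixrhlp$stat$klas    # hard cluster labels from the MAP rule
mixrhlp$stat$Ey      # estimated mean curve of each cluster
mixrhlp$stat$BIC     # model selection criterion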
A dataset composed of 30 simulated time series with regime changes.
toydataset
A data frame with 350 rows and 31 variables:
x: The covariate variable, which is the time in this case.

y1: Time series with a waveform shape to which a normally distributed random noise has been added.

y2 to y10: Same as y1.

y11: Time series generated as follows. First regime: 120 values of normally distributed random numbers with mean 5 and variance 1. Second regime: 70 values of normally distributed random numbers with mean 7 and variance 1. Third regime: 160 values of normally distributed random numbers with mean 5 and variance 1.

y12 to y20: Same as y11.

y21: Time series generated as follows. First regime: 80 values of normally distributed random numbers with mean 7 and variance 1. Second regime: 130 values of normally distributed random numbers with mean 5 and variance 1. Third regime: 140 values of normally distributed random numbers with mean 4 and variance 1.

y22 to y30: Same as y21.