Title: | Mixtures of Continuous Time Markov Models |
---|---|
Description: | Provides an expectation-maximization (EM) algorithm to fit a mixture of continuous time Markov models for use with clickstream or other sequence-type data. Gallaugher, M.P.B. and McNicholas, P.D. (2018) <arXiv:1802.04849>. |
Authors: | Michael P.B. Gallaugher, Paul D. McNicholas |
Maintainer: | Michael P.B. Gallaugher |
License: | GPL (>= 2) |
Version: | 0.1.7 |
Built: | 2024-10-15 06:19:07 UTC |
Source: | CRAN |
`ClickClust_EM` fits a mixture of continuous time first-order Markov models for each specified number of groups and returns the model chosen by the selection criterion (BIC by default). It implements the methodology developed in Gallaugher and McNicholas (2019).
```r
ClickClust_EM(x, t, J, G, itemEM = 5, starts = 100, maxit = 5000, tol = 0.001,
              Contin = TRUE, Verbose = TRUE, seed = 1, known = NULL,
              crit = "BIC", returnall = FALSE)
```
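To give some intuition for the model being fitted, here is a minimal base-R sketch of the log-likelihood of a single clickstream under a continuous time first-order Markov chain. It is not taken from the package. It assumes that each state has an exponential holding rate `lambda[j]`, that jumps between distinct states follow a matrix `P` with zero diagonal, and that the times contain one holding time per visited state, with the last one right-censored. The helper `ctmc_loglik` and the toy parameter values are hypothetical.

```r
# Hypothetical helper (not part of the package): log-likelihood of one
# clickstream under a continuous time first-order Markov chain.
#   states : visited states, with no repeated consecutive states
#   times  : time spent in each visited state (same length as states);
#            the last holding time is treated as right-censored (assumption)
#   lambda : exponential holding rate of each of the J states
#   P      : J x J jump probability matrix with zero diagonal, rows summing to 1
ctmc_loglik <- function(states, times, lambda, P) {
  n <- length(states)
  stopifnot(length(times) == n)
  # Exponential survival term exp(-lambda * tau) for every visit
  ll <- sum(-lambda[states] * times)
  if (n > 1) {
    from <- states[-n]
    to <- states[-1]
    # Completed visits also contribute the density factor lambda and the
    # probability of the observed jump
    ll <- ll + sum(log(lambda[from])) + sum(log(P[cbind(from, to)]))
  }
  ll
}

# Toy parameters for J = 3 states (illustrative values only)
lambda <- c(1, 2, 0.5)
P <- matrix(c(0,   0.5, 0.5,
              0.3, 0,   0.7,
              0.6, 0.4, 0), nrow = 3, byrow = TRUE)
ctmc_loglik(states = c(1, 2, 3), times = c(0.4, 1.0, 2.0), lambda = lambda, P = P)
# equals -3.4 + log(0.7), approximately -3.757
```

Writing the rate as a separate factor makes it easy to see that the product `lambda[j] * P[j, k]` plays the role of the off-diagonal generator entry in the usual CTMC parameterization.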
| Argument | Description |
|---|---|
| `x` | A list of state sequences (one per clickstream) |
| `t` | A list of the times spent in each state |
| `J` | The total number of states |
| `G` | A vector containing the numbers of groups to fit |
| `itemEM` | The number of emEM iterations used for initialization (defaults to 5) |
| `starts` | The number of random starting values for the emEM algorithm (defaults to 100) |
| `maxit` | The maximum number of iterations after initialization (defaults to 5000) |
| `tol` | The convergence tolerance (defaults to 0.001) |
| `Contin` | If `TRUE` (the default), fit the continuous time model; if `FALSE`, fit the discrete time model |
| `Verbose` | Whether to display messages (defaults to `TRUE`) |
| `seed` | The random seed for the emEM algorithm (defaults to 1) |
| `known` | A vector of labels for semi-supervised classification: 0 marks an unlabelled observation, and labelled observations are given their group number (1, 2, 3, ...). Defaults to `NULL` |
| `crit` | The model selection criterion, `"BIC"` or `"ICL"` (defaults to `"BIC"`) |
| `returnall` | If `TRUE`, return the results for every number of groups considered (defaults to `FALSE`) |
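The `itemEM`, `starts`, `maxit`, `tol` and `seed` arguments configure an emEM initialization. The usual emEM scheme runs a few EM iterations from each of many random starts, keeps the best start, and then runs EM to convergence. The sketch below illustrates that scheme on a toy mixture of exponential distributions, used only as a stand-in for the clickstream model. It assumes that `itemEM` is the number of EM iterations in each short run and that convergence means an absolute change in log-likelihood below `tol`. The functions `em_exp_mix` and `emEM_exp` are hypothetical and are not the package's implementation.

```r
# Hypothetical EM for a mixture of G exponential distributions.
# Stops after `iters` iterations or when the log-likelihood change is < tol.
em_exp_mix <- function(y, pi_g, rate, iters, tol = -Inf) {
  ll_old <- -Inf
  for (it in seq_len(iters)) {
    # E-step: log(pi_g) + log f_g(y_i) for every observation and component
    logd <- sweep(outer(y, rate, function(yy, r) dexp(yy, r, log = TRUE)),
                  2, log(pi_g), "+")
    m <- apply(logd, 1, max)
    lse <- m + log(rowSums(exp(logd - m)))  # log-sum-exp per observation
    ll <- sum(lse)                          # log-likelihood at current parameters
    z <- exp(logd - lse)                    # posterior membership probabilities
    # M-step: closed-form updates for an exponential mixture
    pi_g <- colMeans(z)
    rate <- colSums(z) / colSums(z * y)
    if (abs(ll - ll_old) < tol) break
    ll_old <- ll
  }
  list(pi_g = pi_g, rate = rate, loglik = ll, iterations = it)
}

# Hypothetical emEM driver: short runs from random starts, then one long run
emEM_exp <- function(y, G, itemEM = 5, starts = 100, maxit = 5000,
                     tol = 0.001, seed = 1) {
  set.seed(seed)
  best <- NULL
  for (s in seq_len(starts)) {
    w <- rexp(G)                         # random mixing proportions
    rate0 <- runif(G, 0.1, 2) / mean(y)  # random rates on the data's scale
    fit <- em_exp_mix(y, w / sum(w), rate0, iters = itemEM)
    if (is.null(best) || fit$loglik > best$loglik) best <- fit
  }
  # Long EM run from the best short run, stopped by maxit or by tol
  em_exp_mix(y, best$pi_g, best$rate, iters = maxit, tol = tol)
}

set.seed(42)
y <- c(rexp(200, rate = 0.5), rexp(100, rate = 5))
fit <- emEM_exp(y, G = 2, starts = 20)
fit$rate
```

Short runs are inexpensive, so many starts can be screened before committing to the full `maxit` budget from the most promising one.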
Returns a list with the parameter estimates and classifications for the best model according to the selection criterion `crit`. With `returnall = TRUE`, the results for every value in `G` are returned.
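To illustrate how classifications and the two criteria are typically obtained from a fitted mixture, here is a minimal sketch. Posterior membership probabilities come from the component log-densities and mixing proportions, labelled observations in `known` are fixed to their group, and classification is by maximum a posteriori (MAP). The sketch uses the "larger is better" convention BIC = 2 log L − p log n and ICL = BIC + 2 Σᵢ log ẑᵢ,MAP. These are assumptions, because this documentation gives neither the package's sign convention nor its parameter count. The helpers `posterior_z` and `bic_icl` and all input values are hypothetical.

```r
# Hypothetical helper: posterior membership probabilities.
#   logdens : n x G matrix of component log-densities log f_g(x_i)
#   pi_g    : mixing proportions
#   known   : 0 = unlabelled, otherwise the observation's group number
posterior_z <- function(logdens, pi_g, known = NULL) {
  lw <- sweep(logdens, 2, log(pi_g), "+")
  z <- exp(lw - apply(lw, 1, max))  # subtract row maxima for numerical stability
  z <- z / rowSums(z)
  if (!is.null(known) && any(known > 0)) {
    lab <- which(known > 0)
    z[lab, ] <- 0
    z[cbind(lab, known[lab])] <- 1  # labelled rows are fixed to their group
  }
  z
}

# Hypothetical helper: BIC and ICL in the "larger is better" convention (assumption)
bic_icl <- function(loglik, npar, z) {
  n <- nrow(z)
  bic <- 2 * loglik - npar * log(n)
  map <- max.col(z, ties.method = "first")
  icl <- bic + 2 * sum(log(z[cbind(seq_len(n), map)]))
  c(BIC = bic, ICL = icl)
}

# Toy inputs: 3 observations, 2 groups; the third observation is labelled as group 2
logdens <- log(matrix(c(0.2,  0.1,
                        0.05, 0.3,
                        0.1,  0.1), ncol = 2, byrow = TRUE))
z <- posterior_z(logdens, pi_g = c(0.5, 0.5), known = c(0, 0, 2))
max.col(z)  # MAP classification: 1 2 2
bic_icl(loglik = -120.5, npar = 5, z = z)
```

Because ICL adds a penalty for uncertain assignments, it can favour fewer, better-separated groups than BIC does.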
Michael P.B. Gallaugher and Paul D. McNicholas (2019). Clustering and semi-supervised classification for clickstream data via mixture models. arXiv preprint arXiv:1802.04849v2.
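```r
library(gtools)
data(SimData)
x <- SimData[[1]]  # list of state sequences
t <- SimData[[2]]  # list of times
# Fit a two-group model using 10 random starts for the emEM initialization
Click_2G <- ClickClust_EM(x = x, t = t, J = 5, G = 2, starts = 10)
```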
This is a revised version of the MSNBC323 dataset in the R package ClickClust (Melnykov, 2016). It contains the clickstreams without within-state transitions, `x`, and simulated time points, `t`. See Gallaugher and McNicholas (2019) for further details.
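"Without within-state transitions" means that consecutive repeats of the same state are collapsed into a single visit. A minimal sketch of that step on one toy sequence follows. It assumes the times within a collapsed run are summed. Because the mMSNBC times were simulated, this illustrates the idea only and does not reproduce how the dataset was built. The helper `collapse_within_state` is hypothetical.

```r
# Hypothetical preprocessing helper: collapse runs of a repeated state into a
# single visit and add up the time spent in that run.
collapse_within_state <- function(states, times) {
  stopifnot(length(states) == length(times))
  r <- rle(states)                                 # runs of identical states
  run_id <- rep(seq_along(r$lengths), r$lengths)   # run index of each element
  list(states = r$values,
       times  = as.vector(tapply(times, run_id, sum)))
}

collapse_within_state(states = c(1, 1, 2, 2, 2, 3, 1),
                      times  = c(0.5, 1.0, 0.2, 0.3, 0.5, 2.0, 1.0))
# states: 1 2 3 1; times: 1.5 1.0 2.0 1.0
```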
data(mMSNBC)
An object of class `list` of length 2.
Michael P.B. Gallaugher and Paul D. McNicholas (2019). Clustering and semi-supervised classification for clickstream data via mixture models. arXiv preprint arXiv:1802.04849v2.
Volodymyr Melnykov (2016). ClickClust: An R Package for Model-Based Clustering of Categorical Sequences. Journal of Statistical Software 74(9), 1-34.
This is a simulated dataset with two groups. It is a list whose first element is the list of states and whose second element is the list of time stamps. It is an example of the simulated data used in Simulation 1B of Gallaugher and McNicholas (2019).
data(SimData)
An object of class `list` of length 2.
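The sketch below builds a small toy object with the same two-element shape as `SimData` and computes pooled first-order transition counts over `J` states. Row-normalizing these counts gives the standard maximum likelihood estimate of the transition probabilities for a single discrete first-order chain. In a mixture, each sequence's counts would instead be weighted by its posterior group probabilities. The toy values, the requirement that each time vector matches the length of its state sequence, and the helper `transition_counts` are assumptions for illustration, not the package's code.

```r
# Hypothetical toy object shaped like SimData:
# element 1 = list of state sequences, element 2 = list of times
toy <- list(
  list(c(1, 2, 3), c(2, 1), c(3, 1, 2, 1)),
  list(c(0.4, 1.1, 0.7), c(2.0, 0.3), c(0.5, 0.9, 1.2, 0.1))
)
x <- toy[[1]]
t <- toy[[2]]
# Assumed consistency check: one time per visited state
stopifnot(length(x) == length(t), all(lengths(x) == lengths(t)))

# Hypothetical helper: pooled counts n[j, k] of jumps from state j to state k
transition_counts <- function(x, J) {
  from <- unlist(lapply(x, function(s) s[-length(s)]))
  to   <- unlist(lapply(x, function(s) s[-1]))
  unclass(table(factor(from, levels = seq_len(J)),
                factor(to,   levels = seq_len(J))))
}

counts <- transition_counts(x, J = 3)
# Row-normalized counts; rows with no outgoing jumps are left as zeros
P_hat <- counts / pmax(rowSums(counts), 1)
P_hat
```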
Michael P.B. Gallaugher and Paul D. McNicholas (2019). Clustering and semi-supervised classification for clickstream data via mixture models. arXiv preprint arXiv:1802.04849v2.