Title: | Clustering via Stochastic Approximation and Gaussian Mixture Models |
---|---|
Description: | Computes clustering by fitting Gaussian mixture models (GMM) via stochastic approximation following the methods of Nguyen and Jones (2018) <doi:10.1201/9780429446177>. It also provides some test data generation and plotting functionality to assist with this process. |
Authors: | Andrew T. Jones, Hien D. Nguyen |
Maintainer: | Andrew T. Jones <[email protected]> |
License: | GPL-3 |
Version: | 0.2.4 |
Built: | 2024-11-14 06:40:29 UTC |
Source: | CRAN |
Generate a series of gain factors.
gainFactors(Number, Burnin)
Number |
Number of values required. |
Burnin |
Number of 'Burnin' values at the beginning of sequence. |
Gamma, a vector of gain factors.
g<-gainFactors(10^4, 2*10^3)
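To illustrate what a gain-factor sequence does in a stochastic approximation scheme, here is a minimal base R sketch. It assumes the textbook gain `1/n`, which is not necessarily the sequence returned by `gainFactors`; the names `x`, `gain` and `theta` are hypothetical and chosen only for this example. With that gain, the recursive update reproduces the running sample mean.

```r
# Sketch only: a Robbins-Monro style update with an assumed 1/n gain.
# This is NOT the sequence produced by gainFactors(); it only shows
# how a vector of gains weights each incoming observation.
set.seed(1)
x <- rnorm(1000, mean = 3)

gain <- 1 / seq_along(x)  # assumed gain sequence, one value per observation
theta <- 0                # initial estimate

for (n in seq_along(x)) {
  # Move the estimate towards the new observation by a fraction gain[n].
  theta <- theta + gain[n] * (x[n] - theta)
}

# With gain 1/n the recursion equals the sample mean (up to rounding).
all.equal(theta, mean(x))
```

Larger gains make the estimate react more strongly to each new observation, while smaller gains stabilise it, which is why a burn-in period is often given a different gain from the rest of the sequence.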
This function is primarily a convenience wrapper for MixSim.
generateSimData(ngroups = 5, Dimensions = 5, Number = 10^4)
ngroups |
Number of mixture components. Default 5. |
Dimensions |
Number of dimensions. Default 5. |
Number |
Number of samples. Default 10^4. |
List of results: X, Y, simobject.
sims<-generateSimData(ngroups=10, Dimensions=10, Number=10^4)
sims<-generateSimData()
The SAGMM package fits Gaussian mixture models by stochastic approximation, which improves efficiency on large data sets. The primary function, SAGMMFit, allows this to be performed in a relatively flexible manner.
Andrew T. Jones and Hien D. Nguyen
Nguyen & Jones (2018). Big Data-Appropriate Clustering via Stochastic Approximation and Gaussian Mixture Models. In Data Analytics (pp. 79-96). CRC Press.
Fit a GMM via Stochastic Approximation. See Reference.
SAGMMFit(X, Y = NULL, Burnin = 5, ngroups = 5, kstart = 10, plot = FALSE)
X |
numeric matrix of the data. |
Y |
Group membership (if known). Where groups are integers in 1:ngroups. If provided ngroups can |
Burnin |
Ratio of observations to use as a burn-in before the algorithm begins. |
ngroups |
Number of mixture components. If Y is provided and ngroups is not, ngroups is overridden by Y. |
kstart |
Number of k-means starts used for initialisation. |
plot |
If TRUE generates a plot of the clustering. |
A list containing
Cluster |
The clustering of each observation. |
plot |
A plot of the clustering (if requested). |
l2 |
Estimate of Lambda^2 |
ARI1 |
Adjusted Rand Index 1 - using k-means |
ARI2 |
Adjusted Rand Index 2 - using GMM Clusters |
ARI3 |
Adjusted Rand Index 3 - using initialisation k-means |
KM |
Initial K-means clustering of the data. |
pi |
The cluster proportions (vector of length ngroups) |
tau |
tau matrix of conditional probabilities. |
fit |
Full output details from inner C++ loop. |
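The ARI entries in the returned list compare one labelling of the observations with another. As a rough illustration of what such an index measures, the following base R sketch computes the standard adjusted Rand index from a contingency table. `adjusted_rand_index` is a hypothetical helper written for this example; it is not part of SAGMM, and the package may compute its ARI values by other means.

```r
# Sketch only: standard adjusted Rand index between two labellings.
# adjusted_rand_index() is a hypothetical helper, not a SAGMM function.
adjusted_rand_index <- function(a, b) {
  tab <- table(a, b)                          # contingency table of the labels
  n <- sum(tab)
  sum_cells <- sum(choose(tab, 2))            # pairs grouped together in both
  sum_rows <- sum(choose(rowSums(tab), 2))    # pairs grouped together in a
  sum_cols <- sum(choose(colSums(tab), 2))    # pairs grouped together in b
  expected <- sum_rows * sum_cols / choose(n, 2)
  max_index <- (sum_rows + sum_cols) / 2
  (sum_cells - expected) / (max_index - expected)
}

# Same partition with swapped label names: perfect agreement.
adjusted_rand_index(c(1, 1, 2, 2), c(2, 2, 1, 1))  # 1
```

The index does not depend on how the clusters are numbered, so it can compare a fitted clustering with known group labels even when the label values differ.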
Andrew T. Jones and Hien D. Nguyen
Nguyen & Jones (2018). Big Data-Appropriate Clustering via Stochastic Approximation and Gaussian Mixture Models. In Data Analytics (pp. 79-96). CRC Press.
sims<-generateSimData(ngroups=10, Dimensions=10, Number=10^4)
res1<-SAGMMFit(sims$X, sims$Y)
res2<-SAGMMFit(sims$X, ngroups=5)
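Once a fit has been obtained, the returned list can be inspected by name. The sketch below assumes the SAGMM package is installed and relies only on the list components documented above (`Cluster`, `pi`, `ARI2`); the exact shape of each component is not checked here.

```r
# Sketch only: inspecting a fit, assuming SAGMM is installed.
library(SAGMM)

sims <- generateSimData(ngroups = 5, Dimensions = 5, Number = 10^4)
res <- SAGMMFit(sims$X, sims$Y)

table(res$Cluster)  # number of observations assigned to each cluster
res$pi              # estimated cluster proportions
res$ARI2            # documented as the adjusted Rand index using the GMM clusters
```

Because `Y` is supplied here, the ARI entries can be used to judge how well the fitted clustering recovers the simulated groups.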