--- title: "Unsupervised clustering with general GMCMs" author: "Anders Ellern Bilgrau" date: "2019-08-28" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Unsupervised clustering with general GMCMs} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r knit-int, echo=FALSE, include=FALSE} set.seed(7869670) knitr::opts_knit$set(self.contained = TRUE) ``` This is a quick tutorial for using the general GMCM for unsupervised clustering. ## Initialization The **GMCM**^[1][1]^ package is loaded. ```{r load-packages, include=TRUE} #install.packages("GMCM") # Uncomment to install the GMCM package library("GMCM") ``` If **GMCM** is *not* installed, please uncomment the above line and rerun the script to install from CRAN. The development version can be installed from [GitHub following the instructions there](https://github.com/AEBilgrau/GMCM). ## Simulation of data First, we simulate some toy data. We wish to simulate, say, 1000 observations of 2-dimensions each of which stems from one of 3 components. In order to do so, we construct a parameter object `theta` and simulate from the GMCM. ```{r sim-data, include=TRUE, echo=TRUE} true_theta <- rtheta(m = 3, d = 2) plot(true_theta) ds <- SimulateGMCMData(n = 1000, theta = true_theta) str(ds) ``` As can be seen, the `SimulateGMCMData` function returns a `list` containing the copula observations `u`, the unobserved process `z`, the latent groups `K`, and the `theta` used for simulation. Plotting the true copula realizations and coloring by the true classes shows what we intend to estimate and recover: ```{r data-plot} plot(ds$u, col = ds$K) ``` A ranking of `u` (or `z`) corresponds to what would be the data in a real application. ```{r select-data, include=TRUE, echo=TRUE} uhat <- Uhat(ds$u) plot(uhat) ``` ## Initial parameters To fit a general GMCM, we must choose some initial parameters. The `choose.theta` is a (sometimes) helpful default we here invoke explicitly: ```{r show-initial-params, include=TRUE, echo=TRUE} init_theta <- choose.theta(uhat, m = 3) print(init_theta) ``` The function needs to know how many components we want to estimate though this number may very well be unknown in practice. ## Model fitting With the data loaded and defined initial parameters, the model is now fitted. ```{r fit_model, error=TRUE} est_theta <- fit.full.GMCM(u = uhat, # Ranking function is applied automatically theta = init_theta, method = "NM", max.ite = 5000, verbose = FALSE) print(est_theta) ``` The fitting method is set to `"NM"` with a maximum number of iterations of `100`. ## Unsupervised clustering The estimated parameters are used to calculated posterior component probabilities on which the classification is based: ```{r compute_probs} membership <- classify(uhat, est_theta) str(membership) ``` If it is of interest, the posterior probability can be computed directly using ```{r post_prob} post_prob <- get.prob(uhat, est_theta) # Compute component probabilities ``` ## Results The number of observations in each class can be e.g. counted by ```{r classes_table} table(membership) ``` The results are also displayed by plotting ```{r plot_results} par(mfrow = c(1,2)) plot(uhat, col = membership, asp = 1) # Plot of estimated copula values z <- GMCM:::qgmm.marginal(uhat, theta = est_theta) # Estimate latent process plot(z, col = membership, asp = 1) # Plot of estimated latent process ``` The fitted `theta` object can also be plotted directly: ```{r plot_theta} plot(est_theta) ``` ### Session information This report was generated using **rmarkdown**^[2][2]^ and **knitr**^[3][3]^ under the session given below. ```{r session-info} sessionInfo() ``` ### References Please cite the **GMCM** paper^[1][1]^ if you use the package or shiny app. ```{r citation, echo=FALSE, results='asis'} cites <- lapply(c("GMCM", "knitr", "rmarkdown"), citation) fmt_cites <- unlist(lapply(cites, format, style = "text")) cat(paste0(" ", seq_along(fmt_cites), ". ", fmt_cites, "\n")) ``` [1]: http://doi.org/10.18637/jss.v070.i02 [2]: https://bookdown.org/yihui/rmarkdown/parameterized-reports.html [3]: https://yihui.name/knitr/demo/stitch/