Title: | Monitor Changes in Cluster Solutions of Dynamic Datasets |
---|---|
Description: | Monitor and trace changes in clustering solutions of accumulating datasets at successive time points. Clusters can undergo External and Internal transitions at succeeding time points. External transitions comprise Survived, Merged, Split, Disappeared, and newly Emerged candidates. Internal transitions, in contrast, are changes in the location and cohesion of the survived clusters. The package uses the MONIC framework developed by Spiliopoulou, Ntoutsi, Theodoridis, and Schult (2006) <doi:10.1145/1150402.1150491>. |
Authors: | Muhammad Atif |
Maintainer: | Muhammad Atif <[email protected]> |
License: | GPL-3 |
Version: | 1.0 |
Built: | 2024-11-20 06:40:08 UTC |
Source: | CRAN |
Partition data into clusters
Object of class Clustering containing the clustering solution of the cumulative dataset D_i. An object of class Clustering comprises four slots: slot Clusters contains the data items of each cluster, slot Centers contains the cluster centers, slot k contains the number of centers, and slot clusterMem contains the cluster membership vector.
Cluster
List of matrices, where each element of the list contains the data items belonging to the corresponding cluster.
Centers
Matrix of cluster centers.
k
Number of centers.
clusterMem
Numeric vector of cluster membership.
Initialize slots of class Clustering
by partitioning the dataset into k
clusters.
Clusters(object, x, k)
## S4 method for signature 'Clustering,matrix,numeric'
Clusters(object, x, k)
object |
An object of class |
x |
Numeric matrix of data. |
k |
Number of centers. |
Runs the cclust function from the "flexclust" package with default settings, i.e. method = "kmeans" and dist = "euclidean", to partition the dataset. Returns an object of class Clustering.
An object of class Clustering
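As a rough illustration of what the four slots might hold, the sketch below partitions a small matrix and collects the pieces in a plain list. It rests on assumptions: it uses base R's `stats::kmeans` as a stand-in for `flexclust::cclust`, it returns an ordinary list rather than an S4 object, and the helper name `partition_sketch` is hypothetical, not part of the package.

```r
# Hypothetical helper: mimics the four slots with a plain list (not the S4 class).
# ASSUMPTION: stats::kmeans is used here instead of flexclust::cclust.
partition_sketch <- function(x, k) {
  fit <- kmeans(x, centers = k)
  list(
    # one matrix of data items per cluster
    Cluster    = lapply(seq_len(k), function(i) x[fit$cluster == i, , drop = FALSE]),
    Centers    = fit$centers,             # matrix of cluster centers
    k          = k,                       # number of centers
    clusterMem = as.numeric(fit$cluster)  # membership of every row of x
  )
}

set.seed(1)
# two well-separated groups of 20 points each
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2))
res <- partition_sketch(x, k = 2)
sapply(res$Cluster, nrow)  # cluster sizes
```

The list of per-cluster matrices is the same shape that the examples later in this manual build by hand before passing clusters on.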
A list of datasets generated at four time points containing two variables and cluster membership at each point.
Data2D
A data frame
X1.
X2.
Class membership.
A list of datasets generated at four time points containing three variables and cluster membership at each point.
Data3D
A data frame
X1.
X2.
X3.
Class membership.
This S4 method traces cluster solutions of a dynamic dataset and identifies the candidates that experienced an external transition from the first clustering and those that emerged at the second clustering.
extTransitionCan(object)
## S4 method for signature 'TransitionCan'
extTransitionCan(object)
object |
An object of class TransitionCan |
Returns an object of class TransitionCan
Trace cluster solutions of dynamic datasets and count the number of clusters that experience an external transition from the first clustering. External transitions include clusters that survived, split into several daughter clusters, merged into one, disappeared, or newly emerged.
extTransitionCount(object)
## S4 method for signature 'TransitionCount'
extTransitionCount(object)
object |
An object of class |
Returns an object of class TransitionCount
This method identifies internal transitions of the survived clusters obtained from the 'extTransitionCan()' method.
Trace clustering solutions of cumulative datasets and identify the survived clusters that experience Internal transitions. Internal transitions are changes in the location and density of the survived candidates.
internalTransition(object)
## S4 method for signature 'intTransitionCan'
internalTransition(object)
object |
An object of class |
Returns an object of class intTransitionCan
Class containing the results of the Internal Transition of survived clusters from the first clustering.
object |
An object of class TransitionCan |
Location.diff
Vector of integers containing the difference in location (= distance between cluster centers / min(rx, ry)).
Compactness.diff
Vector of integers containing the change in density of survived clusters (d(rx, ry)).
Location_thrHold
Minimum value of threshold for shift in location.
Density_thrHold
Minimum value of threshold for change in density.
ShiftLocCan
Vector of integers containing survived candidates with a shift in their location.
NoShiftLocCan
Vector of integers containing survived candidates with no shift in their location.
MoreCompactCan
Vector of integers containing survived candidates that become more compact.
MoreDiffuseCan
Vector of integers containing survived candidates that become more diffuse.
NoChangeCompactCan
Vector of integers containing survived candidates with no change in compactness.
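The location formula quoted for Location.diff can be worked through for a single survived cluster as in the sketch below. Several points are assumptions rather than the package's behaviour: the helper name `internal_sketch` is hypothetical, d(rx, ry) is taken to be the plain difference of radii, and a value equal to a threshold is treated as a change.

```r
# Hypothetical helper: internal transition of ONE survived cluster.
# cx, cy: cluster centers at the first and second clustering; rx, ry: radii.
internal_sketch <- function(cx, cy, rx, ry, loc_thr = 0.3, den_thr = 0.3) {
  # distance between cluster centers / min(rx, ry), as quoted for Location.diff
  loc_diff <- sqrt(sum((cx - cy)^2)) / min(rx, ry)
  # ASSUMPTION: d(rx, ry) is read as the plain difference of the two radii
  comp_diff <- ry - rx
  list(
    Location.diff    = loc_diff,
    Compactness.diff = comp_diff,
    shifted          = loc_diff >= loc_thr,  # ASSUMPTION: ">=" rather than ">"
    compactness      = if (comp_diff <= -den_thr) "more compact"
                       else if (comp_diff >= den_thr) "more diffuse"
                       else "no change"
  )
}

# centers are 1 unit apart; the smaller radius is 2, so Location.diff is about 0.5
out <- internal_sketch(cx = c(0, 0), cy = c(0.6, 0.8), rx = 2, ry = 2.5)
out$shifted      # TRUE, since 0.5 exceeds the 0.3 threshold
out$compactness  # "more diffuse", since the radius grew by 0.5
```

Dividing by the smaller radius makes the shift relative to cluster size, so the same absolute displacement counts for more in a tight cluster than in a wide one.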
An S4 class that contains time steps
object |
An object of class |
TimeStep
Time Steps
This method draws three bar plots and one line graph. The first, a stacked bar plot, shows the SurvivalRatio and AbsorptionRatio; the second shows the number of newly emerged clusters at each time stamp; the third shows the number of disappeared clusters at each time stamp. The line graph shows the passforwardRatio and SurvivalRatio.
moplot(object)
## S4 method for signature 'Monic'
moplot(object)
object |
An object of class Monic |
Initialize slots of class OverLap by importing clustering solutions of dynamic datasets at two consecutive time points. Clusters at each time point should be provided as a list of matrices, where each matrix contains the data items that belong to the corresponding cluster.
Overlap(object, e1, e2)
## S4 method for signature 'OverLap,Clustering,Clustering'
Overlap(object, e1, e2)
## S4 method for signature 'OverLap,ANY,ANY'
Overlap(object, e1, e2)
object |
An object of class |
e1 |
An object of class |
e2 |
An object of class |
Returns an object of class OverLap.
Contains the matrix of similarity indices between clusters, obtained after clustering dynamic datasets at consecutive time points.
Overlap
A numeric matrix containing the similarity index between clusters extracted at time points t_1 and t_2. The rows of the matrix represent clusters extracted from the first clustering, whereas the columns represent clusters extracted from the second clustering.
rx
A numeric vector containing the radius of each cluster from the first clustering.
ry
A numeric vector containing the radius of each cluster from the second clustering.
Centersx
A numeric vector containing the centers of clusters from the first clustering.
Centersy
A numeric vector containing the centers of clusters from the second clustering.
avgDisx
A numeric vector containing the average distance of the points in a cluster from its center in the first clustering.
avgDisy
A numeric vector containing the average distance of the points in a cluster from its center in the second clustering.
clusterMem
A vector of integers containing cluster membership from the second clustering.
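To make the row and column layout of the Overlap matrix concrete, the sketch below computes a simple similarity index between two clusterings given as lists of matrices. It is only an illustration under stated assumptions: the index is the share of a first-clustering cluster's rows that reappear in a second-clustering cluster (a MONIC-style overlap without age weights), which may differ from the index the package computes, and the helper name `overlap_sketch` is hypothetical.

```r
# Hypothetical helper: rows = clusters of the first clustering,
# columns = clusters of the second clustering.
# ASSUMPTION: overlap = |X intersect Y| / |X|, with data rows matched by value.
overlap_sketch <- function(clus1, clus2) {
  key <- function(m) apply(m, 1, paste, collapse = "_")  # one string key per data row
  k1 <- lapply(clus1, key)
  k2 <- lapply(clus2, key)
  out <- matrix(0, nrow = length(k1), ncol = length(k2))
  for (i in seq_along(k1)) {
    for (j in seq_along(k2)) {
      out[i, j] <- sum(k1[[i]] %in% k2[[j]]) / length(k1[[i]])
    }
  }
  out
}

a <- matrix(1:8, ncol = 2)               # rows (1,5) (2,6) (3,7) (4,8)
b <- matrix(c(11:14, 15:18), ncol = 2)   # rows (11,15) ... (14,18)
clus1 <- list(a, b)                                      # first clustering
clus2 <- list(a[1:3, ], rbind(a[4, , drop = FALSE], b))  # second clustering
ov <- overlap_sketch(clus1, clus2)
ov  # first row: 0.75 0.25; second row: 0 1
```

Each row sums to at most 1 here because every data item of a first-clustering cluster lands in at most one cluster of the second clustering.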
Show Method for output
## S4 method for signature 'Monic'
show(object)
object |
An object of class Monic |
Model and trace the evolution of clusters over time in cumulative datasets. A typical call to the Transition() function involves three essential pieces: the data input (listdata, listclus, or Overlap), the choice of window size swSize, and the threshold parameters. The function receives one of the following: a list of datasets arriving at time points t_1, t_2, t_3, ..., t_n, a list of clustering solutions extracted from cumulative datasets at successive time points, or a list of objects of class OverLap (see Details).
Transition(
  listdata,
  swSize = 1,
  Overlap = NULL,
  listclus = NULL,
  typeind = 1,
  Survival_thrHold = 0.7,
  Split_thrHold = 0.3,
  location_thrHold = 0.3,
  density_thrHold = 0.3,
  k = NULL
)
listdata |
List of numeric matrices containing datasets |
swSize |
Integer value (1, length(listdata)) indicating the size of the sliding window. As time goes
by, each window consists only of objects that fall in the interval [t-swSize+1, t], while older objects
are discarded. The default value of |
Overlap |
A list of objects as produced by the |
listclus |
|
typeind |
Type indicator. |
Survival_thrHold |
A numeric value (0,1) indicating minimum threshold value for survival of clusters. |
Split_thrHold |
A numeric value (0,1) indicating minimum threshold value for split of clusters. |
location_thrHold |
A numeric value (0,1) indicating minimum threshold value for shift in location of survived clusters. |
density_thrHold |
A numeric value (0,1) indicating minimum threshold value for changes in density of Survived clusters. |
k |
Numeric Vector of length |
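The sliding-window interval given for swSize can be read literally as in the sketch below, which stacks the datasets whose time index falls in [t-swSize+1, t]. This is a reading of the interval only, not the package's internal code, and the helper name `window_data` is hypothetical.

```r
# Hypothetical helper: combine the datasets inside the window [t - swSize + 1, t].
window_data <- function(listdata, t, swSize) {
  first <- max(1, t - swSize + 1)  # the window cannot start before the first time point
  do.call(rbind, listdata[first:t])
}

# three toy datasets with 2, 3 and 4 rows
listdata <- list(matrix(1, 2, 2), matrix(2, 3, 2), matrix(3, 4, 2))
nrow(window_data(listdata, t = 3, swSize = 1))  # 4: only the dataset at t = 3
nrow(window_data(listdata, t = 3, swSize = 2))  # 7: datasets at t = 2 and t = 3
nrow(window_data(listdata, t = 3, swSize = 3))  # 9: all three datasets
```

With swSize equal to length(listdata) nothing is ever discarded, which corresponds to a fully cumulative dataset.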
The Transition() function applies the 'MONIC' algorithm presented by Spiliopoulou et al. (2006) to trace changes in cluster solutions of dynamic datasets. The changes include two types of transition: External and Internal. External transitions consist of 'Survive', 'Split', 'Merge', 'Disappeared' and 'newly emerged' candidates, while Internal transitions consist of changes in the location and cohesion of the survived clusters. The listdata argument allows the user to import dynamic datasets as a list of matrices or data frames, where each element of the list contains the dataset at a single time point. Each dataset is clustered by the 'kmeans' algorithm using the default settings of the cclust() function from the flexclust package. The number of clusters at each time stamp can be supplied through the k argument of the function, which is a vector of integers giving the number of partitions in the corresponding datasets of the listdata argument. Once the datasets are clustered, the 'Overlap' matrices between clusterings at consecutive time stamps are calculated. The Overlap matrix is calculated using the algorithm presented by Ntoutsi et al. (2012). These 'Overlap' matrices are used to trace the transitions that occurred in the cluster solutions.
Alternatively, the user can directly import a list of 'Overlap' matrices between consecutive clusterings. The Overlap matrix can be calculated using the Overlap(obj, e1, e2) method of the package, where 'obj' is an object of class OverLap and e1, e2 are any clusterings at time stamps i and j respectively.
As a third option, the user can provide a list of clusters at each time point through the listclus argument. Each element of listclus is a nested list that holds the clusters at a single time stamp.
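How the two external thresholds could turn an overlap matrix into transition labels is sketched below. This is a simplified reading of the MONIC rules, not the package's implementation: the helper name `classify_sketch` is hypothetical, and the exact tie-breaking and comparison operators are assumptions.

```r
# Hypothetical helper: label each first-clustering cluster from an overlap matrix
# (rows = first clustering, columns = second clustering).
classify_sketch <- function(ov, surv_thr = 0.7, split_thr = 0.3) {
  best     <- apply(ov, 1, which.max)  # best-matching column for every row
  best_val <- apply(ov, 1, max)
  matched  <- ifelse(best_val >= surv_thr, best, NA)  # NA when no match is strong enough
  status   <- character(nrow(ov))
  for (i in seq_len(nrow(ov))) {
    if (!is.na(matched[i])) {
      # several rows matched to the same column are treated as a merge
      status[i] <- if (sum(matched == matched[i], na.rm = TRUE) > 1) "Merged" else "Survived"
    } else {
      # a split needs at least two sizeable parts that together cover enough of the cluster
      parts <- which(ov[i, ] >= split_thr)
      status[i] <- if (length(parts) >= 2 && sum(ov[i, parts]) >= surv_thr) "Split" else "Died"
    }
  }
  # columns not explained by a survival, merge or split are newly emerged
  split_cols <- unlist(lapply(which(status == "Split"),
                              function(i) which(ov[i, ] >= split_thr)))
  used <- unique(c(matched[!is.na(matched)], split_cols))
  list(status = status, emerged = setdiff(seq_len(ncol(ov)), used))
}

ov <- rbind(c(0.9, 0.0, 0.0, 0.0),   # strong match with column 1
            c(0.0, 0.5, 0.4, 0.0),   # spread over columns 2 and 3
            c(0.1, 0.1, 0.1, 0.0))   # no sizeable match anywhere
res <- classify_sketch(ov)
res$status   # "Survived" "Split" "Died"
res$emerged  # 4
```

Raising Survival_thrHold makes survival harder to achieve, so more clusters are reported as split or disappeared for the same overlap matrix.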
Returns a list of class Monic.
Survive |
Number of clusters survived. |
Merged |
Number of clusters merged. |
Split |
Number of clusters split. |
Died |
Number of clusters disappeared. |
new.Emerged |
Number of newly emerged clusters, which are not the result of any external transition. |
SurvivalCanx |
A vector of integers indicating candidates from the first clustering that survived to the later time stamp. |
SurvivalCany |
A vector of integers indicating candidates of the second clustering that correspond to the surviving candidates from the first clustering. |
SplitCanx |
A vector of integers indicating candidate(s) from the first clustering that split into several daughter clusters. |
SplitCany |
List of integer vector(s) designating candidates that appeared as a result of splits from the first clustering. |
MergeCanx |
List of integer vector(s) designating candidates that merged together to form new clusters. Each element of the list gives the candidates that merge together to form one cluster. |
MergeCany |
Vector of integers designating candidates that emerged as a result of the merger of different candidates from the first clustering. |
EmergCan |
Vector of integers containing newly emerged candidates, which are not the result of any external transition. |
SurvivalRatio |
The ratio of the number of survived clusters at the second clustering to the total number of clusters at the first clustering. |
AbsorptionRatio |
The ratio of the number of merged clusters to the total number of clusters at the first clustering. |
passforwardRatio |
Sum of SurvivalRatio and AbsorptionRatio. This gives the ratio of clusters that are also present at the second clustering, either through survival or absorption. |
Overlap |
A numeric matrix containing the overlap of the two clusterings. The rows of the matrix indicate the first clustering, while the columns indicate the second clustering. |
Centersx |
A matrix of cluster centers from the first clustering. |
Centersy |
A matrix of cluster centers from the second clustering. |
rx |
A numeric vector containing radius of each cluster from first clustering. |
ry |
A numeric vector containing radius of each cluster from second clustering. |
avgDisx |
A numeric vector containing average distance of points in a cluster from its center in first clustering. |
avgDisy |
A numeric vector containing average distance of points in a cluster from its center in second clustering. |
ShiftLocCan |
A vector of integers comprising survived candidates with a shift in location. |
NoShiftLocCan |
A vector of integers comprising survived candidates with no shift in location. |
MoreCompactCan |
A vector of integers comprising survived candidates that become more compact. |
MoreDiffuseCan |
A vector of integers comprising survived candidates that become more diffuse. |
NoChangeCompactCan |
A vector of integers comprising survived candidates with no change in compactness. |
Location.diff |
A numeric vector containing the distance between the centers of survived clusters. |
Compactness.diff |
A numeric vector containing the difference in compactness of survived clusters. |
Cluster_Tracex |
A vector containing the result for each cluster from the first clustering. |
Cluster_Tracey |
A vector containing the result for each cluster from the second clustering. |
clusterMem |
A vector of integers (from 1 to k) indicating the cluster to which each point is allocated in the second clustering. |
Spiliopoulou, M., Ntoutsi, I., Theodoridis, Y., Schult, R. MONIC: modeling and monitoring cluster transitions. In: Eliassi-Rad, T., Ungar, L. H., Craven, M., Gunopulos, D. (eds.) ACM SIGKDD 2006, pp. 706-711. ACM, Philadelphia (2006).
### Example 1: typeind = 1 (listdata Argument)
d1 <- Data2D[[1]][c("X1", "X2")]
d2 <- Data2D[[2]][c("X1", "X2")]
d3 <- Data2D[[3]][c("X1", "X2")]
listdata <- list(d1, d2, d3)
p <- Transition(listdata = listdata, swSize = 1, typeind = 1,
                Survival_thrHold = 0.8, Split_thrHold = 0.3,
                density_thrHold = 0.3, location_thrHold = 0.3,
                k = c(3, 3, 2))

### Example 2: typeind = 3 (listclus Argument)
# Build the cumulative datasets at each time point
D1 <- d1
D2 <- merge(d1, d2, all.x = TRUE, all.y = TRUE)
D3 <- merge(D2, d3, all.x = TRUE, all.y = TRUE)
set.seed(10)
# Cluster each cumulative dataset and store the clusters as lists
f1 <- kmeans(D1, 3)
C1 <- list()
for (i in 1:3) C1[[i]] <- D1[f1$cluster == i, ]
f2 <- kmeans(D2, 3)
C2 <- list()
for (i in 1:3) C2[[i]] <- D2[f2$cluster == i, ]
f3 <- kmeans(D3, 2)
C3 <- list()
for (i in 1:2) C3[[i]] <- D3[f3$cluster == i, ]
listclus <- list(C1, C2, C3)
p <- Transition(listclus = listclus, typeind = 3,
                Survival_thrHold = 0.8, Split_thrHold = 0.3,
                density_thrHold = 0.3, location_thrHold = 0.3)

### Example 3: typeind = 2 (Overlap Argument)
obj <- new("OverLap")
Overlap1 <- Overlap(obj, e1 = C1, e2 = C2)
Overlap2 <- Overlap(obj, e1 = C2, e2 = C3)
Overlap <- list(Overlap1, Overlap2)
p <- Transition(Overlap = Overlap, typeind = 2,
                Survival_thrHold = 0.8, Split_thrHold = 0.3,
                density_thrHold = 0.3, location_thrHold = 0.3)
Class containing candidates that underwent an external transition from the first clustering and those that emerged as new clusters at the second clustering.
SurvivalCanx
Vector of integers comprising candidates that survive from the first clustering.
SurvivalCany
Vector of integers comprising candidates that survive to the second clustering.
SplitCanx
Vector of integers comprising candidates from the first clustering that split into several daughter clusters.
SplitCany
List of integer vectors comprising candidates that emerged as daughter clusters in the second clustering because of a split from the first clustering.
MergeCanx
List of integer vectors comprising candidates from the first clustering that are merged.
Each element of the list indicates the clusters from the first clustering that merge together.
MergeCany
Vector of integers comprising candidates that emerged in the second clustering because of the merging of various clusters from the first clustering.
EmergCan
Newly emerged candidates, which are not the result of any external transition from the first clustering.
Cluster_Tracey
Vector of cluster traces from the second clustering.
Traces cluster solutions of dynamic datasets at consecutive time points and counts the clusters that experience external transitions. External transitions include Survive, Split, Merge, newly emerged, and Died candidates.
Survive
Number of candidates that survive from the first clustering.
Split
Number of candidates from the first clustering that split into several daughter clusters at the second clustering.
Merge
Number of candidates from the first clustering that merge together at the second clustering.
Died
Number of candidates from the first clustering that disappeared at the second clustering.
SurvivalRatio
Ratio of survived clusters to the total number of clusters from the first clustering.
AbsorptionRatio
Ratio of merged clusters to the total number of clusters from the first clustering.
passforwardRatio
Sum of SurvivalRatio and AbsorptionRatio.
Survival_thrHold
Threshold for survival of clusters.
Split_thrHold
Threshold for split of clusters.
Cluster_Tracex
Vector containing the result for each cluster from the first clustering.
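The three ratio slots follow directly from the counts, as the short worked example below shows. The counts are made-up illustration values, not output of the package.

```r
# Made-up counts for illustration only (not package output)
n_first <- 5  # total number of clusters in the first clustering
survive <- 3  # clusters that survived
merged  <- 1  # clusters that were merged (absorbed)

SurvivalRatio    <- survive / n_first                # 3 / 5
AbsorptionRatio  <- merged / n_first                 # 1 / 5
passforwardRatio <- SurvivalRatio + AbsorptionRatio  # 4 / 5

c(SurvivalRatio, AbsorptionRatio, passforwardRatio)
```

In this example one of the five clusters is not passed forward, meaning it either split or disappeared.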