Title: | An L1-Version of the Spectral Clustering |
---|---|
Description: | Provides an l1-version of the spectral clustering algorithm devoted to robustly clustering highly perturbed graphs using l1-penalty. This algorithm is described with more details in the preprint C. Champion, M. Champion, M. Blazère, R. Burcelin and J.M. Loubes, "l1-spectral clustering algorithm: a spectral clustering method using l1-regularization" (2022). |
Authors: | Camille Champion [aut], Magali Champion [aut, cre] |
Maintainer: | Magali Champion <[email protected]> |
License: | GPL-2 |
Version: | 0.99.6 |
Built: | 2024-11-21 06:51:06 UTC |
Source: | CRAN |
Provides an l1-version of the spectral clustering algorithm devoted to robustly clustering highly perturbed graphs using l1-penalty. This algorithm is described with more details in the preprint C. Champion, M. Champion, M. Blazère, R. Burcelin and J.M. Loubes, "l1-spectral clustering algorithm: a spectral clustering method using l1-regularization" (2022).
l1-spectral clustering is an l1-penalized version of the spectral clustering algorithm, which aims at robustly detecting cluster structure of perturbed graphs by promoting sparse eigenbases solutions of specific l1-minimization problems.
The DESCRIPTION file:
Package: | l1spectral |
Title: | An L1-Version of the Spectral Clustering |
Version: | 0.99.6 |
Authors@R: | c(person("Camille", "Champion", role = "aut"), person("Magali", "Champion", role = c("aut","cre"),email="[email protected]" )) |
Description: | Provides an l1-version of the spectral clustering algorithm devoted to robustly clustering highly perturbed graphs using l1-penalty. This algorithm is described with more details in the preprint C. Champion, M. Champion, M. Blazère, R. Burcelin and J.M. Loubes, "l1-spectral clustering algorithm: a spectral clustering method using l1-regularization" (2022). |
License: | GPL-2 |
Imports: | Rcpp (>= 0.12.5), stats, dplyr, graphics, igraph, Matrix, aricode, grDevices, caret, glmnet, ggplot2, cvTools |
LinkingTo: | Rcpp, RcppArmadillo |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.1.2 |
NeedsCompilation: | yes |
Packaged: | 2022-01-26 16:29:51 UTC; mchampion |
Author: | Camille Champion [aut], Magali Champion [aut, cre] |
Maintainer: | Magali Champion <[email protected]> |
Repository: | CRAN |
Date/Publication: | 2022-01-26 17:12:46 UTC |
Config/pak/sysreqs: | libglpk-dev libicu-dev libxml2-dev |
Camille Champion [aut], Magali Champion [aut, cre]
C. Champion, M. Champion, M. Blazère, R. Burcelin, J.M. Loubes, l1-spectral clustering algorithm: a robust spectral clustering using Lasso regularization, Preprint (2021).
##################################################### # Performing the l1-spectral clustering on the graph ##################################################### data(ToyData) # if desired, the number of clusters and representative elements can be provided, # otherwise remove results2 <- l1_spectralclustering(A = ToyData$A_hat, pen = "lasso") results2$comm # when desired, the number of clusters and representative elements can also be provided results2 <- l1_spectralclustering(A = ToyData$A_hat, pen = "lasso", k=2, elements = c(1,4))
##################################################### # Performing the l1-spectral clustering on the graph ##################################################### data(ToyData) # if desired, the number of clusters and representative elements can be provided, # otherwise remove results2 <- l1_spectralclustering(A = ToyData$A_hat, pen = "lasso") results2$comm # when desired, the number of clusters and representative elements can also be provided results2 <- l1_spectralclustering(A = ToyData$A_hat, pen = "lasso", k=2, elements = c(1,4))
This function computes the performances of the l1-spectral clustering algorithm in terms of Normalized Mutualized Information (NMI).
ComputePerformances(Results, A)
ComputePerformances(Results, A)
Results |
Output of the function |
A |
The adjacency matrix of the graph to cluster. |
The Normalized Mutualized Information (NMI), Adjusted Mutualized Information (AMI) and Adjusted Rand Index (ARI) scores.
Camille Champion, Magali Champion
l1_spectralclustering
, l1spectral
.
############################################################# # Computing the performances ############################################################# data(ToyData) results <- l1_spectralclustering(A = ToyData$A_hat, pen = "lasso", k=2, elements = c(1,4)) ComputePerformances(Results=results,A=ToyData$A)
############################################################# # Computing the performances ############################################################# data(ToyData) results <- l1_spectralclustering(A = ToyData$A_hat, pen = "lasso", k=2, elements = c(1,4)) ComputePerformances(Results=results,A=ToyData$A)
This function generates toy data that can be used to run the l1-spectal clustering algorithm: the adjacency matrix of a graph with n
nodes and its perturbed version.
CreateDataSet(k, n, p, print.plot = TRUE, ClustersLength = NULL)
CreateDataSet(k, n, p, print.plot = TRUE, ClustersLength = NULL)
k |
True number of clusters. |
n |
Number of nodes. |
p |
List of probabilities of perturbations (inside and outside clusters). |
print.plot |
TRUE/FALSE indicated whether the graph should be plotted. |
ClustersLength |
Length of the |
A list with the following elements:
A
Adjacency matrix of the generated graph.
A_hat
Adjacency matrix of the perturbed version of the generated graph.
ClustersLength
Length of the k
clusters.
Camille Champion, Magali Champion
l1_spectralclustering
, l1spectral
.
############################################################# # Generating toy data ############################################################# Data <- CreateDataSet(k=3, n=20, p=list(p_inside=0.1,p_outside=0.1)) # Data is a list of three objects: # - Data$A is an nxn matrix corresponding to the adjacency matrix of a graph # with n nodes and k clusters, # - Data$A_hat is a perturbed version of this graph with a probability # p_inside of removing an edge inside clusters and # p_outside of adding an edge between clusters, # - Data$ClustersLength is a vector indicating the length of the clusters. Data <- CreateDataSet(k=3, n=20, p=list(p_inside=0.1,p_outside=0.1), print.plot=TRUE) # The same as above but the true graph and its perturbed version are both plotted.
############################################################# # Generating toy data ############################################################# Data <- CreateDataSet(k=3, n=20, p=list(p_inside=0.1,p_outside=0.1)) # Data is a list of three objects: # - Data$A is an nxn matrix corresponding to the adjacency matrix of a graph # with n nodes and k clusters, # - Data$A_hat is a perturbed version of this graph with a probability # p_inside of removing an edge inside clusters and # p_outside of adding an edge between clusters, # - Data$ClustersLength is a vector indicating the length of the clusters. Data <- CreateDataSet(k=3, n=20, p=list(p_inside=0.1,p_outside=0.1), print.plot=TRUE) # The same as above but the true graph and its perturbed version are both plotted.
This internal function of the l1-spectral clustering algorithm finds representative elements of the clusters, that is nodes belonging to the clusters.
FindElement(A, structure, clusters, elements = NULL)
FindElement(A, structure, clusters, elements = NULL)
A |
The adjacency matrix |
structure |
Output of the function |
clusters |
Output of the function |
elements |
The representative elements of the clusters (not necessary needed). If not provided, chosen using the betweeness centrality score. |
A list with the following elements:
score
The edge betweenness score of all nodes,
Nodes
Vector of the representative elements.
Camille Champion, Magali Champion
l1_spectralclustering
, l1spectral
.
###################################################### # Finding the representative elements of the clusters ###################################################### # 1st: create data (not perturbed graph) Data <- CreateDataSet(k=3, n=20, p=list(p_inside=0,p_outside=0)) # 2nd: find the structure of the graph Structure <- FindStructure(Data$A_hat) # 3rd: find the optimal number of clusters (here, 3 clusters) Clusters <- FindNbrClusters(A = Data$A_hat, structure = Structure, k=3) # 4th: find the representative elements of the clusters Elements <- FindElement(A = Data$A_hat, structure = Structure, clusters = Clusters) # if elements is not provided, the representative elements of each component are chosen # by maximizing the edge betweenness score Elements <- FindElement(A = Data$A_hat, structure = Structure, clusters = Clusters, elements = c(1,5,12))
###################################################### # Finding the representative elements of the clusters ###################################################### # 1st: create data (not perturbed graph) Data <- CreateDataSet(k=3, n=20, p=list(p_inside=0,p_outside=0)) # 2nd: find the structure of the graph Structure <- FindStructure(Data$A_hat) # 3rd: find the optimal number of clusters (here, 3 clusters) Clusters <- FindNbrClusters(A = Data$A_hat, structure = Structure, k=3) # 4th: find the representative elements of the clusters Elements <- FindElement(A = Data$A_hat, structure = Structure, clusters = Clusters) # if elements is not provided, the representative elements of each component are chosen # by maximizing the edge betweenness score Elements <- FindElement(A = Data$A_hat, structure = Structure, clusters = Clusters, elements = c(1,5,12))
This internal function of the l1-spectral algorithm finds the optimal number of clusters to build.
FindNbrClusters(A, structure, k = NULL, k_max = NULL)
FindNbrClusters(A, structure, k = NULL, k_max = NULL)
A |
The adjacency matrix |
structure |
Output of the function |
k |
True number of clusters (not necessarily needed). If not provided, k is chosen by spectral eigengap. |
k_max |
Maximal number of clusters to form (not necessarily needed). If not provided, k_max is set to the number of nodes. |
A list with the following elements:
nbr_clusters
Optimal number of clusters by component,
nbr_clusters_total
Optimal total number of clusters.
Camille Champion, Magali Champion
l1_spectralclustering
, l1spectral
.
######################################### # Finding the optimal number of clusters ######################################### # 1st example: non-perturbed graph Data <- CreateDataSet(k=3, n=20, p=list(p_inside=0,p_outside=0)) Structure <- FindStructure(Data$A_hat) Clusters <- FindNbrClusters(A = Data$A_hat, structure = Structure, k=3) # The number of clusters is provided (3): each of the 3 components will be divided into 1 cluster Clusters <- FindNbrClusters(A = Data$A_hat, structure = Structure, k=5) # The number of clusters is provided (5) and larger than the number of components (3), # the spectral eigengap method is used to find the optimal number of clusters of each component. # 2nd example: perturbed graph Data <- CreateDataSet(k=3, n=20, p=list(p_inside=0.1,p_outside=0.1)) Structure <- FindStructure(Data$A_hat) # there are less than 3 components Clusters <- FindNbrClusters(A = Data$A_hat, structure = Structure) # The number of clusters is optimized using the spectral eigengap method
######################################### # Finding the optimal number of clusters ######################################### # 1st example: non-perturbed graph Data <- CreateDataSet(k=3, n=20, p=list(p_inside=0,p_outside=0)) Structure <- FindStructure(Data$A_hat) Clusters <- FindNbrClusters(A = Data$A_hat, structure = Structure, k=3) # The number of clusters is provided (3): each of the 3 components will be divided into 1 cluster Clusters <- FindNbrClusters(A = Data$A_hat, structure = Structure, k=5) # The number of clusters is provided (5) and larger than the number of components (3), # the spectral eigengap method is used to find the optimal number of clusters of each component. # 2nd example: perturbed graph Data <- CreateDataSet(k=3, n=20, p=list(p_inside=0.1,p_outside=0.1)) Structure <- FindStructure(Data$A_hat) # there are less than 3 components Clusters <- FindNbrClusters(A = Data$A_hat, structure = Structure) # The number of clusters is optimized using the spectral eigengap method
This internal function of the spectral clustering algorithm finds the structure of the graph to cluster (number of nodes and connected components).
FindStructure(A)
FindStructure(A)
A |
The adjacency matrix |
A list with the following elements:
graph
igraph object derived from A,
groups
List of connected components and corresponding nodes.
Camille Champion, Magali Champion
l1_spectralclustering
, l1spectral
.
############################################################### # Finding the structure of the graph from the adjacency matrix ############################################################### # 1st example: non-perturbed graph Data <- CreateDataSet(k=3, n=20, p=list(p_inside=0,p_outside=0)) Structure <- FindStructure(Data$A_hat) Structure$groups # the graph is not perturbed, there are 3 connected components # 2nd example: highly-perturbed graph Data <- CreateDataSet(k=3, n=20, p=list(p_inside=0.5,p_outside=0.5)) Structure <- FindStructure(Data$A_hat) Structure$groups # the graph is higlhy perturbed, there are less than 3 connected components
############################################################### # Finding the structure of the graph from the adjacency matrix ############################################################### # 1st example: non-perturbed graph Data <- CreateDataSet(k=3, n=20, p=list(p_inside=0,p_outside=0)) Structure <- FindStructure(Data$A_hat) Structure$groups # the graph is not perturbed, there are 3 connected components # 2nd example: highly-perturbed graph Data <- CreateDataSet(k=3, n=20, p=list(p_inside=0.5,p_outside=0.5)) Structure <- FindStructure(Data$A_hat) Structure$groups # the graph is higlhy perturbed, there are less than 3 connected components
This function runs the l1-spectral algorithm, an l1-penalized version of the spectral clustering that aims at robustly clustering perturbed graphs.
l1_spectralclustering( A, k = NULL, k_max = NULL, elements = NULL, pen, stab = TRUE )
l1_spectralclustering( A, k = NULL, k_max = NULL, elements = NULL, pen, stab = TRUE )
A |
The adjacency matrix of the graph to cluster. |
k |
True number of clusters (not necessarily needed). If not provided, k is chosen by spectral eigengap. |
k_max |
Maximal number of clusters to form (not necessarily needed). If not provided, k_max is set to the number of nodes. |
elements |
The representative elements of the clusters (not necessary needed). If not provided, index are chosen using the betweeness centrality score. |
pen |
The penalty (to be chosen among "lasso" or "thresholdedLS"). |
stab |
TRUE/FALSE indicated whether the indices should be stabilized (TRUE by default) |
A list with the following elements:
comm
The community matrix,
structure
The structure of the graph to cluster,
clusters
The number of clusters,
elements
The chosen representative elements of the clusters.
Camille Champion, Magali Champion
ComputePerformances
, l1spectral
.
##################################################### # Performing the l1-spectral clustering on the graph ##################################################### data(ToyData) # if desired, the number of clusters and representative elements can be provided, otherwise, remove results2 <- l1_spectralclustering(A = ToyData$A_hat, pen = "lasso") results2$comm # when desired, the number of clusters and representative elements can also be provided results2 <- l1_spectralclustering(A = ToyData$A_hat, pen = "lasso", k=2, elements = c(1,4))
##################################################### # Performing the l1-spectral clustering on the graph ##################################################### data(ToyData) # if desired, the number of clusters and representative elements can be provided, otherwise, remove results2 <- l1_spectralclustering(A = ToyData$A_hat, pen = "lasso") results2$comm # when desired, the number of clusters and representative elements can also be provided results2 <- l1_spectralclustering(A = ToyData$A_hat, pen = "lasso", k=2, elements = c(1,4))
An example of data for running the l1-spectral clustering algorithm.
ToyData
ToyData
A list of three variables containing the adajcency matrix A
of a 5-nodes graph, the adjacency matrix A_hat
of a perturbed version of the same graph and the length of the two inherent clusters.
No value returned, as this is a dataset.
data(ToyData) A <- ToyData$A A_hat <- ToyData$A_hat clusters <- ToyData$clusters
data(ToyData) A <- ToyData$A A_hat <- ToyData$A_hat clusters <- ToyData$clusters