Title: | Clustering and Model Selection with the Integrated Classification Likelihood |
---|---|
Description: | An ensemble of algorithms that enable the clustering of networks and data matrices (such as counts, categorical or continuous) with different types of generative models. Model selection and clustering are performed in combination by optimizing the Integrated Classification Likelihood (which is equivalent to minimizing the description length). Several models are available, such as: Stochastic Block Model, degree corrected Stochastic Block Model, Mixtures of Multinomial, Latent Block Model. The optimization is performed through a combination of greedy local search and a genetic algorithm (see <arXiv:2002.11577> for more details). |
Authors: | Etienne Côme [aut, cre], Nicolas Jouvin [aut] |
Maintainer: | Etienne Côme <[email protected]> |
License: | GPL |
Version: | 0.6.1 |
Built: | 2024-12-19 06:59:50 UTC |
Source: | CRAN |
An S4 class to represent an abstract optimization algorithm.
Display the list of every currently available optimization algorithm
available_algorithms()
Display the list of every currently available DLVM
available_models()
A network of books about US politics published around the time of the 2004 presidential election and sold by the online bookseller Amazon.com. Edges between books represent frequent co-purchasing of books by the same buyers. The network was compiled by V. Krebs and is unpublished, but can be found on Krebs' web site. Thanks to Valdis Krebs for permission to post these data on this web site.
data(Books)
An object of class list with two fields:
network adjacency matrix as a sparseMatrix of size 105x105
a factor of length 105 with levels "l", "n", or "c" to indicate whether the books are liberal, neutral, or conservative
data(Books)
This method takes an IclFit-class object and returns an integer vector with the cluster assignments that were found.
clustering(fit)
## S4 method for signature 'IclFit'
clustering(fit)
fit | an IclFit-class object |
an integer vector with cluster assignments. Zero indicates noise points.
IclFit
: IclFit-class method
Extract parameters from a DcLbmFit-class object
## S4 method for signature 'DcLbmFit'
coef(object)
object | a DcLbmFit-class object |
a list with the model parameters estimates (MAP), the fields are:
'pirows'
: row cluster proportions
'picols'
: column cluster proportions
'thetakl'
: between clusters connection probabilities (matrix of size Krow x Kcol),
'gammarows'
: rows degree correction parameters (size Nrows),
'gammacols'
: cols degree correction parameters (size Ncols),
Extract parameters from a DcSbmFit-class object
## S4 method for signature 'DcSbmFit'
coef(object)
object | a DcSbmFit-class object |
in case of undirected graph
a list with the model parameters estimates (MAP), the fields are the following for "directed" models :
'pi'
: cluster proportions
'thetakl'
: between cluster normalized connection intensities (matrix of size K x K),
gammain
: node in-degree correction parameter
gammaout
: node out-degree correction parameter
And as follows for undirected models:
'pi'
: cluster proportions
'thetakl'
: between cluster normalized connection intensities (matrix of size K x K),
gamma
: node degree correction parameter
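The field lists above can be inspected on a fitted object. The following is a minimal sketch, assuming the greed package is installed. It reuses the rsbm() simulation call from the greed() example of this manual, and the choice to treat the simulated graph as directed is an illustrative assumption.

```r
# Sketch only: runs when the greed package is available.
if (requireNamespace("greed", quietly = TRUE)) {
  library(greed)
  set.seed(1)
  # Simulated network, same call as in the greed() example of this manual
  sbm <- rsbm(50, c(0.5, 0.5), diag(2) * 0.1 + 0.01)
  # "directed" is one of the documented values of the type argument
  sol <- greed(sbm$x, model = DcSbm(type = "directed"))
  params <- coef(sol)
  # Directed fits are documented to expose pi, thetakl, gammain and gammaout
  names(params)
}
```

For an undirected model the documented fields are pi, thetakl and gamma instead.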
Extract mixture parameters from a DiagGmmFit-class object
## S4 method for signature 'DiagGmmFit'
coef(object)
object | a DiagGmmFit-class object |
a list with the mixture parameters estimates (MAP), the fields are:
'pi'
: cluster proportions
'muk'
: cluster means
'Sigmak'
: cluster co-variance matrices
Extract mixture parameters from a GmmFit-class object
## S4 method for signature 'GmmFit'
coef(object)
object | a GmmFit-class object |
a list with the mixture parameters estimates (MAP), the fields are:
'pi'
: cluster proportions
'muk'
: cluster means
'Sigmak'
: cluster co-variance matrices
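As an illustration of the fields listed above, the sketch below fits a Gaussian mixture on simulated data and extracts the MAP estimates. It assumes the greed package is installed. The simulated matrix X and its two-cloud structure are illustrative choices, not part of the package.

```r
# Sketch only: runs when the greed package is available.
if (requireNamespace("greed", quietly = TRUE)) {
  library(greed)
  set.seed(1)
  # Two well-separated Gaussian clouds in two dimensions (simulated)
  X <- rbind(
    matrix(rnorm(200, mean = 0), ncol = 2),
    matrix(rnorm(200, mean = 5), ncol = 2)
  )
  sol <- greed(X, model = Gmm())
  params <- coef(sol)
  # The documented fields are 'pi', 'muk' and 'Sigmak'
  names(params)
}
```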
Extract parameters from an IclFit-class object
## S4 method for signature 'IclFit'
coef(object)
object | an IclFit-class object |
The result depends on the model used. If the method is not yet implemented for a model, this generic method is used and returns the obs_stats slot of the model.
a list with the model parameters estimates (MAP)
Extract parameters from an LcaFit-class object
## S4 method for signature 'LcaFit'
coef(object)
object | an LcaFit-class object |
a list with the model parameters estimates (MAP), the fields are:
'pi'
: cluster proportions
'thetav'
: cluster profile probabilities (list of matrix of size K x Dv),
Extract parameters from a MoMFit-class object
## S4 method for signature 'MoMFit'
coef(object)
object | a MoMFit-class object |
a list with the model parameters estimates (MAP), the fields are:
'pi'
: cluster proportions
'thetak'
: cluster profile probabilities (matrix of size K x D),
Extract mixture parameters from a MoRFit-class object using MAP estimation
## S4 method for signature 'MoRFit'
coef(object)
object | a MoRFit-class object |
a list with the mixture parameters estimates (MAP), the fields are:
'pi'
: cluster proportions
'A'
: cluster regression matrix
'Sigmak'
: cluster noise co-variance matrices
Extract parameters from a MultSbmFit-class object
## S4 method for signature 'MultSbmFit'
coef(object)
object | a MultSbmFit-class object |
a list with the model parameters estimates (MAP), the fields are:
'pi'
: cluster proportions
'thetakl'
: cluster profile probabilities (array of size K x K x D),
Extract parameters from an SbmFit-class object
## S4 method for signature 'SbmFit'
coef(object)
object | an SbmFit-class object |
a list with the model parameters estimates (MAP), the fields are:
'pi'
: cluster proportions
'thetakl'
: between clusters connections probabilities (matrix of size K x K)
An S4 class to represent combined clustering models, where several models are used to model different datasets. The views are assumed to be conditionally independent given the cluster.
CombinedModels(models, alpha = 1)
models |
a named list of DlvmPrior-class objects |
alpha |
Dirichlet prior parameter over the cluster proportions (default to 1) |
The field names in the models list must match the names of the list used to provide the datasets to cluster together.
a CombinedModels-class
object
CombinedModelsFit-class
, CombinedModelsPath-class
Other DlvmModels: DcLbm, DcSbm, DiagGmm, DlvmPrior-class, Gmm, Lca, MoM, MoR, MultSbm, Sbm, greed()
CombinedModels(models = list(continuous = GmmPrior(), discrete = LcaPrior()))
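The name-matching rule between the models list and the datasets can be sketched as follows, assuming the greed package is installed. The simulated views Xc and Xd, their sizes, and the assumption that the Lca view is supplied as a data.frame of factors are illustrative choices and are not taken from this manual.

```r
# Sketch only: runs when the greed package is available.
if (requireNamespace("greed", quietly = TRUE)) {
  library(greed)
  set.seed(1)
  n <- 200
  z <- sample(2, n, replace = TRUE)   # hidden groups used to simulate both views
  # Continuous view: two Gaussian features whose mean depends on the group
  Xc <- matrix(rnorm(2 * n, mean = 3 * z), ncol = 2)
  # Categorical view: assumed here to be a data.frame of factors
  Xd <- data.frame(
    v1 = factor(ifelse(runif(n) < 0.8, z, 3 - z)),
    v2 = factor(ifelse(runif(n) < 0.8, z, 3 - z))
  )
  model <- CombinedModels(models = list(continuous = GmmPrior(), discrete = LcaPrior()))
  # The names of the data list match the names of the models list
  sol <- greed(list(continuous = Xc, discrete = Xd), model = model)
  table(z, clustering(sol))
}
```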
An S4 class to represent a fit of a degree corrected stochastic block model for co_clustering, extends IclFit-class.
model
a DcSbm-class
object to store the model fitted
name
generative model name
icl
icl value of the fitted model
K
number of extracted clusters over row and columns
cl
a numeric vector with row and columns cluster indexes
obs_stats
a list with the following elements:
move_mat
binary matrix which store move constraints
train_hist
data.frame with training history information (details depends on the training procedure)
extractSubModel,CombinedModelsPath,character-method
An S4 class to represent a hierarchical fit of a degree corrected stochastic block model, extends IclPath-class.
model
a DcSbm-class
object to store the model fitted
name
generative model name
icl
icl value of the fitted model
K
number of extracted clusters over row and columns
cl
a numeric vector with row and columns cluster indexes
obs_stats
a list with the following elements:
path
a list of size K-1 with each part of the path described by:
icl1: icl value reached with this solution for alpha=1
logalpha: log(alpha) value where this solution is better than its parent
K: number of clusters
cl: vector of cluster indexes
k,l: indexes of the clusters that were merged at this step
merge_mat: lower triangular matrix of delta icl values
obs_stats: a list with the elements:
logalpha
value of log(alpha)
ggtree
data.frame with complete merge tree for easy plotting with ggplot2
tree
numeric vector with merge tree; tree[i] contains the index of the father of i
train_hist
data.frame with training history information (details depends on the training procedure)
extractSubModel,CombinedModelsPath,character-method
This method takes a DcLbmPath-class object and an integer K and returns the solution from the path with K clusters.
## S4 method for signature 'DcLbmPath'
cut(x, K)
x |
a DcLbmPath-class object |
K |
Desired number of clusters |
an IclPath-class object with the desired number of clusters
This method takes an IclPath-class object and an integer K and returns the solution from the path with K clusters.
## S4 method for signature 'IclPath'
cut(x, K)
x |
an IclPath-class object |
K |
Desired number of clusters |
an IclPath-class object with the desired number of clusters
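A short sketch of cutting a fitted path, assuming the greed package is installed. The simulation parameters (90 nodes, three planted clusters) are illustrative, and the target number of clusters is computed from the fit so that it never exceeds the number of clusters found.

```r
# Sketch only: runs when the greed package is available.
if (requireNamespace("greed", quietly = TRUE)) {
  library(greed)
  set.seed(1)
  # Simulated network with three planted clusters (illustrative parameters)
  sbm <- rsbm(90, rep(1 / 3, 3), diag(3) * 0.3 + 0.02)
  sol <- greed(sbm$x, model = Sbm())
  # Ask for one cluster fewer than the fitted solution, never below 1
  k_target <- max(1, K(sol) - 1)
  sol_cut <- cut(sol, k_target)
  K(sol_cut)
}
```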
An S4 class to represent a degree corrected stochastic block model for co_clustering of bipartite graph.
Such a model can be used to cluster graph vertices and to model a bipartite graph adjacency matrix with the following generative model:
The individual parameters account for node degree heterogeneity.
These parameters have uniform priors over the simplex.
These classes mainly store the prior parameter values of this generative model.
The DcLbm-class must be used when fitting a simple Degree Corrected Latent Block Model, whereas the DcLbmPrior-class must be used when fitting a CombinedModels-class.
DcLbmPrior(p = NaN)
DcLbm(alpha = 1, p = NaN)
p |
Exponential prior parameter (default to NaN, in this case p will be estimated from data as the average intensities of X) |
alpha |
Dirichlet prior parameter over the cluster proportions (default to 1) |
a DcLbmPrior-class
a DcLbm-class
object
DcLbmFit-class
, DcLbmPath-class
Other DlvmModels: CombinedModels, DcSbm, DiagGmm, DlvmPrior-class, Gmm, Lca, MoM, MoR, MultSbm, Sbm, greed()
DcLbmPrior()
DcLbmPrior(p = 0.7)
DcLbm()
DcLbm(p = 0.7)
An S4 class to represent a fit of a degree corrected stochastic block model for co_clustering, extends IclFit-class.
model
a DcLbm-class
object to store the model fitted
name
generative model name
icl
icl value of the fitted model
K
number of extracted clusters over row and columns
Krow
number of extracted row clusters
Kcol
number of extracted column clusters
cl
a numeric vector with row and columns cluster indexes
obs_stats
a list with the following elements:
counts: numeric vector of size K with number of elements in each clusters
din: numeric vector of size K which store the sums of in-degrees for each clusters
dout: numeric vector of size K which store the sums of out-degrees for each clusters
x_counts: matrix of size K*K with the number of links between each pair of clusters
co_x_counts: matrix of size Krow*Kcol with the number of links between each pair of row and column cluster
clrow
a numeric vector with row cluster indexes
clcol
a numeric vector with column cluster indexes
Nrow
number of rows
Ncol
number of columns
move_mat
binary matrix which store move constraints
train_hist
data.frame with training history information (details depends on the training procedure)
An S4 class to represent a fit of a degree corrected stochastic block model for co_clustering, extends IclPath-class.
model
a DcLbm-class
object to store the model fitted
name
generative model name
icl
icl value of the fitted model
K
number of extracted clusters over row and columns
Krow
number of extracted row clusters
Kcol
number of extracted column clusters
cl
a numeric vector with row and columns cluster indexes
obs_stats
a list with the following elements:
counts: numeric vector of size K with number of elements in each clusters
din: numeric vector of size K which store the sums of in-degrees for each clusters
dout: numeric vector of size K which store the sums of out-degrees for each clusters
x_counts: matrix of size K*K with the number of links between each pair of clusters
co_x_counts: matrix of size Krow*Kcol with the number of links between each pair of row and column cluster
clrow
a numeric vector with row cluster indexes
clcol
a numeric vector with column cluster indexes
Nrow
number of rows
Ncol
number of columns
path
a list of size K-1 with each part of the path described by:
icl1: icl value reached with this solution for alpha=1
logalpha: log(alpha) value where this solution is better than its parent
K: number of clusters
cl: vector of cluster indexes
k,l: indexes of the clusters that were merged at this step
merge_mat: lower triangular matrix of delta icl values
obs_stats: a list with the elements:
counts: numeric vector of size K with number of elements in each clusters
din: numeric vector of size K which store the sums of in-degrees for each clusters
dout: numeric vector of size K which store the sums of out-degrees for each clusters
x_counts: matrix of size K*K with the number of links between each pair of clusters
co_x_counts: matrix of size Krow*Kcol with the number of links between each pair of row and column cluster
logalpha
value of log(alpha)
ggtree
data.frame with complete merge tree for easy plotting with ggplot2
tree
numeric vector with merge tree; tree[i] contains the index of the father of i
ggtreerow
data.frame with complete merge tree of row clusters for easy plotting with ggplot2
ggtreecol
data.frame with complete merge tree of column clusters for easy plotting with ggplot2
train_hist
data.frame with training history information (details depends on the training procedure)
An S4 class to represent a Degree Corrected Stochastic Block Model.
Such a model can be used to cluster graph vertices and to model a square adjacency matrix with the following generative model:
The individual parameters account for node degree heterogeneity.
These parameters have uniform priors over the simplex.
These classes mainly store the prior parameter values of this generative model.
The DcSbm-class must be used when fitting a simple Degree Corrected Stochastic Block Model, whereas the DcSbmPrior-class must be used when fitting a CombinedModels-class.
DcSbmPrior(p = NaN, type = "guess")
DcSbm(alpha = 1, p = NaN, type = "guess")
p |
Exponential prior parameter (default to NaN, in this case p will be estimated from data as the mean connection probability) |
type |
define the type of networks (either "directed", "undirected" or "guess", default to "guess") |
alpha |
Dirichlet prior parameter over the cluster proportions (default to 1) |
a DcSbmPrior-class
object
a DcSbm-class
object
DcSbmFit-class
, DcSbmPath-class
Other DlvmModels: CombinedModels, DcLbm, DiagGmm, DlvmPrior-class, Gmm, Lca, MoM, MoR, MultSbm, Sbm, greed()
DcSbmPrior()
DcSbmPrior(type = "undirected")
DcSbm()
DcSbm(type = "undirected")
An S4 class to represent a fit of a degree corrected stochastic block model for co_clustering, extends IclFit-class.
model
a DcSbm-class
object to store the model fitted
name
generative model name
icl
icl value of the fitted model
K
number of extracted clusters over row and columns
cl
a numeric vector with row and columns cluster indexes
obs_stats
a list with the following elements:
counts: numeric vector of size K with number of elements in each clusters
din: numeric vector of size K which store the sums of in-degrees for each clusters
dout: numeric vector of size K which store the sums of out-degrees for each clusters
x_counts: matrix of size K*K with the number of links between each pair of clusters
obs_stats_cst
a list with the following elements:
din_node: node in-degree, a vector of size N
dout_node: node out-degree, a vector of size N
move_mat
binary matrix which store move constraints
train_hist
data.frame with training history information (details depends on the training procedure)
An S4 class to represent a hierarchical fit of a degree corrected stochastic block model, extends IclPath-class.
model
a DcSbm-class
object to store the model fitted
name
generative model name
icl
icl value of the fitted model
K
number of extracted clusters over row and columns
cl
a numeric vector with row and columns cluster indexes
obs_stats
a list with the following elements:
counts: numeric vector of size K with number of elements in each clusters
din: numeric vector of size K which store the sums of in-degrees for each clusters
dout: numeric vector of size K which store the sums of out-degrees for each clusters
x_counts: matrix of size K*K with the number of links between each pair of clusters
path
a list of size K-1 with each part of the path described by:
icl1: icl value reached with this solution for alpha=1
logalpha: log(alpha) value where this solution is better than its parent
K: number of clusters
cl: vector of cluster indexes
k,l: indexes of the clusters that were merged at this step
merge_mat: lower triangular matrix of delta icl values
obs_stats: a list with the elements:
counts: numeric vector of size K with number of elements in each clusters
din: numeric vector of size K which store the sums of in-degrees for each clusters
dout: numeric vector of size K which store the sums of out-degrees for each clusters
x_counts: matrix of size K*K with the number of links between each pair of clusters
logalpha
value of log(alpha)
ggtree
data.frame with complete merge tree for easy plotting with ggplot2
tree
numeric vector with merge tree; tree[i] contains the index of the father of i
train_hist
data.frame with training history information (details depends on the training procedure)
An S4 class to represent a multivariate diagonal Gaussian mixture model. The model corresponds to the following generative model:
with the Gamma distribution with shape parameter kappa and rate parameter beta.
These classes mainly store the prior parameter values (alpha, tau, kappa, beta, mu) of this generative model.
The DiagGmm-class must be used when fitting a simple Diagonal Gaussian Mixture Model, whereas the DiagGmmPrior-class must be used when fitting a CombinedModels-class.
DiagGmmPrior(tau = 0.01, kappa = 1, beta = NaN, mu = NaN)
DiagGmm(alpha = 1, tau = 0.01, kappa = 1, beta = NaN, mu = NaN)
tau |
Prior parameter (inverse variance), (default 0.01) |
kappa |
Prior parameter (gamma shape), (default to 1) |
beta |
Prior parameter (gamma rate), (default to NaN, in this case beta will be estimated from data as 0.1 times the mean of the variances of the columns of X) |
mu |
Prior for the means (vector of size D), (default to NaN, in this case mu will be estimated from data as the mean of X) |
alpha |
Dirichlet prior parameter over the cluster proportions (default to 1) |
a DiagGmmPrior-class
object
a DiagGmm-class
object
Bertoletti, Marco & Friel, Nial & Rastelli, Riccardo. (2014). Choosing the number of clusters in a finite mixture model using an exact Integrated Completed Likelihood criterion. METRON. 73. 10.1007/s40300-015-0064-5.
DiagGmmFit-class
, DiagGmmPath-class
Other DlvmModels: CombinedModels, DcLbm, DcSbm, DlvmPrior-class, Gmm, Lca, MoM, MoR, MultSbm, Sbm, greed()
DiagGmmPrior()
DiagGmmPrior(tau = 0.1)
DiagGmm()
DiagGmm(tau = 0.1)
An S4 class to represent a fit of a multivariate diagonal Gaussian mixture model, extends IclFit-class.
model
a DiagGmm-class
object to store the model fitted
name
generative model name
icl
icl value of the fitted model
K
number of extracted clusters over row and columns
cl
a numeric vector with row and columns cluster indexes
obs_stats
a list with the following elements:
counts: numeric vector of size K with number of elements in each clusters
regs: list of size $K$ with statistics for each clusters
move_mat
binary matrix which store move constraints
train_hist
data.frame with training history information (details depends on the training procedure)
An S4 class to represent a hierarchical fit of a diagonal Gaussian mixture model, extends IclPath-class.
model
a DiagGmm-class
object to store the model fitted
name
generative model name
icl
icl value of the fitted model
K
number of extracted clusters over row and columns
cl
a numeric vector with row and columns cluster indexes
obs_stats
a list with the following elements:
counts: numeric vector of size K with number of elements in each clusters
regs: list of size $K$ with statistics for each clusters
path
a list of size K-1 with each part of the path described by:
icl1: icl value reached with this solution for alpha=1
logalpha: log(alpha) value where this solution is better than its parent
K: number of clusters
cl: vector of cluster indexes
k,l: indexes of the clusters that were merged at this step
merge_mat: lower triangular matrix of delta icl values
obs_stats: a list with the following elements:
counts: numeric vector of size K with number of elements in each clusters
regs: list of size $K$ with statistics for each clusters
logalpha
value of log(alpha)
ggtree
data.frame with complete merge tree for easy plotting with ggplot2
tree
numeric vector with merge tree; tree[i] contains the index of the father of i
train_hist
data.frame with training history information (details depends on the training procedure)
plot,DiagGmmFit,missing-method
An S4 class to represent an abstract generative model
alpha
a numeric vector of length 1 which define the parameters of the Dirichlet over the cluster proportions (default to 1)
Other DlvmModels: CombinedModels, DcLbm, DcSbm, DiagGmm, Gmm, Lca, MoM, MoR, MultSbm, Sbm, greed()
Extract a part of a CombinedModelsPath-class object
extractSubModel(sol, sub_model_name)
## S4 method for signature 'CombinedModelsPath,character'
extractSubModel(sol, sub_model_name)
sol |
an |
sub_model_name |
a string which specifies the part of the model to extract. Note that the name must correspond to one of the names used in the list of models during the original call to |
an IclFit-class object of the relevant class
sol = CombinedModelsPath,sub_model_name = character
: CombinedModelsPath method
Zalando fashionmnist dataset, sample of 1 000 Zalando's article images from the test set.
data(fashion)
An object of class matrix
with a random sample of 1000 images (one per row) extracted from the fashionmnist dataset.
https://github.com/zalandoresearch/fashion-mnist
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. Han Xiao, Kashif Rasul, Roland Vollgraf (2017) (arXiv:1708.07747).
data(fashion)
A random sample of 6000 players from the FIFA videogame with various statistics on each player, ranging from position, cost in the game, capacity in offense/defense, speed, etc. Two columns pos_x, pos_y with average player possible positions (in Opta coordinates) were derived from the raw data.
data(Fifa)
An R data.frame with columns containing each of the descriptive statistics of a player.
https://www.kaggle.com/stefanoleone992/fifa-20-complete-player-dataset?select=players_20.csv
data(Fifa)
Network of American football games between Division IA colleges during regular season Fall 2000.
data(Football)
An object of class list
with two fields;
network adjacency matrix as a sparseMatrix
of size 115x115
vector of teams conferences of size 115 with the following encoding (0 = Atlantic Coast, 1 = Big East, 2 = Big Ten, 3 = Big Twelve, 4 = Conference USA, 5 = Independents, 6 = Mid-American, 7 = Mountain West, 8 = Pacific Ten, 9 = Southeastern, 10 = Sun Belt, 11 = Western Athletic)
M. Girvan and M. E. J. Newman, Community structure in social and biological networks, Proc. Natl. Acad. Sci. USA 99, 7821-7826 (2002)
data(Football)
An S4 class to represent a genetic algorithm (extends the Alg-class class).
Genetic(pop_size = 100, nb_max_gen = 20, prob_mutation = 0.25, sel_frac = 0.75)
pop_size |
size of the solution population (default to 100) |
nb_max_gen |
maximal number of generations to produce (default to 20) |
prob_mutation |
probability of mutation (default to 0.25) |
sel_frac |
fraction of best solutions selected for crossing (default to 0.75) |
a Genetic-class
object
Genetic
: Genetic algorithm class constructor
pop_size
size of the solution population (default to 100)
nb_max_gen
maximal number of generations to produce (default to 20)
prob_mutation
probability of mutation (default to 0.25)
sel_frac
fraction of best solutions selected for crossing (default to 0.75)
Genetic()
Genetic(pop_size = 500)
An S4 class to represent a multivariate Gaussian mixture model. The model corresponds to the following generative model:
with the Wishart distribution.
The Gmm-class must be used when fitting a simple Gaussian Mixture Model, whereas the GmmPrior-class must be used when fitting a CombinedModels-class.
GmmPrior(tau = 0.01, N0 = NaN, mu = NaN, epsilon = NaN)
Gmm(tau = 0.01, N0 = NaN, mu = NaN, epsilon = NaN, alpha = 1)
tau |
Prior parameter (inverse variance) default 0.01 |
N0 |
Prior parameter (pseudo count) should be > number of features (default to NaN, in this case it will be estimated from data as the number of columns of X) |
mu |
Prior parameters for the means (vector of size D), (default to NaN, in this case mu will be estimated from the data and will be equal to the mean of X) |
epsilon |
Prior parameter co-variance matrix prior (matrix of size D x D), (default to a matrix of NaN, in this case epsilon will be estimated from data and will corresponds to 0.1 times a diagonal matrix with the variances of the X columns) |
alpha |
Dirichlet prior parameter over the cluster proportions (default to 1) |
a GmmPrior-class
object
a Gmm-class
object
Bertoletti, Marco & Friel, Nial & Rastelli, Riccardo. (2014). Choosing the number of clusters in a finite mixture model using an exact Integrated Completed Likelihood criterion. METRON. 73. 10.1007/s40300-015-0064-5.
Other DlvmModels: CombinedModels, DcLbm, DcSbm, DiagGmm, DlvmPrior-class, Lca, MoM, MoR, MultSbm, Sbm, greed()
GmmPrior()
GmmPrior(tau = 0.1)
Gmm()
Gmm(tau = 0.1, alpha = 0.5)
An S4 class to represent a fit of a multivariate Gaussian mixture model, extends IclFit-class.
model
a GmmPrior-class
object to store the model fitted
name
generative model name
icl
icl value of the fitted model
K
number of extracted clusters over row and columns
cl
a numeric vector with row and columns cluster indexes
obs_stats
a list with the following elements:
counts: numeric vector of size K with number of elements in each clusters
regs: list of size $K$ with statistics for each clusters
move_mat
binary matrix which store move constraints
train_hist
data.frame with training history information (details depends on the training procedure)
Make a matrix of plots with a given data and gmm fitted parameters with ellipses.
gmmpairs(sol, X)
sol |
|
X |
the data used for the fit, a data.frame or matrix. |
a ggplot2
graphic
An S4 class to represent a hierarchical fit of a Gaussian mixture model, extends IclPath-class.
model
a GmmPrior-class
object to store the model fitted
name
generative model name
icl
icl value of the fitted model
K
number of extracted clusters over row and columns
cl
a numeric vector with row and columns cluster indexes
obs_stats
a list with the following elements:
counts: numeric vector of size K with number of elements in each clusters
gmm: list of size $K$ with statistics for each clusters
path
a list of size K-1 with each part of the path described by:
icl1: icl value reached with this solution for alpha=1
logalpha: log(alpha) value where this solution is better than its parent
K: number of clusters
k,l: indexes of the clusters that were merged at this step
merge_mat: lower triangular matrix of delta icl values
obs_stats: a list with the following elements:
counts: numeric vector of size K with number of elements in each clusters
gmm: list of size $K$ with statistics for each clusters
logalpha
value of log(alpha)
ggtree
data.frame with complete merge tree for easy plotting with ggplot2
tree
numeric vector with merge tree; tree[i] contains the index of the father of i
train_hist
data.frame with training history information (details depends on the training procedure)
This function is the main function for fitting Dlvms with greed.
In the simplest case you may only provide a dataset and greed will find a suitable model.
The accepted classes for X depend on the generative model used, which can be specified with the model argument.
See the DlvmPrior-class and the derived classes for details.
Greed enables the clustering of networks and count data matrices with different models.
Model selection and clustering are performed in combination by optimizing the Integrated Classification Likelihood.
Optimization is performed through a combination of greedy local search and a genetic algorithm. The main entry point is the greed function, which performs the clustering and is documented below. The package also provides sampling functions for all the implemented DLVMs.
greed(X, model = find_model(X), K = 20, alg = Hybrid(), verbose = FALSE)
X |
data to cluster either a data.frame, a matrix, an array, ... depending on the used generative model |
model |
|
K |
initial number of cluster |
alg |
an optimization algorithm of class |
verbose |
boolean value for verbose mode |
an IclPath-class
object
Other DlvmModels: CombinedModels, DcLbm, DcSbm, DiagGmm, DlvmPrior-class, Gmm, Lca, MoM, MoR, MultSbm, Sbm
sbm <- rsbm(50, c(0.5, 0.5), diag(2) * 0.1 + 0.01)
sol <- greed(sbm$x, model = Sbm())
table(sbm$cl, clustering(sol))
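The fitted object returned by greed() can be queried with the accessor methods documented in this manual. The following is a minimal sketch, assuming the greed package is installed. It reuses the simulation call of the example above and only adds a seed for reproducibility.

```r
# Sketch only: runs when the greed package is available.
if (requireNamespace("greed", quietly = TRUE)) {
  library(greed)
  set.seed(1)
  sbm <- rsbm(50, c(0.5, 0.5), diag(2) * 0.1 + 0.01)
  sol <- greed(sbm$x, model = Sbm())
  K(sol)                  # number of clusters found
  ICL(sol)                # ICL value achieved
  head(clustering(sol))   # first cluster assignments
  names(coef(sol))        # fields of the MAP parameter estimates
}
```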
Compute the entropy of a discrete sample
H(cl)
cl |
vector of discrete labels |
the entropy of the sample
cl <- sample(2, 500, replace = TRUE)
H(cl)
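For reference, the entropy of a discrete sample can be sketched in base R as below. The helper entropy_sketch is hypothetical and is not the package's H(). The use of the natural logarithm is an assumption, since this manual does not state the base.

```r
# Hypothetical helper, not the package's H(): plug-in entropy of a label vector
entropy_sketch <- function(cl) {
  p <- table(cl) / length(cl)   # empirical label frequencies
  p <- p[p > 0]                 # drop unused factor levels to avoid 0 * log(0)
  -sum(p * log(p))              # natural logarithm (assumption)
}

cl <- c(1, 1, 2, 2)
entropy_sketch(cl)              # two equally frequent labels give log(2)
```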
An S4 class to represent a hybrid genetic/greedy algorithm (extends the Alg-class class).
Hybrid(pop_size = 20, nb_max_gen = 10, prob_mutation = 0.25, Kmax = 100)
pop_size |
size of the solutions populations (default to 20) |
nb_max_gen |
maximal number of generation to produce (default to 10) |
prob_mutation |
mutation probability (default to 0.25) |
Kmax |
maximum number of clusters (default to 100) |
a Hybrid-class
object
Hybrid
: Hybrid algorithm class constructor
pop_size
size of the solutions populations (default to 20)
nb_max_gen
maximal number of generation to produce (default to 10)
prob_mutation
mutation probability (default to 0.25)
Kmax
maximum number of clusters (default to 100)
Hybrid() Hybrid(pop_size = 100)
Hybrid() Hybrid(pop_size = 100)
IclFit-class object
This method takes an IclFit-class object and returns its ICL score.
ICL(fit)
## S4 method for signature 'IclFit'
ICL(fit)
fit |
an |
The ICL value achieved
IclFit
: IclFit method
An S4 abstract class to represent an ICL fit of a clustering model.
K
a numeric vector of length 1 which corresponds to the number of clusters
icl
a numeric vector of length 1 which stores the ICL value
cl
a numeric vector of length N which stores the cluster labels
obs_stats
a list to store the observed statistics of the model needed to compute ICL.
obs_stats_cst
a list to store the observed statistics of the model that do not depend on the clustering.
move_mat
binary matrix which stores move constraints
train_hist
a data.frame to store the training history (format depends on the algorithm used).
name
generative model name
An S4 class to represent a hierarchical path of solutions.
path
a list of merge moves describing the hierarchy of merges followed to complete the merge path.
tree
a tree representation of the merges.
ggtree
a data.frame for easy plotting of the dendrogram
logalpha
a numeric value which corresponds to the starting value of log(alpha).
List of edges of the network of Jazz musicians.
data(Jazz)
An object of class sparseMatrix with the network adjacency matrix.
P. Gleiser and L. Danon, Community Structure in Jazz, Adv. Complex Syst. 6, 565 (2003) (arXiv)
data(Jazz)
IclFit-class object
This method takes an IclFit-class object and returns its number of clusters.
K(fit)
## S4 method for signature 'IclFit'
K(fit)
fit |
an |
The number of clusters
IclFit
: IclFit method
An S4 class to represent a Latent Class Analysis model
Such a model can be used to cluster a data.frame with several columns of factors.
These classes mainly store the prior parameter values (alpha, beta) of the underlying generative model.
The Lca-class must be used when fitting a simple Latent Class Analysis, whereas the LcaPrior-class must be used when fitting a CombinedModels-class.
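Written out with the documented parameter names, the Latent Class Analysis generative model can be sketched as below. The symbols $\theta_{kj}$ (level probabilities of variable $j$ in cluster $k$), $Z_i$ (cluster indicator of observation $i$) and $d_j$ (number of levels of variable $j$) are notational assumptions introduced for this sketch.

```latex
\begin{aligned}
\pi &\sim \mathrm{Dirichlet}_K(\alpha) \\
\theta_{kj} &\sim \mathrm{Dirichlet}_{d_j}(\beta) \quad \text{for each cluster } k \text{ and variable } j \\
Z_i &\sim \mathcal{M}_K(1, \pi) \\
X_{ij} \mid Z_{ik} = 1 &\sim \mathcal{M}_{d_j}(1, \theta_{kj})
\end{aligned}
```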
LcaPrior(beta = 1)
Lca(alpha = 1, beta = 1)
beta |
Dirichlet prior parameter for all the categorical features (default to 1) |
alpha |
Dirichlet prior parameter over the cluster proportions (default to 1) |
a LcaPrior-class
object
a Lca-class
object
Other DlvmModels: CombinedModels, DcLbm, DcSbm, DiagGmm, DlvmPrior-class, Gmm, MoM, MoR, MultSbm, Sbm, greed()
LcaPrior()
LcaPrior(beta = 0.5)
Lca()
Lca(beta = 0.5)
An S4 class to represent a fit of a Latent Class Analysis model for categorical data clustering, extends IclFit-class.
The original data must be an n x p matrix where p is the number of variables and each variable is encoded as a factor (integer-valued).
model
a Lca-class
object to store the model fitted
name
generative model name
icl
icl value of the fitted model
K
number of extracted clusters over row and columns
cl
a numeric vector with cluster indexes
obs_stats
a list with the following elements:
counts: numeric vector of size K with number of elements in each clusters
x_counts: matrix of size K*D with the number of occurrences of each modality for each clusters
move_mat
binary matrix which store move constraints
train_hist
data.frame with training history information (details depends on the training procedure)
An S4 class to represent a hierarchical fit of a Latent Class Analysis model, extends IclPath-class.
model
a Lca-class
object to store the model fitted
name
generative model name
icl
icl value of the fitted model
K
number of extracted clusters over row and columns
cl
a numeric vector with row and columns cluster indexes
obs_stats
a list with the following elements:
counts: numeric vector of size K with number of elements in each clusters
x_counts: matrix of size K*D with the number of occurrence of modality word in each clusters
path
a list of size K-1 with each part of the path described by:
icl1: icl value reached with this solution for alpha=1
logalpha: log(alpha) value where this solution is better than its parent
K: number of clusters
cl: vector of cluster indexes
k,l: indexes of the clusters that were merged at this step
merge_mat: lower triangular matrix of delta icl values
obs_stats: a list with the following elements:
counts: numeric vector of size K with the number of elements in each cluster
x_counts: matrix of size K*D with the number of occurrences of each modality in each cluster
logalpha
value of log(alpha)
ggtree
data.frame with complete merge tree for easy plotting with ggplot2
tree
numeric vector with the merge tree; tree[i] contains the index of the father of i
train_hist
data.frame with training history information (details depends on the training procedure)
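To illustrate the father-index encoding of the tree slot, the sketch below walks from a node up to the root of a toy tree. The helper `ancestors_sketch` is hypothetical, and marking the root with a father index of 0 is an assumption made for this example, not a documented convention of the package.

```r
# Hypothetical helper: list the ancestors of node i in a father-index vector.
# Assumption: the root is marked by a father index of 0.
ancestors_sketch <- function(tree, i) {
  out <- integer(0)
  while (tree[i] != 0) {
    i <- tree[i]       # move to the father of the current node
    out <- c(out, i)
  }
  out
}

tree <- c(0, 1, 1, 2, 2)   # toy tree: node 1 is the root, 2 and 3 are its children
ancestors_sketch(tree, 5)  # father of 5 is 2, whose father is 1
```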
Compute the mutual information of two discrete samples
MI(cl1, cl2)
cl1 |
vector of discrete labels |
cl2 |
vector of discrete labels |
the mutual information between the two discrete samples
cl1 <- sample(2, 500, replace = TRUE)
cl2 <- sample(2, 500, replace = TRUE)
MI(cl1, cl2)
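For readers who want to see the quantity spelled out, the empirical mutual information can be sketched in base R from the contingency table of the two labelings. The helper `mi_sketch` is hypothetical and is not the package's implementation; the natural logarithm is an assumption.

```r
# Hypothetical helper (not part of greed): empirical mutual information.
# Assumption: natural logarithm; both label vectors have the same length.
mi_sketch <- function(cl1, cl2) {
  stopifnot(length(cl1) == length(cl2))
  joint <- table(cl1, cl2) / length(cl1)          # joint frequencies p(a, b)
  indep <- outer(rowSums(joint), colSums(joint))  # product of marginals p(a) p(b)
  nz <- joint > 0                                 # skip empty cells (0 * log 0 = 0)
  sum(joint[nz] * log(joint[nz] / indep[nz]))
}

cl <- rep(1:2, each = 50)
mi_sketch(cl, cl)            # identical labelings: MI equals the entropy, log(2)
mi_sketch(cl, rep(1:2, 50))  # labelings with independent frequencies: MI is 0
```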
An S4 class to represent a Mixture of Multinomial model.
Such a model can be used to cluster the rows of a count data matrix.
These classes mainly store the prior parameter values (alpha, beta) of the underlying generative model.
The MoM-class must be used when fitting a simple Mixture of Multinomials, whereas the MoMPrior-class must be used when fitting a CombinedModels-class.
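Written out with the documented parameter names, the Mixture of Multinomials generative model can be sketched as below. The symbols $\theta_k$ (vocabulary probabilities of cluster $k$), $Z_i$ (cluster indicator of row $i$), $D$ (number of columns) and $L_i$ (row total) are notational assumptions introduced for this sketch.

```latex
\begin{aligned}
\pi &\sim \mathrm{Dirichlet}_K(\alpha) \\
\theta_k &\sim \mathrm{Dirichlet}_D(\beta) \\
Z_i &\sim \mathcal{M}(1, \pi) \\
X_{i\cdot} \mid Z_{ik} = 1 &\sim \mathcal{M}(L_i, \theta_k), \qquad L_i = \sum_{d=1}^{D} X_{id}
\end{aligned}
```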
MoMPrior(beta = 1)
MoM(alpha = 1, beta = 1)
beta |
Dirichlet over vocabulary prior parameter (default to 1) |
alpha |
Dirichlet prior parameter over the cluster proportions (default to 1) |
a MoMPrior-class
object
a MoM-class
object
Other DlvmModels: CombinedModels, DcLbm, DcSbm, DiagGmm, DlvmPrior-class, Gmm, Lca, MoR, MultSbm, Sbm, greed()
MoMPrior()
MoMPrior(beta = 0.5)
MoM()
MoM(beta = 0.5)
An S4 class to represent a fit of a Mixture of Multinomials model, extends IclFit-class.
model
a MoM-class
object to store the model fitted
name
generative model name
icl
icl value of the fitted model
K
number of extracted clusters over row and columns
cl
a numeric vector with row and columns cluster indexes
obs_stats
a list with the following elements:
counts: numeric vector of size K with number of elements in each clusters
x_counts: matrix of size K*D with the number of occurrences of each modality for each clusters
move_mat
binary matrix which store move constraints
train_hist
data.frame with training history information (details depends on the training procedure)
An S4 class to represent a hierarchical fit of a Mixture of Multinomials model, extends IclPath-class.
model
a MoM-class
object to store the model fitted
name
generative model name
icl
icl value of the fitted model
K
number of extracted clusters over row and columns
cl
a numeric vector with row and columns cluster indexes
obs_stats
a list with the following elements:
counts: numeric vector of size K with number of elements in each clusters
x_counts: matrix of size K*D with the number of occurrence of modality word in each clusters
path
a list of size K-1 with each part of the path described by:
icl1: icl value reached with this solution for alpha=1
logalpha: log(alpha) value where this solution is better than its parent
K: number of clusters
cl: vector of cluster indexes
k,l: indexes of the clusters that were merged at this step
merge_mat: lower triangular matrix of delta icl values
obs_stats: a list with the following elements:
counts: numeric vector of size K with the number of elements in each cluster
x_counts: matrix of size K*D with the number of occurrences of each modality (word) in each cluster
logalpha
value of log(alpha)
ggtree
data.frame with complete merge tree for easy plotting with ggplot2
tree
numeric vector with the merge tree; tree[i] contains the index of the father of i
train_hist
data.frame with training history information (details depends on the training procedure)
An S4 class to represent a multivariate mixture of regression model. The model follows [minka-linear](https://tminka.github.io/papers/minka-linear.pdf). Its generative model relies on the Wishart distribution and the matrix-normal distribution.
The MoR-class must be used when fitting a simple Mixture of Regression, whereas the MoRPrior-class must be used when fitting a CombinedModels-class.
MoRPrior(formula, tau = 0.001, N0 = NaN, epsilon = as.matrix(NaN))
MoR(formula, alpha = 1, tau = 0.1, N0 = NaN, epsilon = as.matrix(NaN))
formula |
a |
tau |
Prior parameter (inverse variance) default 0.001 |
N0 |
Prior parameter (default to NaN, in this case N0 will be fixed equal to the number of columns of Y.) |
epsilon |
Covariance matrix prior parameter (default to NaN, in this case epsilon will be fixed to a diagonal variance matrix equal to 0.1 times the variance of the regression residuals with only one cluster.) |
alpha |
Dirichlet prior parameter over the cluster proportions (default to 1) |
a MoRPrior-class
object
a MoR-class
object
Other DlvmModels: CombinedModels, DcLbm, DcSbm, DiagGmm, DlvmPrior-class, Gmm, Lca, MoM, MultSbm, Sbm, greed()
MoRPrior(y ~ x1 + x2)
MoRPrior(y ~ x1 + x2, N0 = 100)
MoRPrior(cbind(y1, y2) ~ x1 + x2, N0 = 100)
MoR(y ~ x1 + x2)
MoR(y ~ x1 + x2, N0 = 100)
MoR(cbind(y1, y2) ~ x1 + x2, N0 = 100)
An S4 class to represent a fit of a multivariate mixture of regression model, extends IclFit-class.
model
a MoR-class
object to store the model fitted
name
generative model name
icl
icl value of the fitted model
K
number of extracted clusters over row and columns
cl
a numeric vector with row and columns cluster indexes
obs_stats
a list with the following elements:
counts: numeric vector of size K with number of elements in each clusters
mvmregs: list of size K with statistics for each cluster
move_mat
binary matrix which store move constraints
train_hist
data.frame with training history information (details depends on the training procedure)
An S4 class to represent a hierarchical fit of a multivariate mixture of regression model, extends IclPath-class.
model
a MoR-class
object to store the model fitted
name
generative model name
icl
icl value of the fitted model
K
number of extracted clusters over row and columns
cl
a numeric vector with row and columns cluster indexes
obs_stats
a list with the following elements:
counts: numeric vector of size K with number of elements in each clusters
mvmregs: list of size K with statistics for each cluster
path
a list of size K-1 with each part of the path described by:
icl1: icl value reached with this solution for alpha=1
logalpha: log(alpha) value where this solution is better than its parent
K: number of clusters
k,l: indexes of the clusters that were merged at this step
merge_mat: lower triangular matrix of delta icl values
obs_stats: a list with the following elements:
counts: numeric vector of size K with the number of elements in each cluster
mvmregs: list of size K with statistics for each cluster
logalpha
value of log(alpha)
ggtree
data.frame with complete merge tree for easy plotting with ggplot2
tree
numeric vector with the merge tree; tree[i] contains the index of the father of i
train_hist
data.frame with training history information (details depends on the training procedure)
An S4 class to represent a greedy algorithm with multiple starts (extends Alg-class).
Multistarts(nb_start = 10)
nb_start |
number of random starts (default to 10) |
a Multistarts-class
object
Multistarts
: Multistarts algorithm class constructor
nb_start
number of random starts (default to 10)
Multistarts()
Multistarts(15)
An S4 class to represent a Multinomial Stochastic Block Model. Such a model can be used to cluster the vertices of a multi-layer graph and to model a square adjacency cube of size NxNxM.
These classes mainly store the prior parameter values (alpha, beta) of the underlying generative model.
The MultSbm-class must be used when fitting a simple MultSbm, whereas the MultSbmPrior-class must be used when fitting a CombinedModels-class.
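Written out with the documented parameter names, the Multinomial Stochastic Block Model can be sketched as below. The symbols $\theta_{kl}$ (layer probabilities between clusters $k$ and $l$), $Z_i$ (cluster indicator of vertex $i$) and $L_{ij}$ (total count of the pair) are notational assumptions introduced for this sketch.

```latex
\begin{aligned}
\pi &\sim \mathrm{Dirichlet}_K(\alpha) \\
\theta_{kl} &\sim \mathrm{Dirichlet}_M(\beta) \\
Z_i &\sim \mathcal{M}(1, \pi) \\
X_{ij\cdot} \mid Z_{ik} Z_{jl} = 1 &\sim \mathcal{M}(L_{ij}, \theta_{kl}), \qquad L_{ij} = \sum_{m=1}^{M} X_{ijm}
\end{aligned}
```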
MultSbmPrior(beta = 1, type = "guess")
MultSbm(alpha = 1, beta = 1, type = "guess")
beta |
Dirichlet prior parameter over Multinomial links |
type |
define the type of networks (either "directed", "undirected" or "guess", default to "guess"), for undirected graphs the adjacency matrix is supposed to be symmetric. |
alpha |
Dirichlet prior parameter over the cluster proportions (default to 1) |
a MultSbmPrior-class
object
a MultSbm-class
object
MultSbmFit-class, MultSbmPath-class
Other DlvmModels: CombinedModels, DcLbm, DcSbm, DiagGmm, DlvmPrior-class, Gmm, Lca, MoM, MoR, Sbm, greed()
MultSbmPrior()
MultSbmPrior(type = "undirected")
MultSbm()
MultSbm(type = "undirected")
An S4 class to represent a fit of a Multinomial Stochastic Block Model, extends IclFit-class.
model
a MultSbm-class
object to store the model fitted
name
generative model name
icl
icl value of the fitted model
K
number of extracted clusters over row and columns
cl
a numeric vector with row and columns cluster indexes
obs_stats
a list with the following elements:
counts: numeric vector of size K with number of elements in each clusters
x_counts: cube of size KxKxM with the number of links between each pair of clusters
move_mat
binary matrix which store move constraints
train_hist
data.frame with training history information (details depends on the training procedure)
An S4 class to represent a hierarchical fit of a Multinomial Stochastic Block Model, extends IclPath-class.
model
a MultSbm-class
object to store the model fitted
name
generative model name
icl
icl value of the fitted model
K
number of extracted clusters over row and columns
cl
a numeric vector with row and columns cluster indexes
obs_stats
a list with the following elements:
counts: numeric vector of size K with number of elements in each clusters
x_counts: cube of size KxKxM with the number of links between each pair of clusters
path
a list of size K-1 with each part of the path described by:
icl1: icl value reached with this solution for alpha=1
logalpha: log(alpha) value where this solution is better than its parent
K: number of clusters
cl: vector of cluster indexes
k,l: indexes of the clusters that were merged at this step
merge_mat: lower triangular matrix of delta icl values
obs_stats: a list with the following elements:
counts: numeric vector of size K with the number of elements in each cluster
x_counts: cube of size KxKxM with the number of links between each pair of clusters
logalpha
value of log(alpha)
ggtree
data.frame with complete merge tree for easy plotting with ggplot2
tree
numeric vector with the merge tree; tree[i] contains the index of the father of i
train_hist
data.frame with training history information (details depends on the training procedure)
plot,MultSbmFit,missing-method
Categorical data from UCI Machine Learning Repository describing 8124 mushrooms with 22 phenotype variables. Each mushroom is classified as "edible" or "poisonous" and the goal is to recover the mushroom class from its phenotype.
data(mushroom)
An R data.frame with a variable edibility used as label and 22 categorical variables with no names. More detail on the UCI webpage describing the data.
https://archive.ics.uci.edu/ml/datasets/Mushroom
data(mushroom)
Network of co-attendance of suspected members of the Ndrangheta criminal organization at summits (meetings whose purpose is to make important decisions and/or affiliations, but also to solve internal problems and to establish roles and powers) taking place between 2007 and 2009.
data(Ndrangheta)
An object of class list with two fields:
network adjacency matrix as a matrix of size 146x146
data frame of nodes meta information with features :
id of the node, rownames of network adjacency matrix
factor with the locali affiliation of the node , "OUT": Suspects not belonging to La Lombardia, "MISS": Information not available, other Locali Id.
factor with the type of hierarchical position of the node "MISS": Information not available,"boss": high hierarchical position, "aff": affiliate
ucinetsoftware/datasets/covert-networks
Extended Stochastic Block Models with Application to Criminal Networks, Sirio Legramanti and Tommaso Rigon and Daniele Durante and David B. Dunson, 2021, (arXiv:2007.08569).
data(Ndrangheta)
NewGuinea
a social network of 16 tribes, where two types of interactions were recorded, amounting to either friendship or enmity [read-cultures-1954].
data(NewGuinea)
A binary array of size (16,16,3). The first layer encodes enmity, the second encodes friendship, and the third encodes the absence of relation between the two tribes.
https://networks.skewed.de/net/new_guinea_tribes
Kenneth E. Read, “Cultures of the Central Highlands, New Guinea”, Southwestern J. of Anthropology, 10(1):1-43 (1954). DOI: 10.1086/soutjanth.10.1.3629074
data(NewGuinea)
Compute the normalized mutual information of two discrete samples
NMI(cl1, cl2)
cl1 |
vector of discrete labels |
cl2 |
vector of discrete labels |
the normalized mutual information between the two discrete samples
cl1 <- sample(2, 500, replace = TRUE)
cl2 <- sample(2, 500, replace = TRUE)
NMI(cl1, cl2)
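One way to make the normalization concrete is the base-R sketch below, which uses the identity MI = H(cl1) + H(cl2) - H(cl1, cl2). The helpers `ent_sketch` and `nmi_sketch` are hypothetical, and dividing by the larger of the two entropies is an assumption: other conventions (mean or geometric mean of the entropies) exist, and the package may use a different one.

```r
# Hypothetical helpers (not part of greed).
ent_sketch <- function(counts) {       # entropy (in nats) of a table of counts
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

nmi_sketch <- function(cl1, cl2) {
  h1 <- ent_sketch(table(cl1))
  h2 <- ent_sketch(table(cl2))
  h12 <- ent_sketch(table(cl1, cl2))   # joint entropy of the pair of labelings
  mi <- h1 + h2 - h12
  # Assumption: normalize by the larger entropy (undefined if both are 0).
  mi / max(h1, h2)
}

cl <- rep(1:2, each = 50)
nmi_sketch(cl, cl)            # identical labelings give 1
nmi_sketch(cl, rep(1:2, 50))  # labelings with independent frequencies give 0
```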
DcLbmFit-class
Plot a DcLbmFit-class object
## S4 method for signature 'DcLbmFit,missing'
plot(x, type = "blocks")
x |
|
type |
a string which specifies the plot type:
|
a ggplot2 graphic
DcLbmPath-class
Plot a DcLbmPath-class object
## S4 method for signature 'DcLbmPath,missing'
plot(x, type = "tree")
x |
|
type |
a string which specifies the plot type:
|
a ggplot2 graphic
DcSbmFit-class object
Plot a DcSbmFit-class object
## S4 method for signature 'DcSbmFit,missing'
plot(x, type = "blocks")
x |
|
type |
a string which specifies the plot type:
|
a ggplot2 graphic
DiagGmmFit-class object
Plot a DiagGmmFit-class object
## S4 method for signature 'DiagGmmFit,missing'
plot(x, type = "marginals")
x |
|
type |
a string which specifies the plot type:
|
a ggplot2 graphic
GmmFit-class object
Plot a GmmFit-class object
## S4 method for signature 'GmmFit,missing'
plot(x, type = "marginals")
x |
|
type |
a string which specifies the plot type:
|
a ggplot2 graphic
IclPath-class object
Plot an IclPath-class object
## S4 method for signature 'IclPath,missing'
plot(x, type = "tree")
x |
|
type |
a string which specifies the plot type:
|
a ggplot2 graphic
LcaFit-class object
Plot a LcaFit-class object
## S4 method for signature 'LcaFit,missing'
plot(x, type = "marginals")
x |
|
type |
a string which specifies the plot type:
|
a ggplot2 graphic
MoMFit-class object
Plot a MoMFit-class object
## S4 method for signature 'MoMFit,missing'
plot(x, type = "blocks")
x |
|
type |
a string which specifies the plot type:
|
a ggplot2 graphic
MultSbmFit-class object
Plot a MultSbmFit-class object
## S4 method for signature 'MultSbmFit,missing'
plot(x, type = "blocks")
x |
|
type |
a string which specifies the plot type:
|
a ggplot2 graphic
SbmFit-class object
Plot a SbmFit-class object
## S4 method for signature 'SbmFit,missing'
plot(x, type = "blocks")
x |
|
type |
a string which specifies the plot type:
|
a ggplot2 graphic
IclFit-class object
This method takes an IclFit-class object and returns the prior used.
prior(fit)
## S4 method for signature 'IclFit'
prior(fit)
fit |
an |
An S4 object describing the prior parameters
IclFit
: IclFit method
rdcsbm
returns an adjacency matrix and the cluster labels generated randomly using a Degree Corrected Stochastic Block Model.
rdcsbm(N, pi, mu, betain, betaout)
N |
A numeric value the size of the graph to generate |
pi |
A numeric vector of length K with clusters proportions. Must sum up to 1. |
mu |
A numeric matrix of dim K x K with the connectivity pattern to generate, elements in [0,1]. |
betain |
A numeric vector of length N which specifies the in-degree correction; it will be normalized per cluster during the generation. |
betaout |
A numeric vector of length N which specifies the out-degree correction; it will be normalized per cluster during the generation. |
It takes the sample size, cluster proportions and emission matrix as input and samples a graph accordingly, together with the cluster labels.
A list with fields:
x: the count matrix as a dgCMatrix
K: number of generated clusters
N: number of vertex
cl: vector of clusters labels
pi: clusters proportions
mu: connectivity matrix
betain: normalized in-degree parameters
betaout: normalized out-degree parameters
rlbm
returns the adjacency matrix and the cluster labels generated randomly with a Latent Block Model.
rlbm(Nr, Nc, pir, pic, mu)
Nr |
desired Number of rows |
Nc |
desired Number of column |
pir |
A numeric vector of length Kr with rows clusters proportions (will be normalized to sum up to 1). |
pic |
A numeric vector of length Kc with columns clusters proportions (will be normalized to sum up to 1). |
mu |
A numeric matrix of dim Kr x Kc with the connectivity pattern to generate. elements in [0,1]. |
This function takes the desired graph size, cluster proportions and connectivity matrix as input and samples a graph accordingly, together with the cluster labels.
A list with fields:
x: the generated data matrix as a dgCMatrix
clr: vector of row clusters labels
clc: vector of column clusters labels
Kr: number of generated row clusters
Kc: number of generated column clusters
Nr: number of rows
Nc: number of column
pir: row clusters proportions
pic: column clusters proportions
mu: connectivity matrix
simu <- rlbm(500, 1000, rep(1 / 5, 5), rep(1 / 10, 10), matrix(runif(50), 5, 10))
rlca
returns a data.frame with factor sampled from an lca model
rlca(N, pi, theta)
N |
The size of the graph to generate |
pi |
A numeric vector of length K with clusters proportions (will be normalized to sum up to 1). |
theta |
A list of size V |
This function takes the desired sample size, cluster proportions and per-variable level probabilities as input and samples a data.frame of factors accordingly, together with the cluster labels.
A list with fields:
x: the sampled data.frame of factors
K: number of generated clusters
N: number of vertex
cl: vector of clusters labels
pi: clusters proportions
theta:
theta <- list(
  matrix(c(0.1, 0.9, 0.9, 0.1, 0.5, 0.5, 0.3, 0.7), ncol = 2, byrow = TRUE),
  matrix(c(0.5, 0.5, 0.3, 0.7, 0.05, 0.95, 0.3, 0.7), ncol = 2, byrow = TRUE),
  matrix(c(0.5, 0.5, 0.9, 0.1, 0.5, 0.5, 0.1, 0.9), ncol = 2, byrow = TRUE)
)
lca.data <- rlca(100, rep(1 / 4, 4), theta)
rmm
returns a count matrix and the cluster labels generated randomly with a Mixture of Multinomial model.
rmm(N, pi, mu, lambda)
N |
A numeric value the size of the graph to generate |
pi |
A numeric vector of length K with clusters proportions. Must sum up to 1. |
mu |
A numeric matrix of dim k x D with the clusters patterns to generate, all elements in [0,1]. |
lambda |
A numeric value which specify the expectation for the row sums. |
It takes the sample size, cluster proportions and emission matrix as input and samples a count matrix accordingly, together with the cluster labels.
A list with fields:
x: the count matrix as a dgCMatrix
K: number of generated clusters
N: number of vertex
cl: vector of clusters labels
pi: clusters proportions
mu: connectivity matrix
lambda: expectation of row sums
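The sampling scheme described above can be sketched in base R as follows. This is a minimal illustration and not the package's implementation: the name `mm_sample_sketch` is hypothetical, drawing the row totals from a Poisson distribution with mean lambda is an assumption, and the result is a dense matrix rather than a dgCMatrix.

```r
# Hypothetical sampler (not the package's rmm): mixture of multinomials.
mm_sample_sketch <- function(N, pi, mu, lambda) {
  K <- length(pi)
  cl <- sample(K, N, replace = TRUE, prob = pi)  # cluster label of each row
  L <- rpois(N, lambda)                          # assumption: Poisson row totals
  # One multinomial draw per row, using the pattern of the row's cluster.
  x <- t(sapply(seq_len(N), function(i) as.vector(rmultinom(1, L[i], mu[cl[i], ]))))
  list(x = x, K = K, N = N, cl = cl, pi = pi, mu = mu, lambda = lambda)
}

set.seed(1)
mu <- rbind(c(0.8, 0.1, 0.1), c(0.1, 0.1, 0.8))
simu_mm <- mm_sample_sketch(50, c(0.5, 0.5), mu, lambda = 20)
dim(simu_mm$x)  # 50 rows, one column per category
```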
rmreg
returns an X matrix, a y vector and the cluster labels generated randomly with a Mixture of regression model.
rmreg(
  N, pi, A, sigma,
  X = cbind(rep(1, N), matrix(stats::rnorm(N * (ncol(A) - 1)), N, ncol(A) - 1))
)
N |
A numeric value the size of the graph to generate |
pi |
A numeric vector of length K with clusters proportions (must sum up to 1) |
A |
A numeric matrix of dim K x d with the regression coefficient |
sigma |
A numeric of length 1 with the target conditional variance |
X |
A matrix of covariate |
It takes the sample size, cluster proportions, regression parameter matrix and variance as input and samples data accordingly.
A list with fields:
X: the covariate matrix
y: the target feature
K: number of generated clusters
N: sample size
cl: vector of clusters labels
pi: clusters proportions
A: regression coefficients used in the simulation
sigma: conditional variance
rmultsbm
returns the multi-graph adjacency matrix and the cluster labels generated randomly with a Multinomial Stochastic Block Model.
rmultsbm(N, pi, mu, lambda)
N |
The size of the graph to generate |
pi |
A numeric vector of length K with clusters proportions (will be normalized to sum up to 1). |
mu |
A numeric array of dim K x K x M with the connectivity pattern to generate. elements in [0,1]. |
lambda |
A double with the Poisson intensity to generate the total counts |
This function takes the desired graph size, cluster proportions and connectivity matrix as input and samples a graph accordingly, together with the cluster labels.
A list with fields:
x: the multi-graph adjacency matrix as an array
K: number of generated clusters
N: number of vertex
cl: vector of clusters labels
pi: clusters proportions
mu: connectivity matrix
lambda:
simu <- rsbm(100, rep(1 / 5, 5), diag(rep(0.1, 5)) + 0.001)
rsbm
returns the adjacency matrix and the cluster labels generated randomly with a Stochastic Block Model.
rsbm(N, pi, mu)
N |
The size of the graph to generate |
pi |
A numeric vector of length K with clusters proportions (will be normalized to sum up to 1). |
mu |
A numeric matrix of dim K x K with the connectivity pattern to generate. elements in [0,1]. |
This function takes the desired graph size, cluster proportions and connectivity matrix as input and samples a graph accordingly, together with the cluster labels.
A list with fields:
x: the graph adjacency matrix as a dgCMatrix
K: number of generated clusters
N: number of vertex
cl: vector of clusters labels
pi: clusters proportions
mu: connectivity matrix
simu <- rsbm(100, rep(1 / 5, 5), diag(rep(0.1, 5)) + 0.001)
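To make the sampling scheme concrete, a Stochastic Block Model draw can be sketched in base R as below. This is not the package's implementation: the name `sbm_sample_sketch` is hypothetical, the graph is assumed to be directed and without self-loops, and the adjacency matrix is returned as a dense matrix rather than a dgCMatrix.

```r
# Hypothetical sampler (not the package's rsbm): directed SBM, dense output.
sbm_sample_sketch <- function(N, pi, mu) {
  K <- length(pi)
  cl <- sample(K, N, replace = TRUE, prob = pi)  # sample() rescales prob itself
  P <- mu[cl, cl, drop = FALSE]                  # N x N matrix of edge probabilities
  x <- matrix(rbinom(N * N, 1, P), N, N)         # independent Bernoulli edges
  diag(x) <- 0                                   # assumption: no self-loops
  list(x = x, K = K, N = N, cl = cl, pi = pi, mu = mu)
}

set.seed(1)
simu_sketch <- sbm_sample_sketch(100, rep(1 / 5, 5), diag(rep(0.1, 5)) + 0.001)
dim(simu_sketch$x)  # 100 x 100 binary adjacency matrix
```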
An S4 class to represent a Stochastic Block Model.
Such a model can be used to cluster graph vertices and to model a square adjacency matrix.
These classes mainly store the prior parameter values (alpha, a0, b0) of the underlying generative model.
The Sbm-class must be used when fitting a simple Sbm, whereas the SbmPrior-class must be used when fitting a CombinedModels-class.
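Written out with the documented parameter names, the Stochastic Block Model can be sketched as below. The symbols $\theta_{kl}$ (connection probability between clusters $k$ and $l$) and $Z_i$ (cluster indicator of vertex $i$) are notational assumptions introduced for this sketch.

```latex
\begin{aligned}
\pi &\sim \mathrm{Dirichlet}_K(\alpha) \\
\theta_{kl} &\sim \mathrm{Beta}(a_0, b_0) \\
Z_i &\sim \mathcal{M}(1, \pi) \\
X_{ij} \mid Z_{ik} Z_{jl} = 1 &\sim \mathcal{B}(\theta_{kl})
\end{aligned}
```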
SbmPrior(a0 = 1, b0 = 1, type = "guess")
Sbm(alpha = 1, a0 = 1, b0 = 1, type = "guess")
a0 |
Beta prior parameter over links (default to 1) |
b0 |
Beta prior parameter over no-links (default to 1) |
type |
define the type of networks (either "directed", "undirected" or "guess", default to "guess"), for undirected graphs the adjacency matrix is supposed to be symmetric. |
alpha |
Dirichlet prior parameter over the cluster proportions (default to 1) |
a SbmPrior-class
object
a Sbm-class
object
Nowicki, Krzysztof and Tom A. B. Snijders (2001). “Estimation and prediction for stochastic block structures”. In: Journal of the American Statistical Association 96.455, pp. 1077–1087
Other DlvmModels: CombinedModels, DcLbm, DcSbm, DiagGmm, DlvmPrior-class, Gmm, Lca, MoM, MoR, MultSbm, greed()
Sbm()
SbmPrior()
SbmPrior(type = "undirected")
Sbm()
Sbm(type = "undirected")
An S4 class to represent a fit of a Stochastic Block Model, extends IclFit-class.
model
a Sbm-class
object to store the model fitted
name
generative model name
icl
icl value of the fitted model
K
number of extracted clusters over rows and columns
cl
a numeric vector with row and columns cluster indexes
obs_stats
a list with the following elements:
counts: numeric vector of size K with number of elements in each clusters
x_counts: matrix of size K*K with the number of links between each pair of clusters
move_mat
binary matrix which store move constraints
train_hist
data.frame with training history information (details depends on the training procedure)
An S4 class to represent a hierarchical fit of a Stochastic Block Model, extends IclPath-class.
model
a Sbm-class
object to store the model fitted
name
generative model name
icl
icl value of the fitted model
K
number of extracted clusters over row and columns
cl
a numeric vector with row and columns cluster indexes
obs_stats
a list with the following elements:
counts: numeric vector of size K with number of elements in each clusters
x_counts: matrix of size K*K with the number of links between each pair of clusters
path
a list of size K-1 that stores all the solutions along the path. Each element is a list with the following fields:
icl1: icl value reached with this solution for alpha=1
logalpha: log(alpha) value where this solution is better than its parent
K: number of clusters
cl: vector of cluster indexes
k,l: indexes of the clusters that were merged at this step
merge_mat: lower triangular matrix of delta icl values
obs_stats: a list with the following elements:
counts: numeric vector of size K with the number of elements in each cluster
x_counts: matrix of size K*K with the number of links between each pair of clusters
logalpha
value of log(alpha)
ggtree
data.frame with complete merge tree for easy plotting with ggplot2
tree
numeric vector with the merge tree; tree[i] contains the index of the father of i
train_hist
data.frame with training history information (details depends on the training procedure)
An S4 class to represent a greedy algorithm with initialization from spectral clustering and/or k-means (extends the Alg-class class).
Seed()
a Seed-class object
Seed: Seed algorithm class constructor
Seed()
SevenGraders
A small multiplex network of friendships among 29 seventh grade students in Victoria, Australia. Students nominated classmates for three different activities (who do you get on with in the class, who are your best friends, and who would you prefer to work with). Edge direction for each of these three types of edges indicates if node i nominated node j, and the edge weight gives the frequency of this nomination. Students 1-12 are boys and 13-29 are girls. The KONECT version of this network is the collapse of de Domenico's multiplex version.
data(SevenGraders)
A binary array of size (29,29,3) containing directed graphs. The first layer encodes "getting along in class" while the second encodes the best-friendship (can be one-way). The third encodes the preferred work relation.
https://networks.skewed.de/net/7th_graders
M. Vickers and S. Chan, "Representing Classroom Social Structure." Melbourne: Victoria Institute of Secondary Education, (1981).
data(SevenGraders)
Print an IclPath-class object; the model type and the number of clusters found are provided.
## S4 method for signature 'IclFit'
show(object)
object |
|
None (invisible NULL). No return value, called for side effects.
Performs regularized spectral clustering of a sparse adjacency matrix.
spectral(X, K)
X |
An adjacency matrix in sparse format (see the |
K |
Desired number of clusters |
cl Vector of cluster labels
Tai Qin, Karl Rohe. Regularized Spectral Clustering under the Degree-Corrected Stochastic Block Model. NIPS 2013.
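To make the idea behind regularized spectral clustering concrete, here is a dense base R sketch in the spirit of the reference above: degrees are inflated by a constant tau before normalizing the adjacency matrix, the leading K eigenvectors are row-normalized, and k-means is run on the result. The function name rsc_sketch, the choice of tau as the average degree, and the symmetrization step are assumptions made for this example; it is not the implementation used by spectral() and it ignores sparsity.

```r
rsc_sketch <- function(X, K) {
  A <- as.matrix(X)
  A <- (A + t(A)) / 2                       # treat the graph as undirected (assumption)
  d <- rowSums(A)
  tau <- mean(d)                            # regularizer: average degree (assumption)
  s <- 1 / sqrt(d + tau)
  L <- A * outer(s, s)                      # D_tau^{-1/2} A D_tau^{-1/2}
  e <- eigen(L, symmetric = TRUE)
  top <- order(abs(e$values), decreasing = TRUE)[seq_len(K)]
  U <- e$vectors[, top, drop = FALSE]
  U <- U / pmax(sqrt(rowSums(U^2)), 1e-12)  # project rows onto the unit sphere
  kmeans(U, centers = K, nstart = 10)$cluster
}

# Toy example: two disconnected cliques of 5 nodes each
A <- matrix(0, 10, 10)
A[1:5, 1:5] <- 1
A[6:10, 6:10] <- 1
diag(A) <- 0
cl <- rsc_sketch(A, 2)
```

Cluster labels returned by k-means are arbitrary, so only the grouping of nodes is meaningful, not the label values.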
Convert a binary adjacency matrix with missing values to a cube
to_multinomial(X)
X |
A binary adjacency matrix with NA |
a cube
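One way to picture such a conversion is a one-hot encoding of each entry along a third dimension. The base R sketch below assumes a cube with three slices (entry equal to 0, entry equal to 1, entry missing); the function name to_cube_sketch and this slice layout are assumptions for illustration, and the layout actually produced by to_multinomial() may differ.

```r
to_cube_sketch <- function(X) {
  n <- nrow(X)
  m <- ncol(X)
  cube <- array(0, dim = c(n, m, 3))
  cube[, , 1] <- (!is.na(X) & X == 0) * 1  # slice 1: observed non-edges
  cube[, , 2] <- (!is.na(X) & X == 1) * 1  # slice 2: observed edges
  cube[, , 3] <- is.na(X) * 1              # slice 3: missing entries
  cube
}

# Toy 2 x 2 matrix with one missing entry (hypothetical data)
X <- matrix(c(0, 1, NA, 1), 2, 2)
cube <- to_cube_sketch(X)
```

With this encoding every (i, j) fiber of the cube sums to one, which is what lets a multinomial model treat missingness as a third outcome.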
Young people survey data from Miroslav Sabo, available on the Kaggle platform. This is an authentic example of questionnaire data in which Slovakian young people (15-30 years old) were asked about their musical preferences across different genres (rock, hip-hop, classical, etc.).
data(Youngpeoplesurvey)
An R data.frame with columns containing each of the 150 original variables of the study.
https://www.kaggle.com/miroslavsabo/young-people-survey
data(Youngpeoplesurvey)