Title: | Process an MCMC Sample of Clusterings |
---|---|
Description: | Implements methods for processing a sample of (hard) clusterings, e.g. the MCMC output of a Bayesian clustering model. Among them are methods that find a single best clustering to represent the sample, which are based on the posterior similarity matrix or a relabelling algorithm. |
Authors: | Arno Fritsch |
Maintainer: | Arno Fritsch <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0.1 |
Built: | 2024-11-07 06:18:20 UTC |
Source: | CRAN |
Package: | mcclust |
Type: | Package |
Version: | 1.0 |
Date: | 2009-03-12 |
License: | GPL (>= 2) |
LazyLoad: | yes |
Most important functions:

comp.psm computes the posterior similarity matrix (PSM). Based on the PSM, maxpear and minbinder provide several optimization methods to find a clustering that maximizes the posterior expected adjusted Rand index with the true clustering or minimizes the posterior expectation of the loss function of Binder (1978). minbinder additionally provides the optimization algorithm of Lau and Green (2007). relabel contains the relabelling algorithm of Stephens (2000). arandi and vi.dist compute distance functions for clusterings: the (adjusted) Rand index and the entropy-based variation of information distance.
Arno Fritsch
Maintainer: Arno Fritsch <[email protected]>
Binder, D.A. (1978) Bayesian cluster analysis, Biometrika 65, 31–38.
Fritsch, A. and Ickstadt, K. (2009) An improved criterion for clustering based on the posterior similarity matrix, Bayesian Analysis, accepted.
Lau, J.W. and Green, P.J. (2007) Bayesian model based clustering procedures, Journal of Computational and Graphical Statistics 16, 526–558.
Stephens, M. (2000) Dealing with label switching in mixture models. Journal of the Royal Statistical Society Series B, 62, 795–809.
data(cls.draw2)  # sample of 500 clusterings from a Bayesian cluster model
tru.class <- rep(1:8, each = 50)  # the true grouping of the observations
psm2 <- comp.psm(cls.draw2)  # posterior similarity matrix
# optimize criteria based on the PSM
mbind2 <- minbinder(psm2)
mpear2 <- maxpear(psm2)
# relabelling: keep only clusterings with the most common number of clusters
k <- apply(cls.draw2, 1, function(cl) length(table(cl)))
max.k <- as.numeric(names(table(k))[which.max(table(k))])
relab2 <- relabel(cls.draw2[k == max.k, ])
# compare clusterings found by the different methods with the true grouping
arandi(mpear2$cl, tru.class)
arandi(mbind2$cl, tru.class)
arandi(relab2$cl, tru.class)
Computes the adjusted or unadjusted Rand index between two clusterings/partitions of the same objects.
arandi(cl1, cl2, adjust = TRUE)
cl1, cl2 | vectors of cluster memberships (need to have the same length). |
adjust | logical. Should the index be adjusted? Defaults to TRUE. |
The Rand index is based on how often the two clusterings agree in the treatment of pairs of observations, where agreement means that two observations are in/not in the same cluster in both clusterings. The adjusted Rand index adjusts for the expected number of chance agreements. Formulas of Hubert and Arabie (1985) are used for the computation.
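The computation uses only the contingency table of the two label vectors. A minimal sketch of the Hubert-Arabie formula (for illustration only; ari.sketch is a made-up name, not the package's implementation):

ari.sketch <- function(cl1, cl2) {
  tab <- table(cl1, cl2)                  # contingency table of the two clusterings
  sum.ij <- sum(choose(tab, 2))           # agreeing pairs within cells
  sum.i <- sum(choose(rowSums(tab), 2))   # pairs within clusters of cl1
  sum.j <- sum(choose(colSums(tab), 2))   # pairs within clusters of cl2
  expected <- sum.i * sum.j / choose(sum(tab), 2)  # expected chance agreement
  (sum.ij - expected) / (0.5 * (sum.i + sum.j) - expected)
}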
Arno Fritsch, [email protected]
Hubert, L. and Arabie, P. (1985): Comparing partitions. Journal of Classification, 2, 193–218.
cl1 <- sample(1:3, 10, replace = TRUE)
cl2 <- c(cl1[1:5], sample(1:3, 5, replace = TRUE))
arandi(cl1, cl2)
arandi(cl1, cl2, adjust = FALSE)
Output of a Dirichlet process mixture model with normal components fitted to the data set Ysim1.5. True clusters are given by rep(1:8, each = 50).
data(cls.draw1.5)
matrix with 500 rows and 400 columns. Each row contains a clustering of the 400 observations.
Fritsch, A. and Ickstadt, K. (2009) An improved criterion for clustering based on the posterior similarity matrix, Bayesian Analysis, accepted.
Output of a Dirichlet process mixture model with normal components fitted to the data set Ysim2. True clusters are given by rep(1:8, each = 50).
data(cls.draw2)
matrix with 500 rows and 400 columns. Each row contains a clustering of the 400 observations.
Fritsch, A. and Ickstadt, K. (2009) An improved criterion for clustering based on the posterior similarity matrix, Bayesian Analysis, accepted.
A similarity matrix is a symmetric matrix whose entry [i,j] is 1 if observations i and j are in the same cluster and 0 otherwise.
cltoSim(cl)
Simtocl(Sim)
cl | vector of cluster memberships |
Sim | similarity matrix |
Simtocl does not check whether Sim is a valid similarity matrix, e.g. that Sim[i,j]==1 if Sim[i,k]==1 and Sim[j,k]==1.
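If such a check is needed, one option is to round-trip through the two functions: a symmetric 0/1 matrix with unit diagonal is a valid similarity matrix exactly when converting it to a clustering and back reproduces it. A sketch (is.valid.sim is a made-up helper, not part of the package):

is.valid.sim <- function(Sim) {
  isSymmetric(Sim) && all(Sim %in% c(0, 1)) && all(diag(Sim) == 1) &&
    all(cltoSim(Simtocl(Sim)) == Sim)  # round trip reproduces a valid matrix
}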
Arno Fritsch, [email protected]
comp.psm for an average similarity matrix.
cl <- c(3, 3, 1, 2, 2)
(Sim <- cltoSim(cl))
Simtocl(Sim)
# not a valid similarity matrix
(Sim2 <- matrix(c(1,0,1,0,1,1,1,1,1), ncol = 3))
Simtocl(Sim2)  # no warning
For a sample of clusterings of the same objects, the proportion of clusterings in which observations i and j are together in a cluster is computed for every pair, and the matrix containing all proportions is returned.
comp.psm(cls)
cls | a matrix in which every row corresponds to a clustering of the ncol(cls) objects. |
In Bayesian cluster analysis the posterior similarity matrix is a matrix whose entry [i,j] contains the posterior probability that observations i and j are together in a cluster. It is estimated by the proportion of a posteriori clusterings in which i and j cluster together.
a symmetric ncol(cls)*ncol(cls) matrix.
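Conceptually, the computation is just the average of the pairwise co-clustering indicators over the sample. A minimal sketch (illustrative; psm.sketch is a made-up name, and the package's implementation may differ):

psm.sketch <- function(cls) {
  psm <- matrix(0, ncol(cls), ncol(cls))
  for (m in seq_len(nrow(cls)))                 # one pass per sampled clustering
    psm <- psm + outer(cls[m, ], cls[m, ], "==")
  psm / nrow(cls)                               # proportions of co-clustering
}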
Arno Fritsch, [email protected]
(cls <- rbind(c(1,1,2,2), c(1,1,2,2), c(1,2,2,2), c(2,2,1,1)))
comp.psm(cls)
Based on a posterior similarity matrix of a sample of clusterings, maxpear finds the clustering that maximizes the posterior expected adjusted Rand index (PEAR) with the true clustering, while pear computes PEAR for several provided clusterings.
maxpear(psm, cls.draw = NULL, method = c("avg", "comp", "draws", "all"),
        max.k = NULL)
pear(cls, psm)
psm | a posterior similarity matrix, usually obtained from a call to comp.psm. |
cls, cls.draw | a matrix in which every row corresponds to a clustering of the ncol(psm) objects; cls.draw needs to be provided if method="draws" or "all". |
method | the maximization method used. Should be one of "avg", "comp", "draws" or "all". Defaults to "avg". |
max.k | integer; if method is "avg" or "comp", the maximum number of clusters up to which the hierarchical clustering is cut. |
For method="avg"
and "comp"
1-psm
is used as a distance matrix for hierarchical clustering with average/complete linkage.
The hierachical clustering is cut for the cluster sizes 1:max.k
and PEAR computed for these clusterings.
Method "draws"
simply computes PEAR for each row of cls.draw
and takes the maximum.
If method="all"
all maximization methods are applied.
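For a single clustering, PEAR can be written in terms of the pairwise co-clustering indicators and the PSM entries, following the criterion of Fritsch and Ickstadt (2009). A hedged sketch (pear.sketch is a made-up name, not the package's implementation):

pear.sketch <- function(cl, psm) {
  I <- cltoSim(cl)[lower.tri(psm)]        # indicators: pair together in cl
  p <- psm[lower.tri(psm)]                # posterior co-clustering probabilities
  correc <- sum(I) * sum(p) / length(p)   # correction for chance agreement
  (sum(I * p) - correc) / (0.5 * (sum(I) + sum(p)) - correc)
}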
cl | clustering with maximal value of PEAR. If method="all", a matrix whose first row contains the best clustering overall and whose remaining rows contain the clusterings found by the individual methods. |
value | value of PEAR. A vector corresponding to the rows of cl if method="all". |
method | the maximization method used. |
Arno Fritsch, [email protected]
Fritsch, A. and Ickstadt, K. (2009) An improved criterion for clustering based on the posterior similarity matrix, Bayesian Analysis, accepted.
comp.psm for computing the posterior similarity matrix; minbinder, medv and relabel for other possibilities for processing a sample of clusterings.
data(cls.draw1.5)  # sample of 500 clusterings from a Bayesian cluster model
tru.class <- rep(1:8, each = 50)  # the true grouping of the observations
psm1.5 <- comp.psm(cls.draw1.5)
mpear1.5 <- maxpear(psm1.5)
table(mpear1.5$cl, tru.class)
# Does hierarchical clustering with Ward's method lead
# to a better value of PEAR?
hclust.ward <- hclust(as.dist(1 - psm1.5), method = "ward.D")  # "ward" in R < 3.1.0
cls.ward <- t(apply(matrix(1:20), 1, function(k) cutree(hclust.ward, k = k)))
ward1.5 <- pear(cls.ward, psm1.5)
max(ward1.5) > mpear1.5$value
Based on a posterior similarity matrix of a sample of clusterings, medv obtains a clustering by using 1-psm as a distance matrix for hierarchical clustering with complete linkage. The dendrogram is cut at a value h close to 1.
medv(psm, h = 0.99)
psm | a posterior similarity matrix, usually obtained from a call to comp.psm. |
h | the height at which the dendrogram is cut. Defaults to 0.99. |
vector of cluster memberships.
Arno Fritsch, [email protected]
Medvedovic, M., Yeung, K. and Bumgarner, R. (2004) Bayesian mixture model based clustering of replicated microarray data, Bioinformatics, 20, 1222–1232.
comp.psm for computing the posterior similarity matrix; maxpear, minbinder and relabel for other possibilities for processing a sample of clusterings.
data(cls.draw1.5)  # sample of 500 clusterings from a Bayesian cluster model
tru.class <- rep(1:8, each = 50)  # the true grouping of the observations
psm1.5 <- comp.psm(cls.draw1.5)
medv1.5 <- medv(psm1.5)
table(medv1.5, tru.class)
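As described above, the method amounts to complete-linkage clustering of 1-psm cut near height 1. Roughly (a sketch under standard hclust/cutree semantics; medv.sketch is a made-up name and not necessarily the package's exact code):

medv.sketch <- function(psm, h = 0.99) {
  hc <- hclust(as.dist(1 - psm), method = "complete")  # 1-psm as distances
  cutree(hc, h = h)                                    # cut dendrogram close to 1
}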
Based on a posterior similarity matrix of a sample of clusterings, minbinder finds the clustering that minimizes the posterior expectation of Binder's loss function, while binder computes the posterior expected loss for several provided clusterings.
minbinder(psm, cls.draw = NULL,
          method = c("avg", "comp", "draws", "laugreen", "all"),
          max.k = NULL, include.lg = FALSE, start.cl = NULL, tol = 0.001)
binder(cls, psm)
laugreen(psm, start.cl, tol = 0.001)
psm | a posterior similarity matrix, usually obtained from a call to comp.psm. |
cls, cls.draw | a matrix in which every row corresponds to a clustering of the ncol(psm) objects; cls.draw needs to be provided if method="draws" or "all". |
method | the optimization method used. Should be one of "avg", "comp", "draws", "laugreen" or "all". Defaults to "avg". |
max.k | integer; if method is "avg" or "comp", the maximum number of clusters up to which the hierarchical clustering is cut. |
include.lg | logical; should method "laugreen" be included when method="all"? Defaults to FALSE. |
start.cl | clustering used as a starting point for method="laugreen". |
tol | convergence tolerance for method="laugreen". |
The posterior expected loss is the sum, over all pairs of observations i and j, of the absolute difference between the indicator function of i and j clustering together and the posterior probability that they are in one cluster.
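In code, the criterion for a single clustering reduces to a sum over the lower triangle of the PSM. A minimal sketch (binder.sketch is a made-up name, not the package's implementation):

binder.sketch <- function(cl, psm) {
  I <- cltoSim(cl)[lower.tri(psm)]  # indicators: pair together in cl
  p <- psm[lower.tri(psm)]          # posterior co-clustering probabilities
  sum(abs(I - p))                   # posterior expected Binder loss
}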
For method="avg"
and "comp"
1-psm
is used as a distance matrix for hierarchical clustering with average/complete linkage.
The hierachical clustering is cut for the cluster sizes 1:max.k
and the posterior expected loss is computed for these clusterings.
Method "draws"
simply computes the posterior expected loss for each row of cls.draw
and takes the minimum.
Method "laugreen"
implements the algorithm of Lau and Green (2007), which is based on binary integer programming. Since the method can
take some time to converge it is only used if explicitly demanded with method="laugreen"
or method="all"
and include.lg=TRUE
.
If method="all"
all minimization methods except "laugreen"
are applied.
cl | clustering with minimal value of the posterior expected loss. If method="all", a matrix whose first row contains the best clustering overall and whose remaining rows contain the clusterings found by the individual methods. |
value | value of the posterior expected loss. A vector corresponding to the rows of cl if method="all". |
method | the optimization method used. |
iter.lg | if method="laugreen" was used, the number of iterations the algorithm needed to converge. |
Arno Fritsch, [email protected]
Binder, D.A. (1978) Bayesian cluster analysis, Biometrika 65, 31–38.
Fritsch, A. and Ickstadt, K. (2009) An improved criterion for clustering based on the posterior similarity matrix, Bayesian Analysis, accepted.
Lau, J.W. and Green, P.J. (2007) Bayesian model based clustering procedures, Journal of Computational and Graphical Statistics 16, 526–558.
comp.psm for computing the posterior similarity matrix; maxpear, medv and relabel for other possibilities for processing a sample of clusterings; lp for the linear programming.
data(cls.draw2)  # sample of 500 clusterings from a Bayesian cluster model
tru.class <- rep(1:8, each = 50)  # the true grouping of the observations
psm2 <- comp.psm(cls.draw2)
mbind2 <- minbinder(psm2)
table(mbind2$cl, tru.class)
# Does hierarchical clustering with Ward's method lead
# to a lower value of Binder's loss?
hclust.ward <- hclust(as.dist(1 - psm2), method = "ward.D")  # "ward" in R < 3.1.0
cls.ward <- t(apply(matrix(1:20), 1, function(k) cutree(hclust.ward, k = k)))
ward2 <- binder(cls.ward, psm2)
min(ward2) < mbind2$value
# Method "laugreen" is applied to 40 randomly selected observations
ind <- sample(1:400, 40)
mbind.lg <- minbinder(psm2[ind, ind], cls.draw2[, ind], method = "all",
                      include.lg = TRUE)
mbind.lg$value
Cluster labels of a clustering are replaced by 1:length(table(cl)).
norm.label(cl)
cl |
vector of cluster memberships |
the clustering with normed labels.
Arno Fritsch, [email protected]
relabel for labelling a sample of clusterings in the same way.
(cl <- sample(c(13, 12, 34), 13, replace = TRUE))
norm.label(cl)
(cl <- sample(c("a", "b", "f31"), 13, replace = TRUE))
norm.label(cl)
For a sample of clusterings in which corresponding clusters have different labels the algorithm attempts to bring the clusterings to a unique labelling.
relabel(cls, print.loss = TRUE)
cls | a matrix in which every row corresponds to a clustering of the ncol(cls) objects. |
print.loss | logical; should the current value of the loss function be printed after each iteration? Defaults to TRUE. |
The algorithm minimizes the loss function

\sum_{m=1}^{M} \sum_{i=1}^{n} \sum_{k=1}^{K} \left| I_{\{z_i^{(m)} = k\}} - \hat{p}_{ik} \right|

over the M clusterings, n observations and K clusters, where \hat{p}_{ik} is the estimated probability that observation i belongs to cluster k and z_i^{(m)} indicates to which cluster observation i belongs in clustering m. I is an indicator function. Minimization is achieved by iterating the estimation of \hat{p}_{ik} over all clusterings and the minimization of the loss function in each clustering by permuting the cluster labels. The latter is done by linear programming.
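For a single clustering with labels 1:K and fixed estimates p.hat, finding the optimal label permutation is a linear assignment problem. A hedged sketch using lpSolve::lp.assign in place of the lp.transport call used by the package (best.perm and p.hat are made-up names):

library(lpSolve)
best.perm <- function(cl, p.hat) {
  K <- ncol(p.hat)
  cost <- matrix(0, K, K)  # cost[k,l]: loss if old label k is mapped to new label l
  for (k in 1:K)
    for (l in 1:K)
      cost[k, l] <- sum(abs((cl == k) - p.hat[, l]))
  lp.assign(cost)$solution  # 0/1 permutation matrix minimizing the total loss
}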
cls | the input cls with the cluster labels of each clustering permuted to a unique labelling. |
P | an ncol(cls) x K matrix whose entry [i,k] contains the estimated probability that observation i belongs to cluster k. |
loss.val | value of the loss function. |
cl | vector of cluster memberships, assigning each observation to the cluster for which its probability in P is highest. |
The algorithm assumes that the number of clusters K is fixed. If this is not the case, K is taken to be the most common number of clusters in the sample. Clusterings with other numbers of clusters are discarded and a warning is issued.
The implementation is a variant of the algorithm of Stephens (2000), which was originally applied to draws of parameters for each observation rather than to cluster labels.
Arno Fritsch, [email protected]
Stephens, M. (2000) Dealing with label switching in mixture models. Journal of the Royal Statistical Society Series B, 62, 795–809.
lp.transport for the linear programming; maxpear, minbinder and medv for other possibilities of processing a sample of clusterings.
(cls <- rbind(c(1,1,2,2), c(1,1,2,2), c(1,2,2,2), c(2,2,1,1)))
# group 2 in clustering 4 corresponds to group 1 in clusterings 1-3
cls.relab <- relabel(cls)
cls.relab$cls
Computes the 'variation of information' distance of Meila (2007) between two clusterings/partitions of the same objects.
vi.dist(cl1, cl2, parts = FALSE, base = 2)
cl1, cl2 | vectors of cluster memberships (need to have the same length). |
parts | logical; should the two conditional entropies also be returned? |
base | base of the logarithm used for the computation of entropy and mutual information. |
The variation of information distance is the sum of the two conditional entropies of one clustering given the other. For details see Meila (2007).
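Equivalently, the distance can be computed from the joint distribution of the two label vectors as 2*H(cl1,cl2) - H(cl1) - H(cl2). A minimal sketch (vi.sketch is a made-up name, not the package's implementation):

vi.sketch <- function(cl1, cl2, base = 2) {
  p <- table(cl1, cl2) / length(cl1)            # joint label distribution
  H <- function(q) { q <- q[q > 0]; -sum(q * log(q, base = base)) }
  2 * H(p) - H(rowSums(p)) - H(colSums(p))      # = H(cl1|cl2) + H(cl2|cl1)
}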
The VI distance. If parts=TRUE, the two conditional entropies are appended.
Arno Fritsch, [email protected]
Meila, M. (2007) Comparing Clusterings - an Information Based Distance. Journal of Multivariate Analysis, 98, 873–895.
cl1 <- sample(1:3, 10, replace = TRUE)
cl2 <- c(cl1[1:5], sample(1:3, 5, replace = TRUE))
vi.dist(cl1, cl2)
vi.dist(cl1, cl2, parts = TRUE)
Cluster means are given by the 8 possible values of (±1.5, ±1.5, ±1.5), to which standard normal noise was added. True clusters are given by rep(1:8, each = 50).
data(Ysim1.5)
matrix with 400 rows and 3 columns.
Simulated by
1.5 * matrix(c(rep(c(1,1,1),50), rep(c(1,1,-1),50), rep(c(1,-1,1),50), rep(c(-1,1,1),50),
rep(c(1,-1,-1),50), rep(c(-1,1,-1),50), rep(c(-1,-1,1),50), rep(c(-1,-1,-1),50)),
byrow=TRUE, ncol=3) + matrix(rnorm(400*3), ncol=3)
Fritsch, A. and Ickstadt, K. (2009) An improved criterion for clustering based on the posterior similarity matrix, Bayesian Analysis, accepted.
Cluster means are given by the 8 possible values of (±2, ±2, ±2), to which standard normal noise was added. True clusters are given by rep(1:8, each = 50).
data(Ysim2)
matrix with 400 rows and 3 columns.
Simulated by
2 * matrix(c(rep(c(1,1,1),50), rep(c(1,1,-1),50), rep(c(1,-1,1),50), rep(c(-1,1,1),50),
rep(c(1,-1,-1),50), rep(c(-1,1,-1),50), rep(c(-1,-1,1),50), rep(c(-1,-1,-1),50)),
byrow=TRUE, ncol=3) + matrix(rnorm(400*3), ncol=3)
Fritsch, A. and Ickstadt, K. (2009) An improved criterion for clustering based on the posterior similarity matrix, Bayesian Analysis, accepted.