Title: | Tight Clustering |
---|---|
Description: | The functions needed to perform the tight clustering algorithm. |
Authors: | George C. Tseng <[email protected]>, Wing H. Wong <[email protected]> |
Maintainer: | Chi Song <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.1 |
Built: | 2024-11-03 06:35:01 UTC |
Source: | CRAN |
This package performs the tight clustering algorithm proposed by George C. Tseng and Wing H. Wong.
Package: | tightClust |
Type: | Package |
Version: | 1.0 |
Date: | 2012-08-28 |
License: | GPL (>=2) |
George C. Tseng <[email protected]>, Wing H. Wong <[email protected]>
Maintainer: Chi Song <[email protected]>
George C. Tseng and Wing H. Wong. (2005) Tight Clustering: A Resampling-based Approach for Identifying Stable and Tight Patterns in Data. Biometrics, 61:10-16.
A function to plot the heatmap of the tight cluster result.
## S3 method for class 'tight.clust'
plot(x, standardize.gene = TRUE, order.sample = FALSE, plot.noise = TRUE, ...)
x |
Return value of the |
standardize.gene |
Whether to standardize each gene vector to mean 0 and standard deviation 1. |
order.sample |
Whether to order samples (features) using hierarchical clustering. |
plot.noise |
It specifies whether to plot the remaining noise genes (objects). |
... |
Arguments to |
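As a rough illustration of what `standardize.gene` and `order.sample` describe, the base-R sketch below standardizes each row of a small made-up matrix to mean 0 and standard deviation 1, then orders the columns by hierarchical clustering. The matrix `expr` and the helper `standardize_rows` are hypothetical names introduced here. This is not the package's internal code, and the distance and linkage that `plot` actually uses may differ.

```r
# Hypothetical toy matrix: 6 genes (rows) by 5 samples (columns).
set.seed(1)
expr <- matrix(rnorm(30, mean = 5, sd = 2), nrow = 6,
               dimnames = list(paste0("gene", 1:6), paste0("s", 1:5)))

# Standardize each gene (row) to mean 0 and sd 1.
# scale() works column-wise, so transpose before and after.
standardize_rows <- function(m) t(scale(t(m)))

z <- standardize_rows(expr)

# Order samples (columns) by hierarchical clustering.
# Euclidean distance and complete linkage are assumptions made here.
sample_order <- hclust(dist(t(z)))$order
z_ordered <- z[, sample_order, drop = FALSE]

rowMeans(z)      # approximately 0 for every gene
apply(z, 1, sd)  # approximately 1 for every gene
```

Standardizing by row puts genes with different expression levels on a common scale, so the heatmap colors reflect each gene's pattern across samples and not its absolute level.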
Chi Song <[email protected]>
George C. Tseng and Wing H. Wong. (2005) Tight Clustering: A Resampling-based Approach for Identifying Stable and Tight Patterns in Data. Biometrics, 61:10-16.
Sample microarray data
data(tclust.test.data)
The data is a list of 3 items:
ID of each gene
Annotation information of each gene
Data matrix of gene expression: each row represents one gene; each column represents one sample
This function performs the tight clustering algorithm.
tight.clust(x, target, k.min, alpha = 0.1, beta = 0.6, top.can = 7, seq.num = 2, resamp.num = 10, samp.p = 0.7, nstart = 1, remain.p = 0.1, k.stop = 5, standardize.gene=TRUE, random.seed=NULL)
x |
Input data, should be |
target |
The total number of clusters that the user aims to find. |
k.min |
The starting point of k0. See 'Details' for more information. |
alpha |
Threshold on the comembership index. The default value is recommended. |
beta |
Threshold for clusters stably found across consecutive values of k0. The default value is recommended. |
top.can |
Number of top candidate clusters (ranked by size) for a specific k0. The default value is recommended. |
seq.num |
Number of subsequent k0 values that find the tight cluster. The default value is recommended. |
resamp.num |
Total number of resamplings used to obtain the comembership matrix. The default value is recommended. |
samp.p |
Proportion of the data selected in each subsample. The default value is recommended. |
nstart |
Number of different random initializations for K-means. The default value is recommended. |
remain.p |
Stop searching when the percentage of remaining points <= |
k.stop |
Stop decreasing |
standardize.gene |
Whether to standardize each gene vector to mean 0 and standard deviation 1. The default value is recommended. |
random.seed |
If |
Tight clustering is a resampling-evaluated clustering method that aims to directly identify tight clusters in a high-dimensional, complex data set while allowing a set of scattered objects to remain unclustered. The method was originally developed for gene cluster analysis in microarray data but can be applied to any complex data. The most important parameter is k.min. A large k.min results in smaller and tighter clusters. Normally k.min >= target + 5 is suggested. The other parameters have little effect on the quality of the final clustering results, and it is suggested that they be left unchanged.
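To make the resampling idea concrete, here is a simplified base-R sketch of how a comembership matrix can be estimated: repeatedly cluster a random subsample with k-means, assign every object to its nearest center, and average how often each pair of objects lands in the same cluster. The function `comembership` and the toy data are hypothetical, and the argument names only mirror `resamp.num` and `samp.p` for readability. This is not the package's implementation, and how `tight.clust` uses `alpha` internally is not shown in this manual.

```r
# Simplified sketch of a resampling-based comembership matrix.
# Not the tightClust implementation; all names here are hypothetical.
# Assumes k >= 2 and that x is a numeric matrix with objects in rows.
comembership <- function(x, k, resamp.num = 10, samp.p = 0.7) {
  n <- nrow(x)
  D <- matrix(0, n, n)
  for (b in seq_len(resamp.num)) {
    # Cluster a random subsample of the rows with k-means.
    idx <- sample(n, size = floor(samp.p * n))
    centers <- kmeans(x[idx, , drop = FALSE], centers = k)$centers
    # Squared distance from every row (sampled or not) to each center.
    d2 <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    # Label each row by its nearest center.
    label <- max.col(-d2, ties.method = "first")
    # Add 1 for every pair of rows that share a label.
    D <- D + outer(label, label, "==")
  }
  D / resamp.num
}

# Toy data: two well-separated groups of 10 objects each.
set.seed(42)
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 4),
           matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 4))
D <- comembership(x, k = 2)

# Objects that cluster with object 1 in at least a fraction 1 - alpha
# of the resamplings (an illustrative cutoff, assumed here).
alpha <- 0.1
which(D[1, ] >= 1 - alpha)
```

Entries of `D` near 1 mark pairs that are grouped together in almost every resampling, while pairs whose values vary are not consistently clustered, in line with the scattered objects described above.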
The returned value is a "tight.clust" object (a list). The first element is the original data matrix. The second element is a vector giving the cluster identity of each object (-1: scattered gene set; 1: the first cluster; 2: the second cluster; ...). The third element is a vector of the size of each tight cluster.
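The label encoding described above can be summarized with a few lines of base R. The sketch below uses a made-up label vector named `cluster` in the same format. With a real result, the same code could presumably be applied to the second element of the returned list (for example `res[[2]]`, where `res` is a hypothetical variable holding the result), since this manual does not state the element names.

```r
# Hypothetical label vector in the format described above:
# -1 marks scattered genes, positive integers mark tight clusters.
cluster <- c(1, 1, -1, 2, 2, 2, -1, 1, -1, -1)

# Size of each tight cluster, excluding the scattered set.
sizes <- table(cluster[cluster > 0])

# Row indices of the members of each tight cluster.
clustered <- which(cluster > 0)
members <- split(clustered, cluster[clustered])

# Number of scattered (unclustered) genes.
n_scattered <- sum(cluster == -1)
```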
Chi Song <[email protected]>
George C. Tseng and Wing H. Wong. (2005) Tight Clustering: A Resampling-based Approach for Identifying Stable and Tight Patterns in Data. Biometrics, 61:10-16.
## load the test dataset
data(tclust.test.data)

## find 10 tight clusters
ptm <- proc.time()
## k.min=25, tighter clusters will be found
## target=1 is used to save time, target=10 is recommended
tclust1 <- tight.clust(tclust.test.data$Data, target=1, k.min=25, random.seed=12345)
proc.time() - ptm
## plot the heat map of cluster result
plot(tclust1)
## write the cluster result
write.tight.clust(tclust1)

ptm <- proc.time()
## k.min=10, looser clusters will be found
## target=1 is used to save time, target=5 is recommended
tclust2 <- tight.clust(tclust.test.data$Data, target=1, k.min=10, random.seed=12345)
proc.time() - ptm
## plot the heat map of cluster result
plot(tclust2)
## write the cluster result
write.tight.clust(tclust2)
A function to print the tight cluster result to a file or connection.
write.tight.clust(x, ...)
x |
Return value of the |
... |
Arguments to |
Chi Song <[email protected]>
George C. Tseng and Wing H. Wong. (2005) Tight Clustering: A Resampling-based Approach for Identifying Stable and Tight Patterns in Data. Biometrics, 61:10-16.