Title: | Probabilistic and Possibilistic Cluster Analysis |
---|---|
Description: | Partitioning clustering divides the objects in a data set into non-overlapping subsets or clusters by using the prototype-based probabilistic and possibilistic clustering algorithms. This package covers a set of the functions for Fuzzy C-Means (Bezdek, 1974) <doi:10.1080/01969727308546047>, Possibilistic C-Means (Krishnapuram & Keller, 1993) <doi:10.1109/91.227387>, Possibilistic Fuzzy C-Means (Pal et al, 2005) <doi:10.1109/TFUZZ.2004.840099>, Possibilistic Clustering Algorithm (Yang et al, 2006) <doi:10.1016/j.patcog.2005.07.005>, Possibilistic C-Means with Repulsion (Wachs et al, 2006) <doi:10.1007/3-540-31662-0_6> and the other variants of hard and soft clustering algorithms. The cluster prototypes and membership matrices required by these partitioning algorithms are initialized with different initialization techniques that are available in the package 'inaparc'. As the distance metrics, not only the Euclidean distance but also a set of the commonly used distance metrics are available to use with some of the algorithms in the package. |
Authors: | Zeynel Cebeci [aut, cre], Figen Yildiz [aut], Alper Tuna Kavlak [aut], Cagatay Cebeci [aut], Hasan Onder [aut] |
Maintainer: | Zeynel Cebeci <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.1.0.1 |
Built: | 2024-12-08 07:16:04 UTC |
Source: | CRAN |
Partitioning clustering simply divides the objects in a data set into non-overlapping subsets or clusters by using the prototype-based probabilistic and possibilistic clustering algorithms. The package covers various functions for K-Means (MacQueen, 1967), Fuzzy C-Means (Bezdek, 1974), Possibilistic C-Means (Krishnapuram & Keller, 1993;1996), Possibilistic Fuzzy C-Means (Pal et al, 2005), Possibilistic Clustering Algorithm (Yang et al, 2006), Possibilistic C-Means with Repulsion (Wachs et al, 2006), Unsupervised Possibilistic Fuzzy C-Means (Wu et al, 2010) and other variant algorithms which produce hard, fuzzy and possibilistic partitions of numeric data sets. The cluster prototypes and membership matrices required by the partitioning algorithms can be initialized with many initialization techniques that are available in the package ‘inaparc’. In addition to the Euclidean distance, a set of commonly used distance metrics is available for use with some of the algorithms in the package.
The goal of prototype-based algorithms, the most widely used group of partitioning clustering algorithms, is to partition a data set of n objects with p features into k, a pre-defined number of clusters, which are non-hierarchical subsets of the data. In the clustering context, a prototype is a data item that represents or characterizes a cluster. Usually, a prototype can be regarded as the most central point in a data subspace (Tan et al, 2006).
Among the partitioning-based algorithms, the hard clustering algorithms, e.g. K-means, assume that each data point belongs to exactly one cluster; however, in practice clusters may overlap and data points may belong to more than one cluster. In this case the membership degrees of a data point to the clusters should be values between zero and one. This idea has been modelled with the fuzzy clustering algorithms. Fuzzy C-Means (FCM), proposed by Bezdek (1981), is the best-known partitioning-based fuzzy algorithm. It assigns a fuzzy membership degree to each data point based on its distances to the cluster centers. If a data point is closer to a cluster center, its membership to this cluster is higher than its memberships to the other clusters.
As an extension of the basic FCM algorithm, the Gustafson-Kessel (GK) clustering algorithm employs an adaptive distance norm in order to detect clusters with different geometrical shapes (Babuska, 2001; Correa et al, 2011). Gath and Geva (1989) revealed that the fuzzy maximum likelihood estimates (FMLE) clustering algorithm can be used to detect clusters of varying shapes, sizes and densities.
In the related literature it has been shown that FCM is sensitive to noise and outliers in data sets. Krishnapuram and Keller proposed and later improved the Possibilistic C-Means (PCM) algorithm in order to avoid the effect of outliers or noise on the clustering performance (Krishnapuram & Keller, 1993;1996). Although PCM solves the problems due to noise and outliers by relaxing the probabilistic constraint of FCM, it has the disadvantage of generating coincident clusters with poor initializations. In order to overcome this problem, several variants of FCM and PCM have been proposed. Fuzzy Possibilistic C-Means (FPCM) (Pal et al, 1997) is one of the mixed algorithms for simultaneously constructing memberships and typicalities. However, FPCM has some problems because of the row sum constraint on the typicality values, which produces unrealistic typicality values for large data sets. By adding a repulsion term forcing clusters away from each other, an extension of PCM with repulsion was introduced by Timm et al (2004). Pal et al (2005) proposed Possibilistic Fuzzy C-Means (PFCM) to overcome the noise sensitivity defect of FCM, to eliminate the coincident clusters problem of PCM, and to eliminate the row sum constraints of FPCM.
The Possibilistic Clustering Algorithm (PCA) proposed by Yang and Wu (2006) was at the forefront of another direction of algorithms improving FCM and PCM. The authors argued that the resulting membership of their proposed algorithm becomes an exponential function, so that it is robust to noise and outliers. More recently, Wu et al (2010) introduced the Unsupervised Possibilistic Fuzzy Clustering (UPFC) algorithm. UPFC is an extension of PCA combining FCM and PCA in order to overcome the noise sensitivity problem of FCM and the coincident clusters problem of PCM. When compared to the previous algorithms, UPFC seems to be a promising algorithm for clustering in fuzzy and possibilistic environments since it does not need a probabilistic initialization.
Given a data set X describing n data objects, the probabilistic and possibilistic clustering algorithms partition the data into k, a predefined number of clusters, through the minimization of their related objective functions with some probabilistic or possibilistic constraints.

\mathbf{X} = \{x_1, x_2, \dots, x_n\} \subseteq \mathbb{R}^p is the data set for n objects in the p-dimensional data space \mathbb{R}^p, where:

n is the number of objects in the data set,
p is the number of features or variables which describe the data objects,
x_i is the p-length data vector for the ith data object.
In the clustering context, clusters are mostly represented by their prototypes. The prototypes are generally the centers of the clusters, which can be either centroids or medoids. The probabilistic and possibilistic partitioning clustering algorithms start with the initialization of a cluster prototype matrix V, and update it through the iteration steps until it is stabilized.

\mathbf{V} = \{v_1, v_2, \dots, v_k\} \subseteq \mathbb{R}^p is the prototype matrix of the clusters, where:

k is the number of clusters,
v_j is the p-length prototype vector for the jth cluster.
The clustering algorithms compute the membership degrees of the data objects by using some distance metrics for the calculation of their proximities to the cluster centers.

d(x_i, v_j) is the distance measure between the data object x_i and the cluster prototype v_j. In general, the squared Euclidean distance metric is used in most applications:

d^2(x_i, v_j) = \|x_i - v_j\|^2 = (x_i - v_j)^T (x_i - v_j)

The clustering algorithms are usually run with the standard and squared Euclidean distance norms, which induce hyperspherical clusters. Therefore they are only able to find clusters with the same shape and orientation because the norm-inducing matrix is an identity matrix, \mathbf{A} = \mathbf{I}. On the other hand, the distance metrics can be employed with a p \times p diagonal norm-inducing matrix \mathbf{A}, which modifies the distances depending on the direction in which the distance is measured (Timm et al, 2004; Balasko et al, 2005). In this case, the squared Euclidean distance with the norm matrix \mathbf{A} becomes:

d^2_{A}(x_i, v_j) = \|x_i - v_j\|^2_{A} = (x_i - v_j)^T \mathbf{A} (x_i - v_j)

\mathbf{A} can also be formed as the inverse of the p \times p covariance matrix \mathbf{F}:

\mathbf{A} = \mathbf{F}^{-1}, where:

\mathbf{F} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T,

where \bar{x} stands for the sample mean of the data. When the distance is induced with the norm matrix \mathbf{A} = \mathbf{F}^{-1}, the Mahalanobis norm on \mathbb{R}^p can be written as follows:

d^2_{A}(x_i, v_j) = (x_i - v_j)^T \mathbf{F}^{-1} (x_i - v_j)
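For illustration, the following minimal base-R sketch (written outside the package, with the sample mean standing in for a cluster prototype) contrasts the squared Euclidean distance with the Mahalanobis distance induced by the inverse covariance matrix:

# A minimal sketch (not part of the package API) contrasting the squared
# Euclidean distance with the Mahalanobis distance for the iris data.
# The prototype 'v' is taken as the sample mean purely for illustration.
data(iris)
x <- as.matrix(iris[, 1:4])
v <- colMeans(x)
# squared Euclidean distance: (x_i - v)' (x_i - v)
d2.euc <- rowSums((x - matrix(v, nrow(x), ncol(x), byrow=TRUE))^2)
# squared Mahalanobis distance: (x_i - v)' F^-1 (x_i - v), F being the covariance matrix
d2.mah <- mahalanobis(x, center=v, cov=cov(x))
head(cbind(d2.euc, d2.mah))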
The membership degrees are the measures specifying the amount of belongingness of the data objects to the different clusters. A data point nearer to the center of a cluster has a higher degree of membership to this cluster.

\mathbf{U} = [u_{ij}]_{n \times k} is the matrix for a hard, fuzzy or possibilistic partition of \mathbf{X}, where:

u_{ij} is the membership degree of x_i to the jth cluster.
Zeynel Cebeci, Figen Yildiz, A. Tuna Kavlak, Cagatay Cebeci & Hasan Onder
Babuska, R. (2001). Fuzzy and neural control. DISC Course Lecture Notes. Delft University of Technology. Delft, the Netherlands. <https://tr.scribd.com/document/209211977/Fuzzy-and-Neural-Control>.
Balasko, B., Abonyi, J. & Feil, B. (2005). Fuzzy clustering and data analysis toolbox. Department of Process Eng., Univ. of Veszprem, Veszprem.
Bezdek, J.C. (1981). Pattern recognition with fuzzy objective function algorithms. Plenum, NY. <ISBN:0306406713>
Cebeci, Z. & Yildiz, F. (2015). Bulanik C-Ortalamalar algoritmasinin farkli kume buyuklukleri icin hesaplama performansi ve kumeleme gecerliliginin karsilastirilmasi [Comparison of computational performance and clustering validity of the Fuzzy C-Means algorithm for different cluster sizes], In Proc. of 9. Ulusal Zootekni Bilim Kongresi, Sep. 2015, Konya. pp. 227-239. doi:10.13140/RG.2.1.2909.9288
Cebeci, Z., Kavlak, A.T. & Yildiz, F. (2017). Validation of fuzzy and possibilistic clustering results, in Proc. of 2017 Int. Artificial Intelligence & Data Processing Symposium, IEEE. pp. 1-7. doi:10.1109/IDAP.2017.8090183
Cebeci, Z. (2018). Initialization of membership degree matrix for fast convergence of Fuzzy C-Means clustering, in Proc. of 2018 Int.Conf.on Artificial Intelligence & Data Processing, pp. 1-5. IEEE. doi:10.1109/IDAP.2018.8620920
Cebeci, Z. & Cebeci, C. (2018) kpeaks: an R package for quick selection of k for cluster analysis, in Proc. of 2018 Int. Conf. on Artificial Intelligence & Data Processing, pp. 1-7, IEEE. doi:10.1109/IDAP.2018.8620896
Cebeci, Z. (2019). Comparison of internal validity indices for fuzzy clustering. Journal of Agricultural Informatics, 10(2):1-14. doi:10.17700/jai.2019.10.2.537
Correa, C., Valero, C., Barreiro, P., Diago, M. P., & Tardáguila, J. (2011). A comparison of fuzzy clustering algorithms applied to feature extraction on vineyard. In Proc. of the 14th Conf. of the Spanish Assoc. for Artificial Intelligence. <http://oa.upm.es/9246/>.
Gath, I. & Geva, A.B. (1989). Unsupervised optimal fuzzy clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11 (7): 773-781. <doi:10.1109/34.192473>
Gustafson, D. E. & Kessel, W. C. (1979). Fuzzy clustering with a fuzzy covariance matrix. In Proc. of IEEE Conf. on Decision and Control including the 17th Symposium on Adaptive Processes, San Diego. pp. 761-766. <doi:10.1109/CDC.1978.268028>
Pal, N.R., Pal, K., & Bezdek, J.C. (1997). A mixed c-means clustering model. In Proc. of the 6th IEEE Int. Conf. on Fuzzy Systems, 1, pp. 11-21. <doi:10.1109/FUZZY.1997.616338>
Pal, N.R., Pal, S.K., Keller,J.M. & Bezdek, J.C. (2005). A possibilistic fuzzy c-means clustering algorithm. IEEE Transactions on Fuzzy Systems, 13 (4): 517-530. <doi: 10.1109/TFUZZ.2004.840099>
Tan, P. N., Steinbach, M., & Kumar, V. (2006). Cluster analysis: Basic concepts and algorithms. In Introduction to Data Mining. Pearson Addison Wesley. <http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf>
Timm, H., Borgelt, C., Doring, C. & Kruse, R. (2004). An extension to possibilistic fuzzy cluster analysis. Fuzzy Sets and Systems, 147 (1): 3-16. <doi:10.1016/j.fss.2003.11.009>
Yang, M. S. & Wu, K. L. (2006). Unsupervised possibilistic clustering. Pattern Recognition, 39(1): 5-21. <doi:10.1016/j.patcog.2005.07.005>
Wu, X., Wu, B., Sun, J. & Fu, H. (2010). Unsupervised possibilistic fuzzy clustering. J. of Information & Computational Sci., 7 (5): 1075-1080.
as.ppclust, comp.omega, fcm, fcm2, fpcm, fpppcm, gg, gk, gkpfcm, hcm, is.ppclust, pca, ppclust2, pcm, pcmr, pfcm, summary.ppclust, upfc
Converts an object of the classes fanny.object, summary.fclust, kmeans or vegclust to ‘ppclust’ class.
as.ppclust(objx, ...)
objx |
an object to be converted to an instance of ‘ppclust’ class. |
... |
additional arguments. |
an object of ‘ppclust’ class.
Zeynel Cebeci
is.ppclust, ppclust2, summary.ppclust
data(iris)
# Create an fclust object
ofc <- fclust::FKM(X=iris[,1:4], k=3)
# Test the class of object 'ofc'
class(ofc)
# Convert 'ofc' to a ppclust object
opc <- as.ppclust(ofc)
# Test the class of 'opc' object
class(opc)
Computes the vector of Ω (omega) values, the possibilistic penalty argument for Possibilistic C-Means clustering analysis.
comp.omega(d, u, m=2, pco=NULL, K=1)
d |
a numeric matrix containing the distances of objects to the centers of clusters. |
u |
a numeric matrix containing the fuzzy or possibilistic membership degrees of the data objects. |
pco |
an object of ‘ppclust’ class that contains the results from a clustering algorithm. If it is supplied, there is no need to input the arguments d and u. |
m |
a number for the fuzzy exponent. The default is 2. |
K |
a number greater than 0 to be used as the weight of penalty term. The default is 1. |
The vector Ω is called the possibilistic penalty term that controls the variance of the clusters (Gosztolya & Szilagyi, 2015). In general, it is computed by using the fuzzy intra-cluster distances from a previous run of FCM. This vector of mobilization scale parameters, related to the spread of the clusters, contains the distance values at which the membership degree becomes 0.5 for each cluster (Timm et al, 2004; Wachs et al, 2006; Alamelumangai, 2014).

\Omega_j = K \, \frac{\sum_{i=1}^{n} u_{ij}^m \, d^2(x_i, v_j)}{\sum_{i=1}^{n} u_{ij}^m}

Where:

K is a coefficient, K > 0. It is generally chosen to be 1.
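As an informal illustration of the formula above, the mobilization scale values can be approximated directly from the outputs of a previous FCM run. This is only a sketch, not the internal code of comp.omega, and it assumes that the distance matrix d of the FCM result holds squared Euclidean distances (the default metric of fcm).

library(ppclust)
data(iris)
res.fcm <- fcm(x=iris[,-5], centers=3)
m <- 2   # fuzzy exponent
K <- 1   # weight of the penalty term
# Omega_j = K * sum_i(u_ij^m * d_ij^2) / sum_i(u_ij^m), using the distances
# stored in res.fcm$d (assumed squared, since sqeuclidean is the default metric)
omg <- K * colSums(res.fcm$u^m * res.fcm$d) / colSums(res.fcm$u^m)
omg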
omega |
a numeric vector containing the mobilization scale values for each cluster. |
Zeynel Cebeci
Alamelumangai N. (2014). Computer aided segmentation of mammary carcinoma on ultrasound images using soft computing techniques. PhD Thesis, Anna Univ., IN. <http://hdl.handle.net/10603/50590>
Wachs, J., Shapira, O., & Stern, H. (2006). A method to enhance the ‘Possibilistic C-Means with Repulsion’ algorithm based on cluster validity index. In Applied Soft Computing Technologies: The Challenge of Complexity, pp. 77-87. Springer, Berlin, Heidelberg. <doi: 10.1007/3-540-31662-0_6>
Gosztolya, G. & Szilagyi, L. (2015). Application of fuzzy and possibilistic c-means clustering models in blind speaker clustering. Acta Polytechnica Hungarica, 12(7):41-56. <http://publicatio.bibl.u-szeged.hu/6151/1/2015-acta-polytechnica.pdf>
data(iris)
x <- iris[,-5]
# Run FCM
res.fcm <- fcm(x=x, centers=3)
# Compute the mobilization scale values using the results from FCM
vomg1 <- comp.omega(pco=res.fcm)
vomg1
# Compute the mobilization scale values using the distances and memberships from FCM
vomg2 <- comp.omega(res.fcm$d, res.fcm$u, m=3)
vomg2
# Compute the mobilization scale values with the K value of 10
vomg3 <- comp.omega(res.fcm$d, res.fcm$u, m=2, K=10)
vomg3
Crisps the fuzzy and possibilistic membership degrees from the fuzzy or possibilistic clustering algorithms.
crisp(u, method, tv)
u |
a numeric matrix containing the fuzzy or possibilistic membership degrees of the data objects. |
method |
a string for selection of the crisping method. The default is max, which assigns each data object to the cluster in which the object has its maximum membership. The alternative is threshold, which assigns an object to a cluster if its maximum membership degree is greater than the threshold value tv. |
tv |
a number for the threshold membership degree. The default is 0.5 with the threshold method. |
The function crisp produces the crisp or hard membership degrees of the objects in order to place them into only one cluster.
cluster |
a numeric vector containing the indexes (labels) of clusters for the maximum membership of the objects. |
Zeynel Cebeci
data(iris)
x <- iris[,1:4]
# Run FCM
res.fcm <- fcm(x, centers=3)
# Crisp the fuzzy membership degrees and plot the crisp memberships
cllabels <- crisp(res.fcm$u)
plot(x, col=cllabels)
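The threshold method can be used in the same way. In the following sketch, objects are assigned to a cluster only if their maximum membership degree exceeds an assumed threshold of 0.6; how objects below the threshold are labelled is left to the package.

data(iris)
x <- iris[,1:4]
# Run FCM
res.fcm <- fcm(x, centers=3)
# Crisp the fuzzy memberships with the threshold method (tv=0.6 chosen for illustration)
cllabels2 <- crisp(res.fcm$u, method="threshold", tv=0.6)
table(cllabels2)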
The function ekm partitions a numeric data set by using the K-means clustering algorithm. It is a wrapper function of the standard kmeans function with more initialization (seeding) techniques and output values obtained in the multiple starts of the algorithm.
ekm(x, centers, dmetric="euclidean", alginitv="hartiganwong", nstart=1, iter.max=1000, stand=FALSE, numseed)
x |
a numeric vector, data frame or matrix. |
centers |
an integer specifying the number of clusters or a numeric matrix containing the initial cluster centers. |
dmetric |
a string for the distance calculation method. The default is euclidean for the Euclidean distances. |
alginitv |
a string for the initialization of cluster prototypes matrix. The default is hartiganwong for the Hartigan-Wong seeding method. See the package ‘inaparc’ for the alternative options. |
nstart |
an integer for the number of starts for clustering. The default is 1. |
iter.max |
an integer for the maximum number of iterations allowed. The default is 1000. |
stand |
a logical flag to standardize data. Its default value is FALSE. |
numseed |
a seeding number to set the seed of R's random number generator. |
The K-Means (KM) clustering algorithm partitions a data set of n objects into k, a pre-defined number of clusters. It is an iterative relocation algorithm, and in each iteration step the objects are assigned to the nearest cluster by using the Euclidean distances. The objective of KM is to minimize the total intra-cluster variance (or the sum of squared errors):

J_{KM}(\mathbf{X}; \mathbf{V}) = \sum_{j=1}^{k} \sum_{x_i \in c_j} d^2(x_i, v_j)

In the above equation for J_{KM}, d(x_i, v_j) is the distance measure between the object x_i and the cluster prototype v_j. The Euclidean distance metric is usually employed with the implementations of K-means. See kmeans and ppclust-package for more details about the terms of the objective function J_{KM}.

The update equation of the cluster prototypes:

v_j = \frac{1}{n_j} \sum_{x_i \in c_j} x_i

where n_j is the number of objects in the jth cluster c_j.
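The following base-R sketch, written independently of ekm purely for illustration, shows a single assignment-and-update step of the algorithm described above:

data(iris)
x <- as.matrix(iris[, 1:4])
k <- 3
set.seed(1)
v <- x[sample(nrow(x), k), , drop=FALSE]   # initial prototypes
# Assignment step: label each object with the index of its nearest prototype
d2 <- sapply(1:k, function(j)
  rowSums((x - matrix(v[j, ], nrow(x), ncol(x), byrow=TRUE))^2))
labels <- max.col(-d2)
# Update step: recompute each prototype as the mean of its current members
v <- t(sapply(1:k, function(j) colMeans(x[labels == j, , drop=FALSE])))
v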
an object of class ‘ppclust’, which is a list consisting of the following items:
x |
a numeric matrix containing the processed data set. |
v |
a numeric matrix containing the final cluster prototypes (centers of clusters). |
u |
a numeric matrix containing the crisp membership degrees of the data objects. |
d |
a numeric matrix containing the distances of objects to the final cluster prototypes. |
k |
an integer for the number of clusters. |
cluster |
a numeric vector containing the cluster labels found by defuzzying the fuzzy membership degrees of the objects. |
csize |
a numeric vector containing the number of objects in the clusters. |
iter |
an integer vector for the number of iterations in each start of the algorithm. |
best.start |
an integer for the index of start with the minimum objective functional. |
func.val |
a numeric vector for the objective function values in each start of the algorithm. |
comp.time |
a numeric vector for the execution time in each start of the algorithm. |
stand |
a logical value, |
wss |
a number for the within-cluster sum of squares for each cluster. |
bwss |
a number for the between-cluster sum of squares. |
tss |
a number for the total within-cluster sum of squares. |
twss |
a number for the total sum of squares. |
algorithm |
a string for the name of partitioning algorithm. It is ‘KM’ with this function. |
call |
a string for the matched function call generating this ‘ppclust’ object. |
Zeynel Cebeci, Figen Yildiz & Hasan Onder
MacQueen, J.B. (1967). Some methods for classification and analysis of multivariate observations. In Proc. of 5th Berkeley Symp. on Mathematical Statistics and Probability, Berkeley, University of California Press, 1: 281-297. <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.308.8619&rep=rep1&type=pdf>
kmeans, fcm, fcm2, fpcm, fpppcm, gg, gk, gkpfcm, hcm, pca, pcm, pcmr, pfcm, upfc
# Load dataset iris
data(iris)
x <- iris[,-5]
# Run EKM for 3 clusters
res.ekm <- ekm(x, centers=3)
# Print and plot the clustering result
print(res.ekm$cluster)
plot(x, col=res.ekm$cluster, pch=16)
Partitions a numeric data set by using the Fuzzy C-Means (FCM) clustering algorithm (Bezdek, 1974;1981).
fcm(x, centers, memberships, m=2, dmetric="sqeuclidean", pw = 2, alginitv="kmpp", alginitu="imembrand", nstart=1, iter.max=1000, con.val=1e-09, fixcent=FALSE, fixmemb=FALSE, stand=FALSE, numseed)
x |
a numeric vector, data frame or matrix. |
centers |
an integer specifying the number of clusters or a numeric matrix containing the initial cluster centers. |
memberships |
a numeric matrix containing the initial membership degrees. If missing, it is internally generated. |
m |
a number greater than 1 to be used as the fuzziness exponent or fuzzifier. The default is 2. |
dmetric |
a string for the distance metric. The default is sqeuclidean for the squared Euclidean distances. See get.dmetrics for the alternative options. |
pw |
a number for the power of Minkowski distance calculation. The default is 2 if the dmetric is minkowski. |
alginitv |
a string for the initialization of cluster prototypes matrix. The default is kmpp for K-means++ initialization method (Arthur & Vassilvitskii, 2007). For the list of alternative options see the package ‘inaparc’. |
alginitu |
a string for the initialization of memberships degrees matrix. The default is imembrand for random sampling of initial membership degrees. |
nstart |
an integer for the number of starts for clustering. The default is 1. |
iter.max |
an integer for the maximum number of iterations allowed. The default is 1000. |
con.val |
a number for the convergence value between the iterations. The default is 1e-09. |
fixcent |
a logical flag to make the initial cluster centers not changed along the different starts of the algorithm. The default is FALSE. |
fixmemb |
a logical flag to make the initial membership degrees not changed along the different starts of the algorithm. The default is FALSE. |
stand |
a logical flag to standardize data. Its default value is FALSE. |
numseed |
an optional seeding number to set the seed of R's random number generator. |
The Fuzzy C-Means (FCM) clustering algorithm was first studied by Dunn (1973) and generalized by Bezdek in 1974 (Bezdek, 1981). Unlike the K-means algorithm, each data object is not the member of only one cluster but is a member of all clusters with varying degrees of membership between 0 and 1. FCM is an iterative clustering algorithm that partitions the data set into a predefined number of k clusters by minimizing the weighted within-group sum of squared errors. The objective function of FCM is:

J_{FCM}(\mathbf{X}; \mathbf{V}, \mathbf{U}) = \sum_{i=1}^{n} \sum_{j=1}^{k} u_{ij}^m \, d^2(x_i, v_j)

In the objective function, m is the fuzzifier to specify the amount of 'fuzziness' of the clustering result; 1 \leq m \leq \infty. It is usually chosen as 2. The higher values of m result in fuzzier clusters while the lower values give harder clusters. If it is 1, FCM becomes a hard algorithm and produces the same results as K-means.

FCM must satisfy the following constraints:

u_{ij} \in [0,1]; \quad \sum_{j=1}^{k} u_{ij} = 1 \;\; \forall i; \quad 0 < \sum_{i=1}^{n} u_{ij} < n \;\; \forall j

The objective function of FCM is minimized by using the following update equations:

u_{ij} = \left( \sum_{l=1}^{k} \left( \frac{d^2(x_i, v_j)}{d^2(x_i, v_l)} \right)^{1/(m-1)} \right)^{-1}

v_j = \frac{\sum_{i=1}^{n} u_{ij}^m \, x_i}{\sum_{i=1}^{n} u_{ij}^m}
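A compact base-R sketch of these update equations (a single iteration, written independently of the internal implementation of fcm) may make the computation concrete:

data(iris)
x <- as.matrix(iris[, 1:4]); n <- nrow(x); k <- 3; m <- 2
set.seed(1)
v <- x[sample(n, k), , drop=FALSE]     # initial prototypes
# squared Euclidean distances of all objects to all prototypes (n x k)
d2 <- sapply(1:k, function(j)
  rowSums((x - matrix(v[j, ], n, ncol(x), byrow=TRUE))^2))
d2 <- pmax(d2, .Machine$double.eps)    # guard against zero distances
# membership update: u_ij = 1 / sum_l (d_ij^2 / d_il^2)^(1/(m-1))
u <- 1 / sapply(1:k, function(j) rowSums((d2[, j] / d2)^(1/(m - 1))))
# prototype update: v_j = sum_i u_ij^m x_i / sum_i u_ij^m
v <- t(sapply(1:k, function(j) colSums(u[, j]^m * x) / sum(u[, j]^m)))
rowSums(u)[1:5]   # each row sums to 1 (the probabilistic constraint)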
an object of class ‘ppclust’, which is a list consisting of the following items:
x |
a numeric matrix containing the processed data set. |
v |
a numeric matrix containing the final cluster prototypes (centers of clusters). |
u |
a numeric matrix containing the fuzzy memberships degrees of the data objects. |
d |
a numeric matrix containing the distances of objects to the final cluster prototypes. |
k |
an integer for the number of clusters. |
m |
a number for the fuzzifier. |
cluster |
a numeric vector containing the cluster labels found by defuzzying the fuzzy membership degrees of the objects. |
csize |
a numeric vector containing the number of objects in the clusters. |
iter |
an integer vector for the number of iterations in each start of the algorithm. |
best.start |
an integer for the index of start that produced the minimum objective functional. |
func.val |
a numeric vector for the objective function values in each start of the algorithm. |
comp.time |
a numeric vector for the execution time in each start of the algorithm. |
stand |
a logical value, |
wss |
a number for the within-cluster sum of squares for each cluster. |
bwss |
a number for the between-cluster sum of squares. |
tss |
a number for the total within-cluster sum of squares. |
twss |
a number for the total sum of squares. |
algorithm |
a string for the name of partitioning algorithm. It is ‘FCM’ with this function. |
call |
a string for the matched function call generating this ‘ppclust’ object. |
Zeynel Cebeci, Figen Yildiz & Alper Tuna Kavlak
Arthur, D. & Vassilvitskii, S. (2007). K-means++: The advantages of careful seeding, in Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027-1035. <http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf>
Dunn, J.C. (1973). A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. J. Cybernetics, 3(3):32-57. <doi:10.1080/01969727308546046>
Bezdek, J.C. (1974). Cluster validity with fuzzy sets. J. Cybernetics, 3: 58-73. <doi:10.1080/01969727308546047>
Bezdek J.C. (1981). Pattern recognition with fuzzy objective function algorithms. Plenum, NY. <ISBN:0306406713>
ekm, fcm2, fpcm, fpppcm, gg, gk, gkpfcm, hcm, pca, pcm, pcmr, pfcm, upfc
# Load dataset iris
data(iris)
x <- iris[,-5]
# Initialize the prototype matrix using the K-means++ algorithm
v <- inaparc::kmpp(x, k=3)$v
# Initialize the membership degrees matrix
u <- inaparc::imembrand(nrow(x), k=3)$u
# Run FCM with the initial prototypes and memberships
fcm.res <- fcm(x, centers=v, memberships=u, m=2)
# Show the fuzzy membership degrees for the top 5 objects
head(fcm.res$u, 5)
Partitions a numeric data set by using the Type-2 Fuzzy C-Means (FCM2) clustering algorithm (Rhee & Hwang, 2001). It has been reported that it is effective for spherical clusters in the data sets, but it fails when the data sets contain non-spherical and complex structures (Gosain & Dahiya, 2016).
fcm2(x, centers, memberships, m=2, dmetric="sqeuclidean", pw = 2, alginitv="kmpp", alginitu="imembrand", nstart=1, iter.max=1000, con.val=1e-09, fixcent=FALSE, fixmemb=FALSE, stand=FALSE, numseed)
x |
a numeric vector, data frame or matrix. |
centers |
an integer specifying the number of clusters or a numeric matrix containing the initial cluster centers. |
memberships |
a numeric matrix containing the initial membership degrees. If missing, it is internally generated. |
m |
a number greater than 1 to be used as the fuzziness exponent or fuzzifier. The default is 2. |
dmetric |
a string for the distance metric. The default is sqeuclidean for the squared Euclidean distances. See get.dmetrics for the alternative options. |
pw |
a number for the power of Minkowski distance calculation. The default is 2 if the dmetric is minkowski. |
alginitv |
a string for the initialization of cluster prototypes matrix. The default is kmpp for K-means++ initialization method (Arthur & Vassilvitskii, 2007). For the list of alternative options, see the package ‘inaparc’. |
alginitu |
a string for the initialization of memberships degrees matrix. The default is imembrand for random sampling of initial membership degrees. |
nstart |
an integer for the number of starts for clustering. The default is 1. |
iter.max |
an integer for the maximum number of iterations allowed. The default is 1000. |
con.val |
a number for the convergence value between the iterations. The default is 1e-09. |
fixcent |
a logical flag to make the initial cluster centers not changed along the different starts of the algorithm. The default is FALSE. |
fixmemb |
a logical flag to make the initial membership degrees not changed along the different starts of the algorithm. The default is FALSE. |
stand |
a logical flag to standardize data. Its default value is FALSE. |
numseed |
an optional seeding number to set the seed of R's random number generator. |
In the Type-2 Fuzzy C-Means (T2FCM) clustering algorithm proposed by Rhee & Hwang (2001), the idea is that all data points should not have the same contribution in computing the cluster prototypes; instead, the data points with higher membership values should contribute much more to the cluster prototypes (Gosain & Dahiya, 2016). Based on this idea, a type-2 membership value is computed by using the following equation:

u_{ij}^{\ast} = u_{ij} - \frac{1 - u_{ij}}{2}

In the above equation, u_{ij} and u_{ij}^{\ast} respectively stand for the Type-1 (standard) and Type-2 membership values. u_{ij}^{\ast} is only used to update the cluster prototypes. The objective function of FCM2 and the other terms are the same as those of FCM except the following update equation for the cluster prototypes:

v_j = \frac{\sum_{i=1}^{n} (u_{ij}^{\ast})^m \, x_i}{\sum_{i=1}^{n} (u_{ij}^{\ast})^m}
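As a rough sketch of this transformation (not the package's internal code), the type-2 memberships and the resulting prototypes can be derived from a standard FCM run; memberships below 1/3 turn negative under the formula and are clipped at zero here purely for illustration:

library(ppclust)
data(iris)
x <- as.matrix(iris[, 1:4]); m <- 2
res <- fcm(x, centers=3)
u  <- res$u                  # type-1 (standard) membership degrees
u2 <- u - (1 - u) / 2        # type-2 membership degrees (Rhee & Hwang, 2001)
u2 <- pmax(u2, 0)            # illustrative clipping of negative values (an assumption)
# prototypes recomputed from the type-2 memberships only
v2 <- t(sapply(1:ncol(u2), function(j) colSums(u2[, j]^m * x) / sum(u2[, j]^m)))
v2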
an object of class ‘ppclust’, which is a list consisting of the following items:
x |
a numeric matrix containing the processed data set. |
v |
a numeric matrix containing the final cluster prototypes (centers of clusters). |
u |
a numeric matrix containing the fuzzy memberships degrees of the data objects. |
d |
a numeric matrix containing the distances of objects to the final cluster prototypes. |
k |
an integer for the number of clusters. |
m |
a number for the fuzzifier. |
cluster |
a numeric vector containing the cluster labels found by defuzzying the fuzzy membership degrees of the objects. |
csize |
a numeric vector containing the number of objects in the clusters. |
iter |
an integer vector for the number of iterations in each start of the algorithm. |
best.start |
an integer for the index of start that produced the minimum objective functional. |
func.val |
a numeric vector for the objective function values in each start of the algorithm. |
comp.time |
a numeric vector for the execution time in each start of the algorithm. |
stand |
a logical value, |
wss |
a number for the within-cluster sum of squares for each cluster. |
bwss |
a number for the between-cluster sum of squares. |
tss |
a number for the total within-cluster sum of squares. |
twss |
a number for the total sum of squares. |
algorithm |
a string for the name of partitioning algorithm. It is ‘FCM’ with this function. |
call |
a string for the matched function call generating this ‘ppclust’ object. |
Zeynel Cebeci & Alper Tuna Kavlak
Arthur, D. & Vassilvitskii, S. (2007). K-means++: The advantages of careful seeding, in Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, p. 1027-1035. <http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf>
Rhee, F.C.H. & Hwang, C. (2001). A type-2 fuzzy c-means clustering algorithm. In Proc. of the Joint 9th IFSA World Congress and 20th NAFIPS Conf. 2001, 4:1926-1929. <doi:10.1109/NAFIPS.2001.944361>
Gosain, A. & Dahiya, S. (2016). Performance analysis of various fuzzy clustering algorithms: A review. Procedia Comp. Sci., 79:100-111. <doi:10.1016/j.procs.2016.03.014>
ekm, fcm, fpcm, fpppcm, gg, gk, gkpfcm, hcm, pca, pcm, pcmr, pfcm, upfc
# Load dataset X12
data(x12)
# Initialize the prototype matrix using the K-means++ algorithm
v <- inaparc::kmpp(x12, k=2)$v
# Initialize the membership degrees matrix
u <- inaparc::imembrand(nrow(x12), k=2)$u
# Run FCM2 with the initial prototypes and memberships
fcm2.res <- fcm2(x12, centers=v, memberships=u, m=2)
# Show the fuzzy membership degrees for the top 5 objects
head(fcm2.res$u, 5)
Partitions a numeric data set by using the Fuzzy and Possibilistic C-Means (FPCM) clustering algorithm (Pal et al, 1997).
fpcm(x, centers, memberships, m=2, eta=2, dmetric="sqeuclidean", pw=2, alginitv="kmpp", alginitu="imembrand", nstart=1, iter.max=1000, con.val=1e-09, fixcent=FALSE, fixmemb=FALSE, stand=FALSE, numseed)
x |
a numeric vector, data frame or matrix. |
centers |
an integer specifying the number of clusters or a numeric matrix containing the initial cluster centers. |
memberships |
a numeric matrix containing the initial membership degrees. If missing, it is internally generated. |
m |
a number greater than 1 to be used as the fuzziness exponent or fuzzifier. The default is 2. |
eta |
a number greater than 1 to be used as the typicality exponent. The default is 2. |
dmetric |
a string for the distance metric. The default is sqeuclidean for the squared Euclidean distances. See get.dmetrics for the alternative options. |
pw |
a number for the power of Minkowski distance calculation. The default is 2 if the dmetric is minkowski. |
alginitv |
a string for the initialization of cluster prototypes matrix. The default is kmpp for K-means++ initialization method (Arthur & Vassilvitskii, 2007). For the list of alternative options see the package ‘inaparc’. |
alginitu |
a string for the initialization of memberships degrees matrix. The default is imembrand for random sampling of initial membership degrees. |
nstart |
an integer for the number of starts for clustering. The default is 1. |
iter.max |
an integer for the maximum number of iterations allowed. The default is 1000. |
con.val |
a number for the convergence value between the iterations. The default is 1e-09. |
fixcent |
a logical flag to make the initial cluster centers not changed along the different starts of the algorithm. The default is FALSE. |
fixmemb |
a logical flag to make the initial membership degrees not changed along the different starts of the algorithm. The default is FALSE. |
stand |
a logical flag to standardize data. Its default value is FALSE. |
numseed |
a seeding number to set the seed of R's random number generator. |
The Fuzzy and Possibilistic C-Means (FPCM) algorithm, proposed by Pal et al (1997), was intended to combine the characteristics of FCM and PCM, and hence, is also called the Mixed C-Means (MCM) algorithm.
The objective function of FPCM is:

J_{FPCM}(\mathbf{X}; \mathbf{V}, \mathbf{U}, \mathbf{T}) = \sum_{i=1}^{n} \sum_{j=1}^{k} (u_{ij}^m + t_{ij}^\eta) \, d^2(x_i, v_j)

In the above equation:

\mathbf{X} = \{x_1, x_2, \dots, x_n\} \subseteq \mathbb{R}^p is the data set for n objects in the p-dimensional data space \mathbb{R}^p,
\mathbf{V} = \{v_1, v_2, \dots, v_k\} \subseteq \mathbb{R}^p is the prototype matrix of the clusters,
\mathbf{U} = [u_{ij}] is the matrix for a fuzzy partition of \mathbf{X},
\mathbf{T} = [t_{ij}] is the matrix for a possibilistic partition of \mathbf{X},
d^2(x_i, v_j) is the squared Euclidean distance between the object x_i and the cluster prototype v_j,
m is the fuzzifier to specify the amount of fuzziness for the clustering; 1 \leq m \leq \infty. It is usually chosen as 2,
\eta is the typicality exponent to specify the amount of typicality for the clustering; 1 \leq \eta \leq \infty. It is usually chosen as 2.

FPCM must satisfy the following constraints:

\sum_{j=1}^{k} u_{ij} = 1 \;\; \forall i \quad \text{and} \quad \sum_{i=1}^{n} t_{ij} = 1 \;\; \forall j

The objective function of FPCM is minimized by using the following update equations:

u_{ij} = \left( \sum_{l=1}^{k} \left( \frac{d^2(x_i, v_j)}{d^2(x_i, v_l)} \right)^{1/(m-1)} \right)^{-1}

t_{ij} = \left( \sum_{l=1}^{n} \left( \frac{d^2(x_i, v_j)}{d^2(x_l, v_j)} \right)^{1/(\eta-1)} \right)^{-1}

v_j = \frac{\sum_{i=1}^{n} (u_{ij}^m + t_{ij}^\eta) \, x_i}{\sum_{i=1}^{n} (u_{ij}^m + t_{ij}^\eta)}
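A base-R sketch of a single pass of these update equations, written independently of fpcm and shown only to make the row and column constraints concrete:

data(iris)
x <- as.matrix(iris[, 1:4]); n <- nrow(x); k <- 3; m <- 2; eta <- 2
set.seed(2)
v <- x[sample(n, k), , drop=FALSE]
d2 <- sapply(1:k, function(j)
  rowSums((x - matrix(v[j, ], n, ncol(x), byrow=TRUE))^2))
d2 <- pmax(d2, .Machine$double.eps)
# fuzzy memberships: each row (object) sums to 1 over the clusters
u <- 1 / sapply(1:k, function(j) rowSums((d2[, j] / d2)^(1/(m - 1))))
# typicalities: each column (cluster) sums to 1 over the objects
tt <- apply(d2, 2, function(dj) { w <- dj^(-1/(eta - 1)); w / sum(w) })
# prototype update combines memberships and typicalities
v <- t(sapply(1:k, function(j)
  colSums((u[, j]^m + tt[, j]^eta) * x) / sum(u[, j]^m + tt[, j]^eta)))
colSums(tt)   # the column-sum constraint of FPCM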
an object of class ‘ppclust’, which is a list consisting of the following items:
x |
a numeric matrix containing the processed data set. |
v |
a numeric matrix containing the final cluster prototypes (centers of clusters). |
u |
a numeric matrix containing the fuzzy memberships degrees of the data objects. |
d |
a numeric matrix containing the distances of objects to the final cluster prototypes. |
k |
an integer for the number of clusters. |
m |
a number for the fuzzifier. |
eta |
a number for the typicality exponent. |
cluster |
a numeric vector containing the cluster labels found by defuzzying the fuzzy membership degrees of the objects. |
csize |
a numeric vector containing the number of objects in the clusters. |
iter |
an integer vector for the number of iterations in each start of the algorithm. |
best.start |
an integer for the index of start that produced the minimum objective functional. |
func.val |
a numeric vector for the objective function values in each start of the algorithm. |
comp.time |
a numeric vector for the execution time in each start of the algorithm. |
stand |
a logical value, |
wss |
a number for the within-cluster sum of squares for each cluster. |
bwss |
a number for the between-cluster sum of squares. |
tss |
a number for the total within-cluster sum of squares. |
twss |
a number for the total sum of squares. |
algorithm |
a string for the name of partitioning algorithm. It is ‘FCM’ with this function. |
call |
a string for the matched function call generating this ‘ppclust’ object. |
Zeynel Cebeci, Alper Tuna Kavlak & Figen Yildiz
Arthur, D. & Vassilvitskii, S. (2007). K-means++: The advantages of careful seeding, in Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, p. 1027-1035. <http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf>
Pal, N.R., Pal, K., & Bezdek, J.C. (1997). A mixed c-means clustering model. In Proc. of the 6th IEEE Int. Conf. on Fuzzy Systems, 1, pp. 11-21. <doi:10.1109/FUZZY.1997.616338>
ekm, fcm, fcm2, fpppcm, gg, gk, gkpfcm, hcm, pca, pcm, pcmr, pfcm, upfc
# Load dataset iris
data(iris)
x <- iris[,-5]
# Initialize the prototype matrix using K-means++
v <- inaparc::kmpp(x, k=3)$v
# Initialize the membership degrees matrix
u <- inaparc::imembrand(nrow(x), k=3)$u
# Run FPCM with the initial prototypes and memberships
fpcm.res <- fpcm(x, centers=v, memberships=u, m=2, eta=2)
# Show the fuzzy membership degrees for the top 5 objects
head(fpcm.res$u, 5)
# Show the possibilistic membership degrees for the top 5 objects
head(fpcm.res$t, 5)
Partitions a numeric data set by using the Fuzzy Possibilistic Product Partition C-Means (FPPPCM) clustering algorithm which has been proposed by Szilagyi & Szilagyi (2014).
fpppcm(x, centers, memberships, m=2, eta=2, K=1, omega, dmetric="sqeuclidean", pw=2, alginitv="kmpp", alginitu="imembrand", nstart=1, iter.max=1000, con.val=1e-09, fixcent=FALSE, fixmemb=FALSE, stand=FALSE, numseed)
x |
a numeric vector, data frame or matrix. |
centers |
an integer specifying the number of clusters or a numeric matrix containing the initial cluster centers. |
memberships |
a numeric matrix containing the initial membership degrees. If missing, it is internally generated. |
m |
a number greater than 1 to be used as the fuzziness exponent. The default is 2. |
eta |
a number greater than 1 to be used as the typicality exponent. The default is 2. |
K |
a number greater than 0 to be used as the weight of penalty term. The default is 1. |
omega |
a numeric vector of reference distances. If missing, it is internally generated. |
dmetric |
a string for the distance metric. The default is sqeuclidean for the squared Euclidean distances. See get.dmetrics for the alternative options. |
pw |
a number for the power of Minkowski distance calculation. The default is 2 if the dmetric is minkowski. |
alginitv |
a string for the initialization of cluster prototypes matrix. The default is kmpp for K-means++ initialization method (Arthur & Vassilvitskii, 2007). For the list of alternative options see the package ‘inaparc’. |
alginitu |
a string for the initialization of memberships degrees matrix. The default is imembrand for random sampling of initial membership degrees. |
nstart |
an integer for the number of starts for clustering. The default is 1. |
iter.max |
an integer for the maximum number of iterations allowed. The default is 1000. |
con.val |
a number for the convergence value between the iterations. The default is 1e-09. |
fixcent |
a logical flag to fix the initial cluster centers. The default is FALSE. |
fixmemb |
a logical flag to fix the initial membership degrees. The default is FALSE. |
stand |
a logical flag to standardize data. Its default value is FALSE. |
numseed |
a seeding number to set the seed of R's random number generator. |
The Fuzzy Possibilistic Product Partition C-Means (FPPPCM) clustering algorithm aims to eliminate the effect of outliers seen with the other fuzzy and possibilistic clustering algorithms. The algorithm combines a probabilistic and a possibilistic term in a multiplicative way instead of an additive combination (Gosztolya & Szilagyi, 2015). The objective function of the algorithm is as follows:

The fuzzy membership degrees in the probabilistic part of the objective function are updated as follows:

The typicality degrees in the possibilistic part of the objective function are calculated as follows:

m is the fuzzifier to specify the amount of fuzziness for the clustering; 1 \leq m \leq \infty. It is usually chosen as 2.

\eta is the typicality exponent to specify the amount of typicality for the clustering; 1 \leq \eta \leq \infty. It is usually chosen as 2.

\Omega is the possibilistic penalty term to control the variance of the clusters.

The update equation for the cluster prototypes:
an object of class ‘ppclust’, which is a list consisting of the following items:
v |
a numeric matrix containing the final cluster prototypes. |
t |
a numeric matrix containing the typicality degrees of the data objects. |
d |
a numeric matrix containing the distances of objects to the final cluster prototypes. |
x |
a numeric matrix containing the processed data set. |
cluster |
a numeric vector containing the cluster labels found by defuzzifying the typicality degrees of the objects. |
csize |
a numeric vector for the number of objects in the clusters. |
k |
an integer for the number of clusters. |
m |
a number for the used fuzziness exponent. |
eta |
a number for the used typicality exponent. |
omega |
a numeric vector of reference distances. |
iter |
an integer vector for the number of iterations in each start of the algorithm. |
best.start |
an integer for the index of start that produced the minimum objective functional. |
func.val |
a numeric vector for the objective function values in each start of the algorithm. |
comp.time |
a numeric vector for the execution time in each start of the algorithm. |
stand |
a logical value, |
wss |
a number for the within-cluster sum of squares for each cluster. |
bwss |
a number for the between-cluster sum of squares. |
tss |
a number for the total within-cluster sum of squares. |
twss |
a number for the total sum of squares. |
algorithm |
a string for the name of partitioning algorithm. It is ‘PCM’ with this function. |
call |
a string for the matched function call generating this ‘ppclust’ object. |
Zeynel Cebeci, Alper Tuna Kavlak & Figen Yildiz
Arthur, D. & Vassilvitskii, S. (2007). K-means++: The advantages of careful seeding, in Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, p. 1027-1035. <http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf>
Szilagyi, L. & Szilagyi, S. M. (2014). Generalization rules for the suppressed fuzzy c-means clustering algorithm. Neurocomputing, 139:298-309. <doi:10.1016/j.neucom.2014.02.027>
Gosztolya, G. & Szilagyi, L. (2015). Application of fuzzy and possibilistic c-means clustering models in blind speaker clustering. Acta Polytechnica Hungarica, 12(7):41-56. <http://publicatio.bibl.u-szeged.hu/6151/1/2015-acta-polytechnica.pdf>
ekm, fcm, fcm2, fpcm, gg, gk, gkpfcm, hcm, pca, pcm, pcmr, upfc
# Load dataset X16
data(x16)
x <- x16[,-3]
# Initialize the prototype matrix using K-means++
v <- inaparc::kmpp(x, k=2)$v
# Initialize the membership degrees matrix
u <- inaparc::imembrand(nrow(x), k=2)$u
# Run FPPPCM
res.fpppcm <- fpppcm(x, centers=v, memberships=u, m=2, eta=2)
# Display typicality degrees
res.fpppcm$t
# Run FPPPCM for eta=3
res.fpppcm <- fpppcm(x, centers=v, memberships=u, m=2, eta=3)
# Display typicality degrees
res.fpppcm$t
Displays the distance metric options for calculation of the distances between the data objects and the cluster centers.
get.dmetrics(dmt="all")
dmt |
a string for the type of distance metrics. The default is all for the list of all of the distance metrics which are available in the package. The other options are l1, l2, lp, sl2 and sc for L1, L2, Lp, squared L2 and squared Chord, respectively. |
dmlist |
a two-column matrix containing the distance metrics and their descriptions. |
Zeynel Cebeci
Podani, J. (2000). Introduction to the Exploration of Multivariate Biological Data. Backhuys Publishers, Leiden, The Netherlands. 407 pages. <ISBN 90-5782-067-6>
McCune, B., & Grace, J. B. (2002). Analysis of ecological communities. Gleneden Beach, Oregon: MjM Software Design. <ISBN:0972129006>
# Get all distance metrics
dmlist <- get.dmetrics(dmt="all")
dmlist
# Get only L2 type distance metrics
dmlist <- get.dmetrics(dmt="l2")
dmlist
Partitions a numeric data set by using the Gath-Geva (GG) clustering algorithm (Gath & Geva, 1989). The function gg is based on the code in the fuzzy.GG function of the package advclust by Bagus and Pramana (2016), with some more extended initialization methods and distance metrics.
gg(x, centers, memberships, m=2, ggversion="simple", dmetric="sqeuclidean", pw = 2, alginitv="kmpp", alginitu="imembrand", nstart=1, iter.max=1e03, con.val=1e-09, fixcent=FALSE, fixmemb=FALSE, stand=FALSE, numseed)
x |
a numeric vector, data frame or matrix. |
centers |
an integer specifying the number of clusters or a numeric matrix containing the initial cluster centers. |
memberships |
a numeric matrix containing the initial membership degrees. If missing, it is internally generated. |
ggversion |
a string for the version of the Gath-Geva algorithm. The default is simple for the simplified version (Hoeppner et al, 1999). Use original for the original Gath-Geva algorithm (Gath & Geva, 1989). |
m |
a number greater than 1 to be used as the fuzziness exponent or fuzzifier. The default is 2. |
dmetric |
a string for the distance metric. The default is sqeuclidean for the squared Euclidean distances. See get.dmetrics for the alternative options. |
pw |
a number for the power of Minkowski distance calculation. The default is 2 if the dmetric is minkowski. |
alginitv |
a string for the initialization of cluster prototypes matrix. The default is kmpp for K-means++ initialization method (Arthur & Vassilvitskii, 2007). For the list of alternative options see the package ‘inaparc’. |
alginitu |
a string for the initialization of memberships degrees matrix. The default is imembrand for random sampling of initial membership degrees. |
nstart |
an integer for the number of starts for clustering. The default is 1. |
iter.max |
an integer for the maximum number of iterations allowed. The default is 1000. |
con.val |
a number for the convergence value between the iterations. The default is 1e-09. |
fixcent |
a logical flag to make the initial cluster centers not changed along the different starts of the algorithm. The default is FALSE. |
fixmemb |
a logical flag to make the initial membership degrees not changed along the different starts of the algorithm. The default is FALSE. |
stand |
a logical flag to standardize data. Its default value is FALSE. |
numseed |
a seeding number to set the seed of R's random number generator. |
Gath and Geva (1989) proposed that the fuzzy maximum likelihood estimates (FMLE) clustering algorithm can be used to detect clusters of varying shapes, sizes and densities. Instead of using Euclidean distance as in FCM, Gath-Geva (GG) algorithm uses a distance norm based on fuzzy maximum likelihood estimates as described by Bezdek & Dunn (1975). These are the Gauss distances, and hence, GG is also so-called Gaussian Mixture Decomposition.
The objective function of GG is:

J_{GG}(\mathbf{X}; \mathbf{V}, \mathbf{U}) = \sum_{i=1}^{n} \sum_{j=1}^{k} u_{ij}^m \, d^2(x_i, v_j)

In the above equation, d(x_i, v_j) is the Gauss distance between the object x_i and the cluster prototype v_j:

d^2(x_i, v_j) = \frac{(2\pi)^{p/2} \sqrt{\det(\mathbf{F}_j)}}{\alpha_j} \exp\left( \frac{1}{2} (x_i - v_j)^T \mathbf{F}_j^{-1} (x_i - v_j) \right)

The argument \alpha_j is the prior probability for belonging of x_i to the cluster j:

\alpha_j = \frac{1}{n} \sum_{i=1}^{n} u_{ij}

The argument \alpha_j is used to evaluate the size of a cluster since bigger clusters attract more elements. m is the fuzzifier to specify the amount of fuzziness for the clustering. Although it is 1 in the original FMLE algorithm (Bezdek & Dunn, 1975), a higher value of it (m = 2) can be used to make the partition more fuzzy, compensating the exponential term of the distance norm (Balasko et al, 2005).
The objective function of GG is minimized by using the following update equations:
an object of class ‘ppclust’, which is a list consisting of the following items:
x |
a numeric matrix containing the processed data set. |
v |
a numeric matrix containing the final cluster prototypes (centers of clusters). |
u |
a numeric matrix containing the fuzzy memberships degrees of the data objects. |
d |
a numeric matrix containing the distances of objects to the final cluster prototypes. |
k |
an integer for the number of clusters. |
m |
a number for the fuzzifier. |
cluster |
a numeric vector containing the cluster labels found by defuzzying the fuzzy membership degrees of the objects. |
csize |
a numeric vector containing the number of objects in the clusters. |
iter |
an integer vector for the number of iterations in each start of the algorithm. |
best.start |
an integer for the index of start that produced the minimum objective functional. |
func.val |
a numeric vector for the objective function values in each start of the algorithm. |
comp.time |
a numeric vector for the execution time in each start of the algorithm. |
stand |
a logical value, |
wss |
a number for the within-cluster sum of squares for each cluster. |
bwss |
a number for the between-cluster sum of squares. |
tss |
a number for the total within-cluster sum of squares. |
twss |
a number for the total sum of squares. |
algorithm |
a string for the name of partitioning algorithm. It is ‘FCM’ with this function. |
call |
a string for the matched function call generating this ‘ppclust’ object. |
Due to the exponential distance norm, this algorithm needs a good initialization since it converges to a near local optimum. So, usually another clustering algorithm, i.e. FCM, is used to initialize the partition matrix (Balasko et al, 2005).
Zeynel Cebeci
Arthur, D. & Vassilvitskii, S. (2007). K-means++: The advantages of careful seeding, in Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, p. 1027-1035. <http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf>
Bezdek, J.C. & Dunn, J.C. (1975). Optimal fuzzy partitions: A heuristic for estimating the parameters in a mixture of normal distributions. IEEE Transactions on Computers, C-24(8):835-838. <doi:10.1109/T-C.1975.224317>
Gath, I. & Geva, A.B. (1989). Unsupervised optimal fuzzy clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11 (7): 773-781. <doi:10.1109/34.192473>
Hoeppner, F., Klawonn, F., Kruse, R. & Runkler, T. (1999). Fuzzy cluster analysis. New York: John Wiley and Sons. <ISBN:0471988642>
Balasko, B., Abonyi, J. & Feil, B. (2005). Fuzzy clustering and data analysis toolbox. Department of Process Eng., Univ. of Veszprem, Veszprem.
Bagus, A. F. & Pramana, S. (2016). advclust: Object Oriented Advanced Clustering. R package version 0.4. https://CRAN.R-project.org/package=advclust
ekm, fcm, fcm2, fpcm, fpppcm, gk, gkpfcm, hcm, pca, pcm, pcmr, pfcm, upfc
## Not run:
# Load dataset iris
data(iris)
x <- iris[,-5]
# Initialize the prototype matrix using the Inofrep algorithm
v <- inaparc::inofrep(x, k=3)$v
# Initialize the membership degrees matrix
u <- inaparc::imembrand(nrow(x), k=3)$u
# Run Gath & Geva with the initial prototypes and memberships
gg.res <- gg(x, centers=v, memberships=u, m=2)
# Show the fuzzy membership degrees for the top 5 objects
head(gg.res$u, 5)
## End(Not run)
Partitions a numeric data set by using the Gustafson-Kessel (GK) clustering algorithm (Gustafson & Kessel, 1979). Unlike FCM, which uses the Euclidean distance, GK uses a cluster-specific Mahalanobis distance.
gk(x, centers, memberships, m=2, dmetric="sqeuclidean", pw = 2, alginitv="kmpp", alginitu="imembrand", nstart=1, iter.max=1e03, con.val=1e-09, fixcent=FALSE, fixmemb=FALSE, stand=FALSE, numseed)
x |
a numeric vector, data frame or matrix. |
centers |
an integer specifying the number of clusters or a numeric matrix containing the initial cluster centers. |
memberships |
a numeric matrix containing the initial membership degrees. If missing, it is internally generated. |
m |
a number greater than 1 to be used as the fuzziness exponent or fuzzifier. The default is 2. |
dmetric |
a string for the distance metric. The default is sqeuclidean for the squared Euclidean distances. See get.dmetrics for the alternative options. |
pw |
a number for the power of Minkowski distance calculation. The default is 2 if the dmetric is minkowski. |
alginitv |
a string for the initialization of cluster prototypes matrix. The default is kmpp for K-means++ initialization method (Arthur & Vassilvitskii, 2007). For the list of alternative options see the package ‘inaparc’. |
alginitu |
a string for the initialization of memberships degrees matrix. The default is imembrand for random sampling of initial membership degrees. |
nstart |
an integer for the number of starts for clustering. The default is 1. |
iter.max |
an integer for the maximum number of iterations allowed. The default is 1000. |
con.val |
a number for the convergence value between the iterations. The default is 1e-09. |
fixcent |
a logical flag to make the initial cluster centers not changed along the different starts of the algorithm. The default is FALSE. |
fixmemb |
a logical flag to make the initial membership degrees not changed along the different starts of the algorithm. The default is FALSE. |
stand |
a logical flag to standardize data. Its default value is FALSE. |
numseed |
a seeding number to set the seed of R's random number generator. |
As an extension of basic FCM algorithm, Gustafson and Kessel (GK) clustering algorithm employs an adaptive distance norm in order to detect clusters with different geometrical shapes (Babuska, 2001; Correa et al, 2011).
The objective function of GK is:

J_{GK}(\mathbf{X}; \mathbf{V}, \mathbf{A}, \mathbf{U}) = \sum_{i=1}^{n} \sum_{j=1}^{k} u_{ij}^m \, d^2_{A_j}(x_i, v_j)

In the above equation, d_{A_j}(x_i, v_j) is the Mahalanobis distance between the data object x_i and the cluster prototype v_j:

d^2_{A_j}(x_i, v_j) = (x_i - v_j)^T \mathbf{A}_j (x_i - v_j)

As understood from the above equation, each cluster has its own norm-inducing matrix in GK, so that the norm matrix \mathbf{A} is a k-length tuple of the cluster-specific norm-inducing matrices: \mathbf{A} = (\mathbf{A}_1, \mathbf{A}_2, \dots, \mathbf{A}_k).

The objective function of GK cannot be directly minimized with respect to \mathbf{A}_j since it is linear in \mathbf{A}_j. For obtaining a feasible solution, \mathbf{A}_j must be constrained in some way. One of the usual ways of accomplishing this is to constrain the determinant of \mathbf{A}_j (Babuska, 2001):

\det(\mathbf{A}_j) = \rho_j, \quad \rho_j > 0, \;\; \forall j

Allowing the matrix \mathbf{A}_j to vary with its determinant fixed corresponds to optimizing the shape of the cluster while its volume remains constant. By using the Lagrange-multiplier method, the norm-inducing matrix for the cluster j is defined as follows (Babuska, 2001):

\mathbf{A}_j = \left( \rho_j \det(\mathbf{F}_j) \right)^{1/p} \mathbf{F}_j^{-1},

where:

n is the number of data objects,

\rho_j represents the volume of cluster j,

\mathbf{F}_j is the fuzzy covariance matrix for the cluster j, and is calculated as follows:

\mathbf{F}_j = \frac{\sum_{i=1}^{n} u_{ij}^m (x_i - v_j)(x_i - v_j)^T}{\sum_{i=1}^{n} u_{ij}^m}

m is the fuzzifier to specify the amount of fuzziness for the clustering; 1 \leq m \leq \infty. It is usually chosen as 2.
GK must satisfy the following constraints:
The objective function of GK is minimized by using the following update equations:
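As a minimal illustrative sketch (not the package implementation), the fuzzy covariance matrix F_i, the norm-inducing matrix A_i and the squared Mahalanobis distances for a single cluster could be computed in R roughly as follows, assuming a data matrix x, a membership matrix u, a prototype matrix v and a fuzzifier m are available, for example from a previous run of gk; the helper name gk.dist2 is hypothetical:

# Sketch only: fuzzy covariance F_i, norm-inducing matrix A_i and squared
# Mahalanobis distances of all objects to the prototype of cluster i
gk.dist2 <- function(x, u, v, i, m = 2, rho = 1) {
  x <- as.matrix(x)
  p <- ncol(x)
  xc <- sweep(x, 2, v[i, ])                  # center the objects at prototype i
  w <- u[, i]^m                              # fuzzy weights of the objects
  Fi <- crossprod(xc * w, xc) / sum(w)       # fuzzy covariance matrix F_i
  Ai <- (rho * det(Fi))^(1/p) * solve(Fi)    # norm-inducing matrix A_i
  rowSums((xc %*% Ai) * xc)                  # squared Mahalanobis distances
}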
an object of class ‘ppclust’, which is a list consisting of the following items:
x |
a numeric matrix containing the processed data set. |
v |
a numeric matrix containing the final cluster prototypes (centers of clusters). |
u |
a numeric matrix containing the fuzzy memberships degrees of the data objects. |
d |
a numeric matrix containing the distances of objects to the final cluster prototypes. |
k |
an integer for the number of clusters. |
m |
a number for the fuzzifier. |
cluster |
a numeric vector containing the cluster labels found by defuzzifying the fuzzy membership degrees of the objects. |
csize |
a numeric vector containing the number of objects in the clusters. |
iter |
an integer vector for the number of iterations in each start of the algorithm. |
best.start |
an integer for the index of start that produced the minimum objective functional. |
func.val |
a numeric vector for the objective function values in each start of the algorithm. |
comp.time |
a numeric vector for the execution time in each start of the algorithm. |
stand |
a logical value, |
wss |
a number for the within-cluster sum of squares for each cluster. |
bwss |
a number for the between-cluster sum of squares. |
tss |
a number for the total within-cluster sum of squares. |
twss |
a number for the total sum of squares. |
algorithm |
a string for the name of the partitioning algorithm. It is ‘GK’ with this function. |
call |
a string for the matched function call generating this ‘ppclust’ object. |
Zeynel Cebeci & Cagatay Cebeci
Arthur, D. & Vassilvitskii, S. (2007). K-means++: The advantages of careful seeding, in Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027-1035. http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
Gustafson, D. E. & Kessel, W. C. (1979). Fuzzy clustering with a fuzzy covariance matrix. In Proc. of IEEE Conf. on Decision and Control including the 17th Symposium on Adaptive Processes, San Diego. pp. 761-766. doi:10.1109/CDC.1978.268028
Babuska, R. (2001). Fuzzy and neural control. DISC Course Lecture Notes. Delft University of Technology. Delft, the Netherlands. https://tr.scribd.com/document/209211977/Fuzzy-and-Neural-Control.
Correa, C., Valero, C., Barreiro, P., Diago, M. P., & Tardáguila, J. (2011). A comparison of fuzzy clustering algorithms applied to feature extraction on vineyard. In Proc. of the 14th Conf. of the Spanish Assoc. for Artificial Intelligence. http://oa.upm.es/9246/
ekm, fcm, fcm2, fpcm, fpppcm, gg, gkpfcm, hcm, pca, pcm, pcmr, pfcm, upfc
## Not run: 
# Load dataset iris
data(iris)
x <- iris[,-5]
# Initialize the prototype matrix using K-means++
v <- inaparc::kmpp(x, k=3)$v
# Initialize the membership degrees matrix
u <- inaparc::imembrand(nrow(x), k=3)$u
# Run GK with the initial prototypes and memberships
gk.res <- gk(x, centers=v, memberships=u, m=2)
# Show the fuzzy membership degrees for the top 5 objects
head(gk.res$u, 5)
## End(Not run)
Partitions a numeric data set by using the Gustafson-Kessel (GK) algorithm within the PFCM (Possibilistic Fuzzy C-Means) clustering algorithm (Ojeda-Magaina et al, 2006).
gkpfcm(x, centers, memberships, m=2, eta=2, K=1, omega, a, b, dmetric="sqeuclidean", pw = 2, alginitv="kmpp", alginitu="imembrand", nstart=1, iter.max=1000, con.val=1e-09, fixcent=FALSE, fixmemb=FALSE, stand=FALSE, numseed)
x |
a numeric vector, data frame or matrix. |
centers |
an integer specifying the number of clusters or a numeric matrix containing the initial cluster centers. |
memberships |
a numeric matrix containing the initial membership degrees. If missing, it is internally generated. |
m |
a number greater than 1 to be used as the fuzziness exponent or fuzzifier. The default is 2. |
eta |
a number greater than 1 to be used as the typicality exponent. The default is 2. |
a |
a number for the relative importance of the fuzzy part of the objective function. The default is 1. |
b |
a number for the relative importance of the possibilistic part of the objective function. The default is 1. |
K |
a number greater than 0 to be used as the weight of penalty term. The default is 1. |
omega |
a numeric vector of reference distances. If missing, it is internally generated. |
dmetric |
a string for the distance metric. The default is sqeuclidean for the squared Euclidean distances. See |
pw |
a number for the power of Minkowski distance calculation. The default is 2 if the |
alginitv |
a string for the initialization of cluster prototypes matrix. The default is kmpp for K-means++ initialization method (Arthur & Vassilvitskii, 2007). For the list of alternative options see |
alginitu |
a string for the initialization of memberships degrees matrix. The default is imembrand for random sampling of initial membership degrees. |
nstart |
an integer for the number of starts for clustering. The default is 1. |
iter.max |
an integer for the maximum number of iterations allowed. The default is 1000. |
con.val |
a number for the convergence value between the iterations. The default is 1e-09. |
fixcent |
a logical flag to make the initial cluster centers not changed along the different starts of the algorithm. The default is |
fixmemb |
a logical flag to make the initial membership degrees not changed along the different starts of the algorithm. The default is |
stand |
a logical flag to standardize data. Its default value is |
numseed |
a seeding number to set the seed of R's random number generator. |
Gustafson-Kessel clustering within the Possibilistic Fuzzy C-Means (GKPFCM) algorithm is an improvement of the PFCM algorithm that consists of a modification of its distance metric. The original PFCM uses the Euclidean distance, whereas GKPFCM uses the cluster-specific Mahalanobis distance of the GK algorithm. Babuska et al (2002) proposed an improved estimation of the fuzzy covariance matrix $F_i$ as follows:

$$F_i := (1 - \gamma)\, F_i + \gamma\, (\det F_0)^{1/p}\, I$$

In the above equation, $F_0$ is the covariance matrix of the whole data set, $I$ is the identity matrix, and $\gamma \in [0,1]$ is a tuning parameter, so the estimate involves a weighted identity matrix. The eigenvalues $\lambda_{ij}$ and the eigenvectors $\phi_{ij}$ of the resulting matrix are calculated, and the maximum eigenvalue $\lambda_{i,max}$ is determined. With the obtained results, every eigenvalue that violates the condition $\lambda_{i,max} / \lambda_{ij} \leq \beta$, for a user-defined threshold $\beta$, is replaced by $\lambda_{i,max} / \beta$. Finally, $F_i$ is recomputed with the following equation:

$$F_i = [\phi_{i1}, \ldots, \phi_{ip}] \; \mathrm{diag}(\lambda_{i1}, \ldots, \lambda_{ip}) \; [\phi_{i1}, \ldots, \phi_{ip}]^{-1}$$

The objective function of GKPFCM is that of PFCM, with the squared Euclidean distances replaced by the Mahalanobis distances $d_{A_i}^{2}(\vec{x}_j, \vec{v}_i)$ induced by the norm matrices $A_i$ of GK:

$$J_{GKPFCM}(\mathbf{X}; \mathbf{V}, \mathbf{U}, \mathbf{T}) = \sum_{i=1}^{k} \sum_{j=1}^{n} (a\, u_{ij}^{m} + b\, t_{ij}^{\eta})\, d_{A_i}^{2}(\vec{x}_j, \vec{v}_i) + \sum_{i=1}^{k} \Omega_i \sum_{j=1}^{n} (1 - t_{ij})^{\eta}$$

$m$ is the fuzzifier to specify the amount of fuzziness for the clustering; $1 \leq m \leq \infty$. It is usually chosen as 2.

$\eta$ is the typicality exponent to specify the amount of typicality for the clustering; $1 \leq \eta \leq \infty$. It is usually chosen as 2.

The objective function is minimized by using the update equations of PFCM, in which the squared Euclidean distances are replaced by the cluster-specific Mahalanobis distances.
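A minimal sketch (an assumption for illustration, not the package code) of the eigenvalue conditioning step described above might look as follows in R, where beta denotes a user-defined threshold on the allowed eigenvalue spread and the helper name condition.cov is hypothetical:

# Sketch only: bound the eigenvalue spread of a fuzzy covariance matrix
condition.cov <- function(Fi, beta = 1e15) {
  e <- eigen(Fi, symmetric = TRUE)
  lmax <- max(e$values)
  lam <- pmax(e$values, lmax / beta)                             # clip too-small eigenvalues
  e$vectors %*% diag(lam, nrow = length(lam)) %*% t(e$vectors)   # recompose the matrix
}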
an object of class ‘ppclust’, which is a list consisting of the following items:
x |
a numeric matrix containing the processed data set. |
v |
a numeric matrix containing the final cluster prototypes (centers of clusters). |
u |
a numeric matrix containing the fuzzy memberships degrees of the data objects. |
d |
a numeric matrix containing the distances of objects to the final cluster prototypes. |
k |
an integer for the number of clusters. |
m |
a number for the fuzzifier. |
eta |
a number greater than 1 to be used as the typicality exponent. |
a |
a number for the relative importance of the fuzzy part of the objective function. |
b |
a number for the relative importance of the possibilistic part of the objective function. |
omega |
a numeric vector of reference distances. |
cluster |
a numeric vector containing the cluster labels found by defuzzifying the fuzzy membership degrees of the objects. |
csize |
a numeric vector containing the number of objects in the clusters. |
iter |
an integer vector for the number of iterations in each start of the algorithm. |
best.start |
an integer for the index of start that produced the minimum objective functional. |
func.val |
a numeric vector for the objective function values in each start of the algorithm. |
comp.time |
a numeric vector for the execution time in each start of the algorithm. |
stand |
a logical value, |
wss |
a number for the within-cluster sum of squares for each cluster. |
bwss |
a number for the between-cluster sum of squares. |
tss |
a number for the total within-cluster sum of squares. |
twss |
a number for the total sum of squares. |
algorithm |
a string for the name of the partitioning algorithm. It is ‘GKPFCM’ with this function. |
call |
a string for the matched function call generating this ‘ppclust’ object. |
Zeynel Cebeci & Cagatay Cebeci
Arthur, D. & Vassilvitskii, S. (2007). K-means++: The advantages of careful seeding, in Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, p. 1027-1035. <http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf>
Gustafson, D. E. & Kessel, W. C. (1979). Fuzzy clustering with a fuzzy covariance matrix. In Proc. of IEEE Conf. on Decision and Control including the 17th Symposium on Adaptive Processes, San Diego. pp. 761-766. <doi:10.1109/CDC.1978.268028>
Babuska, R., van der Veen, P. J. & Kaymak, U. (2002). Improved covariance estimation for Gustafson-Kessel clustering. In Proc. of Int. Conf. on Fuzzy Systems, Hawaii, 2002, pp. 1081-1085. <https://tr.scribd.com/document/209211977/Fuzzy-and-Neural-Control>.
Ojeda-Magaina, B., Ruelas, R., Corona-Nakamura, M. A. & Andina, D. (2006). An improvement to the possibilistic fuzzy c-means clustering algorithm. In Proc. of IEEE World Automation Congress, 2006 (WAC'06). pp. 1-8. <doi:10.1109/WAC.2006.376056>
ekm, fcm, fcm2, fpcm, fpppcm, gg, gk, hcm, pca, pcm, pcmr, pfcm, upfc
## Not run: 
# Load dataset iris
data(iris)
x <- iris[,-5]
# Initialize the prototype matrix using K-means++
v <- inaparc::kmpp(x, k=3)$v
# Initialize the membership degrees matrix
u <- inaparc::imembrand(nrow(x), k=3)$u
# Run GKPFCM with the initial prototypes and memberships
gkpfcm.res <- gkpfcm(x, centers=v, memberships=u, m=2)
# Show the fuzzy membership degrees for the top 5 objects
head(gkpfcm.res$u, 5)
## End(Not run)
Partitions a numeric data set by using the Hard C-Means (HCM) clustering algorithm (also known as K-Means), which was proposed by MacQueen (1967). The function hcm is an extension of the basic kmeans with more input arguments and output values in order to make the clustering results comparable with those of the other fuzzy and possibilistic algorithms. For instance, not only the Euclidean distance metric but also a number of other distance metrics, such as the squared Euclidean distance and the squared Chord distance, can be employed with the function hcm.
hcm(x, centers, dmetric="euclidean", pw=2, alginitv="kmpp", nstart=1, iter.max=1000, con.val=1e-9, stand=FALSE, numseed)
x |
a numeric vector, data frame or matrix. |
centers |
an integer specifying the number of clusters or a numeric matrix containing the initial cluster centers. |
dmetric |
a string for the distance metric. The default is euclidean for the squared Euclidean distances. See |
pw |
a number for the power of Minkowski distance calculation. The default is 2 if the |
alginitv |
a string for the initialization of cluster prototypes matrix. The default is kmpp for K-means++ initialization method (Arthur & Vassilvitskii, 2007). For the list of alternative options see |
nstart |
an integer for the number of starts for clustering. The default is 1. |
iter.max |
an integer for the maximum number of iterations allowed. The default is 1000. |
con.val |
a number for the convergence value between the iterations. The default is 1e-09. |
stand |
a logical flag to standardize data. Its default value is |
numseed |
a seeding number to set the seed of R's random number generator. |
The Hard C-Means (HCM) clustering algorithm (or K-Means) partitions a data set into k groups, the so-called clusters. The objective function of HCM is:

$$J_{HCM}(\mathbf{X}; \mathbf{V}, \mathbf{U}) = \sum_{i=1}^{k} \sum_{j=1}^{n} u_{ij}\, d^{2}(\vec{x}_j, \vec{v}_i)$$

See ppclust-package for the details about the terms in the above equation.

The update equation for the membership degrees assigns each object to its nearest cluster prototype:

$$u_{ij} = \begin{cases} 1 & \text{if } d^{2}(\vec{x}_j, \vec{v}_i) = \min_{1 \leq l \leq k} d^{2}(\vec{x}_j, \vec{v}_l) \\ 0 & \text{otherwise} \end{cases}$$

The update equation for the cluster prototypes is:

$$\vec{v}_i = \frac{\sum_{j=1}^{n} u_{ij}\, \vec{x}_j}{\sum_{j=1}^{n} u_{ij}}$$
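For illustration only, one iteration of these two update steps could be sketched in R as follows. This is a simplified sketch with squared Euclidean distances, not the implementation used by hcm, and the helper name hcm.step is hypothetical:

# Sketch only: assign each object to its nearest prototype, then recompute
# the prototypes as the means of the assigned objects
hcm.step <- function(x, v) {
  x <- as.matrix(x); v <- as.matrix(v)
  d2 <- sapply(seq_len(nrow(v)), function(i) rowSums(sweep(x, 2, v[i, ])^2))
  cl <- max.col(-d2)                         # index of the nearest prototype
  # (an empty cluster would yield a NaN row in this simplified sketch)
  v.new <- t(sapply(seq_len(nrow(v)),
                    function(i) colMeans(x[cl == i, , drop = FALSE])))
  list(cluster = cl, v = v.new)
}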
an object of class ‘ppclust’, which is a list consisting of the following items:
x |
a numeric matrix containing the processed data set. |
v |
a numeric matrix containing the final cluster prototypes (centers of clusters). |
u |
a numeric matrix containing the hard membership degrees of the data objects. |
d |
a numeric matrix containing the distances of objects to the final cluster prototypes. |
k |
an integer for the number of clusters. |
cluster |
a numeric vector containing the cluster labels of the data objects. |
csize |
a numeric vector containing the number of objects in the clusters. |
best.start |
an integer for the index of start with the minimum objective functional. |
iter |
an integer vector for the number of iterations in each start of the algorithm. |
func.val |
a numeric vector for the objective function values of each start of the algorithm. |
comp.time |
a numeric vector for the execution time of each start of the algorithm. |
wss |
a numeric vector containing the within-cluster sum of squares for each cluster. |
bwss |
a number for the between-cluster sum of squares. |
tss |
a number for the total within-cluster sum of squares. |
twss |
a number for the total sum of squares. |
stand |
a logical value, |
algorithm |
a string for the name of partitioning algorithm. It is ‘HCM’ with this function. |
call |
a string for the matched function call generating this ‘ppclust’ object. |
Zeynel Cebeci & Figen Yildiz
Arthur, D. & Vassilvitskii, S. (2007). K-means++: The advantages of careful seeding, in Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, p. 1027-1035. <http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf>
MacQueen, J.B. (1967). Some methods for classification and analysis of multivariate observations. In Proc. of 5th Berkeley Symp. on Mathematical Statistics and Probability, Berkeley, Univ. of California Press, 1: 281-297. <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.308.8619&rep=rep1&type=pdf>
kmeans, ekm, fcm, fcm2, fpcm, fpppcm, gg, gk, gkpfcm, pca, pcm, pcmr, pfcm, upfc
## Not run: 
# Load dataset iris
data(iris)
x <- iris[,-5]
# Initialize the prototype matrix using K-means++
v <- inaparc::kmpp(x, k=3)$v
# Run HCM with the initial prototypes
res.hcm <- hcm(x, centers=v)
# Print, summarize and plot the clustering result
res.hcm$cluster
summary(res.hcm$cluster)
plot(x, col=res.hcm$cluster, pch=16)
## End(Not run)
Checks whether the given object is an instance of the ppclust class or not.
is.ppclust(objx)
objx |
an object to be checked for its class. |
TRUE if objx is a valid ppclust object, and FALSE for the other types of object classes.
Zeynel Cebeci
as.ppclust, ppclust2, summary.ppclust
data(iris)
# Run FCM for 2 clusters
res.fcm <- fcm(x=iris[,1:4], centers=2)
# Test for a ppclust object returned by the fcm function
is.ppclust(res.fcm)
# Test for a matrix object
x <- matrix(nrow=3, ncol=2, c(1,4,2,5,7,8))
is.ppclust(x)
Partitions a numeric data set by using the Modified Fuzzy and Possibilistic C-Means (MFPCM) clustering algorithm (Saad & Alimi, 2009).
mfpcm(x, centers, memberships, m=2, eta=2, dmetric="sqeuclidean", pw=2, alginitv="kmpp", alginitu="imembrand", nstart=1, iter.max=1000, con.val=1e-09, fixcent=FALSE, fixmemb=FALSE, stand=FALSE, numseed)
x |
a numeric vector, data frame or matrix. |
centers |
an integer specifying the number of clusters or a numeric matrix containing the initial cluster centers. |
memberships |
a numeric matrix containing the initial membership degrees. If missing, it is internally generated. |
m |
a number greater than 1 to be used as the fuzziness exponent or fuzzifier. The default is 2. |
eta |
a number greater than 1 to be used as the typicality exponent. The default is 3. |
dmetric |
a string for the distance metric. The default is sqeuclidean for the squared Euclidean distances. See |
pw |
a number for the power of Minkowski distance calculation. The default is 2 if the |
alginitv |
a string for the initialization of cluster prototypes matrix. The default is kmpp for K-means++ initialization method (Arthur & Vassilvitskii, 2007). For the list of alternative options see |
alginitu |
a string for the initialization of memberships degrees matrix. The default is imembrand for random sampling of initial membership degrees. |
nstart |
an integer for the number of starts for clustering. The default is 1. |
iter.max |
an integer for the maximum number of iterations allowed. The default is 1000. |
con.val |
a number for the convergence value between the iterations. The default is 1e-09. |
fixcent |
a logical flag to make the initial cluster centers not changed along the different starts of the algorithm. The default is |
fixmemb |
a logical flag to make the initial membership degrees not changed along the different starts of the algorithm. The default is |
stand |
a logical flag to standardize data. Its default value is |
numseed |
a seeding number to set the seed of R's random number generator. |
The Modified Fuzzy and Possibilistic C-Means (MFPCM) algorithm was proposed by Saad & Alimi (2009) and is intended to incorporate a weight parameter into the objective function of FPCM (Pal et al, 1997). In this objective function, every data object is considered to have its own weight in relation to every cluster. Therefore, it is expected that the weight permits a better classification, especially in the case of noisy data (Saad & Alimi, 2009). The weight of a data object for a cluster is computed from the distance between the object and the cluster prototype. The objective function of MFPCM is minimized by using update equations analogous to those of FPCM, in which the distances are weighted accordingly.
an object of class ‘ppclust’, which is a list consisting of the following items:
x |
a numeric matrix containing the processed data set. |
v |
a numeric matrix containing the final cluster prototypes (centers of clusters). |
u |
a numeric matrix containing the fuzzy memberships degrees of the data objects. |
d |
a numeric matrix containing the distances of objects to the final cluster prototypes. |
k |
an integer for the number of clusters. |
m |
a number for the fuzzifier. |
eta |
a number for the typicality exponent. |
cluster |
a numeric vector containing the cluster labels found by defuzzifying the fuzzy membership degrees of the objects. |
csize |
a numeric vector containing the number of objects in the clusters. |
iter |
an integer vector for the number of iterations in each start of the algorithm. |
best.start |
an integer for the index of start that produced the minimum objective functional. |
func.val |
a numeric vector for the objective function values in each start of the algorithm. |
comp.time |
a numeric vector for the execution time in each start of the algorithm. |
stand |
a logical value, |
wss |
a number for the within-cluster sum of squares for each cluster. |
bwss |
a number for the between-cluster sum of squares. |
tss |
a number for the total within-cluster sum of squares. |
twss |
a number for the total sum of squares. |
algorithm |
a string for the name of the partitioning algorithm. It is ‘MFPCM’ with this function. |
call |
a string for the matched function call generating this ‘ppclust’ object. |
Zeynel Cebeci, Alper Tuna Kavlak & Figen Yildiz
Arthur, D. & Vassilvitskii, S. (2007). K-means++: The advantages of careful seeding, in Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, p. 1027-1035. <http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf>
Saad, M. F. & Alimi, A. M. (2009). Modified fuzzy possibilistic c-means. In Proc. of the Int. Multiconference of Engineers and Computer Scientists, 1: 18-20. <ISBN:978-988-17012-2-0>
ekm, fcm, fcm2, fpppcm, gg, gk, gkpfcm, hcm, pca, pcm, pcmr, pfcm, upfc
# Load dataset iris
data(iris)
x <- iris[,-5]
# Initialize the prototype matrix using K-means++
v <- inaparc::kmpp(x, k=3)$v
# Initialize the membership degrees matrix
u <- inaparc::imembrand(nrow(x), k=3)$u
# Run MFPCM with the initial prototypes and memberships
mfpcm.res <- mfpcm(x, centers=v, memberships=u, m=2, eta=2)
# Show the fuzzy membership degrees for the top 5 objects
head(mfpcm.res$u, 5)
# Show the possibilistic membership degrees for the top 5 objects
head(mfpcm.res$t, 5)
Partitions a numeric data set by using the Possibilistic Clustering Algorithm (PCA) which has been proposed by Yang & Wu (2006).
pca(x, centers, memberships, m=2, eta=2, dmetric="sqeuclidean", pw=2, alginitv="kmpp", alginitu="imembrand", nstart=1, iter.max=1000, con.val=1e-09, fixcent=FALSE, fixmemb=FALSE, stand=FALSE, numseed)
x |
a numeric vector, data frame or matrix. |
centers |
an integer specifying the number of clusters or a numeric matrix containing the initial cluster centers. |
memberships |
a numeric matrix containing the initial membership degrees. If missing, it is internally generated. |
m |
a number greater than 1 to be used as the fuzziness exponent. The default is 2. |
eta |
a number greater than 1 to be used as the typicality exponent. The default is 2. |
dmetric |
a string for the distance metric. The default is sqeuclidean for the squared Euclidean distances. See |
pw |
a number for the power of Minkowski distance calculation. The default is 2 if the |
alginitv |
a string for the initialization of cluster prototypes matrix. The default is kmpp for K-means++ initialization method (Arthur & Vassilvitskii, 2007). For the list of alternative options see |
alginitu |
a string for the initialization of memberships degrees matrix. The default is imembrand for random sampling of initial membership degrees. |
nstart |
an integer for the number of starts for clustering. The default is 1. |
iter.max |
an integer for the maximum number of iterations allowed. The default is 1000. |
con.val |
a number for the convergence value between the iterations. The default is 1e-09. |
fixcent |
a logical flag to fix the initial cluster centers. The default is |
fixmemb |
a logical flag to fix the initial membership degrees. The default is |
stand |
a logical flag to standardize data. Its default value is |
numseed |
a seeding number to set the seed of R's random number generator. |
Unlike the Possibilistic C-Means (PCM) algorithm, which requires the results of a previous run of Fuzzy C-Means (FCM) clustering in order to calculate the parameter Ω, the Possibilistic Clustering Algorithm (PCA) is based on the FCM objective function and the partition coefficient (PC) and partition entropy (PE) validity indexes. Thus, PCA directly computes the typicality values and does not need to run FCM beforehand to compute this parameter. The resulting membership becomes an exponential function, and hence it is reported to be robust to noise and outliers (Yang & Wu, 2006). However, Wu et al (2010) reported that PCA is very sensitive to initializations and sometimes generates coincident clusters.
The objective function of PCA is based on the objective function of FCM, augmented with a penalty term that is derived from the partition coefficient and partition entropy indexes and weighted by a normalization term computed from the data. The update equation for the typicality degrees is an exponential function of the distances of the objects to the cluster prototypes, and the update equation for the cluster prototypes is a weighted mean of the data objects in which the weights are the powered typicality degrees.
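As a side note, a minimal R sketch of a data-dependent normalization term of the kind used by PCA is given below. It assumes that the normalization term is the mean squared distance of the objects to the sample mean; this is an assumption for illustration (see Yang & Wu, 2006 for the exact definition), and the helper name beta.norm is hypothetical:

# Sketch only: mean squared Euclidean distance of the objects to the sample mean
beta.norm <- function(x) {
  x <- as.matrix(x)
  xbar <- colMeans(x)                        # sample mean vector
  mean(rowSums(sweep(x, 2, xbar)^2))
}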
an object of class ‘ppclust’, which is a list consisting of the following items:
v |
a numeric matrix containing the final cluster prototypes. |
u |
a numeric matrix containing the fuzzy membership degrees of the data objects. |
t |
a numeric matrix containing the typicality degrees of the data objects. |
d |
a numeric matrix containing the distances of objects to the final cluster prototypes. |
x |
a numeric matrix containing the processed data set. |
cluster |
a numeric vector containing the cluster labels found by defuzzifying the typicality degrees of the objects. |
csize |
a numeric vector containing the number of objects in the clusters. |
k |
an integer for the number of clusters. |
m |
a number for the fuzziness exponent. |
eta |
a number for the typicality exponent. |
omega |
a numeric vector of reference distances. |
iter |
an integer vector for the number of iterations in each start of the algorithm. |
best.start |
an integer for the index of start that produced the minimum objective functional. |
func.val |
a numeric vector for the objective function values in each start of the algorithm. |
comp.time |
a numeric vector for the execution time in each start of the algorithm. |
stand |
a logical value, |
wss |
a number for the within-cluster sum of squares for each cluster. |
bwss |
a number for the between-cluster sum of squares. |
tss |
a number for the total within-cluster sum of squares. |
twss |
a number for the total sum of squares. |
algorithm |
a string for the name of the partitioning algorithm. It is ‘PCA’ with this function. |
call |
a string for the matched function call generating this ‘ppclust’ object. |
Zeynel Cebeci, Alper Tuna Kavlak
Arthur, D. & Vassilvitskii, S. (2007). K-means++: The advantages of careful seeding, in Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, p. 1027-1035. <http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf>
Yang, M. S. & Wu, K. L. (2006). Unsupervised possibilistic clustering. Pattern Recognition, 39(1): 5-21. <doi:10.1016/j.patcog.2005.07.005>
ekm, fcm, fcm2, fpcm, fpppcm, gg, gk, gkpfcm, hcm, pcm, pcmr, pfcm, upfc
# Load dataset X16
data(x16)
x <- x16[,-3]
# Initialize the prototype matrix using K-means++
v <- inaparc::kmpp(x, k=2)$v
# Initialize the membership degrees matrix
u <- inaparc::imembrand(nrow(x), k=2)$u
# Run PCA
pca.res <- pca(x, centers=v, memberships=u, m=2, eta=2)
# Display the fuzzy membership degrees
print(round(pca.res$u,2))
# Display the typicality degrees
print(round(pca.res$t,2))
Partitions a numeric multidimensional data set by using the Possibilistic C-Means (PCM) clustering algorithm which has been proposed by Krishnapuram & Keller (1993, 1996).
pcm(x, centers, memberships, eta=2, K=1, omega, oftype = 1, dmetric="sqeuclidean", pw=2, fcmrun=TRUE, alginitv="kmpp", alginitu="imembrand", nstart=1, iter.max=1000, con.val=1e-09, fixcent=FALSE, fixmemb=FALSE, stand=FALSE, numseed)
x |
a numeric vector, data frame or matrix. |
centers |
an integer specifying the number of clusters or a numeric matrix containing the initial cluster centers. |
memberships |
a numeric matrix containing the initial membership degrees. If missing, it is internally generated. |
eta |
a number greater than 1 to be used as the typicality exponent. The default is 2. |
K |
a number greater than 0 to be used as the weight of penalty term. The default is 1. |
omega |
a numeric vector of reference distances. If missing, it is internally generated. |
oftype |
an integer for the type of objective function. The default option is 1 for the PCM objective function in Krishnapuram & Keller (1993). Use 2 for PCM objective function in Krishnapuram & Keller (1996). |
dmetric |
a string for the distance metric. The default is sqeuclidean for the squared Euclidean distances. See |
pw |
a number for the power of Minkowski distance calculation. The default is 2 if the |
alginitv |
a string for the initialization of cluster prototypes matrix. The default is kmpp for K-means++ initialization method (Arthur & Vassilvitskii, 2007). For the list of alternative options see |
alginitu |
a string for the initialization of memberships degrees matrix. The default is imembrand for random sampling of initial membership degrees. |
fcmrun |
a logical value for using the results from a previous FCM run. The default is TRUE for starting the algorithm with the initial centers and memberships from an FCM run. Set the value to FALSE to skip using the results of an FCM run. |
nstart |
an integer for the number of starts for clustering. The default is 1. |
iter.max |
an integer for the maximum number of iterations allowed. The default is 1000. |
con.val |
a number for the convergence value between the iterations. The default is 1e-09. |
fixcent |
a logical value to fix the initial cluster centers. The default is |
fixmemb |
a logical value to fix the initial membership degrees. The default is |
stand |
a logical value to standardize data. The default value is |
numseed |
a seeding number to set the seed of R's random number generator. |
The Possibilistic C-Means (PCM) clustering algorithm was proposed by Krishnapuram and Keller (1993) to overcome the noise problem of FCM (Nasraoui, 1999). The algorithm differs from the other clustering algorithms in that the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values can be interpreted as degrees of possibility of the points belonging to the classes, i.e., the compatibilities of the points with the class prototypes (Krishnapuram & Keller, 1993). PCM relaxes the probabilistic constraint of FCM on the sum of the memberships of an object over all clusters. However, PCM must run on the fuzzy clustering results of FCM in order to calculate the parameter $\Omega$. Although it solves the noise sensitivity problem of FCM, the performance of PCM depends heavily on the initialization and often deteriorates due to the coincident clustering problem (Filippone et al, 2007).

The objective function of PCM is:

$$J_{PCM}(\mathbf{X}; \mathbf{V}, \mathbf{T}) = \sum_{i=1}^{k} \sum_{j=1}^{n} t_{ij}^{\eta}\, d^{2}(\vec{x}_j, \vec{v}_i) + \sum_{i=1}^{k} \Omega_i \sum_{j=1}^{n} (1 - t_{ij})^{\eta}$$

In the objective function above, the first term leads to a minimization of the weighted distances while the second term suppresses the trivial solution (Timm et al, 2004).

Krishnapuram and Keller later proposed an alternative objective function for PCM (Krishnapuram & Keller, 1996):

$$J_{PCM_2}(\mathbf{X}; \mathbf{V}, \mathbf{T}) = \sum_{i=1}^{k} \sum_{j=1}^{n} t_{ij}\, d^{2}(\vec{x}_j, \vec{v}_i) + \sum_{i=1}^{k} \Omega_i \sum_{j=1}^{n} (t_{ij} \log t_{ij} - t_{ij})$$

where the reference distances $\Omega_i$ are computed from a previous FCM run as:

$$\Omega_i = K \, \frac{\sum_{j=1}^{n} u_{ij}^{m}\, d^{2}(\vec{x}_j, \vec{v}_i)}{\sum_{j=1}^{n} u_{ij}^{m}}, \quad K > 0$$

Since the membership calculation in PCM is possibilistic, the membership degrees can be treated as typicality values which measure how typical a data object is for a given cluster, regardless of all other clusters. The update equation for the typicality degrees, derived from the objective function of PCM, is:

$$t_{ij} = \left[1 + \left(\frac{d^{2}(\vec{x}_j, \vec{v}_i)}{\Omega_i}\right)^{1/(\eta-1)}\right]^{-1}$$

The update equation for the cluster prototypes is the same as that of FCM, with the typicality degrees in place of the fuzzy membership degrees:

$$\vec{v}_i = \frac{\sum_{j=1}^{n} t_{ij}^{\eta}\, \vec{x}_j}{\sum_{j=1}^{n} t_{ij}^{\eta}}$$
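For illustration, the reference distances Omega could be computed from the results of a previous FCM run roughly as follows. This is a sketch under the definition given above with K = 1, not the internal code of pcm, and the helper name omega.fcm is hypothetical:

# Sketch only: weighted average within-cluster distances from an FCM partition
omega.fcm <- function(x, u, v, m = 2, K = 1) {
  x <- as.matrix(x)
  sapply(seq_len(nrow(v)), function(i) {
    d2 <- rowSums(sweep(x, 2, v[i, ])^2)     # squared Euclidean distances to v_i
    K * sum(u[, i]^m * d2) / sum(u[, i]^m)   # reference distance Omega_i
  })
}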
an object of class ‘ppclust’, which is a list consisting of the following items:
v |
a numeric matrix containing the final cluster prototypes. |
t |
a numeric matrix containing the typicality degrees of the data objects. |
d |
a numeric matrix containing the distances of objects to the final cluster prototypes. |
x |
a numeric matrix containing the processed data set. |
cluster |
a numeric vector containing the cluster labels found by defuzzifying the typicality degrees of the objects. |
csize |
a numeric vector containing the number of objects in the clusters. |
k |
an integer for the number of clusters. |
eta |
a number for the typicality exponent. |
omega |
a numeric vector of reference distances. |
iter |
an integer vector for the number of iterations in each start of the algorithm. |
best.start |
an integer for the index of start that produced the minimum objective functional. |
func.val |
a numeric vector for the objective function values in each start of the algorithm. |
comp.time |
a numeric vector for the execution time in each start of the algorithm. |
stand |
a logical value, |
wss |
a number for the within-cluster sum of squares for each cluster. |
bwss |
a number for the between-cluster sum of squares. |
tss |
a number for the total within-cluster sum of squares. |
twss |
a number for the total sum of squares. |
algorithm |
a string for the name of partitioning algorithm. It is ‘PCM’ with this function. |
call |
a string for the matched function call generating this ‘ppclust’ object. |
PCM usually leads to acceptable results, although it suffers from stability problems if it is not initialized with the corresponding probabilistic algorithm. Therefore, it needs fuzzy partitioning results of a probabilistic algorithm, i.e. FCM, in order to compute some input arguments such as the cluster prototypes and mobilization scale parameter (Timm et al, 2004).
Zeynel Cebeci, Alper Tuna Kavlak
Arthur, D. & Vassilvitskii, S. (2007). K-means++: The advantages of careful seeding, in Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, p. 1027-1035. <http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf>
Krishnapuram, R. & Keller, J.M. (1993). A possibilistic approach to clustering. IEEE Transactions on Fuzzy Systems, 1(2):98-110. <doi:10.1109/91.227387>
Krishnapuram, R. & Keller, J.M. (1996). The possibilistic c-means algorithm: insights and recommendations. IEEE Transactions on Fuzzy Systems, 4(3):385-393. <doi:10.1109/91.531779>
Timm, H., Borgelt, C., Doring, C. & Kruse, R. (2004). An extension to possibilistic fuzzy cluster analysis. Fuzzy Sets and Systems, 147 (1): 3-16. <doi:10.1016/j.fss.2003.11.009>
Filippone, M., Masulli, F., & Rovetta, S. (2007). Possibilistic clustering in feature space. In Int. Workshop on Fuzzy Logic and Applications, pp. 219-226. Springer, Berlin, Heidelberg. <doi:10.1007/978-3-540-73400-0_27>
ekm, fcm, fcm2, fpcm, fpppcm, gg, gk, gkpfcm, hcm, pca, pcmr, pfcm, upfc
# Load the dataset X12
data(x12)
# Initialize the prototype matrix using K-means++
v <- inaparc::kmpp(x12, k=2)$v
# Initialize the membership degrees matrix
u <- inaparc::imembrand(nrow(x12), k=2)$u
# Run FCM with the initial prototypes and memberships
fcm.res <- fcm(x12, centers=v, memberships=u, m=2)
# Run PCM with the prototypes and memberships from the FCM run
pcm.res <- pcm(x12, centers=fcm.res$v, memberships=fcm.res$u, eta=2)
# Display the typicality degrees
print(pcm.res$t)
# Plot the crisp memberships using the maximum typicality degrees
plotcluster(pcm.res, mt="t", trans=TRUE)
# Plot the crisp memberships using the typicality degrees > 0.5
plotcluster(pcm.res, mt="t", cm="threshold", tv=0.5)
Partitions a numeric data set by using the Possibilistic C-Means with Repulsion (PCMR) clustering algorithm which has been proposed by Wachs et al (2006).
pcmr(x, centers, memberships, eta=2, K=1, omega, gamma=15, dmetric="sqeuclidean", pw=2, alginitv="kmpp", alginitu="imembrand", nstart=1, iter.max=1000, con.val=1e-09, fixcent=FALSE, fixmemb=FALSE, stand=FALSE, numseed)
x |
a numeric vector, data frame or matrix. |
centers |
an integer specifying the number of clusters or a numeric matrix containing the initial cluster centers. |
memberships |
a numeric matrix containing the initial membership degrees. If missing, it is internally generated. |
eta |
a number greater than 1 to be used as the typicality exponent. The default is 2. |
K |
a number greater than 0 to be used as the weight of penalty term. The default is 1. |
omega |
a numeric vector of reference distances. If missing, it is internally generated. |
gamma |
a number for normalization. The gamma value can be in the range of 0.1 to 200, but generally 10 is used. In Shapira & Wachs (2004), gamma = 15 gave the best accuracy for PCMR. |
dmetric |
a string for the distance metric. The default is sqeuclidean for the squared Euclidean distances. See |
pw |
a number for the power of Minkowski distance calculation. The default is 2 if the |
alginitv |
a string for the initialization of cluster prototypes matrix. The default is kmpp for K-means++ initialization method (Arthur & Vassilvitskii, 2007). For the list of alternative options see |
alginitu |
a string for the initialization of memberships degrees matrix. The default is imembrand for random sampling of initial membership degrees. |
nstart |
an integer for the number of starts for clustering. The default is 1. |
iter.max |
an integer for the maximum number of iterations allowed. The default is 1000. |
con.val |
a number for the convergence value between the iterations. The default is 1e-09. |
fixcent |
a logical flag to fix the initial cluster centers. The default is |
fixmemb |
a logical flag to fix the initial membership degrees. The default is |
stand |
a logical flag to standardize data. Its default value is |
numseed |
a seeding number to set the seed of R's random number generator. |
Possibilistic C-Means with Repulsion (PCMR) aims to minimize the intracluster distances while maximizing the intercluster distances, without implicitly using the constraints of FCM, but by adding a cluster repulsion term to the objective function of PCM (Wachs et al, 2006). The repulsion term is proportional to the sum of the inverse squared distances between each pair of distinct cluster prototypes:

$$J_{PCMR} = J_{PCM} + \gamma \sum_{i=1}^{k} \sum_{l \neq i} \frac{1}{d^{2}(\vec{v}_i, \vec{v}_l)}$$

where $\gamma$ is a weighting factor, and the typicality degrees $t_{ij}$ satisfy $t_{ij} \in [0,1]$.

The repulsion term is relevant if the clusters are close enough. When the distance increases, it becomes smaller until it is compensated by the attraction of the clusters. On the other hand, if the clusters are sufficiently spread out and the intercluster distance decreases (due to the first two terms), the attraction of the clusters can be compensated only by the repulsion term.

The update equation for the cluster prototypes balances the attraction of the data objects toward the prototype against the repulsion from the other cluster prototypes.
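Assuming the repulsion term takes the inverse squared-distance form sketched above (an assumption for illustration; see Wachs et al, 2006 for the exact formulation), it could be evaluated for a prototype matrix v as follows; the helper name repulsion.term is hypothetical:

# Sketch only: sum of inverse squared distances between distinct prototype pairs
repulsion.term <- function(v) {
  v <- as.matrix(v)
  d2 <- as.matrix(dist(v))^2                 # pairwise squared distances
  sum(1 / d2[upper.tri(d2)])                 # each pair counted once
}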
an object of class ‘ppclust’, which is a list consisting of the following items:
v |
a numeric matrix containing the final cluster prototypes. |
t |
a numeric matrix containing the typicality degrees of the data objects. |
d |
a numeric matrix containing the distances of objects to the final cluster prototypes. |
x |
a numeric matrix containing the processed data set. |
cluster |
a numeric vector containing the cluster labels found by defuzzifying the typicality degrees of the objects. |
csize |
a numeric vector containing the number of objects in the clusters. |
k |
an integer for the number of clusters. |
eta |
a number for the typicality exponent. |
omega |
a numeric vector of reference distances. |
gamma |
a number for normalization. |
iter |
an integer vector for the number of iterations in each start of the algorithm. |
best.start |
an integer for the index of start that produced the minimum objective functional. |
func.val |
a numeric vector for the objective function values in each start of the algorithm. |
comp.time |
a numeric vector for the execution time in each start of the algorithm. |
stand |
a logical value, |
wss |
a number for the within-cluster sum of squares for each cluster. |
bwss |
a number for the between-cluster sum of squares. |
tss |
a number for the total within-cluster sum of squares. |
twss |
a number for the total sum of squares. |
algorithm |
a string for the name of the partitioning algorithm. It is ‘PCMR’ with this function. |
call |
a string for the matched function call generating this ‘ppclust’ object. |
Zeynel Cebeci, A. Tuna Kavlak, Figen Yildiz
Arthur, D. & Vassilvitskii, S. (2007). K-means++: The advantages of careful seeding, in Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, p. 1027-1035. <http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf>
Wachs, J., Shapira, O. & Stern, H. (2006). A method to enhance the 'Possibilistic C-Means with Repulsion' algorithm based on cluster validity index. In Applied Soft Computing Technologies: The Challenge of Complexity, pp. 77-87. Springer, Berlin, Heidelberg. <doi:10.1007/3-540-31662-0_6>
ekm, fcm, fcm2, fpcm, fpppcm, gg, gk, gkpfcm, hcm, pca, pcm, pfcm, upfc
# Load data set X12
data(x12)
# Initialize the prototype matrix using K-means++
v <- inaparc::kmpp(x12, k=2)$v
# Initialize the membership degrees matrix
u <- inaparc::imembrand(nrow(x12), k=2)$u
# Run FCM with the initial prototypes and memberships
fcm.res <- fcm(x12, centers=v, memberships=u, m=2)
# Run PCMR with the prototypes and memberships from the FCM run
pcmr.res <- pcmr(x12, centers=fcm.res$v, memberships=fcm.res$u, eta=2)
# Show the typicality degrees for the top 5 objects
head(pcmr.res$t, 5)
# Plot the crisp memberships using the maximum typicality degrees
plotcluster(pcmr.res, mt="t", cm="max")
# Plot the crisp memberships using the typicality degrees > 0.5
plotcluster(pcmr.res, mt="t", cm="threshold", tv=0.5)
Partitions a numeric data set by using the Possibilistic Fuzzy C-Means (PFCM) clustering algorithm proposed by Pal et al (2005).
pfcm(x, centers, memberships, m=2, eta=2, K=1, omega, a, b, dmetric="sqeuclidean", pw=2, alginitv="kmpp", alginitu="imembrand", nstart=1, iter.max=1000, con.val=1e-09, fixcent=FALSE, fixmemb=FALSE, stand=FALSE, numseed)
x |
a numeric vector, data frame or matrix. |
centers |
an integer specifying the number of clusters or a numeric matrix containing the initial cluster centers. |
memberships |
a numeric matrix containing the initial membership degrees. If missing, it is internally generated. |
m |
a number greater than 1 to be used as the fuzziness exponent. The default is 2. |
eta |
a number greater than 1 to be used as the typicality exponent. The default is 2. |
a |
a number for the relative importance of the fuzzy part of the objective function. The default is 1. |
b |
a number for the relative importance of the possibilistic part of the objective function. The default is 1. |
K |
a number greater than 0 to be used as the weight of penalty term. The default is 1. |
omega |
a numeric vector of reference distances. If missing, it is internally generated. |
dmetric |
a string for the distance metric. The default is sqeuclidean for the squared Euclidean distances. See |
pw |
a number for the power of Minkowski distance calculation. The default is 2 if the |
alginitv |
a string for the initialization of cluster prototypes matrix. The default is kmpp for K-means++ initialization method (Arthur & Vassilvitskii, 2007). For the list of alternative options see |
alginitu |
a string for the initialization of memberships degrees matrix. The default is imembrand for random sampling of initial membership degrees. |
nstart |
an integer for the number of starts for clustering. The default is 1. |
iter.max |
an integer for the maximum number of iterations allowed. The default is 1000. |
con.val |
a number for the convergence value between the iterations. The default is 1e-09. |
fixcent |
a logical flag to fix the initial cluster centers. The default is |
fixmemb |
a logical flag to fix the initial membership degrees. The default is |
stand |
a logical flag to standardize data. Its default value is |
numseed |
a seeding number to set the seed of R's random number generator. |
In FPCM, the constraint that the typicality values of all data objects for a cluster must sum to one causes problems, particularly for big data sets. In order to avoid this problem, Pal et al (2005) proposed the Possibilistic Fuzzy C-Means (PFCM) clustering algorithm with the following objective function:

$$J_{PFCM}(\mathbf{X}; \mathbf{V}, \mathbf{U}, \mathbf{T}) = \sum_{i=1}^{k} \sum_{j=1}^{n} (a\, u_{ij}^{m} + b\, t_{ij}^{\eta})\, d^{2}(\vec{x}_j, \vec{v}_i) + \sum_{i=1}^{k} \Omega_i \sum_{j=1}^{n} (1 - t_{ij})^{\eta}$$

The fuzzy membership degrees in the probabilistic part of the objective function are calculated in the same way as in FCM:

$$u_{ij} = \left[\sum_{l=1}^{k} \left(\frac{d^{2}(\vec{x}_j, \vec{v}_i)}{d^{2}(\vec{x}_j, \vec{v}_l)}\right)^{1/(m-1)}\right]^{-1}$$

The typicality degrees in the possibilistic part of the objective function are calculated as follows:

$$t_{ij} = \left[1 + \left(\frac{b\, d^{2}(\vec{x}_j, \vec{v}_i)}{\Omega_i}\right)^{1/(\eta-1)}\right]^{-1}$$

The constraints with PFCM are:

$$\sum_{i=1}^{k} u_{ij} = 1, \; \forall j; \qquad 0 \leq u_{ij}, t_{ij} \leq 1$$

$a$ and $b$ are the coefficients to define the relative importance of the fuzzy membership and typicality degrees for weighting the probabilistic and possibilistic terms of the objective function; $a > 0$, $b > 0$.

$m$ is the fuzzifier to specify the amount of fuzziness for the clustering; $1 \leq m \leq \infty$. It is usually chosen as 2.

$\eta$ is the typicality exponent to specify the amount of typicality for the clustering; $1 \leq \eta \leq \infty$. It is usually chosen as 2.

$\Omega_i$ are the possibilistic penalty terms for controlling the variance of the clusters.

The update equation for the cluster prototypes is:

$$\vec{v}_i = \frac{\sum_{j=1}^{n} (a\, u_{ij}^{m} + b\, t_{ij}^{\eta})\, \vec{x}_j}{\sum_{j=1}^{n} (a\, u_{ij}^{m} + b\, t_{ij}^{\eta})}$$
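A minimal sketch of the typicality update given above, for a vector of squared distances d2 of the objects to one cluster prototype and its penalty term omega_i, might be written as follows. This is an illustration with a hypothetical helper name, not the internal code of pfcm:

# Sketch only: PFCM typicality degrees for one cluster
pfcm.typicality <- function(d2, omega_i, b = 1, eta = 2) {
  1 / (1 + (b * d2 / omega_i)^(1/(eta - 1)))
}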
an object of class ‘ppclust’, which is a list consisting of the following items:
v |
a numeric matrix containing the final cluster prototypes. |
t |
a numeric matrix containing the typicality degrees of the data objects. |
d |
a numeric matrix containing the distances of objects to the final cluster prototypes. |
x |
a numeric matrix containing the processed data set. |
cluster |
a numeric vector containing the cluster labels found by defuzzifying the typicality degrees of the objects. |
csize |
a numeric vector for the number of objects in the clusters. |
k |
an integer for the number of clusters. |
m |
a number for the used fuzziness exponent. |
eta |
a number for the used typicality exponent. |
a |
a number for the fuzzy part of the objective function. |
b |
a number for the possibilistic part of the objective function. |
omega |
a numeric vector of reference distances. |
iter |
an integer vector for the number of iterations in each start of the algorithm. |
best.start |
an integer for the index of start that produced the minimum objective functional. |
func.val |
a numeric vector for the objective function values in each start of the algorithm. |
comp.time |
a numeric vector for the execution time in each start of the algorithm. |
stand |
a logical value, |
wss |
a number for the within-cluster sum of squares for each cluster. |
bwss |
a number for the between-cluster sum of squares. |
tss |
a number for the total within-cluster sum of squares. |
twss |
a number for the total sum of squares. |
algorithm |
a string for the name of the partitioning algorithm. It is ‘PFCM’ with this function. |
call |
a string for the matched function call generating this ‘ppclust’ object. |
Zeynel Cebeci, Alper Tuna Kavlak & Figen Yildiz
Arthur, D. & Vassilvitskii, S. (2007). K-means++: The advantages of careful seeding, in Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, p. 1027-1035. <http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf>
Pal, N. R., Pal, K. & Bezdek, J. C. (2005). A possibilistic fuzzy c-means clustering algorithm. IEEE Trans. Fuzzy Systems, 13 (4): 517-530. <doi:10.1109/TFUZZ.2004.840099>
ekm, fcm, fcm2, fpcm, fpppcm, gg, gk, gkpfcm, hcm, pca, pcm, pcmr, upfc
# Load the dataset X12
data(x12)
# Set the initial centers of clusters
v0 <- matrix(nrow=2, ncol=2, c(-3.34, 1.67, 1.67, 0.00), byrow=FALSE)
# Run FCM with the initial centers in v0
res.fcm <- fcm(x12, centers=v0, m=2)
# Run PFCM with the final centers and memberships from FCM
res.pfcm <- pfcm(x12, centers=res.fcm$v, memberships=res.fcm$u, m=2, eta=2)
# Show the typicality and fuzzy membership degrees from PFCM
res.pfcm$t
res.pfcm$u
Plots clustering results from a cluster analysis with ‘ppclust’.
plotcluster(objx, mt, cm, tv, cp=1, pt=19, trans=FALSE)
objx |
an object of ‘ppclust’ class. |
mt |
a character to specify the membership type. The default is u for fuzzy membership degrees. The alternative option is t for typicality degrees. The default is t for the algorithms which produce both types of membership matrices. |
cm |
a character to specify the crisping method. The default is max for using the maximum membership degree of each data object. The alternative option is threshold for the membership degrees exceeding a user-specified threshold value given with the argument tv. |
tv |
a number specifying the threshold membership degree when the value of |
cp |
an integer for the index of available color palettes. The default is 1. The options are 2, 3, 4 and 5 for different color themes. |
pt |
an integer specifying the ASCII code of a point character to be used in plotting. The default is 19 for solid circles. Use # for displaying the cluster with their cluster labels. |
trans |
a logical value for the type of plots. The default is FALSE for solid point colors. The alternative option is TRUE for transparent point colors. |
Zeynel Cebeci
# Run FCM for 3 clusters on the data set Iris
res.fcm <- fcm(x=iris[,-5], centers=3)
par(ask=TRUE)
# Plot the clustering results with solid colors
plotcluster(res.fcm, cp=1)
# Plot the same clustering results with transparent colors
plotcluster(res.fcm, cp=1, trans=TRUE)
# Plot the same clustering results for the memberships > 0.75
plotcluster(res.fcm, cp=1, cm="threshold", tv=0.75, trans=TRUE)
par(ask=FALSE)
Converts an object of ‘ppclust’ class to the other types of cluster objects.
ppclust2(objx, otype, ...)
objx |
an object of ppclust class. |
otype |
target object class type for conversion. |
... |
additional arguments. |
an object of fanny.object, summary.fclust, kmeans or vegclust class.
Zeynel Cebeci
as.ppclust, is.ppclust, summary.ppclust
data(iris)
# Create an object of ppclust class
opc <- fcm(x=iris[,1:4], centers=3)
# Check the class of the opc object
is.ppclust(opc)
# Convert the ppclust object 'opc' to a fanny object
ofc <- ppclust2(opc, otype="fanny")
# Check whether 'ofc' is still a ppclust object
is.ppclust(ofc)
# Check the class of 'ofc'
class(ofc)
# Convert the ppclust object 'opc' to an fclust object
ofc <- ppclust2(opc, otype="fclust")
# Check the class of 'ofc'
class(ofc)
Summarizes the clustering results for an object which is an instance of ‘ppclust’ class.
## S3 method for class 'ppclust'
summary(object, ...)
object |
an object of ppclust class. |
... |
additional arguments for S3 method summary. |
summary of the clustering results from the object of ppclust class.
Zeynel Cebeci
as.ppclust, is.ppclust, ppclust2
data(iris)
# Run FCM for three clusters
res.fcm <- fcm(x=iris[,1:4], centers=3)
# Summarize the result
summary(res.fcm)
Partitions a numeric data set by using the Unsupervised Possibilistic Fuzzy C-Means clustering algorithm which has been proposed by Wu et al (2010).
upfc(x, centers, memberships, m=2, eta=2, a, b, dmetric="sqeuclidean", pw=2, alginitv="kmpp", alginitu="imembrand", nstart=1, iter.max=1000, con.val=1e-09, fixcent=FALSE, fixmemb=FALSE, stand=FALSE, numseed)
x | a numeric vector, data frame or matrix. |
centers | an integer specifying the number of clusters or a numeric matrix containing the initial cluster centers. |
memberships | a numeric matrix containing the initial membership degrees. If missing, it is internally generated. |
m | a number greater than 1 to be used as the fuzziness exponent. The default is 2. |
eta | a number greater than 1 to be used as the typicality exponent. The default is 2. |
a | a number for the relative importance of the fuzzy part of the objective function. The default is 1. |
b | a number for the relative importance of the possibilistic part of the objective function. The default is 1. |
dmetric | a string for the distance metric. The default is sqeuclidean for the squared Euclidean distances. |
pw | a number for the power of Minkowski distance calculation. The default is 2 if the dmetric is minkowski. |
alginitv | a string for the initialization of the cluster prototypes matrix. The default is kmpp for the K-means++ initialization method (Arthur & Vassilvitskii, 2007). For the list of alternative options see the package ‘inaparc’. |
alginitu | a string for the initialization of the membership degrees matrix. The default is imembrand for random sampling of initial membership degrees. |
nstart | an integer for the number of starts for clustering. The default is 1. |
iter.max | an integer for the maximum number of iterations allowed. The default is 1000. |
con.val | a number for the convergence value between the iterations. The default is 1e-09. |
fixcent | a logical flag to fix the initial cluster centers. The default is FALSE. |
fixmemb | a logical flag to fix the initial membership degrees. The default is FALSE. |
stand | a logical flag to standardize the data. Its default value is FALSE. |
numseed | a seeding number to set the seed of R's random number generator. |
Unsupervised Possibilistic Fuzzy C-Means (UPFC) is an extension of the Possibilistic Clustering Algorithm (PCA) by Yang & Wu (2006). Wu et al (2010) reported that PCA is very sensitive to initializations and sometimes generates coincident clusters, and proposed UPFC to overcome this problem, inspired by the PFCM algorithm of Pal et al (2005). UPFC produces possibilistic and probabilistic memberships simultaneously, and thus overcomes the noise sensitivity problem of FCM and the coincident clusters problem of PCA.
The objective function of UPFC is:

J_{UPFC}(\mathbf{X}; \mathbf{V}, \mathbf{U}, \mathbf{T}) = \sum_{i=1}^n \sum_{j=1}^k (a \, u_{ij}^m + b \, t_{ij}^\eta) \; d^2(\vec{x}_i, \vec{v}_j) + \frac{\beta}{\eta^2 \sqrt{k}} \sum_{j=1}^k \sum_{i=1}^n (t_{ij}^\eta \log t_{ij}^\eta - t_{ij}^\eta)

Where the fuzzy membership degrees are updated with:

u_{ij} = \Bigg[ \sum_{l=1}^k \Bigg( \frac{d^2(\vec{x}_i, \vec{v}_j)}{d^2(\vec{x}_i, \vec{v}_l)} \Bigg)^{1/(m-1)} \Bigg]^{-1}

Where the typicality degrees are updated with:

t_{ij} = \exp\Bigg( -\frac{b \, \eta \, \sqrt{k}}{\beta} \; d^2(\vec{x}_i, \vec{v}_j) \Bigg)

with the normalization parameter

\beta = \frac{1}{n} \sum_{i=1}^n d^2(\vec{x}_i, \bar{x}), \qquad \bar{x} = \frac{1}{n} \sum_{i=1}^n \vec{x}_i.

The constraints with UPFC are:

\sum_{j=1}^k u_{ij} = 1 \;\; \forall i \quad \textrm{and} \quad 0 \le u_{ij}, \, t_{ij} \le 1,

a and b are the coefficients to define the relative importance of fuzzy membership and typicality degrees in the objective function, a > 0, b > 0.

The update equation for cluster prototypes:

\vec{v}_j = \frac{\sum_{i=1}^n (a \, u_{ij}^m + b \, t_{ij}^\eta) \, \vec{x}_i}{\sum_{i=1}^n (a \, u_{ij}^m + b \, t_{ij}^\eta)}
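To make these update steps concrete, here is a minimal didactic sketch of a single UPFC iteration written in plain R with squared Euclidean distances. It is not the internal implementation of upfc; the helper name upfc_step and the way the pieces are wired together are illustrative assumptions.

# One UPFC update step: a didactic sketch, not the package's internal code.
# x: numeric matrix (n x p), v: cluster prototypes (k x p),
# m, eta: fuzziness and typicality exponents, a, b: relative importance weights.
upfc_step <- function(x, v, m = 2, eta = 2, a = 1, b = 1) {
  x <- as.matrix(x)
  n <- nrow(x); k <- nrow(v)
  # Squared Euclidean distances of the objects to the prototypes (n x k)
  d2 <- sapply(1:k, function(j) rowSums(sweep(x, 2, v[j, ])^2))
  # Normalization parameter beta: mean squared distance to the data centroid
  xbar <- colMeans(x)
  beta <- sum(rowSums(sweep(x, 2, xbar)^2)) / n
  # Fuzzy membership degrees (FCM-type update; ties at a prototype are not handled)
  u <- t(apply(d2, 1, function(di) {
    w <- (1 / di)^(1 / (m - 1)); w / sum(w)
  }))
  # Typicality degrees (possibilistic update)
  tt <- exp(-(b * eta * sqrt(k) / beta) * d2)
  # Cluster prototype update with the combined weights
  wgt <- a * u^m + b * tt^eta
  v.new <- t(sapply(1:k, function(j) colSums(wgt[, j] * x) / sum(wgt[, j])))
  list(v = v.new, u = u, t = tt)
}

In the upfc function itself these updates are iterated until the change between successive iterations drops below con.val or iter.max is reached.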
an object of class ‘ppclust’, which is a list consisting of the following items:
v | a numeric matrix containing the final cluster prototypes. |
u | a numeric matrix containing the fuzzy membership degrees of the data objects. |
t | a numeric matrix containing the typicality degrees of the data objects. |
d | a numeric matrix containing the distances of objects to the final cluster prototypes. |
x | a numeric matrix containing the processed data set. |
cluster | a numeric vector containing the cluster labels found by defuzzifying the typicality degrees of the objects. |
csize | a numeric vector containing the number of objects in the clusters. |
k | an integer for the number of clusters. |
m | a number for the used fuzziness exponent. |
eta | a number for the used typicality exponent. |
a | a number for the fuzzy part of the objective function. |
b | a number for the possibilistic part of the objective function. |
beta | a numeric vector for the normalization parameter(s) used in the computation of the typicality degrees. |
iter | an integer vector for the number of iterations in each start of the algorithm. |
best.start | an integer for the index of the start that produced the minimum objective functional. |
func.val | a numeric vector for the objective function values in each start of the algorithm. |
comp.time | a numeric vector for the execution time in each start of the algorithm. |
stand | a logical value; TRUE indicates that the data set x has been standardized. |
wss | a numeric vector containing the within-cluster sum of squares for each cluster. |
bwss | a number for the between-cluster sum of squares. |
tss | a number for the total sum of squares. |
twss | a number for the total within-cluster sum of squares. |
algorithm | a string for the name of the partitioning algorithm. It is ‘UPFC’ with this function. |
call | a string for the matched function call generating this ‘ppclust’ object. |
Zeynel Cebeci, Figen Yildiz & Alper Tuna Kavlak
Arthur, D. & Vassilvitskii, S. (2007). K-means++: The advantages of careful seeding, in Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, p. 1027-1035. <http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf>
Pal, N. R., Pal, K. & Bezdek, J. C. (2005). A possibilistic fuzzy c-means clustering algorithm. IEEE Trans. Fuzzy Systems, 13 (4): 517-530. <doi:10.1109/TFUZZ.2004.840099>
Yang, M. S. & Wu, K. L. (2006). Unsupervised possibilistic clustering. Pattern Recognition, 39(1): 5-21. <doi:10.1016/j.patcog.2005.07.005>
Wu, X., Wu, B., Sun, J. & Fu, H. (2010). Unsupervised possibilistic fuzzy clustering. J. of Information & Computational Sci., 7 (5): 1075-1080. <https://www.researchgate.net/publication/267691137_Unsupervised_Possibilistic_Fuzzy_Clustering>
ekm, fcm, fcm2, fpcm, fpppcm, gg, gk, gkpfcm, hcm, pca, pcm, pcmr, pfcm
# Load the data set x16
data(x16)
x <- x16[,-3]
# Initialize the prototype matrix using K-means++
v <- inaparc::kmpp(x, k=2)$v
# Initialize the membership degrees matrix
u <- inaparc::imembrand(nrow(x), k=2)$u
# Run UPFC
res.upfc <- upfc(x, centers=v, memberships=u, eta=2)
# Display the fuzzy membership degrees
print(round(res.upfc$u, 2))
# Display the typicality degrees
print(round(res.upfc$t, 2))
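The remaining components of the returned ‘ppclust’ object described above can be inspected in the same way, continuing from the run in the example:

# Final cluster prototypes
res.upfc$v
# Crisp cluster labels obtained by defuzzifying the typicality degrees
res.upfc$cluster
# Objective function value and computing time for each start
res.upfc$func.val
res.upfc$comp.time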
A synthetic data set described by Pal et al (2005). It consists of two continuous variables forming two well-separated clusters in addition to two noise points.
data(x12)
A data frame with 12 rows and 2 numeric variables:
a numeric variable ranging from -5.0 to 5.0
a numeric variable ranging from -1.67 to 10.0
The data set x12 is recommended to test the performances of the possibilistic and probabilistic clustering algorithms.
Pal, N. R., Pal, K. & Bezdek, J. C. (2005). A possibilistic fuzzy c-means clustering algorithm. IEEE Trans. Fuzzy Systems, 13 (4): 517-530. <doi:10.1109/TFUZZ.2004.840099>
data(x12)
# Descriptive statistics of the data set
summary(x12)
# Plot the data set
pairs(x12, pch=19, cex=2)
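As suggested above, x12 can be used to compare how the probabilistic and possibilistic algorithms in the package react to the two noise points. A minimal sketch; the choice of two clusters and the default parameters are illustrative assumptions:

data(x12)
# Probabilistic partitioning with FCM
res.fcm <- fcm(x12, centers=2)
# Possibilistic partitioning with PCM
res.pcm <- pcm(x12, centers=2)
# Compare the cluster prototypes found by the two algorithms;
# the possibilistic prototypes are expected to be less affected by the noise points
res.fcm$v
res.pcm$v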
A synthetic data set described by Yang & Wu (2006). It consists of two continuous variables forming two well-separated clusters in addition to two noise points.
data(x16)
A data frame with 16 rows and 3 numeric variables:
a numeric variable ranging from 50 to 150
a numeric variable ranging from 145 to 200
a numeric class label (cl) ranging from 1 to 4
The data set x16 is recommended to test the performances of the possibilistic and noise clustering algorithms.
Yang, M. S. & Wu, K. L. (2006). Unsupervised possibilistic clustering. Pattern Recognition, 39(1): 5-21. <doi:10.1016/j.patcog.2005.07.005>
data(x16)
x <- x16[,-3]
# Descriptive statistics of the variables
summary(x)
# Scatter plots for the variable pairs
pairs(x, col=x16$cl, pch=20, cex=2)
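Since x16 was designed by Yang & Wu (2006) to expose the noise sensitivity of clustering algorithms, a natural follow-up is to run one of the possibilistic algorithms of the package on it. A minimal sketch, assuming pca returns the typicality degrees in the component t as the other possibilistic functions do, and using two clusters for illustration:

data(x16)
x <- x16[,-3]
# Run the Possibilistic Clustering Algorithm (Yang & Wu, 2006) for two clusters
res.pca <- pca(x, centers=2)
# Typicality degrees; the two noise points are expected to get low values
round(res.pca$t, 2)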