| Title: | Fuzzy Unsupervised and Semi-Supervised Clustering |
|---|---|
| Description: | Methods for distance-based fuzzy unsupervised and semi-supervised clustering, including fuzzy and possibilistic models based on alternating optimization (AO) algorithm. The package introduces a vectorized estimation framework for prototype-based fuzzy clustering algorithms, enabling modular algorithm design and extensibility. It also supports storage and retrieval of intermediate AO optimization results for downstream analysis and processing. For more details see Kmita et al. (2024) <doi:10.1109/TFUZZ.2024.3370768>. |
| Authors: | Kamil Kmita [aut, cre, cph] (ORCID: <https://orcid.org/0000-0001-8829-2420>) |
| Maintainer: | Kamil Kmita <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-06-02 18:57:53 UTC |
| Source: | https://github.com/cran/fussclust |
Calculates data evidence matrix E from distances matrix D.
calculate_evidence(D)
D |
Distances matrix of size N x c. |
Matrix of size N x c.
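Here is a minimal base R sketch of one plausible reading. It assumes E holds the standard FCM memberships with fuzzifier m = 2, i.e. e_ik = 1 / sum_j (d_ik / d_ij)^2. Under that assumption the whole matrix follows from D in a single vectorized expression. The helper `evidence_sketch` is hypothetical and not part of fussclust.

```r
# Hypothetical helper, not the fussclust implementation.
# Assumes E follows the FCM membership formula with fuzzifier m = 2:
#   e_ik = 1 / sum_j (d_ik / d_ij)^2
evidence_sketch <- function(D) {
  # sum_j (d_ik / d_ij)^2 = d_ik^2 * sum_j d_ij^(-2), so only row sums are needed;
  # the length-N vector rowSums(...) recycles down the columns of D^2
  1 / (D^2 * rowSums(1 / D^2))
}

D <- matrix(c(1, 2,
              3, 1), nrow = 2, byrow = TRUE)  # N = 2 observations, c = 2 clusters
E <- evidence_sketch(D)
print(E)
print(rowSums(E))  # each row sums to 1
```

A zero distance (an observation lying exactly on a prototype) makes this expression produce NaN, so a full implementation would need a guard for that case.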
Creates the DHE ("distances horizontally exploded") or DVE ("distances vertically exploded") matrix, depending on the vertical switch.
dheve(A, vertical)
A |
Matrix of size N x c. |
vertical |
Boolean switch.
If |
Matrix of size Nc x c.
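The sketch below illustrates what "exploding" an N x c matrix into Nc rows can look like. The row ordering r = (k - 1) * N + i for observation i and cluster k is an assumption, as is the exact content of each row. It is not taken from the package. DHE repeats a_ik across row r, and DVE repeats the whole i-th row of A. The helper name `explode_sketch` is hypothetical.

```r
# Hypothetical sketch of the "exploded" layouts; the row ordering
# r = (k - 1) * N + i for observation i and cluster k is an assumption.
explode_sketch <- function(A, vertical) {
  N <- nrow(A)
  n_clust <- ncol(A)
  if (vertical) {
    # DVE: row r repeats the whole i-th row of A, i.e. (a_i1, ..., a_ic)
    A[rep(seq_len(N), times = n_clust), , drop = FALSE]
  } else {
    # DHE: every entry of row r equals a_ik (as.vector() stacks A column by column)
    matrix(rep(as.vector(A), times = n_clust), ncol = n_clust)
  }
}

A <- matrix(1:6, nrow = 3)          # N = 3, c = 2
dhe <- explode_sketch(A, vertical = FALSE)
dve <- explode_sketch(A, vertical = TRUE)
print(dhe)  # 6 x 2
print(dve)  # 6 x 2
```

Under this ordering, row 4 corresponds to observation 1 and cluster 2. So that row of DHE is filled with A[1, 2] and that row of DVE equals A[1, ].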
Estimates the T matrix of typicalities in the semi-supervised case.
estimate_super_T(D, superF, alpha, gammas, b = 1)
D |
Distances matrix of size N x c. |
superF |
Binary supervision matrix of size N x c. |
alpha |
Scaling factor, a positive floating-point number regulating the impact of partial supervision. |
gammas |
A vector of length c with cluster-specific gamma hyperparameters. |
b |
A scalar weighting the contribution of possibilistic membership in the SPFCM (semi-supervised possibilistic fuzzy c-means) model. It is set to 1 by default for the other semi-supervised models. |
Estimates the T matrix of typicalities in the unsupervised case.
estimate_T(D, gammas)
D |
Distances matrix of size N x c. |
gammas |
A vector of length c with cluster-specific gamma hyperparameters. |
Estimates the U matrix of memberships in the semi-supervised case.
estimate_U(D, superF, alpha)
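For comparison, here is the possibilistic typicality of Krishnapuram and Keller (1993) with fuzzifier m = 2, which is t_ik = 1 / (1 + d_ik^2 / gamma_k). This is a sketch under the assumption that estimate_T uses m = 2 and squared distances. `typicality_sketch` is a hypothetical name. Unlike FCM memberships, the rows of T need not sum to 1.

```r
# Hypothetical helper: PCM typicality update with fuzzifier m = 2,
#   t_ik = 1 / (1 + d_ik^2 / gamma_k)
typicality_sketch <- function(D, gammas) {
  stopifnot(length(gammas) == ncol(D), all(gammas > 0))
  # sweep() divides column k of D^2 by gamma_k
  1 / (1 + sweep(D^2, 2, gammas, "/"))
}

D <- matrix(c(1, 2,
              0, 3), nrow = 2, byrow = TRUE)  # N = 2, c = 2
T_mat <- typicality_sketch(D, gammas = c(1, 4))
print(T_mat)
```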
D |
Distances matrix of size N x c. |
superF |
Binary supervision matrix of size N x c. |
alpha |
Scaling factor, a positive floating-point number regulating the impact of partial supervision. |
Equation to calculate the clusters' prototypes matrix V.
estimate_V(Phi, X)
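The sketch below shows the semi-supervised FCM membership update in the form proposed by Pedrycz and Waletzky (1997), with fuzzifier m = 2: u_ik = [(1 + alpha (1 - sum_j f_ij)) e_ik + alpha f_ik] / (1 + alpha), where e_ik are the unsupervised FCM memberships. It assumes that estimate_U follows this update. `ss_membership_sketch` is a hypothetical name.

```r
# Hypothetical helper: semi-supervised FCM membership update in the form
# proposed by Pedrycz and Waletzky (1997), assuming fuzzifier m = 2.
ss_membership_sketch <- function(D, superF, alpha) {
  stopifnot(all(dim(D) == dim(superF)), alpha > 0)
  # Unsupervised FCM evidence e_ik = 1 / sum_j (d_ik / d_ij)^2
  E <- 1 / (D^2 * rowSums(1 / D^2))
  # b_i = 1 for supervised rows, 0 for unsupervised rows (binary superF)
  b <- rowSums(superF)
  ((1 + alpha * (1 - b)) * E + alpha * superF) / (1 + alpha)
}

D <- matrix(c(1, 2,
              3, 1), nrow = 2, byrow = TRUE)
superF <- matrix(c(0, 1,
                   0, 0), nrow = 2, byrow = TRUE)  # row 1 labelled as cluster 2
U <- ss_membership_sketch(D, superF, alpha = 1)
print(U)
```

For the unsupervised row the update reduces to the plain FCM memberships. For the supervised row the memberships are pulled toward the label, and the pull grows with alpha.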
Phi |
Matrix of weights of size N x c. |
X |
Matrix with predictors of size N x p. |
Clusters' prototypes matrix of size c x p.
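A minimal sketch follows, assuming the usual prototype equation of prototype-based fuzzy clustering. Each prototype is the phi-weighted mean of the observations, v_k = sum_i phi_ik x_i / sum_i phi_ik. In FCM with fuzzifier m = 2 one would typically pass Phi = U^2. `prototypes_sketch` is a hypothetical name.

```r
# Hypothetical helper: prototypes as phi-weighted means of the observations,
#   v_k = sum_i phi_ik x_i / sum_i phi_ik
prototypes_sketch <- function(Phi, X) {
  stopifnot(nrow(Phi) == nrow(X))
  # crossprod(Phi, X) = t(Phi) %*% X is c x p; dividing by the length-c vector
  # colSums(Phi) rescales row k by 1 / sum_i phi_ik (column-major recycling)
  crossprod(Phi, X) / colSums(Phi)
}

X <- matrix(c(0, 0,
              2, 0,
              10, 10), ncol = 2, byrow = TRUE)   # N = 3, p = 2
Phi <- matrix(c(1, 0,
                1, 0,
                0, 1), ncol = 2, byrow = TRUE)   # c = 2, hard weights
V <- prototypes_sketch(Phi, X)
print(V)  # row 1: mean of the first two points; row 2: the third point
```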
Fits a Fuzzy C-Means (FCM) clustering model using the Alternating Optimization algorithm.
FCM(X, C, U = NULL, max_iter = 200, conv_criterion = 1e-04, function_dist = rdist::cdist, store_history = FALSE)
X |
A numeric feature matrix. |
C |
Integer specifying the number of clusters. |
U |
Optional initial membership matrix.
Primarily intended for reproducibility purposes.
If |
max_iter |
Maximum number of iterations.
Defaults to 200. |
conv_criterion |
Convergence threshold used at the end of each iteration of the Alternating Optimization algorithm. |
function_dist |
Optional distance function.
The function must accept two matrices. For the Euclidean distance, the returned distances should not be squared.
Defaults to rdist::cdist. |
store_history |
Logical indicating whether optimization
histories should be stored. If |
An object of class fcm containing:
A membership matrix of size N x c.
A matrix of cluster prototypes of size c x p.
The distance function used by the model.
Number of iterations performed until convergence.
If store_history = TRUE, a list of length
counter containing membership matrices estimated at each
iteration; otherwise NULL.
If store_history = TRUE, a list of length
counter containing prototype matrices estimated at each
iteration; otherwise NULL.
If store_history = TRUE, a list of length
counter containing phi-weight matrices estimated at each
iteration; otherwise NULL.
Bezdek, J. C. (1981). Pattern Recognition with Fuzzy Objective Function Algorithms. Springer US. https://doi.org/10.1007/978-1-4757-0450-1
X <- matrix(rnorm(100), ncol = 2)
model_fcm <- fussclust::FCM(X = X, C = 2)
print(model_fcm$V)
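The alternating optimization loop behind FCM can be sketched in a few lines of base R. The sketch assumes fuzzifier m = 2, Euclidean distances, random initial memberships, and a stopping rule on the largest change in U. The package's actual convergence check may differ. `fcm_sketch` is hypothetical and only mirrors the argument names of FCM.

```r
# Minimal FCM alternating-optimization sketch with fuzzifier m = 2.
# Not the fussclust implementation; names are hypothetical.
fcm_sketch <- function(X, C, max_iter = 200, conv_criterion = 1e-4) {
  N <- nrow(X)
  # Random initial memberships, rows normalized to sum to 1
  U <- matrix(runif(N * C), nrow = N)
  U <- U / rowSums(U)
  for (iter in seq_len(max_iter)) {
    Phi <- U^2                                    # phi-weights for m = 2
    V <- crossprod(Phi, X) / colSums(Phi)         # C x p prototypes
    # Euclidean (non-squared) distances between rows of X and rows of V
    D <- sqrt(pmax(outer(rowSums(X^2), rowSums(V^2), "+") - 2 * tcrossprod(X, V), 0))
    D <- pmax(D, .Machine$double.eps)             # guard against zero distances
    U_new <- 1 / (D^2 * rowSums(1 / D^2))         # FCM membership update
    converged <- max(abs(U_new - U)) < conv_criterion
    U <- U_new
    if (converged) break
  }
  list(U = U, V = V, counter = iter)
}

set.seed(1)
X <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2))
fit <- fcm_sketch(X, C = 2)
print(fit$V)
print(fit$counter)
```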
Aggregates elements of the DHE and DVE matrices as one step in building the evidence matrix E.
gamma_fcm(dhe, dve)
dhe |
DHE matrix of size Nc x c. |
dve |
DVE matrix of size Nc x c. |
Matrix of size Nc x 1.
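One aggregation consistent with these shapes is the FCM denominator g_(i,k) = sum_j (d_ik / d_ij)^2. It is computed row-wise on exploded matrices in which row r = (k - 1) * N + i of DHE repeats d_ik and the same row of DVE holds d_i1, ..., d_ic. Both the formula and the row layout are assumptions for illustration. `aggregate_sketch` is a hypothetical name.

```r
# Hypothetical sketch, assuming the aggregation is the FCM denominator
#   g_(i,k) = sum_j (d_ik / d_ij)^2
# computed row-wise on the exploded matrices (row r = (k - 1) * N + i).
aggregate_sketch <- function(dhe, dve) {
  stopifnot(all(dim(dhe) == dim(dve)))
  matrix(rowSums((dhe / dve)^2), ncol = 1)   # Nc x 1
}

D <- matrix(c(1, 2,
              3, 1), nrow = 2, byrow = TRUE)                 # N = 2, c = 2
n_clust <- ncol(D)
dhe <- matrix(rep(as.vector(D), times = n_clust), ncol = n_clust)
dve <- D[rep(seq_len(nrow(D)), times = n_clust), , drop = FALSE]
g <- aggregate_sketch(dhe, dve)
print(1 / g)  # reciprocals are FCM memberships, stacked cluster by cluster
```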
Initialization procedure that calculates the values of the gamma hyperparameters.
init_gamma(.model, .X)
.model |
estimated model of class |
.X |
Features matrix of size N x p. |
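The weighted-averaging initialization suggested by Krishnapuram and Keller (1993) sets gamma_k = K sum_i u_ik^m d_ik^2 / sum_i u_ik^m. The sketch below assumes K = 1 and m = 2, and it takes memberships U and distances D directly instead of a fitted model and X. It is not a claim about what init_gamma computes internally. `init_gamma_sketch` is hypothetical.

```r
# Hypothetical helper following Krishnapuram and Keller (1993) with K = 1
# and fuzzifier m = 2: gamma_k = sum_i u_ik^2 d_ik^2 / sum_i u_ik^2
init_gamma_sketch <- function(U, D) {
  stopifnot(all(dim(U) == dim(D)))
  W <- U^2
  colSums(W * D^2) / colSums(W)
}

U <- matrix(c(1, 0,
              1, 0,
              0, 1), ncol = 2, byrow = TRUE)
D <- matrix(c(1, 5,
              3, 5,
              5, 2), ncol = 2, byrow = TRUE)
gammas <- init_gamma_sketch(U, D)
print(gammas)  # one gamma per cluster
```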
Fits a Possibilistic C-Means (PCM) clustering model using the Alternating Optimization algorithm.
PCM(X, C, U = NULL, gammas = NULL, initFCM = NULL, max_iter = 200, conv_criterion = 1e-04, function_dist = rdist::cdist, store_history = FALSE)
X |
A numeric feature matrix. |
C |
Integer specifying the number of clusters. |
U |
Optional initial membership matrix.
Primarily intended for reproducibility purposes.
If |
gammas |
Optional vector of cluster-specific gamma hyperparameters.
If If |
initFCM |
Optional fitted Fuzzy C-Means model used to initialize
cluster-specific gamma hyperparameters via weighted averaging.
If |
max_iter |
Maximum number of iterations.
Defaults to 200. |
conv_criterion |
Convergence threshold used at the end of each iteration of the Alternating Optimization algorithm. |
function_dist |
Optional distance function.
The function must accept two matrices. For the Euclidean distance, the returned distances should not be squared.
Defaults to rdist::cdist. |
store_history |
Logical indicating whether optimization
histories should be stored. If |
An object of class pcm containing:
A membership matrix of size N x c.
A matrix of cluster prototypes of size c x p.
The distance function used by the model.
Number of iterations performed until convergence.
Vector of cluster-specific gamma hyperparameters.
If store_history = TRUE, a list of length
counter containing membership matrices estimated at each
iteration; otherwise NULL.
If store_history = TRUE, a list of length
counter containing prototype matrices estimated at each
iteration; otherwise NULL.
If store_history = TRUE, a list of length
counter containing phi-weight matrices estimated at each
iteration; otherwise NULL.
Krishnapuram, R., & Keller, J. (1993). A possibilistic approach to clustering. IEEE Transactions on Fuzzy Systems, 1(2), 98–110. https://doi.org/10.1109/91.227387
X <- matrix(rnorm(100), ncol = 2)
model_pcm <- fussclust::PCM(X = X, C = 2, initFCM = TRUE)
print(model_pcm$V)
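A PCM loop alternates the typicality update t_ik = 1 / (1 + d_ik^2 / gamma_k) with a prototype update weighted by t_ik^2. The sketch below assumes fuzzifier m = 2, fixed gammas, and initial prototypes that are already available, for example from a preceding FCM fit. `pcm_sketch` is hypothetical, and its stopping rule on the largest change in T is an assumption.

```r
# Minimal PCM alternating-optimization sketch with fuzzifier m = 2.
# Hypothetical; assumes gammas and initial prototypes V are already available.
pcm_sketch <- function(X, V, gammas, max_iter = 200, conv_criterion = 1e-4) {
  T_old <- NULL
  for (iter in seq_len(max_iter)) {
    # Euclidean distances between rows of X and rows of V (N x C)
    D <- sqrt(pmax(outer(rowSums(X^2), rowSums(V^2), "+") - 2 * tcrossprod(X, V), 0))
    T_new <- 1 / (1 + sweep(D^2, 2, gammas, "/"))   # typicality update
    Phi <- T_new^2                                   # phi-weights for m = 2
    V <- crossprod(Phi, X) / colSums(Phi)            # prototype update
    if (!is.null(T_old) && max(abs(T_new - T_old)) < conv_criterion) break
    T_old <- T_new
  }
  list(T = T_new, V = V, gammas = gammas, counter = iter)
}

set.seed(2)
X <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2))
V0 <- matrix(c(0, 0,
               5, 5), ncol = 2, byrow = TRUE)   # illustrative starting prototypes
fit <- pcm_sketch(X, V0, gammas = c(1, 1))
print(fit$V)
```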
ssfcm objects
Predicts cluster memberships for new observations using a fitted Semi-Supervised Fuzzy C-Means model.
## S3 method for class 'ssfcm'
predict(object, X, ...)
object |
An object of class |
X |
A numeric matrix of new observations with |
... |
Additional arguments. Currently ignored. |
A matrix of predicted cluster memberships, with one row per new observation and one column per cluster.
X <- matrix(rnorm(100), ncol = 2)
superF <- matrix(0, nrow = nrow(X), ncol = 2)
superF[1:10, 1] <- 1
superF[11:20, 2] <- 1
model_ssfcm <- SSFCM(X = X, C = 2, superF = superF, alpha = 1)
predict(model_ssfcm, matrix(rnorm(2), ncol = 2))
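New observations carry no supervision. One natural way to score them is therefore the unsupervised FCM formula, applied to their distances from the fitted prototypes. The sketch assumes this rule with m = 2 and Euclidean distances. It is not a description of predict.ssfcm. `predict_memberships_sketch` and the prototype values are illustrative.

```r
# Hypothetical sketch of predicting memberships for new observations from
# fitted prototypes V, assuming the unsupervised FCM formula (m = 2) applies
# because new observations carry no supervision.
predict_memberships_sketch <- function(V, X_new) {
  D <- sqrt(pmax(outer(rowSums(X_new^2), rowSums(V^2), "+") - 2 * tcrossprod(X_new, V), 0))
  D <- pmax(D, .Machine$double.eps)   # avoid division by zero at a prototype
  1 / (D^2 * rowSums(1 / D^2))
}

V <- matrix(c(0, 0,
              4, 0), ncol = 2, byrow = TRUE)   # illustrative prototypes
X_new <- matrix(c(1, 0), ncol = 2)             # one new observation
U_new <- predict_memberships_sketch(V, X_new)
print(U_new)  # closer prototype gets the larger membership
```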
sspcm objects
Predicts cluster memberships for new observations using a fitted Semi-Supervised Possibilistic C-Means model.
## S3 method for class 'sspcm'
predict(object, X, ...)
object |
An object of class |
X |
A numeric matrix of new observations with |
... |
Additional arguments. Currently ignored. |
A matrix of predicted cluster memberships, with one row per new observation and one column per cluster.
X <- matrix(rnorm(100), ncol = 2)
superF <- matrix(0, nrow = nrow(X), ncol = 2)
superF[1:10, 1] <- 1
superF[11:20, 2] <- 1
model_sspcm <- SSPCM(X = X, C = 2, superF = superF, initFCM = TRUE, alpha = 1)
predict(model_sspcm, matrix(rnorm(2), ncol = 2))
Fits a Semi-Supervised Fuzzy C-Means (SSFCM) clustering model using the Alternating Optimization algorithm.
SSFCM(X, C, U = NULL, max_iter = 200, conv_criterion = 1e-04, function_dist = rdist::cdist, store_history = FALSE, alpha = NULL, superF = NULL)
X |
A numeric feature matrix. |
C |
Integer specifying the number of clusters. |
U |
Optional initial membership matrix.
Primarily intended for reproducibility purposes.
If |
max_iter |
Maximum number of iterations.
Defaults to 200. |
conv_criterion |
Convergence threshold used at the end of each iteration of the Alternating Optimization algorithm. |
function_dist |
Optional distance function.
The function must accept two matrices. For the Euclidean distance, the returned distances should not be squared.
Defaults to rdist::cdist. |
store_history |
Logical indicating whether optimization
histories should be stored. If |
alpha |
Positive scaling factor regulating the impact of partial supervision. |
superF |
Binary supervision matrix of the same dimensions as |
An object of class ssfcm containing:
A membership matrix of size N x c.
A matrix of cluster prototypes of size c x p.
The distance function used by the model.
Number of iterations performed until convergence.
Value of the scaling factor alpha.
If store_history = TRUE, a list of length
counter containing membership matrices estimated at each
iteration; otherwise NULL.
If store_history = TRUE, a list of length
counter containing prototype matrices estimated at each
iteration; otherwise NULL.
If store_history = TRUE, a list of length
counter containing phi-weight matrices estimated at each
iteration; otherwise NULL.
Kmita, K., Kaczmarek-Majer, K., & Hryniewicz, O. (2024). Explainable Impact of Partial Supervision in Semi-Supervised Fuzzy Clustering. IEEE Transactions on Fuzzy Systems, 1–10. https://doi.org/10.1109/TFUZZ.2024.3370768
X <- matrix(rnorm(100), ncol = 2)
superF <- matrix(0, nrow = nrow(X), ncol = 2)
superF[1:10, 1] <- 1
superF[11:20, 2] <- 1
model_ssfcm <- SSFCM(X = X, C = 2, superF = superF, alpha = 1)
print(model_ssfcm$V)
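In the example above, superF is filled by hand. With real data, the binary supervision matrix is usually derived from a vector of partial labels. The helper below is a hypothetical convenience, not part of fussclust. It assumes labels are integers in 1..C and that NA marks an unsupervised observation, which then gets an all-zero row.

```r
# Hypothetical helper: build a binary N x C supervision matrix from a vector
# of partial labels, where NA marks an unsupervised observation.
labels_to_superF <- function(labels, C) {
  superF <- matrix(0, nrow = length(labels), ncol = C)
  labelled <- which(!is.na(labels))
  # Matrix indexing with (row, column) pairs sets one entry per labelled row
  superF[cbind(labelled, labels[labelled])] <- 1
  superF
}

labels <- c(1, NA, 2, NA, 1)
superF <- labels_to_superF(labels, C = 2)
print(superF)
```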
Fits a Semi-Supervised Possibilistic C-Means (SSPCM) clustering model using the Alternating Optimization algorithm.
SSPCM(X, C, U = NULL, gammas = NULL, initFCM = NULL, max_iter = 200, conv_criterion = 1e-04, function_dist = rdist::cdist, store_history = FALSE, alpha = NULL, superF = NULL)
X |
A numeric feature matrix. |
C |
Integer specifying the number of clusters. |
U |
Optional initial membership matrix.
Primarily intended for reproducibility purposes.
If |
gammas |
Optional vector of cluster-specific gamma hyperparameters.
If If |
initFCM |
Optional fitted Fuzzy C-Means model used to initialize
cluster-specific gamma hyperparameters via weighted averaging.
If |
max_iter |
Maximum number of iterations.
Defaults to 200. |
conv_criterion |
Convergence threshold used at the end of each iteration of the Alternating Optimization algorithm. |
function_dist |
Optional distance function.
The function must accept two matrices. For the Euclidean distance, the returned distances should not be squared.
Defaults to rdist::cdist. |
store_history |
Logical indicating whether optimization
histories should be stored. If |
alpha |
Positive scaling factor regulating the impact of partial supervision. |
superF |
Binary supervision matrix of the same dimensions as |
An object of class sspcm containing:
A typicality matrix of size N x c.
A matrix of cluster prototypes of size c x p.
The distance function used by the model.
Number of iterations performed until convergence.
Vector of cluster-specific gamma hyperparameters.
Value of the scaling factor alpha.
If store_history = TRUE, a list of length
counter containing membership matrices estimated at each
iteration; otherwise NULL.
If store_history = TRUE, a list of length
counter containing prototype matrices estimated at each
iteration; otherwise NULL.
If store_history = TRUE, a list of length
counter containing phi-weight matrices estimated at each
iteration; otherwise NULL.
Kmita, K., Kaczmarek-Majer, K., & Hryniewicz, O. (2024). Explainable Impact of Partial Supervision in Semi-Supervised Fuzzy Clustering. IEEE Transactions on Fuzzy Systems, 1–10. https://doi.org/10.1109/TFUZZ.2024.3370768
X <- matrix(rnorm(100), ncol = 2)
superF <- matrix(0, nrow = nrow(X), ncol = 2)
superF[1:10, 1] <- 1
superF[11:20, 2] <- 1
model_sspcm <- SSPCM(X = X, C = 2, superF = superF, alpha = 1)
print(model_sspcm$V)
This dataset provides a concrete supervision structure:
- 'superF': matrix of size 150 x 3 with partial supervision,
- 'ind': vector with the indices of unsupervised observations,
- 'tind': vector with the indices of observations selected for the test dataset,
- 'tclass': vector with the class memberships of the observations selected for the test dataset.
This supervision structure is meant to reproduce a particular realization of the phenomenon of underimpact of partial supervision specific to the iris dataset.
data(superFstruct_underimpact)
A list containing a matrix of size 150 x 3 and three vectors.
This dataset provides a concrete initialization of the membership matrix, specific to the iris data, that exhibits the phenomenon of underimpact of partial supervision in semi-supervised fuzzy clustering.
data(U_underimpact)
A matrix of size 150 x 3.
Rearranges elements of input matrix from a block matrix with vertical blocks (column vectors) to a block matrix with horizontal blocks (row vectors).
xi_fcm(A, c)
A |
Matrix of size Nc x 1. |
c |
Number of columns in the desired matrix, equal to the number of clusters. |
Matrix of size N x c.
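If the Nc x 1 input is made of c stacked column blocks of length N (block k holding cluster k), the rearrangement is a column-major reshape. That is exactly how R's `matrix()` fills a matrix. The block ordering is an assumption for this sketch. `unstack_sketch` is a hypothetical name.

```r
# Hypothetical sketch: reshape an Nc x 1 column made of c stacked column
# blocks of length N into an N x c matrix. R fills matrices column by column,
# so block k becomes column k.
unstack_sketch <- function(A, n_clust) {
  stopifnot(ncol(A) == 1, nrow(A) %% n_clust == 0)
  matrix(as.vector(A), ncol = n_clust)
}

A <- matrix(c(0.8, 0.1, 0.2, 0.9), ncol = 1)   # N = 2, c = 2, stacked by cluster
E <- unstack_sketch(A, n_clust = 2)
print(E)
```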