Title: | Simultaneous Clustering and Factorial Decomposition of Three-Way Datasets |
---|---|
Description: | Implements two iterative techniques, T3Clus and 3FKMeans, that simultaneously cluster the objects and perform a factorial dimensionality reduction of the variables and occasions of three-mode datasets, as developed by Vichi et al. (2007) <doi:10.1007/s00357-007-0006-x>. It also provides CT3Clus, a convex combination of these two simultaneous procedures governed by a hyperparameter alpha (alpha in [0,1], with 3FKMeans for alpha=0 and T3Clus for alpha=1), also developed by Vichi et al. (2007) <doi:10.1007/s00357-007-0006-x>. Furthermore, it implements the traditional tandem counterparts of T3Clus (TWCFTA) and 3FKMeans (TWFCTA): sequential clustering followed by factorial decomposition (TWCFTA), and vice versa (TWFCTA), as proposed by P. Arabie and L. Hubert (1996) <doi:10.1007/978-3-642-79999-0_1>. |
Authors: | Prosper Ablordeppey [aut, cre] , Adelaide Freitas [ctb] , Giorgia Zaccaria [ctb] |
Maintainer: | Prosper Ablordeppey <[email protected]> |
License: | GPL-3 |
Version: | 0.0.3 |
Built: | 2024-12-12 06:49:15 UTC |
Source: | CRAN |
Simultaneous results attributes
U_i_g0
matrix. Initial object membership function matrix
B_j_q0
matrix. Initial factor/component matrix for the variables
C_k_r0
matrix. Initial factor/component matrix for the occasions
U_i_g
matrix. Final/updated object membership function matrix
B_j_q
matrix. Final/updated factor/component matrix for the variables
C_k_r
matrix. Final/updated factor/component matrix for the occasions
Y_g_qr
matrix. Derived centroids in the reduced space (data matrix)
X_i_jk_scaled
matrix. Standardized dataset matrix
BestTimeElapsed
numeric. Execution time for the best iterate
BestLoop
numeric. Loop that obtained the best iterate
BestIteration
numeric. Iteration yielding the best results
Converged
numeric. Flag to check if algorithm converged for the K-means
nConverges
numeric. Number of loops that converged for the K-means
TSS_full
numeric. Total deviance in the full-space
BSS_full
numeric. Between deviance in the full-space
RSS_full
numeric. Residual deviance in the full-space
PF_full
numeric. PseudoF in the full-space
TSS_reduced
numeric. Total deviance in the reduced-space
BSS_reduced
numeric. Between deviance in the reduced-space
RSS_reduced
numeric. Residual deviance in the reduced-space
PF_reduced
numeric. PseudoF in the reduced-space
PF
numeric. Weighted PseudoF score
Labels
integer. Object cluster assignments
Fs
numeric. Objective function values for the KM best iterate
Enorm
numeric. Average L2 norm of the residuals.
Tandem results attributes
U_i_g0
matrix. Initial object membership function matrix.
B_j_q0
matrix. Initial factor/component matrix for the variables.
C_k_r0
matrix. Initial factor/component matrix for the occasions.
U_i_g
matrix. Final/updated object membership function matrix.
B_j_q
matrix. Final/updated factor/component matrix for the variables.
C_k_r
matrix. Final/updated factor/component matrix for the occasions.
Y_g_qr
matrix. Derived centroids in the reduced space (data matrix).
X_i_jk_scaled
matrix. Standardized dataset matrix.
BestTimeElapsed
numeric. Execution time for the best iterate.
BestLoop
numeric. Loop that obtained the best iterate.
BestKmIteration
numeric. Number of iterations until the best iterate for the K-means.
BestFaIteration
numeric. Number of iterations until the best iterate for the FA.
FaConverged
numeric. Flag indicating whether the Factor Decomposition converged.
KmConverged
numeric. Flag indicating whether the K-means converged.
nKmConverges
numeric. Number of loops that converged for the K-means.
nFaConverges
numeric. Number of loops that converged for the Factor decomposition.
TSS_full
numeric. Total deviance in the full-space.
BSS_full
numeric. Between deviance in the full-space.
RSS_full
numeric. Residual deviance in the full-space.
PF_full
numeric. PseudoF in the full-space.
TSS_reduced
numeric. Total deviance in the reduced-space.
BSS_reduced
numeric. Between deviance in the reduced-space.
RSS_reduced
numeric. Residual deviance in the reduced-space.
PF_reduced
numeric. PseudoF in the reduced-space.
PF
numeric. Actual PseudoF value to obtain best loop.
Labels
integer. Object cluster assignments.
FsKM
numeric. Objective function values for the KM best iterate.
FsFA
numeric. Objective function values for the FA best iterate.
Enorm
numeric. Average L2 norm of the residuals.
Implements simultaneous version of TWFCTA
fit.3fkmeans(model, X_i_jk, full_tensor_shape, reduced_tensor_shape)

## S4 method for signature 'simultaneous'
fit.3fkmeans(model, X_i_jk, full_tensor_shape, reduced_tensor_shape)
model |
Initialized simultaneous model. |
X_i_jk |
Matricized tensor along mode-1 (I objects). |
full_tensor_shape |
Dimensions of the tensor in full-space. |
reduced_tensor_shape |
Dimensions of tensor in the reduced-space. |
The procedure performs the two steps of the sequential TWFCTA model simultaneously. It finds B_j_q and C_k_r such that the within-clusters deviance of the component scores is minimized.
Output attributes accessible via the '@' operator.
U_i_g0 - Initial object membership function matrix
B_j_q0 - Initial factor/component matrix for the variables
C_k_r0 - Initial factor/component matrix for the occasions
U_i_g - Final/updated object membership function matrix
B_j_q - Final/updated factor/component matrix for the variables
C_k_r - Final/updated factor/component matrix for the occasions
Y_g_qr - Derived centroids in the reduced space (data matrix)
X_i_jk_scaled - Standardized dataset matrix
BestTimeElapsed - Execution time for the best iterate
BestLoop - Loop that obtained the best iterate
BestIteration - Iteration yielding the best results
Converged - Flag to check if algorithm converged for the K-means
nConverges - Number of loops that converged for the K-means
TSS_full - Total deviance in the full-space
BSS_full - Between deviance in the full-space
RSS_full - Residual deviance in the full-space
PF_full - PseudoF in the full-space
TSS_reduced - Total deviance in the reduced-space
BSS_reduced - Between deviance in the reduced-space
RSS_reduced - Residual deviance in the reduced-space
PF_reduced - PseudoF in the reduced-space
PF - Weighted PseudoF score
Labels - Object cluster assignments
Fs - Objective function values for the KM best iterate
Enorm - Average L2 norm of the residuals.
Tucker L (1966). “Some mathematical notes on three-mode factor analysis.” Psychometrika, 31(3), 279-311. doi:10.1007/BF02289464, https://ideas.repec.org/a/spr/psycho/v31y1966i3p279-311.html. Vichi M, Kiers HAL (2001). “Factorial k-means analysis for two-way data.” Computational Statistics and Data Analysis, 37(1), 49-64. https://EconPapers.repec.org/RePEc:eee:csdana:v:37:y:2001:i:1:p:49-64. Vichi M, Rocci R, Kiers H (2007). “Simultaneous Component and Clustering Models for Three-way Data: Within and Between Approaches.” Journal of Classification, 24, 71-98. doi:10.1007/s00357-007-0006-x.
X_i_jk = generate_dataset()$X_i_jk
model = simultaneous()
tfkmeans = fit.3fkmeans(model, X_i_jk, c(8,5,4), c(3,3,2))
Implements simultaneous T3Clus and 3FKMeans integrating an alpha value between 0 and 1 inclusive for a weighted result.
fit.ct3clus(
  model,
  X_i_jk,
  full_tensor_shape,
  reduced_tensor_shape,
  alpha = 0.5
)

## S4 method for signature 'simultaneous'
fit.ct3clus(
  model,
  X_i_jk,
  full_tensor_shape,
  reduced_tensor_shape,
  alpha = 0.5
)
model |
Initialized simultaneous model. |
X_i_jk |
Matricized tensor along mode-1 (I objects). |
full_tensor_shape |
Dimensions of the tensor in full space. |
reduced_tensor_shape |
Dimensions of tensor in the reduced space. |
alpha |
Hyperparameter with 0 <= alpha <= 1. The model is T3Clus when alpha=1 and 3FKMeans when alpha=0. |
Output attributes accessible via the '@' operator.
U_i_g0 - Initial object membership function matrix
B_j_q0 - Initial factor/component matrix for the variables
C_k_r0 - Initial factor/component matrix for the occasions
U_i_g - Final/updated object membership function matrix
B_j_q - Final/updated factor/component matrix for the variables
C_k_r - Final/updated factor/component matrix for the occasions
Y_g_qr - Derived centroids in the reduced space (data matrix)
X_i_jk_scaled - Standardized dataset matrix
BestTimeElapsed - Execution time for the best iterate
BestLoop - Loop that obtained the best iterate
BestIteration - Iteration yielding the best results
Converged - Flag to check if algorithm converged for the K-means
nConverges - Number of loops that converged for the K-means
TSS_full - Total deviance in the full-space
BSS_full - Between deviance in the full-space
RSS_full - Residual deviance in the full-space
PF_full - PseudoF in the full-space
TSS_reduced - Total deviance in the reduced-space
BSS_reduced - Between deviance in the reduced-space
RSS_reduced - Residual deviance in the reduced-space
PF_reduced - PseudoF in the reduced-space
PF - Weighted PseudoF score
Labels - Object cluster assignments
Fs - Objective function values for the KM best iterate
Enorm - Average L2 norm of the residuals.
Tucker L (1966). “Some mathematical notes on three-mode factor analysis.” Psychometrika, 31(3), 279-311. doi:10.1007/BF02289464, https://ideas.repec.org/a/spr/psycho/v31y1966i3p279-311.html. Rocci R, Vichi M (2005). “Three-Mode Component Analysis with Crisp or Fuzzy Partition of Units.” Psychometrika, 70, 715-736. doi:10.1007/s11336-001-0926-z. Vichi M, Kiers HAL (2001). “Factorial k-means analysis for two-way data.” Computational Statistics and Data Analysis, 37(1), 49-64. https://EconPapers.repec.org/RePEc:eee:csdana:v:37:y:2001:i:1:p:49-64. Vichi M, Rocci R, Kiers H (2007). “Simultaneous Component and Clustering Models for Three-way Data: Within and Between Approaches.” Journal of Classification, 24, 71-98. doi:10.1007/s00357-007-0006-x.
fit.t3clus
fit.3fkmeans
simultaneous
X_i_jk = generate_dataset()$X_i_jk
model = simultaneous()
ct3clus = fit.ct3clus(model, X_i_jk, c(8,5,4), c(3,3,2), alpha=0.5)
Implements simultaneous version of TWCFTA
fit.t3clus(model, X_i_jk, full_tensor_shape, reduced_tensor_shape)

## S4 method for signature 'simultaneous'
fit.t3clus(model, X_i_jk, full_tensor_shape, reduced_tensor_shape)
model |
Initialized simultaneous model. |
X_i_jk |
Matricized tensor along mode-1 (I objects). |
full_tensor_shape |
Dimensions of the tensor in full-space. |
reduced_tensor_shape |
Dimensions of tensor in the reduced-space. |
The procedure performs the two steps of the sequential TWCFTA model simultaneously. It finds B_j_q and C_k_r such that the between-clusters deviance of the component scores is maximized.
Output attributes accessible via the '@' operator.
U_i_g0 - Initial object membership function matrix
B_j_q0 - Initial factor/component matrix for the variables
C_k_r0 - Initial factor/component matrix for the occasions
U_i_g - Final/updated object membership function matrix
B_j_q - Final/updated factor/component matrix for the variables
C_k_r - Final/updated factor/component matrix for the occasions
Y_g_qr - Derived centroids in the reduced space (data matrix)
X_i_jk_scaled - Standardized dataset matrix
BestTimeElapsed - Execution time for the best iterate
BestLoop - Loop that obtained the best iterate
BestIteration - Iteration yielding the best results
Converged - Flag to check if algorithm converged for the K-means
nConverges - Number of loops that converged for the K-means
TSS_full - Total deviance in the full-space
BSS_full - Between deviance in the full-space
RSS_full - Residual deviance in the full-space
PF_full - PseudoF in the full-space
TSS_reduced - Total deviance in the reduced-space
BSS_reduced - Between deviance in the reduced-space
RSS_reduced - Residual deviance in the reduced-space
PF_reduced - PseudoF in the reduced-space
PF - Weighted PseudoF score
Labels - Object cluster assignments
Fs - Objective function values for the KM best iterate
Enorm - Average L2 norm of the residuals.
Tucker L (1966). “Some mathematical notes on three-mode factor analysis.” Psychometrika, 31(3), 279-311. doi:10.1007/BF02289464, https://ideas.repec.org/a/spr/psycho/v31y1966i3p279-311.html. Rocci R, Vichi M (2005). “Three-Mode Component Analysis with Crisp or Fuzzy Partition of Units.” Psychometrika, 70, 715-736. doi:10.1007/s11336-001-0926-z. Vichi M, Rocci R, Kiers H (2007). “Simultaneous Component and Clustering Models for Three-way Data: Within and Between Approaches.” Journal of Classification, 24, 71-98. doi:10.1007/s00357-007-0006-x.
X_i_jk = generate_dataset()$X_i_jk
model = simultaneous()
t3clus = fit.t3clus(model, X_i_jk, c(8,5,4), c(3,3,2))
Implements K-means clustering and afterwards factorial reduction in a sequential fashion.
fit.twcfta(model, X_i_jk, full_tensor_shape, reduced_tensor_shape)

## S4 method for signature 'tandem'
fit.twcfta(model, X_i_jk, full_tensor_shape, reduced_tensor_shape)
model |
Initialized tandem model. |
X_i_jk |
Matricized tensor along mode-1 (I objects). |
full_tensor_shape |
Dimensions of the tensor in full space. |
reduced_tensor_shape |
Dimensions of tensor in the reduced space. |
The procedure requires sequential clustering and factorial decomposition.
The K-means clustering algorithm is initially applied to the matricized tensor X_i_jk to obtain the centroids matrix X_g_jk and the membership matrix U_i_g.
The Tucker2 decomposition technique is then implemented on the centroids matrix X_g_jk to yield the core centroids matrix Y_g_qr and the component weights matrices B_j_q and C_k_r.
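The two steps above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: it uses `stats::kmeans` for the clustering step and a single truncated-SVD pass (rather than an iterative least-squares Tucker2 fit) for the factorial step, and it assumes the columns of the matricized tensor are ordered with the variable index running fastest.

```r
set.seed(42)
I <- 8; J <- 5; K <- 4   # full-space shape (objects, variables, occasions)
G <- 3; Q <- 3; R <- 2   # reduced-space shape (clusters, factors, factors)
X_i_jk <- matrix(rnorm(I * J * K), I, J * K)

# Step 1: K-means on the matricized tensor.
km <- kmeans(X_i_jk, centers = G, nstart = 10)
U_i_g <- diag(G)[km$cluster, ]   # I x G binary membership matrix
X_g_jk <- km$centers             # G x (J*K) centroids matrix

# Step 2: Tucker2-style reduction of the centroids (one SVD pass per mode).
X_g_j_k <- array(X_g_jk, dim = c(G, J, K))
X_j_kg <- matrix(aperm(X_g_j_k, c(2, 3, 1)), J, K * G)  # mode-2 unfolding
X_k_gj <- matrix(aperm(X_g_j_k, c(3, 1, 2)), K, G * J)  # mode-3 unfolding
B_j_q <- svd(X_j_kg)$u[, 1:Q, drop = FALSE]  # orthonormal variable weights
C_k_r <- svd(X_k_gj)$u[, 1:R, drop = FALSE]  # orthonormal occasion weights

# Core centroids in the reduced space.
Y_g_qr <- X_g_jk %*% kronecker(C_k_r, B_j_q)  # G x (Q*R)
```

Because the clustering is done before any reduction, B_j_q and C_k_r only describe the centroids that K-means has already fixed.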
Output attributes accessible via the '@' operator.
U_i_g0 - Initial object membership function matrix.
B_j_q0 - Initial factor/component matrix for the variables.
C_k_r0 - Initial factor/component matrix for the occasions.
U_i_g - Final/updated object membership function matrix.
B_j_q - Final/updated factor/component matrix for the variables.
C_k_r - Final/updated factor/component matrix for the occasions.
Y_g_qr - Derived centroids in the reduced space (data matrix).
X_i_jk_scaled - Standardized dataset matrix.
BestTimeElapsed - Execution time for the best iterate.
BestLoop - Loop that obtained the best iterate.
BestKmIteration - Number of iterations until the best iterate for the K-means.
BestFaIteration - Number of iterations until the best iterate for the FA.
FaConverged - Flag indicating whether the Factor Decomposition converged.
KmConverged - Flag indicating whether the K-means converged.
nKmConverges - Number of loops that converged for the K-means.
nFaConverges - Number of loops that converged for the Factor decomposition.
TSS_full - Total deviance in the full-space.
BSS_full - Between deviance in the full-space.
RSS_full - Residual deviance in the full-space.
PF_full - PseudoF in the full-space.
TSS_reduced - Total deviance in the reduced-space.
BSS_reduced - Between deviance in the reduced-space.
RSS_reduced - Residual deviance in the reduced-space.
PF_reduced - PseudoF in the reduced-space.
PF - Actual PseudoF value to obtain best loop.
Labels - Object cluster assignments.
FsKM - Objective function values for the KM best iterate.
FsFA - Objective function values for the FA best iterate.
Enorm - Average L2 norm of the residuals.
This procedure is useful to further interpret the between-clusters variability of the data and to understand which variables and/or occasions contribute most to discriminating the clusters. However, applying this technique could mask variables that are not informative of the clustering structure.
Since the Tucker2 model is applied after the clustering, it cannot help select the information in the dataset that is most relevant for the clustering.
Arabie P, Hubert L (1996). “Advances in Cluster Analysis Relevant to Marketing Research.” In Gaul W, Pfeifer D (eds.), From Data to Knowledge, 3–19. Tucker L (1966). “Some mathematical notes on three-mode factor analysis.” Psychometrika, 31(3), 279-311. doi:10.1007/BF02289464, https://ideas.repec.org/a/spr/psycho/v31y1966i3p279-311.html.
X_i_jk = generate_dataset()$X_i_jk
model = tandem()
twcfta = fit.twcfta(model, X_i_jk, c(8,5,4), c(3,3,2))
Implements factorial reduction and then K-means clustering in a sequential fashion.
fit.twfcta(model, X_i_jk, full_tensor_shape, reduced_tensor_shape)

## S4 method for signature 'tandem'
fit.twfcta(model, X_i_jk, full_tensor_shape, reduced_tensor_shape)
model |
Initialized tandem model. |
X_i_jk |
Matricized tensor along mode-1 (I objects). |
full_tensor_shape |
Dimensions of the tensor in full space. |
reduced_tensor_shape |
Dimensions of tensor in the reduced space. |
The procedure implements sequential factorial decomposition and clustering.
The technique performs Tucker2 decomposition on the X_i_jk matrix to obtain the matrix of component scores Y_i_qr with component weights matrices B_j_q and C_k_r.
The K-means clustering algorithm is then applied to the component scores matrix Y_i_qr to obtain the desired core centroids matrix Y_g_qr and its associated stochastic membership function matrix U_i_g.
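A minimal base-R sketch of this reduce-then-cluster order follows. It is an illustration under assumptions rather than the package's code: the Tucker2 step is approximated by one truncated SVD per mode, the clustering uses `stats::kmeans`, and the matricized tensor is assumed to have the variable index running fastest across its columns.

```r
set.seed(7)
I <- 8; J <- 5; K <- 4   # full-space shape
G <- 3; Q <- 3; R <- 2   # reduced-space shape
X_i_jk <- matrix(rnorm(I * J * K), I, J * K)
X_i_j_k <- array(X_i_jk, dim = c(I, J, K))

# Step 1: Tucker2-style reduction of the full data (one SVD pass per mode).
X_j_ki <- matrix(aperm(X_i_j_k, c(2, 3, 1)), J, K * I)  # mode-2 unfolding
X_k_ij <- matrix(aperm(X_i_j_k, c(3, 1, 2)), K, I * J)  # mode-3 unfolding
B_j_q <- svd(X_j_ki)$u[, 1:Q, drop = FALSE]
C_k_r <- svd(X_k_ij)$u[, 1:R, drop = FALSE]
Y_i_qr <- X_i_jk %*% kronecker(C_k_r, B_j_q)  # I x (Q*R) component scores

# Step 2: K-means on the component scores.
km <- kmeans(Y_i_qr, centers = G, nstart = 10)
U_i_g <- diag(G)[km$cluster, ]   # I x G binary membership matrix
Y_g_qr <- km$centers             # G x (Q*R) core centroids
```

Here the components are chosen to explain total variation before any cluster information is available, which is the reason the reduced dimensions may hide part of the clustering structure.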
Output attributes accessible via the '@' operator.
U_i_g0 - Initial object membership function matrix.
B_j_q0 - Initial factor/component matrix for the variables.
C_k_r0 - Initial factor/component matrix for the occasions.
U_i_g - Final/updated object membership function matrix.
B_j_q - Final/updated factor/component matrix for the variables.
C_k_r - Final/updated factor/component matrix for the occasions.
Y_g_qr - Derived centroids in the reduced space (data matrix).
X_i_jk_scaled - Standardized dataset matrix.
BestTimeElapsed - Execution time for the best iterate.
BestLoop - Loop that obtained the best iterate.
BestKmIteration - Number of iterations until the best iterate for the K-means.
BestFaIteration - Number of iterations until the best iterate for the FA.
FaConverged - Flag indicating whether the Factor Decomposition converged.
KmConverged - Flag indicating whether the K-means converged.
nKmConverges - Number of loops that converged for the K-means.
nFaConverges - Number of loops that converged for the Factor decomposition.
TSS_full - Total deviance in the full-space.
BSS_full - Between deviance in the full-space.
RSS_full - Residual deviance in the full-space.
PF_full - PseudoF in the full-space.
TSS_reduced - Total deviance in the reduced-space.
BSS_reduced - Between deviance in the reduced-space.
RSS_reduced - Residual deviance in the reduced-space.
PF_reduced - PseudoF in the reduced-space.
PF - Actual PseudoF value to obtain best loop.
Labels - Object cluster assignments.
FsKM - Objective function values for the KM best iterate.
FsFA - Objective function values for the FA best iterate.
Enorm - Average L2 norm of the residuals.
The technique helps interpret the within-clusters variability of the data. The Tucker2 decomposition tends to explain most of the total variation in the dataset. Hence, the variance of variables that do not contribute to the clustering structure in the dataset is also included.
The Tucker2 dimensions may still mask some essential clustering structures in the dataset.
Arabie P, Hubert L (1996). “Advances in Cluster Analysis Relevant to Marketing Research.” In Gaul W, Pfeifer D (eds.), From Data to Knowledge, 3–19. Tucker L (1966). “Some mathematical notes on three-mode factor analysis.” Psychometrika, 31(3), 279-311. doi:10.1007/BF02289464, https://ideas.repec.org/a/spr/psycho/v31y1966i3p279-311.html.
X_i_jk = generate_dataset()$X_i_jk
model = tandem()
twfCta = fit.twfcta(model, X_i_jk, c(8,5,4), c(3,3,2))
Folds a matricized tensor back into a three-mode tensor: X_i_jk => X_i_j_k, X_j_ki => X_i_j_k, X_k_ij => X_i_j_k.
fold(X, mode, shape)
X |
Data matrix to fold. |
mode |
Mode of operation. |
shape |
Dimension of original tensor. |
X_i_j_k: Three-mode tensor.
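To make the three folding directions concrete, here is a small base-R sketch. The helper `fold_sketch` is hypothetical and is not the package's `fold`; it assumes that in each unfolded matrix the first listed column index runs fastest (j in X_i_jk, k in X_j_ki, i in X_k_ij), which may differ from the package's column ordering.

```r
# Hypothetical helper: fold a mode-1, mode-2 or mode-3 unfolding back into
# an I x J x K array. 'shape' is c(I, J, K).
fold_sketch <- function(X, mode, shape) {
  I <- shape[1]; J <- shape[2]; K <- shape[3]
  switch(mode,
    array(X, dim = c(I, J, K)),                     # from X_i_jk
    aperm(array(X, dim = c(J, K, I)), c(3, 1, 2)),  # from X_j_ki
    aperm(array(X, dim = c(K, I, J)), c(2, 3, 1))   # from X_k_ij
  )
}

# Round trip on a small tensor: unfold along mode 2, then fold back.
tens <- array(seq_len(24), dim = c(2, 3, 4))
X_j_ki <- matrix(aperm(tens, c(2, 3, 1)), nrow = 3)
refolded <- fold_sketch(X_j_ki, mode = 2, shape = c(2, 3, 4))
```

The `shape` argument is needed because the matrix alone does not say how its columns split into the two remaining modes.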
X_i_jk = generate_dataset()$X_i_jk
X_i_j_k = fold(X_i_jk, mode=1, shape=c(I=8,J=5,K=4)) # X_i_j_k
Generate G clustered synthetic dataset of I objects measured on J variables for K occasions with additive noise.
generate_dataset(
  I = 8,
  J = 5,
  K = 4,
  G = 3,
  Q = 3,
  R = 2,
  centroids_spread = c(0, 1),
  noise_mean = 0,
  noise_stdev = 0.5,
  seed = NULL
)
I |
Number of objects. |
J |
Number of variables per occasion. |
K |
Number of occasions. |
G |
Number of clusters. |
Q |
Number of factors for the variables. |
R |
Number of factors for the occasions. |
centroids_spread |
Interval from which the centroids are drawn uniformly. |
noise_mean |
Mean of noise to generate. |
noise_stdev |
Noise effect level/spread/standard deviation. |
seed |
Seed for random sequence generation. |
Z_i_jk: Component scores in the full space.
E_i_jk: Generated noise at the given noise level.
X_i_jk: Dataset with noise level set to noise_stdev specified.
Y_g_qr: Centroids matrix in the reduced space.
U_i_g: Stochastic membership function matrix.
B_j_q: Variables component scores matrix.
C_k_r: Occasions component scores matrix.
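The returned pieces fit together roughly as in the sketch below. It assumes the generating model X_i_jk = U_i_g Y_g_qr (C_k_r ⊗ B_j_q)' + E_i_jk from the simultaneous component and clustering literature, with orthonormal B_j_q and C_k_r obtained from a QR decomposition; how the package actually draws each matrix is not stated in this documentation, so every construction here is an assumption.

```r
set.seed(0)
I <- 8; J <- 5; K <- 4; G <- 3; Q <- 3; R <- 2
centroids_spread <- c(0, 1); noise_mean <- 0; noise_stdev <- 0.5

# Binary membership matrix with every cluster non-empty (assumed).
U_i_g <- diag(G)[sample(rep_len(1:G, I)), ]
# Centroids drawn uniformly from the centroids_spread interval.
Y_g_qr <- matrix(runif(G * Q * R, centroids_spread[1], centroids_spread[2]),
                 G, Q * R)
# Orthonormal component matrices (assumed construction).
B_j_q <- qr.Q(qr(matrix(rnorm(J * Q), J, Q)))
C_k_r <- qr.Q(qr(matrix(rnorm(K * R), K, R)))

Z_i_jk <- U_i_g %*% Y_g_qr %*% t(kronecker(C_k_r, B_j_q))  # noise-free part
E_i_jk <- matrix(rnorm(I * J * K, noise_mean, noise_stdev), I, J * K)
X_i_jk <- Z_i_jk + E_i_jk                                  # observed data
```

Raising `noise_stdev` relative to the width of `centroids_spread` makes the clusters harder to recover.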
generate_dataset(seed=0)
Generates random binary stochastic membership function matrix for the I objects.
generate_rmfm(I, G, seed = NULL)
I |
Number of objects. |
G |
Number of groups/clusters. |
seed |
Seed for random number generation. |
U_i_g, binary stochastic membership matrix.
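For intuition, a binary membership matrix of this kind can be built in base R as sketched below. The helper `rmfm_sketch` is hypothetical; in particular, forcing every cluster to receive at least one object is a choice made for this sketch and is not confirmed as the package's behavior.

```r
# Hypothetical helper: random I x G binary membership matrix, one 1 per row.
rmfm_sketch <- function(I, G, seed = NULL) {
  stopifnot(I >= G)
  if (!is.null(seed)) set.seed(seed)
  # One label per cluster first, the rest at random, then shuffle.
  extra <- sample.int(G, I - G, replace = TRUE)
  labels <- c(seq_len(G), extra)[sample.int(I)]
  diag(G)[labels, , drop = FALSE]
}

U_i_g <- rmfm_sketch(I = 8, G = 3, seed = 1)
```

Each row sums to 1, so `max.col(U_i_g)` recovers the cluster label of each object.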
generate_rmfm(I=8,G=3)
Initializes centroids based on a given membership function matrix or randomly. Iterates once over the input data to update the membership function matrix, assigning objects to the closest centroids.
onekmeans(Y_i_qr, G, U_i_g = NULL, seed = NULL)
Y_i_qr |
Input data to group/cluster. |
G |
Number of clusters to find. |
U_i_g |
Initial membership matrix for the I objects. |
seed |
Seed for random values generation. |
Updated membership matrix U_i_g.
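A single assignment pass of this kind might look as follows in base R. The helper `onekmeans_sketch` is hypothetical and simplified: it picks random rows as centroids when no membership matrix is given, assumes every cluster in a supplied membership matrix is non-empty, and does not repair clusters that end up empty.

```r
# Hypothetical helper: one K-means assignment step.
onekmeans_sketch <- function(Y, G, U = NULL, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  if (is.null(U)) {
    centroids <- Y[sample(nrow(Y), G), , drop = FALSE]  # random objects
  } else {
    centroids <- solve(crossprod(U), crossprod(U, Y))   # cluster means
  }
  # Squared Euclidean distance from every object to every centroid.
  d2 <- outer(rowSums(Y^2), rowSums(centroids^2), "+") -
    2 * Y %*% t(centroids)
  # Assign each object to its closest centroid.
  diag(G)[max.col(-d2, ties.method = "first"), , drop = FALSE]
}

# Two well-separated groups with a deliberately imperfect initial partition.
Y <- rbind(matrix(0, 3, 2), matrix(10, 3, 2))
U0 <- diag(2)[c(1, 1, 2, 2, 2, 1), ]
U1 <- onekmeans_sketch(Y, G = 2, U = U0)
```

In this toy case one pass is enough to move the two misplaced objects to the group they are closest to.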
Oti EU, Olusola MO, Eze FC, Enogwe SU (2021). “Comprehensive Review of K-Means Clustering Algorithms.” International Journal of Advances in Scientific Research and Engineering (IJASRE), ISSN:2454-8006, DOI: 10.31695/IJASRE, 7(8), 64–69. doi:10.31695/IJASRE.2021.34050, https://ijasre.net/index.php/ijasre/article/view/1301.
X_i_jk = generate_dataset(seed=0)$X_i_jk
onekmeans(X_i_jk, G=5)
Computes the PseudoF score in the full space.
pseudof.full(bss, wss, full_tensor_shape, reduced_tensor_shape)
bss |
Sum of squared deviations between clusters. |
wss |
Sum of squared deviations within clusters. |
full_tensor_shape |
Dimensions of the tensor in the original space. |
reduced_tensor_shape |
Dimension of the tensor in the reduced space. |
PseudoF score
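As a point of reference, the classic Caliński-Harabasz pseudo-F for I objects and G clusters can be computed as below. This is the textbook index only: the package's function also receives the full and reduced tensor shapes and may use different degrees of freedom, so its result need not match. Reading I from the first entry of `full_tensor_shape` and G from the first entry of `reduced_tensor_shape` is an assumption made for this sketch.

```r
# Textbook pseudo-F: between-cluster mean square over within-cluster mean square.
pseudof_sketch <- function(bss, wss, I, G) {
  (bss / (G - 1)) / (wss / (I - G))
}

full_tensor_shape <- c(8, 5, 4)     # assumed to be c(I, J, K)
reduced_tensor_shape <- c(3, 3, 2)  # assumed to be c(G, Q, R)
pf <- pseudof_sketch(bss = 12, wss = 6,
                     I = full_tensor_shape[1],
                     G = reduced_tensor_shape[1])
```

Larger values indicate clusters that are better separated relative to their internal spread.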
Caliński T, Harabasz J (1974). “A dendrite method for cluster analysis.” Communications in Statistics, 3(1), 1-27. doi:10.1080/03610927408827101, https://www.tandfonline.com/doi/pdf/10.1080/03610927408827101, https://www.tandfonline.com/doi/abs/10.1080/03610927408827101. Rocci R, Vichi M (2005). “Three-Mode Component Analysis with Crisp or Fuzzy Partition of Units.” Psychometrika, 70, 715-736. doi:10.1007/s11336-001-0926-z.
pseudof.full(12,6,c(8,5,4),c(3,3,2))
Computes the PseudoF score in the reduced space.
pseudof.reduced(bss, wss, full_tensor_shape, reduced_tensor_shape)
bss |
Sum of squared deviations between clusters. |
wss |
Sum of squared deviations within clusters. |
full_tensor_shape |
Dimensions of the tensor in the original space. |
reduced_tensor_shape |
Dimension of the tensor in the reduced space. |
PseudoF score
Caliński T, Harabasz J (1974). “A dendrite method for cluster analysis.” Communications in Statistics, 3(1), 1-27. doi:10.1080/03610927408827101, https://www.tandfonline.com/doi/pdf/10.1080/03610927408827101, https://www.tandfonline.com/doi/abs/10.1080/03610927408827101.
pseudof.reduced(12,6,c(8,5,4),c(3,3,2))
Initialize model object required by the simultaneous methods.
simultaneous(
  seed = NULL,
  verbose = TRUE,
  init = "svd",
  n_max_iter = 10,
  n_loops = 10,
  tol = 1e-05,
  U_i_g = NULL,
  B_j_q = NULL,
  C_k_r = NULL
)
seed |
Seed for random sequence generation. |
verbose |
Flag to display output result for each loop. |
init |
The initialization method for the model parameters. One of 'svd', 'random', 'twcfta' or 'twfcta'. Defaults to 'svd'. |
n_max_iter |
Maximum number of iterations to optimize objective function. |
n_loops |
Number of runs/loops in search of the global result. |
tol |
Acceptable tolerance level. |
U_i_g |
Membership function matrix for the objects. |
B_j_q |
Component matrix for the variables. |
C_k_r |
Component matrix for the occasions. |
Two simultaneous models are implemented: T3Clus and 3FKMeans.
T3Clus finds B_j_q and C_k_r such that the between-clusters deviance of the component scores is maximized.
3FKMeans finds B_j_q and C_k_r such that the within-clusters deviance of the component scores is minimized.
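The two criteria can be evaluated for any fixed partition and pair of component matrices, as in the base-R sketch below. It assumes column-centered data, orthonormal B_j_q and C_k_r, and component scores computed as X_i_jk (C_k_r ⊗ B_j_q); it only evaluates the quantities and does not perform the alternating optimization.

```r
set.seed(3)
I <- 8; J <- 5; K <- 4; G <- 3; Q <- 3; R <- 2
X_i_jk <- scale(matrix(rnorm(I * J * K), I, J * K), scale = FALSE)  # centered
U_i_g <- diag(G)[rep_len(1:G, I), ]            # a fixed partition
B_j_q <- qr.Q(qr(matrix(rnorm(J * Q), J, Q)))  # arbitrary orthonormal weights
C_k_r <- qr.Q(qr(matrix(rnorm(K * R), K, R)))

scores <- X_i_jk %*% kronecker(C_k_r, B_j_q)   # I x (Q*R) component scores
score_centroids <- solve(crossprod(U_i_g), crossprod(U_i_g, scores))
fitted_scores <- U_i_g %*% score_centroids

between_dev <- sum(fitted_scores^2)            # T3Clus aims to maximize this
within_dev <- sum((scores - fitted_scores)^2)  # 3FKMeans aims to minimize this
total_dev <- sum(scores^2)
```

For fixed B_j_q and C_k_r the two deviances add up to the total deviance of the scores, but that total changes with B_j_q and C_k_r, so maximizing one criterion is not the same as minimizing the other.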
An object of class "simultaneous".
The model finds the best partition described by the best orthogonal linear combinations of the variables and orthogonal linear combinations of the occasions.
Tucker L (1966). “Some mathematical notes on three-mode factor analysis.” Psychometrika, 31(3), 279-311. doi:10.1007/BF02289464, https://ideas.repec.org/a/spr/psycho/v31y1966i3p279-311.html. Vichi M, Rocci R, Kiers H (2007). “Simultaneous Component and Clustering Models for Three-way Data: Within and Between Approaches.” Journal of Classification, 24, 71-98. doi:10.1007/s00357-007-0006-x.
fit.t3clus
fit.3fkmeans
fit.ct3clus
tandem
simultaneous()
Simultaneous Model
seed
numeric. Seed for random sequence generation. Defaults to NULL.
verbose
logical. Whether to display execution output or not. Defaults to TRUE.
init
character. The parameter initialization method. Defaults to 'svd'.
n_max_iter
numeric. Maximum number of iterations. Defaults to 10.
n_loops
numeric. Number of initializations (loops) run in search of the global optimum. Defaults to 10.
tol
numeric. Tolerance level/acceptable error. Defaults to 1e-5.
U_i_g
numeric. (I,G) initial stochastic membership function matrix.
B_j_q
numeric. (J,Q) initial component weight matrix for variables.
C_k_r
numeric. (K,R) initial component weight matrix for occasions.
Initializes an instance of the tandem model required by the tandem methods.
tandem(
  seed = NULL,
  verbose = TRUE,
  init = "svd",
  n_max_iter = 10,
  n_loops = 10,
  tol = 1e-05,
  U_i_g = NULL,
  B_j_q = NULL,
  C_k_r = NULL
)
seed |
Seed for random sequence generation. |
verbose |
Flag to display iteration outputs for each loop. |
init |
Parameter initialization method, 'svd' or 'random'. |
n_max_iter |
Maximum number of iterations to optimize the objective function. |
n_loops |
Maximum number of loops/runs for global results. |
tol |
Allowable tolerance to check convergence. |
U_i_g |
Initial membership function matrix for the objects. |
B_j_q |
Initial component scores matrix for the variables. |
C_k_r |
Initial component scores matrix for the occasions. |
An object of class "tandem".
Arabie P, Hubert L (1996). “Advances in Cluster Analysis Relevant to Marketing Research.” In Gaul W, Pfeifer D (eds.), From Data to Knowledge, 3–19. Tucker L (1966). “Some mathematical notes on three-mode factor analysis.” Psychometrika, 31(3), 279-311. doi:10.1007/BF02289464, https://ideas.repec.org/a/spr/psycho/v31y1966i3p279-311.html.
fit.twcfta
fit.twfcta
simultaneous
Tandem Class
seed
numeric. Seed for random sequence generation. Defaults to NULL.
verbose
logical. Whether to display execution output or not. Defaults to TRUE.
init
character. The parameter initialization method. Defaults to 'svd'.
n_max_iter
numeric. Maximum number of iterations. Defaults to 10.
n_loops
numeric. Number of initializations (loops) run in search of the global optimum. Defaults to 10.
tol
numeric. Tolerance level/acceptable error. Defaults to 1e-5.
U_i_g
matrix. (I,G) initial stochastic membership function matrix.
B_j_q
matrix. (J,Q) initial component weight matrix for variables.
C_k_r
matrix. (K,R) initial component weight matrix for occasions.
Unfolds (matricizes) a tensor: converts a three-mode tensor to a matrix along the given mode.
unfold(tensor, mode)
tensor |
Three-mode tensor array. |
mode |
Mode of operation. |
Matrix
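The matricization along each mode can be sketched in base R with `aperm`. The helper `unfold_sketch` is hypothetical, not the package's `unfold`, and it assumes the column ordering in which the first remaining index runs fastest (X_i_jk, X_j_ki, X_k_ij), which may differ from the package's convention.

```r
# Hypothetical helper: unfold an I x J x K array along mode 1, 2 or 3.
unfold_sketch <- function(tensor, mode) {
  d <- dim(tensor)
  # Bring the chosen mode to the front, keeping the cyclic order of the rest.
  perm <- list(c(1, 2, 3), c(2, 3, 1), c(3, 1, 2))[[mode]]
  matrix(aperm(tensor, perm), nrow = d[mode])
}

tens <- array(seq_len(24), dim = c(2, 3, 4))  # I = 2, J = 3, K = 4
X_i_jk <- unfold_sketch(tens, 1)              # 2 x 12
X_j_ki <- unfold_sketch(tens, 2)              # 3 x 8
X_k_ij <- unfold_sketch(tens, 3)              # 4 x 6
```

Under this convention, entry (i, j, k) of the tensor sits in row j and column k + (i - 1) * K of X_j_ki.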
X_i_jk = generate_dataset()$X_i_jk
X_i_j_k = fold(X_i_jk, mode=1, shape=c(I=8,J=5,K=4))
unfold(X_i_j_k, mode=1) # X_i_jk