Title: | Clustering of Datasets |
---|---|
Description: | Hierarchical and partitioning algorithms to cluster blocks of variables. The partitioning algorithm includes an option called noise cluster to set aside atypical blocks of variables. The CLUSTATIS method (for quantitative blocks) (Llobell, Cariou, Vigneau, Labenne & Qannari (2020) <doi:10.1016/j.foodqual.2018.05.013>, Llobell, Vigneau & Qannari (2019) <doi:10.1016/j.foodqual.2019.02.017>) and the CLUSCATA method (for Check-All-That-Apply data) (Llobell, Cariou, Vigneau, Labenne & Qannari (2019) <doi:10.1016/j.foodqual.2018.09.006>, Llobell, Giacalone, Labenne & Qannari (2019) <doi:10.1016/j.foodqual.2019.05.017>) are the core of this package. The CATATIS method computes indices and tests to check the quality of CATA data. Multivariate analysis and clustering of subjects for quantitative multiblock data, CATA, RATA, Free Sorting and JAR experiments are available. Clustering of rows in a multi-block context (notably with the ClusMB strategy) is also included. |
Authors: | Fabien Llobell [aut, cre] (Oniris/XLSTAT), Evelyne Vigneau [ctb] (Oniris), Veronique Cariou [ctb] (Oniris), El Mostafa Qannari [ctb] (Oniris) |
Maintainer: | Fabien Llobell <[email protected]> |
License: | GPL-3 |
Version: | 4.0.0 |
Built: | 2024-12-19 06:32:51 UTC |
Source: | CRAN |
Hierarchical and partitioning algorithms to cluster blocks of variables. The CLUSTATIS method and the CLUSCATA method are the core of this package. The CATATIS method computes indices and tests to check the quality of CATA data. Multivariate analysis and clustering of subjects for quantitative multiblock data, CATA, RATA, Free Sorting and JAR experiments are available. Clustering of rows in a multi-block context (notably with the ClusMB strategy) is also included.
Package: | ClustBlock |
Type: | Package |
Version: | 4.0.0 |
First version Date: | 2019-03-06 |
Last version Date: | 2024-05-21 |
Fabien Llobell, Evelyne Vigneau, Veronique Cariou, El Mostafa Qannari
Maintainer: [email protected]
Llobell, F., Cariou, V., Vigneau, E., Labenne, A., & Qannari, E. M. (2020). Analysis and clustering of multiblock datasets by means of the STATIS and CLUSTATIS methods. Application to sensometrics. Food Quality and Preference, 79, 103520.
Llobell, F., Vigneau, E., & Qannari, E. M. (2019). Clustering datasets by means of CLUSTATIS with identification of atypical datasets. Application to sensometrics. Food Quality and Preference, 75, 97-104.
Llobell, F., Cariou, V., Vigneau, E., Labenne, A., & Qannari, E. M. (2019). A new approach for the analysis of data and the clustering of subjects in a CATA experiment. Food Quality and Preference, 72, 31-39.
Llobell, F., Giacalone, D., Labenne, A., & Qannari, E. M. (2019). Assessment of the agreement and cluster analysis of the respondents in a CATA experiment. Food Quality and Preference, 77, 184-190.
Llobell, F., & Qannari, E. M. (2020). CLUSTATIS: Cluster analysis of blocks of variables. Electronic Journal of Applied Statistical Analysis, 13(2), 436-453.
Llobell, F. (2020). Classification de tableaux de données, applications en analyse sensorielle (Doctoral dissertation, Nantes, Ecole nationale vétérinaire).
CATATIS method. Additional outputs are also computed. Non-binary data are accepted and weights can be tested.
catatis(Data,nblo,NameBlocks=NULL, NameVar=NULL, Graph=TRUE, Graph_weights=TRUE, Test_weights=FALSE, nperm=100)
Data |
data frame or matrix where the blocks of binary variables are merged horizontally. If you have a different format, see |
nblo |
integer. Number of blocks (subjects). |
NameBlocks |
string vector. Name of each block (subject). Length must be equal to the number of blocks. If NULL, the names are S1,...Sm. Default: NULL |
NameVar |
string vector. Name of each variable (attribute, the same names for each subject). Length must be equal to the number of attributes. If NULL, the colnames of the first block are taken. Default: NULL |
Graph |
logical. Show the graphical representation? Default: TRUE |
Graph_weights |
logical. Should the barplot of the weights be plotted? Default: TRUE |
Test_weights |
logical. Should the weights be tested? Default: FALSE |
nperm |
integer. Number of permutations for the weight tests. Default: 100 |
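The horizontally merged layout expected in Data can be illustrated with a small base R sketch. The toy table and the helper block_cols are hypothetical and are not part of ClustBlock; the sketch only assumes, as stated above, that all blocks share the same attributes, so each block occupies ncol(Data)/nblo consecutive columns.

```r
# Toy horizontally merged CATA table: 3 products (rows), 2 subjects, 2 attributes each.
nblo <- 2
Data <- cbind(
  matrix(c(1, 0, 0, 1, 1, 1), nrow = 3, byrow = TRUE),  # block of subject 1
  matrix(c(0, 1, 1, 1, 0, 0), nrow = 3, byrow = TRUE)   # block of subject 2
)

# Number of attributes per block (all blocks have the same width).
nattr <- ncol(Data) / nblo

# Hypothetical helper: column indices of block i in the merged table.
block_cols <- function(i) ((i - 1) * nattr + 1):(i * nattr)

# Extract the block of subject 2.
block2 <- Data[, block_cols(2), drop = FALSE]

# Number of checked attributes per subject (what the nb_1 output describes).
nb_1 <- sapply(seq_len(nblo), function(i) sum(Data[, block_cols(i)]))
```

With this layout, nblo is simply the number of such column groups, and NameVar needs one name per column of a single block.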
a list with:
S: the S matrix: a matrix with the similarity coefficient among the subjects
compromise: a matrix which is the compromise of the subjects (akin to a weighted average)
weights: the weights associated with the subjects to build the compromise
weights_tests: the weights tests results
lambda: the first eigenvalue of the S matrix
overall error: the error for the CATATIS criterion
error_by_sub: the error by subject (CATATIS criterion)
error_by_prod: the error by product (CATATIS criterion)
s_with_compromise: the similarity coefficient of each subject with the compromise
homogeneity: homogeneity of the subjects (in percentage)
CA: the results of correspondence analysis performed on the compromise dataset
eigenvalues: the eigenvalues associated with the correspondence analysis
inertia: the percentage of total variance explained by each axis of the CA
scalefactors: the scaling factors of each subject
nb_1: the number of 1 in each block, i.e. the number of checked attributes by subject.
param: parameters called
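To make the roles of S, lambda, weights, compromise and homogeneity more concrete, here is a minimal base R sketch of a STATIS-like computation on toy binary blocks. It is an illustration under stated assumptions, not the code of catatis: it assumes that each block is scaled to unit norm, that S holds the inner products between scaled blocks, that the weights come from the first eigenvector of S, and that homogeneity is the first eigenvalue divided by the number of blocks. The actual scaling and normalisation used by the package may differ.

```r
set.seed(1)
nblo <- 4; nprod <- 5; nattr <- 3

# Toy binary blocks (one products x attributes matrix per subject).
blocks <- lapply(seq_len(nblo), function(i) {
  X <- matrix(rbinom(nprod * nattr, 1, 0.5), nprod, nattr)
  X[1, 1] <- 1                      # guarantee that no block is entirely zero
  X
})

# Assumption: each block is scaled to unit Frobenius norm.
scaled <- lapply(blocks, function(X) X / sqrt(sum(X^2)))

# S[i, j]: inner product between scaled blocks i and j (1 on the diagonal).
S <- matrix(0, nblo, nblo)
for (i in seq_len(nblo)) {
  for (j in seq_len(nblo)) {
    S[i, j] <- sum(scaled[[i]] * scaled[[j]])
  }
}

# First eigenvalue and eigenvector of S; abs() fixes the arbitrary sign.
eig <- eigen(S, symmetric = TRUE)
lambda <- eig$values[1]
weights <- abs(eig$vectors[, 1])

# Compromise as a weighted sum of the scaled blocks.
compromise <- Reduce(`+`, Map(function(w, X) w * X, weights, scaled))

# Assumed homogeneity index, in percent.
homogeneity <- 100 * lambda / nblo
```

Under these assumptions, subjects who agree with the rest of the panel receive larger weights, and homogeneity reaches 100 only when all scaled blocks are identical.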
Llobell, F., Cariou, V., Vigneau, E., Labenne, A., & Qannari, E. M. (2019). A new approach for the analysis of data and the clustering of subjects in a CATA experiment. Food Quality and Preference, 72, 31-39.
Bonnet, L., Ferney, T., Riedel, T., Qannari, E.M., Llobell, F. (September 14, 2022). Using CATA for sensory profiling: assessment of the panel performance. Eurosense, Turku, Finland.
plot.catatis, summary.catatis, cluscata, change_cata_format, change_cata_format2
data(straw)
res.cat=catatis(straw, nblo=114)
summary(res.cat)
plot(res.cat)
#Vertical format with sessions
data("fish")
chang=change_cata_format2(fish, nprod=6, nattr=27, nsub=12, nsess=3)
res.cat2=catatis(Data=chang$Datafinal, nblo=12, NameBlocks=chang$NameSub, Test_weights=TRUE)
#Vertical format without sessions
Data=fish[1:66,2:30]
chang2=change_cata_format2(Data, nprod=6, nattr=27, nsub=11, nsess=1)
res.cat3=catatis(Data=chang2$Datafinal, nblo=11, NameBlocks=chang2$NameSub)
CATATIS method adapted to JAR data.
catatis_jar(Data, nprod, nsub, levelsJAR=3, beta=0.1, Graph=TRUE, Graph_weights=TRUE, Test_weights=FALSE, nperm=100)
Data |
data frame where the first column is the Assessors, the second is the products and all other columns the JAR attributes with numbers (1 to 3 or 1 to 5, see levelsJAR) |
nprod |
integer. Number of products. |
nsub |
integer. Number of subjects. |
levelsJAR |
integer. 3 or 5 levels. If 5, the data will be transformed into 3 levels. |
beta |
numerical. Parameter for agreement between JAR and other answers. Between 0 and 0.5. |
Graph |
logical. Show the graphical representation? Default: TRUE |
Graph_weights |
logical. Should the barplot of the weights be plotted? Default: TRUE |
Test_weights |
logical. Should the weights be tested? Default: FALSE |
nperm |
integer. Number of permutations for the weight tests. Default: 100 |
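The levelsJAR argument states that 5-level JAR scores are transformed into 3 levels. One natural way to do this is sketched below in base R. The mapping (1 and 2 become "not enough", 3 stays "just about right", 4 and 5 become "too much") is an assumption made for illustration; the documentation above does not spell out the rule used by the package.

```r
# Toy 5-level JAR scores for one attribute.
jar5 <- c(1, 2, 3, 4, 5, 3, 2)

# Assumed mapping: position = 5-level score, value = 3-level score.
collapse_map <- c(1, 1, 2, 3, 3)

# Collapse by indexing the mapping with the original scores.
jar3 <- collapse_map[jar5]
```

After such a collapse, every attribute has the same three categories whichever scale was used in the questionnaire.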
a list with:
S: the S matrix: a matrix with the similarity coefficient among the subjects
compromise: a matrix which is the compromise of the subjects (akin to a weighted average)
weights: the weights associated with the subjects to build the compromise
weights_tests: the weights tests results
lambda: the first eigenvalue of the S matrix
overall error: the error for the CATATIS criterion
error_by_sub: the error by subject (CATATIS criterion)
error_by_prod: the error by product (CATATIS criterion)
s_with_compromise: the similarity coefficient of each subject with the compromise
homogeneity: homogeneity of the subjects (in percentage)
CA: the results of correspondence analysis performed on the compromise dataset
eigenvalues: the eigenvalues associated with the correspondence analysis
inertia: the percentage of total variance explained by each axis of the CA
scalefactors: the scaling factors of each subject
nb_1: Can be ignored
param: parameters called
Llobell, F., Vigneau, E. & Qannari, E. M. (September 14, 2022). Multivariate data analysis and clustering of subjects in a Just about right task. Eurosense, Turku, Finland.
catatis, plot.catatis, summary.catatis, cluscata_jar, preprocess_JAR, cluscata_kmeans_jar
data(cheese)
res.cat=catatis_jar(Data=cheese, nprod=8, nsub=72, levelsJAR=5)
summary(res.cat)
#plot(res.cat)
CATATIS method for RATA data. Additional outputs are also computed. Non-binary data are accepted and weights can be tested.
catatis_rata(Data,nblo,NameBlocks=NULL, NameVar=NULL, Graph=TRUE, Graph_weights=TRUE, Test_weights=FALSE, nperm=100)
Data |
data frame or matrix where the blocks of variables are merged horizontally. If you have a different format, see |
nblo |
integer. Number of blocks (subjects). |
NameBlocks |
string vector. Name of each block (subject). Length must be equal to the number of blocks. If NULL, the names are S1,...Sm. Default: NULL |
NameVar |
string vector. Name of each variable (attribute, the same names for each subject). Length must be equal to the number of attributes. If NULL, the colnames of the first block are taken. Default: NULL |
Graph |
logical. Show the graphical representation? Default: TRUE |
Graph_weights |
logical. Should the barplot of the weights be plotted? Default: TRUE |
Test_weights |
logical. Should the weights be tested? Default: FALSE |
nperm |
integer. Number of permutations for the weight tests. Default: 100 |
a list with:
S: the S matrix: a matrix with the similarity coefficient among the subjects
compromise: a matrix which is the compromise of the subjects (akin to a weighted average)
weights: the weights associated with the subjects to build the compromise
weights_tests: the weights tests results
lambda: the first eigenvalue of the S matrix
overall error: the error for the CATATIS criterion
error_by_sub: the error by subject (CATATIS criterion)
error_by_prod: the error by product (CATATIS criterion)
s_with_compromise: the similarity coefficient of each subject with the compromise
homogeneity: homogeneity of the subjects (in percentage)
CA: the results of correspondence analysis performed on the compromise dataset
eigenvalues: the eigenvalues associated with the correspondence analysis
inertia: the percentage of total variance explained by each axis of the CA
scalefactors: the scaling factors of each subject
param: parameters called
Llobell, F., Cariou, V., Vigneau, E., Labenne, A., & Qannari, E. M. (2019). A new approach for the analysis of data and the clustering of subjects in a CATA experiment. Food Quality and Preference, 72, 31-39.
Bonnet, L., Ferney, T., Riedel, T., Qannari, E.M., Llobell, F. (September 14, 2022). Using CATA for sensory profiling: assessment of the panel performance. Eurosense, Turku, Finland.
Bonnet, L., Llobell, F., Qannari, E.M. (Pangborn 2023). Assessment of the panel performance in a RATA experiment.
catatis, plot.catatis, summary.catatis, change_cata_format, change_cata_format2
#RATA data with session
data(RATAchoc)
chang2=change_cata_format2(RATAchoc, nprod=12, nattr=13, nsub=9, nsess=3)
res.cat4=catatis_rata(Data=chang2$Datafinal, nblo=9, NameBlocks=chang2$NameSub)
summary(res.cat4)
#RATA data without session
Data=RATAchoc[1:108,2:16]
chang2=change_cata_format2(Data, nprod=12, nattr=13, nsub=9, nsess=1)
res.cat5=catatis_rata(Data=chang2$Datafinal, nblo=9, NameBlocks=chang2$NameSub)
summary(res.cat5)
graphics.off()
CATATIS and CLUSCATA operate on data where the blocks of variables are merged horizontally. If your data are in a different format, this function converts them.
Format=1 is for data merged vertically by subject: the dataset of the first subject, then the second, and so on, with the products in the same order for each subject.
Format=2 is for data merged vertically by product: the dataset for the first product, then the second, and so on, with the subjects in the same order for each product.
Unlike change_cata_format2, you do not need to specify products and subjects; just make sure they are in the right order.
change_cata_format(Data, nprod, nattr, nsub, format=1, NameProds=NULL, NameAttr=NULL)
Data |
data frame or matrix. Your data. |
nprod |
integer. Number of products |
nattr |
integer. Number of attributes |
nsub |
integer. Number of subjects. |
format |
integer (1 or 2). See the description |
NameProds |
string vector with the names of the products (length must be nprod) |
NameAttr |
string vector with the names of attributes (length must be nattr) |
The data arranged for the CATATIS and CLUSCATA functions.
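The format=1 case described above can be sketched in base R to show what the rearrangement amounts to. This is an illustration on a hypothetical toy table, not the code of change_cata_format: rows are stacked subject by subject with products in the same order, and each subject's rows become one block of columns.

```r
nprod <- 3; nattr <- 2; nsub <- 2

# Vertical layout (format=1): subject 1's products, then subject 2's products.
vertical <- rbind(
  c(1, 0), c(0, 1), c(1, 1),   # subject 1, products 1 to 3
  c(0, 0), c(1, 0), c(0, 1)    # subject 2, products 1 to 3
)

# Cut the stacked table into one products x attributes block per subject.
blocks <- lapply(seq_len(nsub), function(s) {
  rows <- ((s - 1) * nprod + 1):(s * nprod)
  vertical[rows, , drop = FALSE]
})

# Bind the blocks side by side: nprod rows, nsub * nattr columns.
horizontal <- do.call(cbind, blocks)
```

For format=2 the same idea applies with the roles of subjects and products swapped when selecting rows.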
catatis, cluscata, change_cata_format2
CATATIS and CLUSCATA operate on data where the blocks of variables are merged horizontally. If your data are in a vertical format, this function converts them. The first column must contain the sessions, the second the subjects, the third the products, and the remaining columns the attributes. If there are no sessions, the first column must contain the subjects and the second the products. Unlike change_cata_format, this function accepts data with sessions and data in which products and subjects are not in a fixed order, provided the identifying columns described above are present.
change_cata_format2(Data, nprod, nattr, nsub, nsess)
Data |
data frame or matrix. Your data. |
nprod |
integer. Number of products |
nattr |
integer. Number of attributes |
nsub |
integer. Number of subjects. |
nsess |
integer. Number of sessions |
The data arranged for the CATATIS and CLUSCATA functions, and the subject names in the correct order.
catatis, cluscata, change_cata_format
#Vertical format with sessions
data("fish")
chang=change_cata_format2(fish, nprod=6, nattr=27, nsub=12, nsess=3)
res.cat2=catatis(Data=chang$Datafinal, nblo=12, NameBlocks=chang$NameSub)
#Vertical format without sessions
Data=fish[1:66,2:30]
chang2=change_cata_format2(Data, nprod=6, nattr=27, nsub=11, nsess=1)
res.cat3=catatis(Data=chang2$Datafinal, nblo=11, NameBlocks=chang2$NameSub)
res.clu3=cluscata(Data=chang2$Datafinal, nblo=11, NameBlocks=chang2$NameSub)
cheese Just About Right data
data(cheese)
JAR data. A data frame with Assessors, Products and JAR attributes. 8 products, 9 attributes and 72 subjects.
Luc, A., Lê, S., Philippe, M., Qannari, E. M., & Vigneau, E. (2022). Free JAR experiment: Data analysis and comparison with JAR task. Food Quality and Preference, 98, 104453.
data(cheese)
chocolates data
data(choc)
Free sorting data. A data frame with 14 rows (the chocolates) and 25 columns (the subjects). The numbers indicate the groups to which the products (rows) are assigned.
Courcoux, P., Qannari, E. M., Taylor, Y., Buck, D., & Greenhoff, K. (2012). Taxonomic free sorting. Food Quality and Preference, 23(1), 30-35.
data(choc)
Clustering of subjects (blocks) from a CATA experiment. Each cluster of blocks is associated with a compromise computed by the CATATIS method. The hierarchical clustering is followed by a partitioning algorithm (consolidation). Non-binary data are accepted.
cluscata(Data, nblo, NameBlocks=NULL, NameVar=NULL, Noise_cluster=FALSE, Itermax=30, Graph_dend=TRUE, Graph_bar=TRUE, printlevel=FALSE, gpmax=min(6, nblo-2), rhoparam=NULL, Testonlyoneclust=FALSE, alpha=0.05, nperm=50, Warnings=FALSE)
Data |
data frame or matrix where the blocks of binary variables are merged horizontally. If you have a different format, see |
nblo |
numerical. Number of blocks (subjects). |
NameBlocks |
string vector. Name of each block (subject). Length must be equal to the number of blocks. If NULL, the names are S1,...Sm. Default: NULL |
NameVar |
string vector. Name of each variable (attribute, the same names for each subject). Length must be equal to the number of attributes. If NULL, the colnames of the first block are taken. Default: NULL |
Noise_cluster |
logical. Should a noise cluster be computed? Default: FALSE |
Itermax |
numerical. Maximum number of iterations for the partitioning algorithm. Default: 30 |
Graph_dend |
logical. Should the dendrogram be plotted? Default: TRUE |
Graph_bar |
logical. Should the barplot of the difference of the criterion and the barplot of the overall homogeneity at each merging step of the hierarchical algorithm be plotted? Default: TRUE |
printlevel |
logical. Print the number of remaining levels during the hierarchical clustering algorithm? Default: FALSE |
gpmax |
integer. Maximum number of clusters to consider. Default: min(6, nblo-2) |
rhoparam |
numerical. Threshold for the noise cluster, between 0 and 1. A high value can set aside many blocks. If NULL, an automatic threshold is computed. |
Testonlyoneclust |
logical. Test if there is more than one cluster? Default: FALSE |
alpha |
numerical between 0 and 1. What is the threshold to test if there is more than one cluster? Default: 0.05 |
nperm |
numerical. How many permutations are required to test if there is more than one cluster? Default: 50 |
Warnings |
logical. Display warnings about the fact that none of the subjects in some clusters checked an attribute or product? Default: FALSE |
Each partitionK contains a list for each number of clusters of the partition, K=1 to gpmax with:
group: the clustering partition after consolidation. If Noise_cluster=TRUE, some subjects could be in the noise cluster ("K+1")
rho: the threshold for the noise cluster
homogeneity: homogeneity index (
s_with_compromise: similarity coefficient of each subject with its cluster compromise
weights: weight associated with each subject in its cluster
compromise: the compromise of each cluster
CA: list. The correspondence analysis results on each cluster compromise (coordinates, contributions...)
inertia: percentage of total variance explained by each axis of the CA for each cluster
s_all_cluster: the similarity coefficient between each subject and each cluster compromise
criterion: the CLUSCATA criterion error
param: parameters called
type: parameter passed to other functions
There is also at the end of the list:
dend: The CLUSCATA dendrogram
cutree_k: the partition obtained by cutting the dendrogram in K clusters (before consolidation).
overall_homogeneity_ng: percentage of overall homogeneity by number of clusters before consolidation (and after if there is no noise cluster)
diff_crit_ng: variation of criterion when a merging is done before consolidation (and after if there is no noise cluster)
test_one_cluster: decision and pvalue to know if there is more than one cluster
param: parameters called
type: parameter passed to other functions
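The interplay between rho, s_all_cluster and the noise cluster label "K+1" in group can be illustrated with a small base R sketch. It assumes that a subject is set aside when its similarity with every cluster compromise is below rho, and is otherwise assigned to the most similar cluster. This rule and the toy similarity values are assumptions for illustration; the exact rule applied by cluscata is not stated in this section.

```r
rho <- 0.5

# Toy similarities between 4 subjects (rows) and K = 2 cluster compromises (columns).
s_all_cluster <- rbind(
  S1 = c(0.80, 0.30),
  S2 = c(0.20, 0.70),
  S3 = c(0.35, 0.40),   # below rho for both clusters
  S4 = c(0.55, 0.60)
)
K <- ncol(s_all_cluster)

# Most similar cluster for each subject.
best <- apply(s_all_cluster, 1, which.max)

# Assumed rule: subjects whose best similarity is below rho go to cluster K + 1.
group <- ifelse(apply(s_all_cluster, 1, max) < rho, K + 1, best)
```

In this toy case only S3 lands in the noise cluster; raising rho to 0.65 would also set aside S4, which matches the remark that a high threshold can set aside many blocks.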
Llobell, F., Cariou, V., Vigneau, E., Labenne, A., & Qannari, E. M. (2019). A new approach for the analysis of data and the clustering of subjects in a CATA experiment. Food Quality and Preference, 72, 31-39.
Llobell, F., Giacalone, D., Labenne, A., Qannari, E.M. (2019). Assessment of the agreement and cluster analysis of the respondents in a CATA experiment. Food Quality and Preference, 77, 184-190.
plot.cluscata, summary.cluscata, catatis, cluscata_kmeans, change_cata_format, change_cata_format2
data(straw)
#with 40 subjects
res=cluscata(Data=straw[,1:(16*40)], nblo=40)
#plot(res, ngroups=3, Graph_dend=FALSE)
summary(res, ngroups=3)
#With noise cluster
res2=cluscata(Data=straw[,1:(16*40)], nblo=40, Noise_cluster=TRUE, Graph_dend=FALSE, Graph_bar=FALSE)
#With noise cluster and defined rho threshold
#(a high threshold is used for this example; use a lower threshold,
#e.g. 0.2 or 0.3, to avoid setting aside many respondents)
res3=cluscata(Data=straw[,1:(16*40)], nblo=40, Noise_cluster=TRUE, Graph_dend=FALSE, Graph_bar=FALSE, rhoparam=0.6)
#with all subjects
res=cluscata(Data=straw, nblo=114, printlevel=TRUE)
#Vertical format
data("fish")
Data=fish[1:66,2:30]
chang2=change_cata_format2(Data, nprod=6, nattr=27, nsub=11, nsess=1)
res3=cluscata(Data=chang2$Datafinal, nblo=11, NameBlocks=chang2$NameSub)
Hierarchical clustering of subjects from a JAR experiment. Each cluster of subjects is associated with a compromise computed by the CATATIS method. The hierarchical clustering is followed by a partitioning algorithm (consolidation).
cluscata_jar(Data, nprod, nsub, levelsJAR=3, beta=0.1, Noise_cluster=FALSE, Itermax=30, Graph_dend=TRUE, Graph_bar=TRUE, printlevel=FALSE, gpmax=min(6, nsub-2), rhoparam=NULL, Testonlyoneclust=FALSE, alpha=0.05, nperm=50, Warnings=FALSE)
Data |
data frame where the first column is the Assessors, the second is the products and all other columns the JAR attributes with numbers (1 to 3 or 1 to 5, see levelsJAR) |
nprod |
integer. Number of products. |
nsub |
integer. Number of subjects. |
levelsJAR |
integer. 3 or 5 levels. If 5, the data will be transformed into 3 levels. |
beta |
numerical. Parameter for agreement between JAR and other answers. Between 0 and 0.5. |
Noise_cluster |
logical. Should a noise cluster be computed? Default: FALSE |
Itermax |
numerical. Maximum number of iterations for the partitioning algorithm. Default: 30 |
Graph_dend |
logical. Should the dendrogram be plotted? Default: TRUE |
Graph_bar |
logical. Should the barplot of the difference of the criterion and the barplot of the overall homogeneity at each merging step of the hierarchical algorithm be plotted? Default: TRUE |
printlevel |
logical. Print the number of remaining levels during the hierarchical clustering algorithm? Default: FALSE |
gpmax |
integer. Maximum number of clusters to consider. Default: min(6, nsub-2) |
rhoparam |
numerical. Threshold for the noise cluster, between 0 and 1. A high value can set aside many blocks. If NULL, an automatic threshold is computed. |
Testonlyoneclust |
logical. Test if there is more than one cluster? Default: FALSE |
alpha |
numerical between 0 and 1. What is the threshold to test if there is more than one cluster? Default: 0.05 |
nperm |
numerical. How many permutations are required to test if there is more than one cluster? Default: 50 |
Warnings |
logical. Display warnings about the fact that none of the subjects in some clusters checked an attribute or product? Default: FALSE |
Each partitionK contains a list for each number of clusters of the partition, K=1 to gpmax with:
group: the clustering partition after consolidation. If Noise_cluster=TRUE, some subjects could be in the noise cluster ("K+1")
rho: the threshold for the noise cluster
homogeneity: homogeneity index (
s_with_compromise: similarity coefficient of each subject with its cluster compromise
weights: weight associated with each subject in its cluster
compromise: the compromise of each cluster
CA: list. The correspondence analysis results on each cluster compromise (coordinates, contributions...)
inertia: percentage of total variance explained by each axis of the CA for each cluster
s_all_cluster: the similarity coefficient between each subject and each cluster compromise
criterion: the CLUSCATA criterion error
param: parameters called
type: parameter passed to other functions
There is also at the end of the list:
dend: The CLUSCATA dendrogram
cutree_k: the partition obtained by cutting the dendrogram in K clusters (before consolidation).
overall_homogeneity_ng: percentage of overall homogeneity by number of clusters before consolidation (and after if there is no noise cluster)
diff_crit_ng: variation of criterion when a merging is done before consolidation (and after if there is no noise cluster)
test_one_cluster: decision and pvalue to know if there is more than one cluster
param: parameters called
type: parameter passed to other functions
Llobell, F., Vigneau, E. & Qannari, E. M. (September 14, 2022). Multivariate data analysis and clustering of subjects in a Just about right task. Eurosense, Turku, Finland.
plot.cluscata, summary.cluscata, catatis_jar, preprocess_JAR, cluscata_kmeans_jar
data(cheese)
res=cluscata_jar(Data=cheese, nprod=8, nsub=72, levelsJAR=5)
#plot(res, ngroups=4, Graph_dend=FALSE)
summary(res, ngroups=4)
Partitioning of binary blocks from a CATA experiment. Each cluster is associated with a compromise computed by the CATATIS method. Moreover, a noise cluster can be set up.
cluscata_kmeans(Data,nblo, clust, nstart=100, rho=0, NameBlocks=NULL, NameVar=NULL, Itermax=30, Graph_groups=TRUE, print_attempt=FALSE, Warnings=FALSE)
Data |
data frame or matrix where the blocks of binary variables are merged horizontally. If you have a different format, see |
nblo |
numerical. Number of blocks (subjects). |
clust |
numerical vector or integer. Initial partition (numerical vector) or number of clusters (single integer). If a numerical vector, the values must be 1,2,3,...,number of clusters |
nstart |
numerical. Number of starting partitions. Default: 100 |
rho |
numerical between 0 and 1. Threshold for the noise cluster. If 0, there is no noise cluster. Default: 0 |
NameBlocks |
string vector. Name of each block. Length must be equal to the number of blocks. If NULL, the names are S1,...Sm. Default: NULL |
NameVar |
string vector. Name of each variable (attribute, the same names for each subject). Length must be equal to the number of attributes. If NULL, the colnames of the first block are taken. Default: NULL |
Itermax |
numerical. Maximum number of iterations for the partitioning algorithm. Default: 30 |
Graph_groups |
logical. Should each cluster compromise graphical representation be plotted? Default: TRUE |
print_attempt |
logical. Print the number of remaining attempts in multi-start case? Default: FALSE |
Warnings |
logical. Display warnings about the fact that none of the subjects in some clusters checked an attribute or product? Default: FALSE |
a list with:
group: the clustering partition. If rho>0, some subjects could be in the noise cluster ("K+1")
rho: the threshold for the noise cluster
homogeneity: percentage of homogeneity of the subjects in each cluster and the overall homogeneity
s_with_compromise: Similarity coefficient of each subject with its cluster compromise
weights: weight associated with each subject in its cluster
compromise: The compromise of each cluster
CA: The correspondence analysis results on each cluster compromise (coordinates, contributions...)
inertia: percentage of total variance explained by each axis of the CA for each cluster
s_all_cluster: the similarity coefficient between each subject and each cluster compromise
param: parameters called
criterion: the CLUSCATA criterion error
type: parameter passed to other functions
Llobell, F., Cariou, V., Vigneau, E., Labenne, A., & Qannari, E. M. (2019). A new approach for the analysis of data and the clustering of subjects in a CATA experiment. Food Quality and Preference, 72, 31-39.
Llobell, F., Giacalone, D., Labenne, A., Qannari, E.M. (2019). Assessment of the agreement and cluster analysis of the respondents in a CATA experiment. Food Quality and Preference, 77, 184-190.
plot.cluscata, summary.cluscata, catatis, cluscata, change_cata_format
data(straw)
cl_km=cluscata_kmeans(Data=straw[,1:(16*40)], nblo=40, clust=3)
#plot(cl_km, Graph_groups=FALSE, Graph_weights = TRUE)
summary(cl_km)
Partitioning of subjects from a JAR experiment. Each cluster is associated with a compromise computed by the CATATIS method. Moreover, a noise cluster can be set up.
cluscata_kmeans_jar(Data, nprod, nsub, levelsJAR=3, beta=0.1, clust, nstart=100, rho=0, Itermax=30, Graph_groups=TRUE, print_attempt=FALSE, Warnings=FALSE)
Data |
data frame where the first column is the Assessors, the second is the products and all other columns the JAR attributes with numbers (1 to 3 or 1 to 5, see levelsJAR) |
nprod |
integer. Number of products. |
nsub |
integer. Number of subjects. |
levelsJAR |
integer. 3 or 5 levels. If 5, the data will be transformed into 3 levels. |
beta |
numerical. Parameter for agreement between JAR and other answers. Between 0 and 0.5. |
clust |
numerical vector or integer. Initial partition (numerical vector) or number of clusters (single integer). If a numerical vector, the values must be 1,2,3,...,number of clusters |
nstart |
numerical. Number of starting partitions. Default: 100 |
rho |
numerical between 0 and 1. Threshold for the noise cluster. If 0, there is no noise cluster. Default: 0 |
Itermax |
numerical. Maximum number of iterations of the partitioning algorithm. Default: 30 |
Graph_groups |
logical. Should each cluster compromise graphical representation be plotted? Default: TRUE |
print_attempt |
logical. Print the number of remaining attempts in multi-start case? Default: FALSE |
Warnings |
logical. Display warnings about the fact that none of the subjects in some clusters checked an attribute or product? Default: FALSE |
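The levelsJAR argument states that 5-level data are transformed into 3 levels. The exact mapping is not given here, so the following is only a sketch under an assumed mapping (1 and 2 become "not enough", 3 stays "just about right", 4 and 5 become "too much"). The helper collapse_jar is hypothetical and is not part of ClustBlock.

```r
# Hypothetical helper (not a ClustBlock function): collapse 5-point JAR
# scores to 3 levels.
# Assumed mapping: 1,2 -> 1 ("not enough"); 3 -> 2 ("just about right");
# 4,5 -> 3 ("too much").
collapse_jar <- function(x) {
  stopifnot(all(x %in% 1:5))
  c(1L, 1L, 2L, 3L, 3L)[x]
}

scores5 <- c(1, 2, 3, 4, 5, 3, 2)
scores3 <- collapse_jar(scores5)
print(scores3)
```

If the package uses a different recoding, only the lookup vector inside collapse_jar would need to change.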
a list with:
group: the clustering partition. If rho>0, some subjects could be in the noise cluster ("K+1")
rho: the threshold for the noise cluster
homogeneity: percentage of homogeneity of the subjects in each cluster and the overall homogeneity
s_with_compromise: Similarity coefficient of each subject with its cluster compromise
weights: weight associated with each subject in its cluster
compromise: The compromise of each cluster
CA: The correspondence analysis results on each cluster compromise (coordinates, contributions...)
inertia: percentage of total variance explained by each axis of the CA for each cluster
s_all_cluster: the similarity coefficient between each subject and each cluster compromise
param: parameters called
criterion: the CLUSCATA criterion error
type: parameter passed to other functions
Llobell, F., Vigneau, E. & Qannari, E. M. (September 14, 2022). Multivariate data analysis and clustering of subjects in a Just about right task. Eurosense, Turku, Finland.
plot.cluscata, summary.cluscata, catatis_jar, preprocess_JAR, cluscata_jar
data(cheese)
res=cluscata_kmeans_jar(Data=cheese, nprod=8, nsub=72, levelsJAR=5, clust=4)
#plot(res)
summary(res)
Hierarchical clustering of subjects (blocks) from a RATA experiment. Each cluster of blocks is associated with a compromise computed by the CATATIS method. The hierarchical clustering is followed by a partitioning algorithm (consolidation).
cluscata_rata(Data, nblo, NameBlocks=NULL, NameVar=NULL, Noise_cluster=FALSE, Itermax=30, Graph_dend=TRUE, Graph_bar=TRUE, printlevel=FALSE, gpmax=min(6, nblo-2), rhoparam=NULL, Testonlyoneclust=FALSE, alpha=0.05, nperm=50, Warnings=FALSE)
Data |
data frame or matrix where the blocks of binary variables are merged horizontally. If you have a different format, see |
nblo |
numerical. Number of blocks (subjects). |
NameBlocks |
string vector. Name of each block (subject). Length must be equal to the number of blocks. If NULL, the names are S1,...Sm. Default: NULL |
NameVar |
string vector. Name of each variable (attribute, the same names for each subject). Length must be equal to the number of attributes. If NULL, the colnames of the first block are taken. Default: NULL |
Noise_cluster |
logical. Should a noise cluster be computed? Default: FALSE |
Itermax |
numerical. Maximum number of iterations of the partitioning algorithm. Default: 30 |
Graph_dend |
logical. Should the dendrogram be plotted? Default: TRUE |
Graph_bar |
logical. Should the barplot of the difference of the criterion and the barplot of the overall homogeneity at each merging step of the hierarchical algorithm be plotted? Default: TRUE |
printlevel |
logical. Print the number of remaining levels during the hierarchical clustering algorithm? Default: FALSE |
gpmax |
numerical. Maximum number of clusters to consider. Default: min(6, nblo-2) |
rhoparam |
numerical. Threshold for the noise cluster, between 0 and 1. A high value can cause many blocks to be set aside. If NULL, an automatic threshold is computed. |
Testonlyoneclust |
logical. Test if there is more than one cluster? Default: FALSE |
alpha |
numerical between 0 and 1. What is the threshold to test if there is more than one cluster? Default: 0.05 |
nperm |
numerical. How many permutations are required to test if there is more than one cluster? Default: 50 |
Warnings |
logical. Display warnings about the fact that none of the subjects in some clusters checked an attribute or product? Default: FALSE |
Each partitionK contains a list for each number of clusters of the partition, K=1 to gpmax with:
group: the clustering partition after consolidation. If Noise_cluster=TRUE, some subjects could be in the noise cluster ("K+1")
rho: the threshold for the noise cluster
homogeneity: homogeneity index (percentage) of the subjects in each cluster and the overall homogeneity
s_with_compromise: similarity coefficient of each subject with its cluster compromise
weights: weight associated with each subject in its cluster
compromise: the compromise of each cluster
CA: list. The correspondence analysis results on each cluster compromise (coordinates, contributions...)
inertia: percentage of total variance explained by each axis of the CA for each cluster
s_all_cluster: the similarity coefficient between each subject and each cluster compromise
criterion: the CLUSCATA criterion error
param: parameters called
type: parameter passed to other functions
There is also at the end of the list:
dend: The CLUSCATA dendrogram
cutree_k: the partition obtained by cutting the dendrogram in K clusters (before consolidation).
overall_homogeneity_ng: percentage of overall homogeneity by number of clusters before consolidation (and after if there is no noise cluster)
diff_crit_ng: variation of criterion when a merging is done before consolidation (and after if there is no noise cluster)
test_one_cluster: decision and pvalue to know if there is more than one cluster
param: parameters called
type: parameter passed to other functions
Llobell, F., Cariou, V., Vigneau, E., Labenne, A., & Qannari, E. M. (2019). A new approach for the analysis of data and the clustering of subjects in a CATA experiment. Food Quality and Preference, 72, 31-39.
Llobell, F., Giacalone, D., Labenne, A., Qannari, E.M. (2019). Assessment of the agreement and cluster analysis of the respondents in a CATA experiment. Food Quality and Preference, 77, 184-190.
Conference to come (Eurosense 2024)
plot.cluscata, summary.cluscata, catatis_rata, change_cata_format, change_cata_format2
#RATA data without session
data(RATAchoc)
Data=RATAchoc[1:108,2:16]
chang2=change_cata_format2(Data, nprod= 12, nattr= 13, nsub = 9, nsess = 1)
res.clus=cluscata_rata(Data= chang2$Datafinal, nblo = 9, NameBlocks = chang2$NameSub)
summary(res.clus)
plot(res.clus)
Clustering of rows (products in sensory analysis) in a Multi-block context. The hierarchical clustering is followed by a partitioning algorithm (consolidation).
ClusMB(Data, Blocks, NameBlocks=NULL, scale=FALSE, center=TRUE, nclust=NULL, gpmax=6)
Data |
data frame or matrix. Corresponds to all the blocks of variables merged horizontally |
Blocks |
numerical vector. The number of variables of each block. The sum must be equal to the number of columns of Data. |
NameBlocks |
string vector. Name of each block. Length must be equal to the length of Blocks vector. If NULL, the names are B1,...Bm. Default: NULL |
scale |
logical. Should the data variables be scaled? Default: FALSE |
center |
logical. Should the data variables be centered? Default: TRUE. Please set to FALSE for a CATA experiment |
nclust |
numerical. Number of clusters to consider. If NULL, the Hartigan index advice is taken. |
gpmax |
numerical. Maximum number of clusters to consider. Default: min(6, number of blocks -2) |
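The Data and Blocks arguments work together: Data holds all blocks side by side, and Blocks gives the number of columns of each block, so sum(Blocks) must equal ncol(Data). The following base R sketch illustrates that layout on toy data. The helper split_blocks is hypothetical and only mimics the documented constraint; it is not how the package is known to process its input.

```r
# Hypothetical helper (not a ClustBlock function): split a horizontally
# merged data matrix into its blocks, given the number of columns per block.
split_blocks <- function(Data, Blocks) {
  Data <- as.matrix(Data)
  if (sum(Blocks) != ncol(Data)) {
    stop("sum(Blocks) must equal ncol(Data)")
  }
  ends <- cumsum(Blocks)        # last column index of each block
  starts <- ends - Blocks + 1   # first column index of each block
  lapply(seq_along(Blocks), function(i) {
    Data[, starts[i]:ends[i], drop = FALSE]
  })
}

set.seed(1)
toy <- matrix(rnorm(5 * 6), nrow = 5)          # 5 rows, 6 columns
blocks <- split_blocks(toy, Blocks = c(2, 3, 1))
print(sapply(blocks, ncol))
```

For data such as smoo in the examples, where every block has two columns, the Blocks vector is simply rep(2, 24).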
group: the clustering partition after consolidation.
nbgH: Advised number of clusters per Hartigan index
nbgCH: Advised number of clusters per Calinski-Harabasz index
cutree_k: the partition obtained by cutting the dendrogram in K clusters (before consolidation).
dend: The ClusMB dendrogram
param: parameters called
type: parameter passed to other functions
Llobell, F., Qannari, E.M. (June 10, 2022). Cluster analysis in a multi-bloc setting. SMTDA, Athens, Greece.
Llobell, F., Giacalone, D., Qannari, E. M. (Pangborn 2021). Cluster Analysis of products in CATA experiments.
Paper submitted
indicesClusters, summary.clusRows, clustRowsOnStatisAxes
#####projective mapping####
library(ClustBlock)
data(smoo)
res1=ClusMB(smoo, rep(2,24))
summary(res1)
indicesClusters(smoo, rep(2,24), res1$group)
####CATA####
data(fish)
Data=fish[1:66,2:30]
chang2=change_cata_format2(Data, nprod= 6, nattr= 27, nsub = 11, nsess= 1)
res2=ClusMB(Data= chang2$Datafinal, Blocks= rep(27, 11), center=FALSE)
indicesClusters(Data= chang2$Datafinal, Blocks= rep(27, 11),cut = res2$group, center=FALSE)
Hierarchical clustering of quantitative Blocks followed by a partitioning algorithm (consolidation). Each cluster of blocks is associated with a compromise computed by the STATIS method. Moreover, a noise cluster can be set up.
clustatis(Data,Blocks,NameBlocks=NULL,Noise_cluster=FALSE,scale=FALSE, Itermax=30, Graph_dend=TRUE, Graph_bar=TRUE, printlevel=FALSE, gpmax=min(6, length(Blocks)-2), rhoparam=NULL, Testonlyoneclust=FALSE, alpha=0.05, nperm=50)
Data |
data frame or matrix. Corresponds to all the blocks of variables merged horizontally |
Blocks |
numerical vector. The number of variables of each block. The sum must be equal to the number of columns of Data |
NameBlocks |
string vector. Name of each block. Length must be equal to the length of Blocks vector. If NULL, the names are B1,...Bm. Default: NULL |
Noise_cluster |
logical. Should a noise cluster be computed? Default: FALSE |
scale |
logical. Should the data variables be scaled? Default: FALSE |
Itermax |
numerical. Maximum number of iterations of the partitioning algorithm. Default: 30 |
Graph_dend |
logical. Should the dendrogram be plotted? Default: TRUE |
Graph_bar |
logical. Should the barplot of the difference of the criterion and the barplot of the overall homogeneity at each merging step of the hierarchical algorithm be plotted? Default: TRUE |
printlevel |
logical. Print the number of remaining levels during the hierarchical clustering algorithm? Default: FALSE |
gpmax |
numerical. Maximum number of clusters to consider. Default: min(6, number of blocks -2) |
rhoparam |
numerical. Threshold for the noise cluster, between 0 and 1. A high value can cause many blocks to be set aside. If NULL, an automatic threshold is computed. |
Testonlyoneclust |
logical. Test if there is more than one cluster? Default: FALSE |
alpha |
numerical between 0 and 1. What is the threshold to test if there is more than one cluster? Default: 0.05 |
nperm |
numerical. How many permutations are required to test if there is more than one cluster? Default: 50 |
Each partitionK contains a list for each number of clusters of the partition, K=1 to gpmax with:
group: the clustering partition of datasets after consolidation. If Noise_cluster=TRUE, some blocks could be in the noise cluster ("K+1")
rho: the threshold for the noise cluster (computed or input parameter)
homogeneity: homogeneity index (percentage) of the blocks in each cluster and the overall homogeneity
rv_with_compromise: RV coefficient of each block with its cluster compromise
weights: weight associated with each block in its cluster
comp_RV: RV coefficient between the compromises associated with the various clusters
compromise: the W compromise of each cluster
coord: the coordinates of objects of each cluster
inertia: percentage of total variance explained by each axis for each cluster
rv_all_cluster: the RV coefficient between each block and each cluster compromise
criterion: the CLUSTATIS criterion error
param: parameters called in the consolidation
type: parameter passed to other functions
There is also at the end of the list:
dend: The CLUSTATIS dendrogram
cutree_k: the partition obtained by cutting the dendrogram for K clusters (before consolidation).
overall_homogeneity_ng: percentage of overall homogeneity by number of clusters before consolidation (and after if there is no noise cluster)
diff_crit_ng: variation of criterion when a merging is done before consolidation (and after if there is no noise cluster)
test_one_cluster: decision and pvalue to know if there is more than one cluster
param: parameters called
type: parameter passed to other functions
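Several of the returned components (rv_with_compromise, comp_RV, rv_all_cluster) are RV coefficients. As a reminder of what that quantity measures, here is a minimal base R sketch of the standard RV coefficient between two blocks measured on the same rows. The function name rv_coef is hypothetical, and the column centering shown is an assumption of this sketch; the package may preprocess blocks differently.

```r
# Standard RV coefficient between two blocks with the same rows:
# RV = tr(Wx Wy) / sqrt(tr(Wx Wx) * tr(Wy Wy)), with W = X X'.
rv_coef <- function(X, Y) {
  X <- scale(as.matrix(X), center = TRUE, scale = FALSE)  # column-centre
  Y <- scale(as.matrix(Y), center = TRUE, scale = FALSE)
  Wx <- tcrossprod(X)   # rows x rows scalar product matrix
  Wy <- tcrossprod(Y)
  # For symmetric matrices, tr(Wx %*% Wy) equals sum(Wx * Wy)
  sum(Wx * Wy) / sqrt(sum(Wx^2) * sum(Wy^2))
}

set.seed(1)
X <- matrix(rnorm(20), nrow = 5)   # block with 4 variables
Y <- matrix(rnorm(15), nrow = 5)   # block with 3 variables
rv_xy <- rv_coef(X, Y)             # similarity between two unrelated blocks
rv_xx <- rv_coef(X, 2 * X)         # insensitive to a global scaling of a block
print(c(rv_xy = rv_xy, rv_xx = rv_xx))
```

The coefficient lies between 0 and 1 and equals 1 when the two configurations of rows agree up to rotation and scaling.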
Llobell, F., Cariou, V., Vigneau, E., Labenne, A., & Qannari, E. M. (2018). Analysis and clustering of multiblock datasets by means of the STATIS and CLUSTATIS methods. Application to sensometrics. Food Quality and Preference, in Press.
Llobell, F., Vigneau, E., Qannari, E. M. (2019). Clustering datasets by means of CLUSTATIS with identification of atypical datasets. Application to sensometrics. Food Quality and Preference, 75, 97-104.
plot.clustatis, summary.clustatis, clustatis_kmeans, statis
data(smoo)
NameBlocks=paste0("S",1:24)
cl=clustatis(Data=smoo,Blocks=rep(2,24),NameBlocks = NameBlocks)
#plot(cl, ngroups=3, Graph_dend=FALSE)
summary(cl)
#with noise cluster
cl2=clustatis(Data=smoo,Blocks=rep(2,24),NameBlocks = NameBlocks, Noise_cluster=TRUE, Graph_dend=FALSE, Graph_bar=FALSE)
#with noise cluster and defined rho threshold
cl3=clustatis(Data=smoo,Blocks=rep(2,24),NameBlocks = NameBlocks, Noise_cluster=TRUE, Graph_dend=FALSE, Graph_bar=FALSE, rhoparam=0.5)
Hierarchical clustering of free sorting data followed by a partitioning algorithm (consolidation). Each cluster of blocks is associated with a compromise computed by the STATIS method. Moreover, a noise cluster can be set up.
clustatis_FreeSort(Data, NameSub=NULL, Noise_cluster=FALSE,Itermax=30, Graph_dend=TRUE, Graph_bar=TRUE, printlevel=FALSE, gpmax=min(6, ncol(Data)-1),rhoparam=NULL, Testonlyoneclust=FALSE, alpha=0.05, nperm=50)
Data |
data frame or matrix. Corresponds to all variables that contain subjects results. Each column corresponds to a subject and gives the groups to which the products (rows) are assigned |
NameSub |
string vector. Name of each subject. Length must be equal to the number of columns of Data. If NULL, the names are S1,...Sm. Default: NULL |
Noise_cluster |
logical. Should a noise cluster be computed? Default: FALSE |
Itermax |
numerical. Maximum number of iterations of the partitioning algorithm. Default: 30 |
Graph_dend |
logical. Should the dendrogram be plotted? Default: TRUE |
Graph_bar |
logical. Should the barplot of the difference of the criterion and the barplot of the overall homogeneity at each merging step be plotted? Default: TRUE |
printlevel |
logical. Print the number of remaining levels during the hierarchical clustering algorithm? Default: FALSE |
gpmax |
numerical. Maximum number of clusters to consider. Default: min(6, number of subjects -1) |
rhoparam |
numerical. Threshold for the noise cluster, between 0 and 1. A high value can cause many blocks to be set aside. If NULL, an automatic threshold is computed. |
Testonlyoneclust |
logical. Test if there is more than one cluster? Default: FALSE |
alpha |
numerical between 0 and 1. What is the threshold to test if there is more than one cluster? Default: 0.05 |
nperm |
numerical. How many permutations are required to test if there is more than one cluster? Default: 50 |
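The Data argument expects one column per subject, each giving the group to which that subject assigned every product. The base R sketch below builds a toy table in that layout and derives, for one subject, a co-membership matrix (1 when two products were sorted together). This is one common way of representing free sorting data and is shown only as an illustration; how the package encodes the data internally (see preprocess_FreeSort) is not described here. The names sorting and co_membership are hypothetical.

```r
# Toy free sorting data: 5 products (rows), 3 subjects (columns).
# Each entry is the label of the group the subject put the product in.
sorting <- data.frame(
  S1 = c(1, 1, 2, 2, 3),
  S2 = c(1, 2, 2, 3, 3),
  S3 = c(1, 1, 1, 2, 2),
  row.names = paste0("P", 1:5)
)

# Co-membership matrix for one subject: 1 if two products share a group.
co_membership <- function(groups) {
  m <- outer(groups, groups, "==") * 1
  dimnames(m) <- list(names(groups), names(groups))
  m
}

cm1 <- co_membership(setNames(sorting$S1, rownames(sorting)))
print(cm1)
```

Group labels only need to be consistent within a column; subject S1's group 1 has no relation to subject S2's group 1.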
Each partitionK contains a list for each number of clusters of the partition, K=1 to gpmax with:
group: the clustering partition of subjects after consolidation. If Noise_cluster=TRUE, some subjects could be in the noise cluster ("K+1")
rho: the threshold for the noise cluster
homogeneity: homogeneity index (percentage) of the subjects in each cluster and the overall homogeneity
rv_with_compromise: RV coefficient of each block with its cluster compromise
weights: weight associated with each subject in its cluster
comp_RV: RV coefficient between the compromises associated with the various clusters
compromise: the W compromise of each cluster
coord: the coordinates of objects of each cluster
inertia: percentage of total variance explained by each axis for each cluster
rv_all_cluster: the RV coefficient between each subject and each cluster compromise
criterion: the CLUSTATIS criterion error
param: parameters called in the consolidation
type: parameter passed to other functions
There is also at the end of the list:
dend: The CLUSTATIS dendrogram
cutree_k: the partition obtained by cutting the dendrogram for K clusters (before consolidation).
overall_homogeneity_ng: percentage of overall homogeneity by number of clusters before consolidation (and after if there is no noise cluster)
diff_crit_ng: variation of criterion when a merging is done before consolidation (and after if there is no noise cluster)
test_one_cluster: decision and pvalue to know if there is more than one cluster
param: parameters called
type: parameter passed to other functions
Llobell, F., Cariou, V., Vigneau, E., Labenne, A., & Qannari, E. M. (2018). Analysis and clustering of multiblock datasets by means of the STATIS and CLUSTATIS methods. Application to sensometrics. Food Quality and Preference, in Press.
Llobell, F., Vigneau, E., Qannari, E. M. (2019). Clustering datasets by means of CLUSTATIS with identification of atypical datasets. Application to sensometrics. Food Quality and Preference, 75, 97-104.
clustatis, preprocess_FreeSort, summary.clustatis, plot.clustatis
data(choc)
res.clu=clustatis_FreeSort(choc)
plot(res.clu, Graph_dend=FALSE)
summary(res.clu)
Partitioning algorithm for Free Sorting data. Each cluster is associated with a compromise computed by the STATIS method. Moreover, a noise cluster can be set up.
clustatis_FreeSort_kmeans(Data, NameSub=NULL, clust, nstart=100, rho=0,Itermax=30, Graph_groups=TRUE, Graph_weights=FALSE, print_attempt=FALSE)
Data |
data frame or matrix. Corresponds to all variables that contain subjects results. Each column corresponds to a subject and gives the groups to which the products (rows) are assigned |
NameSub |
string vector. Name of each subject. Length must be equal to the number of columns of Data. If NULL, the names are S1,...Sm. Default: NULL |
clust |
numerical vector or integer. Initial partition (numerical vector) or number of clusters (integer). If a numerical vector, the values must be 1,2,3,...,number of clusters |
nstart |
integer. Number of starting partitions. Default: 100 |
rho |
numerical between 0 and 1. Threshold for the noise cluster. If 0, there is no noise cluster. Default: 0 |
Itermax |
numerical. Maximum number of iterations of the partitioning algorithm. Default: 30 |
Graph_groups |
logical. Should each cluster compromise be plotted? Default: TRUE |
Graph_weights |
logical. Should the barplot of the weights in each cluster be plotted? Default: FALSE |
print_attempt |
logical. Print the number of remaining attempts in the multi-start case? Default: FALSE |
a list with:
group: the clustering partition. If rho>0, some subjects could be in the noise cluster ("K+1")
rho: the threshold for the noise cluster
homogeneity: percentage of homogeneity of the subjects in each cluster and the overall homogeneity
rv_with_compromise: RV coefficient of each subject with its cluster compromise
weights: weight associated with each subject in its cluster
comp_RV: RV coefficient between the compromises associated with the various clusters
compromise: the W compromise of each cluster
coord: the coordinates of objects of each cluster
inertia: percentage of total variance explained by each axis for each cluster
rv_all_cluster: the RV coefficient between each subject and each cluster compromise
criterion: the CLUSTATIS criterion error
param: parameters called
type: parameter passed to other functions
Llobell, F., Cariou, V., Vigneau, E., Labenne, A., & Qannari, E. M. (2018). Analysis and clustering of multiblock datasets by means of the STATIS and CLUSTATIS methods. Application to sensometrics. Food Quality and Preference, in Press.
Llobell, F., Vigneau, E., Qannari, E. M. (2019). Clustering datasets by means of CLUSTATIS with identification of atypical datasets. Application to sensometrics. Food Quality and Preference, 75, 97-104.
clustatis_FreeSort, preprocess_FreeSort, summary.clustatis, plot.clustatis
data(choc)
res.clu=clustatis_FreeSort_kmeans(choc, clust=2)
plot(res.clu, Graph_groups=FALSE, Graph_weights=TRUE)
summary(res.clu)
Partitioning algorithm for quantitative variables. Each cluster is associated with a compromise computed by the STATIS method. Moreover, a noise cluster can be set up.
clustatis_kmeans(Data, Blocks, clust, nstart=100, rho=0, NameBlocks=NULL, Itermax=30,Graph_groups=TRUE, Graph_weights=FALSE, scale=FALSE, print_attempt=FALSE)
Data |
data frame or matrix. Corresponds to all the blocks of variables merged horizontally |
Blocks |
numerical vector. The number of variables of each block. The sum must be equal to the number of columns of Data |
clust |
numerical vector or integer. Initial partition (numerical vector) or number of clusters (integer). If a numerical vector, the values must be 1,2,3,...,number of clusters |
nstart |
integer. Number of starting partitions. Default: 100 |
rho |
numerical between 0 and 1. Threshold for the noise cluster. If 0, there is no noise cluster. Default: 0 |
NameBlocks |
string vector. Name of each block. Length must be equal to the length of Blocks vector. If NULL, the names are B1,...Bm. Default: NULL |
Itermax |
numerical. Maximum number of iterations of the partitioning algorithm. Default: 30 |
Graph_groups |
logical. Should each cluster compromise be plotted? Default: TRUE |
Graph_weights |
logical. Should the barplot of the weights in each cluster be plotted? Default: FALSE |
scale |
logical. Should the data variables be scaled? Default: FALSE |
print_attempt |
logical. Print the number of remaining attempts in the multi-start case? Default: FALSE |
a list with:
group: the clustering partition. If rho>0, some blocks could be in the noise cluster ("K+1")
rho: the threshold for the noise cluster
homogeneity: percentage of homogeneity of the blocks in each cluster and the overall homogeneity
rv_with_compromise: RV coefficient of each block with its cluster compromise
weights: weight associated with each block in its cluster
comp_RV: RV coefficient between the compromises associated with the various clusters
compromise: the W compromise of each cluster
coord: the coordinates of objects of each cluster
inertia: percentage of total variance explained by each axis for each cluster
rv_all_cluster: the RV coefficient between each block and each cluster compromise
criterion: the CLUSTATIS criterion error
param: parameters called
type: parameter passed to other functions
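The group and rv_all_cluster components are linked through the rho threshold: blocks that are not close enough to any cluster compromise can end up in the noise cluster, labelled K+1. The sketch below illustrates this idea on a made-up similarity matrix. The assignment rule used here (a block goes to the noise cluster when its best similarity is below rho) is an assumption made for illustration, and assign_with_noise is a hypothetical helper; the package's actual criterion is described in Llobell, Vigneau & Qannari (2019).

```r
# Toy similarity matrix: 5 blocks (rows) x 2 cluster compromises (columns),
# playing the role of rv_all_cluster.
sim <- matrix(c(0.90, 0.20,
                0.85, 0.30,
                0.10, 0.80,
                0.25, 0.15,   # atypical block: low similarity with both
                0.20, 0.75),
              ncol = 2, byrow = TRUE,
              dimnames = list(paste0("B", 1:5), c("cluster1", "cluster2")))

# Hypothetical helper: assign each block to its most similar compromise,
# or to the noise cluster K+1 when even the best similarity is below rho.
assign_with_noise <- function(sim, rho) {
  K <- ncol(sim)
  best <- apply(sim, 1, which.max)
  if (rho > 0) {
    best[apply(sim, 1, max) < rho] <- K + 1   # assumed thresholding rule
  }
  best
}

groups <- assign_with_noise(sim, rho = 0.5)
print(groups)
```

With rho = 0 the same helper returns an ordinary partition in which every block belongs to one of the K clusters.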
Llobell, F., Cariou, V., Vigneau, E., Labenne, A., & Qannari, E. M. (2018). Analysis and clustering of multiblock datasets by means of the STATIS and CLUSTATIS methods. Application to sensometrics. Food Quality and Preference, in Press.
Llobell, F., Vigneau, E., Qannari, E. M. (2019). Clustering datasets by means of CLUSTATIS with identification of atypical datasets. Application to sensometrics. Food Quality and Preference, 75, 97-104.
plot.clustatis, clustatis, summary.clustatis, statis
data(smoo)
NameBlocks=paste0("S",1:24)
#with multi-start
cl_km=clustatis_kmeans(Data=smoo,Blocks=rep(2,24),NameBlocks = NameBlocks, clust=3)
#with an initial partition
cl=clustatis(Data=smoo,Blocks=rep(2,24),NameBlocks = NameBlocks, Graph_dend=FALSE)
partition=cl$cutree_k$partition3
cl_km2=clustatis_kmeans(Data=smoo,Blocks=rep(2,24),NameBlocks = NameBlocks, clust=partition, Graph_weights=FALSE, Graph_groups=FALSE)
graphics.off()
Clustering of rows (products in sensory analysis) in a Multi-block context. The STATIS method is followed by a hierarchical algorithm.
clustRowsOnStatisAxes(Data, Blocks, NameBlocks=NULL, scale=FALSE, nclust=NULL, gpmax=6, ncomp=5)
Data |
data frame or matrix. Corresponds to all the blocks of variables merged horizontally |
Blocks |
numerical vector. The number of variables of each block. The sum must be equal to the number of columns of Data. |
NameBlocks |
string vector. Name of each block. Length must be equal to the length of Blocks vector. If NULL, the names are B1,...Bm. Default: NULL |
scale |
logical. Should the data variables be scaled? Default: FALSE |
nclust |
numerical. Number of clusters to consider. If NULL, the Hartigan index advice is taken. |
gpmax |
numerical. Maximum number of clusters to consider. Default: min(6, number of blocks -2) |
ncomp |
numerical. Number of axes to consider. Default: 5 |
group: the clustering partition.
nbgH: Advised number of clusters per Hartigan index
nbgCH: Advised number of clusters per Calinski-Harabasz index
cutree_k: the partition obtained by cutting the dendrogram in K clusters
dend: The dendrogram
param: parameters called
type: parameter passed to other functions
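The nbgCH component reports the number of clusters advised by the Calinski-Harabasz index. As an illustration of how such advice can be derived from a dendrogram, the base R sketch below computes the standard Calinski-Harabasz index for several cuts of a Ward tree on toy data and keeps the cut with the highest value. The function ch_index and the toy data are hypothetical, and the package may compute the index on different inputs (for example STATIS axes).

```r
# Standard Calinski-Harabasz index: (B / (k - 1)) / (W / (n - k)),
# with B and W the between- and within-cluster sums of squares.
ch_index <- function(X, groups) {
  X <- as.matrix(X)
  n <- nrow(X)
  k <- length(unique(groups))
  total <- sum(sweep(X, 2, colMeans(X))^2)
  within <- sum(sapply(unique(groups), function(g) {
    Xg <- X[groups == g, , drop = FALSE]
    sum(sweep(Xg, 2, colMeans(Xg))^2)
  }))
  ((total - within) / (k - 1)) / (within / (n - k))
}

# Toy rows: three tight groups of four points around (0,0), (3,3), (6,6).
offsets <- rbind(c(-0.1, -0.1), c(-0.1, 0.1), c(0.1, -0.1), c(0.1, 0.1))
X <- rbind(offsets, offsets + 3, offsets + 6)

tree <- hclust(dist(X), method = "ward.D2")
ch <- sapply(2:6, function(k) ch_index(X, cutree(tree, k)))
names(ch) <- 2:6
best_k <- as.integer(names(which.max(ch)))
print(best_k)
```

The same loop could be run with the Hartigan index in place of ch_index to mimic the nbgH advice.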
Paper submitted
indicesClusters, summary.clusRows, ClusMB
#####projective mapping####
library(ClustBlock)
data(smoo)
res1=clustRowsOnStatisAxes(smoo, rep(2,24))
summary(res1)
indicesClusters(smoo, rep(2,24), res1$group)
####CATA####
data(fish)
Data=fish[1:66,2:30]
chang2=change_cata_format2(Data, nprod= 6, nattr= 27, nsub = 11, nsess= 1)
res2=clustRowsOnStatisAxes(Data= chang2$Datafinal, Blocks= rep(27, 11))
indicesClusters(Data= chang2$Datafinal, Blocks= rep(27, 11),cut = res2$group, center=FALSE)
Permutation test on the agreement between subjects for each attribute in a CATA experiment
consistency_cata(Data,nblo, nperm=100, alpha=0.05, printAttrTest=FALSE)
Data |
data frame or matrix. Corresponds to all the blocks of variables merged horizontally |
nblo |
numerical. Number of blocks (subjects). |
nperm |
numerical. How many permutations are required? Default: 100 |
alpha |
numerical between 0 and 1. What is the threshold? Default: 0.05 |
printAttrTest |
logical. Print the number of remaining attributes to be tested? Default: FALSE |
a list with:
consist: the consistent attributes
no_consist: the inconsistent attributes
pval: pvalue for each test
Llobell, F., Giacalone, D., Labenne, A., Qannari, E.M. (2019). Assessment of the agreement and cluster analysis of the respondents in a CATA experiment. Food Quality and Preference, 77, 184-190.
consistency_cata_panel, change_cata_format, change_cata_format2
data(straw)
#with only 40 subjects
consistency_cata(Data=straw[,1:(16*40)], nblo=40)
#with all subjects
consistency_cata(Data=straw, nblo=114, printAttrTest=TRUE)
Permutation test on the agreement between subjects in a CATA experiment
consistency_cata_panel(Data,nblo, nperm=100, alpha=0.05)
Data |
data frame or matrix. Corresponds to all the blocks of variables merged horizontally |
nblo |
numerical. Number of blocks (subjects). |
nperm |
numerical. How many permutations are required? Default: 100 |
alpha |
numerical between 0 and 1. What is the threshold? Default: 0.05 |
a list with:
answer: the answer of the test
pval: pvalue of the test
dis: distance between the homogeneity and the median of the permutations
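To make the roles of nperm, alpha, pval and dis more concrete, here is a generic permutation test sketch in base R. It is not the package's procedure: the agreement statistic is a simple stand-in (mean pairwise cosine similarity between subject blocks) rather than the homogeneity index, and permuting the product rows independently within each subject is an assumption. The names agreement and perm_test are hypothetical.

```r
# Stand-in agreement statistic: mean pairwise cosine similarity between
# the subjects' (products x attributes) blocks.
agreement <- function(blocks) {
  v <- sapply(blocks, as.vector)              # one column per subject
  v <- sweep(v, 2, sqrt(colSums(v^2)), "/")   # unit-norm columns
  S <- crossprod(v)
  mean(S[upper.tri(S)])
}

# Generic permutation test: shuffle products within each subject's block.
perm_test <- function(blocks, nperm = 100, alpha = 0.05) {
  observed <- agreement(blocks)
  perm <- replicate(nperm, {
    shuffled <- lapply(blocks, function(b) b[sample(nrow(b)), , drop = FALSE])
    agreement(shuffled)
  })
  pval <- (sum(perm >= observed) + 1) / (nperm + 1)
  list(observed = observed,
       pval = pval,
       agree = pval < alpha,                  # TRUE: agreement is significant
       dis = observed - median(perm))
}

# 6 products x 4 CATA attributes, all rows distinct.
base <- rbind(c(1, 0, 0, 0), c(0, 1, 0, 0), c(0, 0, 1, 0),
              c(0, 0, 0, 1), c(1, 1, 0, 0), c(0, 0, 1, 1))
blocks <- replicate(5, base, simplify = FALSE)  # 5 subjects, perfect agreement

set.seed(123)
res <- perm_test(blocks, nperm = 99)
print(res$pval)
```

Because the five toy subjects give identical answers, the observed agreement is far above what shuffled data produce, so the test concludes that the panel agrees.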
Llobell, F., Giacalone, D., Labenne, A., Qannari, E.M. (2019). Assessment of the agreement and cluster analysis of the respondents in a CATA experiment. Food Quality and Preference, 77, 184-190.
Bonnet, L., Ferney, T., Riedel, T., Qannari, E.M., Llobell, F. (September 14, 2022). Using CATA for sensory profiling: assessment of the panel performance. Eurosense, Turku, Finland.
consistency_cata, change_cata_format, change_cata_format2
data(straw)
#with all subjects
consistency_cata_panel(Data=straw, nblo=114)
fish data
data(fish)
CATA data with sessions. A data frame with the sessions, the panelists, the products and CATA attributes.
Bonnet, L., Ferney, T., Riedel, T., Qannari, E.M., Llobell, F. (September 14, 2022). Using CATA for sensory profiling: assessment of the panel performance. Eurosense, Turku, Finland.
data(fish)
Compute the Il index to evaluate the agreement between each block and the global partition (in sensory: agreement between each subject and the global partition)
Compute the Jl index to evaluate if each block has a partition (in sensory: if each subject made a partition of products)
indicesClusters(Data, Blocks, cut, NameBlocks=NULL, center=TRUE, scale=FALSE)
Data |
data frame or matrix. Corresponds to all the blocks of variables merged horizontally |
Blocks |
numerical vector. The number of variables of each block. The sum must be equal to the number of columns of Data. |
cut |
numerical vector. The partition of the cluster analysis. |
NameBlocks |
string vector. Name of each block. Length must be equal to the length of Blocks vector. If NULL, the names are B1,...Bm. Default: NULL |
center |
logical. Should the data variables be centered? Default: TRUE. Please set to FALSE for a CATA experiment |
scale |
logical. Should the data variables be scaled? Default: FALSE |
Il: the Il indices
jl: the Jl indices
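To make the role of the Blocks argument concrete, here is a minimal base R sketch (not the package's internal code) that checks the constraint on Blocks and slices a horizontally merged table into its blocks. The toy sizes and the names first, last and block_list are illustrative assumptions.

```r
# Toy data (hypothetical): 4 products in rows, 3 blocks with 2, 3 and 1 variables
set.seed(1)
Blocks <- c(2, 3, 1)
Data <- matrix(rnorm(4 * sum(Blocks)), nrow = 4)

# The sum of Blocks must equal the number of columns of Data
stopifnot(sum(Blocks) == ncol(Data))

# First and last column index of each block
last <- cumsum(Blocks)
first <- last - Blocks + 1

# Split Data into a list of blocks, named B1,...,Bm like the default NameBlocks
block_list <- lapply(seq_along(Blocks), function(k) {
  Data[, first[k]:last[k], drop = FALSE]
})
names(block_list) <- paste0("B", seq_along(Blocks))

# Number of variables found in each block
sapply(block_list, ncol)
```

For the smoo data used in the examples, the same logic applies with Blocks=rep(2,24): each consumer contributes two columns.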
Llobell, F., Qannari, E.M. (June 10, 2022). Cluster analysis in a multi-bloc setting. SMTDA, Athens, Greece.
Llobell, F., Giacalone, D., Qannari, E. M. (Pangborn 2021). Cluster Analysis of products in CATA experiments.
Paper submitted
clustRowsOnStatisAxes, ClusMB
#####projective mapping####
library(ClustBlock)
data(smoo)
res1=ClusMB(smoo, rep(2,24))
summary(res1)
indicesClusters(smoo, rep(2,24), res1$group)
####CATA####
data(fish)
Data=fish[1:66,2:30]
chang2=change_cata_format2(Data, nprod= 6, nattr= 27, nsub = 11, nsess= 1)
res2=ClusMB(Data= chang2$Datafinal, Blocks= rep(27, 11), center=FALSE)
indicesClusters(Data= chang2$Datafinal, Blocks= rep(27, 11),cut = res2$group, center=FALSE)
This function plots the CATATIS map and CATATIS weights
## S3 method for class 'catatis'
plot(x, Graph=TRUE, Graph_weights=TRUE, Graph_eig=TRUE, axes=c(1,2), tit="CATATIS", cex=1, col.obj="blue", col.attr="red", ...)
x |
object of class 'catatis' |
Graph |
logical. Show the graphical representation? Default: TRUE |
Graph_weights |
logical. Should the barplot of the weights be plotted? Default: TRUE |
Graph_eig |
logical. Should the barplot of the eigenvalues be plotted? Only with Graph=TRUE. Default: TRUE |
axes |
numerical vector (length 2). Axes to be plotted. Default: c(1,2) |
tit |
string. Title for the graphical representation. Default: 'CATATIS' |
cex |
numerical. Numeric character expansion factor; multiplied by par("cex") yields the final character size. NULL and NA are equivalent to 1.0. |
col.obj |
numerical or string. Color for the objects points. Default: "blue" |
col.attr |
numerical or string. Color for the attributes points. Default: "red" |
... |
further arguments passed to or from other methods |
the CATATIS map
data(straw)
res.cat=catatis(straw, nblo=114)
plot(res.cat, Graph_weights=FALSE, axes=c(1,3))
This function plots dendrogram, variation of the merging criterion, weights and CATATIS map of each cluster
## S3 method for class 'cluscata'
plot(x, ngroups=NULL, Graph_groups=TRUE, Graph_dend=TRUE, Graph_bar=FALSE, Graph_weights=FALSE, axes=c(1,2), cex=1, col.obj="blue", col.attr="red", ...)
x |
object of class 'cluscata'. |
ngroups |
number of groups to consider. Ignored for cluscata_kmeans results. Default: recommended number of clusters |
Graph_groups |
logical. Should each cluster compromise graphical representation be plotted? Default: TRUE |
Graph_dend |
logical. Should the dendrogram be plotted? Default: TRUE |
Graph_bar |
logical. Should the barplot of the difference of the criterion and the barplot of the overall homogeneity at each merging step of the hierarchical algorithm be plotted? Also available after consolidation if Noise_cluster=FALSE. Default: FALSE |
Graph_weights |
logical. Should the barplot of the weights in each cluster be plotted? Default: FALSE |
axes |
numerical vector (length 2). Axes to be plotted. Default: c(1,2) |
cex |
numerical. Numeric character expansion factor; multiplied by par("cex") yields the final character size. NULL and NA are equivalent to 1.0. |
col.obj |
numerical or string. Color for the objects points. Default: "blue" |
col.attr |
numerical or string. Color for the attributes points. Default: "red" |
... |
further arguments passed to or from other methods |
the CLUSCATA graphs
data(straw)
res=cluscata(Data=straw[,1:(16*40)], nblo=40)
plot(res, ngroups=3, Graph_dend=FALSE)
plot(res, ngroups=3, Graph_dend=FALSE,Graph_bar=FALSE, Graph_weights=FALSE, axes=c(1,3))
This function plots dendrogram, variation of the merging criterion, weights and STATIS map of each cluster
## S3 method for class 'clustatis'
plot(x, ngroups=NULL, Graph_groups=TRUE, Graph_dend=TRUE, Graph_bar=FALSE, Graph_weights=FALSE, axes=c(1,2), col=NULL, cex=1, font=1, ...)
x |
object of class 'clustatis'. |
ngroups |
number of groups to consider. Ignored for clustatis_kmeans results. Default: recommended number of clusters |
Graph_groups |
logical. Should each cluster compromise graphical representation be plotted? Default: TRUE |
Graph_dend |
logical. Should the dendrogram be plotted? Default: TRUE |
Graph_bar |
logical. Should the barplot of the difference of the criterion and the barplot of the overall homogeneity at each merging step of the hierarchical algorithm be plotted? Also available after consolidation if Noise_cluster=FALSE. Default: FALSE |
Graph_weights |
logical. Should the barplot of the weights in each cluster be plotted? Default: FALSE |
axes |
numerical vector (length 2). Axes to be plotted. Default: c(1,2) |
col |
vector. Color for each object. Default: rainbow(nrow(Data)) |
cex |
numerical. Numeric character expansion factor; multiplied by par("cex") yields the final character size. NULL and NA are equivalent to 1.0. |
font |
numerical. Integer specifying font to use for text. 1=plain, 2=bold, 3=italic, 4=bold italic, 5=symbol. Default: 1 |
... |
further arguments passed to or from other methods |
the CLUSTATIS graphs
data(smoo)
NameBlocks=paste0("S",1:24)
cl=clustatis(Data=smoo,Blocks=rep(2,24),NameBlocks = NameBlocks)
plot(cl, ngroups=3, Graph_dend=FALSE)
plot(cl, ngroups=3, Graph_dend=FALSE, axes=c(1,3))
graphics.off()
This function plots the STATIS map and STATIS weights
## S3 method for class 'statis'
plot(x, axes=c(1,2), Graph_obj=TRUE, Graph_weights=TRUE, Graph_eig=TRUE, tit="STATIS", col=NULL, cex=1, font=1, xlim=NULL, ylim=NULL, ...)
x |
object of class 'statis' |
axes |
numerical vector (length 2). Axes to be plotted. Default: c(1,2) |
Graph_obj |
logical. Should the compromise graphical representation be plotted? Default: TRUE |
Graph_weights |
logical. Should the barplot of the weights be plotted? Default: TRUE |
Graph_eig |
logical. Should the barplot of the eigenvalues be plotted? Only with Graph_obj=TRUE. Default: TRUE |
tit |
string. Title for the objects graphical representation. Default: 'STATIS' |
col |
vector. Color for each object. If NULL, col=rainbow(nrow(Data)). Default: NULL |
cex |
numerical. Numeric character expansion factor; multiplied by par("cex") yields the final character size. NULL and NA are equivalent to 1.0. |
font |
numerical. Integer specifying font to use for text. 1=plain, 2=bold, 3=italic, 4=bold italic, 5=symbol. Default: 1 |
xlim |
numerical vector (length 2). Minimum and maximum for x coordinates. |
ylim |
numerical vector (length 2). Minimum and maximum for y coordinates. |
... |
further arguments passed to or from other methods |
the STATIS graphs
data(smoo)
NameBlocks=paste0("S",1:24)
st=statis(Data=smoo,Blocks=rep(2,24),NameBlocks = NameBlocks)
plot(st, axes=c(1,3), Graph_weights=FALSE)
For Free Sorting Data, this preprocessing is needed.
preprocess_FreeSort(Data, NameSub=NULL)
Data |
data frame or matrix. Corresponds to all variables that contain subjects results. Each column corresponds to a subject and gives the groups to which the products (rows) are assigned |
NameSub |
string vector. Name of each subject. Length must be equal to the number of columns of the Data. If NULL, the names are S1,...Sm. Default: NULL |
A list with:
new_Data: the Data transformed
Blocks: the number of groups for each subject
NameBlocks: the name of each subject
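As a rough illustration of these outputs, the sketch below builds one indicator matrix per subject in base R and merges them horizontally. It assumes that the transformed data are products-by-groups indicator matrices, which is consistent with Blocks being the number of groups per subject but is not stated above; the toy data and the helper to_indicator are hypothetical.

```r
# Toy free sorting data (hypothetical): 5 products sorted by 3 subjects.
# Each column gives the group to which a subject assigned each product.
Data <- data.frame(S1 = c(1, 1, 2, 2, 3),
                   S2 = c(1, 2, 2, 1, 1),
                   S3 = c(1, 2, 3, 4, 4))
rownames(Data) <- paste0("P", 1:5)

# Indicator matrix of one subject: products in rows, groups in columns
to_indicator <- function(g) {
  f <- factor(g)
  outer(as.integer(f), seq_len(nlevels(f)), "==") * 1
}

# One block per subject, then horizontal merge
blocks <- lapply(Data, to_indicator)
new_Data <- do.call(cbind, blocks)
rownames(new_Data) <- rownames(Data)

# Number of groups made by each subject, and subject names
Blocks <- sapply(blocks, ncol)
NameBlocks <- names(blocks)
```

With this layout, every product belongs to exactly one group per subject, so each row of a block sums to 1.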
Llobell, F., Cariou, V., Vigneau, E., Labenne, A., & Qannari, E. M. (2018). Analysis and clustering of multiblock datasets by means of the STATIS and CLUSTATIS methods. Application to sensometrics. Food Quality and Preference, in Press.
data(choc)
prepro=preprocess_FreeSort(choc)
For JAR data, this preprocessing is needed.
preprocess_JAR(Data, nprod, nsub, levelsJAR=3, beta=0.1)
Data |
data frame where the first column is the Assessors, the second is the products and all other columns the JAR attributes with numbers (1 to 3 or 1 to 5, see levelsJAR) |
nprod |
integer. Number of products. |
nsub |
integer. Number of subjects. |
levelsJAR |
integer. 3 or 5 levels. If 5, the data will be transformed in 3 levels. |
beta |
numerical. Parameter for agreement between JAR and other answers. Between 0 and 0.5. |
A list with:
Datafinal: the Data transformed
NameSub: the name of each subject in the right order
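The reduction from 5 to 3 JAR levels mentioned for levelsJAR can be pictured with the short base R sketch below. The collapsing rule used here (1-2 become 1, 3 becomes 2, 4-5 become 3) is an assumption made for illustration, and collapse_jar is a hypothetical helper; the package may proceed differently.

```r
# Hypothetical 5-point JAR scores given by one subject for one attribute
jar5 <- c(1, 2, 3, 4, 5, 3, 2)

# Assumed rule: 1-2 -> 1 (not enough), 3 -> 2 (just about right), 4-5 -> 3 (too much)
collapse_jar <- function(x) {
  stopifnot(all(x %in% 1:5))
  # Intervals (0,2], (2,3], (3,5] are coded 1, 2, 3
  as.integer(cut(x, breaks = c(0, 2, 3, 5), labels = FALSE))
}

jar3 <- collapse_jar(jar5)
table(jar5, jar3)
```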
Llobell, F., Vigneau, E. & Qannari, E. M. (September 14, 2022). Multivariate data analysis and clustering of subjects in a Just about right task. Eurosense, Turku, Finland.
catatis_jar, cluscata_jar, cluscata_kmeans_jar
data(cheese)
prepro=preprocess_JAR(cheese, nprod=8, nsub=72, levelsJAR=5)
Print the CATATIS results
## S3 method for class 'catatis'
print(x, ...)
x |
object of class 'catatis' |
... |
further arguments passed to or from other methods |
Print the CLUSCATA results
## S3 method for class 'cluscata'
print(x, ...)
x |
object of class 'cluscata' |
... |
further arguments passed to or from other methods |
Print the ClusMB or clustering on STATIS axes results
## S3 method for class 'clusRows'
print(x, ...)
x |
object of class 'clusRows' |
... |
further arguments passed to or from other methods |
Print the CLUSTATIS results
## S3 method for class 'clustatis'
print(x, ...)
x |
object of class 'clustatis' |
... |
further arguments passed to or from other methods |
Print the STATIS results
## S3 method for class 'statis'
print(x, ...)
x |
object of class 'statis' |
... |
further arguments passed to or from other methods |
RATA data on chocolates
data(RATAchoc)
RATA data with sessions. A data frame with 3 sessions, 9 panelists, 12 products and 27 RATA attributes.
Pangborn 2023
data(RATAchoc)
Test adapted to CATA data to determine whether two predetermined groups of subjects have a different perception or not. For example, men and women.
simil_groups_cata(Data, groups, one=1, two=2, nperm=50, Graph=TRUE, alpha= 0.05, printl=FALSE)
Data |
data frame or matrix. Corresponds to all the blocks of variables merged horizontally |
groups |
categorical vector. The group of each subject. The length must be equal to the number of subjects. |
one |
string. Name of the group 1 in groups vector. |
two |
string. Name of the group 2 in groups vector. |
nperm |
numerical. How many permutations are required? Default: 50 |
Graph |
logical. Should the CATATIS graph of each group be plotted? Default: TRUE |
alpha |
numerical between 0 and 1. What is the threshold of the test? Default: 0.05 |
printl |
logical. Print the number of remaining permutations during the algorithm? Default: FALSE |
a list with:
decision: the decision of the test
pval: pvalue of the test
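The general logic of a label-permutation test with nperm permutations and a threshold alpha can be sketched in base R as follows. The test statistic used by simil_groups_cata is not described here, so this sketch relies on a placeholder statistic (absolute difference of group means on a hypothetical per-subject score) and on a common p-value convention; both are assumptions, not the package's method.

```r
set.seed(123)

# Hypothetical per-subject scores and group membership (20 subjects per group)
score <- c(rnorm(20, mean = 0), rnorm(20, mean = 1))
groups <- rep(c(1, 2), each = 20)

# Placeholder statistic: absolute difference between the two group means
stat <- function(s, g) abs(mean(s[g == 1]) - mean(s[g == 2]))

# Observed value, then values obtained after shuffling the group labels
obs <- stat(score, groups)
nperm <- 50
perm <- replicate(nperm, stat(score, sample(groups)))

# Share of permutations at least as extreme as the observed value
# (the +1 correction is one usual convention, assumed here)
pval <- (sum(perm >= obs) + 1) / (nperm + 1)

# Decision at threshold alpha
alpha <- 0.05
decision <- if (pval < alpha) "groups differ" else "no evidence of a difference"
```

With only nperm=50 permutations the smallest attainable p-value under this convention is 1/51, so a larger nperm gives a finer resolution at the cost of computing time.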
Llobell, F., Giacalone, D., Jaeger, S.R. & Qannari, E. M. (2021). CATA data: Are there differences in perception? JSM conference.
Llobell, F., Giacalone, D., Jaeger, S.R. & Qannari, E. M. (2021). CATA data: Are there differences in perception? AgroStat conference.
data(straw)
groups=sample(1:2, 114, replace=TRUE)
simil_groups_cata(straw, groups, one=1, two=2)
smoothies data
data(smoo)
Projective mapping (or Napping) data. A data frame with 8 rows (the number of smoothies) and 48 columns (the number of consumers * 2). For each consumer, we have the coordinates of the products on the sheet of paper.
Francois Husson, Sebastien Le and Marine Cadoret (2017). SensoMineR: Sensory Data Analysis. R package version 1.23. https://CRAN.R-project.org/package=SensoMineR
data(smoo)
STATIS method on quantitative blocks. Supplementary outputs are also computed.
statis(Data,Blocks,NameBlocks=NULL,Graph_obj=TRUE, Graph_weights=TRUE, scale=FALSE)
Data |
data frame or matrix. Corresponds to all the blocks of variables merged horizontally |
Blocks |
numerical vector. The number of variables of each block. The sum must be equal to the number of columns of Data |
NameBlocks |
string vector. Name of each block. Length must be equal to the length of Blocks vector. If NULL, the names are B1,...Bm. Default: NULL |
Graph_obj |
logical. Show the graphical representation of the objects? Default: TRUE |
Graph_weights |
logical. Should the barplot of the weights be plotted? Default: TRUE |
scale |
logical. Should the data variables be scaled? Default: FALSE |
a list with:
RV: the RV matrix: a matrix with the RV coefficient between blocks of variables
compromise: a matrix which is the compromise of the blocks (akin to a weighted average)
weights: the weights associated with the blocks to build the compromise
lambda: the first eigenvalue of the RV matrix
overall error : the error for the STATIS criterion
error_by_conf: the error by configuration (STATIS criterion)
rv_with_compromise: the RV coefficient of each block with the compromise
homogeneity: homogeneity of the blocks (in percentage)
coord: the coordinates of each object
eigenvalues: the eigenvalues of the svd decomposition
inertia: the percentage of total variance explained by each axis
error_by_obj: the error by object (STATIS criterion)
scalefactors: the scaling factors of each block
proj_config: the projection of each object of each configuration on the axes: presentation by configuration
proj_objects: the projection of each object of each configuration on the axes: presentation by object
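To relate the RV, lambda, weights, compromise and inertia outputs to one another, here is a minimal base R sketch of the classical STATIS steps on toy data. It is not the package's code: the toy sizes, the variable names and the normalization of the weights (summing to 1) are assumptions made for readability.

```r
set.seed(42)
n <- 8                       # number of objects (toy value)
Blocks <- c(2, 2, 2)         # three blocks of two variables (toy value)
Data <- matrix(rnorm(n * sum(Blocks)), nrow = n)
last <- cumsum(Blocks)
first <- last - Blocks + 1

# Scalar-product matrix X X' of each column-centered block,
# scaled to unit Frobenius norm
W <- lapply(seq_along(Blocks), function(k) {
  X <- scale(Data[, first[k]:last[k], drop = FALSE], center = TRUE, scale = FALSE)
  S <- tcrossprod(X)
  S / sqrt(sum(S^2))
})

# RV coefficient between two blocks = inner product of their normalized matrices
m <- length(W)
RV <- matrix(0, m, m)
for (i in 1:m) for (j in 1:m) RV[i, j] <- sum(W[[i]] * W[[j]])

# First eigenvalue and eigenvector of the RV matrix
e <- eigen(RV, symmetric = TRUE)
lambda <- e$values[1]
u <- abs(e$vectors[, 1])     # same-sign entries, taken as positive
weights <- u / sum(u)        # normalization assumed for this sketch

# Compromise: weighted average of the normalized block matrices
compromise <- Reduce(`+`, Map(`*`, weights, W))

# Percentage of variance carried by each axis of the compromise
ev <- pmax(eigen(compromise, symmetric = TRUE)$values, 0)
inertia <- 100 * ev / sum(ev)
```

Blocks that agree with the others get larger weights, which is why the compromise is described above as akin to a weighted average.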
Lavit, C., Escoufier, Y., Sabatier, R., Traissac, P. (1994). The ACT (STATIS method). Computational Statistics & Data Analysis, 18(1), 97-119.
Llobell, F., Cariou, V., Vigneau, E., Labenne, A., & Qannari, E. M. (2018). Analysis and clustering of multiblock datasets by means of the STATIS and CLUSTATIS methods. Application to sensometrics. Food Quality and Preference, in Press.
data(smoo)
NameBlocks=paste0("S",1:24)
st=statis(Data=smoo, Blocks=rep(2,24),NameBlocks = NameBlocks)
#plot(st, axes=c(1,3))
summary(st)
#with variables scaling
st2=statis(Data=smoo, Blocks=rep(2,24),NameBlocks = NameBlocks, Graph_weights=FALSE, scale=TRUE)
STATIS method on Free Sorting data. Supplementary outputs are also computed.
statis_FreeSort(Data, NameSub=NULL, Graph_obj=TRUE, Graph_weights=TRUE)
Data |
data frame or matrix. Corresponds to all variables that contain subjects results. Each column corresponds to a subject and gives the groups to which the products (rows) are assigned |
NameSub |
string vector. Name of each subject. Length must be equal to the number of columns of the Data. If NULL, the names are S1,...Sm. Default: NULL |
Graph_obj |
logical. Show the graphical representation of the objects? Default: TRUE |
Graph_weights |
logical. Should the barplot of the weights be plotted? Default: TRUE |
a list with:
RV: the RV matrix: a matrix with the RV coefficient between subjects
compromise: a matrix which is the compromise of the subjects (akin to a weighted average)
weights: the weights associated with the subjects to build the compromise
lambda: the first eigenvalue of the RV matrix
overall error : the error for the STATIS criterion
error_by_conf: the error by configuration (STATIS criterion)
rv_with_compromise: the RV coefficient of each subject with the compromise
homogeneity: homogeneity of the subjects (in percentage)
coord: the coordinates of each object
eigenvalues: the eigenvalues of the svd decomposition
inertia: the percentage of total variance explained by each axis
error_by_obj: the error by object (STATIS criterion)
scalefactors: the scaling factors of each subject
proj_config: the projection of each object of each subject on the axes: presentation by subject
proj_objects: the projection of each object of each subject on the axes: presentation by object
Lavit, C., Escoufier, Y., Sabatier, R., Traissac, P. (1994). The ACT (STATIS method). Computational Statistics & Data Analysis, 18(1), 97-119.
Llobell, F., Cariou, V., Vigneau, E., Labenne, A., & Qannari, E. M. (2018). Analysis and clustering of multiblock datasets by means of the STATIS and CLUSTATIS methods. Application to sensometrics. Food Quality and Preference, in Press.
preprocess_FreeSort, clustatis_FreeSort
data(choc)
res.sta=statis_FreeSort(choc)
strawberries data
data(straw)
CATA data. A data frame with 6 rows (the number of strawberries) and 1824 columns (the number of consumers (114) * the number of attributes (16)). For each consumer, each attribute and each product, the value is 1 if the consumer checked the attribute for the product, and 0 otherwise.
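The column layout described here (one contiguous block of attribute columns per consumer) can be explored with a few lines of base R. The sketch below uses simulated binary data with toy sizes rather than straw itself, and consumer_block is a hypothetical helper; it extracts one consumer's block and sums the blocks into a products-by-attributes table of citation counts.

```r
set.seed(7)
nprod <- 6; nattr <- 4; nblo <- 3       # toy sizes, not those of straw

# Simulated CATA table: products in rows, nblo blocks of nattr binary columns
Data <- matrix(rbinom(nprod * nattr * nblo, size = 1, prob = 0.4), nrow = nprod)

# Columns (k-1)*nattr + 1 to k*nattr hold the answers of consumer k
consumer_block <- function(k) {
  Data[, ((k - 1) * nattr + 1):(k * nattr), drop = FALSE]
}

# Number of consumers who checked each attribute for each product
counts <- Reduce(`+`, lapply(seq_len(nblo), consumer_block))
dim(counts)
```

For straw the same indexing would use nattr=16 and nblo=114, which matches the 1824 columns stated above.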
Ares, G., & Jaeger, S. R. (2013). Check-all-that-apply questions: Influence of attribute order on sensory product characterization. Food Quality and Preference, 28(1), 141-153.
data(straw)
This function shows the CATATIS results
## S3 method for class 'catatis'
summary(object, ...)
object |
object of class 'catatis'. |
... |
further arguments passed to or from other methods |
a list with:
homogeneity: homogeneity of the subjects (in percentage)
weights: the weights associated with the subjects to build the compromise
eigenvalues: the eigenvalues associated with the correspondence analysis
inertia: the percentage of total variance explained by each axis of the CA
This function shows the cluscata results
## S3 method for class 'cluscata'
summary(object, ngroups=NULL, ...)
object |
object of class 'cluscata'. |
ngroups |
number of groups to consider. Ignored for cluscata_kmeans results. Default: recommended number of clusters |
... |
further arguments passed to or from other methods |
the CLUSCATA principal results
a list with:
group: the clustering partition
homogeneity: homogeneity index (
weights: weight associated with each subject in its cluster
rho: the threshold for the noise cluster
test_one_cluster: decision and pvalue to know if there is more than one cluster
This function shows the ClusMB or clustering on STATIS axes results
## S3 method for class 'clusRows'
summary(object, ...)
object |
object of class 'clusRows'. |
... |
further arguments passed to or from other methods |
a list with:
groups: clustering partition
nbClustRetained: the number of clusters retained
nbgH: Advised number of clusters per Hartigan index
nbgCH: Advised number of clusters per Calinski-Harabasz index
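For readers unfamiliar with the Calinski-Harabasz index behind nbgCH, the base R sketch below computes it for several numbers of clusters on toy coordinates and keeps the number that maximizes it. It uses stats::kmeans on simulated points and a hypothetical helper ch_index; how the package computes its own recommendation is not described here, so this is only an illustration of the index.

```r
set.seed(3)

# Toy coordinates: three well-separated groups of 10 points in 2 dimensions
X <- rbind(matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(20, mean = 3, sd = 0.3), ncol = 2),
           matrix(rnorm(20, mean = 6, sd = 0.3), ncol = 2))

# Calinski-Harabasz index for a k-means partition into k clusters:
# (between-cluster SS / (k - 1)) / (within-cluster SS / (n - k))
ch_index <- function(X, k) {
  km <- kmeans(X, centers = k, nstart = 20)
  n <- nrow(X)
  (km$betweenss / (k - 1)) / (km$tot.withinss / (n - k))
}

# Evaluate the index for 2 to 6 clusters and keep the best value
ks <- 2:6
ch <- sapply(ks, function(k) ch_index(X, k))
best_k <- ks[which.max(ch)]
```

A higher value indicates clusters that are compact relative to their separation, so the advised number of clusters is the one with the largest index.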
This function shows the clustatis results
## S3 method for class 'clustatis'
summary(object, ngroups=NULL, ...)
object |
object of class 'clustatis'. |
ngroups |
number of groups to consider. Ignored for clustatis_kmeans results. Default: recommended number of clusters |
... |
further arguments passed to or from other methods |
the CLUSTATIS principal results
a list with:
group: the clustering partition
homogeneity: homogeneity index (
weights: weight associated with each block in its cluster
rho: the threshold for the noise cluster
test_one_cluster: decision and pvalue to know if there is more than one cluster
This function shows the STATIS results
## S3 method for class 'statis'
summary(object, ...)
object |
object of class 'statis'. |
... |
further arguments passed to or from other methods |
a list with:
homogeneity: homogeneity of the blocks (in percentage)
weights: the weights associated with the blocks to build the compromise
eigenvalues: the eigenvalues of the svd decomposition
inertia: the percentage of total variance explained by each axis