Title: | Clustering of Variables Around Latent Variables |
---|---|
Description: | Functions for the clustering of variables around Latent Variables, for 2-way or 3-way data. Each cluster of variables, which may be defined as a local or directional cluster, is associated with a latent variable. External variables measured on the same observations and/or additional information on the variables can be taken into account. A "noise" cluster or sparse latent variables can also be defined. |
Authors: | Evelyne Vigneau [aut, cre], Mingkun Chen [ctb], Veronique Cariou [aut] |
Maintainer: | Evelyne Vigneau <[email protected]> |
License: | GPL-3 |
Version: | 2.1.1 |
Built: | 2024-11-05 06:30:17 UTC |
Source: | CRAN |
Sensory characterization and consumer preferences for 12 varieties of apples.
data(apples_sh)
A data frame with 12 observations and 2 blocks of variables.
senso : 43 sensory attributes
pref : hedonic scores given by a panel of 60 consumers
Daillant-Spinnler, B., MacFie, H.J.H., Beyts, P.K., Hedderley, D. (1996). Relationships between perceived sensory properties and major preference directions of 12 varieties of apples from the southern hemisphere. Food Quality and Preference, 7(2), 113-126.
data(apples_sh)
names(apples_sh)
apples_sh$senso
apples_sh$pref
The psychological behaviour items in this dataset are part of a French research project (AUPALESENS, 2010-2013) dealing with the food behaviour and nutritional status of elderly people. There are 31 psychological items organised into five blocks, each describing a given behavioural characteristic: emotional eating (E) with six items, external eating (X) with five items, restricted eating (R) with five items, pleasure for food (P) with five items, and self-esteem (S) with ten items. A detailed description and analysis of the emotional, external and restricted eating items for this study are available in Bailly, Maitre, Amand, Herve, and Alaphilippe (2012). 559 subjects were considered.
data(AUPA_psycho)
A data frame with 559 observations (row names from 1 to 559) and 31 items. The name of each item refers to its block (E, X, R, P, S).
Bailly N, Maitre I, Amand M, Herve C, Alaphilippe D (2012). The Dutch Eating Behaviour Questionnaire (DEBQ). Assessment of eating behaviour in an aging French population. Appetite, 59, 853-858.
data(AUPA_psycho)
X <- AUPA_psycho
Discrimination between authentic and adulterated juices using 1H NMR spectroscopy. 150 samples were prepared by varying the percentage of co-fruit mixed with the fruit juice of interest. The first two characters of the row names give this percentage: authentic juice names begin with "00", and samples prepared with the co-fruit alone are identified by "99" (rather than 100).
data(authen_NMR)
150 observations and 2 blocks of variables.
spectral range from 6 to 9 ppm (300 variables)
spectral range from 0.5 to 2.3 ppm (180 variables)
Vigneau E, Thomas F (2012). Model calibration and feature selection for orange juice authentication by 1H NMR spectroscopy. Chemometrics and Intelligent Laboratory Systems, 117, 22-30.
data(authen_NMR)
xlab <- as.numeric(colnames(authen_NMR$Xz2))
plot(xlab, authen_NMR$Xz2[1, ], type = "l", xlab = "ppm", ylab = "", ylim = c(14.8, 15.8), xlim = rev(range(xlab)))
for (i in seq_len(nrow(authen_NMR$Xz2))) lines(xlab, authen_NMR$Xz2[i, ])
Centering and scaling of a three-way array
block.scale(X, xcenter = TRUE, xscale = 0)
X |
: a three-way array |
xcenter |
: centering of X. By default (xcenter=TRUE), X is centered for each combination of modes 2 and 3; set xcenter=FALSE to skip centering |
xscale |
: scaling parameter applied to X. By default no scaling (xscale=0) |
Xscaled : the scaled three-way array
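As a sketch of what centering and block scaling of a three-way array involve, the base-R helper below subtracts the mode-1 mean for every (mode 2, mode 3) combination and, optionally, divides each mode-3 slice by its Frobenius norm so that every slice (for instance every assessor) carries the same total sum of squares. The helper name `center_block_scale` and this particular block-scaling rule are assumptions for illustration; which scaling each value of xscale performs is not specified here.

```r
# Hypothetical helper (not ClustVarLV::block.scale): centering and optional
# block scaling of a three-way array X (mode 1 x mode 2 x mode 3).
center_block_scale <- function(X, center = TRUE, block_scale = FALSE) {
  stopifnot(is.array(X), length(dim(X)) == 3)
  if (center) {
    # Subtract the mode-1 mean of every (mode 2, mode 3) combination
    X <- sweep(X, c(2, 3), apply(X, c(2, 3), mean))
  }
  if (block_scale) {
    # Assumed block scaling: divide each mode-3 slice by its Frobenius norm,
    # so that every slice has a total sum of squares equal to 1
    for (k in seq_len(dim(X)[3])) {
      nk <- sqrt(sum(X[, , k]^2))
      if (nk > 0) X[, , k] <- X[, , k] / nk
    }
  }
  X
}

# Usage on a small random array: 10 products x 4 attributes x 3 assessors
set.seed(1)
X <- array(rnorm(10 * 4 * 3, mean = 5), dim = c(10, 4, 3))
Xs <- center_block_scale(X, center = TRUE, block_scale = TRUE)
```

Centering is done before scaling, so the rescaled slices stay centered.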
Bootstrapping on the individuals (rows) or on the variables (columns) is performed. Choose the "row" option (the default) if the variables are measured on a random sample of individuals; choose the "column" option if the variables are taken from a population of variables. The first case is the more common, but the second may occur, e.g. when the variables are consumers assessing specific products. Each bootstrapped data matrix is submitted to CLV in order to get partitions into 1 to nmax clusters. For each number of clusters K, the Rand index and the adjusted Rand index, as well as the cohesion and the isolation of the clusters, are computed for the observed partition and the bootstrapped partitions. These criteria are used to assess the stability of the solution into K clusters. Parallel computing is used to reduce computation time.
boot_clv(object, case = "row", B = 100, nmax = NULL)
object |
: result of CLV() |
case |
: "row" or "column" corresponding to the random effect of the design |
B |
: the number of bootstrap samples to draw (100 by default) |
nmax |
: maximal size of the partitions to be considered (if NULL, the value of nmax used to build object is reused) |
res |
a list of length 4 for the Rand Index, Adjusted Rand Index, Cohesion and Isolation of the partition |
CLV
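Both stability indices used by boot_clv() compare two partitions of the same variables by counting the pairs of variables that are grouped together or apart. The base-R sketch below computes the Rand index and the standard adjusted Rand index from the contingency table of two label vectors. The function name `rand_indices` is hypothetical, and the sketch covers only the comparison of one pair of partitions, not the bootstrap loop or the cohesion and isolation criteria.

```r
# Rand index and adjusted Rand index between two partitions p1 and p2,
# given as vectors of cluster labels for the same set of variables.
# (The adjusted index is undefined, NaN, when both partitions are trivial.)
rand_indices <- function(p1, p2) {
  stopifnot(length(p1) == length(p2))
  tab <- table(p1, p2)                       # contingency table of the two partitions
  n_pairs <- choose(length(p1), 2)
  both <- sum(choose(tab, 2))                # pairs grouped together in both partitions
  in_p1 <- sum(choose(rowSums(tab), 2))      # pairs grouped together in p1
  in_p2 <- sum(choose(colSums(tab), 2))      # pairs grouped together in p2
  rand <- (n_pairs + 2 * both - in_p1 - in_p2) / n_pairs
  expected <- in_p1 * in_p2 / n_pairs        # expected 'both' under random labelling
  ari <- (both - expected) / ((in_p1 + in_p2) / 2 - expected)
  c(rand = rand, adjusted_rand = ari)
}

# Same partition with permuted labels, then two "crossed" partitions
rand_indices(c(1, 1, 2, 2, 3, 3), c(2, 2, 3, 3, 1, 1))
rand_indices(c(1, 1, 2, 2), c(1, 2, 1, 2))
```

The first call gives 1 for both indices (relabelling does not matter); in the second, the Rand index is 1/3 and the adjusted index is negative (-0.5), i.e. worse than chance agreement.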
Case study pertaining to Quantitative Descriptive Analysis (QDA) applied to ten varieties of cider. The sensory panel consists of seven trained assessors who were asked to rate ten varieties of cider using a list of ten sensory attributes.
data(ciders)
An object of class "array" with 10 ciders (mode 1), 10 sensory attributes (mode 2) and 7 assessors (mode 3):
mode 1: ciders 1 to 10
mode 2: sweet, acid, bitter, astringency, odor strength, pungent, alcohol, perfume, intensity, and fruity
mode 3: Judge.1 to Judge.7
Ledauphin, S., Hanafi, M., & Qannari, E. M. (2006). Assessment of the agreement among the subjects in fixed vocabulary profiling. Food Quality and Preference, 17(3-4), 277-280.
data(ciders)
str(ciders)
Hierarchical Cluster Analysis of a set of variables with consolidation. Directional or local groups may be defined. Each group of variables is associated with a latent component. Moreover, the latent component may be constrained using external information collected on the observations or on the variables.
CLV( X, Xu = NULL, Xr = NULL, method = NULL, sX = TRUE, sXr = FALSE, sXu = FALSE, nmax = 20, maxiter = 20, graph = TRUE )
X |
: The matrix of variables to be clustered |
Xu |
: The external variables associated with the columns of X |
Xr |
: The external variables associated with the rows of X |
method |
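: The criterion to be used in the cluster analysis: 1 or "directional" (squared covariance as proximity measure, directional groups), or 2 or "local" (covariance as proximity measure, local groups). |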
sX |
: TRUE/FALSE, standardization or not of the columns of X (TRUE by default) |
sXr |
: TRUE/FALSE, standardization or not of the columns of Xr (FALSE by default) |
sXu |
: TRUE/FALSE, standardization or not of the columns of Xu (FALSE by default) |
nmax |
: maximum number of partitions for which the consolidation will be done (by default nmax=20) |
maxiter |
: maximum number of iterations allowed for the consolidation/partitioning algorithm (by default maxiter=20) |
graph |
: TRUE/FALSE (TRUE by default), whether to display the dendrogram and the variation of the clustering criterion |
If external variables are used, define either Xr or Xu, but not both. Use the LCLV function when Xr and Xu are simultaneously provided.
tabres |
Results of the clustering algorithm. Each row gives the results of one step of the hierarchical clustering. |
partition K |
contains a list for each number of clusters of the partition, K = 2 to nmax |
Vigneau E., Qannari E.M. (2003). Clustering of variables around latent components. Comm. Stat, 32(4), 1131-1150.
Vigneau E., Chen M., Qannari E.M. (2015). ClustVarLV: An R Package for the clustering of Variables around Latent Variables. The R Journal, 7(2), 134-148
CLV_kmeans, LCLV
data(apples_sh)
resclvX <- CLV(X = apples_sh$senso, method = "directional", sX = TRUE)  # directional groups
plot(resclvX, type = "dendrogram")
plot(resclvX, type = "delta")
resclvYX <- CLV(X = apples_sh$pref, Xr = apples_sh$senso, method = "local", sX = FALSE, sXr = TRUE)  # local groups with external variables Xr
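The two criteria differ in how each cluster's latent component is defined. Under the formulation of Vigneau and Qannari (2003), for centered variables and a unit-variance component, directional groups maximize the sum of squared covariances, which makes the component the first principal component of the group, while local groups maximize the sum of covariances, which makes it proportional to the group centroid. The hypothetical helper `clv_components` below computes these components and the criterion for a given partition. It is an illustrative sketch, not the CLV implementation, and it omits the hierarchical and consolidation steps.

```r
# Hypothetical helper (not ClustVarLV code): latent components of a given
# partition of the columns of X, under the directional or local criterion.
clv_components <- function(X, groups, method = c("directional", "local"),
                           standardize = TRUE) {
  method <- match.arg(method)
  X <- scale(X, center = TRUE, scale = standardize)
  n <- nrow(X)
  K <- max(groups)
  comp <- matrix(0, n, K)
  criterion <- 0
  for (k in seq_len(K)) {
    Xk <- X[, groups == k, drop = FALSE]
    ck <- if (method == "directional") {
      svd(Xk, nu = 1, nv = 0)$u[, 1]   # first principal component direction
    } else {
      rowSums(Xk)                      # centroid of the group's variables
    }
    ck <- ck / sqrt(sum(ck^2) / n)     # unit variance (divisor n)
    covs <- drop(crossprod(Xk, ck)) / n  # cov(x_j, c_k) for the variables of group k
    criterion <- criterion + (if (method == "directional") sum(covs^2) else sum(covs))
    comp[, k] <- ck
  }
  list(comp = comp, criterion = criterion)
}

# Simulated data: variables 1-3 follow f1 (variable 3 with a negative sign),
# variables 4-5 follow f2
set.seed(42)
n <- 30
f1 <- rnorm(n); f2 <- rnorm(n)
X <- cbind(f1 + rnorm(n, sd = 0.3), f1 + rnorm(n, sd = 0.3), -f1 + rnorm(n, sd = 0.3),
           f2 + rnorm(n, sd = 0.3), f2 + rnorm(n, sd = 0.3))
groups <- c(1, 1, 1, 2, 2)
res_dir <- clv_components(X, groups, "directional")
res_loc <- clv_components(X, groups, "local")
```

The third variable is negatively correlated with f1: it fits the directional group well, but it pulls against the centroid in the local group, which is the behaviour that distinguishes the two criteria.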
K-means algorithm for the clustering of variables. Directional or local groups may be defined. Each group of variables is associated with a latent component. Moreover external information collected on the observations or on the variables may be introduced.
CLV_kmeans( X, Xu = NULL, Xr = NULL, method, sX = TRUE, sXr = FALSE, sXu = FALSE, clust, iter.max = 20, nstart = 100, strategy = "none", rho = 0.3 )
X |
The matrix of the variables to be clustered |
Xu |
The external variables associated with the columns of X |
Xr |
The external variables associated with the rows of X |
method |
The criterion to use in the cluster analysis: 1 or "directional" (directional groups), or 2 or "local" (local groups). |
sX |
TRUE/FALSE : standardization or not of the columns X (TRUE by default) |
sXr |
TRUE/FALSE : standardization or not of the columns Xr (FALSE by default) |
sXu |
TRUE/FALSE : standardization or not of the columns Xu (FALSE by default) |
clust |
: either a number, i.e. the size K of the partition, or a vector of integers giving the group membership of each variable in the initial partition (integers between 1 and K) |
iter.max |
maximal number of iterations for the consolidation (20 by default) |
nstart |
number of random initialisations when clust is a number (100 by default) |
strategy |
"none" (by default), or "kplusone" (an additional cluster for the noise variables), or "sparselv" (zero loadings for the noise variables) |
rho |
a threshold of correlation between 0 and 1, used with the "kplusone" and "sparselv" strategies (0.3 by default) |
The initialization can be random (repeated nstart times) or defined by the user (clust given as a vector of group memberships).
The parameter "strategy" makes it possible to choose a strategy for setting aside variables that do not fit into the pattern of any cluster.
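One way to read the "kplusone" strategy is sketched below: each variable is attached to the latent component it is most correlated with, unless that correlation falls below rho, in which case it goes to a noise cluster coded 0 (the coding used by get_partition for this strategy). The exact rule applied inside CLV_kmeans, in particular how correlations are compared for local versus directional groups, is an assumption here, and the helper name `assign_with_noise` is hypothetical.

```r
# Hypothetical rule (an assumption, not the documented ClustVarLV algorithm):
# a variable joins the cluster whose latent component it is most related to,
# unless that relation is below rho, in which case it goes to the noise
# cluster, coded 0.
assign_with_noise <- function(X, comp, rho = 0.3, directional = TRUE) {
  r <- cor(X, comp)                            # p x K correlations
  score <- if (directional) abs(r) else r      # the sign matters only for local groups
  best <- max.col(score, ties.method = "first")
  best_score <- score[cbind(seq_len(nrow(score)), best)]
  ifelse(best_score >= rho, best, 0L)
}

# Two orthogonal, centered components and three variables; x3 is unrelated to both
a  <- c(1, 1, 1, 1, -1, -1, -1, -1)
b  <- c(1, 1, -1, -1, 1, 1, -1, -1)
x3 <- c(1, -1, -1, 1, 1, -1, -1, 1)
X <- cbind(x1 = a, x2 = -b, x3 = x3)
assign_with_noise(X, cbind(a, b), directional = TRUE)    # returns c(1, 2, 0)
assign_with_noise(X, cbind(a, b), directional = FALSE)   # returns c(1, 0, 0)
```

With local groups, x2 (perfectly negatively correlated with b) is set aside as noise, whereas with directional groups it joins cluster 2.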
tabres |
The value of the clustering criterion at convergence. |
clusters |
the group membership of the variables |
comp |
The latent components of the clusters |
loading |
if there are external variables Xr or Xu : The loadings of the external variables |
Vigneau E., Qannari E.M. (2003). Clustering of variables around latent components. Comm. Stat, 32(4), 1131-1150.
Vigneau E., Chen M., Qannari E.M. (2015). ClustVarLV: An R Package for the clustering of Variables around Latent Variables. The R Journal, 7(2), 134-148
Vigneau E., Chen M. (2016). Dimensionality reduction by clustering of variables while setting aside atypical variables. Electronic Journal of Applied Statistical Analysis, 9(1), 134-153
CLV, LCLV
data(apples_sh)
resclvkmYX <- CLV_kmeans(X = apples_sh$pref, Xr = apples_sh$senso, method = "local", sX = FALSE, sXr = TRUE, clust = 2, nstart = 20)  # local groups with external variables Xr
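The partitioning step of a K-means algorithm for variables can be sketched as an alternation between two updates: recompute each cluster's latent component, then move each variable to the component with which it has the largest squared covariance (directional) or covariance (local). Both updates can only increase the criterion. The base-R function below is a hypothetical illustration of that alternation with random restarts, in the spirit of nstart and iter.max; it ignores external variables and the strategy/rho options, and it is not the CLV_kmeans source.

```r
# Hypothetical sketch (not the CLV_kmeans source): K-means-type partitioning
# of the columns of X around latent components, with random restarts.
clv_kmeans_sketch <- function(X, K, method = c("directional", "local"),
                              iter.max = 20, nstart = 10) {
  method <- match.arg(method)
  X <- scale(X)                              # standardized columns, as with sX = TRUE
  n <- nrow(X); p <- ncol(X)
  # Latent component of one group, rescaled to unit variance (divisor n)
  latent <- function(Xk) {
    ck <- if (method == "directional") svd(Xk, nu = 1, nv = 0)$u[, 1] else rowSums(Xk)
    ck / sqrt(sum(ck^2) / n)
  }
  # Proximity of every variable to every component
  proximity <- function(comp) {
    covs <- crossprod(X, comp) / n           # p x K covariances
    if (method == "directional") covs^2 else covs
  }
  best <- list(criterion = -Inf)
  for (s in seq_len(nstart)) {
    groups <- sample(rep_len(seq_len(K), p)) # random initial partition, no empty group
    for (it in seq_len(iter.max)) {
      comp <- sapply(seq_len(K), function(k) latent(X[, groups == k, drop = FALSE]))
      new_groups <- max.col(proximity(comp), ties.method = "first")
      # Stop at convergence, or keep the current partition if a group would empty
      if (identical(new_groups, groups) || length(unique(new_groups)) < K) break
      groups <- new_groups
    }
    comp <- sapply(seq_len(K), function(k) latent(X[, groups == k, drop = FALSE]))
    crit <- sum(proximity(comp)[cbind(seq_len(p), groups)])
    if (crit > best$criterion) best <- list(clusters = groups, comp = comp, criterion = crit)
  }
  best
}

# Two blocks of variables driven by two independent latent factors
set.seed(1)
n <- 50
f1 <- rnorm(n); f2 <- rnorm(n)
X <- cbind(sapply(1:4, function(j) f1 + rnorm(n, sd = 0.4)),
           sapply(1:3, function(j) f2 + rnorm(n, sd = 0.4)))
res <- clv_kmeans_sketch(X, K = 2, method = "directional")
res$clusters
```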
Hierarchical Cluster Analysis of a set of variables (mode 2) given a three-way array, with a further consolidation step. Each group of variables is associated with a one-rank PARAFAC model (comp x loading x weight). Moreover, a Non Negativity (NN) constraint may be added to the model, so that the loading coefficients have positive values. Returns an object of class clv3w.
CLV3W(X,mode.scale=0,NN=FALSE,moddendoinertie=TRUE,gmax=20,graph=TRUE,cp.rand=10)
X |
: a three way array - variables of mode 2 will be clustered |
mode.scale |
: scaling parameter applied to X. By default, X is centered (for mode 2 x mode 3) and not scaled (mode.scale=0) |
NN |
: non-negativity constraint on the loading coefficients. By default no constraint (NN=FALSE) |
moddendoinertie |
: type of dendrogram. By default (moddendoinertie=TRUE), it is based on the delta clustering criterion |
gmax |
: maximum number of partitions for which the consolidation will be done (default : gmax=20) |
graph |
: boolean, if TRUE, the graphs associated with the dendrogram and the evolution of the aggregation criterion are displayed (default : graph=TRUE) |
cp.rand |
: number of random starts associated with the one rank Candecomp/Parafac model (By default cp.rand=10) |
tabres |
Results of the hierarchical clustering algorithm. Each row gives the results of one step of the hierarchical clustering. |
hclust |
contains the results of the HCA |
partition K |
contains a list for each number of clusters of the partition, K = 1 to gmax |
param |
contains the clustering parameters |
call : call of the method
Veronique Cariou, [email protected]
Wilderjans, T. F., & Cariou, V. (2016). CLV3W: A clustering around latent variables approach to detect panel disagreement in three-way conventional sensory profiling data. Food Quality and Preference, 47, 45-53.
Cariou, V., & Wilderjans, T. F. (2018). Consumer segmentation in multi-attribute product evaluation by means of non-negatively constrained CLV3W. Food Quality and Preference, 67, 18-26.
CLV3W_kmeans, get_comp, get_loading, get_partition, plot, plot_var.clv3w
data(ciders)
res.cider <- CLV3W(ciders, mode.scale = 3, NN = FALSE, moddendoinertie = FALSE, gmax = 20, graph = FALSE, cp.rand = 5)  # cider sensory descriptors, with block scaling to put the assessors on the same footing
plot(res.cider, type = "delta")
plot(res.cider, type = "dendrogram")
print(res.cider)
summary(res.cider, 2)
get_comp(res.cider, 2)
get_loading(res.cider, 2)
get_weight(res.cider, 2)
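Each cluster in CLV3W is summarized by a one-rank PARAFAC model, comp x loading x weight. A standard way to fit a rank-one CP/PARAFAC model is alternating least squares, sketched below in base R with random starts (playing the role of cp.rand). For a rank-one model the least-squares problem in the mode-2 loadings separates over the variables, so a non-negativity constraint (as with NN = TRUE) can be imposed exactly by clipping at zero. The function `parafac1` is hypothetical and is not the fitting routine used by CLV3W.

```r
# Hypothetical sketch (not the CLV3W fitting routine): rank-one CP/PARAFAC
# model X[i, j, k] ~ a[i] * b[j] * c[k] fitted by alternating least squares.
parafac1 <- function(X, NN = FALSE, n_start = 5, max_iter = 500, tol = 1e-12) {
  stopifnot(length(dim(X)) == 3)
  d <- dim(X)
  best <- NULL
  for (s in seq_len(n_start)) {
    a <- rnorm(d[1]); b <- abs(rnorm(d[2])); c3 <- rnorm(d[3])
    loss_old <- Inf
    for (it in seq_len(max_iter)) {
      # Mode 1 (components): exact least-squares update given b and c3
      a <- apply(X, 1, function(M) sum(M * outer(b, c3))) / (sum(b^2) * sum(c3^2))
      # Mode 2 (loadings)
      b <- apply(X, 2, function(M) sum(M * outer(a, c3))) / (sum(a^2) * sum(c3^2))
      if (NN) {
        if (all(b <= 0)) { b <- -b; a <- -a }  # flipping a and b together leaves the model unchanged
        b <- pmax(b, 0)                        # separable problem: clipping is the exact NN update
      }
      # Mode 3 (weights)
      c3 <- apply(X, 3, function(M) sum(M * outer(a, b))) / (sum(a^2) * sum(b^2))
      loss <- sum((X - outer(outer(a, b), c3))^2)
      if (is.finite(loss_old) && loss_old - loss < tol * max(1, loss_old)) break
      loss_old <- loss
    }
    if (is.null(best) || loss < best$loss) best <- list(a = a, b = b, c = c3, loss = loss)
  }
  nb <- sqrt(sum(best$b^2))                    # move the scale to mode 1: unit-norm loadings
  best$a <- best$a * nb
  best$b <- best$b / nb
  best
}

# An exact rank-one 6 x 4 x 3 array with non-negative mode-2 loadings
set.seed(3)
a0 <- rnorm(6); b0 <- c(0.5, 1, 2, 0.1); c0 <- runif(3, 0.5, 1.5)
X <- outer(outer(a0, b0), c0)
fit <- parafac1(X, NN = TRUE)
```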
Partitioning algorithm (K-means type) for the variables of mode 2 of a three-way array. Each group of variables is associated with a one-rank PARAFAC model (comp x loading x weight). Moreover, a Non Negativity (NN) constraint may be added to the model, so that the loading coefficients have positive values. Returns an object of class clv3w.
CLV3W_kmeans(X,K,mode.scale=0,NN=FALSE,init=10,cp.rand=5)
X |
: a three way array - variables of mode 2 will be clustered |
K |
: number of clusters |
mode.scale |
: scaling parameter applied to X. By default, X is centered (for mode 2 x mode 3) and not scaled (mode.scale=0) |
NN |
: non-negativity constraint on the loading coefficients. By default no constraint (NN=FALSE) |
init |
: either the number of random starts i.e. partitions generated for the initialisation (By default init=10) |
cp.rand |
: number of random starts associated with the one-rank Candecomp/Parafac model (By default cp.rand=5) |
results |
param |
contains the clustering parameters |
call : call of the method
Veronique Cariou, [email protected]
Wilderjans, T. F., & Cariou, V. (2016). CLV3W: A clustering around latent variables approach to detect panel disagreement in three-way conventional sensory profiling data. Food Quality and Preference, 47, 45-53.
Cariou, V., & Wilderjans, T. F. (2018). Consumer segmentation in multi-attribute product evaluation by means of non-negatively constrained CLV3W. Food Quality and Preference, 67, 18-26.
summary.clv3w, print.clv3w
data(coffee)
res.coffee <- CLV3W_kmeans(coffee, K = 2, NN = TRUE, mode.scale = 3, init = 1, cp.rand = 1)  # partition of the 84 subjects (mode 2), with block scaling (mode.scale = 3)
summary(res.coffee)
get_partition(res.coffee)
Case study pertaining to consumer emotion associations with 12 coffee aromas. Participants rated the odor of each of the 12 aromas on 15 emotion terms, using a 5-point rating scale.
data(coffee)
An object of class "array" with 12 odors (mode 1), 84 subjects (mode 2) and 15 emotions (mode 3):
mode 1: Vanilla, B.Rice, Lemon, Coffee.Flower, Cedar, Hazelnut, Coriander.Seed, Honey, Medicine, Apricot, Earth, Hay
mode 2: 84 persons from Oniris
mode 3: Amused, Angry, Calm, Disappointed, Disgusted, Energetic, Excited, Free, Happy, Irritated, Nostalgic, Surprised, Unique, Unpleasant, Well
Cariou, V., & Wilderjans, T. F. (2018). Consumer segmentation in multi-attribute product evaluation by means of non-negatively constrained CLV3W. Food Quality and Preference, 67, 18-26.
data(coffee)
str(coffee)
Loading plot of the variables from a Principal Components Analysis. The scores of the observations are superimposed.
data_biplot(X, sX = TRUE, axeh = 1, axev = 2, cex.lab = 1)
X |
the data matrix |
sX |
TRUE/FALSE : standardization or not of the columns X (TRUE by default) |
axeh |
component number for the horizontal axis |
axev |
component number for the vertical axis |
cex.lab |
: magnification to be used for labels (1 by default) |
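The coordinates such a loading plot relies on can be computed with stats::prcomp, as in the hypothetical helper below: the variables are placed by their correlations (or covariances, when sX = FALSE) with the two chosen components, and the observations by their component scores. How data_biplot() rescales the scores before superimposing them is not stated here, so the sketch returns raw scores and leaves the drawing (e.g. with plot() and text()) aside.

```r
# Hypothetical helper (not ClustVarLV::data_biplot): PCA coordinates for a
# variable loading plot with observation scores.
pca_loadings_scores <- function(X, sX = TRUE, axeh = 1, axev = 2) {
  X <- as.matrix(X)
  pca <- prcomp(X, center = TRUE, scale. = sX)
  scores <- pca$x[, c(axeh, axev), drop = FALSE]   # observation coordinates
  # Variable coordinates: correlations with the components if sX = TRUE,
  # covariances otherwise
  loadings <- if (sX) cor(X, scores) else cov(X, scores)
  list(loadings = loadings, scores = scores,
       explained = pca$sdev^2 / sum(pca$sdev^2))    # share of variance per component
}

# Usage with a dataset shipped with base R
res <- pca_loadings_scores(USArrests, sX = TRUE, axeh = 1, axev = 2)
round(res$loadings, 2)
```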
To get the latent components associated with each cluster.
get_comp(resclv, K = NULL, graph = FALSE, cex.lab = 1)
resclv |
: result of CLV(), CLV_kmeans(), LCLV(), CLV3W() or CLV3W_kmeans() |
K |
: the number of clusters chosen (already defined if CLV_kmeans or CLV3W_kmeans is used) |
graph |
: boolean, if TRUE, the barplot associated with the scores is displayed (default : graph=FALSE) |
cex.lab |
: magnification to be used for labels (1 by default) |
comp |
the group latent components (centered) |
data(apples_sh)
resclvX <- CLV(X = apples_sh$senso, method = "directional", sX = TRUE)
comp4G <- get_comp(resclvX, K = 4)
To get the variables loadings for the latent component in each cluster.
For CLV(_kmeans), the loadings are of particular interest when method="directional" or when strategy="sparselv".
For CLV3W(_kmeans), the loadings are given for the variables associated with mode 2 of the 3-way array.
get_loading(resclv, K = NULL, type = "list", graph = FALSE, cex.lab = 1)
resclv |
: result of CLV(), CLV_kmeans(), LCLV(), CLV3W() or CLV3W_kmeans() |
K |
: the number of clusters chosen (already defined if CLV_kmeans or CLV3W_kmeans is used) |
type |
: outputs in the form of a "list" (one element by cluster, by default) or a "vector" (available only if "sparselv" strategy is used) |
graph |
: boolean, if TRUE, the barplot associated with the scores is displayed (default : graph=FALSE) |
cex.lab |
: magnification to be used for labels (1 by default) |
loading |
the loadings of the variables on each cluster's latent component |
To get the clusters memberships of the variables.
get_partition(resclv, K = NULL, type = "vector")
resclv |
: result of CLV(), CLV_kmeans(), LCLV(), CLV3W() or CLV3W_kmeans() |
K |
: the number of clusters chosen (already defined if CLV_kmeans or CLV3W_kmeans is used) |
type |
: "vector" (by default) for output given as a vector of integers between 1 and K (with 0 for "kplusone" strategy), |
partition |
the group's membership for the variables, in a vector or matrix form. |
data(apples_sh)
resclvX <- CLV(X = apples_sh$senso, method = "directional", sX = TRUE)
parti4G <- get_partition(resclvX, K = 4)
To get the weights associated with each cluster. For CLV(_kmeans) or LCLV, this applies only when external variables (Xr, Xu or both) are involved. For CLV3W(_kmeans), the weights are associated with the third mode of the 3-way array.
get_weight(resclv, K = NULL, graph = FALSE, cex.lab = 1)
resclv |
: result of CLV(), CLV_kmeans(), LCLV(), CLV3W() or CLV3W_kmeans() |
K |
: the number of clusters chosen (already defined if CLV_kmeans or CLV3W_kmeans is used) |
graph |
: boolean, if TRUE, the barplot associated with the scores is displayed (default : graph=FALSE) |
cex.lab |
: magnification to be used for labels (1 by default) |
weight |
Weights in each cluster (associated with mode 3 for CLV3W object) |
For each variable, missing values are imputed from the latent variable of the group to which the variable belongs.
imput_clv(x, X0, K = NULL)
x |
: an object of class |
X0 |
: the initial data matrix with missing values (NA) |
K |
: the number of Latent Variables to be considered, each of them being associated with a group of variables. |
It is advised to base the imputation on a larger number of latent variables than the suspected 'true' number of groups of variables.
X0imput : the imputed data matrix, in the original scale
Ximput : the imputed matrix, centered and scaled according to the pretreatment parameters chosen in CLV
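imput_clv() fills each variable's missing values from the latent variable of its group. One plausible scheme, sketched below under explicit assumptions, starts from column-mean imputation and then alternates between computing each group's first principal component and replacing the missing entries of each variable by their least-squares prediction from that component. The function name `impute_by_groups` and this iterative regression scheme are hypothetical; the actual procedure in imput_clv() may differ.

```r
# Hypothetical scheme (not the imput_clv source): iterative imputation of the
# missing values of each variable from its group's latent component.
impute_by_groups <- function(X0, groups, n_iter = 50, tol = 1e-8) {
  X0 <- as.matrix(X0)
  miss <- is.na(X0)
  X <- X0
  # Start from the column means of the observed values
  col_means <- colMeans(X0, na.rm = TRUE)
  X[miss] <- col_means[col(X0)[miss]]
  for (it in seq_len(n_iter)) {
    X_old <- X
    for (k in unique(groups)) {
      idx <- which(groups == k)
      Z <- scale(X[, idx, drop = FALSE])
      comp <- svd(Z, nu = 1, nv = 0)$u[, 1]     # group latent component (first PC)
      for (j in idx) {
        m <- miss[, j]
        if (!any(m)) next
        # Least-squares fit of the observed values on the component
        fit <- lm.fit(cbind(1, comp[!m]), X0[!m, j])
        X[m, j] <- drop(cbind(1, comp[m]) %*% fit$coefficients)
      }
    }
    if (max(abs(X - X_old)) < tol) break
  }
  X
}

# One group of three variables driven by a common factor, two missing cells
set.seed(7)
n <- 40
f <- rnorm(n)
X0 <- cbind(f + rnorm(n, sd = 0.2), 2 * f + rnorm(n, sd = 0.2), -f + rnorm(n, sd = 0.2))
truth <- X0[3, 2]
X0[3, 2] <- NA
X0[10, 1] <- NA
Xi <- impute_by_groups(X0, groups = c(1, 1, 1))
```

Only the missing cells are ever overwritten, so the observed values are returned unchanged.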
Define clusters of X-variables around latent components. In each cluster, two latent components are extracted: the first one is a linear combination of the external information collected on the rows of X, and the second one is a linear combination of the external information associated with the columns of X.
LCLV(X, Xr, Xu, ccX = FALSE, sX = TRUE, sXr = FALSE, sXu = FALSE, nmax = 20)
X |
The matrix of variables to be clustered |
Xr |
The external variables associated with the rows of X |
Xu |
The external variables associated with the columns of X |
ccX |
TRUE/FALSE : double centering of X (FALSE by default). If FALSE, X is column-centered (cX = TRUE). |
sX |
TRUE/FALSE : standardization or not of the columns X (TRUE by default) |
sXr |
TRUE/FALSE : standardization or not of the columns Xr (FALSE by default) |
sXu |
TRUE/FALSE : standardization or not of the columns Xu (FALSE by default) |
nmax |
maximum number of partitions for which the consolidation will be done (by default nmax=20) |
tabres |
Results of the clustering algorithm. Each row gives the results of one step of the hierarchical clustering. |
partition K |
a list for each number of clusters of the partition, K = 2 to nmax |
Vigneau E., Qannari E.M. (2003). Clustering of variables around latent components. Comm. Stat, 32(4), 1131-1150.
Vigneau, E., Charles, M., & Chen, M. (2014). External preference segmentation with additional information on consumers: A case study on apples. Food Quality and Preference, 32, 83-92.
Vigneau E., Chen M., Qannari E.M. (2015). ClustVarLV: An R Package for the clustering of Variables around Latent Variables. The R Journal, 7(2), 134-148
Prediction of a response variable, y, based on clusters of predictor variables, X. A boosting-like procedure identifies groups of predictors, and their associated latent components, that are well correlated with the current residuals of the response variable y. Sparsity is allowed through the strategy options ("sparselv" or "kplusone") and the rho parameter.
lm_CLV( X, y, method = "directional", sX = TRUE, shrinkp = 0.5, strategy = "none", rho = 0.3, validation = FALSE, id.test = NULL, maxiter = 100, threshold = 1e-05 )
X |
: The matrix of the predictors, the variables to be clustered |
y |
: The response variable (usually numeric).
If y is a binary factor, an indicator variable (0/1) is generated, and a Bayes rule is used to compute class probabilities. |
method |
: The criterion to be used in the cluster analysis: "directional" (the default) or "local". |
sX |
: TRUE/FALSE, i.e. standardization or not of the columns X (TRUE by default) |
shrinkp |
: shrinkage parameter used in the boosting (max : 1, 0.5 by default). |
strategy |
: "none" (by default), or "kplusone" (an additional cluster for the unclassifiable variables), or "sparselv" (zero loadings for the unclassifiable variables) |
rho |
: a threshold of correlation between 0 and 1 (used in "kplusone" or "sparselv" strategy, 0.3 by default) |
validation |
TRUE/FALSE i.e. using a test set or not. By default no validation |
id.test |
: if validation==TRUE, the indices of the observations used as the test set |
maxiter |
: the maximum number of components extracted (100 by default) |
threshold |
: used in a stopping rule, when the relative calibration error sum of squares stabilizes (1e-05 by default) |
Group |
a list of the groups of X variables, in the order in which they were first extracted. |
Comp |
a list of the latent components associated with the groups of X variables extracted. |
Load |
a list for the loadings of the X variables in the latent component. |
Alpha |
a list of the regression coefficients to be applied to the latent components. |
Beta |
a list of the beta coefficients to be applied to the pretreated predictors. |
GroupImp |
Group Importance i.e. the decrease of the residuals' variance provided by the CLV components in the model. |
RMSE.cal |
the root mean square error for the calibration set, at each step of the procedure. |
ERRrate.cal and rocAUC.cal |
when y is a binary factor, the classification rate and the AUC of the ROC curve, computed on the calibration set at each step of the procedure. |
RMSE.val |
as RMSE.cal but for the test set, if provided. |
ERRrate.val and rocAUC.val |
as for calibration set but for the test set, if provided. |
CLV, CLV_kmeans
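lm_CLV() is described as a boosting-like procedure with a shrinkage parameter and a stopping threshold on the relative decrease of the residual sum of squares. The sketch below shows only that skeleton, in base R: at each step it builds one group component, regresses the current residuals on it, and adds a shrunken fraction of the fit. The group-selection rule (predictors whose absolute correlation with the residuals is at least rho) and the PLS-style component are simplifying assumptions, not the lm_CLV algorithm, and the function name `boost_groups_sketch` is hypothetical.

```r
# Hypothetical sketch (not the lm_CLV source): boosting-like fitting of y on
# successive group components, with shrinkage and a relative-RSS stopping rule.
boost_groups_sketch <- function(X, y, shrinkp = 0.5, rho = 0.3,
                                maxiter = 100, threshold = 1e-5) {
  X <- scale(X)
  n <- nrow(X)
  fit <- rep(mean(y), n)                       # null model
  r <- y - fit
  rss <- sum(r^2)
  groups <- list()
  for (m in seq_len(maxiter)) {
    cors <- drop(cor(X, r))
    G <- which(abs(cors) >= rho)               # assumed group-selection rule
    if (length(G) == 0) break                  # no predictor tracks the residuals any more
    w <- cors[G] / sqrt(sum(cors[G]^2))
    comp <- drop(X[, G, drop = FALSE] %*% w)   # PLS-style latent component (assumption)
    alpha <- sum(comp * r) / sum(comp^2)       # LS coefficient of r on comp (both centered)
    fit <- fit + shrinkp * alpha * comp        # shrunken update
    r <- y - fit
    rss_new <- sum(r^2)
    groups[[m]] <- G
    if ((rss - rss_new) / rss < threshold) break
    rss <- rss_new
  }
  list(groups = groups, fitted = fit, rmse = sqrt(mean(r^2)))
}

# Two predictors share the signal carried by f; two are pure noise
set.seed(11)
n <- 60
f <- rnorm(n)
X <- cbind(f + rnorm(n, sd = 0.3), f + rnorm(n, sd = 0.3), rnorm(n), rnorm(n))
y <- 2 * f + rnorm(n, sd = 0.2)
res <- boost_groups_sketch(X, y, shrinkp = 0.5, rho = 0.3)
```

With 0 < shrinkp < 2, each step can only lower the residual sum of squares, which is what makes a relative-decrease threshold a sensible stopping rule.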
Loading plot of the variables from a Principal Components Analysis. The group membership of the variables is superimposed.
plot_var( resclv, K = NULL, axeh = 1, axev = 2, label = FALSE, cex.lab = 1, v_colors = NULL, v_symbol = FALSE, beside = FALSE )
resclv |
results of CLV(), CLV_kmeans() or LCLV() |
K |
the number of groups in the partition (already defined if CLV_kmeans is used) |
axeh |
component number for the horizontal axis |
axev |
component number for the vertical axis |
label |
= TRUE :the column names in X are used as labels / = FALSE: no labels (by default) |
cex.lab |
: magnification to be used for labels (1 by default) |
v_colors |
colors for the groups (NULL by default, in which case default colors are used) |
v_symbol |
=TRUE : symbols are given instead of colors for the identification of the groups / =FALSE : no symbol (by default). |
beside |
=TRUE : one plot per cluster of variables, side by side / =FALSE : a single plot with all the variables, identified by their group membership (by default). |
data(apples_sh)
resclvX <- CLV(X = apples_sh$senso, method = 1, sX = TRUE)
plot_var(resclvX, K = 4, axeh = 1, axev = 2)
Scores plot from a Candecomp Parafac analysis. The group membership of the variables is superimposed.
plot_var.clv3w( resclv3w, K = NULL, axeh = 1, axev = 2, labels = FALSE, cex.lab = 1, v_colors = NULL, v_symbol = FALSE, beside = FALSE, mode3 = FALSE )
resclv3w |
result of CLV3W() or CLV3W_kmeans() |
K |
the number of groups in the partition (already defined if CLV3W_kmeans is used) |
axeh |
component number for the horizontal axis |
axev |
component number for the vertical axis |
labels |
boolean: labels=TRUE adds the variable labels to the plot, labels=FALSE (the default) does not |
cex.lab |
magnification to be used for labels (1 by default) |
v_colors |
colors for the groups (NULL by default, in which case default colors are used) |
v_symbol |
=TRUE : symbols are given instead of colors for the identification of the groups / =FALSE : no symbol (by default). |
beside |
plot per cluster of variables, side-by-side |
mode3 |
projection of the mode 3 elements onto the scores plot |
This function plots either the CLV dendrogram or the variations of the consolidated CLV criterion.
## S3 method for class 'clv'
plot(x, type = "dendrogram", cex = 0.8, ...)
x |
: an object of class clv |
type |
: what to plot: "dendrogram" (the default) or "delta" (variation of the clustering criterion). |
cex |
: Character expansion for labels. |
... |
further arguments passed to or from other methods |
CLV
This function plots either the CLV3W dendrogram or the variations of the consolidated CLV3W criterion.
## S3 method for class 'clv3w'
plot(x, type = "dendrogram", cex = 0.8, ...)
x |
: an object of class clv3w |
type |
: what to plot: "dendrogram" (the default) or "delta" (variation of the clustering criterion). |
cex |
: Character expansion for labels. |
... |
further arguments passed to or from other methods |
CLV3W
This function plots either the CLV dendrogram or the variations of the consolidated CLV criterion.
## S3 method for class 'lclv'
plot(x, type = "dendrogram", cex = 0.8, ...)
x |
: an object of class lclv |
type |
: What to plot. |
cex |
: Character expansion for labels. |
... |
further arguments passed to or from other methods |
LCLV
To get the predicted response values from an lm_CLV model.
## S3 method for class 'lmclv'
predict(object, newdata, shrinkp, ...)
object |
: result of class lmclv, as returned by lm_CLV() |
newdata |
: Data frame of observations for which to make predictions |
shrinkp |
: shrinkage parameter. |
... |
: further arguments passed to or from other methods |
a matrix of the predicted values, with one column per number of CLV components included (in increasing order), the first column corresponding to the null model.
If the response is a binary factor, two additional matrices are provided: the probabilities of belonging to class 1 and the predicted response values (0 or 1).
Print the CLV results
## S3 method for class 'clv'
print(x, ...)
x |
an object of class clv |
... |
further arguments passed to or from other methods |
CLV
Print the CLV3W results
## S3 method for class 'clv3w'
print(x, ...)
x |
an object of class clv3w |
... |
further arguments passed to or from other methods |
CLV3W, CLV3W_kmeans
Print the LCLV results
## S3 method for class 'lclv'
print(x, ...)
x |
an object of class lclv |
... |
further arguments passed to or from other methods |
LCLV
Pretreatment of qualitative variables
stand_quali(X.quali, metric = "chisq")
X.quali |
: a factor or a data frame with several factors |
metric |
: the metric to be used; with "chisq" (the default), each category is weighted by the inverse of the square root of its relative frequency |
Xdisj.sd : a standardized matrix with as many columns as categories associated with the qualitative variables.
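With the "chisq" metric, each category is weighted by the inverse square root of its relative frequency. A base-R sketch of that weighting applied to the disjunctive (indicator) coding is below; the helper name `chisq_disjunctive` is hypothetical, and whether stand_quali() also centers the columns is not stated here, so the sketch does not.

```r
# Hypothetical helper (not ClustVarLV::stand_quali): disjunctive coding of
# qualitative variables, each category weighted by 1 / sqrt(relative frequency).
chisq_disjunctive <- function(X.quali) {
  if (is.factor(X.quali)) X.quali <- data.frame(V1 = X.quali)
  blocks <- lapply(names(X.quali), function(v) {
    f <- droplevels(as.factor(X.quali[[v]]))
    Z <- outer(as.integer(f), seq_along(levels(f)), "==") * 1   # indicator coding
    colnames(Z) <- paste(v, levels(f), sep = ".")
    freq <- colMeans(Z)                                          # relative frequencies
    sweep(Z, 2, sqrt(freq), "/")                                 # weight 1 / sqrt(frequency)
  })
  do.call(cbind, blocks)
}

df <- data.frame(color = factor(c("red", "red", "blue", "green")),
                 size = factor(c("S", "L", "L", "L")))
Xd <- chisq_disjunctive(df)
round(Xd, 3)
```

A useful check is that every weighted column then has a sum of squares equal to the number of observations, so rare and frequent categories carry the same total weight.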
This function provides the list of the variables within each group, together with complementary information. Users will be asked to specify the number of clusters.
## S3 method for class 'clv'
summary(object, K = NULL, ...)
object |
: result of CLV() or CLV_kmeans() |
K |
: the number of clusters (not needed if CLV_kmeans was used) |
... |
further arguments passed to or from other methods |
The outputs include:
the size of the groups,
the list of the variables within each group; for each variable, its correlation with its group latent component and its correlation with the latent component of the next neighbouring group are given,
the proportion of the variance within each group explained by its latent variable,
the proportion of the whole dataset accounted for by the group latent variables,
the matrix of correlations between the latent variables.
This function provides the list of the variables within each group, together with complementary information. Users will be asked to specify the number of clusters.
## S3 method for class 'clv3w'
summary(object, K = NULL, ...)
object |
: result of CLV3W() or CLV3W_kmeans() |
K |
: the number of clusters (not needed if CLV3W_kmeans was used) |
... |
further arguments passed to or from other methods |
The outputs include:
the size of the groups,
the proportion of the variance within each group explained by its latent variable,
the proportion of the whole dataset accounted for by the group latent variables,
the latent components (mode 1) associated with the various groups,
the weights (mode 3) associated with the various groups,
the list of the variables within each group; for each cluster, the loading (mode 2) of each variable is given, together with the correlation of its block component with its group latent component and with the latent component of the next neighbouring group,
the matrix of correlations between the latent variables.