Title: | K-Means for Joint Longitudinal Data |
---|---|
Description: | An implementation of k-means specifically designed to cluster joint trajectories (longitudinal data on several variable-trajectories). Like 'kml', it provides facilities to deal with missing values, computes several quality criteria (Calinski and Harabasz, Ray and Turi, Davies and Bouldin, BIC, ...) and proposes a graphical interface for choosing the 'best' number of clusters. In addition, the 3D graph representing the mean joint-trajectories of each cluster can be exported through LaTeX in a 3D dynamic rotating PDF graph. |
Authors: | Christophe Genolini [cre, aut], Bruno Falissard [ctb], Patrice Kiener [ctb], Jean-Baptiste Pingault [ctb] |
Maintainer: | Christophe Genolini <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.5.0 |
Built: | 2024-11-23 06:20:24 UTC |
Source: | CRAN |
KmL3D is a new implementation of k-means for longitudinal data (or trajectories). Here is an overview of the package.
Package: | KmL3D |
Type: | Package |
Version: | 2.4.2 |
Date: | 2017-08-01 |
License: | GPL (>= 2) |
LazyData: | yes |
Depends: | methods,graphics,rgl,misc3d,longitudinalData(>=2.2),KmL(>=2.2) |
URL: | http://www.r-project.org |
URL: | http://christophe.genolini.free.fr/kml |
To cluster data, KmL3D goes through four steps, each of which is associated with some functions:
1. Data preparation
2. Building "optimal" clusterization
3. Exporting results
4. Visualizing and exporting 3D objects
kml3d works on objects of class ClusterLongData3d. Data preparation therefore simply consists in transforming data into a ClusterLongData3d object. This can be done via the function clusterLongData3d (cld3d in short), which converts a data.frame or an array into a ClusterLongData3d.
Working on several variables measured on different scales can give too much weight to one of the dimensions. So the function scale normalizes the data.
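To illustrate why normalization matters, here is a minimal base-R sketch that centers and reduces each variable of a 3D array (individuals x times x variables) by its overall mean and standard deviation. It is not the package's scale method: the helper scaleArray3d and the toy array traj are hypothetical names used only for this example.

```r
# Hypothetical helper (not part of kml3d): normalize each variable of an
# array of dimension (individuals, times, variables).
scaleArray3d <- function(traj) {
  for (v in seq_len(dim(traj)[3])) {
    m <- mean(traj[, , v], na.rm = TRUE)  # overall mean of variable v
    s <- sd(traj[, , v], na.rm = TRUE)    # overall standard deviation of variable v
    traj[, , v] <- (traj[, , v] - m) / s
  }
  traj
}

set.seed(1)
# Assumed toy data: two variables measured on very different scales
traj <- array(c(rnorm(50, 0, 1), rnorm(50, 1000, 200)), dim = c(10, 5, 2))
scaled <- scaleArray3d(traj)

# After scaling, each variable has mean 0 and standard deviation 1
round(apply(scaled, 3, mean), 10)
round(apply(scaled, 3, sd), 10)
```

Without this step, the second variable (values around 1000) would dominate any Euclidean distance between joint trajectories.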
Instead of working on real data, one can also work on artificial data. Such data can be created with generateArtificialLongData3d (gald3d in short).
Once an object of class ClusterLongData3d has been created, the algorithm kml3d can be run.
Starting with a ClusterLongData3d, kml3d builds several Partition objects (see package longitudinalData). An object of class Partition is a partition of trajectories into subgroups. It also contains some information like the percentage of trajectories contained in each group or some quality criteria (like the Calinski & Harabasz criterion).
k-means is a "hill-climbing" algorithm. The specificity of this kind of algorithm is that it always converges towards a maximum, but one cannot know whether it is a local or a global maximum. It offers no guarantee of optimality.
To maximize one's chances of getting a quality Partition, it is better to execute the hill-climbing algorithm several times, then to choose the best solution. By default, kml3d executes the hill-climbing algorithm 20 times.
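The idea of several redrawings can be sketched in base R with stats::kmeans standing in for the hill-climbing step. This is only an illustration under stated assumptions: kml3d has its own implementation and its own quality criteria, whereas this sketch keeps the run with the smallest total within-cluster sum of squares (minimizing it plays the role of maximizing a quality criterion). The toy data and the names flat, runs and best are hypothetical.

```r
set.seed(42)
# Assumed toy data: 60 individuals, 5 times, 2 variables, three shifted groups
traj <- array(rnorm(60 * 5 * 2), dim = c(60, 5, 2)) + rep(c(0, 4, 8), each = 20)

# Flatten each joint trajectory into one row (times x variables columns)
flat <- matrix(traj, nrow = dim(traj)[1])

nbRedrawing <- 20
runs <- lapply(seq_len(nbRedrawing), function(i) {
  kmeans(flat, centers = 3, nstart = 1)  # one hill-climbing run from a random start
})

# Compare the runs and keep the best one
withinSS <- vapply(runs, function(r) r$tot.withinss, numeric(1))
best <- runs[[which.min(withinSS)]]

range(withinSS)      # spread of the solutions reached from different starts
table(best$cluster)  # cluster sizes of the retained solution
```

When the range is not reduced to a single value, some starts ended in a local optimum, which is exactly why a single run is not enough.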
To date, it is not possible to know the optimal number of clusters, even if the calculation of some quality criteria can give some clues. kml3d computes several of them.
In the end, kml3d tests by default 2, 3, 4, 5 and 6 clusters, 20 times each.
When kml3d has constructed some Partition, the user can examine them one by one and choose to export some. This can be done via the function choice. choice opens a graphic window showing various information, including the trajectories clustered by a specific Partition.
When some Partition has been selected (the user can select more than one), it is possible to save them. The clusters are then exported to the file name-cluster.csv. Criteria are exported to name-criteres.csv. The graphs are exported according to their extension.
KmL3D also proposes tools to visualize the trajectories in 3D. plot3d uses the library rgl to plot two variables according to time (either the whole set of joint-trajectories, or just the mean joint-trajectories). The user can then rotate the graphical representation using the mouse. plot3dPdf builds a Triangles object. This kind of object can be included in a pdf file using saveTrianglesAsASY and the software asymptote. Once again, it is possible to move the image in the pdf file using the mouse, so the reader gets real 3D.
For those who are not familiar with S4 programming: in S4 programming, each function can be adapted for some specific arguments. To get help on a function (for example plot), use: ?(plot). To get help on a function adapted to its argument (for example plot on argument ClusterLongData), use: ?"plot,ClusterLongData".
    ### Move to tempdir
    wd <- getwd()
    setwd(tempdir()); getwd()

    ### 1. Data Preparation
    data(pregnandiol)
    names(pregnandiol)
    cld3dPregTemp <- cld3d(pregnandiol,timeInData=list(temp=1:30*2,preg=1:30*2+1))

    ### 2. Building "optimal" clusterization (with only 2 redrawings)
    ### Real analysis needs at least 20 redrawings
    kml3d(cld3dPregTemp,3:5,nbRedrawing=2,toPlot="both")

    ### 3. Exporting results
    try(choice(cld3dPregTemp))

    ### 4. Visualizing in 3D
    plotMeans3d(cld3dPregTemp,4)

    ### Go back to current dir
    setwd(wd)
Given some longitudinal data (trajectories) and k cluster centers, affectIndiv3d assigns each individual to the cluster whose center is the closest.
    affectIndiv3d(traj, clustersCenter, distance = dist3d)
traj |
|
clustersCenter |
|
distance |
|
Given an array of cluster centers clustersCenter (each plane of the first dimension is a cluster center, that is, clustersCenter[2,,] is the second cluster center), the function affectIndiv3d assigns each individual of the array traj to the closest cluster, according to distance.
affectIndiv3d used with calculTrajMean3d simulates one k-means 3D step.
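The assignment step described above can be sketched in base R as follows. This is a hypothetical re-implementation for illustration only, not the kml3d source code: the function assignNearest is an invented name, and it assumes a Euclidean distance computed over all times and variables at once.

```r
# Hypothetical re-implementation for illustration; not the kml3d source code.
# traj:    array (individuals, times, variables)
# centers: array (clusters, times, variables)
assignNearest <- function(traj, centers) {
  vapply(seq_len(dim(traj)[1]), function(i) {
    d <- vapply(seq_len(dim(centers)[1]), function(k) {
      # Euclidean distance over all times and variables
      sqrt(sum((traj[i, , ] - centers[k, , ])^2))
    }, numeric(1))
    which.min(d)  # index of the closest center
  }, integer(1))
}

# Toy data: 4 individuals, 3 times, 2 variables
traj <- array(0, dim = c(4, 3, 2))
traj[3:4, , ] <- 10
centers <- array(0, dim = c(2, 3, 2))
centers[1, , ] <- 1   # first center, close to individuals 1 and 2
centers[2, , ] <- 9   # second center, close to individuals 3 and 4

assignNearest(traj, centers)   # 1 1 2 2
```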
Object of class Partition.
    #######################
    ### affectIndiv

    ### Some trajectories
    traj <- gald3d()["traj"]

    ### 4 clusters centers
    center <- traj[runif(4,1,nrow(traj)),,]

    ### Affectation of each individual
    part <- affectIndiv3d(traj,center)

    #################
    ### K-means simulation (3 steps)
    plot(clusterLongData3d(traj),parTraj=parTRAJ(col=part+1))
    for (i in 1:3){
        center <- calculTrajMean3d(traj,part)
        part <- affectIndiv3d(traj,center)
        plot(clusterLongData3d(traj),parTraj=parTRAJ(col=part+1))
    }
Given some joint longitudinal data and a cluster assignment, calculTrajMean3d computes the mean joint-trajectories of each cluster.
    calculTrajMean3d(traj, clust, centerMethod=function(x){mean(x,na.rm=TRUE)})
traj |
|
clust |
|
centerMethod |
|
Given a vector of cluster assignments, the function calculTrajMean3d computes the "central" trajectory of each cluster. The "center" can be defined using the argument centerMethod.
affectIndiv3d used with calculTrajMean3d simulates one k-means step.
An array of dimension (k,t,v) with k the number of groups, t the number of time measurements and v the number of variables.
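As an illustration of the computation and of the (k,t,v) shape of the result, here is a base-R sketch. It is not the kml3d source code: clusterMeans is a hypothetical helper, and it assumes that centerMethod is applied across the individuals of a cluster for each (time, variable) cell.

```r
# Hypothetical helper (not the kml3d source): "central" joint-trajectory per cluster.
# traj:  array (individuals, times, variables)
# clust: integer vector giving the cluster of each individual
clusterMeans <- function(traj, clust,
                         centerMethod = function(x) mean(x, na.rm = TRUE)) {
  k <- max(clust)
  centers <- array(NA_real_, dim = c(k, dim(traj)[2], dim(traj)[3]))
  for (g in seq_len(k)) {
    members <- traj[clust == g, , , drop = FALSE]
    # apply centerMethod across individuals, for each (time, variable) cell
    centers[g, , ] <- apply(members, c(2, 3), centerMethod)
  }
  centers
}

# Toy data: each of the 4 individuals has a constant value (1, 3, 10, 20)
traj <- array(c(1, 3, 10, 20), dim = c(4, 3, 2))
clust <- c(1L, 1L, 2L, 2L)

centers <- clusterMeans(traj, clust)
dim(centers)      # 2 groups, 3 times, 2 variables
centers[, 1, 1]   # 2 and 15: means of (1, 3) and (10, 20)
```

Passing centerMethod = function(x) median(x, na.rm = TRUE) to the same helper would give a "k-median" style center instead.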
    #######################
    ### calculTrajMean3d

    ### Some LongitudinalData3d
    traj <- gald3d()["traj"]

    ### A partition
    part <- floor(runif(150,1,5))
    plot(clusterLongData3d(traj),parTraj=parTRAJ(col=part+1))

    ### Clusters center
    (center <- calculTrajMean3d(traj,part))

    #################
    ### K-means simulation (4 steps)
    plot(clusterLongData3d(traj),parTraj=parTRAJ(col=part+1))
    for (i in 1:4){
        part <- affectIndiv3d(traj,center)
        center <- calculTrajMean3d(traj,part)
        plot(clusterLongData3d(traj),parTraj=parTRAJ(col=part+1))
    }
clusterLongData3d (or cld3d in short) is the constructor for ClusterLongData3d objects.
    clusterLongData3d(traj, idAll, time, timeInData, varNames, maxNA)
    cld3d(traj, idAll, time, timeInData, varNames, maxNA)
traj |
|
idAll |
|
time |
|
timeInData |
|
varNames |
|
maxNA |
|
clusterLongData3d constructs an object of class ClusterLongData3d. Two cases can be distinguished:
traj is an array: the first dimension (lines) holds the individuals. The second dimension (columns) holds the times at which the measurements are made. The third dimension holds the different variable-trajectories. For example, traj[,,2] is the second variable-trajectory. If idAll is missing, the individuals are labelled i1, i2, i3, ... If timeInData is missing, all the columns are used (1:ncol(traj)).
traj is a data.frame: lines are individuals. Times of measurement and variables should be provided through timeInData. timeInData is a list. The labels of the list are the variable-trajectory names. Elements of the list are the columns containing the trajectories. For example, if timeInData=list(V=c(2,3,4),W=c(6,8,12)), then the first variable-trajectory is 'V', and its measurements are in columns 2, 3 and 4. The second variable-trajectory is 'W', and its measurements are in columns 6, 8 and 12. If idAll is missing, the first column of the data.frame is used.
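To make the role of timeInData concrete, the base-R sketch below stacks the columns of a data.frame into an array (individuals x times x variables). It only mimics the mapping described above and is not the constructor's actual code; the toy data.frame df and its column names are assumptions made for the example.

```r
# Assumed toy data.frame: column 1 is the identifier,
# columns 2-4 hold V at 3 times, columns 5-7 hold W at 3 times
df <- data.frame(id = c("a", "b"),
                 V1 = c(1, 2), V2 = c(3, 4), V3 = c(5, 6),
                 W1 = c(10, 20), W2 = c(30, 40), W3 = c(50, 60))
timeInData <- list(V = 2:4, W = 5:7)

# Stack the columns of each variable-trajectory into the third dimension
traj <- array(
  unlist(lapply(timeInData, function(cols) as.matrix(df[, cols]))),
  dim = c(nrow(df), length(timeInData[[1]]), length(timeInData)),
  dimnames = list(as.character(df$id), NULL, names(timeInData))
)

dim(traj)       # 2 individuals, 3 times, 2 variables
traj[, , "W"]   # the second variable-trajectory
```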
An object of class ClusterLongData3d.
    ###############
    ### Building an array
    tr1n <- array(c(1,2,NA, 1,4,NA, 6,1,8, 10,NA,2, 3,NA,NA,
                    4,NA,5, 6,3,4, 3,4,4, 4,NA,NA, 5,5,4),
                dim=c(3,5,2))

    ###############
    ### clusterLongData

    ### With maxNA=3
    clusterLongData3d(traj=tr1n,
        idAll=as.character(c(100,102,104)),
        time=c(1,2,4,8,16),
        varNames=c("P","A"),
        maxNA=3
    )

    ### With maxNA=2
    ### Individual 104 is excluded
    clusterLongData3d(traj=tr1n,
        idAll=as.character(c(100,102,104)),
        time=c(1,2,4,8,16),
        varNames=c("P","A"),
        maxNA=2
    )
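The effect of maxNA in the example above can be checked by hand with base R. The sketch below counts the missing values of each individual for each variable and applies the exclusion rule as this page describes it (more missing values than maxNA in a variable excludes the individual). The helper keepFewNA is hypothetical and the per-variable reading of the rule is an assumption, not the package's code.

```r
tr1n <- array(c(1,2,NA, 1,4,NA, 6,1,8, 10,NA,2, 3,NA,NA,
                4,NA,5, 6,3,4, 3,4,4, 4,NA,NA, 5,5,4),
              dim = c(3, 5, 2))
idAll <- c("100", "102", "104")

# Number of missing values per individual (rows) and per variable (columns)
nbNA <- apply(is.na(tr1n), c(1, 3), sum)
nbNA

# Hypothetical helper: keep an individual only if none of its variables exceeds maxNA
keepFewNA <- function(nbNA, maxNA) apply(nbNA <= maxNA, 1, all)

idAll[keepFewNA(nbNA, maxNA = 3)]   # all three individuals are kept
idAll[keepFewNA(nbNA, maxNA = 2)]   # individual 104 (3 NA in the first variable) is dropped
```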
ClusterLongData3d is an object containing joint-trajectories and associated Partition (from package longitudinalData).
kml3d is an algorithm that builds a set of Partition from joint longitudinal data. ClusterLongData3d is the object containing the original joint longitudinal data and all the Partition that kml3d finds.
When created, a ClusterLongData3d object simply contains the initial data (the joint-trajectories). After the execution of kml3d, it contains the original data and the Partition which have just been found by kml3d.
Note that if kml3d is executed several times, every new Partition is added to the existing ones; no pre-existing Partition is erased.
idAll [vector(character)]: Single identifier for each joint-trajectory (each individual). Useful for exporting clusters.
idFewNA [vector(character)]: Restriction of idAll to the trajectories that do not have 'too many' missing values. See maxNA for details.
time [numeric]: Times at which measures are made.
varNames [vector(character)]: Names of the variables measured.
traj [array(numeric)]: Contains the joint longitudinal data. Each horizontal plane (first dimension) corresponds to the trajectories of an individual. Vertical planes (second dimension) refer to the times at which measures are made. Transversal planes (third dimension) are for variables.
dimTraj [vector3(numeric)]: Size of the array traj (i.e. c(length(idFewNA),length(time),length(varNames))).
maxNA [numeric] or [vector(numeric)]: Individuals whose trajectories contain more missing values than maxNA are excluded from traj and will not be used in the analysis. Their identifier is preserved in idAll but not in idFewNA. When maxNA is a single number, it is used for all the variables.
reverse [matrix(numeric)]: Contains the mean (first line) and the standard deviation (second line) used to normalize the data. Useful to restore the original data after a scaling operation.
criterionActif [character]: Stores the name of the criterion that will be used by functions that need a single criterion (like plotCriterion or ordered).
initializationMethod [vector(character)]: Lists all the initialization methods that have already been used to find some Partition (useful to avoid running a deterministic method several times).
sorted [logical]: Are the Partition currently held in the object sorted in decreasing order?
c1 [list(Partition)]: List of Partition with 1 cluster.
c2 [list(Partition)]: List of Partition with 2 clusters.
c3 [list(Partition)]: List of Partition with 3 clusters.
...
c26 [list(Partition)]: List of Partition with 26 clusters.
Class LongData3d in package longitudinalData, directly.
Class ListPartition in package longitudinalData, directly.
object['xxx']: Get the value of the field xxx. Inherits from LongData3d and ListPartition (in package longitudinalData).
object['xxx']<-value: Set the field xxx to value. Inherits from ListPartition.
plot: Display the ClusterLongData3d, one graph for each variable, according to a Partition.
plot3d: Display two variables of the ClusterLongData3d in 3D according to a Partition.
plot3dPdf: Export the ASY code for displaying two variables of the ClusterLongData3d in a 3D pdf graph.
Special thanks to Boris Hejblum for debugging the '[' and '[<-' operators (the previous version was not compatible with the matrix package, which is used by lme4).
    ### Move to tempdir
    wd <- getwd()
    setwd(tempdir()); getwd()

    ### Building longData
    traj <- array(c(1,2,3,1,4, 3,6,1,8,10, 1,2,1,3,2, 4,2,5,6,3, 4,3,4,4,4, 7,6,5,5,4),
                dim=c(3,5,2))

    myCld <- clusterLongData3d(
        traj=traj,
        idAll=as.character(c(100,102,103)),
        time=c(1,2,4,8,15),
        varNames=c("P","A"),
        maxNA=3
    )

    ### Show
    myCld

    ### Get
    myCld['varNames']

    ### Set
    myCld['criterionActif']<-"Davies.Bouldin"

    ### Plot
    plot(myCld)

    ### Go back to current dir
    setwd(wd)
Compute the distance between two joint trajectories.
    dist3d(x, y, method = "euclidian", power = 2)
x |
|
y |
|
method |
|
power |
|
Compute the distance between two joint trajectories, using one of the distances defined by dist.
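One plausible way to obtain such a distance is to flatten both joint trajectories into vectors and delegate to stats::dist. The sketch below does exactly that; it is an assumption about the mechanism, not the actual code of dist3d, and the function name jointDist is hypothetical.

```r
# Hypothetical stand-in for dist3d, assuming it flattens both joint
# trajectories (times x variables matrices) and delegates to stats::dist.
jointDist <- function(x, y, method = "euclidean", power = 2) {
  as.numeric(dist(rbind(as.vector(x), as.vector(y)), method = method, p = power))
}

x <- matrix(0, nrow = 3, ncol = 2)                     # 3 times, 2 variables
y <- matrix(c(1, 1, 1, 2, 2, 2), nrow = 3, ncol = 2)   # first variable at 1, second at 2

jointDist(x, y)                         # sqrt(3 * 1^2 + 3 * 2^2) = sqrt(15)
jointDist(x, y, method = "manhattan")   # 3 * 1 + 3 * 2 = 9
```

The power argument only matters for method = "minkowski", where it is passed to dist as p.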
A numeric
    ### Move to tempdir
    wd <- getwd()
    setwd(tempdir()); getwd()

    ### Generate artificial data
    myCld <- gald3d()

    ### Distance between individual 1 and 3 (they are in the same group)
    dist3d(myCld['traj'][1,,],myCld['traj'][3,,])

    ### Distance between individual 1 and 51 (they are in two different groups)
    dist3d(myCld['traj'][1,,],myCld['traj'][51,,])

    ### Go back to current dir
    setwd(wd)
This function builds an artificial longitudinal data set (joint trajectories) and turns it into an object of class ClusterLongData3d.
    gald3d(nbEachClusters=50,time=0:10,varNames=c("V","T"),
        meanTrajectories=list(function(t){c(0,0)},
            function(t){c(10,10)},function(t){c(10-t,10-t)}),
        personalVariation=function(t){c(rnorm(1,0,2),rnorm(1,0,2))},
        residualVariation=function(t){c(rnorm(1,0,2),rnorm(1,0,2))},
        decimal=2,percentOfMissing=0)
    generateArtificialLongData3d(nbEachClusters=50,time=0:10,varNames=c("V","T"),
        meanTrajectories=list(function(t){c(0,0)},
            function(t){c(10,10)},function(t){c(10-t,10-t)}),
        personalVariation=function(t){c(rnorm(1,0,2),rnorm(1,0,2))},
        residualVariation=function(t){c(rnorm(1,0,2),rnorm(1,0,2))},
        decimal=2,percentOfMissing=0)
nbEachClusters |
|
time |
|
varNames |
|
meanTrajectories |
|
personalVariation |
|
residualVariation |
|
decimal |
|
percentOfMissing |
|
generateArtificialLongData3d (gald3d in short) is a function that constructs a set of artificial joint longitudinal data. Each individual is considered as belonging to a group. This group follows a theoretical trajectory, which is a function of time. These functions (one per group) are given via the argument meanTrajectories.
Within a group, each individual undergoes individual variations. Individual variations are given via the argument residualVariation.
The number of individuals in each group is given by nbEachClusters.
Finally, it is possible to add missing values striking the data at random (MCAR) thanks to percentOfMissing.
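The generative model described above can be sketched in base R as "group mean trajectory plus individual noise". This is a simplified illustration, not the package's implementation: it reuses the argument names meanTrajectories, residualVariation and nbEachClusters, but deliberately ignores personalVariation, decimal and percentOfMissing.

```r
set.seed(7)
time <- 0:10
nbEachClusters <- c(3, 3)

# One theoretical mean joint-trajectory per group; each returns one value per variable
meanTrajectories <- list(function(t) c(0, 0),
                         function(t) c(10 - t, 10 - t))
residualVariation <- function(t) c(rnorm(1, 0, 2), rnorm(1, 0, 2))

# Group of each individual: 1 1 1 2 2 2
group <- rep(seq_along(nbEachClusters), times = nbEachClusters)

traj <- array(NA_real_, dim = c(length(group), length(time), 2))
for (i in seq_along(group)) {
  for (j in seq_along(time)) {
    # theoretical value of the group at this time, plus individual noise
    traj[i, j, ] <- meanTrajectories[[group[i]]](time[j]) + residualVariation(time[j])
  }
}

dim(traj)   # 6 individuals, 11 times, 2 variables
```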
Object of class ClusterLongData3d.
Christophe Genolini
1. UMR U1027, INSERM, Université Paul Sabatier / Toulouse III / France
2. CeRSME, EA 2931, UFR STAPS, Université de Paris Ouest-Nanterre-La Défense / Nanterre / France
[1] C. Genolini and B. Falissard
"KmL: k-means for longitudinal data"
Computational Statistics, vol 25(2), pp 317-328, 2010
[2] C. Genolini and B. Falissard
"KmL: A package to cluster longitudinal data"
Computer Methods and Programs in Biomedicine, 104, pp e112-121, 2011
ClusterLongData3d, clusterLongData3d, generateArtificialLongData
    #####################
    ### Default example
    ex1 <- generateArtificialLongData3d()
    plot3d(ex1,parTraj=parTRAJ(col=rep(2:4,each=50)))

    #####################
    ### 4 lines with unbalanced groups
    ex2 <- generateArtificialLongData3d(
        nbEachClusters=c(5,10,20,40),
        meanTrajectories=list(
            function(t)c(t,t^3/100),
            function(t)c(0,t),
            function(t)c(t,t),
            function(t)c(0,t^3/100)
        ),
        residualVariation = function(t){c(rnorm(1,0,1),rnorm(1,0,1))}
    )
    plot3d(ex2,parTraj=parTRAJ(col=rep(1:4,times=c(5,10,20,40))))
kml3d is a new implementation of k-means for joint longitudinal data (or joint trajectories). This algorithm is able to deal with missing values and provides an easy way to re-run the algorithm several times, varying the starting conditions and/or the number of clusters looked for.
Here is the description of the algorithm. For an overview of the package, see kml3d-package.
    kml3d(object, nbClusters = 2:6, nbRedrawing = 20, toPlot = "none", parAlgo = parKml3d())
object |
[ClusterLongData3d]: contains trajectories to clusterize
and some |
nbClusters |
[vector(numeric)]: Vector containing the number of clusters
with which |
nbRedrawing |
[numeric]: Sets the number of times that k-means must be re-run (with different starting conditions) for each number of clusters. |
toPlot |
|
parAlgo |
|
kml3d works on objects of class ClusterLongData3d. For each number i included in nbClusters, kml3d computes a Partition then stores it in the field cX of the object ClusterLongData3d according to its number of clusters 'X'. The algorithm starts over as many times as specified in nbRedrawing. By default, it is executed for 2, 3, 4, 5 and 6 clusters, 20 times each, namely 100 times.
When a Partition has been found, it is added to the slot c1, c2, c3, ... or c26. cX stores all the Partition with X clusters. Inside a sublist, the Partition can be sorted from the biggest quality criterion to the smallest (the best are stored first, using ordered,ListPartition), or not.
Note that Partition are saved throughout the algorithm. If the user interrupts the execution of kml3d, the result is not lost. If the user runs kml3d on an object, then running kml3d again on the same object will add some new Partition to the ones already found.
The possible starting conditions are defined in initializePartition.
A ClusterLongData3d object, after having added some Partition to it.
Behind kml3d, there are two different procedures:
Fast: when the parameter distance is set to "euclidean3d" and toPlot is set to 'none' or 'criterion', kml3d calls a compiled (optimized) C procedure.
Slow: when the user defines their own distance, or wants to see the construction of the clusters by setting toPlot to 'traj' or 'both', kml3d uses a non-compiled R program.
The C procedure is 25 times faster than the R one.
So we advise using the R procedure 1/ for trying some new method (like using a new distance) or 2/ to "see" the very first cluster constructions, in order to check that everything goes right. Then it is better to switch to the C procedure (as we do in the Examples section).
If for a specific use, you need a different distance, feel free to contact the author.
Overview: kml3d-package
Classes: ClusterLongData3d, Partition in package longitudinalData
Methods: clusterLongData3d, choice
    ### Move to tempdir
    wd <- getwd()
    setwd(tempdir()); getwd()

    ### Generation of some data
    cld1 <- generateArtificialLongData3d(15)

    ### We suspect 2, 3, 4 or 5 clusters, we want 3 redrawings.
    ### We want to "see" what happens (so toPlot="both")
    kml3d(cld1,2:5,3,toPlot="both")

    ### 3 seems to be the best.
    ### We don't want to see again, we want to get the result as fast as possible.
    ### Just, to check the overall process, we plot the criterion evolution
    kml3d(cld1,3,10,toPlot="criterion")

    ### Go back to current dir
    setwd(wd)
parKml3d is a constructor of object ParKml (from package kml) that provides adequate default values for the use of function kml3d.
    parKml3d(saveFreq = 100, maxIt = 200, imputationMethod = "copyMean",
        distanceName = "euclidean3d", power = 2, distance = function() { },
        centerMethod = meanNA, startingCond = "nearlyAll", nbCriterion = 100, scale = TRUE)
saveFreq |
|
maxIt |
|
imputationMethod |
|
distanceName |
|
power |
|
distance |
|
centerMethod |
|
startingCond |
|
nbCriterion |
|
scale |
|
An object ParKml (see package kml).
    ### Move to tempdir
    wd <- getwd()
    setwd(tempdir()); getwd()

    ### Generation of some data
    cld1 <- generateArtificialLongData3d(c(15,15,15))

    ### Setting two different sets of options:
    (option1 <- parKml3d())
    (option2 <- parKml3d(centerMethod=function(x)median(x,na.rm=TRUE)))

    ### Running kml. Formally, the second example is 'k-median'
    kml3d(cld1,4,1,toPlot="both",parAlgo=option1)
    kml3d(cld1,4,1,toPlot="both",parAlgo=option2)

    ### Go back to current dir
    setwd(wd)
plot the trajectories of an object ClusterLongData3d relative to a Partition. One graph for each variable is displayed.
    ## S4 method for signature 'ClusterLongData3d,ANY'
    plot(x,y=NA,parTraj=parTRAJ(),parMean=parMEAN(),
        addLegend=TRUE,adjustLegend=-0.05,toPlot="both",nbCriterion=1000,...)
x |
|
y |
|
parTraj |
|
parMean |
|
toPlot |
|
nbCriterion |
|
addLegend |
|
adjustLegend |
|
... |
Some other parameters can be passed to the method (like "xlab" or "ylab"). |
plot the trajectories of an object ClusterLongData3d relative to the 'best' Partition, or to the Partition defined by y.
Graphical options concerning the individual trajectories (col, type, pch and xlab) can be changed using parTraj. Graphical options concerning the cluster mean trajectories (col, type, pch, pchPeriod and cex) can be changed using parMean. For more details on parTraj and parMean, see objects of class ParLongData in package longitudinalData.
Overview: kml3d-package
Classes: ClusterLongData3d
Plot: plotTraj, plotCriterion
    ### Move to tempdir
    wd <- getwd()
    setwd(tempdir()); getwd()

    ##################
    ### Construction of the data
    myCld <- gald3d()

    ### Basic plotting
    plot(myCld)

    ##################
    ### Changing graphical parameters 'par'

    ### No letters on the mean trajectories
    kml3d(myCld,2:7,2)
    plot(myCld,2,parMean=parMEAN(type="l"))

    ### Only one letter on the mean trajectories
    plot(myCld,3,parMean=parMEAN(pchPeriod=Inf))

    ### Color individual according to its clusters (col="clusters")
    plot(myCld,4,parTraj=parTRAJ(col="clusters"))

    ### Mean without individual
    plot(myCld,5,parTraj=parTRAJ(type="n"))

    ### No mean trajectories (type="n")
    ### Color individual according to its clusters (col="clusters")
    plot(myCld,6,parTraj=parTRAJ(col="clusters"),parMean=parMEAN(type="n"))

    ### Only few trajectories
    plot(myCld,7,nbSample=10,parTraj=parTRAJ(col='clusters'),parMean=parMEAN(type="n"))

    ### Go back to current dir
    setwd(wd)
Plot two variables of a ClusterLongData3d object in 3D, optionally relative to a Partition.
    ## S4 method for signature 'ClusterLongData3d,numeric'
    plot3d(x,y,varY=1,varZ=2,
        parTraj=parTRAJ(),parMean=parMEAN(),...)
x |
|
y |
|
varY |
|
varZ |
|
parTraj |
|
parMean |
|
... |
Arguments to be passed to methods, such as graphical parameters. |
Plot two variables of a ClusterLongData3d object in 3D. It uses the rgl library. The user can rotate the graphical representation using the mouse.
    ### Move to tempdir
    wd <- getwd()
    setwd(tempdir()); getwd()

    ##################
    ### Real example on array
    time=c(1,2,3,4,8,12,16,20)
    id2=1:120
    f <- function(id,t)((id-1)%%3-1) * t
    g <- function(id,t)(id%%2+1)*t
    h <- function(id,t)(id%%4-0.5)*(20-t)
    myCld <- clusterLongData3d(array(cbind(outer(id2,time,f),outer(id2,time,g),
        outer(id2,time,h))+rnorm(120*8*3,0,3),dim=c(120,8,3)))

    ### Basic plot
    plot(myCld,parTraj=parTRAJ(col=rep(1:6,20)))

    ### plot3d, variable 1 and 2
    plot3d(myCld,parTraj=parTRAJ(col=rep(1:6,20)))

    ### plot3d, variable 1 and 3
    plot3d(myCld,parTraj=parTRAJ(col=rep(1:6,20)),varZ=3)
    plot3d(myCld,parTraj=parTRAJ(col="red"))

    ### Go back to current dir
    setwd(wd)
Given a ClusterLongData3d and a Partition (from package longitudinalData), this function creates Triangle objects representing the 3D plot of two variables of the main trajectories.
    ## S4 method for signature 'ClusterLongData3d,missing'
    plot3dPdf(x,y,varY=1,varZ=2)
    ## S4 method for signature 'ClusterLongData3d,numeric'
    plot3dPdf(x,y,varY=1,varZ=2)
x |
|
y |
|
varY |
|
varZ |
|
Create Triangle objects representing the 3D plot of the main
trajectories of a ClusterLongData
(of package longitudinalData
).
The three functions plot3dPdf, saveTrianglesAsASY and makeLatexFile are designed to export a 3D graph to a PDF file. The process is the following:
1. plot3dPdf: creates a scene, that is, a collection of Triangle objects that represent a 3D image.
2. saveTrianglesAsASY: exports the scene to an '.asy' file.
3. An '.asy' file cannot be included in a LaTeX file; LaTeX can only read a '.prc' file. So the next step is to use asymptote to convert '.asy' to '.prc'. This is done with the command asy -inlineimage -tex pdflatex scene.asy.
4. The previous step produces a file scene+0.prc that can be included in a LaTeX file. makeLatexFile creates a LaTeX file that is directly compilable (using pdfLatex). It produces a PDF file that contains the 3D object.
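The two external commands of this process can also be launched from R instead of a console. The following is only a sketch under assumptions: run_external is a hypothetical helper (it is not part of the package), the file names scene.asy and main.tex are the ones used in the example of this page, and both asy and pdflatex are assumed to be installed and on the PATH.

```r
# Hypothetical helper, not part of kml3d: run an external tool if it is on the PATH.
# Returns TRUE (invisibly) when the tool exits with status 0, FALSE otherwise.
run_external <- function(cmd, args) {
  if (!nzchar(Sys.which(cmd))) {
    message(cmd, " not found; run manually: ", paste(c(cmd, args), collapse = " "))
    return(invisible(FALSE))
  }
  status <- system2(cmd, args)
  invisible(identical(status, 0L))
}

# Step 3: convert the '.asy' scene to '.prc' (same command as in the text).
asy_args <- c("-inlineimage", "-tex", "pdflatex", "scene.asy")
if (file.exists("scene.asy")) run_external("asy", asy_args)

# Step 4: compile the LaTeX document (assumed to be 'main.tex', as in the example).
if (file.exists("main.tex")) run_external("pdflatex", "main.tex")
```

Each call is guarded by file.exists, so the sketch does nothing until saveTrianglesAsASY and makeLatexFile have written their files in the working directory.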
A Triangle object.
Christophe Genolini
INSERM U669 / PSIGIAM: Paris Sud Innovation Group in Adolescent Mental Health
Modal'X / Universite Paris Ouest-Nanterre- La Defense
Contact author : [email protected]
Article "KmL: K-means for Longitudinal Data", in
Computational Statistics, Volume 25, Issue 2 (2010), Page 317.
Web site: http://christophe.genolini.free.fr/kml/
### Move to tempdir
wd <- getwd()
setwd(tempdir()); getwd()

### Generating the data
myCld3d <- gald3d(c(5,5,5))
kml3d(myCld3d,3:4,1)

### Creation of the scene
scene <- plot3dPdf(myCld3d,3)
drawScene.rgl(scene)

### Export in '.asy' file
saveTrianglesAsASY(scene)

### Creation of a '.prc' file
# Open a console window, then run
# asy -inlineimage -tex pdflatex scene.asy

### Creation of the LaTeX main document
makeLatexFile()

### Creation of the '.pdf'
# Open a console window, then run
# pdfLatex main.tex

### Go back to current dir
setwd(wd)
Plot the means of two variables of a ClusterLongData3d object in 3D relative to a Partition (from package longitudinalData).
## S4 method for signature 'ClusterLongData3d,numeric'
plotMeans3d(x,y,varY=1,varZ=2,
   parTraj=parTRAJ(type="n"),parMean=parMEAN(),...)
x |
|
y |
|
varY |
|
varZ |
|
parTraj |
|
parMean |
|
... |
Arguments to be passed to methods, such as graphical parameters. |
Plot the mean trajectories of two variables of a ClusterLongData3d object in 3D. It uses the rgl library. The user can rotate the graphical representation with the mouse.
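To make concrete what "means relative to a Partition" refers to, here is a small base R sketch. It is an illustration only, not the package's internal code: the array traj and the vector clusters are made-up toy values, and the mean trajectory of a cluster is taken to be the average over its individuals at each time, for each variable.

```r
# Toy joint trajectories: 6 individuals x 4 times x 2 variables (made-up values).
traj <- array(seq_len(6 * 4 * 2), dim = c(6, 4, 2))

# Hypothetical partition: one cluster label per individual.
clusters <- c("A", "B", "A", "B", "A", "B")

# For each cluster, average over its individuals at each time and for each variable.
mean_traj <- sapply(
  split(seq_along(clusters), clusters),
  function(idx) apply(traj[idx, , , drop = FALSE], c(2, 3), mean),
  simplify = "array"
)

# Resulting dimensions: time x variable x cluster.
dim(mean_traj)
```

With two variables selected (varY and varZ), each cluster then yields one curve in the space (time, variable Y, variable Z).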
### Move to tempdir
wd <- getwd()
setwd(tempdir()); getwd()

##################
### Real example on array

time=c(1,2,3,4,8,12,16,20)
id2=1:120
f <- function(id,t)((id-1)%%3-1) * t
g <- function(id,t)(id%%2+1)*t
h <- function(id,t)(id%%4-0.5)*(20-t)
myCld <- clusterLongData3d(array(cbind(outer(id2,time,f),outer(id2,time,g),
   outer(id2,time,h))+rnorm(120*8*3,0,3),dim=c(120,8,3)))
kml3d(myCld,3:4,2)

### Basic plot
plotMeans3d(myCld,3)

### plotMeans3d, variable 1 and 3
plotMeans3d(myCld,4,varZ=3)
plotMeans3d(myCld,3,parTraj=parTRAJ(col="red"))

### Go back to current dir
setwd(wd)
Plot the trajectories of two variables of a ClusterLongData3d object in 3D relative to a Partition (from package longitudinalData).
## S4 method for signature 'ClusterLongData3d,numeric'
plotTraj3d(x,y,varY=1,varZ=2,
   parTraj=parTRAJ(col="clusters"),parMean=parMEAN(type="n"),...)
x |
|
y |
|
varY |
|
varZ |
|
parTraj |
|
parMean |
|
... |
Arguments to be passed to methods, such as graphical parameters. |
Plot the trajectories of two variables of a ClusterLongData3d object in 3D. It uses the rgl library. The user can rotate the graphical representation with the mouse.
### Move to tempdir
wd <- getwd()
setwd(tempdir()); getwd()

##################
### Real example on array

time=c(1,2,3,4,8,12,16,20)
id2=1:120
f <- function(id,t)((id-1)%%3-1) * t
g <- function(id,t)(id%%2+1)*t
h <- function(id,t)(id%%4-0.5)*(20-t)
myCld <- clusterLongData3d(array(cbind(outer(id2,time,f),outer(id2,time,g),
   outer(id2,time,h))+rnorm(120*8*3,0,3),dim=c(120,8,3)))
kml3d(myCld,3:4,2)

### Basic plot
plotTraj3d(myCld,3)

### plotTraj3d, variable 1 and 3
plotTraj3d(myCld,4,varZ=3)
plotTraj3d(myCld,3,parMean=parMEAN(col="red"))

### Go back to current dir
setwd(wd)
These longitudinal data are extracted from the QUIDEL database, whose aim is to study hormone profiles among women who have no fertility problems.
data(pregnandiol)
Longitudinal data in wide format. It includes 107 women who were followed for up to 49 days. Each column corresponds to a specific measurement time. The outcome is the hormone "pregnandiol".
id
Unique identifier for each patient.
day1
Measurement of pregnandiol at day 1.
day2
Measurement of pregnandiol at day 2.
day3
Measurement of pregnandiol at day 3.
...
...
day49
Measurement of pregnandiol at day 49.
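The layout described above (one row per woman, one column per day) can be pictured with a toy data frame. This sketch does not use the real pregnandiol data: the object toy and its values are made up, and only the column naming (id, day1, ..., day49) follows the format listed here. Missing values stand for days on which a woman was no longer followed.

```r
set.seed(1)
n_women <- 5   # toy size; the real data set has 107 women
n_days  <- 49

# Wide format: one row per woman, one column per day.
toy <- data.frame(
  id = seq_len(n_women),
  matrix(rnorm(n_women * n_days), nrow = n_women,
         dimnames = list(NULL, paste0("day", seq_len(n_days))))
)

# Woman 2 is followed for 39 days only: later days are missing.
toy[2, paste0("day", 40:49)] <- NA

# Trajectories as a matrix (individuals x times), without the id column.
traj <- as.matrix(toy[, -1])
rownames(traj) <- toy$id

# Number of days actually observed for each woman.
n_obs <- rowSums(!is.na(traj))
n_obs
```

A matrix of this shape (individuals in rows, times in columns) is the usual starting point for building a longitudinal data object for one variable.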
The QUIDEL database aims to gain better knowledge of hormone profiles among women who have no fertility problem. This database has been described as the largest existing database on hormone profiles in the normal human menstrual cycle, involving ultrasound scan of the day of ovulation [eco06]. It involves 107 women and 283 cycles in all, with identification of the day of ovulation and daily titration of the levels of the four main hormones in the ovulation cycle. The database belongs to the laboratory in charge of the analysis of hormone trajectories (CNRS 5558, René Ecochard). It has already been the subject of numerous publications, including [eco00, eco01].
QUIDEL cohort
[eco00] Ecochard R, Gougeon A. Side of ovulation and cycle characteristics in normally fertile women. Human reproduction (Oxford, England). 2000;15(4):752-755.
[eco01] Ecochard R et al. Chronological aspects of ultrasonic, hormonal, and other indirect indices of ovulation. BJOG : an international journal of obstetrics and gynaecology. 2001;108(8):822-829.