Title: | Model-Based Co-Clustering of Functional Data |
---|---|
Description: | The funLBM algorithm allows to simultaneously cluster the rows and the columns of a data matrix where each entry of the matrix is a function or a time series. |
Authors: | Charles Bouveyron, Julien Jacques and Amandine Schmutz |
Maintainer: | Charles Bouveyron <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.3 |
Built: | 2024-12-14 06:50:27 UTC |
Source: | CRAN |
The adjusted Rand index (ARI) allows to compare two clustering partitions.
ari(x, y)
ari(x, y)
x |
The first partition to compare (as vector). |
y |
The second partition to compare (as vector). |
ari |
The value of the ARI. |
x = sample(1:3,20,replace = TRUE) y = sample(1:3,20,replace = TRUE) ari(x,y)
x = sample(1:3,20,replace = TRUE) y = sample(1:3,20,replace = TRUE) ari(x,y)
The funLBM algorithm, proposed by Bouveyron et al. (2018) <doi:10.1111/rssc.12260>, allows to simultaneously cluster the rows and the columns of one or more data matrix where each entry of the matrix is a (univariate or multivariate) function or a time series.
funLBM(X, K, L, maxit = 50, burn = 25, basis.name = "fourier", nbasis = 15, nbinit = 1, gibbs.it = 3, display = FALSE, init = "funFEM", mc.cores = 1, ...)
funLBM(X, K, L, maxit = 50, burn = 25, basis.name = "fourier", nbasis = 15, nbinit = 1, gibbs.it = 3, display = FALSE, init = "funFEM", mc.cores = 1, ...)
X |
Univariate case: The data array (n x p x T) where each entry corresponds to the measure of one individual i, i=1,...,n, for a functional variable j, j=1,...,p, at point t, t=1,...,T. Multivariate case: a list of data array as described hereinabove with one data array by variable. |
K |
The number of row clusters, |
L |
The number of column clusters, |
maxit |
The maximum number of iterations of the SEM-Gibbs algorithm (default is 100), |
burn |
The number of of iterations of the burn-in period (default is 50), |
basis.name |
The name ('fourier' or 'spline') of the basis functions used for the decomposition of the functions (default is 'fourier'), |
nbasis |
Number of the basis functions used for the decomposition of the functions (default is 15), |
nbinit |
Number of initializations (default is 3), |
gibbs.it |
Number of Gibbs iterations (default is 3), |
display |
Binary value. If TRUE, information about the iterations is displayed, |
init |
The type of initialization: 'random', 'kmeans' or 'funFEM'. Default is 'kmeans', |
mc.cores |
The number of cores for parallel computing (default is 1), |
... |
Additional parameters to provide to sub-functions. |
The resulting object contains, in addition to call information:
prms |
A list containing all fited parameters for the best model (according to ICL) |
Z |
The dummy matrix of row clustering |
W |
The dummy matrix of column clustering |
row_clust |
The group memberships of rows |
col_clust |
The group memberships of columns |
allPrms |
A list containing the fited parameters for all tested models |
loglik |
The log-likelihood of the best model |
icl |
The value of ICL for the best model |
C. Bouveyron, L. Bozzi, J. Jacques and F.-X. Jollois, The Functional Latent Block Model for the Co-Clustering of Electricity Consumption Curves, Journal of the Royal Statistical Society, Series C, 2018 (https://doi.org/10.1111/rssc.12260).
## Univariate example: Co-clustering on simulated data set.seed(12345) X = simulateData(n = 30, p = 30, t = 15) out = funLBM(X$data,K=4,L=3) # Visualization of results plot(out,type='blocks') plot(out,type='proportions') plot(out,type='means') # Evaluating clustering results ari(out$col_clust,X$col_clust) ari(out$row_clust,X$row_clust) ## Multivariate example: X = simulateData2(n = 50, p = 50, t = 15) out = funLBM(list(X$data1,X$data2),K=4,L=3) # Visualization of results plot(out,type='blocks') plot(out,type='proportions') plot(out,type='means') # Evaluating clustering results ari(out$col_clust,X$col_clust) ari(out$row_clust,X$row_clust) ## The following examples could take a few minutes to run ## and depend on the number of available CPU cores! ## Co-clustering on simulated data with parallel model selection #X = simulateData(n = 30, p = 30, t = 15) #out = funLBM(X$data,K=2:4,L=2:4,mc.cores = 4) ## Evaluating clustering results #ari(out$col_clust,X$col_clust) #ari(out$row_clust,X$row_clust) ## Co-clustering of Velib data #data(Velib) #out = funLBM(Velib$data,K=4,L=2) ## Visualization of results #plot(out,type='blocks') #plot(out,type='proportions') #plot(out,type='means')
## Univariate example: Co-clustering on simulated data set.seed(12345) X = simulateData(n = 30, p = 30, t = 15) out = funLBM(X$data,K=4,L=3) # Visualization of results plot(out,type='blocks') plot(out,type='proportions') plot(out,type='means') # Evaluating clustering results ari(out$col_clust,X$col_clust) ari(out$row_clust,X$row_clust) ## Multivariate example: X = simulateData2(n = 50, p = 50, t = 15) out = funLBM(list(X$data1,X$data2),K=4,L=3) # Visualization of results plot(out,type='blocks') plot(out,type='proportions') plot(out,type='means') # Evaluating clustering results ari(out$col_clust,X$col_clust) ari(out$row_clust,X$row_clust) ## The following examples could take a few minutes to run ## and depend on the number of available CPU cores! ## Co-clustering on simulated data with parallel model selection #X = simulateData(n = 30, p = 30, t = 15) #out = funLBM(X$data,K=2:4,L=2:4,mc.cores = 4) ## Evaluating clustering results #ari(out$col_clust,X$col_clust) #ari(out$row_clust,X$row_clust) ## Co-clustering of Velib data #data(Velib) #out = funLBM(Velib$data,K=4,L=2) ## Visualization of results #plot(out,type='blocks') #plot(out,type='proportions') #plot(out,type='means')
Functional data observations, or a derivative of them, are plotted.
These may be either plotted simultaneously, as matplot
does for
multivariate data, or one by one with a mouse click to move from one
plot to another. The function also accepts the other plot
specification arguments that the regular plot
does. Calling
plot
with an fdSmooth
or an fdPar
object plots its fd
component.
## S3 method for class 'fd' plot(x, y, Lfdobj=0, href=TRUE, titles=NULL, xlim=NULL, ylim=NULL, xlab=NULL, ylab=NULL, ask=FALSE, nx=NULL, axes=NULL, col=1, ...)
## S3 method for class 'fd' plot(x, y, Lfdobj=0, href=TRUE, titles=NULL, xlim=NULL, ylim=NULL, xlab=NULL, ylab=NULL, ask=FALSE, nx=NULL, axes=NULL, col=1, ...)
x |
functional data object(s) to be plotted. |
y |
sequence of points at which to evaluate the functions 'x' and plot on the horizontal axis. Defaults to seq(rangex[1], rangex[2], length = nx). NOTE: This will be the values on the horizontal axis, NOT the vertical axis. |
Lfdobj |
either a nonnegative integer or a linear differential operator object. If present, the derivative or the value of applying the operator is plotted rather than the functions themselves. |
href |
a logical variable: If |
titles |
a vector of strings for identifying curves |
xlab |
a label for the horizontal axis. |
ylab |
a label for the vertical axis. |
xlim |
a vector of length 2 containing axis limits for the horizontal axis. |
ylim |
a vector of length 2 containing axis limits for the vertical axis. |
ask |
a logical value: If |
nx |
the number of points to use to define the plot. The default is usually enough, but for a highly variable function more may be required. |
axes |
Either a logical or a list or
|
col |
line colors |
... |
additional plotting arguments that can be used with function
|
Note that for multivariate data, a suitable array must first be
defined using the par
function.
'done'
a plot of the functional observations
## ## plot.fd ## daybasis65 <- create.fourier.basis(c(0, 365), 65, axes=list("axesIntervals")) harmaccelLfd <- vec2Lfd(c(0,(2*pi/365)^2,0), c(0, 365)) harmfdPar <- fdPar(daybasis65, harmaccelLfd, lambda=1e5) daytempfd <- with(CanadianWeather, smooth.basis(day.5, dailyAv[,,"Temperature.C"], daybasis65)$fd) # plot all the temperature functions for the monthly weather data plot(daytempfd, main="Temperature Functions") ## Not run: # To plot one at a time: # The following pauses to request page changes. \dontshow{ # (Without 'dontrun', the package build process # might encounter problems with the par(ask=TRUE) # feature.) } plot(daytempfd, ask=TRUE) ## End(Not run) ## ## plot.fdSmooth ## b3.4 <- create.bspline.basis(norder=3, breaks=c(0, .5, 1)) # 4 bases, order 3 = degree 2 = # continuous, bounded, locally quadratic fdPar3 <- fdPar(b3.4, lambda=1) # Penalize excessive slope Lfdobj=1; # (Can not smooth on second derivative Lfdobj=2 at it is discontinuous.) fd3.4s0 <- smooth.basis(0:1, 0:1, fdPar3) # using plot.fd directly plot(fd3.4s0$fd) ## ## with Date and POSIXct argvals ## # Date invasion1 <- as.Date('1775-09-04') invasion2 <- as.Date('1812-07-12') earlyUS.Canada <- as.numeric(c(invasion1, invasion2)) BspInvasion <- create.bspline.basis(earlyUS.Canada) earlyUSyears <- seq(invasion1, invasion2, length.out=7) earlyUScubic <- (as.numeric(earlyUSyears-invasion1)/365.24)^3 earlyUSyears <- as.numeric(earlyUSyears) fitCubic <- smooth.basis(earlyUSyears, earlyUScubic, BspInvasion)$fd plot(fitCubic) # POSIXct AmRev.ct <- as.POSIXct1970(c('1776-07-04', '1789-04-30')) AmRevYrs.ct <- seq(AmRev.ct[1], AmRev.ct[2], length.out=14) AmRevLin.ct <- as.numeric(AmRevYrs.ct-AmRev.ct[2]) AmRevYrs.ct <- as.numeric(AmRevYrs.ct) BspRev.ct <- create.bspline.basis(AmRev.ct) fitLin.ct <- smooth.basis(AmRevYrs.ct, AmRevLin.ct, BspRev.ct)$fd plot(fitLin.ct)
## ## plot.fd ## daybasis65 <- create.fourier.basis(c(0, 365), 65, axes=list("axesIntervals")) harmaccelLfd <- vec2Lfd(c(0,(2*pi/365)^2,0), c(0, 365)) harmfdPar <- fdPar(daybasis65, harmaccelLfd, lambda=1e5) daytempfd <- with(CanadianWeather, smooth.basis(day.5, dailyAv[,,"Temperature.C"], daybasis65)$fd) # plot all the temperature functions for the monthly weather data plot(daytempfd, main="Temperature Functions") ## Not run: # To plot one at a time: # The following pauses to request page changes. \dontshow{ # (Without 'dontrun', the package build process # might encounter problems with the par(ask=TRUE) # feature.) } plot(daytempfd, ask=TRUE) ## End(Not run) ## ## plot.fdSmooth ## b3.4 <- create.bspline.basis(norder=3, breaks=c(0, .5, 1)) # 4 bases, order 3 = degree 2 = # continuous, bounded, locally quadratic fdPar3 <- fdPar(b3.4, lambda=1) # Penalize excessive slope Lfdobj=1; # (Can not smooth on second derivative Lfdobj=2 at it is discontinuous.) fd3.4s0 <- smooth.basis(0:1, 0:1, fdPar3) # using plot.fd directly plot(fd3.4s0$fd) ## ## with Date and POSIXct argvals ## # Date invasion1 <- as.Date('1775-09-04') invasion2 <- as.Date('1812-07-12') earlyUS.Canada <- as.numeric(c(invasion1, invasion2)) BspInvasion <- create.bspline.basis(earlyUS.Canada) earlyUSyears <- seq(invasion1, invasion2, length.out=7) earlyUScubic <- (as.numeric(earlyUSyears-invasion1)/365.24)^3 earlyUSyears <- as.numeric(earlyUSyears) fitCubic <- smooth.basis(earlyUSyears, earlyUScubic, BspInvasion)$fd plot(fitCubic) # POSIXct AmRev.ct <- as.POSIXct1970(c('1776-07-04', '1789-04-30')) AmRevYrs.ct <- seq(AmRev.ct[1], AmRev.ct[2], length.out=14) AmRevLin.ct <- as.numeric(AmRevYrs.ct-AmRev.ct[2]) AmRevYrs.ct <- as.numeric(AmRevYrs.ct) BspRev.ct <- create.bspline.basis(AmRev.ct) fitLin.ct <- smooth.basis(AmRevYrs.ct, AmRevLin.ct, BspRev.ct)$fd plot(fitLin.ct)
Plotting of funLBM co-clustering results: functional means, block matrix, parameters, ...
## S3 method for class 'funLBM' plot(x,type='blocks',...)
## S3 method for class 'funLBM' plot(x,type='blocks',...)
x |
An object produced by the funLBM function, |
type |
The type of plot to display. Possible plots are 'blocks' (default), 'means', 'evolution', 'likelihood', 'proportions', |
... |
Additional arguments to provide. |
## Co-clustering of simulated data set.seed(12345) X = simulateData(n = 30, p = 30, t = 15) out = funLBM(X$data,K=4,L=3) # Visualization of results plot(out,type='blocks') plot(out,type='proportions') plot(out,type='means')
## Co-clustering of simulated data set.seed(12345) X = simulateData(n = 30, p = 30, t = 15) out = funLBM(X$data,K=4,L=3) # Visualization of results plot(out,type='blocks') plot(out,type='proportions') plot(out,type='means')
Printing a summary of the funLBM co-clustering results
## S3 method for class 'funLBM' print(x,...)
## S3 method for class 'funLBM' print(x,...)
x |
An object produced by the funLBM function, |
... |
Additional arguments to provide. |
## Co-clustering of simulated data set.seed(12345) X = simulateData(n = 30, p = 30, t = 15) out = funLBM(X$data,K=4,L=3) out
## Co-clustering of simulated data set.seed(12345) X = simulateData(n = 30, p = 30, t = 15) out = funLBM(X$data,K=4,L=3) out
Simulate data according to the funLBM model with K=4 groups for rows and L=3 groups for columns.
simulateData(n = 100, p = 100, t = 30)
simulateData(n = 100, p = 100, t = 30)
n |
The number of rows (individuals) of the simulated data array, |
p |
The number of columns (functional variables) of the simulated data array, |
t |
The number of measures for the functions of the simulated data array. |
The resulting object contains:
data |
data array of size n x p x t |
row_clust |
Group memberships of rows |
col_clust |
Group memberships of columns |
C. Bouveyron, L. Bozzi, J. Jacques and F.-X. Jollois, The Functional Latent Block Model for the Co-Clustering of Electricity Consumption Curves, Journal of the Royal Statistical Society, Series C, 2018 (https://doi.org/10.1111/rssc.12260).
set.seed(12345) # Simulate data and co-clustering X = simulateData(n = 30, p = 30, t = 15) # Co-clustering with funLBM out = funLBM(X$data,K=4,L=3) # Visualization of results plot(out,type='blocks') plot(out,type='proportions') plot(out,type='means') # Evaluating clustering results ari(out$col_clust,X$col_clust) ari(out$row_clust,X$row_clust)
set.seed(12345) # Simulate data and co-clustering X = simulateData(n = 30, p = 30, t = 15) # Co-clustering with funLBM out = funLBM(X$data,K=4,L=3) # Visualization of results plot(out,type='blocks') plot(out,type='proportions') plot(out,type='means') # Evaluating clustering results ari(out$col_clust,X$col_clust) ari(out$row_clust,X$row_clust)
Simulate bivariate data according to the funLBM model with K=4 groups for rows and L=3 groups for columns.
simulateData2(n = 100, p = 100, t = 30)
simulateData2(n = 100, p = 100, t = 30)
n |
The number of rows (individuals) of the simulated data array, |
p |
The number of columns (functional variables) of the simulated data array, |
t |
The number of measures for the functions of the simulated data array. |
The resulting object contains:
data1 |
data array of size n x p x t for first variable |
data2 |
data array of size n x p x t for second variable |
row_clust |
Group memberships of rows |
col_clust |
Group memberships of columns |
C. Bouveyron, L. Bozzi, J. Jacques and F.-X. Jollois, The Functional Latent Block Model for the Co-Clustering of Electricity Consumption Curves, Journal of the Royal Statistical Society, Series C, 2018 (https://doi.org/10.1111/rssc.12260).
# Simulate data and co-clustering set.seed(12345) X = simulateData2(n = 50, p = 50, t = 15) # Co-clustering with funLBM out = funLBM(list(X$data1,X$data2),K=4,L=3) # Visualization of results plot(out,type='blocks') plot(out,type='proportions') plot(out,type='means') # Evaluating clustering results ari(out$col_clust,X$col_clust) ari(out$row_clust,X$row_clust)
# Simulate data and co-clustering set.seed(12345) X = simulateData2(n = 50, p = 50, t = 15) # Co-clustering with funLBM out = funLBM(list(X$data1,X$data2),K=4,L=3) # Visualization of results plot(out,type='blocks') plot(out,type='proportions') plot(out,type='means') # Evaluating clustering results ari(out$col_clust,X$col_clust) ari(out$row_clust,X$row_clust)
The Velib data set contains data from the bike sharing system of Paris, called Velib. The data are loading profiles of the bike stations over seven days. The data were collected every hour during the period Sunday 1st Sept. - Sunday 7th Sept., 2014.
data("Velib")
data("Velib")
The format is: - data: the loading profiles (nb of available bikes / nb of bike docks) of the 1189 stations for 7 days every hour. - position: the longitude and latitude of the 1189 bike stations.
The real time data are available at https://developer.jcdecaux.com/ (with an api key).
The data were first used in C. Bouveyron, E. Come and J. Jacques, The discriminative functional mixture model for a comparative analysis of bike sharing systems, The Annals of Applied Statistics, vol. 9 (4), pp. 1726-1760, 2015 (http://dx.doi.org/10.1214/15-AOAS861).
data(Velib) set.seed(12345) # Co-clustering with funLBM out = funLBM(Velib$data,K=4,L=2,basis.name="fourier",nbasis=5) # Visualization of results plot(out,type='blocks') plot(out,type='proportions') plot(out,type='means')
data(Velib) set.seed(12345) # Co-clustering with funLBM out = funLBM(Velib$data,K=4,L=2,basis.name="fourier",nbasis=5) # Visualization of results plot(out,type='blocks') plot(out,type='proportions') plot(out,type='means')