Title: | Principal Tensor Analysis on k Modes |
---|---|
Description: | A multiway method to decompose a tensor (array) of any order, as a generalisation of SVD also supporting non-identity metrics and penalisations. 2-way SVD with these extensions is also available. The package includes also some other multiway methods: PCAn (Tucker-n) and PARAFAC/CANDECOMP with these extensions. |
Authors: | Didier G. Leibovici (2010) <doi:10.18637/jss.v034.i10> |
Maintainer: | Didier G. Leibovici <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.0.0 |
Built: | 2024-11-02 06:16:05 UTC |
Source: | CRAN |
Computes all the 2-modes solutions associated to the given Principal Tensor of the given tensor.
APSOLU3(X,solu,pt3=NULL,nbPT2=1, smoothing=FALSE,smoo=list(NA), verbose=getOption("verbose"),file=NULL, ...)
APSOLU3(X,solu,pt3=NULL,nbPT2=1, smoothing=FALSE,smoo=list(NA), verbose=getOption("verbose"),file=NULL, ...)
X |
a tensor (as an array) of order 3, if non-identity metrics are
used |
solu |
a |
pt3 |
a number identifying in |
nbPT2 |
integer, if 1 all solutions will be computed otherwise at maximum nbPT2 solutions |
smoothing |
see |
smoo |
see |
verbose |
control printing |
file |
output printed at the prompt if |
... |
any other arguments passed to SVDGen or other functions |
For each component of the identified Principal Tensor
given in solu
, an SVD of the contracted product of X
and the component is done.
This gives all the associated Principal Tensors which updates solu
supposed to contain
Principal Tensors of X
.
an updated PTAk
object
Usually (i.e. as in PTA3
and PTAk
) the principal
tensor used is the first Principal Tensor of X (and is the last updated in solu
). If
it is another Principal Tensor, the obtained associated solutions do not stricto sensu
refer to the SVD-kmodes decomposition (because the orthogonality is defined in the whole
tensor space not necessarily on each component space) but are still meaningful.
Didier G. Leibovici
Leibovici D and Sabatier R (1998) A Singular Value Decomposition of a k-ways array for a Principal Component Analysis of multi-way data, the PTA-k. Linear Algebra and its Applications, 269:307-329
Computes all the (k-1)-modes associated solutions to the given Principal Tensor of the given tensor. Calls recursively PTAk.
APSOLUk(X,solu,nbPT,nbPT2=1, smoothing=FALSE,smoo=list(NA), minpct=0.1,ptk=NULL, verbose=getOption("verbose"),file=NULL, modesnam=NULL, ...)
APSOLUk(X,solu,nbPT,nbPT2=1, smoothing=FALSE,smoo=list(NA), minpct=0.1,ptk=NULL, verbose=getOption("verbose"),file=NULL, modesnam=NULL, ...)
X |
a tensor (as an array) of order k, if non-identity metrics are
used |
solu |
a |
nbPT |
a number or a vector of dimension (k-2) |
nbPT2 |
integer, if 0 no 2-modes solutions will be computed, 1 means all, >1 otherwise |
smoothing |
see |
smoo |
see |
minpct |
numerical 0-100 to control of computation of future solutions at this level and below |
ptk |
a number identifying in solutions the Principal Tensor to use or the last (if |
verbose |
control printing |
file |
output printed at the prompt if |
modesnam |
character vector of the names of the modes, if |
... |
any other arguments passed to PTAk or other functions |
For each component of the identified Principal Tensor given in
solutions
, a PTA-(k-1)modes of the contracted product
of X and the component is done. This gives all the associated
Principal Tensors which updates solutions
supposed to contain
a Principal Tensors of X at the first place. For full description of
arguments see PTAk
.
an updated PTAk
object
Usually (i.e. as in PTA3
and PTAk
) the
principal tensor used is the first Principal Tensor of
X
(and is the last updated in solutions
). If
it is another Principal Tensor, the obtained associated
solutions do not stricto sensu refer to the
SVD-kmodes decomposition (because the orthogonality
is defined in the whole tensor space not necessarily on
each component space) but are still meaningful. This
function is usually called by PTAk
but can be used
on its own to carry on a PTAk
analysis if X
is the projected (of the original data) on the orthogonal
of all the kmodes Principal Tensor. In other words
the ptk
rank-one tensor in solutions
should
be the first best rank-one tensor approximating X
for this decomposition analysis to be called
PTA-kmodes.
Didier G. Leibovici
Leibovici D and Sabatier R (1998) A Singular Value Decomposition of a k-ways array for a Principal Component Analysis of multi-way data, the PTA-k. Linear Algebra and its Applications, 269:307-329.
Performs the identical models known as PARAFAC or CANDECOMP model.
CANDPARA(X,dim=3,test=1E-8,Maxiter=1000, smoothing=FALSE,smoo=list(NA), verbose=getOption("verbose"),file=NULL, modesnam=NULL,addedcomment="")
CANDPARA(X,dim=3,test=1E-8,Maxiter=1000, smoothing=FALSE,smoo=list(NA), verbose=getOption("verbose"),file=NULL, modesnam=NULL,addedcomment="")
X |
a tensor (as an array) of order k, if non-identity metrics are
used |
dim |
a number specifying the number of rank-one tensors |
test |
control of convergence |
Maxiter |
maximum number of iterations allowed for convergence |
smoothing |
see |
smoo |
see |
verbose |
control printing |
file |
output printed at the prompt if |
modesnam |
character vector of the names of the modes, if |
addedcomment |
character string printed after the title of the analysis |
Looking for the best rank-one tensor approximation (LS) the three methods described in the package are equivalent. If the number of tensors looked for is greater then one the methods differs: PTA-kmodes will look for best approximation according to the orthogonal rank (i.e. the rank-one tensors are orthogonal), PCA-kmodes will look for best approximation according to the space ranks (i.e. the ranks of all (simple) bilinear forms , that is the number of components in each space), PARAFAC/CANDECOMP will look for best approximation according to the rank (i.e. the rank-one tensors are not necessarily orthogonal). For sake of comparisons the PARAFAC/CANDECOMP method and the PCA-nmodes are also in the package but complete functionnality of the use these methods and more complete packages may be checked at the www site quoted below.
a CANDPARA
(inherits from PTAk
) object
The use of metrics (diagonal or not) and smoothing extends
flexibility of analysis. This program runs slow! A PARAFAC orthogonal
can be done with PTAk looking only for k-modes Principal Tensors
i.e. with the options nbPT=c(rep(0,k-2),dim), nbPT2=0
.
It is identical to look in any PTAk
decomposition only for the
kmodes solution but obviously with unecessary computations.
Didier G. Leibovici
Caroll J.D and Chang J.J (1970) Analysis of individual differences in multidimensional scaling via n-way generalization of 'Eckart-Young' decomposition. Psychometrika 35,283-319.
Harshman R.A (1970) Foundations of the PARAFAC procedure: models and conditions for 'an explanatory' multi-mode factor analysis. UCLA Working Papers in Phonetics, 16,1-84.
Kroonenberg P (1983) Three-mode Principal Component Analysis: Theory and Applications. DSWO press. Leiden.)
Leibovici D and Sabatier R (1998) A Singular Value Decomposition of a k-ways array for a Principal Component Analysis of multi-way data, the PTA-k. Linear Algebra and its Applications, 269:307-329.
Gives a robust estimate of an unknown within group covariance, aiming either to look for dense groups or to sparse groups (outliers) according to local variance and weighting function choice.
CauRuimet(Z,ker=1,m0=1,withingroup=TRUE, loc=substitute(apply(Z,2,mean,trim=.1)),matrixmethod=TRUE, Nrandom=3000)
CauRuimet(Z,ker=1,m0=1,withingroup=TRUE, loc=substitute(apply(Z,2,mean,trim=.1)),matrixmethod=TRUE, Nrandom=3000)
Z |
matrix |
ker |
either numerical or a function:
if numerical the weighting function is |
m0 |
is a graph of neighbourhood or another proximity matrix, the hadamard product of the proximities will be operated |
withingroup |
logical,if |
loc |
a vector of locations or a function using mean, median, to give an estimate of it |
matrixmethod |
if |
Nrandom |
if |
When withingroup is TRUE
, local(defined by the weighting) variance formula is returned, aiming
at finding dense groups:
where is the squared euclidian distance with
the inverse of a robust sample covariance (i.e. using
loc
instead of the mean) ;
if FALSE
robust Total weighted variance or if m0
not 1 Global weighted variance, is returned:
where is the vector
loc
.
If m0
is a graph of neighbourhood and ker is the function returning 1 (no proximity due to
distance is used) the function will return (when withingroup=TRUE
) the local
variance-covariance matrix as define in Lebart(1969).
a matrix
As mentioned by Caussinus and Ruiz a good strategy to reveal dense groups with generalised PCA
would be to reveal outliers first using the metric and remove them before using the
metric
. Based on theoretical considerations they recommand for the choice of
ker
, with the decreasing function : a lower bound of 1 if
withingroup
and something fairly small say in the interval [0.05;0.3] otherwise.
Didier G. Leibovici
Caussinus, H and Ruiz, A (1990) Interesting Projections of Multidimensional Data by Means of Generalized Principal Components Analysis. COMPSTAT90, Physica-Verlag, Heidelberg,121-126.
Faraj, A (1994) Interpretation tools for Generalized Discriminant Analysis.In: New Approches in Classification and Data Analysis, Springer-Verlag, 286-291, Heidelberg.
Lebart, L (1969) Analyse statistique de la contiguit<e9>e.Publication de l'Institut de Statistiques Universitaire de Paris, XVIII,81-112.
Leibovici D (2008) Spatio-temporal Multiway Decomposition using Principal Tensor Analysis on k-modes: the R package PTAk . to be submitted soon at Journal of Statisticcal Software.
data(iris) iris2 <- as.matrix(iris[,1:4]) dimnames(iris2)[[1]] <- as.character(iris[,5]) D2 <- CauRuimet(iris2,ker=1,withingroup=TRUE) D2 <- Powmat(D2,(-1)) iris2 <- sweep(iris2,2,apply(iris2,2,mean)) res <- SVDgen(iris2,D2=D2,D1=1) plot(res,nb1=1,nb2=2,cex=1,mod=1,Zcol=list(c(rep(1,50),rep(2,50),rep(3,50)))) summary(res,testvar=0) # the same in a demo function # source(paste(R.home(),"/library/PTAk/demo/CauRuimet.R",sep="")) # demo.CauRuimet(ker=4,withingroup=TRUE,openX11s=FALSE) # demo.Cauruimet(ker=0.15,withingroup=FALSE,openX11s=FALSE)
data(iris) iris2 <- as.matrix(iris[,1:4]) dimnames(iris2)[[1]] <- as.character(iris[,5]) D2 <- CauRuimet(iris2,ker=1,withingroup=TRUE) D2 <- Powmat(D2,(-1)) iris2 <- sweep(iris2,2,apply(iris2,2,mean)) res <- SVDgen(iris2,D2=D2,D1=1) plot(res,nb1=1,nb2=2,cex=1,mod=1,Zcol=list(c(rep(1,50),rep(2,50),rep(3,50)))) summary(res,testvar=0) # the same in a demo function # source(paste(R.home(),"/library/PTAk/demo/CauRuimet.R",sep="")) # demo.CauRuimet(ker=4,withingroup=TRUE,openX11s=FALSE) # demo.Cauruimet(ker=0.15,withingroup=FALSE,openX11s=FALSE)
Computes the contraction product of two tensors as a generalisation of matrix product.
CONTRACTION(X,z, Xwiz=NULL,zwiX=NULL,rezwiX=FALSE,usetensor=TRUE) CONTRACTION.list(X,zlist,moins=1,zwiX=NULL,usetensor=TRUE,withapply=FALSE)
CONTRACTION(X,z, Xwiz=NULL,zwiX=NULL,rezwiX=FALSE,usetensor=TRUE) CONTRACTION.list(X,zlist,moins=1,zwiX=NULL,usetensor=TRUE,withapply=FALSE)
X |
a tensor(as an array) of any order |
z |
another tensor (with at least one space in common) |
zlist |
a list of lists like a |
Xwiz |
|
zwiX |
idem as |
moins |
the elements in |
rezwiX |
logical if |
usetensor |
if |
withapply |
if |
Like two matrices contract according to the
appropriate dimensions (columns matching rows) when one
performs a matrix product, this operation does pretty much
the same thing for tensors(array) and specified contraction
dimensions given by Xwiz
and zwiX
which
should match. The function is actually written like:
transforms both tensors as matrices with the “matching
tensor product" of their contraction dimensions in columns
(for higher order tensor) and rows (the other one),
performs the matrix product and rebuild the result as a
tensor(array). Without using tensor
, if Xwiz
and/or zwiX
are not specified the functions tries to
match all z
dimensions onto the dimensions of X
where X is the higher order tensor (if it is not the case
in the arguments the function swaps them).
A tensor of dimension c(dim(X)[-Xwiz],dim(z)[-zwiX])
if X
has got a bigger order than z
.
This operation generalises the matrix product to the
contracted product of any two tensors(arrays), and
should theoretically perform the tensor product if no
matching (no contraction) but has not been implemented. I
recently put the option of using tensor
which does
exactly the same thing faster as well as it is from
C
. When using tensor
if Xwiz
or
zwiX
are NULL
they are replaced by the full
possibilities.
Didier G. Leibovici
Leibovici D and Sabatier R (1998) A Singular Value Decomposition of a k-ways array for a Principal Component Analysis of multi-way data, the PTA-k. Linear Algebra and its Applications, 269:307-329.
Schwartz L (1975) Les Tenseurs. Herman, Paris.
library(tensor) z <- array(1:12,c(2,3,2)) X <- array(1:48,c(3,4,2,2)) Xcz <- CONTRACTION(X,z,Xwiz=c(1,3,4),zwiX=c(2,3,1)) dim(Xcz) # 4 Xcz1 <- CONTRACTION(X,z,Xwiz=c(3,4),zwiX=c(1,3)) dim(Xcz1) # 3,4,3 Xcz2 <- CONTRACTION(X,z,Xwiz=c(3,4),zwiX=c(3,1)) Xcz1[,,1] Xcz2[,,1] ####### sval0 <- list(list(v=c(1,2,3,4)),list(v=rep(1,3)),list(v=c(1,3))) tew <- array(1:24,c(4,3,2)) CONTRACTION.list(tew,sval0,moins=1) #this is equivalent to the following which may be too expensive for big datasets CONTRACTION(tew,TENSELE(sval0,moins=1),Xwiz=c(2,3)) ## CONTRACTION.list(tew,sval0,moins=c(1,2)) #must be equal to CONTRACTION(tew,sval0[[3]]$v,Xwiz=3)
library(tensor) z <- array(1:12,c(2,3,2)) X <- array(1:48,c(3,4,2,2)) Xcz <- CONTRACTION(X,z,Xwiz=c(1,3,4),zwiX=c(2,3,1)) dim(Xcz) # 4 Xcz1 <- CONTRACTION(X,z,Xwiz=c(3,4),zwiX=c(1,3)) dim(Xcz1) # 3,4,3 Xcz2 <- CONTRACTION(X,z,Xwiz=c(3,4),zwiX=c(3,1)) Xcz1[,,1] Xcz2[,,1] ####### sval0 <- list(list(v=c(1,2,3,4)),list(v=rep(1,3)),list(v=c(1,3))) tew <- array(1:24,c(4,3,2)) CONTRACTION.list(tew,sval0,moins=1) #this is equivalent to the following which may be too expensive for big datasets CONTRACTION(tew,TENSELE(sval0,moins=1),Xwiz=c(2,3)) ## CONTRACTION.list(tew,sval0,moins=c(1,2)) #must be equal to CONTRACTION(tew,sval0[[3]]$v,Xwiz=3)
After a FCA2
, a SVDgen
, a FCAk
or a PTAk
computes the traditional guides for interpretations used in PCA and correspondence analysis: COS2 or the percentage of variability rebuilt by the component and CTR or the amount of contribution towards that component.
COS2(solu, mod=1, solnbs=2:4) CTR(solu, mod=1, solnbs=1:4, signed = TRUE, mil = TRUE)
COS2(solu, mod=1, solnbs=2:4) CTR(solu, mod=1, solnbs=1:4, signed = TRUE, mil = TRUE)
solu |
an object inheriting from class |
mod |
an integer representing the mode number entry, 1 is row, 2 columns, ... |
solnbs |
a vector of integers representing the tensor numbers in the listing summary |
signed |
logical to use signed-CTR from affect the sign of corresponding value in |
mil |
logical |
Classical measures helping to interpret the plots in PCA, FCA and in PTAk as well. The sum of the COS2 across all the components needed to rebuild fully the tensor analysed) would make 1000 and the sum pf the CTR across the entry mode would be 1000.
a matrix whose columns are the COS2 or CTR as per thousands (‰) for the mode considered
Didier G. Leibovici
Escoufier Y (1985) L'Analyse des correspondances : ses propriétés et ses extensions. ISI 45th session Amsterdam.
Leibovici D(1993) Facteurs à Mesures Répétées et Analyses Factorielles : applications à un suivi Epidémiologique. Université de Montpellier II. PhD Thesis in Mathématiques et Applications (Biostatistiques).
Leibovici DG (2010) Spatio-temporal Multiway Decomposition using Principal Tensor Analysis on k-modes:the R package PTAk. Journal of Statistical Software, 34(10), 1-34. doi:10.18637/jss.v034.i10
PTAk
, FCA2
, FCAk
, summary.FCAk
, plot.PTAk
data(crimerate) cri.FCA2 <- FCA2(crimerate) summary(cri.FCA2) plot(cri.FCA2, mod = c(1,2), nb1 = 2, nb2 = 3) # unscaled plot(cri.FCA2, mod = c(1,2), nb1 = 2, nb2 = 3, coefi = list(c(0.130787,0.130787),c(0.104359,0.104359)) ) # symmetric-map biplot CTR(cri.FCA2, mod = 1, solnbs = 2:4) CTR(cri.FCA2, mod = 2, solnbs = 2:4) COS2(cri.FCA2, mod = 2, solnbs = 2:4) ##### useful fonctions ##selecting and sorting out dimensions positive and negative sides "ctrcos2" <-function(Ta, mod=1, dim=2, NegPos=TRUE, select=c("avg",12,"none"),nbdig=2, cos2min=333){ dim=c(dim, dim+1) ctr=CTR(Ta,mod=mod,solnbs=dim);cos2=COS2(Ta,mod=mod,solnbs=dim) val=round(Ta[[mod]]$v[dim,][1,],digits=nbdig) oo=order(ctr[,1],decreasing=TRUE) if(NegPos)oo=order(val,decreasing=TRUE) out=cbind(ctr[oo,1],cos2[oo,1],val[oo]) colnames(out)=c("ctr","cos2",paste0("dim",dim[1])) if(select[1]=="none") sout=0 if(select[1]=="avg") sout= 1000/length(val) if(is.numeric(select[1])) sout=select[1] return(out[ out[,1]>=sout | out[,2]>=cos2min, ]) }#ctrcos2 ## plot ctr cos2 "plotctrcos2"<-function(sol,mod12=c(1,2),dim=2, ratio2avg=TRUE, col=c(1,2),pch=c("-","°"), posi=c(2,3),reposi=TRUE, cos2min=333,select="avg",...){ ### ctrcos2 ini Ta,mod=1,dim=2, NegPos=TRUE, select=c("avg",12,"none"), nbdig=2, cos2min=333 pre<-function(mod=1,soldim=2, ...){ diim=length(sol[[mod]]$v[1,]) ctrov= ctrcos2(sol,dim= soldim,mod=mod,...) x=ctrov[,1]*sign(ctrov[,3]) if(ratio2avg)x=round(x/(1000/diim),2) y=ctrov[,2] lab=rownames(ctrov) len=dim(ctrov)[1] return(list("x"=x,"y"=y,"len"=len,"lab"=lab)) } if(length(col)<length(mod12))col=rep(col,length(mod12)) if(length(pch)<length(mod12))pch=rep(pch,length(mod12)) x=NULL;y=NULL;coul=NULL;pchl=NULL;lab=NULL;poslab=NULL for(m in mod12){ prep=pre(mod=m,soldim=dim,...) x=c(x,prep$x);y=c(y,prep$y); coul=c(coul,rep(col[m],prep$len));pchl=c(pchl,rep(pch[m],prep$len)) repos=rep(posi[m],prep$len); if(reposi)repos=sample(1:4,prep$len, replace=TRUE) lab=c(lab,prep$lab);poslab=c(poslab,repos) } summsol=summary(sol) if(match("FCA2" ,class(sol),nomatch=0)>0) xlabe=paste0( "Global pct ", round(summsol[dim,4],2), " FCA pct",round(summsol[dim,5],2)) else xlabe=paste0( " local pxt",round(summsol[dim,4],2), " Global pct", round(summsol[dim,5],2)) dimi=paste0("dim",dim) if(ratio2avg)ctrlab="CTR (signed ctr /(uniform ctr))" else ctrlab="CTR (signed)" if(!is.null(cos2min))cos2lab=paste("COS2 (> ",cos2min,")")else cos2lab= "COS2" plot(x,y,xlab=ctrlab, main=paste(dimi, xlabe ),ylab= cos2lab,col=coul, pch=pchl,ylim=c(min(y),1050),xlim=c(min(x-0.5),max(x+0.5))) abline(v=0,col=4,lty=2) abline(v=1,col=3,lty=2) abline(v=-1,col=3,lty=2) text(x,y,lab,pos=poslab, col=coul) return(cbind(x,y,lab,coul,pchl,poslab)) }#plotctrcos2 ctrcos2(cri.FCA2,mod=1) ctrcos2(cri.FCA2,mod=2) plotctrcos2(cri.FCA2)
data(crimerate) cri.FCA2 <- FCA2(crimerate) summary(cri.FCA2) plot(cri.FCA2, mod = c(1,2), nb1 = 2, nb2 = 3) # unscaled plot(cri.FCA2, mod = c(1,2), nb1 = 2, nb2 = 3, coefi = list(c(0.130787,0.130787),c(0.104359,0.104359)) ) # symmetric-map biplot CTR(cri.FCA2, mod = 1, solnbs = 2:4) CTR(cri.FCA2, mod = 2, solnbs = 2:4) COS2(cri.FCA2, mod = 2, solnbs = 2:4) ##### useful fonctions ##selecting and sorting out dimensions positive and negative sides "ctrcos2" <-function(Ta, mod=1, dim=2, NegPos=TRUE, select=c("avg",12,"none"),nbdig=2, cos2min=333){ dim=c(dim, dim+1) ctr=CTR(Ta,mod=mod,solnbs=dim);cos2=COS2(Ta,mod=mod,solnbs=dim) val=round(Ta[[mod]]$v[dim,][1,],digits=nbdig) oo=order(ctr[,1],decreasing=TRUE) if(NegPos)oo=order(val,decreasing=TRUE) out=cbind(ctr[oo,1],cos2[oo,1],val[oo]) colnames(out)=c("ctr","cos2",paste0("dim",dim[1])) if(select[1]=="none") sout=0 if(select[1]=="avg") sout= 1000/length(val) if(is.numeric(select[1])) sout=select[1] return(out[ out[,1]>=sout | out[,2]>=cos2min, ]) }#ctrcos2 ## plot ctr cos2 "plotctrcos2"<-function(sol,mod12=c(1,2),dim=2, ratio2avg=TRUE, col=c(1,2),pch=c("-","°"), posi=c(2,3),reposi=TRUE, cos2min=333,select="avg",...){ ### ctrcos2 ini Ta,mod=1,dim=2, NegPos=TRUE, select=c("avg",12,"none"), nbdig=2, cos2min=333 pre<-function(mod=1,soldim=2, ...){ diim=length(sol[[mod]]$v[1,]) ctrov= ctrcos2(sol,dim= soldim,mod=mod,...) x=ctrov[,1]*sign(ctrov[,3]) if(ratio2avg)x=round(x/(1000/diim),2) y=ctrov[,2] lab=rownames(ctrov) len=dim(ctrov)[1] return(list("x"=x,"y"=y,"len"=len,"lab"=lab)) } if(length(col)<length(mod12))col=rep(col,length(mod12)) if(length(pch)<length(mod12))pch=rep(pch,length(mod12)) x=NULL;y=NULL;coul=NULL;pchl=NULL;lab=NULL;poslab=NULL for(m in mod12){ prep=pre(mod=m,soldim=dim,...) x=c(x,prep$x);y=c(y,prep$y); coul=c(coul,rep(col[m],prep$len));pchl=c(pchl,rep(pch[m],prep$len)) repos=rep(posi[m],prep$len); if(reposi)repos=sample(1:4,prep$len, replace=TRUE) lab=c(lab,prep$lab);poslab=c(poslab,repos) } summsol=summary(sol) if(match("FCA2" ,class(sol),nomatch=0)>0) xlabe=paste0( "Global pct ", round(summsol[dim,4],2), " FCA pct",round(summsol[dim,5],2)) else xlabe=paste0( " local pxt",round(summsol[dim,4],2), " Global pct", round(summsol[dim,5],2)) dimi=paste0("dim",dim) if(ratio2avg)ctrlab="CTR (signed ctr /(uniform ctr))" else ctrlab="CTR (signed)" if(!is.null(cos2min))cos2lab=paste("COS2 (> ",cos2min,")")else cos2lab= "COS2" plot(x,y,xlab=ctrlab, main=paste(dimi, xlabe ),ylab= cos2lab,col=coul, pch=pchl,ylim=c(min(y),1050),xlim=c(min(x-0.5),max(x+0.5))) abline(v=0,col=4,lty=2) abline(v=1,col=3,lty=2) abline(v=-1,col=3,lty=2) text(x,y,lab,pos=poslab, col=coul) return(cbind(x,y,lab,coul,pchl,poslab)) }#plotctrcos2 ctrcos2(cri.FCA2,mod=1) ctrcos2(cri.FCA2,mod=2) plotctrcos2(cri.FCA2)
The crimerate
dataset provides crime rates per 100,000 people in
seven categories for each of the fifty states (USA) in 1977. The timage12
dataset
is an image from fMRI analysis (one brain slice), it is a t-statistic image over 12 subjects of the activation (verbal) parameter.
The Zone_climTUN
is an object of class Map representing montly (12
) measurements in Tunisia
of 10
climatic indicators. The grid of 2599
cells was stored previously as a shapefile and read using read.shape
.
data(crimerate) data(timage12) data(Zone_climTUN)
data(crimerate) data(timage12) data(Zone_climTUN)
crimerate
is a matrix of 50 x 7
for the crimerate
data.
timage12
is a matrix 91 x 109
for timage12
data.
crimerate
comes from SAS. The timage12
comes from FMRIB center, University of Oxford.
The Zone_climTUN
comes from WorldCLIM database 2000 see references along with description of the indicators in Leibovici et al.(2007).
Leibovici D, Quillevere G, Desconnets JC (2007). A Method to Classify Ecoclimatic Arid and Semi-Arid Zones in Circum-Saharan Africa Using Monthly Dynamics of Multiple Indicators. IEEE Transactions on Geoscience and Remote Sensing, 45(12), 4000-4007.
Performs a particular SVDgen
data as a ratio Observed/Expected
under complete independence with metrics as margins of the
contingency table (in frequencies).
FCA2(X, nbdim =NULL, minpct = 0.01, smoothing = FALSE, smoo = rep(list(function(u) ksmooth(1:length(u), u, kernel = "normal", bandwidth = 3, x.points = (1:length(u)))$y), length(dim(X))), verbose = getOption("verbose"), file = NULL, modesnam = NULL, addedcomment = "", chi2 = FALSE, E = NULL, ...)
FCA2(X, nbdim =NULL, minpct = 0.01, smoothing = FALSE, smoo = rep(list(function(u) ksmooth(1:length(u), u, kernel = "normal", bandwidth = 3, x.points = (1:length(u)))$y), length(dim(X))), verbose = getOption("verbose"), file = NULL, modesnam = NULL, addedcomment = "", chi2 = FALSE, E = NULL, ...)
X |
a matrix table of positive values |
nbdim |
a number of dimension to retain, if |
minpct |
numerical 0-100 to control of computation of future solutions at this level and below |
smoothing |
see |
smoo |
see |
verbose |
control printing |
file |
output printed at the prompt if |
modesnam |
character vector of the names of the modes, if |
addedcomment |
character string printed if |
chi2 |
print the chi2 information when computing margins in |
E |
if not |
... |
any other arguments passed to SVDGen or other functions |
Gives the SVD-2modes decomposition of the of
the contingency table of full count
,
i.e. complete independence + lack of independence (including marginal
independences) as shown for example in Lancaster(1951)(see reference
in Leibovici(1993 or 2000)). Noting
, a
SVD
of the
-uple is done, that is :
where the metrics are diagonals of the corresponding margins. For
full description of arguments see PTAk
. If E
is not NULL
an FCAk-modes relatively to a model is
done (see Escoufier(1985) and therin reference
Escofier(1984) for a 2-way derivation), e.g. for a three way contingency table
the 4-tuple data and metrics is:
If E
was the complete independence (product of the margins)
then this would give an AFCk
but without looking at the
marginal dependencies (i.e. for a three way table no two-ways lack of
independence are looked for).
a FCA2
(inherits FCAk
and PTAk
) object
Didier G. Leibovici
Escoufier Y (1985) L'Analyse des correspondances : ses propriétés et ses extensions. ISI 45th session Amsterdam.
Leibovici D(1993) Facteurs à Mesures Répétées et Analyses Factorielles : applications à un suivi Epidémiologique. Université de Montpellier II. PhD Thesis in Mathématiques et Applications (Biostatistiques).
Leibovici D (2000) Multiway Multidimensional Analysis for Pharmaco-EEG Studies.http://www.fmrib.ox.ac.uk/analysis/techrep/tr00dl2/tr00dl2.pdf
Leibovici DG (2010) Spatio-temporal Multiway Decomposition using Principal Tensor Analysis on k-modes:the R package PTAk. Journal of Statistical Software, 34(10), 1-34. doi:10.18637/jss.v034.i10
Leibovici DG and Birkin MH (2013) Simple, multiple and multiway correspondence analysis applied to spatial census-based population microsimulation studies using R. NCRM Working Paper. NCRM-n^o 07/13, Id-3178 https://eprints.ncrm.ac.uk/id/eprint/3178
data(crimerate) cri.FCA2 <- FCA2(crimerate) summary(cri.FCA2) plot(cri.FCA2, mod = c(1,2), nb1 = 2, nb2 = 3) # unscaled plot(cri.FCA2, mod = c(1,2), nb1 = 2, nb2 = 3, coefi = list(c(0.130787,0.130787),c(0.104359,0.104359)) )# symmetric-map biplot CTR(cri.FCA2, mod = 1, solnbs = 2:4) CTR(cri.FCA2, mod = 2, solnbs = 2:4) COS2(cri.FCA2, mod = 2, solnbs = 2:4)
data(crimerate) cri.FCA2 <- FCA2(crimerate) summary(cri.FCA2) plot(cri.FCA2, mod = c(1,2), nb1 = 2, nb2 = 3) # unscaled plot(cri.FCA2, mod = c(1,2), nb1 = 2, nb2 = 3, coefi = list(c(0.130787,0.130787),c(0.104359,0.104359)) )# symmetric-map biplot CTR(cri.FCA2, mod = 1, solnbs = 2:4) CTR(cri.FCA2, mod = 2, solnbs = 2:4) COS2(cri.FCA2, mod = 2, solnbs = 2:4)
Performs a particular PTAk
data as a ratio Observed/Expected
under complete independence with metrics as margins of the multiple
contingency table (in frequencies).
FCAk(X,nbPT=3,nbPT2=1,minpct=0.01, smoothing=FALSE,smoo=rep(list( function(u)ksmooth(1:length(u),u,kernel="normal", bandwidth=3,x.points=(1:length(u)))$y),length(dim(X))), verbose=getOption("verbose"),file=NULL, modesnam=NULL,addedcomment="",chi2=TRUE,E=NULL, ...)
FCAk(X,nbPT=3,nbPT2=1,minpct=0.01, smoothing=FALSE,smoo=rep(list( function(u)ksmooth(1:length(u),u,kernel="normal", bandwidth=3,x.points=(1:length(u)))$y),length(dim(X))), verbose=getOption("verbose"),file=NULL, modesnam=NULL,addedcomment="",chi2=TRUE,E=NULL, ...)
X |
a multiple contingency table (array) of order k |
nbPT |
a number or a vector of dimension (k-2) |
nbPT2 |
if 0 no 2-modes solutions will be computed, 1 =all, >1 otherwise |
minpct |
numerical 0-100 to control of computation of future solutions at this level and below |
smoothing |
see |
smoo |
see |
verbose |
control printing |
file |
output printed at the prompt if |
modesnam |
character vector of the names of the modes, if |
addedcomment |
character string printed if |
chi2 |
print the chi2 information when computing margins in |
E |
if not |
... |
any other arguments passed to SVDGen or other functions |
Gives the SVD-kmodes decomposition of the of
the multiple contingency table of full count
,
i.e. complete independence + lack of independence (including marginal
independences) as shown for example in Lancaster(1951)(see reference
in Leibovici(2000)). Noting
, a
PTAk
of the
-uple is done, e.g. for a three way contingency table
the 4-uple data and metrics is:
where the metrics are diagonals of the corresponding margins. For
full description of arguments see PTAk
. If E
is not NULL
an FCAk-modes relatively to a model is
done (see Escoufier(1985) and therin reference
Escofier(1984) for a 2-way derivation), e.g. for a three way contingency table
the 4-tuple data and metrics is:
If E
was the complete independence (product of the margins)
then this would give an AFCk
but without looking at the
marginal dependencies (i.e. for a three way table no two-ways lack of
independence are looked for).
a FCAk
(inherits PTAk
) object
Didier G. Leibovici
Escoufier Y (1985) L'Analyse des correspondances : ses propri<e9>t<e9>s et ses extensions. ISI 45th session Amsterdam.
Leibovici D(1993) Facteurs <e0> Mesures R<e9>p<e9>t<e9>es et Analyses Factorielles : applications <e0> un suivi <e9>pid<e9>miologique. Universit<e9> de Montpellier II. PhD Thesis in Math<e9>matiques et Applications (Biostatistiques).
Leibovici D (2000) Multiway Multidimensional Analysis for Pharmaco-EEG Studies.http://www.fmrib.ox.ac.uk/analysis/techrep/tr00dl2/tr00dl2.pdf
Leibovici DG (2010) Spatio-temporal Multiway Decomposition using Principal Tensor Analysis on k-modes:the R package PTAk. Journal of Statistical Software, 34(10), 1-34. doi:10.18637/jss.v034.i10
Leibovici DG and Birkin MH (2013) Simple, multiple and multiway correspondence analysis applied to spatial census-based population microsimulation studies using R. NCRM Working Paper. NCRM-n^o 07/13, Id-3178 https://eprints.ncrm.ac.uk/id/eprint/3178
# try the demo # demo.FCAk()
# try the demo # demo.FCAk()
Computes the ratio Observed/Expected under complete independence
with margins of the multiple contingency table (in frequencies) and
gives chi2
statistic of lack of complete independence.
FCAmet(X,chi2=FALSE,E=NULL,No0margins=TRUE)
FCAmet(X,chi2=FALSE,E=NULL,No0margins=TRUE)
X |
a multiple contingency table (array) of order k |
chi2 |
if |
E |
if not |
No0margins |
if TRUE, prevents zero margins in replacing cells involved by the min of the non-zero margins /nb of zero cells |
a list with
data |
an array |
met |
a list wherein each entry is the vector of the corresponding margins i.e.
|
count |
is the total sum |
The statistics and metrics do not depend on E
. The statistic given measure only the lack of independence.
Didier G. Leibovici
A mini guide to handle PTAk model decomposition
howtoPTAk()
howtoPTAk()
The PTAk decomposition aims at building an approximation of a given multiway data, represented as a tensor, based on a variance criterion. This approximation is given by a set of rank one tensors, orthogonal to each other, in a nested algorithm process and so controlling the level of approximation by the amount of variability extracted and represented by the sum of squares of the singular values (associated to the rank one tensors). In that respect it offers a way of generalising PCA to tensors of order greater than 2.
The reference in JSS provides details about preparing a dataset and running a general PTAk and particularities for spatio-temporal data. Some aspects on FCAk can also be found in the NCRM publication.
The license is GPL-3, support can be provided via http://c3s2i.free.fr, donations via Paypal to [email protected] are welcome.
Didier G. Leibovici
Leibovici D and Sabatier R (1998) A Singular Value Decomposition of a k-ways array for a Principal Component Analysis of multi-way data, the PTA-k. Linear Algebra and its Applications, 269:307-329
Leibovici DG (2010) Spatio-temporal Multiway Decomposition using Principal Tensor Analysis on k-modes:the R package PTAk. Journal of Statistical Software, 34(10), 1-34. doi:10.18637/jss.v034.i10
Leibovici DG and Birkin MH (2013) Simple, multiple and multiway correspondence analysis applied to spatial census-based population microsimulation studies using R. NCRM Working Paper. NCRM-n^o 07/13, Id-3178 https://eprints.ncrm.ac.uk/id/eprint/3178
Gives the first Tucker1 components of a given tensor.
INITIA(X,modesnam=NULL,method="svds",dim=1,...)
INITIA(X,modesnam=NULL,method="svds",dim=1,...)
X |
a tensor (as an array) of order k |
modesnam |
a character vector of the names of the modes |
method |
uses either the inbuilt SVD |
dim |
default 1 in each space otherwise specify the number of dimensions
e.g. |
... |
extra arguments of the method |
Computes the first (or dim
) right singular vector (or other
summaries) for every representation of the tensor as a matrix with
dim(X)[i]
columns, i=1...k
.
a list (of length k) of lists with arguments:
v |
the singular vectors in rows |
modesnam |
a character object naming the mode, |
n |
labels of mode |
d |
the corresponding first singular values |
The collection these eigenvectors, is known as the Tucker1 solution
or initialisation related to PCA-3modes or PCA-nmodes models.
If a function is given it may include dim
as argument.
Didier G. Leibovici
Kroonenberg P.M (1983) Three-mode Principal Component Analysis: Theory and Applications. DSWO Press, Leiden.
Leibovici D and Sabatier R (1998) A Singular Value Decomposition of a k-ways array for a Principal Component Analysis of multi-way data, the PTA-k. Linear Algebra and its Applications, 269:307-329.
Performs the Tuckern model using a space version of RPVSCC (SINGVA
).
PCAn(X,dim=c(2,2,2,3),test=1E-12,Maxiter=400, smoothing=FALSE,smoo=list(NA), verbose=getOption("verbose"),file=NULL, modesnam=NULL,addedcomment="")
PCAn(X,dim=c(2,2,2,3),test=1E-12,Maxiter=400, smoothing=FALSE,smoo=list(NA), verbose=getOption("verbose"),file=NULL, modesnam=NULL,addedcomment="")
X |
a tensor (as an array) of order k, if non-identity metrics are
used |
dim |
a vector of numbers specifying the dimensions in each space |
test |
control of convergence |
Maxiter |
maximum number of iterations allowed for convergence |
smoothing |
see |
smoo |
see |
verbose |
control printing |
file |
output printed at the prompt if |
modesnam |
character vector of the names of the modes, if |
addedcomment |
character string printed after the title of the analysis |
Looking for the best rank-one tensor approximation (LS) the three
methods described in the package are equivalent. If the number of
tensors looked for is greater then one the methods differs:
PTA-kmodes will "look" for "best" approximation according to the
orthogonal rank (i.e. the rank-one tensors are
orthogonal), PCA-kmodes will look for best approximation
according to the space ranks (i.e. the rank of every
bilinear form, that is the number of components in each space),
PARAFAC/CANDECOMP will look for best approximation according to the
rank (i.e. the rank-one tensors are not necessarily
orthogonal). For the sake of comparisons the PARAFAC/CANDECOMP method
and the PCA-nmodes are also in the package but complete
functionnality of the use these methods and more complete packages
may be fetched at the www site quoted below.
Recent work from Tamara G Kolda showed on an example that orthogonal rank
decompositions are not necesseraly nested. This makes PTA-kmodes a model with
nested decompositions not giving the exact orthogonal rank.
So PTA-kmodes will look for best approximation according to orthogonal tensors in a nested approximmation process.
a PCAn
(inherits PTAk
) object
The use of metrics (diagonal or not) and smoothing extend flexibility of analysis.
Didier G. Leibovici
Caroll J.D and Chang J.J (1970) Analysis of individual differences in multidimensional scaling via n-way generalization of "Eckart-Young" decomposition. Psychometrika 35,283-319.
Harshman R.A (1970) Foundations of the PARAFAC procedure: models and conditions for "an explanatory" multi-mode factor analysis. UCLA Working Papers in Phonetics, 16,1-84.
Kroonenberg P (1983) Three-mode Principal Component Analysis: Theory and Applications. DSWO press. Leiden. (There was a maintained (by Pieter) library of contributions to multiway analysis ...))
Leibovici D and Sabatier R (1998) A Singular Value Decomposition of a k-ways array for a Principal Component Analysis of multi-way data, the PTA-k. Linear Algebra and its Applications, 269:307-329.
Kolda T.G (2003) A Counterexample to the Possibility of an Extension of the Eckart-Young Low-Rank Approximation Theorem for the Orthogonal Rank Tensor Decomposition. SIAM J. Matrix Analysis, 24(2):763-767, Jan. 2003.
Screeplot of singular values or superposed plot of modes for one or two components (1 dimensional scatterplot with spread labels or scatterplot on two dimensions).
## S3 method for class 'PTAk' plot(x, labels = TRUE, mod = 1, nb1 = 1, nb2 = NULL, coefi = list(NULL, NULL), xylab = TRUE, ppch = (1:length(solution)), lengthlabels = 2, scree = FALSE, ordered = TRUE, nbvs = 40, RiskJack = NULL, method = "",ZoomInOut=NULL, Zlabels=NULL, Zcol=NULL, poslab=c(2,1,3,3), signedCTR = FALSE, relCTR = TRUE,...) RiskJackplot(x, nbvs = 1:20, mod = NULL, max = NULL, rescaled=TRUE, ...)
## S3 method for class 'PTAk' plot(x, labels = TRUE, mod = 1, nb1 = 1, nb2 = NULL, coefi = list(NULL, NULL), xylab = TRUE, ppch = (1:length(solution)), lengthlabels = 2, scree = FALSE, ordered = TRUE, nbvs = 40, RiskJack = NULL, method = "",ZoomInOut=NULL, Zlabels=NULL, Zcol=NULL, poslab=c(2,1,3,3), signedCTR = FALSE, relCTR = TRUE,...) RiskJackplot(x, nbvs = 1:20, mod = NULL, max = NULL, rescaled=TRUE, ...)
x |
an object inheriting from class |
labels |
logical if |
mod |
vectors of the modes numbers to be plotted |
nb1 |
number identifying the Principal Tensor to display on the vertical
axe, can be checked using |
nb2 |
as nb1 to be displayed on the horizontal axe, if |
coefi |
coefficients to multiply components for all modes (not just the one in |
xylab |
logical to display axes labels |
ppch |
a vector of length at least |
lengthlabels |
a number or a vector of numbers of characters in labels to be used for display |
scree |
logical to display a screeplot of squared singular values as percent of total variation |
ordered |
logical used when displaying the screeplot with sorted
values (TRUE) or the order is given by output listing from
|
nbvs |
a maximum number of singular values to display on the screeplot or a vector of ranks |
max |
is the number of singular values to be considered as giving the perfect fit, NULL is the max possible in x |
rescaled |
boolean to rescale the y axis to 0-100 |
RiskJack |
if not |
method |
default is |
ZoomInOut |
list used as [[1]] for xlim and [[2]] for ylim in xy-plots instead of max and min range |
Zlabels |
used as labels instead of |
Zcol |
list of vectors of colours for Zlabels |
poslab |
integer or vector for 'pos' parameter, position of labels |
signedCTR |
logical to plot signed-CTR instead of coordinates, see |
relCTR |
logical if |
... |
plot arguments can be passed (except |
Plot components of one or two Principal Tensors, modes are superposed if more
than one is asked, or gives a screeplot. As it is using plot.default
at
some point some added features can be used in the ... part, especially
xlab=
may be useful when nb2=NULL
. Plots are superposed as they
correspond to the same Principal Tensor and so this gives insight to
interpretation of it, but careful is recommended as only overall
interpretation, once the Principal Tensor has been rebuilt mentally
(i.e. product of signs ...) to work out oppositions or associations. The
risk plot on top of a screeplot is an approximation of the Jacknife estimate of
the MSE in the choice of number of dimensions (see Besse et al.(1997)).
This function is used all for FCAk
, and
CANDPARA
, PCAn
objjects notheless for this
last object other interesting plots known as jointplots have not been implemented.
Didier G. Leibovici
Besse, P Cardot, H and Ferraty, F (1997) Simultaneous non-parametric regressions of unbalanced longitudinal data. Computational Statistics and Data Analysis, 24:255-270.
Leibovici D (2000) Multiway Multidimensional Analysis for Pharmaco-EEG Studies. https://www.researchgate.net/publication/216807619_Multiway_Multidimensional_Analysis_for_Pharmaco-EEG_Studies
# see the demo function source(paste(R.home(),"/ library/PTAk/demo/PTA3.R",sep="")); # or source(paste(R.home(),"/ library/PTAk/demo/PTAk.R",sep="")); # demo.PTA3()
# see the demo function source(paste(R.home(),"/ library/PTAk/demo/PTA3.R",sep="")); # or source(paste(R.home(),"/ library/PTAk/demo/PTAk.R",sep="")); # demo.PTA3()
Choices of centering or detrending and scaling are important preprocessings for multiway analysis.
Multcent(dat,bi=c(1,2),by=3, centre=mean, centrebyBA=c(TRUE,FALSE),scalebyBA=c(TRUE,FALSE)) IterMV(n=10,dat,Mm=c(1,3),Vm=c(2,3), fFUN=mean,usetren=FALSE, tren=function(x)smooth.spline(as.vector(x),df=5)$y, rsd=TRUE) Detren(dat,Mm=c(1,3),rsd=TRUE, tren=function(x)smooth.spline(as.vector(x),df=5)$y ) Susan1D(y,x=NULL,sigmak=NULL,sigmat=NULL, ker=list(function(u)return(exp(-0.5*u**2))))
Multcent(dat,bi=c(1,2),by=3, centre=mean, centrebyBA=c(TRUE,FALSE),scalebyBA=c(TRUE,FALSE)) IterMV(n=10,dat,Mm=c(1,3),Vm=c(2,3), fFUN=mean,usetren=FALSE, tren=function(x)smooth.spline(as.vector(x),df=5)$y, rsd=TRUE) Detren(dat,Mm=c(1,3),rsd=TRUE, tren=function(x)smooth.spline(as.vector(x),df=5)$y ) Susan1D(y,x=NULL,sigmak=NULL,sigmat=NULL, ker=list(function(u)return(exp(-0.5*u**2))))
dat |
array |
bi |
vector defining the "centering, bicentering or multi-centering" one wants
to operate crossed with |
by |
number or vector defining the entries used "with" in the other operations |
centre |
function used as |
centrebyBA |
a bolean vector for "centering" with |
scalebyBA |
idem as centrebyBA, for scaling operation |
n |
number of iterations between "centering" and scaling |
Mm |
margins to performs |
Vm |
margins to scale |
fFUN |
function to use as |
usetren |
logical, to use |
tren |
function to use in |
rsd |
logical passed into |
y |
vector (length |
x |
vector of same length, if |
sigmak |
parameter related to kernel bandwidth with |
sigmat |
parameter related to kernel bandwidth with |
ker |
a list of two kernels |
Multcent
performs in order "centering" by by
;
"multicentering" for every bi
with by
; then scale
(standard deviation) to one by by
.
IterMV
performs an iterative "detrending" and scaling
according to te margins defined (see Leibovici(2000) and references
in it).
Detren
detrends (or smooths if rsd
is FALSE
)
the data accoding to th margins given.
Susan1D
performs a non-linear kernel smoothing of y
against x
(both reordered in the function according to orders
of x
) with an usual kernel (t
) as for kernel
regression and a kernel (t
) for the values of y
(the
product of the kernels constitutes the non-linear weightings. This
function is adapted from SUSAN algorithm (see references).
Didier G. Leibovici
Smith S.M. and J.M. Brady (1997) SUSAN - a new approach to low level image processing. International Journal of Computer Vision, 23(1):45-78, May 1997.
Orthogonal-tensoriel projection of a tensor according to a rank-1 tensor, or a to bigger structure defined by kronecker product of matrices.
PROJOT(X,solu,numo=1,bortho=TRUE,Ortho=TRUE,metrics=NULL)
PROJOT(X,solu,numo=1,bortho=TRUE,Ortho=TRUE,metrics=NULL)
X |
a tensor(as an array) of any order |
solu |
an object like a |
numo |
a vector of numbers or a list of vectors (length the order of the tensor) identifying for each space the structure to project onto, if NULL for a specific space then no projection is done for this space |
bortho |
list of logicals saying if the structures are othogonal or not. |
Ortho |
list of logicals telling the projectors on each space to be on the structure or on its orthogonal. |
metrics |
NULL or list of metrics (either diagonal or not) for each entry of |
This function computes the tensorial orthogonal projection of
X
onto the tensorial structure defined by solu
and numo
. For each space (involved in the tensorial product
where from X
belongs), one defined the projector onto
solu[[i]]$v[numo,]
(or on its orthogonal if
Ortho[[i]]==TRUE
), then the result is the image of X
by
the tensorial product of the projectors, i.e.
.
A tensor with dimensions as X
For PTA-kmodes the projection used is only on rank-one tensors
(Principal Tensors), i.e. numo
is a number. The code
here can be used for any structure (on each spaces) and constitutes
the projector onto a tensorial structure, and can define the
PTAIV-kmodes (PTAk on Instrumental Variables Leibovici(1993).
(see other references for tensorial product of linear operators in
Leibovici(2000) e.g. Dauxois et al.(1994))
Didier G. Leibovici [email protected]
Leibovici D(1993) Facteurs <e0> Mesures R<e9>p<e9>t<e9>es et Analyses Factorielles : applications <e0> un suivi <e9>pid<e9>miologique. Universit<e9> de Montpellier II. PhD Thesis in Math<e9>matiques et Applications (Biostatistiques).
Leibovici D (2000) Multiway Multidimensional Analysis for Pharmaco-EEG Studies. http://www.fmrib.ox.ac.uk/analysis/techrep/tr00dl2/tr00dl2.pdf
don <- array(1:360,c(5,4,6,3)) don <- don + rnorm(360,10,2) ones <- list(list(v=rep(1,5)),list(v=rep(1,4)),list(v=rep(1,6)),list(v=rep(1,3))) donfc <- PROJOT(don,ones) apply(donfc,c(2,3,4),mean) apply(donfc,c(1),mean) # implementation de PTAIVk with obvious settings PTAIVk <- function(X,STruct,...) {X <- PROJOT(X$data,STruct,numo=Struct[[1]]$numo,Ortho=Struct[[1]]$Ortho,metrics=X$met) PTAk(X,...) }
don <- array(1:360,c(5,4,6,3)) don <- don + rnorm(360,10,2) ones <- list(list(v=rep(1,5)),list(v=rep(1,4)),list(v=rep(1,6)),list(v=rep(1,3))) donfc <- PROJOT(don,ones) apply(donfc,c(2,3,4),mean) apply(donfc,c(1),mean) # implementation de PTAIVk with obvious settings PTAIVk <- function(X,STruct,...) {X <- PROJOT(X$data,STruct,numo=Struct[[1]]$numo,Ortho=Struct[[1]]$Ortho,metrics=X$met) PTAk(X,...) }
Performs a truncated SVD-3modes analysis with or without specific metrics, penalised or not.
PTA3(X,nbPT=2,nbPT2=1, smoothing=FALSE, smoo=list(function(u)ksmooth(1:length(u),u,kernel="normal", bandwidth=4,x.points=(1:length(u)))$y, function(u)smooth.spline(u,df=3)$y, NA), minpct=0.1,verbose=getOption("verbose"),file=NULL, modesnam=NULL,addedcomment="", ...)
PTA3(X,nbPT=2,nbPT2=1, smoothing=FALSE, smoo=list(function(u)ksmooth(1:length(u),u,kernel="normal", bandwidth=4,x.points=(1:length(u)))$y, function(u)smooth.spline(u,df=3)$y, NA), minpct=0.1,verbose=getOption("verbose"),file=NULL, modesnam=NULL,addedcomment="", ...)
X |
a tensor (as an array) of order 3, if non-identity metrics are
used |
nbPT |
a number specifying the number of 3modes Principal Tensors requested |
nbPT2 |
if 0 no 2-modes solutions will be computed, 1 =all, >1 otherwise |
smoothing |
logical to consider smoothing or not |
smoo |
a list of length 3 with lists of functions operating on vectors component for the appropriate dimension (see details) |
minpct |
numerical 0-100 to control of computation of future solutions at this level and below |
verbose |
control printing |
file |
output printed at the prompt if |
modesnam |
character vector of the names of the modes, if |
addedcomment |
character string printed after the title of the analysis |
... |
any other arguments passed to SVDGen or other functions |
According to the decomposition described in Leibovici(1993) and
Leibovici and Sabatier(1998) the function gives a generalisation of
the SVD (2 modes) to 3 modes. It is the same algorithm used
for PTAk
but simpler as the recursivity implied by the
k modes analysis is reduced only to one level i.e for
every 3-modes Principal Tensors, 3 SVD are performed for
every contracted product with one the three components of the
3-modes Principal Tensors (see APSOLU3
,
PTAk
).
Recent work from Tamara G Kolda showed on an example that orthogonal rank
decompositions are not necesseraly nested. This makes PTA-3modes a model with
nested decompositions not giving the exact orthogonal rank.
So PTA-3modes will look for best approximation according to orthogonal tensors in a nested approximmation process. PTA3 decompositions is "a" generalisation of SVD but not the ...
With the smoothing
option smoo
contain a list of (lists) of functions to
apply on vectors of component (within the algorithm, see
SVDgen
). For a given dimension (1,2,or 3) a list of
functions is given. If this list consists only of one function (no list
needed) this
function will be used at any level all the time : if one want to smooth
only for the first Principal Tensor, put list(function,NA)
. Now
you start to understand this list will have a maximum length of
nbPT
and the corresponding function will be used for the
corresponding 3mode Principal Tensor. To smooth differently the
associated solutions one have to put another level of nested lists
otherwise the function given at the 3mode level will be used for
all. These rules are te same for PTAk
.
a PTAk
object
The use of metrics (diagonal or not) allows flexibility of analysis like in 2 modes
e.g. correspondence analysis, discriminant analysis, robust analysis. Smoothing option
extends the analysis towards functional data analysis, and or outliers "protection" is
theoretically valid for tensors belonging to a tensor product of separable Hilbert spaces
(e.g. Sobolev spaces) (see references in PTAk
, SVDgen
).
Didier G. Leibovici
Leibovici D(1993) Facteurs <e0> Mesures R<e9>p<e9>t<e9>es et Analyses Factorielles : applications <e0> un suivi <e9>pid<e9>miologique. Universit<e9> de Montpellier II. PhD Thesis in Math<e9>matiques et Applications (Biostatistiques).
Leibovici D and Sabatier R (1998) A Singular Value Decomposition of a k-ways array for a Principal Component Analysis of multi-way data, the PTA-k. Linear Algebra and its Applications, 269:307-329.
Kolda T.G (2003) A Counterexample to the Possibility of an Extension of the Eckart-Young Low-Rank Approximation Theorem for the Orthogonal Rank Tensor Decomposition. SIAM J. Matrix Analysis, 24(2):763-767, Jan. 2003.
Leibovici DG (2010) Spatio-temporal Multiway Decomposition using Principal Tensor Analysis on k-modes:the R package PTAk. Journal of Statistical Software, 34(10), 1-34. doi:10.18637/jss.v034.i10
SVDgen
, FCAk
, PTAk
, summary.PTAk
# example using Zone_climTUN dataset # # library(maptools) # library(RColorBrewer) # Yl=brewer.pal(11,"PuOr") # data(Zone_climTUN) ## in fact a modified version of plot.Map was used # plot(Zone_climTUN,ol=NA,auxvar=Zone_climTUN$att.data$PREC_OCTO) ##indicators 84 +3 to repeat # Zone_clim<-Zone_climTUN$att.data[,c(2:13,15:26,28:39,42:53,57:80,83:95,55:56)] # Zot <-Zone_clim[,85:87] ;temp <-colnames(Zot) # Zot <- as.matrix(Zot)%x%t(as.matrix(rep(1,12))) # colnames(Zot) <-c(paste(rep(temp [1],12),1:12),paste(rep(temp [2],12),1:12), # paste(rep(temp [3],12),1:12)) # Zone_clim <-cbind(Zone_clim[,1:84],Zot) # Zone3w <- array(as.vector(as.matrix(Zone_clim)),c(2599,12,10)) ## preprocessing #Zone3w<-Multcent(dat=Zone3w,bi=NULL,by=3,centre=mean, # centrebyBA=c(TRUE,FALSE),scalebyBA=c(TRUE,FALSE)) # Zone3w.PTA3<-PTA3(Zone3w,nbPT=3,nbPT2=3) ## summary and plot # summary(Zone3w.PTA3) #plot(Zone3w.PTA3,mod=c(2,3),nb1=1,nb2=11,lengthlabels=5,coefi=list(c(1,1,1),c(1,-1,-1))) #plot(Zone_climTUN,ol=NA,auxvar=Zone3w.PTA3[[1]]$v[1,],nclass=30) #plot(Zone_climTUN,ol=NA,auxvar=Zone3w.PTA3[[1]]$v[11,],nclass=30) ############## cat(" A little fun using iris3 and matching randomly 15 for each iris sample!","\n") cat(" then performing a PTA-3modes. If many draws are done, plots") cat(" show the stability of the first and third Principal Tensors.","\n") cat("iris3 is centered and reduced beforehand for each original variables.","\n") # demo function # source(paste(R.home(),"/library/PTAk/demo/PTA3.R",sep="")) # demo.PTA3(bootn=10,show=5,openX11s=FALSE)
# example using Zone_climTUN dataset # # library(maptools) # library(RColorBrewer) # Yl=brewer.pal(11,"PuOr") # data(Zone_climTUN) ## in fact a modified version of plot.Map was used # plot(Zone_climTUN,ol=NA,auxvar=Zone_climTUN$att.data$PREC_OCTO) ##indicators 84 +3 to repeat # Zone_clim<-Zone_climTUN$att.data[,c(2:13,15:26,28:39,42:53,57:80,83:95,55:56)] # Zot <-Zone_clim[,85:87] ;temp <-colnames(Zot) # Zot <- as.matrix(Zot)%x%t(as.matrix(rep(1,12))) # colnames(Zot) <-c(paste(rep(temp [1],12),1:12),paste(rep(temp [2],12),1:12), # paste(rep(temp [3],12),1:12)) # Zone_clim <-cbind(Zone_clim[,1:84],Zot) # Zone3w <- array(as.vector(as.matrix(Zone_clim)),c(2599,12,10)) ## preprocessing #Zone3w<-Multcent(dat=Zone3w,bi=NULL,by=3,centre=mean, # centrebyBA=c(TRUE,FALSE),scalebyBA=c(TRUE,FALSE)) # Zone3w.PTA3<-PTA3(Zone3w,nbPT=3,nbPT2=3) ## summary and plot # summary(Zone3w.PTA3) #plot(Zone3w.PTA3,mod=c(2,3),nb1=1,nb2=11,lengthlabels=5,coefi=list(c(1,1,1),c(1,-1,-1))) #plot(Zone_climTUN,ol=NA,auxvar=Zone3w.PTA3[[1]]$v[1,],nclass=30) #plot(Zone_climTUN,ol=NA,auxvar=Zone3w.PTA3[[1]]$v[11,],nclass=30) ############## cat(" A little fun using iris3 and matching randomly 15 for each iris sample!","\n") cat(" then performing a PTA-3modes. If many draws are done, plots") cat(" show the stability of the first and third Principal Tensors.","\n") cat("iris3 is centered and reduced beforehand for each original variables.","\n") # demo function # source(paste(R.home(),"/library/PTAk/demo/PTA3.R",sep="")) # demo.PTA3(bootn=10,show=5,openX11s=FALSE)
Performs a truncated SVD-kmodes analysis with or without specific metrics, penalised or not.
PTAk(X,nbPT=2,nbPT2=1,minpct=0.1, smoothing=FALSE, smoo=list(NA), verbose=getOption("verbose"),file=NULL, modesnam=NULL,addedcomment="", ...)
PTAk(X,nbPT=2,nbPT2=1,minpct=0.1, smoothing=FALSE, smoo=list(NA), verbose=getOption("verbose"),file=NULL, modesnam=NULL,addedcomment="", ...)
X |
a tensor (as an array) of order k, if non-identity metrics are
used |
nbPT |
integer vector of length (k-2) specifying the maximum number of Principal Tensors requested for the (3,...,k-1, k) modes levels (see details), if it is not a vector every levels would have the same given nbPT value |
nbPT2 |
if 0 no 2-modes solutions will be computed, 1 =all, >1 otherwise |
minpct |
numerical 0-100 to control of computation of future solutions at this level and below |
smoothing |
|
smoo |
see |
verbose |
control printing |
file |
output printed at the prompt if |
modesnam |
character vector of the names of the modes, if |
addedcomment |
character string printed if |
... |
any other arguments passed to other functions |
According to the decomposition described in Leibovici(1993) and
Leibovici and Sabatier(1998) the function gives a generalisation of
the SVD (2 modes) to k modes. The algorithm is recursive,
calling APSOLUk
which calls PTAk
for (k-1).
nbPT
, nbPT2
and minpct
control the number of
Principal Tensors desired. For example nbPT=c(2,4,3)
means a
tensor of order 5 is analysed, the maximum number of 5-modes
PT is set to 3, for each of them one sets a maximum of
4 associated 4-modes (for each of the five components),
for each of these later a maximum of 2 associated
3-modes PT is asked (for each of the four components). Then
nbPT2
complete for 2-modes associated or not. Overall
minpct
controls to carry on the algorithm at any level and
lower, i.e. stops if (where
is the singular value, and ssx is the total sum of
squares of the tensor
or the "metric transformed"
).
Putting a
at a given level in
nbPT
obviously
automatically puts in
nbPT
at lower levels. Putting
high values in nbPT
allows control only on minpct
helping to reach the full decomposition. All these controls allow to
truncate the full decomposition in a level-controlled fashion. Notice
the full decomposition always contains any possible choice of
truncation, i.e. the solutions are not dependant on the
truncation scheme (Generalised Eckart-Young Theorem).
Recent work from Tamara G Kolda showed on an example that orthogonal rank
decompositions are not necesseraly nested. This makes PTA-kmodes a model with
nested decompositions not giving the exact orthogonal rank.
So PTA-kmodes will look for best approximation according to orthogonal tensors in a nested approximmation process.
a PTAk
object which consist of a list of lists. Each mode has a list in which is listed:
$v |
matrix of components for the given mode |
$iter |
vector of iterations numbers where maximum was reach |
$test |
vector of test values at maximum |
$modesnam |
name of the mode |
$v |
matrix of components for the given mode |
The last mode list has also some additional information on the analysis done:
$d |
vector of singular values |
$pct |
percentage of sum of squares for each quared singular value |
$ssX |
vector of local sum of squares i.e. of the current tensor with the rescursive algorithm |
$vsnam |
vector of names given to the singular value according to a recursive data dependent scheme |
$datanam |
data reference |
$method |
call applied: could be PTAk or CANDPARA or PCAn or even SVDgen, with parameters choices |
$addedcomment |
the addedcomment (repeated) given in the call |
You will notice that methods other than PTAk may not have all list elements but the essential ones such as: $v, $d, $ssX, and may also have additional ones like $coremat for PCAn (the core array).
The use of metrics (diagonal or not) allows flexibility of analysis like in 2 modes e.g. correspondence analysis, discriminant analysis, robust analysis. Smoothing option extending the analysis towards functional data analysis is theoretically valid for Principal Tensors belonging to a tensor product of separable Hilbert spaces (e.g. Sobolev spaces) see Leibovici and El Maach (1997).
Didier G. Leibovici
Leibovici D(1993) Facteurs <e0> Mesures R<e9>p<e9>t<e9>es et Analyses Factorielles : applications <e0> un suivi <e9>pid<e9>miologique. Universit<e9> de Montpellier II. PhD Thesis in Math<e9>matiques et Applications (Biostatistiques).
Leibovici D and El Maache H (1997) Une d<e9>composition en Valeurs Singuli<e8>res d'un <e9>l<e9>ment d'un produit Tensoriel de k espaces de Hilbert S<e9>parables. Compte Rendus de l'Acad<e9>mie des Sciences tome 325, s<e9>rie I, Statistiques (Statistics) & Probabilit<e9>s (Probability Theory): 779-782.
Leibovici D and Sabatier R (1998) A Singular Value Decomposition of a k-ways array for a Principal Component Analysis of multi-way data, the PTA-k. Linear Algebra and its Applications, 269:307-329. Kolda T.G (2003) A Counterexample to the Possibility of an Extension of the Eckart-Young Low-Rank Approximation Theorem for the Orthogonal Rank Tensor Decomposition. SIAM J. Matrix Analysis, 24(2):763-767, Jan. 2003.
Leibovici D (2008) A Simple Penalised algorithm for SVD and Multiway functional methods. (to be submitted in the futur)
Leibovici DG (2010) Spatio-temporal Multiway Decomposition using Principal Tensor Analysis on k-modes:the R package PTAk. Journal of Statistical Software, 34(10), 1-34. doi:10.18637/jss.v034.i10
REBUILD
, FCAk
, PTA3
summary.PTAk
# don <- array((1:3)%x%rnorm(6*4)%x%(1:10),c(10,4,6,3)) don <- array(1:360,c(5,4,6,3)) don <- don + rnorm(360,1,2) dimnames(don) <- list(paste("s",1:5,sep=""),paste("T",1:4,sep=""), paste("t",1:6,sep=""),c("young","normal","old")) # hypothetic data on learning curve at different age and period of year ones <-list(list(v=rep(1,5)),list(v=rep(1,4)),list(v=rep(1,6)),list(v=rep(1,3))) don <- PROJOT(don,ones) don.sol <- PTAk(don,nbPT=1,nbPT2=2,minpct=0.01, verbose=TRUE, modesnam=c("Subjects","Trimester","Time","Age"), addedcomment="centered on each mode") don.sol[[1]] # mode Subjects results and components don.sol[[2]] # mode Trimester results and components don.sol[[3]] # mode Time results and components don.sol[[4]] # mode Age results and components with additional information on the call summary(don.sol,testvar=2) plot(don.sol,mod=c(1,2,3,4),nb1=1,nb2=NULL, xlab="Subjects/Trimester/Time/Age",main="Best rank-one approx" ) plot(don.sol,mod=c(1,2,3,4),nb1=4,nb2=NULL, xlab="Subjects/Trimester/Time/Age",main="Associated to Subject vs1111") # demo function # demo.PTAk()
# don <- array((1:3)%x%rnorm(6*4)%x%(1:10),c(10,4,6,3)) don <- array(1:360,c(5,4,6,3)) don <- don + rnorm(360,1,2) dimnames(don) <- list(paste("s",1:5,sep=""),paste("T",1:4,sep=""), paste("t",1:6,sep=""),c("young","normal","old")) # hypothetic data on learning curve at different age and period of year ones <-list(list(v=rep(1,5)),list(v=rep(1,4)),list(v=rep(1,6)),list(v=rep(1,3))) don <- PROJOT(don,ones) don.sol <- PTAk(don,nbPT=1,nbPT2=2,minpct=0.01, verbose=TRUE, modesnam=c("Subjects","Trimester","Time","Age"), addedcomment="centered on each mode") don.sol[[1]] # mode Subjects results and components don.sol[[2]] # mode Trimester results and components don.sol[[3]] # mode Time results and components don.sol[[4]] # mode Age results and components with additional information on the call summary(don.sol,testvar=2) plot(don.sol,mod=c(1,2,3,4),nb1=1,nb2=NULL, xlab="Subjects/Trimester/Time/Age",main="Best rank-one approx" ) plot(don.sol,mod=c(1,2,3,4),nb1=4,nb2=NULL, xlab="Subjects/Trimester/Time/Age",main="Associated to Subject vs1111") # demo function # demo.PTAk()
Internal PTAk functions
Ginv(A) PPMA(X,test=1E-10,pena=list(function(u)ksmooth(1:length(u),u,kernel="normal", bandwidth=3,x.points=(1:length(u)))$y ,NA) ,ini=mean,vsmin=1E-20,Maxiter=2000, ...) Powmat(A,pw,eltw=FALSE) RaoProd(A,B) REBUILDPCAn(solu) RESUM(solb,sola=NULL,numass=NULL,verbose=getOption("verbose"),file=NULL ,summary=FALSE,testvar=0.1,with=TRUE) svdsmooth(X,nomb=min(dim(X)), smooth=list(function(u)ksmooth(1:length(u),u,kernel="normal", bandwidth=3,x.points=(1:length(u)))$y),vsmin=1E-16, ...) toplist(li) svd.p(X,...)
Ginv(A) PPMA(X,test=1E-10,pena=list(function(u)ksmooth(1:length(u),u,kernel="normal", bandwidth=3,x.points=(1:length(u)))$y ,NA) ,ini=mean,vsmin=1E-20,Maxiter=2000, ...) Powmat(A,pw,eltw=FALSE) RaoProd(A,B) REBUILDPCAn(solu) RESUM(solb,sola=NULL,numass=NULL,verbose=getOption("verbose"),file=NULL ,summary=FALSE,testvar=0.1,with=TRUE) svdsmooth(X,nomb=min(dim(X)), smooth=list(function(u)ksmooth(1:length(u),u,kernel="normal", bandwidth=3,x.points=(1:length(u)))$y),vsmin=1E-16, ...) toplist(li) svd.p(X,...)
These functions are not supposed to be called directly.
X |
a matrix |
test |
a zero limit number |
pena |
list of functions to be used as smoother |
ini |
initialisation method over the dual dimension |
vsmin |
zero limit for singular value |
Maxiter |
limit number of iteration |
A |
a matrix |
pw |
power value number |
eltw |
boolean to perform power elementwise or matrix power |
B |
a matrix |
solb |
an object inheriting from class |
sola |
an object inheriting from class |
solu |
an object inheriting from class |
numass |
position number of the associated solution, NULL is equivalent to the last in |
verbose |
boolean playing a verbose role |
file |
string pointing a destination of file output |
summary |
boolean to show the summary or not |
testvar |
threshold control for minimum percent of variability explained |
with |
boolean expression to give a supplementary selection criterion |
nomb |
integer giving the number of components to fit |
smooth |
idem as pena |
li |
any list |
... |
any other arguments passed to functions |
Didier G. Leibovici
Gives the approximation of a previously analysed tensor using its given decomposition.
REBUILD(solutions,nTens=1:2,testvar=1 ,redundancy=FALSE)
REBUILD(solutions,nTens=1:2,testvar=1 ,redundancy=FALSE)
solutions |
a |
nTens |
a vector of identifying positions (numbers given in |
testvar |
control within |
redundancy |
logical to take into account (within |
The function rebuilds the Principal Tensors, i.e. rank-one
tensors of order the order of the tensor analysed, and add them up to
build an approximation of the tensor analysed (according to the
method used see method
). This constitutes a best Least Squares
(ordinary or "weighted" if metrics are used) approximation of
datanam
for a given orthogonal-rank r (number of
principal tensors used), if and only if the singular values used are
the r highest.
A tensor with dimensions as solutions[[k]][["datanam"]]
.
This function can be called for PARAFAC/CANDECOMP
and
PCAn
. A specific rebuilt is implemented for this last one.
Didier G. Leibovici
Computes the best rank-one approximation using the RPVSCC algorithm.
SINGVA(X,test=1E-12,PTnam="vs111",Maxiter=2000, verbose=getOption("verbose"),file=NULL, smoothing=FALSE,smoo=list(NA), modesnam=NULL, Ini="svds",sym=NULL)
SINGVA(X,test=1E-12,PTnam="vs111",Maxiter=2000, verbose=getOption("verbose"),file=NULL, smoothing=FALSE,smoo=list(NA), modesnam=NULL, Ini="svds",sym=NULL)
X |
a tensor (as an array) of order k, if non-identity metrics are
used |
test |
numerical value to stop optimisation |
PTnam |
character giving the name of the k-modes Principal Tensor |
Maxiter |
if |
verbose |
control printing |
file |
output printed at the prompt if |
smoothing |
logical to use smooth functiosns or not (see
|
smoo |
list of functions returning smoothed vectors (see
|
modesnam |
character vector of the names of the modes, if |
Ini |
method used for initialisation of the algorithm (see |
sym |
description of the symmetry of the tensor e.g. c(1,1,3,4,1) means the second mode and the fifth are identical to the first |
The algorithm termed RPVSCC in Leibovici(1993) is implemented
to compute the first Principal Tensor (rank-one tensor with its
singular value) of the given tensor X
. According to the
decomposition described in Leibovici(1993) and Leibovici and
Sabatier(1998), the function gives a generalisation to k
modes of the best rank-one approximation issued from SVD whith
2 modes. It is identical to the PCA-kmodes if only 1
dimension is asked in each space, and to PARAFAC/CANDECOMP if the
rank of the approximation is fixed to 1. Then the methods differs,
PTA-kmodes will look for best approximation according to the
orthogonal rank (i.e. the rank-one tensors (of the
decomposition) are orthogonal), PCA-kmodes will look for best
approximation according to the space ranks (i.e. ranks
of every bilinear form deducted from the original tensor, that is the
number of components in each space), PARAFAC/CANDECOMP will look for
best approximation according to the rank (i.e. the
rank-one tensors are not necessarily orthogonal).
Recent work from Tamara G Kolda showed on an example that orthogonal rank
decompositions are not necesseraly nested. This makes PTA-kmodes a model with
nested decompositions not giving the exact orthogonal rank.
So PTA-kmodes will look for best approximation according to orthogonal tensors in a nested approximmation process.
a PTAk
object (without datanam method
)
The algorithm was derived in generalising the transition
formulae of SVD (Leibovici 1993), can also be understood as a
generalisation of the power method (De Lathauwer et al.
2000). In this paper they also use a similar algorithm to build
bases in each space, reminiscent of three-modes and n-modes
PCA (Kroonenberg(1980)), i.e. defining what they called a
rank-(R1,R2,...,Rn) approximation (called here space ranks,
see PCAn
). RPVSCC stands for Recherche de la Premi<e8>re
Valeur Singuli<e8>re par Contraction
Compl<ea>te.
Didier G. Leibovici
Kroonenberg P (1983) Three-mode Principal Component Analysis: Theory and Applications. DSWO press. Leiden.
Leibovici D(1993) Facteurs <e0> Mesures R<e9>p<e9>t<e9>es et Analyses Factorielles : applications <e0> un suivi <e9>pid<e9>miologique. Universit<e9> de Montpellier II. PhD Thesis in Math<e9>matiques et Applications (Biostatistiques).
Leibovici D and Sabatier R (1998) A Singular Value Decomposition of a k-ways array for a Principal Component Analysis of multi-way data, the PTA-k. Linear Algebra and its Applications, 269:307-329.
De Lathauwer L, De Moor B and Vandewalle J (2000) On the best rank-1 and rank-(R1,R2,...,Rn) approximation of higher-order tensors. SIAM J. Matrix Anal. Appl. 21,4:1324-1342.
Kolda T.G (2003) A Counterexample to the Possibility of an Extension of the Eckart-Young Low-Rank Approximation Theorem for the Orthogonal Rank Tensor Decomposition. SIAM J. Matrix Analysis, 24(2):763-767, Jan. 2003.
Print a summary listing of the decomposition obtained.
## S3 method for class 'PTAk' summary(object,testvar=1,dontshow="*", ...) ## S3 method for class 'FCAk' summary(object,testvar=0.5,dontshow="*", ...)
## S3 method for class 'PTAk' summary(object,testvar=1,dontshow="*", ...) ## S3 method for class 'FCAk' summary(object,testvar=0.5,dontshow="*", ...)
object |
an object inheriting from class |
testvar |
control within |
dontshow |
boolean criterion to remove Principal Tensors from the summary, or
default is a character "*" equivalent to the criterion:
|
... |
summary generic additional arguments not used here |
The function prints a listing of the decomposition with historical
order (instead of traditional singular value order). It is useful
before any plots or reconstruction, a screeplot (using
plot.PTAk
) will be also useful. It is useful before any plots
r reconstruction, a screeplot (using plot.PTAk
) will be also
useful. summary.FCAk
is alike
summary.PTAk
but testvar
operates on the variability of
the lack of complete independence.
prints on the prompt with an invisible return of the summary table
At the moment can be used for PCAn
,
CANDPRA
, better summaries will be in the next release.
Didier G. Leibovici [email protected]
Leibovici D (2000) Multiway Multidimensional Analysis for Pharmaco-EEG Studies.(submitted) https://www.researchgate.net/publication/216807619_Multiway_Multidimensional_Analysis_for_Pharmaco-EEG_Studies
data(crimerate) crimerate.mat <- sweep(crimerate,2,apply(crimerate,2,mean)) crimerate.mat <- sweep(crimerate.mat,2,sqrt(apply(crimerate,2,var)),FUN="/") cri.svd <- SVDgen(crimerate.mat) summary(cri.svd,testvar=0) plot(cri.svd,scree=TRUE) par(new=TRUE) RiskJackplot(cri.svd,nbvs=1:7,mod=NULL,max=NULL,rescaled=TRUE, axes=FALSE,ann=FALSE) par(new=FALSE) # or equivalently plot(cri.svd,scree=TRUE,type="b",lty=3,RiskJack=1) #set mod=NULL or c(1,2) ### data(crimerate) criafc <- FCAmet(crimerate,chi2=TRUE) cri.afc <- SVDgen(criafc$data,criafc$met[[2]],criafc$met[[1]]) summary(cri.afc) plot(cri.afc,scree=TRUE) plot(cri.afc,scree=TRUE,type="b",lty=3,RiskJack=1,method="FCA")
data(crimerate) crimerate.mat <- sweep(crimerate,2,apply(crimerate,2,mean)) crimerate.mat <- sweep(crimerate.mat,2,sqrt(apply(crimerate,2,var)),FUN="/") cri.svd <- SVDgen(crimerate.mat) summary(cri.svd,testvar=0) plot(cri.svd,scree=TRUE) par(new=TRUE) RiskJackplot(cri.svd,nbvs=1:7,mod=NULL,max=NULL,rescaled=TRUE, axes=FALSE,ann=FALSE) par(new=FALSE) # or equivalently plot(cri.svd,scree=TRUE,type="b",lty=3,RiskJack=1) #set mod=NULL or c(1,2) ### data(crimerate) criafc <- FCAmet(crimerate,chi2=TRUE) cri.afc <- SVDgen(criafc$data,criafc$met[[2]],criafc$met[[1]]) summary(cri.afc) plot(cri.afc,scree=TRUE) plot(cri.afc,scree=TRUE,type="b",lty=3,RiskJack=1,method="FCA")
Computes the generalised Singular Value Decomposition, i.e. with
non-identity metrics. A smooth approximation can be asked to constraint the
components (u
and v
) to be smooth.
SVDgen(Y, D2 = 1, D1 = 1, smoothing = FALSE, nomb = NULL, smoo = list(function(u)ksmooth( 1:length(u), u, kernel = "normal", bandwidth = 3, x.points = (1:length(u)))$y))
SVDgen(Y, D2 = 1, D1 = 1, smoothing = FALSE, nomb = NULL, smoo = list(function(u)ksmooth( 1:length(u), u, kernel = "normal", bandwidth = 3, x.points = (1:length(u)))$y))
Y |
a matrix |
D2 |
metric in |
D1 |
metric in |
smoothing |
logical if |
nomb |
numeric number of components to extract (typically when smoothing is used less components are used as the screeplot becomes flatter faster) |
smoo |
list of lists of smoothing functions on a vector of the approriate dimension; if on one dimension it is
|
The function computes the decomposition where
and
and the diagonal matrix
containing no zeros squared singular values. If
smoothing
a constraint on Least Squares solution is used, then the
above decomposition becomes an approximation (unless X
belongs to the space defined by the constraints). A Power Method algorithm to compute each
principal tensor is used wherein Alternated Least Squares are always followed by a smoothed
version of the updated vectors. If a Spline smoothing was used the algorithm would be equivalent
to use the traditional penalised least squares at each iteration and could be called
Penalised Power Method or Splined Alternated Least Squares Algorithm (SALSA is already an
acronym used by Besse and Ferraty (1995) in where a similar idea is developped: but smoothing
operates only on variables, and is model based as the Alternating operates on the whole
approximation i.e. given the choice of the dimension
reduction).
a PTAk
object
SVDgen
makes use of a non-identity version svd
(inbuilt) or
svdksmooth
which outputs like the inbuilt svd
. The smoothing
option is also implemented in PTA-kmodes, FCA-kmodes, PCAn and
CANDECOMP/PARAFAC. The use of metrics (diagonal or not) allows flexibility of
analysis like e.g. correspondence analysis, discriminant analysis,
robust analysis. Smoothing option extends the analysis towards functional data
analysis, and or outliers protection.
This smoothing penalising approach is theoretically valid for Principal Tensors (here order 2) belonging
to a tensor product of separable Hilbert spaces (e.g. Sobolev
spaces) see Leibovici and El Maach (1997), and in fact only valid for
projection onto this space : this includes polynomial fitting, spline
basis fitting ... As you are penalysing the alternating optimisation
criterion you also need the to get a robust fit at each iteration to be
able to reach stationarity and declare optimisation done. If the smoother is not linear one looses orthogonality of
the corresponding components but they are usually not too much correlated
and preserving one mode to be unsmoothed insured orthogonality of the
whole decomposition. Alternatively keepOrtho
insures (as a third
step optimisation for each iteration) orthogonality with the previous
component (but then the solution is approximatively in the space of constraints).
The flexibility of this function smoothing
constraint should be carefully used. The
function offers also the choice to change of smoothing (method or parameters)
as the number of components grows as in Ramsay and Silverman (1997).
Didier G. Leibovici [email protected]
Leibovici D and El Maache H (1997) Une décomposition en Valeurs Singulières d'un élément d'un produit Tensoriel de k espaces de Hilbert Séparables. Compte Rendus de l'Académie des Sciences tome 325, série I, Statistiques (Statistics) & Probabilités (Probability Theory): 779-782.
Besse P and Ferraty F (1995) Curvilinear fixed effect model. Computational Statistics, 10:339-351.
Leibovici D (2008) A Simple Penalised algorithm for SVD and Multiway functional methods. (to be submitted)
Ramsay J.O. and Silverman B.W. (1997) Functional Data Analysis. Springer Series in Statistics.
#library(stats) #library(tensor) # on smoothing data(longley) long <- as.matrix(longley[,-6]) long.svd <- SVDgen(long,smoothing=FALSE) summary.PTAk(long.svd,testvar=0) # X11(width=4,height=4) plot.PTAk(long.svd,scree=TRUE,RiskJack=4,type="b",lty=3) long.svdo <- SVDgen(long,smoothing=TRUE, smoo=list(function(u)ksmooth(1:length(u), u,kernel="normal",bandwidth=3,x.points=(1:length(u)))$y,NA)) summary.PTAk(long.svdo,testvar=0) # X11(width=4,height=4) plot.PTAk(long.svdo,scree=TRUE,type="b",lty=3) ###using polynomial fitting polyfit <- function(u,deg=length(u)/5) {n <- length(u);time <- rep(1,n); for(e in 1:deg)time<-cbind(time,(1:n)^e);return(lm.fit(time,u)$fitted.values)} bsfit<-function(u,deg=42) {n <- length(u);time <- rep(1,n); return(lm.fit(bs(time,df=deg),u)$fitted.values)} ### long.svdo2 <- SVDgen(long,nomb=4,smoothing=TRUE,smoo=list(polyfit,NA)) long.svdo2[[1]]$v[1:3,] long.svdo[[1]]$v[1:3,] # orthogonality may be lost with non-projective smoother #### comtoplot <- function(com=1,solua=long.svd,solub=long.svdo,openX11s=FALSE,...) { if(openX11s)X11(width=4,height=4) yla <- c(round((100*(solua[[2]]$d[com])^2)/ solua[[2]]$ssX[1],4), round((100*(solub[[2]]$d[com])^2)/solua[[2]]$ssX[1],4)) limi <- range(c(solua[[1]]$v[com,],solub[[1]]$v[com,])) plot(solua,nb1=com, mod=1,type="b",lty=3,lengthlabels=4,cex=0.4, ylimit=limi,ylab="",...) mtext(paste("vs",com,":",yla[1],"%"),2,col=2,line=2) par(new=TRUE) plot.PTAk(solub,nb1=com,mod=1,labels=FALSE,type="b",lty=1, lengthlabels=4,cex=0.6,ylimit=limi,ylab="",main=paste("smooth vs",com,":",yla[2],"%"),...) par(new=FALSE) } #### comtoplot(com=1) # on using non-diagonal metrics data(crimerate) crimerate.mat <- sweep(crimerate,2,apply(crimerate,2,mean)) crimerate.mat <- sweep(crimerate.mat,2,sqrt(apply(crimerate.mat,2,var)),FUN="/") metW <- Powmat(CauRuimet(crimerate.mat),(-1)) # inverse of the within "group" (to play a bit more you could set m0 relating # the neighbourhood of states (see CauRuimet) cri.svd <- SVDgen(crimerate.mat,D2=1,D1=1) summary(cri.svd,testvar=0) plot(cri.svd,scree=TRUE,RiskJack=4,type="b",lty=3) cri.svdo <- SVDgen(crimerate.mat,D2=metW,D1=1) summary(cri.svdo,testvar=0) plot(cri.svdo,scree=TRUE,RiskJack=4,type="b",lty=3) # X11(width=8,height=4) par(mfrow=c(1,2)) plot(cri.svd,nb1=1,nb2=2,mod=1,lengthlabels=3) plot(cri.svd,nb1=1,nb2=2,mod=2,lengthlabels=4,main="canonical") # X11(width=8,height=4) par(mfrow=c(1,2)) plot(cri.svdo,nb1=1,nb2=2,mod=1,lengthlabels=3) plot(cri.svdo,nb1=1,nb2=2,mod=2,lengthlabels=4, main=expression(paste("metric ",Wg^{-1}))) ########### # demo function # when ima is NULL it uses the dataset timage12 but you can put any array # demo.SVDgen(ima=NULL,snr=3,openX11s=TRUE)
#library(stats) #library(tensor) # on smoothing data(longley) long <- as.matrix(longley[,-6]) long.svd <- SVDgen(long,smoothing=FALSE) summary.PTAk(long.svd,testvar=0) # X11(width=4,height=4) plot.PTAk(long.svd,scree=TRUE,RiskJack=4,type="b",lty=3) long.svdo <- SVDgen(long,smoothing=TRUE, smoo=list(function(u)ksmooth(1:length(u), u,kernel="normal",bandwidth=3,x.points=(1:length(u)))$y,NA)) summary.PTAk(long.svdo,testvar=0) # X11(width=4,height=4) plot.PTAk(long.svdo,scree=TRUE,type="b",lty=3) ###using polynomial fitting polyfit <- function(u,deg=length(u)/5) {n <- length(u);time <- rep(1,n); for(e in 1:deg)time<-cbind(time,(1:n)^e);return(lm.fit(time,u)$fitted.values)} bsfit<-function(u,deg=42) {n <- length(u);time <- rep(1,n); return(lm.fit(bs(time,df=deg),u)$fitted.values)} ### long.svdo2 <- SVDgen(long,nomb=4,smoothing=TRUE,smoo=list(polyfit,NA)) long.svdo2[[1]]$v[1:3,] long.svdo[[1]]$v[1:3,] # orthogonality may be lost with non-projective smoother #### comtoplot <- function(com=1,solua=long.svd,solub=long.svdo,openX11s=FALSE,...) { if(openX11s)X11(width=4,height=4) yla <- c(round((100*(solua[[2]]$d[com])^2)/ solua[[2]]$ssX[1],4), round((100*(solub[[2]]$d[com])^2)/solua[[2]]$ssX[1],4)) limi <- range(c(solua[[1]]$v[com,],solub[[1]]$v[com,])) plot(solua,nb1=com, mod=1,type="b",lty=3,lengthlabels=4,cex=0.4, ylimit=limi,ylab="",...) mtext(paste("vs",com,":",yla[1],"%"),2,col=2,line=2) par(new=TRUE) plot.PTAk(solub,nb1=com,mod=1,labels=FALSE,type="b",lty=1, lengthlabels=4,cex=0.6,ylimit=limi,ylab="",main=paste("smooth vs",com,":",yla[2],"%"),...) par(new=FALSE) } #### comtoplot(com=1) # on using non-diagonal metrics data(crimerate) crimerate.mat <- sweep(crimerate,2,apply(crimerate,2,mean)) crimerate.mat <- sweep(crimerate.mat,2,sqrt(apply(crimerate.mat,2,var)),FUN="/") metW <- Powmat(CauRuimet(crimerate.mat),(-1)) # inverse of the within "group" (to play a bit more you could set m0 relating # the neighbourhood of states (see CauRuimet) cri.svd <- SVDgen(crimerate.mat,D2=1,D1=1) summary(cri.svd,testvar=0) plot(cri.svd,scree=TRUE,RiskJack=4,type="b",lty=3) cri.svdo <- SVDgen(crimerate.mat,D2=metW,D1=1) summary(cri.svdo,testvar=0) plot(cri.svdo,scree=TRUE,RiskJack=4,type="b",lty=3) # X11(width=8,height=4) par(mfrow=c(1,2)) plot(cri.svd,nb1=1,nb2=2,mod=1,lengthlabels=3) plot(cri.svd,nb1=1,nb2=2,mod=2,lengthlabels=4,main="canonical") # X11(width=8,height=4) par(mfrow=c(1,2)) plot(cri.svdo,nb1=1,nb2=2,mod=1,lengthlabels=3) plot(cri.svdo,nb1=1,nb2=2,mod=2,lengthlabels=4, main=expression(paste("metric ",Wg^{-1}))) ########### # demo function # when ima is NULL it uses the dataset timage12 but you can put any array # demo.SVDgen(ima=NULL,snr=3,openX11s=TRUE)
Computes the Tensor Product of a list of vectors (or matrices) according to a given order.
TENSELE(T,moins=NULL, asarray=TRUE,order=NULL,id=NULL)
TENSELE(T,moins=NULL, asarray=TRUE,order=NULL,id=NULL)
T |
a list like a |
moins |
if not |
asarray |
logical to specify the output form |
order |
if not |
id |
when |
The tensor product of the vectors (or matrices) in the list T
is
computed, skipping or not the indexes in moins
, the output tensor is
either in tensor form or in vector form. The way the tensor product is done
follows order
.
According to asarray
the value is either an array, or a vector
representing the tensor product of the vectors (not in moins), the dimension
in order[1]
running the slowest.
Didier G. Leibovici