Title: | Sequential Co-Sparse Factor Regression |
---|---|
Description: | Sequential factor extraction via co-sparse unit-rank estimation (SeCURE). |
Authors: | Aditya Mishra [aut, cre], Kun Chen [aut, cre] |
Maintainer: | Aditya Mishra <[email protected]> |
License: | GPL (>= 3.0) |
Version: | 0.6 |
Built: | 2024-11-14 06:34:01 UTC |
Source: | CRAN |
A list of two matrices used in Lee (2002).
data(CellCycle)
data(CellCycle)
A list with two components:
Chromatin immunoprecipitation data, a matrix of 1790 rows and 113 columns.
Eukariotic cell cycle data, a matrix of 1790 rows and 18 columns.
Matrix X, the chromatin immunoprecipitation (ChIP) data contain complete binding information of a subset of 1790 genes for a total of 113 transcription factors.
Matrix Y, the Eukariotic cell cycle data were generated using alpha factor arrest method, consisting of RNA levels measured every 7 minutes for 119 minutes with a total of 18 time points covering two cell cycle of 1790 genes.
Lee, T. I., Rinaldi, N. J., Robert, F., Odom, D. T., Bar-Joseph, Z., Gerber, G. K., Hannett, N. M., Harbison, C. T., Thompson, C. M., Simon, I. et al. (2002) Transcriptional regulatory networks in saccharomyces cerevisiae. Science, 298, 799-804.
# data(CellCycle) # X <- CellCycle$X;Y <- CellCycle$Y # n <- nrow(Y); p <- ncol(X); q <- ncol(Y) # control <- secure.control(spU=160/p,spV=1) # fit.cycle <- secure.path(Y, X, nrank = 10, nlambda = 100, # control = control)
# data(CellCycle) # X <- CellCycle$X;Y <- CellCycle$Y # n <- nrow(Y); p <- ncol(X); q <- ncol(Y) # control <- secure.control(spU=160/p,spV=1) # fit.cycle <- secure.path(Y, X, nrank = 10, nlambda = 100, # control = control)
A list of two components.
data(DLBCL)
data(DLBCL)
A list with two components:
Chromatin immunoprecipitation. A matrix of 180 rows and 661 columns.
Group index of 180 patients where 1,2,3 corresponds to OxPhos, BCR and HR groups respectively.
Matrix Y : Gene expression dataset from the patients with diffuse large-B-cell lymphoma (DLBCL) after chemotherapy. The data has been used for unsupervised analysis i.e. Biclustering. The data consists of expression levels of q = 661 genes from n =180 patients. Among the patients, 42, 51 and 87 of them were classified to OxPhos, BCR and HR groups, respectively. The data thus form an n by q matrix Y whose rows represent the subjects and columns correspond to the genes, and used in Rosenwald (2002).
classIndex: Out of OxPhos (oxidative phosphorylation), BCR(Bcell response) and HR (host response), the index corresponds to the groups in which these 180 subjects belongs as classified by Hoshida (2007).
Rosenwald, A., Wright, G., Chan, W. C., Connors, J. M., Campo, E., Fisher, R. I., Gascoyne, R. D., Muller-Hermelink, H. K., Smeland, E. B., Giltnane, J. M. et al. (2002) The use of molecular profling to predict survival after chemotherapy for diffuse large-b-cell lymphoma. New England Journal of Medicine, 346, 1937-1947.
Hoshida, Y., Brunet, J.-P., Tamayo, P., Golub, T. R. and Mesirov, J. P. (2007) Subclass mapping: Identifying common subtypes in independent disease data sets. PLoS ONE, 2, e1195.
# data(DLBCL)
# data(DLBCL)
fit multivariate reduced rank regression for a specified rank.
rrr.fit(Y, X, nrank = nrank)
rrr.fit(Y, X, nrank = nrank)
Y |
a matrix of response (n by q) |
X |
a matrix of covariate (n by p) |
nrank |
an integer specifying the desired rank |
coef |
reduced rank estimate |
#require(secure) Y <- matrix(rnorm(400), 100, 4) X <- matrix(rnorm(800), 100, 8) rrr.fit <- rrr.fit(Y, X, nrank = 3)
#require(secure) Y <- matrix(rnorm(400), 100, 4) X <- matrix(rnorm(800), 100, 8) rrr.fit <- rrr.fit(Y, X, nrank = 3)
list of parameters for controling secure fitting
secure.control(mu = 1, nu = 1.1, MMerr = 0.001, MMiter = 100, outTol = 1e-06, outMaxIter = 200, inMaxIter = 200, inTol = 1e-04, lamMaxFac = 1, lamMinFac = 1e-10, gamma0 = 2, elnetAlpha = 0.95, spU = 0.25, spV = 0.25)
secure.control(mu = 1, nu = 1.1, MMerr = 0.001, MMiter = 100, outTol = 1e-06, outMaxIter = 200, inMaxIter = 200, inTol = 1e-04, lamMaxFac = 1, lamMinFac = 1e-10, gamma0 = 2, elnetAlpha = 0.95, spU = 0.25, spV = 0.25)
mu |
penalty parameter used in enforcing orthogonality |
nu |
penalty parameter used in enforcing orthogonality (incremental rate of mu) |
MMerr |
tolerence in the majorization maximization(MM) algorithm for computing initial values when missing value occurs |
MMiter |
maximum number iterations in the MM algorithm |
outTol |
tolerence of convergence of outer loop in CURE |
outMaxIter |
maximum number of outer loop iteration in CURE |
inMaxIter |
maximum number of inner loop iteration in CURE |
inTol |
tolerence value required for convergence of inner loop in CURE |
lamMaxFac |
a multiplier of calculated lambda_max |
lamMinFac |
a multiplier of determing lambda_min as a fraction of lambda_max |
gamma0 |
power parameter in the adaptive weights |
elnetAlpha |
elastic net penalty parameter |
spU |
maximum proportion of nonzero elements in each column of U |
spV |
maximum proportion of nonzero elements in each column of V |
a list of controling parameter.
Sequential factor extraction via co-sparse unit-rank estimation (SeCURE)
secure.path(Y, X = NULL, nrank = 3, nlambda = 100, U0 = NULL, V0 = NULL, D0 = NULL, orthXU = FALSE, orthV = FALSE, keepPath = TRUE, control = list(), ic = c("GIC", "BICP", "AIC")[1])
secure.path(Y, X = NULL, nrank = 3, nlambda = 100, U0 = NULL, V0 = NULL, D0 = NULL, orthXU = FALSE, orthV = FALSE, keepPath = TRUE, control = list(), ic = c("GIC", "BICP", "AIC")[1])
Y |
response matrix |
X |
covariate matrix; when X = NULL, the fucntion performs unsupervised learning |
nrank |
an integer specifying the desired rank/number of factors |
nlambda |
number of lambda values to be used along each path |
U0 |
initial value of U |
V0 |
initial value of V |
D0 |
initial value of D |
orthXU |
if TRUE, orthogonality of XU is required |
orthV |
if TRUE, orthogonality of V is required |
keepPath |
if TRUE, th solution paths of U, V, D are reported |
control |
a list of internal parameters controlling the model fitting |
ic |
character specifying which information criterion to use for selecting the tuning parameter: "GIC"(default), "BICP", and "AIC" |
C.est |
estimated coefficient matrix; based on modified BIC |
U |
estimated U matrix (factor weights) |
D |
estimated singular values |
V |
estimated V matrix (factor loadings) |
ortX |
if TRUE, X is treated as an orthogonal matrix in the computation |
lam |
selected lambda values based on the chosen information criterion |
lampath |
sequences of lambda values used in model fitting. In each sequential unit-rank estimation step, a sequence of length nlambda is first generated between (lamMax*lamMaxFac, lamMax*lamMaxFac*lamMinFac) equally spaced on the log scale, in which lamMax is estimated and the other parameters are specified in secure.control. The model fitting starts from the largest lambda and stops when the maximum proportion of nonzero elements is reached in either u or v, as specified by spU and spV in secure.control. |
IC |
values of information criteria |
Upath |
solution path of U |
Dpath |
solution path of D |
Vpath |
solution path of D |
Mishra, A., Dey, D., Chen, K. (2017) Sequential Co-Sparse Factor Regression, To appear in Journal of Computational and Graphical Statistics (JCGS)
#require(secure) # Simulate data from a sparse factor regression model p <- 100; q <- 100; n <- 200 xrho <- 0.5; nlambda <- 100 nrank <- 3 U <- matrix(0,ncol=nrank ,nrow=p); V <- matrix(0,ncol=nrank ,nrow=q) U[,1]<-c(sample(c(1,-1),8,replace=TRUE),rep(0,p-8)) U[,2]<-c(rep(0,5),sample(c(1,-1),9,replace=TRUE),rep(0,p-14)) U[,3]<-c(rep(0,11),sample(c(1,-1),9,replace=TRUE),rep(0,p-20)) V[,1]<-c(sample(c(1,-1),5,replace=TRUE)*runif(5,0.3,1),rep(0,q-5)) V[,2]<-c(rep(0,5),sample(c(1,-1),5,replace=TRUE)*runif(5,0.3,1),rep(0,q-10)) V[,3]<-c(rep(0,10),sample(c(1,-1),5,replace=TRUE)*runif(5,0.3,1),rep(0,q-15)) U[,1:3]<- apply(U[,1:3],2,function(x)x/sqrt(sum(x^2))) V[,1:3]<- apply(V[,1:3],2,function(x)x/sqrt(sum(x^2))) D <- diag(c(20,15,10)) C <- U%*%D%*%t(V) Xsigma <- xrho^abs(outer(1:p, 1:p,FUN="-")) sim.sample <- secure.sim(U,D,V,n,snr = 0.25,Xsigma,rho=0.3) Y <- sim.sample$Y; X <- sim.sample$X # Fitting secure. Set maximum rank to be 4. rank.ini <- 4 # Set largest model to about 25% sparsity # See secure.control for setting other parameters control <- secure.control(spU=0.25, spV=0.25) # Complete data case. # Fit secure without orthogonality fit.orthF <- secure.path(Y,X,nrank=rank.ini,nlambda = nlambda, control=control) # check orthogonality crossprod(X%*%fit.orthF$U)/n # check solution # fit.orthF$U # fit.orthF$V # fit.orthF$D # Fit secure with orthogonality if desired. It takes longer time. # fit.orthT <- secure.path(Y,X,nrank=rank.ini,nlambda = nlambda, # orthXU=TRUE,orthV=TRUE,control=control) # check orthogonality # crossprod(X%*%fit.orthT$U)/n # 15% missing case miss <- 0.15 t.ind <- sample.int(n*q, size = miss*n*q) y <- as.vector(Y); y[t.ind] <- NA; Ym <- matrix(y,n,q) fit.orthF.miss <- secure.path(Ym, X, nrank = rank.ini, nlambda = nlambda, control = control) # fit.orthT.miss <- secure.path(Ym, X, nrank = rank.ini, nlambda = nlambda, # orthXU=TRUE,orthV=TRUE, control = control)
#require(secure) # Simulate data from a sparse factor regression model p <- 100; q <- 100; n <- 200 xrho <- 0.5; nlambda <- 100 nrank <- 3 U <- matrix(0,ncol=nrank ,nrow=p); V <- matrix(0,ncol=nrank ,nrow=q) U[,1]<-c(sample(c(1,-1),8,replace=TRUE),rep(0,p-8)) U[,2]<-c(rep(0,5),sample(c(1,-1),9,replace=TRUE),rep(0,p-14)) U[,3]<-c(rep(0,11),sample(c(1,-1),9,replace=TRUE),rep(0,p-20)) V[,1]<-c(sample(c(1,-1),5,replace=TRUE)*runif(5,0.3,1),rep(0,q-5)) V[,2]<-c(rep(0,5),sample(c(1,-1),5,replace=TRUE)*runif(5,0.3,1),rep(0,q-10)) V[,3]<-c(rep(0,10),sample(c(1,-1),5,replace=TRUE)*runif(5,0.3,1),rep(0,q-15)) U[,1:3]<- apply(U[,1:3],2,function(x)x/sqrt(sum(x^2))) V[,1:3]<- apply(V[,1:3],2,function(x)x/sqrt(sum(x^2))) D <- diag(c(20,15,10)) C <- U%*%D%*%t(V) Xsigma <- xrho^abs(outer(1:p, 1:p,FUN="-")) sim.sample <- secure.sim(U,D,V,n,snr = 0.25,Xsigma,rho=0.3) Y <- sim.sample$Y; X <- sim.sample$X # Fitting secure. Set maximum rank to be 4. rank.ini <- 4 # Set largest model to about 25% sparsity # See secure.control for setting other parameters control <- secure.control(spU=0.25, spV=0.25) # Complete data case. # Fit secure without orthogonality fit.orthF <- secure.path(Y,X,nrank=rank.ini,nlambda = nlambda, control=control) # check orthogonality crossprod(X%*%fit.orthF$U)/n # check solution # fit.orthF$U # fit.orthF$V # fit.orthF$D # Fit secure with orthogonality if desired. It takes longer time. # fit.orthT <- secure.path(Y,X,nrank=rank.ini,nlambda = nlambda, # orthXU=TRUE,orthV=TRUE,control=control) # check orthogonality # crossprod(X%*%fit.orthT$U)/n # 15% missing case miss <- 0.15 t.ind <- sample.int(n*q, size = miss*n*q) y <- as.vector(Y); y[t.ind] <- NA; Ym <- matrix(y,n,q) fit.orthF.miss <- secure.path(Ym, X, nrank = rank.ini, nlambda = nlambda, control = control) # fit.orthT.miss <- secure.path(Ym, X, nrank = rank.ini, nlambda = nlambda, # orthXU=TRUE,orthV=TRUE, control = control)
genertate random samples from a sparse factor regression model
secure.sim(U, D, V, n, snr, Xsigma, rho = 0)
secure.sim(U, D, V, n, snr, Xsigma, rho = 0)
U |
specified value of U |
D |
specified value of D |
V |
specified value of V |
n |
sample size |
snr |
signal to noise ratio |
Xsigma |
covariance matrix for generating sample of X |
rho |
parameter defining correlated error |
Y |
Generated response matrix |
X |
Generated predictor matrix |
#require(secure) # Simulate data from a sparse factor regression model p <- 100; q <- 50; n <- 300 snr <- 0.5; ssigma <- 0.5; nlambda <- 200 nrank <- 3 U <- matrix(0,ncol=nrank ,nrow=p); V <- matrix(0,ncol=nrank ,nrow=q) U[,1]<-c(sample(c(1,-1),8,replace=TRUE),rep(0,p-8)) U[,2]<-c(rep(0,5),sample(c(1,-1),9,replace=TRUE),rep(0,p-14)) U[,3]<-c(rep(0,11),sample(c(1,-1),9,replace=TRUE),rep(0,p-20)) V[,1]<-c(sample(c(1,-1),5,replace=TRUE)*runif(5,0.3,1),rep(0,q-5)) V[,2]<-c(rep(0,5),sample(c(1,-1),5,replace=TRUE)*runif(5,0.3,1),rep(0,q-10)) V[,3]<-c(rep(0,10),sample(c(1,-1),5,replace=TRUE)*runif(5,0.3,1),rep(0,q-15)) U[,1:3]<- apply(U[,1:3],2,function(x)x/sqrt(sum(x^2))) V[,1:3]<- apply(V[,1:3],2,function(x)x/sqrt(sum(x^2))) D <- diag(c(20,15,10)) C <- U%*%D%*%t(V) Xsigma <- ssigma^abs(outer(1:p, 1:p,FUN="-")) sim.sample <- secure.sim(U,D,V,n,snr,Xsigma) Y <- sim.sample$Y X <- sim.sample$X
#require(secure) # Simulate data from a sparse factor regression model p <- 100; q <- 50; n <- 300 snr <- 0.5; ssigma <- 0.5; nlambda <- 200 nrank <- 3 U <- matrix(0,ncol=nrank ,nrow=p); V <- matrix(0,ncol=nrank ,nrow=q) U[,1]<-c(sample(c(1,-1),8,replace=TRUE),rep(0,p-8)) U[,2]<-c(rep(0,5),sample(c(1,-1),9,replace=TRUE),rep(0,p-14)) U[,3]<-c(rep(0,11),sample(c(1,-1),9,replace=TRUE),rep(0,p-20)) V[,1]<-c(sample(c(1,-1),5,replace=TRUE)*runif(5,0.3,1),rep(0,q-5)) V[,2]<-c(rep(0,5),sample(c(1,-1),5,replace=TRUE)*runif(5,0.3,1),rep(0,q-10)) V[,3]<-c(rep(0,10),sample(c(1,-1),5,replace=TRUE)*runif(5,0.3,1),rep(0,q-15)) U[,1:3]<- apply(U[,1:3],2,function(x)x/sqrt(sum(x^2))) V[,1:3]<- apply(V[,1:3],2,function(x)x/sqrt(sum(x^2))) D <- diag(c(20,15,10)) C <- U%*%D%*%t(V) Xsigma <- ssigma^abs(outer(1:p, 1:p,FUN="-")) sim.sample <- secure.sim(U,D,V,n,snr,Xsigma) Y <- sim.sample$Y X <- sim.sample$X