Title: | Performs the Joint Graphical Lasso for Sparse Inverse Covariance Estimation on Multiple Classes |
Description: | The Joint Graphical Lasso is a generalized method for estimating Gaussian graphical models / sparse inverse covariance matrices / biological networks on multiple classes of data. We solve JGL under two penalty functions: The Fused Graphical Lasso (FGL), which employs a fused penalty to encourage inverse covariance matrices to be similar across classes, and the Group Graphical Lasso (GGL), which encourages similar network structure between classes. FGL is recommended over GGL for most applications. Reference: Danaher P, Wang P, Witten DM. (2013) <doi:10.1111/rssb.12033>. |
Authors: | Patrick Danaher |
Maintainer: | Patrick Danaher <pdanaher@uw.edu> |
License: | MIT + file LICENSE |
Version: | 2.3.2 |
Built: | 2024-12-14 06:28:27 UTC |
Source: | CRAN |
Runs the Fused Graphical Lasso and the Group Graphical Lasso for network estimation and sparse inverse covariance estimation across multiple classes of data.
The Fused Graphical Lasso (FGL) and the Group Graphical Lasso (GGL) are two methods for estimating sparse inverse covariance matrices that are similar across classes. A motivating example is the analysis of gene expression data from tumor and healthy cells: FGL and GGL allow joint estimation of gene expression conditional dependency networks in both cancer and healthy cells. FGL is recommended over GGL for most purposes. The function JGL can implement either of these methods.
The JGL package includes a number of functions to help analyze estimated networks: subnetworks(), net.degree(), net.edges(), net.hubs(), net.neighbors(), print.jgl() and plot.jgl(). These functions rely on the igraph package.
A number of other functions are called internally by the above functions and are not generally useful to the user.
Patrick Danaher
Maintainer: Patrick Danaher <pdanaher@uw.edu>
Patrick Danaher, Pei Wang and Daniela Witten (2011). The joint graphical lasso for inverse covariance estimation across multiple classes. http://arxiv.org/abs/1111.0324
## load an example dataset with K=2 classes, p=200 features, and n=100 samples per class:
data(example.data)
str(example.data)
## run FGL:
fgl.results = JGL(Y=example.data, penalty="fused", lambda1=.25, lambda2=.1)
str(fgl.results)
print.jgl(fgl.results)
## get subnetwork membership of FGL results:
subnetworks(fgl.results$theta)
crit() calculates the critical value of the FGL objective function. It is used to confirm that the FGL algorithm is converging.
crit(theta, S, n, lam1, lam2, penalize.diagonal)
theta | A list of pXp inverse covariance matrices. |
S | A list of pXp empirical covariance matrices. |
n | A vector of sample sizes to attribute to each of the K data matrices. n controls the relative weights of the classes: for example, with n==c(1,1), each class's theta will be penalized equally. |
lam1 | The tuning parameter for the graphical lasso penalty. |
lam2 | The tuning parameter for the fused lasso penalty. |
penalize.diagonal | Logical value determining whether the graphical lasso penalty should also be applied to the diagonal of the inverse covariance matrices. |
A function called by FGL to calculate the critical value of the objective function.
crit, the critical value of the list of inverse covariance matrices.
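For intuition, crit() evaluates the FGL objective (the penalized log-likelihood) at a given theta. Below is a minimal sketch of that objective, following Danaher et al. (2011) and assuming no penalty on the diagonal; fgl.objective is a hypothetical name used only for illustration, not a package function.

## A hypothetical sketch of the FGL objective (not a package function):
fgl.objective <- function(theta, S, n, lam1, lam2) {
  K <- length(theta)
  ## sample-size-weighted Gaussian log-likelihood term:
  loglik <- sum(sapply(1:K, function(k)
    n[k] * (as.numeric(determinant(theta[[k]], logarithm=TRUE)$modulus) -
            sum(diag(S[[k]] %*% theta[[k]])))))
  ## graphical lasso penalty on off-diagonal elements:
  offdiag.abs <- function(M) { diag(M) <- 0; sum(abs(M)) }
  lasso <- lam1 * sum(sapply(theta, offdiag.abs))
  ## fused penalty over all pairs of classes:
  fused <- 0
  for (k1 in seq_len(K - 1)) for (k2 in (k1 + 1):K)
    fused <- fused + sum(abs(theta[[k1]] - theta[[k2]]))
  loglik - lasso - lam2 * fused
}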
Patrick Danaher
Patrick Danaher, Pei Wang and Daniela Witten (2011). The joint graphical lasso for inverse covariance estimation across multiple classes. http://arxiv.org/abs/1111.0324
A dataset with 200 genes and 2 classes of data, each with 100 observations. The two classes' data matrices are stored in a list.
data(example.data)
The format is:
List of 2
 $ : num [1:100, 1:200] 0.395 -2.03 -1.704 -0.469 1.75 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:200] "gene 1" "gene 2" "gene 3" "gene 4" ...
 $ : num [1:100, 1:200] -1.548 1.45 -0.812 -0.589 0.69 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:200] "gene 1" "gene 2" "gene 3" "gene 4" ...
data(example.data)
str(example.data)
gcrit() calculates the critical value of the GGL objective function. It is used to confirm that the GGL algorithm is converging.
gcrit(theta, S, n, lam1, lam2, penalize.diagonal)
theta | A list of pXp inverse covariance matrices. |
S | A list of pXp empirical covariance matrices. |
n | A vector of sample sizes to attribute to each of the K data matrices. n controls the relative weights of the classes: for example, with n==c(1,1), each class's theta will be penalized equally. |
lam1 | The tuning parameter for the graphical lasso penalty. |
lam2 | The tuning parameter for the group lasso penalty. |
penalize.diagonal | Logical, determining whether the penalties will be applied to the diagonal elements of the theta matrices. |
A function called by GGL to calculate the critical value of the objective function.
crit, the critical value of the list of inverse covariance matrices.
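The GGL objective replaces the fused term with a group lasso penalty that encourages shared sparsity patterns across classes. A minimal sketch of the group penalty term alone is below (the log-likelihood and lasso terms are as in the crit() sketch above); group.penalty is a hypothetical name for illustration only.

## A hypothetical sketch of the group penalty term in the GGL objective:
group.penalty <- function(theta, lam2) {
  ## elementwise l2 norm across the K classes, summed over off-diagonal entries:
  sq <- Reduce(`+`, lapply(theta, function(th) th^2))
  diag(sq) <- 0
  lam2 * sum(sqrt(sq))
}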
Patrick Danaher
Patrick Danaher, Pei Wang and Daniela Witten (2011). The joint graphical lasso for inverse covariance estimation across multiple classes. http://arxiv.org/abs/1111.0324
Solve the Joint Graphical Lasso
JGL(Y, penalty="fused", lambda1, lambda2, rho=1, weights="equal",
    penalize.diagonal=FALSE, maxiter=500, tol=1e-5, warm=NULL,
    return.whole.theta=FALSE, screening="fast", truncate=1e-5)
Y | A list of nXp data matrices. |
penalty | Determines whether lambda2 controls a "fused" or "group" lasso penalty. Must take the value "fused" or "group". |
lambda1 | The tuning parameter for the graphical lasso penalty. |
lambda2 | The tuning parameter for the fused or group lasso penalty. |
rho | A step size parameter. Large values decrease step size. |
weights | Determines the putative sample size of each class's data. Allowed values: a vector with length equal to the number of classes; "equal", giving each class weight 1; or "sample.size", giving each class a weight corresponding to its sample size. |
penalize.diagonal | If penalty=="fused", determines whether lambda1 is applied to the diagonal of theta. If penalty=="group", determines whether lambda1 and lambda2 are applied to the diagonal of theta. |
maxiter | Maximum number of iterations. |
tol | Determines the convergence criterion. |
warm | Input a warm start to theta in the form of a K-length list of pXp matrices. |
return.whole.theta | If TRUE, each class's inverse covariance matrix is returned whole. If FALSE, the inverse covariance matrix is returned only over the connected nodes, and only the diagonal of the matrix is returned over the unconnected nodes. |
screening | "fast" or "memory.efficient". Use of "fast" is recommended unless the number of features prohibits storage of a pXp matrix. For very high-dimensional data, screening="memory.efficient" will allow a solution at the cost of a much longer computation time. |
truncate | Defaults to 1e-5. At convergence, all values of theta below this number will be set to zero. |
This function can solve both the Fused Graphical Lasso and the Group Graphical Lasso.
theta | A list of the estimated inverse covariance matrices, over all nodes if return.whole.theta==TRUE, and over only the connected nodes if return.whole.theta==FALSE. |
diag.theta.unconnected | Returned only if return.whole.theta==FALSE. A list of vectors, each vector the estimated diagonal of an inverse covariance matrix over the unconnected nodes. |
connected | A logical vector identifying whether each node is connected. |
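When return.whole.theta==FALSE, the full pXp estimate for a class can be reassembled from the returned pieces. A hedged sketch, assuming the connected nodes retain their original column order; rebuild.theta is hypothetical, not a package function.

## A hypothetical sketch: reassemble the full theta matrix for class k
rebuild.theta <- function(res, k, p) {
  full <- matrix(0, p, p)
  full[res$connected, res$connected] <- res$theta[[k]]            # connected block
  diag(full)[!res$connected] <- res$diag.theta.unconnected[[k]]   # unconnected diagonals
  full
}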
Patrick Danaher
Patrick Danaher, Pei Wang and Daniela Witten (2011). The joint graphical lasso for inverse covariance estimation across multiple classes. http://arxiv.org/abs/1111.0324
## load an example dataset with K=2 classes, p=200 features, and n=100 samples per class:
data(example.data)
str(example.data)
## run FGL:
fgl.results = JGL(Y=example.data, penalty="fused", lambda1=.25, lambda2=.1)
str(fgl.results)
print.jgl(fgl.results)
## run GGL:
ggl.results = JGL(Y=example.data, penalty="group", lambda1=.15, lambda2=.2, return.whole.theta=TRUE)
str(ggl.results)
print.jgl(ggl.results)
For each class, lists the degree of every node.
net.degree(theta)
theta | A list of pXp matrices, each an estimated sparse inverse covariance matrix. (For example, the result of FGL or GGL.) |
degree, a list of p-length vectors, each giving the degree of all p nodes in the network for the corresponding class.
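Conceptually, a node's degree is the number of nonzero off-diagonal entries in its row of theta. A minimal sketch of the same computation, assuming edges are defined by nonzero entries; degree.sketch is a hypothetical name.

## A hypothetical sketch of per-class node degrees:
degree.sketch <- function(theta) {
  lapply(theta, function(th) {
    adj <- th != 0
    diag(adj) <- FALSE
    rowSums(adj)
  })
}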
Patrick Danaher
Patrick Danaher, Pei Wang and Daniela Witten (2011). The joint graphical lasso for inverse covariance estimation across multiple classes. http://arxiv.org/abs/1111.0324
## load an example dataset with K=2 classes, p=200 features, and n=100 samples per class:
data(example.data)
str(example.data)
## run FGL:
fgl.results = JGL(Y=example.data, penalty="fused", lambda1=.25, lambda2=.1)
## get degree list:
net.degree(fgl.results$theta)
For each class, list every pair of connected nodes.
net.edges(theta)
theta | A list of pXp matrices, each an estimated sparse inverse covariance matrix. (For example, the result of FGL or GGL.) |
edges, a K-length list, each element of the list an igraph.es object detailing all pairs of connected nodes in the class.
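An equivalent edge set can be built directly with igraph. A hedged sketch for class 1, given a fitted object fgl.results as in the Examples below:

## A hypothetical sketch: extract class 1's edge set with igraph
library(igraph)
adj <- 1 * (fgl.results$theta[[1]] != 0)
diag(adj) <- 0
g <- graph_from_adjacency_matrix(adj, mode="undirected")
E(g)  # an igraph.es object of connected node pairs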
Patrick Danaher
Patrick Danaher, Pei Wang and Daniela Witten (2011). The joint graphical lasso for inverse covariance estimation across multiple classes. http://arxiv.org/abs/1111.0324
## load an example dataset with K=2 classes, p=200 features, and n=100 samples per class:
data(example.data)
str(example.data)
## run FGL:
fgl.results = JGL(Y=example.data, penalty="fused", lambda1=.25, lambda2=.1)
## get edges list:
net.edges(fgl.results$theta)
List the degrees of the most connected nodes in each class.
net.hubs(theta, nhubs = 10)
theta | A list of pXp matrices, each an estimated sparse inverse covariance matrix. (For example, the result of FGL or GGL.) |
nhubs | The number of hubs to be identified. net.hubs() will list the degree of the nhubs most connected nodes in each class. |
hubs, a list of length K, each element of which is a vector giving the degree of the most connected nodes in the corresponding class.
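Equivalently, the hubs are the top of each class's degree distribution. A minimal sketch built on net.degree(); hubs.sketch is a hypothetical name for illustration only.

## A hypothetical sketch: the nhubs most connected nodes per class
hubs.sketch <- function(theta, nhubs=10) {
  lapply(net.degree(theta), function(d) sort(d, decreasing=TRUE)[1:nhubs])
}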
Patrick Danaher
Patrick Danaher, Pei Wang and Daniela Witten (2011). The joint graphical lasso for inverse covariance estimation across multiple classes. http://arxiv.org/abs/1111.0324
## load an example dataset with K=2 classes, p=200 features, and n=100 samples per class:
data(example.data)
str(example.data)
## run FGL:
fgl.results = JGL(Y=example.data, penalty="fused", lambda1=.25, lambda2=.1)
## get hubs list:
net.hubs(fgl.results$theta)
For each class, returns the names of the nodes connected to a given node.
net.neighbors(theta, index)
theta | A list of pXp matrices, each an estimated sparse inverse covariance matrix. (For example, the result of FGL or GGL.) |
index | The row number of the node to be investigated. |
neighbors, a list of length K, each element of which is a vector of the row names of the nodes neighboring the node of interest.
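For a single class, this amounts to reading off the nonzero entries in one row of theta. A hedged one-off sketch, given fgl.results fit with return.whole.theta=TRUE as in the Examples below:

## A hypothetical sketch: neighbors of node 195 in class 1
th <- fgl.results$theta[[1]]
nb <- setdiff(which(th[195, ] != 0), 195)
rownames(th)[nb]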
Patrick Danaher
Patrick Danaher, Pei Wang and Daniela Witten (2011). The joint graphical lasso for inverse covariance estimation across multiple classes. http://arxiv.org/abs/1111.0324
## load an example dataset with K=2 classes, p=200 features, and n=100 samples per class:
data(example.data)
str(example.data)
## run FGL:
fgl.results = JGL(Y=example.data, penalty="fused", lambda1=.25, lambda2=.1, return.whole.theta=TRUE)
## get neighbors of gene 195:
net.neighbors(fgl.results$theta, index=195)
Applies the FGL screening rule to identify, before running FGL, which features are connected (have degree > 0 in any class) or unconnected in the solution. screen.fgl returns exactly the correct set of connected nodes when K=2. When K is larger than 2, screen.fgl applies a weaker condition that screens out many, but not all, unconnected nodes. This algorithm is set up to be memory-efficient rather than fast: it can be applied to very high-dimensional datasets, but it will take time to run.
screen.fgl(Y, lambda1, lambda2, weights = "equal")
Y | A list of nXp data matrices. |
lambda1 | The tuning parameter for the graphical lasso penalty. Must be greater than or equal to 0. |
lambda2 | The tuning parameter for the fused lasso penalty. Must be greater than or equal to 0. |
weights | The weights to assign to each class. The higher a class's weight, the weaker the effect of the penalties on its estimated inverse covariance matrix. If "equal", the classes are weighted equally, regardless of sample size. If "sample.size", the classes are weighted by sample size. Custom weightings are achievable by entering a vector of K weights. |
connected, a logical vector identifying the connected nodes.
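The screening result is typically used to shrink a very high-dimensional problem before calling JGL. A hedged sketch of that workflow, using the packaged example data:

## restrict the data to nodes that will be connected, then solve the smaller problem:
data(example.data)
keep <- screen.fgl(example.data, lambda1=.2, lambda2=.1, weights="equal")
Y.small <- lapply(example.data, function(y) y[, keep])
fgl.small <- JGL(Y=Y.small, penalty="fused", lambda1=.2, lambda2=.1)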
Patrick Danaher
Patrick Danaher, Pei Wang and Daniela Witten (2011). The joint graphical lasso for inverse covariance estimation across multiple classes. http://arxiv.org/abs/1111.0324
## load an example dataset with K=2 classes, p=200 features, and n=100 samples per class:
data(example.data)
str(example.data)
## which nodes will be connected?
screen.fgl(example.data, lambda1=.2, lambda2=.1, weights="equal")
Applies the GGL screening rule to identify, before running GGL, which features are connected (have degree > 0 in any class) in the solution. This algorithm is set up to be memory-efficient rather than fast: it can be applied to very high-dimensional datasets, but it will take time to run.
screen.ggl(Y, lambda1, lambda2, weights = "equal")
Y | A list of nXp data matrices. |
lambda1 | The tuning parameter for the graphical lasso penalty. Must be greater than or equal to 0. |
lambda2 | The tuning parameter for the group lasso penalty. Must be greater than or equal to 0. |
weights | The weights to assign to each class. The higher a class's weight, the weaker the effect of the penalties on its estimated inverse covariance matrix. If "equal", the classes are weighted equally, regardless of sample size. If "sample.size", the classes are weighted by sample size. Custom weightings are achievable by entering a vector of K weights. |
connected, a logical vector identifying the connected nodes.
Patrick Danaher
Patrick Danaher, Pei Wang and Daniela Witten (2011). The joint graphical lasso for inverse covariance estimation across multiple classes. http://arxiv.org/abs/1111.0324
## load an example dataset with K=2 classes, p=200 features, and n=100 samples per class:
data(example.data)
str(example.data)
## which nodes will be connected?
screen.ggl(example.data, lambda1=.3, lambda2=.3, weights="equal")
For each class, returns lists of all features belonging to subnetworks. (A subnetwork is defined as a collection of features C for which theta[C,!C]==0, and within which no further subnetworks can be identified. In other words, a block in the block diagonal structure of theta, or a set of features that can be connected through theta's edges.)
subnetworks(theta)
theta | A list of pXp matrices, each an estimated sparse inverse covariance matrix. (For example, the result of FGL or GGL.) |
A list of length K, the kth element of which is a list of the subnetworks in class k. Each subnetwork is represented as a vector of feature names.
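The same decomposition can be recovered as the connected components of each class's network. A hedged sketch for class 1 using igraph, given fgl.results as in the Examples below:

## A hypothetical sketch: subnetworks of class 1 as connected components
library(igraph)
adj <- 1 * (fgl.results$theta[[1]] != 0)
diag(adj) <- 0
g <- graph_from_adjacency_matrix(adj, mode="undirected")
comp <- components(g)$membership
split(seq_along(comp), comp)  # feature indices grouped by subnetwork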
Patrick Danaher
Patrick Danaher, Pei Wang and Daniela Witten (2011). The joint graphical lasso for inverse covariance estimation across multiple classes. http://arxiv.org/abs/1111.0324
## load an example dataset with K=2 classes, p=200 features, and n=100 samples per class:
data(example.data)
str(example.data)
## run FGL:
fgl.results = JGL(Y=example.data, penalty="fused", lambda1=.25, lambda2=.1)
## identify subnetworks:
subnetworks(fgl.results$theta)