A system-level understanding of
the regulation and coordination mechanisms of gene expression is
essential to understanding the complexity of biological processes in
health and disease. With the rapid development of single-cell RNA
sequencing technologies, it is now possible to investigate gene
interactions in a cell-type-specific manner. Here we introduce the
scLink
package, which uses statistical network modeling to
understand the co-expression relationships among genes and to construct
sparse gene co-expression networks from single-cell gene expression
data.
Here we demonstrate the functionality of scLink
using
the example data stored in the package.
## Loading required package: parallel
## Loading required package: glasso
count = readRDS(system.file("extdata", "example.rds", package = "scLink"))
genes = readRDS(system.file("extdata", "genes.rds", package = "scLink"))
The example raw count matrix count
has 793
rows representing different cells and 23,341
columns representing different genes.
genes
is a character vector of 500 genes of interest.
sclink_norm
We use the function sclink_norm
to process single cell
read count for application of the sclink method. The code below will
normalize the read count matrix with a library size of 106 and only keep the 500 genes in
genes
for downstream analysis. Note that the normalized
count matrix count.norm
is on the log10 scale.
If users do not have a particular gene list for network inference,
they can set filter.genes=TRUE
to filter for the top n genes with largest average
expression values. For example:
sclink_net
After the pre-processing step, we use the function
sclink_net
to calculate the robust correlation matrix and
identifed sparse co-expression network of scLink. expr
is
the normalized count matrix output by sclink_norm
or
supplied by the users. lda
is the candidate regularization
parameters used in scLink’s graphical model. The users can set
ncores
to take advantage of parallel computation.
sclink_net
returns a list of results. The scLink’s
robust correlation matrix can be retrieved from the cor
element:
## Rn45s Eef1a1 Malat1
## Rn45s 1.00000000 -0.27604002 -0.08561265
## Eef1a1 -0.27604002 1.00000000 -0.05138179
## Malat1 -0.08561265 -0.05138179 1.00000000
The gene co-expression networks and summary statistics can be
retrieved from the summary
element, which is a list with
the same length as lda
: each element corresponds to one
regularization parameter.
## [1] "adj" "Sigma" "nedge" "bic" "lambda"
## Rn45s Eef1a1 Malat1
## Rn45s 1 0 0
## Eef1a1 0 1 0
## Malat1 0 0 1
## Rn45s Eef1a1 Malat1
## Rn45s 1.696584 0.000000 0.000000
## Eef1a1 0.000000 1.665809 0.000000
## Malat1 0.000000 0.000000 1.578271
## [1] 1255450
## [1] 127
## [1] 0.5
sclink_cor
Since it is very difficult to infer co-expression relationships for
lowly expressed genes in single-cell data, we suggest the filtering step
as used in sclink_norm
to select genes. This also reduces
the computational burden. However, if the users would like to infer gene
networks for a large gene list (e.g., > 5000 genes), we suggest that the users
first use sclink_cor
to investigate the correlation
structures among these genes.
If the correlation matrix suggests obvious gene modules, then the
users can apply sclink_net
separately on these modules to
reduce computation time and increase overall accuracy.