Title: | Interpretation of Heterogeneous Single-Cell Gene Expression Data |
---|---|
Description: | We develop a novel matrix factorization tool named 'scINSIGHT' to jointly analyze multiple single-cell gene expression samples from biologically heterogeneous sources, such as different disease phases, treatment groups, or developmental stages. Given multiple gene expression samples from different biological conditions, 'scINSIGHT' simultaneously identifies common and condition-specific gene modules and quantifies their expression levels in each sample in a lower-dimensional space. With the factorized results, the inferred expression levels and memberships of common gene modules can be used to cluster cells and detect cell identities, and the condition-specific gene modules can help compare functional differences in transcriptomes from distinct conditions. Please also see Qian K, Fu SW, Li HW, Li WV (2022) <doi:10.1186/s13059-022-02649-3>. |
Authors: | Kun Qian [aut, ctb, cre], Wei Vivian Li [aut, ctb] |
Maintainer: | Kun Qian <[email protected]> |
License: | GPL-3 |
Version: | 0.1.4 |
Built: | 2024-11-11 07:03:17 UTC |
Source: | CRAN |
This function initializes an scINSIGHT object with normalized data passed in.
create_scINSIGHT(norm.data, condition)
norm.data |
List of normalized expression matrices (genes by cells). Gene names should be the same in all matrices. |
condition |
Vector specifying sample conditions. |
scINSIGHT
object with norm.data slot set.
# Demonstration using matrices with randomly generated numbers
S1 <- matrix(runif(50000,0,2), 500,100)
S2 <- matrix(runif(60000,0,2), 500,120)
S3 <- matrix(runif(80000,0,2), 500,160)
S4 <- matrix(runif(75000,0,2), 500,150)
data = list(S1, S2, S3, S4)
sample = c("sample1", "sample2", "sample3", "sample4")
condition = c("control", "activation", "control", "activation")
names(data) = sample
names(condition) = sample
scINSIGHTx <- create_scINSIGHT(data, condition)
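Because the gene names are expected to match across all matrices, it can help to check the inputs before building the object. The following is a minimal base R sketch; `check_scinsight_inputs` is a hypothetical helper written for illustration, not a function provided by scINSIGHT, and it assumes gene names are stored as row names.

```r
# Hypothetical helper (not part of scINSIGHT): validate inputs before
# passing them to create_scINSIGHT().
check_scinsight_inputs <- function(norm.data, condition) {
  # At least two matrices, with one condition label per matrix
  stopifnot(is.list(norm.data), length(norm.data) >= 2)
  stopifnot(length(condition) == length(norm.data))
  # Assumption: gene names are stored as row names
  genes <- rownames(norm.data[[1]])
  if (is.null(genes)) stop("matrices need gene names as row names")
  same_genes <- vapply(norm.data,
                       function(m) identical(rownames(m), genes),
                       logical(1))
  if (!all(same_genes)) stop("gene names differ between matrices")
  invisible(TRUE)
}

set.seed(1)
genes <- paste0("gene", 1:500)
S1 <- matrix(runif(50000, 0, 2), 500, 100, dimnames = list(genes, NULL))
S2 <- matrix(runif(60000, 0, 2), 500, 120, dimnames = list(genes, NULL))
data <- list(sample1 = S1, sample2 = S2)
condition <- c(sample1 = "control", sample2 = "activation")
ok <- check_scinsight_inputs(data, condition)
```

If the check passes, `data` and `condition` can be passed to `create_scINSIGHT` as in the demonstration above.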
Perform INterpreting single cell gene expresSIon bioloGically Heterogeneous daTa (scINSIGHT) to return factorized W_1, W_2, H, and V matrices.
This factorization produces a W_1 matrix (cells by K_j) and a W_2 matrix (cells by K) for each sample, a V matrix (K by genes) shared by all samples, and an H matrix (K_j by genes) for each condition.
The W_2 matrices are the expression matrices of the K common gene modules for all samples, and V is the membership matrix of the K common gene modules, shared by all samples.
The W_1 matrices are the expression matrices of the K_j condition-specific gene modules for all samples, and the H matrices are the membership matrices of the K_j condition-specific gene modules for all conditions.
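The matrix shapes can be checked with random stand-ins. The sketch below uses the slot names documented for the scINSIGHT object (W_1, W_2, H, V) and assumes an additive reconstruction `W_1 %*% H + W_2 %*% V`; that form is an assumption based on the compatible dimensions and is not stated in this documentation. All values are random and purely illustrative.

```r
set.seed(1)
n_cells <- 100   # cells in one sample
n_genes <- 500   # genes shared by all samples
K   <- 5         # number of common gene modules
K_j <- 2         # number of condition-specific gene modules

W_1 <- matrix(runif(n_cells * K_j), n_cells, K_j)  # cells by K_j
W_2 <- matrix(runif(n_cells * K),   n_cells, K)    # cells by K
V   <- matrix(runif(K * n_genes),   K,   n_genes)  # K by genes
H   <- matrix(runif(K_j * n_genes), K_j, n_genes)  # K_j by genes

# Assumed additive form: condition-specific part plus common part.
X_hat <- W_1 %*% H + W_2 %*% V
dim(X_hat)  # cells by genes
```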
run_scINSIGHT( object, K = seq(5, 15, 2), K_j = 2, LDA = c(0.001, 0.01, 0.1, 1, 10), thre.niter = 500, thre.delta = 0.01, num.cores = 1, B = 5, out.dir = NULL, method = "increase" )
object |
An scINSIGHT object. |
K |
Number of common gene modules. (default seq(5, 15, 2)) |
K_j |
Number of condition-specific gene modules. (default 2) |
LDA |
Regularization parameters. (default c(0.001, 0.01, 0.1, 1, 10)) |
thre.niter |
Maximum number of block coordinate descent iterations to perform. (default 500) |
thre.delta |
Stop iteration when the reduction of objective function is less than the threshold. (default 0.01) |
num.cores |
Number of cores used for optimizing factorizations in parallel (default 1). |
B |
Number of repeats with random seed from 1 to B. (default 5) |
out.dir |
Output directory of scINSIGHT results. (default NULL) |
method |
Method of updating the factorization (default "increase"). This applies when multiple values of K are provided. For "increase", the algorithm first performs factorization with the smallest K; the alternative method first performs factorization with the largest K. |
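The interplay of `thre.niter` and `thre.delta` can be pictured with a toy loop: iterate until the reduction of the objective falls below the threshold, or until the iteration cap is reached. The sketch below uses plain gradient steps on a quadratic objective as a stand-in; it is not the package's block coordinate descent, and `iterate_until_converged`, `target`, and `step` are hypothetical names introduced for illustration.

```r
# Toy objective: f(x) = sum((x - target)^2), minimized by gradient steps.
# This stands in for the real objective; it is NOT the scINSIGHT algorithm.
iterate_until_converged <- function(x, target, thre.niter = 500,
                                    thre.delta = 0.01, step = 0.1) {
  obj <- sum((x - target)^2)
  for (iter in seq_len(thre.niter)) {
    x <- x - step * 2 * (x - target)   # one gradient step
    new_obj <- sum((x - target)^2)
    delta <- obj - new_obj             # reduction of the objective
    obj <- new_obj
    if (delta < thre.delta) break      # stop once the reduction is small
  }
  list(x = x, objective = obj, niter = iter)
}

res <- iterate_until_converged(x = c(10, -4), target = c(1, 2))
res$niter
res$objective
```

A smaller `thre.delta` lets the loop run longer before stopping, while `thre.niter` bounds the run time regardless of convergence.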
scINSIGHT
object with W_1, W_2, H, V, and parameters slots set.
The scINSIGHT object is created from two or more single-cell datasets. To construct an scINSIGHT object, the user needs to provide at least two normalized expression (or another single-cell modality) matrices and the condition vector.
The key slots used in the scINSIGHT object are described below.
norm.data
List of normalized expression matrices (genes by cells). Each matrix should contain the same genes, with identical gene names.
condition
Vector specifying each sample's condition name.
W_1
List of W_1 matrices estimated by scINSIGHT; names correspond to sample names.
W_2
List of W_2 matrices estimated by scINSIGHT; names correspond to sample names.
H
List of H matrices estimated by scINSIGHT; names correspond to condition names.
V
Matrix V estimated by scINSIGHT.
norm.W_2
List of W_2 matrices after normalization. Recommended for downstream analysis.
clusters
List of cluster results.
parameters
List of selected parameters, including K and the regularization parameter.
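As an illustration of downstream use, per-sample matrices shaped like the `norm.W_2` slot (cells by K) can be stacked and clustered jointly. The sketch below uses random mock matrices and base R's `kmeans` as a generic stand-in; the method behind the `clusters` slot may differ, and the object names here are hypothetical.

```r
set.seed(1)
K <- 5
# Mock stand-ins for the norm.W_2 slot: one cells-by-K matrix per sample.
norm_W_2 <- list(
  sample1 = matrix(runif(100 * K), 100, K),
  sample2 = matrix(runif(120 * K), 120, K)
)

# Stack all cells into one matrix and remember each cell's sample.
combined <- do.call(rbind, norm_W_2)
sample_of_cell <- rep(names(norm_W_2),
                      times = vapply(norm_W_2, nrow, integer(1)))

# Generic k-means clustering (an assumption, not the package's method).
km <- kmeans(combined, centers = 3, nstart = 5)
table(sample_of_cell, km$cluster)
```

With real results, the mock list would be replaced by the matrices stored in the object's `norm.W_2` slot.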