Package 'scINSIGHT'

Title: Interpretation of Heterogeneous Single-Cell Gene Expression Data
Description: We develop a novel matrix factorization tool named 'scINSIGHT' to jointly analyze multiple single-cell gene expression samples from biologically heterogeneous sources, such as different disease phases, treatment groups, or developmental stages. Given multiple gene expression samples from different biological conditions, 'scINSIGHT' simultaneously identifies common and condition-specific gene modules and quantifies their expression levels in each sample in a lower-dimensional space. With the factorized results, the inferred expression levels and memberships of common gene modules can be used to cluster cells and detect cell identities, and the condition-specific gene modules can help compare functional differences in transcriptomes from distinct conditions. Please also see Qian K, Fu SW, Li HW, Li WV (2022) <doi:10.1186/s13059-022-02649-3>.
Authors: Kun Qian [aut, ctb, cre], Wei Vivian Li [aut, ctb]
Maintainer: Kun Qian <[email protected]>
License: GPL-3
Version: 0.1.4
Built: 2024-12-11 07:01:49 UTC
Source: CRAN

Create an scINSIGHT object.

Description

This function initializes an scINSIGHT object with normalized data passed in.

Usage

create_scINSIGHT(norm.data, condition)

Arguments

norm.data

List of normalized expression matrices (genes by cells). Gene names should be the same in all matrices.

condition

Vector specifying sample conditions.

Value

scINSIGHT object with norm.data slot set.

Examples

# Demonstration using matrices with randomly generated numbers
S1 <- matrix(runif(50000,0,2), 500,100)
S2 <- matrix(runif(60000,0,2), 500,120)
S3 <- matrix(runif(80000,0,2), 500,160)
S4 <- matrix(runif(75000,0,2), 500,150)
data <- list(S1, S2, S3, S4)
sample <- c("sample1", "sample2", "sample3", "sample4")
condition <- c("control", "activation", "control", "activation")
names(data) <- sample
names(condition) <- sample
scINSIGHTx <- create_scINSIGHT(data, condition)
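
The argument descriptions above require the matrices to share gene names and imply one condition label per sample. A minimal sketch of a check that could be run before calling create_scINSIGHT follows. `check_scINSIGHT_inputs` is a hypothetical helper written for illustration; it is not part of the package, and the toy inputs are made up.

```r
# Hypothetical helper (not part of scINSIGHT): sanity-check inputs before
# passing them to create_scINSIGHT().
check_scINSIGHT_inputs <- function(norm.data, condition) {
  stopifnot(is.list(norm.data), length(norm.data) >= 2)
  # One condition label per sample
  if (length(condition) != length(norm.data)) {
    stop("condition must have one entry per sample")
  }
  # Every matrix must have the same genes (rows), in the same order
  ref <- norm.data[[1]]
  for (m in norm.data) {
    if (nrow(m) != nrow(ref) || !identical(rownames(m), rownames(ref))) {
      stop("all matrices must have the same genes in the same order")
    }
  }
  # If both inputs are named, the names should refer to the same samples
  if (!is.null(names(norm.data)) && !is.null(names(condition)) &&
      !identical(names(norm.data), names(condition))) {
    stop("names of norm.data and condition do not match")
  }
  invisible(TRUE)
}

# Small toy inputs: 20 genes, two samples with 5 and 8 cells
genes <- paste0("gene", 1:20)
A <- matrix(runif(100, 0, 2), 20, 5, dimnames = list(genes, NULL))
B <- matrix(runif(160, 0, 2), 20, 8, dimnames = list(genes, NULL))
toy_data <- list(sample1 = A, sample2 = B)
toy_condition <- c(sample1 = "control", sample2 = "activation")
ok <- check_scINSIGHT_inputs(toy_data, toy_condition)
```

The helper stops with an error message on the first violated requirement and returns TRUE invisibly otherwise, so it can be placed directly in front of the constructor call.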

Perform scINSIGHT on normalized datasets

Description

Perform INterpreting single cell gene expresSIon bioloGically Heterogeneous daTa (scINSIGHT) to return factorized $W_{\ell 1}$, $W_{\ell 2}$, $H$ and $V$ matrices.

For each sample, this factorization produces a $W_{\ell 1}$ matrix (cells by $K_j$) and a $W_{\ell 2}$ matrix (cells by $K$). It also produces an $H$ matrix ($K_j$ by genes) for each condition and a single $V$ matrix ($K$ by genes) shared by all samples. The $W_{\ell 2}$ matrices hold the expression levels of the $K$ common gene modules in each sample, and $V$ is the membership matrix of those common gene modules. The $W_{\ell 1}$ matrices hold the expression levels of the $K_j$ condition-specific gene modules in each sample, and the $H$ matrices are the membership matrices of the $K_j$ condition-specific gene modules, one per condition.
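
The matrix dimensions stated above can be checked with a small base-R sketch. It assumes, following the cited paper rather than this help page, that each sample is approximated by the sum of a condition-specific product and a common product. All sizes and variable names are made up for illustration, and the reconstructed matrix is cells by genes, the transpose of the norm.data layout.

```r
# Illustrative sizes (assumptions; only K_j = 2 matches a package default)
n_cells <- 30; n_genes <- 50; K <- 5; K_j <- 2

# Nonnegative random factors with the shapes described above
W1 <- matrix(runif(n_cells * K_j), n_cells, K_j)  # cells by K_j, one per sample
W2 <- matrix(runif(n_cells * K),   n_cells, K)    # cells by K,   one per sample
H  <- matrix(runif(K_j * n_genes), K_j, n_genes)  # K_j by genes, one per condition
V  <- matrix(runif(K * n_genes),   K,   n_genes)  # K by genes,   shared by all samples

# Assumed additive form: condition-specific part plus common part
X_hat <- W1 %*% H + W2 %*% V
dim(X_hat)  # cells by genes
```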

Usage

run_scINSIGHT(
  object,
  K = seq(5, 15, 2),
  K_j = 2,
  LDA = c(0.001, 0.01, 0.1, 1, 10),
  thre.niter = 500,
  thre.delta = 0.01,
  num.cores = 1,
  B = 5,
  out.dir = NULL,
  method = "increase"
)

Arguments

object

scINSIGHT object.

K

Number of common gene modules. (default c(5, 7, 9, 11, 13, 15))

K_j

Number of condition-specific gene modules. (default 2)

LDA

Regularization parameters. (default c(0.001, 0.01, 0.1, 1, 10))

thre.niter

Maximum number of block coordinate descent iterations to perform. (default 500)

thre.delta

Stop iterating when the reduction in the objective function falls below this threshold. (default 0.01)

num.cores

Number of cores used for optimizing factorizations in parallel. (default 1)

B

Number of repeated runs, using random seeds 1 to B. (default 5)

out.dir

Output directory of scINSIGHT results. (default NULL)

method

Method of updating the factorization (default "increase"). If multiple values of $K$ are provided, the user can choose between "increase" and "decrease".

For "increase", the algorithm first performs the factorization with the smallest value, $K = K_1$. It then initializes $K_2 - K_1$ new factors, where $K_2$ is the next larger value of $K$, and performs the factorization with these new factors added. This process continues until the largest $K$ is reached.

For "decrease", the algorithm first performs the factorization with the largest value, $K = K_1$. It then chooses $K_2$ factors, where $K_2$ is the next smaller value of $K$, and performs the factorization with these chosen factors. This process continues until the smallest $K$ is reached.
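
The order in which candidate values of K are visited under the two methods can be sketched in base R. `k_schedule` is a hypothetical helper that only lists the sequence of K values and the change in the number of factors at each step; it does not reproduce how the package initializes or chooses factors.

```r
# Hypothetical helper (not part of scINSIGHT): list the order in which the
# candidate K values would be visited, and the step between consecutive values.
k_schedule <- function(K, method = c("increase", "decrease")) {
  method <- match.arg(method)
  K <- sort(unique(K), decreasing = (method == "decrease"))
  data.frame(
    step = seq_along(K),
    K = K,
    # Factors added (positive) or dropped (negative) relative to the previous K
    change = c(NA, diff(K))
  )
}

inc <- k_schedule(seq(5, 15, 2), "increase")
dec <- k_schedule(seq(5, 15, 2), "decrease")
inc$K  # 5 7 9 11 13 15
dec$K  # 15 13 11 9 7 5
```

With the default K = seq(5, 15, 2), each step under "increase" adds two factors and each step under "decrease" drops two.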

Value

scINSIGHT object with W_1, W_2, H, V and parameters slots set.


The scINSIGHT Class

Description

The scINSIGHT object is created from two or more single-cell datasets. To construct an scINSIGHT object, the user needs to provide at least two normalized expression (or another single-cell modality) matrices and the condition vector.

Details

The key slots used in the scINSIGHT object are described below.

Slots

norm.data

List of normalized expression matrices (genes by cells). Each matrix should have the same number of genes and the same gene names.

condition

Vector specifying each sample's condition name.

W_1

List of $W_{\ell 1}$ estimated by scINSIGHT; names correspond to sample names.

W_2

List of $W_{\ell 2}$ estimated by scINSIGHT; names correspond to sample names.

H

List of $H$ estimated by scINSIGHT; names correspond to condition names.

V

Matrix $V$ estimated by scINSIGHT.

norm.W_2

List of $W_{\ell 2}$ after normalization. Recommended for downstream analysis.

clusters

List of cluster results.

parameters

List of selected parameters, including $K$ and $\lambda$.
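
Because the norm.W_2 slot is recommended for downstream analysis, one plausible use is to stack the per-sample matrices and cluster the cells. The sketch below uses simulated matrices in place of a fitted object, and k-means is a stand-in chosen for illustration, not necessarily the clustering the package performs. All variable names are hypothetical. With a fitted object, the list would presumably be read from the slot with the standard S4 accessor, for example `object@norm.W_2`.

```r
set.seed(1)
K <- 5  # number of common gene modules (illustrative)

# Simulated stand-in for the norm.W_2 slot: one cells-by-K matrix per sample
norm_W2 <- list(
  sample1 = matrix(runif(40 * K), 40, K),
  sample2 = matrix(runif(60 * K), 60, K)
)

# Stack all cells into one matrix and remember which sample each came from
W2_all <- do.call(rbind, norm_W2)
sample_id <- rep(names(norm_W2), times = vapply(norm_W2, nrow, integer(1)))

# Cluster cells on their common-module expression (k-means is an assumption)
km <- kmeans(W2_all, centers = 3, nstart = 10)
table(sample_id, km$cluster)
```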