Title: | Identifying Modulators of Cellular Responses Leveraging Intercellular Heterogeneity |
---|---|
Description: | Cellular responses to perturbations are highly heterogeneous and depend largely on the initial state of cells. Connecting post-perturbation cells via cellular trajectories to untreated cells (e.g. by leveraging metabolic labeling information) enables exploitation of intercellular heterogeneity as a combined knock-down and overexpression screen to identify pathway modulators, termed Heterogeneity-seq (see 'Berg et al' <doi:10.1101/2024.10.28.620481>). This package contains functions to generate cellular trajectories based on scSLAM-seq (single-cell, thiol-(SH)-linked alkylation of RNA for metabolic labelling sequencing) time courses, functions to identify pathway modulators and to visualize the results. |
Authors: | Kevin Berg [aut, cre], Florian Erhard [aut] |
Maintainer: | Kevin Berg <[email protected]> |
License: | Apache License (>= 2) |
Version: | 0.1.0 |
Built: | 2025-03-05 07:09:26 UTC |
Source: | CRAN |
This function calculates distance matrices between cells of different time points based on metabolic labeling RNA profiles.
distmat(prev.t, next.t, prevAssay, nextAssay, gene_subset = NULL)
prev.t | Cells to be used from the previous time point in distance matrix calculation. |
next.t | Cells to be used from the next time point in distance matrix calculation. |
prevAssay | Name of the expression assay of cells from the previous time point. |
nextAssay | Name of the expression assay of cells from the next time point. |
gene_subset | Set a subset of genes on which trajectories should be calculated. Other genes will be disregarded. |
Distance matrix between cells from two time points.
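As a rough illustration of what a cross-time-point distance matrix looks like, the following base R sketch computes Euclidean distances between every cell of one time point and every cell of the next. The helper `cross_dist` and the toy matrices are hypothetical, and the Euclidean metric is an assumption: the distance measure actually used by distmat is not stated here.

```r
# Hypothetical helper, not the package's distmat(): Euclidean distances between
# every cell of the previous time point (rows of the result) and every cell of
# the next time point (columns of the result).
cross_dist <- function(prev_mat, next_mat, gene_subset = NULL) {
  # Both inputs are genes x cells matrices with gene names as row names.
  genes <- intersect(rownames(prev_mat), rownames(next_mat))
  if (!is.null(gene_subset)) genes <- intersect(genes, gene_subset)
  a <- t(prev_mat[genes, , drop = FALSE])  # cells x genes
  b <- t(next_mat[genes, , drop = FALSE])  # cells x genes
  # ||a_i - b_j||^2 = ||a_i||^2 + ||b_j||^2 - 2 * (a_i . b_j)
  d2 <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * a %*% t(b)
  sqrt(pmax(d2, 0))  # clamp tiny negative values caused by rounding
}

set.seed(1)
prev_mat <- matrix(rpois(20, 5), nrow = 4,
                   dimnames = list(paste0("g", 1:4), paste0("p", 1:5)))
next_mat <- matrix(rpois(12, 5), nrow = 4,
                   dimnames = list(paste0("g", 1:4), paste0("n", 1:3)))
D <- cross_dist(prev_mat, next_mat)
dim(D)  # 5 previous cells x 3 next cells
```

In the package example, the assays "RNA" (previous time point) and "prevRNA" (next time point) are compared; the toy matrices above merely stand in for two such expression matrices.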
# Full vignette available on https://grandr.erhard-lab.de/articles/web/hetseq.html
treatment.list <- SplitObject(seuratObject, split.by = "time")
D.list <- list(
  distmat(treatment.list[["0h"]], treatment.list[["2h"]], "RNA", "prevRNA"),
  distmat(treatment.list[["2h"]], treatment.list[["4h"]], "RNA", "prevRNA")
)
Wrapper function for all Heterogeneity-seq functions.
Hetseq(method = c("test", "classify", "DoubleML"), ...)
method | The method to run Heterogeneity-seq. Calls HetseqTest, HetseqClassify or HetseqDoubleML. |
... | Parameters passed on to the chosen Hetseq method. See the respective help pages. |
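A wrapper of this kind is commonly written with match.arg and switch. The sketch below shows that pattern only; `dispatch_method` and the three `run_*` stand-ins are hypothetical names, and whether Hetseq is implemented this way is an assumption.

```r
# Hypothetical stand-ins for the three method functions; each one only reports
# which branch was taken.
run_test     <- function(...) "test"
run_classify <- function(...) "classify"
run_doubleml <- function(...) "DoubleML"

dispatch_method <- function(method = c("test", "classify", "DoubleML"), ...) {
  # match.arg() picks the first choice when method is missing and raises an
  # error for values outside the allowed set.
  method <- match.arg(method)
  switch(method,
         test     = run_test(...),
         classify = run_classify(...),
         DoubleML = run_doubleml(...))
}

dispatch_method("classify", score.name = "score")
dispatch_method()
```

With this pattern, all remaining arguments travel through `...` unchanged, so each method keeps its own argument list.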
Heterogeneity-seq uses intercellular heterogeneity to identify modulators of cellular response to perturbations. Three approaches to identify these factors are currently available:
- test: Differential Gene Expression testing between cells from two response groups.
- classify: Predicting cellular outcome by expression of single genes (+informative features) reveals genes with strong predictive capabilities (high AUC) as potential pathway modulators.
- DoubleML: A strict predictor that utilizes Causal Inference (DoubleML) to distinguish causal factors from simply correlating genes. Genes with high estimated effects on the outcome and high significance are identified as potential pathway modulators.
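To make the AUC criterion of the classify approach concrete, here is a minimal base R sketch that scores a single gene by how well its raw expression separates two response groups. It uses the rank-sum (Mann-Whitney) identity for the AUC; the helper `auc_rank` and the toy data are hypothetical, and the package itself trains a classifier rather than ranking raw expression.

```r
# AUC of a single numeric predictor via the rank-sum (Mann-Whitney) identity.
auc_rank <- function(score, is_positive) {
  r <- rank(score)                 # average ranks handle ties
  n_pos <- sum(is_positive)
  n_neg <- sum(!is_positive)
  (sum(r[is_positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

expr <- c(0.1, 0.4, 0.35, 0.8, 0.7, 0.2)          # toy expression of one gene
high <- c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE)  # toy response group labels
auc_rank(expr, high)
```

An AUC near 0.5 means the gene carries no information about the response group, while values near 1 (or near 0 for negatively associated genes) indicate strong separation.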
Table of Heterogeneity-seq results.
# Full vignette available on https://grandr.erhard-lab.de/articles/web/hetseq.html
tab <- Hetseq(method = "classify", data, trajectories, score.name = "score")
Classifying the cellular response of control cells using single gene expression (+ informative features) to identify features with the strongest predictive capabilities.
HetseqClassify( object, trajectories, score.group = NULL, score.name = NULL, quantiles = c(0.25, 0.75), compareGroups = c("Low", "High"), posClass = NULL, basefeatures = NULL, genes = NULL, assay = NULL, split = NULL, kernel = "radial", cross = 10, num_cores = 1 )
object | Seurat object. |
trajectories | Matrix of cell-cell trajectories. Columns represent time points, rows represent trajectories of connected cells over time points. |
score.group | A named vector of response groups. Names represent cells; values represent the score groups. If score.group is not set, the score.name and quantiles parameters must be set to define the score groups. |
score.name | The name of a numeric Seurat meta data column, which will be used to calculate score groups. Only used if no score.group is given. |
quantiles | Thresholds of the score.name meta data to define 3 response groups: Low, Middle, High. |
compareGroups | Which score groups to test. Default: Low vs. High. |
posClass | Define the positive class for classification. |
basefeatures | Additional informative features to include in the classification. Must be meta data available in the Seurat object. |
genes | Vector of genes to test. |
assay | The name of the Seurat assay to perform Heterogeneity-seq on. If NULL, the default assay will be used. |
split | Set a training-test data split. Must be in [0,1]. |
kernel | The kernel for the SVM: linear, polynomial, radial or sigmoid. Default: radial. |
cross | Number of cross-validations. |
num_cores | The number of cores used in parallel processing. |
Table of log2FC and AUC values for each gene and an additional AUC value for the baseline features.
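The way score.name and quantiles could translate into the three response groups can be sketched in base R as follows. The toy score vector is made up, and the exact handling of values that fall on a threshold is an assumption.

```r
set.seed(42)
score <- setNames(rnorm(100), paste0("cell", 1:100))  # toy response score

quantiles <- c(0.25, 0.75)
cuts <- quantile(score, probs = quantiles)

# Cells up to the lower threshold are "Low", cells above the upper threshold
# are "High", everything in between is "Middle".
score.group <- setNames(
  as.character(cut(score, breaks = c(-Inf, cuts, Inf),
                   labels = c("Low", "Middle", "High"))),
  names(score)
)

table(score.group)
# Cells entering a Low-vs-High comparison; the Middle group is left out.
keep <- names(score.group)[score.group %in% c("Low", "High")]
```

A named vector built this way has the shape described for score.group: names are cells, values are group labels.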
# Full vignette available on https://grandr.erhard-lab.de/articles/web/hetseq.html
t <- HetseqClassify(data, trajectories, score.name = "score")
t <- HetseqClassify(data, trajectories, score.group = groups, compareGroups = c("Weak", "Strong"))
Classifying the cellular response of control cells using single gene expression (+ informative features) to identify features with the strongest predictive capabilities and applying causal inference by a DoubleML approach.
HetseqDoubleML( object, trajectories, score.group = NULL, score.name = NULL, quantiles = c(0.25, 0.75), compareGroups = c("Low", "High"), posClass = NULL, basefeatures = NULL, genes = NULL, background = NULL, assay = NULL, split = NULL, cross = 10, num_cores = 1 )
object | Seurat object. |
trajectories | Matrix of cell-cell trajectories. Columns represent time points, rows represent trajectories of connected cells over time points. |
score.group | A named vector of response groups. Names represent cells; values represent the score groups. If score.group is not set, the score.name and quantiles parameters must be set to define the score groups. |
score.name | The name of a numeric Seurat meta data column, which will be used to calculate score groups. Only used if no score.group is given. |
quantiles | Thresholds of the score.name meta data to define 3 response groups: Low, Middle, High. |
compareGroups | Which score groups to test. Default: Low vs. High. |
posClass | Define the positive class for classification. |
basefeatures | Additional informative features to include in the classification. Must be meta data available in the Seurat object. |
genes | Vector of genes to test. |
background | A set of genes that will be considered as potential confounding factors in the DoubleML analysis. Must contain all genes set in the genes parameter. By default, all genes are used. |
assay | The name of the Seurat assay to perform Heterogeneity-seq on. If NULL, the default assay will be used. |
split | Set a training-test data split. Must be in [0,1]. |
cross | Number of cross-validations. |
num_cores | The number of cores used in parallel processing. |
Table of log2FC and AUC values for each gene and an additional AUC value for the baseline features.
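The idea behind double machine learning, separating the effect of one gene from correlated background genes, can be illustrated with a partialling-out estimator and cross-fitting. The sketch below is a simplified stand-in on simulated data: it uses plain linear models as nuisance learners and a continuous toy outcome, both of which are assumptions and not the package's implementation.

```r
set.seed(123)
n <- 500
X <- matrix(rnorm(n * 3), ncol = 3)     # toy confounders (background genes)
d <- 0.8 * X[, 1] + rnorm(n)            # toy expression of the gene of interest
y <- 0.5 * d + 1.0 * X[, 1] + rnorm(n)  # toy outcome; true effect of d is 0.5

folds <- sample(rep(1:5, length.out = n))  # 5-fold cross-fitting
res_y <- res_d <- numeric(n)
for (k in 1:5) {
  train <- folds != k
  test  <- folds == k
  # Nuisance models are fitted on the training folds only ...
  fit_y <- lm(y ~ ., data = data.frame(y = y[train], X[train, ]))
  fit_d <- lm(d ~ ., data = data.frame(d = d[train], X[train, ]))
  # ... and residuals are computed on the held-out fold.
  res_y[test] <- y[test] - predict(fit_y, newdata = data.frame(X[test, ]))
  res_d[test] <- d[test] - predict(fit_d, newdata = data.frame(X[test, ]))
}

# Residual-on-residual regression gives the effect estimate.
theta <- sum(res_d * res_y) / sum(res_d^2)
naive <- unname(coef(lm(y ~ d))["d"])  # ignores the confounders
c(naive = naive, partialled_out = theta)
```

In this simulation the naive regression overstates the effect because the first confounder drives both the gene and the outcome, whereas the partialled-out estimate should land close to the simulated effect of 0.5.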
# Full vignette available on https://grandr.erhard-lab.de/articles/web/hetseq.html
t <- HetseqDoubleML(data, trajectories, score.name = "score")
t <- HetseqDoubleML(data, trajectories, score.group = group_vector, compareGroups = c("Weak", "Strong"))
Testing for differential gene expression in predecessors of treated cells.
HetseqTest(mat, A, B)
mat | Gene expression matrix of control cells. |
A | Vector of cells (columns) in positive class. |
B | Vector of cells (columns) in negative class. |
Table of log2FC, p-values and adjusted p-values of differentially expressed genes.
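A table of this shape can be produced with a few lines of base R. The following sketch is a hypothetical stand-in for illustration: the helper `de_test`, the Wilcoxon rank-sum test, the pseudocount, the direction of the fold change (A over B), and the Benjamini-Hochberg correction are all assumptions, since the statistical test used by HetseqTest is not stated here.

```r
# Hypothetical helper, not the package's HetseqTest().
de_test <- function(mat, A, B, pseudocount = 1) {
  # log2 fold change of mean expression, group A over group B
  log2fc <- log2((rowMeans(mat[, A, drop = FALSE]) + pseudocount) /
                 (rowMeans(mat[, B, drop = FALSE]) + pseudocount))
  # One two-sided rank-sum test per gene (row)
  p <- vapply(seq_len(nrow(mat)), function(i) {
    wilcox.test(mat[i, A], mat[i, B], exact = FALSE)$p.value
  }, numeric(1))
  data.frame(gene = rownames(mat), log2FC = unname(log2fc), p.value = p,
             p.adj = p.adjust(p, method = "BH"))
}

set.seed(7)
mat <- matrix(rpois(3 * 20, lambda = 5), nrow = 3,
              dimnames = list(c("geneA", "geneB", "geneC"), paste0("cell", 1:20)))
A <- paste0("cell", 1:10)
B <- paste0("cell", 11:20)
mat["geneA", A] <- mat["geneA", A] + 10  # make geneA higher in group A
de_test(mat, A, B)
```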
Applies Min-Cost-Max-Flow to calculate optimal trajectories from distance matrices.
mincostflow(D.list, verbose = TRUE)
D.list | List of (pruned) distance matrices. Can be generated by the distmat function. |
verbose | Show verbose output. |
Matrix of cell-cell trajectories spanning all given timepoints.
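To show the objective such a trajectory search minimizes, the sketch below finds the one-to-one matching with the smallest total distance for a tiny square distance matrix. It does so by exhaustive search, not by a min-cost-max-flow solver, so it only scales to toy sizes. The helpers `perms` and `best_matching` are hypothetical, and the assumption that every cell is matched exactly once may not hold for the package.

```r
# All permutations of a vector (fine for toy sizes only).
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# Exhaustive search for the one-to-one matching with minimal total distance.
best_matching <- function(D) {
  stopifnot(nrow(D) == ncol(D))
  candidates <- perms(seq_len(ncol(D)))
  cost <- vapply(candidates,
                 function(p) sum(D[cbind(seq_len(nrow(D)), p)]), numeric(1))
  best <- candidates[[which.min(cost)]]
  cbind(from = rownames(D), to = colnames(D)[best])
}

D <- matrix(c(1, 4, 5,
              2, 1, 6,
              7, 3, 1), nrow = 3, byrow = TRUE,
            dimnames = list(paste0("p", 1:3), paste0("n", 1:3)))
best_matching(D)
```

Each row of the result links a cell of the previous time point to one cell of the next, which mirrors the row-per-trajectory layout of the returned matrix for two time points.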
# Full vignette available on https://grandr.erhard-lab.de/articles/web/hetseq.html
mincostflow(D.list)
This plotting function creates AUC scatter plots to visualize classifier results.
PlotClassify( table, highlights = NULL, highlights.color = NULL, auc.cutoff = NULL, plot.baseline = TRUE, density.color = TRUE, density.n = 500, point.scale = 0.5, xlab = "AUC", ylab = bquote(log[2] ~ FC ~ (`0h`)), linetype = "dashed" )
table | Table from Hetseq using the classify method. |
highlights | A vector of genes to highlight in the plot. |
highlights.color | A vector of colors for gene highlights. |
auc.cutoff | Inserts a vertical line at the cutoff AUC value. |
plot.baseline | Inserts a vertical line at the baseline AUC value (= AUC of the classifier with basefeatures but no further gene info). |
density.color | Color data points by density. Default is TRUE. |
density.n | Set granularity of 2d density color. |
point.scale | Set point size. |
xlab | Set label of the x-axis. |
ylab | Set label of the y-axis. |
linetype | Set the linetype of the baseline AUC line. |
ggplot object.
tab <- HetseqClassify(data, trajectories, score.name = "score")
PlotClassify(tab, highlights = c("MYC", "GAPDH", "ISG15"))
This plotting function creates a volcano plot to visualize DoubleML results.
PlotDoubleML( table, highlights = NULL, p.cutoff = 0.05, est.cutoff = NULL, highlights.color = NULL, label.repulsion = 1, density.color = TRUE, density.n = 500, point.scale = 0.5, xlab = "Estimate", ylab = bquote("-" ~ log[10] ~ FDR), linetype = "dashed" )
table | Table from Hetseq using the DoubleML method. |
highlights | A vector of genes to highlight in the plot. |
p.cutoff | Adds a dashed horizontal line at the given adjusted p-value cutoff. |
est.cutoff | Adds two dashed vertical lines (+/-) at the given estimate cutoff. |
highlights.color | A vector of colors for gene highlights. |
label.repulsion | Represents the force parameter of the ggrepel::geom_label_repel() function. Higher values reduce label overlap. |
density.color | Color data points by density. Default is TRUE. |
density.n | Set granularity of 2d density color. |
point.scale | Set point size. |
xlab | Set label of the x-axis. |
ylab | Set label of the y-axis. |
linetype | Set the linetype of the p-value and estimate cutoff lines. |
ggplot object.
tab <- HetseqDoubleML(data, trajectories, score.name = "score")
PlotDoubleML(tab, highlights = c("MYC", "GAPDH", "ISG15"))
Prune trajectories down to top.n candidates to reduce runtime of subsequent mincostflow function.
prune(D.list, top.n = 10)
D.list | List of distance matrices. Can be generated by the distmat function. |
top.n | Prune trajectories to only top.n possible connections to optimize subsequent application of the mincostflow function. |
Pruned distance matrix between cells from multiple timepoints.
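One plausible way to prune a distance matrix is to keep, for every cell of the previous time point, only its top_n closest candidates in the next time point. The sketch below does this in base R. The helper `prune_top_n` is hypothetical, and marking discarded connections with Inf and pruning per row are assumptions about the representation rather than the package's behavior.

```r
# Hypothetical helper, not the package's prune(): for every row (previous
# cell), keep the top_n smallest distances and set all others to Inf.
prune_top_n <- function(D, top_n = 10) {
  t(apply(D, 1, function(d) {
    keep <- order(d)[seq_len(min(top_n, length(d)))]
    d[-keep] <- Inf
    d
  }))
}

D <- matrix(c(1, 4, 5, 9,
              2, 1, 6, 3,
              7, 3, 1, 8), nrow = 3, byrow = TRUE,
            dimnames = list(paste0("p", 1:3), paste0("n", 1:4)))
D.list <- list(D)
pruned <- lapply(D.list, prune_top_n, top_n = 2)
pruned[[1]]
```

Reducing each row to a handful of finite entries shrinks the number of candidate connections a downstream trajectory search has to consider.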
# Full vignette available on https://grandr.erhard-lab.de/articles/web/hetseq.html
prune(D.list, top.n = 10)