Title: | Identification of Cell Types, Inference of Lineage Trees, and Prediction of Noise Dynamics from Single-Cell RNA-Seq Data |
---|---|
Description: | Application of 'RaceID' allows inference of cell types and prediction of lineage trees by the 'StemID2' algorithm (Herman, J.S., Sagar, Grun D. (2018) <DOI:10.1038/nmeth.4662>). 'VarID2' is part of this package and allows quantification of biological gene expression noise at single-cell resolution (Rosales-Alvarez, R.E., Rettkowski, J., Herman, J.S., Dumbovic, G., Cabezas-Wallscheid, N., Grun, D. (2023) <DOI:10.1186/s13059-023-02974-1>). |
Authors: | Dominic Grün [aut, cre] |
Maintainer: | Dominic Grün <[email protected]> |
License: | GPL-3 |
Version: | 0.3.6 |
Built: | 2024-10-27 12:34:50 UTC |
Source: | CRAN |
RaceID is a clustering algorithm for the identification of cell types from single-cell RNA-sequencing data. It was specifically designed for the detection of rare cells which correspond to outliers in conventional clustering methods. The package contains RaceID3, the most recently published version of this algorithm, and StemID2, an algorithm for the identification of lineage trees based on RaceID3 analysis. RaceID3 utilizes single cell expression data, and was designed to work well with quantitative single-cell RNA-seq data incorporating unique molecular identifiers. It requires a gene-by-cell expression matrix as input and produces a clustering partition representing cell types. StemID2 assembles these cell types into a lineage tree. The RaceID package (>= v0.1.4) also contains functions for a VarID analysis. VarID comprises a sensitive clustering method utilizing pruned k-nearest neighbor networks, connecting only cells with links supported by a background model of gene expression. These pruned k-nearest neighbor networks further enable the definition of homogenous neighborhoods for the quantification of local gene expression variability in cell state space.
For details please see vignette.
Dominic Grun, [email protected].
Maintainer: Dominic Grun <[email protected]>
Herman, J.S., Sagar, Grun D. (2018) <DOI:10.1038/nmeth.4662> Rosales-Alvarez, R.E., Rettkowski, J., Herman, J.S., Dumbovic, G., Cabezas-Wallscheid, N., Grun, D. (2023) <DOI:10.1186/s13059-023-02974-1>
This functions generates a barplot of gene expression across all clusters.
barplotgene(object, g, n = NULL, logsc = FALSE)
barplotgene(object, g, n = NULL, logsc = FALSE)
object |
|
g |
Individual gene name or vector with a group of gene names corresponding to a subset of valid row names of the |
n |
String of characters representing the title of the plot. Default is |
logsc |
logical. If |
None
This function returns the base line variability as a function of the
baseLineVar(x, y)
baseLineVar(x, y)
x |
mean expression. The corresponding corrected variance is returned. |
y |
object returned by |
Base line (or background) variability.
y <- noiseBaseFit(intestinalDataSmall,step=.01,thr=.05) x <- apply(intestinalDataSmall,1,mean) baseLineVar(x,y)
y <- noiseBaseFit(intestinalDataSmall,step=.01,thr=.05) x <- apply(intestinalDataSmall,1,mean) baseLineVar(x,y)
This function computes expression z-score between groups of cells from the same cluster residing on different links
branchcells(object, br)
branchcells(object, br)
object |
|
br |
List containing two branches, where each component has to be two valid cluster numbers seperated by a |
A list ot four components:
n |
a vector with the number of significant links for each cluster. |
scl |
a vector with the delta entropy for each cluster. |
k |
a vector with the StemID score for each cluster. |
diffgenes |
a vector with the StemID score for each cluster. |
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc) sc <- comptsne(sc) ltr <- Ltree(sc) ltr <- compentropy(ltr) ltr <- projcells(ltr) ltr <- lineagegraph(ltr) ltr <- comppvalue(ltr) x <- branchcells(ltr,list("1.3","3.6")) head(x$diffgenes$z) plotmap(x$scl) plotdiffgenes(x$diffgenes,names(x$diffgenes$z)[1])
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc) sc <- comptsne(sc) ltr <- Ltree(sc) ltr <- compentropy(ltr) ltr <- projcells(ltr) ltr <- lineagegraph(ltr) ltr <- comppvalue(ltr) x <- branchcells(ltr,list("1.3","3.6")) head(x$diffgenes$z) plotmap(x$scl) plotdiffgenes(x$diffgenes,names(x$diffgenes$z)[1])
This function calculates an aggregated dispersion parameter comprising global cell-to-cell variability of transcript counts and biological variability.
calcAlphaG(noise)
calcAlphaG(noise)
noise |
List of noise parameters returned by |
Matrix of aggregated dispersion parameters.
This function calculates the total variance from a local VarID fit.
calcVar(w)
calcVar(w)
w |
List object returned by |
Vector of total variance estimates.
This function calculates a total variance fit comprising sampling noise, global cell-to-cell variability of transcript counts, and biological variability.
calcVarFit(noise, norm = FALSE)
calcVarFit(noise, norm = FALSE)
noise |
List of noise parameters returned by |
norm |
Logical. If |
Matrix of total variance fits.
This dataset contains official gene symbols for markers of the S phase and G2/M phase of the cell cycle in mouse.
cc_genes
cc_genes
A list of two components with S phase marker (s) and G2M phase marker (g2m) gene symbols.
None
This functions performs dimensional reduction by PCA or ICA and removes components enriched for particular gene sets, e.g. cell cycle related genes genes associated with technical batch effects.
CCcorrect( object, vset = NULL, CGenes = NULL, ccor = 0.4, pvalue = 0.01, quant = 0.01, nComp = NULL, dimR = FALSE, mode = "pca", logscale = FALSE, FSelect = TRUE )
CCcorrect( object, vset = NULL, CGenes = NULL, ccor = 0.4, pvalue = 0.01, quant = 0.01, nComp = NULL, dimR = FALSE, mode = "pca", logscale = FALSE, FSelect = TRUE )
object |
|
vset |
List of vectors with genes sets. The loadings of each component are tested for enrichment in any of these gene sets and if the lower |
CGenes |
Vector of gene names. If this argument is given, gene sets to be tested for enrichment in PCA- or ICA-components are defined by all genes with a Pearson's correlation of |
ccor |
Positive number between 0 and 1. Correlation threshold used to detrmine correlating gene sets for all genes in |
pvalue |
Positive number between 0 and 1. P-value cutoff for determining enriched components. See |
quant |
Positive number between 0 and 1. Upper and lower fraction of gene loadings used for determining enriched components. See |
nComp |
Number of PCA- or ICA-components to use. Default is |
dimR |
logical. If |
mode |
|
logscale |
logical. If |
FSelect |
logical. If |
The function returns an updated SCseq
object with the principal or independent component matrix written to the slot dimRed$x
of the SCseq
object. Additional information on the PCA or ICA is stored in slot dimRed
.
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- CCcorrect(sc,dimR=TRUE,nComp=3)
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- CCcorrect(sc,dimR=TRUE,nComp=3)
This function extracts a vector of cells on a given differentiation trajectory in pseudo-temporal order determined from the projection coordinates.
cellsfromtree(object, z)
cellsfromtree(object, z)
object |
|
z |
Vector of valid cluster numbers ordered along the trajectory. |
A list ot four components:
f |
a vector of cells ids ordered along the trajectory defined by |
g |
a vector of integer number. Number |
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc) sc <- comptsne(sc) ltr <- Ltree(sc) ltr <- compentropy(ltr) ltr <- projcells(ltr) ltr <- lineagegraph(ltr) ltr <- comppvalue(ltr) x <- cellsfromtree(ltr,c(1,3,6,2))
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc) sc <- comptsne(sc) ltr <- Ltree(sc) ltr <- compentropy(ltr) ltr <- projcells(ltr) ltr <- lineagegraph(ltr) ltr <- comppvalue(ltr) x <- cellsfromtree(ltr,c(1,3,6,2))
This function compares the neighborhood of a cell with the neighorhoods of all of its k nearest neighors and prunes links to neighbors that do not co-occur in a defined minimum number of neighborhoods by setting their link p-value (entry in pvM
data.frame of res
input object) to 0.
cleanNN(res, minN = 2, no_cores = NULL)
cleanNN(res, minN = 2, no_cores = NULL)
res |
List object with k nearest neighbour information returned by |
minN |
Positive integer number. Minimum of neighborhoods across the k nearest neighbours of a cell expected to share a neighbor with the cell. Default is 2. |
no_cores |
Positive integer number. Number of cores for multithreading. If set to |
A res
object with update pvalue entries (pvM
element).
This functions computes differentially expressed genes in a (set of) cluster(s) by comparing to all remaining cells outside of the cluster (or a given background set of clusters) based on a negative binomial model of gene expression
clustdiffgenes(object, cl, bgr = NULL, pvalue = 0.01)
clustdiffgenes(object, cl, bgr = NULL, pvalue = 0.01)
object |
|
cl |
A valid set of cluster numbers from the final cluster partition stored in the |
bgr |
Ordered vector of cluster numbers to be used as background set. If |
pvalue |
Positive real number smaller than one. This is the p-value cutoff for the inference of differential gene expression. Default is 0.01. |
A list of two components. The first component dg
contains a a data.frame of differentially expressed genes ordered by p-value in increasing order, with four columns:
mean.ncl |
mean expression across cells outside of cluster |
mean.cl |
mean expression across cells within cluster |
fc |
fold-change of mean expression in cluster |
pv |
inferred p-value for differential expression. |
padj |
Benjamini-Hochberg corrected FDR. |
The second component de
contains the conventional output of diffexpnb
, where set B corresponds to all clusters in cl
and B to the background set (all clusters in bgr
or not in cl
). This component can be used for plotting by plotdiffgenesnb
.
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc) x <- clustdiffgenes(sc,1) head(x$dg[x$dg$fc>1,])
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc) x <- clustdiffgenes(sc,1) head(x$dg[x$dg$fc>1,])
This functions performs the initial clustering of the RaceID3 algorithm.
clustexp( object, sat = TRUE, samp = NULL, cln = NULL, clustnr = 30, bootnr = 50, rseed = 17000, FUNcluster = "kmedoids", verbose = TRUE )
clustexp( object, sat = TRUE, samp = NULL, cln = NULL, clustnr = 30, bootnr = 50, rseed = 17000, FUNcluster = "kmedoids", verbose = TRUE )
object |
|
sat |
logical. If |
samp |
Number of random sample of cells used for the inference of cluster number and for inferring Jaccard similarities. Default is 1000. |
cln |
Number of clusters to be used. Default is |
clustnr |
Maximum number of clusters for the derivation of the cluster number by the saturation of mean within-cluster-dispersion. Default is 30. |
bootnr |
Number of booststrapping runs for |
rseed |
Integer number. Random seed to enforce reproducible clustering results. Default is 17000. |
FUNcluster |
Clustering method used by RaceID3. One of |
verbose |
logical. If |
SCseq
object with clustering data stored in slot cluster
and slot clusterpar
. The clustering partition is stored in
cluster$kpart
.
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc)
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc)
This functions plots a heatmap of the distance matrix grouped by clusters.
clustheatmap(object, final = TRUE, hmethod = "single")
clustheatmap(object, final = TRUE, hmethod = "single")
object |
|
final |
logical. If |
hmethod |
Agglomeration method used for determining the cluster order from hierarchical clustering of the cluster medoids. See |
Returns a vector of cluster numbers ordered as determined by herarchical clustering of cluster the cluster medoids as depicted in the heatmap.
This functions computes the distance matrix used for cell type inference by RaceID3.
compdist( object, metric = "pearson", FSelect = TRUE, knn = NULL, alpha = 1, no_cores = 1 )
compdist( object, metric = "pearson", FSelect = TRUE, knn = NULL, alpha = 1, no_cores = 1 )
object |
|
metric |
Distances are computed from the filtered expression matrix after optional feature selection, dimensional reduction, and/or transformation (batch correction).
Possible values for |
FSelect |
Logical parameter. If |
knn |
Positive integer number of nearest neighbours used for imputing gene expression values. Default is |
alpha |
Positive real number. Relative weight of a cell versus its k nearest neigbour applied for imputing gene expression. A cell receives a weight of |
no_cores |
Positive integer number. Number of cores for multithreading during imputation. If set to |
SCseq
object with the distance matrix in slot distances
. If FSelect=TRUE
, the genes used for computing the distance object are stored in
slot cluster$features
.
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc)
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc)
This function computes the transcriptome entropy for each cell.
compentropy(object)
compentropy(object)
object |
|
An Ltree class object with a vector of entropies for each cell in the same order as column names in slot sc@ndata.
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc) sc <- comptsne(sc) ltr <- Ltree(sc) ltr <- compentropy(ltr)
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc) sc <- comptsne(sc) ltr <- Ltree(sc) ltr <- compentropy(ltr)
This functions performs the computation of a Fruchterman-Rheingold graph layout based on an adjacency matrix derived
from the distance object in slot distances
using the igraph package.
compfr(object, knn = 10, rseed = 15555)
compfr(object, knn = 10, rseed = 15555)
object |
|
knn |
Positive integer number of nearest neighbours used for the inference of the Fruchterman-Rheingold layout. Default is |
rseed |
Integer number. Random seed to enforce reproducible layouts. |
SCseq
object with layout coordinates stored in slot fr
.
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc) sc <- compfr(sc)
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc) sc <- compfr(sc)
This function performs computation of locally averaged gene expression across the pruned k nearest neighbours at given link probability cutoff.
compMean( x, res, pvalue = 0.01, genes = NULL, regNB = FALSE, batch = NULL, regVar = NULL, offsetModel = TRUE, thetaML = FALSE, theta = 10, ngenes = NULL, span = 0.75, no_cores = NULL, seed = 12345 )
compMean( x, res, pvalue = 0.01, genes = NULL, regNB = FALSE, batch = NULL, regVar = NULL, offsetModel = TRUE, thetaML = FALSE, theta = 10, ngenes = NULL, span = 0.75, no_cores = NULL, seed = 12345 )
x |
Matrix of gene expression values with genes as rows and cells as columns. The matrix need to contain the same cell IDs as columns like the input matrix used to derive the pruned k nearest neighbours with the |
res |
List object with k nearest neighbour information returned by |
pvalue |
Positive real number between 0 and 1. All nearest neighbours with link probability |
genes |
Vector of gene names corresponding to a subset of rownames of |
regNB |
logical. If |
batch |
vector of batch variables. Component names need to correspond to valid cell IDs, i.e. column names of |
regVar |
data.frame with additional variables to be regressed out simultaneously with the log UMI count and the batch variable (if |
offsetModel |
Logical parameter. Only considered if |
thetaML |
Logical parameter. Only considered if |
theta |
Positive real number. Fixed value of the dispersion parameter. Only considered if |
ngenes |
Positive integer number. Randomly sampled number of genes (from rownames of |
span |
Positive real number. Parameter for loess-regression (see |
no_cores |
Positive integer number. Number of cores for multithreading. If set to |
seed |
Integer number. Random number to initialize stochastic routines. Default is 12345. |
List object of three components:
mean |
matrix with local gene expression averages, computed from Pearson residuals (if |
regData |
If |
res <- pruneKnn(intestinalDataSmall,knn=10,alpha=1,no_cores=1,FSelect=FALSE) mexp <- compMean(intestinalDataSmall,res,pvalue=0.01,genes = NULL,no_cores=1)
res <- pruneKnn(intestinalDataSmall,knn=10,alpha=1,no_cores=1,FSelect=FALSE) mexp <- compMean(intestinalDataSmall,res,pvalue=0.01,genes = NULL,no_cores=1)
This functions computes cluster medoids given an SCseq
object and a clustering partition. The medoids are either derived from the
distance matrix or, if the slot distances
is empty, from the dimensionally reduced feature matrix in slot dimRed$x
using the euclidean metric.
compmedoids(object, part)
compmedoids(object, part)
object |
|
part |
Clustering partition. A vector of cluster numbers for (a subset of) cells (i.e. column names)
of slot |
Returns a list of medoids (column names of slot ndata
from the SCseq
object) ordered by increasing cluster number.
This function performs computation of the local gene expression variability across the pruned k nearest neighbours at given link probability cutoff. The estimated variance is corrected for the mean dependence utilizing the baseline model of gene expression variance.
compNoise( x, res, pvalue = 0.01, genes = NULL, regNB = FALSE, batch = NULL, regVar = NULL, offsetModel = TRUE, thetaML = FALSE, theta = 10, ngenes = NULL, span = 0.75, step = 0.01, thr = 0.05, no_cores = NULL, seed = 12345 )
compNoise( x, res, pvalue = 0.01, genes = NULL, regNB = FALSE, batch = NULL, regVar = NULL, offsetModel = TRUE, thetaML = FALSE, theta = 10, ngenes = NULL, span = 0.75, step = 0.01, thr = 0.05, no_cores = NULL, seed = 12345 )
x |
Matrix of gene expression values with genes as rows and cells as columns. The matrix need to contain the same cell IDs as columns like the input matrix used to derive the pruned k nearest neighbours with the |
res |
List object with k nearest neighbour information returned by |
pvalue |
Positive real number between 0 and 1. All nearest neighbours with link probability |
genes |
Vector of gene names corresponding to a subset of rownames of |
regNB |
logical. If |
batch |
vector of batch variables. Component names need to correspond to valid cell IDs, i.e. column names of |
regVar |
data.frame with additional variables to be regressed out simultaneously with the log UMI count and the batch variable (if |
offsetModel |
Logical parameter. Only considered if |
thetaML |
Logical parameter. Only considered if |
theta |
Positive real number. Fixed value of the dispersion parameter. Only considered if |
ngenes |
Positive integer number. Randomly sampled number of genes (from rownames of |
span |
Positive real number. Parameter for loess-regression (see |
step |
Positive real number between 0 and 1. See function |
thr |
Positive real number between 0 and 1. See function |
no_cores |
Positive integer number. Number of cores for multithreading. If set to |
seed |
Integer number. Random number to initialize stochastic routines. Default is 12345. |
List object of three components:
model |
the baseline noise model as computed by the |
data |
matrix with local gene expression variability estimates, corrected for the mean dependence. |
regData |
If |
res <- pruneKnn(intestinalDataSmall,knn=10,alpha=1,no_cores=1,FSelect=FALSE) noise <- compNoise(intestinalDataSmall,res,pvalue=0.01,genes = NULL,no_cores=1)
res <- pruneKnn(intestinalDataSmall,knn=10,alpha=1,no_cores=1,FSelect=FALSE) noise <- compNoise(intestinalDataSmall,res,pvalue=0.01,genes = NULL,no_cores=1)
This function computes a p-value for the significance (i.e. over-representation of assigned cells) of each inter-cluster link.
comppvalue(object, pthr = 0.01, sensitive = FALSE)
comppvalue(object, pthr = 0.01, sensitive = FALSE)
object |
|
pthr |
p-value cutoff for link significance. This threshold is applied for the calculation of link scores reflecting how uniformly a link is occupied by cells. |
sensitive |
logical. Only relevant when |
An Ltree class object with link p-value and occupancy data stored in slot cdata
.
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc) sc <- comptsne(sc) ltr <- Ltree(sc) ltr <- compentropy(ltr) ltr <- projcells(ltr) ltr <- lineagegraph(ltr) ltr <- comppvalue(ltr)
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc) sc <- comptsne(sc) ltr <- Ltree(sc) ltr <- compentropy(ltr) ltr <- projcells(ltr) ltr <- lineagegraph(ltr) ltr <- comppvalue(ltr)
This function extracts the number of links connecting a given cluster to other cluster, the delta median entropy of each cluster (median entropy of a cluster after subtracting the minimum median entropy across all clusters), and the StemID2 score which is the product of both quantities for each cluster.
compscore(object, nn = 1, scthr = 0, show = TRUE)
compscore(object, nn = 1, scthr = 0, show = TRUE)
object |
|
nn |
Positive integer number. Number of higher order neighbors to be included for the determination of links: indirect connections via |
scthr |
Real number between zero and one. Score threshold for links to be included in the calculation. For |
show |
logical. If |
A list ot three components:
links |
a vector with the number of significant links for each cluster. |
entropy |
a vector with the delta entropy for each cluster. |
StemIDscore |
a vector with the StemID score for each cluster. |
This function fits negative binomial models to transcript counts of pruned k-nearest neighbourhoods inferred by pruneKnn
thereby deconvoluting variability into sampling noise, global cell-to-cell variability of transcript counts, and residual variability, which corresponds to biological noise.
compTBNoise( res, expData, pvalue = 0.01, genes = NULL, minN = 5, no_cores = NULL, gamma = 0.5, x0 = 0, lower = 0, upper = 100 )
compTBNoise( res, expData, pvalue = 0.01, genes = NULL, minN = 5, no_cores = NULL, gamma = 0.5, x0 = 0, lower = 0, upper = 100 )
res |
List object with k nearest neighbour information returned by |
expData |
Matrix of gene expression values with genes as rows and cells as columns. These values have to correspond to unique molecular identifier counts. |
pvalue |
Positive real number between 0 and 1. All nearest neighbours with link probability |
genes |
Vector of gene names corresponding to a subset of rownames of |
minN |
Positive integer number. Noise inference is only done for k-nearest neighbourhoods with at least |
no_cores |
Positive integer number. Number of cores for multithreading. If set to |
gamma |
Positive real number. Scale paramter of the cauchy prior. Default is 0.5. |
x0 |
Real number greater or equal to zero. Location parameter of the cauchy prior. |
lower |
Real number greater or equal to zero. Lower bound for the maximum a posterior inference of the biological noise. Default is 0. |
upper |
Real number greater or equal to zero. Upper bound for the maximum a posterior inference of the biological noise. Default is 100. |
List object of three components:
mu |
Vector of mean expression for all k-nearest neighbourhoods. Componenets are set to |
rt |
Vector of dispersion parameters capturing global cell-to-cell variability of transcript counts for all k-nearest neighbourhoods. Componenets are set to |
epsilon |
Matrix of biological noise estimates for all genes across for all k-nearest neighbourhoods. Componenets are set to |
pars |
List of parameters. |
## Not run: res <- pruneKnn(intestinalDataSmall,knn=10,alpha=1,no_cores=1,FSelect=FALSE) noise <- compTBNoise(res,intestinalDataSmall,pvalue=0.01,genes = NULL,no_cores=1) ## End(Not run)
## Not run: res <- pruneKnn(intestinalDataSmall,knn=10,alpha=1,no_cores=1,FSelect=FALSE) noise <- compTBNoise(res,intestinalDataSmall,pvalue=0.01,genes = NULL,no_cores=1) ## End(Not run)
This functions performs the computation of a t-SNE map from the distance object in slot distances
using the Rtsne package.
comptsne( object, dimRed = FALSE, initial_cmd = TRUE, perplexity = 30, rseed = 15555 )
comptsne( object, dimRed = FALSE, initial_cmd = TRUE, perplexity = 30, rseed = 15555 )
object |
|
dimRed |
logical. If |
initial_cmd |
logical. If |
perplexity |
Positive number. Perplexity of the t-SNE map. Default is |
rseed |
Integer number. Random seed to enforce reproducible t-SNE map. |
SCseq
object with t-SNE coordinates stored in slot tsne
.
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc) sc <- comptsne(sc)
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc) sc <- comptsne(sc)
This functions performs the computation of a two-dimensional umap representation based on the distance matrix in
slot distances
using the umap package.
compumap( object, dimRed = FALSE, n_neighbors = 15, metric = "euclidean", n_epochs = 200, min_dist = 0.1, local_connectivity = 1, spread = 1 )
compumap( object, dimRed = FALSE, n_neighbors = 15, metric = "euclidean", n_epochs = 200, min_dist = 0.1, local_connectivity = 1, spread = 1 )
object |
|
dimRed |
logical. If |
n_neighbors |
Umap parameter. See |
metric |
Umap parameter. See |
n_epochs |
Umap parameter. See |
min_dist |
Umap parameter. See |
local_connectivity |
Umap parameter. See |
spread |
Umap parameter. See |
SCseq
object with umap coordinates stored in slot umap
.
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc) sc <- compumap(sc)
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc) sc <- compumap(sc)
Function for regressing out the mean-variance dependence. This function corrects for the systematic dependence of the variance on the mean by a local regression.
corrVar(m, v, span = 0.75, degree = 2)
corrVar(m, v, span = 0.75, degree = 2)
m |
Vector of mean expression estimates for a set of genes. |
v |
Vector of variance etsimates for a set of genes. |
span |
Parameter for the local regression. See help(loess). Default value is 0.75. |
degree |
Parameter for the local regression. See help(loess). Default value is 2. |
Vector of corrected variance estimates.
This creates an adjacency matrix, keeping only nearest neighbour with a link probability above a minimum probability
createKnnMatrix(res, pvalue = 0.01)
createKnnMatrix(res, pvalue = 0.01)
res |
List object with k nearest neighbour information returned by |
pvalue |
Positive real number between 0 and 1. All nearest neighbours with link probability |
Adjacency matrix in sparse matrix format (see package Matrix) with positive non-zero entries only for k nearest neighours with link probability >= pvalue
. The value of these entries equals the link probability.
res <- pruneKnn(intestinalDataSmall,knn=10,alpha=1,no_cores=1,FSelect=FALSE) y <- createKnnMatrix(res,pvalue=0.01)
res <- pruneKnn(intestinalDataSmall,knn=10,alpha=1,no_cores=1,FSelect=FALSE) y <- createKnnMatrix(res,pvalue=0.01)
This function performs differential expression analysis between two sets of single cell transcriptomes. The inference is based on a noise model or relies on the DESeq2
approach.
diffexpnb( x, A, B, DESeq = FALSE, method = "pooled", norm = FALSE, vfit = NULL, locreg = FALSE, ... )
diffexpnb( x, A, B, DESeq = FALSE, method = "pooled", norm = FALSE, vfit = NULL, locreg = FALSE, ... )
x |
expression data frame with genes as rows and cells as columns. Gene IDs should be given as row names and cell IDs should be given as column names. This can be a reduced expression table only including the features (genes) to be used in the analysis. This input has to be provided if |
A |
vector of cell IDs corresponding column names of |
B |
vector of cell IDs corresponding column names of |
DESeq |
logical value. If |
method |
either "per-condition" or "pooled". If DESeq is not used, this parameter determines, if the noise model is fitted for each set separately ("per-condition") or for the pooled set comprising all cells in |
norm |
logical value. If |
vfit |
function describing the background noise model. Inference of differentially expressed genes can be performed with a user-specified noise model describing the expression variance as a function of the mean expression. Default value is |
locreg |
logical value. If |
... |
additional arguments to be passed to the low level function |
If DESeq
equals TRUE
, the function returns the output of DESeq2. In this case list of the following two components is returned:
cds |
object returned by the DESeq2 function |
res |
data frame containing the results of the DESeq2 analysis. |
Otherwise, a list of three components is returned:
vf1 |
a data frame of three columns, indicating the mean |
vf2 |
a data frame of three columns, indicating the mean |
res |
a data frame with the results of the differential gene expression analysis with the structure of the |
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc) A <- names(sc@cpart)[sc@cpart %in% c(1,2)] B <- names(sc@cpart)[sc@cpart %in% c(3)] y <- diffexpnb(getfdata(sc,n=c(A,B)), A=A, B=B )
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc) A <- names(sc@cpart)[sc@cpart %in% c(1,2)] B <- names(sc@cpart)[sc@cpart %in% c(3)] y <- diffexpnb(getfdata(sc,n=c(A,B)), A=A, B=B )
This functions computes expression differences between clusters and ranks genes by z-score differences.
diffgenes(object, cl1, cl2, mincount = 1)
diffgenes(object, cl1, cl2, mincount = 1)
object |
|
cl1 |
A vector of valid cluster numbers (contained in the |
cl2 |
A vector of valid cluster numbers (contained in the |
mincount |
Minimal normalized expression level of a gene to be included into the analysis. A gene needs to be expressed at this level in at least a single cell. |
A list with four components:
z |
a vector of z-scores in decreasing order with genes up-regulated in |
cl1 |
a |
cl2 |
a |
cl1n |
a vector of cluster numbers for cells in |
cl2n |
a vector of cluster numbers for cells in |
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc) x <- diffgenes(sc,1,2) head(x$z) plotdiffgenes(x,names(x$z)[1])
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc) x <- diffgenes(sc,1,2) head(x$z) plotdiffgenes(x,names(x$z)[1])
This function extracts genes with significantly elevated variability in a cluster on a basis of a Wilcoxon rank sum-test between cells in a cluster and all remaining cells.
diffNoisyGenes(noise, cl, set, bgr = NULL, no_cores = 1)
diffNoisyGenes(noise, cl, set, bgr = NULL, no_cores = 1)
noise |
List object with the background noise model and a variability matrix, returned by the |
cl |
List object with clustering information, returned by the |
set |
Postive integer number or vector of integers corresponding to valid cluster numbers. The function reports genes with elevated variability in all
clusters contained in |
bgr |
Postive integer number or vector of integers corresponding to valid cluster numbers. Background set for comparison. The function reports genes
with elevated variability in all clusters contained in |
no_cores |
Positive integer number. Number of cores for multithreading. If set to |
Data.frame reporting the log2 fold change between clusters in set
and the remaining clusters and the p-value for elevated variability for each genes. Rows are ordered by decreasing log2 fold change.
res <- pruneKnn(intestinalDataSmall,knn=10,alpha=1,no_cores=1,FSelect=FALSE) noise <- compNoise(intestinalDataSmall,res,pvalue=0.01,genes = NULL,no_cores=1) cl <- graphCluster(res,pvalue=0.01) ngenes <- diffNoisyGenes(noise,cl,c(1,2),no_cores=1)
res <- pruneKnn(intestinalDataSmall,knn=10,alpha=1,no_cores=1,FSelect=FALSE) noise <- compNoise(intestinalDataSmall,res,pvalue=0.01,genes = NULL,no_cores=1) cl <- graphCluster(res,pvalue=0.01) ngenes <- diffNoisyGenes(noise,cl,c(1,2),no_cores=1)
This function infers genes with differential biological variability in a cluster versus a background set of clusters on the basis of a Wilcoxon rank sum-test between cells in a cluster and in the background set.
diffNoisyGenesTB( noise, cl, set, bgr = NULL, no_cores = 1, minobs = 5, ps = 0.1, rseed = 17000 )
diffNoisyGenesTB( noise, cl, set, bgr = NULL, no_cores = 1, minobs = 5, ps = 0.1, rseed = 17000 )
noise |
List object with noise parameters returned by the |
cl |
List object with clustering information, returned by the |
set |
Postive integer number or vector of integers corresponding to valid cluster numbers. The function reports genes with differential variability in all
clusters contained in |
bgr |
Postive integer number or vector of integers corresponding to valid cluster numbers. Background set for comparison. The function reports genes
with differential variability in all clusters contained in |
no_cores |
Positive integer number. Number of cores for multithreading. If set to |
minobs |
Positive integer number. Only genes with at least |
ps |
Real number greater or equal to zero. A small random variable sampled from a uniform distribution in the interval |
rseed |
Integer number. Random seed to enforce reproducible results. Default is 17000. |
Data.frame with five columns:
mu.set |
Mean expression across clusters in |
mu.bgr |
Mean expression across clusters in |
mu.all |
Mean expression across clusters in |
eps.set |
Average variability across clusters in |
eps.bgr |
Average variability across clusters in |
eps.all |
Average variability across clusters in |
log2FC |
log2 fold change of variability between between clusters in |
pvalue |
Banjamini-Hochberg corrected Wilcoxon rank sum test p-value for differential variability. |
Rows are ordered by decreasing log2 fold change of variability.
## Not run: res <- pruneKnn(intestinalDataSmall,knn=10,alpha=1,no_cores=1,FSelect=FALSE) noise <- compTBNoise(res,intestinalDataSmall,pvalue=0.01,genes = NULL,no_cores=1) cl <- graphCluster(res,pvalue=0.01) ngenes <- diffNoisyGenesTB(noise,cl,c(1,2),no_cores=1) ## End(Not run)
## Not run: res <- pruneKnn(intestinalDataSmall,knn=10,alpha=1,no_cores=1,FSelect=FALSE) noise <- compTBNoise(res,intestinalDataSmall,pvalue=0.01,genes = NULL,no_cores=1) cl <- graphCluster(res,pvalue=0.01) ngenes <- diffNoisyGenesTB(noise,cl,c(1,2),no_cores=1) ## End(Not run)
This function discards lowly expressed genes from a count matrix stored in an SCseq
object, and returns (normalized or non-normalized) gene expression or noise values.
extractCounts( object, minexpr = 5, minnumber = 5, noise = FALSE, pt = NULL, n = NULL, g = NULL, norm = TRUE )
extractCounts( object, minexpr = 5, minnumber = 5, noise = FALSE, pt = NULL, n = NULL, g = NULL, norm = TRUE )
object |
|
minexpr |
Integer number greater or equal to zero. Minimum expression of a gene in at least |
minnumber |
Integer number greater or equal to zero. Minimum number of cells required to have at least |
noise |
logical. If |
pt |
List object returned by function |
n |
Vector of valid column names corresponding to a subset of valid column names of the |
g |
Vector of gene IDs (valid row names of |
norm |
logical. If |
Filtered expression matrix.
This function allows filtering of genes and cells to be used in the RaceID3 analysis.
It also can perform batch effect correction using an internal method or a recently published alternative mnnCorrect
from the batchelor package.
filterdata( object, mintotal = 3000, minexpr = 5, minnumber = 5, LBatch = NULL, knn = 10, CGenes = NULL, FGenes = NULL, ccor = 0.4, bmode = "RaceID", verbose = TRUE )
filterdata( object, mintotal = 3000, minexpr = 5, minnumber = 5, LBatch = NULL, knn = 10, CGenes = NULL, FGenes = NULL, ccor = 0.4, bmode = "RaceID", verbose = TRUE )
object |
|
mintotal |
minimum total transcript number required. Cells with less than |
minexpr |
minimum required transcript count of a gene in at least |
minnumber |
See |
LBatch |
List of experimental batches used for batch effect correction. Each list element contains a vector with cell names
(i.e. column names of the input expression data) falling into this batch. Default is |
knn |
Number of nearest neighbors used to infer corresponding cell types in different batches. Defult is 10. |
CGenes |
List of gene names. All genes with correlated expression to any of the genes in |
FGenes |
List of gene names to be filtered out for cell type inference. Default is |
ccor |
Correlation coefficient used as a trehshold for determining genes correlated to genes in |
bmode |
Method used for batch effect correction. Any of |
verbose |
logical. If |
An SCseq class object with filtered and normalized expression data.
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc)
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc)
This functions performs the outlier identification based on the clusters infered with the clustexp
function.
findoutliers( object, probthr = 0.001, outminc = 5, outlg = 2, outdistquant = 0.95, verbose = TRUE )
findoutliers( object, probthr = 0.001, outminc = 5, outlg = 2, outdistquant = 0.95, verbose = TRUE )
object |
|
probthr |
outlier probability threshold for a minimum of |
outminc |
minimal transcript count of a gene in a clusters to be tested for being an outlier gene. Default is 5. |
outlg |
Minimum number of outlier genes required for being an outlier cell. Default is 2. |
outdistquant |
Real number between zero and one. Outlier cells are merged to outlier clusters if their distance smaller than the outdistquant-quantile of the distance distribution of pairs of cells in the orginal clusters after outlier removal. Default is 0.95. |
verbose |
logical. If |
SCseq
object with outlier data stored in slot out
and slot outlierpar
. The final clustering partition is stored in
cpart
.
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc)
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc)
This funtion fits a second order polynomial to the variance-mean dependence across all genes in log space.
fitBackVar(x, mthr = -1)
fitBackVar(x, mthr = -1)
x |
Matrix of transcript counts with genes as rows and cells as columns. |
mthr |
Real number. Threshold of log2 mean expression. Genes with mean expression |
List object of four components:
fit |
model fit as returned by the |
genes |
genes with expression variance greater than the polynomial fit. |
m |
mean expression of all genes |
v |
expression variance of all genes |
bg <- fitBackVar(intestinalDataSmall)
bg <- fitBackVar(intestinalDataSmall)
This function fits a Gamma distribution to the total transcript counts distribution across a given group of cells. Total transcript counts are normalized by the mean total transcript count across the group. This function is used to infer a Gamma distribution of the global cell-to-cell variability across pruned nearest neighbourhoods.
fitGammaRt(x)
fitGammaRt(x)
x |
Transcript count matrix with cells as columns and genes as rows. |
Shape parameter of the Gamma distribution. This parameter corresponds to the dispersion explained by the global cell-to-cell variability of UMI counts in a negative binomial model.
Second order polynomial fit of mean-variance dependence This function corrects for the systematic dependence of the variance on the mean by a local regression.
fitLogVarLogMean(x)
fitLogVarLogMean(x)
x |
Matrix of transcript counts with genes as rows and cells as columns. |
Second order polynomial model as obtained by lm
.
This function fits a negative binomial model to transcript counts of a group of cells thereby deconvoluting variability into sampling noise, global cell-to-cell variability of transcript counts, and residual variability, which corresponds to biological noise.
fitNBtb(z, gamma = 2, x0 = 0, lower = 0, upper = 100, grad = TRUE)
fitNBtb(z, gamma = 2, x0 = 0, lower = 0, upper = 100, grad = TRUE)
z |
Transcript count matrix with cells as columns and genes as rows. |
gamma |
Positive real number. Scale paramter of the cauchy prior. Default is 2. |
x0 |
Real number greater or equal to zero. Location parameter of the cauchy prior. |
lower |
Real number greater or equal to zero. Lower bound for the maximum a posterior inference of the biological noise. Default is 0. |
upper |
Real number greater or equal to zero. Upper bound for the maximum a posterior inference of the biological noise. Default is 100. |
grad |
Logical. If |
Data.frame with four columns:
mu |
Mean expression. |
epsilon |
Biological noise. |
rt |
Dispersion parameter capturing global cell-to-cell variability of transcript counts. |
alphaG |
Dispersion parameter capturing global cell-to-cell variability of transcript counts and biological noise. |
This function fits a negative binomial model to transcript counts of a group of cells thereby deconvoluting variability into sampling noise, global cell-to-cell variability of transcript counts, and residual variability, which corresponds to biological noise. Local mean and and global cell-to-cell variability of transcript counts are pre-computed arguments.
fitNBtbCl(z, mu, rt, gamma = 2, x0 = 0.1, lower = 0, upper = 100)
fitNBtbCl(z, mu, rt, gamma = 2, x0 = 0.1, lower = 0, upper = 100)
z |
Transcript count matrix with cells as columns and genes as rows. |
mu |
Vector of mean expression values across cells in |
rt |
Vector of dispersion parameters explaining global cell-to-cell variability of transcript counts across cells in |
gamma |
Positive real number. Scale paramter of the cauchy prior. Default is 2. |
x0 |
Real number greater or equal to zero. Location parameter of the cauchy prior. |
lower |
Real number greater or equal to zero. Lower bound for the maximum a posterior inference of the biological noise. Default is 0. |
upper |
Real number greater or equal to zero. Upper bound for the maximum a posterior inference of the biological noise. Default is 100. |
Vector of biological noise parameters across cells in z
.
This is a plotting function for visualizing gene expression across subsets of clusters or samples. The diameter of a dot reflects the fraction of cells expressing a gene, and the color indicates the expression z-score across all clusters or samples.
fractDotPlot( object, genes, cluster = NULL, samples = NULL, subset = NULL, zsc = FALSE, logscale = TRUE, cap = Inf, flo = -Inf )
fractDotPlot( object, genes, cluster = NULL, samples = NULL, subset = NULL, zsc = FALSE, logscale = TRUE, cap = Inf, flo = -Inf )
object |
|
genes |
vector of valid gene names corresponding to row names of slot |
cluster |
vector of valid cluster numbers contained in slot |
samples |
vector of sample names for all cells. Length and order has to correspond to |
subset |
vector of unique sample names to show in the expression dotplot. Each sample names in |
zsc |
logical. If |
logscale |
logical. If |
cap |
real number. Upper limit for the expression, log2 expression, or z-score. Values larges then |
flo |
real number. Lower limit for the expression, log2 expression, or z-score. Values smaller then |
None
SCseq
objectThis function for extracts a filtered expression matrix from a RaceID SCseq
object. The filterdata
function from
the RaceID package has to be run on the SCseq
object before.
getExpData(object, genes = NULL)
getExpData(object, genes = NULL)
object |
RaceID |
genes |
Vector of valid gene identifiers corresponding to valid rownames of the input expression data. An expression matrix is returned only for these genes.
Default is |
noise Sparse Matrix with genes as rows and cells as columns after filtering.
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) d <- getExpData(sc) res <- pruneKnn(d,distM=sc@distances,knn=10,alpha=1,no_cores=1,FSelect=FALSE)
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) d <- getExpData(sc) res <- pruneKnn(d,distM=sc@distances,knn=10,alpha=1,no_cores=1,FSelect=FALSE)
This functions allows the extraction of a filtered and normalized expression matrix
getfdata(object, g = NULL, n = NULL)
getfdata(object, g = NULL, n = NULL)
object |
|
g |
Vector of gene names to be included corresponding to a subset of valid row names of the |
n |
Vector of valid column names corresponding to a subset of valid column names of the |
Matrix of filtered expression data with genes as rows and cells as columns.
This function discards lowly expressed genes from a count matrix stored in an SCseq
object.
getFilteredCounts(object, minnumber = 5, minexpr = 5)
getFilteredCounts(object, minnumber = 5, minexpr = 5)
object |
|
minnumber |
Integer number greater or equal to zero. Minimum number of cells required to have at least |
minexpr |
Integer number greater or equal to zero. Minimum expression of a gene in at least |
Filtered expression matrix.
Extract a vector of all genes corresponding to a given module of a FateID self-organizing map (SOM) of pseudo-time ordered gene expression (or noise) profiles.
getNode(ps, n)
getNode(ps, n)
ps |
FateID SOM. List object. |
n |
Integer number of SOM module. |
Vector of gene IDs in module n
.
This function extracts projections of all cells in a cluster and plots a heatmap of these hierarchically clustered projections (rows) to all other clusters (columns). A minimum spanning tree of the cluster centers is overlaid for comparison.
getproj(object, i, show = TRUE, zscore = FALSE)
getproj(object, i, show = TRUE, zscore = FALSE)
object |
|
i |
Cluster number. This number has to correspond to one of the RaceID3 clusters included for the StemID2 inference, i.e. to a number present
in slot |
show |
logical. If |
zscore |
logical. If |
A list ot two components:
pr |
a data.frame of projections for all cells in cluster |
prz |
a data.frame of z-transformed projections for all cells in cluster |
This function derives a graph object from the pruned k nearest neighbours and infers clusters by modularity optimizatio nusing the Louvain or the Leiden algorithm on this graph. A Fruchterman-Rheingold graph layout is also derived from the pruned nearest neighbours.
graphCluster( res, pvalue = 0.01, use.weights = TRUE, use.leiden = FALSE, leiden.resolution = 1, min.size = 2, rseed = 12345 )
graphCluster( res, pvalue = 0.01, use.weights = TRUE, use.leiden = FALSE, leiden.resolution = 1, min.size = 2, rseed = 12345 )
res |
List object with k nearest neighbour information returned by |
pvalue |
Positive real number between 0 and 1. All nearest neighbours with link probability |
use.weights |
logical. If TRUE, then nearest-neighbor link probabilities are used to build a graph as input for Louvain clustering. If FALSE, then all links have equal weight. Default is TRUE. |
use.leiden |
logical. If TRUE, then the Leiden algorithm is used. If FALSE, the Louvain algorithm is used. Default is FALSE. |
leiden.resolution |
Positive real number. Resolution parameter for the Leiden algorithm. |
min.size |
Positive integer number. Minimum cluster size. All clusters with less than |
rseed |
Integer number. Random seed to enforce reproducible clustering results. Default is 12345. |
List object of three components:
partition |
Vector with clustering partition. |
fr |
Data.frame with Fruchterman-Rheingold graph layout. |
residual.cluster |
In case clusters with less than |
res <- pruneKnn(intestinalDataSmall,knn=10,alpha=1,no_cores=1,FSelect=FALSE) cl <- graphCluster(res,pvalue=0.01)
res <- pruneKnn(intestinalDataSmall,knn=10,alpha=1,no_cores=1,FSelect=FALSE) cl <- graphCluster(res,pvalue=0.01)
This functions returns an imputed expression matrix based on the imputing computed with compdist
.
imputeexp(object, genes = NULL)
imputeexp(object, genes = NULL)
object |
|
genes |
vector of valid gene names corresponding to row names of slot |
An expression matrix with imputed expression values after size normalization. Genes are in rows and cells in columns.
This function allows inspection of the local background model and the pruning of nearest neighbours for a given cell. A dimensional reduction representation is plotted where k nearest neighours and outliers are highlighted. Alternatively, the dependence of the transcript count variance or, alternatively, the coefficient of variation (CV) on the mean in log2 space is plotted. The mean-variance dependence is plotted along with a loess-regression, a second order polynomial fit, and the background model of the local variability. The CV plot also highlights the local variability associated with cell-to-cell variability of total transcript counts, as calculated directly from the mean and variance of total transcript counts (turquoise) or from a local fit of a gamma distribution (orange).
inspectKNN( i, expData, res, cl, object = NULL, nb = res$pars$nb, pvalue = 0.01, backModel = NULL, alpha = res$alpha[i], plotSymbol = FALSE, id = NULL, degree = 2, span = 0.75, cv = FALSE, ... )
inspectKNN( i, expData, res, cl, object = NULL, nb = res$pars$nb, pvalue = 0.01, backModel = NULL, alpha = res$alpha[i], plotSymbol = FALSE, id = NULL, degree = 2, span = 0.75, cv = FALSE, ... )
i |
Either integer column index or column name of |
expData |
Matrix of gene expression values with genes as rows and cells as columns. These values have to correspond to unique molecular identifier counts. |
res |
List object with k nearest neighbour information returned by |
cl |
List object with clustering information, returned by the |
object |
|
nb |
Input parameter of |
pvalue |
Positive real number between 0 and 1. All nearest neighbours with link probability |
backModel |
Optional background model. Second order polynomial fitting the mean-variance dpendence on log2 scales as returned by |
alpha |
Input parameter of |
plotSymbol |
Logical. If |
id |
Valid column name of expData. If |
degree |
Input parameter for mean-variance fit. See |
span |
Input parameter for mean-variance fit. See |
cv |
Input parameter for mean-variance fit. See |
... |
Additional parameters for |
List object with six components:
pv.neighbours.cell |
Vector of outlier p-values (Bonferroni-corrected) for each of the k-nearest neighbours. |
cluster.neighours |
Vector of cluster numbers for central cell and each of the k-nearest neighbours. |
alpha |
|
expr.neighbours |
Matrix of normalized transcript counts for the central cell and each of the k-nearest neighbours (normalized to the minimum number of total trascript counts across all neighours). Additional columns indicate inferred local mean, standard deviation, and strongest outlier p-value. Rows are sorted by p-values of the strongest outlier cell in increasing order. |
pv.neighbours |
Matrix of outlier p-values of all genes for the central cells and each of the k-nearest neighbours. Rows are sorted by p-values of the strongest outlier cell in increasing order. |
strongest.outlier |
Column name of strongest outlier. |
This dataset contains gene expression values, i. e. transcript counts, of 278 intestinal epithelial cells.
intestinalData
intestinalData
A sparse matrix (using the Matrix) with cells as columns and genes as rows. Entries are raw transcript counts.
None
Grün et al. (2016) Cell Stem Cell 19(2): 266-77 <DOI:10.1016/j.stem.2016.05.010> (PubMed)
This dataset is a smaller subset of the original dataset, which contains gene expression values, i. e. transcript counts, of 278 intestinal epithelial cells. The dataset is included for quick testing and examples. Only cells with >10,000 transcripts per cell and only genes with >20 transcript counts in >10 cells were retained.
intestinalDataSmall
intestinalDataSmall
A sparse matrix (using the Matrix) with cells as columns and genes as rows. Entries are raw transcript counts.
None
Grün et al. (2016) Cell Stem Cell 19(2): 266-77 <DOI:10.1016/j.stem.2016.05.010> (PubMed)
This function assembles a lineage graph based on the cell projections onto inter-cluster links.
lineagegraph(object, verbose = TRUE)
lineagegraph(object, verbose = TRUE)
object |
|
verbose |
logical. If |
An Ltree class object with lineage graph-related data stored in slots ltcoord
, prtree
, and cdata
.
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc) sc <- comptsne(sc) ltr <- Ltree(sc) ltr <- compentropy(ltr) ltr <- projcells(ltr) ltr <- lineagegraph(ltr)
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc) sc <- comptsne(sc) ltr <- Ltree(sc) ltr <- compentropy(ltr) ltr <- projcells(ltr) ltr <- lineagegraph(ltr)
The Ltree class is the central object storing all information generated during lineage tree inference by the StemID algorithm. It comprises a number of slots for a variety of objects.
object |
An Ltree object. |
sc
An SCseq
object with the RaceID3 analysis of the single-cell RNA-seq data for which a lineage tree should be derived.
ldata
List object storing information on the clustering partition, the distance matrix, and the cluster centers in dimensionally-reduced
input space and in two-dimensional t-sne space. Elements:
lp
: vector with the filtered partition into clusters after discarding clusters with cthr cells or less.
pdi
:matrix with the coordinates of all cells in the embedded space. Clusters with cthr
transcripts or less were discarded (see function projcells
).
Rows are medoids and columns are coordinates.
cn
: data.frame with the coordinates of the cluster medoids in the embedded space. Clusters with cthr
transcripts or less were discarded.
Rows are medoids and columns are coordinates.
m
: vector with the numbers of the clusters which survived the filtering.
pdil
: data.frame with coordinates of cells in the two-dimensional t-SNE representation computed by RaceID3. Clusters with cthr
transcripts or less were
discarded. Rows are cells and columns are coordinates.
cnl
: data.frame with the coordinates of the cluster medoids in the two-dimensional t-SNE representation computed by RaceID3. Clusters with cthr
transcripts or less were discarded. Rows are medoids and columns are coordinates.
entropy
Vector with transcriptome entropy computed for each cell.
trproj
List containing two data.frames. Elements:
res
: data.frame with three columns for each cell. The first column o
shows the cluster of a cell,
the second column l
shows the cluster number for the link the cell is assigned to, and the third column h
shows the projection as a fraction of the length
of the inter-cluster link. Parallel projections are positive, while anti-parallel projections are negative.
rma
: data.frame with all projection coordinates for each cell. Rows are cells and columns are clusters. Projections are given as a fraction of the length of the
inter-cluster link. Parallel projections are positive, while anti-parallel projections are negative. The column corresponding to the originating cluster of a cell
shows NA
.
par
List of parameters used for the StemID2 analysis.
prback
data.frame of the same structure as the trproj$res
. In case randomizations are used to compute significant projections, the projections of all
pdishuff
randomizations are appended to this data.frame and therefore the number of rows corresponds to the number of cells multiplied by pdishuf
. See
function projback
.
prbacka
data.frame reporting the aggregated results of the randomizations with four columns. Column n
denotes the number of the randomization sample,
column o
and l
contain the numbers of the originating and the terminal cluster, respectively, for each inter-cluster link and column count
shows
the number of cells assigned to this link in randomization sample n
. The discrete distribution for the computation of the link p-value is given by the data
contained in this object (if nmode=FALSE
).
ltcoord
Matrix storing projection coordinates of all cells in the two-dimensional t-SNE space, used for visualization.
prtree
List with two elements. The first element l
stores a list with the projection coordinates for each link. The name of each element identifies the
link and is composed of two cluster numbers separated by a dot. The second element n
is a list of the same structure and contains the cell names corresponding
to the projection coordinates stored in l
.
cdata
list of data.frames, each with cluster ids as rows and columns:
counts
data.frame indicating the number of cells on the links connecting the cluster of origin (rows) to other clusters (columns).
counts.br
data.frame containing the cell counts on cluster connections averaged across the randomized background samples (if nmode = FALSE
) or as derived
from sampling statistics (if nmode = TRUE
).
pv.e
matrix of enrichment p-values estimated from sampling statistics (if nmode = TRUE
); entries are 0 if the observed number of cells on the respective
link exceeds the (1 – pethr)
-quantile of the randomized background distribution and 0.5 otherwise (if nmode = FALSE
).
pv.d
matrix of depletion p-values estimated from sampling statistics (if nmode = TRUE
); entries are 0 if the observed number of cells on the respective
link is lower than the pethr
-quantile of the randomized background distribution and 0.5 otherwise (if nmode = FALSE
).
pvn.e
matrix of enrichment p-values estimated from sampling statistics (if nmode = TRUE
); 1- quantile, with the quantile estimated from the number of cells on a link as derived from the randomized background distribution (if nmode = FALSE
).
pvn.d
matrix of depletion p-values estimated from sampling statistics (if nmode = TRUE
); quantile estimated from the number of cells on a link as derived from the randomized background distribution (if nmode = FALSE
).
This function extracts genes with maximal variability in a cluster or in the entire data set.
maxNoisyGenes(noise, cl = NULL, set = NULL)
maxNoisyGenes(noise, cl = NULL, set = NULL)
noise |
List object with the background noise model and a variability matrix, returned by the |
cl |
List object with clustering information, returned by the |
set |
Postive integer number or vector of integers corresponding to valid cluster numbers. Noise levels are computed across all cells in this subset of clusters. Default is |
Vector with average gene expression variability in decreasing order, computed across all cells or only cells in a set of clusters (if cl
and
set
are given.
res <- pruneKnn(intestinalDataSmall,knn=10,alpha=1,no_cores=1,FSelect=FALSE) noise <- compNoise(intestinalDataSmall,res,pvalue=0.01,genes = NULL,no_cores=1) mgenes <- maxNoisyGenes(noise)
res <- pruneKnn(intestinalDataSmall,knn=10,alpha=1,no_cores=1,FSelect=FALSE) noise <- compNoise(intestinalDataSmall,res,pvalue=0.01,genes = NULL,no_cores=1) mgenes <- maxNoisyGenes(noise)
This function extracts genes with maximal variability in a cluster or in the entire data set.
maxNoisyGenesTB(noise, cl = NULL, set = NULL, minobs = 5)
maxNoisyGenesTB(noise, cl = NULL, set = NULL, minobs = 5)
noise |
List object with noise parameters returned by the |
cl |
List object with clustering information, returned by the |
set |
Postive integer number or vector of integers corresponding to valid cluster numbers. Noise levels are computed across all cells in this subset of clusters. Default is |
minobs |
Positive integer number. Only genes with at least |
Vector with average gene expression variability in decreasing order, computed across all cells or only cells in a set of clusters (if cl
and
set
are given.
res <- pruneKnn(intestinalDataSmall,knn=10,alpha=1,no_cores=1,FSelect=FALSE) noise <- compNoise(intestinalDataSmall,res,pvalue=0.01,genes = NULL,no_cores=1) mgenes <- maxNoisyGenes(noise)
res <- pruneKnn(intestinalDataSmall,knn=10,alpha=1,no_cores=1,FSelect=FALSE) noise <- compNoise(intestinalDataSmall,res,pvalue=0.01,genes = NULL,no_cores=1) mgenes <- maxNoisyGenes(noise)
This function fits a second order polynomial to the baseline variance-mean dependence across all genes in log space.
noiseBaseFit(x, step = 0.01, thr = 0.05)
noiseBaseFit(x, step = 0.01, thr = 0.05)
x |
Matrix of gene expression values with genes as rows and cells as columns. |
step |
Positive real number between 0 and 1. Bin size for the computation. The interval of mean gene expression values is divided into bins with equal number of data points and |
thr |
Positive real number between 0 and 1. In each mean expression bin defined by |
List object of three components:
nfit |
model fit as returned by the |
m |
mean expression of all genes |
v |
expression variance of all genes |
x <- noiseBaseFit(intestinalDataSmall,step=.01,thr=.05)
x <- noiseBaseFit(intestinalDataSmall,step=.01,thr=.05)
Function to generate boxplots of a feature vector across different clusters.
plotB(x, part, cluster = NULL, set = NULL, ...)
plotB(x, part, cluster = NULL, set = NULL, ...)
x |
Vector of real numbers. |
part |
Vector with cluster partition, e.g., element |
cluster |
Positive integer corresponing to valid cluster number or |
set |
Ordered set of valid cluster numbers. If |
... |
Additional parameters of |
None
This functions produces a scatter plot showing the gene expression variance as a function of the mean and the inferred polynomial fit of the background model computed by RaceID3. It also shows a local regression.
plotbackground(object)
plotbackground(object)
object |
|
None
This function plots the variance against mean expression across all genes and a second order polynomial to the variance-mean dependence in log space. It also plots a local regression.
plotBackVar(x)
plotBackVar(x)
x |
List object returned by function |
None
bg <- fitBackVar(intestinalDataSmall) plotBackVar(bg)
bg <- fitBackVar(intestinalDataSmall) plotBackVar(bg)
This functions produces a barplot of differentially expressed genes derived by the function diffgenes
plotdiffgenes(z, gene)
plotdiffgenes(z, gene)
z |
Output of |
gene |
Valid gene name. Has to correspond to one of the rownames of the |
None
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc) x <- diffgenes(sc,1,2) head(x$z) plotdiffgenes(x,names(x$z)[1])
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc) x <- diffgenes(sc,1,2) head(x$z) plotdiffgenes(x,names(x$z)[1])
This is a plotting function for visualizing the output of the diffexpnb
or clustdiffgenes
function as MA plot.
plotdiffgenesnb( x, pthr = 0.05, padj = TRUE, lthr = 0, mthr = -Inf, Aname = NULL, Bname = NULL, show_names = TRUE, ... )
plotdiffgenesnb( x, pthr = 0.05, padj = TRUE, lthr = 0, mthr = -Inf, Aname = NULL, Bname = NULL, show_names = TRUE, ... )
x |
output of the function |
pthr |
real number between 0 and 1. This number represents the p-value cutoff applied for displaying differentially expressed genes. Default value is 0.05. The parameter |
padj |
logical value. If |
lthr |
real number between 0 and Inf. Differentially expressed genes are displayed only for log2 fold-changes greater than |
mthr |
real number between -Inf and Inf. Differentially expressed genes are displayed only for log2 mean expression greater than |
Aname |
name of expression set |
Bname |
name of expression set |
show_names |
logical value. If |
... |
Additional arguments for function |
None
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc) A <- names(sc@cpart)[sc@cpart %in% c(1,2)] B <- names(sc@cpart)[sc@cpart %in% c(3)] y <- diffexpnb(getfdata(sc,n=c(A,B)), A=A, B=B ) plotdiffgenesnb(y)
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc) A <- names(sc@cpart)[sc@cpart %in% c(1,2)] B <- names(sc@cpart)[sc@cpart %in% c(3)] y <- diffexpnb(getfdata(sc,n=c(A,B)), A=A, B=B ) plotdiffgenesnb(y)
This is a plotting function for visualizing the output of the diffNoisyGenesTB
function as MA plot.
plotDiffNoise( x, pthr = 0.05, mu = TRUE, lthr = 0, ps = 0.01, mthr = -Inf, set.name = NULL, bgr.name = NULL, show_names = TRUE )
plotDiffNoise( x, pthr = 0.05, mu = TRUE, lthr = 0, ps = 0.01, mthr = -Inf, set.name = NULL, bgr.name = NULL, show_names = TRUE )
x |
output of the function |
pthr |
real number between 0 and 1. This number represents the p-value cutoff applied for displaying differentially variable genes. Default value is 0.05. |
mu |
logical value. If |
lthr |
real number between 0 and Inf. Differentially variable genes are displayed only for log2 fold-changes greater than |
ps |
positive real number. Pseudo-count added to component |
mthr |
real number between -Inf and Inf. Differentially variable genes are displayed only for log2 mean expression (or mean noise, if |
set.name |
name of |
bgr.name |
name of |
show_names |
logical value. If |
None
This functions plots the explained variance as a function of PCA/ICA components computed by the function CCcorrect
. The number of components where
the change in explained variability upon adding further components approaches linear behaviour demarcates the saturation point and is highlighted in blue.
plotdimsat(object, change = TRUE, lim = NULL)
plotdimsat(object, change = TRUE, lim = NULL)
object |
|
change |
logical. If |
lim |
Number of components included for he calculation and shown in the plot. Default is |
None
This function plots a histogram of the ratios of cell-to-cell distances in the original versus the high-dimensional embedded space used as input for the StemID2 inferences. The embedded space approximates correlation-based distances by Euclidean distances obtained by classical multi-dimensional scaling. A minimum spanning tree of the cluster centers is overlaid for comparison.
plotdistanceratio(object)
plotdistanceratio(object)
object |
|
None.
This functions highlights gene expression in a two-dimensional t-SNE map, UMAP, or a Fruchterman-Rheingold graph layout of the singe-cell transcriptome data.
plotexpmap( object, g, n = NULL, logsc = FALSE, imputed = FALSE, fr = FALSE, um = FALSE, cells = NULL, cex = 0.5, map = TRUE, leg = TRUE, noise = FALSE )
plotexpmap( object, g, n = NULL, logsc = FALSE, imputed = FALSE, fr = FALSE, um = FALSE, cells = NULL, cex = 0.5, map = TRUE, leg = TRUE, noise = FALSE )
object |
|
g |
Individual gene name or vector with a group of gene names corresponding to a subset of valid row names of the |
n |
String of characters representing the title of the plot. Default is |
logsc |
logical. If |
imputed |
logical. If |
fr |
logical. If |
um |
logical. If |
cells |
Vector of valid cell names corresponding to column names of slot |
cex |
size of data points. Default value is 0.5. |
map |
logical. If |
leg |
logical. If |
noise |
logical. If |
None
Plotting noise (epsilon) as a function of normalized or non-normalized expression for a given gene.
plotExpNoise(g, object, noise, set = NULL, ps = 0.1, norm = TRUE, ...)
plotExpNoise(g, object, noise, set = NULL, ps = 0.1, norm = TRUE, ...)
g |
Valid gene ID with available expression and noise estimates. |
object |
RaceID |
noise |
List object returned by the |
set |
Set of valid cluster numbers. Default is |
ps |
Real number. Pseudo-count added to noise and expression estimates. Default is 0.1. |
norm |
logical. If |
... |
Additional arguments of |
None.
This functions highlights feature values in a two-dimensional t-SNE map, UMAP, or a Fruchterman-Rheingold graph layout of the singe-cell transcriptome data.
plotfeatmap( object, g, n = NULL, logsc = FALSE, fr = FALSE, um = FALSE, cells = NULL, cex = 1, map = TRUE, leg = TRUE, flo = NULL, ceil = NULL )
plotfeatmap( object, g, n = NULL, logsc = FALSE, fr = FALSE, um = FALSE, cells = NULL, cex = 1, map = TRUE, leg = TRUE, flo = NULL, ceil = NULL )
object |
|
g |
Vector of real numbered features to highlight in the dimensional reduction representation, NAs will be highlighted in grey. |
n |
String of characters representing the title of the plot. Default is |
logsc |
logical. If |
fr |
logical. If |
um |
logical. If |
cells |
Vector of valid cell names corresponding to column names of slot |
cex |
size of data points. Default value is 1. |
map |
logical. If |
leg |
logical. If |
flo |
Numeric. Lower bound for feature values. All values smaller then |
ceil |
Numeric. Upper bound for feature values. All values larger then |
None
This function plots a graph of lineage trajectories connecting RaceID3 cluster medoids as inferred by StemID2 to approximate the lineage tree. The plot highlights significant links, where colour indicates the level of significance and width indicates the link score. The node colour reflects the level of transcriptome entropy.
plotgraph( object, showCells = FALSE, showMap = TRUE, tp = 0.5, scthr = 0, cex = 1 )
plotgraph( object, showCells = FALSE, showMap = TRUE, tp = 0.5, scthr = 0, cex = 1 )
object |
|
showCells |
logical. If |
showMap |
logical. Tf |
tp |
Real number between zero and one. Level of transparency of the t-SNE map. Deafault is 0.5. See |
scthr |
Real number between zero and one. Score threshold for links to be shown in the graph. For |
cex |
real positive number. Size of data points. Deault is 1. |
None.
This functions plots a barchart of Jaccard similarities for the RaceID3 clusters before outlier identification
plotjaccard(object)
plotjaccard(object)
object |
|
None
This functions plots cell labels into a two-dimensional t-SNE map, UMAP, or a Fruchterman-Rheingold graph layout of the singe-cell transcriptome data.
plotlabelsmap(object, labels = NULL, fr = FALSE, um = FALSE, cex = 0.5)
plotlabelsmap(object, labels = NULL, fr = FALSE, um = FALSE, cex = 0.5)
object |
|
labels |
Vector of labels for all cells to be highlighted in the t-SNE map. The order has to be the same as for the
columns in slot |
fr |
logical. If |
um |
logical. If |
cex |
positive real number. Size of the labels. Default is 0.5. |
None
This function plots a heatmap of link p-values.
plotlinkpv(object)
plotlinkpv(object)
object |
|
None.
This function plots a heatmap of link score.
plotlinkscore(object)
plotlinkscore(object)
object |
|
None.
This functions plots a two-dimensional t-SNE map, UMAP, or a Fruchterman-Rheingold graph layout of the singe-cell transcriptome data.
plotmap(object, final = TRUE, tp = 1, fr = FALSE, um = FALSE, cex = 0.5)
plotmap(object, final = TRUE, tp = 1, fr = FALSE, um = FALSE, cex = 0.5)
object |
|
final |
logical. If |
tp |
Number between 0 and 1 to change transparency of dots in the map. Default is 1. |
fr |
logical. If |
um |
logical. If |
cex |
size of data points. Default value is 0.5. |
None
This functions generates a heatmap of expression for defined group of genes and can highlight the clustering partition and another sample grouping, e.g. origin or cell type.
plotmarkergenes( object, genes, imputed = FALSE, cthr = 0, cl = NULL, cells = NULL, order.cells = FALSE, aggr = FALSE, norm = FALSE, cap = NULL, flo = NULL, samples = NULL, cluster_cols = FALSE, cluster_rows = TRUE, cluster_set = FALSE, samples_col = NULL, zsc = FALSE, logscale = TRUE, noise = FALSE, fontsize = 10 )
plotmarkergenes( object, genes, imputed = FALSE, cthr = 0, cl = NULL, cells = NULL, order.cells = FALSE, aggr = FALSE, norm = FALSE, cap = NULL, flo = NULL, samples = NULL, cluster_cols = FALSE, cluster_rows = TRUE, cluster_set = FALSE, samples_col = NULL, zsc = FALSE, logscale = TRUE, noise = FALSE, fontsize = 10 )
object |
|
genes |
A vector with a group of gene names corresponding to a subset of valid row names of the |
imputed |
logical. If |
cthr |
Interger number greater or equal zero. Only clusters with |
cl |
Vector of valid cluster numbers contained in slot |
cells |
Vector of valid cell names corresponding to column names of slot |
order.cells |
logical. If |
aggr |
logical. If |
norm |
logical. If |
cap |
Numeric. Upper bound for gene expression. All values larger then |
flo |
Numeric. Lower bound for gene expression. All values smaller then |
samples |
A vector with a group of sample names for each cell in the same order as the column names of the |
cluster_cols |
logical. If |
cluster_rows |
logical. If |
cluster_set |
logical. If |
samples_col |
Vector of colors used for highlighting all samples contained in |
zsc |
logical. If |
logscale |
logical. If |
noise |
logical. If |
fontsize |
postive real number. Font size of gene name labels. Default is 10. |
Object with clustering information for rows and columns returned by the function pheatmap
from the package pheatmap.
This functions plots the dependence of the transcript count variance or, alternatively, the coefficient of variation (CV) on the mean in log2 space. The mean-variance dependence is plotted along with a loess-regression, a second order polynomial fit, and the background model of the local variability. The CV plot also highlights the local variability associated with cell-to-cell variability of total transcript counts, as calculated directly from the mean and variance of total transcript counts (turquoise) or from a local fit of a gamma distribution (orange).
plotMV(x, cv = FALSE, ret = FALSE, span = 0.75, degree = 2, ...)
plotMV(x, cv = FALSE, ret = FALSE, span = 0.75, degree = 2, ...)
x |
Transcript count matrix. |
cv |
Logical. If |
ret |
Logical. If |
span |
Parameter for the local regression. See help(loess). Default value is 0.75. |
degree |
Parameter for the local regression. See help(loess). Default value is 2. |
... |
Additional arguments for |
If ret=FALSE
second order polynomial fit as returned by lm
.
This function plots the variance against mean expression across all genes and a second order polynomial to the base line of the variance-mean dependence in log space.
plotNoiseModel(x, corrected = FALSE)
plotNoiseModel(x, corrected = FALSE)
x |
List object returned by function |
corrected |
logical value. If |
None
x <- noiseBaseFit(intestinalDataSmall,step=.01,thr=.05) plotNoiseModel(x)
x <- noiseBaseFit(intestinalDataSmall,step=.01,thr=.05) plotNoiseModel(x)
This functions plots a barchart of outlier probabilities across all cells in each cluster.
plotoutlierprobs(object)
plotoutlierprobs(object)
object |
|
None
This functions plots the percentage variability explained the first one hundred (or pcaComp
) pricipal components of the PCA performed in the function pruneKnn
if the parameter large
was TRUE. The selected number of principal components (if pcaComp
was NULL) is determined by an elbow criterion and highlighted by a blue circle.
plotPC(res, logDiff = FALSE)
plotPC(res, logDiff = FALSE)
res |
List object with k nearest neighbour information returned by |
logDiff |
logical. If |
This function plots the variance versus the mean of the Pearson residuals obtained by the negative binomial regression computed by the function compY
if regNB
is TRUE
. A local regression is also shown.
plotPearsonRes(y, log = FALSE, ...)
plotPearsonRes(y, log = FALSE, ...)
y |
List object returned by the |
log |
logical. If |
... |
Additional arguments for |
None
res <- pruneKnn(intestinalDataSmall,no_cores=1) plotPearsonRes(res,log=TRUE)
res <- pruneKnn(intestinalDataSmall,no_cores=1) plotPearsonRes(res,log=TRUE)
This function plots various statistics for the posterior check
plotPP(pp, y = NULL, umi.eps = FALSE, i = 1, log.scale = TRUE)
plotPP(pp, y = NULL, umi.eps = FALSE, i = 1, log.scale = TRUE)
pp |
List object returned by |
y |
One of "mean", "median", "var", "cor", or |
umi.eps |
Logical. If |
i |
Positive integer number. Index of |
log.scale |
Logical. If |
Highlight clusters or pseudotime in dimensional reduction representation and indicate trajectory derived by slingshot.
plotPT(pt, object, clusters = TRUE, lineages = FALSE)
plotPT(pt, object, clusters = TRUE, lineages = FALSE)
pt |
List object returned by function |
object |
RaceID |
clusters |
logical. If |
lineages |
logical. If |
None
Displaying two noise-related quantaties of local pruned k-nearest neighbourhoods in a scatterplot highlighting VarID clusters.
plotQQ(x, m, n, object, cluster = NULL, cex = 0.5, show.cor = TRUE, ...)
plotQQ(x, m, n, object, cluster = NULL, cex = 0.5, show.cor = TRUE, ...)
x |
List object returned by |
m |
Component name of |
n |
Component name of |
object |
|
cluster |
Valid cluster number or vector of cluster numbers, contained in |
cex |
Real positive number. Size of data points. Default is 0.5. |
show.cor |
logical. If |
... |
Additional parameters of |
None
Plotting noise-related quantaties of local pruned k-nearest neighbourhoods in the dimensional reduction representation chosen for quantKnn
or as boxplot across clusters.
plotQuantMap( x, n, object, box = FALSE, cluster = NULL, set = NULL, logsc = FALSE, cex = 0.5, ... )
plotQuantMap( x, n, object, box = FALSE, cluster = NULL, set = NULL, logsc = FALSE, cex = 0.5, ... )
x |
List object returned by |
n |
Component name of |
object |
|
box |
Logical. If |
cluster |
Valid cluster number or vector of cluster numbers, contained in |
set |
Ordered set of valid cluster numbers. If |
logsc |
logical. If |
cex |
Real positive number. Size of data points. Default is 0.5. |
... |
Additional parameters of |
None
This function plots the parameters obatined by the negative binomial regression of the transcript counts on the total transcript count in each cells. Smoothed parameter estimates are also shown.
plotRegNB(expData, y, par.nb = NULL, span = 0.75, ...)
plotRegNB(expData, y, par.nb = NULL, span = 0.75, ...)
expData |
Matrix of gene expression values with genes as rows and cells as columns. The matrix needs to contain the same cell IDs as columns like the input matrix.
used to derive the pruned k nearest neighbours with the |
y |
List object returned by the |
par.nb |
Parameter to be plotted, i.e. valid column of |
span |
Positive real number. Parameter for loess-regression (see |
... |
Additional arguments for |
None
res <- pruneKnn(intestinalDataSmall,no_cores=1) plotRegNB(intestinalDataSmall,res,"theta")
res <- pruneKnn(intestinalDataSmall,no_cores=1) plotRegNB(intestinalDataSmall,res,"theta")
This functions plots the (change in the) mean within-cluster dispersion as a function of the cluster number and highlights the saturation point inferred based on the saturation criterion applied by RaceID3: The number of clusters where the change in within-cluster dispersion upon adding further clusters approaches linear behaviour demarcates the saturation point and is highlighted in blue.
plotsaturation(object, disp = FALSE)
plotsaturation(object, disp = FALSE)
object |
|
disp |
logical. If |
None
This functions plots the number of outliers as a function of the outlier probability.
plotsensitivity(object)
plotsensitivity(object)
object |
|
None
This functions produces a silhouette plot for RaceID3 clusters prior or post outlier identification.
plotsilhouette(object, final = FALSE)
plotsilhouette(object, final = FALSE)
object |
|
final |
logical. If |
None
This function plots a minimum spanning tree of the RaceID3 cluster medoids in a two-dimensional reduction representation.
plotspantree(object, tp = 0.5, cex = 1, projections = FALSE)
plotspantree(object, tp = 0.5, cex = 1, projections = FALSE)
object |
|
tp |
Real number between zero and one. Level of transparency of the t-SNE map. Deafault is 0.5. |
cex |
real positive number. Size of data points. Deault is 1. |
projections |
logical. If |
None.
This functions highlights groups of cells by different symbols in a two-dimensional t-SNE map, UMAP, or a Fruchterman-Rheingold graph layout of the singe-cell transcriptome data.
plotsymbolsmap( object, types, subset = NULL, samples_col = NULL, cex = 0.5, fr = FALSE, um = FALSE, leg = TRUE, map = TRUE, cex.legend = 0.75, leg.pos = "topleft" )
plotsymbolsmap( object, types, subset = NULL, samples_col = NULL, cex = 0.5, fr = FALSE, um = FALSE, leg = TRUE, map = TRUE, cex.legend = 0.75, leg.pos = "topleft" )
object |
|
types |
Vector assigning each cell to a type to be highlighted in the t-SNE map. The order has to be the same as for the
columns in slot |
subset |
Vector containing a subset of types from |
samples_col |
Vector of colors used for highlighting all samples contained in |
cex |
size of data points. Default value is 0.5. |
fr |
logical. If |
um |
logical. If |
leg |
logical. If |
map |
logical. If |
cex.legend |
Positive real number. Size of data points and text in the legend. Default is 0.75. |
leg.pos |
Position of the legend. a single keyword from the list ‘"bottomright"’, ‘"bottom"’, ‘"bottomleft"’,‘"left"’, ‘"topleft"’, ‘"top"’, ‘"topright"’, ‘"right"’ and‘"center"’. This places the legend on the inside of the plot frame at the given location. |
None
This function plots the transitions probabilities in a dimensional reduction representation of a RaceID SCseq
object updates with the
updateSC
function.
in order to utilize RaceID functions for visualization.
plotTrProbs( object, probs, tp = 0.5, prthr = 0, cthr = 0, fr = FALSE, um = FALSE, cex = 0.5 )
plotTrProbs( object, probs, tp = 0.5, prthr = 0, cthr = 0, fr = FALSE, um = FALSE, cex = 0.5 )
object |
RaceID |
probs |
Matrix of transition probabilities between clusters, returned by the |
tp |
Positive real number between 0 and 1. Transparency of the data points in the dimensional reduction map. Default is 0.5. |
prthr |
Positive real number between 0 and 1. Threshold of transition probabilities. Only transitions with probability |
cthr |
Integer number greater or equal 0 defining the minimum clusters size for inclusion into the map. Default is 0. |
fr |
Logical. If |
um |
Logical. If |
cex |
Real positive number. Size of data points. Default is 0.5. |
None
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) d <- getExpData(sc) res <- pruneKnn(d,distM=sc@distances,knn=10,alpha=1,no_cores=1,FSelect=FALSE) cl <- graphCluster(res,pvalue=0.01) sc <- updateSC(sc,res=res,cl=cl) sc <- comptsne(sc) probs <-transitionProbs(res,cl,pvalue=0.01) plotTrProbs(sc,probs,tp=.5,prthr=0,cthr=0,fr=FALSE)
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) d <- getExpData(sc) res <- pruneKnn(d,distM=sc@distances,knn=10,alpha=1,no_cores=1,FSelect=FALSE) cl <- graphCluster(res,pvalue=0.01) sc <- updateSC(sc,res=res,cl=cl) sc <- comptsne(sc) probs <-transitionProbs(res,cl,pvalue=0.01) plotTrProbs(sc,probs,tp=.5,prthr=0,cthr=0,fr=FALSE)
This function plots the dependence of mean noise per cell on the total UMI count per cell. It serves as a basis for choosing the prior parameter gamma
(see function compTBNoise
). With a proper parameter choice, there should be no correlation between the two quantities. If a positive correlation is observed, gamma
should be increased in order to weaken the prior. If the correlation is negative, gamma
should be decreased in order to increase the strength of the prior.
plotUMINoise(object, noise, log.scale = TRUE)
plotUMINoise(object, noise, log.scale = TRUE)
object |
RaceID |
noise |
object returned by |
log.scale |
Logical. If |
Non-normalized negative log posterior probability with a negative binomial likelihood and Cauchy prior.
postfntb(eps, z, x0, gamma, mu, rt)
postfntb(eps, z, x0, gamma, mu, rt)
eps |
Positive real number. Residual (biological) noise. |
z |
Vector of integer number greater or equal zero. Transcript counts. |
x0 |
Real number. Location parameter. |
gamma |
Positive real number. Scale parameter. |
mu |
Positive real number. Mean expression. |
rt |
Positive real number. Technical noise parameter. See help(fitGammaRt). |
Negative non-normalized log posterior probability fro maximum a posterior inference.
A prior function specified as Cauchy probability density.
priorfn(x, x0, gamma)
priorfn(x, x0, gamma)
x |
Vector or real numbers (quantiles) |
x0 |
Real number. Location parameter. |
gamma |
Positive real number. Scale parameter. |
Vector of probabilities
This function computes the projections of cells onto inter-cluster links for randomized cell positions in a high-dimensional embedded space. Significance of link based on an increased number of cells on a link is inferred based on this background model.
projback(object, pdishuf = 500, fast = FALSE, rseed = 17000, verbose = TRUE)
projback(object, pdishuf = 500, fast = FALSE, rseed = 17000, verbose = TRUE)
object |
|
pdishuf |
Number of randomizations of cell positions for which to compute projections of cells on inter-cluster links. Default is 2000.
No randomizations are needed in this mode and the function will do nothing. Default is |
fast |
logical. If |
rseed |
Integer number used as seed to ensure reproducibility of randomizations. Defaut is 17000. |
verbose |
logical. If |
An Ltree class object with all information on randomized cell projections onto links stored in the prbacka
slot.
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc) sc <- comptsne(sc) ltr <- Ltree(sc) ltr <- compentropy(ltr) ltr <- projcells(ltr,nmode=FALSE) ltr <- projback(ltr,pdishuf=50)
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc) sc <- comptsne(sc) ltr <- Ltree(sc) ltr <- compentropy(ltr) ltr <- projcells(ltr,nmode=FALSE) ltr <- projback(ltr,pdishuf=50)
This function computes the projections of cells onto inter-cluster links in a high-dimensional embedded space.
projcells(object, cthr = 5, nmode = TRUE, knn = 3, fr = FALSE, um = FALSE)
projcells(object, cthr = 5, nmode = TRUE, knn = 3, fr = FALSE, um = FALSE)
object |
|
cthr |
Positive integer number. Clusters to be included into the StemID2 analysis must contain more than |
nmode |
logical. If |
knn |
Positive integer number. See |
fr |
logical. Use Fruchterman-Rheingold layout instead of t-SNE for dimensional-reduction representation of the lineage graph. Default is |
um |
logical. Use umap representation instead of t-SNE for dimensional-reduction representation of the lineage graph. Default is |
An Ltree class object with all information on cell projections onto links stored in the ldata
slot.
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc) sc <- comptsne(sc) ltr <- Ltree(sc) ltr <- compentropy(ltr) ltr <- projcells(ltr)
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc) sc <- comptsne(sc) ltr <- Ltree(sc) ltr <- compentropy(ltr) ltr <- projcells(ltr)
This function plots a heatmap of the enrichment ratios of cells on significant links.
projenrichment(object)
projenrichment(object)
object |
|
None.
This function determines k nearest neighbours for each cell in gene expression space, and tests if the links are supported by a negative binomial joint distribution of gene expression. A probability is assigned to each link which is given by the minimum joint probability across all genes.
pruneKnn( expData, distM = NULL, large = TRUE, regNB = TRUE, bmethod = NULL, batch = NULL, regVar = NULL, offsetModel = TRUE, thetaML = FALSE, theta = 10, ngenes = 2000, span = 0.75, pcaComp = NULL, tol = 1e-05, algorithm = "kd_tree", metric = "pearson", genes = NULL, knn = 25, do.prune = TRUE, alpha = 1, nb = 3, no_cores = NULL, FSelect = FALSE, pca.scale = FALSE, ps = 1, seed = 12345, theta.harmony = NULL, ... )
pruneKnn( expData, distM = NULL, large = TRUE, regNB = TRUE, bmethod = NULL, batch = NULL, regVar = NULL, offsetModel = TRUE, thetaML = FALSE, theta = 10, ngenes = 2000, span = 0.75, pcaComp = NULL, tol = 1e-05, algorithm = "kd_tree", metric = "pearson", genes = NULL, knn = 25, do.prune = TRUE, alpha = 1, nb = 3, no_cores = NULL, FSelect = FALSE, pca.scale = FALSE, ps = 1, seed = 12345, theta.harmony = NULL, ... )
expData |
Matrix of gene expression values with genes as rows and cells as columns. These values have to correspond to unique molecular identifier counts. Alternatively, a Seurat object could be used as input, after normalization, PCA-dimensional reduction, and shared-nearest neighbour inference. |
distM |
Optional distance matrix used for determining k nearest neighbours. Default is |
large |
logical. If |
regNB |
logical. If |
bmethod |
Character string indicating the batch correction method. If "harmony", then batch correction is performed by the harmony package. Default is |
batch |
vector of batch variables. Component names need to correspond to valid cell IDs, i.e. column names of |
regVar |
data.frame with additional variables to be regressed out simultaneously with the log UMI count and the batch variable (if |
offsetModel |
Logical parameter. Only considered if |
thetaML |
Logical parameter. Only considered if |
theta |
Positive real number. Fixed value of the dispersion parameter. Only considered if |
ngenes |
Positive integer number. Randomly sampled number of genes (from rownames of |
span |
Positive real number. Parameter for loess-regression (see |
pcaComp |
Positive integer number. Number of princple components to be included if |
tol |
Numerical value greater than zero. Tolerance for numerical PCA using irlba. Default value is 1e-6. |
algorithm |
Algorithm for fast k nearest neighbour inference, using the |
metric |
Distances are computed from the expression matrix |
genes |
Vector of gene names corresponding to a subset of rownames of |
knn |
Positive integer number. Number of nearest neighbours considered for each cell. Default is 25. |
do.prune |
Logical parameter. If |
alpha |
Positive real number. Relative weight of a cell versus its k nearest neigbour applied for the derivation of joint probabilities. A cell receives a weight of |
nb |
Positive integer number. Number of genes with the lowest outlier probability included for calculating the link probabilities for the knn pruning. The link probability is computed as the geometric mean across these genes. Default is 3. |
no_cores |
Positive integer number. Number of cores for multithreading. If set to |
FSelect |
Logical parameter. If |
pca.scale |
Logical parameter. If |
ps |
Real number greater or equal to zero. Pseudocount to be added to counts within local neighbourhoods for outlier identification and pruning. Default is 1. |
seed |
Integer number. Random number to initialize stochastic routines. Default is 12345. |
theta.harmony |
|
... |
Additional parameters for |
List object of six components:
distM |
Distance matrix. |
dimRed |
PCA transformation of |
pvM |
Matrix of link probabilities between a cell and each of its k nearest neighbours (Bonferroni-corrected p-values). Column |
pvM.raw |
Matrix of uncorrected link probabilities between a cell and each of its k nearest neighbours (without multiple-testing correction). Column |
NN |
Matrix of column indices of k nearest neighbours for each cell according to input matrix |
B |
List object with background model of gene expression as obtained by |
regData |
If |
alpha |
Vector of inferred values for the |
pars |
List object storing the run parameters. |
pca |
Principal component analysis of the of the input data, if |
res <- pruneKnn(intestinalDataSmall,knn=10,alpha=1,no_cores=1,FSelect=FALSE)
res <- pruneKnn(intestinalDataSmall,knn=10,alpha=1,no_cores=1,FSelect=FALSE)
Extract pseudo-time order of cells along a trajectory defined by a set of clusters using the slingshot algorithm. If the slingshot package is unavailable, the function falls back to inference by principal curve analysis using the princurve package.
pseudoTime( object, set, m = NULL, useSlingshot = TRUE, map = "umap", x = NULL, n_neighbors = 15, metric = "euclidean", n_epochs = 200, min_dist = 0.1, local_connectivity = 1, spread = 1, initial_cmd = TRUE, perplexity = 30, rseed = 15555, ... )
pseudoTime( object, set, m = NULL, useSlingshot = TRUE, map = "umap", x = NULL, n_neighbors = 15, metric = "euclidean", n_epochs = 200, min_dist = 0.1, local_connectivity = 1, spread = 1, initial_cmd = TRUE, perplexity = 30, rseed = 15555, ... )
object |
RaceID |
set |
Set of valid ordered cluster numbers (in |
m |
Existing dimensional reduction representation of RaceID object. Either |
useSlingshot |
logical. If |
map |
Either |
x |
Optional feature matrix, which will be directly used for computation of the dimensional reduction representation. Default is NULL and |
n_neighbors |
Umap parameter (used if |
metric |
Umap parameter (used if |
n_epochs |
Umap parameter (used if |
min_dist |
Umap parameter (used if |
local_connectivity |
Umap parameter (used if |
spread |
Umap parameter (used if |
initial_cmd |
logical. t-SNE parameter (used if |
perplexity |
Positive number. t-SNE parameter (used if |
rseed |
Integer number. Random seed to enforce reproducible dimensional reduction computation. |
... |
Additional arguments to be passed to the |
List object of six components:
pt |
Vector of pseudo-time value obtained by slingshot. |
ord |
Vector of cells in |
set |
Vector of cluster numbers defining the trajectory used for pseudo-time inference. |
part |
Vector of cluster numbers of all cells in |
rd |
Two-dimensional matrix with x- and y-coordinates of dimensional reduction representation used for |
sls |
|
This function computes a number of noise-related quantities for all pruned k-nearest neighbourhoods.
quantKnn(res, noise, object, pvalue = 0.01, minN = 5, no_cores = NULL)
quantKnn(res, noise, object, pvalue = 0.01, minN = 5, no_cores = NULL)
res |
List object with k nearest neighbour information returned by |
noise |
List of noise parameters returned by |
object |
|
pvalue |
Positive real number between 0 and 1. All nearest neighbours with link probability |
minN |
Positive integer number. Noise inference is only done for k-nearest neighbourhoods with at least |
no_cores |
Positive integer number. Number of cores for multithreading. If set to |
List object with eight components:
noise.av |
Vector of biological noise average across all genes for each k-nearest neighbourhood. |
noise.ratio |
Vector of ratio between total noise and technical noise averaged across all genes for each k-nearest neighbourhood. |
local.corr |
Vector of average Spearman's correlation coefficient between all cell in a pruned k-nearest neighourhood. |
umi |
Vector of total UMI counts for all cells. |
Simple function using Rcpp
rcpp_hello_world()
rcpp_hello_world()
## Not run: rcpp_hello_world() ## End(Not run)
## Not run: rcpp_hello_world() ## End(Not run)
This functions applies random forests-based reclassification of cell clusters to enhance robustness of the final clusters.
rfcorrect( object, rfseed = 12345, nbtree = NULL, final = TRUE, nbfactor = 5, ... )
rfcorrect( object, rfseed = 12345, nbtree = NULL, final = TRUE, nbfactor = 5, ... )
object |
|
rfseed |
Seed for enforcing reproducible results. Default is 12345. |
nbtree |
Number of trees to be built. Default is |
final |
logical. If |
nbfactor |
Positive integer number. See |
... |
additional input arguments to the |
The function returns an updated SCseq
object with random forests votes written to slot out$rfvotes
. The clustering
partition prior or post outlier identification (slot cluster$kpart
or cpart
, if parameter final
equals FALSE
or TRUE
, respectively) is overwritten with the partition derived from the reclassification.
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc) sc <- rfcorrect(sc)
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) sc <- clustexp(sc) sc <- findoutliers(sc) sc <- rfcorrect(sc)
The SCseq class is the central object storing all information generated during cell type identification with the RaceID3 algorithm. It comprises a number of slots for a variety of objects.
object |
An SCseq object. |
expdata
The raw expression data matrix with cells as columns and genes as rows in sparse matrix format.
ndata
Filtered data with expression normalized to one for each cell.
noise
Matrix with local gene expression noise estimates (used for VarID analysis)
counts
Vector with total transcript counts for each cell in ndata
remaining after filtering.
genes
Vector with gene names of all genes in ndata
remaining after filtering.
dimRed
list object object storing information on a feature matrix obtained by dimensional reduction, batch effect correction etc.
Component x
stores the actual feature matrix.
distances
distance (or dis-similarity) matrix computed by RaceID3.
imputed
list with two matrices computed for imputing gene expression. The first matrix nn
contains the cell indices of the knn
nearest neighbours,
the second matrix contains the probabilities at which each cell contributes to thye imputed gene expression value for the cell correponding to the columns.
tsne
data.frame with coordinates of two-dimensional tsne layout computed by RaceID3.
fr
data.frame with coordinates of two-dimensional Fruchterman-Rheingold graphlayout computed by RaceID3.
umap
data.frame with coordinates of two-dimensional umap representation computed by RaceID3.
cluster
list storing information on the initial clustering step of the RaceID3 algorithm
background
list storing the polynomial fit for the background model of gene expression variability computed by RaceID3, which is used for outlier identification.
out
list storing information on outlier cells used for the prediction of rare cell types by RaceID3
cpart
vector containing the final clustering (i.e. cell type) partition computed by RaceID3
fcol
vector contaning the colour scheme for the RaceID3 clusters
medoids
vector containing the cell ids for th cluster medoids
filterpar
list containing the parameters used for cell and gene filterung
clusterpar
list containing the parameters used for clustering
outlierpar
list containing the parameters used for outlier identification
This function expects a class Seurat
object from the Seurat package as input and converts this into a RaceID SCseq
object. The function transfers the counts, initializes ndata
and fdata
without further filtering, transfers the PCA cell embeddings from the Seurat
object to dimRed
, transfers the clustering partition, and umap
and tsne
dimensional reduction (if available). CAUTION: Cluster numbers in RaceID start at 1 by default. Hence, all Seurat cluster numbers are shifted by 1.
Seurat2SCseq(Se, rseed = 12345)
Seurat2SCseq(Se, rseed = 12345)
Se |
Seurat object. |
rseed |
Integer number. Random seed for sampling cluster colours. |
RaceID SCseq
object.
This functions compares variance estimates obtained from the maximum a posterior estimate with a given prior to the data. The ratio between the predicted variance and the actual variance for a random subset of genes is computed across all pruned k nearest neighbourhoods.
testPrior( res, expData, gamma = c(0.2, 0.5, 1, 5, 1000), rseed = 12345, ngenes = 200, pvalue = 0.01, minN = 5, no_cores = NULL, x0 = 0, lower = 0, upper = 100 )
testPrior( res, expData, gamma = c(0.2, 0.5, 1, 5, 1000), rseed = 12345, ngenes = 200, pvalue = 0.01, minN = 5, no_cores = NULL, x0 = 0, lower = 0, upper = 100 )
res |
List object with k nearest neighbour information returned by |
expData |
Matrix of gene expression values with genes as rows and cells as columns. These values have to correspond to unique molecular identifier counts. |
gamma |
Vector of |
rseed |
Integer number. Random seed to enforce reproducible gene sampling. Default is 12345. |
ngenes |
Positive integer number. Randomly sampled number of genes (from rownames of |
pvalue |
Input parameter for |
minN |
Input parameter for |
no_cores |
Input parameter for |
x0 |
Input parameter for |
lower |
Input parameter for |
upper |
Input parameter for |
List of three components:
pp.var.ratio |
List of vectors for each gamma value of ratios between predicted and actual variances across all sampled genes and neighbourhoods. |
noise |
List of noise objects obtained from |
tc |
Vector of total transcript counts for all cells |
This function computes transition probabilities between clusters based on the link probabilities of the pruned k nearest neighbour graph.
transitionProbs(res, cl, pvalue = 0.01)
transitionProbs(res, cl, pvalue = 0.01)
res |
List object with k nearest neighbour information returned by |
cl |
List object with clustering information, returned by the |
pvalue |
Positive real number between 0 and 1. All nearest neighbours with link probability |
Matrix of transition probabilities between clusters.
res <- pruneKnn(intestinalDataSmall,knn=10,alpha=1,no_cores=1,FSelect=FALSE) cl <- graphCluster(res,pvalue=0.01) probs <-transitionProbs(res,cl,pvalue=0.01)
res <- pruneKnn(intestinalDataSmall,knn=10,alpha=1,no_cores=1,FSelect=FALSE) cl <- graphCluster(res,pvalue=0.01) probs <-transitionProbs(res,cl,pvalue=0.01)
This function updates a RaceID SCseq
object with a distance matrix or dimensionally reduced feature matrix,
a clustering partition, and/or a matrix of gene expression variability,
in order to utilize RaceID functions for visualization.
updateSC(object, res = NULL, cl = NULL, noise = NULL, flo = NULL)
updateSC(object, res = NULL, cl = NULL, noise = NULL, flo = NULL)
object |
RaceID |
res |
List object returned by |
cl |
List object with clustering information, returned by the |
noise |
List object with the background noise model and a variability matrix, returned by the |
flo |
Real number. Lower cutoff for the gene expression variability. All values |
SCseq
object with a distance matrix (slot distances
) and a dimensionally reduced feature matrix (slot dimRed$x
), or clustering partition (slot cpart
and cluster$kpart
) derived from the VarID analysis, and/or with a gene expression variability matrix in slot noise
.
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) d <- getExpData(sc) res <- pruneKnn(d,distM=sc@distances,knn=10,alpha=1,no_cores=1,FSelect=FALSE) cl <- graphCluster(res,pvalue=0.01) sc <- updateSC(sc,res=res,cl=cl) sc <- comptsne(sc) plotmap(sc)
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) sc <- compdist(sc) d <- getExpData(sc) res <- pruneKnn(d,distM=sc@distances,knn=10,alpha=1,no_cores=1,FSelect=FALSE) cl <- graphCluster(res,pvalue=0.01) sc <- updateSC(sc,res=res,cl=cl) sc <- comptsne(sc) plotmap(sc)
This functions regresses out variability associated with particular sources.
varRegression(object, vars = NULL, logscale = FALSE, Batch = FALSE)
varRegression(object, vars = NULL, logscale = FALSE, Batch = FALSE)
object |
|
vars |
data.frame of variables to be regressed out. Each column corresponds to a variable and each variable corresponds to a cell.
The object must contain all cells, i.e. column names of the slot |
logscale |
logical. If |
Batch |
logical. If |
The function returns an updated SCseq
object with the corrected expression matrix written to the slot dimRed$x
of the SCseq
object.
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) b <- sub("(\\_\\d+)$","",colnames(intestinalData)) vars <- data.frame(row.names=colnames(intestinalData),batch=b) sc <- varRegression(sc,vars)
sc <- SCseq(intestinalDataSmall) sc <- filterdata(sc) b <- sub("(\\_\\d+)$","",colnames(intestinalData)) vars <- data.frame(row.names=colnames(intestinalData),batch=b) sc <- varRegression(sc,vars)
Displaying violin plots of gene expression or gene expression noise (epsilon) across (a set of) clusters
violinMarkerPlot(g, object, noise = NULL, set = NULL, ti = NULL)
violinMarkerPlot(g, object, noise = NULL, set = NULL, ti = NULL)
g |
Valid gene ID corresponding to a (set of) rownames of |
object |
RaceID |
noise |
List of noise parameters returned by |
set |
Postive integer number or vector of integers corresponding to valid cluster numbers. Violin plots are shown for all clusters in |
ti |
String of characters representing the title of the plot. Default is |
None