Title: | Receptor Abundance Estimation using Feature Selection and Gene Set Scoring |
---|---|
Description: | Performs receptor abundance estimation for single cell RNA-sequencing data using a supervised feature selection mechanism and a thresholded gene set scoring procedure. Seurat's normalization method is described in: Hao et al., (2021) <doi:10.1016/j.cell.2021.04.048>, Stuart et al., (2019) <doi:10.1016/j.cell.2019.05.031>, Butler et al., (2018) <doi:10.1038/nbt.4096> and Satija et al., (2015) <doi:10.1038/nbt.3192>. Method for reduced rank reconstruction and rank-k selection is detailed in: Javaid et al., (2022) <doi:10.1101/2022.10.08.511197>. Gene set scoring procedure is described in: Frost et al., (2020) <doi:10.1093/nar/gkaa582>. Clustering method is outlined in: Song et al., (2020) <doi:10.1093/bioinformatics/btaa613> and Wang et al., (2011) <doi:10.32614/RJ-2011-015>. |
Authors: | H. Robert Frost [aut], Azka Javaid [aut, cre] |
Maintainer: | Azka Javaid <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0.0 |
Built: | 2024-12-12 07:15:18 UTC |
Source: | CRAN |
Performs receptor abundance estimation for scRNA-seq target data using gene set scoring and thresholding. scRNA-seq target counts are normalized and
reduced rank reconstructed (RRR) using the
SPECK::randomizedRRR()
function. Gene set scoring is next performed leveraging expression from
the top most weighted genes based on the gene sets weights membership matrix with the VAM::vam()
function. The resulting cell-specific gene set scores
are then thresholded utilizing the Ckmeans.1d.dp::Ckmeans.1d.dp()
function. Note that this function only performs normalization
and does not perform any quality control (QC) checks on the inputted target scRNA-seq counts matrix. Any QC needed can be performed on the target matrix before passing it
as an input to the function.
receptorAbundanceEstimation( target.rnaseq, receptor.geneset.matrix, num.genes = 10, rank.range.end = 100, min.consec.diff = 0.01, rep.consec.diff = 2, manual.rank = NULL, seed.rsvd = 1, max.num.clusters = 4, seed.ckmeans = 2 )
receptorAbundanceEstimation( target.rnaseq, receptor.geneset.matrix, num.genes = 10, rank.range.end = 100, min.consec.diff = 0.01, rep.consec.diff = 2, manual.rank = NULL, seed.rsvd = 1, max.num.clusters = 4, seed.ckmeans = 2 )
target.rnaseq |
|
receptor.geneset.matrix |
|
num.genes |
Number of top most weighted genes for subsequent gene set scoring and thresholding. |
rank.range.end |
See documentation for the |
min.consec.diff |
See documentation for the |
rep.consec.diff |
See documentation for the |
manual.rank |
See documentation for the |
seed.rsvd |
See documentation for the |
max.num.clusters |
See documentation for the |
seed.ckmeans |
See documentation for the |
receptor.abundance.estimates
- A matrix consisting of abundance estimates for
cells in
receptors.
data("train.malt.rna.mat") data("train.malt.adt.mat") receptor.geneset.matrix.out <- receptorGeneSetConstruction(train.rnaseq = train.malt.rna.mat[1:100,1:80], train.citeseq = train.malt.adt.mat[1:100,1:2], rank.range.end = 70, min.consec.diff = 0.01, rep.consec.diff = 2, manual.rank = NULL, seed.rsvd = 1) dim(receptor.geneset.matrix.out) head(receptor.geneset.matrix.out) data("target.malt.rna.mat") receptor.abundance.estimates.out <- receptorAbundanceEstimation(target.rnaseq = target.malt.rna.mat[1:200,1:80], receptor.geneset.matrix = receptor.geneset.matrix.out, num.genes = 10, rank.range.end = 70, min.consec.diff = 0.01, rep.consec.diff = 2, manual.rank = NULL, seed.rsvd = 1, max.num.clusters = 4, seed.ckmeans = 2) dim(receptor.abundance.estimates.out) head(receptor.abundance.estimates.out)
data("train.malt.rna.mat") data("train.malt.adt.mat") receptor.geneset.matrix.out <- receptorGeneSetConstruction(train.rnaseq = train.malt.rna.mat[1:100,1:80], train.citeseq = train.malt.adt.mat[1:100,1:2], rank.range.end = 70, min.consec.diff = 0.01, rep.consec.diff = 2, manual.rank = NULL, seed.rsvd = 1) dim(receptor.geneset.matrix.out) head(receptor.geneset.matrix.out) data("target.malt.rna.mat") receptor.abundance.estimates.out <- receptorAbundanceEstimation(target.rnaseq = target.malt.rna.mat[1:200,1:80], receptor.geneset.matrix = receptor.geneset.matrix.out, num.genes = 10, rank.range.end = 70, min.consec.diff = 0.01, rep.consec.diff = 2, manual.rank = NULL, seed.rsvd = 1, max.num.clusters = 4, seed.ckmeans = 2) dim(receptor.abundance.estimates.out) head(receptor.abundance.estimates.out)
Computes gene sets weights membership matrix using associations learned between log-normalized and reduced rank reconstructed (RRR)
scRNA-seq training data and
CITE-seq ADT training counts normalized using the centered log ratio (CLR) transformation. scRNA-seq counts are normalized and
RRR using the
SPECK::randomizedRRR()
function while CITE-seq counts are normalized using the
Seurat::NormalizeData()
function with the normalization.method
parameter set to CLR
. Spearman rank correlations are computed between the
normalized CITE-seq data and the normalized and RRR scRNA-seq data.
receptorGeneSetConstruction( train.rnaseq, train.citeseq, rank.range.end = 100, min.consec.diff = 0.01, rep.consec.diff = 2, manual.rank = NULL, seed.rsvd = 1 )
receptorGeneSetConstruction( train.rnaseq, train.citeseq, rank.range.end = 100, min.consec.diff = 0.01, rep.consec.diff = 2, manual.rank = NULL, seed.rsvd = 1 )
train.rnaseq |
|
train.citeseq |
|
rank.range.end |
See documentation for the |
min.consec.diff |
See documentation for the |
rep.consec.diff |
See documentation for the |
manual.rank |
See documentation for the |
seed.rsvd |
See documentation for the |
receptor.geneset.matrix
- A gene sets weights membership matrix where a column
from
corresponds to the weights for
genes from the scRNA-seq matrix trained against the corresponding CITE-seq ADT transcript
.
data("train.malt.rna.mat") data("train.malt.adt.mat") receptor.geneset.matrix.out <- receptorGeneSetConstruction(train.rnaseq = train.malt.rna.mat[1:100,1:80], train.citeseq = train.malt.adt.mat[1:100,1:2], rank.range.end = 70, min.consec.diff = 0.01, rep.consec.diff = 2, manual.rank = NULL, seed.rsvd = 1) dim(receptor.geneset.matrix.out) head(receptor.geneset.matrix.out)
data("train.malt.rna.mat") data("train.malt.adt.mat") receptor.geneset.matrix.out <- receptorGeneSetConstruction(train.rnaseq = train.malt.rna.mat[1:100,1:80], train.citeseq = train.malt.adt.mat[1:100,1:2], rank.range.end = 70, min.consec.diff = 0.01, rep.consec.diff = 2, manual.rank = NULL, seed.rsvd = 1) dim(receptor.geneset.matrix.out) head(receptor.geneset.matrix.out)
A random subset of joint scRNA-seq/CITE-seq 10X Genomics human extranodal marginal zone B-cell tumor/mucosa-associated lymphoid tissue (MALT) target data. See the dataProcessing.R
file from data-raw
folder for code to recreate data subset.
target.malt.rna.mat
target.malt.rna.mat
A scRNA-seq counts matrix of dgCMatrix-class
from the Matrix
package with 4000 cells and 33538 genes.
<https://www.10xgenomics.com/resources/datasets/10-k-cells-from-a-malt-tumor-gene-expression-and-cell-surface-protein-3-standard-3-0-0>
A random subset of joint scRNA-seq/CITE-seq 10X Genomics human extranodal marginal zone B-cell tumor/mucosa-associated lymphoid tissue (MALT) training data. See the dataProcessing.R
file from data-raw
folder for code to recreate data subset.
train.malt.adt.mat
train.malt.adt.mat
A CITE-seq counts matrix of dgeMatrix-class
from the Matrix
package with 1000 cells and 17 genes.
<https://www.10xgenomics.com/resources/datasets/10-k-cells-from-a-malt-tumor-gene-expression-and-cell-surface-protein-3-standard-3-0-0>
A random subset of joint scRNA-seq/CITE-seq 10X Genomics human extranodal marginal zone B-cell tumor/mucosa-associated lymphoid tissue (MALT) training data. See the dataProcessing.R
file from data-raw
folder for code to recreate data subset.
train.malt.rna.mat
train.malt.rna.mat
A scRNA-seq counts matrix of dgCMatrix-class
from the Matrix
package with 1000 cells and 33538 genes.
<https://www.10xgenomics.com/resources/datasets/10-k-cells-from-a-malt-tumor-gene-expression-and-cell-surface-protein-3-standard-3-0-0>