STREAK is a supervised receptor abundance estimation method that depends on functionalities from the Seurat (Hao et al. 2021; Stuart et al. 2019; Butler et al. 2018; Satija et al. 2015), SPECK (Frost and Javaid 2022), VAM (Frost 2021) and Ckmeans.1d.dp (Wang and Song 2011; Song and Zhong 2020) packages.
STREAK performs receptor abundance estimation by leveraging
expression associations learned from joint scRNA-seq/CITE-seq training
data. These associations can either be manually specified using
pre-existing ground truth or can be built using a subset of joint
transcriptomics and proteomics data. Below, we use a subset of 1000
cells from the 10X Genomics human extranodal marginal zone B-cell
tumor/mucosa-associated lymphoid tissue (MALT) scRNA-seq/CITE-seq joint
dataset to build a gene set weights membership matrix for the CD3, CD4,
CD8a, CD14 and CD15 receptors. Given an m × n training scRNA-seq
counts matrix (m cells, n genes) and an m × h CITE-seq matrix
(m cells, h receptors), the receptorGeneSetConstruction() function
learns associations between each CITE-seq antibody-derived tag (ADT)
and all scRNA-seq transcripts. The resulting gene set weights
membership matrix is n × h.
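# Load the package that provides the functions and example data.
library(STREAK)
data("train.malt.rna.mat")
data("train.malt.adt.mat")
# Learn gene set weights for the first five receptors (CD3 to CD15).
receptor.geneset.matrix.out <- receptorGeneSetConstruction(
  train.rnaseq = train.malt.rna.mat,
  train.citeseq = train.malt.adt.mat[, 1:5],
  rank.range.end = 100,
  min.consec.diff = 0.01,
  rep.consec.diff = 2,
  manual.rank = NULL,
  seed.rsvd = 1)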
dim(receptor.geneset.matrix.out)
#> [1] 33538 5
head(receptor.geneset.matrix.out)
#>                      CD3         CD4        CD8a        CD14        CD15
#> MIR1302-2HG -0.603003712 -0.26561850  0.26371421  0.58013205  0.57615067
#> FAM138A     -0.597892301 -0.26338838  0.09474784  0.57817595  0.58215039
#> OR4F5       -0.089979883 -0.19045034  0.14489008  0.05350075  0.14764528
#> AL627309.1   0.067616000  0.08191478 -0.09400833 -0.09638293 -0.06789212
#> AL627309.3  -0.009037395  0.12988914 -0.13433059  0.01619190 -0.02506765
#> AL627309.2  -0.096277159 -0.05824631  0.01289699  0.07889503  0.08270513
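To make the dimensions above concrete, the following sketch builds an n × h association matrix from simulated toy data using base R only. A plain Spearman correlation is used here as a stand-in association measure; it is an assumption for illustration and is not presented as the procedure receptorGeneSetConstruction() implements. All object names (toy.rna, toy.adt, toy.weights) and values are hypothetical.

```r
# Toy data (simulated, not MALT): m = 20 cells, n = 6 genes, h = 2 receptors.
set.seed(1)
m <- 20; n <- 6; h <- 2
toy.rna <- matrix(rpois(m * n, lambda = 5), nrow = m,
                  dimnames = list(paste0("cell", 1:m), paste0("gene", 1:n)))
toy.adt <- matrix(rpois(m * h, lambda = 50), nrow = m,
                  dimnames = list(paste0("cell", 1:m), c("R1", "R2")))

# Stand-in association measure (assumption): Spearman correlation between
# every gene column of toy.rna and every receptor column of toy.adt.
toy.weights <- cor(toy.rna, toy.adt, method = "spearman")

# One row per gene, one column per receptor, i.e. n x h.
dim(toy.weights)
#> [1] 6 2
```

The shape mirrors the 33538 × 5 matrix above: rows index genes and columns index receptors.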
Following the construction of weighted gene sets, the
receptorAbundanceEstimation() function is used to perform
receptor abundance estimation. A subset of 1100 cells from the 10X
Genomics MALT scRNA-seq data is used for estimation. Given an m × n
target scRNA-seq counts matrix and an n × h gene set weights
membership matrix, target scRNA-seq expression of the most highly
weighted genes for each ADT (10 genes here, set by num.genes) is used
for gene set scoring and subsequent thresholding. The resulting
estimated receptor abundance matrix is m × h.
data("target.malt.rna.mat")
receptor.abundance.estimates.out <- receptorAbundanceEstimation(
  target.rnaseq = target.malt.rna.mat,
  receptor.geneset.matrix = receptor.geneset.matrix.out,
  num.genes = 10,
  rank.range.end = 100,
  min.consec.diff = 0.01,
  rep.consec.diff = 2,
  manual.rank = NULL,
  seed.rsvd = 1,
  max.num.clusters = 4,
  seed.ckmeans = 2)
dim(receptor.abundance.estimates.out)
#> [1] 1100 5
head(receptor.abundance.estimates.out)
#>                          CD3 CD4      CD8a      CD14      CD15
#> CTACCTGAGAGCGACT-1 0.0000000   0 0.9987944 0.6740526 0.7753415
#> TGGCGTGCACAGCATT-1 0.9464793   0 0.0000000 0.0000000 0.0000000
#> TAGGAGGAGCTGGCCT-1 0.0000000   0 0.0000000 0.9992784 0.9988085
#> ACTATCTCACCCTATC-1 0.0000000   0 0.9982689 0.1559718 0.2513592
#> ACGGAAGTCAATCCGA-1 0.0000000   0 0.9957439 0.5229880 0.6813975
#> AAGTACCCACAGAGCA-1 0.0000000   0 0.0000000 0.9990658 0.9985386
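The idea of scoring each cell from the most highly weighted genes of a receptor can be sketched with base R on simulated toy data. The per-cell mean used below is a deliberately simple stand-in (an assumption for illustration); it does not reproduce the gene set scoring or thresholding that receptorAbundanceEstimation() performs. The names toy.target, toy.weights, top.k and toy.scores are hypothetical.

```r
# Toy inputs (simulated): 5 target cells x 6 genes, and a 6 x 2 weights matrix.
set.seed(2)
toy.target <- matrix(rpois(30, lambda = 5), nrow = 5,
                     dimnames = list(paste0("cell", 1:5), paste0("gene", 1:6)))
toy.weights <- matrix(runif(12, min = -1, max = 1), nrow = 6,
                      dimnames = list(paste0("gene", 1:6), c("R1", "R2")))
top.k <- 3  # hypothetical; plays a role analogous to num.genes

# For each receptor, select the top.k genes with the largest weights and
# average their expression in every cell (stand-in for the scoring step).
toy.scores <- sapply(colnames(toy.weights), function(r) {
  top.genes <- names(sort(toy.weights[, r], decreasing = TRUE))[seq_len(top.k)]
  rowMeans(toy.target[, top.genes, drop = FALSE])
})

# One row per cell, one column per receptor, i.e. m x h.
dim(toy.scores)
#> [1] 5 2
```

As with the 1100 × 5 estimates above, the result has one row per target cell and one column per receptor.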