STREAK is a supervised receptor abundance estimation method that depends on functionality from the Seurat (Hao et al. 2021; Stuart et al. 2019; Butler et al. 2018; Satija et al. 2015), SPECK (Frost and Javaid 2022), VAM (Frost 2021) and Ckmeans.1d.dp (Wang and Song 2011; Song and Zhong 2020) packages.
STREAK performs receptor abundance estimation by leveraging
expression associations learned from joint scRNA-seq/CITE-seq training
data. These associations can either be manually specified using
pre-existing ground truth or can be built using a subset of joint
transcriptomics and proteomics data. Below, we use a subset of 1000
cells from the 10X Genomics human extranodal marginal zone B-cell
tumor/mucosa-associated lymphoid tissue (MALT) scRNA-seq/CITE-seq joint
dataset to build a gene set weights membership matrix for the CD3, CD4,
CD8a, CD14 and CD15 receptors. Given an \(m \times n\) training scRNA-seq counts matrix (cells by genes) and an \(m \times h\) CITE-seq matrix (cells by receptors), the
receptorGeneSetConstruction() function learns
associations between each CITE-seq ADT transcript and all scRNA-seq
transcripts. The resulting gene set weights membership matrix is \(n \times h\).
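To make the dimensions concrete, the following toy sketch builds an \(n \times h\) association matrix from simulated \(m \times n\) and \(m \times h\) inputs. The Spearman rank correlation used here is only an assumed stand-in for the association measure and is not necessarily what receptorGeneSetConstruction() computes internally; the names toy.rna, toy.adt and toy.weights are hypothetical.

```r
# Toy illustration only: rank correlation is an assumed stand-in for the
# association measure learned by the package.
set.seed(1)
m <- 50  # cells
n <- 8   # genes
h <- 3   # receptors (ADTs)

# Hypothetical training matrices with cells in rows and features in columns.
toy.rna <- matrix(rpois(m * n, lambda = 5), nrow = m, ncol = n,
                  dimnames = list(paste0("cell", 1:m), paste0("gene", 1:n)))
toy.adt <- matrix(rpois(m * h, lambda = 20), nrow = m, ncol = h,
                  dimnames = list(paste0("cell", 1:m), paste0("ADT", 1:h)))

# cor() pairs every column of the first matrix with every column of the
# second, so the result has one row per gene and one column per receptor.
toy.weights <- cor(toy.rna, toy.adt, method = "spearman")
dim(toy.weights)
```

The package call on the MALT training data follows.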
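library(STREAK)

# Load the training scRNA-seq and CITE-seq matrices shipped with the package
data("train.malt.rna.mat")
data("train.malt.adt.mat")

# Learn gene set weights for the first five receptors (CD3, CD4, CD8a, CD14, CD15)
receptor.geneset.matrix.out <- receptorGeneSetConstruction(
  train.rnaseq = train.malt.rna.mat,
  train.citeseq = train.malt.adt.mat[, 1:5],
  rank.range.end = 100,
  min.consec.diff = 0.01,
  rep.consec.diff = 2,
  manual.rank = NULL,
  seed.rsvd = 1)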
dim(receptor.geneset.matrix.out)
#> [1] 33538 5
head(receptor.geneset.matrix.out)
#>                      CD3         CD4        CD8a        CD14        CD15
#> MIR1302-2HG -0.604110407 -0.26998235  0.27026601  0.57776984  0.57784085
#> FAM138A     -0.598787274 -0.25872687  0.09013295  0.57990947  0.58231208
#> OR4F5       -0.095207766 -0.20187439  0.15526743  0.05700447  0.15424909
#> AL627309.1   0.067616000  0.08191478 -0.09400833 -0.09638293 -0.06789212
#> AL627309.3  -0.008013746  0.13915141 -0.14105265  0.01494577 -0.02837786
#> AL627309.2  -0.098468902 -0.06671758  0.01694013  0.08150649  0.08697406

Following the development of weighted gene sets, the
receptorAbundanceEstimation() function is used to perform
receptor abundance estimation. A subset of 1100 cells from the 10X
Genomics MALT scRNA-seq data is used for estimation. Given an \(m \times n\) target scRNA-seq counts matrix
and an \(n \times h\) gene set weights
membership matrix, the target scRNA-seq expression of the genes most
highly weighted for each ADT transcript is used for gene set scoring and
subsequent thresholding. The resulting estimated receptor abundance
matrix is \(m \times h\).
data("target.malt.rna.mat")
receptor.abundance.estimates.out <- receptorAbundanceEstimation(
  target.rnaseq = target.malt.rna.mat,
  receptor.geneset.matrix = receptor.geneset.matrix.out,
  num.genes = 10,
  rank.range.end = 100,
  min.consec.diff = 0.01,
  rep.consec.diff = 2,
  manual.rank = NULL,
  seed.rsvd = 1,
  max.num.clusters = 4,
  seed.ckmeans = 2)
dim(receptor.abundance.estimates.out)
#> [1] 1100 5
head(receptor.abundance.estimates.out)
#>                          CD3 CD4      CD8a      CD14      CD15
#> CTACCTGAGAGCGACT-1 0.0000000   0 0.9987944 0.6740526 0.7753415
#> TGGCGTGCACAGCATT-1 0.9464793   0 0.0000000 0.0000000 0.0000000
#> TAGGAGGAGCTGGCCT-1 0.0000000   0 0.0000000 0.9992784 0.9988085
#> ACTATCTCACCCTATC-1 0.0000000   0 0.9982689 0.1559718 0.2513592
#> ACGGAAGTCAATCCGA-1 0.0000000   0 0.9957439 0.5229880 0.6813975
#> AAGTACCCACAGAGCA-1 0.0000000   0 0.0000000 0.9990658 0.9985386
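
As a rough sketch of how an \(n \times h\) weights matrix and an \(m \times n\) target matrix combine into an \(m \times h\) result, the toy example below picks the most highly weighted genes for each receptor and scores each cell on them. The plain mean used for scoring is an assumed stand-in for the package's gene set scoring, the thresholding step is omitted, and all object names (toy.weights, toy.target, top.genes, toy.scores) are hypothetical.

```r
# Toy illustration only: a plain mean stands in for gene set scoring, and
# no thresholding is applied.
set.seed(2)
m <- 6          # target cells
n <- 12         # genes
h <- 2          # receptors
num.genes <- 3  # top-weighted genes kept per receptor

# Hypothetical weights (genes by receptors) and target counts (cells by genes).
toy.weights <- matrix(runif(n * h, min = -1, max = 1), nrow = n, ncol = h,
                      dimnames = list(paste0("gene", 1:n), c("R1", "R2")))
toy.target <- matrix(rpois(m * n, lambda = 5), nrow = m, ncol = n,
                     dimnames = list(paste0("cell", 1:m), paste0("gene", 1:n)))

# For each receptor, keep the names of the num.genes largest-weight genes.
top.genes <- lapply(colnames(toy.weights), function(r) {
  w <- toy.weights[, r]
  names(sort(w, decreasing = TRUE))[seq_len(num.genes)]
})
names(top.genes) <- colnames(toy.weights)

# Score each cell by its mean count over the receptor's top genes.
toy.scores <- sapply(top.genes, function(g) {
  rowMeans(toy.target[, g, drop = FALSE])
})
dim(toy.scores)
```

The result has one row per target cell and one column per receptor, matching the shape of receptor.abundance.estimates.out above.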