This vignette introduces the PRECAST workflow for the analysis of integrating multiple spatial transcriptomics datasets. The workflow consists of three steps
We demonstrate the use of PRECAST to two sliced human breast cancer Visium data that are here, which can be downloaded to the current working path by the following command:
githubURL <- "https://github.com/feiyoung/PRECAST/blob/main/vignettes_data/bc2.rda?raw=true"
download.file(githubURL, "bc2.rda", mode = "wb")
Then load to R
This data is also available at 10X genomics data website:
Users require the two folders for each dataset: spatial and filtered_feature_bc_matrix. Then the data can be read by the following commond.
The package can be loaded with the command:
View human breast cancer Visium data from
DataPRECAST
Check the content in bc2
We show how to create a PRECASTObject object step by step. First, we
create a Seurat list object using the count matrix and meta data of each
data batch. Although bc2
is a prepared Seurat list object,
we re-create it to show the details of the Seurat list object. At the
same time, check the meta data that must include the spatial coordinates
named “row” and “col”, respectively. If the names are not, they are
required to rename them.
## Get the gene-by-spot read count matrices countList <- lapply(bc2, function(x)
## x[['RNA']]@counts)
countList <- lapply(bc2, function(x) {
assay <- DefaultAssay(x)
GetAssayData(x, assay = assay, slot = "counts")
})
M <- length(countList)
## Get the meta data of each spot for each data batch
metadataList <- lapply(bc2, function(x) x@meta.data)
for (r in 1:M) {
meta_data <- metadataList[[r]]
all(c("row", "col") %in% colnames(meta_data)) ## the names are correct!
head(meta_data[, c("row", "col")])
}
## ensure the row.names of metadata in metaList are the same as that of colnames count matrix
## in countList
for (r in 1:M) {
row.names(metadataList[[r]]) <- colnames(countList[[r]])
}
## Create the Seurat list object
seuList <- list()
for (r in 1:M) {
seuList[[r]] <- CreateSeuratObject(counts = countList[[r]], meta.data = metadataList[[r]], project = "BreastCancerPRECAST")
}
bc2 <- seuList
rm(seuList)
head(meta_data[, c("row", "col")])
Next, we use CreatePRECASTObject()
to create a
PRECASTObject based on the Seurat list object bc2
. This
function will do three things:
premin.features
and premin.spots
,
respectively; the spots are retained in raw data (bc2) with at least
premin.features number of nonzero-count features (genes), and the genes
are retained in raw data (bc2) with at least premin.spots
number of spots. To ease presentation, we denote the filtered Seurat
list object as bc2_filter1.gene.number=2000
) for each data batch using
FindSVGs()
function in DR.SC
package for
spatially variable genes or FindVariableFeatures()
function
in Seurat
package for highly variable genes. Next, we
prioritized genes based on the number of times they were selected as
variable genes in all samples and chose the top 2,000 genes. Then denote
the Seurat list object as bc2_filter2, where only 2,000 genes are
retained.postmin.features
and
postmin.spots
, respectively; the spots are retained with at
least post.features
nonzero counts across genes; the
features (genes) are retained with at least postmin.spots
number of nonzero-count spots. Usually, no genes are filltered because
these genes are variable genes.If the argument customGenelist
is not NULL
,
then this function only does (3) not (1) and (2). User can retain the
raw seurat list object by setting
rawData.preserve = TRUE
.
## Create PRECASTObject.
set.seed(2022)
PRECASTObj <- CreatePRECASTObject(bc2, project = "BC2", gene.number = 2000, selectGenesMethod = "SPARK-X",
premin.spots = 20, premin.features = 20, postmin.spots = 1, postmin.features = 10)
## User can retain the raw seuList by the following commond. PRECASTObj <-
## CreatePRECASTObject(seuList, customGenelist=row.names(seuList[[1]]), rawData.preserve =
## TRUE)
Add adjacency matrix list and parameter setting of PRECAST. More model setting parameters can be found in .
## check the number of genes/features after filtering step
PRECASTObj@seulist
## seuList is null since the default value `rawData.preserve` is FALSE.
PRECASTObj@seuList
## Add adjacency matrix list for a PRECASTObj object to prepare for PRECAST model fitting.
PRECASTObj <- AddAdjList(PRECASTObj, platform = "Visium")
## Add a model setting in advance for a PRECASTObj object. verbose =TRUE helps outputing the
## information in the algorithm.
PRECASTObj <- AddParSetting(PRECASTObj, Sigma_equal = FALSE, verbose = TRUE, maxIter = 30)
For function PRECAST
, users can specify the number of
clusters K or set
K
to be an integer vector by using modified BIC(MBIC) to
determine K. First, we try
using user-specified number of clusters. For convenience, we give the
selected number of clusters by MBIC (K=14).
Select a best model if K is
an integer vector. Even if K
is a scalar, this step is also neccessary to re-organize the results in
PRECASTObj
object.
## backup the fitting results in resList
resList <- PRECASTObj@resList
PRECASTObj <- SelectModel(PRECASTObj)
Integrate the two samples using the IntegrateSpaData
function. For computational efficiency, this function exclusively
integrates the variable genes. Specifically, in cases where users do not
specify the PRECASTObj@seuList
or seuList
argument within the IntegrateSpaData
function, it
automatically focuses on integrating only the variable genes. The
default setting for PRECASTObj@seuList
is NULL
when rawData.preserve
in CreatePRECASTObject
is set to FALSE
. For instance:
print(PRECASTObj@seuList)
seuInt <- IntegrateSpaData(PRECASTObj, species = "Human")
seuInt
## The low-dimensional embeddings obtained by PRECAST are saved in PRECAST reduction slot.
There are two ways to use IntegrateSpaData
integrating
all genes, which will require more memory. We recommand running for all
genes on server. The first one is to set value for
PRECASTObj@seuList
.
## assign the raw Seurat list object to it For illustration, we generate a new seuList with
## more genes; For integrating all genes, users can set `seuList <- bc2`.
genes <- c(row.names(PRECASTObj@seulist[[1]]), row.names(bc2[[1]])[1:10])
seuList <- lapply(bc2, function(x) x[genes, ])
PRECASTObj@seuList <- seuList #
seuInt <- IntegrateSpaData(PRECASTObj, species = "Human")
seuInt
The second method is to set a value for the argument
seuList
:
First, user can choose a beautiful color schema using
chooseColors()
.
Show the spatial scatter plot for clusters
p12 <- SpaPlot(seuInt, item = "cluster", batch = NULL, point_size = 1, cols = cols_cluster, combine = TRUE,
nrow.legend = 7)
p12
# users can plot each sample by setting combine=FALSE
Users can re-plot the above figures for specific need by returning a ggplot list object. For example, we plot the spatial heatmap using a common legend.
pList <- SpaPlot(seuInt, item = "cluster", batch = NULL, point_size = 1, cols = cols_cluster, combine = FALSE,
nrow.legend = 7)
drawFigs(pList, layout.dim = c(1, 2), common.legend = TRUE, legend.position = "right", align = "hv")
Show the spatial UMAP/tNSE RGB plot to illustrate the performance in extracting features.
seuInt <- AddUMAP(seuInt)
p13 <- SpaPlot(seuInt, batch = NULL, item = "RGB_UMAP", point_size = 2, combine = TRUE, text_size = 15)
p13
# seuInt <- AddTSNE(seuInt) SpaPlot(seuInt, batch=NULL,item='RGB_TSNE',point_size=2,
# combine=T, text_size=15)
Show the tSNE plot based on the extracted features from PRECAST to check the performance of integration.
seuInt <- AddTSNE(seuInt, n_comp = 2)
p1 <- dimPlot(seuInt, item = "cluster", point_size = 0.5, font_family = "serif", cols = cols_cluster,
border_col = "gray10", nrow.legend = 14, legend_pos = "right") # Times New Roman
p2 <- dimPlot(seuInt, item = "batch", point_size = 0.5, font_family = "serif", legend_pos = "right")
drawFigs(list(p1, p2), layout.dim = c(1, 2), legend.position = "right", align = "hv")
Combined differential expression analysis
library(Seurat)
dat_deg <- FindAllMarkers(seuInt)
library(dplyr)
n <- 10
dat_deg %>%
group_by(cluster) %>%
top_n(n = n, wt = avg_log2FC) -> top10
seuInt <- ScaleData(seuInt)
seus <- subset(seuInt, downsample = 400)
Plot DE genes’ heatmap for each spatial domain identified by PRECAST.
color_id <- as.numeric(levels(Idents(seus)))
library(ggplot2)
## HeatMap
p1 <- doHeatmap(seus, features = top10$gene, cell_label = "Domain", grp_label = F, grp_color = cols_cluster[color_id],
pt_size = 6, slot = "scale.data") + theme(legend.text = element_text(size = 10), legend.title = element_text(size = 13,
face = "bold"), axis.text.y = element_text(size = 5, face = "italic", family = "serif"))
p1
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] rmarkdown_2.29
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.37 R6_2.5.1 fastmap_1.2.0 xfun_0.49
#> [5] maketools_1.3.1 cachem_1.1.0 knitr_1.49 htmltools_0.5.8.1
#> [9] buildtools_1.0.0 lifecycle_1.0.4 cli_3.6.3 sass_0.4.9
#> [13] jquerylib_0.1.4 compiler_4.4.2 sys_3.4.3 tools_4.4.2
#> [17] evaluate_1.0.1 bslib_0.8.0 yaml_2.3.10 formatR_1.14
#> [21] jsonlite_1.8.9 rlang_1.1.4