This vignette introduces the FAST
workflow for the analysis of two humn dorsolateral prefrontal cortex
(DLPFC) spatial transcriptomics dataset. FAST workflow is based on the
PRECASTObj
object created in the PRECAST
R
package and the workflow of FAST is similar to that of PRECAST; see the
vignette (https://feiyoung.github.io/PRECAST/articles/PRECAST.BreastCancer.html)
for the workflow of PRECAST. In this vignette, the workflow of FAST
consists of three steps
We demonstrate the use of FAST to two DLPFC Visium data that are here, which can be downloaded to the current working path by the following command:
githubURL <- "https://github.com/feiyoung/FAST/blob/main/vignettes_data/seulist2_ID9_10.RDS?raw=true"
download.file(githubURL,"seulist2_ID9_10.RDS",mode='wb')
Then load to R
The package can be loaded with the command:
First, we view the the two spatial transcriptomics data with Visium platform. There are ~15000 genes for each data batch and ~7000 spots in total.
Check the content in dlpfc2
.
We observed the genes are Ensembl IDs. In the following, we will transfer the Ensembl IDs to gene symbols for matching the housekeeping genes in the downstream analysis for removing the unwanted variations.
We show how to create a PRECASTObject object step by step. First, we create a Seurat list object using the count matrix and meta data of each data batch.
row
and col
, which benefits the
identification of spaital coordinates by FAST## Get the gene-by-spot read count matrices
countList <- lapply(dlpfc2, function(x) x[["RNA"]]@counts)
M <- length(countList)
### transfer the Ensembl ID to symbol
for(r in 1:M){
row.names(countList[[r]]) <- transferGeneNames(row.names(countList[[r]]), now_name = "ensembl",
to_name="symbol",
species="Human", Method='eg.db')
}
## Check the spatial coordinates: Yes, they are named as "row" and "col"!
print(head(dlpfc2[[1]]@meta.data))
## Get the meta data of each spot for each data batch
metadataList <- lapply(dlpfc2, function(x) x@meta.data)
## ensure the row.names of metadata in metaList are the same as that of colnames count matrix in countList
for(r in 1:M){
row.names(metadataList[[r]]) <- colnames(countList[[r]])
}
## Create the Seurat list object
seuList <- list()
for(r in 1:M){
seuList[[r]] <- CreateSeuratObject(counts = countList[[r]], meta.data=metadataList[[r]], project = "FASTdlpfc2")
}
print(seuList)
row.names(seuList[[1]])[1:10]
Next, we use CreatePRECASTObject()
to create a
PRECASTObject object based on the Seurat list object
seuList
. Here, we select the top 2000 highly variable genes
for followed analysis. Users are able to see https://feiyoung.github.io/PRECAST/articles/PRECAST.BreastCancer.html
for what is done in this function.
## Create PRECASTObject
set.seed(2023)
PRECASTObj <- CreatePRECASTObject(seuList, project = "FASTdlpfc2", gene.number = 2000, selectGenesMethod = "HVGs",
premin.spots = 20, premin.features = 20, postmin.spots = 1, postmin.features = 10)
## User can retain the raw seuList by the following commond.
## PRECASTObj <- CreatePRECASTObject(seuList, customGenelist=row.names(seuList[[1]]), rawData.preserve = TRUE)
## check the number of genes/features after filtering step
PRECASTObj@seulist
Add adjacency matrix list and parameter setting of FAST. More model
setting parameters can be found in model_set_FAST()
.
## seuList is null since the default value `rawData.preserve` is FALSE.
PRECASTObj@seuList
## Add adjacency matrix list for a PRECASTObj object to prepare for FAST model fitting.
PRECASTObj <- AddAdjList(PRECASTObj, platform = "Visium")
## Add a model setting in advance for a PRECASTObj object: verbose =TRUE helps outputing the information in the algorithm;
PRECASTObj <- AddParSettingFAST(PRECASTObj, verbose=TRUE)
## Check the parameters
PRECASTObj@parameterList
For function FAST
, users can specify the number of
factors q
and the fitted model fit.model
. The
q
sets the number of spatial factors to be extracted, and a
lareger one means more information to be extracted but higher
computaional cost. The fit.model
specifies the version of
FAST to be fitted. The Gaussian version (gaussian
) models
the log-count matrices while the Poisson verion (poisson
)
models the count matrices; default as poisson
. For a fast
implementation, we run Gaussian version of FAST here. (Note: The
computational time required to run the analysis on personal PCs is
approximately ~1 minute.)
Next, we investigate the performance of dimension reduction by
calculating the adjusted McFadden’s pseudo R-square for each data batch.
The manual annotations are regarded as the groud truth in the
meta.data
of each component of
PRECASTObj@seulist
.
Based on the embeddings from FAST, we use iSC-MEB
to
jointly align the embeddings and perform clustering. In this downstream
analysis, other methods for embedding alignment and clustering can be
also used. In the vignette of the simulated ST data, we show another
method (Harmony
+Louvain
) to perform embedding
alignment and clustering.
First, we use the function RunHarmonyLouvain()
in
ProFAST
R package to select the number of clusters
PRECASTObj <- RunHarmonyLouvain(PRECASTObj, resolution = 0.3)
ARI_vec_louvain <- sapply(1:M, function(r) mclust::adjustedRandIndex(PRECASTObj@resList$Louvain$cluster[[r]], yList[[r]]))
print(ARI_vec_louvain)
Then, we use the function RuniSCMEB()
in
ProFAST
R package to jointly perform embedding alignment
and spatial clustering. The results are saved in the slot
PRECASTObj@resList$iSCMEB
.
In the following, we remove the unwanted variations in the
log-normalized expression matrices to obtain a combined log-normalized
expression matrix in a Seurat object. For this real data, we use
housekeeping genes as negative control to remove the unwanted
variations. Specifically, we leverage the batch effect embeddings
estimated through housekeeping gene expression matrix to capture
unwanted variations. Additionally, we utilize the cluster labels
obtained via iSC-MEB to retain the desired biological effects. The
posterior probability of cluster labels (RList) are in the slot
PRECASTObj@resList$iSCMEB
.
To capture the unwanted variation, we first select the select the
housekeeping genes using the function SelectHKgenes()
.
## select the HK genes
HKgenes <- SelectHKgenes(seuList, species= "Human", HK.number=200)
seulist_HK <- lapply(seuList, function(x) x[HKgenes, ])
seulist_HK
Then, we integrate the two sections by removing the unwanted
variations by using seulist_HK=seulist_HK
and
Method = "iSC-MEB"
in the function
IntegrateSRTData()
. After obtaining seuInt
, we
will see there are three embeddings: FAST
,
iscmeb
and position
, in the slot
seuInt@reductions
. FAST
are the embeddings
obtained by FAST model fitting and are uncorrected embeddings that may
includes the unwanted effects (i.e., batch effects);
iscmeb
are the aligned embeddings obtained by iSC-MEB
fitting; and position
are the spatial coordinates.
First, user can choose a beautiful color schema using
chooseColors()
in the R package PRECAST
.
Then, we plot the spatial scatter plot for clusters using the
function SpaPlot()
in the R package
PRECAST
.
p12 <- SpaPlot(seuInt, item = "cluster", batch = NULL, point_size = 1, cols = cols_cluster, combine = FALSE)
library(ggplot2)
p12 <- lapply(p12, function(x) x+ coord_flip() + scale_x_reverse())
drawFigs(p12, layout.dim = c(1,2), common.legend = T)
We use the function AddUMAP()
in the R package
PRECAST
to obtain the three-dimensional components of UMAP
using the aligned embeddings in the reduction harmony
.
We plot the spatial tNSE RGB plot to illustrate the performance in extracting features.
p13 <- SpaPlot(seuInt, batch = NULL, item = "RGB_UMAP", point_size = 1, combine = FALSE, text_size = 15)
p13 <- lapply(p13, function(x) x+ coord_flip() + scale_x_reverse())
drawFigs(p13, layout.dim = c(1, 2), common.legend = TRUE, legend.position = "right", align = "hv")
We use the function AddUTSNE()
in the R package
PRECAST
to obtain the two-dimensional components of UMAP
using the aligned embeddings in the reduction iscmeb
, and
plot the tSNE plot based on the extracted features to check the
performance of integration.
seuInt <- AddTSNE(seuInt, n_comp = 2, reduction = 'iscmeb', assay = 'RNA')
p1 <- dimPlot(seuInt, item = "cluster", point_size = 0.5, font_family = "serif", cols = cols_cluster,
border_col = "gray10", legend_pos = "right") # Times New Roman
p2 <- dimPlot(seuInt, item = "batch", point_size = 0.5, font_family = "serif", legend_pos = "right")
drawFigs(list(p1, p2), layout.dim = c(1, 2), legend.position = "right", align = "hv")
Finally, we condut the combined differential expression analysis
using the integrated log-normalized expression matrix saved in the
seuInt
object. The function FindAllMarkers()
in the Seurat
R package is ued to achieve this
analysis.
dat_deg <- FindAllMarkers(seuInt)
library(dplyr)
n <- 5
dat_deg %>%
group_by(cluster) %>%
top_n(n = n, wt = avg_log2FC) -> top10
Plot dot plot of normalized expressions of the layer-specified marker genes for each spatial domain identified by using the FAST embeddings.
col_here <- c("#F2E6AB", "#9C0141")
library(ggplot2)
marker_Seurat <- readRDS("151507_DEGmarkerTop_5.rds")
marker_Seurat <- lapply(marker_Seurat, function(x) transferGeneNames(x, Method='eg.db'))
maker_genes <- unlist(lapply(marker_Seurat, toupper))
features <- maker_genes[!duplicated(maker_genes)]
p_dot <- DotPlot(seuInt, features=unname(features), cols=col_here, # idents = ident_here,
col.min = -1, col.max = 1) + coord_flip()+ theme(axis.text.y = element_text(face = "italic"))+
ylab("Domain") + xlab(NULL) + theme(axis.text.x = element_text(size=12, angle = 25, hjust = 1, family='serif'),
axis.text.y = element_text(size=12, face= "italic", family='serif'))
p_dot
# table(unlist(yList), Idents(seuInt))
Based on the marker genes, we annotate the Domains 1-9 as Layer5/6,
Layer3/4, Layer5, Layer2/3, Layer2/3, Layer6, WM, Layer5, Layer1. Then,
we rename the Domains using the function
RenameIdents()
.
seuInt2 <- RenameIdents(seuInt, '1' = 'Layer6', '2' = 'Layer2/3', '3'="Layer5", '4'="WM",
'5'='Layer3/4', '6'= 'Layer1')
levels(seuInt2) <- c("Layer1", "Layer2/3", "Layer3/4", "Layer5", "Layer6", "WM")
p_dot2 <- DotPlot(seuInt2, features=unname(features), cols=col_here, # idents = ident_here,
col.min = -1, col.max = 1) + coord_flip()+ theme(axis.text.y = element_text(face = "italic"))+
ylab("Domain") + xlab(NULL) + theme(axis.text.x = element_text(size=12, angle = 25, hjust = 1, family='serif'),
axis.text.y = element_text(size=12, face= "italic", family='serif'))
p_dot2
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] rmarkdown_2.29
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.37 R6_2.5.1 fastmap_1.2.0 xfun_0.49
#> [5] maketools_1.3.1 cachem_1.1.0 knitr_1.49 htmltools_0.5.8.1
#> [9] buildtools_1.0.0 lifecycle_1.0.4 cli_3.6.3 sass_0.4.9
#> [13] jquerylib_0.1.4 compiler_4.4.2 sys_3.4.3 tools_4.4.2
#> [17] evaluate_1.0.1 bslib_0.8.0 yaml_2.3.10 jsonlite_1.8.9
#> [21] rlang_1.1.4