This vignette introduces the FAST workflow for the analysis of single-section rather than multi-section data, using one human dorsolateral prefrontal cortex (DLPFC) spatial transcriptomics dataset. In this vignette, the workflow of FAST consists of three steps.
We demonstrate the use of FAST on one DLPFC Visium dataset, which can be downloaded to the current working directory with the following command:
githubURL <- "https://github.com/feiyoung/FAST/blob/main/vignettes_data/seulist2_ID9_10.RDS?raw=true"
download.file(githubURL,"seulist2_ID9_10.RDS",mode='wb')
Then load the data into R. Here, we focus on only one section.
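A minimal sketch of the loading step follows. It assumes that the downloaded file stores a list of Seurat objects (as the file name seulist2 suggests) and that the first element is the section of interest. The name dlpfc matches the object used later in this vignette; dlpfc_list is a hypothetical name chosen for illustration.

```r
# Path of the file downloaded by the download.file() command above
rds_path <- "seulist2_ID9_10.RDS"

if (file.exists(rds_path)) {
  # Assumption: the RDS file holds a list of Seurat objects, one per section
  dlpfc_list <- readRDS(rds_path)
  # Keep only the first section for this single-section workflow
  dlpfc <- dlpfc_list[[1]]
} else {
  message("File not found; run the download.file() command first.")
}
```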
The package can be loaded with the command:
First, we view the spatial transcriptomics data from the Visium platform. There are ~15000 genes and ~3600 spots.
We observe that the genes are named by Ensembl IDs. In the following, we convert the Ensembl IDs to gene symbols so that they can be matched against the housekeeping genes used in the downstream analysis for removing unwanted variation.
print(row.names(dlpfc)[1:10])
count <- dlpfc[['RNA']]@counts
row.names(count) <- unname(transferGeneNames(row.names(count), now_name = "ensembl",
                                             to_name = "symbol",
                                             species = "Human", Method = 'eg.db'))
print(row.names(count)[1:10])
seu <- CreateSeuratObject(counts = count, meta.data = dlpfc@meta.data)
seu
We show how to preprocess the data before fitting FAST, including log-normalization (needed if the user fits the Gaussian version of FAST) and the selection of highly variable genes.
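To make the two preprocessing ideas concrete, here is a simplified base-R stand-in on a toy count matrix: library-size log-normalization followed by a variance-based gene ranking. It only illustrates the idea and is not the implementation used in this vignette; in practice Seurat's NormalizeData() and FindVariableFeatures() would be the usual choice, and their gene selection is more elaborate. All object names and the scale factor of 10,000 are assumptions for illustration.

```r
set.seed(1)
# Toy count matrix: 50 genes (rows) x 20 spots (columns)
toy_counts <- matrix(rpois(50 * 20, lambda = 5), nrow = 50,
                     dimnames = list(paste0("gene", 1:50), paste0("spot", 1:20)))

# Log-normalization: scale each spot to a common library size, then log1p
scale_factor <- 1e4
lib_size <- colSums(toy_counts)
toy_lognorm <- log1p(sweep(toy_counts, 2, lib_size, "/") * scale_factor)

# Rank genes by the variance of their normalized values and keep the top 10
gene_var <- apply(toy_lognorm, 1, var)
top_genes <- names(sort(gene_var, decreasing = TRUE))[1:10]
print(top_genes)
```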
The spatial coordinates are stored in the row and col columns of the meta.data of seu, which benefits the identification of spatial coordinates by FAST.
For the function FAST_single, users can specify the number of factors q and the fitted model fit.model. The q argument sets the number of spatial factors to be extracted; a larger value extracts more information but at a higher computational cost. The fit.model argument specifies the version of FAST to be fitted. The Gaussian version (gaussian) models the log-normalized matrix, while the Poisson version (poisson) models the count matrix; the default is poisson. (Note: Running the analysis takes approximately 0.5 minutes on a personal PC.)
Adj_sp <- AddAdj(as.matrix(seu@meta.data[,c("row", "col")]), platform = "Visium")
### set q= 15 here
set.seed(2023)
seu <- FAST_single(seu, Adj_sp=Adj_sp, q= 15, fit.model='poisson')
seu
Users can also use the gaussian version by the following command:
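A sketch of the Gaussian fit, mirroring the Poisson call above with only fit.model changed. It assumes that seu has been log-normalized and that seu, Adj_sp and FAST_single() are available from the earlier steps; the guard simply skips the call otherwise.

```r
# Run only when the objects and function from the earlier steps are available
if (exists("seu") && exists("Adj_sp") && exists("FAST_single", mode = "function")) {
  set.seed(2023)
  # Same settings as the Poisson fit, but modelling the log-normalized matrix
  seu <- FAST_single(seu, Adj_sp = Adj_sp, q = 15, fit.model = "gaussian")
  print(seu)
}
```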
Next, we investigate the performance of dimension reduction by calculating the adjusted McFadden's pseudo R-square. The manual annotations in the meta.data of seu are regarded as the ground truth.
Based on the embeddings from FAST, we use Louvain to perform clustering. Other clustering methods can also be used in this downstream analysis.
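To illustrate what this metric measures, the sketch below computes one common form of the adjusted McFadden's pseudo R-square on toy data: a multinomial logistic regression of the labels on the embeddings is compared with an intercept-only model. The iris data stand in for the embeddings and annotations, and the penalty (number of estimated coefficients) is an assumption; the function used in this vignette may differ in details.

```r
library(nnet)  # multinom() for multinomial logistic regression

# Toy stand-ins: iris measurements play the role of the embeddings,
# Species plays the role of the manual annotations
embeds <- as.matrix(iris[, 1:4])
labels <- iris$Species
dat <- data.frame(embeds, label = labels)

# Full model (labels ~ embeddings) and intercept-only null model
fit_full <- multinom(label ~ ., data = dat, trace = FALSE)
fit_null <- multinom(label ~ 1, data = dat, trace = FALSE)

ll_full <- as.numeric(logLik(fit_full))
ll_null <- as.numeric(logLik(fit_null))
n_par <- length(coef(fit_full))  # number of estimated coefficients

# Adjusted McFadden's pseudo R-square: penalize the full log-likelihood
r2_adj <- 1 - (ll_full - n_par) / ll_null
print(round(r2_adj, 3))
```

A value closer to 1 means the embeddings carry more information about the annotated labels.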
seu <- FindNeighbors(seu, reduction = 'fast')
seu <- FindClusters(seu, resolution = 0.4)
seu$fast.cluster <- seu$seurat_clusters
ARI.fast <- mclust::adjustedRandIndex(y, seu$fast.cluster)
print(paste0("ARI of FAST is ", round(ARI.fast, 3)))
For comparison, we also run PCA to obtain PCA embeddings, and then conduct Louvain clustering.
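For readers unfamiliar with the adjusted Rand index (ARI), the following base-R sketch computes it from the contingency table of two partitions. The function ari() and the toy vectors are hypothetical names for illustration only; the vignette itself relies on mclust::adjustedRandIndex().

```r
# Adjusted Rand index from the contingency table of two partitions
ari <- function(truth, cluster) {
  tab <- table(truth, cluster)
  sum_cells <- sum(choose(tab, 2))            # pairs placed together in both partitions
  sum_rows <- sum(choose(rowSums(tab), 2))    # pairs placed together in truth
  sum_cols <- sum(choose(colSums(tab), 2))    # pairs placed together in cluster
  expected <- sum_rows * sum_cols / choose(sum(tab), 2)
  max_index <- (sum_rows + sum_cols) / 2
  (sum_cells - expected) / (max_index - expected)
}

truth <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
same <- c("a", "a", "a", "b", "b", "b", "c", "c", "c")  # same partition, relabeled
noisy <- c(1, 1, 2, 2, 2, 3, 3, 3, 1)                    # some spots misassigned

print(ari(truth, same))   # identical partitions give 1
print(ari(truth, noisy))
```

The ARI is invariant to relabeling of the clusters, which is why it suits comparisons between manual annotations and cluster IDs.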
seu <- ScaleData(seu)
seu <- RunPCA(seu, npcs=15, verbose=FALSE)
Mac.pca <- get_r2_mcfadden(Embeddings(seu, reduction='pca'), y)
print(paste0("McFadden's R-square of PCA is ", round(Mac.pca, 3)))
set.seed(1)
seu <- FindNeighbors(seu, reduction = 'pca', graph.name ="pca.graph")
seu <- FindClusters(seu, resolution = 0.8,graph.name = 'pca.graph')
seu$pca.cluster <- seu$seurat_clusters
ARI.pca <- mclust::adjustedRandIndex(y, seu$pca.cluster)
print(paste0("ARI of PCA is ", round(ARI.pca, 3)))
First, users can choose a color scheme using chooseColors() in the R package PRECAST.
Then, we draw the spatial scatter plot of the clusters using the function DimPlot() in the R package Seurat. We observe that the clusters from PCA are noisier, while the clusters from FAST are spatially smoother.
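The plotting code below expects a vector of colors named cols_cluster. As a base-R stand-in for chooseColors(), the sketch below builds such a vector with grDevices::hcl.colors(); the number of clusters and the palette name are assumptions chosen for illustration, not the settings used to produce the figures in this vignette.

```r
# Hypothetical number of clusters; in the workflow this could be
# length(unique(seu$fast.cluster))
n_clusters <- 7

# Qualitative palette from base R (grDevices), used here in place of chooseColors()
cols_cluster <- grDevices::hcl.colors(n_clusters, palette = "Dark 3")
print(cols_cluster)
```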
seu <- PRECAST::Add_embed(embed = as.matrix(seu@meta.data[,c("row", "col")]), seu, embed_name = 'Spatial')
seu
p1 <- DimPlot(seu, reduction = 'Spatial', group.by = 'pca.cluster',cols = cols_cluster, pt.size = 1.5)
p2 <- DimPlot(seu, reduction = 'Spatial', group.by = 'fast.cluster',cols = cols_cluster, pt.size = 1.5)
drawFigs(list(p1, p2),layout.dim = c(1,2) )
Next, we visualize the clusters from FAST in the UMAP space, and observe that the clusters are generally well separated.
Finally, we conduct the differential expression (DE) analysis. The function FindAllMarkers() in the Seurat R package is used for this analysis, and we extract the top five DE genes for each cluster.
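A sketch of this UMAP step, assuming Seurat is attached and seu carries the fast reduction and the fast.cluster labels created above. The choice dims = 1:15 is an assumption that simply matches q = 15 from the FAST fit; the guard skips the code when these objects are not available.

```r
# Run only when Seurat is attached and the FAST results exist on seu
if (exists("seu") && exists("RunUMAP", mode = "function")) {
  # Compute UMAP coordinates from the 15 FAST factors
  seu <- RunUMAP(seu, reduction = "fast", dims = 1:15)
  # Color the UMAP scatter plot by the FAST-based clusters
  print(DimPlot(seu, reduction = "umap", group.by = "fast.cluster"))
}
```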
Idents(seu) <- seu$fast.cluster
dat_deg <- FindAllMarkers(seu)
library(dplyr)
n <- 5
dat_deg %>%
  group_by(cluster) %>%
  top_n(n = n, wt = avg_log2FC) -> top5
top5
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] rmarkdown_2.29
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.37 R6_2.5.1 fastmap_1.2.0 xfun_0.49
#> [5] maketools_1.3.1 cachem_1.1.0 knitr_1.49 htmltools_0.5.8.1
#> [9] buildtools_1.0.0 lifecycle_1.0.4 cli_3.6.3 sass_0.4.9
#> [13] jquerylib_0.1.4 compiler_4.4.2 sys_3.4.3 tools_4.4.2
#> [17] evaluate_1.0.1 bslib_0.8.0 yaml_2.3.10 jsonlite_1.8.9
#> [21] rlang_1.1.4