---
title: 'PRECAST: Four DLPFC Sample Analysis'
author: "Wei Liu"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{PRECAST: Four DLPFC Sample Analysis}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
tidy = TRUE,
fig.width = 11,
tidy.opts = list(width.cutoff = 95),
message = FALSE,
warning = FALSE,
time_it = TRUE
)
all_times <- list() # store the time for each chunk
knitr::knit_hooks$set(time_it = local({
now <- NULL
function(before, options) {
if (before) {
now <<- Sys.time()
} else {
res <- difftime(Sys.time(), now, units = "secs")
all_times[[options$label]] <<- res
}
}
}))
```
This vignette introduces the PRECAST workflow for the analysis of four spatial transcriptomics datasets. The workflow consists of three steps
* Independent preprocessing and model setting
* Probabilistic embedding and clustering using PRECAST model
* Downstream analysis (i.e. visualization of clusters and embeddings)
We demonstrate the use of PRECAST to four human dorsolateral prefrontal cortex Visium data. Download the data to R
```{r eval =FALSE}
suppressPackageStartupMessages(library(Seurat))
suppressPackageStartupMessages(library(SingleCellExperiment))
name_ID4 <- as.character(c(151673, 151674, 151675, 151676))
### Read data in an online manner
n_ID <- length(name_ID4)
url_brainA <- "https://github.com/feiyoung/DR-SC.Analysis/raw/main/data/DLPFC_data/"; url_brainB <- ".rds"
seuList <- list()
if(!require(ProFAST)){
remotes::install_github("feiyoung/ProFAST")
}
for(i in 1:n_ID){
# i <- 1
cat('input brain data', i, '\n')
# load and read data
dlpfc <- readRDS(url(paste0(url_brainA, name_ID4[i],url_brainB) ))
count <- dlpfc@assays@data$counts
row.names(count) <- ProFAST::transferGeneNames(row.names(count), species = "Human")
seu1 <- CreateSeuratObject(counts = count,
meta.data = as.data.frame(colData(dlpfc)),
min.cells = 10,min.features = 10)
seuList[[i]] <- seu1
}
# saveRDS(seuList, file='seuList4.RDS')
```
The package can be loaded with the command:
```{r eval = FALSE}
library(PRECAST)
```
## Compare PRECAST with DR-SC in analyzing one sample
First, we view the the spatial transcriptomics data with Visium platform.
```{r eval = FALSE}
seuList <- readRDS("seuList4.RDS")
seuList ## a list including Seurat objects
```
check the meta data that must include the spatial coordinates named "row" and "col", respectively.
If the names are not, they are required to rename them.
```{r eval = FALSE}
metadataList <- lapply(seuList, function(x) x@meta.data)
for(r in seq_along(metadataList)){
meta_data <- metadataList[[r]]
cat(all(c("row", "col") %in% colnames(meta_data)), '\n') ## the names are correct!
}
```
### Prepare the PRECASTObject.
Create a PRECASTObj object to prepare for PRECAST models. Here, we only show the `HVGs` method to select the 2000 highly variable genes, but users are able to choose more genes or also use the `SPARK-X` to choose the spatially variable genes.
```{r eval = FALSE}
set.seed(2023)
preobj <- CreatePRECASTObject(seuList = seuList, selectGenesMethod="HVGs",
gene.number = 2000) #
```
### Add the model setting
```{r eval = FALSE}
## check the number of genes/features after filtering step
preobj@seulist
## Add adjacency matrix list for a PRECASTObj object to prepare for PRECAST model fitting.
PRECASTObj <- AddAdjList(preobj, platform = "Visium")
## Add a model setting in advance for a PRECASTObj object. verbose =TRUE helps outputing the
## information in the algorithm.
PRECASTObj <- AddParSetting(PRECASTObj, Sigma_equal = TRUE, coreNum = 1,
maxIter=30, verbose = TRUE)
```
### Fit PRECAST
For function `PRECAST`, users can specify the number of clusters $K$ or set `K` to be an integer vector by using modified BIC(MBIC) to determine $K$. Here, we use user-specified number of clusters.
```{r eval = FALSE}
### Given K
PRECASTObj <- PRECAST(PRECASTObj, K= 7)
```
Use the function `SelectModel()` to re-organize the fitted results in PRECASTObj.
```{r eval = FALSE}
## backup the fitting results in resList
resList <- PRECASTObj@resList
PRECASTObj <- SelectModel(PRECASTObj)
ari_precast <- sapply(1:length(seuList), function(r) mclust::adjustedRandIndex(PRECASTObj@resList$cluster[[r]], PRECASTObj@seulist[[r]]$layer_guess_reordered))
mat <- matrix(round(ari_precast,2), nrow=1)
name_ID4 <- as.character(c(151673, 151674, 151675, 151676))
colnames(mat) <- name_ID4
DT::datatable(mat)
```
**Other options**
Users are also able to set multiple K, then choose the best one
```{r eval =FALSE}
PRECASTObj2 <- AddParSetting(PRECASTObj, Sigma_equal = FALSE, coreNum = 4,
maxIter=30, verbose = TRUE) # set 4 cores to run in parallel.
PRECASTObj2 <- PRECAST(PRECASTObj2, K= 5:8)
## backup the fitting results in resList
resList2 <- PRECASTObj2@resList
PRECASTObj2 <- SelectModel(PRECASTObj2)
```
Besides, user can also use different initialization method by setting `int.model`, for example, set `int.model=NULL`; see the functions `AddParSetting()` and `model_set()` for more details.
### Integration and Visualization
Integrate the all samples using the `IntegrateSpaData` function. For computational efficiency, this function exclusively integrates the variable genes. Specifically, in cases where users do not specify the `PRECASTObj@seuList` or `seuList` argument within the `IntegrateSpaData` function, it automatically focuses on integrating only the variable genes. The default setting for `PRECASTObj@seuList` is `NULL` when `rawData.preserve` in `CreatePRECASTObject` is set to `FALSE`. For instance:
```{r eval = FALSE}
print(PRECASTObj@seuList)
seuInt <- IntegrateSpaData(PRECASTObj, species='Human')
seuInt
## The low-dimensional embeddings obtained by PRECAST are saved in PRECAST reduction slot.
```
**Integrating all genes**
There are two ways to use `IntegrateSpaData` integrating all genes, which will require more memory. We recommand running for all genes on server. The first one is to set value for `PRECASTObj@seuList`.
```{r eval =FALSE}
## assign the raw Seurat list object to it.
## For illustration, we generate a new seuList with more genes;
## For integrating all genes, users can set `seuList <- bc2`.
genes <- c(row.names(PRECASTObj@seulist[[1]]), row.names(seuList[[1]])[1:10])
seuList_sub <- lapply(seuList, function(x) x[genes,])
PRECASTObj@seuList <- seuList_sub #
seuInt <- IntegrateSpaData(PRECASTObj, species='Human')
seuInt
```
The second method is to set a value for the argument `seuList`:
```{r eval =FALSE}
PRECASTObj@seuList <- NULL
## At the same time, we can set subsampling to speed up the computation.
seuInt <- IntegrateSpaData(PRECASTObj, species='Human', seuList=seuList_sub, subsample_rate = 0.5)
seuInt
```
First, user can choose a beautiful color schema using `chooseColors()`.
```{r eval = FALSE}
cols_cluster <- chooseColors(palettes_name = 'Classic 20', n_colors=7, plot_colors = TRUE)
```
Show the spatial scatter plot for clusters
```{r eval = FALSE, fig.height = 8, fig.width=9}
p12 <- SpaPlot(seuInt, item='cluster', batch=NULL,point_size=1, cols=cols_cluster, combine=TRUE, nrow.legend=7)
p12
# users can plot each sample by setting combine=FALSE
```
Users can re-plot the above figures for specific need by returning a ggplot list object. For example, we plot the spatial heatmap using a common legend.
```{r eval = FALSE, fig.height = 8, fig.width=8.5}
library(ggplot2)
pList <- SpaPlot(seuInt, item='cluster', batch=NULL,point_size=2.5, cols=cols_cluster, combine=FALSE, nrow.legend=7)
pList <- lapply(pList, function(x) x + coord_flip() + scale_x_reverse())
drawFigs(pList, layout.dim = c(2,2), common.legend = TRUE, legend.position = 'right', align='hv')
```
Show the spatial UMAP/tNSE RGB plot to illustrate the performance in extracting features.
```{r eval = FALSE, fig.height = 6, fig.width=5.5}
seuInt <- AddUMAP(seuInt)
p13List <- SpaPlot(seuInt, batch=NULL,item='RGB_UMAP',point_size=2, combine=FALSE, text_size=15)
p13List <- lapply(p13List, function(x) x + coord_flip() + scale_x_reverse())
drawFigs(p13List, layout.dim = c(2,2), common.legend = TRUE, legend.position = 'right', align='hv')
#seuInt <- AddTSNE(seuInt)
#SpaPlot(seuInt, batch=NULL,item='RGB_TSNE',point_size=2, combine=T, text_size=15)
```
Show the tSNE plot based on the extracted features from PRECAST to check the performance of integration.
```{r eval = FALSE, fig.height = 4.5, fig.width=12}
seuInt <- AddTSNE(seuInt, n_comp = 2)
p1 <- dimPlot(seuInt, item='cluster', point_size = 0.5, font_family='serif', cols=cols_cluster,border_col="gray10", nrow.legend=14, legend_pos='right') # Times New Roman
p2 <- dimPlot(seuInt, item='batch', point_size = 0.5, font_family='serif', legend_pos='right')
drawFigs(list(p1, p2), layout.dim = c(1,2), legend.position = 'right', align='hv')
```
Combined differential expression analysis
```{r eval = FALSE}
library(Seurat)
dat_deg <- FindAllMarkers(seuInt)
library(dplyr)
n <- 5
dat_deg %>%
group_by(cluster) %>%
top_n(n = n, wt = avg_log2FC) -> top10
```
Plot DE genes' heatmap for each spatial domain identified by PRECAST.
```{r eval = FALSE, fig.height = 6, fig.width=8}
library(ggplot2)
## HeatMap
p1 <- DotPlot(seuInt, features = unique(top10$gene), col.min = 0, col.max = 1) +
theme(axis.text.x = element_text(angle = 45, hjust = 1, size=8))
p1
```
**Session Info**
```{r}
sessionInfo()
```