Package 'SpaTopic' reference manual

Title:	Topic Inference to Identify Tissue Architecture in Multiplexed Images
Description:	A novel spatial topic model that integrates both cell type and spatial information to identify the complex spatial tissue architecture on multiplexed tissue images without human intervention. The package implements a collapsed Gibbs sampling algorithm for inference. 'SpaTopic' scales to large image datasets without extracting neighborhood information for every single cell. For more details on the methodology, see <https://xiyupeng.github.io/SpaTopic/>.
Authors:	Xiyu Peng [aut, cre]
Maintainer:	Xiyu Peng <[email protected]>
License:	GPL (>= 3)
Version:	1.2.0
Built:	2025-03-03 19:21:51 UTC
Source:	CRAN

Example input data for 'SpaTopic'

Description

Multiplexed image data from a tumor tissue sample of a non-small cell lung cancer (NSCLC) patient.

Usage

lung5

Format

'lung5': a data frame with 100149 rows and 4 columns:

image: Image ID
X: X coordinate of the cell
Y: Y coordinate of the cell
type: cell type
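As an illustration of the expected layout, the following sketch builds a small synthetic data frame with the same four columns as lung5 and checks it with a hypothetical helper, check_tissue_format(), which is not part of 'SpaTopic'. The coordinates and cell type labels are made up.

```r
# Synthetic toy data in the lung5 column layout (image, X, Y, type).
# Values are random and do NOT come from the lung5 dataset.
set.seed(1)
n_cells <- 200
toy_tissue <- data.frame(
  image = "image1",                      # one image ID, recycled for every row
  X = runif(n_cells, min = 0, max = 2000),
  Y = runif(n_cells, min = 0, max = 2000),
  type = sample(c("tumor", "fibroblast", "T cell", "macrophage"),
                n_cells, replace = TRUE)
)

# Hypothetical helper (not in SpaTopic): check the columns SpaTopic expects.
check_tissue_format <- function(df) {
  required <- c("image", "X", "Y", "type")
  missing_cols <- setdiff(required, colnames(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  if (!is.numeric(df$X) || !is.numeric(df$Y)) {
    stop("X and Y must be numeric coordinates")
  }
  invisible(TRUE)
}

check_tissue_format(toy_tissue)
str(toy_tissue)
```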

Source

<https://nanostring.com/products/cosmx-spatial-molecular-imager/ffpe-dataset/nsclc-ffpe-dataset/>

Print method for SpaTopic objects

Description

Provides a formatted summary of SpaTopic results when the object is printed. This method displays key model metrics and explains how to access different components of the model output.

Usage

## S3 method for class 'SpaTopic'
print(x, ...)

Arguments

`x`	An object of class "SpaTopic" returned by the SpaTopic_inference function
`...`	Additional arguments passed to print methods (not used)

Details

The method displays:
- Number of topics identified
- Model perplexity (lower is better)
- DIC (Deviance Information Criterion) for model comparison
- A preview of the topic distributions across cell types
- Instructions on how to access the full results
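The perplexity and deviance reported here are defined for SpaTopic objects as exp(-loglikelihood / N) and -2 * loglikelihood, where N is the total number of cells across all images. The sketch below applies these two formulas to hypothetical log-likelihood values, showing that a higher log-likelihood gives a lower perplexity. The helper names are illustrative only and are not exported by 'SpaTopic'.

```r
# Formulas from the SpaTopic class documentation:
#   Perplexity = exp(-loglikelihood / N), Deviance = -2 * loglikelihood
perplexity_from_loglik <- function(loglik, n_cells) exp(-loglik / n_cells)
deviance_from_loglik <- function(loglik) -2 * loglik

# Hypothetical log-likelihoods for two fits on the same N cells
n_cells <- 1000
loglik_a <- -1500
loglik_b <- -1200

perplexity_from_loglik(loglik_a, n_cells)  # exp(1.5), about 4.48
perplexity_from_loglik(loglik_b, n_cells)  # exp(1.2), about 3.32
deviance_from_loglik(loglik_b)             # 2400
```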

Value

No return value, called for side effect of printing

Examples

# If gibbs.res is a SpaTopic object:
# print(gibbs.res)

Convert a Seurat v5 object into input for 'SpaTopic'

Description

Prepares 'SpaTopic' input from a single Seurat v5 object.

Usage

Seurat5obj_to_SpaTopic(object, group.by, image = "image1")

Arguments

`object`	Seurat v5 object
`group.by`	`character`. The name of the column that contains cell type information in the Seurat object.
`image`	`character`. The name of the image. Default is "image1".

Value

Returns a data frame that can be used as input to 'SpaTopic'.
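The sketch below illustrates the kind of reshaping this conversion performs, using plain data frames in place of a Seurat v5 object: cell metadata with cell IDs as row names, and a coordinate table with columns x, y and cell. The helper to_spatopic_df() and these column names are assumptions for illustration; this is not the package's implementation, which reads the corresponding pieces from the Seurat object.

```r
# Hypothetical stand-ins for a Seurat object's contents:
# 'meta' plays the role of cell metadata (row names are cell IDs),
# 'coords' plays the role of cell centroid coordinates.
meta <- data.frame(
  predicted.annotation.l1 = c("tumor", "T cell", "fibroblast"),
  row.names = c("cell_1", "cell_2", "cell_3")
)
coords <- data.frame(
  x = c(10.5, 200.0, 35.2),
  y = c(400.1, 12.0, 88.8),
  cell = c("cell_3", "cell_1", "cell_2")  # order need not match 'meta'
)

# Hypothetical helper: build the (image, X, Y, type) layout used by SpaTopic
to_spatopic_df <- function(meta, coords, group.by, image = "image1") {
  idx <- match(coords$cell, rownames(meta))  # align metadata rows by cell ID
  if (anyNA(idx)) stop("Some cells in 'coords' have no metadata")
  data.frame(
    image = image,
    X = coords$x,
    Y = coords$y,
    type = meta[[group.by]][idx],
    row.names = coords$cell
  )
}

tissue <- to_spatopic_df(meta, coords, group.by = "predicted.annotation.l1")
tissue
```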

Examples


## nano.obj is a Seurat v5 object
#dataset<-Seurat5obj_to_SpaTopic(object = nano.obj, 
#                 group.by = "predicted.annotation.l1",image = "image1")
## The expected output has the same format as lung5:
data("lung5")

'SpaTopic': fast topic inference to identify tissue architecture in multiplexed images

Description

This is the main function of 'SpaTopic'. It implements a collapsed Gibbs sampling algorithm to learn topics, which correspond to different tissue microenvironments, across multiple multiplexed tissue images. The function takes cell labels and cell coordinates on tissue images as input, and returns the inferred topic label for every cell as well as the topic contents (each topic is a distribution over cell types). The function recovers spatial tissue architectures across images and indicates cell-cell interactions within each domain.
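For intuition, the core step of a collapsed Gibbs sampler for a standard topic model (latent Dirichlet allocation) resamples each cell's topic from $p(z_i = k \mid \cdot) \propto (N_{dk} + \alpha)(N_{wk} + \beta) / (N_k + W\beta)$, where the counts exclude cell i, d is the cell's region (document), w is its cell type (word) and W is the number of cell types. The sketch below computes this update with toy Ndk and Nwk matrices. It is a textbook LDA step under these assumptions, not SpaTopic's C++ implementation, which also uses spatial information to link each cell to nearby regions (see `sigma` and `kneigh`).

```r
# Textbook LDA collapsed Gibbs full conditional for one cell.
# Ndk: regions x topics counts; Nwk: cell types x topics counts.
gibbs_topic_probs <- function(Ndk, Nwk, d, w, k_current, alpha, beta) {
  # Remove the cell's current assignment from the counts
  Ndk[d, k_current] <- Ndk[d, k_current] - 1
  Nwk[w, k_current] <- Nwk[w, k_current] - 1
  W <- nrow(Nwk)        # number of cell types
  Nk <- colSums(Nwk)    # cells per topic (after removal)
  p <- (Ndk[d, ] + alpha) * (Nwk[w, ] + beta) / (Nk + W * beta)
  p / sum(p)            # normalize to a probability vector over topics
}

# Toy counts: 2 regions, 3 cell types, 2 topics (20 cells in total)
Ndk <- matrix(c(8, 2,
                1, 9), nrow = 2, byrow = TRUE)
Nwk <- matrix(c(6, 0,
                2, 3,
                1, 8), nrow = 3, byrow = TRUE)

# A cell in region 1, of cell type 1, currently assigned to topic 1
probs <- gibbs_topic_probs(Ndk, Nwk, d = 1, w = 1, k_current = 1,
                           alpha = 0.01, beta = 0.05)
set.seed(123)
new_topic <- sample(seq_along(probs), size = 1, prob = probs)
```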

Usage

SpaTopic_inference(
  tissue,
  ntopics,
  sigma = 50,
  region_radius = 400,
  kneigh = 5,
  npoints_selected = 1,
  ini_LDA = TRUE,
  ninit = 10,
  niter_init = 100,
  beta = 0.05,
  alpha = 0.01,
  trace = FALSE,
  seed = 123,
  thin = 20,
  burnin = 1000,
  niter = 200,
  display_progress = TRUE,
  do.parallel = FALSE,
  n.cores = 1,
  axis = "2D"
)

Arguments

`tissue`	(Required). A data frame, or a list of data frames with one data frame per image. Each row represents a cell, with its image ID, X and Y coordinates on the image, and cell type in columns named image, X, Y and type, respectively. For a 3D tissue image, you may add another column, Y2.
`ntopics`	(Required). Number of topics. Topics are obtained as distributions over cell types.
`sigma`	Default is 50. The lengthscale of the nearest-neighbor exponential kernel. Sigma controls how quickly the correlation decays with distance in the kernel function; see the paper for details. Needs to be adjusted based on the image resolution.
`region_radius`	Default is 400. The radius of each grid square used when sampling region centers for each image. Needs to be adjusted based on the image resolution and the complexity of the tissue pattern.
`kneigh`	Default is 5. For each cell, only the `kneigh` closest region centers (5 by default) are considered.
`npoints_selected`	Default is 1. Number of points sampled from each grid square when sampling region centers for each image. Used together with `region_radius`.
`ini_LDA`	Default is TRUE. If TRUE, use a warm-start strategy: run `ninit` initializations and continue from the best one. If FALSE, simply use the first initialization.
`ninit`	Default is 10. Number of initializations. Only the initialization with the highest log-likelihood (equivalently, the lowest perplexity) is retained.
`niter_init`	Default is 100. Number of Gibbs sampling iterations run for each warm-start initialization.
`beta`	Default is 0.05. A hyperparameter that controls the sparsity of the topic content (topic-celltype) matrix `Beta`. A smaller value produces a sparser `Beta`.
`alpha`	Default is 0.01. A hyperparameter that controls the sparsity of the document (region) content (region-topic) matrix `Theta`. For our application, we keep it very small to encourage sparsity in `Theta`.
`trace`	Default is FALSE. If TRUE, compute and save the log-likelihood, `Ndk` and `Nwk` for every posterior sample. This is required if you want to use DIC to select the number of topics, but computing the likelihood for every posterior sample is time-consuming.
`seed`	Default is 123. Random seed.
`thin`	Default is 20. Key parameter in Gibbs sampling. A posterior sample is collected every `thin` iterations.
`burnin`	Default is 1000. Key parameter in Gibbs sampling. Number of burn-in iterations before posterior samples are collected. You may increase it for highly complex tissue images.
`niter`	Default is 200. Key parameter in Gibbs sampling. Number of posterior samples collected for model inference.
`display_progress`	Default is TRUE. Display the progress bar.
`do.parallel`	Default is FALSE. Use parallel computing through R package `foreach`.
`n.cores`	Default is 1. Number of cores used in parallel computing.
`axis`	Default is "2D". You may switch to "3D" for 3D tissue images. However, the model inference for 3D tissue is still under test.
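To make the roles of `sigma`, `kneigh`, `burnin`, `thin` and `niter` concrete, the sketch below assumes a nearest-neighbor exponential kernel of the form exp(-d / sigma), normalized over the `kneigh` closest region centers, and assumes that one posterior sample is kept every `thin` iterations after burn-in. Both forms are assumptions for illustration (the exact kernel is described in the paper), and the function names are hypothetical.

```r
# ASSUMED kernel form: weight proportional to exp(-d / sigma) over the
# kneigh closest region centers; not necessarily SpaTopic's exact kernel.
nn_kernel_weights <- function(cell_xy, centers, sigma = 50, kneigh = 5) {
  d <- sqrt((centers[, 1] - cell_xy[1])^2 + (centers[, 2] - cell_xy[2])^2)
  nearest <- order(d)[seq_len(min(kneigh, length(d)))]  # closest centers
  w <- exp(-d[nearest] / sigma)
  setNames(w / sum(w), nearest)  # names are row indices of 'centers'
}

# ASSUMED schedule: burn-in, then one kept sample every 'thin' iterations
total_gibbs_iterations <- function(burnin = 1000, thin = 20, niter = 200) {
  burnin + thin * niter
}

centers <- cbind(c(0, 100, 300, 50, 1000, 20),
                 c(0, 0, 0, 80, 1000, 10))
wts <- nn_kernel_weights(c(10, 10), centers, sigma = 50, kneigh = 5)
total_gibbs_iterations()  # 1000 + 20 * 200 = 5000
```

Under these assumptions, a smaller `sigma` concentrates each cell's weight on its nearest region center, while the distant center at (1000, 1000) is excluded entirely because it is not among the 5 closest.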

Value

Returns an object of class SpaTopic: a list of outputs from Gibbs sampling.

Examples


## tissue is a data frame containing cellular information from one image, or
## a list of data frames from multiple images.

data("lung5")
## NOT RUN, it takes about 90s
library(sf)
#gibbs.res<-SpaTopic_inference(lung5, ntopics = 7,
#                               sigma = 50, region_radius = 400)

## generate a fake image 2 and make an example for multiple images
## NOT RUN
#lung6<-lung5
#lung6$image<-"image2"  ## The image ID of two images should be different
#gibbs.res<-SpaTopic_inference(list(A = lung5, B = lung6), 
#                 ntopics = 7, sigma = 50, region_radius = 400)

Format messages for SpaTopic package

Description

Creates consistently formatted messages for the SpaTopic package with timestamps and message type indicators. This function helps standardize all output messages across the package. Error messages will stop execution.

Usage

spatopic_message(type = "INFO", message, timestamp = TRUE)

Arguments

`type`	Character string indicating message type (e.g., "INFO", "WARNING", "ERROR", "PROGRESS")
`message`	The message content to display
`timestamp`	Logical; whether to include a timestamp in the message (default: TRUE)

Details

This function prefixes messages with a timestamp and the SpaTopic tag, creating a consistent message format throughout the package. When type="ERROR", this will stop execution with stop(). When type="WARNING", this will use warning() for non-fatal warnings. All other message types will use message() for informational output.
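The dispatch described above can be sketched as follows. format_spatopic_message() is a hypothetical stand-in, and the prefix layout ("[timestamp] [SpaTopic] TYPE: message") is an assumption rather than the package's exact output format.

```r
# Hypothetical re-implementation of the documented behavior:
# ERROR -> stop(), WARNING -> warning(), anything else -> message().
format_spatopic_message <- function(type = "INFO", message, timestamp = TRUE) {
  prefix <- "[SpaTopic]"
  if (timestamp) {
    prefix <- paste0("[", format(Sys.time(), "%Y-%m-%d %H:%M:%S"), "] ", prefix)
  }
  text <- paste0(prefix, " ", type, ": ", message)
  switch(type,
    ERROR   = stop(text, call. = FALSE),     # fatal: halts execution
    WARNING = warning(text, call. = FALSE),  # non-fatal warning
    message(text)                            # default: informational output
  )
  invisible(text)
}

format_spatopic_message("INFO", "Starting analysis...")
res <- tryCatch(format_spatopic_message("ERROR", "Required input missing"),
                error = function(e) conditionMessage(e))
```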

Value

No return value, called for side effect of displaying a message

Examples

## Not run: 
spatopic_message("INFO", "Starting analysis...")
spatopic_message("WARNING", "Parameter out of recommended range", timestamp = FALSE)
spatopic_message("ERROR", "Required input missing") # This will stop execution
spatopic_message("PROGRESS", "Processing complete")

## End(Not run)

A class of the output from 'SpaTopic'

Description

Outputs from the function SpaTopic_inference. A list containing the following members:

$Perplexity. The perplexity on the training data. Let N be the total number of cells across all images. Then $\text{Perplexity} = \exp(-\text{loglikelihood}/N)$.
$Deviance. $\text{Deviance} = -2 \times \text{loglikelihood}$.
$loglikelihood. The model log-likelihood.
$loglike.trace. The log-likelihood for every collected posterior sample. NULL if trace = FALSE.
$DIC. Deviance Information Criterion. NULL if trace = FALSE.
$Beta. Topic content matrix with rows as celltypes and columns as topics
$Theta. Topic prevalence matrix with rows as regions and columns as topics.
$Ndk. Number of cells per topic (col) per region (row).
$Nwk. Number of cells per topic (col) per celltype (row).
$Z.trace. Number of times each cell is assigned to each topic across all posterior samples. This can be used to compute the posterior distribution of Z (topic assignment) for individual cells.
$doc.trace. Ndk for every collected posterior sample. NULL if trace = FALSE.
$word.trace. Nwk for every collected posterior sample. NULL if trace = FALSE.
$cell_topics. Final topic assignments Z for individual cells.
$parameters. Model parameters used in the analysis.
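Assuming Z.trace is a matrix with one row per cell and one column per topic, the posterior topic probabilities for each cell can be obtained by dividing each row by its total (the number of collected posterior samples), as sketched below with made-up counts. The matrix orientation is an assumption; check it against your own output.

```r
# Made-up Z.trace counts over niter = 200 posterior samples.
# ASSUMED orientation: rows are cells, columns are topics.
Z_trace <- matrix(c(190, 10, 0,
                    50, 140, 10,
                    0, 5, 195),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(paste0("cell_", 1:3), paste0("topic", 1:3)))

# Posterior probability of each topic for each cell: normalize each row
Z_post <- Z_trace / rowSums(Z_trace)

# Most probable topic per cell and its posterior probability
map_topic <- colnames(Z_post)[max.col(Z_post, ties.method = "first")]
map_prob <- apply(Z_post, 1, max)
```

A low maximum probability (for example, cell_2 here at 0.7) flags cells whose topic assignment is less certain across posterior samples.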

SpaTopic: Spatial Topic Modeling for Multiplexed Images

Description

SpaTopic is a package for spatial topic modeling of multiplexed images. It adapts an approach originally developed for image segmentation in computer vision, incorporating spatial information into the flexible design of regions (image partitions, analogous to documents in language modeling). We further refined the approach to address unique challenges in cellular images, and the package provides an efficient C++ implementation of the algorithm for R.

Compared to other KNN-based methods (such as KNN-kmeans, the default neighborhood analysis in the Seurat v5 R package), SpaTopic runs much faster on large-scale image datasets.

The main functions in the 'SpaTopic' package:

Prepare input: Seurat5obj_to_SpaTopic
Model inference: SpaTopic_inference
Print results: print.SpaTopic

Author(s)

Xiyu Peng [email protected]

Source

<https://github.com/xiyupeng/SpaTopic>

Spatially stratified random sampling of points from an image

Description

Draws a spatially stratified random sample of points from an image using the R package sf.

Usage

stratified_sampling_sf(
  points,
  cellsize = c(600, 600),
  num_samples_per_stratum = 1
)

Arguments

`points`	A data frame containing all points in an image, with X and Y coordinates.
`cellsize`	A vector of length 2 giving the size of each grid square. Default is c(600, 600).
`num_samples_per_stratum`	Number of points selected from each grid square. Default is 1.

Value

Returns a vector containing the indices of the sampled points.
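The idea of spatially stratified sampling can be sketched in base R without sf: bin the points into a regular grid of `cellsize` squares and draw up to `num_samples_per_stratum` points from each non-empty square. stratified_sampling_base() is a hypothetical helper, and anchoring the grid at the minimum X and Y coordinates is an assumption; the package function builds its grid with sf and may differ in detail.

```r
# Hypothetical base-R sketch of spatially stratified random sampling.
stratified_sampling_base <- function(points, cellsize = c(600, 600),
                                     num_samples_per_stratum = 1) {
  # Grid square index of each point, anchored at the minimum coordinates
  gx <- floor((points$X - min(points$X)) / cellsize[1])
  gy <- floor((points$Y - min(points$Y)) / cellsize[2])
  stratum <- paste(gx, gy, sep = "_")
  idx_by_stratum <- split(seq_len(nrow(points)), stratum)
  sampled <- lapply(idx_by_stratum, function(idx) {
    k <- min(num_samples_per_stratum, length(idx))
    idx[sample.int(length(idx), k)]  # safe even when length(idx) == 1
  })
  sort(unlist(sampled, use.names = FALSE))
}

set.seed(42)
pts <- data.frame(X = runif(500, 0, 1800), Y = runif(500, 0, 1200))
pt_idx <- stratified_sampling_base(pts, cellsize = c(600, 600))
```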

Examples


data("lung5")
pt_idx<-stratified_sampling_sf(lung5, cellsize = c(600,600))
