Title: | Calls Copy Number Alterations from Slide-Seq Data |
---|---|
Description: | This takes spatial single-cell-type RNA-seq data (specifically designed for Slide-seq v2) that calls copy number alterations (CNAs) using pseudo-spatial binning, clusters cellular units (e.g. beads) based on CNA profile, and visualizes spatial CNA patterns. Documentation about 'SlideCNA' is included in the the pre-print by Zhang et al. (2022, <doi:10.1101/2022.11.25.517982>). The package 'enrichR' (>= 3.0), conditionally used to annotate SlideCNA-determined clusters with gene ontology terms, can be installed at <https://github.com/wjawaid/enrichR> or with install_github("wjawaid/enrichR"). |
Authors: | Diane Zhang [aut, cre] , Johanna Klughammer [aut] , Jan Watter [aut] , Broad Institute of MIT and Harvard [cph, fnd] |
Maintainer: | Diane Zhang <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.1.0 |
Built: | 2025-01-24 02:01:36 UTC |
Source: | CRAN |
This function computes a pseudospatial distance between beads that combines spatial distance and distance from the expression space, then using the silhouette score and hierarchical clustering, segregates beads into bins
bin( dat, md, k, pos = TRUE, pos_k = 55, ex_k = 1, hc_function = "ward.D2", plot_directory )
bin( dat, md, k, pos = TRUE, pos_k = 55, ex_k = 1, hc_function = "ward.D2", plot_directory )
dat |
data.table of smoothed relative expression intensities |
md |
data.table of metadata of each bead |
k |
number of malignant bins to set |
pos |
TRUE if doing spatial and expressional binning, FALSE if just expressional binning |
pos_k |
positional weight |
ex_k |
expressional weight |
hc_function |
hierarchical clustering function |
plot_directory |
output plot directory path |
A data.table of bead metadata combined with bin designations
This function combines metadata with binned relative expression intensities
bin_metadata( md, dat, avg_bead_per_bin = 12, pos = TRUE, pos_k = 55, ex_k = 1, hc_function = "ward.D2", plot_directory )
bin_metadata( md, dat, avg_bead_per_bin = 12, pos = TRUE, pos_k = 55, ex_k = 1, hc_function = "ward.D2", plot_directory )
md |
data.table of metadata of each bead |
dat |
data.table of smoothed relative expression intensities |
avg_bead_per_bin |
integer of average number of beads there should be per bin |
pos |
TRUE if doing spatial and expressional binning, FALSE if just expressional binning |
pos_k |
positional weight |
ex_k |
expressional weight |
hc_function |
hierarchical clustering function |
plot_directory |
output plot directory path |
A data.table of bead metadata combined with binned expression intensities for all genes for all beads
Take in a data.table of genomic positions and smoothed expression intensities counts and center by subtracting average intensity across all beads for each gene
center_rm(rm)
center_rm(rm)
rm |
data.table of smoothed expression intensities counts |
centered_rm data.table of smoothed, centered expression intensities
This function adds another column for cluster designation to a seurat object's meta data and bins beads
clone_so(so, hcl_sub, md, mal = FALSE)
clone_so(so, hcl_sub, md, mal = FALSE)
so |
Seurat object of beads and their meta data |
hcl_sub |
hierarchical clustering object of cluster assignemnt as outputted from SlideCNA::plot_clones() |
md |
data.table of metadata of each bead |
mal |
TRUE if only using malignant beads |
A seurat object updated with clone information
This function prepares data for plotting and makes a heat map of CNV scores per bead across all genes
cnv_heatmap( cnv_data, md, chrom_colors, hc_function = "ward.D2", plot_directory )
cnv_heatmap( cnv_data, md, chrom_colors, hc_function = "ward.D2", plot_directory )
cnv_data |
list object of cnv data from SlideCNA::prep_cnv_dat() |
md |
data.table of metadata of each bead |
chrom_colors |
vector of colors labeled by which chromosome they correspond to |
hc_function |
character for which hierarchical clustering function to use |
plot_directory |
output plot directory path |
None
This function will create rows for each bead and gene combination, adding in new metadata with bin designations
dat_to_long(dat, md)
dat_to_long(dat, md)
dat |
data.table of smoothed relative expression intensities |
md |
data.table of metadata per bead |
A data.table of bead expression intensities per gene with metadata in long format
This function uses Seurat's marker finding capability to find DEGs of each cluster
find_cluster_markers( so_clone, type, logfc.threshold = 0.2, min.pct = 0, only.pos = TRUE, n_markers = 5, value = "log2_expr", text_size = 16, title_size = 18, legend_size_pt = 4, p_val_thresh = 0.05, bin = TRUE, plot_directory = None )
find_cluster_markers( so_clone, type, logfc.threshold = 0.2, min.pct = 0, only.pos = TRUE, n_markers = 5, value = "log2_expr", text_size = 16, title_size = 18, legend_size_pt = 4, p_val_thresh = 0.05, bin = TRUE, plot_directory = None )
so_clone |
seurat object with 'clone' (SlideCNA-designated cluster) and bin annotations |
type |
character string that is 'all' if using malignant and normal clusters and 'malig' if just using malignant clusters |
logfc.threshold |
numeric float that is seurat parameter, representing the minimum log2 fold change for DEGs to be significant |
min.pct |
numeric Seurat function parameter |
only.pos |
TRUE if only using DEGs with positive log2 fold change |
n_markers |
integer of number of top DEGs to plot/use |
value |
expression value of DEGs; one of ("log2_expr", "avg_expr", and "avg_log2FC") for log2-normalized aerage epxression, average expression, or log2 fold change |
text_size |
Ggplot2 text size |
title_size |
Ggplot2 title size |
legend_size_pt |
Ggplot2 legend_size_pt |
p_val_thresh |
value for p value cutoff for DEGs |
bin |
TRUE if using binned beads |
plot_directory |
output plot directory path |
A list object with cluster marker information markers_clone = data.table of all cluster markers top_markers_clone = data.table of just top cluster markers top_clone_vis = data.frame formatted for plot visualization of top cluster markers
This function utilizes cluster-specific DEGs to identify cluster-specifc GO biological processes and plots these if they occur
find_go_terms( cluster_markers_obj, type, n_terms = 5, text_size, title_size, plot_directory )
find_go_terms( cluster_markers_obj, type, n_terms = 5, text_size, title_size, plot_directory )
cluster_markers_obj |
list object with cluster marker information |
type |
character string that is 'all' if using malignant and normal clusters and 'malig' if just using malignant clusters |
n_terms |
integer of number of top DEGs to plot/use |
text_size |
integer of text size for ggplot |
title_size |
integer of title size for ggplot |
plot_directory |
output plot directory path |
A list object with cluster GO term information en_clone = data.table of cluster GO terms top_en_clone = data.table of just top cluster GO terms
This function uses the Silhouette Method applied to CNV scores to determine the best number of clusters to divide the binned beads into
get_num_clust( data, hc_func = "ward.D2", max_k = 10, plot = TRUE, malig = FALSE, k = NA, plot_directory )
get_num_clust( data, hc_func = "ward.D2", max_k = 10, plot = TRUE, malig = FALSE, k = NA, plot_directory )
data |
cnv_data list object of cnv data from SlideCNA::prep_cnv_dat() |
hc_func |
character string for which hierarchical clustering function to use |
max_k |
integer of number max number of clusters to evaluate (2:max_k) |
plot |
TRUE if plotting silhoutte scores per cluster |
malig |
TRUE if only using malignant bins and FALSE if using all bins |
k |
integer of optimal number of clusters, if known, and NA if not known |
plot_directory |
output plot directory path |
An integer representing the number of clusters that optimizes the silhouette score
This function will combine beads into bins, taking the average expression intensities, average positions, most common cluster seurat cluster, and most common cluster/tissue type of constituent beads
long_to_bin(dat_long, plot_directory, spatial = TRUE)
long_to_bin(dat_long, plot_directory, spatial = TRUE)
dat_long |
data.table of bead expression intensities per gene with metadata in long format |
plot_directory |
output plot directory path |
spatial |
True if using spatial information |
data.table of expression intensities at aggregated bin level
This function takes in raw counts (and potentially meta data) to make a Seurat object and process it
make_seurat_annot( cb, md = NULL, seed_FindClusters = 0, seed_RunTSNE = 1, seed_RunUMAP = 42 )
make_seurat_annot( cb, md = NULL, seed_FindClusters = 0, seed_RunTSNE = 1, seed_RunUMAP = 42 )
cb |
sparse counts matrix (genes x cells/beads) |
md |
data.frame of meta data for cells/beads if specific annotations known |
seed_FindClusters |
seed number for FindCLusters |
seed_RunTSNE |
seed number for RunTSNE |
seed_RunUMAP |
seed number for RunUMAP |
A Seurat object with specific Seurat features run
Aggregate Seurat object counts by bin to create a new Seurat object with binned beads as units instead of beads
make_so_bin(so, md, hcl_sub, mal = FALSE)
make_so_bin(so, md, hcl_sub, mal = FALSE)
so |
Seurat object of beads and their meta data |
md |
data.frame of metadata for Seurat object |
hcl_sub |
hierarchical clustering object of cluster assignemnt as outputted from SlideCNA::plot_clones() |
mal |
TRUE if using malignant beads only |
A Seurat object with binned beads as units and corresponding binned metadata
This function colors and plots each bin by its mean CNV score on spatial coordinates for each chromosome
mean_cnv_plot( cnv_data, text_size, title_size, legend_height_bar, plot_directory )
mean_cnv_plot( cnv_data, text_size, title_size, legend_height_bar, plot_directory )
cnv_data |
list object of cnv data from SlideCNA::prep_cnv_dat() |
text_size |
integer of text size for ggplot |
title_size |
integer of title size for ggplot |
legend_height_bar |
integer of bar height of legend for ggplot |
plot_directory |
output plot directory path |
None
This function finds the mode of a vector
mode(x)
mode(x)
x |
vector (column in data.table) to calculate the mode from |
mode of the vector
This function plots cluster dendrograms, spatial assignment, and the CNV heat map
plot_clones( cnv_data, md, k, type, chrom_colors, text_size, title_size, legend_size_pt, legend_height_bar, hc_function = "ward.D2", plot_directory, spatial = TRUE )
plot_clones( cnv_data, md, k, type, chrom_colors, text_size, title_size, legend_size_pt, legend_height_bar, hc_function = "ward.D2", plot_directory, spatial = TRUE )
cnv_data |
list object of cnv data from SlideCNA::prep_cnv_dat() |
md |
data.table of metadata of each bead |
k |
integer of number of clusters/clones |
type |
character string, being "all" if using all binned beads, or "malig" if just malignant binned beads |
chrom_colors |
vector of colors labeled by which chromosome they correspond to |
text_size |
Ggplot2 text size |
title_size |
Ggplot2 title size |
legend_size_pt |
Ggplot2 legend_size_pt |
legend_height_bar |
Ggplot2 legend_height_bar |
hc_function |
character string for which hierarchical clustering function to use |
plot_directory |
output plot directory path |
spatial |
TRUE if using spatial information |
A hierarchical clustering object of the clusters
This function takes in a data table of raw counts and a vector of reference/normal beads to normalize counts and adjust for reference expression.
prep(so, normal_beads, gene_pos, chrom_ord, logTPM = FALSE)
prep(so, normal_beads, gene_pos, chrom_ord, logTPM = FALSE)
so |
Seurat object of Slide-seq data with raw counts |
normal_beads |
vector of names of normal beads |
gene_pos |
data.table with columns for GENE, chr, start, end, rel_gene_pos (1 : # of genes on chromosome) |
chrom_ord |
vector of the names of chromosomes in order |
logTPM |
TRUE if performing adjustment with logTPM |
A data.table of normalized, capped, and ref-adjusted counts with genomic psoition info
This function caps CNV scores, adds annotation columns for plotting, performs hierarchical clustering of bins based on similar CNV score, and plots nUMI per bin
prep_cnv_dat( dat_bin, lower = 0.6, upper = 1.4, hc_function = "ward.D2", plot_directory )
prep_cnv_dat( dat_bin, lower = 0.6, upper = 1.4, hc_function = "ward.D2", plot_directory )
dat_bin |
data.table of CNV scores per bin |
lower |
numeric float to represent the lower cap for CNV scores |
upper |
numeric float to represent the upper cap for CNV scores |
hc_function |
character for which hierarchical clustering function to use |
plot_directory |
output plot directory path |
A list object for downstream cnv plotting and analysis all = data.table of CNV scores of all bins x (metadata + genes) malig = data.table of CNV scores of just malignant bins x (metadata + genes) all_wide = data.frame in wide format of CNV scores of all bins x (metadata + genes) malig_wide = data.frame in wide format of CNV scores of just malignant bins x (metadata + genes) hcl = hclust object that describes the hierarchical clustering for malignant bins hcl_all = hclust object that describes the hierarchical clustering for all bins
This function colors and plots each bin by its CNV score quantiles (min, 1st quartile, median, 3rd quartile, max) on spatial coordinates for each chromosome
quantile_plot( cnv_data, cluster_label = "seurat_clusters", text_size, title_size, legend_height_bar, plot_directory )
quantile_plot( cnv_data, cluster_label = "seurat_clusters", text_size, title_size, legend_height_bar, plot_directory )
cnv_data |
list object of cnv data from SlideCNA::prep_cnv_dat() |
cluster_label |
character string of which column name to keep |
text_size |
integer of text size for ggplot |
title_size |
integer of title size for ggplot |
legend_height_bar |
integer of bar height of legend for ggplot |
plot_directory |
output plot directory path |
None
Take in a data.table of genomic positions and smoothed, centered expression intensities counts and adjust for reference beads by subtracting average intensities of reference beads for each gene. This is the second reference adjustment.
ref_adj(centered_rm, normal_beads)
ref_adj(centered_rm, normal_beads)
centered_rm |
data.table of smoothed, centered expression intensities counts |
normal_beads |
vector of names of normal beads |
rm_adj data.table of smoothed relative expression intensities
This function finds the GO biological processes associated with the top n genes using enrichR
run_enrichr(genes, n_genes)
run_enrichr(genes, n_genes)
genes |
vector of differentially expressed genes |
n_genes |
number of the most significantly enriched DEGs to base gene enrichment from |
A data.table of the most significant GO terms and their meta data
Take a raw expression counts, cell type annotations, and positional cooridnates to identify CNA patterns across space and CNA-based clustering patterns
run_slide_cna( counts, beads_df, gene_pos, output_directory, plot_directory, spatial = TRUE, roll_mean_window = 101, avg_bead_per_bin = 12, pos = TRUE, pos_k = 55, ex_k = 1, hc_function_bin = "ward.D2", spatial_vars_to_plot = c("seurat_clusters", "bin_all", "N_bin", "umi_bin", "cluster_type"), scale_bin_thresh_hard = TRUE, lower_bound_cnv = 0.6, upper_bound_cnv = 1.4, hc_function_cnv = "ward.D2", hc_function_cnv_heatmap = "ward.D2", quantile_plot_cluster_label = "seurat_clusters", hc_function_silhouette = "ward.D2", max_k_silhouette = 10, plot_silhouette = TRUE, hc_function_plot_clones = "ward.D2", use_GO_terms = TRUE, chrom_ord = c("chr1", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr20", "chr21", "chr22", "chr23", "chrX", "chrY", "chrM"), chrom_colors = c(chr1 = "#8DD3C7", chr2 = "#FFFFB3", chr3 = "#BEBADA", chr4 = "#FB8072", chr5 = "#80B1D3", chr6 = "#FDB462", chr7 = "#B3DE69", chr8 = "#FCCDE5", chr9 = "#D9D9D9", chr10 = "#BC80BD", chr11 = "#CCEBC5", chr12 = "#FFED6F", chr13 = "#1B9E77", chr14 = "#D95F02", chr15 = "#7570B3", chr16 = "#E7298A", chr17 = "#66A61E", chr18 = "#E6AB02", chr19 = "#A6761D", chr20 = "#666666", chr21 = "#A6CEE3", chr22 = "#1F78B4", chrX = "#B2DF8A"), text_size = 16, title_size = 18, legend_size_pt = 4, legend_height_bar = 1.5 )
run_slide_cna( counts, beads_df, gene_pos, output_directory, plot_directory, spatial = TRUE, roll_mean_window = 101, avg_bead_per_bin = 12, pos = TRUE, pos_k = 55, ex_k = 1, hc_function_bin = "ward.D2", spatial_vars_to_plot = c("seurat_clusters", "bin_all", "N_bin", "umi_bin", "cluster_type"), scale_bin_thresh_hard = TRUE, lower_bound_cnv = 0.6, upper_bound_cnv = 1.4, hc_function_cnv = "ward.D2", hc_function_cnv_heatmap = "ward.D2", quantile_plot_cluster_label = "seurat_clusters", hc_function_silhouette = "ward.D2", max_k_silhouette = 10, plot_silhouette = TRUE, hc_function_plot_clones = "ward.D2", use_GO_terms = TRUE, chrom_ord = c("chr1", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr20", "chr21", "chr22", "chr23", "chrX", "chrY", "chrM"), chrom_colors = c(chr1 = "#8DD3C7", chr2 = "#FFFFB3", chr3 = "#BEBADA", chr4 = "#FB8072", chr5 = "#80B1D3", chr6 = "#FDB462", chr7 = "#B3DE69", chr8 = "#FCCDE5", chr9 = "#D9D9D9", chr10 = "#BC80BD", chr11 = "#CCEBC5", chr12 = "#FFED6F", chr13 = "#1B9E77", chr14 = "#D95F02", chr15 = "#7570B3", chr16 = "#E7298A", chr17 = "#66A61E", chr18 = "#E6AB02", chr19 = "#A6761D", chr20 = "#666666", chr21 = "#A6CEE3", chr22 = "#1F78B4", chrX = "#B2DF8A"), text_size = 16, title_size = 18, legend_size_pt = 4, legend_height_bar = 1.5 )
counts |
data.frame of raw counts (genes x beads) |
beads_df |
data.frame of annotation of each bead (beads x annotations); contains columns 'bc' for bead names, 'cluster_type' for annotations of 'Normal' or 'Malignant', 'pos_x' for x-coordinate bead positions, and 'pos_y' for y-coordinate bead positions |
gene_pos |
data.frame with columns for GENE, chr, start, end, rel_gene_pos (1 : # of genes on chromosome) |
output_directory |
output directory path |
plot_directory |
output plot directory path |
spatial |
TRUE if using spatial information FALSE if not |
roll_mean_window |
integer number of adjacent genes for which to average over in pyramidal weighting scheme |
avg_bead_per_bin |
integer of average number of beads there should be per bin |
pos |
TRUE if doing spatial and expressional binning, FALSE if just expressional binning |
pos_k |
positional weight |
ex_k |
expressional weight |
hc_function_bin |
hierarchical clustering function for binning; to feed hclust's method argument, one of "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median" or "centroid" |
spatial_vars_to_plot |
character vector of features to plot/columns of metadata |
scale_bin_thresh_hard |
TRUE if using strict thresholds for expression thresholds and FALSE if adjusting thresholds based on 1 + or - the mean of absolute min and max vlaues |
lower_bound_cnv |
numeric float to represent the lower cap for CNV scores |
upper_bound_cnv |
numeric float to represent the upper cap for CNV scores |
hc_function_cnv |
character for which hierarchical clustering function to use for CNV-calling; to feed hclust's method argument, one of "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median" or "centroid" |
hc_function_cnv_heatmap |
character for which hierarchical clustering function to use for visualzing CNV heat map; to feed hclust's method argument, one of "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median" or "centroid" |
quantile_plot_cluster_label |
character string of which column name to keep in quantile plot |
hc_function_silhouette |
character string for which hierarchical clustering function to use for the Silhouette method; to feed hclust's method argument, one of "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median" or "centroid" |
max_k_silhouette |
integer of number max number of clusters to evaluate (2:max_k_silhouette) . in Silhouette method |
plot_silhouette |
TRUE if plotting silhouette scores for clustering |
hc_function_plot_clones |
character string for which hierarchical clustering function to use in plotting clones |
use_GO_terms |
TRUE if using enrichR to get Gene Ontology terms for SlideCNA-defined clusters |
chrom_ord |
character vector of order and names of chromosomes |
chrom_colors |
character vector of which colors each chromosome should be in heat map |
text_size |
integer of size of text in some ggplots |
title_size |
integer of size of title in some ggplots |
legend_size_pt |
integer of size of legend text size in some ggplots |
legend_height_bar |
integer of height of legend bar in some ggplots |
None
This function re-scales expression intensities to be in a smaller range, normalizes for nUMI per bin, and subtracts reference bead signal
scale_nUMI(dat_bin, thresh_hard = FALSE)
scale_nUMI(dat_bin, thresh_hard = FALSE)
dat_bin |
data.table of relative expression intensities per bin |
thresh_hard |
TRUE if using strict thresholds for expression thresholds and FALSE if adjusting thresholds based on 1 + or - the mean of absolute min and max values |
data.table of CNV scores per bin
This function re-scales expression intensities to be in a smaller range, normalizes for nUMI per bin, and centers the CNV scores to have a mean of 1
scalefit(obj, nbin, start, end)
scalefit(obj, nbin, start, end)
obj |
data.table of relative expression intensities per bin |
nbin |
nUMIs in that specific bin |
start |
lower bound of CNV scores |
end |
upper bound of CNV scores |
vector of adjusted CNV scores for that bins with nbin number of nUMIs within the range (inclusive) of start to end
This function will plot information about beads and bins on x and y coordinates
SpatialPlot( dat_long, vars = NULL, text_size, title_size, legend_size_pt, legend_height_bar, plot_directory )
SpatialPlot( dat_long, vars = NULL, text_size, title_size, legend_size_pt, legend_height_bar, plot_directory )
dat_long |
data.table of bead expression intensities per gene with metadata in long format |
vars |
character vector of features to plot/columns of metadata |
text_size |
Ggplot2 text size |
title_size |
Ggplot2 title size |
legend_size_pt |
Ggplot2 legend_size_pt |
legend_height_bar |
Ggplot2 legend_height_bar |
plot_directory |
output plot directory path |
None
Take in a data.table of genomic positions and bead normalized/modified counts and apply pyramidal weighting with a window size k to create smoothed expression intensities
weight_rollmean(dat, k = 101)
weight_rollmean(dat, k = 101)
dat |
data.table of normalized/adjusted counts |
k |
size of window for weighting |
A data.table of expression intensities
Take in a counts matrix and apply pyramidal weighting with a window size k to create smoothed expression intensities
weight_rollmean_sub(mat, k)
weight_rollmean_sub(mat, k)
mat |
matrix of normalized/adjusted counts |
k |
size of window for weighting |
A matrix of smoothed counts