| Title: | Analysing RNA Splicing in Single-Cell RNA Sequencing Data |
|---|---|
| Description: | Provides analysis of high-dimensional single-cell splicing data. Offers a framework to extract and work with ratio-based data structures derived from single-cell RNA sequencing experiments. Provides both a modern 'R6' object-oriented interface and direct matrix manipulation functions. Core functionalities are implemented in 'C++' via 'Rcpp' to ensure high performance and scalability on large datasets. |
| Authors: | Arsham Mikaeili Namini [aut, cre] (ORCID: <https://orcid.org/0000-0002-9453-6951>) |
| Maintainer: | Arsham Mikaeili Namini <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 2.3.1 |
| Built: | 2026-05-13 12:14:03 UTC |
| Source: | https://github.com/cran/splikit |
Calculate the Sum Deviance for Inclusion and Exclusion Matrices
find_variable_events( m1_matrix, m2_matrix = NULL, min_row_sum = 50, n_threads = 1, verbose = FALSE, ... )find_variable_events( m1_matrix, m2_matrix = NULL, min_row_sum = 50, n_threads = 1, verbose = FALSE, ... )
m1_matrix |
A matrix representing the inclusion matrix. Rows are events, columns are barcodes. |
m2_matrix |
A matrix representing the exclusion matrix. Rows are events, columns are barcodes. |
min_row_sum |
A numeric value specifying the minimum row sum threshold for filtering events. Defaults to 50. |
n_threads |
If the module OpenPM is available for your device, the function suggests using multi-thread processing for even faster computation. |
verbose |
Logical. If |
... |
Additional arguments to be passed. |
A data.table containing the events and their corresponding sum deviance values.
# loading the toy dataset toy_obj <- load_toy_M1_M2_object() # getting HVE (high variable events) HVE <- find_variable_events(toy_obj$m1, toy_obj$m2) # printing the results print(HVE[order(-sum_deviance)])# loading the toy dataset toy_obj <- load_toy_M1_M2_object() # getting HVE (high variable events) HVE <- find_variable_events(toy_obj$m1, toy_obj$m2) # printing the results print(HVE[order(-sum_deviance)])
Identifies highly variable genes from a sparse gene expression matrix using one of two methods: variance-stabilizing transformation (VST) or deviance-based modeling. The VST method uses a C++-accelerated approach to compute standardized variance, while the deviance-based method models gene variability across libraries using negative binomial deviances.
find_variable_genes( gene_expression_matrix, method = "vst", n_threads = 1, verbose = FALSE, ... )find_variable_genes( gene_expression_matrix, method = "vst", n_threads = 1, verbose = FALSE, ... )
gene_expression_matrix |
A sparse gene expression matrix (of class |
method |
Character string, either |
n_threads |
If OpenMP is available for your device, the function suggests using multi-thread processing for even faster computation (only for sum_deviance method). |
verbose |
Logical. If |
... |
Additional arguments (currently unused). |
A data.table containing gene names (column events) and computed metrics.
For the deviance method, this includes sum_deviance and variance columns.
library(Matrix) # loading the toy dataset toy_obj <- load_toy_M1_M2_object() # getting high variable genes HVG_VST <- find_variable_genes(toy_obj$gene_expression, method = "vst") # sum_deviance method HVG_DEV <- find_variable_genes(toy_obj$gene_expression, method = "sum_deviance") # Using multi-threading for faster computation (sum_deviance method only) HVG_DEV_MT <- find_variable_genes(toy_obj$gene_expression, method = "sum_deviance", n_threads = 4) # 4 threads # printing the results print(HVG_VST[order(-standardize_variance)]) print(HVG_DEV[order(-sum_deviance)])library(Matrix) # loading the toy dataset toy_obj <- load_toy_M1_M2_object() # getting high variable genes HVG_VST <- find_variable_genes(toy_obj$gene_expression, method = "vst") # sum_deviance method HVG_DEV <- find_variable_genes(toy_obj$gene_expression, method = "sum_deviance") # Using multi-threading for faster computation (sum_deviance method only) HVG_DEV_MT <- find_variable_genes(toy_obj$gene_expression, method = "sum_deviance", n_threads = 4) # 4 threads # printing the results print(HVG_VST[order(-standardize_variance)]) print(HVG_DEV[order(-sum_deviance)])
This function calculates a pseudo R²-like correlation metric using a beta-binomial model
implemented in C++. It takes in a data matrix ZDB_matrix and two model matrices
for inclusion and exclusion, respectively. The function now supports both sparse and dense
matrices for m1 and m2, and allows selection between Cox-Snell and Nagelkerke R² metrics.
get_pseudo_correlation( ZDB_matrix, m1_inclusion = NULL, m2_exclusion = NULL, metric = "CoxSnell", suppress_warnings = TRUE, verbose = FALSE )get_pseudo_correlation( ZDB_matrix, m1_inclusion = NULL, m2_exclusion = NULL, metric = "CoxSnell", suppress_warnings = TRUE, verbose = FALSE )
ZDB_matrix |
A numeric dense matrix of shape (events x samples). Should have rownames representing events. |
m1_inclusion |
A numeric matrix (dense or sparse) of the same number of rows as |
m2_exclusion |
A numeric matrix (dense or sparse) of the same number of rows as |
metric |
Character string specifying which R² metric to compute. Options are "CoxSnell" (default) or "Nagelkerke". |
suppress_warnings |
Logical. If |
verbose |
Logical. If |
A data.table with the following columns:
The event names from ZDB_matrix rownames.
The computed pseudo R² correlation values using the specified metric.
Null correlation values from a permuted version of ZDB_matrix.
set.seed(42) # get the m1 object junction_abundance_object <- load_toy_SJ_object() m1_obj <- make_m1(junction_ab_object = junction_abundance_object) # obtaining the m1 and eventdata m1_inclusion <- m1_obj$m1_inclusion_matrix eventdata <- m1_obj$event_data m2_exclusion <- make_m2(m1_inclusion, eventdata) # creating a dummy ZDB ZDB_matrix <- matrix(rnorm(n = (nrow(m1_inclusion) * ncol(m1_inclusion)), sd = 7), nrow = nrow(m1_inclusion), ncol = ncol(m1_inclusion)) rownames(ZDB_matrix) <- rownames(m1_inclusion) # m1 and m2 can now be either sparse or dense matrices # Example with dense matrices (backward compatible) m1_dense <- as.matrix(m1_inclusion) m2_dense <- as.matrix(m2_exclusion) pseudo_r_square_cox <- get_pseudo_correlation(ZDB_matrix, m1_dense, m2_dense) print(pseudo_r_square_cox) # Example with sparse matrices (more memory efficient) pseudo_r_square_sparse <- get_pseudo_correlation(ZDB_matrix, m1_inclusion, m2_exclusion) # Example using Nagelkerke R-squared instead of Cox-Snell pseudo_r_square_nagel <- get_pseudo_correlation(ZDB_matrix, m1_inclusion, m2_exclusion, metric = "Nagelkerke") print(pseudo_r_square_nagel)set.seed(42) # get the m1 object junction_abundance_object <- load_toy_SJ_object() m1_obj <- make_m1(junction_ab_object = junction_abundance_object) # obtaining the m1 and eventdata m1_inclusion <- m1_obj$m1_inclusion_matrix eventdata <- m1_obj$event_data m2_exclusion <- make_m2(m1_inclusion, eventdata) # creating a dummy ZDB ZDB_matrix <- matrix(rnorm(n = (nrow(m1_inclusion) * ncol(m1_inclusion)), sd = 7), nrow = nrow(m1_inclusion), ncol = ncol(m1_inclusion)) rownames(ZDB_matrix) <- rownames(m1_inclusion) # m1 and m2 can now be either sparse or dense matrices # Example with dense matrices (backward compatible) m1_dense <- as.matrix(m1_inclusion) m2_dense <- as.matrix(m2_exclusion) pseudo_r_square_cox <- get_pseudo_correlation(ZDB_matrix, m1_dense, m2_dense) print(pseudo_r_square_cox) # Example with sparse matrices (more memory efficient) pseudo_r_square_sparse <- get_pseudo_correlation(ZDB_matrix, m1_inclusion, m2_exclusion) # Example using Nagelkerke R-squared instead of Cox-Snell pseudo_r_square_nagel <- get_pseudo_correlation(ZDB_matrix, m1_inclusion, m2_exclusion, metric = "Nagelkerke") print(pseudo_r_square_nagel)
Efficiently computes the variance of each row for either a base R dense matrix or a sparse dgCMatrix, via a single Rcpp entry point. Logs progress messages to the R console.
get_rowVar(M, verbose = FALSE)get_rowVar(M, verbose = FALSE)
M |
A numeric matrix (base R matrix) or a sparse matrix of class |
verbose |
Logical. If |
Dispatches in C++ between dense and sparse implementations to avoid unnecessary overhead or external dependencies. Uses compressed-column traversal for sparse inputs.
A numeric vector of length nrow(M) containing the variance of each row.
Only 32-bit integer indices are supported, due to limitations in R's internal matrix representations. This function will not work with matrices that exceed the 32-bit integer indexing range.
library(Matrix) # Dense example dm <- matrix(rnorm(1000), nrow = 100) get_rowVar(dm) # Sparse example sm <- rsparsematrix(100, 10, density = 0.1) get_rowVar(sm)library(Matrix) # Dense example dm <- matrix(rnorm(1000), nrow = 100) get_rowVar(dm) # Sparse example sm <- rsparsematrix(100, 10, density = 0.1) get_rowVar(sm)
Computes the average silhouette width for a clustering solution using Euclidean distance.
get_silhouette_mean(X, cluster_assignments, n_threads = 1, verbose = FALSE)get_silhouette_mean(X, cluster_assignments, n_threads = 1, verbose = FALSE)
X |
A numeric matrix where rows are observations and columns are features. |
cluster_assignments |
An integer vector of cluster assignments, which must be the same length as the number of rows in |
n_threads |
Number of threads to use for parallel processing. |
verbose |
Logical. If |
A single numeric value: the average silhouette score.
This process can be very slow for large matrices if single-threaded. Use multiple threads to take advantage of parallel computation for significantly faster results.
# Preparing the inputs set.seed(42) pc_matrix <- matrix(data = rnorm(n = 10000 * 15, sd = 2), nrow = 10000, ncol = 15) cluster_numbers <- as.integer(runif(n = 10000, min = 1, max = 10)) # Getting the mean silhouette score n_threads <- parallel::detectCores() score <- get_silhouette_mean(pc_matrix, cluster_numbers, n_threads) print(score)# Preparing the inputs set.seed(42) pc_matrix <- matrix(data = rnorm(n = 10000 * 15, sd = 2), nrow = 10000, ncol = 15) cluster_numbers <- as.integer(runif(n = 10000, min = 1, max = 10)) # Getting the mean silhouette score n_threads <- parallel::detectCores() score <- get_silhouette_mean(pc_matrix, cluster_numbers, n_threads) print(score)
Loads a toy object of M1 and M2 used for examples or testing.
load_toy_M1_M2_object()load_toy_M1_M2_object()
An R object from the toy_m1_m2_obj.rds file.
Loads a toy splice junction object used for examples or testing.
load_toy_SJ_object()load_toy_SJ_object()
An R object from the toy_SJ_object.RDS file.
This function reads in a GTF file, extracts gene annotations, and merges them with
event-level genomic intervals provided by the user. The final data table contains the
original event intervals and the corresponding gene information (for example, gene_id
and gene_name).
make_eventdata_plus(eventdata, GTF_file_direction)make_eventdata_plus(eventdata, GTF_file_direction)
eventdata |
A data table of genomic intervals for events. Must contain columns:
|
GTF_file_direction |
A character string specifying the path to a GTF file. The file
must contain at least these columns: |
Read the GTF: Uses data.table::fread to load GTF data and convert it to
a data table.
Subset for Genes: Keeps only rows where type == "gene", retaining columns
for chromosome, start, end, strand, gene_id, and gene_name.
Strand Conversion: Merges the GTF data with a small lookup table to replace
+ and - with numeric strand indicators 1 and 2 (matching STAR).
Overlaps: With both data sets keyed, uses foverlaps() from data.table to
find intervals in eventdata that fall fully within gene boundaries.
A data table containing overlapping event intervals with added gene metadata
(such as gene_id and gene_name). The columns returned will include both event-level
and gene-level information.
Constructs sparse gene expression matrices from one or more directories containing 10X Genomics-style output. The function supports barcode filtering using either an external whitelist or the internally provided filtered barcode file.
make_gene_count( expression_dirs, sample_ids, whitelist_barcodes = NULL, use_internal_whitelist = TRUE, verbose = FALSE )make_gene_count( expression_dirs, sample_ids, whitelist_barcodes = NULL, use_internal_whitelist = TRUE, verbose = FALSE )
expression_dirs |
A character vector or list of strings. Each element must be a path to a directory containing the
gene expression matrix files: |
sample_ids |
A character vector or list of unique sample identifiers, one for each element in |
whitelist_barcodes |
A list of character vectors. Each list element corresponds to a sample and contains the
barcodes to retain for that sample. If |
use_internal_whitelist |
Logical (default |
verbose |
Logical. If |
The function is designed for bulk or single-cell gene expression processing from 10X-style output folders.
Each input directory should contain the standard matrix.mtx, features.tsv/genes.tsv, and barcodes.tsv
files. Barcodes can be filtered using either a provided whitelist or by relying on the filtered barcode files
output by tools like CellRanger.
If neither an external whitelist nor an internal filtered barcode file is available, all barcodes from the raw matrix will be retained.
If a single sample is provided, returns a sparse matrix of class "dgCMatrix" with genes as rows and barcodes as columns.
If multiple samples are provided, returns a named list of sparse matrices, one per sample ID.
Requires the Matrix package for sparse matrix handling and potentially data.table for efficient I/O.
This function processes STARsolo splicing junction output to create a modality list for splicing data. It supports both single and multiple sample processing, optional barcode filtration, and internal whitelist usage.
make_junction_ab( STARsolo_SJ_dirs, white_barcode_lists = NULL, sample_ids, use_internal_whitelist = TRUE, verbose = FALSE, keep_multi_mapped_junctions = FALSE, ... )make_junction_ab( STARsolo_SJ_dirs, white_barcode_lists = NULL, sample_ids, use_internal_whitelist = TRUE, verbose = FALSE, keep_multi_mapped_junctions = FALSE, ... )
STARsolo_SJ_dirs |
A character vector or list of strings representing the paths to STARsolo SJ directories. Each directory should contain the raw splicing junction output files. |
white_barcode_lists |
A list of character vectors, each containing barcode whitelist(s) for the corresponding sample.
If |
sample_ids |
A character vector or list of unique sample IDs corresponding to each directory in |
use_internal_whitelist |
A logical flag (default |
verbose |
Logical (default |
keep_multi_mapped_junctions |
Logical (default |
... |
Additional parameters for future extensions. |
A list containing processed splicing modality data for each sample. If a single sample is provided, the function returns the processed data as a list. For multiple samples, a named list is returned. Each sample's result contains:
A data.table containing feature metadata.
A sparse matrix of junction abundance.
This function processes junction abundance data from multiple samples to create a splicing modality inclusion matrix (M1). It merges event data, handles start and end coordinate groups, ensures matrix compatibility, and includes robust error handling.
make_m1(junction_ab_object, min_counts = 1, verbose = FALSE)make_m1(junction_ab_object, min_counts = 1, verbose = FALSE)
junction_ab_object |
A named list where each element represents a sample's junction abundance data.
Each element must contain |
min_counts |
Numeric (default 1). Minimum count threshold for filtering events. Events with total counts below this threshold will be removed. |
verbose |
Logical (default |
The function requires the following libraries: data.table, and Matrix.
A list containing the processed data from all samples:
A matrix representing the processed inclusion values for all events across all samples.
A data.table containing the merged and grouped metadata for each event.
# Example usage junction_abundance_object <- load_toy_SJ_object() result <- make_m1(junction_abundance_object) m1_matrix <- result$m1_inclusion_matrix event_metadata <- result$event_data# Example usage junction_abundance_object <- load_toy_SJ_object() result <- make_m1(junction_abundance_object) m1_matrix <- result$m1_inclusion_matrix event_metadata <- result$event_data
Creates the M2 matrix from a given m1_inclusion_matrix and eventdata with intelligent memory management. Automatically detects when the operation would exceed memory limits and switches to a batched sparse matrix approach.
make_m2( m1_inclusion_matrix, eventdata, batch_size = 5000, memory_threshold = 2e+09, force_fast = FALSE, n_threads = 1, use_cpp = TRUE, verbose = FALSE )make_m2( m1_inclusion_matrix, eventdata, batch_size = 5000, memory_threshold = 2e+09, force_fast = FALSE, n_threads = 1, use_cpp = TRUE, verbose = FALSE )
m1_inclusion_matrix |
A sparse matrix to be modified and used for creating the M2 matrix. |
eventdata |
A data.table containing event information with at least |
batch_size |
An integer specifying the number of groups to process per batch (default: 5000). Only used when batched processing is triggered. |
memory_threshold |
A numeric value representing the maximum number of rows allowed in the summary before switching to batched processing (default: 2e9, which is ~93% of 2^31). |
force_fast |
A logical flag to force fast processing regardless of size estimates (default: FALSE). WARNING: This may cause memory errors on large datasets. |
n_threads |
Number of threads for parallel processing (default: 1). Used for C++ implementation and batched R processing. Values > 1 require parallel package for batched mode. |
use_cpp |
Logical flag to use fast C++ implementation (default: TRUE). Falls back to R implementation if FALSE. |
verbose |
A logical flag for detailed progress reporting (default: FALSE). |
A sparse matrix M2 with the dummy row removed and proper adjustments made.
junction_abundance_object <- load_toy_SJ_object() m1_obj <- make_m1(junction_ab_object = junction_abundance_object) # obtaining the m1 and eventdata m1 <- m1_obj$m1_inclusion_matrix eventdata <- m1_obj$event_data m2 <- make_m2(m1_inclusion_matrix = m1, eventdata = eventdata)junction_abundance_object <- load_toy_SJ_object() m1_obj <- make_m1(junction_ab_object = junction_abundance_object) # obtaining the m1 and eventdata m1 <- m1_obj$m1_inclusion_matrix eventdata <- m1_obj$event_data m2 <- make_m2(m1_inclusion_matrix = m1, eventdata = eventdata)
Parses and processes spliced and unspliced gene expression matrices from one or more Velocyto output directories. The function applies barcode filtering using an external whitelist or filtered barcodes file, and optionally merges the results across samples into unified matrices.
make_velo_count( velocyto_dirs, sample_ids, whitelist_barcodes = NULL, use_internal_whitelist = TRUE, merge_counts = FALSE, verbose = FALSE )make_velo_count( velocyto_dirs, sample_ids, whitelist_barcodes = NULL, use_internal_whitelist = TRUE, merge_counts = FALSE, verbose = FALSE )
velocyto_dirs |
A character vector or list of strings. Each element should be a path to a Velocyto output directory.
Each directory must contain subdirectories (typically |
sample_ids |
A character vector or list of unique sample identifiers corresponding to each entry in |
whitelist_barcodes |
A list of character vectors. Each element should provide a whitelist of barcodes to retain for the
corresponding sample. If |
use_internal_whitelist |
Logical (default |
merge_counts |
Logical (default |
verbose |
Logical. If |
The function assumes that each Velocyto directory follows the 10X-like structure typically produced by tools like Loom or Velocyto CLI. Barcode filtering ensures that only high-quality or selected barcodes are retained for downstream RNA velocity analysis.
When merging matrices, barcodes are prefixed with their corresponding sample ID to avoid collisions and preserve traceability.
A list containing processed gene expression matrices:
If merge_counts = FALSE, returns a named list of sample-specific matrices. Each entry contains:
splicedSparse matrix of spliced transcript counts.
unsplicedSparse matrix of unspliced transcript counts.
If merge_counts = TRUE, returns a list with two elements:
splicedMerged sparse matrix of spliced counts across all samples.
unsplicedMerged sparse matrix of unspliced counts across all samples.
Requires the Matrix package for sparse matrix operations and data.table for efficient file parsing.
Draws a gene-model view of a gene's transcripts (one row per transcript)
with splice junctions rendered as arcs above each row. Junctions used by a
single transcript of the gene ("transcript-exclusive") are drawn as solid
black arcs; shared junctions as thin grey arcs. Works with both
Ensembl-style and RefSeq-style GTFs: if a row has no gene_name
attribute, gene_id is used; if no transcript_name, transcript_id
is used.
plot_exclusive_junctions( gtf, target_gene, show_exclusive = TRUE, transcript = NULL, curvature = -0.2, out_file = NULL )plot_exclusive_junctions( gtf, target_gene, show_exclusive = TRUE, transcript = NULL, curvature = -0.2, out_file = NULL )
gtf |
Either a path to an Ensembl- or RefSeq-style GTF, or a
pre-loaded |
target_gene |
Gene symbol (matches |
show_exclusive |
Logical (default |
transcript |
Optional character vector of transcript names to pin
the plot to (overrides |
curvature |
Numeric, arc-height knob for |
out_file |
Optional character. If given, the plot is also written
to this file with |
An S3 object of class "splikit_junction_plot" (a list) with
components:
plotA ggplot object.
infoA data.table with one row per drawn (transcript,
junction) pair and columns gene_name, gene_id,
transcript_name, transcript_id, chr,
strand, j_start, j_end, j_width,
exclusive, n_tx_with_junction,
observed_in_eventdata, row_names_mtx,
is_annot. For GTF-only calls the last three are NA.
exons, junctions, tx_order
The underlying tables used to build the plot, retained for advanced users.
Printing the object renders the plot and then prints info.
Requires the ggplot2 package (declared in Suggests).
if (requireNamespace("ggplot2", quietly = TRUE)) { # Build a tiny synthetic GTF as a data.table (no external file needed). # tx1 has 3 exons (2 junctions); tx2 has 2 exons (1 junction). gtf <- data.table::data.table( seqname = "chr1", source = "toy", type = rep("exon", 5), start = c(100, 300, 500, 100, 500), end = c(200, 400, 600, 200, 600), score = ".", strand = "+", frame = ".", attr = c( 'gene_name "FOO"; transcript_id "tx1"; exon_number "1";', 'gene_name "FOO"; transcript_id "tx1"; exon_number "2";', 'gene_name "FOO"; transcript_id "tx1"; exon_number "3";', 'gene_name "FOO"; transcript_id "tx2"; exon_number "1";', 'gene_name "FOO"; transcript_id "tx2"; exon_number "2";' ) ) res <- plot_exclusive_junctions(gtf, "FOO") res$info }if (requireNamespace("ggplot2", quietly = TRUE)) { # Build a tiny synthetic GTF as a data.table (no external file needed). # tx1 has 3 exons (2 junctions); tx2 has 2 exons (1 junction). gtf <- data.table::data.table( seqname = "chr1", source = "toy", type = rep("exon", 5), start = c(100, 300, 500, 100, 500), end = c(200, 400, 600, 200, 600), score = ".", strand = "+", frame = ".", attr = c( 'gene_name "FOO"; transcript_id "tx1"; exon_number "1";', 'gene_name "FOO"; transcript_id "tx1"; exon_number "2";', 'gene_name "FOO"; transcript_id "tx1"; exon_number "3";', 'gene_name "FOO"; transcript_id "tx2"; exon_number "1";', 'gene_name "FOO"; transcript_id "tx2"; exon_number "2";' ) ) res <- plot_exclusive_junctions(gtf, "FOO") res$info }
Sibling of plot_exclusive_junctions() that restricts drawn arcs to
junctions present in a splikit eventdata table. Exon structure and
exclusivity are still derived from the GTF, so a black arc means the
junction is both transcript-exclusive in the annotation and observed in
the data.
plot_exclusive_junctions_event( target_gene, GTF, eventdata, show_exclusive = TRUE, transcript = NULL, curvature = -0.2 )plot_exclusive_junctions_event( target_gene, GTF, eventdata, show_exclusive = TRUE, transcript = NULL, curvature = -0.2 )
target_gene |
Gene symbol to plot. |
GTF |
Either a GTF path or a pre-loaded |
eventdata |
A |
show_exclusive, transcript, curvature
|
Works with both Ensembl-style and RefSeq-style GTFs. If eventdata
lacks a gene_name column the function runs make_eventdata_plus()
internally (requires GTF to be a path in that case). Observed
junctions not matching any annotated intron are dropped with a message
and their count is reported as novel_junction_count.
An S3 object of class "splikit_junction_plot" with the same
structure as in plot_exclusive_junctions(), plus
novel_junction_count for the unannotated observed junctions. The
info$observed_in_eventdata column is TRUE for all rows, and
row_names_mtx / is_annot are populated from eventdata when
available.
Requires the ggplot2 package (declared in Suggests).
Writes a multi-page PDF: page 1 = full gene view (all transcripts, exclusive arcs in black); subsequent pages = one per exclusive-owning transcript, with its exclusive junction in black.
plot_exclusive_junctions_pdf( gtf, target_gene, out_pdf, curvature = -0.2, width = 10, height = NULL )plot_exclusive_junctions_pdf( gtf, target_gene, out_pdf, curvature = -0.2, width = 10, height = NULL )
gtf, target_gene, curvature
|
|
out_pdf |
Character path for the PDF. |
width, height
|
PDF page dimensions. |
Invisibly, a list with exclusive_transcripts and n_pages.
Requires the ggplot2 package (declared in Suggests).
Renders the stored plot and then prints the info data.table. Called
automatically at the REPL when a function like
plot_exclusive_junctions() is invoked without assignment.
## S3 method for class 'splikit_junction_plot' print(x, n = NULL, ...)## S3 method for class 'splikit_junction_plot' print(x, n = NULL, ...)
x |
A |
n |
Rows of |
... |
Unused. |
Convenience function to create a SplikitObject.
splikit(...)splikit(...)
... |
Arguments passed to |
A new SplikitObject instance.
# From existing matrices using the toy dataset toy <- load_toy_M1_M2_object() obj <- splikit(m1 = toy$m1, m2 = toy$m2, eventData = toy$eventdata)# From existing matrices using the toy dataset toy <- load_toy_M1_M2_object() obj <- splikit(m1 = toy$m1, m2 = toy$m2, eventData = toy$eventdata)
R6 class for splicing analysis in single-cell RNA-seq data. Provides a modern object-oriented interface to splikit functionality while maintaining backward compatibility with existing functions.
The SplikitObject encapsulates the core data structures for splicing analysis:
m1: Inclusion matrix (sparse dgCMatrix)
m2: Exclusion matrix (sparse dgCMatrix)
eventData: Event metadata (data.table)
geneExpression: Optional gene expression matrix
m1Inclusion matrix (dgCMatrix). Rows are events, columns are cells.
m2Exclusion matrix (dgCMatrix). Same dimensions as m1.
eventDataEvent metadata (data.table). One row per event.
geneExpressionOptional gene expression matrix (dgCMatrix).
metadataList containing summary statistics and analysis results.
new()
Create a new SplikitObject.
SplikitObject$new( junction_ab = NULL, m1 = NULL, m2 = NULL, eventData = NULL, min_counts = 1, verbose = FALSE )
junction_abA junction abundance object from make_junction_ab().
If provided, m1 and eventData are computed automatically.
m1An existing inclusion matrix (dgCMatrix).
m2An existing exclusion matrix (dgCMatrix).
eventDataA data.table with event metadata.
min_countsMinimum count threshold for filtering events (default: 1).
verbosePrint progress messages (default: FALSE).
A new SplikitObject instance.
makeM2()
Compute the M2 exclusion matrix from M1 and eventData.
SplikitObject$makeM2( batch_size = 5000, memory_threshold = 2e+09, force_fast = FALSE, n_threads = 1, use_cpp = TRUE, verbose = FALSE )
batch_sizeNumber of groups per batch for memory management (default: 5000).
memory_thresholdMaximum rows before switching to batched processing.
force_fastForce fast processing regardless of size (default: FALSE).
n_threadsNumber of threads for parallel processing (default: 1).
use_cppUse fast C++ implementation (default: TRUE).
verbosePrint progress messages (default: FALSE).
Self (invisibly), for method chaining.
findVariableEvents()
Find highly variable splicing events.
SplikitObject$findVariableEvents( min_row_sum = 50, n_threads = 1, verbose = FALSE )
min_row_sumMinimum row sum threshold for filtering (default: 50).
n_threadsNumber of threads for parallel computation (default: 1).
verbosePrint progress messages (default: FALSE).
A data.table with event names and sum_deviance scores.
findVariableGenes()
Find highly variable genes from gene expression data.
SplikitObject$findVariableGenes(method = "vst", n_threads = 1, verbose = FALSE)
methodMethod for variable gene selection: "vst" or "sum_deviance" (default: "vst").
n_threadsNumber of threads for parallel computation (default: 1).
verbosePrint progress messages (default: FALSE).
A data.table with gene names and variability scores.
getPseudoCorrelation()
Compute pseudo-correlation between splicing and external data.
SplikitObject$getPseudoCorrelation( ZDB_matrix, metric = "CoxSnell", suppress_warnings = TRUE )
ZDB_matrixDense matrix of external data (e.g., gene expression PCs). Must have same dimensions as m1.
metricR-squared metric: "CoxSnell" or "Nagelkerke" (default: "CoxSnell").
suppress_warningsSuppress computation warnings (default: TRUE).
A data.table with event names, pseudo_correlation, and null_distribution.
subset()
Subset the object by events and/or cells.
SplikitObject$subset(events = NULL, cells = NULL)
eventsEvent indices or names to keep.
cellsCell indices or names to keep.
Self (invisibly), for method chaining.
setGeneExpression()
Set the gene expression matrix.
SplikitObject$setGeneExpression(gene_matrix)
gene_matrixA gene expression matrix (will be converted to dgCMatrix).
Self (invisibly), for method chaining.
annotateEvents()
Annotate events with gene information from a GTF file.
SplikitObject$annotateEvents(GTF_file)
GTF_filePath to a GTF annotation file.
Self (invisibly), for method chaining.
summary()
Get a summary of the object.
SplikitObject$summary()
A list with object statistics.
print()
Print a human-readable summary of the object.
SplikitObject$print()
deepCopy()
Create a deep copy of the object.
SplikitObject$deepCopy()
deepIf TRUE, creates a deep copy of all data.
A new SplikitObject with copied data. Validate input matrices and eventData Ensure M2 is computed Ensure matrix is sparse dgCMatrix Standardized error reporting
clone()
The objects of this class are cloneable with this method.
SplikitObject$clone(deep = FALSE)
deepWhether to make a deep clone.
# Create from existing matrices using the toy dataset toy <- load_toy_M1_M2_object() obj <- SplikitObject$new(m1 = toy$m1, m2 = toy$m2, eventData = toy$eventdata) # Find variable events hve <- obj$findVariableEvents(min_row_sum = 50) # Or build M2 from M1 + eventData and chain into feature selection obj2 <- SplikitObject$new(m1 = toy$m1, eventData = toy$eventdata) results <- obj2$makeM2()$findVariableEvents()# Create from existing matrices using the toy dataset toy <- load_toy_M1_M2_object() obj <- SplikitObject$new(m1 = toy$m1, m2 = toy$m2, eventData = toy$eventdata) # Find variable events hve <- obj$findVariableEvents(min_row_sum = 50) # Or build M2 from M1 + eventData and chain into feature selection obj2 <- SplikitObject$new(m1 = toy$m1, eventData = toy$eventdata) results <- obj2$makeM2()$findVariableEvents()