Title: | Variance-Adjusted Mahalanobis |
---|---|
Description: | Contains logic for cell-specific gene set scoring of single cell RNA sequencing data. |
Authors: | H. Robert Frost |
Maintainer: | H. Robert Frost <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.1.0 |
Built: | 2024-11-29 09:08:21 UTC |
Source: | CRAN |
Implementation of Variance-adjusted Mahalanobis (VAM), a method for cell-specific gene set scoring of scRNA-seq data.
Package: | VAM |
Type: | Package |
Version: | 1.0.0 |
Date: | 2021 |
License: | GPL-2 |
This work was supported by the National Institutes of Health grants K01LM012426, R21CA253408, P20GM130454 and P30CA023108.
H. Robert Frost
Frost, H. R. (2020). Variance-adjusted Mahalanobis (VAM): a fast and accurate method for cell-specific gene set scoring. biorXiv e-prints. doi: https://doi.org/10.1101/2020.02.18.954321
Utility function that creates a gene set collection list in the format required by vamForCollection() given the gene IDs measured in the expression matrix and a list of gene sets as defined by the IDs of the member genes.
createGeneSetCollection(gene.ids, gene.set.collection, min.size=1, max.size)
createGeneSetCollection(gene.ids, gene.set.collection, min.size=1, max.size)
gene.ids |
Vector of gene IDs. This should correspond to the genes measured in the gene expression data. |
gene.set.collection |
List of gene sets where each element in the list corresponds to a gene set and the list element is a vector of gene IDs. List names are gene set names. Must contain at least one gene set. |
min.size |
Minimum gene set size after filtering out genes not in the gene.ids vector. Gene sets whose post-filtering size is below this are removed from the final collection list. Default is 1 and cannot be set to less than 1. |
max.size |
Maximum gene set size after filtering out genes not in the gene.ids vector. Gene sets whose post-filtering size is above this are removed from the final collection list. If not specified, no filtering is performed. |
Version of the input gene.set.collection list where gene IDs have been replaced by position indices, genes not present in the gene.ids vector have been removed and gene sets failing the min/max size constraints have been removed.
# Create a collection with two sets defined over 3 genes createGeneSetCollection(gene.ids=c("A", "B", "C"), gene.set.collection = list(set1=c("A", "B"), set2=c("B", "C")), min.size=2, max.size=3)
# Create a collection with two sets defined over 3 genes createGeneSetCollection(gene.ids=c("A", "B", "C"), gene.set.collection = list(set1=c("A", "B"), set2=c("B", "C")), min.size=2, max.size=3)
Implementation of the Variance-adjusted Mahalanobis (VAM) method, which computes distance statistics and one-sided p-values
for all cells in the specified single cell gene expression matrix. This matrix should reflect the subset of the full
expression profile that corresponds to a single gene set. The p-values will be computed using either a
chi-square distribution, a non-central chi-square distribution or gamma distribution as controlled by the
center
and gamma
arguments for the one-sided alternative hypothesis that the expression values in the
cell are further from the mean (center=T
) or origin (center=F
) than expected under the null
of uncorrelated technical noise, i.e., gene expression variance is purely technical and all genes are uncorrelated.
vam(gene.expr, tech.var.prop, gene.weights, center=FALSE, gamma=TRUE)
vam(gene.expr, tech.var.prop, gene.weights, center=FALSE, gamma=TRUE)
gene.expr |
An n x p matrix of gene expression values for n cells and p genes. |
tech.var.prop |
Vector of technical variance proportions for each of the p genes. If specified, the Mahalanobis distance will be computed using a diagonal covariance matrix generated using these proportions. If not specified, the Mahalanobis distances will be computed using a diagonal covariance matrix generated from the sample variances. |
gene.weights |
Optional vector of gene weights. If specified, weights must be > 0. The weights are used to adjust the gene variance values included in the computation of the modified Mahalanobis distances. Specifically, the gene variance is divided by the gene weight. This adjustment means that large weights will increase the influence of a given gene in the computation of the modified Mahalanobis distance. |
center |
If true, will mean center the values in the computation of the Mahalanobis statistic. If false, will compute the Mahalanobis distance from the origin. Default is F. |
gamma |
If true, will fit a gamma distribution to the non-zero squared Mahalanobis distances computed from
a row-permuted version of |
A data.frame
with the following elements (row names will match row names from gene.expr):
"cdf.value": 1 minus the one-sided p-values computed from the squared adjusted Mahalanobis distances.
"distance.sq": The squared adjusted Mahalanobis distances for the n cells.
# Simulate Poisson expression data for 10 genes and 10 cells gene.expr=matrix(rpois(100, lambda=2), nrow=10) # Simulate technical variance proportions tech.var.prop=runif(10) # Execute VAM to compute scores for the 10 genes on each cell vam(gene.expr=gene.expr, tech.var.prop=tech.var.prop) # Create weights that prioritize the first 5 genes gene.weights = c(rep(2,5), rep(1,5)) # Execute VAM using the weights vam(gene.expr=gene.expr, tech.var.prop=tech.var.prop, gene.weights=gene.weights)
# Simulate Poisson expression data for 10 genes and 10 cells gene.expr=matrix(rpois(100, lambda=2), nrow=10) # Simulate technical variance proportions tech.var.prop=runif(10) # Execute VAM to compute scores for the 10 genes on each cell vam(gene.expr=gene.expr, tech.var.prop=tech.var.prop) # Create weights that prioritize the first 5 genes gene.weights = c(rep(2,5), rep(1,5)) # Execute VAM using the weights vam(gene.expr=gene.expr, tech.var.prop=tech.var.prop, gene.weights=gene.weights)
Executes the Variance-adjusted Mahalanobis (VAM) method (vam
) on multiple gene sets, i.e., a gene set collection.
vamForCollection(gene.expr, gene.set.collection, tech.var.prop, gene.weights, center=FALSE, gamma=TRUE)
vamForCollection(gene.expr, gene.set.collection, tech.var.prop, gene.weights, center=FALSE, gamma=TRUE)
gene.expr |
An n x p matrix of gene expression values for n cells and p genes. |
gene.set.collection |
List of m gene sets for which scores are computed.
Each element in the list corresponds to a gene set and the list element is a vector
of indices for the genes in the set. The index value is defined relative to the
order of genes in the |
tech.var.prop |
See description in |
gene.weights |
See description in |
center |
See description in |
gamma |
See description in |
A list containing two elements:
"cdf.value": n x m matrix of 1 minus the one-sided p-values for the m gene sets and n cells.
"distance.sq": n x m matrix of squared adjusted Mahalanobis distances for the m gene sets and n cells.
# Simulate Poisson expression data for 10 genes and 10 cells gene.expr=matrix(rpois(100, lambda=2), nrow=10) # Simulate technical variance proportions tech.var.prop=runif(10) # Define a collection with two disjoint sets that span the 10 genes collection=list(set1=1:5, set2=6:10) # Execute VAM on both sets using default values for center and gamma vamForCollection(gene.expr=gene.expr, gene.set.collection=collection, tech.var.prop=tech.var.prop) # Create weights that prioritize the first 2 genes for the first set # and the last 2 genes for the second set gene.weights = list(c(2,2,1,1,1),c(1,1,1,2,2)) # Execute VAM using the weights vamForCollection(gene.expr=gene.expr, gene.set.collection=collection, tech.var.prop=tech.var.prop, gene.weights=gene.weights)
# Simulate Poisson expression data for 10 genes and 10 cells gene.expr=matrix(rpois(100, lambda=2), nrow=10) # Simulate technical variance proportions tech.var.prop=runif(10) # Define a collection with two disjoint sets that span the 10 genes collection=list(set1=1:5, set2=6:10) # Execute VAM on both sets using default values for center and gamma vamForCollection(gene.expr=gene.expr, gene.set.collection=collection, tech.var.prop=tech.var.prop) # Create weights that prioritize the first 2 genes for the first set # and the last 2 genes for the second set gene.weights = list(c(2,2,1,1,1),c(1,1,1,2,2)) # Execute VAM using the weights vamForCollection(gene.expr=gene.expr, gene.set.collection=collection, tech.var.prop=tech.var.prop, gene.weights=gene.weights)
Executes the Variance-adjusted Mahalanobis (VAM) method (vamForCollection
) on
normalized scRNA-seq data stored in a Seurat object.
If the Seurat NormalizeData
method was used for normalization, the technical variance of each gene is computed as
the proportion of technical variance (from FindVariableFeatures
) multiplied by the variance of the normalized counts.
If SCTransform
was used for normalization, the technical variance for each gene is set
to 1 (the normalized counts output by SCTransform
should have variance 1 if there is only technical variation).
vamForSeurat(seurat.data, gene.weights, gene.set.collection, center=FALSE, gamma=TRUE, sample.cov=FALSE, return.dist=FALSE)
vamForSeurat(seurat.data, gene.weights, gene.set.collection, center=FALSE, gamma=TRUE, sample.cov=FALSE, return.dist=FALSE)
seurat.data |
The Seurat object that holds the scRNA-seq data. Assumes normalization has already been performed. |
gene.weights |
See description in |
gene.set.collection |
List of m gene sets for which scores are computed.
Each element in the list corresponds to a gene set and the list element is a vector
of indices for the genes in the set. The index value is defined relative to the
order of genes in the relevant |
center |
See description in |
gamma |
See description in |
sample.cov |
If true, will use the a diagonal covariance matrix generated from the
sample variances to compute the squared adjusted Mahalanobis distances (this is equivalent to not specifying
|
return.dist |
If true, will return the squared adjusted Mahalanobis distances in a new Assay object called "VAM.dist". Default is F. |
Updated Seurat object that hold the VAM results in one or two new Assay objects:
If return.dist
is true, the matrix of squared adjusted Mahalanobis distances will be stored in new
Assay object called "VAM.dist".
The matrix of CDF values (1 minus the one-sided p-values) will be stored in new Assay object called "VAM.cdf".
# Only run example code if Seurat package is available if (requireNamespace("Seurat", quietly=TRUE) & requireNamespace("SeuratObject", quietly=TRUE)) { # Define a collection with one gene set for the first 10 genes collection=list(set1=1:10) # Execute on the pbmc_small scRNA-seq data set included with SeuratObject # See vignettes for more detailed Seurat examples vamForSeurat(seurat.data=SeuratObject::pbmc_small, gene.set.collection=collection) }
# Only run example code if Seurat package is available if (requireNamespace("Seurat", quietly=TRUE) & requireNamespace("SeuratObject", quietly=TRUE)) { # Define a collection with one gene set for the first 10 genes collection=list(set1=1:10) # Execute on the pbmc_small scRNA-seq data set included with SeuratObject # See vignettes for more detailed Seurat examples vamForSeurat(seurat.data=SeuratObject::pbmc_small, gene.set.collection=collection) }