Title: | Tissue-Adjusted Pathway Analysis of Cancer (TPAC) |
---|---|
Description: | Contains logic for single sample gene set testing of cancer transcriptomic data with adjustment for normal tissue-specificity. Frost, H. Robert (2023) "Tissue-adjusted pathway analysis of cancer (TPAC)" <doi:10.1101/2022.03.17.484779>. |
Authors: | H. Robert Frost [aut, cre] |
Maintainer: | H. Robert Frost <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.2.0 |
Built: | 2024-10-31 22:27:55 UTC |
Source: | CRAN |
Implementation of TPAC, a method for single sample gene set testing of cancer transcriptomics data that leverages normal tissue-specificity.
Package: | TPAC |
Type: | Package |
Version: | 0.1.0 |
Date: | 2023 |
License: | GPL-2 |
This work was supported by the National Institutes of Health grants R35GM146586, R21CA253408, P20GM130454 and P30CA023108.
H. Robert Frost
Frost, H. R. (2022). Tissue-adjusted pathway analysis of cancer (TPAC). biorXiv e-prints. doi: https://doi.org/10.1101/2022.03.17.484779
Utility function that creates a gene set collection list in the format required by tpacForCollection
given the gene IDs measured in the expression matrix and a list of gene sets as defined by the IDs of the member genes.
createGeneSetCollection(gene.ids, gene.set.collection, min.size=1, max.size)
createGeneSetCollection(gene.ids, gene.set.collection, min.size=1, max.size)
gene.ids |
Vector of gene IDs. This should correspond to the genes measured in the gene expression data. |
gene.set.collection |
List of gene sets where each element in the list corresponds to a gene set and the list element is a vector of gene IDs. List names are gene set names. Must contain at least one gene set. |
min.size |
Minimum gene set size after filtering out genes not in the gene.ids vector. Gene sets whose post-filtering size is below this are removed from the final collection list. Default is 1 and cannot be set to less than 1. |
max.size |
Maximum gene set size after filtering out genes not in the gene.ids vector. Gene sets whose post-filtering size is above this are removed from the final collection list. If not specified, no filtering is performed. |
Version of the input gene.set.collection list where gene IDs have been replaced by position indices, genes not present in the gene.ids vector have been removed and gene sets failing the min/max size constraints have been removed.
# Create a collection with two sets defined over 3 genes createGeneSetCollection(gene.ids=c("A", "B", "C"), gene.set.collection = list(set1=c("A", "B"), set2=c("B", "C")), min.size=2, max.size=3)
# Create a collection with two sets defined over 3 genes createGeneSetCollection(gene.ids=c("A", "B", "C"), gene.set.collection = list(set1=c("A", "B"), set2=c("B", "C")), min.size=2, max.size=3)
Retrieves the names of cancer types supported by the tpacForCancer
function.
getSupportedCancerTypes()
getSupportedCancerTypes()
Vector containing the names of supported cancer types.
getSupportedCancerTypes()
getSupportedCancerTypes()
Implementation of the tissue-adjusted pathway analysis of cancer (TPAC) method, which performs single sample gene set scoring of cancer transcriptomic data. The TPAC method computes a tissue-adjusted squared Mahalanobis distance and one-sided CDF values for all samples in the specified cancer gene expression matrix. Importantly, these distances are measured from the normal tissue mean and normal tissue specificity (i.e., fold-change in mean expression in the corresponding normal tissue relative to the mean expression in other tissue types) is used to weight genes when computing the distance metric. The distances are decomposed into a positive component, negative component and overall component, i.e., the positive component only captures expression values above the mean. A gamma distribution is fit on the squared tissue-adjusted Mahalanobis distances computed on data that is permuted to match to the null that the cancer expression data is uncorrelated with mean values matching those found in the associated normal tissue. This null gamma distribution is used to calculate the CDF values for the unpermuted distances.
tpac(gene.expr, mean.expr, tissue.specificity)
tpac(gene.expr, mean.expr, tissue.specificity)
gene.expr |
An n x p matrix of gene expression values for n samples and p genes. This matrix should reflect the subset of the full expression profile that corresponds to a single gene set. |
mean.expr |
Vector of mean expression values for the p genes from which distances will be measured, i.e., mean expression in
corresponding normal tissue.These mean expression values must be computed using the same normalization method used for
tumor expression data in |
tissue.specificity |
Optional vector of tissue-specificity values for the p genes, i.e, fold change between mean expression in the associated normal tissue and the average of mean expression values in other cancer-associated normal tissues. If specified, these specificity values are used to modify the diagonal elements of the covariance matrix used in the Mahalanobis statistic. |
A data.frame
with the following elements (row names will match row names from gene.expr
):
s.pos
: vector of TPAC scores (i.e., CDF values) computed from the positive squared adjusted Mahalanobis distances.
s.neg
: vector of TPAC scores computed from the negative squared adjusted Mahalanobis distances.
s
: vector of TPAC scores computed from the sum of the positive and negative squared adjusted Mahalanobis distances.
# Simulate Gaussian expression data for 10 genes and 10 samples gene.expr=matrix(rnorm(100), nrow=10) # Use 0 as mean.expr mean.expr=rep(0,10) # Simulate tissue-specific weights tissue.specificity = runif(10, min=0.5, max=1.5) # Execute TPAC to compute positive, negative and overall scores tpac(gene.expr=gene.expr, mean.expr=mean.expr, tissue.specificity=tissue.specificity)
# Simulate Gaussian expression data for 10 genes and 10 samples gene.expr=matrix(rnorm(100), nrow=10) # Use 0 as mean.expr mean.expr=rep(0,10) # Simulate tissue-specific weights tissue.specificity = runif(10, min=0.5, max=1.5) # Execute TPAC to compute positive, negative and overall scores tpac(gene.expr=gene.expr, mean.expr=mean.expr, tissue.specificity=tissue.specificity)
Executes the TPAC (tissue-adjusted pathway analysis for cancer) method (tpacForCollection
) on cancer gene expression data
using normal tissue expression data from the Human Protein Atlas (HPA) that is included in the package as hpa.data
.
This HPA normal tissue data was specially processed by the HPA group as FPKM values using a pipeline similar to that employed by GDC for the TCGA data. For consistency with this HPA normal tissue data, the provided cancer.gene.expr
data must be specified as FPKM+1 values. Please see the vignette for an example of calling this function using appropriately normalized TCGA gene expression data.
tpacForCancer(cancer.gene.expr, cancer.type, gene.set.collection, min.set.size=1, max.set.size)
tpacForCancer(cancer.gene.expr, cancer.type, gene.set.collection, min.set.size=1, max.set.size)
cancer.gene.expr |
An n x p matrix of gene expression values for n tumors of the specified tumor type and p genes. The data should be normalized as FPKM+1 values, row names should be sample ID, and column names should be Ensembl gene IDs. |
cancer.type |
Cancer type of the expression data. Must be one of the supported cancer types as per |
gene.set.collection |
List of m gene sets for which scores are computed. Each element in the list corresponds to a gene set and the list element is a vector of Ensembl IDs for genes in the set. Gene set names should be specified as list names. |
min.set.size |
See description of |
max.set.size |
See description of |
A list containing two elements:
S.pos
: n x m matrix of TPAC scores computed using the positive squared adjusted Mahalanobis distances.
S.neg
: n x m matrix of TPAC scores computed using the negative squared adjusted Mahalanobis distances.
S
: n x m matrix of TPAC scorescomputed using the sum of the positive and negative squared adjusted Mahalanobis distances.
tpac
, hpa.data
, tpacForCollection
, getSupportedCancerTypes
, createGeneSetCollection
# Simulate Gaussian expression data for 10 genes and 10 samples # (Note: cancer expression should be FPKM+1 for real applications) cancer.gene.expr=matrix(rnorm(200), nrow=20) # Create arbitrary Ensembl IDs gene.ids = c("ENSG00000000003","ENSG00000000005","ENSG00000000419", "ENSG00000000457","ENSG00000000460","ENSG00000000938", "ENSG00000000971","ENSG00000001036","ENSG00000001084", "ENSG00000001167") colnames(cancer.gene.expr) = gene.ids # Define a collection with two disjoint sets that span the 10 genes collection=list(set1=gene.ids[1:5], set2=gene.ids[6:10]) # Execute TPAC on both sets tpacForCancer(cancer.gene.expr=cancer.gene.expr, cancer.type="glioma", gene.set.collection=collection)
# Simulate Gaussian expression data for 10 genes and 10 samples # (Note: cancer expression should be FPKM+1 for real applications) cancer.gene.expr=matrix(rnorm(200), nrow=20) # Create arbitrary Ensembl IDs gene.ids = c("ENSG00000000003","ENSG00000000005","ENSG00000000419", "ENSG00000000457","ENSG00000000460","ENSG00000000938", "ENSG00000000971","ENSG00000001036","ENSG00000001084", "ENSG00000001167") colnames(cancer.gene.expr) = gene.ids # Define a collection with two disjoint sets that span the 10 genes collection=list(set1=gene.ids[1:5], set2=gene.ids[6:10]) # Execute TPAC on both sets tpacForCancer(cancer.gene.expr=cancer.gene.expr, cancer.type="glioma", gene.set.collection=collection)
Executes the TPAC (tissue-adjusted pathway analysis for cancer) method (tpac
) on multiple gene sets, i.e., a gene set collection.
tpacForCollection(gene.expr, mean.expr, tissue.specificity, gene.set.collection)
tpacForCollection(gene.expr, mean.expr, tissue.specificity, gene.set.collection)
gene.expr |
See description in |
mean.expr |
See description in |
tissue.specificity |
See description in |
gene.set.collection |
List of m gene sets for which scores are computed.
Each element in the list corresponds to a gene set and the list element is a vector
of indices for the genes in the set. The index value is defined relative to the
order of genes in the |
A list containing two elements:
s.pos
: n x m matrix of TPAC scores computed using the positive squared adjusted Mahalanobis distances.
s.neg
: n x m matrix of TPAC scores computed using the negative squared adjusted Mahalanobis distances.
s
: n x m matrix of TPAC scores computed using the sum of the positive and negative squared adjusted Mahalanobis distances.
# Simulate Gaussian expression data for 10 genes and 10 samples gene.expr=matrix(rnorm(100), nrow=10) # Use 0 as mean.expr mean.expr=rep(0,10) # Simulate tissue-specific weights tissue.specificity = runif(10, min=0.5, max=1.5) # Define a collection with two disjoint sets that span the 10 genes collection=list(set1=1:5, set2=6:10) # Execute TPAC on both sets tpacForCollection(gene.expr=gene.expr, mean.expr=mean.expr, tissue.specificity=tissue.specificity, gene.set.collection=collection)
# Simulate Gaussian expression data for 10 genes and 10 samples gene.expr=matrix(rnorm(100), nrow=10) # Use 0 as mean.expr mean.expr=rep(0,10) # Simulate tissue-specific weights tissue.specificity = runif(10, min=0.5, max=1.5) # Define a collection with two disjoint sets that span the 10 genes collection=list(set1=1:5, set2=6:10) # Execute TPAC on both sets tpacForCollection(gene.expr=gene.expr, mean.expr=mean.expr, tissue.specificity=tissue.specificity, gene.set.collection=collection)