Package 'TPAC' reference manual

Title:	Tissue-Adjusted Pathway Analysis of Cancer (TPAC)
Description:	Contains logic for single sample gene set testing of cancer transcriptomic data with adjustment for normal tissue-specificity. Frost, H. Robert (2023) "Tissue-adjusted pathway analysis of cancer (TPAC)" <doi:10.1101/2022.03.17.484779>.
Authors:	H. Robert Frost [aut, cre]
Maintainer:	H. Robert Frost <[email protected]>
License:	GPL (>= 2)
Version:	0.2.0
Built:	2025-02-28 08:17:55 UTC
Source:	CRAN

Tissue-adjusted pathway analysis of cancer (TPAC)

Description

Implementation of TPAC, a method for single sample gene set testing of cancer transcriptomics data that leverages normal tissue-specificity.

Details

Package:	TPAC
Type:	Package
Version:	0.1.0
Date:	2023
License:	GPL-2

Note

This work was supported by the National Institutes of Health grants R35GM146586, R21CA253408, P20GM130454 and P30CA023108.

Author(s)

H. Robert Frost

References

Frost, H. R. (2022). Tissue-adjusted pathway analysis of cancer (TPAC). biorXiv e-prints. doi: https://doi.org/10.1101/2022.03.17.484779

Utility function to help create gene set collection list object

Description

Utility function that creates a gene set collection list in the format required by tpacForCollection given the gene IDs measured in the expression matrix and a list of gene sets as defined by the IDs of the member genes.

Usage

    createGeneSetCollection(gene.ids, gene.set.collection, min.size=1, max.size)
createGeneSetCollection(gene.ids, gene.set.collection, min.size=1, max.size)

Arguments

`gene.ids`	Vector of gene IDs. This should correspond to the genes measured in the gene expression data.
`gene.set.collection`	List of gene sets where each element in the list corresponds to a gene set and the list element is a vector of gene IDs. List names are gene set names. Must contain at least one gene set.
`min.size`	Minimum gene set size after filtering out genes not in the gene.ids vector. Gene sets whose post-filtering size is below this are removed from the final collection list. Default is 1 and cannot be set to less than 1.
`max.size`	Maximum gene set size after filtering out genes not in the gene.ids vector. Gene sets whose post-filtering size is above this are removed from the final collection list. If not specified, no filtering is performed.

Value

Version of the input gene.set.collection list where gene IDs have been replaced by position indices, genes not present in the gene.ids vector have been removed and gene sets failing the min/max size constraints have been removed.

Examples

    # Create a collection with two sets defined over 3 genes
    createGeneSetCollection(gene.ids=c("A", "B", "C"),
        gene.set.collection = list(set1=c("A", "B"), set2=c("B", "C")),
        min.size=2, max.size=3)
# Create a collection with two sets defined over 3 genes
    createGeneSetCollection(gene.ids=c("A", "B", "C"),
        gene.set.collection = list(set1=c("A", "B"), set2=c("B", "C")),
        min.size=2, max.size=3)

Get supported cancer types

Description

Retrieves the names of cancer types supported by the tpacForCancer function.

Usage

    getSupportedCancerTypes()
getSupportedCancerTypes()

Value

Vector containing the names of supported cancer types.

Examples

  getSupportedCancerTypes()
getSupportedCancerTypes()

Tissue-adjusted pathway analysis of cancer (TPAC) algorithm

Description

Implementation of the tissue-adjusted pathway analysis of cancer (TPAC) method, which performs single sample gene set scoring of cancer transcriptomic data. The TPAC method computes a tissue-adjusted squared Mahalanobis distance and one-sided CDF values for all samples in the specified cancer gene expression matrix. Importantly, these distances are measured from the normal tissue mean and normal tissue specificity (i.e., fold-change in mean expression in the corresponding normal tissue relative to the mean expression in other tissue types) is used to weight genes when computing the distance metric. The distances are decomposed into a positive component, negative component and overall component, i.e., the positive component only captures expression values above the mean. A gamma distribution is fit on the squared tissue-adjusted Mahalanobis distances computed on data that is permuted to match to the null that the cancer expression data is uncorrelated with mean values matching those found in the associated normal tissue. This null gamma distribution is used to calculate the CDF values for the unpermuted distances.

Usage

    tpac(gene.expr, mean.expr, tissue.specificity)
tpac(gene.expr, mean.expr, tissue.specificity)

Arguments

`gene.expr`	An n x p matrix of gene expression values for n samples and p genes. This matrix should reflect the subset of the full expression profile that corresponds to a single gene set.
`mean.expr`	Vector of mean expression values for the p genes from which distances will be measured, i.e., mean expression in corresponding normal tissue.These mean expression values must be computed using the same normalization method used for tumor expression data in `gene.expr`
`tissue.specificity`	Optional vector of tissue-specificity values for the p genes, i.e, fold change between mean expression in the associated normal tissue and the average of mean expression values in other cancer-associated normal tissues. If specified, these specificity values are used to modify the diagonal elements of the covariance matrix used in the Mahalanobis statistic.

Value

A data.frame with the following elements (row names will match row names from gene.expr):

s.pos: vector of TPAC scores (i.e., CDF values) computed from the positive squared adjusted Mahalanobis distances.
s.neg: vector of TPAC scores computed from the negative squared adjusted Mahalanobis distances.
s: vector of TPAC scores computed from the sum of the positive and negative squared adjusted Mahalanobis distances.

Examples

    # Simulate Gaussian expression data for 10 genes and 10 samples
    gene.expr=matrix(rnorm(100), nrow=10)
    # Use 0 as mean.expr 
    mean.expr=rep(0,10)
    # Simulate tissue-specific weights
    tissue.specificity = runif(10, min=0.5, max=1.5)
    # Execute TPAC to compute positive, negative and overall scores 
    tpac(gene.expr=gene.expr, mean.expr=mean.expr, tissue.specificity=tissue.specificity)
# Simulate Gaussian expression data for 10 genes and 10 samples
    gene.expr=matrix(rnorm(100), nrow=10)
    # Use 0 as mean.expr 
    mean.expr=rep(0,10)
    # Simulate tissue-specific weights
    tissue.specificity = runif(10, min=0.5, max=1.5)
    # Execute TPAC to compute positive, negative and overall scores 
    tpac(gene.expr=gene.expr, mean.expr=mean.expr, tissue.specificity=tissue.specificity)

Executes the TPAC method on cancer gene expression data

Description

Executes the TPAC (tissue-adjusted pathway analysis for cancer) method (tpacForCollection) on cancer gene expression data using normal tissue expression data from the Human Protein Atlas (HPA) that is included in the package as hpa.data. This HPA normal tissue data was specially processed by the HPA group as FPKM values using a pipeline similar to that employed by GDC for the TCGA data. For consistency with this HPA normal tissue data, the provided cancer.gene.expr data must be specified as FPKM+1 values. Please see the vignette for an example of calling this function using appropriately normalized TCGA gene expression data.

Usage

    tpacForCancer(cancer.gene.expr, cancer.type, gene.set.collection, 
                  min.set.size=1, max.set.size)
tpacForCancer(cancer.gene.expr, cancer.type, gene.set.collection, 
                  min.set.size=1, max.set.size)

Arguments

`cancer.gene.expr`	An n x p matrix of gene expression values for n tumors of the specified tumor type and p genes. The data should be normalized as FPKM+1 values, row names should be sample ID, and column names should be Ensembl gene IDs.
`cancer.type`	Cancer type of the expression data. Must be one of the supported cancer types as per `getSupportedCancerTypes`.
`gene.set.collection`	List of m gene sets for which scores are computed. Each element in the list corresponds to a gene set and the list element is a vector of Ensembl IDs for genes in the set. Gene set names should be specified as list names.
`min.set.size`	See description of `min.size` in `createGeneSetCollection`
`max.set.size`	See description of `max.size` in `createGeneSetCollection`

Value

A list containing two elements:

S.pos: n x m matrix of TPAC scores computed using the positive squared adjusted Mahalanobis distances.
S.neg: n x m matrix of TPAC scores computed using the negative squared adjusted Mahalanobis distances.
S: n x m matrix of TPAC scorescomputed using the sum of the positive and negative squared adjusted Mahalanobis distances.

Examples

    # Simulate Gaussian expression data for 10 genes and 10 samples
    # (Note: cancer expression should be FPKM+1 for real applications)
    cancer.gene.expr=matrix(rnorm(200), nrow=20)
    # Create arbitrary Ensembl IDs
    gene.ids = c("ENSG00000000003","ENSG00000000005","ENSG00000000419",
                 "ENSG00000000457","ENSG00000000460","ENSG00000000938",
                 "ENSG00000000971","ENSG00000001036","ENSG00000001084",
                 "ENSG00000001167")
    colnames(cancer.gene.expr) = gene.ids
    # Define a collection with two disjoint sets that span the 10 genes
    collection=list(set1=gene.ids[1:5], set2=gene.ids[6:10])    
    # Execute TPAC on both sets
    tpacForCancer(cancer.gene.expr=cancer.gene.expr, cancer.type="glioma", 
                  gene.set.collection=collection)
# Simulate Gaussian expression data for 10 genes and 10 samples
    # (Note: cancer expression should be FPKM+1 for real applications)
    cancer.gene.expr=matrix(rnorm(200), nrow=20)
    # Create arbitrary Ensembl IDs
    gene.ids = c("ENSG00000000003","ENSG00000000005","ENSG00000000419",
                 "ENSG00000000457","ENSG00000000460","ENSG00000000938",
                 "ENSG00000000971","ENSG00000001036","ENSG00000001084",
                 "ENSG00000001167")
    colnames(cancer.gene.expr) = gene.ids
    # Define a collection with two disjoint sets that span the 10 genes
    collection=list(set1=gene.ids[1:5], set2=gene.ids[6:10])    
    # Execute TPAC on both sets
    tpacForCancer(cancer.gene.expr=cancer.gene.expr, cancer.type="glioma", 
                  gene.set.collection=collection)

TPAC method for multiple gene sets

Description

Executes the TPAC (tissue-adjusted pathway analysis for cancer) method (tpac) on multiple gene sets, i.e., a gene set collection.

Usage

    tpacForCollection(gene.expr, mean.expr, tissue.specificity, gene.set.collection)
tpacForCollection(gene.expr, mean.expr, tissue.specificity, gene.set.collection)

Arguments

`gene.expr`	See description in `tpac`
`mean.expr`	See description in `tpac`
`tissue.specificity`	See description in `tpac`
`gene.set.collection`	List of m gene sets for which scores are computed. Each element in the list corresponds to a gene set and the list element is a vector of indices for the genes in the set. The index value is defined relative to the order of genes in the `gene.expr` matrix. Gene set names should be specified as list names.

Value

A list containing two elements:

s.pos: n x m matrix of TPAC scores computed using the positive squared adjusted Mahalanobis distances.
s.neg: n x m matrix of TPAC scores computed using the negative squared adjusted Mahalanobis distances.
s: n x m matrix of TPAC scores computed using the sum of the positive and negative squared adjusted Mahalanobis distances.

Examples

    # Simulate Gaussian expression data for 10 genes and 10 samples
    gene.expr=matrix(rnorm(100), nrow=10)
    # Use 0 as mean.expr 
    mean.expr=rep(0,10)
    # Simulate tissue-specific weights
    tissue.specificity = runif(10, min=0.5, max=1.5)
    # Define a collection with two disjoint sets that span the 10 genes
    collection=list(set1=1:5, set2=6:10)    
    # Execute TPAC on both sets
    tpacForCollection(gene.expr=gene.expr, mean.expr=mean.expr,
        tissue.specificity=tissue.specificity, gene.set.collection=collection)
# Simulate Gaussian expression data for 10 genes and 10 samples
    gene.expr=matrix(rnorm(100), nrow=10)
    # Use 0 as mean.expr 
    mean.expr=rep(0,10)
    # Simulate tissue-specific weights
    tissue.specificity = runif(10, min=0.5, max=1.5)
    # Define a collection with two disjoint sets that span the 10 genes
    collection=list(set1=1:5, set2=6:10)    
    # Execute TPAC on both sets
    tpacForCollection(gene.expr=gene.expr, mean.expr=mean.expr,
        tissue.specificity=tissue.specificity, gene.set.collection=collection)

Package 'TPAC'

Help Index

Tissue-adjusted pathway analysis of cancer (TPAC)

Description

Details

Note

Author(s)

References

Utility function to help create gene set collection list object

Description

Usage

Arguments

Value

See Also

Examples

Get supported cancer types

Description

Usage

Value

See Also

Examples

Tissue-adjusted pathway analysis of cancer (TPAC) algorithm

Description

Usage

Arguments

Value

See Also

Examples

Executes the TPAC method on cancer gene expression data

Description

Usage

Arguments

Value

See Also

Examples

TPAC method for multiple gene sets

Description

Usage

Arguments

Value

See Also

Examples