Package 'PEACH'

Title: Pareto Enrichment Analysis for Combining Heterogeneous Datasets
Description: A meta gene set analysis tool developed based on principles of Pareto dominance (William B T Mock (2011) <doi:10.1007/978-1-4020-9160-5_341>). It is designed to combine gene set analysis p-values from multiple transcriptome datasets (e.g., microarray and RNA-Seq). The novel Pareto method for p-value combination allows PEACH to properly model heterogeneity and correlation in Omics datasets.
Authors: Jinyan Chan [aut, cre], Jinghua Gu [aut, cre]
Maintainer: Jinyan Chan <[email protected]>
License: GPL-2
Version: 0.1.1
Built: 2024-10-29 06:29:37 UTC
Source: CRAN


KEGG pathways

Description

This data set provides the KEGG pathways as a list of gene sets.

Usage

KEGG

Format

A large list containing 186 pathways.

Source

KEGG genesets from GSEA MSigDB Collections


Pareto Enrichment Analysis for Combining Heterogeneous Datasets

Description

This function performs pathway enrichment meta-analysis with a method based on Pareto dominance. Its inputs are the gene-level test statistics (e.g., t statistics) from the multiple datasets to be combined, and a list of pathways (or gene sets). It returns a data frame containing the pathway p-values for each individual dataset together with the Pareto-combined pathway p-values.
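As a rough illustration of the Pareto dominance idea behind the method, the sketch below counts, for each pathway, how many other pathways it dominates given a matrix of per-dataset p-values. It is not the code used inside peach: the helper name count_dominated and the toy p-values are hypothetical, and the way peach converts dominance into a combined p-value is not shown.

```r
# Hypothetical helper (not part of PEACH): for each row of a p-value matrix,
# count the rows it Pareto-dominates. Row i dominates row j when every
# p-value in row i is <= the matching p-value in row j and at least one
# is strictly smaller.
count_dominated <- function(p) {
  n <- nrow(p)
  counts <- integer(n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i != j && all(p[i, ] <= p[j, ]) && any(p[i, ] < p[j, ])) {
        counts[i] <- counts[i] + 1L
      }
    }
  }
  names(counts) <- rownames(p)
  counts
}

# Toy p-values for three pathways (rows) in two datasets (columns)
p <- matrix(c(0.01, 0.02,
              0.05, 0.04,
              0.03, 0.50),
            ncol = 2, byrow = TRUE,
            dimnames = list(c("pathA", "pathB", "pathC"),
                            c("data1", "data2")))
dominated <- count_dominated(p)
print(dominated)
```

In this toy input, pathA has the smallest p-value in both datasets and so dominates the other two pathways, while pathB and pathC do not dominate each other because each is better in one dataset only.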

Usage

peach(
  input.data = NULL,
  nsample = 1000,
  input.pathway = NULL,
  direction = "up",
  is.Fisher.Stouffer = TRUE
)

Arguments

input.data

The test statistics of each gene from multiple datasets (the test statistics come from a case-versus-control statistical test, e.g., a t-test). Rows are genes, with row names giving the gene names (official gene symbols). Columns are the individual datasets.

nsample

The number of random samples drawn for the Pareto meta-analysis p-value calculation. Because the Pareto-based meta-analysis is a non-parametric method, this parameter determines the size of the null distribution used to compute the meta-pathway p-values.

input.pathway

The pathways or gene sets, given as a list. The pathways or gene sets should be defined by official gene symbols. An example set of KEGG pathways can be loaded with data('KEGG'). (The pathway input format is the same as the output of the 'gmtPathways' function from the fgsea package.)

direction

"up" or "down", indicating whether the pathway p-value is calculated for pathway up-regulation or down-regulation. The default is "up", in which case the combined p-value from peach indicates whether a pathway is up-regulated across the datasets being combined.

is.Fisher.Stouffer

Logical. If TRUE, peach also outputs the combined meta-pathway p-values from non-parametric versions of Fisher's and Stouffer's methods. These combined p-values will differ from those of the original Fisher's or Stouffer's method, because this version uses a Monte Carlo implementation of the two methods that accounts for correlation in the input datasets.
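For reference, the classical (parametric) forms of Fisher's and Stouffer's methods can be sketched in base R as below. Both assume independent p-values, so their results are expected to differ from the Monte Carlo versions reported by peach. The function names fisher_combine and stouffer_combine and the toy p-values are hypothetical and are not part of the package.

```r
# Classical Fisher's method: -2 * sum(log(p)) follows a chi-squared
# distribution with 2k degrees of freedom when the k p-values are independent.
fisher_combine <- function(p) {
  stat <- -2 * sum(log(p))
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

# Classical (unweighted) Stouffer's method: convert one-sided p-values to
# z-scores, sum them, and rescale by sqrt(k).
stouffer_combine <- function(p) {
  z <- qnorm(p, lower.tail = FALSE)
  pnorm(sum(z) / sqrt(length(p)), lower.tail = FALSE)
}

p <- c(0.01, 0.04, 0.20)   # toy one-sided p-values from three datasets
p_fisher <- fisher_combine(p)
p_stouffer <- stouffer_combine(p)
print(c(fisher = p_fisher, stouffer = p_stouffer))
```

With a single input p-value, both functions return that p-value unchanged, which is a quick sanity check on the formulas.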

Examples

## Load the package
library(PEACH)
## Load example input data (TCGA cancer versus control t statistics)
data('TCGA.input')
## Load the KEGG pathways
data('KEGG')
## Run peach
res <- peach(input.data = TCGA.input, input.pathway = KEGG,
             direction = "up", is.Fisher.Stouffer = TRUE)
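Inputs with the same shape as TCGA.input and KEGG can also be assembled by hand. The sketch below builds a small statistics matrix and pathway list following the argument descriptions above. The values, dataset names, and set names are invented for illustration, and the final call to peach is left commented out because it needs the package installed; whether peach accepts inputs this small has not been checked.

```r
set.seed(1)
genes <- c("TP53", "EGFR", "MYC", "KRAS", "BRCA1", "PTEN")

# Rows are genes (row names are official gene symbols); columns are datasets.
# The random values stand in for real case-versus-control t statistics.
toy.data <- matrix(rnorm(length(genes) * 3), nrow = length(genes),
                   dimnames = list(genes, c("study1", "study2", "study3")))

# A named list of character vectors, one vector of gene symbols per set.
toy.pathway <- list(
  SET_A = c("TP53", "MYC", "PTEN"),
  SET_B = c("EGFR", "KRAS", "BRCA1")
)

# Check that every pathway gene is present in the statistics matrix.
all.covered <- all(unlist(toy.pathway) %in% rownames(toy.data))
print(all.covered)

# With PEACH installed, the call would then look like:
# res <- peach(input.data = toy.data, input.pathway = toy.pathway)
```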

Sample t-statistic scores from gene differential expression analysis of 16 cancer types

Description

This data set gives the tumor-versus-normal t-test scores of each gene for 16 TCGA cancer types.

Usage

TCGA.input

Format

A matrix containing the t statistics of 20501 genes (rows) for 16 cancer types (columns).

Source

TCGA