Differential Co-Expression and Differential Expression Analysis

Package description

This package performs an integrated differential expression (DE) and differential co-expression (DC) analysis of gene expression data, based on the DECODE (DifferEntial CO-expression and Differential Expression) algorithm. Given gene expression data and functional gene set data, it returns a summary table of the selected gene sets that show both high differential co-expression and high differential expression (HDC-HDE).
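The two ingredients can be illustrated in base R. The sketch below is a simplified stand-in, not the DECODE implementation: it assumes DE is scored by a per-gene Welch two-sample t-statistic and DC by the difference between case and control Pearson correlations for each gene pair. The runDecode log reports computing t-statistics and pairwise correlations for each state, but the exact DC measure and threshold search are defined in the DECODE paper. `toy_de_dc` is a hypothetical name; the input uses three rows of the example expression data, with class label 2 for case and 1 for control samples.

```r
# Hypothetical helper (not part of the decode package).
# DE: Welch two-sample t-statistic per gene (case vs control).
# DC: case correlation minus control correlation for every gene pair.
toy_de_dc <- function(expr, labels) {
  # expr: numeric matrix, genes in rows, samples in columns
  # labels: class label per column (2 = case, 1 = control)
  case <- expr[, labels == 2, drop = FALSE]
  ctrl <- expr[, labels == 1, drop = FALSE]

  # One t-statistic per gene; negative means lower mean expression in cases
  t_stat <- vapply(seq_len(nrow(expr)), function(i) {
    unname(t.test(case[i, ], ctrl[i, ])$statistic)
  }, numeric(1))
  names(t_stat) <- rownames(expr)

  # Gene-by-gene Pearson correlation within each state
  r_case <- cor(t(case))
  r_ctrl <- cor(t(ctrl))
  list(t = t_stat, dc = r_case - r_ctrl)
}

# Three rows taken from the example expression data in this document
expr <- rbind(
  `7A5`  = c(5.12621, 5.19419, 5.06645, 5.40649, 5.51259, 5.38700),
  A2M    = c(7.65332, 6.56431, 8.20163, 9.19837, 9.04295, 10.1448),
  A4GALT = c(6.34751, 5.56750, 6.92335, 7.49523, 7.12119, 6.54748)
)
labels <- c(2, 2, 2, 1, 1, 1)

res <- toy_de_dc(expr, labels)
res$t             # all three are negative: lower mean expression in cases
round(res$dc, 2)  # diagonal is 0 because each gene correlates 1 with itself
```

With only three samples per group these statistics are very noisy; the sketch only shows the shape of the computation.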

Reference

Lui, TWH, Tsui, NBY, Chan, LWC, SP Siu, PM, Wong, C, Yung, BYM. (2015) DECODE: an integrated differential co-expression and differential expression analysis of gene expression data. BMC Bioinformatics, May 31;16:182. http://www.biomedcentral.com/1471-2105/16/182

Input gene expression data format

Data format:

geneName  probeID       2        2        2        1        1        1
7A5       ILMN_1762337  5.12621  5.19419  5.06645  5.40649  5.51259  5.38700
A1BG      ILMN_2055271  5.63504  5.68533  5.66251  5.37466  5.43955  5.50973
A1CF      ILMN_2383229  5.41543  5.58543  5.43239  5.49634  5.62685  5.36962
A26C3     ILMN_1653355  5.56713  5.55470  5.59547  5.46895  5.49622  5.50094
A2BP1     ILMN_1814316  5.23016  5.33808  5.31413  5.30586  5.40108  5.31855
A2M       ILMN_1745607  7.65332  6.56431  8.20163  9.19837  9.04295  10.1448
A2ML1     ILMN_2136495  5.53532  5.93801  5.33728  5.36676  5.79942  5.13974
A3GALT2   ILMN_1668111  5.18578  5.35207  5.30554  5.26107  5.26536  5.28932
A4GALT    ILMN_1735045  6.34751  5.56750  6.92335  7.49523  7.12119  6.54748
A4GNT     ILMN_1680754  5.26417  5.28596  5.27560  5.28830  5.08440  5.44869

The first two columns hold the gene symbol (geneName) and the probe identifier (probeID). Each remaining column is one sample, and its header is the sample's class label: 2 for a case sample and 1 for a control sample. In this example, columns 3 to 5 are case samples 1 to 3 and columns 6 to 8 are control samples 1 to 3. Each following row holds the expression values of one probe.
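If the data start as an R matrix, a file in the layout above can be written with base R. This is a sketch under assumptions: `write_expression_file` is a hypothetical helper, not a decode function, and the tab separator is assumed because the layout above does not show the delimiter; compare the output with the sample files in `system.file('extdata', package = 'decode')` before relying on it.

```r
# Hypothetical helper (not part of decode): writes an expression matrix with a
# header line "geneName, probeID, <class label per sample>" (2 = case,
# 1 = control). ASSUMPTION: fields are tab-separated.
write_expression_file <- function(expr, gene_names, probe_ids, labels, file) {
  stopifnot(ncol(expr) == length(labels), all(labels %in% c(1, 2)),
            nrow(expr) == length(gene_names),
            nrow(expr) == length(probe_ids))
  header <- paste(c("geneName", "probeID", labels), collapse = "\t")
  # One tab-joined string of values per probe (row)
  values <- apply(expr, 1, paste, collapse = "\t")
  body <- paste(gene_names, probe_ids, values, sep = "\t")
  writeLines(c(header, body), file)
  invisible(file)
}

# Usage with the first two rows of the example table
expr <- rbind(c(5.12621, 5.19419, 5.06645, 5.40649, 5.51259, 5.38700),
              c(5.63504, 5.68533, 5.66251, 5.37466, 5.43955, 5.50973))
f <- tempfile(fileext = ".txt")
write_expression_file(expr, c("7A5", "A1BG"),
                      c("ILMN_1762337", "ILMN_2055271"),
                      labels = c(2, 2, 2, 1, 1, 1), file = f)
readLines(f)
```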

Input gene set data format

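Data format: one gene set per line, with tab-separated fields. Column 1 is the gene set name, column 2 is the gene set identifier, and columns 3, 4, … list the member genes.
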
positive regulation of epidermal growth factor-activated receptor activity GO 0045741 EREG FBXW7 EPGN ADAM17 ADRA2C ADRA2A TGFA EGF
pyrimidine-containing compound salvage GO 0008655 UPP1 TYMP TK1 UPP2 UCKL1 CDA TK2 UCK1 DCK
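Since each gene set is one tab-delimited line (the Example 2 log shows the package splitting lines with `strsplit(oneLine, "\t")`), a file in this format can be inspected in R as sketched below. `parse_gene_sets`, `find_invalid_lines` and the identifiers `set_id_1`/`set_id_2` are hypothetical, not part of decode. The `strsplit` warnings in the Example 2 log appear to come from a gene set name containing bytes that are not valid UTF-8; `validUTF8()` is one way to locate such lines before a run.

```r
# Hypothetical helpers (not part of decode).
# parse_gene_sets: column 1 = set name, column 2 = identifier,
# columns 3 onward = member genes.
parse_gene_sets <- function(lines) {
  fields <- strsplit(lines, "\t", fixed = TRUE)
  lapply(fields, function(f) {
    list(name = f[1], id = f[2], genes = f[-(1:2)])
  })
}

# Indices of lines that are not valid UTF-8 (a likely source of
# "unable to translate ... to a wide string" warnings in a UTF-8 locale)
find_invalid_lines <- function(lines) which(!validUTF8(lines))

# In practice: lines <- readLines(geneSetInputFile)
lines <- c(
  paste("pyrimidine-containing compound salvage", "set_id_1",
        "UPP1", "TYMP", "TK1", sep = "\t"),
  paste("example set", "set_id_2", "EREG", "EGF", sep = "\t")
)
sets <- parse_gene_sets(lines)
sets[[1]]$genes
find_invalid_lines(lines)
```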

To load the package:

library(decode)

Example 1:

This example runs the analysis on a larger gene expression data set of 1,400 genes. It took about 16 minutes to complete on a computer with an Intel Core i7-4600 processor (2.69 GHz) and 8 GB of RAM.

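# Locate the example files shipped with the package
path <- system.file('extdata', package = 'decode')
geneSetInputFile <- file.path(path, "geneSet.txt")
geneExpressionFile <- file.path(path, "Expression_data_1400genes.txt")
# Run the full analysis (about 16 minutes on the machine above)
runDecode(geneSetInputFile, geneExpressionFile)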
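Because the full run takes several minutes, it can help to confirm that both input files are present before starting. The sketch below uses a hypothetical helper, `check_decode_inputs`, that is not part of decode; it assumes the expression file has a single header line and the gene set file holds one set per line with no header.

```r
# Hypothetical pre-flight check (not part of decode): stops if an input file
# is missing, otherwise returns the number of probes and gene sets.
check_decode_inputs <- function(geneSetInputFile, geneExpressionFile) {
  files <- c(geneSetInputFile, geneExpressionFile)
  missing <- files[!file.exists(files)]
  if (length(missing) > 0) {
    stop("Input file(s) not found: ", paste(missing, collapse = ", "))
  }
  # ASSUMPTION: one header line in the expression file, none in the gene set file
  n_probes <- length(readLines(geneExpressionFile)) - 1L
  n_sets <- length(readLines(geneSetInputFile))
  c(probes = n_probes, gene_sets = n_sets)
}

# Usage with the Example 1 variables, timing the run with base R:
# check_decode_inputs(geneSetInputFile, geneExpressionFile)
# system.time(runDecode(geneSetInputFile, geneExpressionFile))
```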

Example 2:

This example uses a smaller gene expression data set of 50 genes so that it satisfies CRAN's submission requirements. The run completes and writes out_summary.txt, but no results are expected from this small data set.

# Locate the example files shipped with the package
path <- system.file('extdata', package = 'decode')
geneSetInputFile <- file.path(path, "geneSet.txt")
geneExpressionFile <- file.path(path, "Expression_data_50genes.txt")
runDecode(geneSetInputFile, geneExpressionFile)
## [1] "Reading gene expression data..."
## [1] "Calculating t-statistics..."
## [1] "Calculating pairwise correlation for normal states..."
## [1] "Calculating pairwise correlation for disease states..."
## [1] "Calculating differential co-expression measures ..."
## [1] "Reading functional gene set data"
## Warning in strsplit(oneLine, "\t"): unable to translate 'Loss of proteins
## required for interphase microtubule organization<e5><ca>from the centrosome
## REACTOME\REACT_15451.2 TUBA1A CDK5RAP2 DYNC1H1 PPP2R1A CDK1 HSP90AA1 CEP41 PLK4
## NEDD1 PLK1 OFD1 NDE1 CEP250 DYNLL1 ACTR1A CEP72 CEP78 CCP110 CEP76 CEP192
## CEP13...' to a wide string
## Warning in strsplit(oneLine, "\t"): input string 1 is invalid
## [1] "Identifying optimal thresholds for genes"
## [1] "Gene id: 1"
## [1] "Gene id: 2"
## ... (progress messages for Gene id 3 to 49 omitted)
## [1] "Gene id: 50"
## [1] "Identifying best associated functional gene set for each gene..."
## [1] "Gene id: 1"
## [1] "Gene id: 2"
## ... (progress messages for Gene id 3 to 49 omitted)
## [1] "Gene id: 50"
## [1] "Processing raw results..."
## [1] "Summarizing functional gene set results..."
## [1] "Done. Result is saved in out_summary.txt"
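The log reports that the result is saved in out_summary.txt, presumably relative to the current working directory. A minimal way to load it back into R is sketched below; `read_decode_summary` is a hypothetical helper, and the tab-delimited layout with a header row is an assumption, so inspect the file before relying on its column names.

```r
# Hypothetical reader (not part of decode). ASSUMPTION: the summary is a
# tab-delimited text file with a header row.
read_decode_summary <- function(file = "out_summary.txt") {
  if (!file.exists(file)) {
    message("No summary file found at ", file)
    return(NULL)
  }
  read.delim(file, check.names = FALSE, stringsAsFactors = FALSE)
}

# Usage after runDecode() has finished:
summary_tab <- read_decode_summary()
if (!is.null(summary_tab)) head(summary_tab)
```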