--- title: "Differential Co-Expression and Differential Expression Analysis" author: "Thomas Lui" date: "Sunday, July 12, 2015" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{DECODE } %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- ### Package description Integrated differential expression (DE) and differential co-expression (DC) analysis on gene expression data based on DECODE (**D**iffer**E**ntial **CO**-expression and **D**ifferential **E**xpression) algorithm. Given a set of gene expression data and functional gene set data, the program will return a table summary of the selected gene sets with high differential co-expression and high differential expression (HDC-HDE). ### Reference Lui, TWH, Tsui, NBY, Chan, LWC, SP Siu, PM, Wong, C, Yung, BYM. (2015) DECODE: an integrated differential co-expression and differential expression analysis of gene expression data. BMC Bioinformatics, May 31;16:182. http://www.biomedcentral.com/1471-2105/16/182?fmt_math=yes&fmt_math_check=on ### Input gene expression data format **Data format:** * Columns: + Columns are tab separated + Column 1: Official gene symbol + Column 2: Probe ID + Starting from column 3: Expression for different samples * Rows: + Row 1 (starting from column 3): Sample class ("1" indicates control group; "2" indicates case group) + Row 2: Sample id + Starting from row 3: Expression for different genes * An example for data format: geneName | probeID | 2 | 2 | 2 | 1 | 1 | 1 -------- | ------- | - | - | - | - | - | - - | - | Case Sample 1 | Case Sample 2 | Case Sample 3 | Control Sample 1 | Control Sample 2 | Control Sample 3 7A5 | ILMN_1762337 | 5.12621 | 5.19419 | 5.06645 | 5.40649 | 5.51259 | 5.38700 A1BG | ILMN_2055271 | 5.63504 | 5.68533 | 5.66251 | 5.37466 | 5.43955 | 5.50973 A1CF | ILMN_2383229 | 5.41543 | 5.58543 | 5.43239 | 5.49634 | 5.62685 | 5.36962 A26C3 | ILMN_1653355 | 5.56713 | 5.55470 | 5.59547 | 5.46895 | 5.49622 | 5.50094 A2BP1 | ILMN_1814316 | 5.23016 | 5.33808 | 5.31413 | 5.30586 | 5.40108 | 5.31855 A2M | ILMN_1745607 | 7.65332 | 6.56431 | 8.20163 | 9.19837 | 9.04295 | 10.1448 A2ML1 | ILMN_2136495 | 5.53532 | 5.93801 | 5.33728 | 5.36676 | 5.79942 | 5.13974 A3GALT2 | ILMN_1668111 | 5.18578 | 5.35207 | 5.30554 | 5.26107 | 5.26536 | 5.28932 A4GALT | ILMN_1735045 | 6.34751 | 5.56750 | 6.92335 | 7.49523 | 7.12119 | 6.54748 A4GNT | ILMN_1680754 | 5.26417 | 5.28596 | 5.27560 | 5.28830 | 5.08440 | 5.44869 ### Input gene set data format **Data format:** * Rows: + Each row consists of a gene set (including gene set names, id, and genes) * Columns: + Columns are tab separated + Column 1: Name of gene set + Column 2: Gene set ID (e.g. GO ID) + Starting from column 3: Genes (using official gene symbols) in the gene set * An example for data format: Column 1 | Column 2 | Column 3, 4, ... -------- | ------- | - positive regulation of epidermal growth factor-activated receptor activity | GO 0045741 | EREG FBXW7 EPGN ADAM17 ADRA2C ADRA2A TGFA EGF pyrimidine-containing compound salvage | GO 0008655 | UPP1 TYMP TK1 UPP2 UCKL1 CDA TK2 UCK1 DCK ### To load the package: ```{r} library(decode) ``` ### Example 1: Running a larger set of gene expression data with 1400 genes. It will take ~16 minutes to complete. (Computer used: An Intel Core i7-4600 processor, 2.69 GHz, 8 GB RAM) ``` path = system.file('extdata', package='decode') geneSetInputFile = file.path(path, "geneSet.txt") geneExpressionFile = file.path(path, "Expression_data_1400genes.txt") runDecode(geneSetInputFile, geneExpressionFile) ``` ### Example 2: A smaller set of gene expression data with 50 genes to satisfy CRAN's submission requirement. (No results will be generated) ```{r} path = system.file('extdata', package='decode') geneSetInputFile = file.path(path, "geneSet.txt") geneExpressionFile = file.path(path, "Expression_data_50genes.txt") runDecode(geneSetInputFile, geneExpressionFile) ```