Title: | Transcript Time Course Analysis |
---|---|
Description: | The analysis of microarray time series promises a deeper insight into the dynamics of the cellular response following stimulation. A common observation in this type of data is that some genes respond with quick, transient dynamics, while other genes change their expression slowly over time. Existing methods for detecting significant expression dynamics often fail when the expression dynamics are highly heterogeneous. Moreover, these methods often cannot cope with irregular and sparse measurements. The method proposed here is specifically designed for the analysis of perturbation responses. It combines different scores to capture fast and transient dynamics as well as slow expression changes, and performs well in the presence of low replicate numbers and irregular sampling times. The results are given in the form of tables including links to figures showing the expression dynamics of the respective transcript. These figures make it possible to quickly judge the relevance of a detection, to identify possible false positives, and to discriminate between early and late changes in gene expression. An extension of the method allows the analysis of the expression dynamics of functional groups of genes, providing a quick overview of the cellular response. The performance of this package was tested on microarray data derived from lung cancer cells stimulated with epidermal growth factor (EGF). Paper: Albrecht, Marco, et al. (2017) <DOI:10.1186/s12859-016-1440-8>. |
Authors: | Marco Albrecht |
Maintainer: | Marco Albrecht <[email protected]> |
License: | EUPL |
Version: | 0.1.1 |
Built: | 2024-11-09 06:22:49 UTC |
Source: | CRAN |
Links probeset ID with gene_name
data("annot")
A data frame with 5000 observations on the following 2 variables.
probeset_id
a character vector
gene_name
a character vector
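A minimal sketch of how a two-column table of this shape can be used to translate probeset IDs into gene names. The data frame `annot_toy`, its IDs, and its gene names are invented stand-ins for illustration; only the column names `probeset_id` and `gene_name` come from the description above.

```r
# Toy stand-in for the annot data frame; IDs and gene names are invented.
annot_toy <- data.frame(
  probeset_id = c("id_001", "id_002", "id_003"),
  gene_name   = c("GENE_A", "GENE_B", "GENE_C"),
  stringsAsFactors = FALSE
)

# Hypothetical probeset IDs for which gene names are wanted.
hits <- c("id_003", "id_001")

# match() returns the row position of each hit in the annotation table,
# so the gene names come back in the same order as 'hits'.
gene_for_hits <- annot_toy$gene_name[match(hits, annot_toy$probeset_id)]
```

IDs that are missing from the table would yield `NA`, which makes unmatched probesets easy to spot.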
TTCA: An R Package for the identification of differentially expressed genes in time course microarray data
data(annot)
Allows the result table to be linked with further characteristics.
data("annotation")
A data frame with 5000 observations on the following 2 variables.
GeneOntology
a character vector
Property
a character vector
TTCA: An R Package for the identification of differentially expressed genes in time course microarray data
data(annotation)
The second time course representing the control. Lung cancer cell line.
data("Control")
A data frame with 5000 observations on the following 11 variables.
B_C_2_0
a numeric vector
B_C_3_0
a numeric vector
B_C_2_0.5
a numeric vector
B_C_1_1
a numeric vector
B_C_3_4
a numeric vector
B_C_2_6
a numeric vector
B_C_1_24
a numeric vector
B_C_3_24
a numeric vector
B_C_1_48
a numeric vector
B_C_2_48
a numeric vector
B_C_3_48
a numeric vector
TTCA: An R Package for the identification of differentially expressed genes in time course microarray data
data(Control)
The first time course representing the sample with epidermal growth factor stimulation. Lung cancer cell line.
data("EGF")
A data frame with 5000 observations on the following 16 variables.
B_C_2_0
a numeric vector
B_C_3_0
a numeric vector
B_EGF_1_0.5
a numeric vector
B_EGF_3_0.5
a numeric vector
B_EGF_1_1
a numeric vector
B_EGF_3_2
a numeric vector
B_EGF_1_4
a numeric vector
B_EGF_1_6
a numeric vector
B_EGF_1_8
a numeric vector
B_EGF_2_12
a numeric vector
B_EGF_2_18
a numeric vector
B_EGF_1_24
a numeric vector
B_EGF_3_24
a numeric vector
B_EGF_1_48
a numeric vector
B_EGF_2_48
a numeric vector
B_EGF_3_48
a numeric vector
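The column names listed above appear to end in the sampling time in hours, and the resulting values agree with the `EGF.time` vector used in the package example. The sketch below recovers the time vector from the names under that assumption; the helper `time_from_names` is hypothetical and not part of the package.

```r
# Column names copied from the EGF data set listed above.
egf_cols <- c("B_C_2_0", "B_C_3_0", "B_EGF_1_0.5", "B_EGF_3_0.5", "B_EGF_1_1",
              "B_EGF_3_2", "B_EGF_1_4", "B_EGF_1_6", "B_EGF_1_8", "B_EGF_2_12",
              "B_EGF_2_18", "B_EGF_1_24", "B_EGF_3_24", "B_EGF_1_48",
              "B_EGF_2_48", "B_EGF_3_48")

# Assumption: everything after the last underscore is the time point in hours.
# The greedy pattern "^.*_" removes the name up to and including that underscore.
time_from_names <- function(cols) as.numeric(sub("^.*_", "", cols))

EGF.time <- time_from_names(egf_cols)
```

The same helper can be applied to `colnames(Control)`. Writing the vector by hand, as in the package example, remains the safer choice when the naming scheme is not guaranteed.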
TTCA: An R Package for the identification of differentially expressed genes in time course microarray data
data(EGF)
Background: The analysis of microarray time series promises a deeper insight into the dynamics of the cellular response following stimulation. A common observation in this type of data is that some genes respond with quick, transient dynamics, while other genes change their expression slowly over time. Existing methods for detecting significant expression dynamics often fail when the expression dynamics are highly heterogeneous. Moreover, these methods often cannot cope with irregular and sparse measurements. Results: The method proposed here is specifically designed for the analysis of perturbation responses. It combines different scores to capture fast and transient dynamics as well as slow expression changes, and performs well in the presence of low replicate numbers and irregular sampling times. The results are given in the form of tables including links to figures showing the expression dynamics of the respective transcript. These figures make it possible to quickly judge the relevance of a detection, to identify possible false positives, and to discriminate between early and late changes in gene expression. An extension of the method allows the analysis of the expression dynamics of functional groups of genes, providing a quick overview of the cellular response. The performance of this package was tested on microarray data derived from lung cancer cells stimulated with epidermal growth factor (EGF). Paper: Albrecht, Marco, et al. (2017) <DOI:10.1186/s12859-016-1440-8>.
TTCA(grp1, grp1.time, grp2, grp2.time, lambda = 0.6, annot = NA, annotation = "annotation", timeInt = NULL, pVal = 0.05, codetest = FALSE, file = getwd(), MaxPics = 10000, Stimulus1 = "", Stimulus2 = "", S = "gene", mapGO = "", PeakMode = "norm")
grp1 |
Data set with longitudinal sampled data (data.frame) |
grp1.time |
Time points for data set 1 (a vector such as c(0,0,0.5,1,2,4,6,8,12,12)) |
grp2 |
Data set with longitudinal sampled data for comparison (data.frame) |
grp2.time |
Time points for data set 2 (a vector such as c(0,0,0.5,3,2,4,6,8,12,12,24)) |
lambda |
Smoothing parameter in the penalty term of the quantile regression (default: lambda=0.6). Adjust it if the fit is too strict or too flexible. |
annot |
Annotation for pictures and results (data.frame with 2 columns: ID and gene name). (default: annot=NA) |
annotation |
Merges the TTCA result by row name with a table of your choice. Example: annotation<-annotation[,c("probeset_id", "gene_name","transkript_id","GO_BP","GO_CC","GO_mf")] (default: annotation="annotation") |
timeInt |
Defines the early, middle and late time periods. For example, timeInt<-c(4,12) places the middle time period between 4 h and 12 h. (default: timeInt=NULL) |
pVal |
P-value for the local hypothesis test (default: 0.05). |
codetest |
Reduces the data set to 200 features for a quick run of the program. (default: codetest=FALSE) |
file |
Result folder will be saved at this location (default: file=getwd() ). |
MaxPics |
Limits the number of plots (default: MaxPics=10000) |
Stimulus1 |
Searches PubMed for this term together with the gene name. Example: Stimulus1="Insulin+like+growth+factor" (default: Stimulus1="") |
Stimulus2 |
Searches PubMed for this term together with the gene name. Example: Stimulus2="epidermal+growth+factor" (default: Stimulus2="") |
S |
Defines the mode. S="GO" switches the program to gene ontology mode (default: S="gene") |
mapGO |
Link genes to Gene Ontology terms (default: mapGO="") |
PeakMode |
PeakMode "norm" uses the variance between replicates. If set to any other character value, a normal hypothesis test is conducted (default: PeakMode="norm") |
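The following sketch illustrates two of the arguments above on toy data: the time vector is expected to describe the sample columns, and `timeInt` splits the time axis into early, middle and late periods. It rests on two assumptions that the documentation does not confirm: that each time point corresponds to exactly one column of the data frame, and that the boundaries 4 h and 12 h belong to the middle period. The objects `grp1_toy` and `period` are invented for illustration and are not created by the package.

```r
# Time points as used for the EGF series in the package example.
grp1.time <- c(0, 0, 0.5, 0.5, 1, 2, 4, 6, 8, 12, 18, 24, 24, 48, 48, 48)
timeInt   <- c(4, 12)

# Toy expression data: 3 features x 16 samples with random values.
set.seed(1)
grp1_toy <- as.data.frame(matrix(rnorm(3 * length(grp1.time)), nrow = 3))

# Assumption: one entry of the time vector per sample column.
stopifnot(length(grp1.time) == ncol(grp1_toy))

# Assumption: both boundaries are counted as part of the middle period.
period <- ifelse(grp1.time < timeInt[1], "early",
          ifelse(grp1.time <= timeInt[2], "middle", "late"))

# Number of samples that fall into each period.
table(period)
```

Checking the vector length against `ncol()` before calling `TTCA` is a cheap way to catch a mistyped time vector.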
The package has not been applied to Hi-Seq data yet. The problem is the large variability in the read counts. An additional transformation of normalized Hi-Seq data might be an option to scale the values into a fixed range such as 0 to 1 (simple idea: Datalog<-log(data, base = max(data))). This has not been tested. If you are interested in adapting the package to sequencing data, feel free to contact the maintainer.
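As a small worked example of the untested transformation idea mentioned above, the sketch below applies `log(data, base = max(data))` to invented toy counts. It only shows the arithmetic and is not a validated preprocessing step for sequencing data.

```r
# Invented toy values standing in for normalized read counts.
counts <- c(1, 10, 100, 1000)

# Logarithm to the base of the largest value: the maximum maps to 1
# and a count of 1 maps to 0.
Datalog <- log(counts, base = max(counts))

# Caveat: counts below 1 become negative and zeros become -Inf,
# so the result lies in [0, 1] only if all counts are at least 1.
below_one <- log(0.5, base = max(counts))
zero      <- log(0,   base = max(counts))
```

For these toy values the result is 0, 1/3, 2/3 and 1. Zero counts would therefore need separate handling, for example a pseudocount, before such a transformation could be used.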
The R package returns a table with different significance values, rankings and p-values. Moreover, it plots the most important time courses and quality control images.
## Not run:
##########################################
#### Gene-ANALYSE
##########################################
require(quantreg); require(VennDiagram); require(tcltk2); require(tcltk)
require(RISmed); require(Matrix)
data(EGF, Control, annot, annotation)
S = "gene"
Control.time <- c(0,0,0.5,1,4,6,24,24,48,48,48)
EGF.time <- c(0,0,0.5,0.5,1,2,4,6,8,12,18,24,24,48,48,48)
file = paste0(getwd(), "/TTCA_Gene")
dir.create(file)
######
TTCAresult <- TTCA(grp1=EGF, grp1.time=EGF.time, grp2=Control, grp2.time=Control.time, S="gene",
                   lambda=0.6, annot=annot, annotation=annotation, pVal=0.05, codetest=FALSE,
                   file=file, Stimulus1="epidermal+growth+factor", timeInt=c(4,12), MaxPics=10000)
## End(Not run)

## Not run:
##########################################
#### GO-ANALYSE
##########################################
require(quantreg); require(VennDiagram); require(tcltk2); require(tcltk)
require(RISmed); require(Matrix)
#source("https://bioconductor.org/biocLite.R")
#biocLite("biomaRt")
library(biomaRt)
data(EGF, Control, annot, annotation)
require(biomaRt)
ensembl <- useMart("ENSEMBL_MART_ENSEMBL", dataset="hsapiens_gene_ensembl")
mapGO <- getBM(attributes=c("go_id","name_1006",'affy_hugene_2_0_st_v1'),
               filters='affy_hugene_2_0_st_v1', values=rownames(annot), mart=ensembl)
colnames(mapGO) <- c("go_id","GO_name","probeset_id")
S = "GO"
Control.time <- c(0,0,0.5,1,4,6,24,24,48,48,48)
EGF.time <- c(0,0,0.5,0.5,1,2,4,6,8,12,18,24,24,48,48,48)
file = paste0(getwd(), "/TTCA_GO")
dir.create(file)
TTCAresult <- TTCA(grp1=EGF, grp1.time=EGF.time, grp2=Control, grp2.time=Control.time, S="GO",
                   pVal=0.05, lambda=0.6, codetest=FALSE, file=file,
                   Stimulus1="epidermal+growth+factor", timeInt=c(4,12), MaxPics=10000, mapGO=mapGO)
## End(Not run)