Title: | E-Value in the Omics Data Association Studies |
---|---|
Description: | In omics data association studies, it is common to apply p-value corrections to control false significance. Beyond p-value corrections, e-values have recently been studied as a tool for multiple testing correction, based on V. Vovk and R. Wang (2021) <doi:10.1214/20-AOS2020>. This package provides e-value calculation for DNA methylation data and RNA-seq data. Currently, five data formats are supported: DNA methylation levels from DMR detection tools (BiSeq, DMRfinder, methylKit, metilene and other DNA methylation tools) and RNA-seq data. The relevant references are: Katja Hebestreit and Hans-Ulrich Klein (2022) <doi:10.18129/B9.bioc.BiSeq>; Altuna Akalin et al. (2012) <doi:10.18129/B9.bioc.methylKit>. |
Authors: | Yifan Yang [aut, cre, cph], Xiaoqing Pan [aut], Haoyuan Liu [aut] |
Maintainer: | Yifan Yang <[email protected]> |
License: | Apache License (>= 2) |
Version: | 0.2.4 |
Built: | 2024-11-12 06:51:48 UTC |
Source: | CRAN |
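An e-value is a nonnegative statistic whose expectation under the null hypothesis is at most 1. The base R sketch below illustrates three facts from Vovk and Wang (2021):

- A p-value can be turned into an e-value with a calibrator such as kappa * p^(kappa - 1), where 0 < kappa < 1.
- By Markov's inequality, an e-value e can be turned back into a valid p-value min(1, 1/e).
- The arithmetic mean of e-values is again an e-value.

The helpers p_to_e() and e_to_p() are hypothetical and not part of metevalue. The package's own e-value computation may differ.

```r
# Hypothetical helpers illustrating e-value calibration (Vovk & Wang, 2021).
# Not part of metevalue.

# p-to-e calibrator f(p) = kappa * p^(kappa - 1), valid for 0 < kappa < 1:
# it integrates to 1 over [0, 1], so E[f(P)] <= 1 when P is a valid p-value.
p_to_e <- function(p, kappa = 0.5) {
  stopifnot(kappa > 0, kappa < 1, all(p >= 0 & p <= 1))
  kappa * p^(kappa - 1)
}

# e-to-p conversion: by Markov's inequality, P(E >= 1/alpha) <= alpha,
# so min(1, 1/e) is a valid p-value.
e_to_p <- function(e) {
  stopifnot(all(e >= 0))
  pmin(1, 1 / e)
}

p <- c(0.0001, 0.01, 0.25, 1)
e <- p_to_e(p)   # approximately 50, 5, 1, 0.5 with kappa = 0.5
e_to_p(e)        # approximately 0.02, 0.2, 1, 1

# The arithmetic mean of e-values is again an e-value,
# which gives a simple way to merge evidence from several tests.
mean(e)
```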
Dummy BiSeq output data, for illustration purposes.
- seqnames: Chromosome
- start: The positions of the start sites of the corresponding region
- end: The positions of the end sites of the corresponding region
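- width: The number of CpG sites within the corresponding region
- strand: Strand
- median.p: The median p-value among CpG sites within the corresponding region
- median.meth.group1: The median methylation rate in the first group among CpG sites within the corresponding region
- median.meth.group2: The median methylation rate in the second group among CpG sites within the corresponding region
- median.meth.diff: The median methylation difference between groups among CpG sites within the corresponding region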
Note that the feature columns contain "NaN" values.
Please check the vignette "metevalue" for details.
Dummy BiSeq methylation rate data, for illustration purposes.
The data includes 12 columns:
- chr: Chromosome (character)
- pos: Position (integer)
- g1~g2: methylation rates for groups g1 and g2, with five replicate columns per group. Note that the feature columns contain "NaN" values.
Please check the vignette "metevalue" for details.
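The dummy methylation rate data contain "NaN" values, so it can help to inspect them before analysis. The base R sketch below does the following:

- Builds a small hypothetical table in the same layout: chr, pos, then five replicate columns for each of groups g1 and g2, 12 columns in total.
- Counts the missing values per value column.
- Drops the incomplete rows.

The column names g1_1 to g2_5 are made up for illustration, and the real dataset's names may differ. metevalue may also handle missing values differently internally.

```r
set.seed(1)
n_sites <- 4
# Hypothetical methylation-rate table: chr, pos, then 5 replicates per group
rates <- matrix(runif(n_sites * 10), nrow = n_sites,
                dimnames = list(NULL, c(paste0("g1_", 1:5), paste0("g2_", 1:5))))
methyrate <- data.frame(chr = "chr1",
                        pos = c(100L, 150L, 210L, 300L),
                        rates)
methyrate$g2_3[2] <- NaN   # simulate a missing measurement

ncol(methyrate)                         # 12 columns
colSums(is.na(methyrate[, -(1:2)]))     # NaN count per value column
# complete.cases() treats NaN as missing, so the site at pos 150 is dropped
clean <- methyrate[complete.cases(methyrate), ]
nrow(clean)                             # 3 complete sites
```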
Dummy output data for the "RNA" method, for illustration purposes.
The data contains 8166 rows and 7 columns, one column per sample of the pasilla dataset. Each value is the log of the size-factor-normalized count, as produced by the script below:
- treated1fb
- treated2fb
- treated3fb
- untreated1fb
- untreated2fb
- untreated3fb
- untreated4fb
Please check the vignette "metevalue" for details.
# Script used to generate this dataset (requires the Bioconductor packages pasilla and DESeq2)
library("pasilla")
pasCts <- system.file("extdata", "pasilla_gene_counts.tsv",
                      package = "pasilla", mustWork = TRUE)
pasAnno <- system.file("extdata", "pasilla_sample_annotation.csv",
                       package = "pasilla", mustWork = TRUE)
cts <- as.matrix(read.csv(pasCts, sep = "\t", row.names = "gene_id"))
coldata <- read.csv(pasAnno, row.names = 1)
coldata <- coldata[, c("condition", "type")]
coldata$condition <- factor(coldata$condition)
coldata$type <- factor(coldata$type)
library("DESeq2")
# Sample names in the annotation carry an "fb" suffix; align the count columns with them
colnames(cts) <- paste0(colnames(cts), "fb")
cts <- cts[, rownames(coldata)]
dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = coldata,
                              design = ~ condition)
dds <- DESeq(dds)
# Normalize by size factors, keep genes with a normalized count > 5 in >= 80% of samples, then log-transform
dat <- t(t(cts) / dds$sizeFactor)
dat.out <- dat[rowSums(dat > 5) >= 0.8 * ncol(dat), ]
demo_desq_out <- log(dat.out)
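The filtering step in the script above keeps a gene only if its size-factor-normalized count exceeds 5 in at least 80% of the samples. The base R sketch below replays the normalization and the filter on a made-up 4 × 5 count matrix with made-up size factors. In the real script the size factors come from DESeq2. This lets you check the logic without installing pasilla or DESeq2.

```r
# Hypothetical counts: 4 genes x 5 samples (matrix() fills column by column)
cts <- matrix(c(10, 12, 0, 50,
                 8, 20, 3, 40,
                 9, 15, 7, 60,
                11, 18, 1, 55,
                 7, 14, 6, 45),
              nrow = 4,
              dimnames = list(paste0("gene", 1:4), paste0("s", 1:5)))
size_factors <- c(1, 1.2, 0.9, 1.1, 0.8)  # made up; DESeq2 estimates these

# Divide each column (sample) by its size factor
dat <- t(t(cts) / size_factors)
stopifnot(isTRUE(all.equal(dat, sweep(cts, 2, size_factors, "/"))))

# Keep genes whose normalized count is > 5 in at least 80% of samples
keep <- rowSums(dat > 5) >= 0.8 * ncol(dat)
dat.out <- dat[keep, ]
rownames(dat.out)   # gene3 passes in only 2 of 5 samples and is dropped

demo_like <- log(dat.out)
```

A kept gene may still have a zero count in up to 20% of samples, and log(0) is -Inf. Whether that affects the e-value calculation depends on how metevalue handles non-finite values, which is not documented here.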
Dummy DMRfinder output data, for illustration purposes.
The data includes 6 columns:
- chr: Chromosome (character)
- pos: Position (integer)
- g1~g2: methylation rates for groups g1 and g2, with two replicate columns per group. Note that the feature columns contain "NaN" values.
Please check the vignette "metevalue" for details.
Dummy BiSeq methylation rate data, for illustration purposes.
The data includes 6 columns:
- chr: Chromosome (character)
- pos: Position (integer)
- g1~g2: methylation rates for groups g1 and g2, with two replicate columns per group. Note that the feature columns contain "NaN" values.
Please check the vignette "metevalue" for details.
This methylation rate dataset samples the "myCpG" data from methylKit (a Bioconductor package), for illustration purposes.
The data includes 7 columns:
- chr: Chromosome
- start: The positions of the start sites of the corresponding region
- end: The positions of the end sites of the corresponding region
- strand: Strand
- pvalue: The adjusted p-value based on BH method in MWU-test
- qvalue: cutoff for qvalue of differential methylation statistic
- methyl.diff: The difference between the group means of methylation level
Please check the vignette "metevalue" for details.
Akalin, Altuna, et al. "methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles." Genome biology 13.10 (2012): 1-9. doi:10.1186/gb-2012-13-10-r87
This methylation rate dataset samples the "myCpG" data from methylKit (a Bioconductor package), for illustration purposes.
The data includes 6 columns:
- chr: Chromosome (character)
- pos: Position (integer)
- g1~g2: methylation rates for groups g1 and g2 (4 columns in total)
Please check the vignette "metevalue" for details.
Akalin, Altuna, et al. "methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles." Genome biology 13.10 (2012): 1-9. doi:10.1186/gb-2012-13-10-r87
Dummy methylation rate data for metilene, for illustration purposes.
The data includes 18 columns:
- chr: Chromosome (character)
- pos: Position (integer)
- g1~g2: methylation rates for groups g1 and g2.
Note that the feature columns contain "NaN" values.
Please check the vignette "metevalue" for details.
Dummy output data for the "metilene" method, for illustration purposes.
The data includes 10 columns:
- V1: Chromosome (character)
- V2: The positions of the start sites of the corresponding region
- V3: The positions of the end sites of the corresponding region
- V4 to V10: data values.
Please check the vignette "metevalue" for details.
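The generic column names V1 to V10 are what read.table() assigns when a file is read with header = FALSE. The base R sketch below reads one headerless metilene-style line. It then attaches names following the column order documented for metilene.output: chr, start, end, q-value, methyl.diff, CpGs, p, p2, m1, m2.

The record is invented. The name "q.value" replaces "q-value" only so that it is a syntactic R name.

```r
# One invented metilene-style record (tab-separated, no header line)
line <- "chr1\t100\t400\t0.01\t0.35\t12\t0.001\t0.002\t0.85\t0.5"
out <- read.table(text = line, sep = "\t", header = FALSE)
names(out)   # V1 to V10, as in the dummy dataset

# Column order as documented for metilene.output
names(out) <- c("chr", "start", "end", "q.value", "methyl.diff",
                "CpGs", "p", "p2", "m1", "m2")
str(out)
```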
Built-in data processing function
evalue_buildin_sql(a, b, method = "metilene")
- a: data frame of the methylation rates
- b: data frame of the tool output corresponding to the "method" option
- method: one of "metilene", "biseq", "DMRfinder" or "methylKit"
Returns a data frame that combines data frames a and b according to the "method" option.
data("demo_metilene_out")
data("demo_metilene_input")
result = evalue_buildin_var_fmt_nm(demo_metilene_input, demo_metilene_out, method = "metilene")
result_sql = evalue_buildin_sql(result$a, result$b, method = "metilene")
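evalue_buildin_sql() combines the per-site methylation rates (a) with the region-level tool output (b). Its exact join rule is not documented here.

The base R sketch below shows one plausible interpretation: a site belongs to a region when the chromosomes match and start <= pos <= end. All data are invented, and the code is not the package's implementation.

```r
# Hypothetical per-site methylation rates (table a)
a <- data.frame(chr = c("chr1", "chr1", "chr1", "chr2"),
                pos = c(100L, 150L, 400L, 120L),
                g1  = c(0.80, 0.70, 0.20, 0.50),
                g2  = c(0.30, 0.20, 0.25, 0.50))
# Hypothetical regions reported by a DMR tool (table b)
b <- data.frame(chr = c("chr1", "chr2"),
                start = c(90L, 100L),
                end = c(200L, 130L))

# Pair every site with every region on the same chromosome,
# then keep the pairs where the site lies inside the region (inclusive)
a_b <- merge(a, b, by = "chr")
a_b <- a_b[a_b$pos >= a_b$start & a_b$pos <= a_b$end, ]
a_b   # 3 rows: chr1 sites 100 and 150, and chr2 site 120
```

For genome-scale data, a per-chromosome cross join can become large. Interval-overlap tools such as Bioconductor's GenomicRanges::findOverlaps() are the usual alternative.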
Built-in file format check function. Performs the format check and data cleaning for the "metilene", "biseq", "DMRfinder" or "methylKit" method, as selected.
evalue_buildin_var_fmt_nm(a, b, method = "metilene")
- a: data frame of the methylation rates
- b: data frame of the tool output corresponding to the "method" option
- method: one of "metilene", "biseq", "DMRfinder" or "methylKit"
Returns list(a, b), containing the cleaned data frames.
data("demo_metilene_out")
data("demo_metilene_input")
evalue_buildin_var_fmt_nm(demo_metilene_input, demo_metilene_out, method = "metilene")
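The documentation above does not spell out what the format check and data cleaning involve. As a hedged illustration only, the hypothetical function check_methyrate() below shows the kind of checks one might apply to a methylation rate table:

- It requires two key columns plus at least two value columns.
- It normalizes the key column names to chr and pos.
- It coerces positions to integers and rates to numbers, keeping "NaN" as NaN.

It is not evalue_buildin_var_fmt_nm().

```r
# Hypothetical helper, not part of metevalue
check_methyrate <- function(df) {
  if (ncol(df) < 4) {
    stop("expected chr, pos and at least two value columns")
  }
  names(df)[1:2] <- c("chr", "pos")
  df$chr <- as.character(df$chr)
  # as.character() first, so factor columns are converted by label, not code
  df$pos <- as.integer(as.character(df$pos))
  if (anyNA(df$pos)) stop("non-integer positions found")
  value_cols <- names(df)[-(1:2)]
  # Leave numeric columns untouched; parse text columns ("NaN" becomes NaN)
  df[value_cols] <- lapply(df[value_cols], function(x) {
    if (is.numeric(x)) x else as.numeric(as.character(x))
  })
  df
}

raw <- data.frame(V1 = "chr1", V2 = c("10", "20"),
                  g1 = c("0.1", "NaN"), g2 = c(0.3, 0.4))
clean <- check_methyrate(raw)
str(clean)
```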
Please check the vignette "metevalue" for details.
metevalue.biseq( methyrate, BiSeq.output, adjust.methods = "BH", sep = "\t", bheader = FALSE )
- methyrate: the methylation rate file. The columns are (in order):
  - chr: Chromosome
  - pos: Position (integer)
  - g1~g2: methylation rate data in groups
- BiSeq.output: the output file of BiSeq. The columns are (in order):
  - seqnames: Chromosome
  - start: The positions of the start sites of the corresponding region
  - end: The positions of the end sites of the corresponding region
  - width: The number of CpG sites within the corresponding region
  - strand: Strand
  - median.p: The median p-value among CpG sites within the corresponding region
  - median.meth.group1: The median methylation rate in the first group among CpG sites within the corresponding region
  - median.meth.group2: The median methylation rate in the second group among CpG sites within the corresponding region
  - median.meth.diff: The median methylation difference between groups among CpG sites within the corresponding region
- adjust.methods: the method used to adjust the e-values; one of 'bonferroni', 'hochberg', 'holm', 'hommel', 'BH' (the default) or 'BY'
- sep: the field separator; the default is a TAB character
- bheader: a logical value indicating whether the BiSeq.output file contains the names of the variables as its first line. By default, bheader = FALSE.
Returns a data frame with the following columns (in order):
- chr: Chromosome
- start: The positions of the start sites of the corresponding region
- end: The positions of the end sites of the corresponding region
- q-value: The adjusted p-value based on BH method in MWU-test
- methyl.diff: The difference between the group means of methylation level
- CpGs: The number of CpG sites within the corresponding region
- p : p-value based on MWU-test
- p2: p-value based on 2D KS-test
- m1: The absolute mean methylation level for the corresponding segment of group 1
- m2: The absolute mean methylation level for the corresponding segment of group 2
- e_value: The e-value of the corresponding region
data("demo_biseq_methyrate")
data("demo_biseq_DMR")
example_tempfiles = tempfile(c("demo_biseq_methyrate", "demo_biseq_DMR"))
tempdir()
#### write to temp file ####
write.table(demo_biseq_methyrate, file = example_tempfiles[1], row.names = FALSE,
            col.names = TRUE, quote = FALSE, sep = "\t")
write.table(demo_biseq_DMR, file = example_tempfiles[2],
            sep = "\t", row.names = FALSE, col.names = TRUE, quote = FALSE)
#### compute e-value and its adjustment ####
result = metevalue.biseq(example_tempfiles[1],
                         example_tempfiles[2], bheader = TRUE)
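The six adjust.methods values are the method names accepted by base R's p.adjust(). The snippet below compares the methods on a few invented e-values. As a sketch only, it assumes the adjustment is applied on the p-value scale after converting each e-value with p = min(1, 1/e). How metevalue actually applies the adjustment internally is not documented here.

```r
e <- c(40, 25, 8, 2, 0.5)          # invented e-values
p <- pmin(1, 1 / e)                # e-to-p conversion (Markov's inequality)

adj_methods <- c("bonferroni", "hochberg", "holm", "hommel", "BH", "BY")
# One column per method, one row per test
adjusted <- sapply(adj_methods, function(m) p.adjust(p, method = m))
round(adjusted, 3)

# Every method returns adjusted values at least as large as the raw ones
stopifnot(all(adjusted >= p - 1e-12))
```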
Check the BiSeq data format
metevalue.biseq.chk( input_filename_a, input_filename_b, sep = "\t", bheader = FALSE )
- input_filename_a: path to the methylation rate file, in the metilene input format. This is a sep-separated (e.g. TAB-separated) file with two key columns and several value columns:
  - chr and pos are the keys;
  - g1~g2: methylation rate data in groups.
- input_filename_b: path to the BiSeq output file, stored as a sep-separated (e.g. TAB-separated) file. The columns are (in order):
  - chr: Chromosome
  - start: The position of the start site of the corresponding region
  - end: The position of the end site of the corresponding region
  - range: The range of the corresponding region
  - strand: Strand
  - median.p: The median of p-values in the corresponding region
  - median.meth.group1: The median methylation level for the corresponding segment of group 1
  - median.meth.group2: The median methylation level for the corresponding segment of group 2
  - median.meth.diff: The median of the difference between the methylation levels
- sep: the field separator; the default is a TAB character
- bheader: a logical value indicating whether the input_filename_b file contains the names of the variables as its first line. By default, bheader = FALSE.
Returns list(file_a, file_b, file_a_b): three pre-processed data frames corresponding to input_filename_a, input_filename_b, and the join of the two (A JOIN B).
data("demo_biseq_methyrate")
data("demo_biseq_DMR")
example_tempfiles = tempfile(c("demo_biseq_methyrate", "demo_biseq_DMR"))
tempdir()
write.table(demo_biseq_methyrate, file = example_tempfiles[1], row.names = FALSE,
            col.names = TRUE, quote = FALSE, sep = "\t")
write.table(demo_biseq_DMR, file = example_tempfiles[2],
            sep = "\t", row.names = FALSE, col.names = TRUE, quote = FALSE)
#### check the file formats and pre-process the data ####
result = metevalue.biseq.chk(example_tempfiles[1],
                             example_tempfiles[2], bheader = TRUE)
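The bheader argument says whether the tool-output file starts with a header line. The examples write their temporary files with col.names = TRUE and then pass bheader = TRUE.

The base R sketch below shows what goes wrong when the two do not match: the header line is read as a data row, and numeric columns no longer parse as numbers. It assumes, without confirmation above, that bheader plays the role of read.table()'s header argument.

```r
df <- data.frame(chr = c("chr1", "chr1"),
                 start = c(100L, 500L),
                 end = c(200L, 650L))
f <- tempfile(fileext = ".tsv")
write.table(df, f, sep = "\t", row.names = FALSE,
            col.names = TRUE, quote = FALSE)

with_header <- read.table(f, sep = "\t", header = TRUE)
without_header <- read.table(f, sep = "\t", header = FALSE)

nrow(with_header)         # 2 data rows; start and end are integers
nrow(without_header)      # 3 rows: the header became the first data row
class(without_header$V2)  # not numeric ("character" in R >= 4.0, "factor" before)
unlink(f)
```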
Calculate E-value of the DMRfinder data format
metevalue.DMRfinder( methyrate, DMRfinder.output, adjust.methods = "BH", sep = "\t", bheader = FALSE )
- methyrate: the methylation rate file. The columns are (in order):
  - chr: Chromosome
  - pos: Position (integer)
  - g1~g2: methylation rate data in groups
- DMRfinder.output: the output file of DMRfinder. The columns are (in order):
  - chr: Chromosome
  - start: The positions of the start sites of the corresponding region
  - end: The positions of the end sites of the corresponding region
  - CpG: The number of CpG sites within the corresponding region
  - Control.mu: The average methylation rate in the control group
  - Expt1.mu: The average methylation rate in the experimental group
  - Control.Expt1.diff: The methylation difference between the control and experimental groups
  - Control.Expt1.pval: p-value based on the Wald test
- adjust.methods: the method used to adjust the e-values; one of 'bonferroni', 'hochberg', 'holm', 'hommel', 'BH' (the default) or 'BY'
- sep: the field separator; the default is a TAB character
- bheader: a logical value indicating whether the DMRfinder.output file contains the names of the variables as its first line. By default, bheader = FALSE.
Returns a data frame with the following columns (in order):
- chr: Chromosome
- start: The positions of the start sites of the corresponding region
- end: The positions of the end sites of the corresponding region
- q-value: The adjusted p-value based on BH method in MWU-test
- methyl.diff: The difference between the group means of methylation level
- CpGs: The number of CpG sites within the corresponding region
- p : p-value based on MWU-test
- p2: p-value based on 2D KS-test
- m1: The absolute mean methylation level for the corresponding segment of group 1
- m2: The absolute mean methylation level for the corresponding segment of group 2
- e_value: The e-value of the corresponding region
data(demo_DMRfinder_rate_combine)
data(demo_DMRfinder_DMRs)
example_tempfiles = tempfile(c("rate_combine", "DMRfinder_out"))
tempdir()
write.table(demo_DMRfinder_rate_combine, file = example_tempfiles[1],
            row.names = FALSE, col.names = TRUE, quote = FALSE, sep = "\t")
write.table(demo_DMRfinder_DMRs, file = example_tempfiles[2],
            sep = "\t", row.names = FALSE, col.names = TRUE, quote = FALSE)
result = metevalue.DMRfinder(example_tempfiles[1], example_tempfiles[2],
                             bheader = TRUE)
head(result)
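DMRfinder headers contain characters such as ':', '-' and '>' that are not valid in R names. By default, read.table(header = TRUE) sanitizes names via make.names(), replacing each such character with a dot. Passing check.names = FALSE keeps the names verbatim.

This matters when matching columns by name. The sketch below reads an invented DMRfinder-style record both ways. Which setting metevalue uses internally is not stated above.

```r
record <- c("chr\tstart\tend\tCpG\tControl:mu\tExptl:mu\tControl->Exptl:diff\tControl->Exptl:pval",
            "chr1\t100\t300\t5\t0.8\t0.2\t-0.6\t0.001")

# Default: invalid characters in the header are replaced with dots
sanitized <- read.table(text = record, sep = "\t", header = TRUE)
names(sanitized)[5:8]

# check.names = FALSE keeps the original DMRfinder names
verbatim <- read.table(text = record, sep = "\t", header = TRUE,
                       check.names = FALSE)
names(verbatim)[5:8]
```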
Check the DMRfinder data format
metevalue.DMRfinder.chk( input_filename_a, input_filename_b, sep = "\t", bheader = FALSE )
- input_filename_a: the combined methylation rate file. This is a sep-separated (e.g. TAB-separated) file with two key columns and several value columns. The columns are (in order):
  - chr and pos are the keys;
  - g1~g2: methylation rate data in groups.
- input_filename_b: the output file of DMRfinder. The columns are (in order):
  - chr: Chromosome
  - start: The position of the start site of the corresponding region
  - end: The position of the end site of the corresponding region
  - CpG: The number of CpG sites within the corresponding region
  - 'Control:mu': The absolute mean methylation level for the corresponding segment of the control group
  - 'Exptl:mu': The absolute mean methylation level for the corresponding segment of the experimental group
  - 'Control->Exptl:diff': The difference between the group means of methylation level
  - p: p-value
- sep: the field separator; the default is a TAB character
- bheader: a logical value indicating whether the input_filename_b file contains the names of the variables as its first line. By default, bheader = FALSE.
Returns list(file_a, file_b, file_a_b): three pre-processed data frames corresponding to input_filename_a, input_filename_b, and the join of the two (A JOIN B).
data(demo_DMRfinder_rate_combine)
data(demo_DMRfinder_DMRs)
example_tempfiles = tempfile(c("rate_combine", "DMRfinder_out"))
tempdir()
write.table(demo_DMRfinder_rate_combine, file = example_tempfiles[1],
            row.names = FALSE, col.names = TRUE, quote = FALSE, sep = "\t")
write.table(demo_DMRfinder_DMRs, file = example_tempfiles[2],
            sep = "\t", row.names = FALSE, col.names = TRUE, quote = FALSE)
result = metevalue.DMRfinder.chk(example_tempfiles[1], example_tempfiles[2],
                                 bheader = TRUE)
Calculate E-value of the methylKit data format
metevalue.methylKit( methyrate, methylKit.output, adjust.methods = "BH", sep = "\t", bheader = FALSE )
- methyrate: the methylation rates of each site and group. The columns are (in order):
  - chr: Chromosome
  - pos: Position (integer)
  - g1~g2: methylation rate data in groups
- methylKit.output: the output data of methylKit, one row per region. The columns are (in order):
  - chr: Chromosome
  - start: The positions of the start sites of the corresponding region
  - end: The positions of the end sites of the corresponding region
  - strand: Strand
  - pvalue: The adjusted p-value based on BH method in MWU-test
  - qvalue: cutoff for qvalue of differential methylation statistic
  - methyl.diff: The difference between the group means of methylation level
- adjust.methods: the method used to adjust the e-values; one of 'bonferroni', 'hochberg', 'holm', 'hommel', 'BH' (the default) or 'BY'
- sep: the field separator; the default is a TAB character
- bheader: a logical value indicating whether the methylKit.output file contains the names of the variables as its first line. By default, bheader = FALSE.
Returns a data frame with the following columns (in order):
- chr: Chromosome
- start: The positions of the start sites of the corresponding region
- end: The positions of the end sites of the corresponding region
- q-value: The adjusted p-value based on BH method in MWU-test
- methyl.diff: The difference between the group means of methylation level
- CpGs: The number of CpG sites within the corresponding region
- p : p-value based on MWU-test
- p2: p-value based on 2D KS-test
- m1: The absolute mean methylation level for the corresponding segment of group 1
- m2: The absolute mean methylation level for the corresponding segment of group 2
- e_value: The e-value of the corresponding region
data(demo_methylkit_methyrate)
data(demo_methylkit_met_all)
example_tempfiles = tempfile(c("rate_combine", "methylKit_DMR_raw"))
tempdir()
write.table(demo_methylkit_methyrate, file = example_tempfiles[1],
            row.names = FALSE, col.names = TRUE, quote = FALSE, sep = "\t")
write.table(demo_methylkit_met_all, file = example_tempfiles[2],
            sep = "\t", row.names = FALSE, col.names = TRUE, quote = FALSE)
result = metevalue.methylKit(example_tempfiles[1], example_tempfiles[2],
                             bheader = TRUE)
str(result)
Check the methylKit data format
metevalue.methylKit.chk( input_filename_a, input_filename_b, sep = "\t", bheader = FALSE )
- input_filename_a: the combined methylation rate file. This is a sep-separated (e.g. TAB-separated) file with two key columns and several value columns:
  - chr and pos are the keys;
  - g1~g2: methylation rate data in groups.
- input_filename_b: the output file of methylKit, i.e. a methylDiff or methylDiffDB object containing the differentially methylated locations that satisfy the criteria. The columns are (in order):
  - chr: Chromosome
  - start: The position of the start site of the corresponding region
  - end: The position of the end site of the corresponding region
  - strand: Strand
  - p: p-value
  - qvalue: The adjusted p-value based on the BH method
  - meth.diff: The difference between the group means of methylation level
- sep: the field separator; the default is a TAB character
- bheader: a logical value indicating whether the input_filename_b file contains the names of the variables as its first line. By default, bheader = FALSE.
Returns list(file_a, file_b, file_a_b): three pre-processed data frames corresponding to input_filename_a, input_filename_b, and the join of the two (A JOIN B).
data(demo_methylkit_methyrate)
data(demo_methylkit_met_all)
example_tempfiles = tempfile(c("rate_combine", "methylKit_DMR_raw"))
tempdir()
write.table(demo_methylkit_methyrate, file = example_tempfiles[1],
            row.names = FALSE, col.names = TRUE, quote = FALSE, sep = "\t")
write.table(demo_methylkit_met_all, file = example_tempfiles[2],
            sep = "\t", row.names = FALSE, col.names = TRUE, quote = FALSE)
result = metevalue.methylKit.chk(example_tempfiles[1], example_tempfiles[2],
                                 bheader = TRUE)
Calculate E-value of the Metilene data format
metevalue.metilene( methyrate, metilene.output, adjust.methods = "BH", sep = "\t", bheader = FALSE )
- methyrate: path to the metilene input file. This is a sep-separated (e.g. TAB-separated) file with two key columns and several value columns. The columns are (in order):
  - chr and pos are the keys;
  - g1~g2: methylation rate data in groups.
- metilene.output: path to the metilene output file, stored as a sep-separated (e.g. TAB-separated) file. The columns are (in order):
  - chr: Chromosome
  - start: The positions of the start sites of the corresponding region
  - end: The positions of the end sites of the corresponding region
  - q-value: The adjusted p-value based on BH method in MWU-test
  - methyl.diff: The difference between the group means of methylation level
  - CpGs: The number of CpG sites within the corresponding region
  - p: p-value based on MWU-test
  - p2: p-value based on 2D KS-test
  - m1: The absolute mean methylation level for the corresponding segment of group 1
  - m2: The absolute mean methylation level for the corresponding segment of group 2
- adjust.methods: the method used to adjust the e-values; one of 'bonferroni', 'hochberg', 'holm', 'hommel', 'BH' (the default) or 'BY'
- sep: the field separator; the default is a TAB character
- bheader: a logical value indicating whether the metilene.output file contains the names of the variables as its first line. By default, bheader = FALSE.
Returns a data frame with the following columns (in order):
- chr: Chromosome
- start: The positions of the start sites of the corresponding region
- end: The positions of the end sites of the corresponding region
- q-value: The adjusted p-value based on BH method in MWU-test
- methyl.diff: The difference between the group means of methylation level
- CpGs: The number of CpG sites within the corresponding region
- p : p-value based on MWU-test
- p2: p-value based on 2D KS-test
- m1: The absolute mean methylation level for the corresponding segment of group 1
- m2: The absolute mean methylation level for the corresponding segment of group 2
- e_value: The e-value of the corresponding region
#### metilene example ####
data(demo_metilene_input)
data(demo_metilene_out)
example_tempfiles = tempfile(c("metilene_input", "metilene_out"))
tempdir()
write.table(demo_metilene_input, file = example_tempfiles[1],
            row.names = FALSE, col.names = TRUE, quote = FALSE, sep = "\t")
write.table(demo_metilene_out, file = example_tempfiles[2],
            sep = "\t", row.names = FALSE, col.names = TRUE, quote = FALSE)
result = metevalue.metilene(example_tempfiles[1], example_tempfiles[2],
                            bheader = TRUE)
head(result)
Check the Metilene data format
metevalue.metilene.chk( input_filename_a, input_filename_b, sep = "\t", bheader = FALSE )
- input_filename_a: path to the metilene input file. This is a sep-separated (e.g. TAB-separated) file with two key columns and several value columns. The columns are (in order):
  - chr and pos are the keys;
  - g1~g2: methylation rate data in groups.
- input_filename_b: path to the metilene output file, stored as a sep-separated (e.g. TAB-separated) file. The columns are (in order):
  - chr: Chromosome
  - start: The position of the start site of the corresponding region
  - end: The position of the end site of the corresponding region
  - q-value: The adjusted p-value based on BH method in MWU-test
  - methyl.diff: The difference between the group means of methylation level
  - CpGs: The number of CpG sites within the corresponding region
  - p: p-value based on MWU-test
  - p2: p-value based on 2D KS-test
  - m1: The absolute mean methylation level for the corresponding segment of group 1
  - m2: The absolute mean methylation level for the corresponding segment of group 2
- sep: the field separator; the default is a TAB character
- bheader: a logical value indicating whether the input_filename_b file contains the names of the variables as its first line. By default, bheader = FALSE.
Returns list(file_a, file_b, file_a_b): three pre-processed data frames corresponding to input_filename_a, input_filename_b, and the join of the two (A JOIN B).
data(demo_metilene_input)
data(demo_metilene_out)
example_tempfiles = tempfile(c("metilene_input", "metilene_out"))
tempdir()
write.table(demo_metilene_input, file = example_tempfiles[1],
            row.names = FALSE, col.names = TRUE, quote = FALSE, sep = "\t")
write.table(demo_metilene_out, file = example_tempfiles[2],
            sep = "\t", row.names = FALSE, col.names = TRUE, quote = FALSE)
result = metevalue.metilene.chk(example_tempfiles[1], example_tempfiles[2],
                                bheader = TRUE)
A general method to calculate the e-value for RNA-seq data.
metevalue.RNA_general(rna, group1_name, group2_name)
- rna: a data.frame of RNA-seq data, with one column per sample. Row names (gene identifiers such as TAG1 and TAG2) are also recommended.
- group1_name: character. The name (pattern) of the first group, e.g. "treated". Column names are matched against this pattern, so 'treated_abc' and 'treated' are considered the same group if group1_name = "treated". Use this with care in practice.
- group2_name: character. The name (pattern) of the second group, e.g. "untreated". Column names are matched against this pattern, so 'untreated_abc' and 'untreated' are considered the same group if group2_name = "untreated". Use this with care in practice.
Returns the computed e-values (evalue).
data("demo_desq_out")
evalue = metevalue.RNA_general(demo_desq_out, "treated", "untreated")
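The group1_name and group2_name arguments are described as name patterns, and the documentation warns to use them with care. The base R sketch below shows one concrete pitfall, assuming unanchored regular-expression matching such as grepl(). How metevalue actually matches names is not stated above.

With grepl(), the pattern "treated" also matches every "untreated" column. The anchored pattern "^treated" does not. The data frame is invented.

```r
rna <- data.frame(treated1fb   = c(5.1, 3.2),
                  treated2fb   = c(5.3, 3.0),
                  untreated1fb = c(4.0, 3.1),
                  untreated2fb = c(4.2, 3.3),
                  row.names = c("TAG1", "TAG2"))

grepl("treated", names(rna))     # TRUE for all four columns: too broad
g1 <- grepl("^treated", names(rna))
g2 <- grepl("^untreated", names(rna))
g1                               # TRUE TRUE FALSE FALSE

# Per-gene difference in group means using the anchored patterns
rowMeans(rna[, g1]) - rowMeans(rna[, g2])
```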
The data can be pre-processed by the metevalue.metilene.chk function.
varevalue.metilene( a, b, a_b, group1_name = "g1", group2_name = "g2", adjust.methods = "BH" )
a | A data.frame of methylation rates: two key columns (chrom, pos) followed by several value columns for the groups. |
---|---|
b | A data.frame of DMR results (metilene output format) with the following columns, in order: chr (chromosome); start (start position of the region); end (end position of the region); q-value (p-value from the MWU test, adjusted with the BH method); methyl.diff (difference between the group mean methylation levels); CpGs (number of CpG sites in the region); p (p-value from the MWU test); p2 (p-value from the 2D KS test); m1 (absolute mean methylation level of the segment in group 1); m2 (absolute mean methylation level of the segment in group 2). |
a_b | A data.frame obtained by joining a and b and then applying specific data-cleaning steps. See evalue.methylKit.chk() for details. |
group1_name | character: The name of the first group, e.g. "g1". |
group2_name | character: The name of the second group, e.g. "g2". |
adjust.methods | The adjustment method applied to the e-values: one of 'bonferroni', 'hochberg', 'holm', 'hommel', 'BH', 'BY'. The default is 'BH'. |
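The six accepted names are the same as method names in base R's stats::p.adjust(). This page does not confirm that metevalue passes adjust.methods to p.adjust() internally, so treat that as an assumption. The sketch below only checks the name overlap and previews how these methods transform a toy vector of hypothetical p-values.

```r
methods <- c("bonferroni", "hochberg", "holm", "hommel", "BH", "BY")

# Every documented choice is also a valid stats::p.adjust() method name
stopifnot(all(methods %in% p.adjust.methods))

# Hypothetical p-values, used only for illustration
p <- c(0.01, 0.02, 0.03, 0.04, 0.05)

# One column per method, one row per input value
adjusted <- sapply(methods, function(m) p.adjust(p, method = m))
print(round(adjusted, 4))
```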
A data.frame with the following columns (in order):
- chr: Chromosome
- start: The positions of the start sites of the corresponding region
- end: The positions of the end sites of the corresponding region
- q-value: The adjusted p-value based on BH method in MWU-test
- methyl.diff: The difference between the group means of methylation level
- CpGs: The number of CpG sites within the corresponding region
- p: p-value based on MWU-test
- p2: p-value based on 2D KS-test
- m1: The absolute mean methylation level for the corresponding segment of group 1
- m2: The absolute mean methylation level for the corresponding segment of group 2
- e_value: The e-value of the corresponding region
#data(demo_metilene_input)
#data(demo_metilene_out)
#result = evalue_buildin_var_fmt_nm(demo_metilene_input, demo_metilene_out, method="metilene")
#result = list(a = result$a,
#              b = result$b,
#              a_b = evalue_buildin_sql(result$a, result$b, method="metilene"))
#result = varevalue.metilene(result$a, result$b, result$a_b)
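The commented example builds a_b with the internal helper evalue_buildin_sql(), and this page does not describe its exact cleaning steps. Purely as an illustration, assume a_b pairs each CpG site in a with the region of b that contains it on the same chromosome. Under that assumption, such a join can be sketched in base R as below (all toy data are hypothetical).

```r
# Hypothetical per-site methylation rates: key columns (chrom, pos) plus group columns
a <- data.frame(chrom = c("chr21", "chr21", "chr21", "chr22"),
                pos   = c(100, 150, 400, 120),
                g1_1  = c(0.80, 0.90, 0.10, 0.50),
                g2_1  = c(0.20, 0.30, 0.10, 0.40))

# Hypothetical region table with a few metilene-style columns
b <- data.frame(chr = "chr21", start = 90, end = 200, methyl.diff = 0.6)

# Pair every site with every region on the same chromosome (inner join) ...
ab <- merge(a, b, by.x = "chrom", by.y = "chr")

# ... then keep only sites that fall inside the region (inclusive bounds)
ab <- ab[ab$pos >= ab$start & ab$pos <= ab$end, ]
print(ab)
```

This pairwise approach grows with sites × regions per chromosome. For genome-scale data, an interval-overlap structure such as Bioconductor's GenomicRanges is the usual choice.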
The input is simply the table of DNA methylation rates in the format described above. No additional output file from a DMR detection tool is needed. The chromosome name and the start and end sites of the region must be passed to the function.
varevalue.single_general( methyrate, group1_name = "g1", group2_name = "g2", chr, start, end )
methyrate | data.frame: A data.frame of methylation rates with two key columns (chromosome and position) followed by the methylation-rate columns of each group. Group names can be user-defined. |
---|---|
group1_name | character: The name (pattern) of the first group, e.g. "g1". Columns such as 'g1_abc' and 'g1' are considered part of the same group when group1_name = "g1". Use this with care in practice. |
group2_name | character: The name (pattern) of the second group, e.g. "g2". Columns such as 'g2_abc' and 'g2' are considered part of the same group when group2_name = "g2". Use this with care in practice. |
chr | character: The chromosome name, typically a string such as "chr21". |
start | integer: The position of the start site of the region. |
end | integer: The position of the end site of the region. |
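The function receives only the methylation-rate table and the region coordinates. It therefore presumably restricts the rates to the rows inside that region before computing the e-value. The base-R sketch below shows that selection step. It assumes inclusive boundaries and group columns named with the g1/g2 pattern, and all toy values are hypothetical. The region variables are deliberately not named chr, start and end, so they cannot be confused with data.frame columns.

```r
# Hypothetical toy table: chr, pos, then rate columns carrying the group pattern
methyrate <- data.frame(
  chr  = c("chr21", "chr21", "chr21", "chr21", "chr22"),
  pos  = c(9437400, 9437432, 9437500, 9437540, 9437450),
  g1_1 = c(0.10, 0.80, 0.75, 0.90, 0.20),
  g1_2 = c(0.15, 0.85, 0.70, 0.88, 0.25),
  g2_1 = c(0.12, 0.20, 0.30, 0.25, 0.22),
  g2_2 = c(0.11, 0.25, 0.28, 0.30, 0.21)
)

# Region of interest (inclusive boundaries are an assumption)
region_chr <- "chr21"; region_start <- 9437432; region_end <- 9437540

keep <- methyrate$chr == region_chr &
        methyrate$pos >= region_start & methyrate$pos <= region_end
region <- methyrate[keep, ]

# Split the rate columns into the two groups by an anchored name pattern
g1 <- region[, grepl("^g1", names(region)), drop = FALSE]
g2 <- region[, grepl("^g2", names(region)), drop = FALSE]
print(g1)
print(g2)
```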
evalue
#data("demo_metilene_input")
#varevalue.single_general(demo_metilene_input, chr = "chr21", start = 9437432, end = 9437540)
# [1] 2.626126e+43
#### Compare to `metevalue.metilene` ####
data(demo_metilene_out)
#example_tempfiles = tempfile(c("metilene_input", "metilene_out"))
#tempdir()
#write.table(demo_metilene_input, file=example_tempfiles[1],
#            row.names=FALSE, col.names=TRUE, quote=FALSE, sep='\t')
#write.table (demo_metilene_out, file=example_tempfiles[2],
#             sep ="\t", row.names =FALSE, col.names =TRUE, quote =FALSE)
#result = metevalue.metilene(example_tempfiles[1], example_tempfiles[2],
#                            bheader = TRUE)
# result[with(result, chr == 'chr21' & start == '9437432' & end == '9437540'), ncol(result)]
# [1] 2.626126e+43