| Title: | Differential Gene Expression Analysis with R |
|---|---|
| Description: | Analyses gene expression data derived from microarray experiments to detect differentially expressed genes (DEGs) by employing majority voting across five statistical models: Welch t-test, one-way ANOVA, Dunnett's test, Half's modified t-test, and the Wilcoxon-Mann-Whitney U-test. Combined p-values are computed with Fisher's method. Gene annotation is optional: users may supply a GEO SOFT annotation table or rely on row names directly. Boyer, R.S., Moore, J.S. (1991) <doi:10.1007/978-94-011-3488-0_5>. |
| Authors: | Koushik Bardhan [aut, cre, ctb] (ORCID: <https://orcid.org/0009-0002-8846-8347>), Chiranjib Sarkar [aut, ths] (ORCID: <https://orcid.org/0000-0003-1536-7449>) |
| Maintainer: | Koushik Bardhan <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.2.1 |
| Built: | 2026-07-03 12:10:59 UTC |
| Source: | https://github.com/cran/DGEAR |
Main orchestration function that runs five statistical tests (Welch t-test, one-way ANOVA, Dunnett's test, Half's modified t-test, and Wilcoxon-Mann-Whitney U-test) on a gene expression matrix, combines their BH-adjusted p-values with Fisher's combined probability method, and identifies differentially expressed genes (DEGs) by majority voting.

    DGEAR(dataframe, con1, con2, exp1, exp2, alpha = 0.05,
          votting_cutoff = 3, annot_df = NULL)


| dataframe | A numeric matrix or data.frame of gene expression values (rows = genes, columns = samples). Raw intensity / count values are automatically log2-transformed when they appear to be on a linear scale. |
|---|---|
| con1 | Integer. Index of the first control column. |
| con2 | Integer. Index of the last control column. |
| exp1 | Integer. Index of the first experiment column. |
| exp2 | Integer. Index of the last experiment column. |
| alpha | Numeric significance threshold for BH-adjusted p-values (default 0.05). |
| votting_cutoff | Integer. Minimum number of tests (out of 5) that must independently declare a gene significant for it to be included in the majority-vote DEG list (default 3). |
| annot_df | Optional annotation data.frame in GEO SOFT format (default NULL). |

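
The automatic log2 transformation mentioned for `dataframe` can be pictured with a small base-R sketch. The rule the package actually applies is not stated here; the quantile check below is an assumption borrowed from the heuristic commonly seen in GEO2R scripts, and the helper name `maybe_log2` is hypothetical.

```r
# Hypothetical helper (not part of DGEAR): log2-transform a matrix only when
# its values look like they are on a linear / intensity scale.
# The quantile rule is an assumption modelled on GEO2R-style scripts.
maybe_log2 <- function(ex) {
  ex <- as.matrix(ex)
  qx <- as.numeric(quantile(ex, c(0, 0.25, 0.5, 0.75, 0.99, 1), na.rm = TRUE))
  looks_linear <- (qx[5] > 100) || (qx[6] - qx[1] > 50 && qx[2] > 0)
  if (looks_linear) {
    ex[ex <= 0] <- NaN        # non-positive values have no log2
    ex <- log2(ex)
  }
  ex
}

set.seed(1)
raw    <- matrix(runif(40, min = 50, max = 5000), nrow = 4)  # intensity-like values
logged <- matrix(runif(40, min = 4,  max = 14),   nrow = 4)  # already log2-like

range(maybe_log2(raw))                  # values are now on a log2 scale
identical(maybe_log2(logged), logged)   # log-scale input is left untouched
```
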
The function internally calls:
perform_t_test — Welch two-sample t-test
perform_anova — one-way ANOVA
perform_dunnett_test — Dunnett's test
perform_h_test — Half's modified t-test
perform_wilcox_test — Wilcoxon rank-sum test
Each test independently assigns an FDR flag (1 = significant, 0 = not).
The five flags are summed per gene; genes whose sum meets or exceeds
votting_cutoff are reported as DEGs (majority voting, Boyer & Moore
1991). Combined p-values across all five tests are computed with Fisher's
method via parallelFisher.
Annotation via annot_df is entirely optional. When supplied, the
first gene symbol listed for each probe (delimited by /// ) is used.
When absent, row names serve as identifiers, making the function fully
self-contained without GEO annotation files.
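
The voting and combination steps described above can be sketched in base R. The p-values are made-up toy numbers, the strict `<` comparison against `alpha` is an assumption, and the plain chi-squared formula stands in for `parallelFisher`, which the package itself is said to use.

```r
# Toy BH-adjusted p-values: rows = genes, columns = the five tests (made-up numbers).
adj_p <- rbind(
  gene1 = c(0.001, 0.002, 0.004, 0.010, 0.030),
  gene2 = c(0.040, 0.200, 0.030, 0.600, 0.045),
  gene3 = c(0.300, 0.700, 0.500, 0.900, 0.040)
)
colnames(adj_p) <- c("t", "anova", "dunnett", "h", "wilcox")

alpha <- 0.05
votting_cutoff <- 3

flags    <- (adj_p < alpha) * 1L        # 1 = significant, 0 = not
ensemble <- rowSums(flags)              # number of votes per gene
degs     <- names(ensemble)[ensemble >= votting_cutoff]

# Fisher's method: X = -2 * sum(log(p)) is compared with a chi-squared
# distribution on 2k degrees of freedom, where k is the number of p-values.
fisher_combine <- function(p) {
  p <- p[!is.na(p)]
  pchisq(-2 * sum(log(p)), df = 2 * length(p), lower.tail = FALSE)
}
combined <- apply(adj_p, 1, fisher_combine)

ensemble   # votes: gene1 = 5, gene2 = 3, gene3 = 1
degs       # genes reaching the cutoff: gene1 and gene2
```

Raising `votting_cutoff` makes the ensemble stricter; with a cutoff of 4, only `gene1` would remain in this toy example.
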
A named list with four elements:

| DEGs | Data.frame of gene identifiers that passed majority voting. |
|---|---|
| FDR_Table | Wide data.frame with BH-adjusted p-values from every test, the Fisher-combined FDR, the ensemble voting score, and log2 fold change for every gene. |
| Results_Table | Concise data.frame with G_Symbol, CombineFDR, log2FC, and Ensemble score. |
| IndividualTests | Named list of the raw output from each of the five test functions (each containing a Table and a DEGs element). |

Boyer, R.S. and Moore, J.S. (1991). MJRTY — A Fast Majority Vote Algorithm. In Automated Reasoning: Essays in Honor of Woody Bledsoe, pp. 105–117. Springer, Dordrecht. doi:10.1007/978-94-011-3488-0_5

    library(DGEAR)
    data("gene_exp_data")

    ## Basic usage: no annotation file needed
    result <- DGEAR(dataframe = gene_exp_data,
                    con1 = 1, con2 = 10, exp1 = 11, exp2 = 20,
                    alpha = 0.05, votting_cutoff = 2)
    result$DEGs
    head(result$FDR_Table)

    ## With an optional annotation data.frame (GEO SOFT format)
    ## annot <- read.delim("GSExxxxx_family.soft")
    ## result <- DGEAR(dataframe = gene_exp_data,
    ##                 con1 = 1, con2 = 10, exp1 = 11, exp2 = 20,
    ##                 annot_df = annot)

A simulated gene expression dataset provided for practice.

    gene_exp_data

A data frame with 10 rows (genes) and 20 columns (samples). Columns 1 to 10 are control samples and columns 11 to 20 are experiment samples. The first five genes, gene1 to gene5, are the true DEGs: their expression values in the first 10 samples are about 13 times higher than in the remaining samples.
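
For readers who want a dataset of the same shape to experiment with, the following is a minimal sketch. It is not how `gene_exp_data` was generated: the log-normal distribution, its parameters, and the object name `sim` are all assumptions chosen only to reproduce the layout described above.

```r
# Hypothetical simulation of a 10 x 20 expression table in which the first
# five genes are about 13 times higher in the first 10 samples.
set.seed(2024)
n_genes <- 10
n_con   <- 10
n_exp   <- 10

vals <- matrix(rlnorm(n_genes * (n_con + n_exp), meanlog = log(100), sdlog = 0.1),
               nrow = n_genes)
vals[1:5, 1:n_con] <- vals[1:5, 1:n_con] * 13   # plant the true DEGs

sim <- as.data.frame(vals)
rownames(sim) <- paste0("gene", seq_len(n_genes))
colnames(sim) <- paste0("sample", seq_len(n_con + n_exp))

dim(sim)
# Ratio of control to experiment means for a planted gene (close to 13)
mean(unlist(sim["gene1", 1:10])) / mean(unlist(sim["gene1", 11:20]))
```
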

    # The data are lazy-loaded and become available when first accessed.
    data("gene_exp_data")
    head(gene_exp_data)

Performs a one-way ANOVA (Welch correction via oneway.test)
for every gene (row) in the expression matrix, applies BH correction, and
returns a results table together with the list of significant DEGs.
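
A per-gene Welch one-way ANOVA followed by BH correction can be sketched with base R alone. The toy matrix and the strict `<` cutoff are assumptions for illustration; this is not the body of `perform_anova`.

```r
set.seed(42)
# Toy log2 expression matrix: 5 genes x 20 samples (10 control, 10 experiment).
expr <- matrix(rnorm(5 * 20, mean = 8), nrow = 5,
               dimnames = list(paste0("gene", 1:5), NULL))
expr[1:2, 11:20] <- expr[1:2, 11:20] + 3   # shift two genes in the experiment group

group <- factor(rep(c("control", "experiment"), each = 10))

# One Welch-corrected one-way ANOVA per row (var.equal = FALSE).
p_raw <- apply(expr, 1, function(x) {
  oneway.test(x ~ group, var.equal = FALSE)$p.value
})

p_bh <- p.adjust(p_raw, method = "BH")     # Benjamini-Hochberg adjustment
fdr  <- as.integer(p_bh < 0.05)            # 1 = significant, 0 = not

data.frame(p.value = p_raw, BH = p_bh, fdr = fdr)
```
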

    perform_anova(dataframe, con1, con2, exp1, exp2, alpha = 0.05,
                  annot_df = NULL)


| dataframe | A numeric matrix or data.frame of gene expression values (rows = genes, columns = samples). Values are automatically log2-transformed when they appear to be on a linear / intensity scale. |
|---|---|
| con1 | Integer. Index of the first control column. |
| con2 | Integer. Index of the last control column. |
| exp1 | Integer. Index of the first experiment column. |
| exp2 | Integer. Index of the last experiment column. |
| alpha | Numeric significance threshold for BH-adjusted p-values (default 0.05). |
| annot_df | Optional annotation data.frame in GEO SOFT format (default NULL). |

A named list:

| Table | Data.frame with columns G_Symbol, log2FC, statistic.F, p.value, BH, fdr. |
|---|---|
| DEGs | Data.frame of significant gene identifiers. |


    library(DGEAR)
    data("gene_exp_data")

    result <- perform_anova(dataframe = gene_exp_data,
                            con1 = 1, con2 = 10, exp1 = 11, exp2 = 20)
    head(result$Table)
    result$DEGs

Performs Dunnett's multiple comparison test (control vs. treatment) for every
gene (row) in the expression matrix using DunnettTest,
applies BH correction, and returns a results table with the list of significant
DEGs. Genes with insufficient variance, missing values, or other numerical
issues are skipped gracefully and receive NA p-values.
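
The "skip gracefully" behaviour can be illustrated with a guarded wrapper. This is a sketch under assumptions, not the package's code: the helper name `dunnett_p` is hypothetical, it presumes the DescTools package is installed (every gene gets NA otherwise), and it assumes the p-value sits in the `pval` column of the first element returned by `DescTools::DunnettTest`.

```r
# Hypothetical wrapper: Dunnett p-value for one gene, or NA when the test
# cannot be computed (package missing, NA values, zero variance, or an error).
dunnett_p <- function(x, group) {
  if (!requireNamespace("DescTools", quietly = TRUE)) return(NA_real_)
  if (anyNA(x) || stats::var(x) == 0) return(NA_real_)   # skip degenerate rows
  tryCatch({
    res <- DescTools::DunnettTest(x = x, g = group, control = "control")
    as.numeric(res[[1]][1, "pval"])                       # assumed column name
  }, error = function(e) NA_real_)
}

set.seed(7)
group <- factor(rep(c("control", "experiment"), each = 10))
expr <- rbind(
  shifted  = c(rnorm(10, 8), rnorm(10, 11)),
  flat     = rnorm(20, 8),
  constant = rep(8, 20)                  # zero variance: should be skipped
)

p_raw <- apply(expr, 1, dunnett_p, group = group)
p_bh  <- p.adjust(p_raw, method = "BH")  # NA entries stay NA after adjustment
p_bh
```
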

    perform_dunnett_test(dataframe, con1, con2, exp1, exp2, alpha = 0.05,
                         annot_df = NULL)


| dataframe | A numeric matrix or data.frame of gene expression values (rows = genes, columns = samples). Values are automatically log2-transformed when they appear to be on a linear / intensity scale. |
|---|---|
| con1 | Integer. Index of the first control column. |
| con2 | Integer. Index of the last control column. |
| exp1 | Integer. Index of the first experiment column. |
| exp2 | Integer. Index of the last experiment column. |
| alpha | Numeric significance threshold for BH-adjusted p-values (default 0.05). |
| annot_df | Optional annotation data.frame in GEO SOFT format (default NULL). |

A named list:

| Table | Data.frame with columns G_Symbol, log2FC, p.value, BH, fdr. |
|---|---|
| DEGs | Data.frame of significant gene identifiers. |


    library(DGEAR)
    data("gene_exp_data")

    result <- perform_dunnett_test(dataframe = gene_exp_data,
                                   con1 = 1, con2 = 10, exp1 = 11, exp2 = 20)
    head(result$Table)
    result$DEGs

Computes a modified t-statistic (sometimes called "Half's t-test") that uses only the control standard deviation in its denominator, applies BH correction, and returns a results table together with the list of significant DEGs.
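
The formula for this statistic is not given here. One plausible reading, shown purely as an assumption, divides the difference in group means by a standard error built from the control standard deviation alone and refers the result to a t distribution with `n_con - 1` degrees of freedom. The helper name `h_stat`, the standard-error form, and the degrees of freedom are all hypothetical and may differ from what `perform_h_test` does.

```r
# Hypothetical modified t-statistic: only the control SD enters the denominator.
h_stat <- function(x, con_idx, exp_idx) {
  con <- x[con_idx]
  trt <- x[exp_idx]
  se  <- sd(con) * sqrt(1 / length(con) + 1 / length(trt))  # assumed form
  t_h <- (mean(trt) - mean(con)) / se
  c(statistic = t_h,
    p.value   = 2 * pt(-abs(t_h), df = length(con) - 1))    # assumed df
}

set.seed(11)
expr <- rbind(
  shifted = c(rnorm(10, 8), rnorm(10, 11)),   # experiment group shifted by 3
  flat    = rnorm(20, 8)
)

res <- as.data.frame(t(apply(expr, 1, h_stat, con_idx = 1:10, exp_idx = 11:20)))
res$BH <- p.adjust(res$p.value, method = "BH")
res
```
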

    perform_h_test(dataframe, con1, con2, exp1, exp2, alpha = 0.05,
                   annot_df = NULL)


| dataframe | A numeric matrix or data.frame of gene expression values (rows = genes, columns = samples). Values are automatically log2-transformed when they appear to be on a linear / intensity scale. |
|---|---|
| con1 | Integer. Index of the first control column. |
| con2 | Integer. Index of the last control column. |
| exp1 | Integer. Index of the first experiment column. |
| exp2 | Integer. Index of the last experiment column. |
| alpha | Numeric significance threshold for BH-adjusted p-values (default 0.05). |
| annot_df | Optional annotation data.frame in GEO SOFT format (default NULL). |

A named list:

| Table | Data.frame with columns G_Symbol, log2FC, statistic, p.value, BH, fdr. |
|---|---|
| DEGs | Data.frame of significant gene identifiers. |


    library(DGEAR)
    data("gene_exp_data")

    result <- perform_h_test(dataframe = gene_exp_data,
                             con1 = 1, con2 = 10, exp1 = 11, exp2 = 20)
    head(result$Table)
    result$DEGs

Performs an independent two-sample Welch t-test for every gene (row) in the expression matrix, applies Benjamini-Hochberg (BH) correction, and returns a results table together with the list of significant DEGs.
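
As an illustration of the per-gene workflow, the sketch below runs a Welch t-test on each row of a toy matrix and applies BH correction in base R. The column names mirror those listed for the results table, but the toy data, the direction of the fold change (experiment minus control on the log2 scale), and the strict `<` cutoff are assumptions rather than the package's code.

```r
set.seed(123)
# Toy log2 expression matrix: 6 genes x 20 samples (10 control, 10 experiment).
expr <- matrix(rnorm(6 * 20, mean = 8), nrow = 6,
               dimnames = list(paste0("gene", 1:6), NULL))
expr[1:3, 11:20] <- expr[1:3, 11:20] + 3   # three genes shifted in the experiment group

con_idx <- 1:10
exp_idx <- 11:20
alpha   <- 0.05

res <- t(apply(expr, 1, function(x) {
  tt <- t.test(x[exp_idx], x[con_idx], var.equal = FALSE)   # Welch two-sample t-test
  c(log2FC      = mean(x[exp_idx]) - mean(x[con_idx]),      # difference of log2 means
    statistic.t = unname(tt$statistic),
    p.value     = tt$p.value)
}))
res <- as.data.frame(res)
res$BH  <- p.adjust(res$p.value, method = "BH")
res$fdr <- as.integer(res$BH < alpha)                       # 1 = significant, 0 = not

res
rownames(res)[res$fdr == 1]   # genes flagged as DEGs by this test
```
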

    perform_t_test(dataframe, con1, con2, exp1, exp2, alpha = 0.05,
                   annot_df = NULL)


| dataframe | A numeric matrix or data.frame of gene expression values (rows = genes, columns = samples). Values are automatically log2-transformed when they appear to be on a linear / intensity scale. |
|---|---|
| con1 | Integer. Index of the first control column. |
| con2 | Integer. Index of the last control column. |
| exp1 | Integer. Index of the first experiment column. |
| exp2 | Integer. Index of the last experiment column. |
| alpha | Numeric significance threshold for BH-adjusted p-values (default 0.05). |
| annot_df | Optional annotation data.frame in GEO SOFT format (default NULL). |

A named list:

| Table | Data.frame with columns G_Symbol, log2FC, statistic.t, p.value, BH, fdr. |
|---|---|
| DEGs | Data.frame of gene identifiers whose BH-adjusted p-value is significant at alpha. |


    library(DGEAR)
    data("gene_exp_data")

    result <- perform_t_test(dataframe = gene_exp_data,
                             con1 = 1, con2 = 10, exp1 = 11, exp2 = 20)
    head(result$Table)
    result$DEGs

Performs the Wilcoxon rank-sum (Mann-Whitney U) test for every gene (row) in the expression matrix, applies BH correction, and returns a results table together with the list of significant DEGs.
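
A rank-based version of the same per-gene loop can be sketched with `wilcox.test` from base R. The toy data are made up, and `exact = FALSE` (the normal approximation) is a choice made here for illustration; whether `perform_wilcox_test` uses exact or approximate p-values is not stated.

```r
set.seed(99)
# Toy log2 expression matrix: 5 genes x 20 samples (10 control, 10 experiment).
expr <- matrix(rnorm(5 * 20, mean = 8), nrow = 5,
               dimnames = list(paste0("gene", 1:5), NULL))
expr[1:2, 11:20] <- expr[1:2, 11:20] + 3   # two genes shifted in the experiment group

res <- t(apply(expr, 1, function(x) {
  wt <- wilcox.test(x[11:20], x[1:10], exact = FALSE)   # rank-sum test, normal approximation
  c(statistic.W = unname(wt$statistic), p.value = wt$p.value)
}))
res <- as.data.frame(res)
res$BH  <- p.adjust(res$p.value, method = "BH")
res$fdr <- as.integer(res$BH < 0.05)

res
```

Because the test works on ranks, its smallest attainable p-value is bounded by the group sizes, so with very few samples per group it may never reach significance after BH correction.
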

    perform_wilcox_test(dataframe, con1, con2, exp1, exp2, alpha = 0.05,
                        annot_df = NULL)


| dataframe | A numeric matrix or data.frame of gene expression values (rows = genes, columns = samples). Values are automatically log2-transformed when they appear to be on a linear / intensity scale. |
|---|---|
| con1 | Integer. Index of the first control column. |
| con2 | Integer. Index of the last control column. |
| exp1 | Integer. Index of the first experiment column. |
| exp2 | Integer. Index of the last experiment column. |
| alpha | Numeric significance threshold for BH-adjusted p-values (default 0.05). |
| annot_df | Optional annotation data.frame in GEO SOFT format (default NULL). |

A named list:

| Table | Data.frame with columns G_Symbol, log2FC, statistic.W, p.value, BH, fdr. |
|---|---|
| DEGs | Data.frame of significant gene identifiers. |


    library(DGEAR)
    data("gene_exp_data")

    result <- perform_wilcox_test(dataframe = gene_exp_data,
                                  con1 = 1, con2 = 10, exp1 = 11, exp2 = 20)
    head(result$Table)
    result$DEGs
