Package 'DGEAR'

Title: Differential Gene Expression Analysis with R
Description: Analyses gene expression data derived from microarray experiments to detect differentially expressed genes (DEGs) by employing majority voting across five statistical models: Welch t-test, one-way ANOVA, Dunnett's test, Half's modified t-test, and the Wilcoxon-Mann-Whitney U-test. Combined p-values are computed with Fisher's method. Gene annotation is optional: users may supply a GEO SOFT annotation table or rely on row names directly. Boyer, R.S., Moore, J.S. (1991) <doi:10.1007/978-94-011-3488-0_5>.
Authors: Koushik Bardhan [aut, cre, ctb] (ORCID: <https://orcid.org/0009-0002-8846-8347>), Chiranjib Sarkar [aut, ths] (ORCID: <https://orcid.org/0000-0003-1536-7449>)
Maintainer: Koushik Bardhan <[email protected]>
License: MIT + file LICENSE
Version: 0.2.1
Built: 2026-07-03 12:10:59 UTC
Source: https://github.com/cran/DGEAR


Differential Gene Expression Analysis with R

Description

Main orchestration function that runs five statistical tests (Welch t-test, one-way ANOVA, Dunnett's test, Half's modified t-test, and Wilcoxon-Mann-Whitney U-test) on a gene expression matrix, combines their BH-adjusted p-values with Fisher's combined probability method, and identifies differentially expressed genes (DEGs) by majority voting.

Usage

DGEAR(
  dataframe,
  con1,
  con2,
  exp1,
  exp2,
  alpha = 0.05,
  votting_cutoff = 3,
  annot_df = NULL
)

Arguments

dataframe

A numeric matrix or data.frame of gene expression values (rows = genes, columns = samples). Raw intensity / count values are automatically log2-transformed when they appear to be on a linear scale.

con1

Integer. Index of the first control column.

con2

Integer. Index of the last control column.

exp1

Integer. Index of the first experiment column.

exp2

Integer. Index of the last experiment column.

alpha

Numeric significance threshold for BH-adjusted p-values (default 0.05).

votting_cutoff

Integer. Minimum number of tests (out of 5) that must independently declare a gene significant for it to be included in the majority-vote DEG list (default 3). Must be between 1 and 5.

annot_df

Optional annotation data.frame with columns ID and Gene.Symbol (or Gene.symbol) that maps probe/row identifiers to gene symbols. When NULL (default) row names of dataframe are used directly as gene identifiers. Typical source: a GEO SOFT family file parsed with read.delim().
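The automatic log2 transformation described for dataframe is not specified in detail here. A minimal sketch of one way such a check can work follows. It assumes a quantile-based heuristic in the style of GEO2R. The helper name maybe_log2 and the thresholds are illustrative assumptions, not the package's documented rule.

```r
# Hypothetical helper: decide whether a matrix looks linear-scale and, if so,
# log2-transform it. The quantile thresholds follow the GEO2R convention and
# are an assumption, not DGEAR's documented behaviour.
maybe_log2 <- function(x) {
  x <- as.matrix(x)
  qx <- as.numeric(quantile(x, c(0, 0.25, 0.5, 0.75, 0.99, 1), na.rm = TRUE))
  looks_linear <- (qx[5] > 100) || (qx[6] - qx[1] > 50 && qx[2] > 0)
  if (looks_linear) {
    x[x <= 0] <- NaN   # non-positive values have no log2
    x <- log2(x)
  }
  x
}

linear_scale <- matrix(c(120, 4000, 850, 16000), nrow = 2)
log_scale    <- matrix(c(6.9, 12.0, 9.7, 14.0), nrow = 2)

transformed <- maybe_log2(linear_scale)  # large values: log2 is applied
unchanged   <- maybe_log2(log_scale)     # already log-like: returned as-is
```

Data that are already on a log2 scale pass through untouched, so under this assumed rule the check is safe to apply to either kind of input.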

Details

The function internally calls perform_t_test, perform_anova, perform_dunnett_test, perform_h_test, and perform_wilcox_test.

Each test independently assigns an FDR flag (1 = significant, 0 = not). The five flags are summed per gene; genes whose sum meets or exceeds votting_cutoff are reported as DEGs (majority voting, Boyer & Moore 1991). Combined p-values across all five tests are computed with Fisher's method via parallelFisher.
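The voting and combination steps described above can be sketched in base R. The toy p-value matrix, its column labels, and the use of pchisq in place of parallelFisher are assumptions made for illustration. The package's internal code may differ.

```r
# Toy BH-adjusted p-values: 3 genes (rows) x 5 tests (columns).
adj_p <- matrix(
  c(0.001, 0.004, 0.020, 0.030, 0.200,   # geneA: 4 of 5 tests significant
    0.300, 0.040, 0.600, 0.700, 0.500,   # geneB: 1 of 5
    0.010, 0.060, 0.049, 0.080, 0.900),  # geneC: 2 of 5
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("t", "anova", "dunnett", "h", "wilcox"))
)
alpha  <- 0.05
cutoff <- 3   # plays the role of votting_cutoff

flags <- (adj_p <= alpha) * 1L        # 1 = significant, 0 = not
votes <- rowSums(flags)               # ensemble score per gene
degs  <- names(votes)[votes >= cutoff]

# Fisher's method: X = -2 * sum(log(p)) is referred to a chi-square
# distribution with 2k degrees of freedom, where k is the number of tests.
fisher_stat <- -2 * rowSums(log(adj_p))
fisher_p    <- pchisq(fisher_stat, df = 2 * ncol(adj_p), lower.tail = FALSE)
```

With cutoff = 3 only geneA is retained. Lowering the cutoff to 2 would also admit geneC.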

Annotation via annot_df is entirely optional. When supplied, the first gene symbol listed for each probe (delimited by /// ) is used. When absent, row names serve as identifiers, making the function fully self-contained without GEO annotation files.
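As an illustration of the first-symbol rule, the sketch below extracts the leading symbol from a /// delimited field and maps probe identifiers to symbols. The probe IDs and gene symbols are made up, and the fallback to the probe ID for unmatched or empty entries is an assumption of this sketch.

```r
# Hypothetical annotation table in the shape described above.
annot <- data.frame(
  ID          = c("probe_1", "probe_2", "probe_3"),
  Gene.Symbol = c("GENE1 /// GENE2", "GENE3", ""),
  stringsAsFactors = FALSE
)
probe_ids <- c("probe_2", "probe_1", "probe_3", "probe_9")

# Keep only the text before the first "///" and trim surrounding spaces.
first_symbol <- trimws(sub("///.*$", "", annot$Gene.Symbol))

# Look up each probe; fall back to the probe ID when no symbol is available
# (this fallback rule is an assumption made for the sketch).
symbols <- first_symbol[match(probe_ids, annot$ID)]
missing <- is.na(symbols) | symbols == ""
symbols[missing] <- probe_ids[missing]
symbols
```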

Value

A named list with four elements:

DEGs

Data.frame of gene identifiers that passed majority voting.

FDR_Table

Wide data.frame with BH-adjusted p-values from every test, the Fisher-combined FDR, the ensemble voting score, and log2 fold change for every gene.

Results_Table

Concise data.frame with G_Symbol, CombineFDR, log2FC, and Ensemble score.

IndividualTests

Named list of the raw output from each of the five test functions (each containing a Table and a DEGs element).

References

Boyer, R.S. and Moore, J.S. (1991). MJRTY — A Fast Majority Vote Algorithm. In Automated Reasoning: Essays in Honor of Woody Bledsoe, pp. 105–117. Springer, Dordrecht. doi:10.1007/978-94-011-3488-0_5

Examples

library(DGEAR)
data("gene_exp_data")

## Basic usage — no annotation file needed
result <- DGEAR(dataframe    = gene_exp_data,
                con1         = 1,
                con2         = 10,
                exp1         = 11,
                exp2         = 20,
                alpha        = 0.05,
                votting_cutoff = 2)
result$DEGs
head(result$FDR_Table)

## With an optional annotation data.frame (GEO SOFT format)
## annot <- read.delim("GSExxxxx_family.soft")
## result <- DGEAR(dataframe = gene_exp_data,
##                 con1 = 1, con2 = 10, exp1 = 11, exp2 = 20,
##                 annot_df = annot)

A dataset containing gene expression data

Description

This dataset contains simulated gene expression data intended for practice and demonstration.

Usage

gene_exp_data

Format

A data frame with 10 rows and 20 columns. Columns represent samples: columns 1 to 10 are controls and columns 11 to 20 are experiment samples. Rows represent genes. The first 5 of the 10 genes (gene1 to gene5) are the true DEGs, because their expression values in the first 10 samples are about 13 times higher than in the rest.
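A dataset of the same shape can be simulated along the following lines. This is a sketch only: the baseline distribution, the seed, and the sample names are assumptions, and the procedure used to generate gene_exp_data itself is not documented here.

```r
set.seed(1)  # arbitrary seed, for reproducibility only
n_genes <- 10; n_con <- 10; n_exp <- 10

# Baseline intensities for all genes and samples (distribution is assumed).
sim <- matrix(rnorm(n_genes * (n_con + n_exp), mean = 100, sd = 5),
              nrow = n_genes,
              dimnames = list(paste0("gene", 1:n_genes),
                              paste0("sample", 1:(n_con + n_exp))))

# Make gene1 to gene5 roughly 13 times higher in the control columns (1 to 10).
sim[1:5, 1:n_con] <- sim[1:5, 1:n_con] * 13
sim <- as.data.frame(sim)

dim(sim)
ratio <- rowMeans(sim[, 1:10]) / rowMeans(sim[, 11:20])
round(ratio, 1)   # about 13 for gene1 to gene5, about 1 for the others
```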

Examples

# The dataset is lazy-loaded and becomes available when it is first accessed.
data("gene_exp_data")
head(gene_exp_data)

One-Way ANOVA Test for Differential Gene Expression

Description

Performs a one-way ANOVA (Welch correction via oneway.test) for every gene (row) in the expression matrix, applies BH correction, and returns a results table together with the list of significant DEGs.
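The row-wise procedure can be sketched with base R alone. The simulated matrix is an assumption, as is the reading of the fdr column as a 0/1 significance flag. The package's own implementation may differ.

```r
set.seed(42)
expr <- matrix(rnorm(6 * 20, mean = 8), nrow = 6,
               dimnames = list(paste0("gene", 1:6), NULL))
expr[1:2, 11:20] <- expr[1:2, 11:20] + 5   # two genes shifted in the experiment group

group <- factor(rep(c("control", "experiment"), each = 10))

# One Welch-corrected one-way ANOVA per row (var.equal = FALSE is the default).
res <- t(apply(expr, 1, function(y) {
  fit <- oneway.test(y ~ group, var.equal = FALSE)
  c(statistic.F = unname(fit$statistic), p.value = fit$p.value)
}))

tab <- data.frame(res, BH = p.adjust(res[, "p.value"], method = "BH"))
tab$fdr <- as.integer(tab$BH <= 0.05)   # assumed meaning: 1 = significant, 0 = not
tab
```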

Usage

perform_anova(dataframe, con1, con2, exp1, exp2, alpha = 0.05, annot_df = NULL)

Arguments

dataframe

A numeric matrix or data.frame of gene expression values (rows = genes, columns = samples). Values are automatically log2-transformed when they appear to be on a linear / intensity scale.

con1

Integer. Index of the first control column.

con2

Integer. Index of the last control column.

exp1

Integer. Index of the first experiment column.

exp2

Integer. Index of the last experiment column.

alpha

Numeric significance threshold for BH-adjusted p-values (default 0.05).

annot_df

Optional annotation data.frame with columns ID and Gene.Symbol (or Gene.symbol). When NULL (default) row names of dataframe are used as gene identifiers.

Value

A named list:

Table

Data.frame with columns G_Symbol, log2FC, statistic.F, p.value, BH, fdr.

DEGs

Data.frame of significant gene identifiers.

Examples

library(DGEAR)
data("gene_exp_data")
result <- perform_anova(dataframe = gene_exp_data,
                        con1 = 1, con2 = 10,
                        exp1 = 11, exp2 = 20)
head(result$Table)
result$DEGs

Dunnett's Test for Differential Gene Expression

Description

Performs Dunnett's multiple comparison test (control vs. treatment) for every gene (row) in the expression matrix using DunnettTest, applies BH correction, and returns a results table with the list of significant DEGs. Genes with insufficient variance, missing values, or other numerical issues are skipped gracefully and receive NA p-values.
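The "skip and return NA" behaviour can be illustrated with tryCatch. In this sketch t.test stands in for DunnettTest so that the example runs with base R only. The helper name safe_p is hypothetical, and the package's actual error handling may be organised differently.

```r
# Stand-in per-gene test: t.test() is used here only because it ships with
# base R and, like many tests, fails on constant or too-sparse data.
safe_p <- function(y, group) {
  tryCatch(
    t.test(y ~ group)$p.value,
    error = function(e) NA_real_      # skip the gene, keep its row
  )
}

group <- factor(rep(c("control", "treatment"), each = 4))
expr <- rbind(
  varying  = c(5.1, 4.9, 5.3, 5.0, 7.2, 7.0, 7.4, 6.9),
  constant = rep(6, 8),                           # zero variance
  with_na  = c(5, NA, NA, NA, 7, 7.1, 6.8, 7.3)   # too few control values
)

p  <- apply(expr, 1, safe_p, group = group)
bh <- p.adjust(p, method = "BH")   # NA entries stay NA and are not counted
```

Because p.adjust ignores NA entries when counting hypotheses, skipped genes do not inflate the correction applied to the remaining genes.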

Usage

perform_dunnett_test(
  dataframe,
  con1,
  con2,
  exp1,
  exp2,
  alpha = 0.05,
  annot_df = NULL
)

Arguments

dataframe

A numeric matrix or data.frame of gene expression values (rows = genes, columns = samples). Values are automatically log2-transformed when they appear to be on a linear / intensity scale.

con1

Integer. Index of the first control column.

con2

Integer. Index of the last control column.

exp1

Integer. Index of the first experiment column.

exp2

Integer. Index of the last experiment column.

alpha

Numeric significance threshold for BH-adjusted p-values (default 0.05).

annot_df

Optional annotation data.frame with columns ID and Gene.Symbol (or Gene.symbol). When NULL (default) row names of dataframe are used as gene identifiers.

Value

A named list:

Table

Data.frame with columns G_Symbol, log2FC, p.value, BH, fdr.

DEGs

Data.frame of significant gene identifiers.

Examples

library(DGEAR)
data("gene_exp_data")
result <- perform_dunnett_test(dataframe = gene_exp_data,
                               con1 = 1, con2 = 10,
                               exp1 = 11, exp2 = 20)
head(result$Table)
result$DEGs

Half's Modified t-Test for Differential Gene Expression

Description

Computes a modified t-statistic (sometimes called "Half's t-test") that uses only the control standard deviation in its denominator, applies BH correction, and returns a results table together with the list of significant DEGs.
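The exact formula is not given here. The sketch below shows one plausible form of a statistic whose denominator depends only on the control standard deviation. The function name control_sd_stat, the scaling term, and the sign convention (experiment minus control) are all assumptions, and no reference distribution or p-value is implied.

```r
# Hypothetical control-SD-only t-like statistic for a single gene.
# Scaling and sign convention are assumptions, not the package's definition.
control_sd_stat <- function(con_vals, exp_vals) {
  se <- sd(con_vals) * sqrt(1 / length(con_vals) + 1 / length(exp_vals))
  (mean(exp_vals) - mean(con_vals)) / se
}

con_vals <- c(8.1, 7.9, 8.0, 8.2, 7.8)
exp_vals <- c(9.0, 9.4, 8.6, 9.8, 8.2)   # shifted and noisier than control

h_stat     <- control_sd_stat(con_vals, exp_vals)
welch_stat <- unname(t.test(exp_vals, con_vals)$statistic)  # for comparison
c(control_sd = h_stat, welch = welch_stat)
```

In this toy case the experiment group is noisier than the control group, so ignoring its variance yields a larger statistic than the Welch version.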

Usage

perform_h_test(
  dataframe,
  con1,
  con2,
  exp1,
  exp2,
  alpha = 0.05,
  annot_df = NULL
)

Arguments

dataframe

A numeric matrix or data.frame of gene expression values (rows = genes, columns = samples). Values are automatically log2-transformed when they appear to be on a linear / intensity scale.

con1

Integer. Index of the first control column.

con2

Integer. Index of the last control column.

exp1

Integer. Index of the first experiment column.

exp2

Integer. Index of the last experiment column.

alpha

Numeric significance threshold for BH-adjusted p-values (default 0.05).

annot_df

Optional annotation data.frame with columns ID and Gene.Symbol (or Gene.symbol). When NULL (default) row names of dataframe are used as gene identifiers.

Value

A named list:

Table

Data.frame with columns G_Symbol, log2FC, statistic, p.value, BH, fdr.

DEGs

Data.frame of significant gene identifiers.

Examples

library(DGEAR)
data("gene_exp_data")
result <- perform_h_test(dataframe = gene_exp_data,
                         con1 = 1, con2 = 10,
                         exp1 = 11, exp2 = 20)
head(result$Table)
result$DEGs

Welch Two-Sample t-Test for Differential Gene Expression

Description

Performs an independent two-sample Welch t-test for every gene (row) in the expression matrix, applies Benjamini-Hochberg (BH) correction, and returns a results table together with the list of significant DEGs.
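A base R sketch of the same per-gene workflow is given below. The simulated matrix, the direction of log2FC (experiment mean minus control mean on the log2 scale), and the 0/1 coding of fdr are assumptions made for illustration.

```r
set.seed(7)
expr <- matrix(rnorm(5 * 20, mean = 10), nrow = 5,
               dimnames = list(paste0("gene", 1:5), NULL))
expr[1, 11:20] <- expr[1, 11:20] + 4   # gene1 is shifted in the experiment group
con_cols <- 1:10    # corresponds to con1:con2
exp_cols <- 11:20   # corresponds to exp1:exp2

tab <- do.call(rbind, lapply(rownames(expr), function(g) {
  fit <- t.test(expr[g, exp_cols], expr[g, con_cols])  # Welch by default
  data.frame(G_Symbol    = g,
             log2FC      = mean(expr[g, exp_cols]) - mean(expr[g, con_cols]),
             statistic.t = unname(fit$statistic),
             p.value     = fit$p.value)
}))
tab$BH  <- p.adjust(tab$p.value, method = "BH")
tab$fdr <- as.integer(tab$BH <= 0.05)
degs <- tab[tab$fdr == 1L, "G_Symbol", drop = FALSE]
```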

Usage

perform_t_test(
  dataframe,
  con1,
  con2,
  exp1,
  exp2,
  alpha = 0.05,
  annot_df = NULL
)

Arguments

dataframe

A numeric matrix or data.frame of gene expression values (rows = genes, columns = samples). Values are automatically log2-transformed when they appear to be on a linear / intensity scale.

con1

Integer. Index of the first control column.

con2

Integer. Index of the last control column.

exp1

Integer. Index of the first experiment column.

exp2

Integer. Index of the last experiment column.

alpha

Numeric significance threshold for BH-adjusted p-values (default 0.05).

annot_df

Optional annotation data.frame with columns ID and Gene.Symbol (or Gene.symbol). When NULL (default) row names of dataframe are used as gene identifiers.

Value

A named list:

Table

Data.frame with columns G_Symbol, log2FC, statistic.t, p.value, BH, fdr.

DEGs

Data.frame of gene identifiers whose BH-adjusted p-value is less than or equal to alpha.

Examples

library(DGEAR)
data("gene_exp_data")
result <- perform_t_test(dataframe = gene_exp_data,
                         con1 = 1, con2 = 10,
                         exp1 = 11, exp2 = 20)
head(result$Table)
result$DEGs

Wilcoxon-Mann-Whitney U-Test for Differential Gene Expression

Description

Performs the Wilcoxon rank-sum (Mann-Whitney U) test for every gene (row) in the expression matrix, applies BH correction, and returns a results table together with the list of significant DEGs.
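For a single gene, the underlying test is the base R function wilcox.test. The toy values below are made up. The sketch also shows the smallest two-sided p-value an exact rank test can reach for a given design, which is worth keeping in mind when the result is compared with alpha after BH correction.

```r
con_vals <- c(6.1, 6.4, 5.9, 6.0, 6.3, 6.2)
exp_vals <- c(7.5, 7.9, 7.2, 8.1, 7.7, 7.4)   # every value exceeds every control

fit <- wilcox.test(exp_vals, con_vals)   # exact test: small samples, no ties
w   <- unname(fit$statistic)             # number of (exp, con) pairs with exp > con
p   <- fit$p.value

# Smallest attainable two-sided exact p-value with n1 and n2 samples per group.
min_p <- function(n1, n2) 2 / choose(n1 + n2, n1)
min_p(6, 6)     # the value reached in this fully separated toy example
min_p(10, 10)   # bound for a 10 vs 10 design such as gene_exp_data
```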

Usage

perform_wilcox_test(
  dataframe,
  con1,
  con2,
  exp1,
  exp2,
  alpha = 0.05,
  annot_df = NULL
)

Arguments

dataframe

A numeric matrix or data.frame of gene expression values (rows = genes, columns = samples). Values are automatically log2-transformed when they appear to be on a linear / intensity scale.

con1

Integer. Index of the first control column.

con2

Integer. Index of the last control column.

exp1

Integer. Index of the first experiment column.

exp2

Integer. Index of the last experiment column.

alpha

Numeric significance threshold for BH-adjusted p-values (default 0.05).

annot_df

Optional annotation data.frame with columns ID and Gene.Symbol (or Gene.symbol). When NULL (default) row names of dataframe are used as gene identifiers.

Value

A named list:

Table

Data.frame with columns G_Symbol, log2FC, statistic.W, p.value, BH, fdr.

DEGs

Data.frame of significant gene identifiers.

Examples

library(DGEAR)
data("gene_exp_data")
result <- perform_wilcox_test(dataframe = gene_exp_data,
                              con1 = 1, con2 = 10,
                              exp1 = 11, exp2 = 20)
head(result$Table)
result$DEGs