Package 'BICORN'

Title: Integrative Inference of De Novo Cis-Regulatory Modules
Description: Prior transcription factor binding knowledge and target gene expression data are integrated in a Bayesian framework for functional cis-regulatory module inference. Using Gibbs sampling, we iteratively estimate transcription factor associations for each gene, regulation strength for each binding event and the hidden activity for each transcription factor.
Authors: Xi Chen [aut, cre], Jianhua Xuan [aut]
Maintainer: Xi Chen <[email protected]>
License: GPL-2
Version: 0.1.0
Built: 2024-12-01 08:35:53 UTC
Source: CRAN


TF-gene regulation strength matrix

Description

A matrix of TF-gene regulation strength with genes as rows and TFs as columns.

Usage

A

Format

numeric matrix


TF-gene regulation strength matrix sampled from the previous round

Description

A matrix of TF-gene regulation strength with genes as rows and TFs as columns, sampled from the previous round. During the Gibbs sampling process, this matrix is used as the prior for a new round of regulation strength sampling.

Usage

A_old

Format

numeric matrix


Regulation Strength Sampling Function

Description

Function 'A_sampling' estimates a regulation strength for each sampled binding event in C, according to a posterior Gaussian distribution.

Usage

A_sampling(Y, C, A_old, X, base_line, C_prior, sigma_noise, sigma_A,
  sigma_baseline, sigma_X)

Arguments

Y

gene expression data matrix

C

sampled TF-gene binding network

A_old

regulatory strength matrix sampled from the previous round, used as a prior in the current round

X

sampled transcription factor activity matrix

base_line

sampled gene expression baseline activity

C_prior

prior TF-gene binding network

sigma_noise

variance of gene expression fitting residuals

sigma_A

variance of regulatory strength

sigma_baseline

variance of gene expression baseline activity

sigma_X

variance of transcription factor activity
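
The posterior Gaussian draw described above can be sketched in base R for a single binding event. This is a minimal illustration and not the package's implementation: it assumes a zero-mean Gaussian prior with variance sigma_A and a Gaussian likelihood with variance sigma_noise, and the helper name sample_strength and the vectors r and x are hypothetical.

```r
# Hypothetical helper (not part of BICORN): draw one regulation strength from
# its conjugate Gaussian posterior, assuming r = a * x + noise with
# noise ~ N(0, sigma_noise) and prior a ~ N(0, sigma_A).
sample_strength <- function(r, x, sigma_noise, sigma_A) {
  post_var  <- 1 / (sum(x^2) / sigma_noise + 1 / sigma_A)
  post_mean <- post_var * sum(x * r) / sigma_noise
  list(mean = post_mean, var = post_var,
       draw = rnorm(1, mean = post_mean, sd = sqrt(post_var)))
}

set.seed(1)
x <- c(-1, 0.5, 2, 1)              # activity of one TF across four samples
r <- 0.8 * x + rnorm(4, sd = 0.1)  # partial residual of one target gene
res <- sample_strength(r, x, sigma_noise = 0.01, sigma_A = 1)
```

Here r stands for the expression of one gene after subtracting its baseline and the contributions of all other bound TFs; how the package forms that residual is not documented in this manual.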


Inverse-gamma distribution hyper-parameter alpha

Description

Hyper-parameter alpha of inverse-gamma distribution.

Usage

alpha

Format

scalar


Gene baseline expression

Description

A vector of baseline expression for all genes.

Usage

base_line

Format

numeric vector


Gene baseline expression sampled from the previous round

Description

A vector of baseline expression for all genes, sampled from the previous round. During the Gibbs sampling process, this is used as a prior for a new round of gene baseline expression sampling.

Usage

base_line_old

Format

numeric vector


Gene Baseline Expression Sampling Function

Description

Function 'baseline_sampling' estimates a baseline expression for each gene, according to a posterior Gaussian distribution.

Usage

baseline_sampling(Y, C, A, X, base_line_old, C_prior, sigma_noise, sigma_A,
  sigma_baseline, sigma_X)

Arguments

Y

gene expression data matrix

C

sampled TF-gene binding network

A

sampled regulatory strength matrix

X

sampled transcription factor activity matrix

base_line_old

prior gene expression baseline activity

C_prior

prior TF-gene binding network

sigma_noise

variance of gene expression fitting residuals

sigma_A

variance of regulatory strength

sigma_baseline

variance of gene expression baseline activity

sigma_X

variance of transcription factor activity
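
A per-gene baseline draw from a Gaussian posterior might look like the sketch below. It is an illustration under stated assumptions, not the package's code: the helper sample_baseline and the residual matrix R are hypothetical, and centring the prior on a supplied prior_mean (for example the previous round's baseline) is only one possible reading of how base_line_old is used.

```r
# Hypothetical helper (not part of BICORN): draw one baseline per gene.
# R is a genes x samples matrix of expression minus the regulatory term.
# Assumed prior: baseline ~ N(prior_mean, sigma_baseline).
sample_baseline <- function(R, prior_mean, sigma_noise, sigma_baseline) {
  M <- ncol(R)  # number of samples
  post_var  <- 1 / (M / sigma_noise + 1 / sigma_baseline)
  post_mean <- post_var * (rowSums(R) / sigma_noise + prior_mean / sigma_baseline)
  rnorm(nrow(R), mean = post_mean, sd = sqrt(post_var))
}

set.seed(2)
# Three genes with true baselines 1, 5 and -2, observed in 50 samples.
R <- matrix(rnorm(3 * 50, mean = c(1, 5, -2), sd = 0.1), nrow = 3)
b <- sample_baseline(R, prior_mean = rep(0, 3),
                     sigma_noise = 0.01, sigma_baseline = 1)
```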


Inverse-gamma distribution hyper-parameter beta

Description

Hyper-parameter beta of inverse-gamma distribution.

Usage

beta

Format

scalar


BICORN Algorithm Function

Description

Function 'BICORN' infers a posterior module-gene regulatory network by iteratively sampling regulatory strength, transcription factor activity and several key model parameters.

Usage

BICORN(BICORN_input = NULL, L = 100, output_threshold = 10)

Arguments

BICORN_input

this list structure contains TF symbols, gene symbols and candidate modules

L

total rounds of Gibbs Sampling.

output_threshold

number of rounds after which we start to record results.

Examples

# load in the sample data input
data("sample.input")

# Data initialization (integrate prior binding network and gene expression data)
BICORN_input <- data_integration(Binding_matrix = Binding_matrix, Binding_TFs = Binding_TFs,
  Binding_genes = Binding_genes, Exp_data = Exp_data, Exp_genes = Exp_genes,
  Minimum_gene_per_module_regulate = 2)

# Infer cis-regulatory modules (TF combinations) and their target genes
BICORN_output <- BICORN(BICORN_input, L = 2, output_threshold = 1)
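
To illustrate how L and output_threshold interact, the toy sampler below runs L rounds and records draws only once the round number exceeds output_threshold. It is a generic Gibbs sampler for a bivariate normal, not BICORN itself; the function toy_gibbs and the parameter rho are made up for this sketch, and the exact recording rule inside the package is assumed rather than documented here.

```r
# Toy Gibbs sampler (not BICORN): alternate draws from the two full
# conditionals of a standard bivariate normal with correlation rho.
toy_gibbs <- function(L = 100, output_threshold = 10, rho = 0.8) {
  x <- 0
  y <- 0
  kept <- matrix(NA_real_, nrow = L - output_threshold, ncol = 2)
  for (round in seq_len(L)) {
    x <- rnorm(1, mean = rho * y, sd = sqrt(1 - rho^2))
    y <- rnorm(1, mean = rho * x, sd = sqrt(1 - rho^2))
    # Early rounds are discarded; later rounds are recorded.
    if (round > output_threshold) {
      kept[round - output_threshold, ] <- c(x, y)
    }
  }
  kept
}

set.seed(3)
draws <- toy_gibbs(L = 100, output_threshold = 10)
```

With these settings the sampler keeps L - output_threshold = 90 rounds, which is why the package example above uses very small values only to keep the run time short.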

Genes in the prior binding network

Description

A vector of official gene symbols in the prior binary binding network.

Usage

Binding_genes

Format

character vector


Prior TF-gene binding network

Description

A prior binary TF-gene regulatory network with each unit either 1 (binding) or 0 (non-binding).

Usage

Binding_matrix

Format

numeric matrix


TFs in the prior binding network

Description

A list of transcription factors in the prior binding network.

Usage

Binding_TFs

Format

character vector


TF-gene binding network

Description

A matrix representing the TF-gene binding network, with each unit either 1 (binding) or 0 (non-binding).

Usage

C

Format

numeric matrix


TF-gene binding network sampled from the previous round

Description

A matrix of TF-gene binding network sampled from the previous round, with each unit either 1 (binding) or 0 (non-binding). During the Gibbs sampling process, this is used as a prior for a new round of binding network sampling.

Usage

C_old

Format

numeric matrix


Prior TF-gene binding network

Description

A matrix of prior TF-gene binding events, with each unit either 1 (binding) or 0 (non-binding). Such a prior network can be obtained from a TF-gene binding database, motif searching, ChIP-seq peaks or ATAC-seq peaks.

Usage

C_prior

Format

numeric matrix


cis-Regulatory Module Sampling Function

Description

Function 'C_sampling_cluster' samples a candidate cis-regulatory module for each gene, according to a discrete posterior probability distribution.

Usage

C_sampling_cluster(Y, C_old, A_old, X_old, base_line_old, C_prior, sigma_noise,
  sigma_A, sigma_baseline, sigma_X, BICORN_input)

Arguments

Y

gene expression data matrix

C_old

TF-gene binding network sampled from the previous round

A_old

regulatory strength matrix sampled from the previous round

X_old

transcription factor activity matrix sampled from the previous round

base_line_old

gene expression baseline activity sampled from the previous round

C_prior

prior TF-gene binding network

sigma_noise

variance of gene expression fitting residuals

sigma_A

variance of regulatory strength

sigma_baseline

variance of gene expression baseline activity

sigma_X

variance of transcription factor activity

BICORN_input

this list structure contains TF symbols, gene symbols and candidate modules
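
Sampling one candidate module from a discrete posterior can be sketched as follows. The helper sample_module and the log-posterior values are hypothetical; how the package scores each candidate module is not described in this manual, so the scores are simply taken as given.

```r
# Hypothetical helper (not part of BICORN): pick one candidate module for a
# gene given unnormalized log posterior scores, one per candidate.
sample_module <- function(log_post) {
  p <- exp(log_post - max(log_post))  # subtract the max to avoid underflow
  p <- p / sum(p)                     # normalize to a probability vector
  list(prob = p, pick = sample(seq_along(p), size = 1, prob = p))
}

set.seed(4)
log_post <- c(-1050, -1002, -1000)    # three candidate modules
res <- sample_module(log_post)
```

Working on the log scale matters because likelihoods over many samples are often far too small to exponentiate directly.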


Data Initialization for BICORN

Description

Function 'data_integration' integrates the prior TF-gene binding network with gene expression data. It removes any genes that lack either TF bindings or gene expression and identifies a list of candidate cis-regulatory modules.

Usage

data_integration(Binding_matrix = NULL, Binding_TFs = NULL,
  Binding_genes = NULL, Exp_data, Exp_genes = NULL,
  Minimum_gene_per_module_regulate = 2)

Arguments

Binding_matrix

loaded prior binding network

Binding_TFs

loaded transcription factors

Binding_genes

loaded genes in the prior binding network

Exp_data

loaded properly normalized gene expression data

Exp_genes

loaded genes in the gene expression data

Minimum_gene_per_module_regulate

the minimum number of genes regulated by each module, used for candidate module filtering.

Examples

# load in the sample data input
data("sample.input")

# Data initialization (integrate prior binding network and gene expression data)
BICORN_input <- data_integration(Binding_matrix = Binding_matrix, Binding_TFs = Binding_TFs,
  Binding_genes = Binding_genes, Exp_data = Exp_data, Exp_genes = Exp_genes,
  Minimum_gene_per_module_regulate = 2)
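
The gene matching and module filtering described above can be approximated in base R as follows. This is a sketch on made-up toy inputs, not the package's code: it assumes the binding matrix has genes as rows and TFs as columns, and it treats each distinct TF combination observed on a gene as a candidate module, which is one plausible reading of the description.

```r
# Toy inputs, invented for illustration only.
Binding_TFs   <- c("TF1", "TF2", "TF3")
Binding_genes <- c("g1", "g2", "g3", "g4", "g5")
Binding_matrix <- matrix(c(1, 1, 0,
                           1, 1, 0,
                           0, 1, 1,
                           0, 0, 0,
                           1, 0, 0),
                         nrow = 5, byrow = TRUE,
                         dimnames = list(Binding_genes, Binding_TFs))
Exp_genes <- c("g1", "g2", "g3", "g4", "g6")

# Keep genes present in both inputs that have at least one TF binding.
common <- intersect(Binding_genes, Exp_genes)
common <- common[rowSums(Binding_matrix[common, , drop = FALSE]) > 0]

# Encode each gene's TF combination as a string such as "110", then keep
# combinations found on at least `min_genes` genes.
min_genes <- 2
pattern <- apply(Binding_matrix[common, , drop = FALSE], 1, paste, collapse = "")
counts <- table(pattern)
candidate_modules <- names(counts)[counts >= min_genes]
```

In this toy case g4 is dropped for having no bindings, g5 and g6 are dropped for appearing in only one input, and only the TF1 + TF2 combination reaches the two-gene minimum.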

Gene expression data

Description

A matrix of normalized gene expression data with genes as rows and samples as columns. The gene expression data can be either time-course data measured at multiple time points or steady-state data generated from at least two different conditions.

Usage

Exp_data

Format

numeric matrix


Genes in the expression data

Description

A list of official gene symbols in the gene expression data set.

Usage

Exp_genes

Format

character vector


Regulation strength variance

Description

Variance of regulation strength matrix A.

Usage

sigma_A

Format

scalar


Variance of baseline gene expression

Description

Variance of baseline gene expression.

Usage

sigma_baseline

Format

scalar


Variance of gene expression fitting residuals

Description

Variance of gene expression fitting residuals.

Usage

sigma_noise

Format

scalar


Transcription factor activity variance

Description

Variance of transcription factor activity matrix X.

Usage

sigma_X

Format

scalar


Fitting Residual Variance Sampling Function

Description

Function 'sigmanoise_sampling' estimates the variance of overall gene expression fitting residuals, according to an inverse-gamma distribution.

Usage

sigmanoise_sampling(Y, C, A, X, base_line, C_prior, sigma_A, sigma_baseline,
  sigma_X, alpha, beta)

Arguments

Y

gene expression data matrix

C

sampled TF-gene binding network

A

sampled regulatory strength matrix

X

sampled transcription factor activity matrix

base_line

sampled gene expression baseline activity

C_prior

prior TF-gene binding network

sigma_A

variance of regulatory strength

sigma_baseline

variance of gene expression baseline activity

sigma_X

variance of transcription factor activity

alpha

hyper-parameter for inverse-gamma distribution

beta

hyper-parameter for inverse-gamma distribution
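
An inverse-gamma draw for a noise variance can be sketched with the standard conjugate update for Gaussian residuals. The helper sample_noise_variance is hypothetical, and treating alpha as a shape and beta as a scale is an assumption; the manual does not state the parameterization the package uses.

```r
# Hypothetical helper (not part of BICORN): conjugate update for the variance
# of zero-mean Gaussian residuals under an inverse-gamma(alpha, beta) prior.
sample_noise_variance <- function(residuals, alpha, beta) {
  shape <- alpha + length(residuals) / 2
  rate  <- beta + sum(residuals^2) / 2
  # Base R has no inverse-gamma generator; take the reciprocal of a gamma draw.
  1 / rgamma(1, shape = shape, rate = rate)
}

set.seed(5)
residuals <- rnorm(2000, sd = 0.5)  # true variance is 0.25
s2 <- sample_noise_variance(residuals, alpha = 1, beta = 1)
```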


Transcription factor activity matrix

Description

A matrix of hidden transcription factor activity estimated from gene expression data, with transcription factors as rows and samples as columns.

Usage

X

Format

numeric matrix


Transcription factor activity matrix sampled from the previous round

Description

A matrix of hidden transcription factor activity estimated from gene expression data, with transcription factors as rows and samples as columns, sampled from the previous round. During the Gibbs sampling process, this is used as a prior for a new round of transcription factor activity sampling.

Usage

X_old

Format

numeric matrix


Transcription Factor Activity Sampling Function

Description

Function 'X_sampling' estimates the hidden activities of each transcription factor, according to a posterior Gaussian random process.

Usage

X_sampling(Y, C, A, X_old, base_line, C_prior, sigma_noise, sigma_A,
  sigma_baseline, sigma_X)

Arguments

Y

gene expression data matrix

C

sampled TF-gene binding network

A

sampled regulatory strength matrix

X_old

sampled transcription factor activity matrix from the previous round

base_line

sampled gene expression baseline activity

C_prior

prior TF-gene binding network

sigma_noise

variance of gene expression fitting residuals

sigma_A

variance of regulatory strength

sigma_baseline

variance of gene expression baseline activity

sigma_X

variance of transcription factor activity


Gene expression data used for module inference

Description

A matrix of normalized gene expression for the genes common to the prior binding input and the gene expression input, with genes as rows and samples as columns. Y is the matrix used for cis-regulatory module inference.

Usage

Y

Format

numeric matrix
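
The matrix shapes documented in this manual fit together as shown in the simulation below. The linear form Y = (A * C) %*% X + base_line + noise is an assumption suggested by the argument lists; the manual does not state the model explicitly, and all values here are simulated.

```r
set.seed(6)
n_genes <- 4
n_tfs <- 2
n_samples <- 5

C <- matrix(c(1, 0,
              1, 1,
              0, 1,
              0, 0), nrow = n_genes, byrow = TRUE)    # binding network, genes x TFs
A <- matrix(rnorm(n_genes * n_tfs), nrow = n_genes)   # regulation strength, genes x TFs
X <- matrix(rnorm(n_tfs * n_samples), nrow = n_tfs)   # TF activity, TFs x samples
base_line <- rnorm(n_genes)                           # one baseline per gene
sigma_noise <- 0.01                                   # a variance, so sd = 0.1
noise <- matrix(rnorm(n_genes * n_samples, sd = sqrt(sigma_noise)),
                nrow = n_genes)

# Element-wise A * C zeroes out strengths of non-binding pairs; base_line is
# recycled down the columns, so gene i receives base_line[i] in every sample.
Y <- (A * C) %*% X + base_line + noise
```

The fourth gene has no bound TF, so its simulated expression is just its baseline plus noise.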