Title: | Integrative Inference of De Novo Cis-Regulatory Modules |
---|---|
Description: | Prior transcription factor binding knowledge and target gene expression data are integrated in a Bayesian framework for functional cis-regulatory module inference. Using Gibbs sampling, we iteratively estimate transcription factor associations for each gene, regulation strength for each binding event and the hidden activity for each transcription factor. |
Authors: | Xi Chen [aut, cre], Jianhua Xuan [aut] |
Maintainer: | Xi Chen <[email protected]> |
License: | GPL-2 |
Version: | 0.1.0 |
Built: | 2024-12-01 08:35:53 UTC |
Source: | CRAN |
A matrix of TF-gene regulation strength with genes as rows and TFs as columns.
A
numeric matrix
A matrix of TF-gene regulation strength with genes as rows and TFs as columns, sampled from the previous round. During the Gibbs sampling process, this matrix is used as the prior for a new round of regulation strength sampling.
A_old
numeric matrix
Function 'A_sampling' estimates a regulation strength for each sampled binding event in C, according to a posterior Gaussian distribution.
A_sampling(Y, C, A_old, X, base_line, C_prior, sigma_noise, sigma_A, sigma_baseline, sigma_X)
Y | gene expression data matrix |
C | sampled TF-gene binding network |
A_old | regulatory strength sampled from the previous round, used as a prior in the current function |
X | sampled transcription factor activity matrix |
base_line | sampled gene expression baseline activity |
C_prior | prior TF-gene binding network |
sigma_noise | variance of gene expression fitting residuals |
sigma_A | variance of regulatory strength |
sigma_baseline | variance of gene expression baseline activity |
sigma_X | variance of transcription factor activity |
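The Gaussian posterior mentioned above can be illustrated for a single binding event. The following is a minimal sketch in base R, not the package's implementation: it assumes a zero-mean Gaussian prior with variance sigma_A on the strength and Gaussian residual noise with variance sigma_noise. The helper name sample_strength and the toy data are hypothetical.

```r
# Hypothetical helper, not part of the BICORN package.
# r: expression of one gene after removing baseline and other TFs (length = samples)
# x: activity of the TF of interest across the same samples
sample_strength <- function(r, x, sigma_noise, sigma_A) {
  # Conjugate Gaussian update: precision adds, mean is precision-weighted
  post_prec <- 1 / sigma_A + sum(x^2) / sigma_noise
  post_mean <- (sum(x * r) / sigma_noise) / post_prec
  rnorm(1, mean = post_mean, sd = sqrt(1 / post_prec))
}

set.seed(1)
x <- rnorm(50)                      # toy TF activity over 50 samples
r <- 2 * x + rnorm(50, sd = 0.1)    # toy residual expression, true strength 2
a_draw <- sample_strength(r, x, sigma_noise = 0.01, sigma_A = 1)
```

With many samples and low noise, the draw concentrates near the strength used to simulate the data; a smaller sigma_A pulls it toward zero.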
Hyper-parameter alpha of inverse-gamma distribution.
alpha
scalar
A vector of baseline expression for all genes.
base_line
numeric vector
A vector of baseline expression for all genes, sampled from the previous round. During the Gibbs sampling process, this is used as a prior for a new round of gene baseline expression sampling.
base_line_old
numeric vector
Function 'baseline_sampling' estimates a baseline expression for each gene, according to a posterior Gaussian distribution.
baseline_sampling(Y, C, A, X, base_line_old, C_prior, sigma_noise, sigma_A, sigma_baseline, sigma_X)
Y | gene expression data matrix |
C | sampled TF-gene binding network |
A | sampled regulatory strength matrix |
X | sampled transcription factor activity matrix |
base_line_old | prior gene expression baseline activity |
C_prior | prior TF-gene binding network |
sigma_noise | variance of gene expression fitting residuals |
sigma_A | variance of regulatory strength |
sigma_baseline | variance of gene expression baseline activity |
sigma_X | variance of transcription factor activity |
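A per-gene baseline draw from a Gaussian posterior can be sketched as follows. This is an illustration under stated assumptions rather than the package's code: expression is assumed to follow Y = (A * C) %*% X + baseline + noise, and the baseline prior is assumed to be zero-mean Gaussian with variance sigma_baseline. The helper sample_baseline and the toy matrices are hypothetical.

```r
# Hypothetical helper, not part of the BICORN package.
sample_baseline <- function(Y, C, A, X, sigma_noise, sigma_baseline) {
  resid <- Y - (A * C) %*% X                      # genes x samples, TF effect removed
  post_prec <- 1 / sigma_baseline + ncol(Y) / sigma_noise
  post_mean <- (rowSums(resid) / sigma_noise) / post_prec
  rnorm(nrow(Y), mean = post_mean, sd = sqrt(1 / post_prec))
}

set.seed(2)
G <- 3; n_tf <- 2; M <- 40
C <- matrix(c(1, 0, 1, 1, 1, 0), nrow = G)        # genes x TFs, binary
A <- matrix(rnorm(G * n_tf), nrow = G)
X <- matrix(rnorm(n_tf * M), nrow = n_tf)
true_base <- c(1, -1, 0.5)
# Adding a length-G vector to a G x M matrix adds true_base[g] to every entry of row g
Y <- (A * C) %*% X + true_base + matrix(rnorm(G * M, sd = 0.1), nrow = G)
b <- sample_baseline(Y, C, A, X, sigma_noise = 0.01, sigma_baseline = 1)
```

In this toy setting the sampled baselines land close to the values used in the simulation, because 40 samples per gene dominate the weak prior.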
Hyper-parameter beta of inverse-gamma distribution.
beta
scalar
Function 'BICORN' infers a posterior module-gene regulatory network by iteratively sampling regulatory strength, transcription factor activity and several key model parameters.
BICORN(BICORN_input = NULL, L = 100, output_threshold = 10)
BICORN_input | this list structure contains TF symbols, gene symbols and candidate modules |
L | total number of rounds of Gibbs sampling |
output_threshold | number of rounds after which we start to record results |
library(BICORN)

# Load the sample input data
data("sample.input")

# Data initialization (integrate prior binding network and gene expression data)
BICORN_input <- data_integration(Binding_matrix = Binding_matrix, Binding_TFs = Binding_TFs,
    Binding_genes = Binding_genes, Exp_data = Exp_data, Exp_genes = Exp_genes,
    Minimum_gene_per_module_regulate = 2)

# Infer cis-regulatory modules (TF combinations) and their target genes
BICORN_output <- BICORN(BICORN_input, L = 2, output_threshold = 1)
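The roles of L and output_threshold can be illustrated with a toy Gibbs sampler. The sketch below samples a standard bivariate normal, which is not BICORN's model, and assumes that draws are recorded from round output_threshold + 1 onward. The function toy_gibbs and the parameter rho are hypothetical.

```r
# Toy Gibbs sampler for a bivariate normal with correlation rho.
# It only mirrors the L / output_threshold bookkeeping, not BICORN's model.
toy_gibbs <- function(L = 100, output_threshold = 10, rho = 0.5) {
  stopifnot(output_threshold < L)
  x <- 0
  y <- 0
  kept <- matrix(NA_real_, nrow = L - output_threshold, ncol = 2)
  for (round in seq_len(L)) {
    # Each variable is drawn from its conditional given the other
    x <- rnorm(1, mean = rho * y, sd = sqrt(1 - rho^2))
    y <- rnorm(1, mean = rho * x, sd = sqrt(1 - rho^2))
    if (round > output_threshold) {
      kept[round - output_threshold, ] <- c(x, y)   # record after burn-in
    }
  }
  kept
}

set.seed(1)
draws <- toy_gibbs(L = 100, output_threshold = 10)
```

Early rounds depend on the starting values, so discarding them before summarizing the recorded draws is the usual motivation for a threshold of this kind.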
A list of official gene symbols in the binary binding network.
Binding_genes
character vector
A prior binary TF-gene regulatory network with each unit either 1 (binding) or 0 (non-binding).
Binding_matrix
numeric matrix
A list of transcription factors in the prior binding network.
Binding_TFs
character vector
A matrix of TF-gene regulatory network with each unit either 1 (binding) or 0 (non-binding).
C
numeric matrix
A matrix of TF-gene binding network sampled from the previous round, with each unit either 1 (binding) or 0 (non-binding). During the Gibbs sampling process, this is used as a prior for a new round of binding network sampling.
C_old
numeric matrix
A matrix of prior TF-gene binding events, with each unit either 1 (binding) or 0 (non-binding). Such a prior network can be obtained from TF-gene binding database, motif searching, ChIP-seq peaks or ATAC-seq peaks.
C_prior
numeric matrix
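A binary prior network of this kind can be assembled from a table of TF-gene pairs, for example pairs derived from peak annotation. The sketch below is illustrative only: the pairs data frame is made up, and the genes-as-rows, TFs-as-columns orientation is an assumption chosen to match the regulation strength matrix A.

```r
# Hypothetical TF-gene pairs; in practice these might come from a binding
# database, motif search, or annotated ChIP-seq / ATAC-seq peaks.
pairs <- data.frame(
  TF   = c("TF1", "TF1", "TF2", "TF3"),
  gene = c("g1",  "g2",  "g2",  "g3"),
  stringsAsFactors = FALSE
)

Binding_TFs   <- sort(unique(pairs$TF))
Binding_genes <- sort(unique(pairs$gene))

# Start from all zeros (non-binding), then mark each observed pair with 1
Binding_matrix <- matrix(0,
  nrow = length(Binding_genes), ncol = length(Binding_TFs),
  dimnames = list(Binding_genes, Binding_TFs)
)
Binding_matrix[cbind(pairs$gene, pairs$TF)] <- 1
```

Indexing with a two-column character matrix sets one entry per pair by row and column name, so duplicated pairs in the input simply set the same entry again.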
Function 'C_sampling_cluster' samples a candidate cis-regulatory module for each gene, according to a discrete posterior probability distribution.
C_sampling_cluster(Y, C_old, A_old, X_old, base_line_old, C_prior, sigma_noise, sigma_A, sigma_baseline, sigma_X, BICORN_input)
Y | gene expression data matrix |
C_old | TF-gene binding network sampled from the previous round |
A_old | regulatory strength matrix sampled from the previous round |
X_old | transcription factor activity matrix sampled from the previous round |
base_line_old | gene expression baseline activity sampled from the previous round |
C_prior | prior TF-gene binding network |
sigma_noise | variance of gene expression fitting residuals |
sigma_A | variance of regulatory strength |
sigma_baseline | variance of gene expression baseline activity |
sigma_X | variance of transcription factor activity |
BICORN_input | this list structure contains TF symbols, gene symbols and candidate modules |
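Sampling a module from a discrete posterior can be sketched for one gene as follows. This is a simplified illustration, not the package's implementation: it assumes a uniform prior over candidate modules and a Gaussian likelihood for the gene's expression, and the helper sample_module with its per-gene arguments is hypothetical.

```r
# Hypothetical helper, not part of the BICORN package.
# y: expression of one gene; X: TFs x samples activity; a: strengths per TF
# candidates: one row per candidate module, one 0/1 column per TF
sample_module <- function(y, X, a, base, candidates, sigma_noise) {
  log_post <- apply(candidates, 1, function(c_row) {
    fitted <- base + as.vector((a * c_row) %*% X)
    -sum((y - fitted)^2) / (2 * sigma_noise)      # Gaussian log-likelihood up to a constant
  })
  prob <- exp(log_post - max(log_post))           # subtract max for numerical stability
  prob <- prob / sum(prob)
  list(prob = prob, pick = sample(nrow(candidates), 1, prob = prob))
}

set.seed(5)
candidates <- rbind(c(1, 1, 0), c(0, 1, 1), c(1, 0, 1))
a <- c(1.5, -1, 2)
X <- matrix(rnorm(3 * 30), nrow = 3)
# Toy expression generated from the second candidate module
y <- 0.5 + as.vector((a * candidates[2, ]) %*% X) + rnorm(30, sd = 0.1)
res <- sample_module(y, X, a, base = 0.5, candidates, sigma_noise = 0.01)
```

Working with log probabilities and subtracting the maximum before exponentiating avoids underflow when one candidate fits far better than the others.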
Function 'data_integration' integrates the prior TF-gene binding network and gene expression data together. It will remove any genes missing either TF bindings or gene expression and identify a list of candidate cis-regulatory modules.
data_integration(Binding_matrix = NULL, Binding_TFs = NULL, Binding_genes = NULL, Exp_data, Exp_genes = NULL, Minimum_gene_per_module_regulate = 2)
Binding_matrix | loaded prior binding network |
Binding_TFs | loaded transcription factors |
Binding_genes | loaded genes in the prior binding network |
Exp_data | loaded properly normalized gene expression data |
Exp_genes | loaded genes in the gene expression data |
Minimum_gene_per_module_regulate | the minimum number of genes regulated by each module, used for candidate module filtering |
library(BICORN)

# Load the sample input data
data("sample.input")

# Data initialization (integrate prior binding network and gene expression data)
BICORN_input <- data_integration(Binding_matrix = Binding_matrix, Binding_TFs = Binding_TFs,
    Binding_genes = Binding_genes, Exp_data = Exp_data, Exp_genes = Exp_genes,
    Minimum_gene_per_module_regulate = 2)
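The integration step described above can be approximated in base R to show the idea. The sketch below is not the package's implementation: it assumes that a candidate module is a distinct TF combination shared by at least min_genes genes, and the helper integrate_sketch, its return structure, and the toy inputs are all hypothetical.

```r
# Hypothetical helper, not part of the BICORN package.
integrate_sketch <- function(Binding_matrix, Binding_TFs, Binding_genes,
                             Exp_data, Exp_genes, min_genes = 2) {
  # Keep genes present in both the binding network and the expression data
  common <- intersect(Binding_genes, Exp_genes)
  C_prior <- Binding_matrix[match(common, Binding_genes), , drop = FALSE]
  Y <- Exp_data[match(common, Exp_genes), , drop = FALSE]
  dimnames(C_prior) <- list(common, Binding_TFs)
  rownames(Y) <- common
  # Drop genes without any TF binding
  keep <- rowSums(C_prior) > 0
  C_prior <- C_prior[keep, , drop = FALSE]
  Y <- Y[keep, , drop = FALSE]
  # Encode each gene's TF combination as a string and count how many genes share it
  key <- apply(C_prior, 1, paste, collapse = "")
  counts <- table(key)
  modules <- names(counts)[counts >= min_genes]
  list(Y = Y, C_prior = C_prior, modules = modules)
}

set.seed(6)
Binding_TFs <- c("TF1", "TF2", "TF3")
Binding_genes <- c("g1", "g2", "g3", "g4", "g5")
Binding_matrix <- rbind(c(1, 1, 0), c(1, 1, 0), c(0, 0, 1), c(0, 0, 0), c(1, 1, 0))
Exp_genes <- c("g2", "g1", "g3", "g4", "g6")
Exp_data <- matrix(rnorm(5 * 4), nrow = 5)
res <- integrate_sketch(Binding_matrix, Binding_TFs, Binding_genes, Exp_data, Exp_genes)
```

In this toy input, g5 has no expression data, g6 has no binding data, and g4 has no bound TF, so three genes remain and only the TF1 + TF2 combination is shared by two of them.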
A matrix of normalized gene expression data with genes as rows and samples as columns. The gene expression data can be either time-course data measured at multiple time points or steady-state data generated from at least two different conditions.
Exp_data
numeric matrix
A list of official gene symbols in the gene expression data set.
Exp_genes
character vector
Variance of regulation strength matrix A.
sigma_A
scalar
Variance of baseline gene expression.
sigma_baseline
scalar
Variance of gene expression fitting residuals.
sigma_noise
scalar
Variance of transcription factor activity matrix X.
sigma_X
scalar
Function 'sigmanoise_sampling' estimates the variance of overall gene expression fitting residuals, according to an inverse-gamma distribution.
sigmanoise_sampling(Y, C, A, X, base_line, C_prior, sigma_A, sigma_baseline, sigma_X, alpha, beta)
Y | gene expression data matrix |
C | sampled TF-gene binding network |
A | sampled regulatory strength matrix |
X | sampled transcription factor activity matrix |
base_line | sampled gene expression baseline activity |
C_prior | prior TF-gene binding network |
sigma_A | variance of regulatory strength |
sigma_baseline | variance of gene expression baseline activity |
sigma_X | variance of transcription factor activity |
alpha | hyper-parameter for inverse-gamma distribution |
beta | hyper-parameter for inverse-gamma distribution |
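An inverse-gamma draw for the residual variance can be sketched in base R as the reciprocal of a gamma draw. This is a minimal illustration, assuming an inverse-gamma(alpha, beta) prior and the linear model Y = (A * C) %*% X + base_line + noise; it is not the package's code, and the helper sample_sigma_noise and the toy data are hypothetical.

```r
# Hypothetical helper, not part of the BICORN package.
sample_sigma_noise <- function(Y, C, A, X, base_line, alpha, beta) {
  resid <- Y - (A * C) %*% X - base_line     # base_line[g] is removed from row g
  shape <- alpha + length(Y) / 2             # prior shape plus half the number of observations
  rate  <- beta + sum(resid^2) / 2           # prior scale plus half the residual sum of squares
  1 / rgamma(1, shape = shape, rate = rate)  # reciprocal of a gamma draw is inverse-gamma
}

set.seed(3)
G <- 20; n_tf <- 3; M <- 50
C <- matrix(rbinom(G * n_tf, 1, 0.5), nrow = G)
A <- matrix(rnorm(G * n_tf), nrow = G)
X <- matrix(rnorm(n_tf * M), nrow = n_tf)
base_line <- rnorm(G)
# Toy data with true residual variance 0.5^2 = 0.25
Y <- (A * C) %*% X + base_line + matrix(rnorm(G * M, sd = 0.5), nrow = G)
s2 <- sample_sigma_noise(Y, C, A, X, base_line, alpha = 1, beta = 1)
```

With 1000 observations the weak prior has little influence, so the draw falls near the variance used in the simulation.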
A matrix of hidden transcription factor activity estimated from gene expression data, with transcription factors as rows and samples as columns.
X
numeric matrix
A matrix of hidden transcription factor activity estimated from gene expression data, with transcription factors as rows and samples as columns, sampled from the previous round. During the Gibbs sampling process, this is used as a prior for a new round of transcription factor activity sampling.
X_old
numeric matrix
Function 'X_sampling' estimates the hidden activities of each transcription factor, according to a posterior Gaussian random process.
X_sampling(Y, C, A, X_old, base_line, C_prior, sigma_noise, sigma_A, sigma_baseline, sigma_X)
Y | gene expression data matrix |
C | sampled TF-gene binding network |
A | sampled regulatory strength matrix |
X_old | sampled transcription factor activity matrix from the previous round |
base_line | sampled gene expression baseline activity |
C_prior | prior TF-gene binding network |
sigma_noise | variance of gene expression fitting residuals |
sigma_A | variance of regulatory strength |
sigma_baseline | variance of gene expression baseline activity |
sigma_X | variance of transcription factor activity |
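A Gaussian posterior draw for hidden TF activity can be sketched sample by sample. The code below is a simplified illustration and not the package's implementation: it assumes independent zero-mean Gaussian priors with variance sigma_X on each activity value (ignoring any dependence on X_old) and the linear model Y = (A * C) %*% X + base_line + noise. The helper sample_activity and the toy data are hypothetical.

```r
# Hypothetical helper, not part of the BICORN package.
sample_activity <- function(Y, C, A, base_line, sigma_noise, sigma_X) {
  W <- A * C                                           # effective genes x TFs weights
  n_tf <- ncol(W)
  # Posterior precision is shared by all samples (columns of Y)
  prec <- crossprod(W) / sigma_noise + diag(n_tf) / sigma_X
  R <- chol(prec)                                      # prec = t(R) %*% R
  post_mean <- chol2inv(R) %*% crossprod(W, Y - base_line) / sigma_noise
  z <- matrix(rnorm(n_tf * ncol(Y)), nrow = n_tf)
  post_mean + backsolve(R, z)                          # backsolve(R, z) has covariance solve(prec)
}

set.seed(4)
G <- 30; n_tf <- 2; M <- 10
C <- cbind(rep(c(1, 0, 1), 10), rep(c(1, 1, 0), 10))   # genes x TFs, binary
A <- matrix(rnorm(G * n_tf), nrow = G)
X_true <- matrix(rnorm(n_tf * M), nrow = n_tf)
base_line <- rnorm(G)
Y <- (A * C) %*% X_true + base_line + matrix(rnorm(G * M, sd = 0.1), nrow = G)
X_draw <- sample_activity(Y, C, A, base_line, sigma_noise = 0.01, sigma_X = 1)
```

Using the Cholesky factor of the precision matrix avoids forming a separate covariance factor and gives all samples' draws in one triangular solve.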
A matrix of normalized gene expression for the genes common to the prior binding input and the gene expression input, with genes as rows and samples as columns. Y is the matrix used for cis-regulatory module inference.
Y
numeric matrix