Title: | Pseudo-Value Regression Approach for Network Analysis (PRANA) |
---|---|
Description: | A novel pseudo-value regression approach for the differential co-expression network analysis in expression data, which can incorporate additional clinical variables in the model. This is a direct regression modeling for the differential network analysis, and it is therefore computationally amenable for the most users. The full methodological details can be found in Ahn S et al (2023) <doi:10.1186/s12859-022-05123-w>. |
Authors: | Seungjun Ahn [cre, aut, trl] , Somnath Datta [ctb, ths] |
Maintainer: | Seungjun Ahn <[email protected]> |
License: | GPL-3 |
Version: | 1.0.6 |
Built: | 2024-11-22 06:26:12 UTC |
Source: | CRAN |
A function to retrieve a table with adjusted p-values after running PRANA. The table includes all variables that were included in the pseudo-value regression model.
adjpval(PRANAres)
adjpval(PRANAres)
PRANAres |
An object called after running PRANA. |
A table that includes adjusted p-values for all variables included in the fitted model.
data(combinedCOPDdat_RGO) # A complete data containing expression and clinical data. # A gene expression data part of the downloaded data. rnaseqdat = combinedCOPDdat_RGO[ , 8:ncol(combinedCOPDdat_RGO)] rnaseqdat = as.data.frame(apply(rnaseqdat, 2, as.numeric)) # A clinical data with additional covariates sorted by current smoking groups: # The first column is ID, so do not include. phenodat = combinedCOPDdat_RGO[order(combinedCOPDdat_RGO$currentsmoking), 2:7] # Indices of non-current smoker (namely Group A) index_grpA = which(combinedCOPDdat_RGO$currentsmoking == 0) # Indices of current smoker (namely Group B) index_grpB = which(combinedCOPDdat_RGO$currentsmoking == 1) # Use PRANA() function to perform the pseudo-value regression analysis. PRANAres <- PRANA(RNASeqdat = rnaseqdat, clindat = phenodat, groupA = index_grpA, groupB = index_grpB) # Now, we want to keep the table with adjusted p-values only. adjpval(PRANAres)
data(combinedCOPDdat_RGO) # A complete data containing expression and clinical data. # A gene expression data part of the downloaded data. rnaseqdat = combinedCOPDdat_RGO[ , 8:ncol(combinedCOPDdat_RGO)] rnaseqdat = as.data.frame(apply(rnaseqdat, 2, as.numeric)) # A clinical data with additional covariates sorted by current smoking groups: # The first column is ID, so do not include. phenodat = combinedCOPDdat_RGO[order(combinedCOPDdat_RGO$currentsmoking), 2:7] # Indices of non-current smoker (namely Group A) index_grpA = which(combinedCOPDdat_RGO$currentsmoking == 0) # Indices of current smoker (namely Group B) index_grpB = which(combinedCOPDdat_RGO$currentsmoking == 1) # Use PRANA() function to perform the pseudo-value regression analysis. PRANAres <- PRANA(RNASeqdat = rnaseqdat, clindat = phenodat, groupA = index_grpA, groupB = index_grpB) # Now, we want to keep the table with adjusted p-values only. adjpval(PRANAres)
A function to retrieve a vector of adjusted p-values after running PRANA.
adjpval_specific_var(adjptab, varname)
adjpval_specific_var(adjptab, varname)
adjptab |
A table that includes adjusted p-values for a specific variable. |
varname |
Specify the name of the variable of interest. |
A vector of adjusted p-values for a single variable from the model.
data(combinedCOPDdat_RGO) # A complete data containing expression and clinical data. # A gene expression data part of the downloaded data. rnaseqdat = combinedCOPDdat_RGO[ , 8:ncol(combinedCOPDdat_RGO)] rnaseqdat = as.data.frame(apply(rnaseqdat, 2, as.numeric)) # A clinical data with additional covariates sorted by current smoking groups: # The first column is ID, so do not include. phenodat = combinedCOPDdat_RGO[order(combinedCOPDdat_RGO$currentsmoking), 2:7] # Indices of non-current smoker (namely Group A) index_grpA = which(combinedCOPDdat_RGO$currentsmoking == 0) # Indices of current smoker (namely Group B) index_grpB = which(combinedCOPDdat_RGO$currentsmoking == 1) # Use PRANA() function to perform the pseudo-value regression analysis. PRANAres <- PRANA(RNASeqdat = rnaseqdat, clindat = phenodat, groupA = index_grpA, groupB = index_grpB) # Create an object to keep the table with adjusted p-values using adjpval() function. adjpvaltab <- adjpval(PRANAres) # Retrieve a vector of adjusted p-values for a single variable of interest. adjpval_specific_var(adjptab = adjpvaltab, varname = "currentsmoking")
data(combinedCOPDdat_RGO) # A complete data containing expression and clinical data. # A gene expression data part of the downloaded data. rnaseqdat = combinedCOPDdat_RGO[ , 8:ncol(combinedCOPDdat_RGO)] rnaseqdat = as.data.frame(apply(rnaseqdat, 2, as.numeric)) # A clinical data with additional covariates sorted by current smoking groups: # The first column is ID, so do not include. phenodat = combinedCOPDdat_RGO[order(combinedCOPDdat_RGO$currentsmoking), 2:7] # Indices of non-current smoker (namely Group A) index_grpA = which(combinedCOPDdat_RGO$currentsmoking == 0) # Indices of current smoker (namely Group B) index_grpB = which(combinedCOPDdat_RGO$currentsmoking == 1) # Use PRANA() function to perform the pseudo-value regression analysis. PRANAres <- PRANA(RNASeqdat = rnaseqdat, clindat = phenodat, groupA = index_grpA, groupB = index_grpB) # Create an object to keep the table with adjusted p-values using adjpval() function. adjpvaltab <- adjpval(PRANAres) # Retrieve a vector of adjusted p-values for a single variable of interest. adjpval_specific_var(adjptab = adjpvaltab, varname = "currentsmoking")
A function to retrieve a table with coefficient estimates after running PRANA. The table includes all variables that were included in the pseudo-value regression model.
coeff(PRANAres)
coeff(PRANAres)
PRANAres |
An object called after running PRANA. |
A table that includes coefficient estimates for all variables included in the fitted model.
#' data(combinedCOPDdat_RGO) # A complete data containing expression and clinical data. # A gene expression data part of the downloaded data. rnaseqdat = combinedCOPDdat_RGO[ , 8:ncol(combinedCOPDdat_RGO)] rnaseqdat = as.data.frame(apply(rnaseqdat, 2, as.numeric)) # A clinical data with additional covariates sorted by current smoking groups: # The first column is ID, so do not include. phenodat = combinedCOPDdat_RGO[order(combinedCOPDdat_RGO$currentsmoking), 2:7] # Indices of non-current smoker (namely Group A) index_grpA = which(combinedCOPDdat_RGO$currentsmoking == 0) # Indices of current smoker (namely Group B) index_grpB = which(combinedCOPDdat_RGO$currentsmoking == 1) # Use PRANA() function to perform the pseudo-value regression analysis. # Then, create an object called `res` to call results later. PRANAres <- PRANA(RNASeqdat = rnaseqdat, clindat = phenodat, groupA = index_grpA, groupB = index_grpB) # Now, we want to keep the table with coefficient estimates only. coeff(PRANAres)
#' data(combinedCOPDdat_RGO) # A complete data containing expression and clinical data. # A gene expression data part of the downloaded data. rnaseqdat = combinedCOPDdat_RGO[ , 8:ncol(combinedCOPDdat_RGO)] rnaseqdat = as.data.frame(apply(rnaseqdat, 2, as.numeric)) # A clinical data with additional covariates sorted by current smoking groups: # The first column is ID, so do not include. phenodat = combinedCOPDdat_RGO[order(combinedCOPDdat_RGO$currentsmoking), 2:7] # Indices of non-current smoker (namely Group A) index_grpA = which(combinedCOPDdat_RGO$currentsmoking == 0) # Indices of current smoker (namely Group B) index_grpB = which(combinedCOPDdat_RGO$currentsmoking == 1) # Use PRANA() function to perform the pseudo-value regression analysis. # Then, create an object called `res` to call results later. PRANAres <- PRANA(RNASeqdat = rnaseqdat, clindat = phenodat, groupA = index_grpA, groupB = index_grpB) # Now, we want to keep the table with coefficient estimates only. coeff(PRANAres)
A function to retrieve a vector of coefficient estimates for a specific variable of interest after running PRANA.
coeff_specific_var(coefftab, varname)
coeff_specific_var(coefftab, varname)
coefftab |
A table that includes adjusted p-values for a specific variable. |
varname |
Specify the name of the variable of interest. |
A vector of coefficient estimates for a single variable from the model.
#' data(combinedCOPDdat_RGO) # A complete data containing expression and clinical data. # A gene expression data part of the downloaded data. rnaseqdat = combinedCOPDdat_RGO[ , 8:ncol(combinedCOPDdat_RGO)] rnaseqdat = as.data.frame(apply(rnaseqdat, 2, as.numeric)) # A clinical data with additional covariates sorted by current smoking groups: # The first column is ID, so do not include. phenodat = combinedCOPDdat_RGO[order(combinedCOPDdat_RGO$currentsmoking), 2:7] # Indices of non-current smoker (namely Group A) index_grpA = which(combinedCOPDdat_RGO$currentsmoking == 0) # Indices of current smoker (namely Group B) index_grpB = which(combinedCOPDdat_RGO$currentsmoking == 1) # Use PRANA() function to perform the pseudo-value regression analysis. # Then, create an object called `res` to call results later. PRANAres <- PRANA(RNASeqdat = rnaseqdat, clindat = phenodat, groupA = index_grpA, groupB = index_grpB) # Create an object to keep the table with coefficient estimates using coeff() function. coefftab <- coeff(PRANAres) # Lastly, we use coeff_specific_var function to retrieve # adjusted p-values for a single variable of interest. coeff_specific_var(coefftab = coefftab, varname = "currentsmoking")
#' data(combinedCOPDdat_RGO) # A complete data containing expression and clinical data. # A gene expression data part of the downloaded data. rnaseqdat = combinedCOPDdat_RGO[ , 8:ncol(combinedCOPDdat_RGO)] rnaseqdat = as.data.frame(apply(rnaseqdat, 2, as.numeric)) # A clinical data with additional covariates sorted by current smoking groups: # The first column is ID, so do not include. phenodat = combinedCOPDdat_RGO[order(combinedCOPDdat_RGO$currentsmoking), 2:7] # Indices of non-current smoker (namely Group A) index_grpA = which(combinedCOPDdat_RGO$currentsmoking == 0) # Indices of current smoker (namely Group B) index_grpB = which(combinedCOPDdat_RGO$currentsmoking == 1) # Use PRANA() function to perform the pseudo-value regression analysis. # Then, create an object called `res` to call results later. PRANAres <- PRANA(RNASeqdat = rnaseqdat, clindat = phenodat, groupA = index_grpA, groupB = index_grpB) # Create an object to keep the table with coefficient estimates using coeff() function. coefftab <- coeff(PRANAres) # Lastly, we use coeff_specific_var function to retrieve # adjusted p-values for a single variable of interest. coeff_specific_var(coefftab = coefftab, varname = "currentsmoking")
A subset of the COPDGene study data, obtained from the Gene Expression Omnibus database. This contains expression and clinical covariates data. All of the 28 genes contained in this data are identified as COPD-related by Sakornsakolpat et al. Nat Genet, 2019.
data(combinedCOPDdat_RGO)
data(combinedCOPDdat_RGO)
combinedCOPDdat_RGO
Sakornsakolpat P, Prokopenko D, Lamontagne M, et al. Genetic landscape of chronic obstructive pulmonary disease identifies heterogeneous cell-type and phenotype associations. Nat Genet. 2019;51(3):494–505. (PubMed) Ahn S, Grimes T, Datta S. A pseudo-value regression approach for differential network analysis of co-expression data. BMC Bioinformatics, 2023;24(1):8. (PubMed)
data(combinedCOPDdat_RGO) combinedCOPDdat_RGO
data(combinedCOPDdat_RGO) combinedCOPDdat_RGO
A function to calculate the adjusted p-value for each gene (Datta and Datta, 2005)
EBS(pvo, alpha = 0.05, B = 500, h = 1)
EBS(pvo, alpha = 0.05, B = 500, h = 1)
pvo |
An object with p-values estimated from the user-provided expression data |
alpha |
a level of significance (default is 0.05) |
B |
size of bootstrapping (default is 500) |
h |
a bandwidth (default is 1) |
A vector of adjusted p-values for each gene.
Datta, S and Datta, S (2005). Empirical Bayes screening of many p-values with applications to microarray studies. Bioinformatics, 21(9), 1987-1994
A pseudo-value regression approach for differential network analysis that adjusts for additional covariates (PRANA).
PRANA(RNASeqdat, clindat, groupA, groupB)
PRANA(RNASeqdat, clindat, groupA, groupB)
RNASeqdat |
An RNA-Seq data with subjects in rows and genes in columns. |
clindat |
A data with clinical variables to be included in the regression (e.g., binary group variable indicating current smoking status, continuous age, ...) |
groupA |
Indices of the subjects in the first category (e.g., non-current smoker) of binary group variable. |
groupB |
Indices of the subjects in the second category (e.g., current smoker) of binary group variable. |
A list containing three data frame objects that summarize the results of PRANA. This includes beta coefficients, p-values, and adjusted p-values via the empirical Bayes approach for each predictor variables that are included in the regression model.
Ahn S, Grimes T, Datta S. A pseudo-value regression approach for differential network analysis of co-expression data. BMC Bioinformatics, 2023;24(1):8
data(combinedCOPDdat_RGO) # A complete data containing expression and clinical data. # A gene expression data part of the downloaded data. rnaseqdat = combinedCOPDdat_RGO[ , 8:ncol(combinedCOPDdat_RGO)] rnaseqdat = as.data.frame(apply(rnaseqdat, 2, as.numeric)) # A clinical data with additional covariates sorted by current smoking groups: # The first column is ID, so do not include. phenodat = combinedCOPDdat_RGO[order(combinedCOPDdat_RGO$currentsmoking), 2:7] # Indices of non-current smoker (namely Group A) index_grpA = which(combinedCOPDdat_RGO$currentsmoking == 0) # Indices of current smoker (namely Group B) index_grpB = which(combinedCOPDdat_RGO$currentsmoking == 1) PRANAres <- PRANA(RNASeqdat = rnaseqdat, clindat = phenodat, groupA = index_grpA, groupB = index_grpB)
data(combinedCOPDdat_RGO) # A complete data containing expression and clinical data. # A gene expression data part of the downloaded data. rnaseqdat = combinedCOPDdat_RGO[ , 8:ncol(combinedCOPDdat_RGO)] rnaseqdat = as.data.frame(apply(rnaseqdat, 2, as.numeric)) # A clinical data with additional covariates sorted by current smoking groups: # The first column is ID, so do not include. phenodat = combinedCOPDdat_RGO[order(combinedCOPDdat_RGO$currentsmoking), 2:7] # Indices of non-current smoker (namely Group A) index_grpA = which(combinedCOPDdat_RGO$currentsmoking == 0) # Indices of current smoker (namely Group B) index_grpB = which(combinedCOPDdat_RGO$currentsmoking == 1) PRANAres <- PRANA(RNASeqdat = rnaseqdat, clindat = phenodat, groupA = index_grpA, groupB = index_grpB)
A function to conduct co-expression analysis using ARACNE (Margolin et al. 2006). Uses the implementation from the minet package (Meyer et al. 2008). This function is from dnapath R package, which is archived in May 2024.
run_aracne( x, estimator = "spearman", disc = "none", nbins = NULL, eps = 0, ... )
run_aracne( x, estimator = "spearman", disc = "none", nbins = NULL, eps = 0, ... )
x |
A n by p matrix of gene expression data (n samples and p genes) |
estimator |
Argument is passed into minet::build.mim. |
disc |
Argument is passed into minet::build.mim. |
nbins |
Argument is passed into minet::build.mim. |
eps |
Argument is passed into minet::aracne. |
... |
Additional arguments are ignored. |
A p by p matrix of association scores.
Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006). “ARACNE: An Algorithm for the Reconstruction of Gene Regulatory Networks in a Mammalian Cellular Context.” In BMC Bioinformatics, volume 7(1), S7. BioMed Central.
Meyer PE, Lafitte F, Bontempi G (2008). “minet: A R/Bioconductor Package for Inferring Large Transcriptional Networks using Mutual Information.” BMC Bioinformatics, 9(1), 461.
A function to retrieve the name of genes that are significantly differentially connected (DC). between two biological/clinical states (aka the main binary indicator) with the presence of additional covariate information.
sigDCGnames(adjptab, groupvar, alpha)
sigDCGnames(adjptab, groupvar, alpha)
adjptab |
A table with adjusted p-values for all variables that were included in the pseudo-value regression model. |
groupvar |
Specify the name of binary indicator variable. |
alpha |
A level of significance (e.g. 0.05). |
Names of significantly DC genes (e.g. gene IDs) from PRANA. If you need both adjusted p-values and names, please use sigDCGtab() instead.
#' data(combinedCOPDdat_RGO) # A complete data containing expression and clinical data. # A gene expression data part of the downloaded data. rnaseqdat = combinedCOPDdat_RGO[ , 8:ncol(combinedCOPDdat_RGO)] rnaseqdat = as.data.frame(apply(rnaseqdat, 2, as.numeric)) # A clinical data with additional covariates sorted by current smoking groups: # The first column is ID, so do not include. phenodat = combinedCOPDdat_RGO[order(combinedCOPDdat_RGO$currentsmoking), 2:7] # Indices of non-current smoker (namely Group A) index_grpA = which(combinedCOPDdat_RGO$currentsmoking == 0) # Indices of current smoker (namely Group B) index_grpB = which(combinedCOPDdat_RGO$currentsmoking == 1) # Use PRANA() function to perform the pseudo-value regression analysis. # Then, create an object called PRANA_Results to call results. PRANAres <- PRANA(RNASeqdat = rnaseqdat, clindat = phenodat, groupA = index_grpA, groupB = index_grpB) # Next, we want to call the table with adjusted p-values only. adjptab <- adjpval(PRANAres) # Please specify the name of binary group indicator in sigDCGnames(groupvar = ). sigDCGnames <- sigDCGnames(adjptab = adjptab, groupvar = "currentsmoking", alpha = 0.05) sigDCGnames
#' data(combinedCOPDdat_RGO) # A complete data containing expression and clinical data. # A gene expression data part of the downloaded data. rnaseqdat = combinedCOPDdat_RGO[ , 8:ncol(combinedCOPDdat_RGO)] rnaseqdat = as.data.frame(apply(rnaseqdat, 2, as.numeric)) # A clinical data with additional covariates sorted by current smoking groups: # The first column is ID, so do not include. phenodat = combinedCOPDdat_RGO[order(combinedCOPDdat_RGO$currentsmoking), 2:7] # Indices of non-current smoker (namely Group A) index_grpA = which(combinedCOPDdat_RGO$currentsmoking == 0) # Indices of current smoker (namely Group B) index_grpB = which(combinedCOPDdat_RGO$currentsmoking == 1) # Use PRANA() function to perform the pseudo-value regression analysis. # Then, create an object called PRANA_Results to call results. PRANAres <- PRANA(RNASeqdat = rnaseqdat, clindat = phenodat, groupA = index_grpA, groupB = index_grpB) # Next, we want to call the table with adjusted p-values only. adjptab <- adjpval(PRANAres) # Please specify the name of binary group indicator in sigDCGnames(groupvar = ). sigDCGnames <- sigDCGnames(adjptab = adjptab, groupvar = "currentsmoking", alpha = 0.05) sigDCGnames
A function to retrieve the data frame that are significantly differentially connected (DC). between two biological/clinical states (aka the main binary indicator) with the presence of additional covariate information.
sigDCGtab(adjptab, groupvar, alpha)
sigDCGtab(adjptab, groupvar, alpha)
adjptab |
A table with adjusted p-values and names for the variable that the user specifies in the groupvar. |
groupvar |
Specify the name of binary indicator variable. |
alpha |
A level of significance (e.g. 0.05). |
Adjusted p-values and names of significantly DC genes (e.g. gene IDs) from PRANA.
#' data(combinedCOPDdat_RGO) # A complete data containing expression and clinical data. # A gene expression data part of the downloaded data. rnaseqdat = combinedCOPDdat_RGO[ , 8:ncol(combinedCOPDdat_RGO)] rnaseqdat = as.data.frame(apply(rnaseqdat, 2, as.numeric)) # A clinical data with additional covariates sorted by current smoking groups: # The first column is ID, so do not include. phenodat = combinedCOPDdat_RGO[order(combinedCOPDdat_RGO$currentsmoking), 2:7] # Indices of non-current smoker (namely Group A) index_grpA = which(combinedCOPDdat_RGO$currentsmoking == 0) # Indices of current smoker (namely Group B) index_grpB = which(combinedCOPDdat_RGO$currentsmoking == 1) # Use PRANA() function to perform the pseudo-value regression analysis. # Then, create an object called PRANA_Results to call results. PRANAres <- PRANA(RNASeqdat = rnaseqdat, clindat = phenodat, groupA = index_grpA, groupB = index_grpB) # Next, we want to call the table with adjusted p-values. adjptab <- adjpval(PRANAres) # Please specify the name of variable in sigDCGtab(groupvar = ). sigDCGtab <- sigDCGtab(adjptab = adjptab, groupvar = "currentsmoking", alpha = 0.05) sigDCGtab
#' data(combinedCOPDdat_RGO) # A complete data containing expression and clinical data. # A gene expression data part of the downloaded data. rnaseqdat = combinedCOPDdat_RGO[ , 8:ncol(combinedCOPDdat_RGO)] rnaseqdat = as.data.frame(apply(rnaseqdat, 2, as.numeric)) # A clinical data with additional covariates sorted by current smoking groups: # The first column is ID, so do not include. phenodat = combinedCOPDdat_RGO[order(combinedCOPDdat_RGO$currentsmoking), 2:7] # Indices of non-current smoker (namely Group A) index_grpA = which(combinedCOPDdat_RGO$currentsmoking == 0) # Indices of current smoker (namely Group B) index_grpB = which(combinedCOPDdat_RGO$currentsmoking == 1) # Use PRANA() function to perform the pseudo-value regression analysis. # Then, create an object called PRANA_Results to call results. PRANAres <- PRANA(RNASeqdat = rnaseqdat, clindat = phenodat, groupA = index_grpA, groupB = index_grpB) # Next, we want to call the table with adjusted p-values. adjptab <- adjpval(PRANAres) # Please specify the name of variable in sigDCGtab(groupvar = ). sigDCGtab <- sigDCGtab(adjptab = adjptab, groupvar = "currentsmoking", alpha = 0.05) sigDCGtab
A function to compute the total connectivity of each gene from the association matrix.
thetahats(asso.matinput)
thetahats(asso.matinput)
asso.matinput |
An association matrix that is estimated from the user-provided expression data is used as an input to compute the total connectivity of each gene. |
A vector containing total connectivity of each gene (i.e. continuous version of centrality measure of a network)