Title: | Joint Modeling of the Gene-Expression and Bioassay Data, Taking Care of the Effect Due to a Fingerprint Feature |
---|---|
Description: | Models the association between gene-expression and bioassay data while accounting for the effect of a fingerprint feature, and provides several plots to help interpret the analysis. |
Authors: | Rudradev Sengupta, Nolen Joy Perualila |
Maintainer: | Rudradev Sengupta <[email protected]> |
License: | GPL-3 |
Version: | 1.6 |
Built: | 2024-11-28 06:33:29 UTC |
Source: | CRAN |
The fitJM function fits the model for all the genes for a specific bio-activity vector and a particular fingerprint feature.
fitJM(dat, responseVector, covariate = NULL, methodMultTest)
dat |
Contains the gene expression data matrix for all the genes - can be a matrix or an expression set. |
responseVector |
Vector containing the bio-activity data. |
covariate |
Vector of 0's and 1's, containing data about the fingerprint feature. |
methodMultTest |
Character string to specify the multiple testing method. Default is the BH-FDR method. |
The default for the covariate parameter is NULL. If no covariate is specified, the function calls nullcov internally and returns a data frame containing 5 variables, named "Pearson", "Spearman", "p", "adj-p" and "logratio". The data frame is ordered by the column "p", which is the p-value obtained from the Log-Ratio Test.
If a covariate is specified, the function calls non_nullcov instead and returns a data frame containing 13 variables for all the genes, named "adjPearson", "adjSpearman", "pPearson", "Pearson", "Spearman", "pAdjR", "CovEffect1", "adjPeffect1", "CovEffect2", "adjPeffect2", "rawP1", "rawP2" and "logratio". The data frame is sorted by "rawP1" and "pPearson", which are, respectively, the p-value for the effect of the fingerprint feature on the gene expression data (taken from the t-table after fitting the model using gls) and the p-value obtained from the Log-Ratio Test.
A data frame, containing the results of the model, to be used later for plots or to identify the top genes.
## Not run:
jmRes <- fitJM(dat=gene_eset, responseVector=activity, methodMultTest='fdr')
jmRes <- fitJM(dat=gene_eset, responseVector=activity, covariate=fp, methodMultTest='fdr')
## End(Not run)
The getCorrUnad function is a support function for the function plot1gene.
getCorrUnad(geneName, fp, fpName, responseVector, dat, resPlot)
geneName |
Character string, specifying the name of the gene. |
fp |
Vector containing 0's and 1's - the data about the fingerprint feature. |
fpName |
Character string, used to make the title of the plots. If not specified, the plot title will be blank. |
responseVector |
Vector containing the bio-activity data. |
dat |
Contains the gene expression data matrix for all the genes - can be a matrix or an expression set. |
resPlot |
Logical. If TRUE, creates the plot data for the residual plot. |
Works as a support function for plot1gene.
A list containing the data to create the respective plots and the unadjusted association between the gene expression and bio-activity data.
## Not run:
getCorrUnad(geneName="Gene21", fp=fp, fpName="Fingerprint",
            responseVector=activity, dat=gene_eset, resPlot=TRUE)
## End(Not run)
The IntegratedJM package contains the functions to fit the Joint Model, to classify the genes based on different criteria and necessary plot functions.
The multiplot function plots multiple ggplots in the same window.
multiplot(..., cols = 1)
... |
ggplot2 objects, separated by comma. |
cols |
Integer, specifying the number of plots in one row in the layout. |
Plots multiple ggplots in the same window - multiplot(p1,p2,p3,p4, cols=2) is similar to the standard R notation par(mfrow=c(2,2)).
Creates multiple ggplots in the same window.
## Not run:
multiplot(p1, p2, p3, cols=3)
## End(Not run)
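The relationship between the cols argument and a par(mfrow=...) style grid can be sketched in base R. The helper grid_layout below is hypothetical and not part of the package, and the column-wise fill order is an assumption. It only shows how a number of plots and a column count translate into grid dimensions.

```r
# Illustrative sketch only: grid positions for n plots arranged in `cols` columns.
# `grid_layout` is a hypothetical helper, not part of the package, and the
# column-wise fill order is an assumption.
grid_layout <- function(n_plots, cols = 1) {
  rows <- ceiling(n_plots / cols)
  # Cell numbers beyond n_plots would stay empty in the final layout.
  matrix(seq_len(rows * cols), nrow = rows, ncol = cols)
}

lay <- grid_layout(4, cols = 2)
print(dim(lay))                          # rows and columns of the grid
print(which(lay == 3, arr.ind = TRUE))   # grid cell that would hold the third plot
```

With four plots and cols = 2 the grid has two rows and two columns, which is the arrangement par(mfrow=c(2,2)) would give for base graphics.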
The non_nullcov function is called while fitting the model when the covariate is specified in the fitJM function. It returns a data.frame containing the results after fitting the model. The output of this function is also the output of the fitJM function.
non_nullcov(dat, responseVector, covariate, methodMultTest, data_type)
dat |
Contains the gene expression data matrix for all the genes - can be a matrix or an expression set. |
responseVector |
Vector containing the bio-activity data. |
covariate |
Vector of 0's and 1's, containing data about the fingerprint feature. |
methodMultTest |
Character string to specify the multiple testing method. |
data_type |
Binary, specifying the type of the parameter dat: 0 - expressionSet, 1 - matrix. |
Fits the model, adjusting for the covariate effect, using gls, calculates the correlation, p-values, adjusted p-values (based on the multiple testing method) and logratio from LRT and returns the required results.
A data frame, containing the results of the model - same as the output of the fitJM function.
## Not run:
non_nullcov(dat=gene_eset, responseVector=activity, covariate=fp, methodMultTest='fdr', data_type=0)
## End(Not run)
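The idea of adjusting for the covariate can be illustrated in base R. The sketch below is not the package's gls-based implementation. It assumes that the adjusted association can be approximated by correlating the residuals of gene expression and bio-activity after each has been regressed on the fingerprint feature with lm. All data are simulated, and the column names CovEffect, rawP and adjP are chosen for illustration only.

```r
# Illustrative sketch only: base-R stand-in for the covariate-adjusted analysis.
# All data are simulated; lm() replaces the gls fit used by the package.
set.seed(1)
n_genes <- 5; n_cmpd <- 20
fp <- rep(c(0, 1), each = n_cmpd / 2)              # hypothetical fingerprint feature
activity <- 1.5 * fp + rnorm(n_cmpd)               # hypothetical bio-activity
expr <- matrix(rnorm(n_genes * n_cmpd), nrow = n_genes,
               dimnames = list(paste0("Gene", 1:n_genes), NULL))
expr[1, ] <- expr[1, ] + 2 * fp                    # Gene1 carries a fingerprint effect

adjust_one_gene <- function(x, y, z) {
  fit_x <- lm(x ~ z)                               # fingerprint effect on expression
  fit_y <- lm(y ~ z)                               # fingerprint effect on bio-activity
  c(Pearson    = cor(x, y),                        # unadjusted association
    adjPearson = cor(residuals(fit_x), residuals(fit_y)),
    CovEffect  = unname(coef(fit_x)["z"]),
    rawP       = summary(fit_x)$coefficients["z", "Pr(>|t|)"])
}

res <- as.data.frame(t(apply(expr, 1, adjust_one_gene, y = activity, z = fp)))
res$adjP <- p.adjust(res$rawP, method = "fdr")     # multiple-testing correction
res <- res[order(res$rawP), ]                      # most significant effect first
print(res)
```

Comparing the Pearson and adjPearson columns shows how much of the raw association is explained by the fingerprint feature.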
The nullcov function is called while fitting the model when no covariate is specified in the fitJM function. It returns a data.frame containing the results after fitting the model. The output of this function is also the output of the fitJM function.
nullcov(dat, responseVector, methodMultTest, data_type)
dat |
Contains the gene expression data matrix for all the genes - can be a matrix or an expression set. |
responseVector |
Vector containing the bio-activity data. |
methodMultTest |
Character string to specify the multiple testing method. Default is the BH-FDR method. |
data_type |
Binary, specifying the type of the parameter dat: 0 - expressionSet, 1 - matrix. |
Fits the model using gls, calculates the correlation, p-values, adjusted p-values (based on the multiple testing method) and logratio from LRT and returns the required results.
A data frame, containing the results of the model - same as the output of the fitJM function.
## Not run:
nullcov(dat=gene_eset, responseVector=activity, methodMultTest='fdr', data_type=0)
## End(Not run)
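The kind of per-gene summary produced when no covariate is given can be sketched in base R. This is not the package's gls-based fit. As a stand-in, the sketch uses the likelihood ratio statistic for zero correlation in a bivariate normal model, -n * log(1 - r^2), compared against a chi-square distribution with one degree of freedom. Whether this statistic corresponds to the package's "logratio" column is not confirmed by the documentation, so it is stored under the illustrative name lrtStat. All data are simulated.

```r
# Illustrative sketch only: unadjusted association per gene on simulated data.
# The likelihood ratio statistic for zero correlation in a bivariate normal
# model, -n * log(1 - r^2), is an assumed stand-in for the package's gls-based test.
set.seed(2)
n_genes <- 6; n_cmpd <- 20
activity <- rnorm(n_cmpd)                          # hypothetical bio-activity
expr <- matrix(rnorm(n_genes * n_cmpd), nrow = n_genes,
               dimnames = list(paste0("Gene", 1:n_genes), NULL))
expr[1, ] <- expr[1, ] + activity                  # Gene1 is correlated by construction

pearson  <- apply(expr, 1, cor, y = activity)
spearman <- apply(expr, 1, cor, y = activity, method = "spearman")
lrt_stat <- -n_cmpd * log(1 - pearson^2)
p_raw    <- pchisq(lrt_stat, df = 1, lower.tail = FALSE)

res <- data.frame(Pearson = pearson, Spearman = spearman,
                  p = p_raw, "adj-p" = p.adjust(p_raw, method = "fdr"),
                  lrtStat = lrt_stat, check.names = FALSE)
res <- res[order(res$p), ]                         # ordered by raw p-value
print(res)
```

In p.adjust, method = "fdr" is an alias for the Benjamini-Hochberg ("BH") procedure.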
The plot1gene function plots the data for a single gene.
plot1gene(geneName, fp, fpName = "", responseVector, dat, resPlot = TRUE, colP = "blue", colA = "white")
geneName |
Character string, specifying the name of the gene. |
fp |
Vector containing 0's and 1's - the data about the fingerprint feature. |
fpName |
Character string, used to make the title of the plots. If not specified, the plot title will be blank. |
responseVector |
Vector containing the bio-activity data. |
dat |
Contains the gene expression data matrix for all the genes - can be a matrix or an expression set. |
resPlot |
Logical. If TRUE, also plots the residual from the gls fit. Default is TRUE. |
colP |
Character string, specifying the colour for the 1's in the fp parameter. Default is blue. |
colA |
Character string, specifying the colour for the 0's in the fp parameter. Default is white. |
Calls the getCorrUnad function and creates the plot(s) accordingly.
Creates a plot
## Not run:
plot1gene(geneName="Gene21", fp=fp, fpName="Fingerprint", responseVector=activity, dat=gene_eset)
## End(Not run)
The plotAsso function is used to plot the unadjusted association vs the adjusted association for all the genes.
plotAsso(jointModelResult, type)
jointModelResult |
Data frame, containing the results from the fitJM function. |
type |
Character string, specifying the type of association - Pearson or Spearman. |
Plots the unadjusted association vs the adjusted association for all the genes.
Creates a plot
## Not run:
plotAsso(jointModelResult=jmRes, type="Pearson")
## End(Not run)
The plotEff function is used to plot the fingerprint effect on gene expression vs the adjusted association for all the genes.
plotEff(jointModelResult, type)
jointModelResult |
Data frame, containing the results from the fitJM function. |
type |
Character string, specifying the type of association - Pearson or Spearman. |
Plots the fingerprint effect on gene expression vs the specified type of adjusted association for all the genes.
Creates a plot
## Not run:
plotEff(jointModelResult=jmRes, type="Pearson")
## End(Not run)
sample_data: sample data included in the package. It contains gene expression data for 500 genes and 20 compounds, together with data on bio-activity and a fingerprint feature.
data(sampleData)
The format is: a list containing one gene expression data matrix and one vector each for the bio-activity and fingerprint feature data.
data(sampleData)
gene_mx <- sample_data[[1]]
activity <- sample_data[[2]]
fp <- sample_data[[3]]
The topkGenes function identifies the top genes based on different criteria.
topkGenes(jointModelResult, subset_type, ranking, k = 10, sigLevel = 0.01)
jointModelResult |
Data frame, containing the results from the fitJM function. |
subset_type |
Character string to specify the set of genes. It can have four values: "Effect" for only differentially expressed genes, "Correlation" for only correlated genes, "Effect and Correlation" for genes which are both differentially expressed & correlated and "Other" for the genes which are neither differentially expressed nor correlated. |
ranking |
Character string, specifying one of the columns of the jointModelResult data frame, based on which the genes will be ranked within the selected subset. |
k |
Integer, specifying the number of genes, to be returned from the list of top genes. Default is 10. |
sigLevel |
Numeric between 0 and 1, specifying the level of significance used to select the subset of genes. Default is 0.01. |
Returned data frame contains 6 columns, named as "Genes","FP-Effect", "p-adj(Effect)", "Unadj.Asso.","Adj.Asso.", "p-adj(Adj.Asso.)".
A data frame containing top k genes according to the specified criteria from the specified set of genes.
## Not run:
topkGenes(jointModelResult=jmRes, subset_type="Effect", ranking="Pearson", k=10, sigLevel=0.05)
## End(Not run)
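The subset-then-rank logic described by the subset_type, ranking, k and sigLevel arguments can be sketched on a toy results table. The helper top_k_sketch is hypothetical and not the package's code. The sketch assumes that adjPeffect1 holds the adjusted p-value of the fingerprint effect, that pAdjR holds the adjusted p-value of the adjusted association, and that ranking is done on absolute values, none of which is confirmed by the documentation. The numbers are made up.

```r
# Illustrative sketch only: subset-then-rank logic on a toy results table.
# Assumed meanings: adjPeffect1 = adjusted p-value of the fingerprint effect,
# pAdjR = adjusted p-value of the adjusted association. Values are made up.
jm_toy <- data.frame(
  CovEffect1  = c(2.1, -0.2, 1.4, 0.1, -1.8),
  adjPeffect1 = c(0.001, 0.60, 0.004, 0.75, 0.002),
  adjPearson  = c(0.10, 0.85, 0.70, 0.05, -0.15),
  pAdjR       = c(0.55, 0.003, 0.008, 0.90, 0.40),
  row.names   = paste0("Gene", 1:5)
)

top_k_sketch <- function(res, subset_type, ranking, k = 10, sigLevel = 0.01) {
  de   <- res$adjPeffect1 < sigLevel               # differentially expressed
  corr <- res$pAdjR < sigLevel                     # correlated after adjustment
  keep <- switch(subset_type,
                 "Effect"                 = de & !corr,
                 "Correlation"            = !de & corr,
                 "Effect and Correlation" = de & corr,
                 "Other"                  = !de & !corr,
                 stop("unknown subset_type"))
  sub <- res[keep, , drop = FALSE]
  # Rank by absolute value of the chosen column, largest first (an assumption).
  head(sub[order(abs(sub[[ranking]]), decreasing = TRUE), , drop = FALSE], k)
}

print(top_k_sketch(jm_toy, "Effect", ranking = "CovEffect1", k = 2))
```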
The volcano function produces the volcano plot for logratio / fp-effect vs corresponding p-values.
volcano(x, pValue, pointLabels, topPValues = 10, topXvalues = 10, smoothScatter = TRUE, xlab = NULL, ylab = NULL, main = NULL, newpage = TRUE, additionalPointsToLabel = NULL, additionalLabelColor = "red", dir = TRUE)
x |
Numeric vector of logratios or covariate effect values to be plotted. |
pValue |
Numeric vector of corresponding p-values obtained from some statistical test. |
pointLabels |
Character vector providing the texts for the points to be labelled in the plot. |
topPValues |
Number of top p-values to be labelled. Default value is 10. |
topXvalues |
Number of top logratios or covariate effect values to be labelled. Default value is 10. |
smoothScatter |
Logical parameter to decide if a smooth plot is expected or not. Default is TRUE. |
xlab |
Text for the x-axis of the plot. Default is NULL. |
ylab |
Text for the y-axis of the plot. Default is NULL. |
main |
Text for the main title of the plot. Default is NULL. |
newpage |
Logical parameter. Default is TRUE. |
additionalPointsToLabel |
Set of points other than the top values to be labelled in the plot. Default is NULL. |
additionalLabelColor |
Colour of the additionally labelled points. Default colour is red. |
dir |
Logical parameter deciding if the top values should be in decreasing (= TRUE) or increasing (= FALSE) order. Default is TRUE. |
Creates a plot which looks like a volcano with the interesting points labelled within the plot.
A plot which looks like a volcano.
## Not run:
volcano(x=jmRes$CovEffect1, pValue=jmRes$rawP1, pointLabels=rownames(jmRes),
        topPValues=10, topXvalues=10, xlab="FP Effect (alpha)", ylab="-log(p-values)")
## End(Not run)
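How the topPValues and topXvalues arguments might select the points to label can be sketched without drawing anything. The helper volcano_labels is hypothetical and not the package's code. The sketch assumes that "top" means the smallest p-values and the largest absolute x values, and it uses a base-10 logarithm for the vertical axis, which the documentation does not specify. The inputs are simulated.

```r
# Illustrative sketch only: choosing which points a volcano plot would label.
# `volcano_labels` is a hypothetical helper; inputs are simulated.
volcano_labels <- function(x, pValue, pointLabels, topPValues = 10, topXvalues = 10) {
  by_p <- head(order(pValue), topPValues)                      # smallest p-values
  by_x <- head(order(abs(x), decreasing = TRUE), topXvalues)   # largest |effects|
  idx  <- union(by_p, by_x)                                    # label each point once
  data.frame(label = pointLabels[idx], x = x[idx],
             negLog10P = -log10(pValue[idx]), stringsAsFactors = FALSE)
}

set.seed(3)
effect <- rnorm(50)
pvals  <- runif(50)
print(volcano_labels(effect, pvals, paste0("Gene", 1:50), topPValues = 3, topXvalues = 3))
```

Taking the union of the two index sets means a gene that is extreme on both axes is labelled only once.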