Title: | Toolkit to Identify Candidate Synthetic Lethality |
---|---|
Description: | Enables the user to infer potential synthetic lethal relationships by analysing relationships between bimodally distributed gene pairs in big gene expression datasets. Enables the user to visualise these candidate synthetic lethal relationships. |
Authors: | Mark Wappett |
Maintainer: | Mark Wappett <[email protected]> |
License: | Artistic-2.0 |
Version: | 2.3 |
Built: | 2024-12-01 08:24:38 UTC |
Source: | CRAN |
A set of tools that enable the user to accurately identify bimodality and non-normality in gene expression data and stratify samples as high or low expression for bimodal genes. Enables identification of candidate synthetic lethal gene pairs. Enables the user to assess and visualise functional redundancy between candidate synthetic lethal gene pairs.
Package: | BiSEp |
Type: | Package |
Version: | 2.0 |
Date: | 2014-10-21 |
License: | GPL-2 |
This package lists a mixture of CRAN and Bioconductor packages as dependencies. Please ensure that you have Bioconductor installed.
Author: Mark Wappett
Maintainer: Mark Wappett <[email protected]>
Takes the output from the function BISEP and a discrete mutation matrix as input. The samples (columns) of the mutation matrix must match or overlap with those of the gene expression matrix. Each cell of the mutation matrix must be a discrete 'WT' or 'MUT' call giving the status of each gene in each sample. Detects genes whose mutations are enriched in either the high or low gene expression mode.
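The input requirements above (only 'WT' or 'MUT' calls, and samples shared with the expression matrix) can be checked before running the tool. The following is a minimal base R sketch on invented toy matrices; the gene and sample names are hypothetical, and whether BEEM itself subsets to the shared samples is not stated here, so this is a pre-flight check only.

```r
# Hypothetical log2 expression matrix: genes in rows, samples in columns.
expr <- matrix(c(2.1, 8.0, 7.9, 2.3,
                 5.5, 5.1, 4.9, 5.2),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("GENE_A", "GENE_B"),
                               c("S1", "S2", "S3", "S4")))

# Hypothetical mutation matrix: genes in rows, samples in columns.
mut <- matrix(c("WT",  "MUT", "WT",
                "MUT", "WT",  "WT"),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("GENE_C", "GENE_D"),
                              c("S2", "S3", "S5")))

# Every cell must be exactly 'WT' or 'MUT'.
calls_ok <- all(mut %in% c("WT", "MUT"))

# Samples present in both matrices, in expression-matrix order.
shared <- intersect(colnames(expr), colnames(mut))

# Restrict both matrices to the shared samples for inspection.
expr_shared <- expr[, shared, drop = FALSE]
mut_shared  <- mut[, shared, drop = FALSE]
```

If `shared` is empty or `calls_ok` is `FALSE`, the mutation matrix likely needs cleaning before it is passed as `mutData`.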
BEEM( bisepData=data, mutData=mutData, sampleType=c("cell_line", "cell_line_low", "patient", "patient_low"), minMut=10 )
bisepData |
This should be the output from the BISEP function. |
mutData |
This should be a matrix with genes as row names and samples as column names. Every cell should contain a discrete 'WT' or 'MUT' call. There should be overlap (by sample) with the gene expression matrix. |
sampleType |
The type of sample being analysed. Select 'cell_line' or 'patient' for datasets with more than ~200 samples. For datasets with fewer than ~200 samples, use 'cell_line_low' or 'patient_low'. |
minMut |
The minimum number of mutations a gene must have to be considered for analysis. |
Datasets with fewer samples must clear more stringent bimodality thresholds in order to keep the false positive rate low. The tool displays a percentage-complete text indicator so the user can follow the status of the job.
A matrix containing 10 columns. Column 1 contains the bimodal genes from the expression data (gene 1) and column 2 contains the mutated candidate synthetic lethal gene pair (gene 2). Columns 3 and 4 contain the number of mutations of gene 2 in the low and high expression modes of gene 1. Column 5 contains the Fisher's exact test p-value that evaluates enrichment of mutation in either the high or low mode (the enriched mode is indicated in column 10). Columns 6 and 7 contain the percentage of samples in the low and high expression modes of gene 1 that are mutated for gene 2. Columns 8 and 9 contain the overall size (in number of samples) of the low and high expression modes of gene 1.
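To illustrate how the mutation counts, percentages and Fisher's p-value described above relate to one another, here is a minimal base R sketch for a single gene pair. The expression values, mutation calls and midpoint are invented, and the code uses `fisher.test` from the stats package; it is not taken from the BEEM implementation, which may differ in detail.

```r
# Hypothetical toy data: log2 expression of gene 1 and WT/MUT calls for
# gene 2 across the same ten samples.
expr_gene1 <- c(2.1, 2.4, 1.9, 2.2, 2.0, 8.1, 7.9, 8.4, 8.2, 7.8)
mut_gene2  <- c("MUT", "MUT", "MUT", "WT", "MUT",
                "WT", "WT", "WT", "MUT", "WT")
midpoint   <- 5  # assumed midpoint between the two expression modes

# Assign each sample to the low or high expression mode of gene 1.
mode_gene1 <- factor(ifelse(expr_gene1 < midpoint, "low", "high"),
                     levels = c("low", "high"))
mut_status <- factor(mut_gene2, levels = c("MUT", "WT"))

# 2 x 2 contingency table: mutation status by expression mode.
tab <- table(mut_status, mode_gene1)

# Fisher's exact test for association between mode and mutation status.
ft <- fisher.test(tab)

# Quantities analogous to the count, percentage and mode-size columns.
low_mut   <- tab["MUT", "low"]
high_mut  <- tab["MUT", "high"]
size_low  <- sum(tab[, "low"])
size_high <- sum(tab[, "high"])
pct_low   <- 100 * low_mut / size_low
pct_high  <- 100 * high_mut / size_high
enriched  <- if (pct_low > pct_high) "low" else "high"
```

With so few samples the test has little power, which is one reason a minimum mutation count such as `minMut` is useful in practice.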
Mark Wappett
Part of the Synthetic Lethality detection in Genomics toolkit. Detects bimodality and non-normality in all genes across the dataset. Compares all pairwise combinations of bimodal genes and searches for mutually exclusive low expression as evidence of potential synthetic lethality. Scores gene pairs based on the presence of mutually exclusive bimodality and the distribution of signal intensity across the rest of the dataset.
BIGEE( bisepData=data, sampleType=c("cell_line", "cell_line_low", "patient", "patient_low") )
bisepData |
This should be the output from the BISEP function. |
sampleType |
The type of sample being analysed. Select 'cell_line' or 'patient' for datasets with more than ~200 samples. For datasets with fewer than ~200 samples, use 'cell_line_low' or 'patient_low'. |
Datasets with fewer samples must clear more stringent bimodality thresholds in order to keep the false positive rate low. The tool displays a percentage-complete text indicator so the user can follow the status of the job.
A matrix containing three columns. Columns 1 and 2 are the gene symbols that make up the candidate synthetic lethal gene pairs. Column 3 is the score calculated by the tool to rank the statistical significance of the gene pairs.
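The sampleType guidance above can be written down as a small helper. This is only a sketch: `choose_sample_type` is a hypothetical function that is not part of BiSEp, and the hard cutoff of 200 is an assumption standing in for the approximate "~200 samples" rule.

```r
# Hypothetical helper (not part of BiSEp): pick a sampleType value from
# the number of samples (columns) in an expression matrix.
choose_sample_type <- function(expr,
                               source = c("cell_line", "patient"),
                               cutoff = 200) {
  source <- match.arg(source)
  # Larger datasets use the plain label; smaller ones use the "_low" variant.
  if (ncol(expr) > cutoff) source else paste0(source, "_low")
}

# Toy matrices with 442 and 100 samples respectively.
large_expr <- matrix(0, nrow = 2, ncol = 442)
small_expr <- matrix(0, nrow = 2, ncol = 100)

type_large <- choose_sample_type(large_expr, "cell_line")
type_small <- choose_sample_type(small_expr, "patient")
```

Because the documented threshold is approximate, datasets close to 200 samples are a judgment call rather than something a helper like this can decide.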
Mark Wappett
Detects bimodality and non-normality in all genes across the dataset.
BISEP( data = data )
data |
This should be a log2 gene expression matrix with genes as rownames and samples as column names. Suitable for gene expression data from any platform - NGS datasets should be RPKM or RSEM values. |
Lower confidence calls will dramatically increase the number of gene pairs that the tool produces and raise the false positive rate. The tool takes approximately 10 minutes to process an input matrix of 5,000 rows and 200 columns using a 'medium' confidence interval.
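As a sketch of how an input matrix in the expected shape might be prepared, the base R snippet below log2-transforms a small invented RPKM matrix and runs a few structural checks. The pseudocount of 1 is an assumption made here to avoid taking the log of zero; the package documentation does not prescribe one.

```r
# Hypothetical raw RPKM values: genes in rows, samples in columns.
rpkm <- matrix(c(0, 3, 15, 255,
                 7, 1, 63, 31),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("GENE_A", "GENE_B"),
                               c("S1", "S2", "S3", "S4")))

# log2 transform; the pseudocount of 1 is an assumption to avoid log2(0).
log2_expr <- log2(rpkm + 1)

# Basic structural checks on the prepared matrix.
stopifnot(is.matrix(log2_expr),
          is.numeric(log2_expr),
          !is.null(rownames(log2_expr)),
          !is.null(colnames(log2_expr)),
          !anyDuplicated(rownames(log2_expr)),
          all(is.finite(log2_expr)))
```

A real matrix prepared along these lines would then be passed as the `data` argument; the two-gene toy above is far too small for a meaningful bimodality analysis.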
A list containing three matrices. Matrix 1 contains the output of the BISEP algorithm - including the midpoint of the bimodal distribution and the associated p value. Matrix 2 contains the output from the BI algorithm - including the delta, pi and BI values. Matrix 3 contains the input matrix.
Mark Wappett
data(INPUT_data)
outputBISEP <- BISEP(data=INPUT_data)
Matrix 1 contains the output of the BISEP algorithm - including the midpoint of the bimodal distribution and the associated p value. Matrix 2 contains the output from the BI algorithm - including the delta, pi and BI values. Matrix 3 contains the input matrix.
data(BISEP_dat)
13 observations across 100 variables.
Matrix 1 contains the output of the BISEP algorithm - including the midpoint of the bimodal distribution and the associated p value. Matrix 2 contains the output from the BI algorithm - including the delta, pi and BI values. Matrix 3 contains the input matrix.
data(BISEP_data)
13 observations across 442 variables.
Takes the output from the function BISEP and two gene names that correspond to a relevant gene pair. Gene names must be available in the input BISEP object.
expressionPlot( bisepData=data, gene1, gene2 )
bisepData |
This should be the output from the BISEP function. |
gene1 |
The first gene whose expression you would like to plot. |
gene2 |
The second gene whose expression you would like to plot. |
The function will return an error if any of the input information is incorrect or missing. The resulting plot will be returned in real time.
A scatter plot of the two genes you have identified as bimodal. The red lines correspond to the midpoints of the bimodal distributions of the two genes. For a candidate synthetic lethal (SL) interaction, the lower left quadrant (both genes in their low expression mode) would ideally be empty.
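The "empty lower left quadrant" criterion can also be checked numerically. The base R sketch below assigns invented samples to the four quadrants defined by two assumed midpoints and counts them; the values and midpoints are illustrative and are not produced by BISEP.

```r
# Hypothetical log2 expression of two bimodal genes across eight samples.
gene1 <- c(1.8, 2.2, 2.0, 7.9, 8.3, 8.1, 7.7, 8.0)
gene2 <- c(7.5, 8.0, 7.8, 2.1, 1.9, 7.6, 8.2, 2.3)

# Assumed midpoints of the two bimodal distributions (the red lines).
mid1 <- 5
mid2 <- 5

# Label each sample by the quadrant it falls in (gene1 mode / gene2 mode).
quadrant <- ifelse(gene1 < mid1,
                   ifelse(gene2 < mid2, "low/low", "low/high"),
                   ifelse(gene2 < mid2, "high/low", "high/high"))

# Count samples per quadrant, keeping empty quadrants in the table.
counts <- table(factor(quadrant,
                       levels = c("low/low", "low/high",
                                  "high/low", "high/high")))

# The lower left quadrant corresponds to "low/low".
lower_left_empty <- counts[["low/low"]] == 0
```

In this toy example no sample has both genes in the low mode, which is the pattern the plot is meant to reveal.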
Mark Wappett
data(BISEP_data)
data(MUT_data)
expressionOut <- expressionPlot(BISEP_data, gene1="SMARCA1", gene2="SMARCA4")
Utilises gene ontology information from the GO database Bioconductor package. Assesses gene pairs output from the BIGEE and BEEM tools for gene ontology functional redundancy. Performs semantic similarity scoring using the GOSemSim Bioconductor package.
FURE( data=data, inputType=inputType)
data |
This should be the output matrix (or similar) from the BIGEE or BEEM tools. Columns 1 and 2 should be gene symbols. |
inputType |
Either 'BIGEE' or 'BEEM' based on origin of the input matrix. |
A list of matrices containing gene pairs with their associated synthetic lethal statistical significance values, plus gene ontology annotations and scores.
Mark Wappett
Output matrix from the BIGEE tool.
data(FURE_data)
A data frame with 1 observation across 3 variables.
A log2 gene expression matrix where row names are genes and column names are samples.
data(INPUT_data)
13 observations across 442 variables.
A matrix containing discrete mutation calls of either 'WT' or 'MUT' where row names are genes and column names are samples.
data(MUT_data)
4 observations across 442 variables.
Takes the output from the function BISEP and a discrete mutation matrix as input. The samples (columns) of the mutation matrix must match or overlap with those of the gene expression matrix. Each cell of the mutation matrix must be a discrete 'WT' or 'MUT' call giving the status of each gene in each sample. Gene names must be available in the input matrices.
waterfallPlot( bisepData=data, mutData=mutData, expressionGene, mutationGene )
bisepData |
This should be the output from the BISEP function. |
mutData |
This should be a matrix with genes as row names and samples as column names. Every cell should contain a discrete 'WT' or 'MUT' call. There should be overlap (by sample) with the gene expression matrix. |
expressionGene |
The gene whose expression you would like to plot. |
mutationGene |
The gene whose mutation status you would like to overlay on the expression gene. |
The function will return an error if any of the input information is incorrect or missing. The resulting plot will be returned in real time.
A waterfall plot. The plot is made up of two panels: the left panel is a density distribution of the expression gene provided and the right panel is a bar-chart of the gene expression level coloured by mutation status.
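The data behind the bar-chart panel can be sketched in base R: samples are ordered by expression and each bar is coloured by mutation status. The values, sample names and colour choices below are invented for illustration and do not reproduce the styling of waterfallPlot.

```r
# Hypothetical log2 expression of the expression gene, named by sample.
expr_gene <- c(S1 = 2.1, S2 = 8.0, S3 = 7.6, S4 = 2.4, S5 = 8.3)

# Hypothetical mutation calls of the mutation gene for the same samples.
mut_gene <- c(S1 = "MUT", S2 = "WT", S3 = "WT", S4 = "MUT", S5 = "WT")

# Order samples from lowest to highest expression.
ord <- order(expr_gene)

# One row per bar; the colours are an assumption made for this sketch.
waterfall <- data.frame(
  sample     = names(expr_gene)[ord],
  expression = unname(expr_gene[ord]),
  mutation   = unname(mut_gene[ord]),
  colour     = ifelse(mut_gene[ord] == "MUT", "red", "grey"),
  stringsAsFactors = FALSE
)
```

A call such as `barplot(waterfall$expression, col = waterfall$colour, names.arg = waterfall$sample)` would then draw a comparable bar chart, with mutated samples clustering at one end when mutation is enriched in one expression mode.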
Mark Wappett
data(BISEP_data)
data(MUT_data)
waterfallOut <- waterfallPlot(BISEP_data, MUT_data, expressionGene="micb", mutationGene="PBRM1")