Title: | CRISPR Screen and Gene Expression Differential Analysis |
---|---|
Description: | Provides analytical methods for analyzing CRISPR screen data at different levels of gene expression. Multi-component normal mixture models and EM algorithms are used for modeling. |
Authors: | Lianbo Yu [aut, cre], Yue Zhao [aut], Kevin R. Coombes [aut], Lang Li [aut] |
Maintainer: | Lianbo Yu <[email protected]> |
License: | Apache License (== 2.0) |
Version: | 1.1.1 |
Built: | 2024-12-25 06:36:08 UTC |
Source: | CRAN |
Code was adapted from R package gscreend.
alphaBeta(pvec)
pvec |
A numeric vector of p-values. |
The minimum, over k, of the probability of the kth smallest p-value under the beta distribution B(k, n-k+1), where n is the number of p-values in the vector. This minimum is the significance score of the gene.
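The score described above can be sketched in base R. This is a minimal illustration, assuming the score is the smallest Beta(k, n-k+1) cumulative probability over the sorted p-values; `rho_score` is a hypothetical name and not the package's `alphaBeta`.

```r
# Hypothetical helper (not the package's alphaBeta): smallest beta-based
# probability over the order statistics of a p-value vector.
rho_score <- function(pvec) {
  p_sorted <- sort(pvec)      # order statistics p_(1) <= ... <= p_(n)
  n <- length(p_sorted)
  k <- seq_len(n)
  # Under uniform p-values, the kth smallest of n follows Beta(k, n - k + 1)
  min(pbeta(p_sorted, k, n - k + 1))
}

rho_small <- rho_score(c(0.001, 0.002, 0.5, 0.9))  # two very small p-values
rho_flat  <- rho_score(c(0.3, 0.5, 0.7, 0.9))      # no small p-values
```

A gene whose sgRNAs include several small p-values gets a smaller score than a gene whose p-values look uniform, so `rho_small` is below `rho_flat`.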
Log fold ratios of all sgRNAs of a gene are averaged to obtain the gene level log fold ratio.
calculateGeneLFC(lfcs, genes)
lfcs |
A numeric vector containing the log fold changes of sgRNAs. |
genes |
A character vector containing the gene name corresponding to each sgRNA. |
A numeric vector containing the gene-level log fold ratios.
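The per-gene averaging can be illustrated with base R alone. The input values below are made up for the example, and the code is a sketch of the idea rather than the body of `calculateGeneLFC`.

```r
# Toy inputs (illustrative values): one log fold change per sgRNA and the
# gene each sgRNA targets.
lfcs  <- c(-1.0, -2.0, 0.5, 1.5, 0.0)
genes <- c("A", "A", "B", "B", "C")

# Group sgRNA-level values by gene and average within each group
gene_lfc <- vapply(split(lfcs, genes), mean, numeric(1))
```

The result is a named numeric vector with one entry per gene, here -1.5 for A, 1.0 for B, and 0 for C.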
Code was adapted from R package gscreend. The alpha-RRA method is adapted from MAGeCK.
calculateGenePval(pvec, genes, alpha, nperm = 20)
pvec |
A numeric vector containing p-values of sgRNAs. |
genes |
A character vector containing the gene name corresponding to each sgRNA. |
alpha |
A numeric value giving the alpha cutoff (e.g., 0.05). |
nperm |
Number of permutations; the default is 20. |
A list with four elements: 1) a list of genes with their p-values; 2) a numeric matrix of null rho values, each column corresponding to a different number of sgRNAs per gene; 3) a numeric vector of rho values; 4) a numeric vector of the number of sgRNAs per gene.
This function generates a scatter plot with 2D density contour of log2 fold ratios of sgRNAs against the corresponding gene expression levels.
densityPlot(data, ...)
data |
A data frame from the output of preparePlotData function |
... |
Other graphical parameters |
No return value
The function normalmixEM in R package mixtools is employed for fitting multi-component normal mixture models.
EMFit(x, k0, mean_constr, sd_constr, npara, d0)
x |
A numeric vector |
k0 |
Number of components in the normal mixture model |
mean_constr |
A constraint on the means of the components |
sd_constr |
A constraint on the standard deviations of the components |
npara |
Number of parameters |
d0 |
Number of times for fitting mixture model using different starting values |
The fitted normal mixture model and the BIC value computed from its log-likelihood
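To make the fit-plus-BIC idea concrete, here is a minimal base-R sketch of an EM loop for a two-component normal mixture. It is an assumption-laden illustration: it does not call `normalmixEM`, applies no mean or standard deviation constraints, uses a single starting point, and uses the convention BIC = -2 logLik + npara log(n), which may differ from the package's sign convention. The names `mix_loglik` and `fit_mix2` are hypothetical.

```r
# Log-likelihood of a two-component normal mixture
mix_loglik <- function(x, lambda, mu, sigma) {
  sum(log(lambda[1] * dnorm(x, mu[1], sigma[1]) +
          lambda[2] * dnorm(x, mu[2], sigma[2])))
}

# Hypothetical helper: unconstrained EM for two components
fit_mix2 <- function(x, max_iter = 500, tol = 1e-8) {
  n <- length(x)
  # Crude starting values: split the data at the median
  lambda <- c(0.5, 0.5)
  mu     <- c(mean(x[x <= median(x)]), mean(x[x > median(x)]))
  sigma  <- rep(sd(x), 2)
  loglik <- mix_loglik(x, lambda, mu, sigma)
  for (iter in seq_len(max_iter)) {
    # E-step: posterior membership probabilities (n x 2 matrix)
    dens <- cbind(lambda[1] * dnorm(x, mu[1], sigma[1]),
                  lambda[2] * dnorm(x, mu[2], sigma[2]))
    post <- dens / rowSums(dens)
    # M-step: weighted updates of proportions, means, and sds
    nk     <- colSums(post)
    lambda <- nk / n
    mu     <- colSums(post * x) / nk
    sigma  <- sqrt(colSums(post * outer(x, mu, "-")^2) / nk)
    # Stop once the log-likelihood no longer improves
    loglik_new <- mix_loglik(x, lambda, mu, sigma)
    converged  <- abs(loglik_new - loglik) < tol
    loglik     <- loglik_new
    if (converged) break
  }
  npara <- 5  # 2 means + 2 sds + 1 free mixing proportion
  list(lambda = lambda, mu = mu, sigma = sigma, loglik = loglik,
       bic = -2 * loglik + npara * log(n))
}

set.seed(1)
x   <- c(rnorm(300, -2, 0.5), rnorm(300, 2, 0.5))  # simulated bimodal data
fit <- fit_mix2(x)

# BIC of a single normal (2 parameters) for comparison
sd_mle <- sqrt(mean((x - mean(x))^2))
bic1   <- -2 * sum(dnorm(x, mean(x), sd_mle, log = TRUE)) + 2 * log(length(x))
```

Under this convention a lower BIC is preferred, so on clearly bimodal data `fit$bic` falls below `bic1`. Restarting from several starting values, as the `d0` argument suggests, guards against poor local optima.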
Code was adapted from R package gscreend.
makeRhoNull(n, p, nperm)
n |
An integer giving the number of sgRNAs for a gene. |
p |
A numeric vector which contains the percentiles of the p-values that meet the cut-off (alpha). |
nperm |
Number of permutation runs. |
A numeric vector containing the significance scores (rho) of genes generated by a permutation test in which sgRNAs are randomly assigned to genes.
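A permutation null of this kind can be sketched as follows. The sketch assumes that each permutation shuffles the sgRNA-level values and cuts them into pseudo-genes of `n` sgRNAs, each scored with a beta-based minimum. The helper names and the uniform input `p` are illustrative assumptions, not the package's code or data.

```r
# Hypothetical scoring helper: smallest Beta(k, n - k + 1) probability
rho_score <- function(pvec) {
  p_sorted <- sort(pvec)
  k <- seq_along(p_sorted)
  min(pbeta(p_sorted, k, length(p_sorted) - k + 1))
}

# Hypothetical null generator: shuffle sgRNA values, then score pseudo-genes
rho_null <- function(n, p, nperm) {
  unlist(lapply(seq_len(nperm), function(i) {
    shuffled <- sample(p)                         # random sgRNA-to-gene assignment
    groups   <- ceiling(seq_along(shuffled) / n)  # pseudo-genes of n sgRNAs each
    vapply(split(shuffled, groups), rho_score, numeric(1))
  }), use.names = FALSE)
}

set.seed(42)
p <- runif(200)                                    # stand-in sgRNA-level values
null_scores <- rho_null(n = 4, p = p, nperm = 20)  # 50 pseudo-genes x 20 runs

# Permutation p-value of an observed score: share of null scores at least as small
observed <- rho_score(c(0.001, 0.002, 0.5, 0.9))
p_perm   <- mean(null_scores <= observed)
```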
A dataset containing the expression data of sgRNAs in a CRISPR screen experiment of cell line MDA-MB-231.
mda231
A data frame with a list of two elements:
Raw read counts of sgRNAs
A list of non-essential genes
This function adjusts sgRNA counts by the median ratio method. The normalized sgRNA read counts are calculated as the raw read counts divided by a size factor. The size factor is calculated as the median of all size factors calculated from negative control sgRNAs (e.g., sgRNAs corresponding to non-targeting or non-essential genes).
medianNormalization(data, control)
data |
A numeric matrix containing raw read counts of sgRNAs with rows corresponding to sgRNAs and columns corresponding to samples. |
control |
A numeric matrix containing raw read counts of negative control sgRNAs with rows corresponding to sgRNAs and columns corresponding to samples. Sample ordering is the same as in data. |
A list with two elements: 1) size factors of all samples; 2) normalized counts of sgRNAs.
count <- matrix(rnbinom(5000 * 6, mu = 500, size = 3), ncol = 6)
colnames(count) <- paste0("sample", 1:6)
rownames(count) <- paste0("sgRNA", 1:5000)
control <- count[1:100, ]
normalizedcount <- medianNormalization(count, control)
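For readers who want to see the arithmetic, the following base-R sketch shows one common form of the median ratio method. It assumes each control sgRNA's ratio is its count divided by its geometric mean across samples, and it is not the package's implementation. The helper name `median_ratio_normalize` and the toy counts are hypothetical.

```r
# Hypothetical helper illustrating median-ratio size factors
median_ratio_normalize <- function(data, control) {
  log_geo_mean <- rowMeans(log(control))  # log geometric mean per control sgRNA
  keep <- is.finite(log_geo_mean)         # drop control sgRNAs with a zero count
  size_factors <- apply(control[keep, , drop = FALSE], 2, function(counts) {
    # Median over control sgRNAs of count / geometric mean, on the log scale
    exp(median(log(counts) - log_geo_mean[keep]))
  })
  list(size.factors = size_factors,
       normalized   = sweep(data, 2, size_factors, "/"))
}

# Toy counts: three samples sequenced at 1x, 2x, and 4x depth
base  <- matrix(rep(c(100, 200, 400, 800), 3), ncol = 3)
toy   <- sweep(base, 2, c(1, 2, 4), "*")
res   <- median_ratio_normalize(toy, control = toy[1:2, ])
```

In this toy case the size factors are 0.5, 1, and 2, and after division all three columns of `res$normalized` agree.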
This function performs empirical Bayes modeling on log fold ratios and returns the posterior log fold ratios.
normalMM(data, theta0, n.b = 5, d = 10)
data |
A numeric matrix containing limma results and log2 gene expression levels, with a column named 'lfc' and a column named 'exp.level.log2' |
theta0 |
Standard deviation of log2 fold changes under permutations |
n.b |
Number of bins, default is 5 bins |
d |
Number of times for fitting mixture model using different starting values, default is 10 |
A numeric matrix containing limma results, RNA expression levels, posterior log2 fold ratio, log p-values, and estimates of mixture model
The lmFit function in R package limma is employed for group comparisons under permutations.
permuteLimma(data, design, contrast.matrix, nperm)
data |
A numeric matrix containing log2 expression levels of sgRNAs with rows corresponding to sgRNAs and columns corresponding to samples. |
design |
A design matrix with rows corresponding to samples and columns to coefficients to be estimated. |
contrast.matrix |
A matrix with columns corresponding to contrasts. |
nperm |
Number of permutations |
A numeric matrix containing log2 fold changes with permutations
library(limma)  # provides makeContrasts
y <- matrix(rnorm(1000 * 6), 1000, 6)
condition <- gl(2, 3, labels = c("Control", "Baseline"))
design <- model.matrix(~ 0 + condition)
contrast.matrix <- makeContrasts("conditionControl-conditionBaseline", levels = design)
fit <- permuteLimma(y, design, contrast.matrix, 20)
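The idea of a permutation null for log2 fold changes can be sketched without limma. The code below is an illustration under simplifying assumptions: a plain difference of group means stands in for the limma fit, and `permute_lfc` is a hypothetical helper, not `permuteLimma`.

```r
# Hypothetical helper: null log2 fold changes from shuffled sample labels
permute_lfc <- function(data, group, nperm) {
  lev <- unique(group)
  vapply(seq_len(nperm), function(i) {
    shuffled <- sample(group)  # break the link between samples and conditions
    rowMeans(data[, shuffled == lev[1], drop = FALSE]) -
      rowMeans(data[, shuffled == lev[2], drop = FALSE])
  }, numeric(nrow(data)))
}

set.seed(1)
y        <- matrix(rnorm(1000 * 6), 1000, 6)
group    <- rep(c("Control", "Baseline"), each = 3)
null_lfc <- permute_lfc(y, group, nperm = 20)  # one column per permutation

# Spread of the permuted fold changes
null_sd <- sd(null_lfc)
```

The standard deviation of such permuted fold changes is the kind of quantity that `normalMM` takes as its `theta0` argument.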
This function takes a data frame with one gene per row and geneID, geneLFC, and geneFDR as columns. It stratifies genes into five groups based on their FDR levels: <=0.001, (0.001,0.01], (0.01,0.05], (0.05,0.5], (0.5,1]
preparePlotData(data, gene.fdr)
data |
A data frame containing each gene in one row, and at least three columns with geneID, geneLFC, and geneFDR. |
gene.fdr |
A numeric variable (column) in the data frame, corresponding to the gene level FDR |
A data frame based on the original data frame, with an additional column "group" indicating which FDR group this gene belongs to.
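The five-way stratification can be reproduced with base R's `cut`. This is a sketch of the binning only. The helper `add_fdr_group`, its `fdr_col` argument, and the toy data frame are hypothetical and do not mirror the signature of `preparePlotData`.

```r
# Hypothetical helper: append an FDR group column to a gene-level data frame
add_fdr_group <- function(data, fdr_col = "geneFDR") {
  data$group <- cut(
    data[[fdr_col]],
    breaks = c(0, 0.001, 0.01, 0.05, 0.5, 1),
    labels = c("<=0.001", "(0.001,0.01]", "(0.01,0.05]",
               "(0.05,0.5]", "(0.5,1]"),
    include.lowest = TRUE,  # an FDR of exactly 0 falls in the first group
    right = TRUE            # intervals are closed on the right
  )
  data
}

# Toy gene-level table (illustrative values)
gene_table <- data.frame(
  geneID  = paste0("g", 1:5),
  geneLFC = c(-2, -1, 0.5, 0.1, 0),
  geneFDR = c(0, 0.005, 0.05, 0.2, 0.9)
)
grouped <- add_fdr_group(gene_table)
```

Each of the five toy genes lands in a different group, in the order listed in `labels`.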
This function generates a density ridgeline plot of gene expression levels for different FDR groups.
ridgePlot(data, ...)
data |
A data frame from the output of preparePlotData function |
... |
Other graphical parameters |
No return value
The lmFit function in R package limma is employed for group comparisons.
runLimma(data, design, contrast.matrix)
data |
A numeric matrix containing log2 expression levels of sgRNAs with rows corresponding to sgRNAs and columns corresponding to samples. |
design |
A design matrix with rows corresponding to samples and columns corresponding to coefficients to be estimated. |
contrast.matrix |
A matrix with columns corresponding to contrasts. |
A data frame with rows corresponding to sgRNAs and columns corresponding to limma results
library(limma)  # provides makeContrasts
y <- matrix(rnorm(1000 * 6), 1000, 6)
condition <- gl(2, 3, labels = c("Treatment", "Baseline"))
design <- model.matrix(~ 0 + condition)
contrast.matrix <- makeContrasts("conditionTreatment-conditionBaseline", levels = design)
limma.fit <- runLimma(y, design, contrast.matrix)
This function generates a scatter plot of log2 fold ratios of sgRNAs against the corresponding gene expression levels.
scatterPlot(data, fdr, ...)
data |
A numeric matrix from the output of normalMM function |
fdr |
A level of false discovery rate |
... |
Other graphical parameters |
No return value