Package 'CB2' reference manual

Title:	CRISPR Pooled Screen Analysis using Beta-Binomial Test
Description:	Provides functions for hit gene identification and quantification of sgRNA (single-guided RNA) abundances for CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) pooled screen data analysis. Details are in Jeong et al. (2019) <doi:10.1101/gr.245571.118> and Baggerly et al. (2003) <doi:10.1093/bioinformatics/btg173>.
Authors:	Hyun-Hwan Jeong [aut, cre]
Maintainer:	Hyun-Hwan Jeong <[email protected]>
License:	MIT + file LICENSE
Version:	1.3.4
Built:	2025-01-26 06:45:49 UTC
Source:	CRAN

A function to calculate the mappabilities of each NGS sample.

Description

A function to calculate the mappabilities of each NGS sample.

Usage

calc_mappability(count_obj, df_design)
calc_mappability(count_obj, df_design)

Arguments

`count_obj`	A list object is created by 'run_sgrna_quant'.
`df_design`	The table contains a study design.

Examples

library(CB2)
library(magrittr)
library(tibble)
library(dplyr)
library(glue)
FASTA <- system.file("extdata", "toydata", "small_sample.fasta", package = "CB2")
ex_path <- system.file("extdata", "toydata", package = "CB2")

df_design <- tribble(
  ~group, ~sample_name,
  "Base", "Base1",  
  "Base", "Base2", 
  "High", "High1",
  "High", "High2") %>% 
    mutate(fastq_path = glue("{ex_path}/{sample_name}.fastq"))

cb2_count <- run_sgrna_quant(FASTA, df_design)
calc_mappability(cb2_count, df_design)

library(CB2)
library(magrittr)
library(tibble)
library(dplyr)
library(glue)
FASTA <- system.file("extdata", "toydata", "small_sample.fasta", package = "CB2")
ex_path <- system.file("extdata", "toydata", package = "CB2")

df_design <- tribble(
  ~group, ~sample_name,
  "Base", "Base1",  
  "Base", "Base2", 
  "High", "High1",
  "High", "High2") %>% 
    mutate(fastq_path = glue("{ex_path}/{sample_name}.fastq"))

cb2_count <- run_sgrna_quant(FASTA, df_design)
calc_mappability(cb2_count, df_design)

A benchmark CRISPRn pooled screen data from Evers et al.

Description

A benchmark CRISPRn pooled screen data from Evers et al.

Usage

data(Evers_CRISPRn_RT112)
data(Evers_CRISPRn_RT112)

Format

The data object is a list and contains below information:

count: The count matrix from Evers et al.'s paper and contains the CRISPRn screening result using RT112 cell-line. It contains three different replicates for T0 (before) and contains different three replicates for T1 (after).
egenes: The list of 46 essential genes used in Evers et al.'s study.
ngenes: The list of 47 non-essential genes used in Evers et al.'s study.
design: The data.frame contains study design.
sg_stat: The data.frame contains the sgRNA-level statistics.
gene_stat: The data.frame contains the gene-level statistics.

Source

https://www.ncbi.nlm.nih.gov/pubmed/27111720

A C++ function to perform a parameter estimation for the sgRNA-level test. It will estimate two different parameters 'phat' and 'vhat,' and we assume input count data follows the beta-binomial distribution. Dr. Keith Baggerly initially implemented this code in Matlab, and it has been rewritten it in C++ for the speed-up.

Description

A C++ function to perform a parameter estimation for the sgRNA-level test. It will estimate two different parameters 'phat' and 'vhat,' and we assume input count data follows the beta-binomial distribution. Dr. Keith Baggerly initially implemented this code in Matlab, and it has been rewritten it in C++ for the speed-up.

Usage

fit_ab(xvec, nvec)
fit_ab(xvec, nvec)

Arguments

`xvec`	a matrix contains sgRNA read counts.
`nvec`	a vector contains the library size.

A function to normalize sgRNA read counts.

Description

A function to normalize sgRNA read counts.

Usage

get_CPM(sgcount)
get_CPM(sgcount)

Arguments

sgcount

The input table contains read counts of sgRNAs for each sample

A function to calculate the CPM (Counts Per Million) (required)

Value

a normalized CPM table will be returned

Examples

library(CB2)
data(Evers_CRISPRn_RT112)
get_CPM(Evers_CRISPRn_RT112$count)

library(CB2)
data(Evers_CRISPRn_RT112)
get_CPM(Evers_CRISPRn_RT112$count)

A function to join a count table and a design table.

Description

A function to join a count table and a design table.

Usage

join_count_and_design(sgcount, df_design)
join_count_and_design(sgcount, df_design)

Arguments

`sgcount`	The input matrix contains read counts of sgRNAs for each sample.
`df_design`	The table contains a study design.

Value

A tall-thin and combined table of the sgRNA read counts and study design will be returned.

Examples

library(CB2)
data(Evers_CRISPRn_RT112) 
head(join_count_and_design(Evers_CRISPRn_RT112$count, Evers_CRISPRn_RT112$design))

library(CB2)
data(Evers_CRISPRn_RT112) 
head(join_count_and_design(Evers_CRISPRn_RT112$count, Evers_CRISPRn_RT112$design))

A function to perform gene-level test using a sgRNA-level statistics.

Description

A function to perform gene-level test using a sgRNA-level statistics.

Usage

measure_gene_stats(sgrna_stat, logFC_level = "sgRNA")
measure_gene_stats(sgrna_stat, logFC_level = "sgRNA")

Arguments

`sgrna_stat`	A data frame created by ‘measure_sgrna_stats’
`logFC_level`	The level of ‘logFC’ value. It can be ‘gene’ or ‘sgRNA’.

Value

A table contains the gene-level test result, and the table contains these columns:

‘gene’: Theg gene name to be tested.
‘n_sgrna’: The number of sgRNA targets the gene in the library.
‘cpm_a’: The mean of CPM of sgRNAs within the first group.
‘cpm_b’: The mean of CPM of sgRNAs within the second group.
‘logFC’: The log fold change of the gene between two groups. Taking the mean of sgRNA ‘logFC’s is default, and ‘logFC' is calculated by 'log2(cpm_b+1) - log2(cpm_a+1)’ if ‘logFC_level’ parameter is set to ‘gene’.
‘p_ts’: The p-value indicates a difference between the two groups at the gene-level.
‘p_pa’: The p-value indicates enrichment of the first group at the gene-level.
‘p_pb’: The p-value indicates enrichment of the second group at the gene-level.
‘fdr_ts’: The adjusted P-value of ‘p_ts’.
‘fdr_pa’: The adjusted P-value of ‘p_pa’.
‘fdr_pb’: The adjusted P-value of ‘p_pb’.

Examples

data(Evers_CRISPRn_RT112)
measure_gene_stats(Evers_CRISPRn_RT112$sg_stat)

data(Evers_CRISPRn_RT112)
measure_gene_stats(Evers_CRISPRn_RT112$sg_stat)

A function to perform a statistical test at a sgRNA-level

Description

A function to perform a statistical test at a sgRNA-level

Usage

measure_sgrna_stats(
  sgcount,
  design,
  group_a,
  group_b,
  delim = "_",
  ge_id = NULL,
  sg_id = NULL
)
measure_sgrna_stats(
  sgcount,
  design,
  group_a,
  group_b,
  delim = "_",
  ge_id = NULL,
  sg_id = NULL
)

Arguments

`sgcount`	This data frame contains read counts of sgRNAs for the samples.
`design`	This table contains study design. It has to contain 'group.'
`group_a`	The first group to be tested.
`group_b`	The second group to be tested.
`delim`	The delimiter between a gene name and a sgRNA ID. It will be used if only rownames contains sgRNA ID.
`ge_id`	The column name of the gene column.
`sg_id`	The column/columns of sgRNA identifiers.

Value

A table contains the sgRNA-level test result, and the table contains these columns:

‘sgRNA’: The sgRNA identifier.
‘gene’: The gene is the target of the sgRNA
‘n_a’: The number of replicates of the first group.
‘n_b’: The number of replicates of the second group.
‘phat_a’: The proportion value of the sgRNA for the first group.
‘phat_b’: The proportion value of the sgRNA for the second group.
‘vhat_a’: The variance of the sgRNA for the first group.
‘vhat_b’: The variance of the sgRNA for the second group.
‘cpm_a’: The mean CPM of the sgRNA within the first group.
‘cpm_b’: The mean CPM of the sgRNA within the second group.
‘logFC’: The log fold change of sgRNA between two groups.
‘t_value’: The value for the t-statistics.
‘df’: The value of the degree of freedom, and will be used to calculate the p-value of the sgRNA.
‘p_ts’: The p-value indicates a difference between the two groups.
‘p_pa’: The p-value indicates enrichment of the first group.
‘p_pb’: The p-value indicates enrichment of the second group.
‘fdr_ts’: The adjusted P-value of ‘p_ts’.
‘fdr_pa’: The adjusted P-value of ‘p_pa’.
‘fdr_pb’: The adjusted P-value of ‘p_pb’.

Examples

library(CB2)
data(Evers_CRISPRn_RT112)
measure_sgrna_stats(Evers_CRISPRn_RT112$count, Evers_CRISPRn_RT112$design, "before", "after")

library(CB2)
data(Evers_CRISPRn_RT112)
measure_sgrna_stats(Evers_CRISPRn_RT112$count, Evers_CRISPRn_RT112$design, "before", "after")

A function to show a heatmap sgRNA-level corrleations of the NGS samples.

Description

A function to show a heatmap sgRNA-level corrleations of the NGS samples.

Usage

plot_corr_heatmap(sgcount, df_design, cor_method = "pearson")
plot_corr_heatmap(sgcount, df_design, cor_method = "pearson")

Arguments

`sgcount`	The input matrix contains read counts of sgRNAs for each sample.
`df_design`	The table contains a study design.
`cor_method`	A string parameter of the correlation measure. One of the three - "pearson", "kendall", or "spearman" will be the string.

Value

A pheatmap object contains the correlation heatmap

library(CB2) data(Evers_CRISPRn_RT112) plot_corr_heatmap(Evers_CRISPRn_RT112$count, Evers_CRISPRn_RT112$design)

A function to plot read count distribution.

Description

A function to plot read count distribution.

Usage

plot_count_distribution(sgcount, df_design, add_dots = FALSE)
plot_count_distribution(sgcount, df_design, add_dots = FALSE)

Arguments

`sgcount`	The input matrix contains read counts of sgRNAs for each sample.
`df_design`	The table contains a study design.
`add_dots`	The function will display dots of sgRNA counts if it is set to 'TRUE'.

Value

A ggplot2 object contains a read count distribution plot for 'sgcount'.

Examples

library(CB2)
data(Evers_CRISPRn_RT112)
cpm <- get_CPM(Evers_CRISPRn_RT112$count)
plot_count_distribution(cpm, Evers_CRISPRn_RT112$design)

library(CB2)
data(Evers_CRISPRn_RT112)
cpm <- get_CPM(Evers_CRISPRn_RT112$count)
plot_count_distribution(cpm, Evers_CRISPRn_RT112$design)

A function to visualize dot plots for a gene.

Description

A function to visualize dot plots for a gene.

Usage

plot_dotplot(sgcount, df_design, gene, ge_id = NULL, sg_id = NULL)
plot_dotplot(sgcount, df_design, gene, ge_id = NULL, sg_id = NULL)

Arguments

`sgcount`	The input matrix contains read counts of sgRNAs for each sample.
`df_design`	The table contains a study design.
`gene`	The gene to be shown.
`ge_id`	A name of the column contains gene names.
`sg_id`	A name of the column contains sgRNA IDs.

Value

A ggplot2 object contains dot plots of sgRNA read counts for a gene.

Examples

library(CB2)
data(Evers_CRISPRn_RT112)
plot_dotplot(get_CPM(Evers_CRISPRn_RT112$count), Evers_CRISPRn_RT112$design, "RPS7")

library(CB2)
data(Evers_CRISPRn_RT112)
plot_dotplot(get_CPM(Evers_CRISPRn_RT112$count), Evers_CRISPRn_RT112$design, "RPS7")

A function to plot the first two principal components of samples.

Description

This function will perform a principal component analysis, and it returns a ggplot object of the PCA plot.

Usage

plot_PCA(sgcount, df_design)
plot_PCA(sgcount, df_design)

Arguments

`sgcount`	The input matrix contains read counts of sgRNAs for each sample.
`df_design`	The table contains a study design.

Value

A ggplot2 object contains a PCA plot for the input.

library(CB2) data(Evers_CRISPRn_RT112) plot_PCA(Evers_CRISPRn_RT112$count, Evers_CRISPRn_RT112$design)

A C++ function to quantify sgRNA abundance from NGS samples.

Description

A C++ function to quantify sgRNA abundance from NGS samples.

Usage

quant(ref_path, fastq_path, verbose = FALSE)
quant(ref_path, fastq_path, verbose = FALSE)

Arguments

`ref_path`	the path of the annotation file and it has to be a FASTA formatted file.
`fastq_path`	a list of the FASTQ files.
`verbose`	Display some logs during the quantification if it is set to 'true'.

A function to perform a statistical test at a sgRNA-level, deprecated.

Description

A function to perform a statistical test at a sgRNA-level, deprecated.

Usage

run_estimation(
  sgcount,
  design,
  group_a,
  group_b,
  delim = "_",
  ge_id = NULL,
  sg_id = NULL
)
run_estimation(
  sgcount,
  design,
  group_a,
  group_b,
  delim = "_",
  ge_id = NULL,
  sg_id = NULL
)

Arguments

`sgcount`	This data frame contains read counts of sgRNAs for the samples.
`design`	This table contains study design. It has to contain 'group.'
`group_a`	The first group to be tested.
`group_b`	The second group to be tested.
`delim`	The delimiter between a gene name and a sgRNA ID. It will be used if only rownames contains sgRNA ID.
`ge_id`	The column name of the gene column.
`sg_id`	The column/columns of sgRNA identifiers.

Value

A table contains the sgRNA-level test result, and the table contains these columns:

‘sgRNA’: The sgRNA identifier.
‘gene’: The gene is the target of the sgRNA
‘n_a’: The number of replicates of the first group.
‘n_b’: The number of replicates of the second group.
‘phat_a’: The proportion value of the sgRNA for the first group.
‘phat_b’: The proportion value of the sgRNA for the second group.
‘vhat_a’: The variance of the sgRNA for the first group.
‘vhat_b’: The variance of the sgRNA for the second group.
‘cpm_a’: The mean CPM of the sgRNA within the first group.
‘cpm_b’: The mean CPM of the sgRNA within the second group.
‘logFC’: The log fold change of sgRNA between two groups.
‘t_value’: The value for the t-statistics.
‘df’: The value of the degree of freedom, and will be used to calculate the p-value of the sgRNA.
‘p_ts’: The p-value indicates a difference between the two groups.
‘p_pa’: The p-value indicates enrichment of the first group.
‘p_pb’: The p-value indicates enrichment of the second group.
‘fdr_ts’: The adjusted P-value of ‘p_ts’.
‘fdr_pa’: The adjusted P-value of ‘p_pa’.
‘fdr_pb’: The adjusted P-value of ‘p_pb’.

A function to run a sgRNA quantification algorithm from NGS sample

Description

A function to run a sgRNA quantification algorithm from NGS sample

Usage

run_sgrna_quant(lib_path, design, map_path = NULL, ncores = 1, verbose = FALSE)
run_sgrna_quant(lib_path, design, map_path = NULL, ncores = 1, verbose = FALSE)

Arguments

`lib_path`	The path of the FASTA file.
`design`	A table contains the study design. It must contain 'fastq_path' and 'sample_name.'
`map_path`	The path of file contains gene-sgRNA mapping.
`ncores`	The number that indicates how many processors will be used with a parallelization. The parallelization will be enabled if users do not set the parameter as '-1“ (it means the full physical cores will be used) or greater than '1'.
`verbose`	Display some logs during the quantification if it is set to 'TRUE'

Value

It will return a list, and the list contains three elements. The first element (‘count’) is a data frame contains the result of the quantification for each sample. The second element (‘total’) is a numeric vector contains the total number of reads of each sample. The last element (‘sequence’) a data frame contains the sequence of each sgRNA in the library.

Examples

library(CB2)
library(magrittr)
library(tibble)
library(dplyr)
library(glue)
FASTA <- system.file("extdata", "toydata", "small_sample.fasta", package = "CB2")
ex_path <- system.file("extdata", "toydata", package = "CB2")

df_design <- tribble(
  ~group, ~sample_name,
  "Base", "Base1",  
  "Base", "Base2", 
  "High", "High1",
  "High", "High2") %>% 
    mutate(fastq_path = glue("{ex_path}/{sample_name}.fastq"))

cb2_count <- run_sgrna_quant(FASTA, df_design)

library(CB2)
library(magrittr)
library(tibble)
library(dplyr)
library(glue)
FASTA <- system.file("extdata", "toydata", "small_sample.fasta", package = "CB2")
ex_path <- system.file("extdata", "toydata", package = "CB2")

df_design <- tribble(
  ~group, ~sample_name,
  "Base", "Base1",  
  "Base", "Base2", 
  "High", "High1",
  "High", "High2") %>% 
    mutate(fastq_path = glue("{ex_path}/{sample_name}.fastq"))

cb2_count <- run_sgrna_quant(FASTA, df_design)

A benchmark CRISPRn pooled screen data from Sanson et al.

Description

A benchmark CRISPRn pooled screen data from Sanson et al.

Usage

data(Sanson_CRISPRn_A375)
data(Sanson_CRISPRn_A375)

Format

The data object is a list and contains below information:

count: The count matrix from Sanson et al.'s paper and contains the CRISPRn screening result using A375 cell-line. It contains a sample of plasimd, and three biological replicates after three weeks.
egenes: The list of 1,580 essential genes used in Sanson et al.'s study.
ngenes: The list of 927 non-essential genes used in Sanson et al.'s study.
design: The data.frame contains study design.

Source

https://www.ncbi.nlm.nih.gov/pubmed/30575746

Package 'CB2'

Help Index

A function to calculate the mappabilities of each NGS sample.

Description

Usage

Arguments

Examples

A benchmark CRISPRn pooled screen data from Evers et al.

Description

Usage

Format

Source

Description

Usage

Arguments

A function to normalize sgRNA read counts.

Description

Usage

Arguments

Value

Examples

A function to join a count table and a design table.

Description

Usage

Arguments

Value

Examples

A function to perform gene-level test using a sgRNA-level statistics.

Description

Usage

Arguments

Value

Examples

A function to perform a statistical test at a sgRNA-level

Description

Usage

Arguments

Value

Examples

A function to show a heatmap sgRNA-level corrleations of the NGS samples.

Description

Usage

Arguments

Value

A function to plot read count distribution.

Description

Usage

Arguments

Value

Examples

A function to visualize dot plots for a gene.

Description

Usage

Arguments

Value

Examples

A function to plot the first two principal components of samples.

Description

Usage

Arguments

Value

A C++ function to quantify sgRNA abundance from NGS samples.

Description

Usage

Arguments

A function to perform a statistical test at a sgRNA-level, deprecated.

Description

Usage

Arguments

Value

A function to run a sgRNA quantification algorithm from NGS sample

Description

Usage

Arguments

Value

Examples

A benchmark CRISPRn pooled screen data from Sanson et al.

Description

Usage

Format