---
title: "Quick_Start"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Quick_Start}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
The goal of rTCRBCRr is to process the results from clonotyping tools such as trust, mixcr, and immunoseq to analyze the clonotype repertoire metrics
## Installation
The package is accepted by the [CRAN](https://CRAN.R-project.org), you can install the released version of rTCRBCRr from CRAN with:
```r
install.packages("rTCRBCRr")
```
You can also install the development version from [GitHub](https://github.com/) with:
```r
# install.packages("devtools")
devtools::install_github("sciencepeak/rTCRBCRr")
```
## Example code
### Attach packages
```{r example}
library("rTCRBCRr")
library("magrittr")
library("readr")
```
### Read raw data files (trust generated for example) into a list of data frames
```{r message=FALSE, warning=FALSE}
present_tool <- c("trust", "mixcr")[1]
example_data_directory <- system.file(paste("extdata", present_tool, sep = "/"), package = "rTCRBCRr")
input_paths <- dir(example_data_directory, full.names = TRUE)
input_files <- dir(example_data_directory, full.names = FALSE)
input_files
sample_names <- sub(".tsv.*", "", input_files)
sample_names
raw_clonotype_dataframe_list <- lapply(input_paths, readr::read_tsv) %>%
magrittr::set_names(., value = sample_names)
raw_clonotype_dataframe_list
```
### Tidy up the clonotype dataframes
The tidy-up consists of four steps, namely four functions:
1. format_clonotype_to_immunarch_style
2. remove_nonproductive_CDR3aa
3. annotate_chain_name_and_isotype_name
4. merge_convergent_clonotype
```{r}
# If you only want to test one sample, you can process the only sample as follows.
the_divergent_clonotype_dataframe <- raw_clonotype_dataframe_list[["sample_01"]] %>%
format_clonotype_to_immunarch_style(., clonotyping_tool = present_tool) %>%
remove_nonproductive_CDR3aa %>%
annotate_chain_name_and_isotype_name %>%
merge_convergent_clonotype
# Then the only one sample should be put into a list, element of which uses the sample name,
# because the later step need a named list of data frames as input.
divergent_clonotype_dataframe_list <- list(sample_01 = the_divergent_clonotype_dataframe)
# Otherwise, normally you will have multiple samples,
# then functional style of processing is preferred as follows.
divergent_clonotype_dataframe_list <- raw_clonotype_dataframe_list %>%
lapply(., format_clonotype_to_immunarch_style, clonotyping_tool = present_tool) %>%
lapply(., remove_nonproductive_CDR3aa) %>%
lapply(., annotate_chain_name_and_isotype_name) %>%
lapply(., merge_convergent_clonotype)
```
### Calculate and merge repertoire metrics by chains for each sample in the list
This step consists of three functions.
```{r}
# handle repertoire metrics for all the chains.
all_sample_all_chain_all_metrics_wide_format_dataframe_list <- the_divergent_clonotype_dataframe_list %>%
lapply(., compute_repertoire_metrics_by_chain_name)
all_sample_all_chain_all_metrics_wide_format_dataframe_list
all_sample_all_chain_all_metrics_wide_format_dataframe <- all_sample_all_chain_all_metrics_wide_format_dataframe_list %>%
combine_all_sample_repertoire_metrics
all_sample_all_chain_all_metrics_wide_format_dataframe
all_sample_all_chain_individual_metrics_dataframe_list <- all_sample_all_chain_all_metrics_wide_format_dataframe %>%
get_item_name_x_sample_name_for_each_metric
all_sample_all_chain_individual_metrics_dataframe_list
```
### Calculate and merge repertoire metrics by IGH isotypes for each sample in the list
This step consists of three functions.
```{r}
# handle repertoire metrics all all the isotypes of IGH chain.
all_sample_IGH_chain_all_metrics_wide_format_dataframe_list <- the_divergent_clonotype_dataframe_list %>%
lapply(., calculate_IGH_isotype_proportion)
all_sample_IGH_chain_all_metrics_wide_format_dataframe_list
all_sample_IGH_chain_all_metrics_wide_format_dataframe <- all_sample_IGH_chain_all_metrics_wide_format_dataframe_list %>%
combine_all_sample_repertoire_metrics
all_sample_IGH_chain_all_metrics_wide_format_dataframe
all_sample_IGH_chain_individual_metrics_dataframe_list <- all_sample_IGH_chain_all_metrics_wide_format_dataframe %>%
get_item_name_x_sample_name_for_each_metric
all_sample_IGH_chain_individual_metrics_dataframe_list
```
## Clonotype repertoire metrics formulas
The repertoire metrics formula including richness, diversity (Shannon entropy), evenness (Pielou's eveness), clonality, and median (frequency median) were defined as follows, where $p_i$ is the frequency of ${\rm clonotype}_i$ in a sample with $N$ unique clonotypes ([Khunger, Rytlewski et al. 2019](https://www.tandfonline.com/doi/full/10.1080/2162402X.2019.1652538), [Looney, Topacio-Hall et al. 2020](https://www.frontiersin.org/articles/10.3389/fimmu.2019.02985/full)). $P$ is the frequency vector of unique clonotypes in a sample.
$$
richness\ =\ N
$$
$$
Shannon\ entropy=-\sum_{i=1}^{N}{p_i\log_2{\left(p_i\right)}}
$$
$$
Pielou\prime s\ eveness\ =\ \frac{Shannon\ entropy}{\log_2{N}}
$$
$$
clonality\ =\ 1\ -\ Pielou\prime s\ evenness
$$
$$
frequency\ median\ =\ median(P)
$$
The function `calculate_repertoire_metrics` is essential to implement the repertoire metrics formulas
```{r}
calculate_repertoire_metrics
```
## Acknowledgements
The [hexagon](https://github.com/terinjokes/StickersStandard) logo of the package was created with the help of the package [hexSticker](https://github.com/GuangchuangYu/hexSticker). The math formula was written with the help of recognition tool [MyScript](https://webdemo.myscript.com/). The latex formula in markdown was inspired by [rmd4sci](https://rmd4sci.njtierney.com/math). The code in this study was inspired by the [UCSB R tutorial note](http://traits-dgs.nceas.ucsb.edu/workspace/r/r-tutorial-for-measuring-species-diversity/Measuring Diversity in R.pdf/attachment_download/file), [LymphoSeq script](https://rdrr.io/bioc/LymphoSeq/src/R/clonality.R), and [vegan package](https://cran.r-project.org/package=vegan).