---
title: "Quick_Start"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Quick_Start}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

The goal of rTCRBCRr is to process the results of clonotyping tools such as trust, mixcr, and immunoseq and to analyze clonotype repertoire metrics.

## Installation

The package is available on [CRAN](https://CRAN.R-project.org), so you can install the released version of rTCRBCRr with:

```r
install.packages("rTCRBCRr")
```

You can also install the development version from [GitHub](https://github.com/) with:

```r
# install.packages("devtools")
devtools::install_github("sciencepeak/rTCRBCRr")
```

## Example code

### Attach packages

```{r example}
library("rTCRBCRr")
library("magrittr")
library("readr")
```

### Read raw data files (trust output, as an example) into a list of data frames

```{r message=FALSE, warning=FALSE}
# Choose which bundled example data set to read: "trust" or "mixcr".
present_tool <- c("trust", "mixcr")[1]

example_data_directory <- system.file(paste("extdata", present_tool, sep = "/"),
                                      package = "rTCRBCRr")

input_paths <- dir(example_data_directory, full.names = TRUE)
input_files <- dir(example_data_directory, full.names = FALSE)
input_files

# Derive sample names by stripping the ".tsv" extension and anything after it.
sample_names <- sub("\\.tsv.*$", "", input_files)
sample_names

raw_clonotype_dataframe_list <- lapply(input_paths, readr::read_tsv) %>%
  magrittr::set_names(., value = sample_names)
raw_clonotype_dataframe_list
```

### Tidy up the clonotype data frames

The tidy-up consists of four steps, each performed by one function:

1. `format_clonotype_to_immunarch_style`
2. `remove_nonproductive_CDR3aa`
3. `annotate_chain_name_and_isotype_name`
4. `merge_convergent_clonotype`

```{r}
# If you only want to test one sample, you can process that sample alone as follows.
the_divergent_clonotype_dataframe <- raw_clonotype_dataframe_list[["sample_01"]] %>%
  format_clonotype_to_immunarch_style(., clonotyping_tool = present_tool) %>%
  remove_nonproductive_CDR3aa %>%
  annotate_chain_name_and_isotype_name %>%
  merge_convergent_clonotype

# Then put the single sample into a list whose element is named after the sample,
# because the later steps need a named list of data frames as input.
divergent_clonotype_dataframe_list <- list(sample_01 = the_divergent_clonotype_dataframe)

# Normally you will have multiple samples,
# in which case a functional style of processing is preferred, as follows.
divergent_clonotype_dataframe_list <- raw_clonotype_dataframe_list %>%
  lapply(., format_clonotype_to_immunarch_style, clonotyping_tool = present_tool) %>%
  lapply(., remove_nonproductive_CDR3aa) %>%
  lapply(., annotate_chain_name_and_isotype_name) %>%
  lapply(., merge_convergent_clonotype)
```

### Calculate and merge repertoire metrics by chain for each sample in the list

This step uses three functions.

```{r}
# Handle repertoire metrics for all the chains.
all_sample_all_chain_all_metrics_wide_format_dataframe_list <- divergent_clonotype_dataframe_list %>%
  lapply(., compute_repertoire_metrics_by_chain_name)
all_sample_all_chain_all_metrics_wide_format_dataframe_list

all_sample_all_chain_all_metrics_wide_format_dataframe <- all_sample_all_chain_all_metrics_wide_format_dataframe_list %>%
  combine_all_sample_repertoire_metrics
all_sample_all_chain_all_metrics_wide_format_dataframe

all_sample_all_chain_individual_metrics_dataframe_list <- all_sample_all_chain_all_metrics_wide_format_dataframe %>%
  get_item_name_x_sample_name_for_each_metric
all_sample_all_chain_individual_metrics_dataframe_list
```

### Calculate and merge repertoire metrics by IGH isotype for each sample in the list

This step uses three functions.

```{r}
# Handle repertoire metrics for all the isotypes of the IGH chain.
all_sample_IGH_chain_all_metrics_wide_format_dataframe_list <- divergent_clonotype_dataframe_list %>%
  lapply(., calculate_IGH_isotype_proportion)
all_sample_IGH_chain_all_metrics_wide_format_dataframe_list

all_sample_IGH_chain_all_metrics_wide_format_dataframe <- all_sample_IGH_chain_all_metrics_wide_format_dataframe_list %>%
  combine_all_sample_repertoire_metrics
all_sample_IGH_chain_all_metrics_wide_format_dataframe

all_sample_IGH_chain_individual_metrics_dataframe_list <- all_sample_IGH_chain_all_metrics_wide_format_dataframe %>%
  get_item_name_x_sample_name_for_each_metric
all_sample_IGH_chain_individual_metrics_dataframe_list
```

## Clonotype repertoire metrics formulas

The repertoire metrics, namely richness, diversity (Shannon entropy), evenness (Pielou's evenness), clonality, and median (frequency median), are defined as follows, where $p_i$ is the frequency of ${\rm clonotype}_i$ in a sample with $N$ unique clonotypes ([Khunger, Rytlewski et al. 2019](https://www.tandfonline.com/doi/full/10.1080/2162402X.2019.1652538), [Looney, Topacio-Hall et al. 2020](https://www.frontiersin.org/articles/10.3389/fimmu.2019.02985/full)), and $P = (p_1, \ldots, p_N)$ is the frequency vector of the unique clonotypes in the sample.

$$
\text{richness} = N
$$

$$
\text{Shannon entropy} = -\sum_{i=1}^{N} p_i \log_2\left(p_i\right)
$$

$$
\text{Pielou's evenness} = \frac{\text{Shannon entropy}}{\log_2 N}
$$

$$
\text{clonality} = 1 - \text{Pielou's evenness}
$$

$$
\text{frequency median} = \mathrm{median}(P)
$$

The function `calculate_repertoire_metrics` implements these formulas; printing it shows its source:

```{r}
calculate_repertoire_metrics
```

## Acknowledgements

The [hexagon](https://github.com/terinjokes/StickersStandard) logo of the package was created with the help of the package [hexSticker](https://github.com/GuangchuangYu/hexSticker). The math formulas were written with the help of the handwriting recognition tool [MyScript](https://webdemo.myscript.com/).
The LaTeX formulas in Markdown were inspired by [rmd4sci](https://rmd4sci.njtierney.com/math). The code in this package was inspired by the [UCSB R tutorial note](http://traits-dgs.nceas.ucsb.edu/workspace/r/r-tutorial-for-measuring-species-diversity/Measuring%20Diversity%20in%20R.pdf/attachment_download/file), the [LymphoSeq script](https://rdrr.io/bioc/LymphoSeq/src/R/clonality.R), and the [vegan package](https://cran.r-project.org/package=vegan).
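## Appendix: a base R sketch of the metrics

To make the formulas in the *Clonotype repertoire metrics formulas* section concrete, here is a minimal base R sketch that computes them from a vector of clonotype counts. It is an illustration under stated assumptions, not the package's `calculate_repertoire_metrics()`. The helper name `compute_metrics_sketch()` and the toy samples are hypothetical. The input is assumed to be raw counts, one per unique clonotype, which are converted to frequencies $p_i$. Evenness is returned as `NA` when $N = 1$ because $\log_2 1 = 0$ leaves the ratio undefined; the package may handle that edge case differently.

```r
# Hypothetical helper: computes the repertoire metrics defined above from a
# numeric vector of counts, one entry per unique clonotype in one sample.
# This is an illustrative sketch, not the package's calculate_repertoire_metrics().
compute_metrics_sketch <- function(clone_counts) {
  clone_counts <- clone_counts[clone_counts > 0]  # drop clonotypes with zero count
  p <- clone_counts / sum(clone_counts)           # frequency vector P
  n <- length(p)                                  # richness N
  shannon <- -sum(p * log2(p))                    # Shannon entropy in bits
  # Pielou's evenness is undefined for N = 1 (log2(1) = 0), so return NA there.
  evenness <- if (n > 1) shannon / log2(n) else NA_real_
  c(
    richness         = n,
    shannon_entropy  = shannon,
    pielou_evenness  = evenness,
    clonality        = 1 - evenness,
    frequency_median = median(p)
  )
}

# Hypothetical toy input: a perfectly even sample and one dominated by a single clonotype.
toy_samples <- list(
  sample_01 = c(10, 10, 10, 10),
  sample_02 = c(97, 1, 1, 1)
)

# Stack the per-sample results into a matrix: one row per sample, one column per metric.
toy_metrics <- do.call(rbind, lapply(toy_samples, compute_metrics_sketch))
toy_metrics
```

For `sample_01` every $p_i = 0.25$, so the Shannon entropy is $\log_2 4 = 2$, Pielou's evenness is 1 and clonality is 0. The single dominant clonotype in `sample_02` gives a clonality of about 0.88. The final `do.call(rbind, ...)` only stacks the named vectors into a sample-by-metric matrix; it is not meant to reproduce the output format of `combine_all_sample_repertoire_metrics`.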