--- title: "Welcome to _anabel_: ANAlysis of Binding Events + l" output: rmarkdown::html_document: keep_md: TRUE theme: lumen highlight: rstudio toc: true toc_depth: 2 toc_float: true fig_width: 6 fig_height: 4 author: Hoor Al-Hasani, Oliver Selinger, Stefan Kraemer authors: - name: Hoor Al-Hasani orcid: 0000-0002-0431-845X affiliation: 1 - name: Oliver Selinger orcid: 0000-0001-9723-2809 affiliation: 1 - name: Stefan Kraemer orcid: 0000-0002-0071-9344 affiliation: 1 affiliations: - name: BioCopy GmbH index: 1 date: "Last compiled on `r format(Sys.time(), '%B, %Y')` " bibliography: paper.bib #`tango`, `pygments`, `kate`, `monochrome`, `espresso`, `zenburn`, `haddock`, `breezedark`, `textmate`, `arrow`, or `rstudio` or a file with extension `.theme`. abstract: > _anabel_ is a free software for kinetics-fit analysis of 1:1 biomolecular interactions for Single-Curve-Analysis (SCA), Single-Cycle-Kinetics (SCK), and Multi-Cycle-Kinetics (MCK) injection strategies. It supports exported kinetic datasets from Biacore, BLI, Score, and an open data format, providing a user-friendly interface for non R-users ([check the online version](https://anabel.biocopy.de/)). Funded by [BioCopy GmbH](https://www.biocopy.com/), _anabel_ is a valuable tool for researchers seeking a streamlined analysis process. vignette: > %\VignetteIndexEntry{Welcome to _anabel_: ANAlysis of Binding Events + l} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set(collapse = TRUE, comment = "#>", width = 68) ``` ```{r logo, echo=FALSE, warning=FALSE, message=FALSE} biocopy_colors <- c("#958BB2", "#C61E19", "#99CFE9", "#A2C510", "#FBB800") library(ggplot2) library(dplyr) if (!rlang::is_installed("htmltools")) install.packages("htmltools") ``` # Summary *anabel* aims to simplify the analysis of binding-curve fitting for scientists of different backgrounds, while minimizing user influence [@anabel1; @anabel2]. With the function `run_anabel`, which supports three different modes, estimating kinetics constants is a straightforward task. The user can select the mode that is most appropriate for their experimental setup. Please note that this vignette assumes a basic understanding of real-time label-free biomolecular interactions. For more information and an introduction to the theoretical background, please refer to the [online version](https://anabel.biocopy.de/). # Getting started Installing *anabel* within `R` is similar to any other R package either using `install.packages` or `devtools::install`. Either way you choose, make sure to set `dependencies = TRUE`. The core of *anabel* includes some packages commonly used for everyday data analysis, such as `ggplot2, dplyr, purrr, reshape2`. Once the installation is successful, you could start using *anabel* as follows: ```{r load, message=FALSE} library(anabel) packageVersion("anabel") ``` ## Input *anabel* accepts sensogram input in the form of an Excel or CSV file, or as a data frame. If providing a file, the full path must be specified, or anabel will attempt to read from the working directory. The input data must be in numeric table format with a column dedicated to time. This column can have any name and use any R-approved symbols, as long as it contains the keyword 'time' (see exemplary datasets). To specify the spots/sample names for the final results (tables + plots), you can provide an additional table with an 'ID' column containing the exact column names from the sensogram tables (except for the time-column), and a 'Name' column for mapping. Please note that 'ID' and 'Name' are reserved column names, and *anabel* will ignore the file if they are not present. ## Exemplary datasets - I To run this tutorial, we will use simulated data that mimics typical 1:1 kinetics interactions. This data is available through *anabel*: ```{r dataset_normal} data("SCA_dataset") data("MCK_dataset") data("SCK_dataset") ``` To view the help page for anabel and the dataset, use the following command: ```{r help_pages, eval=FALSE} help(package = "anabel") ?SCA_dataset ?MCK_dataset ?SCK_dataset ``` All datasets that are used in this tutorial were generated using the `Biacore™ Simul8 – SPR sensorgram simulation tool (Simul8)` [@simul8] ## Functions *anabel* currently offers two main functions, each with a help page that includes code examples: ```{r func, eval=FALSE} ?convert_toMolar() # show help page ?run_anabel() # show help page ``` The main function of *anabel* is `run_anabel`, which analyzes sensograms of 1:1 biomolecular interactions using three different modes: Single-curve analysis (SCA), Multi-cycle kinetics (MCK), and Single-cycle kinetics (SCK). Additionally, the `convert_toMolar` function converts the analyte concentration unit into molar, supporting units such as nanomolar (nm), millimolar (mm), micromolar (µM), and picomolar (pm). This function is case-insensitive and accepts variations such as nM, NM, nanomolar, and Nanomolar. In the following section (Analyte concentration), we explain how to use this function. ## Analyte concentration {#ac} The first step is to convert the value of analyte-concentration into molar: ```{r convert_2molar} # one value in case of SCA method ac <- convert_toMolar(val = 50, unit = "nM") # vector in case of SCK and MCK methods ac_mck <- convert_toMolar(val = c(50, 16.7, 5.56, 1.85, 6.17e-1), unit = "nM") ac_sck <- convert_toMolar(val = c(6.17e-1, 1.85, 5.56, 16.7, 50), unit = "nM") ``` # Supported models {.tabset .tabset-fade} ```{r models, echo=FALSE, eval=FALSE} htmltools::img( src = knitr::image_uri("vignettes/strategies.png"), alt = "models", style = "padding:10px;width:100%; border:0" ) ``` ## Single-curve analysis (SCA) {#sca} The parameters of `SCA_dataset` are as follows: ```{r sca_input, echo=FALSE, fig.dim=c(5,3), fig.align='center'} myTable <- data.frame( Curve = paste("Sample.", LETTERS[1:3]), Ka = c(1e+6, 1e+6, 1e+6), Kd = c(1e-2, 5e-2, 1e-3), Conc = rep("50nM", 3), tass = rep(50, 3), tdiss = rep(200, 3) ) myTable$Expected_KD <- myTable$Kd / myTable$Ka kableExtra::kable(myTable) %>% kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F) ``` For example, Sample.A looks as follow: ```{r sca_rev, echo=FALSE, fig.dim=c(5,3), fig.align='center'} ggplot(SCA_dataset, aes(x = Time)) + geom_point(aes(y = Sample.A), col = "#A2C510") + geom_vline(xintercept = 50, linetype = 2) + geom_vline(xintercept = 200, linetype = 2) + theme_minimal() ``` By default, *anabel* runs in *SCA* mode. Before using the function, make sure that the input data meet the following requirements: - The data must contain a column with time values. The name of the column can be anything, as long as it contains the word "time" (case insensitive). - The association and dissociation time points must have single values (tass and tdiss, respectively). - The time points should be logically valid, i.e., *tstart \< tass \< tdiss \< tend*. - The analyte concentration should have a single value. > The starting and ending time of the experiment are always single value, unlike the value of analyte concentration or association/dissociation time, these parameters are specific to the model. > Missing start or/and end of experiment time (tstart & tend resp.) are allowed, the values will be taken from the provided data. > check ?run_anabel to get full description of each parameter ```{r sca} sca_rslt <- run_anabel(SCA_dataset, tass = 50, tdiss = 200, conc = ac) ``` By default, the command creates a list of two data frames: - kinetics: contains the estimated kinetics constants for each binding curve - fit_data: contains the response data (original) with the fitted value the kinetics table for this method contains the following information: ```{r sca_kntk, echo=FALSE} knitr::kable(sca_rslt$kinetics) %>% kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F) %>% kableExtra::scroll_box(width = "100%") ``` One way to visualize the results: ```{r sca_rslt, fig.align='center', fig.dim=c(8,4)} ggplot(sca_rslt$fit_data, aes(x = Time)) + geom_point(aes(y = Response), col = "#A2C510") + geom_path(aes(y = fit)) + facet_wrap(~Name, ncol = 2, scales = "free") + theme_light() ``` ## Multi-cycle kinetics (MCK) {#mck} The *MCK* method is the most common method used for analyzing biomolecular interactions, and it involves injecting different analyte concentrations in independent cycles. We can use the simulated data provided in the `MCK_dataset` to demonstrate how to analyze similar data with *anabel*. The data was created using the following parameters: ```{r params_mck, out.width="30%", echo=FALSE,message=F} myTable <- data.frame( "tass" = 45, "tdiss" = 145, "Kass" = "1e+7nM", "Kdiss" = "1e-2", "KD" = 1e-2 / 1e+7, "Conc" = "50, 16.7, 5.56, 1.85, 6.17e-1" ) kableExtra::kable(myTable) %>% kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = FALSE) ``` ```{r mck_prev, echo=FALSE, fig.dim=c(5.5,3), fig.align='center', warning=FALSE, message=FALSE} temp <- MCK_dataset %>% tidyr::pivot_longer(!Time, names_to = "conc", values_to = "Responce") temp$analyte <- gsub("Conc\\.+", "", temp$conc) ggplot(temp, aes(x = Time)) + geom_point(aes(y = Responce, col = analyte)) + geom_vline(xintercept = 50, linetype = 2) + geom_vline(xintercept = 150, linetype = 2) + theme_light() + scale_color_manual(values = biocopy_colors) + theme(legend.position = "bottom") ``` The `MCK` method assumes that each column in the input table represents one cycle with a different analyte concentration. Ideally, the values of the concentration should be different, but *anabel* will not throw an error if the same value is given to multiple cycles. However, it is the user's responsibility to check the validity of the input at this point. As with `SCA`, make sure that the following conditions hold: - The table contains data for one sample. - The input data must have a column containing the time value. - There is a single time value for each of association and dissociation (tass & tdiss, respectively). - The time points are logically valid, i.e. *tstart \< tass \< tdiss \< tend*. - There are multiple values for the analyte concentration. - The number of given analyte concentrations should equal the number of columns - 1 in the given table (e.g. the `MCK_dataset` requires 5 of each). - The order of the analyte concentrations must match the data. ```{r mck} mck_rslt <- run_anabel(MCK_dataset, tass = 45, tdiss = 145, conc = ac_mck, method = "MCK") ``` > the order of the given analyte concentration should match the columns in the sensogram table. In case of `MCK_dataset`, the value of analyte concentration is decreasing therefore the input starts from 50 down to 6.1e-7. > the estimated kinetics constants in the `kinetics` table are named accoriding to the parameter that was used in the fitting plus the cycle number (e.g. tass_1). > the fitting was successful as no boundaries were violated (columns ParamsQualitySummary & FittingQ ) ```{r mck kntks, echo=FALSE} knitr::kable(mck_rslt$kinetics) %>% kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F) %>% kableExtra::scroll_box(width = "100%") ``` You can visualize the fitting results using the fit_data table. ```{r mck_rslts, fig.align='center', fig.dim=c(7,3)} ggplot(mck_rslt$fit_data, aes(x = Time, group = Name)) + geom_point(aes(y = Response), col = "#A2C510") + geom_path(aes(y = fit)) + theme_light() ``` Compared to the SCA method, the MCK method generates a slightly different output: **it does not generate a report**. ## Single-cycle kinetics (SCK) {#sck} `SCK` is a fitting mode used when in the experimental setup, the analyte concentration is titrated while increasing the concentration with only a short or even without a regeneration step in between. The simulated data `SCK_dataset` was generated with the following parameters: ```{r params_sck, out.width="20%", echo=FALSE,message=F} myTable <- data.frame( Param = c("Conc", "tass", "tdiss"), Step1 = c(6.17e-1, 35, 145), Step2 = c(1.85, 205, 315), Step3 = c(5.56, 375, 485), Step4 = c(16.7, 545, 655), Step5 = c(50, 715, 825) ) kableExtra::kable(myTable) %>% kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = FALSE) ``` Overall `Kass = 1e+6nM` and `Kdiss = 1e-2nM`, therefore, the expected is `KD = 1e-08`. To analyze a dataset with the SCK method, the input should include the following: - A vector of the different analyte concentrations used in the titration - A vector of different time points for each injection step. Specifically, two time points should be included for each step: one for association and one for dissociation. ```{r sck_prev, echo=FALSE, fig.dim=c(8,3), fig.align='center'} ggplot(SCK_dataset, aes(x = Time)) + geom_point(aes(y = Sample.A), size = 1, col = "#3373A1") + geom_vline(xintercept = c(35, 375, 715), linetype = 2, linewidth = 1, col = "#F08000") + # ta geom_vline(xintercept = c(145, 485, 825), linetype = 2, linewidth = 1, col = "#F08000") + # td geom_vline(xintercept = c(205, 545), linetype = 2, linewidth = 1, col = "#A2C510") + # ta geom_vline(xintercept = c(315, 655), linetype = 2, linewidth = 1, col = "#A2C510") + # td theme_minimal() + scale_x_continuous(breaks = seq(0, max(SCK_dataset$Time), 150)) ``` To analyse this dataset with *anabel* use the following: ```{r sck} sck_rslt <- run_anabel(SCK_dataset, tass = c(35, 205, 375, 545, 715), tdiss = c(145, 315, 485, 655, 825), conc = ac_sck, method = "SCK" ) ``` and the kinetics table: ```{r sck_kntks, echo =FALSE} knitr::kable(sck_rslt$kinetics) %>% kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F) %>% kableExtra::scroll_box(width = "100%") ``` and to visualize the outcome: ```{r sck_rslt, fig.align='center', fig.dim=c(8,3)} ggplot(sck_rslt$fit_data, aes(x = Time)) + geom_point(aes(y = Response), col = "#A2C510") + geom_path(aes(y = fit)) + facet_wrap(~Name, ncol = 2) + theme_light() ``` # Model correction Baseline drift and surface decay are common experimental issues that can affect the estimation of kinetics from sensograms. *anabel* includes features to correct for these problems. In the following sections, we will demonstrate how to handle these cases using three datasets that suffer from either surface decay or drift. The datasets are named according to the type of problem and the method used for correction. ## Exemplary datasets - II {.tabset .tabset-fade} ```{r dataset_cor} data("MCK_dataset_drift") # multi cycle kinetics experiment with baseline drift data("SCA_dataset_drift") # single curve analysis with baseline drift data("SCK_dataset_decay") # single cycle kinetics with exponentional decay ``` ## Linear drift {.tabset .tabset-fade} ### SCA First, lets look at the data: ```{r sca_drift_1, echo=FALSE, fig.dim=c(9,4), fig.align='center'} df <- tidyr::pivot_longer(SCA_dataset_drift, cols = contains("Sample")) ggplot(df, aes(Time, value)) + geom_point(aes(col = name), size = 0.5) + geom_vline(xintercept = c(50, 200), linetype = 2, linewidth = 0.5) + theme_light() + labs(y = "Response") + theme(legend.position = "bottom") + scale_x_continuous(breaks = seq(0, 1000, 100)) + scale_color_manual(values = biocopy_colors) + facet_wrap(~name, ncol = 2, scales = "free_y") + theme(legend.position = "none") + ggtitle("Five SCA sensograms with linear drift = -0.019") ``` to analyse this data, apply the drift correction when calling `run_anabel` and visualize the results yourself if you didn't let *anabel* generate the output ```{r sca_drift, fig.align='center', fig.dim=c(9,4)} sca_rslt_drift <- run_anabel(SCA_dataset_drift, tass = 50, tdiss = 200, conc = ac, drift = TRUE) ggplot(sca_rslt_drift$fit_data, aes(x = Time)) + geom_point(aes(y = Response), col = "#A2C510") + geom_path(aes(y = fit)) + facet_wrap(~Name, ncol = 2) + theme_light() ``` ### MCK to analyse the MCK data with linear drift, apply the drift correction when calling `run_anabel`: ```{r mck_drift, fig.align='center', fig.dim=c(8,3)} mck_rslt_drift <- run_anabel(MCK_dataset_drift, tass = 45, tdiss = 145, conc = ac_mck, drift = TRUE, method = "MCK") ggplot(mck_rslt_drift$fit_data, aes(x = Time, group = Name)) + geom_point(aes(y = Response), col = "#A2C510") + geom_path(aes(y = fit)) + theme_light() + ggtitle("MCK five sensogram with linear drift = -0.01") ``` ## Exponential decay The simulated `SCK_dataset` including an exponential decay component looks as follows: ```{r sck_decay, echo=FALSE, fig.dim=c(8,3), fig.align='center'} df <- tidyr::pivot_longer(SCK_dataset_decay, cols = contains("Sample")) ggplot(df, aes(Time, value)) + geom_point(size = 0.2, col = "#3373A1") + geom_vline(xintercept = c(50, 390, 730), linetype = 2, linewidth = 1, col = "#F08000") + # ta geom_vline(xintercept = c(150, 490, 830), linetype = 2, linewidth = 1, col = "#F08000") + # td geom_vline(xintercept = c(220, 560), linetype = 2, linewidth = 1, col = "#A2C510") + # ta geom_vline(xintercept = c(320, 660), linetype = 2, linewidth = 1, col = "#A2C510") + # td theme(legend.position = "none") + facet_wrap(~name, ncol = 2) + theme_light() + ggtitle("Five SCK sensograms with exponential decay") ``` ```{r sck_decay_rslts, fig.align='center', fig.dim=c(8,3)} sck_rslt_decay <- run_anabel(SCK_dataset_decay, tass = c(35, 205, 375, 545, 715), tdiss = c(145, 315, 485, 655, 825), conc = ac_sck, method = "SCK", decay = TRUE ) ggplot(sck_rslt_decay$fit_data, aes(x = Time)) + geom_point(aes(y = Response), col = "#A2C510") + geom_path(aes(y = fit)) + facet_wrap(~Name, ncol = 2) + theme_light() ``` # Debug mode This mode is useful for users with a background in model optimization who want to understand the fitting model used by *anabel*. To enable debug mode, set `debug_mode = TRUE` when running the `run_anabel()` function. When the `debug_mode` parameter is set to TRUE, *anabel* will generate additional data frame that provide more information on the fitting process: - `init_df`: contains the initial values of the fitting parameters for each binding curve. ```{r ff, eval=FALSE} # call anabel in debug mode with sca data set my_data <- run_anabel(SCA_dataset, tass = 50, tdiss = 200, conc = ac, debug_mode = TRUE) init_df <- my_data$init_df # extract information of the first curve (Sample.A) response <- init_df$Response[1] %>% strsplit(",") %>% unlist() %>% as.numeric() # create a temp data frame containing both original value 'Value' and the estimated one 'Response' sampleA_df <- data.frame( Time = SCA_dataset$Time, Value = SCA_dataset$Sample.A, Response = response ) # Generate the plot associated with this curve ggplot(sampleA_df, aes(x = Time)) + geom_point(aes(y = Value), col = "#A2C510", size = 0.5) + geom_line(aes(y = Response)) + theme_light() ``` # Output options You can save *anabel*'s fitting results by setting the option `generate_output = "all"` and specifying the output directory `outdir.` The following outcome will be saved in the specified directory: - tables of kinetics and fit results (default as xlsx tables, and could be saved in other formats, see `?run_anabel`) - pdf file containing the fitting plot - SCA and SCK methods: a report file (html format) that summarizes the results If you only want specific output, you can set any of the associated options `generate_Plots`, `generate_Tables`, `generate_Report` to `TRUE.` If any of these options are `TRUE`, you must set the `generate_output` option to `customized`. > `generate_output` overwrits all other flags, its default value is "none", i.e. nothing is generated. Therefore, changing the other options without changing it will always be ignored. # Design principles & support The main goal of *anabel* is to support the scientific community for free and establish unified standards for kinetics analysis. It is continuously updated to ensure its usefulness for a variety of instruments. You can stay updated on the latest news on the *anabel* website at . If you have any questions, suggestions, or bug reports, you can contact the *anabel* team at [anabel\@biocopy.de](mailto:anabel@biocopy.de){.email}. To help the *anabel* team process your request more efficiently, please make sure to include specific keywords in the subject line of your email. If you encountered an error, use the keyword **Error**. If the run was successful but the results are incorrect, use the keyword **Bug**. If you need help with something specific about your data, use the keyword **Help**. If you are requesting a new feature or plan to use *anabel* in a commercial workflow, use the keyword **Request**. Additionally, please include a reproducible example of the problem in your email. # Acknowledgments and licensing *anabel* the package and the online tool are supported by [BioCopy GmBH](https://www.biocopy.com/). The package could be re-distributed and/or modified under the terms of the General Public License (GNU) as published by the Free Software Foundation (under any version). **For commercial use please contact the *anabel* team.** ````{=html} ```` # References ```{=html} ```