--- title: "Introduction and Quickstart for Seven Bridges API R Client" date: "`r Sys.Date()`" output: rmarkdown::html_document: toc: true toc_float: true toc_depth: 4 number_sections: true theme: "flatly" highlight: "textmate" css: "sevenbridges.css" vignette: > %\VignetteEngine{knitr::rmarkdown} %\VignetteIndexEntry{Introduction and Quickstart for Seven Bridges API R Client} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = FALSE ) ``` `Important note:` This is a work-in-progress project to update the `sevenbridges2` package. Accordingly, this vignette will also change as new features are implemented. # Introduction `sevenbridges2` is an R package that provides an interface for the Seven Bridges public API. The supported platforms include the [Seven Bridges Platform](https://igor.sbgenomics.com/), [Cancer Genomics Cloud (CGC)](https://www.cancergenomicscloud.org), [BioData Catalyst (BDC)](https://platform.sb.biodatacatalyst.nhlbi.nih.gov/) and [CAVATICA](https://cavatica.sbgenomics.com). Learn more from our documentation on the [Seven Bridges Platform](https://docs.sevenbridges.com/page/api), [Cancer Genomics Cloud (CGC)](https://docs.cancergenomicscloud.org/v1.0/page/the-cgc-api), [BioData Catalyst (BDC)](https://sb-biodatacatalyst.readme.io/reference/new-1) and [CAVATICA](https://docs.cavatica.org/reference/the-api). Unlike the current `sevenbridges` package that is built on top of [Reference classes](http://adv-r.had.co.nz/R5.html), the `sevenbridges2` package is based on more modern and lightweight [R6](https://adv-r.hadley.nz/r6.html) classes. However, the basic idea and way of constructing API requests is largely preserved. ## R Client for the Seven Bridges API In order to use the `sevenbridges2` package users must authenticate themselves first by creating an Auth object and providing necessary credentials. You can read more about the authentication types in our next chapters. The `sevenbridges2` package only supports v2+ versions of the API, since versions prior to v2 are not compatible with the Common Workflow Language (CWL). This package provides a simple interface for accessing and trying out various methods. ## Installation The `sevenbridges2` package is available on CRAN and Seven Bridges Github repository. To install it from `CRAN`, use simply: ```{r} # Install package from CRAN install.packages("sevenbridges2") ``` To install the development version from the `develop` branch on our Github, use the `remotes` package: ```{r} # Install package from github remotes::install_github( "sbg/sevenbridges2", build_vignettes = TRUE, dependencies = TRUE ) ``` If you have trouble with `pandoc` and do not want to install it, set `build_vignettes = FALSE` to avoid the vignettes build. ## API General Information There are two ways of constructing API calls. For instance, you can use low-level API calls which use arguments like `path`, `query`, and `body`. These are documented in the API reference libraries for the [Seven Bridges Platform](https://docs.sevenbridges.com/reference#list-all-api-paths) and the [CGC](https://docs.cancergenomicscloud.org/docs/new-1). An example of a low-level request to "list all projects" is shown below. In this request, you can also pass `query` and `body` as a list. ```{r} # Load the package library("sevenbridges2") # Authenticate a <- Auth$new(token = "", platform = "aws-us") # List all projects with raw api() function a$api(path = "projects", method = "GET") ``` ***(Advanced user option)*** The second way of constructing an API request is to directly use the [httr2](https://httr2.r-lib.org/) package to make your API calls. The `sevenbridges2` package is organized by main resources from the Seven Bridges API reference. There we have groups of endpoints to work with projects, files, apps, tasks, invoices, volumes, etc. For each group of resources, there is a set of operations such as `query()`, `get()` and `delete()` which are common, as well as other custom operations. Before we start, keep in mind the following: __`offset` and `limit`__ Almost every API call accepts two arguments named `offset` and `limit`. - Offset defines where the retrieved items start. - Limit defines the number of items you want to get. By default, `offset` is set to `0` and `limit` is set to `50`. As such, your API request returns the __first 50 items__ when you list items or search for items by name. To search and list all items, use `complete = TRUE` if you are using the core `api()` function in your API request, or the `all()` operation within the `Collection` object you've received as the result. __`Collection`__ Every API call that returns a list of items (usually the output from `query()` operations), operations), like fetching projects, files, apps etc, wraps the results into a general `Collection` class object containing the `items` field from which users may access the items returned. Additional options that the `Collection` class offers are to navigate between pages of results, for example, to load next or previous page of results by calling `next_page()` and `prev_page()` methods. Moreover, users can fetch all results using `Collection`'s `all()` method which is a shortcut to send multiple API calls for each next page and collect all results. Keep in mind the `limit` used, as well as the API rate limit. ```{r} # Create a collection of files public_files <- a$files$query(project = "admin/sbg-public-data") # Load next 50 results public_files$next_page() # Load previous 50 results public_files$prev_page() # Load all results public_files$all() ``` Lastly, printing `Collection` objects will print the first 10 items (if there are more than 10 items in the results) by default, but this can be changed with the `n` parameter in its `print()` function: ```{r} # Create a collection of files public_files <- a$files$query(project = "admin/sbg-public-data") # Default print public_files # Print 20 items public_files$print(n = 20) ``` __Search by ID__ When searching by ID (usually it's the resource's `get()` operation), your request will return your exact resource as it is unique. Therefore, you do not have to set `offset` and `limit` manually. It is good practice to find your resources by their ID and pass this ID as an input to your task. You can find a resource's ID in the final part of the URL in the visual interface or via API requests to list resources or get a resource's details. __Search by name__ Search by name as criteria in the `query()` operations of Resources, returns all exact or partial matches depending on the resource. For example, to list all public files, use the `admin/sbg-public-data` project query parameter, while if you want to find an exact file by name, set its `name` parameter to the exact value (partial search by name is not possible for files). ```{r} # Search all public files public_files <- a$files$query(project = "admin/sbg-public-data") # Search files by name file_1000G_omni <- a$files$query( project = "admin/sbg-public-data", name = "1000G_omni2.5.b37.vcf" ) ``` On the other hand, partial search by name works for Projects and Apps resources. You can set the corresponding `name` or `query_terms` parameters for this use case. In order to query public apps, set the `visibility` parameter to 'public'. ```{r} # Search all public apps containing the STAR term public_star_apps <- a$apps$query( visibility = "public", query_terms = list("STAR") ) # Search all projects that contain "demo" in the name demo_projs <- a$projects$query(name = "demo") ```

top

**Get information about a user** This call returns information about the specified user. Note that currently you can view only your own user information, so this call is equivalent to the [call to get information about your account](#youruser). ```{r} # Get user info a$user(username = "") ```

top

Please check `vignette("Authentication_and_Billing", package = "sevenbridges2")` for more technical details about getting user information. ## Rate Limit This call returns information about your current rate limit. This is the number of API calls you can make in five minutes. This call also returns information about your current instance limit. ```{r} # Get rate limit info a$rate_limit() ``` ``` ── Rate Limit ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── • rate • limit: 1000 • remaining: 1000 • reset: 2022-12-26 11:31:01 CET • instance • limit: 25 • remaining: 25 ```

top

Please check `vignette("Authentication_and_Billing", package = "sevenbridges2")` for more technical details about rate limit information. ## Show Billing Information Each project must have a Billing Group associated with it. This Billing Group pays for the storage and computation in the project. For example, your first project(s) were created with the free funds from the Pilot Funds Billing Group assigned to each user at sign-up. To get information about your billing groups: ```{r} # Check your billing info a$billing_groups$query() ``` This call lists all your billing groups, including groups that are pending or have been disabled. To get information about your invoices: ```{r} # Check your invoices a$invoices$query() ``` The call returns information about all your available invoices, unless you use the `billing_group_id` query parameter to specify the ID of a particular billing group, in which case it will return the invoice incurred by that billing group only. To get detailed information for a specific billing group, please use the billing_group method with the billing group ID. The information returned includes the billing group owner, the total balance, and the status of the billing group (pending, disabled,...). ```{r} # Get a single billing group a$billing_groups$get(id = "") ``` ``` ── Billing group info ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── • disabled: FALSE • pending: FALSE • type: regular • name: My billing group • owner: • id: • href: https://api.sbgenomics.com/v2/billing/groups/ • balance • currency: USD • amount: 221 ```

top

Please check `vignette("Authentication_and_Billing", package = "sevenbridges2")` for more technical details about billing informations. ## List and query projects Projects are the core building blocks of the platform. Each project corresponds to a distinct scientific investigation, serving as a container for its data, analysis tools, results, and collaborators. In order to query and explore all projects, use the `projects` resource path and the `query()` method. One can also filter the projects by several criteria, like project's name and tags. The search by name is partial and case-insensitive. ```{r} # List first 5 projects my_projects <- a$projects$query(limit = 5) my_projects # Load next page of results my_projects$next_page() # Return all projects that contain the term "demo" demo_projects <- a$projects$query(name = "demo") # Return all projects tagged with "demo" tagged_projects <- a$projects$query(tags = list("demo")) ``` Note that the output is the `Collection` object and the results (list of `Project` objects) can be found within the `items` field.

top

Please check `vignette("Projects_and_Tasks_execution", package = "sevenbridges2")` for more technical details about projects. ## Create a new project Create a new project called "API testing" with the billing group `id` obtained above. ```{r} # List all available billing groups for currently logged in user a$billing_groups$query() # Set the billing group for the new project bid <- "" # Create a new project p <- a$projects$create( name = "API testing", billing_group = bid, description = "This project has been created using the sevenbridges2 R API library." ) ``` ``` ── Project ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── • category: PRIVATE • root_folder: • type: v2 • description: This project has been created using the sevenbridges2 R API library. • billing_group: • name: API testing • id: /api-testing • href: https://api.sbgenomics.com/v2/projects//api-testing • settings • locked: FALSE • controlled: FALSE • location: aws:us-east-1 • use_interruptible_instances: TRUE • use_memoization: FALSE • intermediate_files: list(duration = 24, retention = "LIMITED") • allow_network_access: TRUE • use_elastic_disk: FALSE ```

top

The **new project** is created on the platform. Notice also that the variable `p` is an R6 object with fields that contain information about the platform project. The facility also has several methods that allow you to perform basic platform operations on the project. Please check `vignette("Projects_and_Tasks_execution", package = "sevenbridges2")` for more technical details about projects. ## Get details of a specified project Use the `get()` method and provide the full ID of the project you would like to fetch. ```{r} # Get a single project by ID a$projects$get(id = "/api-testing") ``` ``` ── Project ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── • category: PRIVATE • root_folder: • type: v2 • description: This project has been created using the sevenbridges2 R API library. • billing_group: • name: API testing • id: /api-testing • href: https://api.sbgenomics.com/v2/projects//api-testing • settings • locked: FALSE • controlled: FALSE • location: aws:us-east-1 • use_interruptible_instances: TRUE • use_memoization: FALSE • intermediate_files: list(duration = 24, retention = "LIMITED") • allow_network_access: TRUE • use_elastic_disk: FALSE ``` Please check `vignette("Projects_and_Tasks_execution", package = "sevenbridges2")` for more technical details about apps. ## Copy app into the project Seven Bridges maintains workflows and tools available to all of its users in the Public Apps repository. To find out more about public apps, you can do the following: - Browse them online. Check out the [tutorial](https://docs.sevenbridges.com/docs/search-for-an-app) in the "Find apps" section. - You can use the `sevenbridges2` package to find it, as shown below. ```{r} # Search by name matching, with limit 10 public_apps <- a$apps$query( visibility = "public", limit = 10, query_terms = list("STAR") ) # Search by ID star_app <- a$apps$get( id = "admin/sbg-public-data/rna-seq-alignment-star/0" ) ``` Now, copy the App your `project` with a new `name`, following this logic. ```{r} # Copy app into the project a$apps$copy( app = star_app, project = "/api-testing", name = "New copy of STAR" ) # Check if it is copied p <- a$projects$get(id = "/api-testing") # List the apps you have in your project p$list_apps() ``` The short name is changed to `newcopyofstar`. ``` == App == id : /api-testing/newcopyofstar/0 name : RNA-seq Alignment - STAR project : /api-testing revision : 0 ``` Alternatively, you can copy it from the `App` object. ```{r} # Get public app RNA Sequencing alignment - STAR star_app <- a$apps$get( id = "admin/sbg-public-data/rna-seq-alignment-star/0" ) # Copy it into a project star_app$copy( project = "/api-testing", name = "Copy of STAR" ) ```

top

Next, we would like to run a task with this app. Let's see what is required. Please check `vignette("Projects_and_Tasks_execution", package = "sevenbridges2")` for more technical details about tasks. ## Execute a new task ### Find your app inputs Once you have copied the public app `admin/sbg-public-data/rna-seq-alignment-star/0` into your project, `/api-testing`, the app `id` in your current project is `/api-testing/newcopyofstar`. Alternatively, you can use another app you already have in your project for this Quickstart. To draft a new task, you need to specify the following: - The name of the task - An optional description - The App object or `id` of the workflow you are executing - The inputs for your workflow. You can always check the App details on the visual interface for task input requirements. However, there is also a function on the App objects to get basic information about app's inputs and outputs. To find the required inputs with R, you need to get an `App` object first. Let's check which inputs this app requires by calling the `input_matrix()` function and bring them into our project. ```{r} # Fetch copied app copied_star_app <- a$apps$get( id = "/api-testing/newcopyofstar/0" ) # Preview its inputs copied_star_app$input_matrix() ``` Locate the IDs of the required inputs. Note that task inputs need to match the expected data type and name. In the above example, we see two required fields: - **fastq:** This input takes a file array in the following formats: FASTA, FASTQ, FA, FQ etc. - **genomeFastaFiles:** This is a single reference file in the FASTA, FA, FNA or TAR format. We also want to provide a gene feature file: - **sjdbGTFfile:** A file array that can be in the GTF, GFF, GFF2, or GFF3 format. You can find a list of possible input types below: - **number, character or integer:** you can directly pass these to the input parameter as they are. - **enum type:** Pass this value to the input parameter. - **file:** This input is a file. However, while some inputs accept only a single file (`File`), other inputs take more than one file (`File` arrays, `FilesList`, or '`File...`' ). This input requires you to pass a single `File` object (for a single file input) or list of `File` objects (for inputs which accept more than one file). You can search for your file by `id` or by `name`, as shown in the example below.

top

### Prepare your input files ```{r} # Get reads (fastq) files and and copy them into a project reads_1 <- a$files$get(id = "641c48c425ed1842bd0bf799") # file id reads_1$copy_to(project = p) reads_2 <- a$files$get(id = "641c48c425ed1842bd0bf835") # file id reads_2$copy_to(project = p) # Get a single file reference file and copy into a project fasta_in <- a$files$get(id = "641c48c525ed1842bd0bf86a") # file id fasta_in$copy_to(project = p) # Get gtf file and copy into a project gtf_in <- a$files$get(id = "641c48c425ed1842bd0bf825") # file id gtf_in$copy_to(project = p) # Get copied files input_files <- p$list_files()$items ```

top

### Create a new draft task ```{r} # Add new tasks taskName <- paste0("STAR-alignment ", date()) tsk <- p$create_task( name = taskName, description = "STAR test", app = copied_star_app, inputs = list( "fastq" = c(input_files[[1]], input_files[[2]]), "genomeFastaFiles" = input_files[[3]], "sjdbGTFfile" = list(input_files[[4]]) ) ) # Preview task tsk$print() ``` ### Preview your app's expected outputs Similarly as with inputs, you can also preview the structure of the expected outputs of the task or workflow. You can get details about the output's name, description and type using `output_matrix()`. This function can be called from the App object. ```{r} # Get app's outputs details copied_star_app$output_matrix() ```

top

Please check `vignette("Projects_and_Tasks_execution", package = "sevenbridges2")` for more technical details about tasks. ## Run a Task Now, we are ready to run our task. ```{r} # Run your task tsk$run() ``` Before you run your task, you can adjust your draft task if you have any final modifications. ```{r} # Update task tsk$update(description = "New RNA SEQ Alignment - STAR task") ``` After you run a task, you can track its status by refreshing the object with `reload()` function. ```{r} # Reload task tsk$reload() tsk$status ``` You can also abort the task execution if needed: ```{r} # Abort your task tsk$abort() ``` If you want to rerun your task without any modifications, you can use `rerun()` function which will clone the current task for you and start the execution immediately. ```{r} # Rerun your task tsk$rerun() ``` On the other side, if you want to update your task first and then re-run it, you should clone the current task, update it and then run it, as demonstrated below: ```{r} # First clone existing task cloned_task <- tsk$clone_task() # Then, update GTF input file in the cloned task cloned_task$update(inputs = list(sjdbGTFfile = "")) cloned_task$run() ``` Alternatively, you can delete the draft task if you no longer wish to run it. ```{r} # # not run # tsk$delete() ```

top

Please check `vignette("Projects_and_Tasks_execution", package = "sevenbridges2")` for more technical details about running tasks. ## Run tasks using spot instances Running tasks with [spot instances](https://docs.sevenbridges.com/docs/about-spot-instances) could potentially [reduce a considerable amount of computational cost](https://www.sevenbridges.com/spot-instances-cost-reduction/). This option can be controlled on the project level or the task level on Seven Bridges platforms. Our package follows the same [logic](https://docs.sevenbridges.com/docs/use-spot-instances) as our platform's web interface (the current default setting for spot instances is **on**). For example, when we create a project using Projects resource's method `create()`, we can set `use_interruptible = FALSE` to use on-demand instances (non-interruptible but more expensive) instead of the spot instances (interruptible but cheaper): ```{r} # Create project with disabled spot instances p <- a$projects$create( name = "spot-disabled-project", bid, description = "spot disabled project", use_interruptible = FALSE ) ``` Then all the new tasks created under this project will use on-demand instances to run **by default**, unless an argument `use_interruptible_instances` is specifically set to `TRUE` when drafting the new task using Tasks resource method `create()`. For example, if `p` is the above spot disabled project, to draft a task that will use spot instances to run: ```{r} # Create task and set usage of interruptible instances to TRUE tsk <- p$create_task( name = paste0("spot enabled task in a spot disabled project"), description = "spot enabled task", app = copied_star_app, inputs = list( "fastq" = c(input_files[[1]], input_files[[2]]), "genomeFastaFiles" = input_files[[3]], "sjdbGTFfile" = list(input_files[[4]]) ), use_interruptible_instances = TRUE ) ``` Conversely, you can have a spot instance enabled project, but draft and run specific tasks using on-demand instances, by setting `use_interruptible_instances = FALSE` in `create_task()` explicitly.

top

Please check `vignette("Projects_and_Tasks_execution", package = "sevenbridges2")` for more technical details about running tasks using spot instances. ## Execution hints per task run During workflow development and benchmarking, sometimes we need to view and make adjustments to the computational resources needed for a task to run more efficiently. Also, if a task fails due to resource deficiency, we often want to define a larger instance for the task re-run without editing the app itself. This is particularly important in cases where there is not enough disk space. The Seven Bridges API allows setting specific task execution parameters by using `execution_settings`. It includes the instance type (`instance_type`) and the maximum number of parallel instances (`max_parallel_instances`): ```{r} # Create task with setting instance type and number of parallel instances tsk <- p$create_task( ..., execution_settings = list( instance_type = "c4.2xlarge;ebs-gp2;2000", max_parallel_instances = 2 ) ) ``` For details about `execution_settings`, please check [create a new draft task](https://docs.sevenbridges.com/reference/create-a-new-task).

top

Please check `vignette("Projects_and_Tasks_execution", package = "sevenbridges2")` for more technical details about execution hints. ## Draft a batch task Now let's do a batch with 4 files in 2 groups, which is batched by metadata `sample_id`. We will assume each file has this metadata field entered. Since these files can be evenly grouped into 2, we will have a single parent batch task with 2 child tasks. ```{r} # Add two more fastq files that will be used in our task inputs # and copy them into our API testing project reads_3 <- a$files$get(id = "641c48c425ed1842bd0bf7b6") # file id reads_3$copy_to(project = p) reads_4 <- a$files$get(id = "641c48c425ed1842bd0bf7a5") # file id reads_4$copy_to(project = p) # Get all project files input_files <- p$list_files()$items taskName <- paste0("STAR-alignment ", date()) # Create task with batch criteria tsk <- p$create_task( name = taskName, description = "Batch Star Test", app = copied_star_app, batch = TRUE, batch_input = "fastq", batch_by = list( type = "CRITERIA", criteria = list("metadata.sample_id") ), inputs = list( "fastq" = c( input_files[[1]], input_files[[2]], input_files[[3]], input_files[[4]] ), "genomeFastaFiles" = input_files[[5]], "sjdbGTFfile" = list(input_files[[6]]) ) ) # Run batch task tsk$run() ``` Now you have a draft batch task. Please check it out in the visual interface. Your response body should inform you of any errors or warnings. You can also check the parent task's children status with `list_batch_children()` method and then for each child execution details: ```{r} # List parent task children and their execution details child_tasks <- tsk$list_batch_children() child1_details <- child_tasks$items[[1]]$get_execution_details() child2_details <- child_tasks$items[[2]]$get_execution_details() ```

top

Please check `vignette("Projects_and_Tasks_execution", package = "sevenbridges2")` for more technical details about running batch tasks.