---
title: "R nf-core utils tutorial"
author: "Louis Le Nezet"
date: "30/03/2026"
url: "https://github.com/nf-core/r-nf-core-utils"
output:
  BiocStyle::html_document:
    toc: true
    toc_depth: 3
    fig_crop: no
header-includes: \usepackage{tabularx}
vignette: |
  %\VignetteIndexEntry{R nf-core utils tutorial}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  markdown:
    wrap: 80
---

```{r width_control, echo = FALSE}
old_opt <- options(width = 100)
```

# Introduction

This package is meant to be used inside a Nextflow `template()` script. Its aim
is to provide helper functions that handle the connection between Nextflow
variables and R logic.

## Main functions

There are two important functions in this package.

### Function `process_inputs()`

This function takes as inputs a list of expected options with their default
values, the argument string, and a set of validation rules.

#### Parameter `opt`

This parameter should list all the variables that you might use in the main
script. You can set them to a default value or even initialize them directly
with a Nextflow variable, such as:

```{r, eval = FALSE}
opt <- list(
    prefix = "${task.ext.prefix}",
    seed = 1
)
```

#### Parameter `args`

The argument string corresponds to `${task.ext.args}` in Nextflow and will be
parsed with `parse_arguments()`. This function expects all arguments to be in
the form `--key value`. Key-only arguments are interpreted as `TRUE`, so
`--is-test` gives back `list("is-test" = "TRUE")`. Beware that this is, for the
moment, a string value. If a value needs to contain spaces, put quotes around
it, such as `--key "value with space"`.

All the key / value pairs will then overwrite their counterparts in the options
list passed to `process_inputs()`.

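
The `--key value` convention described above can be illustrated with a small
base R sketch. This is not the implementation used by the package:
`sketch_parse_args()` is a hypothetical helper written only for this
illustration, and the use of `scan()` for tokenizing and `modifyList()` for
overwriting the defaults are assumptions about one reasonable way to obtain the
behaviour described, not a description of `parse_arguments()` itself.

```r
# Hypothetical helper, NOT the nfcore.utils implementation: it only mimics
# the behaviour described in the text for `--key value` argument strings.
sketch_parse_args <- function(args) {
    if (is.null(args) || !nzchar(trimws(args))) {
        return(list())
    }
    # scan() splits on whitespace and keeps quoted values in one token
    tokens <- scan(text = args, what = "character", quiet = TRUE)
    parsed <- list()
    i <- 1
    while (i <= length(tokens)) {
        token <- tokens[i]
        if (!startsWith(token, "--")) {
            stop("Unexpected token: ", token)
        }
        key <- substring(token, 3)
        has_value <- i < length(tokens) && !startsWith(tokens[i + 1], "--")
        if (has_value) {
            parsed[[key]] <- tokens[i + 1]
            i <- i + 2
        } else {
            # Key-only argument: stored as the string "TRUE"
            parsed[[key]] <- "TRUE"
            i <- i + 1
        }
    }
    parsed
}

parsed <- sketch_parse_args(
    '--tolerance 0.15 --is-test --label "value with space"'
)

# Parsed pairs overwrite their counterparts in the default options
opt <- list(prefix = "sample1", tolerance = NULL, seed = 1)
opt <- modifyList(opt, parsed)
```

In this sketch every parsed value is still a character string, which is why a
later type validation step is needed before the values are used as numbers or
booleans.
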
#### Validation rules

The `process_inputs()` function will enforce the following rules on the keys
listed:

-   `keys_to_nullify`: these values will be set to R `NULL` if they are "null"
    or empty
-   `expected_files`: these paths should be existing files
-   `expected_folders`: these paths should be existing folders
-   `expected_double`: these values will be converted with `as.double()` or
    should be `NULL`
-   `expected_integer`: these values will be converted with `as.integer()` or
    should be `NULL`
-   `expected_boolean`: these values will be converted to TRUE/FALSE or should
    be `NULL`. Accepted values are:
    -   TRUE: 1, yes, true
    -   FALSE: 0, no, false
-   `required_opts`: these keys should have non-null values

### Function `process_end()`

This function will emit a `versions.yml` and an `R_sessionInfo.log` file in the
directory provided. The version file will be populated with the R version, the
version of nfcore.utils, and the versions of the additional packages given.

#### Parameter `packages`

This parameter should be a named list where each name corresponds to the conda
package name and each value to the R package name, such as:

```{r, eval = FALSE}
process_end(
    packages = list(
        "r-stats" = "stats"
    ),
    task_name = "${task.process}"
)
```

## Usage example

Take for example the
[`custom/geneticmapconvert` process in nf-core modules](https://nf-co.re/modules/custom_geneticmapconvert/).

The Nextflow process is the following:

```{groovy, eval = FALSE}
process CUSTOM_GENETICMAPCONVERT {
    tag "$meta.id"
    label 'process_single'

    input:
    tuple val(meta), path(map_file)

    output:
    tuple val(meta), path("${prefix}.glimpse.map"), emit: glimpse_map
    path "versions.yml", emit: versions_geneticmapconvert, topic: versions

    when:
    task.ext.when == null || task.ext.when

    script:
    prefix = task.ext.prefix ?: "${meta.id}"
    args = task.ext.args ?: ''
    """
    echo ${args} // In the form --tolerance 0.15
    """
    template 'geneticmapconvert.R'
}
```

Then, in `templates/geneticmapconvert.R`, we use the following:

```{r, eval = FALSE}
library(nfcore.utils)
library(data.table)
library(stringr)

### INPUTS PARSING ###
opt <- list(
    map_file = "${map_file}",
    chr = "${meta.chr}",
    prefix = "${prefix}",
    tolerance = NULL
)

process_inputs(
    opt = opt,
    args = "${args}",
    keys_to_nullify = c("prefix", "tolerance"),
    expected_files = c("map_file"),
    expected_double = c("tolerance"),
    required_opts = c("map_file", "prefix")
)

### MAIN SCRIPT ###
...

### END of PROCESS ###
process_end(
    packages = list(
        "r-data.table" = "data.table",
        "r-stringr" = "stringr"
    ),
    task_name = "${task.process}",
    versions_path = "versions.yml",
    log_path = "R_sessionInfo.log"
)
```

# Session information

```{r}
options(old_opt)
sessionInfo()
```