R nf-core utils tutorial

Introduction

This package is meant to be used inside Nextflow template scripts. It provides helper functions that handle the connection between Nextflow variables and R logic.

Main functions

There are two important functions in this package.

Function process_inputs()

This function takes as input a list of expected options with their default values, the argument string, and a set of validation rules.

Parameter opt

This parameter should list all the variables that you might use in the main script. You can set them to a default value or even initialize them directly with Nextflow variables, such as:

opt <- list(
  prefix = "${task.ext.prefix}",
  seed = 1
)

Parameter args

The argument string corresponds to ${task.ext.args} in Nextflow and is parsed with parse_arguments(). This function expects all arguments to be in the form --key value. A key-only argument is interpreted as TRUE: for example, --is-test gives back list("is-test" = "TRUE"). Beware that, for the moment, this is a string value rather than a logical one.

If a value contains spaces, put quotes around it, such as --key "value with space". All the key/value pairs then overwrite their counterparts in the options list passed to process_inputs().
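The parsing and overwrite behaviour described above can be illustrated with a short base R sketch. It is not the implementation of parse_arguments(): the helper name sketch_parse_args() is hypothetical, and the use of scan() for tokenizing is an assumption made only for illustration.

```r
# Hypothetical helper, NOT the package's parse_arguments():
# a minimal base R sketch of "--key value" parsing.
sketch_parse_args <- function(args) {
  # Nothing to parse for an empty argument string
  if (!nzchar(trimws(args))) {
    return(list())
  }
  # scan() splits on whitespace and keeps quoted values together
  tokens <- scan(text = args, what = character(), quiet = TRUE)
  parsed <- list()
  i <- 1
  while (i <= length(tokens)) {
    if (!startsWith(tokens[i], "--")) {
      stop("Expected an argument of the form --key, got: ", tokens[i])
    }
    key <- sub("^--", "", tokens[i])
    has_value <- i < length(tokens) && !startsWith(tokens[i + 1], "--")
    if (has_value) {
      parsed[[key]] <- tokens[i + 1]
      i <- i + 2
    } else {
      # Key-only argument: stored as the string "TRUE"
      parsed[[key]] <- "TRUE"
      i <- i + 1
    }
  }
  parsed
}

# Parsed pairs overwrite the defaults; values are still character strings
opt <- list(prefix = "sample1", seed = 1)
parsed <- sketch_parse_args('--seed 42 --is-test --label "value with space"')
opt <- utils::modifyList(opt, parsed)
```

Because every parsed value is a string at this stage, type conversion is left to the validation rules of process_inputs().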

Validation rules

The process_inputs() function applies the following rules to the listed keys:

  • keys_to_nullify: these values are set to R NULL if they are “null” or empty
  • expected_files: these paths should be existing files
  • expected_folders: these paths should be existing folders
  • expected_double: these values are converted with as.double() or should be NULL
  • expected_integer: these values are converted with as.integer() or should be NULL
  • expected_boolean: these values are converted to TRUE/FALSE or should be NULL. Accepted values are:
    • TRUE: 1, yes, true
    • FALSE: 0, no, false
  • required_opts: these keys should have non-null values
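To make two of these rules concrete, here is a minimal base R sketch of the nullify and boolean conversions. The helper names sketch_nullify() and sketch_to_boolean() are hypothetical, and the case-insensitive matching is an assumption; the package may behave differently.

```r
# Hypothetical helpers, NOT part of nfcore.utils.

# keys_to_nullify: "null" or an empty value becomes R NULL
sketch_nullify <- function(value) {
  if (is.null(value) || length(value) == 0) {
    return(NULL)
  }
  text <- trimws(as.character(value))
  # Assumption: "null" is matched case-insensitively
  if (!nzchar(text) || tolower(text) == "null") NULL else value
}

# expected_boolean: accepted strings are mapped to TRUE/FALSE, NULL is kept
sketch_to_boolean <- function(value) {
  if (is.null(value)) {
    return(NULL)
  }
  normalized <- tolower(trimws(as.character(value)))
  if (normalized %in% c("1", "yes", "true")) {
    return(TRUE)
  }
  if (normalized %in% c("0", "no", "false")) {
    return(FALSE)
  }
  stop("Cannot interpret '", value, "' as a boolean")
}

# A key-only argument arrives as the string "TRUE" and becomes a logical
is_test <- sketch_to_boolean("TRUE")
# An unset Nextflow value rendered as "null" becomes NULL
tolerance <- sketch_nullify("null")
```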

Function process_end()

This function emits a versions.yml and an R_sessionInfo.log file in the directory provided. The versions file is populated with the R version, the version of nfcore.utils, and the versions of the additional packages given.

Parameter packages

This parameter should be a named list where each name corresponds to the conda package name and each value to the R package name.

For example:

process_end(
  packages = list(
    "r-stats" = "stats"
  ),
  task_name = "${task.process}"
)
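As a rough illustration of how such a named list can be turned into version entries, the sketch below collects versions with base R only. The helper sketch_versions_yaml() is hypothetical, and the layout of the lines (task name as top-level key, an "r-base" entry, four-space indentation) is an assumption based on the usual nf-core versions.yml convention, not on the actual output of process_end().

```r
# Hypothetical helper, NOT the implementation of process_end().
sketch_versions_yaml <- function(packages, task_name) {
  # Look up the installed version of each R package (the list values)
  versions <- vapply(
    packages,
    function(pkg) as.character(utils::packageVersion(pkg)),
    character(1)
  )
  c(
    sprintf('"%s":', task_name),
    # Assumption: the R version is reported under the key "r-base"
    sprintf("    r-base: %s", as.character(getRversion())),
    # The list names (conda package names) are used as keys
    sprintf("    %s: %s", names(versions), versions)
  )
}

yaml_lines <- sketch_versions_yaml(list("r-stats" = "stats"), "DEMO_TASK")
# The lines could then be written with writeLines(yaml_lines, "versions.yml")
```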

Usage example

Take, for example, the custom/geneticmapconvert process in nf-core modules.

The Nextflow process is the following:

process CUSTOM_GENETICMAPCONVERT {
  tag "$meta.id"
  label 'process_single'

  input:
  tuple val(meta), path(map_file)

  output:
  tuple val(meta), path("${prefix}.glimpse.map"), emit: glimpse_map
  path "versions.yml", emit: versions_geneticmapconvert, topic: versions

  when:
  task.ext.when == null || task.ext.when

  script:
  prefix = task.ext.prefix ?: "${meta.id}"
  args = task.ext.args ?: ''

  // args is expected in the form: --tolerance 0.15

  template 'geneticmapconvert.R'
}

Then, in templates/geneticmapconvert.R, we use the following:

library(nfcore.utils)
library(data.table)
library(stringr)

### INPUTS PARSING ###
opt <- list(
  map_file = "${map_file}",
  chr = "${meta.chr}",
  prefix = "${prefix}",
  tolerance = NULL
)

process_inputs(
  opt = opt,
  args = "${args}",
  keys_to_nullify = c("prefix", "tolerance"),
  expected_files = c("map_file"),
  expected_double = c("tolerance"),
  required_opts = c("map_file", "prefix")
)

### MAIN SCRIPT ###

...

### END of PROCESS ###
process_end(
  packages = list(
    "r-data.table" = "data.table",
    "r-stringr" = "stringr"
  ),
  task_name = "${task.process}",
  versions_path = "versions.yml",
  log_path = "R_sessionInfo.log"
)

Session information

options(old_opt)
sessionInfo()
## R version 4.6.0 (2026-04-24)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] BiocStyle_2.41.0
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.39       R6_2.6.1            fastmap_1.2.0      
##  [4] xfun_0.59           maketools_1.3.2     cachem_1.1.0       
##  [7] knitr_1.51          htmltools_0.5.9     rmarkdown_2.31     
## [10] buildtools_1.0.0    lifecycle_1.0.5     cli_3.6.6          
## [13] sass_0.4.10         jquerylib_0.1.4     compiler_4.6.0     
## [16] sys_3.4.3           tools_4.6.0         bslib_0.11.0       
## [19] evaluate_1.0.5      yaml_2.3.12         otel_0.2.0         
## [22] BiocManager_1.30.27 jsonlite_2.0.0      rlang_1.2.0