Author: Mario A. Martínez Araya
Date: 2021-07-07
Url: http://marioma.me/?i=soft
CRAN: https://cran.r-project.org/package=tmplate
This R package is intended to modify general templates replacing tags
by variable content. It was first created to modify R and Bash scripts
necessary for parallel computation using MPI. Although it is related
with R package tRnslate
, the package tmplate
performs a different task which is enhanced by
tRnslate
.
In principle any version of R should be useful but I have not tested them all.
The same than for any other R package. You can download the tar file from CRAN (tmplate)
R CMD INSTALL /path/to/tar/file/tmplate_0.0.3.tar.gz
or from R console:
install.packages("tmplate", lib = "path/to/R/library")
The main function of the package is translate
, where its
main input arguments are as below:
translate ( vars, ..., template, envir )
template
is a character vector where each element is a
line in the template that can be obtained using readLines
.
Alternatively, it can be a unique string packing all the content of the
template where each line is assumed from the newline character. It can
also be a file path but it requires to set
allow_file = TRUE
....
are the variables and their values which are used
to directly modify the content of the template. For instance
name1 = value1,, ..., nameK = valueK
where
name1
, …, nameK
are the tag names, say
<:name1:>
, …, <:nameK:>
, that can
be used within the template to modify its content depending the values
of the variables.envir
is an environment where the input variables will
be evaluated. Additionally it can have its own variables used to modify
the template content.vars
is a named list whose elements are taken as
variables to be used in the tags within the template. This is useful
when there are too many input variables for translate
.As we will see, the output from translate
is a character
vector where each element correspond to a line in the output file.
Lines starting with @r
or @R
followed by
one space or tabular, define chunks of R code that is also interpreted
and translated by translate
. The chunks of R code can be
assignation or output chunks following the rules from
the R package tRnslate
(see tRnslate
vignette). Assignation chunks are those including <-
for assigning an object, while output chunks print R output to the
template. Thus several assignation chunks can be placed in adjacent
lines, however assignation and output chunks must be separated by one
empty line (the same for consecutive output chunks). Alternatively,
inline R code can be entered using <r@ code @>
or
<R@ code @>
. Inline R code with assignation does not
produce output so is replaced by blank, while inline R code producing
output while modify the resulting template. Additionaly, inline R code
can be used within tag variable definitions to allow different
content.
The R code can use tag variables that point to the value of argument variables which are being used to modify the template content, for example in the R chunk
@r # R chunk (printing)
@r ifelse(is_mod=="-1", "# module environment not found", paste("<:MODULES_LOAD:>"))
it uses the tag <:MODULES_LOAD:>
to point to the
value of the argument variable MODULES_LOAD
. Similarly, the
content of tag variables can be modified using inline R code in the
definition of the argument when calling translate
. For
example the argument
MPI_ASK_N = '<r@ <:SLURM_ASK_NODES:> * <:SLURM_ASK_TASKS:> @>'
will compute (using inline R code) the number of parallel jobs from the
input arguments for the number of nodes and tasks. Note that the NULL
definition in the template above is used in a R logical expression to
decide whether to print or not <:SLURM_ARRAY:>
into
the source code. Alternatively this decision may have been done purely
based on R code.
The evaluation of the inline and chunks of R code to update the input arguments and replace the tags in the template is performed in an environment that can be set by the user. As said before, this environment can contain its own objects which can also be referenced to update the input arguments and modify the content of the template.
translate
commandGiven the template above, we can define the input arguments directly
when calling translate
as done below:
## remember to load: library(tmplate) or call tmplate::translate
TT <- translate(
SHELL_CALL='#!/bin/bash',
SLURM_SBATCH=ifelse(.Platform$OS.type=="unix", ifelse(system("clu=$(sinfo --version 2>&1) || clu=`echo -1`; echo $clu",intern = TRUE)=="-1", '<:NULL:>', '#SBATCH '), '<:NULL:>'),
SLURM_PARTITION='<:SLURM_SBATCH:>--partition=defq',
SLURM_ASK_NODES=2,
SLURM_NODES='<:SLURM_SBATCH:>--nodes=<:SLURM_ASK_NODES:>',
SLURM_ASK_TASKS=4,
SLURM_TASKS='<:SLURM_SBATCH:>--ntasks-per-node=<:SLURM_ASK_TASKS:>',
SLURM_MEMORY='<:SLURM_SBATCH:>--memory=2gb',
SLURM_TIME='<:SLURM_SBATCH:>--time=1:00:00',
SLURM_ARRAY="<:NULL:>",
MODULES_LOAD='module load module/for/openmpi module/for/R',
WORKDIR=ifelse('<:SLURM_SBATCH:>'!='#SBATCH','# no slurm machine','cd ${SLURM_SUBMIT_DIR}'),
TASK="<:NULL:>",
PASS_TASK="<:NULL:>",
PASS_TASK_VAR="<:NULL:>",
MPI_N="<:NULL:>",
MPI_ASK_N='<r@ <:SLURM_ASK_NODES:> * <:SLURM_ASK_TASKS:> @>',
R_HOME=R.home("bin"),
R_OPTIONS='--no-save --no-restore',
R_FILE_INPUT='script.R',
R_ARGS='',
R_FILE_OUTPUT='output.Rout',
MPIRUN='mpirun --mca mpi_warn_on_fork 0 -n <:MPI_ASK_N:> <:R_HOME:>/Rscript <:R_OPTIONS:> "<:R_FILE_INPUT:>" <r@ ifelse(!any(grepl("^<:NULL:>$","<:SLURM_ARRAY:>")),"<:PASS_TASK_VAR:>","") @> <:R_ARGS:> > <:R_FILE_OUTPUT:>',
MESSAGE_CLOSE='echo "Job submitted on $(date +%F) at $(date +%T)."',
drop = TRUE,
template = T
)
Here we have used a new default environment to evaluate the
arguments. the argument drop = TRUE
will delete any line
containing <:NULL:>
in it.
The output from translate
is a character vector where
each element is a line in the resulting file. We can print it to disk
easily using cat
(remember to set
sep = "\n"
).
The final content of the template once translated depends on the
values of the variables used (which are system dependent). Thus, for a
multicore PC with OpenMPI but
without a dynamic environment modules manager such as environment-modules or Lmod and without a
job scheduler such as SLURM then the output of
cat(TT, sep="\n")
will be something like this:
#!/bin/bash
# module environment not found
# no slurm machine
mpirun --mca mpi_warn_on_fork 0 -n 8 /apps/local/resources/svn/R/r-devel/build/bin/Rscript --no-save --no-restore "script.R" > output.Rout
echo "Job submitted on $(date +%F) at $(date +%T)."
While for an SLURM
managed HPC having also
environment-modules
we would obtain:
#!/bin/bash
#SBATCH --partition=defq
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --memory=2gb
#SBATCH --time=1:00:00
module load module/for/openmpi module/for/R
cd ${SLURM_SUBMIT_DIR}
mpirun --mca mpi_warn_on_fork 0 -n 8 /usr/lib/R/bin/Rscript --no-save --no-restore "script.R" > output.Rout
echo "Job submitted on $(date +%F) at $(date +%T)."
Additional rules could be added to control the lenght of the mpirun line, however as it is it works fine. Other source code can be generated following the same principles described before.
For templates having too many variables, translation can be performed calling a list with names elements containing the variables part of the template. For instance, the previous example could be also:
## list with arguments
v <- list(
SHELL_CALL='#!/bin/bash',
SLURM_SBATCH=ifelse(.Platform$OS.type=="unix", ifelse(system("clu=$(sinfo --version 2>&1) || clu=`echo -1`; echo $clu",intern = TRUE)=="-1", '<:NULL:>', '#SBATCH '), '<:NULL:>'),
SLURM_PARTITION='<:SLURM_SBATCH:>--partition=defq',
SLURM_ASK_NODES=2,
SLURM_NODES='<:SLURM_SBATCH:>--nodes=<:SLURM_ASK_NODES:>',
SLURM_ASK_TASKS=4,
SLURM_TASKS='<:SLURM_SBATCH:>--ntasks-per-node=<:SLURM_ASK_TASKS:>',
SLURM_MEMORY='<:SLURM_SBATCH:>--memory=2gb',
SLURM_TIME='<:SLURM_SBATCH:>--time=1:00:00',
SLURM_ARRAY="<:NULL:>",
MODULES_LOAD='module load module/for/openmpi module/for/R',
WORKDIR=ifelse('<:SLURM_SBATCH:>'!='#SBATCH','# no slurm machine','cd ${SLURM_SUBMIT_DIR}'),
TASK="<:NULL:>",
PASS_TASK="<:NULL:>",
PASS_TASK_VAR="<:NULL:>",
MPI_N="<:NULL:>",
MPI_ASK_N='<r@ <:SLURM_ASK_NODES:> * <:SLURM_ASK_TASKS:> @>',
R_HOME=R.home("bin"),
R_OPTIONS='--no-save --no-restore',
R_FILE_INPUT='script.R',
R_ARGS='',
R_FILE_OUTPUT='output.Rout',
MPIRUN='mpirun --mca mpi_warn_on_fork 0 -n <:MPI_ASK_N:> <:R_HOME:>/Rscript <:R_OPTIONS:> /
"<:R_FILE_INPUT:>" <r@ ifelse(!any(grepl("^<:NULL:>$","<:SLURM_ARRAY:>")),"<:PASS_TASK_VAR:>","") @> /
<:R_ARGS:> > <:R_FILE_OUTPUT:>',
MESSAGE_CLOSE='echo "Job submitted on $(date +%F) at $(date +%T)."'
)
## Produce output
## remember to load: library(tmplate) or call tmplate::translate
TT <- translate(vars = v, drop = TRUE, template = T)
## See result
cat(TT, sep="\n")
which produces the same output.
Since tmplate
uses tRnslate
, then some of
the limitations of the latter also applies to the former (see tRnslate
vignette for more details).
Never replace the content of a template writing the output to the same file.
Always check the content of the “translated” output before using it for other tasks.
Be cautious.