submitr is the third package in the From the Notebook to the Cluster
family, alongside toolero and containr. It provides a workflow for
submitting containerized R analyses to the UW-Madison Center for High
Throughput Computing (CHTC) from inside R.
htc_config() -- create or read a project-level htc.cfg configuration
file. On first use, prompts interactively for username and server, displays
ControlMaster SSH setup guidance to reduce Duo MFA prompts, writes
htc.cfg, and adds it to .gitignore. Subsequent calls read the existing
file and validate server reachability. Returns a named list with username
and server. Errors informatively when username or server is supplied
as an empty string.
htc_gen_submit() -- generate an HTCondor .sub submit file from
project parameters. Supports single-job and multiple-job modes. Multiple
mode reads a manifest from toolero::write_by_group(manifest = TRUE),
extracts filenames, writes subdatasets.csv, and emits the HTCondor statement
"queue file from subdatasets.csv". Resource presets (small, medium,
large, custom) are loaded at runtime from
inst/extdata/htc-resources.yaml; a local ./htc-resources.yaml takes
precedence over the package default. GPU jobs are requested via gpu = TRUE
and gpu_options. Setting comments = TRUE annotates each section of the
generated file with explanatory text.
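
The preset lookup order described above (a local ./htc-resources.yaml first, then the copy shipped with the package) could be sketched roughly as follows. The helper name resolve_resources_file() and its arguments are hypothetical and not part of submitr's documented interface; only the file name and the precedence rule come from the notes above.

```r
# Hypothetical helper (not submitr's API): choose which htc-resources.yaml
# to read. A project-local file takes precedence over the package default.
resolve_resources_file <- function(project_dir = ".",
                                   package_default = system.file(
                                     "extdata", "htc-resources.yaml",
                                     package = "submitr"
                                   )) {
  local_file <- file.path(project_dir, "htc-resources.yaml")
  if (file.exists(local_file)) {
    return(local_file)
  }
  # system.file() returns "" when the package or file cannot be found.
  if (!nzchar(package_default) || !file.exists(package_default)) {
    stop("No htc-resources.yaml found locally or in the package.",
         call. = FALSE)
  }
  package_default
}
```

Keeping the package default as an argument makes the precedence rule easy to exercise in a temporary directory without installing the package.
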
htc_gen_executable() -- generate the .sh executable script that
HTCondor runs inside the container. Produces a four-element script:
shebang, mkdir, Rscript, and tar. In multiple-job mode, passes
${1} as a positional argument to the R script. r_script must be
supplied explicitly -- there is no default. set_executable = TRUE
(default) sets executable permissions via Sys.chmod().
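
As an illustration of the four-element layout (shebang, mkdir, Rscript, tar), a generator along these lines would produce such a script. This is a sketch under assumptions, not the package's implementation: the function names, the "results" directory, and the "results.tar.gz" archive name are all invented for the example.

```r
# Hypothetical sketch of the four-element script: shebang, mkdir, Rscript, tar.
# The "results" directory and "results.tar.gz" archive names are assumptions.
build_executable_lines <- function(r_script, multiple = FALSE) {
  if (missing(r_script) || !nzchar(r_script)) {
    stop("`r_script` must be supplied explicitly.", call. = FALSE)
  }
  rscript_call <- paste("Rscript", r_script)
  if (multiple) {
    # In multiple-job mode the per-job file name arrives as the first
    # positional argument of the shell script.
    rscript_call <- paste(rscript_call, "${1}")
  }
  c(
    "#!/bin/bash",
    "mkdir -p results",
    rscript_call,
    "tar -czf results.tar.gz results"
  )
}

# Write the script and, optionally, mark it executable.
write_executable <- function(path, r_script, multiple = FALSE,
                             set_executable = TRUE) {
  writeLines(build_executable_lines(r_script, multiple), path)
  if (set_executable) {
    Sys.chmod(path, mode = "0755")
  }
  invisible(path)
}
```

Separating line construction from file writing lets the script content be checked without touching the file system.
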
htc_upload() -- copy files to the CHTC submit node via scp. Accepts
single files, vectors of files, directories (transferred recursively),
and glob patterns. remote_path defaults to "~/". dry_run = TRUE
previews the command without executing it.
htc_submit() -- run condor_submit on the submit node over SSH, from the
remote directory where the files were uploaded. Returns the cluster
ID invisibly for use with htc_status(). Supports dry_run = TRUE.
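
To show how a cluster ID can be recovered from what condor_submit prints, here is a minimal sketch. It assumes the usual closing line of the form "1 job(s) submitted to cluster 12345."; parse_cluster_id() is a hypothetical name and may not match how submitr extracts the ID internally.

```r
# Hypothetical parser (not submitr's API) for condor_submit output, which
# typically ends with a line such as "1 job(s) submitted to cluster 12345."
parse_cluster_id <- function(output) {
  pattern <- "submitted to cluster ([0-9]+)"
  hits <- regmatches(output, regexec(pattern, output))
  # Each element is c(full_match, captured_id) or character(0) if no match.
  ids <- vapply(
    hits,
    function(m) if (length(m) == 2L) m[2] else NA_character_,
    character(1)
  )
  ids <- ids[!is.na(ids)]
  if (length(ids) == 0L) {
    stop("Could not find a cluster ID in condor_submit output.", call. = FALSE)
  }
  as.integer(ids[1])
}

# Example with captured output lines:
out <- c("Submitting job(s).", "1 job(s) submitted to cluster 12345.")
cluster_id <- parse_cluster_id(out)
```
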
htc_status() -- check job progress via condor_q. Optionally filters
by cluster ID. watch = TRUE polls every interval seconds (default 60)
until the cluster leaves the queue. Returns condor_q output invisibly
as a character vector. Supports dry_run = TRUE.
htc_download() -- copy result files back from the submit node via scp.
Supports single filenames, vectors of filenames, and glob patterns
("*.tar.gz", "job.*"). Glob patterns are single-quoted to prevent
local shell expansion. local_path defaults to ".".
Supports dry_run = TRUE.
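
The single-quoting of glob patterns can be illustrated with base R's shQuote(). The sketch below builds the kind of scp command a dry run might preview; build_download_command(), its argument names, and the example host are hypothetical, and the remote directory argument in particular is an assumption rather than a documented parameter of htc_download().

```r
# Hypothetical sketch (not submitr's API): build an scp download command.
# Single-quoting each remote source keeps the local shell from expanding
# patterns such as "*.tar.gz"; they are resolved on the remote side instead.
build_download_command <- function(username, server, patterns,
                                   remote_dir = "~/", local_path = ".") {
  remotes <- paste0(username, "@", server, ":", remote_dir, patterns)
  paste(
    c("scp", shQuote(remotes, type = "sh"), shQuote(local_path, type = "sh")),
    collapse = " "
  )
}

# Example with a placeholder NetID and host name:
cmd <- build_download_command("your.netid", "submit.example.edu", "*.tar.gz")
```
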
inst/extdata/htc-resources.yaml ships with the package and provides
default resource presets for htc_gen_submit().
inst/extdata/hello-world.sub and inst/extdata/hello-world.sh are included
as test files for end-to-end workflow verification.
inst/extdata/sample.R is included as a sample R script for use in examples.
The test suite uses a three-layer strategy to handle the fact that end-to-end
testing requires a live HTCondor environment and SSH access. Layer 1 covers
argument validation. Layer 2 covers command construction using dry_run = TRUE
and mocked bindings. Layer 3 integration tests are opt-in via
Sys.setenv(CHTC_USERNAME = "your.netid") and never run on CRAN or CI.
153 tests pass across seven test files.
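
The opt-in guard for Layer 3 might look something like the following testthat sketch. skip_if_no_chtc() is a hypothetical helper name; the package's actual test helpers may be organised differently. Only the CHTC_USERNAME variable and the "never on CRAN or CI" rule are taken from the description above.

```r
library(testthat)

# Hypothetical guard for Layer 3 tests: skip on CRAN and CI, and skip
# unless the developer has opted in by setting CHTC_USERNAME.
skip_if_no_chtc <- function() {
  skip_on_cran()
  skip_on_ci()
  skip_if(
    !nzchar(Sys.getenv("CHTC_USERNAME")),
    "CHTC_USERNAME is not set; skipping integration test"
  )
}

test_that("integration: submit node is usable", {
  skip_if_no_chtc()
  # A real Layer 3 test would exercise the live submit node here.
  succeed()
})
```

Placing the guard in one helper keeps the opt-in rule in a single place, so every integration test skips for the same reasons.
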