| Title: | Scaffold and Submit Computational Jobs to HTC Schedulers |
|---|---|
| Description: | Provides scaffolding tools to help researchers prepare and submit computational jobs to high-throughput computing (HTC) schedulers. Generates the files required to run containerized R analyses on 'HTCondor', including submit files and executable scripts, and wraps the system commands needed to stage files, submit jobs, monitor status, and retrieve results from a CHTC submit node. Provides 'htc_config()' for managing connection details and SSH connection reuse guidance. Works naturally alongside 'containr' for container image management and 'toolero' for dataset splitting and project scaffolding. |
| Authors: | Erwin Lares [aut, cre] (ORCID: <https://orcid.org/0000-0002-3284-828X>) |
| Maintainer: | Erwin Lares <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-05-19 11:25:46 UTC |
| Source: | https://github.com/cran/submitr |
htc_config() creates or reads an htc.cfg file that stores the
connection details needed by htc_stage(), htc_submit(),
htc_status(), and htc_fetch_results(). On first use it prompts
interactively for your username and server address, writes htc.cfg
to path, and adds it to .gitignore. Subsequent calls read the
existing file.
htc_config(username = NULL, server = NULL, path = ".", overwrite = FALSE)
username |
A character string. Your HTC username (NetID), e.g.
|
server |
A character string. The HTC submit server hostname.
Defaults to |
path |
A character string. Directory where |
overwrite |
Logical. If |
A named list with elements username and server, returned
invisibly.
Each call to htc_stage(), htc_submit(), htc_status(), or
htc_fetch_results() opens a new SSH connection to the submit server,
which triggers a Duo MFA prompt each time. You can avoid this by
configuring SSH connection reuse (ControlMaster) in your
~/.ssh/config file. Add the following block:
Host *.chtc.wisc.edu
  ControlMaster auto
  ControlPersist 2h
  ControlPath ~/.ssh/connections/%r@%h:%p
Then create the connections directory:
mkdir -p ~/.ssh/connections
After this, only the first connection in a two-hour window will require Duo authentication. Full documentation: https://chtc.cs.wisc.edu/uw-research-computing/configure-ssh
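To confirm that connection reuse is configured before starting a long session, a small check along these lines may help. This is a minimal sketch in base R; `has_control_master()` is a hypothetical helper, not part of submitr, and it only looks for a `ControlMaster auto` or `ControlMaster yes` line anywhere in the file.

```r
# Hypothetical helper (not part of submitr): report whether an SSH config
# file contains a "ControlMaster auto" or "ControlMaster yes" directive.
has_control_master <- function(config_path = "~/.ssh/config") {
  config_path <- path.expand(config_path)
  if (!file.exists(config_path)) return(FALSE)
  lines <- readLines(config_path, warn = FALSE)
  any(grepl("^\\s*ControlMaster\\s+(auto|yes)\\b", lines,
            ignore.case = TRUE, perl = TRUE))
}

# Demonstrate on a temporary file rather than the real ~/.ssh/config
demo_config <- tempfile()
writeLines(
  c("Host *.chtc.wisc.edu", "  ControlMaster auto", "  ControlPersist 2h"),
  demo_config
)
configured <- has_control_master(demo_config)
print(configured)
```

The check does not verify that the directive sits under the right `Host` block, so treat a `TRUE` result as a hint rather than a guarantee.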
htc.cfg contains your username and server address. Neither is
sensitive on its own, but htc_config() adds htc.cfg to
.gitignore on creation to avoid accidentally committing
institutional account details to a public repository.
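The .gitignore step can be pictured as an idempotent append. The following is a minimal sketch in base R, assuming a plain one-entry-per-line .gitignore; `add_to_gitignore()` is a hypothetical name and this is not necessarily how htc_config() does it internally.

```r
# Hypothetical helper (not part of submitr): add an entry to .gitignore
# only if it is not already listed.
add_to_gitignore <- function(entry, path = ".") {
  gitignore <- file.path(path, ".gitignore")
  existing <- if (file.exists(gitignore)) {
    readLines(gitignore, warn = FALSE)
  } else {
    character()
  }
  if (!entry %in% trimws(existing)) {
    writeLines(c(existing, entry), gitignore)
  }
  invisible(gitignore)
}

# Demonstrate in a throwaway directory
demo_dir <- tempfile("proj")
dir.create(demo_dir)
add_to_gitignore("htc.cfg", demo_dir)
add_to_gitignore("htc.cfg", demo_dir)  # second call leaves the file unchanged
ignored <- readLines(file.path(demo_dir, ".gitignore"))
print(ignored)
```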
# Preview what htc_config() would return without writing any files
cfg <- list(username = "netid", server = "ap2002.chtc.wisc.edu")
str(cfg)

## Not run:
# Interactive first-time setup
cfg <- htc_config()

# Non-interactive setup (for scripts)
cfg <- htc_config(
  username = "erwin.lares",
  server = "ap2002.chtc.wisc.edu"
)

# Force recreation of htc.cfg
cfg <- htc_config(overwrite = TRUE)

# Use in other functions
htc_upload(files = c("job.sub", "job.sh"), config = cfg)
## End(Not run)
htc_download() copies one or more files from a directory on an HTC
submit node to a local directory via scp. It is the final step in the
job submission workflow – called after htc_status() confirms all jobs
have completed.
htc_download(
  files,
  remote_path = "~/",
  local_path = ".",
  config = NULL,
  dry_run = FALSE,
  verbose = FALSE
)
files |
A character vector. One or more filenames or glob patterns
to download from |
remote_path |
A character string. The directory on the submit node
where the files are located. Defaults to |
local_path |
A character string. The local directory where downloaded
files will be saved. Defaults to |
config |
A named list as returned by |
dry_run |
Logical. If |
verbose |
Logical. If |
Glob patterns such as "*.tar.gz" are supported and are evaluated on the
remote server, not locally, so they match files that exist on the submit
node regardless of what is present on your local machine.
Called for its side effects. Returns invisible(NULL).
htc_download() is the final system-facing step in the submitr workflow.
Call it after htc_status() confirms all jobs have completed.
cfg <- htc_config()
htc_status(cluster_id = 6302877, config = cfg, watch = TRUE)

# Download all result tarballs
htc_download(
  files = "*.tar.gz",
  config = cfg,
  local_path = "results/"
)
Glob patterns are passed to the remote shell for evaluation so they
match files on the submit node, not on your local machine. The pattern
is single-quoted in the scp command to prevent local shell expansion.
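The quoting described above can be illustrated with base R's `shQuote()`. This is a sketch only: `build_scp_download()` is a hypothetical function, and the exact command string that htc_download() assembles may differ (use `dry_run = TRUE` to see the real one).

```r
# Hypothetical helper (not part of submitr): assemble an scp command in which
# each remote source is single-quoted so the local shell leaves globs alone.
build_scp_download <- function(files, username, server,
                               remote_path = "~/", local_path = ".") {
  remote <- paste0(username, "@", server, ":", remote_path, files)
  paste(
    "scp",
    paste(shQuote(remote, type = "sh"), collapse = " "),
    shQuote(local_path, type = "sh")
  )
}

download_cmd <- build_scp_download(
  files = "*.tar.gz",
  username = "netid",
  server = "ap2002.chtc.wisc.edu",
  local_path = "results/"
)
print(download_cmd)
```

Because the pattern reaches scp unexpanded, a local file that happens to match `*.tar.gz` has no effect on what is requested from the submit node.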
Common patterns:
"*.tar.gz" – all result tarballs
"*.log" – all log files
"*.out" – all output files
"*.err" – all error files
Each call to htc_download() opens a new SSH connection. If you have
not configured ControlMaster in your ~/.ssh/config, this will trigger
a Duo MFA prompt. Run htc_config() for setup guidance.
# Preview the scp command without connecting to CHTC
cfg <- list(username = "netid", server = "ap2002.chtc.wisc.edu")
htc_download(files = "*.tar.gz", config = cfg, dry_run = TRUE)

## Not run:
# All remaining examples require a live CHTC connection
cfg <- htc_config()

# Download a single file
htc_download(files = "results.tar.gz", config = cfg)

# Download multiple specific files
htc_download(
  files = c("job.log", "job.err", "results.tar.gz"),
  config = cfg
)

# Download all result tarballs using a glob pattern
htc_download(
  files = "*.tar.gz",
  config = cfg,
  local_path = "results/"
)

# Download all log files from a specific remote directory
htc_download(
  files = "*.log",
  remote_path = "~/projects/penguins/",
  local_path = "logs/",
  config = cfg
)
## End(Not run)
htc_gen_executable() writes a ready-to-use bash script (.sh) that
HTCondor runs inside the container for each job. The script creates a
results folder, runs the R script via Rscript, and compresses the
results into a tarball for transfer back to the submit node.
htc_gen_executable(
  output_file = "job.sh",
  r_script = NULL,
  results_folder = "results-folder",
  mode = "single",
  set_executable = TRUE,
  verbose = FALSE,
  comments = FALSE,
  output = "."
)
output_file |
A character string. Name of the shell script to write.
Must end in |
r_script |
A character string. Name of the R script that HTCondor
will run inside the container, e.g. |
results_folder |
A character string. Name of the folder created
inside the container to hold job outputs before compression. Defaults
to |
mode |
A character string. Execution mode. |
set_executable |
Logical. If |
verbose |
Logical. If |
comments |
Logical. If |
output |
A character string. Directory where the shell script will
be written. Defaults to |
Called for its side effects. Writes a bash script to
file.path(output, output_file) and sets executable permissions when
set_executable = TRUE. Returns invisible(NULL).
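For orientation, a script of the kind described (create a results folder, run the R script, compress the results) could be written from R roughly as follows. This is a sketch under stated assumptions, not the text that htc_gen_executable() actually emits: `write_job_script()` is a hypothetical function, the tarball name `results.tar.gz` is assumed, and the `"${1}"` argument reflects "multiple" mode.

```r
# Hypothetical sketch (not submitr's generated output): write a small bash
# script that makes a results folder, runs an R script, and tars the results.
write_job_script <- function(r_script, results_folder = "results-folder",
                             output_file = "job.sh", output = tempdir()) {
  script <- c(
    "#!/bin/bash",
    sprintf("mkdir -p %s", results_folder),
    sprintf("Rscript %s \"${1}\"", r_script),   # ${1}: subset filename
    sprintf("tar -czf results.tar.gz %s", results_folder)  # assumed name
  )
  path <- file.path(output, output_file)
  writeLines(script, path)
  Sys.chmod(path, mode = "0755")  # mark as executable (no effect on Windows)
  invisible(path)
}

script_path <- write_job_script("analysis.R")
script_lines <- readLines(script_path)
cat(script_lines, sep = "\n")
```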
The executable script generated by htc_gen_executable() is the file
referenced by the executable argument in htc_gen_submit(). The two
functions should always use the same mode. In "multiple" mode,
HTCondor passes each subset filename to the script as ${1}, which is
forwarded to the R script as a positional argument. The R script must
be written to accept this argument – the recommended approach is
toolero::detect_execution_context():
context <- toolero::detect_execution_context()
input_file <- switch(context,
  interactive = "data/penguins.csv",
  quarto = params$input_file,
  rscript = commandArgs(trailingOnly = TRUE)[1]
)
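If toolero is not available, the positional argument can also be read with base R alone. The sketch below assumes a hypothetical helper, `resolve_input_file()`, that falls back to a default path when no argument was passed; it does not cover the Quarto case handled by toolero::detect_execution_context().

```r
# Hypothetical base-R fallback: use the first trailing command-line argument
# when present, otherwise a default path for interactive development.
resolve_input_file <- function(args = commandArgs(trailingOnly = TRUE),
                               default = "data/penguins.csv") {
  if (length(args) >= 1 && nzchar(args[1])) args[1] else default
}

# As HTCondor would invoke it: Rscript analysis.R subset_1.csv
from_argument <- resolve_input_file(args = "subset_1.csv")

# As an interactive session would see it: no trailing arguments
from_default <- resolve_input_file(args = character())

print(c(from_argument, from_default))
```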
# Single-job executable script
htc_gen_executable(r_script = "analysis.R", output = tempdir())

# Multiple-job executable script
htc_gen_executable(
  r_script = "analysis.R",
  mode = "multiple",
  output = tempdir()
)

# Custom R script name with annotations
htc_gen_executable(
  output_file = "run.sh",
  r_script = "run-analysis.R",
  comments = TRUE,
  verbose = TRUE,
  output = tempdir()
)
htc_gen_submit() writes a ready-to-use HTCondor submit file (.sub)
for running a containerized R job on an HTC cluster such as CHTC. It
supports both single-job and multiple-job submission modes.
htc_gen_submit(
  output_file = "job.sub",
  container_image = NULL,
  executable = NULL,
  input_files = NULL,
  output_files = NULL,
  mode = "single",
  queue = 1L,
  queue_from = NULL,
  resources = "small",
  custom_resources = NULL,
  gpu = FALSE,
  gpu_options = NULL,
  verbose = FALSE,
  comments = FALSE,
  output = "."
)
output_file |
A character string. Name of the submit file to write.
Must end in |
container_image |
A character string. The container image to use,
including the registry prefix, e.g.
|
executable |
A character string. The shell script that HTCondor will
run inside the container, e.g. |
input_files |
A character vector. Files to transfer to the job's
working directory before execution, e.g. |
output_files |
A character vector. Files to transfer back from the
job's working directory after execution. In |
mode |
A character string. Submission mode. |
queue |
A positive integer. Number of identical jobs to submit.
Only used when |
queue_from |
A character string. Path to the manifest file produced
by |
resources |
A character string. Compute resource preset. One of
|
custom_resources |
A named list. Required when |
gpu |
Logical. If |
gpu_options |
A named list or |
verbose |
Logical. If |
comments |
Logical. If |
output |
A character string. Directory where the submit file (and,
in |
Called for its side effects. Writes an HTCondor submit file to
file.path(output, output_file). In "multiple" mode also writes
subdatasets.csv to output. Returns invisible(NULL).
When mode = "multiple", HTCondor passes each subset filename to the
executable as a positional argument via arguments = $(file). Your R
script must be written to accept and use this argument. The recommended
approach is to use toolero::detect_execution_context() in your analysis
script, which resolves the input file path correctly across interactive,
Quarto, and Rscript execution contexts:
context <- toolero::detect_execution_context()
input_file <- switch(context,
  interactive = "data/penguins.csv",
  quarto = params$input_file,
  rscript = commandArgs(trailingOnly = TRUE)[1]
)
data <- readr::read_csv(input_file)
The typical workflow is:
1. Write and develop your analysis in analysis.qmd using toolero::detect_execution_context() for data loading.
2. Split your dataset with toolero::write_by_group(manifest = TRUE) to produce subset CSV files and a manifest.csv.
3. Strip analysis.qmd to analysis.R with knitr::purl().
4. Call htc_gen_submit(mode = "multiple", queue_from = "manifest.csv") to produce the submit file and subdatasets.csv.
5. Copy analysis.R, the subset data files, analysis.sub, analysis.sh, and subdatasets.csv to CHTC and submit.
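To make the htc_gen_submit() step of the workflow above more concrete, here is a rough sketch of what a "multiple"-mode submit file might contain, assembled in base R. The HTCondor submit commands shown (`arguments`, `transfer_input_files`, `queue ... from`) are standard, but `build_submit_lines()` is a hypothetical function, the log file naming is assumed, and the file that htc_gen_submit() writes may be laid out differently.

```r
# Hypothetical sketch (not submitr's generated output): lines of an HTCondor
# submit file that queues one job per filename listed in a manifest.
build_submit_lines <- function(container_image, executable, r_script,
                               manifest = "subdatasets.csv",
                               cpus = 2, memory = "8GB", disk = "4GB") {
  c(
    "universe = container",
    sprintf("container_image = %s", container_image),
    sprintf("executable = %s", executable),
    "arguments = $(file)",                      # subset filename per job
    sprintf("transfer_input_files = %s, $(file)", r_script),
    "log = job_$(Cluster)_$(Process).log",      # assumed naming scheme
    "output = job_$(Cluster)_$(Process).out",
    "error = job_$(Cluster)_$(Process).err",
    sprintf("request_cpus = %d", cpus),
    sprintf("request_memory = %s", memory),
    sprintf("request_disk = %s", disk),
    sprintf("queue file from %s", manifest)
  )
}

submit_lines <- build_submit_lines(
  container_image = "docker://registry.doit.wisc.edu/netid/myimage",
  executable = "analysis.sh",
  r_script = "analysis.R"
)
cat(submit_lines, sep = "\n")
```

Each line of the manifest becomes one job, and `$(file)` is replaced with that line's value both in the argument list and in the input transfer list.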
Resource presets are loaded at runtime from inst/extdata/htc-resources.yaml.
To customize presets for a specific project, copy that file to your project
directory as htc-resources.yaml and edit the values. htc_gen_submit()
checks for a local htc-resources.yaml in the working directory first,
falling back to the package default if none is found.
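The lookup order described above amounts to a "local file first, package default second" rule. A minimal sketch, assuming a hypothetical helper named `resolve_presets_file()` (the package's internal function may be named and structured differently):

```r
# Hypothetical helper (not part of submitr): prefer a project-local
# htc-resources.yaml and fall back to the copy shipped with the package.
resolve_presets_file <- function(
    dir = getwd(),
    fallback = system.file("extdata", "htc-resources.yaml",
                           package = "submitr")) {
  local_file <- file.path(dir, "htc-resources.yaml")
  if (file.exists(local_file)) local_file else fallback
}

# Demonstrate with two throwaway directories and a stand-in fallback path
with_override <- tempfile("with-override")
without_override <- tempfile("without-override")
dir.create(with_override)
dir.create(without_override)
writeLines("small:", file.path(with_override, "htc-resources.yaml"))

chosen_local <- resolve_presets_file(with_override, fallback = "pkg-default.yaml")
chosen_default <- resolve_presets_file(without_override, fallback = "pkg-default.yaml")
print(c(chosen_local, chosen_default))
```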
# Single-job submit file with default resource preset
htc_gen_submit(output = tempdir())

# Single-job submit file with medium resources and file transfer
htc_gen_submit(
  output_file = "analysis.sub",
  container_image = "docker://registry.doit.wisc.edu/netid/myimage",
  executable = "analysis.sh",
  input_files = "analysis.R",
  output_files = "results.tar.gz",
  resources = "medium",
  output = tempdir()
)

# Annotated submit file useful for learning HTCondor syntax
htc_gen_submit(
  output_file = "annotated.sub",
  comments = TRUE,
  verbose = TRUE,
  output = tempdir()
)

# Custom resource request
htc_gen_submit(
  resources = "custom",
  custom_resources = list(cpus = 2, memory = "8GB", disk = "4GB"),
  output = tempdir()
)

## Not run:
# Multiple-job submit file driven by a write_by_group() manifest
htc_gen_submit(
  output_file = "analysis.sub",
  container_image = "docker://registry.doit.wisc.edu/netid/myimage",
  executable = "analysis.sh",
  input_files = "analysis.R",
  mode = "multiple",
  queue_from = "data/manifest.csv",
  resources = "medium",
  output = "."
)
## End(Not run)
htc_status() connects to an HTC submit node via SSH and runs
condor_q to report the status of jobs in the queue. By default it
shows all of your jobs. Optionally filter by cluster ID to monitor a
specific submission.
htc_status(
  cluster_id = NULL,
  config = NULL,
  watch = FALSE,
  interval = 60L,
  dry_run = FALSE,
  verbose = FALSE
)
cluster_id |
An integer or character string. The cluster ID returned
by |
config |
A named list as returned by |
watch |
Logical. If |
interval |
A positive integer. Number of seconds to wait between
polls when |
dry_run |
Logical. If |
verbose |
Logical. If |
When watch = TRUE, htc_status() polls the queue repeatedly at a
fixed interval until all jobs in the cluster have completed, printing
a timestamped snapshot after each poll.
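The polling behaviour can be sketched as a simple loop. In the example below, `watch_until()` is a hypothetical function, and both the poller and the "done" test are stand-ins supplied by the caller; how htc_status() itself decides that a cluster has finished is not specified here.

```r
# Hypothetical sketch (not submitr's implementation): poll at a fixed
# interval, print a timestamped snapshot, stop when is_done() says so.
watch_until <- function(poll_fn, is_done, interval = 60, max_polls = Inf) {
  polls <- 0
  repeat {
    polls <- polls + 1
    snapshot <- poll_fn()
    cat(format(Sys.time(), "[%Y-%m-%d %H:%M:%S]"), "\n", sep = "")
    cat(snapshot, sep = "\n")
    if (is_done(snapshot) || polls >= max_polls) break
    Sys.sleep(interval)
  }
  invisible(snapshot)
}

# Fake poller standing in for an SSH call to condor_q
remaining <- 3
fake_poll <- function() {
  remaining <<- remaining - 1
  sprintf("%d jobs in queue", remaining)
}

last_snapshot <- watch_until(
  poll_fn = fake_poll,
  is_done = function(x) identical(x, "0 jobs in queue"),
  interval = 0  # no waiting in this demonstration
)
```

The `max_polls` cap is an addition for safety in the sketch, so that a job stuck in the queue cannot keep the loop running forever.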
Called for its side effects. Prints the condor_q output to the
console. Returns the most recent output invisibly as a character vector.
HTCondor reports each job's status with a single letter:
| Code | Meaning |
|---|---|
| I | Idle -- waiting for a matching execute node |
| R | Running -- currently executing |
| H | Held -- paused, usually due to an error |
| C | Completed -- finished successfully |
| X | Removed -- cancelled |
| S | Suspended |
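When post-processing captured condor_q output, the table above can be kept as a named vector for lookups. This is an illustrative sketch; `job_status_codes` and `describe_status()` are hypothetical names, not objects exported by submitr.

```r
# Status codes from the table above, as a named lookup vector
job_status_codes <- c(
  I = "Idle",
  R = "Running",
  H = "Held",
  C = "Completed",
  X = "Removed",
  S = "Suspended"
)

# Hypothetical helper: translate codes, labelling anything unrecognised
describe_status <- function(codes) {
  meaning <- unname(job_status_codes[codes])
  ifelse(is.na(meaning), "Unknown", meaning)
}

described <- describe_status(c("R", "H", "Z"))
print(described)
```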
Jobs disappear from condor_q once they complete and their output has
been transferred back to the submit node. Use htc_download() to retrieve
completed job output.
cfg <- htc_config()

# One-shot status check
htc_status(config = cfg)

# Monitor a specific cluster until completion
htc_status(cluster_id = 6302860, config = cfg, watch = TRUE)
Each poll in watch mode opens a new SSH connection. Configuring
ControlMaster in your ~/.ssh/config (see htc_config()) is strongly
recommended when using watch = TRUE to avoid repeated Duo MFA prompts.
# Preview the SSH command without connecting to CHTC
cfg <- list(username = "netid", server = "ap2002.chtc.wisc.edu")
htc_status(config = cfg, dry_run = TRUE)

# Preview with a specific cluster ID
htc_status(cluster_id = 6302860, config = cfg, dry_run = TRUE)

## Not run:
# All remaining examples require a live CHTC connection
cfg <- htc_config()

# Check all your jobs
htc_status(config = cfg)

# Check a specific cluster
htc_status(cluster_id = 6302860, config = cfg)

# Watch a cluster until all jobs complete (polls every 60 seconds)
htc_status(cluster_id = 6302860, config = cfg, watch = TRUE)

# Watch with a shorter polling interval
htc_status(cluster_id = 6302860, config = cfg, watch = TRUE, interval = 30)
## End(Not run)
htc_submit() connects to an HTC submit node via SSH and runs
condor_submit on a submit file that has already been uploaded with
htc_upload(). It changes into the remote directory before submitting
so that relative paths in the submit file resolve correctly.
htc_submit(
  submit_file = "job.sub",
  remote_path = "~/",
  config = NULL,
  dry_run = FALSE,
  verbose = FALSE
)
submit_file |
A character string. Name of the submit file on the
remote node, e.g. |
remote_path |
A character string. The directory on the submit node
where the submit file was uploaded. Defaults to |
config |
A named list as returned by |
dry_run |
Logical. If |
verbose |
Logical. If |
The cluster ID assigned by HTCondor as a character string,
returned invisibly. Pass it directly to htc_status() to monitor
job progress. Returns invisible(NULL) if the cluster ID cannot be
parsed from the condor_submit output.
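Extracting the cluster ID can be sketched with a regular expression over the captured output. The example assumes the usual condor_submit message of the form "N job(s) submitted to cluster ID."; `parse_cluster_id()` is a hypothetical function and may not match the package's own parsing.

```r
# Hypothetical helper (not part of submitr): pull the cluster ID out of
# condor_submit output, returning NULL when no ID can be found.
parse_cluster_id <- function(output) {
  hit <- regmatches(output, regexpr("submitted to cluster [0-9]+", output))
  if (length(hit) == 0) return(NULL)
  sub("submitted to cluster ", "", hit[1], fixed = TRUE)
}

example_output <- c(
  "Submitting job(s).",
  "1 job(s) submitted to cluster 6302860."
)
cluster_id <- parse_cluster_id(example_output)
print(cluster_id)

# Output without the expected message yields NULL
no_id <- parse_cluster_id("ERROR: Failed to parse submit file")
```

Keeping the ID as a character string, as the return value above does, avoids any integer overflow or formatting surprises when it is pasted into a later command.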
htc_submit() is the second system-facing step in the submitr workflow.
Call it after uploading your files with htc_upload(). The returned
cluster ID can be passed directly to htc_status().
cfg <- htc_config()
htc_upload(
files = c("job.sub", "job.sh", "analysis.R"),
config = cfg
)
cluster_id <- htc_submit(submit_file = "job.sub", config = cfg)
htc_status(cluster_id = cluster_id, config = cfg, watch = TRUE)
remote_path must match htc_upload()
htc_submit() runs cd remote_path && condor_submit submit_file on the
submit node. HTCondor resolves all paths in the submit file relative to
the directory where condor_submit is called. If remote_path does not
match the directory where files were uploaded, HTCondor will not find the
executable, input files, or output destinations and the job will fail.
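The remote command described above can be previewed with a small string-building sketch. `build_submit_command()` is a hypothetical function; for the command the package really runs, call htc_submit() with `dry_run = TRUE`.

```r
# Hypothetical helper (not part of submitr): compose an ssh invocation that
# changes into remote_path before running condor_submit.
build_submit_command <- function(submit_file, username, server,
                                 remote_path = "~/") {
  remote_cmd <- sprintf("cd %s && condor_submit %s", remote_path, submit_file)
  paste(
    "ssh",
    paste0(username, "@", server),
    shQuote(remote_cmd, type = "sh")  # quoted so && runs on the remote side
  )
}

submit_cmd <- build_submit_command(
  submit_file = "analysis.sub",
  username = "netid",
  server = "ap2002.chtc.wisc.edu",
  remote_path = "~/projects/penguins/"
)
print(submit_cmd)
```

Quoting the whole remote command matters: without it, the local shell would treat `&&` as its own operator and try to run condor_submit on your machine.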
Each call to htc_submit() opens a new SSH connection. If you have not
configured ControlMaster in your ~/.ssh/config, this will trigger a
Duo MFA prompt. Run htc_config() for setup guidance.
# Preview the SSH command without connecting to CHTC
cfg <- list(username = "netid", server = "ap2002.chtc.wisc.edu")
htc_submit(submit_file = "job.sub", config = cfg, dry_run = TRUE)

## Not run:
# All remaining examples require a live CHTC connection
cfg <- htc_config()

# Submit using default remote path
htc_submit(submit_file = "job.sub", config = cfg)

# Submit from a specific remote directory
htc_submit(
  submit_file = "analysis.sub",
  remote_path = "~/projects/penguins/",
  config = cfg
)

# Submit with verbose output to see condor_submit response
htc_submit(
  submit_file = "job.sub",
  config = cfg,
  verbose = TRUE
)
## End(Not run)
htc_upload() copies one or more local files or directories to a
directory on an HTC submit node via scp. It is the first step in the
job submission workflow – files must be present on the submit node before
htc_submit() can run condor_submit.
htc_upload(
  files,
  remote_path = "~/",
  config = NULL,
  dry_run = FALSE,
  verbose = FALSE
)
files |
A character vector. One or more local file paths or directory paths to copy to the submit node. A single file, a vector of files, and a directory path are all accepted. Directories are copied recursively. |
remote_path |
A character string. The destination directory on the
submit node. Defaults to |
config |
A named list as returned by |
dry_run |
Logical. If |
verbose |
Logical. If |
Called for its side effects. Returns invisible(NULL).
htc_upload() is the first system-facing step in the submitr workflow.
Call it after generating your submit file and executable script with
htc_gen_submit() and htc_gen_executable(), and before calling
htc_submit().
The typical sequence is:
cfg <- htc_config()
htc_upload(
files = c("job.sub", "job.sh", "analysis.R", "data.csv"),
config = cfg
)
htc_submit(submit_file = "job.sub", config = cfg)
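Because directories are copied recursively, an upload command needs scp's `-r` flag whenever a directory is among the inputs. A minimal sketch of that decision, assuming a hypothetical helper `build_scp_upload()` (the command htc_upload() builds may differ; `dry_run = TRUE` shows the real one):

```r
# Hypothetical helper (not part of submitr): compose an scp upload command,
# adding -r only when at least one of the inputs is a directory.
build_scp_upload <- function(files, username, server, remote_path = "~/") {
  recursive <- if (any(dir.exists(files))) "-r" else NULL
  target <- paste0(username, "@", server, ":", remote_path)
  paste(
    c("scp", recursive, shQuote(files, type = "sh"), shQuote(target, type = "sh")),
    collapse = " "
  )
}

# Plain files: no -r flag
files_cmd <- build_scp_upload(
  files = c("job.sub", "job.sh"),
  username = "netid",
  server = "ap2002.chtc.wisc.edu"
)
print(files_cmd)

# A directory (here the session temp directory): -r is added
dir_cmd <- build_scp_upload(
  files = tempdir(),
  username = "netid",
  server = "ap2002.chtc.wisc.edu"
)
```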
Each call to htc_upload() opens a new SSH connection. If you have not
configured ControlMaster in your ~/.ssh/config, this will trigger a
Duo MFA prompt. Run htc_config() for setup guidance.
# Preview the scp command without connecting to CHTC
cfg <- list(username = "netid", server = "ap2002.chtc.wisc.edu")
tmp <- tempfile(fileext = ".sub")
writeLines("queue 1", tmp)
htc_upload(files = tmp, config = cfg, dry_run = TRUE)

## Not run:
# All remaining examples require a live CHTC connection
cfg <- htc_config()

# Upload a single file
htc_upload(files = "job.sub", config = cfg)

# Upload multiple files
htc_upload(
  files = c("job.sub", "job.sh", "analysis.R"),
  config = cfg
)

# Upload a directory
htc_upload(files = "jobs/", config = cfg)

# Upload to a specific remote directory
htc_upload(
  files = c("job.sub", "job.sh"),
  remote_path = "~/projects/penguins/",
  config = cfg
)
## End(Not run)