Title: Evaluate Function Calls on HPC Schedulers (LSF, SGE, SLURM, PBS/Torque)
Description: Evaluate arbitrary function calls using workers on HPC schedulers in a single line of code. All processing is done on the network without accessing the file system. Remote schedulers are supported via SSH.
Authors: Michael Schubert [aut, cre, cph], ZeroMQ authors [aut, cph] (source files in 'src/libzmq' and 'src/cppzmq')
Maintainer: Michael Schubert <[email protected]>
License: Apache License (== 2.0) | file LICENSE
Version: 0.9.5
Built: 2024-11-19 06:45:19 UTC
Source: CRAN
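Which scheduler is used (and, for remote submission, the SSH connector) is controlled through R options rather than function arguments. A minimal sketch, assuming the 'clustermq.scheduler' and 'clustermq.ssh.host' options; the host name below is a placeholder:

# Select the scheduler backend before calling Q()
options(clustermq.scheduler = "slurm")        # or "lsf", "sge", "pbs", "multicore", ...

# Remote submission: run R locally, submit jobs from a login node over SSH
options(clustermq.scheduler = "ssh",
        clustermq.ssh.host = "user@login-node")   # placeholder host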
Queue function calls on the cluster
Q(fun, ..., const = list(), export = list(), pkgs = c(), seed = 128965,
  memory = NULL, template = list(), n_jobs = NULL, job_size = NULL,
  split_array_by = -1, rettype = "list", fail_on_error = TRUE,
  workers = NULL, log_worker = FALSE, chunk_size = NA, timeout = Inf,
  max_calls_worker = Inf, verbose = TRUE)
fun: A function to call
...: Objects to be iterated in each function call
const: A list of constant arguments passed to each function call
export: List of objects to be exported to the worker
pkgs: Character vector of packages to load on the worker
seed: A seed to set for each function call
memory: Short for 'template=list(memory=value)'
template: A named list of values to fill in the scheduler template
n_jobs: The number of jobs to submit; upper limit of jobs if job_size is given as well
job_size: The number of function calls per job
split_array_by: The dimension number to split any arrays in '...'; default: last
rettype: Return type of function call (vector type or 'list')
fail_on_error: If an error occurs on the workers, continue or fail?
workers: Optional instance of QSys representing a worker pool
log_worker: Write a log file for each worker
chunk_size: Number of function calls to chunk together; defaults to 100 chunks per worker or max. 10 kb per chunk
timeout: Maximum time in seconds to wait for worker (default: Inf)
max_calls_worker: Maximum number of chunks that will be sent to one worker
verbose: Print status messages and progress bar (default: TRUE)
A list of whatever 'fun' returned
## Not run:
# Run a simple multiplication for numbers 1 to 3 on a worker node
fx = function(x) x * 2
Q(fx, x=1:3, n_jobs=1)
# list(2,4,6)

# Run a mutate() call in dplyr on a worker node
iris %>% mutate(area = Q(`*`, e1=Sepal.Length, e2=Sepal.Width, n_jobs=1))
# iris with an additional column 'area'
## End(Not run)
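The shipped examples only iterate over '...'. The sketch below is not part of the package examples; the names 'fx', 'base' and 'offset' are made up to illustrate how 'const', 'export' and 'pkgs' combine in one call:

# Iterate over x, pass 'base' unchanged to every call, make 'offset'
# available on the workers, and load the 'stats' package there
fx = function(x, base) stats::median(x) + base + offset
Q(fx, x = list(1:3, 4:6),
  const = list(base = 10),
  export = list(offset = 0.5),
  pkgs = "stats",
  n_jobs = 1)
# list(12.5, 15.5)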
Queue function calls defined by rows in a data.frame
Q_rows(df, fun, const = list(), export = list(), pkgs = c(), seed = 128965,
       memory = NULL, template = list(), n_jobs = NULL, job_size = NULL,
       rettype = "list", fail_on_error = TRUE, workers = NULL,
       log_worker = FALSE, chunk_size = NA, timeout = Inf,
       max_calls_worker = Inf, verbose = TRUE)
df: data.frame with iterated arguments
fun: A function to call
const: A list of constant arguments passed to each function call
export: List of objects to be exported to the worker
pkgs: Character vector of packages to load on the worker
seed: A seed to set for each function call
memory: Short for 'template=list(memory=value)'
template: A named list of values to fill in the scheduler template
n_jobs: The number of jobs to submit; upper limit of jobs if job_size is given as well
job_size: The number of function calls per job
rettype: Return type of function call (vector type or 'list')
fail_on_error: If an error occurs on the workers, continue or fail?
workers: Optional instance of QSys representing a worker pool
log_worker: Write a log file for each worker
chunk_size: Number of function calls to chunk together; defaults to 100 chunks per worker or max. 10 kb per chunk
timeout: Maximum time in seconds to wait for worker (default: Inf)
max_calls_worker: Maximum number of chunks that will be sent to one worker
verbose: Print status messages and progress bar (default: TRUE)
## Not run:
# Run a simple multiplication for data frame columns x and y on a worker node
fx = function(x, y) x * y
df = data.frame(x = 5, y = 10)
Q_rows(df, fx, job_size = 1)
# [1] 50

# Q_rows also matches the names of a data frame with the function arguments
fx = function(x, y) x - y
df = data.frame(y = 5, x = 10)
Q_rows(df, fx, job_size = 1)
# [1] 5
## End(Not run)
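Since each data.frame row becomes one function call with arguments matched by column name, a parameter grid built with expand.grid() maps directly onto Q_rows(). A sketch of that pattern (assumed usage, not a shipped example):

# One call per row of the parameter grid
fx = function(x, y) x^y
grid = expand.grid(x = 1:3, y = 2:3)
Q_rows(grid, fx, n_jobs = 1)
# list(1, 4, 9, 1, 8, 27)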
Register clustermq as 'foreach' parallel handler
register_dopar_cmq(...)
...: List of arguments passed to the 'Q' function, e.g. n_jobs
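A minimal sketch of how the registered handler might be used with foreach's %dopar% operator; the argument values below are illustrative, not defaults:

library(foreach)
register_dopar_cmq(n_jobs = 2, memory = 1024)  # arguments are forwarded to Q()
foreach(i = 1:3) %dopar% sqrt(i)               # evaluated on scheduler workers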
Creates a pool of workers
workers(n_jobs, data = NULL, reuse = TRUE, template = list(),
        log_worker = FALSE,
        qsys_id = getOption("clustermq.scheduler", qsys_default),
        verbose = FALSE, ...)
n_jobs: Number of jobs to submit (0 implies local processing)
data: Set common data (function, constant args, seed)
reuse: Whether workers are reusable or get shut down after call
template: A named list of values to fill in the template
log_worker: Write a log file for each worker
qsys_id: Character string of QSys class to use
verbose: Print message about worker startup
...: Additional arguments passed to the qsys constructor
An instance of the QSys class
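A pool created this way can be passed to Q() via its 'workers' argument and, with reuse = TRUE, shared across several calls. A sketch of that pattern, assuming Q() picks the pool up directly; the final cleanup step is an assumption about the QSys API and may differ between versions:

w = workers(n_jobs = 2, reuse = TRUE)
Q(function(x) x + 1, x = 1:3, workers = w)   # first round of calls
Q(function(x) x * 2, x = 1:3, workers = w)   # reuses the same workers
w$cleanup()                                  # shut the pool down (method name assumed)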