Queue function calls on the cluster
Q(
  fun,
  ...,
  const = list(),
  export = list(),
  pkgs = c(),
  seed = 128965,
  memory = NULL,
  template = list(),
  n_jobs = NULL,
  job_size = NULL,
  rettype = "list",
  fail_on_error = TRUE,
  workers = NULL,
  log_worker = FALSE,
  chunk_size = NA,
  timeout = Inf,
  max_calls_worker = Inf,
  verbose = TRUE
)
fun | A function to call
... | Objects to be iterated in each function call
const | A list of constant arguments passed to each function call
export | List of objects to be exported to the worker
pkgs | Character vector of packages to load on the worker
seed | A seed to set for each function call
memory | Short for 'template=list(memory=value)'
template | A named list of values to fill in the scheduler template
n_jobs | The number of jobs to submit; upper limit of jobs if job_size is given as well
job_size | The number of function calls per job
rettype | Return type of function call (vector type or 'list')
fail_on_error | Whether to fail if an error occurs on a worker, or continue processing
workers | Optional instance of QSys representing a worker pool
log_worker | Write a log file for each worker
chunk_size | Number of function calls to chunk together; defaults to 100 chunks per worker or a maximum of 10 kb per chunk
timeout | Maximum time in seconds to wait for a worker (default: Inf)
max_calls_worker | Maximum number of chunks that will be sent to one worker
verbose | Print status messages and progress bar (default: TRUE)
A list of whatever 'fun' returned
## Not run:
# Run a simple multiplication for numbers 1 to 3 on a worker node
fx = function(x) x * 2
Q(fx, x=1:3, n_jobs=1)
# list(2,4,6)

# Run a mutate() call in dplyr on a worker node
iris %>% mutate(area = Q(`*`, e1=Sepal.Length, e2=Sepal.Width, n_jobs=1))
# iris with an additional column 'area'
## End(Not run)
Queue function calls defined by rows in a data.frame
Q_rows(
  df,
  fun,
  const = list(),
  export = list(),
  pkgs = c(),
  seed = 128965,
  memory = NULL,
  template = list(),
  n_jobs = NULL,
  job_size = NULL,
  rettype = "list",
  fail_on_error = TRUE,
  workers = NULL,
  log_worker = FALSE,
  chunk_size = NA,
  timeout = Inf,
  max_calls_worker = Inf,
  verbose = TRUE
)
df | data.frame with iterated arguments
fun | A function to call
const | A list of constant arguments passed to each function call
export | List of objects to be exported to the worker
pkgs | Character vector of packages to load on the worker
seed | A seed to set for each function call
memory | Short for 'template=list(memory=value)'
template | A named list of values to fill in the scheduler template
n_jobs | The number of jobs to submit; upper limit of jobs if job_size is given as well
job_size | The number of function calls per job
rettype | Return type of function call (vector type or 'list')
fail_on_error | Whether to fail if an error occurs on a worker, or continue processing
workers | Optional instance of QSys representing a worker pool
log_worker | Write a log file for each worker
chunk_size | Number of function calls to chunk together; defaults to 100 chunks per worker or a maximum of 10 kb per chunk
timeout | Maximum time in seconds to wait for a worker (default: Inf)
max_calls_worker | Maximum number of chunks that will be sent to one worker
verbose | Print status messages and progress bar (default: TRUE)
## Not run:
# Run a simple multiplication for data frame columns x and y on a worker node
fx = function(x, y) x * y
df = data.frame(x = 5, y = 10)
Q_rows(df, fx, job_size = 1)
# [1] 50

# Q_rows also matches the names of a data frame with the function arguments
fx = function(x, y) x - y
df = data.frame(y = 5, x = 10)
Q_rows(df, fx, job_size = 1)
# [1] 5
## End(Not run)
Register clustermq as 'foreach' parallel handler
register_dopar_cmq(...)
... | List of arguments passed to the 'Q' function, e.g. n_jobs
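A minimal sketch of using the handler (assumes the foreach package is installed and a scheduler is configured; the argument values shown, such as n_jobs and memory, are illustrative and forwarded to 'Q'):

```r
library(foreach)

# Arguments given here (e.g. n_jobs, memory) are forwarded to Q()
register_dopar_cmq(n_jobs = 1, memory = 1024)

# Subsequent %dopar% loops are dispatched to cluster workers
foreach(i = 1:3) %dopar% sqrt(i)
```

After registration, any package built on foreach (e.g. via %dopar%) can transparently use the clustermq backend.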
Creates a pool of workers
workers(
  n_jobs,
  data = NULL,
  reuse = TRUE,
  template = list(),
  log_worker = FALSE,
  qsys_id = getOption("clustermq.scheduler", qsys_default),
  verbose = FALSE,
  ...
)
n_jobs | Number of jobs to submit (0 implies local processing)
data | Set common data (function, constant args, seed)
reuse | Whether workers are reusable or get shut down after call
template | A named list of values to fill in template
log_worker | Write a log file for each worker
qsys_id | Character string of QSys class to use
verbose | Print message about worker startup
... | Additional arguments passed to the qsys constructor
An instance of the QSys class
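A short sketch of creating a pool and passing it to 'Q' via its 'workers' argument (assumes a scheduler is configured; the functions and argument values are illustrative):

```r
# Sketch: a reusable pool serving multiple Q() calls
w = workers(n_jobs = 2, reuse = TRUE)

Q(function(x) x + 1, x = 1:3, workers = w)
Q(function(x) x * 2, x = 1:3, workers = w)  # reuses the same pool
```

Here 'reuse = TRUE' keeps the workers alive across calls; with 'reuse = FALSE' they shut down after a single call, as described for the 'reuse' argument above.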