availableCores() and availableWorkers() gained support for R
options parallelly.availableCores.methods.excludes and
parallelly.availableWorkers.methods.excludes (with corresponding
environment variables) to specify lookup methods to be excluded by
default.
makeClusterPSOCK() and makeNodePSOCK() gained argument
rscript_call to customize the parallel worker loop, which
defaults to parallel:::.workRSOCK().
Add internal worker loop parallelly:::workRPSOCK() that takes
optional argument workCommand to customize the default
parallel:::workCommand().
Analogously to availableCores(), availableWorkers() queries also
Linux CGroups v2 CPU affinity values cpuset.cpus and
cpuset.cpus.effective.
availableCores(methods = "cgroups2.cpuset.cpus") would produce
"Error : 'length(cpuset) <= max_cores' is not TRUE" if CGroups v2
'cpuset.cpus' comprised an empty string.
availableCores(methods = "cgroups2.cpuset.cpus.effective") would
produce "Error in if (any(value < 0L | value >= max_cores)) { :
missing value where TRUE/FALSE needed" if parallel::detectCores()
returned a missing value.
Now makeClusterPSOCK(workers) produces a more informative error
message if workers is not an integer or a character vector.
availableCores() gained argument fraction, which allows you to
specify that a certain fraction of the available CPU cores should
be returned, e.g. availableCores(fraction = 0.5).
Now availableCores() queries also Linux CGroups v2 CPU affinity
values cpuset.cpus and cpuset.cpus.effective.
Give more information on invalid 'RichSOCKnode' connections.
availableWorkers(method = "Slurm") expands the compressed
hostname representation given by environment variable
SLURM_JOB_NODELIST to a vector of hostnames. For this it uses
scontrol show hostnames. If scontrol is not available, it falls
back to an internal parsing algorithm. This internal algorithm
would incorrectly pad some hostnames with zero, e.g. n[09-10]
would become c("n09", "n010") whereas it should be c("n09", "n10").
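The padding rule in the entry above can be sketched in base R. The helper expand_range() below is hypothetical and is not parallelly's internal parser; it handles a single bracketed range only, and it assumes that zero padding applies only when the lower bound is itself written with leading zeros.

```r
# Hypothetical helper: expand one Slurm-style range such as "n[09-10]"
# into individual hostnames. Not the parser used by parallelly.
expand_range <- function(spec) {
  pattern <- "^(.*)\\[([0-9]+)-([0-9]+)\\]$"
  m <- regmatches(spec, regexec(pattern, spec))[[1]]
  if (length(m) == 0L) return(spec)  # no bracketed range: return as-is

  prefix <- m[2]
  from <- m[3]
  to <- m[4]
  idx <- seq(as.integer(from), as.integer(to))

  # Assumption: pad to the width of the lower bound only if that bound
  # is written with leading zeros, e.g. "09" or "095"
  if (nchar(from) > 1L && startsWith(from, "0")) {
    idx <- sprintf(paste0("%0", nchar(from), "d"), idx)
  }
  paste0(prefix, idx)
}

expand_range("n[09-10]")  # zero-padded to width 2
expand_range("n[9-10]")   # no padding
```

Fixing the width to that of the lower bound is what keeps "10" from being padded to "010".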
availableWorkers(method = "Slurm") did not, as documented, fall
back to legacy environment variable SLURM_NODELIST when
SLURM_JOB_NODELIST is not set.
availableWorkers(method = "Slurm") could return the wrong set of
hostnames if environment variables SLURM_JOB_NODELIST,
SLURM_JOB_CPUS_PER_NODE and SLURM_CPUS_PER_TASK were all set,
because SLURM_CPUS_PER_TASK was treated as a string, not an
integer, in comparisons.
makeClusterPSOCK(..., setup_strategy = "sequential") would not
respect internal R options for how to retry with another TCP port,
if failing to start a cluster node.
Loading the package in the Positron Console would incorrectly set
environment variable _R_CHECK_LIMIT_CORES_ to TRUE, which in turn
would result in availableCores() being limited to a maximum of
two (2) CPU cores. This bug was introduced in parallelly 1.46.0
(2025-12-12).
parallel::clusterExport() on a makeClusterSequential() cluster
would export to the global environment rather than the local
environment of the cluster nodes.
Argument user of makeClusterPSOCK() did not recycle across
workers when length(user) == 1.
makeClusterPSOCK(default_packages = "*") with an empty R option
defaultPackages gave an error.
Now availableCores() returns 2 also when package vignettes are
built by R CMD build or R CMD check. This helps to prevent
package vignettes from overusing the CPU cores when building and
checking R packages.
Functions availableCores(), availableWorkers(), cpuLoad(),
and freePort() can now be called directly from the command line,
e.g. Rscript -e parallelly::availableCores --omit=1 and
Rscript -e parallelly::freePort.
parallelly.supportsMulticore.disableOn was documented to
disable forked ("multicore") processing in the RStudio Terminal,
but that was not the case due to a thinko. Default options and the
documentation have now been updated to reflect that it is only
disabled in the RStudio Console.
Setting parallelly.supportsMulticore.disableOn to hold
"rstudio_terminal" had no effect.
print() for RichSOCKcluster outputs a more concise summary,
which is also grammatically correct for single-node clusters.
makeClusterPSOCK() started to collect
session information on each parallel worker, which included
capabilities(). However, for unknown reasons, capabilities()
caused the cluster creation to fail on GitHub Actions running
macOS. The problem could be reproduced neither locally, on the
mac-builder, nor on the CRAN macOS servers. Because this feature is
non-critical and was only introduced in the previous version, I decided
to remove the collection of capabilities() again.
availableCores() gained argument max, which limits the maximum
number of cores returned after everything else is applied, i.e.
availableCores(..., max = n) is short for min(n, availableCores(...), na.rm = TRUE).
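The identity stated above can be illustrated without calling the package. In this sketch, cap_cores() is a hypothetical stand-in, and its ncores argument plays the role of the value that availableCores(...) would otherwise return.

```r
# Hypothetical stand-in for availableCores(..., max = n), following the
# identity min(n, availableCores(...), na.rm = TRUE) given in the entry.
cap_cores <- function(ncores, max = +Inf) {
  as.integer(min(max, ncores, na.rm = TRUE))
}

cap_cores(8L, max = 4)         # capped at 4
cap_cores(8L, max = 16)        # cap not reached, stays at 8
cap_cores(8L, max = NA_real_)  # a missing cap is ignored via na.rm = TRUE
```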
availableWorkers() gained argument ..., which passes any
additional arguments to availableCores(), if specified.
If killNode(..., signal = tools::SIGTERM) successfully signaled
the cluster node, it will now close any existing socket connection
to the node. If the node is running on the local host, it will also
remove its temporary directory, because the node's R process
might not have exited gracefully.
The session information collected by makeClusterPSOCK() now
contains more details on each worker, e.g. the tempdir() folder,
capabilities(), and extSoftVersion().
Cluster nodes created by makeClusterPSOCK() gained attribute
calls, which record the sys.calls(). This can be useful when
troubleshooting from where a cluster was created. Analogously,
setting R option parallelly.makeNodePSOCK.calls to TRUE will
relay the call stack in the system call that launched the cluster
node.
availableCores() would not respect method = "fallback" if
constraints specified "connections" or "connections-N".
availableCores() would produce error "Error in scan(file = file, what = what, ...)" on systems that have a /proc/self/mounts
file with syntax errors. Such files have been reported on Windows
Subsystem for Linux version 2 (WSL 2), where spaces in Windows paths
have not been properly escaped for some entries. Now such invalid
entries are skipped, before parsing the mount table.
Add support to availableCores() and availableWorkers() to
specify constraints = "connections-N", where N specifies the
number of connections to leave free after launching a PSOCK cluster
with this number of cores.
Add all.equal() for connection, which can distinguish between
two connections that share the same connection index, but are not
the same connection, e.g. when one was created, then closed, and
another one of the same kind is created.
availableCores() would not respect method = "fallback", since
v1.41.0 (2024-12-18), on a system with a value for method = "/proc/self/status".
Now availableCores() memoizes the values of all its components.
This means that as soon as it has been called, environment variables
such as NSLOTS will no longer be queried.
Starting with R 4.5.0, one can use parallel::makeCluster(n, type = parallelly::RPSOCK) as an alternative to
parallelly::makeClusterPSOCK(n). Similarly, type = parallelly::RMPI creates a cluster using
parallelly::makeClusterMPI(), and type = parallelly::SEQ
creates a cluster using parallelly::makeClusterSequential().
This was first introduced in parallelly 1.38.0, but here we
rename PSOCK to RPSOCK and MPI to RMPI to minimize the risk
of mistaking them for the built-in types in the parallel
package. The R stands for "Rich".
parallelly.maxWorkers.localhost
limits. Improved the warning and error messages that are produced
when these settings are exceeded.
future.debug is no longer used as a fallback for option
parallelly.debug.
isNodeAlive() could produce warnings on "doTryCatch(return(expr), name, parentenv, handler) : NAs introduced by coercion" on MS
Windows. Improved the internal tasklist parsers used to test
whether a process is alive.
availableCores() could produce Error: Error in cache_controller[[field]] : subscript out of bounds in
... getCGroups1CpuQuota -> getCGroups1CpuPeriodMicroseconds.
availableCores() and availableWorkers() now also support systems
where both CGroups v1 and CGroups v2 are enabled. Previously, such
configurations were completely ignored.
Calling isNodeAlive() and killNode() on cluster nodes running on
external machines would produce Error in match.arg(type, choices = known_types, several.ok = FALSE) : 'arg' must be of length 1. This
bug was introduced in version 1.38.0 (2024-07-27), when adding
richer support for the rscript_sh argument.
Calling isNodeAlive() and killNode() on cluster nodes running on
external machines would produce Error: ‘length(rsh_call) == 1L’ is not TRUE if option rshopts were specified during creation.
The value of availableCores() was numeric rather than integer as
documented. This harmless bug was introduced in version 1.31.0
(2022-04-07).
Now availableCores() queries also /proc/self/status for CPU
affinity allotments.
makeClusterPSOCK() will now produce an error, rather than a
warning, when the local system command used to launch the parallel
worker failed with a non-zero exit code.
Now serializedSize() always returns a double. Previously, it
would return an integer, if the value could be represented by an
integer. However, it turned out that returning an integer increased
the risk for integer overflow later on if, say, two such values
were added together.
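The overflow risk mentioned above is a general property of R integers, which can be seen in base R without serializedSize(). The values here are illustrative stand-ins for two large sizes.

```r
# Two "sizes" stored as integers; each is the largest value an R
# integer can hold (2147483647)
size_a <- .Machine$integer.max
size_b <- .Machine$integer.max

# Integer addition overflows: R returns NA and signals a warning
total_int <- suppressWarnings(size_a + size_b)

# The same sizes stored as doubles add up without overflow
total_dbl <- as.double(size_a) + as.double(size_b)

is.na(total_int)  # TRUE
total_dbl         # 4294967294
```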
makeClusterPSOCK() on MS Windows failed to launch remote workers,
with warnings on "In system(local_cmd, wait = FALSE, input = input) : 'C:\WINDOWS\System32\OpenSSH\ssh.exe' not found". This
bug was introduced in version 1.38.0 (2024-07-27), when adding
richer support for the rscript_sh argument.
Argument user of makeClusterPSOCK() may now be a vector of
usernames - one for each worker specified.
Querying of cgroups v1 'cpuquota' CPU limits broke in the previous release (v1.39.0).
availableCores() could produce error Failed to identify mount point for CGroups v1 controller 'cpuset' on some systems.
availableWorkers() would produce an invalid warning "Identified 8 workers from the ‘PE_HOSTFILE’ file (...), which is more than environment variable ‘NSLOTS’ = 8" when running via a Grid Engine
job scheduler.
R_PARALLELLY_RANDOM_PORTS now supports
multiple, comma-separated port specifications, e.g.
"20001:20999" and "1068:1099,20001:20999,40530".
help("makeClusterPSOCK") on how to use
systemd-run to limit workers' CPU quota and memory allowances.
availableCores() does a better job detecting cgroups v2
cpu.max CPU restrictions.
Now argument rshcmd of makeNodePSOCK() can be a function. It
must accept at least two arguments named rshopts and
worker. The rshopts argument is a character vector of length
zero or more. The worker argument is a string hostname. The
function must return a single string.
Now makeNodePSOCK() accepts rscript_sh = "none", which skips
quoting the Rscript call.
Now makeNodePSOCK() accepts rscript_sh of length one or two.
If length(rscript_sh) == 2, then rscript_sh[1] is for the inner
and rscript_sh[2] is for the outer shell quoting of the Rscript
call. More precisely, rscript_sh[1] is for Rscript arguments
that need shell quoting (e.g. Rscript -e "<expr>"), and
rscript_sh[2] is for the whole Rscript ... call.
Add makeClusterSequential(), available for R (>= 4.4.0).
Starting with R 4.5.0 (currently R-devel), one can use
parallel::makeCluster(n, type = parallelly::PSOCK) as an
alternative to parallelly::makeClusterPSOCK(n). Similarly, type = parallelly::MPI creates a cluster using
parallelly::makeClusterMPI(), and type = parallelly::SEQ
creates a cluster using parallelly::makeClusterSequential().
Add serializedSize() for calculating the size of an object by
counting the number of bytes required to serialize it.
R_PARALLELLY_MAXWORKERS_LOCALHOST was
interpreted as integers rather than doubles.
makeClusterPSOCK(nworkers) gained protection against setting up
too many localhost workers relative to the number of available CPU
cores. If nworkers / availableCores() is greater than 1.0 (100%),
then a warning is produced. If greater than 3.0 (300%), an error is
produced. These limits can be configured by R option
parallelly.maxWorkers.localhost. These checks are skipped if
nworkers inherits from AsIs, e.g. makeClusterPSOCK(I(16)).
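The thresholds described above can be sketched as a small base-R classifier. The function check_localhost_workers() and its limits argument are hypothetical and only mirror the 1.0 and 3.0 ratios quoted in the text; how parallelly implements the check internally is not shown here.

```r
# Hypothetical sketch of the localhost-worker check. 'limits' holds the
# warning and error thresholds as ratios of workers to available cores.
check_localhost_workers <- function(nworkers, ncores, limits = c(1.0, 3.0)) {
  # Values wrapped in I() inherit from "AsIs" and bypass the check
  if (inherits(nworkers, "AsIs")) return("skipped")

  ratio <- nworkers / ncores
  if (ratio > limits[2]) {
    "error"
  } else if (ratio > limits[1]) {
    "warning"
  } else {
    "ok"
  }
}

check_localhost_workers(4, ncores = 4)      # 100%: "ok"
check_localhost_workers(8, ncores = 4)      # 200%: "warning"
check_localhost_workers(16, ncores = 4)     # 400%: "error"
check_localhost_workers(I(16), ncores = 4)  # "skipped"
```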
The current 3.0 (300%) limit is likely to be decreased in a future
release. A few packages fail R CMD check --as-cran with this
validation enabled. For example, one package uses 8 parallel
workers in its examples, while R CMD check --as-cran only allows
for two. To give such packages time to be fixed, the CRAN-enforced
limits are ignored for now.
makeClusterPSOCK() could produce a confusing error "Invalid port: NA" if a non-available port was requested. Now the error message
is more informative, e.g.
"Argument 'port' specifies non-available port(s): 80".
isNodeAlive() and killNode() now support also worker processes
that run on remote machines. They do this by connecting to the
remote machine using the same method used to launch the worker,
which is typically SSH, and do their R calls that way.
isNodeAlive() and killNode() gained argument timeout for
controlling the maximum time, in seconds, before giving up and
returning NA.
Add cloneNode(), which can be used to "restart" RichSOCKnode
cluster nodes.
Argument worker for makeNodePSOCK() now takes the optional,
logical attribute localhost to manually specify that the worker
is a localhost worker.
Add print() for RichSOCKnode, which gives more details than
print() for SOCKnode.
print() for RichSOCKnode and RichSOCKcluster reports on nodes
with broken connections.
Add as.cluster() for RichSOCKnode, which returns a
RichSOCKcluster.
Introduce R option parallelly.supportsMulticore.disableOn to
control where multicore processing is disabled by default.
Calling killNode() on a RichSOCKnode node could theoretically
kill a process on the current machine with the same process ID
(PID), although the parallel worker (node) is running on another
machine.
isNodeAlive() on a RichSOCKnode node could theoretically
return TRUE because there was a process with the same process ID
(PID) on the current machine, although the parallel worker (node)
is running on another machine.
isLocalHost() for SOCK0node was not declared an S3 method.
freePort() defaults to default = NA_integer_, so that
NA_integer_ is returned when no free port could be found.
However, in R (< 4.0.0), which does not support port querying, we
use default = "random".
help("makeClusterPSOCK") that rscript_sh = "cmd" is
needed if the remote machines run MS Windows.
makeClusterPSOCK(..., verbose = TRUE) would not show verbose
output. One still had to set option parallelly.debug to TRUE.
availableWorkers() could produce false sanity-check warnings on
mismatching 'PE_HOSTFILE' content and 'NSLOTS' for certain SGE-cluster
configurations.
Add support for availableWorkers(constraints = "connections"),
which limits the number of workers that can be used to the
current number of free R connections according to
freeConnections(). This is the maximum number of PSOCK, SOCK,
and MPI parallel cluster nodes we can open without running out
of available R connections.
availableCores() would produce a warning "In is.na(constraints) : is.na() applied to non-(list or vector) of type 'NULL'" when
running with R (< 4.0.0).
availableWorkers() did not acknowledge the "cgroups2.cpu.max"
and "Bioconductor" methods added to availableCores() in
parallelly 1.33.0 (2022-12-13). It also did not acknowledge
methods "cgroups.cpuset" and "cgroups.cpuquota" added in
parallelly 1.31.0 (2022-04-07), and "nproc" added in
parallelly 1.26.1 (2021-06-29).
When makeClusterPSOCK() failed to connect to all parallel workers
within the connectTimeout time limit, it could either produce "Error in sprintf(ngettext(failed, "Cluster setup failed (connectTimeout=%.1f seconds). %d worker of %d failed to connect.", : invalid format '%d'; use format %f, %e, %g or %a for numeric objects" instead of an informative error message, or an
error message with incorrect information.
Add killNode() to terminate cluster nodes via process signaling.
Currently, this is only supported for parallel workers on the local
machine, and only those created by makeClusterPSOCK().
makeClusterPSOCK() and the like now assert that the running R session
has enough permissions on the operating system to do system calls
such as system2("Rscript --version"). If not, an informative
error message is produced.
On Unix, availableCores() queries also control groups v2 (cgroups
v2) field cpu.max for a possible CPU quota allocation. If a CPU
quota is set, then the number of CPUs is rounded to the nearest
integer, unless it is less than 0.5, in which case it is rounded up
to a single CPU. An example, where cgroups CPU quotas can be set to
limit the total CPU load, is with Linux containers, e.g. docker run --cpus=3.5 ....
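The rounding rule above can be sketched in base R. The helper cpu_max_to_cores() is hypothetical; it assumes the cgroups v2 cpu.max content has the form "<quota> <period>", with the literal "max" as quota meaning that no limit is set, and it is not the code used by availableCores().

```r
# Hypothetical helper: convert the content of a cgroups v2 'cpu.max'
# file into a number of CPU cores, following the rounding rule above.
cpu_max_to_cores <- function(line) {
  fields <- strsplit(trimws(line), "[[:space:]]+")[[1]]

  # "max <period>" means no CPU quota is set
  if (length(fields) != 2L || fields[1] == "max") return(NA_integer_)

  quota <- as.numeric(fields[1])
  period <- as.numeric(fields[2])

  # Round to the nearest integer, but never go below a single CPU
  ncores <- round(quota / period)
  if (ncores < 1) ncores <- 1
  as.integer(ncores)
}

cpu_max_to_cores("350000 100000")  # quota of 3.5 CPUs
cpu_max_to_cores("30000 100000")   # quota of 0.3 CPUs, rounded up to 1
cpu_max_to_cores("max 100000")     # no quota: NA
```

Note that round() in R rounds halves to the even neighbor, so exact ties such as 2.5 may round down; the sketch does not attempt to reproduce how the package treats such ties.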
Add support for availableCores(methods = "connections"), which
returns the current number of free R connections per
freeConnections(). This is the maximum number of PSOCK, SOCK,
and MPI parallel cluster nodes we can open without running out
of available R connections. A convenient way to use this and all
other methods is availableCores(constraints = "connections").
Now availableCores() recognizes environment variable
IS_BIOC_BUILD_MACHINE, which is set to true by the Bioconductor
(>= 3.16) check servers. If true, then a maximum of four (4) cores
is returned. This new environment variable replaces legacy
variable BBS_HOME used in Bioconductor (<= 3.15).
availableCores() splits up method "BiocParallel" into two;
"BiocParallel" and "Bioconductor". The former queries
environment variable BIOCPARALLEL_WORKER_NUMBER and the latter
IS_BIOC_BUILD_MACHINE. This means availableCores(which = "all") now reports on both.
isNodeAlive() will now produce a once-per-session informative
warning when it detects that it is not possible to check whether
another process is alive on the current machine.
Add section to help("makeClusterPSOCK", package = "parallelly")
explaining why R CMD check may produce "checking for detritus in
the temp directory ... NOTE" and how to avoid them.
Add section 'For package developers' to help("makeClusterPSOCK", package = "parallelly") reminding us that we need to stop all
clusters we created in package examples, tests, and vignettes.
isNodeAlive() failed to record which method works for testing if a
process exists or not, which meant it would keep trying all methods
each time. Similarly, if none works, it would still keep trying
each time instead of returning NA immediately. On some systems,
failing to check whether a process exists could result in one or
more warnings, in which case those warnings would be produced for
each call to isNodeAlive().
The host element of the SOCK0node or SOCKnode objects created
by makeClusterPSOCK() lost attribute localhost for localhost
workers. This made some error messages from the future package
less informative.
The default of argument revtunnel of makeNodePSOCK(), and
therefore also of makeClusterPSOCK(), is now NA, which means
it's agnostic to whether rshcmd[1] specifies an SSH client, or not.
If SSH is used, then it will resolve to revtunnel = TRUE,
otherwise to revtunnel = FALSE. This removed the need for
setting revtunnel = FALSE, when non-SSH clients are used.
availableCores() and availableWorkers() gained support for the
'Fujitsu Technical Computing Suite' job scheduler. Specifically,
they acknowledge environment variables PJM_VNODE_CORE,
PJM_PROC_BY_NODE, and PJM_O_NODEINF. See
help("makeClusterPSOCK", package = "parallelly") for an example.
makeClusterPSOCK() would fail with "Error: node$session_info$process$pid == pid is not TRUE" when running R in
Simplified Chinese (LANGUAGE=zh_CN), Traditional Chinese (Taiwan)
(LANGUAGE=zh_TW), or Korean (LANGUAGE=ko) locales.
Some warnings and errors showed the wrong call.
Changes to option parallelly.availableCores.system would be
ignored if done after the first call to availableCores().
availableCores() with option parallelly.availableCores.system
set to less than parallel::detectCores() would produce a warning,
e.g. "[INTERNAL]: Will ignore the cgroups CPU set, because it
contains one or more CPU indices that is out of range [0,0]: 0-7".
freePort() to
"random", which used to be "first". The main reason for this
is to make sure the default behavior is to return a random port
also on R (< 4.0.0) where we cannot test whether or not a port is
available.
On Unix, availableCores() now queries also control groups
(cgroups) fields cpu.cfs_quota_us and cpu.cfs_period_us, for a
possible CPU quota allocation. If a CPU quota is set, then the
number of CPUs is rounded to the nearest integer, unless it is less
than 0.5, in which case it is rounded up to a single CPU. An example,
where cgroups CPU quotas can be set to limit the total CPU load, is
with Linux containers, e.g. docker run --cpus=3.5 ....
In addition to cgroups CPU quotas, availableCores() also queries
cgroups for a possible CPU affinity, which is available in field
cpuset.set. This should give the same result as what the already
existing 'nproc' method gives. However, not all systems have the
nproc tool installed, in which case this new approach should
work. Some high-performance compute (HPC) environments set the CPU
affinity so that jobs do not overuse the CPUs. It may also be set
by Linux containers, e.g. docker run --cpuset-cpus=0-2,8 ....
The minimum value returned by availableCores() is one (1). This
can be overridden by new option parallelly.availableCores.min.
This can be used to test parallelization methods on single-core
machines, e.g. options(parallelly.availableCores.min = 2L).
The 'nproc' result for availableCores() was ignored if nproc > 9.
availableCores() would return the 'fallback' value when only
'system' and 'nproc' information was available. However, in this
case, we do want it to return 'nproc' when 'nproc' != 'system',
because that is a strong indication that the number of CPU cores is
limited by control groups (cgroups) on Linux. If 'nproc' ==
'system', we cannot tell whether cgroups is enabled or not, which
means we will fall back to the 'fallback' value if there is no
other evidence that another number of cores are available to the
current R process.
Technically, canPortBeUsed() could falsely return FALSE if the
port check was interrupted by, say, a user interrupt.
freePort(ports, default = "random") would always return
ports[1] if the system does not allow testing if a port is
available or not, or if none of the specified ports are available.
makeNodePSOCK(), and therefore also makeClusterPSOCK(), gained
argument rscript_sh, which controls how Rscript arguments are
shell quoted. The default is to make a best guess on what type of
shell is used where each cluster node is launched. If launched
locally, then it is whatever platform the current R session is
running on, i.e. either a POSIX shell ("sh") or MS Windows
("cmd"). If remotely, then the assumption is that a POSIX shell
("sh") is used.
makeNodePSOCK(), and therefore also makeClusterPSOCK(), gained
argument default_packages, which controls the default set of R
packages to be attached on each cluster node at startup. Moreover,
if argument rscript specifies an 'Rscript' executable, then
argument default_packages is used to populate Rscript
command-line option --default-packages=.... If rscript
specifies something else, e.g. an 'R' or 'Rterm' executable, then
environment variable R_DEFAULT_PACKAGES=... is set accordingly
when launching each cluster node.
Argument rscript_args of makeClusterPSOCK() now supports "*"
values. When used, the corresponding element will be replaced with
the internally added Rscript command-line options. If not
specified, such options are appended at the end.
makeClusterPSOCK() did not support backslashes (\) in
rscript_libs, backslashes that may originate from, for example,
Windows network drives. The result was that the worker would
silently ignore any rscript_libs components with backslashes.
The package detects when R CMD check runs and adjusts default
settings via environment variables in order to play nicer with the
machine where the checks are running. Some of these environment
variables were in this case ignored since parallelly 1.26.0.
makeClusterPSOCK() launches parallel workers with option
socketOptions set to "no-delay" by default. This decreases the
communication latency between workers and the main R session,
significantly so on Unix. This option requires R (>= 4.1.0) and
has no effect in earlier versions of R.
Added argument socketOptions to makeClusterPSOCK(), which sets
the corresponding R option on each cluster node when they are
launched.
Argument rscript_envs of makeClusterPSOCK() can also be used to
unset environment variables on cluster nodes. Any named element with
value NA_character_ will be unset.
Argument rscript of makeClusterPSOCK() now supports "*"
values. When used, the corresponding element will be replaced with
"Rscript", or if homogeneous = TRUE, then the absolute path to the
current 'Rscript'.
makeClusterPSOCK() example on how to launch workers
distributed across multiple CPU Groups on MS Windows 10.
isForkedChild() would only return TRUE in a forked child process,
if and only if, it had already been called in the parent R process.
Using argument rscript_startup would cause makeClusterPSOCK()
to fail in R-devel (>= r80666).
example("isNodeAlive") now uses \donttest{} to avoid long (> 10
s) elapsed run times on MS Windows.
Add isNodeAlive() to check whether a cluster and cluster nodes
are alive or not.
Add isForkedChild() to check whether or not the current R process
is a forked child process.
Environment variable R_PARALLELLY_SUPPORTSMULTICORE_UNSTABLE was
incorrectly parsed as a logical instead of a character string. If
the variable was set to, say, "quiet", this would cause an error
when the package was loaded.
makeClusterPSOCK() failed to fall back to setup_strategy = "sequential", when not supported by the current R version.
availableCores() and availableWorkers() now respect
environment variable BIOCPARALLEL_WORKER_NUMBER introduced in
BiocParallel (>= 1.27.2). They also respect BBS_HOME which is
set on the Bioconductor check servers to limit the number of
parallel workers while checking Bioconductor packages.
makeClusterPSOCK() and parallel::makeCluster() failed with
error "Cluster setup failed. ..." when using setup_strategy = "parallel" and when
the tcltk package is loaded when running R (>= 4.0.0 && <=
4.1.0) on macOS. Now parallelly forces setup_strategy = "sequential" when the tcltk package is loaded on these R
versions.
makeClusterPSOCK(..., setup_strategy = "parallel") would forget
to close a socket connection used to set up the workers. This
socket connection would be closed by the garbage collector
eventually with a warning.
parallelly::makeClusterPSOCK() would fail with "Error in
freePort(port) : Unknown value on argument 'port': 'auto'" if
environment variable R_PARALLEL_PORT was set to a port number.
parallelly::availableCores() would produce 'Error in if (grepl("^
[1-9]$", res)) return(as.integer(res)) : argument is of length
zero' on Linux systems without nproc installed.
print() on RichSOCKcluster mentions when the cluster is
registered to be automatically stopped by the garbage collector.
setup_strategy = "parallel" when using
makeClusterPSOCK() or parallel::makeCluster(). The symptom is
that they, after a long wait, result in "Error in
makeClusterPSOCK(workers, ...) : Cluster setup failed. ...". setup_strategy = "sequential" for
parallelly and parallel when running in the RStudio
Console. If you wish to override this behavior, you can always set
option parallelly.makeNodePSOCK.setup_strategy to "parallel",
e.g. in your ~/.Rprofile file. Alternatively, you can set the
environment variable
R_PARALLELLY_MAKENODEPSOCK_SETUP_STRATEGY=parallel, e.g. in your
~/.Renviron file.
On systems with nproc installed, availableCores() would be
limited by environment variables OMP_NUM_THREADS and
OMP_THREAD_LIMIT, if set. For example, on conservative systems
that set OMP_NUM_THREADS=1 as the default, availableCores()
would pick this up via nproc and return 1. This was not the
intended behavior. Now those environment variables are temporarily
unset before querying nproc.
R_PARALLELLY_* (and R_FUTURE_*) environment variables are now
only read when the parallelly package is loaded, where they set
the corresponding parallelly.* option. Previously, some of these
environment variables were queried by different functions as a
fallback to when an option was not set. By only parsing them when
the package is loaded, it decreases the overhead in functions, and
it clarifies that options can be changed at runtime whereas
environment variables should only be set at startup.
makeClusterPSOCK() now supports setting up cluster nodes in
parallel similarly to how parallel::makePSOCKcluster() does it.
This significantly reduces the setup turnaround time. This is only
supported in R (>= 4.0.0). To revert to the sequential setup
strategy, set R option parallelly.makeNodePSOCK.setup_strategy to
"sequential".
Add freePort() to get a random TCP port that can be opened.
R option parallelly.availableCores.fallback and environment
variable R_PARALLELLY_AVAILABLECORES_FALLBACK were ignored since
parallelly 1.22.0, when support for 'nproc' was added to
availableCores().
ssh
client. This means that regardless of whether you are on Linux,
macOS, or Windows 10, setting up parallel workers on external
machines over SSH finally works out of the box without having to
install PuTTY or other SSH clients. This was possible because a
workaround was found for a Windows 10 bug preventing us from using
reverse tunneling over SSH. It turns out the bug reveals itself
when using hostname 'localhost' but not '127.0.0.1', so we use the
latter.
availableCores() gained argument omit to make it easier to put
aside zero or more cores from being used in parallel processing.
For example, on a system with four cores, availableCores(omit = 1) returns 3. Importantly, since availableCores() is guaranteed
to always return a positive integer, availableCores(omit = 4) == 1, even on systems with four or fewer cores. Using
availableCores() - 4 on such systems would return a non-positive
value, which would give an error downstream.
makeClusterPSOCK(), or actually makeNodePSOCK(), did not accept
all types of environment variable names when using rscript_envs,
e.g. it would give an error if we tried to pass
_R_CLASS_MATRIX_ARRAY_.
makeClusterPSOCK() had a "length > 1 in coercion to logical" bug
that could affect especially MS Windows 10 users.
plink of the PuTTY software, (ii) ssh in the
RStudio distribution, and (iii) ssh of Windows 10. Previously,
the latter was considered first but that still has a bug preventing
us from using reverse tunneling.
makeClusterPSOCK(), or actually makeNodePSOCK(), gained
argument quiet, which can be used to silence output produced by
manual = TRUE.
c() for cluster objects now warns about duplicated cluster
nodes.
Add isForkedNode() to test if a cluster node runs in a forked
process.
Add isLocalhostNode() to test if a cluster node runs on the
current machine.
Now availableCores() and availableWorkers() avoid recursive
calls to the custom function given by options
parallelly.availableCores.custom and
parallelly.availableWorkers.custom, respectively.
availableWorkers() now recognizes the Slurm environment variable
SLURM_JOB_NODELIST, e.g. "dev1,n[3-4,095-120]". It will use
scontrol show hostnames "$SLURM_JOB_NODELIST" to expand it, if
supported on the current machine, otherwise it will attempt to
parse and expand the nodelist specification using R. If either of
environment variable SLURM_JOB_CPUS_PER_NODE or
SLURM_TASKS_PER_NODE is set, then each node in the nodelist will
be represented that number of times. If, in addition, environment
variable SLURM_CPUS_PER_TASK (always a scalar) is set, then that is
also respected.
parallelly. prefix for options and the
R_PARALLELLY_ prefix for environment variables. Settings that
use the corresponding future. and R_FUTURE_ prefixes are still
recognized.
availableCores() did not respect environment variable
SLURM_TASKS_PER_NODE when the job was allocated more than one
node.
Above argument quiet was introduced in future 1.19.1 but was
mistakenly dropped from parallelly 1.20.0 when that was
released, and therefore also from future (>= 1.20.0).
availableCores(), availableWorkers(), and freeCores() gained
argument logical, which is passed down to
parallel::detectCores() as-is. The default is TRUE but it can be
changed by setting the R option
parallelly.availableCores.logical. This option can in turn be
set via environment variable R_PARALLELLY_AVAILABLECORES_LOGICAL
which is applied (only) when the package is loaded.
Now makeClusterPSOCK() asserts that there are enough free
connections available before attempting to create the parallel
workers. If too many workers are requested, an informative error
message is produced.
Add availableConnections() and freeConnections() to infer the
maximum number of connections that the current R installation can
have open at any time and how many of those are currently free to
be used. This limit is typically 128 but may be different in
custom R installations that are built from source.
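As a rough base-R illustration of what "free connections" means, the sketch below counts the connections currently registered in the session and subtracts them from an assumed limit. The value 128 is an assumption taken from the typical limit mentioned above, and the variable names are hypothetical; the package functions may infer the limit differently.

```r
# Assumption: the typical limit of 128; custom builds of R may differ
max_connections <- 128L

# All registered connections, including stdin, stdout, and stderr
in_use <- nrow(showConnections(all = TRUE))

# Rough estimate of how many more connections could be opened
free <- max_connections - in_use
print(c(in_use = in_use, free = free))
```

Since each PSOCK worker occupies one connection, an estimate like this bounds how many workers a single session can launch.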
Now availableCores() queries also Unix command nproc, if
available. This will make it respect the number of CPU/cores
limited by 'cgroups' and Linux containers.
PSOCK cluster workers are now set up to communicate using little
endian (useXDR = FALSE) instead of big endian (useXDR = TRUE).
Since most modern systems use little endian, useXDR = FALSE
speeds up the communication noticeably (10-15%) on those systems.
The default value of this argument can be controlled by the R
option parallelly.makeNodePSOCK.useXDR or the corresponding
environment variable R_PARALLELLY_MAKENODEPSOCK_USEXDR.
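The difference between the two byte orders can be seen with base R's serialize(), whose xdr argument selects big-endian (XDR) or native encoding. This is only an illustration of the encoding choice, under the assumption that useXDR maps onto the same serialization switch for cluster connections; it does not measure the speedup quoted above.

```r
x <- list(a = 1:10, b = c(1.5, 2.5), c = "hello")

# xdr = TRUE writes big-endian (XDR) bytes; xdr = FALSE writes the
# machine's native byte order, which avoids byte swapping on
# little-endian systems
raw_xdr <- serialize(x, connection = NULL, xdr = TRUE)
raw_native <- serialize(x, connection = NULL, xdr = FALSE)

# Both encodings round-trip to the same object
ok_xdr <- identical(unserialize(raw_xdr), x)
ok_native <- identical(unserialize(raw_native), x)
print(c(endian = .Platform$endian, ok_xdr = ok_xdr, ok_native = ok_native))
```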
Add cpuLoad() for querying the "average" system load on Unix-like
systems.
Add freeCores() for estimating the average number of unused cores
based on the average system load as given by cpuLoad().
R_FUTURE_AVAILABLECORES_FALLBACK
and R_FUTURE_AVAILABLECORES_SYSTEM, none of the R_PARALLELLY_*
and R_FUTURE_* ones were recognized.
find_rshcmd() which was never meant to be exported.
makeClusterPSOCK() gained argument validate to control whether
or not the nodes should be tested after they've been created. The
validation is done by querying each node for its session
information, which is then saved as attribute session_info on the
cluster node object. This information is also used in error
messages, if available. This validation has been done since
version 1.5.0 but now it can be disabled. The default of argument
validate can be controlled via an R option and an environment
variable.
Now makeNodePSOCK(..., rscript_envs = "UNKNOWN") produces an
informative warning on non-existing environment variables that were
skipped.
makeClusterPSOCK() would produce an error on 'one node produced
an error: could not find function "getOptionOrEnvVar"' if
parallelly is not available on the node.
makeClusterPSOCK() would attempt to load parallelly on the
worker. If it's not available on the worker, it would result in a
silent warning on the worker. Now parallelly is not loaded.
makeClusterPSOCK(..., tries = n) would retry setting up a cluster
node also on errors that were unrelated to node setup or node
connection errors.
The error message on using an invalid rscript_envs argument for
makeClusterPSOCK() reported on the value of rscript_libs
(sic!).
makeNodePSOCK(..., rscript_envs = "UNKNOWN") would result in an
error when trying to launch the cluster node.
find_rshcmd() which was never meant to be exported.
availableCores(), and availableWorkers(),
supportsMulticore(), as.cluster(), autoStopCluster(),
makeClusterMPI(), makeClusterPSOCK(), and makeNodePSOCK()
from the future package.
isConnectionValid() and connectionId() adopted from
internal code of the future package.
Renamed environment variable R_FUTURE_MAKENODEPSOCK_tries used by
makeClusterPSOCK() to R_FUTURE_MAKENODEPSOCK_TRIES.
connectionId() did not return -1L on Solaris for connections
with internal 'nil' pointers because they were reported as '0' -
not 'nil' or '0x0'.
Now availableCores() better supports Slurm. Specifically, if
environment variable SLURM_CPUS_PER_TASK is not set (it is set only
when option --cpus-per-task=n is specified) and
SLURM_JOB_NUM_NODES=1, then it falls back to using
SLURM_CPUS_ON_NODE, e.g. when using --ntasks=n.
Now availableCores() and availableWorkers() support
LSF/OpenLava. Specifically, they acknowledge environment variables
LSB_DJOB_NUMPROC and LSB_HOSTS, respectively.
makeClusterPSOCK() will now retry to create a cluster node up to
tries (default: 3) times before giving up. If argument port
specifies more than one port (e.g. port = "random") then it will
also attempt to find a valid random port up to tries times before
giving up. The pre-validation of the random port is only supported
in R (>= 4.0.0) and skipped otherwise.
makeClusterPSOCK() skips shell quoting of the elements in
rscript if it inherits from AsIs.
makeClusterPSOCK(), or actually makeNodePSOCK(), gained
argument quiet, which can be used to silence output produced by
manual = TRUE.
plan(multisession), plan(cluster, workers = <number>), and
makeClusterPSOCK(), which they both use internally, set up
localhost workers twice as fast compared to versions since
future 1.12.0, which brings it back on par with a bare-bone
parallel::makeCluster(..., setup_strategy = "sequential") setup.
The slowdown was introduced in future 1.12.0 (2019-03-07) when
protection against leaving stray R processes behind from failed
worker startup was implemented. This protection now makes use of
memoization for speedup.
print() on RichSOCKcluster gives information not only on the
name of the host but also on the version of R and the platform of
each node ("worker"), e.g. "Socket cluster with 3 nodes where 2
nodes are on host 'localhost' (R version 4.0.0 (2020-04-24),
platform x86_64-w64-mingw32), 1 node is on host 'n3' (R version
3.6.3 (2020-02-29), platform x86_64-pc-linux-gnu)".
It is now possible to set environment variables on workers before
they are launched by makeClusterPSOCK() by specifying them as
<name>=<value> as part of the rscript vector argument,
e.g. rscript=c("ABC=123", "DEF='hello world'", "Rscript"). This
works because elements in rscript that match regular expression
"^[[:alpha:]_][[:alnum:]_]*=.*" are no longer shell quoted.
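The selective quoting can be sketched in base R as follows. This is an illustration only: it assumes the pattern is anchored directly at the start of each element (no space after "^"), and the variable names are hypothetical rather than taken from the package.

```r
rscript <- c("ABC=123", "DEF='hello world'", "Rscript")

# Elements of the form <name>=<value> are treated as environment variable
# assignments; the name must start with a letter or an underscore
is_env <- grepl("^[[:alpha:]_][[:alnum:]_]*=.*", rscript)

# Shell quote only the elements that are not assignments
cmd <- ifelse(is_env, rscript, shQuote(rscript))
print(is_env)
cat(paste(cmd, collapse = " "), "\n")
```

Leaving the assignments unquoted lets the shell interpret them as variable settings that precede the Rscript command.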
makeClusterPSOCK() now returns a cluster that in addition to
inheriting from SOCKcluster it will also inherit from
RichSOCKcluster.
Made makeClusterPSOCK() and makeNodePSOCK() agile to the name
change from parallel:::.slaveRSOCK() to parallel:::.workRSOCK()
in R (>= 4.1.0).
makeClusterPSOCK(..., rscript) will not try to locate
rscript[1] if argument homogeneous is FALSE (or inferred to be
FALSE).
makeClusterPSOCK(..., rscript_envs) would result in a syntax
error when starting the workers due to non-ASCII quotation marks if
option useFancyQuotes was not set to FALSE.
makeClusterPSOCK() gained argument rscript_envs for setting
environment variables in workers on startup, e.g. rscript_envs =
c(FOO = "3.14", "BAR").
_R_CHECK_LIMIT_CORES_ set. To better
emulate CRAN submission checks, the future package will, when
loaded, set this environment variable to TRUE if unset and if R
CMD check is running. Note that future::availableCores()
respects _R_CHECK_LIMIT_CORES_ and returns at most 2L (two
cores) if detected.
makeClusterPSOCK() draws a random
port from (when argument port is not specified) can now be
controlled by environment variable R_FUTURE_RANDOM_PORTS. The
default range is still 11000:11999 as with the parallel
package.
?makeClusterPSOCK with
instructions on how to troubleshoot when the setup of local and
remote clusters fails.
makeClusterPSOCK() could produce warnings like "cannot open file
'/tmp/alice/Rtmpi69yYF/future.parent=2622.a3e32bc6af7.pid': No such
file", e.g. when launching R workers running in Docker containers.
makeClusterMPI() did not work for MPI clusters with 'comm' other
than '1'.
Now availableCores() also recognizes PBS environment variable
NCPUS, because the PBSPro scheduler does not set PBS_NUM_PPN.
If option future.availableCores.custom is set to a function,
then availableCores() will call that function and interpret its
value as the number of cores. Analogously, option
future.availableWorkers.custom can be used to specify the hostnames
of a set of workers that availableWorkers() sees. These new
options provide a mechanism for anyone to customize
availableCores() and availableWorkers() in case they do not
(yet) recognize, say, environment variables that are specific to the
user's compute environment or HPC scheduler.
makeClusterPSOCK() gained support for argument rscript_startup
for evaluating one or more R expressions in the background R worker
prior to the worker event loop launching. This provides a more
convenient approach than having to use, say, rscript_args = c("-e", sQuote(code)).
makeClusterPSOCK() gained support for argument rscript_libs to
control the R package library search path on the workers. For
example, to prepend the folder ~/R-libs on the workers, use
rscript_libs = c("~/R-libs", "*"), where "*" will be resolved
to the current .libPaths() on the workers.
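How the "*" placeholder could be resolved is sketched below in base R. resolve_libs() is a hypothetical helper written for illustration, not the package's implementation, and the library path passed as current is a made-up example standing in for the worker's .libPaths().

```r
# Hypothetical helper: replace each "*" entry with the current library paths
resolve_libs <- function(rscript_libs, current = .libPaths()) {
  unlist(lapply(rscript_libs, function(path) {
    if (identical(path, "*")) current else path
  }))
}

# Prepend ~/R-libs to an example library path
libs <- resolve_libs(c("~/R-libs", "*"), current = "/usr/lib/R/library")
print(libs)
```

Placing "*" last prepends the custom folder; placing it first would append the folder instead.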
makeClusterPSOCK() did not shell quote the Rscript executable
when running its pre-tests checking whether localhost Rscript
processes can be killed by their PIDs or not.
makeClusterPSOCK() fails to create one of many nodes, then it
will attempt to stop any nodes that were successfully created.
This lowers the risk for leaving R worker processes behind.
makeClusterPSOCK() in future (>= 1.11.1) produced warnings
when argument rscript had length(rscript) > 1.
makeClusterPSOCK() fails to connect to a worker, it produces
an error with detailed information on what could have happened. In
rare cases, another error could be produced when generating the
information on what the worker's PID is.
makeClusterPSOCK() and
makeNodePSOCK() can now be controlled via environment variables
in addition to R options that were supported in the past. An
advantage of using environment variables is that they will be
inherited by child processes, also nested ones.
R CMD check is running or not. If it is, then a few future-specific
environment variables are adjusted such that the tests play nice
with the testing environment. For instance, it sets the socket
connection timeout for PSOCK cluster workers to 120 seconds
(instead of the default 30 days!). This will lower the risk for
more and more zombie worker processes cluttering up the test
machine (e.g. CRAN servers) in case a worker process is left behind
despite the main R process being terminated. Note that these
adjustments are applied automatically to the checks of any package
that depends on, or imports, the future package.
makeClusterPSOCK() would fail to connect to a worker,
for instance due to a port clash, then it would leave the R worker
process running - also after the main R process terminated. When
the worker is running on the same machine, makeClusterPSOCK()
will now attempt to kill such stray R processes. Note that
parallel::makePSOCKcluster() still has this problem.
makeClusterPSOCK() produces more informative error messages
whenever the setup of R workers fails. Also, its verbose messages
are now prefixed with "[local output] " to help distinguish the
output produced by the current R session from that produced by
background workers.
It is now possible to specify what type of SSH clients
makeClusterPSOCK() automatically searches for and in what order,
e.g. rshcmd = c("<rstudio-ssh>", "<putty-plink>").
Now makeClusterPSOCK() preserves the global RNG state
(.Random.seed) also when it draws a random port number.
makeClusterPSOCK() gained argument rshlogfile.
makeClusterPSOCK(..., rscript = "my_r") would in some cases fail
to find the intended my_r executable.
Add makeClusterMPI(n) for creating MPI-based clusters of a
similar kind as parallel::makeCluster(n, type = "MPI") but that
also attempts to work around issues where parallel::stopCluster()
causes R to stall.
makeClusterPSOCK() and makeClusterMPI() gained argument
autoStop for controlling whether the cluster should be
automatically stopped when garbage collected or not.
makeClusterPSOCK() produced a warning when environment variable
R_PARALLEL_PORT was set to random (e.g. as on CRAN).
makeClusterPSOCK() now produces a more informative warning if
environment variable R_PARALLEL_PORT specifies a non-numeric
port.
makeClusterPSOCK(), and therefore
plan(multisession) and plan(multiprocess), will use the SSH
client distributed with RStudio as a fallback if neither ssh nor
plink is available on the system PATH.
makeClusterPSOCK(..., renice = 19) would launch each PSOCK worker
via nice +19 resulting in the error "nice: '+19': No such file or
directory". This bug was inherited from
parallel::makePSOCKcluster(). Now using nice --adjustment=19
instead.
makeClusterPSOCK() now defaults to use the Windows PuTTY
software's SSH client plink -ssh, if ssh is not found.
Argument homogeneous of makeNodePSOCK(), a helper function of
makeClusterPSOCK(), will default to FALSE also if the hostname is
a fully qualified domain name (FQDN), that is, it "contains
periods". For instance, c('node1', 'node2.server.org') will use
homogeneous = TRUE for the first worker and homogeneous = FALSE
for the second.
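The "contains periods" rule can be sketched in base R as below. This covers only the FQDN check described above; the real default may depend on further conditions, and the variable names are hypothetical.

```r
workers <- c("node1", "node2.server.org")

# A hostname is treated as a fully qualified domain name if it has a period
is_fqdn <- grepl(".", workers, fixed = TRUE)

# Sketch of the resulting per-worker default for argument homogeneous
homogeneous <- !is_fqdn
print(data.frame(worker = workers, homogeneous = homogeneous))
```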
makeClusterPSOCK() now asserts that each cluster node is
functioning by retrieving and recording the node's session
information including the process ID of the corresponding R
process.
makeClusterPSOCK() gained more detailed descriptions on
arguments and what their defaults are.
connectTimeout and timeout of
makeNodePSOCK() can now be controlled via global options.
availableCores(method = "mc.cores") is now defunct in favor of
"mc.cores+1".
makeClusterPSOCK() treats workers that refer to a local machine
by its local or canonical hostname as "localhost". This avoids
having to launch such workers over SSH, which may not be supported
on all systems / compute clusters.
Added availableWorkers(). By default it returns localhost
workers according to availableCores(). In addition, it detects
common HPC allocations given in environment variables set by the
HPC scheduler.
Option future.availableCores.fallback, which defaults to
environment variable R_FUTURE_AVAILABLECORES_FALLBACK, can now be
used to specify the default number of cores / workers returned by
availableCores() and availableWorkers() when no other settings
are available. For instance, if
R_FUTURE_AVAILABLECORES_FALLBACK=1 is set system wide in an HPC
environment, then all R processes that use availableCores() to
detect how many cores can be used will run as single-core
processes. Without this fallback setting, and without other
core-specifying settings, the default will be to use all cores on
the machine, which does not play well on multi-user systems.
Creation of cluster futures (including multisession ones) would
time out already after 40 seconds if all workers were busy. New
default timeout is 30 days (option future.wait.timeout).
availableCores(methods = "_R_CHECK_LIMIT_CORES_") would give an
error if not running R CMD check.
Added makeClusterPSOCK() - a version of
parallel::makePSOCKcluster() that allows for more flexible
control of how PSOCK cluster workers are set up and how they are
launched and communicated with if running on external machines.
Added generic as.cluster() for coercing objects to cluster
objects to be used as in plan(cluster, workers = as.cluster(x)).
Also added a c() implementation for cluster objects such that
multiple cluster objects can be combined into a single one.
user to remote() was ignored (since 1.1.0).
workers = "localhost" they (again) use the exact same R executable as the
main / calling R session (in all other cases it uses whatever
Rscript is found on the PATH). This was already indeed
implemented in 1.0.1, but with the added support for reverse SSH
tunnels in 1.1.0 this default behavior was lost.
cluster() and
remote() to connect to remote clusters / machines. As long as
you can connect via SSH to those machines, it works also with these
futures. The new code completely avoids the incoming firewall and
incoming port forwarding setup previously needed. This is done by
using reverse SSH tunneling. There is also no need to worry about
internal or external IP numbers.
availableCores() also acknowledges environment variable
NSLOTS set by Sun/Oracle Grid Engine (SGE).
availableCores() returns 3L (=2L+1L) instead of 2L
if _R_CHECK_LIMIT_CORES_ is set.
availableCores() also acknowledges the number of CPUs
allotted by Slurm.
availableCores("mc.cores") returns getOption("mc.cores") + 1L,
because option mc.cores specifies "allowed number of additional
R processes" to be used in addition to the main R process.
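The arithmetic behind this entry is shown below in base R. It mirrors only the "+ 1L" rule stated above, using a temporary option value chosen for illustration.

```r
# Temporarily allow two additional R processes
old <- options(mc.cores = 2L)

# Total processes = additional processes + the main R process
n_total <- getOption("mc.cores") + 1L
print(n_total)  # 3

# Restore the previous option value
options(old)
```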