Package: pbdMPI 0.5-2
pbdMPI: R Interface to MPI for HPC Clusters (Programming with Big Data Project)
A simplified, efficient interface to MPI for HPC clusters. It is a derivation and rethinking of the Rmpi package. pbdMPI embraces the prevalent parallel programming style on HPC clusters. Beyond the interface, a collection of functions for global work with distributed data and for resource-independent RNG reproducibility is included. It is based on S4 classes and methods.
# Install 'pbdMPI' in R:
install.packages('pbdMPI', repos = 'https://cloud.r-project.org')
Bug tracker: https://github.com/snoweye/pbdmpi/issues (2 issues)
Last updated 7 months ago, from b986e9b7c2. Checks: 1 OK.
| Target | Result | Latest binary |
|---|---|---|
| Doc / Vignettes | OK | Mar 12 2025 |
Exports: .mpiopt_init, addr.mpi.comm.ptr, allgather, allreduce, anysource, anytag, arrange.mpi.apts, barrier, bcast, comm.abort, comm.accept, comm.all, comm.allcommon, comm.allcommon.integer, comm.allpairs, comm.any, comm.as.gbd, comm.balance.info, comm.c2f, comm.cat, comm.chunk, comm.connect, comm.disconnect, comm.dist, comm.dist.common, comm.dist.gbd, comm.dup, comm.end.seed, comm.free, comm.get.streams, comm.is.null, comm.length, comm.load.balance, comm.localrank, comm.match.arg, comm.max, comm.mean, comm.min, comm.pairwise, comm.pairwise.common, comm.pairwise.gbd, comm.print, comm.range, comm.rank, comm.read.csv, comm.read.csv2, comm.read.table, comm.reset.seed, comm.Rprof, comm.sd, comm.seed.state, comm.set.errhandler, comm.set.seed, comm.set.stream, comm.size, comm.sort, comm.sort.default, comm.sort.double, comm.sort.integer, comm.split, comm.stop, comm.stopifnot, comm.sum, comm.timer, comm.unload.balance, comm.var, comm.warning, comm.warnings, comm.which, comm.which.max, comm.which.min, comm.write, comm.write.csv, comm.write.csv2, comm.write.table, execmpi, finalize, gather, get.conf, get.jid, get.lib, get.mpi.comm.ptr, get.sourcetag, get.sysenv, info.c2f, info.create, info.free, info.set, init, intercomm.create, intercomm.merge, iprobe, irecv, is.comm.null, is.finalized, isend, pbd_opt, pbdApply, pbdLapply, pbdSapply, port.close, port.open, probe, recv, reduce, runmpi, scatter, send, sendrecv, sendrecv.replace, serv.lookup, serv.publish, serv.unpublish, spmd.allcheck.type, spmd.allgather.array, spmd.allgather.default, spmd.allgather.double, spmd.allgather.integer, spmd.allgather.object, spmd.allgather.raw, spmd.allgatherv.default, spmd.allgatherv.double, spmd.allgatherv.integer, spmd.allgatherv.raw, spmd.allreduce.array, spmd.allreduce.default, spmd.allreduce.double, spmd.allreduce.float, spmd.allreduce.float32, spmd.allreduce.integer, spmd.allreduce.logical, spmd.allreduce.object, spmd.alltoall.double, spmd.alltoall.integer, spmd.alltoall.raw, spmd.alltoallv.double, spmd.alltoallv.integer, spmd.alltoallv.raw, spmd.anysource, spmd.anytag, spmd.barrier, spmd.bcast.array, spmd.bcast.default, spmd.bcast.double, spmd.bcast.integer, spmd.bcast.message, spmd.bcast.object, spmd.bcast.raw, spmd.bcast.string, spmd.check.type.recv, spmd.check.type.send, spmd.comm.abort, spmd.comm.accept, spmd.comm.c2f, spmd.comm.cat, spmd.comm.connect, spmd.comm.disconnect, spmd.comm.dup, spmd.comm.free, spmd.comm.get.parent, spmd.comm.is.null, spmd.comm.localrank, spmd.comm.print, spmd.comm.rank, spmd.comm.set.errhandler, spmd.comm.size, spmd.comm.spawn, spmd.comm.split, SPMD.CT, SPMD.DT, spmd.finalize, spmd.gather.array, spmd.gather.default, spmd.gather.double, spmd.gather.integer, spmd.gather.object, spmd.gather.raw, spmd.gatherv.default, spmd.gatherv.double, spmd.gatherv.integer, spmd.gatherv.raw, spmd.get.count, spmd.get.processor.name, spmd.get.sourcetag, spmd.hostinfo, spmd.info.c2f, spmd.info.create, spmd.info.free, spmd.info.set, spmd.init, spmd.intercomm.create, spmd.intercomm.merge, SPMD.IO, spmd.iprobe, spmd.irecv.default, spmd.irecv.double, spmd.irecv.integer, spmd.irecv.raw, spmd.is.comm.null, spmd.is.finalized, spmd.is.manager, spmd.isend.default, spmd.isend.double, spmd.isend.integer, spmd.isend.raw, SPMD.OP, spmd.port.close, spmd.port.open, spmd.probe, spmd.recv.default, spmd.recv.double, spmd.recv.integer, spmd.recv.raw, spmd.reduce.array, spmd.reduce.default, spmd.reduce.double, spmd.reduce.float, spmd.reduce.float32, spmd.reduce.integer, spmd.reduce.logical, spmd.reduce.object, spmd.scatter.array, spmd.scatter.default, spmd.scatter.double, spmd.scatter.integer, spmd.scatter.object, spmd.scatter.raw, spmd.scatterv.default, spmd.scatterv.double, spmd.scatterv.integer, spmd.scatterv.raw, spmd.send.default, spmd.send.double, spmd.send.integer, spmd.send.raw, spmd.sendrecv.default, spmd.sendrecv.double, spmd.sendrecv.integer, spmd.sendrecv.raw, spmd.sendrecv.replace.default, spmd.sendrecv.replace.double, spmd.sendrecv.replace.integer, spmd.sendrecv.replace.raw, spmd.serv.lookup, spmd.serv.publish, spmd.serv.unpublish, SPMD.TP, spmd.wait, spmd.waitall, spmd.waitany, spmd.waitsome, task.pull, task.pull.manager, task.pull.workers, wait, waitall, waitany, waitsome
Dependencies: float
Citation
To cite pbdMPI in a publication use:
Chen W, Ostrouchov G, Schmidt D, Patel P, Yu H (2022). “pbdMPI: R Interface to MPI.” R Package, URL https://cran.r-project.org/package=pbdMPI.
Chen W, Ostrouchov G, Schmidt D, Patel P, Yu H (2012). A Quick Guide for the pbdMPI Package. R Vignette, URL https://cran.r-project.org/package=pbdMPI.
Corresponding BibTeX entries:
@Misc{Chen2022pbdMPIpackage,
  title = {{pbdMPI}: R Interface to {MPI}},
  author = {Wei-Chen Chen and George Ostrouchov and Drew Schmidt and Pragneshkumar Patel and Hao Yu},
  year = {2022},
  note = {{R} Package, URL https://cran.r-project.org/package=pbdMPI},
}

@Manual{Chen2012pbdMPIvignette,
  title = {A Quick Guide for the {pbdMPI} Package},
  author = {Wei-Chen Chen and George Ostrouchov and Drew Schmidt and Pragneshkumar Patel and Hao Yu},
  year = {2012},
  note = {{R} Vignette, URL https://cran.r-project.org/package=pbdMPI},
}
Readme and manuals
pbdMPI
This package provides a simplified, efficient interface to MPI for HPC clusters. This derivation and rethinking of the Rmpi package embraces the prevalent parallel programming style on HPC clusters. It is based on S4 classes and methods.
If you don't have access to an HPC cluster, consider applying for an allocation at an HPC facility in your country. For example: US ACCESS, US INCITE, EU PRACE, Australia NCI, Canada RAC, Czechia IT4I, India NSM (National Super Computing Mission), Japan HPCI. (Please notify us if you have more examples or updates from your country.) Applying for a startup allocation can be easier than most would expect, sometimes requiring as little as a paragraph describing your application and software. Large allocations require a full proposal.
With few exceptions, R does computations in memory. When data becomes too large to handle in the memory of a single node, or when more processors than those offered in commodity hardware are needed for a job, a typical strategy is to add more nodes. MPI, or the "Message Passing Interface", is the standard for managing multi-node computing. pbdMPI is a package that greatly simplifies the use of MPI from R.
In pbdMPI, we make extensive use of R's S4 system to simplify the interface. Instead of needing to specify the type (e.g., integer or double) of the data via function name (as in C implementations) or in an argument (as in Rmpi), you need only call the generic function on your data and we will always "do the right thing".
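For example, here is a minimal sketch (run with mpirun, as in the Usage section below) that calls the same allreduce() generic on integer data and on double data; the S4 dispatch handles the type in both cases:

suppressMessages(library(pbdMPI, quietly = TRUE))

x_int <- comm.rank()          # an integer, different on each rank
x_dbl <- comm.rank() + 0.5    # a double, different on each rank

comm.print(allreduce(x_int, op = "sum"))   # same generic, integer data
comm.print(allreduce(x_dbl, op = "sum"))   # same generic, double data

finalize()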
In pbdMPI, we write programs in the "Single Program/Multiple Data" or SPMD style, which is the prevalent style on HPC clusters. Contrary to the way much of the R world is acquainted with parallelism, there is no "manager". Each process (MPI rank) runs the same program as every other process, but operates on its own data or its own section of a global parameter space. This is arguably one of the simplest extensions of serial to massively parallel programming, and has been the standard way of doing things in the large-scale HPC community for decades. The "single program" can be viewed as a generalization of the serial program. A short sketch follows.
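As an illustrative sketch of the SPMD style (assuming comm.chunk(n, form = "vector") returns the index vector owned by the calling rank), every rank below runs the same script, sums only its own slice of 1:1000, and the partial sums are combined across ranks:

suppressMessages(library(pbdMPI, quietly = TRUE))

n <- 1000
my_idx <- comm.chunk(n, form = "vector")    # this rank's share of 1:n
local_sum <- sum(my_idx)                    # work only on the local piece
total <- allreduce(local_sum, op = "sum")   # combine partial sums across ranks

comm.print(total)   # 500500, printed once from rank 0
finalize()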
Installation
Installation with install.packages("pbdMPI") from CRAN or with remotes::install_github("RBigData/pbdMPI") from GitHub works on systems with MPI installed in a standard location. This is usually true on HPC Cluster Systems and also if you follow the Linux, MacOS, or Windows Notes below for MPI installation.
Usage
If you are comfortable with MPI concepts, you should find pbdMPI very agreeable and simple to use. Below is a basic "hello world" program:
# load the package and initialize MPI
suppressMessages(library(pbdMPI, quietly = TRUE))
# Hello world
message <- paste("Hello from rank", comm.rank(), "of", comm.size())
comm.print(message, all.rank = TRUE, quiet = TRUE)
# shut down the communicators and exit
finalize()
Save this as, say, mpi_hello_world.r and run it via:
mpirun -np 4 Rscript mpi_hello_world.r
The function comm.print() is a "sugar" function custom to pbdMPI that makes it simple to print in a distributed environment. The argument all.rank = TRUE specifies that all MPI ranks should print, and the quiet = TRUE argument tells each rank not to "announce" itself when it does its printing. This function and its companion comm.cat() automatically cooperate across the parallel instances of the single program to control printing.
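For instance, the following small sketch contrasts the defaults with the options used above (the rank.print argument is assumed here to select which single rank prints); run it the same way as the hello world script:

suppressMessages(library(pbdMPI, quietly = TRUE))

comm.print(comm.size())                   # default: printed once, by rank 0
comm.print(comm.rank(), rank.print = 1)   # printed by rank 1 only
comm.cat("rank", comm.rank(), "is ready\n",
         all.rank = TRUE, quiet = TRUE)   # every rank prints, without the rank banner

finalize()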
Numerous other examples can be found in the pbdMPI vignette as well as in the pbdDEMO package and its corresponding vignette. While these were written for version 0.3-0 of pbdMPI, they are still highly relevant.
HPC Cluster Systems Notes
HPC clusters are Linux systems and use Environment Modules to manage software. Consult your local cluster documentation, as specifics with respect to R and MPI can differ. Usually, an MPI version is installed and should work with the standard pbdMPI install, although sometimes a module load openmpi might be needed to get OpenMPI.
Some common module commands are:
module list # lists currently loaded software modules
module avail # lists available software modules
module load <module_name> # loads module <module_name>
Available R modules are typically loaded via module load r or module load R, possibly with directory and version information. On some systems, this needs to be preceded by selecting a programming environment, which may be gnu, pgi, etc., while on others loading R automatically selects the correct programming environment. Please consult your HPC cluster documentation.
Typically, software installations are done on login nodes, while parallel debugging and production runs are done on compute nodes.
A resource manager, usually Slurm, PBS, LSF, or SGE, is used to allocate compute nodes for a job. Consult your cluster documentation, as defaults tend to be site-specific.
Scripts are usually submitted as batch jobs, but interactive allocations are possible too. For batch submission, we recommend writing a shell script. Here we give a shell script example for Slurm and note that a translation table to other resource managers is available.
#!/bin/bash
#SBATCH -J <my_job>
#SBATCH -A <my_account>
#SBATCH --nodes=4
#SBATCH --exclusive
#SBATCH -t 00:20:00
#SBATCH --mem=0
module load gcc
module load openmpi
module load r
mpirun --map-by ppr:4:node Rscript <your_r_script>
This example runs 16 copies (4 per node) of <your_r_script> asynchronously in separate R sessions, communicating with each other via OpenMPI. If 128 cores are available on a node, further parallelism (32 per R session) is available for shared-memory parallel approaches (such as mclapply() or multithreaded libraries, like OpenBLAS, possibly via FlexiBLAS). The parameter --exclusive requests exclusive access to all cores on the nodes, --mem=0 requests all memory, and -t 00:20:00 asks for 20 minutes of time. Save this Slurm script in a file <your_script.sh> and submit it with sbatch <your_script.sh>. To quickly troubleshoot a Slurm script at your location, replace Rscript <your_r_script> with hostname.
Linux Notes
See the INSTALL file for details.
Mac OS Notes
MacOS does not provide MPI, so first install a recent version of OpenMPI. This is best done via Homebrew. Homebrew will automatically ask to install the Xcode Command Line Tools (CLT) if you have not yet done so (you don't need all of Xcode, just the CLT); see the Homebrew installation instructions. After installing Homebrew,

brew install openmpi

will install OpenMPI in a location that pbdMPI can find. Then, follow the standard R package installation for pbdMPI.
Parallelizing with distributed-memory concepts (like MPI) on shared-memory platforms (like a single node or a laptop) does produce excellent speedups, but it does not extend available memory for larger data objects. Chunking of larger objects does not extend available memory either, but it does prevent duplication of the objects in memory when running several R sessions in the shared memory of a laptop.
Windows Notes
Windows does not provide MPI, so first an MPI installation (binary, header, and libraries) is needed. We recommend installing Microsoft MPI, which is based on MPICH.
Download MS-MPI v10.1.3 (msmpisetup.exe) and the SDK (msmpisdk.msi) from the Microsoft Download Center. See the INSTALL file for the installation and for the usage of mpiexec.exe.
Authors
pbdMPI is authored and maintained by the pbdR core team:
- Wei-Chen Chen
- George Ostrouchov
- Drew Schmidt
With additional contributions from:
- Pragneshkumar Patel
- Hao Yu
- Christian Heckendorf
- Brian Ripley (Windows HPC Pack 2012)
- The R Core team (some functions are modified from the base packages)
Help Manual
| Help page | Topics |
|---|---|
R Interface to MPI (Programming with Big Data in R Project) | pbdMPI-package pbdMPI |
All Ranks Gather Objects from Every Rank | allgather allgather,ANY,ANY,integer-method allgather,ANY,missing,integer-method allgather,ANY,missing,missing-method allgather,integer,integer,integer-method allgather,integer,integer,missing-method allgather,numeric,numeric,integer-method allgather,numeric,numeric,missing-method allgather,raw,raw,integer-method allgather,raw,raw,missing-method allgather-methods allgatherv |
All Ranks Receive a Reduction of Objects from Every Rank | allreduce allreduce,ANY,missing-method allreduce,float32,float32-method allreduce,integer,integer-method allreduce,logical,logical-method allreduce,numeric,numeric-method allreduce-method |
All to All | alltoall spmd.alltoall.double spmd.alltoall.integer spmd.alltoall.raw spmd.alltoallv.double spmd.alltoallv.integer spmd.alltoallv.raw |
Parallel Apply and Lapply Functions | pbdApply pbdLapply pbdSapply |
A Rank Broadcast an Object to Every Rank | bcast bcast,ANY-method bcast,integer-method bcast,numeric-method bcast,raw-method bcast-method |
comm.chunk | comm.chunk |
Communicator Functions | barrier comm.abort comm.accept comm.c2f comm.connect comm.disconnect comm.dup comm.free comm.is.null comm.localrank comm.rank comm.size comm.split finalize init intercomm.create intercomm.merge is.finalized port.close port.open serv.lookup serv.publish serv.unpublish |
A Rank Gathers Objects from Every Rank | gather gather,ANY,ANY,integer-method gather,ANY,missing,integer-method gather,ANY,missing,missing-method gather,integer,integer,integer-method gather,integer,integer,missing-method gather,numeric,numeric,integer-method gather,numeric,numeric,missing-method gather,raw,raw,integer-method gather,raw,raw,missing-method gather-methods gatherv |
Functions to Get MPI and/or pbdMPI Configures Used at Compiling Time | get.conf get.lib get.sysenv |
Divide Job ID by Ranks | get.jid |
Global All Pairs | comm.allpairs |
Global Any and All Functions | comm.all comm.allcommon comm.any |
Global As GBD Function | comm.as.gbd |
Global Balance Functions | comm.balance.info comm.load.balance comm.unload.balance |
Global Base Functions | comm.length comm.mean comm.sd comm.sum comm.var |
Global Distance for Distributed Matrices | comm.dist |
Global Argument Matching | comm.match.arg |
Global Pairwise Evaluations | comm.pairwise |
Global Print and Cat Functions | comm.cat comm.print |
Global Range, Max, and Min Functions | comm.max comm.min comm.range |
Global Reading Functions | comm.read.csv comm.read.csv2 comm.read.table |
A Rprof Function for SPMD Routines | comm.Rprof |
Global Quick Sort for Distributed Vectors or Matrices | comm.sort |
Global Stop and Warning Functions | comm.stop comm.stopifnot comm.warning comm.warnings |
A Timing Function for SPMD Routines | comm.timer |
Global Which Functions | comm.which comm.which.max comm.which.min |
Global Writing Functions | comm.write comm.write.csv comm.write.csv2 comm.write.table |
Info Functions | info.c2f info.create info.free info.set |
A Rank Receives (Nonblocking) an Object from the Other Rank | irecv irecv,ANY-method irecv,integer-method irecv,numeric-method irecv,raw-method irecv-method |
Check if a MPI_COMM_NULL | is.comm.null |
A Rank Send (Nonblocking) an Object to the Other Rank | isend isend,ANY-method isend,integer-method isend,numeric-method isend,raw-method isend-method |
Set or Get MPI Array Pointers in R | arrange.mpi.apts |
Functions for Get/Print MPI_COMM Pointer (Address) | addr.mpi.comm.ptr get.mpi.comm.ptr |
Probe Functions | iprobe probe |
A Rank Receives (Blocking) an Object from the Other Rank | recv recv,ANY-method recv,integer-method recv,numeric-method recv,raw-method recv-method |
A Rank Receive a Reduction of Objects from Every Rank | reduce reduce,ANY,missing-method reduce,float32,float32-method reduce,integer,integer-method reduce,logical,logical-method reduce,numeric,numeric-method reduce-method |
A Rank Scatter Objects to Every Rank | scatter scatter,ANY,ANY,integer-method scatter,ANY,missing,integer-method scatter,ANY,missing,missing-method scatter,integer,integer,integer-method scatter,integer,integer,missing-method scatter,numeric,numeric,integer-method scatter,numeric,numeric,missing-method scatter,raw,raw,integer-method scatter,raw,raw,missing-method scatter-method |
Parallel random number generation with reproducible results | comm.end.seed comm.get.streams comm.reset.seed comm.seed.state comm.set.seed comm.set.stream |
A Rank Send (blocking) an Object to the Other Rank | send send,ANY-method send,integer-method send,numeric-method send,raw-method send-method |
Send and Receive an Object to and from Other Ranks | sendrecv sendrecv,ANY,ANY-method sendrecv,integer,integer-method sendrecv,numeric,numeric-method sendrecv,raw,raw-method sendrecv-method |
Send and Receive an Object to and from Other Ranks | sendrecv.replace sendrecv.replace,ANY-method sendrecv.replace,integer-method sendrecv.replace,numeric-method sendrecv.replace,raw-method sendrecv.replace-method |
Set Global pbdR Options | pbd_opt |
Functions to Obtain source and tag | anysource anytag get.sourcetag |
Default control in pbdMPI. | .pbd_env |
Sets of controls in pbdMPI. | .mpiopt_init SPMD.CT SPMD.DT SPMD.IO SPMD.OP SPMD.TP |
Functions for Task Pull Parallelism | task.pull task.pull.manager task.pull.workers |
Execute MPI code in system | execmpi runmpi |
Wait Functions | wait waitall waitany waitsome |