--- title: "Controller groups" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Controller groups} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- Each controller object only supports only one type of worker configuration which you set in advance. However, different controllers may have different types of workers, and `crew` supports controller groups to coordinate among these different worker types. With third-party launcher plugins from other packages, this mechanism will allow you to e.g. send some tasks to GPU-capable or high-memory workers while other tasks go to low-spec workers. We demonstrate with a controller of fully persistent workers which always stay running and a controller of semi-persistent workers which terminate after completing four tasks. We create controller objects with names. ```r library(crew) persistent <- crew_controller_local(name = "persistent") transient <- crew_controller_local(name = "semi-persistent", tasks_max = 4L) ``` `crew` uses a different TCP port for each controller you run, so please do not create hundreds of controllers. Please see the subsection on ports in the [README](https://wlandau.github.io/crew/index.html). We put these controller objects into a new controller group object. ```r group <- crew_controller_group(persistent, transient) ``` This controller group has a global `connect()` method to initialize both controllers. ```r group$start() ``` You can choose which worker pool to receive tasks. ```r group$push(name = "my task", command = sqrt(4), controller = "semi-persistent") ``` The controller group also supports global methods for `wait()`, `pop()`, and `terminate()`. These methods operate on all controllers at once by default, but the `controllers` argument allows you to select a subset of controllers to act on. Below in `pop()` the `controller` column of the output indicates which controller ran the task. ```r group$wait(controllers = "semi-persistent") group$pop() #> # A tibble: 1 × 13 #> name command result status error code trace warnings seconds seed #> #> 1 my task sqrt(4) succe… NA 0 NA NA 0 NA #> # ℹ 3 more variables: algorithm , controller , worker ``` The [`map()`](https://wlandau.github.io/crew/reference/crew_class_controller.html#method-crew_class_controller-group-map) method provides functional programming, and the `controller` argument lets you choose the controller to submit the tasks. ```r group$map( command = a + b + c + d, iterate = list( a = c(1, 3), b = c(2, 4) ), data = list(c = 5), globals = list(d = 6), controller = "persistent" ) #> # A tibble: 2 × 13 #> name command result status error code trace warnings seconds seed #> #> 1 unnamed_t… a + b … succe… NA 0 NA NA 0 NA #> 2 unnamed_t… a + b … succe… NA 0 NA NA 0 NA #> # ℹ 3 more variables: algorithm , controller , worker ``` The controller group has a `summary()` method which aggregates the summaries of one or more controllers. ```r group$summary() #> # A tibble: 2 × 8 #> controller tasks seconds success error crash cancel warning #> #> 1 persistent 2 0 2 0 0 0 0 #> 2 semi-persistent 1 0 1 0 0 0 0 ``` When you are finished, please call `terminate()` with no arguments to terminate all controllers in the controller group. ```r group$terminate() ``` # Backup controllers As described in the [introduction vignette](https://wlandau.github.io/crew/articles/introduction.html#crashes-and-retries), a worker may crash if it runs out of memory or encounters a system-level issue. The controller lets you retry the task, but `pop()` throws an error if the task's worker crashed more than `crashes_max` times in a row. Controller groups and backup controllers let you configure exactly how tasks are retried. Each controller can designate a backup controller to run tasks that cause too many crashes in the original controller. With backups, you can connect a whole chain of controllers with increasing resources. If you put all these controllers in a controller group, then you can use group-level `push()` and `pop()` methods without having to know which controller actually ran the task. Consider the following example. We define three different controllers that use one another as backups. The `default` controller uses the `medium_memory` controller as a backup, and the `medium_memory` controller uses the `high_memory` controller as a backup. ```r library(crew) library(crew.cluster) # https://wlandau.github.io/crew.cluster/index.html high_memory <- crew_controller_slurm( name = "high_memory", options_cluster = crew_options_slurm(memory_gigabytes_required = 64), crashes_max = 2 ) medium_memory <- crew_controller_slurm( name = "medium_memory", options_cluster = crew_options_slurm(memory_gigabytes_required = 64), crashes_max = 3, backup = high_memory ) default <- crew_controller_slurm( name = "default", crashes_max = 4, backup = medium_memory ) ``` We put all three controllers in a controller group. The `default` controller is listed first, so tasks pushed with `group$push()` will automatically run in the `default` unless special circumstances dictate otherwise. ```r group <- crew_controller_group(default, medium_memory, high_memory) ``` Consider a task that sometimes exhausts the available memory of its [SLURM](https://en.wikipedia.org/wiki/Slurm_Workload_Manager) worker: ```r group$push(command = my.package::run_heavy_task(), name = "heavy_task") ``` If the task exhausts available memory and crashes its worker, then `pop()` informs you: ```r task <- group$pop() task[, c("name", "result", "status", "error", "code", "controller")] #> # A tibble: 1 × 6 #> name result status error code controller #> #> 1 heavy_task crash 19 | Connection reset 19 default ``` when you resubmit the task with `group$push()`, it will run on the default controller again. But if you keep resubmitting it and it crashes 4 times in a row (from `crashes_max = 4`) then the next `push()` will send it to the backup `medium_memory` controller. If the task's worker crashes in the next 3 pushes crash the task in the `medium_memory` controller, then subsequent pushes will use the `high_memory` controller, which is the backup of `medium_memory`. If the `high_memory` controller successfully completes the task, the next `group$pop()` returns a successful result. ```r task <- group$pop() task[, c("name", "result", "status", "error", "code", "controller")] # A tibble: 1 × 6 name result status error code controller 1 heavy_task success NA 0 high_memory ``` `group$summary()` counts the number of times each controller attempted the task, with crashes counted as errors. ```r group$summary() #> # A tibble: 2 × 8 #> controller tasks seconds success error crash cancel warning #> #> 1 default 4 0 0 0 4 0 0 #> 2 medium_memory 3 0 0 0 3 0 0 #> 3 high_memory 1 0 1 0 0 0 0 ``` If you use a controller group like this one in a [`targets`](https://docs.ropensci.org/targets/) pipeline (with `tar_option_set(controller = "group")` in `_targets.R`) then `tar_make()` will automatically retry crashed tasks using the configuration in the controller group. This behavior requires [`targets`](https://docs.ropensci.org/targets/) version 1.10.0.9002 or later.