---
title: "rollupTree"
output: rmarkdown::html_vignette
bibliography: references.bib
vignette: >
%\VignetteIndexEntry{rollupTree}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.width = 6
)
```
```{r setup}
library(rollupTree)
```
# Overview
`rollupTree` implements a general function for computations in which some property of a parent element is a combination of corresponding properties of its child elements. The mass of an assembly, for example, is the sum of the masses of its subassemblies, and the mass of each subassembly is the sum of masses of its parts, etc.
`rollupTree` can perform computations specified by arbitrarily-shaped (but well-formed) trees, arbitrarily-defined properties and property-combining operations. Defaults are provided to simplify common cases (atomic numerical properties combined by summing), but functional programming techniques allow the caller to pass arbitrary *update* methods as required.
# Example With A Data Frame
Consider this example Work Breakdown Structure from [WorkBreakdownStructure.com](https://www.workbreakdownstructure.com):
[![Example Work Breakdown Structure. Source: www.workbreakdownstructure.com](wbs.jpg)](https://www.workbreakdownstructure.com)
The computations in the example are worked out already; we show here how to reproduce them.
A Work Breakdown Structure is a tree, that is, a graph that is connected and acyclic. It is, in addition, directed, that is, the edges have direction. We arbitrarily chose the edge direction to go from child to parent. Finally, it is single-rooted: every vertex but one has a single parent vertex; the root vertex has no parent.
The leaf elements (vertices) of the tree require asserted values for the properties (work, budget) of interest. Property values for non-leaf elements are computed by `rollup()`.
We begin by capturing the structure of the tree and asserted values in a data frame we call `wbs_table`. Values to be computed are initially unknown. Each element is uniquely identified by an `id` column[^1]. We also indicate parent id in the `pid` column but this information is not used directly by `rollup()`.
[^1]: `id` is the default but any valid column name can be used. Values should be character data.
```{r example}
library(rollupTree)
wbs_table
```
A key feature of recursively-defined problems like this is that the computations must be ordered in such a way that the computations for a given element must occur after properties for it children are known (either asserted or computed). Traversing a tree in this manner can be achieved by using a well-known algorithm in graph theory known as *topological sort*. For that reason, we construct an `igraph` [@igraph_manual] graph object in R, from which we can conveniently (1) check that the graph is in fact a well-formed tree, and (2) efficiently execute a topological sort to order the computations. (Note that, although the problems solved by rollup are *defined* recursively, the implementation in software is not recursive.)
It is a simple matter to construct a graph from the information in our data frame:
```{r wbs_tree-plot}
library(igraph)
wbs_tree <- graph_from_edgelist(
as.matrix(wbs_table[which(!is.na(wbs_table$pid)), c("id", "pid")]),
directed = TRUE
)
```
## ![WBS Tree](wbs_tree.png)
A directed acyclic graph may have more than one topological sort; any is suitable for our purpose. Here is what `rollup()` uses internally:
```{r}
igraph::topo_sort(wbs_tree)
```
## Summing a Single Numeric Property
Although our data in this first example is a data frame, `rollup()` can operate on an arbitrary R object if provided with *update* and *validate* methods for that object. For the common case in which the parameter of interest is a numeric column in a data frame, the combine operation is addition, and the key column is named `id`, the package provides `update_df_prop_by_id()` and `validate_df_by_id()` functions that can be invoked as boilerplate. To roll up the `work` property, for example, we simply invoke:
```{r}
rollup(
tree=wbs_tree,
ds=wbs_table,
update=function(d, t, s) update_df_prop_by_id(df=d, target=t, sources=s, prop="work"),
validate_ds=function(t, d) validate_df_by_id(tree=t, df=d, prop="work")
)
```
`update_df_prop_by_id()` (like every well-behaved *update* method) modifies only the specified column and leaves the rest of the data frame unchanged. If we want to roll up the `budget` column as well, we can simply chain two `rollup()`s together. In this example we use R's pipe operator:
```{r}
rollup(
tree=wbs_tree,
ds=wbs_table,
update=function(d, t, s) update_df_prop_by_id(df=d, target=t, sources=s, prop="work"),
validate_ds=function(t, d) validate_df_by_id(tree=t, df=d, prop="work")
) |> rollup(
tree=wbs_tree,
ds=_,
update=function(d, t, s) update_df_prop_by_id(df=d, target=t, sources=s, prop="budget"),
validate_ds=function(t, d) validate_df_by_id(tree=t, df=d, prop="budget")
)
```
In most cases, this approach suffices. The code is simple and clear, and performance is not typically an issue. (In other testing `rollup()` performs tens of thousands of non-trivial property updates per second.) We show here some alternate approaches, mainly to illustrate architectural features of the approach that may be useful for more esoteric applications.
## Chaining Update Methods
A valid *update* method returns the updated data set, so we can chain two updates within a single `rollup()` instead of chaining two `rollup()`s. Similarly, a data set validator returns a logical value, so we can make the conjunction of two validators:
```{r}
rollup(
tree = wbs_tree,
ds = wbs_table,
update = function(d, t, s) {
update_df_prop_by_id(
df = d,
target = t,
sources = s,
prop = "work"
) |>
update_df_prop_by_id(target = t,
sources = s,
prop = "budget")
},
validate_ds = function(t, d) {
validate_df_by_id(tree = t, df = d, prop = "work") &&
validate_df_by_id(tree = t, df = d, prop = "budget")
}
)
```
## Custom *get*, *set*, and *update* Methods
In this example we create a custom *get* method that builds a named vector from the specified properties (using lower-level function `df_get_by_id()`) and a corresponding *set* method (using `df_set_by_id()`). We then create a custom *update* method using these methods. (The default *combine* method still works because R knows how to add vectors.) Finally, we create a custom data set validator and invoke `rollup()` with our custom methods.
```{r}
my_get <- function(d, i) c(
w=df_get_by_id(df=d, id=i, prop="work"),
b=df_get_by_id(df=d, id=i, prop="budget")
)
my_set <- function(d, i, v) {
df_set_by_id(df=d, id=i, prop="work", val=v["w"]) |>
df_set_by_id(id=i, prop="budget", val=v["b"])
}
my_update <- function(d, t, s) {
update_prop(ds=d, target=t, sources=s, set=my_set, get=my_get)
}
my_validate <- function(t, d) {
validate_ds(tree=t, ds=d,
get_keys=function(d) df_get_ids(df=d),
get_prop=my_get,
op=function(v) my_check(v["w"]) && my_check(v["b"])
)
}
my_check <- function(v)
is.numeric(v) && !is.na(v) && (v > 0.0)
rollup(
tree = wbs_tree,
ds = wbs_table,
update = my_update,
validate_ds = my_validate
)
```
## Custom *combine* Method
Finally, we illustrate the use of a custom combiner. Suppose we have 5% uncertainty in our leaf cost numbers. Add those uncertainty numbers to our data frame:
```{r}
new_wbs_table <- wbs_table
new_wbs_table$work <- NULL
new_wbs_table$budget_unc <- ifelse(is.na(wbs_table$budget), NA, wbs_table$budget * 0.05)
new_wbs_table
```
The standard technique for accumulating uncertainties is to combine using root-sum-square (RSS).
```{r}
combine_rss <- function(vl) {
sqrt(Reduce(f = `+`, x = Map(
f = function(v)
v * v,
vl
)))
}
result <- rollup(
tree = wbs_tree,
ds = new_wbs_table,
update = function(d, t, s)
update_df_prop_by_id(
df = d,
target = t,
sources = s,
prop = "budget"
) |>
update_df_prop_by_id(
target = t,
sources = s,
prop = "budget_unc",
combine = combine_rss
),
validate_ds = function(t, d)
validate_df_by_id(tree = t, df = d, prop = "budget_unc"),
)
result$budget_unc_pct <- result$budget_unc / result$budget * 100.
result
```
# Example With A List
Suppose instead of a data frame we have a list of lists:
```{r}
wbs_list <- lapply(split(wbs_table, wbs_table$id),
function(r) list(name = r$name, budget = r$budget)
)
str(wbs_list)
```
We construct *update* and *validate* functions as follows:
```{r}
list_get <- function(d, i) d[[i]]$budget
list_set <- function(d, i, v) { d[[i]]$budget = v; d }
list_update <- function(d, t, s) { update_prop(d, t, s, list_set, list_get) }
list_validate <- function(t, d) validate_ds(t, d, get_keys = function(l) names(l), get = list_get)
```
The output of `rollup()` is as expected:
```{r}
list_result <- rollup(wbs_tree, wbs_list, list_update, list_validate)
str(list_result)
```
# Example With A Tree
The tree can serve as the dataset with proper attributes and methods. To illustrate, we add the budget figures from our data frame as vertex attributes to the tree:
```{r}
new_wbs_tree <- Reduce(
f = function(g, k) set_vertex_attr(g, 'budget', k, df_get_by_id(wbs_table, k, 'budget')),
x = names(V(wbs_tree)),
init = wbs_tree
)
ib <- vertex_attr(new_wbs_tree, "budget")
names(ib) <- names(V(new_wbs_tree))
ib
```
We construct *update* and *validate* functions as follows:
```{r}
tree_get <- function(d, k) vertex_attr(d, "budget", k)
tree_set <- function(d, k, v) set_vertex_attr(d, "budget", k, v)
tree_update <- function(d, t, s) update_prop(d, t, s, set = tree_set, get = tree_get)
tree_validate <- function(t, d) validate_ds(t, d, get_keys = function(d) names(V(d)), get = tree_get)
```
The output of `rollup()` is as expected:
```{r}
tree_result <- rollup(new_wbs_tree, new_wbs_tree, update = tree_update, validate_ds = tree_validate)
ob <- vertex_attr(tree_result, "budget")
names(ob) <- names(V(tree_result))
ob
```
# References