---
title: "Subsetting and Manipulating Table Contents"
author: "Gabriel Becker and Adrian Waddell"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Subsetting and Manipulating Table Contents}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
---

```{r, include = FALSE}
suggested_dependent_pkgs <- c("dplyr", "tibble")
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = all(vapply(
    suggested_dependent_pkgs,
    requireNamespace,
    logical(1),
    quietly = TRUE
  ))
)
```

```{r, echo=FALSE}
knitr::opts_chunk$set(comment = "#")
```

```{css, echo=FALSE}
.reveal .r code {
    white-space: pre;
}
```

## Introduction

`TableTree` objects are based on a tree data structure as the name
indicates. The package is written such that the user does not need to
walk trees for many basic table manipulations. Walking trees will
still be necessary for certain manipulation and will be the subject of
a different vignette.

In this vignette we show some methods to subset tables and to extract
cell values.

We will use the following table for illustrative purposes:

```{r, message=FALSE}
library(rtables)
library(dplyr)

lyt <- basic_table() %>%
  split_cols_by("ARM") %>%
  split_rows_by("SEX", split_fun = drop_split_levels) %>%
  analyze(c("AGE", "STRATA1"))

tbl <- build_table(lyt, ex_adsl %>% filter(SEX %in% c("M", "F")))
tbl
```

## Traditional Subsetting and modification with `[`

The `[` and `[<-` accessor functions operate largely the same as their
`data.frame` cousins:

 - `[` and `[<-` both treat Tables as rectangular objects (rather than
   trees)
   - In particular this means label rows are treated as rows with
     empty cell values, rather than rows without cells
 - `[` accepts both column and row absolute position, and missing
   arguments mean "all indexes in that dimension"
   - multiple values can be specified in both row and column position
   - negative numeric positions are supported, though like
     `[.data.frame` they cannot be mixed with positive ones
 - `[` always returns the same class as the object being subset unless
   `drop = TRUE`
 - `[ , drop = TRUE` returns the raw (possibly multi-element) value
   associated with the cell.

**Known Differences from `[.data.frame`**
- absolute position cannot currently be used to reorder columns or
  rows. Note in general the result of such an ordering is unlikely to
  be structurally valid. To change the order of values, please read
  [sorting and
  pruning](https://insightsengineering.github.io/rtables/latest-tag/articles/sorting_pruning.html)
  vignette or relevant function (`sort_at_path()`).
- `character` indices are treated as paths, not vectors of names in
  both `[` and `[<-`

The `[` accessor function always returns an `TableTree` object if
`drop=TRUE` is not set. The first argument are the row indices and the
second argument the column indices. Alternatively logical subsetting
can be used. The indices are based on visible rows and not on the tree
structure. So:

```{r}
tbl[1, 1]
```

is a table with an empty cell because the first row is a label row. We
need to access a cell with actual cell data:

```{r}
tbl[3, 1]
```

To retrieve the value,  we use `drop = TRUE`:

```{r}
tbl[3, 1, drop = TRUE]
```


One can access multiple rows and columns:

```{r}
tbl[1:3, 1:2]
```


Note that we do not repeat label rows for descending children, e.g.

```{r}
tbl[2:4, ]
```

does not show that the first row is derived from `AGE`. In order to
repeat content/label information, one should use the pagination
feature. Please read the related vignette.

Character indices are interpreted as paths (see below), NOT elements
to be matched against `names(tbl)`:

```{r}
tbl[, c("ARM", "A: Drug X")]
```

### Dealing with titles, foot notes, and top left information
As standard no additional information is kept after subsetting. Here,
we show with a more complete table how it is still possible to keep
the (possibly) relevant information.

```{r}
top_left(tbl) <- "SEX"
main_title(tbl) <- "Table 1"
subtitles(tbl) <- c("Authors:", " - Abcd Zabcd", " - Cde Zbcd")

main_footer(tbl) <- "Please regard this table as an example of smart subsetting"
prov_footer(tbl) <- "Do remember where you read this though"

fnotes_at_path(tbl, rowpath = c("M", "AGE", "Mean"), colpath = c("ARM", "A: Drug X")) <- "Very important mean"
```

Normal subsetting loses all the information showed above.
```{r}
tbl[3, 3]
```

If all the rows are kept, top left information is also kept. This can be also
imposed by adding `keep_topleft = TRUE` to the subsetting as follows:

```{r}
tbl[, 2:3]
tbl[1:3, 3, keep_topleft = TRUE]
```

If the referenced entry is present in the subsetting, also the
referential footnote will appear. Please consider reading relevant
vignette about [referential
footnotes](https://insightsengineering.github.io/rtables/latest-tag/articles/title_footer.html#referential-footnotes). In
case of subsetting, the referential footnotes are by default indexed
again, as if the produced table is a new one.


```{r}
tbl[10, 1]
col_paths_summary(tbl) # Use these to find the right path to value or label
row_paths_summary(tbl) #

# To select column value, use `NULL` for `rowpath`
fnotes_at_path(tbl, rowpath = NULL, colpath = c("ARM", "A: Drug X")) <- "Interesting"
tbl[3, 1]

# reindexing of {2} as {1}
fnotes_at_path(tbl, rowpath = c("M", "AGE", "Mean"), colpath = NULL) <- "THIS mean"
tbl # {1}, {2}, and {3} are present
tbl[10, 2] # only {1} which was previously {2}
```

Similar to what we have used to keep top left information, we can specify to
keep more information from the original table. As a standard the foot notes are
always present if the titles are kept.
```{r}
tbl[1:3, 2:3, keep_titles = TRUE]
tbl[1:3, 2:3, keep_titles = FALSE, keep_footers = TRUE]

# Referential footnotes are not influenced by `keep_footers = FALSE`
tbl[1:3, keep_titles = TRUE, keep_footers = FALSE]
```

## Path Based Cell Value Accessing:
Tables can be subset or modified in a structurally aware manner via
*pathing*.

Paths define semantically meaningful positions within a constructed
table that correspond to the logic of the layout used to create it.

A path is an ordered set of split names, the names of subgroups
generated by the split, and the `@content` directive, which steps into
a position's content (or row group summary) table.

We can see the row and column paths of an existing table via the
`row_paths()`, `col_paths()`, `row_paths_summary()`, and
`col_paths_summary()`, functions, or as a portion of the more general
`make_row_df()` function output.


```{r}
lyt2 <- basic_table() %>%
  split_cols_by("ARM") %>%
  split_cols_by("SEX", split_fun = drop_split_levels) %>%
  split_rows_by("RACE", split_fun = drop_split_levels) %>%
  summarize_row_groups() %>%
  analyze(c("AGE", "STRATA1"))

tbl2 <- build_table(lyt2, ex_adsl %>% filter(SEX %in% c("M", "F") & RACE %in% (levels(RACE)[1:3])))
tbl2
```

So the column paths are as follows:

```{r}
col_paths_summary(tbl2)
```

and the row paths are as follows:

```{r}
row_paths_summary(tbl2)
```

To get a semantically meaningful subset of our table, then, we can use
`[` (or `tt_at_path()` which underlies it)

```{r}
tbl2[c("RACE", "ASIAN"), c("ARM", "C: Combination")]
```

We can also retrieve individual cell-values via the `value_at()`
convenience function, which takes a pair of row and column paths which
resolve together to an individual cell, e.g. average age for Asian
female patients in arm A:

```{r}
value_at(tbl2, c("RACE", "ASIAN", "AGE", "Mean"), c("ARM", "A: Drug X", "SEX", "F"))
```

You can also request information from non-cell specific paths with the
`cell_values()` function:

```{r}
cell_values(tbl2, c("RACE", "ASIAN", "AGE", "Mean"), c("ARM", "A: Drug X"))
```

Note the return value of `cell_values()` is always a list even if you
specify a path to a cell:

```{r}
cell_values(tbl2, c("RACE", "ASIAN", "AGE", "Mean"), c("ARM", "A: Drug X", "SEX", "F"))
```