---
title: "Comparing Against Baselines or Control"
author: "Gabriel Becker and Adrian Waddell"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Comparing Against Baselines or Control}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  chunk_output_type: console
---


```{r, echo=FALSE}
knitr::opts_chunk$set(comment = "#")
```

```{css, echo=FALSE}

.sourcecode {
  background-color:lightblue;
}
.reveal .r code {
    white-space: pre;
}
```


## Introduction


Often one column of a table serves as the reference (baseline or
control) group, and the data in the other columns are compared
against it.

For example, let's calculate the average age:


```{r, message=FALSE}
library(rtables)

lyt <- basic_table() %>%
  split_cols_by("ARM") %>%
  analyze("AGE")

tbl <- build_table(lyt, DM)
tbl
```

Next, we calculate the difference in average `AGE` between the placebo
arm and each of the other arms:

```{r}
lyt2 <- basic_table() %>%
  split_cols_by("ARM", ref_group = "B: Placebo") %>%
  analyze("AGE", afun = function(x, .ref_group) {
    in_rows(
      "Difference of Averages" = rcell(mean(x) - mean(.ref_group), format = "xx.xx")
    )
  })

tbl2 <- build_table(lyt2, DM)
tbl2
```

Note that the column order has changed and the reference group is
displayed in the first column.

When we want the cells in the reference column (e.g., "B: Placebo") to
be blank, we use `non_ref_rcell()` instead of `rcell()` and pass
`.in_ref_col` as its second argument (`is_ref`):

```{r}
lyt3 <- basic_table() %>%
  split_cols_by("ARM", ref_group = "B: Placebo") %>%
  analyze(
    "AGE",
    afun = function(x, .ref_group, .in_ref_col) {
      in_rows(
        "Difference of Averages" = non_ref_rcell(
          mean(x) - mean(.ref_group),
          is_ref = .in_ref_col,
          format = "xx.xx"
        )
      )
    }
  )

tbl3 <- build_table(lyt3, DM)
tbl3

# Several rows can be blanked in the reference column in the same way
lyt4 <- basic_table() %>%
  split_cols_by("ARM", ref_group = "B: Placebo") %>%
  analyze(
    "AGE",
    afun = function(x, .ref_group, .in_ref_col) {
      in_rows(
        "Difference of Averages" = non_ref_rcell(
          mean(x) - mean(.ref_group),
          is_ref = .in_ref_col,
          format = "xx.xx"
        ),
        "another row" = non_ref_rcell("aaa", is_ref = .in_ref_col)
      )
    }
  )

tbl4 <- build_table(lyt4, DM)
tbl4
```

The optional arguments that `afun` can accept, such as `.ref_group`
and `.in_ref_col`, are documented in the help page for `analyze()`
(`?analyze`).
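
The reference arguments follow the type of the first argument of
`afun`. The sketch below assumes that when `afun` takes `df` rather
than `x`, `.ref_group` arrives as a data frame, so other columns of
the reference data can be used in the comparison. The names `lyt_df`
and `tbl_df` and the row label are illustrative only.

```{r}
lyt_df <- basic_table() %>%
  split_cols_by("ARM", ref_group = "B: Placebo") %>%
  analyze("AGE", afun = function(df, .ref_group, .in_ref_col) {
    # df and .ref_group are assumed to be data frames here, so we can
    # compare a column other than the analysis variable (SEX)
    in_rows(
      "Difference in Proportion Female" = non_ref_rcell(
        mean(df$SEX == "F") - mean(.ref_group$SEX == "F"),
        is_ref = .in_ref_col,
        format = "xx.xx"
      )
    )
  })

tbl_df <- build_table(lyt_df, DM)
tbl_df
```

Taking `df` is mainly useful when the comparison needs more than the
single analysis variable; for a single variable, the `x` form shown
earlier is simpler.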

## Row Splitting

When a layout also splits rows, the reference data can be either the
full reference column or only the part of it that falls within the
current row split. For example:

```{r}
lyt5 <- basic_table(show_colcounts = TRUE) %>%
  split_cols_by("ARM", ref_group = "B: Placebo") %>%
  split_rows_by("SEX", split_fun = drop_split_levels) %>%
  analyze("AGE", afun = function(x, .ref_group, .ref_full, .in_ref_col) {
    in_rows(
      "is reference (.in_ref_col)" = rcell(.in_ref_col),
      "ref cell N (.ref_group)" = rcell(length(.ref_group)),
      "ref column N (.ref_full)" = rcell(length(.ref_full))
    )
  })

tbl5 <- build_table(lyt5, subset(DM, SEX %in% c("M", "F")))
tbl5
```

The data assigned to `.ref_full` is the full data of the reference
column, whereas the data assigned to `.ref_group` respects the
subsetting defined by row splitting and therefore comes from the same
subset as the `x` or `df` argument to `afun`.
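
As a minimal sketch of how the two arguments might be used side by
side, the layout below reports the difference in mean `AGE` against
the row-specific reference data (`.ref_group`) and against the full
reference column (`.ref_full`). The names `lyt6` and `tbl6` and the
row labels are illustrative and not part of the example above.

```{r}
lyt6 <- basic_table(show_colcounts = TRUE) %>%
  split_cols_by("ARM", ref_group = "B: Placebo") %>%
  split_rows_by("SEX", split_fun = drop_split_levels) %>%
  analyze("AGE", afun = function(x, .ref_group, .ref_full, .in_ref_col) {
    in_rows(
      # reference restricted to the current SEX level
      "Diff. vs. reference, same SEX" = non_ref_rcell(
        mean(x) - mean(.ref_group),
        is_ref = .in_ref_col,
        format = "xx.xx"
      ),
      # reference is the whole placebo column, ignoring the row split
      "Diff. vs. full reference column" = non_ref_rcell(
        mean(x) - mean(.ref_full),
        is_ref = .in_ref_col,
        format = "xx.xx"
      )
    )
  })

tbl6 <- build_table(lyt6, subset(DM, SEX %in% c("M", "F")))
tbl6
```

Which reference is appropriate depends on the analysis: `.ref_group`
gives a like-for-like comparison within each row split, while
`.ref_full` compares every subgroup against the same overall baseline.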