---
title: "Simulation-based calibration checking"
bibliography: '`r system.file("bibliography.bib", package = "brms.mmrm")`'
csl: '`r system.file("asa.csl", package = "brms.mmrm")`'
output:
  rmarkdown::html_vignette:
    toc: true
    number_sections: true
vignette: >
  %\VignetteIndexEntry{Simulation-based calibration checking}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 8,
  fig.height = 6
)
library(brms.mmrm)
library(dplyr)
library(fst)
library(ggplot2)
library(tibble)
library(tidyr)
```

# About

This vignette shows the results of a simulation-based calibration (SBC) checking study to validate the implementation of the models in `brms.mmrm`. SBC checking tests the ability of a Bayesian model to recapture the parameters used to simulate prior predictive data. For details on SBC checking, please read @Modrak2024 and the [`SBC`](https://hyunjimoon.github.io/SBC/) R package [@sbc]. This particular SBC checking study uses the [`targets`](https://docs.ropensci.org/targets/) pipeline in the [`viggnettes/sbc/`](https://github.com/openpharma/brms.mmrm/tree/main/vignettes/sbc) subdirectory of the [`brms.mmrm`](https://github.com/openpharma/brms.mmrm) package source code.

# Conclusion

From the results below, the SBC rank statistics are approximately uniformly distributed. In other words, the posterior distribution from the `brms`/Stan MMRM modeling code matches the prior from which the datasets were simulated. This is evidence that both the subgroup and non-subgroup models in `brms.mmrm` are implemented correctly.

# Setup

To show the SBC checking results in this vignette, we first load code from the SBC checking study, and we use custom functions to read and plot rank statistics:

```{r}
library(dplyr)
library(ggplot2)
library(tibble)
library(tidyr)

source("sbc/R/prior.R")
source("sbc/R/response.R")
source("sbc/R/scenarios.R")

read_ranks <- function(path) {
  fst::read_fst(path) |>
    tibble::as_tibble() |>
    pivot_longer(
      cols = everything(),
      names_to = "parameter",
      values_to = "rank"
    )
}

plot_ranks <- function(ranks) {
  ggplot(ranks) +
    geom_histogram(
      aes(x = rank),
      breaks = seq(from = 0, to = max(ranks$rank), length.out = 10)
    ) +
    facet_wrap(~parameter)
}
```

Each section below is its own SBC checking study based on a given modeling scenario. Each scenario shows results from 1000 independent simulations from the prior.

# Subgroup scenario

The subgroup scenario distinguishes itself from the others by the presence of a subgroup factor. Assumptions:

* 2 treatment groups
* 2 subgroup levels
* 3 time points
* 150 patients per treatment group
* Baseline covariates (2 continuous and 2 categorical)
* 30% dropout
* 8% rate of independent/sporadic missing response.

Model formula:

```{r, echo = FALSE}
subgroup()$formula
```

The prior was randomly generated and used for both simulation and analysis:

```{r, paged.print = FALSE}
setup_prior(subgroup) |>
  select(prior, class, coef, dpar) |>
  as.data.frame()
```

The following histograms show the SBC rank statistics which compare the prior parameter draws draws to the posterior draws. If the data simulation code and modeling code are both correct and consistent, then the rank statistics should be approximately uniformly distributed.

```{r}
ranks_subgroup <- read_ranks("sbc/results/subgroup.fst")
```

Fixed effect parameter ranks:

```{r}
ranks_subgroup |>
  filter(grepl("^b_", parameter)) |>
  filter(!grepl("^b_sigma", parameter)) |>
  plot_ranks()
```

Standard deviation parameter ranks:

```{r}
ranks_subgroup |>
  filter(grepl("b_sigma", parameter)) |>
  plot_ranks()
```

Correlation parameter ranks:

```{r}
ranks_subgroup |>
  filter(grepl("cortime_", parameter)) |>
  plot_ranks()
```

# Unstructured scenario

This scenario uses unstructured correlation and does not use a subgroup variable. Assumptions:

* 3 treatment groups
* No subgroup
* 4 time points
* 100 patients per treatment group
* No covariate adjustment
* No missing responses

Model formula:

```{r, echo = FALSE}
unstructured()$formula
```

The prior was randomly generated and used for both simulation and analysis:

```{r, paged.print = FALSE}
setup_prior(unstructured) |>
  select(prior, class, coef, dpar) |>
  as.data.frame()
```

SBC checking rank statistics:

```{r}
ranks_unstructured <- read_ranks("sbc/results/unstructured.fst")
```

Fixed effect parameter ranks:

```{r}
ranks_unstructured |>
  filter(grepl("^b_", parameter)) |>
  filter(!grepl("^b_sigma", parameter)) |>
  plot_ranks()
```
Log-scale standard deviation parameter ranks:

```{r}
ranks_unstructured |>
  filter(grepl("b_sigma", parameter)) |>
  plot_ranks()
```

Correlation parameter ranks:

```{r}
ranks_unstructured |>
  filter(grepl("cortime_", parameter)) |>
  plot_ranks()
```


# Autoregressive moving average scenario

This scenario uses an autoregressive moving average (ARMA) model with autoregressive order 1 and moving average order 1. Assumptions:

* 2 treatment groups
* No subgroup
* 3 time points
* 100 patients per treatment group
* No covariate adjustment
* No missing responses

Model formula:

```{r, echo = FALSE}
autoregressive_moving_average()$formula
```

The prior was randomly generated and used for both simulation and analysis:

```{r, paged.print = FALSE}
setup_prior(autoregressive_moving_average) |>
  select(prior, class, coef, dpar) |>
  as.data.frame()
```

SBC checking rank statistics:

```{r}
ranks_autoregressive_moving_average <- read_ranks(
  "sbc/results/autoregressive_moving_average.fst"
)
```

Fixed effect parameter ranks:

```{r}
ranks_autoregressive_moving_average |>
  filter(grepl("^b_", parameter)) |>
  filter(!grepl("^b_sigma", parameter)) |>
  plot_ranks()
```

Log-scale standard deviation parameter ranks:

```{r}
ranks_autoregressive_moving_average |>
  filter(grepl("b_sigma", parameter)) |>
  plot_ranks()
```

Correlation parameter ranks:

```{r}
ranks_autoregressive_moving_average |>
  filter(parameter %in% c("ar[1]", "ma[1]")) |>
  plot_ranks()
```

# Autoregressive scenario

This scenario is the same as above, but the correlation structure is autoregressive with order 2. Model formula:

```{r, echo = FALSE}
autoregressive()$formula
```

The prior was randomly generated and used for both simulation and analysis:

```{r, paged.print = FALSE}
setup_prior(autoregressive) |>
  select(prior, class, coef, dpar) |>
  as.data.frame()
```

SBC checking rank statistics:

```{r}
ranks_autoregressive <- read_ranks("sbc/results/autoregressive.fst")
```

Fixed effect parameter ranks:

```{r}
ranks_autoregressive |>
  filter(grepl("^b_", parameter)) |>
  filter(!grepl("^b_sigma", parameter)) |>
  plot_ranks()
```
Log-scale standard deviation parameter ranks:

```{r}
ranks_autoregressive |>
  filter(grepl("b_sigma", parameter)) |>
  plot_ranks()
```

Correlation parameter ranks:

```{r}
ranks_autoregressive |>
  filter(parameter %in% c("ar[1]", "ar[2]")) |>
  plot_ranks()
```

# Moving average scenario

This scenario is the same as above, but it uses a moving average correlation structure with order 2. Model formula:

```{r, echo = FALSE}
moving_average()$formula
```

The prior was randomly generated and used for both simulation and analysis:

```{r, paged.print = FALSE}
setup_prior(moving_average) |>
  select(prior, class, coef, dpar) |>
  as.data.frame()
```

SBC checking rank statistics:

```{r}
ranks_moving_average <- read_ranks("sbc/results/moving_average.fst")
```

Fixed effect parameter ranks:

```{r}
ranks_moving_average |>
  filter(grepl("^b_", parameter)) |>
  filter(!grepl("^b_sigma", parameter)) |>
  plot_ranks()
```
Log-scale standard deviation parameter ranks:

```{r}
ranks_moving_average |>
  filter(grepl("b_sigma", parameter)) |>
  plot_ranks()
```

Correlation parameter ranks:

```{r}
ranks_moving_average |>
  filter(parameter %in% c("ma[1]", "ma[2]")) |>
  plot_ranks()
```

# Compound symmetry scenario

This scenario is the same as above, but it uses a compound symmetry correlation structure. Model formula:

```{r, echo = FALSE}
compound_symmetry()$formula
```

The prior was randomly generated and used for both simulation and analysis:

```{r, paged.print = FALSE}
setup_prior(compound_symmetry) |>
  select(prior, class, coef, dpar) |>
  as.data.frame()
```

SBC checking rank statistics:

```{r}
ranks_compound_symmetry <- read_ranks("sbc/results/compound_symmetry.fst")
```

Fixed effect parameter ranks:

```{r}
ranks_compound_symmetry |>
  filter(grepl("^b_", parameter)) |>
  filter(!grepl("^b_sigma", parameter)) |>
  plot_ranks()
```
Log-scale standard deviation parameter ranks:

```{r}
ranks_compound_symmetry |>
  filter(grepl("b_sigma", parameter)) |>
  plot_ranks()
```

Correlation parameter ranks:

```{r}
ranks_compound_symmetry |>
  filter(parameter == "cosy") |>
  plot_ranks()
```

# Diagonal scenario

This scenario is the same as above, but it uses a diagonal correlation structure (independent time points within patients). Model formula:

```{r, echo = FALSE}
diagonal()$formula
```

The prior was randomly generated and used for both simulation and analysis:

```{r, paged.print = FALSE}
setup_prior(diagonal) |>
  select(prior, class, coef, dpar) |>
  as.data.frame()
```

SBC checking rank statistics:

```{r}
ranks_diagonal <- read_ranks("sbc/results/diagonal.fst")
```

Fixed effect parameter ranks:

```{r}
ranks_diagonal |>
  filter(grepl("^b_", parameter)) |>
  filter(!grepl("^b_sigma", parameter)) |>
  plot_ranks()
```
Log-scale standard deviation parameter ranks:

```{r}
ranks_diagonal |>
  filter(grepl("b_sigma", parameter)) |>
  plot_ranks()
```

# References