---
title: "Inference"
bibliography: '`r system.file("bibliography.bib", package = "brms.mmrm")`'
csl: '`r system.file("asa.csl", package = "brms.mmrm")`'
output:
  rmarkdown::html_vignette:
    toc: true
    number_sections: true
vignette: >
  %\VignetteIndexEntry{Inference}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = rlang::is_installed("emmeans")
)
library(brms.mmrm)
library(dplyr)
library(tidyr)
library(zoo)
```

This vignette explains how `brms.mmrm` conducts posterior inference on a fitted [MMRM model](https://openpharma.github.io/brms.mmrm/articles/model.html) using estimated marginal means.

# Example data

Throughout this vignette, we use the `mmrm` package's `fev_data` dataset, a simulation of a clinical trial in which chronic obstructive pulmonary disease (COPD) patients (variable `USUBJID`) were randomized to different treatment groups (variable `ARMCD`) and measured across four discrete time points (variable `AVISIT`). The given response variable is forced expired volume in one second (`FEV1`), and we are interested in the `FEV1` change from baseline to each time point (derived variable `FEV_CHG`). For this vignette, we impute missing responses in order to simplify the discussion.

```{r}
library(brms.mmrm)
library(dplyr)
library(tidyr)
data(fev_data, package = "mmrm")
data <- fev_data |>
  group_by(USUBJID) |>
  complete(AVISIT) |>
  arrange(AVISIT) |>
  fill(
    any_of(c("ARMCD", "FEV1_BL", "RACE", "SEX", "WEIGHT")),
    .direction = "downup"
  ) |>
  mutate(FEV1 = na.locf(FEV1, na.rm = FALSE)) |>
  mutate(FEV1 = na.locf(FEV1, na.rm = FALSE, fromLast = TRUE)) |>
  ungroup() |>
  filter(!is.na(FEV1)) |>
  mutate(FEV1_CHG = FEV1 - FEV1_BL, USUBJID = as.character(USUBJID)) |>
  select(-FEV1) |>
  as_tibble() |>
  arrange(USUBJID, AVISIT) |>
  brm_data(
    outcome = "FEV1_CHG",
    baseline = "FEV1_BL",
    group = "ARMCD",
    patient = "USUBJID",
    time = "AVISIT",
    covariates = c("RACE", "SEX", "WEIGHT"),
    reference_group = "PBO",
    reference_time = "VIS1"
  )
```

```{r}
data
```

# Marginal means for clinical trials

According to @lenth2016, marginal means (formerly "least-squares means") are predictions (usually averaged predictions) at each point in a reference grid. The reference grid declares combinations of levels of factors of interest. In a clinical trial with repeated measures, we are often interested in the mean response at each combination of treatment group and discrete time point. For our FEV1 dataset, we are interested in the mean of `FEV1_CHG` and its standard error for each combination of treatment group and time point.^[For a subgroup analysis where the subgroup factor is identified in advance, we would be interested in each combination of treatment group, subgroup level, and time point.] In other words, we want to estimate the mean `FEV1_CHG` for group `"TRT"` time `"VIS1"`, the mean `FEV1_CHG` for group `"TRT"` time `"VIS2"`, and so on.^[Other quantities of interest are downstream of these marginal means. For example, the difference between `TRT` and `PBO` at time point `VIS4`, and the difference between `VIS4` and `VIS4` for the `TRT` group, are simple contrasts of the upstream marginal means.] We represent our goals in a reference with one row per marginal mean of interest and columns with the levels of the factors of interest.

```{r}
reference_grid <- distinct(data, ARMCD, AVISIT)
reference_grid
```

It is seldom trivial to estimate marginal means. For example, the following parameterization includes an intercept term, additive terms for each level of each factor, interactions to capture non-additive relationships among factors, continuous covariates, and different `FEV1_BL` slopes for different time points. Here, there is no model coefficient that directly corresponds to a marginal mean of interest. Even terms like `AVISITVIS2:ARMCDTRT` implicitly condition on a subset of the data because of the other variables involved.

```{r}
brms_mmrm_formula <- brm_formula(data, correlation = "diagonal")
base_formula <- as.formula(brms_mmrm_formula[[1]])
attr(base_formula, "nl") <- NULL
attr(base_formula, "loop") <- NULL
```

```{r}
base_formula
```

```{r}
colnames(model.matrix(object = base_formula, data = data))
```

To accomplish our goals, we need to carefully construct a linear transformation that maps these model coefficients to the marginal means of interest. The transformation should evaluate contrasts on the interesting parameters and average out the uninteresting parameters.

# Existing capabilities

`brms.mmrm::brm_model()` returns a fitted [`brms`](https://paulbuerkner.com/brms/) model, and [`brms`](https://paulbuerkner.com/brms/) already has tools for posterior inference. Through a combination of native functions and S3 methods, [`brms`](https://paulbuerkner.com/brms/) integrates not only with [`posterior`](https://mc-stan.org/posterior/) and [`loo`](https://mc-stan.org/loo/), but also [`emmeans`](https://cran.r-project.org/package=emmeans) for the estimation of marginal means and downstream contrasts. 

Despite the existing features in `brms`, `brms.mmrm` implements custom code to transform model coefficients into marginal means. This is because the reference grids in `emmeans` can only condition on factors explicitly declared in the model formula supplied to `brms`, whereas `brms.mmrm` needs more flexibility in order to support [informative prior archetypes](https://openpharma.github.io/brms.mmrm/articles/archetypes.html) (@bedrick1996, @bedrick1997, @christensen2010, @rosner2021).

# How `brms.mmrm` estimates marginal means

To estimate marginal means, `brms.mmrm::brm_transform_marginal()` creates a special matrix.

```{r}
transform <- brm_transform_marginal(data = data, formula = brms_mmrm_formula)
```

```{r}
dim(transform)
transform[, 1:4]
```

This special matrix encodes the equations below which map model coefficients to marginal means.^[`summary()` also invisibly returns a simple character vector with the equations below.]

```{r}
summary(transform)
```

Multiplying the matrix by a set of model coefficients is the same as plugging the coefficients into the equations above. Both produce estimates of marginal means.

```{r}
model <- lm(formula = base_formula, data = data)
marginals_custom <- transform %*% coef(model)
marginals_custom
```

This technique is similar to [`emmeans::emmeans(weights = "proportional")`](https://cran.r-project.org/package=emmeans)^[Unlike `emmeans`, `brms.mmrm` completes the grid of patient visits to add rows for implicitly missing responses. Completing the grid of patient visits ensures all patients are represented equally when averaging over baseline covariates, regardless of patients who drop out early.] (@lenth2016, @searle1979) and produces similar estimates.

```{r}
library(emmeans)
marginals_emmeans <- emmeans(
  object = model,
  specs = ~ARMCD:AVISIT,
  weights = "proportional",
  nuisance = c("USUBJID", "RACE", "SEX")
) |>
  as.data.frame() |>
  as_tibble() |>
  select(ARMCD, AVISIT, emmean) |>
  arrange(ARMCD, AVISIT)
```

```{r}
marginals_emmeans
```

```{r}
marginals_custom - marginals_emmeans$emmean
```

For our Bayesian MMRMs in `brms.mmrm`, the transformation from `brm_transform_marginal()` operates on each individual draw from the joint posterior distribution. The transformation matrix produced by `brm_transform_marginal()` is the value of the `transform` argument of `brm_marginal_draws()`. That way, `brm_marginal_draws()` produces an entire estimated posterior of each marginal mean, rather than point estimates that assume a normal or Student-t distribution.^[You can then estimate the posterior of any function of marginal means by simply applying that function to the individual posterior draws from `brm_marginal_draws()`. `brm_marginal_grid()` helps identify column names for this kind of custom inference/post-processing.]

# How `brm_marginal_draws()` works

Let us take a closer look at the equations that map model parameters to marginal means.

```{r}
summary(transform)
```

These equations include terms not only for the fixed effects of interest, but also for nuisance variables `FEV1_BL`, `SEX`, `RACE`, and `WEIGHT`. These nuisance variables were originally part of the model formula, which means each marginal mean can only be interpreted relative to a fixed value of `FEV1_BL`, a fixed proportion of female patients, etc. For example, if we dropped `40.13*b_FEV1_BL`, then `PBO:VIS1` would be the placebo mean at visit 1 for patients with `FEV1_BL = 0`: in other words, patients who cannot breathe out any air from their lungs at the beginning of the study. Similarly, if we dropped `0.53*b_SEXFemale`, then we would have to interpret `PBO:VIS1` as the visit 1 placebo mean for male patients only. Fixed values `40.13` and `0.53` are averages over the data to ensure our marginal means apply to the entire patient population as a whole.

The major challenge of `brm_transform_marginal()` is to condition on nuisance values that represent appropriate averages over the data. To calculate these nuisance values, `brm_transform_marginal()` uses a technique similar to `weights = "proportional"` in `emmeans::emmeans()`.

To replicate `brm_transform_marginal()`, we first create a reference grid to define the factor levels of interest and the means of continuous variables to condition on.

```{r}
grid <- data |>
  mutate(FEV1_BL = mean(FEV1_BL), WEIGHT = mean(WEIGHT)) |>
  distinct(ARMCD, AVISIT, FEV1_BL, WEIGHT) |>
  arrange(ARMCD, AVISIT)
grid
```

We use the grid to construct a model matrix with the desired interactions between continuous variables and factors of interest. Each column represents a model coefficient, and each row represents a marginal mean of interest.

```{r}
transform <- model.matrix(
  object = ~ FEV1_BL * AVISIT + ARMCD * AVISIT + WEIGHT,
  data = grid
)
rownames(transform) <- paste(grid$ARMCD, grid$AVISIT)
transform
```

We want to predict at the "average" of `SEX` and `RACE` across all the data. Since `SEX` and `RACE` are factors, we cannot simply take the means of the variables themselves. Rather, we construct a model matrix to turn each factor *level* into a dummy variable, and then average those dummy variables across the entire dataset. This process accounts for the observed frequencies of these levels in the data (ideal for passive variables that the experiment does not directly control), while guarding against hidden confounding with the factors of interest (which can lead to Simpson's paradox).^[For more context, please refer to the [`emmeans` basics vignette](https://cran.r-project.org/package=emmeans/vignettes/basics.html), as well as discussion forums [here](https://stats.stackexchange.com/questions/332167/what-are-ls-means-useful-for) and [here](https://stats.stackexchange.com/questions/510862/is-least-squares-means-lsmeans-statistical-nonsense/510923#510923).]

```{r}
proportional_factors <- data |>
  model.matrix(object = ~ 0 + SEX + RACE) |>
  colMeans() |>
  t()
proportional_factors
```

```{r}
transform <- transform |>
  bind_cols(proportional_factors) |>
  as.matrix()
transform <- transform[, names(coef(model))]
rownames(transform) <- paste(grid$ARMCD, grid$AVISIT)
transform
```

Finally, we use this transformation matrix to map estimated model coefficients to estimated marginal means.

```{r}
marginals_custom <- transform %*% coef(model)
marginals_custom
```

These results are extremely close to the estimated marginal means from `emmeans`.

```{r}
marginals_emmeans |>
  bind_cols(custom = as.numeric(marginals_custom)) |>
  mutate(difference = custom - emmean)
```

`brms.mmrm` follows the procedure above, but in a Bayesian context. The `brm_transform_marginal()` creates the matrix above, and `brm_marginal_draws()` uses it to transform posterior draws of `brms` model coefficients into posterior draws of marginal means. These posterior draws of marginal means then support estimation of treatment effects (via `brm_marginal_draws()` and `brm_marginal_summaries()`) and posterior probabilities on those treatment effects (via `brm_marginal_probabilities()`). To fine-tune the marginal mean estimation procedure for niche use cases, you can modify the transformation returned from `brm_transform_marginal()` and then supply it to the `transform` argument of `brm_marginal_draws()`.

# Subgroup analysis

Subgroup analysis raises important questions about how nuisance variables are averaged, and you as the user are responsible for choosing the approach that best suits the situation. To illustrate, suppose `SEX` is a pre-specified subgroup. When estimating marginal means, we now wish to condition on `"Female"` vs `"Male"` while averaging over `RACE` across the whole dataset. In `emmeans`, this is similar to how we calculated `marginals_emmeans` above, but we now move `SEX` from `nuisance` to `specs`:

```{r}
emmeans(
  object = model,
  specs = ~SEX:ARMCD:AVISIT,
  weights = "proportional",
  nuisance = c("USUBJID", "RACE")
)
```

This may be reasonable in some cases, and it mitigates the kind of hidden confounding between the subgroup and other variables which may otherwise cause Simpson's paradox. However, for subgroup-specific marginal means, it may not be realistic to condition on a single point estimate for all levels of the reference grid. For example, if the model were to regress on a `pregnancy` variable, then the marginal means for `SEX = "Male"` should always condition on `pregnancy = 0` instead of `mean(data$pregnancy)`. And in general, it may be more reasonable to condition on subgroup-specific averages of nuisance variables. However, if you do this, it is your responsibility to investigate and understand the hidden interactions and confounding in your dataset. <https://cran.r-project.org/package=emmeans/vignettes/interactions.html> is an edifying vignette on this topic.

To opt into subgroup-specific averages of nuisance variables in `brms.mmrm`, set `average_within_subgroup = TRUE` in `brm_transform_marginal()`, then supply the output to the `transform` argument of `brm_marginal_draws()`.

To replicate `brm_transform_marginal(average_within_subgroup = TRUE)` from scratch, first create a reference grid which includes subgroup levels.

```{r}
grid <- data |>
  distinct(ARMCD, SEX, AVISIT) |>
  arrange(ARMCD, SEX, AVISIT)
grid
```

For each continuous variable, append the corresponding subgroup-specific averages to the grid.

```{r}
means <- data |>
  group_by(SEX) |>
  summarize(FEV1_BL = mean(FEV1_BL), WEIGHT = mean(WEIGHT), .groups = "drop")
grid <- left_join(x = grid, y = means, by = "SEX")
grid
```

Begin creating the variable transformation matrix using this new grid. Be sure to include the subgroup in the formula below exactly as it appears in the formula used to fit the model.

```{r}
transform <- model.matrix(
  object = ~ FEV1_BL * AVISIT + ARMCD * AVISIT + SEX + WEIGHT,
  data = grid
)
```

Append subgroup-specific averages of the levels of nuisance factors (in this case, just `RACE`).

```{r}
proportions <- data |>
  model.matrix(object = ~ 0 + RACE) |>
  as.data.frame() |>
  mutate(SEX = data$SEX) |>
  group_by(SEX) |>
  summarize(across(everything(), mean), .groups = "drop")
transform <- transform |>
  as.data.frame() |>
  mutate(SEX = grid$SEX) |>
  left_join(y = proportions, by = "SEX") |>
  select(-SEX) |>
  as.matrix()
```

Complete the transformation matrix by assigning the correct row names and aligning the column order with that of the model coefficients.

```{r}
rownames(transform) <- paste(grid$ARMCD, grid$SEX, grid$AVISIT)
transform <- transform[, names(coef(model))]
transform
```

Finally, use the custom `transform` matrix to estimate subgroup-specific marginal means. Because we averaged `FEV1_BL`, `WEIGHT`, and `RACE` within subgroup levels, the results will differ from those of `emmeans`.

```{r}
transform %*% coef(model)
```

# References