Fitting ML-UMR Models

Overview

The mlumr() function fits Bayesian multilevel unanchored meta-regression models using Stan. The default backend is rstan; cmdstanr can be used when it is installed and configured. Three outcome families are supported: binary (binomial), continuous (normal), and count (Poisson). Two model types are available for each family:

SPFA (Shared Prognostic Factor Assumption): shared regression coefficients across treatments
Relaxed SPFA: treatment-specific regression coefficients, allowing effect modification

Model specification

Binary outcomes (binomial)

The SPFA model assumes that covariates affect outcomes equally for both treatments. The model is:

\[ \text{logit}(P(Y_i = 1 | \text{Index})) = \mu_A + \mathbf{x}_i^T \boldsymbol{\beta} \]

\[ \text{logit}(P(Y_j = 1 | \text{Comparator})) = \mu_B + \mathbf{x}_j^T \boldsymbol{\beta} \]

where \(\boldsymbol{\beta}\) is shared across both treatments. The AgD likelihood integrates over the comparator population covariate distribution using QMC points.

The Relaxed model allows treatment-specific effects:

\[ \text{logit}(P(Y_i = 1 | \text{Index})) = \mu_A + \mathbf{x}_i^T \boldsymbol{\beta}_A \]

\[ \text{logit}(P(Y_j = 1 | \text{Comparator})) = \mu_B + \mathbf{x}_j^T \boldsymbol{\beta}_B \]

This captures effect modification when covariate effects differ between treatments. The difference \(\boldsymbol{\delta} = \boldsymbol{\beta}_A - \boldsymbol{\beta}_B\) quantifies treatment-by-covariate interaction.

Continuous outcomes (normal)

For continuous outcomes, the SPFA model is:

\[ Y_i | \text{Index} \sim \text{Normal}(\mu_A + \mathbf{x}_i^T \boldsymbol{\beta}, \sigma^2) \]

The AgD likelihood uses the mean and standard error of the observed outcome. The generated quantities include mean differences (delta_index, delta_comparator) as the primary treatment effect measure.

An additional prior_sigma parameter controls the prior on the residual standard deviation \(\sigma\). The default is a half-normal prior induced by Stan’s <lower=0> constraint; half-Student-t and exponential priors are also available through the prior constructors.

Count outcomes (Poisson)

For count outcomes with exposure, the SPFA model uses a Poisson likelihood with log link:

\[ Y_i | \text{Index} \sim \text{Poisson}(\exp(\mu_A + \mathbf{x}_i^T \boldsymbol{\beta}) \cdot E_i) \]

where \(E_i\) is the exposure (person-time). The AgD likelihood uses log-sum-exp for numerical stability. The generated quantities include rate ratios (delta_index, delta_comparator).

Fitting models

The chunks below all use a small toy dat built from the same pipeline as vignette("data-preparation"). Run this setup chunk first so the subsequent fitting, summary, prediction, and comparison calls have something to operate on.

library(mlumr)
set.seed(2026)

trial_a_data <- data.frame(
  trt      = "Drug_A",
  response = rbinom(300, 1, 0.55),
  age_cat  = rbinom(300, 1, 0.40),
  sex      = rbinom(300, 1, 0.55)
)
trial_b_data <- data.frame(
  trt           = "Drug_B",
  n_total       = 400,
  n_events      = 160,
  age_cat_mean  = 0.35,
  sex_mean      = 0.50
)

ipd <- set_ipd(trial_a_data, treatment = "trt", outcome = "response",
               covariates = c("age_cat", "sex"))
agd <- set_agd(trial_b_data, treatment = "trt",
               outcome_n = "n_total", outcome_r = "n_events",
               cov_means = c("age_cat_mean", "sex_mean"),
               cov_types = c("binary", "binary"))
dat <- combine_data(ipd, agd)
dat <- add_integration(dat, n_int = 64,
                       age_cat = distr(qbern, prob = age_cat_mean),
                       sex = distr(qbern, prob = sex_mean))

# SPFA model (default)
fit_spfa <- mlumr(
  dat, model = "spfa",
  chains = 2, iter = 500, warmup = 250,
  seed = 42, refresh = 0, verbose = FALSE
)

# Relaxed SPFA
fit_relaxed <- mlumr(
  dat, model = "relaxed",
  chains = 2, iter = 500, warmup = 250,
  seed = 43, refresh = 0, verbose = FALSE
)

Controlling the sampler

fit <- mlumr(
  dat,
  model = "spfa",
  chains = 4,               # Number of MCMC chains
  iter = 4000,               # Total iterations per chain
  warmup = 2000,             # Warmup iterations
  seed = 42,                 # For reproducibility
  adapt_delta = 0.99,        # Increase for divergences
  max_treedepth = 15,        # Maximum tree depth
  refresh = 500              # Print progress every 500 iterations
)

The backend can be selected per fit. The code below runs the same small demonstration model with each available engine and records elapsed wall-clock time. rstan uses the model compiled into the installed R package. The first cmdstanr call may include external executable compilation or cache setup, so the table reports cold and warm cmdstanr runs separately. For repeated analyses, the warm-cache cmdstanr row is the more relevant comparison.

run_backend_timed <- function(engine, run, note, seed = 2026, ...) {
  start_time <- proc.time()[["elapsed"]]
  fit <- tryCatch(
    mlumr(
      dat,
      model = "spfa",
      engine = engine,
      chains = 2,
      iter = 300,
      warmup = 150,
      seed = seed,
      refresh = 0,
      verbose = FALSE,
      ...
    ),
    error = function(e) e
  )
  elapsed <- proc.time()[["elapsed"]] - start_time

  if (inherits(fit, "error")) {
    return(data.frame(
      run = run,
      engine = engine,
      status = "failed",
      elapsed_seconds = round(elapsed, 2),
      lor_comparator_mean = NA_real_,
      max_rhat = NA_real_,
      min_ess = NA_real_,
      note = conditionMessage(fit),
      stringsAsFactors = FALSE
    ))
  }

  lor_comp <- marginal_effects(fit, effect = "lor", population = "comparator")
  data.frame(
    run = run,
    engine = engine,
    status = "ran",
    elapsed_seconds = round(elapsed, 2),
    lor_comparator_mean = round(lor_comp$mean, 3),
    max_rhat = round(max(fit$summary$Rhat, na.rm = TRUE), 3),
    min_ess = round(min(fit$summary$n_eff, na.rm = TRUE), 1),
    note = note,
    stringsAsFactors = FALSE
  )
}

cmdstanr_ready <- requireNamespace("cmdstanr", quietly = TRUE) &&
  isTRUE(tryCatch({
    nzchar(cmdstanr::cmdstan_path())
  }, error = function(e) FALSE))

backend_speed <- run_backend_timed(
  engine = "rstan",
  run = "rstan",
  note = "Package-compiled model, already used above"
)

if (cmdstanr_ready) {
  backend_speed <- rbind(
    backend_speed,
    run_backend_timed(
      engine = "cmdstanr",
      run = "cmdstanr_cold",
      note = "First call; may include compile/cache setup"
    ),
    run_backend_timed(
      engine = "cmdstanr",
      run = "cmdstanr_warm",
      note = "Second call; executable/cache already available"
    ),
    run_backend_timed(
      engine = "cmdstanr",
      run = "cmdstanr_warm_parallel",
      note = "Warm cache with two parallel chains",
      parallel_chains = 2
    )
  )
} else {
  backend_speed <- rbind(
    backend_speed,
    data.frame(
      run = "cmdstanr",
      engine = "cmdstanr",
      status = "skipped",
      elapsed_seconds = NA_real_,
      lor_comparator_mean = NA_real_,
      max_rhat = NA_real_,
      min_ess = NA_real_,
      note = "cmdstanr or CmdStan is not available in this R session",
      stringsAsFactors = FALSE
    )
  )
}
#> Running MCMC with 2 sequential chains...
#> 
#> Chain 1 finished in 0.1 seconds.
#> Chain 2 finished in 0.1 seconds.
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 0.1 seconds.
#> Total execution time: 0.4 seconds.
#> Running MCMC with 2 sequential chains...
#> 
#> Chain 1 finished in 0.1 seconds.
#> Chain 2 finished in 0.1 seconds.
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 0.1 seconds.
#> Total execution time: 0.2 seconds.
#> Running MCMC with 2 parallel chains...
#> 
#> Chain 1 finished in 0.1 seconds.
#> Chain 2 finished in 0.1 seconds.
#> 
#> Both chains finished successfully.
#> Mean chain execution time: 0.1 seconds.
#> Total execution time: 0.1 seconds.

backend_speed
#>                      run   engine status elapsed_seconds lor_comparator_mean
#> 1                  rstan    rstan    ran            1.62               0.633
#> 2          cmdstanr_cold cmdstanr    ran           39.70               0.633
#> 3          cmdstanr_warm cmdstanr    ran            2.04               0.633
#> 4 cmdstanr_warm_parallel cmdstanr    ran            2.24               0.633
#>   max_rhat min_ess                                            note
#> 1    1.022    90.3      Package-compiled model, already used above
#> 2    1.026    88.9     First call; may include compile/cache setup
#> 3    1.026    88.9 Second call; executable/cache already available
#> 4    1.026    88.9             Warm cache with two parallel chains

For production analyses, choose the backend based on your local toolchain, diagnostics, and workflow. cmdstanr is often faster after the executable is compiled, especially with parallel chains and longer runs, but this is not guaranteed for tiny demonstration fits. In short examples, compilation, process startup, and CSV readback can dominate the timing. Posterior estimates should be similar up to Monte Carlo error when the same model, data, priors, and sampler settings are used.

Setting priors

By default, weakly informative Normal(0, 10) priors are used for intercepts and Normal(0, 2.5) priors are used for regression coefficients. Normal, Student-t, and Cauchy priors are available for intercepts and regression coefficients. For normal-family residual standard deviations, prior_sigma also supports an exponential prior.

fit <- mlumr(
  dat,
  model = "spfa",
  prior_intercept = prior_normal(0, 10),   # Default
  prior_beta = prior_normal(0, 2.5)        # More informative for betas
)

Per-covariate beta priors and autoscaling are supported:

fit <- mlumr(
  dat,
  model = "spfa",
  prior_beta = list(
    prior_normal(0, 2.5, autoscale = TRUE),
    prior_normal(0, 1.5, autoscale = TRUE)
  )
)

Inspect the priors actually passed to Stan, including autoscaled beta scales:

prior_summary(fit)

Inspecting results

Print and summary

# Quick overview
print(fit_spfa)
#> ML-UMR Fit
#> ==========
#> 
#> Model: SPFA (Shared Prognostic Factors) 
#> Family: Binary 
#> Link: logit 
#> Engine: rstan 
#> Treatments:
#>   Index (IPD): Drug_A 
#>   Comparator (AgD): Drug_B 
#> 
#> Data:
#>   IPD: n = 300 observations
#>   AgD: 1 rows
#>   Covariates: 2 
#>   Integration points: 64 
#> 
#> Sampling:
#>   Chains: 2 
#>   Iterations: 500 (warmup: 250 )
#> 
#> Key Parameters:
#>        variable        mean         sd        2.5%      97.5%      Rhat
#>        mu_index  0.01539256 0.18883908 -0.37200045  0.3564798 1.0020906
#>   mu_comparator -0.64227165 0.16949982 -0.98577758 -0.2947996 0.9993743
#>       lor_index  0.63883925 0.15289218  0.36295708  0.9408910 0.9971667
#>  lor_comparator  0.63923774 0.15306407  0.36327859  0.9413027 0.9971759
#>        rd_index  0.15763375 0.03709868  0.09032631  0.2305075 0.9972087
#>   rd_comparator  0.15763607 0.03714788  0.09039003  0.2306113 0.9972285
#> 
#> Use summary() for full results

# Detailed summary
summary(fit_spfa)
#> ML-UMR Model Summary
#> ====================
#> 
#> Model: SPFA 
#> Family: Binary 
#> Link: logit 
#> Engine: rstan 
#> Treatments: Drug_A (IPD) vs Drug_B (AgD)
#> 
#> MCMC Diagnostics:
#>   Divergent transitions: 0 
#>   Max treedepth hits: 0 
#>   Max Rhat: 1.005 
#>   Min ESS: 171 
#> 
#> Intercepts (logit scale):
#>       variable        mean        sd       2.5%      97.5%      Rhat
#>       mu_index  0.01539256 0.1888391 -0.3720004  0.3564798 1.0020906
#>  mu_comparator -0.64227165 0.1694998 -0.9857776 -0.2947996 0.9993743
#> 
#> Regression Coefficients:
#>  variable       mean        sd        2.5%     97.5%      Rhat
#>   beta[1] -0.1776436 0.2396588 -0.65564137 0.2978308 1.0011910
#>   beta[2]  0.6007849 0.2342236  0.09846948 1.0951764 0.9967522
#> 
#> Marginal Treatment Effects:
#>   Log Odds Ratios:
#>        variable      mean        sd      2.5%     97.5%
#>       lor_index 0.6388393 0.1528922 0.3629571 0.9408910
#>  lor_comparator 0.6392377 0.1530641 0.3632786 0.9413027
#>   Risk Differences:
#>       variable      mean         sd       2.5%     97.5%
#>       rd_index 0.1576337 0.03709868 0.09032631 0.2305075
#>  rd_comparator 0.1576361 0.03714788 0.09039003 0.2306113
#>   Risk Ratios:
#>       variable     mean        sd     2.5%    97.5%
#>       rr_index 1.396405 0.1127879 1.200357 1.650386
#>  rr_comparator 1.400566 0.1134029 1.203125 1.659196

Marginal treatment effects

The main quantities of interest are population-level treatment effects, marginalized over the covariate distribution:

# All effects in both populations
marginal_effects(fit_spfa)
#>         variable effect population      mean         sd       q2.5       q50
#> 1      lor_index    LOR      Index 0.6388393 0.15289218 0.36295708 0.6395973
#> 2 lor_comparator    LOR Comparator 0.6392377 0.15306407 0.36327859 0.6400668
#> 3       rd_index     RD      Index 0.1576337 0.03709868 0.09032631 0.1580295
#> 4  rd_comparator     RD Comparator 0.1576361 0.03714788 0.09039003 0.1578905
#> 5       rr_index     RR      Index 1.3964045 0.11278788 1.20035702 1.3941597
#> 6  rr_comparator     RR Comparator 1.4005660 0.11340289 1.20312501 1.3983381
#>       q97.5
#> 1 0.9408910
#> 2 0.9413027
#> 3 0.2305075
#> 4 0.2306113
#> 5 1.6503858
#> 6 1.6591961

# Log odds ratios only
marginal_effects(fit_spfa, effect = "lor")
#>         variable effect population      mean        sd      q2.5       q50
#> 1      lor_index    LOR      Index 0.6388393 0.1528922 0.3629571 0.6395973
#> 2 lor_comparator    LOR Comparator 0.6392377 0.1530641 0.3632786 0.6400668
#>       q97.5
#> 1 0.9408910
#> 2 0.9413027

# In index population only
marginal_effects(fit_spfa, effect = "lor", population = "index")
#>    variable effect population      mean        sd      q2.5       q50    q97.5
#> 1 lor_index    LOR      Index 0.6388393 0.1528922 0.3629571 0.6395973 0.940891

# Full posterior draws (for custom summaries)
lor_draws <- marginal_effects(fit_spfa, effect = "lor", summary = FALSE)
head(lor_draws)
#>   lor_index lor_comparator
#> 1 0.5621229      0.5624823
#> 2 0.6489194      0.6486779
#> 3 0.6444013      0.6447164
#> 4 0.6788594      0.6788753
#> 5 0.7168121      0.7168396
#> 6 0.6424331      0.6425593

Absolute predictions

# Predicted probabilities P(Y=1) for each treatment in each population
predict(fit_spfa, population = "both")
#>   treatment population      mean         sd      q2.5       q50     q97.5
#> 1    Drug_A      Index 0.5612091 0.02977884 0.5040289 0.5625020 0.6192961
#> 2    Drug_B      Index 0.4035753 0.02770884 0.3483834 0.4046299 0.4555289
#> 3    Drug_A Comparator 0.5568075 0.03001661 0.4985468 0.5571865 0.6133761
#> 4    Drug_B Comparator 0.3991715 0.02689400 0.3435149 0.3992277 0.4483255

# On logit scale
predict(fit_spfa, type = "link")
#>   treatment population       mean        sd         q2.5        q50      q97.5
#> 1    Drug_A      Index  0.2541485 0.1255538  0.016448033  0.2574419  0.4978812
#> 2    Drug_B      Index -0.4035157 0.1184283 -0.645106960 -0.4002716 -0.1822008
#> 3    Drug_A Comparator  0.2359455 0.1258985 -0.005183504  0.2361731  0.4806885
#> 4    Drug_B Comparator -0.4217187 0.1156721 -0.661097846 -0.4201242 -0.2136304

# Full posterior draws
pred_draws <- predict(fit_spfa, summary = FALSE)
head(pred_draws)
#>   p_index_index p_comparator_index p_index_comparator p_comparator_comparator
#> 1     0.5472828          0.4079553          0.5422336               0.4029606
#> 2     0.5730841          0.4122979          0.5565534               0.3961597
#> 3     0.5823263          0.4226094          0.5743957               0.4146177
#> 4     0.5703145          0.4023399          0.5620552               0.3942772
#> 5     0.5423707          0.3665788          0.5370674               0.3616298
#> 6     0.5829024          0.4236680          0.5775979               0.4183285

Conditional predictions and effects

predict() and marginal_effects() report population-averaged quantities. Use conditional_predict() and conditional_effects() when the estimand is a specific covariate profile rather than the index or comparator population.

profiles <- data.frame(
  age_cat = c(0, 1),
  sex = c(0, 1)
)

# Absolute event probabilities at each profile
conditional_predict(fit_spfa, newdata = profiles)
#>   profile treatment      mean         sd      q2.5       q50     q97.5
#> 1       1    Drug_A 0.5038500 0.04677380 0.4080580 0.5030961 0.5881880
#> 2       1    Drug_B 0.3457348 0.03799703 0.2717469 0.3467062 0.4268293
#> 3       2    Drug_A 0.6066538 0.05202611 0.5099722 0.6034968 0.7049153
#> 4       2    Drug_B 0.4461500 0.05698849 0.3407049 0.4452150 0.5604472

# Conditional link-scale, risk-difference, and risk-ratio effects
conditional_effects(fit_spfa, newdata = profiles)
#>   profile      effect      mean         sd       q2.5       q50     q97.5
#> 1       1 LINK_EFFECT 0.6576642 0.15774765 0.37489065 0.6584049 0.9823065
#> 2       1          RD 0.1581152 0.03800158 0.08854254 0.1586298 0.2359074
#> 3       1          RR 1.4659165 0.13553950 1.23270914 1.4566001 1.7785156
#> 4       2 LINK_EFFECT 0.6576642 0.15774765 0.37489065 0.6584049 0.9823065
#> 5       2          RD 0.1605038 0.03802191 0.09256806 0.1608222 0.2387842
#> 6       2          RR 1.3710490 0.11957532 1.17464773 1.3578899 1.6389207

For SPFA binary models, the conditional link-scale effect is constant across profiles because the shared beta cancels on the fitted link scale. Risk differences and risk ratios can still vary by profile because they depend on the absolute baseline probabilities.

Model comparison

Compare SPFA and Relaxed models using DIC:

compare_models(fit_spfa, fit_relaxed)
#> 
#> Model Comparison (DIC)
#> ======================
#> 
#>         Model    DIC   pD Delta_DIC
#>  Relaxed SPFA 420.12 3.92      0.00
#>          SPFA 421.18 4.95      1.06
#> 
#> Lower DIC = better fit. Delta_DIC > 5 is a rough heuristic for
#> meaningful difference, not a formally calibrated threshold.
#> DIC should not be the sole basis for model selection.

You can also compute DIC individually:

dic_spfa <- calculate_dic(fit_spfa)
print(dic_spfa)
#> DIC for ML-UMR Model
#> ====================
#> 
#> Model: SPFA 
#> DIC: 421.18 
#> pD: 4.95 
#> D_bar: 416.23

MCMC diagnostics

The package automatically warns about:

Divergent transitions (increase adapt_delta)
Maximum treedepth hits (increase max_treedepth)
High Rhat values (> 1.01, run more iterations)
Low ESS (< 400, run more iterations)

Check diagnostics manually:

# Diagnostics stored in the fit object
fit_spfa$diagnostics$n_divergent
#> [1] 0
fit_spfa$diagnostics$n_max_treedepth
#> [1] 0

# Rhat and ESS from the summary table
max(fit_spfa$summary$Rhat, na.rm = TRUE)
#> [1] 1.004521
min(fit_spfa$summary$n_eff, na.rm = TRUE)
#> [1] 171.1997

For visual diagnostics, use the bayesplot package:

library(bayesplot)
pars <- c("mu_index", "mu_comparator", "lor_index")
draws_array <- rstan::extract(fit_spfa$stanfit, pars = pars, permuted = FALSE)
mcmc_dens_overlay(draws_array, pars = pars)

mcmc_intervals(as.matrix(fit_spfa$draws[, c("lor_index", "lor_comparator")]))

Troubleshooting

Divergent transitions

Increase adapt_delta closer to 1:

fit <- mlumr(dat, adapt_delta = 0.99)

Slow convergence

Increase iterations and warmup:

fit <- mlumr(dat, iter = 8000, warmup = 4000)

Identifiability in Relaxed model

The Relaxed model has more parameters. With only one AgD row, the comparator-specific coefficients may be weakly identified. The package warns about this. Consider:

Using the SPFA model if effect modification is not expected
Providing multiple AgD subgroups
Using more informative priors for betas
Running prior_sensitivity() to assess how strongly relaxed-model conclusions depend on the beta prior

prior_sensitivity(fit_relaxed, prior_beta_scales = c(1, 2.5, 5, 10))