Meta-analytic data is largely characterized in three ways: univariate or multivariate, meta-analysis or network meta-analysis, and aggregate data or individual participant data. All other modeling choices fall into prior specification. For aggregate-data models, the standard data format is shown in the following table:
Outcome | Std.Dev | XCovariate | WCovariate | Trial (k ) |
Treat (t ) |
Npt |
---|---|---|---|---|---|---|
ȳ⋅13 | S13 | x13⊤ | w13⊤ | 1 | 3 | 1000 |
ȳ⋅10 | S10 | x10⊤ | w20⊤ | 1 | 0 | 1000 |
ȳ⋅20 | S20 | x20⊤ | w20⊤ | 2 | 0 | 1000 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
Outcome is the response(s), Std.Dev is the standard deviation(s) of the response(s), XCovariate and WCovariate are the design matrices, and Npt is the trial sample sizes. The pair of trial and treatment indicators is unique to a specific row. Individual participant data obviously doesn’t have Std.Dev and Npt.
bmeta_analyze
bmeta_analyze
is the main function in
metapack that supersedes and subsumes all previous
worker functions. bmeta_analyze
internally parses the
formula and identifies the model, which ultimately passes the arguments
to a suitable worker function. The vanilla formula
class lacks parts and should strictly have one left-hand side
(LHS) and one right-hand side (RHS). Formula (Zeileis and Croissant 2010) extends this
formula interface to allow multiple parts and multiple responses,
lending itself well into (network) meta-analysis.
Aside from the model-specific arguments, control
has two
items universally applicable: scale_x
and
verbose
. When TRUE
, the former standardizes
the fixed-effects design matrix, and the latter allows printing the
progress bar of the Markov chain Monte Carlo algorithm. Both default to
FALSE
if not provided. Note that if scale_x
is
TRUE
, the subsequent methods like print
,
summary
, or coef
will revert the scaling to
their original units. If this behavior is not desired, scale the
variables explicitly in the formula.
Formula
Each part in LHS or RHS is an R expression where variables
(or functions of variables) are chained with a plus sign
(+
), e.g., x1 + x2
. The tilde (~
)
separates all LHS’s from all RHS’s, each side further broken down into
parts by vertical bars (|
). The meaning of each part is
syntactically determined by its location within a formula, like the
English language. Therefore, the order must be observed as presribed for
bmeta_analyze
to correctly identify and configure the
model.
Capitalizing on the unique data structure of meta-analysis data, the general form of the formula is as follows:
y1 + y2
), the responses, is required of
all models. Depending on the number of variables given in the first LHS,
the model is either univariate or multivariate.sd1 + sd2
) supplies the standard
deviations of the endpoints required of an aggregate-data meta-analysis.
Currently, with no IPD models implemented, bmeta_analyze
will break if this part is missing; however, in the coming weeks, this
requirement will be relaxed with the introduction of IPD models.x1 + x2 + x3 + ns(n)
) defines the
fixed-effects covariates. For aggregate-data models, the trial sample
sizes must be passed as an argument to ns()
. In the example
code, n
is the variable containing the trial sample sizes.
IPD models do not have trial samples sizes. In fact,
bmeta_analyze
checks the existence of ns()
to
determine whether it’s aggregate data.z1 + z2 + z3
) defines either the
random-effects covariates (e.g., wtk⊤γ)
or the variance-related covariates (e.g., log τ = ztk⊤ϕ)—see
Appendix.treat + trial
or
treat + trial + groups
) corresponds to the treatment and
trial indicators, and optionally the grouping variable if it exists.
Variables here must be provided in the exact order shown in the example.
If variables for this part are not already factors, they are coerced to
extract the levels and number of levels—and unfactored
afterward.Aside from formula
and data
, there are four
remaining arguments that are required to be lists: prior
,
mcmc
, control
, and init
.
prior
.mcmc
contains the numbers of Markov chain Monte Carlo
(MCMC) iterations: ndiscard
for the number of burn-in
iterations, nskip
for thinning, and nkeep
for
the posterior sample size.control
configures the estimation algorithm. Currently,
the Metropolis-Hastings algorithm is the only estimation method that
allows tuning. *_stepsize
with the asterisk replaced with
one of the parameter names indicates the stepsize for determining the
sample evaluation points in the localized Metropolis algorithm.init
provides the initial values for the parameters in
case a user has a priori known high-quality starting
values.bmeta_analyze
Univariate meta-analysis can be specified with one response endpoint and two treatments, except that the covariance modeling must be “NoRecovery”.
Multivariate meta-analysis is characterized by multiple response
endpoints and two treatments. Additionally, multivariate meta-analysis
requires a model for the covariance matrix to be specified as
model
in the prior
argument—which defaults to
model=NoRecovery
if left unspecified. The available options
are listed below. The objects parenthesized at the end of every bullet
point are the hyperparameters associated with each model.
model="NoRecovery"
- Σtk = diag(σtk, 112, …, σtk, JJ2)
where σtk, jj2 ∼ ℐ𝒢(a0, b0)
and ℐ𝒢(a, b) is the
inverse-gamma distribution. This specification is useful if the user
does not care about the correlation recovery. (c0
,
dj0
, a0
, b0
,
Omega0
)model="EquiCovariance"
- Σtk = Σ
for every combination of (t, k) and Σ−1 ∼ 𝒲s0(Σ0).
This specification assumes that the user has prior knowledge that the
correlation structure does not change across the arms included.
(c0
, dj0
, s0
,
Omega0
, Sigma0
)model="EquiWithinTreat"
- Σtk = Σt
and Σt−1 ∼ 𝒲s0(Σ0).
This is a relaxed version of model="EquiCovariance"
,
allowing the correlation structure to differ across trials but forcing
it to stay identical within a trial. (c0
, dj0
,
s0
, Omega0
, Sigma0
)model="EquiCorrelation"
- Σtk = δtkρδtk
where δtk = diag(Σtk, 111/2, …, Σtk, JJ1/2),
and ρ is the
correlation matrix. This specification allows the variances to vary
across arms but requires that the correlations be the same. This is due
to the lack of correlation information in the data, which would in turn
lead to the nonidentifiability of the correlations if they were allowed
to vary. However, this still is an ambitious model which permits maximal
degrees of freedom in terms of variance and correlation estimation.
(c0
, dj0
, a0
, b0
,
Omega0
)model="Hierarchical"
- The fifth model is hierarchical
and thus may require more data than the others: (Σtk−1 ∣ Σ) ∼ 𝒲ν0((ν0 − J − 1)−1Σ−1)
and Σ ∼ 𝒲d0(Σ0).
Σtk
encodes the within-treatment-arm variation while Σ captures the between-treatment-arm
variation. The hierarchical structure allows the “borrowing of strength”
across treatment arms. (c0
, dj0
,
d0
, nu0
, Sigma0
,
Omega0
)The boilerplate code for multivariate meta-analysis is as follows:
f <- "y1 + y2 | sd1 + sd2 ~ x1 + x2 + x3 + ns(n) | z1 + z2 + z3 | treat + trial + groups"
out <- bmeta_analyze(formula(f), data = df,
mcmc = list(ndiscard = 20000, nskip = 5, nkeep = 10000),
prior = list(model = "NoRecovery"))
Except model="NoRecovery"
, all else uses
R_stepsize
in the control
argument. The fourth
option—model=EquiCorrelation
—allows sampling ρ. However, researchers
oftentimes have a rough—or concrete—idea about the correlation values,
and therefore fixing ρ
makes sense. That edge case can be accomplished through setting
sample_Rho=FALSE
in control
and
Rho=<some_matrix>
in init
. If
sample_Rho=FALSE
but no initial value for Rho
is given, it will default to 0.5 (ρ = 0.5I + 0.511⊤).
In full form,
bmeta_analyze
Univariate network meta-analysis is characterized by a single
response endpoint and more than two treatments. Similar to
model
in the multivariate meta-analysis case, the degrees
of freedom (df
) for the random effects’ multivariate t-distribution are the parameter of
interest in univariate network meta-analysis. Objects parenthesized at
the end are the hyperparameters associated with the model.
df = <positive_real_value>
- the random effects
follow a multivariate t-distribution. If df
is not provided by the user, it defaults to 20. (c01
,
c02
)df = Inf
- the random effects follow a multivariate
normal distribution. (c01
, c02
)sample_df=TRUE
in control
- the random
effects follow a multivariate t-distribution but the degrees of
freedom are treated as random, in which case a hierarchical prior
distribution is considered for df
. That is,
(df | nu_a, nu_b) ~ Ga(nu_a, nu_a / nu_b)
,
nu_a ~ Ga(a4, b4)
, and nu_b ~ IG(a5, b5)
.
(c01
, c02
, a4
, b4
,
a5
, b5
)The boilerplate code for univariate network meta-analysis is as follows:
f <- "y | sd ~ x1 + x2 + x3 + ns(n) | z1 + z2 + z3 | treat + trial"
out <- bmeta_analyze(formula(f), data = df,
mcmc = list(discard = 20000, nskip = 5, nkeep = 10000),
prior = list(df = 3)) # heavy-tailed random effects
If control=list(sample_df=TRUE)
, the df
value in prior
is ignored. Identically to the multivariate
meta-analysis case, sample_Rho
can be set to
FALSE
to suppress the sampling of Rho
, and
Rho
will be initialized using the corresponding value in
init
, which defaults to 0.5 (ρ = 0.5I + 0.511⊤).
The correlation matrix Rho
for univariate network
meta-analysis belongs to the random effects, not the
responses.
Rho_init <- diag(nT) # nT = the number of treatments
Rho_init[upper.tri(Rho_init)] <-
Rho_init[lower.tri(Rho_init)] <- 0.2
out <- bmeta_analyze(formula(f), data = df,
control = list(sample_df = TRUE, sample_Rho = FALSE),
init = list(Rho = Rho_init))
Sometimes, there is no covariate information available for modeling the variances. The second RHS can be a vector of ones, which reduces the model to have $\mathrm{Var}(\overline{y}_{tk})=\sigma_{tk}^2/n_{tk} + \exp(2\phi)$ for every (t, k). Users can simply do without the second RHS in this case—the second RHS technically exists but is the treatment and trial configuration. An example formula is as follows:
When there are multiple endpoints, the correlations thereof are
oftentimes unreported. In a meta-regression setting, the correlations
are something we want to know. In the case of subject-level
meta-analyses (where individual participant/patient data are available),
the correlations may only be one function call away. However, for
study-level meta-analyses, the correlations must be estimated, which is
not an easy task. bmeta_analyze
provides five different
models to estimate and recover the missing correlations.
Since the aforementioned setting regards the correlations as missing, the reported data are $(\overline{\boldsymbol{y}}_{\cdot tk}, \boldsymbol{s}_{tk}) \in \mathbf{R}^J \times \mathbf{R}^J$ for the tth treatment arm and kth trial. Every trial out of K trials includes T treatment arms. The sample size of the tth treatment arm and kth trial is denoted by ntk. If we write xtkj ∈ Rpj to be the treatment-within-trial level regressor corresponding to the jth response, reflecting the fixed effects of the tth treatment arm, and wtkj ∈ Rqj to be the same for the random effects, the model becomes
and (ntk − 1)Stk ∼ 𝒲ntk − 1(Σtk) where Xtk = blockdiag(xtk1⊤, xtk2⊤, …, xtkJ⊤), β = (β1⊤, ⋯, βJ⊤)⊤, Wtk = blockdiag(wtk1⊤, …, wtkJ⊤), γk = (γk1⊤, γk2⊤, …, γkJ⊤)⊤, $\overline{\boldsymbol{\epsilon}}_{\cdot tk} \sim \mathcal{N}(\boldsymbol{0}, \Sigma_{tk}/n_{tk})$, Stk is the full-rank covariance matrix whose diagonal entries are the observed stk, and 𝒲ν(Σ) is the Wishart distribution with ν degrees of freedom and a J × J scale matrix Σ whose density function is
$$ f(X\mid \nu,\Sigma) = \dfrac{1}{2^{J\nu}|\Sigma|^{\nu/2}\Gamma_J(\nu/2)}|X|^{(\nu-J-1)/2}\exp\left(-\dfrac{1}{2}\mathrm{tr}(\Sigma^{-1}X) \right)I(X\in \mathcal{S}_{++}^J), $$
where Γp is the multivariate gamma function defined by
$$ \Gamma_p(z) = \pi^{p(p-1)/4}\prod_{j=1}^p \Gamma[z+(1-j)/2], $$
and 𝒮++J is the space of J × J symmetric positive definite matrices. The statistical independence of $\overline{\boldsymbol{\epsilon}}_{\cdot tk}$ and Stk follows naturally from the Basu’s theorem.
The patients can sometimes be grouped by a factor that will generate disparate random effects. Although an arbitrary number of groups can exist in theory, metapack restricts the number of groups to two for practicality. Denoting the binary group indicates by utk yields
$$ \overline{y}_{\cdot tkj} = \boldsymbol{x}_{tkj}^\top\boldsymbol{\beta} + (1-u_{tk})\boldsymbol{w}_{tkj}^\top \boldsymbol{\gamma}_{kj}^0 + u_{tk}\boldsymbol{w}_{tkj}^\top \boldsymbol{\gamma}_{kj}^1 + \overline{\epsilon}_{\cdot tkj}. $$
The random effects are modeled as $\boldsymbol{\gamma}_{kj}^l \overset{\text{ind}}{\sim}\mathcal{N}(\boldsymbol{\gamma}_j^{l*},\Omega_j^l)$ and (Ωjl)−1 ∼ 𝒲d0j(Ω0j). Stacking the vectors, γkl = ((γk1l)⊤, …, (γkJl)⊤)⊤ ∼ 𝒩(γl*, Ωl) where γl* = ((γ1l*)⊤, …, (γJl*)⊤)⊤, Ωj = blockdiag(Ωj0, Ωjl), and Ω = blockdiag(Ω1, …, ΩJ) for l ∈ {0, 1}. Adopting the non-centered reparametrization (Bernardo et al. 2003), define γk, ol = γkl − γl*. Denoting Wtk* = [(1 − utk)Wtk, utkWtk], Xtk* = [Xtk, Wtk*], and θ = (β⊤, γ0*⊤, γ1*⊤)⊤, the model is written as follows:
$$ \overline{\boldsymbol{y}}_{\cdot tk} = \boldsymbol{X}_{tk}^*\boldsymbol{\theta}+\boldsymbol{W}_{tk}^*\boldsymbol{\gamma}_{k,o} + \overline{\boldsymbol{\epsilon}}_{\cdot tk}, $$ where γk, o = ((γk, o0)⊤, (γk, o1)⊤)⊤. If there is no grouping in the patients, setting utk = 0 for all (t, k) reduces the model back to $\eqref{eq:reduced-model}$.
The conditional distribution of (Rtk ∣ Vtk, Σtk) where $R_{tk} = V_{tk}^{-\frac{1}{2}}S_{tk}V_{tk}^{-\frac{1}{2}}$ and Vtk = diag(Stk11, …, StkJJ) becomes
$$ f(R_{tk}\mid V_{tk},\Sigma_{tk}) \propto |R_{tk}|^{(n_{tk}-J-2)/2}\exp\left\{-\dfrac{(n_{tk}-1)}{2}\mathrm{tr}\left(V_{tk}^{\frac{1}{2}}\Sigma_{tk}^{-1}V_{tk}^{\frac{1}{2}}R_{tk} \right) \right\}. $$
Network meta-analysis is an extension of meta-analysis where more than two treatments are compared. Unlike the traditional meta-analyses that restrict the number of treatments to be equal across trials, network meta-analysis allows varying numbers of treatments. This achieves a unique benefit that two treatments that have not been compared head-to-head can be assessed as a pair.
Start by denoting the comprehensive list of treatments in all K trials by 𝒯 = {1, …, T}. It is rarely the case that all T treatments are included in the data but we drop the subscripts tk and replace it with t for notational simplicity. Now, consider the model
where ȳ⋅tk is the univariate aggregate response of the kth trial for which treatment t was assigned, xtk is the aggregate covariate vector for the fixed effects, and γtk is the random effects term. The observed standard deviation, stk2 is modeled by
$$ \dfrac{(n_{tk}-1)s_{tk}^2}{\sigma_{tk}^2} \sim \chi_{n_{tk}-1}^2. $$
τtk in Equation $\eqref{eq:nmr-basic}$ encapsulates the variance of the random effect for the tth treatment in the kth trial, which is modeled as a deterministic function of a related covariate. That is,
log τtk = ztk⊤ϕ,
where ztk is the q-dimensional aggregate covariate vector and ϕ is the corresponding coefficient vector.
For the kth trial, we define a selection/projection matrix Ek = (et1k, et2k, …, etTkk), where etlk = (0, …, 1, …, 0)⊤, l = 1, …, Tk, with tlkth element set to 1 and 0 otherwise, and Tk is the number of treatments included in the kth trial. Let the scaled random effects γk = (γ1k, …, γTk)⊤. Then, γk, o = Ek⊤γk is the vector of Tk-dimensional scaled random effects for the kth trial. The scaled random effects γk ∼ tT(γ, ρ, ν) where tT(μ, Σ, ν) denotes a multivariate t distribution with ν degrees of freedom, a location parameter vector μ, and a scale matrix Σ.
The non-centered reparametrization (Bernardo et al. 2003) gives γk, o = Ek⊤(γk − γ). Then, with $\bar{\boldsymbol{y}}_k = (\bar{y}_{kt_{k1}},\ldots,\bar{y}_{kt_{kT_k}})^\top$, Xk = (xktk1, …, xktkTk), and Zk(ϕ) = diag(exp (zktk1⊤ϕ), …, exp (zktkTk⊤ϕ)), the model is recast as
$$ \bar{\boldsymbol{y}}_k = \boldsymbol{X}_k^* \boldsymbol{\theta} + \boldsymbol{Z}_k(\boldsymbol{\phi}) \boldsymbol{\gamma}_{k,o} + \bar{\boldsymbol{\epsilon}}_k, $$ where Xk* = (Xk, Ek⊤), θ = (β⊤, γ⊤)⊤, and $\bar{\boldsymbol{\epsilon}}_k \sim \mathcal{N}_{T_k}(\boldsymbol{0},\Sigma_k)$, Σk = diag(σktk12/nktk1, …, σktkTk2/nktkTk). This allows the random effects γk, o ∼ tTk(0, Ek⊤ρEk, ν) to be centered at zero.
Since the multivariate t random effects are not analytically marginalizable, we represent it as a scale mixture of normals as follows:
$$ (\boldsymbol{\gamma}_{k,o}\mid \lambda_k) \overset{\text{ind}}{\sim} \mathcal{N}_{T_k}\left(\boldsymbol{0}, \lambda_k^{-1}(E_k^\top\boldsymbol{\rho}E_k) \right), \quad \lambda_k \overset{\text{iid}}{\sim}\mathcal{G}a\left(\dfrac{\nu}{2},\dfrac{\nu}{2} \right), $$ where 𝒢a(a, b) indicates the gamma distribution with mean a/b and variance a/b2.