The historical analysis framework for trial-based economic evaluations focusses on the modelling of individual-level effectiveness and cost variables \((e_i,c_i)\) collected at a given time or computed over some intervals, i.e. QALYs or Total costs. This because: 1) key quantities of interest, i.e. the mean parameters \((\mu_{e},\mu_{c})\) and related incremental quantities, can be easily derived from a bivariate model fitted to \((e_i,c_i)\); 2) when health economic outcomes are derived from longitudinal data \((e_{ij},c_{ij})\) collected at multiple occasions \(j=1,\ldots,J\), the modelling of \((e_i,c_i)\) is considerably simplified compared to a multivariate model applied to \((e_{ij},c_{ij})\) .
However, when the longitudinal responses are subject to missingness, i.e. \(e_{ij}=(e^{\text{obs}}_{ij},e^{\text{mis}}_{ij})\) and \(c_{ij}=(e^{\text{obs}}_{ij},c^{\text{mis}}_{ij})\), a cross-sectional model is often inadequate since information from partially-observed responses will be ignored in the analysis. As a result, the specification of a joint model for \((e_{ij},c_{ij})\) provides an intuitive approach which, within a Bayesian framework, is a simple extension of the standard cross-sectional framework In addition, the model can be easily customised to: retrieve the estimates of interest while taking into account the evidence from any observed responses; conduct sensitivity analysis to alternative missingness assumptions.
When partially-observed longitudinal data are available,
missingHE allows the user to implement a joint longitudinal
model through the dedicated function lmdm, which mimics the
modelling strategy of a longitudinal selection model. The function
requires the specification of a model for the effectiveness
(model.eff) and cost (model.cost) variables as
well as a conditional model (i.e. mechanism) for the missing
effectiveness (model.me) and costs (model.mc).
The main difference compared to standard selection models is that,
rather than directly modelling the missingness indicators,
lmdm specifies a model for the longitudinal missing data
pattern indicators \(\boldsymbol
r_{ie}=(m_{ije},\ldots,m_{iJe})\) and \(\boldsymbol
r_{ic}=(m_{ijc},\ldots,m_{iJc})\), where \(m_{ije}\) and \(m_{ije}\) denote the binary missingness
indicators for the responses at time \(j\). As an example, consider a
partially-observed effectiveness variable \(e_{ij}\) and let \(\boldsymbol r_{ie}\) denote the
corresponding longitudinal missingness patterns indicator. Although
\(r_{ie}\) could theory be associated
with different values depending on the specific pattern combinations
based on the values of \(m_{ije}\), a
possible way to simplify the specification of the model is to reduce the
number of modelled patterns by aggregating some of the observed patterns
together. This is the strategy implemented by missingHE to
facilitate the implementation of a longitudinal missingness approach
based on a restricted number of aggregated patterns. In particular,
missingHE specifies the model using three distinguished
types of missingness patterns and assigns each of them with a specific
indicator value: 1) \(\boldsymbol
r_{ie}=1\) for the complete cases, i.e. \(m_{ije}=0\) for any \(j\); 2) \(\boldsymbol r_{i2}=2\) for intermittent
missingness, i.e. \(m_{ije}=1\) and
\(m_{ile}=0\) for at least some time
\(l>j\); 3) \(\boldsymbol r_{ie}=3\) for dropout,
i.e. \(m_{ije}=1\) and \(m_{ile}=0\) for only \(l<j\).
This tutorial shows how it to specify longitudinal models using the
dedicated function lmdm in the package, while also showing
the available customisation options to allow a flexible modelling
specification. Throughout, we will use the built-in dataset called
PBS as a toy example, which is directly available when
installing and loading missingHE in your R
workspace. See Introduction to
missingHE for an introductory tutorial of each type of missingness
model in missingHE.
If you would like to have more information on the package, or would like to point out potential issues in the current version, feel free to contact the maintainer at [email protected]. Suggestions on how to improve the package are also very welcome. A development version of the package is also available from the author’s main GitHub repository at https://github.com/AnGabrio/missingHE.
Trial-based health economic disaggregated data usually consist in the longitudinal cost-effectiveness pair of variables \((e_{ijt},c_{ijt})\), collected on the \(i\)-th patient at time \(j\) assigned to the \(t\)-th treatment in the study, for \(i=1,\ldots,N\) patients, \(j=1,\ldots,J\) time points and \(t=1,\ldots,T\) treatment options. In many jurisdictions the recommended effectiveness variable \(e_{ijt}\) is a Health-Related Quality of Life (HRQoL) measure, typically in the form of utility scores computed from the patients’ answers to generic health-related quality of life questionnaires (e.g. EQ-5D) collected at given time intervals in the study. The cost variable \(c_{ijt}\) denotes the sum of all relevant patient-level costs (e.g. direct/indirect medical costs, productivity losses, etc.), collected from the patients’ answers to different types of healthcare-resource utilisation questionnaires or medical records at different times in the study.
In missingHE, the observed variables \((e_{ijt},c_{ijt})\) are modelled using a
joint bivariate probability distribution with density \(p(e_{ijt},c_{ijt}\mid \boldsymbol
\theta_{jt})\), as a function of a vector of relevant parameters
\(\boldsymbol \theta_{jt}\), and
implemented using a series of conditional model factorisations. In
particular, at \(j>1\), the model
can be represented as:
\[\begin{equation} p(e_{ijt},c_{ijt} \mid \boldsymbol \theta_{jt}) \quad = \quad p(e_{ijt} \mid e_{ij-1t},c_{ij-1t}, \boldsymbol \theta_{ejt})p(c_{ijt} \mid c_{ij-1t}, e_{ij-1t}, e_{ijt}, \boldsymbol \theta_{cjt}), (\#eq:biv) \end{equation}\]
where \(p(e_{ijt} \mid e_{ij-1t},c_{ij-1t}, \boldsymbol \theta_{ejt})\) denotes the conditional effectiveness distribution at time \(j\) given past responses (\(e_{ij-1t},c_{ij-1t}\)) and \(p(c_{ijt} \mid c_{ij-1t}, e_{ij-1t}, e_{ijt}, \boldsymbol \theta_{cjt})\) the conditional cost distribution at \(j\) given current (\(e_{ijt}\)) and past responses (\(e_{ij-1t},c_{ij-1t}\)), with \(\theta_{ejt}\) and \(\theta_{cjt}\) being the treatment- and time-specific set of parameters indexing the effectiveness and cost distributions, respectively. To simplify notation, we now drop the treatment index \(t\) from model parameters.
Both outcome densities in Equation @ref(eq:biv) are specified using some parametric forms for the probability distributions to describe the underlying uncertainty in the cost-effectiveness data. The set of model parameters can be generally represented as \(\boldsymbol \theta_j=(\boldsymbol \phi_j(x),\boldsymbol \psi_j(x))\), where: \(\boldsymbol \phi_j(x)=(\phi_{je}(x),\phi_{jc}(x))\) denotes the set of location parameters (e.g. mean) and \(\boldsymbol \psi_j(x)=(\psi_{je}(x),\psi_{jc}(x))\) the set of ancillary parameters (e.g. variance) indexing the outcome distributions, both depending on some vector of potential baseline covariates \(x\) (e.g. age, sex, etc…). While it is possible for both \(\boldsymbol \phi_j\) and \(\boldsymbol \psi_j\) to explicitly depend on \(x\), the model specification is usually simplified assuming that these only affect the location parameters through a generalised linear formulation. For example, the location parameter of the effectiveness model can be specified as:
\[\begin{equation} g_e(\phi_{ije}) \quad = \quad \alpha_{j0} + \sum_{k=1}^K \alpha_{jk}x_{ik} + \alpha_{je} e_{ij-1} + \alpha_{jc} c_{ij-1} \,\,[+ \ldots], (\#eq:glme) \end{equation}\]
and the location parameter of the cost model as:
\[\begin{equation} g_c(\phi_{ijc}) \quad = \quad \beta_{j0} + \sum_{k=1}^K \beta_{jk}x_{ik} + \beta_{jf} e_{ij} + \beta_{je} e_{ij-1} + \beta_{jc} c_{ij-1} , \,\,[+ \ldots]. (\#eq:glmc) \end{equation}\]
where the regression coefficients \((\alpha_{je},\alpha_{jc})\) and \((\beta_{jf}, \beta_{je},\beta_{jc})\) are
included to capture the possible time dependence between \(c_{ij}\) and \(e_{ij}\) at the location level. Note that
missingHE, through the argument time_dep
inside the lmdm function, allows the user to choose among
three alternative specifications of Equation @ref(eq:glme) and Equation
@ref(eq:glmc) based on different assumptions about the response time
dependence: 1) "none" requires no time dependence by
setting \(\alpha_{je}=\alpha_{jc}=\beta_{je}=\beta_{jc}=\beta_{jf}=0\);
2) "biv" requires dependence only between outcomes at the
same time by setting \(\alpha_{je}=\alpha_{jc}=\beta_{je}=\beta_{jc}=0\);l
3) "AR1" requires an autoregressive of order one structure
by not imposing any further restrictions on the regression
parameters.
Covariates may be included into the model at different scales using
the link functions \(g_e(\cdot)\) and
\(g_c(\cdot)\), being either identity,
logarithmic or logit, depending on the type of outcome and distribution
specified. Note that missingHE expands any categorical
covariates to a set of dummy variables. In addition, Equation
@ref(eq:glme) and Equation @ref(eq:glmc) can be extended, as indicated
by the term \([+ \ldots]\), to include
additional terms into the model, e.g. clustering effects to account for
the possible multilevel structure of the data. The objective of the
statistical analysis is the estimation of population average
time-specific effectiveness and cost parameters \(\boldsymbol \mu=(\mu_{je},\mu_{jc})\) in
each treatment group and to quantify the (posterior) uncertainty around
them. Estimates of these parameters can be obtained from Equation
@ref(eq:glme) and Equation @ref(eq:glmc) either from their respective
regression parameters after mean centring all covariates or through
repeatedly sampling from the predictive distribution of the outcomes
using Monte Carlo integration. Finally, estimates of aggregated mean
parameters computed over the entire study period are usually obtained
through linear combination of the time-specific average estimates. For
instance, in the case of QALYs, aggregated mean response estimates can
be typically derived by applying an Area Under the Curve (AUC)
approach:
\[\begin{equation} \mu_e = \sum_{j=1}^J \mu_{je} \times w_{je}, (\#eq:mue) \end{equation}\]
where \(w_{je}=\frac{\gamma_j}{2}\) denotes the weight used in the AUC calculation, with \(\gamma_j\) being the percentage of the time unit (often \(1\) year) covered between the time interval \(j-1\) and \(j\). As an example, if responses were collected at equally-spaced time intervals, say every \(6\) months, over a one-year study period, then \(w_{je}=\frac{\gamma_j}{2}=\frac{6/12}{2}=0.25\) for any \(j\). In the case of Total costs, aggregated mean response estimates can be typically derived as:
\[\begin{equation} \mu_c = \sum_{j=1}^J \mu_{jc} \times w_{jc}, (\#eq:muc) \end{equation}\]
where setting \(w_{jc}=1\) would
correspond to a standard unweighted sum. Note that, after
missingHE derives estimates for \((\mu_{je},\mu_{jc})\) from Equation
@ref(eq:glme) and Equation @ref(eq:glmc), it automatically computes the
aggregated estimates \((\mu_e,\mu_c)\)
using the approaches described in Equation @ref(eq:mue) and Equation
@ref(eq:muc) assuming default values of \(w_{je}=0.5\) and \(w_{jc}=1\) for all \(j\). However, the function
lmdm allows the user to modify these values through the
optional arguments w_e and w_c, which can be
provided either as single values (applied to each \(j\)) or as vectors of length \(J\). This level of customisation allows the
user to directly derive aggregated mean cost-effectiveness estimates and
use them in the computation of all CEA-specific output when using
lmdm.
Assuming that the location parameters are modelled using a linear
predictor form as in Equation @ref(eq:glme) and Equation @ref(eq:glmc),
priors on the corresponding sets of regression coefficients can be
specified as \(\boldsymbol
\alpha_j=(\alpha_{j0},\ldots,\alpha_{jK},\alpha_{je},\alpha_{jc})
\overset{\text{iid}}{\sim}
\text{Normal}(\mu_{j\alpha},\sigma_{j\alpha})\) and \(\boldsymbol
\beta=(\beta_{j0},\ldots,\beta_{jK},\beta_{jf},\beta_{je},\beta_{jc})
\overset{\text{iid}}{\sim}
\text{Normal}(\mu_{j\beta},\sigma_{j\beta})\), where the notation
\(\mu_{j\cdot}\) and \(\sigma_{j\cdot}\) denotes the time-specific
prior mean and standard deviation, respectively. By default,
missingHE assumes the “minimally informative” prior values
of \(\mu_{j\cdot}=0\) and \(\sigma_{j\cdot}=1000\) for all regression
coefficients. When genuine prior knowledge on specific parameters is
available, it is possible to modify the default priors and elicit the
corresponding information into the model. As for the ancillary
parameters, the choice of the prior values depends on the form of the
probability distribution selected according to the modelling choices
available in missingHE.
missingHE specifies the model for the aggregated
missingness patterns (\(\boldsymbol r_{ie},
\boldsymbol r_{ic}\)) through a selection model approach, where
the pattern-specific probabilities \((\pi^{r}_{ie},\pi^{r}_{ic})\) are estimated
using a multinomial (log) regression, conditional on possibly
fully-observed baseline covariates \(X_i\) and partially-observed outcomes
(\(e_{ij},c_{ij}\)). For example, the
missingness effectiveness model is specified as
where the upperscript \(r\)
indicates the pattern category (1=completers, 2=intermittent,
3=dropout), \(\boldsymbol r_{ie}\) is a
vector with entries set to one if \(m^r_{ije}=1\) and zero otherwise, \(\boldsymbol \gamma^r_j = (\gamma^r_{j0e}, \ldots,
\gamma^r_{jKe})\) is the set of time- and pattern-specific
regression coefficients, while \(\delta^r_{je}\) is the time- and
pattern-specific MNAR parameters. Additional terms, e.g. clustering
effects, may also be included in the model as indicated by the term
\([+ \ldots]\). Note that an equivalent
specification to Equation @ref(eq:lmdme) for the missingness model of
the longitudinal costs can be specified by missingHE
through the function lmdm.
According to Equation @ref(eq:lmdme), it is clear to see that alternative assumptions about the missingness mechanism can be encoded into the model based on the type of variables included. In particular, we have: a Missing Completely At Random (MCAR) assumption when all regression coefficients, except \(\gamma^r_{j0e}\), are zero; a Missing At Random (MAR) assumption when \(\delta^r_{je}=0\) and one or more coefficients among \((\gamma^r_{j1e}, \ldots, \gamma^r_{jKe})\) is/are not zero; a Missing Not At Random (MNAR) when \(\delta^r_{je} \neq 0\). Under MNAR, due to the missing values in \(e_{ij}\), the model can only be identified by imposing some restrictions. These are implicitly defined from a combination of parametric assumptions about \(p(e_{ij} \mid e_{ij-1}, c_{ij-1}, \boldsymbol \theta_j)\) and dependence structure assumed for \(p(\boldsymbol r_{ie} \mid e_{ij},\boldsymbol\eta_j)\). As a result, sensitivity analysis to assess the impact of alternative missingness assumptions on the inferences can be done by: varying the distributional assumption of \(p(e_{ij} \mid e_{ij-1}, c_{ij-1}, \boldsymbol \theta_j)\); varying the modelling structure in Equation @ref(eq:lmdme); and, within a Bayesian setting, varying the prior distribution on \(\delta^r_{je}\).
In the following, we use data from a real clinical study as a running
example to illustrate how a longitundal missingness model for
trial-based economic evaluations can be specified using the function
lmdm from missingHE. The PBS was a multicentre
randomised controlled trial conducted on people suffering from
intellectual disability and challenging behaviour. A total of \(244\) individuals were enrolled in the
trial, \(136\) in the Treatment As
Usual (TAU) and \(108\) in the
treatment of care plus the Positive Behavioural Support
intervention (PBS). For all individuals, key health economic outcomes
were collected via self-reported questionnaires (i.e. EQ-5D and CSRI) at
baseline, \(6\) and \(12\) months follow-ups.
After loading the package missingHE, for example by
typing library(missingHE), the PBS dataset
(PBS) is directly imported into the R
workspace and can be inspected using the command
> rbind(head(PBS), tail(PBS))
to show the first and last few rows:
#> id time e c age gender ethnicity living carer marital disability site trt
#> 1 1 0.17300001 9214.0 31 0 1 3 0 0 3 7 1
#> 2 1 0.85000002 2492.5 51 0 1 1 1 0 2 7 1
#> 3 1 -0.16599999 15283.0 53 1 1 1 1 0 1 5 1
#> 4 1 0.25800008 1858.0 30 0 1 3 0 0 2 5 1
#> 5 1 0.03800001 797.0 46 0 1 3 1 0 2 5 1
#> 6 1 1.00000000 358.0 44 0 0 3 1 0 1 5 1
#> 239 3 NA NA 33 1 1 2 0 0 3 4 1
#> 240 3 0.81500006 150.0 33 0 0 3 1 0 2 15 1
#> 241 3 0.45200008 1740.0 52 0 0 3 1 0 3 22 2
#> 242 3 0.88300002 105.0 32 0 1 3 1 0 3 23 1
#> 243 3 0.81500006 2372.0 50 0 0 1 1 0 3 2 2
#> 244 3 0.81500006 1630.0 27 1 0 2 0 0 2 2 2
The dataset consists of \(244\)
individuals grouped in two arms, where trt\(=1\) indicates the TAU and
trt\(=2\) indicates the
MenSS intervention, with id and time denoting
the participant identification number and measurement occasion,
respectively.
Key health economic variables are e and c,
which denote the HRQoL utility and cost longitudinal outcomes,
respectively. All baseline variables are fully-observed, whereas all
outcome variables are partially-observed in both treatment arms. There
is a total of 679 (93%) outcome observations, of which 371 in the TAU
and 308 in the PBS arm, while the remaining 53 observations are missing.
Figure @ref(fig:fig1e) and Figure @ref(fig:fig1c) show the histograms of
the observed distributions for the utility and cost variables in
PBS, separately by treatment arm and time point.
Histograms of the observed data distributions for the utilities in the TAU and PBS arm by time point.
Histograms of the observed data distributions for the costs in the TAU and PBS arm by time point.
missingHE relies on JAGS through the
dedicated function lmdm to implement the longitudinal
missingness approach while providing a series of customisation options.
In particular, key options include: choice of the probability
distribution for the effectiveness and cost variables, choice of the
regression model for each outcome, and type of missingness mechanism
assumed.
A set of pre-defined choices in terms of probability distributions,
with built-in parameterisations, for the effectiveness (e)
and/or cost (c) variables are available. The user may
select a given distributional choice by passing the corresponding
missingHE name to the arguments dist_e and
dist_c of the desired function. The outcome model \(p(e,c)\) requires a separate specification
of the effectiveness and cost model formula using the notation:
where e, c and trt indicate
the effectiveness, cost and treatment variable names stored in the data
set, while ... is a suitable form for the combination of
covariates to be used in each model. Dependence between the two models
can be specified by adding the term e as an additional
covariate in the cost model and by specifying the type of time
dependence structure assumed for the model through the argument
time_dep. Currently available choices for this argument
are: 1) "none" for complete independence, which also
requires e to not be included in model.cost;
2) "biv for bivariate dependence at each time, where
dependence between \(c_j\) and \(e_j\) is only allowed at the same time
\(j\); 3) "AR1 for order
one autoregressive structure as specified in Equation @ref(eq:glme) and
Equation @ref(eq:glmc). Note that the treatment indicator
trt must be included into both regression formulae and
should be coded as a factor variable in the data set, while the time
indicator time should not be included in the formulae and
should be coded as a numeric (integer) variable. The type of missingness
assumption for each outcome is specified through the argument
type, by setting type = "MAR" or
type = "MNAR" to select between a MAR or MNAR mechanism,
respectively.
As an example, a suitable call to lmdm is the
following.
fit.model <- lmdm(data = data, model.eff = model.eff,
model.cost = model.cost, dist_e = dist_e, dist_c = dist_c,
type = type, time_dep = time_dep, ...)where data is a dataframe containing the data,
model.eff and model.cost are the formulae for
the effectiveness and cost regression models, dist_e,
dist_c, type and time_dep are
character values indicating the chosen outcome distributions, type of
missingness mechanism and time dependence, while ...
indicates additional optional arguments. In practice, unless there is a
clear understanding of the implications of any change to the default
model structure, the user is not required to modify the default values
of these optional arguments. The only exception is related to: number of
MCMC chains (n.chains), number of iterations
(n.iter) and values of the parameters of the prior
distributions (prior). As an example, consider the default
priors for the linear predictor coefficients in the effectiveness and
cost model, namely \(\boldsymbol \beta_j
\overset{\text{iid}}{\sim}\text{Normal}(\mu_{j\beta},\sigma_{j\beta})\)
and \(\boldsymbol \alpha_j
\overset{\text{iid}}{\sim}\text{Normal}(\mu_{j\alpha},\sigma_{j\alpha})\).
Suppose the user wants to select different prior mean and standard
deviation values for these parameters. This can be done by typing
which instructs missingHE to separately assign normal
distributions to each regression coefficient in the models with prior
mean of \(0\) and precision (inverse of
variance) of \(0.01\). The list
myprior can be then passed to the optional
prior argument in the chosen model fitting function to
implement the model using the updated priors.
fit.model <- lmdm(data = data, model.eff = model.eff,
model.cost = model.cost, dist_e = dist_e, dist_c = dist_c,
type = type, time_dep = time_dep, prior = myprior)Note that the list and its elements containing the custom prior
specification need to be named using the missingHE accepted
terminology for parameter and distribution names. If the option
save_model is set to TRUE, the
JAGS model template is saved in the current working
directory and can be used to further modify the model structure with a
higher degree of flexibility. This, however, will require a new
compilation of the model and, at present, the new model has to be run
using dedicated commands, such as rjags from the
R2jags package.
To ease presentation, we consider the following common model
specification: e and c are both normally
distributed with no covariates included in their models; a MAR mechanism
is assumed for both outcomes, and a bivariate structure is assumed for
the time dependence. For demonstrative purposes and to ease presentation
of results, in this tutorial we only consider models fitted under MAR.
See Fitting MNAR models
in missingHE for an overview about how informative missingness
models can be fitted in missingHE.
The model is implemented using the function lmdm using
the following command.
> lmdm1_mar <- lmdm(data = PBS, dist_e = "norm", dist_c = "norm",
+ model.eff = e ~ trt, model.cost = c ~ trt + e,
+ model.me = me ~ 1, model.mc = mc ~ 1, time_dep = "biv",
+ type = "MAR", n.iter = 1000, ref = 2)
The arguments model.me and model.mc denote
the missingness mechanism as in Equation @ref(eq:lmdme) for both
outcomes. Since no covariates are included in model.me and
model.mc, the utility and cost mechanisms are assumed MAR
(type="MAR") conditional on the observed outcome values.
The argument ref = 2 is used to specify which study arm
should be used as the reference group in the comparison,
i.e. incremental results are reported here as trt=2
("PBS") vs trt=1 ("TAU").
Prior to inspecting the posterior results, it is important to assess
model performance and convergence of the MCMC algorithm.
missingHE provides three functions to compute and implement
different types of popular Bayesian model assessment measures and plots,
namely diagnostic, ppc and
pic.
The function diagnostic provides a range of standard
diagnostic tools and plots for assessing convergence based on the
posterior distribution of a Bayesian model fitted in
missingHE. Consider the output of the model stored in the
object lmdm1_mar; we may use the command
> diagnostic(x = lmdm1_mar, type = "denplot", param = "beta.f")
to produce “density plots” for the regression parameters associated with the effectiveness variable within the cost models. In this case, as displayed in Figure @ref(fig:fig2), no major issues in convergence are identified for the parameters monitored at each time point.
Checking convergence using the diagnostic function for a model fitted in missingHE, for example through inspection of the density plots.
With the exception of the argument x, which must be an
object of class "missingHE", all other arguments can be
used to customise the output of the function. A full description of the
choices for each argument, including the available types of diagnostic
tools and families of parameters, is provided in the help file of the
package. Note that the output produced by diagnostic is
effectively a ggplot2 object, and can therefore be
manipulated by the user using functions from packages which allow to
customise such objects.
The function ppc provides a range of graphical posterior
predictive checks for assessing the fit of a Bayesian model implemented
in missingHE. Consider the output of the model fitted in
Section @ref(lmdm-marmodel) and stored in the object
lmdm1_mar; we may use the command
> ppc(x = lmdm1_mar, type = "histogram", outcome = "effects",
+ ndisplay = 2, time.plot = 3, trt = "2")
to plot the densities of ndisplay\(=5\) replications of the effectiveness data
(light blue lines) with respect to the density of the observed data
(dark blue line) in the PBS (trt=2) arm. The argument
time.plot indicates for which times the comparison between
observed and replicated data should be plotted, here the third time
point is selected (default\(=1\)).
Figure @ref(fig:fig3) shows some discrepancies between the replicated
and observed data histograms, likely due to the relatively poor fit of
the current model (i.e. the normality assumption of e) to
the observed data in both arms.
Posterior predictive graphical checks using the ppc function for a model fitted in missingHE, for example through inspection of histograms.
A full description of the choices for each argument in
ppc, including the available types of posterior predictive
checks, is provided in the package help file. Note that the arguments in
ppc may also be used to generate graphs for any combination
of treatment arms and time points available in the data set: for
example, graphs corresponding to those shown in Figure @ref(fig:fig3)
for only the PBS arm at all time points may be requested by setting
trt=2 and time.plot="all".
The function pic allows to assess the relative
predictive accuracy of models fitted in missingHE by
computing alternative Bayesian information criteria. As an example,
consider again the model results stored in the object
lmdm1_mar; we may use the command
> pic_lmdm1_mar <- pic(x = lmdm1_mar, criterion = "looic", cases = "cc")
to return a named list containing a series of components used in the
calculation of the selected information criterion. In this case,
leave-one-out information criterion (LOOIC) was selected as specified in
criterion = "looic" based only on the complete cases in the
data set (cases = "cc"), and its value may be inspected by
typing:
> pic_lmdm1_mar$looic
#> [1] 14260.39
Note that all information criteria in missingHE can only
be used as relative measures of fit to compare nested models, with lower
values typically indicating better fit. In addition, predictive
information criteria are ideally evaluated based on the observed data
alone (or after integrating out missingness under MAR). As a result,
computing these measures based on the complete cases provides a “safe”
choice when comparing the fit of alternative models. pic
allows alternative choices for the argument cases, whose
appropriateness should be considered by the user based on the context at
hand (e.g. amount and type of missing data). Current choices are:
complete cases (cc), available cases for the effectiveness
(ac_e), available cases for the costs ("ac_c")
or all cases (all).
Finally, pic allows to compute information criteria for
multiple missingHE objects by simply replacing the argument
x with a list of models generated using the functions in
the package. For example, suppose an additional longitudinal model was
fitted using lmdm and its output stored in an object named
lmdm2_mar, which additionally includes some covariates into
the effectiveness and cost models. Then, the desired predictive
information criteria may be computed for both models using
pic by typing:
> pic_compare_mar <- pic(x = list(lmdm1_mar, lmdm2_mar),
+ criterion = "looic", cases = "cc")
A full description of the different types of output associated and input choices is provided in the package help file.
Summary statistics computed based on posterior samples of these
parameters, directly generated from JAGS, may be accessed
using the print function by typing
> print(lmdm1_mar, display = "fixed")
which returns the following output
#> mean sd 2.5% 50% 97.5% Rhat n.eff
#> alpha[1,1] 0.4774668421 0.03380072292 0.4110906197 0.4770403924 0.5465300005 1.00 500
#> alpha[2,1] 0.0823233443 0.04819621634 -0.0027014364 0.0826185540 0.1767845944 1.01 620
#> alpha[1,2] 0.4972297353 0.03180654387 0.4371215472 0.4973842840 0.5580172326 1.00 1000
#> alpha[2,2] 0.1407143544 0.04704184630 0.0512524414 0.1407080422 0.2336346719 1.00 1000
#> alpha[1,3] 0.4815649635 0.02839279600 0.4284286963 0.4808657016 0.5396731139 1.00 1000
#> alpha[2,3] 0.1340213198 0.04291335036 0.0496195939 0.1347396022 0.2153194104 1.00 1000
#> beta[1,1] 55.8801232096 30.93797510432 -3.5083567248 55.1339194620 118.9292774609 1.00 410
#> beta[2,1] 31.8525165271 32.79161522491 -34.6641161004 32.0659359662 94.7879942718 1.00 1000
#> beta[1,2] 54.1026243962 30.82668600636 -3.9985523628 53.0629597418 116.0963264010 1.01 250
#> beta[2,2] 32.4573961314 32.43437566851 -31.2618949904 32.7979283996 95.9260302043 1.00 1000
#> beta[1,3] 33.5200670968 32.96955496410 -28.9057654214 33.4589265091 98.3417450972 1.00 470
#> beta[2,3] 20.9058572513 31.36909924259 -42.1009544941 21.3479370921 82.5852314087 1.00 890
#> beta_f[1] -5.7891325531 31.49652565848 -65.4214939266 -5.0856971717 57.3903603120 1.00 1000
#> beta_f[2] -4.8216195738 31.76781481050 -66.8637831107 -3.7259816613 58.4801359963 1.00 1000
#> beta_f[3] -1.7033256676 32.18699504673 -63.9998491558 -1.0847655263 62.2647961685 1.00 1000
#> gamma_c[1,1] 0.3892001375 4.80566027292 -7.0289400317 -0.0842289872 8.1938062478 5.44 2
#> gamma_c[2,1] 1.7357543820 3.05687857590 -4.7223788144 1.9558154909 7.8444565221 2.05 4
#> gamma_c[3,1] 6.7842290179 1.68799125337 3.7667998222 6.6789926021 10.1135695099 2.08 4
#> gamma_c[1,2] -3.5281665679 4.85820548964 -11.1930217734 -3.9439654599 4.2925042251 5.33 2
#> gamma_c[2,2] -2.1982841776 3.05567186294 -8.6530217312 -1.9080739798 3.7906949663 2.02 4
#> gamma_c[3,2] 2.8523286132 1.71421721196 -0.2915445735 2.8401813299 6.1365560590 2.01 4
#> gamma_c[1,3] -2.8024048971 4.79671556271 -10.2581423325 -3.1480443413 4.9508789234 5.41 2
#> gamma_c[2,3] -1.4228341578 3.07602922063 -7.9847199576 -1.2199940209 4.7386811113 2.07 4
#> gamma_c[3,3] 3.6047612796 1.70267111353 0.5149599796 3.5054274160 6.8232404139 2.11 4
#> gamma_e[1,1] 0.8150196578 2.16972513038 -2.6427374326 0.8276232820 4.2511462979 4.37 2
#> gamma_e[2,1] 2.4227408196 5.31335071885 -6.4992830836 2.7930650798 9.7891078125 6.02 2
#> gamma_e[3,1] 0.6300398583 1.54766408810 -1.9859534233 0.4148844451 3.7290779723 1.19 21
#> gamma_e[1,2] -1.3460382381 2.17608676088 -4.8331382883 -1.3733031833 2.1643234720 4.35 2
#> gamma_e[2,2] 0.2587079759 5.31727278384 -8.5835569429 0.5532729047 7.6054500782 6.02 2
#> gamma_e[3,2] -1.5354436976 1.54703145428 -4.1513881918 -1.7482063226 1.5639189690 1.19 20
#> gamma_e[1,3] -1.7517462037 2.18515791352 -5.1906234911 -1.7476365488 1.7490969828 4.36 2
#> gamma_e[2,3] -0.1336904469 5.30205292333 -9.0241878338 0.3343639043 7.1452839135 6.02 2
#> gamma_e[3,3] -1.9259072805 1.55474682498 -4.6368336989 -2.1480462627 1.1660792849 1.18 21
#> p_c[1,1] 321.9998105963 1187.91814385731 0.0008858712 0.9456631638 3618.9688442115 5.44 2
#> p_c[2,1] 223.1159658911 752.53977783569 0.0088939972 7.0696830786 2551.5661106564 2.05 4
#> p_c[3,1] 3695.4605467947 10003.16181877702 43.2418123845 795.5228446557 24678.8945006288 2.12 4
#> p_c[1,2] 6.9718580242 26.67014461876 0.0000137700 0.0193997403 73.1532498774 5.33 2
#> p_c[2,2] 4.0933919221 13.78750579011 0.0001745985 0.1483658799 44.2871694509 2.02 4
#> p_c[3,2] 76.6826372324 235.52573440843 0.7471278844 17.1188759260 462.4582472510 2.01 4
#> p_c[1,3] 12.8281203862 44.92816670619 0.0000350708 0.0430168983 141.3000176937 5.41 2
#> p_c[2,3] 9.8802803557 32.74618426726 0.0003406282 0.2952340097 114.2833932172 2.07 4
#> p_c[3,3] 153.8114143843 422.26466292816 1.6735991703 33.2957079634 918.9884049152 2.11 4
#> p_e[1,1] 12.3712285224 20.95564815264 0.0711666058 2.2878800169 70.1858386627 4.37 2
#> p_e[2,1] 2234.4101245845 4737.51187633998 0.0015045189 19.2790190764 17838.3846678721 6.02 2
#> p_e[3,1] 5.9898856194 10.92409792715 0.1372496947 1.5141958146 41.6419673420 1.19 21
#> p_e[1,2] 1.4546690222 2.56525235470 0.0079615531 0.2532690331 8.7089487981 4.35 2
#> p_e[2,2] 257.0214869707 544.66726939203 0.0001871613 2.1045256915 2009.1554876828 6.02 2
#> p_e[3,2] 0.6920613605 1.27838632632 0.0157425488 0.1740860507 4.7775198572 1.19 20
#> p_e[1,3] 0.9596366297 1.61927945156 0.0055685339 0.1741858590 5.7494085202 4.36 2
#> p_e[2,3] 173.1343685322 376.16944150669 0.0001204620 1.5596856287 1268.1170076141 6.02 2
#> p_e[3,3] 0.4649941874 0.83831325783 0.0096883253 0.1167122142 3.2093893698 1.18 21
#> s_c[1] 3032.2358408369 151.61139922695 2751.0897921107 3030.3756706778 3348.3245175302 1.09 23
#> s_c[2] 2900.2780430513 122.23800864867 2670.6775442697 2892.2381888697 3139.2258470041 1.03 1000
#> s_c[3] 3817.2790502038 191.70221737580 3472.1514922716 3798.4022265394 4203.1241447786 1.11 21
#> s_e[1] 0.3795054371 0.01752650093 0.3491170430 0.3786510285 0.4149049768 1.00 1000
#> s_e[2] 0.3485407089 0.01707298529 0.3168235384 0.3477132747 0.3834513214 1.00 1000
#> s_e[3] 0.3271617017 0.01513705029 0.2992196969 0.3264994261 0.3574091839 1.00 1000
#> tau_c[1] 0.0000001096 0.00000001095 0.0000000892 0.0000001089 0.0000001321 1.09 23
#> tau_c[2] 0.0000001195 0.00000001007 0.0000001015 0.0000001195 0.0000001402 1.03 1000
#> tau_c[3] 0.0000000691 0.00000000686 0.0000000566 0.0000000693 0.0000000829 1.11 21
#> tau_e[1] 6.9872541162 0.63805469384 5.8090196013 6.9746391440 8.2046105584 1.00 1000
#> tau_e[2] 8.2907499199 0.80860138077 6.8011057004 8.2709893443 9.9624264864 1.00 1000
#> tau_e[3] 9.4026316121 0.86805169537 7.8283202978 9.3806980431 11.1691376455 1.00 1000
#> tmu_c[1] 69.9787780658 33.70257585769 8.3082827074 68.8925749738 137.2325626584 1.00 510
#> tmu_c[2] 68.4690128478 34.44991287038 3.7765987194 68.0787671060 138.4154810449 1.01 240
#> tmu_c[3] 42.7734793228 35.99748880516 -26.3739312365 42.6236504362 115.1700218174 1.00 330
#> tmu_e[1] 0.5139050437 0.02542791230 0.4648940689 0.5139139375 0.5636534298 1.00 1000
#> tmu_e[2] 0.5595131381 0.02333281507 0.5126808390 0.5594694086 0.6055472449 1.00 1000
#> tmu_e[3] 0.5408858755 0.02167806546 0.5001998168 0.5400838666 0.5837127184 1.00 1000
missingHE standardises the format of the output to
report summary quantities related to key model parameters, by default
the mean effectiveness and costs at each time point across treatment
arms on their natural scale. Estimates of posterior mean
(mean), standard deviation (sd) and \(95\%\) credible interval boundaries
(2.5% and 95.5%) can be used to summarise the
posterior distribution of the mean parameters. In addition, the MCMC
diagnostics Rhat and n.eff, corresponding to
the potential scale reduction factor and effective sample size numeric
diagnostics, are provided and can be used to assess convergence for
these parameters.
Summary statistics for the regression coefficients on the scale
defined by the linear predictor of the effectiveness and cost models
\((\boldsymbol \alpha, \boldsymbol
\beta)\), as defined in Section @ref(sec-frame), may also be
displayed using the coef function by typing
> coef(lmdm1_mar)
which returns the following output
#> $Effects
#> Mean SD QL QU
#> (Intercept).time.1 0.477 0.034 0.411 0.547
#> trt2.time.1 0.082 0.048 -0.003 0.177
#> (Intercept).time.2 0.497 0.032 0.437 0.558
#> trt2.time.2 0.141 0.047 0.051 0.234
#> (Intercept).time.3 0.482 0.028 0.428 0.540
#> trt2.time.3 0.134 0.043 0.050 0.215
#>
#> $Costs
#> Mean SD QL QU
#> (Intercept).time.1 55.880 30.938 -3.508 118.929
#> trt2.time.1 31.853 32.792 -34.664 94.788
#> (Intercept).time.2 54.103 30.827 -3.999 116.096
#> trt2.time.2 32.457 32.434 -31.262 95.926
#> (Intercept).time.3 33.520 32.970 -28.906 98.342
#> trt2.time.3 20.906 31.369 -42.101 82.585
#> e.time.1 -5.789 31.497 -65.421 57.390
#> e.time.2 -4.822 31.768 -66.864 58.480
#> e.time.3 -1.703 32.187 -64.000 62.265
The regression output is separately reported by outcome. For the
above output, the following parameter estimates are reported at each
time point in the model: the intercepts \(\alpha_{j0}=\)(Intercept) in
the effectiveness model; the intercepts \(\beta_{j0}=\)(Intercept) and
effectiveness coefficients \(\beta_{jf}=\)e in the cost
model.
For each regression parameter, estimates are summarised using
posterior means (mean), standard deviations
(sd) and credible interval boundaries (lower
and upper), the latter being calculated assuming by default
a credible level of \(95\%\).
The function plot allows to visually inspect how missing
outcome data have been imputed from a model stored in an object of class
"missingHE", using ggplot2 as the graphical
engine. For example, the command
> plot(lmdm1_mar, class = "boxplot", outcome = "costs",
> trt = "1", time.plot = 2)
displays boxplots of the observed (black dots and boxes) and imputed
(red dots and boxes) effectiveness variables in the "TAU"
(trt=1) arm at time \(2\)
based on the model results stored in lmdm1_mar, as shown in
Figure @ref(fig:figplot). Imputed values are represented in terms of
posterior mean values, and their uncertainty by means of corresponding
credible intervals. The arguments class="boxplot",
outcome="costs", trt="1" and
time.plot=2 instruct missingHE to display
boxplots for the cost outcome in the TAU arm at time \(2\).
Boxplots of observed and imputed effectiveness data in the TAU arm at time 2 under normal distributions using a longitudinal model.
Note that, similarly to diagnostic, all graphs produced
via plot can be manipulated using ggplot2
functions, e.g. to modify the graphics’ aesthetics, text and
disposition.
Incremental cost-effectiveness results between the reference arm
(trt\(=2\)) and each of
the other study arms (trt\(=1\)) can be obtained by using the
summary function and setting the argument
incremental to TRUE.
> summary(lmdm1_mar, incremental = TRUE)
Note that the health economic results displayed by
summary are based on aggregated mean effectiveness and cost
parameters (\(\mu_{e},\mu_{c}\)), which
are implicitly derived from the model estimates at each time (\(\mu_{je},\mu_{jc}\)) using Equation
@ref(eq:mue) and Equation @ref(eq:muc) and the weight quantities
(w_e and w_c) specified in
lmdm.
The incremental results produced by summary include:
the mean incremental effectiveness and cost parameters (\(\Delta_{e}=\mu_{e2}-\mu_{e1},\Delta_{c}=\mu_{c2}-\mu_{c1}\))
the incremental net monetary benefit (INMB) (\(\Delta_{e}\times k - \Delta_{c}\)), calculated at the given \(k\) value (willingness to pay threshold).
the estimated mean of the Incremental Cost-Effectiveness Ratio (ICER), which quantifies the cost per incremental unit of effectiveness, computed based on posterior samples as \(\text{ICER}=\frac{\Delta_{c}}{\Delta_{e}}\).
The above output compares the cost-effectiveness of the two arms in
the MenSS data set based on the results of the model stored
in the object lmdm1_mar. For example, in this case,
incremental results suggest that "PBS" is on average
associated with higher QALYs and Total costs with respect to
"TAU", leading to a positive estimated ICER.
A series of built-in functions are also available from the package
BCEA to visually display Probabilistic Sensitivity
Analysis (PSA) results derived from a model fitted in
missingHE. For example, the following commands may be
used
> BCEA::ceplane.plot(lmdm1_mar$cea, graph = "ggplot2")
> BCEA::ceac.plot(lmdm1_mar$cea, graph = "ggplot2")
to generate Figure @ref(fig:figplotcea1) and Figure
@ref(fig:figplotcea2), which combines cost-effectiveness plane and
acceptability curve plots obtained from the PSA results stored in
lmdm1_mar$cea.
Inspecting probabilistic sensitivity analysis using BCEA built-in functions, for example in terms of cost-effectiveness plane.
Inspecting probabilistic sensitivity analysis using BCEA built-in functions, for example in terms of cost-effectiveness acceptability curve.
In the cost-effectiveness plane, shown in Figure
@ref(fig:figplotcea1), all incremental estimates fall in the top-right
quadrant (where "PBS" is more expensive and more effective
than "TAU") and only a few of them are associated with ICER
estimates that are below \(k=25,000\)
(shaded area). In the cost-effectiveness acceptability curve, shown in
Figure @ref(fig:figplotcea2), a positive cost-effectiveness assessment
of "PBS" vs "TAU" is shown for only
willingness to pay threshold values \(k\) above \(40,000\).