Title: | Bayesian Survival Regression with Flexible Error and Random Effects Distributions |
---|---|
Description: | Contains Bayesian implementations of the Mixed-Effects Accelerated Failure Time (MEAFT) models for censored data. Those can be not only right-censored but also interval-censored, doubly-interval-censored or misclassified interval-censored. The methods implemented in the package have been published in Komárek and Lesaffre (2006, Stat. Modelling) <doi:10.1191/1471082X06st107oa>, Komárek, Lesaffre and Legrand (2007, Stat. in Medicine) <doi:10.1002/sim.3083>, Komárek and Lesaffre (2007, Stat. Sinica) <https://www3.stat.sinica.edu.tw/statistica/oldpdf/A17n27.pdf>, Komárek and Lesaffre (2008, JASA) <doi:10.1198/016214507000000563>, García-Zattera, Jara and Komárek (2016, Biometrics) <doi:10.1111/biom.12424>. |
Authors: | Arnošt Komárek [aut, cre] |
Maintainer: | Arnošt Komárek <[email protected]> |
License: | GPL (>= 2) |
Version: | 3.8 |
Built: | 2024-12-16 07:03:16 UTC |
Source: | CRAN |
A function to estimate a regression model with bivariate (possibly right-, left-, interval- or doubly-interval-censored) data. In the case of doubly interval censoring, different regression models can be specified for the onset and event times.
The error density of the regression model is specified as a mixture of Bayesian G-splines (normal densities with equidistant means and constant variance matrices). This function performs an MCMC sampling from the posterior distribution of unknown quantities.
For details, see Komárek (2006) and Komárek and Lesaffre (2006).
We explain first in more detail a model without doubly censoring.

Let T[i,1] and T[i,2], i = 1,...,N, be event times for the i-th cluster and the first and the second unit. The following regression model is assumed:

log(T[i,l]) = beta'x[i,l] + epsilon[i,l],    i = 1,...,N,  l = 1, 2,

where beta is an unknown regression parameter vector and x[i,l] is a vector of covariates. The bivariate error terms epsilon[i] = (epsilon[i,1], epsilon[i,2])', i = 1,...,N, are assumed to be i.i.d. with a bivariate density g(e[1], e[2]). This density is expressed as a mixture of Bayesian G-splines (normal densities with equidistant means and constant variance matrices). We distinguish two, theoretically equivalent, specifications.

Specification 1:

(epsilon[1], epsilon[2])' ~ sum[j1 = -K1,...,K1] sum[j2 = -K2,...,K2] w[j1,j2] N2(mu[j1,j2], diag(sigma[1]^2, sigma[2]^2)),

where sigma[1]^2, sigma[2]^2 are unknown basis variances and mu[j1,j2] = (mu[1,j1], mu[2,j2])' is an equidistant grid of knots symmetric around the unknown point (gamma[1], gamma[2])' and related to the unknown basis variances through the relationship

mu[1,j1] = gamma[1] + j1*delta[1]*sigma[1],    j1 = -K1,...,K1,
mu[2,j2] = gamma[2] + j2*delta[2]*sigma[2],    j2 = -K2,...,K2,

where delta[1], delta[2] are fixed constants, e.g. delta[1] = delta[2] = 2/3 (which has a justification of being close to cubic B-splines).

Specification 2:

(epsilon[1], epsilon[2])' = (alpha[1], alpha[2])' + S (V[1], V[2])',

where (alpha[1], alpha[2])' is an unknown intercept term and S = diag(tau[1], tau[2]), i.e. tau[1] > 0, tau[2] > 0 are unknown scale parameters. (V[1], V[2])' is then a standardized bivariate error term which is distributed according to the bivariate normal mixture, i.e.

(V[1], V[2])' ~ sum[j1 = -K1,...,K1] sum[j2 = -K2,...,K2] w[j1,j2] N2(mu[j1,j2], diag(sigma[1]^2, sigma[2]^2)),

where mu[j1,j2] = (mu[1,j1], mu[2,j2])' is an equidistant grid of fixed knots (means), usually symmetric about the fixed point (gamma[1], gamma[2])' = (0, 0)', and sigma[1]^2, sigma[2]^2 are fixed basis variances. Reasonable values for the numbers of grid points are K1 = K2 = 15 with the distance between the two knots equal to delta = 0.3 and for the basis variances sigma[1]^2 = sigma[2]^2 = 0.2^2.

Personally, I found Specification 2 performing better. In the paper Komárek and Lesaffre (2006) only Specification 2 is described.

The mixture weights w[j1,j2], j1 = -K1,...,K1, j2 = -K2,...,K2, are not estimated directly. To avoid the constraints 0 < w[j1,j2] < 1 and sum[j1] sum[j2] w[j1,j2] = 1, transformed weights a[j1,j2] related to the original weights by the logistic transformation

a[j1,j2] = log(w[j1,j2]/w[0,0]),   i.e.   w[j1,j2] = exp(a[j1,j2]) / sum[m1] sum[m2] exp(a[m1,m2]),

are estimated instead.

A Bayesian model is set up for all unknown parameters. For more details I refer to Komárek and Lesaffre (2006) and to Komárek (2006).
If there are doubly-censored data the model of the same type as above can be specified for both the onset time and the time-to-event.
bayesBisurvreg(formula, formula2, data = parent.frame(), na.action = na.fail, onlyX = FALSE, nsimul = list(niter = 10, nthin = 1, nburn = 0, nwrite = 10), prior, prior.beta, init = list(iter = 0), mcmc.par = list(type.update.a = "slice", k.overrelax.a = 1, k.overrelax.sigma = 1, k.overrelax.scale = 1), prior2, prior.beta2, init2, mcmc.par2 = list(type.update.a = "slice", k.overrelax.a = 1, k.overrelax.sigma = 1, k.overrelax.scale = 1), store = list(a = FALSE, a2 = FALSE, y = FALSE, y2 = FALSE, r = FALSE, r2 = FALSE), dir)
formula |
model formula for the regression. In the case of
doubly-censored data, this is the model formula for the onset
time. Data are assumed to be sorted according to subjects and within
subjects according to the types of the events that determine the
bivariate survival distribution, i.e. the response vector must be
The left-hand side of the formula must be an object created using
|
formula2 |
model formula for the regression of the time-to-event in
the case of doubly-censored data. Ignored otherwise. The same remark as
for |
data |
optional data frame in which to interpret the variables occurring in the formulas. |
na.action |
the user is discouraged from changing the default
value |
onlyX |
if |
nsimul |
a list giving the number of iterations of the MCMC and other parameters of the simulation.
|
prior |
a list specifying the prior distribution of the G-spline
defining the distribution of the error term in the regression model
given by |
prior.beta |
prior specification for the regression parameters,
in the case of doubly censored data for the regression parameters of
the onset time, i.e. it is related to the vector beta from the model given by formula. This should be a list with the following components:
It is recommended to run the function
bayesBisurvreg first with its argument |
init |
an optional list with initial values for the MCMC related
to the model given by
and
|
mcmc.par |
a list specifying how some of the G-spline parameters
related to
|
prior2 |
a list specifying the prior distribution of the G-spline
defining the distribution of the error term in the regression model
given by |
prior.beta2 |
prior specification for the regression parameters
of time-to-event in the case of doubly censored data (related to
|
init2 |
an optional list with initial values for the MCMC related
to the model given by |
mcmc.par2 |
a list specifying how some of the G-spline parameters
related to |
store |
a list of logical values specifying which chains that are not stored by default are to be stored. The list can have the following components.
|
dir |
a string that specifies a directory where all sampled values are to be stored. |
A list of class bayesBisurvreg
containing an information
concerning the initial values and prior choices.
Additionally, the following files with sampled values
are stored in a directory specified by dir
argument of this
function (some of them are created only on request, see store
parameter of this function).
Headers are written to all files created by default and to files asked
by the user via the argument store
. During the burn-in, only
every nsimul$nwrite
value is written. After the burn-in, all
sampled values are written in files created by default and to files
asked by the user via the argument store
. In the files for
which the corresponding store
component is FALSE
, every
nsimul$nwrite
value is written during the whole MCMC (this
might be useful to restart the MCMC from some specific point).
The following files are created:
one column labeled iteration
with
indices of MCMC iterations to which the stored sampled values
correspond.
columns labeled k, Mean.1, Mean.2, D.1.1, D.2.1, D.2.2, where

k = number of mixture components that had probability numerically higher than zero;

Mean.1 = E(epsilon[i,1]);

Mean.2 = E(epsilon[i,2]);

D.1.1 = var(epsilon[i,1]);

D.2.1 = cov(epsilon[i,1], epsilon[i,2]);

D.2.2 = var(epsilon[i,2]);

all related to the distribution of the error term from the model given by formula.
in the case of doubly-censored data, the same
structure as mixmoment.sim
, however related to the model
given by formula2
.
sampled mixture weights w[j1,j2] of mixture components that had probabilities numerically higher than zero. Related to the model given by formula.
in the case of doubly-censored data, the same
structure as mweight.sim
, however related to the model
given by formula2
.
indices (j1, j2) of mixture components that had probabilities numerically higher than zero. It corresponds to the weights in mweight.sim. Related to the model given by formula.
in the case of doubly-censored data, the same
structure as mmean.sim
, however related to the model
given by formula2
.
characteristics of the sampled G-spline (distribution of (epsilon[i,1], epsilon[i,2])') related to the model given by
formula
. This file together with mixmoment.sim
,
mweight.sim
and mmean.sim
can be used to reconstruct
the G-spline in each MCMC iteration.
The file has columns labeled gamma1
,
gamma2
, sigma1
, sigma2
, delta1
,
delta2
, intercept1
, intercept2
,
scale1
, scale2
. The meaning of the values in these
columns is the following:
gamma1 = the middle knot in the
first dimension. If ‘Specification’ is 2, this column
usually contains zeros;
gamma2 = the middle knot in the
second dimension. If ‘Specification’ is 2, this column
usually contains zeros;
sigma1 = basis standard deviation
of the G-spline in the first dimension. This column contains
a fixed value if ‘Specification’ is 2;
sigma2 = basis standard deviation
of the G-spline in the second dimension. This column contains
a fixed value if ‘Specification’ is 2;
delta1 = distance between the two knots of the G-spline in
the first dimension. This column contains
a fixed value if ‘Specification’ is 2;
delta2 = distance between the two knots of the G-spline in
the second dimension. This column contains a fixed value if
‘Specification’ is 2;
intercept1 = the intercept term of
the G-spline in the first dimension. If ‘Specification’ is 1, this column
usually contains zeros;
intercept2 = the intercept term of
the G-spline in the second dimension. If ‘Specification’ is 1, this column
usually contains zeros;
scale1 = the scale parameter of the
G-spline in the first dimension. If ‘Specification’ is 1, this column
usually contains ones;
scale2 = the scale parameter of the G-spline in the second dimension. If ‘Specification’ is 1, this column usually contains ones.
in the case of doubly-censored data, the same
structure as gspline.sim
, however related to the model
given by formula2
.
fully created only if store$a = TRUE. The file contains the transformed weights a[j1,j2] of all mixture components, i.e. also of components that had numerically zero probabilities. This file is related to the model given by formula.
fully created only if store$a2 =
TRUE
and in the case of doubly-censored data, the same
structure as mlogweight.sim
, however related to the model
given by formula2
.
fully created only if store$r = TRUE. The file contains the labels of the mixture components into which the residuals are intrinsically assigned. Instead of double indices (j1, j2), values from 1 to (2*K1 + 1)*(2*K2 + 1) are stored here. Function vecr2matr can be used to transform it back to double indices.
fully created only if store$r2 =
TRUE
and in the case of doubly-censored data, the same
structure as r.sim
, however related to the model
given by formula2
.
either one column labeled lambda
or two
columns labeled lambda1
and lambda2
. These are the values of the smoothing parameter(s) lambda (hyperparameters of the prior distribution of the transformed mixture weights a[j1,j2]). This file is related to the model given by formula.
in the case of doubly-censored data, the same
structure as lambda.sim
, however related to the model
given by formula2
.
sampled values of the regression parameter vector beta related to the model given by formula. The columns are labeled according to the colnames of the design matrix.
in the case of doubly-censored data, the same
structure as beta.sim
, however related to the model
given by formula2
.
fully created only if store$y = TRUE
. It
contains sampled (augmented) log-event times for all observations
in the data set.
fully created only if store$y2 =
TRUE
and in the case of doubly-censored data, the same
structure as Y.sim
, however related to the model
given by formula2
.
columns labeled loglik
, penalty
or penalty1
and
penalty2
, logprw
. This file is related to the model
given by formula
. The columns have the following meaning.
loglik = the log-likelihood of the (augmented) log-event times given the mixture component allocations, the regression parameters and the G-spline, where y[i,l] denotes the (augmented) (i,l)-th true log-event time. In other words, loglik is equal to the conditional log-density

sum[i=1,...,N] log{ p(y[i,1], y[i,2] | r[i], beta, G-spline) };
penalty1: If prior$neighbor.system
= "uniCAR"
:
the penalty term for the first dimension not multiplied by
lambda1
;
penalty2: If prior$neighbor.system
= "uniCAR"
:
the penalty term for the second dimension not multiplied by
lambda2
;
penalty: If prior$neighbor.system
is different from "uniCAR"
:
the penalty term not multiplied by lambda
;
logprw = sum[i=1,...,N] log{ p(r[i] | G-spline weights) }, where N[j1,j2] is the number of residuals assigned intrinsically to the (j1, j2)-th mixture component. In other words, logprw is equal to the conditional log-density of the mixture component allocations given the G-spline weights.
in the case of doubly-censored data, the same structure as logposter.sim, however related to the model given by formula2.
Arnošt Komárek [email protected]
Gilks, W. R. and Wild, P. (1992). Adaptive rejection sampling for Gibbs sampling. Applied Statistics, 41, 337 - 348.
Komárek, A. (2006). Accelerated Failure Time Models for Multivariate Interval-Censored Data with Flexible Distributional Assumptions. PhD. Thesis, Katholieke Universiteit Leuven, Faculteit Wetenschappen.
Komárek, A. and Lesaffre, E. (2006). Bayesian semi-parametric accelerated failure time model for paired doubly interval-censored data. Statistical Modelling, 6, 3 - 22.
Neal, R. M. (2003). Slice sampling (with Discussion). The Annals of Statistics, 31, 705 - 767.
## See the description of R commands for ## the population averaged AFT model ## with the Signal Tandmobiel data, ## analysis described in Komarek and Lesaffre (2006), ## ## R commands available in the documentation ## directory of this package as ## - see ex-tandmobPA.R and ## https://www2.karlin.mff.cuni.cz/~komarek/software/bayesSurv/ex-tandmobPA.pdf ##
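A minimal call sketch is given below for orientation only; the data frame twoUnitData, its variable names and the components supplied in the prior lists are illustrative assumptions, not values taken from this documentation.

## Hypothetical sketch of a bayesBisurvreg call; the data set, variable
## names and prior settings below are assumptions for illustration only.
library(bayesSurv)
library(survival)

fit <- bayesBisurvreg(
   formula = Surv(lower, upper, type = "interval2") ~ gender + fluoride,
   data    = twoUnitData,      ## assumed data frame, sorted by subject and unit
   nsimul  = list(niter = 10000, nthin = 5, nburn = 5000, nwrite = 500),
   prior   = list(specification = 2, K = c(15, 15)),                  ## component names assumed
   prior.beta = list(mean.prior = c(0, 0), var.prior = c(100, 100)),  ## component names assumed
   init    = list(iter = 0),
   dir     = "chainBisurv")    ## directory where the *.sim files are written

## The chains are stored as headed text files, e.g. the regression parameters:
beta <- read.table(file.path("chainBisurv", "beta.sim"), header = TRUE)
colMeans(beta)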
Function to summarize the results obtained using
bayessurvreg1
function.
Compute the conditional (given the number of mixture components) and unconditional estimate of the density function based on the values sampled using the reversible jumps MCMC (MCMC average evaluated in a grid of values).
Give also the values of each sampled density evaluated at that grid (returned as the attribute of the resulting object). Methods for printing and plotting are also provided.
bayesDensity(dir, stgrid, centgrid, grid, n.grid = 100, skip = 0, by = 1, last.iter, standard = TRUE, center = TRUE, unstandard = TRUE)
bayesDensity(dir, stgrid, centgrid, grid, n.grid = 100, skip = 0, by = 1, last.iter, standard = TRUE, center = TRUE, unstandard = TRUE)
dir |
directory where to search for files ‘mixmoment.sim’, ‘mweight.sim’, ‘mmean.sim’, ‘mvariance.sim’ with the McMC sample. |
stgrid |
grid of values at which the sampled standardized
densities are to be evaluated. If |
centgrid |
grid of values at which the sampled centered (but not
scaled) densities are to be evaluated. If |
grid |
grid of values at which the sampled densities are to be
evaluated. If |
n.grid |
the length of the grid if |
skip |
number of rows that should be skipped at the beginning of each *.sim file with the stored sample. |
by |
additional thinning of the sample. |
last.iter |
index of the last row from *.sim files that should be
used. If not specified then it is set to the maximum available
determined according to the file |
standard |
if |
center |
if |
unstandard |
if |
An object of class bayesDensity
is returned. This object is a
list and has potentially three components: standard
,
center
and
unstandard
. Each of these three components is a data.frame
with as many rows as number of grid points at which the density was
evaluated and with columns called ‘grid’, ‘unconditional’ and ‘k = 1’,
..., ‘k = k.max’ giving a predictive error density, either averaged over all sampled values of k (unconditional) or averaged over a specific number of mixture components.
Additionally, the object of class bayesDensity
has three
attributes:
sample.size |
a vector of length |
moments |
a |
k |
a |
There exist methods to print and plot objects of the class bayesDensity
.
Arnošt Komárek [email protected]
Komárek, A. (2006). Accelerated Failure Time Models for Multivariate Interval-Censored Data with Flexible Distributional Assumptions. PhD. Thesis, Katholieke Universiteit Leuven, Faculteit Wetenschappen.
Komárek, A. and Lesaffre, E. (2007). Bayesian accelerated failure time model for correlated interval-censored data with a normal mixture as an error distribution. Statistica Sinica, 17, 549–569.
## See the description of R commands for ## the models described in ## Komarek (2006), ## Komarek and Lesaffre (2007), ## ## R commands available ## in the documentation ## directory of this package ## - ex-cgd.R and ## https://www2.karlin.mff.cuni.cz/~komarek/software/bayesSurv/ex-cgd.pdf ## ## - ex-tandmobMixture.R and ## https://www2.karlin.mff.cuni.cz/~komarek/software/bayesSurv/ex-tandmobMixture.pdf ##
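A minimal usage sketch; the directory name, grid and thinning values are illustrative assumptions only.

## Hypothetical sketch: summarize the sampled error density from a
## bayessurvreg1 run stored in the (assumed) directory "chainSurvreg1".
dens <- bayesDensity(dir = "chainSurvreg1",
                     grid = seq(-3, 3, length.out = 100),
                     skip = 500, by = 2)
print(dens)
plot(dens)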
Compute the estimate of the density function based on the values sampled using the MCMC (MCMC average evaluated in a grid of values) in a model where density is specified as a Bayesian G-spline.
This function serves to summarize the MCMC chains related to the distributional parts
of the considered models obtained using the functions:
bayesHistogram
,
bayesBisurvreg
, bayessurvreg2
, bayessurvreg3
.
If asked, this function returns also the values of the G-spline evaluated in a grid at each iteration of MCMC.
bayesGspline(dir, extens="", extens.adjust="_b", grid1, grid2, skip = 0, by = 1, last.iter, nwrite, only.aver = TRUE, standard = FALSE, version = 0)
bayesGspline(dir, extens="", extens.adjust="_b", grid1, grid2, skip = 0, by = 1, last.iter, nwrite, only.aver = TRUE, standard = FALSE, version = 0)
dir |
directory where to search for files (‘mixmoment.sim’, ‘mweight.sim’, ‘mmean.sim’, ‘gspline.sim’) with the MCMC sample. |
||
extens |
an extension used to distinguish different sampled
G-splines if more G-splines were used in one simulation (e.g. with
doubly-censored data or in the model where both the error term and the
random intercept were defined as the G-splines). According to which
|
||
extens.adjust |
this argument is applicable for the situation when
the MCMC chains were created using the function
In that case the location of the error term and the random intercept
are separately not identifiable. Only the location of the sum
Argument The following values of
|
||
grid1 |
grid of values from the first dimension at which the sampled densities are to be evaluated. |
||
grid2 |
grid of values from the second dimension (if the G-spline
was bivariate) at which the sampled densities are to be
evaluated. This item is |
||
skip |
number of rows that should be skipped at the beginning of each *.sim file with the stored sample. |
||
by |
additional thinning of the sample. |
||
last.iter |
index of the last row from *.sim files that should be used. If not specified then it is set to the maximum available determined according to the file |
||
nwrite |
frequency with which the user is informed about the progress of computation (every |
||
only.aver |
|
||
standard |
|
||
version |
this argument indicates by which
|
An object of class bayesGspline
is returned. This object is a
list with components
grid
, average
for the univariate G-spline and
components grid1
, grid2
, average
for the bivariate G-spline.
grid |
this is a grid of values (vector) at which the McMC average of the G-spline was computed. |
||||||||||||||||||
average |
these are McMC averages of the G-spline (vector) evaluated in
|
||||||||||||||||||
grid1 |
this is a grid of values (vector) for the first dimension at which the McMC average of the G-spline was computed. |
||||||||||||||||||
grid2 |
this is a grid of values (vector) for the second dimension at which the McMC average of the G-spline was computed. |
||||||||||||||||||
average |
this is a matrix of McMC averages of the G-spline evaluated over the two-dimensional grid given by grid1 and grid2. |
There exists a method to plot objects of the class bayesGspline
.
Additionally, the object of class bayesGspline
has the following
attributes:
sample.size
the length of the McMC sample used to compute the McMC average.
sample
G-spline evaluated in a grid of values. This attribute
is present only if only.aver = FALSE
.
For a univariate G-spline this is a matrix with sample.size
columns and
length(grid1) rows.
For a bivariate G-spline this is a matrix
with sample.size
columns and
length(grid1)*length(grid2) rows.
Arnošt Komárek [email protected]
Komárek, A. (2006). Accelerated Failure Time Models for Multivariate Interval-Censored Data with Flexible Distributional Assumptions. PhD. Thesis, Katholieke Universiteit Leuven, Faculteit Wetenschappen.
Komárek, A. and Lesaffre, E. (2006). Bayesian semi-parametric accelerated failure time model for paired doubly interval-censored data. Statistical Modelling, 6, 3–22.
Komárek, A. and Lesaffre, E. (2008). Bayesian accelerated failure time model with multivariate doubly-interval-censored data and flexible distributional assumptions. Journal of the American Statistical Association, 103, 523–533.
Komárek, A., Lesaffre, E., and Legrand, C. (2007). Baseline and treatment effect heterogeneity for survival times between centers using a random effects accelerated failure time model with flexible error distribution. Statistics in Medicine, 26, 5457–5472.
## See the description of R commands for ## the models described in ## Komarek (2006), ## Komarek and Lesaffre (2006), ## Komarek and Lesaffre (2008), ## Komarek, Lesaffre, and Legrand (2007). ## ## R commands available ## in the documentation ## directory of this package ## - ex-tandmobPA.R and ## https://www2.karlin.mff.cuni.cz/~komarek/software/bayesSurv/ex-tandmobPA.pdf ## - ex-tandmobCS.R and ## https://www2.karlin.mff.cuni.cz/~komarek/software/bayesSurv/ex-tandmobCS.pdf ## - ex-eortc.R and ## https://www2.karlin.mff.cuni.cz/~komarek/software/bayesSurv/ex-eortc.pdf ##
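A minimal usage sketch; the directory name and the grid are illustrative assumptions only.

## Hypothetical sketch: posterior mean of the error G-spline from a model
## whose chains are stored in the (assumed) directory "chainSurvreg2".
gsp <- bayesGspline(dir = "chainSurvreg2", extens = "",
                    grid1 = seq(-4, 4, length.out = 150),
                    skip = 500, by = 2, only.aver = TRUE)
plot(gsp)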
A function to estimate a density of a uni- or bivariate (possibly censored) sample. The density is specified as a mixture of Bayesian G-splines (normal densities with equidistant means and equal variances). This function performs an MCMC sampling from the posterior distribution of unknown quantities in the density specification. Other method functions are available to visualize resulting density estimate.
This function served as a basis for further developed
bayesBisurvreg
, bayessurvreg2
and
bayessurvreg3
functions. However, in contrast to these
functions, bayesHistogram
does not allow for doubly censoring.
Bivariate case:

Let Y[i,l], i = 1,...,N, l = 1, 2, be observations for the i-th cluster and the first and the second unit (dimension). The bivariate observations Y[i] = (Y[i,1], Y[i,2])', i = 1,...,N, are assumed to be i.i.d. with a bivariate density g(y[1], y[2]). This density is expressed as a mixture of Bayesian G-splines (normal densities with equidistant means and constant variance matrices). We distinguish two, theoretically equivalent, specifications.

Specification 1:

(Y[1], Y[2])' ~ sum[j1 = -K1,...,K1] sum[j2 = -K2,...,K2] w[j1,j2] N2(mu[j1,j2], diag(sigma[1]^2, sigma[2]^2)),

where sigma[1]^2, sigma[2]^2 are unknown basis variances and mu[j1,j2] = (mu[1,j1], mu[2,j2])' is an equidistant grid of knots symmetric around the unknown point (gamma[1], gamma[2])' and related to the unknown basis variances through the relationship

mu[1,j1] = gamma[1] + j1*delta[1]*sigma[1],    j1 = -K1,...,K1,
mu[2,j2] = gamma[2] + j2*delta[2]*sigma[2],    j2 = -K2,...,K2,

where delta[1], delta[2] are fixed constants, e.g. delta[1] = delta[2] = 2/3 (which has a justification of being close to cubic B-splines).

Specification 2:

(Y[1], Y[2])' = (alpha[1], alpha[2])' + S (V[1], V[2])',

where (alpha[1], alpha[2])' is an unknown intercept term and S = diag(tau[1], tau[2]), i.e. tau[1] > 0, tau[2] > 0 are unknown scale parameters. (V[1], V[2])' is then a standardized observational vector which is distributed according to the bivariate normal mixture, i.e.

(V[1], V[2])' ~ sum[j1 = -K1,...,K1] sum[j2 = -K2,...,K2] w[j1,j2] N2(mu[j1,j2], diag(sigma[1]^2, sigma[2]^2)),

where mu[j1,j2] = (mu[1,j1], mu[2,j2])' is an equidistant grid of fixed knots (means), usually symmetric about the fixed point (gamma[1], gamma[2])' = (0, 0)', and sigma[1]^2, sigma[2]^2 are fixed basis variances. Reasonable values for the numbers of grid points are K1 = K2 = 15 with the distance between the two knots equal to delta = 0.3 and for the basis variances sigma[1]^2 = sigma[2]^2 = 0.2^2.

Univariate case:

It is a direct simplification of the bivariate case.
bayesHistogram(y1, y2, nsimul = list(niter = 10, nthin = 1, nburn = 0, nwrite = 10), prior, init = list(iter = 0), mcmc.par = list(type.update.a = "slice", k.overrelax.a = 1, k.overrelax.sigma = 1, k.overrelax.scale = 1), store = list(a = FALSE, y = FALSE, r = FALSE), dir)
y1 |
response for the first dimension in the form of a survival
object created using |
y2 |
response for the second dimension in the form of a survival
object created using |
nsimul |
a list giving the number of iterations of the MCMC and other parameters of the simulation.
|
prior |
a list that identifies prior hyperparameters and prior choices. See the paper Komárek and Lesaffre (2008) and the PhD. thesis Komárek (2006) for more details. Some prior parameters can be guessed by the function itself. If you
want to do so, set such parameters to
|
init |
a list of the initial values to start the McMC. Set to
|
mcmc.par |
a list specifying further details of the McMC simulation. There are default values implemented for all components of this list.
|
store |
a list of logical values specifying which chains that are not stored by default are to be stored. The list can have the following components.
|
dir |
a string that specifies a directory where all sampled values are to be stored. |
A list of class bayesHistogram
containing an information
concerning the initial values and prior choices.
Additionally, the following files with sampled values
are stored in a directory specified by dir
argument of this
function (some of them are created only on request, see store
parameter of this function).
Headers are written to all files created by default and to files asked
by the user via the argument store
. All
sampled values are written in files created by default and to files
asked by the user via the argument store
. In the files for
which the corresponding store
component is FALSE
, every
nsimul$nwrite
value is written during the whole MCMC (this
might be useful to restart the MCMC from some specific point).
The following files are created:
one column labeled iteration
with
indices of MCMC iterations to which the stored sampled values
correspond.
columns labeled k
, Mean.1
, Mean.2
,
D.1.1
, D.2.1
, D.2.2
in the bivariate case and
columns labeled k
, Mean.1
, D.1.1
in the
univariate case, where
k = number of mixture components that had probability numerically higher than zero;
Mean.1 = E(Y[i,1]);

Mean.2 = E(Y[i,2]);

D.1.1 = var(Y[i,1]);

D.2.1 = cov(Y[i,1], Y[i,2]);

D.2.2 = var(Y[i,2]).
sampled mixture weights w[j1,j2] of mixture components that had probabilities numerically higher than zero.
indices (j1, j2) of mixture components that had probabilities numerically higher than zero. It corresponds to the weights in mweight.sim.
characteristics of the sampled G-spline (distribution of (Y[i,1], Y[i,2])').
This file together with
mixmoment.sim
,
mweight.sim
and mmean.sim
can be used to reconstruct
the G-spline in each MCMC iteration.
The file has columns labeled gamma1
,
gamma2
, sigma1
, sigma2
, delta1
,
delta2
, intercept1
, intercept2
,
scale1
, scale2
. The meaning of the values in these
columns is the following:
gamma1 = the middle knot in the
first dimension. If ‘Specification’ is 2, this column
usually contains zeros;
gamma2 = the middle knot in the
second dimension. If ‘Specification’ is 2, this column
usually contains zeros;
sigma1 = basis standard deviation of the G-spline in the first dimension. This column contains a fixed value if ‘Specification’ is 2;

sigma2 = basis standard deviation of the G-spline in the second dimension. This column contains a fixed value if ‘Specification’ is 2;

delta1 = distance between the two knots of the G-spline in the first dimension. This column contains a fixed value if ‘Specification’ is 2;

delta2 = distance between the two knots of the G-spline in the second dimension. This column contains a fixed value if ‘Specification’ is 2;
intercept1 = the intercept term of
the G-spline in the first dimension. If ‘Specification’ is 1, this column
usually contains zeros;
intercept2 = the intercept term of
the G-spline in the second dimension. If ‘Specification’ is 1, this column
usually contains zeros;
scale1 = the scale parameter of the
G-spline in the first dimension. If ‘Specification’ is 1, this column
usually contains ones;
scale2 = the scale parameter of the G-spline in the second dimension. If ‘Specification’ is 1, this column usually contains ones.
fully created only if store$a = TRUE. The file contains the transformed weights a[j1,j2] of all mixture components, i.e. also of components that had numerically zero probabilities.
fully created only if store$r = TRUE. The file contains the labels of the mixture components into which the observations are intrinsically assigned. Instead of double indices (j1, j2), values from 1 to (2*K1 + 1)*(2*K2 + 1) are stored here. Function vecr2matr can be used to transform it back to double indices.
either one column labeled lambda
or two
columns labeled lambda1
and lambda2
. These are the values of the smoothing parameter(s) lambda (hyperparameters of the prior distribution of the transformed mixture weights a[j1,j2]).
fully created only if store$y = TRUE
. It
contains sampled (augmented) log-event times for all observations
in the data set.
columns labeled loglik
, penalty
or penalty1
and
penalty2
, logprw
. The columns have the following
meaning (the formulas apply for the bivariate case).
loglik = the log-likelihood of the (augmented) log-event times given the mixture component allocations and the G-spline, where y[i,l] denotes the (augmented) (i,l)-th true log-event time. In other words, loglik is equal to the conditional log-density

sum[i=1,...,N] log{ p(y[i,1], y[i,2] | r[i], G-spline) };
penalty1: If prior$neighbor.system
= "uniCAR"
:
the penalty term for the first dimension not multiplied by
lambda1
;
penalty2: If prior$neighbor.system
= "uniCAR"
:
the penalty term for the second dimension not multiplied by
lambda2
;
penalty: If prior$neighbor.system
is different from "uniCAR"
:
the penalty term not multiplied by lambda
;
logprw = sum[i=1,...,N] log{ p(r[i] | G-spline weights) }, where N[j1,j2] is the number of observations assigned intrinsically to the (j1, j2)-th mixture component. In other words, logprw is equal to the conditional log-density of the mixture component allocations given the G-spline weights.
Arnošt Komárek [email protected]
Gilks, W. R. and Wild, P. (1992). Adaptive rejection sampling for Gibbs sampling. Applied Statistics, 41, 337 - 348.
Komárek, A. (2006). Accelerated Failure Time Models for Multivariate Interval-Censored Data with Flexible Distributional Assumptions. PhD. Thesis, Katholieke Universiteit Leuven, Faculteit Wetenschappen.
Komárek, A. and Lesaffre, E. (2008). Bayesian accelerated failure time model with multivariate doubly-interval-censored data and flexible distributional assumptions. Journal of the American Statistical Association, 103, 523 - 533.
Komárek, A. and Lesaffre, E. (2006b). Bayesian semi-parametric accelerated failurew time model for paired doubly interval-censored data. Statistical Modelling, 6, 3 - 22.
Neal, R. M. (2003). Slice sampling (with Discussion). The Annals of Statistics, 31, 705 - 767.
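A minimal call sketch; the response variables, the data and the components supplied in the prior list are illustrative assumptions only.

## Hypothetical sketch of a bayesHistogram call for an interval-censored
## univariate sample; variable names and prior components are assumptions.
library(bayesSurv)
library(survival)
fit <- bayesHistogram(
   y1     = Surv(low1, upp1, type = "interval2"),
   nsimul = list(niter = 10000, nthin = 5, nburn = 5000, nwrite = 500),
   prior  = list(specification = 2, K = 15),   ## component names assumed
   dir    = "chainHist")                       ## directory for the *.sim files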
A function to sample from the posterior distribution for a survival regression model

log(T[i,l]) = beta'x[i,l] + b[i]'z[i,l] + epsilon[i,l],    i = 1,...,N,  l = 1,...,n[i],

where the distribution of epsilon[i,l] is specified as a normal mixture with an unknown number of components as in Richardson and Green (1997) and the random effect b[i] is normally distributed.
See Komárek (2006) or Komárek and Lesaffre (2007) for more detailed description of prior assumptions.
Sampled values are stored on a disk to be further worked out by e.g.
coda
or boa
.
bayessurvreg1(formula, random, data = parent.frame(), subset, na.action = na.fail, x = FALSE, y = FALSE, onlyX = FALSE, nsimul = list(niter = 10, nthin = 1, nburn = 0, nnoadapt = 0, nwrite = 10), prior = list(kmax = 5, k.prior = "poisson", poisson.k = 3, dirichlet.w = 1, mean.mu = NULL, var.mu = NULL, shape.invsig2 = 1.5, shape.hyper.invsig2 = 0.8, rate.hyper.invsig2 = NULL, pi.split = NULL, pi.birth = NULL, Eb0.depend.mix = FALSE), prior.beta, prior.b, prop.revjump, init = list(iter = 0, mixture = NULL, beta = NULL, b = NULL, D = NULL, y = NULL, r = NULL, otherp = NULL, u = NULL), store = list(y = TRUE, r = TRUE, b = TRUE, u = TRUE, MHb = FALSE, regresres = FALSE), dir, toler.chol = 1e-10, toler.qr = 1e-10, ...)
formula |
model formula for the ‘fixed’ part of the model, i.e. the
part that specifies The left-hand side of the If
|
||
random |
formula for the ‘random’ part of the model, i.e. the
part that specifies When some random effects are included the random intercept is added
by default. It can be removed using e.g. |
||
data |
optional data frame in which to interpret the variables occurring in the formulas. |
||
subset |
subset of the observations to be used in the fit. |
||
na.action |
function to be used to handle any |
||
x |
if |
||
y |
if |
||
onlyX |
if TRUE, no McMC is performed. The function returns only
a design matrix of your model (intercept excluded). It might be
useful to set up correctly a parameter for a block update of
|
||
nsimul |
a list giving the number of iterations of the McMC and other parameters of the simulation.
|
||
prior |
a list that identifies prior hyperparameters and prior
choices. See accompanying paper for more details.
Some prior parameters can be guessed by the function itself. If you
want to do so, set such parameters to
|
||
prior.beta |
a list defining the blocks of
|
||
prior.b |
a list defining the way in which the random effects are to be updated and the specification of priors for random effects related parameters. The list is assumed to have following components.
|
||
prop.revjump |
a list of values defining in which way the reversible jumps will be performed.
|
||
init |
a list of the initial values to start the McMC. Set to
|
||
store |
a list that defines which sampled values besides
regression parameters
In the case that either |
||
dir |
a string that specifies a directory where all sampled values are to be stored. |
||
toler.chol |
tolerance for the Cholesky decomposition. |
||
toler.qr |
tolerance for the QR decomposition. |
||
... |
who knows? |
A list of class bayessurvreg
containing an information
concerning the initial values and prior choices.
Additionally, the following files with sampled values
are stored in a directory specified by dir
parameter of this
function (some of them are created only on request, see store
parameter of this function).
one column labeled iteration
with
indices of McMC iterations to which the stored sampled values correspond.
two columns labeled loglik and randomloglik.

loglik = sum[i=1,...,N] sum[l=1,...,n[i]] log{ p(y[i,l] | b[i], beta, mixture) }, where y[i,l] denotes the (sampled) (i,l)-th true log-event time, b[i] the sampled value of the random effect vector for the i-th cluster, beta the sampled value of the regression parameter and (k, w[1],...,w[k], mu[1],...,mu[k], sigma[1]^2,...,sigma[k]^2) the sampled mixture at each iteration.

randomloglik = sum[i=1,...,N] log{ phi(b[i] | gamma, D) }, where phi denotes a density of the (multivariate) normal distribution, gamma is a sampled value of the mean of the random effect vector and D is a sampled value of the covariance matrix of the random effects at each iteration.
three columns labeled k
, Intercept
and
Scale
. These are the number of mixture components, mean and
standard deviation of the sampled error distribution (mixture) at
each iteration.
each row contains mixture weights w[1], ..., w[k] at each iteration. From the header of this file, the maximal number of mixture components specified in the prior can be derived.

each row contains mixture means mu[1], ..., mu[k] at each iteration. From the header of this file, the maximal number of mixture components specified in the prior can be derived.

each row contains mixture variances sigma[1]^2, ..., sigma[k]^2 at each iteration. From the header of this file, the maximal number of mixture components specified in the prior can be derived.
columns labeled according to the colnames of the design matrix. These are sampled values of regression parameters beta and means of random effects gamma (except the mean of the random intercept which is zero).
columns labeled nameb[1].id[1], ...,
nameb[q].id[1], ..., nameb[1].id[N], ..., nameb[q].id[N]
,
where q
is a dimension of the random effect vector
and
N
number of clusters. nameb
is replaced by appropriate column name from the design matrix and
id
is replaced by the identifier of the clusters. This gives
sampled values of the random effects for each cluster.
columns labeled det, D.s.t, s = 1,..., q, t =
s,...,q
, where q
is dimension of the random effect
vector b[i]. Column
det
gives a determinant of
the covariance matrix of the random effects at each
iteration, remaining columns give a lower triangle of this matrix
at each iteration.
columns labeled Y[m]
where m
goes from 1
to sum[i=1,...,N] n[i]. This gives sampled
log-event times for each observation in the dataset at each
iteration.
columns labeled r[m]
where m
goes from 1
to sum[i=1,...,N] n[i]. This gives sampled
mixture labels for each observation in the dataset at each iteration.
Currently only one column labeled eta
that
gives sampled values of the hyperparameter eta.
this gives the information concerning the
performance of reversible jump algorithm and a sampler of
regression parameters beta and means of random effects gamma. It has columns
accept.spl.comb
relative frequency of accepted split-combine moves up to that iteration.
split
relative frequency of proposed split moves up to that iteration.
accept.birth.death
relative frequency of accepted birth-death moves up to that iteration.
birth
relative frequency of proposed birth moves up to that iteration.
beta.block.m
with m
going from 1 to number
of defined blocks of beta parameters. This gives a relative
frequency of accepted proposals for each block up to that
iteration. When Gibbs move is used, these should be columns of
ones.
this gives the information concerning the performance of a sampler for random effects (relative frequency of accepted values for each cluster and each block of random effects updated together). When Gibbs move is used only ones are seen in this file.
Sampled values of canonical proposal variables for reversible jump algorithm are stored here. This file is useful only when trying to restart the simulation from some specific point.
columns labeled res[m]
where m
goes from 1
to sum[i=1,...,N] n[i]. This stores so-called regression residuals for each observation at each iteration. This residual is defined as res[i,l] = y[i,l] - beta'x[i,l] - b[i]'z[i,l], where y[i,l] is a (sampled) log-event time at each iteration.
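Because the chains are written as headed text files, they can be loaded into standard MCMC post-processing tools; a minimal sketch, assuming the chains were written to a directory called "chainSurvreg1":

## Hypothetical sketch: convergence diagnostics for the regression
## parameters with the coda package; the directory name is an assumption.
library(coda)
beta <- read.table(file.path("chainSurvreg1", "beta.sim"), header = TRUE)
beta.mcmc <- mcmc(as.matrix(beta))
summary(beta.mcmc)
traceplot(beta.mcmc)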
Arnošt Komárek [email protected]
Komárek, A. (2006). Accelerated Failure Time Models for Multivariate Interval-Censored Data with Flexible Distributional Assumptions. PhD. Thesis, Katholieke Universiteit Leuven, Faculteit Wetenschappen.
Komárek, A. and Lesaffre, E. (2007). Bayesian accelerated failure time model for correlated interval-censored data with a normal mixture as an error distribution. Statistica Sinica, 17, 549 - 569.
Brooks, S. P., Giudici, P., and Roberts, G. O. (2003). Efficient construction of reversible jump Markov chain Monte Carlo proposal distribution (with Discussion). Journal of the Royal Statistical Society B, 65, 3 - 55.
Green, P. J. (1995). Reversible jump MCMC computation and Bayesian model determination. Biometrika, 82, 711 - 732.
Richardson, S., and Green, P. J. (1997). On Bayesian analysis of mixtures with unknown number of components (with Discussion). Journal of the Royal Statistical Society B, 59, 731 - 792.
## See the description of R commands for ## the models described in ## Komarek (2006), ## Komarek and Lesaffre (2007). ## ## R commands available ## in the documentation ## directory of this package as ## - ex-cgd.R and ## https://www2.karlin.mff.cuni.cz/~komarek/software/bayesSurv/ex-cgd.pdf ## ## - ex-tandmobMixture.R and ## https://www2.karlin.mff.cuni.cz/~komarek/software/bayesSurv/ex-tandmobMixture.pdf ##
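A minimal call sketch is given below for orientation; the data frame, the cluster() term and all prior settings other than the defaults shown in the usage are illustrative assumptions only.

## Hypothetical sketch of a bayessurvreg1 call; data set, variable names,
## the cluster() term and the prior.* lists are assumptions for illustration.
library(bayesSurv)
library(survival)
fit <- bayessurvreg1(
   formula = Surv(lower, upper, type = "interval2") ~ trt + cluster(id),
   random  = ~ 1,
   data    = clusteredData,                 ## assumed data frame
   nsimul  = list(niter = 50000, nthin = 10, nburn = 25000,
                  nnoadapt = 0, nwrite = 1000),
   prior   = list(kmax = 5, k.prior = "poisson", poisson.k = 3,
                  dirichlet.w = 1),
   ## prior.beta, prior.b and prop.revjump may need to be supplied as well;
   ## see the argument descriptions above.
   dir     = "chainSurvreg1")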
This function creates the list of initial values as required by the init argument of the function bayessurvreg1. The initial values are taken from files of the same form as the files in which the values simulated by the McMC run of the function bayessurvreg1 are stored.
The files are assumed to have the following names: "iteration.sim",
"mixmoment.sim", "mweight.sim", "mmean.sim", "mvariance.sim",
"beta.sim", "b.sim", "Y.sim", "r.sim", "D.sim", "otherp.sim", "u.sim". Some of these files
may be missing. In that case, the corresponding initial is filled by NULL
.
bayessurvreg1.files2init(dir = getwd(), row, kmax)
dir |
string giving the directory where it will be searched for the files with initial values. |
row |
the row (possible header does not count) from the files with the values that will be considered to give the initial values. By default, it is the last row from the files. |
kmax |
maximal number of mixture components. This must be given
only if |
A list with components called "iter", "mixture", "beta", "b", "D", "y",
"r", "otherp", "u"
in the form as required by the argument init
of the function bayessurvreg1
.
Arnošt Komárek [email protected]
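A minimal restart sketch, assuming a previous run stored its chains in a directory called "chainSurvreg1":

## Hypothetical sketch: build an init list from the last stored state of a
## previous run and reuse it; the directory name is an assumption.
init.cont <- bayessurvreg1.files2init(dir = "chainSurvreg1")
str(init.cont)   ## components iter, mixture, beta, b, D, y, r, otherp, u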
A function to estimate a regression model with possibly clustered (possibly right, left, interval or doubly-interval censored) data. In the case of doubly-interval censoring, different regression models can be specified for the onset and event times.
(Multivariate) random effects, normally distributed and acting as in the linear mixed model, can be included to adjust for clusters.
The error density of the regression model is specified as a mixture of Bayesian G-splines (normal densities with equidistant means and constant variances). This function performs an MCMC sampling from the posterior distribution of unknown quantities.
For details, see Komárek (2006), and Komárek, Lesaffre and Legrand (2007).
We explain first in more detail a model without doubly censoring.

Let T[i,l], i = 1,...,N, l = 1,...,n[i], be event times, where i indexes the cluster and l the units within that cluster. The following regression model is assumed:

log(T[i,l]) = beta'x[i,l] + b[i]'z[i,l] + epsilon[i,l],    i = 1,...,N,  l = 1,...,n[i],

where beta is an unknown regression parameter vector and x[i,l] is a vector of covariates. b[i] is a (multivariate) cluster-specific random effect vector and z[i,l] is a vector of covariates for random effects.

The random effect vectors b[i], i = 1,...,N, are assumed to be i.i.d. with a (multivariate) normal distribution with an unknown mean and an unknown covariance matrix D. Hierarchical centring (see Gelfand, Sahu, Carlin, 1995) is used, i.e. the mean of the random effects expresses the average effect of the covariates included in z[i,l]. Note that covariates included in z[i,l] may not be included in the covariate vector x[i,l]. The covariance matrix D is assigned an inverse Wishart prior distribution in the next level of hierarchy.

The error terms epsilon[i,l], i = 1,...,N, l = 1,...,n[i], are assumed to be i.i.d. with a univariate density g(e). This density is expressed as a mixture of Bayesian G-splines (normal densities with equidistant means and constant variances). We distinguish two, theoretically equivalent, specifications.

Specification 1:

epsilon ~ sum[j = -K,...,K] w[j] N(mu[j], sigma^2),

where sigma^2 is the unknown basis variance and mu[j], j = -K,...,K, is an equidistant grid of knots symmetric around the unknown point gamma and related to the unknown basis variance through the relationship

mu[j] = gamma + j*delta*sigma,    j = -K,...,K,

where delta is a fixed constant, e.g. delta = 2/3 (which has a justification of being close to cubic B-splines).

Specification 2:

epsilon = alpha + tau*V,

where alpha is an unknown intercept term and tau is an unknown scale parameter. V is then a standardized error term which is distributed according to the univariate normal mixture, i.e.

V ~ sum[j = -K,...,K] w[j] N(mu[j], sigma^2),

where mu[j], j = -K,...,K, is an equidistant grid of fixed knots (means), usually symmetric about the fixed point gamma = 0, and sigma^2 is a fixed basis variance. A reasonable value for the number of grid points is K = 15 with the distance between the two knots equal to delta = 0.3 and for the basis variance sigma^2 = 0.2^2.

Personally, I found Specification 2 performing better. In the paper Komárek, Lesaffre and Legrand (2007) only Specification 2 is described.

The mixture weights w[j], j = -K,...,K, are not estimated directly. To avoid the constraints 0 < w[j] < 1 and sum[j = -K,...,K] w[j] = 1, transformed weights a[j], j = -K,...,K, related to the original weights by the logistic transformation

a[j] = log(w[j]/w[0]),   i.e.   w[j] = exp(a[j]) / sum[m = -K,...,K] exp(a[m]),

are estimated instead.

A Bayesian model is set up for all unknown parameters. For more details I refer to Komárek (2006) and to Komárek, Lesaffre, and Legrand (2007).

If there are doubly-censored data the model of the same type as above can be specified for both the onset time and the time-to-event.
bayessurvreg2(formula, random, formula2, random2, data = parent.frame(), na.action = na.fail, onlyX = FALSE, nsimul = list(niter = 10, nthin = 1, nburn = 0, nwrite = 10), prior, prior.beta, prior.b, init = list(iter = 0), mcmc.par = list(type.update.a = "slice", k.overrelax.a = 1, k.overrelax.sigma = 1, k.overrelax.scale = 1), prior2, prior.beta2, prior.b2, init2, mcmc.par2 = list(type.update.a = "slice", k.overrelax.a = 1, k.overrelax.sigma = 1, k.overrelax.scale = 1), store = list(a = FALSE, a2 = FALSE, y = FALSE, y2 = FALSE, r = FALSE, r2 = FALSE, b = FALSE, b2 = FALSE), dir)
formula |
model formula for the regression. In the case of doubly-censored data, this is the model formula for the onset time. The left-hand side of the In the formula all covariates appearing both in the vector
If
|
|
random |
formula for the ‘random’ part of the model, i.e. the
part that specifies the covariates If omitted, no random part is included in the model. E.g. to specify the model with a
random intercept, say When some random effects are included the random intercept is added
by default. It can be removed using e.g. |
|
formula2 |
model formula for the regression of the time-to-event in
the case of doubly-censored data. Ignored otherwise. The same structure as
for |
|
random2 |
specification of the ‘random’ part of the model for
time-to-event in the case of doubly-censored data. Ignored
otherwise. The same structure as for |
|
data |
optional data frame in which to interpret the variables
occurring in the |
|
na.action |
the user is discouraged from changing the default
value |
|
onlyX |
if |
|
nsimul |
a list giving the number of iterations of the MCMC and other parameters of the simulation.
|
|
prior |
a list specifying the prior distribution of the G-spline
defining the distribution of the error term in the regression model
given by The item |
|
prior.b |
a list defining the way in which the random effects
involved in
|
|
prior.beta |
prior specification for the regression parameters,
in the case of doubly-censored data for the regression parameters of
the onset time, i.e. it is related to the vector beta from the model given by formula. This should be a list with the following components:
It is recommended to run the function
bayessurvreg2 first with its argument |
|
init |
an optional list with initial values for the MCMC related
to the model given by
|
|
mcmc.par |
a list specifying how some of the G-spline parameters
related to the distribution of the error term from In contrast to |
|
prior2 |
a list specifying the prior distribution of the G-spline
defining the distribution of the error term in the regression model
given by |
|
prior.b2 |
prior specification for the parameters related to the
random effects from |
|
prior.beta2 |
prior specification for the regression parameters
of time-to-event in the case of doubly censored data (related to
|
|
init2 |
an optional list with initial values for the MCMC related
to the model given by |
|
mcmc.par2 |
a list specifying how some of the G-spline parameters
related to |
|
store |
a list of logical values specifying which chains that are not stored by default are to be stored. The list can have the following components.
|
|
dir |
a string that specifies a directory where all sampled values are to be stored. |
A list of class bayessurvreg2
containing an information
concerning the initial values and prior choices.
Additionally, the following files with sampled values
are stored in a directory specified by dir
argument of this
function (some of them are created only on request, see store
parameter of this function).
Headers are written to all files created by default and to files asked
by the user via the argument store
. During the burn-in, only
every nsimul$nwrite
value is written. After the burn-in, all
sampled values are written in files created by default and to files
asked by the user via the argument store
. In the files for
which the corresponding store
component is FALSE
, every
nsimul$nwrite
value is written during the whole MCMC (this
might be useful to restart the MCMC from some specific point).
The following files are created:
one column labeled iteration
with
indices of MCMC iterations to which the stored sampled values
correspond.
columns labeled k
, Mean.1
,
D.1.1
, where
k = number of mixture components that had probability numerically higher than zero;
Mean.1 = E(epsilon[i,l]);

D.1.1 = var(epsilon[i,l]);
all related to the distribution of the error term from the
model given by formula
.
in the case of doubly-censored data, the same
structure as mixmoment.sim
, however related to the model
given by formula2
.
sampled mixture weights w[j] of mixture components that had probabilities numerically higher than zero. Related to the model given by formula.
in the case of doubly-censored data, the same
structure as mweight.sim
, however related to the model
given by formula2
.
indices j of mixture components that had probabilities numerically higher than zero. It corresponds to the weights in mweight.sim. Related to the model given by formula.
in the case of doubly-censored data, the same
structure as mmean.sim
, however related to the model
given by formula2
.
characteristics of the sampled G-spline
(distribution of epsilon[i,l])
related to the model given by
formula
. This file together with mixmoment.sim
,
mweight.sim
and mmean.sim
can be used to reconstruct
the G-spline in each MCMC iteration.
The file has columns labeled gamma1, sigma1, delta1, intercept1, scale1. The meaning of the values in these columns is the following:
gamma1 = the middle knot gamma. If ‘Specification’ is 2, this column usually contains zeros;
sigma1 = basis standard deviation
of the G-spline. This column contains a fixed value
if ‘Specification’ is 2;
delta1 = distance between the two knots of the G-spline.
This column contains a fixed value if ‘Specification’ is 2;
intercept1 = the intercept term of the G-spline.
If ‘Specification’ is 1, this column usually contains zeros;
scale1 = the scale parameter of the G-spline.
If ‘Specification’ is 1, this column usually contains ones;
in the case of doubly-censored data, the same
structure as gspline.sim
, however related to the model
given by formula2
.
fully created only if store$a = TRUE. The file contains the transformed weights a[j], j = -K,...,K, of all mixture components, i.e. also of components that had numerically zero probabilities. This file is related to the error distribution of the model given by formula.
fully created only if store$a2 =
TRUE
and in the case of doubly-censored data, the same
structure as mlogweight.sim
, however related to the error
distribution of the model given by formula2
.
fully created only if store$r = TRUE. The file contains the labels of the mixture components into which the residuals are intrinsically assigned. Instead of indices on the scale -K,...,K, values from 1 to 2*K + 1 are stored here. Function vecr2matr can be used to transform it back to indices from -K to K.
fully created only if store$r2 =
TRUE
and in the case of doubly-censored data, the same
structure as r.sim
, however related to the model
given by formula2
.
one column labeled lambda
. These are the values of the smoothing parameter lambda (a hyperparameter of the prior distribution of the transformed mixture weights a[j]). This file is related to the model given by formula.
in the case of doubly-censored data, the same
structure as lambda.sim
, however related to the model
given by formula2
.
sampled values of the regression parameters, both
the fixed effects and means of the random
effects
(except the random intercept which
has always the mean equal to zero).
This file is related to the model given by
formula
.
The columns are labeled according to the
colnames
of the design matrix.
in the case of doubly-censored data, the same
structure as beta.sim
, however related to the model
given by formula2
.
sampled values of the covariance matrix D of the random effects. The file has 1 + 0.5*q*(q + 1) columns (q is the dimension of the random effect vector b[i]). The first column labeled
det
contains the determinant of the sampled matrix, additional columns
labeled D.1.1
, D.2.1
, ..., D.q.1
, ...
D.q.q
contain the lower triangle of the sampled
matrix. This file is related to the model specified by
formula
and random
.
in the case of doubly-censored data, the same
structure as D.sim
, however related to the model given by
formula2
and random2
.
fully created only if store$b = TRUE
. It
contains sampled values of random effects for all clusters in the data set. The file has q*N columns sorted as b[1,1], ..., b[1,q], b[2,1], ..., b[N,q]. This file is
related to the model given by
formula
and random
.
fully created only if store$b2 =
TRUE
and in the case of doubly-censored data, the same
structure as b.sim
, however related to the model
given by formula2
and random2
.
fully created only if store$y = TRUE
. It
contains sampled (augmented) log-event times for all observations
in the data set.
fully created only if store$y2 =
TRUE
and in the case of doubly-censored data, the same
structure as Y.sim
, however related to the model
given by formula2
.
columns labeled loglik
, penalty
,
and logprw
. This file is related to the model
given by formula
. The columns have the following meaning.
loglik = the log-likelihood of the (augmented) log-event times given the random effects, the regression parameters and the G-spline, where y[i,l] denotes the (augmented) (i,l)-th true log-event time. In other words, loglik is equal to the conditional log-density

sum[i=1,...,N] sum[l=1,...,n[i]] log{ p(y[i,l] | r[i,l], b[i], beta, G-spline) };
penalty: the penalty term
(not multiplied by lambda);
logprw = sum[i=1,...,N] sum[l=1,...,n[i]] log{ p(r[i,l] | G-spline weights) }, where N[j] is the number of residuals assigned intrinsically to the j-th mixture component. In other words, logprw is equal to the conditional log-density of the mixture component allocations given the G-spline weights.
in the case of doubly-censored data, the same
structure as logposter.sim
, however related to the model
given by formula2
.
Arnošt Komárek [email protected]
Gelfand, A. E., Sahu, S. K., and Carlin, B. P. (1995). Efficient parametrisations for normal linear mixed models. Biometrika, 82, 479-488.
Komárek, A. (2006). Accelerated Failure Time Models for Multivariate Interval-Censored Data with Flexible Distributional Assumptions. PhD. Thesis, Katholieke Universiteit Leuven, Faculteit Wetenschappen.
Komárek, A., Lesaffre, E., and Legrand, C. (2007). Baseline and treatment effect heterogeneity for survival times between centers using a random effects accelerated failure time model with flexible error distribution. Statistics in Medicine, 26, 5457-5472.
## See the description of R commands for ## the model with EORTC data, ## analysis described in Komarek, Lesaffre and Legrand (2007). ## ## R commands available in the documentation ## directory of this package ## as ex-eortc.R and ## https://www2.karlin.mff.cuni.cz/~komarek/software/bayesSurv/ex-eortc.pdf ##
A function to estimate a regression model with possibly clustered (possibly right, left, interval or doubly-interval censored) data. In the case of doubly-interval censoring, different regression models can be specified for the onset and event times.
A univariate random effect (random intercept) with the distribution expressed as a penalized normal mixture can be included in the model to adjust for clusters.
The error density of the regression model is specified as a mixture of Bayesian G-splines (normal densities with equidistant means and constant variances). This function performs an MCMC sampling from the posterior distribution of unknown quantities.
For details, see Komárek (2006) and Komárek and Lesaffre (2008).
SUPPLEMENTED IN 06/2013: Interval-censored times might be subject to misclassification. In the case of doubly-interval-censored data, the event time might be subject to misclassification. For details, see García-Zattera, Jara and Komárek (2016).
We explain first in more detail a model without doubly censoring.
Let T_{i,l}, l = 1, ..., n_i, i = 1, ..., N, be the event times for the i-th cluster and the units within that cluster. The following regression model is assumed:

log(T_{i,l}) = beta'x_{i,l} + b_i + epsilon_{i,l},    l = 1, ..., n_i,    i = 1, ..., N,

where beta is the unknown regression parameter vector, x_{i,l} is a vector of covariates and b_i is a cluster-specific random effect (random intercept).

The random effects b_1, ..., b_N are assumed to be i.i.d. with a univariate density g_b(b). The error terms epsilon_{i,l} are assumed to be i.i.d. with a univariate density g_eps(e).

Densities g_b and g_eps are both expressed as a mixture of Bayesian G-splines (normal densities with equidistant means and constant variances). We distinguish two, theoretically equivalent, specifications. In the following, the density for epsilon_{i,l} is explicitly described; the density for b_i is obtained in an analogous manner.
Specification 1

epsilon_{i,l} ~ sum_{j=-K}^{K} w_j N(mu_j, sigma^2),

where sigma^2 is the unknown basis variance and mu_j, j = -K, ..., K, is an equidistant grid of knots symmetric around the unknown point gamma and related to the unknown basis variance through the relationship

mu_j = gamma + j*delta*sigma,    j = -K, ..., K,

where delta is a fixed constant, e.g. delta = 2/3 (which has a justification of being close to cubic B-splines).
Specification 2

epsilon_{i,l} = alpha + tau * epsilon*_{i,l},

where alpha is an unknown intercept term and tau is an unknown scale parameter. epsilon*_{i,l} is then a standardized error term which is distributed according to the univariate normal mixture, i.e.

epsilon*_{i,l} ~ sum_{j=-K}^{K} w_j N(mu_j, sigma^2),

where mu_j, j = -K, ..., K, is an equidistant grid of fixed knots (means), usually symmetric about the fixed point gamma = 0, and sigma^2 is a fixed basis variance. A reasonable choice for the number of grid points is K = 15 at each side of the middle knot, with the distance between two knots equal to delta = 0.3 and the basis variance sigma^2 = 0.2^2.
Personally, I found Specification 2 performing better. In the paper Komárek and Lesaffre (2008) only Specification 2 is described.
The mixture weights w_j, j = -K, ..., K, are not estimated directly. To avoid the constraints 0 < w_j < 1 and sum_{j=-K}^{K} w_j = 1, transformed weights a_j, j = -K, ..., K, related to the original weights by the logistic transformation

w_j = exp(a_j) / sum_{m=-K}^{K} exp(a_m),

are estimated instead.
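For illustration, a minimal R sketch (hypothetical numbers, not code taken from the package) of this back-transformation from transformed weights to mixture weights:

## hypothetical transformed weights a_j on a small grid of knots
a <- c(-2.1, -0.4, 0.0, 0.7, 0.2)
## logistic (softmax) back-transformation: the resulting weights
## are positive and sum to one
w <- exp(a) / sum(exp(a))
sum(w)    ## equals 1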
A Bayesian model is set up for all unknown parameters. For more details I refer to Komárek and Lesaffre (2008).
If there are doubly-censored data the model of the same type as above can be specified for both the onset time and the time-to-event.
In the case one wishes to link the random intercept of the onset model and the random intercept of the time-to-event model, there are the following possibilities.
Bivariate normal distribution
It is assumed that the pair of random intercepts from the onset and
time-to-event part of the model are normally distributed with zero
mean and an unknown covariance matrix. A priori, the inverse covariance matrix is assumed to follow a Wishart distribution.
Unknown correlation between the basis G-splines
Each pair of basis G-splines describing the distribution of the random
intercept in the onset part and the time-to-event part of the model is
assumed to be correlated with an unknown correlation coefficient rho. Note that this is just an experiment and is no longer supported.

The prior distribution of rho is assumed to be uniform. In the MCMC, the Fisher Z transform of rho, given by

z = 0.5 * log((1 + rho)/(1 - rho)) = atanh(rho),

is sampled. Its prior is derived from the uniform prior put on rho.

The Fisher Z transform is updated using the Metropolis-Hastings algorithm. The proposal distribution is given either by a normal approximation obtained using the Taylor expansion of the full conditional distribution or by a Langevin proposal (see Robert and Casella, 2004, p. 318).
bayessurvreg3(formula, random, formula2, random2, data = parent.frame(), classification, classParam = list(Model = c("Examiner", "Factor:Examiner"), a.sens = 1, b.sens = 1, a.spec = 1, b.spec = 1, init.sens = NULL, init.spec = NULL), na.action = na.fail, onlyX = FALSE, nsimul = list(niter = 10, nthin = 1, nburn = 0, nwrite = 10), prior, prior.beta, prior.b, init = list(iter = 0), mcmc.par = list(type.update.a = "slice", k.overrelax.a = 1, k.overrelax.sigma = 1, k.overrelax.scale = 1, type.update.a.b = "slice", k.overrelax.a.b = 1, k.overrelax.sigma.b = 1, k.overrelax.scale.b = 1), prior2, prior.beta2, prior.b2, init2, mcmc.par2 = list(type.update.a = "slice", k.overrelax.a = 1, k.overrelax.sigma = 1, k.overrelax.scale = 1, type.update.a.b = "slice", k.overrelax.a.b = 1, k.overrelax.sigma.b = 1, k.overrelax.scale.b = 1), priorinit.Nb, rho = list(type.update = "fixed.zero", init=0, sigmaL=0.1), store = list(a = FALSE, a2 = FALSE, y = FALSE, y2 = FALSE, r = FALSE, r2 = FALSE, b = FALSE, b2 = FALSE, a.b = FALSE, a.b2 = FALSE, r.b = FALSE, r.b2 = FALSE), dir) bayessurvreg3Para(formula, random, formula2, random2, data = parent.frame(), classification, classParam = list(Model = c("Examiner", "Factor:Examiner"), a.sens = 1, b.sens = 1, a.spec = 1, b.spec = 1, init.sens = NULL, init.spec = NULL), na.action = na.fail, onlyX = FALSE, nsimul = list(niter = 10, nthin = 1, nburn = 0, nwrite = 10), prior, prior.beta, prior.b, init = list(iter = 0), mcmc.par = list(type.update.a = "slice", k.overrelax.a = 1, k.overrelax.sigma = 1, k.overrelax.scale = 1, type.update.a.b = "slice", k.overrelax.a.b = 1, k.overrelax.sigma.b = 1, k.overrelax.scale.b = 1), prior2, prior.beta2, prior.b2, init2, mcmc.par2 = list(type.update.a = "slice", k.overrelax.a = 1, k.overrelax.sigma = 1, k.overrelax.scale = 1, type.update.a.b = "slice", k.overrelax.a.b = 1, k.overrelax.sigma.b = 1, k.overrelax.scale.b = 1), priorinit.Nb, rho = list(type.update = "fixed.zero", init=0, sigmaL=0.1), store = list(a = FALSE, a2 = FALSE, y = FALSE, y2 = FALSE, r = FALSE, r2 = FALSE, b = FALSE, b2 = FALSE, a.b = FALSE, a.b2 = FALSE, r.b = FALSE, r.b2 = FALSE), dir)
formula |
model formula for the regression. In the case of doubly-censored data, this is the model formula for the onset time. The left-hand side of the formula must be an object created using Surv. The intercept is implicitly included in the model through the estimation of the error distribution; as a consequence, -1 in the model formula has no effect on the model specification. |
||||
random |
formula for the ‘random’ part of the model.
In the case of doubly-censored data, this is the specification of the ‘random’ part for the onset time. Only a random intercept, i.e. random = ~1,
is allowed. If omitted, no random part is included in the model. |
||||
formula2 |
model formula for the regression of the time-to-event in
the case of doubly-censored data. Ignored otherwise. The same structure as
for |
||||
random2 |
specification of the ‘random’ part of the model for
time-to-event in the case of doubly-censored data. Ignored
otherwise. The same structure as for |
||||
data |
optional data frame in which to interpret the variables
occurring in the |
||||
classification |
Possible additional columns are ignored. If missing, no misclassification is considered. |
||||
classParam |
a The following components of the
|
||||
na.action |
the user is discouraged from changing the default
value |
||||
onlyX |
if |
||||
nsimul |
a list giving the number of iterations of the MCMC and other parameters of the simulation.
|
||||
prior |
a list specifying the prior distribution of the G-spline
defining the distribution of the error term in the regression model
given by The item |
||||
prior.b |
a list specifying the prior distribution of the
G-spline defining the distribution of the random intercept in the
regression model given by It is ignored if the argument The item |
||||
prior.beta |
prior specification for the regression parameters,
in the case of doubly-censored data for the regression parameters of
the onset time, i.e. it is related to This should be a list with the following components:
It is recommended to run the function
bayessurvreg3 first with its argument |
||||
init |
an optional list with initial values for the MCMC related
to the model given by
|
||||
mcmc.par |
a list specifying how some of the G-spline parameters
related to the distribution of the error term and of the random
intercept from Compared to the mcmc.par argument of the function
In contrast to |
||||
prior2 |
a list specifying the prior distribution of the G-spline
defining the distribution of the error term in the regression model
given by |
||||
prior.b2 |
prior specification for the parameters related to the
random effects from It is ignored if the argument |
||||
prior.beta2 |
prior specification for the regression parameters
of time-to-event in the case of doubly censored data (related to
|
||||
init2 |
an optional list with initial values for the MCMC related
to the model given by |
||||
mcmc.par2 |
a list specifying how some of the G-spline parameters
related to |
||||
priorinit.Nb |
a list specifying the prior of the random intercepts in the case of the AFT model with doubly-interval-censored data and onset, time-to-event random intercepts following bivariate normal distribution. The list should have the following components.
|
||||
rho |
a list specifying possible correlation between the onset
random intercept and the time-to-event random intercept in the
experimental version of the model. If not given, the correlation is fixed to zero. It is ignored if the argument
|
||||
store |
a list of logical values specifying which chains that are not stored by default are to be stored. The list can have the following components.
|
||||
dir |
a string that specifies a directory where all sampled values are to be stored. |
A list of class bayessurvreg3
containing an information
concerning the initial values and prior choices.
Additionally, the following files with sampled values
are stored in a directory specified by dir
argument of this
function (some of them are created only on request, see store
parameter of this function).
Headers are written to all files created by default and to files asked
by the user via the argument store
. During the burn-in, only
every nsimul$nwrite
value is written. After the burn-in, all
sampled values are written in files created by default and to files
asked by the user via the argument store
. In the files for
which the corresponding store
component is FALSE
, every
nsimul$nwrite
value is written during the whole MCMC (this
might be useful to restart the MCMC from some specific point).
The following files are created:
one column labeled iteration
with
indices of MCMC iterations to which the stored sampled values
correspond.
this file is related to the density of the
error term from the model given by formula
.
Columns labeled k
, Mean.1
,
D.1.1
, where
k = number of mixture components that had probability numerically higher than zero;
Mean.1 = the overall mean of the G-spline mixture at the given iteration;

D.1.1 = the overall variance of the G-spline mixture at the given iteration.
this file is related to the density of the
random intercept from the model given by formula
and
random
.
The same structure as mixmoment.sim
.
in the case of doubly-censored data. This
file is related to the density of the error term from the model
given by formula2
.
The same structure as mixmoment.sim
.
in the case of doubly-censored data. This
file is related to the density of the random intercept from the model
given by formula2
and random2
.
The same structure as mixmoment.sim
.
this file is related to the density of the
error term from the model given by formula
.
Sampled mixture weights of mixture components that had
probabilities numerically higher than zero.
this file is related to the density of the
random intercept from the model given by formula
and
random
.
The same structure as mweight.sim
.
in the case of doubly-censored data. This
file is related to the density of the error term from the model
given by formula2
.
The same structure as mweight.sim
.
in the case of doubly-censored data. This
file is related to the density of the random intercept from the model
given by formula2
and random2
.
The same structure as mweight.sim
.
this file is related to the density of the
error term from the model given by formula
.
Indices
of mixture components that had probabilities numerically higher
than zero. It corresponds to the weights in
mweight.sim
.
this file is related to the density of the
random intercept from the model given by formula
and
random
.
The same structure as mmean.sim
.
in the case of doubly-censored data. This
file is related to the density of the error term from the model
given by formula2
.
The same structure as mmean.sim
.
in the case of doubly-censored data. This
file is related to the density of the random intercept from the model
given by formula2
and random2
.
The same structure as mmean.sim
.
this file is related to the density of the
error term from the model given by formula
.
Characteristics of the sampled G-spline.
This file together with mixmoment.sim
,
mweight.sim
and mmean.sim
can be used to reconstruct
the G-spline in each MCMC iteration.
The file has columns labeled
gamma1
,
sigma1
,
delta1
,
intercept1
,
scale1
,
The meaning of the values in these columns is the following:
gamma1 = the middle knot gamma of the G-spline.
If ‘Specification’ is 2, this column usually contains zeros;
sigma1 = basis standard deviation
of the G-spline. This column contains a fixed value
if ‘Specification’ is 2;
delta1 = distance between the two knots of the G-spline.
This column contains a fixed value if ‘Specification’ is 2;
intercept1 = the intercept term of the G-spline.
If ‘Specification’ is 1, this column usually contains zeros;
scale1 = the scale parameter of the G-spline.
If ‘Specification’ is 1, this column usually contains ones;
this file is related to the density of the
random intercept from the model given by formula
and
random
.
The same structure as gspline.sim
.
in the case of doubly-censored data. This
file is related to the density of the error term from the model
given by formula2
.
The same structure as gspline.sim
.
in the case of doubly-censored data. This
file is related to the density of the random intercept from the model
given by formula2
and random2
.
The same structure as gspline.sim
.
this file is related to the density of the
error term from the model given by formula
.
Fully created only if store$a = TRUE
. The
file contains the transformed weights
of all mixture components, i.e. also of components that had numerically zero
probabilities.
this file is related to the density of the
random intercept from the model given by formula
and
random
.
Fully created only if store$a.b = TRUE
.
The same structure as mlogweight.sim
.
in the case of doubly-censored data. This
file is related to the density of the error term from the model
given by formula2
.
Fully created only if store$a2 = TRUE
.
The same structure as mlogweight.sim
.
in the case of doubly-censored data. This
file is related to the density of the random intercept from the model
given by formula2
and random2
.
Fully created only if store$a.b2 = TRUE
.
The same structure as mlogweight.sim
.
this file is related to the density of the
error term from the model given by formula
.
Fully created only if store$r = TRUE
. The file
contains the labels of the mixture components into which the
residuals are intrinsically assigned. Instead of indices on the scale -K, ..., K, values from 1 to 2K+1 are stored here. The function vecr2matr can be used to transform them back to indices from -K to K.
this file is related to the density of the
random intercept from the model given by formula
and
random
.
Fully created only if store$r.b = TRUE
.
The same structure as r.sim
.
in the case of doubly-censored data. This
file is related to the density of the error term from the model
given by formula2
.
Fully created only if store$r2 = TRUE
.
The same structure as r.sim
.
in the case of doubly-censored data. This
file is related to the density of the random intercept from the model
given by formula2
and random2
.
Fully created only if store$r.b2 = TRUE
.
The same structure as r.sim
.
this file is related to the density of the
error term from the model given by formula
.
One column labeled lambda
. These are the
values of the smoothing parameter
(hyperparameters of the prior distribution of the transformed
mixture weights
).
this file is related to the density of the
random intercept from the model given by formula
and
random
.
The same structure as lambda.sim
.
in the case of doubly-censored data. This
file is related to the density of the error term from the model
given by formula2
.
The same structure as lambda.sim
.
in the case of doubly-censored data. This
file is related to the density of the random intercept from the model
given by formula2
and random2
.
The same structure as lambda.sim
.
this file is related to the model given by
formula
.
Sampled values of the regression parameters beta.
The columns are labeled according to the
colnames
of the design matrix.
in the case of doubly-censored data, the same
structure as beta.sim
, however related to the model
given by formula2
.
this file is related to the model given by
formula
and random
.
Fully created only if store$b = TRUE
. It
contains sampled values of random intercepts for all clusters in
the data set. The file has one column per cluster.
fully created only if store$b2 =
TRUE
and in the case of doubly-censored data, the same
structure as b.sim
, however related to the model
given by formula2
and random2
.
this file is related to the model given by formula
.
Fully created only if store$y = TRUE
. It
contains sampled (augmented) log-event times for all observations
in the data set.
fully created only if store$y2 =
TRUE
and in the case of doubly-censored data, the same
structure as Y.sim
, however related to the model
given by formula2
.
This file is related to the residuals of the model
given by formula
.
Columns labeled loglik
, penalty
, and logprw
.
The columns have the following meaning.
loglik
where y_{i,l} denotes the (augmented) (i,l)-th
true log-event time.
In other words, loglik
is equal to the
conditional log-density
penalty: the penalty term
(not multiplied by the smoothing parameter lambda);
logprw
where N_k is the number of residuals intrinsically assigned to the k-th mixture component.
In other words, logprw
is equal to the conditional
log-density
This file is related to the random
intercepts from the model given by formula
and
random
.
Columns labeled loglik
, penalty
, and logprw
.
The columns have the following meaning.
loglik
where b_i denotes the (augmented) i-th
random intercept.
In other words, loglik
is equal to the
conditional log-density
The columns penalty
and logprw
have the analogous
meaning as in the case of the logposter.sim file.
in the case of doubly-censored data, the same
structure as logposter.sim
, however related to the model
given by formula2
.
in the case of doubly-censored data, the same
structure as logposter_b.sim
, however related to the model
given by formula2
and random2
.
Arnošt Komárek [email protected]
García-Zattera, M. J., Jara, A., and Komárek, A. (2016). A flexible AFT model for misclassified clustered interval-censored data. Biometrics, 72, 473 - 483.
Komárek, A. (2006). Accelerated Failure Time Models for Multivariate Interval-Censored Data with Flexible Distributional Assumptions. PhD. Thesis, Katholieke Universiteit Leuven, Faculteit Wetenschappen.
Komárek, A. and Lesaffre, E. (2008). Bayesian accelerated failure time model with multivariate doubly-interval-censored data and flexible distributional assumptions. Journal of the American Statistical Association, 103, 523 - 533.
Robert C. P. and Casella, G. (2004). Monte Carlo Statistical Methods, Second Edition. New York: Springer Science+Business Media.
## See the description of R commands for
## the cluster specific AFT model
## with the Signal Tandmobiel data,
## analysis described in Komarek and Lesaffre (2007).
##
## R commands available in the documentation
## directory of this package
## - see ex-tandmobCS.R and
## https://www2.karlin.mff.cuni.cz/~komarek/software/bayesSurv/ex-tandmobCS.pdf
##
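In addition, a schematic sketch of a call (not runnable as is): the data set, the variable names, the prior lists and the output directory are hypothetical placeholders, the components of prior, prior.b and prior.beta are described above, and identifying clusters through cluster() follows the convention of related functions in this package (an assumption here):

## hypothetical cluster-specific AFT model for interval-censored times
library("survival")
fit <- bayessurvreg3(
   formula = Surv(low, upp, type = "interval2") ~ trt + cluster(id),
   random  = ~1,                        ## random intercept for clusters
   data    = mydata,                    ## placeholder data set
   nsimul  = list(niter = 20000, nthin = 5, nburn = 10000, nwrite = 500),
   prior      = prior.eps,              ## G-spline prior for the error term
   prior.b    = prior.b,                ## G-spline prior for the random intercept
   prior.beta = prior.beta,             ## prior for the regression coefficients
   store   = list(y = TRUE),
   dir     = "/path/to/chain")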
Dataset from Fleming and Harrington (1991).
Data from a multicenter placebo-controlled randomized trial of gamma
interferon in patients with chronic granulomatous disease (CGD). 128
patients were randomized to either gamma interferon or placebo. For each patient the times from study
entry to initial and any recurrent serious infections are
available. There is a minimum of one and a maximum of eight
(recurrent) infection times per patient, with a total of 203 records.
data(cgd)
a data frame with 203 rows and 17 columns. There are the following variables contained in the data frame.
code of the hospital
case identification number
date of randomization onto the study (mmddyy)
date of onset of serious infection, or date the patient was taken off the study (mmddyy)
treatment code, 1 = rIFN-gamma, 2 = placebo
pattern of inheritance, 1 = X-linked, 2 = autosomal recessive
age in years
height of the patient in cm
weight of the patient in kg
using corticosteroids at time of study entry, 1 = yes, 2 = no
using prophylactic antibiotics at time of study entry, 1 = yes, 2 = no
1 = male, 2 = female
hospital category, 1 = US-NIH, 2 = US-other, 3 = Europe - Amsterdam, 4 = Europe - other
elapsed time (in days) from randomization (from sequence = 1 record) to diagnosis of a serious infection or, if a censored observation, elapsed time from randomization to censoring date; computed as IDT - RDT (from sequence = 1 record)
0 for the sequence = 1 record; if sequence > 1, T2 = T1 (from the previous record) + 1
censoring indicator, 1 = non-censored observation, 2 = censored observation
sequence number; for each patient, the infection records are in sequence number order
Appendix D.2 of Fleming and Harrington (1991).
Fleming, T. R. and Harrington, D. P. (1991). Counting Processes and Survival Analysis. New York: John Wiley and Sons.
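A minimal look at the data (base R only; the column name sequence is taken from the variable descriptions above):

data("cgd", package = "bayesSurv")
str(cgd)                 ## 203 records, 17 variables
table(cgd$sequence)      ## first and recurrent infection records per sequence number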
Compute a simultaneous credible region (rectangle) from a sample of a vector-valued parameter. See the references below for more details.
credible.region(sample, probs=c(0.90, 0.975))
sample |
a data frame or matrix with sampled values (one column = one parameter) |
probs |
probabilities for which the credible regions are to be computed |
A list (one component for each confidence region) of length equal to
length(probs)
. Each component of the list is a matrix with two
rows (lower and upper limit) and as many columns as the number of
parameters giving the confidence region.
Arnošt Komárek [email protected]
Besag, J., Green, P., Higdon, D. and Mengersen, K. (1995). Bayesian computation and stochastic systems (with Discussion). Statistical Science, 10, 3 - 66, page 30
Held, L. (2004). Simultaneous inference in risk assessment; a Bayesian perspective In: COMPSTAT 2004, Proceedings in Computational Statistics (J. Antoch, Ed.), 213 - 222, page 214
Held, L. (2004b). Simultaneous posterior probability statements from Monte Carlo output. Journal of Computational and Graphical Statistics, 13, 20 - 35.
m <- 10000
sample <- data.frame(x1=rnorm(m), x2=rnorm(m), x3=rnorm(m))
probs <- c(0.70, 0.90, 0.95)
CR <- credible.region(sample, probs=probs)

for (kk in 1:length(CR)){
  suma <- sum(sample$x1 >= CR[[kk]]["Lower", "x1"] & sample$x1 <= CR[[kk]]["Upper", "x1"] &
              sample$x2 >= CR[[kk]]["Lower", "x2"] & sample$x2 <= CR[[kk]]["Upper", "x2"] &
              sample$x3 >= CR[[kk]]["Lower", "x3"] & sample$x3 <= CR[[kk]]["Upper", "x3"])
  show <- c(suma/m, probs[kk])
  names(show) <- c("Empirical", "Desired")
  print(show)
}
Displays a plot of the density estimate for each variable in x, calculated by the density function.
This is a slightly modified version of the densplot function of the coda package to conform to my personal preferences.
densplot2(x, plot = TRUE, show.obs = FALSE, bwf, bty = "n", main = "", xlim, ylim, xlab, ylab, ...)
x |
|
plot |
if |
show.obs |
show observations along the x-axis? |
bwf |
function for calculating the bandwidth. If omitted, the bandwidth is calculated by 1.06 times the minimum of the standard deviation and the interquartile range divided by 1.34 times the sample size to the negative one fifth power. |
xlim , ylim , xlab , ylab
|
further arguments passed to the
|
bty , main , ...
|
further arguments passed to the
|
No return value.
Arnošt Komárek [email protected]
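A small sketch of usage (the chain below is simulated for illustration, not taken from a model fit):

## toy example: a coda mcmc object with two 'sampled' parameters
library("coda")
x <- mcmc(cbind(beta = rnorm(1000), sigma = rgamma(1000, shape = 2, rate = 1)))
densplot2(x, show.obs = TRUE)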
This function creates a coda
mcmc
object from values found
in files where sampled values from bayessurvreg1
function are stored
or from data.frames.
files2coda(files, data.frames, variant = 1, dir = getwd(), start = 1, end, thin = 1, header = TRUE, chain)
files |
a vector of strings giving the names of files that are to
be converted to |
data.frames |
a vector of strings giving the names of data.frames
that are to be converted to |
variant |
a variant of Currently only 1 is supported to cooperate with |
dir |
string giving the directory where it will be searched for the files with sampled values. |
start |
the first row (possible header does not count) from the files with the sampled values that will be converted to coda objects. |
end |
the last row from the files with the sampled values that will be converted to coda objects. If missing, it is the last row in files. |
thin |
additional thinning of sampled values (i.e. only every
|
header |
TRUE or FALSE indicating whether the files with the sampled values contain also the header on the first line or not. |
chain |
parameter giving the number of the chain if parallel
chains were created and sampled values stored in data.frames further
stored in lists(). If |
A list with mcmc
objects. One object per file or data.frame.
Arnošt Komárek [email protected]
## *** illustration of usage of parameters 'data.frames' and 'chain' ***
## *********************************************************************
## Two parallel chains with four variables, stored in data.frames
## data.frames are further stored in lists
library("coda")

group1 <- list(); group2 <- list(); group3 <- list()

## first chain of first two variables:
group1[[1]] <- data.frame(var1 = rnorm(100, 0, 1), var2 = rnorm(100, 5, 4))

## second chain of first two variables:
group1[[2]] <- data.frame(var1 = rnorm(100, 0, 1), var2 = rnorm(100, 5, 4))

## first chain of the third variable:
group2[[1]] <- data.frame(var3 = rgamma(100, 1, 1))

## second chain of the third variable:
group2[[2]] <- data.frame(var3 = rgamma(100, 1, 1))

## first chain of the fourth variable:
group3[[1]] <- data.frame(var4 = rbinom(100, 1, 0.4))

## second chain of the fourth variable:
group3[[2]] <- data.frame(var4 = rbinom(100, 1, 0.4))

## Create mcmc objects for each chain separately
mc.chain1 <- files2coda(data.frames = c("group1", "group2", "group3"), chain = 1)
mc.chain2 <- files2coda(data.frames = c("group1", "group2", "group3"), chain = 2)

## Create mcmc.list to represent two parallel chains
mc <- mcmc.list(mc.chain1, mc.chain2)
rm(mc.chain1, mc.chain2)

## *** illustration of usage of parameter 'data.frames' without 'chain' ***
## ************************************************************************
## Only one chain for four variables was sampled and stored in three data.frames

## chain of first two variables:
group1 <- data.frame(var1 = rnorm(100, 0, 1), var2 = rnorm(100, 5, 4))

## chain of the third variable:
group2 <- data.frame(var3 = rgamma(100, 1, 1))

## chain of the fourth variable:
group3 <- data.frame(var4 = rbinom(100, 1, 0.4))

## Create an mcmc object
mc <- files2coda(data.frames = c("group1", "group2", "group3"))
This function computes a sample mean, quantiles and a Bayesian p-value, defined here as

p = (2/n) * min(n-, n+),

where n- is the number of sampled values that are negative, n+ is the number of sampled values that are positive, and n is the total number of sampled values.
give.summary(sample, probs=c(0.5, 0.025, 0.975))
sample |
a data frame or a vector with sampled values |
probs |
probabilities of the quantiles that are to be computed |
A matrix or a vector with the sample mean, quantiles and the Bayesian p-value.
Arnošt Komárek [email protected]
## Example with a sample stored in a vector:
sample <- rnorm(1000)
give.summary(sample)

## Example with a sample stored in a data.frame:
sample <- data.frame(x=rnorm(1000), y=rgamma(1000, shape=1, rate=1))
give.summary(sample)
Compute the estimate of the marginal density function based on the values sampled using the MCMC (MCMC average evaluated in a grid of values) in a model where density is specified as a bivariate Bayesian G-spline.
This function serves to summarize the MCMC chains related to the distributional parts
of the considered models obtained using the functions:
bayesHistogram
and bayesBisurvreg
.
If asked, this function returns also the values of the marginal G-spline evaluated in a grid at each iteration of MCMC.
marginal.bayesGspline(dir, extens = "", K, grid1, grid2, skip = 0, by = 1, last.iter, nwrite, only.aver = TRUE)
dir |
directory where to search for files (‘mixmoment.sim’, ‘mweight.sim’, ‘mmean.sim’, ‘gspline.sim’) with the MCMC sample. |
extens |
an extension used to distinguish different sampled
G-splines if more G-splines were used in one simulation (e.g. with
doubly-censored data). According to which
|
K |
a vector of length 2 specifying the number of knots at each side of the middle knot for each dimension of the G-spline. |
grid1 |
grid of values from the first dimension at which the sampled marginal densities are to be evaluated. |
grid2 |
grid of values from the second dimension at which the sampled marginal densities are to be evaluated. |
skip |
number of rows that should be skipped at the beginning of each *.sim file with the stored sample. |
by |
additional thinning of the sample. |
last.iter |
index of the last row from *.sim files that should be
used. If not specified, it is set to the maximum available
determined according to the file |
nwrite |
frequency with which the user is informed about the
progress of computation (every |
only.aver |
|
An object of class marginal.bayesGspline
is returned. This object is a
list with components margin1
and margin2
(for two margins).
Both margin1
and margin2
components are data.frames with
columns named grid
and average
where
grid |
is a grid of values (vector) at which the MCMC average of the marginal G-spline was computed. |
average |
are MCMC averages of the marginal G-spline (vector) evaluated in
|
There exists a method to plot objects of the class marginal.bayesGspline
.
Additionally, the object of class marginal.bayesGspline
has the following
attributes:
sample.size
the length of the MCMC sample used to compute the MCMC average.
sample1
marginal (margin = 1) G-spline evaluated in a grid of values. This attribute
is present only if only.aver = FALSE
.
This is a matrix with sample.size columns and length(grid1) rows.
sample2
marginal (margin = 2) G-spline evaluated in a grid of values. This attribute
is present only if only.aver = FALSE
.
This is a matrix with sample.size columns and length(grid2) rows.
Arnošt Komárek [email protected]
Komárek, A. (2006). Accelerated Failure Time Models for Multivariate Interval-Censored Data with Flexible Distributional Assumptions. PhD. Thesis, Katholieke Universiteit Leuven, Faculteit Wetenschappen.
Komárek, A. and Lesaffre, E. (2006). Bayesian semi-parametric accelerated failurew time model for paired doubly interval-censored data. Statistical Modelling, 6, 3 - 22.
## See the description of R commands for ## the models described in ## Komarek (2006), ## Komarek and Lesaffre (2006), ## ## R commands available ## in the documentation ## directory of this package ## - see ex-tandmobPA.R and ## https://www2.karlin.mff.cuni.cz/~komarek/software/bayesSurv/ex-tandmobPA.pdf ##
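In addition, a hypothetical sketch of a call; the directory, the number of knots and the grids are placeholders and must correspond to the original fit:

## marginal densities of a bivariate error G-spline stored in '/path/to/chain'
marg <- marginal.bayesGspline(dir = "/path/to/chain", extens = "",
                              K = c(15, 15),
                              grid1 = seq(-3, 3, length = 100),
                              grid2 = seq(-3, 3, length = 100),
                              skip = 500)
plot(marg)    ## plot method for objects of class 'marginal.bayesGspline'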
This function plots an object created by bayesDensity
.
## S3 method for class 'bayesDensity' plot(x, k.cond, dim.plot = TRUE, over = TRUE, alegend = TRUE, standard = TRUE, center = FALSE, type = "l", bty = "n", xlab = expression(epsilon), ylab = expression(f(epsilon)), lty, xlim, ylim, xleg, yleg, main, ...)
x |
an object of class |
k.cond |
a numerical vector giving the numbers of mixture components for which the conditional densities
are to be plotted. 0 stands for the unconditional (overall) density, averaged over the mixture with all possible
numbers of components. If NULL, all conditional and the
unconditional density found in |
dim.plot |
an indicator whether the dimension of the plot used
in |
over |
an indicator whether all densities should be drawn into
one plot using different types of lines. If |
alegend |
an indicator whether an automatic legend should be added to the plot. |
standard |
logical, do we want to plot standardized density? |
center |
logical, do we want to plot centered density?, set both
|
xleg , yleg
|
position of the legend if |
type , bty , xlab , ylab , lty , xlim , ylim , main , ...
|
other
arguments passed to the |
No return value.
Arnošt Komárek [email protected]
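A hypothetical sketch of usage: dens is assumed to be an object previously created by the bayesDensity function, and only the plotting arguments below are documented here:

## overall density together with densities conditional on 2 and 3 mixture components
plot(dens, k.cond = c(0, 2, 3), standard = TRUE, over = TRUE)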
This function plots an object created by bayesGspline
.
## S3 method for class 'bayesGspline' plot(x, add = FALSE, type = "l", lty=1, bty = "n", xlab, ylab, main, sub, ...)
x |
an object of class |
add |
if |
type , lty , bty , xlab , ylab , main , sub , ...
|
other
arguments passed to the |
No return value.
Arnošt Komárek [email protected]
This function plots an object created by marginal.bayesGspline
.
## S3 method for class 'marginal.bayesGspline' plot(x, type = "l", lty=1, bty = "n", xlab1, ylab1, main1, xlab2, ylab2, main2, sub, ...)
x |
an object of class |
type , lty , bty , xlab1 , ylab1 , main1 , xlab2 , ylab2 , main2 , sub , ...
|
other
arguments passed to the |
No return value.
Arnošt Komárek [email protected]
This function runs additional MCMC to compute predictive survivor and hazard curves and predictive event times for specified values of covariates.
Firstly, the function bayessurvreg1
has to be used to
obtain a sample from the posterior distribution of unknown quantities.
Posterior predictive quantiles and means of the requested quantities are computed directly and stored in files.
Function predictive.control
serves only to perform some input
checks inside the main function predictive
.
predictive(formula, random, time0 = 0, data = parent.frame(), grid, type = "mixture", subset, na.action = na.fail, quantile = c(0, 0.025, 0.5, 0.975, 1), skip = 0, by = 1, last.iter, nwrite, only.aver = FALSE, predict = list(Et=TRUE, t=FALSE, Surv=TRUE, hazard=FALSE, cum.hazard=FALSE), store = list(Et=TRUE, t = FALSE, Surv = FALSE, hazard = FALSE, cum.hazard=FALSE), Eb0.depend.mix = FALSE, dir, toler.chol = 1e-10, toler.qr = 1e-10) predictive.control(predict, store, only.aver, quantile)
formula |
the same formula as that one used to sample from the
posterior distribution of unknown quantities by the function
|
random |
the same |
time0 |
starting time for the survival model. This option is used
to get the correct hazard function in the case that the original model was
|
data |
optional data frame in which to interpret the variables
occurring in the formulas. Usually, you create a new
|
grid |
a list of the same length as the number of observations in |
type |
a character string giving the type of assumed error distribution. Currently, valid are substrings of "mixture". In the future, "spline", "polya.tree" might be also implemented. |
subset |
subset of the observations from the |
na.action |
function to be used to handle any |
quantile |
a vector of quantiles that are to be computed for each predictive quantity. |
skip |
number of rows that should be skipped at the beginning of each *.sim file with the stored sample. |
by |
additional thinning of the sample. |
last.iter |
index of the last row from *.sim files that should be
used. If not specified, it is set to the maximum available
determined according to the file |
nwrite |
frequency with which the user is informed about the
progress of computation (every |
only.aver |
if |
predict |
a list of logical values indicating which predictive quantities are to be sampled. Components of the list:
|
store |
a list of logical values indicating which predictive quantities are to be stored in files as ‘predET*.sim’, ‘predT*.sim’, ‘predS*.sim’, ‘predhazard*.sim’, ‘predcumhazard*.sim’. If you are interested only in posterior means or quantiles of the predictive quantities you do not have to store sampled values. Posterior means and quantiles are stored in files ‘quantET*.sim’, ‘quantT*.sim’, ‘quantS*.sim’, ‘quanthazard*.sim’, ‘quantpredhazard*.sim’. |
Eb0.depend.mix |
a logical value indicating whether the mean of
the random intercept (if included in the model) was given in a
hierarchical model as an overall mean of the mixture in the error
term. With |
dir |
a string giving a directory where previously simulated values were stored and where newly obtained quantities will be stored. On Unix, do not use ‘~/’ to specify your home directory. A full path must be given, e.g. ‘/home/arnost/’. |
toler.chol |
tolerance for the Cholesky decomposition. |
toler.qr |
tolerance for the QR decomposition. |
An integer which should be equal to zero if everything ran fine.
Arnošt Komárek [email protected]
Komárek, A. (2006). Accelerated Failure Time Models for Multivariate Interval-Censored Data with Flexible Distributional Assumptions. PhD. Thesis, Katholieke Universiteit Leuven, Faculteit Wetenschappen.
Komárek, A. and Lesaffre, E. (2007). Bayesian accelerated failure time model for correlated interval-censored data with a normal mixture as an error distribution. Statistica Sinica, 17, 549 - 569.
## See the description of R commands for ## the models described in ## Komarek (2006), ## Komarek and Lesaffre (2007). ## ## R commands available ## in the documentation ## directory of this package as ## - ex-cgd.R and ## https://www2.karlin.mff.cuni.cz/~komarek/software/bayesSurv/ex-cgd.pdf ## ## - ex-tandmobMixture.R and ## https://www2.karlin.mff.cuni.cz/~komarek/software/bayesSurv/ex-tandmobMixture.pdf ##
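A hypothetical sketch of a call: the formula must be the one used in the original bayessurvreg1 fit, while the data frame of covariate combinations, the dummy response values, the grids, the amount of burn-in and the directory are placeholders:

## predictive expected event times and survivor curves for two treatment groups
library("survival")
pdata <- data.frame(T1 = c(1, 1), censor = c(1, 1),   ## dummy response values
                    trt = c(0, 1))                    ## covariate combinations
pgrid <- seq(1, 400, by = 2)
predictive(formula = Surv(T1, censor) ~ trt, data = pdata,
           grid = list(pgrid, pgrid),                 ## one grid per row of pdata
           predict = list(Et = TRUE, Surv = TRUE),
           quantile = c(0.025, 0.5, 0.975),
           skip = 500, dir = "/path/to/chain")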
This function computes predictive densities, survivor and hazard curves for specified combinations of covariates.
Firstly, either the function bayesBisurvreg
or the
function bayessurvreg2
or the function bayessurvreg3
has to be used to obtain a sample from the posterior distribution of unknown quantities.
Function predictive2.control
serves only to perform some input
checks inside the main function predictive2
.
predictive2(formula, random, obs.dim, time0, data = parent.frame(), grid, na.action = na.fail, Gspline, quantile = c(0, 0.025, 0.5, 0.975, 1), skip = 0, by = 1, last.iter, nwrite, only.aver = TRUE, predict = list(density=FALSE, Surv=TRUE, hazard=FALSE, cum.hazard=FALSE), dir, extens = "", extens.random="_b", version=0) predictive2Para(formula, random, obs.dim, time0, data = parent.frame(), grid, na.action = na.fail, Gspline, quantile = c(0, 0.025, 0.5, 0.975, 1), skip = 0, by = 1, last.iter, nwrite, only.aver = TRUE, predict = list(density=FALSE, Surv=TRUE, hazard=FALSE, cum.hazard=FALSE), dir, extens = "", extens.random="_b", version=0) predictive2.control(predict, only.aver, quantile, obs.dim, time0, Gspline, n)
formula |
the same formula as that one used to sample from the
posterior distribution of unknown quantities by the function
REMARK: the prediction must be asked for at least two combinations of covariates. This is the restriction imposed by one of the internal functions I use. |
||
random |
the same |
||
obs.dim |
a vector that has to be supplied if bivariate data were
used for estimation (using the function
|
||
time0 |
a vector of length |
||
data |
optional data frame in which to interpret the variables
occurring in the formulas. Usually, you create a new
|
||
grid |
a vector giving the grid of values where predictive
quantities are to be evaluated. The grid should normally start at some
value slightly higher than |
||
na.action |
function to be used to handle any |
||
Gspline |
a list specifying the G-spline used for the error distribution in the model. It is a list with the following components:
|
||
quantile |
a vector of quantiles that are to be computed for each predictive quantity. |
||
skip |
number of rows that should be skipped at the beginning of each *.sim file with the stored sample. |
||
by |
additional thinning of the sample. |
||
last.iter |
index of the last row from *.sim files that should be
used. If not specified, it is set to the maximum available
determined according to the file |
||
nwrite |
frequency with which the user is informed about the
progress of computation (every |
||
only.aver |
if The word of warning: with |
||
predict |
a list of logical values indicating which predictive quantities are to be computed. Components of the list:
|
||
dir |
directory where to search for files (‘mixmoment.sim’, ‘mweight.sim’, ‘mmean.sim’, ‘gspline.sim’, ‘beta.sim’, ‘D.sim’, ...) with the MCMC sample. |
||
extens |
an extension used to distinguish different sampled G-splines if more formulas were used in one MCMC simulation (e.g. with doubly-censored data).
|
||
extens.random |
only applicable if the function
This is an extension used to distinguish different sampled G-splines determining the distribution of the random intercept (under the presence of doubly-censored data).
|
||
version |
this argument indicates by which
|
||
n |
number of covariate combinations for which the prediction will be performed. |
A list with possibly the following components (what is included depends
on the value of the arguments predict
and only.aver
):
grid |
a vector with the grid values at which the survivor function, survivor density, hazard and cumulative hazard are computed. |
Surv |
predictive survivor functions. A~matrix with as many columns as length(grid) and as many rows as the number of covariate combinations for which the predictive quantities were asked. One row per covariate combination. |
density |
predictive survivor densities. The same structure as |
hazard |
predictive hazard functions. The same structure as |
cum.hazard |
predictive cumulative hazard functions. The same structure as |
quant.Surv |
pointwise quantiles for the predictive survivor functions. This is a list with as many components as the number of covariate combinations. One component per covariate combination. Each component of this list is a matrix with as many columns as
length(grid) and as many rows as the length of the argument
|
quant.density |
pointwise quantiles for the predictive survivor densities. The same structure as |
quant.hazard |
pointwise quantiles for the predictive hazard functions. The same structure as |
quant.cum.hazard |
pointwise quantiles for the predictive cumulative hazard functions. The same structure as |
Arnošt Komárek [email protected]
Komárek, A. (2006). Accelerated Failure Time Models for Multivariate Interval-Censored Data with Flexible Distributional Assumptions. PhD. Thesis, Katholieke Universiteit Leuven, Faculteit Wetenschappen.
Komárek, A. and Lesaffre, E. (2008). Bayesian accelerated failure time model with multivariate doubly-interval-censored data and flexible distributional assumptions. Journal of the American Statistical Association, 103, 523 - 533.
Komárek, A. and Lesaffre, E. (2006). Bayesian semi-parametric accelerated failurew time model for paired doubly interval-censored data. Statistical Modelling, 6, 3 - 22.
Komárek, A., Lesaffre, E., and Legrand, C. (2007). Baseline and treatment effect heterogeneity for survival times between centers using a random effects accelerated failure time model with flexible error distribution. Statistics in Medicine, 26, 5457 - 5472.
## See the description of R commands for ## the models described in ## Komarek (2006), ## Komarek and Lesaffre (2006), ## Komarek and Lesaffre (2008), ## Komarek, Lesaffre, and Legrand (2007). ## ## R commands available in the documentation ## directory of this package ## - ex-tandmobPA.R and ## https://www2.karlin.mff.cuni.cz/~komarek/software/bayesSurv/ex-tandmobPA.pdf ## - ex-tandmobCS.R and ## https://www2.karlin.mff.cuni.cz/~komarek/software/bayesSurv/ex-tandmobCS.pdf ## - ex-eortc.R and ## https://www2.karlin.mff.cuni.cz/~komarek/software/bayesSurv/ex-eortc.pdf
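A hypothetical sketch of a call: all names and values are placeholders, and gsp stands for a list describing the error G-spline as required by the Gspline argument above:

## predictive survivor and hazard curves for two covariate combinations
library("survival")
pdata <- data.frame(low = c(1, 1), upp = c(2, 2),     ## dummy response values
                    trt = c(0, 1))                    ## covariate combinations
pred <- predictive2(formula = Surv(low, upp, type = "interval2") ~ trt,
                    data = pdata, grid = seq(0.5, 400, by = 2),
                    Gspline = gsp,
                    predict = list(Surv = TRUE, hazard = TRUE),
                    skip = 500, dir = "/path/to/chain")
plot(pred$grid, pred$Surv[1, ], type = "l", xlab = "Time", ylab = "S(t)")
lines(pred$grid, pred$Surv[2, ], lty = 2)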
This function prints an object created by bayesDensity
.
## S3 method for class 'bayesDensity' print(x, ...)
x |
an object of class |
... |
this is here for a consistency with a generic function. |
No return value.
Arnošt Komárek [email protected]
According to the parametrization used, sample from the multivariate normal distribution.

The following parametrizations can be specified:

standard: in this case we sample from either N(mean, Sigma) or from N(mean, Q^{-1});

canonical: in this case we sample from N(Q^{-1} b, Q^{-1}), where b is the canonical mean supplied through the mean argument.
Generation of random numbers is performed by Algorithms 2.3-2.5 in Rue and Held (2005, pp. 34-35).
rMVNorm(n, mean=0, Sigma=1, Q, param=c("standard", "canonical"))
n |
number of observations to be sampled. |
mean |
For For |
Sigma |
covariance matrix of the multivariate normal
distribution. It is ignored if |
Q |
precision matrix of the multivariate normal distribution. It does not have to be supplied provided It must be supplied if |
param |
a character which specifies the parametrization. |
Matrix with sampled points in rows.
Arnošt Komárek [email protected]
Rue, H. and Held, L. (2005). Gaussian Markov Random Fields: Theory and Applications. Boca Raton: Chapman and Hall/CRC.
### Mean, covariance matrix, its inverse
### and the canonical mean
mu <- c(0, 2, 0.5)
L <- matrix(c(1, 1, 1, 0, 0.5, 0.5, 0, 0, 0.3), ncol=3)
Sigma <- L %*% t(L)
Q <- chol2inv(t(L))
b <- Q %*% mu

print(Sigma)
print(Q)
print(Sigma %*% Q)

### Sample using different parametrizations
set.seed(775988621)
n <- 10000

### Sample from N(mu, Sigma)
xx1 <- rMVNorm(n=n, mean=mu, Sigma=Sigma)
apply(xx1, 2, mean)
var(xx1)

### Sample from N(mu, Q^{-1})
xx2 <- rMVNorm(n=n, mean=mu, Q=Q)
apply(xx2, 2, mean)
var(xx2)

### Sample from N(Q^{-1}*b, Q^{-1})
xx3 <- rMVNorm(n=n, mean=b, Q=Q, param="canonical")
apply(xx3, 2, mean)
var(xx3)
Sample from the Wishart distribution

W ~ Wishart(df, S),

where df denotes the degrees of freedom of the Wishart distribution and S is its scale matrix. The same parametrization as in Gelman (2004) is assumed, that is, if W ~ Wishart(df, S) then E(W) = df * S.

In the univariate case, Wishart(df, S) is the same as Gamma(df/2, 1/(2*S)) (shape/rate parametrization).

Generation of random numbers is performed by the algorithm described in Ripley (1987, p. 99).
rWishart(n, df, S)
n |
number of observations to be sampled. |
df |
degrees of freedom of the Wishart distribution. |
S |
scale matrix of the Wishart distribution. |
Matrix with sampled points (lower triangles of the sampled Wishart matrices) in rows.
Arnošt Komárek [email protected]
Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2004). Bayesian Data Analysis, Second edition. Boca Raton: Chapman and Hall/CRC.
Ripley, B. D. (1987). Stochastic Simulation. New York: John Wiley and Sons.
### The same as rgamma(n, shape=df/2, rate=1/(2*S))
n <- 1000
df <- 1
S <- 3

w <- rWishart(n=n, df=df, S=S)
mean(w)    ## should be close to df*S
var(w)     ## should be close to 2*df*S^2

### Multivariate Wishart
n <- 1000
df <- 2
S <- matrix(c(1,3,3,13), nrow=2)

w <- rWishart(n=n, df=df, S=S)
apply(w, 2, mean)    ## should be close to df*S
df*S

df <- 2.5
S <- matrix(c(1,2,3,2,20,26,3,26,70), nrow=3)

w <- rWishart(n=n, df=df, S=S)
apply(w, 2, mean)    ## should be close to df*S
df*S
This function computes a sample covariance matrix.
sampleCovMat(sample)
sample |
a matrix or data frame with sampled points in rows (one row per observation). |
When $\mathbf{y}_1, \dots, \mathbf{y}_n$ is a sequence of $p$-dimensional vectors (stored in the rows of sample), the sample covariance matrix $\mathbf{S}$ is equal to

$$\mathbf{S} = \frac{1}{n-1} \sum_{i=1}^{n} (\mathbf{y}_i - \bar{\mathbf{y}}) (\mathbf{y}_i - \bar{\mathbf{y}})^{T},$$

where

$$\bar{\mathbf{y}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{y}_i.$$

When $n = 1$, the function returns just the sum of squares.
This function returns a matrix.
Arnošt Komárek [email protected]
## Sample some values
z1 <- rnorm(100, 0, 1)      ## first components of y
z2 <- rnorm(100, 5, 2)      ## second components of y
z3 <- rnorm(100, 10, 0.5)   ## third components of y

## Put them into a data.frame
sample <- data.frame(z1, z2, z3)

## Compute a sample covariance matrix
sampleCovMat(sample)
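Assuming the usual 1/(n-1) denominator shown above, the result should agree with the built-in var() function, which provides a quick sanity check.

## Sanity check (assuming the 1/(n-1) denominator given above):
## the result should agree with stats::var up to attributes
all.equal(sampleCovMat(sample), var(sample), check.attributes=FALSE)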
This function computes an estimate of the residual (after adjustment
for covariates) Kendall's tau for the bivariate survival model fitted
using the functions bayesHistogram
or
bayesBisurvreg
.
For both of these functions, the argument prior$specification must be equal to 2!
When $F$ is a bivariate distribution function, the population version of Kendall's tau is defined as

$$\tau = 4 \int F \, dF - 1.$$

For the model estimated using one of the above mentioned functions, the value of Kendall's tau at each iteration of MCMC is equal to

$$\tau = 4 \sum_{k_1=-K_1}^{K_1} \sum_{k_2=-K_2}^{K_2} \sum_{l_1=-K_1}^{K_1} \sum_{l_2=-K_2}^{K_2} w_{k_1,k_2}\, w_{l_1,l_2}\, \Phi\!\left(\frac{\mu_{1,k_1} - \mu_{1,l_1}}{\sqrt{2}\,\sigma_1}\right) \Phi\!\left(\frac{\mu_{2,k_2} - \mu_{2,l_2}}{\sqrt{2}\,\sigma_2}\right) - 1,$$

where $\mu_{1,-K_1}, \dots, \mu_{1,K_1}$ are the knots in the first margin, $\mu_{2,-K_2}, \dots, \mu_{2,K_2}$ are the knots in the second margin, $\sigma_1$ is the basis standard deviation in the first margin, $\sigma_2$ is the basis standard deviation in the second margin, and $w_{k_1,k_2}$ are the G-spline weights.
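As an illustration of the formula above, it can be evaluated directly for a small bivariate normal mixture; the knots, basis standard deviations and weights below are made up for the example and do not come from any fitted model.

### Direct evaluation of the Kendall's tau formula above for a small
### (made-up) bivariate normal mixture; not a replacement for
### sampled.kendall.tau, just an illustration of the formula.
mu1 <- c(-1, 0, 1)           ## knots in the first margin
mu2 <- c(-1, 0, 1)           ## knots in the second margin
sigma1 <- 0.5                ## basis standard deviation, first margin
sigma2 <- 0.5                ## basis standard deviation, second margin
w <- matrix(c(0.2, 0.1, 0.0,
              0.1, 0.2, 0.1,
              0.0, 0.1, 0.2), nrow=3, byrow=TRUE)   ## weights w[k1, k2]
w <- w / sum(w)

tau <- -1
for (k1 in 1:3) for (k2 in 1:3) for (l1 in 1:3) for (l2 in 1:3) {
  tau <- tau + 4 * w[k1, k2] * w[l1, l2] *
    pnorm((mu1[k1] - mu1[l1]) / (sqrt(2) * sigma1)) *
    pnorm((mu2[k2] - mu2[l2]) / (sqrt(2) * sigma2))
}
print(tau)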
sampled.kendall.tau(dir = getwd(), extens = "", K, skip = 0, by = 1, last.iter, nwrite)
dir |
directory where to search for files (‘mixmoment.sim’, ‘mweight.sim’, ‘mmean.sim’, ‘gspline.sim’) with the MCMC sample. |
extens |
an extension used to distinguish different sampled G-splines if more G-splines were used in one simulation (e.g. with doubly-censored data). It selects the G-spline for which Kendall's tau is to be computed, e.g. extens = "" for the first G-spline and extens = "_2" for the second one (whose file names carry the suffix "_2"). |
K |
a vector of length 2 specifying the number of knots on each side of the middle knot for each dimension of the G-spline. |
skip |
number of rows that should be skipped at the beginning of each *.sim file with the stored sample. |
by |
additional thinning of the sample. |
last.iter |
index of the last row from the *.sim files that should be
used. If not specified, it is set to the maximum number of iterations available in the sampled files. |
nwrite |
frequency with which the user is informed about the
progress of the computation (every nwrite-th iteration). |
A vector with the sampled values of Kendall's tau.
Arnošt Komárek [email protected]
Komárek, A. (2006). Accelerated Failure Time Models for Multivariate Interval-Censored Data with Flexible Distributional Assumptions. PhD. Thesis, Katholieke Universiteit Leuven, Faculteit Wetenschappen.
Komárek, A. and Lesaffre, E. (2006). Bayesian semi-parametric accelerated failure time model for paired doubly interval-censored data. Statistical Modelling, 6, 3–22.
## See the description of R commands for
## the models described in
## Komarek (2006),
## Komarek and Lesaffre (2006),
##
## R commands available
## in the documentation
## directory of this package
##  - see ex-tandmobPA.R and
##    https://www2.karlin.mff.cuni.cz/~komarek/software/bayesSurv/ex-tandmobPA.pdf
##
Read numeric data into a data frame from a file. A header line is assumed to be present in the file.
scanFN(file, quiet=FALSE)
file |
the name of a file to read data values from. If the
specified file is "", then input is taken from the keyboard (or stdin if input is redirected). Otherwise, the file name is interpreted relative to the
current working directory (given by getwd()), unless it specifies an absolute path; the file can also be a complete URL.

To read a data file not in the current encoding (for example a
Latin-1 file in a UTF-8 locale, or conversely), use a file connection with its encoding argument set appropriately.
|
quiet |
logical: if FALSE (default), a line saying how many items have been read is printed. |
See scan
.
data.frame
with read data values.
Arnošt Komárek [email protected]
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
cat("x y z", "1 2 3", "1 4 6", "10 20 30", file="ex.data", sep="\n") pp <- scanFN("ex.data", quiet=FALSE) pp <- scanFN("ex.data", quiet= TRUE) print(pp) unlink("ex.data") # tidy up
cat("x y z", "1 2 3", "1 4 6", "10 20 30", file="ex.data", sep="\n") pp <- scanFN("ex.data", quiet=FALSE) pp <- scanFN("ex.data", quiet= TRUE) print(pp) unlink("ex.data") # tidy up
The p-value is computed as 1 minus the credible level of the credible region which just covers the point (0, 0, ..., 0)'.
The function also returns the simultaneous credible region (rectangle) with a specified credible level.
simult.pvalue(sample, precision=0.001, prob=0.95)

## S3 method for class 'simult.pvalue'
print(x, ...)
sample |
a data frame or matrix with sampled values (one column = one parameter) |
precision |
precision with which the p-value is to be computed |
prob |
probability for which the credible region is to be computed |
x |
an object of class simult.pvalue |
... |
who knows |
An object of class 'simult.pvalue'.
Arnošt Komárek [email protected]
Besag, J., Green, P., Higdon, D., and Mengersen, K. (1995). Bayesian computation and stochastic systems (with Discussion). Statistical Science, 10, 3–66 (page 30).
Held, L. (2004). Simultaneous posterior probability statements from Monte Carlo output. Journal of Computational and Graphical Statistics, 13, 20 - 35.
m <- 1000
sample <- data.frame(x1=rnorm(m), x2=rnorm(m), x3=rnorm(m))
simult.pvalue(sample)

sample <- data.frame(x1=rnorm(m), x2=rnorm(m), x3=rnorm(m, mean=2))
simult.pvalue(sample)

sample <- data.frame(x1=rnorm(m), x2=rnorm(m), x3=rnorm(m, mean=5))
simult.pvalue(sample, prob=0.99, precision=0.0001)
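To make the construction concrete, the following is a minimal sketch of the order-statistics based simultaneous credible rectangle of Besag et al. (1995) and Held (2004), on which this kind of p-value is based; simultCR is a hypothetical helper, not the internal code of simult.pvalue.

### A minimal sketch (simultCR is a hypothetical helper, NOT the internal code
### of simult.pvalue) of the order-statistics construction of a simultaneous
### credible rectangle (Besag et al., 1995; Held, 2004).
simultCR <- function(sample, prob=0.95)
{
  sample <- as.matrix(sample)
  m <- nrow(sample)
  ranks <- apply(sample, 2, rank)                          ## componentwise ranks
  extremity <- apply(pmax(ranks, m + 1 - ranks), 1, max)   ## how extreme each draw is
  k <- sort(extremity)[ceiling(prob * m)]                  ## rank bound giving >= prob coverage
  lower <- apply(sample, 2, function(x) sort(x)[m + 1 - k])
  upper <- apply(sample, 2, function(x) sort(x)[k])
  rbind(lower=lower, upper=upper)
}

sample <- data.frame(x1=rnorm(1000), x2=rnorm(1000), x3=rnorm(1000, mean=2))
simultCR(sample, prob=0.95)    ## a 95% simultaneous credible rectangle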
This is the dataset resulting from a longitudinal prospective dental study performed in Flanders (North of Belgium) in 1996–2001. The cohort of 4,468 randomly sampled children who attended the first year of the basic school at the beginning of the study was annually dentally examined by one of 16 trained dentists. The original dataset thus consists of at most 6 dental observations per child.
The dataset presented here contains mainly the information on emergence and caries times, summarized as interval-censored observations. Some baseline covariates are also included.
For more detail on the design of the study see Vanobbergen et al. (2000).
This data set was used in the analyses presented in Komárek et al. (2005), in Lesaffre, Komárek, and Declerck (2005) and in Komárek and Lesaffre (2007).
IMPORTANT NOTICE: It is possible to use these data for your
research work under the condition that each manuscript is first
approved by
Prof. Emmanuel Lesaffre
Leuven Biostatistics and statistical Bioinformatics Centre (L-BioStat)
Katholieke Universiteit Leuven
Kapucijnenvoer 35
B-3000 Leuven
Belgium
<[email protected]
>
data(tandmob2)
a data frame with 4,430 rows (38 sampled children did not come to any of the designed dental examinations) and the following variables:
identification number of a child
character boy or girl
numeric, 0 = boy, 1 = girl
character, date of birth in the format DDmmmYY
factor, code of the province with
Antwerpen
Vlaams Brabant
Limburg
Oost Vlaanderen
West Vlaanderen
factor, code of the educational system with
Free
Community school
Province/council school
factor, code indicating the starting age of brushing the teeth (as reported by parents) with
[0, 1] years
(1, 2] years
(2, 3] years
(3, 4] years
(4, 5] years
later than at the age of 5
binary covariate, 0 = no, 1 = yes. This is the covariate fluorosis used in the paper Komárek et al. (2005).
binary, indicator whether a deciduous tooth xx was removed because of orthodontic reasons or not.
xx takes values 53, 63, 73, 83 (deciduous lateral canines), 54, 64, 74, 84 (deciduous first molars), 55, 65, 75, 85 (deciduous second molars).
lower limit of the emergence (in years of age) of the
permanent tooth xx. NA
if the emergence was left-censored.
xx takes values 11, 21, 31, 41 (permanent incisors), 12, 22, 32, 42 (permanent central canines), 13, 23, 33, 43 (permanent lateral canines), 14, 24, 34, 44 (permanent first premolars), 15, 25, 35, 45 (permanent second premolars), 16, 26, 36, 46 (permanent first molars), 17, 27, 37, 47 (permanent second molars).
upper limit of the emergence (in years of age) of the
permanent tooth xx. NA
if the emergence was right-censored.
xx takes values as for the variable EBEG.xx
.
lower limit for the caries time (in years of age, ‘F’
stands for ‘failure’) of the permanent tooth xx. NA
if the
caries time was left-censored.
xx takes values as for the variable EBEG.xx
.
upper limit for the caries time (in years of age, ‘F’
stands for ‘failure’) of the permanent tooth xx. NA
if the
caries time was right-censored.
xx takes values as for the variable EBEG.xx
.
Unfortunately, for all teeth except 16, 26, 36 and 46 almost all the caries times are right-censored. For teeth 16, 26, 36, 46, the amount of right-censoring is only about 25%.
indicator whether a deciduous tooth xx was decayed, missing due to caries, or filled at, or before, the last examination preceding the first examination at which the emergence of the permanent successor was recorded.
xx takes values 53, 63, 73, 83 (deciduous lateral incisors), 54, 64, 74, 84 (deciduous first molars), 55, 65, 75, 85 (deciduous second molars).
indicator whether a deciduous tooth xx was removed for orthodontic reasons or decayed at, or before, the last examination preceding the first examination at which the emergence of the permanent successor was recorded.
Leuven Biostatistics and statistical Bioinformatics Centre (L-BioStat), Katholieke Universiteit Leuven, Kapucijnenvoer 35, 3000 Leuven, Belgium
URL:
https://gbiomed.kuleuven.be/english/research/50000687/50000696/
Data collection was supported by Unilever, Belgium. The Signal Tandmobiel project comprises the following partners: D. Declerck (Dental School, Catholic University Leuven), L. Martens (Dental School, University Ghent), J. Vanobbergen (Oral Health Promotion and Prevention, Flemish Dental Association), P. Bottenberg (Dental School, University Brussels), E. Lesaffre (Biostatistical Centre, Catholic University Leuven), K. Hoppenbrouwers (Youth Health Department, Catholic University Leuven; Flemish Association for Youth Health Care).
Komárek, A., Lesaffre, E., Härkänen, T., Declerck, D., and Virtanen, J. I. (2005). A Bayesian analysis of multivariate doubly-interval-censored dental data. Biostatistics, 6, 145–155.
Komárek, A. and Lesaffre, E. (2007). Bayesian accelerated failure time model for correlated interval-censored data with a normal mixture as an error distribution. Statistica Sinica, 17, 549–569.
Lesaffre, E., Komárek, A., and Declerck, D. (2005). An overview of methods for interval-censored data with an emphasis on applications in dentistry. Statistical Methods in Medical Research, 14, 539–552.
Vanobbergen, J., Martens, L., Lesaffre, E., and Declerck, D. (2000). The Signal-Tandmobiel project – a longitudinal intervention health promotion study in Flanders (Belgium): baseline and first year results. European Journal of Paediatric Dentistry, 2, 87–96.
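For orientation, the interval-censored emergence times can be inspected directly after loading the data; the sketch below assumes the EBEG.xx naming described above (e.g. EBEG.16 for the lower emergence limit of tooth 16).

## A quick look at the data; EBEG.16 is the lower emergence limit of
## tooth 16 (see the description of EBEG.xx above).
data(tandmob2)
dim(tandmob2)
summary(tandmob2$EBEG.16)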
This is the dataset resulting from a longitudinal prospective dental study performed in Flanders (North of Belgium) in 1996–2001. The cohort of 4,468 randomly sampled children who attended the first year of the basic school at the beginning of the study was annually dentally examined by one of 16 trained dentists. The original dataset thus consists of at most 6 dental observations per child.
The dataset presented here contains mainly the information on emergence and caries times, summarized as interval-censored observations. Some baseline covariates are also included.
For more detail on the design of the study see Vanobbergen et al. (2000).
This is the version of the dataset first used by Leroy et al. (2005)
and contains a subset of tandmob2. Some children
were removed to satisfy the inclusion criteria given in Leroy et
al. (2005). Additionally, left-censored emergence times of the
permanent first molars are adjusted according to the eruption stage
(see Leroy et al., 2005).
This data set was then used in the analyses presented in Komárek and Lesaffre (2006, 2008).
IMPORTANT NOTICE: It is possible to use these data for your
research work under the condition that each manuscript is first
approved by
Prof. Emmanuel Lesaffre
Leuven Biostatistics and statistical Bioinformatics Centre (L-BioStat)
Katholieke Universiteit Leuven
Kapucijnenvoer 35
B-3000 Leuven
Belgium
<[email protected]
>
data(tandmobRoos)
a data frame with 4,394 rows and the following variables:
identification number of a child
character boy or girl
character, date of birth in the format DDmmmYY
factor, code of the province with
Antwerpen
Vlaams Brabant
Limburg
Oost Vlaanderen
West Vlaanderen
factor, code of the educational system with
Free
Community school
Province/council school
numeric, 0 = boy, 1 = girl
lower limit of the emergence (in years of age) of the
permanent tooth xx. In contrast to tandmob2, the lower
emergence limits of the permanent first molars that were originally
left-censored are adjusted according to the eruption stage (see
Leroy et al., 2005 for more details).
xx takes values 16, 26, 36, 46 (permanent first molars).
upper limit of the emergence (in years of age) of the
permanent tooth xx. NA
if the emergence was right-censored.
xx takes values as for the variable EBEG.xx
.
lower limit for the caries time (in years of age, ‘F’
stands for ‘failure’) of the permanent tooth xx. NA
if the
caries time was left-censored.
xx takes values as for the variable EBEG.xx
.
upper limit for the caries time (in years of age, ‘F’
stands for ‘failure’) of the permanent tooth xx. NA
if the
caries time was right-censored.
xx takes values as for the variable EBEG.xx
.
Unfortunately, for all teeth except 16, 26, 36 and 46 almost all the caries times are right-censored. For teeth 16, 26, 36, 46, the amount of right-censoring is only about 25%.
numeric, 0 or 1. Equal to 1 if the information concerning the permanent tooth was available, 0 if the permanent tooth xx was removed from the dataset by Kris.
xx takes values 16, 26, 36, 46.
These variables are almost useless for ordinary users.
numeric, 0 or 1. It is equal to 1 if the deciduous tooth xx was decayed, 0 otherwise.
xx takes values 54, 64, 74, 84 (deciduous first molars), 55, 65, 75, 85 (deciduous second molars).
numeric, 0 or 1. It is equal to 1 if the deciduous tooth xx was missing due to caries, 0 otherwise.
xx takes values 54, 64, 74, 84 (deciduous first molars), 55, 65, 75, 85 (deciduous second molars).
numeric, 0 or 1. It is equal to 1 if the deciduous tooth xx was filled, 0 otherwise.
xx takes values 54, 64, 74, 84 (deciduous first molars), 55, 65, 75, 85 (deciduous second molars).
numeric, 0 or 1. It is equal to 1 if the deciduous tooth xx was sound, 0 otherwise.
xx takes values 54, 64, 74, 84 (deciduous first molars), 55, 65, 75, 85 (deciduous second molars).
numeric, 0 or 1. It is equal to 1 if the permanent first molar xx was sealed in pits and fissures (a form of protection), 0 otherwise.
xx takes values 16, 26, 36, 46 (permanent first molars).
numeric, 0 or 1. It is equal to 1 if the child brushes his/her teeth daily, and 0 if he/she brushes less than once a day.
numeric, 0 or 1. It is equal to 1 if there was occlusal plaque in pits and fissures of the permanent tooth xx. It is equal to 0 if there was either no plaque present or the plaque was present on the total occlusal surface.
xx takes values 16, 26, 36, 46 (permanent first molars).
numeric, 0 or 1. It is equal to 1 if there was occlusal plaque on the total occlusal surface of the permanent tooth xx. It is equal to 0 if there was either no plaque present or the plaque was present only in pits and fissures.
xx takes values 16, 26, 36, 46 (permanent first molars).
Leuven Biostatistics and statistical Bioinformatics Centre (L-BioStat), Katholieke Universiteit Leuven, Kapucijnenvoer 35, 3000 Leuven, Belgium
URL:
https://gbiomed.kuleuven.be/english/research/50000687/50000696/
Data collection was supported by Unilever, Belgium. The Signal Tandmobiel project comprises the following partners: D. Declerck (Dental School, Catholic University Leuven), L. Martens (Dental School, University Ghent), J. Vanobbergen (Oral Health Promotion and Prevention, Flemish Dental Association), P. Bottenberg (Dental School, University Brussels), E. Lesaffre (Biostatistical Centre, Catholic University Leuven), K. Hoppenbrouwers (Youth Health Department, Catholic University Leuven; Flemish Association for Youth Health Care).
Komárek, A. and Lesaffre, E. (2008). Bayesian accelerated failure time model with multivariate doubly-interval-censored data and flexible distributional assumptions. Journal of the American Statistical Association, 103, 523–533.
Komárek, A. and Lesaffre, E. (2006). Bayesian semi-parametric accelerated failure time model for paired doubly interval-censored data. Statistical Modelling, 6, 3–22.
Leroy, R., Bogaerts, K., Lesaffre, E., and Declerck, D. (2005). Effect of caries experience in primary molars on cavity formation in the adjacent permanent first molar. Caries Research, 39, 342–349.
Vanobbergen, J., Martens, L., Lesaffre, E., and Declerck, D. (2000). The Signal-Tandmobiel project – a longitudinal intervention health promotion study in Flanders (Belgium): baseline and first year results. European Journal of Paediatric Dentistry, 2, 87–96.
Displays a plot of iterations vs. sampled values for each variable in the chain, with a separate plot per variable.
This is a slightly modified version of the traceplot
function from the coda
package, adjusted to conform to my personal preferences.
traceplot2(x, chains, bty = "n", main, xlab, ...)
x |
an mcmc or mcmc.list object (see the coda package) containing the sampled values. |
chains |
indices of chains from the object that are to be plotted. |
bty , main , xlab , ...
|
further arguments passed to the underlying plotting function. |
No return value.
Arnošt Komárek [email protected]
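A hypothetical usage sketch follows (commented out because it needs a stored chain); the file name and its columns are made up for the example, and the coda package is assumed to be installed.

## Hypothetical usage sketch (not run): 'beta.sim' and its columns are made up.
## library(coda)
## beta <- mcmc(scanFN("beta.sim", quiet=TRUE))
## traceplot2(beta, chains=1, main="Traceplot", xlab="Iteration")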
Components of a bivariate G-spline can be indexed in several
ways. Suppose that the knots in the first dimension are
$\mu_{1,-K_1}, \dots, \mu_{1,K_1}$
and the knots in the second dimension are
$\mu_{2,-K_2}, \dots, \mu_{2,K_2}$,
i.e. we have $2K_1 + 1$ knots in the first dimension and $2K_2 + 1$ knots in the second dimension. Each G-spline
component can have a double index $(k_1, k_2)$ assigned, with $k_1 \in \{-K_1, \dots, K_1\}$ and $k_2 \in \{-K_2, \dots, K_2\}$,
which means that it corresponds to the knot $(\mu_{1,k_1}, \mu_{2,k_2})'$;
alternatively, the same G-spline component can have a single index $r$
assigned, where $r$ takes values from
$\{1, \dots, (2K_1+1)(2K_2+1)\}$. Single indexing is used
for example by files
r.sim
and r_2.sim
generated by
functions bayesHistogram
, bayesBisurvreg
,
bayessurvreg2
to save some space.
This function serves to translate the single indices to the double indices using the relationship

$$r = (k_2 + K_2)\,(2K_1 + 1) + k_1 + K_1 + 1.$$

The function can also be used in the one-dimensional case, where the simple relationship

$$r = k + K + 1$$

holds.
vecr2matr(vec.r, KK)
vec.r |
a vector of single indices. |
KK |
a vector with the numbers of knots on each side of the central
knot for each dimension of the G-spline. The length of KK determines the dimension of the G-spline (1 or 2). |
In the bivariate case: a matrix with two columns and as many rows as the length of vec.r.
In the univariate case: a vector with as many components as the length of vec.r.
Arnošt Komárek [email protected]
### Bivariate G-spline
### with 31 knots in each dimension
KK <- c(15, 15)

### First observation in component (-15, -15),
### second observation in component (15, 15),
### third observation in component (0, 0)
vec.r <- c(1, 961, 481)
vecr2matr(vec.r, KK)
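The same translation can also be done by hand from the relationship given above, which may help to see what the single index encodes; this is only an illustration, not a replacement for vecr2matr.

### The same translation done directly from the relationship
### r = (k2 + K2)*(2*K1 + 1) + k1 + K1 + 1 given above.
K1 <- 15; K2 <- 15
r  <- c(1, 961, 481)
k2 <- (r - 1) %/% (2*K1 + 1) - K2
k1 <- (r - 1) %%  (2*K1 + 1) - K1
cbind(k1, k2)    ## should match vecr2matr(r, c(K1, K2))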