Title: Markov Chain Monte Carlo
Description: Simulates continuous distributions of random vectors using Markov chain Monte Carlo (MCMC). Users specify the distribution by an R function that evaluates the log unnormalized density. Algorithms are random walk Metropolis algorithm (function metrop), simulated tempering (function temper), and morphometric random walk Metropolis (Johnson and Geyer, 2012, <doi:10.1214/12-AOS1048>, function morph.metrop), which achieves geometric ergodicity by change of variable.
Authors: Charles J. Geyer <[email protected]> and Leif T. Johnson <[email protected]>
Maintainer: Charles J. Geyer <[email protected]>
License: MIT + file LICENSE
Version: 0.9-8
Built: 2024-11-14 06:25:02 UTC
Source: CRAN
Like it says
data(foo)
A data frame with variables
x1: quantitative predictor.
x2: quantitative predictor.
x3: quantitative predictor.
y: Bernoulli response.
library(mcmc)
data(foo)
out <- glm(y ~ x1 + x2 + x3, family = binomial, data = foo)
summary(out)
Variance of sample mean of functional of reversible Markov chain using methods of Geyer (1992).
initseq(x)
x: a numeric vector that is a scalar-valued functional of a reversible Markov chain.
Let gamma[k] = Cov(X[i], X[i + k]), considered as a function of the lag k, be
the autocovariance function of the input time series. Define

    Gamma[k] = gamma[2 k] + gamma[2 k + 1],

the sum of consecutive pairs of autocovariances. Then Theorem 3.1 in
Geyer (1992) says that Gamma[k], considered as a function of k,
is strictly positive, strictly decreasing, and strictly convex,
assuming the input time series is a scalar-valued functional of a reversible Markov
chain. All of the MCMC done by this package is reversible.
This R function estimates the “big gamma” function, Gamma[k] considered as a
function of k, subject to three different constraints: (1) nonnegative,
(2) nonnegative and nonincreasing, and (3) nonnegative, nonincreasing,
and convex. It also estimates the asymptotic variance in the Markov chain
central limit theorem (CLT),

    sigma^2 = gamma[0] + 2 * sum(gamma[k], k >= 1)
            = - gamma[0] + 2 * sum(Gamma[k], k >= 0).
Note: The batch means produced by metrop are also scalar-valued functionals
of a reversible Markov chain. Thus these initial sequence estimators applied
to the batch means give valid standard errors for the mean of the batch means
even when the batch length is too short to provide a valid estimate of
asymptotic variance. One does, of course, have to multiply the asymptotic
variance of the batch means by the batch length to get the asymptotic
variance for the unbatched chain.
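As a quick sketch of the pattern just described (assuming a standard normal
target, which is not part of this help page), one can apply initseq to the
batch means from a metrop run and scale up by the batch length:

library(mcmc)
# log unnormalized density of a standard normal target (an assumed example)
lud <- function(x) - sum(x^2) / 2
out <- metrop(lud, initial = 0, nbatch = 1000, blen = 10, scale = 1)
bm <- out$batch[ , 1]            # batch means, a scalar functional of the chain
# asymptotic variance for the unbatched chain: batch length times the
# asymptotic variance estimated from the batch means
initseq(bm)$var.con * out$blen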
a list containing the following components:
gamma0: the scalar gamma[0], the marginal variance of the time series.
Gamma.pos: the vector of estimates of Gamma[k] subject to the constraint of being nonnegative.
Gamma.dec: the vector of estimates of Gamma[k] subject to the constraint of being nonnegative and nonincreasing.
Gamma.con: the vector of estimates of Gamma[k] subject to the constraint of being nonnegative, nonincreasing, and convex.
var.pos: the scalar estimate of the asymptotic variance sigma^2 in the Markov chain CLT corresponding to Gamma.pos.
var.dec: the scalar estimate of sigma^2 corresponding to Gamma.dec.
var.con: the scalar estimate of sigma^2 corresponding to Gamma.con.
Not precisely a bug, but var.pos
, var.dec
, and var.con
can be negative. This happens only when the chain is way too short to estimate
the variance, and even then rarely. But it does happen.
Geyer, C. J. (1992) Practical Markov Chain Monte Carlo. Statistical Science 7 473–483.
n <- 2e4
rho <- 0.99
x <- arima.sim(model = list(ar = rho), n = n)
out <- initseq(x)
## Not run:
plot(seq(along = out$Gamma.pos) - 1, out$Gamma.pos,
    xlab = "k", ylab = expression(Gamma[k]), type = "l")
lines(seq(along = out$Gamma.dec) - 1, out$Gamma.dec, col = "red")
lines(seq(along = out$Gamma.con) - 1, out$Gamma.con, col = "blue")
## End(Not run)
# asymptotic 95% confidence interval for mean of x
mean(x) + c(-1, 1) * qnorm(0.975) * sqrt(out$var.con / length(x))
# estimated asymptotic variance
out$var.con
# theoretical asymptotic variance
(1 + rho) / (1 - rho) * 1 / (1 - rho^2)
# illustrating use with batch means
bm <- apply(matrix(x, nrow = 5), 2, mean)
initseq(bm)$var.con * 5
Like it says
data(logit)
A data frame with variables
x1: quantitative predictor.
x2: quantitative predictor.
x3: quantitative predictor.
x4: quantitative predictor.
y: Bernoulli response.
library(mcmc)
data(logit)
out <- glm(y ~ x1 + x2 + x3 + x4, family = binomial, data = logit)
summary(out)
Markov chain Monte Carlo for continuous random vector using a Metropolis algorithm.
metrop(obj, initial, nbatch, blen = 1, nspac = 1, scale = 1, outfun,
    debug = FALSE, ...)

## S3 method for class 'function'
metrop(obj, initial, nbatch, blen = 1, nspac = 1, scale = 1, outfun,
    debug = FALSE, ...)

## S3 method for class 'metropolis'
metrop(obj, initial, nbatch, blen = 1, nspac = 1, scale = 1, outfun,
    debug = FALSE, ...)
obj: either an R function or an object of class "metropolis" from a previous
invocation of this function. If a function, it evaluates the log unnormalized
probability density of the desired equilibrium distribution of the Markov chain.
Its first argument is the state vector of the Markov chain; other arguments are
arbitrary and taken from the ... arguments of this function. It should return
-Inf for points of the state space having probability zero under the desired
equilibrium distribution. If an object of class "metropolis", any missing
arguments are taken from it and the run is continued from its final state
(see Details).

initial: a real vector, the initial state of the Markov chain. Must be
feasible, see Details. Ignored if obj is of class "metropolis".

nbatch: the number of batches.

blen: the length of batches.

nspac: the spacing of iterations that contribute to batches.

scale: controls the proposal step size. If a scalar or vector, the proposal
is x + scale * z, where x is the current state and z is a standard normal
random vector; if a matrix, the proposal is x + scale %*% z.

outfun: controls the output. If a function, then the batch means of
outfun(state, ...) are returned. If a numeric or logical vector, then the
batch means of state[outfun] are returned (if this makes sense). If missing,
the batch means of state are returned.

debug: if TRUE, extra output useful for testing is returned.

...: additional arguments for obj or outfun.
Runs a “random-walk” Metropolis algorithm, terminology introduced
by Tierney (1994), with multivariate normal proposal
producing a Markov chain with equilibrium distribution having a specified
unnormalized density. The distribution must be continuous. The support of
the distribution is the support of the density specified by argument obj.
The initial state must satisfy obj(state, ...) > -Inf.
A description of a complete MCMC analysis (Bayesian logistic regression)
using this function can be found in the vignette demo, which is shown by
vignette("demo", "mcmc").
Suppose the function coded by the log unnormalized function (either obj
or obj$lud) is actually a log unnormalized density, that is, if h denotes
that function, then exp(h) integrates to some value strictly between zero
and infinity. Then the metrop function always simulates a reversible,
Harris ergodic Markov chain having the equilibrium distribution with this
log unnormalized density. The chain is not guaranteed to be geometrically
ergodic. In fact it cannot be geometrically ergodic if the tails of the log
unnormalized density are sufficiently heavy. The morph.metrop function
deals with this situation.
an object of class "mcmc", subclass "metropolis", which is a list
containing at least the following components:
accept: fraction of Metropolis proposals accepted.
batch: an nbatch by p matrix, the batch means, where p is the dimension of
the result of outfun if outfun is a function, the dimension of state[outfun]
if outfun is a numeric or logical vector, or the dimension of the state if
outfun is missing.
accept.batch: a vector of length nbatch, the batch means of the acceptance
indicators.
initial: value of argument initial.
final: final state of Markov chain.
initial.seed: value of .Random.seed before the run.
final.seed: value of .Random.seed after the run.
time: running time of Markov chain from system.time().
lud: the function used to calculate log unnormalized density, either obj
or obj$lud from a previous run.
nbatch: the argument nbatch or obj$nbatch.
blen: the argument blen or obj$blen.
nspac: the argument nspac or obj$nspac.
outfun: the argument outfun or obj$outfun.
Description of additional output when debug = TRUE can be found in the
vignette debug, which is shown by vignette("debug", "mcmc").
If outfun is missing or not a function, then the log unnormalized
density function can be defined without a ... argument and that works fine.
One can define it starting ludfun <- function(state) and that works,
or starting ludfun <- function(state, foo, bar), where foo and bar
are supplied as additional arguments to metrop, and that works too.
If outfun is a function, then both it and the log unnormalized
density function can be defined without ... arguments if they
have exactly the same argument lists, and that works fine. Otherwise it
doesn't work. Define these functions by

ludfun <- function(state, foo)
outfun <- function(state, bar)

and you get an error about unused arguments. Instead define these functions by

ludfun <- function(state, foo, ...)
outfun <- function(state, bar, ...)

and supply foo and bar as additional arguments to metrop, and that works fine.
In short, the log unnormalized density function and outfun need
to have ... in their argument lists to be safe. Sometimes it works
when ... is left out and sometimes it doesn't.
Of course, one can avoid this whole issue by always defining the log
unnormalized density function and outfun
to have only one argument
state
and use global variables (objects in the R global environment) to
specify any other information these functions need to use. That too
follows the R way. But some people consider that bad programming practice.
A third option is to define either or both of these functions using a function
factory. This is demonstrated in the vignette for this package named
demo
, which is shown by vignette("demo", "mcmc")
.
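For concreteness, a minimal sketch of the safe pattern described above
(the extra arguments foo and bar here are hypothetical, standing for the
mean and standard deviation of a normal target):

# both functions carry ... so unused extra arguments are silently absorbed
ludfun <- function(state, foo, bar, ...) sum(dnorm(state, foo, bar, log = TRUE))
outfun <- function(state, bar, ...) state^2   # batch means of second moments
out <- metrop(ludfun, initial = 0, nbatch = 100, blen = 100,
    outfun = outfun, foo = 0, bar = 2)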
This function follows the philosophy of MCMC explained in the introductory chapter of the Handbook of Markov Chain Monte Carlo (Geyer, 2011).
This function automatically does batch means in order to reduce
the size of output and to enable easy calculation of Monte Carlo standard
errors (MCSE), which measure error due to the Monte Carlo sampling (not
error due to statistical sampling — MCSE gets smaller when you run the
computer longer, but statistical sampling variability only gets smaller
when you get a larger data set). All of this is explained in the package
vignette vignette("demo", "mcmc")
and in Section 1.10 of Geyer (2011).
This function does not do “burn-in” because this concept does not actually
help with MCMC (Geyer, 2011, Section 1.11.4), but the re-entrant property
of this function does allow one to do “burn-in” if one wants.
Assuming ludfun
, start.value
, scale
have been already defined
and are, respectively, an R function coding the log unnormalized density
of the target distribution, a valid state of the Markov chain,
and a useful scale factor,
out <- metrop(ludfun, start.value, nbatch = 1, blen = 1e5, scale = scale)
out <- metrop(out, nbatch = 100, blen = 1000)
throws away a run of 100 thousand iterations before doing another run of 100 thousand iterations that is actually useful for analysis, for example,
apply(out$batch, 2, mean)
apply(out$batch, 2, sd) / sqrt(out$nbatch)
give estimates of posterior means and their MCSE assuming the batch length (here 1000) was long enough to contain almost all of the significant autocorrelation (see Geyer, 2011, Section 1.10, for more on MCSE). The re-entrant property of this function (the second run starts where the first one stops) assures that this is really “burn-in”.
The re-entrant property allows one to do very long runs without having to do them in one invocation of this function.
out2 <- metrop(out)
out3 <- metrop(out2)
batch <- rbind(out$batch, out2$batch, out3$batch)
produces a result as if the first run had been three times as long.
The scale
argument must be adjusted so that the acceptance rate
is not too low or too high to get reasonable performance. The rule of
thumb is that the acceptance rate should be about 25%.
But this recommendation (Gelman, et al., 1996) is justified by analysis
of a toy problem (simulating a spherical multivariate normal distribution)
for which MCMC is unnecessary. There is no reason to believe this is optimal
for all problems (if it were optimal, a stronger theorem could be proved).
Nevertheless, it is clear that at very low acceptance rates the chain makes
little progress (because in most iterations it does not move) and that at
very high acceptance rates the chain also makes little progress (because
unless the log unnormalized density is nearly constant, very high acceptance
rates can only be achieved by very small values of scale
so the
steps the chain takes are also very small).
Even in the Gelman, et al. (1996) result, the optimal rate for spherical
multivariate normal depends on dimension. It is 44% for dimension one
and 23% in the limit as the dimension goes to infinity.
Geyer and Thompson (1995) have an example, admittedly for
simulated tempering (see
temper
) rather than random-walk
Metropolis, in which no acceptance rate less than 70% produces an ergodic
Markov chain. Thus 25% is merely a rule of thumb. We only know we don't
want too high or too low. Probably 1% or 99% is very inefficient.
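One crude hand-tuning sketch, not part of the package and assuming ludfun,
start.value, and scale are defined as earlier in this section, is to halve
or double scale until the acceptance rate lands in a reasonable range, then
continue the run for the real analysis:

out <- metrop(ludfun, start.value, nbatch = 100, blen = 100, scale = scale)
while (out$accept < 0.15 || out$accept > 0.40) {
    # too few acceptances: shrink the steps; too many: enlarge them
    scale <- if (out$accept < 0.15) scale / 2 else scale * 2
    out <- metrop(out, scale = scale)
}
out$accept             # now roughly in the 15 to 40 percent range
out <- metrop(out)     # re-entrant: the analysis run continues from here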
Gelman, A., Roberts, G. O., and Gilks, W. R. (1996) Efficient Metropolis jumping rules. In Bayesian Statistics 5: Proceedings of the Fifth Valencia International Meeting. Edited by J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith. Oxford University Press, Oxford, pp. 599–607.
Geyer, C. J. (2011) Introduction to MCMC. In Handbook of Markov Chain Monte Carlo. Edited by S. P. Brooks, A. E. Gelman, G. L. Jones, and X. L. Meng. Chapman & Hall/CRC, Boca Raton, FL, pp. 3–48.
Geyer, C. J. and Thompson, E. A. (1995) Annealing Markov chain Monte Carlo with applications to ancestral inference. Journal of the American Statistical Association 90 909–920.
Tierney, L. (1994) Markov chains for exploring posterior distributions (with discussion). Annals of Statistics 22 1701–1762.
morph.metrop
and temper
h <- function(x) if (all(x >= 0) && sum(x) <= 1) return(1) else return(-Inf)
out <- metrop(h, rep(0, 5), 1000)
out$accept
# acceptance rate too low
out <- metrop(out, scale = 0.1)
out$accept
t.test(out$accept.batch)$conf.int
# acceptance rate o. k. (about 25 percent)
plot(out$batch[ , 1])
# but run length too short (few excursions from end to end of range)
out <- metrop(out, nbatch = 1e4)
out$accept
plot(out$batch[ , 1])
hist(out$batch[ , 1])
acf(out$batch[ , 1], lag.max = 250)
# looks like batch length of 250 is perhaps OK
out <- metrop(out, blen = 250, nbatch = 100)
apply(out$batch, 2, mean) # Monte Carlo estimates of means
apply(out$batch, 2, sd) / sqrt(out$nbatch) # Monte Carlo standard errors
t.test(out$accept.batch)$conf.int
acf(out$batch[ , 1]) # appears that blen is long enough
Utility functions for variable transformation.
morph(b, r, p, center)
morph.identity()
b: positive real number. May be missing.

r: non-negative real number. May be missing. If p is specified and r is
missing, r = 0 is used.

p: real number strictly greater than 2. May be missing. If r is specified
and p is missing, p = 3 is used.

center: real scalar or vector. May be missing. If a vector, it should have
the same length as the state of the Markov chain. Defaults to 0.
The morph function facilitates using variable transformations by providing
functions to (using X for the original random variable with the pdf f.X and
Y for the transformed random variable with the pdf f.Y):

- calculate the log unnormalized probability density for Y induced by the
  transformation,
- transform an arbitrary function of X to a function of Y,
- transform values of X to values of Y,
- transform values of Y to values of X (the inverse transformation),

for a select few transformations.

morph.identity implements the identity transformation, Y = X.
The parameters r, p, b, and center specify the transformation function.
In all cases, center gives the center of the transformation: the
transformation is applied to X - center, so a density centered at center
on the original scale induces a density centered at the origin on the
transformed scale (as in the example below, where center = 100). If no
parameters are specified, the identity transformation, Y = X, is used.

The parameters r, p, and b specify a function g, which is a monotonically
increasing bijection from the non-negative reals to the non-negative reals.
The transformation is then

    f(x) = g(|x|) * x / |x|,

where |x| represents the Euclidean norm of the vector x. The inverse
transformation is

    f^(-1)(y) = g^(-1)(|y|) * y / |y|.

The parameters r and p are used to define the function

    g1(x) = x + (x - r)^p * I(x > r),

where I(.) is the indicator function. We require that r is non-negative
and that p is strictly greater than 2. The parameter b is used to define a
function g2 that grows exponentially, asymptotically proportional to
exp(b * x) for large x, and is polynomial near zero, the two pieces joined
smoothly. We require that b is positive.
The parameters r, p, and b specify the inverse transformation f^(-1)
(equivalently, they determine g^(-1)) in the following manner:

- If one or both of r and p is specified, and b is not specified, then
  f^(-1)(y) = g1(|y|) * y / |y|. If only r is specified, p = 3 is used.
  If only p is specified, r = 0 is used.
- If only b is specified, then f^(-1)(y) = g2(|y|) * y / |y|.
- If one or both of r and p is specified, and b is also specified, then
  f^(-1)(y) = g2(g1(|y|)) * y / |y|.
a list containing the following functions:

outfun(f): a function that operates on functions. outfun(f) returns the
function function(state, ...) f(inverse(state), ...).

inverse: the inverse transformation function.

transform: the transformation function.

lud: a function that operates on functions. As input, lud takes a function
that calculates a log unnormalized probability density, and returns a
function that calculates the log unnormalized density induced by
transforming the random variable using the transform function.
lud(f) = function(state, ...) f(inverse(state), ...) + log.jacobian(state),
where log.jacobian represents the function that calculates the log Jacobian
of the transformation. log.jacobian is not returned.
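A small sketch of these components in action (assuming an exponential-type
transformation with b = 1 and a standard Cauchy original density, neither
of which is prescribed by this help page):

m <- morph(b = 1)
x <- 2.5
y <- m$transform(x)           # map a value of X to the transformed scale
all.equal(m$inverse(y), x)    # inverse undoes transform, up to numerical error
# induce a log unnormalized density for the transformed variable
lud.orig <- function(x) dcauchy(x, log = TRUE)
lud.induced <- m$lud(lud.orig)
lud.induced(y)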
The equations for the returned transform function do not have a general
analytical solution when p is not equal to 3. This implementation uses
numerical approximation to calculate transform when p is not equal to 3.
If computation speed is a factor, it is advisable to use p = 3. This is not
a factor when using morph.metrop, as transform is only called once during
setup, and not at all while running the Markov chain.
# use an exponential transformation, centered at 100.
b1 <- morph(b = 1, center = 100)
# original log unnormalized density is from a t distribution with 3
# degrees of freedom, centered at 100.
lud.transformed <- b1$lud(function(x) dt(x - 100, df = 3, log = TRUE))
d.transformed <- Vectorize(function(x) exp(lud.transformed(x)))
## Not run:
curve(d.transformed, from = -3, to = 3, ylab = "Induced Density")
## End(Not run)
Markov chain Monte Carlo for continuous random vector using a Metropolis algorithm for an induced density.
morph.metrop(obj, initial, nbatch, blen = 1, nspac = 1, scale = 1,
    outfun, debug = FALSE, morph, ...)
obj: see metrop.

initial: see metrop.

nbatch: see metrop.

blen: see metrop.

nspac: see metrop.

scale: see metrop.
outfun: controls the output; it is applied to the state on the original
(untransformed) scale. Unlike for metrop, outfun should be a function or
missing.
debug: see metrop.

morph: morph object used for transformations. See morph.

...: see metrop.
morph.metrop implements morphometric methods for Markov chains. The caller
specifies a log unnormalized probability density and a transformation. The
transformation specified by the morph argument is used to induce a new log
unnormalized probability density, and a Metropolis algorithm is run for the
induced density. The Markov chain is then transformed back to the original
scale. Running the Metropolis algorithm for the induced density, instead of
the original density, can result in a Markov chain with better convergence
properties. For more details see Johnson and Geyer (2012). Except for
morph, all parameters are passed to metrop, transformed when necessary.
The scale parameter is not transformed.
If X is a real vector valued continuous random variable and Y = f(X), where
f is a diffeomorphism, then the pdf of Y is given by

    f.Y(y) = f.X(f^(-1)(y)) * |det( grad f^(-1)(y) )|,

where f.X is the pdf of X and grad f^(-1) is the Jacobian matrix of the
inverse transformation. Because f is a diffeomorphism, a Markov chain for
f.Y may be transformed into a Markov chain for f.X. Furthermore, these
Markov chains are isomorphic (Johnson and Geyer, 2012) and have the same
convergence rate. The morph argument provides a diffeomorphism, which
morph.metrop uses to induce the log unnormalized density f.Y from the user
supplied log unnormalized density f.X. morph.metrop runs a Metropolis
algorithm for f.Y and transforms the resulting Markov chain into a Markov
chain for f.X. The user accessible output components are the same as those
that come from metrop; see the documentation for metrop for details.
Subsequent calls of morph.metrop may change the transformation by
specifying a new value for morph. Any of the other parameters to
morph.metrop may also be modified in subsequent calls. See metrop for
more details.
The general idea is that a random-walk Metropolis sampler
(what metrop
does) will not be geometrically
ergodic unless the tails of the unnormalized density decrease
superexponentially fast (so the tails of the log unnormalized density
decrease faster than linearly). It may not be geometrically ergodic
even then (see Johnson and Geyer, 2012, for the complete theory).
The transformations used by this function (provided by morph) can produce
geometrically ergodic chains when the tails of the log unnormalized density
decrease too slowly for metrop to do so.
When the tails of the unnormalized density are exponentially light but
not superexponentially light (so the tails of the log unnormalized density
are asymptotically linear, as in the case of exponential family models
when conjugate priors are used, for example logistic regression, Poisson
regression with log link, or log-linear models for categorical data), one
should use morph with b = 0 (the default), which produces a transformation
based on the polynomial function g1 in the notation used in the details
section of the help for morph. This will produce a geometrically ergodic
sampler if other features of the log unnormalized density are well behaved.
For example it will do so for the exponential family examples mentioned
above. (See Johnson and Geyer, 2012, for the complete theory.)
The transformation behaves like a shift transformation
on a ball of radius
r
centered at center
, so these arguments
to morph
should be chosen so that a sizable proportion of
the probability under the original (untransformed) unnormalized density
is contained in this ball. This function will work when r = 0
and
center = 0
(the defaults) are used, but may not work as well as when
r
and center
are well chosen.
When the tails of the unnormalized density are not exponentially light
(so the tails of the log unnormalized density decrease sublinearly, as in
the case of univariate and multivariate t distributions), one should use
morph with r > 0 and p = 3, which produces a transformation of the form g2
composed with g1 in the notation used in the details section of the help
for morph. This will produce a geometrically ergodic sampler if other
features of the log unnormalized density are well behaved. For example it
will do so for the t examples mentioned above.
(See Johnson and Geyer, 2012, for the complete theory.)
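A sketch matching this advice (the particular values of r, p, and b are
illustrative choices, not recommendations from this help page):

# exponentially light tails (e.g. a logistic regression posterior):
# polynomial transformation, b not specified
m.exp <- morph(r = 2, p = 3)
# sub-exponentially light tails (e.g. a t distribution):
# exponential-type transformation
m.t <- morph(b = 1)
out <- morph.metrop(function(x) sum(dt(x, df = 3, log = TRUE)), rep(0, 2),
    nbatch = 100, blen = 100, morph = m.t)
out$accept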
an object of class mcmc
, subclass morph.metropolis
.
This object is a list containing all of the elements from an object
returned by metrop
, plus at least the following
components:
morph: the morph object used for the transformations.
morph.final: the final state of the Markov chain on the transformed scale.
Johnson, L. T. and Geyer, C. J. (2012) Variable transformation to obtain geometric ergodicity in the random-walk Metropolis algorithm. Annals of Statistics 40 3050–3076. doi:10.1214/12-AOS1048.
out <- morph.metrop(function(x) dt(x, df = 3, log = TRUE), 0, blen = 100,
    nbatch = 100, morph = morph(b = 1))
# change the transformation.
out <- morph.metrop(out, morph = morph(b = 2))
out$accept
# accept rate is high, increase the scale.
out <- morph.metrop(out, scale = 4)
# close to 0.20 is about right.
out$accept
Variance of sample mean of time series calculated using overlapping batch means.
olbm(x, batch.length, demean = TRUE)
x: a matrix or time series object. Each column of x is treated as a scalar
time series.

batch.length: length of batches.

demean: when demean = TRUE (the default) the sample mean is subtracted from
each batch mean when estimating the variance.
The estimated variance of the sample mean.
h <- function(x) if (all(x >= 0) && sum(x) <= 1) return(1) else return(-Inf)
out <- metrop(h, rep(0, 5), 1000)
out <- metrop(out, scale = 0.1)
out <- metrop(out, nbatch = 1e4)
foo <- olbm(out$batch, 150)
# monte carlo estimates (true means are same by symmetry)
apply(out$batch, 2, mean)
# monte carlo standard errors (true s. d. are same by symmetry)
sqrt(diag(foo))
# check that batch length is reasonable
acf(out$batch, lag.max = 200)
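As a rough cross-check (a sketch continuing the example above, not part of
this help page), the initial sequence estimator applied to the same series
should give a Monte Carlo standard error of similar magnitude:

# compare to sqrt(diag(foo))[1] from the example above
sqrt(initseq(out$batch[ , 1])$var.con / nrow(out$batch))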
Markov chain Monte Carlo (MCMC) for continuous random vectors using
parallel or serial tempering, the latter also called umbrella
sampling and simulated tempering.
The chain simulates k different distributions on the same state space.
In parallel tempering, all the distributions are simulated in each
iteration. In serial tempering, only one of the distributions is simulated
(a random one). In parallel tempering, the state is a k by p matrix, each
row of which is the state for one of the distributions. In serial tempering
the state of the Markov chain is a pair (i, x), where i is an integer
between 1 and k and x is a vector of length p. This pair is represented as
a single real vector c(i, x). The variable i says which distribution x is
a simulation for.
temper(obj, initial, neighbors, nbatch, blen = 1, nspac = 1, scale = 1,
    outfun, debug = FALSE, parallel = FALSE, ...)

## S3 method for class 'function'
temper(obj, initial, neighbors, nbatch, blen = 1, nspac = 1, scale = 1,
    outfun, debug = FALSE, parallel = FALSE, ...)

## S3 method for class 'tempering'
temper(obj, initial, neighbors, nbatch, blen = 1, nspac = 1, scale = 1,
    outfun, debug = FALSE, parallel = FALSE, ...)
obj: either an R function or an object of class "tempering" from a previous
invocation of this function. If a function, it should evaluate the log
unnormalized density log h(i, x) of the desired equilibrium distribution
(see Details). Its first argument is the state, an R vector c(i, x)
consisting of the component index i followed by the within-component state
x (the same form is used for both serial and parallel tempering). Other
arguments are arbitrary and taken from the ... arguments of this function.
If an object of class "tempering", missing arguments are taken from it and
the run is continued.
initial: for serial tempering, a real vector c(i, x) as described above,
the initial state of the Markov chain. For parallel tempering, a real k by
p matrix, each row of which is the initial within-component state for one
component. Ignored if obj is of class "tempering".
neighbors: a logical symmetric matrix of dimension k by k. Components i and
j are considered neighbors, so that jumps or swaps between them are
attempted, if and only if neighbors[i, j] == TRUE.
nbatch: the number of batches.

blen: the length of batches.

nspac: the spacing of iterations that contribute to batches.
scale: controls the proposal step size for the real (continuous) part of
the state vector. For serial tempering this is the proposal for a new value
of the x part of the state (i, x); for parallel tempering it is the
proposal for a new value of one row x[i, ] of the state matrix. As in
metrop, a scalar or vector scale gives a proposal of the current value plus
scale times a standard normal random vector, and a matrix scale multiplies
the standard normal vector by that matrix. scale may also be a list of
length k giving a separate scale for each component.
outfun: controls the output. If a function, then the batch means of
outfun(state, ...) are returned rather than the batch means of the state
itself. If missing, the batch means of the continuous part of the state are
returned (see the description of the batch component of the returned value
below).
debug: if TRUE, extra output useful for testing is returned.

parallel: if TRUE does parallel tempering, if FALSE does serial tempering.

...: additional arguments for obj or outfun.
Serial tempering simulates a mixture of distributions of a continuous
random vector. The number of components of the mixture is k, and the
dimension of the random vector is p. Denote the state (i, x), where i is a
positive integer between 1 and k and x is a real vector of length p, and
let h(i, x) denote the unnormalized joint density of the equilibrium
distribution of the pair. The logarithm of this function is what obj or
obj$lud calculates. The mixture distribution is the marginal for x derived
from the equilibrium distribution h(i, x), that is,

    h(x) = sum_{i = 1}^{k} h(i, x).

Parallel tempering simulates a product of distributions of a continuous
random vector. Denote the state (x[1, ], ..., x[k, ]); then the
unnormalized joint density of the equilibrium distribution is

    h(x[1, ], ..., x[k, ]) = prod_{i = 1}^{k} h(i, x[i, ]).
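A minimal sketch of such an unnormalized density h(i, x) for serial
tempering (k = 2 components that are just normal distributions with
different spreads; this toy setup is not from this help page):

ludfun <- function(state) {
    i <- state[1]                  # component index, 1 or 2
    x <- state[-1]                 # within-component state, here scalar
    sd <- c(1, 5)[i]               # component i is normal with this sd
    sum(dnorm(x, sd = sd, log = TRUE))
}
nbrs <- matrix(c(FALSE, TRUE, TRUE, FALSE), 2, 2)
out <- temper(ludfun, initial = c(1, 0), neighbors = nbrs,
    nbatch = 50, blen = 10, scale = 1)
# the same ludfun would serve for parallel tempering, with initial a
# 2 by 1 matrix and parallel = TRUE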
The update mechanism of the Markov chain combines two kinds of elementary updates: jump/swap updates (jump for serial tempering, swap for parallel tempering) and within-component updates. In each iteration of the Markov chain one of these elementary updates is done: with probability 1/2 a jump/swap update, and with probability 1/2 a within-component update.
Within-component updates are the same for both serial and parallel
tempering. They are “random-walk” Metropolis updates with multivariate
normal proposal, the proposal distribution being determined by the argument
scale. In serial tempering, the x part of the current state (i, x) is
updated, preserving i. In parallel tempering, an index i is chosen at
random and the row x[i, ] of the state representing that component is
updated, again preserving i.
Jump updates choose uniformly at random a neighbor of the current
component: if i indexes the current component, then a j such that
neighbors[i, j] == TRUE is chosen uniformly at random. A Metropolis-Hastings
update is then done for changing the current state from (i, x) to (j, x).
Swap updates choose a component uniformly at random and a neighbor of that
component uniformly at random: first an index i is chosen uniformly at
random between 1 and k, then an index j is chosen uniformly at random such
that neighbors[i, j] == TRUE. A Metropolis-Hastings update is then done for
swapping the states of the two components: interchanging x[i, ] and x[j, ]
while preserving the rest of the state.
The initial state must satisfy lud(initial, ...) > -Inf for serial
tempering, or must satisfy lud(c(i, initial[i, ]), ...) > -Inf for each i
for parallel tempering, where lud is either obj or obj$lud. That is, the
initial state must have positive probability.
an object of class "mcmc", subclass "tempering", which is a list containing
at least the following components:
batch: the batch means of the continuous part of the state, or of
outfun(state, ...) if outfun is supplied. For serial tempering with outfun
missing, an nbatch by p matrix; for parallel tempering with outfun missing,
an nbatch by k by p array.

ibatch: (returned for serial tempering only) an nbatch by k matrix giving
the batch means of the indicators of which component the state was in.

acceptx: fraction of Metropolis within-component proposals accepted.
A vector of length k, one element for each component.

accepti: fraction of Metropolis jump/swap proposals accepted. A k by k
matrix with the same pattern as the neighbors argument; elements
corresponding to pairs that are not neighbors are NA.

initial: value of argument initial.

final: final state of Markov chain.

initial.seed: value of .Random.seed before the run.

final.seed: value of .Random.seed after the run.

time: running time of Markov chain from system.time().

lud: the function used to calculate log unnormalized density, either obj
or obj$lud from a previous run.

nbatch: the argument nbatch or obj$nbatch.

blen: the argument blen or obj$blen.

nspac: the argument nspac or obj$nspac.

outfun: the argument outfun or obj$outfun.
Description of additional output when debug = TRUE
can be
found in the vignette debug
, which is shown by
vignette("debug", "mcmc")
.
If outfun is missing, then the log unnormalized density function can be
defined without a ... argument and that works fine. One can define it
starting ludfun <- function(state) and that works, or starting
ludfun <- function(state, foo, bar), where foo and bar are supplied as
additional arguments to temper, and that works too.
If outfun is a function, then both it and the log unnormalized
density function can be defined without ... arguments if they
have exactly the same argument lists, and that works fine. Otherwise it
doesn't work. Define these functions by

ludfun <- function(state, foo)
outfun <- function(state, bar)

and you get an error about unused arguments. Instead define these functions by

ludfun <- function(state, foo, ...)
outfun <- function(state, bar, ...)

and supply foo and bar as additional arguments to temper, and that works fine.
In short, the log unnormalized density function and outfun need
to have ... in their argument lists to be safe. Sometimes it works
when ... is left out and sometimes it doesn't.
Of course, one can avoid this whole issue by always defining the log
unnormalized density function and outfun
to have only one argument
state
and use global variables (objects in the R global environment) to
specify any other information these functions need to use. That too
follows the R way. But some people consider that bad programming practice.
A third option is to define either or both of these functions using a function
factory. This is demonstrated in the vignette for this package named
demo
, which is shown by vignette("demo", "mcmc")
.
This function follows the philosophy of MCMC explained in the introductory
chapter of the Handbook of Markov Chain Monte Carlo (Geyer, 2011a)
and in the chapter of that book on tempering and related subjects
(Geyer, 2011b).
See also the section on philosophy in the help for metrop.
The scale
argument must be adjusted so that the acceptance rate
for within-component proposals (component acceptx
of the result
returned by this function)
is not too low or too high to get reasonable performance.
The log unnormalized density function must be chosen so that the acceptance rate
for jump/swap proposals (component accepti
of the result
returned by this function)
is not too low or too high to get reasonable performance.
The former is a vector and the latter is a matrix, and
all these rates must be adjusted to be reasonable.
The rates in accepti are influenced by the number of components
of the tempering mixture distribution, by what those components are (how
far apart they are in some unspecified metric on probability distributions),
and by the chosen normalizing constants for those distributions.
For examples of tuning tempering, see Geyer (2011b) and also the vignette
of this package shown by vignette("bfst", "mcmc").
The help for R function metrop
also gives advice on tuning
its sampler, which is relevant for tuning the acceptx
rates.
See also Geyer (1991) and Geyer and Thompson (1995) for the general theory of tuning parallel and serial tempering.
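One common adjustment strategy for serial tempering, sketched here rather
than prescribed by this help page, is to update the log pseudo-prior using
the observed occupancy of the components (the ibatch component of the
output), so that rarely visited components get a boost on the next run:

# assumes 'out' is a serial tempering run made with log pseudo-prior 'qux',
# as in the examples below
occupancy <- colMeans(out$ibatch)       # fraction of time in each component
occupancy <- pmax(occupancy, 1e-3)      # guard against never-visited components
qux.new <- qux - log(occupancy)
qux.new <- qux.new - qux.new[1]         # renormalize so the first component is 0
out <- temper(out, log.pseudo.prior = qux.new)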
Geyer, C. J. (1991) Markov chain Monte Carlo maximum likelihood. Computing Science and Statistics: Proc. 23rd Symp. Interface, 156–163. http://hdl.handle.net/11299/58440.
Geyer, C. J. (2011a) Introduction to MCMC. In Handbook of Markov Chain Monte Carlo. Edited by S. P. Brooks, A. E. Gelman, G. L. Jones, and X. L. Meng. Chapman & Hall/CRC, Boca Raton, FL, pp. 3–48.
Geyer, C. J. (2011b) Importance Sampling, Simulated Tempering, and Umbrella Sampling. In Handbook of Markov Chain Monte Carlo. Edited by S. P. Brooks, A. E. Gelman, G. L. Jones, and X. L. Meng. Chapman & Hall/CRC, Boca Raton, FL, pp. 295–312.
Geyer, C. J. and Thompson, E. A. (1995) Annealing Markov chain Monte Carlo with applications to ancestral inference. Journal of the American Statistical Association 90 909–920.
d <- 9
witch.which <- c(0.1, 0.3, 0.5, 0.7, 1.0)
ncomp <- length(witch.which)

neighbors <- matrix(FALSE, ncomp, ncomp)
neighbors[row(neighbors) == col(neighbors) + 1] <- TRUE
neighbors[row(neighbors) == col(neighbors) - 1] <- TRUE

ludfun <- function(state, log.pseudo.prior = rep(0, ncomp)) {
    stopifnot(is.numeric(state))
    stopifnot(length(state) == d + 1)
    icomp <- state[1]
    stopifnot(icomp == as.integer(icomp))
    stopifnot(1 <= icomp && icomp <= ncomp)
    stopifnot(is.numeric(log.pseudo.prior))
    stopifnot(length(log.pseudo.prior) == ncomp)
    theta <- state[-1]
    if (any(theta > 1.0)) return(-Inf)
    bnd <- witch.which[icomp]
    lpp <- log.pseudo.prior[icomp]
    if (any(theta > bnd)) return(lpp)
    return(- d * log(bnd) + lpp)
}

# parallel tempering
thetas <- matrix(0.5, ncomp, d)
out <- temper(ludfun, initial = thetas, neighbors = neighbors, nbatch = 20,
    blen = 10, nspac = 5, scale = 0.56789, parallel = TRUE, debug = TRUE)

# serial tempering
theta.initial <- c(1, rep(0.5, d))
# log pseudo prior found by trial and error
qux <- c(0, 9.179, 13.73, 16.71, 20.56)

out <- temper(ludfun, initial = theta.initial, neighbors = neighbors,
    nbatch = 50, blen = 30, nspac = 2, scale = 0.56789,
    parallel = FALSE, debug = FALSE, log.pseudo.prior = qux)