Title: | Bayesian Object Oriented Modeling |
---|---|
Description: | A C++ library for Bayesian modeling, with an emphasis on Markov chain Monte Carlo. Although boom contains a few R utilities (mainly plotting functions), its primary purpose is to install the BOOM C++ library on your system so that other packages can link against it. |
Authors: | Steven L. Scott is the sole author and creator of the BOOM project. Some code in the BOOM libraries has been modified from other open source projects. These include Cephes (obtained from Netlib, written by Stephen L. Moshier), NEWUOA (M.J.D Powell, obtained from Powell's web site), and a modified version of the R math libraries (R core development team). Original copyright notices have been maintained in all source files. In these cases, copyright claimed by Steven L. Scott is limited to modifications made to the original code. Google claims copyright for code written while Steven L. Scott was employed at Google from 2008 - 2018, but BOOM is not an officially supported Google project. |
Maintainer: | Steven L. Scott <[email protected]> |
License: | LGPL-2.1 | file LICENSE |
Version: | 0.9.15 |
Built: | 2024-10-31 06:59:35 UTC |
Source: | CRAN |
The Boom package provides access to the C++ BOOM library for Bayesian computation.
If you are installing Boom using install.packages
on a
Linux machine (and thus compiling yourself) you will almost certainly
want to set the Ncpus
argument to a large number. Windows and
Mac users can ignore this advice.
The main purpose of the Boom package is not to be used directly, but
to provide the BOOM C++ library for other packages to link against.
The Boom package provides additional utility code for C++ authors to
use when writing R packages with C++ internals. These are described
in .../inst/include/r_interface/boom_r_tools.hpp
among the
package's include files.
Boom provides a collection of R functions and objects to help users
format data in the manner expected by the underlying C++ code.
Standard distributions that are commonly used as Bayesian priors can
be specified using BetaPrior
, GammaPrior
,
etc.
Boom provides a set of utilities helpful when writing unit tests for
Bayesian models. See CheckMcmcMatrix
and
CheckMcmcVector
for MCMC output, and functions like
check.probability.distribution
for checking function
inputs
Boom provides a collection of useful plots (using base R graphics)
that have proven useful for summarizing MCMC output. See
PlotDynamicDistribution
, PlotManyTs
,
BoxplotTrue
, and other code in the index with
Plot
in the title.
Please see the following pacakges
bsts
CausalImpact
Adds horizontal line segments to an existing plot. The segments are
centered at x
with height y
. The x
values are
assumed to be equally spaced, so that diff(x)
is a constant
'dx
'. The line segments go from x +/- half.width.factor
*dx
, so if half.width.factor=.5
there will be no gaps between
segments. The default is to leave a small gap.
This function was originally used to add reference lines to side-by-side boxplots.
AddSegments(x, y, half.width.factor = 0.45, ...)
AddSegments(x, y, half.width.factor = 0.45, ...)
x |
A numeric vector giving the midpoints of the line segments. |
y |
A numeric vector of the same length as |
half.width.factor |
See 'description' above. |
... |
graphical parameters controlling the type of lines used in the line segments |
Called for its side effect.
Steven L. Scott
x <- rnorm(100) y <- rnorm(100, 1) boxplot(list(x=x,y=y)) AddSegments(1:2, c(0, 1)) ## add segments to the boxplot
x <- rnorm(100) y <- rnorm(100, 1) boxplot(list(x=x,y=y)) AddSegments(1:2, c(0, 1)) ## add segments to the boxplot
A (possibly truncated) Gaussian prior on the autoregression coefficient in an AR1 model.
Ar1CoefficientPrior(mu = 0, sigma = 1, force.stationary = TRUE, force.positive = FALSE, initial.value = mu)
Ar1CoefficientPrior(mu = 0, sigma = 1, force.stationary = TRUE, force.positive = FALSE, initial.value = mu)
mu |
The mean of the prior distribution. |
sigma |
The standard deviation of the prior distribution. |
force.stationary |
Logical. If |
force.positive |
Logical. If |
initial.value |
The initial value of the parameter being modeled in the MCMC algorithm. |
The Ar1CoefficientPrior()
syntax is preferred, as it
more closely matches R's syntax for other constructors.
Steven L. Scott [email protected]
Gelman, Carlin, Stern, Rubin (2003), "Bayesian Data Analysis", Chapman and Hall.
Specifies beta prior distribution for a binomial probability parameter.
BetaPrior(a = 1, b = 1, mean = NULL, sample.size = NULL, initial.value = NULL)
BetaPrior(a = 1, b = 1, mean = NULL, sample.size = NULL, initial.value = NULL)
a |
A positive real number interpretable as a prior success count. |
b |
A positive real number interpretable as a prior failure count. |
mean |
A positive real number representing |
sample.size |
A positive real number representing |
initial.value |
An initial value to be used for the variable
being modeled. If |
The distribution should be specified either with a
and
b
, or with mean
and sample.size
.
Steven L. Scott [email protected]
Gelman, Carlin, Stern, Rubin (2003), "Bayesian Data Analysis", Chapman and Hall.
Plot the marginal distribution of each element in the Monte Carlo distribution of a matrix (e.g. a variance matrix or transition probability matrix). Rows and columns in the boxplots correspond to rows and columns in the matrix being plotted.
BoxplotMcmcMatrix(X, ylim = range(X), col.names, row.names, truth, colors = NULL, las = 0, ...)
BoxplotMcmcMatrix(X, ylim = range(X), col.names, row.names, truth, colors = NULL, las = 0, ...)
X |
3 dimensional array. The first dimension is the Monte Carlo index
(e.g. MCMC iteration). The second and third dimensions are the row
and column of the matrix being plotted. E.g. |
ylim |
2-vector giving the lower and upper limits of the vertical axis. |
col.names |
(optional) character vector giving the names of matrix columns
(third dimension of |
row.names |
(optional) character vector giving the names of matrix rows
(second dimension of |
truth |
(optional) scalar or matrix giving the values of reference lines to
be plotted on each boxplot. If a scalar then the same value will be
used for each boxplot. If a matrix then the rows and columns of the
matrix correspond to the second and third dimension of |
colors |
A vector of colors to use for the boxplots. Each row uses the same color scheme. |
las |
Controls the orientation of axis labels. See the |
... |
Extra arguments passed to |
Called for its side effect, which is to draw a set of side-by-side boxplots on the current graphics device.
Steven L. Scott
X <- array(rnorm(1000 * 3 * 4), dim=c(1000, 3, 4)) dimnames(X)[[2]] <- paste("row", 1:3) dimnames(X)[[3]] <- paste("col", 1:4) BoxplotMcmcMatrix(X) truth <- 0 BoxplotMcmcMatrix(X, truth=truth) truth <- matrix(rnorm(12), ncol=4) BoxplotMcmcMatrix(X, truth=truth)
X <- array(rnorm(1000 * 3 * 4), dim=c(1000, 3, 4)) dimnames(X)[[2]] <- paste("row", 1:3) dimnames(X)[[3]] <- paste("col", 1:4) BoxplotMcmcMatrix(X) truth <- 0 BoxplotMcmcMatrix(X, truth=truth) truth <- matrix(rnorm(12), ncol=4) BoxplotMcmcMatrix(X, truth=truth)
Plots side-by-side boxplots of the columns of the matrix x
.
Each boxplot can have its own reference line (truth
) and
standard error lines se.truth
, if desired. This function was
originally written to display MCMC output, where the reference lines
were true values used to test an MCMC simulation.
BoxplotTrue(x, truth = NULL, vnames = NULL, center = FALSE, se.truth = NULL, color = "white", truth.color = "black", ylim = NULL, ...)
BoxplotTrue(x, truth = NULL, vnames = NULL, center = FALSE, se.truth = NULL, color = "white", truth.color = "black", ylim = NULL, ...)
x |
The matrix whose columns are to be plotted. |
truth |
(optional) A vector of reference values with length equal to |
vnames |
(optional) character vector giving the column names of |
center |
(optional) logical. If |
se.truth |
(optional) numeric vector of length |
color |
(optional) vector of colors for each boxplot. |
truth.color |
A color (or vector of colors) to use for the segments representing true values. |
ylim |
Limits for the vertical axis. If |
... |
additional arguments to |
called for its side effect
Steven L. Scott
x <- t(matrix(rnorm(5000, 1:5, 1:5), nrow=5)) BoxplotTrue(x, truth=1:5, se.truth=1:5, col=rainbow(5), vnames = c("EJ", "TK", "JT", "OtherEJ", "TJ") )
x <- t(matrix(rnorm(5000, 1:5, 1:5), nrow=5)) BoxplotTrue(x, truth=1:5, se.truth=1:5, col=rainbow(5), vnames = c("EJ", "TK", "JT", "OtherEJ", "TJ") )
Verify that MCMC output covers expected values.
CheckMcmcMatrix(draws, truth, confidence = .95, control.multiple.comparisons = TRUE, burn = 0) CheckMcmcVector(draws, truth, confidence = .95, burn = 0) McmcMatrixReport(draws, truth, confidence = .95, burn = 0)
CheckMcmcMatrix(draws, truth, confidence = .95, control.multiple.comparisons = TRUE, burn = 0) CheckMcmcVector(draws, truth, confidence = .95, burn = 0) McmcMatrixReport(draws, truth, confidence = .95, burn = 0)
draws |
The array of MCMC draws to check. This must be a matrix for CheckMcmcMatrix and a vector for CheckMcmcVector. |
truth |
The vector of true values that must be covered by
|
confidence |
Specifies the probability width of the intervals
used to determine whether |
control.multiple.comparisons |
If FALSE then every interval must
cover its corresponding true value. Otherwise a fraction of
intervals (given by |
burn |
The number of MCMC iterations to discard as burn-in. |
CheckMcmcVector
checks a vector of draws corresponding to a
scalar random variable. CheckMcmcMatrix
checks a matrix of
draws corresponding to a vector of random variables. In either case
the check is made by constructing a central confidence interval
(obtained by removing half of 1 - confidence
from the upper and
lower tails of the distribution).
If a single variable is being checked with CheckMcmcVector then the check passes if and only if the interval covers the true value.
If multiple values are being checked with CheckMcmcMatrix then the
user has control over how strict to make the check. If
control.multiple.comparisons
is FALSE then the check passes if
and only if all intervals cover true values. Otherwise a fraction of
intervals must cover. The fraction is the lower bound of the binomial
confidence interval for the coverage rate under the hypothesis that
the true coverage rate is confidence
.
CheckMcmcVector
and CheckMcmcMatrix
return TRUE
if the check passes, and FALSE
if it does not.
McmcMatrixReport
returns a string that can be put in the
info
field of an expect_true
expression, to give useful information about a failed test case. The
return value is a textual representation of a three column matrix.
Each row matches a variable in draws
, and gives the lower and
upper bounds for the credible interval used to check the values. The
final column lists the true values that are supposed to be inside the
credible intervals. The value is returned as a character string that
is expected to be fed to cat()
or print()
so that it
will render correctly in R CMD CHECK output.
Steven L. Scott
ndraws <- 100 draws <- rnorm(ndraws, 0, 1) CheckMcmcVector(draws, 0) ## Returns TRUE CheckMcmcVector(draws, 17) ## Returns FALSE draws <- matrix(nrow = ndraws, ncol = 5) for (i in 1:5) { draws[, i] <- rnorm(ndraws, i, 1) } CheckMcmcMatrix(draws, truth = 1:5) ## Returns TRUE CheckMcmcMatrix(draws, truth = 5:1) ## Returns FALSE
ndraws <- 100 draws <- rnorm(ndraws, 0, 1) CheckMcmcVector(draws, 0) ## Returns TRUE CheckMcmcVector(draws, 17) ## Returns FALSE draws <- matrix(nrow = ndraws, ncol = 5) for (i in 1:5) { draws[, i] <- rnorm(ndraws, i, 1) } CheckMcmcMatrix(draws, truth = 1:5) ## Returns TRUE CheckMcmcMatrix(draws, truth = 5:1) ## Returns FALSE
Checks that data matches a concept
check.scalar.probability(x) check.positive.scalar(x) check.nonnegative.scalar(x) check.probability.distribution(x) check.scalar.integer(x) check.scalar.boolean(x)
check.scalar.probability(x) check.positive.scalar(x) check.nonnegative.scalar(x) check.probability.distribution(x) check.scalar.integer(x) check.scalar.boolean(x)
x |
An object to be checked. |
If the object does not match the concept being checked,
stop
is called. Otherwise TRUE
is returned.
Steven L. Scott [email protected]
Draw circles on the current graphics device.
circles(center, radius, ...)
circles(center, radius, ...)
center |
A two-column matrix giving the coordinates of the circle center. If a single circle is to be drawn then a 2-element vector can be passed instead. |
radius |
The radii of the circles. A scalar value will be
repeated if |
... |
Extra arguments passed to 'segments'. See
|
Draws circles on the current graphics device. This is a
low-level plotting function similar to points
,
lines
, segments
, etc.
Returns invisible NULL
.
Steven L. Scott [email protected]
plot(1:10, type = "n") circles(cbind(c(2, 3, 4), c(4, 5,6 )), radius = c(.3, .4, .5))
plot(1:10, type = "n") circles(cbind(c(2, 3, 4), c(4, 5,6 )), radius = c(.3, .4, .5))
Produces multiple density plots on a single axis, to compare the columns of a matrix or the elements of a list.
CompareDensities(x, legend.text = NULL, legend.location = "topright", legend.title = NULL, xlim = NULL, ylim = NULL, xlab = "parameter", ylab = "density", main = "", lty = NULL, col = "black", axes = TRUE, na.rm = TRUE, ...)
CompareDensities(x, legend.text = NULL, legend.location = "topright", legend.title = NULL, xlim = NULL, ylim = NULL, xlab = "parameter", ylab = "density", main = "", lty = NULL, col = "black", axes = TRUE, na.rm = TRUE, ...)
x |
matrix or list of numeric vectors. A density plot is produced for each column of the matrix or element of the list. |
legend.text |
(optional) character vector giving names of each density plot. |
legend.location |
Entry that can be passed to
|
legend.title |
The legend title. |
xlim |
(optional) horizonal range of the plotting region. If
omitted the region will be sized to fit all the observations in
|
ylim |
(optional) vertical range of the plotting region. If omitted the region will be sized to fit all empirical density plots. |
xlab |
label to be placed on the horizontal axis |
ylab |
label to be placed on the vertical axis |
main |
main title for the plot |
lty |
The line types to use for the different densities. See
|
col |
vector of colors for the densities to be plotted. |
axes |
Logical. Should axes and a box be drawn around the figure? |
na.rm |
Logical value indicating whether |
... |
Other graphical parameters passed to
|
Called for its side effect, which is to produce multiple density plots on the current graphics device.
Steven L. Scott
x <- t(matrix(rnorm(5000, 1:5, 1:5), nrow=5)) CompareDensities(x, legend.text=c("EJ", "TK", "JT", "OtherEJ", "TJ"), col=rainbow(5), lwd=2)
x <- t(matrix(rnorm(5000, 1:5, 1:5), nrow=5)) CompareDensities(x, legend.text=c("EJ", "TK", "JT", "OtherEJ", "TJ"), col=rainbow(5), lwd=2)
Produce a plot showing several stacked dynamic distributions over the same horizontal axis.
CompareDynamicDistributions( list.of.curves, timestamps, style = c("dynamic", "boxplot"), xlab = "Time", ylab = "", frame.labels = rep("", length(list.of.curves)), main = "", actuals = NULL, col.actuals = "blue", pch.actuals = 1, cex.actuals = 1, vertical.cuts = NULL, ...)
CompareDynamicDistributions( list.of.curves, timestamps, style = c("dynamic", "boxplot"), xlab = "Time", ylab = "", frame.labels = rep("", length(list.of.curves)), main = "", actuals = NULL, col.actuals = "blue", pch.actuals = 1, cex.actuals = 1, vertical.cuts = NULL, ...)
list.of.curves |
A list of matrices, all having the same number of columns. Each matrix represents a distribution of curves, with rows corresponding to individual curves, and columns to time points. |
timestamps |
A vector of time stamps, with length matching the number of columns in each element of list.of.curves. |
style |
Should the curves be represented using a dynamic distribution plot, or boxplots. Boxplots are better for small numbers of time points. Dynamic distribution plots are better for large numbers of time points. |
xlab |
Label for the horizontal axis. |
ylab |
Label for the (outer) vertical axis. |
frame.labels |
Labels for the vertical axis of each subplot. The length must match the number of plot. |
main |
Main title for the plot. |
actuals |
If non- |
col.actuals |
Color to use for the actuals. See |
pch.actuals |
Plotting character(s) to use for the actuals. See
|
cex.actuals |
Scale factor for actuals. See |
vertical.cuts |
If non- |
... |
Extra arguments passed to
|
Steven L. Scott
Produce a plot that compares the kernel density estimates for each element in a series of Monte Carlo draws of a vector or matrix.
CompareManyDensities(list.of.arrays, style = c("density", "box"), main = "", color = NULL, gap = 0, burn = 0, suppress.labels = FALSE, x.same.scale = TRUE, y.same.scale = FALSE, xlim = NULL, ylim = NULL, legend.location = c("top", "right"), legend.cex = 1, reflines = NULL, ...)
CompareManyDensities(list.of.arrays, style = c("density", "box"), main = "", color = NULL, gap = 0, burn = 0, suppress.labels = FALSE, x.same.scale = TRUE, y.same.scale = FALSE, xlim = NULL, ylim = NULL, legend.location = c("top", "right"), legend.cex = 1, reflines = NULL, ...)
list.of.arrays |
A list of arrays representing the MCMC draws of the vector or matrix in question. Each list element represents a different group. The first index in each list list element represents the Monte Carlo draw number (or iteration). The remaining indices represent the variables to be plotted. If the first list element has variable names assigned to its indices, these will be used to label the plots. |
style |
The style of plot to use for comparing distributions. |
main |
The main title of the plot. |
color |
A vector of colors to be used for representing the groups. |
gap |
The gap (in lines) between plots. |
burn |
The number of MCMC iterations to be discarded as burn-in. |
suppress.labels |
Logical. If |
x.same.scale |
Logical indicating whether the same horizontal scale should be used for all the plots. |
y.same.scale |
Logical indicating whether the same vertical scale
should be used for all the plots. This argument is ignored if
|
xlim |
Either |
ylim |
Either |
legend.location |
The location of the legend, either on top or
at the right. It can also be |
legend.cex |
The relative scale factor to use for the legend text. |
reflines |
This can be |
... |
Extra arguments passed to |
Steven L. Scott
x <- array(rnorm(9000), dim = c(1000, 3, 3)) dimnames(x) <- list(NULL, c("Larry", "Moe", "Curly"), c("Larry", "Eric", "Sergey")) y <- array(rnorm(9000), dim = c(1000, 3, 3)) z <- array(rnorm(9000), dim = c(1000, 3, 3)) data <- list(x = x, y = y, z = z) CompareManyDensities(data, color = c("red", "blue", "green")) CompareManyDensities(data, style = "box") x <- matrix(rnorm(5000), nrow = 1000) colnames(x) <- c("Larry", "Moe", "Curly", "Shemp", "???") y <- matrix(rnorm(5000), nrow = 1000) z <- matrix(rnorm(5000), nrow = 1000) data <- list(x = x, y = y, z = z) CompareManyDensities(data, color = c("red", "blue", "green")) CompareManyDensities(data, style = "box")
x <- array(rnorm(9000), dim = c(1000, 3, 3)) dimnames(x) <- list(NULL, c("Larry", "Moe", "Curly"), c("Larry", "Eric", "Sergey")) y <- array(rnorm(9000), dim = c(1000, 3, 3)) z <- array(rnorm(9000), dim = c(1000, 3, 3)) data <- list(x = x, y = y, z = z) CompareManyDensities(data, color = c("red", "blue", "green")) CompareManyDensities(data, style = "box") x <- matrix(rnorm(5000), nrow = 1000) colnames(x) <- c("Larry", "Moe", "Curly", "Shemp", "???") y <- matrix(rnorm(5000), nrow = 1000) z <- matrix(rnorm(5000), nrow = 1000) data <- list(x = x, y = y, z = z) CompareManyDensities(data, color = c("red", "blue", "green")) CompareManyDensities(data, style = "box")
Produce a plot that compares the kernel density estimates for each element in a series of Monte Carlo draws of a vector or matrix.
CompareManyTs(list.of.ts, burn = 0, type = "l", gap = 0, boxes = TRUE, thin = 1, labels = NULL, same.scale = TRUE, ylim = NULL, refline = NULL, color = NULL, ...)
CompareManyTs(list.of.ts, burn = 0, type = "l", gap = 0, boxes = TRUE, thin = 1, labels = NULL, same.scale = TRUE, ylim = NULL, refline = NULL, color = NULL, ...)
list.of.ts |
A list of time series matrices, data.frames or 3-dimensional arrays, all of the same size. The list elements correspond to groups. The first index of the array in each list element corresponds to time. The subsequent indices correspond to variables to be plotted. |
burn |
The number of initial observations to be discarded as burn-in (when plotting MCMC output). |
type |
The plotting type to use when plotting the time series. See
|
gap |
The amount of space to put between plots. |
boxes |
Logical. Should boxes be drawn around the plots? |
thin |
Plot every thin'th observation. This can reduce the amount of time it takes to make the plot if there are many long time series. |
labels |
A character vector to use as labels for individual plots. |
same.scale |
Logical. If TRUE then all plots are shown on the same verical scale, and vertical axes are drawn. If FALSE then each plot gets its own scale. |
ylim |
The scale of the vertical axis. If non-NULL then same.scale will be set to TRUE. |
refline |
The scalar value at which a thin dotted horizontal line should be plotted in each panel. This is useful for highlighting zero, for example. |
color |
A vector of colors to use for the plots. |
... |
Extra arguments passed to 'plot' and 'axis'. |
Steven L. Scott
PlotManyTs
, CompareManyDensities
x <- array(rnorm(9000), dim = c(1000, 3, 3)) dimnames(x) <- list(NULL, c("Larry", "Moe", "Curly"), c("Larry", "Eric", "Sergey")) y <- array(rnorm(9000), dim = c(1000, 3, 3)) z <- array(rnorm(9000), dim = c(1000, 3, 3)) data <- list(x = x, y = y, z = z) CompareManyTs(data, color = c("red", "blue", "green")) x <- matrix(rnorm(5000), nrow = 1000) colnames(x) <- c("Larry", "Moe", "Curly", "Shemp", "???") y <- matrix(rnorm(5000), nrow = 1000) z <- matrix(rnorm(5000), nrow = 1000) data <- list(x = x, y = y, z = z) CompareManyTs(data, color = c("red", "blue", "green"))
x <- array(rnorm(9000), dim = c(1000, 3, 3)) dimnames(x) <- list(NULL, c("Larry", "Moe", "Curly"), c("Larry", "Eric", "Sergey")) y <- array(rnorm(9000), dim = c(1000, 3, 3)) z <- array(rnorm(9000), dim = c(1000, 3, 3)) data <- list(x = x, y = y, z = z) CompareManyTs(data, color = c("red", "blue", "green")) x <- matrix(rnorm(5000), nrow = 1000) colnames(x) <- c("Larry", "Moe", "Curly", "Shemp", "???") y <- matrix(rnorm(5000), nrow = 1000) z <- matrix(rnorm(5000), nrow = 1000) data <- list(x = x, y = y, z = z) CompareManyTs(data, color = c("red", "blue", "green"))
Uses boxplots to compare distributions of vectors.
CompareVectorBoxplots(draws, main = NULL, colors = NULL, burn = 0, ...)
CompareVectorBoxplots(draws, main = NULL, colors = NULL, burn = 0, ...)
draws |
A list of MCMC draws. Each list element is a matrix with rows corresponding to MCMC iterations and columns to variables. The matrices can have different numbers of rows, but should have the same numbers of columns. |
main |
Main title of the plot. |
colors |
Colors to use for the boxplots. The length must match
the number entries in |
burn |
The number of initial MCMC iterations to discard before making the plot. |
... |
Extra arguments passed to |
Creates side-by-side boxplots with the dimensions of each vector gropued together.
x <- matrix(rnorm(300, mean = 1:3, sd = .4), ncol = 3, byrow = TRUE) y <- matrix(rnorm(600, mean = 3:1, sd = .2), ncol = 3, byrow = TRUE) CompareVectorBoxplots(list(x = x, y = y), colors = c("red", "blue"))
x <- matrix(rnorm(300, mean = 1:3, sd = .4), ncol = 3, byrow = TRUE) y <- matrix(rnorm(600, mean = 3:1, sd = .2), ncol = 3, byrow = TRUE) CompareVectorBoxplots(list(x = x, y = y), colors = c("red", "blue"))
A 'DiffDoubleModel' is tag given to a probability distribution that measures a real-valued scalar outcome, and whose log density is twice differentiable with respect to the random variable. The tag is a signal to underlying C++ code that the object being passed is one of a subset of understood distributions. Presently that subset includes the following distributions.
Clearly this list is non-exhaustive, and other distributions may be added in the future.
Steven L. Scott [email protected]
Density and random generation for the Dirichlet distribution.
ddirichlet(probabilities, nu, logscale = FALSE) rdirichlet(n, nu)
ddirichlet(probabilities, nu, logscale = FALSE) rdirichlet(n, nu)
probabilities |
A vector representing a discrete probability distribution, or a matrix where each row is a discrete probability distribution. Zero probabilities are not allowed. |
nu |
The parameters of the Dirichlet distribution. This can be a
vector of positive numbers, interpretable as prior counts, of length
matching the dimension of probabilities. If |
logscale |
Logical. If TRUE then return the density on the log scale. Otherwise return the density on the raw scale. |
n |
The number of desired draws. |
The Dirichlet distribution is a generalization of the beta distribution. Whereas beta distribution is a model for probabilities, the Dirichlet distribution is a model for discrete distributions with several possible outcome values.
Let denote a discrete probability distribution (a vector
of positive numbers summing to 1), and let
be a vector
of positive numbers (the parameters of the Dirichlet distribution),
which can be thought of as prior counts. Then the density of the
Dirichlet distribution can be written
ddirichlet
returns a vector of density values, with one
entry per row in probabilities
. rdirichlet
returns a
matrix (if n > 1
) or a vector (if n==1
) containing the
draws from the Dirichlet distribution with the specified parameters.
Steven L. Scott [email protected]
Gelman, Carlin, Stern, Rubin (2003), "Bayesian Data Analysis", Chapman and Hall.
Specifies Dirichlet prior for a discrete probability distribution.
DirichletPrior(prior.counts, initial.value = NULL)
DirichletPrior(prior.counts, initial.value = NULL)
prior.counts |
A vector of positive numbers representing prior counts. |
initial.value |
The initial value in the MCMC algorithm of the distribution being modeled. |
Steven L. Scott [email protected]
Gelman, Carlin, Stern, Rubin (2003), "Bayesian Data Analysis", Chapman and Hall.
Prior distributions over a discrete quantities.
PointMassPrior(location) PoissonPrior(mean, lower.limit = 0, upper.limit = Inf) DiscreteUniformPrior(lower.limit, upper.limit)
PointMassPrior(location) PoissonPrior(mean, lower.limit = 0, upper.limit = Inf) DiscreteUniformPrior(lower.limit, upper.limit)
location |
The location of the point mass. |
mean |
The mean of the Poisson distribution. |
lower.limit |
The smallest value within the support of the
distribution. The prior probability for numbers less than
|
upper.limit |
The largest value within the support of the
distribution. The prior probability for numbers greater than
|
Each function returns a prior object whose class is the same as the function name. All of these inherit from "DiscreteUniformPrior" and from "Prior".
The PoissonPrior
assumes a potentially truncated Poisson
distribution with the given mean.
Steven L. Scott [email protected]
## Specify an exact number of trees in a Bart model (see the BoomBart ## package). ntrees <- PointMassPrior(200) ## Uniform prior between 50 and 100 trees, including the endpoints. ntrees <- DiscreteUniformPrior(50, 100) ## Truncated Poisson prior, with a mean of 20, a lower endpoint of 1, ## and an upper endpoint of 50. ntrees <- PoissonPrior(20, 1, 50)
## Specify an exact number of trees in a Bart model (see the BoomBart ## package). ntrees <- PointMassPrior(200) ## Uniform prior between 50 and 100 trees, including the endpoints. ntrees <- DiscreteUniformPrior(50, 100) ## Truncated Poisson prior, with a mean of 20, a lower endpoint of 1, ## and an upper endpoint of 50. ntrees <- PoissonPrior(20, 1, 50)
Evaluate the multivariate normal density.
dmvn(y, mu, sigma, siginv = NULL, ldsi = NULL, logscale = FALSE)
dmvn(y, mu, sigma, siginv = NULL, ldsi = NULL, logscale = FALSE)
y |
A numeric vector or matrix containing the data whose density is desired. |
mu |
The mean of the distribution. A vector. |
sigma |
The variance matrix of the distribution. A matrix. |
siginv |
The inverse of |
ldsi |
The log determinant of |
logscale |
Logical. If |
A vector containing the density of each row of y
.
Steven L. Scott [email protected]
A DoubleModel
is a class of prior distributions for
real valued scalar parameters. A DoubleModel
is sometimes used
by a probability model that does not have a conjugate prior.
Steven L. Scott [email protected]
BetaPrior
,
GammaPrior
,
LognormalPrior
,
NormalPrior
,
SdPrior
,
TruncatedGammaPrior
,
and UniformPrior
.
ExternalLegendLayout
sets up a plotting region to plot a
regular grid of points, with an optional legend on the top or right of
the grid.
AddExternalLegend
adds a legend to a grid of plots that was set
up using ExternalLegendLayout
.
ExternalLegendLayout(nrow, ncol, legend.labels, legend.location = c("top", "right"), outer.margin.lines = rep(4, 4), gap.between.plots = rep(0, 4), legend.cex = 1, x.axis = TRUE, y.axis = TRUE) AddExternalLegend(legend.labels, legend.location = c("top", "right"), legend.cex =1, bty = "n", ...)
ExternalLegendLayout(nrow, ncol, legend.labels, legend.location = c("top", "right"), outer.margin.lines = rep(4, 4), gap.between.plots = rep(0, 4), legend.cex = 1, x.axis = TRUE, y.axis = TRUE) AddExternalLegend(legend.labels, legend.location = c("top", "right"), legend.cex =1, bty = "n", ...)
nrow |
The number of rows in the array of plots. |
ncol |
The number of columns in the array of plots. |
legend.labels |
The labels to be used in the legend. |
legend.location |
Specifies whether the legend should appear on
the top or the right hand side. It can also be |
outer.margin.lines |
A vector of length four giving the number of
lines of text desired for the outer margins of the plot. See the
|
gap.between.plots |
A vector of length 4 giving the number of
lines of text to leave between grid panels. See the |
legend.cex |
The scale factor that will be used for legend text. This must match the scale factor used in add.external.legend. |
x.axis |
Will any plots have a horizontal axis? |
y.axis |
Will any plots have a vertical axis? |
bty |
Type of box to draw around the legend. Can be "n" (for no
box) or "o" for a box. See |
... |
Extra arguments passed to |
ExternalLegendLayout
returns the original graphical
parameters, intended for use with on.exit
.
AddExternalLegend
returns invisible NULL
.
Steven L. Scott
example.plot <- function() { x <- rnorm(100) y <- rnorm(100) scale <- range(x, y) opar <- ExternalLegendLayout(nrow = 2, ncol = 2, legend.labels = c("foo", "bar")) on.exit({par(opar); layout(1)}) hist(x, xlim = scale, axes = FALSE, main = "") mtext("X", side = 3, line = 1) box() plot(x, y, xlim = scale, ylim = scale, axes = FALSE) box() axis(3) axis(4) plot(y, x, xlim = scale, ylim = scale, axes = FALSE, pch = 2, col = 2) box() axis(1) axis(2) hist(y, xlim = scale, axes = FALSE, main = "") mtext("Y", side = 1, line = 1) box() AddExternalLegend(legend.labels = c("foo", "bar"), pch = 1:2, col = 1:2, legend.cex = 1.5) } ## Now call example.plot(). example.plot() ## Because of the call to on.exit(), in example.plot, ## the original plot layout is restored. hist(1:10)
example.plot <- function() { x <- rnorm(100) y <- rnorm(100) scale <- range(x, y) opar <- ExternalLegendLayout(nrow = 2, ncol = 2, legend.labels = c("foo", "bar")) on.exit({par(opar); layout(1)}) hist(x, xlim = scale, axes = FALSE, main = "") mtext("X", side = 3, line = 1) box() plot(x, y, xlim = scale, ylim = scale, axes = FALSE) box() axis(3) axis(4) plot(y, x, xlim = scale, ylim = scale, axes = FALSE, pch = 2, col = 2) box() axis(1) axis(2) hist(y, xlim = scale, axes = FALSE, main = "") mtext("Y", side = 1, line = 1) box() AddExternalLegend(legend.labels = c("foo", "bar"), pch = 1:2, col = 1:2, legend.cex = 1.5) } ## Now call example.plot(). example.plot() ## Because of the call to on.exit(), in example.plot, ## the original plot layout is restored. hist(1:10)
Specifies gamma prior distribution.
GammaPrior(a = NULL, b = NULL, prior.mean = NULL, initial.value = NULL) TruncatedGammaPrior(a = NULL, b = NULL, prior.mean = NULL, initial.value = NULL, lower.truncation.point = 0, upper.truncation.point = Inf)
GammaPrior(a = NULL, b = NULL, prior.mean = NULL, initial.value = NULL) TruncatedGammaPrior(a = NULL, b = NULL, prior.mean = NULL, initial.value = NULL, lower.truncation.point = 0, upper.truncation.point = Inf)
a |
The shape parameter in the Gamma(a, b) distribution. |
b |
The scale paramter in the Gamma(a, b) distribution. |
prior.mean |
The mean the Gamma(a, b) distribution, which is a/b. |
initial.value |
The initial value in the MCMC algorithm of the variable being modeled. |
lower.truncation.point |
The lower limit of support for the truncated gamma distribution. |
upper.truncation.point |
The upper limit of support for the truncated gamma distribution. |
The mean of the Gamma(a, b) distribution is a/b and the variance is
a/b^2. If prior.mean
is not NULL
, then one of either
a
or b
must be non-NULL
as well.
GammaPrior is the conjugate prior for a Poisson mean or an exponential
rate. For a Poisson mean a
corresponds to a prior sum of
observations and b
to a prior number of observations. For an
exponential rate the roles are reversed a
represents a number
of observations and b
the sum of the observed durations. The
gamma distribution is a generally useful for parameters that must be
positive.
The gamma distribution is the conjugate prior for the reciprocal of a
Guassian variance, but SdPrior
should usually be used in
that case.
A TruncatedGammaPrior is a GammaPrior with support truncated to the
interval (lower.truncation.point, upper.truncation.point)
.
If an object specifically needs a GammaPrior
you typically
cannot pass a TruncatedGammaPrior
.
Steven L. Scott [email protected]
Gelman, Carlin, Stern, Rubin (2003), "Bayesian Data Analysis", Chapman and Hall.
This function is mainly intended for example code and unit
testing. It generates a data.frame
containing all factor data.
GenerateFactorData(factor.levels.list, sample.size)
GenerateFactorData(factor.levels.list, sample.size)
factor.levels.list |
A list of character vectors giving factor level names. The names attribute of this list becomes the set of variables names for the return data frame. |
sample.size |
The desired number of rows in the returned data frame. |
Steven L. Scott [email protected]
foo <- GenerateFactorData(list(a = c("foo", "bar", "baz"), b = c("larry", "moe", "curly", "shemp")), 50) head(foo) # a b # 1 bar curly # 2 foo curly # 3 bar moe # 4 bar moe # 5 baz curly # 6 bar curly
foo <- GenerateFactorData(list(a = c("foo", "bar", "baz"), b = c("larry", "moe", "curly", "shemp")), 50) head(foo) # a b # 1 bar curly # 2 foo curly # 3 bar moe # 4 bar moe # 5 baz curly # 6 bar curly
Plot a bunch of histograms describing the marginal distributions the columns in a data frame.
histabunch(x, gap = 1, same.scale = FALSE, boxes = FALSE, min.continuous = 12, max.factor = 40, vertical.axes = FALSE, ...)
histabunch(x, gap = 1, same.scale = FALSE, boxes = FALSE, min.continuous = 12, max.factor = 40, vertical.axes = FALSE, ...)
x |
A matrix or data frame containing the variables to be plotted. |
gap |
The gap between the plots, measured in lines of text. |
same.scale |
Logical. Indicates whether the histograms should all be plotted on the same scale. |
boxes |
Logical. Indicates whether boxes should be drawn around the histograms. |
min.continuous |
Numeric variables with more than |
max.factor |
Factors with more than |
vertical.axes |
Logical value indicating whether the histograms should be given vertical "Y" axes. |
... |
Extra arguments passed to hist (for numeric variables) or barplot (for factors). |
Called for its side effect, which is to produce multiple histograms on the current graphics device.
Steven L. Scott
data(airquality) histabunch(airquality)
data(airquality) histabunch(airquality)
Density for the inverse Wishart distribution.
dInverseWishart(Sigma, sum.of.squares, nu, logscale = FALSE, log.det.sumsq = log(det(sum.of.squares))) InverseWishartPrior(variance.guess, variance.guess.weight)
dInverseWishart(Sigma, sum.of.squares, nu, logscale = FALSE, log.det.sumsq = log(det(sum.of.squares))) InverseWishartPrior(variance.guess, variance.guess.weight)
Sigma |
Argument (random variable) for the inverse Wishart distribution. A positive definite matrix. |
nu |
The "degrees of freedom" parameter of the inverse Wishart
distribution. The distribution is only defined for |
sum.of.squares |
A positive definite matrix. Typically this is the sum of squares that is the sufficient statistic for the inverse Wishart distribution. |
logscale |
Logical. If |
log.det.sumsq |
The log determinant of |
variance.guess |
A prior guess at the value of the variance matrix the prior is modeling. |
variance.guess.weight |
A positive scalar indicating the number
of observations worth of weight to place on |
The inverse Wishart distribution has density function
dInverseWishart
returns the scalar density (or log density) at
the specified value. This function is not vectorized, so only one
random variable (matrix) can be evaluated at a time.
InverseWishartPrior
returns a list that encodes the parameters
of the distribution in a format expected by underlying C++ code.
Steven L. Scott [email protected]
dWishart
,
rWishart
,
NormalInverseWishartPrior
Density, distribution function, quantile function, and random draws from the inverse gamma distribution.
dinvgamma(x, shape, rate, logscale = FALSE) pinvgamma(x, shape, rate, lower.tail = TRUE, logscale = FALSE) qinvgamma(p, shape, rate, lower.tail = TRUE, logscale = FALSE) rinvgamma(n, shape, rate)
dinvgamma(x, shape, rate, logscale = FALSE) pinvgamma(x, shape, rate, lower.tail = TRUE, logscale = FALSE) qinvgamma(p, shape, rate, lower.tail = TRUE, logscale = FALSE) rinvgamma(n, shape, rate)
x |
A vector of deviates where the density or distribution function is to be evaluated. |
p |
A vector of probabilities representing CDF values (if
|
n |
The desired number of draws from the inverse gamma distribution. |
shape |
The shape parameter. |
rate |
The 'rate' parameter. NOTE: The term 'rate' is used to
match the corresponding parameter in |
logscale |
Logical. If |
lower.tail |
Logical. If TRUE then cumulative probabilities are measured from zero, as in a CDF. If FALSE then cumulative are measured from infinity, as in a survivor function. |
rinvgamma
returns draws from the distribution.
dinvgamma
returns the density function.
pinvgamma
returns the cumulative distribution function
(or survivor function, if lower.tail == FALSE
).
qinvgamma
returns quantiles from the distribution.
qinvgamma
and pinvgamma
are inverse functions.
Steven L. Scott [email protected]
Check whether a number is even or odd.
IsEven(x) IsOdd(x)
IsEven(x) IsOdd(x)
x |
An integer or vector of integers. |
Logical indicating whether the argument is even (or odd).
Steven L. Scott
IsEven(2) ## TRUE IsOdd(2) ## FALSE
IsEven(2) ## TRUE IsOdd(2) ## FALSE
Returns the log of the multivariate gamma function.
lmgamma(y, dimension)
lmgamma(y, dimension)
y |
The function argument, which must be a positive scalar. |
dimension |
The dimension of the multivariate gamma function, which must be an integer >= 1. |
The multivariate gamma function is
The multivariate gamma function shows up as part of the normalizing constant for the Wishart and inverse Wishart distributions.
Returns the log of the multivariate gamma function. Note that
this function is not vectorized. Both y
and dimension
must be scalars, and the return value is a scalar.
Steven L. Scott [email protected]
Compute the log of the integrated Gaussian likelihood, where the model paramaters are integrated out with respect to a normal-inverse gamma prior.
LogIntegratedGaussianLikelihood(suf, prior)
LogIntegratedGaussianLikelihood(suf, prior)
suf |
A |
prior |
A |
Returns a scalar giving the log integrated likelihood.
Steven L. Scott [email protected]
prior <- NormalInverseGammaPrior(10, 2, sqrt(2), 1) y <- c(7.8949, 9.17438, 8.38808, 5.52521) suf <- GaussianSuf(y) loglike <- LogIntegratedGaussianLikelihood(suf, prior) # -9.73975
prior <- NormalInverseGammaPrior(10, 2, sqrt(2), 1) y <- c(7.8949, 9.17438, 8.38808, 5.52521) suf <- GaussianSuf(y) loglike <- LogIntegratedGaussianLikelihood(suf, prior) # -9.73975
Specifies a lognormal prior distribution.
LognormalPrior(mu = 0.0, sigma = 1.0, initial.value = NULL)
LognormalPrior(mu = 0.0, sigma = 1.0, initial.value = NULL)
mu |
mean of the corresponding normal distribution. |
sigma |
standard deviation of the corresponding normal distribution. WARNING: If something looks strange in your program, look out for SD != Variance errors. |
initial.value |
Initial value of the variable to be modeled
(e.g. in an MCMC algorithm). If |
A lognormal distribution, where log(y) ~ N(mu, sigma). The mean of this distribution is exp(mu + 0.5 * sigma^2), so don't only focus on the mean parameter here.
Steven L. Scott [email protected]
Gelman, Carlin, Stern, Rubin (2003), "Bayesian Data Analysis", Chapman and Hall.
https://en.wikipedia.org/wiki/Log-normal_distribution
The conjugate prior distribution for the parameters of a homogeneous Markov chain. The rows in the transition probability matrix modeled with independent Dirichlet priors. The distribution of the initial state is modeled with its own independent Dirichlet prior.
MarkovPrior(prior.transition.counts = NULL, prior.initial.state.counts = NULL, state.space.size = NULL, uniform.prior.value = 1)
MarkovPrior(prior.transition.counts = NULL, prior.initial.state.counts = NULL, state.space.size = NULL, uniform.prior.value = 1)
prior.transition.counts |
A matrix of the same dimension as the
transition probability matrix being modeled. Entry (i, j) represents
the "prior count" of transitions from state |
prior.initial.state.counts |
A vector of positive numbers representing prior counts of initial states. |
state.space.size |
If both prior.transition.counts and
prior.initial.state.counts are missing, then they will be filled
with an object of dimension state.space.size where all entries are
set to |
uniform.prior.value |
The default value to use for entries of
|
Steven L. Scott [email protected]
Gelman, Carlin, Stern, Rubin (2003), "Bayesian Data Analysis", Chapman and Hall.
Given two data frames with the same data, but with rows and columns in potentially different orders, produce a pair of permutations such that data2[row.permutation, column.permutation] matches data1.
MatchDataFrame(data.to.match, data.to.permute)
MatchDataFrame(data.to.match, data.to.permute)
data.to.match |
The data frame to be matched. |
data.to.permute |
The data frame to be permuted. |
Returns a list with two elements.
column.permutation |
A vector of indices such that the columns of
|
row.permutation |
A vector of indices such that the rows of
|
Steven L. Scott [email protected]
x1 <- data.frame(larry = rnorm(10), moe = 1:10, curly = rpois(10, 2)) x2 <- x1[c(1:5, 10:6), c(3, 1, 2)] m <- MatchDataFrame(x1, x2) x2[m$row.permutation, m$column.permutation] == x1 ## all TRUE
x1 <- data.frame(larry = rnorm(10), moe = 1:10, curly = rpois(10, 2)) x2 <- x1[c(1:5, 10:6), c(3, 1, 2)] m <- MatchDataFrame(x1, x2) x2[m$row.permutation, m$column.permutation] == x1 ## all TRUE
Quickly scan a matrix from a file.
mscan(fname, nc = 0, header = FALSE, burn = 0, thin = 0, nlines = 0L, sep = "", ...)
mscan(fname, nc = 0, header = FALSE, burn = 0, thin = 0, nlines = 0L, sep = "", ...)
fname |
The name of the file from which to scan the data. |
nc |
The number of columns in the matrix to be read. If zero then the number of columns will be determined by the number of columns in the first line of the file. |
header |
logical indicating whether the file contains a header row. |
burn |
An integer giving the number of initial lines of the matrix to discard. |
thin |
An integer. If thin > 1 then keep every thin\'th line. This is useful for reading in very large files of MCMC output, for example. |
nlines |
If positive, the number of data lines to scan from the data file (e.g. for an MCMC algorithm that is only partway done). Otherwise the entire file will be read. |
sep |
Field separator in the data file. |
... |
Extra arguments passed to 'scan'. |
This function is similar to read.table
, but scanning a
matrix of homogeneous data is much faster because there is much less
format deduction.
The matrix stored in the data file.
Steven L. Scott [email protected]
filename <- file.path(tempdir(), "example.data") cat("foo bar baz", "1 2 3", "4 5 6", file = filename, sep = "\n") m <- mscan(filename, header = TRUE) m ## foo bar baz ## [1,] 1 2 3 ## [2,] 4 5 6
filename <- file.path(tempdir(), "example.data") cat("foo bar baz", "1 2 3", "4 5 6", file = filename, sep = "\n") m <- mscan(filename, header = TRUE) m ## foo bar baz ## [1,] 1 2 3 ## [2,] 4 5 6
A multivariate normal prior distribution formed by the product of independent normal margins.
MvnDiagonalPrior(mean.vector, sd.vector)
MvnDiagonalPrior(mean.vector, sd.vector)
mean.vector |
A vector giving the mean of the prior distribution. |
sd.vector |
The standard deviations of the components in the distribution. I.e. the square root of the diagonal of the variance matrix. |
Steven L. Scott [email protected]
Gelman, Carlin, Stern, Rubin (2003), "Bayesian Data Analysis", Chapman and Hall.
A prior for the parameters of the multivariate normal distribution that assumes Sigma to be a diagonal matrix with elements modeled by independent inverse Gamma priors.
MvnIndependentSigmaPrior(mvn.prior, sd.prior.list)
MvnIndependentSigmaPrior(mvn.prior, sd.prior.list)
mvn.prior |
An object of class |
sd.prior.list |
A list of |
Steven L. Scott [email protected]
Gelman, Carlin, Stern, Rubin (2003), "Bayesian Data Analysis", Chapman and Hall.
A multivariate normal prior distribution.
MvnPrior(mean, variance)
MvnPrior(mean, variance)
mean |
A vector giving the mean of the prior distribution. |
variance |
A symmetric positive definite matrix giving the variance of the prior distribution. |
Steven L. Scott [email protected]
Gelman, Carlin, Stern, Rubin (2003), "Bayesian Data Analysis", Chapman and Hall.
A multivaraite normal prior distribution, typically used as the prior distribution for the mean of multivaraite normal data. The variance of this distribution is proportional to another parameter "Sigma" that exists elsewhere. Usually "Sigma" is the variance of the data.
This distribution is the "normal" part of a "normal-inverse Wishart" distribution.
MvnGivenSigmaMatrixPrior(mean, sample.size)
MvnGivenSigmaMatrixPrior(mean, sample.size)
mean |
A vector giving the mean of the prior distribution. |
sample.size |
A positive scalar. The variance of the
distribution is |
Steven L. Scott [email protected]
Gelman, Carlin, Stern, Rubin (2003), "Bayesian Data Analysis", Chapman and Hall.
The NormalInverseGammaPrior is the conjugate prior for the mean and variance of the scalar normal distribution. The model says that
NormalInverseGammaPrior(mu.guess, mu.guess.weight = .01, sigma.guess, sigma.guess.weight = 1, ...)
NormalInverseGammaPrior(mu.guess, mu.guess.weight = .01, sigma.guess, sigma.guess.weight = 1, ...)
mu.guess |
The mean of the prior distribution. This is
|
mu.guess.weight |
The number of observations worth of weight
assigned to |
sigma.guess |
A prior estimate at the value of |
sigma.guess.weight |
The number of observations worth of weight
assigned to |
... |
blah |
Steven L. Scott [email protected]
Gelman, Carlin, Stern, Rubin (2003), "Bayesian Data Analysis", Chapman and Hall.
The NormalInverseWishartPrior is the conjugate prior for the mean and variance of the multivariate normal distribution. The model says that
The distribution is parameterized by
S
,
the inverse of the sum of squares matrix, and the scalar
degrees of freedom parameter nu
.
The distribution is improper if .
NormalInverseWishartPrior(mean.guess, mean.guess.weight = .01, variance.guess, variance.guess.weight = nrow(variance.guess) + 1)
NormalInverseWishartPrior(mean.guess, mean.guess.weight = .01, variance.guess, variance.guess.weight = nrow(variance.guess) + 1)
mean.guess |
The mean of the prior distribution. This is
|
mean.guess.weight |
The number of observations worth of weight
assigned to |
variance.guess |
A prior estimate at the value of |
variance.guess.weight |
The number of observations worth of weight
assigned to |
Steven L. Scott [email protected]
Gelman, Carlin, Stern, Rubin (2003), "Bayesian Data Analysis", Chapman and Hall.
Specifies a scalar Gaussian prior distribution.
NormalPrior(mu, sigma, initial.value = mu, fixed = FALSE)
NormalPrior(mu, sigma, initial.value = mu, fixed = FALSE)
mu |
The mean of the prior distribution. |
sigma |
The standard deviation of the prior distribution. |
initial.value |
The initial value of the parameter being modeled in the MCMC algorithm. |
fixed |
Should the deviate modeled by this distribution be fixed at its initial value? (Used for debugging in some code. Not universally respected.) |
Steven L. Scott [email protected]
Gelman, Carlin, Stern, Rubin (2003), "Bayesian Data Analysis", Chapman and Hall.
A pairs plot showing the posterior distribution of the given list of Monte Carlo draws. Plots above the diagonal show the posterior distribution on a scale just wide enough to fit the plots. The diagonal shows a marginal density plot, and the subdiagonal shows the distribution with all plots on a common scale.
PairsDensity(draws, nlevels = 20, lty = NULL, color = NULL, subset = NULL, labels, legend.location = "top", legend.cex = 1, label.cex = 1, ...)
PairsDensity(draws, nlevels = 20, lty = NULL, color = NULL, subset = NULL, labels, legend.location = "top", legend.cex = 1, label.cex = 1, ...)
draws |
Either a matrix or a list of matrices. If a list is provided then each list element is plotted as a separate set of contours, and all matrices must have the same number of columns (though the number of rows can differ). |
nlevels |
The number of contour levels to plot. |
lty |
The line types to use for the different elements in
|
color |
The color to use for different elements in |
subset |
If draws is a list, then this can be a numerical vector.
If draws has names, then subset can be a character vector naming
which elements to include. If |
labels |
If |
legend.location |
Either |
legend.cex |
Scale factor to use for the legend labels. |
label.cex |
Scale factor to use for the row and column labels. |
... |
Extra arguments (graphical parameters), passed to
|
Steven L. Scott
pairs
, CompareDensities
, CompareManyDensities
## You can see the pairs plot for a single set of draws. y <- matrix(rnorm(5000, mean = 1:5), ncol = 5, byrow = TRUE) PairsDensity(y) ## You can also compare two or more sets of draws. z <- matrix(rnorm(2500, mean = 2:6), ncol = 5, byrow = TRUE) PairsDensity(list("first set" = y, "second set" = z))
## You can see the pairs plot for a single set of draws. y <- matrix(rnorm(5000, mean = 1:5), ncol = 5, byrow = TRUE) PairsDensity(y) ## You can also compare two or more sets of draws. z <- matrix(rnorm(2500, mean = 2:6), ncol = 5, byrow = TRUE) PairsDensity(list("first set" = y, "second set" = z))
Contour plot of one ore more bivariate densities. This function was originally created to implement PairsDensity, but might be useful on its own.
PlotDensityContours(draws, x.index = 1, y.index = 2, xlim = NULL, ylim = NULL, nlevels = 20, subset = NULL, color = NULL, lty = NULL, axes = TRUE, ...)
PlotDensityContours(draws, x.index = 1, y.index = 2, xlim = NULL, ylim = NULL, nlevels = 20, subset = NULL, color = NULL, lty = NULL, axes = TRUE, ...)
draws |
Either a matrix or a list of matrices. If a list is provided then each list element is plotted as a separate set of contours, and all matrices must have the same number of columns (though the number of rows can differ). |
x.index |
The index of the parameter to plot on the horizonal axis. |
y.index |
The index of the beta coefficient to plot on the vertical axis. |
xlim |
Limits on the horizontal axis. If NULL then the plot is just wide enough to fit the contours. |
ylim |
Limits on the vertical axis. If NULL then the plot is just tall enough to fit the contours. |
nlevels |
The number of contour levels to plot. |
subset |
If draws is a list, then this can be a numerical vector.
If draws has names, then subset can be a character vector naming
which elements to include. If |
color |
The color to use for different elemetns in |
lty |
The line types to use for the different elements in
|
axes |
Logical. Should x and y axies be drawn? |
... |
Extra arguments passed to |
Steven L. Scott
## You can see the pairs plot for a single set of draws. y <- matrix(rnorm(5000, mean = 1:5), ncol = 5, byrow = TRUE) PlotDensityContours(y, 3, 1) ## You can also compare two or more sets of draws. z <- matrix(rnorm(2500, mean = 2:6), ncol = 5, byrow = TRUE) PlotDensityContours(list("first set" = y, "second set" = z), 3, 1)
## You can see the pairs plot for a single set of draws. y <- matrix(rnorm(5000, mean = 1:5), ncol = 5, byrow = TRUE) PlotDensityContours(y, 3, 1) ## You can also compare two or more sets of draws. z <- matrix(rnorm(2500, mean = 2:6), ncol = 5, byrow = TRUE) PlotDensityContours(list("first set" = y, "second set" = z), 3, 1)
Produces an dynamic distribution plot where gray scale shading is used to show the evolution of a distribution over an index set. This function is particularly useful when the index set is too large to do side-by-side boxplots.
PlotDynamicDistribution(curves, timestamps = NULL, quantile.step=.01, xlim = NULL, xlab = "Time", ylim = range(curves, na.rm = TRUE), ylab = "distribution", add = FALSE, axes = TRUE, ...)
PlotDynamicDistribution(curves, timestamps = NULL, quantile.step=.01, xlim = NULL, xlab = "Time", ylim = range(curves, na.rm = TRUE), ylab = "distribution", add = FALSE, axes = TRUE, ...)
curves |
A matrix where each row represents a curve (e.g. a simulation of a time series from a posterior distribution) and columns represent different points in the index set. For example, a long time series would be a wide matrix. |
timestamps |
An optional vector of "time stamps" that
|
quantile.step |
Each color step in the plot corresponds to this difference in quantiles. Smaller values make prettier plots, but the plots take longer to produce. |
xlim |
The x limits (x1, x2) of the plot. Note that |
xlab |
Label for the horzontal axis. |
ylim |
The y limits (y1, y2) of the plot. Note that |
ylab |
Label for the vertical axis. |
add |
Logical. If true then add the plot to the current plot. Otherwise a fresh plot will be created. |
axes |
Logical. Should axes be added to the plot? |
... |
Extra arguments to pass on to |
The function works by passing many calls to
polygon
. Each polygon is associated with a quantile
level, with darker shading near the median.
This function is called for its side effect, which is to produce a plot on the current graphics device.
Steven L. Scott [email protected]
x <- t(matrix(rnorm(1000 * 100, 1:100, 1:100), nrow=100)) ## x has 1000 rows, and 100 columns. Column i is N(i, i^2) noise. PlotDynamicDistribution(x) time <- as.Date("2010-01-01", format = "%Y-%m-%d") + (0:99 - 50)*7 PlotDynamicDistribution(x, time)
x <- t(matrix(rnorm(1000 * 100, 1:100, 1:100), nrow=100)) ## x has 1000 rows, and 100 columns. Column i is N(i, i^2) noise. PlotDynamicDistribution(x) time <- as.Date("2010-01-01", format = "%Y-%m-%d") + (0:99 - 50)*7 PlotDynamicDistribution(x, time)
Produces individual autocorrelation functions for many-valued time
series such as those produced by highly multivariate MCMC output.
Cross-correlations such as those produced by acf
are not shown.
PlotMacf(x, lag.max = 40, gap = 0.5, main = NULL, boxes = TRUE, xlab = "lag", ylab = "ACF", type = "h")
PlotMacf(x, lag.max = 40, gap = 0.5, main = NULL, boxes = TRUE, xlab = "lag", ylab = "ACF", type = "h")
x |
matrix or 3-way array of MCMC output (or other time series). The first dimension represents discrete time. |
lag.max |
maximum lag to use when computing ACF's. |
gap |
non-negative scalar. gap between plots |
main |
character. main title for the plot |
boxes |
logical. Should boxes be drawn around the plots |
xlab |
character label for horizontal axis. |
ylab |
character label for vertical axis. |
type |
type of line plot to show. Defaults to "h". See
|
Called for its side effect
Steven L. Scott
x <- matrix(rnorm(1000), ncol=10) PlotMacf(x)
x <- matrix(rnorm(1000), ncol=10) PlotMacf(x)
Plots many time series plots on the same graphical device. Each plot gets its own frame. Scales can be adjusted to see variation in each plot (each plot gets its own scale), or variation between plots (common scale).
PlotManyTs(x, type = "l", gap = 0, boxes = TRUE, truth = NULL, thin = 1, labs, same.scale = TRUE, ylim = NULL, refline = NULL, color = NULL, ...)
PlotManyTs(x, type = "l", gap = 0, boxes = TRUE, truth = NULL, thin = 1, labs, same.scale = TRUE, ylim = NULL, refline = NULL, color = NULL, ...)
x |
Matrix, data frame, list, or 3-dimensional array to be plotted. |
type |
type of line plots to produce. See
|
gap |
Number of lines of space to put between plots. |
boxes |
Logical indicating whether boxes should be drawn around each plot. |
truth |
A vector or matrix of reference values to be added to
each plot as a horizontal line. The dimension should match
|
thin |
Frequency of observations to plot. E.g. |
labs |
Optional character vector giving the title (e.g. variable
name) for each plot. If |
same.scale |
Logical indicating whether plots should be drawn
with a common vertical axis, which is displayed on alternating rows
of the plot. If |
ylim |
Scale of the vertical axis. If non-NULL then same.scale
is set to |
refline |
a vector or scalar value to use as a reference line.
This is a supplement to the |
color |
Vector of colors to use in the plots. |
... |
Steven L. Scott
plot.ts
(for plotting a small number of time series)
plot.macf
x <- matrix(rnorm(1000), ncol = 10) PlotManyTs(x) PlotManyTs(x, same = FALSE)
x <- matrix(rnorm(1000), ncol = 10) PlotManyTs(x) PlotManyTs(x, same = FALSE)
A conjugate prior for regression coefficients, conditional on residual variance, and sample size.
RegressionCoefficientConjugatePrior( mean, sample.size, additional.prior.precision = numeric(0), diagonal.weight = 0)
RegressionCoefficientConjugatePrior( mean, sample.size, additional.prior.precision = numeric(0), diagonal.weight = 0)
mean |
The mean of the prior distribution, denoted 'b' below. See Details. |
sample.size |
The value denoted |
additional.prior.precision |
A vector of non-negative numbers
representing the diagonal matrix |
diagonal.weight |
The weight given to the diagonal when XTX is
averaged with its diagonal. The purpose of |
A conditional prior for the coefficients (beta) in a linear regression
model. The prior is conditional on the residual variance
, the sample size n, and the design matrix X.
The prior is
where
The prior distribution also depends on the cross product matrix XTX and the sample size n, which are not arguments to this function. It is expected that the underlying C++ code will get those quantities elsewhere (presumably from the regression modeled by this prior).
Steven L. Scott [email protected]
Gelman, Carlin, Stern, Rubin (2003), "Bayesian Data Analysis", Chapman and Hall.
Produces repeated copies of an object.
RepList(object, times)
RepList(object, times)
object |
The object to be replicated. |
times |
The desired number of replications. |
Returns a list containing times
copies of object
.
Steven L. Scott
alist <- list(x = "foo", y = 12, z = c(1:3)) three.copies <- RepList(alist, 3)
alist <- list(x = "foo", y = 12, z = c(1:3)) three.copies <- RepList(alist, 3)
Simulate draws from the multivariate normal distribution.
rmvn(n = 1, mu, sigma = diag(rep(1., length(mu))))
rmvn(n = 1, mu, sigma = diag(rep(1., length(mu))))
n |
The desired number of draws. |
mu |
The mean of the distribution. A vector. |
sigma |
The variance matrix of the distribution. A matrix. |
Note that mu
and sigma
are the same for all n
draws. This function cannot handle separate parameters for each draw
the way rnorm
and similar functions for scalar random
variables can.
If n == 1
the return value is a vector. Otherwise it is
a matrix with n
rows and length(mu)
columns.
Steven L. Scott [email protected]
y1 <- rnorm(1, 1:3) ## y1 is a vector y2 <- rnorm(10, 1:3) ## y2 is a matrix
y1 <- rnorm(1, 1:3) ## y1 is a vector y2 <- rnorm(10, 1:3) ## y2 is a matrix
A wrapper for passing R functions to C++ code.
RVectorFunction(f, ...)
RVectorFunction(f, ...)
f |
A scalar-valued function of a vector-valued argument. The function can depend on other arguments as long as the vector valued argument appears in the first position. |
... |
Optional, named, extra arguments to be passed to f. These arguments are fixed at the time this object is created. For the purpose of evaluating f, these arguments do not update. |
The Boom library can handle the output of this function as a C++ function object. Note that evaluating R functions in C is no faster than evaluating them in R, but a wrapper like this is useful for interacting with C and C++ libraries that expect to operate on C and C++ functions.
A list containing the information needed to evaluate the function f in C++ code.
Steven L. Scott [email protected]
A matrix-normal prior distribution, intended as the conjugate prior for the regression coefficients in a multivariate linear regression.
ScaledMatrixNormalPrior(mean, nu)
ScaledMatrixNormalPrior(mean, nu)
mean |
A matrix giving the mean of the distributions |
nu |
A scale factor affecting the variance. |
The matrix normal distribution is a 3-parameter distribution MN(mu, Omega, V), where mu is the mean. A deviate from the distribution is a matrix B, where Cov(B[i, j], B[k, m]) = Omega[i, k] * Sigma[j, m]. If b = Vec(B) is the vector obtained by stacking columns of B, then b is multivariate normal with mean Vec(mu) and covariance matrix
(the kronecker product).
This prior distribution assumes the underlying C++ code knows where to find the predictor (X) matrix in the regression, and the residual variance matrix Sigma. The assumed prior distribution is B ~ MN(mu, X'X / nu, Sigma).
Like most other priors in Boom, this function merely encodes information expected by the underlying C++ code, ensuring correct names and formatting.
Steven L. Scott [email protected]
Specifies an inverse Gamma prior for a variance parameter, but inputs are defined in terms of a standard deviation.
SdPrior(sigma.guess, sample.size = .01, initial.value = sigma.guess, fixed = FALSE, upper.limit = Inf)
SdPrior(sigma.guess, sample.size = .01, initial.value = sigma.guess, fixed = FALSE, upper.limit = Inf)
sigma.guess |
A prior guess at the value of the standard deviation. |
sample.size |
The weight given to |
initial.value |
The initial value of the paramter in the MCMC algorithm. |
fixed |
Logical. Some algorithms allow you to fix sigma at a
particular value. If |
upper.limit |
If positive, this is the upper limit on possible values of the standard deviation parameter. Otherwise the upper limit is assumed infinite. Not supported by all MCMC algorithms. |
Steven L. Scott [email protected]
Gelman, Carlin, Stern, Rubin (2003), "Bayesian Data Analysis", Chapman and Hall.
Sufficient statistics for various models.
RegressionSuf(X = NULL, y = NULL, xtx = crossprod(X), xty = crossprod(X, y), yty = sum(y^2), n = length(y), xbar = colMeans(X), ybar = mean(y)) GaussianSuf(y)
RegressionSuf(X = NULL, y = NULL, xtx = crossprod(X), xty = crossprod(X, y), yty = sum(y^2), n = length(y), xbar = colMeans(X), ybar = mean(y)) GaussianSuf(y)
X |
The predictor matrix for a regression problem. |
y |
The data, or the regression response variable. |
xtx |
The cross product of the design matrix. "X transpose X." |
xty |
The cross product of the design matrix with the response vector. "X transpose y." |
yty |
The sum of the squares of the response vector. "y transpose y." |
n |
The sample size. |
xbar |
A vector giving the average of each column in the predictor matrix. |
ybar |
The (scalar) mean of the response variable y. |
The returned value is a function containing the sufficient statistics for a regression model. Arguments are checked to ensure they have legal values. List names match the names expected by underlying C++ code.
Steven L. Scott [email protected]
X <- cbind(1, matrix(rnorm(3 * 100), ncol = 3)) y <- rnorm(100) ## Sufficient statistics can be computed from raw data, if it is ## available. suf1 <- RegressionSuf(X, y) ## The individual components can also be computed elsewhere, and ## provided as arguments. If n is very large, this can be a ## substantial coomputational savings. suf2 <- RegressionSuf(xtx = crossprod(X), xty = crossprod(X, y), yty = sum(y^2), n = 100, xbar = colMeans(X))
X <- cbind(1, matrix(rnorm(3 * 100), ncol = 3)) y <- rnorm(100) ## Sufficient statistics can be computed from raw data, if it is ## available. suf1 <- RegressionSuf(X, y) ## The individual components can also be computed elsewhere, and ## provided as arguments. If n is very large, this can be a ## substantial coomputational savings. suf2 <- RegressionSuf(xtx = crossprod(X), xty = crossprod(X, y), yty = sum(y^2), n = 100, xbar = colMeans(X))
Suggests a burn-in period for an MCMC chain based on the log likelihood values simulated in the final few iterations of the chain.
SuggestBurnLogLikelihood(log.likelihood, fraction = .10, quantile = .9)
SuggestBurnLogLikelihood(log.likelihood, fraction = .10, quantile = .9)
log.likelihood |
A numeric vector giving the log likelihood values for each MCMC iteration. |
fraction |
The fraction of the chain that should be used to
determine the log likelihood lower bound. The default setting looks
in the final 25% of the MCMC run. Must be an number less than 1.
If |
quantile |
The quantile of the values in the final fraction that must be exceeded before the burn-in period is declared over. |
Looks at the last fraction
of the log.likelihood
sequence and finds a specified quantile to use as a threshold.
Returns the first iteration where log.likelihood
exceeds this
threshold.
Returns a suggested number of iterations to discard. This can be
0 if fraction == 0
, which is viewed as a signal that no burn-in
is desired.
Steven L. Scott [email protected]
Systematic sampling of every thin
'th row of a matrix or vector.
Useful for culling MCMC output or denoising a plot.
thin(x, thin)
thin(x, thin)
x |
The array to be thinned. The first dimension is the one sampled over. |
thin |
The frequency of observations to keep. With |
The thinned vector, matrix, or array is returned.
Steven L. Scott
x <- rnorm(100) thin(x, 10) # returns a 10 vector y <- matrix(rnorm(200), ncol=2) thin(y, 10) # returns a 10 by 2 matrix
x <- rnorm(100) thin(x, 10) # returns a 10 vector y <- matrix(rnorm(200), ncol=2) thin(y, 10) # returns a 10 by 2 matrix
Return discard all but every k'th row of a matrix.
ThinMatrix(mat, thin)
ThinMatrix(mat, thin)
mat |
The matrix to be thinned. |
thin |
The distance between kepts lines from mat. The larger the number the fewer lines are kept. |
The bigger the value of thin
the more thinning that gets done.
For example, thin = 10
will keep every 10 lines from mat
.
The matrix mat, after discarding all but every thin
lines.
Steven L. Scott [email protected]
m <- matrix(1:100, ncol = 2) ThinMatrix(m, thin = 10) ## [,1] [,2] ## [1,] 10 60 ## [2,] 20 70 ## [3,] 30 80 ## [4,] 40 90 ## [5,] 50 100
m <- matrix(1:100, ncol = 2) ThinMatrix(m, thin = 10) ## [,1] [,2] ## [1,] 10 60 ## [2,] 20 70 ## [3,] 30 80 ## [4,] 40 90 ## [5,] 50 100
Creates a series of boxplots showing the evolution of a distribution over time.
TimeSeriesBoxplot(x, time, ylim = NULL, add = FALSE, ...)
TimeSeriesBoxplot(x, time, ylim = NULL, add = FALSE, ...)
x |
A matrix where each row represents a curve (e.g. a simulation of a time series from a posterior distribution) and columns represent time. A long time series would be a wide matrix. |
time |
A vector of class |
ylim |
limits for the y axis. |
add |
logical, if |
... |
Extra arguments to pass on to |
Called for its side effect, which is to produce a plot on the current graphics device.
Steven L. Scott [email protected]
x <- t(matrix(rnorm(1000 * 100, 1:100, 1:100), nrow=100)) ## x has 1000 rows, and 100 columns. Column i is N(i, i^2) noise. time <- as.Date("2010-01-01", format = "%Y-%m-%d") + (0:99 - 50)*7 TimeSeriesBoxplot(x, time)
x <- t(matrix(rnorm(1000 * 100, 1:100, 1:100), nrow=100)) ## x has 1000 rows, and 100 columns. Column i is N(i, i^2) noise. time <- as.Date("2010-01-01", format = "%Y-%m-%d") + (0:99 - 50)*7 TimeSeriesBoxplot(x, time)
Convert an object to a character string, suitable for including in error messages.
ToString(object, ...)
ToString(object, ...)
object |
An object to be printed to a string. |
... |
Extra arguments passed to |
A string (a character vector of length 1) containing the printed value of the object.
Steven L. Scott [email protected]
m <- matrix(1:6, ncol = 2) printed.matrix <- ToString(m) y <- c(1, 2, 3, 3, 3, 3, 3, 3) tab <- table(y) printed.table <- ToString(tab)
m <- matrix(1:6, ncol = 2) printed.matrix <- ToString(m) y <- c(1, 2, 3, 3, 3, 3, 3, 3) tab <- table(y) printed.table <- ToString(tab)
Returns the trace of the product of two matrices.
TraceProduct(A, B, b.is.symmetric = FALSE)
TraceProduct(A, B, b.is.symmetric = FALSE)
A |
The first matrix in the product. |
B |
The second matrix in the product. |
b.is.symmetric |
Logical. A |
Returns a number equivalent to sum(diag(A %*% B))
.
Steven L. Scott [email protected]
Specifies a uniform prior distribution on a real-valued scalar parameter.
UniformPrior(lo = 0, hi = 1, initial.value = NULL)
UniformPrior(lo = 0, hi = 1, initial.value = NULL)
lo |
The lower limit of support. |
hi |
The upper limit of support. |
initial.value |
The initial value of the parameter in question to
use in the MCMC algorithm. If |
Steven L. Scott [email protected]
Gelman, Carlin, Stern, Rubin (2003), "Bayesian Data Analysis", Chapman and Hall.
Density and random generation for the Wishart distribution.
dWishart(W, Sigma, nu, logscale = FALSE) rWishart(nu, scale.matrix, inverse = FALSE)
dWishart(W, Sigma, nu, logscale = FALSE) rWishart(nu, scale.matrix, inverse = FALSE)
W |
Argument (random variable) for the Wishart density. A symmetric positive definite matrix. |
Sigma |
Scale or "variance" parameter of the Wishart distribution. See the "details" section below. |
nu |
The "degrees of freedom" parameter of the Wishart
distribution. The distribution is only defined for |
logscale |
Logical. If |
scale.matrix |
For the Wishart distribution the
If simulating from the inverse Wishart, |
inverse |
Logical. If TRUE then simulate from the inverse Wishart distribution. If FALSE then simulate from the Wishart distribution. |
If nu
is an integer then a
random variable can be thought of as the sum of
nu
outer
products: , where
is a zero-mean
multivariate normal with variance matrix
Sigma
.
The Wishart distribution is
where p == nrow(W)
and is the
multivariate gamma function (see
lmgamma
).
dWishart
returns the density of the Wishart distribution. It
is not vectorized, so only one random variable (matrix) can be
evaluated at a time.
rWishart
returns one or more draws from the Wishart or inverse
Wishart distributions. If n > 0
the result is a 3-way array.
Unlike the rWishart
function from the stats
package, the first index corresponds to draws. This is in keeping
with the convention of other models from the Boom package.
Steven L. Scott [email protected]