Title: | Extended Empirical Saddlepoint Density Approximations |
---|---|
Description: | Tools for fitting the Extended Empirical Saddlepoint (EES) density of Fasiolo et al. (2018) <doi:10.1214/18-EJS1433>. |
Authors: | Matteo Fasiolo and Simon N. Wood |
Maintainer: | Matteo Fasiolo <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.0.7 |
Built: | 2024-10-30 06:52:04 UTC |
Source: | CRAN |
Given a sample X, it gives a pointwise evaluation of the multivariate normal (MVN) density fit at position y.
demvn(y, X, log = FALSE, verbose = TRUE, alpha = 2, beta = 1.25)
demvn(y, X, log = FALSE, verbose = TRUE, alpha = 2, beta = 1.25)
y |
points at which the MVN is evaluated. It can be either a d-dimensional vector or an n by d matrix, each row indicating a different position. |
X |
an n by d matrix containing the data. |
log |
if TRUE the log-density is returned. |
verbose |
currently not used. |
alpha |
tuning parameter of |
beta |
tuning parameter of |
The covariance matrix is estimated robustly, using the robCov
function.
A vector where the i-th entry is the density corresponding to the i-th row of y.
Matteo Fasiolo <[email protected]> and Simon N. Wood.
library(esaddle) X <- matrix(rnorm(2 * 1e3), 1e3, 2) # Sample used to fit a multivariate Gaussian demvn(rnorm(2), X, log = TRUE) # Evaluate the fitted log-density at a random location
library(esaddle) X <- matrix(rnorm(2 * 1e3), 1e3, 2) # Sample used to fit a multivariate Gaussian demvn(rnorm(2), X, log = TRUE) # Evaluate the fitted log-density at a random location
Gives a pointwise evaluation of the EES density (and optionally of its gradient) at one or more locations.
dsaddle( y, X, decay, deriv = FALSE, log = FALSE, normalize = FALSE, control = list(), multicore = !is.null(cluster), ncores = detectCores() - 1, cluster = NULL )
dsaddle( y, X, decay, deriv = FALSE, log = FALSE, normalize = FALSE, control = list(), multicore = !is.null(cluster), ncores = detectCores() - 1, cluster = NULL )
y |
points at which the EES is evaluated (d dimensional vector) or an n by d matrix, each row indicating a different position. |
X |
n by d matrix containing the data. |
decay |
rate at which the EES falls back on a normal density approximation, fitted to |
deriv |
If TRUE also the gradient of the log-saddlepoint density is returned. |
log |
If TRUE the log of the saddlepoint density is returned. |
normalize |
If TRUE the normalizing constant of the EES density will be computed. FALSE by default. |
control |
A list of control parameters with entries:
|
multicore |
if TRUE the empirical saddlepoint density at each row of y will be evaluated in parallel. |
ncores |
number of cores to be used. |
cluster |
an object of class |
A list with entries:
llk
the value of the EES log-density at each location y;
mix
for each location y, the fraction of saddlepoint used:
1 means that only ESS is used and 0 means that only a Gaussian fit is used;
iter
for each location y, the number of iteration needed to solve the
saddlepoint equation;
lambda
an n by d matrix, where the i-th row is the solution of the saddlepoint
equation corresponding to the i-th row of y;
grad
the gradient of the log-density at y (optional);
logNorm
the estimated log normalizing constant (optional);
Matteo Fasiolo <[email protected]> and Simon N. Wood.
Fasiolo, M., Wood, S. N., Hartig, F. and Bravington, M. V. (2016). An Extended Empirical Saddlepoint Approximation for Intractable Likelihoods. ArXiv http://arxiv.org/abs/1601.01849.
library(esaddle) ### Simple univariate example set.seed(4141) x <- rgamma(1000, 2, 1) # Evaluating EES at several point xSeq <- seq(-2, 8, length.out = 200) tmp <- dsaddle(y = xSeq, X = x, decay = 0.05, log = TRUE) # Un-normalized EES tmp2 <- dsaddle(y = xSeq, X = x, decay = 0.05, # EES normalized by importance sampling normalize = TRUE, control = list("method" = "IS", nNorm = 500), log = TRUE) # Plotting true density, EES and normal approximation plot(xSeq, exp(tmp$llk), type = 'l', ylab = "Density", xlab = "x") lines(xSeq, dgamma(xSeq, 2, 1), col = 3) lines(xSeq, dnorm(xSeq, mean(x), sd(x)), col = 2) lines(xSeq, exp(tmp2$llk), col = 4) suppressWarnings( rug(x) ) legend("topright", c("EES un-norm", "EES normalized", "Truth", "Gaussian"), col = c(1, 4, 3, 2), lty = 1)
library(esaddle) ### Simple univariate example set.seed(4141) x <- rgamma(1000, 2, 1) # Evaluating EES at several point xSeq <- seq(-2, 8, length.out = 200) tmp <- dsaddle(y = xSeq, X = x, decay = 0.05, log = TRUE) # Un-normalized EES tmp2 <- dsaddle(y = xSeq, X = x, decay = 0.05, # EES normalized by importance sampling normalize = TRUE, control = list("method" = "IS", nNorm = 500), log = TRUE) # Plotting true density, EES and normal approximation plot(xSeq, exp(tmp$llk), type = 'l', ylab = "Density", xlab = "x") lines(xSeq, dgamma(xSeq, 2, 1), col = 3) lines(xSeq, dnorm(xSeq, mean(x), sd(x)), col = 2) lines(xSeq, exp(tmp2$llk), col = 4) suppressWarnings( rug(x) ) legend("topright", c("EES un-norm", "EES normalized", "Truth", "Gaussian"), col = c(1, 4, 3, 2), lty = 1)
Calculates the empirical cumulant generating function (CGF) and its derivatives given a sample of n d-dimentional vectors.
ecgf(lambda, X, mix, grad = 0)
ecgf(lambda, X, mix, grad = 0)
lambda |
point at which the empirical CGF is evaluated (d-dimensional vector). |
X |
an n by d matrix containing the data. |
mix |
fraction of empirical and normal CGF to use. If |
grad |
if |
For details on the CGF estimator being used here, see Fasiolo et al. (2016).
A list with entries:
K
the value of the empirical CGF at lambda
;
dK
the value of the gradient empirical CGF wrt lambda at lambda
;
d2K
the value of the hessian of the empirical CGF wrt lambda at lambda
.
Matteo Fasiolo <[email protected]> and Simon N. Wood.
Fasiolo, M., Wood, S. N., Hartig, F. and Bravington, M. V. (2016). An Extended Empirical Saddlepoint Approximation for Intractable Likelihoods. ArXiv http://arxiv.org/abs/1601.01849.
X <- matrix(rnorm(2 * 1e3), 1e3, 2) K <- ecgf(lambda = c(0, 0), X = X, mix = 0.5, grad = 2) K$K # CGF K$dK # CGF' (gradient) K$d2K # CGF'' (Hessian)
X <- matrix(rnorm(2 * 1e3), 1e3, 2) K <- ecgf(lambda = c(0, 0), X = X, mix = 0.5, grad = 2) K$K # CGF K$dK # CGF' (gradient) K$d2K # CGF'' (Hessian)
Given a sample from a d-dimensional distribution, the routine finds the mode of the corresponding Extended Empirical Saddlepoint (EES) density.
findMode( X, decay, init = NULL, method = "BFGS", hess = FALSE, sadControl = list(), ... )
findMode( X, decay, init = NULL, method = "BFGS", hess = FALSE, sadControl = list(), ... )
X |
an n by d matrix containing the data. |
decay |
rate at which the SPA falls back on a normal density. Should be a positive number. See Fasiolo et al. (2016) for details. |
init |
d-dimensional vector containing the starting point for the optimization. By default
it is equal to |
method |
optimization method used by |
hess |
if TRUE also an estimate of the Hessian at the mode will be returned. |
sadControl |
list corresponding to the |
... |
Extra arguments to be passed to the optimization routine |
A list where mode
is the location of mode of the empirical saddlepoint density,
logDens
is the log-density at the mode and hess
(present only if argument hess==TRUE
)
is the approximate Hessian at convergence. The other entries are the same as for stats::optim
.
Matteo Fasiolo <[email protected]>.
Fasiolo, M., Wood, S. N., Hartig, F. and Bravington, M. V. (2016). An Extended Empirical Saddlepoint Approximation for Intractable Likelihoods. ArXiv http://arxiv.org/abs/1601.01849.
# library(esaddle) set.seed(4141) x <- rgamma(1000, 2, 1) # Fixing tuning parameter of EES decay <- 0.05 # Evaluating EES at several point xSeq <- seq(-2, 8, length.out = 200) tmp <- dsaddle(y = xSeq, X = x, decay = decay, log = TRUE) # Un-normalized EES # Plotting true density, EES and normal approximation plot(xSeq, exp(tmp$llk), type = 'l', ylab = "Density", xlab = "x") lines(xSeq, dgamma(xSeq, 2, 1), col = 3) suppressWarnings( rug(x) ) legend("topright", c("EES", "Truth"), col = c(1, 3), lty = 1) # Find mode and plot it res <- findMode(x, init = mean(x), decay = decay)$mode abline(v = res, lty = 2, lwd = 1.5)
# library(esaddle) set.seed(4141) x <- rgamma(1000, 2, 1) # Fixing tuning parameter of EES decay <- 0.05 # Evaluating EES at several point xSeq <- seq(-2, 8, length.out = 200) tmp <- dsaddle(y = xSeq, X = x, decay = decay, log = TRUE) # Un-normalized EES # Plotting true density, EES and normal approximation plot(xSeq, exp(tmp$llk), type = 'l', ylab = "Density", xlab = "x") lines(xSeq, dgamma(xSeq, 2, 1), col = 3) suppressWarnings( rug(x) ) legend("topright", c("EES", "Truth"), col = c(1, 3), lty = 1) # Find mode and plot it res <- findMode(x, init = mean(x), decay = decay)$mode abline(v = res, lty = 2, lwd = 1.5)
Obtains a robust estimate of the covariance matrix of a sample of multivariate data, using Campbell's (1980) method as described on p231-235 of Krzanowski (1988).
robCov(sY, alpha = 2, beta = 1.25)
robCov(sY, alpha = 2, beta = 1.25)
sY |
A matrix, where each column is a replicate observation on a multivariate r.v. |
alpha |
tuning parameter, see details. |
beta |
tuning parameter, see details. |
Campbell (1980) suggests an estimator of the covariance matrix which downweights observations
at more than some Mahalanobis distance d.0
from the mean.
d.0
is sqrt(nrow(sY))+alpha/sqrt(2)
. Weights are one for observations
with Mahalanobis distance, d
, less than d.0
. Otherwise weights are
d.0*exp(-.5*(d-d.0)^2/beta^2)/d
. The defaults are as recommended by Campbell.
This routine also uses pre-conditioning to ensure good scaling and stable
numerical calculations. If some of the columns of sY
has zero variance, these
are removed.
A list where:
COV
The estimated covariance matrix.
E
A square root of the inverse covariance matrix. i.e. the inverse cov
matrix is t(E)%*%E
;
half.ldet.V
Half the log of the determinant of the covariance matrix;
mY
The estimated mean;
sd
The estimated standard deviations of each variable.
weights
This is w1/sum(w1)*ncol(sY)
, where w1
are the weights of Campbell (1980).
lowVar
The indexes of the columns of sY
whose variance is zero (if any). These
variable were removed and excluded from the covariance matrix.
Simon N. Wood, maintained by Matteo Fasiolo <[email protected]>.
Krzanowski, W.J. (1988) Principles of Multivariate Analysis. Oxford. Campbell, N.A. (1980) Robust procedures in multivariate analysis I: robust covariance estimation. JRSSC 29, 231-237.
p <- 5;n <- 100 Y <- matrix(runif(p*n),p,n) robCov(Y)
p <- 5;n <- 100 Y <- matrix(runif(p*n),p,n) robCov(Y)
Simulate random variables from the Extended Empirical Saddlepoint density (ESS), using importance sampling and then resampling according to the importance weights.
rsaddle( n, X, decay, ml = 2, multicore = !is.null(cluster), cluster = NULL, ncores = detectCores() - 1, ... )
rsaddle( n, X, decay, ml = 2, multicore = !is.null(cluster), cluster = NULL, ncores = detectCores() - 1, ... )
n |
number of simulated vectors. |
X |
an m by d matrix containing the data. |
decay |
rate at which the ESS falls back on a normal density. Should be a positive number. See Fasiolo et al. (2016) for details. |
ml |
n random variables are generated from a Gaussian importance density with covariance matrix
|
multicore |
if TRUE the ESS densities corresponding the samples will be evaluated in parallel. |
cluster |
an object of class |
ncores |
number of cores to be used. |
... |
additional arguments to be passed to |
Notice that, while importance sampling is used, the output is a matrix of unweighted samples, obtained by resampling with probabilities proportional to the importance weights.
An n by d matrix containing the simulated vectors.
Matteo Fasiolo <[email protected]>.
Fasiolo, M., Wood, S. N., Hartig, F. and Bravington, M. V. (2016). An Extended Empirical Saddlepoint Approximation for Intractable Likelihoods. ArXiv http://arxiv.org/abs/1601.01849.
# Simulate bivariate data, where each marginal distribution is Exp(2) X <- matrix(rexp(2 * 1e3), 1e3, 2) # Simulate bivariate data from a saddlepoint fitted to X Z <- rsaddle(1000, X, decay = 0.5) # Look at first marginal distribution hist( Z[ , 1] )
# Simulate bivariate data, where each marginal distribution is Exp(2) X <- matrix(rexp(2 * 1e3), 1e3, 2) # Simulate bivariate data from a saddlepoint fitted to X Z <- rsaddle(1000, X, decay = 0.5) # Look at first marginal distribution hist( Z[ , 1] )
Performs k-fold cross-validation to choose the EES's tuning parameter, which determines the mixture between a consistent and a Gaussian estimator of the Cumulant Generating Function (CGF).
selectDecay( decay, simulator, K, nrep = 1, normalize = FALSE, draw = TRUE, multicore = !is.null(cluster), cluster = NULL, ncores = detectCores() - 1, control = list(), ... )
selectDecay( decay, simulator, K, nrep = 1, normalize = FALSE, draw = TRUE, multicore = !is.null(cluster), cluster = NULL, ncores = detectCores() - 1, control = list(), ... )
decay |
Numeric vector containing the possible values of the tuning parameter. |
simulator |
Function with prototype |
K |
the number of folds to be used in cross-validation. |
nrep |
Number of times the whole cross-validation procedure will be repeated, by calling
|
normalize |
if TRUE the normalizing constant of EES is normalized at each value of |
draw |
if |
multicore |
if TRUE each fold will run on a different core. |
cluster |
an object of class |
ncores |
number of cores to be used. |
control |
a list of control parameters, with entries:
|
... |
extra arguments to be passed to |
A list with entries:
negLogLik
A matrix length{decay}
by K*nrep
where the i-th row represent the negative loglikelihood
estimated for the i-th value of decay
, while each column represents a different fold and repetition.
summary
A matrix of summary results from the cross-validation procedure.
normConst
A matrix length{decay}
by nrep
where the i-th row contains the estimates of the normalizing constant.
The list is returned invisibly. If control$draw == TRUE
the function will also plot the cross-validation curve.
Matteo Fasiolo <[email protected]>.
Fasiolo, M., Wood, S. N., Hartig, F. and Bravington, M. V. (2016). An Extended Empirical Saddlepoint Approximation for Intractable Likelihoods. ArXiv http://arxiv.org/abs/1601.01849.
library(esaddle) # The data is far from normal: saddlepoint is needed and we expect # cross validation to be minimized at low "decay" set.seed(4124) selectDecay(decay = c(0.001, 0.01, 0.05, 0.1, 0.5, 1), simulator = function(...) rgamma(500, 2, 1), K = 5) # The data is far from normal: saddlepoint is not needed and we expect # the curve to be fairly flat for high "decay" selectDecay(decay = c(0.001, 0.01, 0.05, 0.1, 0.5, 1), simulator = function(...) rnorm(500, 0, 1), K = 5)
library(esaddle) # The data is far from normal: saddlepoint is needed and we expect # cross validation to be minimized at low "decay" set.seed(4124) selectDecay(decay = c(0.001, 0.01, 0.05, 0.1, 0.5, 1), simulator = function(...) rgamma(500, 2, 1), K = 5) # The data is far from normal: saddlepoint is not needed and we expect # the curve to be fairly flat for high "decay" selectDecay(decay = c(0.001, 0.01, 0.05, 0.1, 0.5, 1), simulator = function(...) rnorm(500, 0, 1), K = 5)