Title: | Ensemble Quadratic and Affine Invariant Markov Chain Monte Carlo |
---|---|
Description: | The Ensemble Quadratic and Affine Invariant Markov chain Monte Carlo algorithms provide an efficient way to perform Bayesian inference in difficult parameter space geometries. The Ensemble Quadratic Monte Carlo algorithm was developed by Militzer (2023) <doi:10.3847/1538-4357/ace1f1>. The Ensemble Affine Invariant algorithm was developed by Goodman and Weare (2010) <doi:10.2140/camcos.2010.5.65> and it was implemented in Python by Foreman-Mackey et al (2013) <doi:10.48550/arXiv.1202.3665>. The Quadratic Monte Carlo method was shown to perform better than the Affine Invariant method in the paper by Militzer (2023) <doi:10.3847/1538-4357/ace1f1> and the Quadratic Monte Carlo method is the default method used. The Chen-Shao Highest Posterior Density Estimation algorithm is used for obtaining credible intervals and the potential scale reduction factor diagnostic is used for checking the convergence of the chains. |
Authors: | Weston Roda [aut, cre] , Karsten Hempel [aut] , Sasha van Katwyk [aut] , Diepreye Ayabina [aut] , Children's Hospital of Eastern Ontario [fnd], Canada's Drug Agency [fnd], Institute of Health Economics [cph] |
Maintainer: | Weston Roda <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0.0 |
Built: | 2025-01-09 16:40:32 UTC |
Source: | CRAN |
This function runs the Ensemble Quadratic or Affine Invariant MCMC algorithm for Bayesian inference parameter estimation and it is based off of the papers by Militzer (2023), Goodman and Weare (2010), and Foreman-Mackey, Hogg, Lang, and Goodman (2013). The Ensemble Quadratic Monte Carlo algorithm was developed by Militzer (2023). The Ensemble Affine Invariant algorithm was developed by Goodman and Weare (2010) and it was implemented in Python by Foreman-Mackey, Hogg, Lang, and Goodman (2013). The Quadratic Monte Carlo method was shown to perform better than the Affine Invariant method in the paper by Militzer (2023) and the Quadratic method is the default method used in the 'ensemblealg' function.
ensemblealg( theta0, logfuns, T_iter, Thin_val, UseQuad = TRUE, a_par = NULL, ShowProgress = FALSE, ReturnCODA = FALSE )
ensemblealg( theta0, logfuns, T_iter, Thin_val, UseQuad = TRUE, a_par = NULL, ShowProgress = FALSE, ReturnCODA = FALSE )
theta0 |
The matrix of initial guesses for the MCMC chains |
logfuns |
A list object containing the log prior function and log likelihood function |
T_iter |
The number of iterations to run for each chain in the Ensemble MCMC algorithm |
Thin_val |
Every nth iteration is saved, where n is equal to the "Thin_val" parameter |
UseQuad |
If this bool is true, then the Ensemble Quadratic MCMC algorithm is used. Otherwise, the Ensemble Affine Invariant MCMC algorithm is used. (The default setting is true.) |
a_par |
The parameter 'a_par' is a performance parameter for the MCMC ensemble algorithm. (The default setting for the Quadratic algorithm is 'a_par' equal to 1.5 and the default setting for the Affine Invariant algorithm is 'a_par' equal to 2.) |
ShowProgress |
If this bool is true, then the progress of the algorithm is shown. Otherwise, the progress of the algorithm is not shown.(The default setting is false.) |
ReturnCODA |
If this bool is true, then the 'coda' package 'mcmc.list' object is returned along with the other outputs (The default setting is false.) |
A list object is returned that contains three matrices: theta_sample,
log_like_sample, and log_prior_sample.
theta_sample: this is the matrix of
parameter samples returned from the Ensemble MCMC algorithm, the matrix
dimensions are given by
(Number of parameters) x (Number of chains) x (Number of iterations)
log_like_sample: this is the matrix of log likelihood samples returned from
the Ensemble MCMC algorithm, the matrix dimensions are given by
(Number of chains) x (Number of iterations)
log_prior_sample: this is the matrix of log prior samples returned from the
Ensemble MCMC algorithm, the matrix dimensions are given by
(Number of chains) x (Number of iterations)
mcmc_list_coda: (optional) this is the 'coda' package 'mcmc.list' object that can be used with various MCMC diagnostic functions in the 'coda' package
Militzer B (2023) Study of Jupiter’s Interior with Quadratic
Monte Carlo Simulations. ApJ 953(111):20pp.
https://doi.org/10.3847/1538-4357/ace1f1
Goodman J and Weare J (2010) Ensemble samplers with affine invariance. Commun
Appl Math Comput Sci 5(1):65-80.
https://doi.org/10.2140/camcos.2010.5.65
Foreman-Mackey D, Hogg DW, Lang D, Goodman J (2013) emcee: The MCMC Hammer. PASP 125(925):306. https://doi.org/10.48550/arXiv.1202.3665
#Ensemble Quadratic MCMC algorithm example for fitting a Weibull #distribution #Assume the true parameters are a_shape = 20 sigma_scale = 900 #Random sample from the Weibull distribution with a = 20 and sigma = 900, #Y~WEI(a = 20, sigma = 900) num_ran_samples = 50 data_weibull = matrix(NA, nrow = 1, ncol = num_ran_samples) #Set the seed for this example set.seed(10) data_weibull = rweibull(num_ran_samples, shape = a_shape, scale = sigma_scale) #We want to estimate a_shape and sigma_scale #Log prior function for a_shape and sigma_scale #(assumed priors a_shape ~ U(1e-2, 1e2) and sigma_scale ~ U(1, 1e4)) logp <- function(param) { a_shape_use = param[1] sigma_scale_use = param[2] logp_val = dunif(a_shape_use, min = 1e-2, max = 1e2, log = TRUE) + dunif(sigma_scale_use, min = 1, max = 1e4, log = TRUE) return(logp_val) } #Log likelihood function for a_shape and sigma_scale logl <- function(param) { a_shape_use = param[1] sigma_scale_use = param[2] logl_val = sum(dweibull(data_weibull, shape = a_shape_use, scale = sigma_scale_use, log = TRUE)) return(logl_val) } logfuns = list(logp = logp, logl = logl) num_par = 2 #It is recommended to use at least twice as many chains as the number of #parameters to be estimated. num_chains = 2*num_par #Generate initial guesses for the MCMC chains theta0 = matrix(0, nrow = num_par, ncol = num_chains) temp_val = 0 j = 0 while(j < num_chains) { initial = c(runif(1, 1e-2, 1e2), runif(1, 1, 1e4)) temp_val = logl(initial) + logp(initial) while(is.na(temp_val) || is.infinite(temp_val)) { initial = c(runif(1, 1e-2, 1e2), runif(1, 1, 1e4)) temp_val = logl(initial) + logp(initial) } j = j + 1 message(paste('j:', j)) theta0[1,j] = initial[1] theta0[2,j] = initial[2] } num_chain_iterations = 1e4 thin_val_par = 10 #The total number of returned samples is given by #(num_chain_iterations/thin_val_par)*num_chains = 4e3 #Ensemble Quadratic MCMC algorithm Weibull_Quad_result = ensemblealg(theta0, logfuns, T_iter = num_chain_iterations, Thin_val = thin_val_par) my_samples = Weibull_Quad_result$theta_sample my_log_prior = Weibull_Quad_result$log_prior_sample my_log_like = Weibull_Quad_result$log_like_sample #Burn-in 25% of each chain my_samples_burn_in = my_samples[,,-c(1:floor((num_chain_iterations/thin_val_par)*0.25))] my_log_prior_burn_in = my_log_prior[,-c(1:floor((num_chain_iterations/thin_val_par)*0.25))] my_log_like_burn_in = my_log_like[,-c(1:floor((num_chain_iterations/thin_val_par)*0.25))] #Calculate potential scale reduction factors diagnostic_result = psrfdiagnostic(my_samples_burn_in, 0.05) diagnostic_result$p_s_r_f_vec #log unnormalized posterior samples log_un_post_vec = as.vector(my_log_prior_burn_in + my_log_like_burn_in) #a_shape posterior samples k1 = as.vector(my_samples_burn_in[1,,]) #sigma_scale posterior samples k2 = as.vector(my_samples_burn_in[2,,]) #Calculate posterior median, 95% credible intervals, and maximum posterior for #the parameters median(k1) hpdparameter(k1, 0.05) median(k2) hpdparameter(k2, 0.05) k1[which.max(log_un_post_vec)] k2[which.max(log_un_post_vec)] #These plots display the silhouette of the unnormalized posterior surface from #the chosen parameter's perspective plot(k1, exp(log_un_post_vec), xlab="a_shape", ylab="unnormalized posterior density") plot(k2, exp(log_un_post_vec), xlab="sigma_scale", ylab="unnormalized posterior density")
#Ensemble Quadratic MCMC algorithm example for fitting a Weibull #distribution #Assume the true parameters are a_shape = 20 sigma_scale = 900 #Random sample from the Weibull distribution with a = 20 and sigma = 900, #Y~WEI(a = 20, sigma = 900) num_ran_samples = 50 data_weibull = matrix(NA, nrow = 1, ncol = num_ran_samples) #Set the seed for this example set.seed(10) data_weibull = rweibull(num_ran_samples, shape = a_shape, scale = sigma_scale) #We want to estimate a_shape and sigma_scale #Log prior function for a_shape and sigma_scale #(assumed priors a_shape ~ U(1e-2, 1e2) and sigma_scale ~ U(1, 1e4)) logp <- function(param) { a_shape_use = param[1] sigma_scale_use = param[2] logp_val = dunif(a_shape_use, min = 1e-2, max = 1e2, log = TRUE) + dunif(sigma_scale_use, min = 1, max = 1e4, log = TRUE) return(logp_val) } #Log likelihood function for a_shape and sigma_scale logl <- function(param) { a_shape_use = param[1] sigma_scale_use = param[2] logl_val = sum(dweibull(data_weibull, shape = a_shape_use, scale = sigma_scale_use, log = TRUE)) return(logl_val) } logfuns = list(logp = logp, logl = logl) num_par = 2 #It is recommended to use at least twice as many chains as the number of #parameters to be estimated. num_chains = 2*num_par #Generate initial guesses for the MCMC chains theta0 = matrix(0, nrow = num_par, ncol = num_chains) temp_val = 0 j = 0 while(j < num_chains) { initial = c(runif(1, 1e-2, 1e2), runif(1, 1, 1e4)) temp_val = logl(initial) + logp(initial) while(is.na(temp_val) || is.infinite(temp_val)) { initial = c(runif(1, 1e-2, 1e2), runif(1, 1, 1e4)) temp_val = logl(initial) + logp(initial) } j = j + 1 message(paste('j:', j)) theta0[1,j] = initial[1] theta0[2,j] = initial[2] } num_chain_iterations = 1e4 thin_val_par = 10 #The total number of returned samples is given by #(num_chain_iterations/thin_val_par)*num_chains = 4e3 #Ensemble Quadratic MCMC algorithm Weibull_Quad_result = ensemblealg(theta0, logfuns, T_iter = num_chain_iterations, Thin_val = thin_val_par) my_samples = Weibull_Quad_result$theta_sample my_log_prior = Weibull_Quad_result$log_prior_sample my_log_like = Weibull_Quad_result$log_like_sample #Burn-in 25% of each chain my_samples_burn_in = my_samples[,,-c(1:floor((num_chain_iterations/thin_val_par)*0.25))] my_log_prior_burn_in = my_log_prior[,-c(1:floor((num_chain_iterations/thin_val_par)*0.25))] my_log_like_burn_in = my_log_like[,-c(1:floor((num_chain_iterations/thin_val_par)*0.25))] #Calculate potential scale reduction factors diagnostic_result = psrfdiagnostic(my_samples_burn_in, 0.05) diagnostic_result$p_s_r_f_vec #log unnormalized posterior samples log_un_post_vec = as.vector(my_log_prior_burn_in + my_log_like_burn_in) #a_shape posterior samples k1 = as.vector(my_samples_burn_in[1,,]) #sigma_scale posterior samples k2 = as.vector(my_samples_burn_in[2,,]) #Calculate posterior median, 95% credible intervals, and maximum posterior for #the parameters median(k1) hpdparameter(k1, 0.05) median(k2) hpdparameter(k2, 0.05) k1[which.max(log_un_post_vec)] k2[which.max(log_un_post_vec)] #These plots display the silhouette of the unnormalized posterior surface from #the chosen parameter's perspective plot(k1, exp(log_un_post_vec), xlab="a_shape", ylab="unnormalized posterior density") plot(k2, exp(log_un_post_vec), xlab="sigma_scale", ylab="unnormalized posterior density")
This function returns the upper and lower bound of the Highest Posterior Density (HPD) for a parameter based on the Chen-Shao Highest Posterior Density (HPD) Estimation Algorithm found in the book by Chen, Shao, and Ibrahim (2000). (The smallest 95% credible interval will be given by the HPD using alpha = 0.05)
hpdparameter(parameter_MCMC, alpha = 0.05)
hpdparameter(parameter_MCMC, alpha = 0.05)
parameter_MCMC |
a vector of the parameter samples for a single estimated parameter |
alpha |
100(1 - alpha)% credible interval with the default value as alpha = 0.05 |
A vector is returned that contains the lower and upper bound of the Highest Posterior Density (HPD) for a parameter (this will be the smallest 95% credible interval using alpha = 0.05)
Chen M, Shao Q, Ibrahim JG (2000) Monte Carlo Methods in Bayesian Computation. New York-New York: Springer-Verlag.
x_parameter = rnorm(75, mean = 0, sd = 1) hpdparameter(x_parameter, 0.05)
x_parameter = rnorm(75, mean = 0, sd = 1) hpdparameter(x_parameter, 0.05)
This function computes the potential scale reduction factor for
each parameter to formally test the convergence of the MCMC sampling to the
estimated posterior distribution which was developed by Gelman
and Brooks (1998). This potential scale reduction factor is based on empirical
interval lengths with the following formula:
, where
is the
distance between the upper and lower values of the
interval for the pooled samples,
is the distance between the upper
and lower values of the
interval for the
chain, and
is the total number of chains used.
When the potential scale reduction factor is close to 1 for all the estimated
parameters, this indicates that the MCMC sampling converged to the estimated
posterior distribution for each parameter.
psrfdiagnostic(my_samples_burn_in, alpha = 0.05)
psrfdiagnostic(my_samples_burn_in, alpha = 0.05)
my_samples_burn_in |
This parameter is a matrix of parameter samples returned from the Ensemble MCMC algorithm 'ensemblealg', the matrix dimensions are given by (Number of parameters) x (Number of chains) x (Number of iterations - Number of burn in iterations). It is recommended to burn-in the parameter samples from the starting iterations before running the 'psrfdiagnostic' to assess the convergence. |
alpha |
the alpha value here corresponds to the 100(1 - alpha)% credible intervals to be estimated, with the default value as alpha = 0.05 |
A list object is returned that contains two vectors and one matrix:
p_s_r_f_vec, L_vec, and d_matrix.
p_s_r_f_vec: this is the vector of potential scale reduction factors in order
of the parameters
L_vec: this is the vector of distances between the upper and lower values of
the 95% interval for the pooled samples and these distances are in order of
the parameters
d_matrix: this is the matrix of distances between the upper and lower values of the 95% interval for the samples in each of the chains, the matrix dimensions are given by (Number of parameters) x (Number of chains)
Brooks SP and Gelman A (1998) General methods for monitoring convergence of iterative simulations. J Comp Graph Stat 7(4):434-455.
#Take 100 random samples from a multivariate normal distribution #with mean c(1, 2) and covariance matrix matrix(c(1, 0.75, 0.75, 1), nrow = 2, ncol = 2) #for each of four chains. my_samples_example = array(0, dim=c(2, 4, 100)) for(j in 1:4) { for(i in 1:100) { my_samples_example[,j,i] = solve(matrix(c(1, 0.75, 0.75, 1), nrow = 2, ncol = 2))%*% rnorm(2, mean = 0, sd = 1) + matrix(c(1, 2), nrow = 2, ncol = 1, byrow = TRUE) } } #The potential scale reduction factors for each parameter are close to 1 psrfdiagnostic(my_samples_example)$p_s_r_f_vec
#Take 100 random samples from a multivariate normal distribution #with mean c(1, 2) and covariance matrix matrix(c(1, 0.75, 0.75, 1), nrow = 2, ncol = 2) #for each of four chains. my_samples_example = array(0, dim=c(2, 4, 100)) for(j in 1:4) { for(i in 1:100) { my_samples_example[,j,i] = solve(matrix(c(1, 0.75, 0.75, 1), nrow = 2, ncol = 2))%*% rnorm(2, mean = 0, sd = 1) + matrix(c(1, 2), nrow = 2, ncol = 1, byrow = TRUE) } } #The potential scale reduction factors for each parameter are close to 1 psrfdiagnostic(my_samples_example)$p_s_r_f_vec