Shrinkage Priors for Bayesian Vectorautoregressions featuring Stochastic Volatility Using the R Package bayesianVARs

In multivariate time series analysis, vectorautoregressions (VARs) are widely applied in fields such as brain connectivity modeling and the modeling of macroeconomic and financial time series . Especially in macroeconomic applications, VARs have probably become the workhorse model for forecasting. The VAR model of order p, VAR(p), can be formulated as follows: where yt is the M-dimensional vector of interest, Al, for l = 1, , p, an unknown M × M matrix of regression coefficients, εt an M-dimensional vector of errors, and Σt the corresponding M × M variance-covariance matrix. For ease of notation, let Φ ≔ (A1, …, Ap) denote the (K = pM) × M matrix containing all VAR coefficients and let ϕ ≔ vec(Φ) denote the vectorization thereof with length n = pM2.

To facilitate efficient and reliable estimation when M gets large, we consider two different decompositions of the variance-covariance matrix Σt explained in the following paragraphs.

Assuming that the errors feature a factor stochastic volatility structure, following , we decompose the variance-covariance matrix into Both Qt = diag(eh1t, …, ehMt) and Vt = diag(ehM + 1, t, …, ehM + r, t) are diagonal matrices of dimension M and r, respectively, and Λ is the M × r matrix of factor loadings. This is obviously equivalent to introducing r conditionally independent latent factors f ∼ 𝒩r(0, Vt) and rewriting the error term in as where ηt ∼ 𝒩M(0, Qt). The matrix Qt contains the idiosyncratic, series specific, variances. The matrix Vt contains the factor specific variances governing the contemporaneous dependencies. The logarithms of the elements in Qt and Vt follow a priori independent autoregressive processes of order one (AR(1)). More specifically, the evolution of the idiosyncratic log-variance hit ∼ 𝒩(μi + ϕi(hi, t − 1 − μi), σi2), for i = 1, …, M, is described by the parameters μi, the level, ϕi, the persistence and σi2, the variance. The factor-specific log-variance hjt ∼ 𝒩(ϕjhj, t − 1, σj2), for j = M + 1, …, M + r, is assumed to have mean zero to identify the scaling of the elements of Λ. Without imposing restrictions on the factor loadings, the VAR with factor stochastic volatility is invariant to the way the variables are ordered.

Assuming that the errors feature a Cholesky stochastic volatility structure, following , we decompose the variance-covariance matrix into where U is an M × M upper triangular matrix with ones on the diagonal. The logarithms of the elements of the M-dimensional diagonal matrix Dt = diag(eh1t, …, ehMt) are assumed to follow a priori independent AR(1) processes, i.e. hit ∼ 𝒩(μi + ϕi(hi, t − 1 − μi), σi2), for i = 1, …, M. Since Ut is a triangular matrix, the VAR with Cholesky stochastic volatility depends on the way the variables are ordered.

While flexible, VARs are known to be overparameterized: In macroeconomic applications the number of available observations T can be relatively small compared to the number of VAR coefficients n, since the data is usually reported on a quarterly or yearly basis. Bayesian shrinkage priors can be used to alleviate this issue. In the following paragraphs, we discuss several prior options for the VAR coefficients before briefly discussing prior choices for the variance-covariance matrix. In general, we assume that the joint prior distribution has a product form p(ϕ, Σt) = p(ϕ)p(Σt), i.e. we assume that a priori ϕ and Σt are independent. The generic prior for the VAR coefficients is conditionally normal $\bm \phi | \underline{\bm V} \sim \mathcal{N}_n(\bm 0, \underline{\bm V})$, where $\underline{\bm V}=\mathrm{diag}(v_1,\dots,v_n)$ is an n-dimensional diagonal matrix. The priors distinguish themselves in their treatment of $\underline{\bm V}$.

The Minnesota prior proposed in is mainly characterized by two assumptions: First, the own past of a given variable is more important in predicting its current value than the past of other variables. Second, the most recent past is assumed to be more important in predicting current values than the more distant past. Hence, $\underline{\bm{V}}$ is structured in a way, such that the sub-diagonal elements of Φ (the own-lag coefficients) are shrunken less than the off-diagonal elements (the cross-lag coefficients). And, coefficients associated with more recent lags are shrunken less than the ones associated with more distant lags. Denote $\mathbf{\underline{V}}_i$ the block of $\mathbf{\underline{V}}$ that corresponds to the K coefficients in the ith equation, and let $\mathbf{\underline{V}}_{i,jj}$ be its diagonal elements. The diagonal elements are set to where σ̂i2 is the OLS variance of a univariate AR(6) model of the ith variable. The term l2 in the denominator automatically imposes more shrinkage on the coefficients towards their prior mean as lag length increases. The term $\frac{\hat{\sigma}^2_{i}}{\hat{\sigma}^2_{j}}$ adjusts not only for different scales in the data, it is also intended to account for different scales of the responses of one economic variable to another. To shrink own-lag coefficients less than cross-lag coefficients, one could set λ1 > λ2. The hierarchical Minnesota prior, however, treats both shrinkage parameters as unknown. Following , we place independent gamma priors on λ1 and λ2,

Global local shrinkage priors in the fashion of are also used in the VAR literature . In order to combine the merits of tailor-made priors, such as the Minnesota prior, with the flexibility of off-the-shelf global local shrinkage priors, propose the class of semi-global local priors. Other than global local priors, which shrink globally, semi-global local priors shrink semi-globally, meaning that semi-global shrinkage is imposed on k pre-specified subgroups of the parameter space. Let 𝒜j, for j = 1, , k, denote the generic index set that labels the coefficients of the jth group in ϕ (e.g., the first group could be the own-lag coefficients associated with the first lag, the second group could be the cross-lag coefficients associated with the first lag, etc.). Then, a semi-global local prior with k groups has the following hierarchical representation: where K(δ) denotes a symmetric unimodal density with variance δ, ζj represents the semi-global shrinkage, and ϑi the local shrinkage. The only additional input required is the partitioning of ϕ into k subgroups. Several options for grouping the coefficients are ready-made in , though any custom grouping could be specified as well. The grouping indicates that the covariates of each equation form M separate groups (column-wise shrinkage w.r.t. Φ). The partitioning implies that the K covariates across all equations form separate groups (row-wise shrinkage w.r.t. Φ). The (olcl-lagwise) partitioning mimics some features of the Minnesota prior: In each lag, the diagonal elements (the own-lags) and the off-diagonal elements (the cross-lags) constitute separate groups, which makes 2p groups in total. The following list of hierarchical shrinkage priors, which can be cast in the form of semi-global (local) priors, are implemented in (in alphabetical order): Dirichlet-Laplace (DL) prior , Horseshoe prior , normal-gamma (NG) prior , R2-induced Dirichlet decomposition (R2D2) prior and stochastic-search-variable-selection (SSVS) prior . Fore more detailed characteristics and comparisons of those priors we refer to .

In the case that the variance-covariance is modeled via the factor decomposition, the priors from and are used. In the case that the errors are assumed to feature the Cholesky stochastic volatility structure, implements the DL prior, the HS prior, the NG prior, the R2D2 prior, and the SSVS prior for the free off-diagonal elements in U. Concerning the latent variables and their associated parameters in Dt, the priors from are used.

It should be noted that also implements homoskedastic VARs where Σt = Σ for all t. In case of the VAR with factor structure on the errors it holds that Vt = V = Ir and Qt = Q = diag(q1, …, qM) for all t. A priori, the ith diagonal element qi ∼ ℐ𝒢(af, bf) is assumed to follow an inverse gamma distribution for i = 1, …, M, independently. In case of the VAR with Cholesky structure on the errors, it holds that Dt = D = diag(d1, …, dM) for all t. The prior distribution of the ith diagonal element is inverse gamma, i.e. di ∼ ℐ𝒢(ac, bc) for i = 1, …, M, independently.

In a nutshell, implements a Markov chain Monte Carlo (MCMC) algorithm which alternately samples from the full conditional posterior distribution of the VAR coefficients p(ϕ|•) and from the full conditional posterior distribution of the paths of the variance-covariance matrix p(Σt|•) for t = 1, , T, with indicating that we condition on the remaining parameters and latent quantities of the model. To render computation of the necessary steps required for sampling from p(ϕ|•) feasible, implements the equation-per-equation algorithm proposed in for the VAR with factor stochastic volatility and the correct triangular algorithm from for the VAR with Cholesky stochastic volatility. The hyperparameters of the hierarchical shrinkage priors are sampled from the respective full conditional posterior distributions outlined in . To sample from p(Σt|•) for t = 1, , T, for the VAR with factor stochastic volatility, accesses the package . For the VAR with Cholesky stochastic volatility, the latent variables and associated parameters in Dt are sampled using the package . The free off-diagonal elements in U are sampled equation-per-equation as proposed in . Last but not least, all computationally intensive tasks are written in C++ and interfaced with R via and for increased computational efficiency.

We demonstrate the main functionality of using the dataset included in the package. The dataset – obtained from FRED-QD, a quarterly database for macroeconomic research – contains the time-series of 21 variables transformed to growth rates through taking log-differences (except for interest rates).

library(bayesianVARs)
variables <- c("GDPC1", "GDPCTPI",  "FEDFUNDS", "EXUSUKx", "S&P 500")
train_data <- 100 * usmacro_growth[1:230, variables]
test_data <- 100 * usmacro_growth[231:234, variables]

The workhorse function of for conducting MCMC inference is the function . Though it offers a low barrier to entry for users (in case only data is supplied without any further sampler and or prior configurations, default values are used), we encourage the user to specify the model to be estimated in more detail using the helper functions (prior configuration concerning the VAR coefficients) and (prior configuration concerning the variance-covariance of the VAR). In our demonstration, we specify a VAR with p = 2 lags with factor stochastic volatility and r = 4 factors and a semi-global local HS prior with olcl-lagwise partitioning for the VAR coefficients. It is possible to impose standard global local priors by specifying ’s argument . An arbitrary grouping for semi-global local priors can be achieved by supplying an indicator matrix to .

prior_phi <- specify_prior_phi(data = train_data, 
                               lags = 2L, 
                               prior = "HS", 
                               global_grouping = "olcl-lagwise")
prior_sigma <- specify_prior_sigma(data = train_data, 
                                   type = "factor", 
                                   factor_factors = 4L)
                                   
mod <- bvar(train_data, lags = 2L, draws = 10000, burnin = 2000, 
            prior_phi = prior_phi, prior_sigma = prior_sigma, 
            sv_keep = "all")

The plot methods shows the model fit via 90% in-sample prediction intervals by default, see Figure.

plot(mod, quantiles = c(0.05,0.5,0.95), dates = rownames(mod$Yraw))
Visualization of estimated in-sample prediction intervals. The red solid line depicts the median, the red shaded region the 90% credible interval and the black dotted line the observed data used for estimation.

Visualization of estimated in-sample prediction intervals. The red solid line depicts the median, the red shaded region the 90% credible interval and the black dotted line the observed data used for estimation.

The object output by contains the posterior draws. The extractors and come in handy to access the posterior draws of Φ and Σt, respectively. The function visualizes posterior summaries, such as the posterior median or posterior interquartile-range, as heatmaps, see Figure.

phi <- coef(mod)
posterior_heatmap(phi,median)
posterior_heatmap(phi, IQR)
Posterior summary of the VAR coefficients. Left: Heatmap of the posterior median. Right: Heatmap of the posterior interquartile range.Posterior summary of the VAR coefficients. Left: Heatmap of the posterior median. Right: Heatmap of the posterior interquartile range.

Posterior summary of the VAR coefficients. Left: Heatmap of the posterior median. Right: Heatmap of the posterior interquartile range.

The method simulates from the posterior predictive distribution. Log-predictive likelihoods will be computed if the ex-post observed data is supplied.

pred <- predict(mod, ahead = 1:4, LPL = TRUE, Y_obs = test_data)

The method for draws of the posterior predictive distribution defaults to displaying fan-charts by joining line charts for the observed data of the estimation sample with credible intervals of the posterior predictive distribution, see Figure.

plot(pred, first_obs = 216, 
     dates = c(rownames(train_data[-c(1:215),]), rownames(test_data)))
Fan-charts visualizing the last 15 out of 230 observations used for estimation through black solid lines, the median of the $h$-step ahead predictive distribution through red solid lines and the 50%/90% credible intervals of the $h$-step ahead predictive distribution through red shaded regions for $h=1,\dots,4$.

Fan-charts visualizing the last 15 out of 230 observations used for estimation through black solid lines, the median of the h-step ahead predictive distribution through red solid lines and the 50%/90% credible intervals of the h-step ahead predictive distribution through red shaded regions for h = 1, …, 4.

The calculated log-predictive likelihoods could be used to comparing forecasting performances of different models.

pred$LPL
#>       t+1       t+2       t+3       t+4 
#> -4.418369 -4.915120 -5.657900 -7.090953

References