bssm: Bayesian Inference of Non-linear and Non-Gaussian State Space Models in R

This is a short vignette illustrating the bssm package. For a more detailed exposition, please see the corresponding R Journal paper:

Jouni Helske and Matti Vihola (2021). “bssm: Bayesian Inference of Non-linear and Non-Gaussian State Space Models in R”. The R Journal 13 (2), 578–589.

Introduction

State space models (SSMs) are latent variable models which are commonly applied in analysing time series data due to their flexible and general framework (cf. Durbin and Koopman 2012). For R (R Core Team 2020), there is a large number of packages available for state space modelling, especially for two special cases. The first special case is the linear-Gaussian SSM (LGSSM), where both the observation and state densities are Gaussian with linear relationships with the states. The second special case is the SSM with a discrete state space, sometimes called a hidden Markov model (HMM). What is special about these two classes of models is that their marginal likelihood function and conditional state distributions (conditioned on the observations) are analytically tractable, making inference relatively straightforward. See for example S. Helske and Helske (2019) for a review of some of the R packages dealing with these types of models. The R package bssm is designed for Bayesian inference of general state space models with non-Gaussian and/or non-linear observation and state equations. The package aims to provide easy-to-use and efficient functions for fully Bayesian inference of common time series models such as the basic structural time series model (BSM) (Harvey 1989) with exogenous covariates, simple stochastic volatility models, and discretized diffusion models, making predictions and other inference straightforward and efficient in a Bayesian setting.

The motivation behind the bssm package is the approach of Vihola, Helske, and Franks (2020), which suggests a new computationally efficient, parallelisable approach for Bayesian inference of state space models. The core idea is to use fast approximate Markov chain Monte Carlo (MCMC) targeting the approximate marginal posterior of the hyperparameters (i.e., the unknown variables excluding the latent state variables), followed by an importance-sampling-type weighting phase which provides asymptotically exact samples from the joint posterior of the hyperparameters and the hidden states. In addition to this two-stage procedure, standard pseudo-marginal MCMC and so-called delayed acceptance pseudo-marginal MCMC are also supported. For more details, see J. Helske and Vihola (2021). There are also separate vignettes for non-linear models and for discretized diffusion models.

State space models with linear-Gaussian dynamics

Denote a sequence of observations $(y_1, \ldots, y_T)$ as $y_{1:T}$, and a sequence of latent state variables $(\alpha_1, \ldots, \alpha_T)$ as $\alpha_{1:T}$. A general state space model consists of two parts: observation-level densities $g_t(y_t | \alpha_t)$ and latent state transition densities $\mu_t(\alpha_{t+1} | \alpha_t)$. We first focus on the case where the state transitions are linear-Gaussian: $$\alpha_{t+1} = c_t + T_t \alpha_t + R_t \eta_t,$$ where $c_t$ is a known input vector (often omitted), and $T_t$ and $R_t$ are system matrices which can depend on unknown parameters. Also, $\eta_t \sim N(0, I_k)$ and $\alpha_1 \sim N(a_1, P_1)$ independently of each other. For the observation-level density $g_t$, the bssm package currently supports the basic stochastic volatility model and general exponential family state space models.

For exponential family models, the observation equation has the general form

$$g_t(y_t | d_t + Z_t \alpha_t, x_t \beta, \phi, u_t),$$ where $d_t$ is again a known input, $x_t$ contains the exogenous covariate values at time $t$, with $\beta$ corresponding to the regression coefficients. The parameter $\phi$ and the known vector $u_t$ are distribution-specific and can be omitted in some cases. Currently, the following observation-level distributions are supported (a small construction sketch follows the list):

  • Gaussian distribution: $g_t(y_t | Z_t \alpha_t, x_t \beta)$ is given by $y_t = x_t \beta + Z_t \alpha_t + H_t \epsilon_t$ with $\epsilon_t \sim N(0, 1)$.

  • Poisson distribution: $g_t(y_t | Z_t \alpha_t, x_t \beta, u_t) = \textrm{Poisson}(u_t \exp(x_t \beta + Z_t \alpha_t))$, where $u_t$ is the known exposure at time $t$.

  • Binomial distribution: $g_t(y_t | Z_t \alpha_t, x_t \beta, u_t) = \textrm{binomial}(u_t, \exp(x_t \beta + Z_t \alpha_t) / (1 + \exp(x_t \beta + Z_t \alpha_t)))$, where $u_t$ is the number of trials and $\exp(x_t \beta + Z_t \alpha_t) / (1 + \exp(x_t \beta + Z_t \alpha_t))$ is the probability of success.

  • Negative binomial distribution: $g_t(y_t | Z_t \alpha_t, x_t \beta, \phi, u_t) = \textrm{negative binomial}(\exp(x_t \beta + Z_t \alpha_t), \phi, u_t)$, where $u_t \exp(x_t \beta + Z_t \alpha_t)$ is the expected value and $\phi$ is the dispersion parameter ($u_t$ is again an exposure term).

  • Gamma distribution: $g_t(y_t | d_t + Z_t \alpha_t, \phi, u_t) = \textrm{Gamma}(\exp(d_t + Z_t \alpha_t), \phi, u_t)$, where $u_t \exp(d_t + Z_t \alpha_t)$ is the expected value, $\phi$ is the shape parameter, and $u_t$ is a known offset term.
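
To make the role of the exposure term concrete, here is a minimal sketch (not part of the original example code; the simulated data and prior values are arbitrary) of building a Poisson model with a known exposure $u_t$ via the bsm_ng function, which is discussed in more detail later in this vignette:

library("bssm")
# Simulated counts with a known, time-varying exposure term u_t
set.seed(1)
exposure <- rep(c(50, 100), each = 25)
y <- rpois(50, lambda = exposure * exp(cumsum(rnorm(50, sd = 0.05))))
# Poisson local-level model; the exposure enters via the argument u
poisson_model <- bsm_ng(y, sd_level = halfnormal(0.05, 1),
  distribution = "poisson", u = exposure)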

For the stochastic volatility model, two parameterizations are available. In general we have $$y_t = \sigma \exp(\alpha_t / 2) \epsilon_t, \quad \epsilon_t \sim N(0, 1),$$ and $$\alpha_{t+1} = \mu + \rho (\alpha_t - \mu) + \sigma_\eta \eta_t,$$ with $\alpha_1 \sim N(\mu, \sigma_\eta^2 / (1 - \rho^2))$. For identifiability purposes we must either choose $\sigma = 1$ or $\mu = 0$. Although analytically identical, the parameterization with $\mu$ is often preferable in terms of computational efficiency.
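
For illustration, a sketch of the $\mu$-parameterization (so $\sigma = 1$) using the svm constructor introduced later and the pound/dollar exchange rate data shipped with the package could look as follows; the prior choices here are illustrative assumptions only:

library("bssm")
data("exchange", package = "bssm")
# mu-parameterization: priors for rho, sigma_eta (sd_ar), and mu
sv_model <- svm(exchange, rho = uniform(0.98, -0.999, 0.999),
  sd_ar = halfnormal(0.15, 5), mu = normal(0, 0, 2))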

Typically some of the model components such as $\beta$, $T_t$, or $R_t$ depend on an unknown parameter vector $\theta$, so $g_t(y_t | \alpha_t)$ and $\mu_t(\alpha_{t+1} | \alpha_t)$ depend implicitly on $\theta$. Our goal is to perform Bayesian inference on the joint posterior of $\alpha_{1:T}$ and $\theta$.

For multivariate models, these distributions can be combined arbitrarily, except for the stochastic volatility model, which is currently handled separately. Also, for a fully Gaussian model, the observation-level errors can be correlated, i.e. $\textrm{Cov}(\epsilon_t) = H_t H_t'$.

Non-linear models

The general non-linear Gaussian model in bssm has the following form:

$$ y_t = Z(t, \alpha_t, \theta) + H(t, \alpha_t, \theta)\epsilon_t,\\ \alpha_{t+1} = T(t, \alpha_t, \theta) + R(t, \alpha_t, \theta)\eta_t,\\ \alpha_1 \sim N(a_1(\theta), P_1(\theta)), $$ with $t = 1, \ldots, n$, $\epsilon_t \sim N(0, I_p)$, and $\eta_t \sim N(0, I_k)$. Here the vector $\theta$ contains the unknown model parameters. Due to their generality and the need for repeated calls within MCMC, the functions $Z(\cdot)$, $H(\cdot)$, $T(\cdot)$, $R(\cdot)$, $a_1(\cdot)$, $P_1(\cdot)$, as well as the functions defining the Jacobians of $Z(\cdot)$ and $T(\cdot)$ needed by the extended Kalman filter and the prior distribution for $\theta$, must be defined by the user as external pointers to user-defined C++ functions. All of these functions can also depend on some known parameters, defined via the known_params (vector) and known_tv_params (matrix with n columns) arguments of the ssm_nlg function. See the growth model vignette for a template for these functions.

Time-discretised diffusion models

The bssm package also supports models where the state equation is defined as a continuous-time diffusion model of the form $$\textrm{d} \alpha_t = \mu(\alpha_t, \theta) \textrm{d} t + \sigma(\alpha_t, \theta) \textrm{d} B_t, \quad t \geq 0,$$ where $B_t$ is a Brownian motion and $\mu$ and $\sigma$ are scalar-valued functions, with the univariate observation density $g(y_k | \alpha_k)$ defined at integer times $k = 1, \ldots, n$. As the transition densities are generally unavailable for non-linear diffusions, we use the Milstein time-discretisation scheme for approximate simulation with a bootstrap particle filter. A finer discretisation mesh gives less bias than a coarser one, at the cost of increased computational complexity. These models are also defined via C++ snippets; see the SDE vignette for details.

Markov chain Monte Carlo

Given the prior $p(\theta)$, the joint posterior of $\theta$ and $\alpha_{1:T}$ is given as

$$p(\alpha_{1:T}, \theta | y_{1:T}) \propto p(\theta) p(\alpha_{1:T}, y_{1:T} | \theta) = p(\theta) p(y_{1:T} | \theta) p(\alpha_{1:T} | y_{1:T}, \theta),$$

where $p(y_{1:T} | \theta)$ is the marginal likelihood and $p(\alpha_{1:T} | y_{1:T}, \theta)$ is often referred to as the smoothing distribution. However, instead of targeting this joint posterior, it is typically more efficient to target the marginal posterior $p(\theta | y_{1:T})$, and then, given a sample $\{\theta^i\}_{i=1}^n$ from this marginal posterior, simulate states $\alpha_{1:T}^i$ from the smoothing distribution $p(\alpha_{1:T} | y_{1:T}, \theta^i)$ for $i = 1, \ldots, n$.

For Gaussian models, given the parameters $\theta$, the marginal likelihood $p(y_{1:T} | \theta)$ can be computed using the well-known Kalman filter recursions, and there are several algorithms for simulating the states $\alpha_{1:T}$ from the smoothing distribution $p(\alpha_{1:T} | y_{1:T})$ (see for example Durbin and Koopman (2012)). Therefore we can straightforwardly apply standard MCMC algorithms. In bssm, we use an adaptive random walk Metropolis algorithm based on RAM (Vihola 2012), where we fix the target acceptance rate beforehand. The RAM algorithm is provided by the ramcmc package (J. Helske 2018).

For non-linear/non-Gaussian models, the marginal likelihood $p(y_{1:T} | \theta)$ is typically not available in closed form. Thus we need to resort to simulation methods, which leads to the pseudo-marginal MCMC algorithm (Andrieu and Roberts 2009). bssm also supports more efficient inference algorithms based on (intermediate) approximations; see J. Helske and Vihola (2021) and Vihola, Helske, and Franks (2020).

Package functionality

The main functions of bssm are written in C++, with the help of the Rcpp (Eddelbuettel and François 2011) and RcppArmadillo (Eddelbuettel and Sanderson 2014) packages. On the R side, the package uses S3 methods in order to provide a relatively unified workflow independent of the type of model one is working with. The model building functions such as bsm_ng and svm are used to construct the actual state space model objects, which can then be passed to other methods, such as logLik and run_mcmc, which compute the log-likelihood value and run the MCMC algorithm, respectively. We will now briefly describe the main functions and methods of bssm; for more detailed descriptions of the function arguments and return values, see the corresponding documentation in R.
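
As a quick sketch of this workflow (with simulated data and arbitrary priors; a real analysis would need longer chains and proper prior choices):

library("bssm")
# Simulate a noisy local-level series, build a model, and inspect it
set.seed(42)
y <- cumsum(rnorm(50)) + rnorm(50)
model <- bsm_lg(y, sd_y = halfnormal(1, 2), sd_level = halfnormal(1, 2))
logLik(model)  # Kalman filter log-likelihood at the initial parameter values
fit <- run_mcmc(model, iter = 2000)  # short adaptive MCMC run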

Model building functions

For linear-Gaussian models, bssm offers the function bsm_lg for basic univariate structural time series models (BSM), ar1_lg for a univariate, possibly noisy AR(1) process, as well as the general ssm_ulg and ssm_mlg for arbitrary linear-Gaussian models. As an example, consider a Gaussian local linear trend model of the form

$$ \begin{aligned} y_t &= \mu_t + \epsilon_t,\\ \mu_{t+1} &= \mu_t + \nu_t + \eta_t,\\ \nu_{t+1} &= \nu_t + \xi_t, \end{aligned} $$ with zero-mean Gaussian noise terms $\epsilon_t$, $\eta_t$, $\xi_t$ with unknown standard deviations. This model can be built with the bsm_lg function as

library("bssm")
## 
## Attaching package: 'bssm'
## The following object is masked from 'package:base':
## 
##     gamma
data("nhtemp", package = "datasets")
prior <- halfnormal(1, 10)
bsm_model <- bsm_lg(y = nhtemp, sd_y = prior, sd_level = prior,
  sd_slope = prior)

Here we use the helper function halfnormal, which defines a half-Normal prior distribution for the standard deviation parameters; the first argument defines the initial value of the parameter, and the second the scale parameter of the half-Normal distribution. Other prior options are normal and uniform.
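
All three helpers follow the same convention of giving the initial value first, followed by the distribution-specific parameters (the argument orders below reflect our reading of the package documentation):

prior_sd  <- halfnormal(1, 10)   # initial value 1, scale 10
prior_mu  <- normal(0, 0, 5)     # initial value 0, mean 0, sd 5
prior_rho <- uniform(0.5, -1, 1) # initial value 0.5, lower -1, upper 1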

For non-Gaussian models, the function bsm_ng can be used for constructing a BSM model where the observations are assumed to be distributed according to a Poisson, binomial, negative binomial, or Gamma distribution. The syntax is nearly identical to that of bsm_lg, but we now also define the distribution via the argument distribution, and depending on the model, we can also define the parameters u and phi. For Poisson and negative binomial models, the known parameter u corresponds to the offset term, whereas in the case of the binomial model, u defines the number of trials. For the negative binomial model, the argument phi defines the dispersion term, which can be given as a fixed value or as a prior function (see the sketch below). For the same observational densities, a model where the state equation follows a first-order autoregressive process can be defined using the function ar1_ng. Finally, a stochastic volatility model can be defined using the function svm, and an arbitrary linear-Gaussian state model with Poisson, binomial, or negative binomial distributed observations can be defined with ssm_ung and ssm_mng for univariate and multivariate models, respectively.
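
For example, a sketch of a negative binomial model where phi is estimated instead of fixed (simulated data; all values illustrative):

library("bssm")
set.seed(2)
y <- rnbinom(100, size = 5, mu = exp(cumsum(rnorm(100, sd = 0.1))))
# Dispersion parameter phi given as a prior instead of a fixed value
nb_model <- bsm_ng(y, sd_level = halfnormal(0.1, 1),
  distribution = "negative binomial", phi = halfnormal(1, 5))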

For models where the state equation is no longer linear-Gaussian, we can use our pointer-based C++ interface with the function ssm_nlg. Diffusion models can be defined with the function ssm_sde. For details regarding these types of models, see the corresponding vignettes growth_model and sde_model, respectively.

Filtering and smoothing

Filtering refers to estimating the conditional densities of the hidden states at time $t$, given the observations up to that point. For linear-Gaussian models, these densities can be efficiently computed using the Kalman filter recursions, and bssm provides the method kfilter for this task. For models defined with the ssm_mng, bsm_ng, ar1_ng, and svm functions, kfilter first constructs an approximating Gaussian model, for which the Kalman filter is then used. For details of this approximation, see Durbin and Koopman (1997) and Vihola, Helske, and Franks (2020). For non-linear models defined by ssm_nlg, it is possible to perform filtering using the extended Kalman filter (EKF) with the function ekf, or the unscented Kalman filter with the function ukf. It is also possible to use the iterated EKF (IEKF) by changing the argument iekf_iter of the ekf function. Compared to the EKF, in the IEKF the observation equation is linearized iteratively within each time step.
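
For instance, applying kfilter to the Gaussian model built earlier might look as follows (the returned component names at, att, Pt, and Ptt are our reading of the documentation and worth verifying):

out <- kfilter(bsm_model)
str(out, max.level = 1)  # predicted (at, Pt) and filtered (att, Ptt) moments
# Plot the observations against the filtered level estimates
ts.plot(cbind(bsm_model$y, out$att[, "level"]), col = 1:2, lty = 1:2)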

While the Kalman filter solves the filtering problem exactly in the case of linear-Gaussian models, the EKF, the UKF, and the filtering based on the approximating Gaussian models produce only approximate, possibly biased filtering estimates for general models. This problem can be solved by the use of particle filters (PF). These sequential Monte Carlo methods are computationally more expensive, but can in principle deal with almost arbitrary state space models. bssm supports a general bootstrap particle filter (BSF) for all of its model classes. For ssm_mng, bsm_ng, ar1_ng, and svm models we recommend the particle filter called ψ-APF (Vihola, Helske, and Franks 2020) (see also another vignette on CRAN), which makes use of the previously mentioned approximating Gaussian model in order to produce a more efficient filter. It is also available for ssm_nlg models, but in the case of severe non-linearities it is not necessarily the best option.
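
As a sketch, a bootstrap particle filter for the Poisson model constructed earlier could be run as follows (the argument particles and the returned logLik component are based on recent versions of the package and should be checked against the documentation):

bsf_out <- bootstrap_filter(poisson_model, particles = 1000)
bsf_out$logLik  # particle estimate of the log-likelihood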

Compared to the filtering problem, in smoothing problems we are interested in the conditional densities of the hidden states at a certain time point $t$ given all the observations $y_1, \ldots, y_t, \ldots, y_n$. Again, for linear-Gaussian models we can use the so-called Kalman smoothing recursions, whereas in the case of more general models we can rely on approximating methods or on smoothing algorithms based on the output of particle filters. Currently, only the filter-smoother approach (Kitagawa 1996) is supported for particle smoothing.
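
For linear-Gaussian models, the smoother method returns the smoothed state means and covariances; a sketch using the earlier model (the component names alphahat and Vt are assumptions based on the documentation):

sm <- smoother(bsm_model)
# Smoothed level estimates alongside the observations
ts.plot(cbind(bsm_model$y, sm$alphahat[, "level"]), col = 1:2, lty = 1:2)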

Markov chain Monte Carlo

The main purpose of bssm is to allow efficient MCMC-based inference for various state space models. For this task, the method run_mcmc can be used. Here we define a random walk model with a drift and a stochastic seasonal component for the UK gas consumption dataset, and use 40,000 MCMC iterations, where the first half is discarded by default as burn-in. Note that the number of iterations is quite low, and in practice we should run the chain longer. Here we use fewer iterations to speed up the package checks on CRAN.

prior <- halfnormal(0.1, 1)
UKgas_model <- bsm_lg(log10(UKgas), sd_y = prior, sd_level = prior,
  sd_slope = prior, sd_seasonal =  prior)

mcmc_bsm <- run_mcmc(UKgas_model, iter = 4e4, seed = 1)
mcmc_bsm
## 
## Call:
## run_mcmc.lineargaussian(model = UKgas_model, iter = 40000, seed = 1)
## 
## Iterations = 20001:40000
## Thinning interval = 1
## Length of the final jump chain = 4716
## 
## Acceptance rate after the burn-in period:  0.236
## 
## Summary for theta:
## 
##     variable        Mean           SE          SD         2.5%       97.5% ESS
##     sd_level 0.005076792 1.969154e-04 0.003360055 0.0002233676 0.012164506 291
##  sd_seasonal 0.026279113 1.331061e-04 0.003790511 0.0185637669 0.033864873 811
##     sd_slope 0.001169760 2.945858e-05 0.000539790 0.0001344258 0.002394876 336
##         sd_y 0.016280928 2.656162e-04 0.005588283 0.0041307403 0.026326567 443
## 
## Summary for alpha_109:
## 
##    variable time         Mean           SE          SD         2.5%       97.5%
##       level 1987  2.844603981 3.576960e-04 0.016755373  2.811864975  2.87881401
##  seasonal_1 1987  0.268232881 7.340927e-04 0.035012588  0.199319650  0.33673134
##  seasonal_2 1987  0.062504590 3.753102e-04 0.017738847  0.029212802  0.09875666
##  seasonal_3 1987 -0.295387310 3.287016e-04 0.015225214 -0.327559158 -0.26697840
##       slope 1987  0.009664101 8.330244e-05 0.003840359  0.002693921  0.01804641
##   ESS
##  2194
##  2275
##  2234
##  2145
##  2125
## 
## Run time:
##    user  system elapsed 
##   4.639   0.018   4.657

Note that all MCMC algorithms of bssm also output state forecasts for the time point $n + 1$; the summary statistics of this state are shown in the output above.
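
Predictions further ahead can be obtained with the predict method, by supplying a model object whose future observations are set to NA; the following sketch follows the pattern used in the package's prediction examples, and the exact arguments (model, type, nsim) should be verified against the current documentation:

# Two years (8 quarters) ahead; NA observations mark the prediction horizon
future_model <- UKgas_model
future_model$y <- ts(rep(NA, 8), start = c(1987, 1), frequency = 4)
pred <- predict(mcmc_bsm, model = future_model, type = "mean", nsim = 1000)
head(pred)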

Here we use the ggplot2 (Wickham 2016) package for the figures, so we transform the MCMC samples to a data.frame:

suppressMessages(library("ggplot2"))
d <- as.data.frame(mcmc_bsm, variable = "theta")
ggplot(d, aes(x = value)) + 
  geom_density(adjust = 3, fill = "#92f0a8") + 
  facet_wrap(~ variable, scales = "free") + 
  theme_bw()

suppressMessages(library("dplyr"))
d <- as.data.frame(mcmc_bsm, variable = "states")
level_fit <- d |> 
  filter(variable == "level") |>
  group_by(time) |>
  summarise(consumption = mean(value), 
    lwr = quantile(value, 0.025), 
    upr = quantile(value, 0.975))

ggplot(level_fit, aes(x = time, y = consumption)) + 
  geom_ribbon(aes(ymin = lwr, ymax = upr), 
    fill = "#92f0a8", alpha = 0.25) +
  geom_line(colour = "#92f0a8") +
  geom_line(data = data.frame(
    consumption = log10(UKgas), 
    time = time(UKgas)), 
    colour = "grey30", linetype = "dashed") +
  theme_bw()
Smoothed trend component with 95% intervals.

Acknowledgements

This work has been supported by the Academy of Finland research grants 284513, 312605, 311877, and 331817.

References

Andrieu, Christophe, and Gareth O. Roberts. 2009. “The Pseudo-Marginal Approach for Efficient Monte Carlo Computations.” Annals of Statistics 37 (2): 697–725.
Beaumont, Mark A. 2003. “Estimation of Population Growth or Decline in Genetically Monitored Populations.” Genetics 164: 1139–60.
Durbin, James, and Siem Jan Koopman. 1997. “Monte Carlo Maximum Likelihood Estimation for Non-Gaussian State Space Models.” Biometrika 84 (3): 669–84. https://doi.org/10.1093/biomet/84.3.669.
———. 2012. Time Series Analysis by State Space Methods. 2nd ed. New York: Oxford University Press.
Eddelbuettel, Dirk, and Romain François. 2011. “Rcpp: Seamless R and C++ Integration.” Journal of Statistical Software 40 (8): 1–18. https://www.jstatsoft.org/v40/i08/.
Eddelbuettel, Dirk, and Conrad Sanderson. 2014. “RcppArmadillo: Accelerating R with High-Performance C++ Linear Algebra.” Computational Statistics and Data Analysis 71: 1054–63. https://dx.doi.org/10.1016/j.csda.2013.02.005.
Harvey, A. C. 1989. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press.
Helske, Jouni. 2017. “KFAS: Exponential Family State Space Models in R.” Journal of Statistical Software 78 (10): 1–39. https://doi.org/10.18637/jss.v078.i10.
———. 2018. ramcmc: Robust Adaptive Metropolis Algorithm. https://github.com/helske/ramcmc.
Helske, Jouni, and Matti Vihola. 2021. “bssm: Bayesian Inference of Non-Linear and Non-Gaussian State Space Models in R.” The R Journal 13 (2): 578–89. https://arxiv.org/abs/2101.08492.
Helske, Satu, and Jouni Helske. 2019. “Mixture Hidden Markov Models for Sequence Data: The seqHMM Package in R.” Journal of Statistical Software 88 (3): 1–32. https://doi.org/10.18637/jss.v088.i03.
Kitagawa, Genshiro. 1996. “Monte Carlo Filter and Smoother for Non-Gaussian Nonlinear State Space Models.” Journal of Computational and Graphical Statistics 5 (1): 1–25.
Lin, L., K. F. Liu, and J. Sloan. 2000. “A Noisy Monte Carlo Algorithm.” Physical Review D 61.
Petris, Giovanni, and Sonia Petrone. 2011. “State Space Models in R.” Journal of Statistical Software 41 (4): 1–25. https://doi.org/10.18637/jss.v041.i04.
R Core Team. 2020. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Tusell, Fernando. 2011. “Kalman Filtering in R.” Journal of Statistical Software 39 (2): 1–27. https://doi.org/10.18637/jss.v039.i02.
Vihola, Matti. 2012. “Robust Adaptive Metropolis Algorithm with Coerced Acceptance Rate.” Statistics and Computing 22 (5): 997–1008. https://doi.org/10.1007/s11222-011-9269-5.
Vihola, Matti, Jouni Helske, and Jordan Franks. 2020. “Importance Sampling Type Estimators Based on Approximate Marginal MCMC.” Scandinavian Journal of Statistics. https://doi.org/10.1111/sjos.12492.
Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.