--- title: "statespacer applied to UK Road Deaths" description: > A lot of time series exhibit some form of seasonality. In this vignette you will learn how to deal with seasonality using the statespacer package. Moreover, you will learn how to add explanatory variables to the State Space model, and how to estimate multivariate models if there is more than one dependent variable available. We also show you how to control the variance - covariance matrices of the various components. output: rmarkdown::html_vignette bibliography: ../inst/REFERENCES.bib vignette: > %\VignetteIndexEntry{statespacer applied to UK Road Deaths} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` A lot of time series exhibit some form of seasonality. An example of such a time series is the UK Road Deaths dataset, see `?Seatbelts` for details about this dataset. In this vignette you will learn how to deal with this seasonality using the statespacer package. Moreover, you will learn how to add explanatory variables to the State Space model, and how to estimate multivariate models if there is more than one dependent variable available. We also show you how to control the variance - covariance matrices of the various components. The UK Road Deaths dataset provides to be an excellent dataset at hand to showcase all of these functionalities of the statespacer package! The approach used in this document follows that of @commandeur2007introduction. ## Data preparation To start, we need to load the package and the dataset. We also log-transform the dependent variables, and some of the explanatory variables. ```{r, setup} # Load statespacer library(statespacer) # Load the dataset library(datasets) Data <- Seatbelts # Data preparation # The log of "drivers", "front", and "rear" will be used as dependent variables # The log of "PetrolPrice" and "kms" will be used as explanatory variables Data[, c("drivers", "front", "rear", "PetrolPrice", "kms")] <- log(Data[, c("drivers", "front", "rear", "PetrolPrice", "kms")]) ``` ## Univariate model with deterministic level and seasonal We start off with estimating a univariate model for the "drivers" variable. We specify a deterministic seasonal component of period 12, and a deterministic level. Furthermore, we take the log of "PetrolPrice" together with the "law" dummy as explanatory variables. ```{r} # Dependent variable y <- as.matrix(Data[, "drivers"]) # Period of the seasonal component BSM_vec <- 12 # Explanatory variables # Note: Must be a list of matrices! # Each dependent gets its own matrix of explanatory variables. addvar_list <- list(as.matrix(Data[, c("PetrolPrice", "law")])) # Format of the variance - covariance matrix of the level component # By setting the elements of this matrix to 0, # the component becomes deterministic. format_level <- matrix(0) # Format of the variance - covariance matrix of the seasonal component # Note: This format must be a list of matrices, because multiple # seasonalities can be specified! format_BSM_list <- list(matrix(0)) # Fitting the model fit <- statespacer(y = y, local_level_ind = TRUE, BSM_vec = BSM_vec, addvar_list = addvar_list, format_level = format_level, format_BSM_list = format_BSM_list, method = "BFGS", initial = 0.5 * log(var(y)), verbose = TRUE) ``` Let's check out the estimates: ```{r, fig.height = 4.5, fig.width = 7} # The estimated variance of the observation disturbance fit$system_matrices$H$H # Smoothed estimate of the level fit$smoothed$level[1,] # Smoothed estimate of the coefficient of log "PetrolPrice" fit$smoothed$addvar_coeff[1, 1] # Smoothed estimate of the coefficient of the "law" dummy fit$smoothed$addvar_coeff[1, 2] # Plot the data next to the smoothed level + explanatory variables components plot(Data[, c("drivers")], type = "l", ylim = c(6.95, 8.1), xlab = "year", ylab = "logarithm of drivers") lines(seq(tsp(Data)[1], tsp(Data)[2], 1/tsp(Data)[3]), fit$smoothed$level + fit$smoothed$addvar, type = 'l', col = "red") legend(1978, 8.09, c("log(drivers)", "level + regression effects"), lty = c(1,1), lwd=c(2.5, 2.5), col = c("black", "red")) ``` ## Univariate model with stochastic level and seasonal Now, let's specify a stochastic level and seasonal component. This can be done by setting the entries in the format of the components to 1. ```{r, fig.height = 4.5, fig.width = 7, warning = FALSE} # By setting the entries in the format to 1, the component becomes stochastic format_level <- matrix(1) format_BSM_list <- list(matrix(1)) fit <- statespacer(y = y, local_level_ind = TRUE, BSM_vec = BSM_vec, addvar_list = addvar_list, format_level = format_level, format_BSM_list = format_BSM_list, method = "BFGS", initial = log(var(y)), verbose = TRUE) # The estimated variance of the observation disturbance fit$system_matrices$H$H # The estimated variance of the level disturbance fit$system_matrices$Q$level # The estimated variance of the seasonal disturbance fit$system_matrices$Q$BSM12 # Smoothed estimate of the coefficient of log "PetrolPrice" fit$smoothed$addvar_coeff[1, 1] # Smoothed estimate of the coefficient of the "law" dummy fit$smoothed$addvar_coeff[1, 2] # Plot the data next to the smoothed level + explanatory variables components plot(Data[, c("drivers")], type = "l", ylim = c(6.95, 8.1), xlab = "year", ylab = "logarithm of drivers") lines(seq(tsp(Data)[1], tsp(Data)[2], 1/tsp(Data)[3]), fit$smoothed$level + fit$smoothed$addvar, type = 'l', col = "red") legend(1978, 8.09, c("log(drivers)", "level + regression effects"), lty = c(1,1), lwd=c(2.5, 2.5), col = c("black", "red")) ``` Plotting the stochastic seasonal, we can see that it barely changes over time, which is in line with the near zero value of its variance. It might be worthwhile to set the seasonal component to be deterministic, while letting the level be stochastic. I'll leave that as an exercise for you. You can compare the AICs of both models, and see which one scores better! ```{r, fig.height = 4.5, fig.width = 7} # Plot the stochastic seasonal plot(seq(tsp(Data)[1], tsp(Data)[2], 1/tsp(Data)[3]), fit$smoothed$BSM12, type = "l", ylim = c(-0.2, 0.3), xlab = "year", ylab = "stochastic seasonal") abline(h = 0) ``` And for the sake of completeness, the irregular component: ```{r, fig.height = 4.5, fig.width = 7} plot(seq(tsp(Data)[1], tsp(Data)[2], 1/tsp(Data)[3]), fit$smoothed$epsilon, type = "l", ylim = c(-0.15, 0.15), xlab = "year", ylab = "irregular component") abline(h = 0) ``` ## Bivariate model of front and rear passengers The previous models dealt with a univariate series. However, it might be interesting to model multiple series simultaneously. In this section we'll just do that. We'll take the logs of "front" and "rear" as dependent variables, the "law" dummy together with the logs of "PetrolPrice" and "kms" as explanatory variables, a stochastic local level component, and a deterministic seasonal component with period 12. ```{r, warning = FALSE} # Dependent variable y <- as.matrix(Data[, c("front", "rear")]) # Explanatory variables # Note: Must be a list of matrices! # Each dependent gets its own matrix of explanatory variables. X <- as.matrix(Data[, c("PetrolPrice", "kms", "law")]) addvar_list <- list(X, X) # Format of the variance - covariance matrix of the level component # Note: Only the lower triangular part of the format is used. # The format specifies which elements in the matrices L and D should be # non-zero, where L and D are the matrices of the Cholesky LDL decomposition. # The diagonal is used to specify which elements of the Diagonal matrix # should be non-zero. The lower triangular part excluding the diagonal # specifies which elements in the Loading matrix should be non-zero. format_level <- matrix(1, 2, 2) # Format of the variance - covariance matrix of the seasonal component # Note: This format must be a list of matrices, because multiple seasonalities # can be specified! format_BSM_list <- list(matrix(0, 2, 2)) # Format of the variance - covariance matrix of the observation disturbances H_format <- matrix(1, 2, 2) # Fitting the model fit <- statespacer(y = y, H_format = H_format, local_level_ind = TRUE, BSM_vec = BSM_vec, addvar_list = addvar_list, format_level = format_level, format_BSM_list = format_BSM_list, method = "BFGS", initial = 0.5 * log(diag(var(y))), verbose = TRUE) ``` Checking out the estimates and plotting the levels: ```{r, fig.height = 4.5, fig.width = 7} # The estimated variance - covariance matrix of the observation disturbance fit$system_matrices$H$H # The estimated variance - covariance matrix of the level disturbance fit$system_matrices$Q$level # Coefficients + T-stats coeff <- cbind( c("front PetrolPrice", "front kms", "front law", "rear PetrolPrice", "rear kms", "rear law"), fit$smoothed$addvar_coeff[1,], fit$smoothed$addvar_coeff[1,] / fit$smoothed$addvar_coeff_se[1,] ) colnames(coeff) <- c("Variable", "Coefficient", "T-Stat") coeff # plot of level for "front" plot(seq(tsp(Data)[1], tsp(Data)[2], 1/tsp(Data)[3]), fit$smoothed$level[, 1], type = "l", xlab = "year", ylab = "level of front passengers") # plot of level for "rear" plot(seq(tsp(Data)[1], tsp(Data)[2], 1/tsp(Data)[3]), fit$smoothed$level[, 2], type = "l", xlab = "year", ylab = "level of rear passengers") ``` We see that the levels closely resemble each other. In fact, their correlation is almost equal to 1: ```{r} fit$system_matrices$Q_correlation_matrix$level ``` It is possible to restrict the rank of the variance - covariance matrix of the level component to be 1. This can be done by setting `format_level` as follows: ```{r} format_level <- matrix(0, 2, 2) format_level[, 1] <- 1 ``` Feel free to estimate the model with this new specification yourself! I'd also suggest dropping the "law" dummy for the "rear" passengers, but keeping it in for the "front" passengers. ## References