---
title: "statespacer applied to UK Road Deaths"
description: >
  A lot of time series exhibit some form of seasonality. In this vignette you 
  will learn how to deal with seasonality using the statespacer package. 
  Moreover, you will learn how to add explanatory variables to the State Space 
  model, and how to estimate multivariate models if there is more than one 
  dependent variable available. We also show you how to control the variance - 
  covariance matrices of the various components.
output: rmarkdown::html_vignette
bibliography: ../inst/REFERENCES.bib
vignette: >
  %\VignetteIndexEntry{statespacer applied to UK Road Deaths}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

A lot of time series exhibit some form of seasonality. An example of such a time series is the UK Road Deaths dataset, see `?Seatbelts` for details about this dataset. In this vignette you will learn how to deal with this seasonality using the statespacer package. Moreover, you will learn how to add explanatory variables to the State Space model, and how to estimate multivariate models if there is more than one dependent variable available. We also show you how to control the variance - covariance matrices of the various components. The UK Road Deaths dataset provides to be an excellent dataset at hand to showcase all of these functionalities of the statespacer package! The approach used in this document follows that of @commandeur2007introduction.

## Data preparation

To start, we need to load the package and the dataset. We also log-transform the dependent variables, and some of the explanatory variables. 

```{r, setup}
# Load statespacer
library(statespacer)

# Load the dataset
library(datasets)
Data <- Seatbelts

# Data preparation
# The log of "drivers", "front", and "rear" will be used as dependent variables
# The log of "PetrolPrice" and "kms" will be used as explanatory variables
Data[, c("drivers", "front", "rear", "PetrolPrice", "kms")] <- log(Data[, c("drivers", "front", "rear", "PetrolPrice", "kms")])
```

## Univariate model with deterministic level and seasonal
We start off with estimating a univariate model for the "drivers" variable. We specify a deterministic seasonal component of period 12, and a deterministic level. Furthermore, we take the log of "PetrolPrice" together with the "law" dummy as explanatory variables.

```{r}
# Dependent variable
y <- as.matrix(Data[, "drivers"])

# Period of the seasonal component
BSM_vec <- 12

# Explanatory variables
# Note: Must be a list of matrices! 
#       Each dependent gets its own matrix of explanatory variables.
addvar_list <- list(as.matrix(Data[, c("PetrolPrice", "law")]))

# Format of the variance - covariance matrix of the level component
# By setting the elements of this matrix to 0, 
# the component becomes deterministic.
format_level <- matrix(0)

# Format of the variance - covariance matrix of the seasonal component
# Note: This format must be a list of matrices, because multiple 
#       seasonalities can be specified!
format_BSM_list <- list(matrix(0))

# Fitting the model
fit <- statespacer(y = y,
                   local_level_ind = TRUE,
                   BSM_vec = BSM_vec,
                   addvar_list = addvar_list,
                   format_level = format_level, 
                   format_BSM_list = format_BSM_list,
                   method = "BFGS",
                   initial = 0.5 * log(var(y)),
                   verbose = TRUE)
```

Let's check out the estimates:

```{r, fig.height = 4.5, fig.width = 7}
# The estimated variance of the observation disturbance
fit$system_matrices$H$H

# Smoothed estimate of the level
fit$smoothed$level[1,]

# Smoothed estimate of the coefficient of log "PetrolPrice"
fit$smoothed$addvar_coeff[1, 1]

# Smoothed estimate of the coefficient of the "law" dummy
fit$smoothed$addvar_coeff[1, 2]

# Plot the data next to the smoothed level + explanatory variables components
plot(Data[, c("drivers")], type = "l", ylim = c(6.95, 8.1),
     xlab = "year", ylab = "logarithm of drivers")
lines(seq(tsp(Data)[1], tsp(Data)[2], 1/tsp(Data)[3]), 
      fit$smoothed$level + fit$smoothed$addvar, type = 'l', col = "red")
legend(1978, 8.09, c("log(drivers)", "level + regression effects"), 
       lty = c(1,1), lwd=c(2.5, 2.5), col = c("black", "red"))
```

## Univariate model with stochastic level and seasonal
Now, let's specify a stochastic level and seasonal component. This can be done by setting the entries in the format of the components to 1.

```{r, fig.height = 4.5, fig.width = 7, warning = FALSE}
# By setting the entries in the format to 1, the component becomes stochastic
format_level <- matrix(1)
format_BSM_list <- list(matrix(1))
fit <- statespacer(y = y,
                   local_level_ind = TRUE,
                   BSM_vec = BSM_vec,
                   addvar_list = addvar_list,
                   format_level = format_level,
                   format_BSM_list = format_BSM_list,
                   method = "BFGS",
                   initial = log(var(y)),
                   verbose = TRUE)

# The estimated variance of the observation disturbance
fit$system_matrices$H$H

# The estimated variance of the level disturbance
fit$system_matrices$Q$level

# The estimated variance of the seasonal disturbance
fit$system_matrices$Q$BSM12

# Smoothed estimate of the coefficient of log "PetrolPrice"
fit$smoothed$addvar_coeff[1, 1]

# Smoothed estimate of the coefficient of the "law" dummy
fit$smoothed$addvar_coeff[1, 2]

# Plot the data next to the smoothed level + explanatory variables components
plot(Data[, c("drivers")], type = "l", ylim = c(6.95, 8.1),
     xlab = "year", ylab = "logarithm of drivers")
lines(seq(tsp(Data)[1], tsp(Data)[2], 1/tsp(Data)[3]), 
      fit$smoothed$level + fit$smoothed$addvar, type = 'l', col = "red")
legend(1978, 8.09, c("log(drivers)", "level + regression effects"), 
       lty = c(1,1), lwd=c(2.5, 2.5), col = c("black", "red"))
```

Plotting the stochastic seasonal, we can see that it barely changes over time, which is in line with the near zero value of its variance. It might be worthwhile to set the seasonal component to be deterministic, while letting the level be stochastic. I'll leave that as an exercise for you. You can compare the AICs of both models, and see which one scores better!

```{r, fig.height = 4.5, fig.width = 7}
# Plot the stochastic seasonal
plot(seq(tsp(Data)[1], tsp(Data)[2], 1/tsp(Data)[3]), 
     fit$smoothed$BSM12,
     type = "l", ylim = c(-0.2, 0.3),
     xlab = "year", ylab = "stochastic seasonal")
abline(h = 0)
```

And for the sake of completeness, the irregular component:

```{r, fig.height = 4.5, fig.width = 7}
plot(seq(tsp(Data)[1], tsp(Data)[2], 1/tsp(Data)[3]), 
     fit$smoothed$epsilon,
     type = "l", ylim = c(-0.15, 0.15),
     xlab = "year", ylab = "irregular component")
abline(h = 0)
```

## Bivariate model of front and rear passengers
The previous models dealt with a univariate series. However, it might be interesting to model multiple series simultaneously. In this section we'll just do that. We'll take the logs of "front" and "rear" as dependent variables, the "law" dummy together with the logs of "PetrolPrice" and "kms" as explanatory variables, a stochastic local level component, and a deterministic seasonal component with period 12.

```{r, warning = FALSE}
# Dependent variable
y <- as.matrix(Data[, c("front", "rear")])

# Explanatory variables
# Note: Must be a list of matrices! 
#       Each dependent gets its own matrix of explanatory variables.
X <- as.matrix(Data[, c("PetrolPrice", "kms", "law")])
addvar_list <- list(X, X)

# Format of the variance - covariance matrix of the level component
# Note: Only the lower triangular part of the format is used.
#       The format specifies which elements in the matrices L and D should be 
#       non-zero, where L and D are the matrices of the Cholesky LDL decomposition.
#       The diagonal is used to specify which elements of the Diagonal matrix
#       should be non-zero. The lower triangular part excluding the diagonal 
#       specifies which elements in the Loading matrix should be non-zero.
format_level <- matrix(1, 2, 2)

# Format of the variance - covariance matrix of the seasonal component
# Note: This format must be a list of matrices, because multiple seasonalities
#       can be specified!
format_BSM_list <- list(matrix(0, 2, 2))

# Format of the variance - covariance matrix of the observation disturbances
H_format <- matrix(1, 2, 2)

# Fitting the model
fit <- statespacer(y = y,
                   H_format = H_format,
                   local_level_ind = TRUE,
                   BSM_vec = BSM_vec,
                   addvar_list = addvar_list,
                   format_level = format_level, 
                   format_BSM_list = format_BSM_list,
                   method = "BFGS",
                   initial = 0.5 * log(diag(var(y))),
                   verbose = TRUE)
```

Checking out the estimates and plotting the levels:
```{r, fig.height = 4.5, fig.width = 7}
# The estimated variance - covariance matrix of the observation disturbance
fit$system_matrices$H$H

# The estimated variance - covariance matrix of the level disturbance
fit$system_matrices$Q$level

# Coefficients + T-stats
coeff <- cbind(
  c("front PetrolPrice", "front kms", "front law", 
    "rear PetrolPrice", "rear kms", "rear law"),
  fit$smoothed$addvar_coeff[1,],
  fit$smoothed$addvar_coeff[1,] / fit$smoothed$addvar_coeff_se[1,]
)
colnames(coeff) <- c("Variable", "Coefficient", "T-Stat")
coeff

# plot of level for "front"
plot(seq(tsp(Data)[1], tsp(Data)[2], 1/tsp(Data)[3]), 
     fit$smoothed$level[, 1], type = "l",
     xlab = "year", ylab = "level of front passengers")

# plot of level for "rear"
plot(seq(tsp(Data)[1], tsp(Data)[2], 1/tsp(Data)[3]), 
     fit$smoothed$level[, 2], type = "l",
     xlab = "year", ylab = "level of rear passengers")
```

We see that the levels closely resemble each other. In fact, their correlation is almost equal to 1:

```{r}
fit$system_matrices$Q_correlation_matrix$level
```

It is possible to restrict the rank of the variance - covariance matrix of the level component to be 1. This can be done by setting `format_level` as follows:

```{r}
format_level <- matrix(0, 2, 2)
format_level[, 1] <- 1
```

Feel free to estimate the model with this new specification yourself! I'd also suggest dropping the "law" dummy for the "rear" passengers, but keeping it in for the "front" passengers.

## References