---
title: "Introduction to the jumps package"
date: "Version: 2025-03-16"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to the jumps package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction

The `jumps` package implements the method described in the forthcoming *Economic Modelling* article "A Hodrick-Prescott filter with automatically selected breaks". The terms *jumps* and *breaks* are used interchangeably in the package documentation. The initial version of the article used the term *jumps*, but one of the referees pointed out that this term is generally reserved for the jump processes used in mathematical finance, while *breaks* is the usual term in econometrics. We agreed with the referee and adopted *breaks* in the article's final version. However, we kept *jumps* in the package name and in the function names because most of the code had already been written.

The method extends the well-known Hodrick-Prescott filter (HPF) by allowing a small number of discontinuities in the otherwise smooth filter. The number, positions and magnitudes of these discontinuities, which we name *breaks* or *jumps*, are automatically estimated from the data. The method is based on the minimization of the sum of squared residuals of the HPF subject to a penalty on the number of breaks. The strength of the penalty can be chosen by the user or selected automatically through an information criterion; when no breaks are allowed, the method reduces to the standard HPF. The technique is implemented in the `hpj` function, which is the main function of the package. The function `hpj` is a wrapper for other functions that implement several variants of the technique.

For efficiency, the package's computational engine is written in C++ through the Rcpp package. The engine implements the Kalman filter and smoother for the state-space form underlying the HPF. All formulae can be found in the article and in the vignette titled *Formulae*.
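
As a rough orientation, and not as a substitute for the *Formulae* vignette, the standard HPF is commonly written as the smoother of an integrated random walk observed with noise. The notation below ($\mu_t$, $\beta_t$, $\sigma_t$, $M$) is ours, and the way it maps onto the package's `maxsum` and `std_devs` is an assumption; the article remains the authoritative source.

$$
\begin{aligned}
y_t &= \mu_t + \varepsilon_t, & \varepsilon_t &\sim N(0, \sigma^2_\varepsilon),\\
\mu_{t+1} &= \mu_t + \beta_t + \eta_t, & \eta_t &\sim N(0, \sigma^2_t),\\
\beta_{t+1} &= \beta_t + \zeta_t, & \zeta_t &\sim N(0, \sigma^2_\zeta).
\end{aligned}
$$

With $\sigma_t = 0$ for every $t$, the smoothed level $\mu_t$ is the standard HPF with $\lambda = \sigma^2_\varepsilon / \sigma^2_\zeta$. Under the assumption above, a break at time $t$ corresponds to $\sigma_t > 0$, and a constraint of the form $\sum_t \sigma_t \le M$, with $M$ playing the role of `maxsum`, limits how many and how large the breaks can be.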

Users needing finer control over the estimation process can use the functions with names `hpfj*` and `auto_hpfj*`, which allow full control over the starting values and penalties. However, the wrapper function `hpj` makes the process simpler and more stable. Good starting values are essential because the technique relies on a complex numerical optimisation. Inside the `hpj` function, we linearly transform the time series so that the default starting values in the `hpfj*` functions are expected to work well, and then we apply the anti-transform to all the results so that they refer to the original time series.
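
The idea of filtering a transformed series and then mapping the result back can be illustrated with the standard HPF, which commutes with linear transformations of the data. The sketch below uses base R only. `hp_trend()` is a hypothetical helper written for this illustration (it is not part of `jumps`), and standardising by mean and standard deviation is an assumption, since the exact transformation used inside `hpj` is not described here.

```r
# Hypothetical helper (not part of jumps): standard HP trend obtained by
# solving (I + lambda * D'D) trend = y, with D the second-difference matrix.
hp_trend <- function(y, lambda) {
  n <- length(y)
  D <- diff(diag(n), differences = 2)
  solve(diag(n) + lambda * crossprod(D), y)
}

y <- as.numeric(Nile)
lambda <- 6.25  # smoothing parameter commonly used for annual data

# Transform: centre and scale the series (assumed form of the transformation).
a <- mean(y)
b <- sd(y)
z <- (y - a) / b

# Filter the transformed series, then apply the anti-transform.
trend_back <- a + b * hp_trend(z, lambda)

# The result matches the trend computed directly on the original scale.
all.equal(trend_back, hp_trend(y, lambda))
```

The comparison only shows that the transformation leaves the standard filter unchanged; in `hpj` its purpose is to put the series on a scale where the optimiser's starting values are appropriate.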

While `hpfj*` and `auto_hpfj*` accept only time series stored in a numeric vector (any type of time series object is coerced to a numeric vector), the `hpj` function accepts all the main time series objects used in R (ts, zoo, xts, and timeSeries). All the time series returned by `hpj` have the same class and dates as the original time series.

# Basic usage

The function is called as

```
hpj(y, maxsum = NULL, lambda = NULL, xreg = NULL, ic = c("bic", "hq", "aic", "aicc"))
```
where

- `y` is the time series to be filtered;
- `maxsum` is an inverse penalty: when it is zero no jumps are allowed, and the larger `maxsum` is, the more numerous and the larger the breaks that are allowed;
- `lambda` is the smoothing parameter of the HPF;
- `xreg` is a matrix of regressors (we are still working on this, so it is not yet implemented);
- `ic` is the information criterion used to select the penalty when `maxsum = NULL`.

When `lambda = NULL`, the smoothing parameter is estimated by quasi maximum likelihood. Otherwise, `lambda` can be a positive number or one of the following strings (the corresponding value of the smoothing parameter is given in parentheses):
daily (110,930,628,906), weekly (45,697,600), monthly (129,600), quarterly (1,600), annual (6.25).
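
The listed values coincide with the frequency rule $\lambda = 1600\,(f/4)^4$, where $f$ is the number of observations per year, as the base R check below shows. That `hpj` computes them in this way internally is an assumption, and the frequencies 365, 52, 12, 4 and 1 are our choice for the illustration.

```r
# Observations per year assumed for each label.
freq <- c(daily = 365, weekly = 52, monthly = 12, quarterly = 4, annual = 1)

# Quarterly benchmark of 1600 rescaled by the fourth power of the frequency ratio.
lambda <- 1600 * (freq / 4)^4

format(lambda, big.mark = ",", scientific = FALSE)
```

For a sampling frequency that is not in the list, the same rule gives a number that can be passed directly as `lambda`.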

The function returns an object of class `hpj`, which is a list with the following slots:

- `y` the input time series;
- `maxsum` the maximum sum of additional standard deviations;
- `lambda` the smoothing constant of the HP filter;
- `pars` vector of estimated parameters (sigma_slope, sigma_noise, gamma);
- `hpj` the time series of the HP filter with jumps;
- `hpj_std` the time series of the standard deviations of the HP filter with jumps;
- `std_devs` vector of additional standard deviations of the level disturbance;
- `xreg` matrix of regressors;
- `df` model's degrees of freedom;
- `loglik` value of the log-likelihood at maximum;
- `ic` vector of information criteria (aic, aicc, bic, hq);
- `opt` the output of the optimization function (nloptr);
- `call` the call to the function.

The methods `print` and `plot` are available for the `hpj` class. The `plot` method can be customised:
```
plot(x, prob = NULL, show_breaks = TRUE, main = "original + filter", use_ggplot = TRUE, ...)
```
where `x` is the object of class `hpj`; `prob` is the coverage probability of the confidence interval for the filter, which is not plotted when `prob = NULL`; `show_breaks` is a logical indicating whether to show the breaks in the plot; `main` is the title of the plot; `use_ggplot` is a logical indicating whether to use ggplot2; and `...` are additional arguments passed to the `plot` function when the standard plot is used (that is, when `use_ggplot = FALSE`).

# Examples

## Simulated time series

```{r fig.width=5}
library(jumps)
set.seed(2025)
n <- 100

# simulated trend: smooth cosine with a level shift and a slope change after t = 50
mu <- 100*cos(3*pi/n*(1:n)) - ((1:n) > 50)*n - c(rep(0, 50), 1:50)*10
# simulated time series
y <- mu + rnorm(n, sd = 20)

# HP filter with jumps with estimated lambda and fixed penalty (maxsum = 50)
hpj_sim <- hpj(y, maxsum = 50)

print(hpj_sim)

plot(hpj_sim)
plot(hpj_sim, prob = 0.95)
plot(hpj_sim, use_ggplot = FALSE)
```
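
For reference when reading the plots, the true trend `mu` defined in the chunk above contains a single level shift, between observations 50 and 51, followed by an additional downward slope. A quick base R check locates it (the trend definition is repeated so that the snippet runs on its own):

```r
n <- 100
mu <- 100*cos(3*pi/n*(1:n)) - ((1:n) > 50)*n - c(rep(0, 50), 1:50)*10

# Largest one-step change of the true trend: mu[51] - mu[50].
break_at <- which.max(abs(diff(mu)))
break_at
diff(mu)[break_at]
```

A break estimated close to this position, with a drop of comparable size, indicates that the filter has recovered the discontinuity built into the simulation.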

## Nile time series

```{r fig.width=5}
# HP filter with jumps with estimated lambda and automatically selected penalty
hpj_nile <- hpj(Nile)

print(hpj_nile)

plot(hpj_nile, main = "Nile river flow")
plot(hpj_nile, prob = 0.95, main = "Nile river flow")
plot(hpj_nile, use_ggplot = FALSE, main = "Nile river flow")
```

## Employment in Italy

```{r warning=FALSE, message=FALSE, fig.width=5}
data("employed_IT")
y <- window(employed_IT[, "Y25.29"], start = c(2009, 1))
hpj_emp <- hpj(y, scl = "original")

print(hpj_emp)
plot(hpj_emp, main = "Millions of employed in Italy: age 25-29")
```