Here we provide a brief tutorial
of the BayesChange package. The BayesChange
package contains two main functions: one that performs change points
detection on time series and epidemic diffusions and one that perform
clustering of time series and epidemic diffusions with common change
points. Here we briefly show how to implement these.
The function detect_cp provide a method for detecting
change points, it is based on the work Martínez
and Mena (2014) and on Corradin et al.
(2024)
Depending on the structure of the data, detect_cp might
perform change points detection on univariate time series or
multivariate time series. We import dataset eu_inflation
that contains the standardized monthly inflation rates from the
Harmonized Index of Consumer Prices (HICP) for the 12 COICOP expenditure
categories across European Union countries. The data span the period
from February~1997 to December~2024 resulting in a matrix of \(12\) rows and \(355\) columns.
Now we can run the function detect_cp, as arguments of
the function we need to specify the number of iterations, the number of
burn-in steps and a list with the the autoregressive coefficient
phi for the likelihood of the data, the parameters
a, b, c for the priors and the
probability q of performing a split at each step. Since we
deal with time series we need also to specify
kernel = "ts".
out <- detect_cp(data = eu_inflation[1,],
n_iterations = 2000, n_burnin = 500, q = 0.5,
params = list(prior_var_phi = 0.1, prior_delta_c = 1, prior_delta_d = 1), kernel = "ts")
#> Completed: 200/2000 - in 0.413669 sec
#> Completed: 400/2000 - in 0.778968 sec
#> Completed: 600/2000 - in 1.10147 sec
#> Completed: 800/2000 - in 1.45406 sec
#> Completed: 1000/2000 - in 1.79363 sec
#> Completed: 1200/2000 - in 2.08734 sec
#> Completed: 1400/2000 - in 2.39096 sec
#> Completed: 1600/2000 - in 2.70758 sec
#> Completed: 1800/2000 - in 3.029 sec
#> Completed: 2000/2000 - in 3.34796 secWith the methods print and summary we can
get information about the algorithm.
print(out)
#> DetectCpObj object
#> Type: change points detection on univariate time series
summary(out)
#> DetectCpObj object
#> Change point detection summary:
#> - Data: univariate time series
#> - Burn-in iterations: 500
#> - MCMC iterations: 1500
#> - Average number of detected change points: 8.17
#> - Computational time: 3.35 seconds
#>
#> Use plot() for a detailed visualization or posterior_estimate() to analyze the detected change points.In order to get a point estimate of the change points we can use the
method posterior_estimate that uses the method
salso by David B. Dahl and Müller
(2022) to get the final latent order and then detect the change
points.
cp_est <- posterior_estimate(out, loss = "binder")
cumsum(table(cp_est))[-length(table(cp_est))] + 1
#> 1 2 3 4 5 6 7 8
#> 130 145 147 300 302 305 322 326The package also provides a method for plotting the change points.
We can assess convergence of the latent order posterior chain, for
example, by inspecting the traceplot of its log-likelihood with
coda::traceplot.
If we instead consider a matrix of data, detect_cp
automatically performs a multivariate change points detection method. We
define the parameters.
params_multi <- list(m_0 = rep(0,3),
k_0 = 1,
nu_0 = 10,
S_0 = diag(0.1,3,3),
prior_var_phi = 0.1,
prior_delta_c = 1,
prior_delta_d = 1)Arguments k_0, nu_0, phi_0,
m_0, prior_delta_c, prior_delta_d
and prior_var_phi correspond to the parameters of the prior
distributions for the multivariate likelihood.
out <- detect_cp(data = eu_inflation[1:3,], n_iterations = 2000,
n_burnin = 500, q = 0.5, params = params_multi, kernel = "ts")
#> Completed: 200/2000 - in 0.058356 sec
#> Completed: 400/2000 - in 0.119056 sec
#> Completed: 600/2000 - in 0.180904 sec
#> Completed: 800/2000 - in 0.242692 sec
#> Completed: 1000/2000 - in 0.306173 sec
#> Completed: 1200/2000 - in 0.368056 sec
#> Completed: 1400/2000 - in 0.430865 sec
#> Completed: 1600/2000 - in 0.493515 sec
#> Completed: 1800/2000 - in 0.555849 sec
#> Completed: 2000/2000 - in 0.617997 sec
table(posterior_estimate(out, loss = "binder"))
#> Warning in salso::salso(mcmc_chain, loss = "binder", maxNClusters =
#> maxNClusters, : The number of clusters equals the default maximum possible
#> number of clusters.
#>
#> 1 2 3 4 5
#> 42 150 86 24 34plot(out, loss = "binder", plot_freq = TRUE)
#> Warning in salso::salso(mcmc_chain, loss = "binder", maxNClusters =
#> maxNClusters, : The number of clusters equals the default maximum possible
#> number of clusters.Function detect_cp can also be used to detect change
points on survival functions. We consider the synthetic dataset
epi_synthetic
To run detect_cp on epidemiological data we need to set
kernel = "epi". Moreover, besides the usual parameters, we
need to set the number of Monte Carlo replications M for
the approximation of the integrated likelihood and the recovery rate
xi. a0 and b0 are optional and
correspond to the parameters of the gamma distribution for the
integration of the likelihood.
params_epi <- list(M = 250, xi = 1/8, a0 = 4, b0 = 10, I0_var = 0.1)
out <- detect_cp(data = epi_synthetic, n_iterations = 2000, n_burnin = 500,
q = 0.25, params = params_epi, kernel = "epi")
#> Completed: 200/2000 - in 12.8483 sec
#> Completed: 400/2000 - in 25.3378 sec
#> Completed: 600/2000 - in 38.277 sec
#> Completed: 800/2000 - in 50.606 sec
#> Completed: 1000/2000 - in 63.753 sec
#> Completed: 1200/2000 - in 76.5805 sec
#> Completed: 1400/2000 - in 88.829 sec
#> Completed: 1600/2000 - in 101.001 sec
#> Completed: 1800/2000 - in 113.356 sec
#> Completed: 2000/2000 - in 125.687 sec
print(out)
#> DetectCpObj object
#> Type: change points detection on an epidemic diffusionAlso here, with function plot we can plot the survival
function and the position of the change points.
BayesChange contains another function,
clust_cp, that cluster respectively univariate and
multivariate time series and survival functions with common change
points. Details about this methods can be found in Corradin et al. (2026)
In clust_cp the argument kernel must be
specified, if data are time series then kernel = "ts" must
be set. Then the algorithm automatically detects if data are univariate
or multivariate.
We consider for this example dataset stock_uni that
contains the daily mean stock prices for the 50 largest companies (by
market capitalization) in the Standard&Poor’s 500 Index from January
1, 2020 to January 1, 2022.
Arguments that need to be specified in clust_cp are the
number of iterations n_iterations, the number of elements
in the normalisation constant B, the split-and-merge step
L performed when a new partition is proposed and a list
with the parameters of the algorithm, the likelihood and the
priors..
params_uni <- list(a = 1,
b = 1,
c = 1,
phi = 0.1)
out <- clust_cp(data = stock_uni[1:5,], n_iterations = 2000, n_burnin = 500,
L = 1, q = 0.5, B = 1000, params = params_uni, kernel = "ts")
#> Normalization constant - completed: 100/1000 - in 0.189228 sec
#> Normalization constant - completed: 200/1000 - in 0.363059 sec
#> Normalization constant - completed: 300/1000 - in 0.548615 sec
#> Normalization constant - completed: 400/1000 - in 0.735855 sec
#> Normalization constant - completed: 500/1000 - in 0.921944 sec
#> Normalization constant - completed: 600/1000 - in 1.10651 sec
#> Normalization constant - completed: 700/1000 - in 1.28983 sec
#> Normalization constant - completed: 800/1000 - in 1.46943 sec
#> Normalization constant - completed: 900/1000 - in 1.6618 sec
#> Normalization constant - completed: 1000/1000 - in 1.8421 sec
#>
#> ------ MAIN LOOP ------
#>
#> Completed: 200/2000 - in 3.11131 sec
#> Completed: 400/2000 - in 5.92338 sec
#> Completed: 600/2000 - in 8.60356 sec
#> Completed: 800/2000 - in 11.355 sec
#> Completed: 1000/2000 - in 13.9892 sec
#> Completed: 1200/2000 - in 16.553 sec
#> Completed: 1400/2000 - in 19.2294 sec
#> Completed: 1600/2000 - in 21.8527 sec
#> Completed: 1800/2000 - in 24.3712 sec
#> Completed: 2000/2000 - in 26.955 sec
posterior_estimate(out, loss = "binder")
#> [1] 1 2 3 4 1Method plot for clustering univariate time series
represents the data colored according to the assigned cluster.
Method plot_psm shows the posterior similarity matrix of
the clustering. Selecting reorder = TRUE we can choose to
order the matrix depending on the clustering obtained.
If time series are multivariate, data must be an array, where each
element is a multivariate time series represented by a matrix. Each row
of the matrix is a component of the time series. Here we use dataset
stock_multi that contains for each company the daily
opening and closing stock prices.
params_multi <- list(m_0 = rep(0,2),
k_0 = 1,
nu_0 = 10,
S_0 = diag(1,2,2),
phi = 0.1)
out <- clust_cp(data = stock_multi[,,1:5], n_iterations = 2500, n_burnin = 500,
L = 1, B = 1000, params = params_multi, kernel = "ts")
#> Normalization constant - completed: 100/1000 - in 0.020845 sec
#> Normalization constant - completed: 200/1000 - in 0.04184 sec
#> Normalization constant - completed: 300/1000 - in 0.062555 sec
#> Normalization constant - completed: 400/1000 - in 0.083127 sec
#> Normalization constant - completed: 500/1000 - in 0.103785 sec
#> Normalization constant - completed: 600/1000 - in 0.124543 sec
#> Normalization constant - completed: 700/1000 - in 0.145257 sec
#> Normalization constant - completed: 800/1000 - in 0.165952 sec
#> Normalization constant - completed: 900/1000 - in 0.186466 sec
#> Normalization constant - completed: 1000/1000 - in 0.20703 sec
#>
#> ------ MAIN LOOP ------
#>
#> Completed: 250/2500 - in 0.430167 sec
#> Completed: 500/2500 - in 0.866737 sec
#> Completed: 750/2500 - in 1.33828 sec
#> Completed: 1000/2500 - in 1.80254 sec
#> Completed: 1250/2500 - in 2.26703 sec
#> Completed: 1500/2500 - in 2.74952 sec
#> Completed: 1750/2500 - in 3.21317 sec
#> Completed: 2000/2500 - in 3.67078 sec
#> Completed: 2250/2500 - in 4.13789 sec
#> Completed: 2500/2500 - in 4.59756 sec
posterior_estimate(out, loss = "binder")
#> [1] 1 2 3 1 1Finally, if we set kernel = "epi", clust_cp
cluster survival functions with common change points. Also here details
can be found in Corradin et al. (2026)
Data are a matrix where each row is the number of infected at each
time. Inside this package is included the dataset
epi_synthetic_multi with multivariate synthetic epidemic
diffusions.
data("epi_synthetic_multi")
params_epi <- list(M = 100, xi = 1/8,
alpha_SM = 1,
a0 = 4,
b0 = 10,
I0_var = 0.1,
avg_blk = 2)
out <- clust_cp(epi_synthetic_multi[,10:150], n_iterations = 2000, n_burnin = 500,
L = 1, B = 1000, params = params_epi, kernel = "epi")
posterior_estimate(out, loss = "binder")
plot(out, loss = "binder")