Introduction to SNSeg and Examples

SNSeg supports change-points estimation for both univariate and multivariate time series (including high-dimensional time series with dimension greater than 10.) using Self-Normalization (SN) based framework. Please read Zhao, Jiang and Shao (2022) <doi.org/10.1111/rssb.12552> for details of the SN-based algorithms.

The package contain three functions for change-points estimation:

  • The function SNSeg_Uni works for a univariate time series.
  • The function SNSeg_Multi works for multivariate time series with dimension up to ten.
  • The function SNSeg_HD works for high-dimensional time series with dimension above ten.

All functions contain two important input arguments: grid_size_scale and grid_size.

  • grid_size_scale: the trimming parameter ϵ, a parameter to control the local window size grid_size. The range of grid_size_scale should be between 0.05 and 0.5. Any input grid_size_scale smaller than 0.05 will be automatically changed to 0.05, and any input grid_size_scale greater than 0.5 will be automatically changed to 0.5.
  • grid_size: the local window size h for local scanning for each time point. According to Zhao et al. (2022), grid_size is a product of grid_size_scale and the length of time.

Users can set their own grid_size or leave it as NULL in the input arguments. If grid_size is NULL, these functions will compute the SN test statistic using the value of grid_size_scale. If grid_size is set by users, the functions will first calculate grid_size_scale by diving the length of time, and then compute the SN test statistic.

For the other input arguments: * ts: Users should enter a time series for this argument. For SNSeg_Uni, the dimension of ts should be exactly one in most cases (or two when paras_to_test = 'bivcor'); for SNSeg_Multi, the dimension must be at least two but no more than ten; for SNSeg_HD, the dimension must be greater than ten. * paras_to_test: This argument in functions SNSeg_Uni and SNSeg_Multi allows users to enter the parameter(s) they would like to test the change in. * confidence: Users should choose a confidence level among 0.9, 0.95, 0.99, 0.995 and 0.995. A smaller confidence level is easier to reject the null hypothesis and detect a change-point.

The package also offers an option to plot the time series (this only works for the univariate time series cases!) Users need to set plot_SN = TRUE to visualize the time series, and est_cp_loc = TRUE to add the estimated change-point locations in the plot.

SN Test Statistic Plot: max_SNsweep

To visualize the computed SN-based test statistic at each time point, users can apply the function max_SNsweep by plugging in the output object from one of the functions SNSeg_Uni, SNSeg_Multi and SNSeg_HD. The options est_cp_loc = TRUE and critical_loc = TRUE are provided to draw the estimated change-point locations and the critical value threshold inside the test statistic plot.

max_SNsweep also returns the SN-based test statistic for each time point. A large number of test statistics in the output can be messy for users who only seek to generate a plot. To hide the test statistics output, users can create an arbitrary variable name and set it to the max_SNsweep function, e.g., SN_stat <- max_SNsweep(...).

Parameter Estimates of Each Segment Separated by the Detected Change-Points: SNSeg_estimate

The function SNSeg_estimate allows users to compute the parameter estimates (e.g., mean, variance, acf, quantile, etc.) of each of the segments separated by the estimated change-points. To use this function, users should use the output of the functions SNSeg_Uni, SNSeg_Multi and SNSeg_HD as the input of SNSeg_estimate.

S3 methods: summary, print and plot

The typical S3 methods summary, print and plot are available to SNSeg_Uni, SNSeg_Multi and SNSeg_HD objects. The summary method displays the parameter to be tested, the estimated change-point amount and locations, the grid_size, confidence level as well as the critical value of the SN-based test. The print method shows the change-point locations. The plot method plots the time series, and similar to the argument plot_SN = TRUE, the plot method allows users to generate time series segmentation plot(s) with the estimated change-point locations. It also provides ts_index option to allow users to plot any individual time series they want. Users can apply their preferred color of the change-point(s) within the plot(s) by setting cpts.col to any color.

We then provide the examples of the functions SNSeg_Uni, SNSeg_Multi and SNSeg_HD.

Examples of SNSeg_Uni:

The function SNSeg_Uni detect change-points for a univariate time series based on the change in a single or parameters.

  • For a univariate time series, the input argument paras_to_test allows one or a combination of multiple parameters from “mean”, “variance”, “acf” and a numeric quantile value between 0 and 1. For instance, to test the change in autocorrelation and 80th quantile, users should set paras_to_test to c(‘acf’, 0.8).
  • To test the change in correlation between two univariate time series, the input time series must be two-dimensional and paras_to_test should be set to bivcor.
  • Users can also set paras_to_test using their own parameters. In this case, the input of paras_to_test needs to be a function which returns a numeric value.

We provide examples for different cases:

# Please run the following function before running examples:
mix_GauGPD <- function(u,p,trunc_r,gpd_scale,gpd_shape){
  indicator <- u<p
  rv <- rep(0, length(u))
  rv[indicator>0] <- qtruncnorm(u[indicator>0]/p,a=-Inf,b=trunc_r)
  rv[indicator<=0] <- qgpd((u[indicator<=0]-p)/(1-p), loc=trunc_r, scale=gpd_scale,shape=gpd_shape)
  return(rv)
}

Test in a single parameter

Segmentation for Mean

set.seed(7)
n <- 2000
reptime <- 2
cp_sets <- round(n*c(0,cumsum(c(0.5,0.25)),1))
mean_shift <- c(0.4,0,0.4)
rho <- -0.7
ts <- MAR(n, reptime, rho)
no_seg <- length(cp_sets)-1
for(index in 1:no_seg){ # Mean shift
  tau1 <- cp_sets[index]+1   
  tau2 <- cp_sets[index+1]
  ts[tau1:tau2,] <- ts[tau1:tau2,] + mean_shift[index]
}
ts <- ts[,2]
# grid_size undefined
result <- SNSeg_Uni(ts, paras_to_test = "mean", confidence = 0.9,
                    grid_size_scale = 0.05, grid_size = NULL, 
                    plot_SN = FALSE, est_cp_loc = FALSE)
# grid_size defined & generate time series segmentation plot
result <- SNSeg_Uni(ts, paras_to_test = "mean", confidence = 0.9,
                    grid_size_scale = 0.05, grid_size = 116, 
                    plot_SN = TRUE, est_cp_loc = TRUE)

# Estimated change-point locations
result$est_cp
#> [1] 1018 1487
# Parameter estimates (mean) of each segment
SNSeg_estimate(result)
#> $mean
#> [1] 0.39444849 0.02107613 0.38362925
#> 
#> attr(,"class")
#> [1] "SNSeg_estimate"
# plot the SN-based test statistic
SN_stat <- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
                       critical_loc = TRUE)

We then show how to use the S3 methods summary, print and plot.

summary(result)
#> There are 2 change-point(s) detected at 90th confidence level based on the change in the single mean parameter.
#> 
#> The critical value of SN-based test is 135.39283594
#> 
#> The detected change-point location(s) are 1018,1487 with a grid_size of 116
print(result)
#> The detected change-point location(s) are 1018,1487
plot(result, cpts.col = 'red')

We can see both the plot_SN = TRUE option and the plot.SN function generates the same plot. Users can select any choice based on their preferences. The class of the argument result is SNSeg_Uni.

Segmentation for Variance

set.seed(7)
ts <- MAR_Variance(2, "V1")
ts <- ts[,2]
# grid_size defined
result <- SNSeg_Uni(ts, paras_to_test = "variance", confidence = 0.9,
                    grid_size_scale = 0.05, grid_size = NULL, 
                    plot_SN = FALSE, est_cp_loc = TRUE)
# Estimated change-point locations
result$est_cp
# Parameter estimates (variance) of each segment
SNSeg_estimate(result)
# plot the SN-based test statistic
SN_stat <- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
                       critical_loc = TRUE)
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red')

Segmentation for Autocorrelation

set.seed(7)
ts <- MAR_Variance(2, "A3")
ts <- ts[,2]
# grid_size defined
result <- SNSeg_Uni(ts, paras_to_test = "acf", confidence = 0.9,
          grid_size_scale = 0.05, grid_size = 92, plot_SN = FALSE,
          est_cp_loc = TRUE)
# Estimated change-point locations
result$est_cp
# Parameter estimates (acf) of each segment
SNSeg_estimate(result)
# plot the SN-based test statistic
SN_stat <- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
                       critical_loc = TRUE)
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red')

Segmentation for bivariate correlation

library(mvtnorm)
set.seed(7)
n <- 1000
sigma_cross <- list(4*matrix(c(1,0.8,0.8,1), nrow=2),
                    matrix(c(1,0.2,0.2,1), nrow=2),
                    matrix(c(1,0.8,0.8,1), nrow=2))
cp_sets <- round(c(0,n/3,2*n/3,n))
noCP <- length(cp_sets)-2
rho_sets <- rep(0.5, noCP+1)
ts <- MAR_MTS_Covariance(n, 2, rho_sets, cp_sets, sigma_cross)
ts <- ts[1][[1]]
# grid_size defined
result <- SNSeg_Uni(ts, paras_to_test = "bivcor", confidence = 0.9,
                    grid_size_scale = 0.05, grid_size = 77, 
                    plot_SN = FALSE, est_cp_loc = TRUE)
# Estimated change-point locations
result$est_cp
# Parameter estimates (bivariate correlation) of each segment
SNSeg_estimate(result)
# plot the SN-based test statistic
SN_stat <- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
                       critical_loc = TRUE)
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red')

Segmentation for quantile

Please download the packages truncnorm and evd before running the following code.

library(truncnorm)
library(evd)
set.seed(7)
n <- 1000
cp_sets <- c(0,n/2,n)
noCP <- length(cp_sets)-2
reptime <- 2
rho <- 0.2
# AR time series with no change-point (mean, var)=(0,1)
ts <- MAR(n, reptime, rho)*sqrt(1-rho^2)
trunc_r <- 0
p <- pnorm(trunc_r)
gpd_scale <- 2
gpd_shape <- 0.125
for(ts_index in 1:reptime){
  ts[(cp_sets[2]+1):n, ts_index] <- mix_GauGPD(pnorm(ts[(cp_sets[2]+1):n, ts_index]),
p,trunc_r,gpd_scale,gpd_shape)
}
ts <- ts[,2]
# grid_size undefined
# test in 90% quantile
result <- SNSeg_Uni(ts, paras_to_test = 0.9, confidence = 0.9,
                    grid_size_scale = 0.066, grid_size = NULL,
                    plot_SN = FALSE, est_cp_loc = TRUE)
# Estimated change-point locations
result$est_cp
# Parameter estimates (90th quantile) of each segment
SNSeg_estimate(result)
# plot the SN-based test statistic
SN_stat <- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
                       critical_loc = TRUE)
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red')

Test in a general functional

To perform the SN-based test using any general functional, users should define their own function as the input of paras_to_test. The input of this function should consist of the followings:

  • ts: A univariate time series provided by the user.

The user-defined function should return a numeric value as the output.

set.seed(7)
n <- 500
reptime <- 2
cp_sets <- round(n*c(0,cumsum(c(0.5,0.25)),1))
mean_shift <- c(0.4,0,0.4)
rho <- -0.7
ts <- MAR(n, reptime, rho)
no_seg <- length(cp_sets)-1
for(index in 1:no_seg){ # Mean shift
  tau1 <- cp_sets[index]+1
  tau2 <- cp_sets[index+1]
  ts[tau1:tau2,] <- ts[tau1:tau2,] + mean_shift[index]
}
ts <- ts[,2]
# Define a general functional for the input 'paras_to_test'
paras_to_test = function(ts){
  mean(ts)
}
result.SNCP.general <- SNSeg_Uni(ts, paras_to_test = paras_to_test, 
                                 confidence = 0.9, grid_size_scale = 0.05, 
                                 grid_size = NULL, plot_SN = FALSE, 
                                 est_cp_loc = TRUE)
# Estimated change-point locations
result.SNCP.general$est_cp
# Parameter estimates (general functional) of each segment
SNSeg_estimate(result.SNCP.general)
# plot the SN-based test statistic
SN_stat <- max_SNsweep(result.SNCP.general, plot_SN = TRUE, est_cp_loc = TRUE,
                       critical_loc = TRUE)
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red')

Test in multiple parameters

set.seed(7)
n <- 1000
cp_sets <- c(0,333,667,1000)
no_seg <- length(cp_sets)-1
rho <- 0
# AR time series with no change-point (mean, var)=(0,1)
ts <- MAR(n, 2, rho)*sqrt(1-rho^2)
no_seg <- length(cp_sets)-1
sd_shift <- c(1,1.6,1)
for(index in 1:no_seg){ # Mean shift
  tau1 <- cp_sets[index]+1
  tau2 <- cp_sets[index+1]
  ts[tau1:tau2,] <- ts[tau1:tau2,]*sd_shift[index]
}
d <- 2
ts <- ts[,2]

# Test in 90th and 95th quantile with grid_size undefined
result <- SNSeg_Uni(ts, paras_to_test = c(0.9, 0.95), confidence = 0.9,
                    grid_size_scale = 0.05, grid_size = NULL, 
                    plot_SN = FALSE, est_cp_loc = FALSE)

# Test in 90th quantile and the variance with grid_size undefined
result <- SNSeg_Uni(ts, paras_to_test = c(0.9, 'variance'),
                    confidence = 0.95, grid_size_scale = 0.078,
                    grid_size = NULL, plot_SN = FALSE, 
                    est_cp_loc = FALSE)

# Test in 90th quantile, variance and acf with grid_size undefined
result <- SNSeg_Uni(ts, paras_to_test = c(0.9,'variance', "acf"),
                    confidence = 0.9, grid_size_scale = 0.064,
                    grid_size = NULL, plot_SN = TRUE, 
                    est_cp_loc = TRUE)
# Estimated change-point locations
result$est_cp
# Test in 60th quantile, mean, variance and acf with grid_size defined
result.last <- SNSeg_Uni(ts, paras_to_test = c(0.6, 'mean', "variance",   
                    "acf"), confidence = 0.9, grid_size_scale = 0.05,
                    grid_size = 83, plot_SN = FALSE, est_cp_loc = TRUE)
# Estimated change-point locations
result.last$est_cp
# Parameter estimates (of the last example) of each segment
SNSeg_estimate(result.last)
# SN-based test statistic segmentation plot
SN_stat <- max_SNsweep(result.last, plot_SN = TRUE, est_cp_loc = TRUE,
                       critical_loc = TRUE)
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red')

Examples: SNSeg_Multi

The function SNSeg_Multi detects change-points for multivariate time series based on the change in multivariate means or covariances. Users can set paras_to_test to mean or covariance for change-points estimation.

  • The dimension of parameters to be tested must be at most ten. If the parameter is mean, the dimension is equivalent to the number of time series. If the parameter is covariance, the dimension is equivalent to the number of unknown parameters within the covariance matrix.

For examples of multivariate means and covariances, please check the following commands:

# Please run this function before running examples:
exchange_cor_matrix <- function(d, rho){
  tmp <- matrix(rho, d, d)
  diag(tmp) <- 1
  return(tmp)
}

Segmentation for Multivariate Mean

library(mvtnorm)
set.seed(10)
d <- 5
n <- 1000
cp_sets <- round(n*c(0,cumsum(c(0.075,0.3,0.05,0.1,0.05)),1))
mean_shift <- c(-3,0,3,0,-3,0)/sqrt(d)
mean_shift <- sign(mean_shift)*ceiling(abs(mean_shift)*10)/10
rho_sets <- 0.5
sigma_cross <- list(exchange_cor_matrix(d,0))
ts <- MAR_MTS_Covariance(n, 2, rho_sets, cp_sets=c(0,n), sigma_cross)
noCP <- length(cp_sets)-2
no_seg <- length(cp_sets)-1
for(rep_index in 1:2){
  for(index in 1:no_seg){ # Mean shift
    tau1 <- cp_sets[index]+1
    tau2 <- cp_sets[index+1]
    ts[[rep_index]][,tau1:tau2] <- ts[[rep_index]][,tau1:tau2] + mean_shift[index]
    }
}
ts <- ts[1][[1]]

# grid_size undefined
result <- SNSeg_Multi(ts, paras_to_test = "mean", confidence = 0.95,
                      grid_size_scale = 0.079, grid_size = NULL,
                      plot_SN = FALSE, est_cp_loc = TRUE)
# grid_size defined
result <- SNSeg_Multi(ts, paras_to_test = "mean", confidence = 0.99,
                      grid_size_scale = 0.05, grid_size = 65,
                      plot_SN = FALSE, est_cp_loc = TRUE)
# Estimated change-point locations
result$est_cp
# Parameter estimates (multivariate mean) of each segment
SNSeg_estimate(result)
# SN-based test statistic segmentation plot
SN_stat <- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
                       critical_loc = TRUE)
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
par(mfrow=c(2,3))
plot(result, cpts.col = 'red')

Segmentation for Multivariate Covariance

library(mvtnorm)
set.seed(10)
reptime <- 2
d <- 4
n <- 1000
sigma_cross <- list(exchange_cor_matrix(d,0.2),
                    2*exchange_cor_matrix(d,0.5),
                    4*exchange_cor_matrix(d,0.5))
rho_sets <- c(0.3,0.3,0.3)
mean_shift <- c(0,0,0) # with mean change
cp_sets <- round(c(0,n/3,2*n/3,n))
ts <- MAR_MTS_Covariance(n, reptime, rho_sets, cp_sets, sigma_cross)
noCP <- length(cp_sets)-2
no_seg <- length(cp_sets)-1
for(rep_index in 1:reptime){
  for(index in 1:no_seg){ # Mean shift
    tau1 <- cp_sets[index]+1
    tau2 <- cp_sets[index+1]
    ts[[rep_index]][,tau1:tau2] <- ts[[rep_index]][,tau1:tau2] +
      mean_shift[index]
    }
}
ts <- ts[[1]]

# grid_size undefined
result <- SNSeg_Multi(ts, paras_to_test = "covariance", 
                      confidence = 0.9, grid_size_scale = 0.05, 
                      grid_size = NULL, plot_SN = FALSE, est_cp_loc = FALSE)
# grid_size defined
result <- SNSeg_Multi(ts, paras_to_test = "covariance", 
                      confidence = 0.9, grid_size_scale = 0.05, 
                      grid_size = 81, plot_SN = FALSE,
                      est_cp_loc = TRUE)
# Estimated change-point locations
result$est_cp
# Parameter estimates (covariance estimate) of each segment
SNSeg_estimate(result)
# SN-based test statistic segmentation plot
SN_stat <- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
                       critical_loc = TRUE)
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
par(mfrow=c(1,1))
plot(result, cpts.col = 'red')

Examples: SNSeg_HD

The function SNSeg_HD performs change-points estimation for high-dimensional time series (dimension greater than 10) using the change in high-dimensional means. For usage examples of SNSeg_HD, we provide the followig code:

n <- 600
p <- 100
nocp <- 5
cp_sets <- round(seq(0,nocp+1,1)/(nocp+1)*n)
num_entry <- 5
kappa <- sqrt(4/5) # Wang et al(2020)
mean_shift <- rep(c(0,kappa),100)[1:(length(cp_sets)-1)]
set.seed(1)
ts <- matrix(rnorm(n*p,0,1),n,p)
no_seg <- length(cp_sets)-1
for(index in 1:no_seg){ # Mean shift
  tau1 <- cp_sets[index]+1
  tau2 <- cp_sets[index+1]
  ts[tau1:tau2,1:num_entry] <- ts[tau1:tau2,1:num_entry] +
    mean_shift[index] # sparse change
}
# SN segmentation plot 
# grid_size undefined (plot the first 3 time series)
result <- SNSeg_HD(ts, confidence = 0.9, grid_size_scale = 0.05,
                   grid_size = NULL, plot_SN = FALSE, est_cp_loc = TRUE,
                   ts_index = c(1:3))
# grid_size defined (plot the 1st, 3rd and 5th time series)
result <- SNSeg_HD(ts, confidence = 0.9, grid_size_scale  = 0.05,
                   grid_size = 52, plot_SN = FALSE, est_cp_loc = TRUE,
                   ts_index = c(1,3,5))
# Estimated change-point locations
result$est_cp
# Parameter estimates (high-dimensional means) of each segment
summary.stat <- SNSeg_estimate(result)
# SN-based test statistic segmentation plot
SN_stat <- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
                       critical_loc = TRUE)
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red', ts_index = c(1:3))

We note that only the selected time series were plotted by setting values for ts_index.