Title: | A Collection of Change-Point Detection Methods |
---|---|
Description: | Performs a series of offline and/or online change-point detection algorithms for 1) univariate mean: <doi:10.1214/20-EJS1710>, <arXiv:2006.03283>; 2) univariate polynomials: <doi:10.1214/21-EJS1963>; 3) univariate and multivariate nonparametric settings: <doi:10.1214/21-EJS1809>, <doi:10.1109/TIT.2021.3130330>; 4) high-dimensional covariances: <doi:10.3150/20-BEJ1249>; 5) high-dimensional networks with and without missing values: <doi:10.1214/20-AOS1953>, <arXiv:2101.05477>, <arXiv:2110.06450>; 6) high-dimensional linear regression models: <arXiv:2010.10410>, <arXiv:2207.12453>; 7) high-dimensional vector autoregressive models: <arXiv:1909.06359>; 8) high-dimensional self exciting point processes: <arXiv:2006.03572>; 9) dependent dynamic nonparametric random dot product graphs: <arXiv:1911.07494>; 10) univariate mean against adversarial attacks: <arXiv:2105.10417>. |
Authors: | Haotian Xu [aut, cre], Oscar Padilla [aut], Daren Wang [aut], Mengchu Li [aut], Qin Wen [ctb] |
Maintainer: | Haotian Xu <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.1.0 |
Built: | 2024-11-07 06:47:23 UTC |
Source: | CRAN |
Perform the adversarially robust change point detection method with automatic selection of the contamination proportion epsilon when treating the inlier distributions as Gaussian.
aARC(y, t_dat, guess_true = 0.05, h, block_num = 1)
y |
A |
t_dat |
A |
guess_true |
A |
h |
An |
block_num |
An |
A numeric
vector of estimated change point locations.
Mengchu Li
Li and Yu (2021) <arXiv:2105.10417>.
### simulate data with contamination
obs_num = 1000
D = 2
noise = 0.1 # proportion of contamination
mu0 = 0
mu1 = 1
sd = 1
idmixture = rbinom(obs_num/D, 1, 1-noise)
dat = NULL
for (j in 1:D){
  for (i in 1:(obs_num/(2*D))){
    if (idmixture[i] == 1){
      dat = c(dat, rnorm(1, mu0, sd))
    } else{
      dat = c(dat, rnorm(1, mu1/(2*noise), 0))
    }
  }
  for (i in (obs_num/(2*D)+1):(obs_num/D)){
    if (idmixture[i] == 1){
      dat = c(dat, rnorm(1, mu1, sd))
    } else{
      dat = c(dat, rnorm(1, mu1/(2*noise)-(1-noise)*mu1/noise, 0))
    }
  }
}
plot(dat)
### perform aARC
aARC(dat, dat[1:200], h = 120)
Perform the adversarially robust change point detection method.
ARC(y, h, block_num = 1, epsilon, gaussian = TRUE)
y |
A |
h |
An |
block_num |
An |
epsilon |
A |
gaussian |
A |
A numeric
vector of estimated change point locations.
Mengchu Li
Li and Yu (2021) <arXiv:2105.10417>.
### simulate data with contamination
obs_num = 1000
D = 2
noise = 0.1 # proportion of contamination
mu0 = 0
mu1 = 1
sd = 1
idmixture = rbinom(obs_num/D, 1, 1-noise)
dat = NULL
for (j in 1:D){
  for (i in 1:(obs_num/(2*D))){
    if (idmixture[i] == 1){
      dat = c(dat, rnorm(1, mu0, sd))
    } else{
      dat = c(dat, rnorm(1, mu1/(2*noise), 0))
    }
  }
  for (i in (obs_num/(2*D)+1):(obs_num/D)){
    if (idmixture[i] == 1){
      dat = c(dat, rnorm(1, mu1, sd))
    } else{
      dat = c(dat, rnorm(1, mu1/(2*noise)-(1-noise)*mu1/noise, 0))
    }
  }
}
plot(dat)
### perform ARC
ARC(dat, h = 120, epsilon = 0.1)
Perform the backward detection method with a robust bootstrap change point test using U-statistics on univariate data.
BD_U(y, M, B = 100, inter = NULL, inter_ini = TRUE)
y |
A |
M |
An |
B |
An |
inter |
A nuisance parameter. |
inter_ini |
A nuisance parameter. |
A numeric
vector of estimated change point locations.
Mengchu Li
Yu and Chen (2019) <arXiv:1904.03372>.
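This entry ships without an example. The sketch below is illustrative only: it assumes M is an integer tuning parameter of the backward detection procedure (its exact role is not documented in this truncated entry), and the chosen values are arbitrary.
set.seed(0)
y = rnorm(300) + c(rep(0, 100), rep(2, 100), rep(0, 100)) # univariate data with two mean changes
# hedged call: the M and B values below are illustrative, not package defaults
BD_U(y, M = 20, B = 50)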
Perform binary segmentation for covariance change point detection through the operator norm.
BS.cov(X, s, e, level = 0)
X |
A |
s |
A |
e |
A |
level |
A parameter for tracking the level at which a change point is detected. Should be fixed as 0. |
An object of class
"BS", which is a list
with the following structure:
S: A vector of estimated changepoints (sorted in strictly increasing order).
Dval: A vector of values of CUSUM statistic based on KS distance.
Level: A vector representing the levels at which each change point is detected.
Parent: A matrix with the starting indices on the first row and the ending indices on the second row.
Haotian Xu
Wang, Yu and Rinaldo (2021) <doi:10.3150/20-BEJ1249>.
thresholdBS
for obtaining change point estimates.
p = 10
A1 = gen.cov.mat(p, 1, "equal")
A2 = gen.cov.mat(p, 2, "diagonal")
A3 = gen.cov.mat(p, 3, "power")
X = cbind(t(MASS::mvrnorm(100, mu = rep(0, p), A1)),
          t(MASS::mvrnorm(150, mu = rep(0, p), A2)),
          t(MASS::mvrnorm(200, mu = rep(0, p), A3)))
temp = BS.cov(X, 1, 450)
thresholdBS(temp, 10)
Perform standard binary segmentation for univariate nonparametric change points detection.
BS.uni.nonpar(Y, s, e, N, delta, level = 0)
Y |
A |
s |
A |
e |
A |
N |
A |
delta |
A positive |
level |
Should be fixed as 0. |
An object of class
"BS", which is a list
with the following structure:
S |
A vector of estimated change points (sorted in strictly increasing order). |
Dval |
A vector of values of CUSUM statistic based on KS distance. |
Level |
A vector representing the levels at which each change point is detected. |
Parent |
A matrix with the starting indices on the first row and the ending indices on the second row. |
Oscar Hernan Madrid Padilla & Haotian Xu
Padilla, Yu, Wang and Rinaldo (2021) <doi:10.1214/21-EJS1809>.
thresholdBS
for obtaining change point estimates, tuneBSuninonpar
for a tuning version.
Y = t(as.matrix(c(rnorm(100, 0, 1), rnorm(100, 0, 10), rnorm(100, 0, 40))))
N = rep(1, 300)
temp = BS.uni.nonpar(Y, 1, 300, N, 5)
plot.ts(t(Y))
points(x = tail(temp$S[order(temp$Dval)], 4),
       y = Y[, tail(temp$S[order(temp$Dval)], 4)], col = "red")
Perform standard binary segmentation for univariate mean change points detection.
BS.univar(y, s, e, delta = 2, level = 0)
y |
A |
s |
A |
e |
A |
delta |
A positive |
level |
Should be fixed as 0. |
An object of class
"BS", which is a list
with the following structure:
S |
A vector of estimated change point locations (sorted in strictly increasing order). |
Dval |
A vector of values of CUSUM statistic. |
Level |
A vector representing the levels at which each change point is detected. |
Parent |
A matrix with the starting indices on the first row and the ending indices on the second row. |
Haotian Xu
Wang, Yu and Rinaldo (2020) <doi:10.1214/20-EJS1710>.
thresholdBS
for obtaining change point estimates, tuneBSunivar
for a tuning version.
set.seed(0)
cpt_true = c(20, 50, 170)
y = rnorm(300) + c(rep(0,20),rep(2,30),rep(0,120),rep(2,130))
temp = BS.univar(y, 1, length(y), delta = 5)
plot.ts(y)
points(x = tail(temp$S[order(temp$Dval)], 4),
       y = y[tail(temp$S[order(temp$Dval)], 4)], col = "red")
BS_result = thresholdBS(temp, tau = 4)
BS_result
print(BS_result$BS_tree, "value")
print(BS_result$BS_tree_trimmed, "value")
cpt_hat = sort(BS_result$cpt_hat[,1]) # the threshold tau is specified to be 4
Hausdorff.dist(cpt_hat, cpt_true)
cpt_LR = local.refine.univar(cpt_hat, y)
Hausdorff.dist(cpt_LR, cpt_true)
Calibration step for online change point detection for network data, controlling the false alarm rate at level alpha.
calibrate.online.network.missing(train_miss_list, train_eta_list, threshold_len, alpha_grid, permu_num, pi_lb_hat, pi_ub_hat, rho_hat, rank_hat, C_lambda = 2/3, delta = 5)
train_miss_list |
A |
train_eta_list |
A |
threshold_len |
An |
alpha_grid |
A |
permu_num |
An |
pi_lb_hat |
A |
pi_ub_hat |
A |
rho_hat |
A |
rank_hat |
An |
C_lambda |
A |
delta |
An |
A list
with the following structure:
C_lambda |
The absolute constant |
rho_hat |
The (estimated) sparsity parameter |
rank_hat |
The (estimated) rank of the underlying graphon matrix |
pi_lb_hat |
The (estimated) lower bound of the missing probability |
pi_ub_hat |
The (estimated) upper bound of the missing probability |
thresholds_array |
A |
Haotian Xu
Dubey, Xu and Yu (2021) <arXiv:2110.06450>.
online.network.missing
for online change point detection.
p = 6 # number of nodes
rho = 0.5 # sparsity parameter
block_num = 3 # number of groups for SBM
train_obs_num = 150 # sample size for each segment
conn1_mat = rho * matrix(c(0.6,1,0.6,1,0.6,0.5,0.6,0.5,0.6), nrow = 3) # connectivity matrix
set.seed(1)
can_vec = sample(1:p, replace = FALSE) # randomly assign nodes into groups
sbm = simu.SBM(conn1_mat, can_vec, train_obs_num, symm = TRUE, self = TRUE)
train_mat = sbm$obs_mat
train_list = lapply(1:ncol(train_mat), function(t) lowertri2mat(train_mat[,t], p, diag = TRUE))
pi_mat = matrix(0.9, p, p)
train_eta_list = lapply(1:length(train_list), function(t) gen.missing(pi_mat, symm = TRUE))
train_miss_list = lapply(1:length(train_list), function(t) train_eta_list[[t]] * train_list[[t]])
pi_lb_hat = quantile(Reduce("+", train_eta_list)/train_obs_num, 0.05) # estimator of pi_lb
pi_ub_hat = quantile(Reduce("+", train_eta_list)/train_obs_num, 0.95) # estimator of pi_ub
C_lambda = 2/3
lambda = lambda.network.missing(1, length(train_miss_list), length(train_miss_list), 0.05,
                                rho = 0.509, pi_ub = pi_ub_hat, p, C_lambda)
graphon_miss_impute = softImpute.network.missing(train_miss_list, train_eta_list, lambda, 1)
graphon_miss_hat = graphon_miss_impute$u %*% diag(as.numeric(graphon_miss_impute$d)) %*%
                   t(graphon_miss_impute$v)
rho_hat = quantile(graphon_miss_hat, 0.95)
rank_hat = sum(graphon_miss_impute$d != 0)
alpha_grid = c(0.05)
permu_num = 10
threshold_len = 30
temp = calibrate.online.network.missing(train_miss_list, train_eta_list, threshold_len,
                                        alpha_grid, permu_num, pi_lb_hat, pi_ub_hat,
                                        rho_hat, rank_hat, C_lambda, delta = 5)
Performs a series of offline and/or online change-point detection algorithms for 1) univariate mean; 2) univariate polynomials; 3) univariate and multivariate nonparametric settings; 4) high-dimensional covariances; 5) high-dimensional networks with and without missing values; 6) high-dimensional linear regression models; 7) high-dimensional vector autoregressive models; 8) high-dimensional self exciting point processes; 9) dependent dynamic nonparametric random dot product graphs; 10) univariate mean against adversarial attacks. For more information, see Wang et al. (2020) <arXiv:1810.09498>; Yu et al. (2020) <arXiv:2006.03283>; Yu and Chatterjee (2020) <arXiv:2007.09910>; Padilla et al. (2021) <arXiv:1905.10019>; Padilla et al. (2019) <arXiv:1910.13289>; Wang et al. (2021) <arXiv:1712.09912>; Wang et al. (2018) <arXiv:1809.09602>; Padilla et al. (2019) <arXiv:1911.07494>; Yu et al. (2021) <arXiv:2101.05477>; Rinaldo et al. (2020) <arXiv:2010.10410>; Wang et al. (2019) <arXiv:1909.06359>; Wang et al. (2020) <arXiv:2006.03572>; Dubey et al. (2021) <arXiv:2110.06450>; Li and Yu (2021) <arXiv:2105.10417>.
Construct element-wise confidence intervals for change points.
CI.regression(cpt_init, cpt_LR, beta_hat, y, X, w = 0.9, B = 1000, M, alpha_vec, rounding = TRUE)
cpt_init |
An |
cpt_LR |
An |
beta_hat |
A |
y |
A |
X |
A |
w |
A |
B |
An |
M |
An |
alpha_vec |
An |
rounding |
A |
A length(cpt_init)-by-2-by-length(alpha_vec) array of confidence intervals.
Haotian Xu
Xu, Wang, Zhao and Yu (2022) <arXiv:2207.12453>.
d0 = 5
p = 10
n = 200
cpt_true = c(70, 140)
data = simu.change.regression(d0, cpt_true, p, n, sigma = 1, kappa = 9)
lambda_set = c(0.1, 0.5, 1, 2)
zeta_set = c(10, 15, 20)
temp = CV.search.DPDU.regression(y = data$y, X = data$X, lambda_set, zeta_set)
temp$test_error # test error result
# find the indices of lambda_set and zeta_set which minimize the test error
min_idx = as.vector(arrayInd(which.min(temp$test_error), dim(temp$test_error)))
lambda_set[min_idx[2]]
zeta_set[min_idx[1]]
cpt_init = unlist(temp$cpt_hat[min_idx[1], min_idx[2]])
beta_hat = matrix(unlist(temp$beta_hat[min_idx[1], min_idx[2]]), ncol = length(cpt_init)+1)
cpt_LR = local.refine.DPDU.regression(cpt_init, beta_hat, data$y, data$X, w = 0.9)
alpha_vec = c(0.01, 0.05, 0.1)
CI.regression(cpt_init, cpt_LR, beta_hat, data$y, data$X, w = 0.9, B = 1000, M = n, alpha_vec)
Perform grid search based on cross-validation over all tuning parameters (gamma, lambda and zeta).
CV.search.DP.LR.regression(y, X, gamma_set, lambda_set, zeta_set, delta, eps = 0.001)
y |
A |
X |
A |
gamma_set |
A |
lambda_set |
A |
zeta_set |
A |
delta |
A strictly positive |
eps |
A |
A list
with the following structure:
cpt_hat |
A list of vectors of estimated change points (sorted in strictly increasing order) |
K_hat |
A list of scalars of the number of estimated change points |
test_error |
A list of vectors of testing errors (each row corresponding to each gamma, and each column corresponding to each lambda) |
train_error |
A list of vectors of training errors |
Daren Wang & Haotian Xu
Rinaldo, Wang, Wen, Willett and Yu (2020) <arXiv:2010.10410>.
set.seed(123)
d0 = 8
p = 15
n = 100
cpt_true = c(30, 70)
data = simu.change.regression(d0, cpt_true, p, n, sigma = 1, kappa = 9)
gamma_set = c(0.01, 0.1)
lambda_set = c(0.01, 0.1)
temp = CV.search.DP.regression(y = data$y, X = data$X, gamma_set, lambda_set, delta = 2)
temp$test_error # test error result
# find the indices of gamma_set and lambda_set which minimize the test error
min_idx = as.vector(arrayInd(which.min(temp$test_error), dim(temp$test_error)))
gamma_set[min_idx[1]]
lambda_set[min_idx[2]]
cpt_init = unlist(temp$cpt_hat[min_idx[1], min_idx[2]])
zeta_set = c(0.1, 1)
temp_zeta = CV.search.DP.LR.regression(data$y, data$X, gamma_set[min_idx[1]],
                                       lambda_set[min_idx[2]], zeta_set, delta = 2, eps = 0.001)
min_zeta_idx = which.min(unlist(temp_zeta$test_error))
cpt_LR = local.refine.regression(cpt_init, data$y, X = data$X, zeta = zeta_set[min_zeta_idx])
Hausdorff.dist(cpt_init, cpt_true)
Hausdorff.dist(cpt_LR, cpt_true)
Perform grid search for dynamic programming to select the tuning parameter through cross-validation.
CV.search.DP.poly(y, r, gamma_set, delta)
y |
A |
r |
An |
gamma_set |
A |
delta |
A positive |
A list
with the following structure:
cpt_hat |
A list of vectors of estimated change point locations (sorted in strictly increasing order) |
K_hat |
A list of scalars of the number of estimated change points |
test_error |
A list of vectors of testing errors |
train_error |
A list of vectors of training errors |
Haotian Xu
Yu and Chatterjee (2020) <arXiv:2007.09910>
set.seed(0)
cpt_true = c(20, 50, 170)
y = rnorm(300) + c(rep(0,20),rep(2,30),rep(0,120),rep(2,130))
plot.ts(y)
gamma_set = 3:9
DP_result = CV.search.DP.poly(y, r = 2, gamma_set, delta = 5)
min_idx = which.min(DP_result$test_error)
cpt_init = unlist(DP_result$cpt_hat[min_idx])
local.refine.poly(cpt_init, y, r = 2, delta_lr = 5)
Perform grid search to select tuning parameters gamma (for the penalty of DP) and lambda (for the lasso penalty) based on cross-validation.
CV.search.DP.regression(y, X, gamma_set, lambda_set, delta, eps = 0.001)
y |
A |
X |
A |
gamma_set |
A |
lambda_set |
A |
delta |
A strictly positive |
eps |
A |
A list
with the following structure:
cpt_hat |
A list of vectors of estimated change points |
K_hat |
A list of scalars of the number of estimated change points |
test_error |
A list of vectors of testing errors (each row corresponding to each gamma, and each column corresponding to each lambda) |
train_error |
A list of vectors of training errors |
Daren Wang
Rinaldo, Wang, Wen, Willett and Yu (2020) <arXiv:2010.10410>.
d0 = 10
p = 20
n = 100
cpt_true = c(30, 70)
data = simu.change.regression(d0, cpt_true, p, n, sigma = 1, kappa = 9)
gamma_set = c(0.01, 0.1, 1)
lambda_set = c(0.01, 0.1, 1, 3)
temp = CV.search.DP.regression(y = data$y, X = data$X, gamma_set, lambda_set, delta = 2)
temp$test_error # test error result
# find the indices of gamma_set and lambda_set which minimize the test error
min_idx = as.vector(arrayInd(which.min(temp$test_error), dim(temp$test_error)))
gamma_set[min_idx[1]]
lambda_set[min_idx[2]]
cpt_init = unlist(temp$cpt_hat[min_idx[1], min_idx[2]])
Perform grid search for dynamic programming to select the tuning parameter through cross-validation.
CV.search.DP.univar(y, gamma_set, delta)
y |
A |
gamma_set |
A |
delta |
A positive |
A list
with the following structure:
cpt_hat |
A list of vectors of estimated change points (sorted in strictly increasing order). |
K_hat |
A list of scalars of the number of estimated change points. |
test_error |
A list of vectors of testing errors. |
train_error |
A list of vectors of training errors. |
Daren Wang & Haotian Xu
Wang, Yu and Rinaldo (2020) <doi:10.1214/20-EJS1710>
set.seed(0)
cpt_true = c(20, 50, 170)
y = rnorm(300) + c(rep(0,20),rep(2,30),rep(0,120),rep(2,130))
gamma_set = 1:5
DP_result = CV.search.DP.univar(y, gamma_set, delta = 5)
min_idx = which.min(DP_result$test_error)
cpt_hat = unlist(DP_result$cpt_hat[min_idx])
Hausdorff.dist(cpt_hat, cpt_true)
Perform grid search based on cross-validation of dynamic programming for VAR change point detection.
CV.search.DP.VAR1(DATA, gamma_set, lambda_set, delta, eps = 0.001)
DATA |
A |
gamma_set |
A |
lambda_set |
A |
delta |
A strictly |
eps |
A |
A list
with the following structure:
cpt_hat |
A list of vectors of estimated change points |
K_hat |
A list of scalars of the number of estimated change points |
test_error |
A list of vectors of testing errors (each row corresponding to each gamma, and each column corresponding to each lambda) |
train_error |
A list of vectors of training errors |
Daren Wang & Haotian Xu
Wang, Yu, Rinaldo and Willett (2019) <arXiv:1909.06359>.
set.seed(123)
p = 10
sigma = 1
n = 20
v1 = 2*(seq(1,p,1)%%2) - 1
v2 = -v1
AA = matrix(0, nrow = p, ncol = p-2)
A1 = cbind(v1,v2,AA)*0.1
A2 = cbind(v2,v1,AA)*0.1
A3 = A1
cpt_true = c(40, 80)
data = simu.VAR1(sigma, p, 2*n+1, A1)
data = cbind(data, simu.VAR1(sigma, p, 2*n, A2, vzero = c(data[,ncol(data)])))
data = cbind(data, simu.VAR1(sigma, p, 2*n, A3, vzero = c(data[,ncol(data)])))
gamma_set = c(0.1, 0.5, 1)
lambda_set = c(0.1, 1, 3.2)
temp = CV.search.DP.VAR1(data, gamma_set, lambda_set, delta = 5)
temp$test_error # test error result
# find the indices of gamma_set and lambda_set which minimize the test error
min_idx = as.vector(arrayInd(which.min(temp$test_error), dim(temp$test_error)))
cpt_init = unlist(temp$cpt_hat[min_idx[1], min_idx[2]])
Hausdorff.dist(cpt_init, cpt_true)
Perform grid search to select tuning parameters lambda (for the lasso penalty) and zeta based on cross-validation.
CV.search.DPDU.regression(y, X, lambda_set, zeta_set, eps = 0.001)
y |
A |
X |
A |
lambda_set |
A |
zeta_set |
An |
eps |
A |
A list
with the following structure:
cpt_hat |
A list of vectors of estimated change points |
K_hat |
A list of scalars of the number of estimated change points |
test_error |
A matrix of testing errors (each row corresponding to a zeta and each column to a lambda) |
train_error |
A matrix of training errors |
beta_hat |
A list of matrices of estimated regression coefficients |
Haotian Xu
Xu, Wang, Zhao and Yu (2022) <arXiv:2207.12453>.
d0 = 5
p = 30
n = 200
cpt_true = 100
data = simu.change.regression(d0, cpt_true, p, n, sigma = 1, kappa = 9)
lambda_set = c(0.01, 0.1, 1, 2)
zeta_set = c(10, 15, 20)
temp = CV.search.DPDU.regression(y = data$y, X = data$X, lambda_set, zeta_set)
temp$test_error # test error result
# find the indices of lambda_set and zeta_set which minimize the test error
min_idx = as.vector(arrayInd(which.min(temp$test_error), dim(temp$test_error)))
lambda_set[min_idx[2]]
zeta_set[min_idx[1]]
cpt_init = unlist(temp$cpt_hat[min_idx[1], min_idx[2]])
beta_hat = matrix(unlist(temp$beta_hat[min_idx[1], min_idx[2]]), ncol = length(cpt_init)+1)
Perform dynamic programming algorithm for univariate polynomials change points detection.
DP.poly(y, r, gamma, delta)
y |
A |
r |
An |
gamma |
A |
delta |
A strictly |
An object of class
"DP", which is a list
with the following structure:
partition |
A vector of the best partition. |
yhat |
A vector of estimated means corresponding to the best partition. |
cpt |
A vector of estimated change points. |
Haotian Xu
Yu and Chatterjee (2020) <arXiv:2007.09910>
set.seed(0)
cpt_true = c(20, 50, 170)
y = rnorm(300) + c(rep(0,20),rep(2,30),rep(0,120),rep(2,130))
plot.ts(y)
temp = DP.poly(y, r = 2, gamma = 15, delta = 5)
temp$cpt
Perform a dynamic programming algorithm for regression change point localisation.
DP.regression(y, X, gamma, lambda, delta, eps = 0.001)
y |
A |
X |
A |
gamma |
A positive |
lambda |
A positive |
delta |
A positive |
eps |
A |
An object of class
"DP", which is a list
with the following structure:
partition |
A vector of the best partition. |
cpt |
A vector of estimated change points. |
Daren Wang & Haotian Xu
Rinaldo, Wang, Wen, Willett and Yu (2020) <arXiv:2010.10410>.
d0 = 10
p = 20
n = 100
cpt_true = c(30, 70)
data = simu.change.regression(d0, cpt_true, p, n, sigma = 1, kappa = 9)
temp = DP.regression(y = data$y, X = data$X, gamma = 2, lambda = 1, delta = 5)
cpt_hat = temp$cpt
Perform dynamic programming for SEPP change point detection.
DP.SEPP(DATA, gamma, lambda, delta, delta2, intercept, threshold)
DATA |
A |
gamma |
A |
lambda |
A |
delta |
An |
delta2 |
An |
intercept |
A |
threshold |
A |
An object of class
"DP", which is a list
with the following structure:
partition |
A vector of the best partition. |
cpt |
A vector of estimated change points. |
Daren Wang & Haotian Xu
Wang, Yu and Willett (2020) <arXiv:2006.03572>.
p = 8 # dimension
n = 15
s = 3 # s is sparsity
factor = 0.2 # large factor gives exact recovery
threshold = 4 # thresholding makes the process stable
intercept = 1/2 # intercept of the model. Assume to be known as in the existing literature
A1 = A2 = A3 = matrix(0, p, p)
diag(A1[,-1]) = 1
diag(A1) = 1
diag(A1[-1,]) = -1
A1 = A1*factor
A1[(s+1):p, (s+1):p] = 0
diag(A2[,-1]) = 1
diag(A2) = -1
diag(A2[-1,]) = 1
A2 = A2*factor
A2[(s+1):p, (s+1):p] = 0
data1 = simu.SEPP(intercept, n, A1, threshold, vzero = NULL)
data2 = simu.SEPP(intercept, n, A2, threshold, vzero = data1[,n])
data = cbind(data1, data2)
gamma = 0.1
delta = 0.5*n
delta2 = 1.5*n
intercept = 1/2
threshold = 6
DP_result = DP.SEPP(data, gamma = gamma, lambda = 0.03, delta, delta2, intercept, threshold)
cpt_hat = DP_result$cpt
Perform dynamic programming for univariate mean change point detection.
DP.univar(y, gamma, delta)
y |
A |
gamma |
A |
delta |
A positive |
An object of class
"DP", which is a list
with the following structure:
partition |
A vector of the best partition. |
yhat |
A vector of estimated means corresponding to the best partition. |
cpt |
A vector of estimated change points. |
Haotian Xu
Wang, Yu and Rinaldo (2020) <doi:10.1214/20-EJS1710>
set.seed(123)
cpt_true = c(20, 50, 170)
y = rnorm(300) + c(rep(0,20),rep(1,30),rep(0,120),rep(1,130))
DP_result = DP.univar(y, gamma = 5, delta = 5)
cpt_hat = DP_result$cpt
Hausdorff.dist(cpt_hat, cpt_true)
Perform dynamic programming for VAR1 change point detection through penalisation.
DP.VAR1(X_futu, X_curr, gamma, lambda, delta, eps = 0.001)
X_futu |
A |
X_curr |
A |
gamma |
A |
lambda |
A |
delta |
A strictly |
eps |
A |
An object of class
"DP", which is a list
with the following structure:
partition |
A vector of the best partition. |
cpt |
A vector of estimated change points. |
Daren Wang & Haotian Xu
Wang, Yu, Rinaldo and Willett (2019) <arXiv:1909.06359>.
p = 10
sigma = 1
n = 20
v1 = 2*(seq(1,p,1)%%2) - 1
v2 = -v1
AA = matrix(0, nrow = p, ncol = p-2)
A1 = cbind(v1,v2,AA)*0.1
A2 = cbind(v2,v1,AA)*0.1
A3 = A1
data = simu.VAR1(sigma, p, 2*n+1, A1)
data = cbind(data, simu.VAR1(sigma, p, 2*n, A2, vzero = c(data[,ncol(data)])))
data = cbind(data, simu.VAR1(sigma, p, 2*n, A3, vzero = c(data[,ncol(data)])))
N = ncol(data)
X_curr = data[,1:(N-1)]
X_futu = data[,2:N]
DP_result = DP.VAR1(X_futu, X_curr, gamma = 1, lambda = 1, delta = 5)
DP_result$cpt
Perform the DPDU algorithm for regression change point localisation.
DPDU.regression(y, X, lambda, zeta, eps = 0.001)
y |
A |
X |
A |
lambda |
A positive |
zeta |
A positive |
eps |
A |
An object of class
"DP", which is a list
with the following structure:
partition |
A vector of the best partition. |
cpt |
A vector of estimated change points. |
Haotian Xu
Xu, Wang, Zhao and Yu (2022) <arXiv:2207.12453>.
d0 = 10
p = 20
n = 100
cpt_true = c(30, 70)
data = simu.change.regression(d0, cpt_true, p, n, sigma = 1, kappa = 9)
temp = DPDU.regression(y = data$y, X = data$X, lambda = 1, zeta = 20)
cpt_hat = temp$cpt
Generate a population covariance matrix of dimension p.
gen.cov.mat(p, sigma2, type)
p |
A |
sigma2 |
A positive |
type |
Specify the type of a covariance matrix: Diagonal structure ("diagonal"); Equal correlation structure ("equal"); Power decay structure ("power"). |
A numeric
p-by-p matrix.
Haotian Xu
gen.cov.mat(p = 5, sigma2 = 1, type = "diagonal")
Generate a matrix with entries 0 or 1, where 0 indicates that the entry is missing.
gen.missing(pi_mat, symm = TRUE)
pi_mat |
A |
symm |
A |
A numeric
p x p matrix.
Haotian Xu
p = 5
pi_mat = matrix(0.9, p, p)
eta_mat = gen.missing(pi_mat, symm = TRUE)
Generate univariate data from piecewise polynomials (currently, only linear, quadratic and cubic functions are considered).
gen.piece.poly(init_coef_vec, cpt_vec, kappa_mat, n, sigma)
init_coef_vec |
A (r+1)-dim |
cpt_vec |
A K-dim |
kappa_mat |
A (r+1)xK |
n |
An |
sigma |
A |
A vector of data generated from piecewise polynomials.
Haotian Xu
Yu and Chatterjee (2020) <arXiv:2007.09910>.
r = 2
init_coef_vec = c(-2, 2, 9)
cpt_true = c(100, 200)
n = 300
sigma = 1
kappa_mat = cbind(c(3, 9, -27), c(-3, 9, -27))
plot.ts(gen.piece.poly(init_coef_vec, cpt_true, kappa_mat, n, sigma), ylab = "y")
Compute the mean function of piecewise polynomials (currently, only linear, quadratic and cubic functions are considered).
gen.piece.poly.noiseless(init_coef_vec, cpt_vec, kappa_mat, n)
init_coef_vec |
A |
cpt_vec |
An |
kappa_mat |
A |
n |
An |
A vector of mean function of piecewise polynomials.
Haotian Xu
Yu and Chatterjee (2020) <arXiv:2007.09910>
r = 2
init_coef_vec = c(-2, 2, 9)
cpt_true = c(100, 200)
n = 300
kappa_mat = cbind(c(3, 9, -27), c(-3, 9, -27))
plot.ts(gen.piece.poly.noiseless(init_coef_vec, cpt_true, kappa_mat, n),
        ylab = "Values of piecewise polynomials")
Compute the bidirectional Hausdorff distance between two sets.
Hausdorff.dist(vec1, vec2)
vec1 |
A |
vec2 |
A |
An integer scalar of bidirectional Hausdorff distance.
Daren Wang
vec1 = sample.int(1000, size = 50)
vec2 = sample.int(2000, size = 100)
Hausdorff.dist(vec1, vec2)
Computes the element-wise adaptive Huber mean estimator.
huber_mean(x, tau)
x |
A |
tau |
A |
A numeric
scalar corresponding to the adaptive Huber mean estimator.
Haotian Xu
set.seed(123)
y = rnorm(100)
mean(y)
huber_mean(y, 1.345)
Compute the default thresholding parameter for the leading singular value in the soft-impute algorithm.
lambda.network.missing(s, e, t, alpha, rho, pi_ub, p, C_lambda)
s |
An |
e |
An |
t |
An |
alpha |
A |
rho |
A |
pi_ub |
A |
p |
An |
C_lambda |
A |
The default thresholding parameter is given in Theorem 2 of the reference.
The default thresholding parameter for the leading singular value in the soft-impute algorithm.
Dubey, Xu and Yu (2021) <arXiv:2110.06450>.
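No example accompanies this entry. The sketch below mirrors the call made inside the calibrate.online.network.missing example above; the numeric values are illustrative only.
p = 6 # number of nodes
# compute the default soft-impute threshold over the interval (s, e] at time t
lambda = lambda.network.missing(s = 1, e = 150, t = 150, alpha = 0.05,
                                rho = 0.5, pi_ub = 0.9, p = p, C_lambda = 2/3)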
Perform local refinement for VAR1 change points detection.
local.refine.CV.VAR1(cpt_init, DATA, zeta_set, delta_local)
cpt_init |
A |
DATA |
A |
zeta_set |
A |
delta_local |
A strictly |
A list
with the following structure:
cpt_hat |
A vector of estimated change point locations (sorted in strictly increasing order) |
zeta |
A scalar of selected zeta by cross-validation |
Daren Wang & Haotian Xu
Wang, Yu, Rinaldo and Willett (2019) <arXiv:1909.06359>.
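No example accompanies this entry. A minimal sketch, continuing from the CV.search.DP.VAR1 example above (which defines data and cpt_init); the zeta_set and delta_local values are illustrative assumptions.
# refine the initial estimates, selecting zeta by cross-validation
temp_refine = local.refine.CV.VAR1(cpt_init, data, zeta_set = c(1, 1.5, 2), delta_local = 5)
temp_refine$cpt_hat # refined change point estimates
temp_refine$zeta # selected zeta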
Perform local refinement for regression change points localisation.
local.refine.DPDU.regression(cpt_init, beta_hat, y, X, w = 0.9)
cpt_init |
An |
beta_hat |
A |
y |
A |
X |
A |
w |
A |
A vector of locally refined change point estimates.
Haotian Xu
Xu, Wang, Zhao and Yu (2022) <arXiv:2207.12453>.
d0 = 5
p = 30
n = 200
cpt_true = 100
data = simu.change.regression(d0, cpt_true, p, n, sigma = 1, kappa = 9)
lambda_set = c(0.01, 0.1, 1, 2)
zeta_set = c(10, 15, 20)
temp = CV.search.DPDU.regression(y = data$y, X = data$X, lambda_set, zeta_set)
temp$test_error # test error result
# find the indices of lambda_set and zeta_set which minimize the test error
min_idx = as.vector(arrayInd(which.min(temp$test_error), dim(temp$test_error)))
lambda_set[min_idx[2]]
zeta_set[min_idx[1]]
cpt_init = unlist(temp$cpt_hat[min_idx[1], min_idx[2]])
beta_hat = matrix(unlist(temp$beta_hat[min_idx[1], min_idx[2]]), ncol = length(cpt_init)+1)
cpt_refined = local.refine.DPDU.regression(cpt_init, beta_hat, data$y, data$X, w = 0.9)
Perform local refinement for network change points detection.
local.refine.network(cpt_init, data_mat1, data_mat2, self = FALSE, w = 0.5, tau2, tau3 = Inf)
cpt_init |
A |
data_mat1 |
A |
data_mat2 |
An independent copy of data_mat1. |
self |
A |
w |
A |
tau2 |
A positive |
tau3 |
A positive |
A numeric
vector of locally refined change point locations.
Daren Wang & Haotian Xu
Wang, Yu and Rinaldo (2018) <arXiv:1809.09602>.
p = 15 # number of nodes
rho = 0.5 # sparsity parameter
block_num = 3 # number of groups for SBM
n = 100 # sample size for each segment
# connectivity matrix for the first and the third segments
conn1_mat = rho * matrix(c(0.6,1,0.6,1,0.6,0.5,0.6,0.5,0.6), nrow = 3)
# connectivity matrix for the second segment
conn2_mat = rho * matrix(c(0.6,0.5,0.6,0.5,0.6,1,0.6,1,0.6), nrow = 3)
set.seed(1)
can_vec = sample(1:p, replace = FALSE) # randomly assign nodes into groups
sbm1 = simu.SBM(conn1_mat, can_vec, n, symm = TRUE, self = TRUE)
sbm2 = simu.SBM(conn2_mat, can_vec, n, symm = TRUE, self = TRUE)
data_mat = cbind(sbm1$obs_mat, sbm2$obs_mat)
data_mat1 = data_mat[,seq(1,ncol(data_mat),2)]
data_mat2 = data_mat[,seq(2,ncol(data_mat),2)]
M = 10
intervals = WBS.intervals(M = M, lower = 1, upper = ncol(data_mat1))
temp = WBS.network(data_mat1, data_mat2, 1, ncol(data_mat1),
                   intervals$Alpha, intervals$Beta, delta = 5)
rho_hat = quantile(rowMeans(data_mat), 0.95)
tau = p*rho_hat*(log(n))^2/20 # default threshold given in the paper
cpt_init = unlist(thresholdBS(temp, tau)$cpt_hat[,1])
cpt_refined = local.refine.network(cpt_init, data_mat1, data_mat2, self = TRUE,
                                   tau2 = p*rho_hat/3, tau3 = Inf)
cpt_WBS = 2*cpt_init
cpt_refined = 2*cpt_refined
Perform local refinement for univariate polynomials change point detection.
local.refine.poly(cpt_init, y, r, delta_lr)
cpt_init |
An |
y |
A |
r |
An |
delta_lr |
A positive |
An integer
vector of locally refined change point estimates.
Haotian Xu
Yu and Chatterjee (2020) <arXiv:2007.09910>
set.seed(0)
cpt_true = c(20, 50, 170)
y = rnorm(300) + c(rep(0,20),rep(2,30),rep(0,120),rep(2,130))
plot.ts(y)
gamma_set = 3:9
DP_result = CV.search.DP.poly(y, r = 2, gamma_set, delta = 5)
min_idx = which.min(DP_result$test_error)
cpt_init = unlist(DP_result$cpt_hat[min_idx])
local.refine.poly(cpt_init, y, r = 2, delta_lr = 5)
Perform local refinement for regression change points localisation.
local.refine.regression(cpt_init, y, X, zeta)
cpt_init |
An |
y |
A |
X |
A |
zeta |
A |
A vector of locally refined change point estimates.
Daren Wang & Haotian Xu
Rinaldo, Wang, Wen, Willett and Yu (2021) <arXiv:2010.10410>; AISTATS 2021, PMLR 130:2089-2097.
d0 = 10
p = 20
n = 100
cpt_true = c(30, 70)
data = simu.change.regression(d0, cpt_true, p, n, sigma = 1, kappa = 9)
gamma_set = c(0.01, 0.1, 1)
lambda_set = c(0.01, 0.1, 1, 3)
temp = CV.search.DP.regression(y = data$y, X = data$X, gamma_set, lambda_set, delta = 2)
temp$test_error # test error result
# find the indices of gamma_set and lambda_set which minimize the test error
min_idx = as.vector(arrayInd(which.min(temp$test_error), dim(temp$test_error)))
gamma_set[min_idx[1]]
lambda_set[min_idx[2]]
cpt_init = unlist(temp$cpt_hat[min_idx[1], min_idx[2]])
local.refine.regression(cpt_init, data$y, X = data$X, zeta = 0.5)
Perform local refinement for univariate mean change points detection.
local.refine.univar(cpt_init, y)
cpt_init |
An |
y |
A |
An integer
vector of locally refined change point estimates.
Haotian Xu
Wang, Yu and Rinaldo (2020) <doi:10.1214/20-EJS1710>.
set.seed(0)
cpt_true = c(20, 50, 170)
y = rnorm(300) + c(rep(0,20),rep(2,30),rep(0,120),rep(2,130))
gamma_set = 1:5
DP_result = CV.search.DP.univar(y, gamma_set, delta = 5)
min_idx = which.min(DP_result$test_error)
cpt_hat = unlist(DP_result$cpt_hat[min_idx])
Hausdorff.dist(cpt_hat, cpt_true)
cpt_LR = local.refine.univar(cpt_hat, y)
Hausdorff.dist(cpt_LR, cpt_true)
Perform local refinement for VAR1 change points detection.
local.refine.VAR1(cpt_init, DATA, zeta)
cpt_init |
A |
DATA |
A |
zeta |
A |
An integer
vector of locally refined change point estimates.
Daren Wang & Haotian Xu
Wang, Yu, Rinaldo and Willett (2019) <arXiv:1909.06359>.
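No example accompanies this entry. A minimal sketch, continuing from the DP.VAR1 example above (which defines data and DP_result); the zeta value is an illustrative assumption.
# refine the initial DP estimates with a fixed tuning parameter zeta
cpt_refined = local.refine.VAR1(DP_result$cpt, data, zeta = 1)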
Transform a vector containing the lower-triangular entries into a symmetric matrix of dimension p.
lowertri2mat(lowertri_vec, p, diag = FALSE)
lowertri_vec |
A |
p |
A |
diag |
A |
A numeric
p x p symmetric matrix.
Haotian Xu
A = matrix(1:16, 4, 4)
B = lowertri2mat(A[lower.tri(A)], 4, diag = FALSE)
C = lowertri2mat(A[lower.tri(A, diag = TRUE)], 4, diag = TRUE)
Estimate the long-run variance for regression settings with change points.
LRV.regression(cpt_init, beta_hat, y, X, w = 0.9, block_size)
cpt_init |
An |
beta_hat |
A |
y |
A |
X |
A |
w |
A |
block_size |
An |
A vector of long-run variance estimators associated with all locally refined intervals.
Haotian Xu
Xu, Wang, Zhao and Yu (2022) <arXiv:2207.12453>.
d0 = 5
p = 10
n = 200
cpt_true = c(70, 140)
data = simu.change.regression(d0, cpt_true, p, n, sigma = 1, kappa = 9)
lambda_set = c(0.1, 0.5, 1, 2)
zeta_set = c(10, 15, 20)
temp = CV.search.DPDU.regression(y = data$y, X = data$X, lambda_set, zeta_set)
temp$test_error # test error result
# find the indices of lambda_set and zeta_set which minimize the test error
min_idx = as.vector(arrayInd(which.min(temp$test_error), dim(temp$test_error)))
lambda_set[min_idx[2]]
zeta_set[min_idx[1]]
cpt_init = unlist(temp$cpt_hat[min_idx[1], min_idx[2]])
beta_hat = matrix(unlist(temp$beta_hat[min_idx[1], min_idx[2]]), ncol = length(cpt_init)+1)
interval_refine = trim_interval(n, cpt_init) # choose S
block_size = ceiling(sqrt(min(floor(interval_refine[,2]) - ceiling(interval_refine[,1])))/2)
LRV_est = LRV.regression(cpt_init, beta_hat, data$y, data$X, w = 0.9, block_size)
Perform online change point detection for network data by controlling the false alarm rate at level alpha or controlling the average run length gamma. The default choices of the tuning parameters tau1, tau2 and tau3 are used (see Section 4.1 of the reference).
online.network(data_mat1, data_mat2, self = TRUE, b_vec = NULL, train_mat = NULL, alpha = NULL, gamma = NULL, permu_num = NULL)
data_mat1 |
A |
data_mat2 |
A |
self |
A |
b_vec |
A |
train_mat |
A |
alpha |
A |
gamma |
An |
permu_num |
An |
A list
with the following structure:
cpt |
Estimated change point |
score |
A |
b_vec |
A |
Oscar Hernan Madrid Padilla & Haotian Xu
Yu, Padilla, Wang and Rinaldo (2021) <arXiv:2101.05477>.
set.seed(123)
p = 15 # number of nodes
rho = 0.5 # sparsity parameter
block_num = 3 # number of groups for SBM
n = 100 # sample size for each segment
# connectivity matrix for the first and the third segments
conn1_mat = rho * matrix(c(0.6,1,0.6,1,0.6,0.5,0.6,0.5,0.6), nrow = 3)
# connectivity matrix for the second segment
conn2_mat = rho * matrix(c(0.6,0.5,0.6,0.5,0.6,1,0.6,1,0.6), nrow = 3)
set.seed(1)
can_vec = sample(1:p, replace = FALSE) # randomly assign nodes into groups
sbm1 = simu.SBM(conn1_mat, can_vec, n, symm = TRUE, self = TRUE)
sbm2 = simu.SBM(conn2_mat, can_vec, n, symm = TRUE, self = TRUE)
data_mat = cbind(sbm1$obs_mat, sbm2$obs_mat)
data_mat1 = data_mat[,seq(1,ncol(data_mat),2)]
data_mat2 = data_mat[,seq(2,ncol(data_mat),2)]
train_mat = simu.SBM(conn1_mat, can_vec, n = 150, symm = TRUE, self = TRUE)$obs_mat
temp = online.network(data_mat1, data_mat2, self = TRUE, b_vec = NULL, train_mat,
                      alpha = 0.05, gamma = NULL, permu_num = 20)
cpt_hat = 2 * temp$cpt
Perform online change point detection for network with missing values by controlling the false alarm rate at level alpha.
online.network.missing(data_incomplete_list, eta_list, alpha_grid, thresholds_array, rho_hat, pi_ub_hat, C_lambda = 2/3, delta = 5)
data_incomplete_list |
A |
eta_list |
A |
alpha_grid |
A |
thresholds_array |
A |
rho_hat |
A |
pi_ub_hat |
A |
C_lambda |
A |
delta |
An |
Online change point estimator.
Haotian Xu
Dubey, Xu and Yu (2021) <arXiv:2110.06450>.
calibrate.online.network.missing
for calibrating thresholds.
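No example accompanies this entry. A minimal sketch, continuing from the calibrate.online.network.missing example above (which defines p, conn1_mat, can_vec, pi_mat, alpha_grid, rho_hat, pi_ub_hat and temp); the test stream below is illustrative.
# generate a test stream from the same SBM, with the same missingness mechanism
test_sbm = simu.SBM(conn1_mat, can_vec, 100, symm = TRUE, self = TRUE)
test_list = lapply(1:100, function(t) lowertri2mat(test_sbm$obs_mat[,t], p, diag = TRUE))
eta_test_list = lapply(1:100, function(t) gen.missing(pi_mat, symm = TRUE))
test_miss_list = lapply(1:100, function(t) eta_test_list[[t]] * test_list[[t]])
# monitor the stream online with the calibrated thresholds
cpt_hat = online.network.missing(test_miss_list, eta_test_list, alpha_grid,
                                 temp$thresholds_array, rho_hat, pi_ub_hat,
                                 C_lambda = 2/3, delta = 5)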
Perform online change point detection with controlled false alarm rate or average run length.
online.univar(y_vec, b_vec = NULL, train_vec = NULL, alpha = NULL, gamma = NULL, permu_num = NULL)
y_vec |
A |
b_vec |
A |
train_vec |
A |
alpha |
A |
gamma |
An |
permu_num |
An |
A list
with the following structure:
cpt_hat |
An |
b_vec |
A |
Haotian Xu
Yu, Padilla, Wang and Rinaldo (2020) <arXiv:2006.03283>.
y_vec = rnorm(150) + c(rep(0, 100), rep(1, 50))
train_vec = rnorm(100)
# control the false alarm rate
temp1 = online.univar(y_vec = y_vec, train_vec = train_vec, alpha = 0.05, permu_num = 20)
temp1$cpt_hat
temp1$b_vec # calibrated threshold
Perform online change point detection with potentially multiple change points.
online.univar.multi(y_vec, b_vec = NULL, train_vec = NULL, alpha = NULL, gamma = NULL, permu_num = NULL)
y_vec |
A |
b_vec |
A |
train_vec |
A |
alpha |
A |
gamma |
An |
permu_num |
An |
#' @title Online change point detection with controlled average run length.
#' @description Perform online change point detection via CUSUM (single change point, type 2).
#' @param y_vec A numeric vector of observations.
#' @param gamma An integer scalar of interval length (>= 2).
#' @param tau_gamma A numeric scalar of threshold.
#' @param ... Additional arguments.
#' @return An integer scalar of estimated change point location.
#' @export
#' @author Haotian Xu
#' @examples
#' TO DO
online.univar.one2 = function(y_vec, gamma, tau_gamma, ...){
  t = 1
  FLAG = 0
  while(FLAG == 0 & t < length(y_vec)){ # strict inequality keeps the indices below in range
    t = t + 1
    e = max(t - gamma, 0)
    cusum_vec = sapply((e+1):(t-1), function(s) sqrt((t-s)*(s-e)/(t-e)) * abs(mean(y_vec[(e+1):s]) - mean(y_vec[(s+1):t])))
    FLAG = 1 - prod(cusum_vec <= tau_gamma)
  }
  return(t)
}
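The examples field above is still marked TO DO; the following minimal sketch simply exercises the function as printed here, with an illustrative (uncalibrated) window gamma and threshold tau_gamma.

set.seed(42)
y_vec = c(rnorm(100), rnorm(100, mean = 2)) # a single mean change at t = 100
online.univar.one2(y_vec, gamma = 50, tau_gamma = 5) # returns the declaration time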
#' @title Online change point detection via CUSUM (single change point, type 3).
#' @description Perform online change point detection via CUSUM (single change point, type 3).
#' @param y_vec A numeric vector of observations.
#' @param tau_vec A numeric vector of thresholds at times t >= 1.
#' @param ... Additional arguments.
#' @return An integer scalar of estimated change point location.
#' @export
#' @author Haotian Xu
#' @examples
#' TO DO
online.univar.one3 = function(y_vec, tau_vec, ...){
  if(length(y_vec) != length(tau_vec)){
    stop("y_vec and tau_vec should have the same length.")
  }
  t = 1
  FLAG = 0
  while(FLAG == 0 & t < length(y_vec)){ # strict inequality keeps the indices below in range
    t = t + 1
    J = floor(log2(t))
    j = 0
    while(j < J & FLAG == 0){
      j = j + 1
      s_j = t - 2^(j-1)
      cusum = sqrt((t-s_j)*s_j/t) * abs(mean(y_vec[1:s_j]) - mean(y_vec[(s_j+1):t]))
      FLAG = (cusum > tau_vec[t])
    }
  }
  return(t)
}
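As above, the examples field is a placeholder; a minimal sketch exercising the function as printed here, with a flat illustrative threshold rather than a calibrated tau_vec:

set.seed(42)
y_vec = c(rnorm(100), rnorm(100, mean = 2)) # a single mean change at t = 100
tau_vec = rep(5, length(y_vec)) # flat illustrative threshold, not a calibrated choice
online.univar.one3(y_vec, tau_vec)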
An integer vector of estimated change points.
Haotian Xu
Yu, Padilla, Wang and Rinaldo (2020) <arXiv:2006.03283>.
y_vec = rnorm(200) + c(rep(0, 50), rep(1, 100), rep(0, 50))
train_vec = rnorm(150)
# control the false alarm rate
temp1 = online.univar.multi(y_vec = y_vec, train_vec = train_vec, alpha = 0.05, permu_num = 20)
temp1
Simulate a sparse regression model with change points in coefficients under temporal dependence.
simu.change.regression(d0, cpt_true, p, n, sigma, kappa, cov_type = "I", mod_X = "IID", mod_e = "IID")
d0 |
A |
cpt_true |
An |
p |
An |
n |
An |
sigma |
A |
kappa |
A |
cov_type |
A |
mod_X |
A |
mod_e |
A |
A list with the following structure:
cpt_true |
A vector of true change points (sorted in strictly increasing order). |
X |
An n-by-p design matrix. |
y |
An n-dim vector of the response variable. |
betafullmat |
A p-by-n matrix of coefficients. |
Daren Wang, Zifeng Zhao & Haotian Xu
Rinaldo, Wang, Wen, Willett and Yu (2020) <arXiv:2010.10410>; Xu, Wang, Zhao and Yu (2022) <arXiv:2207.12453>.
d0 = 10
p = 30
n = 100
cpt_true = c(10, 30, 40, 70, 90)
data = simu.change.regression(d0, cpt_true, p, n, sigma = 1, kappa = 9)
Simulate a dot product graph (without change point). The generated data is a matrix with each column corresponding to the vectorized adjacency (sub)matrix at a time point. For example, if the network matrix is required to be symmetric and without self-loops, only the strictly lower triangular entries are kept.
simu.RDPG(x_mat, n, symm = TRUE, self = FALSE)
x_mat |
A |
n |
A |
symm |
A |
self |
A |
A list with the following structure:
obs_mat |
A matrix with each column being the vectorized adjacency (sub)matrix; for example, if "symm = TRUE" and "self = FALSE", only the strictly lower triangular part is kept. |
graphon_mat |
Underlying graphon matrix. |
Haotian Xu
p = 20 # number of nodes
n = 50 # sample size for each segment
lat_dim_num = 5 # number of latent dimensions
set.seed(1)
x_mat = matrix(runif(p*lat_dim_num), nrow = p, ncol = lat_dim_num)
x_tilde_mat = matrix(runif(p*lat_dim_num), nrow = p, ncol = lat_dim_num)
y_mat = rbind(x_tilde_mat[1:floor(p/4),], x_mat[(floor(p/4)+1):p,])
rdpg1 = simu.RDPG(x_mat, n, symm = TRUE, self = FALSE)
rdpg2 = simu.RDPG(y_mat, n, symm = TRUE, self = FALSE)
data1_mat = rdpg1$obs_mat
data2_mat = rdpg2$obs_mat
data_mat = cbind(data1_mat, data2_mat)
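A quick dimension check clarifies the vectorization: with symm = TRUE and self = FALSE, each column stacks the strictly lower triangle, so obs_mat should have choose(p, 2) rows.

dim(data_mat) # choose(20, 2) = 190 rows and 2*n = 100 columns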
Simulate a Stochastic Block Model (without change point). The generated data is a matrix with each column corresponding to the vectorized adjacency (sub)matrix at a time point. For example, if the network matrix is required to be symmetric and without self-loops, only the strictly lower triangular entries are kept.
simu.SBM(connec_mat, can_vec, n, symm = FALSE, self = TRUE)
connec_mat |
A |
can_vec |
A |
n |
A |
symm |
A |
self |
A |
A list with the following structure:
obs_mat |
A matrix with each column being the vectorized adjacency (sub)matrix; for example, if "symm = TRUE" and "self = FALSE", only the strictly lower triangular part is kept. |
graphon_mat |
Underlying graphon matrix. |
Daren Wang & Haotian Xu
Wang, Yu and Rinaldo (2018) <arXiv:1809.09602>.
p = 15 # number of nodes
rho = 0.5 # sparsity parameter
block_num = 3 # number of groups for SBM
n = 100 # sample size for each segment
# connectivity matrix for the first and the third segments
conn1_mat = rho * matrix(c(0.6,1,0.6,1,0.6,0.5,0.6,0.5,0.6), nrow = 3)
# connectivity matrix for the second segment
conn2_mat = rho * matrix(c(0.6,0.5,0.6,0.5,0.6,1,0.6,1,0.6), nrow = 3)
set.seed(1)
can_vec = sample(1:p, replace = FALSE) # randomly assign nodes into groups
sbm1 = simu.SBM(conn1_mat, can_vec, n, symm = TRUE, self = TRUE)
sbm2 = simu.SBM(conn2_mat, can_vec, n, symm = TRUE, self = TRUE)
data_mat = cbind(sbm1$obs_mat, sbm2$obs_mat)
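An analogous dimension check; that the diagonal is kept when self = TRUE is an inference from the symm/self description above rather than a documented fact.

dim(data_mat) # presumably p*(p+1)/2 = 120 rows and 2*n = 200 columns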
Simulate a (stable) self-exciting point process (SEPP) model (without change point).
simu.SEPP(intercept, n, A, threshold, vzero = NULL)
intercept |
A |
n |
An |
A |
A |
threshold |
A |
vzero |
A |
A p-by-n matrix.
Daren Wang & Haotian Xu
Wang, Yu, & Willett (2020). Detecting Abrupt Changes in High-Dimensional Self-Exciting Poisson Processes. <arXiv:2006.03572>.
p = 10 # dimension
n = 50
s = 5 # sparsity
factor = 0.12 # large factor gives exact recovery
threshold = 4 # thresholding makes the process stable
intercept = 1/2 # intercept of the model, assumed known as in the existing literature
A1 = A2 = A3 = matrix(0, p, p)
diag(A1[,-1]) = 1
diag(A1) = 1
diag(A1[-1,]) = -1
A1 = A1*factor
A1[(s+1):p, (s+1):p] = 0
diag(A2[,-1]) = 1
diag(A2) = -1
diag(A2[-1,]) = 1
A2 = A2*factor
A2[(s+1):p, (s+1):p] = 0
diag(A3[,-1]) = 1
diag(A3) = 1
diag(A3[-1,]) = -1
A3 = A3*factor
A3[(s+1):p, (s+1):p] = 0
data1 = simu.SEPP(intercept, n, A1, threshold, vzero = NULL)
data2 = simu.SEPP(intercept, n, A2, threshold, vzero = data1[,n])
data3 = simu.SEPP(intercept, n, A3, threshold, vzero = data2[,n])
data = cbind(data1, data2, data3)
dim(data)
Simulate data of size n and dimension p from a VAR1 model (without change point) with Gaussian i.i.d. error terms.
simu.VAR1(sigma, p, n, A, vzero = NULL)
sigma |
A |
p |
An |
n |
An |
A |
A |
vzero |
A |
A p-by-n matrix.
Daren Wang
Wang, Yu, Rinaldo and Willett (2019) <arXiv:1909.06359>.
p = 10
sigma = 1
n = 100
A = matrix(rnorm(p*p), nrow = p)*0.1 # transition matrix
simu.VAR1(sigma, p, n, A)
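To build a sequence with a change in the transition matrix, two segments can be chained through vzero, mirroring the simu.SEPP example above; the second transition matrix below is an illustrative choice.

# chain two segments through vzero to create a transition-matrix change
A2 = matrix(rnorm(p*p), nrow = p)*0.1 # illustrative second transition matrix
data1 = simu.VAR1(sigma, p, n, A)
data2 = simu.VAR1(sigma, p, n, A2, vzero = data1[,n])
data = cbind(data1, data2)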
Estimate the graphon matrix by soft-impute for independent adjacency matrices with missing values.
softImpute.network.missing(data_incomplete_list, eta_list, lambda, a, it_max = 10000)
data_incomplete_list |
A |
eta_list |
A |
lambda |
A |
a |
A |
it_max |
An |
The estimated graphon matrix.
Haotian Xu
Dubey, Xu and Yu (2021) <arXiv:2110.06450>.
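No example is given here either; the sketch below reuses the assumed input formats from the online.network.missing sketch above, with lambda and a set to arbitrary illustrative values rather than recommended tuning.

# hedged sketch: input formats and tuning values are assumptions
p = 15
n = 50
A_full = lapply(1:n, function(t) matrix(rbinom(p*p, 1, 0.3), p, p))
eta_list = lapply(1:n, function(t) matrix(rbinom(p*p, 1, 0.8), p, p))
data_incomplete_list = mapply(function(A, eta){A[eta == 0] = NA; A},
                              A_full, eta_list, SIMPLIFY = FALSE)
graphon_hat = softImpute.network.missing(data_incomplete_list, eta_list, lambda = 1, a = 1)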
Given a BS object, perform thresholding to find the change point locations.
thresholdBS(BS_object, tau)
BS_object |
A |
tau |
A positive |
A list with the following structure:
BS_tree_trimmed |
BS_tree with the change points that do not satisfy the thresholding criterion removed |
cpt_hat |
A matrix containing the change point locations, the values of the corresponding statistic, and the levels at which each change point is detected |
Haotian Xu
BS.univar, BS.uni.nonpar, BS.cov, WBS.univar, WBS.uni.nonpar, WBS.multi.nonpar, WBS.network and WBSIP.cov.
y = c(rnorm(100, 0, 1), rnorm(100, 10, 10), rnorm(100, 40, 10))
temp = BS.univar(y, 1, 300, 5)
plot.ts(y)
points(x = tail(temp$S[order(temp$Dval)],4), y = y[tail(temp$S[order(temp$Dval)],4)], col = "red")
thresholdBS(temp, 20)
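Per the value section above, each row of cpt_hat holds a location, the corresponding statistic value and the detection level, so sorting its first column gives the estimates in time order:

result = thresholdBS(temp, 20)
sort(result$cpt_hat[,1]) # estimated change point locations in increasing order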
Perform interval trimming for local refinement.
trim_interval(n, cpt_init, w = 0.9)
n |
An |
cpt_init |
An |
w |
A |
A matrix with each row being a trimmed interval.
Haotian Xu
Xu, Wang, Zhao and Yu (2022) <arXiv:2207.12453>.
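No example accompanies trim_interval; a minimal sketch with illustrative values:

n = 300
cpt_init = c(100, 200) # initial change point estimates
trim_interval(n, cpt_init, w = 0.9) # one trimmed interval per row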
Compute change points based on the multivariate nonparametric method, with the tuning parameter selected by FDR control.
tuneBSmultinonpar(BS_object, Y)
BS_object |
A "BS" object produced by |
Y |
A |
A vector of estimated change points.
Oscar Hernan Madrid Padilla & Haotian Xu
Padilla, Yu, Wang and Rinaldo (2019) <arXiv:1910.13289>.
n = 70
v = c(floor(n/3), 2*floor(n/3)) # location of change points
p = 4
Y = matrix(0, p, n) # matrix for data
mu0 = rep(0, p) # mean of the data
mu1 = rep(0, p)
mu1[1:floor(p/2)] = 2
Sigma0 = diag(p) # covariance matrices of the data
Sigma1 = diag(p)*2
# generate data
for(t in 1:n){
  if(t < v[1] || t > v[2]){
    Y[,t] = MASS::mvrnorm(n = 1, mu0, Sigma0)
  }
  if(t >= v[1] && t < v[2]){
    Y[,t] = MASS::mvrnorm(n = 1, mu1, Sigma1)
  }
} ## end of data generation
M = 8
intervals = WBS.intervals(M = M, lower = 1, upper = ncol(Y)) # random intervals
K_max = 30
h = 5*(K_max*log(n)/n)^{1/p} # bandwidth
temp = WBS.multi.nonpar(Y, Y, 1, ncol(Y), intervals$Alpha, intervals$Beta, h, delta = 10)
S = tuneBSmultinonpar(temp, Y)
Perform change point detection for dependent dynamic random dot product graph models. The tuning parameter tau for WBS is automatically selected based on the BIC-type scores defined in Equation (2.4) of Zou et al. (2014).
tuneBSnonparRDPG(BS_object, data_mat, lowerdiag = FALSE, d)
BS_object |
A "BS" object produced by |
data_mat |
A |
lowerdiag |
A |
d |
A |
A numeric vector of estimated change points.
Oscar Hernan Madrid Padilla & Haotian Xu
Padilla, Yu and Priebe (2019) <arXiv:1911.07494>.
### generate data
p = 20 # number of nodes
n = 50 # sample size for each segment
lat_dim_num = 5 # number of latent dimensions
set.seed(1)
x_mat = matrix(runif(p*lat_dim_num), nrow = p, ncol = lat_dim_num)
x_tilde_mat = matrix(runif(p*lat_dim_num), nrow = p, ncol = lat_dim_num)
y_mat = rbind(x_tilde_mat[1:floor(p/4),], x_mat[(floor(p/4)+1):p,])
rdpg1 = simu.RDPG(x_mat, n, symm = TRUE, self = FALSE)
rdpg2 = simu.RDPG(y_mat, n, symm = TRUE, self = FALSE)
data1_mat = rdpg1$obs_mat
data2_mat = rdpg2$obs_mat
data_mat = cbind(data1_mat, data2_mat)
### detect change points
M = 20 # number of random intervals for WBS
d = 10 # parameter for scaled PCA algorithm
delta = 5
intervals = WBS.intervals(M = M, lower = 1, upper = ncol(data_mat))
WBS_result = WBS.nonpar.RDPG(data_mat, lowerdiag = TRUE, d,
                             Alpha = intervals$Alpha, Beta = intervals$Beta, delta)
cpt_hat = tuneBSnonparRDPG(WBS_result, data_mat, lowerdiag = TRUE, d)
Perform wild binary segmentation with tuning parameter selection based on sample splitting.
tuneBSuninonpar(BS_object, Y, N)
BS_object |
A "BS" object produced by |
Y |
A |
N |
A |
A vector of estimated change points (sorted in strictly increasing order).
Oscar Hernan Madrid Padilla & Haotian Xu
Padilla, Yu, Wang and Rinaldo (2021) <doi:10.1214/21-EJS1809>.
BS.uni.nonpar and WBS.uni.nonpar.
Y = t(as.matrix(c(rnorm(100, 0, 1), rnorm(100, 0, 10), rnorm(50, 0, 40))))
W = Y # W is a copy of the matrix Y; it can be Y itself
N = rep(1, 250)
M = 5
set.seed(123)
intervals = WBS.intervals(M = M, lower = 1, upper = ncol(Y))
BS_object = WBS.uni.nonpar(W, 1, ncol(Y), intervals$Alpha, intervals$Beta, N, delta = 5)
cpt_hat = tuneBSuninonpar(BS_object, Y, N)
Perform univariate mean change point detection based on standard or wild binary segmentation. The threshold parameter tau for WBS is automatically selected based on the sSIC score defined in Equation (4) of Fryzlewicz (2014).
tuneBSunivar(BS_object, y)
BS_object |
A "BS" object produced by |
y |
A |
A list with the following structure:
cpt |
A vector of estimated change point locations (sorted in strictly increasing order). |
tau |
A scalar of selected threshold tau based on sSIC. |
Daren Wang & Haotian Xu
Wang, Yu and Rinaldo (2020) <doi:10.1214/20-EJS1710>; Fryzlewicz (2014), Wild binary segmentation for multiple change-point detection, <doi:10.1214/14-AOS1245>.
BS.univar and WBS.univar.
set.seed(0)
cpt_true = c(20, 50, 170)
y = rnorm(300) + c(rep(0,20), rep(2,30), rep(0,120), rep(2,130))
## change point detection by WBS
intervals = WBS.intervals(M = 100, lower = 1, upper = length(y))
temp2 = WBS.univar(y, 1, length(y), intervals$Alpha, intervals$Beta, delta = 5)
WBS_result = tuneBSunivar(temp2, y)
cpt_WBS = WBS_result$cpt
Hausdorff.dist(cpt_WBS, cpt_true)
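The same tuning applies to standard binary segmentation; the sketch below reuses y and cpt_true from above, with the BS.univar call pattern taken from the thresholdBS example.

## change point detection by standard BS
temp1 = BS.univar(y, 1, length(y), 5)
BS_result = tuneBSunivar(temp1, y)
Hausdorff.dist(BS_result$cpt, cpt_true)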
Generate random intervals for WBS.
WBS.intervals(M, lower = 1, upper)
M |
A positive |
lower |
A positive |
upper |
A positive |
A list with the following structure:
Alpha |
An M-dim vector representing the starting indices of random intervals |
Beta |
An M-dim vector representing the ending indices of random intervals |
Oscar Hernan Madrid Padilla
WBS.intervals(120, lower = 1, upper = 300)
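The two components line up elementwise; a quick look at the output structure:

intervals = WBS.intervals(120, lower = 1, upper = 300)
head(cbind(intervals$Alpha, intervals$Beta)) # each row: one random interval [Alpha, Beta]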
Perform wild binary segmentation for multivariate nonparametric change point detection.
WBS.multi.nonpar(Y, W, s, e, Alpha, Beta, h, delta, level = 0)
Y |
A |
W |
A copy of the matrix Y; it can be Y itself. |
s |
A |
e |
A |
Alpha |
A |
Beta |
A |
h |
A |
delta |
A |
level |
Should be fixed as 0. |
An object of class "BS", which is a list with the following structure:
S |
A vector of estimated change points (sorted in strictly increasing order). |
Dval |
A vector of values of CUSUM statistic based on KS distance. |
Level |
A vector representing the levels at which each change point is detected. |
Parent |
A matrix with the starting indices on the first row and the ending indices on the second row. |
Oscar Hernan Madrid Padilla & Haotian Xu
Padilla, Yu, Wang and Rinaldo (2019) <arXiv:1910.13289>.
thresholdBS for obtaining change point estimates, tuneBSmultinonpar for a tuning version.
n = 70
v = c(floor(n/3), 2*floor(n/3)) # location of change points
p = 4
Y = matrix(0, p, n) # matrix for data
mu0 = rep(0, p) # mean of the data
mu1 = rep(0, p)
mu1[1:floor(p/2)] = 2
Sigma0 = diag(p) # covariance matrices of the data
Sigma1 = diag(p)*2
# generate data
for(t in 1:n){
  if(t < v[1] || t > v[2]){
    Y[,t] = MASS::mvrnorm(n = 1, mu0, Sigma0)
  }
  if(t >= v[1] && t < v[2]){
    Y[,t] = MASS::mvrnorm(n = 1, mu1, Sigma1)
  }
} ## end of data generation
M = 10
intervals = WBS.intervals(M = M, lower = 1, upper = ncol(Y)) # random intervals
K_max = 30
h = 5*(K_max*log(n)/n)^{1/p} # bandwidth
temp = WBS.multi.nonpar(Y, Y, 1, ncol(Y), intervals$Alpha, intervals$Beta, h, delta = 10)
result = thresholdBS(temp, median(temp$Dval))
Perform wild binary segmentation for network change point detection.
WBS.network(data_mat1, data_mat2, s, e, Alpha, Beta, delta, level = 0)
data_mat1 |
A |
data_mat2 |
An independent copy of data_mat1. |
s |
A |
e |
A |
Alpha |
A |
Beta |
A |
delta |
A positive |
level |
Should be fixed as 0. |
An object of class "BS", which is a list with the following structure:
S |
A vector of estimated change points (sorted in strictly increasing order). |
Dval |
A vector of values of CUSUM statistic based on KS distance. |
Level |
A vector representing the levels at which each change point is detected. |
Parent |
A matrix with the starting indices on the first row and the ending indices on the second row. |
Daren Wang & Haotian Xu
Wang, Yu and Rinaldo (2018) <arXiv:1809.09602>.
thresholdBS for obtaining change point estimates.
p = 15 # number of nodes
rho = 0.5 # sparsity parameter
block_num = 3 # number of groups for SBM
n = 100 # sample size for each segment
# connectivity matrix for the first and the third segments
conn1_mat = rho * matrix(c(0.6,1,0.6,1,0.6,0.5,0.6,0.5,0.6), nrow = 3)
# connectivity matrix for the second segment
conn2_mat = rho * matrix(c(0.6,0.5,0.6,0.5,0.6,1,0.6,1,0.6), nrow = 3)
set.seed(1)
can_vec = sample(1:p, replace = FALSE) # randomly assign nodes into groups
sbm1 = simu.SBM(conn1_mat, can_vec, n, symm = TRUE, self = TRUE)
sbm2 = simu.SBM(conn2_mat, can_vec, n, symm = TRUE, self = TRUE)
data_mat = cbind(sbm1$obs_mat, sbm2$obs_mat)
data_mat1 = data_mat[,seq(1,ncol(data_mat),2)]
data_mat2 = data_mat[,seq(2,ncol(data_mat),2)]
M = 10
intervals = WBS.intervals(M = M, lower = 1, upper = ncol(data_mat1))
temp = WBS.network(data_mat1, data_mat2, 1, ncol(data_mat1), intervals$Alpha, intervals$Beta, delta = 5)
rho_hat = quantile(rowMeans(data_mat), 0.95)
tau = p*rho_hat*(log(n))^2/20 # default threshold given in the paper
cpt_init = unlist(thresholdBS(temp, tau)$cpt_hat[,1])
cpt_refined = local.refine.network(cpt_init, data_mat1, data_mat2, self = TRUE,
                                   tau2 = p*rho_hat/3, tau3 = Inf)
cpt_WBS = 2*cpt_init
cpt_refined = 2*cpt_refined
Perform wild binary segmentation for dependent dynamic random dot product graph models.
WBS.nonpar.RDPG(data_mat, lowerdiag = FALSE, d, Alpha, Beta, delta)
data_mat |
A |
lowerdiag |
A |
d |
A |
Alpha |
A |
Beta |
A |
delta |
A positive |
An object of class "BS", which is a list with the following structure:
S |
A vector of estimated change point locations (sorted in strictly increasing order). |
Dval |
A vector of values of CUSUM statistic. |
Level |
A vector representing the levels at which each change point is detected. |
Parent |
A matrix with the starting indices on the first row and the ending indices on the second row. |
Oscar Hernan Madrid Padilla, Haotian Xu
Padilla, Yu and Priebe (2019) <arXiv:1911.07494>.
thresholdBS for obtaining change point estimates, tuneBSnonparRDPG for a tuning version.
### generate data
p = 20 # number of nodes
n = 50 # sample size for each segment
lat_dim_num = 5 # number of latent dimensions
set.seed(1)
x_mat = matrix(runif(p*lat_dim_num), nrow = p, ncol = lat_dim_num)
x_tilde_mat = matrix(runif(p*lat_dim_num), nrow = p, ncol = lat_dim_num)
y_mat = rbind(x_tilde_mat[1:floor(p/4),], x_mat[(floor(p/4)+1):p,])
rdpg1 = simu.RDPG(x_mat, n, symm = TRUE, self = FALSE)
rdpg2 = simu.RDPG(y_mat, n, symm = TRUE, self = FALSE)
data1_mat = rdpg1$obs_mat
data2_mat = rdpg2$obs_mat
data_mat = cbind(data1_mat, data2_mat)
### detect change points
M = 30 # number of random intervals for WBS
d = 10 # parameter for scaled PCA algorithm
delta = 5
intervals = WBS.intervals(M = M, lower = 1, upper = ncol(data_mat))
WBS_result = WBS.nonpar.RDPG(data_mat, lowerdiag = TRUE, d,
                             Alpha = intervals$Alpha, Beta = intervals$Beta, delta)
Perform wild binary segmentation for univariate nonparametric change point detection.
WBS.uni.nonpar(Y, s, e, Alpha, Beta, N, delta, level = 0)
Y |
A |
s |
A |
e |
A |
Alpha |
A |
Beta |
A |
N |
A |
delta |
A positive |
level |
Should be fixed as 0. |
A list with the following structure:
S |
A vector of estimated change points (sorted in strictly increasing order) |
Dval |
A vector of values of CUSUM statistic based on KS distance |
Level |
A vector representing the levels at which each change point is detected |
Parent |
A matrix with the starting indices on the first row and the ending indices on the second row |
Oscar Hernan Madrid Padilla, Haotian Xu
Padilla, Yu, Wang and Rinaldo (2021) <doi:10.1214/21-EJS1809>.
thresholdBS for obtaining change point estimates, tuneBSuninonpar for a tuning version.
Y = t(as.matrix(c(rnorm(100, 0, 1), rnorm(100, 0, 10), rnorm(100, 0, 40))))
N = rep(1, 300)
M = 20
intervals = WBS.intervals(M = M, lower = 1, upper = ncol(Y))
temp = WBS.uni.nonpar(Y, 1, 300, intervals$Alpha, intervals$Beta, N, 5)
plot.ts(t(Y))
points(x = tail(temp$S[order(temp$Dval)], 4), y = Y[,tail(temp$S[order(temp$Dval)],4)], col = "red")
thresholdBS(temp, 2)
Perform a robust version of the wild binary segmentation method using Huber loss.
WBS.uni.rob(y, s, e, Alpha, Beta, K = 1.345, delta, level = 0)
y |
A |
s |
A |
e |
A |
Alpha |
An |
Beta |
An |
K |
A |
delta |
A positive |
level |
Should be fixed as 0. |
An object of class "BS", which is a list with the following structure:
S |
A vector of estimated change points (sorted in strictly increasing order). |
Dval |
A vector of values of CUSUM statistic based on KS distance. |
Level |
A vector representing the levels at which each change point is detected. |
Parent |
A matrix with the starting indices on the first row and the ending indices on the second row. |
Mengchu Li & Haotian Xu
Fearnhead & Rigaill (2019) <doi:10.1080/01621459.2017.1385466>.
thresholdBS for obtaining change point estimates.
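No example accompanies WBS.uni.rob; the sketch below mirrors the WBS.univar example with heavy-tailed noise, and the threshold passed to thresholdBS is an illustrative guess rather than a recommended value.

set.seed(0)
cpt_true = c(100, 200)
y = rt(300, df = 3) + c(rep(0, 100), rep(3, 100), rep(0, 100)) # heavy-tailed noise
intervals = WBS.intervals(M = 100, lower = 1, upper = length(y))
temp = WBS.uni.rob(y, 1, length(y), intervals$Alpha, intervals$Beta, K = 1.345, delta = 5)
sort(thresholdBS(temp, tau = 3)$cpt_hat[,1]) # tau = 3 is an illustrative guess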
Perform wild binary segmentation for univariate mean change point detection.
WBS.univar(y, s, e, Alpha, Beta, delta = 2, level = 0)
y |
A |
s |
A |
e |
A |
Alpha |
A |
Beta |
A |
delta |
A positive |
level |
Should be fixed as 0. |
An object of class "BS", which is a list with the following structure:
S |
A vector of estimated change point locations (sorted in strictly increasing order). |
Dval |
A vector of values of CUSUM statistic. |
Level |
A vector representing the levels at which each change point is detected. |
Parent |
A matrix with the starting indices on the first row and the ending indices on the second row. |
Haotian Xu
Wang, Yu and Rinaldo (2020) <doi:10.1214/20-EJS1710>.
thresholdBS for obtaining change point estimates, tuneBSunivar for a tuning version.
set.seed(0)
cpt_true = c(20, 50, 170)
y = rnorm(300) + c(rep(0,20), rep(2,30), rep(0,120), rep(2,130))
intervals = WBS.intervals(M = 300, lower = 1, upper = length(y))
temp = WBS.univar(y, 1, length(y), intervals$Alpha, intervals$Beta, delta = 5)
plot.ts(y)
points(x = tail(temp$S[order(temp$Dval)],4), y = y[tail(temp$S[order(temp$Dval)],4)], col = "red")
WBS_result = thresholdBS(temp, tau = 4)
print(WBS_result$BS_tree, "value")
plot(WBS_result$BS_tree)
print(WBS_result$BS_tree_trimmed, "value")
plot(WBS_result$BS_tree_trimmed)
cpt_hat = sort(WBS_result$cpt_hat[,1]) # the threshold tau is specified to be 4
Hausdorff.dist(cpt_hat, cpt_true)
cpt_LR = local.refine.univar(cpt_hat, y)
Hausdorff.dist(cpt_LR, cpt_true)
Perform wild binary segmentation for covariance change point detection through independent projection.
WBSIP.cov(X, X_prime, s, e, Alpha, Beta, delta, level = 0)
X |
A |
X_prime |
A |
s |
A |
e |
A |
Alpha |
A |
Beta |
A |
delta |
A positive |
level |
A parameter for tracking the level at which a change point is detected. Should be fixed as 0. |
An object of class "BS", which is a list with the following structure:
S |
A vector of estimated change points (sorted in strictly increasing order) |
Dval |
A vector of values of CUSUM statistic based on KS distance |
Level |
A vector representing the levels at which each change point is detected |
Parent |
A matrix with the starting indices on the first row and the ending indices on the second row |
Haotian Xu
Wang, Yu and Rinaldo (2021) <doi:10.3150/20-BEJ1249>.
thresholdBS for obtaining change point estimates.
p = 10
A1 = gen.cov.mat(p, 1, "equal")
A2 = gen.cov.mat(p, 3, "power")
A3 = A1
set.seed(1234)
X = cbind(t(MASS::mvrnorm(50, mu = rep(0, p), A1)),
          t(MASS::mvrnorm(50, mu = rep(0, p), A2)),
          t(MASS::mvrnorm(50, mu = rep(0, p), A3)))
X_prime = cbind(t(MASS::mvrnorm(50, mu = rep(0, p), A1)),
                t(MASS::mvrnorm(50, mu = rep(0, p), A2)),
                t(MASS::mvrnorm(50, mu = rep(0, p), A3)))
intervals = WBS.intervals(M = 120, lower = 1, upper = dim(X)[2])
temp = WBSIP.cov(X, X_prime, 1, dim(X)[2], intervals$Alpha, intervals$Beta, delta = 5)
tau = sqrt(p*log(ncol(X)))*1.5
sort(thresholdBS(temp, tau)$cpt_hat[,1])