Title: | Multiple Testing of Local Extrema for Detection of Change Points |
---|---|
Description: | Simultaneously detect the number and locations of change points in piecewise linear models under stationary Gaussian noise allowing autocorrelated random noise. The core idea is to transform the problem of detecting change points into the detection of local extrema (local maxima and local minima)through kernel smoothing and differentiation of the data sequence, see Cheng et al. (2020) <doi:10.1214/20-EJS1751>. A low-computational and fast algorithm call 'dSTEM' is introduced to detect change points based on the 'STEM' algorithm in D. Cheng and A. Schwartzman (2017) <doi:10.1214/16-AOS1458>. |
Authors: | Zhibing He <[email protected]> |
Maintainer: | Zhibing He <[email protected]> |
License: | GPL-3 |
Version: | 2.0-1 |
Built: | 2024-12-03 06:54:13 UTC |
Source: | CRAN |
Compute convolution function using FFT, similar to 'conv'
in matlab
conv(u, v, shape = c("same", "full"))
conv(u, v, shape = c("same", "full"))
u |
numerical vector |
v |
numerical vector, don't need to have the same length as |
shape |
if 'same', return central part of the convolution and has the same size as |
a vector of convolution, as specified by shape.
Matlab document on 'conv'
: https://www.mathworks.com/help/matlab/ref/conv.html
u = c(-1,2,3,-2,0,1,2) v = c(2,4,-1,1) w = conv(u,v,'same')
u = c(-1,2,3,-2,0,1,2) v = c(2,4,-1,1) w = conv(u,v,'same')
Plot data sequence, the first and second-order derivatives, and their local extrema
cp.plt(x, order, icd.noise, H)
cp.plt(x, order, icd.noise, H)
x |
numerical vector of signal or signal-plus-noise data |
order |
order of derivative of data |
icd.noise |
logical value indicating if |
H |
optional, vector of change-point locations |
a plot
l = 1200 h = seq(150,by=150,length.out=6) jump = c(0,1.5,2,2.2,1.8,2,1.5)*3 beta1 = c(2,-1,2.5,-3,-0.2,2.5,-0.5)/50 signal = gen.signal(l,h,jump,beta1) noise = rnorm(length(signal),0,1) gamma = 25 sdata = smth.gau(signal+noise,gamma) dy = diff(sdata) ddy = diff(sdata,differences=2) cp.plt(signal,0,FALSE) points(signal+noise,col="grey") cp.plt(dy,1,H=h) cp.plt(ddy,2,H=h)
l = 1200 h = seq(150,by=150,length.out=6) jump = c(0,1.5,2,2.2,1.8,2,1.5)*3 beta1 = c(2,-1,2.5,-3,-0.2,2.5,-0.5)/50 signal = gen.signal(l,h,jump,beta1) noise = rnorm(length(signal),0,1) gamma = 25 sdata = smth.gau(signal+noise,gamma) dy = diff(sdata) ddy = diff(sdata,differences=2) cp.plt(signal,0,FALSE) points(signal+noise,col="grey") cp.plt(dy,1,H=h) cp.plt(ddy,2,H=h)
Multiple testing of change points for kernel smoothed data
cpTest( x, order, alpha, gamma, sigma, breaks, slope, untest, nu, is.constant, margin )
cpTest( x, order, alpha, gamma, sigma, breaks, slope, untest, nu, is.constant, margin )
x |
vector of kernel smoothed data |
order |
order of derivative of data |
alpha |
global significant level |
gamma |
bandwidth of Gaussian kernel |
sigma |
standard deviation of kernel smoothed noise |
breaks |
vector of rough estimate of change-point locations, only required when order is 1. |
slope |
vector of rough estimate of slopes associated with |
untest |
vector of locations unnecessary to test |
nu |
standard deviation of Gaussian kernel used to generate autocorrelated Gaussian noise, it is 0 if the noise is Gaussian white noise. |
is.constant |
logical value indicating if the signal is piecewise constant,
if TRUE, |
margin |
length of one period of data |
a list of estimated change-point locations and threshold for p-value
## piecewise linear signal l = 1200 h = seq(150,by=150,length.out=6) jump = rep(0,7) beta1 = c(2,-1,2.5,-3,-0.2,2.5)/50 beta1 = c(beta1,-sum(beta1*(c(h[1],diff(h))))/(l-tail(h,1))) signal = gen.signal(l,h,jump,beta1) noise = rnorm(length(signal),0,2) gamma = 25 sdata = smth.gau(signal+noise,gamma) ddy = diff(sdata,differences=2) model2 = cpTest(x=ddy,order=2,gamma=gamma,alpha=0.05) ## piecewise constant l = 1200 h = seq(150,by=150,length.out=6) jump = c(0,1.5,2,2.2,1.8,2,1.5) beta1 = rep(0,length(h)+1) signal = gen.signal(l,h,jump,beta1) noise = rnorm(length(signal),0,1) gamma = 25 sdata = smth.gau(signal+noise,gamma) dy = diff(sdata) model1 = cpTest(x=dy,order=1,alpha=0.05,gamma=gamma,is.constant=TRUE) ## piecewise linear with jump l = 1200 h = seq(150,by=150,length.out=6) jump = c(0,1.5,2,2.2,1.8,2,1.5)*3 beta1 = c(2,-1,2.5,-3,-0.2,2.5,-0.5)/50 signal = gen.signal(l=l,h=h,jump=jump,b1=beta1) noise = rnorm(length(signal),0,1) gamma = 25 sdata = smth.gau(signal+noise,gamma) dy = diff(sdata) ddy = diff(sdata,differences=2) model2 = cpTest(x=ddy,order=2,gamma=gamma,alpha=0.1) breaks = est.pair(vall=model2$vall,peak=model2$peak,gamma=gamma)$cp slope = est.slope(x=(signal+noise),breaks=breaks)
## piecewise linear signal l = 1200 h = seq(150,by=150,length.out=6) jump = rep(0,7) beta1 = c(2,-1,2.5,-3,-0.2,2.5)/50 beta1 = c(beta1,-sum(beta1*(c(h[1],diff(h))))/(l-tail(h,1))) signal = gen.signal(l,h,jump,beta1) noise = rnorm(length(signal),0,2) gamma = 25 sdata = smth.gau(signal+noise,gamma) ddy = diff(sdata,differences=2) model2 = cpTest(x=ddy,order=2,gamma=gamma,alpha=0.05) ## piecewise constant l = 1200 h = seq(150,by=150,length.out=6) jump = c(0,1.5,2,2.2,1.8,2,1.5) beta1 = rep(0,length(h)+1) signal = gen.signal(l,h,jump,beta1) noise = rnorm(length(signal),0,1) gamma = 25 sdata = smth.gau(signal+noise,gamma) dy = diff(sdata) model1 = cpTest(x=dy,order=1,alpha=0.05,gamma=gamma,is.constant=TRUE) ## piecewise linear with jump l = 1200 h = seq(150,by=150,length.out=6) jump = c(0,1.5,2,2.2,1.8,2,1.5)*3 beta1 = c(2,-1,2.5,-3,-0.2,2.5,-0.5)/50 signal = gen.signal(l=l,h=h,jump=jump,b1=beta1) noise = rnorm(length(signal),0,1) gamma = 25 sdata = smth.gau(signal+noise,gamma) dy = diff(sdata) ddy = diff(sdata,differences=2) model2 = cpTest(x=ddy,order=2,gamma=gamma,alpha=0.1) breaks = est.pair(vall=model2$vall,peak=model2$peak,gamma=gamma)$cp slope = est.slope(x=(signal+noise),breaks=breaks)
Detection of change points based on 'dSTEM' algorithm
dstem( data, type = c("I", "II-step", "II-linear", "mixture"), gamma = 20, alpha = 0.05 )
dstem( data, type = c("I", "II-step", "II-linear", "mixture"), gamma = 20, alpha = 0.05 )
data |
vector of data sequence |
type |
"I" if the change points are piecewise linear and continuous;
"II-step" if the change points are piecewise constant and noncontinuous;
"II-linear" if the change points are piecewise linear and noncontinuous;
"mixture" if both type I and type II change points are include in |
gamma |
bandwidth of Gaussian kernel |
alpha |
global significant level |
if type is 'mixture', the output is a list of type I and type II change points, otherwise, it is a list of change points
## piecewise linear signal l = 1200 h = seq(150,by=150,length.out=6) jump = rep(0,7) beta1 = c(2,-1,2.5,-3,-0.2,2.5)/50 beta1 = c(beta1,-sum(beta1*(c(h[1],diff(h))))/(l-tail(h,1))) signal = gen.signal(l,h,jump,beta1) noise = rnorm(length(signal),0,1) gamma = 25 model = dstem(signal + noise,"I",gamma=gamma,alpha=0.05) ## piecewise constant l = 1200 h = seq(150,by=150,length.out=6) jump = c(0,1.5,2,2.2,1.8,2,1.5) beta1 = rep(0,length(h)+1) signal = gen.signal(l,h,jump,beta1) noise = rnorm(length(signal),0,1) gamma = 25 model = dstem(signal + noise, "II-step",gamma,alpha=0.05) ## piecewise linear with jump l = 1200 h = seq(150,by=150,length.out=6) jump = c(0,1.5,2,2.2,1.8,2,1.5)*3 beta1 = c(2,-1,2.5,-3,-0.2,2.5,-0.5)/50 signal = gen.signal(l=l,h=h,jump=jump,b1=beta1) noise = rnorm(length(signal),0,1) gamma = 25 model = dstem(signal + noise, "II-linear",gamma,alpha=0.05)
## piecewise linear signal l = 1200 h = seq(150,by=150,length.out=6) jump = rep(0,7) beta1 = c(2,-1,2.5,-3,-0.2,2.5)/50 beta1 = c(beta1,-sum(beta1*(c(h[1],diff(h))))/(l-tail(h,1))) signal = gen.signal(l,h,jump,beta1) noise = rnorm(length(signal),0,1) gamma = 25 model = dstem(signal + noise,"I",gamma=gamma,alpha=0.05) ## piecewise constant l = 1200 h = seq(150,by=150,length.out=6) jump = c(0,1.5,2,2.2,1.8,2,1.5) beta1 = rep(0,length(h)+1) signal = gen.signal(l,h,jump,beta1) noise = rnorm(length(signal),0,1) gamma = 25 model = dstem(signal + noise, "II-step",gamma,alpha=0.05) ## piecewise linear with jump l = 1200 h = seq(150,by=150,length.out=6) jump = c(0,1.5,2,2.2,1.8,2,1.5)*3 beta1 = c(2,-1,2.5,-3,-0.2,2.5,-0.5)/50 signal = gen.signal(l=l,h=h,jump=jump,b1=beta1) noise = rnorm(length(signal),0,1) gamma = 25 model = dstem(signal + noise, "II-linear",gamma,alpha=0.05)
Identify pairwise local maxima and local minima of the second-order derivative
est.pair(vall, peak, gamma)
est.pair(vall, peak, gamma)
vall |
vector of locations of significant local minima |
peak |
vector of locations of significant local maxima |
gamma |
bandwidth of Gaussian kernel smoothing function |
a list of detected pairs and detected change-point locations through second-order derivative testing
Estimate variance of smoothed Gaussian noise through its second-order derivative
est.sigma2(x, gamma, k = 0.5)
est.sigma2(x, gamma, k = 0.5)
x |
numerical vector of second-order derivative of kernel smoothed data |
gamma |
bandwidth of Gaussian kernel |
k |
numerical value, local maxima (minima) are presumed beyond |
value of estimated variance of smoothed noise
l=15000; h = seq(150,l,150) jump = rep(0,length(h)+1); b1 = seq(from=0,by=0.15,length = length(h)+1) signal = gen.signal(l,h,jump,b1) data = signal + rnorm(length(signal),0,1) # standard white noise gamma = 10 ddy = diff(smth.gau(data,gamma),differences=2) est.sigma2(ddy,gamma,k=0.5) # true value is \eqn{\frac{1}{2\sqrt{pi}\gamma}}
l=15000; h = seq(150,l,150) jump = rep(0,length(h)+1); b1 = seq(from=0,by=0.15,length = length(h)+1) signal = gen.signal(l,h,jump,b1) data = signal + rnorm(length(signal),0,1) # standard white noise gamma = 10 ddy = diff(smth.gau(data,gamma),differences=2) est.sigma2(ddy,gamma,k=0.5) # true value is \eqn{\frac{1}{2\sqrt{pi}\gamma}}
Estimate piecewise slope for piecewise linear model
est.slope(x, breaks)
est.slope(x, breaks)
x |
numerical vector of signal-plus-noise data |
breaks |
numerical vector of change-point locations |
a vector of estimated piecewise slope
Compute TPR and FPR
Fdr(uh, th, b)
Fdr(uh, th, b)
uh |
numerical vector of estimated change point locations |
th |
numerical vector of true change point locations |
b |
location tolerance, usually specified as the bandwidth |
a dataframe of FDR
(FPR) and Power
(TPR)
Compute FDR threshold based on Benjamini-Hochberg (BH) algorithm
fdrBH(p, q)
fdrBH(p, q)
p |
a vector of p-values |
q |
False Discovery Rate level |
p-value threshold based on independence or positive dependence
fdrBH(seq(0.01,0.1,0.01),q=0.1)
fdrBH(seq(0.01,0.1,0.01),q=0.1)
Generate simulated signals
gen.signal(l, h, jump, b1, rep = 1, shift = 0)
gen.signal(l, h, jump, b1, rep = 1, shift = 0)
l |
length of data, if data is periodic then the length in each period |
h |
numerical vector of true change point locations |
jump |
numerical vector of jump size at change point locations |
b1 |
numerical vector of piecewise slopes |
rep |
number of periods if data is periodic, default is 1 |
shift |
numerical vector of vertical shifts for each period, default is 0 |
a vector of simulated signal
l = 1200 h = seq(150,by=150,length.out=6) jump = rep(0,7) beta1 = c(2,-1,2.5,-3,-0.2,2.5)/50 beta1 = c(beta1,-sum(beta1*(c(h[1],diff(h))))/(l-tail(h,1))) signal = gen.signal(l,h,jump,beta1)
l = 1200 h = seq(150,by=150,length.out=6) jump = rep(0,7) beta1 = c(2,-1,2.5,-3,-0.2,2.5)/50 beta1 = c(beta1,-sum(beta1*(c(h[1],diff(h))))/(l-tail(h,1))) signal = gen.signal(l,h,jump,beta1)
A subset of daily stock price data of HST from November 7, 2011 to November 5, 2021.
HST_stock
HST_stock
A data frame with 2,517 rows and 6 columns:
date from November 7, 2011 to November 5, 2021
stock price
stock exchange volume
<https://finance.yahoo.com/quote/HST>
Smoothing data using Gaussian kernel
smth.gau(x, gamma)
smth.gau(x, gamma)
x |
numeric vector of values to smooth |
gamma |
bandwidth of Gaussian kernel |
vector of smoothed values
smth.gau(x=rnorm(1000), gamma=20)
smth.gau(x=rnorm(1000), gamma=20)
Compute SNR of a certain change point location
snr(order, gamma, is.jump, jump, diffb, addb)
snr(order, gamma, is.jump, jump, diffb, addb)
order |
order of derivative of data |
gamma |
bandwidth of Gaussian kernel |
is.jump |
logical value indicating if the location to be calculated is a jump point |
jump |
jump height |
diffb |
difference of the slopes on left and right sides of the location |
addb |
sum of the slopes, only used when order is 1 |
value of SNR
Find local maxima and local minima of data sequence
which.peaks(x, partial = FALSE, decreasing = FALSE)
which.peaks(x, partial = FALSE, decreasing = FALSE)
x |
numerical vector contains local maxima (minima) |
partial |
logical value indicating if the two endpoints will be considered |
decreasing |
logical value indicating whether to find local minima |
a vector of locations of local maxima or minima
a = 100:1 which.peaks(a*sin(a/3))
a = 100:1 which.peaks(a*sin(a/3))