Title: | Some Nonparametric CUSUM Tests for Change-Point Detection in Possibly Multivariate Observations |
---|---|
Description: | Provides nonparametric CUSUM tests for detecting changes in possibly serially dependent univariate or low-dimensional multivariate observations. Retrospective tests sensitive to changes in the expectation, the variance, the covariance, the autocovariance, the distribution function, Spearman's rho, Kendall's tau, Gini's mean difference, and the copula are provided, as well as a test for detecting changes in the distribution of independent block maxima (with environmental studies in mind). The package also contains a test sensitive to changes in the autocopula and a combined test of stationarity sensitive to changes in the distribution function and the autocopula. The latest additions are an open-end sequential test based on the retrospective CUSUM statistic that can be used for monitoring changes in the mean of possibly serially dependent univariate observations, as well as closed-end and open-end sequential tests based on empirical distribution functions that can be used for monitoring changes in the contemporary distribution of possibly serially dependent univariate or low-dimensional multivariate observations. |
Authors: | Ivan Kojadinovic [aut, cre], Alex Verhoijsen [ctb] |
Maintainer: | Ivan Kojadinovic <[email protected]> |
License: | GPL (>= 3) | file LICENCE |
Version: | 0.2-6 |
Built: | 2024-12-18 06:27:41 UTC |
Source: | CRAN |
In the context of the standard CUSUM test based on the sample mean or in a particular empirical process setting, the following functions estimate the bandwidth parameter controlling the serial dependence when generating dependent multiplier sequences using the 'moving average approach'; see Section 5 of the third reference. The function bOpt() is called in the functions cpMean(), cpVar(), cpGini(), cpAutocov(), cpCov(), cpTau() and detOpenEndCpMean() when b is set to NULL. The function bOptEmpProc() is called in the functions cpDist(), cpCopula(), cpAutocop(), stDistAutocop() and simClosedEndCpDist() when b is set to NULL.
bOpt(influ, weights = c("parzen", "bartlett"))

bOptEmpProc(x, m = 5, weights = c("parzen", "bartlett"),
            L.method = c("max", "median", "mean", "min"))
influ |
a numeric containing the relevant influence coefficients, which, in the case of the standard CUSUM test based on the sample mean, are simply the available observations; see also the last reference. |
x |
a data matrix whose rows are continuous observations. |
weights |
a string specifying the kernel for creating the weights used in the generation of dependent multiplier sequences within the 'moving average approach'; see Section 5 of the third reference. |
m |
a strictly positive integer specifying the number of points of the uniform grid involved in the estimation of the bandwidth parameter. |
L.method |
a string specifying how the parameter L involved in the generation of dependent multiplier sequences is computed. |
The implemented approach results from an adaptation of the procedure described in the first two references (see also the references therein). The use of these functions in a context different from that considered in the third or fourth reference may not be meaningful.
Acknowledgment: Part of the code of the function results from an adaptation of R code of C. Parmeter and J. Racine, itself an adaptation of Matlab code by A. Patton.
A strictly positive integer.
D.N. Politis and H. White (2004), Automatic block-length selection for the dependent bootstrap, Econometric Reviews 23(1), pages 53-70.
D.N. Politis, H. White and A.J. Patton (2004), Correction: Automatic block-length selection for the dependent bootstrap, Econometric Reviews 28(4), pages 372-375.
A. Bücher and I. Kojadinovic (2016), A dependent multiplier bootstrap for the sequential empirical copula process under strong mixing, Bernoulli 22:2, pages 927-968, https://arxiv.org/abs/1306.3930.
A. Bücher and I. Kojadinovic (2016), Dependent multiplier bootstraps for non-degenerate U-statistics under mixing conditions with applications, Journal of Statistical Planning and Inference 170 pages 83-105, https://arxiv.org/abs/1412.5875.
cpDist(), cpCopula(), cpAutocop(), stDistAutocop(), cpMean(), cpVar(), cpGini(), cpAutocov(), cpCov(), cpTau(), seqOpenEndCpMean and seqClosedEndCpDist.
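As an illustration, here is a minimal sketch (not part of the original documentation) showing how the two estimators could be called directly on a serially dependent series; in practice they are invoked automatically by the change-point tests when b is set to NULL.

## Minimal sketch: direct use of the bandwidth estimators
## (normally called internally when b = NULL)
set.seed(123)
x <- as.numeric(arima.sim(list(ar = 0.3), n = 200)) # a serially dependent series
bOpt(x)                # for the CUSUM test based on the mean, the influence
                       # coefficients are simply the observations
bOptEmpProc(matrix(x)) # empirical-process version on a one-column data matrix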
Nonparametric test for change-point detection particularly sensitive to changes in the autocopula of univariate continuous observations. Approximate p-values for the test statistic are obtained by means of a multiplier approach. Details can be found in the first reference.
cpAutocop(x, lag = 1, b = NULL, bivariate = FALSE,
          weights = c("parzen", "bartlett"), m = 5, N = 1000,
          init.seq = NULL, include.replicates = FALSE)
x |
a one-column matrix containing continuous observations. |
lag |
an integer specifying at which lag to consider the
autocopula; the autocopula is a ( |
b |
strictly positive integer specifying the value of the
bandwidth parameter determining the serial dependence when
generating dependent multiplier sequences using the 'moving average
approach'; see Section 5 of the second reference. If set to
|
bivariate |
a logical specifying whether the test should focus
only on the bivariate margin of the ( |
weights |
a string specifying the kernel for creating the weights used in the generation of dependent multiplier sequences within the 'moving average approach'; see Section 5 of the second reference. |
m |
a strictly positive integer specifying the number of points of the uniform grid involved in the estimation of the bandwidth parameter. |
N |
number of multiplier replications. |
init.seq |
a sequence of independent standard normal variates of
length |
include.replicates |
a logical specifying whether the
object of |
The approximate p-value is computed as (0.5 + sum_{i=1}^N 1{S_i >= S}) / (N + 1), where S and S_i denote the test statistic and the i-th multiplier replication, respectively. This ensures that the approximate p-value is a number strictly between 0 and 1, which is sometimes necessary for further treatments.
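In R terms, and assuming that replicates denotes the vector of the N multiplier replicates and stat the value of the test statistic (both are illustrative names, not package objects), the computation above amounts to the following sketch:

## Sketch of the p-value computation described above
## ('replicates' and 'stat' are illustrative names)
pval <- (0.5 + sum(replicates >= stat)) / (length(replicates) + 1)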
An object of class
htest
which is a list,
some of the components of which are
statistic |
value of the test statistic. |
p.value |
corresponding approximate p-value. |
cvm |
the values of the |
b |
the value of parameter |
This is a test for continuous univariate time series.
A. Bücher, J.-D. Fermanian and I. Kojadinovic (2019), Combining cumulative sum change-point detection tests for assessing the stationarity of univariate time series, Journal of Time Series Analysis 40, pages 124-150, https://arxiv.org/abs/1709.02673.
A. Bücher and I. Kojadinovic (2016), A dependent multiplier bootstrap for the sequential empirical copula process under strong mixing, Bernoulli 22:2, pages 927-968, https://arxiv.org/abs/1306.3930.
cpAutocov() for a related test based on the autocovariance.
## AR1 example
n <- 200
k <- n/2 ## the true change-point
x <- matrix(c(arima.sim(list(ar = -0.5), n = k),
              arima.sim(list(ar = 0.5), n = n - k)))
cp <- cpAutocop(x)
cp
## Estimated change-point
which(cp$cvm == max(cp$cvm))

## AR2 example
n <- 200
k <- n/2 ## the true change-point
x <- matrix(c(arima.sim(list(ar = c(0,-0.5)), n = k),
              arima.sim(list(ar = c(0,0.5)), n = n - k)))
cpAutocop(x)
cpAutocop(x, lag = 2)
cpAutocop(x, lag = 2, bivariate = TRUE)
Nonparametric tests for change-point detection in the distribution of independent block maxima based either on the probability weighted moment method (see the second reference) or on the generalized probability weighted moment method (see the first reference) for estimating the parameters of the generalized extreme value (GEV) distribution. It is assumed that the block maxima are independent and that their unknown distribution functions (d.f.s) are continuous, but not necessarily that they are GEV distributed. Three statistics are computed. Under the assumption that the block maxima are GEV distributed, these are statistics particularly sensitive to changes in the location, scale and shape parameters of the GEV. Details can be found in the third reference.
cpBlockMax(x, method = c("pwm", "gpwm"), r = 10)
x |
a numeric vector representing independent block maxima whose unknown d.f.s are assumed continuous. |
method |
a string specifying how statistics will be defined; can
be either |
r |
strictly positive integer specifying the set of breakpoints
that will be tested; more precisely, starting from the initial
sample of block maxima, the tests compare subsamples formed by the
|
Approximate p-values are computed from the estimated asymptotic null distributions, which involve the Kolmogorov distribution. The latter is dealt with by reusing code from the ks.test() function; credit to R Core.
An object of class
htest
which is a list,
some of the components of which are
statistic |
value of the three test statistics. |
pvalues |
corresponding approximate p-values. |
stats.loc |
the values of the |
stats.scale |
the values of the |
stats.shape |
the values of the |
The tests were derived under the assumption of block maxima with continuous d.f., which implies that ties occur with probability zero. A way to deal with ties based on randomization is proposed in the third reference.
J. Diebolt, A. Guillou, P. Naveau and P. Ribereau (2008), Improving probability-weighted moment methods for the generalized extreme-value distribution, REVSTAT 6, pages 33-50.
J.R.M. Hosking, J.R. Wallis and E.F. Wood (1985), Estimation of the generalized extreme-value distribution by the method of probability-weighted moments, Technometrics 27, pages 251-261.
I. Kojadinovic and P. Naveau (2017), Nonparametric tests for change-point detection in the distribution of block maxima based on probability weighted moments, Extremes 20:2, pages 417-450, https://arxiv.org/abs/1507.06121.
cpDist() for a related test based on the empirical d.f.
## Not run:
require(evd)
n <- 100
k <- 50 ## the true change-point

## Change in the shape parameter of a GEV
x <- rgev(k, loc = 0, scale = 1, shape = -0.8)
y <- rgev(k, loc = 0, scale = 1, shape = 0.4)
cp <- cpBlockMax(c(x, y))
cp
## Estimated change-point
which(cp$stats.shape == max(cp$stats.shape))

## Change in the scale parameter of a GEV
x <- rgev(k, loc = 0, scale = 0.5, shape = 0)
y <- rgev(k, loc = 0, scale = 1, shape = 0)
cp <- cpBlockMax(c(x, y))
cp
## Estimated change-point
which(cp$stats.scale == max(cp$stats.scale))

## Change in the location parameter of a GEV
x <- rgev(k, loc = 0, scale = 1, shape = 0)
y <- rgev(k, loc = 0.5, scale = 1, shape = 0)
cp <- cpBlockMax(c(x, y))
cp
## Estimated change-point
which(cp$stats.loc == max(cp$stats.loc))
## End(Not run)
Nonparametric test for change-point detection particularly sensitive to changes in the copula of multivariate continuous observations. The observations can be serially independent or dependent (strongly mixing). Approximate p-values for the test statistic are obtained by means of a multiplier approach. Details can be found in the first reference.
cpCopula(x, method = c("seq", "nonseq"), b = NULL,
         weights = c("parzen", "bartlett"), m = 5,
         L.method = c("max", "median", "mean", "min"),
         N = 1000, init.seq = NULL, include.replicates = FALSE)
x |
a data matrix whose rows are multivariate continuous observations. |
method |
a string specifying the simulation method for
generating multiplier replicates of the test statistic;
can be either |
b |
strictly positive integer specifying the value of the
bandwidth parameter determining the serial dependence when
generating dependent multiplier sequences using the 'moving average
approach'; see Section 5 of the second reference. The
value 1 will create i.i.d. multiplier
sequences suitable for serially independent observations. If set to
|
weights |
a string specifying the kernel for creating the weights used in the generation of dependent multiplier sequences within the 'moving average approach'; see Section 5 of the second reference. |
m |
a strictly positive integer specifying the number of points of the uniform grid involved in the estimation of the bandwidth parameter. |
L.method |
a string specifying how the parameter L involved in the generation of dependent multiplier sequences is computed. |
N |
number of multiplier replications. |
init.seq |
a sequence of independent standard normal variates of
length |
include.replicates |
a logical specifying whether the
object of |
The approximate p-value is computed as (0.5 + sum_{i=1}^N 1{S_i >= S}) / (N + 1), where S and S_i denote the test statistic and the i-th multiplier replication, respectively. This ensures that the approximate p-value is a number strictly between 0 and 1, which is sometimes necessary for further treatments.
An object of class
htest
which is a list,
some of the components of which are
statistic |
value of the test statistic. |
p.value |
corresponding approximate p-value. |
cvm |
the values of the |
b |
the value of parameter |
These tests were derived under the assumption of continuous margins.
A. Bücher, I. Kojadinovic, T. Rohmer and J. Segers (2014), Detecting changes in cross-sectional dependence in multivariate time series, Journal of Multivariate Analysis 132, pages 111-128, https://arxiv.org/abs/1206.2557.
A. Bücher and I. Kojadinovic (2016), A dependent multiplier bootstrap for the sequential empirical copula process under strong mixing, Bernoulli 22:2, pages 927-968, https://arxiv.org/abs/1306.3930.
cpRho() for a related test based on Spearman's rho, cpTau() for a related test based on Kendall's tau, cpDist() for a related test based on the multivariate empirical d.f., bOptEmpProc() for the function used to estimate b from x if b = NULL.
## Not run:
require(copula)
n <- 100
k <- 50 ## the true change-point
u <- rCopula(k, gumbelCopula(1.5))
v <- rCopula(n - k, gumbelCopula(3))
x <- rbind(u, v)
cp <- cpCopula(x, b = 1)
cp
## Estimated change-point
which(cp$cvm == max(cp$cvm))
## End(Not run)
Nonparametric test for change-point detection based on the (multivariate) empirical distribution function. The observations can be continuous univariate or multivariate, and serially independent or dependent (strongly mixing). Approximate p-values for the test statistics are obtained by means of a multiplier approach. The first reference treats the serially independent case while details about the serially dependent case can be found in the second and third references.
cpDist(x, statistic = c("cvmmax", "cvmmean", "ksmax", "ksmean"),
       method = c("nonseq", "seq"), b = NULL, gamma = 0, delta = 1e-4,
       weights = c("parzen", "bartlett"), m = 5,
       L.method = c("max", "median", "mean", "min"),
       N = 1000, init.seq = NULL, include.replicates = FALSE)
x |
a data matrix whose rows are continuous observations. |
statistic |
a string specifying the statistic whose value and
p-value will be displayed; can be either |
method |
a string specifying the simulation method for
generating multiplier replicates of the test statistic;
can be either |
b |
strictly positive integer specifying the value of the
bandwidth parameter determining the serial dependence when
generating dependent multiplier sequences using the 'moving average
approach'; see Section 5 of the second reference. The
value 1 will create i.i.d. multiplier
sequences suitable for serially independent observations. If set to
|
gamma |
parameter between 0 and 0.5 appearing in the definition of the weight function used in the detector function. |
delta |
parameter between 0 and 1 appearing in the definition of the weight function used in the detector function. |
weights |
a string specifying the kernel for creating the weights used in the generation of dependent multiplier sequences within the 'moving average approach'; see Section 5 of the second reference. |
m |
a strictly positive integer specifying the number of points of the uniform grid involved in the estimation of the bandwidth parameter. |
L.method |
a string specifying how the parameter L involved in the generation of dependent multiplier sequences is computed. |
N |
number of multiplier replications. |
init.seq |
a sequence of independent standard normal variates of
length |
include.replicates |
a logical specifying whether the
object of |
The approximate p-value is computed as (0.5 + sum_{i=1}^N 1{S_i >= S}) / (N + 1), where S and S_i denote the test statistic and the i-th multiplier replication, respectively. This ensures that the approximate p-value is a number strictly between 0 and 1, which is sometimes necessary for further treatments.
An object of class
htest
which is a list,
some of the components of which are
statistic |
value of the test statistic. |
p.value |
corresponding approximate p-value. |
cvm |
the values of the |
ks |
the values of the |
all.statistics |
the values of all four test statistics. |
all.p.values |
the corresponding p-values. |
b |
the value of parameter |
Note that when the observations are continuous univariate and serially independent, independent realizations of the test statistics under the null hypothesis of no change in the distribution can be obtained by simulation; see Section 4 in the first reference.
M. Holmes, I. Kojadinovic and J-F. Quessy (2013), Nonparametric tests for change-point detection à la Gombay and Horváth, Journal of Multivariate Analysis 115, pages 16-32.
A. Bücher and I. Kojadinovic (2016), A dependent multiplier bootstrap for the sequential empirical copula process under strong mixing, Bernoulli 22:2, pages 927-968, https://arxiv.org/abs/1306.3930.
A. Bücher, J.-D. Fermanian and I. Kojadinovic (2019), Combining cumulative sum change-point detection tests for assessing the stationarity of univariate time series, Journal of Time Series Analysis 40, pages 124-150, https://arxiv.org/abs/1709.02673.
cpCopula() for a related test based on the empirical copula, cpRho() for a related test based on Spearman's rho, cpTau() for a related test based on Kendall's tau, bOptEmpProc() for the function used to estimate b from x if b = NULL, seqClosedEndCpDist for the corresponding sequential test.
## A univariate example
n <- 100
k <- 50 ## the true change-point
y <- rnorm(k)
z <- rexp(n - k)
x <- matrix(c(y, z))
cp <- cpDist(x, b = 1)
cp
## All statistics
cp$all.statistics
## Corresponding p.values
cp$all.p.values
## Estimated change-point
which(cp$cvm == max(cp$cvm))
which(cp$ks == max(cp$ks))

## A very artificial trivariate example
## with a break in the first margin
n <- 100
k <- 50 ## the true change-point
y <- rnorm(k)
z <- rnorm(n - k, mean = 2)
x <- cbind(c(y, z), matrix(rnorm(2 * n), n, 2))
cp <- cpDist(x, b = 1)
cp
## All statistics
cp$all.statistics
## Corresponding p.values
cp$all.p.values
## Estimated change-point
which(cp$cvm == max(cp$cvm))
which(cp$ks == max(cp$ks))
Nonparametric test for change-point detection particularly sensitive to changes in Spearman's rho in multivariate time series. The observations can be serially independent or dependent (strongly mixing). Approximate p-values for the test statistic are obtained by means of a multiplier approach or by estimating the asymptotic null distribution. Details can be found in the first reference.
cpRho(x, method = c("mult", "asym.var"), statistic = c("pairwise", "global"),
      b = NULL, weights = c("parzen", "bartlett"), N = 1000,
      init.seq = NULL, include.replicates = FALSE)
x |
a data matrix whose rows are multivariate continuous observations. |
method |
a string specifying the method for computing the
approximate p-value for the test statistic; can be either
|
statistic |
a string specifying the test statistic; can be either
|
b |
strictly positive integer specifying the value of the
bandwidth parameter determining the serial dependence when
generating dependent multiplier sequences using the 'moving average
approach'; see Section 5 of the second reference. The value 1
will create i.i.d. multiplier
sequences suitable for serially independent observations. If set to
|
weights |
a string specifying the kernel for creating the weights used in the generation of dependent multiplier sequences within the 'moving average approach'; see Section 5 of the second reference. |
N |
number of multiplier replications. |
init.seq |
a sequence of independent standard normal variates of
length |
include.replicates |
a logical specifying whether the
object of |
When method == "mult", the approximate p-value is computed as (0.5 + sum_{i=1}^N 1{S_i >= S}) / (N + 1), where S and S_i denote the test statistic and the i-th multiplier replication, respectively. This ensures that the approximate p-value is a number strictly between 0 and 1, which is sometimes necessary for further treatments.
When method == "asym.var", the approximate p-value is computed from the estimated asymptotic null distribution, which involves the Kolmogorov distribution. The latter is dealt with by reusing code from the ks.test() function; credit to R Core.
An object of class
htest
which is a list,
some of the components of which are
statistic |
value of the test statistic. |
p.value |
corresponding approximate p-value. |
rho |
the values of the |
b |
the value of parameter |
These tests were derived under the assumption of continuous margins.
I. Kojadinovic, J-F. Quessy and T. Rohmer (2016), Testing the constancy of Spearman's rho in multivariate time series, Annals of the Institute of Statistical Mathematics 68:5, pages 929-954, https://arxiv.org/abs/1407.1624.
A. Bücher and I. Kojadinovic (2016), A dependent multiplier bootstrap for the sequential empirical copula process under strong mixing, Bernoulli 22:2, pages 927-968, https://arxiv.org/abs/1306.3930.
cpTau() for a related test based on Kendall's tau, cpDist() for a related test based on the multivariate empirical d.f., cpCopula() for a related test based on the empirical copula.
## Not run:
require(copula)
n <- 100
k <- 50 ## the true change-point
u <- rCopula(k, gumbelCopula(1.5))
v <- rCopula(n - k, gumbelCopula(3))
x <- rbind(u, v)
cp <- cpRho(x, b = 1)
cp
## Estimated change-point
which(cp$rho == max(cp$rho))
## End(Not run)
Nonparametric CUSUM tests for change-point detection particularly sensitive to changes in certain quantities that can be estimated using one-sample U-statistics of order one or two. So far, the quantities under consideration are the expectation (thus corresponding to the standard CUSUM test based on the sample mean), the variance, Gini's mean difference, the autocovariance at a specified lag, the covariance for bivariate data and Kendall's tau for multivariate data. The observations can be serially independent or dependent (strongly mixing). Approximate p-values for the test statistic are obtained by means of a multiplier approach or by estimating the asymptotic null distribution. Details can be found in the first reference.
cpMean(x, method = c("nonseq", "seq", "asym.var"), b = NULL,
       weights = c("parzen", "bartlett"), N = 1000,
       init.seq = NULL, include.replicates = FALSE)

cpVar(x, method = c("nonseq", "seq", "asym.var"), b = NULL,
      weights = c("parzen", "bartlett"), N = 1000,
      init.seq = NULL, include.replicates = FALSE)

cpGini(x, method = c("nonseq", "seq", "asym.var"), b = NULL,
       weights = c("parzen", "bartlett"), N = 1000,
       init.seq = NULL, include.replicates = FALSE)

cpAutocov(x, lag = 1, method = c("nonseq", "seq", "asym.var"), b = NULL,
          weights = c("parzen", "bartlett"), N = 1000,
          init.seq = NULL, include.replicates = FALSE)

cpCov(x, method = c("nonseq", "seq", "asym.var"), b = NULL,
      weights = c("parzen", "bartlett"), N = 1000,
      init.seq = NULL, include.replicates = FALSE)

cpTau(x, method = c("seq", "nonseq", "asym.var"), b = NULL,
      weights = c("parzen", "bartlett"), N = 1000,
      init.seq = NULL, include.replicates = FALSE)
x |
a numeric vector or a data matrix containing continuous observations. |
lag |
an integer specifying at which lag to consider the autocovariance. |
method |
a string specifying the method for computing the
approximate p-value for the test statistic; can be either
|
b |
strictly positive integer specifying the value of the
bandwidth parameter determining the serial dependence when
generating dependent multiplier sequences using the 'moving average
approach'; see Section 5 of the second reference. The value 1
will create i.i.d. multiplier
sequences suitable for serially independent observations. If set to
|
weights |
a string specifying the kernel for creating the weights used in the generation of dependent multiplier sequences within the 'moving average approach'; see Section 5 of the second reference. |
N |
number of multiplier replications. |
init.seq |
a sequence of independent standard normal variates of
length |
include.replicates |
a logical specifying whether the
object of |
When method is either "seq" or "nonseq", the approximate p-value is computed as (0.5 + sum_{i=1}^N 1{S_i >= S}) / (N + 1), where S and S_i denote the test statistic and the i-th multiplier replication, respectively. This ensures that the approximate p-value is a number strictly between 0 and 1, which is sometimes necessary for further treatments.
When method = "asym.var", the approximate p-value is computed from the estimated asymptotic null distribution, which involves the Kolmogorov distribution. The latter is dealt with by reusing code from the ks.test() function; credit to R Core.
An object of class
htest
which is a list,
some of the components of which are
statistic |
value of the test statistic. |
p.value |
corresponding approximate p-value. |
u |
the values of the |
b |
the value of parameter |
A. Bücher and I. Kojadinovic (2016), Dependent multiplier bootstraps for non-degenerate U-statistics under mixing conditions with applications, Journal of Statistical Planning and Inference 170, pages 83-105, https://arxiv.org/abs/1412.5875.
A. Bücher and I. Kojadinovic (2016), A dependent multiplier bootstrap for the sequential empirical copula process under strong mixing, Bernoulli 22:2, pages 927-968, https://arxiv.org/abs/1306.3930.
cpDist() for a related test based on the multivariate empirical d.f., cpCopula() for a related test based on the empirical copula, cpAutocop() for a related test based on the empirical autocopula, cpRho() for a related test based on Spearman's rho, bOpt() for the function used to estimate b from x if b = NULL and seqOpenEndCpMean for related sequential tests that can be used for online monitoring.
## The standard CUSUM test based on the sample mean
cp <- cpMean(c(rnorm(50), rnorm(50, mean = 1)), b = 1)
cp
## Estimated change-point
which(cp$statistics == cp$statistic)

## Testing for changes in the autocovariance
n <- 200
k <- n/2 ## the true change-point
x <- c(arima.sim(list(ar = -0.5), n = k),
       arima.sim(list(ar = 0.5), n = n - k))
cp <- cpAutocov(x)
cp
## Estimated change-point
which(cp$u == cp$statistic)

## Another example
x <- c(arima.sim(list(ar = c(0,-0.5)), n = k),
       arima.sim(list(ar = c(0,0.5)), n = n - k))
cpAutocov(x)
cp <- cpAutocov(x, lag = 2)
cp
## Estimated change-point
which(cp$u == cp$statistic)

## Not run:
## Testing for changes in Kendall's tau
require(copula)
n <- 100
k <- 50 ## the true change-point
u <- rCopula(k, gumbelCopula(1.5))
v <- rCopula(n - k, gumbelCopula(3))
x <- rbind(u, v)
cp <- cpTau(x)
cp
## Estimated change-point
which(cp$u == cp$statistic)

## Testing for changes in the covariance
cp <- cpCov(x)
cp
## Estimated change-point
which(cp$u == cp$statistic)
## End(Not run)
Estimated quantiles for the open-end nonparametric sequential change-point detection tests described in seqOpenEndCpMean and seqOpenEndCpDist. More details can be found in the references below.
data("quantiles")
data("quantiles")
A list of 6 elements. The first 5 are arrays containing the estimated 90%, 95% and 99% quantiles necessary for carrying out the sequential tests described in seqOpenEndCpMean. The last element is a list containing the estimated 90%, 95% and 99% quantiles as well as other estimated parameters necessary for carrying out the sequential test described in seqOpenEndCpDist.
J. Gösmann, T. Kley and H. Dette (2021), A new approach for open-end sequential change point monitoring, Journal of Time Series Analysis 42:1, pages 63-84, https://arxiv.org/abs/1906.03225.
M. Holmes and I. Kojadinovic (2021), Open-end nonparametric sequential change-point detection based on the retrospective CUSUM statistic, Electronic Journal of Statistics 15:1, pages 2288-2335, doi:10.1214/21-EJS1840.
L. Horváth, M. Hušková, P. Kokoszka and J. Steinebach (2004). Monitoring changes in linear models. Journal of Statistical Planning and Inference 126, pages 225-251.
M. Holmes, I. Kojadinovic and A. Verhoijsen (2022), Multi-purpose open-end monitoring procedures for multivariate observations based on the empirical distribution function, 45 pages, https://arxiv.org/abs/2201.10311.
data("quantiles") str(quantiles)
data("quantiles") str(quantiles)
Returns a matrix of ‘representative’ points.
selectPoints(x, r, kappa = 1.5, plot = FALSE)
x |
a numeric matrix with |
r |
integer specifying the size of an initial uniformly-spaced
grid ‘on the probability scale’; an upper bound for the number of
selected points is |
kappa |
numeric constant required to be strictly greater than one involved in the point selection procedure. |
plot |
logical used only if |
The selection procedure is described in detail in Section 3.2 of the
reference below. Set plot = TRUE
for visual feedback and
information on the minimum number of ‘surrounding’ observations
required for a grid point to be selected. The initial grid 'on the
probability scale' is in blue, while the points selected by the procedure
are in red.
a matrix with d
columns whose rows are the selected points.
M. Holmes, I. Kojadinovic, and A. Verhoijsen, Multi-purpose open-end monitoring procedures for multivariate observations based on the empirical distribution function, 45 pages, https://arxiv.org/abs/2201.10311.
selectPoints() is used in detOpenEndCpDist().
## Generate data
set.seed(123)
x1 <- rnorm(1000, 0, 1)
x2 <- rnorm(1000, 0.7 * x1, sqrt((1 - 0.7^2)))
x <- cbind(x1, x2)

## Point selection
selectPoints(x, r = 3, kappa = 1.5, plot = TRUE)
selectPoints(x, r = 3, kappa = 4, plot = TRUE)
selectPoints(x, r = 5, kappa = 1.5, plot = TRUE)
selectPoints(x, r = 5, kappa = 4, plot = TRUE)
Closed-end nonparametric sequential test for change-point detection based on the (multivariate) empirical distribution function. The observations can be continuous univariate or multivariate, and serially independent or dependent (strongly mixing). To carry out the test, four steps are required. The first step consists of simulating under the null many trajectories of the detector function. The second step consists of estimating a piecewise constant threshold function from these trajectories. The third step consists of computing the detector function from the data to be monitored. The fourth and last step consists of comparing the detector function with the estimated threshold function. Each of these steps corresponds to one of the functions in the usage section below. The current implementation is preliminary and not optimized for real-time monitoring (but could still be used for that). If the observations to be monitored are univariate and can be assumed serially independent, the simulation of the trajectories of the detector functions can be carried out using Monte Carlo simulation. In all other cases, the test relies on a dependent multiplier bootstrap. Details can be found in the second reference.
simClosedEndCpDist(x.learn = NULL, m = NULL, n, gamma = 0.25, delta = 1e-4,
                   B = 1000, method = c("sim", "mult"), b = NULL,
                   weights = c("parzen", "bartlett"), g = 5,
                   L.method = c("max", "median", "mean", "min"))

threshClosedEndCpDist(sims, p = 1, alpha = 0.05, type = 7)

detClosedEndCpDist(x.learn, x, gamma = 0.25, delta = 1e-4)

monClosedEndCpDist(det, thresh, statistic = c("mac", "mmc", "mmk", "mk", "mc"),
                   plot = TRUE)
x.learn |
a data matrix whose rows are continuous observations, representing the learning sample. |
m |
a strictly positive integer specifying the size of the
learning sample if |
n |
a strictly positive integer specifying the monitoring horizon;
the monitoring period is |
gamma |
a real parameter between 0 and 0.5 appearing in the definition of the weight function used in the detector function. |
delta |
a real parameter between 0 and 1 appearing in the definition of the weight function used in the detector function. |
B |
the number of trajectories of the detector function to simulate under the null. |
method |
a string specifying the trajectory simulation method;
can be either |
b |
strictly positive integer specifying the value of the
bandwidth parameter determining the serial dependence when
generating dependent multiplier sequences using the 'moving average
approach'; see Section 5 of the first reference. The value 1 will
create i.i.d. multiplier sequences suitable for serially independent
observations. If set to |
weights |
a string specifying the kernel for creating the weights used in the generation of dependent multiplier sequences within the 'moving average approach'; see Section 5 of the first reference. |
g |
a strictly positive integer specifying the number of points of the uniform grid involved in the estimation of the bandwidth parameter. |
L.method |
a string specifying how the parameter L involved in the generation of dependent multiplier sequences is computed. |
sims |
an object of class |
p |
a strictly positive integer specifying the number of steps of
the piecewise constant threshold function; |
alpha |
the value of the desired significance level for the sequential test. |
type |
an integer between 1 and 9 selecting one of the nine quantile
algorithms detailed in the help of the function |
x |
a data matrix whose rows are continuous observations corresponding to the new observations to be monitored for a change in contemporary distribution. |
det |
an object of class |
thresh |
an object of class |
statistic |
a string specifying the statistic/detector to be used
for the monitoring; can be either |
plot |
logical indicating whether the monitoring should be plotted. |
The testing procedure is described in detail in the second reference.
All functions return lists whose components have explicit names. The
function monClosedEndCpDist()
in particular returns a list whose
components are
alarm |
a logical indicating whether the detector function has exceeded the threshold function. |
time.alarm |
an integer corresponding to the time at
which the detector function has exceeded the threshold function or
|
times.max |
a vector of times at which the successive detectors
|
time.change |
an integer giving the estimated time of change if
|
This is a test for continuous (multivariate) time series.
A. Bücher and I. Kojadinovic (2016), A dependent multiplier bootstrap for the sequential empirical copula process under strong mixing, Bernoulli 22:2, pages 927-968, https://arxiv.org/abs/1306.3930.
I. Kojadinovic and G. Verdier (2021), Nonparametric sequential change-point detection for multivariate time series based on empirical distribution functions, Electronic Journal of Statistics 15(1), pages 773-829, doi:10.1214/21-EJS1798.
see cpDist() for the corresponding a posteriori (offline) test.
## Not run:
## Example of monitoring for the period m+1, ..., n
m <- 100 # size of the learning sample
n <- 150 # monitoring horizon

## The learning sample
set.seed(123)
x.learn <- matrix(rnorm(m))

## New observations with a large change in mean
## to simulate monitoring for the period m+1, ..., n
k <- 125 ## the true change-point
x <- matrix(c(rnorm(k-m), rnorm(n-k, mean = 2)))

## Step 1: Simulation of B trajectories of the detector functions under the null
B <- 1e4

## Under the assumption of serial independence
## (no need to specify the learning sample)
traj.sim <- simClosedEndCpDist(m = m, n = n, B = B, method = "sim")

## Without the assumption of serial independence
## (the learning sample is compulsory; the larger it is, the better;
## the monitoring horizon n should not be too large)
traj.mult <- simClosedEndCpDist(x.learn = x.learn, n = n, B = B, method = "mult")

## Step 2: Compute threshold functions with p steps
p <- 2
tf.sim <- threshClosedEndCpDist(traj.sim, p = p)   # p can be taken large if B is very large
tf.mult <- threshClosedEndCpDist(traj.mult, p = p) # p should not be taken too
                                                   # large unless both m and B
                                                   # are very large

## Step 3: Compute the detectors for the monitoring period m+1, ... , n
det <- detClosedEndCpDist(x.learn = x.learn, x = x)

## Step 4: Monitoring

## Simulate the monitoring with the first threshold function
monClosedEndCpDist(det, tf.sim)

## Simulate the monitoring with the second threshold function
monClosedEndCpDist(det, tf.mult)

## Simulate the monitoring with the first threshold function
## and another detector function
monClosedEndCpDist(det, tf.sim, statistic = "mmk")

## Alternative steps 3 and 4:

## Compute the detectors for the monitoring period m+1, ... , m+20 only
det <- detClosedEndCpDist(x.learn = x.learn, x = x[1:20,,drop = FALSE])

## Simulate the monitoring with the first threshold function
monClosedEndCpDist(det, tf.sim)

## Simulate the monitoring with the second threshold function
monClosedEndCpDist(det, tf.mult)
## End(Not run)
Open-end nonparametric sequential test for change-point detection based on a retrospective CUSUM statistic constructed from differences of empirical distribution functions. The observations can be univariate or multivariate (low-dimensional), and serially dependent. To carry out the test, two steps are required. The first step consists of computing a detector function. The second step consists of comparing the detector function to a suitable constant threshold function. Each of these steps corresponds to one of the functions in the usage section below. The current implementation is preliminary and not optimized for real-time monitoring (but could still be used for that). Details can be found in the first reference.
detOpenEndCpDist(x.learn, x, pts = NULL, r = NULL, sigma = NULL, kappa = 1.5, ...)

monOpenEndCpDist(det, alpha = 0.05, plot = TRUE)
x.learn |
a numeric matrix representing the learning sample. |
x |
a numeric matrix representing the observations collected after the beginning of the monitoring. |
pts |
a numeric matrix whose rows represent the evaluation points; if not provided by the user, they are chosen automatically from the learning sample using parameter r. |
r |
an integer greater than or equal to 2 representing the number of evaluation points per dimension to be chosen from the learning sample; used only if pts = NULL. |
sigma |
a numeric matrix representing the covariance matrix to be
used; if |
kappa |
a constant involved in the point selection procedure; used only in the multivariate case; should be larger than 1. |
... |
optional arguments passed to |
det |
an object of class |
alpha |
the value of the desired significance level for the sequential test. |
plot |
logical indicating whether the monitoring should be plotted. |
The testing procedure is described in detail in the first reference.
Both functions return lists whose components have explicit names. The
function monOpenEndCpDist()
in particular returns a list whose
components are
alarm |
a logical indicating whether the detector function has exceeded the threshold function. |
time.alarm |
an integer corresponding to the time at
which the detector function has exceeded the threshold function or
|
times.max |
a vector of times at which the successive detectors have reached their maximum; this sequence of times can be used to estimate the time of change from the time of alarm. |
time.change |
an integer giving the estimated time of change if
|
statistic |
the value of |
eta |
the value of |
p |
number of evaluations points of the empirical distribution functions. |
pts |
evaluation points of the empirical distribution functions. |
alpha |
the value of |
sigma |
the value of |
detector |
the successive values of the detector. |
threshold |
the value of the constant threshold for the detector. |
M. Holmes, I. Kojadinovic and A. Verhoijsen (2022), Multi-purpose open-end monitoring procedures for multivariate observations based on the empirical distribution function, 45 pages, https://arxiv.org/abs/2201.10311.
M. Holmes and I. Kojadinovic (2021), Open-end nonparametric sequential change-point detection based on the retrospective CUSUM statistic, Electronic Journal of Statistics 15:1, pages 2288-2335, doi:10.1214/21-EJS1840.
See detOpenEndCpMean() for the corresponding test sensitive to changes in the mean, selectPoints() for the underlying point selection procedure used in the multivariate case and lrvar() for information on the estimation of the underlying long-run covariance matrix.
## Not run:
## Example of open-end monitoring
m <- 800    # size of the learning sample
nm <- 5000  # number of collected observations after the start
n <- nm + m # total number of observations

set.seed(456)

## Univariate, no change in distribution
r <- 5 # number of evaluation points
x <- rnorm(n)

## Step 1: Compute the detector
det <- detOpenEndCpDist(x.learn = matrix(x[1:m]), x = matrix(x[(m + 1):n]), r = r)
## Step 2: Monitoring
mon <- monOpenEndCpDist(det = det, alpha = 0.05, plot = TRUE)

## Univariate, change in distribution
k <- 2000 # m + k + 1 is the time of change
x[(m + k + 1):n] <- rt(nm - k, df = 3)
det <- detOpenEndCpDist(x.learn = matrix(x[1:m]), x = matrix(x[(m + 1):n]), r = r)
mon <- monOpenEndCpDist(det = det, alpha = 0.05, plot = TRUE)

## Bivariate, no change
d <- 2
r <- 4 # number of evaluation points per dimension
x <- matrix(rnorm(n * d), nrow = n, ncol = d)
det <- detOpenEndCpDist(x.learn = x[1:m, ], x = x[(m + 1):n, ], r = r)
mon <- monOpenEndCpDist(det = det, alpha = 0.05, plot = TRUE)

## Bivariate, change in the mean of the first margin
x[(m + k + 1):n, 1] <- x[(m + k + 1):n, 1] + 0.3
det <- detOpenEndCpDist(x.learn = x[1:m, ], x = x[(m + 1):n, ], r = r)
mon <- monOpenEndCpDist(det = det, alpha = 0.05, plot = TRUE)

## Bivariate, change in the dependence structure
x1 <- rnorm(n)
x2 <- c(rnorm(m + k, 0.2 * x1[1:(m + k)], sqrt((1 - 0.2^2))),
        rnorm(nm - k, 0.7 * x1[(m + k + 1):n], sqrt((1 - 0.7^2))))
x <- cbind(x1, x2)
det <- detOpenEndCpDist(x.learn = x[1:m, ], x = x[(m + 1):n, ], r = r)
mon <- monOpenEndCpDist(det = det, alpha = 0.05, plot = TRUE)
## End(Not run)
Open-end nonparametric sequential test for change-point detection based on the retrospective CUSUM statistic. The observations need to be univariate but can be serially dependent. To carry out the test, two steps are required. The first step consists of computing a detector function. The second step consists of comparing the detector function to a suitable constant threshold function. Each of these steps corresponds to one of the functions in the usage section below. The current implementation is preliminary and not optimized for real-time monitoring (but could still be used for that). Details can be found in the third reference.
detOpenEndCpMean(x.learn, x, sigma = NULL, b = NULL,
                 weights = c("parzen", "bartlett"))

monOpenEndCpMean(det, statistic = c("t", "s", "r", "e", "cs"), eta = 0.001,
                 gamma = 0.45, alpha = 0.05, sigma = NULL, plot = TRUE)
x.learn |
a numeric vector representing the learning sample. |
x |
a numeric vector representing the observations collected after the beginning of the monitoring for a change in mean. |
sigma |
an estimate of the long-run variance of the time series of
which |
b |
strictly positive integer specifying the value of the
bandwidth for the estimation of the long-run variance if
|
weights |
a string specifying the kernel for creating the weights
used for the estimation of the long-run variance if |
det |
an object of class |
statistic |
a string specifying the statistic/detector to be used
for the monitoring; can be either |
eta |
a real parameter whose role is described in detail in the third reference. |
gamma |
a real parameter that can improve the power of the sequential test at the beginning of the monitoring; possible values are 0, 0.1, 0.25, 0.45, 0.65 and 0.85, but not for all statistics; see the third reference. |
alpha |
the value of the desired significance level for the sequential test. |
plot |
logical indicating whether the monitoring should be plotted. |
The testing procedure is described in detail in the third reference. An alternative way of estimating the long-run variance is to use the function lrvar() of the package sandwich and to pass it through the argument sigma.
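For instance, a minimal sketch of this alternative (assuming the sandwich package is installed and that x.learn and x are defined as in the example below) could read:

## Sketch: estimating the long-run variance with sandwich::lrvar()
## and passing it to the detector via 'sigma'; note that lrvar()
## returns the long-run variance of the sample mean, so the scaling
## convention expected by detOpenEndCpMean() should be checked
sigma.hat <- sandwich::lrvar(x.learn)
det <- detOpenEndCpMean(x.learn = x.learn, x = x, sigma = sigma.hat)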
Both functions return lists whose components have explicit names. The
function monOpenEndCpMean()
in particular returns a list whose
components are
alarm |
a logical indicating whether the detector function has exceeded the threshold function. |
time.alarm |
an integer corresponding to the time at
which the detector function has exceeded the threshold function or
|
times.max |
a vector of times at which the successive detectors
|
time.change |
an integer giving the estimated time of change if
|
statistic |
the value of |
eta |
the value of |
gamma |
the value of |
alpha |
the value of |
sigma |
the value of |
detector |
the successive values of the chosen detector. |
threshold |
the value of the constant threshold for the chosen detector. |
A. Bücher and I. Kojadinovic (2016), A dependent multiplier bootstrap for the sequential empirical copula process under strong mixing, Bernoulli 22:2, pages 927-968, https://arxiv.org/abs/1306.3930.
J. Gösmann, T. Kley and H. Dette (2021), A new approach for open-end sequential change point monitoring, Journal of Time Series Analysis 42:1, pages 63-84, https://arxiv.org/abs/1906.03225.
M. Holmes and I. Kojadinovic (2021), Open-end nonparametric sequential change-point detection based on the retrospective CUSUM statistic, Electronic Journal of Statistics 15:1, pages 2288-2335, doi:10.1214/21-EJS1840.
D.N. Politis and H. White (2004), Automatic block-length selection for the dependent bootstrap, Econometric Reviews 23(1), pages 53-70.
See cpMean() for the corresponding a posteriori (offline) test and detOpenEndCpDist() for the corresponding test for changes in the distribution function.
## Not run:
## Example of open-end monitoring
m <- 100 # size of the learning sample

## The learning sample
set.seed(123)
x.learn <- rnorm(m)

## New observations with a change in mean
## to simulate monitoring for the period m+1, ..., n
n <- 5000
k <- 2500 ## the true change-point
x <- c(rnorm(k-m), rnorm(n-k, mean = 0.2))

## Step 1: Compute the detector
det <- detOpenEndCpMean(x.learn = x.learn, x = x)

## Step 2: Monitoring with the default detector
m1 <- monOpenEndCpMean(det)
str(m1)

## Monitoring with another detector
m2 <- monOpenEndCpMean(det, statistic = "s", gamma = 0.85)
str(m2)
## End(Not run)
A nonparametric test of stationarity for univariate continuous time series resulting from a combination à la Fisher of the change-point test sensitive to changes in the distribution function implemented in cpDist() and the change-point test sensitive to changes in the autocopula implemented in cpAutocop(). Approximate p-values are obtained by combining two multiplier resampling schemes. Details can be found in the first reference.
stDistAutocop(x, lag = 1, b = NULL, pairwise = FALSE, weights = c("parzen", "bartlett"), m = 5, N = 1000)
x |
a one-column matrix containing continuous observations. |
lag |
an integer specifying at which lag to consider the
autocopula; the autcopula is a ( |
b |
strictly positive integer specifying the value of the
bandwidth parameter determining the serial dependence when
generating dependent multiplier sequences using the 'moving average
approach'; see Section 5 of the second reference. If set to
|
pairwise |
a logical specifying whether the test should focus
only on the bivariate margins of the ( |
weights |
a string specifying the kernel for creating the weights used in the generation of dependent multiplier sequences within the 'moving average approach'; see Section 5 of the second reference. |
m |
a strictly positive integer specifying the number of points of the uniform grid involved in the estimation of the bandwidth parameter. |
N |
number of multiplier replications. |
The testing procedure is described in detail in the second section of the first reference.
An object of class
htest
which is a list,
some of the components of which are
statistic |
value of the test statistic. |
p.value |
corresponding approximate p-value resulting from the combination à la Fisher. |
component.p.values |
p-values of the component tests arising in the combination. |
b |
the value of parameter |
This is a test for continuous univariate time series.
A. Bücher, J.-D. Fermanian and I. Kojadinovic (2019), Combining cumulative sum change-point detection tests for assessing the stationarity of univariate time series, Journal of Time Series Analysis 40, pages 124-150, https://arxiv.org/abs/1709.02673.
A. Bücher and I. Kojadinovic (2016), A dependent multiplier bootstrap for the sequential empirical copula process under strong mixing, Bernoulli 22:2, pages 927-968, https://arxiv.org/abs/1306.3930.
see cpDist() and cpAutocop() for the component tests.
## AR1 example
n <- 200
k <- n/2 ## the true change-point
x <- matrix(c(arima.sim(list(ar = -0.1), n = k),
              arima.sim(list(ar = 0.5), n = n - k)))
stDistAutocop(x)

## AR2 example
n <- 200
k <- n/2 ## the true change-point
x <- matrix(c(arima.sim(list(ar = c(0,-0.1)), n = k),
              arima.sim(list(ar = c(0,0.5)), n = n - k)))
## Not run:
stDistAutocop(x)
stDistAutocop(x, lag = 2)
## End(Not run)
stDistAutocop(x, lag = 2, pairwise = TRUE)
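As a short follow-up to the AR examples above, the combined p-value and the p-values of the two component tests can be inspected from the returned object (a sketch relying only on the components documented above):

## Sketch: inspecting the combined test and its components
st <- stDistAutocop(x, lag = 2, pairwise = TRUE)
st$p.value            # combined p-value (à la Fisher)
st$component.p.values # p-values of the component tests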