Title: | High-Dimensional Changepoint Estimation via Sparse Projection |
---|---|
Description: | Provides a data-driven projection-based method for estimating changepoints in high-dimensional time series. Multiple changepoints are estimated using a (wild) binary segmentation scheme. |
Authors: | Tengyao Wang, Bertille Follain and Richard Samworth |
Maintainer: | Tengyao Wang <[email protected]> |
License: | GPL-3 |
Version: | 1.2 |
Built: | 2024-12-11 06:54:32 UTC |
Source: | CRAN |
inspect
The threshold level to be used in inspect
is computed via Monte Carlo simulation of multivariate time series that do not contain any changepoints.
compute.threshold(n, p, nrep = 100, show_progress = TRUE)
compute.threshold(n, p, nrep = 100, show_progress = TRUE)
n |
Time length of the observation. |
p |
Dimension of the multivariate time series. |
nrep |
Number of Monte Carlo repetition to be used. |
show_progress |
whether to show the progress of Monte Carlo simulation |
A numeric value indicating the threshold level that should be used based on the Monte Carlo simulation.
compute.threshold(n=200, p=50)
compute.threshold(n=200, p=50)
Performing CUSUM transformation to the input matrix of multivariate time series. If the input is a vector, it is treated as a matrix of one row.
cusum.transform(x)
cusum.transform(x)
x |
input matrix |
For any integers p and n, the CUSUM transformation is defined by
The transformed matrix is returned. Note that the returned matrix has the same number of rows but one fewer columns compared with the input matrix.
x <- matrix(rnorm(20),4,5) cusum.transform(x)
x <- matrix(rnorm(20),4,5) cusum.transform(x)
MissCUSUM transformation of a matrix with missing entries
cusum.transform.missing(x)
cusum.transform.missing(x)
x |
a matrix with missing entries represented by NA |
MissCUSUM transformed matrix
MissCUSUM transformation of a single vector with missing entries
cusum.univariate.missing(x)
cusum.univariate.missing(x)
x |
a vector with missing entries represented by NA |
MissCUSUM transformed vector
This is the main function of the package InspectChangepoint. The function inspect
estimates the locations of multiple changepoints in the mean structure of a multivariate time series. Multiple changepoints are estimated using a (wild) binary segmentation scheme, whereas each segmentation step uses the locate.change
function.
inspect( x, lambda, threshold, schatten = c(1, 2), M, missing_data = "auto", show_progress = FALSE )
inspect( x, lambda, threshold, schatten = c(1, 2), M, missing_data = "auto", show_progress = FALSE )
x |
The input data matrix of a high-dimensional time series, with each component time series stored as a row. |
lambda |
Regularisation parameter used in |
threshold |
Threshold level for testing whether an identified changepoint is a true changepoint. If no value is supplied, the threshold level is computed via Monte Carlo simulation of 100 repetitions from the null model. |
schatten |
The Schatten norm constraint to use in the |
M |
The Monte Carlo parameter used for wild binary segmentation. Default is M = 0, which means a classical binary segmentation scheme is used. |
missing_data |
How missing data in x should be handled. If missing_data='meanImpute', then missing data are imputed with row means; if 'MissInspect', use the MissInspect algorithm of Follain et al. (2022)' if 'auto', the program will make the choice depending on the amount of missingness. |
show_progress |
whether to display progress of computation |
The input time series is first standardised using the rescale.variance
function. Recursive calls of the locate.change
function then segments the multivariate time series using (wild) binary segmentation. A changepoint at time z is defined here to mean that the time series has constant mean structure for time up to and including z and constant mean structure for time from z+1 onwards.
More details about model assumption and theoretical guarantees can be found in Wang and Samworth (2016). Note that Monte Carlo computation of the threshold value can be slow, especially for large p. If inspect
is to be used multiple times with the same (or similar) data matrix size, it is better to precompute the threshold level via Monte Carlo simulation by calling the compute.threshold
function.
The return value is an S3 object of class 'inspect'. It contains a list of two objeccts:
x The input data matrix
changepoints A matrix with three columns. The first column contains the locations of estimated changepoints sorted in increasing order; the second column contains the maximum CUSUM statistics of the projected univariate time series associated with each estimated changepoint; the third column contains the depth of binary segmentation for each detected changepoint.
Wang, T. and Samworth, R. J. (2018) High dimensional changepoint estimation via sparse projection. J. Roy. Statist. Soc., Ser. B, 80, 57–83. Follain, B., Wang, T. and Samworth R. J. (2022) High-dimensional changepoint estimation with heterogeneous missingness. J. Roy. Statist. Soc., Ser. B, to appear
n <- 500; p <- 100; ks <- 30; zs <- c(125,250,375) varthetas <- c(0.2,0.4,0.6); overlap <- 0.5 obj <- multi.change(n, p, ks, zs, varthetas, overlap) x <- obj$x threshold <- compute.threshold(n,p) ret <- inspect(x, threshold = threshold) ret summary(ret) plot(ret)
n <- 500; p <- 100; ks <- 30; zs <- c(125,250,375) varthetas <- c(0.2,0.4,0.6); overlap <- 0.5 obj <- multi.change(n, p, ks, zs, varthetas, overlap) x <- obj$x threshold <- compute.threshold(n,p) ret <- inspect(x, threshold = threshold) ret summary(ret) plot(ret)
Estimate the location of one changepoint in a multivariate time
series. It uses the function sparse.svd
to estimate the best
projection direction, then using univariate CUSUM statistics of the projected
time series to estimate the changepoint location.
locate.change( x, lambda, schatten = 2, sample.splitting = FALSE, standardize.series = FALSE, view.cusum = FALSE )
locate.change( x, lambda, schatten = 2, sample.splitting = FALSE, standardize.series = FALSE, view.cusum = FALSE )
x |
A (p x n) data matrix of multivariate time series, each column represents a data point |
lambda |
Regularisation parameter. If no value is supplied, the dafault value is chosen to be sqrt(log(log(n)*p/2)) for p and n number of rows and columns of the data matrix x respectively. |
schatten |
The Schatten norm constraint to use in the |
sample.splitting |
Whether the changepoint should be estimated via sample splitting. The theoretical result is proven only for the sample splitted version of the algorithm. However, the default setting in practice is without sample splitting. |
standardize.series |
Whether the given time series should be standardised before estimating the projection direction. Default is FALSE, i.e. the input series is assume to have variance 1 in each coordinate. |
view.cusum |
Whether to show a plot of the projected CUSUM series |
A list of two items:
changepoint - A single integer value estimate of the changepoint location is returned. If the estimated changepoint is z, it means that the multivariate time series is piecewise constant up to z and from z+1 onwards.
cusum - The maximum absolute CUSUM statistic of the projected univariate time series associated with the estimated changepoint.
vector.proj - the vector of projection, which is proportional to an estimate of the vector of change.
Wang, T., Samworth, R. J. (2016) High-dimensional changepoint estimation via sparse projection. Arxiv preprint: arxiv1606.06246.
n <- 2000; p <- 1000; k <- 32; z <- 400; vartheta <- 0.12; sigma <- 1; shape <- 3 noise <- 0; corr <- 0 obj <- single.change(n,p,k,z,vartheta,sigma,shape,noise,corr) x <- obj$x locate.change(x)
n <- 2000; p <- 1000; k <- 32; z <- 400; vartheta <- 0.12; sigma <- 1; shape <- 3 noise <- 0; corr <- 0 obj <- single.change(n,p,k,z,vartheta,sigma,shape,noise,corr) x <- obj$x locate.change(x)
Single changepoint estimation with missing data
locate.change.missing( x, lambda, standardize.series = FALSE, view.cusum = FALSE )
locate.change.missing( x, lambda, standardize.series = FALSE, view.cusum = FALSE )
x |
A (p x n) data matrix of multivariate time series, each column represents a data point |
lambda |
Regularisation parameter. If no value is supplied, the dafault value is chosen to be sqrt(log(log(n)*p/2)) for p and n number of rows and columns of the data matrix x respectively. |
standardize.series |
Whether the given time series should be standardised before estimating the projection direction. Default is FALSE, i.e. the input series is assume to have variance 1 in each coordinate. |
view.cusum |
Whether to show a plot of the projected CUSUM series |
A list of two items:
changepoint - A single integer value estimate of the changepoint location is returned. If the estimated changepoint is z, it means that the multivariate time series is piecewise constant up to z and from z+1 onwards.
cusum - The maximum absolute CUSUM statistic of the projected univariate time series associated with the estimated changepoint.
vector.proj - the vector of projection, which is proportional to an estimate of the vector of change.
Wang, T., Samworth, R. J. (2016) High-dimensional changepoint estimation via sparse projection. Arxiv preprint: arxiv1606.06246.
n <- 2000; p <- 1000; k <- 32; z <- 400; vartheta <- 0.12; sigma <- 1; shape <- 3 noise <- 0; corr <- 0 obj <- single.change(n,p,k,z,vartheta,sigma,shape,noise,corr) x <- obj$x locate.change(x)
n <- 2000; p <- 1000; k <- 32; z <- 400; vartheta <- 0.12; sigma <- 1; shape <- 3 noise <- 0; corr <- 0 obj <- single.change(n,p,k,z,vartheta,sigma,shape,noise,corr) x <- obj$x locate.change(x)
The data matrix is generated via X = mu + W, where mu is the mean structure matrix that captures the changepoint locations and sparsity structure, and W is a random noise matrix having independent N(0,sigma^2) entries.
multi.change(n, p, ks, zs, varthetas, sigma = 1, overlap = 0, shape = 3)
multi.change(n, p, ks, zs, varthetas, sigma = 1, overlap = 0, shape = 3)
n |
Time length of the observation |
p |
Dimension of the multivariate time series |
ks |
A vector describing the number of coordinates that undergo a change in each changepoint. If only a scalar is supplied, each changepoint will have the same number of coordinates that undergo a change. |
zs |
A vector describing the locations of the changepoints. |
varthetas |
A vector describing the root mean squared change magnitude in coordinates that undergo a change for each changepoint. If only a scalar is supplied, each changepoint will have the same signal strength value. |
sigma |
noise level |
overlap |
A number between 0 and 1. The proportion of overlap in the signal coordinates for successive changepoints. |
shape |
How the signal strength is distributed across signal coordinates. When shape = 0, all signal coordinates are changed by the same amount; when shape = 1, their signal strength are proportional to 1, sqrt(2), ..., sqrt(k); when shape = 2, they are proportional to 1, 2, ..., k; when shape = 3, they are proportional to 1, 1/sqrt(2), ..., 1/sqrt(k). |
An S3 object of the class 'hdchangeseq' is returned.
x - The generated data matrix
mu - The mean structure of the data matrix
n <- 2000; p <- 200; ks <- 40; zs <- c(500,1000,1500); varthetas <- c(0.1,0.15,0.2); overlap <- 0.5 obj <- multi.change(n, p, ks, zs, varthetas, overlap) plot(obj, noise = TRUE)
n <- 2000; p <- 200; ks <- 40; zs <- c(500,1000,1500); varthetas <- c(0.1,0.15,0.2); overlap <- 0.5 obj <- multi.change(n, p, ks, zs, varthetas, overlap) plot(obj, noise = TRUE)
Projection (with respect to the inner product defined by the Frobenius norm) of a matrix onto the unit sphere defined by the nuclear norm.
PiS(M)
PiS(M)
M |
Input matrix |
This is an auxiliary function used by the InspectChangepoint
package. The projection is achieved by first performing a singular value decomposition, then projecting the vector of singular values onto the standard simplex, and finally using singular value decomposition in reverse to build the projected matrix.
A matrix of the same dimension as the input is returned.
M <- matrix(rnorm(20),4,5) PiS(M)
M <- matrix(rnorm(20),4,5) PiS(M)
The input vector is projected onto the standard simplex, i.e. the set of vectors of the same length as the input vector with non-negative entries that sum to 1.
PiW(v)
PiW(v)
v |
Input vector |
This is an auxiliary function used by the InspectChangepoint
package.
A vector in the standard simplex that is closest to the input vector is returned.
Chen, Y. and Ye, X. (2011) Projection onto a simplex. arXiv preprint, arxiv:1101.6081.
v <- rnorm(10) PiW(v)
v <- rnorm(10) PiW(v)
Visualising the high-dimensional time series in an 'hdchangeseq' class object. The data matrix or its mean structure is visualised using a grid of coloured rectangles with colours corresponding to the value contained in corresponding coordinates. A heat-spectrum (red to white for values low to high) is used to convert values to colours.
## S3 method for class 'hdchangeseq' plot(x, noise = TRUE, shuffle = FALSE, ...)
## S3 method for class 'hdchangeseq' plot(x, noise = TRUE, shuffle = FALSE, ...)
x |
An object of 'hdchangeseq' class |
noise |
If noise == TRUE, the data matrix is plotted, otherwise, only the mean structure is plotted. |
shuffle |
Whether to shuffle the rows of the plotted matrix. |
... |
Other graphical parameters are not used. |
n <- 2000; p <- 200; ks <- 40; zs <- c(500,1000,1500) varthetas <- c(0.1,0.15,0.2); overlap <- 0.5 obj <- multi.change(n, p, ks, zs, varthetas, overlap) plot(obj, noise = TRUE)
n <- 2000; p <- 200; ks <- 40; zs <- c(500,1000,1500) varthetas <- c(0.1,0.15,0.2); overlap <- 0.5 obj <- multi.change(n, p, ks, zs, varthetas, overlap) plot(obj, noise = TRUE)
Plot function for 'inspect' class objects
## S3 method for class 'inspect' plot(x, ...)
## S3 method for class 'inspect' plot(x, ...)
x |
an 'inspect' class object |
... |
other arguments to be passed to methods are not used |
Print function for 'inspect' class objects
## S3 method for class 'inspect' print(x, ...)
## S3 method for class 'inspect' print(x, ...)
x |
an 'inspect' class object |
... |
other arguments to be passed to methods are not used |
Print percentage
printPercentage(ind, tot)
printPercentage(ind, tot)
ind |
a vector of for loop interator |
tot |
a vector of for loop lengths |
on screen output of percentage
Generate a random unit vectors in R^n
random.UnitVector(n)
random.UnitVector(n)
n |
length of random vector |
Each row of the input matrix is normalised by the estimated standard deviation computed through the median absolute deviation of increments.
rescale.variance(x)
rescale.variance(x)
x |
An input matrix of real values. |
This is an auxiliary function used by the InspectChangepoint
package.
A rescaled matrix of the same size is returned.
x <- matrix(rnorm(40),5,8) * (1:5) x.rescaled <- rescale.variance(x) x.rescaled
x <- matrix(rnorm(40),5,8) * (1:5) x.rescaled <- rescale.variance(x) x.rescaled
The data matrix is generated via X = mu + W, where mu is the mean structure matrix that captures the changepoint location and sparsity structure, and W is a random noise matrix.
single.change(n, p, k, z, vartheta, sigma = 1, shape = 3, noise = 0, corr = 0)
single.change(n, p, k, z, vartheta, sigma = 1, shape = 3, noise = 0, corr = 0)
n |
Time length of the observation |
p |
Dimension of the multivariate time series |
k |
Number of coordinates that undergo a change |
z |
Changepoint location, a number between 1 and n-1. |
vartheta |
The root mean squared change magnitude in coordinates that undergo a change |
sigma |
noise level, see |
shape |
How the signal strength is distributed across signal coordinates. When shape = 0, all signal coordinates are changed by the same amount; when shape = 1, their signal strength are proportional to 1, sqrt(2), ..., sqrt(k); when shape = 2, they are proportional to 1, 2, ..., k; when shape = 3, they are proportional to 1, 1/sqrt(2), ..., 1/sqrt(k). |
noise |
Noise structure of the multivarite time series. For noise = 0, 0.5, 1, columns of W have independent multivariate normal distribution with covariance matrix Sigma. When noise = 0, Sigma = sigma^2 * I_p; when noise = 0.5, noise has local dependence structure given by Sigma_i,j = sigma*corr^|i-j|; when noise = 1, noise has global dependence structure given by matrix(corr,p,p)+diag(p)*(1-corr))) * sigma. When noise = 2, rows of the W are independent and each having an AR(1) structure given by W_j,t = W_j,t-1 * sqrt(corr) + rnorm(sd = sigma) * sqrt(1-corr). For noise = 3, 4, entries of W have i.i.d. uniform distribution and exponential distribution respectively, each centred and rescaled to have zero mean and variance sigma^2. |
corr |
Used to specify correlation structure in the noise. See |
An S3 object of the class 'hdchangeseq' is returned.
x - The generated data matrix
mu - The mean structure of the data matrix
n <- 2000; p <- 100; k <- 10; z <- 800; vartheta <- 1; sigma <- 1 shape <- 3; noise <- 0; corr <- 0 obj <- single.change(n,p,k,z,vartheta,sigma, shape, noise, corr) plot(obj, noise = TRUE)
n <- 2000; p <- 100; k <- 10; z <- 800; vartheta <- 1; sigma <- 1 shape <- 3; noise <- 0; corr <- 0 obj <- single.change(n,p,k,z,vartheta,sigma, shape, noise, corr) plot(obj, noise = TRUE)
Estimating the sparse left leading singular vector by first computing a maximiser Mhat of the convex problem
subject to the Schatten norm constraint |M|_schatten <= 1 using alternating direction method of multipliers (ADMM). Then the leading left singular vector of Mhat is returned.
sparse.svd(Z, lambda, schatten = c(1, 2), max.iter = 1000, tolerance = 1e-05)
sparse.svd(Z, lambda, schatten = c(1, 2), max.iter = 1000, tolerance = 1e-05)
Z |
Input matrix whose left leading singular vector is to be estimated. |
lambda |
Regularisation parameter |
schatten |
Schatten norm constraint to be used. Default uses Schatten-2-norm, i.e. the Frobenius norm. Also possible to use Schatten-1-norm, the nuclear norm. |
max.iter |
maximum iteration for ADMM, only used if schatten=1 |
tolerance |
tolerance level for convergence checking, only used if schatten=1 |
In case of schatten = 2, a closed-form solution for Mhat using matrix soft thresholding is possible. We use the closed-form solution instead of the ADMM algorithm to speed up the computation.
A vector that has the same length as nrow(Z) is returned.
Z <- matrix(rnorm(20),4,5) lambda <- 0.5 sparse.svd(Z, lambda)
Z <- matrix(rnorm(20),4,5) lambda <- 0.5 sparse.svd(Z, lambda)
Computing the sparse leading left singular vector of a matrix with missing entries
sparse.svd.missing(Z, lambda, max_iter = 1000, tol = 1e-10)
sparse.svd.missing(Z, lambda, max_iter = 1000, tol = 1e-10)
Z |
Input matrix whose left leading singular vector is to be estimated. |
lambda |
Regularisation parameter |
max_iter |
maximum iteration |
tol |
tolerance level for convergence |
Summary function for 'inspect' class objects
## S3 method for class 'inspect' summary(object, ...)
## S3 method for class 'inspect' summary(object, ...)
object |
an 'inspect' class object |
... |
other arguments to be passed to methods are not used |
Clipping vector or matrix x from above and below
vector.clip(x, upper = Inf, lower = -upper)
vector.clip(x, upper = Inf, lower = -upper)
x |
a vector of real numbers |
upper |
clip above this value |
lower |
clip below this value |
the entrywise L_q norm of a vector or a matrix
Calculate the entrywise L_q norm of a vector or a matrix
vector.norm(v, q = 2, na.rm = FALSE)
vector.norm(v, q = 2, na.rm = FALSE)
v |
a vector of real numbers |
q |
a nonnegative real number or Inf |
na.rm |
boolean, whether to remove NA before calculation |
the entrywise L_q norm of a vector or a matrix
Normalise a vector
vector.normalise(v, q = 2, na.rm = FALSE)
vector.normalise(v, q = 2, na.rm = FALSE)
v |
a vector of real numbers |
q |
a nonnegative real number or Inf |
na.rm |
boolean, whether to remove NA before calculation |
normalised version of this vector
entries of v are moved towards 0 by the amount lambda until they hit 0.
vector.soft.thresh(x, lambda)
vector.soft.thresh(x, lambda)
x |
a vector of real numbers |
lambda |
soft thresholding value |
a vector of the same length