Title: | Wild Binary Segmentation for Multiple Change-Point Detection |
---|---|
Description: | Provides efficient implementation of the Wild Binary Segmentation and Binary Segmentation algorithms for estimation of the number and locations of multiple change-points in the piecewise constant function plus Gaussian noise model. |
Authors: | Rafal Baranowski [aut, cre], Piotr Fryzlewicz [aut] |
Maintainer: | Rafal Baranowski <[email protected]> |
License: | GPL-2 |
Version: | 1.4.1 |
Built: | 2024-11-05 06:54:31 UTC |
Source: | CRAN |
The package implements Wild Binary Segmentation, a technique for consistent estimation of the number and locations of multiple change-points in data. It also provides a fast implementation of the standard Binary Segmentation algorithm.
The main routines of the package are wbs
, sbs
and changepoints
.
P. Fryzlewicz (2014), Wild Binary Segmentation for multiple change-point detection. Annals of Statistics, to appear. (http://stats.lse.ac.uk/fryzlewicz/wbs/wbs.pdf)
#an example in which standard Binary Segmentation fails to detect change points x <- rnorm(300)+ c(rep(0,130),rep(-1,20),rep(1,20),rep(0,130)) s <- sbs(x) w <- wbs(x) s.cpt <- changepoints(s) s.cpt w.cpt <- changepoints(w) w.cpt # in this example, both algorithms work well x <- rnorm(300) + c(rep(1,50),rep(0,250)) s <- sbs(x) w <- wbs(x) s.cpt <- changepoints(s) s.cpt w.cpt <- changepoints(w) w.cpt
#an example in which standard Binary Segmentation fails to detect change points x <- rnorm(300)+ c(rep(0,130),rep(-1,20),rep(1,20),rep(0,130)) s <- sbs(x) w <- wbs(x) s.cpt <- changepoints(s) s.cpt w.cpt <- changepoints(w) w.cpt # in this example, both algorithms work well x <- rnorm(300) + c(rep(1,50),rep(0,250)) s <- sbs(x) w <- wbs(x) s.cpt <- changepoints(s) s.cpt w.cpt <- changepoints(w) w.cpt
The function evaluates the penalty term for the standard Bayesian Information Criterion applied to the change-point detection problem. This routine is typically not called directly by the user; its name can be passed as an argument to changepoints
.
bic.penalty(n, cpt)
bic.penalty(n, cpt)
n |
the number of observations |
cpt |
a vector with localisations of change-points |
the penalty term where
denotes the number of elements in
cpt
x <- rnorm(300) + c(rep(1,50),rep(0,250)) w <- wbs(x) w.cpt <- changepoints(w,penalty="bic.penalty") w.cpt$cpt.ic x <- rnorm(300) + c(rep(1,50),rep(0,250)) w <- wbs(x) w.cpt <- changepoints(w,penalty="bic.penalty") w.cpt$cpt.ic
x <- rnorm(300) + c(rep(1,50),rep(0,250)) w <- wbs(x) w.cpt <- changepoints(w,penalty="bic.penalty") w.cpt$cpt.ic x <- rnorm(300) + c(rep(1,50),rep(0,250)) w <- wbs(x) w.cpt <- changepoints(w,penalty="bic.penalty") w.cpt$cpt.ic
The function applies user-specified stopping criteria to extract change-points from object
generated by wbs
or sbs
. For object
of class 'sbs', the function returns
change-points whose corresponding test statistic exceeds threshold given in th
. For object
of class 'wbs',
the change-points can be also detected using information criteria with penalties specified in penalty
.
changepoints(object, ...) ## S3 method for class 'sbs' changepoints(object, th = NULL, th.const = 1.3, Kmax = NULL, ...) ## S3 method for class 'wbs' changepoints(object, th = NULL, th.const = 1.3, Kmax = 50, penalty = c("ssic.penalty", "bic.penalty", "mbic.penalty"), ...)
changepoints(object, ...) ## S3 method for class 'sbs' changepoints(object, th = NULL, th.const = 1.3, Kmax = NULL, ...) ## S3 method for class 'wbs' changepoints(object, th = NULL, th.const = 1.3, Kmax = 50, penalty = c("ssic.penalty", "bic.penalty", "mbic.penalty"), ...)
object |
an object of 'wbs' or 'sbs' class returned by, respectively, |
... |
further arguments that may be passed to the penalty functions |
th |
a vector of positive scalars |
th.const |
a vector of positive scalars |
Kmax |
a maximum number of change-points to be detected |
penalty |
a character vector with names of penalty functions used |
For the change-point detection based on thresholding (object
of class 'sbs' or 'wbs'), the user can either specify the thresholds in th
directly,
determine the maximum number Kmax
of change-points to be detected, or let th
depend on th.const
.
When Kmax
is given, the function automatically sets th
to the lowest threshold such that the number of detected change-points is lower or equal than Kmax
.
Note that for the BS algorithm it might be not possible to find the threshold such that exactly Kmax
change-points are found.
When th
and Kmax
are omitted, the threshold value is set to
where sigma is the Median Absolute Deviation estimate of the noise level and is the number of elements in
x
.
For the change-point detection based on information criteria (object
of class 'wbs' only),
the user can specify both the maximum number of change-points (Kmax
) and a type of the penalty used.
Parameter penalty
should contain a list of characters with names of the functions of at least two arguments (n
and cpt
).
For each penalty given, the following information criterion is minimized over candidate sets of change-points cpt
:
where denotes the number of elements in
,
is the corresponding maximum
likelihood estimator of the residual variance.
sigma |
Median Absolute Deviation estimate of the noise level |
th |
a vector of thresholds |
no.cpt.th |
the number of change-points detected for each value of |
cpt.th |
a list with the change-points detected for each value of |
Kmax |
a maximum number of change-points detected |
ic.curve |
a list with values of the chosen information criteria |
no.cpt.ic |
the number of change-points detected for each information criterion considered |
cpt.ic |
a list with the change-points detected for each information criterion considered |
#we generates gaussian noise + Poisson process signal with 10 jumps on average set.seed(10) N <- rpois(1,10) true.cpt <- sample(1000,N) m1 <- matrix(rep(1:1000,N),1000,N,byrow=FALSE) m2 <- matrix(rep(true.cpt,1000),1000,N,byrow=TRUE) x <- rnorm(1000) + apply(m1>=m2,1,sum) # we apply the BS and WBS algorithms with default values for their parameters s <- sbs(x) w <- wbs(x) s.cpt <- changepoints(s) s.cpt w.cpt <- changepoints(w) w.cpt #we can use different stopping criteria, invoking sbs/wbs functions is not necessary s.cpt <- changepoints(s,th.const=c(1,1.3)) s.cpt w.cpt <- changepoints(w,th.const=c(1,1.3)) w.cpt
#we generates gaussian noise + Poisson process signal with 10 jumps on average set.seed(10) N <- rpois(1,10) true.cpt <- sample(1000,N) m1 <- matrix(rep(1:1000,N),1000,N,byrow=FALSE) m2 <- matrix(rep(true.cpt,1000),1000,N,byrow=TRUE) x <- rnorm(1000) + apply(m1>=m2,1,sum) # we apply the BS and WBS algorithms with default values for their parameters s <- sbs(x) w <- wbs(x) s.cpt <- changepoints(s) s.cpt w.cpt <- changepoints(w) w.cpt #we can use different stopping criteria, invoking sbs/wbs functions is not necessary s.cpt <- changepoints(s,th.const=c(1,1.3)) s.cpt w.cpt <- changepoints(w,th.const=c(1,1.3)) w.cpt
The function generates approximately M
intervals with endpoints in 1
,2
,...,n
, without random drawing. This routine
can be used inside wbs
function and is typically not called directly by the user.
fixed.intervals(n, M)
fixed.intervals(n, M)
n |
a number of endpoints to choose from |
M |
a number of intervals to generate |
Function finds the minimal m
such that .
Then it generates
m
approximately equally-spaced positive integers lower than n
and returns all possible intervals consisting of any two of these points.
a 2-column matrix with start (first column) and end (second column) points of an interval in each row
fixed.intervals(10,100)
fixed.intervals(10,100)
The function evaluates the penalty term for the Modified Bayes Information Criterion proposed in N. Zhang and D. Siegmund (2007). This routine is typically not called directly by the user; its name can be passed as an argument to changepoints
.
mbic.penalty(n, cpt)
mbic.penalty(n, cpt)
n |
the number of observations |
cpt |
a vector with localisations of change-points |
the penalty term
where denotes the number of elements in
cpt
and are the lengths of the intervals between changepoints in
cpt
N. Zhang and D. Siegmund (2007), A modified Bayes information criterion with applications to the analysis of comparative genomic hybridization data, Biometrics.
x <- rnorm(300) + c(rep(1,50),rep(0,250)) w <- wbs(x) w.cpt <- changepoints(w,penalty="mbic.penalty") w.cpt$cpt.ic
x <- rnorm(300) + c(rep(1,50),rep(0,250)) w <- wbs(x) w.cpt <- changepoints(w,penalty="mbic.penalty") w.cpt$cpt.ic
The function finds the average of the input vector x
between change-points given in cpt
.
means.between.cpt(x, cpt = NULL, ...)
means.between.cpt(x, cpt = NULL, ...)
x |
a vector |
cpt |
a vector of integers with localisations of change-points |
... |
further arguments passed to |
a vector of the same length as x
, piecewise constant and equal to the mean between change-points given in cpt
x <- rnorm(100)+c(rep(-1,50),rep(1,50)) cpt <- 50 means.between.cpt(x,cpt) w <- wbs(x) cpt <- changepoints(w) means.between.cpt(x,cpt=cpt$cpt.ic$sbic)
x <- rnorm(100)+c(rep(-1,50),rep(1,50)) cpt <- 50 means.between.cpt(x,cpt) w <- wbs(x) cpt <- changepoints(w) means.between.cpt(x,cpt=cpt$cpt.ic$sbic)
Plots the input vector used to generate 'sbs' object x
with fitted piecewise constant function, equal to the mean
between change-points specified in cpt
.
## S3 method for class 'sbs' plot(x, cpt, ...)
## S3 method for class 'sbs' plot(x, cpt, ...)
x |
an object of class 'sbs', returned by |
cpt |
a vector of integers with localisations of change-points |
... |
other parameters which may be passed to |
When cpt
is omitted, the function automatically finds change-points
using changepoints
function with a default value of the threshold.
Plots the input vector used to generate 'wbs' object x
with fitted piecewise constant function, equal to the mean
between change-points specified in cpt
.
## S3 method for class 'wbs' plot(x, cpt, ...)
## S3 method for class 'wbs' plot(x, cpt, ...)
x |
an object of class 'wbs', returned by |
cpt |
a vector of integers with localisations of change-points |
... |
other parameters which may be passed to |
When cpt
is omitted, the function automatically finds change-points
using changepoints
function with strengthened Schwarz Information Criterion as a stopping criterion for the WBS algorithm.
Print for an 'sbs' object
## S3 method for class 'sbs' print(x, ...)
## S3 method for class 'sbs' print(x, ...)
x |
an object of class 'sbs' |
... |
further arguments passed to |
Print for a 'wbs' object
## S3 method for class 'wbs' print(x, ...)
## S3 method for class 'wbs' print(x, ...)
x |
an object of class 'wbs' |
... |
further arguments passed to |
The function generates M
intervals, whose endpoints are
are drawn uniformly without replacements from 1
,2
,..., n
. This routine can be
used inside wbs
function and is typically not called directly by the user.
random.intervals(n, M)
random.intervals(n, M)
n |
a number of endpoints to choose from |
M |
a number of intervals to generate |
a M
by 2 matrix with start (first column) and end (second column) points of an interval in each row
random.intervals(10,100)
random.intervals(10,100)
The function applies the Binary Segmentation algorithm to identify potential locations of the change-points in the mean of the input vector x
.
The object returned by this routine can be further passed to the changepoints
function,
which finds the final estimate of the change-points based on thresholding.
sbs(x, ...) ## Default S3 method: sbs(x, ...)
sbs(x, ...) ## Default S3 method: sbs(x, ...)
x |
a numeric vector |
... |
not in use |
an object of class "sbs", which contains the following fields
x |
the vector provided |
n |
the length of |
res |
a 6-column matrix with results, where 's' and 'e' denote start- end points of the intervals in which change-points candidates 'cpt' have been found; column 'CUSUM' contains corresponding value of CUSUM statistic; 'min.th' is the smallest threshold value for which given change-point candidate would be not added to the set of estimated change-points; the last column is the scale at which the change-point has been found |
x <- rnorm(300) + c(rep(1,50),rep(0,250)) s <- sbs(x) s.cpt <- changepoints(s) s.cpt th <- c(s.cpt$th,0.7*s.cpt$th) s.cpt <- changepoints(s,th=th) s.cpt
x <- rnorm(300) + c(rep(1,50),rep(0,250)) s <- sbs(x) s.cpt <- changepoints(s) s.cpt th <- c(s.cpt$th,0.7*s.cpt$th) s.cpt <- changepoints(s,th=th) s.cpt
The function evaluates the penalty term for the strengthened Schwarz Information Criterion proposed in P. Fryzlewicz (2014). This routine is typically not called directly by the user; its name can be passed as an argument to changepoints
.
ssic.penalty(n, cpt, alpha = 1.01, ssic.type = c("log", "power"))
ssic.penalty(n, cpt, alpha = 1.01, ssic.type = c("log", "power"))
n |
the number of observations |
cpt |
a vector with localisations of change-points |
alpha |
a scalar greater than one |
ssic.type |
a string ("log" or "power") |
the penalty term for
ssic.penalty="log"
or for
ssic.penalty="power"
, where denotes the number of elements in
cpt
P. Fryzlewicz (2014), Wild Binary Segmentation for multiple change-point detection. Annals of Statistics, to appear. (http://stats.lse.ac.uk/fryzlewicz/wbs/wbs.pdf)
x <- rnorm(300) + c(rep(1,50),rep(0,250)) w <- wbs(x) w.cpt <- changepoints(w,penalty="ssic.penalty") w.cpt$cpt.ic
x <- rnorm(300) + c(rep(1,50),rep(0,250)) w <- wbs(x) w.cpt <- changepoints(w,penalty="ssic.penalty") w.cpt$cpt.ic
The function applies the Wild Binary Segmentation algorithm to identify potential locations of the change-points in the mean of the input vector x
.
The object returned by this routine can be further passed to the changepoints
function,
which finds the final estimate of the change-points based on chosen stopping criteria.
wbs(x, ...) ## Default S3 method: wbs(x, M = 5000, rand.intervals = TRUE, integrated = TRUE, ...)
wbs(x, ...) ## Default S3 method: wbs(x, M = 5000, rand.intervals = TRUE, integrated = TRUE, ...)
x |
a numeric vector |
... |
not in use |
M |
a number of intervals used in the WBS algorithm |
rand.intervals |
a logical variable; if |
integrated |
a logical variable indicating the version of Wild Binary Segmentation algorithm used; when |
an object of class "wbs", which contains the following fields
x |
the input vector provided |
n |
the length of |
M |
the number of intervals used |
rand.intervals |
a logical variable indicating type of intervals |
integrated |
a logical variable indicating type of WBS procedure |
res |
a 6-column matrix with results, where 's' and 'e' denote start- end points of the intervals in which change-points candidates 'cpt' have been found; column 'CUSUM' contains corresponding value of CUSUM statistic; 'min.th' is the smallest threshold value for which given change-point candidate would be not added to the set of estimated change-points; the last column is the scale at which the change-point has been found |
x <- rnorm(300) + c(rep(1,50),rep(0,250)) w <- wbs(x) plot(w) w.cpt <- changepoints(w) w.cpt th <- c(w.cpt$th,0.7*w.cpt$th) w.cpt <- changepoints(w,th=th) w.cpt$cpt.th
x <- rnorm(300) + c(rep(1,50),rep(0,250)) w <- wbs(x) plot(w) w.cpt <- changepoints(w) w.cpt th <- c(w.cpt$th,0.7*w.cpt$th) w.cpt <- changepoints(w,th=th) w.cpt$cpt.th