Title: | Methods for Nonparametric Changepoint Detection |
---|---|
Description: | Implements the multiple changepoint algorithm PELT with a nonparametric cost function based on the empirical distribution of the data. This package extends the changepoint package (see Killick, R and Eckley, I (2014) <doi:10.18637/jss.v058.i03>). |
Authors: | Kaylea Haynes [aut], Rebecca Killick [aut], Paul Fearnhead [ths, ctb], Idris Eckley [ths], Daniel Grose [ctb, cre] |
Maintainer: | Daniel Grose <[email protected]> |
License: | GPL |
Version: | 1.0.5 |
Built: | 2024-11-29 09:06:29 UTC |
Source: | CRAN |
Calculates the optimal positioning and number of changepoints for data given a user specified cost function and penalty.
cpt.np( data, penalty = "MBIC", pen.value = 0, method = "PELT", test.stat = "empirical_distribution", class = TRUE, minseglen = 1, nquantiles = 10, verbose = TRUE )
data |
A vector, ts object or matrix containing the data within which you wish to find a changepoint. If the data is a matrix, each row is considered as a separate dataset. |
penalty |
Choice of "None", "SIC", "BIC", "MBIC", "AIC", "Hannan-Quinn", "Manual" and "CROPS" penalties. If "Manual" is specified, the manual penalty is contained in the pen.value parameter. If "CROPS" is specified, the penalty range is contained in the pen.value parameter; this is a vector of length 2 containing the minimum and maximum penalty values. CROPS can only be used if the method is "PELT". The predefined penalties listed count the changepoint as a parameter; append a 0 (e.g. "SIC0") to not count the changepoint as a parameter. |
pen.value |
The value of the penalty when using the Manual penalty option. A vector of length 2 (min,max) if using the CROPS penalty. |
method |
Currently the only method is "PELT". |
test.stat |
The assumed test statistic/distribution of the data. Currently only "empirical_distribution". |
class |
Logical. If TRUE then an object of class cpt is returned. |
minseglen |
Positive integer giving the minimum segment length (number of observations between changes), default is the minimum allowed by theory. |
nquantiles |
The number of quantiles to calculate when test.stat = "empirical_distribution". |
verbose |
Logical value. If TRUE then progress will be reported when penalty=CROPS. Default value is TRUE. |
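The arguments above can be combined as in the following minimal sketch of a call with a manual penalty. It assumes the changepoint.np and changepoint packages are installed; the simulated series `x`, the penalty value 25 and the minimum segment length 5 are illustrative choices, not recommendations from the package authors.

```r
library(changepoint)     # provides the cpt class and the cpts() accessor
library(changepoint.np)  # provides cpt.np()

# Illustrative data: one shift in location half way through the series
set.seed(1)
x <- c(rnorm(200, mean = 0), rnorm(200, mean = 3))

# Manual penalty: the numeric value is passed through pen.value
fit <- cpt.np(x, penalty = "Manual", pen.value = 25, method = "PELT",
              test.stat = "empirical_distribution", class = TRUE,
              minseglen = 5, nquantiles = 10)

# Estimated changepoint locations
cpts(fit)
```

A larger pen.value generally yields fewer changepoints, so it is worth trying several values or using the CROPS penalty to scan a range.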
This function is used to find multiple changes in a data set using the changepoint algorithm PELT with a nonparametric cost function based on the empirical distribution. A changepoint is denoted as the first observation of the new segment.
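To give some intuition for a cost based on the empirical distribution, the following base R sketch scores a segment by the binomial log-likelihood of its empirical CDF at a fixed set of quantile points. This is a simplified, unweighted illustration and not the package's implementation: the function `ed_cost`, the quantile grid `q` and the value of `K` are hypothetical names and choices, and the cost in Haynes et al. (2017) may place and weight the quantiles differently.

```r
# Simplified empirical-distribution segment cost (illustrative, unweighted).
ed_cost <- function(x, q) {
  n <- length(x)
  # Empirical CDF of the segment evaluated at each quantile point
  Fhat <- vapply(q, function(qk) mean(x <= qk), numeric(1))
  # p * log(p) with the convention 0 * log(0) = 0
  xlogx <- function(p) ifelse(p > 0, p * log(p), 0)
  # Maximised binomial log-likelihood at each quantile point
  loglik <- n * (xlogx(Fhat) + xlogx(1 - Fhat))
  # Average over the quantile points and convert to a cost
  -2 * mean(loglik)
}

# Illustrative data with a shift in location after observation 100
set.seed(1)
x <- c(rnorm(100, mean = 0), rnorm(100, mean = 3))

# K quantile points taken from the whole series
K <- 10
q <- quantile(x, probs = (1:K) / (K + 1), names = FALSE)

# Cost of one segment versus two segments split at the true change
cost_whole <- ed_cost(x, q)
cost_split <- ed_cost(x[1:100], q) + ed_cost(x[101:200], q)
c(whole = cost_whole, split = cost_split)
```

Splitting can never increase this cost, which is why a penalty is needed: PELT only accepts a changepoint when the reduction in cost exceeds the penalty.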
If class=TRUE then an object of S4 class "cpt" is returned. The slot cpts contains the changepoints that are returned. For class=FALSE the structure is as follows.
If data is a vector (single dataset) then a vector/list is returned depending on the value of method. If data is a matrix (multiple datasets) then a list is returned where each element in the list is either a vector or list depending on the value of method.
If method is PELT then a vector is returned containing the changepoint locations for the penalty supplied. If the penalty is CROPS then a list is returned with the elements:
cpt.out |
A data frame containing the value of the penalty value where the number of segmentations changes, the number of segmentations and the value of the cost at that penalty value. |
changepoints |
The optimal changepoints for the different penalty values starting with the lowest penalty value. |
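As a rough sketch of inspecting the class=FALSE return value under the CROPS penalty, the call below reuses the settings of the package's HeartRate example and prints only the top-level structure of the result. It assumes the changepoint.np and changepoint packages are installed; `res` is a hypothetical name.

```r
library(changepoint)
library(changepoint.np)

data(HeartRate)

# Same settings as the package's HeartRate example, but returning a plain list
res <- cpt.np(HeartRate, penalty = "CROPS", pen.value = c(5, 200),
              method = "PELT", test.stat = "empirical_distribution",
              class = FALSE, minseglen = 2,
              nquantiles = 4 * log(length(HeartRate)), verbose = FALSE)

# Show the top-level elements of the returned list
str(res, max.level = 1)
```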
Kaylea Haynes
Haynes K, Fearnhead P, Eckley IA (2017). “A computationally efficient nonparametric approach for changepoint detection.” Statistics and Computing, 27(5), 1293–1305. ISSN 1573-1375, doi:10.1007/s11222-016-9687-5.
Killick R, Fearnhead P, Eckley IA (2012). “Optimal Detection of Changepoints With a Linear Computational Cost.” Journal of the American Statistical Association, 107, 1590-1598. doi:10.1080/01621459.2012.737745.
Haynes K, Eckley IA, Fearnhead P (2015). “Computationally Efficient Changepoint Detection for a Range of Penalties.” Journal of Computational and Graphical Statistics, 26, 1-28. doi:10.1080/10618600.2015.1116445.
PELT in parametric settings: cpt.mean for changes in the mean, cpt.var for changes in the variance and cpt.meanvar for changes in the mean and variance.
# Example of a data set of length 1000 with changes in location
# (model 1 of Haynes, K et al. (2016)) with the empirical distribution cost function.
set.seed(12)
J <- function(x) {
  (1 + sign(x)) / 2
}
n <- 1000
tau <- c(0.1, 0.13, 0.15, 0.23, 0.25, 0.4, 0.44, 0.65, 0.76, 0.78, 0.81) * n
h <- c(2.01, -2.51, 1.51, -2.01, 2.51, -2.11, 1.05, 2.16, -1.56, 2.56, -2.11)
sigma <- 0.5
t <- seq(0, 1, length.out = n)
data <- array()
for (i in 1:n) {
  data[i] <- sum(h * J(n * t[i] - tau)) + (sigma * rnorm(1))
}
out <- cpt.np(data, penalty = "SIC", method = "PELT", test.stat = "empirical_distribution",
              class = TRUE, minseglen = 2, nquantiles = 4 * log(length(data)))
cpts(out)
# returns 100 130 150 230 250 400 440 650 760 780 810 as the changepoint locations.
plot(out)

# Example 2 uses the heart rate data.
data(HeartRate)
cptHeartRate <- cpt.np(HeartRate, penalty = "CROPS", pen.value = c(5, 200), method = "PELT",
                       test.stat = "empirical_distribution", class = TRUE, minseglen = 2,
                       nquantiles = 4 * log(length(HeartRate)))
plot(cptHeartRate, diagnostic = TRUE)
plot(cptHeartRate, ncpts = 11)
A dataset containing heart-rate recorded during a run.
data(HeartRate)
A vector of recorded heart rates at points over time with 1160 data points.
Kaylea Haynes
Haynes K, Fearnhead P, Eckley IA (2017). “A computationally efficient nonparametric approach for changepoint detection.” Statistics and Computing, 27(5), 1293–1305. ISSN 1573-1375, doi:10.1007/s11222-016-9687-5.