Title: | Multiscale Change-Point Inference |
---|---|
Description: | Allows fitting of step-functions to univariate serial data where neither the number of jumps nor their positions is known by implementing the multiscale regression estimators SMUCE, simultaneous multiscale change-point estimator, (K. Frick, A. Munk and H. Sieling, 2014) <doi:10.1111/rssb.12047> and HSMUCE, heterogeneous SMUCE, (F. Pein, H. Sieling and A. Munk, 2017) <doi:10.1111/rssb.12202>. In addition, confidence intervals for the change-point locations and bands for the unknown signal can be obtained. |
Authors: | Florian Pein [aut, cre], Thomas Hotz [aut], Hannes Sieling [aut], Timo Aspelmeier [ctb] |
Maintainer: | Florian Pein <[email protected]> |
License: | GPL-3 |
Version: | 2.1-10 |
Built: | 2024-11-18 06:28:45 UTC |
Source: | CRAN |
Allows fitting of step-functions to univariate serial data where neither the number of jumps nor their positions is known, by implementing the multiscale regression estimators SMUCE (Frick et al., 2014) and HSMUCE (Pein et al., 2017). In addition, confidence intervals for the change-point locations and bands for the unknown signal can be obtained. This is implemented in the function stepFit. Moreover, technical quantities such as the statistic itself, bounds or critical values can be computed by other functions of the package. More details can be found in the vignette.
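As a quick orientation, the following sketch (data and parameter values are illustrative only, not prescribed by the package) shows the typical workflow with stepFit:

library(stepR)
set.seed(1)
# two change-points at observations 50 and 100 (an assumed toy signal)
y <- c(rnorm(50, 0), rnorm(50, 2), rnorm(50, 0))
x <- seq_along(y)
# SMUCE fit with confidence intervals and confidence band
fit <- stepFit(y, x = x, alpha = 0.5, jumpint = TRUE, confband = TRUE)
plot(x, y, pch = 16, col = "grey30")
lines(fit, col = "red", lwd = 2)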
New in version 2.0-0:
stepFit | Piecewise constant multiscale inference
critVal | Critical values
computeBounds | Computation of the bounds
computeStat | Computation of the multiscale statistic
monteCarloSimulation | Monte Carlo simulation
parametricFamily | Parametric families
intervalSystem | Interval systems
penalty | Penalties
From version 1.0-0:
compareBlocks | Compare fit blockwise with ground truth
neighbours | Neighbouring integers
sdrobnorm | Robust standard deviation estimate
stepblock | Step function
stepcand | Forward selection of candidate jumps
stepfit | Fitted step function
steppath | Solution path of step-functions
stepsel | Automatic selection of number of jumps
Mainly used for patch-clamp recordings; these functions may be transferred to a specialised package:
BesselPolynomial | Bessel polynomials
contMC | Continuous time Markov chain
dfilter | Digital filters
jsmurf | Reconstruct filtered piecewise constant functions with noise
transit | TRANSIT algorithm for detecting jumps
Deprecated (please read their documentation for more details):
MRC | Compute multiresolution criterion
MRC.1000 | Values of the MRC statistic for 1,000 observations (all intervals)
MRC.asymptotic | "Asymptotic" values of the MRC statistic (all intervals)
MRC.asymptotic.dyadic | "Asymptotic" values of the MRC statistic (dyadic intervals)
bounds | Bounds based on MRC
family | Family of distributions
smuceR | Piecewise constant regression with SMUCE
If q == NULL in critVal, stepFit or computeBounds, a Monte-Carlo simulation is required for computing critical values. Since a Monte-Carlo simulation potentially lasts much longer (up to several hours or days if the number of observations is in the millions) than the main calculations, this package offers multiple possibilities for saving and loading the simulations. Simulations can either be saved in the workspace in the variable critValStepRTab or persistently on the file system, for which the package R.cache is used. Moreover, storing in and loading from variables and RDS files is supported. Finally, a pre-simulated collection of simulations can be accessed by installing the package stepRdata available from http://www.stochastik.math.uni-goettingen.de/stepRdata_1.0-0.tar.gz. By default, simulations will be saved in the workspace and on the file system. For more details, and for how simulations can be removed, see the section Simulating, saving and loading of Monte-Carlo simulations in critVal.
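For illustration, a hedged sketch of how a simulation can be computed once and reused (the data and the chosen options are examples, not requirements):

# perform the potentially slow Monte-Carlo simulation only once ...
stat <- monteCarloSimulation(n = 100L)
q <- critVal(n = 100L, alpha = 0.5, stat = stat)
# ... and reuse the resulting critical values in subsequent fits
y <- rnorm(100)
fit <- stepFit(y, q = q)
# for a one-off analysis, saving and loading can be disabled entirely
critVal(n = 100L, alpha = 0.5, stat = stat,
        options = list(save = list(), load = list()))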
Frick, K., Munk, A., Sieling, H. (2014) Multiscale change-point inference. With discussion and rejoinder by the authors. Journal of the Royal Statistical Society, Series B 76(3), 495–580.
Pein, F., Sieling, H., Munk, A. (2017) Heterogeneous change point inference. Journal of the Royal Statistical Society, Series B, 79(4), 1207–1227.
Pein, F., Tecuapetla-Gómez, I., Schütte, O., Steinem, C., Munk, A. (2017) Fully-automatic multiresolution idealization for filtered ion channel recordings: flickering event detection. arXiv:1706.03671.
Hotz, T., Schütte, O., Sieling, H., Polupanow, T., Diederichsen, U., Steinem, C., and Munk, A. (2013) Idealizing ion channel recordings by a jump segmentation multiresolution filter. IEEE Transactions on NanoBioscience 12(4), 376–386.
VanDongen, A. M. J. (1996) A new algorithm for idealizing single ion channel data containing multiple unknown conductance levels. Biophysical Journal 70(3), 1303–1315.
Futschik, A., Hotz, T., Munk, A., Sieling, H. (2014) Multiresolution DNA partitioning: statistical evidence for segments. Bioinformatics, 30(16), 2255–2262.
Boysen, L., Kempe, A., Liebscher, V., Munk, A., Wittich, O. (2009) Consistencies and rates of convergence of jump-penalized least squares estimators. The Annals of Statistics 37(1), 157–183.
Davies, P. L., Kovac, A. (2001) Local extremes, runs, strings and multiresolution. The Annals of Statistics 29, 1–65.
Friedrich, F., Kempe, A., Liebscher, V., Winkler, G. (2008) Complexity penalized M-estimation: fast computation. Journal of Computational and Graphical Statistics 17(1), 201–224.
stepFit, critVal, computeStat, computeBounds, jsmurf, transit, sdrobnorm, compareBlocks, parametricFamily, intervalSystem, penalty
# generate random observations
set.seed(1)
n <- 100L
x <- seq(1 / n, 1, 1 / n)
mu <- stepfit(cost = 0, family = "gauss", value = c(0, 3, 0, -2, 0), param = NULL,
              leftEnd = x[c(1, 21, 26, 71, 81)],
              rightEnd = x[c(20, 25, 70, 80, 100)], x0 = 0,
              leftIndex = c(1, 21, 26, 71, 81),
              rightIndex = c(20, 25, 70, 80, 100))
sigma0 <- 0.5
epsilon <- rnorm(n, 0, sigma0)
y <- fitted(mu) + epsilon
plot(x, y, pch = 16, col = "grey30", ylim = c(-3, 4))
lines(mu, lwd = 3)

# computation of SMUCE and its confidence statements
fit <- stepFit(y, x = x, alpha = 0.5, jumpint = TRUE, confband = TRUE)
lines(fit, lwd = 3, col = "red", lty = "22")

# confidence intervals for the change-point locations
points(jumpint(fit), col = "red", lwd = 3)
# confidence band
lines(confband(fit), lty = "22", col = "darkred", lwd = 2)

# higher significance level for larger detection power, but less confidence
# suggested for screening purposes
stepFit(y, x = x, alpha = 0.9, jumpint = TRUE, confband = TRUE)

# smaller significance level for the small risk that the number of
# change-points is overestimated with probability not more than 5%,
# but smaller detection power
stepFit(y, x = x, alpha = 0.05, jumpint = TRUE, confband = TRUE)

# different interval system, lengths, penalty and given parameter sd
stepFit(y, x = x, alpha = 0.5, intervalSystem = "dyaLen",
        lengths = c(1L, 2L, 4L, 8L), penalty = "weights",
        weights = c(0.4, 0.3, 0.2, 0.1), sd = sigma0,
        jumpint = TRUE, confband = TRUE)

# the above calls saved and (attempted to) load Monte-Carlo simulations and
# simulated them for nq = 128 observations
# in the following call no saving and no loading is done; a simulation for
# n = 100 observations is required, progress of the simulation will be reported
stepFit(y, x = x, alpha = 0.5, jumpint = TRUE, confband = TRUE,
        messages = 1000L,
        options = list(simulation = "vector", load = list(), save = list()))

# critVal was called in stepFit, can be called explicitly,
# for instance outside of a for loop to save computation time
qVector <- critVal(100L, alpha = 0.5)
identical(stepFit(y, x = x, q = qVector, jumpint = TRUE, confband = TRUE), fit)

qValue <- critVal(100L, alpha = 0.5, output = "value")
identical(stepFit(y, x = x, q = qValue, jumpint = TRUE, confband = TRUE), fit)

# computeBounds gives the multiscale constraint
computeBounds(y, alpha = 0.5)

# monteCarloSimulation will be called in critVal if required
# can be called explicitly
stat <- monteCarloSimulation(n = 100L)
identical(critVal(n = 100L, alpha = 0.5, stat = stat),
          critVal(n = 100L, alpha = 0.5,
                  options = list(load = list(), simulation = "vector")))
identical(critVal(n = 100L, alpha = 0.5, stat = stat, output = "value"),
          critVal(n = 100L, alpha = 0.5, output = "value",
                  options = list(load = list(), simulation = "vector")))

stat <- monteCarloSimulation(n = 100L, output = "maximum")
identical(critVal(n = 100L, alpha = 0.5, stat = stat),
          critVal(n = 100L, alpha = 0.5,
                  options = list(load = list(), simulation = "vector")))
identical(critVal(n = 100L, alpha = 0.5, stat = stat, output = "value"),
          critVal(n = 100L, alpha = 0.5, output = "value",
                  options = list(load = list(), simulation = "vector")))

# fit satisfies the multiscale constraint, i.e.
# the computed penalised multiscale statistic is not larger than the global quantile
computeStat(y, signal = fit, output = "maximum") <= qValue

# multiscale vector of statistics is componentwise not larger than
# the vector of critical values
all(computeStat(y, signal = fit, output = "vector") <= qVector)

# family "hsmuce"
set.seed(1)
y <- c(rnorm(50, 0, 1), rnorm(50, 1, 0.2))
plot(x, y, pch = 16, col = "grey30", ylim = c(-2.5, 2))

# computation of HSMUCE and its confidence statements
fit <- stepFit(y, x = x, alpha = 0.5, family = "hsmuce",
               jumpint = TRUE, confband = TRUE)
lines(fit, lwd = 3, col = "red", lty = "22")

# confidence intervals for the change-point locations
points(jumpint(fit), col = "red", lwd = 3)
# confidence band
lines(confband(fit), lty = "22", col = "darkred", lwd = 2)

# for comparison SMUCE, not recommended here
lines(stepFit(y, x = x, alpha = 0.5, jumpint = TRUE, confband = TRUE),
      lwd = 3, col = "blue", lty = "22")

# family "mDependentPS"
# generate observations from a moving average process
set.seed(1)
y <- c(rep(0, 50), rep(2, 50)) +
  as.numeric(arima.sim(n = 100, list(ar = c(), ma = c(0.8, 0.5, 0.3)), sd = sigma0))
correlations <- as.numeric(ARMAacf(ar = c(), ma = c(0.8, 0.5, 0.3), lag.max = 3))
covariances <- sigma0^2 * correlations
plot(x, y, pch = 16, col = "grey30", ylim = c(-2, 4))

# computation of SMUCE for dependent observations with given covariances
fit <- stepFit(y, x = x, alpha = 0.5, family = "mDependentPS",
               covariances = covariances, jumpint = TRUE, confband = TRUE)
lines(fit, lwd = 3, col = "red", lty = "22")

# confidence intervals for the change-point locations
points(jumpint(fit), col = "red", lwd = 3)
# confidence band
lines(confband(fit), lty = "22", col = "darkred", lwd = 2)

# for comparison SMUCE for independent observations, not recommended here
lines(stepFit(y, x = x, alpha = 0.5, jumpint = TRUE, confband = TRUE),
      lwd = 3, col = "blue", lty = "22")

# with given correlations, standard deviation will be estimated by sdrobnorm
stepFit(y, x = x, alpha = 0.5, family = "mDependentPS",
        correlations = correlations, jumpint = TRUE, confband = TRUE)

# examples from version 1.0-0
# estimating step-functions with Gaussian white noise added
# simulate a Gaussian hidden Markov model of length 1000 with 2 states
# with identical transition rates 0.01, and signal-to-noise ratio 2
sim <- contMC(1e3, 0:1, matrix(c(0, 0.01, 0.01, 0), 2), param = 1/2)
plot(sim$data, cex = 0.1)
lines(sim$cont, col = "red")

# maximum-likelihood estimation under multiresolution constraints
fit.MRC <- smuceR(sim$data$y, sim$data$x)
lines(fit.MRC, col = "blue")

# choose number of jumps using BIC
path <- steppath(sim$data$y, sim$data$x, max.blocks = 1e2)
fit.BIC <- path[[stepsel.BIC(path)]]
lines(fit.BIC, col = "green3", lty = 2)

# estimate after filtering
# simulate filtered ion channel recording with two states
set.seed(9)
# sampling rate 10 kHz
sampling <- 1e4
# tenfold oversampling
over <- 10
# 1 kHz 4-pole Bessel-filter, adjusted for oversampling
cutoff <- 1e3
df.over <- dfilter("bessel", list(pole = 4, cutoff = cutoff / sampling / over))
# two states, leaving state 1 at 10 Hz, state 2 at 20 Hz
rates <- rbind(c(0, 10), c(20, 0))
# simulate 0.5 s, level 0 corresponds to state 1, level 1 to state 2
# noise level is 0.3 after filtering
Sim <- contMC(0.5 * sampling, 0:1, rates, sampling = sampling,
              family = "gaussKern",
              param = list(df = df.over, over = over, sd = 0.3))
plot(Sim$data, pch = ".")
lines(Sim$discr, col = "red")

# fit under multiresolution constraints using filter corresponding to sample rate
df <- dfilter("bessel", list(pole = 4, cutoff = cutoff / sampling))
Fit.MRC <- jsmurf(Sim$data$y, Sim$data$x, param = df, r = 1e2)
lines(Fit.MRC, col = "blue")

# fit using TRANSIT
Fit.trans <- transit(Sim$data$y, Sim$data$x)
lines(Fit.trans, col = "green3", lty = 2)
Recursively compute coefficients of Bessel Polynomials.
Deprecation warning: This function is a helper for the Bessel filters in dfilter and may be removed when dfilter is removed.
BesselPolynomial(n, reverse = FALSE)
n |
order |
reverse |
whether to return the coefficients of a reverse Bessel Polynomial |
Returns the polynomial's coefficients ordered by increasing exponent, i.e. starting with the intercept, as expected by polyroot.
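Since the coefficients are ordered as polyroot expects, the roots of the Bessel polynomial can be obtained directly, for example:

# roots of the third-order Bessel polynomial 15 x^3 + 15 x^2 + 6 x + 1
polyroot(BesselPolynomial(3))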
# 15 x^3 + 15 x^2 + 6 x + 1
BesselPolynomial(3)
Computes two-sided bounds for a set of intervals based on a multiresolution criterion (MRC).
Deprecation warning: This function is deprecated but still working; however, it may be made defunct in a future version. Please use the function computeBounds instead. An example of how to reproduce its results (currently only family "gauss" is supported) is given below.
bounds(y, type = "MRC", ...)

bounds.MRC(y, q, alpha = 0.05, r = ceiling(50 / min(alpha, 1 - alpha)),
           lengths = if (family == "gaussKern")
             2^(floor(log2(length(y))):ceiling(log2(length(param$kern))))
           else 2^(floor(log2(length(y))):0),
           penalty = c("none", "len", "var", "sqrt"),
           name = if (family == "gaussKern") ".MRC.ktable" else ".MRC.table",
           pos = .MCstepR,
           family = c("gauss", "gaussvar", "poisson", "binomial", "gaussKern"),
           param = NULL, subset, max.iter = 1e2, eps = 1e-3)

## S3 method for class 'bounds'
x[subset]
y |
a numeric vector containing the serial data |
type |
so far only bounds of type |
... |
further arguments to be passed on to |
q |
quantile of the MRC; if specified, |
alpha |
level of significance |
r |
number of simulations to use to obtain quantile of MRC for specified |
lengths |
vector of interval lengths to use, dyadic intervals by default |
penalty |
penalty term in the multiresolution statistic: |
family , param
|
specifies distribution of data, see family |
subset |
a subset of indices of |
name , pos
|
under which name and where precomputed results are stored, or retrieved, see |
max.iter |
maximal iterations in Newton's method to compute non-Gaussian MRC bounds |
eps |
tolerance in Newton's method |
x |
an object of class |
Returns an object of class bounds, i.e. a list whose entry bounds contains two-sided bounds (lower and upper) of the considered intervals (with left index li and right index ri) in a data.frame, along with a vector start specifying in which row of the entry bounds intervals with corresponding li start (if any; specified as a C-style index), and a logical feasible telling whether a feasible solution exists for these bounds (always TRUE for MRC bounds which are not restricted to a subset).
computeBounds, stepbound, family
y <- rnorm(100, c(rep(0, 50), rep(1, 50)), 0.5)
b <- computeBounds(y, q = 4, intervalSystem = "dyaLen", penalty = "none")
b <- b[order(b$li, b$ri), ]
attr(b, "row.names") <- seq(along = b$li)
# entries in bounds are recovered by computeBounds
all.equal(bounds(y, q = 4)$bounds, b) # TRUE

# simulate signal of 100 data points
Y <- rpois(100, 1:100 / 10)
# compute bounds for intervals of dyadic lengths
b <- bounds(Y, penalty = "len", family = "poisson", q = 4)
# compute bounds for all intervals
b <- bounds(Y, penalty = "len", family = "poisson", q = 4, lengths = 1:100)
Blockwise comparison of a fitted step function with a known ground truth using different criteria.
compareBlocks(truth, estimate, dist = 5e3)
truth |
an object of class |
estimate |
corresponding estimated object(s) of class |
dist |
a single |
A data.frame, containing just one row if two single stepblock objects were given, with columns
true.num , est.num
|
the true / estimated number of blocks |
true.pos , false.pos , false.neg , sens.rate , prec.rate
|
the number of true / false positives, false negatives, as well as the corresponding sensitivity and precision rates, where an estimated block is considered a true positive if there is a corresponding block in the ground truth with both endpoints within |
fpsle |
false positive sensitive localization error: for each estimated block's midpoint find into which true block it falls, and sum distances of the respective borders |
fnsle |
false negative sensitive localization error: for each true block's mid-point find into which estimated block it falls, and sum distances of the respective borders |
total.le |
total localization error: sum of |
No differences between true and fitted parameter values are taken into account, only the precision of the detected blocks is considered; also, differing from the criteria in Elhaik et al. (2010), no blocks are merged in the ground truth if their parameter values are close, as this may punish sensitive estimators.
Beware that these criteria compare blockwise, i.e. they do not compare the precision of single jumps but for each block both endpoints have to match well at the same time.
Elhaik, E., Graur, D., Josić, K. (2010) Comparative testing of DNA segmentation algorithms using benchmark simulations. Molecular Biology and Evolution 27(5), 1015-24.
Futschik, A., Hotz, T., Munk, A. Sieling, H. (2014) Multiresolution DNA partitioning: statistical evidence for segments. Bioinformatics, 30(16), 2255–2262.
# simulate two Gaussian hidden Markov models of length 1000 with 2 states each
# with identical transition rates of 0.01 and 0.05, respectively;
# signal-to-noise ratio is 5
sim <- lapply(c(0.01, 0.05), function(rate)
  contMC(1e3, 0:1, matrix(c(0, rate, rate, 0), 2), param = 1/5))
plot(sim[[1]]$data)
lines(sim[[1]]$cont, col = "red")

# use smuceR to estimate fit
fit <- lapply(sim, function(s) smuceR(s$data$y, s$data$x))
lines(fit[[1]], col = "blue")

# compare fit with (discretised) ground truth
compareBlocks(lapply(sim, function(s) s$discr), fit)
Computes the multiscale constraint given by the multiscale test, (3.12) in the vignette. In more detail, it returns the bounds of the interval of parameters for which the test statistic is smaller than or equal to the critical value for the corresponding length, i.e. the two solutions resulting from equating the test statistic to the critical value.

If q == NULL a Monte-Carlo simulation is required for computing critical values. Since a Monte-Carlo simulation potentially lasts much longer (up to several hours or days if the number of observations is in the millions) than the main calculations, this package saves the simulations by default in the workspace and on the file system such that a second call requiring the same Monte-Carlo simulation will be much faster. For more details, in particular to which arguments the Monte-Carlo simulations are specific, see the section Storing of Monte-Carlo simulations below. Progress of a Monte-Carlo simulation can be reported by the argument messages and the saving can be controlled by the argument options; both can be specified in ... and are explained in monteCarloSimulation and critVal, respectively.
computeBounds(y, q = NULL, alpha = NULL, family = NULL, intervalSystem = NULL, lengths = NULL, ...)
y |
a numeric vector containing the observations |
q |
either |
alpha |
a probability, i.e. a single numeric between 0 and 1, giving the significance level. Its choice is a trade-off between data fit and parsimony of the estimator. In other words, this argument balances the risks of missing change-points and detecting additional artefacts. For more details on this choice see (Frick et al., 2014, section 4) and (Pein et al., 2017, section 3.4). Either |
family |
a string specifying the assumed parametric family, for more details see parametricFamily, currently |
intervalSystem |
a string giving the used interval system, either |
lengths |
an integer vector giving the set of lengths, i.e. only intervals of these lengths will be considered. Note that not all lengths are possible for all interval systems and for all parametric families, see intervalSystem and parametricFamily, respectively, to see which ones are allowed. By default ( |
... |
there are two groups of further arguments:
|
A data.frame containing two integer vectors li and ri and two numeric vectors lower and upper. For each interval in the set of intervals specified by intervalSystem and lengths, li and ri give the left and right index of the interval, and lower and upper give the lower and upper bounds for the parameter on the given interval.
If q == NULL a Monte-Carlo simulation is required for computing critical values. Since a Monte-Carlo simulation potentially lasts much longer (up to several hours or days if the number of observations is in the millions) than the main calculations, this package offers multiple possibilities for saving and loading the simulations. Progress of a simulation can be reported by the argument messages, which can be specified in ... and is explained in the documentation of monteCarloSimulation. Each Monte-Carlo simulation is specific to the number of observations, the parametric family (including certain parameters, see parametricFamily) and the interval system, and, for simulations of class "MCSimulationMaximum", additionally to the set of lengths and the used penalty. Monte-Carlo simulations can also be performed for a (slightly) larger number of observations given in the argument nq in ..., explained in the documentation of critVal, which avoids extensive resimulations for an only slightly varying number of observations. Simulations can either be saved in the workspace in the variable critValStepRTab or persistently on the file system, for which the package R.cache is used. Moreover, storing in and loading from variables and RDS files is supported. Finally, a pre-simulated collection of simulations can be accessed by installing the package stepRdata available from http://www.stochastik.math.uni-goettingen.de/stepRdata_1.0-0.tar.gz. The simulation, saving and loading can be controlled by the argument options, which can be specified in ... and is explained in the documentation of critVal. By default, simulations will be saved in the workspace and on the file system. For more details, and for how simulations can be removed, see the section Simulating, saving and loading of Monte-Carlo simulations in critVal.
Depending on intervalSystem and lengths the intervals might be ordered differently to allow fast computation. For most applications the order should not matter. Otherwise, the entries can be reordered with order; an example is given below.
Frick, K., Munk, A., Sieling, H. (2014) Multiscale change-point inference. With discussion and rejoinder by the authors. Journal of the Royal Statistical Society, Series B 76(3), 495–580.
Pein, F., Sieling, H., Munk, A. (2017) Heterogeneous change point inference. Journal of the Royal Statistical Society, Series B, 79(4), 1207–1227.
critVal, penalty, parametricFamily, intervalSystem, stepFit, computeStat, monteCarloSimulation
y <- c(rnorm(50), rnorm(50, 1))

# the multiscale constraint
bounds <- computeBounds(y, alpha = 0.5)

# the order of the bounds depends on intervalSystem and lengths
# to allow fast computation
# if a specific order is required it can be reordered by order
# b is ordered with increasing left indices and increasing right indices
b <- bounds[order(bounds$li, bounds$ri), ]
attr(b, "row.names") <- seq(along = b$li)

# higher significance level for larger detection power, but less confidence
computeBounds(y, alpha = 0.99)

# smaller significance level for stronger confidence statements, but at
# the risk of missing change-points
computeBounds(y, alpha = 0.05)

# different interval system, lengths, penalty and given parameter sd
computeBounds(y, alpha = 0.5, intervalSystem = "dyaLen",
              lengths = c(1L, 2L, 4L, 8L), penalty = "weights",
              weights = c(0.4, 0.3, 0.2, 0.1), sd = 0.5)

# with given q
identical(computeBounds(y, q = critVal(100L, alpha = 0.5)), bounds)
identical(computeBounds(y, q = critVal(100L, alpha = 0.5, output = "value")), bounds)

# the above calls saved and (attempted to) load Monte-Carlo simulations and
# simulated them for nq = 128 observations
# in the following call no saving and no loading is done; a simulation for
# n = 100 observations is required, progress of the simulation will be reported
computeBounds(y, alpha = 0.5, messages = 1000L,
              options = list(simulation = "vector", load = list(), save = list()))

# with given stat to compute q
stat <- monteCarloSimulation(n = 128L)
identical(computeBounds(y, alpha = 0.5, stat = stat),
          computeBounds(y, alpha = 0.5, options = list(load = list())))
Computes the multiscale vector of penalised statistics, (3.7) in the vignette, or the penalised multiscale statistic, (3.6) in the vignette, for a given signal.
computeStat(y, signal = 0, family = NULL, intervalSystem = NULL, lengths = NULL, penalty = NULL, nq = length(y), output = c("list", "vector", "maximum"), ...)
y |
a numeric vector containing the observations |
signal |
the given signal, either a single numeric for a constant function equal to the given value or an object of class |
family |
a string specifying the assumed parametric family, for more details see parametricFamily, currently |
intervalSystem |
a string giving the used interval system, either |
lengths |
an integer vector giving the set of lengths, i.e. only intervals of these lengths will be considered. Note that not all lengths are possible for all interval systems and for all parametric families, see intervalSystem and parametricFamily, respectively, to see which ones are allowed. By default ( |
penalty |
a string specifying how the statistics will be penalised, either |
nq |
a single integer larger than or equal to |
output |
a string specifying the output, see Value |
... |
further parameters of the parametric family. Depending on argument |
If output == "list" a list containing in maximum the penalised multiscale statistic, i.e. the maximum over all test statistics, in stat the multiscale vector of penalised statistics, i.e. a vector giving for each length in lengths the maximum over all tests of that length, and in lengths the vector of lengths. If output == "vector" a numeric vector giving the multiscale vector of penalised statistics. If output == "maximum" a single numeric giving the penalised multiscale statistic. -Inf is returned for lengths for which the signal is not constant on any interval of that length contained in the set of intervals and, hence, no test statistic can be computed. This behaves similarly to max(numeric(0)).
Frick, K., Munk, A., Sieling, H. (2014) Multiscale change-point inference. With discussion and rejoinder by the authors. Journal of the Royal Statistical Society, Series B 76(3), 495–580.
Pein, F., Sieling, H., Munk, A. (2017) Heterogeneous change point inference. Journal of the Royal Statistical Society, Series B, 79(4), 1207–1227.
parametricFamily, intervalSystem, penalty, monteCarloSimulation, stepFit, computeBounds
y <- rnorm(100)
# for the default signal = 0 a signal constant 0 is assumed
identical(computeStat(y),
          computeStat(y, signal = list(leftIndex = 1L, rightIndex = 100L, value = 0)))

# different constant value
ret <- computeStat(y, signal = 1)
# penalised multiscale statistic
identical(ret$maximum, computeStat(y, signal = 1, output = "maximum"))
# multiscale vector of penalised statistics
identical(ret$stat, computeStat(y, signal = 1, output = "vector"))

y <- c(rnorm(50), rnorm(50, 1))
# true signal
computeStat(y, signal = list(leftIndex = c(1L, 51L), rightIndex = c(50L, 100L),
                             value = c(0, 1)))

# fit satisfies the multiscale constraint, i.e.
# the penalised multiscale statistic is not larger than the used global quantile 1
computeStat(y, signal = stepFit(y, q = 1), output = "maximum") <= 1

# different interval system, lengths, penalty, given parameter sd
# and computed for an increased number of observations nq
computeStat(y, signal = list(leftIndex = c(1L, 51L), rightIndex = c(50L, 100L),
                             value = c(0, 1)),
            nq = 128, sd = 0.5, intervalSystem = "dyaLen",
            lengths = c(1L, 2L, 4L, 8L), penalty = "none")

# family "hsmuce"
computeStat(y, signal = mean(y), family = "hsmuce")

# family "mDependentPS"
signal <- list(leftIndex = c(1L, 13L), rightIndex = c(12L, 17L), value = c(0, -1))
y <- c(rep(0, 13), rep(-1, 4)) +
  as.numeric(arima.sim(n = 17, list(ar = c(), ma = c(0.8, 0.5, 0.3)), sd = 1))
covariances <- as.numeric(ARMAacf(ar = c(), ma = c(0.8, 0.5, 0.3), lag.max = 3))
computeStat(y, signal = signal, family = "mDependentPS", covariances = covariances)
Simulate a continuous time Markov chain.
Deprecation warning: This function is mainly used for patch-clamp recordings and may be transferred to a specialised package.
contMC(n, values, rates, start = 1, sampling = 1, family = c("gauss", "gaussKern"), param = NULL)
n |
number of data points to simulate |
values |
a |
rates |
a square |
start |
the state in which the Markov chain is started |
sampling |
the sampling rate |
family |
whether Gaussian white ( |
param |
for |
A list with components
cont |
an object of class |
discr |
an object of class |
data |
a |
This follows the description for simulating ion channels given by VanDongen (1996).
VanDongen, A. M. J. (1996) A new algorithm for idealizing single ion channel data containing multiple unknown conductance levels. Biophysical Journal 70(3), 1303–1315.
stepblock, jsmurf, stepbound, steppath, family, dfilter
# simulate filtered ion channel recording with two states
set.seed(9)
# sampling rate 10 kHz
sampling <- 1e4
# tenfold oversampling
over <- 10
# 1 kHz 4-pole Bessel-filter, adjusted for oversampling
cutoff <- 1e3
df <- dfilter("bessel", list(pole = 4, cutoff = cutoff / sampling / over))
# two states, leaving state 1 at 1 Hz, state 2 at 10 Hz
rates <- rbind(c(0, 1e0), c(1e1, 0))
# simulate 5 s, level 0 corresponds to state 1, level 1 to state 2
# noise level is 0.1 after filtering
sim <- contMC(5 * sampling, 0:1, rates, sampling = sampling, family = "gaussKern",
              param = list(df = df, over = over, sd = 0.1))
sim$cont
plot(sim$data, pch = ".")
lines(sim$discr, col = "red")

# noise level after filtering, estimated from first block
sd(sim$data$y[1:sim$discr$rightIndex[1]])

# show autocovariance in first block
acf(ts(sim$data$y[1:sim$discr$rightIndex[1]], freq = sampling), type = "cov")

# power spectrum in first block
s <- spec.pgram(ts(sim$data$y[1:sim$discr$rightIndex[1]], freq = sampling),
                spans = c(200, 90))
# cutoff frequency is where power spectrum is halved
abline(v = cutoff, h = s$spec[1] / 2, lty = 2)
Computes the vector of critical values or the global quantile. This function offers two ways of computation: either at significance level alpha from a Monte-Carlo simulation, see also section 3.2 in the vignette for more details, or from the global quantile / critical values given in the argument q. For more details on these two options see the section Computation of critical values / global quantile.

Since a Monte-Carlo simulation potentially lasts much longer (up to several hours or days if the number of observations is in the millions) than the main calculations, this package saves the simulations by default in the workspace and on the file system such that a second call requiring the same Monte-Carlo simulation will be much faster. For more details, in particular to which arguments the Monte-Carlo simulations are specific, see the section Storing of Monte-Carlo simulations below. Progress of a Monte-Carlo simulation can be reported by the argument messages in ..., explained in monteCarloSimulation, and the saving can be controlled by the argument options.
critVal(n, q = NULL, alpha = NULL,
        nq = 2L^(as.integer(log2(n) + 1e-12) + 1L) - 1L,
        family = NULL, intervalSystem = NULL, lengths = NULL, penalty = NULL,
        weights = NULL, stat = NULL, r = 1e4, output = c("vector", "value"),
        options = NULL, ...)
n |
a positive integer giving the number of observations |
q |
either |
alpha |
a probability, i.e. a single numeric between 0 and 1, giving the significance level. Its choice is a trade-off between data fit and parsimony of the estimator. In other words, this argument balances the risks of missing change-points and detecting additional artefacts. For more details on this choice see (Frick et al., 2014, section 4) and (Pein et al., 2017, section 3.4). Either |
nq |
a positive integer larger than or equal to |
family |
a string specifying the assumed parametric family, for more details see parametricFamily, currently |
intervalSystem |
a string giving the used interval system, either |
lengths |
an integer vector giving the set of lengths, i.e. only intervals of these lengths will be considered. Note that not all lengths are possible for all interval systems and for all parametric families, see intervalSystem and parametricFamily, respectively, to see which ones are allowed. By default ( |
penalty |
a string specifying how different scales will be balanced, either |
weights |
a numeric vector of length weights == rep(1 / length(lengths), length(lengths)) |
stat |
an object of class |
r |
a positive integer giving the required number of Monte-Carlo simulations if they will be simulated or loaded from the workspace or the file system |
output |
a string specifying the return value, if |
options |
a |
... |
there are two groups of further arguments:
|
If output == "vector" a numeric vector giving the vector of critical values, i.e. a vector of length length(lengths) giving for each length the corresponding critical value. If output == "value" a single numeric giving the global quantile. In both cases an attribute "n" additionally gives the number of observations for which the Monte-Carlo simulation was performed.
This function offers two ways to compute the resulting value:

If q == NULL it will be computed at significance level alpha from a Monte-Carlo simulation. For the penalties "sqrt", "log" and "none" the global quantile will be the (1 - alpha)-quantile of the penalised multiscale statistic, see section 3.2.1 in the vignette, and if required the vector of critical values will be derived from it. For the penalty "weights" the vector of critical values will be calculated according to the given weights. The Monte-Carlo simulation can either be given in stat, or an attempt will be made to load it, or it will be simulated. How Monte-Carlo simulations are simulated, saved and loaded can be controlled by the argument options; for more details see the section Simulating, saving and loading of Monte-Carlo simulations.

If q is given the resulting value will be derived from it. For the argument q either a single finite numeric giving the global quantile or a vector of finite numerics giving the vector of critical values (not allowed for output == "value") is possible:

- A single numeric giving the global quantile. If output == "vector" the vector of critical values will be computed from it for the given lengths and penalty (penalty "weights" is not allowed). Note that the global quantile is specific to the arguments family, intervalSystem, lengths and penalty.
- A vector of length length(lengths), giving for each length the corresponding critical value. This vector is identical to the vector of critical values.
- A vector of length n, giving for each length 1:n the corresponding critical value.
- A vector of length equal to the number of all possible lengths for the given interval system and the given parametric family, giving for each possible length the corresponding critical value.

Additionally, an attribute "n" giving the number of observations for which q was computed is allowed. This attribute must be a single integer equal to or larger than the argument n, which means that q must have been computed for at least n observations. This additionally allows:

- A vector of length attr(q, "n"), giving for each length 1:attr(q, "n") the corresponding critical value.
- A vector of length equal to the number of all possible lengths for the given interval system and the given parametric family if the number of observations is attr(q, "n"), giving for each possible length the corresponding critical value.

The attribute "n" will be kept or set to n if missing.
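As an illustration (the numbers are examples only), a global quantile computed once for a larger number of observations can be reused for smaller data sets:

# compute the global quantile once for up to 128 observations ...
q <- critVal(128L, alpha = 0.5, output = "value")
attr(q, "n")  # number of observations the quantile is valid for (at least 128)
# ... and reuse it for a data set with only 100 observations
critVal(n = 100L, q = q)
y <- rnorm(100)
stepFit(y, q = q)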
Since a Monte-Carlo simulation potentially lasts much longer (up to several hours or days if the number of observations is in the millions) than the main calculations, this function offers multiple possibilities for saving and loading the simulations. The simulation, saving and loading can be controlled by the argument options. This argument has to be a list or NULL and the following named entries are allowed: "simulation", "save", "load", "envir" and "dirs". All missing entries will be set to their default option.
Objects of class "MCSimulationVector", containing simulations of the multiscale vector of statistics, and objects of class "MCSimulationMaximum", containing simulations of the penalised multiscale statistic (for penalties "sqrt", "log" and "none"), can be simulated, saved and loaded. Each Monte-Carlo simulation is specific to the number of observations, the parametric family and the interval system, and for "MCSimulationMaximum" additionally to the set of lengths and the used penalty. Both types lead to the same result; however, an object of class "MCSimulationVector" is more flexible, since critical values for all penalties and all sets of lengths can be derived from it, but it requires much more storage space and has slightly larger saving and loading times. Note that Monte-Carlo simulations can only be saved and loaded if they are generated with the default function for generating random observations, i.e. when rand.gen (in ...) is NULL. For a given simulation this is signalled by the attribute "save", which is TRUE if the simulation can be saved and FALSE otherwise.
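A small sketch of inspecting a simulation object (the expected classes and attribute values are stated as assumptions in the comments):

stat <- monteCarloSimulation(n = 100L)
class(stat)        # expected to be "MCSimulationVector" for the default output
attr(stat, "save") # should be TRUE, since the default rand.gen was used
stat2 <- monteCarloSimulation(n = 100L, output = "maximum")
class(stat2)       # expected to be "MCSimulationMaximum"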
Monte-Carlo simulations can also be performed for a (slightly) larger number of observations, given in the argument nq, which avoids extensive resimulations for an only slightly varying number of observations. The overestimation control is still satisfied, but the detection power is (slightly) smaller. Note, however, that the default lengths might change when the number of observations is increased and, hence, for type "vectorIncreased" a different simulation might still be required.
We refer to the different types as follows:

- "vector": an object of class "MCSimulationMaximum", i.e. simulations of the penalised multiscale statistic, for n observations
- "vectorIncreased": an object of class "MCSimulationMaximum", i.e. simulations of the penalised multiscale statistic, for nq observations
- "matrix": an object of class "MCSimulationVector", i.e. simulations of the multiscale vector of statistics, for n observations
- "matrixIncreased": an object of class "MCSimulationVector", i.e. simulations of the multiscale vector of statistics, for nq observations
The simulations can either be saved in the workspace in the variable critValStepRTab or persistently on the file system, for which the package R.cache is used. Loading from the workspace is faster, but either the user has to store the workspace manually or in a new session the simulations have to be performed again. Moreover, storing in and loading from variables and RDS files is supported. Finally, a pre-computed collection of simulations of type "matrixIncreased" for the parametric families "gauss" and "hsmuce" can be accessed by installing the package stepRdata available from http://www.stochastik.math.uni-goettingen.de/stepRdata_1.0-0.tar.gz.
For loading from / saving in the workspace the variable critValStepRTab in the environment options$envir will be looked for and, if missing, in case of saving also created there. Moreover, the variable(s) specified in options$save$variable (explained in the subsection Saving: options$save) will be assigned to this environment. options$envir will be passed to the arguments pos and where in the functions assign, get and exists, respectively. By default, a local environment in the package is used.
For loading from / saving on the file system, loadCache(key = keyList, dirs = options$dirs) and saveCache(stat, key = attr(stat, "keyList"), dirs = options$dirs) are called, respectively. In other words, options$dirs has to be a character vector constituting the path to the cache subdirectory relative to the cache root directory as returned by getCacheRootPath(). If options$dirs == "" the path will be the cache root path. By default the subdirectory "stepR" is used, i.e. options$dirs == "stepR". Missing directories will be created.
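A hedged sketch of redirecting both storage locations (the environment and the subdirectory name "myStepR" are arbitrary choices):

# store workspace simulations in a user-controlled environment and
# file-system simulations in a custom R.cache subdirectory
myEnv <- new.env()
q <- critVal(100L, alpha = 0.5,
             options = list(envir = myEnv, dirs = "myStepR"))
exists("critValStepRTab", where = myEnv)  # TRUE if a simulation was saved there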
Whenever Monte-Carlo simulations have to be performed, i.e. when stat == NULL and the required Monte-Carlo simulation could not be loaded, the type specified in options$simulation will be simulated by monteCarloSimulation. In other words, options$simulation must be a single string of the following: "vector", "vectorIncreased", "matrix" or "matrixIncreased". By default (options$simulation == NULL), an object of class "MCSimulationVector" for nq observations will be simulated, i.e. options$simulation == "matrixIncreased". For this choice please recall the explanations regarding computation time and flexibility at the beginning of this section.
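For example, to force the smallest, least reusable simulation type without any saving or loading (values illustrative):

# simulate only the penalised multiscale statistic for exactly n = 100
# observations; nothing is loaded or saved in this call
critVal(100L, alpha = 0.5,
        options = list(simulation = "vector", load = list(), save = list()))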
Loading of the simulations can be controlled by the entry options$load, which itself has to be a list with possible entries "RDSfile", "workspace", "package" and "fileSystem". Missing entries disable loading from that option.
Whenever a Monte-Carlo simulation is required, i.e. when the variable q is not given, it will be searched for at the following places in the given order until found:

1. in the variable stat,
2. in options$load$RDSfile as an RDS file, i.e. the simulation will be loaded by readRDS(options$load$RDSfile). In other words, options$load$RDSfile has to be a connection or the name of the file from which the R object is read,
3. in the workspace or on the file system, in the following order: "vector", "matrix", "vectorIncreased" and finally "matrixIncreased". For penalty == "weights" only "matrix" and "matrixIncreased" will be looked for. For each option it will first be looked for in the workspace and then on the file system. All searches can be disabled by not specifying the corresponding string in options$load$workspace and options$load$fileSystem. In other words, options$load$workspace and options$load$fileSystem have to be vectors of strings containing none, some or all of "vector", "matrix", "vectorIncreased" and "matrixIncreased",
4. in the package stepRdata (if installed) and if options$load$package == TRUE. In other words, options$load$package must be a single logical or NULL,
5. if all other options fail a Monte-Carlo simulation will be performed.
By default (if options$load is missing / NULL) no RDS file is specified and all other options are enabled, i.e.

options$load <- list(workspace = c("vector", "vectorIncreased", "matrix", "matrixIncreased"), fileSystem = c("vector", "vectorIncreased", "matrix", "matrixIncreased"), package = TRUE, RDSfile = NULL).
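As an illustrative sketch (the file name "mcSim.RDS" is hypothetical, and the same n, lengths and penalty are assumed in both calls), loading can be restricted to a previously written RDS file:

# write the simulation to an RDS file in the first call ...
critVal(100L, alpha = 0.5,
        options = list(save = list(RDSfile = "mcSim.RDS"), load = list()))
# ... and load it exclusively from that file later, with no other lookups
critVal(100L, alpha = 0.5,
        options = list(load = list(RDSfile = "mcSim.RDS"), save = list()))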
Saving of the simulations can be controlled by the entry options$save
which itself has to be a list
with possible entries: "workspace"
, "fileSystem"
, "RDSfile"
and "variable"
. Missing entries disable saving to the corresponding target.
All available simulations, no matter whether they are given by stat
, loaded, simulated or in case of "vector"
and "vectorIncreased"
computed from "matrix"
and "matrixIncreased"
, respectively, will be saved in all options for which the corresponding type is specified. Here we say a simulation is of type "vectorIncreased"
or "matrixIncreased"
if the simulation is not performed for n
observations. More specifically, a simulation will be saved:
in the workspace or on the file system if the corresponding string is contained in options$save$workspace
and options$save$fileSystem
, respectively. In other words, options$save$workspace
and options$save$fileSystem
have to be vectors of strings containing none, some or all of "vector"
, "matrix"
, "vectorIncreased"
and "matrixIncreased"
,
in an RDS file specified by options$save$RDSfile
which has to be a vector of length one or two containing connections or names of files to which the R object is saved. If options$save$RDSfile
is of length two a simulation of type "vector"
or "vectorIncreased"
(only one can occur at one function call) will be saved in options$save$RDSfile[1]
by
saveRDS(stat, file = options$save$RDSfile[1])
and "matrix"
or "matrixIncreased"
(only one can occur at one function call) will be saved in options$save$RDSfile[2]
. If options$save$RDSfile is of length one, both types will be saved to the same file, which means that if both occur at the same call only "vector" or "vectorIncreased" will be saved. Saving can be disabled by not specifying options$save$RDSfile
or by passing an empty string to the corresponding entry of options$save$RDSfile
.
in a variable named by options$save$variable
in the environment
options$envir
. Hence, options$save$variable
has to be a vector of length one or two containing variable names (character strings). If options$save$variable
is of length two a simulation of type "vector"
or "vectorIncreased"
(only one can occur at one function call) will be saved in options$save$variable[1]
and "matrix"
or "matrixIncreased"
(only one can occur at one function call) will be saved in options$save$variable[2]
. If options$save$variable is of length one, both types will be saved in the same variable, which means that if both occur at the same call only "vector" or "vectorIncreased" will be saved. Saving can be disabled by not specifying options$save$variable
or by passing ""
to the corresponding entry of options$save$variable
.
By default (if options$save
is missing) "vector"
and "vectorIncreased"
will be saved in the workspace and "matrix"
and "matrixIncreased"
on the file system, i.e.
options$save <- list(workspace = c("vector", "vectorIncreased"), fileSystem = c("matrix", "matrixIncreased"), RDSfile = NULL, variable = NULL).
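For instance, a simulation can also be kept in user-named variables (a minimal sketch; simEnv and the variable names "vecSim" and "matSim" are illustrative, and r is reduced only to keep the simulation fast):

simEnv <- new.env()
critVal(n = 100L, alpha = 0.5, r = 100L,
        options = list(simulation = "vector", envir = simEnv,
                       load = list(),
                       save = list(variable = c("vecSim", "matSim"))))
# the simulation of type "vector" has been assigned to the first variable name
exists("vecSim", envir = simEnv)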
Simulations can be removed from the workspace by removing the variable critValStepRTab
, i.e. by calling remove(critValStepRTab, envir = envir)
, with envir
the used environment, and from the file system by deleting the corresponding subfolder, i.e. by calling
unlink(file.path(R.cache::getCacheRootPath(), dirs), recursive = TRUE),
with dirs
the corresponding subdirectory.
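Put together, a clean-up could look as follows (a sketch assuming the default subdirectory "stepR" and that simulations were stored in the environment storeEnv via options$envir):

storeEnv <- new.env()
# ... calls to critVal() or stepFit() with options = list(envir = storeEnv, ...) ...
if (exists("critValStepRTab", envir = storeEnv))
  remove(critValStepRTab, envir = storeEnv)
# delete the simulations cached on the file system in the default subdirectory
unlink(file.path(R.cache::getCacheRootPath(), "stepR"), recursive = TRUE)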
Frick, K., Munk, A., Sieling, H. (2014) Multiscale change-point inference. With discussion and rejoinder by the authors. Journal of the Royal Statistical Society, Series B 76(3), 495–580.
Pein, F., Sieling, H., Munk, A. (2017) Heterogeneous change point inference. Journal of the Royal Statistical Society, Series B, 79(4), 1207–1227.
monteCarloSimulation
, penalty
, parametricFamily
, intervalSystem
, stepFit
, computeBounds
# vector of critical values
qVector <- critVal(100L, alpha = 0.5)
# global quantile
qValue <- critVal(100L, alpha = 0.5, output = "value")
# vector can be computed from the global quantile
identical(critVal(100L, q = qValue), qVector)

# for a conservative significance level, stronger confidence statements
critVal(100L, alpha = 0.05)
critVal(100L, alpha = 0.05, output = "value")

# higher significance level for larger detection power, but less confidence
critVal(100L, alpha = 0.99)
critVal(100L, alpha = 0.99, output = "value")

# different parametric family, different intervalSystem, a subset of lengths,
# different penalty and given weights
q <- critVal(100L, alpha = 0.05, family = "hsmuce", intervalSystem = "dyaLen",
             lengths = c(2L, 4L, 16L, 32L), penalty = "weights",
             weights = c(0.4, 0.3, 0.2, 0.1))

# vector of critical values can be given by a vector of length n
vec <- 1:100
vec[c(2L, 4L, 16L, 32L)] <- q
attr(vec, "n") <- 128L
identical(critVal(100L, q = vec, family = "hsmuce", intervalSystem = "dyaLen",
                  lengths = c(2L, 4L, 16L, 32L)), q)

# with a given monte-Carlo simulation for nq = 128 observations
stat <- monteCarloSimulation(128)
critVal(n = 100L, alpha = 0.05, stat = stat)

# the above calls saved and (attempted to) load Monte-Carlo simulations and
# simulated them for nq = 128 observations
# in the following call no saving, no loading and simulation for n = 100
# observations is required, progress of the simulation will be reported
critVal(n = 100L, alpha = 0.05, messages = 1000L,
        options = list(simulation = "vector", load = list(), save = list()))

# only type "vector" will be saved and loaded in the workspace
critVal(n = 100L, alpha = 0.05, messages = 1000L,
        options = list(simulation = "vector", load = list(workspace = "vector"),
                       save = list(workspace = "vector")))

# simulation of type "matrix" will be saved in a RDS file
# saving of type "vector" is disabled by passing "",
# different seed is set and number of simulations is reduced to r = 1e3
# to allow faster computation at the price of a less precise result
file <- tempfile(pattern = "file", tmpdir = tempdir(), fileext = ".RDS")
critVal(n = 100L, alpha = 0.05, seed = 1, r = 1e3,
        options = list(simulation = "matrix", load = list(),
                       save = list(RDSfile = c("", file))))
identical(readRDS(file), monteCarloSimulation(100L, seed = 1, r = 1e3))
Create digital filters.
Deprecation warning: This function is mainly used for patchclamp recordings and may be transferred to a specialised package.
dfilter(type = c("bessel", "gauss", "custom"), param = list(pole = 4, cutoff = 1 / 10), len = ceiling(3/param$cutoff)) ## S3 method for class 'dfilter' print(x, ...)
dfilter(type = c("bessel", "gauss", "custom"), param = list(pole = 4, cutoff = 1 / 10), len = ceiling(3/param$cutoff)) ## S3 method for class 'dfilter' print(x, ...)
type |
allows to choose Bessel, Gauss or custom filters |
param |
for a |
len |
filter length (unnecessary for |
x |
the object |
... |
for generic methods only |
Returns a list of class dfilter that contains the elements kern and step, the (digitised) filter kernel and step response, respectively, as well as an element param containing the argument param. For a "bessel" filter the corresponding analogue kernel, step response, power spectrum, and autocorrelation function, as functions of time or frequency, are additionally returned as elements kernfun, stepfun, spectrum, and acfun, respectively.
filter
, convolve
, BesselPolynomial
, Normal
, family
# 6-pole Bessel filter with cut-off frequency 1 / 100, with length 100 (too short!)
dfilter("bessel", list(pole = 6, cutoff = 1 / 100), 100)
# custom filter: running mean of length 3
dfilter("custom", rep(1, 3))
dfilter("custom", rep(1, 3))$kern # normalised!
dfilter("custom", rep(1, 3))$step
# Gaussian filter with bandwidth 3 and length 11 (from -5 to 5)
dfilter("gauss", 3, 11)
Families of distributions supported by package stepR
.
Deprecation warning: This overview is deprecated, but still provided and up to date for some older, deprecated functions; however, it may be removed in a future version. For an overview of the parametric families supported by the new functions see parametricFamily.
Package stepR
supports several families of distributions (mainly exponential) to model the data, some of which require additional (fixed) parameters. In particular, the following families are available:
"gauss"
normal distribution with unknown mean but known, fixed standard deviation given as a single numeric
(will be estimated using sdrobnorm
if omitted); cf. dnorm
.
"gaussvar"
normal distribution with unknown variance but known, fixed mean assumed to be zero; cf. dnorm
.
"poisson"
Poisson distribution with unknown intensity (no additional parameter); cf. dpois
.
"binomial"
binomial distribution with unknown success probability but known, fixed size given as a single integer
; cf. dbinom
.
"gaussKern"
normal distribution with unknown mean and unknown, fixed standard deviation (being estimated using sdrobnorm
), after filtering with a fixed filter which needs to be given as the additional parameter (a dfilter
object); cf. dfilter
.
The family is selected via the family
argument, providing the corresponding string, while the param
argument contains the parameters if any.
Beware that not all families can be chosen for all functions.
Distributions, parametricFamily
, dnorm
, dpois
, dbinom
, dfilter
, sdrobnorm
# illustrating different families fitted to the same binomial data set
size <- 200
n <- 200
# truth
p <- 10^seq(-3, -0.1, length = n)
# data
y <- rbinom(n, size, p)
plot(y)
lines(size * p, col = "red")
# fit 4 jumps, binomial family
jumps <- 4
bfit <- steppath(y, family = "binomial", param = size, max.blocks = jumps)
lines(bfit[[jumps]], col = "orange")
# Gaussian approximation with estimated variance
gfit <- steppath(y, family = "gauss", max.blocks = jumps)
lines(gfit[[jumps]], col = "green3", lty = 2)
# Poisson approximation
pfit <- steppath(y, family = "poisson", max.blocks = jumps)
lines(pfit[[jumps]], col = "blue", lty = 2)
legend("topleft", legend = c("binomial", "gauss", "poisson"),
       lwd = 2, col = c("orange", "green3", "blue"))
Overview about the supported interval systems. More details are given in section 6 of the vignette.
The following interval systems (set of intervals on which tests will be performed) are available. Intervals are given as indices of observations / sample points.
"all"
all intervals. More precisely, the set of all intervals [i, j] with 1 <= i <= j <= n. This system allows all lengths
1:n
.
"dyaLen"
all intervals of dyadic length. More precisely, the set of all intervals [i, j] whose length j - i + 1 is a power of two. This system allows all lengths of dyadic length
2^(0:as.integer(floor(log2(n)) + 1e-6))
.
"dyaPar"
the dyadic partition, i.e. all disjoint intervals of dyadic length. More precisely, the set of intervals [(k - 1) * 2^j + 1, k * 2^j] for non-negative integers j and positive integers k with k * 2^j <= n. This system allows all lengths of dyadic length
2^(0:as.integer(floor(log2(n)) + 1e-6))
.
The interval system is selected via the intervalSystem
argument, providing the corresponding string. By default (NULL
) the default interval system of the specified parametric family will be used; which one this is can be found in parametricFamily
. With the additional argument lengths
it is possible to specify a set of lengths such that only tests on intervals with a length contained in this set will be performed. The set of lengths has to be a subset of all lengths that are allowed by the interval system and the parametric family. By default (NULL
) all lengths allowed by the interval system and the parametric family are used.
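For illustration, the set of dyadic lengths allowed by "dyaLen" and "dyaPar" can be computed directly from the expression quoted above (a small sketch for n = 100):

n <- 100
# all dyadic lengths up to n; the + 1e-6 guards against floating point
# inaccuracies in log2()
2^(0:as.integer(floor(log2(n)) + 1e-6))
# 1 2 4 8 16 32 64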
y <- c(rnorm(50), rnorm(50, 2))

# interval system of all intervals and all lengths
fit <- stepFit(y, alpha = 0.5, intervalSystem = "all", lengths = 1:100,
               jumpint = TRUE, confband = TRUE)
# default for family "gauss" if number of observations is 1000 or less
identical(stepFit(y, alpha = 0.5, jumpint = TRUE, confband = TRUE), fit)

# intervalSystem "dyaLen" and a subset of lengths
!identical(stepFit(y, alpha = 0.5, intervalSystem = "dyaLen", lengths = c(2, 4, 16),
                   jumpint = TRUE, confband = TRUE), fit)

# default for lengths are all possible lengths of the interval system
# and the parametric family
identical(stepFit(y, alpha = 0.5, intervalSystem = "dyaPar",
                  jumpint = TRUE, confband = TRUE),
          stepFit(y, alpha = 0.5, intervalSystem = "dyaPar", lengths = 2^(0:6),
                  jumpint = TRUE, confband = TRUE))

# interval system "dyaPar" is default for parametric family "hsmuce"
# length 1 is not possible for this parametric family
identical(stepFit(y, alpha = 0.5, family = "hsmuce",
                  jumpint = TRUE, confband = TRUE),
          stepFit(y, alpha = 0.5, family = "hsmuce", intervalSystem = "dyaPar",
                  lengths = 2^(1:6), jumpint = TRUE, confband = TRUE))

# interval system "dyaLen" is default for parametric family "mDependentPS"
identical(stepFit(y, alpha = 0.5, family = "mDependentPS", covariances = c(1, 0.5),
                  jumpint = TRUE, confband = TRUE),
          stepFit(y, alpha = 0.5, family = "mDependentPS", covariances = c(1, 0.5),
                  intervalSystem = "dyaLen", lengths = 2^(0:6),
                  jumpint = TRUE, confband = TRUE))
Reconstructs a piecewise constant function to which white noise was added and the sum filtered afterwards.
Deprecation warning: This function is mainly used for patchclamp recordings and may be transferred to a specialised package.
jsmurf(y, x = 1:length(y), x0 = 2 * x[1] - x[2], q, alpha = 0.05, r = 4e3,
       lengths = 2^(floor(log2(length(y))):floor(log2(max(length(param$kern) + 1,
                                                          1 / param$param$cutoff)))),
       param, rm.out = FALSE, jumpint = confband, confband = FALSE)
y |
a numeric vector containing the serial data |
x |
a numeric vector of the same length as |
x0 |
a single numeric giving the last unobserved sample point directly before sampling started |
q |
threshold value, by default chosen automatically |
alpha |
significance level; if set to a value in (0,1), |
r |
number of simulations; if specified along |
lengths |
length of intervals considered; by default up to a sample size of 1000 all lengths, otherwise only dyadic lengths |
param |
a |
rm.out |
a |
jumpint |
|
confband |
|
An object of class stepfit
that contains the fit; if jumpint == TRUE
function jumpint
allows to extract the 1 - alpha
confidence interval for the jumps, if confband == TRUE
function confband
allows to extract the 1 - alpha
confidence band.
Hotz, T., Schütte, O., Sieling, H., Polupanow, T., Diederichsen, U., Steinem, C., and Munk, A. (2013) Idealizing ion channel recordings by a jump segmentation multiresolution filter. IEEE Transactions on NanoBioscience 12(4), 376–386.
stepbound
, bounds
, family, MRC.asymptotic
, sdrobnorm
, stepfit
# simulate filtered ion channel recording with two states
set.seed(9)
# sampling rate 10 kHz
sampling <- 1e4
# tenfold oversampling
over <- 10
# 1 kHz 4-pole Bessel-filter, adjusted for oversampling
cutoff <- 1e3
df.over <- dfilter("bessel", list(pole = 4, cutoff = cutoff / sampling / over))
# two states, leaving state 1 at 10 Hz, state 2 at 20 Hz
rates <- rbind(c(0, 10), c(20, 0))
# simulate 0.5 s, level 0 corresponds to state 1, level 1 to state 2
# noise level is 0.3 after filtering
sim <- contMC(0.5 * sampling, 0:1, rates, sampling = sampling, family = "gaussKern",
              param = list(df = df.over, over = over, sd = 0.3))
plot(sim$data, pch = ".")
lines(sim$discr, col = "red")
# fit using filter corresponding to sample rate
df <- dfilter("bessel", list(pole = 4, cutoff = cutoff / sampling))
fit <- jsmurf(sim$data$y, sim$data$x, param = df, r = 1e2)
lines(fit, col = "blue")
# fitted values take filter into account
lines(sim$data$x, fitted(fit), col = "green3", lty = 2)
Extract and plot confidence intervals and bands from fits given by a stepfit
object.
jumpint(sb, ...)

## S3 method for class 'stepfit'
jumpint(sb, ...)

## S3 method for class 'jumpint'
points(x, pch.left = NA, pch.right = NA, y.left = NA, y.right = NA, xpd = NA, ...)

confband(sb, ...)

## S3 method for class 'stepfit'
confband(sb, ...)

## S3 method for class 'confband'
lines(x, dataspace = TRUE, ...)
sb |
the result of a fit by |
x |
the object |
pch.left , pch.right
|
the plotting character to use for the left/right end of the interval with defaults |
y.left , y.right
|
at which height to plot the interval boundaries with default |
xpd |
see |
dataspace |
|
... |
arguments to be passed to generic methods |
For jumpint
an object of class jumpint
, i.e. a data.frame
whose columns rightEndLeftBound
and rightEndRightBound
specify the left and right end of the confidence interval for the block's right end, resp., given the number of blocks was estimated correctly, and similarly columns rightIndexLeftBound
and rightIndexRightBound
specify the left and right indices of the confidence interval, resp. Function points
plots these intervals on the lower horizontal axis (by default).
For confband
an object of class confband
, i.e. a data.frame
with columns lower
and upper
specifying a confidence band computed at every point x
; this is a simultaneous confidence band assuming the true number of jumps has been determined. Function lines
plots the confidence band.
Observe that jumps may occur immediately before or after an observed x
; this lack of knowledge is reflected in the visual impressions by the lower and upper envelopes jumping vertically early, so that possible jumps between x
s remain within the band, and by the confidence intervals starting immediately after the last x
for which there cannot be a jump, cf. the note in the help for stepblock
.
# simulate Bernoulli data with four blocks
y <- rbinom(200, 1, rep(c(0.1, 0.7, 0.3, 0.9), each = 50))
# fit step function
sb <- stepbound(y, family = "binomial", param = 1, confband = TRUE)
plot(y, pch = "|")
lines(sb)
# confidence intervals for jumps
jumpint(sb)
points(jumpint(sb), col = "blue")
# confidence band
confband(sb)
lines(confband(sb), lty = 2, col = "blue")
Performs Monte-Carlo simulations of the multiscale vector of statistics, (3.9) in the vignette, and of the penalised multiscale statistic, (3.6) in the vignette, when no signal is present, see also section 3.2.3 in the vignette.
monteCarloSimulation(n, r = 1e4L, family = NULL, intervalSystem = NULL,
                     lengths = NULL, penalty = NULL,
                     output = c("vector", "maximum"), seed = n,
                     rand.gen = NULL, messages = NULL, ...)
n |
a positive integer giving the number of observations for which the Monte-Carlo simulation will be performed |
r |
a positive integer giving the number of repetitions |
family |
a string specifying the assumed parametric family, for more details see parametricFamily, currently |
intervalSystem |
a string giving the used interval system, either |
lengths |
an integer vector giving the set of lengths, i.e. only intervals of these lengths will be considered. Only required for |
penalty |
a string specifying how the statistics will be penalised, either |
output |
a string specifying the output, see Value |
seed |
will be passed to |
rand.gen |
by default ( |
messages |
a positive integer or |
... |
further parameters of the parametric family. Depending on the argument |
If output == "vector"
an object of class "MCSimulationVector"
, i.e. a d times r matrix containing r independent samples of the multiscale vector of statistics, with d the number of scales, i.e. the number of possible lengths for the given interval system and given parametric family. If
output == "maximum"
an object of class "MCSimulationMaximum"
, i.e. a vector of length r
containing r
independent samples of the penalised multiscale statistic. For both, additionally, the following attributes
are set:
"keyList": A list specifying for which number of observations n
, which parametric family with which parameters by a SHA-1 hash, which interval system and in case of "MCSimulationMaximum"
, additionally, for which lengths and which penalisation the simulation was performed.
"key": A key used internally for identification when the object will be saved and loaded.
"n": The number of observations n
for which the simulation was performed.
"lengths": The lengths for which the simulation was performed.
"save": A logical
which is TRUE
if the object can be saved which is the case for rand.gen == NULL
and FALSE
otherwise.
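A brief sketch of how these attributes can be inspected (r is reduced here only to keep the simulation fast):

stat <- monteCarloSimulation(n = 100L, r = 100L)
class(stat)            # "MCSimulationVector"
attr(stat, "n")        # number of observations the simulation was performed for
attr(stat, "lengths")  # lengths for which the simulation was performed
attr(stat, "save")     # TRUE, since the default rand.gen was used
str(attr(stat, "keyList"))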
Frick, K., Munk, A., Sieling, H. (2014) Multiscale change-point inference. With discussion and rejoinder by the authors. Journal of the Royal Statistical Society, Series B 76(3), 495–580.
Pein, F., Sieling, H., Munk, A. (2017) Heterogeneous change point inference. Journal of the Royal Statistical Society, Series B, 79(4), 1207–1227.
critVal
, computeStat
, penalty
, parametricFamily
, intervalSystem
# monteCarloSimulation will be called in critVal, can be called explicitly
# object of class MCSimulationVector
stat <- monteCarloSimulation(n = 100L)
identical(critVal(n = 100L, alpha = 0.5, stat = stat),
          critVal(n = 100L, alpha = 0.5,
                  options = list(load = list(), simulation = "matrix")))

# object of class MCSimulationMaximum
stat <- monteCarloSimulation(n = 100L, output = "maximum")
identical(critVal(n = 100L, alpha = 0.5, stat = stat),
          critVal(n = 100L, alpha = 0.5,
                  options = list(load = list(), simulation = "vector")))

# different interval system, lengths and penalty
monteCarloSimulation(n = 100L, output = "maximum", intervalSystem = "dyaLen",
                     lengths = c(1L, 2L, 4L, 8L), penalty = "log")

# with a different number of iterations, different seed,
# reported progress and user written rand.gen function
stat <- monteCarloSimulation(n = 100L, r = 1e3, seed = 1, messages = 100,
                             rand.gen = function(data) {rnorm(100)})

# the optional argument sd of parametric family "gauss" will be replaced by 1
identical(monteCarloSimulation(n = 100L, r = 1e3, sd = 5),
          monteCarloSimulation(n = 100L, r = 1e3, sd = 1))

# simulation for family "hsmuce"
monteCarloSimulation(n = 100L, family = "hsmuce")

# simulation for family "mDependentGauss"
# covariances must be given (can also be given by correlations or filter)
stat <- monteCarloSimulation(n = 100L, family = "mDependentPS",
                             covariances = c(1, 0.5, 0.3))

# variance will be standardized to 1
# output might be on some systems even identical
all.equal(monteCarloSimulation(n = 100L, family = "mDependentPS",
                               covariances = c(2, 1, 0.6)), stat)
Computes multiresolution coefficients, the corresponding criterion, simulates these for Gaussian white or coloured noise, based on which p-values and quantiles are obtained.
Deprecation warning: The function MRC.simul
is deprecated, but still working, however, may be defunct in a future version. Please use instead the function monteCarloSimulation
. An example of how to reproduce results is given below. Some other functions are helper functions and might be removed, too.
MRC(x, lengths = 2^(floor(log2(length(x))):0), norm = sqrt(lengths),
    penalty = c("none", "log", "sqrt"))

MRCoeff(x, lengths = 2^(floor(log2(length(x))):0), norm = sqrt(lengths),
        signed = FALSE)

MRC.simul(n, r, lengths = 2^(floor(log2(n)):0),
          penalty = c("none", "log", "sqrt"))

MRC.pvalue(q, n, r, lengths = 2^(floor(log2(n)):0),
           penalty = c("none", "log", "sqrt"),
           name = ".MRC.table", pos = .MCstepR, inherits = TRUE)

MRC.FFT(epsFFT, testFFT, K = matrix(TRUE, nrow(testFFT), ncol(testFFT)),
        lengths, penalty = c("none", "log", "sqrt"))

MRC.quant(p, n, r, lengths = 2^(floor(log2(n)):0),
          penalty = c("none", "log", "sqrt"),
          name = ".MRC.table", pos = .MCstepR, inherits = TRUE, ...)

kMRC.simul(n, r, kern, lengths = 2^(floor(log2(n)):ceiling(log2(length(kern)))))

kMRC.pvalue(q, n, r, kern,
            lengths = 2^(floor(log2(n)):ceiling(log2(length(kern)))),
            name = ".MRC.ktable", pos = .MCstepR, inherits = TRUE)

kMRC.quant(p, n, r, kern,
           lengths = 2^(floor(log2(n)):ceiling(log2(length(kern)))),
           name = ".MRC.ktable", pos = .MCstepR, inherits = TRUE, ...)
x |
a vector of numerical observations |
lengths |
vector of interval lengths to use, dyadic intervals by default |
signed |
whether signed coefficients should be returned |
q |
quantile |
n |
length of data set |
r |
number of simulations to use |
name , pos , inherits
|
under which name and where precomputed results are stored, or retrieved, see |
K |
a |
epsFFT |
a vector containing the FFT of the data set |
testFFT |
a matrix containing the FFTs of the intervals |
kern |
a filter kernel |
penalty |
penalty term in the multiresolution statistic: |
norm |
how the partial sums should be normalised, by default |
p |
p-value |
... |
further arguments passed to function |
MRC |
a vector giving the maximum as well as the indices of the corresponding interval's start and length |
MRCoeff |
a matrix giving the multiresolution coefficients for all test intervals |
MRC.pvalue , MRC.quant , MRC.simul
|
the corresponding p-value / quantile / vector of simulated values under the assumption of standard Gaussian white noise |
kMRC.pvalue , kMRC.quant , kMRC.simul
|
the corresponding p-value / quantile / vector of simulated values under the assumption of filtered Gaussian white noise |
Davies, P. L., Kovac, A. (2001) Local extremes, runs, strings and multiresolution. The Annals of Statistics 29, 1–65.
Dümbgen, L., Spokoiny, V. (2001) Multiscale testing of qualitative hypotheses. The Annals of Statistics 29, 124–152.
Siegmund, D. O., Venkatraman, E. S. (1995) Using the generalized likelihood ratio statistic for sequential detection of a change-point. The Annals of Statistics 23, 255–271.
Siegmund, D. O., Yakir, B. (2000) Tail probabilities for the null distribution of scanning statistics. Bernoulli 6, 191–213.
monteCarloSimulation
, smuceR
, jsmurf
, stepbound
, stepsel
, quantile
set.seed(100)
all.equal(MRC.simul(100, r = 100),
          sort(monteCarloSimulation(n = 100, r = 100, output = "maximum",
                                    penalty = "none", intervalSystem = "dyaLen")),
          check.attributes = FALSE)

# simulate signal of 100 data points
set.seed(100)
f <- rep(c(0, 2, 0), c(60, 10, 30))
# add gaussian noise
x <- f + rnorm(100)
# compute multiresolution criterion
m <- MRC(x)
# compute Monte-Carlo p-value based on 100 simulations
MRC.pvalue(m["max"], length(x), 100)
# compute multiresolution coefficients
M <- MRCoeff(x)

# plot multiresolution coefficients, colours show p-values below 5% in 1% steps
op <- par(mar = c(5, 4, 2, 4) + 0.1)
image(1:length(x), seq(min(x), max(x), length = ncol(M)),
      apply(M[, ncol(M):1], 1:2, MRC.pvalue, n = length(x), r = 100),
      breaks = (0:5) / 100, col = rgb(1, seq(0, 1, length = 5), 0, 0.75),
      xlab = "location / left end of interval", ylab = "measurement",
      main = "Multiresolution Coefficients",
      sub = paste("MRC p-value =", signif(MRC.pvalue(m["max"], length(x), 100), 3)))
axis(4, min(x) + diff(range(x)) * (pretty(1:ncol(M) - 1)) / dim(M)[2],
     2^pretty(1:ncol(M) - 1))
mtext("interval lengths", 4, 3)
# plot signal and its mean
points(x)
lines(f, lty = 2)
abline(h = mean(x))
par(op)
Simulated values of the MRC statistic with penalty="sqrt"
based on all interval lengths computed from Gaussian white noise sequences of length 1,000.
Deprecation warning: This data set is needed for smuceR
and may be removed when this function will be removed.
MRC.1000
A numeric
vector containing 10,000 sorted values.
# threshold value for 95% confidence
quantile(stepR::MRC.1000, .95)
Simulated values of the MRC statistic with penalty="sqrt"
based on all interval lengths computed from Gaussian white noise sequences of ("almost infinite") length 5,000.
Deprecation warning: This data set is needed for smuceR
and may be removed when this function will be removed.
MRC.asymptotic
A numeric
vector containing 10,000 sorted values.
# "asymptotic" threshold value for 95% confidence quantile(stepR::MRC.asymptotic, .95)
# "asymptotic" threshold value for 95% confidence quantile(stepR::MRC.asymptotic, .95)
Simulated values of the MRC statistic with penalty="sqrt"
based on dyadic interval lengths computed from Gaussian white noise sequences of ("almost infinite") length 100,000.
Deprecation warning: This data set is needed for smuceR
and may be removed when this function will be removed.
MRC.asymptotic.dyadic
A numeric
vector containing 10,000 sorted values.
# "asymptotic" threshold value for 95% confidence quantile(stepR::MRC.asymptotic.dyadic, .95)
# "asymptotic" threshold value for 95% confidence quantile(stepR::MRC.asymptotic.dyadic, .95)
Find integers within some radius of the given ones.
neighbours(k, x = 1:max(k), r = 0)
k |
integers within whose neighbourhood to look |
x |
allowed integers |
r |
radius within which to look |
Returns those integers in x
which are at most r
from some integer in k
, i.e. the intersection of x
with the union of the balls of radius r
centred at the values of k
. The return values are unique and sorted.
is.element
, match
, findInterval
, stepcand
neighbours(c(10, 0, 5), r = 1)
neighbours(c(10, 0, 5), 0:15, r = 1)
Overview about the supported parametric families (models). More details are given in section 5 of the vignette.
The following parametric families (models and fitting methods) are available. Some of them have additional parameters that have to / can be specified in ...
.
"gauss"
independent normally distributed variables with unknown mean but known, constant standard deviation given by the optional argument sd
. Fits are obtained by the method SMUCE (Frick et al., 2014) for independent normal distributed observations. Argument sd
has to be a single, positive, finite numeric
. If omitted it will be estimated by sdrobnorm
. For monteCarloSimulation
sd == 1
will always be used. The observations argument y
has to be a numeric vector with finite entries. The default interval system is "all"
up to 1000 observations and "dyaLen"
for more observations. Possible lengths are 1:length(y)
. The default penalty is "sqrt"
. In monteCarloSimulation
by default n
random observations will be generated by rnorm
.
"hsmuce"
independent normally distributed variables with unknown mean and also unknown piecewise constant standard deviation as a nuisance parameter. Fits are obtained by the method HSMUCE (Pein et al., 2017). No additional argument has to be given. The observations argument y
has to be a numeric vector with finite entries. The default interval system is "dyaPar"
and possible lengths are 2:length(y)
. The default penalty is "weights"
which will automatically be converted to "none"
in computeStat
and monteCarloSimulation
. In monteCarloSimulation
by default n
random observations will be generated by rnorm
.
"mDependentPS"
normally distributed variables with unknown mean and m-dependent errors with known covariance structure given either by the argument covariances
, correlations
or filter
. Fits are obtained by the method SMUCE (Frick et al., 2014) for m-dependent normally distributed observations using partial sum tests and minimizing the least squares distance (Pein et al., 2017, (7) and (8)). If correlations
or filter
is used to specify the covariances an additional optional argument sd
can be used to specify the constant standard deviation. If covariances
is specified the arguments correlations
, filter
and sd
will be ignored and if correlations
is specified the argument filter
will be ignored. The argument covariances
has to be a finite numeric vector, m will be defined by m = length(covariances) - 1
, giving the vector of covariances, i.e. the first element must be positive, the absolute value of every other element must be smaller than or equal to the first one and the last element should not be zero. The argument correlations
has to be a finite numeric vector, m will be defined by m = length(correlations) - 1
, giving the vector of correlations, i.e. the first element must be 1, the absolute value of every other element must be smaller than or equal to the first one and the last element should not be zero. Covariances will be calculated by correlations * sd^2
. The argument filter
has to be an object of class lowpassFilter
from which the correlation vector will be obtained. The argument sd
has to be a single, positive, finite numeric
. If omitted it will be estimated by sdrobnorm
with lag = m + 1
. For monteCarloSimulation
sd == 1
will always be used. The observations argument y
has to be a numeric vector with finite entries. The default interval system is "dyaLen"
and possible lengths are 1:length(y)
. The default penalty is "sqrt"
. In monteCarloSimulation
by default n
random observations will be generated by calculating the coefficients of the corresponding moving average process and generating random observations from it.
The family is selected via the family
argument, providing the corresponding string, while additional parameters have to / can be specified in ...
if any.
Frick, K., Munk, A., Sieling, H. (2014) Multiscale change-point inference. With discussion and rejoinder by the authors. Journal of the Royal Statistical Society, Series B 76(3), 495–580.
Pein, F., Sieling, H., Munk, A. (2017) Heterogeneous change point inference. Journal of the Royal Statistical Society, Series B, 79(4), 1207–1227.
Pein, F., Tecuapetla-Gómez, I., Schütte, O., Steinem, C., Munk, A. (2017) Fully-automatic multiresolution idealization for filtered ion channel recordings: flickering event detection. arXiv:1706.03671.
Distributions, sdrobnorm
, rnorm
# parametric family "gauss": independent gaussian errors with constant variance set.seed(1) x <- seq(1 / 100, 1, 1 / 100) y <- c(rnorm(50), rnorm(50, 2)) plot(x, y, pch = 16, col = "grey30", ylim = c(-3, 5)) # computation of SMUCE and its confidence statements fit <- stepFit(y, x = x, alpha = 0.5, family = "gauss", jumpint = TRUE, confband = TRUE) lines(fit, lwd = 3, col = "red", lty = "22") # confidence intervals for the change-point locations points(jumpint(fit), col = "red") # confidence band lines(confband(fit), lty = "22", col = "darkred", lwd = 2) # "gauss" is default for family identical(stepFit(y, x = x, alpha = 0.5, jumpint = TRUE, confband = TRUE), fit) # missing sd is estimated by sdrobnorm identical(stepFit(y, x = x, alpha = 0.5, family = "gauss", sd = sdrobnorm(y), jumpint = TRUE, confband = TRUE), fit) # parametric family "hsmuce": independent gaussian errors with also # piecewise constant variance # estimaton that is robust against variance changes set.seed(1) y <- c(rnorm(50, 0, 1), rnorm(50, 1, 0.2)) plot(x, y, pch = 16, col = "grey30", ylim = c(-2.5, 2)) # computation of HSMUCE and its confidence statements fit <- stepFit(y, x = x, alpha = 0.5, family = "hsmuce", jumpint = TRUE, confband = TRUE) lines(fit, lwd = 3, col = "red", lty = "22") # confidence intervals for the change-point locations points(jumpint(fit), col = "red") # confidence band lines(confband(fit), lty = "22", col = "darkred", lwd = 2) # for comparison SMUCE lines(stepFit(y, x = x, alpha = 0.5, jumpint = TRUE, confband = TRUE), lwd = 3, col = "blue", lty = "22") # parametric family "mDependentPS": m dependent observations with known covariances # observations are generated from a moving average process set.seed(1) y <- c(rep(0, 50), rep(2, 50)) + as.numeric(arima.sim(n = 100, list(ar = c(), ma = c(0.8, 0.5, 0.3)), sd = 0.5)) correlations <- as.numeric(ARMAacf(ar = c(), ma = c(0.8, 0.5, 0.3), lag.max = 3)) covariances <- 0.5^2 * correlations plot(x, y, pch = 16, col = "grey30", ylim = c(-2, 4)) # computation of SMUCE for dependent observations with given covariances fit <- stepFit(y, x = x, alpha = 0.5, family = "mDependentPS", covariances = covariances, jumpint = TRUE, confband = TRUE) lines(fit, lwd = 3, col = "red", lty = "22") # confidence intervals for the change-point locations points(jumpint(fit), col = "red") # confidence band lines(confband(fit), lty = "22", col = "darkred", lwd = 2) # for comparison SMUCE for independent gaussian errors lines(stepFit(y, x = x, alpha = 0.5, jumpint = TRUE, confband = TRUE), lwd = 3, col = "blue", lty = "22") # covariance structure can also be given by correlations and sd identical(stepFit(y, x = x, alpha = 0.5, family = "mDependentPS", correlations = correlations, sd = 0.5, jumpint = TRUE, confband = TRUE), fit) # if sd is missing it will be estimated by sdrobnorm identical(stepFit(y, x = x, alpha = 0.5,family = "mDependentPS", correlations = correlations, jumpint = TRUE, confband = TRUE), stepFit(y, x = x, alpha = 0.5, family = "mDependentPS", correlations = correlations, sd = sdrobnorm(y, lag = length(correlations)), jumpint = TRUE, confband = TRUE))
# parametric family "gauss": independent gaussian errors with constant variance set.seed(1) x <- seq(1 / 100, 1, 1 / 100) y <- c(rnorm(50), rnorm(50, 2)) plot(x, y, pch = 16, col = "grey30", ylim = c(-3, 5)) # computation of SMUCE and its confidence statements fit <- stepFit(y, x = x, alpha = 0.5, family = "gauss", jumpint = TRUE, confband = TRUE) lines(fit, lwd = 3, col = "red", lty = "22") # confidence intervals for the change-point locations points(jumpint(fit), col = "red") # confidence band lines(confband(fit), lty = "22", col = "darkred", lwd = 2) # "gauss" is default for family identical(stepFit(y, x = x, alpha = 0.5, jumpint = TRUE, confband = TRUE), fit) # missing sd is estimated by sdrobnorm identical(stepFit(y, x = x, alpha = 0.5, family = "gauss", sd = sdrobnorm(y), jumpint = TRUE, confband = TRUE), fit) # parametric family "hsmuce": independent gaussian errors with also # piecewise constant variance # estimaton that is robust against variance changes set.seed(1) y <- c(rnorm(50, 0, 1), rnorm(50, 1, 0.2)) plot(x, y, pch = 16, col = "grey30", ylim = c(-2.5, 2)) # computation of HSMUCE and its confidence statements fit <- stepFit(y, x = x, alpha = 0.5, family = "hsmuce", jumpint = TRUE, confband = TRUE) lines(fit, lwd = 3, col = "red", lty = "22") # confidence intervals for the change-point locations points(jumpint(fit), col = "red") # confidence band lines(confband(fit), lty = "22", col = "darkred", lwd = 2) # for comparison SMUCE lines(stepFit(y, x = x, alpha = 0.5, jumpint = TRUE, confband = TRUE), lwd = 3, col = "blue", lty = "22") # parametric family "mDependentPS": m dependent observations with known covariances # observations are generated from a moving average process set.seed(1) y <- c(rep(0, 50), rep(2, 50)) + as.numeric(arima.sim(n = 100, list(ar = c(), ma = c(0.8, 0.5, 0.3)), sd = 0.5)) correlations <- as.numeric(ARMAacf(ar = c(), ma = c(0.8, 0.5, 0.3), lag.max = 3)) covariances <- 0.5^2 * correlations plot(x, y, pch = 16, col = "grey30", ylim = c(-2, 4)) # computation of SMUCE for dependent observations with given covariances fit <- stepFit(y, x = x, alpha = 0.5, family = "mDependentPS", covariances = covariances, jumpint = TRUE, confband = TRUE) lines(fit, lwd = 3, col = "red", lty = "22") # confidence intervals for the change-point locations points(jumpint(fit), col = "red") # confidence band lines(confband(fit), lty = "22", col = "darkred", lwd = 2) # for comparison SMUCE for independent gaussian errors lines(stepFit(y, x = x, alpha = 0.5, jumpint = TRUE, confband = TRUE), lwd = 3, col = "blue", lty = "22") # covariance structure can also be given by correlations and sd identical(stepFit(y, x = x, alpha = 0.5, family = "mDependentPS", correlations = correlations, sd = 0.5, jumpint = TRUE, confband = TRUE), fit) # if sd is missing it will be estimated by sdrobnorm identical(stepFit(y, x = x, alpha = 0.5,family = "mDependentPS", correlations = correlations, jumpint = TRUE, confband = TRUE), stepFit(y, x = x, alpha = 0.5, family = "mDependentPS", correlations = correlations, sd = sdrobnorm(y, lag = length(correlations)), jumpint = TRUE, confband = TRUE))
Overview about the supported penalties. More details are also given in section 3.2 of the vignette.
The penalties (ways to balance different scales) can be divided into two groups: scale penalisation and balancing by weights. More precisely, the scale penalisations "sqrt"
, "log"
and "none"
and balancing by weights called "weights"
are available.
Let T
be the unpenalised test statistic of the specified parametric family on an interval of length l
and nq
the number of observations used for the penalisation, typically the number of observations n
but can also be chosen larger.
"sqrt"
penalised statistic is sqrt(2 * T) - sqrt(2 * log(exp(1) * nq / l))
. This penalisation is proposed in (Frick et al., 2014) and guarantees for most parametric families that the penalised multiscale statistic is asymptotically finite. This is not true for parametric family "hsmuce"
. Hence, this penalisation is recommended and the default one for the parametric families "gauss"
and "mDependentPS"
, but not for "hsmuce"
.
"log"
penalised statistic is T - log(exp(1) * nq / l)
. This penalisation is outdated and only still supported for comparisons.
"none"
no penalisation, the penalised statistic is equal to the unpenalised one. Multiscale regression without a penalisation is not recommended.
"weights"
critical values will be computed by weights, see section 3.2.2 in the vignette and (Pein et al., 2017, section 2) for more details. This penalty is recommended and the default one for the parametric family "hsmuce"
, but can also be used for other families. Will be replaced by "none"
in computeStat
and monteCarloSimulation
.
The penalisation is selected via the penalty
argument providing the corresponding string. If NULL
the default penalty of the specified parametric family will be used, see parametricFamily
for which one this will be.
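As a small numerical sketch of the scale penalisations above (the values of Tstat, l and nq are arbitrary and only for illustration):

Tstat <- 3   # unpenalised statistic on an interval of length l
l <- 16
nq <- 100
sqrt(2 * Tstat) - sqrt(2 * log(exp(1) * nq / l))  # penalty "sqrt"
Tstat - log(exp(1) * nq / l)                      # penalty "log"
Tstat                                             # penalty "none"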
Frick, K., Munk, A., Sieling, H. (2014) Multiscale change-point inference. With discussion and rejoinder by the authors. Journal of the Royal Statistical Society, Series B 76(3), 495–580.
Pein, F., Sieling, H., Munk, A. (2017) Heterogeneous change point inference. Journal of the Royal Statistical Society, Series B, 79(4), 1207–1227.
set.seed(1)
y <- c(rnorm(50), rnorm(50, 2))

# penalty "sqrt"
fit <- stepFit(y, alpha = 0.5, penalty = "sqrt", jumpint = TRUE, confband = TRUE)
# default for family "gauss"
identical(stepFit(y, alpha = 0.5, jumpint = TRUE, confband = TRUE), fit)

# penalty "weights"
!identical(stepFit(y, alpha = 0.5, penalty = "weights",
                   jumpint = TRUE, confband = TRUE), fit)

# penalty "weights" is default for parametric family "hsmuce"
# by default equal weights are chosen
identical(stepFit(y, alpha = 0.5, family = "hsmuce",
                  jumpint = TRUE, confband = TRUE),
          stepFit(y, alpha = 0.5, family = "hsmuce", penalty = "weights",
                  weights = rep(1 / 6, 6), jumpint = TRUE, confband = TRUE))

# different weights
!identical(stepFit(y, alpha = 0.5, family = "hsmuce", weights = 6:1 / sum(6:1),
                   jumpint = TRUE, confband = TRUE),
           stepFit(y, alpha = 0.5, family = "hsmuce", penalty = "weights",
                   weights = rep(1 / 6, 6), jumpint = TRUE, confband = TRUE))

# penalty "sqrt" is default for parametric family "mDependentPS"
identical(stepFit(y, alpha = 0.5, family = "mDependentPS", covariances = c(1, 0.5),
                  jumpint = TRUE, confband = TRUE),
          stepFit(y, alpha = 0.5, family = "mDependentPS", covariances = c(1, 0.5),
                  penalty = "sqrt", jumpint = TRUE, confband = TRUE))
Robust estimation of the standard deviation of Gaussian data.
sdrobnorm(x, p = c(0.25, 0.75), lag = 1, supressWarningNA = FALSE, supressWarningResultNA = FALSE)
x |
a vector of numerical observations. |
p |
vector of two distinct probabilities |
lag |
a single integer giving the lag of the difference used, see |
supressWarningNA |
a single logical, if |
supressWarningResultNA |
a single logical, if |
Compares the difference between the estimated sample quantiles corresponding to p after taking (lagged) differences with the corresponding difference of theoretical quantiles of Gaussian white noise to determine the standard deviation under a Gaussian assumption. If the data contain (few) jumps, this will (on average) be a slight overestimate of the true standard deviation.
This estimator has been inspired by (1.7) in (Davies and Kovac, 2001).
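The following sketch illustrates the underlying quantile comparison (an illustration of the idea, not necessarily the exact implementation):

set.seed(1)
x <- rnorm(100, mean = rep(c(0, 5), each = 50), sd = 2)
p <- c(0.25, 0.75)
# differences of independent N(0, sd^2) observations have variance 2 * sd^2,
# so the quantile range of the lagged differences is rescaled accordingly
diff(quantile(diff(x, lag = 1), p)) / (sqrt(2) * diff(qnorm(p)))
# compare with the estimate returned by sdrobnorm
sdrobnorm(x)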
Returns the estimate of the sample's standard deviation, i.e. a single non-negative numeric, or NA
if length(x) < lag + 2
.
Davies, P. L., Kovac, A. (2001) Local extremes, runs, strings and multiresolution. The Annals of Statistics 29, 1–65.
sd
, diff
, parametricFamily, family
# simulate data sample
y <- rnorm(100, c(rep(1, 50), rep(10, 50)), 2)
# estimate standard deviation
sdrobnorm(y)
Computes the SMUCE estimator for one-dimensional data.
Deprecation warning: This function is deprecated, but still working, however, may be defunct in a future version. Please use instead the function stepFit
. At the moment some families are supported by this function that are not supported by the current version of stepFit
. They will be added in a future version. An example how to reproduce results is given below.
smuceR(y, x = 1:length(y), x0 = 2 * x[1] - x[2], q = thresh.smuceR(length(y)),
       alpha, r, lengths, family = c("gauss", "gaussvar", "poisson", "binomial"),
       param, jumpint = confband, confband = FALSE)

thresh.smuceR(v)
y |
a numeric vector containing the serial data |
x |
a numeric vector of the same length as |
x0 |
a single numeric giving the last unobserved sample point directly before sampling started |
q |
threshold value, by default chosen automatically according to Frick et al. (2013) |
alpha |
significance level; if set to a value in (0,1), |
r |
number of simulations; if specified along |
lengths |
length of intervals considered; by default up to a sample size of 1000 all lengths, otherwise only dyadic lengths |
family , param
|
specifies distribution of data, see family |
jumpint |
|
confband |
|
v |
number of data points |
For smuceR, an object of class stepfit that contains the fit; if jumpint == TRUE the function jumpint allows one to extract the 1 - alpha confidence intervals for the jumps, if confband == TRUE the function confband allows one to extract the 1 - alpha confidence band.
For thresh.smuceR
, a precomputed threshold value, see reference.
Frick, K., Munk, A., and Sieling, H. (2014) Multiscale change-point inference. With discussion and rejoinder by the authors. Journal of the Royal Statistical Society, Series B 76(3), 495–580.
Futschik, A., Hotz, T., Munk, A., Sieling, H. (2014) Multiresolution DNA partitioning: statistical evidence for segments. Bioinformatics, 30(16), 2255–2262.
stepFit
, stepbound
, bounds
, family, MRC.asymptotic
, sdrobnorm
, stepfit
y <- rnorm(100, c(rep(0, 50), rep(1, 50)), 0.5)

# fitted function, confidence intervals, and confidence band by stepFit
all.equal(fitted(smuceR(y, q = 1)), fitted(stepFit(y, q = 1)))
all.equal(fitted(smuceR(y, alpha = 0.5)),
          fitted(stepFit(y, q = as.numeric(quantile(stepR::MRC.1000, 0.5)))))
all.equal(fitted(smuceR(y)), fitted(stepFit(y, q = thresh.smuceR(length(y)))))
all.equal(jumpint(smuceR(y, q = 1, jumpint = TRUE)),
          jumpint(stepFit(y, q = 1, jumpint = TRUE)))
all.equal(confband(smuceR(y, q = 1, confband = TRUE)),
          confband(stepFit(y, q = 1, confband = TRUE)),
          check.attributes = FALSE)

# simulate poisson data with two levels
y <- rpois(100, c(rep(1, 50), rep(4, 50)))
# compute fit, q is chosen automatically
fit <- smuceR(y, family = "poisson", confband = TRUE)
# plot result
plot(y)
lines(fit)
# plot confidence intervals for jumps on axis
points(jumpint(fit), col = "blue")
# confidence band
lines(confband(fit), lty = 2, col = "blue")

# simulate binomial data with two levels
y <- rbinom(200, 3, rep(c(0.1, 0.7), c(110, 90)))
# compute fit, q is the 0.9-quantile of the (asymptotic) null distribution
fit <- smuceR(y, alpha = 0.1, family = "binomial", param = 3, confband = TRUE)
# plot result
plot(y)
lines(fit)
# plot confidence intervals for jumps on axis
points(jumpint(fit), col = "blue")
# confidence band
lines(confband(fit), lty = 2, col = "blue")
Constructs an object containing a step function sampled over finitely many values.
stepblock(value, leftEnd = c(1, rightEnd[-length(rightEnd)] + 1), rightEnd, x0 = 0)

## S3 method for class 'stepblock'
x[i, j, drop = if(missing(i)) TRUE else if(missing(j)) FALSE else length(j) == 1, ...]

## S3 method for class 'stepblock'
print(x, ...)

## S3 method for class 'stepblock'
plot(x, type = "c", xlab = "x", ylab = "y", main = "Step function", sub = NULL, ...)

## S3 method for class 'stepblock'
lines(x, type = "c", ...)
value |
a numeric vector containing the fitted values for each block; its length gives the number of blocks |
leftEnd |
a numeric vector of the same length as |
rightEnd |
a numeric vector of the same length as |
x0 |
a single numeric giving the last unobserved sample point directly before sampling started, i.e. before |
x |
the object |
i , j , drop
|
see |
type |
|
xlab , ylab , main , sub
|
see |
... |
for generic methods only |
For stepblock an object of class stepblock, i.e. a data.frame with columns value, leftEnd and rightEnd and attribute x0.
For the purposes of this package, step functions are taken to be left-continuous, i.e. the function jumps after the rightEnd of a block.
However, step functions are usually sampled at a discrete set of points, so the exact position of a jump is unknown except that it has to occur before the next sampling point. This is expressed in the implementation by additionally specifying a leftEnd within each block such that every rightEnd and leftEnd is a sampling point (or the boundary of the observation window), there is no sampling point between one block's rightEnd and the following block's leftEnd, and the step function is constant at least on the closed interval with boundaries leftEnd and rightEnd.
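The following sketch illustrates this convention; the helper evalStepblock is hypothetical and not part of the package. It evaluates a stepblock at sampling points by assigning each point to the block whose rightEnd is the first one greater than or equal to it:

evalStepblock <- function(sb, x) {
  # block index: first rightEnd >= x, since each block covers (previous rightEnd, rightEnd]
  sb$value[findInterval(x, sb$rightEnd, left.open = TRUE) + 1]
}
f <- stepblock(value = c(1, 2, 0), rightEnd = c(3, 6, 8))
evalStepblock(f, 1:8)  # 1 1 1 2 2 2 0 0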
step
, stepfit
, family, [.data.frame
, plot
, lines
# step function consisting of 3 blocks: 1 on (0, 3]; 2 on (3, 6], 0 on (6, 8]
# sampled on the integers 1:10
f <- stepblock(value = c(1, 2, 0), rightEnd = c(3, 6, 8))
f

# show different plot types
plot(f, type = "C")
lines(f, type = "E", lty = 2, col = "red")
lines(f, type = "B", lty = 3, col = "blue")
legend("bottomleft", legend = c("C", "E", "B"), lty = 1:3, col = c("black", "red", "blue"))
Computes piecewise constant maximum likelihood estimators with a minimal number of jumps under given restrictions on subintervals.
Deprecation warning: This function is a help function for smuceR
and jsmurf
and may be removed when these functions are removed.
stepbound(y, bounds, ...)

## Default S3 method:
stepbound(y, bounds, x = 1:length(y), x0 = 2 * x[1] - x[2], max.cand = NULL,
          family = c("gauss", "gaussvar", "poisson", "binomial", "gaussKern"),
          param = NULL, weights = rep(1, length(y)), refit = y,
          jumpint = confband, confband = FALSE, ...)

## S3 method for class 'stepcand'
stepbound(y, bounds, refit = TRUE, ...)
y |
a vector of numerical observations |
bounds |
bounds on the value allowed on intervals; typically computed with |
x |
a numeric vector of the same length as |
x0 |
a single numeric giving the last unobserved sample point directly before sampling started |
max.cand , weights
|
see |
family , param
|
specifies distribution of data, see family |
refit |
|
jumpint |
|
confband |
|
... |
arguments to be passed to generic methods |
An object of class stepfit that contains the fit; if jumpint == TRUE the function jumpint allows one to extract the confidence intervals for the jumps, if confband == TRUE the function confband allows one to extract the confidence band.
Frick, K., Munk, A., and Sieling, H. (2014) Multiscale change-point inference. With discussion and rejoinder by the authors. Journal of the Royal Statistical Society, Series B 76(3), 495–580.
Hotz, T., Schütte, O., Sieling, H., Polupanow, T., Diederichsen, U., Steinem, C., and Munk, A. (2013) Idealizing ion channel recordings by a jump segmentation multiresolution filter. IEEE Transactions on NanoBioscience 12(4), 376–386.
bounds
, smuceR
, jsmurf
, stepsel
, stepfit
, jumpint
, confband
# simulate poisson data with two levels
y <- rpois(100, c(rep(1, 50), rep(4, 50)))
# compute bounds
b <- bounds(y, penalty = "len", family = "poisson", q = 4)
# fit step function to bounds
sb <- stepbound(y, b, family = "poisson", confband = TRUE)
plot(y)
lines(sb)
# plot confidence intervals for jumps on axis
points(jumpint(sb), col = "blue")
# confidence band
lines(confband(sb), lty = 2, col = "blue")
Find candidates for jumps in serial data by forward selection.
stepcand(y, x = 1:length(y), x0 = 2 * x[1] - x[2], max.cand = NULL,
         family = c("gauss", "gaussvar", "poisson", "binomial", "gaussKern"),
         param = NULL, weights = rep(1, length(y)), cand.radius = 0)
y |
a numeric vector containing the serial data |
x |
a numeric vector of the same length as |
x0 |
a single numeric giving the last unobserved sample point directly before sampling started |
max.cand |
single integer giving the maximal number of blocks to find; defaults to using all data (note: there will be one block more than the number of jumps |
family |
distribution of the errors, either |
param |
additional parameters specifying the distribution of the errors; the number of trials for family |
weights |
a numeric vector of the same length as |
cand.radius |
a non-negative integer: adds for each candidate found all indices that are at most |
An object of class stepcand
extending class stepfit
such that it can be used as an input to steppath.stepcand
: additionally contains columns
cumSum |
The cumulative sum of |
cumSumSq |
The cumulative sum of squares of |
cumSumWe |
The cumulative sum of weights up to |
improve |
The improvement this jump brought about when it was selected. |
# simulate 5 blocks (4 jumps) within a total of 100 data points
b <- c(sort(sample(1:99, 4)), 100)
f <- rep(rnorm(5, 0, 4), c(b[1], diff(b)))
rbind(b = b, f = unique(f), lambda = exp(unique(f) / 10) * 20)
# add gaussian noise
x <- f + rnorm(100)
# find 10 candidate jumps
stepcand(x, max.cand = 10)

# for poisson observations
y <- rpois(100, exp(f / 10) * 20)
# find 10 candidate jumps
stepcand(y, max.cand = 10, family = "poisson")

# for binomial observations
size <- 10
z <- rbinom(100, size, pnorm(f / 10))
# find 10 candidate jumps
stepcand(z, max.cand = 10, family = "binomial", param = size)
Constructs an object containing a step function fitted to some data.
stepfit(cost, family, value, param = NULL, leftEnd, rightEnd, x0,
        leftIndex = leftEnd, rightIndex = rightEnd)

## S3 method for class 'stepfit'
x[i, j, drop = if(missing(i)) TRUE else if(missing(j)) FALSE else length(j) == 1,
  refit = FALSE]

## S3 method for class 'stepfit'
print(x, ...)

## S3 method for class 'stepfit'
plot(x, dataspace = TRUE, ...)

## S3 method for class 'stepfit'
lines(x, dataspace = TRUE, ...)

## S3 method for class 'stepfit'
fitted(object, ...)

## S3 method for class 'stepfit'
residuals(object, y, ...)

## S3 method for class 'stepfit'
logLik(object, df = NULL, nobs = object$rightIndex[nrow(object)], ...)
cost |
the value of the cost-functional used for the fit: RSS for family |
family |
distribution of the errors, either |
value |
a numeric vector containing the fitted values for each block; its length gives the number of blocks |
param |
additional parameters specifying the distribution of the errors, the number of trials for family |
leftEnd |
a numeric vector of the same length as |
rightEnd |
a numeric vector of the same length as |
x0 |
a single numeric giving the last unobserved sample point directly before sampling started, i.e. before |
leftIndex |
a numeric vector of the same length as |
rightIndex |
a numeric vector of the same length as |
x , object
|
the object |
y |
a numeric vector containing the data with which to compare the fit |
df |
the number of estimated parameters: by default the number of blocks for families |
nobs |
the number of observations used for estimating |
... |
for generic methods only |
i , j , drop
|
see |
refit |
|
dataspace |
|
stepfit |
an object of class |
[.stepfit |
an object of class |
fitted.stepfit |
a numeric vector of length |
residuals.stepfit |
a numeric vector of length |
logLik.stepfit |
an object of class |
plot.stepfit , lines.stepfit
|
the corresponding functions for |
stepblock
, stepbound
, steppath
, stepsel
, family, "[.data.frame"
, fitted
, residuals
, logLik
, AIC
# simulate 5 blocks (4 jumps) within a total of 100 data points
b <- c(sort(sample(1:99, 4)), 100)
p <- rep(runif(5), c(b[1], diff(b))) # success probabilities
# binomial observations, each with 10 trials
y <- rbinom(100, 10, p)
# find solution with 5 blocks
fit <- steppath(y, family = "binomial", param = 10)[[5]]
plot(y, ylim = c(0, 10))
lines(fit, col = "red")

# residual diagnostics for Gaussian data
yg <- rnorm(100, qnorm(p), 1)
fitg <- steppath(yg)[[5]]
plot(yg, ylim = c(0, 10))
lines(fitg, col = "red")
plot(resid(fitg, yg))
qqnorm(resid(fitg, yg))
Computes the multiscale regression estimator, see (3.1) in the vignette, and allows for confidence statements, see section 3 in the vignette. It implements the estimators SMUCE and HSMUCE as well as their confidence intervals and bands.
If q == NULL
a Monte-Carlo simulation is required for computing critical values. Since a Monte-Carlo simulation lasts potentially much longer (up to several hours or days if the number of observations is in the millions) than the main calculations, this package saves them by default in the workspace and on the file system such that a second call requiring the same Monte-Carlo simulation will be much faster. For more details, in particular to which arguments the Monte-Carlo simulations are specific, see Section Storing of Monte-Carlo simulations below. Progress of a Monte-Carlo simulation can be reported by the argument messages
and the saving can be controlled by the argument option
, both can be specified in ...
and are explained in monteCarloSimulation
and critVal
, respectively.
stepFit(y, q = NULL, alpha = NULL, x = 1:length(y), x0 = 2 * x[1] - x[2],
        family = NULL, intervalSystem = NULL, lengths = NULL,
        confband = FALSE, jumpint = confband, ...)
y |
a numeric vector containing the observations |
q |
either |
alpha |
a probability, i.e. a single numeric between 0 and 1, giving the significance level. Its choice is a trade-off between data fit and parsimony of the estimator. In other words, this argument balances the risks of missing change-points and detecting additional artefacts. For more details on this choice see (Frick et al., 2014, section 4) and (Pein et al., 2017, section 3.4). Either |
x |
a numeric vector of the same length as |
x0 |
a single numeric giving the last unobserved sample point directly before sampling started |
family |
a string specifying the assumed parametric family, for more details see parametricFamily, currently |
intervalSystem |
a string giving the used interval system, either |
lengths |
an integer vector giving the set of lengths, i.e. only intervals of these lengths will be considered. Note that not all lengths are possible for all interval systems and for all parametric families, see intervalSystem and parametricFamily, respectively, to see which ones are allowed. By default ( |
confband |
single |
jumpint |
single |
... |
there are two groups of further arguments:
|
An object of class stepfit that contains the fit. If jumpint == TRUE the function jumpint allows one to extract the 1 - alpha confidence intervals for the jumps. If confband == TRUE the function confband allows one to extract the 1 - alpha confidence band.
If q == NULL
a Monte-Carlo simulation is required for computing critical values. Since a Monte-Carlo simulation lasts potentially much longer (up to several hours or days if the number of observations is in the millions) than the main calculations, this package offers multiple possibilities for saving and loading the simulations. Progress of a simulation can be reported by the argument messages
which can be specified in ...
and is explained in the documentation of monteCarloSimulation
. Each Monte-Carlo simulation is specific to the number of observations, the parametric family (including certain parameters, see parametricFamily) and the interval system, and for simulations of class "MCSimulationMaximum"
, additionally, to the set of lengths and the used penalty. Monte-Carlo simulations can also be performed for a (slightly) larger number of observations given in the argument
nq
in ...
and explained in the documentation of critVal
, which avoids extensive resimulations when the number of observations varies only slightly. Simulations can either be saved in the workspace in the variable critValStepRTab
or persistently on the file system for which the package R.cache
is used. Moreover, storing in and loading from variables and RDS files is supported. Finally, a pre-simulated collection of simulations can be accessed by installing the package stepRdata
available from http://www.stochastik.math.uni-goettingen.de/stepRdata_1.0-0.tar.gz. The simulation, saving and loading can be controlled by the argument option
which can be specified in ...
and is explained in the documentation of critVal
. By default simulations will be saved in the workspace and on the file system. For more details and for how simulations can be removed see Section Simulating, saving and loading of Monte-Carlo simulations in critVal
.
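As an illustration of the persistent storage options, the following hedged sketch presimulates the statistic once, stores it in an RDS file and reuses it in a later session; the file name mcsim128.rds is a placeholder and the observations y are assumed to be as in the Examples below.

# presimulate once (nq = 128 observations, as in the Examples below) and store it
stat <- monteCarloSimulation(n = 128L)
saveRDS(stat, file = "mcsim128.rds")   # illustrative file name
# in a later R session: reload and compute the critical value without resimulating
stat <- readRDS("mcsim128.rds")
q <- critVal(100L, alpha = 0.5, stat = stat)
fit <- stepFit(y, q = q, jumpint = TRUE, confband = TRUE)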
Frick, K., Munk, A., Sieling, H. (2014) Multiscale change-point inference. With discussion and rejoinder by the authors. Journal of the Royal Statistical Society, Series B 76(3), 495–580.
Pein, F., Sieling, H., Munk, A. (2017) Heterogeneous change point inference. Journal of the Royal Statistical Society, Series B, 79(4), 1207–1227.
critVal
, penalty
, parametricFamily
, intervalSystem
, monteCarloSimulation
# generate random observations
y <- c(rnorm(50), rnorm(50, 1))
x <- seq(0.01, 1, 0.01)
plot(x, y, pch = 16, col = "grey30", ylim = c(-3, 4))

# computation of SMUCE and its confidence statements
fit <- stepFit(y, x = x, alpha = 0.5, jumpint = TRUE, confband = TRUE)
lines(fit, lwd = 3, col = "red", lty = "22")
# confidence intervals for the change-point locations
points(jumpint(fit), col = "red")
# confidence band
lines(confband(fit), lty = "22", col = "darkred", lwd = 2)

# higher significance level for larger detection power, but less confidence
stepFit(y, x = x, alpha = 0.99, jumpint = TRUE, confband = TRUE)
# smaller significance level for the small risk that the number of
# change-points is overestimated with probability not more than 5%,
# but smaller detection power
stepFit(y, x = x, alpha = 0.05, jumpint = TRUE, confband = TRUE)

# different interval system, lengths, penalty and given parameter sd
stepFit(y, x = x, alpha = 0.5, intervalSystem = "dyaLen",
        lengths = c(1L, 2L, 4L, 8L), penalty = "weights",
        weights = c(0.4, 0.3, 0.2, 0.1), sd = 0.5,
        jumpint = TRUE, confband = TRUE)

# with given q
identical(stepFit(y, x = x, q = critVal(100L, alpha = 0.5),
                  jumpint = TRUE, confband = TRUE), fit)
identical(stepFit(y, x = x, q = critVal(100L, alpha = 0.5, output = "value"),
                  jumpint = TRUE, confband = TRUE), fit)

# the above calls saved and (attempted to) load Monte-Carlo simulations and
# simulated them for nq = 128 observations
# in the following call no saving, no loading and simulation for n = 100
# observations is required, progress of the simulation will be reported
stepFit(y, x = x, alpha = 0.5, jumpint = TRUE, confband = TRUE,
        messages = 1000L,
        options = list(simulation = "vector", load = list(), save = list()))

# with given stat to compute q
stat <- monteCarloSimulation(n = 128L)
identical(stepFit(y, x = x, alpha = 0.5, stat = stat,
                  jumpint = TRUE, confband = TRUE),
          stepFit(y, x = x, alpha = 0.5, jumpint = TRUE, confband = TRUE,
                  options = list(load = list())))
Find optimal fits with step-functions having jumps at given candidate positions for all possible subset sizes.
steppath(y, ..., max.blocks)

## Default S3 method:
steppath(y, x = 1:length(y), x0 = 2 * x[1] - x[2], max.cand = NULL,
         family = c("gauss", "gaussvar", "poisson", "binomial", "gaussKern"),
         param = NULL, weights = rep(1, length(y)), cand.radius = 0,
         ..., max.blocks = max.cand)

## S3 method for class 'stepcand'
steppath(y, ..., max.blocks = sum(!is.na(y$number)))

## S3 method for class 'steppath'
x[[i]]

## S3 method for class 'steppath'
length(x)

## S3 method for class 'steppath'
print(x, ...)

## S3 method for class 'steppath'
logLik(object, df = NULL, nobs = object$cand$rightIndex[nrow(object$cand)], ...)
for steppath
:
y |
either an object of class |
x , x0 , max.cand , family , param , weights , cand.radius
|
for |
max.blocks |
single integer giving the maximal number of blocks to find; defaults to number of candidates (note: there will be one block more than the number of jumps |
... |
for generic methods only |
for methods on a steppath
object x
or object
:
object |
the object |
i |
if this is an integer returns the fit with |
df |
the number of estimated parameters: by default the number of blocks for families |
nobs |
the number of observations used for estimating |
For steppath
an object of class steppath
, i.e. a list
with components
path |
A list of length |
cost |
A numeric vector of length |
cand |
An object of class |
[[.steppath
returns the fit with i
blocks as an object of class stepfit
; length.steppath
the maximum number of blocks for which a fit has been computed. logLik.steppath
returns an object of class logLik
giving the likelihood of the data given the fits corresponding to cost
, e.g. for use with AIC.
Friedrich, F., Kempe, A., Liebscher, V., Winkler, G. (2008) Complexity penalized M-estimation: fast computation. Journal of Computational and Graphical Statistics 17(1), 201–224.
stepcand
, stepfit
, family, logLik
, AIC
# simulate 5 blocks (4 jumps) within a total of 100 data points
b <- c(sort(sample(1:99, 4)), 100)
f <- rep(rnorm(5, 0, 4), c(b[1], diff(b)))
# add Gaussian noise
x <- f + rnorm(100)
# find 10 candidate jumps
cand <- stepcand(x, max.cand = 10)
cand
# compute solution path
path <- steppath(cand)
path
plot(x)
lines(path[[5]], col = "red")
# compare result having 5 blocks with truth
fit <- path[[5]]
fit
logLik(fit)
AIC(logLik(fit))
cbind(fit, trueRightEnd = b, trueLevel = unique(f))

# for poisson observations
y <- rpois(100, exp(f / 10) * 20)
# compute solution path, compare result having 5 blocks with truth
cbind(steppath(y, max.cand = 10, family = "poisson")[[5]],
      trueRightEnd = b, trueIntensity = exp(unique(f) / 10) * 20)

# for binomial observations
size <- 10
z <- rbinom(100, size, pnorm(f / 10))
# compute solution path, compare result having 5 blocks with truth
cbind(steppath(z, max.cand = 10, family = "binomial", param = size)[[5]],
      trueRightEnd = b, trueIntensity = pnorm(unique(f) / 10))

# an example where stepcand is not optimal but indices found are close to optimal ones
blocks <- c(rep(0, 9), 1, 3, rep(1, 9))
blocks
stepcand(blocks, max.cand = 3)[, c("rightEnd", "value", "number")] # erroneously puts the "1" into the right block in the first step
steppath(blocks)[[3]][, c("rightEnd", "value")] # putting the "1" in the middle block is optimal
steppath(blocks, max.cand = 3, cand.radius = 1)[[3]][, c("rightEnd", "value")] # also looking in the 1-neighbourhood remedies the problem
Select the number of jumps.
stepsel(path, y, type = c("MRC", "AIC", "BIC"), ...)

stepsel.MRC(path, y, q, alpha = 0.05, r = ceiling(50 / min(alpha, 1 - alpha)),
            lengths = if(attr(path$cand, "family") == "gaussKern")
              2^(floor(log2(length(y))):ceiling(log2(length(attr(path$cand, "param")$kern))))
            else 2^(floor(log2(length(y))):0),
            penalty = c("none", "log", "sqrt"),
            name = if(attr(path$cand, "family") == "gaussKern") ".MRC.ktable" else ".MRC.table",
            pos = .MCstepR)

stepsel.AIC(path, ...)

stepsel.BIC(path, ...)
path |
an object of class |
y |
for |
type |
how to select, dispatches specific method |
... |
further argument passed to specific method |
q , alpha , r , lengths , penalty , name , pos
|
see |
A single integer giving the number of blocks selected, with attribute crit containing the values of the criterion (MRC / AIC / BIC) for each fit in the path.
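For illustration, assuming a solution path path and data y as constructed in the Examples below, the stored criterion values can be inspected as follows; this is only a sketch, not part of the function's output:

sel.BIC <- stepsel(path, type = "BIC")
# criterion values for 1, ..., length(path) blocks are stored in the attribute "crit"
crit <- attr(sel.BIC, "crit")
plot(crit, type = "b", xlab = "number of blocks", ylab = "criterion (BIC)")
abline(v = sel.BIC, lty = 2)  # selected number of blocks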
To obtain the threshold described in Boysen et al. (2009, Theorem 5), set q=(1+delta) * sdrobnorm(y) * sqrt(2*length(y))
for some positive delta
and penalty="none"
.
Boysen, L., Kempe, A., Liebscher, V., Munk, A., Wittich, O. (2009) Consistencies and rates of convergence of jump-penalized least squares estimators. The Annals of Statistics 37(1), 157–183.
Yao, Y.-C. (1988) Estimating the number of change-points via Schwarz' criterion. Statistics & Probability Letters 6, 181–189.
steppath
, stepfit
, family, stepbound
# simulate 5 blocks (4 jumps) within a total of 100 data points
b <- c(sort(sample(1:99, 4)), 100)
f <- rep(rnorm(5, 0, 4), c(b[1], diff(b)))
rbind(b = b, f = unique(f))
# add gaussian noise
y <- f + rnorm(100)
# find 10 candidate jumps
path <- steppath(y, max.cand = 10)

# select number of jumps by simulated MRC with sqrt-penalty
# thresholded with positive delta, and by BIC
sel.MRC <- stepsel(path, y, "MRC", alpha = 0.05, r = 1e2, penalty = "sqrt")
sel.MRC
delta <- .1
sel.delta <- stepsel(path, y, "MRC",
                     q = (1 + delta) * sdrobnorm(y) * sqrt(2 * length(y)),
                     penalty = "none")
sel.delta
sel.BIC <- stepsel(path, type = "BIC")
sel.BIC

# compare results with truth
fit.MRC <- path[[sel.MRC]]
as.data.frame(fit.MRC)
as.data.frame(path[[sel.delta]])
as.data.frame(path[[sel.BIC]])
For developers only; users should look at the function improveSmallScales
in the CRAN package clampSeg
. Implements the second step of HILDE (Pein et al., 2020, Section III-B) in which an initial fit is tested for missed short events.
.testSmallScales(data, family, lengths = NULL, q, alpha, ...)
data |
a numeric vector containing the observations |
family |
a string specifying the assumed parametric family, currently |
lengths |
an integer vector giving the set of lengths, i.e. only intervals of these lengths will be considered. By default ( |
q |
either |
alpha |
a probability, i.e. a single numeric between 0 and 1, giving the significance level. Its choice is a trade-off between data fit and parsimony of the estimator. In other words, this argument balances the risks of missing change-points and detecting additional artefacts. For more details on this choice see (Frick et al., 2014, section 4) and (Pein et al., 2017, section 3.4). Either |
... |
there are two groups of further arguments:
|
a list
with entries jumps, addLeft, addRight, noDeconvolution, data, q
Pein, F., Bartsch, A., Steinem, C., and Munk, A. (2020) Heterogeneous idealization of ion channel recordings - Open channel noise. Submitted.
Reimplementation of VanDongen's algorithm for detecting jumps in ion channel recordings.
Deprecation warning: This function is mainly used for patchclamp recordings and may be transferred to a specialised package.
transit(y, x = 1:length(y), x0 = 2 * x[1] - x[2],
        sigma.amp = NA, sigma.slope = NA,
        amp.thresh = 3, slope.thresh = 2,
        rel.amp.n = 3, rel.amp.thresh = 4,
        family = c("gauss", "gaussKern"), param = NULL, refit = FALSE)
y |
a numeric vector containing the serial data |
sigma.amp |
amplitude (i.e. raw data within block) standard deviation; estimated using |
sigma.slope |
slope (i.e. central difference within block) standard deviation; estimated using |
amp.thresh |
amplitude threshold |
slope.thresh |
slope threshold |
rel.amp.n |
relative amplitude threshold will be used for blocks with no more datapoints than this |
rel.amp.thresh |
relative amplitude threshold |
x |
a numeric vector of the same length as |
x0 |
a single numeric giving the last unobserved sample point directly before sampling started |
family , param
|
specifies distribution of data, see family |
refit |
should the |
Returns an object of class stepfit
which encodes the jumps and corresponding mean values.
Only central, no forward differences have been used in this implementation. Moreover, the standard deviations will be estimated by sdrobnorm
if omitted (respecting the filter's effect if applicable).
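A hedged sketch of how the two noise levels could be supplied manually for unfiltered data, using central differences (y[i+1] - y[i-1]) / 2 for the slope; this is an illustration only and not the package's internal default computation:

y <- rnorm(1000)                                              # placeholder data
sigma.amp <- sdrobnorm(y)                                     # amplitude noise from the raw data
central <- (y[-(1:2)] - y[-((length(y) - 1):length(y))]) / 2  # central differences
sigma.slope <- sdrobnorm(central)                             # slope noise
fit <- transit(y, sigma.amp = sigma.amp, sigma.slope = sigma.slope)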
VanDongen, A. M. J. (1996) A new algorithm for idealizing single ion channel data containing multiple unknown conductance levels. Biophysical Journal 70(3), 1303–1315.
stepfit
, sdrobnorm
, jsmurf
, stepbound
, steppath
# estimating step-functions with Gaussian white noise added
# simulate a Gaussian hidden Markov model of length 1000 with 2 states
# with identical transition rates 0.01, and signal-to-noise ratio 2
sim <- contMC(1e3, 0:1, matrix(c(0, 0.01, 0.01, 0), 2), param = 1/2)
plot(sim$data, cex = 0.1)
lines(sim$cont, col = "red")
# maximum-likelihood estimation under multiresolution constraints
fit.MRC <- smuceR(sim$data$y, sim$data$x)
lines(fit.MRC, col = "blue")
# choose number of jumps using BIC
path <- steppath(sim$data$y, sim$data$x, max.blocks = 1e2)
fit.BIC <- path[[stepsel.BIC(path)]]
lines(fit.BIC, col = "green3", lty = 2)

# estimate after filtering
# simulate filtered ion channel recording with two states
set.seed(9)
# sampling rate 10 kHz
sampling <- 1e4
# tenfold oversampling
over <- 10
# 1 kHz 4-pole Bessel-filter, adjusted for oversampling
cutoff <- 1e3
df.over <- dfilter("bessel", list(pole = 4, cutoff = cutoff / sampling / over))
# two states, leaving state 1 at 10 Hz, state 2 at 20 Hz
rates <- rbind(c(0, 10), c(20, 0))
# simulate 0.5 s, level 0 corresponds to state 1, level 1 to state 2
# noise level is 0.3 after filtering
Sim <- contMC(0.5 * sampling, 0:1, rates, sampling = sampling, family = "gaussKern",
              param = list(df = df.over, over = over, sd = 0.3))
plot(Sim$data, pch = ".")
lines(Sim$discr, col = "red")
# fit under multiresolution constraints using filter corresponding to sample rate
df <- dfilter("bessel", list(pole = 4, cutoff = cutoff / sampling))
Fit.MRC <- jsmurf(Sim$data$y, Sim$data$x, param = df, r = 1e2)
lines(Fit.MRC, col = "blue")
# fit using TRANSIT
Fit.trans <- transit(Sim$data$y, Sim$data$x)
lines(Fit.trans, col = "green3", lty = 2)