Title: | Simultaneous Confidence Bands for the Mean of Functional Data |
---|---|
Description: | Statistical methods for estimating and inferring the mean of functional data. The methods include simultaneous confidence bands, local polynomial fitting, bandwidth selection by plug-in and cross-validation, goodness-of-fit tests for parametric models, equality tests for two-sample problems, and plotting functions. |
Authors: | David Degras |
Maintainer: | David Degras <[email protected]> |
License: | GPL-3 |
Version: | 1.2.2 |
Built: | 2024-11-20 06:33:30 UTC |
Source: | CRAN |
Statistical methods for estimating and inferring the mean of functional data. The methods include simultaneous confidence bands, local polynomial fitting, bandwidth selection by plug-in and cross-validation, goodness-of-fit tests for parametric models, equality tests for two-sample problems, and plotting functions.
Package: | SCBmeanfd |
Type: | Package |
Version: | 1.2.1 |
Date: | 2016-09-24 |
License: | GPL-3 |
David Degras <[email protected]>
Azzalini, A. and Bowman, A. (1993). On the use of nonparametric regression for checking linear relationships. Journal of the Royal Statistical Society. Series B 55, 549-557.
Benhenni, K. and Degras, D. (2014). Local polynomial estimation of the mean function and its derivatives based on functional data and regular designs. ESAIM: Probability and Statistics 18, 881-899.
Degras, D. (2011). Simultaneous confidence bands for nonparametric regression with functional data. Statistica Sinica 21, 1735-1765.
Rice, J. A. and Silverman, B. W. (1991). Estimating the mean and covariance structure nonparametrically when the data are curves. Journal of the Royal Statistical Society. Series B 53, 233-243.
Compute the cross-validation score of Rice and Silverman (1991) for the local polynomial estimation of a mean function.
cv.score(bandwidth, x, y, degree = 1, gridsize = length(x))
cv.score(bandwidth, x, y, degree = 1, gridsize = length(x))
bandwidth |
kernel bandwidth. |
x |
observation points. Missing values are not accepted. |
y |
matrix or data frame with functional observations (= curves) stored in rows. The number of columns of |
degree |
degree of the local polynomial fit. |
gridsize |
size of evaluation grid for the smoothed data. |
The cross-validation score is obtained by leaving in turn each curve out and computing the prediction error of the local polynomial smoother based on all other curves. For a bandwith value , this score is
where is the measurement of the
-th curve at location
for
and
, and
is the local polynomial estimator with bandwidth
based on all curves except the
-th.
If the x
values are not equally spaced, the data are first smoothed and evaluated on a grid of length gridsize
spanning the range of x
. The smoothed data are then interpolated back to x
.
the cross-validation score.
Rice, J. A. and Silverman, B. W. (1991). Estimating the mean and covariance structure nonparametrically when the data are curves. Journal of the Royal Statistical Society. Series B (Methodological), 53, 233–243.
## Artificial example x <- seq(0, 1, len = 100) mu <- x + .2 * sin(2 * pi * x) y <- matrix(mu + rnorm(2000, sd = .25), 20, 100, byrow = TRUE) h <- c(.005, .01, .02, .05, .1, .15) cv <- numeric() for (i in 1:length(h)) cv[i] <- cv.score(h[i], x, y, 1) plot(h, cv, type = "l") ## Plasma citrate data ## Compare cross-validation scores and bandwidths ## for local linear and local quadratic smoothing ## Not run: data(plasma) time <- 8:21 h1 <- seq(.5, 1.3, .05) h2 <- seq(.75, 2, .05) cv1 <- sapply(h1, cv.score, x = time, y = plasma, degree = 1) cv2 <- sapply(h2, cv.score, x = time, y = plasma, degree = 2) plot(h1, cv1, type = "l", xlim = range(c(h1,h2)), ylim = range(c(cv1, cv2)), xlab = "Bandwidth (hour)", ylab = "CV score", main = "Cross validation for local polynomial estimation") lines(h2, cv2, col = 2) legend("topleft", legend = c("Linear", "Quadratic"), lty = 1, col = 1:2, cex = .9) ## Note: using local linear (resp. quadratic) smoothing ## with a bandwidth smaller than .5 (resp. .75) can result ## in non-definiteness or numerical instability of the estimator. ## End(Not run)
## Artificial example x <- seq(0, 1, len = 100) mu <- x + .2 * sin(2 * pi * x) y <- matrix(mu + rnorm(2000, sd = .25), 20, 100, byrow = TRUE) h <- c(.005, .01, .02, .05, .1, .15) cv <- numeric() for (i in 1:length(h)) cv[i] <- cv.score(h[i], x, y, 1) plot(h, cv, type = "l") ## Plasma citrate data ## Compare cross-validation scores and bandwidths ## for local linear and local quadratic smoothing ## Not run: data(plasma) time <- 8:21 h1 <- seq(.5, 1.3, .05) h2 <- seq(.75, 2, .05) cv1 <- sapply(h1, cv.score, x = time, y = plasma, degree = 1) cv2 <- sapply(h2, cv.score, x = time, y = plasma, degree = 2) plot(h1, cv1, type = "l", xlim = range(c(h1,h2)), ylim = range(c(cv1, cv2)), xlab = "Bandwidth (hour)", ylab = "CV score", main = "Cross validation for local polynomial estimation") lines(h2, cv2, col = 2) legend("topleft", legend = c("Linear", "Quadratic"), lty = 1, col = 1:2, cex = .9) ## Note: using local linear (resp. quadratic) smoothing ## with a bandwidth smaller than .5 (resp. .75) can result ## in non-definiteness or numerical instability of the estimator. ## End(Not run)
Select the cross-validation bandwidth described in Rice and Silverman (1991) for the local polynomial estimation of a mean function based on functional data.
cv.select(x, y, degree = 1, interval = NULL, gridsize = length(x), ...)
cv.select(x, y, degree = 1, interval = NULL, gridsize = length(x), ...)
x |
observation points. Missing values are not accepted. |
y |
matrix or data frame with functional observations (= curves) stored in rows. The number of columns of |
degree |
degree of the local polynomial fit. |
interval |
lower and upper bounds of the search interval (numeric vector of length 2). |
gridsize |
size of evaluation grid for the smoothed data. |
... |
additional arguments to pass to the optimization function |
The cross-validation score is obtained by leaving in turn each curve out and computing the prediction error of the local polynomial smoother based on all other curves. For a bandwith value , this score is
where is the measurement of the
-th curve at location
for
and
, and
is the local polynomial estimator with bandwidth
based on all curves except the
-th.
If the x
values are not equally spaced, the data are first smoothed and evaluated on a grid of length gridsize
spanning the range of x
. The smoothed data are then interpolated back to x
.
cv.select
uses the standard R function optimize
to optimize cv.score
. If the argument interval
is not specified, the lower bound of the search interval is by default if
and
if
. The default value of the upper bound is
. These values guarantee in most cases that the local polynomial estimator is well defined. It is often useful to plot the function to be optimized for a range of argument values (grid search) before applying a numerical optimizer. In this way, the search interval can be narrowed down and the optimizer is more likely to find a global solution.
a bandwidth that minimizes the cross-validation score.
Rice, J. A. and Silverman, B. W. (1991). Estimating the mean and covariance structure nonparametrically when the data are curves. Journal of the Royal Statistical Society. Series B (Methodological), 53, 233–243.
## Not run: ## Plasma citrate data ## Compare cross-validation scores and bandwidths ## for local linear and local quadratic smoothing data(plasma) time <- 8:21 ## Local linear smoothing cv.select(time, plasma, 1) # local solution h = 3.76, S(h) = 463.08 cv.select(time, plasma, 1, interval = c(.5, 1)) # global solution = .75, S(h) = 439.54 ## Local quadratic smoothing cv.select(time, plasma, 2) # global solution h = 1.15, S(h) = 432.75 cv.select(time, plasma, 2, interval = c(1, 1.5)) # same ## End(Not run)
## Not run: ## Plasma citrate data ## Compare cross-validation scores and bandwidths ## for local linear and local quadratic smoothing data(plasma) time <- 8:21 ## Local linear smoothing cv.select(time, plasma, 1) # local solution h = 3.76, S(h) = 463.08 cv.select(time, plasma, 1, interval = c(.5, 1)) # global solution = .75, S(h) = 439.54 ## Local quadratic smoothing cv.select(time, plasma, 2) # global solution h = 1.15, S(h) = 432.75 cv.select(time, plasma, 2, interval = c(1, 1.5)) # same ## End(Not run)
Log-periodograms of five phonemes ("sh" as in "she", "dcl" as in "dark", "iy" as the vowel in "she", "aa" as the vowel in "dark", and "ao" as the first vowel in "water") spoken by a sample of ~50 male speakers. For each phoneme, 400 log-periodograms are observed on a uniform grid of 150 frequencies.
data(phoneme)
data(phoneme)
A data frame with 2000 rows and 151 columns. Each row contains a discretized log-periodogram (150 first columns) followed by a phoneme indicator (last column) coded as follows:
Code | Phoneme |
1 | "sh" |
2 | "iy" |
3 | "dcl" |
4 | "aa" |
5 | "ao" |
This dataset was created by the STAPH group in Toulouse, France
http://www.math.univ-toulouse.fr/staph/npfda/.
The larger original dataset can be found at
http://statweb.stanford.edu/~tibs/ElemStatLearn/.
## Not run: data(phoneme) freq <- 1:150 classes <- phoneme[,151] phoneme <- phoneme[,-151] classnames <- c("sh", "iy", "dcl", "aa", "ao") ## Local linear fit to the mean log-periodogram for each phoneme llfit <- mapply(locpoly, y = by(phoneme, classes, colMeans), MoreArgs = list(x = freq, bandwidth = 2, degree = 1, gridsize = length(freq))) llfit.y <- matrix(unlist(llfit["y",]), 150, 5) matplot(freq, llfit.y, type = "l", lty = 1, xlab = "Frequency (scaled)", ylab = "Log-intensity", main = "Local linear estimation\nof the population mean log-periodogram") legend("topright", legend = classnames, col = 1:5, lty = 1) ## End(Not run)
## Not run: data(phoneme) freq <- 1:150 classes <- phoneme[,151] phoneme <- phoneme[,-151] classnames <- c("sh", "iy", "dcl", "aa", "ao") ## Local linear fit to the mean log-periodogram for each phoneme llfit <- mapply(locpoly, y = by(phoneme, classes, colMeans), MoreArgs = list(x = freq, bandwidth = 2, degree = 1, gridsize = length(freq))) llfit.y <- matrix(unlist(llfit["y",]), 150, 5) matplot(freq, llfit.y, type = "l", lty = 1, xlab = "Frequency (scaled)", ylab = "Log-intensity", main = "Local linear estimation\nof the population mean log-periodogram") legend("topright", legend = classnames, col = 1:5, lty = 1) ## End(Not run)
Plasma citrate concentrations measured on 10 human subjects on the same day. The measurements on an individual were taken each hour from 8:00AM to 9:00PM. A possible statistical analysis is to estimate the population mean plasma citrate concentration as a function of time of day.
data(plasma)
data(plasma)
A matrix with 10 rows (corresponding to subjects) and 14 columns (corresponding to hours).
Andersen, A. H., Jensen, E. B. and Schou, G. (1981). Two-way analysis of variance with correlated errors. International Statistical Review 49, 153–157.
Hart, T. D. and Wehrly, T. E. (1986). Kernel regression estimation using repeated measurements data. Journal of the American Statistical Association 81, 1080–1088.
## Not run: data(plasma) matplot(x = 8:21, y = t(plasma), cex = .75, type = "l", col = 1, lty = 1, lwd = .5, xlab = "Time of day (hour)", ylab = "Concentration", main = "Plasma citrate data") lines(8:21, colMeans(plasma), col = 2, lwd = 1.5) legend("topright", col = 2, lty = 1, lwd = 1.5, legend = "Mean") ## End(Not run)
## Not run: data(plasma) matplot(x = 8:21, y = t(plasma), cex = .75, type = "l", col = 1, lty = 1, lwd = .5, xlab = "Time of day (hour)", ylab = "Concentration", main = "Plasma citrate data") lines(8:21, colMeans(plasma), col = 2, lwd = 1.5) legend("topright", col = 2, lty = 1, lwd = 1.5, legend = "Mean") ## End(Not run)
plot
method for class "SCBand"
.
## S3 method for class 'SCBand' plot(x , y = NULL, xlim = NULL, ylim = NULL, main = NULL, xlab = NULL, ylab = NULL, col = NULL, cex = NULL, pch = NULL, lty = NULL, lwd = NULL, legend = TRUE, where = NULL, text = NULL, legend.cex = NULL, horiz = TRUE, bty = "n", ...)
## S3 method for class 'SCBand' plot(x , y = NULL, xlim = NULL, ylim = NULL, main = NULL, xlab = NULL, ylab = NULL, col = NULL, cex = NULL, pch = NULL, lty = NULL, lwd = NULL, legend = TRUE, where = NULL, text = NULL, legend.cex = NULL, horiz = TRUE, bty = "n", ...)
x |
|
y |
optional y data. |
xlim |
x limits of the plot (numeric vector of length 2). |
ylim |
y limits of the plot (numeric vector of length 2). |
main |
title of the plot. |
xlab |
label of the x axis. |
ylab |
label of the y axis. |
col |
colors for lines and points. |
cex |
scale of plotting characters and symbols relative to default (numerical vector). |
pch |
vector of plotting characters or symbols: see |
lty |
a vector of line types, see |
lwd |
a vector of line widths, see |
legend |
logical; if |
where |
legend location: |
text |
text of the legend (character vector). |
legend.cex |
character expansion factor relative to current par("cex") for the legend. |
horiz |
logical; if TRUE, the legend is displayed horizontally rather than vertically. |
bty |
type of box to be drawn around the legend. The allowed values are |
... |
additional arguments passed to the function |
The argument y
can be used to plot subsets of the y data.
If non null, this argument has priority over the component x$y
for plotting.
The graphical parameters col
, cex
, pch
, lty
, and lwd
apply to the following components to be plotted: data, parametric estimate, nonparametric estimate(s), normal simultaneous confidence bands (SCB), and bootstrap SCB. More precisely, cex
and pch
must be specified as vectors of length equal to the number of data sets to be plotted (0, 1, or 2);
lty
and lwd
must specified as numeric vectors of length equal to the total number of estimates and SCB components; col
applies to all components and should be specified accordingly. If necessary, graphical parameters are recycled to match the required length.
By default a legend is plotted horizontally at the bottom of the graph.
## Not run: ## Plasma citrate data time <- 8:21 data(plasma) h <- cv.select(time, plasma, degree = 1, interval = c(.5, 1)) scbplasma <- scb.mean(time, plasma, bandwidth = h, scbtype = "both", gridsize = 100) plot(scbplasma, cex = .2, legend.cex = .85, xlab = "Time of day", ylab = "Concentration", main = "Plasma citrate data") ## End(Not run)
## Not run: ## Plasma citrate data time <- 8:21 data(plasma) h <- cv.select(time, plasma, degree = 1, interval = c(.5, 1)) scbplasma <- scb.mean(time, plasma, bandwidth = h, scbtype = "both", gridsize = 100) plot(scbplasma, cex = .2, legend.cex = .85, xlab = "Time of day", ylab = "Concentration", main = "Plasma citrate data") ## End(Not run)
Implement the Pseudo-Likelihood Ratio Test (PLRT) of Azzalini and Bowman (1993) with functional data. The test is used to assess whether a mean function belongs to a given finite-dimensional function space.
plrt.model(x, y, model, verbose = FALSE)
plrt.model(x, y, model, verbose = FALSE)
x |
a numeric vector of x data. |
y |
a matrix or data frame with functional observations (= curves) stored in rows. The number of columns of |
model |
an integer specifying the degree of a polynomial basis, or a data frame/matrix containing the basis functions stored in columns. In the latter case, the basis functions must be evaluated at |
verbose |
logical; if |
the p value of the PLRT.
Azzalini, A. and Bowman, A. (1993). On the use of nonparametric regression for checking linear relationships. Journal of the Royal Statistical Society. Series B 55, 549-557.
## Example: Gaussian process with mean = linear function + bump ## and Onstein-Uhlenbeck covariance. The bump is high in the y ## direction and narrow in the x direction. The SCB and PLRT ## tests are compared. # The departure from linearity in the mean function is strong # in the supremum norm (SCB test) but mild in the euclidean norm # (PLRT). With either n = 20 or n = 100 curves, the SCB test # strongly rejects the incorrect linear model for the mean # function while the PLRT retains it. p <- 100 # number of observation points x <- seq(0, 1, len = p) mu <- -1 + 1.5 * x + 0.2 * dnorm(x, .6, .02) plot(x, mu, type = "l") R <- (.25)^2 * exp(20 * log(.9) * abs(outer(x,x,"-"))) # covariance eigR <- eigen(R, symmetric = TRUE) simR <- eigR$vectors %*% diag(sqrt(eigR$values)) n <- 20 set.seed(100) y <- mu + simR %*% matrix(rnorm(n*p), p, n) y <- t(y) points(x, colMeans(y)) h <- cv.select(x, y, 1) scb.model(x, y, 1, bandwidth = h) # p value: <1e-16 plrt.model(x, y, 1, verbose = TRUE) # p value: .442 n <- 100 y <- mu + simR %*% matrix(rnorm(n*p), p, n) y <- t(y) h <- cv.select(x, y, 1) scb.model(x, y, 1, bandwidth = h) # p value: <1e-16 plrt.model(x, y, 1, verbose = TRUE) # p value: .456
## Example: Gaussian process with mean = linear function + bump ## and Onstein-Uhlenbeck covariance. The bump is high in the y ## direction and narrow in the x direction. The SCB and PLRT ## tests are compared. # The departure from linearity in the mean function is strong # in the supremum norm (SCB test) but mild in the euclidean norm # (PLRT). With either n = 20 or n = 100 curves, the SCB test # strongly rejects the incorrect linear model for the mean # function while the PLRT retains it. p <- 100 # number of observation points x <- seq(0, 1, len = p) mu <- -1 + 1.5 * x + 0.2 * dnorm(x, .6, .02) plot(x, mu, type = "l") R <- (.25)^2 * exp(20 * log(.9) * abs(outer(x,x,"-"))) # covariance eigR <- eigen(R, symmetric = TRUE) simR <- eigR$vectors %*% diag(sqrt(eigR$values)) n <- 20 set.seed(100) y <- mu + simR %*% matrix(rnorm(n*p), p, n) y <- t(y) points(x, colMeans(y)) h <- cv.select(x, y, 1) scb.model(x, y, 1, bandwidth = h) # p value: <1e-16 plrt.model(x, y, 1, verbose = TRUE) # p value: .442 n <- 100 y <- mu + simR %*% matrix(rnorm(n*p), p, n) y <- t(y) h <- cv.select(x, y, 1) scb.model(x, y, 1, bandwidth = h) # p value: <1e-16 plrt.model(x, y, 1, verbose = TRUE) # p value: .456
Select the plug-in bandwidth described in Benhenni and Degras (2011) for the local polynomial estimation of a mean function and its first derivative based on functional data.
plugin.select(x, y, drv = 0L, degree = drv+1L, gridsize = length(x), ...)
plugin.select(x, y, drv = 0L, degree = drv+1L, gridsize = length(x), ...)
x |
numeric vector of x data. This observation grid must be uniform and missing values are not accepted. |
y |
matrix or data frame with functional observations (= curves) stored in rows. The number of columns of |
drv |
order of the derivative to estimate. Must be 0 or 1. |
degree |
degree of local polynomial used. Must equal |
gridsize |
the size of the grid over which the mean function is to be estimated.
Defaults to |
... |
additional arguments to pass to the optimizer of the CV score. |
The plug-in method should not be used with small data sets, since it is based on asymptotic considerations and requires reasonably accurate estimates of derivatives of the mean and covariance functions. Both the number of observed curves and observation points should be moderate to large. The plug-in bandwidth is designed to minimize the asymptotic mean integrated squared estimation error
where is the mean function and
is a local polynomial estimator with kernel bandwidth
. The expression of the plug-in bandwidth can be found in Benhenni and Degras (2011).
the plug-in bandwidth.
Benhenni, K. and Degras, D. (2011). Local polynomial estimation of the average growth curve with functional data. http://arxiv.org/abs/1107.4058
## Not run: ## Phoneme data data(phoneme) classes <- phoneme[,151] phoneme <- phoneme[,-151] freq <- 1:150 plugin.bandwidth <- numeric(5) cv.bandwidth <- numeric(5) # compare with cross-validation for (i in 1:5) { plugin.bandwidth[i] <- plugin.select(x = freq, y = phoneme[classes == i, ], drv = 0, degree = 1) cv.bandwidth[i] <- cv.select(x = freq, y = phoneme[classes == i, ], degree = 1) } round(cbind(plugin.bandwidth, cv.bandwidth), 4) ## End(Not run)
## Not run: ## Phoneme data data(phoneme) classes <- phoneme[,151] phoneme <- phoneme[,-151] freq <- 1:150 plugin.bandwidth <- numeric(5) cv.bandwidth <- numeric(5) # compare with cross-validation for (i in 1:5) { plugin.bandwidth[i] <- plugin.select(x = freq, y = phoneme[classes == i, ], drv = 0, degree = 1) cv.bandwidth[i] <- cv.select(x = freq, y = phoneme[classes == i, ], degree = 1) } round(cbind(plugin.bandwidth, cv.bandwidth), 4) ## End(Not run)
print
method for class "SCBand"
.
## S3 method for class 'SCBand' print(x,...)
## S3 method for class 'SCBand' print(x,...)
x |
an object of class |
... |
for compatibility with the generic |
The function print.SCBand
concisely displays the information of an object of class "SCBand"
. More precisely it shows the
data range, bandwidth used in local polynomial estimation, and key information on SCB and statistical tests.
## Not run: # Plasma citrate data data(plasma) time <- 8:21 h <- cv.select(time, plasma, 1) scbplasma <- scb.mean(time, plasma, bandwidth = h, scbtype = "both", gridsize = 100) scbplasma ## End(Not run)
## Not run: # Plasma citrate data data(plasma) time <- 8:21 h <- cv.select(time, plasma, 1) scbplasma <- scb.mean(time, plasma, bandwidth = h, scbtype = "both", gridsize = 100) scbplasma ## End(Not run)
This two-sample test builds simultaneous confidence bands (SCB) for the difference between two population mean functions and retains the equality assumption if the null function is contained in the bands. Equivalently, SCB are built around one of the local linear estimates (the one for say, population 1), and the equality hypothesis is accepted if the other estimate (the one for population 2) lies within the bands.
scb.equal(x, y, bandwidth, level = .05, degree = 1, scbtype = c("normal","bootstrap","both","no"), gridsize = NULL, keep.y = TRUE, nrep = 2e4, nboot = 1e4, parallel = c("no","multicore","snow"), ncpus = getOption("boot.ncpus",1L), cl = NULL)
scb.equal(x, y, bandwidth, level = .05, degree = 1, scbtype = c("normal","bootstrap","both","no"), gridsize = NULL, keep.y = TRUE, nrep = 2e4, nboot = 1e4, parallel = c("no","multicore","snow"), ncpus = getOption("boot.ncpus",1L), cl = NULL)
x |
a list of length 2 or matrix with 2 columns containing the x values of each sample. If the two samples are observed on the same grid, |
y |
a list of length 2 containing matrices or data frames with functional observations (= curves) stored in rows. The number of columns of each component of |
bandwidth |
the kernel bandwidths (numeric vector of length 1 or 2). |
level |
the significance level of the test (default = .05). |
degree |
the degree of the local polynomial fit. |
scbtype |
the type of simultaneous confidence bands to build: "normal", "bootstrap", "both", or "no". |
gridsize |
the size of the grid over which the mean function is to be estimated. Defaults to the length of the smallest |
keep.y |
logical; if |
nrep |
the number of replicates for the normal SCB method (default = 20,000). |
nboot |
the number of replicates for the bootstrap SCB method (default = 5,000). |
parallel |
the computation method for the bootstrap SCB. By default, computations are sequential ( |
ncpus |
the number of cores to use for parallel computing if |
cl |
the name of the cluster to use for parallel computing if |
A list object of class "SCBand"
. Depending on the function used to create the object (scb.mean
, scb.model
, or scb.equal
), some of its components are set to NULL
. For scb.mean
, the object has components:
x |
the argument |
y |
if |
call |
the function call. |
model |
|
par |
|
nonpar |
a list of two local linear estimates, one for each population. |
bandwidth |
the argument |
degree |
the degree of local polynomial used. Currently, only local linear estimation is supported. |
level |
the argument |
scbtype |
the argument |
teststat |
the test statistic. |
pnorm |
the p value for the normal-based statistical test. |
pboot |
the p value for the boostrap-based statistical test. |
qnorm |
the quantile used to build the normal SCB. |
qboot |
the quantile used to build the bootstrap SCB. |
normscb |
a matrix containing the normal SCB stored in columns. |
bootscb |
a matrix containing the bootstrap SCB stored in columns. |
gridsize |
the argument |
nrep |
the argument |
nboot |
the argument |
Depending on the value of scbtype
, some or all of
the fields pnorm
, qnorm
, normscb
, nrep
, pboot
, qboot
, normboot
and nboot
may be NULL
.
Degras, D. (2011). Simultaneous confidence bands for nonparametric regression with functional data. Statistica Sinica, 21, 1735–1765.
## Not run: # Phoneme data: compare the mean log-periodograms # for phonemes "aa" as the vowel in "dark" and "ao" # as the first vowel in "water" data(phoneme) n <- nrow(phoneme) N <- ncol(phoneme) classes <- split(1:n,phoneme[,N]) names(classes) <- c("sh", "iy", "dcl", "aa", "ao") freq <- 1:150 compare.aa.ao <- scb.equal(freq, list(phoneme[classes$aa,-N], phoneme[classes$ao,-N]), bandwidth = c(.75, .75), scbtype = "both", nboot = 2e3) summary(compare.aa.ao) ## End(Not run)
## Not run: # Phoneme data: compare the mean log-periodograms # for phonemes "aa" as the vowel in "dark" and "ao" # as the first vowel in "water" data(phoneme) n <- nrow(phoneme) N <- ncol(phoneme) classes <- split(1:n,phoneme[,N]) names(classes) <- c("sh", "iy", "dcl", "aa", "ao") freq <- 1:150 compare.aa.ao <- scb.equal(freq, list(phoneme[classes$aa,-N], phoneme[classes$ao,-N]), bandwidth = c(.75, .75), scbtype = "both", nboot = 2e3) summary(compare.aa.ao) ## End(Not run)
Fit a local linear estimator and build simultaneous confidence bands (SCB) for the mean of functional data.
scb.mean(x, y, bandwidth, level = .95, degree = 1, scbtype = c("normal","bootstrap","both","no"), gridsize = length(x), keep.y = TRUE, nrep = 2e4, nboot = 5e3, parallel = c("no", "multicore", "snow"), ncpus = getOption("boot.ncpus",1L), cl = NULL)
scb.mean(x, y, bandwidth, level = .95, degree = 1, scbtype = c("normal","bootstrap","both","no"), gridsize = length(x), keep.y = TRUE, nrep = 2e4, nboot = 5e3, parallel = c("no", "multicore", "snow"), ncpus = getOption("boot.ncpus",1L), cl = NULL)
x |
a numeric vector of x data. Missing values are not accepted. |
y |
a matrix or data frame with functional observations (= curves) stored in rows. The number of columns of |
bandwidth |
the kernel bandwidth smoothing parameter. |
level |
the level of the simultaneous confidence bands. |
degree |
the degree of the local polynomial fit. |
scbtype |
the type of simultaneous confidence bands to build: "normal", "bootstrap", "both", or "no". |
gridsize |
the size of the grid used to evaluate the mean function estimates and SCB. Defaults to length(x). |
keep.y |
logical; if |
nrep |
number of replicates for the Gaussian SCB method (20,000 by default). |
nboot |
number of replicates for the bootstrap SCB method (5,000 by default). |
parallel |
the computation method for the SCB. By default, computations are sequential ( |
ncpus |
number of cores to use for parallel computing when |
cl |
name of the cluster to use for parallel computing when |
The local polynomial fitting uses a standard normal kernel and is implemented via the locpoly
function.
Bootstrap SCB are implemented with the boot
function and typically require more computation time than normal SCB.
An object of class "SCBand"
. To accommodate the different functions creating objects of this class (scb.mean
, scb.model
, and scb.equal
), some components of the object are set to NULL
. The component list is:
x |
the x data. |
y |
the y data if |
call |
the function call. |
model |
|
par |
|
nonpar |
a nonparametric estimate. |
bandwidth |
the argument |
degree |
the degree of local polynomial used. Currently, only local linear estimation is supported. |
level |
the argument |
scbtype |
the argument |
teststat |
|
pnorm |
|
pboot |
|
qnorm |
the quantile used to build the normal SCB. |
qboot |
the quantile used to build the bootstrap SCB. |
normscb |
a matrix containing the normal SCB stored in columns. |
bootscb |
a matrix containing the bootstrap SCB stored in columns. |
gridsize |
the argument |
nrep |
the argument |
nboot |
the argument |
Depending on the value of scbtype
, some of
the fields qnorm
, normscb
, nrep
, qboot
, normboot
and nboot
may be NULL
.
Degras, D. (2011). Simultaneous confidence bands for nonparametric regression with functional data. Statistica Sinica, 21, 1735–1765.
## Not run: ## Plasma citrate data data(plasma) time <- 8:21 h <- cv.select(time, plasma, 1, c(.5, 1)) scbplasma <- scb.mean(time, plasma, bandwidth = h, scbtype = "both", gridsize = 100) scbplasma plot(scbplasma, cex = .2, legend.cex = .85, xlab = "Time", ylab = "Concentration", main = "Plasma citrate data") ## End(Not run)
## Not run: ## Plasma citrate data data(plasma) time <- 8:21 h <- cv.select(time, plasma, 1, c(.5, 1)) scbplasma <- scb.mean(time, plasma, bandwidth = h, scbtype = "both", gridsize = 100) scbplasma plot(scbplasma, cex = .2, legend.cex = .85, xlab = "Time", ylab = "Concentration", main = "Plasma citrate data") ## End(Not run)
This is the goodness-of-fit test for parametric models of the mean function described in Degras (2011). The candidate model must be a finite-dimensional function space (curvilinear regression). The test is based on the sup-norm distance between a smoothed parametric estimate and a local linear estimate. Graphically, the candidate model is retained whenever one of the estimates lies within the SCB built around the other.
scb.model(x, y, model, bandwidth, level = .05, degree = 1, scbtype = c("normal","bootstrap","both","no"), gridsize = length(x), keep.y = TRUE, nrep = 2e4, nboot = 5e3, parallel = c("no", "multicore", "snow"), ncpus = getOption("boot.ncpus",1L), cl = NULL)
scb.model(x, y, model, bandwidth, level = .05, degree = 1, scbtype = c("normal","bootstrap","both","no"), gridsize = length(x), keep.y = TRUE, nrep = 2e4, nboot = 5e3, parallel = c("no", "multicore", "snow"), ncpus = getOption("boot.ncpus",1L), cl = NULL)
x |
a numeric vector of x data. |
y |
a matrix or data frame with functional observations (= curves) stored in rows. The number of columns of |
model |
an integer specifying the degree of a polynomial basis, or a data frame/matrix containing the basis functions stored in columns. In the latter case, the basis functions must be evaluated on a uniform grid of size |
bandwidth |
the kernel bandwidth smoothing parameter. |
level |
the significance level of the test (default = .05). |
degree |
the degree of the local polynomial fit. |
scbtype |
the type of simultaneous confidence bands to build: "normal", "bootstrap", "both", or "no". |
gridsize |
the size of the grid over which the mean function is to be estimated. Defaults to |
keep.y |
logical; if |
nrep |
the number of replicates for the normal SCB method (default = 20,000). |
nboot |
the number of replicates for the bootstrap SCB method (default = 5,000). |
parallel |
the computation method for the bootstrap SCB. By default, computations are sequential ( |
ncpus |
the number of cores to use for parallel computing when |
cl |
the name of the cluster to use for parallel computing when |
An object of class "SCBand"
. To accommodate the different functions creating objects of this class (scb.mean
, scb.model
, and scb.equal
), some components of the object are set to NULL
. The component list is:
x |
the argument |
y |
the argument |
call |
the function call. |
model |
the argument |
par |
a smoothed parametric estimate. |
nonpar |
a local linear estimate. |
bandwidth |
the argument |
degree |
the degree of the local polynomial. Currently, only local linear estimation is supported. |
level |
the argument |
scbtype |
the argument |
teststat |
the test statistic. |
pnorm |
the p value for the normal-based statistical test. |
pboot |
the p value for the boostrap-based statistical test. |
qnorm |
the quantile used to build the normal SCB. |
qboot |
the quantile used to build the bootstrap SCB. |
normscb |
a matrix containing the normal SCB stored in columns. |
bootscb |
a matrix containing the bootstrap SCB stored in columns. |
gridsize |
the argument |
nrep |
the argument |
nboot |
the argument |
Depending on the value of scbtype
, some or all of
the fields pnorm
, qnorm
, normscb
, nrep
, pboot
, qboot
, normboot
and nboot
may be NULL
.
Degras, D. (2011). Simultaneous confidence bands for nonparametric regression with functional data.
Statistica Sinica, 21, 1735–1765.
## Example from Degras (2011) ## Gaussian process with polynomial mean function ## and Ornstein-Uhlenbeck covariance function ## The SCB and PLRT tests are compared set.seed(100) p <- 100 # number of observation points x <- seq(0, 1, len = p) mu <- 10 * x^3 - 15 * x^4 + 6 * x^5 # mean R <- (.25)^2 * exp(20 * log(.9) * abs(outer(x,x,"-"))) # covariance eigR <- eigen(R, symmetric = TRUE) simR <- eigR$vectors %*% diag(sqrt(eigR$values)) # Candidate model for mu: polynomial of degree <= 3 # This model, although incorrect, closely approximates mu. # With n = 50 curves, the SCB and PLRT incorrectly retain the model. # With n = 70 curves, both tests reject it. n <- 50 y <- mu + simR %*% matrix(rnorm(n*p), p, n) # simulate data y <- t(y) # arrange the trajectories in rows h <- cv.select(x, y, 1) scb.model(x, y, 3, bandwidth = h) # p value: .652 plrt.model(x, y, 3, verbose = TRUE) # p value: .450 n <- 70 y <- mu + simR %*% matrix(rnorm(n*p), p, n) y <- t(y) h <- cv.select(x, y, 1) scb.model(x, y, 3, bandwidth = h) # p value: .004 plrt.model(x, y, 3, verbose = TRUE) # p value: .001 # Correct model: polynomials of degree <= 5 scb.model(x, y, 5, bandwidth = h) # p value: .696 plrt.model(x, y, 5, verbose = TRUE) # p value: .628
## Example from Degras (2011) ## Gaussian process with polynomial mean function ## and Ornstein-Uhlenbeck covariance function ## The SCB and PLRT tests are compared set.seed(100) p <- 100 # number of observation points x <- seq(0, 1, len = p) mu <- 10 * x^3 - 15 * x^4 + 6 * x^5 # mean R <- (.25)^2 * exp(20 * log(.9) * abs(outer(x,x,"-"))) # covariance eigR <- eigen(R, symmetric = TRUE) simR <- eigR$vectors %*% diag(sqrt(eigR$values)) # Candidate model for mu: polynomial of degree <= 3 # This model, although incorrect, closely approximates mu. # With n = 50 curves, the SCB and PLRT incorrectly retain the model. # With n = 70 curves, both tests reject it. n <- 50 y <- mu + simR %*% matrix(rnorm(n*p), p, n) # simulate data y <- t(y) # arrange the trajectories in rows h <- cv.select(x, y, 1) scb.model(x, y, 3, bandwidth = h) # p value: .652 plrt.model(x, y, 3, verbose = TRUE) # p value: .450 n <- 70 y <- mu + simR %*% matrix(rnorm(n*p), p, n) y <- t(y) h <- cv.select(x, y, 1) scb.model(x, y, 3, bandwidth = h) # p value: .004 plrt.model(x, y, 3, verbose = TRUE) # p value: .001 # Correct model: polynomials of degree <= 5 scb.model(x, y, 5, bandwidth = h) # p value: .696 plrt.model(x, y, 5, verbose = TRUE) # p value: .628
summary
method for class "SCBand"
## S3 method for class 'SCBand' summary(object, ...)
## S3 method for class 'SCBand' summary(object, ...)
object |
an object of class |
... |
additional arguments; not currently used. |
The function summary.SCBand
displays all fields of a SCBand
object at the exception of x
, y
, par
, nonpar
, normscb
, and bootscb
which are potentially big. It provides information on the function call, data, local polynomial fit, SCB, and statistical tests.
## Not run: ## Plasma citrate data data(plasma) time <- 8:21 h <- cv.select(time, plasma, 1) scbplasma <- scb.mean(time, plasma, bandwidth = h, scbtype = "both", gridsize = 100) summary(scbplasma) ## End(Not run)
## Not run: ## Plasma citrate data data(plasma) time <- 8:21 h <- cv.select(time, plasma, 1) scbplasma <- scb.mean(time, plasma, bandwidth = h, scbtype = "both", gridsize = 100) summary(scbplasma) ## End(Not run)