Package 'SCBmeanfd'

Title:	Simultaneous Confidence Bands for the Mean of Functional Data
Description:	Statistical methods for estimating and inferring the mean of functional data. The methods include simultaneous confidence bands, local polynomial fitting, bandwidth selection by plug-in and cross-validation, goodness-of-fit tests for parametric models, equality tests for two-sample problems, and plotting functions.
Authors:	David Degras
Maintainer:	David Degras <[email protected]>
License:	GPL-3
Version:	1.2.2
Built:	2025-02-18 07:20:58 UTC
Source:	CRAN

Help Index

Simultaneous Confidence Bands for Mean Functions
Leave-One-Curve-out Cross-Validation Score
Cross-Validation Bandwidth Selection for Local Polynomial Estimation
Phoneme data
Plasma citrate data
Plot a SCBand Object
Pseudo-Likelihood Ratio Test for Models of the Mean Function
Plug-in Bandwidth Selection for Local Polynomial Estimation
Print a SCBand Object
Compare Two Mean Functions
Build Simultaneous Confidence Bands for Mean Functions
Goodness-Of-Fit of a Model for the Mean Function
Summarize a SCBand Object

Simultaneous Confidence Bands for Mean Functions

Description

Statistical methods for estimating and inferring the mean of functional data. The methods include simultaneous confidence bands, local polynomial fitting, bandwidth selection by plug-in and cross-validation, goodness-of-fit tests for parametric models, equality tests for two-sample problems, and plotting functions.

Details

Package:	SCBmeanfd
Type:	Package
Version:	1.2.1
Date:	2016-09-24
License:	GPL-3

Author(s)

David Degras <[email protected]>

References

Azzalini, A. and Bowman, A. (1993). On the use of nonparametric regression for checking linear relationships. Journal of the Royal Statistical Society. Series B 55, 549-557.

Benhenni, K. and Degras, D. (2014). Local polynomial estimation of the mean function and its derivatives based on functional data and regular designs. ESAIM: Probability and Statistics 18, 881-899.

Degras, D. (2011). Simultaneous confidence bands for nonparametric regression with functional data. Statistica Sinica 21, 1735-1765.

Rice, J. A. and Silverman, B. W. (1991). Estimating the mean and covariance structure nonparametrically when the data are curves. Journal of the Royal Statistical Society. Series B 53, 233-243.

Leave-One-Curve-out Cross-Validation Score

Description

Compute the cross-validation score of Rice and Silverman (1991) for the local polynomial estimation of a mean function.

Usage

cv.score(bandwidth, x, y, degree = 1, gridsize = length(x))
cv.score(bandwidth, x, y, degree = 1, gridsize = length(x))

Arguments

`bandwidth`	kernel bandwidth.
`x`	observation points. Missing values are not accepted.
`y`	matrix or data frame with functional observations (= curves) stored in rows. The number of columns of `y` must match the length of `x`. Missing values are not accepted.
`degree`	degree of the local polynomial fit.
`gridsize`	size of evaluation grid for the smoothed data.

Details

The cross-validation score is obtained by leaving in turn each curve out and computing the prediction error of the local polynomial smoother based on all other curves. For a bandwith value $h$ , this score is

$CV(h) = \frac{1}{np} \sum_{i=1}^n \sum_{j=1}^p \left( Y_{ij} - \hat{\mu}^{-(i)}(x_j;h) \right)^2,$

where $Y_{ij}$ is the measurement of the $i$ -th curve at location $x_j$ for $i=1,\ldots,n$ and $j=1,\ldots,p$ , and $\hat{\mu}^{-(i)}(x_j;h)$ is the local polynomial estimator with bandwidth $h$ based on all curves except the $i$ -th.

If the x values are not equally spaced, the data are first smoothed and evaluated on a grid of length gridsize spanning the range of x. The smoothed data are then interpolated back to x.

Value

the cross-validation score.

References

Rice, J. A. and Silverman, B. W. (1991). Estimating the mean and covariance structure nonparametrically when the data are curves. Journal of the Royal Statistical Society. Series B (Methodological), 53, 233–243.

Examples

## Artificial example 
x <- seq(0, 1, len = 100)
mu <- x + .2 * sin(2 * pi * x)
y <- matrix(mu + rnorm(2000, sd = .25), 20, 100, byrow = TRUE)
h <- c(.005, .01, .02, .05, .1, .15)
cv <- numeric()
for (i in 1:length(h)) cv[i] <- cv.score(h[i], x, y, 1)
plot(h, cv, type = "l")

## Plasma citrate data
## Compare cross-validation scores and bandwidths  
## for local linear and local quadratic smoothing
## Not run: 
data(plasma)
time <- 8:21
h1 <- seq(.5, 1.3, .05)
h2 <- seq(.75, 2, .05)
cv1 <- sapply(h1, cv.score, x = time, y = plasma, degree = 1)
cv2 <- sapply(h2, cv.score, x = time, y = plasma, degree = 2)
plot(h1, cv1, type = "l", xlim = range(c(h1,h2)), ylim = range(c(cv1, cv2)), 
  xlab = "Bandwidth (hour)", ylab = "CV score", 
  main = "Cross validation for local polynomial estimation")
lines(h2, cv2, col = 2)
legend("topleft", legend = c("Linear", "Quadratic"), lty = 1, 
  col = 1:2, cex = .9)

## Note: using local linear (resp. quadratic) smoothing 
## with a bandwidth smaller than .5 (resp. .75) can result 
## in non-definiteness or numerical instability of the estimator. 

## End(Not run)
## Artificial example 
x <- seq(0, 1, len = 100)
mu <- x + .2 * sin(2 * pi * x)
y <- matrix(mu + rnorm(2000, sd = .25), 20, 100, byrow = TRUE)
h <- c(.005, .01, .02, .05, .1, .15)
cv <- numeric()
for (i in 1:length(h)) cv[i] <- cv.score(h[i], x, y, 1)
plot(h, cv, type = "l")

## Plasma citrate data
## Compare cross-validation scores and bandwidths  
## for local linear and local quadratic smoothing
## Not run: 
data(plasma)
time <- 8:21
h1 <- seq(.5, 1.3, .05)
h2 <- seq(.75, 2, .05)
cv1 <- sapply(h1, cv.score, x = time, y = plasma, degree = 1)
cv2 <- sapply(h2, cv.score, x = time, y = plasma, degree = 2)
plot(h1, cv1, type = "l", xlim = range(c(h1,h2)), ylim = range(c(cv1, cv2)), 
  xlab = "Bandwidth (hour)", ylab = "CV score", 
  main = "Cross validation for local polynomial estimation")
lines(h2, cv2, col = 2)
legend("topleft", legend = c("Linear", "Quadratic"), lty = 1, 
  col = 1:2, cex = .9)

## Note: using local linear (resp. quadratic) smoothing 
## with a bandwidth smaller than .5 (resp. .75) can result 
## in non-definiteness or numerical instability of the estimator. 

## End(Not run)

Cross-Validation Bandwidth Selection for Local Polynomial Estimation

Description

Select the cross-validation bandwidth described in Rice and Silverman (1991) for the local polynomial estimation of a mean function based on functional data.

Usage

 cv.select(x, y, degree = 1, interval = NULL, gridsize = length(x), ...) cv.select(x, y, degree = 1, interval = NULL, gridsize = length(x), ...)

Arguments

`x`	observation points. Missing values are not accepted.
`y`	matrix or data frame with functional observations (= curves) stored in rows. The number of columns of `y` must match the length of `x`. Missing values are not accepted.
`degree`	degree of the local polynomial fit.
`interval`	lower and upper bounds of the search interval (numeric vector of length 2).
`gridsize`	size of evaluation grid for the smoothed data.
`...`	additional arguments to pass to the optimization function `optimize`.

Details

$CV(h) = \frac{1}{np} \sum_{i=1}^n \sum_{j=1}^p \left( Y_{ij} - \hat{\mu}^{-(i)}(x_j;h) \right)^2,$

If the x values are not equally spaced, the data are first smoothed and evaluated on a grid of length gridsize spanning the range of x. The smoothed data are then interpolated back to x.

cv.select uses the standard R function optimize to optimize cv.score. If the argument interval is not specified, the lower bound of the search interval is by default $(x_2-x_1)/2$ if $degree < 2$ and $x_2-x_1$ if $degree >= 2$ . The default value of the upper bound is $(\max(x)-\min(x))/2$ . These values guarantee in most cases that the local polynomial estimator is well defined. It is often useful to plot the function to be optimized for a range of argument values (grid search) before applying a numerical optimizer. In this way, the search interval can be narrowed down and the optimizer is more likely to find a global solution.

Value

a bandwidth that minimizes the cross-validation score.

References

Examples

## Not run: 
## Plasma citrate data
## Compare cross-validation scores and bandwidths  
## for local linear and local quadratic smoothing

data(plasma)
time <- 8:21   				
## Local linear smoothing						
cv.select(time, plasma, 1)	# local solution h = 3.76, S(h) = 463.08			
cv.select(time, plasma, 1, interval = c(.5, 1))	# global solution = .75, S(h) = 439.54

## Local quadratic smoothing						
cv.select(time, plasma, 2)	# global solution h = 1.15, S(h) = 432.75			
cv.select(time, plasma, 2, interval = c(1, 1.5))	# same

## End(Not run)
## Not run: 
## Plasma citrate data
## Compare cross-validation scores and bandwidths  
## for local linear and local quadratic smoothing

data(plasma)
time <- 8:21   				
## Local linear smoothing						
cv.select(time, plasma, 1)	# local solution h = 3.76, S(h) = 463.08			
cv.select(time, plasma, 1, interval = c(.5, 1))	# global solution = .75, S(h) = 439.54

## Local quadratic smoothing						
cv.select(time, plasma, 2)	# global solution h = 1.15, S(h) = 432.75			
cv.select(time, plasma, 2, interval = c(1, 1.5))	# same

## End(Not run)

Phoneme data

Description

Log-periodograms of five phonemes ("sh" as in "she", "dcl" as in "dark", "iy" as the vowel in "she", "aa" as the vowel in "dark", and "ao" as the first vowel in "water") spoken by a sample of ~50 male speakers. For each phoneme, 400 log-periodograms are observed on a uniform grid of 150 frequencies.

Usage

data(phoneme)data(phoneme)

Format

A data frame with 2000 rows and 151 columns. Each row contains a discretized log-periodogram (150 first columns) followed by a phoneme indicator (last column) coded as follows:

Code	Phoneme
1	"sh"
2	"iy"
3	"dcl"
4	"aa"
5	"ao"

Source

This dataset was created by the STAPH group in Toulouse, France
http://www.math.univ-toulouse.fr/staph/npfda/.
The larger original dataset can be found at
http://statweb.stanford.edu/~tibs/ElemStatLearn/.

Examples

## Not run: 
data(phoneme)
freq <- 1:150
classes <- phoneme[,151]
phoneme <- phoneme[,-151]
classnames <- c("sh", "iy", "dcl", "aa", "ao")

## Local linear fit to the mean log-periodogram for each phoneme 
llfit <- mapply(locpoly, y = by(phoneme, classes, colMeans), 
          MoreArgs = list(x = freq, bandwidth = 2, degree = 1, 
          gridsize = length(freq)))
llfit.y <- matrix(unlist(llfit["y",]), 150, 5)  
matplot(freq, llfit.y, type = "l", lty = 1, xlab = "Frequency (scaled)", 
	ylab = "Log-intensity", 
	main = "Local linear estimation\nof the population mean log-periodogram")
legend("topright", legend = classnames, col = 1:5, lty = 1)

## End(Not run)
## Not run: 
data(phoneme)
freq <- 1:150
classes <- phoneme[,151]
phoneme <- phoneme[,-151]
classnames <- c("sh", "iy", "dcl", "aa", "ao")

## Local linear fit to the mean log-periodogram for each phoneme 
llfit <- mapply(locpoly, y = by(phoneme, classes, colMeans), 
          MoreArgs = list(x = freq, bandwidth = 2, degree = 1, 
          gridsize = length(freq)))
llfit.y <- matrix(unlist(llfit["y",]), 150, 5)  
matplot(freq, llfit.y, type = "l", lty = 1, xlab = "Frequency (scaled)", 
	ylab = "Log-intensity", 
	main = "Local linear estimation\nof the population mean log-periodogram")
legend("topright", legend = classnames, col = 1:5, lty = 1)

## End(Not run)

Plasma citrate data

Description

Plasma citrate concentrations measured on 10 human subjects on the same day. The measurements on an individual were taken each hour from 8:00AM to 9:00PM. A possible statistical analysis is to estimate the population mean plasma citrate concentration as a function of time of day.

Usage

data(plasma)data(plasma)

Format

A matrix with 10 rows (corresponding to subjects) and 14 columns (corresponding to hours).

Source

Andersen, A. H., Jensen, E. B. and Schou, G. (1981). Two-way analysis of variance with correlated errors. International Statistical Review 49, 153–157.

Hart, T. D. and Wehrly, T. E. (1986). Kernel regression estimation using repeated measurements data. Journal of the American Statistical Association 81, 1080–1088.

Examples

## Not run: 
data(plasma)
matplot(x = 8:21, y = t(plasma), cex = .75, type = "l", col = 1, lty = 1,
  lwd = .5, xlab = "Time of day (hour)", ylab = "Concentration", 
  main = "Plasma citrate data")
lines(8:21, colMeans(plasma), col = 2, lwd = 1.5)
legend("topright", col = 2, lty = 1, lwd = 1.5, legend = "Mean")
## End(Not run)
## Not run: 
data(plasma)
matplot(x = 8:21, y = t(plasma), cex = .75, type = "l", col = 1, lty = 1,
  lwd = .5, xlab = "Time of day (hour)", ylab = "Concentration", 
  main = "Plasma citrate data")
lines(8:21, colMeans(plasma), col = 2, lwd = 1.5)
legend("topright", col = 2, lty = 1, lwd = 1.5, legend = "Mean")
## End(Not run)

Plot a SCBand Object

Description

plot method for class "SCBand".

Usage

## S3 method for class 'SCBand'
plot(x , y = NULL, xlim = NULL, ylim = NULL, main = NULL, xlab = NULL, 
     ylab = NULL, col = NULL, cex = NULL, pch = NULL, lty = NULL, lwd = NULL, 
     legend = TRUE, where = NULL, text = NULL, legend.cex = NULL, horiz = TRUE, 
     bty = "n", ...)
## S3 method for class 'SCBand'
plot(x , y = NULL, xlim = NULL, ylim = NULL, main = NULL, xlab = NULL, 
     ylab = NULL, col = NULL, cex = NULL, pch = NULL, lty = NULL, lwd = NULL, 
     legend = TRUE, where = NULL, text = NULL, legend.cex = NULL, horiz = TRUE, 
     bty = "n", ...)

Arguments

`x`	`SCBand` object, typically result of a call to `scb.mean`, `scb.model`, or `scb.equal`.
`y`	optional y data.
`xlim`	x limits of the plot (numeric vector of length 2).
`ylim`	y limits of the plot (numeric vector of length 2).
`main`	title of the plot.
`xlab`	label of the x axis.
`ylab`	label of the y axis.
`col`	colors for lines and points.
`cex`	scale of plotting characters and symbols relative to default (numerical vector).
`pch`	vector of plotting characters or symbols: see `points`.
`lty`	a vector of line types, see `par`.
`lwd`	a vector of line widths, see `par`.
`legend`	logical; if `TRUE`, a legend is added to the plot.
`where`	legend location: `"bottomright", "bottom"`, `"bottomleft", "left", "topleft", "top", "topright", "right"` or `"center"`.
`text`	text of the legend (character vector).
`legend.cex`	character expansion factor relative to current par("cex") for the legend.
`horiz`	logical; if TRUE, the legend is displayed horizontally rather than vertically.
`bty`	type of box to be drawn around the legend. The allowed values are `"n"` (the default) and `"o"`.
`...`	additional arguments passed to the function `matplot` for displaying the y data.

Details

The argument y can be used to plot subsets of the y data. If non null, this argument has priority over the component x$y for plotting.

The graphical parameters col, cex, pch, lty, and lwd apply to the following components to be plotted: data, parametric estimate, nonparametric estimate(s), normal simultaneous confidence bands (SCB), and bootstrap SCB. More precisely, cex and pch must be specified as vectors of length equal to the number of $y$ data sets to be plotted (0, 1, or 2); lty and lwd must specified as numeric vectors of length equal to the total number of estimates and SCB components; col applies to all components and should be specified accordingly. If necessary, graphical parameters are recycled to match the required length.

By default a legend is plotted horizontally at the bottom of the graph.

Examples

## Not run: 
## Plasma citrate data 
time <- 8:21
data(plasma)
h <- cv.select(time, plasma, degree = 1, interval = c(.5, 1))
scbplasma <- scb.mean(time, plasma, bandwidth = h, scbtype = "both",
                      gridsize = 100)
plot(scbplasma, cex = .2, legend.cex = .85, xlab = "Time of day", 
     ylab = "Concentration", main = "Plasma citrate data")

## End(Not run)
## Not run: 
## Plasma citrate data 
time <- 8:21
data(plasma)
h <- cv.select(time, plasma, degree = 1, interval = c(.5, 1))
scbplasma <- scb.mean(time, plasma, bandwidth = h, scbtype = "both",
                      gridsize = 100)
plot(scbplasma, cex = .2, legend.cex = .85, xlab = "Time of day", 
     ylab = "Concentration", main = "Plasma citrate data")

## End(Not run)

Pseudo-Likelihood Ratio Test for Models of the Mean Function

Description

Implement the Pseudo-Likelihood Ratio Test (PLRT) of Azzalini and Bowman (1993) with functional data. The test is used to assess whether a mean function belongs to a given finite-dimensional function space.

Usage

 plrt.model(x, y, model, verbose = FALSE) plrt.model(x, y, model, verbose = FALSE)

Arguments

`x`	a numeric vector of x data. `x` must be a uniform grid; missing values are not accepted.
`y`	a matrix or data frame with functional observations (= curves) stored in rows. The number of columns of `y` must match the length of `x`. Missing values are not accepted.
`model`	an integer specifying the degree of a polynomial basis, or a data frame/matrix containing the basis functions stored in columns. In the latter case, the basis functions must be evaluated at `x`.
`verbose`	logical; if `TRUE`, information on the candidate model, kernel bandwidth, test statistic, and p value is printed; otherwise no print.

Value

the p value of the PLRT.

References

Azzalini, A. and Bowman, A. (1993). On the use of nonparametric regression for checking linear relationships. Journal of the Royal Statistical Society. Series B 55, 549-557.

Examples

## Example: Gaussian process with mean = linear function + bump 
## and Onstein-Uhlenbeck covariance. The bump is high in the y   
## direction and narrow in the x direction. The SCB and PLRT 
## tests are compared.

# The departure from linearity in the mean function is strong  
# in the supremum norm (SCB test) but mild in the euclidean norm 
# (PLRT). With either n = 20 or n = 100 curves, the SCB test 
# strongly rejects the incorrect linear model for the mean 
# function while the PLRT retains it. 

p    <- 100    # number of observation points
x 	  <- seq(0, 1, len = p)
mu <- -1 + 1.5 * x + 0.2 * dnorm(x, .6, .02) 
plot(x, mu, type = "l")
R 	  <- (.25)^2 * exp(20 * log(.9) * abs(outer(x,x,"-"))) # covariance 
eigR <- eigen(R, symmetric = TRUE)  	
simR <- eigR$vectors %*% diag(sqrt(eigR$values)) 	 

n  <- 20
set.seed(100)
y  <- mu + simR %*% matrix(rnorm(n*p), p, n) 		
y  <- t(y) 	
points(x, colMeans(y))									
h  <- cv.select(x, y, 1)	
scb.model(x, y, 1, bandwidth = h)		 # p value: <1e-16
plrt.model(x, y, 1, verbose = TRUE)	# p value: .442
n  <- 100
y  <- mu + simR %*% matrix(rnorm(n*p), p, n) 		
y  <- t(y) 	
h  <- cv.select(x, y, 1)	
scb.model(x, y, 1, bandwidth = h)		 # p value: <1e-16
plrt.model(x, y, 1, verbose = TRUE)	# p value: .456
## Example: Gaussian process with mean = linear function + bump 
## and Onstein-Uhlenbeck covariance. The bump is high in the y   
## direction and narrow in the x direction. The SCB and PLRT 
## tests are compared.

# The departure from linearity in the mean function is strong  
# in the supremum norm (SCB test) but mild in the euclidean norm 
# (PLRT). With either n = 20 or n = 100 curves, the SCB test 
# strongly rejects the incorrect linear model for the mean 
# function while the PLRT retains it. 

p    <- 100    # number of observation points
x 	  <- seq(0, 1, len = p)
mu <- -1 + 1.5 * x + 0.2 * dnorm(x, .6, .02) 
plot(x, mu, type = "l")
R 	  <- (.25)^2 * exp(20 * log(.9) * abs(outer(x,x,"-"))) # covariance 
eigR <- eigen(R, symmetric = TRUE)  	
simR <- eigR$vectors %*% diag(sqrt(eigR$values)) 	 

n  <- 20
set.seed(100)
y  <- mu + simR %*% matrix(rnorm(n*p), p, n) 		
y  <- t(y) 	
points(x, colMeans(y))									
h  <- cv.select(x, y, 1)	
scb.model(x, y, 1, bandwidth = h)		 # p value: <1e-16
plrt.model(x, y, 1, verbose = TRUE)	# p value: .442
n  <- 100
y  <- mu + simR %*% matrix(rnorm(n*p), p, n) 		
y  <- t(y) 	
h  <- cv.select(x, y, 1)	
scb.model(x, y, 1, bandwidth = h)		 # p value: <1e-16
plrt.model(x, y, 1, verbose = TRUE)	# p value: .456

Plug-in Bandwidth Selection for Local Polynomial Estimation

Description

Select the plug-in bandwidth described in Benhenni and Degras (2011) for the local polynomial estimation of a mean function and its first derivative based on functional data.

Usage

plugin.select(x, y, drv = 0L, degree = drv+1L, gridsize = length(x), ...) 
plugin.select(x, y, drv = 0L, degree = drv+1L, gridsize = length(x), ...)

Arguments

`x`	numeric vector of x data. This observation grid must be uniform and missing values are not accepted.
`y`	matrix or data frame with functional observations (= curves) stored in rows. The number of columns of `y` must match the length of `x`. Missing values are not accepted.
`drv`	order of the derivative to estimate. Must be 0 or 1.
`degree`	degree of local polynomial used. Must equal `drv` or `drv+1`.
`gridsize`	the size of the grid over which the mean function is to be estimated. Defaults to `length(x)`.
`...`	additional arguments to pass to the optimizer of the CV score.

Details

The plug-in method should not be used with small data sets, since it is based on asymptotic considerations and requires reasonably accurate estimates of derivatives of the mean and covariance functions. Both the number of observed curves and observation points should be moderate to large. The plug-in bandwidth is designed to minimize the asymptotic mean integrated squared estimation error

$AMISE(h) = \int (\mu(t) - \hat{\mu}(t; h))^2 dt,$

where $\mu(x)$ is the mean function and $\hat{\mu}(t;h)$ is a local polynomial estimator with kernel bandwidth $h$ . The expression of the plug-in bandwidth can be found in Benhenni and Degras (2011).

Value

the plug-in bandwidth.

References

Benhenni, K. and Degras, D. (2011). Local polynomial estimation of the average growth curve with functional data. http://arxiv.org/abs/1107.4058

Examples

## Not run: 
## Phoneme data
data(phoneme)
classes <- phoneme[,151]
phoneme <- phoneme[,-151]
freq    <- 1:150
plugin.bandwidth  <- numeric(5) 
cv.bandwidth  <- numeric(5)  # compare with cross-validation
for (i in 1:5) {
  plugin.bandwidth[i] <- plugin.select(x = freq, y = phoneme[classes == i, ],
                          drv = 0, degree = 1) 
  cv.bandwidth[i]     <- cv.select(x = freq, y = phoneme[classes == i, ], 
                          degree = 1)
}

round(cbind(plugin.bandwidth, cv.bandwidth), 4)


## End(Not run)
## Not run: 
## Phoneme data
data(phoneme)
classes <- phoneme[,151]
phoneme <- phoneme[,-151]
freq    <- 1:150
plugin.bandwidth  <- numeric(5) 
cv.bandwidth  <- numeric(5)  # compare with cross-validation
for (i in 1:5) {
  plugin.bandwidth[i] <- plugin.select(x = freq, y = phoneme[classes == i, ],
                          drv = 0, degree = 1) 
  cv.bandwidth[i]     <- cv.select(x = freq, y = phoneme[classes == i, ], 
                          degree = 1)
}

round(cbind(plugin.bandwidth, cv.bandwidth), 4)


## End(Not run)

Print a SCBand Object

Description

print method for class "SCBand".

Usage

## S3 method for class 'SCBand'
print(x,...)## S3 method for class 'SCBand'
print(x,...)

Arguments

`x`	an object of class `"SCBand"`.
`...`	for compatibility with the generic `print` method; argument not currently used.

Details

The function print.SCBand concisely displays the information of an object of class "SCBand". More precisely it shows the data range, bandwidth used in local polynomial estimation, and key information on SCB and statistical tests.

Examples

## Not run: 
# Plasma citrate data
data(plasma)
time <- 8:21
h <- cv.select(time, plasma, 1)
scbplasma <- scb.mean(time, plasma, bandwidth = h, scbtype = "both", gridsize = 100)
scbplasma

## End(Not run)
## Not run: 
# Plasma citrate data
data(plasma)
time <- 8:21
h <- cv.select(time, plasma, 1)
scbplasma <- scb.mean(time, plasma, bandwidth = h, scbtype = "both", gridsize = 100)
scbplasma

## End(Not run)

Compare Two Mean Functions

Description

This two-sample test builds simultaneous confidence bands (SCB) for the difference between two population mean functions and retains the equality assumption if the null function is contained in the bands. Equivalently, SCB are built around one of the local linear estimates (the one for say, population 1), and the equality hypothesis is accepted if the other estimate (the one for population 2) lies within the bands.

Usage

scb.equal(x, y, bandwidth, level = .05, degree = 1, 
	scbtype = c("normal","bootstrap","both","no"), gridsize = NULL, 
	keep.y = TRUE, nrep = 2e4, nboot = 1e4, parallel = c("no","multicore","snow"), 
	ncpus = getOption("boot.ncpus",1L), cl = NULL)
scb.equal(x, y, bandwidth, level = .05, degree = 1, 
	scbtype = c("normal","bootstrap","both","no"), gridsize = NULL, 
	keep.y = TRUE, nrep = 2e4, nboot = 1e4, parallel = c("no","multicore","snow"), 
	ncpus = getOption("boot.ncpus",1L), cl = NULL)

Arguments

`x`	a list of length 2 or matrix with 2 columns containing the x values of each sample. If the two samples are observed on the same grid, `x` can be specified as a single vector. The range of `x` must be the same for each sample; missing values are not accepted.
`y`	a list of length 2 containing matrices or data frames with functional observations (= curves) stored in rows. The number of columns of each component of `y` must match the length of the corresponding `x`. Missing values are not accepted.
`bandwidth`	the kernel bandwidths (numeric vector of length 1 or 2).
`level`	the significance level of the test (default = .05).
`degree`	the degree of the local polynomial fit.
`scbtype`	the type of simultaneous confidence bands to build: "normal", "bootstrap", "both", or "no".
`gridsize`	the size of the grid over which the mean function is to be estimated. Defaults to the length of the smallest `x` grid.
`keep.y`	logical; if `TRUE`, keep `y` in the result.
`nrep`	the number of replicates for the normal SCB method (default = 20,000).
`nboot`	the number of replicates for the bootstrap SCB method (default = 5,000).
`parallel`	the computation method for the bootstrap SCB. By default, computations are sequential (`"no"`). The function `boot` is used and can be run in parallel using the package `parallel`. Both options `"multicore"` and `"snow"` are available for parallel computing.
`ncpus`	the number of cores to use for parallel computing if `parallel = "multicore"`.
`cl`	the name of the cluster to use for parallel computing if `parallel = "snow"`.

Value

A list object of class "SCBand". Depending on the function used to create the object (scb.mean, scb.model, or scb.equal), some of its components are set to NULL. For scb.mean, the object has components:

`x`	the argument `x`.
`y`	if `keep.y` is `TRUE`, the argument `y`, else `NULL`.
`call`	the function call.
`model`	`NULL`.
`par`	`NULL`.
`nonpar`	a list of two local linear estimates, one for each population.
`bandwidth`	the argument `bandwidth`.
`degree`	the degree of local polynomial used. Currently, only local linear estimation is supported.
`level`	the argument `level`.
`scbtype`	the argument `scbtype`.
`teststat`	the test statistic.
`pnorm`	the p value for the normal-based statistical test.
`pboot`	the p value for the boostrap-based statistical test.
`qnorm`	the quantile used to build the normal SCB.
`qboot`	the quantile used to build the bootstrap SCB.
`normscb`	a matrix containing the normal SCB stored in columns.
`bootscb`	a matrix containing the bootstrap SCB stored in columns.
`gridsize`	the argument `gridsize`, or length(x) if no argument was specified.
`nrep`	the argument `nrep`.
`nboot`	the argument `nboot`.

Depending on the value of scbtype, some or all of the fields pnorm, qnorm, normscb, nrep, pboot, qboot, normboot and nboot may be NULL.

References

Degras, D. (2011). Simultaneous confidence bands for nonparametric regression with functional data. Statistica Sinica, 21, 1735–1765.

Examples

## Not run: 
# Phoneme data: compare the mean log-periodograms 
# for phonemes "aa" as the vowel in "dark" and "ao" 
# as the first vowel in "water"
data(phoneme)
n <- nrow(phoneme)
N <- ncol(phoneme) 
classes <- split(1:n,phoneme[,N])
names(classes) <- c("sh", "iy", "dcl", "aa", "ao")
freq    <- 1:150
compare.aa.ao <- scb.equal(freq, list(phoneme[classes$aa,-N], 
  phoneme[classes$ao,-N]), bandwidth = c(.75, .75), scbtype = "both", nboot = 2e3)
summary(compare.aa.ao)

## End(Not run)
## Not run: 
# Phoneme data: compare the mean log-periodograms 
# for phonemes "aa" as the vowel in "dark" and "ao" 
# as the first vowel in "water"
data(phoneme)
n <- nrow(phoneme)
N <- ncol(phoneme) 
classes <- split(1:n,phoneme[,N])
names(classes) <- c("sh", "iy", "dcl", "aa", "ao")
freq    <- 1:150
compare.aa.ao <- scb.equal(freq, list(phoneme[classes$aa,-N], 
  phoneme[classes$ao,-N]), bandwidth = c(.75, .75), scbtype = "both", nboot = 2e3)
summary(compare.aa.ao)

## End(Not run)

Build Simultaneous Confidence Bands for Mean Functions

Description

Fit a local linear estimator and build simultaneous confidence bands (SCB) for the mean of functional data.

Usage

scb.mean(x, y, bandwidth, level = .95, degree = 1, 
	scbtype = c("normal","bootstrap","both","no"), gridsize = length(x), 
	keep.y = TRUE, nrep = 2e4, nboot = 5e3, parallel = c("no", "multicore", "snow"), 
	ncpus = getOption("boot.ncpus",1L), cl = NULL)
scb.mean(x, y, bandwidth, level = .95, degree = 1, 
	scbtype = c("normal","bootstrap","both","no"), gridsize = length(x), 
	keep.y = TRUE, nrep = 2e4, nboot = 5e3, parallel = c("no", "multicore", "snow"), 
	ncpus = getOption("boot.ncpus",1L), cl = NULL)

Arguments

`x`	a numeric vector of x data. Missing values are not accepted.
`y`	a matrix or data frame with functional observations (= curves) stored in rows. The number of columns of `y` must match the length of `x`. Missing values are not accepted.
`bandwidth`	the kernel bandwidth smoothing parameter.
`level`	the level of the simultaneous confidence bands.
`degree`	the degree of the local polynomial fit.
`scbtype`	the type of simultaneous confidence bands to build: "normal", "bootstrap", "both", or "no".
`gridsize`	the size of the grid used to evaluate the mean function estimates and SCB. Defaults to length(x).
`keep.y`	logical; if `TRUE`, keep `y` in the result.
`nrep`	number of replicates for the Gaussian SCB method (20,000 by default).
`nboot`	number of replicates for the bootstrap SCB method (5,000 by default).
`parallel`	the computation method for the SCB. By default, computations are sequential (`"no"`). The bootstrap method uses function `boot` and can be run in parallel using the package `parallel`. In this case both options `"multicore"` and `"snow"` are available.
`ncpus`	number of cores to use for parallel computing when `parallel = "multicore"`.
`cl`	name of the cluster to use for parallel computing when `parallel = "snow"`.

Details

The local polynomial fitting uses a standard normal kernel and is implemented via the locpoly function. Bootstrap SCB are implemented with the boot function and typically require more computation time than normal SCB.

Value

An object of class "SCBand". To accommodate the different functions creating objects of this class (scb.mean, scb.model, and scb.equal), some components of the object are set to NULL. The component list is:

`x`	the x data.
`y`	the y data if `keep.y` is `TRUE`, else `NULL`.
`call`	the function call.
`model`	`NULL`.
`par`	`NULL`.
`nonpar`	a nonparametric estimate.
`bandwidth`	the argument `bandwidth`.
`degree`	the degree of local polynomial used. Currently, only local linear estimation is supported.
`level`	the argument `level`.
`scbtype`	the argument `type`.
`teststat`	`NULL`.
`pnorm`	`NULL`.
`pboot`	`NULL`.
`qnorm`	the quantile used to build the normal SCB.
`qboot`	the quantile used to build the bootstrap SCB.
`normscb`	a matrix containing the normal SCB stored in columns.
`bootscb`	a matrix containing the bootstrap SCB stored in columns.
`gridsize`	the argument `gridsize` if nonnull, else length(x).
`nrep`	the argument `nrep`.
`nboot`	the argument `nboot`.

Depending on the value of scbtype, some of the fields qnorm, normscb, nrep, qboot, normboot and nboot may be NULL.

References

Degras, D. (2011). Simultaneous confidence bands for nonparametric regression with functional data. Statistica Sinica, 21, 1735–1765.

Examples

## Not run: 
## Plasma citrate data
data(plasma)
time <- 8:21
h <- cv.select(time, plasma, 1, c(.5, 1))
scbplasma <- scb.mean(time, plasma, bandwidth = h, scbtype = "both", gridsize = 100)
scbplasma
plot(scbplasma, cex = .2, legend.cex = .85, xlab = "Time", ylab = "Concentration", 
  main = "Plasma citrate data")

## End(Not run)
## Not run: 
## Plasma citrate data
data(plasma)
time <- 8:21
h <- cv.select(time, plasma, 1, c(.5, 1))
scbplasma <- scb.mean(time, plasma, bandwidth = h, scbtype = "both", gridsize = 100)
scbplasma
plot(scbplasma, cex = .2, legend.cex = .85, xlab = "Time", ylab = "Concentration", 
  main = "Plasma citrate data")

## End(Not run)

Goodness-Of-Fit of a Model for the Mean Function

Description

This is the goodness-of-fit test for parametric models of the mean function described in Degras (2011). The candidate model must be a finite-dimensional function space (curvilinear regression). The test is based on the sup-norm distance between a smoothed parametric estimate and a local linear estimate. Graphically, the candidate model is retained whenever one of the estimates lies within the SCB built around the other.

Usage

scb.model(x, y, model, bandwidth, level = .05, degree = 1, 
	scbtype = c("normal","bootstrap","both","no"), gridsize = length(x), 
	keep.y = TRUE, nrep = 2e4, nboot = 5e3, parallel = c("no", "multicore", "snow"), 
	ncpus = getOption("boot.ncpus",1L), cl = NULL)
scb.model(x, y, model, bandwidth, level = .05, degree = 1, 
	scbtype = c("normal","bootstrap","both","no"), gridsize = length(x), 
	keep.y = TRUE, nrep = 2e4, nboot = 5e3, parallel = c("no", "multicore", "snow"), 
	ncpus = getOption("boot.ncpus",1L), cl = NULL)

Arguments

`x`	a numeric vector of x data. `x` must be a uniform grid; missing values are not accepted.
`y`	a matrix or data frame with functional observations (= curves) stored in rows. The number of columns of `y` must match the length of `x`. Missing values are not accepted.
`model`	an integer specifying the degree of a polynomial basis, or a data frame/matrix containing the basis functions stored in columns. In the latter case, the basis functions must be evaluated on a uniform grid of size `gridsize` spanning the range of `x`.
`bandwidth`	the kernel bandwidth smoothing parameter.
`level`	the significance level of the test (default = .05).
`degree`	the degree of the local polynomial fit.
`scbtype`	the type of simultaneous confidence bands to build: "normal", "bootstrap", "both", or "no".
`gridsize`	the size of the grid over which the mean function is to be estimated. Defaults to `length(x)`.
`keep.y`	logical; if `TRUE`, keep `y` in the result.
`nrep`	the number of replicates for the normal SCB method (default = 20,000).
`nboot`	the number of replicates for the bootstrap SCB method (default = 5,000).
`parallel`	the computation method for the bootstrap SCB. By default, computations are sequential (`"no"`). The function `boot` is used and can be run in parallel using the package `parallel`. Both options `"multicore"` and `"snow"` are available for parallel computing.
`ncpus`	the number of cores to use for parallel computing when `parallel = "multicore"`.
`cl`	the name of the cluster to use for parallel computing when `parallel = "snow"`.

Value

`x`	the argument `x`.
`y`	the argument `y` if `keep.y` is `TRUE`, else `NULL`.
`call`	the function call.
`model`	the argument `model`.
`par`	a smoothed parametric estimate.
`nonpar`	a local linear estimate.
`bandwidth`	the argument `bandwidth`.
`degree`	the degree of the local polynomial. Currently, only local linear estimation is supported.
`level`	the argument `level`.
`scbtype`	the argument `type`.
`teststat`	the test statistic.
`pnorm`	the p value for the normal-based statistical test.
`pboot`	the p value for the boostrap-based statistical test.
`qnorm`	the quantile used to build the normal SCB.
`qboot`	the quantile used to build the bootstrap SCB.
`normscb`	a matrix containing the normal SCB stored in columns.
`bootscb`	a matrix containing the bootstrap SCB stored in columns.
`gridsize`	the argument `gridsize`, or length(x) if no argument was specified.
`nrep`	the argument `nrep`.
`nboot`	the argument `nboot`.

Depending on the value of scbtype, some or all of the fields pnorm, qnorm, normscb, nrep, pboot, qboot, normboot and nboot may be NULL.

References

Degras, D. (2011). Simultaneous confidence bands for nonparametric regression with functional data.
Statistica Sinica, 21, 1735–1765.

Examples

## Example from Degras (2011)
## Gaussian process with polynomial mean function 
## and Ornstein-Uhlenbeck covariance function
## The SCB and PLRT tests are compared

set.seed(100)
p    <- 100  	# number of observation points
x 	  <- seq(0, 1, len = p)
mu	  <- 10 * x^3 - 15 * x^4 + 6 * x^5	# mean 
R 	  <- (.25)^2 * exp(20 * log(.9) * abs(outer(x,x,"-"))) # covariance 
eigR <- eigen(R, symmetric = TRUE)  	
simR <- eigR$vectors %*% diag(sqrt(eigR$values)) 	 

# Candidate model for mu: polynomial of degree <= 3
# This model, although incorrect, closely approximates mu.
# With n = 50 curves, the SCB and PLRT incorrectly retain the model.
# With n = 70 curves, both tests reject it. 
n <- 50  
y <- mu + simR %*% matrix(rnorm(n*p), p, n) 	# simulate data  
y <- t(y) 	# arrange the trajectories in rows
h <- cv.select(x, y, 1)	
scb.model(x, y, 3, bandwidth = h)		  # p value: .652
plrt.model(x, y, 3, verbose = TRUE)	# p value: .450 
n <- 70  	
y <- mu + simR %*% matrix(rnorm(n*p), p, n) 		
y <- t(y) 										
h <- cv.select(x, y, 1)	
scb.model(x, y, 3, bandwidth = h)		  # p value: .004
plrt.model(x, y, 3, verbose = TRUE)	# p value: .001

# Correct model: polynomials of degree <= 5
scb.model(x, y, 5, bandwidth = h)  	# p value: .696
plrt.model(x, y, 5, verbose = TRUE)	# p value: .628

## Example from Degras (2011)
## Gaussian process with polynomial mean function 
## and Ornstein-Uhlenbeck covariance function
## The SCB and PLRT tests are compared

set.seed(100)
p    <- 100  	# number of observation points
x 	  <- seq(0, 1, len = p)
mu	  <- 10 * x^3 - 15 * x^4 + 6 * x^5	# mean 
R 	  <- (.25)^2 * exp(20 * log(.9) * abs(outer(x,x,"-"))) # covariance 
eigR <- eigen(R, symmetric = TRUE)  	
simR <- eigR$vectors %*% diag(sqrt(eigR$values)) 	 

# Candidate model for mu: polynomial of degree <= 3
# This model, although incorrect, closely approximates mu.
# With n = 50 curves, the SCB and PLRT incorrectly retain the model.
# With n = 70 curves, both tests reject it. 
n <- 50  
y <- mu + simR %*% matrix(rnorm(n*p), p, n) 	# simulate data  
y <- t(y) 	# arrange the trajectories in rows
h <- cv.select(x, y, 1)	
scb.model(x, y, 3, bandwidth = h)		  # p value: .652
plrt.model(x, y, 3, verbose = TRUE)	# p value: .450 
n <- 70  	
y <- mu + simR %*% matrix(rnorm(n*p), p, n) 		
y <- t(y) 										
h <- cv.select(x, y, 1)	
scb.model(x, y, 3, bandwidth = h)		  # p value: .004
plrt.model(x, y, 3, verbose = TRUE)	# p value: .001

# Correct model: polynomials of degree <= 5
scb.model(x, y, 5, bandwidth = h)  	# p value: .696
plrt.model(x, y, 5, verbose = TRUE)	# p value: .628

Summarize a SCBand Object

Description

summary method for class "SCBand"

Usage

## S3 method for class 'SCBand'
summary(object, ...)## S3 method for class 'SCBand'
summary(object, ...)

Arguments

`object`	an object of class `"SCBand"`
`...`	additional arguments; not currently used.

Details

The function summary.SCBand displays all fields of a SCBand object at the exception of x, y, par, nonpar, normscb, and bootscb which are potentially big. It provides information on the function call, data, local polynomial fit, SCB, and statistical tests.

Examples

## Not run: 
## Plasma citrate data 
data(plasma)
time <- 8:21
h <- cv.select(time, plasma, 1)
scbplasma <- scb.mean(time, plasma, bandwidth = h, scbtype = "both", gridsize = 100)
summary(scbplasma)

## End(Not run)
## Not run: 
## Plasma citrate data 
data(plasma)
time <- 8:21
h <- cv.select(time, plasma, 1)
scbplasma <- scb.mean(time, plasma, bandwidth = h, scbtype = "both", gridsize = 100)
summary(scbplasma)

## End(Not run)

Package 'SCBmeanfd'

Help Index

Simultaneous Confidence Bands for Mean Functions

Description

Details

Author(s)

References

Leave-One-Curve-out Cross-Validation Score

Description

Usage

Arguments

Details

Value

References

See Also

Examples

Cross-Validation Bandwidth Selection for Local Polynomial Estimation

Description

Usage

Arguments

Details

Value

References

See Also

Examples

Phoneme data

Description

Usage

Format

Source

Examples

Plasma citrate data

Description

Usage

Format

Source

Examples

Plot a SCBand Object

Description

Usage

Arguments

Details

Examples

Pseudo-Likelihood Ratio Test for Models of the Mean Function

Description

Usage

Arguments

Value

References

See Also

Examples

Plug-in Bandwidth Selection for Local Polynomial Estimation

Description

Usage

Arguments

Details

Value

References

See Also

Examples

Print a SCBand Object

Description

Usage

Arguments

Details

See Also

Examples

Compare Two Mean Functions

Description

Usage

Arguments

Value

References

See Also

Examples

Build Simultaneous Confidence Bands for Mean Functions

Description

Usage

Arguments

Details