Title: | Fast Covariance Estimation for Sparse Functional Data |
---|---|
Description: | We implement the Fast Covariance Estimation for Sparse Functional Data paper published in Statistics and Computing <doi: 10.1007/s11222-017-9744-8>. |
Authors: | Luo Xiao [aut], Cai Li [aut,cre], William Checkley [aut], Ciprian Crainiceanu [aut] |
Maintainer: | Cai Li <[email protected]> |
License: | GPL-3 |
Version: | 0.1-7 |
Built: | 2024-10-29 06:18:59 UTC |
Source: | CRAN |
Fast Covariance Estimation for Sparse Functional Data
Package: | face |
Type: | Package |
Version: | 0.1-7 |
Date: | 2022-07-18 |
License: GPL-3 |
Luo Xiao, Cai Li, William Checkley and Ciprian Crainiceanu
Maintainer: Cai Li <[email protected]>
Luo Xiao, Cai Li, William Checkley and Ciprian Crainiceanu, Fast covariance estimation for sparse functional data, Stat. Comput., doi:10.1007/s11222-017-9744-8.
face.sparse
object
Extraction of correlation and mean from a face.sparse
object
cor.face(object,argvals.new,option="raw")
cor.face(object,argvals.new,option="raw")
object |
A |
argvals.new |
Where to evalulate correlation and mean. |
option |
Defaults to "raw"; if "smooth", then extract correlation from smoothed covariance function. |
argvals.new |
Where to evaluate correlation and mean. |
option |
Defaults to "raw"; if "smooth", then extract correlation from smoothed covariance function. |
Cor |
estimated correlation matrix at |
mu |
estimated group/population mean at |
Luo Xiao <[email protected]>
Luo Xiao, Cai Li, William Checkley and Ciprian Crainiceanu, Fast covariance estimation for sparse functional data, Stat. Comput., doi:10.1007/s11222-017-9744-8.
# See the examples for "face.sparse".
# See the examples for "face.sparse".
The function is to estimate the mean and covariance function from a cluster of functions/longitudinal observations.
face.sparse(data, newdata = NULL, center = TRUE, argvals.new = NULL, knots = 7, p = 3, m = 2, lambda = NULL, lambda_mean = NULL, search.length = 14, lower = -3, upper = 10, lower2 = -3, upper2 = 5, calculate.scores = FALSE,pve=0.99,two_step=FALSE)
face.sparse(data, newdata = NULL, center = TRUE, argvals.new = NULL, knots = 7, p = 3, m = 2, lambda = NULL, lambda_mean = NULL, search.length = 14, lower = -3, upper = 10, lower2 = -3, upper2 = 5, calculate.scores = FALSE,pve=0.99,two_step=FALSE)
data |
a data frame with three arguments:
(1) |
newdata |
of the same strucutre as |
center |
logical. If TRUE, then Pspline smoothing of the population mean will be conducted and subtracted from the data before covariance smoothing; if FALSE, then the population mean will be just 0s. |
argvals.new |
a vector of observation time points to evaluate mean function, covariance function, error variance
and etc. If NULL,
then 100 equidistant points in the range of |
knots |
the number of knots for B-spline basis functions to be used; defaults to 7. The resulting number of basis functions is the number of interior knots plus the degree of B-splines. |
p |
the degrees of B-splines; defaults to 3. |
m |
the order of differencing penalty; defaults to 2. |
lambda |
the value of the smoothing parameter for covariance smoothing; defaults to NULL. |
lambda_mean |
the value of the smoothing parameter for mean smoothing; defaults to NULL. |
search.length |
the number of equidistant (log scale) smoothing parameters to search; defaults to 14. |
lower , upper
|
bounds for log smoothing parameter for first step of estimation; defaults are -3 and 10, respectively. |
lower2 , upper2
|
bounds for log smoothing parameter for second step of estimation; defaults are |
calculate.scores |
if TRUE, scores will be calculated. |
pve |
Defaults 0.99. To select the number of eigenvalues by percentage of variance. |
two_step |
if TRUE, a two-step estimation procedure will be applied. |
This is a generalized version of bivariate P-splines (Eilers and Marx, 2003) for covariance smoothing of sparse functional or longitudinal data. It uses tensor product B-spline basis functions and employes a differencing penalty on the assosciated parameter matrix. The only smoothing parameter in the method is selected by leave-one-subject-out cross validation and is implemented with a fast algorithm.
There are two steps for estimation. During the first step, the objective function to minimize is the penalized least squares on empirical estimates of covariance function. During the second step, the covariance between the empirical estimates (depending on the estimates of covariance function) are accounted and thus a generalized penalized least squares are minimized.
If center
is TRUE, then a population mean will be calculated and is smoothed by
univariate P-spline smoothing:pspline
(Eilers and Marx, 1996). This univariate
smoothing uses leave-one-subject-out cross validation to select the smoothing parameter.
The knots are "equally-spaced", the differencing penalty in Eilers and Marx (2003) is used.
If the functional data are observed at the same grid for each function/curve
and can be organized into
a data matrix, then fpca.face
in the package refund
should instead be used. fpca.face
allows a small percentage (less than
30 percent) of
missing data in the data matrix.
newdata |
Input |
y.pred , mu.pred , Chat.diag.pred , var.error.pred
|
Predicted/estimated objects at
|
Theta |
Estimated parameter matrix |
argvals.new |
Vector of time points to evaluate population parameters |
mu.new , Chat.new , Cor.new , Cor.raw.new , Chat.raw.diag.new , var.error.new
|
Estimated objects at
|
eigenfunctions , eigenvalues
|
Estimated eigenfunctions (scaled eigenvector) and eigenvalues at |
mu.hat , var.error.hat
|
Estimated objects at |
calculate.scores , rand_eff
|
if |
... |
... |
Luo Xiao <[email protected]> and Cai Li <[email protected]>
Luo Xiao, Cai Li, William Checkley and Ciprian Crainiceanu, Fast covariance estimation for sparse functional data, Stat. Comput., doi:10.1007/s11222-017-9744-8.
Paul Eilers and Brian Marx, Multivariate calibration with temperature interaction using two-dimensional penalized signal regression, Chemometrics and Intelligent Laboratory Systems 66 (2003), 159-174.
Paul Eilers and Brian Marx, Flexible smoothing with B-splines and penalties, Statist. Sci., 11, 89-121, 1996.
Simon N. Wood, P-splines with derivative based penalties and tensor product smoothing of unevenly distributed data, Stat. Comput., doi:10.1007/s11222-016-9666-x.
fpca.face
and fpca.sc
in refund
## Not run: ########################## #### CD4 data example ########################## require(refund) data(cd4) n <- nrow(cd4) Tt <- ncol(cd4) id <- rep(1:n,each=Tt) t <- rep(-18:42,times=n) y <- as.vector(t(cd4)) sel <- which(is.na(y)) ## organize data and apply FACEs data <- data.frame(y=log(y[-sel]), argvals = t[-sel], subj = id[-sel]) data <- data[data$y>4.5,] fit_face <- face.sparse(data,argvals.new=(-20:40)) ## set calculate.scores to TRUE if want to get scores fit_face <- face.sparse(data,argvals.new=(-20:40),calculate.scores=TRUE) scores <- fit_face$rand_eff$scores data.h <- data tnew <- fit_face$argvals.new ## scatter plots Xlab <- "Months since seroconversion" Ylab <- "log (CD4 count)" par(mfrow=c(1,1),mar = c(4.5,4.5,3,2)) id <- data.h$subj uid <- unique(id) plot(data.h$argvals,data.h$y, type = "n", ylim = c(4.5,8), xlab = Xlab, ylab = Ylab, cex.lab = 1.25,cex.axis=1.25,cex.main = 1.25) for(i in 1:10){ seq <- which(id==uid[i]) lines(data.h$argvals[seq],data.h$y[seq],lty=1,col="gray",lwd=1,type="l") #points(data.h$argvals[seq],data.h$y[seq],col=1,lty=1,pch=1) } Sample <- seq(10,50,by=10) for(i in Sample){ seq <- which(id==uid[i]) lines(data.h$argvals[seq],data.h$y[seq],lty=1,col="black",lwd=1,type="l") } lines(tnew,fit_face$mu.new,lwd=2,lty=2,col="red") ## plots of variance/correlation functions Cov <- fit_face$Chat.new Cov_diag <- diag(Cov) Cor <- fit_face$Cor.new par(mfrow=c(1,2),mar=c(4.5,4.1,3,4.5)) plot(tnew,Cov_diag,type="l", xlab = Xlab, ylab="",main= "CD4: variance function", #ylim = c(0.8,1.5), cex.axis=1.25,cex.lab=1.25,cex.main=1.25,lwd=2) require(fields) image.plot(tnew,tnew,Cor, xlab=Xlab, ylab = Xlab, main = "CD4: correlation function", cex.axis=1.25,cex.lab=1.25,cex.main=1.25, axis.args = list(at = c(0,0.2,0.4,0.6,0.8,1.0)), legend.shrink=0.75,legend.line=-1.5) ## prediction of several subjects par(mfrow=c(2,2),mar=c(4.5,4.5,3,2)) Sample <- c(30,40,50,60) for(i in 1:4){ sel <- which(id==uid[Sample[i]]) dati <- data.h[sel,] seq <- -20:40 k <- length(seq) dati_pred <- data.frame(y = rep(NA,nrow(dati) + k ), argvals = c(rep(NA,nrow(dati)),seq), subj=rep(dati$subj[1],nrow(dati) + k ) ) dati_pred[1:nrow(dati),] <- dati yhat2 <- predict(fit_face,dati_pred) data3 <- dati Ylim <- range(c(data3$y,yhat2$y.pred)) plot(data3$argvals,data3$y,xlab=Xlab,ylab=Ylab, main = paste("Male ",i,sep=""), ylim = c(4,8.5), cex.lab=1.25,cex.axis = 1.25,cex.main = 1.25,pch=1,xlim=c(-20,40)) Ord <- nrow(dati) + 1:k lines(dati_pred$argvals[Ord],yhat2$y.pred[Ord],col="red",lwd=2) lines(dati_pred$argvals[Ord], yhat2$y.pred[Ord] - 1.96*yhat2$se.pred[Ord], col="red",lwd=1,lty=2) lines(dati_pred$argvals[Ord], yhat2$y.pred[Ord] + 1.96*yhat2$se.pred[Ord], col="red",lwd=1,lty=2) lines(tnew,fit_face$mu.new,lty=3,col="black",lwd=2) legend("bottomleft",c("mean","prediction"),lty=c(3,1),col=1:2,lwd=2,bty="n") } ## End(Not run)
## Not run: ########################## #### CD4 data example ########################## require(refund) data(cd4) n <- nrow(cd4) Tt <- ncol(cd4) id <- rep(1:n,each=Tt) t <- rep(-18:42,times=n) y <- as.vector(t(cd4)) sel <- which(is.na(y)) ## organize data and apply FACEs data <- data.frame(y=log(y[-sel]), argvals = t[-sel], subj = id[-sel]) data <- data[data$y>4.5,] fit_face <- face.sparse(data,argvals.new=(-20:40)) ## set calculate.scores to TRUE if want to get scores fit_face <- face.sparse(data,argvals.new=(-20:40),calculate.scores=TRUE) scores <- fit_face$rand_eff$scores data.h <- data tnew <- fit_face$argvals.new ## scatter plots Xlab <- "Months since seroconversion" Ylab <- "log (CD4 count)" par(mfrow=c(1,1),mar = c(4.5,4.5,3,2)) id <- data.h$subj uid <- unique(id) plot(data.h$argvals,data.h$y, type = "n", ylim = c(4.5,8), xlab = Xlab, ylab = Ylab, cex.lab = 1.25,cex.axis=1.25,cex.main = 1.25) for(i in 1:10){ seq <- which(id==uid[i]) lines(data.h$argvals[seq],data.h$y[seq],lty=1,col="gray",lwd=1,type="l") #points(data.h$argvals[seq],data.h$y[seq],col=1,lty=1,pch=1) } Sample <- seq(10,50,by=10) for(i in Sample){ seq <- which(id==uid[i]) lines(data.h$argvals[seq],data.h$y[seq],lty=1,col="black",lwd=1,type="l") } lines(tnew,fit_face$mu.new,lwd=2,lty=2,col="red") ## plots of variance/correlation functions Cov <- fit_face$Chat.new Cov_diag <- diag(Cov) Cor <- fit_face$Cor.new par(mfrow=c(1,2),mar=c(4.5,4.1,3,4.5)) plot(tnew,Cov_diag,type="l", xlab = Xlab, ylab="",main= "CD4: variance function", #ylim = c(0.8,1.5), cex.axis=1.25,cex.lab=1.25,cex.main=1.25,lwd=2) require(fields) image.plot(tnew,tnew,Cor, xlab=Xlab, ylab = Xlab, main = "CD4: correlation function", cex.axis=1.25,cex.lab=1.25,cex.main=1.25, axis.args = list(at = c(0,0.2,0.4,0.6,0.8,1.0)), legend.shrink=0.75,legend.line=-1.5) ## prediction of several subjects par(mfrow=c(2,2),mar=c(4.5,4.5,3,2)) Sample <- c(30,40,50,60) for(i in 1:4){ sel <- which(id==uid[Sample[i]]) dati <- data.h[sel,] seq <- -20:40 k <- length(seq) dati_pred <- data.frame(y = rep(NA,nrow(dati) + k ), argvals = c(rep(NA,nrow(dati)),seq), subj=rep(dati$subj[1],nrow(dati) + k ) ) dati_pred[1:nrow(dati),] <- dati yhat2 <- predict(fit_face,dati_pred) data3 <- dati Ylim <- range(c(data3$y,yhat2$y.pred)) plot(data3$argvals,data3$y,xlab=Xlab,ylab=Ylab, main = paste("Male ",i,sep=""), ylim = c(4,8.5), cex.lab=1.25,cex.axis = 1.25,cex.main = 1.25,pch=1,xlim=c(-20,40)) Ord <- nrow(dati) + 1:k lines(dati_pred$argvals[Ord],yhat2$y.pred[Ord],col="red",lwd=2) lines(dati_pred$argvals[Ord], yhat2$y.pred[Ord] - 1.96*yhat2$se.pred[Ord], col="red",lwd=1,lty=2) lines(dati_pred$argvals[Ord], yhat2$y.pred[Ord] + 1.96*yhat2$se.pred[Ord], col="red",lwd=1,lty=2) lines(tnew,fit_face$mu.new,lty=3,col="black",lwd=2) legend("bottomleft",c("mean","prediction"),lty=c(3,1),col=1:2,lwd=2,bty="n") } ## End(Not run)
Predict subject-specific curves based on a fit from "face.sparse".
## S3 method for class 'face.sparse' predict(object, newdata,...)
## S3 method for class 'face.sparse' predict(object, newdata,...)
object |
a fitted object from the R function "face.sparse". |
newdata |
a data frame with three arguments:
(1) |
... |
further arguments passed to or from other methods. |
This function makes prediction based on observed data for each subject. So for each subject,
it requires at least one observed data. For the time points prediction is desired but
no observation is available, just make the corresponding data$y
as NA.
object |
A "face.sparse" fit |
newdata |
Input |
y.pred , mu.pred , Chat.pred , Chat.diag.pred , var.error.pred
|
Predicted/estimated objects at
the observation time points in |
rand_eff |
if |
... |
... |
Luo Xiao <[email protected]>
Luo Xiao, Cai Li, William Checkley and Ciprian Crainiceanu, Fast covariance estimation for sparse functional data, Stat. Comput., doi:10.1007/s11222-017-9744-8.
#See the examples for "face.sparse".
#See the examples for "face.sparse".
Predict mean values based on a fit from "pspline".
## S3 method for class 'pspline.face' predict(object, argvals.new,...)
## S3 method for class 'pspline.face' predict(object, argvals.new,...)
object |
a fitted object from the R function "pspline". |
argvals.new |
a vector of new time points. |
... |
further arguments passed to or from other methods. |
Predicted means at argvals.new
.
Luo Xiao <[email protected]>
Luo Xiao, Cai Li, William Checkley and Ciprian Crainiceanu, Fast covariance estimation for sparse functional data, Stat. Comput., doi:10.1007/s11222-017-9744-8.
#See the examples for "pspline".
#See the examples for "pspline".
Univariate P-spline smoothing with the smoothing parameter selected by leave-one-subject-out cross validation.
pspline(data, argvals.new = NULL, knots = 35, p = 3, m = 2, lambda = NULL, search.length = 100, lower = -20, upper = 20)
pspline(data, argvals.new = NULL, knots = 35, p = 3, m = 2, lambda = NULL, search.length = 100, lower = -20, upper = 20)
data |
a data frame with three arguments:
(1) |
argvals.new |
a vector of observations times for prediction; if NULL, then the same as |
knots |
a vector of interior knots or the number of knots for B-spline basis functions to be used; defaults to 35. |
p |
the degrees of B-splines; defaults to 3. |
m |
the order of differencing penalty; defaults to 2. |
lambda |
the value of the smoothing parameter; defaults to NULL. |
search.length |
the number of equidistant (log scale) smoothing parameters to search; defaults to 100. |
lower , upper
|
bounds for log smoothing parameter; defaults are -20 and 20. |
The function is an implementation of the P-spline smoothing in Eilers and Marx (1996). P-splines uses B-splines as basis functions and employs a differencing penalty on the coefficients. Leave-one-subject-out cross validation is used for selecting the smoothing parameter and a fast algorithm is implemented.
fitted.values |
Fitted mean values |
B |
B-spline design matrix |
theta |
Estimated coefficients |
s |
Eigenvalues |
knots |
Knots |
p |
The degrees of B-splines |
m |
The order of differencing penalty |
lambda |
The value of the smoothing parameter |
argvals.new |
A vector of observations times |
mu.new |
Fitted mean values at |
Luo Xiao <[email protected]>
Paul Eilers and Brian Marx, Flexible smoothing with B-splines and penalties, Statist. Sci., 11, 89-121, 1996.
Luo Xiao, Cai Li, William Checkley and Ciprian Crainiceanu, Fast covariance estimation for sparse functional data, Stat. Comput., doi:10.1007/s11222-017-9744-8.
## Not run: ## cd4 data require(refund) data(cd4) n <- nrow(cd4) T <- ncol(cd4) id <- rep(1:n,each=T) t <- rep(-18:42,times=n) y <- as.vector(t(cd4)) sel <- which(is.na(y)) ## organize data data <- data.frame(y=log(y[-sel]), argvals = t[-sel], subj = id[-sel]) data <- data[data$y>4.5,] ## smooth fit <- pspline(data) ## plot plot(data$argvals,fit$mu.new,type="p") ## prediction pred <- predict(fit,quantile(data$argvals,c(0.2,0.6))) pred ## End(Not run)
## Not run: ## cd4 data require(refund) data(cd4) n <- nrow(cd4) T <- ncol(cd4) id <- rep(1:n,each=T) t <- rep(-18:42,times=n) y <- as.vector(t(cd4)) sel <- which(is.na(y)) ## organize data data <- data.frame(y=log(y[-sel]), argvals = t[-sel], subj = id[-sel]) data <- data[data$y>4.5,] ## smooth fit <- pspline(data) ## plot plot(data$argvals,fit$mu.new,type="p") ## prediction pred <- predict(fit,quantile(data$argvals,c(0.2,0.6))) pred ## End(Not run)
Construct knots from either quantiles of observed time points or equally-spaced time points.
select.knots(t,knots=10,p=3,option="equally-spaced")
select.knots(t,knots=10,p=3,option="equally-spaced")
t |
Observed time points. |
knots |
Number of interior knots. |
p |
Degrees of B-splines to be used. |
option |
Default "equally-spaced": equally-spaced time points in the range of |
The number of knots in the output will be knot
plus 2 times p
; and
the B-spline basis matrix constructed from this vector of knots with degrees p
will be knots
plus p
.
A vector of knots
Luo Xiao <[email protected]>
t <- rnorm(100) knots <- select.knots(t)
t <- rnorm(100) knots <- select.knots(t)