Title: | High-Dimensional Principal Fitted Components and Abundant Regression |
---|---|
Description: | Fit and predict with the high-dimensional principal fitted components model. This model is described by Cook, Forzani, and Rothman (2012) <doi:10.1214/11-AOS962>. |
Authors: | Adam J. Rothman |
Maintainer: | Adam J. Rothman <[email protected]> |
License: | GPL-2 |
Version: | 1.2 |
Built: | 2024-11-05 06:15:01 UTC |
Source: | CRAN |
Fit and predict with the high-dimensional principal fitted components model.
The main functions are fit.pfc
, pred.response
.
Adam J. Rothman
Maintainer: Adam J. Rothman <[email protected]>
Cook, R. D., Forzani, L., and Rothman, A. J. (2012). Estimating sufficient reductions of the predictors in abundant high-dimensional regressions. Annals of Statistics 40(1), 353-384.
Let denote the
measurements of
the predictor and response, where
and
.
The model assumes that these measurements
are a realization
of
independent copies of
the random vector
, where
;
with rank
;
with rank
;
is a known
vector valued function;
;
; and
is independent of
.
The central subspace is
.
This function computes estimates of these model parameters
by imposing constraints for identifiability.
The mean parameters and
are estimated with
and
.
Let
,
which we require to be positive definite.
Given a user-specified weight matrix
,
let
subject to the constraints that is diagonal and
. The sufficient reduction estimate
is defined by
fit.pfc(X, y, r=4, d=NULL, F.user=NULL, weight.type=c("sample", "diag", "L1"), lam.vec=NULL, kfold=5, silent=TRUE, qrtol=1e-10, cov.tol=1e-4, cov.maxit=1e3, NPERM=1e3, level=0.01)
fit.pfc(X, y, r=4, d=NULL, F.user=NULL, weight.type=c("sample", "diag", "L1"), lam.vec=NULL, kfold=5, silent=TRUE, qrtol=1e-10, cov.tol=1e-4, cov.maxit=1e3, NPERM=1e3, level=0.01)
X |
The predictor matrix with |
y |
The vector of measured responses with |
r |
When polynomial basis functions are used (which is the case when |
d |
The dimension of the central subspace defined above. This must be specified by the user
when |
F.user |
A matrix with |
weight.type |
The type of weight matrix estimate
|
lam.vec |
A vector of candidate tuning parameter values to use when |
kfold |
The number of folds to use in cross-validation to select the optimal tuning parameter when |
silent |
Logical. When |
qrtol |
The tolerance for calls to |
cov.tol |
The convergence tolerance for the QUIC algorithm used when |
cov.maxit |
The maximum number of iterations allowed for the QUIC algorithm used when |
NPERM |
The number of permutations to used in the sequential permutation testing procedure to select |
level |
The significance level to use to terminate the sequential permutation testing procedure to select |
See Cook, Forzani, and Rothman (2012) more information.
A list with
Gamhat |
this is |
bhat |
this is |
Rmat |
this is |
What |
this is |
d |
this is |
r |
this is |
GWG |
this is |
fc |
a matrix with |
Xc |
a matrix with |
y |
the vector of |
mx |
this is |
mf |
this is |
best.lam |
this is selected tuning parameter value used when |
lam.vec |
this is the vector of candidate tuning parameter values used when
|
err.vec |
this is the vector of validation errors from cross validation, one error for each entry in |
test.info |
a dataframe that summarizes the results from the sequential testing procedure. Will be |
Adam J. Rothman
Cook, R. D., Forzani, L., and Rothman, A. J. (2012). Estimating sufficient reductions of the predictors in abundant high-dimensional regressions. Annals of Statistics 40(1), 353-384.
Friedman, J., Hastie, T., and Tibshirani R. (2008). Sparse inverse covariance estimation with the lasso. Biostatistics 9(3), 432-441.
set.seed(1) n=20 p=30 d=2 y=sqrt(12)*runif(n) Gam=matrix(rnorm(p*d), nrow=p, ncol=d) beta=diag(2) E=matrix(0.5*rnorm(n*p), nrow=n, ncol=p) V=matrix(c(1, sqrt(12), sqrt(12), 12.8), nrow=2, ncol=2) tmp=eigen(V, symmetric=TRUE) V.msqrt=tcrossprod(tmp$vec*rep(tmp$val^(-0.5), each=2), tmp$vec) Fyc=cbind(y-sqrt(3),y^2-4)%*%V.msqrt X=0+Fyc%*%t(beta)%*%t(Gam) + E fit=fit.pfc(X=X, y=y, r=3, weight.type="sample") ## display hypothesis testing information for selecting d fit$test.info ## make a response versus fitted values plot plot(pred.response(fit), y)
set.seed(1) n=20 p=30 d=2 y=sqrt(12)*runif(n) Gam=matrix(rnorm(p*d), nrow=p, ncol=d) beta=diag(2) E=matrix(0.5*rnorm(n*p), nrow=n, ncol=p) V=matrix(c(1, sqrt(12), sqrt(12), 12.8), nrow=2, ncol=2) tmp=eigen(V, symmetric=TRUE) V.msqrt=tcrossprod(tmp$vec*rep(tmp$val^(-0.5), each=2), tmp$vec) Fyc=cbind(y-sqrt(3),y^2-4)%*%V.msqrt X=0+Fyc%*%t(beta)%*%t(Gam) + E fit=fit.pfc(X=X, y=y, r=3, weight.type="sample") ## display hypothesis testing information for selecting d fit$test.info ## make a response versus fitted values plot plot(pred.response(fit), y)
Let denote the values of the
predictors.
This function computes
using equation (8.1)
of Cook, Forzani, and Rothman (2012).
pred.response(fit, newx=NULL)
pred.response(fit, newx=NULL)
fit |
The object returned by |
newx |
A matrix with |
See Cook, Forzani, and Rothman (2012) for more information.
A vector of response prediction with nrow(newx)
entries.
Adam J. Rothman
Cook, R. D., Forzani, L., and Rothman, A. J. (2012). Estimating sufficient reductions of the predictors in abundant high-dimensional regressions. Annals of Statistics 40(1), 353-384.
set.seed(1) n=25 p=50 d=1 true.G = matrix(rnorm(p*d), nrow=p, ncol=d) y=rnorm(n) fy = y E=matrix(rnorm(n*p), nrow=n, ncol=p) X=fy%*%t(true.G) + E fit=fit.pfc(X=X, r=4, d=d, y=y, weight.type="diag") fitted.values=pred.response(fit) mean((y-fitted.values)^2) plot(fitted.values, y) n.new=100 y.new=rnorm(n.new) fy.new=y.new E.new=matrix(rnorm(n.new*p), nrow=n.new, ncol=p) X.new = fy.new%*%t(true.G) + E.new mean((y.new - pred.response(fit, newx=X.new))^2)
set.seed(1) n=25 p=50 d=1 true.G = matrix(rnorm(p*d), nrow=p, ncol=d) y=rnorm(n) fy = y E=matrix(rnorm(n*p), nrow=n, ncol=p) X=fy%*%t(true.G) + E fit=fit.pfc(X=X, r=4, d=d, y=y, weight.type="diag") fitted.values=pred.response(fit) mean((y-fitted.values)^2) plot(fitted.values, y) n.new=100 y.new=rnorm(n.new) fy.new=y.new E.new=matrix(rnorm(n.new*p), nrow=n.new, ncol=p) X.new = fy.new%*%t(true.G) + E.new mean((y.new - pred.response(fit, newx=X.new))^2)