| Field | Value |
|---|---|
| Title | Positive Definite Sparse Covariance Estimators |
| Description | Compute and tune some positive definite and sparse covariance estimators. |
| Authors | Adam J. Rothman |
| Maintainer | Adam J. Rothman <[email protected]> |
| License | GPL-2 |
| Version | 1.2.1 |
| Built | 2024-10-31 21:26:11 UTC |
| Source | CRAN |
A package to compute and tune some positive definite and sparse covariance estimators. The main functions are `pdsoft`, `pdsoft.cv`, `band.chol`, and `band.chol.cv`.
Author: Adam J. Rothman

Maintainer: Adam J. Rothman <[email protected]>
Rothman, A. J., Levina, E., and Zhu, J. (2010). A new approach to Cholesky-based covariance regularization in high dimensions. Biometrika 97(3): 539-550.
Rothman, A. J. (2012). Positive definite estimators of large covariance matrices. Biometrika 99(3): 733-740.
Computes the k-banded covariance estimator by k-banding the covariance Cholesky factor, as described by Rothman, Levina, and Zhu (2010).

```r
band.chol(x, k, centered = FALSE, method = c("fast", "safe"))
```
| Argument | Description |
|---|---|
| `x` | A data matrix with n rows and p columns. |
| `k` | The banding parameter (the number of sub-diagonals to keep as non-zero). Should be a non-negative integer. |
| `centered` | Logical: set to TRUE if the columns of `x` have already been centered. |
| `method` | The method to use. The default is `"fast"`. |

`method="fast"` is much faster than `method="safe"`.
See Rothman, Levina, and Zhu (2010).
The banded covariance estimate (a p by p matrix).
Adam J. Rothman
Rothman, A. J., Levina, E., and Zhu, J. (2010). A new approach to Cholesky-based covariance regularization in high dimensions. Biometrika 97(3): 539-550.
```r
set.seed(1)
n = 50
p = 20
true.cov = diag(p)
true.cov[cbind(1:(p-1), 2:p)] = 0.4
true.cov[cbind(2:p, 1:(p-1))] = 0.4
eo = eigen(true.cov, symmetric = TRUE)
z = matrix(rnorm(n*p), nrow = n, ncol = p)
x = z %*% tcrossprod(eo$vec * rep(eo$val^(0.5), each = p), eo$vec)
sigma = band.chol(x = x, k = 1)
sigma
```
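For intuition, the banding step can also be sketched outside R. The snippet below is a simplified NumPy illustration that bands the Cholesky factor of the sample covariance directly; the package instead estimates the factor through banded regressions, so the numerical results will differ.

```python
import numpy as np

def band_cholesky_cov(x, k):
    """Simplified sketch of Cholesky-factor banding (assumes n > p so the
    sample covariance is positive definite)."""
    n, p = x.shape
    xc = x - x.mean(axis=0)               # center the columns
    s = xc.T @ xc / n                     # sample covariance (MLE scaling)
    L = np.linalg.cholesky(s)             # lower-triangular factor
    # Keep the diagonal and the first k sub-diagonals of L; zero the rest.
    offset = np.subtract.outer(np.arange(p), np.arange(p))  # offset[i, j] = i - j
    Lk = np.where((offset >= 0) & (offset <= k), L, 0.0)
    # Rebuilding from the banded factor gives a k-banded, positive
    # semidefinite estimate by construction.
    return Lk @ Lk.T

rng = np.random.default_rng(1)
x = rng.standard_normal((50, 20))
sigma = band_cholesky_cov(x, k=1)
```

Because the estimate is formed as `Lk @ Lk.T`, positive semidefiniteness comes for free, which is the structural advantage of banding the factor rather than the covariance matrix itself.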
Selects the banding parameter and computes the banded covariance estimator by banding the covariance Cholesky factor as described by Rothman, Levina, and Zhu (2010).
```r
band.chol.cv(x, k.vec = NULL, method = c("fast", "safe"), nsplits = 10,
             n.tr = NULL, quiet = TRUE)
```
| Argument | Description |
|---|---|
| `x` | A data matrix with n rows and p columns. |
| `k.vec` | An optional vector of candidate banding parameters (the possible number of sub-diagonals to keep as non-zero). If NULL, a default sequence of candidate values is used. |
| `method` | The method to use. The default is `"fast"`. |
| `nsplits` | Number of random splits to use for banding parameter selection. |
| `n.tr` | Optional number of cases to use in the training set. If NULL, a default training-set size is used. |
| `quiet` | Logical: if FALSE, progress information is printed. |
`method="fast"` is much faster than `method="safe"`.
See Rothman, Levina, and Zhu (2010).
A list with the following components:

| Component | Description |
|---|---|
| `sigma` | the covariance estimate at the selected banding parameter |
| `best.k` | the selected banding parameter |
| `cv.err` | the vector of validation errors, one for each entry in `k.vec` |
| `k.vec` | the vector of candidate banding parameters |
| `n.tr` | the number of cases used for the training set |
Adam J. Rothman
Rothman, A. J., Levina, E., and Zhu, J. (2010). A new approach to Cholesky-based covariance regularization in high dimensions. Biometrika 97(3): 539-550.
```r
set.seed(1)
n = 50
p = 20
true.cov = diag(p)
true.cov[cbind(1:(p-1), 2:p)] = 0.4
true.cov[cbind(2:p, 1:(p-1))] = 0.4
eo = eigen(true.cov, symmetric = TRUE)
z = matrix(rnorm(n*p), nrow = n, ncol = p)
x = z %*% tcrossprod(eo$vec * rep(eo$val^(0.5), each = p), eo$vec)
cv.out = band.chol.cv(x = x)
plot(cv.out$k.vec, cv.out$cv.err)
cv.out$best.k
cv.out$sigma
```
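The random-split selection that `band.chol.cv` performs can be sketched as follows. This NumPy illustration uses plain banding of the sample covariance as a stand-in estimator and a hypothetical `train_frac` argument for the split size; the package uses its Cholesky-based estimator and chooses `n.tr` internally.

```python
import numpy as np

def band_cov(x, k):
    """Stand-in estimator: zero sample-covariance entries more than k
    sub-diagonals from the main diagonal (simple banding, not the
    Cholesky-based estimator in the package)."""
    xc = x - x.mean(axis=0)
    s = xc.T @ xc / x.shape[0]
    p = s.shape[0]
    keep = np.abs(np.subtract.outer(np.arange(p), np.arange(p))) <= k
    return np.where(keep, s, 0.0)

def select_k(x, k_values, nsplits=10, train_frac=0.75, seed=0):
    """Average the Frobenius-norm validation error over random splits and
    return the minimizing k (train_frac is an assumption for this sketch)."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    n_tr = int(round(train_frac * n))
    err = np.zeros(len(k_values))
    for _ in range(nsplits):
        idx = rng.permutation(n)
        tr, va = x[idx[:n_tr]], x[idx[n_tr:]]
        va_c = va - va.mean(axis=0)
        s_va = va_c.T @ va_c / va.shape[0]   # validation sample covariance
        for j, k in enumerate(k_values):
            err[j] += np.linalg.norm(band_cov(tr, k) - s_va, "fro")
    err /= nsplits
    return int(k_values[int(np.argmin(err))]), err
```

The key idea is that the validation sample covariance serves as an unbiased (if noisy) target, so averaging the discrepancy over several random splits stabilizes the choice of bandwidth.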
Computes the sparse and positive definite covariance matrix estimator proposed by Rothman (2012).
```r
pdsoft(s, lam, tau = 1e-04, init = c("soft", "diag", "dense", "user"),
       s0 = NULL, i0 = NULL, standard = TRUE, tolin = 1e-08, tolout = 1e-08,
       maxitin = 10000, maxitout = 1000, quiet = TRUE)
```
| Argument | Description |
|---|---|
| `s` | A realization of the p by p sample covariance matrix (more generally, a symmetric matrix with positive diagonal entries). |
| `lam` | The lasso-type penalty tuning parameter; larger values give sparser estimates. |
| `tau` | The logarithmic barrier parameter. The default is 1e-04. |
| `init` | The type of initialization. The default is `"soft"`. |
| `s0` | Optional user-supplied starting point for the covariance estimate (used when `init="user"`). |
| `i0` | Optional user-supplied starting point for the inverse covariance estimate (used when `init="user"`). |
| `standard` | Logical: if TRUE, the problem is solved on the correlation scale and the solution is rescaled to the covariance scale. |
| `tolin` | Convergence tolerance for the inner loop of the algorithm that solves the lasso regression. |
| `tolout` | Convergence tolerance for the outer loop of the algorithm. |
| `maxitin` | Maximum number of inner-loop iterations allowed. |
| `maxitout` | Maximum number of outer-loop iterations allowed. |
| `quiet` | Logical: if FALSE, progress information is printed. |
See Rothman (2012) for the objective function and more information.
A list with the following components:

| Component | Description |
|---|---|
| `sigma` | covariance estimate |
| `omega` | inverse covariance estimate |
| `theta` | correlation matrix estimate; NULL if `standard=FALSE` |
| `theta.inv` | inverse correlation matrix estimate; NULL if `standard=FALSE` |
So long as `s` is symmetric with positive diagonal entries and `init` is not set to `"user"` (or `init` is set to `"user"` and `i0` is a positive definite matrix), `omega` is positive definite. If `tolin` and `tolout` are too large, or `maxitin` and `maxitout` are too small, then `sigma` may be indefinite.
Adam J. Rothman
Rothman, A. J. (2012). Positive definite estimators of large covariance matrices. Biometrika 99(3): 733-740.
```r
set.seed(1)
n = 50
p = 20
true.cov = diag(p)
true.cov[cbind(1:(p-1), 2:p)] = 0.4
true.cov[cbind(2:p, 1:(p-1))] = 0.4
eo = eigen(true.cov, symmetric = TRUE)
z = matrix(rnorm(n*p), nrow = n, ncol = p)
x = z %*% tcrossprod(eo$vec * rep(eo$val^(0.5), each = p), eo$vec)
s = cov(x)*(n-1)/n
output = pdsoft(s = s, lam = 0.3)
output$sigma
```
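The soft-thresholding operation behind the lasso-type penalty (and behind the `"soft"` initialization) can be sketched in NumPy. Rothman (2012) combines this elementwise shrinkage of the off-diagonal entries with a logarithmic barrier on the determinant so the iterates stay positive definite; the sketch below shows the thresholding step alone, not the full estimator.

```python
import numpy as np

def soft_threshold_offdiag(s, lam):
    """Elementwise soft-thresholding of the off-diagonal entries of s;
    the diagonal is left unpenalized (a sketch of one ingredient of the
    estimator, not the estimator itself)."""
    soft = np.sign(s) * np.maximum(np.abs(s) - lam, 0.0)
    np.fill_diagonal(soft, np.diag(s))   # restore the unpenalized diagonal
    return soft

s = np.array([[2.0, 0.5],
              [0.5, 3.0]])
soft_threshold_offdiag(s, lam=0.2)   # off-diagonals shrink from 0.5 to 0.3
```

Soft-thresholding alone can destroy positive definiteness for large `lam`, which is exactly why the full objective in Rothman (2012) adds the barrier term controlled by `tau`.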
Computes and selects the tuning parameter for the sparse and positive definite covariance matrix estimator proposed by Rothman (2012).
```r
pdsoft.cv(x, lam.vec = NULL, standard = TRUE, init = c("diag", "soft", "dense"),
          tau = 1e-04, nsplits = 10, n.tr = NULL, tolin = 1e-08, tolout = 1e-08,
          maxitin = 10000, maxitout = 1000, quiet = TRUE)
```
| Argument | Description |
|---|---|
| `x` | A data matrix with n rows and p columns. |
| `lam.vec` | An optional vector of candidate lasso-type penalty tuning parameter values. If NULL, a default grid of values is used. |
| `standard` | Logical: if TRUE, the problem is solved on the correlation scale and the solution is rescaled to the covariance scale. |
| `init` | The type of initialization used for the estimate computed at the maximum element of `lam.vec`. |
| `tau` | The logarithmic barrier parameter. The default is 1e-04. |
| `nsplits` | The number of random splits to use for the tuning parameter selection. |
| `n.tr` | Optional number of cases to use in the training set. If NULL, a default training-set size is used. |
| `tolin` | Convergence tolerance for the inner loop of the algorithm that solves the lasso regression. |
| `tolout` | Convergence tolerance for the outer loop of the algorithm. |
| `maxitin` | Maximum number of inner-loop iterations allowed. |
| `maxitout` | Maximum number of outer-loop iterations allowed. |
| `quiet` | Logical: if FALSE, progress information is printed. |
See Rothman (2012) for the objective function and more information.
A list with the following components:

| Component | Description |
|---|---|
| `sigma` | covariance estimate at the selected tuning parameter |
| `omega` | inverse covariance estimate at the selected tuning parameter |
| `best.lam` | the selected value of the tuning parameter |
| `cv.err` | a vector of the validation errors, one for each element in `lam.vec` |
| `lam.vec` | the vector of candidate tuning parameter values |
| `n.tr` | the number of cases used for the training set |
It is always the case that `omega` is positive definite. If `tolin` and `tolout` are too large, or `maxitin` and `maxitout` are too small, then `sigma` may be indefinite.
Adam J. Rothman
Rothman, A. J. (2012). Positive definite estimators of large covariance matrices. Biometrika 99(3): 733-740.
```r
set.seed(1)
n = 50
p = 20
true.cov = diag(p)
true.cov[cbind(1:(p-1), 2:p)] = 0.4
true.cov[cbind(2:p, 1:(p-1))] = 0.4
eo = eigen(true.cov, symmetric = TRUE)
z = matrix(rnorm(n*p), nrow = n, ncol = p)
x = z %*% tcrossprod(eo$vec * rep(eo$val^(0.5), each = p), eo$vec)
output = pdsoft.cv(x = x)
plot(output$lam.vec, output$cv.err)
output$best.lam
output$sigma
```
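Both `pdsoft` and `pdsoft.cv` note that `sigma` can come back indefinite when the tolerances are too loose or the iteration caps too small. A quick numerical diagnostic, sketched in NumPy, is to inspect the smallest eigenvalue of the returned estimate:

```python
import numpy as np

def min_eigenvalue(sigma):
    """Smallest eigenvalue of a symmetric matrix; a strictly positive
    value confirms the estimate is positive definite."""
    return float(np.linalg.eigvalsh(np.asarray(sigma)).min())

min_eigenvalue(np.eye(3))   # identity: smallest eigenvalue is 1.0
```

In R the equivalent check is `min(eigen(output$sigma, symmetric = TRUE)$values)`; a value at or below zero suggests tightening `tolin`/`tolout` or raising `maxitin`/`maxitout`.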