Title: Statistical Tools for Covariance Analysis
Description: Covariance is of universal prevalence across various disciplines within statistics. We provide a rich collection of geometric and inferential tools for convenient analysis of covariance structures, including distance measures, mean covariance estimation, one-sample and two-sample covariance hypothesis tests, and covariance estimation. For an introduction to covariance in multivariate statistical analysis, see Schervish (1987) <doi:10.1214/ss/1177013111>.
Authors: Kyoungjae Lee [aut], Lizhen Lin [ctb], Kisung You [aut, cre]
Maintainer: Kisung You <[email protected]>
License: GPL (>= 3)
Version: 0.5.4
Built: 2024-11-11 07:18:37 UTC
Source: CRAN
It performs a Bayesian version of the one-sample test for covariance, where the null hypothesis is

$$H_0 : \Sigma_n = \Sigma_0$$

where $\Sigma_n$ is the covariance of the data model and $\Sigma_0$ is a hypothesized covariance. Let $X_i$ denote the $i$-th column of the data matrix. Under the maximum pairwise Bayes factor framework, we have the following hypothesis,

$$H_0 : a_{ij} = 0 \ \textrm{ and } \ \tau_{ij} = 1 \quad \textrm{versus} \quad H_1 : \textrm{not } H_0.$$

The model is

$$X_i \mid X_j \sim N_n(a_{ij} X_j, \tau_{ij}^2 I_n)$$

and the prior is set, under $H_1$, as

$$a_{ij} \mid \tau_{ij}^2 \sim N(0, \tau_{ij}^2 / (\gamma \|X_j\|_2^2)), \qquad \tau_{ij}^2 \sim IG(a_0, b_0).$$
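To make the pairwise structure concrete, here is a minimal illustrative sketch of the column-on-column regression view that mxPBF builds on. It assumes only the model stated above; the helper name pairwise_fit is hypothetical, and this is not the Bayes factor computation itself.

## illustrative sketch: the regression view X_i = a_ij * X_j + noise
## underlying each pairwise hypothesis (pairwise_fit is hypothetical)
pairwise_fit <- function(X, i, j) {
  fit <- lm(X[, i] ~ X[, j] - 1)   # no intercept, matching the model above
  c(a_ij = unname(coef(fit)),      # estimated pairwise regression coefficient
    tau2 = mean(residuals(fit)^2)) # crude estimate of tau_ij^2
}

# example: two columns of a 100-by-5 Gaussian data matrix
X <- matrix(rnorm(100*5), nrow=100)
pairwise_fit(X, 1, 2)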
BCovTest1.mxPBF(data, Sigma0 = diag(ncol(data)), a0 = 2, b0 = 2, gamma = 1)
data | an $(n\times p)$ data matrix where each row is an observation.
Sigma0 | a $(p\times p)$ hypothesized covariance matrix.
a0 | shape parameter for the inverse-gamma prior.
b0 | scale parameter for the inverse-gamma prior.
gamma | a non-negative number; see the equations above.
a named list containing:
log.BF.mat | a $(p\times p)$ matrix of pairwise log Bayes factors.
Lee K, Lin L, Dunson D (2018). “Maximum Pairwise Bayes Factors for Covariance Structure Testing.” arXiv preprint. https://arxiv.org/abs/1809.03105.
## Not run:
## generate data from multivariate normal with trivial covariance.
pdim = 10
data = matrix(rnorm(100*pdim), nrow=100)

## run mxPBF-based test
out1 = BCovTest1.mxPBF(data)
out2 = BCovTest1.mxPBF(data, a0=5.0, b0=5.0) # change some params

## visualize two Bayes factor matrices
opar <- par(no.readonly=TRUE)
par(mfrow=c(1,2), pty="s")
image(exp(out1$log.BF.mat)[,pdim:1], main="default")
image(exp(out2$log.BF.mat)[,pdim:1], main="a0=b0=5.0")
par(opar)
## End(Not run)
The one-sample diagonality test can be stated with the null hypothesis

$$H_0 : \sigma_{ij} = 0 \ \textrm{ for any } i \neq j$$

and alternative hypothesis $H_1 : \textrm{not } H_0$, with $\Sigma_n = (\sigma_{ij})$. Let $X_i$ be the $i$-th column of the data matrix. Under the maximum pairwise Bayes factor framework, we have the following hypothesis,

$$H_0 : a_{ij} = 0 \quad \textrm{versus} \quad H_1 : a_{ij} \neq 0.$$

The model is

$$X_i \mid X_j \sim N_n(a_{ij} X_j, \tau_{ij}^2 I_n).$$

Under $H_0$, the prior is set as

$$\tau_{ij}^2 \sim IG(a_0, b_0)$$

and under $H_1$, the priors are

$$a_{ij} \mid \tau_{ij}^2 \sim N(0, \tau_{ij}^2 / (\gamma \|X_j\|_2^2)), \qquad \tau_{ij}^2 \sim IG(a_0, b_0).$$
BDiagTest1.mxPBF(data, a0 = 2, b0 = 2, gamma = 1)
data | an $(n\times p)$ data matrix where each row is an observation.
a0 | shape parameter for the inverse-gamma prior.
b0 | scale parameter for the inverse-gamma prior.
gamma | a non-negative number; see the equations above.
a named list containing:
log.BF.mat | a $(p\times p)$ matrix of pairwise log Bayes factors.
Lee K, Lin L, Dunson D (2018). “Maximum Pairwise Bayes Factors for Covariance Structure Testing.” arXiv preprint. https://arxiv.org/abs/1809.03105.
## Not run:
## generate data from multivariate normal with trivial covariance.
pdim = 10
data = matrix(rnorm(100*pdim), nrow=100)

## run mxPBF-based test
out1 = BDiagTest1.mxPBF(data)
out2 = BDiagTest1.mxPBF(data, a0=5.0, b0=5.0) # change some params

## visualize two Bayes factor matrices
opar <- par(no.readonly=TRUE)
par(mfrow=c(1,2), pty="s")
image(exp(out1$log.BF.mat)[,pdim:1], main="default")
image(exp(out2$log.BF.mat)[,pdim:1], main="a0=b0=5.0")
par(opar)
## End(Not run)
For a given 3-dimensional array in which symmetric positive definite (SPD) matrices are stacked slice by slice, it computes pairwise distances using various popular measures. Some measures are metrics in the mathematical sense, satisfying the three conditions of non-negativity (with identity of indiscernibles), symmetry, and the triangle inequality. The other, non-metric measures represent dissimilarities between two SPD objects.
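To illustrate one of the metric options concretely, the sketch below computes the affine-invariant Riemannian metric (AIRM) between two SPD matrices, $d(A,B) = \|\log(A^{-1/2} B A^{-1/2})\|_F$, via eigendecompositions. The helper name airm_dist is hypothetical; CovDist itself operates on whole arrays.

## sketch: AIRM distance between two SPD matrices (airm_dist is hypothetical)
airm_dist <- function(A, B) {
  eA     <- eigen(A, symmetric=TRUE)
  Ainvsq <- eA$vectors %*% diag(1/sqrt(eA$values)) %*% t(eA$vectors) # A^{-1/2}
  M      <- Ainvsq %*% B %*% Ainvsq
  lam    <- eigen(M, symmetric=TRUE, only.values=TRUE)$values
  sqrt(sum(log(lam)^2)) # Frobenius norm of log(M) from its eigenvalues
}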
CovDist(
  A,
  method = c("AIRM", "Bhattacharyya", "Cholesky", "Euclidean", "Hellinger",
             "JBLD", "KLDM", "LERM", "Procrustes.SS", "Procrustes.Full",
             "PowerEuclidean", "RootEuclidean"),
  power = 1
)
A | a $(p\times p\times N)$ 3d array of $N$ SPD matrices.
method | the type of distance measure to be used; one of the options listed in Usage above.
power | a non-zero number for the PowerEuclidean distance.
an $(N\times N)$ symmetric matrix of pairwise distances.
Arsigny V, Fillard P, Pennec X, Ayache N (2006). “Log-Euclidean metrics for fast and simple calculus on diffusion tensors.” Magnetic Resonance in Medicine, 56(2), 411–421. ISSN 0740-3194, 1522-2594.
Dryden IL, Koloydenko A, Zhou D (2009). “Non-Euclidean statistics for covariance matrices, with applications to diffusion tensor imaging.” The Annals of Applied Statistics, 3(3), 1102–1123. ISSN 1932-6157.
## generate 100 SPD matrices of size (5-by-5)
samples = samplecovs(100,5)

## get pairwise distance for "AIRM"
distAIRM = CovDist(samples, method="AIRM")

## dimension reduction using MDS
ss = cmdscale(distAIRM)

## visualize
opar <- par(no.readonly=TRUE)
plot(ss[,1], ss[,2], main="2d projection")
par(opar)
Ledoit and Wolf (2003, 2004) proposed a linear shrinkage strategy for estimating a covariance matrix, with an application to portfolio optimization. An optimal covariance is written as a convex combination as follows,

$$\hat{\Sigma} = \delta \hat{F} + (1 - \delta) \hat{S}$$

where $\delta$ is a control parameter/weight, $\hat{S}$ is an empirical covariance matrix, and $\hat{F}$ is a target matrix. Although the authors used a highly structured estimator for $\hat{F}$, we also allow an arbitrary target matrix to be used, as long as it is symmetric and positive definite of corresponding size.
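The convex combination itself is straightforward to evaluate for a given weight. The sketch below (the helper name shrink_linear is hypothetical) only applies the formula above, whereas CovEst.2003LW estimates delta from the theory in the references.

## sketch: linear shrinkage for a user-supplied delta
shrink_linear <- function(X, delta, target = diag(ncol(X))) {
  S <- cov(X)                    # empirical covariance \hat{S}
  delta*target + (1 - delta)*S   # convex combination from the formula above
}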
CovEst.2003LW(X, target = NULL)
X | an $(n\times p)$ data matrix where each row is an observation.
target | target matrix $\hat{F}$. If NULL, the highly structured estimator of the original papers is used.
a named list containing:
S | a covariance matrix estimate.
delta | an estimate of the convex combination weight according to the relevant theory.
Ledoit O, Wolf M (2003). “Improved estimation of the covariance matrix of stock returns with an application to portfolio selection.” Journal of Empirical Finance, 10(5), 603–621. ISSN 09275398.
Ledoit O, Wolf M (2004). “A well-conditioned estimator for large-dimensional covariance matrices.” Journal of Multivariate Analysis, 88(2), 365–411. ISSN 0047259X.
Ledoit O, Wolf M (2004). “Honey, I Shrunk the Sample Covariance Matrix.” The Journal of Portfolio Management, 30(4), 110–119. ISSN 0095-4918.
## CRAN-purpose small computation
# set a seed for reproducibility
set.seed(11)

# small data with identity covariance
pdim      <- 5
dat.small <- matrix(rnorm(20*pdim), ncol=pdim)

# run the code with highly structured estimator
out.small <- CovEst.2003LW(dat.small)

# visualize
opar <- par(no.readonly=TRUE)
par(mfrow=c(1,3), pty="s")
image(diag(pdim)[,pdim:1], main="true cov")
image(cov(dat.small)[,pdim:1], main="sample cov")
image(out.small$S[,pdim:1], main="estimated cov")
par(opar)

## Not run:
## see how delta is determined according to
#  the number of observations we have.
nsamples = seq(from=5, to=200, by=5)
nnsample = length(nsamples)

# we will record two values; delta and norm difference
vec.delta = rep(0, nnsample)
vec.normd = rep(0, nnsample)
for (i in 1:nnsample){
  dat.norun <- matrix(rnorm(nsamples[i]*pdim), ncol=pdim) # sample in R^5
  out.norun <- CovEst.2003LW(dat.norun)                   # run with default

  vec.delta[i] = out.norun$delta
  vec.normd[i] = norm(out.norun$S - diag(pdim),"f")       # Frobenius norm
}

# visualize the results
opar <- par(no.readonly=TRUE)
par(mfrow=c(1,2))
plot(nsamples, vec.delta, lwd=2, type="b", col="red",  main="estimated deltas")
plot(nsamples, vec.normd, lwd=2, type="b", col="blue", main="Frobenius error")
par(opar)
## End(Not run)
Chen et al. (2010) propose the Oracle Approximating Shrinkage (OAS) estimator, which estimates the covariance matrix by iteratively approximating the oracle shrinkage

$$\hat{\Sigma} = \rho \hat{F} + (1 - \rho) \hat{S}$$

where $\rho$ is a control parameter/weight, $\hat{S}$ is an empirical covariance matrix, and $\hat{F}$ is a target matrix. It is proposed to use a structured estimate

$$\hat{F} = \frac{\mathrm{tr}(\hat{S})}{p} \cdot I_p$$

where $I_p$ is an identity matrix of dimension $p$.
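The structured target is easy to form directly. A minimal sketch follows; the helper names oas_target and shrink_oas are hypothetical, and the OAS iteration that actually chooses rho is described in the reference.

## sketch: structured target \hat{F} = (tr(S)/p) I_p, plus the combination
oas_target <- function(S) {
  p <- ncol(S)
  (sum(diag(S))/p) * diag(p)     # scaled identity target
}
shrink_oas <- function(S, rho) {
  rho*oas_target(S) + (1 - rho)*S  # shrinkage for a *given* rho
}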
CovEst.2010OAS(X)
X | an $(n\times p)$ data matrix where each row is an observation.
a named list containing:
S | a covariance matrix estimate.
rho | an estimate of the convex combination weight.
Chen Y, Wiesel A, Eldar YC, Hero AO (2010). “Shrinkage Algorithms for MMSE Covariance Estimation.” IEEE Transactions on Signal Processing, 58(10), 5016–5029. ISSN 1053-587X, 1941-0476.
## CRAN-purpose small computation
# set a seed for reproducibility
set.seed(11)

# small data with identity covariance
pdim      <- 5
dat.small <- matrix(rnorm(10*pdim), ncol=pdim)

# run the code
out.small <- CovEst.2010OAS(dat.small)

# visualize
opar <- par(no.readonly=TRUE)
par(mfrow=c(1,3), pty="s")
image(diag(pdim)[,pdim:1], main="true cov")
image(cov(dat.small)[,pdim:1], main="sample cov")
image(out.small$S[,pdim:1], main="estimated cov")
par(opar)

## Not run:
## see how rho is determined according to
#  the number of observations we have.
nsamples = seq(from=5, to=200, by=5)
nnsample = length(nsamples)

# we will record two values; rho and norm difference
vec.rho   = rep(0, nnsample)
vec.normd = rep(0, nnsample)
for (i in 1:nnsample){
  dat.norun <- matrix(rnorm(nsamples[i]*pdim), ncol=pdim) # sample in R^5
  out.norun <- CovEst.2010OAS(dat.norun)                  # run with default

  vec.rho[i]   = out.norun$rho
  vec.normd[i] = norm(out.norun$S - diag(pdim),"f")       # Frobenius norm
}

# visualize the results
opar <- par(no.readonly=TRUE)
par(mfrow=c(1,2))
plot(nsamples, vec.rho,   lwd=2, type="b", col="red",  main="estimated rhos")
plot(nsamples, vec.normd, lwd=2, type="b", col="blue", main="Frobenius error")
par(opar)
## End(Not run)
Chen et al. (2010) propose the Rao-Blackwellized Ledoit-Wolf (RBLW) estimator, which estimates the covariance matrix by minimizing the mean squared error of the shrinkage

$$\hat{\Sigma} = \rho \hat{F} + (1 - \rho) \hat{S}$$

where $\rho$ is a control parameter/weight, $\hat{S}$ is an empirical covariance matrix, and $\hat{F}$ is a target matrix. It is proposed to use a structured estimate

$$\hat{F} = \frac{\mathrm{tr}(\hat{S})}{p} \cdot I_p$$

where $I_p$ is an identity matrix of dimension $p$.
CovEst.2010RBLW(X)
X | an $(n\times p)$ data matrix where each row is an observation.
a named list containing:
S | a covariance matrix estimate.
rho | an estimate of the convex combination weight.
Chen Y, Wiesel A, Eldar YC, Hero AO (2010). “Shrinkage Algorithms for MMSE Covariance Estimation.” IEEE Transactions on Signal Processing, 58(10), 5016–5029. ISSN 1053-587X, 1941-0476.
## CRAN-purpose small computation
# set a seed for reproducibility
set.seed(11)

# small data with identity covariance
pdim      <- 10
dat.small <- matrix(rnorm(5*pdim), ncol=pdim)

# run the code
out.small <- CovEst.2010RBLW(dat.small)

# visualize
opar <- par(no.readonly=TRUE)
par(mfrow=c(1,3), pty="s")
image(diag(pdim)[,pdim:1], main="true cov")
image(cov(dat.small)[,pdim:1], main="sample cov")
image(out.small$S[,pdim:1], main="estimated cov")
par(opar)

## Not run:
## see how rho is determined according to
#  the number of observations we have.
nsamples = seq(from=5, to=200, by=5)
nnsample = length(nsamples)

# we will record two values; rho and norm difference
vec.rho   = rep(0, nnsample)
vec.normd = rep(0, nnsample)
for (i in 1:nnsample){
  dat.norun <- matrix(rnorm(nsamples[i]*pdim), ncol=pdim) # sample in R^10
  out.norun <- CovEst.2010RBLW(dat.norun)                 # run with default

  vec.rho[i]   = out.norun$rho
  vec.normd[i] = norm(out.norun$S - diag(pdim),"f")       # Frobenius norm
}

# visualize the results
opar <- par(no.readonly=TRUE)
par(mfrow=c(1,2))
plot(nsamples, vec.rho,   lwd=2, type="b", col="red",  main="estimated rhos")
plot(nsamples, vec.normd, lwd=2, type="b", col="blue", main="Frobenius error")
par(opar)
## End(Not run)
Cai and Liu (2011) proposed an adaptive variant of Bickel and Levina (2008), implemented in CovEst.hard. The idea of adaptive thresholding is to apply the thresholding technique on the correlation matrix, so that the procedure adapts to the scale of each variable.
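A minimal sketch of the correlation-scale idea follows, under the simplifying assumption of a single global threshold; Cai and Liu's estimator uses entry-wise, data-driven thresholds, and the helper name corr_threshold is hypothetical.

## sketch: threshold on the correlation scale, then rescale back
corr_threshold <- function(X, thr) {
  S <- cov(X)
  d <- sqrt(diag(S))        # standard deviations
  R <- cov2cor(S)           # sample correlation matrix
  R[abs(R) < thr] <- 0      # zero out small correlations
  diag(R) <- 1              # unit diagonal is kept
  outer(d, d) * R           # back to covariance units: d_i d_j R_ij
}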
CovEst.adaptive(X, thr = 0.5, nCV = 10, parallel = FALSE)
X | an $(n\times p)$ data matrix where each row is an observation.
thr | user-defined threshold value. If it is a vector of regularization values, the function automatically selects the one that minimizes cross-validation risk.
nCV | the number of repetitions of 2-fold random cross-validation for each threshold value.
parallel | a logical; TRUE to invoke parallel computing, FALSE otherwise.
a named list containing:
S | a covariance matrix estimate.
CV | a data frame containing the vector of tested threshold values (thr) and the corresponding cross-validation scores (CVscore).
Cai T, Liu W (2011). “Adaptive Thresholding for Sparse Covariance Matrix Estimation.” Journal of the American Statistical Association, 106(494), 672–684. ISSN 0162-1459, 1537-274X.
## generate data from multivariate normal with Identity covariance.
pdim <- 5
data <- matrix(rnorm(10*pdim), ncol=pdim)

## apply 4 different schemes
#  mthr is a vector of regularization parameters to be tested
mthr <- seq(from=0.01, to=0.99, length.out=10)
out1 <- CovEst.adaptive(data, thr=0.1)  # threshold value 0.1
out2 <- CovEst.adaptive(data, thr=0.5)  # threshold value 0.5
out3 <- CovEst.adaptive(data, thr=0.9)  # threshold value 0.9
out4 <- CovEst.adaptive(data, thr=mthr) # automatic threshold checking

## visualize 4 estimated matrices
opar <- par(no.readonly=TRUE)
par(mfrow=c(2,2), pty="s")
image(out1$S[,pdim:1], col=gray((0:100)/100), main="thr=0.1")
image(out2$S[,pdim:1], col=gray((0:100)/100), main="thr=0.5")
image(out3$S[,pdim:1], col=gray((0:100)/100), main="thr=0.9")
image(out4$S[,pdim:1], col=gray((0:100)/100), main="automatic")
par(opar)
Bickel and Levina (2008) proposed a sparse covariance estimation technique that applies thresholding to the off-diagonal elements of the sample covariance matrix. Each off-diagonal entry $S_{ij}$ of the sample covariance matrix is kept if

$$|S_{ij}| > \tau$$

and set to zero otherwise, where $\tau$ is a thresholding value (thr). If thr is rather a vector of regularization parameters, it applies a cross-validation scheme to select an optimal value.
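The thresholding operator itself can be sketched in a few lines, assuming the diagonal is left untouched; the helper name hard_threshold is hypothetical, and CovEst.hard adds the cross-validation machinery on top.

## sketch: hard thresholding of off-diagonal entries
hard_threshold <- function(S, thr) {
  H <- S * (abs(S) > thr)   # keep entries whose magnitude exceeds thr
  diag(H) <- diag(S)        # diagonal entries are not thresholded
  H
}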
CovEst.hard(X, thr = sqrt(log(ncol(X))/nrow(X)), nCV = 10, parallel = FALSE)
X | an $(n\times p)$ data matrix where each row is an observation.
thr | user-defined threshold value. If it is a vector of regularization values, the function automatically selects the one that minimizes cross-validation risk.
nCV | the number of repetitions of 2-fold random cross-validation for each threshold value.
parallel | a logical; TRUE to invoke parallel computing, FALSE otherwise.
a named list containing:
S | a covariance matrix estimate.
CV | a data frame containing the vector of tested threshold values (thr) and the corresponding cross-validation scores (CVscore).
Bickel PJ, Levina E (2008). “Covariance regularization by thresholding.” The Annals of Statistics, 36(6), 2577–2604. ISSN 0090-5364.
## generate data from multivariate normal with Identity covariance.
pdim <- 5
data <- matrix(rnorm(10*pdim), ncol=pdim)

## apply 4 different schemes
#  mthr is a vector of regularization parameters to be tested
mthr <- exp(seq(from=log(0.1), to=log(10), length.out=10))
out1 <- CovEst.hard(data, thr=0.1)  # threshold value 0.1
out2 <- CovEst.hard(data, thr=1)    # threshold value 1
out3 <- CovEst.hard(data, thr=10)   # threshold value 10
out4 <- CovEst.hard(data, thr=mthr) # automatic threshold checking

## visualize 4 estimated matrices
gcol <- gray((0:100)/100)
opar <- par(no.readonly=TRUE)
par(mfrow=c(2,2), pty="s")
image(out1$S[,pdim:1], col=gcol, main="thr=0.1")
image(out2$S[,pdim:1], col=gcol, main="thr=1")
image(out3$S[,pdim:1], col=gcol, main="thr=10")
image(out4$S[,pdim:1], col=gcol, main="automatic")
par(opar)
Sparse covariance estimation does not necessarily guarantee positive definiteness of the estimated covariance matrix. Fan et al. (2013) proposed to resolve this issue with an iterative procedure that incrementally decreases the threshold value while positive definiteness is preserved, yielding the minimal threshold that keeps the estimate positive definite.
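One way to realize this idea is to search a grid for the smallest threshold whose hard-thresholded estimate is positive definite. The helper name, grid, and search direction below are illustrative assumptions, not the exact procedure of Fan et al. (2013).

## sketch: smallest grid threshold whose thresholded estimate is PD
hard_threshold_pd <- function(S, ngrid = 100) {
  for (thr in seq(0, max(abs(S)), length.out = ngrid)) {
    H <- S * (abs(S) > thr); diag(H) <- diag(S)
    if (min(eigen(H, symmetric=TRUE, only.values=TRUE)$values) > 0) {
      return(list(S = H, optC = thr)) # first (smallest) PD-preserving value
    }
  }
  list(S = diag(diag(S)), optC = max(abs(S))) # diagonal fallback, always PD
}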
CovEst.hardPD(X)
X | an $(n\times p)$ data matrix where each row is an observation.
a named list containing:
S | a covariance matrix estimate.
optC | the optimal threshold value that guarantees positive definiteness after thresholding.
Fan J, Liao Y, Mincheva M (2013). “Large covariance estimation by thresholding principal orthogonal complements.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(4), 603–680. ISSN 13697412.
## generate data from multivariate normal with Identity covariance.
pdim <- 5
data <- matrix(rnorm(10*pdim), ncol=pdim)

## apply 4 different schemes
out1 <- CovEst.hard(data, thr=0.1) # threshold value 0.1
out2 <- CovEst.hard(data, thr=1)   # threshold value 1
out3 <- CovEst.hard(data, thr=10)  # threshold value 10
out4 <- CovEst.hardPD(data)        # automatic threshold checking

## visualize 4 estimated matrices
mmessage <- paste("hardPD::optimal thr=", sprintf("%.2f", out4$optC), sep="")
gcol <- gray((0:100)/100)
opar <- par(no.readonly=TRUE)
par(mfrow=c(2,2), pty="s")
image(out1$S[,pdim:1], col=gcol, main="thr=0.1")
image(out2$S[,pdim:1], col=gcol, main="thr=1")
image(out3$S[,pdim:1], col=gcol, main="thr=10")
image(out4$S[,pdim:1], col=gcol, main=mmessage)
par(opar)
Qi and Sun (2006) proposed a quadratically convergent Newton method for computing the nearest correlation matrix with positive definiteness; the correlation estimate is then transformed back to the covariance scale to estimate the covariance matrix. This algorithm does not depend on any parameters.
CovEst.nearPD(X)
X | an $(n\times p)$ data matrix where each row is an observation.
a named list containing:
S | a covariance matrix estimate.
Qi H, Sun D (2006). “A Quadratically Convergent Newton Method for Computing the Nearest Correlation Matrix.” SIAM Journal on Matrix Analysis and Applications, 28(2), 360–385. ISSN 0895-4798, 1095-7162.
## generate data from multivariate normal with Identity covariance.
pdim <- 5
data <- matrix(rnorm(10*pdim), ncol=pdim)

## compare against sample covariance
out1 <- cov(data)
out2 <- CovEst.nearPD(data) # apply nearPD

## visualize 2 estimated matrices
gcol <- gray((0:100)/100)
opar <- par(no.readonly=TRUE)
par(mfrow=c(1,2), pty="s")
image(out1[,pdim:1],   col=gcol, main="sample covariance")
image(out2$S[,pdim:1], col=gcol, main="SPD Projection")
par(opar)
The soft thresholding method for covariance estimation takes the off-diagonal elements $S_{ij}$ of the sample covariance matrix and applies

$$\hat{S}_{ij} = \mathrm{sgn}(S_{ij}) \, (|S_{ij}| - \tau)_+$$

where $\mathrm{sgn}(S_{ij})$ is the sign of the value $S_{ij}$, $\tau$ is a thresholding value (thr), and $(x)_+ = \max(x, 0)$. If thr is rather a vector of regularization parameters, it applies a cross-validation scheme to select an optimal value.
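Analogously to the hard-thresholding case, the operator can be sketched directly, assuming the diagonal is left untouched; the helper name soft_threshold is hypothetical, and CovEst.soft adds cross-validation on top.

## sketch: soft thresholding, shrinking magnitudes by thr
soft_threshold <- function(S, thr) {
  Soft <- sign(S) * pmax(abs(S) - thr, 0) # sgn(S_ij)(|S_ij| - thr)_+
  diag(Soft) <- diag(S)                   # diagonal entries untouched
  Soft
}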
CovEst.soft(X, thr = 0.5, nCV = 10, parallel = FALSE)
X | an $(n\times p)$ data matrix where each row is an observation.
thr | user-defined threshold value. If it is a vector of regularization values, the function automatically selects the one that minimizes cross-validation risk.
nCV | the number of repetitions of 2-fold random cross-validation for each threshold value.
parallel | a logical; TRUE to invoke parallel computing, FALSE otherwise.
a named list containing:
S | a covariance matrix estimate.
CV | a data frame containing the vector of tested threshold values (thr) and the corresponding cross-validation scores (CVscore).
Antoniadis A, Fan J (2001). “Regularization of Wavelet Approximations.” Journal of the American Statistical Association, 96(455), 939–967. ISSN 0162-1459, 1537-274X.
Donoho DL, Johnstone IM, Kerkyacharian G, Picard D (1995). “Wavelet Shrinkage: Asymptopia?” Journal of the Royal Statistical Society. Series B (Methodological), 57(2), 301–369. ISSN 00359246.
## generate data from multivariate normal with Identity covariance.
pdim <- 5
data <- matrix(rnorm(10*pdim), ncol=pdim)

## apply 4 different schemes
#  mthr is a vector of regularization parameters to be tested
mthr <- exp(seq(from=log(0.1), to=log(10), length.out=10))
out1 <- CovEst.soft(data, thr=0.1)  # threshold value 0.1
out2 <- CovEst.soft(data, thr=1)    # threshold value 1
out3 <- CovEst.soft(data, thr=10)   # threshold value 10
out4 <- CovEst.soft(data, thr=mthr) # automatic threshold checking

## visualize 4 estimated matrices
gcol <- gray((0:100)/100)
opar <- par(no.readonly=TRUE)
par(mfrow=c(2,2), pty="s")
image(out1$S[,pdim:1], col=gcol, main="thr=0.1")
image(out2$S[,pdim:1], col=gcol, main="thr=1")
image(out3$S[,pdim:1], col=gcol, main="thr=10")
image(out4$S[,pdim:1], col=gcol, main="automatic")
par(opar)
For a given 3-dimensional array in which symmetric positive definite (SPD) matrices are stacked slice by slice, it estimates the Fréchet mean on the open cone of SPD matrices under the corresponding metric/distance measure.
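For the log-Euclidean (LERM) case the Fréchet mean has a closed form: matrix-exponentiate the average of the matrix logarithms of the slices. A minimal sketch with a hypothetical helper name follows; other metrics such as AIRM require iterative algorithms.

## sketch: log-Euclidean Frechet mean of a (p x p x N) array of SPD matrices
lerm_mean <- function(A) {
  p <- dim(A)[1]; N <- dim(A)[3]
  logsum <- matrix(0, p, p)
  for (k in 1:N) {                 # accumulate matrix logarithms
    e <- eigen(A[,,k], symmetric=TRUE)
    logsum <- logsum + e$vectors %*% diag(log(e$values)) %*% t(e$vectors)
  }
  e <- eigen(logsum/N, symmetric=TRUE)
  e$vectors %*% diag(exp(e$values)) %*% t(e$vectors) # matrix exponential
}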
CovMean(
  A,
  method = c("AIRM", "Cholesky", "Euclidean", "LERM", "Procrustes.SS",
             "Procrustes.Full", "PowerEuclidean", "RootEuclidean"),
  power = 1
)
A | a $(p\times p\times N)$ 3d array of $N$ SPD matrices.
method | the type of distance measure to be used; one of the options listed in Usage above.
power | a non-zero number for the PowerEuclidean distance.
a $(p\times p)$ estimated mean covariance matrix.
Dryden IL, Koloydenko A, Zhou D (2009). “Non-Euclidean statistics for covariance matrices, with applications to diffusion tensor imaging.” The Annals of Applied Statistics, 3(3), 1102–1123. ISSN 1932-6157.
## Not run:
## generate 100 sample covariances of size (5-by-5).
pdim    = 5
samples = samplecovs(100,pdim)

## compute mean of first 50 sample covariances from data under Normal(0,Identity).
mLERM = CovMean(samples[,,1:50], method="LERM")
mAIRM = CovMean(samples[,,1:50], method="AIRM")
mChol = CovMean(samples[,,1:50], method="Cholesky")
mRoot = CovMean(samples[,,1:50], method="RootEuclidean")

## visualize
opar <- par(no.readonly=TRUE)
par(mfrow=c(2,2), pty="s")
image(mLERM[,pdim:1], main="LERM mean")
image(mAIRM[,pdim:1], main="AIRM mean")
image(mChol[,pdim:1], main="Cholesky mean")
image(mRoot[,pdim:1], main="RootEuclidean mean")
par(opar)
## End(Not run)
Given data, it performs a one-sample test for covariance where the null hypothesis is

$$H_0 : \Sigma_n = \Sigma_0$$

where $\Sigma_n$ is the covariance of the data model and $\Sigma_0$ is a $(p\times p)$ hypothesized covariance, based on a procedure proposed by Cai and Ma (2013).
CovTest1.2013Cai(data, Sigma0 = diag(ncol(data)), alpha = 0.05)
data | an $(n\times p)$ data matrix where each row is an observation.
Sigma0 | a $(p\times p)$ hypothesized covariance matrix.
alpha | level of significance.
a named list containing:
statistic | a test statistic value.
threshold | the rejection criterion to be compared against the test statistic.
reject | a logical; TRUE to reject the null hypothesis, FALSE otherwise.
Cai TT, Ma Z (2013). “Optimal hypothesis testing for high dimensional covariance matrices.” Bernoulli, 19(5B), 2359–2388. ISSN 1350-7265.
## Not run:
## generate data from multivariate normal with trivial covariance.
pdim = 5
data = matrix(rnorm(10*pdim), ncol=pdim)

## run the test
CovTest1.2013Cai(data)
## End(Not run)
Given data, it performs a one-sample test for covariance where the null hypothesis is

$$H_0 : \Sigma_n = \Sigma_0$$

where $\Sigma_n$ is the covariance of the data model and $\Sigma_0$ is a $(p\times p)$ hypothesized covariance, based on a procedure proposed by Srivastava, Yanagihara, and Kubokawa (2014).
CovTest1.2014Srivastava(data, Sigma0 = diag(ncol(data)), alpha = 0.05)
data | an $(n\times p)$ data matrix where each row is an observation.
Sigma0 | a $(p\times p)$ hypothesized covariance matrix.
alpha | level of significance.
a named list containing:
statistic | a test statistic value.
threshold | the rejection criterion to be compared against the test statistic.
reject | a logical; TRUE to reject the null hypothesis, FALSE otherwise.
Srivastava MS, Yanagihara H, Kubokawa T (2014). “Tests for covariance matrices in high dimension with less sample size.” Journal of Multivariate Analysis, 130, 289–309. ISSN 0047259X.
## Not run:
## generate data from multivariate normal with trivial covariance.
pdim = 5
data = matrix(rnorm(10*pdim), ncol=pdim)

## run the test
CovTest1.2014Srivastava(data)
## End(Not run)
Given two sets of data, it performs a two-sample test for equality of covariance matrices where the null hypothesis is

$$H_0 : \Sigma_1 = \Sigma_2$$

where $\Sigma_1$ and $\Sigma_2$ represent the true (unknown) covariances of the two datasets, based on a procedure proposed by Cai and Ma (2013). If statistic $>$ threshold, it rejects the null hypothesis.
CovTest2.2013Cai(X, Y, alpha = 0.05)
X | an $(m\times p)$ data matrix of the first dataset, where each row is an observation.
Y | an $(n\times p)$ data matrix of the second dataset, where each row is an observation.
alpha | level of significance.
a named list containing:
statistic | a test statistic value.
threshold | the rejection criterion to be compared against the test statistic.
reject | a logical; TRUE to reject the null hypothesis, FALSE otherwise.
Cai TT, Ma Z (2013). “Optimal hypothesis testing for high dimensional covariance matrices.” Bernoulli, 19(5B), 2359–2388. ISSN 1350-7265.
## generate 2 datasets from multivariate normal with identical covariance.
pdim  = 5
data1 = matrix(rnorm(100*pdim), ncol=pdim)
data2 = matrix(rnorm(150*pdim), ncol=pdim)

## run test
CovTest2.2013Cai(data1, data2)
Given data, it performs a one-sample test for the diagonal entries of a covariance matrix, where the null hypothesis is

$$H_0 : \sigma_{ij} = 0 \ \textrm{ for any } i \neq j$$

and the alternative hypothesis is $H_1 : \textrm{not } H_0$, with $\Sigma_n = (\sigma_{ij})$, based on a procedure proposed by Cai and Jiang (2011).
DiagTest1.2011Cai(data, alpha = 0.05)
data | an $(n\times p)$ data matrix where each row is an observation.
alpha | level of significance.
a named list containing:
statistic | a test statistic value.
threshold | the rejection criterion to be compared against the test statistic.
reject | a logical; TRUE to reject the null hypothesis, FALSE otherwise.
Cai TT, Jiang T (2011). “Limiting laws of coherence of random matrices with applications to testing covariance structure and construction of compressed sensing matrices.” The Annals of Statistics, 39(3), 1496–1525. ISSN 0090-5364.
## Not run:
## generate data from multivariate normal with trivial covariance.
pdim = 5
data = matrix(rnorm(100*pdim), ncol=pdim)

## run test with different alpha values
DiagTest1.2011Cai(data, alpha=0.01)
DiagTest1.2011Cai(data, alpha=0.05)
DiagTest1.2011Cai(data, alpha=0.10)
## End(Not run)
Given data, it performs a one-sample test for the diagonal entries of a covariance matrix, where the null hypothesis is

$$H_0 : \sigma_{ij} = 0 \ \textrm{ for any } i \neq j$$

and the alternative hypothesis is $H_1 : \textrm{not } H_0$, with $\Sigma_n = (\sigma_{ij})$, based on a procedure proposed by Lan et al. (2015).
DiagTest1.2015Lan(data, alpha = 0.05)
data | an $(n\times p)$ data matrix where each row is an observation.
alpha | level of significance.
a named list containing:
statistic | a test statistic value.
threshold | the rejection criterion to be compared against the test statistic.
reject | a logical; TRUE to reject the null hypothesis, FALSE otherwise.
Lan W, Luo R, Tsai C, Wang H, Yang Y (2015). “Testing the Diagonality of a Large Covariance Matrix in a Regression Setting.” Journal of Business & Economic Statistics, 33(1), 76–86. ISSN 0735-0015, 1537-2707.
## Not run:
## generate data from multivariate normal with trivial covariance.
pdim = 5
data = matrix(rnorm(100*pdim), ncol=pdim)

## run test with different alpha values
DiagTest1.2015Lan(data, alpha=0.01)
DiagTest1.2015Lan(data, alpha=0.05)
DiagTest1.2015Lan(data, alpha=0.10)
## End(Not run)
Covariance is of universal prevalence across various disciplines within statistics. The CovTools package aims to provide a rich collection of geometric and statistical tools for a variety of inferences on covariance structures, as well as on its inverse, the precision matrix. See the sections below for a comprehensive list of functions provided by the package.

From the inference-on-manifolds perspective, the following functions are provided:

name of a function | description
CovDist | compute pairwise distances of covariance matrices
CovMean | compute a mean covariance matrix
PreEst.2014An
returns an estimator of the banded precision matrix using the modified Cholesky decomposition.
It uses the estimator defined in Bickel and Levina (2008). The bandwidth is determined by the bandwidth test
suggested by An, Guo and Liu (2014).
PreEst.2014An(
  X,
  upperK = floor(ncol(X)/2),
  algorithm = c("Bonferroni", "Holm"),
  alpha = 0.01
)
X | an $(n\times p)$ data matrix where each row is an observation.
upperK | upper bound of the bandwidth $k$.
algorithm | bandwidth test algorithm to be used; either "Bonferroni" or "Holm".
alpha | significance level for the bandwidth test.
a named list containing:
C | an estimated banded precision matrix.
optk | an estimated optimal bandwidth acquired from the test procedure.
An B, Guo J, Liu Y (2014). “Hypothesis testing for band size detection of high-dimensional banded precision matrices.” Biometrika, 101(2), 477–483. ISSN 0006-3444, 1464-3510.
Bickel PJ, Levina E (2008). “Regularized estimation of large covariance matrices.” The Annals of Statistics, 36(1), 199–227. ISSN 0090-5364.
## Not run:
## parameter setting
p = 200; n = 100
k0 = 5; A0min=0.1; A0max=0.2; D0min=2; D0max=5

set.seed(123)
A0 = matrix(0, p,p)
for(i in 2:p){
  term1 = runif(n=min(k0,i-1), min=A0min, max=A0max)
  term2 = sample(c(1,-1), size=min(k0,i-1), replace=TRUE)
  vals  = term1*term2
  vals  = vals[ order(abs(vals)) ]
  A0[i, max(1, i-k0):(i-1)] = vals
}

D0     = diag(runif(n = p, min = D0min, max = D0max))
Omega0 = t(diag(p) - A0)%*%diag(1/diag(D0))%*%(diag(p) - A0)

## data generation (based on AR representation)
## the same as generating n random samples from N_p(0, Omega0^{-1})
X = matrix(0, nrow=n, ncol=p)
X[,1] = rnorm(n, sd = sqrt(D0[1,1]))
for(j in 2:p){
  mean.vec.j = X[, 1:(j-1)]%*%as.matrix(A0[j, 1:(j-1)])
  X[,j] = rnorm(n, mean = mean.vec.j, sd = sqrt(D0[j,j]))
}

## banded estimation using two different schemes
Omega1 <- PreEst.2014An(X, upperK=20, algorithm="Bonferroni")
Omega2 <- PreEst.2014An(X, upperK=20, algorithm="Holm")

## visualize true and estimated precision matrices
opar <- par(no.readonly=TRUE)
par(mfrow=c(1,3), pty="s")
image(Omega0[,p:1],   main="Original Precision")
image(Omega1$C[,p:1], main="banded3::Bonferroni")
image(Omega2$C[,p:1], main="banded3::Holm")
par(opar)
## End(Not run)
PreEst.2014Banerjee
returns a Bayes estimator of the banded precision matrix using G-Wishart prior.
Stein’s loss or squared error loss function is used depending on the “loss” argument in the function.
The bandwidth is set at the mode of marginal posterior for the bandwidth parameter.
PreEst.2014Banerjee(
  X,
  upperK = floor(ncol(X)/2),
  delta = 10,
  logpi = function(k) { -k^4 },
  loss = c("Stein", "Squared")
)
X | an $(n\times p)$ data matrix where each row is an observation.
upperK | upper bound of the bandwidth $k$.
delta | hyperparameter for the G-Wishart prior. The default value is 10; it has to be larger than 2.
logpi | log of the prior distribution for the bandwidth $k$. The default is $-k^4$.
loss | type of loss; either "Stein" or "Squared".
a named list containing:
C | a MAP estimate of the precision matrix.
Banerjee S, Ghosal S (2014). “Posterior convergence rates for estimating large precision matrices using graphical models.” Electronic Journal of Statistics, 8(2), 2111–2137. ISSN 1935-7524.
## generate data from multivariate normal with Identity precision.
pdim = 10
data = matrix(rnorm(50*pdim), ncol=pdim)

## compare different K
out1 <- PreEst.2014Banerjee(data, upperK=1)
out2 <- PreEst.2014Banerjee(data, upperK=3)
out3 <- PreEst.2014Banerjee(data, upperK=5)

## visualize
opar <- par(no.readonly=TRUE)
par(mfrow=c(2,2), pty="s")
image(diag(pdim)[,pdim:1], main="Original Precision")
image(out1$C[,pdim:1], main="banded1::upperK=1")
image(out2$C[,pdim:1], main="banded1::upperK=3")
image(out3$C[,pdim:1], main="banded1::upperK=5")
par(opar)
PreEst.2017Lee
returns a Bayes estimator of the banded precision matrix,
which is defined in subsection 3.3 of Lee and Lee (2017), using the k-BC prior.
The bandwidth is set at the mode of marginal posterior for the bandwidth parameter.
PreEst.2017Lee(X, upperK = floor(ncol(X)/2), logpi = function(k) { -k^4 })
X | an $(n\times p)$ data matrix where each row is an observation.
upperK | upper bound of the bandwidth $k$.
logpi | log of the prior distribution for the bandwidth $k$. The default is $-k^4$.
a named list containing:
C | a MAP estimate of the precision matrix.
Lee K, Lee J (2017). “Estimating Large Precision Matrices via Modified Cholesky Decomposition.” ArXiv e-prints.
## generate data from multivariate normal with Identity precision.
pdim = 5
data = matrix(rnorm(100*pdim), ncol=pdim)

## compare different K
out1 <- PreEst.2017Lee(data, upperK=1)
out2 <- PreEst.2017Lee(data, upperK=3)
out3 <- PreEst.2017Lee(data, upperK=5)

## visualize
opar <- par(no.readonly=TRUE)
par(mfrow=c(2,2), pty="s")
image(diag(pdim)[,pdim:1], main="Original Precision")
image(out1$C[,pdim:1], main="banded2::upperK=1")
image(out2$C[,pdim:1], main="banded2::upperK=3")
image(out3$C[,pdim:1], main="banded2::upperK=5")
par(opar)
Given a sample covariance matrix $\hat{S}$, graphical lasso aims at estimating a sparse precision matrix $\hat{\Omega}$, the inverse of the covariance. It solves the following optimization problem,

$$\hat{\Omega} = \arg\max_{\Omega \succ 0} \ \log\det\Omega - \mathrm{tr}(\hat{S}\Omega) - \lambda \|\Omega\|_1$$

where $\lambda$ is a regularization parameter, $\|\Omega\|_1 = \sum_{i,j} |\Omega_{ij}|$, and $\Omega \succ 0$ indicates positive definiteness. We provide three modes of computation, 'fixed', 'confidence', or 'BIC', with respect to $\lambda$. Please see the section below for more details.
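For reference, the penalized log-likelihood objective above can be evaluated directly for any candidate precision matrix. The helper name glasso_objective is hypothetical; the actual optimization is performed by the solver of Friedman et al. (2008).

## sketch: evaluate the graphical lasso objective for a candidate Omega
glasso_objective <- function(S, Omega, lambda) {
  logdet <- as.numeric(determinant(Omega, logarithm=TRUE)$modulus)
  logdet - sum(diag(S %*% Omega)) - lambda*sum(abs(Omega))
}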
PreEst.glasso(X, method = list(type = "fixed", param = 1), parallel = FALSE)
X | an $(n\times p)$ data matrix where each row is an observation.
method | a list containing two elements: type, one of "fixed", "confidence", or "BIC"; and param, the corresponding numeric value or vector of values. See the section below for details.
parallel | a logical; TRUE to invoke parallel computing, FALSE otherwise.
a named list containing:
C | an estimated precision matrix.
BIC | a data frame containing $\lambda$ values and corresponding BIC scores, returned with the type='BIC' method.
We currently provide three options for solving the problem: 'fixed', 'confidence', or 'BIC', with respect to $\lambda$. When the method type is 'fixed', the parameter should be a single numeric value, used directly as a user-defined $\lambda$. Likewise, the method type 'confidence' requires a single numeric value in $(0,1)$; $\lambda$ is then set heuristically from the given confidence level, as proposed by Banerjee et al. (2006). Finally, the 'BIC' type requires a vector of candidate $\lambda$ values and selects the one with the lowest BIC score, as proposed by Yuan and Lin (2007).
Banerjee O, Ghaoui LE, d'Aspremont A, Natsoulis G (2006). “Convex optimization techniques for fitting sparse Gaussian graphical models.” In Proceedings of the 23rd international conference on Machine learning, 89–96. ISBN 978-1-59593-383-6.
Yuan M, Lin Y (2007). “Model Selection and Estimation in the Gaussian Graphical Model.” Biometrika, 94(1), 19–35. ISSN 00063444.
Friedman J, Hastie T, Tibshirani R (2008). “Sparse inverse covariance estimation with the graphical lasso.” Biostatistics, 9(3), 432–441. ISSN 1465-4644, 1468-4357.
## generate data from multivariate normal with Identity precision.
pdim = 10
data = matrix(rnorm(100*pdim), ncol=pdim)

## prepare input arguments for different scenarios
lbdvec <- c(0.01,0.1,1,10,100)                # a vector of regularization parameters
list1  <- list(type="fixed", param=1.0)       # single regularization parameter case
list2  <- list(type="confidence", param=0.95) # single confidence level case
list3  <- list(type="BIC", param=lbdvec)      # multiple regularizers with BIC selection

## compute with different scenarios
out1 <- PreEst.glasso(data, method=list1)
out2 <- PreEst.glasso(data, method=list2)
out3 <- PreEst.glasso(data, method=list3)

## visualize
opar <- par(no.readonly=TRUE)
par(mfrow=c(2,2), pty="s")
image(diag(pdim)[,pdim:1], main="Original Precision")
image(out1$C[,pdim:1], main="glasso::lambda=1.0")
image(out2$C[,pdim:1], main="glasso::Confidence=0.95")
image(out3$C[,pdim:1], main="glasso::BIC selection")
par(opar)
For visualization purposes, samplecovs generates a 3d array of stacked sample covariances where, along the third dimension, the first half are sample covariances of samples generated independently from a normal distribution with identity covariance, and the latter half consists of sample covariances from a dense random population covariance.
samplecovs(ncopy, size)
ncopy | the total number of sample covariances to be generated.
size | dimension $p$ of each covariance matrix.
a $(\textrm{size}\times\textrm{size}\times\textrm{ncopy})$ array of strictly positive definite sample covariances.
## generate a total of 20 sample covariances of size 5-by-5.
samples <- samplecovs(20,5)