This package considers the estimation and inference of sparse graphical models that characterize the dependency structure of high-dimensional tensor-valued data. Data are assumed to follow a tensor normal distribution whose covariance has a Kronecker product structure. For estimation, this package provides an alternating minimization algorithm, which iteratively estimates each sparse precision matrix while fixing the others, and attains an estimator with the optimal statistical rate of convergence. Notably, such an estimator achieves estimation consistency with only one tensor sample, which is unobserved in previous work. For inference, this package provides a large-scale multiple testing method for support recovery of sparse precision matrix. A FDR control procedure can be easily implmented into the inference. Test consistency and FDR convergence achieves with only two tensor samples.
A tensor ${\cal T} \in \mathbb R^{m_1 \times m_2 \times \cdots \times m_K}$ follows the tensor normal distribution with zero mean and covariance matrices $\bf{\Sigma}_1, \ldots, \bf{\Sigma}_K$, denoted as ${\cal T} \sim \textrm{TN}({\bf0}; \bf{\Sigma}_1, \ldots, \bf{\Sigma}_K)$, if its probability density function is $$ p({\cal T}| \bf{\Sigma}_1,\ldots,\bf{\Sigma}_K) = (2\pi)^{-m/2} \biggl\{ \prod_{k=1}^K |\bf{\Sigma}_k|^{-m/(2m_k)} \biggr\} \exp \big(- \|{\cal T} \times \bf{\Sigma}^{-1/2}\|_F^2/2 \big), $$ where $m = \prod_{k=1}^K m_k$ and $\bf{\Sigma}^{-1/2} := \{\bf{\Sigma}_1^{-1/2},\ldots,\bf{\Sigma}_K^{-1/2}\}$.
A standard approach to estimate precision matrix $\bf{\Omega}_k^*$, k = 1, …, K, is to use the maximum likelihood method. Up to a constant, the negative log-likelihood function of the tensor normal distribution is $$ \ell(\bf{\Omega}_1, \ldots, \bf{\Omega}_K) := \frac{1}{2}\textrm{tr}[\bf{S} (\bf{\Omega}_K \otimes \cdots \otimes \bf{\Omega}_1)] - \frac{1}{2}\sum_{k=1}^K \frac{m}{m_k} \log |\bf{\Omega}_k|, $$ where $\bf{S} := \frac{1}{n} \sum_{i=1}^n \textrm{vec}({\cal T}_i) \textrm{vec}({\cal T}_i)^{\top}$. To encourage the sparsity of each precision matrix in the high-dimensional scenario, a penalized log-likelihood estimator is proposed which minimizes $$ q_n(\bf{\Omega}_1, \ldots, \bf{\Omega}_K) := \frac{1}{m}\textrm{tr}[\bf{S} (\bf{\Omega}_K \otimes \cdots \otimes \bf{\Omega}_1)] - \sum_{k=1}^K \frac{1}{m_k} \log |\bf{\Omega}_k| + \sum_{k=1}^K P_{\lambda_k}(\bf{\Omega}_k), $$ where $P_{\lambda_k}(\bf{\Omega}_k) = \lambda_k \sum_{i\ne j} |[\bf{\Omega}_{k}]_{i,j}|$ is penalty function and λk is tuning parameter.
$q_n(\bf{\Omega}_1, \ldots, \bf{\Omega}_K)$ is jointly non-convex with respect to $\bf{\Omega}_1, \ldots, \bf{\Omega}_K$. Nevertheless, $q_n(\bf{\Omega}_1, \ldots, \bf{\Omega}_K)$ is bi-convex problem since $q_n(\bf{\Omega}_1, \ldots, \bf{\Omega}_K)$ is convex in $\bf{\Omega}_k$ when the rest K − 1 precision matrices are fixed. According to its bi-convex property, this package solves this non-convex problem by alternatively update one precision matrix with other matrices fixed. Note that, for any k = 1, …, K, minimizing $q_n(\bf{\Omega}_1, \ldots, \bf{\Omega}_K)$ with respect to $\bf{\Omega}_k$ while fixing the rest K − 1 precision matrices is equivalent to minimizing $$ L(\bf{\Omega}_k) := \frac{1}{m_k}\textrm{tr}(\bf{S}_k \bf{\Omega}_k) - \frac{1}{m_k} \log |\bf{\Omega}_k| + \lambda_k \|\bf{\Omega}_k\|_{1,\textrm{off}}. $$ Here $\bf{S}_k := \frac{m_k}{n m}\sum_{i=1}^n \bf{V}_i^k \bf{V}_i^{k\top}$, where $\bf{V}_i^k := \big[ {\cal T}_i \times \big\{\bf{\Omega}_1^{1/2},\ldots,\bf{\Omega}_{k-1}^{1/2}, 1_{m_k}, \bf{\Omega}_{k+1}^{1/2},\ldots,\bf{\Omega}_{K}^{1/2} \big\} \big]_{(k)}$ with × the tensor product operation and [⋅](k) the mode-k matricization operation defined in . Note that minimizing $L(\bf{\Omega}_k)$ corresponds to estimating vector-valued Gaussian graphical model and can be solved efficiently via the glasso algorithm Friedman et al. (2008). See Lyu et al. (2019) for detailed algorithm, named Tlasso.
Multiple testing method is established to testing the entries of precision matrix for each way of the tensor, Lyu et al. (2019). We focus on $\bf{\Omega}_1^*$, and procedures on the rest K − 1 precision matrices are symmetric. Hypothesis for $\bf{\Omega}_1^*$ is, ∀ 1 ≤ i < j ≤ m1, $$ H_{0 1 , i j } : \; [{\bf{\Omega}}_1^*]_{i,j} =0 \quad\quad \textrm{vs} \quad\quad H_{11 , ij} : \; [\bf{\Omega}_1^*]_{i,j} \ne 0 $$
Let index −i in k-th mode denote all elements of mode-k except the i-th one. From the distribution of ${\cal T }_{: , i_2 , \ldots , i_K}$, we have, for every ik ∈ {1, …, mk}, k ∈ {1, …, K}, ${\cal T}_{i_1 , i_2 , \ldots , i_K} | {\cal T}_{-i_1 , i_2 , \ldots , i_K} \sim \textrm{N} ( - [\bf{\Omega}_1^*]_{i_1,i_1}^{-1} [\bf{\Omega}_1^*]_{i_1, - i_1} {\cal T}_{-i_1 , i_2 , \ldots , i_K} ; [\bf{\Omega}_1^*]_{i_1, i_1 }^{-1} \prod_{k=2}^K[\bf{\Sigma}_k^*]_{i_k, i_k} )$. An equivalent linear model is $$ {\cal T }_{l ; i_1, i_2 , \ldots , i_K} = {\cal T }_{l ; - i_1, i_2 , \ldots , i_K}^\top \bf{\theta}_{i_1} + \xi_{l ; i_1, i_2 , \ldots , i_K}, \forall \, l \in \{ 1 , \ldots , n \} $$ where $\bf{\theta}_{i_1} = - [\bf{\Omega}_1^*]_{i_1,i_1}^{-1} [\bf{\Omega}_1^*]_{i_1, - i_1} \text{ , and }\, \xi_{l ;i_1, i_2 , \ldots , i_K} \sim \textrm{N} (0 \, ; [\bf{\Omega}_1^*]_{i_1, i_1 }^{-1} \prod_{k=2}^K [\bf{\Sigma}_k^*]_{i_k, i_k}).$ Note that intercept term is eliminated since zero mean of ${\cal T }_{: , i_2 , \ldots , i_K}$.
Test statistic is constructed by both bias and variance correction of
sample covariance of residuals. Let $\hat{\bf{\theta}}_{i_1} = (\hat{\theta}_{1, i_1} ,
\ldots , \hat{\theta}_{m_1-1, i_1})^\top = -
[\hat{\bf{\Omega}}_1]_{i_1,i_1}^{-1} [\hat{\bf{\Omega}}_1]_{i_1, -
i_1}$ be the estimator of $\bf{\theta}_{i_1}$, where $\hat{\bf{\Omega}}_1$ is the output of
Tlasso Algorithm. Given $\{
\hat{\bf{\theta}}_{i_1} \}_{i_1 = 1 }^{m_1}$, the residual of
the linear model is defined as $$
\hat{\xi}_{l ; i_1, i_2 , \ldots , i_K} = {\cal T }_{l ; i_1, i_2 ,
\ldots , i_K} - \bar{{\cal T }}_{ i_1, i_2 , \ldots , i_K} - ( {\cal T
}_{l ; - i_1, i_2 , \ldots , i_K} - \bar{{\cal T }}_{ - i_1, i_2 ,
\ldots , i_K} )^\top \hat{\bf{\theta}}_{i_1},
$$ where $\bar{{\cal T }} =
\frac{1}{n}\sum \limits_{l=1}^{n} {\cal T }_{l }$. The sample
covariance of residuals is
$$
\hat{\varrho}_{i,j}= \frac{m_1}{(n-1) m } \sum \limits_{l=1}^{n} \sum
\limits_{i_2=1}^{m_2} \cdots \sum \limits_{i_K=1}^{m_K} \hat{\xi}_{l ;
i, i_2 , \ldots , i_K} \hat{\xi}_{l ; j, i_2 , \ldots , i_K} .
$$ Test statistic is $$\tau_{i,j} =
\frac{\hat{\varrho}_{i.j} + \mu_{i,j}}{\varpi}, \forall 1 \le i < j
\le m_1.$$ The bias correction term $\mu_{i,j}=\hat{\varrho}_{i,i} \hat{\theta}_{i , j}
+ \hat{\varrho}_{j,j} \hat{\theta}_{j-1, i}$ translates the mean
of τi, j
to zero under H01, ij.
The variance correction term $$
\varpi^2 = \frac{m \cdot \|\widehat{\bf{S}}_2\|_F^2 \cdots
\|\widehat{\bf{S}}_K\|_F^2}{m_1 \cdot
(\textrm{tr}(\widehat{\bf{S}}_2))^2 \cdots
(\textrm{tr}(\widehat{\bf{S}}_K))^2},
$$ plays a significant role in rescaling τi, j
into an asymptotic standard normal distribution, where $ k :=
{i=1}^n _i _i^{} $ is the estimate of $\bf{\Sigma}_k$ with $\widehat{\bf{V}}_i := \big[ {\cal T}_i \times
\bigl\{\widehat{\bf{\Omega}}_1^{1/2},\ldots,\widehat{\bf{\Omega}}_{k-1}^{1/2},
1_{m_k},
\widehat{\bf{\Omega}}_{k+1}^{1/2},\ldots,\widehat{\bf{\Omega}}_{K}^{1/2}
\bigr\} \big]_{(k)}$ and $\widehat{\bf{\Omega}}_{k}$ from
Tlasso Algorithm, k ∈ {2, …, K}. Lyu et al. (2019) proves
that, under certain conditions, $$
\sqrt{\frac{(n-1) m }{ m_1 \hat{\varrho}_{i,i} \hat{\varrho}_{j,j} }}
\tau_{i,j} \rightarrow \textrm{N} ( 0 ;1 ) $$ in distribution,
as nm/m1 → ∞.
Let $\tilde{\tau}_{i,j}= \sqrt{(n-1) m/ (m_1\hat{\varrho}_{i,i} \hat{\varrho}_{j,j}) } \tau_{i,j}$. Given the values of τ̃ij and thresholding level 𝜍, we define a test procedure φ𝜍(τ̃i, j) = 1{|τ̃i, j| ≥ 𝜍} and reject H01, ij if φ𝜍(τ̃i, j) = 1. The definition of FDR/FDP in our notation is $$ \textrm{FDP} = \frac{| \{ (i,j) \in {\cal H}_0 : \varphi_{\varsigma} (\tilde{\tau}_{i,j})=1 \} | }{ | \{ (i,j) : 1 \le i < j \le m_1, \varphi_{\varsigma} (\tilde{\tau}_{i,j})=1 \} | \vee 1 } \, \text{ , and } \,\textrm{FDR} = \bf{E} (\textrm{FDP}). $$ where ${\cal H}_0 = \{ (i,j) : [\bf{\Omega}_1^*]_{i,j} = 0 , 1 \le i < j \le m_1 \}$. To recover the support of $\bf{\Omega}_1^*$, (m1 − 1)m1/2 tests are simultaneously conducted. Hence, a small enough 𝜍 that enhances power while controls FDP under a pre-specific level υ ∈ (0, 1) is an ideal choice. In particular, 𝜍* = inf {𝜍 > 0 : FDP ≤ υ}. However, 𝜍* is oracle since we have no access to the information of ${\cal H}_0$. Due to sparsity, $w_0 =|{\cal H}_0|$ can be approximated by w = m1(m1 − 1)/2. P(φ𝜍(τ̃i, j) = 1) is close to 2(1 − Φ(𝜍)) by test consistency. The approximation of 𝜍* is $$ \hat{\varsigma}= \inf \bigg \{ \varsigma > 0: \frac{2(1-\Phi( \varsigma )) w}{ | \{ (i,j) : 1 \le i < j \le m_1, \varphi_{\varsigma} (\tilde{\tau}_{i,j})=1 \} | \vee 1 } \le \upsilon \bigg \}. $$
FDR/FDP control procedure is to infer the support of $\bf{\Omega}_1^*$ at thresholding level $\hat{\varsigma}$. In particular, FDR and FDP in the procedure are $$ \textrm{FDP}_1 = \frac{| \{ (i,j) \in {\cal H}_0 : \varphi_{\hat{\varsigma}} (\tilde{\tau}_{i,j})=1 \} | }{ | \{ (i,j) : 1 \le i < j \le m_1, \varphi_{\hat{\varsigma}} (\tilde{\tau}_{i,j})=1 \} | \vee 1 } \text{ , and } \,\textrm{FDR}_1 = \bf{E} (\textrm{FDP}_1). $$ Note that we set τ̃i, j = τ̃j, i for 1 ≤ i < j ≤ m1. The inference of $\textrm{supp}(\bf{\Omega}_k^*) , \, k \in \{2 , \ldots , K \}$, is a symmetric procedure.
Lyu et al. (2019) gives the asymptotic control of FDP1 and FDR1 for the support of $\bf{\Omega}_{1}^*$: under certain conditions, FDP1w/υw0 → 1, FDR1w/υw0 → 1 in probability as nm/m1 → ∞.
This packages contains following functions:
ChainOmega
NeighborOmega
knn
nearest-neighbor graph, this function first
randomly picks p points from a unit square and computes all pairwise
distances among the points. Then it searches for the knn
nearest-neighbors of each point and a pair of symmetric entries in the
precision matrix that has a random chosen value from [−1, −0.5] ∪ [0.5, 1]. Finally, to ensure
positive definite property, it normalizes the matrix as Ω < −Ω + (λmin(Ω) + 0.2)1p
where λmin(⋅)
refers to the samllest eigenvalue.
Trnorm
Sigma.list
is not given, default distribution is
from either triangle graph or nearest-neighbor graph (depends on
type
).
Tlasso.fit
T
and
thres
. Algorithm will be terminated if output in certain
iteration change less than thres
. Otherwise, T iterations
will be fully operated.
covres
biascor
) for inference is off-diagnoal. Elements in
Omega.list are true precision matrices or estimation of the true ones,
the latter can be output of Tlasso.fit
.
biascor
Omega.list
are true
precision matrices or estimation of the true ones, the latter can be
output of Tlasso.fit
.
varcor
est.analysis
infer.analysis
graph.pattern
The purpose of this section is to show users the basic usage of this package. We will briefly go through main functions, see what they can do and have a look at outputs. An detailed example of complete procedures of estimation and inference will be presented to give users a general sense of the pakcage.
First, we load Tlasso
package:
Then, we generate a list of precision matrices of triangle graph.
m.vec = c(5,5,5) # dimensionality of a tensor
# m1, m2, m3
n = 5 # sample size
Omega.true.list = list()
for (k in 1:length(m.vec)) {
Omega.true.list[[k]] = ChainOmega(m.vec[k], sd = k)
}
Omega.true.list[[1]]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.3168143 -0.2308893 0.0000000 0.0000000 0.0000000
## [2,] -0.2308893 0.4674875 -0.2123306 0.0000000 0.0000000
## [3,] 0.0000000 -0.2123306 0.4234692 -0.1841058 0.0000000
## [4,] 0.0000000 0.0000000 -0.1841058 0.3658496 -0.1499391
## [5,] 0.0000000 0.0000000 0.0000000 -0.1499391 0.2415995
ChainOmega
returns a precision matrix of triangle graph
with dimension m.vec[k]
and seed number k
.
Given precision matrices, we generate obervations from corresponding
tensor normal distribution.
Sigma.true.list = list()
for (k in 1:length(m.vec)) {
Sigma.true.list[[k]] = solve(Omega.true.list[[k]])
} # generate covariance matrices list
DATA=Trnorm(n,m.vec,Sigma.list=Sigma.true.list)
# obersavations from tensor normal distribution
Trnorm
generates observations from separable tensor
normal distribution with covariance matrix Sigma.list[[k]]
for kth mode. DATA
is a m1 * m2 * m3 * n
array, i.e, a 10 × 10 × 10 × 10 array.
Default distribution is from triangle graph or 4 near-neighbor graph
(depends on type
).
DATA2=Trnorm(n,m.vec)
# default is triangle graph
# equivalent to DATA2 = Trnorm(n,m.vec, type='Chain', sd=1)
DATA3=Trnorm(n,m.vec,type='Neighbor')
# 4 nearest-neighbor graph
# equivalent to DATA3 = Trnorm(n,m.vec, type='Neighbor', sd=1, knn=4)
Given observations DATA
, we use Tlasso.fit
to conduct alternating optimization.
lambda.thm = 20*c( sqrt(log(m.vec[1])/(n*prod(m.vec))),
sqrt(log(m.vec[2])/(n*prod(m.vec))),
sqrt(log(m.vec[3])/(n*prod(m.vec))))
# lambda.thm is regularization parameter
out.tlasso = Tlasso.fit(DATA,T=1,lambda.vec = lambda.thm)
# output is a list of estimation of precision matrices
out.tlasso[[1]]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.361422637 -0.23704270 0.002333856 -0.06126152 0.01539202
## [2,] -0.237046526 0.46822624 -0.192384949 -0.05154091 0.05256162
## [3,] 0.002304876 -0.19236551 0.424993048 -0.17817458 -0.07814616
## [4,] -0.061257515 -0.05155747 -0.178173408 0.37108685 -0.08295747
## [5,] 0.015405071 0.05255995 -0.078142842 -0.08296053 0.19265707
Tlasso.fit
generates a list of precision matrices
out.tlasso
. Default is maximal iteration T=1
and termination thresholding level thres=1e-05
. If
estimation output changes less than thres
, in terms of
Frobenius norm, after certain iteration, Tlasso.fit
will be
terminated immediately (before Tth iteration). The performance of
Tlasso.fit
can be evaluated by
est.analysis
.
# compare out.tlasso and Omega.true.list
# main diagnoal is taken into consideration
est.analysis(out.tlasso,Omega.true.list,offdiag=FALSE)
## $error.kro
## [1] 0.3656762
##
## $tpr.kro
## [1] 1
##
## $tnr.kro
## [1] 0.4989574
##
## $av.error.f
## [1] 0.1986636
##
## $av.error.max
## [1] 0.08618703
##
## $av.tpr
## [1] 1
##
## $av.tnr
## [1] 0.3333333
##
## $error.f
## [1] 0.2130124 0.0980819 0.2848964
##
## $error.max
## [1] 0.07814616 0.05873962 0.12167531
##
## $tpr
## [1] 1 1 1
##
## $tnr
## [1] 0.0000000 0.3333333 0.6666667
est.analysis
returns a list of estimation performance
measures:
Argument | Explanation |
---|---|
Out$error.kro |
error in Frobenius norm of kronecker product |
Out$tpr.kro |
TPR of kronecker product |
Out$tnr.kro |
TNR of kronecker product |
Out$av.error.f |
averaged Frobenius norm error across all modes |
Out$av.error.max |
averaged Max norm error across all modes |
Out$av.tpr |
averaged TPR across all modes |
Out$av.tnr |
averaged TNR across all modes |
Out$error.f |
vector; error in Frobenius norm of each mode |
Out$error.max |
vector; error in Max norm of each mode |
Out$tpr |
vector; TPR of each mode |
Out$tnr |
vector; TNR of each mode |
Given DATA
and out.tlasso
, we next show how
to compute test statistic.
mat.list=list() # list of matrices of test statistic value
for ( k in 1:length(m.vec)) {
rho=covres(DATA, out.tlasso, k = k)
# sample covariance matrix of residuals, including diagnoal
bias_rho=biascor(rho,out.tlasso,k=k)
# bias corrected sample covariance of residuals, excluding diagnoal
varpi2=varcor(DATA, out.tlasso, k = k)
# variance correction term for kth mode's sample covariance of residuals
tautest=matrix(0,m.vec[k],m.vec[k])
for( i in 1:(m.vec[k]-1)) {
for ( j in (i+1):m.vec[k]){
tautest[j,i]=tautest[i,j]=sqrt((n-1)*prod(m.vec[-k]))*
bias_rho[i,j]/sqrt(varpi2*rho[i,i]*rho[j,j])
# compute final test statistic
}
}
mat.list[[k]]=tautest
}
mat.list[[1]]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.0000000 2.761395 -0.231802 1.509672 -0.4881653
## [2,] 2.7613945 0.000000 2.411092 0.797850 -1.2962198
## [3,] -0.2318020 2.411092 0.000000 2.153166 1.2980072
## [4,] 1.5096719 0.797850 2.153166 0.000000 2.3900777
## [5,] -0.4881653 -1.296220 1.298007 2.390078 0.0000000
To compute test statistic, we first need to compute sample covariance
of residuals via rho
. rho
returns a sample
covariance matrix, including diagnoal. Then we conduct bias correction
biascor
and variance correction varcor
.
biascor
returns a bias corrected sample covariance matrix.
varcor
returns a scalar to scale test statistic into
standard normal. Given test statistic value mat.list
, we
turn to test hypothesis. The significant level we choose is 0.95.
# inference measures (off-diagnoal), critical value is 0.975 quantile of standard normal
infer.analysis(mat.list, qnorm(0.975), Omega.true.list, offdiag=TRUE)
## $fp
## [1] 0 0 2
##
## $fn
## [1] 0 0 4
##
## $d
## [1] 8 8 6
##
## $nd
## [1] 12 12 14
##
## $t
## [1] 8 8 8
infer.analysis
returns a list of inference performance
measures:
Argument | Explanation |
---|---|
Out$fp |
vector; number of false positive of each mode |
Out$fn |
vector; number of false negative of each mode |
Out$d |
vector; number of all discovery of each mode |
Out$nd |
vector; number of all non-discovery of each mode |
Out$t |
vector; number of all true non-zero entries of each mode |
Due to the fact that this inference procedure relies on multiple testing, FDR control is indispensible. Lyu et al. (2019) provides an easy-implemented and efficient FDR control procedure. This procedure asymptotically controls FDR via selecting the smallest critical value that contrains FDP under certain level.
k=1 # interested mode
upsilon=0.1 # control level
# compute the difference between FDP and upsilon
fun=function(varsigma,mk,upsilon,tautest) {
return((2*(1-pnorm(varsigma))*mk*(mk-1))/max(1,sum(sign(abs(tautest) > varsigma))) - upsilon)
}
# select a critical value in (0,6) that has the samllest difference
diff=c();ind=1;inter=seq(0,6,0.0001)
for (varsigma in inter) {
diff[ind]=fun(varsigma,mk=m.vec[k],upsilon=upsilon,tautest=mat.list[[k]])
ind=ind+1
}
# the smallest critical value that constrains FDP under upsilon
critical=inter[min(which(diff < 0))]
# testing hypothesis with the critcal value
# FDR will converge to the limit proved in Lyu et al. 2019.
inference.FDR=infer.analysis(mat.list, critical, Omega.true.list, offdiag=TRUE)
Finally, we would like to visualize graph structure of inference via
graph.pattern
.
k=1 # interested mode
# true graph structure.
# set thres=0 in case true edge is eliminated
graph.pattern(Omega.true.list[[1]],main='True graph of mode 1',thres=0)
inf.mat=mat.list[[k]] > qnorm(0.975)
# set thres=0 (<1) since inf.mat is logical
graph.pattern(inf.mat,main='Inference of mode 1',thres=0)
From graphs we can see that inference result quite matches true graph.