---
title: "Package \"Tlasso\""
author: "Xiang Lyu, Will Wei Sun, Zhaoran Wang, Han Liu, Jian Yang, Guang Cheng. "
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Tlasso}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
> [Basic Info](#basic)
> [Methodology](#method)
> [Functions](#func)
> [Quick Start](#quick)
> [Reference](#ref)
## Basic Info
This package considers the estimation and inference of sparse graphical models that characterize the dependency structure of high-dimensional tensor-valued data. Data are assumed to follow a tensor normal distribution whose covariance has a Kronecker product structure. For estimation, this package provides an alternating minimization algorithm, which iteratively estimates each sparse precision matrix while fixing the others, and attains an estimator with the optimal statistical rate of convergence. Notably, such an estimator achieves estimation consistency with only one tensor sample, which is unobserved in previous work. For inference, this package provides a large-scale multiple testing method for support recovery of sparse precision matrix. A FDR control procedure can be easily implmented into the inference. Test consistency and FDR convergence achieves with only two tensor samples.
## Methodology
### Estimation: Tlasso Algorithm. [Lyu et al. (2019)][3]
A tensor ${\cal T} \in \mathbb R^{m_1 \times m_2 \times \cdots \times m_K}$ follows the tensor normal distribution with zero mean and covariance matrices $\bf{\Sigma}_1, \ldots, \bf{\Sigma}_K$, denoted as ${\cal T} \sim \textrm{TN}({\bf0}; \bf{\Sigma}_1, \ldots, \bf{\Sigma}_K)$, if its probability density function is
$$
p({\cal T}| \bf{\Sigma}_1,\ldots,\bf{\Sigma}_K) = (2\pi)^{-m/2} \biggl\{ \prod_{k=1}^K |\bf{\Sigma}_k|^{-m/(2m_k)} \biggr\} \exp \big(- \|{\cal T} \times \bf{\Sigma}^{-1/2}\|_F^2/2 \big),
$$
where $m = \prod_{k=1}^K m_k$ and $\bf{\Sigma}^{-1/2} := \{\bf{\Sigma}_1^{-1/2},\ldots,\bf{\Sigma}_K^{-1/2}\}$.
A standard approach to estimate precision matrix $\bf{\Omega}_k^*$, $k=1,\ldots,K$, is to use the maximum likelihood method. Up to a constant, the negative log-likelihood function of the tensor normal distribution is
$$
\ell(\bf{\Omega}_1, \ldots, \bf{\Omega}_K) := \frac{1}{2}\textrm{tr}[\bf{S} (\bf{\Omega}_K \otimes \cdots \otimes \bf{\Omega}_1)] - \frac{1}{2}\sum_{k=1}^K \frac{m}{m_k} \log |\bf{\Omega}_k|,
$$
where $\bf{S} := \frac{1}{n} \sum_{i=1}^n \textrm{vec}({\cal T}_i) \textrm{vec}({\cal T}_i)^{\top}$. To encourage the sparsity of each precision matrix in the high-dimensional scenario, a penalized log-likelihood estimator is proposed which minimizes
$$
q_n(\bf{\Omega}_1, \ldots, \bf{\Omega}_K) := \frac{1}{m}\textrm{tr}[\bf{S} (\bf{\Omega}_K \otimes \cdots \otimes \bf{\Omega}_1)] - \sum_{k=1}^K \frac{1}{m_k} \log |\bf{\Omega}_k| + \sum_{k=1}^K P_{\lambda_k}(\bf{\Omega}_k),
$$
where $P_{\lambda_k}(\bf{\Omega}_k) = \lambda_k \sum_{i\ne j} |[\bf{\Omega}_{k}]_{i,j}|$ is penalty function and $\lambda_k$ is tuning parameter.
$q_n(\bf{\Omega}_1, \ldots, \bf{\Omega}_K)$ is jointly non-convex with respect to $\bf{\Omega}_1, \ldots, \bf{\Omega}_K$. Nevertheless, $q_n(\bf{\Omega}_1, \ldots, \bf{\Omega}_K)$ is bi-convex problem since $q_n(\bf{\Omega}_1, \ldots, \bf{\Omega}_K)$ is convex in $\bf{\Omega}_k$ when the rest $K-1$ precision matrices are fixed. According to its bi-convex property, this package solves this non-convex problem by alternatively update one precision matrix with other matrices fixed. Note that, for any $k = 1,\ldots, K$, minimizing $q_n(\bf{\Omega}_1, \ldots, \bf{\Omega}_K)$ with respect to $\bf{\Omega}_k$ while fixing the rest $K-1$ precision matrices is equivalent to minimizing
$$
L(\bf{\Omega}_k) := \frac{1}{m_k}\textrm{tr}(\bf{S}_k \bf{\Omega}_k) - \frac{1}{m_k} \log |\bf{\Omega}_k| + \lambda_k \|\bf{\Omega}_k\|_{1,\textrm{off}}.
$$
Here $\bf{S}_k := \frac{m_k}{n m}\sum_{i=1}^n \bf{V}_i^k \bf{V}_i^{k\top}$, where $\bf{V}_i^k := \big[ {\cal T}_i \times \big\{\bf{\Omega}_1^{1/2},\ldots,\bf{\Omega}_{k-1}^{1/2}, 1_{m_k}, \bf{\Omega}_{k+1}^{1/2},\ldots,\bf{\Omega}_{K}^{1/2} \big\} \big]_{(k)}$ with $\times$ the tensor product operation and $[\cdot]_{(k)}$ the mode-$k$ matricization operation defined in \S\ref{sec:prelim}. Note that minimizing $L(\bf{\Omega}_k)$ corresponds to estimating vector-valued Gaussian graphical model and can be solved efficiently via the glasso algorithm [Friedman et al. (2008)][2]. See [Lyu et al. (2019)][3] for detailed algorithm, named **Tlasso**.
### Inference:
Multiple testing method is established to testing the entries of precision matrix for each way of the tensor, [Lyu et al. (2019)][3]. We focus on $\bf{\Omega}_1^*$, and procedures on the rest $K-1$ precision matrices are symmetric. Hypothesis for $\bf{\Omega}_1^*$ is, $\forall \,1 \le i < j \le m_1$,
$$
H_{0 1 , i j } : \; [{\bf{\Omega}}_1^*]_{i,j} =0 \quad\quad \textrm{vs} \quad\quad H_{11 , ij} : \; [\bf{\Omega}_1^*]_{i,j} \ne 0
$$
Let index $-i$ in $k$-th mode denote all elements of mode-$k$ except the $i$-th one. From the distribution of ${\cal T }_{: , i_2 , \ldots , i_K}$, we have, for every $i_k \in \{ 1 , \ldots , m_k\}$, $k \in \{ 1 , \ldots , K\}$, ${\cal T}_{i_1 , i_2 , \ldots , i_K} | {\cal T}_{-i_1 , i_2 , \ldots , i_K} \sim \textrm{N} ( - [\bf{\Omega}_1^*]_{i_1,i_1}^{-1} [\bf{\Omega}_1^*]_{i_1, - i_1} {\cal T}_{-i_1 , i_2 , \ldots , i_K} ; [\bf{\Omega}_1^*]_{i_1, i_1 }^{-1} \prod_{k=2}^K[\bf{\Sigma}_k^*]_{i_k, i_k} )$. An equivalent linear model is
$$
{\cal T }_{l ; i_1, i_2 , \ldots , i_K} = {\cal T }_{l ; - i_1, i_2 , \ldots , i_K}^\top \bf{\theta}_{i_1} + \xi_{l ; i_1, i_2 , \ldots , i_K}, \forall \, l \in \{ 1 , \ldots , n \}
$$
where $\bf{\theta}_{i_1} = - [\bf{\Omega}_1^*]_{i_1,i_1}^{-1} [\bf{\Omega}_1^*]_{i_1, - i_1} \text{ , and }\, \xi_{l ;i_1, i_2 , \ldots , i_K} \sim \textrm{N} (0 \, ; [\bf{\Omega}_1^*]_{i_1, i_1 }^{-1} \prod_{k=2}^K [\bf{\Sigma}_k^*]_{i_k, i_k}).$
Note that intercept term is eliminated since zero mean of ${\cal T }_{: , i_2 , \ldots , i_K}$.
Test statistic is constructed by both bias and variance correction of sample covariance of residuals. Let
$\hat{\bf{\theta}}_{i_1} = (\hat{\theta}_{1, i_1} , \ldots , \hat{\theta}_{m_1-1, i_1})^\top = - [\hat{\bf{\Omega}}_1]_{i_1,i_1}^{-1} [\hat{\bf{\Omega}}_1]_{i_1, - i_1}$
be the estimator of $\bf{\theta}_{i_1}$, where $\hat{\bf{\Omega}}_1$ is the output of **Tlasso** Algorithm. Given $\{ \hat{\bf{\theta}}_{i_1} \}_{i_1 = 1 }^{m_1}$, the residual of the linear model is defined as
$$
\hat{\xi}_{l ; i_1, i_2 , \ldots , i_K} = {\cal T }_{l ; i_1, i_2 , \ldots , i_K} - \bar{{\cal T }}_{ i_1, i_2 , \ldots , i_K} - ( {\cal T }_{l ; - i_1, i_2 , \ldots , i_K} - \bar{{\cal T }}_{ - i_1, i_2 , \ldots , i_K} )^\top \hat{\bf{\theta}}_{i_1},
$$
where $\bar{{\cal T }} = \frac{1}{n}\sum \limits_{l=1}^{n} {\cal T }_{l }$. The sample covariance of residuals is
$$
\hat{\varrho}_{i,j}= \frac{m_1}{(n-1) m } \sum \limits_{l=1}^{n} \sum \limits_{i_2=1}^{m_2} \cdots \sum \limits_{i_K=1}^{m_K} \hat{\xi}_{l ; i, i_2 , \ldots , i_K} \hat{\xi}_{l ; j, i_2 , \ldots , i_K} .
$$
Test statistic is
$$\tau_{i,j} = \frac{\hat{\varrho}_{i.j} + \mu_{i,j}}{\varpi}, \forall 1 \le i < j \le m_1.$$
The bias correction term $\mu_{i,j}=\hat{\varrho}_{i,i} \hat{\theta}_{i , j} + \hat{\varrho}_{j,j} \hat{\theta}_{j-1, i}$ translates the mean of $\tau_{i,j}$ to zero under $H_{01, ij}$. The variance correction term
$$
\varpi^2 = \frac{m \cdot \|\widehat{\bf{S}}_2\|_F^2 \cdots \|\widehat{\bf{S}}_K\|_F^2}{m_1 \cdot (\textrm{tr}(\widehat{\bf{S}}_2))^2 \cdots (\textrm{tr}(\widehat{\bf{S}}_K))^2},
$$
plays a significant role in rescaling $\tau_{i,j}$ into an asymptotic standard normal distribution, where
$
\widehat{\bf{S}}_k := \frac{m_k}{nm} \sum_{i=1}^n \widehat{\bf{V}}_i \widehat{\bf{V}}_i^{\top}
$
is the estimate of $\bf{\Sigma}_k$ with $\widehat{\bf{V}}_i := \big[ {\cal T}_i \times \bigl\{\widehat{\bf{\Omega}}_1^{1/2},\ldots,\widehat{\bf{\Omega}}_{k-1}^{1/2}, 1_{m_k}, \widehat{\bf{\Omega}}_{k+1}^{1/2},\ldots,\widehat{\bf{\Omega}}_{K}^{1/2} \bigr\} \big]_{(k)}$ and $\widehat{\bf{\Omega}}_{k}$ from **Tlasso** Algorithm, $k \in \{2 , \ldots , K \}$. [Lyu et al. (2019)][3] proves that, under certain conditions,
$$ \sqrt{\frac{(n-1) m }{ m_1 \hat{\varrho}_{i,i} \hat{\varrho}_{j,j} }} \tau_{i,j} \rightarrow \textrm{N} ( 0 ;1 ) $$
in distribution, as $nm/m_1 \rightarrow \infty$.
Let $\tilde{\tau}_{i,j}= \sqrt{(n-1) m/ (m_1\hat{\varrho}_{i,i} \hat{\varrho}_{j,j}) } \tau_{i,j}$. Given the values of $\tilde{\tau}_{ij}$ and thresholding level $\varsigma$, we define a test procedure $\varphi_{\varsigma} (\tilde{\tau}_{i,j})= 1 \{ |\tilde{\tau}_{i,j}| \ge \varsigma \}$ and reject $H_{01, ij}$ if $\varphi_{\varsigma} (\tilde{\tau}_{i,j})=1$. The definition of FDR/FDP in our notation is
$$
\textrm{FDP} = \frac{| \{ (i,j) \in {\cal H}_0 : \varphi_{\varsigma} (\tilde{\tau}_{i,j})=1 \} | }{ | \{ (i,j) : 1 \le i < j \le m_1, \varphi_{\varsigma} (\tilde{\tau}_{i,j})=1 \} | \vee 1 } \, \text{ , and } \,\textrm{FDR} = \bf{E} (\textrm{FDP}).
$$
where ${\cal H}_0 = \{ (i,j) : [\bf{\Omega}_1^*]_{i,j} = 0 , 1 \le i < j \le m_1 \}$.
To recover the support of $\bf{\Omega}_1^*$, $(m_1-1)m_1/2$ tests are simultaneously conducted. Hence, a small enough $\varsigma$ that enhances power while controls FDP under a pre-specific level $\upsilon \in (0,1)$ is an ideal choice. In particular, $\varsigma_{*} = \inf \{ \varsigma > 0: \text{FDP} \le \upsilon \}$. However, $\varsigma_*$ is oracle since we have no access to the information of ${\cal H}_0$. Due to sparsity, $w_0 =|{\cal H}_0|$ can be approximated by $w= m_1(m_1 -1 )/2$. $P(\varphi_{\varsigma} (\tilde{\tau}_{i,j})=1)$ is close to $2(1 - \Phi( \varsigma))$ by test consistency. The approximation of $\varsigma_*$ is
$$
\hat{\varsigma}= \inf \bigg \{ \varsigma > 0: \frac{2(1-\Phi( \varsigma )) w}{ | \{ (i,j) : 1 \le i < j \le m_1, \varphi_{\varsigma} (\tilde{\tau}_{i,j})=1 \} | \vee 1 } \le \upsilon \bigg \}.
$$
FDR/FDP control procedure is to infer the support of $\bf{\Omega}_1^*$ at thresholding level $\hat{\varsigma}$. In particular, FDR and FDP in the procedure are
$$
\textrm{FDP}_1 = \frac{| \{ (i,j) \in {\cal H}_0 : \varphi_{\hat{\varsigma}} (\tilde{\tau}_{i,j})=1 \} | }{ | \{ (i,j) : 1 \le i < j \le m_1, \varphi_{\hat{\varsigma}} (\tilde{\tau}_{i,j})=1 \} | \vee 1 } \text{ , and } \,\textrm{FDR}_1 = \bf{E} (\textrm{FDP}_1).
$$
Note that we set $\tilde{\tau}_{i,j} = \tilde{\tau}_{j,i}$ for $1 \le i < j \le m_1$. The inference of $\textrm{supp}(\bf{\Omega}_k^*) , \, k \in \{2 , \ldots , K \}$, is a symmetric procedure.
[Lyu et al. (2019)][3] gives the asymptotic control of $\text{FDP}_1$ and $\text{FDR}_1$ for the support of $\bf{\Omega}_{1}^*$: under certain conditions,
$$\textrm{FDP}_1 w / \upsilon w_0 \rightarrow 1 , \; \textrm{FDR}_1 w / \upsilon w_0 \rightarrow 1 $$
in probability as $nm/m_1 \rightarrow \infty$.
[Back to Top](#top)
## Functions
This packages contains following functions:
1. `ChainOmega`
: This function generates precision matrix of triangle graph (chain like network) following the set-up in [Fan et al. (2009)][4]. It first constructs a covariance matrix $\Sigma$ that its (i,j) entry is $\exp (- | h_i - h_j |/2)$ with $h_1 < h_2 < \ldots < h_p$. The difference $h_i - h_{i+1}$ is generated i.i.d. from Unif(0.5,1). See [Fan et al. (2009)][4] for more details.
1. `NeighborOmega`
: This function generates precision matrix of nearest neighbor network following the set-up in [Li and Gui (2006)][5] and [Lee and Liu (2006)][6]. For a `knn` nearest-neighbor graph, this function first randomly picks p points from a unit square and computes all pairwise distances among the points. Then it searches for the knn nearest-neighbors of each point and a pair of symmetric entries in the precision matrix that has a random chosen value from $[-1, -0.5] \cup [0.5, 1]$. Finally, to ensure positive definite property, it normalizes the matrix as $\Omega <- \Omega + (\lambda_{\min} (\Omega) + 0.2 ) 1_p$ where $\lambda_{\min} (\cdot )$ refers to the samllest eigenvalue.
1. `Trnorm`
: This function generates obeservations from separable tensor normal distribution and returns a $m_1 * \ldots * m_K * n$ array. If `Sigma.list` is not given, default distribution is from either triangle graph or nearest-neighbor graph (depends on `type`).
1. `Tlasso.fit`
: This function conducts an alternating optimization algorithm to precision matrices of sparse tensor graphical models. The output is optimal consistent even when $T=1$, see [Lyu et al. (2019)][3] for details. There are two ternimation criteria, `T` and `thres`. Algorithm will be terminated if output in certain iteration change less than `thres`. Otherwise, T iterations will be fully operated.
1. `covres`
: This function generates sample covariance matrix of residuals (includes diagnoal) and is the basis for support recovery procedure, see [Lyu et al. (2019)][3]. Note that output matrix includes diagnoal while bias corrected matrix (output of `biascor`) for inference is off-diagnoal. Elements in Omega.list are true precision matrices or estimation of the true ones, the latter can be output of `Tlasso.fit`.
1. `biascor`
: This function computes bias corrected sample covariance matrix of residuals (excludes diagnoal, diagnoal is zero vector). Note that output matrix excludes diagnoal while sample covariance of residuals includes diagnoal, see [Lyu et al. (2019)][3] for details. Elements in `Omega.list` are true precision matrices or estimation of the true ones, the latter can be output of `Tlasso.fit`.
1. `varcor`
: This function computes variance correction term of sample covariance of residuals and is utilized to normalize test statistic into standord normal, see [Lyu et al. (2019)][3] for details.
1. `est.analysis`
: This function generates a list of performance measures of optimazation for sparse tensor graphical models, i.e., estimation errors and model selection consistency. Errors are measured in Frobenius norm and Max norm. Model selection measures are TPR and TNR. All these measures are computed in each mode, average across all modes, and kronecker production of precision matrices.
1. `infer.analysis`
: This function computes performance measures of inference for sparse tensor graphical models. False positive, false negative, discovery (number of rejected null hypothesis), non-discovery (number of non-rejected null hypothesis), and total non-zero entries of each true precision matrix is listed in output.
1. `graph.pattern`
: This function draws an undirected graph based on presicion matrix to present connection among variables. If an entry is zero, then no edge is connected between corresponding pair of nodes.
[Back to Top](#top)
## Quick Start
The purpose of this section is to show users the basic usage of this package. We will briefly go through main functions, see what they can do and have a look at outputs. An detailed example of complete procedures of estimation and inference will be presented to give users a general sense of the pakcage.
First, we load `Tlasso` package:
```{r}
library(Tlasso)
```
Then, we generate a list of precision matrices of triangle graph.
```{r}
m.vec = c(5,5,5) # dimensionality of a tensor
# m1, m2, m3
n = 5 # sample size
Omega.true.list = list()
for (k in 1:length(m.vec)) {
Omega.true.list[[k]] = ChainOmega(m.vec[k], sd = k)
}
Omega.true.list[[1]]
```
`ChainOmega` returns a precision matrix of triangle graph with dimension `m.vec[k]` and seed number `k`. Given precision matrices, we generate obervations from corresponding tensor normal distribution.
```{r}
Sigma.true.list = list()
for (k in 1:length(m.vec)) {
Sigma.true.list[[k]] = solve(Omega.true.list[[k]])
} # generate covariance matrices list
DATA=Trnorm(n,m.vec,Sigma.list=Sigma.true.list)
# obersavations from tensor normal distribution
```
`Trnorm` generates observations from separable tensor normal distribution with covariance matrix `Sigma.list[[k]]` for kth mode. `DATA` is a $m_1 * m_2 * m_3 * n$ array, i.e, a $10 \times 10 \times 10 \times 10$ array. Default distribution is from triangle graph or 4 near-neighbor graph (depends on `type`).
```{r, eval=FALSE}
DATA2=Trnorm(n,m.vec)
# default is triangle graph
# equivalent to DATA2 = Trnorm(n,m.vec, type='Chain', sd=1)
DATA3=Trnorm(n,m.vec,type='Neighbor')
# 4 nearest-neighbor graph
# equivalent to DATA3 = Trnorm(n,m.vec, type='Neighbor', sd=1, knn=4)
```
Given observations `DATA`, we use `Tlasso.fit` to conduct alternating optimization.
```{r}
lambda.thm = 20*c( sqrt(log(m.vec[1])/(n*prod(m.vec))),
sqrt(log(m.vec[2])/(n*prod(m.vec))),
sqrt(log(m.vec[3])/(n*prod(m.vec))))
# lambda.thm is regularization parameter
out.tlasso = Tlasso.fit(DATA,T=1,lambda.vec = lambda.thm)
# output is a list of estimation of precision matrices
out.tlasso[[1]]
```
`Tlasso.fit` generates a list of precision matrices `out.tlasso`. Default is maximal iteration `T=1` and termination thresholding level `thres=1e-05`. If estimation output changes less than `thres`, in terms of Frobenius norm, after certain iteration, `Tlasso.fit` will be terminated immediately (before Tth iteration). The performance of `Tlasso.fit` can be evaluated by `est.analysis`.
```{r}
# compare out.tlasso and Omega.true.list
# main diagnoal is taken into consideration
est.analysis(out.tlasso,Omega.true.list,offdiag=FALSE)
```
`est.analysis` returns a list of estimation performance measures:
| Argument | Explanation |
|------------------ | --------------------------------------------------|
| `Out$error.kro` | error in Frobenius norm of kronecker product |
| `Out$tpr.kro` | TPR of kronecker product |
| `Out$tnr.kro` | TNR of kronecker product |
| `Out$av.error.f` | averaged Frobenius norm error across all modes |
| `Out$av.error.max` | averaged Max norm error across all modes |
| `Out$av.tpr` | averaged TPR across all modes |
| `Out$av.tnr` | averaged TNR across all modes |
| `Out$error.f` | vector; error in Frobenius norm of each mode |
| `Out$error.max` | vector; error in Max norm of each mode |
| `Out$tpr` | vector; TPR of each mode |
| `Out$tnr` | vector; TNR of each mode |
Given `DATA` and `out.tlasso`, we next show how to compute test statistic.
```{r}
mat.list=list() # list of matrices of test statistic value
for ( k in 1:length(m.vec)) {
rho=covres(DATA, out.tlasso, k = k)
# sample covariance matrix of residuals, including diagnoal
bias_rho=biascor(rho,out.tlasso,k=k)
# bias corrected sample covariance of residuals, excluding diagnoal
varpi2=varcor(DATA, out.tlasso, k = k)
# variance correction term for kth mode's sample covariance of residuals
tautest=matrix(0,m.vec[k],m.vec[k])
for( i in 1:(m.vec[k]-1)) {
for ( j in (i+1):m.vec[k]){
tautest[j,i]=tautest[i,j]=sqrt((n-1)*prod(m.vec[-k]))*
bias_rho[i,j]/sqrt(varpi2*rho[i,i]*rho[j,j])
# compute final test statistic
}
}
mat.list[[k]]=tautest
}
mat.list[[1]]
```
To compute test statistic, we first need to compute sample covariance of residuals via `rho`. `rho` returns a sample covariance matrix, including diagnoal. Then we conduct bias correction `biascor` and variance correction `varcor`. `biascor` returns a bias corrected sample covariance matrix. `varcor` returns a scalar to scale test statistic into standard normal. Given test statistic value `mat.list`, we turn to test hypothesis. The significant level we choose is $0.95$.
```{r}
# inference measures (off-diagnoal), critical value is 0.975 quantile of standard normal
infer.analysis(mat.list, qnorm(0.975), Omega.true.list, offdiag=TRUE)
```
`infer.analysis` returns a list of inference performance measures:
| Argument | Explanation |
|--------------------|-----------------------------------------------------------|
| `Out$fp` | vector; number of false positive of each mode |
| `Out$fn` | vector; number of false negative of each mode |
| `Out$d` | vector; number of all discovery of each mode |
| `Out$nd` | vector; number of all non-discovery of each mode |
| `Out$t` | vector; number of all true non-zero entries of each mode |
Due to the fact that this inference procedure relies on multiple testing, FDR control is indispensible. [Lyu et al. (2019)][3] provides an easy-implemented and efficient FDR control procedure. This procedure asymptotically controls FDR via selecting the smallest critical value that contrains FDP under certain level.
```{r, eval=FALSE}
k=1 # interested mode
upsilon=0.1 # control level
# compute the difference between FDP and upsilon
fun=function(varsigma,mk,upsilon,tautest) {
return((2*(1-pnorm(varsigma))*mk*(mk-1))/max(1,sum(sign(abs(tautest) > varsigma))) - upsilon)
}
# select a critical value in (0,6) that has the samllest difference
diff=c();ind=1;inter=seq(0,6,0.0001)
for (varsigma in inter) {
diff[ind]=fun(varsigma,mk=m.vec[k],upsilon=upsilon,tautest=mat.list[[k]])
ind=ind+1
}
# the smallest critical value that constrains FDP under upsilon
critical=inter[min(which(diff < 0))]
# testing hypothesis with the critcal value
# FDR will converge to the limit proved in Lyu et al. 2019.
inference.FDR=infer.analysis(mat.list, critical, Omega.true.list, offdiag=TRUE)
```
Finally, we would like to visualize graph structure of inference via `graph.pattern`.
```{r, fig.height=5 , fig.width=5, fig.align='center' }
k=1 # interested mode
# true graph structure.
# set thres=0 in case true edge is eliminated
graph.pattern(Omega.true.list[[1]],main='True graph of mode 1',thres=0)
inf.mat=mat.list[[k]] > qnorm(0.975)
# set thres=0 (<1) since inf.mat is logical
graph.pattern(inf.mat,main='Inference of mode 1',thres=0)
```
From graphs we can see that inference result quite matches true graph.
[Back to Top](#top)
## Reference
1. Fan J, Feng Y, Wu Y. *Network exploration via the adaptive LASSO and SCAD penalties.* ***The annals of applied statistics***, 2009, **3(2): 521**.
1. Friedman J, Hastie T, Tibshirani R. *Sparse inverse covariance estimation with the graphical lasso.* ***Biostatistics***, 2008: **9.3: 432-441**.
1. Lee W, Liu Y. *Joint estimation of multiple precision matrices with common structures.* ***Journal of Machine Learning Research***, 2015, **16: 1035-1062**.
1. Li H, Gui J. *Gradient directed regularization for sparse Gaussian concentration graphs, with applications to inference of genetic networks.* ***Biostatistics***, 2006, **7(2): 302-317**.
1. Lyu, X., Sun, W. W., Wang, Z., Liu, H., Yang, J., Cheng, G. *Tensor graphical model: Non-convex optimization and statistical inference.* ***IEEE transactions on pattern analysis and machine intelligence***, 2019: **42(8): 2024-2037**.
[2]: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3019769/
[3]: https://arxiv.org/abs/1609.04522
[4]: https://arxiv.org/abs/0908.2053
[5]: https://paperity.org/p/38773767/gradient-directed-regularization-for-sparse-gaussian-concentration-graphs-with
[6]: https://jmlr.org/papers/v16/lee15a.html