Penalized Precision Matrix Estimation in grasps

Preliminary

Consider the following setting:

Gaussian graphical model (GGM) assumption:
The data $X_{n \times p}$ consists of $n$ independent and identically distributed samples $X_1, \ldots, X_n \sim N_p(\mu, \Sigma)$.
Disjoint group structure:
The p variables can be partitioned into disjoint groups.
Goal:
Estimate the precision matrix $\Omega = \Sigma^{-1} = (\omega_{ij})_{p \times p}$.

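Sparse-Group Estimator

$$ \hat{\Omega}(\lambda,\alpha,\gamma) = \arg\min_{\Omega \succ 0} \left\{ -\log\det(\Omega) + \mathrm{tr}(S\Omega) + \mathcal{P}_{\lambda,\alpha,\gamma}(\Omega) \right\}, $$

$$ \mathcal{P}_{\lambda,\alpha,\gamma}(\Omega) = \alpha\,\mathcal{P}^{\text{idv}}_{\lambda,\gamma}(\Omega) + (1-\alpha)\,\mathcal{P}^{\text{grp}}_{\lambda,\gamma}(\Omega), $$

$$ \mathcal{P}^{\text{idv}}_{\lambda,\gamma}(\Omega) = \sum_{i,j} P_{\lambda,\gamma}(\vert\omega_{ij}\vert), \qquad \mathcal{P}^{\text{grp}}_{\lambda,\gamma}(\Omega) = \sum_{g,g^\prime} P_{\lambda,\gamma}(\Vert\Omega_{gg^\prime}\Vert_F), $$

where: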

$S = n^{-1} \sum_{i=1}^n (X_i-\bar{X})(X_i-\bar{X})^\top$ is the empirical covariance matrix.
λ ≥ 0 is the global regularization parameter controlling overall shrinkage.
α ∈ [0, 1] is the mixing parameter controlling the balance between element-wise and block-wise penalties.
γ is the additional parameter for non-convex penalties, controlling the degree of nonconvexity (or concavity) of the penalty function.
$\mathcal{P}_{\lambda,\alpha,\gamma}(\Omega)$ is a generic bi-level penalty template that combines element-wise and block-wise regularization, accommodating convex or non-convex regularizers while preserving the intrinsic group structure among variables.
$\mathcal{P}^{\text{idv}}_{\lambda,\gamma}(\Omega)$ is the element-wise individual penalty component.
$\mathcal{P}^{\text{grp}}_{\lambda,\gamma}(\Omega)$ is the block-wise group penalty component.
$P_{\lambda,\gamma}(\cdot)$ is the penalty function.
$\Omega_{gg^\prime}$ is the submatrix of $\Omega$ with rows from group $g$ and columns from group $g^\prime$.
The Frobenius norm is defined as $\Vert\Omega\Vert_F = (\sum_{i,j} \vert\omega_{ij}\vert^2)^{1/2} = [\mathrm{tr}(\Omega^\top\Omega)]^{1/2}$.

Note:

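The parameter γ is relevant only for the adaptive lasso, where it is the exponent of the adaptive weights, and for the non-convex penalties, where it controls the degree of nonconvexity. The Lasso penalty can be viewed as the special case in which γ is not required.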

Penalties

Lasso: Least absolute shrinkage and selection operator (Tibshirani 1996; Friedman et al. 2008)

$$ P_\lambda(\omega_{ij}) = \lambda\vert\omega_{ij}\vert. $$

Adaptive lasso (Zou 2006; Fan et al. 2009)

$$ P_{\lambda,\gamma}(\omega_{ij}) = \lambda\frac{\vert\omega_{ij}\vert}{v_{ij}}, $$ where $V = (v_{ij})_{p \times p} = (\vert\tilde{\omega}_{ij}\vert^{\gamma})_{p \times p}$ is a matrix of adaptive weights, and $\tilde{\omega}_{ij}$ is the initial estimate obtained using penalty = "lasso".
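The adaptive weights can be sketched elementwise in base R. `adlasso_penalty()` is a hypothetical helper, not a grasps function, and `omega_init` merely stands in for the initial lasso estimate $\tilde{\Omega}$; the numbers are made up for illustration.

```r
# Hypothetical helper (not part of grasps): elementwise adaptive lasso penalty.
adlasso_penalty <- function(omega, omega_init, lambda, gamma) {
  V <- abs(omega_init)^gamma          # adaptive weights v_ij = |tilde-omega_ij|^gamma
  lambda * abs(omega) / V             # P(omega_ij) = lambda * |omega_ij| / v_ij
}

omega_init <- matrix(c(2, 0.5, 0.5, 1), 2, 2)   # stand-in for the lasso estimate
omega      <- matrix(c(1, 0.2, 0.2, 1), 2, 2)
P <- adlasso_penalty(omega, omega_init, lambda = 0.1, gamma = 1)
```

Entries whose initial estimate is small receive a proportionally larger penalty. If $\tilde{\omega}_{ij} = 0$ the weight is zero, and this sketch returns `Inf` (or `NaN` when $\omega_{ij} = 0$ as well) without guarding against that case.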

Atan: Arctangent type penalty (Wang and Zhu 2016)

$$ P_{\lambda,\gamma}(\omega_{ij}) = \lambda(\gamma+\frac{2}{\pi}) \arctan\left(\frac{\vert\omega_{ij}\vert}{\gamma}\right), \quad \gamma > 0. $$
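A direct base R transcription of the Atan formula follows; `pen_atan()` is a hypothetical helper, not the grasps API, and γ = 0.005 is an illustrative value. Because arctan(t) → π/2 as t → ∞, the penalty is bounded above by λ(γ + 2/π)·π/2 = λ(1 + γπ/2), which the example evaluates numerically.

```r
# Hypothetical helper (not part of grasps): Atan penalty, vectorized over omega.
pen_atan <- function(omega, lambda, gamma) {
  stopifnot(gamma > 0)                           # Atan requires gamma > 0
  lambda * (gamma + 2 / pi) * atan(abs(omega) / gamma)
}

lambda <- 1; gamma <- 0.005                      # illustrative values only
vals  <- pen_atan(c(-2, 0, 0.01, 2, 1e8), lambda, gamma)
bound <- lambda * (1 + gamma * pi / 2)           # supremum as |omega| -> Inf
```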

Exp: Exponential type penalty (Wang et al. 2018)

$$ P_{\lambda,\gamma}(\omega_{ij}) = \lambda\left[1-\exp\left(-\frac{\vert\omega_{ij}\vert}{\gamma}\right)\right], \quad \gamma > 0. $$
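The Exp penalty is bounded by λ, and as γ → 0 it approaches λ times the indicator 1(ω_ij ≠ 0), which is the sense in which it approximates the L₀ penalty (as the title of Wang et al. 2018 indicates). The sketch below uses a hypothetical helper `pen_exp()`, not a grasps function, with illustrative parameter values.

```r
# Hypothetical helper (not part of grasps): Exp penalty, vectorized over omega.
pen_exp <- function(omega, lambda, gamma) {
  stopifnot(gamma > 0)                           # Exp requires gamma > 0
  lambda * (1 - exp(-abs(omega) / gamma))
}

lambda  <- 2
p_gamma <- pen_exp(0.5, lambda, gamma = 0.5)               # lambda * (1 - exp(-1))
p_small <- pen_exp(c(0, 0.1, 1), lambda, gamma = 0.01)     # close to lambda * 1(omega != 0)
```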

Lq (Frank and Friedman 1993; Fu 1998; Fan and Li 2001)

$$ P_{\lambda,\gamma}(\omega_{ij}) = \lambda\vert\omega_{ij}\vert^{\gamma}, \quad 0 < \gamma < 1. $$
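Compared with the Lasso, $\vert\omega\vert^\gamma$ with 0 < γ < 1 exceeds $\vert\omega\vert$ when $\vert\omega\vert < 1$ and falls below it when $\vert\omega\vert > 1$, so small entries are penalized more and large entries less. A minimal sketch with a hypothetical helper `pen_lq()` (not a grasps function), using γ = 0.5 for illustration:

```r
# Hypothetical helper (not part of grasps): Lq penalty, vectorized over omega.
pen_lq <- function(omega, lambda, gamma) {
  stopifnot(gamma > 0, gamma < 1)                # Lq requires 0 < gamma < 1
  lambda * abs(omega)^gamma
}

w     <- c(0.25, 1, 4)
lq    <- pen_lq(w, lambda = 1, gamma = 0.5)      # sqrt(|w|): 0.5, 1, 2
lasso <- 1 * abs(w)                              # lambda * |w|: 0.25, 1, 4
```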

LSP: Log-sum penalty (Candès et al. 2008)

$$ P_{\lambda,\gamma}(\omega_{ij}) = \lambda\log\left(1+\frac{\vert\omega_{ij}\vert}{\gamma}\right), \quad \gamma > 0. $$
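The LSP grows without bound, but only logarithmically. Its derivative for ω_ij ≠ 0 is λ/(γ + |ω_ij|), which mirrors the 1/(|x| + ε) weights of reweighted ℓ₁ minimization (Candès et al. 2008). The helpers `pen_lsp()` and `dpen_lsp()` below are hypothetical, not part of grasps; the example compares the closed-form derivative with a central finite difference.

```r
# Hypothetical helpers (not part of grasps): LSP penalty and its derivative.
pen_lsp <- function(omega, lambda, gamma) {
  stopifnot(gamma > 0)                           # LSP requires gamma > 0
  lambda * log1p(abs(omega) / gamma)             # log1p(x) = log(1 + x)
}
# Derivative with respect to |omega| (valid for omega != 0)
dpen_lsp <- function(omega, lambda, gamma) lambda / (gamma + abs(omega))

h <- 1e-5
fd <- (pen_lsp(1 + h, 1, 0.5) - pen_lsp(1 - h, 1, 0.5)) / (2 * h)  # finite difference
```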

MCP: Minimax concave penalty (Zhang 2010)

$$ P_{\lambda,\gamma}(\omega_{ij}) = \begin{cases} \lambda\vert\omega_{ij}\vert - \dfrac{\omega_{ij}^2}{2\gamma}, & \text{if } \vert\omega_{ij}\vert \leq \gamma\lambda, \\ \dfrac{1}{2}\gamma\lambda^2, & \text{if } \vert\omega_{ij}\vert > \gamma\lambda. \end{cases} \quad \gamma > 1. $$
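The two MCP branches meet at |ω_ij| = γλ, where λ·γλ − (γλ)²/(2γ) = γλ²/2, so the penalty is continuous and constant beyond γλ. A base R transcription with a hypothetical helper `pen_mcp()` (not the grasps API), using λ = 1 and γ = 3 for illustration:

```r
# Hypothetical helper (not part of grasps): MCP, vectorized over omega.
pen_mcp <- function(omega, lambda, gamma) {
  stopifnot(gamma > 1)                           # MCP requires gamma > 1
  a <- abs(omega)
  ifelse(a <= gamma * lambda,
         lambda * a - a^2 / (2 * gamma),         # concave quadratic near zero
         0.5 * gamma * lambda^2)                 # flat beyond gamma * lambda
}

lambda <- 1; gamma <- 3
vals <- pen_mcp(c(0, 0.5, 3, 5, -5), lambda, gamma)
```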

SCAD: Smoothly clipped absolute deviation (Fan and Li 2001; Fan et al. 2009)

$$ P_{\lambda,\gamma}(\omega_{ij}) = \begin{cases} \lambda\vert\omega_{ij}\vert & \text{if } \vert\omega_{ij}\vert \leq \lambda, \\ \dfrac{2\gamma\lambda\vert\omega_{ij}\vert-\omega_{ij}^2-\lambda^2}{2(\gamma-1)} & \text{if } \lambda < \vert\omega_{ij}\vert < \gamma\lambda, \\ \dfrac{\lambda^2(\gamma+1)}{2} & \text{if } \vert\omega_{ij}\vert \geq \gamma\lambda. \end{cases} \quad \gamma > 2. $$
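The three SCAD branches join continuously: the middle branch equals (2γλ² − 2λ²)/(2(γ − 1)) = λ² at |ω_ij| = λ, and (γ²λ² − λ²)/(2(γ − 1)) = λ²(γ + 1)/2 at |ω_ij| = γλ. The sketch below uses a hypothetical helper `pen_scad()` (not the grasps API) with λ = 1 and γ = 3.7, the value suggested by Fan and Li (2001); it is not asserted to be a grasps default.

```r
# Hypothetical helper (not part of grasps): SCAD, vectorized over omega.
pen_scad <- function(omega, lambda, gamma) {
  stopifnot(gamma > 2)                           # SCAD requires gamma > 2
  a <- abs(omega)
  ifelse(a <= lambda,
         lambda * a,                                             # Lasso part
         ifelse(a < gamma * lambda,
                (2 * gamma * lambda * a - a^2 - lambda^2) / (2 * (gamma - 1)),
                lambda^2 * (gamma + 1) / 2))                     # constant part
}

vals <- pen_scad(c(0.5, 1, 2, 3.7, 10), lambda = 1, gamma = 3.7)
```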

Note:

For Lasso, which is convex, the additional parameter γ is not required, and the penalty function $P_{\lambda,\gamma}(\cdot)$ simplifies to $P_\lambda(\cdot)$.

Illustrative Visualization

Figure 1 compares the penalty functions P(ω) over ω ∈ [−4, 4] with λ = 1. The main panel (right) shows their behavior for larger |ω|, while the inset panel (left) magnifies the region near zero, ω ∈ [−1, 1].

library(grasps) ## for penalty computation
library(ggplot2) ## for visualization

penalties <- c("atan", "exp", "lasso", "lq", "lsp", "mcp", "scad")

pen_df <- compute_penalty(seq(-4, 4, by = 0.01), penalties, lambda = 1)
plot(pen_df, xlim = c(-1, 1), ylim = c(0, 1), zoom.size = 1) +
  guides(color = guide_legend(nrow = 2, byrow = TRUE))

Figure 1: Illustrative penalty functions.

Figure 2 displays the derivatives P′(ω) of the penalties for ω ∈ [0, 4] with λ = 1. The Lasso has a constant derivative λ, corresponding to uniform shrinkage of all entries. The MCP and SCAD derivatives are piecewise: the SCAD derivative equals the Lasso derivative for |ω| ≤ λ, decreases linearly for λ < |ω| < γλ, and is zero for |ω| ≥ γλ, whereas the MCP derivative starts at λ and decreases linearly to zero at |ω| = γλ. In both cases, entries with |ω| ≥ γλ receive no shrinkage. The other non-convex penalties have derivatives that decrease smoothly as |ω| grows, so they shrink small |ω| strongly while exerting progressively less shrinkage on large ones.

deriv_df <- compute_derivative(seq(0, 4, by = 0.01), penalties, lambda = 1)
plot(deriv_df) +
  scale_y_continuous(limits = c(0, 1.5)) +
  guides(color = guide_legend(nrow = 2, byrow = TRUE))

Figure 2: Illustrative penalty derivatives.
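Figure 2 is produced by `compute_derivative()`. As an independent cross-check of the shapes it shows, the sketch below codes the derivatives obtained by differentiating each formula in the Penalties section with respect to |ω|, valid for |ω| > 0 (the Lq derivative λγ|ω|^(γ−1) diverges at zero). `deriv_pen()` is a hypothetical helper, not a grasps function, and the γ values are illustrative assumptions rather than package defaults.

```r
# Hypothetical helper (not part of grasps): closed-form dP/d|omega| for |omega| > 0.
deriv_pen <- function(a, penalty, lambda = 1, gamma) {
  switch(penalty,
    lasso = rep(lambda, length(a)),                             # constant
    atan  = lambda * (gamma + 2 / pi) * gamma / (gamma^2 + a^2),
    exp   = (lambda / gamma) * exp(-a / gamma),
    lq    = lambda * gamma * a^(gamma - 1),
    lsp   = lambda / (gamma + a),
    mcp   = pmax(lambda - a / gamma, 0),                        # zero beyond gamma * lambda
    scad  = ifelse(a <= lambda, lambda,                         # Lasso slope up to lambda
                   pmax(gamma * lambda - a, 0) / (gamma - 1)),
    stop("unknown penalty: ", penalty))
}

a      <- seq(0.01, 4, by = 0.01)     # start above 0 because of the Lq singularity
d_scad <- deriv_pen(a, "scad", lambda = 1, gamma = 3.7)
d_mcp  <- deriv_pen(a, "mcp",  lambda = 1, gamma = 3)
```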

Reference

Candès, Emmanuel J., Michael B. Wakin, and Stephen P. Boyd. 2008. “Enhancing Sparsity by Reweighted ℓ₁ Minimization.” Journal of Fourier Analysis and Applications 14 (5): 877–905. https://doi.org/10.1007/s00041-008-9045-x.

Fan, Jianqing, Yang Feng, and Yichao Wu. 2009. “Network Exploration via the Adaptive LASSO and SCAD Penalties.” The Annals of Applied Statistics 3 (2): 521–41. https://doi.org/10.1214/08-aoas215.

Fan, Jianqing, and Runze Li. 2001. “Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties.” Journal of the American Statistical Association 96 (456): 1348–60. https://doi.org/10.1198/016214501753382273.

Frank, Ildiko E., and Jerome H. Friedman. 1993. “A Statistical View of Some Chemometrics Regression Tools.” Technometrics 35 (2): 109–35. https://doi.org/10.1080/00401706.1993.10485033.

Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. 2008. “Sparse Inverse Covariance Estimation with the Graphical Lasso.” Biostatistics 9 (3): 432–41. https://doi.org/10.1093/biostatistics/kxm045.

Fu, Wenjiang J. 1998. “Penalized Regressions: The Bridge Versus the Lasso.” Journal of Computational and Graphical Statistics 7 (3): 397–416. https://doi.org/10.1080/10618600.1998.10474784.

Tibshirani, Robert. 1996. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society: Series B (Methodological) 58 (1): 267–88. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x.

Wang, Yanxin, Qibin Fan, and Li Zhu. 2018. “Variable Selection and Estimation Using a Continuous Approximation to the L₀ Penalty.” Annals of the Institute of Statistical Mathematics 70 (1): 191–214. https://doi.org/10.1007/s10463-016-0588-3.

Wang, Yanxin, and Li Zhu. 2016. “Variable Selection and Parameter Estimation with the Atan Regularization Method.” Journal of Probability and Statistics 2016: 6495417. https://doi.org/10.1155/2016/6495417.

Zhang, Cun-Hui. 2010. “Nearly Unbiased Variable Selection Under Minimax Concave Penalty.” The Annals of Statistics 38 (2): 894–942. https://doi.org/10.1214/09-AOS729.

Zou, Hui. 2006. “The Adaptive Lasso and Its Oracle Properties.” Journal of the American Statistical Association 101 (476): 1418–29. https://doi.org/10.1198/016214506000000735.