Description

The Gaussian graphical model (GGM)-based framework, which focuses on the precision matrix and conditional dependence, is an increasingly popular paradigm for heterogeneity analysis and is more informative than approaches limited to simple distributional properties. In GGM-based analyses, determining the number of subgroups is a challenging and important task. This package, HeteroGGM, contains a recently developed method based on penalized fusion that determines the number of subgroups and recovers the subgrouping structure in a fully data-dependent way. The package also includes Gaussian graphical mixture model methods that require the number of subgroups to be given. The main functions are as follows.

GGMPF: This function implements the GGM-based heterogeneity analysis via penalized fusion (Ren et al., 2021).
PGGMBC: This function implements the penalized GGM-based clustering with unconstrained covariance matrices (Zhou et al., 2009).
summary_network: This function summarizes the characteristics of the resulting network structures, including the overlap of edges across subgroups, the connections of each node, and so on.
plot_network: This function visualizes the network structures.

We note that the penalties p(⋅, λ) used in Ren et al. (2021) and Zhou et al. (2009) are MCP and lasso, respectively. Our package provides a variety of penalties for both methods, including convex and concave penalties. The workflow of each method is described in the sections that follow.
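
As an aside on the penalties mentioned above, the lasso, SCAD, and MCP penalties can be written in a few lines of R. This is a minimal sketch using the standard textbook definitions, not the package's internal code; the function names and the default values `gamma = 3.7` (SCAD) and `gamma = 3` (MCP) are assumptions chosen for illustration.

```r
# Standard penalty functions p(t, lambda), vectorised over t.
# Names and gamma defaults are illustrative assumptions, not package API.
lasso_pen <- function(t, lambda) lambda * abs(t)

scad_pen <- function(t, lambda, gamma = 3.7) {
  a <- abs(t)
  ifelse(a <= lambda,
         lambda * a,                                   # lasso-like near zero
         ifelse(a <= gamma * lambda,
                (2 * gamma * lambda * a - a^2 - lambda^2) / (2 * (gamma - 1)),
                lambda^2 * (gamma + 1) / 2))           # constant for large |t|
}

mcp_pen <- function(t, lambda, gamma = 3) {
  a <- abs(t)
  ifelse(a <= gamma * lambda,
         lambda * a - a^2 / (2 * gamma),               # tapering penalty rate
         gamma * lambda^2 / 2)                         # constant for large |t|
}

t_grid <- c(0, 0.5, 1, 5, 10)
cbind(t = t_grid,
      lasso = lasso_pen(t_grid, 1),
      scad  = scad_pen(t_grid, 1),
      mcp   = mcp_pen(t_grid, 1))
```

The lasso is convex and keeps growing with |t|, whereas SCAD and MCP are concave and level off, so large parameters are penalized no more than moderately large ones.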

GGMPF

Users need to set a relatively large number K, an upper bound on the true number of subgroups K₀, which is usually easy to specify from biological knowledge. A fusion penalty shrinks the differences in parameters among the K subgroups and encourages equality, so that a smaller number of subgroups can be obtained. Three tuning parameters λ₁, λ₂, and λ₃ are involved. λ₁ and λ₂ are the usual parameters that determine the sparsity of the means and precision matrices, respectively, and regularize estimation; the conditional dependence relationships for each subgroup can be obtained by examining the nonzero estimates of the resulting precision matrices. λ₃ is the pivotal parameter that controls how strongly the differences are shrunk, and thereby implements an effective "searching" between 1 and K based on the penalized fusion technique.
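
To illustrate how conditional dependence relationships are read off the nonzero entries of a precision matrix, the following sketch lists the edges of each subgroup. The array `Theta_hat`, its values, and the helper `edges_of` are hypothetical and only mimic the p × p × K layout of precision matrix estimates; they are not output of the package.

```r
# Hypothetical 3 x 3 x 2 array of estimated precision matrices (two subgroups).
Theta_hat <- array(0, dim = c(3, 3, 2))
Theta_hat[, , 1] <- matrix(c(1.0, 0.4, 0.0,
                             0.4, 1.0, 0.0,
                             0.0, 0.0, 1.0), 3, 3)
Theta_hat[, , 2] <- matrix(c(1.0, 0.0, 0.3,
                             0.0, 1.0, 0.2,
                             0.3, 0.2, 1.0), 3, 3)

# Edges of one subgroup: pairs (i, j) with i < j and a nonzero off-diagonal entry.
edges_of <- function(Theta, tol = 1e-8) {
  idx <- which(abs(Theta) > tol & upper.tri(Theta), arr.ind = TRUE)
  colnames(idx) <- c("i", "j")
  idx
}

edge_list <- lapply(seq_len(dim(Theta_hat)[3]),
                    function(k) edges_of(Theta_hat[, , k]))
edge_list
```

In this toy example, subgroup 1 has the single edge (1, 2), while subgroup 2 has the edges (1, 3) and (2, 3), so the two subgroups share no edges.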

Data setting

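Denote by n the number of independent subjects. For subject i (= 1, …, n), a p-dimensional measurement x_i is available. Further assume that the n subjects belong to K₀ subgroups, where the value of K₀ is unknown. For the l-th subgroup, assume the Gaussian distribution

$$f_l(\boldsymbol{x}; \boldsymbol{\mu}_l^*, \boldsymbol{\Sigma}_l^*) = (2\pi)^{-p/2} |\boldsymbol{\Sigma}_l^*|^{-1/2} \exp\left\{ -\frac{1}{2} (\boldsymbol{x} - \boldsymbol{\mu}_l^*)^\top (\boldsymbol{\Sigma}_l^*)^{-1} (\boldsymbol{x} - \boldsymbol{\mu}_l^*) \right\},$$

where the mean $\boldsymbol{\mu}_l^*$ and covariance matrix $\boldsymbol{\Sigma}_l^*$ are unknown. Overall, the x_i's follow the mixture distribution

$$f(\boldsymbol{x}) = \sum_{l=1}^{K_0} \pi_l^* f_l(\boldsymbol{x}; \boldsymbol{\mu}_l^*, \boldsymbol{\Sigma}_l^*),$$

where the mixture probabilities π_l^*'s are also unknown. Our goal is to determine the number of subgroups K₀ and estimate the subgrouping structure in a fully data-dependent way.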
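
The data setting above can be mimicked with a small simulation in base R. This is only a sketch: the sample size, dimension, mixture probabilities, means, and the tridiagonal precision matrix are all hypothetical choices, and the built-in `example.data` used later was not necessarily generated this way.

```r
set.seed(1)
n <- 300                                   # number of subjects (assumed)
p <- 4                                     # dimension (assumed)
pi_star <- c(0.3, 0.3, 0.4)                # hypothetical mixture probabilities, K0 = 3
mu_star <- list(rep(-2, p), rep(0, p), rep(2, p))  # hypothetical subgroup means

# A sparse (tridiagonal) precision matrix shared by all subgroups, for simplicity.
Theta <- diag(p)
Theta[abs(row(Theta) - col(Theta)) == 1] <- 0.3
Sigma <- solve(Theta)                      # covariance = inverse precision
R <- chol(Sigma)                           # Sigma = t(R) %*% R

# Draw a subgroup label for each subject, then x_i ~ N(mu_l, Sigma).
label <- sample(seq_along(pi_star), n, replace = TRUE, prob = pi_star)
X <- t(sapply(label, function(l) mu_star[[l]] + drop(t(R) %*% rnorm(p))))

dim(X)        # n x p data matrix
table(label)  # subgroup sizes (unknown to the analyst in practice)
```

In a real analysis only `X` is observed; both `label` and the number of subgroups have to be estimated.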

Method

GGM-based heterogeneity analysis via penalized fusion is based on a penalized objective function of Ω and π given X, where X denotes the collection of observed data, Ω = (Ω₁^⊤, ⋯, Ω_K^⊤)^⊤, Ω_k = vec(μ_k, Θ_k) = (μ_k1, …, μ_kp, θ_k11, …, θ_kp1, …, θ_k1p, …, θ_kpp) ∈ $\mathbb{R}^{p^2+p}$, Θ_k = Σ_k⁻¹ is the k-th precision matrix with ij-th entry θ_kij, π = (π₁, ⋯, π_K)^⊤, ∥ ⋅ ∥_F is the Frobenius norm, and p(⋅, λ) is a penalty function with tuning parameter λ > 0, which can be chosen as lasso, SCAD, MCP, and others. K is a known constant that satisfies K > K₀.

Let $(\widehat{\boldsymbol{\Omega}}, \widehat{\boldsymbol{\pi}})$ denote the resulting estimate. Denote $\{\widehat{\boldsymbol{\Upsilon}}_1 , \cdots, \widehat{\boldsymbol{\Upsilon}}_{\widehat{K}_0} \}$ as the distinct values of $\widehat{\boldsymbol{\Omega}}$, that is, $\{k: \widehat{\boldsymbol{\Omega}}_k \equiv \widehat{\boldsymbol{\Upsilon}}_l, k=1, \cdots, K \}_{ l=1, \cdots, \widehat{K}_0 }$ constitutes a partition of {1, ⋯, K}. Then there are K̂₀ subgroups, whose estimated mean and precision parameters are given by the distinct values in $\widehat{\boldsymbol{\Omega}}$. The mixture probabilities can be extracted from $\widehat{\boldsymbol{\pi}}$.
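
For orientation, the penalized objective can be sketched in the notation of this section. The display below is a reconstruction under assumptions rather than a quotation: the 1/n scaling, the restriction of the sparsity penalty to off-diagonal entries of Θ_k, and the exact form of the fusion term are taken to follow Ren et al. (2021) and should be checked against that paper. The symbol f_k denotes the p-dimensional Gaussian density.

```latex
\mathcal{L}(\boldsymbol{\Omega}, \boldsymbol{\pi} \mid \boldsymbol{X})
  = \frac{1}{n} \sum_{i=1}^{n} \log\left( \sum_{k=1}^{K} \pi_k
      f_k\left(\boldsymbol{x}_i; \boldsymbol{\mu}_k, \boldsymbol{\Theta}_k^{-1}\right) \right)
  - \sum_{k=1}^{K} \sum_{j=1}^{p} p(|\mu_{kj}|, \lambda_1)
  - \sum_{k=1}^{K} \sum_{i \neq j} p(|\theta_{kij}|, \lambda_2)
  - \sum_{k < k'} p\left( \left( \|\boldsymbol{\mu}_k - \boldsymbol{\mu}_{k'}\|_2^2
      + \|\boldsymbol{\Theta}_k - \boldsymbol{\Theta}_{k'}\|_F^2 \right)^{1/2}, \lambda_3 \right),
\qquad
(\widehat{\boldsymbol{\Omega}}, \widehat{\boldsymbol{\pi}})
  = \arg\max_{\boldsymbol{\Omega}, \boldsymbol{\pi}}
    \mathcal{L}(\boldsymbol{\Omega}, \boldsymbol{\pi} \mid \boldsymbol{X}).
```

Under this form, the first two penalty terms produce sparse means and precision matrices, and the last term pulls pairs of subgroups toward identical parameters, which is what allows the estimated number of subgroups to fall below K.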

Example

First, we load the built-in simulation data set (K₀ = 3), and set the upper bound of K₀ and the sequences of the tuning parameters (λ₁, λ₂, and λ₃).

data(example.data)
K <- 6
lambda <- genelambda.obo(nlambda1=5,lambda1_max=0.5,lambda1_min=0.1,
                         nlambda2=15,lambda2_max=1.5,lambda2_min=0.1,
                         nlambda3=10,lambda3_max=3.5,lambda3_min=0.5)

Apply GGMPF to the data.

res <- GGMPF(lambda, example.data$data, K, penalty = "MCP")
Theta_hat.list <- res$Theta_hat.list
Mu_hat.list <- res$Mu_hat.list
opt_num <- res$Opt_num
opt_Mu_hat <- Mu_hat.list[[opt_num]]
opt_Theta_hat <- Theta_hat.list[[opt_num]]
K_hat <- dim(opt_Theta_hat)[3]
K_hat  # Output the estimated K0.
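
How a fitted model with K components yields fewer than K subgroups can be illustrated with a toy example of the "distinct values" partition described in the Method section. The matrix `Omega_hat`, the tolerance `tol`, and the use of single-linkage clustering to group numerically equal rows are assumptions made for this sketch; GGMPF performs its own merging internally.

```r
# Hypothetical fitted parameter vectors for K = 4 components (one per row).
# Rows 1 and 3 have been fused to the same value, as have rows 2 and 4.
Omega_hat <- rbind(c( 1.0, 0.5, 0.0),
                   c(-1.0, 0.2, 0.3),
                   c( 1.0, 0.5, 0.0),
                   c(-1.0, 0.2, 0.3))

tol <- 1e-6  # components closer than this are treated as equal

# Single-linkage clustering cut at height tol groups (numerically) equal rows.
group <- cutree(hclust(dist(Omega_hat), method = "single"), h = tol)

K_hat_toy <- length(unique(group))   # number of distinct parameter values
split(seq_len(nrow(Omega_hat)), group)  # partition of {1, ..., K}
```

Here the four components collapse into two subgroups, {1, 3} and {2, 4}, so the estimated number of subgroups is 2 even though K = 4 was supplied.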

Summarize the characteristics of the resulting network structures, and implement visualization of network structures.

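summ <- summary_network(opt_Mu_hat, opt_Theta_hat, example.data$data)
summ$Theta_summary$overlap  # Overlap of edges across subgroups
va_names <- c("6")          # Name of the node of interest
linked_node_names(summ, va_names, num_subgroup=1)  # Nodes linked to it in subgroup 1
plot_network(summ, num_subgroup = c(1:K_hat), plot.mfrow=c(1,K_hat))  # One panel per subgroup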

References:

Ren M., Zhang S., Zhang Q. and Ma S. (2021). Gaussian Graphical Model-based Heterogeneity Analysis via Penalized Fusion. Biometrics, Published Online.

PGGMBC

This method combines the Gaussian graphical mixture model with regularization of the means and precision matrices, given the number of subgroups in advance. The two tuning parameters λ₁ and λ₂ play the same roles as in GGMPF. Users can also implement BIC-based selection of the number of subgroups using the BIC values that the function outputs.

Data setting

It is the same as for GGMPF.

Method

Given the number of subgroups K₀, penalized GGM-based clustering with unconstrained covariance matrices is based on a penalized objective function of Ω^′ and π^′, where Ω^′ = (Ω₁^⊤, ⋯, Ω_K₀^⊤)^⊤, π^′ = (π₁, ⋯, π_K₀)^⊤, and other notations are similar to those in the GGMPF section.
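
In the notation above, the objective for a given K₀ can be sketched as a penalized mixture log-likelihood without any fusion term. This display is a reconstruction under assumptions: the 1/n scaling and the choice to penalize only off-diagonal entries of Θ_k are not confirmed by this document and should be checked against Zhou et al. (2009), who use the lasso for p(⋅, λ).

```latex
\mathcal{L}'(\boldsymbol{\Omega}', \boldsymbol{\pi}' \mid \boldsymbol{X})
  = \frac{1}{n} \sum_{i=1}^{n} \log\left( \sum_{k=1}^{K_0} \pi_k
      f_k\left(\boldsymbol{x}_i; \boldsymbol{\mu}_k, \boldsymbol{\Theta}_k^{-1}\right) \right)
  - \sum_{k=1}^{K_0} \sum_{j=1}^{p} p(|\mu_{kj}|, \lambda_1)
  - \sum_{k=1}^{K_0} \sum_{i \neq j} p(|\theta_{kij}|, \lambda_2).
```

Because no term links different subgroups, the number of components stays at K₀ throughout, which is why K₀ must be supplied or chosen separately, for example by comparing BIC values across candidate numbers of subgroups.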

Example

First, we load the built-in simulation data set, and give the true K₀ and the sequences of the tuning parameters (λ₁ and λ₂).

data(example.data)
K <- 3
lambda <- genelambda.obo(nlambda1=5,lambda1_max=0.5,lambda1_min=0.1,
                         nlambda2=15,lambda2_max=1.5,lambda2_min=0.1)

Apply PGGMBC to the data.

res <- PGGMBC(lambda, example.data$data, K, initial.selection="K-means")
Theta_hat.list <- res$Theta_hat.list
opt_num <- res$Opt_num
opt_Theta_hat <- Theta_hat.list[[opt_num]]

Summarizing the characteristics of the resulting network structures and visualizing them works the same way as for GGMPF.

References:

Zhou, H., Pan, W. and Shen, X. (2009). Penalized model-based clustering with unconstrained covariance matrices. Electronic Journal of Statistics, 3, 1473-1496.
