Title: | Graphical Models with Latent Variables |
---|---|
Description: | Three methods are provided to estimate graphical models with latent variables: (1) Jin, Y., Ning, Y., and Tan, K. M. (2020) (preprint available); (2) Chandrasekaran, V., Parrilo, P. A. & Willsky, A. S. (2012) <doi:10.1214/11-AOS949>; (3) Tan, K. M., Ning, Y., Witten, D. M. & Liu, H. (2016) <doi:10.1093/biomet/asw050>. |
Authors: | Yanxin Jin, Samantha Yang, Kean Ming Tan |
Maintainer: | Yanxin Jin <[email protected]> |
License: | GPL-3 |
Version: | 1.1 |
Built: | 2024-11-28 06:31:31 UTC |
Source: | CRAN |
Estimate graphical models with latent variables and correlated replicates using the method in Jin et al. (2020).
corlatent(data, accuracy, n, R, p, lambda1, lambda2, lambda3, distribution = "Gaussian", rule = "AND")
data |
data set. Can be a matrix, list, array, or data frame. If the data set is a matrix, it should have |
accuracy |
the threshold at which the algorithm stops. The algorithm stops when the difference between estimators of the |
n |
the number of observations. |
R |
the number of replicates for each observation. |
p |
the number of observed variables. |
lambda1 |
tuning parameter that encourages estimated graph to be sparse. |
lambda2 |
tuning parameter that models the effects of correlated replicates. Usually set to be equal to lambda1. |
lambda3 |
tuning parameter that encourages the latent effect to be piecewise constants. |
distribution |
For a data set with a Gaussian distribution, use "Gaussian"; for a data set with an Ising distribution, use "Ising". Default is "Gaussian". |
rule |
rules to combine matrices that encode the conditional dependence relationships between sets of two observed variables. Options are "AND" and "OR". Default is "AND". |
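The "AND" and "OR" options can be illustrated with a small base R sketch. It assumes the usual convention for node-wise estimates, where row j of an asymmetric matrix holds the coefficients selected for variable j; the matrix `omega_hat` and the names `support`, `theta_and`, and `theta_or` are hypothetical and are not taken from the package internals.

```r
# Hypothetical asymmetric matrix of node-wise estimates: row j holds the
# coefficients selected when variable j is modelled on the other variables.
omega_hat <- matrix(c(0,   0.4, 0,
                      0.3, 0,   0,
                      0.2, 0,   0),
                    nrow = 3, byrow = TRUE)

support <- omega_hat != 0   # TRUE where an estimate is non-zero

# "AND": keep edge (j, k) only if both node-wise fits select it
theta_and <- (support & t(support)) * 1

# "OR": keep edge (j, k) if either node-wise fit selects it
theta_or <- (support | t(support)) * 1

theta_and
theta_or
```

In this toy input the pair (1, 3) is selected only from the side of variable 3, so it is an edge under "OR" but not under "AND", which makes "AND" the more conservative choice.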
The corlatent method makes two assumptions. Assumption 1 states that, conditional on the latent variables, the replicates follow a one-lag vector autoregressive model.
Assumption 2 states that the latent variables are piecewise constant across replicates.
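Based on these two assumptions, the method solves the following problem for $1 \le j \le p$:
$$\min_{\theta_{j,-j},\, \alpha_j,\, \Delta_j} \left\{ -\frac{1}{nR}\, \ell(\theta_{j,-j}, \alpha_j, \Delta_j) + \lambda_1 \|\theta_{j,-j}\|_1 + \lambda_2 \|\alpha_j\|_1 + \lambda_3 \|(I_n \otimes C)\,\Delta_j\|_1 \right\},$$
where $\ell(\theta_{j,-j}, \alpha_j, \Delta_j)$ is the log likelihood function, $\theta_{j,-j}$ encodes the conditional dependence relationships between the $j$th observed variable and the other observed variables, $\alpha_j$ models the correlation among replicates, $\Delta_j$ encodes the latent effect, $\lambda_1$, $\lambda_2$, $\lambda_3$ are the tuning parameters, $I_n$ is an n-dimensional identity matrix and $C$ is the discrete first derivative matrix where the $i$th and $(i+1)$th column of every $i$th row are -1 and 1, respectively.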
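The piecewise-constant penalty can be made concrete with a base R sketch of the discrete first derivative matrix and its Kronecker product with the identity. The sizes, the (R - 1) x R shape of the matrix, the assumption that the latent effect is stacked replicate by replicate within each observation, and the names `C`, `D`, and `delta` are illustrative choices, not details confirmed by the package.

```r
n <- 2   # observations (illustrative)
R <- 4   # replicates per observation (illustrative)

# First-difference matrix: row i has -1 in column i and 1 in column i + 1
C <- matrix(0, nrow = R - 1, ncol = R)
C[cbind(1:(R - 1), 1:(R - 1))] <- -1
C[cbind(1:(R - 1), 2:R)] <- 1

# Block-diagonal operator that takes differences within each observation
D <- kronecker(diag(n), C)

# Hypothetical latent effect: constant for observation 1, one jump for observation 2
delta <- c(1, 1, 1, 1,
           0, 0, 2, 2)

# l1 norm of the successive differences; only the single jump contributes
sum(abs(D %*% delta))
```

A latent effect that changes at few replicates has a small penalty value, so a large lambda3 pushes the estimate toward few change points.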
This method aims at modeling exponential family graphical models with correlated replicates and latent variables.
omega |
a matrix that encodes the conditional dependence relationships between sets of two observed variables |
theta |
the adjacency matrix with 0 and 1 encoding conditional independence and dependence between sets of two observed variables, respectively |
penalties |
the penalty values |
Jin, Y., Ning, Y., and Tan, K. M. (2020), ‘Exponential Family Graphical Models with Correlated Replicates and Unmeasured Confounders’, preprint available.
# Gaussian distribution with "AND" rule
n <- 20
R <- 10
p <- 5
l <- 2
s <- 2
seed <- 1
data <- generate_Gaussian(n, R, p, l, s, sparsityA = 0.95, sparsityobserved = 0.9,
                          sparsitylatent = 0.2, lwb = 0.3, upb = 0.3, seed = seed)$X
result <- corlatent(data, accuracy = 1e-6, n, R, p, lambda1 = 0.1, lambda2 = 0.1,
                    lambda3 = 1e+5, distribution = "Gaussian", rule = "AND")
This function will generate a Gaussian distributed data set with latent variables and correlated replicates.
generate_Gaussian(n, R, p, l, s, sparsityA, sparsityobserved, sparsitylatent, lwb, upb, seed)
n |
the number of observations. |
R |
the number of replicates. |
p |
the number of observed variables. |
l |
the number of latent variables. |
s |
latent effects are generated as |
sparsityA |
proportion of the number of zeros in the transition matrix |
sparsityobserved |
proportion of the number of zeros in the inverse covariance of the observed variables. Must be a number from 0 to 1. |
sparsitylatent |
proportion of the number of zeros in the inverse covariances among latent variables and between observed variables and latent variables. Must be a number from 0 to 1. |
lwb |
lower bound for the elements in the inverse covariance matrix. |
upb |
upper bound for the elements in the inverse covariance matrix. |
seed |
the seed for the random number generator. |
This function aims to generate a Gaussian distributed data set with latent variables and correlated replicates. For each observation, the latent variables are piecewise constant across replicates, and conditioned on the latent variables, the replicates follow a one-lag vector autoregressive model, where the transition matrix is sparse with non-zero elements set equal to 0.3.
The matrix $\Sigma$ is the covariance matrix for the observed variables $X$ and the latent variables $U$, and we partition $\Sigma$ into matrices that quantify the relationships among the observed variables ($\Sigma_{XX}$), between the observed variables and latent variables ($\Sigma_{XU}$ or $\Sigma_{UX}$), and among the latent variables ($\Sigma_{UU}$).
In general, the data is generated with:
where and
.
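The generating mechanism described in this section (piecewise-constant latent effects and a sparse one-lag vector autoregression with non-zero transition entries equal to 0.3) can be sketched for a single observation in base R. This is not the code of generate_Gaussian: the identity noise covariance, the positions of the non-zero transition entries, the change point after replicate 5, and the names `A`, `latent_effect`, and `X` are all assumptions made for illustration.

```r
set.seed(1)
R <- 10  # replicates for one observation
p <- 3   # observed variables

# Sparse transition matrix whose non-zero entries equal 0.3
A <- matrix(0, nrow = p, ncol = p)
A[1, 2] <- 0.3
A[3, 3] <- 0.3

# Hypothetical latent contribution: constant on replicates 1-5 and on 6-10
latent_effect <- rbind(matrix(0.5, nrow = 5, ncol = p),
                       matrix(-1,  nrow = 5, ncol = p))

X <- matrix(0, nrow = R, ncol = p)
X[1, ] <- latent_effect[1, ] + rnorm(p)
for (r in 2:R) {
  # One-lag autoregressive step plus the latent contribution and Gaussian noise
  X[r, ] <- as.vector(A %*% X[r - 1, ]) + latent_effect[r, ] + rnorm(p)
}

dim(X)   # R rows (replicates) by p columns (variables)
```

Repeating the loop for each of the n observations, with a fresh latent effect per observation, gives data with the same qualitative structure as the generator describes.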
X |
the generated data, which is a list with |
truegraph |
a matrix that encodes the conditional dependence relationships between sets of two observed variables |
Jin, Y., Ning, Y., and Tan, K. M. (2020), ‘Exponential Family Graphical Models with Correlated Replicates and Unmeasured Confounders’, preprint available.
data <- generate_Gaussian(n = 50, R = 20, p = 30, l = 2, s = 2, sparsityA = 0.95, sparsityobserved = 0.9, sparsitylatent = 0.2, lwb = 0.3, upb = 0.3, seed = 1)
Estimate Gaussian graphical models with latent variables using the method in Chandrasekaran et al. (2012).
lvglasso(data, n, p, lambda1, lambda2, rule = "AND")
data |
data set, can be a matrix or data frame with |
n |
the number of observations. |
p |
the number of observed variables. |
lambda1 |
tuning parameter that encourages estimated graph to be sparse. Lambda1 is proportional to lambda2. |
lambda2 |
tuning parameter that encourages the matrix |
rule |
rules to combine the inverse covariance matrices. Options are "AND" and "OR". Default is "AND". |
The lvglasso method assumes that all the variables, both observed and latent, are jointly Gaussian, and specifies the conditional distribution of the observed variables given the latent variables by a graphical model. In the high-dimensional setting, this method provides consistent estimators for the conditional graphical model of the observed variables given the latent variables.
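The idea behind this approach in Chandrasekaran et al. (2012) is that, for jointly Gaussian variables, the marginal inverse covariance of the observed variables equals a sparse matrix (the conditional graph) minus a low-rank matrix induced by the latent variables. The base R sketch below checks this identity numerically on a small example; the joint precision matrix `K` and every other name in it are hypothetical and unrelated to the internals of lvglasso.

```r
p <- 4  # observed variables
l <- 1  # latent variables

# Hypothetical joint precision (inverse covariance) matrix of (observed, latent)
K <- diag(2, p + l)
K[1, 2] <- 0.5
K[2, 1] <- 0.5            # one edge among the observed variables
K[1:p, p + 1] <- 0.4
K[p + 1, 1:p] <- 0.4      # the latent variable is linked to every observed one

K_XX <- K[1:p, 1:p]
K_XU <- K[1:p, (p + 1):(p + l), drop = FALSE]
K_UU <- K[(p + 1):(p + l), (p + 1):(p + l), drop = FALSE]

# Low-rank term created by marginalising over the latent variables
L <- K_XU %*% solve(K_UU) %*% t(K_XU)

# Marginal precision of the observed variables, computed from the joint covariance
Sigma <- solve(K)
marginal_precision <- solve(Sigma[1:p, 1:p])

max(abs(marginal_precision - (K_XX - L)))  # numerically zero
qr(L)$rank                                 # at most l
```

The marginal precision is dense even though `K_XX` has a single off-diagonal edge, which is why a method that ignores the latent variables would report spurious edges.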
omega |
a matrix that encodes the conditional dependence relationships between sets of two observed variables |
theta |
the adjacency matrix with 0 and 1 encoding conditional independence and dependence between sets of two observed variables, respectively |
penalties |
the penalty values |
Chandrasekaran, V., Parrilo, P. A. & Willsky, A. S. (2012), ‘Latent variable graphical model selection via convex optimization’, Ann. Statist. 40(4), 1935–1967.
# Gaussian distribution with "AND" rule
n <- 50
R <- 20
p <- 30
l <- 2
s <- 2
data <- generate_Gaussian(n, R, p, l, s, sparsityA = 0.95, sparsityobserved = 0.9,
                          sparsitylatent = 0.2, lwb = 0.3, upb = 0.3, seed = 1)$X
result <- lvglasso(data, n, p, lambda1 = 0.222, lambda2 = 0.1 * 0.222, rule = "AND")
Estimate graphical models with latent variables and replicates using the method in Tan et al. (2016).
semilatent(data, n, R, p, lambda, distribution = "Gaussian", rule = "AND")
data |
data set. Can be a matrix, list, array, or data frame. If the data set is a matrix, it should have |
n |
the number of observations. |
R |
the number of replicates for each observation. |
p |
the number of observed variables. |
lambda |
tuning parameter that encourages estimated graph to be sparse. |
distribution |
For a data set with a Gaussian distribution, use "Gaussian"; for a data set with an Ising distribution, use "Ising". Default is "Gaussian". |
rule |
rules to combine matrices that encode the conditional dependence relationships between sets of two observed variables. Options are "AND" and "OR". Default is "AND". |
The semilatent method makes two assumptions. Assumption 1 states that the latent variables are constant across replicates.
Assumption 2 states that given the latent variables, the replicates are mutually independent.
With these two assumptions, the method seeks to solve the following problem for $1 \le j \le p$:
$$\min_{\theta_{j,-j}} \left\{ \ell(\theta_{j,-j}) + \lambda \|\theta_{j,-j}\|_1 \right\},$$
where $\ell(\theta_{j,-j})$ is a nuisance-free loss function, $\theta_{j,-j}$ is a parameter that represents the conditional dependence relationships between the $j$th observed variable and the other observed variables, and $\lambda$ is a tuning parameter.
This method aims at modeling semiparametric exponential family graphical models with latent variables and replicates.
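One way to build intuition for how a loss can avoid the latent nuisance term is to note that, when the latent contribution is additive and constant across replicates, it cancels in differences between replicates. The base R sketch below demonstrates only this cancellation under an assumed additive Gaussian model with identity noise covariance; it is not the loss function used by semilatent, and the names `latent_effect`, `noise`, and `diff_X` are hypothetical.

```r
set.seed(1)
R <- 6   # replicates for one observation
p <- 3   # observed variables

# Hypothetical latent contribution, identical for every replicate
latent_effect <- c(2, -1, 0.5)

# Replicates are independent given the latent contribution
noise <- matrix(rnorm(R * p), nrow = R, ncol = p)
X <- sweep(noise, 2, latent_effect, "+")

# Differences between consecutive replicates do not involve latent_effect
diff_X <- diff(X)
max(abs(diff_X - diff(noise)))   # zero up to floating-point rounding
```

Because the differenced data carry no trace of the latent contribution, any quantity computed from them alone is unaffected by the unobserved variables.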
omega |
a matrix that encodes the conditional dependence relationships between sets of two observed variables |
theta |
the adjacency matrix with 0 and 1 encoding conditional independence and dependence between sets of two observed variables, respectively |
penalty |
the penalty value |
Tan, K. M., Ning, Y., Witten, D. M. & Liu, H. (2016), ‘Replicates in high dimensions, with applications to latent variable graphical models’, Biometrika 103(4), 761–777.
# semilatent Gaussian with "AND" rule
n <- 50
R <- 20
p <- 30
seed <- 1
l <- 2
s <- 2
data <- generate_Gaussian(n, R, p, l, s, sparsityA = 0.95, sparsityobserved = 0.9,
                          sparsitylatent = 0.2, lwb = 0.3, upb = 0.3, seed = seed)$X
result <- semilatent(data, n, R, p, lambda = 0.1, distribution = "Gaussian", rule = "AND")