Title: | Copula Graphical Models for Heterogeneous Mixed Data |
---|---|
Description: | A multi-core R package for the statistical modeling of multi-group multivariate mixed data using Gaussian graphical models. Combining the Gaussian copula framework with the fused graphical lasso penalty, the 'heteromixgm' package can handle a wide variety of datasets found in various sciences. The package also includes an option to perform model selection using the AIC, BIC and EBIC information criteria, a function that plots partial correlation graphs based on the selected precision matrices, a function that simulates mixed heterogeneous data for exploratory or simulation purposes, and one multi-group multivariate mixed agricultural dataset pertaining to maize yields. The package implements the methodological developments found in Hermes et al. (2024) <doi:10.1080/10618600.2023.2289545>. |
Authors: | Sjoerd Hermes [aut, cre], Joost van Heerwaarden [ctb], Pariya Behrouzi [ctb] |
Maintainer: | Sjoerd Hermes <[email protected]> |
License: | GPL-3 |
Version: | 2.0.2 |
Built: | 2024-12-13 06:56:03 UTC |
Source: | CRAN |
Simulate mixed multi-group data.
data_sim(network, n, p, K, ncat, rho, gamma_g = NULL, gamma_o, gamma_b = NULL, gamma_p = NULL, prob = NULL, nclass = NULL)
network |
Type of network, either "circle", "Random", "Cluster", "Scale-free", "AR1" or "AR2". |
n |
Number of observations. |
p |
Number of variables. |
K |
Number of groups. |
ncat |
Number of categories for ordinal variables. |
rho |
Dissimilarity parameter inducing dissimilarity between the K datasets. |
gamma_g |
Proportion of Gaussian variables in the data. |
gamma_o |
Proportion of ordinal variables in the data. |
gamma_b |
Proportion of binomial variables in the data. |
gamma_p |
Proportion of Poisson variables in the data. |
prob |
Edge occurrence probability in random graph. |
nclass |
Number of clusters in cluster graph. |
z |
A list of |
theta |
A list of |
Sjoerd Hermes, Joost van Heerwaarden and Pariya Behrouzi
Maintainer: Sjoerd Hermes [email protected]
1. Hermes, S., van Heerwaarden, J., & Behrouzi, P. (2024).
Copula graphical models for heterogeneous mixed data.
Journal of Computational and Graphical Statistics, 1-15.
data_sim(network = "Random", n = 10, p = 50, K = 3, ncat = 6, rho = 0.25, gamma_o = 0.5, gamma_b = 0.1, gamma_p = 0.2, prob = 0.05)
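The idea behind simulating mixed data with a Gaussian copula can be illustrated in base R for a single group. This is only a minimal sketch under stated assumptions: the AR(1)-type latent correlation, the marginal parameters (`lambda = 3`, `prob = 0.5`, binomial size 1) and all variable names are hypothetical choices for illustration, and the code does not reproduce the internals of `data_sim()`.

```r
# Sketch only: one group, four variables, base R. Not the data_sim() internals.
set.seed(1)
n <- 200
p <- 4

# Assumed latent correlation matrix (AR(1)-type, unit diagonal)
Sigma <- 0.5^abs(outer(1:p, 1:p, "-"))

# Latent Gaussian draws with covariance Sigma (chol() returns R with R'R = Sigma)
Z <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)

# Map latent values to uniforms, then through the chosen marginal quantile functions
U <- pnorm(Z)
X <- data.frame(
  gaussian = Z[, 1],
  ordinal  = cut(U[, 2], breaks = seq(0, 1, length.out = 7),
                 labels = FALSE, include.lowest = TRUE),  # 6 categories
  binomial = qbinom(U[, 3], size = 1, prob = 0.5),
  poisson  = qpois(U[, 4], lambda = 3)
)
head(X)
```

The dependence between the columns of `X` is inherited from `Sigma`, while each column keeps its own marginal distribution; this is the property that makes the copula approach suitable for mixed data.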
This function implements either the Gibbs or the approximation method within the Gaussian copula graphical model to estimate the conditional expectation for data that do not follow the Gaussianity assumption (e.g. ordinal, discrete, continuous non-Gaussian, or mixed data).
heteromixgm(X, method, lambda1, lambda2, ncores)
X |
A list containing |
method |
Choice between "Gibbs" and "Approximate" indicating which method to use. |
lambda1 |
Vector containing values (in [0,1]) for the sparsity
penalization of each |
lambda2 |
Vector containing values (in [0,1]) for the similarity
penalization between the |
ncores |
Number of cores to be used during parallel computing. |
Z |
New transformation of the data based on given or default
|
ES |
Expectation of the covariance matrix (diagonal scaled to 1) of the Gaussian copula graphical model. |
Sigma |
The covariance matrix of the latent variable given the data. |
Theta |
The inverse covariance matrix of the latent variable given the data. |
loglik |
Value of the log-likelihood under the estimated parameters. |
Sjoerd Hermes, Joost van Heerwaarden and Pariya Behrouzi
Maintainer: Sjoerd Hermes [email protected]
1. Hermes, S., van Heerwaarden, J., & Behrouzi, P. (2024).
Copula graphical models for heterogeneous mixed data.
Journal of Computational and Graphical Statistics, 1-15.
data(maize)
l1 <- c(0.4)      # sparsity penalty values
l2 <- c(0, 0.1)   # similarity penalty values
ncores <- 1
est <- heteromixgm(maize, "Approximate", l1, l2, ncores)
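For orientation, the roles of `lambda1` and `lambda2` can be read off the standard fused graphical lasso objective. The form below is the generic one from the fused graphical lasso literature; the notation ($n_k$ for the group sample sizes, $S^{(k)}$ for the group-wise covariance input, $\theta_{ij}^{(k)}$ for the entries of $\Theta^{(k)}$) is assumed here and is not taken from the package documentation. In the copula setting, $S^{(k)}$ would presumably be replaced by the expected latent covariance.

```latex
\max_{\Theta^{(1)},\dots,\Theta^{(K)}}
\sum_{k=1}^{K} n_k \left[ \log\det\Theta^{(k)}
  - \operatorname{tr}\!\left( S^{(k)} \Theta^{(k)} \right) \right]
- \lambda_1 \sum_{k=1}^{K} \sum_{i \neq j} \left| \theta_{ij}^{(k)} \right|
- \lambda_2 \sum_{k < k'} \sum_{i,j} \left| \theta_{ij}^{(k)} - \theta_{ij}^{(k')} \right|
```

The $\lambda_1$ term shrinks off-diagonal entries within each group towards zero (sparsity), while the $\lambda_2$ term shrinks corresponding entries of different groups towards each other (similarity).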
Initializes parameters to be used in the approximate method algorithm.
initialize(y, ncores)
y |
Data. |
ncores |
Number of cores to be used during parallel computing. |
ES |
Expectation of the covariance matrices (diagonal scaled to 1) of the Gaussian copula graphical model. |
Z |
New transformation of the data based on given or default
|
lower_upper |
Lower and upper truncation points for the truncated normal distribution. |
Sjoerd Hermes, Joost van Heerwaarden and Pariya Behrouzi
Maintainer: Sjoerd Hermes [email protected]
1. Hermes, S., van Heerwaarden, J., & Behrouzi, P. (2024).
Copula graphical models for heterogeneous mixed data.
Journal of Computational and Graphical Statistics, 1-15.
y <- list(matrix(runif(25), 5, 5), matrix(runif(25), 5, 5), matrix(runif(25), 5, 5))
ncores <- 1
initialize(y, ncores)
Calculates lower and upper bands for each data point, using a set of cut-points obtained from the Gaussian copula.
lower.upper(y)
y |
An ( |
lower |
A |
upper |
A |
Sjoerd Hermes, Joost van Heerwaarden and Pariya Behrouzi
Maintainer: Sjoerd Hermes [email protected]
1. Hermes, S., van Heerwaarden, J., & Behrouzi, P. (2024).
Copula graphical models for heterogeneous mixed data.
Journal of Computational and Graphical Statistics, 1-15.
y <- list(matrix(runif(25), 5, 5), matrix(runif(25), 5, 5), matrix(runif(25), 5, 5))
lower.upper(y[[1]])
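To make the notion of cut-points concrete, the following base R sketch shows one common way lower and upper latent bounds are derived for a single ordinal variable in Gaussian copula models: the cumulative empirical proportions are passed through the standard normal quantile function. The helper `cut_bounds()` is hypothetical, and whether `lower.upper()` computes its bands in exactly this way is an assumption.

```r
# Hypothetical helper for illustration; not part of heteromixgm.
cut_bounds <- function(y) {
  lev <- sort(unique(y))
  counts <- as.numeric(table(factor(y, levels = lev)))
  # Cumulative empirical proportions; the last one (== 1) maps to +Inf
  cum_prop <- cumsum(counts) / length(y)
  tau <- c(-Inf, qnorm(cum_prop[-length(cum_prop)]), Inf)
  idx <- match(y, lev)
  # An observation in category c has its latent value bounded by (tau[c], tau[c + 1])
  data.frame(y = y, lower = tau[idx], upper = tau[idx + 1])
}

y_ord <- c(1, 1, 2, 3, 3, 3, 2, 1)
b <- cut_bounds(y_ord)
b
```

Observations in the lowest category receive a lower bound of `-Inf` and those in the highest category an upper bound of `Inf`, so each latent value is constrained to an interval of a truncated normal distribution.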
This dataset consists of maize yields and environmental and management variables measured across two groups. The groups pertain to different seasons (2010 and 2013) for farms in Pawe, Ethiopia.
data("maize")
The format is: List of 2
Contains a subset of data used in the Hermes et al. (2024) paper, which is a subset of data used in the Vasco Silva et al. (forthcoming) paper.
1. Hermes, S., van Heerwaarden, J., & Behrouzi, P. (2024).
Copula graphical models for heterogeneous mixed data.
Journal of Computational and Graphical Statistics, 1-15.
2. Vasco Silva, J., J. van Heerwaarden, R. Pytrik, A. G. Laborte, K. Tesfaye,
and M. K. van Ittersum (forthcoming). Big data, small explanatory power?
lessons learnt with random forest predictive modeling of crop yield in
contrasting farming systems.
data(maize)
Model selection using the AIC, BIC and EBIC.
modselect(est, X, l1, l2, gamma)
est |
Estimates of the model obtained from the heteromixgm() function. |
X |
A list of |
l1 |
Vector containing l1 penalty values. |
l2 |
Vector containing l2 penalty values. |
gamma |
EBIC gamma parameter. |
selectmat |
Matrix containing the "optimal" l1 and l2 values for each information criterion. |
theta_aic |
Estimated precision matrices using the AIC for model selection. |
theta_bic |
Estimated precision matrices using the BIC for model selection. |
theta_ebic |
Estimated precision matrices using the EBIC for model selection. |
Sjoerd Hermes, Joost van Heerwaarden and Pariya Behrouzi
Maintainer: Sjoerd Hermes [email protected]
1. Hermes, S., van Heerwaarden, J., & Behrouzi, P. (2024).
Copula graphical models for heterogeneous mixed data.
Journal of Computational and Graphical Statistics, 1-15.
X <- list(matrix(runif(25), 5, 5), matrix(runif(25), 5, 5), matrix(runif(25), 5, 5))
l1 <- c(0.4)
l2 <- c(0, 0.1)
gamma <- 0.5
ncores <- 1
est <- heteromixgm(X, "Approximate", l1, l2, ncores)
modselect(est, X, l1, l2, gamma)
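As a reminder of what the three criteria trade off, the sketch below computes them from a log-likelihood and a number of free parameters using the textbook definitions for graphical models. The helper `info_criteria()` and its arguments are hypothetical, and the exact way `modselect()` counts parameters or aggregates over groups is not documented here, so this should be read as a generic illustration rather than the package's computation.

```r
# Hypothetical helper for illustration; not a function exported by heteromixgm.
# loglik: log-likelihood of the fitted model
# df:     number of free parameters (e.g. number of nonzero edges)
# n:      number of observations, p: number of variables
# gamma:  EBIC tuning parameter in [0, 1]
info_criteria <- function(loglik, df, n, p, gamma = 0.5) {
  aic  <- -2 * loglik + 2 * df
  bic  <- -2 * loglik + log(n) * df
  ebic <- bic + 4 * gamma * df * log(p)
  c(AIC = aic, BIC = bic, EBIC = ebic)
}

ic <- info_criteria(loglik = -120, df = 15, n = 100, p = 20, gamma = 0.5)
ic
```

With `gamma = 0` the EBIC reduces to the BIC; larger values of `gamma` penalize dense graphs more heavily, which matters most when the number of variables is large relative to the number of observations.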
Plots all partial correlation graphs based on the precision matrices selected using one of the information criteria.
plot_pcorgraph(Theta, pos_clr, neg_clr, plot_layout, label_cex)
Theta |
List of |
pos_clr |
Color, hexadecimal color allowed, representing the positive partial correlations in the plotted graphs. |
neg_clr |
Color, hexadecimal color allowed, representing the negative partial correlations in the plotted graphs. |
plot_layout |
Number of rows and columns for the plot layout. |
label_cex |
Size of the vertex labels in the plotted graphs. |
There is no return value. The function only shows plots in the graphics output device.
Sjoerd Hermes, Joost van Heerwaarden and Pariya Behrouzi
Maintainer: Sjoerd Hermes [email protected]
1. Hermes, S., van Heerwaarden, J., & Behrouzi, P. (2024).
Copula graphical models for heterogeneous mixed data.
Journal of Computational and Graphical Statistics, 1-15.
temp <- data_sim(network = "Random", n = 100, p = 20, K = 4, ncat = 6, rho = 0.25, gamma_o = 0.5, gamma_b = 0.1, gamma_p = 0.2, prob = 0.05)
X <- temp$z
l1 <- c(0.1)
l2 <- c(0, 0.1)
gamma <- 0.5
ncores <- 1
est <- heteromixgm(X, "Approximate", l1, l2, ncores)
temp <- modselect(est, X, l1, l2, gamma)
plot_pcorgraph(temp$theta_aic, "green", "red", c(2, 2), 4.5)
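The link between a precision matrix and the partial correlations shown in such graphs is the standard identity: the partial correlation between variables i and j equals minus the (i, j) precision entry divided by the square root of the product of the two corresponding diagonal entries. A minimal base R sketch follows; the helper `prec_to_pcor()` is hypothetical and only illustrates the identity, not the plotting code of `plot_pcorgraph()`.

```r
# Hypothetical helper for illustration; not part of heteromixgm.
prec_to_pcor <- function(Theta) {
  d <- 1 / sqrt(diag(Theta))
  P <- -Theta * outer(d, d)   # -theta_ij / sqrt(theta_ii * theta_jj)
  diag(P) <- 1
  P
}

# AR(1)-type covariance: its inverse is tridiagonal, so variables 1 and 3
# are conditionally independent given variable 2
Sigma <- 0.5^abs(outer(1:3, 1:3, "-"))
Theta <- solve(Sigma)
P <- prec_to_pcor(Theta)
round(P, 3)
```

Positive and negative entries of the resulting matrix correspond to the edges that `pos_clr` and `neg_clr` would distinguish, and entries equal to zero correspond to absent edges.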