Title: | Semi-Parametric Estimation with Gaussian Copula |
---|---|
Description: | A method for generating random vectors that are linked by a Gaussian copula. It also estimates the correlation matrix of the Gaussian copula in order to identify independencies within the data. |
Authors: | Julie Cartier [aut], Florence Jaffrezic [aut], Gildas Mazo [aut], Ekaterina Tomilina [aut, cre] |
Maintainer: | Ekaterina Tomilina <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.1.0.0 |
Built: | 2024-12-07 06:59:11 UTC |
Source: | CRAN |
This function enables the user to simulate data from a Gaussian copula and arbitrary marginal quantile functions
CopulaSim(n, R, qdist, random = FALSE)
n | the number of observations |
R | a correlation matrix of size dxd |
qdist | a character vector with one entry per variable, each naming the marginal quantile function and its parameters (for example "qnorm(0,1)"); an entry is repeated once for every variable that has that marginal |
random | a boolean defining whether the order of the correlation coefficients should be randomized |
a list containing an nxd data frame, the shuffled correlation matrix R, and the permutation leading to the new correlation matrix
M <- diag_block_matrix(c(3,4,5),c(0.7,0.8,0.2))
CopulaSim(20,M,c(rep("qnorm(0,1)",6),rep("qexp(0.5)",4),rep("qbinom(4,0.8)",2)),random=TRUE)
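The general idea of simulating from a Gaussian copula can be sketched in base R as follows. This is an illustration of the standard technique, not the package's implementation: `sim_copula_sketch` and its `qfuns` argument (a list of R functions rather than the string names that `CopulaSim` takes) are hypothetical names introduced here.

```r
# Minimal Gaussian-copula simulation in base R (illustrative sketch only).
# `sim_copula_sketch` and `qfuns` are hypothetical, not part of the package.
sim_copula_sketch <- function(n, R, qfuns) {
  d <- ncol(R)
  stopifnot(nrow(R) == d, length(qfuns) == d)
  # Latent Gaussian draws whose columns have correlation matrix R
  Z <- matrix(rnorm(n * d), nrow = n, ncol = d) %*% chol(R)
  # Probability integral transform: each column becomes Uniform(0, 1)
  U <- pnorm(Z)
  # Apply the j-th marginal quantile function to the j-th column
  X <- lapply(seq_len(d), function(j) qfuns[[j]](U[, j]))
  names(X) <- paste0("X", seq_len(d))
  as.data.frame(X)
}

set.seed(1)
R <- matrix(c(1, 0.7, 0.7, 1), nrow = 2)
qfuns <- list(
  function(u) qnorm(u, mean = 0, sd = 1),      # continuous margin
  function(u) qbinom(u, size = 4, prob = 0.8)  # discrete margin
)
sim <- sim_copula_sketch(200, R, qfuns)
```

The dependence comes entirely from the latent Gaussian vector, so the margins can mix continuous and discrete distributions freely.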
This function enables the user to plot the graph corresponding to the correlations of the Gaussian copula
cor_network_graph(R, TS, binary = TRUE, legend)
R | a correlation matrix of size dxd (d is the number of variables) |
TS | a threshold for the absolute values of the correlation matrix coefficients |
binary | a boolean specifying whether the coefficients should be binarized, TRUE by default (zero if the coefficient is less than the threshold in absolute value, 1 otherwise). If FALSE, the edge width is proportional to the coefficient value. |
legend | a vector containing the type of each variable used to color the vertices |
a graph representing the correlations between the latent Gaussian variables
R <- diag_block_matrix(c(3,4,5),c(0.7,0.8,0.2))
data <- CopulaSim(20,R,c(rep("qnorm(0,1)",6),rep("qexp(0.5)",4),
                         rep("qbinom(4,0.8)",2)),random=FALSE)[[1]]
cor_network_graph(R,TS=0.3,binary=TRUE,legend=c(rep("Normal",6),
                  rep("Exponential",4),rep("Binomial",2)))
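A correlation graph of this kind has one vertex per variable and an edge for every pair whose correlation reaches the threshold in absolute value. The base R sketch below extracts such an edge list from a matrix. It only illustrates the idea: `edges_sketch` is a hypothetical helper, and how the package itself builds and draws the graph is not shown here.

```r
# Build an edge list from a correlation matrix (illustrative sketch only).
# `edges_sketch` is a hypothetical helper, not a package function.
edges_sketch <- function(R, TS) {
  # Upper triangle only, so each pair of variables is listed once
  idx <- which(upper.tri(R) & abs(R) >= TS, arr.ind = TRUE)
  data.frame(
    from = unname(idx[, "row"]),
    to = unname(idx[, "col"]),
    weight = R[idx],          # signed correlation carried as edge weight
    row.names = NULL
  )
}

R3 <- matrix(c(1.0,  0.7,  0.2,
               0.7,  1.0, -0.6,
               0.2, -0.6,  1.0), nrow = 3, byrow = TRUE)
E <- edges_sketch(R3, TS = 0.3)
```

The resulting data frame could then be handed to a graph plotting library of your choice.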
This function enables the user to generate a diagonal block-matrix with homogeneous blocks
diag_block_matrix(blocks, coeff)
blocks | a vector containing the sizes of the blocks |
coeff | a vector containing the coefficient corresponding to each block; the coefficients must be between 0 and 1 |
a diagonal block-matrix containing the specified coefficients
diag_block_matrix(c(3,4,5),c(0.3,0.4,0.8))
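To make the structure concrete, here is a base R sketch of a block-diagonal matrix with homogeneous blocks. It assumes that each block holds its coefficient off the diagonal and that the diagonal is 1, as one would expect for a correlation matrix; that layout is an assumption, and `block_diag_sketch` is a hypothetical name rather than the package's code.

```r
# Block-diagonal matrix with constant off-diagonal blocks (illustrative sketch).
# Assumes a unit diagonal; `block_diag_sketch` is a hypothetical name.
block_diag_sketch <- function(blocks, coeff) {
  stopifnot(length(blocks) == length(coeff), all(coeff >= 0), all(coeff <= 1))
  d <- sum(blocks)
  M <- matrix(0, nrow = d, ncol = d)
  start <- 1
  for (k in seq_along(blocks)) {
    idx <- start:(start + blocks[k] - 1)  # rows/columns of the k-th block
    M[idx, idx] <- coeff[k]
    start <- start + blocks[k]
  }
  diag(M) <- 1
  M
}

B <- block_diag_sketch(c(2, 3), c(0.3, 0.8))
```

Entries outside the blocks stay at zero, so variables in different blocks are uncorrelated.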
This function enables the user to generate Gaussian vectors with correlation matrix R
gauss_gen(R, n)
R | a correlation matrix of size dxd |
n | the number of observations |
an nxd data frame containing n observations of the d variables
M <- diag_block_matrix(c(3,4,5),c(0.7,0.8,0.2))
gauss_gen(M,20)
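One common way to draw Gaussian vectors with a given correlation matrix is to multiply independent standard normal draws by a Cholesky factor. The sketch below shows that approach in base R and checks the empirical correlation. Whether `gauss_gen` uses this exact construction is not stated in the documentation, so treat `gauss_sketch` as a hypothetical stand-in.

```r
# Correlated Gaussian draws via the Cholesky factor (illustrative sketch only).
# `gauss_sketch` is a hypothetical name, not the package's gauss_gen.
gauss_sketch <- function(R, n) {
  d <- ncol(R)
  Z <- matrix(rnorm(n * d), nrow = n, ncol = d)  # independent N(0, 1) draws
  # chol(R) is upper triangular with t(U) %*% U == R, so Z %*% U has
  # correlation matrix R
  as.data.frame(Z %*% chol(R))
}

set.seed(42)
R2 <- matrix(c(1, 0.8, 0.8, 1), nrow = 2)
G <- gauss_sketch(R2, 5000)
emp <- cor(G)[1, 2]  # should be close to 0.8 for a sample this large
```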
Dataset containing RNA counts, protein expression and mutations measured on breast cancer tumors.
icgc_data
A dataframe of 15 variables and 250 observations containing the following:
RNA counts (discrete)
protein expression measurements (discrete)
5 mutations (binary)
This function enables the user to threshold matrix coefficients
matrix_cor_ts(R, TS, binary = TRUE)
R | a correlation matrix |
TS | a threshold |
binary | a boolean specifying whether the coefficients should be binarized, TRUE by default (zero if the coefficient is less than the threshold in absolute value, 1 otherwise) |
the thresholded input matrix
M <- diag_block_matrix(c(3,4,5),c(0.7,0.8,0.2))
matrix_cor_ts(M,0.5)
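The thresholding rule described above can be sketched in a few lines of base R. The binary case follows the documented rule; for `binary = FALSE` the sketch assumes that coefficients at or above the threshold are kept unchanged, which is an assumption rather than documented behavior. `threshold_sketch` is a hypothetical name.

```r
# Threshold a correlation matrix (illustrative sketch only).
# `threshold_sketch` is a hypothetical name, not the package's matrix_cor_ts.
threshold_sketch <- function(R, TS, binary = TRUE) {
  keep <- abs(R) >= TS
  out <- R
  out[!keep] <- 0            # coefficients below the threshold become zero
  if (binary) out[keep] <- 1 # remaining coefficients become 1 when binarized
  out
}

Rt <- matrix(c(1.0,  0.7,  0.2,
               0.7,  1.0, -0.6,
               0.2, -0.6,  1.0), nrow = 3, byrow = TRUE)
T_bin <- threshold_sketch(Rt, 0.5)
T_val <- threshold_sketch(Rt, 0.5, binary = FALSE)
```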
This function enables the user to generate a sparse, nonnegative definite correlation matrix via the Cholesky decomposition
matrix_gen(d, gamma)
d | the number of variables |
gamma | an initial sparsity parameter for the lower triangular matrices in the Cholesky decomposition, must be between 0 and 1 |
a list containing the generated correlation matrix and its final sparsity parameter (i.e., the proportion of zeros)
matrix_gen(15,0.81)
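One way to obtain a sparse correlation matrix through a Cholesky-type construction is to zero out entries of a lower triangular factor at random, multiply the factor by its transpose, and rescale to a unit diagonal. The sketch below follows that recipe under several assumptions: the nonzero entries are drawn uniformly on (-1, 1), the factor has a unit diagonal, and sparsity is measured on the off-diagonal entries. None of these details are confirmed by the documentation, and `sparse_cor_sketch` is a hypothetical name.

```r
# Sparse correlation matrix from a sparse lower triangular factor
# (illustrative sketch only; `sparse_cor_sketch` is a hypothetical name).
sparse_cor_sketch <- function(d, gamma) {
  stopifnot(d >= 2, gamma >= 0, gamma <= 1)
  L <- matrix(0, nrow = d, ncol = d)
  lower <- lower.tri(L)
  n_lower <- sum(lower)
  # Each strictly lower entry is zero with probability gamma
  L[lower] <- runif(n_lower, -1, 1) * rbinom(n_lower, 1, 1 - gamma)
  diag(L) <- 1                 # unit diagonal keeps L nonsingular
  S <- L %*% t(L)              # positive definite by construction
  R <- cov2cor(S)              # rescale to a correlation matrix
  off <- R[upper.tri(R)]
  list(R = R, sparsity = mean(off == 0))
}

set.seed(3)
res <- sparse_cor_sketch(6, 0.8)
```

The final proportion of zeros generally differs from `gamma`, because a product of sparse factors is usually denser than the factors themselves.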
This function enables the user to estimate the correlation matrix of the Gaussian copula for a given dataset
rho_estim(data, Type, parallel = FALSE)
data | an nxd data frame containing n observations of d variables |
Type | a vector containing the type of the variables, "C" for continuous and "D" for discrete |
parallel | a boolean encoding whether the computations should be parallelized |
the dxd estimated correlation matrix of the Gaussian copula
M <- diag_block_matrix(c(3,4,5),c(0.7,0.8,0.2))
data <- CopulaSim(20,M,c(rep("qnorm(0,1)",6),rep("qexp(0.5)",4),
                         rep("qbinom(4,0.8)",2)),random=FALSE)[[1]]
rho_estim(data,c(rep("C",10),rep("D",2)))
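For intuition about what the copula correlation matrix measures, the base R sketch below uses a simple rank-based normal-scores estimator. It applies to continuous variables only and is a swapped-in textbook technique, not the estimator that `rho_estim` implements; discrete variables need a different treatment that is not shown. `normal_scores_cor` is a hypothetical name.

```r
# Normal-scores estimate of the Gaussian copula correlation for continuous
# data (illustrative sketch only; not the package's rho_estim).
normal_scores_cor <- function(data) {
  n <- nrow(data)
  # Replace each column by Gaussian quantiles of its rescaled ranks
  scores <- sapply(data, function(x) qnorm(rank(x) / (n + 1)))
  cor(scores)
}

set.seed(7)
R_true <- matrix(c(1, 0.7, 0.7, 1), nrow = 2)
Z <- matrix(rnorm(2000 * 2), ncol = 2) %*% chol(R_true)
# Monotone transforms change the margins but not the copula
dat <- data.frame(a = qexp(pnorm(Z[, 1]), rate = 0.5), b = Z[, 2]^3)
est <- normal_scores_cor(dat)
```

Because ranks are invariant under increasing transformations, the estimate recovers the latent correlation even though neither margin is Gaussian.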