Title: | Conditional Distance Correlation Based Feature Screening and Conditional Independence Inference |
---|---|
Description: | Conditional distance correlation <doi:10.1080/01621459.2014.993081> is a novel measure of conditional dependence between two multivariate random variables given a confounding variable. This package computes the conditional distance correlation, performs the conditional distance correlation sure independence screening procedure for ultrahigh-dimensional data <https://www3.stat.sinica.edu.tw/statistica/J28N1/J28N114/J28N114.html>, and conducts the conditional distance covariance test for the conditional independence assumption of two multivariate variables. |
Authors: | Wenhao Hu [aut], Mian Huang [aut], Wenliang Pan [aut], Xueqin Wang [aut], Canhong Wen [aut, cre], Yuan Tian [aut], Heping Zhang [aut], Jin Zhu [aut] |
Maintainer: | Canhong Wen <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.0.5 |
Built: | 2024-11-22 06:57:13 UTC |
Source: | CRAN |
Conditional distance correlation <doi:10.1080/01621459.2014.993081> is a novel measure of conditional dependence between two multivariate random variables given a confounding variable. This package computes the conditional distance correlation, performs the conditional distance correlation sure independence screening procedure for ultrahigh-dimensional data <doi:10.5705/ss.202014.0117>, and conducts the conditional distance covariance test for the conditional independence assumption of two multivariate variables.
Wang, X., Pan, W., Hu, W., Tian, Y. and Zhang, H., 2015. Conditional distance correlation. Journal of the American Statistical Association, 110(512), pp.1726-1734.
Wen, C., Pan, W., Huang, M. and Wang, X., 2018. Sure independence screening adjusted for confounding covariates with ultrahigh-dimensional data. Statistica Sinica, 28, pp.293-317. URL http://www3.stat.sinica.edu.tw/statistica/J28N1/28-1.html
Computes conditional distance covariance and conditional distance correlation statistics, which are multivariate measures of conditional dependence.
cdcov(x, y, z, width, index = 1, distance = FALSE)
cdcor(x, y, z, width, index = 1, distance = FALSE)
x |
a numeric vector, matrix, or |
y |
a numeric vector, matrix, or |
z |
|
width |
a user-specified positive value (univariate conditional variable) or vector (multivariate conditional variable) for
the Gaussian kernel bandwidth. Its default value relies on |
index |
exponent on Euclidean distance, in |
distance |
if |
cdcov and cdcor compute conditional distance covariance and conditional distance correlation statistics.
The sample sizes (number of rows or length of the vector) of the two variables must agree,
and samples must not contain missing values.
If we set distance = TRUE, arguments x, y can be a dist object recording distance between samples;
otherwise, these arguments are treated as multivariate data.
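The two input modes described above can be sketched as follows. This is a minimal illustration, assuming the cdcsis package is installed; the sample size and seed are arbitrary choices, the dist objects are built with base R's dist(), and no particular output value is claimed.

```r
library(cdcsis)

set.seed(1)
n <- 30                      # arbitrary sample size, chosen for illustration
x <- rnorm(n)
y <- rnorm(n)
z <- rnorm(n)

# Raw-data input: x and y are treated as (multivariate) data.
res_raw <- cdcov(x, y, z)

# Distance input: x and y are dist objects, passed together with
# distance = TRUE; z is still supplied as raw data.
res_dist <- cdcov(dist(x), dist(y), z, distance = TRUE)

print(res_raw)
print(res_dist)
```

Passing precomputed distances can be convenient when the same distance objects are reused across several calls.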
cdcov |
conditional distance covariance test statistic. |
cdcor |
conditional distance correlation statistic. |
cdc |
conditional distance covariance/correlation vector. |
Canhong Wen, Wenliang Pan, and Xueqin Wang
Wang, X., Pan, W., Hu, W., Tian, Y. and Zhang, H., 2015. Conditional distance correlation. Journal of the American Statistical Association, 110(512), pp.1726-1734.
library(cdcsis)

############# Conditional Distance Covariance #############
set.seed(1)
x <- rnorm(25)
y <- rnorm(25)
z <- rnorm(25)
cdcov(x, y, z)

############# Conditional Distance Correlation #############
num <- 25
set.seed(1)
x <- rnorm(num)
y <- rnorm(num)
z <- rnorm(num)
cdcor(x, y, z)
Performs the nonparametric conditional distance covariance test for the conditional independence assumption.
cdcov.test(x, y, z, num.bootstrap = 99, width, distance = FALSE, index = 1, seed = 1, num.threads = 1)
x |
a numeric vector, matrix, or |
y |
a numeric vector, matrix, or |
z |
|
num.bootstrap |
the number of local bootstrap procedure replications. Default: |
width |
a user-specified positive value (univariate conditional variable) or vector (multivariate conditional variable) for
the Gaussian kernel bandwidth. Its default value relies on |
distance |
if |
index |
exponent on Euclidean distance, in |
seed |
the random seed |
num.threads |
number of threads. Default |
cdcov.test returns a list with class "htest" containing the following components:
statistic |
conditional distance covariance statistic. |
p.value |
the |
replicates |
the number of local bootstrap procedure replications. |
size |
sample sizes. |
alternative |
a character string describing the alternative hypothesis. |
method |
a character string indicating what type of test was performed. |
data.name |
description of data. |
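As a minimal sketch of how the returned components might be inspected, the snippet below runs the test on simulated data and reads a few of the fields listed above. It assumes the cdcsis package is installed; the data-generating model, sample size, and seed are illustrative assumptions, and no particular statistic or p-value is claimed.

```r
library(cdcsis)

set.seed(1)
n <- 40                      # arbitrary sample size, chosen for illustration
z <- rnorm(n)
x <- z + rnorm(n)            # x depends on z
y <- z + rnorm(n)            # y depends on z, with noise independent of x

res <- cdcov.test(x, y, z, num.bootstrap = 99, seed = 1)

class(res)                   # class of the returned list
res$statistic                # conditional distance covariance statistic
res$p.value                  # p-value from the local bootstrap
res$replicates               # number of local bootstrap replications
```

Because the result is an "htest" object, printing it with print(res) should give the usual formatted test summary.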
Wang, X., Pan, W., Hu, W., Tian, Y. and Zhang, H., 2015. Conditional distance correlation. Journal of the American Statistical Association, 110(512), pp.1726-1734.
library(cdcsis)
set.seed(1)
num <- 50

################# Conditional Independent #################
## Case 1:
cov_mat <- matrix(c(1, 0.36, 0.6, 0.36, 1, 0.6, 0.6, 0.6, 1), nrow = 3)
dat <- mvtnorm::rmvnorm(n = num, sigma = cov_mat)
x <- dat[, 1]
y <- dat[, 2]
z <- dat[, 3]
cdcov.test(x, y, z)

## Case 2:
z <- rnorm(num)
x <- 0.5 * (z^3 / 7 + z / 2) + tanh(rnorm(num))
x <- x + x^3 / 3
y <- (z^3 + z) / 3 + rnorm(num)
y <- y + tanh(y / 3)
cdcov.test(x, y, z, num.bootstrap = 99)

################# Conditional Dependent #################
## Case 3:
cov_mat <- matrix(c(1, 0.7, 0.6, 0.7, 1, 0.6, 0.6, 0.6, 1), nrow = 3)
dat <- mvtnorm::rmvnorm(n = num, sigma = cov_mat)
x <- dat[, 1]
y <- dat[, 2]
z <- dat[, 3]
cdcov.test(x, y, z, width = 0.5)

## Case 4:
z <- matrix(rt(num * 4, df = 2), nrow = num)
x <- z
y <- cbind(sin(z[, 1]) + cos(z[, 2]) + (z[, 3])^2 + (z[, 4])^2,
           (z[, 1])^2 + (z[, 2])^2 + z[, 3] + z[, 4])
z <- z[, 1:2]
cdcov.test(x, y, z, seed = 2)

################# Distance Matrix Input #################
x <- dist(x)
y <- dist(y)
cdcov.test(x, y, z, seed = 2, distance = TRUE)
Performs conditional distance correlation sure independence screening (CDC-SIS).
cdcsis(x, y, z = NULL, width, threshold = nrow(y), distance = FALSE, index = 1, num.threads = 1)
x |
a numeric matrix, or a list which contains multiple numeric matrices |
y |
a numeric vector, matrix, or |
z |
|
width |
a user-specified positive value (univariate conditional variable) or vector (multivariate conditional variable) for
the Gaussian kernel bandwidth. Its default value relies on |
threshold |
the threshold of the number of predictors recruited by CDC-SIS.
Should be less than or equal to the number of columns of |
distance |
if |
index |
exponent on Euclidean distance, in |
num.threads |
number of threads. Default |
ix |
the vector of indices selected by CDC-SIS |
cdcor |
the conditional distance correlation for each univariate/multivariate variable in |
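The threshold argument and the two returned components can be illustrated with a small sketch. It assumes the cdcsis package is installed; the dimensions, the linear model for y, and the choice threshold = 5 are illustrative assumptions, and no claim is made about which indices are actually selected.

```r
library(cdcsis)

set.seed(1)
n <- 60                      # arbitrary sample size, chosen for illustration
p <- 20                      # number of candidate predictors
x <- matrix(rnorm(n * p), nrow = n)
z <- rnorm(n)
y <- 3 * x[, 1] + 1.5 * x[, 2] + rnorm(n)

# Ask the screening step to recruit no more than 5 of the 20 predictors.
res <- cdcsis(x, y, z, threshold = 5)

res[["ix"]]                  # indices selected by CDC-SIS
res[["cdcor"]]               # conditional distance correlations
```

A smaller threshold keeps the recruited set compact, which is usually the point of screening before fitting a more expensive model.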
Canhong Wen, Wenliang Pan, Mian Huang, and Xueqin Wang
Wen, C., Pan, W., Huang, M. and Wang, X., 2018. Sure independence screening adjusted for confounding covariates with ultrahigh-dimensional data. Statistica Sinica, 28, pp.293-317. URL http://www3.stat.sinica.edu.tw/statistica/J28N1/J28N114/J28N114.html
## Not run:
library(cdcsis)

########## univariate explanatory variables ##########
set.seed(1)
num <- 100
p <- 150
x <- matrix(rnorm(num * p), nrow = num)
z <- rnorm(num)
y <- 3 * x[, 1] + 1.5 * x[, 2] + 4 * z * x[, 5] + rnorm(num)
res <- cdcsis(x, y, z)
head(res[["ix"]], n = 10)

########## multivariate explanatory variables ##########
x <- as.list(as.data.frame(x))
x <- lapply(x, as.matrix)
x[[1]] <- cbind(x[[1]], x[[2]])
x[[2]] <- NULL
res <- cdcsis(x, y, z)
head(res[["ix"]], n = 10)

########## multivariate response variables ##########
num <- 100
p <- 150
x <- matrix(rnorm(num * p), nrow = num)
z <- rnorm(num)
y1 <- 3 * x[, 1] + 5 * z * x[, 4] + rnorm(num)
y2 <- 3 * x[, 2] + 5 * x[, 3] + 2 * z + rnorm(num)
y <- cbind(y1, y2)
res <- cdcsis(x, y, z)
head(res[["ix"]], n = 10)

## End(Not run)