Title: Fast Mutual Information Based Independence Test
Description: A mutual information estimator based on the k-nearest neighbor method proposed by A. Kraskov, et al. (2004) <doi:10.1103/PhysRevE.69.066138> to measure general dependence; the time complexity of our estimator is only quadratic in the sample size, which is faster than other statistics. In addition, an implementation of a mutual information based independence test is provided for analyzing multivariate data in Euclidean space (T. B. Berrett, et al. (2019) <doi:10.1093/biomet/asz024>); furthermore, we extend it to tackle datasets in metric spaces.
Authors: Shiyun Lin [aut, cre], Jin Zhu [aut], Wenliang Pan [aut], Xueqin Wang [aut], SC2S2 [cph]
Maintainer: Shiyun Lin <[email protected]>
License: GPL (>= 2)
Version: 0.1.1
Built: 2024-10-27 06:27:42 UTC
Source: CRAN
Estimate mutual information based on the distribution of nearest neighbor distances. The kNN method is described in Kraskov et al. (2004).
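For background, the first Kraskov–Stögbauer–Grassberger (KSG) estimator has the closed form below, where \psi is the digamma function, N the sample size, and n_x(i), n_y(i) the marginal neighbor counts within the i-th point's k-th neighbor distance. Whether mi implements this variant or the second KSG estimator is not stated on this page, so take it as a sketch of the method rather than the package's exact formula:

\hat{I}(X, Y) = \psi(k) + \psi(N) - \frac{1}{N} \sum_{i=1}^{N} \left[ \psi(n_x(i) + 1) + \psi(n_y(i) + 1) \right]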
mi(x, y, k = 5, distance = FALSE)
x: A numeric vector, matrix, data.frame, or dist object.
y: A numeric vector, matrix, data.frame, or dist object.
k: Order of neighborhood to be used in the kNN method.
distance: Logical flag indicating whether x and y are distances (dist objects) rather than raw data. Default: distance = FALSE.
If two samples are passed to arguments x and y, the sample sizes (i.e. the number of rows of the matrix or the length of the vector) must agree. Moreover, data passed to x and y must not contain missing or infinite values.
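As a quick illustration of these constraints, each call below violates a documented requirement; the calls are commented out because they are expected to error:

library(fastmit)
# mi(rnorm(100), rnorm(99))      # error expected: sample sizes disagree
# mi(c(1, NA, 3), c(4, 5, 6))    # error expected: missing values not allowed
# mi(c(1, Inf, 3), c(4, 5, 6))   # error expected: infinite values not allowed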
mi: The estimated mutual information.
Kraskov, A., Stögbauer, H., & Grassberger, P. (2004). Estimating mutual information. Physical Review E, 69(6), 066138.
library(fastmit)
set.seed(1)
x <- rnorm(100)
y <- x + rnorm(100)
mi(x, y, k = 5, distance = FALSE)

set.seed(1)
x <- rnorm(100)
y <- 100 * x + rnorm(100)
distx <- dist(x)
disty <- dist(y)
mi(distx, disty, k = 5, distance = TRUE)
Mutual Information test of independence. Mutual Information is a generic dependence measure in Banach spaces.
mi.test(x, y, k = 5, distance = FALSE, num.permutations = 99, seed = 1)
x: A numeric vector, matrix, data.frame, or dist object.
y: A numeric vector, matrix, data.frame, or dist object.
k: Order of neighborhood to be used in the kNN method.
distance: Logical flag indicating whether x and y are distances (dist objects) rather than raw data. Default: distance = FALSE.
num.permutations: The number of permutation replications. If num.permutations = 0, only the Mutual Information statistic is returned. Default: num.permutations = 99.
seed: The random seed. Default: seed = 1.
If two samples are passed to arguments x and y, the sample sizes (i.e. the number of rows of the matrix or the length of the vector) must agree. Moreover, data passed to x and y must not contain missing or infinite values.
mi.test utilizes the Mutual Information statistic (see mi) to measure dependence and derives a p-value by replicating the random permutation num.permutations times.
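As background, the following is a minimal sketch of how such a permutation p-value can be built on top of mi(); the function name perm_mi_pvalue is hypothetical and mi.test's internal routine may differ in its details:

library(fastmit)

# Sketch of a permutation test for independence based on the mi statistic.
perm_mi_pvalue <- function(x, y, k = 5, num.permutations = 99, seed = 1) {
  set.seed(seed)
  observed <- mi(x, y, k = k)                     # statistic on original data
  replicates <- replicate(num.permutations,
                          mi(x, sample(y), k = k))  # permute y to break dependence
  # add-one correction yields a valid permutation p-value
  (1 + sum(replicates >= observed)) / (1 + num.permutations)
}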
If num.permutations > 0, mi.test returns an htest class object containing the following components:
statistic: Mutual Information statistic.
p.value: The p-value for the test.
replicates: Permutation replications of the test statistic.
size: Sample size.
alternative: A character string describing the alternative hypothesis.
method: A character string indicating what type of test was performed.
data.name: Description of data.
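A short usage sketch for accessing these components (the numeric values depend on the data):

library(fastmit)
set.seed(1)
x <- rnorm(50)
y <- x + rnorm(50)
res <- mi.test(x, y, num.permutations = 99)
res$statistic   # Mutual Information statistic
res$p.value     # permutation p-value
res$replicates  # vector of permuted statistics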
If num.permutations = 0, mi.test returns the statistic value.
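For example, to compute only the statistic without running a permutation test:

library(fastmit)
set.seed(1)
x <- rnorm(50)
y <- x + rnorm(50)
mi.test(x, y, num.permutations = 0)  # returns the statistic only, no p-value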
library(fastmit)
set.seed(1)
error <- runif(50, min = -0.3, max = 0.3)
x <- runif(50, 0, 4*pi)
y <- cos(x) + error
# plot(x, y)
res <- mi.test(x, y)
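The test can also be run on precomputed distances, which is how the metric-space extension mentioned in the package description can be exercised. A sketch, assuming distance = TRUE treats the inputs as dist objects exactly as documented for mi():

library(fastmit)
set.seed(1)
x <- runif(50, 0, 4*pi)
y <- cos(x) + runif(50, min = -0.3, max = 0.3)
distx <- dist(x)  # pairwise distances for x
disty <- dist(y)  # pairwise distances for y
mi.test(distx, disty, k = 5, distance = TRUE, num.permutations = 99)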