Title: | Model-Free Functional Chi-Squared and Exact Tests |
---|---|
Description: | Statistical hypothesis testing methods for inferring model-free functional dependency using asymptotic chi-squared or exact distributions. Functional test statistics are asymmetric and functionally optimal, unique from other related statistics. Tests in this package reveal evidence for causality based on the causality-by-functionality principle. They include asymptotic functional chi-squared tests (Zhang & Song 2013) <doi:10.48550/arXiv.1311.2707>, an adapted functional chi-squared test (Kumar & Song 2022) <doi:10.1093/bioinformatics/btac206>, and an exact functional test (Zhong & Song 2019) <doi:10.1109/TCBB.2018.2809743> (Nguyen et al. 2020) <doi:10.24963/ijcai.2020/372>. The normalized functional chi-squared test was used by Best Performer 'NMSUSongLab' in HPN-DREAM (DREAM8) Breast Cancer Network Inference Challenges (Hill et al. 2016) <doi:10.1038/nmeth.3773>. A function index (Zhong & Song 2019) <doi:10.1186/s12920-019-0565-9> (Kumar et al. 2018) <doi:10.1109/BIBM.2018.8621502> derived from the functional test statistic offers a new effect size measure for the strength of functional dependency, a better alternative to conditional entropy in many aspects. For continuous data, these tests offer an advantage over regression analysis when a parametric functional form cannot be assumed; for categorical data, they provide a novel means to assess directional dependency not possible with symmetrical Pearson's chi-squared or Fisher's exact tests. |
Authors: | Yang Zhang [aut], Hua Zhong [aut] |
Maintainer: | Joe Song <[email protected]> |
License: | LGPL (>= 3) |
Version: | 2.5.4 |
Built: | 2025-02-05 06:42:44 UTC |
Source: | CRAN |
Statistical hypothesis testing methods for model-free functional dependency using asymptotic chi-squared or exact distributions. Functional chi-squared test statistics (Zhang and Song 2013; Zhang 2014; Nguyen 2018; Zhong 2019; Zhong and Song 2019a; Nguyen et al. 2020) are asymmetric, functionally optimal, and model-free, unique from other related statistical measures.
Tests in this package reveal evidence for causality based on the causality-by-functionality principle (Simon and Rescher 1966). The tests require data from two or more variables be formatted as a contingency table. Continuous variables need to be discretized first, for example, using R packages Ckmeans.1d.dp or GridOnClusters.
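As a rough illustration of the formatting step, the base-R sketch below bins two continuous vectors and cross-tabulates them into a contingency table. Equal-frequency binning with cut() is only a stand-in assumed here for brevity; it is not the optimal clustering performed by Ckmeans.1d.dp or GridOnClusters. The helper bin() and the simulated variables are hypothetical.

```r
set.seed(1)
n <- 200
x <- runif(n, -1, 1)            # hypothetical cause variable
y <- x^2 + rnorm(n, sd = 0.05)  # hypothetical effect: y depends on x

# Equal-frequency binning into k levels
# (an assumed, simple stand-in for optimal discretization)
bin <- function(v, k = 3) {
  breaks <- quantile(v, probs = seq(0, 1, length.out = k + 1))
  cut(v, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

# Rows index the discretized x, columns the discretized y
tab <- table(x = bin(x), y = bin(y))
tab
```

The resulting table has the row variable as the candidate cause and the column variable as the candidate effect, which is the orientation the tests in this package expect.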
The package implements an asymptotic functional chi-squared test (Zhang and Song 2013; Zhang 2014), an adapted functional chi-squared test (Kumar and Song 2022), and an exact functional test (Nguyen 2018; Zhong 2019; Zhong and Song 2019a; Nguyen et al. 2020). The normalized functional chi-squared test was used by Best Performer NMSUSongLab in HPN-DREAM (DREAM8) Breast Cancer Network Inference Challenges (Hill et al. 2016).
A function index derived from the functional chi-squared statistic offers a new effect size measure for the strength of functional dependency. It is asymmetric and functionally optimal, unlike the symmetric Cramer's V, and is a better alternative to conditional entropy in many aspects.
A simulator is provided to generate functional, dependent non-functional, and independent patterns (Sharma et al. 2017).
For continuous data, these tests offer an advantage over regression analysis when a parametric form cannot be reliably assumed for the underlying function. For categorical data, they provide a novel means to assess directional dependency not possible with symmetrical Pearson's chi-squared test, G-test, or Fisher's exact test.
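To make the asymmetry concrete, here is a base-R sketch of the functional chi-squared statistic as it is commonly written following Zhang and Song (2013): the sum of row-wise chi-squared deviations from a uniform row, minus the chi-squared deviation of the column sums from a uniform distribution, with (r-1)(s-1) degrees of freedom for an r-by-s table. The helper name fun_chisq_stat is hypothetical, and the package's own implementation may treat edge cases such as empty rows differently.

```r
# Functional chi-squared statistic for "column variable = f(row variable)".
# Sketch only: rows with zero total are assumed to contribute nothing.
fun_chisq_stat <- function(tab) {
  tab <- as.matrix(tab)
  n <- sum(tab)
  s <- ncol(tab)
  rs <- rowSums(tab)
  cs <- colSums(tab)
  keep <- rs > 0
  row_exp <- rs[keep] / s  # expected count per cell if a row were uniform
  row_part <- sum((tab[keep, , drop = FALSE] - row_exp)^2 / row_exp)
  col_part <- sum((cs - n / s)^2 / (n / s))
  row_part - col_part
}

x <- matrix(c(20, 0, 20, 0, 20, 0, 5, 0, 5), 3)
df <- (nrow(x) - 1) * (ncol(x) - 1)

stat_xy <- fun_chisq_stat(x)     # column variable as a function of row variable
stat_yx <- fun_chisq_stat(t(x))  # reverse direction
c(stat_xy, stat_yx)

# Asymptotic p-values from the chi-squared distribution
pchisq(c(stat_xy, stat_yx), df = df, lower.tail = FALSE)
```

Transposing the table changes the statistic, which is what distinguishes it from Pearson's chi-squared statistic.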
Package: | FunChisq |
Type: | Package |
Current version: | 2.5.3 |
Initial release version: | 1.0 |
Initial release date: | 2014-03-08 |
License: | LGPL (>= 3) |
Yang Zhang, Hua Zhong, Hien Nguyen, Ruby Sharma, Sajal Kumar, Yiyi Li, and Joe Song
Hill SM, Heiser LM, Cokelaer T, Unger M, Nesser NK, Carlin DE, Zhang Y, Sokolov A, Paull EO, Wong CK, Graim K, Bivol A, Wang H, Zhu F, Afsari B, Danilova LV, Favorov AV, Lee WS, Taylor D, Hu CW, Long BL, Noren DP, Bisberg AJ, The HPN-DREAM Consortium, Mills GB, Gray JW, Kellen M, Norman T, Friend S, Qutub AA, Fertig EJ, Guan Y, Song M, Stuart JM, Spellman PT, Koeppl H, Stolovitzky G, Saez-Rodriguez J, Mukherjee S (2016).
“Inferring causal molecular networks: empirical assessment through a community-based effort.”
Nat Methods, 13, 310–318.
doi:10.1038/nmeth.3773.
Nguyen HH (2018).
Inference of Functional Dependency via Asymmetric, Optimal, and Model-free Statistics.
Ph.D. thesis, Department of Computer Science, New Mexico State University, Las Cruces, NM, USA.
Nguyen HH, Zhong H, Song M (2020).
“Optimality, accuracy, and efficiency of an exact functional test.”
In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, 2683–2689.
doi:10.24963/ijcai.2020/372.
Sharma R, Kumar S, Zhong H, Song M (2017).
“Simulating noisy, nonparametric, and multivariate discrete patterns.”
The R Journal, 9(2), 366–377.
doi:10.32614/RJ-2017-053.
Simon HA, Rescher N (1966).
“Cause and counterfactual.”
Philosophy of Science, 33(4), 323–340.
Zhang Y (2014).
Nonparametric Statistical Methods for Biological Network Inference.
Ph.D. thesis, Department of Computer Science, New Mexico State University, Las Cruces, NM, USA.
Zhang Y, Song M (2013).
“Deciphering interactions in causal networks without parametric assumptions.”
arXiv Molecular Networks, arXiv:1311.2707.
https://arxiv.org/abs/1311.2707.
Zhong H (2019).
Model-free Gene-to-zone Network Inference of Molecular Mechanisms in Biology.
Ph.D. thesis, Department of Computer Science, New Mexico State University, Las Cruces, NM, USA.
Zhong H, Song M (2019a).
“A fast exact functional test for directional association and cancer biology applications.”
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 16(3), 818–826.
doi:10.1109/TCBB.2018.2809743.
For data discretization, an option is optimal univariate clustering via package Ckmeans.1d.dp. A second option is joint multivariate discretization via package GridOnClusters.
For symmetric dependency tests on discrete data, see Pearson's chi-squared test (chisq.test), Fisher's exact test (fisher.test), mutual information (package entropy), and the G-test, implemented in packages DescTools and RVAideMemoire.
The function can apply two types of noise to contingency tables of discrete values. A house noise model is designed for ordinal variables; a candle noise model is for categorical variables. Noise is applied independently for each data point in a table.
add.noise(tables, u, noise.model, margin=0)
add.house.noise(tables, u, margin=0)
add.candle.noise(tables, u, margin=0)
tables |
a list of tables or one table. A table can be either a matrix or a data frame of integer values. |
u |
a numeric value between 0 and 1 to specify the noise level to be applied to the input tables. See Details. |
noise.model |
a character string indicating the noise model of either |
margin |
a value of either 0, 1, or 2. Default is 0. 0: noise is applied along both rows and columns in a table. The sum of values in the table is the same before and after noise application. 1: noise is applied along each row. The sum of each row is the same before and after noise application. 2: noise is applied along each column. The sum of each column is the same before and after noise application. |
Each noise model defines a conditional probability function of a noisy version given an original discrete value and a noise level. In the house noise model for ordinal variables, defined in (Zhang et al. 2015), the probability decreases as the noisy version deviates from the original ordinal value. The shape of the function is like a pitched house roof. In the candle noise model for categorical variables, the probability of the noisy version for any value other than the original categorical value is the same given the noise level. The function shape is like a candle.
At a minimum level of 0, no noise is applied on the input table(s). A maximum level of 1 indicates that the original sample will be changed to some other values with a probability of 1. For a discrete random variable of two possible values, a noise level of 1 will flip the values and create a non-random pattern; a noise level of 0.5 creates the most random pattern.
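One reading of the candle model that is consistent with the description above keeps the original value with probability 1 - u and spreads u evenly over the other k - 1 categories. The sketch below builds that conditional probability matrix in base R. The function candle_prob is hypothetical and the parameterization is an assumption; it is not taken from the package source.

```r
# P[i, j] = Pr(noisy value = j | original value = i)
# for k categories at noise level u, under the assumed uniform reading.
candle_prob <- function(k, u) {
  stopifnot(k >= 2, u >= 0, u <= 1)
  P <- matrix(u / (k - 1), nrow = k, ncol = k)
  diag(P) <- 1 - u
  P
}

candle_prob(2, 1)    # binary variable, level 1: values always flip
candle_prob(2, 0.5)  # binary variable, level 0.5: both outcomes equally likely
candle_prob(4, 0.2)  # four categories: 0.8 on the diagonal, 0.2/3 elsewhere
```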
If tables is a list, the function returns a list of tables with noise applied. If tables is a numeric matrix or a data frame, the function returns one table with noise applied.
Hua Zhong, Yang Zhang, and Joe Song.
Zhang Y, Liu ZL, Song M (2015). “ChiNet uncovers rewired transcription subnetworks in tolerant yeast for advanced biofuels conversion.” Nucleic Acids Research, 43(9), 4393–4407. doi:10.1093/nar/gkv358.
# Example 1. Add house noise to a single table
# Create a 4x4 table
t <- matrix(c(3,0,0,0, 0,2,2,0, 0,0,0,4, 3,3,2,0), nrow=4, ncol=4, byrow=TRUE)
# Two ways to apply house noise at level 0.1 along both rows
# and columns of the table:
add.noise(t, 0.1, "house", 0)
add.house.noise(t, 0.1, 0)

# Example 2. Add candle noise to a list of tables
# Create a list of tables
t.list <- list(t+5, t*10, t*2)
# Two ways to apply candle noise at level 0.2 along the rows
# of the table:
add.noise(t.list, 0.2, "candle", 1)
add.candle.noise(t.list, 0.2, 1)
Asymptotic chi-squared test to determine the model-free functional dependency of an effect variable Y on a cause variable X, conditioned on a third variable Z.
cond.fun.chisq.test(x, y, z=NULL, data=NULL, log.p = FALSE, method = c("fchisq", "nfchisq"))
x |
vector or character; either a discrete random variable (cause) represented as vector or a character column name in |
y |
vector or character; either a discrete random variable (effect) represented as vector or a character column name in |
z |
vector or character; either a discrete random variable (condition) represented as vector or a character column name in |
data |
a data frame containing three or more columns whose names can be used as values for |
log.p |
logical; if |
method |
a character string to specify the method to compute the conditional functional chi-squared test statistic and its p-value. The options are |
The conditional functional chi-squared test introduces the concept of conditional functional dependency, where the functional association between two variables (x and y) is tested conditioned on a third variable (z) (Zhang 2014). Two methods are provided to compute the chi-squared statistic and its p-value. When method = "fchisq", the p-value is computed using the chi-squared distribution; when method = "nfchisq", a normalized statistic is obtained by shifting and scaling the original chi-squared statistic and a p-value is computed using the standard normal distribution (Box et al. 2005). The normalized test is more conservative on the degrees of freedom.
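The shift and scale can be sketched as follows, assuming the statistic is centered by the mean (df) and scaled by the standard deviation (sqrt(2 * df)) of a chi-squared distribution with df degrees of freedom. The helper normalize_chisq is hypothetical, and the exact constants used by the package are an assumption here.

```r
# Normalize a chi-squared statistic and obtain an upper-tail p-value
# from the standard normal distribution.
normalize_chisq <- function(stat, df) {
  z <- (stat - df) / sqrt(2 * df)
  list(z = z, p.value = pnorm(z, lower.tail = FALSE))
}

# Hypothetical inputs: a statistic of 72 with 4 degrees of freedom
res <- normalize_chisq(stat = 72, df = 4)
res$z
res$p.value
```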
A list with class "htest" containing the following components:
statistic |
the conditional functional chi-squared statistic if |
parameter |
degrees of freedom for the conditional functional chi-squared statistic. |
p.value |
p-value of the conditional functional test. If |
estimate |
an estimate of the conditional function index between 0 and 1. The value of 1 indicates strong functional dependency between |
Sajal Kumar and Mingzhou Song
Box GE, Hunter JS, Hunter WG (2005).
Statistics for Experimenters: Design, Innovation and Discovery, 2nd edition.
Wiley-Interscience, New York.
Zhang Y (2014).
Nonparametric Statistical Methods for Biological Network Inference.
Ph.D. thesis, Department of Computer Science, New Mexico State University, Las Cruces, NM, USA.
See the (unconditional) functional chi-squared test fun.chisq.test.
# Generate a relationship between variables X and Z
xz = matrix(c(30,2,2, 2,2,40, 2,30,2),ncol=3,nrow=3, byrow = TRUE)
# Re-construct X
x = rep(c(1:nrow(xz)),rowSums(xz))
# Re-construct Z
z = c()
for(i in 1:nrow(xz)) z = c(z,rep(c(1:ncol(xz)),xz[i,]))

# Generate a relationship between variables Z and Y
# Make sure Z retains its distribution
zy = matrix(c(4,30, 30,4, 4,40),ncol=2,nrow=3, byrow = TRUE)
# Re-construct Y
y = rep(0,length(z))
for(i in unique(z)) y[z==i] = rep(c(1:ncol(zy)),zy[i,])

# Tables
table(x,z)
table(z,y)
table(x,y)

# Conditional functional dependency
# Y = f(X) | Z should be false
cond.fun.chisq.test(x=x,y=y,z=z)
# Z = f(X) | Y should be true
cond.fun.chisq.test(x=x,y=z,z=y)
# Y = f(Z) | X should be true
cond.fun.chisq.test(x=z,y=y,z=x)
Comparative functional chi-squared tests on two or more contingency tables.
cp.fun.chisq.test( x, method = c("fchisq", "nfchisq", "default", "normalized"), log.p = FALSE )
x |
a list of at least two matrices representing contingency tables of the same dimensionality. |
method |
a character string to specify the method to compute the functional chi-squared statistic and its p-value. The default is Note: |
log.p |
logical; if |
The comparative functional chi-squared test determines whether the patterns underlying the contingency tables are heterogeneous in a functional way (Zhang 2014). Specifically, it evaluates whether the column variable is a changed function of the row variable across the contingency tables.
Two methods are provided to compute the functional chi-squared statistic and its p-value. When method = "fchisq" (or "default"), the p-value is computed using the chi-squared distribution; when method = "nfchisq" (or "normalized"), a normalized statistic is obtained by shifting and scaling the original statistic and a p-value is computed using the standard normal distribution (Box et al. 2005). The normalized test is more conservative on the degrees of freedom.
A list with class "htest" containing the following components:
statistic |
functional heterogeneity statistic if |
parameter |
degrees of freedom. |
p.value |
p-value of the comparative functional chi-squared test. By default, it is computed by the chi-squared distribution. If |
Yang Zhang and Joe Song
Box GE, Hunter JS, Hunter WG (2005).
Statistics for Experimenters: Design, Innovation and Discovery, 2nd edition.
Wiley-Interscience, New York.
Zhang Y (2014).
Nonparametric Statistical Methods for Biological Network Inference.
Ph.D. thesis, Department of Computer Science, New Mexico State University, Las Cruces, NM, USA.
For the comparative chi-squared test that does not consider functional dependencies, see cp.chisq.test.
x <- matrix(c(4,0,4,0,4,0,1,0,1), 3)
y <- t(x)
z <- matrix(c(1,0,1,4,0,4,0,4,0), 3)
data <- list(x,y,z)
cp.fun.chisq.test(data)
cp.fun.chisq.test(data, method="nfchisq")
Perform the exact functional test on a contingency table to determine if the column variable is a function of the row variable. The null population includes tables with fixed row and column sums as in the observed table. The null distribution follows an exact multivariate hypergeometric distribution.
EFTDP(nm)
EFTDQP(nm)
nm |
a matrix of nonnegative integers representing a contingency table. |
The exact functional test is performed using branch-and-bound with two algorithms (DP and DQP) to avoid re-calculation of bounds (Nguyen 2018; Nguyen et al. 2020).
The exact p-value of the test.
The functions provide a direct entry into the C++ implementations of the exact functional test (Nguyen 2018; Nguyen et al. 2020).
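For intuition about the null population of tables with fixed row and column sums, the sketch below approximates a p-value by Monte Carlo sampling with r2dtable() from base R, using the functional chi-squared statistic as the test statistic. This is an assumed illustration only: it is not the branch-and-bound DP or DQP algorithm, it returns an estimate rather than an exact p-value, and the helper names are hypothetical.

```r
# Functional chi-squared statistic (sketch; empty rows contribute nothing)
fun_chisq_stat <- function(tab) {
  tab <- as.matrix(tab)
  n <- sum(tab)
  s <- ncol(tab)
  rs <- rowSums(tab)
  keep <- rs > 0
  row_exp <- rs[keep] / s
  sum((tab[keep, , drop = FALSE] - row_exp)^2 / row_exp) -
    sum((colSums(tab) - n / s)^2 / (n / s))
}

# Monte Carlo estimate of Pr(statistic >= observed) over random tables
# that share the observed row and column sums
mc_fixed_margin_p <- function(tab, n.sim = 1000) {
  obs <- fun_chisq_stat(tab)
  sims <- r2dtable(n.sim, rowSums(tab), colSums(tab))
  sim.stats <- vapply(sims, fun_chisq_stat, numeric(1))
  mean(sim.stats >= obs - 1e-9)  # small tolerance for floating-point ties
}

set.seed(1)
x <- matrix(c(0, 6, 3, 0, 10, 5, 4, 4, 1), nrow = 3)
mc_fixed_margin_p(x)
mc_fixed_margin_p(t(x))
```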
Hien Nguyen, Hua Zhong, Yiyi Li, and Joe Song
Nguyen HH (2018).
Inference of Functional Dependency via Asymmetric, Optimal, and Model-free Statistics.
Ph.D. thesis, Department of Computer Science, New Mexico State University, Las Cruces, NM, USA.
Nguyen HH, Zhong H, Song M (2020).
“Optimality, accuracy, and efficiency of an exact functional test.”
In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, 2683–2689.
doi:10.24963/ijcai.2020/372.
x = matrix(c(0, 6, 3, 0, 10, 5, 4, 4, 1), nrow=3)
EFTDQP(x)
EFTDQP(t(x))
EFTDP(x)
EFTDP(t(x))
Asymptotic chi-squared, normalized chi-squared or exact tests on contingency tables to determine model-free functional dependency of the column variable on the row variable.
fun.chisq.test( x, method = c("fchisq", "nfchisq", "adapted", "exact", "exact.qp", "exact.dp", "exact.dqp", "default", "normalized", "simulate.p.value"), alternative = c("non-constant", "all"), log.p=FALSE, index.kind = c("conditional", "unconditional"), simulate.nruns = 2000, exact.mode.bound=TRUE )
x |
a matrix representing a contingency table. The row variable represents the independent variable or all unique combinations of multiple independent variables. The column variable is the dependent variable. |
method |
a character string to specify the method to compute the functional chi-squared test statistic and its p-value. The options are Note: |
alternative |
a character string to specify the alternative hypothesis. The options are |
log.p |
logical; if |
index.kind |
a character string to specify the kind of function index xi.f to be estimated. The options are |
simulate.nruns |
A number to specify the number of tables generated to simulate the null distribution. Default is |
exact.mode.bound |
logical; if |
The functional chi-squared test determines whether the column variable is a function of the row variable in contingency table x (Zhang and Song 2013; Zhang 2014). This function supports the following hypothesis testing methods:
When method="fchisq" (equivalent to "default", the default), the test statistic is computed as described in (Zhang and Song 2013; Zhang 2014) and the p-value is computed using the chi-squared distribution.
When method="nfchisq" (equivalent to "normalized"), the test statistic is obtained by shifting and scaling the original test statistic (Zhang and Song 2013; Zhang 2014), and the p-value is computed using the standard normal distribution (Box et al. 2005). The normalized chi-squared, more conservative on the degrees of freedom, was used by the Best Performer NMSUSongLab in HPN-DREAM (DREAM8) Breast Cancer Network Inference Challenges.
When method="exact", "exact.qp" (quadratic programming) (Zhong and Song 2019a; Zhong 2019), "exact.dp" (dynamic programming) (Nguyen 2018; Nguyen et al. 2020), or "exact.dqp" (dynamic and quadratic programming) (Nguyen 2018; Nguyen et al. 2020), an exact functional test is performed. The option "exact" uses "exact.dqp", the fastest method. All methods compute an exact p-value.
When method="adapted", the adapted functional chi-squared test (Kumar and Song 2022) is used. The test statistic is obtained by evaluating the most populous portrait or square (number of rows <= number of columns) table in the contingency table x. The p-value is computed using the chi-squared distribution. This option should be used to determine the functional direction between variables in x.
For the "exact.qp" and "exact.dp" options, if the sample size is no more than 200 or the average cell count is less than five, and the table size is no more than 10 in either row or column, the exact test will not be called and the asymptotic functional chi-squared test (method="fchisq") is used instead. For "exact.dqp", the exact functional test will always be performed.
For 2-by-2 contingency tables, the asymptotic test options (method="fchisq" or "nfchisq") are recommended to test functional dependency, instead of the exact functional test.
When method="simulate.p.value", a simulated null distribution is used to calculate the p-value. The null distribution is a multinomial distribution that is the product of two marginal distributions. Like other Monte Carlo based methods, this method is slower but may be more accurate than other methods based on asymptotic distributions.
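The simulated null described for method="simulate.p.value" can be sketched in base R as follows: tables are drawn from a multinomial distribution whose cell probabilities are the outer product of the observed row and column proportions, and the p-value is the fraction of simulated statistics at least as large as the observed one. The helper names are hypothetical, and details such as how ties are counted are assumptions rather than the package's actual code.

```r
# Functional chi-squared statistic (sketch; empty rows contribute nothing)
fun_chisq_stat <- function(tab) {
  tab <- as.matrix(tab)
  n <- sum(tab)
  s <- ncol(tab)
  rs <- rowSums(tab)
  keep <- rs > 0
  row_exp <- rs[keep] / s
  sum((tab[keep, , drop = FALSE] - row_exp)^2 / row_exp) -
    sum((colSums(tab) - n / s)^2 / (n / s))
}

# Simulated p-value under the product-of-marginals multinomial null
sim_null_p <- function(tab, n.sim = 2000) {
  n <- sum(tab)
  prob <- outer(rowSums(tab), colSums(tab)) / n^2
  obs <- fun_chisq_stat(tab)
  # Each column of 'draws' is one simulated table in column-major order
  draws <- rmultinom(n.sim, size = n, prob = as.vector(prob))
  sim.stats <- apply(draws, 2, function(v) {
    fun_chisq_stat(matrix(v, nrow = nrow(tab)))
  })
  mean(sim.stats >= obs - 1e-9)
}

set.seed(1)
x <- matrix(c(20, 0, 20, 0, 20, 0, 5, 0, 5), 3)
p_sim <- sim_null_p(x)
p_sim
```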
index.kind specifies the kind of function index to be computed. If the experimental design controls neither the row nor column marginal sums, index.kind = "unconditional" is recommended; if the column marginal sums are controlled, index.kind = "conditional" is recommended. The conditional function index is the square root of Goodman-Kruskal's tau (Goodman and Kruskal 1954). The choice of index.kind affects only the function index xi.f value, but not the test statistic or p-value.
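The relation to Goodman-Kruskal's tau can be checked by hand. The sketch below computes tau for predicting the column variable from the row variable using its standard definition, then takes the square root. The helper gk_tau is hypothetical, and the package's estimate may differ in numerical details.

```r
# Goodman-Kruskal tau for predicting the column variable from the row variable
gk_tau <- function(tab) {
  tab <- as.matrix(tab)
  n <- sum(tab)
  rs <- rowSums(tab)
  keep <- rs > 0
  # Sum over rows of (sum of squared cell counts) / (row total)
  within <- sum(tab[keep, , drop = FALSE]^2 / rs[keep])
  marginal <- sum(colSums(tab)^2) / n
  (within - marginal) / (n - marginal)
}

x <- matrix(c(20, 0, 20, 0, 20, 0, 5, 0, 5), 3)
sqrt(gk_tau(x))     # column variable as a function of the row variable
sqrt(gk_tau(t(x)))  # reverse direction gives a different value
```

Because tau is directional, the two calls return different values, in line with the asymmetry of the function index.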
A list with class "htest" containing the following components:
statistic |
the functional chi-squared statistic if |
parameter |
degrees of freedom for the functional chi-squared statistic. |
p.value |
p-value of the functional test. If |
estimate |
an estimate of function index between 0 and 1. The value of 1 indicates a strictly mathematical function. It is asymmetrical with respect to transpose of the input contingency table, different from the symmetrical Cramer's V based on the Pearson's chi-squared test statistic. See (Zhong and Song 2019b; Kumar et al. 2018) for the definition of function index. |
Yang Zhang, Hua Zhong, Hien Nguyen, Sajal Kumar, and Joe Song
Box GE, Hunter JS, Hunter WG (2005).
Statistics for Experimenters: Design, Innovation and Discovery, 2nd edition.
Wiley-Interscience, New York.
Goodman LA, Kruskal WH (1954).
“Measures of Association for Cross Classifications.”
Journal of the American Statistical Association, 49(268), 732–764.
Kumar S, Song M (2022).
“Overcoming biases in causal inference of molecular interactions.”
Bioinformatics, 38(10), 2818–2825.
doi:10.1093/bioinformatics/btac206.
Kumar S, Zhong H, Sharma R, Li Y, Song M (2018).
“Scrutinizing functional interaction networks from RNA-binding proteins to their targets in cancer.”
In IEEE International Conference on Bioinformatics and Biomedicine, 185–190.
doi:10.1109/BIBM.2018.8621502.
Nguyen HH (2018).
Inference of Functional Dependency via Asymmetric, Optimal, and Model-free Statistics.
Ph.D. thesis, Department of Computer Science, New Mexico State University, Las Cruces, NM, USA.
Nguyen HH, Zhong H, Song M (2020).
“Optimality, accuracy, and efficiency of an exact functional test.”
In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, 2683–2689.
doi:10.24963/ijcai.2020/372.
Zhang Y (2014).
Nonparametric Statistical Methods for Biological Network Inference.
Ph.D. thesis, Department of Computer Science, New Mexico State University, Las Cruces, NM, USA.
Zhang Y, Song M (2013).
“Deciphering interactions in causal networks without parametric assumptions.”
arXiv Molecular Networks, arXiv:1311.2707.
https://arxiv.org/abs/1311.2707.
Zhong H (2019).
Model-free Gene-to-zone Network Inference of Molecular Mechanisms in Biology.
Ph.D. thesis, Department of Computer Science, New Mexico State University, Las Cruces, NM, USA.
Zhong H, Song M (2019a).
“A fast exact functional test for directional association and cancer biology applications.”
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 16(3), 818–826.
doi:10.1109/TCBB.2018.2809743.
Zhong H, Song M (2019b).
“Directional association test reveals high-quality putative cancer driver biomarkers including noncoding RNAs.”
BMC Med Genomics, 12(7), 129.
doi:10.1186/s12920-019-0565-9.
For data discretization, an option is optimal univariate clustering via package Ckmeans.1d.dp. A second option is joint multivariate discretization via package GridOnClusters.
For symmetrical dependency tests on discrete data, see Pearson's chi-squared test chisq.test, Fisher's exact test fisher.test, and mutual information methods in package entropy.
# Example 1. Asymptotic functional chi-squared test
x <- matrix(c(20,0,20,0,20,0,5,0,5), 3)
fun.chisq.test(x) # strong functional dependency
fun.chisq.test(t(x)) # weak functional dependency

# Example 2. Normalized functional chi-squared test
x <- matrix(c(8,0,8,0,8,0,2,0,2), 3)
fun.chisq.test(x, method="nfchisq") # strong functional dependency
fun.chisq.test(t(x), method="nfchisq") # weak functional dependency

# Example 3. Exact functional chi-squared test
x <- matrix(c(4,0,4,0,4,0,1,0,1), 3)
fun.chisq.test(x, method="exact") # strong functional dependency
fun.chisq.test(t(x), method="exact") # weak functional dependency

# Example 4. Exact functional chi-squared test on a real data set
# (Shen et al., 2002)
# x is a contingency table with row variable for p53 mutation and
# column variable for CIMP
x <- matrix(c(12,26,18,0,8,12), nrow=2, ncol=3, byrow=TRUE)
# Test the functional dependency: p53 mutation -> CIMP
fun.chisq.test(x, method="exact")
# Test the functional dependency CIMP -> p53 mutation
fun.chisq.test(t(x), method="exact")

# Example 5. Adapted functional chi-squared test
x <- matrix(c(20, 0, 1, 0, 1, 20, 3, 2, 15, 2, 5, 2), 3, 4, byrow=TRUE)
fun.chisq.test(x, method="adapted") # strong functional dependency
fun.chisq.test(t(x), method="adapted") # weak functional dependency

# Example 6. Asymptotic functional chi-squared test with simulated distribution
x <- matrix(c(20,0,20,0,20,0,5,0,5), 3)
fun.chisq.test(x, method="simulate.p.value")
fun.chisq.test(x, method="simulate.p.value", simulate.nruns = 1000)
These functions are provided for compatibility with older versions of package FunChisq only, and may be removed eventually.
The following functions are deprecated and will be made defunct; use the replacement indicated below:
cp.chisq.test: now available as cp.chisq.test in package DiffXTables
A table is visualized as a matrix whose cells are shown with intensity of a given color proportional to the value in each cell. The value in a cell must be real; negative numbers and non-integers are acceptable. The plot gives a global view of the underlying pattern.
plot_table(table, xlab = "Column", ylab = "Row", col = "green3", xaxt = "n", yaxt = "n", main = NULL, show.value = TRUE, value.cex = 2, highlight=c("row.maxima", "none"), highlight.col=col, mgp=c(0.5,0,0), mar=c(2,2,3,1.5), ...)
table |
A data frame or a matrix. |
xlab |
The label of the horizontal axis. |
ylab |
The label of the vertical axis. |
col |
The color corresponding to the maximum value in the table. |
xaxt |
The style of the horizontal axis. See |
yaxt |
The style of the vertical axis. See |
main |
The title of the plot. |
show.value |
logical. Show the value of each cell in the table on the plot. |
value.cex |
Relative magnification factor if values are to be put in the cell. |
... |
Parameters acceptable to |
highlight |
Specify to highlight row maxima or no highlight. When highlighted, a box is placed around each row maximum. |
highlight.col |
The color used to highlight a cell in the table. |
mgp |
The margin (in mex units) for the axis title, labels and line. See |
mar |
The margins of the four sides of the plot. See |
Joe Song
opar <- par(mfrow=c(2,2))
plot_table(matrix(1:6, nrow=2), col="seagreen2")
plot_table(matrix(rnorm(20), nrow=5), col="orange", show.value=FALSE)
plot_table(matrix(rpois(16, 2), nrow=4), col="cornflowerblue", highlight="none")
plot_table(matrix(rbinom(15, 8, 0.5), nrow=3), col="sienna2", highlight="none")
par(opar)
Generate random contingency tables representing various functional, non-functional, dependent, or independent patterns, without specifying a parametric model for the patterns.
simulate_tables( n = 100, nrow = 3, ncol = 3, type = c("functional", "many.to.one", "discontinuous", "independent", "dependent.non.functional"), n.tables = 1, row.marginal = NULL, col.marginal = NULL, noise = 0.0, noise.model = c("house", "candle"), margin = 0 )
n |
a positive integer specifying the sample size to be distributed in each table. For |
nrow |
a positive integer specifying the number of rows in each table. The value must be no less than 2. For |
ncol |
a positive integer specifying the number of columns in output table. |
type |
a character string to specify the type of pattern underlying the table. The options are |
n.tables |
a positive integer value specifying the number of tables to be generated. |
row.marginal |
a non-negative numeric vector of length |
col.marginal |
a non-negative numeric vector of length |
noise |
a numeric value between 0 and 1 specifying the noise level to be added to a table using function |
noise.model |
a character string indicating the noise model of either |
margin |
a numeric value of either 0, 1 or 2. Default is 0.
0: noise is applied along both rows and columns.
1: noise is applied along each row.
2: noise is applied along each column.
See |
This function generates five types of tables representing different interaction patterns between the row and column discrete random variables X and Y. Three of the five types are non-constant functional patterns (Y is a non-constant function of X):
type="functional": Y is a function of X, but X may or may not be a function of Y.
type="many.to.one": Y is a many-to-one function of X, but X is not a function of Y.
type="discontinuous": Y is a function of X, where the function value at each value of X must differ from the values at its neighbors. X may or may not be a function of Y. A discontinuous function forms a contrast with those that are close to constant functions.
The fourth type, "dependent.non.functional", represents non-functional patterns where X and Y are statistically dependent but neither is a function of the other. The samples are distributed according to the row.marginal probabilities.
The fifth type, "independent", represents patterns where X and Y are statistically independent, that is, their joint probability mass function is the product of their marginal probability mass functions.
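As a minimal sketch of the independence pattern in base R (not the package's internal sampler), the joint probability table can be formed as the outer product of two marginals, and one table of n samples can be drawn from it. The names `row.p`, `col.p`, `joint`, and `tbl`, and the marginal values, are illustrative assumptions.

```r
# Illustrative marginals (assumed values); each sums to 1.
row.p <- c(0.4, 0.3, 0.1, 0.2)        # P(X = i), one entry per row
col.p <- c(0.1, 0.2, 0.4, 0.2, 0.1)   # P(Y = j), one entry per column

# Under independence, P(X = i, Y = j) = P(X = i) * P(Y = j).
joint <- outer(row.p, col.p)

# Draw one 4 x 5 table holding n = 100 samples from the joint distribution.
# as.vector() and matrix() both use column-major order, so cells line up.
set.seed(1)
n <- 100
tbl <- matrix(rmultinom(1, size = n, prob = as.vector(joint)),
              nrow = length(row.p), ncol = length(col.p))
```

The row and column sums of `joint` recover `row.p` and `col.p`, which is what distinguishes this pattern from the dependent types.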
For all functional tables (type="functional", type="many.to.one", type="discontinuous"), the samples are distributed using either the given row or column marginal probabilities. Theoretically, it is not always possible to enforce both marginals in a functional pattern. If both marginals are provided, one is randomly selected to generate each table, so each requested marginal is used about half of the time. If neither is provided, a uniform row or column marginal is randomly selected: about half of the tables will have a uniform row marginal and the other half a uniform column marginal.
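The following base R sketch illustrates why both marginals cannot always be enforced: once a functional pattern and a row marginal are fixed, the column marginal is fully determined. The mapping `f` and the marginal values are hypothetical and are not taken from the package.

```r
# A hypothetical functional pattern on a 4 x 3 table: f[i] is the single
# column (value of Y) to which row i (value of X) is mapped.
f <- c(1, 3, 3, 2)
row.p <- c(0.3, 0.2, 0.3, 0.2)   # requested row marginal (assumed values)

# P(Y = j) is the total probability of the rows mapped to column j,
# so the column marginal follows from f and row.p alone.
col.p <- as.vector(tapply(row.p, factor(f, levels = 1:3), sum))
```

Here `col.p` is forced to be (0.3, 0.2, 0.5), so a requested column marginal with any other values could not hold for this pattern and row marginal.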
Random noise can be optionally applied to the tables using either the house or the candle noise model. See add.noise for details.
Sharma et al. (2017) provide full mathematical and statistical details of the simulation strategies for the above table types except the "discontinuous" type, which was introduced after the publication.
A list containing the following components:
pattern.list |
a list of tables containing binary patterns in 0's and 1's. Each table is created by setting all non-zero entries in the corresponding sampled contingency table from |
sample.list |
a list of tables satisfying both the mathematical and statistical requirements. These tables are noise free. |
noise.list |
a list of tables after applying noise to the corresponding tables in |
pvalue.list |
a list of p-values reporting the statistical significance of the generated tables for the required type. When the pattern type specifies a functional relationship, the p-values are computed by the functional chi-squared test (Zhang and Song 2013); otherwise, Pearson's chi-squared test of independence is used. |
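The relationship between a sampled table and its binary pattern can be sketched in base R. This assumes that non-zero entries are set to 1, as the description of binary patterns in 0's and 1's suggests; the table `sampled` is a hypothetical example, not package output.

```r
# A hypothetical noise-free sampled table (rows: X, columns: Y).
sampled <- matrix(c(30,  0,  0,
                     0,  0, 20,
                     0,  0, 30,
                     0, 20,  0), nrow = 4, byrow = TRUE)

# Binary pattern: 1 where the sampled count is non-zero, 0 elsewhere.
pattern <- (sampled != 0) * 1

# When Y is a function of X, each row has at most one non-zero cell.
is.functional <- all(rowSums(pattern) <= 1)
```

A row-wise check of this kind is a quick sanity test for a noise-free functional table; it is not expected to hold once noise has been applied.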
Ruby Sharma, Sajal Kumar, Hua Zhong, and Joe Song
Sharma R, Kumar S, Zhong H, Song M (2017). “Simulating noisy, nonparametric, and multivariate discrete patterns.” The R Journal, 9(2), 366–377. doi:10.32614/RJ-2017-053.
Zhang Y, Liu ZL, Song M (2015). “ChiNet uncovers rewired transcription subnetworks in tolerant yeast for advanced biofuels conversion.” Nucleic Acids Research, 43(9), 4393–4407. doi:10.1093/nar/gkv358.
Zhang Y, Song M (2013). “Deciphering interactions in causal networks without parametric assumptions.” arXiv Molecular Networks, arXiv:1311.2707. https://arxiv.org/abs/1311.2707.
See add.noise for details of the noise model.
# In all examples, x is the row variable and y is the column
# variable of a table.

# Example 1. Simulating a noisy function where y=f(x),
# x may or may not be g(y) with given row.marginal.
tbls <- simulate_tables(n=100, nrow=4, ncol=5, type="functional",
                        noise=0.2, n.tables = 1,
                        row.marginal = c(0.3,0.2,0.3,0.2))
par(mfrow=c(2,2))
plot_table(tbls$pattern.list[[1]], main="Ex 1. Functional pattern")
plot_table(tbls$sample.list[[1]], main="Ex 1. Sampled pattern (noise free)")
plot_table(tbls$noise.list[[1]], main="Ex 1. Sampled pattern with 0.2 noise")
plot.new()

# Example 2. Simulating a noisy functional pattern where
# y=f(x), x may or may not be g(y) with given row.marginal.
tbls <- simulate_tables(n=100, nrow=4, ncol=5, type="functional",
                        noise=0.5, n.tables = 1,
                        row.marginal = c(0.3,0.2,0.3,0.2))
par(mfrow=c(2,2))
plot_table(tbls$pattern.list[[1]], main="Ex 2. Functional pattern",
           col="seagreen2")
plot_table(tbls$sample.list[[1]], main="Ex 2. Sampled pattern (noise free)",
           col="seagreen2")
plot_table(tbls$noise.list[[1]], main="Ex 2. Sampled pattern with 0.5 noise",
           col="seagreen2")
plot.new()

# Example 3. Simulating a noisy many.to.one function where
# y=f(x), x!=f(y) with given row.marginal.
tbls <- simulate_tables(n=100, nrow=4, ncol=5, type="many.to.one",
                        noise=0.2, n.tables = 1,
                        row.marginal = c(0.4,0.3,0.1,0.2))
par(mfrow=c(2,2))
plot_table(tbls$pattern.list[[1]], main="Ex 3. Many-to-one pattern",
           col="limegreen")
plot_table(tbls$sample.list[[1]], main="Ex 3. Sampled pattern (noise free)",
           col="limegreen")
plot_table(tbls$noise.list[[1]], main="Ex 3. Sampled pattern with 0.2 noise",
           col="limegreen")
plot.new()

# Example 4. Simulating noisy discontinuous
# pattern where y=f(x), x may or may not be g(y) with given row.marginal.
tbls <- simulate_tables(n=100, nrow=4, ncol=5, type="discontinuous",
                        noise=0.2, n.tables = 1,
                        row.marginal = c(0.2,0.4,0.2,0.2))
par(mfrow=c(2,2))
plot_table(tbls$pattern.list[[1]], main="Ex 4. Discontinuous pattern",
           col="springgreen3")
plot_table(tbls$sample.list[[1]], main="Ex 4. Sampled pattern (noise free)",
           col="springgreen3")
plot_table(tbls$noise.list[[1]], main="Ex 4. Sampled pattern with 0.2 noise",
           col="springgreen3")
plot.new()

# Example 5. Simulating noisy dependent.non.functional
# pattern where y!=f(x) and x and y are statistically
# dependent.
tbls <- simulate_tables(n=100, nrow=4, ncol=5,
                        type="dependent.non.functional",
                        noise=0.3, n.tables = 1,
                        row.marginal = c(0.2,0.4,0.2,0.2))
par(mfrow=c(2,2))
plot_table(tbls$pattern.list[[1]],
           main="Ex 5. Dependent.non.functional pattern",
           col="sienna2", highlight="none")
plot_table(tbls$sample.list[[1]], main="Ex 5. Sampled pattern (noise free)",
           col="sienna2", highlight="none")
plot_table(tbls$noise.list[[1]], main="Ex 5. Sampled pattern with 0.3 noise",
           col="sienna2", highlight="none")
plot.new()

# Example 6. Simulating a pattern where x and y are
# statistically independent.
tbls <- simulate_tables(n=100, nrow=4, ncol=5, type="independent",
                        noise=0.3, n.tables = 1,
                        row.marginal = c(0.4,0.3,0.1,0.2),
                        col.marginal = c(0.1,0.2,0.4,0.2,0.1))
par(mfrow=c(2,2))
plot_table(tbls$pattern.list[[1]], main="Ex 6. Independent pattern",
           col="cornflowerblue", highlight="none")
plot_table(tbls$sample.list[[1]], main="Ex 6. Sampled pattern (noise free)",
           col="cornflowerblue", highlight="none")
plot_table(tbls$noise.list[[1]], main="Ex 6. Sampled pattern with 0.3 noise",
           col="cornflowerblue", highlight="none")
plot.new()

# Example 7. Simulating a noisy function where y=f(x),
# x may or may not be g(y), with given column marginal.
tbls <- simulate_tables(n=100, nrow=4, ncol=5, type="functional",
                        noise=0.2, n.tables = 1,
                        col.marginal = c(0.2,0.1,0.4,0.2,0.1))
par(mfrow=c(2,2))
plot_table(tbls$pattern.list[[1]], main="Ex 7. Functional pattern")
plot_table(tbls$sample.list[[1]], main="Ex 7. Sampled pattern (noise free)")
plot_table(tbls$noise.list[[1]], main="Ex 7. Sampled pattern with 0.2 noise")
plot.new()

# Example 8. Simulating a noisy many.to.one function where
# y=f(x), x!=f(y) with given column marginal.
tbls <- simulate_tables(n=100, nrow=4, ncol=4, type="many.to.one",
                        noise=0.2, n.tables = 1,
                        col.marginal = c(0.4,0.3,0.1,0.2))
par(mfrow=c(2,2))
plot_table(tbls$pattern.list[[1]], main="Ex 8. Many-to-one pattern",
           col="limegreen")
plot_table(tbls$sample.list[[1]], main="Ex 8. Sampled pattern (noise free)",
           col="limegreen")
plot_table(tbls$noise.list[[1]], main="Ex 8. Sampled pattern with 0.2 noise",
           col="limegreen")
plot.new()

# Example 9. Simulating noisy discontinuous
# pattern where y=f(x), x may or may not be g(y) with given column marginal.
tbls <- simulate_tables(n=100, nrow=4, ncol=4, type="discontinuous",
                        noise=0.2, n.tables = 1,
                        col.marginal = c(0.1,0.4,0.2,0.3))
par(mfrow=c(2,2))
plot_table(tbls$pattern.list[[1]], main="Ex 9. Discontinuous pattern",
           col="springgreen3")
plot_table(tbls$sample.list[[1]], main="Ex 9. Sampled pattern (noise free)",
           col="springgreen3")
plot_table(tbls$noise.list[[1]], main="Ex 9. Sampled pattern with 0.2 noise",
           col="springgreen3")
plot.new()
Apply functional chi-squared tests on many-to-one combinatorial relationships for functional dependency using multivariate discrete data.
test.interactions(
  x, list.ind.vars, dep.vars,
  var.names = rownames(x),
  index.kind = c("conditional", "unconditional")
)
x |
A numeric matrix or data frame of discrete values. Rows represent variables and columns represent samples. Thus, each row index is a variable index, used by |
list.ind.vars |
A list of numeric or integer vectors, each vector representing independent variable indices in one interaction. Each vector (parents) forms a pair with a dependent variable (child) of the same position in |
dep.vars |
A numeric vector representing indices of dependent variables (children) in multiple interactions. |
var.names |
Optional. A character vector specifying names of all variables (rows). If not provided, the default is the row names of |
index.kind |
A character string to specify the kind of function index to return, identical to the same argument in |
test.interactions tests functional dependencies in multiple directional interactions. Each interaction, either one-to-one or many-to-one, is a parents-child pair representing a relationship from independent variables (parents) to a dependent variable (child). The parents-child pairs are specified by two input arguments: list.ind.vars (a list of parents for each interaction) and dep.vars (a vector of children, one for each interaction).
The function automatically creates contingency tables for the interactions of interest, making it convenient to use on multivariate data sets. Because the function is implemented in C++ and can test multiple many-to-one interactions in one call, it is much faster than calling the R function fun.chisq.test multiple times.
test.interactions implements only the method="fchisq" option in fun.chisq.test.
When a contingency table is created for each interaction, all combinations of unique values of the independent variables (parents) form the rows and the unique values of the dependent variable (child) form the columns. The table entries are the counts of the corresponding combinations of parent and child values. Rows or columns with all zero counts are removed from the contingency table before the functional chi-squared test is applied.
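The table construction described above can be sketched in base R. This is an illustration only, not the package's C++ implementation; the helper names (`p`, `all.combos`, `keys`, `full`, `tbl`) and the small data matrix are assumptions made for the example.

```r
# Same layout as the argument x: rows are variables, columns are samples.
x <- matrix(c(0, 0, 1, 0, 1,
              1, 0, 2, 1, 0,
              2, 2, 0, 0, 0), nrow = 3, byrow = TRUE)
parents <- c(1, 2)   # indices of independent variables
child   <- 3         # index of the dependent variable

p <- x[parents, , drop = FALSE]

# All combinations of the unique values of each parent form the rows.
levels.list <- lapply(seq_len(nrow(p)), function(i) sort(unique(p[i, ])))
all.combos  <- apply(expand.grid(levels.list), 1, paste, collapse = ",")

# Label each sample by its combination of parent values, then count.
keys <- factor(apply(p, 2, paste, collapse = ","), levels = all.combos)
full <- table(parents = keys, child = factor(x[child, ]))

# Remove rows and columns with all zero counts before testing.
tbl <- full[rowSums(full) > 0, colSums(full) > 0, drop = FALSE]
```

In this example `full` has six rows (two values of variable 1 times three values of variable 2), two of which are never observed and are dropped in `tbl`.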
A data frame with five columns. Each row represents the test result for one directional interaction. The 1st column contains the indices or names (if var.names is not NULL) of the independent variables (parents); the 2nd column contains the indices or names of the dependent variable (child); the 3rd column, named p.value, contains the p-values; the 4th column, named statistic, contains the chi-squared values; and the 5th column, named estimate, contains the function index for each interaction.
Hua Zhong and Joe Song
This function calls the functional chi-squared test implemented in C++ and is thus much faster than the R version fun.chisq.test.
For data discretization by optimal univariate k-means clustering, see Ckmeans.1d.dp.
x <- matrix(
  c(0,0,1,0,1,
    1,0,2,1,0,
    2,2,0,0,0,
    1,2,1,1,2,
    1,0,2,1,2),
  nrow = 5, ncol = 5, byrow = TRUE)

list.ind.vars <- list(
  c(1), c(1), c(1),
  c(2), c(2), c(2),
  c(1,2), c(2,3),
  c(3,4), c(4,5))

dep.vars <- c(
  3, 4, 5,
  3, 4, 5,
  3, 4,
  5, 1)

# list.ind.vars and dep.vars together specify
# the following ten interactions:
#   1 -> 3
#   1 -> 4
#   1 -> 5
#   2 -> 3
#   2 -> 4
#   2 -> 5
#   1,2 -> 3
#   2,3 -> 4
#   3,4 -> 5
#   4,5 -> 1

var.names <- paste0("var", 1:5)

test.interactions(
  x = x, list.ind.vars = list.ind.vars,
  dep.vars = dep.vars, var.names = var.names,
  index.kind = "unconditional")