Title: | Kernel PC Algorithm for Causal Structure Detection |
---|---|
Description: | Kernel PC (kPC) algorithm for causal structure learning and causal inference using graphical models. kPC is a version of PC algorithm that uses kernel based independence criteria in order to be able to deal with non-linear relationships and non-Gaussian noise. |
Authors: | Petras Verbyla, Nina Ines Bertille Desgranges, Lorenz Wernisch |
Maintainer: | Petras Verbyla <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0.1 |
Built: | 2024-12-15 07:48:09 UTC |
Source: | CRAN |
Test to check the independence between two variables x and y using the Distance Covariance. The dcov.gamma() function, uses Distance Covariance independence criterion with gamma approximation to test for independence between two random variables.
dcov.gamma(x, y, index = 1, numCol = 100)
dcov.gamma(x, y, index = 1, numCol = 100)
x |
data of first sample |
y |
data of second sample |
index |
exponent on Euclidean distance, in (0,2] |
numCol |
Number of columns used in incomplete Singular Value Decomposition |
Let x and y be two samples of length n. Gram matrices K and L are defined as: and
, where 0<s<2.
. Let A=HKH and B=HLH, then
. For more detail: dcov.test in package energy. Gamma test compares
with the
quantile of the gamma distribution with mean and variance same as
under independence hypothesis.
dcov.gamma() returns a list with class htest containing
method |
description of test |
statistic |
observed value of the test statistic |
estimate |
nV^2(x,y) |
estimates |
a vector: [nV^2(x,y), mean of nV^2(x,y), variance of nV^2(x,y)] |
replicates |
replicates of the test statistic |
p.value |
approximate p-value of the test |
data.name |
desciption of data |
Petras Verbyla ([email protected]) and Nina Ines Bertille Desgranges
A. Gretton et al. (2005). Kernel Methods for Measuring Independence. JMLR 6 (2005) 2075-2129.
G. Szekely, M. Rizzo and N. Bakirov (2007). Measuring and Testing Dependence by Correlation of Distances. The Annals of Statistics 2007, Vol. 35, No. 6, 2769-2794.
hsic.perm, hsic.clust, hsic.gamma, dcov.test, kernelCItest
library(energy) set.seed(10) #independence x <- runif(300) y <- runif(300) hsic.gamma(x,y) hsic.perm(x,y) dcov.gamma(x,y) dcov.test(x,y) #uncorelated but not dependent z <- 10*(runif(300)-0.5) w <- z^2 + 10*runif(300) cor(z,w) hsic.gamma(z,w) hsic.perm(z,w) dcov.gamma(z,w) dcov.test(z,w)
library(energy) set.seed(10) #independence x <- runif(300) y <- runif(300) hsic.gamma(x,y) hsic.perm(x,y) dcov.gamma(x,y) dcov.test(x,y) #uncorelated but not dependent z <- 10*(runif(300)-0.5) w <- z^2 + 10*runif(300) cor(z,w) hsic.gamma(z,w) hsic.perm(z,w) dcov.gamma(z,w) dcov.test(z,w)
Creates a formula for gam to be used in regrXonS. For data , variable to be regressed
, i=1...n and variables to regress on
creates formula
.
frml.additive.smooth(target.ind, pred.inds, var.str = "x")
frml.additive.smooth(target.ind, pred.inds, var.str = "x")
target.ind |
integer, number for the variable to be regressed |
pred.inds |
integer(s), number(s) for the variable(s) on which we regress |
var.str |
name of variables used to create formula, default is "x" |
formula.additive.smooth() returns a formula
Petras Verbyla ([email protected])
Creates a formula for gam to be used in regrXonS. For data , variable to be regressed
, i=1...n and variables to regress on
creates formula
.
frml.full.smooth(target.ind, pred.inds, var.str = "x")
frml.full.smooth(target.ind, pred.inds, var.str = "x")
target.ind |
integer, number for the variable to be regressed |
pred.inds |
integer(s), number(s) for the variable(s) on which we regress |
var.str |
name of variables used to create formula, default is "x" |
formula.full.smooth() returns a formula
Petras Verbyla ([email protected])
Conditional independence test using HSIC and permutation with clusters.
hsic.clust(x, y, z, sig = 1, p = 100, numCluster = 10, numCol = 50, eps = 0.1, paral = 1)
hsic.clust(x, y, z, sig = 1, p = 100, numCluster = 10, numCol = 50, eps = 0.1, paral = 1)
x |
first variable |
y |
second variable |
z |
set of variables on which we condition |
sig |
the with of the Gaussian kernel |
p |
the number of permutations |
numCluster |
number of clusters for clustering z |
numCol |
maximum number of columns that we use for the incomplete Cholesky decomposition |
eps |
normalization parameter for HSIC cluster test |
paral |
number of cores used |
Let x and y be two samples of length n. Gram matrices K and L are defined as: ,
and
.
. Let
,
and
.
. Permutation test clusters Z and then permutes Y in the clusters of Z p times to get
and calculates
.
.
hsic.clust() returns a list with class htest containing
method |
description of test |
statistic |
observed value of the test statistic |
estimate |
HSIC(x,y) |
estimates |
a vector: [HSIC(x,y), mean of HSIC(x,y), variance of HSIC(x,y)] |
replicates |
replicates of the test statistic |
p.value |
approximate p-value of the test |
data.name |
desciption of data |
Petras Verbyla ([email protected]) and Nina Ines Bertille Desgranges
Tillman, R. E., Gretton, A. and Spirtes, P. (2009). Nonlinear directed acyclic structure learning with weakly additive noise model. NIPS 22, Vancouver.
K. Fukumizu et al. (2007). Kernel Measures of Conditional Dependence. NIPS 20. https://papers.nips.cc/paper/3340-kernel-measures-of-conditional-dependence.pdf
hsic.gamma, hsic.perm, kernelCItest
library(energy) set.seed(10) # x and y dependent, but independent conditionally on z z <- 10*runif(300) x <- sin(z) + runif(300) y <- cos(z) + runif(300) plot(x,y) hsic.gamma(x,y) hsic.perm(x,y) dcov.test(x,y) hsic.clust(x,y,z)
library(energy) set.seed(10) # x and y dependent, but independent conditionally on z z <- 10*runif(300) x <- sin(z) + runif(300) y <- cos(z) + runif(300) plot(x,y) hsic.gamma(x,y) hsic.perm(x,y) dcov.test(x,y) hsic.clust(x,y,z)
Test to check the independence between two variables x and y using HSIC. The hsic.gamma() function, uses Hilbert-Schmidt independence criterion to test for independence between random variables.
hsic.gamma(x, y, sig = 1, numCol = 100)
hsic.gamma(x, y, sig = 1, numCol = 100)
x |
data of first sample |
y |
data of second sample |
sig |
Gaussian kernel width for HSIC tests. Default is 1 |
numCol |
maximum number of columns that we use for the incomplete Cholesky decomposition |
Let x and y be two samples of length n. Gram matrices K and L are defined as: and
.
. Let
and
, then
. Gamma test compares HSIC(x,y) with the
quantile of the gamma distribution with mean and variance such as HSIC under independence hypothesis.
hsic.gamma() returns a list with class htest containing
method |
description of test |
statistic |
observed value of the test statistic |
estimate |
HSIC(x,y) |
estimates |
a vector: [HSIC(x,y), mean of HSIC(x,y), variance of HSIC(x,y)] |
replicates |
replicates of the test statistic |
p.value |
approximate p-value of the test |
data.name |
desciption of data |
Petras Verbyla ([email protected]) and Nina Ines Bertille Desgranges
A. Gretton et al. (2005). Kernel Methods for Measuring Independence. JMLR 6 (2005) 2075-2129.
hsic.perm, hsic.clust, kernelCItest
library(energy) set.seed(10) #independence x <- runif(300) y <- runif(300) hsic.gamma(x,y) hsic.perm(x,y) dcov.gamma(x,y) dcov.test(x,y) #uncorelated but not dependent z <- 10*(runif(300)-0.5) w <- z^2 + 10*runif(300) cor(z,w) hsic.gamma(z,w) hsic.perm(z,w) dcov.gamma(z,w) dcov.test(z,w)
library(energy) set.seed(10) #independence x <- runif(300) y <- runif(300) hsic.gamma(x,y) hsic.perm(x,y) dcov.gamma(x,y) dcov.test(x,y) #uncorelated but not dependent z <- 10*(runif(300)-0.5) w <- z^2 + 10*runif(300) cor(z,w) hsic.gamma(z,w) hsic.perm(z,w) dcov.gamma(z,w) dcov.test(z,w)
Test to check the independence between two variables x and y using HSIC. The hsic.perm() function, uses Hilbert-Schmidt independence criterion to test for independence between random variables.
hsic.perm(x, y, sig = 1, p = 100, numCol = 50)
hsic.perm(x, y, sig = 1, p = 100, numCol = 50)
x |
data of first sample |
y |
data of second sample |
sig |
Gaussian kernel width for HSIC tests. Default is 1 |
p |
Number of permutations. Default is 100 |
numCol |
maximum number of columns that we use for the incomplete Cholesky decomposition |
Let x and y be two samples of length n. Gram matrices K and L are defined as: and
.
. Let
and
, then
. Permutation test permutes y p times to get
and calculates HSIC(x,y_(p)).
.
hsic.perm() returns a list with class htest containing
method |
description of test |
statistic |
observed value of the test statistic |
estimate |
HSIC(x,y) |
estimates |
a vector: [HSIC(x,y), mean of HSIC(x,y), variance of HSIC(x,y)] |
replicates |
replicates of the test statistic |
p.value |
approximate p-value of the test |
data.name |
desciption of data |
Petras Verbyla ([email protected]) and Nina Ines Bertille Desgranges
A. Gretton et al. (2005). Kernel Methods for Measuring Independence. JMLR 6 (2005) 2075-2129.
hsic.gamma, hsic.clust, kernelCItest
library(energy) set.seed(10) #independence x <- runif(300) y <- runif(300) hsic.gamma(x,y) hsic.perm(x,y) dcov.gamma(x,y) dcov.test(x,y) #uncorelated but not dependent z <- 10*(runif(300)-0.5) w <- z^2 + 10*runif(300) cor(z,w) hsic.gamma(z,w) hsic.perm(z,w) dcov.gamma(z,w) dcov.test(z,w)
library(energy) set.seed(10) #independence x <- runif(300) y <- runif(300) hsic.gamma(x,y) hsic.perm(x,y) dcov.gamma(x,y) dcov.test(x,y) #uncorelated but not dependent z <- 10*(runif(300)-0.5) w <- z^2 + 10*runif(300) cor(z,w) hsic.gamma(z,w) hsic.perm(z,w) dcov.gamma(z,w) dcov.test(z,w)
Test to check the independence between two variables x and y using HSIC. The hsic.test() function, uses Hilbert-Schmidt independence criterion to test for independence between two random variables.
hsic.test(x, y, p = 0, hsic.method = c("gamma", "perm"), sig = 1, numCol = floor(length(x)/10))
hsic.test(x, y, p = 0, hsic.method = c("gamma", "perm"), sig = 1, numCol = floor(length(x)/10))
x |
data of first sample |
y |
data of second sample |
p |
number of replicates, if 0 |
hsic.method |
method for HSIC test, either gamma test hsic.gamma or permutation test hsic.perm |
sig |
Gaussian kernel width for HSIC. Default is 1 |
numCol |
number of columns in the Incomplete Cholesky Decomposition of Gram matrices. Default is floor(length(x)/10) |
Let x and y be two samples of length n. Gram matrices K and L are defined as: and
.
. Let
and
, then
.
hsic.gamma() returns a list with class htest containing
method |
description of test |
statistic |
observed value of the test statistic |
estimate |
HSIC(x,y) |
estimates |
a vector: [HSIC(x,y), mean of HSIC(x,y), variance of HSIC(x,y)] |
replicates |
replicates of the test statistic |
p.value |
approximate p-value of the test |
data.name |
desciption of data |
Petras Verbyla ([email protected]) and Nina Ines Bertille Desgranges
A. Gretton et al. (2005). Kernel Methods for Measuring Independence. JMLR 6 (2005) 2075-2129.
hsic.perm, hsic.clust, kernelCItest
library(energy) set.seed(10) #independence x <- runif(300) y <- runif(300) hsic.gamma(x,y) hsic.perm(x,y) dcov.gamma(x,y) dcov.test(x,y) #uncorelated but not dependent z <- 10*(runif(300)-0.5) w <- z^2 + 10*runif(300) cor(z,w) hsic.gamma(z,w) hsic.perm(z,w) dcov.gamma(z,w) dcov.test(z,w)
library(energy) set.seed(10) #independence x <- runif(300) y <- runif(300) hsic.gamma(x,y) hsic.perm(x,y) dcov.gamma(x,y) dcov.test(x,y) #uncorelated but not dependent z <- 10*(runif(300)-0.5) w <- z^2 + 10*runif(300) cor(z,w) hsic.gamma(z,w) hsic.perm(z,w) dcov.gamma(z,w) dcov.test(z,w)
Test to check the (conditional) dependence between two variables x and y given a set of variables S, using independence criteria. The kernelCItest() function, uses Distance Covariance or Hilbert-Schmidt Independence Criterion to test for the (conditional) independence between random variables, with an interface that can easily by used in skeleton
, pc
or kpc
.
kernelCItest(x, y, S = NULL, suffStat, verbose = FALSE, data, ic.method = NULL, p = NULL, index = NULL, sig = NULL, numCol = NULL, numCluster = NULL, eps = NULL, paral = NULL)
kernelCItest(x, y, S = NULL, suffStat, verbose = FALSE, data, ic.method = NULL, p = NULL, index = NULL, sig = NULL, numCol = NULL, numCluster = NULL, eps = NULL, paral = NULL)
x , y , S
|
It is tested, whether x and y are conditionally independent given the subset S of the remaining nodes. x, y, S all are integers, corresponding to variable or node numbers. |
suffStat |
a list of parameters consisting of data, ic.method, p, index, sig, numCol, numCluster, eps, paral |
verbose |
a logical parameter, if TRUE, detailed output is provided. |
data |
numeric matrix witch collumns representing variables and rows representing samples |
ic.method |
Method for the (conditional) independence test: Distance Covariance (permutation or gamma test), HSIC (permutation or gamma test) or HSIC cluster |
p |
Number of permutations for Distance Covariance, HSIC permutation and HSIC cluster tests. Default is Distance Covariance |
index |
Number in (0,2] the power of the distance in the Distance Covariance |
sig |
Gaussian kernel width for HSIC tests. Default is 1 |
numCol |
Number of columns used in the incomplete Cholesky decomposition. Default is 50 |
numCluster |
Number of clusters for kPC clust algorithm |
eps |
Normalization parameter for kPC clust. Default is 0.1 |
paral |
Number of cores to use for parallel calculations. |
kernelCItest() returns the p-value of the test.
Petras Verbyla ([email protected]) and Nina Ines Bertille Desgranges
G. Szekely, M. Rizzo and N. Bakirov (2007). Measuring and Testing Dependence by Correlation of Distances. The Annals of Statistics 2007, Vol. 35, No. 6, 2769-2794.
A. Gretton et al. (2005). Kernel Methods for Measuring Independence. JMLR 6 (2005) 2075-2129.
R. Tillman, A. Gretton and P. Spirtes (2009). Nonlinear directed acyclic structure learning with weakly additive noise model. NIPS 22, Vancouver.
set.seed(10) library(pcalg) z <- 10*runif(300) w <- 10*runif(300) x <- sin(z) + runif(300) y <- cos(z) + runif(300) data <- cbind(x,y,z,w) #conditionally independent test1a <- kernelCItest(x=1,y=2,S=c(3),suffStat = list(data=data,ic.method="dcc.gamma")) test2a <- kernelCItest(x=1,y=2,S=c(3),suffStat = list(data=data,ic.method="dcc.perm")) test3a <- kernelCItest(x=1,y=2,S=c(3),suffStat = list(data=data,ic.method="hsic.gamma")) test4a <- kernelCItest(x=1,y=2,S=c(3),suffStat = list(data=data,ic.method="hsic.perm")) test5a <- kernelCItest(x=1,y=2,S=c(3),suffStat = list(data=data,ic.method="hsic.clust")) test6a <- gaussCItest( x=1,y=2,S=c(3),suffStat = list(C=cor(data),n=4)) test1a test2a test3a test4a test5a test6a #dependent test1b <- kernelCItest(x=1,y=2,S=c(4),suffStat = list(data=data,ic.method="dcc.gamma")) test2b <- kernelCItest(x=1,y=2,S=c(4),suffStat = list(data=data,ic.method="dcc.perm")) test3b <- kernelCItest(x=1,y=2,S=c(4),suffStat = list(data=data,ic.method="hsic.gamma")) test4b <- kernelCItest(x=1,y=2,S=c(4),suffStat = list(data=data,ic.method="hsic.perm")) test5b <- kernelCItest(x=1,y=2,S=c(4),suffStat = list(data=data,ic.method="hsic.clust")) test6b <- gaussCItest( x=1,y=2,S=c(4),suffStat = list(C=cor(data),n=4)) test1b test2b test3b test4b test5b test6b
set.seed(10) library(pcalg) z <- 10*runif(300) w <- 10*runif(300) x <- sin(z) + runif(300) y <- cos(z) + runif(300) data <- cbind(x,y,z,w) #conditionally independent test1a <- kernelCItest(x=1,y=2,S=c(3),suffStat = list(data=data,ic.method="dcc.gamma")) test2a <- kernelCItest(x=1,y=2,S=c(3),suffStat = list(data=data,ic.method="dcc.perm")) test3a <- kernelCItest(x=1,y=2,S=c(3),suffStat = list(data=data,ic.method="hsic.gamma")) test4a <- kernelCItest(x=1,y=2,S=c(3),suffStat = list(data=data,ic.method="hsic.perm")) test5a <- kernelCItest(x=1,y=2,S=c(3),suffStat = list(data=data,ic.method="hsic.clust")) test6a <- gaussCItest( x=1,y=2,S=c(3),suffStat = list(C=cor(data),n=4)) test1a test2a test3a test4a test5a test6a #dependent test1b <- kernelCItest(x=1,y=2,S=c(4),suffStat = list(data=data,ic.method="dcc.gamma")) test2b <- kernelCItest(x=1,y=2,S=c(4),suffStat = list(data=data,ic.method="dcc.perm")) test3b <- kernelCItest(x=1,y=2,S=c(4),suffStat = list(data=data,ic.method="hsic.gamma")) test4b <- kernelCItest(x=1,y=2,S=c(4),suffStat = list(data=data,ic.method="hsic.perm")) test5b <- kernelCItest(x=1,y=2,S=c(4),suffStat = list(data=data,ic.method="hsic.clust")) test6b <- gaussCItest( x=1,y=2,S=c(4),suffStat = list(C=cor(data),n=4)) test1b test2b test3b test4b test5b test6b
Estimates the weakly additive noise partially directed acyclic graph (WAN-PDAG) from observational data, using the kPC algorithm. This is a version of pc
from pcalg package, that uses HSIC (hsic.gamma, hsic.perm or hsic.clust) or distance covariance (dcov.test
or dcov.gamma) independence tests and udag2wanpdag instead of udag2pdag
in the last step.
kpc(suffStat, indepTest, alpha, labels, p, fixedGaps = NULL, fixedEdges = NULL, NAdelete = TRUE, m.max = Inf, u2pd = c("relaxed", "rand", "retry"), skel.method = c("stable", "original", "stable.fast"), conservative = FALSE, maj.rule = FALSE, solve.confl = FALSE, verbose = FALSE)
kpc(suffStat, indepTest, alpha, labels, p, fixedGaps = NULL, fixedEdges = NULL, NAdelete = TRUE, m.max = Inf, u2pd = c("relaxed", "rand", "retry"), skel.method = c("stable", "original", "stable.fast"), conservative = FALSE, maj.rule = FALSE, solve.confl = FALSE, verbose = FALSE)
suffStat |
a list of sufficient statistics, containing all necessary elements for the conditional independence decisions in the function indepTest |
indepTest |
A function for testing conditional independence. It is internally called as indepTest(x,y,S,suffStat), and tests conditional independence of x and y given S. Here, x and y are variables, and S is a (possibly empty) vector of variables (all variables are denoted by their column numbers in the adjacency matrix). suffStat is a list, see the argument above. The return value of indepTest is the p-value of the test for conditional independence. Default is kernelCItest. |
alpha |
significance level (number in (0,1) for the individual conditional independence tests. |
labels |
(optional) character vector of variable (or "node") names. Typically preferred to specifying p. |
p |
(optional) number of variables (or nodes). May be specified if labels are not, in which case labels is set to 1:p. |
fixedGaps |
A logical matrix of dimension p*p. If entry [i,j] or [j,i] (or both) are TRUE, the edge i-j is removed before starting the algorithm. Therefore, this edge is guaranteed to be absent in the resulting graph. |
fixedEdges |
A logical matrix of dimension p*p. If entry [i,j] or [j,i] (or both) are TRUE, the edge i-j is never considered for removal. Therefore, this edge is guaranteed to be present in the resulting graph. |
NAdelete |
If indepTest returns NA and this option is TRUE, the corresponding edge is deleted. If this option is FALSE, the edge is not deleted. |
m.max |
Maximal size of the conditioning sets that are considered in the conditional independence tests. |
u2pd |
String specifying the method for dealing with conflicting information when trying to orient edges (see details below). |
skel.method |
Character string specifying method; the default, "stable" provides an order-independent skeleton, see skeleton. |
conservative |
Logical indicating if the conservative PC is used. In this case, only option u2pd = "relaxed" is supported. Note that therefore the resulting object might not be extendable to a DAG. See details for more information. |
maj.rule |
Logical indicating that the triples shall be checked for ambiguity using a majority rule idea, which is less strict than the conservative PC algorithm. For more information, see details. |
solve.confl |
If TRUE, the orientation of the v-structures and the orientation rules work with lists for candidate sets and allow bi-directed edges to resolve conflicting edge orientations. In this case, only option u2pd = relaxed is supported. Note, that therefore the resulting object might not be a CPDAG because bi-directed edges might be present. See details for more information. |
verbose |
If TRUE, detailed output is provided. |
For more information: pc
.
An object of class "pcAlgo" (see pcAlgo
) containing an estimate of the equivalence class of the underlying DAG.
Petras Verbyla ([email protected])
Tillman, R. E., Gretton, A. and Spirtes, P. (2009). Nonlinear directed acyclic structure learning with weakly additive noise model. NIPS 22, Vancouver.
## Not run: library(pcalg) set.seed(4) n <- 300 data <- NULL x1 <- 2*(runif(n)-0.5) x2 <- x1 + runif(n)-0.5 x3 <- x1^2 + 0.6*runif(n) x4 <- rnorm(n) x5 <- x3 + x4^2 + 2*runif(n) x6 <- 10*(runif(n)-0.5) x7 <- x6^2 + 5*runif(n) x8 <- 2*x7^2 + 1.5*rnorm(n) x9 <- x7 + 4*runif(n) data <- cbind(x1,x2,x3,x4,x5,x6,x7,x8,x9) true <- matrix(0,9,9) true[c(1),c(2,3)]<-true[c(3,4),5]<-true[c(6),c(7)]<-true[c(7),c(8)]<-true[7,9]<-1 pc <- pc(suffStat = list(C = cor(data), n = 9), indepTest = gaussCItest, alpha = 0.9, labels = colnames(data), u2pd = "relaxed", skel.method = "stable", verbose = TRUE) kpc1 <- kpc(suffStat = list(data=data, ic.method="dcc.perm"), indepTest = kernelCItest, alpha = 0.1, labels = colnames(data), u2pd = "relaxed", skel.method = "stable", verbose = TRUE) kpc2 <- kpc(suffStat = list(data=data, ic.method="hsic.gamma"), indepTest = kernelCItest, alpha = 0.1, labels = colnames(data), u2pd = "relaxed", skel.method = "stable", verbose = TRUE) kpc3 <- kpc(suffStat = list(data=data, ic.method="hsic.perm"), indepTest = kernelCItest, alpha = 0.1, labels = colnames(data), u2pd = "relaxed", skel.method = "stable", verbose = TRUE) kpc4 <- kpc(suffStat = list(data=data, ic.method="hsic.clust"), indepTest = kernelCItest, alpha = 0.1, labels = colnames(data), u2pd = "relaxed", skel.method = "stable", verbose = TRUE) if (require(Rgraphviz)) { par(mfrow=c(2,3)) plot(pc,main="pc") plot(kpc1,main="dpc.perm") plot(kpc2,main="kpc.gamma") plot(kpc3,main="kpc.perm") plot(kpc4,main="kpc.clust") plot(as(true,"graphNEL"),main="True DAG") } ## End(Not run)
## Not run: library(pcalg) set.seed(4) n <- 300 data <- NULL x1 <- 2*(runif(n)-0.5) x2 <- x1 + runif(n)-0.5 x3 <- x1^2 + 0.6*runif(n) x4 <- rnorm(n) x5 <- x3 + x4^2 + 2*runif(n) x6 <- 10*(runif(n)-0.5) x7 <- x6^2 + 5*runif(n) x8 <- 2*x7^2 + 1.5*rnorm(n) x9 <- x7 + 4*runif(n) data <- cbind(x1,x2,x3,x4,x5,x6,x7,x8,x9) true <- matrix(0,9,9) true[c(1),c(2,3)]<-true[c(3,4),5]<-true[c(6),c(7)]<-true[c(7),c(8)]<-true[7,9]<-1 pc <- pc(suffStat = list(C = cor(data), n = 9), indepTest = gaussCItest, alpha = 0.9, labels = colnames(data), u2pd = "relaxed", skel.method = "stable", verbose = TRUE) kpc1 <- kpc(suffStat = list(data=data, ic.method="dcc.perm"), indepTest = kernelCItest, alpha = 0.1, labels = colnames(data), u2pd = "relaxed", skel.method = "stable", verbose = TRUE) kpc2 <- kpc(suffStat = list(data=data, ic.method="hsic.gamma"), indepTest = kernelCItest, alpha = 0.1, labels = colnames(data), u2pd = "relaxed", skel.method = "stable", verbose = TRUE) kpc3 <- kpc(suffStat = list(data=data, ic.method="hsic.perm"), indepTest = kernelCItest, alpha = 0.1, labels = colnames(data), u2pd = "relaxed", skel.method = "stable", verbose = TRUE) kpc4 <- kpc(suffStat = list(data=data, ic.method="hsic.clust"), indepTest = kernelCItest, alpha = 0.1, labels = colnames(data), u2pd = "relaxed", skel.method = "stable", verbose = TRUE) if (require(Rgraphviz)) { par(mfrow=c(2,3)) plot(pc,main="pc") plot(kpc1,main="dpc.perm") plot(kpc2,main="kpc.gamma") plot(kpc3,main="kpc.perm") plot(kpc4,main="kpc.clust") plot(as(true,"graphNEL"),main="True DAG") } ## End(Not run)
Uses the generalised additive model gam to non-linearly and non-parametrically regress variable V on its parents and set of variables S.
regrVonPS(G, V, S, suffStat, indepTest = kernelCItest, alpha = 0.2)
regrVonPS(G, V, S, suffStat, indepTest = kernelCItest, alpha = 0.2)
G |
adjacency matrix, for the graph |
V |
integer, node which we regress |
S |
integer(s), set we regress on |
suffStat |
sufficient statistics to perform the independence test kernelCItest |
indepTest |
independence test to check for dependence between residuals of V and S |
alpha |
numeric cutoff for significance level of individual partial correlation tests |
regrVonPS() returns the number of p-values smaller than the cutoff, i.e 0 means residuals of V are independent of all variables in S
Petras Verbyla ([email protected])
Uses the generalised additive model gam to non-linearly and non-parametrically regress set of variables X on a set of variables S and returns residuals of X.
regrXonS(X, S)
regrXonS(X, S)
X |
numeric matrix, set of variables to be regressed. Each column represents separate variable |
S |
numeric matrix, set of variables we will regress on. Each column represents separate variable |
If the number of variables in S is we use frml.full.smooth as formula for gam to regress X on S, otherwise we use frml.additive.smooth.
regrXonS() returns the residuals of X regressed on S.
Petras Verbyla ([email protected])
set.seed(10) library(energy) z <- 10*runif(300) w <- 10*runif(300) x <- sin(z) + runif(300) y <- cos(z) + runif(300) data <- cbind(x,y,z,w) hsic.gamma(x,y) hsic.perm(x,y) dcov.test(x,y) resid <- regrXonS(cbind(x,y),cbind(z,w)) hsic.gamma(resid[,1],resid[,2]) hsic.perm(resid[,1],resid[,2]) dcov.test(resid[,1],resid[,2])
set.seed(10) library(energy) z <- 10*runif(300) w <- 10*runif(300) x <- sin(z) + runif(300) y <- cos(z) + runif(300) data <- cbind(x,y,z,w) hsic.gamma(x,y) hsic.perm(x,y) dcov.test(x,y) resid <- regrXonS(cbind(x,y),cbind(z,w)) hsic.gamma(resid[,1],resid[,2]) hsic.perm(resid[,1],resid[,2]) dcov.test(resid[,1],resid[,2])
This function performs the last (generalised transitive) step in the kpc algorithm. It transforms an object of the class "pcAlgo" containing a skeleton and corresponding conditional independence information into a weakly additive noise directed acyclic graph (CPDAG). The functions first determine the v-structures in the collider step, and then performs the Generalised Transitive Step as described in Tillman et al (2009) to orient as many of the remaining edges as possible.
udag2wanpdag(gInput, suffStat, indepTest = kernelCItest, alpha = 0.2, verbose = FALSE, unfVect = NULL, solve.confl = FALSE, orientCollider = TRUE, rules = rep(TRUE, 3))
udag2wanpdag(gInput, suffStat, indepTest = kernelCItest, alpha = 0.2, verbose = FALSE, unfVect = NULL, solve.confl = FALSE, orientCollider = TRUE, rules = rep(TRUE, 3))
gInput |
"pcAlgo"-object containing skeleton and conditional indepedence information. |
suffStat |
a list of sufficient statistics, containing all necessary elements for the conditional independence decisions in the function indepTest. |
indepTest |
A function for testing conditional independence. It is internally called as indepTest(x,y,S,suffStat). Default is kernelCItest. |
alpha |
significance level (number in (0,1) for the individual conditional independence tests. |
verbose |
0: No output; 1: Details |
unfVect |
vector containing numbers that encode ambiguous triples (as returned by pc.cons.intern). This is needed in the conservative and majority rule PC algorithms. |
solve.confl |
if TRUE, the orientation of the v-structures and the orientation rules work with lists for candidate sets and allow bi-directed edges to resolve conflicting edge orientations. Note that therefore the resulting object is order-independent but might not be a PDAG because bi-directed edges can be present. |
orientCollider |
if TRUE, collider are oriented. |
rules |
Array of length 3 containing TRUE or FALSE for each rule. TRUE in position i means that rule i (Ri) will be applied. By default, all rules are used.gInput |
First we perform a collider step, that is orienting triples a-b-c as a->b<-c iff b is not in separating set of a and c. Then we orient edges a-S as a->S if b_r is independent of a set S, where b_r are the residuals of b non parametrically regressed on S and parents of b and none of the edges S_i-a can be oriented as S_i->a, that is residuals S_i_r would be independent of a.
An oriented object of class "pcAlgo".
Tillman, R. E., Gretton, A. and Spirtes, P. (2009). Nonlinear directed acyclic structure learning with weakly additive noise model. NIPS 22, Vancouver.
## Not run: library(pcalg) set.seed(4) n <- 300 data <- NULL x1 <- 2*(runif(n)-0.5) x2 <- x1 + runif(n)-0.5 x3 <- x1^2 + 0.6*runif(n) x4 <- rnorm(n) x5 <- x3 + x4^2 + 2*runif(n) x6 <- 10*(runif(n)-0.5) x7 <- x6^2 + 10*runif(n) x8 <- 2*x7^2 + rnorm(n) x9 <- x7 + 5*runif(n) data <- cbind(x1,x2,x3,x4,x5,x6,x7,x8,x9) true <- matrix(0,9,9) true[c(1),c(2,3)]<-true[c(3,4),5]<-true[c(6),c(7)]<-true[c(7),c(8)]<-true[7,9]<-1 ## estimate skeleton resU1 <- skeleton(suffStat = list(data=data, ic.method="dcc.perm", p=200), indepTest = kernelCItest, verbose = TRUE, alpha = 0.1, p=9) resU2 <- skeleton(suffStat = list(data=data, ic.method="hsic.gamma", sig=1, numCol = 50), indepTest = kernelCItest, verbose = TRUE, alpha = 0.1, p=9) resU3 <- skeleton(suffStat = list(data=data, ic.method="hsic.perm", sig=1, numCol = 50, p=200), indepTest = kernelCItest, verbose = TRUE, alpha = 0.1, p=9) resU4 <- skeleton(suffStat = list(data=data, ic.method="hsic.clust", p=200, sig=1, numCluster=100, numCol = 50, eps = 0.1, paral = 1), indepTest = kernelCItest, verbose = TRUE, alpha = 0.1, p=9) resU5 <- skeleton(suffStat = list(C = cor(data), n = n), indepTest = gaussCItest, verbose = TRUE, alpha = 0.1, p=9) if (require(Rgraphviz)) { par(mfrow=c(2,3)) plot(resU1,main="dpc") plot(resU2,main="kpc-resid-gamma") plot(resU3,main="kpc-resid-perm") plot(resU4,main="kpc-clust") plot(resU5,main="pc") plot(as(true,"graphNEL"),main="True DAG") } ## orient edges using three different methods resD1 <- udag2wanpdag(gInput = resU1, suffStat = list(data=data, ic.method="dcc.perm", sig=1, numCol = 50, p=200), indepTest = kernelCItest, verbose = TRUE, alpha = 0.1) resD2 <- udag2wanpdag(gInput = resU1, suffStat = list(data=data, ic.method="hsic.gamma", sig=1, numCol = 50), indepTest = kernelCItest, verbose = TRUE, alpha = 0.1) resD3 <- udag2wanpdag(gInput = resU1, suffStat = list(data=data, ic.method="hsic.perm", sig=1, numCol = 50, p=200), indepTest = kernelCItest, verbose = TRUE, alpha = 0.1) resD4 <- udag2pdagRelaxed(gInput = resU1, verbose = T) if (require(Rgraphviz)) { par(mfrow=c(2,3)) plot(resD1,main="dpc") plot(resD2,main="kpc-resid-gamma") plot(resD3,main="kpc-resid-perm") plot(resD4,main="pc") plot(as(true,"graphNEL"),main="True DAG") } ## End(Not run)
## Not run: library(pcalg) set.seed(4) n <- 300 data <- NULL x1 <- 2*(runif(n)-0.5) x2 <- x1 + runif(n)-0.5 x3 <- x1^2 + 0.6*runif(n) x4 <- rnorm(n) x5 <- x3 + x4^2 + 2*runif(n) x6 <- 10*(runif(n)-0.5) x7 <- x6^2 + 10*runif(n) x8 <- 2*x7^2 + rnorm(n) x9 <- x7 + 5*runif(n) data <- cbind(x1,x2,x3,x4,x5,x6,x7,x8,x9) true <- matrix(0,9,9) true[c(1),c(2,3)]<-true[c(3,4),5]<-true[c(6),c(7)]<-true[c(7),c(8)]<-true[7,9]<-1 ## estimate skeleton resU1 <- skeleton(suffStat = list(data=data, ic.method="dcc.perm", p=200), indepTest = kernelCItest, verbose = TRUE, alpha = 0.1, p=9) resU2 <- skeleton(suffStat = list(data=data, ic.method="hsic.gamma", sig=1, numCol = 50), indepTest = kernelCItest, verbose = TRUE, alpha = 0.1, p=9) resU3 <- skeleton(suffStat = list(data=data, ic.method="hsic.perm", sig=1, numCol = 50, p=200), indepTest = kernelCItest, verbose = TRUE, alpha = 0.1, p=9) resU4 <- skeleton(suffStat = list(data=data, ic.method="hsic.clust", p=200, sig=1, numCluster=100, numCol = 50, eps = 0.1, paral = 1), indepTest = kernelCItest, verbose = TRUE, alpha = 0.1, p=9) resU5 <- skeleton(suffStat = list(C = cor(data), n = n), indepTest = gaussCItest, verbose = TRUE, alpha = 0.1, p=9) if (require(Rgraphviz)) { par(mfrow=c(2,3)) plot(resU1,main="dpc") plot(resU2,main="kpc-resid-gamma") plot(resU3,main="kpc-resid-perm") plot(resU4,main="kpc-clust") plot(resU5,main="pc") plot(as(true,"graphNEL"),main="True DAG") } ## orient edges using three different methods resD1 <- udag2wanpdag(gInput = resU1, suffStat = list(data=data, ic.method="dcc.perm", sig=1, numCol = 50, p=200), indepTest = kernelCItest, verbose = TRUE, alpha = 0.1) resD2 <- udag2wanpdag(gInput = resU1, suffStat = list(data=data, ic.method="hsic.gamma", sig=1, numCol = 50), indepTest = kernelCItest, verbose = TRUE, alpha = 0.1) resD3 <- udag2wanpdag(gInput = resU1, suffStat = list(data=data, ic.method="hsic.perm", sig=1, numCol = 50, p=200), indepTest = kernelCItest, verbose = TRUE, alpha = 0.1) resD4 <- udag2pdagRelaxed(gInput = resU1, verbose = T) if (require(Rgraphviz)) { par(mfrow=c(2,3)) plot(resD1,main="dpc") plot(resD2,main="kpc-resid-gamma") plot(resD3,main="kpc-resid-perm") plot(resD4,main="pc") plot(as(true,"graphNEL"),main="True DAG") } ## End(Not run)