Title: A Structural Equation Embedded Likelihood Framework for Causal Discovery
Description: Provides the SELF criteria to learn causal structure. Please cite "Ruichu Cai, Jie Qiao, Zhenjie Zhang, Zhifeng Hao. SELF: Structural Equational Embedded Likelihood Framework for Causal Discovery. AAAI. 2018."
Authors: Ruichu Cai [ths, aut], Jie Qiao [aut, cre], Zhenjie Zhang [ths, aut], Zhifeng Hao [ths, aut]
Maintainer: Jie Qiao <[email protected]>
License: GPL-2
Version: 0.1.1
Built: 2024-11-20 06:44:59 UTC
Source: CRAN
fhc: the function for causal structure learning.
fhc(D, G = NULL, min_increase = 0.01, score_type = "bic", file = "", verbose = TRUE, save_model = FALSE, bw = "nrd0", booster = "gbtree", gamma = 10, nrounds = 30, ...)
D: Input data.
G: An initial graph for hill climbing. Default: empty graph.
min_increase: The minimum score increase required to accept a step; larger values give faster convergence.
score_type: The score used to learn the causal structure; one of "bic", "log", or "aic". Default: "bic".
file: The path of the output folder in which the model is saved at each iteration.
verbose: Whether to show a progress bar for each iteration.
save_model: Whether to save metadata during the iterations, so that progress can be restored and the model evaluated mid-run.
bw: The smoothing bandwidth, passed to the kernel density estimator stats::density (see the sketch after this list).
booster: The regression method: "lm", "gblinear" (linear regression), or "gbtree" (nonlinear regression). Default: "gbtree".
gamma: An xgboost parameter: the minimum loss reduction required to make a further partition on a leaf node of the tree. The larger it is, the more conservative the algorithm will be.
nrounds: The maximum number of trees for xgboost. Default: 30.
...: Other parameters passed to xgboost; see help(xgboost).
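Since bw is forwarded to stats::density, any bandwidth specification that function accepts can be used. A minimal sketch, using only base stats (independent of SELF), of how the bandwidth rule changes the estimate:

# Kernel density estimates of one sample under two bandwidth rules;
# fhc() forwards its bw argument to stats::density in the same way.
set.seed(0)
x <- rnorm(500)
d1 <- stats::density(x, bw = "nrd0")  # default rule-of-thumb bandwidth
d2 <- stats::density(x, bw = "SJ")    # Sheather-Jones selector
c(nrd0 = d1$bw, SJ = d2$bw)           # compare the selected bandwidths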
Returns the adjacency matrix of the learned causal structure.
## Not run:
# x -> y -> z, nonlinear data
set.seed(0)
x <- rnorm(4000)
y <- x^2 + runif(4000, -1, 1) * 0.1
z <- y^2 + runif(4000, -1, 1) * 0.1
data <- data.frame(x, y, z)
fhc(data, gamma = 10, booster = "gbtree")

# x -> y -> z, linear data
set.seed(0)
x <- rnorm(4000)
y <- 3 * x + runif(4000, -1, 1) * 0.1
z <- 3 * y + runif(4000, -1, 1) * 0.1
data <- data.frame(x, y, z)
fhc(data, booster = "lm")

# random graph with linear data
set.seed(0)
G <- randomGraph(dim = 10, indegree = 1.5)
data <- synthetic_data_linear(G = G, sample_num = 4000)
fitG <- fhc(data, booster = "lm")
indicators(fitG, G)
## End(Not run)
Calculate the f1, precision, and recall scores of a predicted graph against the real graph.
indicators(pred, real)
pred: The predicted graph.
real: The real graph.
Returns the f1, precision, and recall scores.
pred <- matrix(c(0,0,0, 0,1,0, 1,1,0), nrow = 3, ncol = 3)
real <- matrix(c(0,0,0, 0,1,0, 1,0,0), nrow = 3, ncol = 3)
indicators(pred, real)
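For intuition, the same numbers computed by hand, under the assumed convention that every nonzero entry of an adjacency matrix is one directed edge (indicators() itself is the authoritative implementation):

# Hand computation of the metrics (assumed edge convention, not package source).
pred <- matrix(c(0,0,0, 0,1,0, 1,1,0), nrow = 3, ncol = 3)
real <- matrix(c(0,0,0, 0,1,0, 1,0,0), nrow = 3, ncol = 3)
tp <- sum(pred == 1 & real == 1)   # edges present in both graphs: 2
precision <- tp / sum(pred == 1)   # 2/3: one predicted edge is spurious
recall <- tp / sum(real == 1)      # 2/2: every true edge is recovered
f1 <- 2 * precision * recall / (precision + recall)  # harmonic mean: 0.8
c(precision = precision, recall = recall, f1 = f1)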
The comparison algorithm for nonlinear data. It uses the MMPC algorithm to learn a causal skeleton and ANM (additive noise models) to orient the edges.
mmpcAnm(data)
data: The input data.
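This entry ships without an example; a minimal hedged sketch, reusing the package's own data generators (the exact class of the returned object is not documented above, so it is only printed):

# Sketch: run the MMPC + ANM baseline on synthetic nonlinear data.
set.seed(0)
G <- randomGraph(dim = 5, indegree = 1.5)
data <- synthetic_data_nonlinear(G = G, sample_num = 1000)
fit <- mmpcAnm(data)  # skeleton via MMPC, edge directions via ANM
fit                   # inspect the result; its structure is undocumented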
Generate a random graph with the given dimension and average in-degree.
randomGraph(dim, indegree, maxite = 10000)
dim: The dimension (number of nodes) of the random graph.
indegree: The average in-degree of each node in the random graph.
maxite: The maximum number of iterations used to find a valid random graph.
Returns a random graph.
randomGraph(dim = 10, indegree = 1)
Synthesize linear data based on the graph. The noise terms are sampled from a super-Gaussian distribution, and the coefficients are sampled uniformly from U(-1, -0.5) ∪ U(0.5, 1).
synthetic_data_linear(G, sample_num, ratio = 1, return_noise = FALSE)
G: An adjacency matrix.
sample_num: The number of samples.
ratio: The noise ratio; it grows or shrinks the magnitude of the noise.
return_noise: Whether to return the noise of each node for further analysis.
Returns a synthetic data set.
G <- matrix(c(0,1,1,1, 0,0,0,0, 0,0,0,0, 0,0,0,0), nrow = 4, ncol = 4)
data <- synthetic_data_linear(G, 100)
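When the per-node noise is needed, return_noise = TRUE changes the return value; its exact structure is not documented above, so this sketch only inspects it:

# Sketch: request the noise terms and inspect the returned structure.
G <- matrix(c(0,1,1,1, 0,0,0,0, 0,0,0,0, 0,0,0,0), nrow = 4, ncol = 4)
out <- synthetic_data_linear(G, 100, return_noise = TRUE)
str(out)  # shows how samples and noise are packaged together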
Synthesize nonlinear data based on the graph. The data generation mechanism is y = scale(a1*b1*x^2 + a2*b2*x^3 + a3*b3*x^4 + a4*b4*sin(x) + a5*b5*sin(x^2)).
synthetic_data_nonlinear(G, sample_num, ratio = 1, return_noise = FALSE)
G: An adjacency matrix.
sample_num: The number of samples.
ratio: The noise ratio; it grows or shrinks the magnitude of the noise.
return_noise: Whether to return the noise of each node for further analysis.
Returns a synthetic data set.
G <- matrix(c(0,1,1,1, 0,0,0,0, 0,0,0,0, 0,0,0,0), nrow = 4, ncol = 4)
data <- synthetic_data_nonlinear(G, 100)
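Analogous to the linear pipeline in the fhc examples, an end-to-end sketch for the nonlinear case (the parameter values are illustrative, not tuned):

# Sketch: random DAG -> nonlinear data -> fhc with gbtree -> evaluation.
set.seed(0)
G <- randomGraph(dim = 6, indegree = 1.5)
data <- synthetic_data_nonlinear(G = G, sample_num = 2000)
fitG <- fhc(data, booster = "gbtree", gamma = 10)  # nonlinear regression scores
indicators(fitG, G)  # f1 / precision / recall against the true graph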