Package 'SELF'

Title: A Structural Equation Embedded Likelihood Framework for Causal Discovery
Description: Provides the SELF criteria to learn causal structure. Please cite "Ruichu Cai, Jie Qiao, Zhenjie Zhang, Zhifeng Hao. SELF: Structural Equational Embedded Likelihood Framework for Causal Discovery. AAAI. 2018."
Authors: Ruichu Cai [ths, aut], Jie Qiao [aut, cre], Zhenjie Zhang [ths, aut], Zhifeng Hao [ths, aut]
Maintainer: Jie Qiao <[email protected]>
License: GPL-2
Version: 0.1.1
Built: 2024-11-20 06:44:59 UTC
Source: CRAN

SELF: A Structural Equation Embedded Likelihood Framework for Causal Discovery

Description

Provides the SELF criteria to learn causal structure. Please cite "Ruichu Cai, Jie Qiao, Zhenjie Zhang, Zhifeng Hao. SELF: Structural Equational Embedded Likelihood Framework for Causal Discovery. AAAI. 2018."

Author(s)

Maintainer: Jie Qiao [email protected]

Authors:

Ruichu Cai [ths, aut], Zhenjie Zhang [ths, aut], Zhifeng Hao [ths, aut]

Fast Hill-Climbing

Description

Learns a causal structure from data by fast hill climbing.

Usage

fhc(D, G = NULL, min_increase = 0.01, score_type = "bic", file = "",
  verbose = TRUE, save_model = FALSE, bw = "nrd0", booster = "gbtree",
  gamma = 10, nrounds = 30, ...)

Arguments

D

Input data, for example a data frame with one column per variable.

G

An initial graph for hill climbing. Default: empty graph.

min_increase

Minimum score increase required to accept a change to the graph. Larger values lead to faster convergence.

score_type

The score used to learn the causal structure: "bic", "log", or "aic". Default: "bic".

file

Path of the output folder in which the model is saved at each iteration.

verbose

Show the progress bar for each iteration.

save_model

Save the metadata during the iterations so that you can easily restore progress and evaluate the model while the search is running.

bw

The smoothing bandwidth, passed as the bw argument of stats::density (kernel density estimation).

booster

The regression method: "lm", "gbtree", or "gblinear". "lm" and "gblinear" are linear regression methods; "gbtree" is a nonlinear regression method. Default: "gbtree".

gamma

The xgboost parameter gamma: the minimum loss reduction required to make a further partition on a leaf node of the tree. The larger the value, the more conservative the algorithm.

nrounds

The maximum number of trees for xgboost. Default: 30.

...

Other parameters passed to xgboost. See also help(xgboost).
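
The three score_type values correspond to familiar model-selection quantities. The following is a minimal base-R sketch, assuming a Gaussian linear model fitted with lm; it does not call SELF, and the package's internal scoring may differ in scale or detail. All variable names are illustrative.

```r
# Illustrative only: log-likelihood, AIC-style, and BIC-style scores for one
# linear fit, written so that a higher score is better.
set.seed(1)
x <- rnorm(200)
y <- 2 * x + rnorm(200)

fit <- lm(y ~ x)
ll  <- as.numeric(logLik(fit))   # maximised log-likelihood
k   <- attr(logLik(fit), "df")   # number of estimated parameters
n   <- nobs(fit)                 # number of observations

scores <- c(
  log = ll,                      # no complexity penalty
  aic = ll - k,                  # equals -AIC(fit) / 2
  bic = ll - 0.5 * k * log(n)    # equals -BIC(fit) / 2
)
print(scores)
```

The penalty grows from "log" to "aic" to "bic" (for n larger than about 7), so "bic" favours the sparsest graphs of the three.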

Value

The adjacency matrix of the causal structure.

Examples

## Not run: 
#x->y->z
set.seed(0)
x=rnorm(4000)
y=x^2+runif(4000,-1,1)*0.1
z=y^2+runif(4000,-1,1)*0.1
data=data.frame(x,y,z)
fhc(data,gamma=10,booster = "gbtree")

#x->y->z linear data
set.seed(0)
x=rnorm(4000)
y=3*x+runif(4000,-1,1)*0.1
z=3*y+runif(4000,-1,1)*0.1
data=data.frame(x,y,z)
fhc(data,booster = "lm")

#randomGraph with linear data

set.seed(0)
G=randomGraph(dim=10,indegree=1.5)
data=synthetic_data_linear(G=G,sample_num=4000)
fitG=fhc(data,booster = "lm")
indicators(fitG,G)

## End(Not run)
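
To illustrate the role of min_increase in a greedy search, here is a toy hill-climbing sketch in base R. It is not the fhc implementation: it only adds edges, scores each node with a BIC-penalised linear fit, and uses the assumed convention that A[i, j] = 1 means i -> j. The helper names node_score, is_acyclic, and toy_hill_climb are hypothetical.

```r
# Score of one node given its parents: -BIC/2 of a linear fit (higher is better).
node_score <- function(data, child, parents) {
  rhs <- if (length(parents) == 0) "1" else paste(parents, collapse = " + ")
  fit <- lm(as.formula(paste(child, "~", rhs)), data = data)
  -BIC(fit) / 2
}

# A graph is acyclic if repeatedly deleting nodes without parents empties it.
is_acyclic <- function(A) {
  while (nrow(A) > 0) {
    roots <- which(colSums(A) == 0)
    if (length(roots) == 0) return(FALSE)
    A <- A[-roots, -roots, drop = FALSE]
  }
  TRUE
}

# Greedy search: add the single best edge until no edge gains > min_increase.
toy_hill_climb <- function(data, min_increase = 0.01) {
  vars <- names(data)
  p <- length(vars)
  A <- matrix(0, p, p, dimnames = list(vars, vars))
  scores <- sapply(vars, function(v) node_score(data, v, character(0)))
  repeat {
    best_gain <- min_increase
    best <- NULL
    for (i in seq_len(p)) for (j in seq_len(p)) {
      if (i == j || A[i, j] == 1) next
      B <- A
      B[i, j] <- 1
      if (!is_acyclic(B)) next
      new_score <- node_score(data, vars[j], vars[B[, j] == 1])
      gain <- new_score - scores[[j]]
      if (gain > best_gain) {
        best_gain <- gain
        best <- list(i = i, j = j, score = new_score)
      }
    }
    if (is.null(best)) break          # no candidate beats the threshold
    A[best$i, best$j] <- 1
    scores[[best$j]] <- best$score
  }
  A
}

set.seed(0)
x <- rnorm(1000)
y <- 3 * x + runif(1000, -1, 1) * 0.1
z <- 3 * y + runif(1000, -1, 1) * 0.1
toy_hill_climb(data.frame(x, y, z))
```

With a purely linear Gaussian score the two directions of an edge score equally, so this toy version can recover only the skeleton reliably; orienting edges is what the SELF criterion is meant to address.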

Calculate the F1, precision, and recall scores of the graph

Description

Calculates the F1, precision, and recall scores of a predicted graph against the real graph.

Usage

indicators(pred, real)

Arguments

pred

Predicted graph

real

Real graph

Value

The F1, precision, and recall scores.

Examples

pred<-matrix(c(0,0,0,0,1,0,1,1,0),nrow=3,ncol=3)
real<-matrix(c(0,0,0,0,1,0,1,0,0),nrow=3,ncol=3)
indicators(pred,real)
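
For reference, the standard definitions of these scores over adjacency-matrix entries can be sketched in base R as follows. This is an assumption about what is being measured, not the package's code; toy_indicators is a hypothetical name, and indicators may treat edge direction or empty graphs differently.

```r
# Precision, recall, and F1 computed entry-wise on 0/1 adjacency matrices.
toy_indicators <- function(pred, real) {
  tp <- sum(pred == 1 & real == 1)     # edges present in both graphs
  precision <- tp / sum(pred == 1)     # share of predicted edges that are real
  recall    <- tp / sum(real == 1)     # share of real edges that were predicted
  f1 <- 2 * precision * recall / (precision + recall)
  c(f1 = f1, precision = precision, recall = recall)
}

pred <- matrix(c(0, 0, 0, 0, 1, 0, 1, 1, 0), nrow = 3, ncol = 3)
real <- matrix(c(0, 0, 0, 0, 1, 0, 1, 0, 0), nrow = 3, ncol = 3)
toy_indicators(pred, real)
```

On these matrices the sketch counts three predicted entries, two real entries, and two in common, giving precision 2/3, recall 1, and F1 0.8. It returns NaN when either graph has no edges.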

MMPC algorithm with additive noise model

Description

A comparison algorithm for nonlinear data. It uses the MMPC algorithm to learn a causal skeleton and an additive noise model (ANM) to identify the direction of each edge.

Usage

mmpcAnm(data)

Arguments

data

The data


Generate a random graph

Description

Generate a random graph based on the given dimension size and average indegree

Usage

randomGraph(dim, indegree, maxite = 10000)

Arguments

dim

The dimension of the random graph

indegree

The average indegree of the nodes in the random graph

maxite

The maximum number of iterations used to find the random graph

Value

A random graph.

Examples

randomGraph(dim=10,indegree=1)
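
One simple way to obtain an acyclic graph with a given average indegree is to place edges only above the diagonal of the adjacency matrix. The sketch below takes that approach as an assumption; it is not the algorithm used by randomGraph (which takes a maxite argument and so presumably searches iteratively). The name toy_random_dag and the convention that A[i, j] = 1 means i -> j are illustrative.

```r
# Random DAG with round(dim * indegree) edges, all placed above the diagonal,
# so node i can only point to nodes with a larger index (no cycles possible).
toy_random_dag <- function(dim, indegree) {
  n_edges <- round(dim * indegree)
  slots <- which(upper.tri(matrix(0, dim, dim)))   # linear indices with i < j
  stopifnot(n_edges <= length(slots))
  A <- matrix(0, dim, dim)
  A[slots[sample.int(length(slots), n_edges)]] <- 1
  A
}

set.seed(0)
A <- toy_random_dag(dim = 10, indegree = 1.5)
mean(colSums(A))   # average indegree of the generated graph
```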

Synthetic linear data based on the graph

Description

Generates synthetic linear data based on the graph. The noise terms are sampled from a super-Gaussian distribution. The coefficients are sampled from U(-1,-0.5) or U(0.5,1).

Usage

synthetic_data_linear(G, sample_num, ratio = 1, return_noise = FALSE)

Arguments

G

An adjacency matrix.

sample_num

The number of samples

ratio

The noise ratio. It will grow or shrink the value of the noise.

return_noise

Whether to return the noise of each node for further analysis.

Value

The synthetic data.

Examples

G<-matrix(c(0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0),nrow = 4,ncol = 4)
data=synthetic_data_linear(G,100)
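
The generating process described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's code: the Laplace distribution stands in for "super-Gaussian" noise, the adjacency matrix is assumed to be strictly upper triangular with A[i, j] = 1 meaning i -> j, and toy_linear_sem is a hypothetical name.

```r
# Linear structural equation model: each node is a weighted sum of its
# parents plus Laplace noise scaled by `ratio`.
toy_linear_sem <- function(A, n, ratio = 1) {
  stopifnot(all(A[lower.tri(A, diag = TRUE)] == 0))  # topological order assumed
  p <- nrow(A)
  # Coefficient magnitudes from U(0.5, 1) with a random sign,
  # i.e. values in U(-1, -0.5) or U(0.5, 1).
  signs <- sample(c(-1, 1), p * p, replace = TRUE)
  W <- A * matrix(runif(p * p, 0.5, 1) * signs, p, p)
  # Difference of two exponentials is Laplace distributed (heavy tailed).
  E <- ratio * matrix(rexp(n * p) - rexp(n * p), n, p)
  X <- matrix(0, n, p)
  for (j in seq_len(p)) {
    # Parents of j have smaller indices, so their columns are already filled.
    X[, j] <- X %*% W[, j] + E[, j]
  }
  as.data.frame(X)
}

A <- matrix(0, 4, 4)
A[1, 2] <- A[1, 3] <- A[2, 4] <- 1   # edges 1 -> 2, 1 -> 3, 2 -> 4
set.seed(0)
d <- toy_linear_sem(A, n = 2000)
head(d)
```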

Synthetic nonlinear data based on the graph

Description

Generates synthetic nonlinear data based on the graph. The data generation mechanism is y=scale(a1b1x^2+a2b2x^3+a3b3x^4+a4b4sin(x)+a5b5sin(x^2)).

Usage

synthetic_data_nonlinear(G, sample_num, ratio = 1, return_noise = FALSE)

Arguments

G

An adjacency matrix.

sample_num

The number of samples

ratio

The noise ratio. It will grow or shrink the value of the noise.

return_noise

Whether to return the noise of each node for further analysis.

Value

The synthetic data.

Examples

G<-matrix(c(0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0),nrow = 4,ncol = 4)
data=synthetic_data_nonlinear(G,100)
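
The mechanism in the Description can be written out for a single parent as below. The source does not define a1..a5 and b1..b5, so this sketch treats each product a_k * b_k as one caller-supplied weight w[k]; that reading, the function name toy_nonlinear_mechanism, and the example weights are assumptions made for illustration only.

```r
# y = scale(w1*x^2 + w2*x^3 + w3*x^4 + w4*sin(x) + w5*sin(x^2)),
# where w[k] stands in for the product a_k * b_k from the Description.
toy_nonlinear_mechanism <- function(x, w) {
  stopifnot(length(w) == 5)
  raw <- w[1] * x^2 + w[2] * x^3 + w[3] * x^4 + w[4] * sin(x) + w[5] * sin(x^2)
  as.numeric(scale(raw))   # centre to mean 0 and rescale to sd 1
}

set.seed(0)
x <- rnorm(500)
y <- toy_nonlinear_mechanism(x, w = c(1, 0, 0.5, 1, 0))
c(mean = mean(y), sd = sd(y))
```

Because of the scale() step, the output has mean 0 and standard deviation 1 whatever weights are chosen, which keeps the noise ratio comparable across nodes.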