Title: | Generate Causally-Simulated Data |
---|---|
Description: | Generate causally-simulated data to serve as ground truth for evaluating methods in causal discovery and effect estimation. The package provides tools to help define functions based on specified edges and, conversely, to define edges based on functions. It generates data according to these predefined functions and causal structures. This is particularly useful for researchers in fields such as artificial intelligence, statistics, biology, medicine, epidemiology, economics, and social sciences who are developing general or domain-specific methods to discover causal structures and estimate causal effects. Data simulation adheres to the principles of structural causal modeling. Detailed methodologies and examples are documented in our vignette, available at <https://htmlpreview.github.io/?https://github.com/herdiantrisufriyana/rcausim/blob/master/doc/causal_simulation_exemplar.html>. |
Authors: | Herdiantri Sufriyana [aut, cre] , Emily Chia-Yu Su [aut] |
Maintainer: | Herdiantri Sufriyana <[email protected]> |
License: | GPL-3 |
Version: | 0.1.1 |
Built: | 2024-12-22 06:31:18 UTC |
Source: | CRAN |
Generate causally-simulated data
data_from_function(func, n)
func |
Functions, an object class generated by the function_from_edge or function_from_user functions. |
n |
Number of observations, a numeric of length 1 that must be a non-negative whole number. |
A data frame that includes the simulated data, with each vertex as a column.
data(functions)
data_from_function(functions, n = 100)
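To illustrate the structural-causal-model principle behind data_from_function, the base-R sketch below evaluates each vertex function only after all of its arguments are available, passing the sample size to any argument named 'n' and a parent column to every other argument. simulate_scm is a hypothetical helper written under that assumption; it is not the package's implementation.

```r
# Hypothetical helper for illustration only; not rcausim's data_from_function().
# Each vertex function is evaluated once all of its arguments are available:
# 'n' receives the sample size, every other argument receives a parent column.
simulate_scm <- function(funcs, n) {
  data <- list()
  remaining <- names(funcs)
  while (length(remaining) > 0) {
    progressed <- FALSE
    for (v in remaining) {
      args <- names(formals(funcs[[v]]))
      parents <- setdiff(args, "n")
      if (all(parents %in% names(data))) {
        arg_values <- lapply(args, function(a) if (a == "n") n else data[[a]])
        names(arg_values) <- args
        data[[v]] <- do.call(funcs[[v]], arg_values)
        remaining <- setdiff(remaining, v)
        progressed <- TRUE
      }
    }
    # No vertex could be evaluated in this pass: a cycle or an undefined argument
    if (!progressed) stop("Cannot resolve: cycle or undefined argument")
  }
  as.data.frame(data[names(funcs)])
}

# Usage with the two functions from the function_from_user example
set.seed(1)
funcs <- list(
  A = function(B) ifelse(B >= 95, 1, 0),
  B = function(n) rnorm(n, mean = 90, sd = 5)
)
sim <- simulate_scm(funcs, n = 100)
head(sim)
```

Listing A before B does not matter here, because A is only evaluated once B has been simulated.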
Define a function in the list of functions
define(func, which, what)
func |
Functions, an object class generated by the function_from_edge or function_from_user functions. |
which |
Which, a character of length 1 giving the name of the vertex whose function is defined. The vertex name must exist in 'Functions'. |
what |
What, the function to assign to the vertex. If the vertex has no function yet, it must use all and only the arguments listed for that vertex in 'Functions'. |
A list whose elements are either functions or character vectors of function
arguments. The user can keep defining or redefining its elements with the
define function. Once all elements of the list are functions, it can be used
as input for generating the simulated data.
data(edges)
functions <- function_from_edge(edges)
function_B <- function(n){ rnorm(n, 90, 5) }
functions <- define(functions, 'B', function_B)
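A minimal sketch of the contract described for define, assuming a vertex without a function is stored as a character vector of its argument names (with 'n' standing in for a root vertex). define_sketch and is_complete are hypothetical helpers for illustration, not part of rcausim.

```r
# Hypothetical sketch of define()'s contract; not rcausim's code.
# Assumption: an undefined vertex holds a character vector of argument names.
define_sketch <- function(func, which, what) {
  stopifnot(is.character(which), length(which) == 1, which %in% names(func))
  stopifnot(is.function(what))
  current <- func[[which]]
  # Only a not-yet-defined vertex constrains the arguments of 'what'
  if (is.character(current)) {
    if (!setequal(names(formals(what)), current)) {
      stop("'what' must use all and only these arguments: ",
           paste(current, collapse = ", "))
    }
  }
  func[[which]] <- what
  func
}

# The list is ready for simulation once every element is a function
is_complete <- function(func) all(vapply(func, is.function, logical(1)))

func <- list(A = "B", B = "n")
func <- define_sketch(func, "B", function(n) rnorm(n, 90, 5))
is_complete(func)  # FALSE: 'A' is still a character vector of arguments
func <- define_sketch(func, "A", function(B) ifelse(B >= 95, 1, 0))
is_complete(func)  # TRUE
```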
Identify edges given functions
edge_from_function(func)
func |
Functions, an object class generated by the function_from_edge or function_from_user functions. |
A data frame that includes the columns 'from' and 'to', in this order.
data(functions)
edge_from_function(functions)
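Since every argument of a vertex function other than 'n' names one of its parents, an edge list can be recovered from the function signatures alone. The sketch below assumes exactly that rule; edges_from_functions is a hypothetical name, not the package's code.

```r
# Hypothetical sketch; not rcausim's edge_from_function().
# Each non-'n' argument of a vertex function becomes an edge: argument -> vertex.
edges_from_functions <- function(funcs) {
  rows <- lapply(names(funcs), function(v) {
    parents <- setdiff(names(formals(funcs[[v]])), "n")
    if (length(parents) == 0) return(NULL)
    data.frame(from = parents, to = v, stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, rows)
  # No edges at all: return an empty data frame with the same columns
  if (is.null(out)) return(data.frame(from = character(0), to = character(0)))
  rownames(out) <- NULL
  out
}

funcs <- list(
  A = function(B) ifelse(B >= 95, 1, 0),
  B = function(n) rnorm(n, 90, 5),
  C = function(A, B) A + B
)
e <- edges_from_functions(funcs)  # three edges: B -> A, A -> C, B -> C
e
```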
An example of a data frame that includes the columns 'from' and 'to', in this order. No vertex is named 'n'.
edges
A data frame with 7 rows and 2 columns:
from: The name of the vertex from which a directed edge starts.
to: The name of the vertex to which a directed edge points.
Generated for examples in this package.
List functions given edges
function_from_edge(e)
e |
Edge, a data frame that must include only the columns 'from' and 'to', in this order. A vertex name 'n' is not allowed. |
A list of character vectors, each holding the arguments of a vertex function
that the user will then define with the define function.
data(edges)
function_from_edge(edges)
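The reverse direction, from edges to argument lists, can be sketched by collecting the parents of each vertex. arguments_from_edges is hypothetical, and giving vertices without parents the single argument 'n' is an assumption of this sketch, not documented package behavior.

```r
# Hypothetical sketch; not rcausim's function_from_edge().
# Assumption: a vertex with no incoming edge gets the single argument 'n'.
arguments_from_edges <- function(e) {
  stopifnot(is.data.frame(e), identical(names(e), c("from", "to")))
  if (any(c(as.character(e$from), as.character(e$to)) == "n")) {
    stop("A vertex name 'n' is not allowed")
  }
  vertices <- unique(c(as.character(e$from), as.character(e$to)))
  args <- lapply(vertices, function(v) {
    parents <- unique(as.character(e$from[e$to == v]))
    if (length(parents) == 0) "n" else parents
  })
  names(args) <- vertices
  args
}

e <- data.frame(from = c("B", "A", "B"), to = c("A", "C", "C"))
a <- arguments_from_edges(e)
str(a)
```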
List functions from user
function_from_user(func)
func |
Functions, a list of functions defined by the user. The list must be non-empty, every element must be named and must be a function, and together the functions must form at least one edge. |
A list of functions. It can be used as input for generating the simulated
data, or redefined by the user with the define function.
function_B <- function(n){ rnorm(n, mean = 90, sd = 5) }
function_A <- function(B){ ifelse(B >= 95, 1, 0) }
functions <- list(A = function_A, B = function_B)
functions <- function_from_user(functions)
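The requirements on 'func' can be written as explicit checks. check_user_functions is a hypothetical validator for illustration; it counts one edge for every function argument other than 'n', which is an assumption about how edges are derived.

```r
# Hypothetical validator mirroring the listed requirements; not rcausim's code.
check_user_functions <- function(func) {
  if (!is.list(func) || length(func) == 0) stop("The list must be non-empty")
  nms <- names(func)
  if (is.null(nms) || any(is.na(nms) | nms == "")) stop("All elements must be named")
  if (!all(vapply(func, is.function, logical(1)))) stop("All elements must be functions")
  # Assumption: each argument other than 'n' is an incoming edge
  n_edges <- sum(vapply(func, function(f) {
    length(setdiff(names(formals(f)), "n"))
  }, integer(1)))
  if (n_edges < 1) stop("The functions must form at least one edge")
  invisible(TRUE)
}

ok <- check_user_functions(list(
  A = function(B) ifelse(B >= 95, 1, 0),
  B = function(n) rnorm(n, mean = 90, sd = 5)
))
```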
An example of an object class generated by the function_from_edge or
function_from_user functions. The causal structure is a directed
acyclic graph (DAG), so no cycles are allowed. One function in the list
takes 'n' as its only argument. Every argument of every function, except
'n', is a vertex defined by its own function in the list. Each vertex
function returns an output of the specified length 'n'.
functions
A list with 5 elements:
A function with an argument 'n'.
A function with an argument 'B'.
A function with an argument 'A'.
A function with arguments 'A', 'B', and 'D'.
A function with arguments 'A' and 'C'.
Generated for examples in this package.
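The DAG requirement can be checked by ordering the vertices so that each one comes after all of its parents; if no such order exists, the structure contains a cycle. The sketch uses a layer-by-layer variant of Kahn's algorithm on the edges implied by the function arguments (excluding 'n'). topological_order is a hypothetical helper, not part of rcausim.

```r
# Hypothetical sketch: layer-by-layer Kahn's algorithm over function arguments.
# Returns a topological order, or stops when no vertex can be placed.
topological_order <- function(funcs) {
  parents <- lapply(funcs, function(f) setdiff(names(formals(f)), "n"))
  placed <- character(0)
  remaining <- names(funcs)
  while (length(remaining) > 0) {
    # Vertices whose parents have all been placed already
    ready <- remaining[vapply(remaining, function(v) all(parents[[v]] %in% placed),
                              logical(1))]
    if (length(ready) == 0) stop("Not a DAG: cycle or undefined argument")
    placed <- c(placed, ready)
    remaining <- setdiff(remaining, ready)
  }
  placed
}

funcs <- list(
  A = function(B) ifelse(B >= 95, 1, 0),
  B = function(n) rnorm(n, 90, 5),
  C = function(A, B) A + B
)
topological_order(funcs)
```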
Print method for Functions
## S3 method for class 'Functions'
print(x, ...)
x |
Functions, an object class generated by the function_from_edge or function_from_user functions. |
... |
Additional arguments are ignored in this method, but are included to maintain consistency with the generic print method. |
A summary of the vertices that have functions. If some vertices do not yet have functions, instructions for defining them are shown.
data(edges)
functions <- function_from_edge(edges)
print(functions)
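The behavior described for this print method follows the usual S3 pattern. The sketch defines a method for a hypothetical class FunctionsSketch, so it does not mask the package's own method, and its message wording is assumed rather than taken from rcausim.

```r
# Hypothetical S3 print method for a stand-in class; not rcausim's print.Functions.
print.FunctionsSketch <- function(x, ...) {
  defined <- vapply(x, is.function, logical(1))
  cat("Vertices with functions: ", paste(names(x)[defined], collapse = ", "), "\n",
      sep = "")
  # Only mention undefined vertices when there are some
  if (any(!defined)) {
    cat("Define functions for: ", paste(names(x)[!defined], collapse = ", "), "\n",
        sep = "")
  }
  invisible(x)
}

f <- structure(list(A = "B", B = function(n) rnorm(n)), class = "FunctionsSketch")
print(f)
```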
Generate time-varying data
time_varying(func, data, T_max)
func |
Functions, an object class generated by the function_from_edge or function_from_user functions. |
data |
Data, a data frame generated by data_from_function. |
T_max |
Maximum time for every instance, a numeric vector with one element per row of 'data'; its values must be non-negative whole numbers. |
A data frame that includes the simulated data, with each vertex as a column, for every time point up to the maximum time of each instance.
data(functions)
simulated_data <- data_from_function(functions, n = 100)
function_B <- function(B){ B + 1 }
functions <- define(functions, which = "B", what = function_B)
T_max <- rpois(nrow(simulated_data), lambda = 25)
time_varying(functions, data = simulated_data, T_max = T_max)
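In the example above, function_B takes 'B' itself as its argument, so each time step updates B from its previous value. The sketch below shows that idea for a single vertex in base R. simulate_time_varying is hypothetical, and the long layout with 'id' and 'time' columns starting at time 0 is an assumption, not the documented output format of time_varying.

```r
# Hypothetical sketch of a self-referencing update over time; not rcausim's code.
# Assumption: time starts at 0 with the initial value and runs to T_max per instance.
simulate_time_varying <- function(B0, T_max, update = function(B) B + 1) {
  stopifnot(length(B0) == length(T_max), all(T_max >= 0),
            all(T_max == round(T_max)))
  rows <- lapply(seq_along(B0), function(i) {
    values <- numeric(T_max[i] + 1)
    values[1] <- B0[i]
    # Apply the update rule once per time step
    for (t in seq_len(T_max[i])) values[t + 1] <- update(values[t])
    data.frame(id = i, time = 0:T_max[i], B = values)
  })
  do.call(rbind, rows)
}

set.seed(1)
B0 <- rnorm(3, mean = 90, sd = 5)
T_max <- c(2, 0, 4)
long <- simulate_time_varying(B0, T_max)
head(long)
```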