Title: | Directed Acyclic Graphs: Analysis and Data Simulation |
---|---|
Description: | Draw, manipulate, and evaluate directed acyclic graphs and simulate corresponding data, as described in International Journal of Epidemiology 50(6):1772-1777. |
Authors: | Lutz P Breitling |
Maintainer: | Lutz P Breitling <[email protected]> |
License: | GPL-2 |
Version: | 1.2.1 |
Built: | 2024-12-17 06:48:18 UTC |
Source: | CRAN |
The package dagR contains a couple of functions to draw, manipulate and evaluate directed acyclic graphs (DAG), with a focus on epidemiologic applications, namely the assessment of adjustment sets and potentially biasing paths.
The functions for finding and evaluating paths essentially implement the graphical algorithms outlined in Greenland (1999).
When using this package for your work, please cite Breitling (2010) and/or Breitling et al. (2022).
For motivations to use this package in epidemiology teaching and methodological research, please refer to Duan et al. (2022).
Note: As spelled out in the license, this suite of functions comes without any warranty, and cautious use is strongly advised.
Although testing was carried out as meticulously as possible, it must be expected that bugs or errors remain, in particular in the early versions of the package.
Please report any problems, concerns, but also suggestions for improvements or extensions to the author.
Important additions in future versions could be e.g. improved drawing routines with better formatting of alternative node symbols in the DAG (taking into account the string length) and algorithms with intelligent/efficient search for minimal adjustment sets.
Package: | dagR |
Type: | Package |
Version: | 1.2.1 |
Date: | 2022-10-09 |
License: | GPL-2 |
LazyLoad: | yes |
dag.init is used for setting up DAGs. See the code of the functions demo.dag0 to demo.dag6 for example code.
To adjust and/or evaluate DAGs for biasing paths, use dag.adjust; use dag.draw for drawing a DAG.
dag.search uses brute.search to evaluate all possible adjustment sets, allowing the identification of minimal sufficient adjustment sets using msas.
dag.sim simulates data (normally distributed or binary) according to the causal structure given by a DAG object.
In version 1.2.0, generic S3 methods (print, plot, summary) for dagR DAGs were implemented, but the original functions summary_dagRdag to summarize and dag.draw to plot a DAG object were preserved for backwards compatibility. Functions for export to other packages were added upon a reviewer's request.
Several helper functions currently are not hidden and should later be made internal.
Please see the NEWS file for version changes and known open issues.
Lutz P Breitling <[email protected]>
Breitling LP (2010). dagR: a suite of R functions for directed acyclic graphs. Epidemiology 21(4):586-587.
Breitling LP, Duan C, Dragomir AD, Luta G (2022). Using dagR to identify minimal sufficient adjustment sets and to simulate data based on directed acyclic graphs. Int J Epidemiol 50(6):1772-1777 <doi:10.1093/ije/dyab167>.
Duan C, Dragomir AD, Luta G, Breitling LP (2022). Reflection on modern methods: Understanding bias and data analytical strategies through DAG-based data simulations. Int J Epidemiol 50(6):2091-2097 <doi:10.1093/ije/dyab096>.
Greenland S, Pearl J, Robins JM (1999). Causal diagrams for epidemiologic research. Epidemiology 10(1):37-48.
Conveniently add an arc to an existing DAG.
add.arc(dag, arc, type = 0)
dag |
The DAG to which an arc should be added. |
arc |
A vector of length 2, indicating from which node (first element) to
which node (second element) the arc is to go.
Note: the node numbering follows the numbering of the existing
DAG (as shown in |
type |
0 (=default) for a directed arc, 1 for an undirected association. |
A DAG with the arc (and corresponding arc.type) added, and with the path-related variables (paths, pathsN, path.status, searchType, searchRes) removed.
Lutz P Breitling <[email protected]>
Conveniently adds a node to an existing DAG, inserting its coordinates and label before the outcome node. Also updates the arcs correspondingly.
add.node(dag, name = "unknown", type = 1, x = NA, y = NA)
dag |
The DAG to which the node is to be added. |
name |
Label for the node (defaults to "unknown"). |
type |
Type of node (1=covariable, 2=unknown); defaults to 1. |
x |
X coordinate for the node position. |
y |
Y coordinate for the node position. |
If no x and y coordinates are provided, the function places the node in an arbitrary position, slightly different with each additional node, so that one can more easily reposition the nodes afterwards using dag.move.
A DAG with the new node added.
Lutz P Breitling <[email protected]>
Adds two radian angles together and applies modulus 2*pi. This is internally called by smoothArc, though hardly needed.
addAngle(a, b)
a |
Angle 1 in radian. |
b |
Angle 2 in radian. |
A numeric value in the range [0, 2*pi).
Lutz P Breitling <[email protected]>
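The behaviour described for addAngle can be illustrated with a minimal base R sketch. The name add_angle_sketch is hypothetical and the code is not taken from the package; it only assumes that the result is the sum of both angles reduced modulo 2*pi.

```r
# Hypothetical re-implementation for illustration; not the dagR source code.
add_angle_sketch <- function(a, b) {
  # For a positive modulus, R's %% operator returns a value in [0, 2*pi)
  (a + b) %% (2 * pi)
}

# 1.5*pi + pi exceeds a full turn and wraps around to about 0.5*pi
quarter_turn <- add_angle_sketch(1.5 * pi, pi)
```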
Creates a matrix with all combinations of 1 to all elements of the vector provided. Elements to occur in all combinations can be specified. This is internally called by brute.search.
allCombs(x, force = c(), trace = FALSE)
x |
A vector of elements of which combinations are to be formed. |
force |
A vector of elements that are supposed to occur in each combination. |
trace |
A boolean indicating if some output should be printed (TRUE) or not (FALSE=default). |
A matrix with one combination per row. For the shorter combinations, the columns to the right are filled up with NA.
Lutz P Breitling <[email protected]>
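The kind of matrix described for allCombs (one combination per row, shorter rows padded with NA, optional forced elements) might be built as in the following base R sketch. The function all_combs_sketch is hypothetical; the row order and the handling of forced elements are assumptions and may differ from what the package returns.

```r
# Hypothetical sketch; not the dagR implementation.
all_combs_sketch <- function(x, force = c()) {
  free <- setdiff(x, force)          # elements that may or may not be included
  combos <- list()
  for (k in 0:length(free)) {
    picks <- if (k == 0) {
      list(free[0])                  # the empty selection
    } else {
      # combine positions, not values, so single-element vectors are handled safely
      combn(length(free), k, FUN = function(idx) free[idx], simplify = FALSE)
    }
    for (p in picks) {
      combo <- c(force, p)           # forced elements occur in every combination
      if (length(combo) > 0) combos[[length(combos) + 1]] <- combo
    }
  }
  width <- max(lengths(combos))
  # pad shorter combinations with NA on the right and stack them row-wise
  do.call(rbind, lapply(combos, function(cmb) c(cmb, rep(NA, width - length(cmb)))))
}

m_all <- all_combs_sketch(c(2, 3, 4))
m_forced <- all_combs_sketch(c(2, 3, 4), force = 2)
```

In this sketch, three elements without forcing give 7 rows, whereas forcing one element leaves 4 rows, which hints at why a pre-specified adjustment set shrinks the search space of brute.search.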
Calculates the radian angle of the line connecting two points. Internally called by smoothArc.
angle(A, B)
A |
Vector of length two indicating the coordinates of the first point. |
B |
Vector of length two indicating the coordinates of the second point. |
A numeric value [0, 2*pi).
Lutz P Breitling <[email protected]>
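A possible way to compute such an angle in base R is shown below. The function angle_sketch is hypothetical, and the orientation (counter-clockwise from the positive x axis) is an assumption rather than something stated in the documentation.

```r
# Hypothetical sketch; orientation convention is an assumption.
angle_sketch <- function(A, B) {
  # atan2 returns a value in (-pi, pi]; the modulo maps it into [0, 2*pi)
  atan2(B[2] - A[2], B[1] - A[1]) %% (2 * pi)
}

# A line pointing straight down has angle 1.5*pi under this convention
straight_down <- angle_sketch(c(0, 0), c(0, -1))
```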
Calculates the coordinates of the point that lies at a specific radian angle and a specific distance from a source point. Internally called by smoothArc.
anglePoint(A, angl, len)
A |
Vector of length two with the coordinates of the source point. |
angl |
Radian angle indicating into which direction the new point is to be calculated. |
len |
The distance at which the new point is situated from the source point. |
A vector of length two with the coordinates of the new point.
Another pretty superfluous helper function...
Lutz P Breitling <[email protected]>
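The calculation described for anglePoint amounts to a polar-to-Cartesian conversion. The sketch below uses the hypothetical name angle_point_sketch and assumes angles are measured counter-clockwise from the positive x axis; it is not the package's code.

```r
# Hypothetical sketch; not the dagR implementation.
angle_point_sketch <- function(A, angl, len) {
  # move len units from A in the direction given by angl
  c(A[1] + len * cos(angl), A[2] + len * sin(angl))
}

# Two units straight up from (1, 1) ends at about (1, 3)
P <- angle_point_sketch(c(1, 1), pi / 2, 2)

# The Euclidean distance back to the source point recovers len
dist_back <- sqrt(sum((P - c(1, 1))^2))
```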
Checks if an association between two DAG nodes already exists, i.e. does not need to be introduced when adjusting for a shared child etc. Internally called by dag.adjustment.
assoc.exists(dag, a, b)
dag |
The DAG to be dealt with. |
a |
First node. |
b |
Second node. |
A boolean indicating whether or not an association between first node and second node already exists.
Lutz P Breitling <[email protected]>
Evaluates all adjustment sets of a DAG, optionally including sets that contain "unknown" nodes. If the DAG already has a non-empty adjustment set, only adjustment sets that include these adjustment variables are evaluated.
brute.search(dag, allow.unknown = FALSE, trace = TRUE, stop = 0)
dag |
The DAG to be evaluated. |
allow.unknown |
Boolean indicating whether "unknown" nodes should be featured in the adjustment sets to be evaluated (TRUE) or not (FALSE=default). |
trace |
Boolean indicating if some output should be produced (TRUE=default). |
stop |
If =0, all eligible adjustment sets are evaluated. If =1, evaluations are stopped after the first sufficient adjustment set has been evaluated. Defaults to 0. |
A dataframe with the first columns (X1..Xn) indicating the variables in the respective adjustment set evaluated. The column totalPaths indicates the number of paths found when adjusting for the respective set, and openPaths indicates the number of biasing paths.
The output produced by brute.search allows sufficient and minimal sufficient adjustment sets to be identified manually, which in the future should preferably be done by a helper summary function.
The evaluation of a complicated DAG like demo.dag2 can take quite some time, and future functions should either employ more intelligent algorithms to search specifically for sufficient sets, or they should allow e.g. the evaluation of adjustment sets of specific sizes.
Lutz P Breitling <[email protected]>
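To illustrate how sufficient and minimal sufficient adjustment sets could be read off a result table of the layout described above, the following base R sketch works on a small invented data frame. The values in res are made up for illustration and do not come from any real DAG; the selection logic (a sufficient set has openPaths equal to 0, a minimal one has no sufficient proper subset) is an assumption about how one would do this by hand, not the package's own routine.

```r
# Invented example table in the described layout (X1..Xn, totalPaths, openPaths).
res <- data.frame(
  X1 = c(NA, 2, 3, 2),
  X2 = c(NA, NA, NA, 3),
  totalPaths = c(2, 2, 3, 3),
  openPaths = c(1, 0, 2, 0)
)

# Turn each row into the vector of adjusted nodes, dropping the NA padding
set_cols <- grep("^X[0-9]+$", names(res), value = TRUE)
sets <- lapply(seq_len(nrow(res)), function(i) {
  s <- unlist(res[i, set_cols], use.names = FALSE)
  s[!is.na(s)]
})

# Sufficient sets leave no biasing path open
sufficient <- which(res$openPaths == 0)

# A sufficient set is minimal if no smaller sufficient set is contained in it
is_minimal <- vapply(sufficient, function(i) {
  others <- setdiff(sufficient, i)
  !any(vapply(others, function(j) {
    length(sets[[j]]) < length(sets[[i]]) && all(sets[[j]] %in% sets[[i]])
  }, logical(1)))
}, logical(1))

minimal <- sets[sufficient[is_minimal]]
```

In this invented table, both {2} and {2, 3} are sufficient, but only {2} is minimal.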
Looks for associations introduced by adjusting for the covariables specified, then looks for biasing paths, and finally evaluates these paths.
dag.adjust(dag, A = c())
dag |
The DAG to be adjusted (or evaluated). |
A |
Vector indicating the adjustment set. The numbering is according to the nodes vector of the DAG, which is shown e.g. in the legend of a DAG drawn by |
If the adjustment set is empty, the function only looks for biasing paths and evaluates these.
A DAG with the adjustment set A, and possibly with additional associations introduced by adjustment, biasing paths found, and the status of these.
If the adjustment set is not empty, searchType and searchRes are set to NULL.
CAVE: Do not apply this to an already adjusted DAG, since this might not be handled appropriately (see documentation of dag.adjustment called by dag.adjust).
Lutz P Breitling <[email protected]>
Breitling LP (2010). dagR: a suite of R functions for directed acyclic graphs. Epidemiology 21(4):586-587.
Breitling LP, Duan C, Dragomir AD, Luta G (2022). Using dagR to identify minimal sufficient adjustment sets and
to simulate data based on directed acyclic graphs. Int J Epidemiol 50(6):1772-1777.
Greenland S, Pearl J, Robins JM (1999). Causal diagrams for epidemiologic research. Epidemiology 10(1):37-48.
dag.adjustment, find.paths, eval.paths
Identifies the associations introduced by adjustment for the variables specified, and returns the DAG with these associations added. Note that this is called internally by dag.adjust, which makes sure that biasing paths are looked for and evaluated afterwards. Thus, dag.adjustment should 1.) not be called directly, and 2.) not be called on an already adjusted DAG!
dag.adjustment(dag, A=NULL)
dag |
The DAG to be adjusted. |
A |
The adjustment set to be applied. |
The adjustment set A specified when calling dag.adjustment overrules the adjustment variables that are present in the DAG. To keep these in the adjustment set, one has to add them to A.
A DAG with A as the adjustment set and the associations introduced by adjustment for A added to the DAG.
You should not use dag.adjustment on an already adjusted
DAG, since it cannot identify associations that had been introduced
by the earlier adjustment. If the new adjustment set does not include
the adjustment variables present in the first set, the new DAG might
feature associations that actually only would be introduced when
adjusting for the variables featured in the first but not second
adjustment set.
Lutz P Breitling <[email protected]>
dag.adjust, find.paths, eval.paths
This identifies those nodes in a DAG that are ancestors of the nodes specified, i.e. according to the model depicted by the DAG they causally precede those nodes. Internally called by dag.adjustment in the context of finding associations introduced by adjustment.
dag.ancestors(dag, A)
dag |
The DAG to be evaluated. |
A |
A vector of nodes for which ancestors are to be identified. |
A vector indicating which nodes are ancestors of those in A. Note that A actually is included at the beginning of the vector.
Lutz P Breitling <[email protected]>
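The idea of collecting ancestors by following arcs backwards can be sketched in base R as follows. The function ancestors_sketch is hypothetical; it assumes a two-column arc matrix (from, to) that contains directed arcs only, and it mirrors the documented return format by placing A at the beginning of the result.

```r
# Hypothetical sketch; assumes every row of arc is a directed arc (from, to).
ancestors_sketch <- function(arc, A) {
  found <- A
  frontier <- A
  while (length(frontier) > 0) {
    # parents are the origins of arcs pointing into the current frontier
    parents <- unique(arc[arc[, 2] %in% frontier, 1])
    frontier <- setdiff(parents, found)   # only keep nodes not seen before
    found <- c(found, frontier)
  }
  found
}

# Arcs: 2 -> 1, 2 -> 4, 3 -> 2, 1 -> 4
arc <- rbind(c(2, 1), c(2, 4), c(3, 2), c(1, 4))
anc_of_1 <- ancestors_sketch(arc, 1)   # node 1 itself, then its ancestors 2 and 3
```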
Draws a DAG defined in an object of class dagRdag (as of dagR version 1.2.0, the generic function plot.dagRdag can be used for this purpose, but dag.draw is maintained for backwards compatibility). The nodes are represented by 'C' (covariables; numbered with subscripts), 'U' (unknown/unmeasured covariables; numbered with subscripts), and 'X' and 'Y' (exposure and outcome, respectively). A legend presents the names of the nodes. The X->Y arc is marked with a question mark as the relationship of interest. Adjusted variables are under- and over-lined. Undirected associations are drawn with dashed lines. If paths have been identified (and evaluated), these (and their status) are written next to the legend.
dag.draw(dag, legend = TRUE, paths = TRUE, numbering = FALSE, p = FALSE, alt.symb = TRUE, noxy = 0, ...)
dag |
The DAG to be drawn. |
legend |
Boolean indicating whether a node legend should be included. |
paths |
Boolean indicating whether paths (and their status) should be written. |
numbering |
Boolean indicating whether the arcs should be numbered in the DAG. |
p |
Boolean indicating whether the curving points of undirected associations should be drawn. |
alt.symb |
Boolean indicating if the alternative node symbols (dag$symbols) should be used. Note that especially the legends and paths will not be formatted nicely if these symbols are longer strings. |
noxy |
Integer to indicate if the X->Y should not be drawn (0=default; 1=no arc; 2=arc, but no question mark). |
... |
Currently not used. |
Returns the DAG (for whatever reason...).
Lutz P Breitling <[email protected]>
Breitling LP (2010). dagR: a suite of R functions for directed acyclic graphs. Epidemiology 21(4):586-587.
Breitling LP, Duan C, Dragomir AD, Luta G (2022). Using dagR to identify minimal sufficient adjustment sets and
to simulate data based on directed acyclic graphs. Int J Epidemiol 50(6):1772-1777.
Greenland S, Pearl J, Robins JM (1999). Causal diagrams for epidemiologic research. Epidemiology 10(1):37-48.
dag.letter, garrows, smoothArc, dag.legend, write.paths
Allows setting up a new DAG. See the demo.dag0 to demo.dag6 functions for some example specifications.
dag.init(outcome = NULL, exposure = NULL, covs = c(), arcs = c(), assocs = c(), xgap = 0.04, ygap = 0.05, len = 0.1, y.name = NULL, x.name = NULL, cov.names = c(), symbols = NULL, ...)
outcome |
Currently not used! |
exposure |
Currently not used! |
covs |
Vector including an integer for each covariable to be in the DAG (1 for a "standard" covariable, 2 for an unknown/unmeasured one). |
arcs |
Vector of duplets of integers, in which nodes from which an arc or undirected association is to emanate are followed by those to which it is to point. To refer to the exposure, use 0, to refer to the outcome, use -1, to refer to covariables, use an element of 1:length(covs). |
assocs |
A vector of same length as covs, with 0 indicating directed arcs, 1 indicating undirected associations. |
xgap |
How much x space is to be left between arc ends and nodes when drawing? |
ygap |
How much y space is to be left between arc ends and nodes when drawing? |
len |
Length of arrow whiskers when drawing. |
y.name |
Label of outcome. |
x.name |
Label of exposure. |
cov.names |
Vector of covariable labels. |
symbols |
Vector of alternative node symbols. Longer symbols will not be formatted nicely. Note that the first element refers to the exposure, the following ones to the covariables, the last one to the outcome. |
... |
Currently not used. |
A DAG (an object of class dagRdag). Check out some of the demonstration DAGs for details.
The DAG is actually a list object, with elements
cov.types (the covs vector, with 0 put in front, and -1 at the end);
x and y (coordinates for drawing the nodes, initially set up more or less in a half-circle above the x->y arc);
arc (the arcs, transformed into a matrix);
arc.type (the assocs vector);
curve.x and curve.y (if associations are featured, these provide the coordinates through which to curve);
xgap, ygap, len (the respective drawing parameters);
symbols (alternative node symbols);
version (dagR version).
CAVE: The numbering of the covariables and arc coordinates is different here than in the functions later used on the DAG (e.g. add.arc, dag.adjust)! The functions generally work according to the indexing of the R objects that they handle. Whereas for dag.init the n covariable nodes are numbered 1:n, the node vector of the resulting DAG will also contain the exposure node at the beginning and the outcome node at the end, i.e. it will go from 1:(n+2) with the covariables at 2:(n+1). summary_dagRdag will show the latter numbering. Example: when adjusting for the first covariable, dag.adjust must be handed the adjustment set A=2, as the first covariable will occupy the second node (the first node is occupied by the exposure).
Lutz P Breitling <[email protected]>
Breitling LP (2010). dagR: a suite of R functions for directed acyclic graphs. Epidemiology 21(4):586-587.
Greenland S, Pearl J, Robins JM (1999). Causal diagrams for epidemiologic research. Epidemiology 10(1):37-48.
#dag.init(covs = c(1, 1), arcs = c(0, 2, 1, 2, 1, 0, -1, 2))
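The difference between the numbering used in the dag.init call and the numbering of the resulting node vector can be made explicit with a small base R helper. The function init_to_node_index is hypothetical and only encodes the rule stated in the note above (exposure first, covariables next, outcome last); that the arc element of the DAG object holds exactly this matrix is an assumption.

```r
# Hypothetical helper translating dag.init codes into node-vector positions.
init_to_node_index <- function(i, n_covs) {
  ifelse(i == 0, 1,                        # exposure occupies the first node
         ifelse(i == -1, n_covs + 2,       # outcome occupies the last node
                i + 1))                    # covariable k occupies node k + 1
}

# The arcs vector from the example call, with two covariables
arcs <- c(0, 2, 1, 2, 1, 0, -1, 2)
arc_matrix <- matrix(init_to_node_index(arcs, n_covs = 2), ncol = 2, byrow = TRUE)
```

Under this rule the first covariable is node 2, which is why an adjustment set for it has to be given as A=2.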
Lists the DAG symbols along with their names/labels below a DAG drawn.
dag.legend(dag, lx = -0.15, ly = -0.075, alt.symb = TRUE)
dag |
The DAG for which the legend is needed. |
lx |
X coordinate for repositioning legend. |
ly |
Y coordinate for repositioning legend. |
alt.symb |
Boolean indicating if the alternative node symbols (dag$symbols) should be used. Note that the formatting is not changed, i.e. longer symbols will not be formatted nicely. |
Lutz P Breitling <[email protected]>
Writes the node symbols, 'X' and 'Y' for exposure and outcome, 'C' and 'U' (with consecutive subscripts) for known and unknown covariables. Since v1.1.2, alt.symb allows the use of custom node symbols. Unknownness is identified by either node name 'unknown' or covariable type '2' in the DAG object. Note that adjusted nodes are marked by bar and underline; this currently does not apply to those marked as unknown.
dag.letter(dag, letter, x, y, alt.symb = TRUE)
dag |
The DAG for which a node is to be written. |
letter |
The node that is of interest. |
x |
X position. |
y |
Y position. |
alt.symb |
Boolean indicating if custom symbols (dag$symbols) should be used if available. |
Lutz P Breitling <[email protected]>
dag.draw, dag.legend, write.paths
Similar to dag.letter(), but returning a string to label a DAG node. Adjusted nodes are marked by a preceding underscore.
dag.letter2(dag, letter, alt.symb)
dag |
The dagRdag object for which a node symbol is to be returned. |
letter |
The number of the node for which the symbol (often a single letter...) is to be returned. |
alt.symb |
If TRUE, the alternative node symbols of the DAG object will be used. |
A string containing the DAG letter or alternative symbol.
Lutz P Breitling <[email protected]>
Allows a node or an association curving point of a DAG to be repositioned graphically. First, select a node or curving point by left-clicking close to it. Then move it to any other position by left-clicking. Once you are happy with the new position, right-click to exit.
dag.move(dag)
dag |
The DAG to be modified. |
The same DAG, but with the feature repositioned.
Lutz P Breitling <[email protected]>
Currently, this simply is a wrapper for brute.search, which returns the input DAG with the results of brute.search and a string describing the search setup.
dag.search(dag, type = "brute", allow.unknown = FALSE, trace = FALSE, stop = 0)
dag |
DAG to be evaluated. |
type |
Type of search to be performed. Currently, only "brute" is possible. |
allow.unknown |
See |
trace |
See |
stop |
See |
The DAG with components searchType and searchRes added.
Lutz P Breitling <[email protected]>
Simulates data according to a DAG object. This function may be replaced by dag.sim2 in the future.
dag.sim(dag, b = rep(0, nrow(dag$arc)), bxy = 0, n, mu = rep(0, length(dag$x)), binary = rep(0, length(dag$x)), stdev = rep(0, length(dag$x)), naming = 2, seed = NA, verbose = FALSE)
dag |
The DAG object according to which data is to be simulated. |
b |
Vector of coefficients defining the direct effects of the DAG arcs. |
bxy |
Coefficient defining the direct effect of main exposure X on outcome Y. |
n |
Number of observations to be simulated. |
mu |
Vector of means that are to be simulated for the different DAG nodes. For binary nodes without an ancestor, the mean is taken as the prevalence to be simulated. For binary nodes with ancestors, the mean is similarly interpreted (see details in Value section). |
binary |
Vector indicating which nodes are to be continuous (=0) and binary (=1). |
stdev |
Vector of standard deviations for each node. For nodes without ancestors, continuous data are drawn from a Normal distribution with this standard deviation. For nodes with ancestors, this is the standard deviation of the residual noise that is added to the calculated observation values. |
naming |
If =2, the alternative DAG node symbols are used for naming the variables in the output dataframe. Otherwise, the output dataframe variables are named X1...Xn. |
seed |
Seed to initialize the random number generator. |
verbose |
If =TRUE, additional output is given during the simulation, in particular showing the different calculation steps. |
A dataframe with n (rows) observations featuring simulated data for each node (columns) in the DAG.
Simulation steps:
1. simulate data for nodes i without ancestors, drawing from Normal distribution with mean mu[i] and stdev[i] (continuous node), or drawing from Bernoulli events with probability mu[i] (binary node).
2. simulate data for nodes i for which all ancestors already have been simulated by multiplying the ancestor values with the corresponding arc coefficients and summing them up, shifting the resulting values to the mean mu[i] specified for the currently simulated node (logit-transformed if binary), then adding noise drawn from a Normal distribution with mean 0 and standard deviation stdev[i], finally using the inverse logit of the resulting values as success probabilities for simulating binary data if node is binary.
Undirected arcs are ignored in these simulations!
Lutz P Breitling <[email protected]>
Breitling LP, Duan C, Dragomir AD, Luta G (2022). Using dagR to identify minimal sufficient adjustment sets and
to simulate data based on directed acyclic graphs. Int J Epidemiol 50(6):1772-1777.
Duan C, Dragomir AD, Luta G, Breitling LP (2022). Reflection on modern methods: Understanding bias and data
analytical strategies through DAG-based data simulations. Int J Epidemiol 50(6):2091-2097.
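The two simulation steps described in the Value section can be mimicked by hand for a tiny three-node structure (C -> X, C -> Y, X -> Y). The sketch below is written in base R and does not call dag.sim; the coefficients, means and standard deviations are invented, and the way the values are "shifted to the mean" (centring on the sample mean, then adding the target) is an assumption about one plausible reading of the description.

```r
set.seed(42)
n <- 2000

# Step 1: node without ancestors, drawn from Normal(mu, stdev)
c_node <- rnorm(n, mean = 0, sd = 1)

# Step 2, continuous node X with arc C -> X (coefficient 0.5, target mean 1)
lin_x <- 0.5 * c_node
x_node <- lin_x - mean(lin_x) + 1 + rnorm(n, mean = 0, sd = 0.5)

# Step 2, binary node Y with arcs C -> Y (0.8) and X -> Y (0.4), target mean 0.3
lin_y <- 0.8 * c_node + 0.4 * x_node
logit_y <- lin_y - mean(lin_y) + qlogis(0.3) + rnorm(n, mean = 0, sd = 0.2)
y_node <- rbinom(n, size = 1, prob = plogis(logit_y))  # inverse logit gives probabilities

sim <- data.frame(C = c_node, X = x_node, Y = y_node)
```

Because the noise is added after the shift, the sample mean of X is close to, but not exactly, the target value.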
Simulates data according to a DAG object.
dag.sim2(dag, b = rep(0, nrow(dag$arc)), bxy = 0, n, distr = rep(0, length(dag$x)), mu = rep(0, length(dag$x)), stdev = rep(0, length(dag$x)), nu = NA, lambda = NA, binary = NA, naming = 2, seed = NA, verbose = FALSE)
dag |
The DAG object according to which data is to be simulated. |
b |
Vector of coefficients defining the direct effects of the DAG arcs (on linear scale). |
bxy |
Coefficient defining the direct effect of main exposure X on outcome Y (on linear scale). |
n |
Number of observations to be simulated. |
distr |
0 for Normal distribution continuous nodes, |
mu |
Vector of means that are to be simulated for the different DAG nodes: |
stdev |
Vector of standard deviations for each node. |
nu |
Not used. |
lambda |
Not used. |
binary |
For backwards compatibility: Vector indicating which nodes are to be continuous (=0) and binary (=1). If given, this is passed to argument "distr" and a warning is issued. |
naming |
If =2, the alternative DAG node symbols are used for naming the variables in the output dataframe. Otherwise, the output dataframe variables are named X1...Xn. |
seed |
Seed to initialize the random number generator. |
verbose |
If =TRUE, additional output is given during the simulation, in particular showing the different calculation steps. |
A dataframe with n (rows) observations featuring simulated data for each node (columns) in the DAG.
Simulation steps:
1. simulate data for nodes i without ancestors, drawing from Normal distribution with mean mu[i] and stdev[i] (continuous node), or drawing from Bernoulli events with probability mu[i] (binary node).
2. simulate data for nodes i for which all ancestors already have been simulated by multiplying the ancestor values with the corresponding arc coefficients and summing them up, shifting the resulting values to the mean mu[i] (exceptions: distr=1.1 or distr=2.1, as detailed in "mu" above) specified for the currently simulated node (logit-transformed if binary based on logistic model), then adding noise drawn from a Normal distribution with mean 0 and standard deviation stdev[i], finally using the resulting values (inverse logit, if binary based on logistic model) as success probabilities for simulating binary data if node is binary.
As the noise is added after shifting to the mean, the mean of the simulated data will not be exact. Also, the noise is added before calculating descendant nodes, i.e. it is sort of true inter-individual variation, rather than measurement error.
For the risk difference model, the success probability calculated by summing the weighted ancestors can easily be <0 (or >1).
If this happens, the probability is set to 0 (or 1), and a warning is issued.
Undirected arcs are ignored in these simulations.
Lutz P Breitling <[email protected]>
Breitling LP, Duan C, Dragomir AD, Luta G (2022). Using dagR to identify minimal sufficient adjustment sets and
to simulate data based on directed acyclic graphs. Int J Epidemiol 50(6):1772-1777.
Duan C, Dragomir AD, Luta G, Breitling LP (2022). Reflection on modern methods: Understanding bias and data
analytical strategies through DAG-based data simulations. Int J Epidemiol 50(6):2091-2097.
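The truncation described for the risk difference model, where a calculated success probability below 0 or above 1 is set to the nearest bound and a warning is issued, might look like the following base R sketch. The function clamp_probability and its warning text are hypothetical and not taken from the package.

```r
# Hypothetical sketch of truncating probabilities to [0, 1] with a warning.
clamp_probability <- function(p) {
  if (any(p < 0 | p > 1)) {
    warning("some success probabilities were outside [0, 1] and have been truncated")
  }
  pmin(pmax(p, 0), 1)
}

# Linear-scale sums can leave the unit interval; the warning is silenced here
p_raw <- c(-0.1, 0.4, 1.2)
p_ok <- suppressWarnings(clamp_probability(p_raw))
```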
Translates a DAG as defined in a dagRdag object created by dagR into the dagitty package format. Node labeling follows the rules used for plotting dagRdag objects, but adjusted nodes are marked by a preceding underscore instead of under- and over-line.
dagR2dagitty(x, alt.symb = TRUE, only.code = TRUE)
x |
The dagR DAG to be translated. |
alt.symb |
Boolean indicating if the alternative node symbols should be used. |
only.code |
If TRUE, a string with R dagitty function call is returned, which should be checked by the user (and possibly edited as required) before running it to create an equivalent dagitty DAG. If FALSE and the dagitty package has been installed and loaded, the dagitty function is called directly and the resulting dagitty DAG is returned. |
Either a string containing dagitty syntax to translate the dagR DAG into dagitty format, or a dagitty object.
Lutz P Breitling <[email protected]>
Breitling LP (2010). dagR: a suite of R functions for directed acyclic graphs. Epidemiology 21(4):586-587.
Breitling LP, Duan C, Dragomir AD, Luta G (2022). Using dagR to identify minimal sufficient adjustment sets and
to simulate data based on directed acyclic graphs. Int J Epidemiol 50(6):1772-1777.
https://cran.r-project.org/package=dagitty
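As a rough illustration of what a translation into dagitty-style syntax involves, the following base R sketch turns a matrix of directed arcs and a vector of node labels into a graph description string. The function arcs_to_dagitty_code, the labels and the arcs are hypothetical; the string produced here is only a graph description and is not claimed to match the output of dagR2dagitty, which additionally handles adjusted nodes and undirected associations.

```r
# Hypothetical sketch: directed arcs only, one label per node.
arcs_to_dagitty_code <- function(arc, labels) {
  edges <- paste(labels[arc[, 1]], "->", labels[arc[, 2]])
  paste0("dag { ", paste(edges, collapse = " ; "), " }")
}

labels <- c("X", "C1", "Y")                       # exposure, covariable, outcome
arc <- rbind(c(2, 1), c(2, 3), c(1, 3))           # C1 -> X, C1 -> Y, X -> Y
code <- arcs_to_dagitty_code(arc, labels)
```

As the documentation of only.code advises, any generated string should be checked by the user before it is handed to the dagitty package.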
Initializes a simple DAG used during the dagR development phase.
demo.dag0()
Returns a DAG.
Lutz P Breitling <[email protected]>
demo.dag1, demo.dag2, demo.dag3, demo.dag4, demo.dag5, demo.dag6
Initializes a classical "M DAG" useful for demonstrating harmful adjustment. The DAG is motivated by figure 3 in Fleischer (2008) and also featured in Breitling (2010).
demo.dag1()
Returns a DAG.
Lutz P Breitling <[email protected]>
Breitling LP (2010). dagR: a suite of R functions for directed acyclic graphs. Epidemiology 21(4):586-587.
Fleischer NL, Diez Roux AV (2008). Using directed acyclic graphs to guide analyses of neighbourhood health effects: an introduction. J Epidemiol Community Health 62:842-846.
demo.dag0, demo.dag2, demo.dag3, demo.dag4, demo.dag5, demo.dag6
Initializes a more complex DAG, motivated by Shrier (2008). This DAG was used to examine the performance of brute.search and has been featured in Breitling (2010).
demo.dag2()
Returns a DAG.
Lutz P Breitling <[email protected]>
Breitling LP (2010). dagR: a suite of R functions for directed acyclic graphs. Epidemiology 21(4):586-587.
Shrier I, Platt RW (2008). Reducing bias through directed acyclic graphs. BMC Med Res Methodol 8:70
demo.dag0, demo.dag1, demo.dag3, demo.dag4, demo.dag5, demo.dag6
Initializes a DAG motivated by the manual for the software DAG v0.11 (Knüppel 2009). This DAG has been featured in Breitling (2010).
demo.dag3()
Returns a DAG.
Lutz P Breitling <[email protected]>
Breitling LP (2010). dagR: a suite of R functions for directed acyclic graphs. Epidemiology 21(4):586-587.
Knüppel S (2009). DAG v0.11 documentation (Oct 21, 2009). https://hsz.dife.de/dag/
demo.dag0, demo.dag1, demo.dag2, demo.dag4, demo.dag5, demo.dag6
Initializes a miscellaneous DAG. What happens if you adjust for the exposure's child?
demo.dag4()
Returns a DAG.
Lutz P Breitling <[email protected]>
demo.dag0, demo.dag1, demo.dag2, demo.dag3, demo.dag5, demo.dag6
Initializes a miscellaneous DAG. What happens if you adjust for the outcome's child?
demo.dag5()
Returns a DAG.
Lutz P Breitling <[email protected]>
demo.dag0, demo.dag1, demo.dag2, demo.dag3, demo.dag4, demo.dag6
Initializes a miscellaneous DAG. What happens if you adjust for the collider?
demo.dag6()
Returns a DAG.
Lutz P Breitling <[email protected]>
demo.dag0, demo.dag1, demo.dag2, demo.dag3, demo.dag4, demo.dag5
Initializes a DAG motivated by the manual for the software DAG v0.11 (Knüppel 2009). This DAG has been featured in Breitling (2010). The DAG is the same as DAG #3, but #7 demonstrates the use of alternative node symbols.
demo.dag7()
Returns a DAG.
Lutz P Breitling <[email protected]>
Breitling LP (2010). dagR: a suite of R functions for directed acyclic graphs. Epidemiology 21(4):586-587.
Knüppel S (2009). DAG v0.11 documentation (Oct 21, 2009). https://hsz.dife.de/dag/
demo.dag3
Another rather superfluous helper function, internally used by smoothArc. Calculates the distance between two points.
distPoints(A, B)
A |
Vector of length two, indicating x and y of first point. |
B |
Vector of length two, indicating x and y of second point. |
Distance between the two points.
Lutz P Breitling <[email protected]>
This essentially implements the graphical algorithm described in Greenland (1999) to identify open "backdoor" (or not strictly backdoor, but potentially biasing) paths in a DAG. Paths are identified as being 'open', 'blocked by collider', or 'blocked by adjustment'. If both latter conditions apply, 'blocked by collider' is returned.
eval.paths(dag)
dag |
A DAG to which |
This function identifies a collider-blocked path as 'blocked by collider' even if it has been unblocked by adjusting for the collider. One could argue that this should not be the case. However, the biasing seems to be sufficiently represented in the DAG by the introduction of the association "jumping" the collider and potentially opening biasing paths.
A DAG with component path.status added.
Lutz P Breitling <[email protected]>
Breitling LP (2010). dagR: a suite of R functions for directed acyclic graphs. Epidemiology 21(4):586-587.
Greenland S, Pearl J, Robins JM (1999). Causal diagrams for epidemiologic research. Epidemiology 10(1):37-48.
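The three path statuses described above can be sketched in base R as follows. This is an illustration under stated assumptions only: the path encoding (a vector of node names plus one direction symbol per arc) and the name path_status are hypothetical and do not reflect how the package stores paths internally.

```r
# Hypothetical path encoding (not the package's internal one):
#   nodes: character vector of the nodes along the path, exposure first
#   dirs:  one entry per arc; ">" points to the next node, "<" to the previous one
#   adjusted: names of the nodes that have been adjusted for
path_status <- function(nodes, dirs, adjusted = character(0)) {
  stopifnot(length(dirs) == length(nodes) - 1)
  n <- length(nodes)
  if (n < 3) return("open")                  # no inner node that could block
  inner <- 2:(n - 1)
  # an inner node is a collider if both neighbouring arcs point into it
  is_collider <- dirs[inner - 1] == ">" & dirs[inner] == "<"
  # a collider takes precedence, even if it has been adjusted for
  if (any(is_collider)) return("blocked by collider")
  if (any(nodes[inner] %in% adjusted)) return("blocked by adjustment")
  "open"
}

path_status(c("X", "C", "Y"), c("<", ">"))                  # confounder path
path_status(c("X", "C", "Y"), c("<", ">"), adjusted = "C")  # confounder adjusted
path_status(c("X", "K", "Y"), c(">", "<"), adjusted = "K")  # adjusted collider
```

As in the description above, the last call still reports 'blocked by collider', because the sketch does not model the association introduced by adjusting for a collider.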
This identifies paths linking exposure and outcome in a DAG. Forward paths (including a directed arc emanating from the exposure) are not identified.
find.paths(dag)
dag | A DAG for which paths should be found. |
A DAG with components pathsN (number of paths identified) and paths (matrix with each row describing one path by indicating the arcs forming the path; each row ends with NA, as some other functions recognize the end of the path that way) added.
Lutz P Breitling <[email protected]>
Breitling LP (2010). dagR: a suite of R functions for directed acyclic graphs. Epidemiology 21(4):586-587.
Greenland S, Pearl J, Robins JM (1999). Causal diagrams for epidemiologic research. Epidemiology 10(1):37-48.
Internally called by dag.draw for drawing directed arcs.
garrows(x0, y0, x1, y1, xgap, ygap, len = 0.1)
x0 | X coordinate of origin. |
y0 | Y coordinate of origin. |
x1 | X coordinate of target node. |
y1 | Y coordinate of target node. |
xgap | Space between node and arc ends on x axis. |
ygap | Space between node and arc ends on y axis. |
len | Length of arrow whiskers (default=0.1). |
Lutz P Breitling <[email protected]>
Another rather superfluous helper function, calculating the radian angle between two radian angles. Internally called by smoothArc.
inAngle(a, b)
a | Radian angle 1. |
b | Radian angle 2. |
Numeric in range from -pi to pi.
Lutz P Breitling <[email protected]>
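One way to obtain an angle between two radian angles that stays within -pi to pi is to wrap their difference, as in the following base R sketch. The sign convention (b minus a) and the name in_angle are assumptions made for illustration; the package's own helper may differ in both.

```r
# Hypothetical sketch: signed difference b - a, wrapped into (-pi, pi].
in_angle <- function(a, b) {
  d <- (b - a) %% (2 * pi)        # R's %% gives a value in [0, 2*pi) here
  if (d > pi) d - 2 * pi else d   # fold the upper half onto negative angles
}

in_angle(0, pi / 2)      # a quarter turn in one direction
in_angle(0, 3 * pi / 2)  # expressed as a quarter turn in the other direction
```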
This function checks, for each node in a DAG, whether backtracing the arcs leading to it results in an "infinite recursion" error, which indicates that the graph contains a cyclic part (and is therefore not actually a DAG).
is.acyclic(dag, maxSecs=NA)
dag | The DAG to be checked. |
maxSecs | Maximum time before function aborts; |
A list with two elements.
acyclic is a boolean indicating whether the DAG is acyclic (=TRUE) or contains a cyclic component (=FALSE).
nodewise is a vector containing 1 boolean per node in the DAG, TRUE indicating that backtracing from this node does not lead to a cyclic component, FALSE indicating that backtracing from this node leads to a cyclic component.
Lutz P Breitling <[email protected]>
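The idea of backtracing from each node can be sketched in base R without relying on a recursion error. Note that this swaps in a different technique (an iterative ancestor search) and assumes a hypothetical layout in which arcs are rows c(from, to) of a two-column matrix and nodes are numbered 1..n; ancestors_of and check_acyclic are illustrative names, not package functions.

```r
# Collect all nodes reachable by following arcs backwards from 'node'.
ancestors_of <- function(node, arcs) {
  seen <- integer(0)
  todo <- arcs[arcs[, 2] == node, 1]          # direct parents
  while (length(todo) > 0) {
    cur <- todo[1]
    todo <- todo[-1]
    if (cur %in% seen) next                   # already visited
    seen <- c(seen, cur)
    todo <- c(todo, arcs[arcs[, 2] == cur, 1])
  }
  seen
}

# Returns a list shaped like the value described above (acyclic, nodewise).
check_acyclic <- function(arcs, n_nodes) {
  anc <- lapply(seq_len(n_nodes), ancestors_of, arcs = arcs)
  # a node lies on a cycle if it is its own ancestor
  on_cycle <- vapply(seq_len(n_nodes), function(i) i %in% anc[[i]], logical(1))
  # backtracing from a node hits a cycle if the node or any ancestor lies on one
  nodewise <- vapply(seq_len(n_nodes),
                     function(i) !any(on_cycle[c(i, anc[[i]])]),
                     logical(1))
  list(acyclic = all(nodewise), nodewise = nodewise)
}

check_acyclic(rbind(c(1, 2), c(2, 3)), 3)$acyclic             # chain 1 -> 2 -> 3
check_acyclic(rbind(c(1, 2), c(2, 1), c(2, 3)), 3)$nodewise   # 1 <-> 2, then 2 -> 3
```

In the second call, node 3 is not part of the cycle itself, but backtracing from it leads into the cycle, so its entry is FALSE as well.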
Another trivial helper function, called internally by eval.paths.
It checks whether the specified (numeric) value is part of a specified vector of (numeric) values.
is.in(x, c = NULL)
x | A numeric value, for which the presence in a vector is to be checked. |
c | A vector of numeric values. |
Boolean; TRUE if value is present, FALSE if not.
Lutz P Breitling <[email protected]>
Another helper function, internally used by brute.search. It checks whether the node specified is of type=2 or is named 'unknown'.
is.unknown(x, dag)
x | The node of interest. |
dag | The DAG to be evaluated. |
TRUE if unknown (according to type or name), FALSE otherwise.
Lutz P Breitling <[email protected]>
Evaluates DAG adjustment sets identified by dag.search (or brute.search) for minimal sufficiency by counting, for each sufficient adjustment set A, how many smaller sufficient sets are contained in A.
msas(adjSets)
adjSets | The |
A vector containing -1 for each insufficient adjustment set and, for each sufficient one, the number of smaller sufficient sets contained in it.
Lutz P Breitling <[email protected]>
Breitling LP (2010). dagR: a suite of R functions for directed acyclic graphs. Epidemiology 21(4):586-587.
Breitling LP, Duan C, Dragomir AD, Luta G (2022). Using dagR to identify minimal sufficient adjustment sets and to simulate data based on directed acyclic graphs. Int J Epidemiol 50(6):1772-1777.
Greenland S, Pearl J, Robins JM (1999). Causal diagrams for epidemiologic research. Epidemiology 10(1):37-48.
Knüppel S, Stang A (2010). DAG Program: identifying minimal sufficient adjustment sets. Epidemiology 21(1):159.
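The counting rule described above can be sketched in base R. The input layout used here (a list of adjustment sets plus a logical vector marking which are sufficient) is an assumption made for illustration and is not the structure returned by dag.search; count_smaller_sufficient is a hypothetical name.

```r
# sets:       list of adjustment sets, each a vector of node numbers (hypothetical layout)
# sufficient: logical vector, TRUE where the corresponding set is sufficient
count_smaller_sufficient <- function(sets, sufficient) {
  vapply(seq_along(sets), function(i) {
    if (!sufficient[i]) return(-1)            # insufficient sets are flagged with -1
    smaller <- vapply(seq_along(sets), function(j) {
      sufficient[j] &&
        length(sets[[j]]) < length(sets[[i]]) &&
        all(sets[[j]] %in% sets[[i]])         # set j is contained in set i
    }, logical(1))
    sum(smaller)
  }, numeric(1))
}

sets <- list(integer(0), 1L, 2L, c(1L, 2L))
sufficient <- c(FALSE, TRUE, FALSE, TRUE)
res <- count_smaller_sufficient(sets, sufficient)
res                 # -1 0 -1 1
sets[res == 0]      # sets with a count of 0 are minimal sufficient
```

Here the set {1, 2} is sufficient but not minimal, because the smaller sufficient set {1} is contained in it.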
Generic function to draw a directed acyclic graph in an object of class dagRdag.
This essentially passes the DAG object to the function dag.draw, which is maintained for backwards compatibility.
## S3 method for class 'dagRdag'
plot(x, y, ...)
x | Object of class dagRdag to be passed to |
y | Currently not used. |
... | Other arguments to be passed to |
For all available arguments, see documentation of dag.draw.
The DAG object is returned.
Lutz P Breitling <[email protected]>
Breitling LP (2010). dagR: a suite of R functions for directed acyclic graphs. Epidemiology 21(4):586-587.
Breitling LP, Duan C, Dragomir AD, Luta G (2022). Using dagR to identify minimal sufficient adjustment sets and to simulate data based on directed acyclic graphs. Int J Epidemiol 50(6):1772-1777.
Generic function print code for class dagRdag. This uses the default print method for list objects and points the user to the availability of the more convenient summary method.
## S3 method for class 'dagRdag'
print(x, ...)
x | An object of class |
... | Other arguments passed to the print routine. |
Lutz P Breitling <[email protected]>
Conveniently remove an arc from an existing DAG.
rm.arc(dag, arc)
dag | The DAG from which to remove the arc. |
arc | A single integer, indicating which arc is to be removed (referring to the respective row of the |
A DAG with the arc specified removed along with the corresponding attributes like arc types, curves, and path evaluation variables.
The numbering of the arcs can be visualized by applying dag.draw with the option "numbering=TRUE".
Lutz P Breitling <[email protected]>
Conveniently remove a node from an existing DAG.
rm.node(dag, node)
dag | The DAG from which to remove the node. |
node | A single integer, indicating which node is to be removed. |
A DAG with the node specified removed, along with the corresponding attributes and dependent variables, i.e. arcs involving this node are also removed, and the numbering of the nodes (and their occurrence in arcs) is corrected accordingly.
Note: Search components (searchType, searchRes) of the DAG currently are generally set to NULL, even if no path is removed. This is for simplicity, because the node numbers would otherwise need to be changed, e.g. in the searchRes variables.
Lutz P Breitling <[email protected]>
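The renumbering step described above (dropping arcs that involve the removed node and shifting the higher node numbers down) can be sketched in base R. The arc layout, a two-column matrix with rows c(from, to), and the name drop_node are assumptions made for illustration; the sketch ignores all other DAG components.

```r
# arcs: two-column matrix of node numbers, one row per arc (hypothetical layout)
drop_node <- function(arcs, node) {
  keep <- arcs[, 1] != node & arcs[, 2] != node   # arcs not touching the node
  arcs <- arcs[keep, , drop = FALSE]
  shift <- arcs > node                            # nodes numbered after the removed one
  arcs[shift] <- arcs[shift] - 1
  arcs
}

arcs <- rbind(c(1, 2), c(2, 3), c(1, 3), c(3, 4))
drop_node(arcs, 2)   # leaves the arcs 1 -> 2 and 2 -> 3 (formerly 1 -> 3 and 3 -> 4)
```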
This draws a dashed connection between two points, curving it so that it goes through a third point.
This is internally used by dag.draw to draw associations.
smoothArc(A, B, C, res = 20, gap = 0.05, p = FALSE)
A | Vector of length 2, providing xy coordinates of first point. |
B | Vector of length 2, providing xy coordinates of second point. |
C | Vector of length 2, indicating xy coordinates through which the association should be curved. |
res | How smooth should the curve be drawn? |
gap | How far from point A and B should the line end? |
p | If TRUE, the point through which the curve goes is drawn (this is to allow better moving it with |
In the version 1.0.1 distributed as online supplemental material with Breitling (2010), the function contains arbitrary default values used during development.
Lutz P Breitling <[email protected]>
Breitling LP (2010). dagR: a suite of R functions for directed acyclic graphs. Epidemiology 21(4):586-587.
Working code behind the generic function summary() for class dagRdag, used by package dagR from version 1.1.1 on. From version 1.2.0, summary.dagRdag() is available as a generic function, but summary_dagRdag is preserved for backwards compatibility.
summary_dagRdag(dag)
dag | An object of class |
Summarizes according to what functions have been applied to the DAG. It does not itself call dag.search and the like. Exception: it calls is.acyclic (with maxSecs=5).
Lutz P Breitling <[email protected]>
Breitling LP (2010). dagR: a suite of R functions for directed acyclic graphs. Epidemiology 21(4):586-587.
Greenland S, Pearl J, Robins JM (1999). Causal diagrams for epidemiologic research. Epidemiology 10(1):37-48.
Knüppel S, Stang A (2010). DAG Program: identifying minimal sufficient adjustment sets. Epidemiology 21(1):159.
Generic function summary() for class dagRdag.
## S3 method for class 'dagRdag'
summary(object, ...)
object | An object of class |
... | Currently not used. |
Summarizes according to what functions have been applied to the DAG. It does not itself call dag.search and the like. Exception: it calls is.acyclic (with maxSecs=5).
This function passes the object to summary_dagRdag, which is preserved for backwards compatibility.
Lutz P Breitling <[email protected]>
Breitling LP (2010). dagR: a suite of R functions for directed acyclic graphs. Epidemiology 21(4):586-587.
Greenland S, Pearl J, Robins JM (1999). Causal diagrams for epidemiologic research. Epidemiology 10(1):37-48.
Knüppel S, Stang A (2010). DAG Program: identifying minimal sufficient adjustment sets. Epidemiology 21(1):159.
Checks if all numeric elements of a vector occur also in another vector. It is internally used by msas to check if some adjustment set is contained in another one.
viv(v1, v2)
v1 | The vector whose occurrence in v2 is to be checked. |
v2 | The vector in which v1 might occur. |
If a value occurs more than once in v1, it is counted as contained in v2 if it appears there once.
An empty v1 (consisting only of NA) is considered to be contained in any v2.
TRUE if v1 occurs in v2, FALSE otherwise.
Lutz P Breitling <[email protected]>
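The containment rules described above (duplicates counted once, an all-NA v1 treated as empty and contained in anything) can be illustrated with a minimal base R sketch. vec_in_vec is a hypothetical name used for illustration and is not the package's own implementation.

```r
# Hypothetical sketch of a "vector in vector" check.
vec_in_vec <- function(v1, v2) {
  v1 <- v1[!is.na(v1)]   # an all-NA v1 becomes empty
  all(v1 %in% v2)        # all() of an empty test is TRUE; duplicates need one match
}

vec_in_vec(c(1, 1, 2), c(2, 1, 5))  # TRUE
vec_in_vec(NA, 3)                   # TRUE
vec_in_vec(c(1, 4), c(1, 2))        # FALSE
```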
Writes the paths into a DAG drawing, using the symbols ('C', 'U', 'X', 'Y') used in the drawing, indicating directed arcs by '<' and '>', undirected ones by '-'. Since version 1.1.2, alt.symb allows the use of custom node symbols, though multi-character symbols will not be formatted well.
Adjusted variables are under- and over-lined.
If the paths have been evaluated using eval.paths, their statuses are also written.
write.paths(dag, px = 0.5, py = -0.06, alt.symb = TRUE)
dag | The DAG that has been drawn. |
px | An x coordinate to change the position of the path writing. |
py | A y coordinate to change the position of the path writing. |
alt.symb | Boolean indicating if alternative node symbols (dag$symbols) should be used. |
Lutz P Breitling <[email protected]>
dag.draw, find.paths, eval.paths, dag.legend