Title: | Detecting Influence Paths with Information Theory |
---|---|
Description: | Traces information spread through interactions between features, utilising information theory measures and a higher-order generalisation of the concept of widest paths in graphs. In particular, 'vistla' can be used to better understand the results of high-throughput biomedical experiments, by organising the effects of the investigated intervention in a tree-like hierarchy from direct to indirect ones, following the plausible information relay circuits. Due to its higher-order nature, 'vistla' can handle multi-modality and assign multiple roles to a single feature. |
Authors: | Miron B. Kursa [aut, cre] |
Maintainer: | Miron B. Kursa <[email protected]> |
License: | GPL (>= 3) |
Version: | 2.0.3 |
Built: | 2024-12-27 06:14:46 UTC |
Source: | CRAN |
Gives access to a list of all branches in the tree.
branches(x, suboptimal = FALSE)

## S3 method for class 'vistla'
as.data.frame(x, row.names = NULL, optional = FALSE, suboptimal = FALSE, ...)
x |
vistla object. |
suboptimal |
if TRUE, sub-optimal branches are included. |
row.names |
passed to |
optional |
passed to |
... |
ignored. |
A data frame collecting all branches traced by vistla.
Each row corresponds to a single branch, i.e., an edge between feature pairs.
It is thus a triplet of original features, whose names are stored in the a, b and c columns.
For instance, a path through five features would be stored in three rows, one per consecutive triplet of features.
The width of a path (minimal iota value) between the root and the feature pair is stored in the score column.
depth stores the path depth, starting from 1 for pairs directly connected to the root, and increasing by one for each additional feature.
The final column, leaf, is a logical value indicating whether the edge is the final segment of a widest path from the root.
Pruned trees (obtained with prune or by using the targets argument in the vistla call) have no suboptimal branches.
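The relation between local scores and the score column can be sketched in base R. This is only an illustration of the widest-path idea (the width of a path is its smallest local score); the iota values below are made up and nothing here calls the package itself.

```r
# Hypothetical local (iota) scores of consecutive edges along one path,
# ordered from the root outwards; the numbers are invented for illustration.
iota <- c(0.42, 0.31, 0.37, 0.12)

# The width of the path up to a given edge is the smallest iota seen so far.
width <- cummin(iota)

# Depth starts at 1 for the edge attached directly to the root.
path_table <- data.frame(depth = seq_along(iota), iota = iota, score = width)
print(path_table)
```

Note how the third edge, although locally stronger than the second, cannot raise the score of the path that runs through it.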
Chain is generated from a uniform variable X by progressively adding Gaussian noise, producing a mediator chain identical to that of the chain data.
The set consists of 20 observations, and is tuned to be easily deciphered.
data(cchain)
A data set with six numerical columns.
Chain is generated from a simple Bayes network,
where every variable is binary. The set consists of 11 observations, and is tuned to be easily deciphered.
data(chain)
A data set with six binary factor columns.
Vistla can be run in the ensemble mode, in which the tree is built multiple times, usually on slightly modified input data. This mode can be triggered by passing a value to the ensemble argument of the vistla method. This function can be used to construct the proper value for this argument.
ensemble(n = 30, resample = TRUE, prune = 0)

## S3 method for class 'vistla_ensemble_control'
print(x, ...)
n |
number of replications. |
resample |
if |
prune |
Minimal number of iterations in which a branch must appear in order not to be pruned during ensemble consolidation.
Zero (default) means no pruning.
Note that |
x |
ensemble control value to print. |
... |
ignored. |
A vistla_ensemble_control object which can be passed to the vistla function.
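A minimal sketch of the ensemble mode, using only the signatures listed in this manual and the bundled chain data. It assumes the vistla package is installed; the guard makes the snippet a no-op otherwise.

```r
# Runs only when the vistla package is available.
if (requireNamespace("vistla", quietly = TRUE)) {
  library(vistla)
  data(chain)

  # Control object with the documented defaults, spelled out explicitly.
  ctl <- ensemble(n = 30, resample = TRUE, prune = 0)
  print(ctl)

  # Passing the control object switches vistla to the ensemble mode.
  ve <- vistla(Y ~ ., data = chain, ensemble = ctl)
  print(ve)
}
```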
Vistla builds the tree by optimising the influence score over a path, which is given by the iota function.
The flow argument of the vistla function can be used to modify the default iota and some associated behaviours.
This function can be used to construct the proper value for this argument.
flow(code, ..., from = TRUE, into = FALSE, down, up, forcepath)

## S3 method for class 'vistla_flow'
print(x, ...)
code |
Character code of the flow parameter, like |
... |
ignored. |
from |
if |
into |
if |
down |
if |
up |
if |
forcepath |
when neither |
x |
flow value to print. |
A vistla_flow object which can be passed to the vistla function; in practice, a single integer value.
Traverses the vistla tree in a depth-first order and lists the visited vertices as a data frame.
hierarchy(x)
x |
vistla object. |
A data frame of class vistla_hierarchy.
This function effectively prunes suboptimal paths off the tree.
Junction is a model of a multimodal agent, a variable that is an element of multiple separate paths.
Here, these paths are
and
while
is the junction.
The set consists of 50 observations.
data(junction)
A data set with eight factor columns.
Produces a matrix in which each element is the score of the path ending in the corresponding pair of vertices.
Since vistla works on vertex pairs, this value is unique.
The matrix can be interpreted as a feature similarity matrix in the context of the current vistla root.
leaf_scores(x)
x |
vistla object. |
A square matrix with leaf scores of all feature pairs.
This function should be called on an unpruned vistla tree, otherwise the result will be mostly composed of zeroes.
Produces a matrix in which each element is the mutual information value for the corresponding pair of variables.
This matrix is always calculated as an initial step of the vistla algorithm and stored in the vistla object.
mi_scores(x)
x |
vistla object. |
A symmetric square matrix with mutual information scores between features and root.
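For orientation, the plug-in (maximum likelihood) estimate of mutual information between two discrete variables can be written in a few lines of base R. This is a generic textbook sketch, not the package's implementation; the helper name mi_mle and the use of natural logarithms are assumptions made for illustration.

```r
# Plug-in mutual information estimate (in nats) between two discrete vectors.
# mi_mle is a hypothetical helper, not a function exported by vistla.
mi_mle <- function(a, b) {
  p_ab <- table(a, b) / length(a)            # joint relative frequencies
  p_a <- rowSums(p_ab)                       # marginal of a
  p_b <- colSums(p_ab)                       # marginal of b
  expected <- outer(p_a, p_b)                # joint under independence
  nz <- p_ab > 0                             # empty cells contribute nothing
  sum(p_ab[nz] * log(p_ab[nz] / expected[nz]))
}

a <- factor(c(0, 0, 1, 1))
b <- factor(c(0, 1, 0, 1))
mi_same <- mi_mle(a, a)    # a variable shares all its information with itself
mi_indep <- mi_mle(a, b)   # a and b are empirically independent here
print(c(same = mi_same, independent = mi_indep))
```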
One can use this function for a quick, ad hoc discretisation of numerical features in a data frame, so that it can be passed to vistla using the maximum likelihood estimator (mle, the default).
This can be used to simulate the legacy behaviour of vistla, which was to automatically perform such a conversion with 10 equal-width bins.
The non-numeric columns are left as they were, hence this function is idempotent and does nothing when given fully discrete data.
mle_coerce(x, bins = 3, equal = c("size", "width"))
x |
Data frame to be converted. |
bins |
Number of bins to cut each numerical column into. |
equal |
If given |
A copy of x, in which numerical columns have been discretised.
While convenient, this function does not necessarily provide an optimal quantisation of the data (in terms of future vistla performance); in particular, the bins parameter should be adjusted to the input data, either via optimisation or based on the known properties of the input or the mechanisms behind it.
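The difference between the two binning strategies named by the equal argument can be sketched in base R. This is an illustration of the general idea only and is not claimed to reproduce what mle_coerce does internally; the sample data are synthetic.

```r
set.seed(1)
x <- rexp(30)   # a skewed, synthetic numerical feature
bins <- 3

# Equal-width bins: the observed range is split into intervals of equal length.
width_bins <- cut(x, breaks = bins)

# Equal-size bins: cut points are empirical quantiles, so each bin holds
# a similar number of observations.
cut_points <- quantile(x, probs = seq(0, 1, length.out = bins + 1))
size_bins <- cut(x, breaks = cut_points, include.lowest = TRUE)

print(table(width_bins))
print(table(size_bins))
```

On skewed data, equal-width bins tend to leave the upper bins nearly empty, which is one reason the number and type of bins deserve tuning.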
## Not run:
data(cchain)
vistla(Y~.,data=mle_coerce(cchain,3,"size"))

## End(Not run)
Gives access to a vector of feature names over a path to a certain target feature.
path_to(x, target, detailed = FALSE)
x |
vistla or vistla_hierarchy object. |
target |
target feature name. |
detailed |
if |
By default, a character vector with names of features along the path from target into root.
When detailed is set to TRUE and the input is a vistla object, a data.frame in a format identical to that produced by branches, yet without the leaf column.
Executes path_to for all possible path targets and returns a list with the results.
paths(x, targets_only = !is.null(x$targets), detailed = FALSE)
x |
vistla or vistla_hierarchy object. |
targets_only |
if |
detailed |
passed to |
A named list with one element per leaf or target, containing the path between this feature and root, in a format identical to that used by the path_to function.
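A short sketch of extracting paths, assuming the vistla package is installed and using the bundled chain data together with the M3 feature that appears in the prune example of this manual.

```r
# Runs only when the vistla package is available.
if (requireNamespace("vistla", quietly = TRUE)) {
  library(vistla)
  data(chain)
  v <- vistla(Y ~ ., data = chain)

  # Feature names along the path from the target into the root.
  to_m3 <- path_to(v, "M3")
  print(to_m3)

  # The same path as a data frame in the branches-like format.
  print(path_to(v, "M3", detailed = TRUE))

  # All paths at once, as a named list.
  all_paths <- paths(v)
  print(names(all_paths))
}
```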
Plots a vistla tree, using a layout derived by the Buchheim et al. extension of the standard Reingold-Tilford method. The tree root is placed on the left, while the paths extend to the right, with all branches of the same depth at the same horizontal coordinate. The paths are sorted vertically, from the strongest on top to the weakest on the bottom. Link weight indicates, by default, the link's score. A feature name in parentheses indicates that it is only a way-point in a path to some other feature.
## S3 method for class 'vistla'
plot(
  x,
  ...,
  slant,
  circular,
  asp1 = FALSE,
  pmar = c(0.05, 0.05, 0.05, 0.05),
  edge_col = 1,
  edge_lwd = "scale",
  edge_lty = 1,
  label_text = function(x) x$name,
  label_border_col = 1,
  label_border_lty = function(x) ifelse(x$leaf, 1, 2),
  label_fill = "white"
)

## S3 method for class 'vistla_plot'
plot(x, ...)

## S3 method for class 'vistla_plot'
print(x, ...)
x |
vistla, vistla hierarchy or vistla plot object. |
... |
ignored. |
slant |
arrange vertices in a slanted way.
Can be given as a number, possibly negative, indicating the amount of slant, or as |
circular |
if given |
asp1 |
if |
pmar |
Specifies margins as a fraction of graph size; expects a 4-element vector, in standard R bottom-left-top-right order. |
edge_col |
edge colour; can be given as vector, then mapping order adheres to the one in hierarchy object; please note that the edge towards first feature, the root, is not drawn, so the first element is effectively ignored. If given as a function, it is called on the internally generated extended hierarchy object, and the result is used as an aesthetic. |
edge_lwd |
edge width; behaves similarly to |
edge_lty |
edge line-type; behaves similarly to |
label_text |
vertex label text, feature name by default.
Behaves similarly to |
label_border_col |
vertex label border colour; behaves similarly to |
label_border_lty |
vertex label border line-type; behaves similarly to |
label_fill |
vertex label fill colour; behaves similarly to |
Grid object with the graph.
The graph is rendered using the grid graphics system, in a manner similar to ggplot2; the output of the plot.vistla function is only a grid graphical object, while the actual plotting is done when this object is printed or plotted.
This object can also be used with other functions in the grid ecosystem, for instance to render it into files, edit it or combine it with other plots.
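Because drawing is deferred until the object is printed, rendering to a file amounts to printing inside an open graphics device. The sketch below assumes the vistla package is installed; the output file is a temporary PDF chosen only for illustration.

```r
# Runs only when the vistla package is available.
if (requireNamespace("vistla", quietly = TRUE)) {
  library(vistla)
  data(chain)
  v <- vistla(Y ~ ., data = chain)

  # Build the plot object; assigning it does not draw anything yet.
  p <- plot(v)

  # Draw it into a PDF file by printing while the device is open.
  out_file <- tempfile(fileext = ".pdf")
  pdf(out_file, width = 7, height = 4)
  print(p)
  dev.off()
}
```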
"Drawing rooted trees in linear time" C. Buchheim, M. Jünger, S. Leipert. Software: Practice and Experience 36(6):651-665 (2006).
Utility functions to print vistla objects.
## S3 method for class 'vistla_hierarchy'
print(x, ...)

## S3 method for class 'vistla'
print(x, n = 7L, ...)
x |
vistla object. |
... |
ignored. |
n |
maximal number of paths to preview. |
Invisible copy of x.
This function filters out suboptimal branches, as well as weak branches or those not on particular paths of interest.
prune(x, targets, iomin, score)
x |
vistla object or a vistla_hierarchy object. |
targets |
a character vector of features. When not missing, all branches not lying on paths to these targets are pruned. Unreachable targets are ignored, while names not present in the analysed set cause an error. |
iomin |
a legacy name for score, valid only for vistla objects; passing a value to either of them works the same, but giving some values for both is an error. |
score |
a score threshold below which branches should be removed.
When given, it effectively overrides the value of |
Pruned x; if both arguments are missing, this function still removes suboptimal branches.
## Not run:
data(chain)
v<-vistla(Y~.,data=chain)
print(v)
print(prune(v,targets="M3"))
print(prune(v,score=0.3))

## End(Not run)
Detects influence paths.
vistla(x, ...)

## S3 method for class 'formula'
vistla(formula, data, ..., yn)

## S3 method for class 'data.frame'
vistla(
  x,
  y,
  ...,
  flow,
  iomin,
  targets,
  estimator = c("mle", "kt"),
  verbose = FALSE,
  yn = "Y",
  ensemble,
  threads
)

## Default S3 method:
vistla(x, ...)
x |
data frame of predictors. |
... |
pass-through arguments, ignored. |
formula |
alternatively, formula describing the task, in a form |
data |
|
yn |
name of the root ( |
y |
vistla tree root, a feature from which influence paths will be traced. |
flow |
algorithm mode, specifying the iota function which gives local score to an edge of an edge graph.
If in doubt, use the default, |
iomin |
score threshold below which a path is not considered further.
The higher the value, the fewer paths are generated, which also lowers the time taken by the function.
The default value of 0 turns off this filtering.
The same effect can be later achieved with the |
targets |
a vector of target feature names.
If given, the algorithm will stop just after reaching the last of them, rather than after tracing all paths from the root.
The same effect can be later achieved with the |
estimator |
mutual information estimator to use.
|
verbose |
when set to |
ensemble |
used to switch vistla to the ensemble mode, in which a number of vistla models are built over permuted realisations of the input, and merged into a single consensus tree.
Should be given an output of the |
threads |
number of threads to use. When missing or set to 0, vistla uses all available cores. |
Normally, the tracing results, represented as an object of class vistla.
Use the paths and path_to functions to extract individual paths, branches to get the whole tree and mi_scores to get the basic score matrix.
When the ensemble argument is given, a hierarchy object in which the scores are counts of how many times a given path was present in the replicated ensemble, possibly pruned.
The ensemble mode is both faster and makes better use of multithreading than replicating vistla manually.
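The accessors named above can be exercised on the bundled chain data as follows. The sketch assumes the vistla package is installed and relies only on signatures listed in this manual.

```r
# Runs only when the vistla package is available.
if (requireNamespace("vistla", quietly = TRUE)) {
  library(vistla)
  data(chain)

  # Trace influence paths from the root Y through all remaining features.
  v <- vistla(Y ~ ., data = chain)
  print(v)

  # Whole tree as a data frame, one row per branch.
  br <- branches(v)
  print(head(br))

  # Depth-first listing of the tree, without suboptimal paths.
  print(hierarchy(v))

  # Basic score matrix computed in the initial step of the algorithm.
  mi <- mi_scores(v)
  print(dim(mi))
}
```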
"Kendall transformation brings a robust categorical representation of ordinal data" M.B. Kursa. SciRep 12, 8341 (2022).
Exports the vistla tree in the DOT format, which can later be laid out and rendered by Graphviz programs like dot or neato.
write.dot(
  x,
  con,
  vstyle = list(shape = function(x) ifelse(x$depth < 0, "egg", ifelse(x$leaf, "box", "ellipse")),
    label = function(x) sprintf("\"%s\"", x$name)),
  estyle = list(penwidth = function(x) sprintf("%0.3f", 0.5 + x$score/max(x$score) * 2.5)),
  gstyle = list(overlap = "\"prism\"", splines = "true"),
  direction = c("none", "fromY", "intoY")
)
x |
vistla object. |
con |
connection; passed to |
vstyle |
vertex attribute list — should be a named list of Graphviz attributes like |
estyle |
edge attribute list, behaves exactly like |
gstyle |
graph attribute list. Functions are not supported here. |
direction |
when set to |
For a missing con argument, a character vector with the graph in the DOT format, invisible NULL otherwise.
Graphviz attribute values can be either strings, like "some vertex" in label, or atoms, like box for shape.
When returning a string value, you must supply quotes, otherwise it will be included as an atom.
The default value of gstyle may invoke long layout calculations in Graphviz.
Change to list() for a fast but less aesthetic layout.
The function does no validation whether provided attributes or values are correct.
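The quoting rule can be checked without the package by applying the default style functions from the usage above to a small stand-in table. The data frame below is hypothetical: its columns (name, depth, leaf, score) are inferred from what the default functions access, and its values are invented.

```r
# Hypothetical stand-in for the table that style functions receive.
h <- data.frame(
  name  = c("Y", "M3", "X"),
  depth = c(-1, 1, 2),          # negative depth marks the root here
  leaf  = c(FALSE, FALSE, TRUE),
  score = c(NA, 0.4, 0.2),
  stringsAsFactors = FALSE
)

# Default vertex styles copied from the usage: shape is an atom,
# label is a string and therefore carries its own quotes.
shape_fun <- function(x) ifelse(x$depth < 0, "egg", ifelse(x$leaf, "box", "ellipse"))
label_fun <- function(x) sprintf("\"%s\"", x$name)

# Default edge style, applied here to non-root rows only (an assumption).
penwidth_fun <- function(x) sprintf("%0.3f", 0.5 + x$score / max(x$score) * 2.5)

shapes <- shape_fun(h)
labels <- label_fun(h)
widths <- penwidth_fun(h[h$depth > 0, ])

print(shapes)
print(labels)
print(widths)
```

A custom attribute follows the same pattern: return bare text for atoms and wrap the value in escaped quotes for strings.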
"An open graph visualization system and its applications to software engineering" E.R. Gansner, S.C. North. Software: Practice and Experience 30:1203-1233 (2000).