| Title: | Transition Trajectories and Dynamics of Variable-Length Pathways or Sequences |
|---|---|
| Description: | Analyzes transition trajectories in event, sequence, and ordered data, focusing on how states follow one another, how far processes unfold, and where pathways branch or converge. Trajectories are modeled using variable-order prediction suffix trees (Ron, Singer, & Tishby, 1996) <doi:10.1023/A:1026490906255>, implemented in both frequency-based and prediction-based forms. The framework includes multiple pruning, validation, and smoothing techniques to ensure model robustness. Visualization options include transition trees, radial sunburst diagrams, transition heatmaps, and forward trajectory trees. |
| Authors: | Mohammed Saqr [aut, cre, cph], Sonsoles López-Pernas [aut] |
| Maintainer: | Mohammed Saqr <[email protected]> |
| License: | GPL-3 |
| Version: | 0.1.2 |
| Built: | 2026-06-18 22:13:25 UTC |
| Source: | https://github.com/cran/transitiontrees |
A long, one-row-per-message log from an AI-assisted collaboration study, with Unix timestamps and an explicit session id. Used to demonstrate long-format loading with Unix time and sessions. Bundled example dataset.
ai_long
A data.frame with 8551 rows and 9 columns, including
project, session_id, timestamp (Unix seconds),
code / cluster (the state at two granularities), and
code_order / order_in_session (within-sequence order).
data(ai_long)
context_tree(ai_long, actor = "project", time = "timestamp", action = "code", max_depth = 2L)
Returns the canonical tidy node table — identical to
tree_pathways(tree). Lets users do
as.data.frame(tree) and immediately filter, sort, or export
with base-R idioms.
## S3 method for class 'transitiontrees'
as.data.frame(x, row.names = NULL, optional = FALSE, ...)
x |
A |
row.names, optional
|
Ignored. |
... |
Forwarded to |
A data.frame with columns pathway, depth,
count, likely_next, next_probability,
divergence, changes_prediction. See
tree_pathways.
Uniform tidy-extract: returns the per-pathway summary table
(object$summary), so as.data.frame(boot) and
summary(boot) are interchangeable extractors.
## S3 method for class 'transitiontrees_bootstrap'
as.data.frame(x, row.names = NULL, optional = FALSE, ...)
x |
A |
row.names, optional
|
Ignored. |
... |
Ignored. |
A data.frame; see bootstrap_pathways for the
full column vocabulary.
Uniform tidy-extract: returns the per-pathway divergence breakdown
(object$pathways) — the consumer-facing detail behind the
scalar pdist and the permutation p_value.
## S3 method for class 'transitiontrees_comparison'
as.data.frame(x, row.names = NULL, optional = FALSE, ...)
x |
A |
row.names, optional
|
Ignored. |
... |
Ignored. |
A data.frame with columns pathway, count_a,
count_b, divergence_ab, divergence_ba,
divergence_sym.
Row-binds each group's tree_pathways table, tagged
with a leading group column, so the whole batch is one tidy
frame ready to filter, facet, or join.
## S3 method for class 'transitiontrees_group'
as.data.frame(x, row.names = NULL, optional = FALSE, ...)
x |
A |
row.names, optional
|
Ignored. |
... |
Forwarded to |
A data.frame: the canonical pathway columns with a leading
group column identifying the source tree.
Uniform tidy-extract: returns the per-pathway comparison table
(object$pathways) so as.data.frame(cmp) yields the full
divergence / usage breakdown as a base data.frame.
## S3 method for class 'transitiontrees_group_comparison'
as.data.frame(x, row.names = NULL, optional = FALSE, ...)
x |
A |
row.names, optional
|
Ignored. |
... |
Ignored. |
A data.frame of per-pathway results; see
compare_groups for the column vocabulary.
Non-parametric sequence bootstrap for a fitted transitiontrees.
Methodologically built on Saqr, Tikka & López-Pernas (2025),
extending an edge-level bootstrap framework to
variable-depth pathways.
The bootstrap tracks every pathway in the original tree. Each
iteration resamples whole sequences with replacement, aggregates
raw counts per depth, and reads each pathway's count vector
directly from the resample. No smoothing, no nmin
filter, no extra parameters inside the loop — the bootstrap
operates on counts analogously to an edge-weight bootstrap.
Two complementary measures are reported per pathway:
p_stability: Bootstrap-estimated probability that
the chosen stat (default count) falls outside
[cr[1] * observed, cr[2] * observed], with a +1
correction. This is a stability p-value: small values mean the
pathway rarely fails the chosen reproducibility criterion under
sequence-level resampling.
stability_rate: Uncorrected descriptive companion:
the fraction of resamples where the chosen stat lies
inside the consistency band. A pathway whose count drops to zero
in a resample fails the band test automatically.
informative_rate: Fraction of resamples where the
pathway's empirical likelihood-ratio statistic
against its parent context exceeds the chi-square critical
value at level alpha_g2 (df = |alphabet| - 1).
Tests reproducibly significant divergence from the
shorter-history baseline.
Read stable and informative together:
stable && informative: reproducible and predictively
distinctive pathway.
stable && !informative: reproducible pathway count /
statistic, but not predictively distinctive from its parent.
!stable && informative: sharp or divergent pathway
carried by an unstable subset of sequences.
!stable && !informative: weak or sample-fragile
pathway.
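The resampling loop described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper count_pathway(), the toy data, the single depth-1 pathway "A -> B", and the exact form of the +1 correction, (failures + 1) / (iter + 1), are all assumptions made for the example.

```r
# Illustrative sketch only: count_pathway() and the toy data are hypothetical.
set.seed(1)
seqs <- replicate(40, sample(c("A", "B", "C"), 10, replace = TRUE), simplify = FALSE)

# Count how often `from` is immediately followed by `to` across all sequences.
count_pathway <- function(seqs, from, to) {
  sum(vapply(seqs, function(s) {
    sum(s[-length(s)] == from & s[-1] == to)
  }, integer(1)))
}

observed <- count_pathway(seqs, "A", "B")
iter <- 200L
band <- c(0.5, 1.5) * observed            # multiplicative consistency band

boot_counts <- vapply(seq_len(iter), function(i) {
  # Resample whole sequences with replacement, then recount from raw data.
  idx <- sample(length(seqs), replace = TRUE)
  count_pathway(seqs[idx], "A", "B")
}, integer(1))

inside <- boot_counts >= band[1] & boot_counts <= band[2]
stability_rate <- mean(inside)                   # uncorrected share inside the band
p_stability <- (sum(!inside) + 1) / (iter + 1)   # assumed form of the +1 correction
```

Because whole sequences are resampled, within-sequence dependence between successive states is preserved in every replicate.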
bootstrap_pathways(
  tree,
  iter = 1000L,
  stat = c("count", "next_probability", "divergence"),
  consistency_range = c(0.5, 1.5),
  stability_threshold = 0.95,
  informative_threshold = 0.8,
  alpha = 0.05,
  ci_level = 0.05,
  seed = 1L,
  keep_resamples = TRUE,
  progress = FALSE
)
tree |
A fitted |
iter |
Integer. Number of bootstrap iterations.
Default |
stat |
Character. Pathway statistic on which
|
consistency_range |
Numeric vector of length 2.
Multiplicative tolerance band around the observed value.
Default |
stability_threshold |
Numeric in |
informative_threshold |
Numeric in |
alpha |
Numeric in |
ci_level |
Numeric in |
seed |
Integer or |
keep_resamples |
Logical. If |
progress |
Logical. Show a progress bar.
Default |
A transitiontrees_bootstrap object: a list with
Per-pathway data.frame, sorted so that
stable & informative pathways come first then by
stability_rate descending.
Empirical original-pathway statistics (no smoothing) as a tidy data.frame.
Raw resample matrices:
iter x n_pathways, columns named by pathway.
Configuration.
Saqr, M., Tikka, S., & López-Pernas, S. (2025). Transition Network Analysis. LAK '25, doi:10.1145/3706468.3706513.
seqs <- replicate(40, sample(c("A", "B", "C"), 10, replace = TRUE), simplify = FALSE)
tree <- context_tree(seqs, max_depth = 1L)
boot <- bootstrap_pathways(tree, iter = 50L)
summary(boot)
Returns the top n pathways by occurrence count – the
trajectories the data actually contains many copies of.
common_pathways(tree, top = 10L, depth = NULL, min_count = 1L)
tree |
A |
top |
Integer. Number of pathways to return. Default 10. |
depth |
Integer or NULL. Restrict to pathways of this exact
depth. |
min_count |
Integer. Minimum count cut-off. Default 1. |
A data.frame, same columns as tree_pathways,
sorted by count descending.
seqs <- replicate(50, sample(c("A","B","C"), 12, replace = TRUE), simplify = FALSE)
tree <- context_tree(seqs, max_depth = 3)
common_pathways(tree, top = 8)
common_pathways(tree, top = 8, depth = 3L) # restrict to depth-3
Test how a set of fitted group trees (a transitiontrees_group from
context_tree(..., group =)) differ, on two complementary axes:
Behavior: given the same context, do the groups predict a different next state? Measured per context by the count-weighted Jensen-Shannon divergence (bits) across the groups' next-state distributions.
Usage: do the groups reach a context at different
rates? Measured per context by a homogeneity statistic
on its prevalence (its share of each group's positions).
Significance is assessed by a label-permutation null throughout (per-pathway and omnibus), with Benjamini-Hochberg FDR on the per-pathway p-values.
compare_groups(group, iter = 999L, min_count = 1L, seed = 1L, block = NULL)
group |
A |
iter |
Integer. Number of label permutations. Default 999. |
min_count |
Integer. Drop contexts whose total count across all groups is below this. Default 1. |
seed |
Integer or |
block |
Block ids for a stratified permutation: group
labels are shuffled only within each block, so the null
respects nested / repeated-measures structure (e.g. several
sequences from one subject) and holds any between-block difference
fixed. Normally you do not pass this — fit with
|
The permutation pools every sequence, shuffles the group labels
(preserving group sizes), and recomputes the statistics from raw
counts using the same counting routine as the fit. The tested
context set is the union of the contexts the groups' trees actually
represent. For two groups the behavioral measure is JSD, which is
not the symmetric-KL distance used by
compare_trees(); the distance_matrix component
does use tree_distance() (symmetric KL) for
consistency with the pairwise function.
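For intuition about the behavioral measure, the sketch below computes a Jensen-Shannon divergence in bits across the groups' next-state distributions for one context, weighting each group by its count. The function jsd_bits() and its weighting scheme are illustrative assumptions, not the package's internal routine.

```r
# Illustrative sketch: jsd_bits() is a hypothetical helper, not a package function.
# `counts` is a groups x states matrix of next-state counts for one context.
jsd_bits <- function(counts) {
  w <- rowSums(counts) / sum(counts)      # count-based group weights
  p <- counts / rowSums(counts)           # per-group next-state distributions
  entropy <- function(q) {
    q <- q[q > 0]                         # treat 0 * log(0) as 0
    -sum(q * log2(q))
  }
  mixture <- colSums(p * w)               # weighted average distribution
  entropy(mixture) - sum(w * apply(p, 1, entropy))
}

same <- rbind(x = c(A = 10, B = 20, C = 10), y = c(A = 5, B = 10, C = 5))
disjoint <- rbind(x = c(A = 10, B = 0, C = 0), y = c(A = 0, B = 10, C = 0))
jsd_bits(same)      # identical distributions: 0 bits (up to rounding)
jsd_bits(disjoint)  # non-overlapping, equal weights: 1 bit
```

With two groups the value is bounded between 0 and 1 bit, which makes per-context values comparable across contexts of different size.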
A transitiontrees_group_comparison: a list with
Per-context data.frame sorted by jsd_bits
descending, with columns pathway, depth,
count_total, one count_<group> and one
modal_<group> column per group (most likely next state,
ties broken by alphabet order), flips (do the groups'
modal next states disagree?), jsd_bits, jsd_p,
jsd_padj, usage_g2, usage_p,
usage_padj. usage_* is NA for the root,
which has no prevalence test.
Two-row data.frame: the behavioral and usage global statistics with permutation p-values.
K x K symmetric-KL distance matrix between
the groups (from tree_distance()).
Configuration.
compare_trees for the pairwise permutation test.
gx <- replicate(40, sample(c("A","B","C"), 8, replace = TRUE, prob = c(.2,.6,.2)), simplify = FALSE)
gy <- replicate(40, sample(c("A","B","C"), 8, replace = TRUE, prob = c(.2,.2,.6)), simplify = FALSE)
grp <- context_tree(c(gx, gy), group = rep(c("x","y"), each = 40), max_depth = 1L)
cmp <- compare_groups(grp, iter = 199L)
cmp
Prunes a fitted tree under several criteria — holding alpha
and threshold fixed — and returns a tidy one-row-per-criterion
summary of how aggressively each trims the tree. A convenience
wrapper over repeated prune_tree calls that
collapses the usual vapply() criterion loop into one call.
compare_pruning(
  tree,
  criterion = c("G2", "KL", "AIC", "BIC"),
  alpha = 0.05,
  threshold = 0.005
)
tree |
A |
criterion |
Character vector of criteria to compare. Defaults
to all four: |
alpha |
Significance level for |
threshold |
Minimum information gain in nats for |
A data.frame with one row per criterion (in the order
given by criterion) and columns criterion,
n_nodes (post-prune size) and reduction_pct (percent
of the original nodes removed).
prune_tree to apply one criterion,
tune_tree for cross-validated selection.
set.seed(1)
seqs <- replicate(80, sample(c("A", "B", "C"), 14, replace = TRUE), simplify = FALSE)
tree <- context_tree(seqs, max_depth = 4L, min_count = 3L)
compare_pruning(tree)
compare_pruning(tree, criterion = c("G2", "BIC"), alpha = 0.01)
Fits a context tree under several smoothing schemes — holding
max_depth, nmin and every other argument fixed — and
returns a tidy one-row-per-scheme comparison of tree size and
in-sample perplexity. A convenience wrapper over repeated
context_tree calls that collapses the usual five-line
lapply() loop into a single call.
compare_smoothing(
  data,
  smoothing = c("floor", "laplace", "kneser_ney", "witten_bell", "jelinek_mercer"),
  ...
)
data |
Either sequence data in any form accepted by
|
smoothing |
Character vector of smoothing-method names to
compare. Defaults to all five: |
... |
Further arguments passed to |
The perplexity reported is in-sample (computed on the
fitting data), so it rewards memorisation and must not be used
to pick a smoother — use tune_tree() for
out-of-sample selection. The point of this table is the side-by-side
view and the invariance of n_nodes across schemes: smoothing
changes the probabilities inside the tree, never which
contexts exist (topology is set by nmin, not by the smoother).
A data.frame with one row per scheme (in the order
given by smoothing) and columns smoothing (method
name), n_nodes (tree size) and perplexity (in-sample).
smooth_tree to re-smooth a fitted tree
without re-counting; tune_tree for
cross-validated selection.
set.seed(1)
seqs <- replicate(50, sample(c("A", "B", "C"), 12, replace = TRUE), simplify = FALSE)
compare_smoothing(seqs, max_depth = 3L, min_count = 5L)
compare_smoothing(seqs, smoothing = c("floor", "kneser_ney"), max_depth = 2L)
Computes the count-weighted symmetric Kullback-Leibler divergence between two fitted transitiontrees, then provides a reference distribution by permuting sequence-to-tree assignments.
Use this to ask: do two cohorts (group A vs. group B, baseline vs. intervention) generate significantly different pathway dynamics?
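The permutation logic can be sketched in base R with a deliberately simplified distance. The example below is an assumption-laden stand-in: it compares only the marginal (depth-0) state distributions, applies add-one smoothing to keep the divergence finite, and takes the symmetric KL as the sum of the two directed divergences. The helpers state_dist(), sym_kl(), and distance() are hypothetical.

```r
# Illustrative sketch: a depth-0 (marginal) stand-in for the tree distance.
set.seed(1)
states <- c("A", "B", "C")
a <- replicate(30, sample(states, 10, replace = TRUE, prob = c(.6, .2, .2)), simplify = FALSE)
b <- replicate(30, sample(states, 10, replace = TRUE, prob = c(.2, .2, .6)), simplify = FALSE)

# Add-one smoothed state distribution (smoothing is an assumption; it keeps KL finite).
state_dist <- function(seqs) {
  counts <- table(factor(unlist(seqs), levels = states)) + 1
  as.numeric(counts / sum(counts))
}

# Symmetric Kullback-Leibler divergence, here KL(p || q) + KL(q || p).
sym_kl <- function(p, q) sum(p * log(p / q)) + sum(q * log(q / p))

pooled <- c(a, b)
labels <- rep(c("a", "b"), each = 30)
distance <- function(lab) {
  sym_kl(state_dist(pooled[lab == "a"]), state_dist(pooled[lab == "b"]))
}

observed <- distance(labels)
null <- replicate(200, distance(sample(labels)))  # shuffle labels, keep group sizes
p_value <- mean(null >= observed)                 # one-sided: null at least as extreme
```

Shuffling labels over whole sequences, rather than over individual states, keeps each trajectory intact under the null.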
compare_trees(tree_a, tree_b = NULL, iter = 200L, seed = 1L, symmetric = TRUE)
tree_a, tree_b
|
context trees fit on data subsets A and B.
Alternatively, pass a two-element |
iter |
Integer. Number of permutations. Default 200. |
seed |
Integer. RNG seed for reproducibility. Default 1. |
symmetric |
Logical. Default |
A transitiontrees_comparison object with components:
observed scalar distance
numeric vector, length iter
one-sided p-value (proportion of null at least as extreme as observed)
per-pathway breakdown data.frame
set.seed(1)
m1 <- matrix(sample(c("A","B","C"), 200, TRUE), 20)
m2 <- matrix(sample(c("A","B","C"), 200, TRUE), 20)
tr1 <- context_tree(m1, max_depth = 2L, min_count = 3L)
tr2 <- context_tree(m2, max_depth = 2L, min_count = 3L)
compare_trees(tr1, tr2, iter = 50)
Estimates a variable-depth context tree (prediction suffix tree; Ron, Singer & Tishby
1996) from a collection of sequences. Each internal node represents
a context (string of recent states); each leaf carries a smoothed
conditional distribution over the next state. The tree is grown to
max_depth, then optionally pruned via
prune_tree().
context_tree(
  data,
  max_depth = 5L,
  min_count = 5L,
  smoothing = "floor",
  alphabet = NULL,
  weights = NULL,
  group = NULL,
  block = NULL,
  actor = NULL,
  time = NULL,
  action = NULL,
  order = NULL,
  session = NULL,
  time_threshold = 900
)
data |
Sequence data in any of these forms: a wide data.frame /
character matrix (rows = trajectories, columns = time-steps), a
list of character vectors, or a
supported network/transition model object, taken
directly: a fitted network object carrying its sequences, or a
transition-network object. Any such object that
follows the same convention (a |
max_depth |
Integer. Maximum context length the tree may represent. Default 5. |
min_count |
Integer. Minimum number of times a context must occur
to receive its own node. Default 5. Contexts seen fewer than
|
smoothing |
Smoothing specification: a method name as a string
(uses defaults for that method's hyperparameters) or a list of
the form |
alphabet |
Character vector. Optional. Override the data-derived alphabet (useful when the test set may include states unseen in training). |
weights |
Numeric vector of per-sequence weights, length equal
to the number of input rows / list elements. |
group |
Optional grouping for a batch fit: a vector with
one entry per input sequence, a column name of a network object's
|
block |
Optional block id for a stratified group comparison, in
the same shapes as |
actor, time, order, session, time_threshold
|
Passed to
|
action |
Character. Naming a state/code column switches
|
Construction is bottom-up via k-gram counting. The root holds the
marginal next-state distribution; depth-k nodes hold the
next-state distribution conditional on the most recent k states.
Nodes whose total count falls below min_count are not created.
All nodes start "live" and unpruned; prune_tree()
decides which to retain.
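The k-gram counting step can be illustrated with a small base-R sketch. The helper count_contexts() is hypothetical and only mirrors the description above (contexts of the most recent k states, a "<root>" context at depth 0, and a min_count filter); the package's own counting routine may differ.

```r
# Illustrative sketch: count_contexts() is a hypothetical helper.
# For each context of length k seen at least min_count times,
# tabulate the state that follows it.
count_contexts <- function(seqs, k, min_count = 1L) {
  pairs <- do.call(rbind, lapply(seqs, function(s) {
    n <- length(s)
    if (n <= k) return(NULL)
    pos <- (k + 1L):n                     # positions with a full k-state history
    context <- if (k == 0L) {
      rep("<root>", length(pos))
    } else {
      vapply(pos, function(i) paste(s[(i - k):(i - 1L)], collapse = " -> "),
             character(1))
    }
    data.frame(context = context, nxt = s[pos], stringsAsFactors = FALSE)
  }))
  tab <- table(pairs$context, pairs$nxt)
  tab[rowSums(tab) >= min_count, , drop = FALSE]
}

seqs <- list(c("A", "B", "A", "B", "C"), c("B", "A", "B", "A"))
count_contexts(seqs, k = 0L)   # root: marginal next-state counts
count_contexts(seqs, k = 1L)   # depth 1: counts conditional on the previous state
```

Normalising each row of the returned table gives the unsmoothed next-state distribution for that context.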
For a single fit, a transitiontrees object (described below).
For a grouped fit (group = supplied, or a grouped family
object passed in) a transitiontrees_group: a named list of
transitiontrees objects, one per group, in the group's key order, with its
own print and as.data.frame methods. A single
transitiontrees is a list with components
Named list of node descriptors. Names are context
strings (e.g. "A -> B"); the root is keyed by the literal
sentinel "<root>". Each entry has depth (integer),
counts (numeric vector indexed by alphabet),
prob (smoothed probability vector), and n (sum of
counts).
data.frame with columns parent, child,
symbol for fast tree traversal.
character vector of states.
Integer. The fitted depth (may be less than the
requested max_depth if data are short).
the chosen min-count threshold.
resolved smoothing list (method +
hyperparameters).
number of sequences and observations.
Logical. TRUE after
prune_tree() has been applied.
When pruned is TRUE, a list capturing
the criterion / alpha / threshold used; otherwise NULL.
The cleaned trajectories (a list of character vectors) retained for downstream bootstrap and permutation routines.
Ron, D., Singer, Y., Tishby, N. (1996). The power of amnesia: learning probabilistic automata with variable memory length. Machine Learning, 25, 117-149.
set.seed(1)
seqs <- replicate(50, sample(c("A","B","C"), 12, replace = TRUE), simplify = FALSE)
tree <- context_tree(seqs, max_depth = 3)
tree
summary(tree)
Returns the top n pathways by Kullback-Leibler divergence
from their (k-1)-suffix. These are the pathways whose extended
history adds the most predictive information over the shorter one.
Pathways whose most likely next state actually flips between orders
are marked in the changes_prediction column.
divergent_pathways(tree, top = 10L, min_count = 1L, flips_only = FALSE)
tree |
A |
top |
Integer. Number of pathways to return. Default 10. |
min_count |
Integer. Drop pathways with fewer than this many occurrences. Default 1. |
flips_only |
Logical. If |
A data.frame, same columns as tree_pathways,
sorted by divergence descending.
seqs <- replicate(50, sample(c("A","B","C"), 12, replace = TRUE), simplify = FALSE)
tree <- context_tree(seqs, max_depth = 3)
divergent_pathways(tree, top = 6)
divergent_pathways(tree, flips_only = TRUE)
A wide set of student engagement-state sequences, one learner per row and one time-step per column. Used to demonstrate fitting from wide sequence data and from the network objects (tna / Nestimate) built on top of it. Bundled example dataset.
engagement
A data.frame with 1000 rows (learners) and 25 columns
(time-steps). States "Active", "Average",
"Disengaged"; NA marks missing / dropped-out positions.
data(engagement)
context_tree(engagement, max_depth = 2L)
Sample Sequences from a Fitted Context Tree
generate_sequences(tree, n = 5L, length = 10L, start = NULL)
tree |
A |
n |
Integer. Number of sequences to sample. |
length |
Integer. Length of each sampled sequence. |
start |
NULL or character vector. If NULL (default), each
sequence starts from the root marginal; otherwise must have
length |
A character matrix of dimension n x length.
seqs <- replicate(50, sample(c("A","B","C"), 12, replace = TRUE), simplify = FALSE)
tree <- context_tree(seqs, max_depth = 3)
generate_sequences(tree, n = 5, length = 10)
A long, one-row-per-event log of collaborative regulation moves, with
timestamps. Used to demonstrate long-format loading: reshape with
prepare_input() (or name action = in
context_tree()) to split each actor's events into
time-gap sessions. Bundled example dataset.
group_regulation_long
A data.frame with 27533 rows and 6 columns:
integer; the learner.
character; an achievement-level covariate.
numeric; the collaboration group.
character; the course.
POSIXct; the event timestamp.
character; the regulation move (the state).
data(group_regulation_long)
context_tree(group_regulation_long, actor = "Actor", time = "Time", action = "Action", max_depth = 3L)
Fill the gaps in incomplete sequences using a fitted context tree: each missing state is predicted from the longest matching context of the states that precede it. Filling proceeds left to right, so a just-imputed state becomes part of the context for later gaps.
impute_sequences(tree, newdata, method = c("modal", "prob"), seed = NULL)
tree |
A |
newdata |
Sequences with gaps: a list of character vectors, a
character matrix / data.frame (one row per sequence, |
method |
One of |
seed |
Integer or |
Only internal gaps are imputed. A run of trailing NA
/ "" cells (end-of-sequence padding in a wide frame) is left
untouched, since there is no observed state after it to mark the
sequence as continuing. A sequence that is entirely missing is
returned unchanged (there is nothing to condition on).
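The left-to-right, longest-context-first rule can be sketched as follows. The lookup table modal_next and the helper impute_modal() are hypothetical; the sketch covers only a modal-style fill and assumes a "<root>" entry is always available as the final fallback.

```r
# Illustrative sketch: `modal_next` is a hypothetical lookup from context string
# to most likely next state; "<root>" is the fallback with no history.
modal_next <- c("<root>" = "A", "A" = "B", "B" = "A", "A -> B" = "C")

impute_modal <- function(s, modal_next, max_depth = 2L) {
  observed <- which(!is.na(s))
  if (length(observed) == 0L) return(s)   # entirely missing: nothing to condition on
  last_obs <- max(observed)               # NAs after this position are trailing padding
  for (i in which(is.na(s) & seq_along(s) < last_obs)) {
    # Back off from the longest available context to the root.
    for (k in seq(min(max_depth, i - 1L), 0L)) {
      key <- if (k == 0L) "<root>" else paste(s[(i - k):(i - 1L)], collapse = " -> ")
      if (key %in% names(modal_next)) {
        s[i] <- modal_next[[key]]         # the filled state joins later contexts
        break
      }
    }
  }
  s
}

impute_modal(c("A", NA, "C"), modal_next)           # gap is filled from context "A"
impute_modal(c("A", "B", NA, "A", NA), modal_next)  # trailing NA is left untouched
```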
The same container shape as newdata (list, matrix,
data.frame, or character vector) with internal gaps filled.
seqs <- replicate(60, sample(c("A", "B", "C"), 8, replace = TRUE), simplify = FALSE)
tree <- context_tree(seqs, max_depth = 2L)
gappy <- list(c("A", NA, "C"), c("B", "B", NA, "A"))
impute_sequences(tree, gappy)
Returns a logLik object compatible with stats::AIC(),
stats::BIC(), and the rest of the model-comparison toolchain.
If newdata is NULL, returns the in-sample log-likelihood
computed from the fitted node counts. Otherwise returns the held-out
log-likelihood scoring newdata under the fitted tree.
## S3 method for class 'transitiontrees'
logLik(object, newdata = NULL, ...)
object |
A |
newdata |
Optional. Sequence data in any format accepted by
|
... |
Ignored. |
Out-of-vocabulary handling: when newdata contains a state not
in the tree's alphabet, the transition into that state is
omitted from scoring (it is not penalised), and the transition
out of it is scored against the root context (the unseen
state cannot extend a history). The reported nobs therefore
counts only the positions actually scored. perplexity(),
score_sequences(), and score_positions() inherit the
same behaviour.
A logLik object with attributes nobs and
df (number of free parameters in the fitted tree).
tree <- context_tree(matrix(sample(c("A","B","C"), 200, TRUE), 20), max_depth = 2, min_count = 2)
logLik(tree)
AIC(tree); BIC(tree)
Scan every context in a fitted tree for a chosen next state and
return those whose predicted probability for that state falls in a
requested range. A tidy context-mining table:
"in which histories is the next move state unusually likely or unlikely?"
mine_contexts(tree, state, min_prob = NULL, max_prob = NULL, min_count = 1L)
tree |
A |
state |
Character. The next state to score, one of the tree's alphabet. |
min_prob, max_prob
|
Numeric in |
min_count |
Integer. Drop contexts with fewer than this many occurrences. Default 1. |
A data.frame with columns pathway, depth,
count, state, prob (P(state | context)),
and is_modal (whether state is the context's most
likely next state; ties broken by alphabet order), sorted by
prob descending. The empty case returns a 0-row data.frame
with the same schema.
seqs <- replicate(60, sample(c("A", "B", "C"), 10, replace = TRUE), simplify = FALSE)
tree <- context_tree(seqs, max_depth = 2L)
mine_contexts(tree, state = "A", min_prob = 0.4)
Rank held-out sequences by how well the fitted tree predicts them. A tidy pattern-mining table: surface the subsequences the model finds most surprising (poor fit, high perplexity) or most expected (good fit, low perplexity).
mine_sequences(tree, newdata, n = 10L, which = c("surprising", "expected"))
tree |
A |
newdata |
Sequence data in any format accepted by
|
n |
Integer. Number of sequences to return. Default 10. |
which |
One of |
A data.frame with the score_sequences columns
(sequence_id, n_scored, log_lik,
perplexity), the top n by the chosen direction.
fit <- replicate(60, sample(c("A", "B", "C"), 10, replace = TRUE), simplify = FALSE)
tree <- context_tree(fit, max_depth = 2L)
new <- replicate(20, sample(c("A", "B", "C"), 10, replace = TRUE), simplify = FALSE)
mine_sequences(tree, new, n = 5, which = "surprising")
Bundles the standard goodness-of-fit scalars for a fitted tree into
one tidy row: logLik, the parameter count df, the
observation count nobs, AIC, BIC, and
perplexity. A one-call replacement for
logLik(); nobs(); AIC(); BIC(); perplexity().
model_fit(tree, newdata = NULL)
tree |
A |
newdata |
Optional sequence data. If supplied, the scalars are
evaluated on it (held-out); if |
With newdata, every scalar is computed out-of-sample
(AIC/BIC use the held-out deviance with the model's
training df). A transitiontrees_group returns one row per
group, tagged with a leading group column.
A one-row data.frame (one row per group for a
transitiontrees_group) with columns logLik, df,
nobs, AIC, BIC, perplexity.
perplexity, logLik.transitiontrees.
seqs <- replicate(60, sample(c("A","B","C"), 12, replace = TRUE), simplify = FALSE)
tree <- context_tree(seqs, max_depth = 2L, min_count = 3L)
model_fit(tree)
The count of contexts the tree represents — an intuitive accessor for
length(tree$nodes) (the number printed in the tree banner).
n_nodes(tree)
tree |
A |
An integer. For a transitiontrees_group, a named integer
vector with one count per group.
seqs <- replicate(50, sample(c("A","B","C"), 12, replace = TRUE), simplify = FALSE)
n_nodes(context_tree(seqs, max_depth = 3L))
Number of Observations Used to Fit a Context Tree
## S3 method for class 'transitiontrees'
nobs(object, ...)
object |
A |
... |
Ignored. |
Integer. The number of state observations the tree was fitted
on — the (weight-adjusted) token total, equal to the "nobs"
attribute of the in-sample logLik.transitiontrees().
(A held-out logLik(newdata) reports its own "nobs" —
the positions actually scored, e.g. excluding states outside the
tree's alphabet — which can be smaller.)
Test Whether a Pathway Exists in the Tree
pathway_exists(tree, pathway)
tree |
A |
pathway |
Character. Pathway as arrow-notation string or character vector. |
Logical scalar.
tr <- context_tree(matrix(sample(c("A","B"), 50, TRUE), 5), max_depth = 2L, min_count = 1L)
pathway_exists(tr, "A")
exp(-mean log-likelihood per observation), the standard
language-modelling evaluation metric. Lower is better. A perplexity
of k on an alphabet of size |alphabet| means the model is as
predictive as a uniform distribution over k symbols. A perplexity of
|alphabet| is the uniform baseline; 1 is perfect
deterministic prediction.
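A minimal base-R sketch of the definition, assuming a vector of the probabilities the model assigned to the states that actually occurred; the helper perplexity_from_probs() is hypothetical and is not the package's perplexity().

```r
# Illustrative sketch: perplexity from the probabilities assigned
# to the observed next states (one probability per scored position).
perplexity_from_probs <- function(p) exp(-mean(log(p)))

perplexity_from_probs(rep(1 / 3, 12))  # uniform over 3 states: 3
perplexity_from_probs(rep(1, 12))      # always certain and correct: 1
```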
perplexity(tree, newdata = NULL)
tree |
A |
newdata |
Sequence data; |
Numeric scalar.
tree <- context_tree(matrix(sample(c("A","B","C"), 200, TRUE), 20), max_depth = 2, min_count = 2)
perplexity(tree)
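As a rough illustration of the definition, perplexity can be computed by hand from the probabilities a model assigned to the states that were actually observed. The sketch below uses base R only and does not call the package; the helper perplexity_from_probs() and its input vectors are made up for illustration.

```r
# Hypothetical helper (not part of transitiontrees): perplexity from the
# probabilities assigned to the observed next states.
perplexity_from_probs <- function(p) {
  stopifnot(is.numeric(p), all(p > 0), all(p <= 1))
  exp(-mean(log(p)))
}

k <- 3
uniform_ppl <- perplexity_from_probs(rep(1 / k, 20))  # uniform over k states
certain_ppl <- perplexity_from_probs(rep(1, 20))      # always certain and correct
c(uniform = uniform_ppl, certain = certain_ppl)
```

The two inputs reproduce the reference points: a uniform predictor over k states scores k, and a predictor that is always certain and correct scores 1.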
A per-context map of how two groups differ in their next-state
predictions: one row per shared context, one column per next state.
By default each cell is the Pearson standardized residual of the first
group against the no-difference null (red = more than expected, blue =
less; |r| > 2 notable), which is support-aware and decomposes
the per-context test statistic; measure = "probability" shows the raw
P(group1) - P(group2) instead.
plot_difference(
  group,
  groups = NULL,
  depth = NULL,
  min_count = 1L,
  comparison = NULL,
  alpha = 0.05,
  annotate = TRUE,
  layout = c("tile", "tree"),
  measure = c("residual", "probability")
)
group |
A |
groups |
Optional length-2 character vector naming the two groups
to subtract ( |
depth |
Integer or |
min_count |
Integer. Drop contexts whose count in either group is below this. Default 1. |
comparison |
Optional |
alpha |
Numeric. FDR threshold for the significance stars when
|
annotate |
Logical. Print the signed difference in each cell
( |
layout |
One of |
measure |
For |
A ggplot object.
compare_groups for the significance test.
gx <- replicate(60, sample(c("A","B","C"), 8, replace = TRUE, prob = c(.2,.6,.2)), simplify = FALSE)
gy <- replicate(60, sample(c("A","B","C"), 8, replace = TRUE, prob = c(.2,.2,.6)), simplify = FALSE)
grp <- context_tree(c(gx, gy), group = rep(c("x","y"), each = 60), max_depth = 2L)
plot_difference(grp)                   # residual heatmap
plot_difference(grp, layout = "tree")  # difference on the tree map
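For intuition about the two cell measures, the sketch below computes them for a single shared context using base R only. The counts are made up, and using the stdres component of chisq.test() as the standardized residual is an assumption; this section does not state the exact formula the package applies.

```r
# Made-up next-state counts for ONE shared context, one row per group
tab <- rbind(x = c(A = 10, B = 30, C = 10),
             y = c(A = 10, B = 10, C = 30))

# Standardized Pearson residuals against the no-difference null
# (assumption: the same flavour of residual as the default heatmap cells)
res <- chisq.test(tab)$stdres
round(res["x", ], 2)   # first group's row; |r| > 2 flags B and C

# The raw alternative: P(group1) - P(group2) for each next state
prob_diff <- prop.table(tab, 1)["x", ] - prop.table(tab, 1)["y", ]
prob_diff
```

With only two groups, the second group's residuals are the mirror image of the first's, so showing one row per context loses no information.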
Small-multiples bar chart of the full next-state distribution
P(next | context), one panel per context. A per-context
probability display: where plot_pathways() renders the same
numbers as a heatmap, this shows each context's distribution as its
own panel, with the modal bar highlighted.
plot_distributions(tree, contexts = NULL, top = 12L, min_count = 1L)
tree |
A |
contexts |
Character vector of pathway strings to show, or
|
top |
Integer. Number of contexts to show when |
min_count |
Integer. Drop contexts below this count. Default 1. |
A ggplot object.
seqs <- replicate(60, sample(c("A", "B", "C"), 10, replace = TRUE), simplify = FALSE)
tree <- context_tree(seqs, max_depth = 2L)
plot_distributions(tree, top = 9)
Lollipop chart of per-pathway Kullback-Leibler divergence from the (k-1)-suffix. Point size is proportional to pathway count; orange points mark pathways whose modal next state flips between orders. Annotates each flip with the prediction change, e.g. "Disengaged -> Active".
plot_divergence(tree, top = 15L, min_count = 5L, title = NULL, ...)
tree |
A |
top |
Integer. Number of pathways to show. Default 15. |
min_count |
Integer. Drop pathways below this count. Default 5. |
title |
Character. Plot title; if |
... |
Ignored. |
A ggplot object.
seqs <- replicate(50, sample(c("A", "B", "C"), 12, replace = TRUE), simplify = FALSE)
tree <- context_tree(seqs, max_depth = 3L)
plot_divergence(tree)
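The two quantities the chart encodes can be illustrated with a few lines of base R. The distributions below are invented, kl_bits() is a hypothetical helper, and the direction of the divergence (longer context relative to its suffix) and the base-2 logarithm are assumptions, the latter based on the pathway table reporting divergence in bits.

```r
# Hypothetical helper: Kullback-Leibler divergence of p from q, in bits
kl_bits <- function(p, q) {
  keep <- p > 0                       # terms with p = 0 contribute nothing
  sum(p[keep] * log2(p[keep] / q[keep]))
}

# Invented next-state distributions for a pathway and its (k-1)-suffix
p_full   <- c(Active = 0.7, Disengaged = 0.2, Passive = 0.1)
p_suffix <- c(Active = 0.3, Disengaged = 0.5, Passive = 0.2)

divergence <- kl_bits(p_full, p_suffix)

# A "flip": the modal next state differs between the two orders
flips <- names(which.max(p_full)) != names(which.max(p_suffix))
flip_label <- paste(names(which.max(p_suffix)), "->", names(which.max(p_full)))
```

Here the suffix predicts Disengaged while the longer pathway predicts Active, so flip_label reads "Disengaged -> Active".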
Faceted histogram of the bootstrap resample values for a chosen pathway statistic, one panel per pathway.
plot_pathway_resamples(
  x,
  pathways = NULL,
  stat = c("count", "next_probability", "divergence", "G2"),
  top = 6L,
  bins = 30L
)
x |
A |
pathways |
Character vector of pathway names. |
stat |
Character. One of |
top |
Integer. Default 6. |
bins |
Integer. Histogram bins. Default 30. |
A ggplot object.
seqs <- replicate(40, sample(c("A", "B", "C"), 10, replace = TRUE), simplify = FALSE)
boot <- bootstrap_pathways(context_tree(seqs, max_depth = 1L), iter = 50L)
plot_pathway_resamples(boot, stat = "count")
Heatmap visualisation of the pathway table: rows are pathways
(sorted), columns are next-state probabilities under the fitted
tree. The modal next state of each row is annotated in bold; rows
whose modal next state flips relative to their parent
pathway are flagged in the row labels (with a leading caret
>). A side strip on the left encodes the pathway count on a
log scale.
This is the natural pathway-focused visualisation: one glance shows which pathways are common, which are sharp (high mass on a single next state), which are diffuse (mass spread evenly), and which carry trajectory-specific structure that order-1 misses.
plot_pathways(
  tree,
  top = 20L,
  sort_by = c("count", "divergence", "depth"),
  min_count = 5L,
  show_flips = TRUE,
  title = NULL,
  ...
)
tree |
A |
top |
Integer. Maximum number of pathways to show. Default 20. |
sort_by |
Character. One of |
min_count |
Integer. Drop pathways with fewer than this many occurrences. Default 5. |
show_flips |
Logical. Mark modal-flip pathways with a leading
caret in the label. Default |
title |
Character. Plot title; if |
... |
Ignored. |
A ggplot object.
seqs <- replicate(50, sample(c("A","B","C"), 12, replace = TRUE), simplify = FALSE)
tree <- context_tree(seqs, max_depth = 3)
plot_pathways(tree)
plot_pathways(tree, sort_by = "divergence", top = 12)
Visual diagnostics for how a fitted tree scores held-out sequences,
built on score_positions():
type = "position": predicted probability of the observed next state against position in the sequence, showing where the model is confident vs. surprised as a sequence unfolds.
type = "ecdf": the empirical cumulative distribution of those predicted probabilities, a calibration-style view of how often the model assigns high vs. low probability to what actually happened.
type = "logloss": the per-position log-loss, -log(predicted probability of the observed next state), against position. Lower is better
(0 = certain and correct); the dashed line is the mean log-loss
over all scored positions.
plot_predictive(tree, newdata, type = c("position", "ecdf", "logloss"))
tree |
A |
newdata |
Held-out sequence data in any format accepted by
|
type |
One of |
A ggplot object.
fit <- replicate(60, sample(c("A", "B", "C"), 10, replace = TRUE), simplify = FALSE)
tree <- context_tree(fit, max_depth = 2L)
new <- replicate(15, sample(c("A", "B", "C"), 10, replace = TRUE), simplify = FALSE)
plot_predictive(tree, new, type = "position")
plot_predictive(tree, new, type = "ecdf")
plot_predictive(tree, new, type = "logloss")
A suffix-chain pruning view: take one pathway and show, side by
side, the next-state distribution at every context along its suffix
chain — the full context, then the context with its oldest move
dropped, and so on down to the root — marking which contexts
prune_tree (criterion "G2") keeps versus prunes.
It answers "how much memory does this pathway actually need?": each
context is drawn as its own panel (deepest memory on the left, root on
the right) and classified into three states by opacity. Solid
contexts are informative: their own G2 clears the cutoff,
so they add predictive information over their one-shorter parent.
Mid-opacity contexts are retained: their own G2 is
below the cutoff, but a deeper context diverges, so prune_tree()
keeps them only as a structural bridge; they do not themselves add
memory. Faded contexts are pruned (the redundant tail).
The panel title carries the full context, the decision, and the
G2 value. The requested pathway must be a fitted context; the function
errors rather than silently plotting a shorter suffix.
plot_pruning(tree, pathway, alpha = 0.05)
tree |
A |
pathway |
A single pathway string in arrow form
( |
alpha |
Significance level for the |
The keep/prune decision is exactly prune_tree's G2 rule;
the cumulative
pruned flag follows the leaf-up amnesia rule. The distributions
and counts shown are the same node values reported throughout the
pathway API (see PARITY.md).
A ggplot object.
prune_tree, plot_distributions.
seqs <- replicate(80, sample(c("A","B","C"), 12, replace = TRUE), simplify = FALSE)
tree <- context_tree(seqs, max_depth = 3L, min_count = 3L)
plot_pruning(tree, "A -> B -> C")
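To make the "clears the cutoff" idea concrete, the sketch below computes a likelihood-ratio G2 for one context against its parent's distribution and compares it with a chi-square critical value. It is a textbook form written in base R under stated assumptions: the helper g2_vs_parent(), the counts, and the degrees of freedom (alphabet size minus one) are illustrative, and the package's exact rule may differ in detail.

```r
# Hypothetical helper: G2 of a context's next-state counts against the
# distribution of its one-shorter parent (assumed df = k - 1).
g2_vs_parent <- function(child_counts, parent_prob, alpha = 0.05) {
  seen   <- child_counts > 0                    # zero counts add nothing
  n      <- child_counts[seen]
  p      <- n / sum(child_counts)               # context's own distribution
  q      <- parent_prob[seen]
  g2     <- 2 * sum(n * log(p / q))
  cutoff <- qchisq(1 - alpha, df = length(child_counts) - 1)
  list(G2 = g2, cutoff = cutoff, informative = g2 > cutoff)
}

parent <- c(A = 1/3, B = 1/3, C = 1/3)
sharp  <- g2_vs_parent(c(A = 30, B = 5,  C = 5),  parent)  # departs from parent
flat   <- g2_vs_parent(c(A = 14, B = 13, C = 13), parent)  # close to parent
c(sharp = sharp$informative, flat = flat$informative)
```

In the plot's terms, a context like sharp would be drawn solid, while one like flat would be either retained as a bridge or pruned, depending on whether a deeper context diverges.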
Where plot.transitiontrees draws the fitted context tree
backwards (each node is a suffix, the most-recent move), this
draws the same sequences forwards: a prefix tree that starts at a
common root and follows each sequence move by move in time. The one
tree can be coloured two ways:
measure = "frequency": node and edge colour and width encode how many sequences walk each path.
measure = "predictability": colour encodes the model's conditional probability of each move given its preceding history, read from the model
(tree); edge width still encodes flow.
Higher values are drawn darker.
plot_trajectories(
  tree,
  measure = c("frequency", "predictability"),
  min_count = 4L
)
tree |
A |
measure |
One of |
min_count |
Integer. Keep only prefixes occurring at least this many times. Default 4. |
The predictability of a node's last move is the model's conditional
probability of that move given the preceding history, truncated to the
tree's max_depth and read via query_pathway()
(the empty history for a first move uses the root distribution). This
is the forward-reading complement to the backward context tree. It is
drawn entirely in ggplot2, with no additional plotting dependency.
A ggplot object.
plot.transitiontrees for the backward context
tree, plot_pruning for the suffix-chain view.
seqs <- replicate(120, sample(c("A", "B", "C"), 8, replace = TRUE), simplify = FALSE)
tree <- context_tree(seqs, max_depth = 3L, min_count = 3L)
pruned <- prune_tree(tree)
plot_trajectories(tree, measure = "frequency")
plot_trajectories(pruned, measure = "predictability")
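The forward reading can be sketched without the package: every sequence contributes all of its prefixes, and the count of each prefix is the "flow" along that path. The toy sequences and the threshold of 2 are made up for illustration; how the function itself builds the prefix tree is not described here.

```r
# Toy sequences (illustrative only)
toy <- list(c("A", "B", "C"), c("A", "B", "A"), c("A", "C"), c("B", "A"))

# Every prefix of every sequence, in arrow notation
prefixes <- unlist(lapply(toy, function(s) {
  vapply(seq_along(s),
         function(i) paste(s[1:i], collapse = " -> "),
         character(1))
}))

# How many sequences walk each path, most travelled first
prefix_counts <- sort(table(prefixes), decreasing = TRUE)

# Analogue of min_count: keep only prefixes seen at least twice
kept <- prefix_counts[prefix_counts >= 2]
kept
```

Only "A" (three sequences) and "A -> B" (two sequences) survive the filter, which is the kind of thinning min_count applies before drawing.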
Renders a fitted transitiontrees in one of four styles:
"horizontal" (default) — pure-ggplot2 left-to-right
phylogram: root on the left, contexts fanned out vertically to
the right, each labelled beneath with its full arrow-notation
context and (by default) its modal prediction
"(state pct%)" on a second line. Node fill is the
most recent move (rightmost token), so each branch off a
depth-1 hub shares a colour. Set show_prediction = FALSE
for context-only labels.
"dendrogram" — pure-ggplot2 radial tree: root at the
centre, leaves on the outer ring.
"icicle" — pure-ggplot2 circular partition / sunburst;
inner ring carries full state names, outer rings carry 3-letter
abbreviations.
"interactive" — visNetwork htmlwidget
(Suggests). A draggable, zoomable hierarchical tree; node size
= context count and edge width = child's count (“flow”),
the same encoding as the static styles. Hover for a tooltip
with the full pathway, count, modal next state, and the
complete next-state distribution.
Common encoding: node size = context count, edge thickness = child's
count (“flow”). Node fill is the most recent move (rightmost
token of the context) in the "horizontal" style; the
"dendrogram", "icicle", and "interactive" styles
colour by the branching (oldest) token.
## S3 method for class 'transitiontrees'
plot(
  x,
  style = c("horizontal", "dendrogram", "icicle", "interactive"),
  point_size_range = NULL,
  edge_size_range = NULL,
  ...
)
x |
A |
style |
One of |
point_size_range |
Numeric length-2 vector controlling the
minimum and maximum node-point size. Default |
edge_size_range |
Numeric length-2 vector for edge width.
Default |
... |
Passed to the chosen backend. For
|
The three static styles ("horizontal", "dendrogram",
"icicle") are drawn in pure ggplot2. "interactive"
requires visNetwork; the dispatcher errors informatively if it
is missing.
A ggplot object for the three static styles; an
htmlwidget for "interactive".
set.seed(1)
m <- matrix(sample(c("A","B","C"), 200, TRUE), 20)
tr <- context_tree(m, max_depth = 2L, min_count = 3L)
plot(tr)                        # left-to-right phylogram (default)
plot(tr, style = "dendrogram")  # radial dendrogram
plot(tr, style = "icicle")      # sunburst
if (requireNamespace("visNetwork", quietly = TRUE))
  plot(tr, style = "interactive")  # visNetwork htmlwidget (Suggests)
Forest plot of per-pathway G2 (likelihood-ratio against
parent) with bootstrap 95% CI bars. Pathways are ordered with
stable & informative ones first, then by stability rate.
The chi-square critical value at alpha_g2 is shown as a
dashed reference line: pathways whose CI lies entirely above it
are reproducibly informative.
## S3 method for class 'transitiontrees_bootstrap'
plot(x, top = 25L, min_stability = NULL, ...)
x |
A |
top |
Integer. Maximum pathways to show. Default 25. |
min_stability |
Numeric. Minimum stability_rate to display.
Default |
... |
Ignored. |
A ggplot object.
seqs <- replicate(40, sample(c("A", "B", "C"), 10, replace = TRUE), simplify = FALSE)
boot <- bootstrap_pathways(context_tree(seqs, max_depth = 1L), iter = 50L)
plot(boot)
Visualises the permutation-test result. Histogram of the null distribution of transitiontrees distances under shuffled group labels; a vertical line marks the observed distance; the panel header carries the p-value.
## S3 method for class 'transitiontrees_comparison'
plot(x, bins = 30L, ...)
x |
A |
bins |
Integer. Histogram bins. Default 30. |
... |
Ignored. |
A ggplot object.
Draw every member of a transitiontrees_group in turn via
plot.transitiontrees. Each member's plot is printed (so the
call produces one figure per group, e.g. in an R Markdown chunk),
captioned with its group name; the named list of plot objects is
returned invisibly for further use.
## S3 method for class 'transitiontrees_group'
plot(x, ...)
x |
A |
... |
Passed to |
Invisibly, a named list (one entry per group, in group order) of the per-member plot objects.
m <- matrix(sample(c("A","B","C"), 200, replace = TRUE), 40, 5)
grp <- context_tree(m, group = rep(c("x","y"), each = 20), max_depth = 2L)
plot(grp)  # one tree per group
Plot a Group Comparison
## S3 method for class 'transitiontrees_group_comparison'
plot(x, style = c("divergence", "matrix"), top = 15L, alpha = 0.05, ...)
x |
A |
style |
One of |
top |
Integer. Pathways to show in |
alpha |
Numeric. FDR cutoff for highlighting. Default 0.05. |
... |
Ignored. |
A ggplot object.
Visualises the held-out perplexity surface returned by
tune_tree(). Lines track perplexity vs. max_depth;
facets split by smoothing scheme and prune; colour encodes
nmin. The minimum-perplexity configuration is highlighted
with a star.
## S3 method for class 'transitiontrees_tune'
plot(x, ...)
x |
A |
... |
Ignored. |
A ggplot object.
Predict Next-State Probabilities from a Context Tree
## S3 method for class 'transitiontrees'
predict(object, newdata, type = c("prob", "class"), ...)
object |
A |
newdata |
Either (i) a list of character vectors (each is the "history" leading up to the prediction point), (ii) a wide data.frame / matrix whose rows are histories, or (iii) a single character vector treated as one history. |
type |
One of |
... |
Ignored. |
If type = "prob": a matrix with one row per history
and one column per state. A list/data.frame/matrix newdata
always returns a matrix (1 x k for a single-history container);
a bare character vector returns a named vector for interactive
convenience. If type = "class": a character vector of
modal predictions.
seqs <- replicate(50, sample(c("A","B","C"), 12, replace = TRUE), simplify = FALSE)
tree <- context_tree(seqs, max_depth = 3)
predict(tree, newdata = list(c("A","B"), c("C","C","B")))
predict(tree, newdata = list(c("A","B")), type = "class")
predict(tree, newdata = c("A","B"))  # bare vector → named vector
Turns a long, one-row-per-event table into the wide, one-row-per-
sequence character frame that context_tree() consumes.
Events are grouped by actor and ordered by time (or
order); when time is given, each actor's events are
split into sessions whenever the gap to the previous event
exceeds time_threshold seconds. This mirrors the standard
timestamp / session-splitting rule, in pure base R.
prepare_input(
  data,
  actor = NULL,
  time = NULL,
  action = NULL,
  order = NULL,
  session = NULL,
  time_threshold = 900,
  format = NULL,
  is_unix_time = FALSE,
  unix_time_unit = c("seconds", "milliseconds", "microseconds"),
  meta = NULL
)
data |
A long-format |
actor |
Character. Column(s) naming the unit each sequence
belongs to (e.g. a user id). Several columns are combined with
|
time |
Character. Column holding the event timestamp, used both
to order events and to split sessions by |
action |
Character. Column holding the event's state / code — the symbol that becomes a cell of the sequence. Required. |
order |
Character. Optional column giving an explicit within-
actor ordering (used when |
session |
Character. Optional column giving an explicit session id within an actor. If supplied, sessions are taken from it directly and no time-gap splitting is done. |
time_threshold |
Numeric. Seconds; a gap larger than this starts
a new session. Default |
format |
Character. Optional |
is_unix_time |
Logical. Force |
unix_time_unit |
One of |
meta |
Optional character vector of column names to carry through
the reshape as per-sequence metadata (one value per session, the
first event's). Returned as a |
A wide character data.frame, one row per sequence
(session), columns T1, T2, ... holding the ordered states and
trailing NAs past the end of each sequence. Row names are the
session ids. When meta is given, an aligned per-sequence
metadata data.frame is attached as attr(., "meta").
Pass it straight to context_tree().
long <- data.frame(
  user = c("a","a","a","a","b","b"),
  t = as.POSIXct("2020-01-01 09:00:00", tz = "UTC") + c(0, 60, 3600, 3660, 0, 30),
  state = c("X","Y","X","Z","Y","X"),
  stringsAsFactors = FALSE)
## one-hour gap splits user a into two sessions
wide <- prepare_input(long, actor = "user", time = "t", action = "state")
wide
context_tree(wide, max_depth = 2L, min_count = 1L)
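The time-gap rule itself is simple enough to sketch in base R. The helper split_sessions() below is hypothetical and is not the package's implementation; it assumes the timestamps belong to a single actor and are already sorted in time.

```r
# Hypothetical helper: assign a session index to each event of ONE actor,
# starting a new session whenever the gap exceeds `threshold` seconds.
split_sessions <- function(time, threshold = 900) {
  secs <- as.numeric(time)         # POSIXct -> seconds since the epoch
  gap  <- c(0, diff(secs))         # gap to the previous event (0 for the first)
  cumsum(gap > threshold) + 1L     # running count of session breaks
}

t_a <- as.POSIXct("2020-01-01 09:00:00", tz = "UTC") + c(0, 60, 3600, 3660)
session_a <- split_sessions(t_a)
session_a
```

The third event arrives 3540 seconds after the second, which exceeds the 900-second default, so the four events fall into sessions 1, 1, 2, 2.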
Print method for the object returned by
summary.transitiontrees: a one-line banner followed by the
leading rows of the canonical pathway table.
## S3 method for class 'summary.transitiontrees'
print(x, n = 10L, ...)
x |
A |
n |
Integer. Number of pathway rows to print. Default 10. |
... |
Ignored. |
x invisibly.
Print a Context Tree
## S3 method for class 'transitiontrees'
print(x, max_lines = 25L, digits = 2L, ...)
x |
A |
max_lines |
Integer. Maximum tree-rendering lines. Default 25. |
digits |
Integer. Probability digits. Default 2. |
... |
Ignored. |
x invisibly.
Print a Context Tree Bootstrap
## S3 method for class 'transitiontrees_bootstrap'
print(x, n = 10L, digits = 3L, ...)
x |
A |
n |
Integer. Number of top pathways to print. Default 10. |
digits |
Integer. Numeric digits for the printed table. Default 3. |
... |
Ignored. |
x invisibly.
Print a Context Tree Comparison
## S3 method for class 'transitiontrees_comparison'
print(x, digits = 3L, n = 6L, ...)
x |
A |
digits |
Integer. Numeric digits for the printed table. Default 3. |
n |
Integer. Number of top divergent pathways to print. Default 6. |
... |
Ignored. |
x invisibly.
Print a Group of Context Trees
## S3 method for class 'transitiontrees_group'
print(x, ...)
x |
A |
... |
Ignored. |
x invisibly.
Print method for a transitiontrees_group_comparison: a header with
the groups and permutation setup, the omnibus behavioral / usage
statistics, and the top pathways ranked by behavioral divergence.
## S3 method for class 'transitiontrees_group_comparison'
print(x, n = 10L, digits = 3L, ...)
x |
A |
n |
Integer. Number of top pathways to print. Default 10. |
digits |
Integer. Numeric digits for the printed tables. Default 3. |
... |
Ignored. |
x invisibly.
Print a Context Tree Tuning Grid
## S3 method for class 'transitiontrees_tune'
print(x, n = 10L, ...)
x |
A |
n |
Integer. Number of configurations to print. Default 10. |
... |
Ignored. |
x invisibly.
Removes nodes that do not earn their depth under the chosen criterion. Pruning is applied bottom-up: a node is dropped when its prediction improves on its parent's by less information or likelihood than the criterion's threshold or depth penalty allows.
Renamed from prune() to avoid collision with other
prune() generics.
prune_tree(
  tree,
  criterion = c("G2", "KL", "AIC", "BIC"),
  alpha = 0.05,
  threshold = 0.005
)
tree |
A |
criterion |
One of |
alpha |
Numeric in (0, 1). Significance level for |
threshold |
Numeric. Minimum information gain in nats for
|
For each leaf, compute the criterion against its parent. If the
criterion does not exceed its threshold, drop the leaf and revisit
the parent. Repeat until stable. The root is never dropped.
Surviving nodes keep their original smoothed prob vector
(whatever smoothing scheme was applied at fit time).
Note on units: the "KL" threshold is in nats
(natural log), whereas the divergence column reported by
tree_pathways() / divergent_pathways() is
in bits (log base 2). Multiply a nats threshold by
1 / log(2) (~1.4427) to read it on the pathway-table scale.
A pruned transitiontrees with tree$pruned = TRUE and
tree$pruning carrying the criterion + threshold settings.
Ron, D., Singer, Y., Tishby, N. (1996). The power of amnesia. Machine Learning, 25, 117-149.
seqs <- replicate(50, sample(c("A","B","C"), 12, replace = TRUE), simplify = FALSE)
tree <- context_tree(seqs, max_depth = 4)
pruned <- prune_tree(tree, criterion = "G2", alpha = 0.05)
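The units note for the "KL" criterion amounts to one line of arithmetic. The snippet below converts the documented default threshold from nats to bits; the variable names are illustrative.

```r
# Default "KL" threshold from the usage above, in nats
threshold_nats <- 0.005

# Dividing by log(2) is the same as multiplying by 1 / log(2) (~1.4427)
threshold_bits <- threshold_nats / log(2)
round(threshold_bits, 5)   # about 0.00721 bits
```

Read on the pathway-table scale, the default cutoff therefore corresponds to a divergence of roughly 0.0072 bits.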
Returns the probability the fitted tree assigns to a given pathway / next-state pair. Two lookup modes:
exact = TRUE: the pathway must appear as a node;
otherwise returns NA.
exact = FALSE (default): if the pathway is missing,
falls back to the longest matching suffix that *is* in the tree
(mirrors predict.transitiontrees()).
query_pathway(tree, pathway, next_state = NULL, exact = FALSE)
tree |
A |
pathway |
Character. The conditioning pathway, either as a
single arrow-notation string ("A -> B -> C") or as a character
vector of states ( |
next_state |
Character. Next-state symbol to query, or
|
exact |
Logical. Default |
If next_state is supplied, a numeric scalar. Otherwise
a named numeric vector indexed by alphabet.
set.seed(1)
m <- matrix(sample(c("A","B","C"), 200, TRUE), 20)
tr <- context_tree(m, max_depth = 2L, min_count = 3L)
query_pathway(tr, c("A","B"))
query_pathway(tr, "A -> B", next_state = "C")
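The difference between the two lookup modes can be sketched with a plain named list standing in for the tree. Everything below is hypothetical: the list layout, the "(root)" key, and the helper lookup() are illustrative stand-ins, not the package's internal structure.

```r
# Stand-in node store: names are arrow-notation contexts,
# values are next-state distributions ("(root)" is the empty context).
nodes <- list(
  "(root)" = c(A = 0.4, B = 0.3, C = 0.3),
  "B"      = c(A = 0.2, B = 0.5, C = 0.3),
  "A -> B" = c(A = 0.1, B = 0.1, C = 0.8)
)

# Try the full history first, then drop the oldest state one at a time.
lookup <- function(nodes, history, exact = FALSE) {
  for (start in seq_along(history)) {
    key <- paste(history[start:length(history)], collapse = " -> ")
    if (!is.null(nodes[[key]])) return(nodes[[key]])
    if (exact) return(NA)            # exact mode: no fallback
  }
  nodes[["(root)"]]                  # nothing matched: root distribution
}

fallback <- lookup(nodes, c("C", "A", "B"))               # matches "A -> B"
strict   <- lookup(nodes, c("C", "A", "B"), exact = TRUE) # NA: not a node
unseen   <- lookup(nodes, "C")                            # falls back to root
```

The history C, A, B is not a node, so the default mode backs off to its longest stored suffix "A -> B", while the exact mode returns NA.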
Returns one row per held-out (sequence, position) with the matched context, predicted probability of the observed next state, and log-likelihood contribution. Useful for diagnostic plots showing where the model is confident vs. surprised.
score_positions(tree, newdata, worst = NULL)
tree |
A |
newdata |
Sequence data. |
worst |
Integer or |
A data.frame with columns sequence_id,
position, matched_context, observed,
predicted_prob, log_lik.
fit <- replicate(40, sample(c("A", "B", "C"), 10, replace = TRUE), simplify = FALSE)
tree <- context_tree(fit, max_depth = 1L)
new <- replicate(5, sample(c("A", "B", "C"), 10, replace = TRUE), simplify = FALSE)
score_positions(tree, new, worst = 5L)
Returns one row per held-out sequence with its log-likelihood,
number of scored positions, and per-sequence perplexity
(exp(-log_lik / n_scored)).
score_sequences(tree, newdata)
tree |
A |
newdata |
Sequence data. |
A data.frame with columns sequence_id,
n_scored, log_lik, perplexity.
fit <- replicate(40, sample(c("A", "B", "C"), 10, replace = TRUE), simplify = FALSE)
tree <- context_tree(fit, max_depth = 1L)
new <- replicate(5, sample(c("A", "B", "C"), 10, replace = TRUE), simplify = FALSE)
score_sequences(tree, new)
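The per-sequence columns follow directly from position-level scores. The sketch below rebuilds them in base R from a small made-up table that borrows the sequence_id and log_lik column names from score_positions(); it illustrates the arithmetic only and is not how the function is implemented.

```r
# Made-up position-level scores (column names as in score_positions())
pos <- data.frame(
  sequence_id = c(1, 1, 1, 2, 2),
  log_lik     = log(c(0.5, 0.25, 0.5, 0.2, 0.8))
)

# Sum the log-likelihood within each sequence, then count scored positions
per_seq <- aggregate(log_lik ~ sequence_id, data = pos, FUN = sum)
per_seq$n_scored   <- as.vector(table(pos$sequence_id))
per_seq$perplexity <- exp(-per_seq$log_lik / per_seq$n_scored)
per_seq
```

Because perplexity is normalised by n_scored, sequences of different lengths can be compared on the same scale.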
Returns the top n pathways by predictive sharpness – the
probability mass on their modal next state. High values indicate
strongly deterministic continuations; low values indicate
ambiguous next-state distributions.
sharp_pathways(tree, top = 10L, min_count = 1L)
tree |
A |
top |
Integer. Number of pathways to return. Default 10. |
min_count |
Integer. Drop pathways with fewer than this many occurrences. Default 1. |
A data.frame, same columns as tree_pathways,
sorted by next_probability descending.
seqs <- replicate(50, sample(c("A","B","C"), 12, replace = TRUE), simplify = FALSE)
tree <- context_tree(seqs, max_depth = 3)
sharp_pathways(tree, top = 5)
S3 simulate() method for transitiontrees objects. Wraps
generate_sequences with the standard nsim
argument name and an optional seed (set via
set.seed() when supplied).
## S3 method for class 'transitiontrees'
simulate(object, nsim = 5L, seed = NULL, length = 10L, start = NULL, ...)
object |
A |
nsim |
Integer. Number of sequences to simulate. Default 5. |
seed |
Integer or |
length |
Integer. Length of each simulated sequence. |
start |
NULL or character vector. Optional first state for each
sequence; see |
... |
Ignored. |
A character matrix of dimension nsim x length.
Replaces every node's probability vector with a new smoothing scheme without refitting the tree. Walks nodes top-down by depth so each node's parent is re-smoothed before its children read it.
smooth_tree(tree, smoothing = "floor")
tree |
A |
smoothing |
Smoothing specification: either a method name as a
string (uses defaults for that method's hyperparameters) or a list
of the form |
For "kneser_ney" the canonical continuation-distribution
formulation requires per-state type counts. transitiontrees does
not track these; the implementation uses the parent's smoothed
probability as the back-off distribution, an approximation
discussed in Begleiter, El-Yaniv & Yona (2004), JAIR 22, §3.
A new transitiontrees with re-smoothed probabilities. Counts
and topology are unchanged.
set.seed(1)
m <- matrix(sample(c("A","B","C"), 200, TRUE), 20)
tr <- context_tree(m, max_depth = 2L, min_count = 3L)
smooth_tree(tr, "kneser_ney")
smooth_tree(tr, list("kneser_ney", discount = 0.5))
Returns a new transitiontrees containing only the queried node and
its descendants. Node names are kept absolute (so the original
pathway is preserved as a key), but the returned object has a
local_root attribute pointing at the queried pathway.
subtree(tree, pathway)
tree |
A |
pathway |
Character. The root pathway (arrow-notation or character vector). Must exist in the tree. |
A new transitiontrees whose nodes and edges are restricted
to descendants of pathway. The alphabet, smoothing, and
other hyperparameters are copied unchanged. Printing the subtree
reports the context it was cut at (also stored in
attr(., "local_root") for programmatic use).
set.seed(1)
m <- matrix(sample(c("A","B","C"), 200, TRUE), 20)
tr <- context_tree(m, max_depth = 2L, min_count = 3L)
subtree(tr, "A") # prints "subtree of: A" in the banner
Summary of a Context Tree
## S3 method for class 'transitiontrees'
summary(object, ...)
object |
A |
... |
Ignored. |
A summary.transitiontrees object. The $table slot is
the canonical pathway data.frame from
tree_pathways (columns pathway,
depth, count, likely_next,
next_probability, divergence,
changes_prediction), re-sorted by (depth, -count) so
the structural tree order is read top-to-bottom.
Summarise a Context Tree Bootstrap
## S3 method for class 'transitiontrees_bootstrap'
summary(object, ...)
object |
A |
... |
Ignored. |
The per-pathway summary data.frame
(object$summary); see bootstrap_pathways
for the full column vocabulary.
Prints a compact verdict for a transitiontrees_group_comparison:
the omnibus behavioral and usage permutation p-values, and how many
pathways pass the FDR cutoff on each axis or flip their modal next
state between groups. Returns the per-pathway table invisibly.
## S3 method for class 'transitiontrees_group_comparison'
summary(object, alpha = 0.05, ...)
object |
A |
alpha |
Numeric. FDR cutoff used when counting significant pathways. Default 0.05. |
... |
Ignored. |
Invisibly, the per-pathway data.frame (object$pathways);
see compare_groups for the column vocabulary.
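Counting pathways that pass an FDR cutoff can be sketched in base R as follows. The manual does not name the adjustment method, so Benjamini-Hochberg is an assumption here, and the data.frame with its column names (`p_value`, `p_adj`) is a hypothetical toy rather than the real `object$pathways` schema.

```r
# Toy per-pathway p-values; column names are hypothetical.
pw <- data.frame(
  pathway = c("A", "B", "A -> B", "B -> C", "C -> A"),
  p_value = c(0.001, 0.20, 0.012, 0.04, 0.60)
)
alpha <- 0.05

# Benjamini-Hochberg adjustment (an assumption; the text only says "FDR").
pw$p_adj <- p.adjust(pw$p_value, method = "BH")

# Number of pathways whose adjusted p-value passes the cutoff.
n_significant <- sum(pw$p_adj <= alpha)
n_significant
```

Raising or lowering `alpha` changes only the count reported, not the adjusted p-values themselves.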
An example set of categorical learning-engagement trajectories used in
the transitiontrees examples and the “trajectories” vignette.
Each row is one learner, each column a time-step, and each cell the
engagement state at that step. Trailing NAs mark the end of a
trajectory. This wide character matrix is exactly the shape
context_tree() consumes.
trajectories
A character matrix with 138 rows (learners) and 15 columns
(time-steps). Three states: "Active", "Average",
"Disengaged".
Bundled example dataset.
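For data that start out as ragged sequences, the wide shape described above (one row per learner, trailing NAs after the last observed step) can be built in base R. The sequences below are made-up toy data using the same three state labels.

```r
# Ragged toy sequences (hypothetical data).
seqs <- list(
  c("Active", "Active", "Average"),
  c("Average", "Disengaged"),
  c("Active", "Average", "Average", "Disengaged")
)
n_steps <- max(lengths(seqs))

# Pad each sequence with trailing NAs, then stack the sequences as rows.
wide <- t(vapply(
  seqs,
  function(s) c(s, rep(NA_character_, n_steps - length(s))),
  character(n_steps)
))

dim(wide)
rowSums(!is.na(wide))   # observed length of each trajectory
```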
data(trajectories)
dim(trajectories)
tree <- context_tree(trajectories)
tree
For each non-root node in the tree, reports the Kullback-Leibler
divergence of its conditional next-state distribution against its
parent's. Large values flag contexts where extending memory by one
more step changes the prediction; the changes_prediction
column flags contexts where the most likely next state changes
between the node and its parent.
Renamed from path_dependence() to avoid a naming collision with
a sibling package.
tree_dependence(
  tree,
  base = 2,
  sort_by = c("divergence", "entropy_drop", "entropy", "count", "depth"),
  top = NULL
)
tree |
A |
base |
Numeric. Logarithm base for the KL divergence. Default
2 (bits). Use |
sort_by |
Character. Column to sort by, descending. One of
|
top |
Integer or |
This is the diagnostic that the tree's pruning rule (under
criterion = "KL") is comparing against its threshold. It
answers the substantive question: for which contexts does this
tree disagree with a memoryless / shorter-memory model, and where
does that disagreement actually flip the prediction?
The mean of n * KL across rows recovers, up to constants,
the chain-level mutual-information gain from the variable-depth
model over the order-1 model.
A data.frame with one row per non-root pathway, sorted by
divergence descending. Columns: pathway,
depth, count, divergence (Kullback-Leibler
divergence from the parent's prediction), entropy (Shannon
entropy of this pathway's next-state distribution),
entropy_before (entropy of the parent's distribution),
entropy_drop (entropy_before - entropy, the
uncertainty this step of history removes), likely_next
(this node's most likely next state), likely_before (the
parent context's most likely next state), changes_prediction
(likely_next != likely_before). The empty case returns a
0-row data.frame with the same schema.
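The quantities in these columns can be reproduced by hand for a single node. The sketch below uses base R and two made-up next-state distributions; `kl_divergence()` and `entropy()` are hypothetical helpers written for this illustration, not functions exported by the package.

```r
# Toy next-state distributions for a parent context and one child context.
parent <- c(A = 0.5, B = 0.3, C = 0.2)
child  <- c(A = 0.1, B = 0.7, C = 0.2)

kl_divergence <- function(p, q, base = 2) {
  keep <- p > 0                       # terms with p = 0 contribute nothing
  if (any(q[keep] == 0)) return(Inf)  # p supports a state that q rules out
  sum(p[keep] * log(p[keep] / q[keep], base = base))
}
entropy <- function(p, base = 2) {
  p <- p[p > 0]
  -sum(p * log(p, base = base))
}

divergence     <- kl_divergence(child, parent)   # child against parent
entropy_before <- entropy(parent)
entropy_after  <- entropy(child)
entropy_drop   <- entropy_before - entropy_after
likely_next    <- names(child)[which.max(child)]
likely_before  <- names(parent)[which.max(parent)]
changes_prediction <- likely_next != likely_before

c(divergence = divergence, entropy_drop = entropy_drop)
changes_prediction
```

Here the extra step of history both sharpens the distribution (positive `entropy_drop`) and flips the modal next state from `A` to `B`.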
Cover, T.M. & Thomas, J.A. (2006). Elements of Information Theory, 2nd ed. Wiley.
seqs <- replicate(50, sample(c("A","B","C"), 12, replace = TRUE), simplify = FALSE)
tree <- context_tree(seqs, max_depth = 3)
pruned <- prune_tree(tree, criterion = "G2")
tree_dependence(pruned)
Returns a single scalar: a count-weighted average of the per-context symmetric Kullback-Leibler divergence over the union of pathways present in either tree. No null distribution is computed.
tree_distance(tree_a, tree_b, symmetric = TRUE)
tree_a, tree_b
|
context trees with matching alphabets. |
symmetric |
Logical. |
Numeric scalar.
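A count-weighted average of symmetric KL divergence can be sketched in base R as below. Several choices are assumptions made for illustration: the symmetric form is taken as the mean of the two directed divergences, weights are the pooled counts, only contexts present in both toy "trees" are compared (the function itself is documented as working over the union), and all probabilities are strictly positive. The list structure and `toy_distance()` are hypothetical.

```r
sym_kl <- function(p, q, base = 2) {
  kl <- function(a, b) {
    k <- a > 0
    sum(a[k] * log(a[k] / b[k], base = base))
  }
  (kl(p, q) + kl(q, p)) / 2   # mean of the two directed divergences
}

# Per-context next-state distributions and counts for two toy "trees".
tree_a <- list(
  prob  = list("(start)" = c(A = 0.5, B = 0.5), "A" = c(A = 0.8, B = 0.2)),
  count = c("(start)" = 100, "A" = 40)
)
tree_b <- list(
  prob  = list("(start)" = c(A = 0.5, B = 0.5), "A" = c(A = 0.4, B = 0.6)),
  count = c("(start)" = 100, "A" = 60)
)

toy_distance <- function(a, b) {
  shared <- intersect(names(a$prob), names(b$prob))
  d <- vapply(shared, function(ctx) sym_kl(a$prob[[ctx]], b$prob[[ctx]]),
              numeric(1))
  w <- a$count[shared] + b$count[shared]   # weight contexts by pooled count
  sum(w * d) / sum(w)
}

toy_distance(tree_a, tree_b)
```

Because the root distributions agree, all of the distance comes from the `A` context, scaled down by its share of the pooled counts.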
set.seed(1)
m1 <- matrix(sample(c("A","B","C"), 200, TRUE), 20)
m2 <- matrix(sample(c("A","B","C"), 200, TRUE), 20)
tr1 <- context_tree(m1, max_depth = 2L, min_count = 3L)
tr2 <- context_tree(m2, max_depth = 2L, min_count = 3L)
tree_distance(tr1, tr2)
Returns a tidy data.frame with one row per pathway (= context) in the tree. The pathway is the sequence of states the tree conditions on; each row reports the count, depth, modal next state, and how surprising the next-state distribution is relative to a shorter history. This is the substantive consumer-facing API of a fitted tree – a ranked list of trajectories that the data actually supports, with a consistent interpretive frame.
Renamed from pathways() to avoid a naming collision with a
sibling package.
tree_pathways(
  tree,
  min_count = 1L,
  sort_by = c("count", "divergence", "depth"),
  decreasing = TRUE,
  ...
)
tree |
A |
min_count |
Integer. Drop pathways with fewer than this many occurrences. Default 1. |
sort_by |
Character. One of |
decreasing |
Logical. Default |
... |
Ignored. |
Each row is a pathway – a (possibly empty) sequence of states ending
at the point where a prediction is made. The divergence column
quantifies how much more information the pathway carries than its
(k-1)-suffix in bits. changes_prediction = TRUE marks pathways
where the longer history changes which next state is most likely.
A data.frame with columns pathway (arrow notation,
e.g. "A -> B -> C"; the root is reported as "(start)"),
depth (history length), count, likely_next
(the most likely next state), next_probability (its
probability), divergence (Kullback-Leibler divergence from
the parent context's prediction, in bits; NA for the root
and when the parent context is absent, and Inf when the
pathway places probability on a state the parent predicts with
probability 0), and changes_prediction (logical, did the
most likely next state change vs the parent context?). The empty
case returns a 0-row data.frame with the same schema.
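The arrow notation in the `pathway` column is easy to take apart with base R when the individual states are needed. `parse_pathway()` below is a hypothetical helper written for this illustration; it assumes the separator is exactly `" -> "` and that the root is labelled `"(start)"`, as described above.

```r
# Split an arrow-notation pathway into its states; the root maps to an
# empty history.
parse_pathway <- function(x, root = "(start)") {
  if (identical(x, root)) return(character(0))
  strsplit(x, " -> ", fixed = TRUE)[[1]]
}

states <- parse_pathway("A -> B -> C")
states
length(states)                     # history length of this pathway
length(parse_pathway("(start)"))   # history length of the root
paste(states, collapse = " -> ")   # back to arrow notation
```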
seqs <- replicate(50, sample(c("A","B","C"), 12, replace = TRUE), simplify = FALSE)
tree <- context_tree(seqs, max_depth = 3)
tree_pathways(tree)
Runs k-fold cross-validation over a grid of fitting and pruning
hyperparameters. Returns a data.frame ranked by held-out perplexity.
The configuration with minimum perplexity is exposed via
attr(result, "best").
Folds are at the sequence level (each fold holds out whole sequences, not positions within sequences).
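Sequence-level folds can be sketched in base R as follows: each row of a wide matrix is one sequence, and a fold label is attached to the whole row. This is a minimal illustration of the idea, not the package's fold assignment; the shuffled balanced labels are an assumption.

```r
set.seed(1)
m <- matrix(sample(c("A", "B", "C"), 30 * 12, replace = TRUE), 30, 12)
folds <- 5L

# One fold label per sequence (row): a balanced label vector, shuffled.
fold_id <- sample(rep_len(seq_len(folds), nrow(m)))

for (k in seq_len(folds)) {
  train <- m[fold_id != k, , drop = FALSE]   # whole sequences used for fitting
  test  <- m[fold_id == k, , drop = FALSE]   # whole sequences held out
  # Every sequence lands in exactly one of the two sets.
  stopifnot(nrow(train) + nrow(test) == nrow(m))
}

table(fold_id)
```

Splitting by row keeps every position of a held-out sequence out of the training data, so a sequence never contributes to both fitting and scoring within a fold.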
tune_tree(
  data,
  max_depth = 2L:5L,
  min_count = c(3L, 5L, 10L),
  smoothing = "floor",
  prune = c(FALSE, TRUE),
  alpha = 0.05,
  folds = 5L,
  seed = 1L,
  actor = NULL,
  time = NULL,
  action = NULL,
  order = NULL,
  session = NULL,
  time_threshold = 900
)
data |
Sequence data; format accepted by |
max_depth |
Integer vector. Grid values for tree depth.
Default |
min_count |
Integer vector. Grid for minimum-count threshold.
Default |
smoothing |
Smoothing grid. A character vector of method names
(e.g. |
prune |
Logical vector. Whether to apply G^2 pruning.
Default |
alpha |
Numeric. Significance level for G^2 pruning when
|
folds |
Integer. Number of CV folds. Default 5. |
seed |
Integer. RNG seed for reproducible folds. Default 1. |
actor, time, action, order, session, time_threshold
|
Long-format
reshaping, forwarded to |
A transitiontrees_tune object: a data.frame with one row per
grid point and columns max_depth, nmin,
smoothing, prune, logLik, n_scored,
perplexity, n_nodes_avg, and folds_failed (the
number of CV folds that errored for that configuration), sorted by
perplexity ascending. attr(result, "best") carries the
minimum-perplexity row among configurations whose every fold
scored; it is NULL if none did. A warning is issued when any
configuration had a failed fold.
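The relationship between `logLik`, `n_scored`, and `perplexity`, and the rule for picking `"best"`, can be sketched in base R. The perplexity formula below is the usual definition (exponential of the negative mean log-likelihood) and is assumed rather than confirmed for this package; the probabilities and the grid values are made-up toy numbers.

```r
# Toy held-out probabilities assigned to each observed next state.
p_observed <- c(0.5, 0.25, 0.5, 0.125)
log_lik    <- sum(log(p_observed))
n_scored   <- length(p_observed)
perplexity <- exp(-log_lik / n_scored)
perplexity

# A toy grid sorted by perplexity; "best" is taken only from rows in which
# no fold failed, and is NULL when there is no such row.
grid <- data.frame(
  max_depth    = 1:3,
  perplexity   = c(2.9, 2.7, 2.8),
  folds_failed = c(0L, 1L, 0L)
)
grid <- grid[order(grid$perplexity), ]
ok   <- grid[grid$folds_failed == 0L, ]
best <- if (nrow(ok) > 0) ok[1, ] else NULL
best$max_depth
```

In this toy grid the row with the lowest perplexity is skipped because one of its folds failed, so the next-best fully scored row is selected.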
set.seed(1)
m <- matrix(sample(c("A","B","C"), 30 * 12, replace = TRUE), 30, 12)
tune_tree(m, max_depth = 1:3, smoothing = c("floor", "kneser_ney"), prune = FALSE, folds = 4)