| Title: | Augmented Modelling of Relational Events |
|---|---|
| Description: | Utilities for simulating and prototyping relational event models, including helpers to generate dynamic event sequences and covariate processes for sender and receiver sets. The endogenous-effect and case-control estimation machinery follows Juozaitiene and Wit (2024) <doi:10.1093/jrsssa/qnae132>. |
| Authors: | Francisco Richter [aut, cre], Martina Boschi [aut], Ernst C. Wit [aut], Melania Lembo [aut] |
| Maintainer: | Francisco Richter <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.0 |
| Built: | 2026-06-29 19:58:49 UTC |
| Source: | https://github.com/cran/amorem |
This helper augments an event log with sender and/or receiver covariates that live in separate lookup tables. It is designed for static covariates (one row per actor). Dynamic covariates should be merged manually before calling this helper.
attach_static_covariates( event_log, sender_covariates = NULL, receiver_covariates = NULL, actor_col = "actor", sender_prefix = "sender_", receiver_prefix = "receiver_", allow_missing = TRUE )attach_static_covariates( event_log, sender_covariates = NULL, receiver_covariates = NULL, actor_col = "actor", sender_prefix = "sender_", receiver_prefix = "receiver_", allow_missing = TRUE )
event_log |
A standardized event log containing columns |
sender_covariates, receiver_covariates
|
Data frames with one row per
actor. Each must include the identifier column specified by |
actor_col |
Name of the identifier column inside the covariate tables. |
sender_prefix, receiver_prefix
|
Prefixes applied to the appended covariate column names. |
allow_missing |
Logical; if |
The input event_log with additional columns for each covariate table
supplied.
Per-actor covariates for the classroom_events event stream.
classroom_actorsclassroom_actors
A data frame with 20 rows and 3 columns:
Character actor id matching the sender/receiver columns
of classroom_events.
Factor "F" / "M" — biological sex.
Factor with levels "instructor", "grade_11",
"grade_12".
McFarland (2001), via networkDynamic. See classroom_events.
Time-stamped directed interactions among 20 individuals in a single
US high-school class session, recorded on 16 October 1996 by Daniel
McFarland. The same data appear in the networkDynamic R package as
McFarland_cls33_10_16_96; this is a tidy event-table form.
classroom_eventsclassroom_events
A data frame with 691 rows and 5 columns:
Minutes since the start of the class period.
Character actor id matching classroom_actors$id.
Character actor id matching classroom_actors$id.
Factor with levels "social", "sanction",
"task".
Integer weight of the interaction.
McFarland, D. (2001). Student resistance: How the formal and
informal organization of classrooms facilitate everyday forms of
student defiance. American Journal of Sociology 107(3), 612–678.
doi:10.1086/338779. Redistributed via the networkDynamic R
package (CRAN), dataset McFarland_cls33_10_16_96.
Directed time-stamped instant messages between students of the University of California, Irvine over 193 days in 2004. Each row is one message. Sourced from the SNAP repository.
college_msgcollege_msg
A data frame with 59,835 rows and 3 columns:
Days since the first message.
attr(., "unix_origin") holds the Unix epoch of time = 0.
Character user id.
Character user id.
Panzarasa, P., Opsahl, T., Carley, K. (2009). Patterns and dynamics of users' behavior and interaction: Network analysis of an online community. Journal of the American Society for Information Science and Technology 60(5), 911–932. Distributed via SNAP: https://snap.stanford.edu/data/CollegeMsg.html.
Superseded by rem(), the unified front-end for fitting relational
event models on preprocessed case-control data. compare_models() remains
fully supported.
Convenience wrapper that runs the canonical case-control / no-intercept
binomial-GLM recipe on every specification in models and returns a
tidy AIC comparison table. One case-control sample is drawn from
event_log and shared across every specification so that the AIC
values are directly comparable.
Each specification is a character vector of stat names accepted by
endogenous_features(). The function computes the union of
all stats once, builds case-minus-control differences, and fits one
binomial GLM per specification with the appropriate subset of
columns. The fitted models are equivalent to the partial-likelihood
parametrisation used in case-control REM inference
(Vu et al. 2017; Juozaitienė & Wit 2024).
For n_controls = 1 the helper fits a no-intercept binomial GLM
on case-minus-control differences. For n_controls > 1 it falls
back to survival::clogit() — a true conditional-logistic fit
that correctly handles multiple controls per stratum. The
survival package is in Suggests and is required only when
n_controls > 1.
compare_models( event_log, models, n_controls = 1, scope = c("all", "appearance", "citation"), mode = c("one", "two"), random_effects = NULL, half_life = NULL, seed = NULL, keep_fits = FALSE )compare_models( event_log, models, n_controls = 1, scope = c("all", "appearance", "citation"), mode = c("one", "two"), random_effects = NULL, half_life = NULL, seed = NULL, keep_fits = FALSE )
event_log |
Data frame with |
models |
Named list of character vectors. Each entry names one
candidate specification; the vector contents are the
endogenous statistics it includes. Stats must be valid names for
|
n_controls |
Number of controls per case in
|
scope, mode
|
Passed through to |
random_effects |
Optional character vector. May be any of
|
half_life |
Required when any specification contains an exp-decay stat. Shared across all specs that use one. |
seed |
Optional integer seed for the case-control sample. |
keep_fits |
Logical; when |
A data frame with one row per specification and columns
model, n_terms, n_obs, log_lik, AIC, delta_AIC. Sorted
ascending by AIC. The model with the lowest AIC has
delta_AIC = 0.
Juozaitienė R, Wit EC (2024). It's about time: revisiting reciprocity and triadicity in relational event analysis. Journal of the Royal Statistical Society Series A 188(4), 1246-1262. doi:10.1093/jrsssa/qnae132.
endogenous_features(), sample_non_events().
data(classroom_events) compare_models( classroom_events, models = list( count = c("reciprocity_count", "transitivity_count"), continuous = c("reciprocity_time_recent", "transitivity_time_recent"), interrupted = c("reciprocity_time_recent_interrupted", "transitivity_time_recent_interrupted")), seed = 11)data(classroom_events) compare_models( classroom_events, models = list( count = c("reciprocity_count", "transitivity_count"), continuous = c("reciprocity_time_recent", "transitivity_time_recent"), interrupted = c("reciprocity_time_recent_interrupted", "transitivity_time_recent_interrupted")), seed = 11)
Superseded by rem(), the unified front-end for fitting relational
event models on preprocessed case-control data. compare_models_global()
remains fully supported.
Implements the time-shifted partial likelihood of Lembo,
Juozaitienė, Vinciotti & Wit (2025) for fitting relational event
models with global covariate effects — covariates that are
time-dependent but constant across all interacting pairs (e.g.\
temperature, time of day, the residual baseline hazard). Standard
case-control partial likelihood cannot identify these because
global terms cancel in the rate ratio; this function follows the
paper's Section 4 recipe: a random per-dyad time shift breaks the
cancellation, and with one non-event per event the partial
likelihood reduces to a degenerate logistic additive model fit by
mgcv::gam().
Per the paper's equations 11-13:
where each is the difference between the
(smooth) function evaluated at the focal event time and at the
sampled non-event's shifted time
.
Shift distribution. Per-dyad shifts are drawn
independently from an exponential distribution with mean
where
is the average inter-arrival time
in event_log. The paper's simulation studies find that
works in practice and that the estimates
are robust to choices in .
Specification format. Each entry of models is a named character
vector mapping a covariate name (a statistic in
endogenous_features() or a column of
global_covariates) to an effect type:
"linear" – linear beta * x term.
"nl" – smooth s(x) (thin-plate, paper's default).
"tv" – smooth s(time, by = x) (time-varying).
"tvnl" – tensor product te(time, x).
"global_smooth" – smooth s(x_global) evaluated at the
focal time vs. the non-event's shifted time (the paper's
g_b(x^{(b)}(t)) family).
"global_cyclic" – cyclic smooth s(x_global, bs = "cc")
for time-of-day-like covariates with a periodic domain.
"global_time" – a smooth on time itself, recovering
the residual time effect of paper eq. 3.
compare_models_global( event_log, models, global_covariates = NULL, scope = c("all", "appearance", "citation"), mode = c("one", "two"), half_life = NULL, shift_scale = 1, k = NULL, k_cyclic = 10, seed = NULL, keep_fits = FALSE )compare_models_global( event_log, models, global_covariates = NULL, scope = c("all", "appearance", "citation"), mode = c("one", "two"), half_life = NULL, shift_scale = 1, k = NULL, k_cyclic = 10, seed = NULL, keep_fits = FALSE )
event_log |
Data frame with |
models |
Named list of specifications (see "Specification format" above). |
global_covariates |
Optional data frame with a |
scope, mode
|
Passed through to |
half_life |
Required when any dyadic spec uses an exp-decay stat. |
shift_scale |
Multiplier on the average inter-arrival time for the exponential shift distribution. Defaults to 1. |
k |
Optional knot count for smooth terms (see
|
k_cyclic |
Knot count for |
seed |
Integer seed for the case-control sample and the shift draws. |
keep_fits |
Logical; when |
Data frame with one row per specification and columns
model, n_terms, n_obs, log_lik, AIC, delta_AIC.
Lembo M, Juozaitienė R, Vinciotti V, Wit EC (2025). Relational event models with global covariates: an application to bike sharing. Journal of the Royal Statistical Society, Series C. doi:10.1093/jrsssc/qlaf058.
compare_models() (linear, no globals),
compare_models_smooth() (smooth dyadic effects, no globals).
data(classroom_events) # Hourly temperature track on the same time axis: g <- data.frame(time = seq(0, max(classroom_events$time), length = 50), temperature = rnorm(50, 20, 5)) compare_models_global( classroom_events, models = list( dyadic_only = c(reciprocity_count = "linear", transitivity_count = "linear"), with_global = c(reciprocity_count = "linear", transitivity_count = "linear", temperature = "global_smooth", time = "global_time")), global_covariates = g, seed = 11, k = 5)data(classroom_events) # Hourly temperature track on the same time axis: g <- data.frame(time = seq(0, max(classroom_events$time), length = 50), temperature = rnorm(50, 20, 5)) compare_models_global( classroom_events, models = list( dyadic_only = c(reciprocity_count = "linear", transitivity_count = "linear"), with_global = c(reciprocity_count = "linear", transitivity_count = "linear", temperature = "global_smooth", time = "global_time")), global_covariates = g, seed = 11, k = 5)
Superseded by rem(), which fits the same smooth (TV / NL / TVNL)
effects on preprocessed case-control data. compare_models_smooth() remains
fully supported.
Mirrors compare_models() but lets each statistic in a specification
take one of four effect types instead of a single linear coefficient:
linear, time-varying (TV), non-linear (NL), or jointly time-varying
non-linear (TVNL). The smooth machinery follows
Boschi, Lerner & Wit (2025); the matrix-of-event-vs-non-event trick
is documented in their Section 3.3.
For each specification:
One case-control sample is drawn from event_log with
n_controls = 1 (paired event / non-event design).
For every requested statistic, both the case (event) and the
control (non-event) features are computed via
endogenous_features().
The mgcv design uses the case-vs-control matrix trick:
linear -> a single coefficient on case - control (column d_stat).
tv -> s(time, by = d_stat) — smooth in time, multiplied by d_stat.
nl -> s(stat_mat, by = I_mat) where stat_mat is a
two-column matrix cbind(case, control) and I_mat is
cbind(1, -1).
tvnl -> te(time_mat, stat_mat, by = I_mat) tensor product
smooth, with time_mat both columns equal to the event time vector.
The model is fitted with mgcv::gam and a degenerate logistic
likelihood: response = rep(1, n), formula = one ~ -1 + ...,
family = binomial. This matches Boschi et al. equation 8.
AIC values are directly comparable across specifications because every
fit uses the same case-control sample. Returns the same tidy
data.frame as compare_models().
compare_models_smooth( event_log, models, scope = c("all", "appearance", "citation"), mode = c("one", "two"), half_life = NULL, k = NULL, seed = NULL, keep_fits = FALSE )compare_models_smooth( event_log, models, scope = c("all", "appearance", "citation"), mode = c("one", "two"), half_life = NULL, k = NULL, seed = NULL, keep_fits = FALSE )
event_log |
Data frame with |
models |
Named list of specifications. Each entry is itself a
named character vector (or named list) mapping statistic names to
effect types:
list(
linear = c(reciprocity_count = "linear",
transitivity_count = "linear"),
nl = c(reciprocity_time_recent = "nl",
transitivity_time_recent = "nl"),
tvnl = c(reciprocity_time_recent = "tvnl",
transitivity_time_recent = "tvnl"))
|
scope, mode
|
Passed through to |
half_life |
Required when an exp-decay statistic is requested. |
k |
Optional integer: knot count for |
seed |
Integer seed for the case-control sample. |
keep_fits |
Logical; when |
Data frame with one row per specification and columns
model, n_terms, n_obs, log_lik, AIC, delta_AIC.
Boschi M, Lerner J, Wit EC (2025). Beyond Linearity and Time-Homogeneity: Relational Hyper Event Models with Time-Varying Non-Linear Effects. arXiv:2509.05289.
compare_models() for the linear-only variant.
data(classroom_events) compare_models_smooth( classroom_events, models = list( linear = c(reciprocity_time_recent = "linear", transitivity_time_recent = "linear"), nl = c(reciprocity_time_recent = "nl", transitivity_time_recent = "nl"), tvnl = c(reciprocity_time_recent = "tvnl", transitivity_time_recent = "tvnl")), seed = 11)data(classroom_events) compare_models_smooth( classroom_events, models = list( linear = c(reciprocity_time_recent = "linear", transitivity_time_recent = "linear"), nl = c(reciprocity_time_recent = "nl", transitivity_time_recent = "nl"), tvnl = c(reciprocity_time_recent = "tvnl", transitivity_time_recent = "tvnl")), seed = 11)
Returns the names of the endogenous statistics that
endogenous_features() evaluates with the compiled C++ engine.
Statistics outside this set are computed by the (slower) pure-R fallback.
A character vector of statistic names.
length(cpp_supported_stats()) head(cpp_supported_stats())length(cpp_supported_stats()) head(cpp_supported_stats())
A 56 × 56 matrix of pairwise geographic distances (in metres) between
US states and territories, computed from boundary geometries via
sf::st_distance.
dist_matrixdist_matrix
A numeric matrix with 56 rows and 56 columns. Row and column names are state/territory names.
Computed from US Census TIGER/Line shapefiles using the tigris, sf, and geosphere packages. See Walker (2024), Pebesma (2018), Hijmans (2022).
Internal emails between members of a single department of a large European research institution over ~803 days. The dataset has been filtered to remove self-loops. Sourced from the SNAP repository as a single-department slice of the email-Eu-core-temporal dataset.
email_eu_coreemail_eu_core
A data frame with 12,216 rows and 3 columns:
Days since the first email in the recording window.
Character employee id (anonymised).
Character employee id (anonymised).
Paranjape, A., Benson, A.R., Leskovec, J. (2017). Motifs in temporal networks. WSDM '17, 601–610. doi:10.1145/3018661.3018731. Distributed via SNAP: https://snap.stanford.edu/data/email-Eu-core-temporal.html.
Given a standardized relational event log, this helper derives historical statistics for each event based on the evolving network. The statistics follow the taxonomy of Juozaitienė and Wit (2025, JRSS-A) and cover reciprocity, transitivity, cyclic closure, sending balance and receiving balance. All definitions use the continuous convention (effects persist even after a closure event).
endogenous_features( event_log, stats = c("sender_outdegree", "receiver_indegree", "reciprocity", "recency"), half_life = NULL, sort = TRUE, history_log = NULL, prior_log = NULL )endogenous_features( event_log, stats = c("sender_outdegree", "receiver_indegree", "reciprocity", "recency"), half_life = NULL, sort = TRUE, history_log = NULL, prior_log = NULL )
event_log |
A data.frame containing at least |
stats |
Character vector of statistics to compute. See Details for the full list of allowed values. |
half_life |
Positive numeric; the half-life parameter |
sort |
Logical; when |
history_log |
Optional data.frame giving the authoritative event
history (columns |
prior_log |
Optional data.frame of events that precede the study window
(columns |
All statistics are evaluated immediately before the event is logged. They are grouped into five families.
Degree / baseline:
sender_outdegreeNumber of events previously sent by the sender.
receiver_indegreeNumber of events previously received by the receiver.
recencyElapsed time since the last event on the same ordered
pair; NA when the dyad is brand new.
Reciprocity — reverse-dyad (receiver sender) history:
reciprocity / reciprocity_binary
1 if the reverse dyad has ever been observed, 0 otherwise.
reciprocity_countTotal count of past reverse-dyad events.
reciprocity_exp_decayExponentially weighted sum of past
reverse-dyad events (requires half_life).
reciprocity_time_recentElapsed time since the most recent
reverse-dyad event; NA if none.
reciprocity_time_firstElapsed time since the first
reverse-dyad event; NA if none.
Transitivity — two-path :
transitivity_binary1 if any intermediary exists with
both and before .
transitivity_countNumber of such intermediaries.
transitivity_binary_orderedLike binary but requiring
to precede .
transitivity_count_orderedCount with order restriction.
transitivity_exp_decayExp-decay weighted sum over two-paths
(requires half_life).
transitivity_exp_decay_orderedExp-decay with order restriction.
transitivity_time_recentTime since the most recently completed
two-path; NA if none.
transitivity_time_firstTime since the earliest two-path; NA
if none.
transitivity_time_recent_orderedTime since the most recent
ordered two-path; NA if none.
transitivity_time_first_orderedTime since the earliest ordered
two-path; NA if none.
Cyclic closure — two-path , closed by
:
cyclic_binary1 if any cyclic two-path exists.
cyclic_countNumber of cyclic intermediaries.
cyclic_time_recentTime since the most recent cyclic two-path
formation; NA if none.
cyclic_time_firstTime since the first cyclic two-path
formation; NA if none.
Sending balance — shared target: both and
exist:
sending_balance_binary1 if any shared target exists.
sending_balance_countNumber of shared targets.
sending_balance_time_recentTime since the most recent
shared-target two-path formation; NA if none.
sending_balance_time_firstTime since the first
shared-target two-path formation; NA if none.
Receiving balance — shared source: both and
exist:
receiving_balance_binary1 if any shared source exists.
receiving_balance_countNumber of shared sources.
receiving_balance_time_recentTime since the most recent
shared-source two-path formation; NA if none.
receiving_balance_time_firstTime since the first
shared-source two-path formation; NA if none.
The statistic "sender_receivers_set" is special: it adds a list-column
in which each element is the character vector of receivers the row's sender
has reached before that row (the building block for set-valued endogenous
covariates, e.g. an alien species' previously invaded regions). It honours
history_log, so it can be computed for sampled non-events without those
non-events polluting the history.
The event log with added columns, one per requested statistic
(sender_receivers_set is added as a list-column).
data(classroom_events) feats <- endogenous_features(classroom_events, stats = c("reciprocity", "recency")) head(feats)data(classroom_events) feats <- endogenous_features(classroom_events, stats = c("reciprocity", "recency")) head(feats)
Implements the auxiliary-statistic test of Boschi & Wit (2025),
Section 3.7 / eq. 20. Tests whether a covariate auxiliary that is
not part of model has nonetheless been adequately captured
indirectly by the fitted model. Uses the simulation-based p-value
described in the paper: n_sim replicates of
are drawn from i.i.d. standard
normals, the test statistic
is
computed, and the empirical p-value is the fraction of replicates
with .
gof_auxiliary( event_log, model, auxiliary, n_sim = 1000, scope = "all", mode = "one", half_life = NULL, seed = NULL )gof_auxiliary( event_log, model, auxiliary, n_sim = 1000, scope = "all", mode = "one", half_life = NULL, seed = NULL )
event_log |
Dyadic event log. |
model |
Named character vector of |
auxiliary |
Name of the statistic to test as an unmodelled
feature; must be a statistic computable by
|
n_sim |
Number of Monte Carlo replicates (default 1000). |
scope, mode, half_life, seed
|
See |
List with statistic (), p_value,
G, u, and auxiliary.
Boschi M, Wit EC (2025). Goodness of fit in relational event models. Statistics and Computing 36(4).
Implements the omnibus test of Boschi & Wit (2025), Section 3.6 /
eq. 19. Runs gof_univariate() per covariate in model, then
combines the resulting p-values via the Cauchy combination
(Liu & Xie 2020), with analytic p-value
.
gof_global( event_log, model, scope = "all", mode = "one", half_life = NULL, seed = NULL )gof_global( event_log, model, scope = "all", mode = "one", half_life = NULL, seed = NULL )
event_log |
Dyadic event log. |
model |
Named character vector of |
scope, mode, half_life, seed
|
See |
List with statistic (), p_value,
and components (per-covariate data.frame with covariate,
statistic, p_value).
Boschi M, Wit EC (2025). Goodness of fit in relational event models. Statistics and Computing 36(4).
Implements the multivariate test of Boschi & Wit (2025), Section 3.4.
Builds a q-dimensional cumulative residual process from the spline
basis of the requested covariate's smooth effect, normalises by the
inverse-square-root of the empirical variance-covariance matrix
(eq. 17), and tests against a q-dimensional
standard Brownian bridge via .
The p-value is computed empirically by simulating n_sim Brownian
bridge trajectories.
gof_multivariate( event_log, model, covariate, k_basis = 5, n_sim = 1000, scope = "all", mode = "one", half_life = NULL, seed = NULL )gof_multivariate( event_log, model, covariate, k_basis = 5, n_sim = 1000, scope = "all", mode = "one", half_life = NULL, seed = NULL )
event_log |
Dyadic event log. |
model |
Named character vector of |
covariate |
Name of the covariate to test under a smooth effect. |
k_basis |
Spline-basis dimension for |
n_sim |
Number of simulated Brownian bridges for the empirical p-value (default 1000). |
scope, mode, half_life, seed
|
See |
List with statistic (), p_value,
W (n x q matrix), u, and covariate.
Boschi M, Wit EC (2025). Goodness of fit in relational event models. Statistics and Computing 36(4).
Implements the univariate cumulative martingale residual test of
Boschi & Wit (2025), Section 3.3. The test statistic is
where
is the normalised cumulative score process for
the requested covariate; under correct specification
converges to a standard Brownian bridge, so the p-value follows the
Kolmogorov-Smirnov distribution
.
gof_univariate( event_log, model, covariate, scope = "all", mode = "one", half_life = NULL, seed = NULL )gof_univariate( event_log, model, covariate, scope = "all", mode = "one", half_life = NULL, seed = NULL )
event_log |
Dyadic event log. |
model |
Named character vector of |
covariate |
Name of the covariate in |
scope, mode, half_life, seed
|
See |
A list with statistic (), p_value (KS),
W (numeric vector of length n, the normalised process),
and u (the time grid in [0, 1]).
Boschi M, Wit EC (2025). Goodness of fit in relational event models. Statistics and Computing 36(4).
data(classroom_events) gof_univariate(classroom_events, model = c(reciprocity_count = "linear", transitivity_count = "linear"), covariate = "reciprocity_count", seed = 1)data(classroom_events) gof_univariate(classroom_events, model = c(reciprocity_count = "linear", transitivity_count = "linear"), covariate = "reciprocity_count", seed = 1)
For a focal candidate hyperedge ,
activity(t, I, J) counts the number of past events
with
satisfying AND
.
hyperedge_activity(hyperedge_log, I, J = character(0), t)hyperedge_activity(hyperedge_log, I, J = character(0), t)
hyperedge_log |
A hyperedge log (see |
I |
Character vector of sender names defining the focal subset. |
J |
Character vector of receiver names defining the focal
subset. Pass |
t |
Focal time. Only events strictly before |
A single non-negative integer.
Lerner J, Boschi M, Wit EC (2025). Subset repetition.
Hyperedge analogue of endogenous_features(). Accepts a
hyperedge log (see hyperedge_log()) and computes hyperedge-native
statistics, falling back to the dyadic engine for stat names that
belong to the standard dyadic endogenous catalogue.
hyperedge_features(hyperedge_log, stats, half_life = NULL)hyperedge_features(hyperedge_log, stats, half_life = NULL)
hyperedge_log |
A hyperedge log (see |
stats |
Character vector of statistic names. Mix of hyperedge-
native names listed above and the dyadic catalogue names accepted
by |
half_life |
Required when an exp-decay statistic is requested (only applies to delegated dyadic stats; hyperedge subrep does not use a half-life). |
Recognised hyperedge stat names:
"subrep_<rho>_<l>"Directed subset repetition (paper eq. 4).
rho = sender-side subset cardinality (1..|I|),
l = receiver-side subset cardinality (0..|J|, 0 = ignore receivers).
Examples: "subrep_1_1" (average activity over single-actor sub-pairs),
"subrep_2_1" (over pair-of-senders × single-receiver subpairs).
"subrep_<rho>"Undirected subset repetition. Equivalent to
"subrep_<rho>_0"; counts past events whose participant set is a
superset of the chosen subset, with no receiver-side restriction.
"activity"Counts past events whose participant set covers
the focal event's entire (I, J) pair. Equivalent to
"subrep_<|I|>_<|J|>".
For dyadic-shaped events (every row has |I| = |J| = 1) and a
dyadic stat name, this function delegates to
endogenous_features() via as_dyadic_log().
The hyperedge log with one added column per requested stat.
Boschi M, Lerner J, Wit EC (2025). Beyond Linearity and Time-Homogeneity: Relational Hyper Event Models with Time-Varying Non-Linear Effects. arXiv:2509.05289.
hyperedge_subrep(), hyperedge_activity(),
endogenous_features().
hl <- hyperedge_log( I = list(c("a","b"), c("a","c"), c("b","c"), c("a","b","c")), J = list(c("X"), c("X","Y"), c("Y"), c("X")), time = c(1, 2, 3, 4)) hyperedge_features(hl, stats = c("subrep_1_1", "subrep_2_1", "activity"))hl <- hyperedge_log( I = list(c("a","b"), c("a","c"), c("b","c"), c("a","b","c")), J = list(c("X"), c("X","Y"), c("Y"), c("X")), time = c(1, 2, 3, 4)) hyperedge_features(hl, stats = c("subrep_1_1", "subrep_2_1", "activity"))
A hyperedge log generalises the dyadic
(sender, receiver, time) event log used elsewhere in amorem to a
(I, J, time) event log where I and J are list-columns
containing the set of senders and the set of receivers participating
in each hyperevent. This matches the data model of Boschi, Lerner &
Wit (2025): each event is a time-stamped directed hyperedge
from a sender set to a
receiver set.
hyperedge_log(I, J, time) is_hyperedge_log(x) as_hyperedge_log(event_log) as_dyadic_log(hyperedge_log)hyperedge_log(I, J, time) is_hyperedge_log(x) as_hyperedge_log(event_log) as_dyadic_log(hyperedge_log)
I |
List-column: each element is a character vector of sender
actor names participating in that event. Length-1 vectors are
allowed (and become standard dyadic events when combined with a
length-1 |
J |
List-column: each element is a character vector of receiver actor names. Empty character vectors are allowed and signal an undirected hyperevent. |
time |
Numeric vector of event times. Must be finite and non-decreasing after sorting. |
x |
A data frame or list-of-columns to test or convert. |
event_log |
A dyadic event log with |
hyperedge_log |
A hyperedge log produced by |
The constructor hyperedge_log() accepts list-columns directly and
performs validation (character members, non-empty sets, finite
times, sorted by time). as_hyperedge_log() promotes a dyadic
(sender, receiver, time) data frame to the hyperedge form by
wrapping each sender and receiver in a length-1 character
vector. as_dyadic_log() is the inverse: it succeeds only when
every row of the hyperedge log has a length-1 sender set AND a
length-1 receiver set.
For undirected hyperevents (e.g.\ multi-actor meetings), pass an
empty receiver set: J = list(character(0), character(0), ...).
The receiver list-column must still be present.
A data.frame with columns I, J, time, additionally
carrying class amorem_hyperedge_log to distinguish it from a
dyadic log in dispatch contexts. Sorted by time ascending.
Boschi M, Lerner J, Wit EC (2025). Beyond Linearity and Time-Homogeneity: Relational Hyper Event Models with Time-Varying Non-Linear Effects. arXiv:2509.05289.
# Two co-authored citation events: hl <- hyperedge_log( I = list(c("alice", "bob"), c("alice", "carol")), J = list(c("paperA"), c("paperA", "paperB")), time = c(1.0, 2.5)) is_hyperedge_log(hl) # Round-trip a dyadic log: dy <- data.frame(sender = c("a", "b"), receiver = c("b", "c"), time = c(1, 2)) h <- as_hyperedge_log(dy) as_dyadic_log(h)# Two co-authored citation events: hl <- hyperedge_log( I = list(c("alice", "bob"), c("alice", "carol")), J = list(c("paperA"), c("paperA", "paperB")), time = c(1.0, 2.5)) is_hyperedge_log(hl) # Round-trip a dyadic log: dy <- data.frame(sender = c("a", "b"), receiver = c("b", "c"), time = c(1, 2)) h <- as_hyperedge_log(dy) as_dyadic_log(h)
Adds two integer columns to a hyperedge log: size_I (the number
of senders) and size_J (the number of receivers). Convenient
shortcut for filtering / case-control sampling matched on
cardinality (see Boschi et al. 2025, Section 3.3).
hyperedge_sizes(hyperedge_log)hyperedge_sizes(hyperedge_log)
hyperedge_log |
A hyperedge log. |
The hyperedge log with two added integer columns.
For a focal hyperedge and orders
, computes the average activity over
every sender subset of I of size rho and every receiver subset
of J of size l, per Boschi, Lerner & Wit (2025) Equation 4:
hyperedge_subrep( hyperedge_log, I, J = character(0), t, rho = length(I), l = length(J) )hyperedge_subrep( hyperedge_log, I, J = character(0), t, rho = length(I), l = length(J) )
hyperedge_log |
A hyperedge log (see |
I |
Character vector of senders for the focal event. |
J |
Character vector of receivers (or |
t |
Focal time. |
rho |
Order on the sender side: subset cardinality. Must be
between 1 and |
l |
Order on the receiver side: subset cardinality. Must be
between 0 and |
For dyadic events with ,
subrep(rho = 1, l = 1) reduces to the dyad event count
(already exposed as reciprocity_count and related stats in
endogenous_features()). The function exists because
for true hyperedge data the average over subsets of intermediate
size captures partial-subset repetition that no dyadic statistic
can represent.
A single non-negative numeric.
Boschi M, Lerner J, Wit EC (2025). Beyond Linearity and Time- Homogeneity: Relational Hyper Event Models with Time-Varying Non-Linear Effects. arXiv:2509.05289. Lerner J, et al. (2025). The eventnet computation framework.
hl <- hyperedge_log( I = list(c("a","b"), c("a","c"), c("b","c")), J = list(c("X"), c("X","Y"), c("Y")), time = c(1, 2, 3)) # Activity for the (a, X) sub-pair before t = 4: hyperedge_activity(hl, I = "a", J = "X", t = 4) # First-order subrep on event (a, b) -> X at t = 4: hyperedge_subrep(hl, I = c("a","b"), J = "X", t = 4, rho = 1, l = 1)hl <- hyperedge_log( I = list(c("a","b"), c("a","c"), c("b","c")), J = list(c("X"), c("X","Y"), c("Y")), time = c(1, 2, 3)) # Activity for the (a, X) sub-pair before t = 4: hyperedge_activity(hl, I = "a", J = "X", t = 4) # First-order subrep on event (a, b) -> X at t = 4: hyperedge_subrep(hl, I = c("a","b"), J = "X", t = 4, rho = 1, l = 1)
Computes per-observation martingale residuals
from a one-control-per-case
partial-likelihood fit, where is the case indicator
inside the (case, control) pair and
is the fitted probability that observation is the event
in its risk set. The residuals sum to zero within each stratum.
martingale_residuals( event_log, model, scope = c("all", "appearance", "citation"), mode = c("one", "two"), half_life = NULL, seed = NULL )martingale_residuals( event_log, model, scope = c("all", "appearance", "citation"), mode = c("one", "two"), half_life = NULL, seed = NULL )
event_log |
Dyadic event log (see |
model |
A named character vector mapping statistic name to
|
scope, mode, half_life, seed
|
Same meaning as in
|
Useful as a goodness-of-fit diagnostic: plotting residuals vs. time or
vs. a covariate reveals systematic miscalibration. The convention
matches survival::residuals.coxph(type = "martingale") for the
two-element risk set induced by 1-control case-control sampling.
Only the linear partial-likelihood path (compare_models()-style
linear-effect specs) is supported by this helper; for smooth-effect
fits the case-vs-control matrix design used by
compare_models_smooth() does not have a clean per-observation
martingale interpretation.
A data frame with one row per observation in the case-control
table (so 2N rows for N events), with columns:
stratum, role ("case" or "control"), sender, receiver,
time, eta, fitted_prob, residual.
Therneau TM, Grambsch PM, Fleming TR (1990). Martingale-based residuals for survival models. Biometrika 77(1), 147–160.
compare_models(), compare_models_smooth().
data(classroom_events) res <- martingale_residuals( classroom_events, model = c(reciprocity_count = "linear", transitivity_count = "linear"), seed = 1) plot(res$time, res$residual, col = ifelse(res$role == "case", "red", "grey60"), ylab = "Martingale residual", xlab = "Event time") abline(h = 0)data(classroom_events) res <- martingale_residuals( classroom_events, model = c(reciprocity_count = "linear", transitivity_count = "linear"), seed = 1) plot(res$time, res$residual, col = ifelse(res$role == "case", "red", "grey60"), ylab = "Martingale residual", xlab = "Event time") abline(h = 0)
Collects the architecture and training hyper-parameters used by
rem(method = "nn"). Training maximizes the same conditional-logistic
partial likelihood as method = "clogit" (softmax over each risk set), so
this backend is a drop-in flexible counterpart of the linear conditional
logit. Two predictor architectures are available:
"mlp"a multilayer perceptron scoring the full covariate vector jointly — can represent interactions between statistics.
"additive_spline"an additive predictor sum_k f_k(x_k) with
each f_k a B-spline expansion fitted by (mini-batch) stochastic
gradient — the STREAM construction of Filippi-Mazzola & Wit (2024,
JRSS-C 73(4), doi:10.1093/jrsssc/qlae023). Interpretable per-feature
curves; with batch_strata it scales to event logs far beyond what an
in-memory smooth fit can hold.
nn_control( hidden = c(16L, 8L), activation = c("relu", "tanh"), architecture = c("mlp", "additive_spline"), spline_df = 8L, batch_strata = NULL, epochs = 300L, lr = 0.01, l2 = 1e-04, validation = 0.2, patience = 25L, standardize = TRUE, engine = c("r", "torch"), seed = NULL, verbose = FALSE )nn_control( hidden = c(16L, 8L), activation = c("relu", "tanh"), architecture = c("mlp", "additive_spline"), spline_df = 8L, batch_strata = NULL, epochs = 300L, lr = 0.01, l2 = 1e-04, validation = 0.2, patience = 25L, standardize = TRUE, engine = c("r", "torch"), seed = NULL, verbose = FALSE )
|
Integer vector of hidden-layer sizes for |
|
activation |
Hidden-layer activation for |
architecture |
Predictor architecture: |
spline_df |
Degrees of freedom (basis size) per covariate for
|
batch_strata |
Optional mini-batch size, in strata, for stochastic
gradient training. |
epochs |
Maximum number of training epochs (full passes over the training strata). |
lr |
Adam learning rate. |
l2 |
L2 penalty (weight decay). The pure-R engine penalises the weights
only; the torch engine applies it via Adam's |
validation |
Fraction of strata held out for validation / early
stopping. Set to |
patience |
Early-stopping patience: training stops after this many epochs without improvement of the validation loss; the best parameters are restored. |
standardize |
Z-score the features before training (recommended; the
scaling is stored and re-applied by |
engine |
Training engine: |
seed |
Optional integer seed for reproducible initialization and validation split. |
verbose |
Print the loss every 50 epochs. |
A list of class "nn_control".
Quantifies uncertainty for a rem(method = "nn") fit by a stratum
bootstrap: the case-control strata are resampled with replacement, the
network is refit on each resample (reusing the original nn_control()
settings, including the training engine), and the spread across refits
yields partial-dependence uncertainty bands and a concordance confidence
interval. This is the inferential counterpart that the point-prediction
nn backend otherwise lacks (coef() returns NULL).
nn_uncertainty( object, data, B = 200L, case = NULL, stratum = NULL, n_grid = 50L, level = 0.95, seed = NULL )nn_uncertainty( object, data, B = 200L, case = NULL, stratum = NULL, n_grid = 50L, level = 0.95, seed = NULL )
object |
A fitted |
data |
The case-control data frame the model was fit on (same columns). |
B |
Number of bootstrap resamples. |
case, stratum
|
Event-indicator and stratum columns, resolved exactly as
in |
n_grid |
Grid resolution for the partial-dependence curves. |
level |
Confidence level for the bands and the concordance interval. |
seed |
Optional integer seed for the resampling. |
Each bootstrap partial-dependence curve is centred (its grid-mean removed) before the pointwise quantiles are taken, so the bands describe uncertainty in the shape of each effect, not the conditional-logit's unidentified per-stratum offset.
An object of class "nn_uncertainty": a per-feature list of
data.frame(x, lo, med, hi) bands, a concordance quantile interval, and
the settings B, level. Has print() and plot() methods.
Plot partial-dependence uncertainty bands
## S3 method for class 'nn_uncertainty' plot(x, ...)## S3 method for class 'nn_uncertainty' plot(x, ...)
x |
An |
... |
Passed to the underlying |
x, invisibly.
Time-stamped directed emails among employees of a mid-sized
manufacturing company over a nine-month period. Sourced from Network
Repository as the ia-radoslaw-email dataset.
radoslaw_emailradoslaw_email
A data frame with 82,927 rows and 4 columns:
Days since the first email.
attr(., "unix_origin") holds the Unix epoch of time = 0.
Character employee id.
Character employee id.
Integer — 1 for every record in the original file.
Michalski, R., Palus, S., Kazienko, P. (2014). Seed selection for spread of influence in social networks: Temporal vs. static approach. New Generation Computing 32(3–4), 213–235. doi:10.1007/s00354-014-0402-9. Distributed via https://networkrepository.com/ia-radoslaw-email.php.
rem() is the unified front-end for fitting relational event models from
already preprocessed case-control data (e.g. produced by eventnet),
where the endogenous/exogenous covariates have already been computed. It is
intended to supersede compare_models(), compare_models_smooth() and
compare_models_global(), which couple feature computation and fitting.
rem( formula, data, method = c("gam", "clogit", "nn"), case = NULL, stratum = NULL, time = NULL, k = NULL, gam_method = NULL, nn = nn_control(), ... )rem( formula, data, method = c("gam", "clogit", "nn"), case = NULL, stratum = NULL, time = NULL, k = NULL, gam_method = NULL, nn = nn_control(), ... )
formula |
A formula; see Formula syntax. |
data |
A data.frame of preprocessed case-control data (wide for the
|
method |
Estimation backend; see Description. |
case |
Optional name of the 0/1 event-indicator column for the |
stratum |
Name of the column grouping each case with its controls
(required by |
time |
Name of the time column, required for |
k |
Optional integer basis dimension passed to |
gam_method |
Smoothness-selection method for the |
nn |
An |
... |
Reserved for future use. |
Two estimation backends are provided:
"gam"Degenerate logistic regression on a case-1-control
design (Boschi, Lerner & Wit 2025): the response is a constant 1 and the
linear predictor is built from event-minus-control differences. Supports
smooth time-varying (tv), non-linear (nl) and time-varying
non-linear (tvnl) effects via mgcv::gam().
"clogit"Conditional logistic regression on a case-k-control
design via survival::clogit() (linear terms only). The case/control
strata are taken from stratum, or derived as cumsum(case == 1) when
stratum is NULL (assuming each case is immediately followed by its
controls, the eventnet blocked layout).
"nn"Flexible conditional-logistic models on the same
case-k-control design as clogit, trained by (mini-batch) gradient
descent on the exact risk-set softmax partial likelihood. Two
architectures via nn_control(): a multilayer perceptron scoring the
full covariate vector jointly (interaction-capable), or an
additive_spline predictor — per-covariate B-spline expansions fitted
by stochastic gradient, the STREAM construction of Filippi-Mazzola &
Wit (2024, JRSS-C). No coefficient table; summary() reports
in-sample (and, with a validation split, held-out) concordance and
plot(type = "pdp") shows per-feature curves. Pure-R implementation,
no extra dependencies.
An object of class "rem": a list with the fitted model ($fit),
the method, the original formula, the parsed terms, and the number of
observations n. Has summary(), coef(), plot() and logLik()
methods.
The right-hand side lists covariates. A bare name is a linear effect; wrap
a name to request a smooth effect (gam method only):
tv(x) — time-varying linear effect: s(time, by = d_x).
nl(x) — non-linear effect: s(cbind(x_ev, x_nv), by = c(1, -1)).
tvnl(x) — time-varying non-linear effect (tensor product).
re(x) — random effect of a grouping factor x (e.g. the
sender), built from the matched x_ev / x_nv levels as
s(cbind(x_ev, x_nv), by = cbind(1, -1), bs = "re"), contributing
f(event_level) - f(control_level) (following the REM tutorial's
species-invasiveness term). Falls back to a single column x when
x_ev / x_nv are absent. Identified only when the event and control
differ on x.
For the gam method the left-hand side is ignored (the response is the
constant case indicator); for clogit the left-hand side names the 0/1 event
indicator column (e.g. event ~ x), unless case is given explicitly.
For a covariate x, the event/control difference is taken from column x,
else d_x, else x_ev - x_nv. Non-linear terms use transform_x_ev /
transform_x_nv when present (the eventnet spline-transformed covariate),
otherwise x_ev / x_nv. tvnl uses transformed_time when present.
Undirected logs (senders only, no receiver/TARGET column) are supported.
compare_models_smooth() (superseded), simulate_relational_events()
(whose wide = TRUE output is a valid input here),
simulate_directed_hyperevents_tvnl().
set.seed(1) w <- simulate_relational_events( n_events = 300, senders = paste0("a", 1:12), receivers = paste0("a", 1:12), n_controls = 1, endogenous_stats = "reciprocity_count", endogenous_effects = c(reciprocity_count = 0.6), wide = TRUE) fit <- rem(~ reciprocity_count, data = w, method = "gam") coef(fit)set.seed(1) w <- simulate_relational_events( n_events = 300, senders = paste0("a", 1:12), receivers = paste0("a", 1:12), n_controls = 1, endogenous_stats = "reciprocity_count", endogenous_effects = c(reciprocity_count = 0.6), wide = TRUE) fit <- rem(~ reciprocity_count, data = w, method = "gam") coef(fit)
Given an observed event log, generate nested case-control data by sampling counterfactual sender–receiver pairs according to predefined strategies.
sample_non_events( event_log, n_controls = 1, scope = c("all", "appearance", "citation"), mode = c("two", "one"), risk = c("standard", "remove"), exclude_pairs = NULL, allow_loops = FALSE, seed = NULL, max_attempts = 1000 )sample_non_events( event_log, n_controls = 1, scope = c("all", "appearance", "citation"), mode = c("two", "one"), risk = c("standard", "remove"), exclude_pairs = NULL, allow_loops = FALSE, seed = NULL, max_attempts = 1000 )
event_log |
Data frame with columns |
n_controls |
Number of non-events (controls) to sample per realized event. |
scope |
Candidate set definition. |
mode |
|
risk |
Strategy governing the risk set. |
exclude_pairs |
Optional two-column data.frame/matrix of
|
allow_loops |
Logical; can sampled non-events have identical sender and receiver? |
seed |
Optional seed for reproducibility. |
max_attempts |
Maximum resampling attempts per control before giving up (prevents infinite loops when candidate sets are small). |
A data.frame containing the original events (event = 1) and the
sampled controls (event = 0), grouped by stratum identifiers.
data(classroom_events) cc <- sample_non_events(classroom_events, n_controls = 1, seed = 1) head(cc)data(classroom_events) cc <- sample_non_events(classroom_events, n_controls = 1, seed = 1) head(cc)
Create simple exogenous covariate structures for senders and receivers. The function can return static values (one row per actor) or time-stamped processes (one row per actor and time point) that follow independent AR(1) dynamics.
simulate_actor_covariates( senders, receivers, covariate_names, time_points = NULL, sd = 1, rho = 0, seed = NULL )simulate_actor_covariates( senders, receivers, covariate_names, time_points = NULL, sd = 1, rho = 0, seed = NULL )
senders |
Character vector of sender actors. |
receivers |
Character vector of receiver actors. |
covariate_names |
Character vector naming the covariates to simulate. |
time_points |
Optional numeric vector of strictly increasing time stamps for time-varying covariates. When omitted, static covariates are returned. |
sd |
Standard deviation of the innovation noise. |
rho |
AR(1) coefficient used when |
seed |
Optional integer to make the simulation reproducible. |
A list with two elements: sender_covariates and
receiver_covariates. Each element is either a wide data.frame (static
case) or a tidy data.frame with columns actor, time,
covariate, and value (dynamic case).
sender_cov <- simulate_actor_covariates( senders = letters[1:3], receivers = LETTERS[1:2], covariate_names = c("activity", "recency"), time_points = seq(0, 4), rho = 0.6, sd = 0.2, seed = 123 ) str(sender_cov)sender_cov <- simulate_actor_covariates( senders = letters[1:3], receivers = LETTERS[1:2], covariate_names = c("activity", "recency"), time_points = seq(0, 4), rho = 0.6, sd = 0.2, seed = 123 ) str(sender_cov)
Generates a sequence of directed hyperevents from a sender set
to a receiver set
, with both
and non-empty. This is the directed two-mode
counterpart to simulate_hyperedge_events() and matches the data
model used in Boschi, Lerner & Wit (2025) Section 5 for
citation networks (authors citing papers).
simulate_directed_hyperedge_events( n_events, senders, receivers, min_size_I = 1L, max_size_I = 1L, min_size_J = 1L, max_size_J = 1L, baseline_rate = 1, endogenous_stats = character(0), endogenous_effects = numeric(0), start_time = 0 )simulate_directed_hyperedge_events( n_events, senders, receivers, min_size_I = 1L, max_size_I = 1L, min_size_J = 1L, max_size_J = 1L, baseline_rate = 1, endogenous_stats = character(0), endogenous_effects = numeric(0), start_time = 0 )
n_events |
Number of events to simulate. |
senders |
Character vector of sender names |
receivers |
Character vector of receiver names |
min_size_I, max_size_I
|
Sender-side cardinality bounds. |
min_size_J, max_size_J
|
Receiver-side cardinality bounds. |
baseline_rate |
Multiplicative baseline ( |
endogenous_stats |
Character vector of supported stat names:
|
endogenous_effects |
Numeric vector of coefficients,
same length and order as |
start_time |
Simulation start time. |
At each step the simulator enumerates every candidate hyperedge
with
and ,
computes the rate
and draws one event proportional to its rate. The waiting time is exponential with rate equal to the total intensity.
Candidate-space size is exponential in
and , so practical use is
limited to small actor / item universes.
A directed hyperedge log
(amorem_hyperedge_log data frame with I, J, time columns;
J non-empty on every row).
Boschi M, Lerner J, Wit EC (2025). Beyond Linearity and Time-Homogeneity: Relational Hyper Event Models with Time-Varying Non-Linear Effects. arXiv:2509.05289, Section 5.
hl <- simulate_directed_hyperedge_events( n_events = 40, senders = paste0("a", 1:4), receivers = paste0("p", 1:4), max_size_I = 2, max_size_J = 2, baseline_rate = 0.3, endogenous_stats = c("subrep_1_1", "size_I"), endogenous_effects = c(subrep_1_1 = 0.8, size_I = -0.4))hl <- simulate_directed_hyperedge_events( n_events = 40, senders = paste0("a", 1:4), receivers = paste0("p", 1:4), max_size_I = 2, max_size_J = 2, baseline_rate = 0.3, endogenous_stats = c("subrep_1_1", "size_I"), endogenous_effects = c(subrep_1_1 = 0.8, size_I = -0.4))
A teaching-oriented simulator for directed relational hyper-events in
which the sender set and the receiver set are disjoint, driven by exogenous
group covariates with a time-varying effect on the sender side and a
non-linear effect on the receiver side. It is the packaged, parameterised
form of the workshop running example (sunbelt-workshop-materials/running_example.R)
and produces a ready-to-fit case-control dataset for GAM-based estimation of
smooth (TV / NL) effects.
simulate_directed_hyperevents_tvnl( sender_attr, receiver_attr, time_varying_effect = function(t) sin(2 * t), nonlinear_effect = function(x) -4 + 2 * exp(-((x - 3)^2)/(2 * 2^2)), horizon = 2, dt = 0.01, n_controls = 1L, max_group_size_sender = length(sender_attr), max_group_size_receiver = length(receiver_attr) )simulate_directed_hyperevents_tvnl( sender_attr, receiver_attr, time_varying_effect = function(t) sin(2 * t), nonlinear_effect = function(x) -4 + 2 * exp(-((x - 3)^2)/(2 * 2^2)), horizon = 2, dt = 0.01, n_controls = 1L, max_group_size_sender = length(sender_attr), max_group_size_receiver = length(receiver_attr) )
sender_attr |
Named numeric vector of sender actor attributes (names are the sender ids). |
receiver_attr |
Named numeric vector of receiver actor attributes. |
time_varying_effect |
Function of one argument giving the time-varying
coefficient |
nonlinear_effect |
Function of one argument giving the non-linear effect
|
horizon |
Positive numeric; the simulation end time (start is 0). |
dt |
Positive numeric; the time-grid step used by the thinning scheme
(must be smaller than |
n_controls |
Non-negative integer; number of non-event pairs sampled per event. Defaults to 1 (a case-1-control design). |
max_group_size_sender, max_group_size_receiver
|
Integers; the largest subset size considered when enumerating sender / receiver groups. Default to the full actor sets (all non-empty subsets). Use 1 for ordinary dyadic events. |
Each sender (resp. receiver) group is a non-empty subset of the sender
(resp. receiver) actors – a hyperedge endpoint. A group's covariate is the
mean of its members' actor attributes. For an ordered group pair
the instantaneous rate at time is
where is the time-varying sender effect
(time_varying_effect), is the non-linear receiver effect
(nonlinear_effect), and denotes a group covariate. Events are drawn
on a fixed time grid of width dt by a thinning scheme: within each step the
next inter-event time is sampled from an exponential with the total rate
evaluated at the step midpoint, and the firing pair is chosen with
probability proportional to its rate. For every realised event,
n_controls non-event pairs are sampled uniformly from the remaining group
pairs, yielding a case-n_controls-control design.
Setting max_group_size_sender = 1 and max_group_size_receiver = 1 reduces
the groups to single actors, i.e. ordinary directed dyadic events.
A long-format data.frame, one row per (event or control), with
columns event_id (links a case to its controls), event_time,
event (1 = realised event, 0 = sampled non-event),
sender_group, receiver_group (group labels), and
cov_sender, cov_receiver (group covariates). The true
data-generating effect functions are attached as
attr(x, "truth") (a list with time_varying_effect,
nonlinear_effect, and horizon) for comparison against fitted smooths.
set.seed(1234) sa <- setNames(rnorm(4, 5, 1.5), paste0("S", 1:4)) ra <- setNames(rnorm(4, 3, 2.0), paste0("R", 1:4)) d <- simulate_directed_hyperevents_tvnl(sa, ra, horizon = 2, n_controls = 1) head(d) table(d$event)set.seed(1234) sa <- setNames(rnorm(4, 5, 1.5), paste0("S", 1:4)) ra <- setNames(rnorm(4, 3, 2.0), paste0("R", 1:4)) d <- simulate_directed_hyperevents_tvnl(sa, ra, horizon = 2, n_controls = 1) head(d) table(d$event)
Generates a sequence of undirected hyperevents — meetings of
varying size drawn from the actor set actors — under a linear
hyperedge model. Mirrors the simulation setup of Boschi, Lerner &
Wit (2025) Section 4: each event is a subset of actors with size
in 1..max_size, fired with rate
where each is one of the hyperedge-native
covariates supported by hyperedge_features()
(activity, subrep_<rho> for undirected events) or size
(the event's cardinality ).
simulate_hyperedge_events( n_events, actors, max_size, baseline_rate, endogenous_stats = character(0), endogenous_effects = numeric(0), start_time = 0, min_size = 1L )simulate_hyperedge_events( n_events, actors, max_size, baseline_rate, endogenous_stats = character(0), endogenous_effects = numeric(0), start_time = 0, min_size = 1L )
n_events |
Number of events to simulate. |
actors |
Character vector of actor names. |
max_size |
Maximum allowed meeting size ( |
baseline_rate |
Multiplicative baseline ( |
endogenous_stats |
Character vector of stat names accepted by
|
endogenous_effects |
Numeric vector of coefficients, same
length and order as |
start_time |
Simulation start time. |
min_size |
Minimum allowed meeting size. Defaults to 1. |
At each step the simulator enumerates every subset of actors
with size in 1..max_size. The per-event work is therefore
; practical for small actor counts (e.g. ,
max_size <= 4).
A hyperedge log (see hyperedge_log()) with n_events
rows.
Boschi M, Lerner J, Wit EC (2025). Beyond Linearity and Time-Homogeneity: Relational Hyper Event Models with Time-Varying Non-Linear Effects. arXiv:2509.05289.
# Five-actor meetings of size up to 3, with weak attractor on # repeated triads and a size penalty: hl <- simulate_hyperedge_events( n_events = 50, actors = LETTERS[1:5], max_size = 3, baseline_rate = 0.2, endogenous_stats = c("subrep_2", "size"), endogenous_effects = c(subrep_2 = 0.5, size = -0.3))# Five-actor meetings of size up to 3, with weak attractor on # repeated triads and a size penalty: hl <- simulate_hyperedge_events( n_events = 50, actors = LETTERS[1:5], max_size = 3, baseline_rate = 0.2, endogenous_stats = c("subrep_2", "size"), endogenous_effects = c(subrep_2 = 0.5, size = -0.3))
Generate a simple relational event log for a sender set and receiver set using a softmax allocation rule over dyadic intensities. The process follows the Gillespie algorithm, where the time between events is drawn from an exponential distribution with rate equal to the sum of all dyadic intensities.
simulate_relational_events( n_events, senders, receivers, baseline_rate = 1, start_time = 0, horizon = Inf, contribution_logits = NULL, sender_covariates = NULL, sender_effects = NULL, receiver_covariates = NULL, receiver_effects = NULL, allow_loops = FALSE, n_controls = 0, endogenous_stats = NULL, endogenous_effects = NULL, global_covariates = NULL, global_effects = NULL, method = c("gillespie", "tau_leap"), tau = NULL, half_life = NULL, risk = c("standard", "remove"), wide = FALSE )simulate_relational_events( n_events, senders, receivers, baseline_rate = 1, start_time = 0, horizon = Inf, contribution_logits = NULL, sender_covariates = NULL, sender_effects = NULL, receiver_covariates = NULL, receiver_effects = NULL, allow_loops = FALSE, n_controls = 0, endogenous_stats = NULL, endogenous_effects = NULL, global_covariates = NULL, global_effects = NULL, method = c("gillespie", "tau_leap"), tau = NULL, half_life = NULL, risk = c("standard", "remove"), wide = FALSE )
n_events |
Number of events to generate. |
senders |
Character vector listing the sender set |
receivers |
Character vector listing the receiver set |
baseline_rate |
Positive scalar. A constant baseline hazard multiplier applied to all dyads. Defaults to 1. |
start_time |
Initial time stamp. |
horizon |
Optional maximum horizon; simulation stops once the cumulative time would exceed this value. |
contribution_logits |
Optional |
sender_covariates |
Optional numeric data.frame/matrix with one row per sender. |
sender_effects |
Optional numeric vector of coefficients for
|
receiver_covariates |
Optional numeric data.frame/matrix with one row per receiver. |
receiver_effects |
Optional numeric vector of coefficients for
|
allow_loops |
Logical; whether sender and receiver can coincide. |
n_controls |
Integer; number of non-events (controls) to sample
uniformly at random for each realized event. If |
endogenous_stats |
Optional character vector of endogenous mechanisms to include in the rate. Each entry updates a state matrix after every event so the intensity of the next event depends on the realized history. Supported values:
Defaults to |
endogenous_effects |
Numeric vector of linear coefficients for
|
global_covariates |
Optional data.frame describing piecewise-constant
global covariates: variables whose value at time |
global_effects |
Numeric vector of linear coefficients for the global
covariates. May be named (names must match the covariate columns in
|
method |
Simulation algorithm. Either |
tau |
Positive scalar; the step size for |
half_life |
Positive scalar; the half-life |
risk |
Risk-set rule. |
wide |
Logical; when |
When global_covariates is supplied, the simulator uses a
boundary-aware Gillespie scheme: the total event rate is rescaled by
; whenever a sampled waiting time would
cross an interval boundary, the clock is advanced to the boundary without
recording an event, and the next waiting time is redrawn under the new
global multiplier. Global covariates do not change the per-dyad selection
probabilities (the multiplier cancels), only the waiting-time
distribution. When combined with endogenous_stats, the
per-dyad rates are recomputed at every step from the current endogenous
state and then rescaled by the global multiplier.
The "tau_leap" algorithm advances the clock by a user-chosen step
and draws, for every dyad, a
number of events using the rates at the start of the step. Multiple
events can fire in the same step; they are placed at uniform times within
and reported in time order, but they share the
start-of-step endogenous state and global multiplier. Endogenous state is
updated once at the end of the step using all events in that step. The
tau-leap algorithm trades exactness for predictable, vectorised work per
step; it is most useful for high-rate regimes or for problems where the
per-event recomputation in the Gillespie path is the bottleneck. Choose
small enough that (i) on every active
dyad and (ii) is smaller than the shortest interval in
global_covariates (within-step boundary crossings are not
resolved; the start-of-step global multiplier is used for the entire
step).
If n_controls = 0, a data.frame with columns sender,
receiver and time. If n_controls > 0, it returns a
long-format data.frame with additional columns stratum (grouping an
event with its controls) and event (1 for the realized event,
0 for controls). When endogenous_stats is supplied, one extra column
per stat is appended carrying the value each row's dyad had at its event
time (immediately before the event fired), so downstream conditional
logistic / GAM estimators can recover the effects. When
global_covariates is supplied, one column per covariate is appended
carrying the value of that covariate at each row's event time. When
wide = TRUE (which requires n_controls = 1), this long
case-control output is reshaped to one row per event with columns
stratum, time, sender_ev, receiver_ev,
sender_nv, receiver_nv and, for each covariate,
<cov>_ev, <cov>_nv and d_<cov>.
set.seed(1) senders <- receivers <- LETTERS[1:3] sender_cov <- data.frame(activity = c(0.5, -0.2, 1.1)) receiver_cov <- data.frame(popularity = c(0.1, 0.3, -0.4)) # Standard event simulation events <- simulate_relational_events( n_events = 5, senders = senders, receivers = receivers, sender_covariates = sender_cov, sender_effects = 1, receiver_covariates = receiver_cov, receiver_effects = 2 ) events # Case-control generation for partial likelihood inference cc_events <- simulate_relational_events( n_events = 5, senders = senders, receivers = receivers, sender_covariates = sender_cov, sender_effects = 1, n_controls = 2 ) head(cc_events) # Ready-made case-1-control (wide) dataset for degenerate logistic regression wide_events <- simulate_relational_events( n_events = 5, senders = senders, receivers = receivers, n_controls = 1, endogenous_stats = "reciprocity_count", endogenous_effects = c(reciprocity_count = 0.6), wide = TRUE ) head(wide_events)set.seed(1) senders <- receivers <- LETTERS[1:3] sender_cov <- data.frame(activity = c(0.5, -0.2, 1.1)) receiver_cov <- data.frame(popularity = c(0.1, 0.3, -0.4)) # Standard event simulation events <- simulate_relational_events( n_events = 5, senders = senders, receivers = receivers, sender_covariates = sender_cov, sender_effects = 1, receiver_covariates = receiver_cov, receiver_effects = 2 ) events # Case-control generation for partial likelihood inference cc_events <- simulate_relational_events( n_events = 5, senders = senders, receivers = receivers, sender_covariates = sender_cov, sender_effects = 1, n_controls = 2 ) head(cc_events) # Ready-made case-1-control (wide) dataset for degenerate logistic regression wide_events <- simulate_relational_events( n_events = 5, senders = senders, receivers = receivers, n_controls = 1, endogenous_stats = "reciprocity_count", endogenous_effects = c(reciprocity_count = 0.6), wide = TRUE ) head(wide_events)
Time-stamped directed phone calls among undergraduates in an MIT
residence hall over the 2008–2009 academic year. Sourced from the
goldfish R package (Social_Evolution$calls).
social_evolution_callssocial_evolution_calls
A data frame with 439 rows and 4 columns:
Days since the first recorded call.
attr(., "unix_origin") holds the Unix epoch of time = 0.
Character actor id matching
social_evolution_actors$id.
Same domain as sender.
Integer increment recorded for the call (typically 1).
Madan, A., Cebrian, M., Moturu, S., Farrahi, K. (2011).
Sensing the "health state" of a community. IEEE Pervasive
Computing 11(1), 36–45. doi:10.1109/MPRV.2011.79. Redistributed
via the goldfish R package (github.com/snlab-ch/goldfish),
dataset Social_Evolution.
social_evolution_actors, social_evolution_friendship
Self-reported friendship ties recorded at survey waves throughout the Social Evolution study.
social_evolution_friendshipsocial_evolution_friendship
A data frame with 766 rows and 4 columns:
Days since the first recorded call (same origin as
social_evolution_calls). attr(., "unix_origin") holds the
Unix epoch of time = 0.
Character actor id (the survey respondent).
Character actor id (the nominated friend).
Integer — 1 adds the tie, 0 removes it.
Madan et al. (2011), via goldfish.
social_evolution_calls, social_evolution_actors
Module A focuses on preprocessing utilities. This helper normalizes user
supplied event logs into the canonical sender/receiver/time structure
expected elsewhere in the package. It also handles common cleaning tasks such
as sorting, dropping missing rows, and removing loops.
standardize_event_log( event_log, sender_col = "sender", receiver_col = "receiver", time_col = "time", sort = TRUE, drop_nas = TRUE, drop_loops = FALSE, strictly_increasing_time = FALSE, remove_duplicates = TRUE, keep_extra = TRUE )standardize_event_log( event_log, sender_col = "sender", receiver_col = "receiver", time_col = "time", sort = TRUE, drop_nas = TRUE, drop_loops = FALSE, strictly_increasing_time = FALSE, remove_duplicates = TRUE, keep_extra = TRUE )
event_log |
A data.frame (or tibble) containing at least one row per event. |
sender_col, receiver_col, time_col
|
Column names storing the sender, receiver, and time information. |
sort |
Logical; should the output be sorted by time (ties are kept in input order)? |
drop_nas |
Logical; if |
drop_loops |
Logical; when |
strictly_increasing_time |
Logical; if |
remove_duplicates |
Logical; drop duplicated combinations of sender/receiver/time. |
keep_extra |
Logical; if |
A data.frame with columns sender, receiver, and time. The return
object is tagged with class "amorem_event_log" for downstream dispatch.
data(classroom_events) std <- standardize_event_log(classroom_events) head(std)data(classroom_events) std <- standardize_event_log(classroom_events) head(std)
Maps non-negative time gaps to bounded recency
weights via
where is the median of the supplied (or reference) gaps.
Large gaps map toward 0; gaps near 0 map toward 1. The half-life of
the kernel is , so the median gap
itself is mapped to approximately
.
transform_recency(delta, half_life = NULL, reference = NULL)transform_recency(delta, half_life = NULL, reference = NULL)
delta |
Numeric vector of non-negative time gaps. NAs propagate. |
half_life |
Optional positive scalar. If supplied, used directly
as the kernel scale |
reference |
Optional numeric vector. If supplied, the median is
computed on |
This is the data-driven recency parametrisation used as a preprocessing
step for global and exogenous covariates in
Lembo, Juozaitiene, Vinciotti & Wit (2025) and matches the
"recency" axis of endogenous_features().
Numeric vector the same length as delta, with values in
(0, 1]. NAs in delta are preserved.
Lembo M, Juozaitiene R, Vinciotti V, Wit EC (2025). Relational Event Models with Global Covariates. JRSS-C.
set.seed(1) gaps <- rexp(20, rate = 0.5) transform_recency(gaps) transform_recency(gaps, half_life = 1)set.seed(1) gaps <- rexp(20, rate = 0.5) transform_recency(gaps) transform_recency(gaps, half_life = 1)
Reshapes a long case-(k-)control dataset – one row per case and per control,
with a 0/1 case indicator – into a wide case-1-control table with one row
per case. For each covariate the event value (<cov>_ev), the matched
control value (<cov>_nv) and their difference (d_<cov>, event minus
control) are emitted, ready for the gam backend of rem().
widen_case_control( data, case = NULL, stratum = NULL, covariates = NULL, control_index = 1L, keep_ids = TRUE )widen_case_control( data, case = NULL, stratum = NULL, covariates = NULL, control_index = 1L, keep_ids = TRUE )
data |
A long case-control data.frame. |
case |
Optional name of the 0/1 event-indicator column. If |
stratum |
Optional name of the column grouping each case with its
controls. When |
covariates |
Character vector of covariate columns to widen. When
|
control_index |
Which control within each stratum to pair with the case (default the first). Lets a case-k-control log be reduced to case-1-control. |
keep_ids |
Logical; when |
This is the preprocessing companion to rem() for eventnet-style output,
where a case row is followed by its controls and the stratum id is left blank
on control rows.
A data.frame with one row per case: a stratum column, the
sender/receiver identifiers (sender_ev/receiver_ev/sender_nv/
receiver_nv, when present in data and keep_ids = TRUE) and, for each
covariate, <cov>_ev, <cov>_nv and d_<cov>. Strata without exactly one
case or without the requested control are dropped (with a message).
rem(), simulate_relational_events() (wide = TRUE).
set.seed(1) long <- data.frame( IS_OBSERVED = rep(c(1, 0, 0), 4), x = rnorm(12), y = rnorm(12)) widen_case_control(long, control_index = 1)set.seed(1) long <- data.frame( IS_OBSERVED = rep(c(1, 0, 0), 4), x = rnorm(12), y = rnorm(12)) widen_case_control(long, control_index = 1)
Actor attributes for the Social Evolution study
Description
Per-actor covariates for social_evolution_calls and social_evolution_friendship.
Usage
Format
A data frame with 84 rows and 4 columns:
Character actor id (
"Actor 1","Actor 2", …).Logical — whether the actor was present at the start of the study window.
Integer dormitory floor.
Factor — student grade type (freshman, sophomore, junior, senior, graduate-tutor).
Source
Madan et al. (2011), via
goldfish. See social_evolution_calls.