---
title: "Pipeline: netify to ergm (statnet)"
description: "Prepare netify objects for ERGM modeling with statnet."
author: "Cassy Dorff and Shahryar Minhas"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Pipeline: netify to ergm (statnet)}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  dev = "png",
  dpi = 150,
  cache = FALSE,
  echo = TRUE,
  collapse = TRUE,
  comment = "#>"
)
```

ERGMs (Exponential Random Graph Models) are the workhorse of inferential network analysis in the [statnet](https://statnet.org/) ecosystem. With `netify`, you build the network, attach attributes with `add_node_vars()` / `add_dyad_vars()`, and then convert to the `network` format that `ergm` expects with `to_statnet()`.

This vignette covers:

1. The cross-sectional pipeline (single network -> single ergm fit)
2. The longitudinal pipeline (per-time ergm fits)
3. The multilayer pipeline (per-layer ergm fits; `to_statnet()` iterates over layers automatically)
4. Round-tripping ergm-simulated networks back into netify for descriptive checks

```{r}
library(netify)
library(ggplot2)
library(network)
library(ergm)

data(icews)
```

## 1. cross-sectional pipeline

The simplest case: one snapshot, one model.

```{r}
# build a single-year netify object with nodal and dyadic attributes
icews_2010 <- icews[icews$year == 2010, ]

verb_coop <- netify(
  icews_2010,
  actor1 = "i", actor2 = "j",
  symmetric = FALSE,
  weight = "verbCoop",
  nodal_vars = c("i_polity2", "i_log_gdp"),
  dyad_vars = "matlCoop"
)

# convert to a statnet 'network' object
sn_2010 <- to_statnet(verb_coop)
sn_2010
```

`to_statnet()` carries nodal attributes through as vertex attributes and dyadic attributes through as network-level and per-edge attributes (see the section on the `_e` suffix below).

### before you fit: three sanity checks

ERGMs fail in cryptic ways when the underlying network is malformed. Three checks catch most "invalid output from statistic" headaches before you ever call `ergm()`:

```{r}
# 1. NAs in nodal covariates referenced by nodecov / nodematch
nd <- attr(verb_coop, "nodal_data")
na_cols <- names(nd)[vapply(nd, function(x) any(is.na(x)), logical(1))]
na_cols <- setdiff(na_cols, c("actor", "time", "layer"))
na_cols

# 2. isolates -- ergm still fits, but some terms (e.g. gwdegree) can misbehave
adj <- get_raw(verb_coop)
bin <- (adj != 0) & !is.na(adj)
deg_total <- rowSums(bin) + colSums(bin)
isolates <- rownames(adj)[deg_total == 0]
length(isolates)

# 3. symmetry: the netify flag must match how your model treats ties
attr(verb_coop, "symmetric")
```

If `na_cols` is non-empty, drop the affected actors with `drop_na_actors(verb_coop, cols = na_cols)` or impute before converting. Here we create a cleaned network before fitting a model that uses nodal covariates:

```{r}
# prefer the NA columns recorded by to_statnet(); fall back to the manual check
clean_cols <- attr(sn_2010, "netify_na_cols")
if (is.null(clean_cols)) clean_cols <- na_cols

if (!is.null(clean_cols) && length(clean_cols) > 0) {
  verb_coop_clean <- drop_na_actors(verb_coop, cols = clean_cols)
} else {
  verb_coop_clean <- verb_coop
}

sn_2010_clean <- to_statnet(verb_coop_clean)
attr(sn_2010_clean, "netify_na_cols")
```

Now fit an ergm to the cleaned network:

```{r, eval = FALSE}
# note: this chunk is not evaluated by default to keep the vignette build fast.
# change eval = FALSE to eval = TRUE in the chunk header to run it.
set.seed(6886)
m <- ergm(
  sn_2010_clean ~ edges + nodecov("i_polity2") + nodecov("i_log_gdp")
)
summary(m)
```

The full model above is not evaluated during vignette builds.
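The `edges` term in the model above plays the role of an intercept. In an edges-only ERGM every dyad is an independent Bernoulli draw, so the maximum likelihood estimate of the edges coefficient has a closed form: the log-odds of the network density. The chunk below is a minimal base-R sketch of that arithmetic, assuming a directed binary network without self-loops (so there are n(n - 1) possible ties); `edges_logit()` and `toy_adj` are hypothetical names used only for illustration, not part of netify or ergm.

```{r}
# hypothetical helper (not part of netify or ergm): log-odds of density,
# the closed-form MLE of the edges coefficient in an edges-only ERGM,
# assuming a directed network without self-loops
edges_logit <- function(adj) {
  n <- nrow(adj)
  off_diag <- adj[row(adj) != col(adj)]    # drop the diagonal
  ties <- sum(off_diag != 0, na.rm = TRUE) # observed directed ties
  p <- ties / (n * (n - 1))                # density over ordered pairs
  log(p / (1 - p))
}

# toy directed network with ties a->b, a->c, b->c, c->d
toy_adj <- matrix(0, 4, 4, dimnames = list(letters[1:4], letters[1:4]))
toy_adj["a", "b"] <- 1
toy_adj["a", "c"] <- 1
toy_adj["b", "c"] <- 1
toy_adj["c", "d"] <- 1

# 4 of 12 possible ties: log((1/3) / (2/3)) = log(0.5)
edges_logit(toy_adj)
```

Under these assumptions, an edges-only fit on a four-actor directed network with four ties should return an edges coefficient close to this value; for a dyad-independent model like this one, MPLE and MLE coincide.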
The short block below checks the `netify()` -> `to_statnet()` -> `ergm()` handoff on a small toy network:

```{r}
toy_edges <- data.frame(
  i = c("a", "a", "b", "c"),
  j = c("b", "c", "c", "d"),
  y = 1
)
toy_net <- suppressMessages(netify(
  toy_edges,
  actor1 = "i", actor2 = "j",
  weight = "y",
  symmetric = FALSE
))
toy_sn <- to_statnet(toy_net)

set.seed(6886)
invisible(capture.output(
  toy_fit <- ergm(toy_sn ~ edges, estimate = "MPLE", eval.loglik = FALSE)
))
coef(toy_fit)
```

One common ERGM failure comes from missing nodal covariates: the `nodecov("i_polity2")` term will refuse to fit if any vertex has an `NA` polity score. Either subset to actors with complete covariates or impute before passing the network to `to_statnet()`. When `to_statnet()` sees NAs in nodal attributes, it records the affected column names on the output object (the `netify_na_cols` attribute used above). The `drop_na_actors()` helper handles this for cross-sectional, longitudinal, and bipartite netlets. Use the same approach before any formula that references nodal attributes through the `nodecov()` or `nodematch()` terms.

### dyadic edge covariates: the `_e` suffix

Any dyadic covariate you passed to `netify()` (e.g. `dyad_vars = "matlCoop"`) is attached to the resulting `network` object in two places:

- as a network-level attribute under its original name (the full `n x n` matrix), accessible via `network::get.network.attribute(sn_2010, "matlCoop")`, and
- as a per-edge attribute under its original name plus an `_e` suffix (here, `matlCoop_e`), populated only on edges that actually exist.

The trailing `_e` distinguishes the per-edge values from the network-level matrix. For the `edgecov()` term, pass the **original (matrix) name**: `edgecov()` resolves a character argument as a network-level matrix attribute, so the `_e` per-edge alias will not work there:

```{r, eval = FALSE}
m_dyad <- ergm(sn_2010 ~ edges + edgecov("matlCoop"))
```

The `_e` per-edge attribute is exposed for descriptive uses (for example, coloring edges by covariate in `network::plot.network`).
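To see how the two forms relate, the chunk below derives per-edge values from a full dyadic matrix: the per-edge view is simply the matrix read off at the positions where a tie exists. It is a plain base-R sketch of the idea; `tie_mat`, `cov_mat`, and `edge_values` are illustrative objects, and this is not how `to_statnet()` is implemented internally.

```{r}
# illustrative 3-actor directed tie matrix and a dyadic covariate matrix
tie_mat <- matrix(c(0, 1, 0,
                    0, 0, 1,
                    1, 0, 0), nrow = 3, byrow = TRUE,
                  dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
cov_mat <- matrix(c(0,   2.5, 1.0,
                    0.5, 0,   3.0,
                    4.0, 0.2, 0), nrow = 3, byrow = TRUE,
                  dimnames = dimnames(tie_mat))

# per-edge view: covariate values only where a tie exists (the "_e" idea)
idx <- which(tie_mat != 0, arr.ind = TRUE)
edge_values <- data.frame(
  from = rownames(tie_mat)[idx[, "row"]],
  to = colnames(tie_mat)[idx[, "col"]],
  value = cov_mat[idx]
)
edge_values
```

Entries of `cov_mat` on untied dyads (such as 1.0 for a -> c) survive only in the matrix form. That is why `edgecov()` needs the network-level matrix: the model evaluates the covariate on every dyad, tied or not.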
`to_statnet()` emits a one-time message listing both forms the first time it attaches dyadic covariates.

Goodness-of-fit (`gof()`), MCMC diagnostics (`mcmc.diagnostics()`), and other post-estimation tools live in `ergm` itself.

## 2. longitudinal pipeline (per-time fits)

For a longitudinal netify, `to_statnet()` returns a named list -- one `network` object per time period:

```{r}
verb_longit <- netify(
  icews[icews$year %in% 2010:2012, ],
  actor1 = "i", actor2 = "j", time = "year",
  symmetric = FALSE,
  weight = "verbCoop",
  nodal_vars = "i_polity2"
)

sn_list <- to_statnet(verb_longit)
length(sn_list)
names(sn_list)
class(sn_list[[1]])
```

Fit a separate ergm per period:

```{r, eval = FALSE}
set.seed(6886)
fits <- lapply(sn_list, function(n) {
  ergm(n ~ edges)
})

# one edges coefficient per period
sapply(fits, coef)
```

For *coevolution* models (where ties change as a function of past ties), look at `tergm` from the statnet suite: `netify` provides the data, `tergm` does the modeling. A typical tergm 4.x call against the same per-period list looks like this:

```{r, eval = FALSE}
# longitudinal ergm via tergm 4.x
library(tergm)

set.seed(6886)
nets <- to_statnet(verb_longit) # named list of network objects
fit <- tergm(
  nets ~ Form(~ edges) + Persist(~ edges),
  estimate = "CMLE",
  times = seq_along(nets)
)
summary(fit)
```

In the tergm 4.x separable formulation (Krivitsky and Handcock 2014), `Form()` models the *formation* of new edges between periods, and `Persist()` models the *persistence* of existing edges from one period to the next. Both accept the same ergm terms (`edges`, `nodecov()`, `nodematch()`, `mutual`, `gwesp`, etc.). Add nodal covariates only after applying the same missing-data check shown in the cross-sectional example.

## 3. multilayer pipeline (per-layer fits)

Multilayer ergm modeling is its own research literature. The simplest pragmatic approach is to fit one ergm per layer, and `to_statnet()` handles the conversion automatically: pass a multilayer netify and get back a named list keyed by layer.
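Conceptually, this is a split along the layer dimension. The chunk below is a base-R sketch of that split, assuming the layers are held as an n x n x L array with layer names in the third dimension; the array `arr` and the list `layer_list` are illustrative stand-ins, not netify's internal representation.

```{r}
# illustrative 3 x 3 x 2 array: two layers over the same actors
actors <- c("a", "b", "c")
arr <- array(0, dim = c(3, 3, 2),
             dimnames = list(actors, actors, c("verbal", "material")))
arr["a", "b", "verbal"] <- 1
arr["b", "c", "verbal"] <- 1
arr["a", "c", "material"] <- 1

# one matrix per layer, keyed by layer name
layer_list <- lapply(setNames(nm = dimnames(arr)[[3]]), function(l) arr[, , l])

# ties per layer
sapply(layer_list, function(x) sum(x != 0))
```

Each slice is an ordinary single-layer matrix, so the cross-sectional checks above apply to each layer separately.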
```{r}
verb <- netify(icews_2010, actor1 = "i", actor2 = "j",
               symmetric = FALSE, weight = "verbCoop")
matl <- netify(icews_2010, actor1 = "i", actor2 = "j",
               symmetric = FALSE, weight = "matlCoop")
multi <- layer_netify(list(verbal = verb, material = matl))

sn_multi <- to_statnet(multi)
length(sn_multi)
names(sn_multi)
```

Each element is a `network` object you can plug straight into `ergm()`.

## 4. round-tripping simulated networks back to netify

A useful descriptive check after fitting an ergm is to simulate from the fit, compute descriptives on the simulated networks, and compare them to the observed network. With `nsim > 1`, `simulate()` on an ergm fit returns a list of `network` objects, so you can convert each one back into a netify and use any of netify's descriptive tools:

```{r, eval = FALSE}
set.seed(6886)
# m is the nodal-covariate ergm fit from section 1 (fit to sn_2010_clean)
sims <- simulate(m, nsim = 100) # list of network objects

# convert each simulated network back into a netify for comparison
sim_nets <- lapply(sims, function(s) to_netify(s))
names(sim_nets) <- paste0("sim_", seq_along(sim_nets))

# compare observed vs simulated at the structural level; use the cleaned
# network as "observed", since that is the one the model was fit to
all_nets <- c(list(observed = verb_coop_clean), sim_nets)
struct_comp <- compare_networks(all_nets, what = "structure")
```

This gives observed-vs-simulated comparisons for density, reciprocity, transitivity, mean degree, and related summaries.

## tl;dr

```r
# build -> attach attrs -> export -> model
net <- netify(df, actor1 = "i", actor2 = "j",
              symmetric = FALSE, weight = "x")
net <- add_node_vars(net, attrs, actor = "id")
sn <- to_statnet(net)         # single network, longit list, or multilayer list
m <- ergm(sn ~ edges + ...)   # modeling happens in ergm/statnet
```

For latent-factor and DBN workflows, see the project-site article on `lame` and `dbn`.

## references

1. Butts, C. T. (2008). network: A Package for Managing Relational Data in R. Journal of Statistical Software, 24(2), 1-36. doi:10.18637/jss.v024.i02
2. Cranmer, S. J., Desmarais, B. A., & Morgan, J. W. (2021). Inferential Network Analysis. Cambridge University Press. doi:10.1017/9781316662915
3. Hunter, D. R., Handcock, M. S., Butts, C. T., Goodreau, S. M., & Morris, M. (2008). ergm: A Package to Fit, Simulate and Diagnose Exponential-Family Models for Networks. Journal of Statistical Software, 24(3), 1-29. doi:10.18637/jss.v024.i03
4. Krivitsky, P. N., & Handcock, M. S. (2014). A Separable Model for Dynamic Networks. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(1), 29-46. doi:10.1111/rssb.12014
5. Snijders, T. A. B., Pattison, P. E., Robins, G. L., & Handcock, M. S. (2006). New Specifications for Exponential Random Graph Models. Sociological Methodology, 36(1), 99-153. doi:10.1111/j.1467-9531.2006.00176.x