Title: | Fit, Simulate, and Diagnose Hierarchical Exponential-Family Models for Big Networks |
---|---|
Description: | A toolbox for analyzing and simulating large networks based on hierarchical exponential-family random graph models (HERGMs).'bigergm' implements the estimation for large networks efficiently building on the 'lighthergm' and 'hergm' packages. Moreover, the package contains tools for simulating networks with local dependence to assess the goodness-of-fit. |
Authors: | Cornelius Fritz [aut, cre], Michael Schweinberger [aut], Shota Komatsu [aut], Juan Nelson Martínez Dahbura [aut], Takanori Nishida [aut], Angelo Mele [aut] |
Maintainer: | Cornelius Fritz <[email protected]> |
License: | GPL-3 |
Version: | 1.2.3 |
Built: | 2024-12-23 06:22:39 UTC |
Source: | CRAN |
This function computes the adjusted rand index (ARI) of the true and estimated block membership (its definition can be found here https://en.wikipedia.org/wiki/Rand_index). The adjusted rand index is used as a measure of association between two group membership vectors. The more similar the two partitions z_star and z are, the closer the ARI is to 1.
ari(z_star, z)
ari(z_star, z)
z_star |
The true block membership |
z |
The estimated block membership |
The adjusted rand index
data(toyNet) set.seed(123) ari(z_star = toyNet%v% "block", z = sample(c(1:4),size = 200,replace = TRUE))
data(toyNet) set.seed(123) ari(z_star = toyNet%v% "block", z = sample(c(1:4),size = 200,replace = TRUE))
The network corresponds to the contacts between the 17 terrorists who carried out the bombing in Bali, Indonesia in 2002. The network is taken from Koschade (2006).
A statnet
's network class object.
data(bali)
Koschade, S. (2006). A social network analysis of Jemaah Islamiyah: The applications to counter-terrorism and intelligence. Studies in Conflict and Terrorism, 29, 559–575.
The function bigergm
estimates and simulates three classes of exponential-family
random graph models for large networks under local dependence:
The p_1 model of Holland and Leinhardt (1981) in exponential-family form and extensions by Vu, Hunter, and Schweinberger (2013), Schweinberger, Petrescu-Prahova, and Vu (2014), Dahbura et al. (2021), and Fritz et al. (2024) to both directed and undirected random graphs with additional model terms, with and without covariates.
The stochastic block model of Snijders and Nowicki (1997) and Nowicki and Snijders (2001) in exponential-family form.
The exponential-family random graph models with local dependence of Schweinberger and Handcock (2015), with and without covariates. The exponential-family random graph models with local dependence replace the long-range dependence of conventional exponential-family random graph models by short-range dependence. Therefore, exponential-family random graph models with local dependence replace the strong dependence of conventional exponential-family random graph models by weak dependence, reducing the problem of model degeneracy (Handcock, 2003; Schweinberger, 2011) and improving goodness-of-fit (Schweinberger and Handcock, 2015). In addition, exponential-family random graph models with local dependence satisfy a weak form of self-consistency in the sense that these models are self-consistent under neighborhood sampling (Schweinberger and Handcock, 2015), which enables consistent estimation of neighborhood-dependent parameters (Schweinberger and Stewart, 2017; Schweinberger, 2017).
bigergm( object, add_intercepts = FALSE, n_blocks = NULL, n_cores = 1, blocks = NULL, estimate_parameters = TRUE, verbose = 0, n_MM_step_max = 100, tol_MM_step = 1e-04, initialization = "infomap", use_infomap_python = FALSE, virtualenv_python = "r-bigergm", seed_infomap = NULL, weight_for_initialization = 1000, seed = NULL, method_within = "MPLE", control_within = ergm::control.ergm(), clustering_with_features = TRUE, compute_pi = FALSE, check_alpha_update = FALSE, check_blocks = FALSE, cache = NULL, return_checkpoint = TRUE, only_use_preprocessed = FALSE, ... )
bigergm( object, add_intercepts = FALSE, n_blocks = NULL, n_cores = 1, blocks = NULL, estimate_parameters = TRUE, verbose = 0, n_MM_step_max = 100, tol_MM_step = 1e-04, initialization = "infomap", use_infomap_python = FALSE, virtualenv_python = "r-bigergm", seed_infomap = NULL, weight_for_initialization = 1000, seed = NULL, method_within = "MPLE", control_within = ergm::control.ergm(), clustering_with_features = TRUE, compute_pi = FALSE, check_alpha_update = FALSE, check_blocks = FALSE, cache = NULL, return_checkpoint = TRUE, only_use_preprocessed = FALSE, ... )
object |
An R |
add_intercepts |
Boolean value to indicate whether adequate intercepts should be added to the provided formula so that the model in the first stage of the estimation is a nested model of the estimated model in the second stage of the estimation. |
n_blocks |
The number of blocks. This must be specified by the user.
When you pass a |
n_cores |
The number of CPU cores to use. |
blocks |
The pre-specified block memberships for each node.
If |
estimate_parameters |
If |
verbose |
A logical or an integer: if this is TRUE/1, the program will print out additional information about the progress of estimation and simulation. A higher value yields lower level information. |
n_MM_step_max |
The maximum number of MM iterations.
Currently, no early stopping criteria is introduced. Thus |
tol_MM_step |
Tolerance regarding the relative change of the lower bound of the likelihood used to decide on the convergence of the clustering step |
initialization |
How the blocks should be initialized.
If |
use_infomap_python |
If |
virtualenv_python |
Which virtual environment should be used for the infomap algorithm? |
seed_infomap |
seed value (integer) for the infomap algorithm, which can be used to initialize the estimation of the blocks. |
weight_for_initialization |
weight value used for cluster initialization. The higher this value, the more weight is put on the initialized block allocation. |
seed |
seed value (integer) for the random number generator. |
method_within |
If "MPLE" (the default), then the maximum pseudolikelihood estimator is implemented when estimating the within-block network model. If "MLE", then an approximate maximum likelihood estimator is conducted. If "CD" (EXPERIMENTAL), the Monte-Carlo contrastive divergence estimate is returned. |
control_within |
A list of control parameters for the |
clustering_with_features |
If |
compute_pi |
If |
check_alpha_update |
If |
check_blocks |
If TRUE, this function keeps track of estimated block memberships at each MM iteration. |
cache |
a |
return_checkpoint |
If |
only_use_preprocessed |
If |
... |
Additional arguments, to be passed to lower-level functions (mainly to the |
An object of class 'bigergm' including the results of the fitted model. These include:
call of the mode
vector of the found block of the nodes into cluster
vector of the initial block of the nodes into cluster
Connection probabilities represented as a n_blocks x n_blocks
matrix from the first stage of the estimation between all clusters
list of cluster allocation for each node and each iteration
list of posterior distributions of cluster allocations for all nodes for each iteration
change in 'alpha' for each iteration
vector of the evidence lower bounds from the MM algorithm
matrix representing the converged posterior distributions of cluster allocations for all nodes
integer number indicating the number of iterations carried out
sparse matrix representing the adjacency matrix used for the estimation
character stating the status of the estimation
ergm
object of the model for within cluster connections
ergm
object of the model for between cluster connections
list of information to continue the estimation (only returned if return_checkpoint = TRUE
)
vector of the found blocks of the nodes into cluster before the final check for bad clusters
binary value if the parameters in the second step of the algorithm should be estimated or not
Babkin, S., Stewart, J., Long, X., and M. Schweinberger (2020). Large-scale estimation of random graph models with local dependence. Computational Statistics and Data Analysis, 152, 1–19.
Dahbura, J. N. M., Komatsu, S., Nishida, T. and Mele, A. (2021), ‘A structural model of business cards exchange networks’. https://arxiv.org/abs/2105.12704
Fritz C., Georg C., Mele A., and Schweinberger M. (2024). A strategic model of software dependency networks. https://arxiv.org/abs/2402.13375
Handcock, M. S. (2003). Assessing degeneracy in statistical models of social networks. Technical report, Center for Statistics and the Social Sciences, University of Washington, Seattle.
https://csss.uw.edu/Papers/wp39.pdf
Holland, P. W. and S. Leinhardt (1981). An exponential family of probability distributions for directed graphs. Journal of the American Statistical Association, Theory & Methods, 76, 33–65.
Morris M, Handcock MS, Hunter DR (2008). Specification of Exponential-Family Random Graph Models: Terms and Computational Aspects. Journal of Statistical Software, 24.
Nowicki, K. and T. A. B. Snijders (2001). Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, Theory & Methods, 96, 1077–1087.
Schweinberger, M. (2011). Instability, sensitivity, and degeneracy of discrete exponential families. Journal of the American Statistical Association, Theory & Methods, 106, 1361–1370.
Schweinberger, M. (2020). Consistent structure estimation of exponential-family random graph models with block structure. Bernoulli, 26, 1205–1233.
Schweinberger, M. and M. S. Handcock (2015). Local dependence in random graph models: characterization, properties, and statistical inference. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 7, 647-676.
Schweinberger, M., Krivitsky, P. N., Butts, C.T. and J. Stewart (2020). Exponential-family models of random graphs: Inference in finite, super, and infinite population scenarios. Statistical Science, 35, 627-662.
Schweinberger, M. and P. Luna (2018). HERGM: Hierarchical exponential-family random graph models. Journal of Statistical Software, 85, 1–39.
Schweinberger, M., Petrescu-Prahova, M. and D. Q. Vu (2014). Disaster response on September 11, 2001 through the lens of statistical network analysis. Social Networks, 37, 42–55.
Schweinberger, M. and J. Stewart (2020). Concentration and consistency results for canonical and curved exponential-family random graphs. The Annals of Statistics, 48, 374–396.
Snijders, T. A. B. and K. Nowicki (1997). Estimation and prediction for stochastic blockmodels for graphs with latent block structure. Journal of Classification, 14, 75–100.
Stewart, J., Schweinberger, M., Bojanowski, M., and M. Morris (2019). Multilevel network data facilitate statistical inference for curved ERGMs with geometrically weighted terms. Social Networks, 59, 98–119.
Vu, D. Q., Hunter, D. R. and M. Schweinberger (2013). Model-based clustering of large networks. Annals of Applied Statistics, 7, 1010–1039.
# Load an embedded network object. data(toyNet) # Specify the model that you would like to estimate. model_formula <- toyNet ~ edges + nodematch("x") + nodematch("y") + triangle # Estimate the model bigergm_res <- bigergm( object = model_formula, # The model you would like to estimate n_blocks = 4, # The number of blocks n_MM_step_max = 10, # The maximum number of MM algorithm steps estimate_parameters = TRUE, # Perform parameter estimation after the block recovery step clustering_with_features = TRUE, # Indicate that clustering must take into account nodematch on characteristics check_blocks = FALSE) # Example with N() operator ## Not run: set.seed(1) # Prepare ingredients for simulating a network N <- 500 K <- 10 list_within_params <- c(1, 2, 2,-0.5) list_between_params <- c(-8, 0.5, -0.5) formula <- g ~ edges + nodematch("x") + nodematch("y") + N(~edges,~log(n)-1) memb <- sample(1:K,prob = c(0.1,0.2,0.05,0.05,0.10,0.1,0.1,0.1,0.1,0.1), size = N, replace = TRUE) vertex_id <- as.character(11:(11 + N - 1)) x <- sample(1:2, size = N, replace = TRUE) y <- sample(1:2, size = N, replace = TRUE) df <- tibble::tibble( id = vertex_id, memb = memb, x = x, y = y ) g <- network::network.initialize(n = N, directed = FALSE) g %v% "vertex.names" <- df$id g %v% "block" <- df$memb g %v% "x" <- df$x g %v% "y" <- df$y # Simulate a network g_sim <- simulate_bigergm( formula = formula, coef_within = list_within_params, coef_between = list_between_params, nsim = 1, control_within = control.simulate.formula(MCMC.burnin = 200000)) estimation <- bigergm(update(formula,new = g_sim~.), n_blocks = 10, verbose = T) summary(estimation) ## End(Not run)
# Load an embedded network object. data(toyNet) # Specify the model that you would like to estimate. model_formula <- toyNet ~ edges + nodematch("x") + nodematch("y") + triangle # Estimate the model bigergm_res <- bigergm( object = model_formula, # The model you would like to estimate n_blocks = 4, # The number of blocks n_MM_step_max = 10, # The maximum number of MM algorithm steps estimate_parameters = TRUE, # Perform parameter estimation after the block recovery step clustering_with_features = TRUE, # Indicate that clustering must take into account nodematch on characteristics check_blocks = FALSE) # Example with N() operator ## Not run: set.seed(1) # Prepare ingredients for simulating a network N <- 500 K <- 10 list_within_params <- c(1, 2, 2,-0.5) list_between_params <- c(-8, 0.5, -0.5) formula <- g ~ edges + nodematch("x") + nodematch("y") + N(~edges,~log(n)-1) memb <- sample(1:K,prob = c(0.1,0.2,0.05,0.05,0.10,0.1,0.1,0.1,0.1,0.1), size = N, replace = TRUE) vertex_id <- as.character(11:(11 + N - 1)) x <- sample(1:2, size = N, replace = TRUE) y <- sample(1:2, size = N, replace = TRUE) df <- tibble::tibble( id = vertex_id, memb = memb, x = x, y = y ) g <- network::network.initialize(n = N, directed = FALSE) g %v% "vertex.names" <- df$id g %v% "block" <- df$memb g %v% "x" <- df$x g %v% "y" <- df$y # Simulate a network g_sim <- simulate_bigergm( formula = formula, coef_within = list_within_params, coef_between = list_between_params, nsim = 1, control_within = control.simulate.formula(MCMC.burnin = 200000)) estimation <- bigergm(update(formula,new = g_sim~.), n_blocks = 10, verbose = T) summary(estimation) ## End(Not run)
Van de Bunt (1999) and Van de Bunt et al. (1999)
collected data on friendships between 32 freshmen at a European university at 7 time points.
Here, the last time point is used.
A directed edge from student i
to j
indicates that student i
considers student j
to be a friend" or
best friend".
A statnet
's network class object.
data(bunt)
Van de Bunt, G. G. (1999). Friends by choice. An Actor-Oriented Statistical Network Model for Friendship Networks through Time. Thesis Publishers, Amsterdam.
Van de Bunt, G. G., Van Duijn, M. A. J., and T. A. B. Snijders (1999). Friendship Networks Through Time: An Actor-Oriented Statistical Network Model. Computational and Mathematical Organization Theory, 5, 167–192.
Function to estimate the between-block model by relying on the maximum likelihood estimator.
est_between( formula, network, add_intercepts = TRUE, clustering_with_features = FALSE )
est_between( formula, network, add_intercepts = TRUE, clustering_with_features = FALSE )
formula |
An R |
network |
a network object with one vertex attribute called 'block' representing which node belongs to which block |
add_intercepts |
Boolean value to indicate whether adequate intercepts should be added to the provided formula so that the model in the first stage of the estimation is a nested model of the estimated model in the second stage of the estimation |
clustering_with_features |
Boolean value to indicate if the clustering
was carried out making use of the covariates or not (only important if |
'ergm' object of the estimated model.
Morris M, Handcock MS, Hunter DR (2008). Specification of Exponential-Family Random Graph Models: Terms and Computational Aspects. Journal of Statistical Software, 24.
adj <- c( c(0, 1, 0, 0, 1, 0), c(1, 0, 1, 0, 0, 1), c(0, 1, 0, 1, 1, 0), c(0, 0, 1, 0, 1, 1), c(1, 0, 1, 1, 0, 1), c(0, 1, 0, 1, 1, 0) ) adj <- matrix(data = adj, nrow = 6, ncol = 6) rownames(adj) <- as.character(1001:1006) colnames(adj) <- as.character(1001:1006) # Use non-consecutive block names block <- c(50, 70, 95, 50, 95, 70) g <- network::network(adj, matrix.type = "adjacency") g %v% "block" <- block est <- est_between( formula = g ~ edges,network = g, add_intercepts = FALSE, clustering_with_features = FALSE )
adj <- c( c(0, 1, 0, 0, 1, 0), c(1, 0, 1, 0, 0, 1), c(0, 1, 0, 1, 1, 0), c(0, 0, 1, 0, 1, 1), c(1, 0, 1, 1, 0, 1), c(0, 1, 0, 1, 1, 0) ) adj <- matrix(data = adj, nrow = 6, ncol = 6) rownames(adj) <- as.character(1001:1006) colnames(adj) <- as.character(1001:1006) # Use non-consecutive block names block <- c(50, 70, 95, 50, 95, 70) g <- network::network(adj, matrix.type = "adjacency") g %v% "block" <- block est <- est_between( formula = g ~ edges,network = g, add_intercepts = FALSE, clustering_with_features = FALSE )
Function to estimate the within-block model. Both pseudo-maximum likelihood and monte carlo approximate maximum likelihood estimators are implemented.
est_within( formula, network, seed = NULL, method = "MPLE", add_intercepts = TRUE, clustering_with_features = FALSE, return_network = FALSE, ... )
est_within( formula, network, seed = NULL, method = "MPLE", add_intercepts = TRUE, clustering_with_features = FALSE, return_network = FALSE, ... )
formula |
An R |
network |
a network object with one vertex attribute called 'block' representing which node belongs to which block |
seed |
seed value (integer) for the random number generator |
method |
If "MPLE" (the default), then the maximum pseudolikelihood estimator is returned. If "MLE", then an approximate maximum likelihood estimator is returned. |
add_intercepts |
Boolean value to indicate whether adequate intercepts should be added to the provided formula so that the model in the first stage of the estimation is a nested model of the estimated model in the second stage of the estimation |
clustering_with_features |
Boolean value to indicate if the clustering
was carried out making use of the covariates or not (only important if |
return_network |
Boolean value to indicate if the network object should be returned in the output.
This is needed if the user wants to use, e.g., the |
... |
Additional arguments, to be passed to the |
'ergm' object of the estimated model.
Morris M, Handcock MS, Hunter DR (2008). Specification of Exponential-Family Random Graph Models: Terms and Computational Aspects. Journal of Statistical Software, 24.
adj <- c( c(0, 1, 0, 0, 1, 0), c(1, 0, 1, 0, 0, 1), c(0, 1, 0, 1, 1, 0), c(0, 0, 1, 0, 1, 1), c(1, 0, 1, 1, 0, 1), c(0, 1, 0, 1, 1, 0) ) adj <- matrix(data = adj, nrow = 6, ncol = 6) rownames(adj) <- as.character(1001:1006) colnames(adj) <- as.character(1001:1006) # Use non-consecutive block names block <- c(70, 70, 70, 70, 95, 95) g <- network::network(adj, matrix.type = "adjacency", directed = FALSE) g %v% "block" <- block g %v% "vertex.names" <- 1:length(g %v% "vertex.names") est <- est_within( formula = g ~ edges, network = g, parallel = FALSE, verbose = 0, initial_estimate = NULL, seed = NULL, method = "MPLE", add_intercepts = FALSE, clustering_with_features = FALSE )
adj <- c( c(0, 1, 0, 0, 1, 0), c(1, 0, 1, 0, 0, 1), c(0, 1, 0, 1, 1, 0), c(0, 0, 1, 0, 1, 1), c(1, 0, 1, 1, 0, 1), c(0, 1, 0, 1, 1, 0) ) adj <- matrix(data = adj, nrow = 6, ncol = 6) rownames(adj) <- as.character(1001:1006) colnames(adj) <- as.character(1001:1006) # Use non-consecutive block names block <- c(70, 70, 70, 70, 95, 95) g <- network::network(adj, matrix.type = "adjacency", directed = FALSE) g %v% "block" <- block g %v% "vertex.names" <- 1:length(g %v% "vertex.names") est <- est_within( formula = g ~ edges, network = g, parallel = FALSE, verbose = 0, initial_estimate = NULL, seed = NULL, method = "MPLE", add_intercepts = FALSE, clustering_with_features = FALSE )
Function to return a list of networks, each network representing the within-block network of a block.
get_between_networks(network, block)
get_between_networks(network, block)
network |
a network object |
block |
a vector of integers representing the block of each node |
a list of networks
# Load an embedded network object. data(toyNet) get_within_networks(toyNet, toyNet %v% "block")
# Load an embedded network object. data(toyNet) get_within_networks(toyNet, toyNet %v% "block")
Function to return a list of networks, each network representing the within-block network of a block.
get_within_networks(network, block, combined_networks = TRUE)
get_within_networks(network, block, combined_networks = TRUE)
network |
a network object |
block |
a vector of integers representing the block of each node |
combined_networks |
a boolean indicating whether the between-block networks should be returned as a |
a list of networks
# Load an embedded network object. data(toyNet) get_within_networks(toyNet, toyNet %v% "block")
# Load an embedded network object. data(toyNet) get_within_networks(toyNet, toyNet %v% "block")
A sample of graphs is randomly drawn from the specified model. The first
argument is typically the output of a call to bigergm
and the
model used for that call is the one fit.
By default, the sample consists of 100 simulated networks, but this sample
size (and many other settings) can be changed using the ergm_control
argument described above.
## S3 method for class 'bigergm' gof( object, ..., type = "full", control_within = ergm::control.simulate.formula(), seed = NULL, nsim = 100, compute_geodesic_distance = TRUE, start_from_observed = TRUE, simulate_sbm = FALSE )
## S3 method for class 'bigergm' gof( object, ..., type = "full", control_within = ergm::control.simulate.formula(), seed = NULL, nsim = 100, compute_geodesic_distance = TRUE, start_from_observed = TRUE, simulate_sbm = FALSE )
object |
An |
... |
Additional arguments, to be passed to |
type |
the type of evaluation to perform. Can take the values |
control_within |
MCMC parameters as an instance of |
seed |
the seed to be passed to simulate_bigergm. If |
nsim |
the number of simulations to employ for calculating goodness of fit, default is 100. |
compute_geodesic_distance |
if |
start_from_observed |
if |
simulate_sbm |
if |
gof.bigergm
returns a list with two entries.
The first entry 'original' is another list of the network stats, degree distribution, edgewise-shared partner distribution, and geodesic distance distribution (if compute_geodesic_distance = TRUE
) of the observed network.
The second entry is called 'simulated' is also list compiling the network stats, degree distribution, edgewise-shared partner distribution, and geodesic distance distribution (if compute_geodesic_distance = TRUE
) of all simulated networks.
data(toyNet) # Specify the model that you would like to estimate. data(toyNet) # Specify the model that you would like to estimate. model_formula <- toyNet ~ edges + nodematch("x") + nodematch("y") + triangle estimate <- bigergm(model_formula,n_blocks = 4) gof_res <- gof(estimate, nsim = 100 ) plot(gof_res)
data(toyNet) # Specify the model that you would like to estimate. data(toyNet) # Specify the model that you would like to estimate. model_formula <- toyNet ~ edges + nodematch("x") + nodematch("y") + triangle estimate <- bigergm(model_formula,n_blocks = 4) gof_res <- gof(estimate, nsim = 100 ) plot(gof_res)
The network corresponds to collaborations between 39 workers in a tailor shop in Africa:
an undirected edge between workers i
and j
indicates that the workers collaborated.
The network is taken from Kapferer (1972).
A statnet
's network class object.
data(kapferer)
Kapferer, B. (1972). Strategy and Transaction in an African Factory. Manchester University Press, Manchester, U.K.
This function plots the network with the found clusters. The nodes are colored according to the found clusters.
Note that the function uses the network
package for plotting the network and should therefore not be used for large networks with more than 1-2 K vertices
## S3 method for class 'bigergm' plot(x, ...)
## S3 method for class 'bigergm' plot(x, ...)
x |
The output of the bigergm function |
... |
Additional arguments, to be passed to lower-level functions |
Install Python dependencies needed for using the Python implementation of infomap.
The code uses the reticulate
package to install the Python packages infomap
and numpy
.
These packages are needed for the bigergm
function when use_infomap_python = TRUE
else the Python implementation is not needed.
py_dep(envname = "r-bigergm", method = "auto", ...)
py_dep(envname = "r-bigergm", method = "auto", ...)
envname |
The name, or full path, of the environment in which Python packages are to be installed. When NULL (the default), the active environment as set by the RETICULATE_PYTHON_ENV variable will be used; if that is unset, then the r-reticulate environment will be used. |
method |
Installation method. By default, "auto" automatically finds a method that will work in the local environment. Change the default to force a specific installation method. Note that the "virtualenv" method is not available on Windows. |
... |
Additional arguments, to be passed to lower-level functions |
No return value, called for installing the Python dependencies 'infomap' and 'numpy'
The data was collected by Facebook and provided as part of Traud et al. (2012)
A statnet
's network class object. It has three nodal features.
anonymized dorm in which each node lives.
gender of each node.
anonymized highschool to which each node went to.
year of graduation of each node.
... data(reed)
Traud, Mucha, Porter (2012). Social Structure of Facebook Network. Physica A: Statistical Mechanics and its Applications, 391, 4165-4180
The data was collected by Facebook and provided as part of Traud et al. (2012)
A statnet
's network class object. It has three nodal features.
anonymized dorm in which each node lives.
gender of each node.
anonymized highschool to which each node went to.
year of graduation of each node.
data(rice)
Traud, Mucha, Porter (2012). Social Structure of Facebook Network. Physica A: Statistical Mechanics and its Applications, 391, 4165-4180
This function simulates networks under Exponential Random Graph Models (ERGMs) with local dependence.
There is also an option to simulate only within-block networks and a S3 method for the class bigergm
.
simulate_bigergm( formula, coef_within, coef_between, network = ergm.getnetwork(formula), control_within = ergm::control.simulate.formula(), only_within = FALSE, seed = NULL, nsim = 1, output = "network", verbose = 0, ... )
simulate_bigergm( formula, coef_within, coef_between, network = ergm.getnetwork(formula), control_within = ergm::control.simulate.formula(), only_within = FALSE, seed = NULL, nsim = 1, output = "network", verbose = 0, ... )
formula |
An R |
coef_within |
a vector of within-block parameters. The order of the parameters should match that of the formula. |
coef_between |
a vector of between-block parameters. The order of the parameters should match that of the formula without externality terms. |
network |
a network object to be used as a seed network for the simulation (if none is provided, the network on the lhs of the |
control_within |
auxiliary function as user interface for fine-tuning ERGM simulation for within-block networks. |
only_within |
If this is TRUE, only within-block networks are simulated. |
seed |
seed value (integer) for network simulation. |
nsim |
number of networks generated. |
output |
Normally character, one of "network" (default), "stats", "edgelist", to determine the output format. |
verbose |
If this is TRUE/1, the program will print out additional information about the progress of simulation. |
... |
Additional arguments, passed to |
Simulated networks, the output form depends on the parameter output
(default is a list of networks).
Morris M, Handcock MS, Hunter DR (2008). Specification of Exponential-Family Random Graph Models: Terms and Computational Aspects. Journal of Statistical Software, 24.
data(toyNet) # Specify the model that you would like to estimate. model_formula <- toyNet ~ edges + nodematch("x") + nodematch("y") + triangle # Simulate network stats sim_stats <- bigergm::simulate_bigergm( formula = model_formula, # Formula for the model coef_between = c(-4.5,0.8, 0.4), # The coefficients for the between connections coef_within = c(-1.7,0.5,0.6,0.15), # The coefficients for the within connections nsim = 10, # Number of simulations to return output = "stats", # Type of output )
data(toyNet) # Specify the model that you would like to estimate. model_formula <- toyNet ~ edges + nodematch("x") + nodematch("y") + triangle # Simulate network stats sim_stats <- bigergm::simulate_bigergm( formula = model_formula, # Formula for the model coef_between = c(-4.5,0.8, 0.4), # The coefficients for the between connections coef_within = c(-1.7,0.5,0.6,0.15), # The coefficients for the within connections nsim = 10, # Number of simulations to return output = "stats", # Type of output )
This function simulates networks under the Exponential Random Graph Model (ERGM)
with local dependence with all parameters set according to the estimated model (object
).
See simulate_bigergm
for details of the simulation process
## S3 method for class 'bigergm' simulate( object, nsim = 1, seed = NULL, ..., output = "network", control_within = ergm::control.simulate.formula(), only_within = FALSE, verbose = 0 )
## S3 method for class 'bigergm' simulate( object, nsim = 1, seed = NULL, ..., output = "network", control_within = ergm::control.simulate.formula(), only_within = FALSE, verbose = 0 )
object |
an object of class |
nsim |
number of networks to be randomly drawn from the given distribution on the set of all networks. |
seed |
seed value (integer) for network simulation. |
... |
Additional arguments, passed to |
output |
Normally character, one of "network" (default), "stats", "edgelist", to determine the output of the function. |
control_within |
|
only_within |
If this is TRUE, only within-block networks are simulated. |
verbose |
If this is TRUE/1, the program will print out additional information about the progress of simulation. |
Simulated networks, the output form depends on the parameter output
(default is a list of networks).
The network includes the Twitter (X) following interactions between U.S. state legislators. The data was collection by Gopal et al. (2022) and Kim et al. (2022). For this network, we only include the largest connected component of state legislators that were active on Twitter in the six months leading up to and including the insurrection at the United States Capitol on January 6, 2021. All state senate and state representatives for states with a bicameral system are included and all state legislators for state (Nebraska) with a unicameral system are included.
data(state_twitter)
data(state_twitter)
A statnet
's network class object. It has the following categorical attributes for each state legislator.
factor stating whether the legislator is 'female' or 'male'.
party affiliation of the legislator, which is 'Democratic', 'Independent' or 'Republican'.
race with the following levels: 'Asian or Pacific Islander', 'Black', 'Latino', 'MENA(Middle East and North Africa)','Multiracial', 'Native American', and 'White'.
character of the state that the legislator represents.
Gopal, Kim, Nakka, Boehmke, Harden, Desmarais. The National Network of U.S. State Legislators on Twitter. Political Science Research & Methods, Forthcoming.
Kim, Nakka, Gopal, Desmarais,Mancinelli, Harden, Ko, and Boehmke (2022). Attention to the COVID-19 pandemic on Twitter: Partisan differences among U.S. state legislators. Legislative Studies Quarterly 47, 1023–1041.
bigergm
with.This network has a clear cluster structure. The number of clusters is four, and which cluster each node belongs to is defined in the variable "block".
data(toyNet)
data(toyNet)
A statnet
's network class object. It has three nodal features.
block membership of each node
a covariate. It has 10 labels.
a covariate. It has 10 labels.
...
1
and 2
are not variables with any particular meaning.
This function computes Yule's Phi-coefficient between the true and estimated block membership (its definition can be found here https://en.wikipedia.org/wiki/Phi_coefficient). In this context, the Phi Coefficient is a measure of association between two group membership vectors.
yule(z_star, z)
yule(z_star, z)
z_star |
a true block membership |
z |
an estimated block membership |
Real value of Yule's Phi-coefficient between the true and estimated block membership is returned.
data(toyNet) yule(z_star = toyNet%v% "block", z = sample(c(1:4),size = 200,replace = TRUE))
data(toyNet) yule(z_star = toyNet%v% "block", z = sample(c(1:4),size = 200,replace = TRUE))