Package 'gsbm' reference manual

Title:	Estimate Parameters in the Generalized SBM
Description:	Given an adjacency matrix drawn from a Generalized Stochastic Block Model with missing observations, this package robustly estimates the probabilities of connection between nodes and detects outliers nodes, as describes in Gaucher, Klopp and Robin (2019) <arXiv:1911.13122>.
Authors:	Genevieve Robin [aut], Solenne Gaucher [aut, cre]
Maintainer:	Solenne Gaucher <[email protected]>
License:	GPL-3
Version:	0.2.2
Built:	2025-02-19 07:02:54 UTC
Source:	CRAN

Political blogs network

Description

Network of political blogs, first analyzed in "The political blogosphere and the 2004 US Election" by Lada A. Adamic and Natalie Glance, in Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem (2005). This data set, collected before the 2004 American presidential election, records hyperlinks connecting political blogs to one another. These blogs have been labeled manually as either "liberal" or "conservative".

Usage

data(blogosphere)
data(blogosphere)

Format

A list with 3 attributes:

A: adjacency matrix of the graph. A binary matrix encoding 16714 connections between 1222 nodes
names: vector of the names political blogs corresponding to the rows and columns of the adjacency matrix
opinion: vector of the political orientation of the blogs (0 for liberal, 1 for conservative).

Source

http://www-personal.umich.edu/~mejn/netdata/

Parameter selection by cross validation

Description

Selection by cross validation of the regularization parameters (lambda1 and lambda2) for estimating the probabilities of connection in a Generalized Stochastic Block Model.

Usage

crossval(
  A,
  epsilon = 0.1,
  nb.boot = 5,
  thresh = 1e-05,
  maxit = 100,
  lambda1.max = NULL,
  lambda2.max = NULL,
  lambda1.min = NULL,
  lambda2.min = NULL,
  length = 10,
  S0 = NULL,
  L0 = NULL,
  trace.it = FALSE
)
crossval(
  A,
  epsilon = 0.1,
  nb.boot = 5,
  thresh = 1e-05,
  maxit = 100,
  lambda1.max = NULL,
  lambda2.max = NULL,
  lambda1.min = NULL,
  lambda2.min = NULL,
  length = 10,
  S0 = NULL,
  L0 = NULL,
  trace.it = FALSE
)

Arguments

`A`	nxn adjacency matrix
`epsilon`	regularization parameter for the L^2-norm penalty (positive number, if NULL, default method is applied)
`nb.boot`	number of folds for cross validation (integer)
`thresh`	convergence tolerance (positive number)
`maxit`	maximum number of iterations (positive integer)
`lambda1.max`	maximum regularization parameter for nuclear norm penalty (positive number)
`lambda2.max`	maximum regularization parameter for 2,1-norm norm penalty (positive number)
`lambda1.min`	minimum regularization parameter for nuclear norm penalty (positive number)
`lambda2.min`	minimum regularization parameter for 2,1-norm norm penalty (positive number)
`length`	size of cross-validation grid (integer)
`S0`	initial value for the sparse component
`L0`	initial value for the low-rank component
`trace.it`	whether messages about convergence should be printed (boolean)

Value

The values selected by cross-validation for the regularization parameters lambda1 and lambda2. The return value is a list of components

lambda1 selected value for the parameter of the nuclear norm penalization.
lambda2 selected value for the parameter of the 2,1-norm penalization.
estim.cv result of the gsbm_mcgd function for the parameters selected.
error a table containing the errors for all pairs of parameters on the grid.
lambda1.grid grid of value for the parameter lambda1.
lambda2.grid grid of value for the parameter lambda2.

Examples

# Draw a 50x50 adjacency matrix
# Generalized SBM with 2 communities and 2 outliers
# Create low-rank matrix L
L <- matrix(0,50,50) # low-rank component
L[1:25, 1:25] <- 0.6 # connection probabilities within community 1
L[1:25, 26:48] <- 0.1 # connection probabilities between communities 1 and 2
L[26:48, 1:25] <- 0.1 # connection probabilities between communities 1 and 2
L[26:48, 26:48] <- 0.6 # connection probabilities within community 2

# Create column-sparse matrix S
S <- matrix(0,50,50) # column sparse component
S[49:50,1:48] <- 0.6 # connection probabilities between outliers and inliers

# Draw connections and create the adjacency matrix
undir <- rbinom(n=50*(50-1)/2, size=1, prob=(L+S+t(S))[upper.tri(L+S+t(S))]) # draw edges
A <- matrix(0,50,50)
A[upper.tri(A)] <- undir
A <- (A+t(A))
# Draw a 50x50 adjacency matrix
# Generalized SBM with 2 communities and 2 outliers
# Create low-rank matrix L
L <- matrix(0,50,50) # low-rank component
L[1:25, 1:25] <- 0.6 # connection probabilities within community 1
L[1:25, 26:48] <- 0.1 # connection probabilities between communities 1 and 2
L[26:48, 1:25] <- 0.1 # connection probabilities between communities 1 and 2
L[26:48, 26:48] <- 0.6 # connection probabilities within community 2

# Create column-sparse matrix S
S <- matrix(0,50,50) # column sparse component
S[49:50,1:48] <- 0.6 # connection probabilities between outliers and inliers

# Draw connections and create the adjacency matrix
undir <- rbinom(n=50*(50-1)/2, size=1, prob=(L+S+t(S))[upper.tri(L+S+t(S))]) # draw edges
A <- matrix(0,50,50)
A[upper.tri(A)] <- undir
A <- (A+t(A))

Fit a Generalized Stochastic Block Model

Description

Given an adjacency matrix with missing observations, the function gsbm_mgcd robustly estimates the probabilities of connections between nodes.

Usage

gsbm_mcgd(
  A,
  lambda1,
  lambda2,
  epsilon = 0.1,
  U = NULL,
  maxit = 100,
  thresh = 1e-06,
  S0 = NULL,
  L0 = NULL,
  R0 = NULL,
  trace.it = FALSE
)
gsbm_mcgd(
  A,
  lambda1,
  lambda2,
  epsilon = 0.1,
  U = NULL,
  maxit = 100,
  thresh = 1e-06,
  S0 = NULL,
  L0 = NULL,
  R0 = NULL,
  trace.it = FALSE
)

Arguments

`A`	nxn adjacency matrix
`lambda1`	regularization parameter for nuclear norm penalty (positive number)
`lambda2`	regularization parameter for 2,1-norm penalty (positive number)
`epsilon`	regularization parameter for the L2-norm penalty (positive number, if NULL, default method is applied)
`U`	lower bound on nuclear norm (positive number, if NULL, default method is applied)
`maxit`	maximum number of iterations (positive integer, if NULL, default method is applied)
`thresh`	convergence tolerance (positive number, if NULL, default method is applied)
`S0`	initial value for the sparse component (if NULL, default method is applied))
`L0`	initial value for the low-rank component (if NULL, default method is applied)
`R0`	lower bound on nuclear norm of L0 (positive number, if NULL, default method is applied)
`trace.it`	whether messages about convergence should be printed (boolean, if NULL, default is FALSE)

Value

The estimate for the nxn matrix of probabilities of connections between nodes. It is given as the sum of a low-rank nxn matrix L, corresponding to connections between inlier nodes, and a column sparse nxn matrix S, corresponding to connections between outlier nodes and the rest of the network. The matrices L and S are such that

E(A) = L - diag(L) + S + S'

where E(A) is the expectation of the adjacency matrix, diag(L) is a nxn diagonal matrix with diagonal entries equal to those of L, and S' means S transposed.

The return value is a list of components

A the adjacency matrix.
L estimate for the low-rank component.
S estimate for the column-sparse component.
objective the value of the objective function.
R a bound on the nuclear norm of the low-rank component.
iter number of iterations between convergence of the objective function.

Examples

# Draw a 50x50 adjacency matrix
# Generalized SBM with 2 communities and 2 outliers
# Create low-rank matrix L
L <- matrix(0,50,50) # low-rank component
L[1:25, 1:25] <- 0.6 # connection probabilities within community 1
L[1:25, 26:48] <- 0.1 # connection probabilities between communities 1 and 2
L[26:48, 1:25] <- 0.1 # connection probabilities between communities 1 and 2
L[26:48, 26:48] <- 0.6 # connection probabilities within community 2

# Create column-sparse matrix S
S <- matrix(0,50,50) # column sparse component
S[49:50,1:48] <- 0.6 # connection probabilties between outliers and inliers

# Draw connections and create the adjacency matrix
undir <- rbinom(n=50*(50-1)/2, size=1, prob=(L+S+t(S))[upper.tri(L+S+t(S))]) # draw edges
A <- matrix(0,50,50)
A[upper.tri(A)] <- undir
A <- (A+t(A))

# Estimate the probabilities of connection
lambda1 <- 7
lambda2 <- 7
res <- gsbm_mcgd(A, lambda1, lambda2)
# Draw a 50x50 adjacency matrix
# Generalized SBM with 2 communities and 2 outliers
# Create low-rank matrix L
L <- matrix(0,50,50) # low-rank component
L[1:25, 1:25] <- 0.6 # connection probabilities within community 1
L[1:25, 26:48] <- 0.1 # connection probabilities between communities 1 and 2
L[26:48, 1:25] <- 0.1 # connection probabilities between communities 1 and 2
L[26:48, 26:48] <- 0.6 # connection probabilities within community 2

# Create column-sparse matrix S
S <- matrix(0,50,50) # column sparse component
S[49:50,1:48] <- 0.6 # connection probabilties between outliers and inliers

# Draw connections and create the adjacency matrix
undir <- rbinom(n=50*(50-1)/2, size=1, prob=(L+S+t(S))[upper.tri(L+S+t(S))]) # draw edges
A <- matrix(0,50,50)
A[upper.tri(A)] <- undir
A <- (A+t(A))

# Estimate the probabilities of connection
lambda1 <- 7
lambda2 <- 7
res <- gsbm_mcgd(A, lambda1, lambda2)

Fit a Generalized Stochastic Block Model

Description

Given an adjacency matrix with missing observations, the function gsbm_mgcd robustly estimates the probabilities of connections between nodes.

Usage

gsbm_mcgd_parallel(
  A,
  lambda1,
  lambda2,
  epsilon = 0.1,
  maxit = 100,
  step_L = 0.01,
  step_S = 0.1,
  trace.it = FALSE,
  n_cores = detectCores(),
  save = FALSE,
  file = NULL
)
gsbm_mcgd_parallel(
  A,
  lambda1,
  lambda2,
  epsilon = 0.1,
  maxit = 100,
  step_L = 0.01,
  step_S = 0.1,
  trace.it = FALSE,
  n_cores = detectCores(),
  save = FALSE,
  file = NULL
)

Arguments

`A`	nxn adjacency matrix
`lambda1`	regularization parameter for nuclear norm penalty (positive number)
`lambda2`	regularization parameter for 2,1-norm penalty (positive number)
`epsilon`	regularization parameter for the L2-norm penalty (positive number, if NULL, default method is applied)
`maxit`	maximum number of iterations (positive integer, if NULL, default method is applied)
`step_L`	step size for the gradient step of L parameter (positive number)
`step_S`	step size for the gradient step of S parameter (positive number)
`trace.it`	whether messages about convergence should be printed (boolean, if NULL, default is FALSE)
`n_cores`	number of cores to parallellize on (integer number, default is set with detectCores())
`save`	whether or not value of current estimates should be saved at each iteration (boolean)
`file`	if save is set to TRUE, name of the folder where current estimates should be saved (character string, file saved in file/L_iter.txt at iteration iter)

Value

E(A) = L - diag(L) + S + S'

where E(A) is the expectation of the adjacency matrix, diag(L) is a nxn diagonal matrix with diagonal entries equal to those of L, and S' means S transposed.

The return value is a list of components

A the adjacency matrix.
L estimate for the low-rank component.
S estimate for the column-sparse component.
objective the value of the objective function.
R a bound on the nuclear norm of the low-rank component.
iter number of iterations between convergence of the objective function.

Character network from "Les miserables" novel

Description

A dataset containing Les Misérables characters network, encoding interactions between characters of Victor Hugo's novel. Two characters are connected whenever they appear in the same chapter. This dataset was first created by Donald Knuth as part of the Stanford Graph Base. (https://people.sc.fsu.edu/~jburkardt/datasets/sgb/sgb.html). It contains 77 nodes corresponding to characters of the novel, and 254

Usage

data(les_miserables)
data(les_miserables)

Format

A list with 2 attributes:

A: adjacency matrix of the graph. A binary matrix encoding 254 connections between 77 nodes
names: a vector giving the names of the characters corresponding to the rows and columns of the adjacency matrix

Source

https://people.sc.fsu.edu/~jburkardt/datasets/sgb/sgb.html

Network of interactions within a primary school in the course of a day

Description

This network, collected and analyzed by J. Stehle et al. in "High-resolution measurements of face-to-face contact patterns in a primary school", records physical interactions between 226 children and 10 teachers within a primary school over the course of a day. The network data was collected using a system of sensors worn by the participants. This system records the duration of interactions between two individuals facing each other at a maximum distance of one and a half meters.

Usage

data(PrimarySchool)
data(PrimarySchool)

Format

A list with 2 attributes:

A: adjacency matrix of the graph. A binary matrix encoding 2490 undirected connections between 236 nodes, with 7054 missing entries
class: vector indicating the class of the node if the corresponding individual is a child, and otherwise that it belongs to the group of teachers.

Source

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0023176

Package 'gsbm'

Help Index

Political blogs network

Description

Usage

Format

Source

Parameter selection by cross validation

Description

Usage

Arguments

Value

Examples

Fit a Generalized Stochastic Block Model

Description

Usage

Arguments

Value

Examples

Fit a Generalized Stochastic Block Model

Description

Usage

Arguments

Value

Character network from "Les miserables" novel

Description

Usage

Format

Source

Network of interactions within a primary school in the course of a day

Description

Usage

Format

Source