Title: | Network-Valued Data Analysis |
---|---|
Description: | A flexible statistical framework for network-valued data analysis. It leverages the complexity of the space of distributions on graphs by using the permutation framework for inference as implemented in the 'flipr' package. Currently, only the two-sample testing problem is covered and generalization to k samples and regression will be added in the future as well. It is a 4-step procedure where the user chooses a suitable representation of the networks, a suitable metric to embed the representation into a metric space, one or more test statistics to target specific aspects of the distributions to be compared and a formula to compute the permutation p-value. Two types of inference are provided: a global test answering whether there is a difference between the distributions that generated the two samples and a local test for localizing differences on the network structure. The latter is assumed to be shared by all networks of both samples. References: Lovato, I., Pini, A., Stamm, A., Vantini, S. (2020) "Model-free two-sample test for network-valued data" <doi:10.1016/j.csda.2019.106896>; Lovato, I., Pini, A., Stamm, A., Taquet, M., Vantini, S. (2021) "Multiscale null hypothesis testing for network-valued data: Analysis of brain networks of patients with autism" <doi:10.1111/rssc.12463>. |
Authors: | Ilenia Lovato [aut], Alessia Pini [aut], Aymeric Stamm [aut, cre], Simone Vantini [aut] |
Maintainer: | Aymeric Stamm <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.2.0 |
Built: | 2024-11-28 06:44:33 UTC |
Source: | CRAN |
This function flags a list of igraph objects as an nvd object as defined in this package.
as_nvd(obj)
obj |
A list of |
An nvd object.
gnp_params <- list(p = 1/3)
as_nvd(nvd(model = "gnp", n = 10L, model_params = gnp_params))
This function converts a vector of memberships into a proper vertex partition object.
as_vertex_partition(x)
x |
A list grouping the vertices by partition element or an integer or character vector of vertex memberships. |
A vertex_partition object storing the corresponding vertex partition.
m1 <- c("P1", "P3", "P4", "P1", "P2", "P2", "P3", "P1", "P4", "P3")
V1 <- as_vertex_partition(m1)
m2 <- as.integer(c(1, 3, 4, 1, 2, 2, 3, 1, 4, 3))
V2 <- as_vertex_partition(m2)
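As a rough illustration of what converting memberships into a partition involves, the following base R sketch groups vertex indices by their membership label. The helper `membership_to_partition()` is hypothetical and is not the package's implementation; the actual `vertex_partition` class may store more than this.

```r
# Hypothetical helper (not part of the package): group vertex indices
# by membership label, yielding one integer vector per partition element.
membership_to_partition <- function(m) {
  split(seq_along(m), m)
}

m1 <- c("P1", "P3", "P4", "P1", "P2", "P2", "P3", "P1", "P4", "P3")
part <- membership_to_partition(m1)

# Vertices belonging to the partition element labelled "P1"
part$P1
```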
This function computes the matrix of pairwise distances between all the
elements of the two samples put together. The cardinality of the first sample
is denoted by $n_1$ and that of the second one is denoted by $n_2$.
dist_nvd(
  x,
  y = NULL,
  representation = "adjacency",
  distance = "frobenius",
  matching_iterations = 0,
  target_matrix = NULL
)
x |
A |
y |
A |
representation |
A string specifying the desired type of representation,
among: |
distance |
A string specifying the chosen distance for calculating the
test statistic, among: |
matching_iterations |
An integer value specifying the maximum number of
runs when looking for the optimal permutation for graph matching. Defaults
to |
target_matrix |
A square numeric matrix of size |
A matrix of dimension $(n_1 + n_2) \times (n_1 + n_2)$ containing the
distances between all the elements of the two samples put together.
gnp_params <- list(p = 1/3)
k_regular_params <- list(k = 8L)
x <- nvd(model = "gnp", n = 10L, model_params = gnp_params)
y <- nvd(model = "k_regular", n = 10L, model_params = k_regular_params)
dist_nvd(x, y, "adjacency", "spectral")
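To make the shape of the output concrete, here is a minimal base R sketch that pools two samples of matrices and fills the square matrix of pairwise Frobenius distances. The function `pairwise_frobenius()` is a hypothetical stand-in written for illustration; it ignores graph matching and the other options of `dist_nvd()`.

```r
# Hypothetical helper: pairwise Frobenius distances between all matrices
# of two samples pooled together (first the xs, then the ys).
pairwise_frobenius <- function(xs, ys) {
  all_mats <- c(xs, ys)
  n <- length(all_mats)
  d <- matrix(0, nrow = n, ncol = n)
  for (i in seq_len(n - 1L)) {
    for (j in (i + 1L):n) {
      d[i, j] <- sqrt(sum((all_mats[[i]] - all_mats[[j]])^2))
      d[j, i] <- d[i, j]  # distances are symmetric
    }
  }
  d
}

xs <- list(diag(2), matrix(0, 2, 2))  # first sample, 2 observations
ys <- list(matrix(1, 2, 2))           # second sample, 1 observation
d <- pairwise_frobenius(xs, ys)
dim(d)
```

The first rows and columns of the result correspond to the first sample, which is why downstream functions only need the size of the first sample to recover the group labels.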
This is a collection of functions computing the distance between two networks.
dist_hamming(x, y, representation = "laplacian")
dist_frobenius(
  x,
  y,
  representation = "laplacian",
  matching_iterations = 0,
  target_matrix = NULL
)
dist_spectral(x, y, representation = "laplacian")
dist_root_euclidean(x, y, representation = "laplacian")
x |
An |
y |
An |
representation |
A string specifying the desired type of representation,
among: |
matching_iterations |
An integer value specifying the maximum number of
runs when looking for the optimal permutation for graph matching. Defaults
to |
target_matrix |
A square numeric matrix of size |
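Let $X$ be the matrix representation of network $x$ and $Y$ be the matrix
representation of network $y$. The Hamming distance between $x$ and $y$ is
given by
$$\frac{1}{N(N-1)} \sum_{i,j} |X_{ij} - Y_{ij}|,$$
where $N$ is the number of vertices in networks $x$ and $y$. The Frobenius
distance between $x$ and $y$ is given by
$$\sqrt{\sum_{i,j} (X_{ij} - Y_{ij})^2}.$$
The spectral distance between $x$ and $y$ is given by
$$\sqrt{\sum_{i} (a_i - b_i)^2},$$
where $a$ and $b$ are the vectors of the eigenvalues of $X$ and $Y$,
respectively. This distance gives rise to classes of equivalence. Consider the
spectral decomposition of $X$ and $Y$:
$$X = V A V^{-1}$$
and
$$Y = U B U^{-1},$$
where $V$ and $U$ are the matrices whose columns are the eigenvectors of $X$
and $Y$, respectively and $A$ and $B$ are the diagonal matrices with elements
the eigenvalues of $X$ and $Y$, respectively. The root-Euclidean distance
between $x$ and $y$ is given by
$$\left\| V \sqrt{A} V^{-1} - U \sqrt{B} U^{-1} \right\|_F.$$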
Root-Euclidean distance can be used only with the laplacian matrix representation.
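The four distances described in this section can be sketched in base R on symmetric matrices. This is an illustration only, not the package's code: the helper names are hypothetical, the Hamming normalisation by N(N-1) is an assumption, and the matrix square root relies on the input being symmetric positive semi-definite (as a graph Laplacian is), so that the inverse of the eigenvector matrix is its transpose.

```r
# Hypothetical helpers operating directly on matrix representations.
hamming <- function(X, Y) {
  N <- nrow(X)
  sum(abs(X - Y)) / (N * (N - 1))  # normalisation is an assumption
}

frobenius <- function(X, Y) sqrt(sum((X - Y)^2))

spectral <- function(X, Y) {
  # eigen() returns eigenvalues in decreasing order
  a <- eigen(X, symmetric = TRUE, only.values = TRUE)$values
  b <- eigen(Y, symmetric = TRUE, only.values = TRUE)$values
  sqrt(sum((a - b)^2))
}

matrix_sqrt <- function(X) {
  e <- eigen(X, symmetric = TRUE)
  lambda <- pmax(e$values, 0)  # clip tiny negative round-off values
  e$vectors %*% diag(sqrt(lambda), nrow = length(lambda)) %*% t(e$vectors)
}

root_euclidean <- function(X, Y) frobenius(matrix_sqrt(X), matrix_sqrt(Y))

# Path graph 1-2-3 versus the triangle on the same three vertices
A1 <- matrix(c(0, 1, 0, 1, 0, 1, 0, 1, 0), 3, 3)
A2 <- matrix(1, 3, 3) - diag(3)
L1 <- diag(rowSums(A1)) - A1  # combinatorial Laplacians
L2 <- diag(rowSums(A2)) - A2

hamming(A1, A2)
frobenius(A1, A2)
spectral(L1, L2)
root_euclidean(L1, L2)
```

The clipping in `matrix_sqrt()` is what makes the Laplacian a natural fit for the root-Euclidean distance: representations with negative eigenvalues have no real matrix square root.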
A scalar measuring the distance between the two input networks.
g1 <- igraph::sample_gnp(20, 0.1)
g2 <- igraph::sample_gnp(20, 0.2)
dist_hamming(g1, g2, "adjacency")
dist_frobenius(g1, g2, "adjacency")
dist_spectral(g1, g2, "laplacian")
dist_root_euclidean(g1, g2, "laplacian")
Transform a distance matrix into edge properties of the minimal spanning tree
edge_count_global_variables(d, n1, k = 1L)
d |
A matrix of dimension |
n1 |
An integer giving the size of the first sample. |
k |
An integer specifying the density of the minimal spanning tree to generate. |
A list of edge properties of the minimal spanning tree.
n1 <- 30L
n2 <- 10L
gnp_params <- list(p = 1/3)
k_regular_params <- list(k = 8L)
x <- nvd(model = "gnp", n = n1, model_params = gnp_params)
y <- nvd(model = "k_regular", n = n2, model_params = k_regular_params)
d <- dist_nvd(x, y, representation = "laplacian", distance = "frobenius")
e <- edge_count_global_variables(d, n1, k = 5L)
Sigma-Algebra generated by a Partition
generate_sigma_algebra(x)
x |
Input partition stored as a |
Sigma-algebra
g <- igraph::make_ring(7)
m <- as.integer(c(1, 2, 1, 3, 4, 4, 3))
p <- as_vertex_partition(m)
sa <- generate_sigma_algebra(p)
all_full  <- purrr::modify_depth(sa, 2, ~ subgraph_full (g, .x))
all_intra <- purrr::modify_depth(sa, 2, ~ subgraph_intra(g, .x))
all_inter <- purrr::modify_depth(sa, 2, ~ subgraph_inter(g, .x))
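The non-empty sets of the sigma-algebra generated by a finite partition are exactly the unions of its elements. The sketch below enumerates them in base R, grouped by the number of partition elements involved. The function `sigma_algebra_sketch()` is hypothetical, and the two-level list layout is only an assumption suggested by the use of depth 2 in the example above; the package's actual return structure may differ.

```r
# Hypothetical helper: for k = 1, ..., K, list every way of picking k
# partition elements. Each entry is itself a list of vertex subsets.
sigma_algebra_sketch <- function(partition) {
  K <- length(partition)
  lapply(seq_len(K), function(k) {
    combn(K, k, FUN = function(idx) partition[idx], simplify = FALSE)
  })
}

# Same memberships as in the example above, 4 partition elements
partition <- split(1:7, c(1, 2, 1, 3, 4, 4, 3))
sa <- sigma_algebra_sketch(partition)

# Number of sets built from 1, 2, 3 and 4 partition elements
lengths(sa)
```

With K partition elements there are 2^K - 1 non-empty sets, so the enumeration grows quickly and is only practical for coarse partitions.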
This is a collection of functions computing the inner product between two networks.
ipro_frobenius(x, y, representation = "laplacian")
x |
An |
y |
An |
representation |
A string specifying the desired type of representation,
among: |
A scalar measuring the angle between the two input networks.
g1 <- igraph::sample_gnp(20, 0.1)
g2 <- igraph::sample_gnp(20, 0.2)
ipro_frobenius(g1, g2, "adjacency")
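For reference, the Frobenius inner product of two matrices of equal size is the sum of their entrywise products. The base R sketch below uses a hypothetical helper `frobenius_inner()` acting directly on matrices, and checks the usual link with the squared Frobenius distance; it is not the package's implementation.

```r
# Hypothetical helper: Frobenius inner product <X, Y> = sum_ij X_ij * Y_ij
frobenius_inner <- function(X, Y) sum(X * Y)

X <- matrix(c(0, 1, 1, 0), 2, 2)
Y <- matrix(c(0, 2, 2, 0), 2, 2)

ip <- frobenius_inner(X, Y)

# Polarisation identity: ||X - Y||^2 = <X, X> + <Y, Y> - 2 <X, Y>
sq_dist <- sum((X - Y)^2)
identity_rhs <- frobenius_inner(X, X) + frobenius_inner(Y, Y) - 2 * ip
```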
This function computes the sample Fréchet mean from an observed sample of network-valued random variables according to a specified matrix representation. It currently only supports the Euclidean geometry, i.e. the sample Fréchet mean is obtained as the argmin of the sum of squared Frobenius distances.
## S3 method for class 'nvd'
mean(x, weights = rep(1, length(x)), representation = "adjacency", ...)
x |
An |
weights |
A numeric vector specifying weights for each observation (default: equally weighted). |
representation |
A string specifying the graph representation to be used. Choices are adjacency, laplacian, modularity, graphon. Default is adjacency. |
... |
Other arguments to be passed to the |
The mean network in the chosen matrix representation assuming Euclidean geometry for now.
gnp_params <- list(p = 1/3)
x <- nvd(model = "gnp", n = 10L, model_params = gnp_params)
mean(x)
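Under the Euclidean geometry mentioned above, the minimiser of the weighted sum of squared Frobenius distances is the entrywise weighted average of the matrices. The following base R sketch illustrates this with a hypothetical helper `frechet_mean_euclidean()` that takes a plain list of matrices; it is an illustration, not the package's method.

```r
# Hypothetical helper: weighted entrywise average of a list of matrices,
# which minimises sum_i w_i * ||X_i - M||_F^2 over M.
frechet_mean_euclidean <- function(mats, weights = rep(1, length(mats))) {
  Reduce(`+`, Map(`*`, mats, weights)) / sum(weights)
}

mats <- list(matrix(0, 2, 2), matrix(2, 2, 2))

m_equal <- frechet_mean_euclidean(mats)                       # equal weights
m_weighted <- frechet_mean_euclidean(mats, weights = c(1, 3)) # unequal weights
```

Note that the average of adjacency matrices is generally not itself a binary adjacency matrix, which is why the result is reported in the chosen matrix representation rather than as a graph.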
This is the constructor for objects of class nvd.
nvd(
  model = "smallworld",
  n = 1L,
  num_vertices = 25L,
  model_params = list(dim = 1L, nei = 4L, p = 0.15),
  seed = 1234
)
model |
A string specifying the model to be used for sampling networks
(current choices are: |
n |
An integer specifying the sample size. Defaults to |
num_vertices |
An integer specifying the order of the graphs to be
generated (i.e. the number of nodes). Defaults to |
model_params |
A named list setting the parameters of the model you are
considering. Defaults to |
seed |
An integer specifying the random generator seed. Defaults to
|
An nvd object, which is a list of igraph objects.
smallworld_params <- list(dim = 1L, nei = 4L, p = 0.15)
nvd(model_params = smallworld_params)
This function generates 2-dimensional plots of samples of networks via multi-dimensional scaling using all representations and distances included in the package.
## S3 method for class 'nvd'
autoplot(object, memberships = rep(1, length(object)), method = "mds", ...)
## S3 method for class 'nvd'
plot(x, method = "mds", ...)
object, x |
A list containing two samples of network-valued data stored
as objects of class |
memberships |
An integer vector specifying the membership of each
network to a specific sample. Defaults to |
method |
A string specifying which dimensionality reduction method to
use for projecting the samples into the Cartesian plane. Choices are
|
... |
Extra arguments to be passed to the plot function. |
Invisibly returns a ggplot object. In particular, the data set computed to
generate the plot can be retrieved via $data. This is a tibble containing the
following variables:
- V1: the x-coordinate of each observation in the plane,
- V2: the y-coordinate of each observation in the plane,
- Label: the sample membership of each observation,
- Representation: the type of matrix representation used to manipulate each observation,
- Distance: the distance used to measure how far each observation is from the others.
gnp_params <- list(p = 1/3)
k_regular_params <- list(k = 8L)
x <- nvd(model = "gnp", n = 10L, model_params = gnp_params)
y <- nvd(model = "k_regular", n = 10L, model_params = k_regular_params)
mb <- c(rep(1, length(x)), rep(2, length(y)))
z <- as_nvd(c(x, y))
ggplot2::autoplot(z, memberships = mb)
plot(z, memberships = mb)
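The projection step behind such a plot can be sketched with classical multi-dimensional scaling from base R (`stats::cmdscale()`), applied to a matrix of pairwise distances. The toy points below stand in for networks, and the column names simply mirror the variables documented above; whether the package uses `cmdscale()` internally is an assumption.

```r
# Toy data: six points in the plane standing in for six networks
pts <- rbind(c(0, 0), c(1, 0), c(0, 1), c(1, 1), c(3, 3), c(4, 3))
d <- as.matrix(dist(pts))  # pairwise distance matrix

# Classical MDS into two dimensions
coords <- cmdscale(d, k = 2)

# Data set in the spirit of the one attached to the plot
mds_data <- data.frame(
  V1 = coords[, 1],
  V2 = coords[, 2],
  Label = factor(c(1, 1, 1, 1, 2, 2))
)
```

Because the toy distances are Euclidean in two dimensions, the projected coordinates reproduce them up to rotation and reflection; for genuinely network-valued data the 2-dimensional picture is only an approximation.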
This function provides a Monte-Carlo estimate of the power of the permutation tests proposed in this package.
power2(
  model1 = "gnp",
  model2 = "k_regular",
  n1 = 20L,
  n2 = 20L,
  num_vertices = 25L,
  model1_params = NULL,
  model2_params = NULL,
  representation = "adjacency",
  distance = "frobenius",
  stats = c("flipr:t_ip", "flipr:f_ip"),
  B = 1000L,
  alpha = 0.05,
  test = "exact",
  k = 5L,
  R = 1000L,
  seed = 1234
)
model1 |
A string specifying the model to be used for generating the
first sample. Choices are |
model2 |
A string specifying the model to be used for generating the
second sample. Choices are |
n1 |
The size of the first sample. Defaults to |
n2 |
The size of the second sample. Defaults to |
num_vertices |
The number of nodes in the generated graphs. Defaults to
|
model1_params |
A named list setting the parameters of the first chosen
model. Defaults to |
model2_params |
A named list setting the parameters of the second chosen
model. Defaults to |
representation |
A string specifying the desired type of representation,
among: |
distance |
A string specifying the chosen distance for calculating the
test statistic, among: |
stats |
A character vector specifying the chosen test statistic(s),
among: |
B |
The number of permutations or the tolerance. If this number is lower
than |
alpha |
Significance level for hypothesis testing. Defaults to |
test |
A character string specifying the formula to be used to compute
the permutation p-value. Choices are |
k |
An integer specifying the density of the minimum spanning tree used
for the edge count statistics. Defaults to |
R |
Number of Monte-Carlo trials used to estimate the power. Defaults to
|
seed |
An integer specifying the random generator seed. Defaults to 1234. |
Currently, six scenarios of pairs of populations are implemented. Scenario 0 makes it possible to check that all our permutation tests are exact.
A numeric value estimating the power of the test.
gnp_params <- list(p = 1/3)
k_regular_params <- list(k = 8L)
power2(
  model1_params = gnp_params,
  model2_params = k_regular_params,
  R = 10,
  B = 100,
  seed = 1234
)
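A Monte-Carlo power estimate is the fraction of simulated data sets for which the test rejects at level alpha. The base R sketch below shows this logic with a hypothetical helper `estimate_power()`; as a stand-in for the package's permutation tests on networks, it uses a parametric two-sample t-test on scalar data.

```r
# Hypothetical helper: run a p-value simulator R times and report the
# proportion of p-values at or below the significance level.
estimate_power <- function(simulate_pvalue, R = 200L, alpha = 0.05, seed = 1234) {
  set.seed(seed)
  pvalues <- vapply(seq_len(R), function(r) simulate_pvalue(), numeric(1))
  mean(pvalues <= alpha)
}

# Stand-in simulators (assumption: scalar data and t-test instead of networks)
simulate_shift <- function() t.test(rnorm(20), rnorm(20, mean = 1))$p.value
simulate_null  <- function() t.test(rnorm(20), rnorm(20))$p.value

power_alt  <- estimate_power(simulate_shift)  # populations differ
power_null <- estimate_power(simulate_null)   # same population
```

Running the estimator with both samples drawn from the same population gives the empirical size of the test, which should stay close to alpha when the test is exact.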
Network-Valued to Matrix-Valued Data
repr_nvd(x, y = NULL, representation = "adjacency")
x |
An |
y |
An |
representation |
A string specifying the requested matrix
representation. Choices are: |
A list of matrices.
gnp_params <- list(p = 1/3)
x <- nvd(model = "gnp", n = 10L, model_params = gnp_params)
xm <- repr_nvd(x)
This is a collection of functions that convert a graph stored as an igraph
object into a desired matrix representation among adjacency matrix, graph
laplacian, modularity matrix or graphon (edge probability matrix).
repr_adjacency(network, validate = TRUE)
repr_laplacian(network, validate = TRUE)
repr_modularity(network, validate = TRUE)
repr_graphon(network, validate = TRUE)
network |
An |
validate |
A boolean specifying whether the function should check the
class of its input (default: |
A numeric square matrix giving the desired network representation recorded in the object's class.
g <- igraph::sample_smallworld(1, 25, 3, 0.05)
repr_adjacency(g)
repr_laplacian(g)
repr_modularity(g)
repr_graphon(g)
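To show how the first three representations relate to one another, the sketch below derives a Laplacian and a modularity matrix from an adjacency matrix in base R. It assumes an undirected, unweighted graph, the combinatorial Laplacian D - A and the standard modularity matrix A - k k^T / (2m); the package may use different conventions, and the graphon is left out.

```r
# Adjacency matrix of an undirected ring on 4 vertices
n <- 4
A <- matrix(0, n, n)
ring_edges <- cbind(1:n, c(2:n, 1))  # edges 1-2, 2-3, 3-4, 4-1
A[ring_edges] <- 1
A <- A + t(A)                        # symmetrise

degree <- rowSums(A)

# Combinatorial Laplacian (assumed convention): degree matrix minus adjacency
L <- diag(degree) - A

# Modularity matrix (assumed convention): observed minus expected edges
m <- sum(A) / 2                      # number of edges
B <- A - outer(degree, degree) / (2 * m)
```

Both derived matrices have rows summing to zero, a quick sanity check when comparing against the output of the package functions.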
This function generates two samples of networks according to the stochastic
block model (SBM). This is essentially a wrapper around sample_sbm, which
samples a single network from the SBM.
sample2_sbm(n, nv, p1, b1, p2 = p1, b2 = b1, seed = NULL)
n |
Integer scalar giving the sample size. |
nv |
Integer scalar giving the number of vertices of the generated networks, common to all networks in both samples. |
p1 |
The matrix giving the Bernoulli rates for the 1st sample. This is a KxK matrix, where K is the number of groups. The probability of creating an edge between vertices from groups i and j is given by element (i,j). For undirected graphs, this matrix must be symmetric. |
b1 |
Numeric vector giving the number of vertices in each group for the first sample. The sum of the vector must match the number of vertices. |
p2 |
The matrix giving the Bernoulli rates for the 2nd sample (default: same as 1st sample). This is a KxK matrix, where K is the number of groups. The probability of creating an edge between vertices from groups i and j is given by element (i,j). For undirected graphs, this matrix must be symmetric. |
b2 |
Numeric vector giving the number of vertices in each group for the second sample (default: same as 1st sample). The sum of the vector must match the number of vertices. |
seed |
The seed for the random number generator (default: |
A length-2 list containing the two samples stored as nvd objects.
n <- 10
p1 <- matrix(
  data = c(0.1, 0.4, 0.1, 0.4,
           0.4, 0.4, 0.1, 0.4,
           0.1, 0.1, 0.4, 0.4,
           0.4, 0.4, 0.4, 0.4),
  nrow = 4, ncol = 4, byrow = TRUE
)
p2 <- matrix(
  data = c(0.1, 0.4, 0.4, 0.4,
           0.4, 0.4, 0.4, 0.4,
           0.4, 0.4, 0.1, 0.1,
           0.4, 0.4, 0.1, 0.4),
  nrow = 4, ncol = 4, byrow = TRUE
)
sim <- sample2_sbm(n, 68, p1, c(17, 17, 17, 17), p2, seed = 1234)
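The sampling mechanism of the stochastic block model can be sketched in base R: each pair of vertices receives an edge with a Bernoulli probability determined by the groups of its two endpoints. The helpers `sample_sbm_adjacency()` and `sample2_sbm_sketch()` below are hypothetical, return plain adjacency matrices rather than nvd objects, and assume an undirected graph with a symmetric preference matrix.

```r
# Hypothetical helper: one undirected SBM draw as a 0/1 adjacency matrix.
sample_sbm_adjacency <- function(pref_matrix, block_sizes) {
  membership <- rep(seq_along(block_sizes), times = block_sizes)
  nv <- length(membership)
  probs <- pref_matrix[membership, membership]  # edge probability per pair
  A <- matrix(0L, nv, nv)
  upper <- upper.tri(A)
  A[upper] <- rbinom(sum(upper), size = 1, prob = probs[upper])
  A + t(A)  # mirror the upper triangle; no self-loops
}

# Hypothetical helper: two samples of n networks each, one per model.
sample2_sbm_sketch <- function(n, p1, b1, p2 = p1, b2 = b1, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  list(
    x = replicate(n, sample_sbm_adjacency(p1, b1), simplify = FALSE),
    y = replicate(n, sample_sbm_adjacency(p2, b2), simplify = FALSE)
  )
}

p_assortative <- matrix(c(0.8, 0.1, 0.1, 0.8), 2, 2)  # dense within groups
p_uniform     <- matrix(0.4, 2, 2)                    # no group structure
sim_sketch <- sample2_sbm_sketch(3, p_assortative, c(5, 5), p_uniform, seed = 1234)
```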
A collection of functions to generate random graphs with specified edge distribution.
rpois_network(n, num_vertices, lambda = 1)
rexp_network(n, num_vertices, rate = 1)
rbinom_network(n, num_vertices, size = 1, prob = 0.5)
rsbm(n, num_vertices, pref_matrix, block_sizes)
n |
Sample size. |
num_vertices |
Number of vertices. |
lambda |
The mean parameter for the Poisson distribution (default: 1). |
rate |
The rate parameter for the exponential distribution (default: 1). |
size |
The number of trials for the binomial distribution (default: 1). |
prob |
The probability of success on each trial for the binomial distribution (default: 0.5). |
pref_matrix |
The matrix giving the Bernoulli rates. This is a KxK
matrix, where K is the number of groups. The probability of creating an
edge between vertices from groups i and j is given by element (i,j). For
undirected graphs, this matrix must be symmetric. See
|
block_sizes |
Numeric vector giving the number of vertices in each
group. The sum of the vector must match the number of vertices. See
|
An object of class nvd containing the sample of graphs.
nvd <- rexp_network(10, 68)
This is a collection of functions that provide statistics for testing equality in distribution between samples of networks.
stat_student_euclidean(d, indices, ...)
stat_welch_euclidean(d, indices, ...)
stat_original_edge_count(d, indices, edge_count_prep, ...)
stat_generalized_edge_count(d, indices, edge_count_prep, ...)
stat_weighted_edge_count(d, indices, edge_count_prep, ...)
d |
Either a matrix of dimension |
indices |
A vector of dimension |
... |
Extra parameters specific to some statistics. |
edge_count_prep |
A list of preprocessed data information used by edge
count statistics and produced by |
In detail, the statistics documented here fall into the following categories:
- Euclidean t-statistics: both the Student version (stat_student_euclidean) for equal variances and the Welch version (stat_welch_euclidean) for unequal variances,
- Statistics based on similarity graphs: 3 types of edge count statistics.
A scalar giving the value of the desired test statistic.
n1 <- 30L
n2 <- 10L
gnp_params <- list(p = 1/3)
k_regular_params <- list(k = 8L)
x <- nvd(model = "gnp", n = n1, model_params = gnp_params)
y <- nvd(model = "k_regular", n = n2, model_params = k_regular_params)
r <- repr_nvd(x, y, representation = "laplacian")
stat_student_euclidean(r, 1:n1)
stat_welch_euclidean(r, 1:n1)
d <- dist_nvd(x, y, representation = "laplacian", distance = "frobenius")
ecp <- edge_count_global_variables(d, n1, k = 5L)
stat_original_edge_count(d, 1:n1, edge_count_prep = ecp)
stat_generalized_edge_count(d, 1:n1, edge_count_prep = ecp)
stat_weighted_edge_count(d, 1:n1, edge_count_prep = ecp)
This is a collection of functions for extracting full, intra and inter subgraphs of a graph given a list of vertex subsets.
subgraph_full(g, vids)
subgraph_intra(g, vids)
subgraph_inter(g, vids)
g |
An |
vids |
A list of integer vectors identifying vertex subsets. |
An igraph object storing a subgraph of type full, intra or inter.
g <- igraph::make_ring(10)
g_full  <- subgraph_full (g, list(1:3, 4:5, 8:10))
g_intra <- subgraph_intra(g, list(1:3, 4:5, 8:10))
g_inter <- subgraph_inter(g, list(1:3, 4:5, 8:10))
This function carries out a hypothesis test where the null hypothesis is that the two populations of networks share the same underlying probability distribution against the alternative hypothesis that the two populations come from different distributions. The test is performed in a non-parametric fashion using a permutation framework in which several statistics can be used, together with several choices of network matrix representations and distances between networks.
test2_global(
  x,
  y,
  representation = c("adjacency", "laplacian", "modularity", "transitivity"),
  distance = c("frobenius", "hamming", "spectral", "root-euclidean"),
  stats = c("flipr:t_ip", "flipr:f_ip"),
  B = 1000L,
  test = "exact",
  k = 5L,
  seed = NULL,
  ...
)
x |
Either an object of class nvd listing networks in sample 1 or a
distance matrix of size |
y |
Either an object of class nvd listing networks in sample 2 or an integer value specifying the size of sample 1 or an integer vector specifying the indices of the observations belonging to sample 1. |
representation |
A string specifying the desired type of representation,
among: |
distance |
A string specifying the chosen distance for calculating the
test statistic, among: |
stats |
A character vector specifying the chosen test statistic(s),
among: |
B |
The number of permutations or the tolerance. If this number is lower
than |
test |
A character string specifying the formula to be used to compute
the permutation p-value. Choices are |
k |
An integer specifying the density of the minimum spanning tree used
for the edge count statistics. Defaults to |
seed |
An integer for specifying the seed of the random generator for
result reproducibility. Defaults to |
... |
Extra arguments to be passed to the distance function. |
A list with three components: the value of the statistic for the original two
samples, the p-value of the resulting permutation test and a numeric vector
storing the values of the permuted statistics.
n <- 5L
gnp_params <- list(p = 1/3)
k_regular_params <- list(k = 8L)

# Two different models for the two populations
x <- nvd(model = "gnp", n = n, model_params = gnp_params)
y <- nvd(model = "k_regular", n = n, model_params = k_regular_params)
t1 <- test2_global(x, y, representation = "modularity")
t1$pvalue

# Same model for the two populations
x <- nvd(model = "gnp", n = 10L, model_params = gnp_params)
y <- nvd(model = "gnp", n = 10L, model_params = gnp_params)
t2 <- test2_global(x, y, representation = "modularity")
t2$pvalue
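The permutation logic can be sketched in base R on any matrix of pairwise distances. Everything in this block is an assumption made for illustration: the helper `perm_test_dist()` is hypothetical, the statistic (mean between-sample distance minus mean within-sample distance) is a simple stand-in for the statistics provided by 'flipr', and the p-value formula (1 + number of permuted statistics at least as large as the observed one) / (B + 1) is one common choice that may differ from the formulas offered by the `test` argument.

```r
# Hypothetical helper: permutation test from a pooled distance matrix d
# whose first n1 rows/columns belong to sample 1. Requires at least two
# observations in each sample.
perm_test_dist <- function(d, n1, B = 999L, seed = 1234) {
  set.seed(seed)
  n <- nrow(d)
  stat <- function(idx1) {
    idx2 <- setdiff(seq_len(n), idx1)
    d11 <- d[idx1, idx1]
    d22 <- d[idx2, idx2]
    between <- mean(d[idx1, idx2])
    within <- mean(c(d11[upper.tri(d11)], d22[upper.tri(d22)]))
    between - within
  }
  observed <- stat(seq_len(n1))
  # Re-draw which observations are labelled as sample 1
  permuted <- replicate(B, stat(sample.int(n, n1)))
  list(
    statistic = observed,
    pvalue = (1 + sum(permuted >= observed)) / (B + 1),
    permuted_statistics = permuted
  )
}

# Toy scalar data standing in for two well-separated samples of networks
obs <- c(0, 0.1, 0.2, 0.3, 0.4, 5, 5.1, 5.2, 5.3, 5.4)
d_toy <- as.matrix(dist(obs))
res <- perm_test_dist(d_toy, n1 = 5L)
res$pvalue
```

Working from the distance matrix alone is what lets the same procedure accept either two samples of networks or a precomputed distance matrix together with the size of the first sample.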
Local Two-Sample Test for Network-Valued Data
test2_local(
  x,
  y,
  partition,
  representation = "adjacency",
  distance = "frobenius",
  stats = c("flipr:t_ip", "flipr:f_ip"),
  B = 1000L,
  alpha = 0.05,
  test = "exact",
  k = 5L,
  seed = NULL,
  verbose = FALSE
)
x |
Either an object of class nvd listing networks in sample 1 or a
distance matrix of size |
y |
Either an object of class nvd listing networks in sample 2 or an integer value specifying the size of sample 1 or an integer vector specifying the indices of the observations belonging to sample 1. |
partition |
Either a list or an integer vector specifying vertex memberships into partition elements. |
representation |
A string specifying the desired type of representation,
among: |
distance |
A string specifying the chosen distance for calculating the
test statistic, among: |
stats |
A character vector specifying the chosen test statistic(s),
among: |
B |
The number of permutations or the tolerance. If this number is lower
than |
alpha |
Significance level for hypothesis testing. If set to 1, the
function outputs properly adjusted p-values. If lower than 1, then only
p-values lower than alpha are properly adjusted. Defaults to |
test |
A character string specifying the formula to be used to compute
the permutation p-value. Choices are |
k |
An integer specifying the density of the minimum spanning tree used
for the edge count statistics. Defaults to |
seed |
An integer for specifying the seed of the random generator for
result reproducibility. Defaults to |
verbose |
Boolean specifying whether information on intermediate tests
should be printed in the process (default: |
A length-2 list reporting the adjusted p-values of each element of the partition for the intra- and inter-tests.
n <- 5L
p1 <- matrix(
  data = c(0.1, 0.4, 0.1, 0.4,
           0.4, 0.4, 0.1, 0.4,
           0.1, 0.1, 0.4, 0.4,
           0.4, 0.4, 0.4, 0.4),
  nrow = 4, ncol = 4, byrow = TRUE
)
p2 <- matrix(
  data = c(0.1, 0.4, 0.4, 0.4,
           0.4, 0.4, 0.4, 0.4,
           0.4, 0.4, 0.1, 0.1,
           0.4, 0.4, 0.1, 0.4),
  nrow = 4, ncol = 4, byrow = TRUE
)
sim <- sample2_sbm(n, 68, p1, c(17, 17, 17, 17), p2, seed = 1234)
m <- as.integer(c(rep(1, 17), rep(2, 17), rep(3, 17), rep(4, 17)))
test2_local(sim$x, sim$y, m, seed = 1234, alpha = 0.05, B = 19)
This function computes the Fréchet variance around a specified network from an observed sample of network-valued random variables according to a specified distance. In most cases, the user wants to compute the sample variance, in which case the Fréchet variance has to be evaluated with respect to the sample Fréchet mean. In this case, it is important that the user specifies the same distance as the one used to compute the sample Fréchet mean. This function can also be used directly as the objective function to be minimized in order to find the Fréchet mean for a given distance.
var_nvd(x, x0, weights = rep(1, length(x)), distance = "frobenius")
x |
An |
x0 |
A network already in matrix representation around which to calculate variance (usually the Fréchet mean but not necessarily). Note that the chosen matrix representation is extracted from this parameter. |
weights |
A numeric vector specifying weights for each observation (default: equally weighted). |
distance |
A string specifying the distance to be used. Possible choices
are: hamming, frobenius, spectral or root-euclidean. Default is frobenius.
When the Fréchet mean is used as |
A positive scalar value evaluating the amount of variability of the sample around the specified network.
gnp_params <- list(p = 1/3)
x <- nvd(model = "gnp", n = 10L, model_params = gnp_params)
m <- mean(x)
var_nvd(x = x, x0 = m, distance = "frobenius")
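As a rough illustration of the quantity involved, the sketch below computes a weighted average of squared Frobenius distances between each matrix of a sample and a reference matrix. The helper `frechet_variance()` is hypothetical, and dividing by the sum of the weights is an assumption; the package may normalise differently. The example also shows why the same function can serve as an objective: its value is smallest at the entrywise average.

```r
# Hypothetical helper: weighted mean of squared Frobenius distances to x0
# (normalisation by sum(weights) is an assumption).
frechet_variance <- function(mats, x0, weights = rep(1, length(mats))) {
  sq_dists <- vapply(mats, function(m) sum((m - x0)^2), numeric(1))
  sum(weights * sq_dists) / sum(weights)
}

mats <- list(matrix(0, 2, 2), matrix(2, 2, 2), matrix(4, 2, 2))
x_bar <- Reduce(`+`, mats) / length(mats)  # entrywise average

v_mean  <- frechet_variance(mats, x_bar)        # around the mean
v_other <- frechet_variance(mats, x_bar + 0.5)  # around a shifted matrix
```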
This function computes the Fréchet variance using exclusively inter-point distances. As such, it can accommodate any pair of representation and distance.
var2_nvd(x, representation = "adjacency", distance = "frobenius")
x |
An |
representation |
A string specifying the graph representation to be used. Choices are adjacency, laplacian, modularity, graphon. Default is adjacency. |
distance |
A string specifying the distance to be used. Possible choices are: hamming, frobenius, spectral or root-euclidean. Default is frobenius. |
A positive scalar value evaluating the variance based on inter-point distances.
gnp_params <- list(p = 1/3)
x <- nvd(model = "gnp", n = 10L, model_params = gnp_params)
var2_nvd(x = x, representation = "graphon", distance = "frobenius")
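A variance can be computed from inter-point distances alone because, in a Euclidean setting, the average squared distance to the mean equals the sum of all squared pairwise distances divided by 2n^2. The base R sketch below checks this identity on toy matrices. The helper `var_from_distances()` is hypothetical and its normalisation is an assumption, so its values need not match those of `var2_nvd()`.

```r
# Hypothetical helper: variance from a full symmetric distance matrix,
# using (1 / (2 n^2)) * sum_ij d_ij^2.
var_from_distances <- function(d) sum(d^2) / (2 * nrow(d)^2)

mats <- list(matrix(0, 2, 2), matrix(2, 2, 2), matrix(4, 2, 2))

# Pairwise Frobenius distances between all matrices of the sample
d <- sapply(mats, function(a) {
  sapply(mats, function(b) sqrt(sum((a - b)^2)))
})

v_interpoint <- var_from_distances(d)

# Same quantity computed around the entrywise average, for comparison
x_bar <- Reduce(`+`, mats) / length(mats)
v_around_mean <- mean(vapply(mats, function(m) sum((m - x_bar)^2), numeric(1)))
```

The inter-point form never needs the mean itself, which is what allows it to be used with distances for which a Fréchet mean is not readily available.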