Title: | Distance Measures for Networks |
---|---|
Description: | Network is a prevalent form of data structure in many fields. As an object of analysis, many distance or metric measures have been proposed to define the concept of similarity between two networks. We provide a number of distance measures for networks. See Jurman et al (2011) <doi:10.3233/978-1-60750-692-8-227> for an overview on spectral class of inter-graph distance measures. |
Authors: | Kisung You [aut, cre] |
Maintainer: | Kisung You <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.3.4 |
Built: | 2024-12-07 06:54:31 UTC |
Source: | CRAN |
Simulated list of 20 adjacency matrices of 28 nodes. First 10 are from Erdős–Rényi model with , and
the latter 10 are generated using
. Each element in the list is of size
, symmetric,
having values in
or
, and every diagonal element is set as
in accordance with no self-loop assumption.
data(graph20)
data(graph20)
A list
of 20 adjacency matrices of size .
Below is the code used to generate graph20:
require(stats) graph20 = list() for (i in 1:10){ # type-1 adjacency matrices rbin = rbinom(784,1,0.9) mat = matrix(rbin, nrow=28) matout = mat*t(mat) diag(matout) = 0 graph20[[i]]=matout } for (i in 11:20){ # type-2 adjacency matrices rbin = rbinom(784,1,0.5) mat = matrix(rbin, nrow=28) matout = mat*t(mat) diag(matout) = 0 graph20[[i]]=matout }
Centrality is a core concept in studying the topological structure of
complex networks, which can be either defined for each node or edge.
nd.centrality
offers 3 distance measures on node-defined centralities.
See this Wikipedia page for more
on network/graph centrality.
nd.centrality( A, out.dist = TRUE, mode = c("Degree", "Close", "Between"), directed = FALSE )
nd.centrality( A, out.dist = TRUE, mode = c("Degree", "Close", "Between"), directed = FALSE )
A |
a list of length |
out.dist |
a logical; |
mode |
type of node centrality definitions to be used. |
directed |
a logical; |
a named list containing
an matrix or
dist
object containing pairwise distance measures.
an matrix where rows are node centralities for each graph.
Roy M, Schmid S, Trédan G (2014). “Modeling and Measuring Graph Similarity: The Case for Centrality Distance.” In FOMC 2014, 10th ACM International Workshop on Foundations of Mobile Computing, 53.
## load example data data(graph20) ## use 3 types of centrality measures out1 <- nd.centrality(graph20, out.dist=FALSE,mode="Degree") out2 <- nd.centrality(graph20, out.dist=FALSE,mode="Close") out3 <- nd.centrality(graph20, out.dist=FALSE,mode="Between") ## visualize opar = par(no.readonly=TRUE) par(mfrow=c(1,3), pty="s") image(out1$D[,20:1], main="Degree", col=gray(0:32/32), axes=FALSE) image(out2$D[,20:1], main="Close", col=gray(0:32/32), axes=FALSE) image(out3$D[,20:1], main="Between", col=gray(0:32/32), axes=FALSE) par(opar)
## load example data data(graph20) ## use 3 types of centrality measures out1 <- nd.centrality(graph20, out.dist=FALSE,mode="Degree") out2 <- nd.centrality(graph20, out.dist=FALSE,mode="Close") out3 <- nd.centrality(graph20, out.dist=FALSE,mode="Between") ## visualize opar = par(no.readonly=TRUE) par(mfrow=c(1,3), pty="s") image(out1$D[,20:1], main="Degree", col=gray(0:32/32), axes=FALSE) image(out2$D[,20:1], main="Close", col=gray(0:32/32), axes=FALSE) image(out3$D[,20:1], main="Between", col=gray(0:32/32), axes=FALSE) par(opar)
Distance of Continuous Spectral DensitiesThe method employs spectral density of eigenvalues from
Laplacian in that for each, we have corresponding
spectral density as a sum of
narrow Lorentz distributions with
bandwidth
parameter.
Since it involves integration of a function over the
non-compact domain, it may blow up to infinity and the code
automatically aborts the process.
nd.csd(A, out.dist = TRUE, bandwidth = 1)
nd.csd(A, out.dist = TRUE, bandwidth = 1)
A |
a list of length |
out.dist |
a logical; |
bandwidth |
common bandwidth of positive real number. |
a named list containing
an matrix or
dist
object containing pairwise distance measures.
an matrix where each row is top-
vibrational spectra.
Ipsen M, Mikhailov AS (2002). “Evolutionary reconstruction of networks.” Physical Review E, 66(4). ISSN 1063-651X, 1095-3787.
## load example data data(graph20) ## compute distance matrix output = nd.csd(graph20, out.dist=FALSE, bandwidth=1.0) ## visualize opar = par(no.readonly=TRUE) par(pty="s") image(output$D[,20:1], main="two group case", axes=FALSE, col=gray(0:32/32)) par(opar)
## load example data data(graph20) ## compute distance matrix output = nd.csd(graph20, out.dist=FALSE, bandwidth=1.0) ## visualize opar = par(no.readonly=TRUE) par(pty="s") image(output$D[,20:1], main="two group case", axes=FALSE, col=gray(0:32/32)) par(opar)
Discrete Spectral Distance (DSD) is defined as the Euclidean distance between
the spectra of various matrices, such as adjacency matrix (
"Adj"
),
(unnormalized) Laplacian matrix (
"Lap"
),
signless Laplacian matrix (
"SLap"
), or
normalized Laplacian matrix .
nd.dsd(A, out.dist = TRUE, type = c("Lap", "SLap", "NLap", "Adj"))
nd.dsd(A, out.dist = TRUE, type = c("Lap", "SLap", "NLap", "Adj"))
A |
a list of length |
out.dist |
a logical; |
type |
type of target structure. One of |
a named list containing
an matrix or
dist
object containing pairwise distance measures.
an matrix where each row is top-
vibrational spectra.
Wilson RC, Zhu P (2008). “A study of graph spectra for comparing graphs and trees.” Pattern Recognition, 41(9), 2833–2841. ISSN 00313203.
## load example data and extract only a few data(graph20) gr.small = graph20[c(1:5,11:15)] ## compute distance matrix output <- nd.dsd(gr.small, out.dist=FALSE) ## visualize opar <- par(no.readonly=TRUE) par(pty="s") image(output$D[,10:1], main="two group case", axes=FALSE, col=gray(0:32/32)) par(opar)
## load example data and extract only a few data(graph20) gr.small = graph20[c(1:5,11:15)] ## compute distance matrix output <- nd.dsd(gr.small, out.dist=FALSE) ## visualize opar <- par(no.readonly=TRUE) par(pty="s") image(output$D[,10:1], main="two group case", axes=FALSE, col=gray(0:32/32)) par(opar)
It is of the most simplest form that Edge Difference Distance (EDD) takestwo adjacency matrices and takes Frobenius norm of their differnces.
nd.edd(A, out.dist = TRUE)
nd.edd(A, out.dist = TRUE)
A |
a list of length |
out.dist |
a logical; |
a named list containing
an matrix or
dist
object containing pairwise distance measures.
Hammond DK, Gur Y, Johnson CR (2013). “Graph Diffusion Distance: A Difference Measure for Weighted Graphs Based on the Graph Laplacian Exponential Kernel.” In Proceedings of the IEEE global conference on information and signal processing (GlobalSIP'13), 419–422.
## load example data data(graph20) ## compute distance matrix output = nd.edd(graph20, out.dist=FALSE) ## visualize opar <- par(no.readonly=TRUE) par(pty="s") image(output$D[,20:1], main="two group case", axes=FALSE, col=gray(0:32/32)) par(opar)
## load example data data(graph20) ## compute distance matrix output = nd.edd(graph20, out.dist=FALSE) ## visualize opar <- par(no.readonly=TRUE) par(pty="s") image(output$D[,20:1], main="two group case", axes=FALSE, col=gray(0:32/32)) par(opar)
eigenvaluesExtremal distance (nd.extremal
) is a type of spectral distance measures on two graphs' graph Laplacian,
where is an adjacency matrix and
. It takes top-
eigenvalues from
graph Laplacian matrices and take normalized sum of squared differences as metric. Note that it is
1. non-negative, 2. separated, 3. symmetric, and satisfies 4. triangle inequality in that
it is indeed a metric.
nd.extremal(A, out.dist = TRUE, k = ceiling(nrow(A)/5))
nd.extremal(A, out.dist = TRUE, k = ceiling(nrow(A)/5))
A |
a list of length |
out.dist |
a logical; |
k |
the number of largest eigenvalues to be used. |
a named list containing
an matrix or
dist
object containing pairwise distance measures.
an matrix where each row is top-
Laplacian eigenvalues.
Jakobson D, Rivin I (2002). “Extremal metrics on graphs I.” Forum Mathematicum, 14(1). ISSN 0933-7741, 1435-5337.
## load data data(graph20) ## compute distance matrix output = nd.extremal(graph20, out.dist=FALSE, k=2) ## visualize opar = par(no.readonly=TRUE) par(pty="s") image(output$D[,20:1], main="two group case", col=gray(0:32/32), axes=FALSE) par(opar)
## load data data(graph20) ## compute distance matrix output = nd.extremal(graph20, out.dist=FALSE, k=2) ## visualize opar = par(no.readonly=TRUE) par(pty="s") image(output$D[,20:1], main="two group case", col=gray(0:32/32), axes=FALSE) par(opar)
Graph Diffusion Distance (nd.gdd
) quantifies the difference between two weighted graphs of same size. It takes
an idea from heat diffusion process on graphs via graph Laplacian exponential kernel matrices. For a given
adjacency matrix , the graph Laplacian is defined as
where . For two adjacency matrices
and
,
GDD is defined as
where is matrix exponential,
a Frobenius norm, and
and
Laplacian matrices corresponding to
and
, respectively.
nd.gdd(A, out.dist = TRUE, vect = seq(from = 0.1, to = 1, length.out = 10))
nd.gdd(A, out.dist = TRUE, vect = seq(from = 0.1, to = 1, length.out = 10))
A |
a list of length |
out.dist |
a logical; |
vect |
a vector of parameters |
a named list containing
an matrix or
dist
object containing pairwise distance measures.
an matrix whose entries are maximizer of the cost function.
Hammond DK, Gur Y, Johnson CR (2013). “Graph Diffusion Distance: A Difference Measure for Weighted Graphs Based on the Graph Laplacian Exponential Kernel.” In Proceedings of the IEEE global conference on information and signal processing (GlobalSIP'13), 419–422.
## load data and extract a subset data(graph20) gr.small = graph20[c(1:5,11:15)] ## compute distance matrix output = nd.gdd(gr.small, out.dist=FALSE) ## visualize opar = par(no.readonly=TRUE) par(pty="s") image(output$D[,10:1], main="two group case", col=gray((0:32)/32), axes=FALSE) par(opar)
## load data and extract a subset data(graph20) gr.small = graph20[c(1:5,11:15)] ## compute distance matrix output = nd.gdd(gr.small, out.dist=FALSE) ## visualize opar = par(no.readonly=TRUE) par(pty="s") image(output$D[,10:1], main="two group case", col=gray((0:32)/32), axes=FALSE) par(opar)
Graphon is a symmetric measurable function
that is considered to be a generating model for an observed network. nd.graphon
computes
distances between networks based on the estimated graphons of each network. Estimation methods
are taken from graphon package. For more details, see the function links below.
nd.graphon( A, out.dist = TRUE, method = c("completion", "LG", "nbdsmooth", "SBA", "USVT"), ... )
nd.graphon( A, out.dist = TRUE, method = c("completion", "LG", "nbdsmooth", "SBA", "USVT"), ... )
A |
a list of length |
out.dist |
a logical; |
method |
type of graphon estimation methods to be used. |
... |
extra parameters to be passed onto graphon estimation functions. See also |
a named list containing
an matrix or
dist
object containing pairwise distance measures.
Mukherjee SS, Sarkar P, Lin L (2017). “On clustering network-valued data.” In Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds.), Advances in neural information processing systems 30, 7071–7081. Curran Associates, Inc.
## load example data data(graph20) ## compute USVT-based distance output <- nd.graphon(graph20, out.dist=FALSE, method="usvt") ## visualize opar = par(no.readonly=TRUE) par(pty="s") image(output$D[,20:1], main="USVT", col=gray(0:32/32), axes=FALSE) par(opar)
## load example data data(graph20) ## compute USVT-based distance output <- nd.graphon(graph20, out.dist=FALSE, method="usvt") ## visualize opar = par(no.readonly=TRUE) par(pty="s") image(output$D[,20:1], main="USVT", col=gray(0:32/32), axes=FALSE) par(opar)
Hamming Distance is the count of discrepancy between two binary networks for each edge. Therefore, if used with non-binary networks, it might return a warning message and distorted results. It was originally designed to compare two strings of equal length, see Wikipedia page for more detailed introduction.
nd.hamming(A, out.dist = TRUE)
nd.hamming(A, out.dist = TRUE)
A |
a list of length |
out.dist |
a logical; |
a named list containing
an matrix or
dist
object containing pairwise distance measures.
Hamming RW (1950). “Error Detecting and Error Correcting Codes.” Bell System Technical Journal, 29(2), 147–160. ISSN 00058580.
## load example data and extract only a few data(graph20) gr.small = graph20[c(1:5,11:15)] ## compute distance matrix output = nd.hamming(gr.small, out.dist=FALSE) ## visualize opar = par(no.readonly=TRUE) par(pty="s") image(output$D[,10:1], main="two group case", axes=FALSE, col=gray(0:32/32)) par(opar)
## load example data and extract only a few data(graph20) gr.small = graph20[c(1:5,11:15)] ## compute distance matrix output = nd.hamming(gr.small, out.dist=FALSE) ## visualize opar = par(no.readonly=TRUE) par(pty="s") image(output$D[,10:1], main="two group case", axes=FALSE, col=gray(0:32/32)) par(opar)
Hamming-Ipsen-Mikhailov (HIM) combines the local Hamming edit distance and the global
Ipsen-Mikhailov distance to merge information at each scale. For Ipsen-Mikhailove distance,
it is provided as nd.csd
in our package for consistency. Given a parameter (
xi
),
it is defined as
where and
stand for Hamming and I-M distance, respectively.
nd.him(A, out.dist = TRUE, xi = 1, ntest = 10)
nd.him(A, out.dist = TRUE, xi = 1, ntest = 10)
A |
a list of length |
out.dist |
a logical; |
xi |
a parameter to control balance between two distances. |
ntest |
the number of searching over |
a named list containing
an matrix or
dist
object containing pairwise distance measures.
Jurman G, Visintainer R, Filosi M, Riccadonna S, Furlanello C (2015). “The HIM glocal metric and kernel for network comparison and classification.” In 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), 1–10. ISBN 978-1-4673-8272-4.
## load example data data(graph20) ## compute distance matrix output = nd.him(graph20, out.dist=FALSE) ## visualize opar = par(no.readonly=TRUE) par(pty="s") image(output$D[,20:1], main="two group case", axes=FALSE, col=gray(0:32/32)) par(opar)
## load example data data(graph20) ## compute distance matrix output = nd.him(graph20, out.dist=FALSE) ## visualize opar = par(no.readonly=TRUE) par(pty="s") image(output$D[,20:1], main="two group case", axes=FALSE, col=gray(0:32/32)) par(opar)
For a graph with an adjacency matrix , graph moment is defined as
where is the number of vertices and
is an order of the moment.
nd.moments
computes
pairwise distances based on log of graph moments from to
.
nd.moments( A, k = 3, metric = c("euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski"), out.dist = TRUE )
nd.moments( A, k = 3, metric = c("euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski"), out.dist = TRUE )
A |
a list of length |
k |
the integer order of moments. If |
metric |
type of distance measures for log-moment features. See |
out.dist |
a logical; |
a named list containing
an matrix or
dist
object containing pairwise distance measures.
Mukherjee SS, Sarkar P, Lin L (2017). “On clustering network-valued data.” In Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds.), Advances in neural information processing systems 30, 7071–7081. Curran Associates, Inc.
## load example data data(graph20) ## compute distance based on different k's. out3 <- nd.moments(graph20, k=3, out.dist=FALSE) out5 <- nd.moments(graph20, k=5, out.dist=FALSE) out7 <- nd.moments(graph20, k=7, out.dist=FALSE) out9 <- nd.moments(graph20, k=9, out.dist=FALSE) ## visualize opar = par(no.readonly=TRUE) par(mfrow=c(2,2), pty="s") image(out3$D[,20:1], col=gray(0:32/32), axes=FALSE, main="k=3") image(out5$D[,20:1], col=gray(0:32/32), axes=FALSE, main="k=5") image(out7$D[,20:1], col=gray(0:32/32), axes=FALSE, main="k=7") image(out9$D[,20:1], col=gray(0:32/32), axes=FALSE, main="k=9") par(opar)
## load example data data(graph20) ## compute distance based on different k's. out3 <- nd.moments(graph20, k=3, out.dist=FALSE) out5 <- nd.moments(graph20, k=5, out.dist=FALSE) out7 <- nd.moments(graph20, k=7, out.dist=FALSE) out9 <- nd.moments(graph20, k=9, out.dist=FALSE) ## visualize opar = par(no.readonly=TRUE) par(mfrow=c(2,2), pty="s") image(out3$D[,20:1], col=gray(0:32/32), axes=FALSE, main="k=3") image(out5$D[,20:1], col=gray(0:32/32), axes=FALSE, main="k=5") image(out7$D[,20:1], col=gray(0:32/32), axes=FALSE, main="k=7") image(out9$D[,20:1], col=gray(0:32/32), axes=FALSE, main="k=9") par(opar)
Network Flow Distance
nd.nfd( A, order = 0, out.dist = TRUE, vect = seq(from = 0, to = 10, length.out = 1000) )
nd.nfd( A, order = 0, out.dist = TRUE, vect = seq(from = 0, to = 10, length.out = 1000) )
A |
a list of length |
order |
the order of Laplacian; currently only 0 and 1 are supported. |
out.dist |
a logical; |
vect |
a vector of parameters |
a named list containing
an matrix or
dist
object containing pairwise distance measures.
## Not run: ## load example data data(graph20) # compute two diffusion-based distances and visualize out1 = nd.gdd(graph20, out.dist=FALSE) out2 = nd.nfd(graph20, out.dist=FALSE) # visualize opar <- par(no.readonly=TRUE) par(mfrow=c(1,2), pty="s") image(out1$D[,20:1],col=gray((0:32)/32), main="nd.gdd",axes=FALSE) image(out2$D[,20:1],col=gray((0:32)/32), main="nd.nfd",axes=FALSE) par(opar) ## End(Not run)
## Not run: ## load example data data(graph20) # compute two diffusion-based distances and visualize out1 = nd.gdd(graph20, out.dist=FALSE) out2 = nd.nfd(graph20, out.dist=FALSE) # visualize opar <- par(no.readonly=TRUE) par(mfrow=c(1,2), pty="s") image(out1$D[,20:1],col=gray((0:32)/32), main="nd.gdd",axes=FALSE) image(out2$D[,20:1],col=gray((0:32)/32), main="nd.nfd",axes=FALSE) par(opar) ## End(Not run)
Normalized Laplacian matrix contains topological information of
a corresponding network via its spectrum. nd.wsd
adopts weighted
spectral distribution of eigenvalues and brings about a metric via
binning strategy.
nd.wsd(A, out.dist = TRUE, K = 50, wN = 4)
nd.wsd(A, out.dist = TRUE, K = 50, wN = 4)
A |
a list of length |
out.dist |
a logical; |
K |
the number of bins for the spectrum interval |
wN |
a decaying exponent; default is |
a named list containing
an matrix or
dist
object containing pairwise distance measures.
an matrix of rows being eigenvalues for each graph.
Fay D, Haddadi H, Thomason A, Moore AW, Mortier R, Jamakovic A, Uhlig S, Rio M (2010). “Weighted Spectral Distribution for Internet Topology Analysis: Theory and Applications.” IEEE/ACM Transactions on Networking, 18(1), 164–176. ISSN 1063-6692, 1558-2566.
## load example data and extract a few data(graph20) gr.small = graph20[c(1:5,11:15)] ## compute distance matrix output = nd.wsd(gr.small, out.dist=FALSE, K=10) ## visualize opar = par(no.readonly=TRUE) par(pty="s") image(output$D[,10:1], main="two group case", axes=FALSE, col=gray(0:32/32)) par(opar)
## load example data and extract a few data(graph20) gr.small = graph20[c(1:5,11:15)] ## compute distance matrix output = nd.wsd(gr.small, out.dist=FALSE, K=10) ## visualize opar = par(no.readonly=TRUE) par(pty="s") image(output$D[,10:1], main="two group case", axes=FALSE, col=gray(0:32/32)) par(opar)
Network has gathered much attention from many disciplines, as many of real data
can be well represented in the relational form. The concept of distance - or, metric - between
two networks is the starting point for inference on population of networks. NetworkDistance package
provides a not-so-comprehensive collection of distance measures for measuring dissimilarity between two network objects.
Data should be supplied as adjacency matrices, where we support three formats of data representation;
matrix
object in R base, network
class from network package, and igraph
class from
igraph package.