Title: | Similarity-First Search Seriation Algorithm |
---|---|
Description: | An implementation of the Similarity-First Search algorithm (SFS), a combinatorial algorithm which can be used to solve the seriation problem and to recognize some structured weighted graphs. The SFS algorithm represents a generalization to weighted graphs of the graph search algorithm Lexicographic Breadth-First Search (Lex-BFS), a variant of Breadth-First Search. The SFS algorithm reduces to Lex-BFS when applied to binary matrices (or, equivalently, unweighted graphs). Hence this library can be also considered for Lex-BFS applications such as recognition of graph classes like chordal or unit interval graphs. In fact, the SFS seriation algorithm implemented in this package is a multisweep algorithm, which consists in repeating a finite number of SFS iterations (at most n sweeps for a matrix of size n). If the data matrix has a Robinsonian structure, then the ranking returned by the multistep SFS algorithm is a Robinson ordering of the input matrix. Otherwise the algorithm can be used as a heuristic to return a ranking partially satisfying the Robinson property. |
Authors: | Matteo Seminaroti [aut, cph], Utz-Uwe Haus [aut, cre, cph], Monique Laurent [ctb] |
Maintainer: | Utz-Uwe Haus <[email protected]> |
License: | GPL-3 |
Version: | 0.1.4 |
Built: | 2024-12-17 06:38:37 UTC |
Source: | CRAN |
Read the similarity (or dissimilarity) information between the objects that one wants to order and build a 3-columns data frame
, where each row represents the (dis)similarity
between objects
and
. In case of symmetric data (i.e.,
), only the entries for pairs
with
are listed.
read(data, zero_epsilon = 1e-200, symmetric = TRUE, identical_val = FALSE)
read(data, zero_epsilon = 1e-200, symmetric = TRUE, identical_val = FALSE)
data |
a representation of the similarity (or dissimilarity) between pairs of objects. |
zero_epsilon |
a numerical value which determines that values in |
symmetric |
a boolean value equal to |
identical_val |
a boolean value equal to |
The input data can be a weighted adjacency matrix (represented by the objects: matrix
, dist
or data frame
), or a list of all the weighted edges of a weighted graph (represented by a 3-col data frame
) where each row represents the (dis)similarity
between objects
and
with
). If not specified, the data is assumed to be symmetric (i.e., same entry at positions
and
). Since by default the data is assumed to be symmetric, if it is represented by a 3-columns
data frame
, then it is assumed that symmetric pairs are not listed, and thus by default identical_val = FALSE
. The reason for this choice is that for large symmetric data, it is more efficient to list the symmetric entries only once.
However, note that if symmetric = FALSE
then identical_val = TRUE
automatically.
Returns a 3-columns data frame
representation of the original data listing all the pairwise (dis)similarities between objects and selecting only the entries
with
when the data is a symmetric matrix
.
Matteo Seminaroti (SFS) and Utz-Uwe Haus (R wrapping)
Return a ranking of the objects such that similar objects are ordered close to each other. If the matrix is Robinsonian, then the ranking returned is a Robinson ordering.
sfs(matrix, sfs_epsilon = 0, dissimilarity = FALSE, Robinsonian = FALSE, num_sweeps = 4)
sfs(matrix, sfs_epsilon = 0, dissimilarity = FALSE, Robinsonian = FALSE, num_sweeps = 4)
matrix |
a 3-columns |
sfs_epsilon |
a numerical value which determines that two entries whose difference is below this threshold are considered to be equal. |
dissimilarity |
a boolean value equal to |
Robinsonian |
a boolean value equal to |
num_sweeps |
an integer value that determines how many iterations of SFS shall be performed. |
Given a a 3-columns data frame
listing all the similarities (or dissimilarities) among the objects, this function builds a
spMat
object in Armadillo and computes a finite number of repeated SFS iterations (each called a sweep). The user may decide the threshold for which two entries are considered equal, meaning that if
sfs_epsilon
, then objects and
have the same similarity (or dissimilarity) with respect to object
. By default, this threshold is set to
.
If not specified, the matrix
represents the similarity information between objects. If dissimilarity = TRUE
, then the matrix
represents the dissimilarity information and the SFS algorithm is modified by sorting the neighborhood of a visited vertex for increasing values (instead of for decreasing values).
The parameter num_sweeps
sets the number of sweeps performed by SFS()
. This number directly affects the complexity of the function since, as each sweep runs in time,
SFS()
runs in time. By default,
num_sweeps=4
, as it is known that three sweeps suffice for recognizing Robinonian binary matrices and for general matrices experiments show that four sweeps are enough for finding a good ranking for most data. If Robinsonian = TRUE
, then the number of sweeps is automatically set to the number of objects to rank minus one. In this case,
sfs()
also checks if the returned permutation is a Robinson ordering (since it is known that if the order returned after sweeps is not a Robinson ordering then the data is not Robinsonian). Efficient measures are implemented in order to avoid unnecessary time consuming loops between consecutive SFS iterations. Note that checking if a given permutation is a Robinson ordering is implemented at the moment only when dealing with similarities among the objects.
Finally, the object returned by SFS()
is a vector of integers, where the entry at position represents the
-th object in the ranking. If the
matrix
is 0-based, then the returned vector is 0-based. If matrix
is 1-based, then the returned vector is 1-based.
Return a (row) vector of integers representing the ranking of the objects, which is 0-based or 1-based accordingly with the input matrix
.
Matteo Seminaroti (SFS) and Utz-Uwe Haus (R wrapping)
M. Laurent and M. Seminaroti. Similarity-First Search: a new algorithm with application to Robinsonian matrix recognition. SIAM Journal on Discrete Mathematics (to appear). arXiv:1601.03521. 2016.
M. Seminaroti. Combinatorial Algorithms for the Seriation Problem. PhD thesis. School of Economics and Management, Tilburg University, pages 1–209. 2016.
## install package in R #install.packages("SFS_0.1.tar.gz") #install.packages("seriation") ## load package in R library(SFS) ## invoke SFS on a R Matrix mtxt <- c("11 2 9 0 5 0 5 5 2 0 5 0 5 6 0 0 2 0 5", "2 11 2 0 9 0 8 5 10 0 5 0 5 2 0 0 10 0 8", "9 2 11 0 5 0 5 5 2 0 5 0 5 10 0 0 2 0 5", "0 0 0 11 0 3 0 0 0 3 0 3 0 0 10 3 0 9 0", "5 9 5 0 11 0 8 7 9 0 7 0 7 5 0 0 9 0 10", "0 0 0 3 0 11 0 0 0 10 0 6 0 0 5 8 0 5 0", "5 8 5 0 8 0 11 7 8 0 7 0 7 5 0 0 8 0 9", "5 5 5 0 7 0 7 11 6 0 10 0 8 7 0 0 6 0 7", "2 10 2 0 9 0 8 6 11 0 6 0 5 2 0 0 10 0 8", "0 0 0 3 0 10 0 0 0 11 0 6 0 0 4 9 0 5 0", "5 5 5 0 7 0 7 10 6 0 11 0 9 7 0 0 6 0 7", "0 0 0 3 0 6 0 0 0 6 0 11 0 0 9 6 0 10 0", "5 5 5 0 7 0 7 8 5 0 9 0 11 7 0 0 5 0 7", "6 2 10 0 5 0 5 7 2 0 7 0 7 11 0 0 2 0 5", "0 0 0 10 0 5 0 0 0 4 0 9 0 0 11 4 0 10 0", "0 0 0 3 0 8 0 0 0 9 0 6 0 0 4 11 0 4 0", "2 10 2 0 9 0 8 6 10 0 6 0 5 2 0 0 11 0 8", "0 0 0 9 0 5 0 0 0 5 0 10 0 0 10 4 0 11 0", "5 8 5 0 10 0 9 7 8 0 7 0 7 5 0 0 8 0 11") M <- as.matrix(read.table(textConnection(mtxt))) A <- SFS::read(M) SFS::sfs(A, Robinsonian = TRUE) ## invoke SFS on a data-frame with 3-column data-frames with 0-based vertices, with ## (row, col, value) triples containing symmetric values data <- c("0 0 1.0", "1 0 1.5", "2 0 1.9", "0 1 2.0", "1 1 2.5", "2 1 2.9", "0 2 3.0", "1 2 3.5", "2 2 3.9") M <- read.table(textConnection(data)) A <- SFS::read(M, identical_val = TRUE) SFS::sfs(A) ## invoke SFS on a \code{dist} from seriation package: library(seriation) data("iris"); x <- as.matrix(iris[-5]); x <- x[sample(1:nrow(x)),]; M <- dist(x) D <- SFS::read(M) SFS::sfs(D, sfs_epsilon = 0.01, dissimilarity = TRUE) ## invoke SFS reading from file a Robinsonian matrix M <- read.table(system.file("extdata", "list_130.txt", package = "SFS")) A <- SFS::read(M, identical_val = TRUE) SFS::sfs(A, Robinsonian = TRUE) ## invoke SFS reading from file containing 3-columns (row, col, value) entries ## of an asymmetric non-Robinsonian similarity matrix with 1-based vertices M <- read.table(system.file("extdata", "list_130.txt", package = "SFS")) A <- SFS::read(M, identical_val = TRUE, symmetric = FALSE) SFS::sfs(A, num_sweeps = 7)
## install package in R #install.packages("SFS_0.1.tar.gz") #install.packages("seriation") ## load package in R library(SFS) ## invoke SFS on a R Matrix mtxt <- c("11 2 9 0 5 0 5 5 2 0 5 0 5 6 0 0 2 0 5", "2 11 2 0 9 0 8 5 10 0 5 0 5 2 0 0 10 0 8", "9 2 11 0 5 0 5 5 2 0 5 0 5 10 0 0 2 0 5", "0 0 0 11 0 3 0 0 0 3 0 3 0 0 10 3 0 9 0", "5 9 5 0 11 0 8 7 9 0 7 0 7 5 0 0 9 0 10", "0 0 0 3 0 11 0 0 0 10 0 6 0 0 5 8 0 5 0", "5 8 5 0 8 0 11 7 8 0 7 0 7 5 0 0 8 0 9", "5 5 5 0 7 0 7 11 6 0 10 0 8 7 0 0 6 0 7", "2 10 2 0 9 0 8 6 11 0 6 0 5 2 0 0 10 0 8", "0 0 0 3 0 10 0 0 0 11 0 6 0 0 4 9 0 5 0", "5 5 5 0 7 0 7 10 6 0 11 0 9 7 0 0 6 0 7", "0 0 0 3 0 6 0 0 0 6 0 11 0 0 9 6 0 10 0", "5 5 5 0 7 0 7 8 5 0 9 0 11 7 0 0 5 0 7", "6 2 10 0 5 0 5 7 2 0 7 0 7 11 0 0 2 0 5", "0 0 0 10 0 5 0 0 0 4 0 9 0 0 11 4 0 10 0", "0 0 0 3 0 8 0 0 0 9 0 6 0 0 4 11 0 4 0", "2 10 2 0 9 0 8 6 10 0 6 0 5 2 0 0 11 0 8", "0 0 0 9 0 5 0 0 0 5 0 10 0 0 10 4 0 11 0", "5 8 5 0 10 0 9 7 8 0 7 0 7 5 0 0 8 0 11") M <- as.matrix(read.table(textConnection(mtxt))) A <- SFS::read(M) SFS::sfs(A, Robinsonian = TRUE) ## invoke SFS on a data-frame with 3-column data-frames with 0-based vertices, with ## (row, col, value) triples containing symmetric values data <- c("0 0 1.0", "1 0 1.5", "2 0 1.9", "0 1 2.0", "1 1 2.5", "2 1 2.9", "0 2 3.0", "1 2 3.5", "2 2 3.9") M <- read.table(textConnection(data)) A <- SFS::read(M, identical_val = TRUE) SFS::sfs(A) ## invoke SFS on a \code{dist} from seriation package: library(seriation) data("iris"); x <- as.matrix(iris[-5]); x <- x[sample(1:nrow(x)),]; M <- dist(x) D <- SFS::read(M) SFS::sfs(D, sfs_epsilon = 0.01, dissimilarity = TRUE) ## invoke SFS reading from file a Robinsonian matrix M <- read.table(system.file("extdata", "list_130.txt", package = "SFS")) A <- SFS::read(M, identical_val = TRUE) SFS::sfs(A, Robinsonian = TRUE) ## invoke SFS reading from file containing 3-columns (row, col, value) entries ## of an asymmetric non-Robinsonian similarity matrix with 1-based vertices M <- read.table(system.file("extdata", "list_130.txt", package = "SFS")) A <- SFS::read(M, identical_val = TRUE, symmetric = FALSE) SFS::sfs(A, num_sweeps = 7)