| Title: | Importing, Constructing, and Exporting Bibliometric Networks |
|---|---|
| Description: | Imports, constructs, and exports bibliometric networks from scholarly metadata. Reads 'Scopus', 'Web of Science', 'BibTeX', 'RIS', 'OpenAlex', 'Lens.org', 'Dimensions', and 'Crossref' exports. Goes beyond standard co-networks with attention-weighted networks (lead, last, proximity, circular position weights), position-aware counting (harmonic, arithmetic, geometric, golden-ratio), similarity and dissimilarity normalisations, temporal networks with fixed, sliding, and cumulative windows, disparity-filter backbone extraction, historiograph construction, and local citation scoring. Methods described in López-Pernas, Saqr & Apiola (2023) <doi:10.1007/978-3-031-25336-2_5>. |
| Authors: | Mohammed Saqr [aut, cre, cph] (ORCID: <https://orcid.org/0000-0001-5881-3109>), Sonsoles López-Pernas [aut] (ORCID: <https://orcid.org/0000-0002-9621-1392>) |
| Maintainer: | Mohammed Saqr <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.4.4 |
| Built: | 2026-05-20 13:46:55 UTC |
| Source: | https://github.com/cran/bibnets |
Constructs a network between authors using one of four relationship types and any of 13 counting methods, including 9 position-dependent methods that respect author byline order.
author_network(data, type = "collaboration", counting = "full", similarity = "none", threshold = 0, min_occur = 1L, position_weights = c(1, 0.8, 0.6, 0.4), first_last_weight = 2, attention = NULL, top_n = NULL, self_loops = FALSE, deduplicate = TRUE, format = "edgelist")
data |
A data frame with at least |
type |
Character. Relationship type:
|
counting |
Character. Counting method. Position-independent methods
( |
similarity |
Character. Similarity measure: |
threshold |
Numeric. Minimum edge weight. Default 0. |
min_occur |
Integer. Minimum number of papers for an author to be included. Default 1. |
position_weights |
Numeric vector. Custom weights for
|
first_last_weight |
Numeric. Multiplier for |
attention |
Character or NULL. Attention-based weighting independent of
|
top_n |
Integer or NULL. Return only the top n edges by weight. Default NULL (all edges). |
self_loops |
Logical. If |
deduplicate |
Logical. If |
format |
Character. Output format:
|
Depends on format: a bibnets_network data frame (default),
a Gephi-ready data frame, an igraph graph, a cograph_network, or a
sparse matrix.
data(biblio_data)
author_network(biblio_data, "collaboration")
author_network(biblio_data, "collaboration", counting = "harmonic")
author_network(biblio_data, "collaboration", counting = "geometric", similarity = "association")
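To make the idea of position-dependent counting concrete, the sketch below computes per-author byline credit under the textbook harmonic, arithmetic, and geometric schemes. The helper name `position_credit` is hypothetical and is not part of bibnets; how the package converts such credits into edge weights is not documented here, so treat this as an illustration of the counting schemes only.

```r
# Byline credit for each of n authors under three position-dependent schemes.
# Textbook definitions; bibnets may scale or combine them differently.
position_credit <- function(n, method = c("harmonic", "arithmetic", "geometric")) {
  method <- match.arg(method)
  i <- seq_len(n)
  raw <- switch(method,
    harmonic   = 1 / i,       # 1, 1/2, 1/3, ...
    arithmetic = n + 1 - i,   # n, n - 1, ..., 1
    geometric  = 2^(n - i)    # each position gets half the previous one
  )
  raw / sum(raw)              # one paper distributes one unit of credit
}

position_credit(3, "harmonic")
position_credit(3, "arithmetic")
position_credit(3, "geometric")
```

In every scheme the first author receives the largest share and the shares sum to one, which is what distinguishes these methods from full counting, where each author receives a whole unit.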
Applies the disparity filter to a weighted edge list. For each edge, it computes an alpha (p-value) from both endpoints and keeps the edge if it is statistically significant from at least one endpoint.
backbone(edges, alpha = 0.05)
edges |
A data frame with at least columns |
alpha |
Numeric. Significance threshold in (0, 1). Default 0.05. |
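The null model asks: given that node $i$ has total strength $s_i$ distributed uniformly at random across its $k_i$ edges, what is the probability that a single edge weight is at least as large as $w_{ij}$? In the standard disparity-filter form, the answer is
$$\alpha_{ij} = \left(1 - \frac{w_{ij}}{s_i}\right)^{k_i - 1}.$$
An edge is retained if $\min(\alpha_{ij}, \alpha_{ji}) < \alpha$, that is, if it is significant from at least one endpoint.
Nodes with only one edge ($k_i = 1$) are always kept.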
The filtered edge data frame with an added alpha column (the
minimum alpha from the two endpoints).
edges <- data.frame(
  from = c("A", "A", "A", "B", "C"),
  to = c("B", "C", "D", "C", "D"),
  weight = c(10, 1, 1, 8, 1)
)
backbone(edges, alpha = 0.05)
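The null model can be sketched directly in base R. The code below is an illustration of the standard disparity filter, not the bibnets implementation: it assumes an undirected edge list, uses the per-endpoint score (1 - w / s)^(k - 1), and does not reproduce the special handling of single-edge nodes described above.

```r
# Illustrative disparity filter in base R (not the bibnets implementation)
edges <- data.frame(
  from = c("A", "A", "A", "B", "C"),
  to = c("B", "C", "D", "C", "D"),
  weight = c(10, 1, 1, 8, 1),
  stringsAsFactors = FALSE
)

nodes <- unique(c(edges$from, edges$to))
touches <- function(v) edges$from == v | edges$to == v

# Strength (sum of incident weights) and degree (number of incident edges)
strength <- vapply(nodes, function(v) sum(edges$weight[touches(v)]), numeric(1))
degree <- vapply(nodes, function(v) sum(touches(v)), integer(1))

# Score of every edge as seen from one of its endpoints
endpoint_alpha <- function(node) {
  unname((1 - edges$weight / strength[node])^(degree[node] - 1))
}

# Keep an edge when it is significant from at least one endpoint
edges$alpha <- pmin(endpoint_alpha(edges$from), endpoint_alpha(edges$to))
edges[edges$alpha < 0.05, ]
```

On these toy edges only A-B and B-C fall below 0.05 in this sketch, because each carries most of the strength of one of its endpoints.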
A small synthetic dataset of 10 scholarly papers with overlapping authors, references, and keywords. Designed for testing and demonstrating all network construction functions in bibnets.
biblio_data
A data frame with 10 rows and 9 columns:
id: Unique document identifier (W1–W10).
title: Document title.
year: Publication year (2018–2022).
journal: Source journal (Scientometrics, Journal of Informetrics, JASIST, Quantitative Science Studies).
doi: DOI string.
cited_by_count: Times cited.
authors: List-column of author name strings (6 unique authors).
references: List-column of cited reference IDs (10 unique refs, R1–R10). Each paper cites exactly 4 references.
keywords: List-column of keyword strings (24 unique keywords). Each paper has 3 keywords.
data(biblio_data)
reference_network(biblio_data)
document_network(biblio_data, "coupling")
author_network(biblio_data, "collaboration")
With one field, entities are linked when they co-occur in the same
document. With by, entities are linked when they share values of the
by field across documents.
conetwork(data, field, by = NULL, sep = ";", counting = "full", similarity = "none", threshold = 0, min_occur = 1L, top_n = NULL, self_loops = FALSE, deduplicate = TRUE, format = "edgelist")
data |
A data frame with column |
field |
Character. The entity field — determines what the nodes are. |
by |
Character or |
sep |
Character or |
counting |
Character. Counting method. Default "full". |
similarity |
Character. Normalization method. Default "none". |
threshold |
Numeric. Minimum edge weight. Default 0. |
min_occur |
Integer. Minimum entity frequency. Default 1. |
top_n |
Integer or NULL. Return only the top n edges by weight. Default NULL (all edges). |
self_loops |
Logical. If |
deduplicate |
Logical. If |
format |
Character. Output format:
|
Fields can be list-columns (already split) or character columns with
delimiters (auto-split via sep).
Depends on format: a bibnets_network data frame (default),
a Gephi-ready data frame, an igraph graph, a cograph_network, or a
sparse matrix.
data(biblio_data)

# Co-occurrence: keywords appearing in the same document
conetwork(biblio_data, "keywords")

# Authors linked by shared keywords
conetwork(biblio_data, "authors", by = "keywords")

# Keywords linked by shared authors
conetwork(biblio_data, "keywords", by = "authors")

# Journals linked by shared references (= journal coupling)
conetwork(biblio_data, "journal", by = "references", similarity = "cosine")

# Auto-splits semicolon-delimited string columns
d <- data.frame(id = 1:3, tags = c("ml; dl; nlp", "ml; cv", "dl; cv"))
conetwork(d, "tags")
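For readers who want to see what single-field co-occurrence with full counting amounts to, here is a minimal base R sketch on the same toy `tags` column. It is an assumption-laden illustration, not the package's code: it splits on `;`, trims whitespace, removes duplicates within a document, and counts each unordered pair once per document.

```r
# Illustrative co-occurrence counting in base R (not the bibnets implementation)
d <- data.frame(id = 1:3, tags = c("ml; dl; nlp", "ml; cv", "dl; cv"),
                stringsAsFactors = FALSE)

# Split the delimited column into one sorted, de-duplicated vector per document
items <- lapply(strsplit(d$tags, ";", fixed = TRUE),
                function(x) sort(unique(trimws(x))))

# Enumerate every unordered pair of items within each document
pairs <- do.call(rbind, lapply(items, function(x) {
  if (length(x) < 2) return(NULL)
  t(combn(x, 2))
}))

# Sum pair occurrences across documents to obtain edge weights
edges <- aggregate(
  weight ~ from + to,
  data = data.frame(from = pairs[, 1], to = pairs[, 2], weight = 1),
  FUN = sum
)
edges
```

Sorting the items before pairing ensures that the pair (dl, ml) and the pair (ml, dl) are counted as the same undirected edge.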
Constructs a network between countries based on collaboration or coupling.
country_network(data, type = "collaboration", counting = "full", similarity = "none", threshold = 0, min_occur = 1L, attention = NULL, top_n = NULL, self_loops = FALSE, deduplicate = TRUE, format = "edgelist")
data |
A data frame with |
type |
Character. |
counting |
Character. Counting method. Default "full". |
similarity |
Character. Similarity measure. Default "none". |
threshold |
Numeric. Minimum edge weight. Default 0. |
min_occur |
Integer. Minimum papers per country. Default 1. |
attention |
Character or NULL. Attention-based weighting independent of
|
top_n |
Integer or NULL. Return only the top n edges by weight. Default NULL (all edges). |
self_loops |
Logical. If |
deduplicate |
Logical. If |
format |
Character. Output format:
|
Depends on format: a bibnets_network data frame (default),
a Gephi-ready data frame, an igraph graph, a cograph_network, or a
sparse matrix.
data(open_alex_gold_open_access_learning_analytics)
country_network(open_alex_gold_open_access_learning_analytics, "collaboration")
Constructs a network between documents (papers) in the dataset.
document_network(data, type = "coupling", counting = "full", similarity = "none", threshold = 0, min_occur = 1L, top_n = NULL, self_loops = FALSE, deduplicate = TRUE, format = "edgelist")
data |
A data frame with |
type |
Character. Relationship type:
|
counting |
Character. Counting method. Default "full". |
similarity |
Character. Similarity measure. Default "none". |
threshold |
Numeric. Minimum edge weight. Default 0. |
min_occur |
Integer. Minimum reference frequency. Default 1. |
top_n |
Integer or NULL. Return only the top n edges by weight. Default NULL (all edges). |
self_loops |
Logical. If |
deduplicate |
Logical. If |
format |
Character. Output format:
|
Depends on format. For type = "citation", edges are directed
(from = citing, to = cited) with weight and count both 1.
data(biblio_data)
document_network(biblio_data, "coupling")
document_network(biblio_data, "coupling", counting = "strength")
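Bibliographic coupling with full counting links two documents by the number of references they share. The base R sketch below illustrates that definition on a hypothetical three-document reference list; it is not the package's implementation and ignores similarity normalisation and the other counting methods.

```r
# Illustrative bibliographic coupling in base R (not the bibnets implementation)
refs <- list(
  W1 = c("R1", "R2", "R3"),
  W2 = c("R2", "R3", "R4"),
  W3 = c("R5")
)

# Every unordered pair of documents
pairs <- t(combn(names(refs), 2))

# Coupling strength = number of references the two documents have in common
shared <- apply(pairs, 1, function(p) {
  length(intersect(refs[[p[1]]], refs[[p[2]]]))
})

edges <- data.frame(from = pairs[, 1], to = pairs[, 2], weight = shared,
                    stringsAsFactors = FALSE)
edges[edges$weight > 0, ]
```

Here W1 and W2 share R2 and R3, so they are coupled with weight 2, while W3 shares nothing and drops out.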
Keeps only edges between the most frequent nodes. Node frequency is determined by how many edges each node participates in.
filter_top(edges, n)
edges |
A data frame with at least |
n |
Integer. Number of top nodes to keep. |
A filtered data frame with edges among the top n nodes.
data(biblio_data)
edges <- author_network(biblio_data, "collaboration")

# Keep only edges among the top 3 most connected authors
filter_top(edges, 3)
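The filtering rule described above can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package's code: node frequency is taken as the number of edges a node appears in, and ties at the cut-off are broken by whatever order `sort()` returns, which may differ from how `filter_top()` resolves them.

```r
# Illustrative top-n node filter in base R (not the bibnets implementation)
edges <- data.frame(
  from = c("A", "A", "A", "B"),
  to = c("B", "C", "D", "C"),
  weight = c(3, 2, 1, 4),
  stringsAsFactors = FALSE
)
n <- 3

# Count how many edges each node participates in
freq <- sort(table(c(edges$from, edges$to)), decreasing = TRUE)
top_nodes <- names(freq)[seq_len(min(n, length(freq)))]

# Keep an edge only when both endpoints are among the top nodes
kept <- edges[edges$from %in% top_nodes & edges$to %in% top_nodes, ]
kept
```

Node D appears in a single edge, so it is dropped together with the A-D edge, leaving the triangle among A, B, and C.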
Constructs a Garfield-style historiograph: a directed citation network among the most locally cited documents, laid out chronologically.
historiograph(data, n = 30, min_lcs = 1)
data |
A data frame with |
n |
Integer. Number of top locally cited documents to include. Default 30. |
min_lcs |
Integer. Minimum local citation score for inclusion. Default 1. |
A list with:
$nodes: Data frame of included documents with id, lcs, gcs, year, title, journal, doi.
$edges: Data frame of directed citation edges with from (citing), to (cited), year_from, year_to.
data(biblio_data)
h <- historiograph(biblio_data, n = 5)
h$nodes
h$edges
Constructs a network between institutions (affiliations).
institution_network(data, type = "collaboration", counting = "full", similarity = "none", threshold = 0, min_occur = 1L, attention = NULL, top_n = NULL, self_loops = FALSE, deduplicate = TRUE, format = "edgelist")
data |
A data frame with |
type |
Character. |
counting |
Character. Counting method. Default "full". |
similarity |
Character. Similarity measure. Default "none". |
threshold |
Numeric. Minimum edge weight. Default 0. |
min_occur |
Integer. Minimum papers per institution. Default 1. |
attention |
Character or NULL. Attention-based weighting independent of
|
top_n |
Integer or NULL. Return only the top n edges by weight. Default NULL (all edges). |
self_loops |
Logical. If |
deduplicate |
Logical. If |
format |
Character. Output format:
|
Depends on format: a bibnets_network data frame (default),
a Gephi-ready data frame, an igraph graph, a cograph_network, or a
sparse matrix.
data(open_alex_gold_open_access_learning_analytics)
institution_network(open_alex_gold_open_access_learning_analytics, "collaboration")
Constructs a network where two keywords are linked when they appear together in the same document.
keyword_network(data, field = "keywords", counting = "full", similarity = "none", threshold = 0, min_occur = 1L, attention = NULL, top_n = NULL, self_loops = FALSE, deduplicate = TRUE, format = "edgelist")
data |
A data frame with |
field |
Character. Name of the keyword list-column. Default "keywords". |
counting |
Character. Counting method. Default "full". |
similarity |
Character. Similarity measure. Default "none". |
threshold |
Numeric. Minimum edge weight. Default 0. |
min_occur |
Integer. Minimum keyword frequency. Default 1. |
attention |
Character or NULL. Attention-based weighting independent of
|
top_n |
Integer or NULL. Return only the top n edges by weight. Default NULL (all edges). |
self_loops |
Logical. If |
deduplicate |
Logical. If |
format |
Character. Output format:
|
Depends on format: a bibnets_network data frame (default),
a Gephi-ready data frame, an igraph graph, a cograph_network, or a
sparse matrix.
data(biblio_data)
keyword_network(biblio_data)
keyword_network(biblio_data, similarity = "association")
Counts how many times each document is cited by other documents within the dataset.
local_citations(data)
data |
A data frame with |
A data frame with columns:
id: Document identifier.
lcs: Local Citation Score: times cited within the dataset.
gcs: Global Citation Score: cited_by_count if available.
Plus any metadata columns present in the input (title, year,
journal, doi).
data(biblio_data)
local_citations(biblio_data)
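The local citation score can be illustrated with a small base R sketch. It assumes, hypothetically, that reference lists store the same identifiers as the `id` column and that a document citing the same work twice counts once; whether bibnets excludes self-citations or matches references differently is not covered here.

```r
# Illustrative local citation score in base R (not the bibnets implementation)
docs <- data.frame(id = c("W1", "W2", "W3"), stringsAsFactors = FALSE)
docs$references <- list(
  c("R1", "R2"),   # W1 cites only works outside the dataset
  c("W1", "R1"),   # W2 cites W1
  c("W1", "W2")    # W3 cites W1 and W2
)

# One entry per (citing document, cited work) pair
cited <- unlist(lapply(docs$references, unique))

# LCS = number of documents in the dataset that cite each id
lcs <- vapply(docs$id, function(i) sum(cited == i), integer(1))
lcs
```

In this toy dataset W1 is cited twice locally, W2 once, and W3 not at all; references such as R1 that point outside the dataset contribute nothing to any local score.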
Applies a similarity normalization to a square co-occurrence matrix. The diagonal of the input matrix is used as the total occurrence count for each item. Operates entirely in sparse representation.
normalize(A, method = "none")
A |
A square symmetric matrix (dense or sparse) representing co-occurrence counts. |
method |
Character. Normalization method:
|
A normalized sparse matrix of the same dimensions.
# Create a small co-occurrence matrix
A <- matrix(c(10, 3, 1, 3, 8, 2, 1, 2, 5), nrow = 3, dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
normalize(A, "association")
normalize(A, "cosine")
normalize(A, "jaccard")
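As a reference point for what these normalisations compute, the dense base R sketch below applies the commonly used definitions with the diagonal as each item's occurrence count. The formulas are the textbook ones and are an assumption about, not a copy of, the package's implementation, which works on sparse matrices and may scale the association strength differently.

```r
# Illustrative similarity normalisations in base R (not the bibnets implementation)
A <- matrix(c(10, 3, 1, 3, 8, 2, 1, 2, 5), nrow = 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

s <- diag(A)  # total occurrences of each item

association <- A / outer(s, s)             # c_ij / (s_i * s_j)
cosine <- A / sqrt(outer(s, s))            # c_ij / sqrt(s_i * s_j)
jaccard <- A / (outer(s, s, "+") - A)      # c_ij / (s_i + s_j - c_ij)

round(cosine, 3)
round(jaccard, 3)
```

For the pair (a, b), with 3 co-occurrences and 10 and 8 total occurrences, these definitions give a Jaccard index of 3 / 15 = 0.2 and a cosine of 3 / sqrt(80).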
A corpus of 1,508 gold open-access scholarly works on learning analytics,
retrieved from OpenAlex (CC0 licence). All records have a verified title,
publication year, and at least one author. Journal names are present for
works published in a named source; preprints and book chapters may have
NA in journal.
open_alex_gold_open_access_learning_analytics
A data frame with 1,508 rows and 11 columns:
OpenAlex work ID (e.g. "W2769342982").
Work title.
Publication year (integer).
Source name, or NA if not available.
DOI string without the https://doi.org/ prefix,
or NA.
Number of citing works as recorded in OpenAlex.
Work type ("article", "review",
"preprint", "book-chapter", etc.).
List-column of author display names (pipe-split from the OpenAlex flat export; one name per authorship slot).
List-column with one element: the primary OpenAlex
topic for the work (e.g. "Online Learning and Analytics").
List-column of institution display names (one entry per authorship–institution pair).
List-column of two-letter ISO country codes (one entry per authorship–institution pair).
OpenAlex https://openalex.org, CC0 licence.
data(open_alex_gold_open_access_learning_analytics)
d <- open_alex_gold_open_access_learning_analytics
author_network(d, "collaboration")
country_network(d, "collaboration")
Prints a bibnets network edge list.
## S3 method for class 'bibnets_network'
print(x, n = 10L, ...)
x |
A |
n |
Integer. Number of rows to show. Default 10. |
... |
Ignored. |
Invisibly returns x.
data(biblio_data)
edges <- author_network(biblio_data, "collaboration")
print(edges)
Reduces a weighted edge list by removing weak or excess edges.
prune(edges, threshold = NULL, top_n = NULL)
edges |
A data frame with at least columns |
threshold |
Numeric. Keep only edges with weight >= threshold. |
top_n |
Integer. For each node, keep only its top_n strongest edges. |
The filtered edge data frame (same columns as input).
edges <- data.frame(
  from = c("A", "A", "A", "B", "B", "C"),
  to = c("B", "C", "D", "C", "D", "D"),
  weight = c(5, 1, 2, 4, 1, 3)
)

# Keep only edges with weight >= 3
prune(edges, threshold = 3)

# Keep the 2 strongest edges per node
prune(edges, top_n = 2)
Universal reader that handles files, folders, format detection, and generic CSV input. Accepts a single file, multiple files, or a directory.
read_biblio(path, format = "auto", id = NULL, actors = NULL, sep = ";", ...)
path |
Character. Path to a file, a vector of file paths, or a directory containing export files. |
format |
Character. File format:
|
id |
Character. Column name for document identifier. Only used
when |
actors |
Character vector. Column names to split into list-columns.
Only used when |
sep |
Character. Delimiter for splitting actor columns. Default ";". |
... |
Additional arguments passed to the format-specific reader. |
A data frame.
# Auto-detect format from file content (here: a bundled OpenAlex CSV)
f <- system.file("extdata", "openalex_works.csv", package = "bibnets")
data <- read_biblio(f)
head(data[, c("id", "title", "year", "journal")])

# Read multiple files at once; auto-detects each format
f_scopus <- system.file("extdata", "scopus_sample.csv", package = "bibnets")
f_wos <- system.file("extdata", "wos_sample.txt", package = "bibnets")
combined <- read_biblio(c(f_scopus, f_wos))
head(combined[, c("id", "title", "year", "journal")])

# Read every supported export in a directory (here: the bundled extdata)
folder <- system.file("extdata", package = "bibnets")
all_data <- read_biblio(folder)
nrow(all_data)

# Generic CSV: point read_biblio at any CSV and name the list-column fields
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(
  doc_id = c("a", "b"),
  Authors = c("Smith J; Jones A", "Davis M"),
  Keywords = c("networks; bibliometrics", "analytics")
), tmp, row.names = FALSE)
generic <- read_biblio(tmp, format = "generic", id = "doc_id", actors = c("Authors", "Keywords"), sep = ";")
head(generic)
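The list-column layout that the generic reader produces from delimited fields can be approximated in base R as follows. The helper `split_field` is hypothetical and only illustrates the idea of splitting on a literal delimiter and trimming whitespace; the package's own splitting rules may differ.

```r
# Illustrative conversion of a delimited column into a list-column
# (split_field is a hypothetical helper, not a bibnets function)
raw <- data.frame(
  doc_id = c("a", "b"),
  Authors = c("Smith J; Jones A", "Davis M"),
  stringsAsFactors = FALSE
)

split_field <- function(x, sep = ";") {
  # fixed = TRUE treats sep literally, so "|" works without escaping
  lapply(strsplit(x, sep, fixed = TRUE), trimws)
}

raw$Authors <- split_field(raw$Authors)
lengths(raw$Authors)   # number of authors per document
raw$Authors[[1]]
```

Each row now holds a character vector of authors rather than a single string, which is the shape the network constructors expect for multi-valued fields.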
Parses a .bib file into a standardized bibliometric data frame.
Note: standard BibTeX does not contain cited references, so the
references column will be empty unless the file includes a
non-standard cited-references or note field with reference data.
read_bibtex(file, encoding = "UTF-8")
file |
Path to a .bib file. |
encoding |
Character. File encoding. Default "UTF-8". |
A data frame in the standard bibnets format: id, title,
year, journal, doi, cited_by_count, abstract, type,
plus list-columns authors, references (typically empty for
BibTeX), and keywords.
# Write a minimal BibTeX entry to a temp file, then read it
bib <- '@article{smith2020,
  title = {Bibliometric networks},
  author = {Smith, J. and Jones, K.},
  journal = {Test Journal},
  year = {2020},
  doi = {10.1000/test}
}'
f <- tempfile(fileext = ".bib")
writeLines(bib, f)
data <- read_bibtex(f)
data[, c("id", "title", "year", "journal", "doi")]
unlink(f)
Takes the output of rcrossref::cr_works() (the $data tibble/data frame)
and converts it to the standardized bibnets format.
read_crossref(data)
data |
A data frame from rcrossref::cr_works() (the $data element). |
A data frame in the standard bibnets format: id, title,
year, journal, doi, cited_by_count, abstract, type,
plus list-columns authors, references, and keywords.
# Construct a minimal data frame matching the structure of
# rcrossref::cr_works(...)$data. In practice, pass that data frame directly.
raw <- data.frame(
  doi = c("10.1/a", "10.2/b"),
  title = c("First paper", "Second paper"),
  issued = c("2022-01-01", "2021-06-15"),
  container.title = c("Journal A", "Journal B"),
  is.referenced.by.count = c("3", "9"),
  type = c("journal-article", "journal-article"),
  stringsAsFactors = FALSE
)
raw$author <- list(
  data.frame(given = c("Jane", "Anne"), family = c("Smith", "Jones"), stringsAsFactors = FALSE),
  data.frame(given = "Mark", family = "Davis", stringsAsFactors = FALSE)
)
data <- read_crossref(raw)
head(data[, c("id", "title", "year", "journal")])
Parses a CSV file exported from Dimensions into a standardized bibliometric data frame.
read_dimensions(file, encoding = "UTF-8")
file |
Path to a Dimensions CSV export file. |
encoding |
Character. File encoding. Default "UTF-8". |
A data frame in the standard bibnets format: id, title,
year, journal, doi, cited_by_count, abstract, type,
plus list-columns authors, references, and keywords.
Dimensions-specific extras: affiliations (list-column),
countries (list-column).
f <- system.file("extdata", "dimensions_sample.csv", package = "bibnets")
data <- read_dimensions(f)
head(data[, c("id", "title", "year", "journal")])
Parses a CSV file exported from Lens.org into a standardized bibliometric data frame.
read_lens(file, encoding = "UTF-8")
file |
Path to a Lens.org CSV export file. |
encoding |
Character. File encoding. Default "UTF-8". |
A data frame in the standard bibnets format: id, title,
year, journal, doi, cited_by_count, abstract, type,
plus list-columns authors, references, and keywords.
f <- system.file("extdata", "lens_sample.csv", package = "bibnets")
data <- read_lens(f)
head(data[, c("id", "title", "year", "journal")])
Takes the output of openalexR::oa_fetch() (a tibble/data frame of works)
and converts it to the standardized bibnets format with list-columns.
read_openalex(data)
data |
A data frame from openalexR::oa_fetch(). |
A data frame in the standard bibnets format: id, title,
year, journal, doi, cited_by_count, abstract, type,
plus list-columns authors, references, and keywords.
# Construct a minimal data frame matching the structure returned by
# openalexR::oa_fetch(entity = "works", ...). In practice, pass the
# result of oa_fetch() directly.
raw <- data.frame(
  id = c("W123", "W456"),
  display_name = c("First paper", "Second paper"),
  publication_year = c(2022L, 2021L),
  so = c("Journal A", "Journal B"),
  doi = c("https://doi.org/10.1/a", "https://doi.org/10.2/b"),
  cited_by_count = c(5L, 12L),
  stringsAsFactors = FALSE
)
raw$author <- list(
  data.frame(au_display_name = c("Smith J", "Jones A"), stringsAsFactors = FALSE),
  data.frame(au_display_name = "Davis M", stringsAsFactors = FALSE)
)
raw$referenced_works <- list(c("W100", "W200"), "W123")
data <- read_openalex(raw)
head(data[, c("id", "title", "year", "journal", "doi")])
Reads the flat CSV format downloaded directly from the OpenAlex website
(openalex.org/works exports). Multi-value fields are pipe-delimited (|).
This is distinct from the nested tibble produced by openalexR::oa_fetch(),
which is handled by read_openalex().
read_openalex_csv(file, sep = "|")
file |
Path to the CSV file. |
sep |
Character. Delimiter for multi-value fields. Default "|". |
A data frame in the standard bibnets format: id, title,
year, journal, doi, cited_by_count, abstract, type,
plus list-columns authors, references, keywords, affiliations,
countries. abstract and references are always NA / empty
(not available in the flat export).
f <- system.file("extdata", "openalex_works.csv", package = "bibnets")
data <- read_openalex_csv(f)
Parses a .ris file into a standardized bibliometric data frame.
Like BibTeX, standard RIS does not include cited references.
read_ris(file, encoding = "UTF-8")
file |
Path to a .ris file. |
encoding |
Character. File encoding. Default "UTF-8". |
A data frame in the standard bibnets format: id, title,
year, journal, doi, cited_by_count, abstract, type,
plus list-columns authors, references (typically empty for
RIS), and keywords.
# Write a minimal RIS record to a temp file, then read it
ris <- "TY  - JOUR
AU  - Smith, J.
AU  - Jones, K.
TI  - Bibliometric networks
JO  - Test Journal
PY  - 2020
DO  - 10.1000/test
ER  - "
f <- tempfile(fileext = ".ris")
writeLines(ris, f)
data <- read_ris(f)
data[, c("id", "title", "year", "journal", "doi")]
unlink(f)
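To show what parsing a RIS record involves, the sketch below splits the tagged lines of a single record into fields using base R regular expressions. It assumes the conventional RIS line layout (a two-character tag, two spaces, a hyphen, then the value) and handles one record only; it is not how `read_ris()` is implemented.

```r
# Illustrative parsing of one RIS record in base R (not the bibnets implementation)
ris <- c(
  "TY  - JOUR",
  "AU  - Smith, J.",
  "AU  - Jones, K.",
  "TI  - Bibliometric networks",
  "PY  - 2020",
  "ER  - "
)

# Tag = first two characters; value = everything after "  - "
tag <- sub("^([A-Z][A-Z0-9])  - ?.*$", "\\1", ris)
value <- sub("^[A-Z][A-Z0-9]  - ?", "", ris)

# Group values by tag; repeated tags such as AU become multi-element vectors
fields <- split(value, tag)
fields$AU
fields$TI
```

Repeated `AU` lines naturally collapse into one character vector, which corresponds to the authors list-column in the standardized format.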
Parses a CSV file exported from Scopus into a standardized bibliometric data frame with list-columns for multi-valued fields.
read_scopus(file, encoding = "UTF-8")
file |
Path to a Scopus CSV export file. |
encoding |
Character. File encoding. Default "UTF-8". |
A data frame in the standard bibnets format: id, title,
year, journal, doi, cited_by_count, abstract, type,
plus list-columns authors, references, and keywords.
Scopus-specific extras: index_keywords (list-column),
affiliations (character), language (character).
f <- system.file("extdata", "scopus_sample.csv", package = "bibnets")
data <- read_scopus(f)
head(data[, c("id", "title", "year", "journal")])
Parses a Web of Science export file (plaintext or tab-delimited) into a standardized bibliometric data frame.
read_wos(file, format = "plaintext")
file |
Path to a WoS export file (.txt). |
format |
Character. |
A data frame in the standard bibnets format: id, title,
year, journal, doi, cited_by_count, abstract, type,
plus list-columns authors, references, and keywords.
WoS-specific extra: keywords_plus (list-column).
f <- system.file("extdata", "wos_sample.txt", package = "bibnets")
data <- read_wos(f)
head(data[, c("id", "title", "year", "journal")])
Constructs a co-citation or equivalence network among cited references. Two references are linked when they are cited together by the same paper.
reference_network(data, type = "co_citation", counting = "full", similarity = "none", threshold = 0, min_occur = 1L, top_n = NULL, self_loops = FALSE, deduplicate = TRUE, format = "edgelist")
data |
A data frame with |
type |
Character. |
counting |
Character. Counting method. Default "full". |
similarity |
Character. Similarity measure. Default "none". |
threshold |
Numeric. Minimum edge weight. Default 0. |
min_occur |
Integer. Minimum times a reference must be cited. Default 1. |
top_n |
Integer or NULL. Return only the top n edges by weight. Default NULL (all edges). |
self_loops |
Logical. If |
deduplicate |
Logical. If |
format |
Character. Output format:
|
Depends on format: a bibnets_network data frame (default),
a Gephi-ready data frame, an igraph graph, a cograph_network, or a
sparse matrix.

    data(biblio_data)
    reference_network(biblio_data)
    reference_network(biblio_data, similarity = "association")

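The co-citation rule described above (two references are linked when the same paper cites both) can be illustrated in base R. This is a minimal sketch on hypothetical toy data, assuming full counting where every co-citing paper adds 1 to the edge weight; it is not the package's implementation, and the names `refs`, `pairs`, and `cocit` are invented for the illustration.

```r
# Hypothetical toy data: one reference list per citing paper.
refs <- list(
  p1 = c("A", "B", "C"),
  p2 = c("A", "B"),
  p3 = c("B", "C")
)

# For each paper, enumerate all unordered pairs of distinct references.
pairs_per_paper <- lapply(refs, function(r) {
  r <- sort(unique(r))
  if (length(r) < 2) return(NULL)
  t(combn(r, 2))
})
pairs <- do.call(rbind, pairs_per_paper)

# Each row is one co-citation event; summing gives the edge weight
# (assumed "full" counting: one unit per co-citing paper).
events <- data.frame(
  from = pairs[, 1],
  to = pairs[, 2],
  weight = 1,
  stringsAsFactors = FALSE
)
cocit <- aggregate(weight ~ from + to, data = events, FUN = sum)
cocit
```

Sorting each reference list before pairing keeps `from` and `to` in a stable order, so the pair A, B is never also counted as B, A.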
First 500 records from a Scopus bibliometric export on the intersection of green cloud computing and quantization, covering 2020–2025. Includes full references, author keywords, index keywords, and affiliations.

    scopus_quantum_cloud

A data frame with 499 rows and 12 columns:
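| Column | Description |
|---|---|
| id | Scopus EID. |
| title | Work title. |
| year | Publication year (integer). |
| journal | Source title. |
| doi | DOI string without the https://doi.org/ prefix. |
| cited_by_count | Times cited in Scopus. |
| abstract | Abstract text. |
| type | Document type ("Article", "Review", etc.). |
| authors | List-column of author name strings. |
| references | List-column of cited reference strings. |
| keywords | List-column of author keywords. |
| affiliations | List-column of affiliation strings. |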
Scopus bibliometric export. Dataset archived at doi:10.5281/zenodo.17142636 (CC BY 4.0).

    data(scopus_quantum_cloud)
    author_network(scopus_quantum_cloud, "collaboration")
    keyword_network(scopus_quantum_cloud)
    document_network(scopus_quantum_cloud, "coupling", similarity = "cosine")

Constructs a network between publication sources (journals, book series).

    source_network(
      data,
      type = "coupling",
      counting = "full",
      similarity = "none",
      threshold = 0,
      min_occur = 1L,
      top_n = NULL,
      self_loops = FALSE,
      deduplicate = TRUE,
      format = "edgelist"
    )

data |
A data frame with |
type |
Character. |
counting |
Character. Counting method. Default "full". |
similarity |
Character. Similarity measure. Default "none". |
threshold |
Numeric. Minimum edge weight. Default 0. |
min_occur |
Integer. Minimum papers per source. Default 1. |
top_n |
Integer or NULL. Return only the top n edges by weight. Default NULL (all edges). |
self_loops |
Logical. If |
deduplicate |
Logical. If |
format |
Character. Output format:
|
Depends on format: a bibnets_network data frame (default),
a Gephi-ready data frame, an igraph graph, a cograph_network, or a
sparse matrix.

    data(biblio_data)
    source_network(biblio_data, "coupling")

Splits semicolon-separated strings (common in Scopus/WoS exports) into character vectors, trimming whitespace.

    split_field(x, sep = ";")

x |
A character vector of semicolon-delimited strings. |
sep |
Character. Delimiter. Default ";". |
A list of character vectors.

    split_field(c("Alice; Bob; Carol", "Dave; Eve"))

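The splitting and trimming behaviour described above can be approximated in base R. The function below is a hypothetical stand-in named `split_field_sketch`, written only to show the expected shape of the result (a list with one character vector per input string); the package's own implementation may differ.

```r
# Hypothetical stand-in for illustration; not the bibnets implementation.
split_field_sketch <- function(x, sep = ";") {
  # fixed = TRUE treats the delimiter literally rather than as a regex.
  parts <- strsplit(x, sep, fixed = TRUE)
  # Trim leading and trailing whitespace from every piece.
  lapply(parts, trimws)
}

res <- split_field_sketch(c("Alice; Bob; Carol", "Dave; Eve"))
str(res)
```

Returning a list rather than a flat vector keeps one element per input record, which is what makes the result usable as a list-column in a data frame.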
Summarise a bibnets network

    ## S3 method for class 'bibnets_network'
    summary(object, ...)

object |
A |
... |
Ignored. |
Invisibly returns object.

    data(biblio_data)
    edges <- author_network(biblio_data, "collaboration")
    summary(edges)

Splits data by time windows and builds a separate network for each window using any network function.

    temporal_network(
      data,
      network_fun,
      ...,
      window = 3,
      step = NULL,
      strategy = "fixed",
      time_col = "year"
    )

data |
A data frame with a numeric time column. |
network_fun |
Function or character string naming a network function
(e.g., |
... |
Additional arguments passed to |
window |
Integer. Width of each time window in units of the time column (years, months, quarters, etc.). Default 3. |
step |
Integer or |
strategy |
Character. Time window strategy:
|
time_col |
Character. Name of the column containing the time variable.
Default "year". |
A named list of data frames (edge lists). Names are window
labels like "2018-2020".

    data(biblio_data)
    # Fixed 3-year windows
    temporal_network(biblio_data, author_network, "collaboration")
    # Sliding window
    temporal_network(biblio_data, author_network, "collaboration",
                     window = 2, strategy = "sliding")
    # Cumulative
    temporal_network(biblio_data, reference_network, threshold = 0,
                     strategy = "cumulative", window = 2)
    # With string name
    temporal_network(biblio_data, "keyword_network", window = 3)

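To make the three window strategies concrete, the sketch below computes window boundaries in base R. It rests on assumptions that the text above does not confirm: "fixed" is taken to mean non-overlapping windows, "sliding" to mean overlapping windows advanced by `step` (assumed default 1), and "cumulative" to mean windows that all start at the first year and grow by `step` (assumed default `window`). The helper `window_bounds` and the toy data are hypothetical.

```r
# Hypothetical helper: start/end/label of each window under assumed semantics.
window_bounds <- function(years, window = 3, step = NULL, strategy = "fixed") {
  lo <- min(years, na.rm = TRUE)
  hi <- max(years, na.rm = TRUE)
  if (strategy == "fixed") {
    # Non-overlapping windows; the last one is clipped to the final year.
    start <- seq(lo, hi, by = window)
    end <- pmin(start + window - 1, hi)
  } else if (strategy == "sliding") {
    # Overlapping windows that advance by `step` (assumed default: 1).
    if (is.null(step)) step <- 1
    start <- seq(lo, max(lo, hi - window + 1), by = step)
    end <- pmin(start + window - 1, hi)
  } else if (strategy == "cumulative") {
    # Every window starts at the first year and grows (assumed default step: window).
    if (is.null(step)) step <- window
    end <- unique(pmin(seq(lo + window - 1, hi + window - 1, by = step), hi))
    start <- rep(lo, length(end))
  } else {
    stop("unknown strategy: ", strategy)
  }
  data.frame(start = start, end = end,
             label = paste(start, end, sep = "-"),
             stringsAsFactors = FALSE)
}

# Toy data: one record per year from 2018 to 2024.
toy <- data.frame(id = 1:7, year = 2018:2024)
w <- window_bounds(toy$year, window = 3, strategy = "fixed")

# Slice the data per window; a network function would then run on each slice.
slices <- lapply(seq_len(nrow(w)), function(i) {
  toy[toy$year >= w$start[i] & toy$year <= w$end[i], ]
})
names(slices) <- w$label
```

The labels follow the "2018-2020" pattern mentioned in the return value, and each slice plays the role of the per-window data frame passed on to the chosen network function.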
Converts a bibnets edge list to a cograph_network object by calling
cograph::as_cograph(). Optionally merges node metadata (e.g., from
local_citations()) into the network's node table so attributes like
lcs or year can be used directly in splot() aesthetic parameters
(e.g., node_size = "lcs").

    to_cograph(edges, nodes = NULL, directed = FALSE)

edges |
A data frame with at least |
nodes |
Optional data frame of node attributes with an |
directed |
Logical. Default FALSE. |
Note: bibnets edge lists (from, to, weight) are accepted directly
by cograph::splot() without conversion. This function is only needed
when you want to attach node-level metadata.
A cograph_network object (S3 list with $nodes and $edges).

    data(biblio_data)
    # Without metadata: splot() accepts bibnets edges directly
    edges <- author_network(biblio_data, "collaboration")
    # With metadata: document network + local citation scores as node size
    edges <- document_network(biblio_data, type = "coupling")
    nodes <- local_citations(biblio_data)  # keyed by document id
    net <- to_cograph(edges, nodes = nodes)

Converts a bibnets edge list (and optional node table) to the CSV format
expected by Gephi's Data Laboratory. Column names are remapped to Gephi
conventions (Source, Target, Weight, Id, Label).

    to_gephi(edges, nodes = NULL, file = NULL, directed = FALSE)

edges |
A data frame with at least |
nodes |
Optional data frame of node attributes. Must contain an |
file |
Optional directory path. If supplied, writes |
directed |
Logical. Sets the |
If file = NULL: a list with $nodes and $edges data frames.
If file is a directory path: writes two CSV files and invisibly returns
the file paths.

    data(biblio_data)
    edges <- author_network(biblio_data, "collaboration")
    gephi <- to_gephi(edges)
    head(gephi$edges)

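The column remapping to Gephi conventions can be sketched in base R. The example below is an illustration on a hypothetical toy edge list, not the package's code: it renames from, to, weight to Source, Target, Weight, derives an Id/Label node table from the edge endpoints, and writes both tables as CSV. The file names `edges.csv` and `nodes.csv` are placeholders chosen for the sketch.

```r
# Hypothetical toy edge list in the from/to/weight layout.
edges <- data.frame(
  from = c("A", "A", "B"),
  to = c("B", "C", "C"),
  weight = c(2, 1, 3),
  stringsAsFactors = FALSE
)

# Remap edge columns to the names Gephi's Data Laboratory recognises.
gephi_edges <- data.frame(
  Source = edges$from,
  Target = edges$to,
  Weight = edges$weight,
  stringsAsFactors = FALSE
)

# Derive a node table from the edge endpoints; Label mirrors Id here.
node_ids <- sort(unique(c(edges$from, edges$to)))
gephi_nodes <- data.frame(Id = node_ids, Label = node_ids,
                          stringsAsFactors = FALSE)

# Write both tables to a temporary directory (placeholder file names).
out_dir <- tempdir()
edge_path <- file.path(out_dir, "edges.csv")
node_path <- file.path(out_dir, "nodes.csv")
write.csv(gephi_edges, edge_path, row.names = FALSE)
write.csv(gephi_nodes, node_path, row.names = FALSE)
```

Passing `row.names = FALSE` matters here, because an unnamed leading column of row numbers would otherwise be imported by Gephi as an extra attribute.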
Writes a bibnets edge list (and optional node attributes) to a GraphML file using only base R; no XML package is required.

    to_graphml(edges, nodes = NULL, file = NULL, directed = FALSE)

edges |
A data frame with at least |
nodes |
Optional data frame of node attributes with an |
file |
File path to write. If |
directed |
Logical. Default FALSE. |
If file = NULL: GraphML as a character string. Otherwise writes
the file and returns the path invisibly.

    data(biblio_data)
    edges <- keyword_network(biblio_data)
    xml <- to_graphml(edges)
    cat(substr(xml, 1, 300))

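Writing GraphML without an XML package comes down to string assembly plus escaping. The sketch below shows one way this could look in base R for an undirected, weighted edge list; it is an illustration under assumptions, not the package's code. The helper `xml_escape`, the key id `w`, and the toy data are hypothetical.

```r
# Hypothetical toy edge list.
edges <- data.frame(
  from = c("A", "A"),
  to = c("B", "C"),
  weight = c(2, 1),
  stringsAsFactors = FALSE
)

# Escape characters that are special in XML attribute values and text.
xml_escape <- function(x) {
  x <- gsub("&", "&amp;", x, fixed = TRUE)   # must run first
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  x <- gsub(">", "&gt;", x, fixed = TRUE)
  gsub("\"", "&quot;", x, fixed = TRUE)
}

# One <node> element per distinct endpoint, one <edge> element per row.
node_ids <- unique(c(edges$from, edges$to))
node_xml <- sprintf('    <node id="%s"/>', xml_escape(node_ids))
edge_xml <- sprintf(
  '    <edge source="%s" target="%s"><data key="w">%s</data></edge>',
  xml_escape(edges$from), xml_escape(edges$to), as.character(edges$weight)
)

# Assemble the document; the <key> element declares the edge weight attribute.
graphml <- paste(c(
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">',
  '  <key id="w" for="edge" attr.name="weight" attr.type="double"/>',
  '  <graph edgedefault="undirected">',
  node_xml,
  edge_xml,
  '  </graph>',
  '</graphml>'
), collapse = "\n")

cat(graphml)
```

Escaping the ampersand before the other characters avoids double-escaping the entities that the later substitutions introduce.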
Convert edge data frame to igraph

    to_igraph(edges, directed = FALSE)

edges |
A data frame with at least |
directed |
Logical. Default FALSE. |
An igraph graph object.

    data(biblio_data)
    edges <- author_network(biblio_data, "collaboration")
    g <- to_igraph(edges)

Convert edge data frame to adjacency matrix

    to_matrix(edges, symmetric = TRUE)

edges |
A data frame with |
symmetric |
Logical. If |
A sparse Matrix.

    data(biblio_data)
    edges <- reference_network(biblio_data, min_occur = 2)
    to_matrix(edges)

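The conversion from an edge list to an adjacency matrix can be illustrated in base R. The sketch below builds a dense matrix so that it stays self-contained, whereas the function above returns a sparse Matrix. It also assumes that the symmetric option mirrors each edge weight across the diagonal, which the text above does not spell out. The toy data is hypothetical.

```r
# Hypothetical toy edge list.
edges <- data.frame(
  from = c("A", "A", "B"),
  to = c("B", "C", "C"),
  weight = c(2, 1, 3),
  stringsAsFactors = FALSE
)

# Square zero matrix with one row and column per distinct node.
nodes <- sort(unique(c(edges$from, edges$to)))
adj <- matrix(0, nrow = length(nodes), ncol = length(nodes),
              dimnames = list(nodes, nodes))

# Fill cells by (row name, column name) pairs using a two-column index matrix.
adj[cbind(edges$from, edges$to)] <- edges$weight

# Assumed meaning of symmetric = TRUE: mirror each weight to the lower triangle.
adj[cbind(edges$to, edges$from)] <- edges$weight

adj
```

Indexing with a two-column character matrix addresses one cell per edge, which avoids an explicit loop over rows of the edge list.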
Convert edge data frame to tbl_graph

    to_tbl_graph(edges, directed = FALSE)

edges |
A data frame with at least |
directed |
Logical. Default FALSE. |
A tbl_graph object (tidygraph).

    data(biblio_data)
    edges <- keyword_network(biblio_data)
    tg <- to_tbl_graph(edges)
