CiteSource provides three custom metadata fields for labeling
citation records: cite_source, cite_label, and
cite_string. Most workflows use cite_source to
identify the database and cite_label to track the review
stage (search, screened, final). The cite_string field
provides a third dimension for cases where you need to distinguish
between variations of a search strategy within the same source.
The most common use case is within-source string
comparison: you are testing multiple query formulations in a
single database before finalizing your search strategy, and you want to
compare how each performs without conflating the query variation with
the source identity. Encoding the variations as separate
cite_source values would work, but it loses the ability to
aggregate results at the database level. Using cite_string
keeps the database identity intact while enabling a separate axis of
analysis.
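To make the trade-off concrete, here is a minimal sketch on made-up data. The record IDs and the Scopus row are hypothetical and are not part of the example that follows. With the query variation stored in cite_string, the same table can be summarised per database or per string without parsing source names:

```r
# Toy data: hypothetical record IDs, one row per record-string hit.
records <- data.frame(
  record_id   = c("r1", "r2", "r2", "r3", "r4"),
  cite_source = c("WoS", "WoS", "WoS", "WoS", "Scopus"),
  cite_string = c("string 1", "string 1", "string 2", "string 2", NA)
)

# Database-level view: distinct records per source.
per_source <- tapply(records$record_id, records$cite_source,
                     function(x) length(unique(x)))

# String-level view: distinct records per query variation within WoS.
wos <- records[records$cite_source == "WoS", ]
per_string <- tapply(wos$record_id, wos$cite_string,
                     function(x) length(unique(x)))

per_source
per_string
```

Had the variations been encoded as cite_source values such as "WoS string 1" and "WoS string 2", the WoS total would have to be rebuilt by collapsing those values by hand.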
In this example, five search strings were run in Web of Science. We
use cite_source to record the database and
cite_string to label each query variation, then compare
their performance against a set of benchmark studies.
The key difference from a standard import is that cite_source
is the same database ("WoS") for all search strings, while
cite_string differentiates the query variations. The
benchmark file gets cite_source = NA, cite_string = NA, and
cite_label = "benchmark".
```r
# file_path is assumed to be defined earlier and to point at the folder
# that contains the .ris files.
imported_tbl <- tibble::tribble(
  ~files,             ~cite_sources, ~cite_labels, ~cite_strings,
  "benchmark_15.ris", NA,            "benchmark",  NA,
  "search1_166.ris",  "WoS",         "search",     "string 1",
  "search2_278.ris",  "WoS",         "search",     "string 2",
  "search3_302.ris",  "WoS",         "search",     "string 3",
  "search4_460.ris",  "WoS",         "search",     "string 4",
  "search5_495.ris",  "WoS",         "search",     "string 5"
) |>
  dplyr::mutate(files = paste0(file_path, files))

# Import every file and attach its source, label, and string metadata.
raw_citations <- read_citations(metadata = imported_tbl, verbose = FALSE)
```

The upset plot shows how records are distributed across string combinations. This tells you which strings are finding records the others miss and how much overlap exists between query variations.
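The steps between import and the upset plot might look roughly like the sketch below. It is not taken from this example: prepare_string_comparison() is a hypothetical helper, and the sketch assumes that CiteSource is installed, that compare_sources() accepts comp_type = "strings", and that plot_source_overlap_upset() accepts groups = "string". Check the package reference for your installed version before relying on these arguments.

```r
# Hypothetical helper (not part of CiteSource). The argument values used
# here are assumptions; verify them against the package documentation.
prepare_string_comparison <- function(raw_citations) {
  # Collapse duplicate records while keeping their source/label/string tags.
  unique_citations <- CiteSource::dedup_citations(raw_citations)

  # Flag, for each unique record, which search strings retrieved it.
  CiteSource::compare_sources(unique_citations, comp_type = "strings")
}

# Usage, given the raw_citations object from the import step:
# string_comparison <- prepare_string_comparison(raw_citations)
# CiteSource::plot_source_overlap_upset(string_comparison, groups = "string")
```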
plot_contributions() shows unique and shared record
counts for each string. Strings with a high proportion of unique records
are contributing coverage that the other strings miss; strings with
mostly shared records may be redundant.
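The unique/shared distinction can be illustrated on a toy membership table. This sketch uses base R and hypothetical record IDs. It mirrors the idea behind the plot and is not the package's internal implementation.

```r
# Hypothetical hits: which string retrieved which deduplicated record.
hits <- data.frame(
  record_id   = c("r1", "r2", "r2", "r3", "r3", "r4"),
  cite_string = c("string 1", "string 1", "string 2",
                  "string 2", "string 3", "string 3")
)

# Number of distinct strings that found each record.
n_strings <- tapply(hits$cite_string, hits$record_id,
                    function(x) length(unique(x)))

# A hit is "unique" when no other string found the same record.
n_per_hit <- as.vector(n_strings[hits$record_id])
hits$type <- ifelse(n_per_hit == 1, "unique", "shared")

# Unique and shared counts per string.
contribution <- table(hits$cite_string, hits$type)
contribution
```

In this toy data, "string 2" contributes no unique records, which is the pattern that would flag a string as potentially redundant.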
Filtering to the benchmark records and viewing the record-level table shows exactly which benchmark studies each string found and which were missed entirely.
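As a rough illustration of what such a record-level view contains, the base R sketch below builds a found/not-found matrix from hypothetical benchmark IDs and hits. The IDs and values are invented for illustration, and the package's own table will differ in layout.

```r
# Hypothetical benchmark set and the strings that retrieved each study.
benchmark_ids <- c("b1", "b2", "b3", "b4")
benchmark_hits <- data.frame(
  record_id   = c("b1", "b1", "b2", "b3"),
  cite_string = c("string 1", "string 2", "string 2", "string 2")
)

# Record-level view: rows are benchmark studies, columns are strings.
# Using factor levels keeps benchmark studies that no string found.
found <- unclass(table(
  factor(benchmark_hits$record_id, levels = benchmark_ids),
  benchmark_hits$cite_string
)) > 0

# Share of the benchmark set that each string retrieved.
recall <- colMeans(found)

# Benchmark studies that no string retrieved.
missed <- rownames(found)[rowSums(found) == 0]

found
recall
missed
```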
| Scenario | Recommended field |
|---|---|
| Different databases (PubMed, Scopus, WoS) | cite_source |
| Same database, different query variations | cite_string |
| Hand searching, citation chasing alongside database searches | cite_string (method) + cite_source (target) |
| Tracking records through review stages | cite_label |
For most reviews, cite_source and
cite_label are sufficient. cite_string becomes
valuable when you are doing pre-search validation with multiple query
variants, or when you want to distinguish supplementary search methods
from the primary database searches while keeping both associated with
the same source.
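For the supplementary-method case, a metadata table could be set up along the following lines. This is a sketch only: the file names and the cite_string values are hypothetical placeholders, chosen to show a database search and citation chasing that share one source.

```r
# Hypothetical file names and method labels, for illustration only.
supplementary_tbl <- tibble::tribble(
  ~files,                   ~cite_sources, ~cite_labels, ~cite_strings,
  "wos_database.ris",       "WoS",         "search",     "database search",
  "wos_citation_chase.ris", "WoS",         "search",     "citation chasing"
)

supplementary_tbl
```

Both files stay grouped under "WoS" in any source-level summary, while cite_strings keeps the retrieval method available as a separate axis.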