First stable release, and a rename.
Renamed to rtransparency. The package is renamed from rtransparent (the
name of the original tool by Serghiou et al.) to avoid confusion with that
project. The GitHub repository is also renamed to choxos/rtransparency (old
URLs redirect): install with
remotes::install_github("choxos/rtransparency") and load with
library(rtransparency). Function names (rt_*) are unchanged. Serghiou is
credited as an author and the foundational 2021 paper is cited
(citation("rtransparency")).
This 1.0.0 release marks a stable public API: eight transparency indicators
(conflicts of interest, funding, registration, novelty, replication, data,
code, and AI-use disclosure), multilingual conflict-of-interest and funding
detection, plain-text and PMC XML parity, corpus-scale batch processing with
rt_all_pmc_dir(), and accuracy correction for seven of the eight indicators.
New rt_ai(), a plain-text detector for generative-AI-use disclosure, the
text counterpart of rt_ai_pmc(). Because a text file carries no reliable
publication date, rt_ai() applies no 2023 year gate (is_ai_pred is always
TRUE/FALSE, never NA) and cannot confine the scan to back-matter
sections; callers should therefore run it only on articles published in
2023 or later.
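Because rt_ai() applies no year gate, the gate can be applied by the caller. The following is a minimal sketch in base R, assuming the publication year is available from your own metadata; the `results` table and its columns other than is_ai_pred are hypothetical, and this is not the package's internal logic.

```r
# Hypothetical results table: `year` comes from external metadata,
# `is_ai_pred` from a plain-text detector run.
results <- data.frame(
  file = c("a.txt", "b.txt", "c.txt"),
  year = c(2019L, 2023L, 2025L),
  is_ai_pred = c(TRUE, FALSE, TRUE)
)

# Mask predictions for articles published before 2023, so that pre-2023
# rows are NA rather than TRUE/FALSE.
results$is_ai_pred[results$year < 2023L] <- NA

results
```

This mirrors the NA-before-2023 convention that the changelog describes for rt_ai_pmc().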
Corrected the foundational citation in inst/CITATION to the full PLOS
Biology author list: Serghiou, Contopoulos-Ioannidis, Boyack, Riedel, Wallach
and Ioannidis.
Release polish. Removed an unused oddpub-derived tokenization helper, so
no AGPL-licensed code remains in the package. Corrected the README plain-text
workflow and the rt_read_pdf() documentation: rt_read_pdf() returns a
character string, which must be written to a .txt file before the text
detectors are run on it. rt_summary() documentation and the startup message
now include AI-use disclosure, and the README validation section reports the
newer-indicator metrics and label provenance.
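The corrected plain-text workflow can be sketched as follows. The example string stands in for the value returned by rt_read_pdf(), and the commented calls show an assumed call shape (a detector taking a file path); neither the paths nor the argument names are confirmed by this changelog.

```r
# Stand-in for the character string returned by rt_read_pdf(); in practice:
#   article_text <- rtransparency::rt_read_pdf("article.pdf")  # hypothetical path
article_text <- "Funding: This work was supported by a research grant."

# Write the extracted text to a .txt file before running a text detector.
txt_path <- tempfile(fileext = ".txt")
writeLines(article_text, txt_path)

# Assumed call shape for a text detector (not run here):
#   rtransparency::rt_fund(txt_path)
file.exists(txt_path)
```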
Citation, documentation, and packaging polish.
Citation. Added inst/CITATION, so citation("rtransparent") returns the
package together with the foundational Serghiou et al. (2021) paper.
New vignette "Scope and limitations" documenting what each indicator does
and does not capture (disclosure-based conflicts of interest and AI, data
offered "upon request" excluded, novelty and replication as claim detection,
language coverage), the output schema, and how to pass extracted data- and
code-availability links to FAIR-assessment tooling such as rfair.
Replication is now accuracy-corrected; a fresh validation of replication and AI.
Replication added to rt_accuracy. Earlier releases left replication out
of the accuracy table because its gold set had too few positives (5) for a
stable sensitivity estimate. A new replication-enriched validation of 250
open-access articles, selected for external-validation language and
hand-labeled (111 positives), gives a stable estimate: sensitivity 92.8 on the
enriched positives, with the representative specificity (98.5) carried over
from the 2023 1000-article sample. rt_summary() now reports an
accuracy-corrected replication prevalence. New benchmark
inst/benchmark/results_replication_enriched.{csv,md} and labeled set
data-raw/benchmark/labels_replication_enriched.csv.
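The accuracy correction is described elsewhere in this changelog as the Rogan-Gladen estimator. A minimal base-R sketch of that estimator follows, using the replication sensitivity and specificity quoted above as proportions; the apparent prevalence of 0.05 is a made-up input, and truncating the result to [0, 1] is a common convention assumed here, not necessarily what rt_summary() does.

```r
# Rogan-Gladen estimator: corrects an apparent (detector-positive) prevalence
# for imperfect sensitivity and specificity.
rogan_gladen <- function(apparent, sensitivity, specificity) {
  corrected <- (apparent + specificity - 1) / (sensitivity + specificity - 1)
  # Assumption: truncate to the valid range for a proportion.
  pmin(pmax(corrected, 0), 1)
}

# Hypothetical apparent prevalence of 5%, with sensitivity 0.928 and
# specificity 0.985 taken from the entry above.
rogan_gladen(0.05, sensitivity = 0.928, specificity = 0.985)
```

When the apparent prevalence falls below the false-positive rate (1 - specificity), the raw estimate is negative, which is why the sketch truncates at zero.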
AI disclosure validated on 2024-2025 articles. On a random sample of
recent open-access articles the generative-AI-disclosure rate is about two to
three percent (far below curated AI-focused corpora), and the detector's
positives were precise on inspection. Because that prevalence is too low in
unselected literature for a stable corrected estimate, AI remains uncorrected
in rt_summary() (reported as apparent prevalence).
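For an uncorrected indicator, the apparent prevalence is reported with a Wilson confidence interval (as noted for rt_summary() elsewhere in this changelog). The sketch below computes a Wilson score interval in base R; the counts (5 positives in 200 articles) are hypothetical and the function is an illustration, not the package's code.

```r
# Wilson score interval for a binomial proportion x / n.
wilson_ci <- function(x, n, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)
  p <- x / n
  denom <- 1 + z^2 / n
  centre <- (p + z^2 / (2 * n)) / denom
  half <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / denom
  c(lower = centre - half, upper = centre + half)
}

# Hypothetical sample: 5 disclosures among 200 articles (2.5%).
wilson_ci(5, 200)
```

The same interval is returned by stats::prop.test(5, 200, correct = FALSE)$conf.int, which makes a convenient cross-check.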
No detector logic changed in this release, so all held-out benchmarks are unchanged.
Conflict-of-interest and funding detection in five more languages.
Multilingual COI and funding. Conflict-of-interest and funding statements are now detected in Spanish, Portuguese, French, German and Italian, not only English. The conflict-of-interest relevance gate and matcher and the funding matcher and no-funding rules gained language-distinctive, accent-tolerant patterns. On 70 open-access articles per language, the conflict-of-interest detection rate rose most for monolingual articles: German 33% to 97%, French 70% to 80%. Funding detection now catches Spanish, Portuguese, French, German and Italian statements (for example Italian 67% to 74%).
The new tokens are language-distinctive and do not occur in English, so the English detectors are unchanged: conflicts of interest stay at 100 / 91.8 on the 2023 sample and the held-out Serghiou et al. (2021) benchmarks are untouched. The multilingual funding patterns also surfaced two Spanish and Portuguese funding statements in the 2023 sample that had been mislabeled as unfunded; those labels were corrected.
Because the text detectors share the PMC detection cores (0.9.8), the new languages are recognized in plain-text input as well.
New multilingual benchmark (inst/benchmark/results_multilingual.{csv,md}).
Data-availability detection remains English-only for now; multilingual data-sharing detection is planned for a future release.
The plain-text detectors now share the PMC detection logic.
TXT/PMC parity. rt_coi(), rt_fund() and rt_register() route their text
through the same detection helpers as rt_coi_pmc(), rt_fund_pmc() and
rt_register_pmc(), replacing separate and weaker text logic. (rt_novelty()
and rt_replication() already shared their helpers.) Measured on text
extracted from the 1000-article 2023 validation set (sensitivity /
specificity): registration 46.2 / 98.7 to 90.4 / 98.4, conflicts of interest
88.8 / 86.3 to 88.6 / 90.4, funding 79.1 / 89.5 to 79.3 / 90.5. The remaining
gap to the PMC detectors is the XML-structural routes (tagged funding groups,
footnote types, section titles) that a plain-text file does not carry.
New TXT-parity benchmark (data-raw/benchmark/build_txt_parity.R,
inst/benchmark/results_txt_parity.{csv,md}) measures the TXT detectors
against the same hand labels as the PMC benchmark.
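Scoring a detector against hand labels reduces to a confusion-matrix calculation. The following base-R sketch shows the idea with a small made-up label vector; it is not the benchmark script itself, and `score_detector` is a hypothetical helper name.

```r
# Sensitivity, specificity and PPV from hand labels (`truth`) and detector
# predictions (`pred`), both logical vectors of equal length.
score_detector <- function(truth, pred) {
  stopifnot(is.logical(truth), is.logical(pred), length(truth) == length(pred))
  tp <- sum(truth & pred)
  fn <- sum(truth & !pred)
  tn <- sum(!truth & !pred)
  fp <- sum(!truth & pred)
  c(
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    ppv = tp / (tp + fp)
  )
}

# Toy example: 3 labeled positives, 5 labeled negatives.
truth <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)
pred  <- c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
score_detector(truth, pred)
```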
The PMC detectors, the held-out Serghiou et al. (2021) benchmarks and the novelty/replication gold set are unchanged; only the TXT entry points changed.
Corpus-scale batch processing.
New rt_all_pmc_dir(). Processes every PMC XML in a directory (or a
vector of paths) through rt_all_pmc() in a single call. The run is resumable
(when output is given, results are written to a CSV in chunks and a re-run
skips files already recorded), isolates per-file failures (a malformed file
yields an is_success = FALSE row instead of aborting the run), shows a
progress bar, and can run in parallel via the optional furrr package and an
active future::plan().
furrr and future are added to Suggests; they are used only for
rt_all_pmc_dir(parallel = TRUE).
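The resumable, failure-isolating pattern described for rt_all_pmc_dir() can be illustrated in base R. This is a sketch of the general technique under stated assumptions, not the package's implementation: `process_dir`, `process_one`, `chunk_size` and the `value` column are hypothetical names, and a toy function stands in for rt_all_pmc().

```r
# Process files in chunks, append results to a CSV, skip files already
# recorded on a re-run, and turn per-file errors into is_success = FALSE rows.
process_dir <- function(paths, output, process_one, chunk_size = 50L) {
  done <- if (file.exists(output)) {
    read.csv(output, stringsAsFactors = FALSE)$file
  } else {
    character()
  }
  todo <- setdiff(paths, done)
  chunks <- split(todo, ceiling(seq_along(todo) / chunk_size))
  for (chunk in chunks) {
    rows <- lapply(chunk, function(path) {
      tryCatch(
        data.frame(file = path, is_success = TRUE, value = process_one(path)),
        error = function(e) data.frame(file = path, is_success = FALSE, value = NA)
      )
    })
    rows <- do.call(rbind, rows)
    first_write <- !file.exists(output)
    # Write the header only once; later chunks are appended without it.
    write.table(rows, output, sep = ",", row.names = FALSE,
                col.names = first_write, append = !first_write)
  }
  if (file.exists(output)) read.csv(output, stringsAsFactors = FALSE) else NULL
}

# Toy stand-in for a per-file detector; "bad.xml" simulates a malformed file.
calls <- 0L
toy <- function(path) {
  calls <<- calls + 1L
  if (path == "bad.xml") stop("malformed file")
  nchar(path)
}

paths <- c("a.xml", "bad.xml", "c.xml")
out <- tempfile(fileext = ".csv")
first <- process_dir(paths[1:2], out, toy)   # interrupted run: two files
second <- process_dir(paths, out, toy)       # re-run: only "c.xml" is processed
second
```

Recording the failed file in the CSV is what lets the re-run skip it; whether a real run should retry failures is a design choice this sketch does not address.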
The hand-labeled 2023 validation sample reaches 1000 articles.
Validation sample reaches 1000. The final twenty open-access PMC articles were hand-labeled for all eight indicators and added to the committed sample, bringing it to a round 1000. Metrics (sensitivity / specificity): conflicts of interest 100 / 91.8, funding 94.8 / 95.3, registration 84.6 / 99.2, novelty 90.2 / 93.3, replication 82.4 / 98.5, data 91.1 / 97.8, code 93.9 / 99.0, AI 100 / 100.
Funding. The Portuguese no-funding declaration "os autores nao reportam qualquer financiamento" ("the authors report no funding") is now read as absence of funding.
The held-out Serghiou et al. (2021) benchmarks and the novelty/replication gold set are unchanged.
The hand-labeled 2023 validation sample is expanded to 980 articles (265 new), with a focused improvement to replication precision and a further funding fix.
Validation sample grows to 980. Eighteen new batches (265 articles) were hand-labeled for all eight indicators and folded into the committed sample. Current metrics (sensitivity / specificity): conflicts of interest 100 / 91.7, funding 94.8 / 95.2, registration 84.6 / 99.2, novelty 90.1 / 93.4, replication 81.2 / 98.5, data 90.8 / 97.8, code 93.8 / 98.9, AI 100 / 100.
Replication precision. The replication detector previously fired on several non-replication contexts. It now suppresses: limitations and strengths discussion paragraphs ("a third limitation concerns the validity of ..."), editorial statements about reproducibility as a value ("reproducibility is the cornerstone of scientific integrity"), reviews assessing the "validity of" a method or algorithm, lists of machine-learning evaluation metrics, results reproduced only within the arms of a single trial, and negative results ("not always replicated"). Replication PPV rose from 33.3 to 40.0 on the novelty/replication gold set and to 48.1 on the larger 2023 sample (with specificity 98.5); replication positives are still few, so PPV remains modest.
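Why PPV stays modest when positives are few follows from Bayes' rule. The base-R sketch below uses the replication sensitivity (81.2) and specificity (98.5) quoted for this sample as proportions; the two prevalence values are hypothetical and chosen only to show the contrast.

```r
# Positive predictive value as a function of sensitivity, specificity and
# the true prevalence of the practice.
ppv <- function(sensitivity, specificity, prevalence) {
  true_pos <- sensitivity * prevalence
  false_pos <- (1 - specificity) * (1 - prevalence)
  true_pos / (true_pos + false_pos)
}

# Same detector, rare versus common practice (hypothetical prevalences).
ppv(0.812, 0.985, prevalence = c(0.02, 0.20))
```

With a rare practice, even a small false-positive rate produces about as many false positives as true positives, so PPV stays low despite high specificity.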
Funding. "The authors did not receive any external financial support for this work" is now read as absence of funding.
The held-out Serghiou et al. (2021) benchmarks and the novelty gold set are unchanged.
The hand-labeled 2023 validation sample is expanded to 715 articles (210 new), with three small detector fixes surfaced by the new batches.
Validation sample grows to 715. Fourteen new batches (210 articles) were
hand-labeled for all eight indicators and folded into
data-raw/benchmark/labels_2023_sample.csv and
inst/benchmark/results_2023_sample.md. Current independent metrics:
registration 88.9 / 99.6, novelty 89.1 / 94.5, code 92.0 / 99.7, replication
84.6 / 98.0; detector-adjudicated funding 93.2 / 95.5 and data 90.9 / 97.9.
Funding: more no-funding declarations recognized. "There are no source of support", "not supported by any organizations", "no external sources of funding" and "conducted without the receipt of any dedicated grant or financial support" are now read as absence of funding rather than disclosed funding (these otherwise leaked through the funding-title route).
Novelty recall. "previously unobserved" is added to the gap-claim cues ("we identify a previously unobserved ..."), and "undertake" to the priority verbs ("the first to undertake a comprehensive review").
The held-out Serghiou et al. (2021) benchmarks and the novelty/replication gold set are unchanged (the new funding phrases are absence-of-funding declarations that cannot drop a funded positive).
The hand-labeled 2023 validation sample is expanded to 505 articles (120 new), with three small detector fixes surfaced by the new batches.
Validation sample grows to 505. Eight new batches (120 articles) were
hand-labeled for all eight indicators and folded into
data-raw/benchmark/labels_2023_sample.csv and
inst/benchmark/results_2023_sample.md. Current independent metrics:
registration 88.2 / 99.4, novelty 87.7 / 95.8, code 94.1 / 99.6, replication
81.8 / 98.0; detector-adjudicated funding 91.8 / 95.3 and data 88.6 / 97.7.
Funding: more no-funding declarations recognized. "The authors were not financially supported by any funding or institutions" (the adverb "financially" previously broke the match) and non-English declarations (Portuguese "nao teve fontes de financiamento", Spanish "no recibio financiacion" / "sin financiacion") are now read as absence of funding rather than disclosed funding.
AI: disclosure-section titles broadened. A section titled "Statement on the use of artificial intelligence" (and similar "... on the use of AI / generative AI / LLMs" headings) is now recognized as an AI-use disclosure, matching the existing "Declaration of generative AI" handling.
The held-out Serghiou et al. (2021) benchmarks are unchanged: the new funding phrases are absence-of-funding declarations (which cannot drop a funded positive and do not occur in that English, pre-2021 set), and the AI indicator is not part of it.
A precision release from the next round of hand-label review (2023 sample grown to 385 articles).
Funding: no-funding declarations no longer leak. The BMJ standard statement "The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors" sits under a section titled "Funding", so it was counted as disclosed funding. It is now recognized as an absence-of-funding declaration. Three 2023-sample articles were relabeled to FALSE accordingly (their only funding statement is this declaration; one also cites historical NIH funding of unrelated past research, which is not the article's own funding).
Novelty precision. Active-voice disease surveillance ("the country recorded its first case of COVID-19 on 27 February 2020") is now suppressed, matching the existing passive-voice rule; genuine case-report novelty ("we report the first case of ...") is preserved.
Novelty recall. The explicit self-assertion "the novelty of our study ..." is now recognized.
Measured effect. On the 2023 sample, funding specificity rose from 91.2 to 94.7 and PPV from 94.6 to 96.3 (sensitivity unchanged); novelty holds at 86.7 / 95.1. The held-out Serghiou et al. (2021) benchmarks and the novelty/replication gold set are unchanged (the new funding phrase appears only in modern articles, and an absence-of-funding rule cannot drop a funded positive).
This release overhauls the novelty detector for both recall and precision, fixes two long-standing bugs in the public PMC entry points, and corrects mislabeled articles in the 2023 validation sample.
Novelty recall. Fixed a core gap: the "first to <verb>" rule was missing many common verbs (confirm, validate, find, discover, prove, predict and others), so canonical claims such as "the first study to confirm ..." went undetected. The relevance pre-filter was also widened to a cheap superset of the pattern cues, so genuine claims placed in results or discussion sections are no longer discarded before the precise rules run. New patterns recognize "first <research object> to <verb>" (technology, technique, approach, method, tool, model and similar), the author-voice idiom "we provide the first evidence that ...", superlative and "fails to"/"no such study" gaps introduced by "to our knowledge" (whether the gap precedes or follows the phrase), and the passive "a novel <object> was developed/detected".
Novelty precision. Bare "new" is no longer treated as a novelty cue (it is far too frequent in non-priority contexts such as "a new model" or "new insights"); procedural "we first <verb> ..., then ..." no longer counts as a priority claim; and the weak "this novel <term>" pattern was removed. Gap claims ("previously un...", "has not been studied") must now be tied to the present study rather than to background or a cited work. New suppression rules drop firstness that is attributed ("not the first", an author
Measured effect. On the independent 2023 sample, novelty rose from
sensitivity 77.2 / specificity 89.6 (PPV 71.0) to 86.3 / 94.9 (PPV 85.4). On
the novelty/replication gold set it rose from 76.5 / 90.8 to 83.8 / 95.2; the
rt_accuracy novelty estimate used by rt_summary() was updated accordingly
(0.765/0.908 to 0.838/0.952). Replication is unchanged.
Bug fix: duplicated columns. rt_novelty_pmc() and rt_replication_pmc()
raised "Column names ... must not be duplicated" because their identifier
output duplicated the prediction and text columns supplied by the internal
detector. Both now return a single, well-formed row. (rt_all_pmc(), which
calls the internal detectors directly, was never affected.)
Validation labels. Corrected eleven novelty labels in the 2023 sample that were assigned in error during fast batch labeling: seven clear author priority claims had been marked FALSE, and four enumeration, ordinal or "new method" mentions with no priority claim had been marked TRUE. The committed benchmark and the novelty/replication gold set were rebuilt from the corrected labels.
The held-out Serghiou et al. (2021) conflicts-of-interest, funding, registration, data and code benchmarks are unchanged; no detector other than novelty was modified.
This is a feature release centered on the novelty and replication detectors and a second, independent validation set.
Independent 2023 validation sample. Added a held-out set of 370 open-access
PMC articles published in 2023, hand-labeled for all eight transparency
indicators (data-raw/benchmark/labels_2023_sample.csv,
inst/benchmark/results_2023_sample.md). It is a modern companion to the
Serghiou et al. (2021) held-out set, which predates these indicators and the
2023-era reporting conventions. The conflicts-of-interest, funding and data
labels were reconciled against the detector's extracted statement where the
author's back matter was truncated during labeling, so those three are not
independent of the detector; novelty, replication, registration and code
sharing were labeled independently and are the meaningful test.
Novelty detector improvements. Recall was broadened to recognize "new" and
"innovative" (not only "novel"), a much wider set of research objects
(device, sequence, model, tool, assay, algorithm, variant, isolate, ...),
passive claims ("a novel X is developed"), an adverbial "first" ("our study
first provided evidence"), more "first to .negate_novelty_1) removes firstness attributed
to a cited study ("Smith et al. demonstrated for the first time"),
ordinal/temporal "first" (first-time transplant, first day/week/stage) while
preserving the priority phrase "for the first time, we ...", and historical
dates ("used for the first time in 1993"). On the 2023 sample, novelty
sensitivity rose from 72.8% to 77.2% and specificity from 87.8% to 89.6%.
Replication detector. Future/conditional replication proposed for later work ("this study can be replicated with a larger sample") is now treated as not performed. The replication gold set remains small (few positives), so its estimates are reported as low-power.
The novelty/replication gold set was expanded from 160 to 370 articles, and
the novelty accuracy used by rt_summary(accuracy = TRUE) was updated
accordingly.
Code sharing: do not mistake a "Web Resources" / "URLs" list for shared code. Genomics papers commonly list the external tools and databases they used as "Name: URL, Name: URL, ..." (for example ANNOVAR, BWA, GATK and third-party GitHub tools such as Delly, Lumpy and Manta). Such a resource list cites software the authors used, not code they released, but the GitHub URLs made it register as code sharing. A list of three or more "label: URL" entries is now vetoed. The held-out code benchmark is unchanged (sensitivity 88.1%, specificity 99.5%).
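The "three or more label: URL entries" veto can be sketched with a regular expression in base R. The pattern, function names and example.org URLs below are illustrative assumptions; the package's actual rule may differ.

```r
# Count "label: URL" entries in a statement.
count_label_url_entries <- function(text) {
  pattern <- "[A-Za-z][A-Za-z0-9 ._-]*:\\s*https?://[^\\s,;]+"
  hits <- gregexpr(pattern, text, perl = TRUE)[[1]]
  if (hits[1] == -1L) 0L else length(hits)
}

# Treat a statement as a resource list (not shared code) when it contains
# at least `min_entries` such entries.
is_resource_list <- function(text, min_entries = 3L) {
  count_label_url_entries(text) >= min_entries
}

web_resources <- paste(
  "ANNOVAR: https://example.org/annovar,",
  "BWA: https://example.org/bwa,",
  "GATK: https://example.org/gatk"
)
code_statement <- "Code is available at https://example.org/our-analysis-code"

is_resource_list(web_resources)
is_resource_list(code_statement)
```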
Funding: do not count an open-access publishing arrangement as research funding. Statements such as "Open Access funding enabled and organized by Projekt DEAL" (or by CAUL, IReL and similar library consortia) pay the article-processing charge and are not a research-funding disclosure, but the "funding ... by
<funding-group>. When an article's funding-group named a funder (<funding-source>) and award identifier but carried no narrative <funding-statement> and no funding section title, the funding was missed. The named funder is now treated as a funding disclosure (and returned as the funding text). The held-out funding benchmark is unchanged (sensitivity 100%, specificity 95.7%). Added regression tests.
rt_data_code_pmc() and rt_all_pmc() now also return the identifiers of the shared data and code, not just whether sharing occurred. New columns open_data_links and open_code_links hold the DOIs (as doi.org URLs), repository URLs and database accessions extracted from the detected availability statements, with accessions normalized to identifiers.org prefix:accession form (for example geo:GSE12345, bioproject:PRJEB51269); multiple identifiers are separated by " ; ". Identifiers are taken only from the availability statements, so a reused accession cited in the methods is not collected. Added regression tests.
data-raw/benchmark/labels_novelty_replication.csv, with the label definitions documented in run_novelty_replication.R) is scored by data-raw/benchmark/run_novelty_replication.R; results are in inst/benchmark/results_novelty_replication.md. Novelty scores sensitivity 81.0%, specificity 93.2% (n = 160, 42 positives); replication has too few positives for a stable sensitivity estimate (specificity 96.8%).
rt_accuracy now includes novelty (sensitivity 0.810, specificity 0.932), so rt_summary() reports an error-corrected novelty prevalence. Replication and AI-use disclosure remain uncorrected.
Fixes for genome data-papers (Darwin Tree of Life and similar), found during the manual validation of 1,000 open-access PMC articles:
rt_all_pmc() now returns all eight transparency indicators in a single call. It previously returned six (COI, funding, registration, novelty, replication and AI-use disclosure) and data and code sharing had to be obtained separately from rt_data_code_pmc(); the output now also carries is_open_data, is_open_code and their matched statements (open_data_statements, open_code_statements). The detection is the same native detector as rt_data_code_pmc(), so the two agree exactly. The change is additive: existing columns are unchanged, and the COI, funding and registration benchmarks are unaffected. The vignettes are updated to reflect the single-call workflow.
Documentation and example data, so the package website showcases every indicator:
vignette("ai-disclosure"), on the AI-use disclosure indicator: what rt_ai_pmc() detects, why it is gated to 2023 onward, and how to chart its adoption across a corpus.
rt_demo gains an is_ai_pred column (NA before 2023) and now spans 2010-2026, so rt_summary() and rt_plot() examples can show the AI indicator and its time trend. The data remain simulated.
Further fixes from the manual validation on a fresh sample of 1,000 open-access PMC articles from 2023:
Fixes from a manual validation on a fresh, disjoint sample of 1,000 open-access PMC articles from 2023:
rt_ai_pmc() detects whether an article discloses the use (or non-use) of generative AI or AI-assisted tools in preparing the manuscript, as journals have asked of authors since 2023. It recognizes positive disclosures ("the authors used ChatGPT to improve the readability of the manuscript"), negative disclosures ("no generative AI was used in the preparation of this work") and dedicated "Declaration of generative AI" sections, while not flagging articles that merely use AI as their research method. Because the practice did not exist before 2023, the indicator is only evaluated for articles published in 2023 or later; earlier articles return NA (is_ai_pred), and the publication year is reported. The indicator is included in rt_all_pmc() and recognized by rt_summary(). On the 1,000-article open-access validation set (almost all published 2024-2026) it flags about 16% of articles, with high precision on inspection.
rt_accuracy was updated. The patterns are gated on a language prefix or the word "script" so non-analysis "codes" (ICD, diagnosis, qualitative) are not matched. Added regression tests.
Improvements from a large audit: the tool was run over 1,000 cached open-access PMC articles and a sample was hand-checked against the human-labeled benchmark.
rt_accuracy was updated.
Precision and recall fixes from an independent manual review of a sample of open-access PMC articles:
rt_accuracy was updated to these estimates.
@noRd, so the manual and the pkgdown reference present only the public API.
rt_summary() and rt_score() so indicator columns must be logical or numeric 0/1 values, with NA allowed.
data-raw/external-validation/.
GPL-3 + file LICENSE to GPL-3. The package is plain GPL-3 with no additional terms, so the + file LICENSE form (which signals extra restrictions in the LICENSE file) was misleading; the full GPL-3 text is still provided in LICENSE for reference.
rt_summary() reports each indicator's prevalence with a Wilson confidence interval and, by default, a prevalence corrected for the detector's sensitivity and specificity (the Rogan-Gladen estimator). It can summarize within groups via by.
rt_score() adds a per-article count of the openness practices met.
rt_plot() draws a prevalence bar chart or a prevalence-over-time line chart (requires ggplot2).
rt_accuracy (detector sensitivity and specificity estimates, used by rt_summary()) and rt_demo (a small simulated corpus for the examples).
vignette("transparency-summary"), illustrating the output: from one article to a corpus prevalence table, an accuracy-corrected prevalence, a practice-count distribution, subgroup summaries and plots.
oddpub and tokenizers. The native detector (added in 0.4.0) is the only data and code path; oddpub, tokenizers and metareadr have been dropped from Suggests, so the package and its CRAN-style check no longer reference any GitHub-only packages.
R CMD check note about the undefined . global variable.
DESCRIPTION Title is now in title case and the pkgdown URL carries its trailing slash.
rt_data_code_pmc_list() documentation example.
rt_fund_pmc(). It previously predicted funding TRUE for no-funding articles with empty evidence text; it now delegates to the same detection path as rt_all_pmc() so the two agree, and a positive prediction always carries evidence. Added regression tests.
rt_meta_pmc() (article metadata from a PMC XML file), which the README advertised but which was not exported.
R CMD check now passes with no errors or warnings.
oddpub / metareadr instructions.
R/data_code.R) and no longer requires the oddpub package at runtime. On the XML benchmark used at the time, the native detector scored data 64% sensitivity / 95% specificity and code 68% sensitivity / 94% specificity (the published paper reports about 76% and 59% sensitivity). Code detection already exceeded the paper's sensitivity and the data precision matched the original oddpub; data sensitivity was being improved toward oddpub's ~84%.
rt_data_code, rt_data_code_pmc and rt_data_code_pmc_list were rewritten to use the native detector and return is_open_data / is_open_code with the matched statement text. They no longer depend on oddpub or tokenizers.
data-raw/benchmark/run_data_code.R, inst/benchmark/results_data_code.md).
CRD numbers exceed 5 digits), in both the TXT and PMC detectors. No change on the benchmark (the held-out set has no PROSPERO-only cases). The fork's other commits were assessed and deferred: "coi update" is TXT-only (not exercised by the PMC benchmark) and "pipe update" is a cosmetic reformat that conflicts with this line's changes.
get_fund_acknow_new(). It previously flagged any acknowledgment that merely named an institution or used the word "support", so competing-interest statements, generic thanks, data-availability statements and affiliations were misread as funding. It now requires explicit funding language: a funding verb directed at a funder, an institutional "support/funding of the ...", a grant or award identifier, or a named award. Sensitivity is unchanged at 100% on the test set.
data-raw/benchmark/, inst/benchmark/) that scores the detectors against the human-labeled gold standard of Serghiou et al. (2021) and reports sensitivity, specificity, PPV, NPV and accuracy with bootstrap confidence intervals, alongside the published Fig 2 numbers.
.reroot_xml() to handle bare <article> and NCBI EFetch <pmc-articleset> roots. Previously it returned an empty document for anything other than the PMC OAI-PMH format, which silently suppressed all detection.
str_detect()/regex() calls in the funding detector that errored on articles lacking a structured funding statement.
oddpub, tokenizers) are now optional (moved to Suggests); the package loads and every other indicator runs without them. The data and code functions raise a clear, actionable error when these packages are absent.
metareadr, GPL-3).
rt_novelty and rt_novelty_pmc added: detect claims of novelty ("for the first time") in TXT and PMC XML files.
rt_replication and rt_replication_pmc added: detect replication/validation components in TXT and PMC XML files.
rt_register and rt_register_pmc expanded: now detect registrations on ISRCTN, ANZCTR (ACTRN), DRKS, IRCT, and UMIN in addition to NCT and PROSPERO.
rt_all and rt_all_pmc updated to include novelty and replication indicators.
tests/testthat/).
rt_coi now searches for Conflicts of interest statements within text files.
rt_fund now searches for Funding statements within text files.
rt_register now searches for Registration statements within text files.
rt_all now searches for many indicators within text files.
rt_read_pdf now converts PDF files into TXT using poppler.