| Title: | Behavioural Reproducibility Auditing for R Projects |
|---|---|
| Description: | Audits R scripts for behavioural reproducibility risk. Scans scripts for qualified package::function calls and checks them against a curated database of known silent breaking changes across popular CRAN packages. Flags stochastic calls lacking set.seed() and detects locale-sensitive operations that may produce different results across systems. Supports baseline certification of analytical outputs so that silent numerical drift can be detected across package upgrades or platform changes. Generates human-readable audit reports suitable for academic submission or pharmaceutical QC workflows. For more details see <https://github.com/repro-stats/reproducr>. |
| Authors: | Ndoh Penn [aut, cre] (ORCID: <https://orcid.org/0009-0003-9054-465X>) |
| Maintainer: | Ndoh Penn <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.2.0 |
| Built: | 2026-06-20 17:31:01 UTC |
| Source: | https://github.com/cran/reproducr |
You finish an analysis. The code runs. The numbers look right. But are they stable?
reproducr makes behavioural reproducibility risks visible and trackable.
It scans your scripts for known silent breaking changes, flags stochastic
calls missing set.seed(), certifies analytical outputs as baselines, and
detects numerical drift across runs.
Tier 1 – Scan & score
report <- audit_script("analysis.R")
risks <- risk_score(report)
print(risks)
Tier 2 – Baseline & drift
model <- lm(mpg ~ wt, data = mtcars)
certify(list(coefs = coef(model)), tag = "submission-v1")
# Later, after any environment change:
check_drift(list(coefs = coef(model)), against = "submission-v1")
Tier 3 – Report & export
repro_report(report, risks, format = "html", style = "pharma")
repro_badge(report, risks, output = "README")
| Function | Purpose |
|---|---|
| audit_script() | Parse a script and extract all pkg::fn calls |
| risk_score() | Check calls against the breaking-changes database |
| certify() | Hash and store analytical outputs as a baseline |
| check_drift() | Compare current outputs against a stored baseline |
| list_certs() | List all certifications in a .reproducr file |
| repro_report() | Render a human-readable audit report |
| repro_badge() | Generate a reproducibility status badge |
| check_db_staleness() | Check database entries against current CRAN versions |
The internal database covers known silent breaking changes in:
dplyr, tidyr, ggplot2, readr, purrr, stringr, broom,
data.table, lme4, lubridate, and base R. Community contributions
are welcome – see vignette("contributing-to-the-database").
The database is kept current via a weekly GitHub Actions workflow that
calls check_db_staleness() and opens an issue automatically
when any entry's to_version ceiling falls below the current CRAN release.
Useful links:
Report bugs at https://github.com/repro-stats/reproducr/issues
Parses one or more R source files and extracts every qualified
package::function call, resolving the installed version of each package.
The resulting audit_report object is the entry point for the rest of the
reproducr workflow.
audit_script(path = ".", renv = TRUE, verbose = TRUE)
## S3 method for class 'audit_report'
print(x, ...)
## S3 method for class 'audit_report'
summary(object, ...)
path |
|
renv |
|
verbose |
|
x |
An |
... |
Additional arguments (currently unused). |
object |
An |
An S3 object of class "audit_report", a list containing:
calls – A data.frame with one row per detected pkg::fn call, with columns file, line, pkg, fn, pkg_version.
env – A list with R version, platform, OS, locale, and timezone.
renv_used – Logical; were versions sourced from a lockfile?
timestamp – POSIXct timestamp of when the audit was run.
paths – Character vector of files that were scanned.
audit_script() uses regular-expression matching on source text to extract
qualified calls of the form pkg::fn or pkg:::fn. It intentionally skips
comment lines (lines beginning with #, after trimming whitespace). For more
robust analysis, tools that operate on the parse tree (e.g. lintr) should
be used alongside reproducr.
Only qualified calls – those using :: or ::: – are detected.
Unqualified calls (e.g. filter(df, x > 0) without dplyr::) are not
detected because the package cannot be determined unambiguously from source
text alone. This is by design: qualifying calls is also a reproducibility
best practice.
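The matching behaviour described above can be illustrated with a minimal base-R sketch. This is not reproducr's implementation: the helper name extract_qualified_calls() and the exact regular expression are assumptions made for illustration only.

```r
# Hypothetical helper (not part of reproducr): find pkg::fn and pkg:::fn
# occurrences in source lines, skipping lines that are comments.
extract_qualified_calls <- function(lines) {
  # A comment line starts with '#' once leading whitespace is trimmed
  keep <- !startsWith(trimws(lines), "#")
  # Assumed pattern: package name, then :: or :::, then function name
  pattern <- "[A-Za-z][A-Za-z0-9.]*:::?[A-Za-z._][A-Za-z0-9._]*"
  hits <- regmatches(lines, gregexpr(pattern, lines))
  out <- lapply(which(keep), function(i) {
    if (length(hits[[i]]) == 0L) return(NULL)
    parts <- strsplit(hits[[i]], ":::?")
    data.frame(
      line = i,
      pkg = vapply(parts, `[`, character(1), 1L),
      fn = vapply(parts, `[`, character(1), 2L),
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, out)
  if (is.null(out)) {
    out <- data.frame(line = integer(), pkg = character(), fn = character())
  }
  out
}

src <- c(
  "# x <- dplyr::mutate(df, y = 1)",        # comment line: skipped
  "x <- dplyr::filter(mtcars, cyl == 4)",   # qualified: detected
  "y <- filter(x, mpg > 20)",               # unqualified: not detected
  "z <- stats::rnorm(nrow(x)) + utils:::head.default(1:3, 1)"
)
calls <- extract_qualified_calls(src)
print(calls)
```

Because the sketch works on raw text, it would also match a pkg::fn string that appears inside a character literal, which is one reason parse-tree tools are a useful complement.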
See also: risk_score() to check detected calls against the
breaking-changes database; repro_report() to render the
full audit; certify() to lock a set of outputs as a baseline.
# Write a temporary script to audit
script <- tempfile(fileext = ".R")
writeLines(c(
  "set.seed(237)",
  "x <- dplyr::filter(mtcars, cyl == 4)",
  "y <- dplyr::summarise(x, mean_mpg = mean(mpg))",
  "z <- stats::rnorm(nrow(y))"
), script)
report <- audit_script(script, renv = FALSE, verbose = FALSE)
print(report)
# See the detected calls as a data frame
report$calls
Hashes a named list of R objects (model coefficients, summary statistics,
key scalars, data frames) and saves them alongside full environment metadata
to a local certification file (.reproducr.rds by default). Later runs
can call check_drift() to verify that results have not changed.
Think of certify() as a "signed receipt" for a completed analysis run.
certify(outputs, tag, script = NULL, file = ".reproducr")
outputs |
A fully named list of R objects to certify. Each element is
hashed using SHA-256 (or a base-R fallback if |
tag |
|
script |
|
file |
|
Invisibly returns the certification record (a list). Prints a one-line summary to the console.
All certifications for a project are accumulated in a single .reproducr.rds
file. You can have multiple tags representing different stages (e.g. before
and after peer review). Use list_certs() to inspect stored tags.
Commit .reproducr.rds to your project's version control repository.
This makes the certification auditable and shareable with collaborators.
See also: check_drift() to compare current outputs against a baseline;
list_certs() to inspect stored certifications.
model <- lm(mpg ~ wt, data = mtcars)
cert_file <- tempfile()
certify(
  outputs = list(
    coefs = coef(model),
    r_squared = summary(model)$r.squared,
    n_obs = nrow(mtcars)
  ),
  tag = "baseline-v1",
  script = "analysis.R",
  file = cert_file
)
# See what is stored
list_certs(file = cert_file)
Compares the to_version ceiling and from_version floor of each entry
in the breaking-changes database against the current version of that
package on CRAN. Two types of staleness are detected:
stale_ceiling – the package has released a new version above
the to_version ceiling. The window may need extending.
stale_floor – the current CRAN version is so far ahead of
from_version that the window captures users who are already well
past the breaking-change transition. The entry may need closing or
the from_version floor raising.
This function is primarily intended for use by reproducr maintainers
and contributors. It is also run as a scheduled GitHub Actions workflow
on the reproducr repository to automatically open issues when staleness
is detected.
check_db_staleness(
  packages = NULL,
  verbose = TRUE,
  source = "cran",
  from_version_major_threshold = 1L
)
packages |
|
verbose |
|
source |
Default |
from_version_major_threshold |
|
A data.frame of class c("staleness_report", "data.frame")
with one row per database entry. Columns:
key – The pkg::fn key.
pkg – Package name.
fn – Function name.
from_version – The floor version currently in the database.
to_version – The ceiling version currently in the database.
current_version – The current version on CRAN or installed.
status – One of "ok", "stale_ceiling", "stale_floor", or "unknown".
gap – Description of the version gap. NA when status is "ok" or "unknown".
Rows are ordered: stale_ceiling first, stale_floor second, then ok, then unknown.
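The status logic and row ordering can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: classify_staleness() is a hypothetical helper, the database rows are invented, and the stale_floor rule (major-version gap at or above the threshold) and its precedence below stale_ceiling are guesses based on the description above.

```r
# Hypothetical helper (not part of reproducr): classify one database entry.
classify_staleness <- function(from_version, to_version, current_version,
                               major_threshold = 1L) {
  if (is.na(current_version)) return("unknown")
  # Extract the major component of a version string
  major <- function(v) unclass(package_version(v))[[1]][1]
  # Ceiling check: a release newer than to_version exists
  if (package_version(current_version) > package_version(to_version)) {
    return("stale_ceiling")
  }
  # Floor check (assumed rule): current major is far ahead of from_version
  if (major(current_version) - major(from_version) >= major_threshold) {
    return("stale_floor")
  }
  "ok"
}

# Invented entries for illustration; these are not rows from the real database
db <- data.frame(
  key = c("pkgA::f", "pkgB::g", "pkgC::h", "pkgD::k"),
  from_version = c("1.0.0", "1.0.0", "0.9.0", "2.0.0"),
  to_version = c("1.1.4", "3.0.0", "1.2.0", "2.5.0"),
  current_version = c("1.1.0", "2.1.0", "1.3.0", NA),
  stringsAsFactors = FALSE
)
db$status <- mapply(classify_staleness, db$from_version, db$to_version,
                    db$current_version, USE.NAMES = FALSE)

# Order rows: stale_ceiling, stale_floor, ok, unknown
status_levels <- c("stale_ceiling", "stale_floor", "ok", "unknown")
db <- db[order(factor(db$status, levels = status_levels)), ]
print(db[, c("key", "current_version", "status")])
```

Comparing through package_version() rather than as strings keeps multi-digit components such as 1.0.10 ordered numerically.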
See also: risk_score(), which uses the database at runtime;
vignette("contributing-to-the-database") for the database schema and
version window design principles.
# Check all tracked packages against CRAN
report <- check_db_staleness()
print(report)
# Check specific packages only
check_db_staleness(packages = c("dplyr", "tidyr"))
# Offline check using installed versions
check_db_staleness(source = "installed")
# Filter to stale entries only
report <- check_db_staleness()
report[report$status != "ok", ]
Re-hashes a set of named R objects and compares them against a previously
stored certification. Reports which outputs are unchanged ("ok"), have
changed ("drifted"), are present in the baseline but not supplied
("missing"), or are new outputs not in the baseline ("new").
check_drift(
  outputs,
  against = "latest",
  file = ".reproducr",
  tolerance = 1e-10
)
outputs |
A fully named list of current R objects – the same names used
in the |
against |
|
file |
|
tolerance |
|
Invisibly returns a data.frame of class
c("drift_report", "data.frame") with columns output, status
("ok", "drifted", "missing", "new"), max_delta, and note.
Also emits a summary via message().
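The four statuses can be illustrated with a small base-R sketch that compares values directly rather than hashes. It is not how check_drift() is implemented: classify_drift() is a hypothetical helper, and treating tolerance as a maximum absolute difference for numeric outputs is an assumption.

```r
# Hypothetical helper (not part of reproducr): label each output as
# "ok", "drifted", "missing", or "new" relative to a baseline list.
classify_drift <- function(baseline, current, tolerance = 1e-10) {
  all_names <- union(names(baseline), names(current))
  rows <- lapply(all_names, function(nm) {
    delta <- NA_real_
    if (!nm %in% names(current)) {
      status <- "missing"            # in the baseline, not supplied now
    } else if (!nm %in% names(baseline)) {
      status <- "new"                # supplied now, not in the baseline
    } else {
      old <- baseline[[nm]]
      new <- current[[nm]]
      comparable <- is.numeric(old) && is.numeric(new) &&
        length(old) == length(new) && length(old) > 0L
      if (comparable) {
        # Assumed rule: largest absolute difference versus tolerance
        delta <- max(abs(new - old))
        status <- if (delta <= tolerance) "ok" else "drifted"
      } else {
        status <- if (identical(old, new)) "ok" else "drifted"
      }
    }
    data.frame(output = nm, status = status, max_delta = delta,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ hp, data = mtcars)
baseline <- list(coefs = coef(m1), n_obs = nrow(mtcars),
                 r2 = summary(m1)$r.squared)
current <- list(coefs = coef(m1) + 1e-14, n_obs = nrow(mtcars),
                aic = AIC(m1))
res <- classify_drift(baseline, current)
print(res)

# A different model changes the coefficients well beyond the tolerance
res2 <- classify_drift(list(coefs = coef(m1)), list(coefs = coef(m2)))
print(res2)
```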
See also: certify() to create a baseline; list_certs() to see available
tags.
cert_file <- tempfile()
model <- lm(mpg ~ wt, data = mtcars)
certify(list(coefs = coef(model)), tag = "v1", file = cert_file)
# Same outputs -- should report "ok"
result <- check_drift(list(coefs = coef(model)),
  against = "v1",
  file = cert_file
)
print(result)
# Different model -- should report "drifted"
model2 <- lm(mpg ~ hp, data = mtcars)
check_drift(list(coefs = coef(model2)),
  against = "v1",
  file = cert_file
)
A convenience function to inspect what certification tags are stored and
their key metadata, without needing to read the raw .rds file.
list_certs(file = ".reproducr")
file |
|
A data.frame with columns tag, timestamp, r_version,
os, n_outputs, script – one row per certification.
Returns an empty data frame if no certifications exist.
cert_file <- tempfile()
model <- lm(mpg ~ wt, data = mtcars)
certify(list(coefs = coef(model)), tag = "v1", file = cert_file)
certify(list(coefs = coef(model)), tag = "v2", file = cert_file)
list_certs(file = cert_file)
Produces a shields.io Markdown badge reflecting the current reproducibility status of a project. The badge is colour-coded:
Green (reproducible) – no risks detected.
Yellow (caution) – medium-severity risks only.
Red (at risk) – one or more high-severity risks or drifted outputs.
Grey (unknown) – no risk information supplied.
Can be inserted automatically into a README.md (e.g. from a GitHub
Actions workflow).
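The colour mapping can be sketched in base R using the shields.io static badge URL form (label, message, colour separated by hyphens). The helpers badge_status() and badge_markdown() are hypothetical, as are the badge label "reproducibility" and the colour names chosen. Two details are assumptions because the description above does not settle them: low-severity-only results are shown as green, and drift takes precedence over missing risk information.

```r
# Hypothetical helper (not part of reproducr): pick a label and colour.
badge_status <- function(risk_levels = NULL, any_drift = FALSE) {
  # Assumed precedence: drift marks the project "at risk" even without risks
  if (any_drift) return(c(label = "at risk", colour = "red"))
  if (is.null(risk_levels)) return(c(label = "unknown", colour = "lightgrey"))
  if ("high" %in% risk_levels) return(c(label = "at risk", colour = "red"))
  if ("medium" %in% risk_levels) return(c(label = "caution", colour = "yellow"))
  # Assumption: no risks, or low-severity risks only, count as green
  c(label = "reproducible", colour = "brightgreen")
}

# Hypothetical helper: build the Markdown image link for a static badge.
badge_markdown <- function(status) {
  url <- sprintf(
    "https://img.shields.io/badge/reproducibility-%s-%s",
    utils::URLencode(status[["label"]], reserved = TRUE),  # encodes spaces
    status[["colour"]]
  )
  sprintf("![reproducibility](%s)", url)
}

green <- badge_markdown(badge_status(character(0)))
red <- badge_markdown(badge_status(c("medium", "high")))
grey <- badge_markdown(badge_status(NULL))
cat(green, red, grey, sep = "\n")
```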
repro_badge(
  audit,
  risks = NULL,
  drift = NULL,
  output = "markdown",
  readme_path = "README.md"
)
audit |
An |
risks |
A |
drift |
A |
output |
|
readme_path |
|
Invisibly returns the badge Markdown string.
See also: repro_report(), risk_score(),
check_drift()
script <- tempfile(fileext = ".R")
writeLines("x <- dplyr::filter(mtcars, cyl == 4)", script)
report <- audit_script(script, renv = FALSE, verbose = FALSE)
risks <- risk_score(report)
badge <- repro_badge(report, risks)
cat(badge)
Renders a reproducibility audit report from an audit_script() result
and optionally a risk_score() result and check_drift() result. Three
style presets are available:
"minimal" – compact summary suitable for console review or internal
project documentation.
"academic" – generates a ready-to-paste methods paragraph for journal
submissions, listing all packages with versions and summarising risk findings.
"pharma" – structured QC document with a risk register and sign-off
fields, suitable for pharmaceutical or regulated analytical workflows.
repro_report(
  audit,
  risks = NULL,
  drift = NULL,
  format = "text",
  style = "minimal",
  output_file = NULL
)
audit |
An |
risks |
A |
drift |
A |
format |
|
style |
|
output_file |
|
Invisibly returns the report content as a character string. For file-based formats, the file is also written to disk.
See also: audit_script(), risk_score(),
check_drift(), repro_badge()
script <- tempfile(fileext = ".R")
writeLines(c(
  "set.seed(237)",
  "x <- dplyr::filter(mtcars, cyl == 4)",
  "y <- stats::rnorm(10)"
), script)
report <- audit_script(script, renv = FALSE, verbose = FALSE)
risks <- risk_score(report)
# Console summary
repro_report(report, risks, format = "text", style = "minimal")
# Academic methods paragraph (printed, not written to file)
cat(repro_report(report, risks, format = "text", style = "academic"))
Takes an audit_report and checks every detected pkg::fn call against
three independent checks:
"changelog" – matches against a curated database of known breaking
changes in popular CRAN packages, flagging calls where the installed
version falls in a known-risky version window.
"seed_check" – flags stochastic functions (rnorm, sample, etc.)
where no set.seed() appears within 50 lines above the call.
"locale_check" – flags functions whose output is locale-sensitive
(sort(), format(), tolower(), etc.).
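The "seed_check" idea can be sketched in a few lines of base R. This is an illustration, not the package's implementation: flag_unseeded() is a hypothetical helper, the list of stochastic functions is a small assumed subset, and "within 50 lines above" is read here as a set.seed() call on any of the 50 lines preceding the call.

```r
# Hypothetical helper (not part of reproducr): return the line numbers of
# stochastic calls that have no set.seed() in the preceding `window` lines.
flag_unseeded <- function(lines,
                          stochastic = c("rnorm", "runif", "sample"),
                          window = 50L) {
  seed_lines <- grep("set\\.seed\\s*\\(", lines, perl = TRUE)
  call_pattern <- paste0("\\b(", paste(stochastic, collapse = "|"), ")\\s*\\(")
  call_lines <- grep(call_pattern, lines, perl = TRUE)
  unseeded <- vapply(call_lines, function(i) {
    # Seeded if some set.seed() sits strictly above, within the window
    !any(seed_lines < i & seed_lines >= i - window)
  }, logical(1))
  call_lines[unseeded]
}

script_lines <- c(
  "set.seed(237)",
  "x <- stats::rnorm(10)",   # line 2: seeded on line 1
  rep("# filler", 60),       # lines 3 to 62
  "y <- sample(1:10)"        # line 63: nearest seed is 62 lines above
)
flagged <- flag_unseeded(script_lines)
print(flagged)
```

A line-distance window is a heuristic: it cannot tell whether the seed and the call are in the same function or control-flow branch.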
risk_score(
  audit,
  methods = c("changelog", "seed_check", "locale_check"),
  min_risk = "low",
  major_version_grace = 1L
)
## S3 method for class 'risk_report'
print(x, ...)
## S3 method for class 'risk_report'
as.data.frame(x, ...)
## S3 method for class 'risk_report'
x[i, j, ...]
audit |
An |
methods |
|
min_risk |
|
major_version_grace |
|
x |
A |
... |
Additional arguments (currently unused). |
i |
Row index. |
j |
Column index. When columns are subsetted and required columns are
removed, the |
A data.frame of class c("risk_report", "data.frame") with one
row per flagged call. Columns:
file – Source file path.
line – Line number of the call.
call – The pkg::fn string.
pkg_version – Installed or lockfile-resolved version.
risk – "high", "medium", or "low".
check – Which check flagged it: "changelog", "seed_check", or "locale_check".
description – Plain-English explanation of the risk.
reference – URL to the relevant changelog or documentation.
Rows are ordered by risk severity (high first), then by file and line. If no risks are found, an empty data frame with the same columns is returned.
The "changelog" check uses a half-open version window (from_ver, to_ver]:
a call is flagged only if the installed version is greater than
from_ver and at most to_ver. This means the risk is scoped to
versions where the breaking change is known to apply.
When an installed version is major_version_grace or more major versions
ahead of from_version, the entry is suppressed entirely. The user is
already past the breaking-change transition – flagging it at any severity
would be a false positive. The database staleness check
(check_db_staleness()) handles the maintenance concern of
identifying entries whose from_version floor is too old.
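The window test and the grace rule can be written out as a short base-R sketch. The helper is_flagged() and the version numbers used are hypothetical; only the rules themselves (open floor, closed ceiling, suppression at major_version_grace or more major versions ahead) come from the description above.

```r
# Hypothetical helper (not part of reproducr): should an installed version
# be flagged for an entry with window (from_ver, to_ver]?
is_flagged <- function(installed, from_ver, to_ver, major_version_grace = 1L) {
  inst <- package_version(installed)
  # Half-open window: strictly above the floor, at or below the ceiling
  in_window <- inst > package_version(from_ver) &&
    inst <= package_version(to_ver)
  # Major component of a version string
  major <- function(v) unclass(package_version(v))[[1]][1]
  # Grace rule: suppress when far enough past the transition
  past_transition <- major(installed) - major(from_ver) >= major_version_grace
  in_window && !past_transition
}

# Illustrative window (1.0.10, 1.1.4]; the versions are invented
at_floor <- is_flagged("1.0.10", "1.0.10", "1.1.4")    # floor is excluded
inside <- is_flagged("1.1.0", "1.0.10", "1.1.4")       # inside the window
at_ceiling <- is_flagged("1.1.4", "1.0.10", "1.1.4")   # ceiling is included
above <- is_flagged("1.1.5", "1.0.10", "1.1.4")        # past the ceiling

# Inside a wide window (0.8.5, 2.0.0] but one major version ahead of the floor
suppressed <- is_flagged("1.1.0", "0.8.5", "2.0.0")

print(c(at_floor = at_floor, inside = inside, at_ceiling = at_ceiling,
        above = above, suppressed = suppressed))
```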
See also: audit_script() to generate the input;
repro_report() to render the results;
check_db_staleness() to identify database entries with
windows that are too wide.
script <- tempfile(fileext = ".R")
writeLines(c(
  "x <- dplyr::summarise(mtcars, n = dplyr::n())",
  "y <- stats::rnorm(100)",
  "z <- base::sort(letters)"
), script)
report <- audit_script(script, renv = FALSE, verbose = FALSE)
risks <- risk_score(report)
print(risks)
# High-severity items only
risk_score(report, min_risk = "high")
# Only the changelog check
risk_score(report, methods = "changelog")