You finish an analysis. The code runs. The numbers look right. But are they stable?
Package updates change function behaviour silently. Stochastic code without a fixed seed produces different results on every run. Results certified last month may drift this month — with no error and no warning.
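reproducr makes these risks visible and trackable via a
three-tier workflow:
Tier 1 audits your scripts and scores the risks (audit_script(), risk_score()).
Tier 2 certifies key outputs and checks them for drift (certify(), check_drift()).
Tier 3 generates reports and badges (repro_report(), repro_badge()).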
It works with your existing setup. If you use renv,
reproducr reads your lockfile automatically. No
configuration required.
These risks are not hypothetical. Each scenario below describes a class of problem that occurs routinely in research and regulated workflows, produces no error, and is invisible without explicit tooling.
You write an analysis in January using dplyr 1.0.4 and share it with a colleague who has dplyr 1.1.2.
results <- mtcars |>
  dplyr::group_by(cyl) |>
  dplyr::summarise(mean_mpg = mean(mpg))
# You then chain a further operation:
results |> dplyr::mutate(rank = dplyr::row_number())
In dplyr 1.0.x, summarise() retained grouping by
default. In dplyr 1.1.x it drops the last grouping level. Your
colleague’s mutate() now operates on ungrouped data, so the
rank column is computed differently. No error. No warning.
Different numbers.
reproducr flags this immediately:
[HIGH] dplyr::summarise
In dplyr 1.1.0, summarise() changed its default grouping behaviour...
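One way to avoid depending on any default is to state the grouping of the summarise() result explicitly. The following is a minimal sketch, assuming dplyr 1.0.0 or later (where the `.groups` argument is available) and R 4.1 or later for the native pipe. It is a general dplyr idiom, not something reproducr applies for you.

```r
# Assumes dplyr >= 1.0.0 is installed (needed for the .groups argument).
results <- mtcars |>
  dplyr::group_by(cyl) |>
  dplyr::summarise(mean_mpg = mean(mpg), .groups = "drop")

# The result is explicitly ungrouped, so row_number() ranks across all rows
# regardless of what the default grouping behaviour is.
ranked <- results |>
  dplyr::mutate(rank = dplyr::row_number())

dplyr::is_grouped_df(ranked)
```

With the grouping stated in the call, the downstream mutate() sees the same structure on both machines.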
You develop a model locally on R 3.5.3 and deploy to a production server running R 3.6.2.
R 3.6.0 changed the default sampling method used by sample() (sample.kind moved from "Rounding" to "Rejection").
The same seed now produces a different train/test split. Your model is
trained on different data than you validated locally. Accuracy metrics
differ silently across environments.
reproducr flags this:
[HIGH] stats::sample
In R 3.6.0, the default RNG algorithm changed...
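The effect can be reproduced on a single machine running R 3.6.0 or later by switching the sampling method by hand. The sketch below uses base R's RNGkind(); the helper name `draw_with` and the seed value are illustrative assumptions, and the warning R emits when the old "Rounding" sampler is selected is suppressed on purpose.

```r
# Hypothetical helper: draw the same sample under a given sample.kind,
# then restore whatever sampling method was active before.
draw_with <- function(sample_kind, seed = 237) {
  old_kind <- RNGkind()[3]
  on.exit(suppressWarnings(RNGkind(sample.kind = old_kind)), add = TRUE)
  suppressWarnings(RNGkind(sample.kind = sample_kind))
  set.seed(seed)
  sample(100, 10)
}

pre_360  <- draw_with("Rounding")   # default before R 3.6.0
post_360 <- draw_with("Rejection")  # default from R 3.6.0 onwards

# Same seed, different sampling method: compare the two draws.
identical(pre_360, post_360)
```

If an analysis must match results produced before R 3.6.0, selecting the old sampler explicitly in this way is one option, at the cost of its known non-uniformity.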
You use renv to lock your environment and restore it six
months later on a new machine. Everything installs correctly but results
differ.
renv locked readr 2.0.1. Your original
analysis was written with readr 1.4.0. The lockfile
captured the version you were already on when you ran
renv::init() — past the breaking change. You never compared
against pre-2.0 output.
data <- readr::read_csv("clinical_data.csv")
# Column "patient_id" now parses as character instead of double.
# Downstream merge silently drops rows.
renv cannot detect this because it only sees versions,
not behaviour. reproducr sees the function call and flags
it:
[HIGH] readr::read_csv
In readr 2.0.0, read_csv() switched to the vroom backend.
Column type guessing changed...
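The mechanism behind the silently dropped rows can be shown in base R without readr. The sketch below uses made-up data frames (`visits`, `demographics`) in which the same identifier is numeric on one side and a zero-padded string on the other; the zero padding is an assumption chosen to make the mismatch visible.

```r
# Hypothetical data: the same patients, but the key column has different types.
visits <- data.frame(
  patient_id = c(1, 2, 3),               # parsed as double
  visit      = c("baseline", "week4", "week8")
)
demographics <- data.frame(
  patient_id = c("001", "002", "003"),   # parsed as character
  age        = c(34, 51, 29),
  stringsAsFactors = FALSE
)

# merge() matches keys after coercion to character, so "001" never equals "1".
merged <- merge(visits, demographics, by = "patient_id")
nrow(merged)

# A cheap guard: fail loudly when key types disagree before merging.
same_type <- identical(class(visits$patient_id), class(demographics$patient_id))
same_type
```

No error or warning is raised by the merge itself, which is why a type check (or an explicit column specification at read time) is worth adding before the join.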
The entry point is audit_script(). It reads your R
source files, extracts every qualified pkg::fn call, and
resolves which version of each package is in use.
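To give a feel for what extracting qualified calls involves, here is a minimal sketch built on base R's parser. It is not reproducr's implementation: the function name `extract_qualified_calls` is hypothetical, and the sketch only finds `pkg::fn(...)` calls, ignoring references such as `sapply(x, stats::median)` where the function is not called directly.

```r
# Hypothetical helper: list pkg::fn() calls in a script using parse data.
extract_qualified_calls <- function(path) {
  exprs  <- parse(file = path, keep.source = TRUE)
  tokens <- utils::getParseData(exprs)
  tokens <- tokens[tokens$terminal, ]
  tokens <- tokens[order(tokens$line1, tokens$col1), ]

  # A qualified call is tokenised as SYMBOL_PACKAGE, NS_GET, SYMBOL_FUNCTION_CALL.
  pkg_idx <- which(tokens$token == "SYMBOL_PACKAGE")
  fn_idx  <- pkg_idx + 2L
  keep    <- fn_idx <= nrow(tokens)
  pkg_idx <- pkg_idx[keep]
  fn_idx  <- fn_idx[keep]
  is_call <- tokens$token[fn_idx] == "SYMBOL_FUNCTION_CALL"

  data.frame(
    line = tokens$line1[pkg_idx[is_call]],
    pkg  = tokens$text[pkg_idx[is_call]],
    fn   = tokens$text[fn_idx[is_call]],
    stringsAsFactors = FALSE
  )
}

# Usage on a throwaway script; the unqualified lm() call is not picked up.
demo <- tempfile(fileext = ".R")
writeLines(c(
  "x <- dplyr::filter(mtcars, cyl == 4)",
  "fit <- lm(mpg ~ wt, data = x)",
  "z <- stats::rnorm(3)"
), demo)
calls <- extract_qualified_calls(demo)
calls
```

Because the script is parsed rather than executed, the packages it mentions do not need to be installed for the calls to be found.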
# Create a small example script
script <- tempfile(fileext = ".R")
writeLines(c(
  "# Example analysis",
  "set.seed(237)",
  "x <- dplyr::filter(mtcars, cyl == 4)",
  "y <- dplyr::summarise(x, mean_mpg = mean(mpg), n = dplyr::n())",
  "fit <- lm(mpg ~ wt, data = x)",
  "z <- stats::rnorm(nrow(y))",
  "out <- base::sort(unique(x$gear))"
), script)
report <- audit_script(script, renv = FALSE, verbose = FALSE)
print(report)
#>
#> -- reproducr audit report [2026-06-20 17:30] --
#>
#> Files scanned: 1
#> Packages found: 3
#> Calls detected: 5
#> R version: 4.6.0
#> Platform: Linux 6.17.0-1018-azure
#> Versions from: installed library
#>
#> Next step: risks <- risk_score(report)
report$calls
#>                                file line   pkg        fn pkg_version
#> 1 /tmp/RtmpqLnHka/file9c22b43a18a.R    3 dplyr    filter        <NA>
#> 2 /tmp/RtmpqLnHka/file9c22b43a18a.R    4 dplyr summarise        <NA>
#> 3 /tmp/RtmpqLnHka/file9c22b43a18a.R    4 dplyr         n        <NA>
#> 4 /tmp/RtmpqLnHka/file9c22b43a18a.R    6 stats     rnorm       4.6.0
#> 5 /tmp/RtmpqLnHka/file9c22b43a18a.R    7  base      sort       4.6.0
Pass the report to risk_score() to run three independent
checks:
risks <- risk_score(report)
print(risks)
#>
#> -- reproducr risk score --
#>
#> HIGH: 0
#> MEDIUM: 0
#> LOW: 1
#>
#> [LOW] base::sort (line 7 in file9c22b43a18a.R)
#> Check : locale_check
#> Details : sort() output is locale-sensitive. Current locale: en_US.UTF-8.
#> Results may differ on machines with different LC_COLLATE or
#> LC_TIME settings.
#> Reference: https://stat.ethz.ch/R-manual/R-devel/library/base/html/locales.html
"changelog": checks calls against a curated database of known silent breaking changes
"seed_check": flags stochastic functions without a nearby set.seed()
"locale_check": flags functions whose output varies by system locale
# High-severity only
high_risks <- risk_score(report, min_risk = "high")
# Just the seed check
seed_issues <- risk_score(report, methods = "seed_check")
# As a plain data frame for downstream use
as.data.frame(risks)
#> file line call pkg_version risk
#> 1 /tmp/RtmpqLnHka/file9c22b43a18a.R 7 base::sort 4.6.0 low
#> check
#> 1 locale_check
#> description
#> 1 sort() output is locale-sensitive. Current locale: en_US.UTF-8. Results may differ on machines with different LC_COLLATE or LC_TIME settings.
#> reference
#> 1 https://stat.ethz.ch/R-manual/R-devel/library/base/html/locales.html
After running an analysis, certify the key outputs using
certify().
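The drift checks that follow refer to a `model`, a `cert_file`, and a baseline tagged "baseline-v1". A minimal sketch of that setup is given here under stated assumptions: the model formula is hypothetical, the certificate path is an arbitrary temporary file, and the `file` argument to certify() is assumed by analogy with check_drift(). Only `outputs` and `tag` are shown for certify() elsewhere in this document, so check ?certify before relying on it.

```r
# Hypothetical model; the original formula is not shown in this document.
model <- lm(mpg ~ wt, data = mtcars)

# Arbitrary location for the certification record (assumption).
cert_file <- tempfile()

outputs <- list(
  coefs     = coef(model),
  r_squared = summary(model)$r.squared,
  n_obs     = nrow(mtcars)
)

# Run the certify step only when reproducr is installed.
# The `file` argument is an assumption mirroring check_drift().
if (requireNamespace("reproducr", quietly = TRUE)) {
  reproducr::certify(outputs = outputs, tag = "baseline-v1", file = cert_file)
}
```

The same three named outputs are then passed to check_drift() so that each one can be compared against its certified value.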
After any environment change, re-run check_drift():
result <- check_drift(
  outputs = list(
    coefs = coef(model),
    r_squared = summary(model)$r.squared,
    n_obs = nrow(mtcars)
  ),
  against = "baseline-v1",
  file = cert_file
)
#> -- reproducr drift check vs 'baseline-v1' --
#> Verdict : ALL OUTPUTS MATCH
#> OK : 3
#> Drifted : 0
#> Missing : 0
#> New : 0
# A different model shows drift
model2 <- lm(mpg ~ hp, data = mtcars)
check_drift(
  outputs = list(coefs = coef(model2)),
  against = "baseline-v1",
  file = cert_file
)
#> -- reproducr drift check vs 'baseline-v1' --
#> Verdict : DRIFT DETECTED
#> OK : 0
#> Drifted : 1
#> Missing : 2
#> New : 0
#> Drifted outputs:
#> - coefs
cat(repro_report(report, risks, format = "text", style = "academic"))
#> Methods paragraph (reproducr)
#>
#> All analyses were conducted in R (version 4.6.0) on Linux 6.17.0-1018-azure. The following packages were used: dplyr, stats (v4.6.0), base (v4.6.0). Reproducibility auditing (reproducr) identified 1 potential concern(s) (0 high, 0 medium severity) relating to known behavioural changes in package APIs across versions. The full audit report and certification records are available in the supplementary materials.
#> # Methods paragraph (reproducr)
#>
#> All analyses were conducted in R (version 4.6.0) on Linux 6.17.0-1018-azure. The following packages were used: dplyr, stats (v4.6.0), base (v4.6.0). Reproducibility auditing (reproducr) identified 1 potential concern(s) (0 high, 0 medium severity) relating to known behavioural changes in package APIs across versions. The full audit report and certification records are available in the supplementary materials.
badge <- repro_badge(report, risks, output = "markdown")
#> [](https://repro-stats.github.io/reproducr/)
cat(badge)
#> [](https://repro-stats.github.io/reproducr/)
library(reproducr)
# Tier 1
report <- audit_script("analysis.R")
risks <- risk_score(report)
# Tier 2
certify(
  outputs = list(coefs = coef(my_model)),
  tag = "submission-v1"
)
check_drift(
  outputs = list(coefs = coef(my_model)),
  against = "submission-v1"
)
# Tier 3
repro_report(report, risks, format = "html", style = "pharma")
repro_badge(report, risks, output = "README")
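The idea behind the Tier 2 drift check can be sketched in base R: store named outputs once, then compare later outputs against them by name. The helpers below (`save_baseline`, `compare_to_baseline`) are hypothetical and are not how reproducr stores or compares certificates. The RDS file, the all.equal() comparison, and the tolerance value are all assumptions made for illustration.

```r
# Hypothetical helpers; not part of reproducr.
save_baseline <- function(outputs, path) {
  saveRDS(outputs, path)
  invisible(path)
}

compare_to_baseline <- function(outputs, path, tolerance = 1e-8) {
  baseline <- readRDS(path)
  shared   <- intersect(names(baseline), names(outputs))

  # An output matches when all.equal() reports no difference.
  matches <- vapply(shared, function(nm) {
    isTRUE(all.equal(baseline[[nm]], outputs[[nm]], tolerance = tolerance))
  }, logical(1))
  drifted <- shared[!matches]

  list(
    ok      = setdiff(shared, drifted),
    drifted = drifted,
    missing = setdiff(names(baseline), names(outputs)),
    new     = setdiff(names(outputs), names(baseline))
  )
}

# Store a baseline from one model, then compare a different model against it.
model <- lm(mpg ~ wt, data = mtcars)
path  <- tempfile(fileext = ".rds")
save_baseline(list(coefs = coef(model), n_obs = nrow(mtcars)), path)

model2 <- lm(mpg ~ hp, data = mtcars)
result <- compare_to_baseline(list(coefs = coef(model2)), path)
result
```

Comparing with a tolerance rather than exact equality keeps harmless floating-point noise from being reported as drift, while changed coefficients or missing outputs still surface.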