---
title: "Getting started with scopusflow"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with scopusflow}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

```{r setup}
library(scopusflow)
```

This vignette is fully reproducible without a Scopus API key. It draws on a small static fixture bundled with the package, so the whole workflow can be shown offline. The few steps that genuinely need the API are shown but not run.

## Describing a search as a plan

A plan separates describing a search from executing it. Plans are inspectable, saveable and version-controllable, and they can be partitioned, for example by year, so that a large retrieval stays under the API's `start < 5000` ceiling and can be cached and resumed.

```{r}
plan <- scopus_plan(
  "machine translation",
  years = 2018:2020,
  field = "TITLE-ABS-KEY",
  partition = "year"
)
plan
```

Each row is one query cell. Field tags wrap the query and years become a date filter:

```{r}
scopus_plan("language learning", field = "TITLE")$query
scopus_plan("x", years = 2015:2020)$date
```

## Sizing and fetching

With a key configured, you size a search cheaply and then execute the plan, optionally caching each cell so that an interrupted run resumes without re-spending quota. These contact the API, so they are not evaluated here:

```{r eval = FALSE}
scopus_count("machine translation", years = 2018:2020, field = "TITLE-ABS-KEY")
records <- scopus_fetch_plan(plan, cache_dir = scopus_cache_dir(), resume = TRUE)
```

## The record schema

Whether records come from the API or from the bundled example data, they share one stable schema. The package ships a small, already normalised set, which we use here to continue offline:

```{r}
records <- example_records
records
```

`scopus_records()` produces this same shape from a raw API response, flattening the nested result into one row per record.
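To make the idea of "one row per record" concrete, the chunk below is a minimal base R sketch of flattening a nested list into a rectangular table. It is not how `scopus_records()` is implemented: the `raw` object, its `entries` element, the field names and the `get_field()` helper are all hypothetical and exist only for this illustration.

```{r}
# Hypothetical nested response. The structure and field names are
# illustrative assumptions, not the real Scopus payload.
raw <- list(
  entries = list(
    list(title = "Paper A", doi = "10.1000/example.1", year = 2018L),
    list(title = "Paper B", doi = NULL, year = 2019L)
  )
)

# Hypothetical helper: pull one field from every entry as a character
# vector, using NA where the field is missing so each record keeps a row.
get_field <- function(entries, field) {
  vapply(entries, function(entry) {
    value <- entry[[field]]
    if (is.null(value)) NA_character_ else as.character(value)
  }, character(1))
}

# One column per field, one row per record.
flat <- data.frame(
  title = get_field(raw$entries, "title"),
  doi = get_field(raw$entries, "doi"),
  year = as.integer(get_field(raw$entries, "year")),
  stringsAsFactors = FALSE
)
flat
```

Filling absent fields with `NA` rather than dropping the record keeps the row count equal to the number of records, which is what makes a fixed schema possible.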

## DOIs and change tracking

Extract a clean, deduplicated DOI list for import into a reference manager, and compare two retrievals to see exactly what changed:

```{r}
dois <- scopus_extract_dois(records)
dois

# Suppose a later retrieval added one DOI and dropped another.
later <- c(dois[-1], "10.1000/example.999")
scopus_diff_dois(old = dois, new = later)
```

You can write the DOIs to a path you specify:

```{r}
out <- file.path(tempdir(), "dois.csv")
scopus_extract_dois(records, file = out)
readLines(out)
```

## Comparing topic trends

`scopus_compare_topics()` issues one count request per term per year, so it needs the API. Its output has a fixed shape, which we reproduce here to show the plot:

```{r eval = FALSE}
cmp <- scopus_compare_topics(
  reference_query = "language learning",
  comparison_terms = c("effect size", "Bayesian"),
  years = 2015:2020,
  field = "TITLE-ABS-KEY"
)
```

```{r}
# A stand-in comparison object with the same columns scopus_compare_topics()
# returns, so the plotting step is reproducible offline.
cmp <- tibble::tibble(
  query = "q",
  query_type = rep(c("reference", "comparison", "comparison"), each = 6),
  abridged_query = rep(c("language learning", "effect size", "Bayesian"), each = 6),
  year = rep(2015:2020, 3),
  n = c(rep(100, 6), 20, 24, 30, 33, 40, 45, 5, 7, 9, 12, 15, 19),
  reference_n = rep(100, 18),
  comparison_percentage = c(rep(100, 6), 20, 24, 30, 33, 40, 45, 5, 7, 9, 12, 15, 19),
  average_comparison_percentage = rep(c(100, 32, 11.2), each = 6)
)
class(cmp) <- c("scopus_comparison", class(cmp))
cmp
```

```{r fig.alt = "Line chart of two topics' share of the reference literature over time", fig.width = 7, fig.height = 4.5}
if (requireNamespace("ggplot2", quietly = TRUE)) {
  plot_scopus_comparison(cmp)
}
```

## Export and interoperability

Hand results to `bibliometrix`-style workflows, or save and reload them:

```{r}
head(as_bibliometrix(records))

path <- file.path(tempdir(), "records.rds")
write_scopus_records(records, path)
identical(read_scopus_records(path), records)
```

## Handling failures

Network and API problems surface as typed conditions, all inheriting from `scopus_error`, so a workflow can respond to them in code:

```{r eval = FALSE}
tryCatch(
  scopus_fetch("..."),
  scopus_error_no_key = function(e) message("No API key configured."),
  scopus_error_rate_limit = function(e) message("Rate limited, so backing off."),
  scopus_error = function(e) message("Scopus error: ", conditionMessage(e))
)
```
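
The handler order above matters: `tryCatch()` uses the first handler whose class matches, so the specific classes must come before the general `scopus_error`. The sketch below shows that dispatch offline with hand-built condition objects. The `demo_condition()` and `handle()` helpers and the `scopus_error_timeout` subclass are hypothetical, made up for this illustration, and the package may construct its conditions differently.

```{r}
# Hypothetical constructor, for illustration only; not part of scopusflow.
# It builds a condition whose class vector ends in the shared parent class.
demo_condition <- function(subclass, message) {
  structure(
    list(message = message, call = NULL),
    class = c(subclass, "scopus_error", "error", "condition")
  )
}

# Hypothetical wrapper: signal the condition and report which handler ran.
handle <- function(cond) {
  tryCatch(
    stop(cond),
    scopus_error_rate_limit = function(e) "rate_limit",
    scopus_error = function(e) paste("generic:", conditionMessage(e))
  )
}

# Caught by the specific handler, because it is listed first.
handle(demo_condition("scopus_error_rate_limit", "Too many requests"))

# No specific handler exists for this made-up subclass, so the general
# scopus_error handler catches it.
handle(demo_condition("scopus_error_timeout", "Timed out"))
```

Listing `scopus_error` last turns it into a catch-all, so a workflow can still respond to any condition class it does not handle by name.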