Title: | Access the 'Finna' API |
---|---|
Description: | Provides functions to access and retrieve metadata from the 'Finna' API <https://api.finna.fi/>, which aggregates content from Finnish archives, libraries, and museums. |
Authors: | Akewak Jeba [aut, cre], Leo Lahti [aut] |
Maintainer: | Akewak Jeba |
License: | BSD_2_clause + file LICENSE |
Version: | 0.1.1 |
Built: | 2025-01-22 20:01:28 UTC |
Source: | CRAN |
Performs basic analysis on Finna metadata, summarizing the distribution of formats, years, and authors.
analyze_metadata(metadata)
metadata |
A tibble containing refined Finna metadata. |
A list of tibbles with summaries of formats, years, and authors.
library(finna)
sibelius_data <- search_finna("sibelius")
refined_data <- refine_metadata(sibelius_data)
analyze_metadata(refined_data)
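As a rough illustration of the kind of summary described above, the following sketch counts formats, years, and authors in a small invented data frame using base R. The column names follow the fields listed for refine_metadata; the helper count_by and all data values are hypothetical, and this is not the package's own implementation.

```r
# Toy data standing in for refined Finna metadata (all values are invented).
refined <- data.frame(
  Title = c("A", "B", "C", "D"),
  Author = c("Author One", "Author One", "Unknown Author", "Author Two"),
  Year = c("1900", "1900", "1957", "1965"),
  Formats = c("Sound recording", "Book", "Book", "Book"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count the distinct values of one column,
# most frequent first.
count_by <- function(df, column) {
  counts <- as.data.frame(table(df[[column]]), stringsAsFactors = FALSE)
  names(counts) <- c(column, "n")
  counts[order(-counts$n, counts[[column]]), , drop = FALSE]
}

# A list of summaries, mirroring the documented return shape.
summaries <- list(
  formats = count_by(refined, "Formats"),
  years = count_by(refined, "Year"),
  authors = count_by(refined, "Author")
)
print(summaries$formats)
```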
This function analyzes how search results for a given query have trended over time, binned by decades. It plots the number of records found for each decade, allowing users to observe long-term trends.
analyze_trends_over_time(data, query = "Records Over Time")
data |
A tibble containing Finna search results with a |
query |
A search query string (optional) to label the plot. |
A ggplot2 plot showing the trend of records over time.
finna_data <- search_finna("Sibelius")
trends <- analyze_trends_over_time(finna_data, "Sibelius")
print(trends)
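The decade binning mentioned in the description can be sketched in base R as follows. The vector of years is invented, and the rounding rule (floor to the nearest ten) is an assumption about how a decade bin might be computed, not a statement about the package internals.

```r
# Invented publication years; NA stands for a record without a usable year.
years <- c(1899, 1901, 1909, 1957, 1965, 1965, NA)

# Map each year to the first year of its decade, e.g. 1965 -> 1960.
decade <- floor(years / 10) * 10

# Count records per decade; table() drops NA values by default.
records_per_decade <- as.data.frame(table(decade), stringsAsFactors = FALSE)
names(records_per_decade) <- c("decade", "n")
records_per_decade$decade <- as.integer(records_per_decade$decade)
print(records_per_decade)
```

A data frame of this shape is what a bar or line plot of records per decade would typically be drawn from.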
This function tests whether R can successfully connect to the Finna API by downloading the OpenAPI specification from https://api.finna.fi/api/v1/?openapi. It returns a logical value indicating the accessibility of the API.
check_api_access()
A logical value:
TRUE: The API is accessible.
FALSE: The API is not accessible.
## Not run:
# Check if the API is accessible
access <- check_api_access()
if (access) {
  message("Finna API is accessible")
} else {
  message("Finna API is not accessible")
}
## End(Not run)
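A connectivity check that returns TRUE or FALSE instead of raising an error can be sketched with tryCatch. The helper is_reachable and its fetch argument are hypothetical; the fetcher is injectable so that the logic can be tried without a network connection. This is only a sketch of the pattern, not the code behind check_api_access.

```r
# Hypothetical helper: TRUE if fetching `url` succeeds, FALSE on any
# error or warning. `fetch` defaults to reading one line from the URL.
is_reachable <- function(url,
                         fetch = function(u) readLines(u, n = 1L, warn = FALSE)) {
  tryCatch(
    {
      fetch(url)
      TRUE
    },
    error = function(e) FALSE,
    warning = function(w) FALSE
  )
}

# Offline demonstration with stand-in fetchers.
ok <- is_reachable("https://api.finna.fi/api/v1/?openapi",
                   fetch = function(u) "stub response")
failed <- is_reachable("https://api.finna.fi/api/v1/?openapi",
                       fetch = function(u) stop("connection refused"))
```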
This function reads a CSV file from a URL containing Melinda IDs and author names. If the author name is missing (NA), it searches the 'Finna' API for the corresponding Melinda ID to retrieve and update the author name. The updated data is saved in a CSV file.
enrich_author_name(url, output_file = "updated_na_author_rows.csv")
url |
A character string specifying the URL of the CSV file with Melinda IDs and author names. |
output_file |
A character string specifying the output CSV file name. |
A tibble with updated author names. The file is saved to a temporary directory using tempdir().
## Not run:
enrich_author_name(
  url = "https://example/na_author_rows.csv",
  output_file = "updated_na_author_rows.csv"
)
## End(Not run)
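The fill-in step described above (look up an author only where the name is NA) might look like the sketch below. The column names melinda_id and author, the helper fill_missing_authors, and the in-memory lookup table are all assumptions made for illustration; a real lookup would query the Finna API instead.

```r
# Invented input rows; two of them lack an author name.
records <- data.frame(
  melinda_id = c("001", "002", "003"),
  author = c("Author One", NA, NA),
  stringsAsFactors = FALSE
)

# Stand-in for an API lookup: returns a name or NA for an unknown ID.
known_authors <- c("002" = "Author Two")
lookup_author <- function(id) {
  if (id %in% names(known_authors)) known_authors[[id]] else NA_character_
}

# Hypothetical helper: replace NA authors using the supplied lookup.
fill_missing_authors <- function(df, lookup) {
  missing <- is.na(df$author)
  df$author[missing] <- vapply(df$melinda_id[missing], lookup,
                               character(1), USE.NAMES = FALSE)
  df
}

updated <- fill_missing_authors(records, lookup_author)
print(updated)
```

Rows whose ID cannot be resolved keep NA, so a second pass or manual review can still find them.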
This function fetches records from the Finna API in chunks of 100,000, automatically paginating through the results until the maximum number of records is reached.
fetch_all_records( base_query = "*", base_filters = c("collection:\"FEN\""), sort = "main_date_str asc", limit_per_query = 1e+05, total_limit = Inf )
base_query |
A string specifying the base query. Defaults to "*". |
base_filters |
A character vector of filters to apply to the query. Defaults to c("collection:\"FEN\""). |
sort |
A string defining the sort order of the results. Default is "main_date_str asc". |
limit_per_query |
An integer specifying the number of records to fetch per query. Defaults to 100000. |
total_limit |
The maximum number of records to fetch. Defaults to Inf (no limit). |
A tibble containing all fetched records.
## Not run:
results <- fetch_all_records(
  base_query = "*",
  base_filters = c('collection:"FEN"'),
  sort = "main_date_str asc",
  limit_per_query = 100000,
  total_limit = Inf
)
print(results)
## End(Not run)
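The chunked fetching described above can be sketched as a loop that keeps requesting chunks until either the source is exhausted or the total limit is reached. The function paginate and the fetch_chunk(offset, n) callback are hypothetical; the callback here reads from an in-memory data frame rather than the Finna API, so this shows only the control flow under those assumptions.

```r
# Hypothetical pagination loop. `fetch_chunk(offset, n)` must return a
# data frame with at most `n` rows, starting after row `offset`.
paginate <- function(fetch_chunk, limit_per_query = 100000, total_limit = Inf) {
  chunks <- list()
  fetched <- 0
  while (fetched < total_limit) {
    want <- min(limit_per_query, total_limit - fetched)
    chunk <- fetch_chunk(fetched, want)
    if (nrow(chunk) == 0) break          # nothing left to fetch
    chunks[[length(chunks) + 1]] <- chunk
    fetched <- fetched + nrow(chunk)
    if (nrow(chunk) < want) break        # short chunk: source exhausted
  }
  if (length(chunks) == 0) return(data.frame())
  do.call(rbind, chunks)
}

# In-memory stand-in for a remote index with 250 records.
index <- data.frame(id = 1:250)
fetch_from_index <- function(offset, n) {
  rows <- seq_len(nrow(index))
  index[rows > offset & rows <= offset + n, , drop = FALSE]
}

all_rows <- paginate(fetch_from_index, limit_per_query = 100)
capped <- paginate(fetch_from_index, limit_per_query = 100, total_limit = 120)
```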
This function retrieves data from the Finna API and formats it as a tidy tibble.
fetch_finna( query = NULL, limit = 0, facets = "building", lng = "fi", prettyPrint = TRUE )
query |
The query string for filtering results. Defaults to NULL, which fetches data without a specific search term. |
limit |
Maximum number of results to fetch. Defaults to 0. |
facets |
Facet to retrieve, defaults to "building". |
lng |
Language for results, defaults to "fi". |
prettyPrint |
Logical, whether to pretty-print JSON responses. |
A tibble containing the fetched data with relevant fields.
## Not run:
fetch_finna(query = "record_format:ead", limit = 0)
fetch_finna() # Fetches data with no specific query
## End(Not run)
This function fetches records from the Finna API in chunks divided by year ranges, handling missing date values.
fetch_viola_records( base_query = "*", base_filters = c("collection:\"VIO\""), year_ranges = list(c(0, as.numeric(format(Sys.Date(), "%Y")))), include_na = TRUE, limit_per_query = 1e+05, total_limit = Inf, delay_after_query = 5 )
base_query |
The base query string, defaults to "*". |
base_filters |
A character vector of filters for the search, e.g., c("collection:\"VIO\"") (the default). |
year_ranges |
A list of numeric vectors, each giving the start and end year of a range. Defaults to a single range from year 0 to the current year. |
include_na |
Whether to include records with missing date values. Defaults to TRUE. |
limit_per_query |
Maximum number of records to fetch per query. Default is 100000. |
total_limit |
Maximum number of records to fetch overall. Default is Inf (no limit). |
delay_after_query |
Delay in seconds between queries. Default is 5. |
A tibble containing all fetched records.
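Because year_ranges is a list of two-element numeric vectors, a long span can be split into narrower ranges before fetching. The helper make_year_ranges below is hypothetical and not part of the package; it only shows one way to build a list of that shape.

```r
# Hypothetical helper: split [from, to] into consecutive ranges that are
# `width` years wide (the last range may be shorter).
make_year_ranges <- function(from, to, width) {
  starts <- seq(from, to, by = width)
  lapply(starts, function(s) c(s, min(s + width - 1, to)))
}

ranges <- make_year_ranges(1900, 1949, 20)
str(ranges)

## Not run:
# Passing the ranges on; this call needs network access.
# records <- fetch_viola_records(year_ranges = ranges, total_limit = 1000)
## End(Not run)
```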
Automatically generates a citation for a Finna collection result.
finna_cite(result, index, style = "citation")
result |
The Finna collection result as a tibble. |
index |
The index of the collection to cite (numeric). |
style |
The citation style to use (default: "citation"). See |
A bibliographic entry (bibentry) printed in the specified style.
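For readers unfamiliar with bibentry objects, the sketch below builds one by hand with utils::bibentry and formats it in two styles. The record values are invented and the choice of the "Misc" entry type is an assumption; how finna_cite maps Finna fields onto a bibentry is not shown in this manual.

```r
# Build a bibliographic entry by hand (all field values are invented).
entry <- utils::bibentry(
  bibtype = "Misc",
  title = "Example Title",
  author = utils::person("Example", "Author"),
  year = "1900",
  note = "Illustrative record, not real Finna output"
)

# The same entry rendered as plain text and as BibTeX.
text_form <- format(entry, style = "text")
bibtex_form <- format(entry, style = "Bibtex")
cat(text_form, "\n")
```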
Provides an interactive interface to search, select, and download datasets from the Finna API.
finna_interactive()
A dataframe containing the selected dataset or downloaded data.
search_finna(), fetch_finna(), finna_cite()
This function retrieves multiple Finna records based on a vector of record IDs. You can specify which fields to return, the language, and the pagination options.
get_finna_records( ids, field = NULL, prettyPrint = FALSE, lng = "fi", page = 1, limit = 100 )
ids |
A vector of record IDs to retrieve. |
field |
A vector of fields to return. Defaults to NULL, which returns all default fields. |
prettyPrint |
Logical; whether to pretty-print the response. Defaults to FALSE. |
lng |
Language for returned translated strings. Defaults to "fi". |
page |
The page number to retrieve. Defaults to 1. |
limit |
The number of records to return per page. Defaults to 100. |
A tibble containing the retrieved records data with provenance information.
records <- get_finna_records(
  "fikka.3405646",
  field = "title",
  prettyPrint = TRUE,
  lng = "en-gb"
)
print(records)
This function harvests metadata records from an OAI-PMH-compliant server in batches, using a custom User-Agent string to identify the service, and returns them as a tibble.
harvest_oai_pmh( base_url, metadata_prefix, set = NULL, verbose = TRUE, user_agent = "FinnaHarvester/1.0", output_file = NULL, record_limit = NULL )
base_url |
A string. The base URL of the OAI-PMH server. |
metadata_prefix |
A string. The metadata format to request (e.g., "oai_dc", "marc21"). |
set |
A string. Optional. A set specifier to limit the harvested records (e.g., "non_dedup"). |
verbose |
A logical. Whether to display progress messages. Default is TRUE. |
user_agent |
A string. A custom User-Agent string to identify the service. Default is "FinnaHarvester/1.0". |
output_file |
A string. Optional. The name of a CSV file to which the harvested records are saved. Default is NULL. |
record_limit |
A number. Optional. The maximum number of records to fetch. Default is NULL. |
A tibble with the harvested records containing selected metadata fields.
## Not run:
# Example for oai_dc (Dublin Core)
records_oai_dc <- harvest_oai_pmh(
  base_url = "https://api.finna.fi/OAI/Server",
  metadata_prefix = "oai_dc",
  user_agent = "MyCustomHarvester/1.0"
)

# Example for marc21 (MARC 21)
records_marc21 <- harvest_oai_pmh(
  base_url = "https://api.finna.fi/OAI/Server",
  metadata_prefix = "marc21",
  user_agent = "MyCustomHarvester/1.0"
)

# Example for oai_vufind_json (VuFind JSON)
records_oai_vufind_json <- harvest_oai_pmh(
  base_url = "https://api.finna.fi/OAI/Server",
  metadata_prefix = "oai_vufind_json",
  user_agent = "MyCustomHarvester/1.0"
)

# Example for oai_ead (Encoded Archival Description)
records_oai_ead <- harvest_oai_pmh(
  base_url = "https://api.finna.fi/OAI/Server",
  metadata_prefix = "oai_ead",
  user_agent = "MyCustomHarvester/1.0"
)

# Example for oai_ead3 (Encoded Archival Description version 3)
records_oai_ead3 <- harvest_oai_pmh(
  base_url = "https://api.finna.fi/OAI/Server",
  metadata_prefix = "oai_ead3",
  user_agent = "MyCustomHarvester/1.0"
)

# Example for oai_forward (Forward metadata format)
records_oai_forward <- harvest_oai_pmh(
  base_url = "https://api.finna.fi/OAI/Server",
  metadata_prefix = "oai_forward",
  user_agent = "MyCustomHarvester/1.0"
)

# Example for oai_lido (Lightweight Information Describing Objects)
records_oai_lido <- harvest_oai_pmh(
  base_url = "https://api.finna.fi/OAI/Server",
  metadata_prefix = "oai_lido",
  user_agent = "MyCustomHarvester/1.0"
)

# Example for oai_qdc (Qualified Dublin Core)
records_oai_qdc <- harvest_oai_pmh(
  base_url = "https://api.finna.fi/OAI/Server",
  metadata_prefix = "oai_qdc",
  user_agent = "MyCustomHarvester/1.0"
)
## End(Not run)
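Batch harvesting over OAI-PMH generally works by issuing a first ListRecords request with a metadataPrefix (and optionally a set), then repeating the request with the resumptionToken from each response until no token is returned. The sketch below only builds those request URLs; the helper oai_list_records_url is hypothetical and the token value is invented. It is not the implementation used by harvest_oai_pmh.

```r
# Hypothetical helper: build a ListRecords URL. Per the OAI-PMH protocol,
# a follow-up request carries only the verb and the resumption token.
oai_list_records_url <- function(base_url, metadata_prefix,
                                 set = NULL, resumption_token = NULL) {
  if (!is.null(resumption_token)) {
    params <- c(verb = "ListRecords", resumptionToken = resumption_token)
  } else {
    params <- c(verb = "ListRecords", metadataPrefix = metadata_prefix)
    if (!is.null(set)) params <- c(params, set = set)
  }
  encoded <- vapply(params, utils::URLencode, character(1), reserved = TRUE)
  paste0(base_url, "?",
         paste(names(params), encoded, sep = "=", collapse = "&"))
}

first_url <- oai_list_records_url(
  "https://api.finna.fi/OAI/Server", "oai_dc", set = "non_dedup"
)
# "abc 123" is an invented token used only to show the encoding.
next_url <- oai_list_records_url(
  "https://api.finna.fi/OAI/Server", "oai_dc", resumption_token = "abc 123"
)
```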
This function loads previously saved 'Finna' search results from a local .rds file for offline access.
load_offline_data(file_name = "offline_search_results")
file_name |
A string representing the name of the file to load. The function automatically appends ".rds" if not already included. |
A tibble or data frame containing the loaded search results.
## Not run:
search_results <- search_finna("sibelius")
save_for_offline(search_results, "sibelius_search_results")
offline_data <- load_offline_data("sibelius_search_results")
print(offline_data)
## End(Not run)
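The underlying save and load round trip can be tried offline with base R's saveRDS and readRDS. The helper with_rds_extension is hypothetical and only mimics the documented behaviour of appending ".rds" when it is missing; the storage location (tempdir()) and the data are assumptions chosen for the demonstration.

```r
# Hypothetical helper: append ".rds" unless the name already ends with it.
with_rds_extension <- function(file_name) {
  if (grepl("\\.rds$", file_name)) file_name else paste0(file_name, ".rds")
}

# Invented data standing in for search results.
toy_results <- data.frame(
  id = c("rec.1", "rec.2"),
  Title = c("First", "Second"),
  stringsAsFactors = FALSE
)

# Save to a temporary directory, then read the file back.
path <- file.path(tempdir(), with_rds_extension("toy_search_results"))
saveRDS(toy_results, path)
restored <- readRDS(path)
```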
The refine_metadata function cleans and standardizes Finna metadata by:
Validating Required Fields: Checks for the presence of key metadata fields and returns NULL if any are missing.
Handling Missing Values: Replaces NA values in critical fields with descriptive placeholder text (e.g., "Unknown Title").
Selecting Relevant Fields: Keeps only the following fields for streamlined analysis:
Title: The title of the resource.
Author: The creator or author of the resource.
Year: The publication or release year.
Language: The language of the resource.
Formats: The format(s) of the resource (e.g., Book, Audio).
Subjects: The subject keywords or classifications.
Library: The owning library or institution.
Series: The series or collection the resource belongs to.
refine_metadata(data)
data |
A tibble containing raw Finna metadata. |
A tibble with selected, cleaned metadata fields, or NULL if required fields are missing.
library(finna)
sibelius_data <- search_finna("sibelius")
refine_metadata(sibelius_data)
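The three steps listed in the description (validate, fill placeholders, select) can be sketched on a toy data frame as below. The function refine_sketch, the reduced set of required columns, and the "Unknown Author" placeholder are assumptions for illustration; only "Unknown Title" appears in the description above, and the package's actual rules may differ.

```r
# Hypothetical, simplified version of the cleaning steps.
refine_sketch <- function(data, required = c("Title", "Author", "Year")) {
  # Step 1: return NULL if any required column is absent.
  if (!all(required %in% names(data))) return(NULL)
  # Step 2: replace NA values with placeholder text.
  data$Title[is.na(data$Title)] <- "Unknown Title"
  data$Author[is.na(data$Author)] <- "Unknown Author"
  # Step 3: keep only the selected columns.
  data[, required, drop = FALSE]
}

# Invented raw records with gaps and one extra column.
raw <- data.frame(
  Title = c("First", NA),
  Author = c(NA, "Author Two"),
  Year = c("1900", "1957"),
  Extra = c("x", "y"),
  stringsAsFactors = FALSE
)

cleaned <- refine_sketch(raw)
incomplete <- refine_sketch(raw[, c("Title", "Year")])
```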
This function saves 'Finna' search results and metadata locally to a file in .rds format, allowing users to access and analyze the data offline without an internet connection.
save_for_offline(data, file_name = "offline_search_results")
data |
A tibble or data frame containing the 'Finna' search results. |
file_name |
A string representing the name of the file to save. The function automatically appends ".rds" to the name if not already included. |
No return value. Called for its side effects of saving the data to a file.
## Not run:
search_results <- search_finna("sibelius")
save_for_offline(search_results, "sibelius_search_results")
## End(Not run)
This function retrieves records from the Finna index with an option to limit the total number of records returned. The function paginates through the results, fetching records until the specified total limit is reached.
search_finna( query = NULL, type = "AllFields", fields = NULL, filters = NULL, facets = NULL, facetFilters = NULL, sort = "relevance,id asc", limit = 100, lng = "fi", prettyPrint = FALSE )
query |
A string specifying the search query. Defaults to NULL. |
type |
A string specifying the type of search. Options include "AllFields", "Title", "Author", "Subject". Defaults to "AllFields". |
fields |
A vector of fields to be returned in the search results. Defaults to NULL, which returns a standard set of fields. |
filters |
A vector of filter queries to refine the search. Defaults to NULL. |
facets |
A vector specifying which facets to return in the results. Defaults to NULL. |
facetFilters |
A vector of regular expressions to filter facets. Defaults to NULL. |
sort |
A string defining the sort order of the results. Defaults to "relevance,id asc"; the examples in this manual also use "main_date_str asc" and "main_date_str desc". |
limit |
An integer specifying the total number of records to return across multiple pages. Defaults to 100. |
lng |
A string for the language of returned translated strings. Options are "fi" - Finnish, "en-gb" - English, "sv" - Swedish, "se" - Sami. Defaults to "fi" - Finnish. |
prettyPrint |
A logical value indicating whether to pretty-print the JSON response. Useful for debugging. Defaults to FALSE. |
A tibble containing the search results with relevant fields extracted and provenance information.
search_results <- search_finna("sibelius", sort = "main_date_str desc", limit = 100)
print(search_results)
This function retrieves only the publisher information from the Finna index based on the search query.
search_publisher( query = NULL, limit = 100, lng = "fi", filters = NULL, prettyPrint = FALSE )
query |
A string specifying the search query. |
limit |
An integer specifying the total number of records to return. |
lng |
A string for the language of returned translated strings. Defaults to "fi". |
filters |
A vector of filter queries to refine the search. Defaults to NULL. |
prettyPrint |
A logical value indicating whether to pretty-print the JSON response. Defaults to FALSE. |
A tibble containing the record IDs and their respective publishers.
publishers <- search_publisher("sibelius", limit = 10)
print(publishers)
Visualizes the top entries for a given field in a data frame. Count and percentage statistics are also shown as needed.
top_plot( x, field = NULL, ntop = NULL, highlight = NULL, max.char = Inf, show.rest = FALSE, show.percentage = FALSE, log10 = FALSE )
x |
Data frame, vector or factor |
field |
Field to show |
ntop |
Number of top entries to show |
highlight |
Entries from the 'field' to be highlighted |
max.char |
Max number of characters in strings. Longer strings will be cut and only max.char first characters are shown. No cutting by default |
show.rest |
Show the count of leave-out samples (not in top-N) as an additional bar. |
show.percentage |
Show the proportion of each category with respect to the total sample count. |
log10 |
Show the counts on log10 scale (default FALSE) |
ggplot object
Leo Lahti
See citation("bibliographica")
## Not run:
p <- top_plot(x, field, 50)
## End(Not run)
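The counting that sits behind a top-N plot, including the optional bar for the remaining entries and the percentage of the total, can be sketched in base R. The function top_counts, its argument names, and the "Other" label are hypothetical and are not taken from top_plot itself.

```r
# Hypothetical helper: counts for the `ntop` most frequent values, an
# optional "Other" row for the rest, and each row's share of the total.
top_counts <- function(values, ntop, show_rest = FALSE) {
  counts <- sort(table(values), decreasing = TRUE)
  top <- head(counts, ntop)
  out <- data.frame(entry = names(top), n = as.integer(top),
                    stringsAsFactors = FALSE)
  if (show_rest && length(counts) > ntop) {
    rest <- sum(counts) - sum(top)
    out <- rbind(out, data.frame(entry = "Other", n = as.integer(rest),
                                 stringsAsFactors = FALSE))
  }
  out$percentage <- 100 * out$n / sum(counts)
  out
}

# Invented format labels.
formats <- c("Book", "Book", "Book", "Image", "Image", "Map")
top_two <- top_counts(formats, ntop = 2, show_rest = TRUE)
print(top_two)
```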