Title: | Utilities to Extract and Process 'YAML' Fragments |
---|---|
Description: | Provides a number of functions to facilitate extracting information in 'YAML' fragments from one or multiple files, optionally structuring the information in a 'data.tree'. 'YAML' (recursive acronym for "YAML ain't Markup Language") is a convention for specifying structured data in a format that is both machine- and human-readable. 'YAML' therefore lends itself well to embedding (meta)data in plain text files, such as Markdown files. This principle is implemented in 'yum' with minimal dependencies (i.e. only the 'yaml' package; the 'data.tree' package can be used to enable additional functionality). |
Authors: | Gjalt-Jorn Peters [aut, cre] |
Maintainer: | Gjalt-Jorn Peters |
License: | GPL-3 |
Version: | 0.1.0 |
Built: | 2024-12-23 06:25:10 UTC |
Source: | CRAN |
If the data.tree package is installed, this function can be used to convert a list of objects, as loaded from extracted YAML fragments, into a data.tree::Node().
build_tree( x, idName = "id", parentIdName = "parentId", childrenName = "children", autofill = c(label = "id"), rankdir = "LR", directed = "false", silent = TRUE )
x |
Either a list of YAML fragments loaded from a file with |
idName |
The name of the field containing each element's identifier, used to build the data tree when there are references to a parent from a child element. |
parentIdName |
The name of the field containing references to an element's parent element (i.e. the field containing the identifier of the corresponding parent element). |
childrenName |
The name of the field containing an element's children, either as a list of elements, or using the 'shorthand' notation, in which case a vector is supplied with the identifiers of the children. |
autofill |
A named vector where the names represent fields to fill with
the values of the fields specified in the vector values. Note that autofill
replacements are only applied if the fields to be autofilled (i.e. the names of
the vector specified in |
rankdir |
The direction in which to lay out the tree when it is plotted; the default is "LR". |
directed |
Whether the edges should have arrows; the default is "false". |
silent |
Whether to be silent (TRUE) or informative (FALSE). |
A data.tree::Node() object.
loadedYum <- yum::load_yaml_fragments(text=c(
  "---",
  "-",
  " id: firstFragment",
  "---",
  "Outside of YAML",
  "---",
  "-",
  " id: secondFragment",
  " parentId: firstFragment",
  "---",
  "Also outside of YAML"));
yum::build_tree(loadedYum);
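The identifier-based linking that build_tree() relies on (via idName and parentIdName) can be illustrated in base R. This is a sketch under stated assumptions, not the package's code. childrenByParent() is a hypothetical helper. It assumes each loaded element is a list with an id field and, optionally, a parentId field. It only collects each element's child identifiers and does not build a data.tree::Node().

```r
# Hypothetical helper, not part of 'yum': map each element's id to the ids
# of its children, assuming every element is a list with an `id` field and,
# optionally, a `parentId` field.
childrenByParent <- function(elements, idName = "id",
                             parentIdName = "parentId") {
  ids <- vapply(elements, function(el) el[[idName]], character(1))
  parents <- vapply(elements, function(el) {
    p <- el[[parentIdName]]
    if (is.null(p)) NA_character_ else p
  }, character(1))
  # Every id gets an entry, so leaf elements map to character(0)
  result <- lapply(ids, function(i) ids[!is.na(parents) & parents == i])
  names(result) <- ids
  result
}

# Elements shaped like the ones loaded in the example above
elements <- list(
  list(id = "firstFragment"),
  list(id = "secondFragment", parentId = "firstFragment")
)
childrenByParent(elements)
```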
This function deletes all YAML fragments from a file, returning a character vector without the lines that make up the YAML fragments.
delete_yaml_fragments( file, text, delimiterRegEx = "^---$", ignoreOddDelimiters = FALSE, silent = TRUE )
file |
The path to a file to scan; if provided, takes precedence
over |
text |
A character vector to scan, where every element should
represent one line in the file; can be specified instead of |
delimiterRegEx |
The regular expression used to locate YAML fragments. |
ignoreOddDelimiters |
Whether to throw an error (FALSE) or delete the last delimiter (TRUE) if an odd number of delimiters is encountered. |
silent |
Whether to be silent (TRUE) or informative (FALSE). |
A list of character vectors.
yum::delete_yaml_fragments(text=c("---", "First YAML fragment", "---", "Outside of YAML", "---", "Second fragment", "---", "Also outside of YAML"));
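To illustrate what deleting fragments means, here is a minimal base R sketch. deleteFragmentsSketch() is hypothetical and not part of 'yum'. It assumes fragments are delimited by pairs of lines matching delimiterRegEx. Like ignoreOddDelimiters = FALSE, it stops when the number of delimiters is odd.

```r
# Hypothetical helper, not part of 'yum': drop every line from an opening
# delimiter through its matching closing delimiter.
deleteFragmentsSketch <- function(text, delimiterRegEx = "^---$") {
  delimiters <- grep(delimiterRegEx, text)
  # Mirrors ignoreOddDelimiters = FALSE: refuse to guess the pairing
  if (length(delimiters) %% 2 != 0) {
    stop("Odd number of delimiters: ", length(delimiters))
  }
  starts <- delimiters[seq_along(delimiters) %% 2 == 1]  # 1st, 3rd, ...
  ends <- delimiters[seq_along(delimiters) %% 2 == 0]    # 2nd, 4th, ...
  inFragment <- unlist(mapply(seq, starts, ends, SIMPLIFY = FALSE))
  if (length(inFragment) == 0) return(text)
  text[-inFragment]
}

deleteFragmentsSketch(c("---", "First YAML fragment", "---",
                        "Outside of YAML",
                        "---", "Second fragment", "---",
                        "Also outside of YAML"))
```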
This function extracts all YAML fragments from all files in a directory, returning a list of character vectors containing the extracted fragments.
extract_yaml_dir( path, recursive = TRUE, fileRegexes = c("^[^\\.]+.*$"), delimiterRegEx = "^---$", ignoreOddDelimiters = FALSE, encoding = "UTF-8", silent = TRUE )
path |
The path containing the files. |
recursive |
Whether to also process subdirectories (TRUE) or not (FALSE). |
fileRegexes |
A vector of regular expressions to match the files
against: only files matching one or more regular expressions in this
vector are processed. The default regex (c("^[^\\.]+.*$")) matches all files except those whose names start with a period. |
delimiterRegEx |
The regular expression used to locate YAML fragments. |
ignoreOddDelimiters |
Whether to throw an error (FALSE) or delete the last delimiter (TRUE) if an odd number of delimiters is encountered. |
encoding |
The encoding to use when calling |
silent |
Whether to be silent (TRUE) or informative (FALSE). |
A list of character vectors.
### First get the directory where 'yum' is installed
yumDir <- system.file(package="yum");
### Specify the path of some example files
examplePath <- file.path(yumDir, "extdata");
### Show files (should be three .dct files)
list.files(examplePath);
### Load these files
yum::extract_yaml_dir(path=examplePath);
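The path, recursive, and fileRegexes arguments determine which files are scanned. The following sketch shows one way to select files with the default regular expression in base R. matchingFiles() is a hypothetical helper. It is also an assumption that 'yum' matches the expressions against base names rather than full paths.

```r
# Hypothetical helper, not part of 'yum': list the files under `path` whose
# base names match at least one of `fileRegexes`.
matchingFiles <- function(path, recursive = TRUE,
                          fileRegexes = c("^[^\\.]+.*$")) {
  # all.files = TRUE so that the regexes, not list.files(), decide whether
  # hidden files (names starting with a period) are skipped
  files <- list.files(path, recursive = recursive, all.files = TRUE,
                      full.names = TRUE, no.. = TRUE)
  keep <- vapply(basename(files), function(f) {
    any(vapply(fileRegexes, grepl, logical(1), x = f))
  }, logical(1))
  files[keep]
}

# Build a small throwaway directory to try it on
exampleDir <- file.path(tempdir(), "yumSketch")
dir.create(file.path(exampleDir, "sub"), recursive = TRUE,
           showWarnings = FALSE)
writeLines(c("---", "id: a", "---"), file.path(exampleDir, "a.md"))
writeLines(c("---", "id: b", "---"), file.path(exampleDir, "sub", "b.dct"))
writeLines("ignored", file.path(exampleDir, ".hidden"))
basename(matchingFiles(exampleDir))
```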
This function extracts all YAML fragments from a file, returning a list of character vectors containing the extracted fragments.
extract_yaml_fragments( text, file, delimiterRegEx = "^---$", ignoreOddDelimiters = FALSE, encoding = "UTF-8", silent = TRUE )
text, file |
As |
delimiterRegEx |
The regular expression used to locate YAML fragments. |
ignoreOddDelimiters |
Whether to throw an error (FALSE) or delete the last delimiter (TRUE) if an odd number of delimiters is encountered. |
encoding |
The encoding to use when calling |
silent |
Whether to be silent (TRUE) or informative (FALSE). |
A list of character vectors, where each vector corresponds to one YAML fragment in the source file or text.
yum::extract_yaml_fragments(text="
---
First: YAML fragment
id: firstFragment
---
Outside of YAML
---
Second: YAML fragment
id: secondFragment
parentId: firstFragment
---
Also outside of YAML
");
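Extraction amounts to pairing consecutive delimiters. Below is a minimal sketch that assumes a single string containing newlines should first be split into lines. It keeps only the lines between each pair of delimiters. extractFragmentsSketch() is hypothetical, and the package's output may differ, for example in whether the delimiter lines are included.

```r
# Hypothetical helper, not part of 'yum': return one character vector per
# fragment, holding the lines between each pair of delimiters.
extractFragmentsSketch <- function(text, delimiterRegEx = "^---$") {
  # Assumption: a single string containing newlines is split into lines
  if (length(text) == 1 && grepl("\n", text, fixed = TRUE)) {
    text <- strsplit(text, "\n", fixed = TRUE)[[1]]
  }
  delimiters <- grep(delimiterRegEx, text)
  if (length(delimiters) %% 2 != 0) {
    stop("Odd number of delimiters: ", length(delimiters))
  }
  starts <- delimiters[seq_along(delimiters) %% 2 == 1]
  ends <- delimiters[seq_along(delimiters) %% 2 == 0]
  # Keep only the content lines, not the delimiters themselves
  mapply(function(s, e) text[seq_len(e - s - 1) + s], starts, ends,
         SIMPLIFY = FALSE)
}

extractFragmentsSketch("
---
id: firstFragment
---
Outside of YAML
---
id: secondFragment
parentId: firstFragment
---
Also outside of YAML
")
```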
This function finds all YAML fragments in a file, returning either their start and end indices or the indices of all lines in the (non-)YAML fragments.
find_yaml_fragment_indices( file, text, invert = FALSE, returnFragmentIndices = TRUE, returnPairedIndices = TRUE, delimiterRegEx = "^---$", ignoreOddDelimiters = FALSE, silent = TRUE )
file |
The path to a file to scan; if provided, takes precedence
over |
text |
A character vector to scan, where every element should
represent one line in the file; can be specified instead of |
invert |
Set to |
returnFragmentIndices |
Set to |
returnPairedIndices |
Whether to return two vectors with the start and end indices, or pair them up in vectors of 2. |
delimiterRegEx |
The regular expression used to locate YAML fragments. |
ignoreOddDelimiters |
Whether to throw an error (FALSE) or delete the last delimiter (TRUE) if an odd number of delimiters is encountered. |
silent |
Whether to be silent (TRUE) or informative (FALSE). |
A list of numeric vectors with start and end indices
### Create simple text vector with the right delimiters
simpleExampleText <- c("---", "First YAML fragment", "---",
                       "Outside of YAML", "This, too.",
                       "---", "Second fragment", "---",
                       "Also outside of YAML", "Another one outside", "Last one");
yum::find_yaml_fragment_indices(text=simpleExampleText);
yum::find_yaml_fragment_indices(text=simpleExampleText, returnFragmentIndices = FALSE);
yum::find_yaml_fragment_indices(text=simpleExampleText, invert = TRUE);
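Pairing the delimiter positions, as returnPairedIndices describes, and inverting the selection can be sketched in base R as follows. fragmentIndicesSketch() is hypothetical and returns line indices only. It does not reproduce the package's returnFragmentIndices option.

```r
# Hypothetical helper, not part of 'yum': pair up delimiter positions, or
# (with invert = TRUE) return the indices of lines outside all fragments.
fragmentIndicesSketch <- function(text, invert = FALSE,
                                  delimiterRegEx = "^---$") {
  delimiters <- grep(delimiterRegEx, text)
  if (length(delimiters) %% 2 != 0) {
    stop("Odd number of delimiters: ", length(delimiters))
  }
  starts <- delimiters[seq_along(delimiters) %% 2 == 1]
  ends <- delimiters[seq_along(delimiters) %% 2 == 0]
  pairs <- mapply(c, starts, ends, SIMPLIFY = FALSE)  # list of c(start, end)
  if (!invert) return(pairs)
  inside <- unlist(lapply(pairs, function(p) seq(p[1], p[2])))
  setdiff(seq_along(text), inside)
}

simpleExampleText <- c("---", "First YAML fragment", "---",
                       "Outside of YAML", "This, too.",
                       "---", "Second fragment", "---",
                       "Also outside of YAML", "Another one outside",
                       "Last one")
fragmentIndicesSketch(simpleExampleText)
fragmentIndicesSketch(simpleExampleText, invert = TRUE)
```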
This function takes a hierarchical structure of lists and extracts all atomic vectors, returning one flat list of all those vectors.
flatten_list_of_lists(x)
x |
The list of lists. |
A list of atomic vectors.
### First create a list of lists
listOfLists <- list(list(list(1:3, 8:5), 7:7), list(1:4, 8:2));
yum::flatten_list_of_lists(listOfLists);
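Here is a recursive sketch of the same flattening idea in base R. flattenSketch() is hypothetical and may differ from the package's implementation, for example in how names are handled.

```r
# Hypothetical re-implementation for illustration, not the package's code:
# recurse into lists and collect every atomic vector, in order.
flattenSketch <- function(x) {
  if (is.atomic(x)) return(list(x))
  do.call(c, lapply(x, flattenSketch))
}

listOfLists <- list(list(list(1:3, 8:5), 7:7), list(1:4, 8:2))
flattenSketch(listOfLists)
```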
Checking whether numbers are odd or even
is.odd(vector)
is.even(vector)
vector |
The vector to process |
A logical vector.
is.odd(4);
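Both checks reduce to the remainder after division by two. Below is a minimal sketch that assumes whole-number input. isOddSketch() and isEvenSketch() are hypothetical names, not the package's functions.

```r
# Illustrative versions, not the package's code. R's %% returns a result
# with the sign of the divisor (-3 %% 2 is 1), so negatives work too.
isOddSketch <- function(vector) vector %% 2 == 1
isEvenSketch <- function(vector) vector %% 2 == 0

isOddSketch(c(4, 7, -3))
isEvenSketch(c(4, 7, -3))
```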
These functions extract all YAML fragments from a file or text (load_and_simplify()) or from all files in a directory (load_and_simplify_dir()), load them by calling load_yaml_fragments(), and then call simplify_by_flattening() on the result, returning the resulting list.
load_and_simplify( text, file, yamlFragments = NULL, select = ".*", simplify = ".*", delimiterRegEx = "^---$", ignoreOddDelimiters = FALSE, encoding = "UTF-8", silent = TRUE )
load_and_simplify_dir( path, recursive = TRUE, fileRegexes = c("^[^\\.]+.*$"), select = ".*", simplify = ".*", delimiterRegEx = "^---$", ignoreOddDelimiters = FALSE, encoding = "UTF-8", silent = TRUE )
text |
As |
file |
As |
yamlFragments |
A character vector of class |
select |
A vector of regular expressions specifying object names
to retain. The default (".*") retains all objects. |
simplify |
A regular expression specifying which elements to simplify (default is everything) |
delimiterRegEx |
The regular expression used to locate YAML fragments. |
ignoreOddDelimiters |
Whether to throw an error (FALSE) or delete the last delimiter (TRUE) if an odd number of delimiters is encountered. |
encoding |
The encoding to use when calling |
silent |
Whether to be silent (TRUE) or informative (FALSE). |
path |
The path containing the files. |
recursive |
Whether to also process subdirectories (TRUE) or not (FALSE). |
fileRegexes |
A vector of regular expressions to match the files
against: only files matching one or more regular expressions in this
vector are processed. The default regex (c("^[^\\.]+.*$")) matches all files except those whose names start with a period. |
A list of objects, where each object corresponds to one item specified in the read YAML fragment(s) from the source file or text. If the convention of the rock, dct and justifier packages is followed, each object in this list contains one or more named objects (lists), where the name indicates the type of information contained. Each of those objects (lists) then contains one or more objects of that type, such as metadata or codes for rock, a decentralized construct taxonomy element for dct, and a justification, decision, assertion, or source for justifier.
yum::load_and_simplify(text="
---
firstObject:
  id: firstFragment
---
Outside of YAML
---
otherObjectType:
  - id: secondFragment
    parentId: firstFragment
  - id: thirdFragment
    parentId: firstFragment
---
Also outside of YAML");
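The select argument keeps only objects whose names match at least one of the supplied regular expressions. Below is a base R sketch of that filtering step, assuming it is applied to the names of a loaded list. selectSketch() is hypothetical and not part of 'yum'.

```r
# Hypothetical helper, not part of 'yum': keep the elements of a named list
# whose names match at least one regular expression in `select`.
selectSketch <- function(x, select = ".*") {
  keep <- vapply(names(x), function(n) {
    any(vapply(select, grepl, logical(1), x = n))
  }, logical(1))
  x[keep]
}

# A named list shaped like the objects loaded in the example above
loaded <- list(firstObject = list(id = "firstFragment"),
               otherObjectType = list(list(id = "secondFragment")))
names(selectSketch(loaded))                      # default keeps everything
names(selectSketch(loaded, select = "^other"))   # only matching names
```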
This function extracts all YAML fragments from all files in a directory and parses them, returning a list with, for each file, a list of the parsed fragments.
load_yaml_dir( path, recursive = TRUE, fileRegexes = c("^[^\\.]+.*$"), select = ".*", delimiterRegEx = "^---$", ignoreOddDelimiters = FALSE, encoding = "UTF-8", silent = TRUE )
path |
The path containing the files. |
recursive |
Whether to also process subdirectories (TRUE) or not (FALSE). |
fileRegexes |
A vector of regular expressions to match the files
against: only files matching one or more regular expressions in this
vector are processed. The default regex (c("^[^\\.]+.*$")) matches all files except those whose names start with a period. |
select |
A vector of regular expressions specifying object names
to retain. The default (".*") retains all objects. |
delimiterRegEx |
The regular expression used to locate YAML fragments. |
ignoreOddDelimiters |
Whether to throw an error (FALSE) or delete the last delimiter (TRUE) if an odd number of delimiters is encountered. |
encoding |
The encoding to use when calling |
silent |
Whether to be silent (TRUE) or informative (FALSE). |
This function extracts all YAML fragments from all files in a directory and then calls yaml::yaml.load() to parse them. It then returns a list where each element is a list with the parsed fragments from one file.
A list of lists of objects.
### First get the directory where 'yum' is installed
yumDir <- system.file(package="yum");
### Specify the path of some example files
examplePath <- file.path(yumDir, "extdata");
### Show files (should be three .dct files)
list.files(examplePath);
### Load these files
yum::load_yaml_dir(path=examplePath);
This function extracts all YAML fragments from a file and then calls yaml::yaml.load() to parse them. It then returns a list of the parsed fragments.
load_yaml_fragments( text, file, yamlFragments = NULL, select = ".*", delimiterRegEx = "^---$", ignoreOddDelimiters = FALSE, encoding = "UTF-8", silent = TRUE )
text |
As |
file |
As |
yamlFragments |
A character vector of class |
select |
A vector of regular expressions specifying object names
to retain. The default (".*") retains all objects. |
delimiterRegEx |
The regular expression used to locate YAML fragments. |
ignoreOddDelimiters |
Whether to throw an error (FALSE) or delete the last delimiter (TRUE) if an odd number of delimiters is encountered. |
encoding |
The encoding to use when calling |
silent |
Whether to be silent (TRUE) or informative (FALSE). |
A list of objects, where each object corresponds to one YAML fragment from the source file or text. If the convention of the rock, dct and justifier packages is followed, each object in this list contains one or more named objects (lists), where the name indicates the type of information contained. Each of those objects (lists) then contains one or more objects of that type, such as metadata or codes for rock, a decentralized construct taxonomy element for dct, and a justification for justifier.
yum::load_yaml_fragments(text="
---
- id: firstFragment
---
Outside of YAML
---
- id: secondFragment
  parentId: firstFragment
---
Also outside of YAML");
This function extracts all YAML fragments from the character vectors in a list and parses them, returning a list with, for each character vector, a list of the parsed fragments.
load_yaml_list( x, recursive = TRUE, select = ".*", delimiterRegEx = "^---$", ignoreOddDelimiters = FALSE, encoding = "UTF-8", silent = TRUE )
x |
The list containing the character vectors. |
recursive |
Whether to first |
select |
A vector of regular expressions specifying object names
to retain. The default (".*") retains all objects. |
delimiterRegEx |
The regular expression used to locate YAML fragments. |
ignoreOddDelimiters |
Whether to throw an error (FALSE) or delete the last delimiter (TRUE) if an odd number of delimiters is encountered. |
encoding |
The encoding to use when calling |
silent |
Whether to be silent (TRUE) or informative (FALSE). |
This function calls yaml::yaml.load() on all character vectors in a list. It then returns a list where each element is a list with the parsed fragments from one character vector.
A list of lists of objects.
yamlList <- list(c("---", "-", " id: firstFragment", "---"),
                 c("---", "-", " id: secondFragment", " parentId: firstFragment", "---"));
yum::load_yaml_list(yamlList);
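The ignoreOddDelimiters argument, shared by all of these functions, decides what happens when a character vector contains an odd number of delimiters. Here is a sketch of the documented behavior (throw an error, or delete the last delimiter), applied to every vector in a list with lapply(). delimiterIndicesSketch() is a hypothetical helper, not the package's code.

```r
# Hypothetical helper, not part of 'yum': mirror the documented
# ignoreOddDelimiters behavior for a single character vector.
delimiterIndicesSketch <- function(text, delimiterRegEx = "^---$",
                                   ignoreOddDelimiters = FALSE) {
  delimiters <- grep(delimiterRegEx, text)
  if (length(delimiters) %% 2 != 0) {
    if (!ignoreOddDelimiters) {
      stop("Odd number of delimiters: ", length(delimiters))
    }
    delimiters <- delimiters[-length(delimiters)]  # drop the last one
  }
  delimiters
}

# The second vector ends with a stray extra delimiter
yamlList <- list(c("---", "-", " id: firstFragment", "---"),
                 c("---", "-", " id: secondFragment", "---", "---"))
lapply(yamlList, delimiterIndicesSketch, ignoreOddDelimiters = TRUE)
```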
This function does some cleaning and simplifying to allow efficient specification of elements in the YAML fragments.
simplify_by_flattening(x, simplify = ".*", .level = 1)
x |
Extracted (and loaded) YAML fragments |
simplify |
A regular expression specifying which elements to simplify (default is everything) |
.level |
Internal argument to enable slightly-less-than-elegant 'recursion'. |
A simplified list (but still a list)
yamlFragmentExample <- '
---
source:
  - id: src_1
    label: "Label 1"
  - id: src_2
    label: "Label 2"
assertion:
  - id: assertion_1
    label: "Assertion 1"
  - id: assertion_2
    label: "Assertion 2"
---
';
loadedExampleFragments <- yum::load_yaml_fragments(yamlFragmentExample);
simplified <- yum::simplify_by_flattening(loadedExampleFragments);
### Pre simplification:
str(loadedExampleFragments);
### Post simplification:
str(simplified);
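The description above does not spell out exactly what simplify_by_flattening() simplifies. As one hypothetical illustration, assume that simplification means merging same-typed objects (such as source or assertion) across fragments. Under that assumption, mergeByTypeSketch() concatenates the items of each type. The package's actual rules may differ.

```r
# Hypothetical illustration, not the package's algorithm: merge the items
# of each object type (e.g. "source", "assertion") across all fragments.
mergeByTypeSketch <- function(fragments) {
  types <- unique(unlist(lapply(fragments, names)))
  merged <- lapply(types, function(type) {
    # Fragments lacking this type contribute NULL, which c() drops
    do.call(c, lapply(fragments, function(fragment) fragment[[type]]))
  })
  names(merged) <- types
  merged
}

fragments <- list(
  list(source = list(list(id = "src_1"), list(id = "src_2"))),
  list(source = list(list(id = "src_3")),
       assertion = list(list(id = "assertion_1")))
)
merged <- mergeByTypeSketch(fragments)
sapply(merged$source, `[[`, "id")
```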
Easily convert a vector into a single character value
vecTxt( vector, delimiter = ", ", useQuote = "", firstDelimiter = NULL, lastDelimiter = " & ", firstElements = 0, lastElements = 1, lastHasPrecedence = TRUE )
vecTxtQ(vector, useQuote = "'", ...)
vector |
The vector to process. |
delimiter, firstDelimiter, lastDelimiter |
The delimiters to use for, respectively, the middle, first, and last elements. |
useQuote |
This character string is pre- and appended to all elements;
so use this to quote all elements ( |
firstElements, lastElements |
The number of elements for which to use the first and last delimiters, respectively. |
lastHasPrecedence |
If the vector is very short, the sum of firstElements and lastElements may exceed the vector length. In that case, the number of elements to separate with the first delimiter is adjusted downward (TRUE), or the number to separate with the last delimiter is (FALSE). |
... |
Any additional arguments to |
A character vector of length 1.
vecTxtQ(names(mtcars));
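Below is a minimal sketch of the default joining behavior (firstElements = 0, lastElements = 1). It assumes these defaults mean that only the final element is preceded by lastDelimiter. vecTxtSketch() is hypothetical and ignores firstDelimiter and lastHasPrecedence.

```r
# Illustrative sketch, not the package's implementation: join all elements
# with `delimiter`, except that the final element is preceded by
# `lastDelimiter` (assumed meaning of lastElements = 1, firstElements = 0).
vecTxtSketch <- function(vector, delimiter = ", ", useQuote = "",
                         lastDelimiter = " & ") {
  if (length(vector) == 0) return("")
  vector <- paste0(useQuote, vector, useQuote)
  n <- length(vector)
  if (n == 1) return(vector)
  paste0(paste(vector[-n], collapse = delimiter), lastDelimiter, vector[n])
}

vecTxtSketch(c("mpg", "cyl", "disp"))
vecTxtSketch(c("mpg", "cyl", "disp"), useQuote = "'")
```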