Title: | Functions to Interact with the 'FAIR Data Pipeline' |
---|---|
Description: | R implementation of the 'FAIR Data Pipeline API'. The 'FAIR Data Pipeline' is intended to enable tracking of provenance of FAIR (findable, accessible, interoperable and reusable) data used in epidemiological modelling. |
Authors: | Sonia Mitchell [aut] , Ryan Field [cre, aut] |
Maintainer: | Ryan Field <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.60.0 |
Built: | 2024-12-08 07:19:13 UTC |
Source: | CRAN |
FAIR Data Pipeline API
For more information see https://www.fairdatapipeline.org/
Useful links:
Report bugs at https://github.com/FAIRDataPipeline/rDataPipeline/issues
Add data product to read block of user-written config file. Used in combination with create_config() for unit testing.
add_read( path, data_product, component, version, use_data_product, use_component, use_version, use_namespace )
path | config file path |
data_product | data_product field |
component | component field |
version | (optional) version field |
use_data_product | (optional) use_data_product field |
use_component | (optional) use_component field |
use_version | (optional) use_version field |
use_namespace | (optional) use_namespace field |
## Not run:
path <- "test_config/config.yaml"

# Write run_metadata block
create_config(path = path, description = "test", input_namespace = "test_user", output_namespace = "test_user")

# Write read block
add_read(path = path, data_product = "test/array", component = "level/a/s/d/f/s", version = "0.2.0")
## End(Not run)
Add data product to write block of user-written config file. Used in combination with create_config() for unit testing.
add_write( path, data_product, description, version, file_type, use_data_product, use_component, use_version, use_namespace )
path | config file path |
data_product | data_product field |
description | description field |
version | (optional) version field |
file_type | (optional) file type field |
use_data_product | (optional) use_data_product field |
use_component | (optional) use_component field |
use_version | (optional) use_version field |
use_namespace | (optional) use_namespace field |
## Not run:
path <- "test_config/config.yaml"

# Write run_metadata block
create_config(path = path, description = "test", input_namespace = "test_user", output_namespace = "test_user")

# Write write block
add_write(path = path, data_product = "test/array", description = "data product description", version = "0.2.0")
## End(Not run)
Generates user-written config.yaml files for unit tests. Use the add_read() and add_write() functions to add read and write blocks.
create_config( path, description, input_namespace, output_namespace, write_data_store = file.path(tempdir(), "datastore", ""), force = TRUE, local_repo = "local_repo" )
path | config file path |
description | description field |
input_namespace | input_namespace field |
output_namespace | output_namespace field |
write_data_store | write_data_store field |
force | force |
local_repo | local_repo |
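The default value of write_data_store in the usage above is built with file.path() and an empty final component. The following base R snippet (not package code; the variable name store is only illustrative) shows what that expression evaluates to: a directory path inside the session's temporary directory, ending in a separator.

```r
# Rebuild the default value of write_data_store shown in the usage above.
store <- file.path(tempdir(), "datastore", "")

# file.path() joins components with "/", so the empty last component
# leaves a trailing separator and the value reads as a directory path.
print(store)
print(endsWith(store, "datastore/"))
```

Because the default lives under tempdir(), anything written there is tied to the current R session's temporary directory, which suits unit tests.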
fair_init
fair_init(name, identifier, endpoint = "http://127.0.0.1:8000/api/")
name | a |
identifier | (optional) a |
endpoint | a |
fair_run
fair_run( path = "config.yaml", endpoint = "http://127.0.0.1:8000/api/", skip = FALSE )
path | string |
endpoint | a |
skip | if TRUE, skip checking whether the repo is clean |
fdp-class
Container for class fdp
yaml
a list containing the contents of the working config.yaml
fdp_config_dir
a string specifying the directory passed from fair run
model_config
a string specifying the URL of an entry in the object table associated with the storage_location of the working config.yaml
submission_script
a string specifying the URL of an entry in the object table associated with the storage_location of the submission script
code_repo
a string specifying the URL of an entry in the object table associated with the GitHub repository
code_run
a string specifying the URL of an entry in the code_run table
inputs
a data.frame containing metadata associated with code_run inputs
outputs
a data.frame containing metadata associated with code_run outputs
issues
a data.frame containing metadata associated with code_run issues
new()
Create a new fdp object
fdp$new( yaml, fdp_config_dir, model_config, submission_script, code_repo, code_run )
yaml
a list containing the contents of the working config.yaml
fdp_config_dir
a string specifying the directory passed from fair run
model_config
a string specifying the URL of an entry in the object table associated with the storage_location of the working config.yaml
submission_script
a string specifying the URL of an entry in the object table associated with the storage_location of the submission script
code_repo
a string specifying the URL of an entry in the object table associated with the GitHub repository
code_run
a string specifying the URL of an entry in the code_run table
Returns a new fdp object
print()
Print method
fdp$print(...)
...
additional parameters, currently none are used
input()
Record code_run inputs in fdp object
fdp$input( data_product, use_data_product, use_component, use_version, use_namespace, path, component_url )
data_product
a string specifying the name of the data product, used as a reference
use_data_product
a string specifying the name of the data product, used as input in the code_run
use_component
a string specifying the name of the data product component, used as input in the code_run
use_version
a string specifying the data product version, used as input in the code_run
use_namespace
a string specifying the namespace in which the data product resides, used as input in the code_run
path
a string specifying the location of the data product in the local data store
component_url
a string specifying the URL of an entry in the object_component table
Returns an updated fdp object
output()
Record code_run outputs in fdp object
fdp$output( data_product, use_data_product, use_component, use_version, use_namespace, path, data_product_description, component_description, public )
data_product
a string specifying the name of the data product, used as a reference
use_data_product
a string specifying the name of the data product, used as output in the code_run
use_component
a string specifying the name of the data product component, used as output in the code_run
use_version
a string specifying the version of the data product, used as output in the code_run
use_namespace
a string specifying the namespace in which the data product resides, used as output in the code_run
path
a string specifying the location of the data product in the local data store
data_product_description
a string containing a description of the data product
component_description
a string containing a description of the data product component
public
Returns an updated fdp object
output_index()
Return index of data product recorded in fdp object so that an issue may be attached
fdp$output_index(data_product, component, version, namespace)
data_product
a string specifying the name of the data product, used as output in the code_run
component
a string specifying the name of the data product component, used as output in the code_run
version
a string specifying the name of the data product version, used as output in the code_run
namespace
a string specifying the namespace in which the data product resides, used as input in the code_run
Returns an index used to identify the data product
raise_issue()
Record issue in fdp object
fdp$raise_issue( index, type, use_data_product, use_component, use_version, use_namespace, issue, severity )
index
a numeric index, used to identify each input and output in the fdp object
type
a string specifying the type of issue (one of "data", "config", "script", "repo")
use_data_product
a string specifying the name of the data product, used as output in the code_run
use_component
a string specifying the name of the data product component, used as output in the code_run
use_version
a string specifying the name of the data product version, used as output in the code_run
use_namespace
a string specifying the namespace in which the data product resides, used as input in the code_run
issue
a string containing a free text description of the issue
severity
an integer specifying the severity of the issue
Returns an updated fdp object
finalise_output_hash()
Record file hash and update path name in fdp object
fdp$finalise_output_hash( use_data_product, use_data_product_runid, use_version, use_namespace, hash, new_path, data_product_url, delete_if_duplicate = FALSE )
use_data_product
a string specifying the name of the data product, used as output in the code_run
use_data_product_runid
a string specifying the name of the data product, the same as use_data_product excluding the RUN_ID variable
use_version
a string specifying the name of the data product version, used as output in the code_run
use_namespace
a string specifying the namespace in which the data product resides, used as input in the code_run
hash
a string specifying the hash of the file
new_path
a string specifying the updated location (filename is now the hash of the file) of the data product in the local data store
data_product_url
a string specifying the URL of an object associated with the data_product
delete_if_duplicate
(optional) default is FALSE
Returns an updated fdp object
finalise_output_url()
Record data_product and component URLs in fdp object
fdp$finalise_output_url( use_data_product, use_component, use_version, use_namespace, component_url )
use_data_product
a string specifying the name of the data product, used as output in the code_run
use_component
a string specifying the name of the data product component, used as output in the code_run
use_version
a string specifying the name of the data product version, used as output in the code_run
use_namespace
a string specifying the namespace in which the data product resides, used as input in the code_run
component_url
a string specifying the URL of an entry in the object_component table
Returns an updated fdp object
clone()
The objects of this class are cloneable with this method.
fdp$clone(deep = FALSE)
deep
Whether to make a deep clone.
Finalise Code Run and push associated metadata to the local registry.
finalise(handle, delete_if_empty = FALSE, delete_if_duplicate = FALSE)
handle | an object of class |
delete_if_empty | (optional) default is FALSE |
delete_if_duplicate | (optional) default is FALSE |
If delete_if_empty is set to TRUE, the Code Run entry is deleted when the Code Run does not read an input, write an output, or attach an issue.
If delete_if_duplicate is set to TRUE, a data product with the same hash as a previous version is removed from the registry.
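To make the duplicate condition concrete, here is a minimal base R sketch of how two files with the same contents produce the same hash. It does not call the package; tools::md5sum() is only a stand-in, since the hash function used by the pipeline is not stated here, and the file contents are hypothetical.

```r
# Two temporary files with identical contents, standing in for two
# versions of the same data product (hypothetical example data).
old_version <- tempfile(fileext = ".csv")
new_version <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,2"), old_version)
writeLines(c("a,b", "1,2"), new_version)

# tools::md5sum() is used here only as a stand-in hash function.
old_hash <- unname(tools::md5sum(old_version))
new_hash <- unname(tools::md5sum(new_version))

# Identical contents give identical hashes, which is the condition under
# which delete_if_duplicate = TRUE is described as removing the product.
is_duplicate <- identical(old_hash, new_hash)
print(is_duplicate)
```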
Find read aliases in working config that match wildcard string
find_read_match(handle, data_product)
handle | an object of class |
data_product | a |
Find write aliases in working config that match wildcard string
find_write_match(handle, data_product)
handle | an object of class |
data_product | a |
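The wildcard syntax accepted by find_read_match() and find_write_match() is not spelled out here. As a rough illustration of the idea only, the sketch below filters a vector of hypothetical data product names with a shell-style wildcard using base R; it assumes glob-style matching and does not call the package.

```r
# Hypothetical data product names, as they might appear in a config file.
aliases <- c("test/array", "test/table", "other/array")

# Translate a shell-style wildcard into a regular expression and filter.
pattern <- utils::glob2rx("test/*")
matches <- aliases[grepl(pattern, aliases)]
print(matches)
```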
Returns metadata associated with the calculated hash of a target file. When multiple entries exist in the data registry all are returned.
findme(file, endpoint)
file | file path |
endpoint | endpoint |
Returns the names of the items at the root of the file
get_components(filename)
filename | a |
Returns the names of the items at the root of the file
Other get functions: get_entry(), get_existing(), get_file_hash(), get_github_hash()
get_dataproduct
get_dataproduct( data_product, version, namespace, endpoint = "http://127.0.0.1:8000/api/" )
data_product | data_product |
version | version |
namespace | namespace |
endpoint | endpoint |
Return all fields associated with a table entry in the data registry
get_entry(table, query, endpoint = "http://127.0.0.1:8000/api/")
table | a |
query | a |
endpoint | a |
Returns a list of fields present in the specified entry
Other get functions: get_components(), get_existing(), get_file_hash(), get_github_hash()
Reads in a working config file, generates new Code Run entry, and returns a handle containing various metadata.
initialise(config, script)
config | a |
script | a |
Returns an object of class fdp, R6 containing metadata required by the Data Pipeline API
Link path to external format data
link_read(handle, data_product)
handle | an object of class |
data_product | a |
Returns a string specifying the location of the data product to be read
Link path for external format data
link_write(handle, data_product)
handle | an object of class |
data_product | a |
Returns a string specifying the location in which the data product should be written
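Since link_write() returns a location rather than writing anything itself, a caller would typically write the external-format file to that path. The helper below is a hedged sketch: its name, the choice of CSV, and the rDataPipeline:: prefix (taken from the bug-report URL) are assumptions, and it is only defined here, not run, because calling it requires a handle returned by initialise().

```r
# Sketch of a helper (hypothetical name) that writes a data.frame to the
# location supplied by link_write(). It is defined but not run here,
# because calling it needs a handle returned by initialise().
write_external_csv <- function(handle, data_product, df) {
  # Ask the pipeline where this data product should be written.
  path <- rDataPipeline::link_write(handle, data_product)
  # Write the file in an external format (CSV) at that location.
  utils::write.csv(df, path, row.names = FALSE)
  invisible(path)
}
```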
raise_issue
raise_issue( index, handle, component = NA, data_product, issue, severity, whole_object = FALSE )
index | index returned from |
handle | an object of class |
component | a |
data_product | a |
issue | a |
severity | a |
whole_object | a |
Raise issue with config file
raise_issue_config(handle, issue, severity)
handle | an object of class |
issue | a |
severity | a |
Raise issue with remote repository
raise_issue_repo(handle, issue, severity)
handle | an object of class |
issue | a |
severity | a |
Raise issue with submission script
raise_issue_script(handle, issue, severity)
handle | an object of class |
issue | a |
severity | a |
Function to read array type data from hdf5 file.
read_array(handle, data_product, component)
handle | an object of class |
data_product | a |
component | a |
Returns an array with attached Dimension_i_title, Dimension_i_units, Dimension_i_values, and units attributes, if available
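The attributes named above can be inspected with base R's attr(). The sketch below builds a mock array by hand rather than calling read_array(), so the attribute values and their types are assumptions made for illustration; only the attribute names follow the description above, with i replaced by the dimension number.

```r
# Mock array built by hand to mimic the attribute names listed above;
# the attribute values and their types are assumptions for illustration.
x <- structure(
  matrix(1:6, nrow = 2),
  Dimension_1_title = "age group",
  Dimension_1_values = c("0-17", "18+"),
  Dimension_2_title = "week",
  Dimension_2_values = c("w1", "w2", "w3"),
  units = "cases"
)

# attr() returns NULL for an attribute that is not attached, which covers
# the "if available" case.
print(attr(x, "Dimension_1_title", exact = TRUE))
print(is.null(attr(x, "Dimension_1_units", exact = TRUE)))
```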
Function to read distribution type data from toml file.
read_distribution(handle, data_product, component)
handle | an object of class |
data_product | a |
component | a |
Function to read point-estimate type data from toml file.
read_estimate(handle, data_product, component)
handle | an object of class |
data_product | a |
component | a |
Function to read table type data from hdf5 file.
read_table(handle, data_product, component)
handle | an object of class |
data_product | a |
component | a |
Returns a data.frame with attached column_units attributes, if available
Function to populate hdf5 file with array type data.
write_array( array, handle, data_product, component, description, dimension_names, dimension_values, dimension_units, units )
array | an |
handle | an object of class |
data_product | a |
component | a |
description | a |
dimension_names | a |
dimension_values | (optional) a |
dimension_units | (optional) a |
units | (optional) a |
Returns a handle index associated with the just written component, which can be used to raise an issue if necessary
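The returned index ties write_array() to raise_issue(). The following sketch strings together initialise(), write_array(), raise_issue() and finalise() using only the argument names shown in this manual. The function name, the example data, the format given to dimension_names, and the rDataPipeline:: prefix are assumptions; the function is only defined, not run, because running it needs a working config file, a submission script, and a local registry.

```r
# Sketch of a write workflow (hypothetical function name). It is defined
# but not run here, because running it needs a working config file, a
# submission script, and a local registry.
write_workflow_sketch <- function(config, script) {
  # Read the working config and register a new Code Run.
  handle <- rDataPipeline::initialise(config, script)

  # Example data; the dimension_names format is an assumption.
  counts <- matrix(1:6, nrow = 2)
  index <- rDataPipeline::write_array(
    array = counts,
    handle = handle,
    data_product = "test/array",
    component = "level/a/s/d/f/s",
    description = "example counts",
    dimension_names = list(row = c("r1", "r2"), column = c("c1", "c2", "c3"))
  )

  # Attach an issue to the component that was just written.
  rDataPipeline::raise_issue(
    index = index,
    handle = handle,
    issue = "example issue text",
    severity = 1
  )

  # Push the Code Run metadata to the local registry.
  rDataPipeline::finalise(handle)
}
```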
Other write functions: write_distribution(), write_estimate(), write_table()
Write distribution component to TOML file
write_distribution( distribution, parameters, handle, data_product, component, description )
distribution | a |
parameters | a |
handle | an object of class |
data_product | a |
component | a |
description | a |
Returns a handle index associated with the just written component, which can be used to raise an issue if necessary
Other write functions: write_array(), write_estimate(), write_table()
Function to populate toml file with point-estimate type data. If a file already exists at the specified location, an additional component will be added.
write_estimate(value, handle, data_product, component, description)
value | an object of class |
handle | an object of class |
data_product | a |
component | a |
description | a |
Returns a handle index associated with the just written component, which can be used to raise an issue if necessary
Other write functions: write_array(), write_distribution(), write_table()
Function to populate hdf5 file with table type data.
write_table( df, handle, data_product, component, description, row_names, column_units )
df | an |
handle | an object of class |
data_product | a |
component | a |
description | a |
row_names | (optional) a |
column_units | (optional) a |
Returns a handle index associated with the just written component, which can be used to raise an issue if necessary
Other write functions: write_array(), write_distribution(), write_estimate()